It is high time we started developing humanity’s long-term digital memory. Are we, as archivists and digital preservation researchers, up to the job? Do we understand the nature of digital society and its memory needs? Do we understand digital technology and how it evolves? Above all, do we have the right mindset to become digital archivists?

1 From archiving to storytelling

Postmodernism has taken hold of our mindset and sensitized us to the human influence on the making, safekeeping, use and destruction of records and archives. Whether by those in power while creating the records, those with ‘trusted’ archival systems while keeping the records, those in research, education, the media or entertainment when interpreting the records, or those overthrowing those in power and deliberately destroying the records, human intervention in the recording of our collective societal memory is multiple and complex. This recording process, with its never-ending acts of editing and deletion, is a form of continuously evolving storytelling and memorializing (Piggott 2007). This realization has become widespread and has overturned archivists’ image as impartial and neutral record keepers. Our profession responded by flipping its role to one of intentionally “adding value” to our societal memory. Some archivists have turned activist, proactively reshaping our present memory to help ‘reconstruct marginalized identities’. Others, focusing on electronic records, have sharpened appraisal practices to ensure that ‘only content with enduring value’ is permanently retained (Cook 2001). In the post-truth crisis, archivists are now being tempted to “capture the truth” and “protect democracies against alternate facts”, while at the same time attempting to remedy generations of bias in the archives under pressure from the equality, diversity and inclusion movement. In the words of Terry Cook: “They have evolved from being, allegedly, impartial custodians of inherited records to becoming intervening agents who set record-keeping standards and, most pointedly, who select for archival preservation only a tiny portion of the entire universe of recorded information. Archivists have become in this way very active builders of their own ‘houses of memory’” (Cook 1997).

2 From keepers to destroyers of records

The storytelling mindset of the postmodern archivist did not come out of nowhere. It grew out of the evolution of the archival concept of appraisal during the twentieth century. Appraisal is “the process of distinguishing records of continuing value from those of no further value so that the latter may be eliminated” (The National Archives 2013). During the nineteenth century, it was considered an undesirable archival intervention. The most influential manual for archivists, published in 1898, did not discuss appraisal and cautioned against destroying records (Muller et al. 1898). In those times, the archival mindset was infused with principles from the French Revolution and historical positivism, which together shaped the ideal of building an archival documentary record that is as complete, authentic and reliable as possible, i.e., almost exactly as it was recorded in the course of an organization’s or person’s ongoing affairs.

This mindset changed dramatically in the course of the twentieth century, which saw a phenomenal increase in the rate, volume and variety of paper records. Backlogs of materials piled up awaiting archival processing, and archives faced a permanent shortage of physical storage capacity. The stage was set for the development of a systematic approach to archival appraisal, including a controlled process of record destruction. Archivists did everything in their power to retain only a small subset of documents which “possess the optimum concentration of desired information so that a maximum of documentation is achieved with a minimum of documents” (Booms 1987).

3 Resisting the digital wave

With the shift from analogue to digital, the overflow problem has gotten completely out of hand. The amount of information generated, collected and recorded by digital society exceeds human imagination. In response, archivists intensified their appraisal practices. However, the changeable and dynamic nature of electronic records challenged most archival assumptions, expertise and practices. The rapid obsolescence of the hardware, the software and the medium carrying the bits left the profession at a loss for solutions to appraise and preserve the records. From the 1980s until today, archivists have taken de facto appraisal decisions, some of which are questionable and others downright short-sighted. One tactic was to put appraisal and the intake of electronic records on hold until the technologies and document formats used by records creators stabilized. Another was to print born-digital documents onto paper or, in some cases, to transform the original file into a preservation format. This seemed the best way to fix the documents in a stable form, at the cost of violating their integrity, authenticity and evidential value. Well into the nineties, it was not unusual to see archivists transform lightweight email text messages into heavy TIFF image files. Another proposed solution was to force the IT industry to support a list of preservation formats and engineer an office environment that would generate non-proprietary and perennial document formats. During this whole period, archivists have demonstrated a great reluctance to accept electronic records into their custody, because they could not adapt to the nature of digital information. Many have systematically pushed back and drawn a line between records that they would keep and those they would not, in keeping with the motto “not all records having archival value can be kept” (Bak 2016). The technical valuation of archival records overruled the valuation of the content.
As a result, the majority of the electronic records produced to date have been destroyed or abandoned in the depths of IT backup systems and legacy systems awaiting sunsetting.

4 Leveraging digital technology

In the meantime, the most significant digital preservation issues have been resolved with digital technologies that have been around for decades. Offline long-term storage has become stable, secure and affordable. Long-term access solutions have been devised and tested—such as emulation and virtualization technologies. Even if not used now, these can be implemented in the future, when the demand for access justifies it. Digital forensics is a growing area of expertise specializing in the recovery and investigation of evidence on digital devices. In short, technology is not the problem. Even better, it offers the unimagined opportunity to build digital high-fidelity archives in the twenty-first century, containing the real-time master-recordings of the way we conducted our affairs as digital organizations and individuals (van der Werf and van der Werf 2020). Digital technology excels in recording, monitoring and logging all human interventions during the creation, safekeeping, use and deletion of digital information. Therefore, while human interventions during the record’s lifecycle intensify with the use of digital technology, this same technology offers the potential to accurately document all these interventions—something hitherto impossible.

But are we leveraging this potential? Are the log files and audit trails of record-keeping systems preserved or are they discarded by the appraisal regime?
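
To make concrete what preserving an audit trail could look like, here is a minimal sketch of a tamper-evident log in which every entry carries a hash of the one before it. It is an illustration only, not a description of any existing record-keeping system: the class `AuditTrail`, its field names and the sample actors are all hypothetical, and a real system would also need secure timestamps and protected storage.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder hash used before the first entry (assumption)


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON rendering of one log entry."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class AuditTrail:
    """Hypothetical append-only log; each entry commits to its predecessor."""

    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, record_id: str, timestamp: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "actor": actor,
            "action": action,        # e.g. create, edit, access, delete
            "record_id": record_id,
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True only if no entry was altered, removed or reordered."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Because each hash depends on everything before it, a later edit to any logged intervention makes `verify()` fail. The point of the sketch is that the interventions on a record can themselves be kept as a verifiable record, provided the log is not discarded.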

We have not yet fully grasped and exploited the potential of past digital technologies, and here comes the next digital wave: AI and Big Data. By now, we are talking about petabytes upon petabytes of data being generated and processed in governments, businesses and society. And, as with every new generation of digital technology, archivists, entrenched in their appraisal and selection mindset, are asking themselves: “How do we figure out what to keep and what to throw away?” The impressive volumes of data are not helping them to flip this mindset around and say, like IT people do, “1.5 petabytes? That is not a big deal”.

5 The temptation to use AI to perpetuate an obsolete principle

Faced with the reality of the uncontrollable flow of born-digital records, archivists have pinned their hopes on AI-powered appraisal. Experiments to train learning algorithms to replicate experts’ assessment of emails with and without business value have been reported to be successful. In another context, tools are being developed to support the identification and extraction of corporate data for timely disposal or transfer to an archival repository (Colavizza et al. 2021). However, before drawing conclusions from these advances, one should be aware of the pitfalls.

The algorithms used are mostly off-the-shelf products and rely on language comprehension models. Experiments with recognizing sensitive data in email message corpora have demonstrated the importance of context in such models. Using the wrong context variables, or too few relevant ones, makes these models prone to bias. This has raised awareness of the ease with which AI bias could skew the record of the future (Seles 2020). Moreover, algorithms that focus on text (for example, the body and subject field of email messages) do not deal with system features that provide background knowledge (for example, the identity of the sender and receiver in messaging systems). Background knowledge is a necessary component of communication. It gets lost when individual messages are selected and archived outside of their native system. If the context of the record is gone, even an expert will struggle to understand the content. Finally, noise, such as the duplication of information in email conversation threading, often provides meaningful information as well. Therefore, there is a strong and compelling case for keeping the environmental context of digital content intact. Every practical IT person would advise that the best, and by far the easiest, solution is to archive the entire system where the content was originally created and processed. Robust methods and tools can help with this and track the updates while the system is still in use, so that snapshots from specific time periods can be reconstructed.
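
The idea of tracking updates so that past snapshots can be reconstructed can be sketched in a few lines. The example below is an assumption-laden illustration, not a description of any particular archiving tool: the `Change` type, the `snapshot_at` function and the sample identifiers are hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC with a uniform format, so that sorting them as text sorts them in time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One logged update to the system (hypothetical structure)."""

    timestamp: str          # ISO 8601 UTC, uniform format (assumption)
    key: str                # identifier of the affected item
    value: Optional[str]    # new content; None marks a deletion


def snapshot_at(changes: list[Change], moment: str) -> dict[str, str]:
    """Replay all changes up to and including `moment` to rebuild the state."""
    state: dict[str, str] = {}
    for change in sorted(changes, key=lambda c: c.timestamp):
        if change.timestamp > moment:
            break  # everything after this point happened later
        if change.value is None:
            state.pop(change.key, None)  # the item was deleted
        else:
            state[change.key] = change.value
    return state


# Usage: the same log yields different snapshots for different dates.
log = [
    Change("2020-01-01T00:00:00Z", "memo-1", "draft"),
    Change("2020-02-01T00:00:00Z", "memo-2", "note"),
    Change("2020-03-01T00:00:00Z", "memo-1", "final"),
    Change("2020-06-01T00:00:00Z", "memo-2", None),
]
february = snapshot_at(log, "2020-02-15T00:00:00Z")
december = snapshot_at(log, "2020-12-31T00:00:00Z")
```

Nothing is ever overwritten in such a log: deletions are recorded as events rather than carried out, which is what allows the state of the system at any earlier moment to be rebuilt on demand.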

6 When things get out of hand

In the future, when a user requests access to the system, be that in a year or in a hundred years, that is the time to decide which access restrictions still apply and which filtering mechanism to use. At that point in time, archivists have a duty to intervene to protect data privacy and classified information. This is where the shoe pinches: how to identify sensitive information in the sea of data at the point of need?

Archivists have decided to combine the sensitivity review with the appraisal and selection process, so that all checks are performed before the records are transferred to the archive, with the aim of being in control of a well-organized and “clean” archive. This task runs into a whole set of issues, not least because record creators needed time to establish sound digital information management practices, but also because data protection has become a moving target. As a result, government agencies and archival institutions have reached the point where the inability to accurately identify sensitive data at scale makes it practically impossible for them to grant freedom of information (FOI) requests or permit archival research (Colavizza et al. 2021). Clearly then, destroying as much information as possible, as soon as the retention schedules allow, can help ease the task of both record creators and keepers.

7 Give future generations the chance to build their own, rich memories

As digital archivists, we cling to our appraisal mindset as a panacea for all the problems we face, and we worry about giving too much access to the data in our custody. The use of AI will serve this propensity and aggravate its impact. It will reproduce and amplify the ‘natural stupidity’, i.e., errors in judgment and biases, as Tversky and Kahneman have defined it (Lewis 2016), that has crept into the profession’s appraisal practice during the shift from paper to digital. We need to change.

We need to embrace digital society and understand that information overload is a prerequisite for it to thrive. In this view, digital archives are big data and AI is the way to unlock them. The way digital scholars are using data mining techniques to sift through large volumes of data in digital library collections gives us a glimpse of how digital archives will be used by future generations: as data collections, breaking down the descriptive silos we have built, and using AI-based tools to construct multiple collective and individualized memories, histories and truths. The more data we leave behind, the richer their storytelling and memorializing will be.

Therefore, the question for our profession is: will AI serve the power to destroy electronic records and amplify present-day biases? Or should we be the record-keepers and let future generations decide how to use AI to unlock the inherited records and enhance their digital memory?