A recent front-page article in the New York Times, entitled "In the Digital Age, Federal Files Slip into Oblivion," really caught my attention. The article described a problem with which I am painfully and intimately familiar, namely the struggle to preserve the electronic record of government processes and deliberations. Quoting from the article:
Many federal officials admit to a haphazard approach to preserving e-mail and other electronic records of their work. Indeed, many say they are unsure what materials they are supposed to preserve.
This confusion is causing alarm among historians, archivists, librarians, Congressional investigators and watchdog groups that want to trace the decision-making process and hold federal officials accountable. With the imminent change in administrations, the concern about lost records has become more acute.
Even with an army of government clerks, there is a limit to how many pieces of paper the federal government can produce. The explosive growth of digital communications and document preparation, however, has far outstripped the processes and technology available to the Library of Congress and the National Archives and Records Administration (NARA). Nor is the problem just the volume of digital data; it is also the diversity of electronic formats and the myriad physical devices on which the data are stored.
Imagine receiving a truck filled with PC disk drives and being expected to identify, curate and manage the data contained on them. Sound daunting and farfetched? Daunting, yes, but not farfetched. This is precisely what the Clinton White House delivered to the National Archives for preservation, though it included a mere 32 million e-mail messages. (Remember that the White House did not have Internet access until DARPA and Randy Katz wired it in the 1990s.)
Given the growth of electronic communication since the early 1990s, the Bush administration will undoubtedly have generated hundreds of millions of e-mail messages that must be preserved, along with a plethora of electronic documents in a dizzying array of file formats. Beyond the standard challenges of document identification, extraction and preservation, the Archives must of course also deal with national security and classification issues, which compound the difficulty.
I have seen this struggle firsthand as a member of the Advisory Committee for the Electronic Records Archive (ACERA), the digital document preservation project of the National Archives. The National Archives is building a web-accessible, indexed repository that will eventually host at least a portion of the torrent of digital data pouring from the federal government. It has been an arduous journey, and more work lies ahead.