Libraries, Archives, and Digital Preservation: The Reality of What We Must Do
Leslie Johnston
Acting Director, National Digital Information Infrastructure and Preservation Program
Library of Congress
Libraries are receiving and ingesting large digital collections and must consider new ways in which those collections should be managed and used, not to mention preserved.
What are examples of some of the challenges of managing and preserving large digital collections in many formats, and making them available for re-use?
Sheer amount.
Huge variation in file formats.
Unclear and undocumented rights.
Security.
Missing metadata.
Data citation and identifier issues.
Unclear retention periods.
Discovery expectations: discovery across collections and institutions together.
Cost.
I will mention infrastructure only in passing.
There are scale issues related to:
Bandwidth
Storage
Backup and tape archiving
Software development
Staffing for processing
There is no national preservation architecture, system, or storage backend.
Highly variable institution by institution, but with commonalities in backend repository systems, ingest models, and discovery models.
Community- and discipline-based repositories, often with an unclear relationship to libraries or archives.
Multiple methods for certifying the trust level for a repository.
Agreed-upon protocols and mechanisms for the transfer of files, but no single standard for the interchange of files and metadata between environments.
Synchronization and versioning are not just technical challenges; they complicate management, preservation, and access.
And at the Library of Congress?
The Library is currently modifying its preservation and collection security policies around digital collections.
The Library has repository services that inventory its file assets and maintain multiple copies of files on servers and on tape, in geographically distributed locations.
The Library developed the BagIt transfer specification for the movement of files between and within organizations.
http://www.digitalpreservation.gov/documents/bagitspec.pdf
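To make the idea of a BagIt transfer package concrete, here is a minimal sketch of the general layout: a payload directory named data, a bagit.txt declaration, and a checksum manifest. It uses only the Python standard library. The function names, the choice of SHA-256, and the version string are assumptions made for illustration; this is a simplified approximation, not a complete or validated implementation of the specification.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_simple_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data, then write bagit.txt and a manifest.

    Hypothetical helper: bag_dir must not already exist.
    """
    payload_dir = bag_dir / "data"
    shutil.copytree(source_dir, payload_dir)

    # Bag declaration. The version string is an assumption for this sketch.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One manifest line per payload file: checksum, whitespace, relative path.
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def verify_simple_bag(bag_dir: Path) -> list[str]:
    """Return payload paths that are missing or no longer match the manifest."""
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, relative = line.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative)
    return failures
```

The receiving organization would run something like verify_simple_bag after transfer; an empty list suggests the payload arrived as it was sent.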
The Library has documented sustainability factors for file formats.
http://www.digitalpreservation.gov/formats/
For cases where we do have control over what comes in, we have a Best Edition Preferred Formats statement, which is currently being updated. http://www.copyright.gov/circs/circ07b.pdf
Guidance for Personal Digital Archiving
http://digitalpreservation.gov/personalarchiving/
The Library is developing Digital Format Preservation Action Plans.
Tangentially, new uses of and expectations for digital collections mean that many new library and archive public services must be planned, and digital preservation efforts must support them.
Digital collections need to be ingested, preserved, and made available with appropriate controls, through unmediated self-service.
Are libraries and archives ready?
Yes and No.
What are the emerging Digital Preservation Services?
We must develop sufficient infrastructure for distributed, replicated preservation storage.
We will spend an increasing amount of time auditing our files and storage to ensure that no issues have arisen.
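As an illustration of what such auditing can look like in practice, the sketch below records a checksum inventory for a storage directory and later compares the files on disk against it. The function names, the JSON inventory format, and the use of SHA-256 are all assumptions made for this example; it does not describe any particular institution's tooling.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(storage_root: Path) -> set[str]:
    """Return every file under storage_root as a relative POSIX-style path."""
    return {
        p.relative_to(storage_root).as_posix()
        for p in storage_root.rglob("*")
        if p.is_file()
    }


def record_inventory(storage_root: Path, inventory_path: Path) -> dict[str, str]:
    """Write a path-to-checksum inventory (hypothetical JSON format).

    Keep inventory_path outside storage_root so it is not audited itself.
    """
    inventory = {
        relative: file_digest(storage_root / relative)
        for relative in sorted(list_files(storage_root))
    }
    inventory_path.write_text(json.dumps(inventory, indent=2), encoding="utf-8")
    return inventory


def audit_storage(storage_root: Path, inventory_path: Path) -> dict[str, list[str]]:
    """Compare files on disk with the inventory and report discrepancies."""
    expected = json.loads(inventory_path.read_text(encoding="utf-8"))
    present = list_files(storage_root)
    report = {"missing": [], "changed": [], "unexpected": []}
    for relative, digest in sorted(expected.items()):
        if relative not in present:
            report["missing"].append(relative)
        elif file_digest(storage_root / relative) != digest:
            report["changed"].append(relative)
    report["unexpected"] = sorted(present - set(expected))
    return report
```

Run on a schedule against each storage copy, a report with three empty lists indicates that nothing has gone missing, changed, or appeared unexpectedly since the inventory was recorded.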
Before ingesting files and providing access, we may need to process all of them to create a variety of derivatives that are more sustainable and that might be required for various forms of use and analysis.
We must develop sufficient infrastructure to support large scale discovery.
We are comfortable with self-service through the institutional repository model, but can libraries ingest, manage and provide access to an increasing number of digital collections without any mediation?
We are providing quite a bit of guidance to researchers on digital preservation standards and personal digital preservation.
And where are the digital preservation innovations?
The Cloud, as a supplement to, NOT a replacement for, local preservation storage resources.
In the adaptation and use of legal digital forensics tools for the analysis and creation of complete and authentic copies of unique digital media.
In the increasingly useful content characterization tools, such as JHOVE, DROID, FITS, and UDFR, so we can more fully understand the risks inherent in the files in our collections.
In virtualization and emulation technologies used to recreate environments needed for digital preservation and for access.
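The content characterization tools mentioned above identify what a file actually is rather than trusting its extension. The sketch below illustrates only the simplest form of that idea, matching a few well-known leading byte signatures. It does not call JHOVE, DROID, or FITS, and it is far less thorough than those tools; the signature table and function names are assumptions chosen for this example.

```python
from collections import Counter
from pathlib import Path

# A few well-known leading byte signatures ("magic numbers").
# A deliberately tiny, illustrative table, not a format registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring the extension."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def characterize_collection(root: Path) -> Counter:
    """Count files under root by guessed format."""
    counts = Counter()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            counts[identify_format(path)] += 1
    return counts
```

Even a rough tally like this can show how many files in a collection fall outside the formats an institution is prepared to sustain; the "unknown" count is often the most informative number.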