14
Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J. Res. Natl. Inst. Stand. Technol. 107, 93–106 (2002)] The State of the Art and Practice in Digital Preservation Volume 107 Number 1 January–February 2002 Kyong-Ho Lee, Oliver Slattery, Richang Lu, Xiao Tang, and Victor McCrary National Institute of Standards and Technology, Gaithersburg, MD 20899-8951 [email protected] [email protected] [email protected] [email protected] [email protected] The goal of digital preservation is to ensure long-term access to digitally stored information. In this paper, we present a survey of techniques used in digital preservation. We also introduce representative digital preservation projects and case studies that provide insight into the advantages and disadvantages of different preservation strategies. Finally, the pros and cons of current strategies, critical issues for digital preservation, and future directions are discussed. Key words: digital preservation; emulation; encapsulation; migration; standardization; XML. Accepted: November 29, 2001 Available online: http://www.nist.gov/jres 1. Introduction Information preservation is one of the most important issues in human history, culture, and economics, as well as the development of our civilization. While earliest information was recorded in carvings on stone, ceramic, bamboo, or wood, the development of civilization paved the way for new storage media and techniques for recording information, such as writing on silk or print- ing on paper. Eventually we were able to put photo- graphic images on film and music on records. A revolu- tionary change occurred in the information storage field with the invention of electronic storage media. With the advent of high-performance computing and high-speed networks, the use of digital technologies is increasing rapidly. Digital technologies enable informa- tion to be created, manipulated, disseminated, located, and stored with increasing ease. Ensuring long-term access to the digitally stored information poses a signif- icant challenge, and is increasingly recognized as an important part of digital data management [1,2]. The evolution of data storage media and the develop- ment of the preservation technology can be described as shown in Fig. 1. This diagram lists the various media used in data storage (both digital and analog) and the techniques needed to ensure that the data on them is preserved. It also highlights the trend from analog to digital/optical storage media and indicates the transfer of data from one generation of media to the next. It is clear that while it is easier to create, amend, and distribute digital data, the media storing this data such as optical discs are not as robust as traditional analog media such as paper or film. In view of modern infor- mation preservation requirements, this paper will focus on the aspects of the technical strategies used in digital information preservation. Digital preservation involves the retention of both the information object and its meaning. It is therefore necessary that preservation techniques be able to under- stand and re-create the original form or function of the 93

[J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

[J. Res. Natl. Inst. Stand. Technol. 107, 93–106 (2002)]

The State of the Art and Practice inDigital Preservation

Volume 107 Number 1 January–February 2002

Kyong-Ho Lee, Oliver Slattery,Richang Lu, Xiao Tang, andVictor McCrary

National Institute of Standards andTechnology,Gaithersburg, MD 20899-8951

[email protected]@[email protected]@[email protected]

The goal of digital preservation is toensure long-term access to digitallystored information. In this paper, wepresent a survey of techniques used indigital preservation. We also introducerepresentative digital preservation projectsand case studies that provide insight intothe advantages and disadvantages ofdifferent preservation strategies. Finally, thepros and cons of current strategies,critical issues for digital preservation, andfuture directions are discussed.

Key words: digital preservation; emulation;encapsulation; migration; standardization;XML.

Accepted: November 29, 2001

Available online: http://www.nist.gov/jres

1. Introduction

Information preservation is one of the most importantissues in human history, culture, and economics, as wellas the development of our civilization. While earliestinformation was recorded in carvings on stone, ceramic,bamboo, or wood, the development of civilization pavedthe way for new storage media and techniques forrecording information, such as writing on silk or print-ing on paper. Eventually we were able to put photo-graphic images on film and music on records. A revolu-tionary change occurred in the information storage fieldwith the invention of electronic storage media.

With the advent of high-performance computing andhigh-speed networks, the use of digital technologies isincreasing rapidly. Digital technologies enable informa-tion to be created, manipulated, disseminated, located,and stored with increasing ease. Ensuring long-termaccess to the digitally stored information poses a signif-icant challenge, and is increasingly recognized as animportant part of digital data management [1,2].

The evolution of data storage media and the develop-ment of the preservation technology can be described asshown in Fig. 1. This diagram lists the various mediaused in data storage (both digital and analog) and thetechniques needed to ensure that the data on them ispreserved. It also highlights the trend from analog todigital/optical storage media and indicates the transferof data from one generation of media to the next. Itis clear that while it is easier to create, amend, anddistribute digital data, the media storing this data suchas optical discs are not as robust as traditional analogmedia such as paper or film. In view of modern infor-mation preservation requirements, this paper will focuson the aspects of the technical strategies used in digitalinformation preservation.

Digital preservation involves the retention of boththe information object and its meaning. It is thereforenecessary that preservation techniques be able to under-stand and re-create the original form or function of the

93

Page 2: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

object to ensure its authenticity and accessibility.Preservation of digital information is complex becauseof the dependency digital information has on itstechnical environment. Furthermore, as newer digitaltechnologies rapidly appear and older ones are discon-tinued, information that relies on obsolete technologiessoon becomes inaccessible. Therefore, digital resourcespresent more difficult problems than conventionalanalog media such as paper-based books [3].

Recently, several approaches for digital preservationhave been identified and presented. Conventionalmethods are mainly technology emulation, informationmigration, and encapsulation [4,5,6,7,8]. However,there is a lack of proven preservation methods to ensurethat the information will continue to be readable.

To promote a solid understanding of the pros andcons of different preservation techniques, this papertries to present a comprehensive survey of them througha review of a wide range of literature and representativeprojects. We suggest a preservation strategy based onXML (eXtensible Markup Language) [9] for interoper-ability and interchangeability of preserved digital infor-mation. The most appropriate preservation strategyshould be determined by considering various aspectsincluding cost-effectiveness, legal restrictions, and useraccess requirements. However, as stated earlier, thispaper particularly focuses on the technical issues ofpreservation.

This paper is organized as follows. Section 2 summa-rizes various techniques for digital preservation. Theiradvantages and disadvantages are presented through aliterature survey. By introducing current projects andcase studies, the application of these preservationtechniques is described in detail in Sec. 3. Finally,points of discussion and a summary are given in Sec. 4.

2. Digital Preservation Techniques

This section introduces the current strategies fordigital preservation. Techniques for the preservation ofdigital information include technology preservation,technology emulation, information migration, andencapsulation.

Digital resources can be stored on any mediumthat can represent their binary digits or bits, such as aCD-ROM or a DVD. Rothenberg [10] defines a bitstream as an intended meaningful sequence of bits withno intervening spaces, punctuation, or formatting. Topreserve that bit stream, the first requirement is toensure that the bit stream is stored on a stable medium.If the digital medium deteriorates or becomes obsoletebefore the digital information has been copied ontoanother medium, the data will be lost.

Therefore, digital preservation involves copying thedigital information onto newer media before the oldmedia becomes so obsolete that the data cannot beaccessed. This is referred to as copying or refreshing[6,8]. This process preserves the integrity of the digitalinformation. Well-established techniques for preservingthe integrity of the digital information exist at thislevel [5].

For timely refreshing, the lifetime of digital storagemedia must be predicted. The lifetime of a mediumdetermines the period of time in which the informationrecorded on the medium is stored safely withoutloss. The specification of the lifetime of the mediumwill prompt librarians or archivists to refresh theirmedia before medium deterioration. Recently,Zwaneveld [11] suggested that media could be ratedinto five classes according to certain criteria includingthe lifetime.

Fig. 1. A progression of information preservation strategies.

94

Page 3: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

However, simply copying the digital information isnot always sufficient as a preservation strategy [5]. Onehas to ensure that the information can be retrieved andprocessed in the future. Retrieving a bit stream requiresa hardware device such as a disk drive for reading thephysical representation of the bits from the medium, aswell as driver software.

Existing preservation strategies can be broadlyclassified into two main approaches as shown in Fig. 2.The first is the more conservative approach where theoriginal technological environment is fully preservedfor decoding the digital information in the future. Thisapproach can be further divided into two preservationtechniques. The first is to preserve the working replicasof all computer hardware and software platformsfor future use. This is referred to as the technologypreservation strategy [5]. The other is to program thenewer computer systems to emulate on demand theolder obsolete platforms and operating systems. This isthe so-called technology emulation strategy [10].

The second approach is to overcome the technicalobsolescence of file formats. It may also be classifiedinto two techniques. The first is to transform or convertthe old digital resource to a format that is independentof the particular hardware and software that wereapplied to create them. This is called the informationmigration strategy [5]. The second technique, termedencapsulation, is where a digital object and anythingelse necessary to provide access to that object aregrouped together and preserved [12].

2.1 Technology Preservation

This solution supposes that complete museums ofobsolete equipment could be maintained in order toreplicate any old configuration of hardware andsoftware [13]. This strategy involves preserving anoriginal application program, operating system soft-ware, and hardware platform [6].

The advocates of this strategy emphasize that theoriginal environment needs to be run to really preserve

the behavior as well as the look and feel of the digitalobject. For some digital objects, this may be the bestsolution at least in the short-term because it ensures thatthe material is accessible by preserving the access toolsas well as the object itself.

However, various issues including space, mainte-nance, and costs may make this impossible in thelonger term. Specifically, equipment ages and breaks,documentation disappears, vendor support vanishes, andthe storage medium as well as the equipment deterio-rates [14]. This strategy also limits the portability of theresource since it will depend on hardware stored inspecific locations [3].

2.2 Technology Emulation

This strategy has a lot in common with the technologypreservation strategy. It involves preserving the originalapplication program. Emulator programs can bedesigned and run on future computer platforms. Theemulator is programmed to mimic the behavior of oldhardware platforms and operating system software, suchas for games and executable files [6,16]. However, thisstrategy does not involve preserving ageing hardwareand original operating system software.

The goal of emulation is to preserve the look and feelof the digital object as well as its functionality. Theessence of this strategy is to copy the technical contextof the resource allowing the original object or arefreshed copy of the original object to be used in thefuture. Rothenberg illustrates the conceptual view of therelationships among elements of the emulation processas shown in Fig. 3 [15]. This approach decouplesapplication programs from the platform via a virtualmachine with the result that applications can run ondifferent platforms by migrating the virtual machine tothose platforms. The Java virtual machine might be anexample.

Fig. 2. Existing preservation approaches.

Fig. 3. Rothenberg’s emulation-based preservation [15].

95

Page 4: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

Hendley sees emulation as a short to medium termstrategy or a specialist strategy where the need tomaintain the physical presence of the original digitalresource is of great importance to the users. Again, hesees technology emulation being primarily used in caseswhere digital resources cannot be converted intosoftware independent formats and migrated forward.

Waugh et al. [12] indicate that the applicationsoftware itself could contain viruses that would result inthe loss of information over time. Particularly, they notethat emulation requires preserving a significant amountof information. They consider it useful if the goal is topreserve the software as an artifact itself and if futureuser organizations lack sufficient knowledge to under-stand the format of the digital information. Granger [16]discusses its potential advantages such as the re-creationof the look and feel of the resource. They identify thepossible disadvantages as the undefined nature oftechnological change and the complexity of creatingemulator specifications. His conclusion is thatemulation is not a complete digital preservation solutionbut a partial one.

On the contrary, Russell [3] states that emulationpotentially offers the best solution for very long-termpreservation of digital material, especially for resourcesfor which the value is unknown and where future use ofthe material is unlikely. Woodyard [1] points out thatemulation could be a more suitable solution thanmigration for long-term access to complex digitalresources such as executable files. Gilheany [18]discusses the need for emulators to permanentlypreserve the functionality of computers. Rothenbergadvocates and recommends emulation as the bestpossible solution [10,18] and presents results from thefirst phase of an emulation experiment [15] in theNetworked European Deposit Library (NEDLIB)1

project [19].Holdsworth and Wheatley [20] consider the method

of Rothenberg, in which the production of an emulatorcan be postponed by instead producing an emulatorspecification at the time of platform obsolescence, as avery risky approach to preserving information forever.This is because there is no guarantee that a specificationproduced can be used to produce an emulator in thefuture. They present some illustrative guidelines for theuse of emulation and argue that emulation is a validmethod of digital preservation, at least for systems wherethe documentation is not available in electronic form.Holdsworth [21] also suggests the use of widelyavailable programming languages such as C in theimplementation of emulation.

1 http://www.konbib.nl/nedlib/

The emulation approach requires detailed specifica-tions for the outdated hardware and operating systemsoftware [10]. Therefore, standard formats for theemulation specification need to be developed. Theextent to which emulation should mimic the originaltechnical environment entirely or emulates only thosecomponents necessary to access the data remains anissue for debate.

2.3 Information Migration

The migration of digital information refers to theperiodic transfer of digital materials from one hardwareand software configuration to another, or from onegeneration of computer technology to a subsequentgeneration [5]. The purpose of migration is to preservethe integrity of digital objects and to retain the ability ofusers to retrieve, display, and use them in the face ofconstantly changing technology.

The Open Archiving Information System (OAIS)model [22], developed by the Consultative Committeefor Space Data Systems (CCSDS)2, breaks migrationinto four categories: refreshment, replication, repackag-ing, and transformation. Refreshment ensures that areliable copy of the bit stream of a digital object ismaintained while replication and repackaging ensurethat a manageable package of the object is available. Onthe other hand, transformation actually modifies the bitstream of a digital object and this is what is consideredas the process of migration in this paper.

Rather than focus on the technology, informationmigration tends to focus on the intellectual content andon ensuring its accessibility using current technology[3]. This strategy could be facilitated through copyingthe digital information to an analog medium or throughapplication programs that are backward compatible.It could also be facilitated through the conversion ofdigital resources into a small number of standardformats that are hardware and software independent [6].

The first solution involves the transfer of digitalresources from less stable to more stable analog mediasuch as paper or microfilm [23]. While analog mediahave better proven long-term reliability, they may notprovide adequate representations of the original object.This solution may also lead to severe loss in functional-ity and presentation of the original digital object. It isnot possible, for example, to microfilm the equationsembedded in a spreadsheet, to print out an interactivevideo, or to preserve a multimedia document as a flatfile [5,18,24].

2 http://www.ccsds.org/

96

Page 5: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

The second solution, software that is backwardcompatible, will simplify migration. The latest versionsof most popular word processing packages, for example,will be able to decode files created on earlier versions ofthe same package. While this strategy may work overthe short term for simple digital resources created onsome of the leading application packages, it cannot berelied upon over the medium to long-term or for morecomplex digital resources. Meanwhile, interoperabilityof systems will also facilitate migration because aspecific program is not needed to access a digitalresource. However, these features become harder toachieve with greater software complexity. Hence, anyinformation migration through backward compatibilityor interoperability between application softwarepackages would represent a short-term strategy [6].

Another approach entails initially migrating digitalinformation from the great multiplicity of formats to asmaller more manageable number of standard formats.These are less volatile than the wide array of non-standard formats and can still encode the complexityof structure and form of the original resource [5].Decisions on which formats to convert digital resourcesinto should be based on the structure of the digitalresources, on the objectives set by the collectionmanager, and on the requirements of the users of thatcollection [6]. For example, one important issue iswhether the top priority is given to preserving the abilityto process the digital resource or to preserving theformat or visual presentation of the digital resource.

A report [5] from the task force on the Commissionon Preservation and Access (CPA) and the ResearchLibraries Group (RLG) makes the point that there are avariety of migration strategies. Particularly, Wheatley[25] attempted to break migration into more specificcases in the Creative Archiving at Michigan and Leeds:Emulating the Old and the New (CAMiLEON)3 project:minimum migration, preservation migration, recreation,human conversion migration, and automatic conversionmigration. He applied each case to test materials anddiscussed its usefulness for digital preservation.

Recently, the concept of migration on requesthas been conceived by the Consortium of UniversityResearch Libraries Exemplars for Digital Archives(CEDARS)4 project [3], in which original objects aremaintained and preserved in addition to a migration toolthat runs on a current computing platform and can beemployed by users as necessary. When the currentplatform becomes obsolete, the migration tool will no

3 http://www.si.umich.edu/CAMILEON/4 http://www.leeds.ac.uk/cedars/

longer work. Therefore, the preservation problem in thiscase is obviously focused on the maintenance of themigration tool. Preserving the original bit stream and amigration tool is compatible with emulation strategy.

The task force report and Hendley consider thismigration strategy the most promising for the future.Bearman [26] also believes that it is the most promisingstrategy for preservation of electronic records and theonly one that has worked to date. Russell [3] pointsout that the migration strategy is the most practicalapproach at least for the short and medium term.

However, for resources where it is difficult to dis-entangle format from content such as complex multi-media documents, this is not an easy option. Multiplecomponents may require separate migration activitiesand this can be very complex. Indeed, for some multi-media resources, migration may not be possible withoutsignificant compromises in functionality. Russell statesthat the costs of migration may, in the long run, exceedthose costs necessary for preserving either the technol-ogy itself or the detailed technical specification that willallow future emulation.

Waugh et al. believe that the key to successfulmigration is the knowledge of the original data formatand a close match in functionality between the originaland replacement format. They consider migration thesimplest over the short and medium term for digitalinformation that is actively managed. On the other hand,Rothenberg dismisses the strategy of migratingelectronic records systematically before they becomeinaccessible [18].

Lawrence et al. [27] have attempted to quantify therisks involved in the use of migration and have analyzedseveral commercially available migration tools for theirrelative accuracy. This practical investigation has led tothe identification of some key requirements for migra-tion software: access to the source file specification,analysis of the differences between it and the targetformat, identification of the degree of risk in the case ofa mismatch, accurate conversion of the source file to atarget specification, and so on.

However, none of current migration methods or toolscomes close to meeting all of these requirements. It isimportant to realize that technical standards may changerapidly and that this strategy may not ensure that digitalinformation remains accessible. Recently, however,XML has become widely accepted as a universalstandard format in various fields of digital library,electronic commerce, and the Web. Standardizationbased on XML could be helpful in addressing the digitalpreservation problem.

97

Page 6: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

2.4 Encapsulation

Encapsulation aims to overcome the problems of thetechnological obsolescence of file formats by makingthe details of how to interpret the digital object part ofthe encapsulated information. This strategy involvescreating the original application that was used to createor access the digital object on future computer plat-forms. Part of the process of encapsulation may be tomigrate the record to a more easily documented format.

The concept of encapsulation is similar to the Bento5

[28] container, which was developed to increasecompatibility of data between computer applications.Bento is a specification for storage and interchange ofcompound content, and is designed to be platform andcontent neutral. Thus, it provides a convenient containerfor transporting any type of compound content betweenmultiple platforms.

Encapsulation can be achieved by using physical orlogical structures called containers or wrappers toprovide a relationship between all information compo-nents such as the digital object and some supportinginformation including metadata [29,30]. The referencemodel for the OAIS also describes the types of support-ing information that should be included in an encapsula-tion. They include the representation information usedto interpret the bits appropriately, the provenance todescribe the source of the object, the context to describehow the object relates to other information outside thecontainer, a reference to one or more identifiers touniquely identify the object, and fixity to provideevidence that the object has not been altered.

The Universal Preservation Format (UPF) [31,32,33]is a method being developed for digital preservation,based on the theory of encapsulation. It is a self-describ-ing storage technology, which uses a wrapper to hold thedigital object and the metadata together to protectagainst technological obsolescence. The Digital RosettaStone [34] is a method for storing the representationinformation needed to interpret the digital content ofan object separate from the encapsulation to avoidduplication of effort and inefficient use of storage space.

Encapsulation has been widely promoted byRothenberg who is a strong advocate of encapsulationand emulation methods for digital preservation. Day [30]and Shepard [35] have also supported the encapsulation

5 Certain commercial equipment, instruments, or materials are identi-fied in this paper to foster understanding. Such identification does notimply recommendation or endorsement by the National Institute ofStandards and Technology, nor does it imply that the materials orequipment identified are necessarily the best available for the purpose.

approach. However, Bearman [26] disputes the theoryof Rothenberg stating that it is not clear as to howmetadata encapsulation strategies may be practicallyimplemented.

Waugh et al. argue that encapsulation is the best basisfor long-term preservation. They introduce threechallenges of encapsulation: the requirements forapplications to generate encapsulated records, thepotential storage overhead of including documentationabout the format within each record, and informationabout unpublished data formats. However, even if formatspecifications are publicly available, they are oftenincomplete and substantial components of file specifica-tions often consist of nonstandard elements.

Encapsulation can be considered to be a type ofmigration technique. Although documentation maydelay the need for migration for a long time, the encap-sulated information will eventually need to be migrated.Therefore, encapsulation techniques can be applied tothe digital resources whose format is well known andthat are unlikely to be accessed actively.

2.5 The Digital Tablet

Kranch [36,37] proposes developing a digital tabletfor preservation. The digital tablet technique does notprecisely fit into the above categories. The tablet wouldhave a self-contained power source, present the storeddigital information on a screen as glyphs from a writtenlanguage appropriate for the information, and havetouch-sensitive controls to change the presentation andmanipulate the information. Additionally, it should beable to withstand millennia of neglect under harshconditions yet cost no more than a few dollars to pro-duce and encode. It should have a storage capacity ofdozens or hundreds of terabytes, and contain a simpleserial read-only port to download the original digitalinformation into an external system along with instruc-tions on how to do it. However, the digital tablet may beconsidered to be another technology preservationmethod.

3. Projects and Case Studies

This section introduces representative projects fordigital preservation, describes their goals and results,and presents their specific preservation strategies.Recently, Cloonan and Sanett [38] presented a report onthe initial phase of projects involved in developing,evaluating, and/or implementing digital preservationstrategies in Europe, Australia, and the United States.

98

Page 7: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

3.1 Australian Projects

Australia has been examining digital preservationissues since 1994. Several projects are aiming to pre-serve various digital materials such as electronicrecords, online publications, digital audio resources,theses, and cartographic materials [1,12,39]. TheVictorian Electronic Records Strategy (VERS)6 project[12] produced a standard for management and preserva-tion of electronic records [40,41]. The standard pro-posed by the VERS project recommends encapsulatingthe documents and their context in a single object basedon XML.

The Preserving and Accessing Networked Documen-tary Resources of Australia (PANDORA)7 project bythe National Library of Australia has led the way inarchiving the Web [42,43]. The primary objective ofPANDORA is to capture, archive, and provide long-termaccess to significant online publications. The projectaims at addressing both archiving and preservationprocesses. The PANDORA archiving processes refer tothe collection and provision of immediate access to thepublications while preservation processes involvemanaging the materials and applying appropriate strate-gies (e.g., migration) to ensure long-term access. Thearchiving processes have been developed whiletechniques for managing long-term access to thesedigital resources is still being developed.

The project has also embarked on migration experi-ments with some HTML pages. The format of thesepages is not yet obsolete, but the HTML specification[44] has declared a number of mark-up tags as dead andnot to be supported in this or future versions. The aimof these trials is to make changes in the HTML sourcecode to remove tags declared as dead and replace themwith current tags, effectively migrating the source codeto a different version to reduce problems of futurecompatibility with Web browsers. Additionally, theNational Archives of Australia is undertaking a projectto develop advice for commonwealth agencies on usingmigration as a preservation strategy for electronicrecords [45].

3.2 CEDARS

The Consortium of University Research Libraries,which represents both university and national librariesacross the UK and Ireland, leads the CEDARS project.

6 http://www.prov.vic.gov.au/vers/7 http://www.nla.gov.au/policy/plan/pandora.html

The project is based on the OAIS model describedearlier. Essentially it is an archival model for an archiv-ing system but does not explicitly include a preservationmodule. Current projects investigating the use of themodel include NEDLIB in Europe, CEDARS in the UK,and PANDORA in Australia.

The CEDARS project aims to address the strategic,methodological, and practical issues of digital preserva-tion as well as providing guidance for libraries in bestpractices. Specifically, the project is producing guide-lines for developing digital collection managementpolicies and preserving different classes of digitalresources. Additionally, it is performing an analysis ofthe cost implications of digital preservation. The projectis running pilot projects to test and promote the chosenstrategy for digital preservation. The CEDARS DataPreservation Strategies Working Group is looking atpreservation issues that are related with migration,emulation, and data refreshing.

Specifically, the project has done some work compar-ing migration and emulation across older digital materi-als in conjunction with the CAMiLEON project [20,25].They suggest that both migration and emulation strate-gies are viable for different types of digital materials.They also believe that no strategy is a panacea, and thestrategy adopted for providing access to preservedresources will very much depend on the nature of theresource itself and the reason for its preservation. Morework is planned to investigate the information lossassociated with each strategy.

3.3 CAMiLEON

The CAMiLEON project is funded by the JointInformation Systems Committee (JISC) in the UK andthe National Science Foundation (NSF) in the USA. Theproject looks at the issues surrounding the implementa-tion of technology emulation as a digital preservationstrategy. The project recognizes the potential of emula-tion for the retention of the functionality and the lookand feel of digital objects [20]. It aims to develop tools,guidelines, and costs for emulation as compared to otherdigital preservation options.

The project is performing user testing of variousdigital resources both in their original environments andin emulated environments. The project presents guide-lines for use of emulation and argues that emulation is avalid method for both complex digital resources thatinclude executable files and resources for which thedocumentation is not available in electronic form.

99

Page 8: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

3.4 NEDLIB

The NEDLIB project was initiated by a permanentstanding committee of the Conference of EuropeanNational Libraries (CENL) in 1998, with funding fromthe European Commission’s Telematics ApplicationProgramme. Eight national libraries in Europe, onenational archive, and major publishers are participatingin the project. The National Library of the Netherlandsleads the project.

The project aims to develop a common architecturalframework and basic tools for building deposit systemsfor electronic publications. The project has also adoptedthe OAIS migration model described earlier. The mainobjective of the NEDLIB project is to provide betterinsight into the merit and weakness of different long-term preservation strategies. The project defines thecharacteristics of electronic publications and othercategories of digital deposit material and associatedpreservation and authenticity requirements. It is recog-nized that many aspects including cost-effectiveness,legal restrictions, agreements with publishers, and useraccess requirements ultimately need to be taken intoaccount when policy choices for preservation strategiesare set. However, the project focuses on the technicalissues of preservation.

The project has taken a first step to test the technical-ities of the preservation mechanisms by starting anemulation experiment. The fundamental idea of thework is to test whether emulating obsolete computerhardware on future systems could be used to ensurelong-term access to digital publications. Rothenberg hasperformed the first phase of this experimental work[15]. It involved developing a prototype experimentalenvironment for trying out emulation-based preserva-tion and using commercial emulation tools to providean initial proof-of-concept. The experimental resultsindicate that emulation should work in principle, assum-ing that suitable emulators for obsolete computingplatforms can be hosted on future platforms.

3.5 Kulturarw3 Heritage

The Kulturarw3 Heritage Project8 of the RoyalLibrary in Sweden is testing methods of collecting,archiving, and providing access to Swedish electronicdocuments. Web crawlers or robots are used in order tocollect all the Swedish Web pages automatically.Although the project currently does not focus on preser-vation, it is growing into a broader Nordic initiative thatmay explore the long-term preservation of this archive.

8 http://kulturarw3.kb.se/html/projectdescription.html

3.6 Library of Congress

Recently, the Computer Science and Telecommuni-cations Board (CSTB) of the National Academies[46,47] convened the Committee on the InformationTechnology Strategy for the Library of Congress foradvice on digital preservation. The committee reportincludes specific recommendations for enhancingtechnology infrastructure, particularly in the area ofnetworks, databases, and information technologysecurity.

The Library of Congress’ pilot project, working withthe Internet Archive9, has worked through all aspects ofarchiving the Web in the area of political Web sites. Thisproject uses the Digital Library SunSITE Collection andPreservation Policy10 from the University of California,Berkeley, which provides several digital collectinglevels, as guidance.

3.7 NARA: Persistent Archives and ElectronicRecords Management

The NARA (National Archives and Records Admin-istration) project,11 which is led by the San Diego Super-computer Center and funded by NARA, aims to developa persistent archive to support ingestion, archivalstorage, information discovery, and preservation ofdigital collections. One of its premises is the importanceof preserving the organization of digital collectionssimultaneously with the digital objects that comprise thecollection [48].

The ultimate goal is to preserve not only the originaldata, but also the context that permits the data to beinterpreted. The project proposes an approach formaintaining digital data for hundreds of years throughdevelopment of an environment that supports migrationof collections onto new software systems. The proposedinfrastructure combines elements from supercomputercenters, digital libraries, and distributed computingenvironments. The project emphasizes the synergy thatis achieved through the identification of the uniquecapabilities provided by each environment, and theconstruction of interoperability mechanisms forintegrating these environments. According to thisproject, collection-based persistent archives are nowfeasible and can manage massive amounts of informa-tion based on XML [49].

9 http://www.archive.org/10 http://sunsite.berkeley.edu/11 http://www.sdsc.edu/NARA/12 http://www.interPARES.org/

100

Page 9: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

3.8 InterPARES

The International Research on Permanent AuthenticRecords Electronic Systems (InterPARES)12 project [38]is a multinational research initiative, in which archivalscholars, computer engineering scholars, nationalarchival institutions, and private industry representativesare collaborating to develop the theoretical knowledgeand methodology required for the permanent preserva-tion of authentic records created using electronicsystems [50].

The research areas include identifying the electronicrecord elements that need to be maintained, developingcriteria to appraise electronic records for preservation,and formulating principles for development of inter-national, national, and organizational preservationstrategies. Specifically, the research areas are dividedinto four complementary domains: authenticity,appraisal, preservation, and strategies. In terms ofauthenticity, the purpose is to identify the specificelements of electronic records that must be preserved,over time and across technologies, in order to verify therecord’s authenticity. As a first step, a template to guidethe analysis of electronic records was developed andis being evaluated. The project is based in the Schoolof Library, Archival and Information Studies at theUniversity of British Columbia in Canada. The currentphase of the project would end on December 31, 2001.

3.9 PRISM

The Preservation, Reliability, Interoperability,Security, Metadata (PRISM)13 project [29] of theCornell University is a 4 year project funded by theDigital Library Initiative to investigate and developpolicies and mechanisms needed for informationintegrity in digital libraries. The project focuses onlong-term survivability of digital information, reliabilityof information resources and services, interoperability,security, and metadata.

The current direction of the project is toward develop-ing techniques for monitoring the integrity of dis-tributed Web-based information resources and enforcingpreservation policies set by the owners and users ofcollections. Monitoring resources will involve both theautomated capture of information using a specializedWeb crawler and the manual gathering of data on theorganizational status of particular resources and collec-tions. The ultimate objective is to develop a cost-effec-tive and event-based metadata scheme that will enable

13 http://www.prism.cornell.edu/

users to define preservation policies and enforce themautomatically.

3.10 Canadian Projects

E-preservation14 was developed through a cooperativeeffort between the National Library of Canada and theCanadian Initiative on Digital Libraries (CIDL).E-preservation is intended 1) to provide Canadians witheasy access to policies and 2) to perform research on thecreation, use, and preservation of digital collections. Theproject includes guidelines about various aspectsincluding acquiring digital materials, formats, andmetadata.

3.11 Preservation Projects at the National Instituteof Standards and Technology (NIST)

The earliest work on data preservation at NIST can betraced back to the 1980s when Podio performedresearch on the lifetime measurement of compact discs[51]. His work provided a basis for a standard methodol-ogy for the lifetime measurement of optical discs. Withthe increasing usage of digital storage in libraries andthe archiving of the government agencies, the greatimportance of digital preservation became clear toNIST’s Information Technology Laboratory,15 andaccordingly new projects on the study of digital datapreservation have started in the following aspects:

• Longevity testing. This project initially consisted ofan examination of the effects of heat and humidityon the lifetime of optical discs and was laterextended to include the effects of light exposure.The focus is not only on the lifetime itself, but alsoon the deterioration process. The results may beuseful both for new disc production and for theclassification of existing recorded discs.

• Testing of interchangeability and interoperability ofoptical discs for use in high-density storage systemssuch as optical disc “Jukeboxes.” Combined with theapplication and further development of XML, anew preservation strategy may be developed. Thisprogram is being conducted in collaboration with theHigh Density Storage Association (HDSA).16 Anopen testing laboratory is being developed that willinclude interoperability and interchangeabilitytesting as well as testing of the suitability of varioustypes of high capacity storage systems for differentapplications including preservation.

14 http://www.nlc-bnc.ca/cidl/preserv-conserv/index.htm15 http://www.itl.nist.gov/16 http://www.hidensity.org/

101

Page 10: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

• Development of the Turbo coding system [52].Unlike the traditional digital preservation techniquesthat aim to keep the information readable in the longterm, this new technique aims to develop a methodfor finding and recovering useful information fromfailed discs. The failure of many commercial errorcorrection codes to support retrieval of importantinformation from aging or damaged digital storagedevices has indirectly resulted in the reduction ofthe rated life expectancy and archival propertiesof digital storage devices. Turbo code has shownpromise in retrieving information previouslyinaccessible. Turbo codes are two parallel, recursive,and systematic convolution codes. These codesare used for the channel coding and decoding inorder to detect and correct the errors that mayoccur in the transmission of digital data throughdifferent channels. The iterative method of thedecoding scheme helps to achieve the theoreticallimit (near Shannon-limit [53]) in error correctionperformance. This program is being conducted incooperation with Carnegie-Mellon University.

4. Discussion and Summary

In this paper, three main preservation techniques havebeen discussed in detail. Each of them has advantagesand disadvantages as shown in Table 1. Different strate-gies are viable for different types of digital materials. Incases of complex resources and application softwaressuch as games and executable files, emulation is a suit-able approach. In particular, emulation is the mostfeasible choice where there is a lack of sufficientknowledge regarding the format of the digital informa-tion and where the look and feel of digital informationis important.

On the other hand, migration or encapsulation isappropriate for digital resources where knowledge ofthe format is sufficient and where the resource isrelatively simple. Specifically, migration is appropriatefor resources that are actively accessed and managed.Encapsulation is suitable for resources that are unlikelyto be actively accessed. Fig. 4 shows a schematicdiagram to select the suitable preservation techniquesfor digital resources according to the type and complex-ity of digital information, the availability of the dataformat, and its usage.

Table 1. Advantages and disadvantages of preservation techniques

Technique Advantages Disadvantages Domain

Emulation Maintains the look The complexity of creating Application software.and feel. emulator specifications.

Complex digital resources such asThe large amount of those that contain executable files.information that must bepreserved. Resources for which there is a lack

of sufficient knowledge.Archaic software requiredto access information. Resources for which the value is unknown

and for which future use is unlikely.Resources whose look and feel are important.

Migration Does not need to retain Significant cost for long-term Resources that are actively accessed andoriginal applications. preservation. managed, such as scientific data or database.

Supports active access Information degradation. Resources whose formats are sufficientlyand management. well known.

Lack of preservation metadata.

Need for continued diligenceon the part of archivists.

Encapsulation Maintains preservation Knowledge about the format Resources that are unlikely to be accessedinformation. must be preserved. and managed actively.

Systems required for capturing Resources whose formats are sufficientlythe digital information. well-known.

102

Page 11: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

As mentioned before, the objective of long-termdigital preservation is to ensure continuing access tostored digital information. Future users will be able toaccess digital information, which will be preserved bydigital libraries, through their own computing environ-ments or portable reading devices. This means that thedigital information may have to be migrated for long-term digital preservation even if some emulationtechniques have already been applied. Therefore, long-term preservation may involve several preservationtechniques. In other words, for successful preservation,combined application of different strategies should betaken into account. These techniques are not mutuallyexclusive, that is, one approach may include someaspects of other approaches.

The preservation of physical artifacts generallyoccurs through developing and maintaining collectionsin a decentralized manner. By contrast, the preservationof digital resources does not lend itself well to such aprocess and no single institution can preserve all digitalinformation. Fortunately, the Internet enables digitallibraries worldwide to communicate and share informa-tion online. For this to happen successfully, the inter-operability of archives and digital library systems andthe interchangeability of preserved digital informationare necessary.

Standard formats for preservation of digital informa-tion should be developed. Standards can provide opti-mum interoperability and interchangeability, well-constructed tool sets for developers, and solutions forcomplex systems. Furthermore, standard solutions arelikely to be available for use for far longer than non-standard solutions. Migration from standard formats ismuch easier, cheaper, and more accurate that migrationfrom non-standard formats.

Standards are required for both emulation and encap-sulation. Conventional standards were based on variouskinds of encoding schemes according to the types ofdigital resources, with the result that quite a few soft-ware packages support these standards. Meanwhile,XML is being adopted as a standard in various fieldsdue to the recognition of its advantages, including itsindependence of hardware and software, and due to itswide dissemination. Although XML is a relatively newtechnology for the Web, SGML (Standard GeneralizedMarkup Language) [54], which is a superset of XML, isa proven technology that has been tested and evaluatedsince 1986, when it was accepted as an internationalstandard. Furthermore, standards based on XML willfacilitate support for the Internet and Web. For example,a standard [55] for digital representation of paper-basedbooks is based on XML.

Since the use of XML is expected to grow wider inthe areas of interoperability and interchangeability,preservation standards based on XML are needed. Asingle standard format is desirable when implementing apreservation system.

The most successful preservation strategies willcontain elements of migration based on standardization.While techniques for the migration of digital resourcescomprised of simple data appear to be widely acceptedand followed, the preservation community is onlybeginning to address migration of more complex digitalobjects. Additional research on migration is needed totest the technical feasibility of various approachesto migration, determine the costs associated withthese approaches, and establish benchmarks and bestpractices.

Furthermore, a test bed is needed in which specificmigration techniques can be prototyped. This test bedshould be used to quantitatively test the performance ofmigration tools that are currently available on the marketand new tools being developed. Results from this testbed shall give good guidelines for choosing appropriatetools to migrate digital resources that are in differentformats. For this purpose, systematic categorization ofdigital resources must be developed.

Fig. 4. A schematic diagram for selection of preservation techniquesof digital information.

103

Page 12: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

Other critical technical issues must be resolved forpreserving digital information. They include require-ments relating to metadata, authenticity and integrity,cost modeling, content and structure, format and styles,storage media, and workflow process. Brief descriptionsof some critical issues are as follows:

Metadata

The metadata for digital preservation is critical butcomplex. Metadata types include descriptive metadatafor resource discovery, administrative metadata for thepreservation process itself, technical metadata, andrights metadata to describe copyright information [3].Metadata must enable access to the intellectual contentof the object (whether by migration or emulation), findthe object, manage the object, and allow other versionsof the object to be produced. There have been severalspecifications for preservation metadata by CEDARS[56], the National Archives and Library of Australia[57,58], and NEDLIB [59]. Recently, collaborative ef-forts between RLG and the Online Computer LibraryCenter (OCLC) have produced a report [60] identifyingcommon goals and approaches to these digital preserva-tion metadata. The metadata should cover the require-ments of various types of organizations and be stan-dardized for interoperability. Efforts to develop toolsfor automatically creating metadata are needed.

Interoperability and Interchangeability

As mentioned before, the interoperability of archivesand digital libraries as well as the interchangeability ofdigital information are important issues, particularly ina distributed environment such as the Internet. Standardformats and procedures should therefore be developed.

Authenticity and Integrity

Authentication allows the user to be certain of theoriginality of the digital resource when it is needed.Although there are several techniques used for authenti-cation including cryptography, hashing, and timestamping [15], there are multiple meanings and implica-tions of authenticity and integrity. Therefore, creating acommon understanding of authenticity and integrity iscritical in the digital environment [61]. Dollar [62]identifies maintaining the authenticity of a digital objectas a key aim in the migration process.

Cost Model

Decisions about preserving information should con-sider the costs [6]. However, there are no proven tech-niques for estimating the costs of long-term preserva-

tion of digital information. Recently, Russell andWeinberger [63] postulate that the ongoing costs ofdigital preservation span a more extended timeframethan traditional preservation and will therefore requireresource commitments of a different nature. Differentstrategies may necessitate different costing timeframesand schedules. They state that current cost models haveyet to reflect this more complex environment.

5. Appendix A. Glossary

The following is a list of important words that areused in this paper. Brief definitions of these words aresummarized here for readers’ convenience. For morespecific definitions refer to the URLs.

Emulation

The process of setting up a system to perform in thesame way as another system of a different type in orderto run its programs.<http://www.nla.gov.au/download/dsp/appendices.pdf>

Encapsulation

A technique of grouping together a digital object andanything else necessary to provide access to that object.<http://www.nla.gov.au/padi/topics/20.html>

Migration

The periodic transfer of digital materials from onehardware/software configuration to another or from onegeneration of computer technology to a subsequent gen-eration. The purpose of migration is to preserve theintegrity of digital objects and to retain the ability forclients to retrieve, display, and otherwise use them in theface of constantly changing technology.<http://info.wgbh.org/upf/glossary.html>

Open Archiving Information System (OAIS)

An archive, consisting of an organization of peopleand systems, that has accepted responsibility to preserveinformation and make it available for one or moredesignated communities.<http://www.kb.nl/coop/nedlib/glossary.pdf>

Refreshing

To copy digital information from one long-termstorage medium to another of the same type, with nochange whatsoever in the bit�stream (e.g. from adecaying 800bpi tape to a new 800bpi tape, or from anolder 5 1/4” floppy to a new 5 1/4” floppy).<http://info.wgbh.org/upf/glossary.html>

104

Page 13: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

XML (eXtensible Markup Language)

Subset of SGML designed to be transmissible overthe Internet in such a way that document browsers do notneed to access the document type definition to validatethe document before display. As well as requiring allelements to be “well-formed,” e.g., have both start andend tags present, the specification provides XMLspecific attributes and processing instructions that canbe used to control the way documents are presented tousers.<http://www.w3.org/TR/PR-xml.html>

Acknowledgments

The authors would like to thank Mr. John Roberts,Dr. Albert Paul, and Dr. Theodore Vorburger for theirhelpful and constructive comments and suggestions.

6. References

[1] D. Woodyard, Digital Preservation: The Australian Experience,Proc. Third Conf. Digital Library: Positioning the Fountainof Knowledge, Malaysia (2000), http://www.nla.gov.au/nla/staffpaper/dw001004.html.

[2] D. M. Levy, Heroic Measure: Reflections on the Possibility andPurpose of Digital Preservation, Proc. Conf. ACM DigitalLibraries, Pittsburgh (1998) pp. 152-161.

[3] K. Russell, Digital Preservation and the CEDARS ProjectExperience, Proc. Int’l Conf. Preservation and Long TermAccessibility of Digital Materials, York, England (2000)pp. 139-154.

[4] S.-S. Chen, The Paradox of Digital Preservation, IEEEComputer, 34 (3), 24-28 (2001).

[5] J. Garrett and D. Waters, Preserving Digital Information, Reportof the Task Force on Archiving of Digital Information, TheCommission on Preservation and Access and The ResearchLibraries Group, Washington DC and Mountain View CA, May1996, 64 pp.

[6] T. Hendley, Comparison of Methods & Costs of DigitalPreservation, British Library Research and Innovation Report106, British Library Research and Innovation Center, WestYorkshire (1998) 121 pp.

[7] M. Fresko and K. Tombs, Digital Preservation Guidelines,European Commission DG XIII/E, Luxembourg (1998) 69 pp.

[8] D. Woodyard, Farewell My Floppy: A Strategy for Migration ofDigital Information, Proc. Conf. VALA, Melbourne (1998),http://www.nla.gov.au/nla/staffpaper/valadw.html.

[9] W3C Recommendation, Extensible Markup Language (XML)1.0 (Second Edition), World Wide Web Consortium (2000)http://www.w3c.org/TR/REC-xml.

[10] J. Rothenberg, Ensuring the Longevity of Digital Documents,Sci. Amer. 272 (1), 42-47 (1995).

[11] E. H. Zwaneveld, Standards and New Technology Strategies toPreserve Content on Magnetic and Disc Media, Point of View,SMPTE J.109, 628-635 (2000).

[12] A. Waugh et al., Preserving Digital Information Forever, Proc.Conf. ACM Digital Libraries, San Antonio (2000) pp. 175-184.

[13] D. Bearman, Collecting Software: A New Challenge forArchives & Museums, Technical Reports 1 (2), ArchivalInformatics (1987) 80 pp.

[14] P. S. Graham, Intellectual Preservation in the Electronic Environ-ment, After the Electronic Revolution, Will You Be the First toGo?, Arnold Hirshon, ed., American Library Association,Chicago (1993).

[15] J. Rothenberg, An Experiment in Using Emulation to PreserveDigital Publications, Koninklijke Bibliotheek, Hague, Nether-lands (2000) 85 pp.

[16] S. Granger, Emulation as a Digital Preservation Strategy, D-LibMag. 6 (10), (2000), http://www.dlib.org/dlib/october00/granger/10granger.html.

[17] S. Gilheany, Preserving Information Forever and a Call forEmulators, Proc. Digital Libraries Conference and Exhibition:The Digital Era: Implications, Challenges and Issues, Singapore(1998), http://www.archivebuilders.com/aba010.html.

[18] J. Rothenberg, Avoiding Technological Quicksand: Finding aViable Technical Foundation for Digital Preservation (A Reportto the Council on Library and Information Resources), Councilon Library and Information Resources, Washington, DC (1999),http://www.clir.org/pubs/reports/rothenberg/contents.html.

[19] T. Werf-Davelaar, Long-term Preservation of Electronic Publica-tions, D-Lib Mag. 5 (9), (1999), http://www.dlib.org/dlib/september99/vanderwerf/09vanderwerf.html.

[20] D. Holdsworth and P. Wheatley, Emulation, Preservation andAbstraction (2000), http://www.leeds.ac.uk/CAMiLEON/dh/ep5.html.

[21] D. Holdsworth, Emulation: C-ing Ahead (2000), http://129.11.152.25/CAMiLEON/dh/cingahd.html.

[22] Reference Model for an Open Archival Information System(OAIS), CCSDS 650.0-R-1, Consultative Committee for SpaceData Systems, May 1999, 140 pp, http://www.ccsds.org/documents/pdf/CCSDS- 650.0-R-1.pdf.

[23] F. L. Eaton, The National Archives and Electronic Records forPreservation, Preservation of Electronic Formats & Formats forElectronic Preservation, Janice Mohlhenrich, ed., Ft. Atkinson,Highsmith (1993).

[24] D. Bearman, The Implications of Armstrong vs. Executive Officeof the President for the Archival Management of ElectronicRecords, Amer. Archivist 56, 150-160 (1993).

[25] P. Wheatley, Migration—A CAMiLEON Discussion Paper(2000), http://www.personal.leeds.ac.uk/~issprw/camileon/migration.htm.

[26] D. Bearman, Reality and Chimeras in the Preservation ofElectronic Records, D-Lib Mag. 5 (4), (1999), http://www.dlib.org/dlib/april99/bearman/04bearman.html.

[27] G. W. Lawrence et al., Risk Management of Digital Information:A File Format Investigation, The Council on Library andInformation Resources, Washington DC, June 2000.

[28] J. Harris and I. Ruben, Bento Specification, AppleComputer Inc, July 1993, http://info.wgbh.org/upf/pdfs/BentoSpec1_0d5.pdf.

[29] O. Y. Rieger, Project PRISM: Preservation Metadata Research,Proc. Conf. Information Infrastructures for Digital Preservation,York, England (2000), http://www.rlg.org/events/pres-2000/infopapers.html.

[30] M. Day, Issues and Approaches to Preservation Metadata, Proc.RLG/NPO Conf. Guidelines for Digital Imaging, University ofWarwick (1998), http://www.rlg.org/preserv/joint/day.html.

[31] T. Shepard and D. MacCarn, The Universal Preservation Format:Background and Fundamentals, Proc. Sixth Delos Workshop,Tomar, Portugal (1998), http://www.ercim.org/publication/ws-proceedings/DELOS6/upf.pdf.

[32] D. MacCarn, Toward a Universal Data Format for the Preserva-tion of Media, SMPTE J. 106, 477 (1997).

105

Page 14: [J. Res. Natl. Inst. Stand. Technol. 107 The State of …Volume 107, Number 1, January–February 2002 Journal of Research of the National Institute of Standards and Technology [J

Volume 107, Number 1, January–February 2002Journal of Research of the National Institute of Standards and Technology

[33] T. Shepard and D. MacCarn, The Universal Preservation Format:A Recommended Practice for Archiving Media and ElectronicRecords, WGBH Educational Foundation, Boston (1999).

[34] A. R. Heminger and S. B. Robertson, Digital Rosetta Stone: AConceptual Model for Maintaining Long- term Access to DigitalDocuments, Proc. Sixth Delos Workshop, Tomar, Portugal(1998), http://www.ercim.org/publication/ws-proceedings/DELOS6/rosetta.pdf.

[35] T. Shepard, Universal Preservation Format (UPF): ConceptualFramework, RLG DigiNews 2 (6), (1998), http://www.rlg.org/preserv/diginews/diginews2-6.html.

[36] D. A. Kranch, Preserving Electronic Documents, Proc. Conf.ACM Digital Libraries, Pittsburgh (1998) pp. 295-296.

[37] D. A. Kranch, Beyond Migration: Preserving ElectronicDocuments with Digital Tablets, Inform. Technol. Libraries 17(3), 138-148 (1998).

[38] M. V. Cloonan and S. Sanett, Comparing Preservation Strategiesand Practices for Electronic Records, Proc. Int’l Conf. Preserva-tion and Long Term Accessibility of Digital Materials,York, England (2000), http://www.rlg.org/events/pres-2000/cloonan.html.

[39] A. R. Kenney and O. Y. Rieger, The National Library ofAustralia’s Digital Preservation Agenda: An Interview withColin Webb, RLG DigiNews 5 (1), (2001), http://www.rlg.org/preserv/diginews/diginews5-1.html.

[40] PROS 99/007 Standard for the Management of ElectronicRecords, Public Record Office Victoria (1999), http://www.prov.vic.gov.au/gservice/standard/pros9907.htm.

[41] Victorian Electronic Records Strategy Final Report, PublicRecords Office Victoria (1999), http://www.prov.vic.gov.au/vers/final.htm.

[42] M. E. Phillips, Ensuring Long-term Access to Online Publica-tions, J. Electron. Publ. 4 (4), (1999), http://www.press.umich.edu/jep/04-04/phillips.html.

[43] C. Webb, Towards a Preserved National Collection of SelectedAustralian Digital Publications, Proc. Int’l Conf. Preservationand Long Term Accessibility of Digital Materials, York, England(2000), http://www.rlg.org/events/pres-2000/webb.html.

[44] W3C Recommendation, Hypertext Markup Language (HTML)4.01, World Wide Web Consortium (1999), http://www.w3.org/TR/html4/.

[45] Electronic Records: Preservation and Migration of ElectronicRecords, National Archives of Australia (2000), http://www.naa.gov.au/recordkeeping/er/summary.html�preservation.

[46] A. S. Inouye, A Digital Strategy for the Library of Congress,Commun. ACM 44 (5), 43 (2001).

[47] Committee on an Information Technology Strategy for theLibrary of Congress, LC21: A Digital Strategy for the Library ofCongress, National Academy Press, Washington, DC (2001).

[48] R. Moore et al., Collection-Based Persistent Digital Archives:Part 1, D-Lib Mag. 6 (3), (2000), http://www.dlib.org/dlib/march00/moore/03moore-pt1.html.

[49] R. Moore et al., Collection-Based Persistent Digital Archives:Part 2, D-Lib Mag. 6 (4), (2000), http://www.dlib.org/dlib/april00/moore/04moore-pt2.html.

[50] L. Duranti, The Impact of Technological Change on ArchivalTheory, Proc. Int’l Congress of Archives, Seville (2000), http://www.interpares.org/documents/ld_sevilla_2000.pdf.

[51] F. Podio, Development of a Testing Methodology to PredictOptical Disk Life Expectancy Values, NIST Special Publication500-200, NIST, Gaithersburg, MD (1991).

[52] V. K. Bhagavatula, Turbo Coding for Data Storage, CarnegieMellon University (2001), http://amp.ece.cmu.edu/ECESeminar/slides/kumar/kumar.pdf.

[53] C. E. Shannon, A Mathematical Theory of Communication, BellSys. Tech. J. 27, 379-423 (1948); ibid. 27, 623-656 (1948).

[54] ISO/IEC 8879, Information Processing—Text and OfficeSystems—Standard Generalized Markup Language (SGML),International Organization for Standardization (1986).

[55] OEB Publication Structure 1.0, Open eBook Forum, 1999, http://www.openebook.org/OEB1.html.

[56] The CEDARS Project Team and UKOLN, Metadata for DigitalPreservation: The CEDARS Project Outline Specification,March 2000, http://www.leeds.ac.uk/cedars/MD-STR~5.pdf.

[57] National Archives of Australia, Recordkeeping MetadataStandard for Commonwealth Agencies Version 1.0, May 1999,http://www.naa.gov.au/govserv/TECHPUB/rkms/intro.htm.

[58] National Library of Australia, Preservation Metadata for DigitalCollections: Exposure Draft, October 1999, http://www.nla.gov.au/preserve/pmeta.html.

[59] C. Lupovici and J. Masanes, Metadata for Long Term Preserva-tion, July 2000, http://www.kb.nl/coop/nedlib/results/preserva-tionmetadata.pdf.

[60] OCLC/RLG Working Group on Preservation Metadata, Preser-vation Metadata for Digital Objects: A Review of the State of theArt, January 2001, http://www.oclc.org/difitalpreservation/presmeta_wp.pdf.

[61] C. T. Cullen et al., Authenticity in a Digital Environement, TheCouncil on Library and Information Resources, Washington,DC, May 2000, 84 pp.

[62] C. Dollar, Authentic Electronic Records: Strategies for Long-Term Access, Cohasset Associates, Chicago (1998).

[63] K. Russell and E. Weinberger, Cost Elements of DigitalPreservation (2000), http://www.leeds.ac.uk/cedars/documents/CIW01r.html.

About the authors: Kyong-Ho Lee, Oliver Slattery,and Richang Lu are guest researchers in theConvergent Information Systems Division of the NISTInformation Technology Laboratory. Xiao Tang isthe Group Leader of the Information Storage andIntegrated Systems Group within the ConvergentInformation Systems Division. Victor McCrary is theChief of the Convergent Information Systems Division.The National Institute of Standards and Technologyis an agency of the Technology Administration, U.S.Department of Commerce.

106