Long-Term Preservation

Technical Approaches to Long-Term Preservation

• The challenge is to interpret formats
• A similar development: sound carriers
• From phonograph to MP3

• Those that do not keep up with this development will soon lack support:
– new audio documents are only produced in current formats
– spare parts for out-of-date equipment are hard to come by

• Technical approaches to long-term preservation of digital documents fall into two categories:
– those that aim to preserve the original state of documents, along with systems that are suitable for rendering the documents in their original format
– those that aim to continually transform digital documents into the formats of state-of-the-art rendition systems and, at the same time, to retain their original “look and feel”

Migration

• advantages:
– well known
– documents available all the time
– possibly improved quality
• disadvantages:
– reduced authenticity
– hard to automate
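To illustrate how migration works and why it can reduce authenticity, here is a minimal Python sketch. The format names (`fmt_v1`, `fmt_v2`, `fmt_v3`), the `CONVERTERS` table and the `migrate` function are all hypothetical and chosen only for illustration. The first converter is deliberately lossy: it discards letter case, which cannot be recovered later.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    content: str
    fmt: str
    history: list = field(default_factory=list)  # formats already left behind


# Hypothetical converters: each maps one format to its successor.
CONVERTERS = {
    "fmt_v1": ("fmt_v2", lambda text: text.upper()),             # lossy: case is dropped
    "fmt_v2": ("fmt_v3", lambda text: text.replace("\t", " ")),  # tabs become spaces
}


def migrate(doc: Document, target: str) -> Document:
    """Apply converters step by step until the document is in `target`."""
    while doc.fmt != target:
        if doc.fmt not in CONVERTERS:
            raise ValueError(f"no converter available from {doc.fmt!r}")
        next_fmt, convert = CONVERTERS[doc.fmt]
        doc = Document(convert(doc.content), next_fmt, doc.history + [doc.fmt])
    return doc


old = Document("Hello\tWorld", "fmt_v1")
new = migrate(old, "fmt_v3")
```

After the call, `new` is in the current format and stays readable, but the original mixed-case text survives only if `old` is kept as well. A new converter has to be written by hand for every new format, which is one way to read "hard to automate".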

Hardware Museums

• The mission of a hardware museum is to collect (and keep operational) all relevant computing systems so that future generations may view our documents in their original environments.
• Hardware museums are not feasible in practice:
– too many items
– additional software and hardware required
– hard to maintain

Hardware museum at the Universität der Bundeswehr München, Germany

Emulation

• Emulators allow the function of processors and other hardware components to be simulated by software.
• When using emulation, the following items have to be preserved for each digital document (using, e.g., migration):
– the character stream and the metadata
– a specification of the hardware that can be interpreted by the emulator
– the complete software of the rendition system (in the form of binary data streams)

• If interested persons would like to access a document conserved in this way, say, 100 years from now, they would have to proceed as follows:

1. Create an emulator
– Load the hardware specification into an emulator to obtain a software implementation that is functionally equivalent to the original hardware.
2. Install software
– On the emulated computer, install the systems software and the application programs needed for rendering the document.
3. Render documents
– Load the character stream of the digital document into the emulated computer and start the rendition software to access the document.
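The three steps can be sketched as a toy in Python. This is an illustration under strong simplifying assumptions, not a real emulator: the "hardware specification" is a hypothetical table of two opcodes, the "software" is a list of those opcodes, and the class and function names are invented for this example.

```python
def op_load(state):
    """Copy the character stream from the input into the output buffer."""
    state["output"] = state["input"]


def op_upper(state):
    """Transform the output buffer (stands in for any rendering step)."""
    state["output"] = state["output"].upper()


# Preserved item: a hardware specification the emulator can interpret.
HARDWARE_SPEC = {"LOAD": op_load, "UPPER": op_upper}


class EmulatedComputer:
    def __init__(self, hardware_spec):
        # Step 1: the specification yields a software stand-in for the hardware.
        self.spec = hardware_spec
        self.software = []

    def install(self, program):
        # Step 2: install the preserved rendition software.
        self.software = list(program)

    def render(self, character_stream):
        # Step 3: load the document and run the rendition software on it.
        state = {"input": character_stream, "output": ""}
        for opcode in self.software:
            self.spec[opcode](state)
        return state["output"]


computer = EmulatedComputer(HARDWARE_SPEC)
computer.install(["LOAD", "UPPER"])
rendered = computer.render("hello")
```

The document's character stream is never modified. Only the machine that interprets it is rebuilt in software, and the same `EmulatedComputer` can render any number of documents.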

• advantages of emulation:
– relatively small cost per document
– cost proportional to actual use
– one emulator suffices for many documents
– high authenticity
• Whenever an old format becomes obsolete (while new ones become popular), new conversion techniques and tools have to be developed that achieve the required transformation.

Standard Formats

• costs proportional to number of formats
• standards for simple character sequences and for complex document types

Legal and Social Concerns

• Long-term preservation of digital documents involves legal and social concerns:
1. “Digital Rights Management” (DRM) and copy protection
2. reserved software rights
3. Should hardware manufacturers provide emulators?
4. criteria for selection
5. costs as a limiting factor
6. make costs affordable
7. balance of interests between shareholders

OAIS Models

• Open Archival Information System Reference Model

• an ISO standard on the long-term preservation of digital documents.

• two complementary points of view: an information model and a process model

The Information Model

• Data Object and Information Object
• The knowledge that is required to understand data is called the Knowledge Base.
• In order to understand the data, one needs additional information.
• Example: along with the source code of a Java program, a book about the programming language Java must be available (Representation Information).

• The Content Information is the information object proper which contains all the information necessary to interpret data

• Preservation Description Information (PDI) denotes all the information required to suitably preserve the corresponding Content Information.

• Content Information and PDI are combined into one logical entity, the Information Package.

• Packaging Information specifies how Content Information and PDI are actually related to each other, e.g., by describing the directory structure of a CD-ROM.

• Descriptive Information yields information about the content of the Information Package and thus allows the Information Package to be found in the archive.
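The relationships between these terms can be sketched as Python data classes. This is a simplified reading of the information model, not the normative OAIS structure. The class names follow the terms above, while the field names, the PDI keys and the `find_packages` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContentInformation:
    data_object: bytes                # the raw data
    representation_information: str   # what is needed to interpret the data


@dataclass
class InformationPackage:
    content_information: ContentInformation
    pdi: dict                         # Preservation Description Information
    packaging_information: str        # how content and PDI are tied together


def find_packages(catalogue, keyword):
    """Search Descriptive Information to locate packages in the archive."""
    return [package for descriptive, package in catalogue
            if keyword in descriptive["keywords"]]


package = InformationPackage(
    ContentInformation(b"class Hello {}", "a book about the Java language"),
    pdi={"provenance": "submitted by the author"},  # illustrative key
    packaging_information="content/ and pdi/ directories on one CD-ROM",
)

# Descriptive Information is kept next to the package so it can be searched.
catalogue = [({"title": "Hello program", "keywords": ["java", "example"]}, package)]
hits = find_packages(catalogue, "java")
```

Keeping Descriptive Information outside the package mirrors its role: it exists so the package can be found, not so that it can be interpreted.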

Modeling Context and Processes

• In order to define the processes that are going on in the archive in more detail, the OAIS Reference Model starts by considering the context of the archive.

• An archive’s purpose is to maintain documents, which are submitted to it and which are to be made available to future users.

• Producers are the authors, institutions, etc. that deliver documents to the archive.

• Management defines the specific purpose of the archive, e.g., which documents are to be collected and which are not.

• The OAIS Reference Model distinguishes three kinds of Information Packages according to their relation to the environment of the archive:
– Submission Information Packages (SIP) are sent to the archive by Producers
– Archive Information Packages (AIP) are preserved in the archive
– Dissemination Information Packages (DIP) are passed from the archive to Consumers

• The Ingest process receives an SIP from the Producer and prepares it for storage and administration within the archive.

• SIPs must be transformed into AIPs, and Descriptive Information corresponding to the AIPs has to be created.

• The AIP is passed on to the Archival Storage process, and the corresponding Descriptive Information to the Data Management process.

• The Data Management process manages the Descriptive Information and also the data that are necessary to run the system.

• The Administration process handles routine work in the archive, e.g., it negotiates with Producers the prerequisites for sending documents to the archive.
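The flow from SIP to AIP to DIP described above can be sketched in Python. This is a minimal illustration, not the OAIS specification: the `Archive` class, its field names, the identifier scheme and the `disseminate` method are assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class SIP:
    producer: str
    title: str
    payload: bytes


@dataclass
class AIP:
    aip_id: str
    payload: bytes


@dataclass
class DIP:
    title: str
    payload: bytes


class Archive:
    def __init__(self):
        self.archival_storage = {}  # aip_id -> AIP
        self.data_management = {}   # aip_id -> Descriptive Information

    def ingest(self, sip: SIP) -> str:
        """Transform an SIP into an AIP and create its Descriptive Information."""
        aip_id = f"aip-{len(self.archival_storage) + 1}"  # hypothetical id scheme
        self.archival_storage[aip_id] = AIP(aip_id, sip.payload)
        self.data_management[aip_id] = {"title": sip.title, "producer": sip.producer}
        return aip_id

    def disseminate(self, aip_id: str) -> DIP:
        """Build a DIP for a Consumer from the stored AIP and its description."""
        aip = self.archival_storage[aip_id]
        descriptive = self.data_management[aip_id]
        return DIP(descriptive["title"], aip.payload)


archive = Archive()
aip_id = archive.ingest(SIP("Author A", "Report", b"data"))
dip = archive.disseminate(aip_id)
```

The two dictionaries stand in for the Archival Storage and Data Management processes. The sketch keeps them separate because the AIP and its Descriptive Information are handed to different processes after Ingest.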

DSEP Model

• Deposit System for Electronic Publications
• The business routine of a library can be subdivided into four domains:
– acquisition of stock
– capturing metadata
– preservation and maintenance
– providing access

• The process Delivery & Capture transforms documents into SIPs conforming to the DSEP standards.

• The process Packaging & Delivery unpacks the DIP and transforms it into a format that can be used by the library system.
