Preservation Rumination
Priscilla Caplan, FCLA
OCLC DSS, February 16, 2005
Preservation Basics
THE NEED FOR DIGITAL PRESERVATION
Number of academic/scholarly journals published online: 15,757
Percent of U.S. federal government publications produced only online in 2003: 65 percent
Estimated percent of U.S. federal government publications available only online by 2008: 90 percent
From: California Digital Library, http://www.cdlib.org/inside/projects/preservation
The problem of abundance
[Chart: items (millions), axis from 0 to 4,500, comparing LoC and Web]
The problem of ephemerality
• Percent of web-based references in scientific articles from 3 major journals inaccessible within 2 years of publication: 21%
• Proportion of websites in 1998 gone in 1999: 44%
• Life of an average website: 44 days
The problems of media life expectancy and obsolescence
The problem of format obsolescence
[Figure: preservation approaches plotted on two axes. OBJECTIVE runs from Preserve Technology to Preserve Objects; APPLICABILITY runs from Specific to General. Approaches shown: Maintain original technology, Programmable Chips, Emulation, Viewer, Re-engineer Software, Virtual Machine, Universal Virtual Computer, Version Migration, Format Standardization, Rosetta Stone Translation, Typed Object Conversion, Persistent Archives, Object Interchange Format.]
Source: Thibodeau, 2002.
The problem of rights
The Preservation Pyramid
[Figure: pyramid diagram. Labels: Availability, Identity, Integrity, Viability, Renderability, Authenticity; Capture/Selection, Description, Secure storage, Media management, Preservation strategies.]
Traditionally, preserving things meant keeping them unchanged; however … if we hold on to digital information without modifications, accessing the information will become increasingly more difficult, if not impossible.
From: The Paradox of Preservation, Su-Shing Chen
“Preservation metadata ...is the information necessary to maintain the viability, renderability, and understandability of digital resources over the long-term.”
OCLC/RLG Preservation Metadata Framework Working Group
Revised Preservation Pyramid
[Figure: pyramid diagram with Understandability added. Labels: Availability, Identity, Integrity, Viability, Renderability, Understandability, Authenticity; Capture/Selection, Description, Secure storage, Media management, Preservation strategies.]
Who is doing preservation?
Research Libraries
Government Archives
Historical Societies
Individual Collectors
Who is doing digital preservation?
Research Libraries
Government Archives
Historical Societies
Individual Collectors
National Libraries
Research Centers
Public broadcasting
DSpace
[Figure: the revised preservation pyramid, repeated with the label DSpace]
LOCKSS
[Figure: the revised preservation pyramid, repeated with the label LOCKSS]
OCLC Digital Archive
[Figure: the revised preservation pyramid, repeated with the label OCLC Digital Archive]
LC Minerva
[Figure: the revised preservation pyramid, repeated with the label LC Minerva]
FCLA Digital Archive
[Figure: the revised preservation pyramid, repeated with the label FCLA Digital Archive]
Preservation in Action
State Universities
FCLA
• Designed as a “dark archive”
• Preservation repository functions only
• Based on OAIS functional architecture
• “Bit-level” and “Full” preservation
• Format migration and normalization
OAIS Functional Architecture
[Figure: OAIS functional entities. Ingest, Archival Storage, Data Management, Access, Administration, and Preservation Planning, shown between PRODUCER, CONSUMER, and MANAGEMENT. Labeled flows: SIP, AIP, DIP, Descriptive Info, queries, result sets, orders.]
DAITSS Functional Architecture
[Figure: DAITSS functional components. Ingest, Storage management, Access, Reporting, and Mgmt DB, with the label LIBRARY shown twice. Labeled flows: SIP, AIP, DIP.]
DAITSS Data Model
[Diagram labels: Information Package; Intellectual entity (1); Data File (1..n); Bitstream (0..n)]
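The cardinalities on this slide can be read as a small containment model. The sketch below is one possible reading, not the actual DAITSS schema: it assumes an information package describes exactly one intellectual entity and holds one or more data files, each containing zero or more bitstreams. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    """A typed stream of bits inside a data file."""
    kind: str  # hypothetical label such as "image" or "text"


@dataclass
class DataFile:
    """A file in the package; it may contain zero or more bitstreams (0..n)."""
    path: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class InformationPackage:
    """One intellectual entity (1) with at least one data file (1..n)."""
    intellectual_entity: str
    data_files: List[DataFile]

    def __post_init__(self) -> None:
        # Enforce the assumed 1..n cardinality for data files.
        if not self.data_files:
            raise ValueError("an information package needs at least one data file")


# Illustrative package: one entity, one file, two bitstreams.
package = InformationPackage(
    intellectual_entity="Example thesis",
    data_files=[DataFile("thesis.pdf", [Bitstream("text"), Bitstream("image")])],
)
```

Checking the lower bound at construction time keeps an empty package from ever entering the model.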
DAITSS Data File Object
[Diagram: Data File type hierarchy. Labels: Data File, Text File, PDF File, TIFF File, Markup File, DTD, XML, SGML.]
DAITSS Bitstream Object
[Diagram: Bitstream type hierarchy. Labels: Bitstream, Audio, Image, Text, Video, JPEG Image, TIFF Image.]
Risk Management
• Storing multiple master copies of files
• Calculating two message digests
• Storing metadata as XML and in RDBMS
• Normalizing when possible
• Always retaining original
• Action plans and background papers
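As an illustration of the "two message digests" item, the sketch below computes two independent digests in a single pass, so a copy is accepted only if both match. The slide does not say which algorithms are used; MD5 and SHA-1 are assumptions here, and the function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Tuple


def two_digests(stream: BinaryIO, chunk_size: int = 65536) -> Tuple[str, str]:
    """Return (MD5, SHA-1) hex digests computed in one pass over the stream."""
    md5 = hashlib.md5()    # assumed algorithm, not confirmed by the slides
    sha1 = hashlib.sha1()  # assumed algorithm, not confirmed by the slides
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def is_intact(stream: BinaryIO, expected: Tuple[str, str]) -> bool:
    """A copy counts as intact only if both digests match the stored pair."""
    return two_digests(stream) == expected


# Record the digests at ingest, then check a later copy against them.
stored = two_digests(io.BytesIO(b"example master copy"))
copy_ok = is_intact(io.BytesIO(b"example master copy"), stored)
```

Using two different algorithms means a copy would have to match under both at once before undetected corruption could slip through.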
Ingest Functions
• METS validation and metadata extraction
• Virus check and checksum verification
• File format identification
• Creation of Data File and Bitstream objects
• Harvesting of external files
• Normalization and forward migration
• Technical, relationship and event metadata
• AIP creation
• Storage update
• Data table update
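To make the file format identification step concrete, the sketch below guesses a format from a file's leading bytes. The slides do not describe how DAITSS identifies formats, so this signature-based approach and the function name are assumptions; it covers only the formats named in this deck.

```python
from typing import Optional


def identify_format(header: bytes) -> Optional[str]:
    """Guess a format from a file's leading bytes; return None if unrecognized."""
    if header.startswith(b"%PDF-"):
        return "PDF"
    # TIFF starts with a byte-order mark followed by the number 42.
    if header.startswith((b"II*\x00", b"MM\x00*")):
        return "TIFF"
    # AVI is a RIFF container whose form type at offset 8 is "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header.lstrip().startswith(b"<?xml"):
        return "XML"
    return None


# Illustrative call on the first bytes of a PDF file.
detected = identify_format(b"%PDF-1.4\n")
```

A real ingest process would likely validate the whole file against its format specification; leading-byte signatures only give a first guess.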
Ingest Example: A simple SIP
[Diagram: a SIP containing one XML file, a PDF, and an AVI. A second view shows the same SIP alongside an AIP containing multiple XML and TIFF files, and a Database.]
Future Plans
Find partners to install at other places
Finish DAITSS
Release under open source license
Build a community of developers for different formats
References
Priscilla Caplan: www.fcla.edu/~pcaplan, [email protected]
FCLA Digital Archive: www.fcla.edu/digitalArchive
Terry Kuny, “A Digital Dark Ages?” www.ifla.org/IV/ifla63/63kuny1.pdf
PREMIS Implementation Survey: www.oclc.org/research/projects/pmwg/surveyreport.pdf
Roy Rosenzweig, “Scarcity or Abundance?” www.historycooperative.org/journals/ahr/108.3/rosenzweig.html
O’Neill et al., “Trends in the Evolution of the Public Web” www.dlib.org/dlib/april03/lavoie/04lavoie.html
Clifford Lynch, “Authenticity and Integrity in the Digital Environment” www.clir.org/pubs/reports/pub92/lynch.html