Digital Preservation in Perspective: How far have we come, and what's next?
Jeff Rothenberg
March 26, 2012
Jeff Rothenberg Future Perfect 3/26/2012 Rev: 2012-03-24
Color photo by Jeff Rothenberg
A brief history of digital preservation
• Early statements of the problem
  – Jay Bolter, Margaret Hedstrom, David Bearman
  – Avra Michelson’s & my 1992 American Archivist paper
  – My 1995 Scientific American article
  – Into the Future film (CLIR, 1997; shown on PBS)
  – Tora Bikson’s & my 1999 report for the Dutch National Archives
• Gradual recognition of the problem
  – By librarians, archivists, modern museum curators
  – But without much technological depth of understanding in most cases
  – OAIS Preservation Planning assumed migration, though admits problems
• Some experiments & demonstrations
  – U. Leeds & U. Mich: CEDARS & CAMiLEON projects; BBC Domesday Book
  – Dutch National Archives Testbed: migration & UVC “data archiving”
  – UCSD Supercomputing Center & NARA: formalisms (e-mail only)
  – Guggenheim “ErlKing” renewal project
  – Dutch Royal Library (KB): Dioscuri emulator & eDepot
• Few serious attempts at implementation
  – Most implementations essentially ignore long-term preservation
Outline
• What should we mean by digital preservation?
• Levels of awareness of the problem
• Distinctions across disciplines
• Responses
• Remaining challenges
What should preservation mean?
“The goal of digital preservation is the accurate rendering of authenticated content over time.”
—ALA “medium” definition
Preserve originals as well as “vernacular renditions”
The Canterbury Tales

Original
Whan that Aprill, with his shoures soote
The droghte of March hath perced to the roote

And specially from every shires ende
Of Engelond, to Caunterbury they wende,
The hooly blisful martir for to seke
That hem hath holpen, whan that they were seeke.
• Used by scholars for serious research
• Used to generate & evaluate vernacular renditions
• Accessed by non-scholars for aesthetic purposes (with help, e.g., see below)

Vernacular Rendition
When in April the sweet showers fall
That pierce March’s drought to the root and all

And specially from every shire’s end
Of England they to Canterbury went,
The holy blessed martyr there to seek
Who helped them when they lay so ill and weak
• Used by non-scholars for casual research
• May be used by scholars for research as well
• Not thought of as a preservation copy
• Not used as a source for later vernacular renditions
A particular “view” of information may be crucial
Figure: levels of O-ring damage (0 to 3) vs. temperature (53 to 81 °F)
Example: Space Shuttle O-ring damage vs. temperature, prior to Challenger
Revealing View of Space Shuttle O-ring Data
Figure: O-ring damage level (0 to 3) vs. temperature (30 to 85 °F)
Extrapolation of damage curve to the 31 °F temperature forecast for Challenger’s launch on January 28, 1986.
Dots indicate temperature and O-ring damage for 24 successful launches prior to Challenger. Curve shows that increasing damage is related to cooler temperature.
Furthermore, many digital artifacts are inherently digital
• They cannot be meaningfully represented as page images
  – Doing so loses essential aspects of their contents and/or behavior
• Examples include dynamic, active or interactive artifacts
  – Multimedia (e.g., web pages, CD-ROM publications, Ph.D. dissertations)
  – Dynamically generated (e.g., JavaScript, CGI, ASP or PHP web pages, servlets)
  – Active presentation (e.g., animation, simulation, virtual reality)
  – Interactive (e.g., applets, interactive virtual reality, games)
  – Digital artwork
• Inherently digital artifacts are those whose perceptibility, meaning, or usability arise from and rely on their being encoded in digital form
What you see is not what you get
V2.24 ERwin if %JoinPKPK(oldrows,newrows,” <> “,” or “) then select count(*) into numrows from %Child where %JoinFKPK(%Child,oldrows,” = “,” and”); if (numrows > 0) then signal parent_updrstrct_err end if; end if; if %JoinPKPK(oldrows,newrows,” <> “,” or “) then update %Child set %JoinFKPK(%Child,newrows,” = “,”,”) where %JoinFKPK(%Child,oldrows,” = “,” and”);
Render unto seer...
In fact, every digital artifact is a program
• A program
  – Is a sequence of commands in some formal language
  – That is intended to be interpreted
  – By an interpreter that understands that language
• An interpreter
  – Is an active process
  – That knows how to perform commands
  – Specified in a given formal language
• Interpretation ultimately involves hardware
  – ASCII codes are rendered by a printer or display
  – More complex entities are interpreted by software (applications)
  – But all software is ultimately interpreted by hardware
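The idea that a bitstream only acquires meaning through an interpreter can be shown with a minimal Python sketch. The four byte values and the three "interpreter" functions below are illustrative assumptions, not taken from the talk; they simply show one stored bit sequence yielding different results depending on which interpreter reads it.

```python
import struct

# Four arbitrary bytes standing in for a stored bitstream (hypothetical example).
bitstream = bytes([0x42, 0x49, 0x54, 0x53])

def as_ascii_text(data: bytes) -> str:
    """Interpret each byte as an ASCII character code."""
    return data.decode("ascii")

def as_big_endian_int(data: bytes) -> int:
    """Interpret all four bytes together as one unsigned 32-bit integer."""
    return struct.unpack(">I", data)[0]

def as_pixel_values(data: bytes) -> list:
    """Interpret each byte as a grayscale pixel intensity (0-255)."""
    return list(data)

# Same bits, three interpreters, three different "artifacts".
print(as_ascii_text(bitstream))
print(as_big_endian_int(bitstream))
print(as_pixel_values(bitstream))
```

Nothing in the bytes themselves says which reading is the intended one; that knowledge lives in the interpreter, which is why losing the interpreter can amount to losing the artifact.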
Digital information promises to last better than analog
• A bitstream lasts forever
  – Producing exactly the same behavior, without loss (at least in principle)
  – So long as it can be interpreted correctly
• But interpreting a bitstream correctly requires software
  – And software must be run on hardware (a computer)
  – A computer is (ultimately) an analog device, that does decay
  – And both hardware and software become obsolete, long before they decay
• Digital objects do not decay, fade, tear, crumble, dissolve, etc.
  – Their media may, but not the bits themselves
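Because the bits themselves do not decay, a copy can be checked for exactness. The following is a minimal sketch using SHA-256 from Python's standard library; the function name and the sample bytes are illustrative assumptions. A matching digest shows only that a copy is bit-identical, not that anyone can still interpret it.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a bitstream as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"Whan that Aprill, with his shoures soote"

# A bit-for-bit copy onto fresh media: the medium changed, the bits did not.
copy_on_new_medium = bytes(original)

# Simulated media decay: a single flipped bit in the first byte.
damaged_copy = bytearray(original)
damaged_copy[0] ^= 0x01

print(fingerprint(original) == fingerprint(copy_on_new_medium))   # True
print(fingerprint(original) == fingerprint(bytes(damaged_copy)))  # False
```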
“Digital objects last forever — or five years, whichever comes first”
So the best we can say is...
min(∞, 5)
Levels of awareness of the problem (by disciplines/institutions/individuals)
1. Innocence
2. Awakening
3. Analysis
4. Looking under the streetlamp
5. Experimentation/Demonstration
• Where are we now?
Innocence
• Why should digital artifacts be any different?
  – Preservation is preservation, isn’t it?
• Except for media obsolescence
  – Isn’t this just analogous to medieval monks copying manuscripts?
• Digital artifacts don’t decay or change
  – Isn’t this a dream come true for preservationists?
Awakening
• Digital poses unique problems
  – Media obsolescence
  – Description (unique and complex attributes)
  – Cataloging (ephemeral reference, links)
  – Metadata (unique requirements)
  – Format/encoding (interpretation, conversion, corruption)
  – Future rendering (in the face of obsolete software and hardware)
• Digital preservation must be proactive
  – Over relatively short timeframes (5 years?)
  – Otherwise artifacts are likely to be irretrievably lost
Analysis
• Digital artifacts
  – What are their essential characteristics for preservation?
• Authenticity
  – What does this mean for digital artifacts?
• Rendering
  – How can we guarantee proper (or any) rendering in the future?
• Preservation
  – What does (should) this mean for digital artifacts in various disciplines?
• Costs
  – What are the up-front and long-term costs of digital preservation?
  – How should these costs be paid and by whom?
Looking under the streetlamp
• Metadata
  – Dublin Core, etc.
  – Depends on the nature of digital artifacts & technical preservation schemes
• Reference models
  – OAIS
  – Premature in the absence of viable technical preservation schemes
• Institutional process models
  – Premature in the absence of defined, viable technical preservation schemes
  – May tend to lock in approaches that are not viable
The Open Archival Information System Reference Model (OAIS)
Experimentation/Demonstration
• Dutch Archives Testbed
  – “Discovered” that migration is very hard (duh!)
• PLANETS, KEEP
  – Continuing to explore technically viable approaches
• BBC Domesday Book / CAMiLEON Project
  – Early warning of the need for timely, extreme action
  – Demonstrated the potential of hardware emulation
• Other emulation examples
  – Apple’s M68000 emulator for PowerPC
  – U. Warwick’s EDSAC emulator
  – Emory U’s MARBL collection
  – Guggenheim: Renewing the ErlKing
  – KB’s Dioscuri Emulator
The BBC Domesday / CAMiLEON Project
Emulated at the University of Leeds, U.K. (2002)
EDSAC: an early stored-program electronic digital computer
Renewing the ErlKing
• An interactive mixed-media video experience
  – By Roberta Friedman and Grahame Weinbren
  – That overlays text and graphics on video content
  – And branches in response to user touchscreen input
• Highly innovative when created in 1982
  – Pushed the limits of affordable computers and video display
  – Included a custom-built “authoring” environment
  – Widely exhibited in major museums and other venues
The ErlKing in the Guggenheim’s “Seeing Double” Show (March 18, 2004)
KB’s Dioscuri Emulator Running my 1982 Calendar/1 Program
Where are we now?
• Somewhere between 4 and 5
  – Looking under the streetlamp
  – Experimentation/Demonstration
• Few end-to-end implementations
  – Except for page-image artifacts (e.g., LOCKSS, Portico)
  – And KB eDepot
Responses
1. Denial
   – What problem?
2. Wishful thinking
   – Deus ex machina
3. Misguided efforts (IMHO)
   – Digital garden paths
4. Facing reality
   – What will it take?
• Where are we now?
Denial
• Just save bits
  – And hope for the best (let our grandchildren worry about it)
• Expect commercial sector solutions
  – Microsoft, IBM, etc. will save us
• Popular formats will live forever or auto-migrate
  – (What the ancient Egyptians thought)
• Convergent formats like HTML and XML solve everything
  – But these are really just “scaffold” formats embedding others
Preservation approaches
• Save and run obsolete hardware and software
  – In “computer museums”
  – To read documents by running the original programs that created them
• Rely on emulation of obsolete hardware to run saved software
  – Requires no migration or conversion (aside from media)
  – Saves originals in original form
• Rely on universal, formal description of logical formats
  – To allow interpreting those formats in the future
  – Thereby correctly rendering saved digital artifacts
• Rely on standards and migration
  – Expect new programs to read old documents in enduring standard forms
  – Convert documents from old standards to new ones as standards evolve
Wishful thinking
• Metadata is all we need
  – Describe formats, behavior, etc.
• Format migration
  – The game of “telephone”
• Formal encoding (UCSD/NARA-ERA)
  – Maybe someday
• Rely on future cryptography
  – Counterexample: Hieroglyphics
• Digitize to preserve
  – e.g., Shoah
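The “telephone” effect of repeated format migration can be sketched in a few lines of Python. This is only an analogy under stated assumptions: character encodings stand in for document formats, and the sample text, the `migrate` helper, and the chain of target encodings are all hypothetical. Each step silently drops whatever the next format cannot express, and a later move to a richer format does not bring it back.

```python
def migrate(text: str, target_encoding: str) -> str:
    """Convert text into a target encoding, replacing anything it cannot express."""
    return text.encode(target_encoding, errors="replace").decode(target_encoding)

# Hypothetical "original": curly quotes and a degree sign around plain text.
original = "\u201cTemperature\u201d 31 \u00b0F"

# Hypothetical migration chain, each target slightly less expressive than the last.
chain = ["cp1252", "latin-1", "ascii"]

history = [original]
for encoding in chain:
    history.append(migrate(history[-1], encoding))

for step, version in zip(["original"] + chain, history):
    print(f"{step:9s} {version}")

# Migrating the end result "up" to a richer format recovers nothing.
restored = migrate(history[-1], "utf-8")
print(restored == original)  # False
```

The first step is lossless, the second loses the quotation marks, and the third loses the degree sign; no single step looks catastrophic, which is what makes the accumulated loss easy to miss.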
Misguided efforts (IMHO)
• Focus on short-term preservation
  – Urgent enough to preclude long-term focus (e.g., JSTOR?)
• Reject emulation without understanding it
  – Seems like smoke and mirrors
• LC, NARA-ERA
  – Full speed ahead and damn the technical realities
Facing reality
• Technological issues
  – For “inherently digital” artifacts (which will become more prevalent)
• Defining/preserving “digital originals”
  – Retaining original rendering & behavior
  – Enabling repeated “vernacular extraction” of surrogates
• Comparative cost analyses
  – Informed by technological understanding
  – Looking at overall lifecycle costs
• Realistic process models
  – Based on technologically viable approaches
• Facing long-term issues (KB/IBM-NL eDepot)
  – Loss of metadata
  – Partial loss or corruption of archival information package indexes
Current implementation efforts
• NARA’s ERA project
  – Ill-conceived: assumed a solution would magically appear
• KB may still be in the lead
  – eDepot designed to address long-term preservation
  – Using a two-pronged migration/emulation approach
  – Planets & KEEP projects continuing to explore longer-term issues
• LC still seems somewhat aimless
  – Lost half their NDIIPP funding after 2006 (some since restored)
• Most so-called “archiving” efforts ignore preservation
  – LOCKSS, Portico (journal archiving) offer no real preservation
  – Internet Archive seems based on wishful thinking
• BL proceeding rationally
  – Pursuing a broadly-based, intelligent strategy
Where are we now?
• Somewhere between 2 and 4?
  – Misguided efforts
  – Facing reality
• Still at 1?
  – Denial
Distinctions across contexts
• Disciplines: Libraries, Archives, Museums
  – Archives: preserve “record” value
  – Libraries: preserve[/contextualize] content/rendering
  – Museums: preserve/recreate/contextualize experience
• Institutions: National, Commercial, NGO
  – Commercial: film industry, petrochemical, pharma (core vs. ancillary assets)
  – Shoah Fndn (Spielberg): http://dornsife.usc.edu/vhi/preservation
• Individuals
  – Mostly not yet begun
Remaining challenges
• Integrate true long-term perspective
  – Render “inherently digital” artifacts
  – Recognize the executability of all digital artifacts
  – Preserve digital originals and facilitate “vernacular renditions”
• Engage the Computer Science (ICT) field
  – Conference sessions, working groups, etc.
• Perform serious cost and process analyses
  – Based on viable technological approaches
• Try some small-scale “end-to-end” demonstrations
  – Long-term focus
  – Inherently digital artifacts
  – Preserve digital originals and produce “vernacular renditions”
  – Develop and test realistic process models
  – Instrument, measure, and evaluate:
    - Authenticity, quality, accessibility, usability, cost
    - Effort, scalability, reproducibility (of process)
Expected cost & effectiveness comparisons
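Approaches compared (columns, left to right): archaeology | formalization | standards | viewers | migration | emulation
Legend: H, M, L = High, Med, Low; +, - = Frequent, Rare

Cost:
Multipliers: Per-approach (x 1); Per-platform (x 10); Per-format (x 1000); Per-artifact (x 100,000,000)
Row labels (in source order): Process at Ingest; Reverse-engineer; Convert over time; Obtain necessary S/W; Create H/W emulators; Create EVM or formalism; Access
Ratings (rows in source order, columns as above):
0 | H/- | 0 | 0 | 0 | H/-
0 | H/- | H/- | H/+ | H/+ | 0
0 | 0 | 0 | M/+ | M/- | L/+
0 | H | H | 0 | 0 | L
0 | M/- | H/- | H/+ | H/+ | 0
H | M | L | L | L | L
0 | 0 | 0 | 0 | 0 | H/-

Effectiveness:
Row labels (in source order): On each artifact; % of formats handled
Ratings (rows in source order, columns as above):
L | M | M | M | M | H
L | L | L | M | L | H

Port to new platforms: 0 | L/- | M/- | H/- | M/- | M/-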
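The multipliers on this chart (x 1 per approach, x 10 per platform, x 1000 per format, x 100,000,000 per artifact) imply that total cost is roughly unit cost times the number of times that cost is incurred. A small worked sketch follows; the multipliers are the chart's, but the unit costs are purely hypothetical placeholders in arbitrary currency units, chosen only to show the arithmetic.

```python
# Multipliers as given on the chart: how many times each kind of cost is incurred.
MULTIPLIERS = {
    "per-approach": 1,
    "per-platform": 10,
    "per-format": 1_000,
    "per-artifact": 100_000_000,
}

# HYPOTHETICAL unit costs (arbitrary currency units), not figures from the talk.
unit_costs = {
    "per-approach": 5_000_000,  # e.g., creating an EVM or formalism once
    "per-platform": 500_000,    # e.g., creating one hardware emulator
    "per-format": 10_000,       # e.g., reverse-engineering one format
    "per-artifact": 1,          # e.g., touching one artifact at ingest
}

def total_costs(units, multipliers=MULTIPLIERS):
    """Total cost per level = unit cost x number of times it is incurred."""
    return {level: units[level] * multipliers[level] for level in units}

totals = total_costs(unit_costs)
for level, total in totals.items():
    print(f"{level:13s} {total:>15,}")

print(max(totals, key=totals.get))
```

Under these assumed numbers, a per-artifact cost of a single unit still outweighs a one-time investment of five million, which is the reason to prefer approaches whose expensive work is done once per approach or per platform rather than once per artifact.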
References for Jeff Rothenberg
http://www.JeffRothenberg.org