1. Digital Preservation: An Introduction
- DPE/Planets/nestor training event, October 1st-5th, 2007, Vilnius, Lithuania
- Stefan Strathmann, Göttingen State and University Library
- nestor - Network of Expertise in Long-Term Storage of Digital Resources
2. Session Outline
- 12:30-13:00 Groups present their results
- 13:00-13:15 Summary discussion
3. Key Questions
- What is digital preservation?
- Why is digital preservation important?
- What are the big challenges?
- What are the relevant standards, initiatives, programs?
4. Table of Contents
5. Digital Preservation: The Challenge
- Hardware and software become obsolete within very short periods of time
- Incompatibility between different versions of hardware and software
- Fading knowledge of how to use older hardware and software
- Aging and decaying storage media
6. Example: Loss of Information
- [Screenshots: Acrobat 5, Acrobat 7]
7. UNESCO
- Charter on the Preservation of Digital Heritage, October 15th,
2003
- The digital heritage consists of unique resources of human knowledge and expression.
- Many of these resources have lasting value and significance, and therefore constitute a heritage that should be protected and preserved for current and future generations.
8. UNESCO Charter: Articles
- Article 2 Access to the digital heritage
- Article 3 The threat of loss
- Article 4 Need for action
- Article 5 Digital continuity
- Article 6 Developing strategies and policies
- Article 7 Selecting what should be kept
- Article 8 Protecting the digital heritage
- Article 9 Preserving cultural heritage
- Article 10 Roles and responsibilities
- Article 11 Partnerships and cooperation
9. Digital Resources
- New forms of information:
  - digital production (digitization, born digital, only digital)
  - digital publication (only digital, object features like retrieval)
  - digital distribution (portal, value chain)
- Rapid change of technology
10. Digital Long-Term Preservation
- Digital Preservation consists of processes that ensure
that
- Digital Preservation has to ensure that future software and hardware tools retain the authenticity, integrity, and reliability of the digital object.
11. Digital Preservation: A Definition
- What is meant by digital long-term preservation or digital
- Definition by Ute Schwens / Hans Liegmann (DNB/nestor):
- In terms of preserving digital resources, long-term does not mean issuing a guarantee for five or fifty years, rather the responsible development of strategies which can cope with the constant changes brought about by the information
12. Preservation Approaches
- Hardware Museum/Technology Preservation
- Print to Paper or Microfilm/fiche or barcode
13. Digital Information: An Estimate
- UC Berkeley's School of Information Management and
- How much Information? 2003
- Analysis of the year 2002 to estimate the yearly increase of
new (digital and analog) information.
- Finding: 30 % increase of digital information per year
- See:
http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/index.htm
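To make the growth figure concrete, the sketch below compounds a 30% yearly increase. It assumes the rate stays constant over time, which the study itself does not claim; the function names are illustrative only.

```python
import math


def growth_factor(rate: float, years: int) -> float:
    """Total growth after compounding `rate` once per year for `years` years."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed for the volume to double at a constant yearly rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    rate = 0.30  # the 30% yearly increase reported in the study
    print(f"Doubling time: {doubling_time(rate):.2f} years")
    print(f"Growth after 10 years: x{growth_factor(rate, 10):.1f}")
```

Under this assumption the volume of digital information doubles roughly every 2.6 years and grows almost 14-fold within a decade.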
14. Heterogeneity - Materials
- Web Documents, Web Server
- Preprint-Server, theses, e-Proceedings, etc.
- Primary data, research data, raw data
- Film, Music, Multimedia etc.
15. Heterogeneity: Formats
- Mathematics (TeX, PS, ...)
- Interactive objects in e-Learning
- Different versions in e.g. PDF, TeX, ...
- Presentation Format / Preservation Format
16. Heterogeneity - General
- Metadata formats (Dublin Core, MODS, PREMIS, MIX, ...)
- Exchange formats (XML, METS, XML/RDF, SOAP, ...)
- Controlled vocabulary systems (Ontologies, Taxonomies,
...)
- Standardisation & Interoperability
17. Dealing with the Heterogeneity
- Cooperation: international/national
- Cooperation: cross-domain (e.g. museums, archives, research institutes, commercial organisations, ...)
- Redundancy of digital repositories explicitly desired
- Cooperative management/administration of distributed digital
archives/repositories
18. ...
- Coordinated cooperation needed between:
  - producers of digital objects (e.g. scientists)
  - providers (e.g. libraries)
  - distributors (e.g. publishers, database hosts)
- Use of international standards (e.g. DC, OAI, OAIS, METS)
19. OAIS Model: Example for a Standard
- [Diagram: OAIS functional model with Producer, Consumer, Ingest, Data management, Archival storage, Access, Administration, and Preservation Planning; packages shown: SIP, AIP, DIP]
- SIP: Submission Information Package
- AIP: Archival Information Package
- DIP: Dissemination Information Package
20. Relevant Aspects
- Technical Issues / Obsolescence
- Identification & Validation of Formats
21. Technical Issues / Obsolescence
- Digital information is stored as a bit stream on physical media => Preservation of the bit stream!
  - Storage media types change quickly and are subject to obsolescence
  - Storage media are unstable and can degrade quickly
- Keeping the bit stream accessible
  - Migration (Medium and Format)
  - Emulation (Hardware and Software)
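One common way to check that a bit stream has survived storage or a medium migration unchanged is a fixity check: record a cryptographic checksum at ingest and compare it later. The following is a minimal sketch using only the Python standard library. The function names and the choice of SHA-256 are assumptions made for illustration; the slides do not prescribe a particular method.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_digest: str) -> bool:
    """Compare the current checksum with the one recorded earlier."""
    return sha256_of(path) == expected_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        obj = Path(workdir) / "object.bin"
        obj.write_bytes(b"digital object")
        stored = sha256_of(obj)        # checksum recorded at ingest
        print(is_intact(obj, stored))  # unchanged copy
        obj.write_bytes(b"digital objecT")  # simulate a corrupted byte
        print(is_intact(obj, stored))  # changed copy
```

A check like this only detects that the bit stream changed; it says nothing about whether the format can still be rendered, which is what migration and emulation address.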
22. Formats: Identification & Validation
- Raster Images - TIFF, PNG, JPEG
- Structured graphics - CAD, VSD, ...
- We are dealing with lots of different formats!
- File format registries may help to handle the heterogeneity.
23. File Format Registries: Use Cases
- Identification: I have a digital object; what format is it?
- Validation: I have an object purportedly of format F; is it?
- Transformation: I have an object of format F, but need G; how can I produce it?
- Characterization: I have an object of format F; what are its features?
- Risk assessment: I have an object of format F; is it at risk of obsolescence?
- Delivery: I have an object of format F; how can I render it?
- (Abrams, Seaman: Towards a global digital format registry. IFLA
2003)
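The identification use case can be illustrated with a signature ("magic number") lookup on the first bytes of an object. This is a minimal sketch, not how any particular registry works: the table covers only a handful of formats, the names `MAGIC_NUMBERS` and `identify_format` are hypothetical, and a matching signature does not prove the object is valid in that format.

```python
from typing import Optional

# Leading byte signatures for a few well-known formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
    (b"%PDF-", "PDF"),
]


def identify_format(data: bytes) -> Optional[str]:
    """Return a format name if the data starts with a known signature."""
    for signature, name in MAGIC_NUMBERS:
        if data.startswith(signature):
            return name
    return None


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.4\n"))
    print(identify_format(b"plain text"))
```

Validation, characterization and risk assessment need far more knowledge about each format than a signature table can hold, which is the motivation for shared registries.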
24. Format validation with JHOVE
- JSTOR/Harvard Object Validation Environment
  - see: http://hul.harvard.edu/jhove/
- The concept of representation format, or type, permeates all technical areas of digital repositories. Policy and processing decisions regarding object ingest, storage, access, and preservation are frequently conditioned on a per-format basis. In order to achieve necessary operational efficiencies, repositories need to be able to automate these procedures to the fullest extent possible.
- How much technical metadata do I need?
25. Preservation Metadata
- All preservation strategies (migration, emulation, etc.) depend on the creation, capture and maintenance of suitable metadata:
  - "Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003)
  - "It's all about metadata" (Cedars project manager, ca. 2000)
26. Preservation Metadata
- Specific preservation metadata are necessary to ensure that
information can be accessed in the future, e.g. metadata
about:
- Much of the necessary metadata can be extracted automatically,
e.g. via tools like JHOVE
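As an illustration of automatic extraction, the sketch below reads a few technical properties from the header of a PNG image. It does not use JHOVE and is not a substitute for it; the function name and the returned keys are assumptions chosen for this example, and only the fixed-position IHDR fields are read.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_technical_metadata(data: bytes) -> dict:
    """Extract basic technical metadata from the IHDR chunk of a PNG."""
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG byte stream")
    # Bytes 8-11 hold the chunk length, bytes 12-15 the chunk type.
    if data[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # IHDR starts with width, height (big-endian uint32), bit depth, colour type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "format": "PNG",
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }


if __name__ == "__main__":
    # Build a synthetic header for a 640x480, 8-bit truecolour image.
    header = (
        PNG_SIGNATURE
        + struct.pack(">I", 13)
        + b"IHDR"
        + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
    )
    print(png_technical_metadata(header))
```

Properties like these can be stored alongside the object as technical metadata, so that later migration or rendering decisions do not depend on re-analysing every file.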
27. Preservation Policy
- What do you want to preserve?
- Why do you want to preserve?
- How do you want to render an object in the future?
  - Policy for short-term preservation
  - Policy for long-term preservation
28. Preservation Policy
- What kind of digital objects is the repository responsible for?
  - Fixed format texts, images, web resources, complex digital objects, datasets, ...
- What do you want to render in the future?
  - Offer extended functionalities?
29. Preservation Policy
- What are the significant properties of the object?
  - Appearance (layout, colour, font size, etc.)
  - Behaviour (functionality, interaction, etc.)
  - Structure (chapter, section, etc.)
  - Content (text, video, audio, etc.)
  - Context (cross-references, etc.)
- How do you want to provide access?
  - Designated User Community
30. But Policies/Strategies are not enough ...
- help choose & perform a strategy
- make the strategy possible (emulators, migration tools)
- maintain the link between originals and conversions
- enable interoperability and co-operation between different repositories/archives
- Tools have to be implemented in the archiving system and
archiving workflow.
- Preservation has to be put into practice!
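Maintaining the link between originals and conversions can be as simple as storing a small record for every migration. The sketch below is one hypothetical shape for such a record, not a standard schema: the class name, field names and the tool name in the example are all invented for illustration.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MigrationRecord:
    """Links a derived object to the original it was converted from."""
    original_sha256: str
    derived_sha256: str
    source_format: str
    target_format: str
    tool: str


def record_migration(original: bytes, derived: bytes,
                     source_format: str, target_format: str,
                     tool: str) -> MigrationRecord:
    """Build a record that identifies both objects by their checksums."""
    return MigrationRecord(
        original_sha256=hashlib.sha256(original).hexdigest(),
        derived_sha256=hashlib.sha256(derived).hexdigest(),
        source_format=source_format,
        target_format=target_format,
        tool=tool,
    )


if __name__ == "__main__":
    record = record_migration(b"original bytes", b"converted bytes",
                              "TIFF", "PNG", "hypothetical-converter 1.0")
    print(asdict(record))
```

Identifying both objects by checksum keeps the link valid even if files are renamed or moved between storage systems.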
31. Legal Aspects
- Copyright and other intellectual property rights (IPR) have a
substantial impact on digital preservation
- Preservation of digital materials is dependent on a range of
strategies, which has implications for IPR in those materials
- Consideration may need to be given not only to content but to
any associated software
- Obtaining specific permissions may be very challenging, e.g. for web archiving or digital art
32. Legal Aspects: Examples
- What will be covered by legal deposit?
  - How much is served from within the country?
  - The national publication archive
  - How are roles/responsibilities shared?
  - Web archiving initiatives (e.g. European Web Archive)
  - Development of electronic deposit systems
- International collaboration
  - Other international repositories
33. ..., but
- Digital preservation is often a legal grey area not yet
understood or considered by legislators
- Lack of legal certainty should not prevent digital preservation
actions
- Take action to manage risks
34. Trusted Repositories
- Why trusted repositories?
  - It is very easy to manipulate digital information
  - The users need to trust the accessed information
  - Nobody is able to preserve everything => distributed preservation management
- Criteria of trusted repositories (e.g. TRAC, nestor)
  - Administrative responsibility
35. Thank you very much for your attention!
- Comments? Questions?
- Stefan Strathmann, Göttingen State and University Library
- [email_address]
36. Exercise
- Which Digital Preservation issues are relevant in the context of your Digital Collection? How are they relevant?
- Data management (collection management)?
- Data documentation and description?
- Try to describe a digital preservation Framework for your
37. Session Outline
- 12:30-13:00 Groups present their results
- 13:00-13:15 Summary discussion