Digital Preservation – An Introduction. DPE/Planets/nestor training event, October 1st-5th, 2007, Vilnius, Lithuania. Stefan Strathmann, Göttingen State and University Library. nestor - Network of Expertise in Long-Term Storage of Digital Resources

DESCRIPTION

DPE Training materials

1. Digital Preservation - An Introduction

  • DPE/Planets/nestor training event, October 1st-5th, 2007, Vilnius, Lithuania
  • Stefan Strathmann, Göttingen State and University Library
  • nestor - Network of Expertise in Long-Term Storage of Digital Resources

2. Session Outline

  • 10:00-10:45 Lecture
  • 10:45-11:00 Discussion
  • 11:00-11:30 Coffee Break
  • 11:30-12:30 Group Work
  • 12:30-13:00 Groups present their results
  • 13:00-13:15 Summary discussion

3. Key Questions

  • What is digital preservation?
  • Why is digital preservation important?
  • What are the big challenges?
  • What are the relevant standards, initiatives, programs?

4. Table of Contents

  • General Introduction
  • Relevant Aspects

5. Digital Preservation - The Challenge

  • Hardware and software become obsolete within very short periods of time
  • Incompatibility of different versions of hard- and software
  • Fading knowledge of how to use older hard- and software
  • Aging and decaying storage media
  • Loss of Information

6. Example - Loss of Information

  • Acrobat 5
  • Acrobat 7

7. UNESCO

  • Charter on the Preservation of Digital Heritage, October 15th, 2003
  • Article 1:
  • The digital heritage consists of unique resources of human knowledge and expression.
  • Many of these resources have lasting value and significance, and therefore constitute a heritage that should be protected and preserved for current and future generations.

8. UNESCO Charter: Articles

  • Article 2 Access to the digital heritage
  • Article 3 The threat of loss
  • Article 4 Need for action
  • Article 5 Digital continuity
  • Article 6 Developing strategies and policies
  • Article 7 Selecting what should be kept
  • Article 8 Protecting the digital heritage
  • Article 9 Preserving cultural heritage
  • Article 10 Roles and responsibilities
  • Article 11 Partnerships and cooperation

9. Digital Resources

  • New forms of information:
    • digital production (digitization, born digital, only digital)
    • digital publication (only digital, object features like retrieval)
    • digital distribution (portal, value chain)
  • Rapid change of technology

10. Digital Long-Term Preservation

  • Digital Preservation consists of processes that ensure that digital objects remain, in the future,
    • accessible,
    • (re-)usable, and
    • understandable.
  • Digital Preservation has to ensure that future software and hardware tools retain the authenticity, integrity, and reliability of the digital object.

11. Digital Preservation - A Definition

  • What is meant by "digital long-term preservation" or "digital preservation"?
  • Definition by Ute Schwens / Hans Liegmann (DNB/nestor):
  • "In terms of preserving digital resources, 'long-term' does not mean issuing a guarantee for five or fifty years, rather the responsible development of strategies which can cope with the constant changes brought about by the information market."

12. Preservation Approaches

  • Migration
  • Emulation
  • Normalisation
  • Refreshing
  • Digital Archaeology
  • Hardware Museum/Technology Preservation
  • Print to Paper or Microfilm/fiche or barcode
  • ...

13. Digital Information - An Estimate

  • UC Berkeley's School of Information Management and Systems: "How much Information? 2003"
  • Analysis of the year 2002 to estimate the yearly increase of new (digital and analog) information.
  • Finding: 30% increase of digital information per year
  • See: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/index.htm
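To get a feel for what a 30% yearly increase implies, the following minimal sketch works out the doubling time and the ten-year growth factor. It assumes the rate stays constant and compounds annually, which is a simplification made here for illustration and not a claim of the study; the function names are hypothetical.

```python
import math

# The 30% yearly increase quoted above; assumed constant for this sketch.
ANNUAL_GROWTH = 0.30


def doubling_time_years(rate: float) -> float:
    """Years until a quantity growing by `rate` per year has doubled."""
    return math.log(2) / math.log(1 + rate)


def growth_factor(rate: float, years: int) -> float:
    """Factor by which the quantity has grown after `years` years."""
    return (1 + rate) ** years


print(round(doubling_time_years(ANNUAL_GROWTH), 2))  # 2.64
print(round(growth_factor(ANNUAL_GROWTH, 10), 1))    # 13.8
```

Under these assumptions the volume of digital information would double roughly every two and a half years.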

14. Heterogeneity - Materials

  • Journals and monographs
    • retrodigitized material
    • genuine digital material
  • Web Documents, Web Server
  • Preprint-Server, theses, e-Proceedings, etc.
  • Primary data, research data, raw data
  • Emails, blogs, etc.
  • Film, Music, Multimedia etc.
  • ...

15. Heterogeneity: Formats

  • Depends on subject, e.g.
    • Mathematics (TEX, PS, ...)
    • Geography (GIS)
    • ...
  • Multimedia, e.g.
    • Animated WWW pages
    • Interactive objects in e-Learning
    • ...
  • Different versions in e.g. PDF, TEX, ...
  • Presentation Format / Preservation Format

16. Heterogeneity - General

  • Metadata formats (Dublin Core, MODS, PREMIS, MIX, ...)
  • Exchange formats (XML, METS, XML/RDF, SOAP, ...)
  • Controlled vocabulary systems (Ontologies, Taxonomies, ...)
  • Architecture, Protocols
  • ...
  • Standardisation & Interoperability

17. Dealing with the Heterogeneity

  • Preservation policy
  • Cooperation: international/national
  • Cooperation: cross-domain (e.g. museums, archives, research institutes, commercial organisations, ...)
  • Redundancy of digital repositories explicitly desired
  • Cooperative management/administration of distributed digital archives/repositories

18. ...

  • Coordinated cooperation needed between:
    • producers of digital objects (e.g. scientists)
    • providers (e.g. libraries)
    • distributors (e.g. publishers, database hosts)
  • Use of international standards (e.g. DC, OAI, OAIS, METS)
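As one illustration of the standards listed above, the OAIS package flow can be sketched in a few lines: a producer submits a SIP, ingest turns it into an AIP held in archival storage, and access hands a DIP to the consumer. This is a toy model under stated assumptions, not an implementation of the standard; the class names, fields, and the `aip-0001` identifier scheme are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package, as delivered by a producer."""
    producer: str
    content: bytes
    description: str


@dataclass
class AIP:
    """Archival Information Package, as kept in archival storage."""
    identifier: str
    content: bytes
    metadata: dict


@dataclass
class DIP:
    """Dissemination Information Package, as handed to a consumer."""
    identifier: str
    content: bytes
    description: str


class Archive:
    """Toy archive with ingest, archival storage, data management, access."""

    def __init__(self) -> None:
        self._storage = {}    # archival storage: identifier -> AIP
        self._catalogue = {}  # data management: identifier -> description

    def ingest(self, sip: SIP) -> str:
        # Hypothetical identifier scheme, for illustration only.
        identifier = f"aip-{len(self._storage) + 1:04d}"
        metadata = {"producer": sip.producer, "description": sip.description}
        self._storage[identifier] = AIP(identifier, sip.content, metadata)
        self._catalogue[identifier] = sip.description
        return identifier

    def access(self, identifier: str) -> DIP:
        aip = self._storage[identifier]
        return DIP(identifier, aip.content, self._catalogue[identifier])


archive = Archive()
aip_id = archive.ingest(SIP("scientist", b"raw data", "survey results"))
print(archive.access(aip_id).description)  # survey results
```

Administration and preservation planning, the remaining OAIS functions, are left out of this sketch.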

19. OAIS Model - Example for a Standard

  • Actors: producer, consumer
  • Functions: Ingest, Data management, Archival storage, Access, Administration, Preservation Planning
  • Information packages:
    • SIP - Submission Information Package
    • AIP - Archival Information Package
    • DIP - Dissemination Information Package

20. Relevant Aspects

  • Technical Issues / Obsolescence
  • Identification & Validation of Formats
  • Preservation Metadata
  • Preservation Policy
  • Legal Aspects
  • Trusted Repositories

21. Technical Issues / Obsolescence

  • Digital information is stored as a bit stream on physical media => Preservation of the bit stream!
    • Storage media types change quickly and are subject to obsolescence
    • Storage media are unstable and can degrade quickly
  • Keeping the bit stream accessible
    • Migration (Medium and Format)
    • Emulation (Hard- and Software)
    • ...
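Whenever a bit stream is moved to a new medium, it is worth verifying that the copy is bit-identical to the original. A common way to do this is to compare cryptographic checksums, as in the minimal sketch below. The choice of SHA-256 and the function names are assumptions made here; the slides do not prescribe a particular method.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(original: Path, copy: Path) -> bool:
    """True if both files contain exactly the same bit stream."""
    return sha256_of(original) == sha256_of(copy)
```

Storing the checksum alongside the object also allows later checks for silent degradation of the storage medium.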

22. Formats: Identification & Validation

  • Examples:
    • Document - DOC, HTML
    • Raster Images - TIFF, PNG, JPEG
    • Structured graphics - CAD, VSD, ...
    • Audio - WAV, MP3, MIDI
    • Video - MPEG, AVI
    • Databases - DBF, MDB
    • Raw data
    • Collections - tar, zip
  • We are dealing with lots of different formats!
  • File format registries may help to handle the heterogeneity.

23. File Format Registries: Use Cases

  • Identification: I have a digital object; what format is it?
  • Validation: I have an object purportedly of format F; is it?
  • Transformation: I have an object of format F, but need G; how can I produce it?
  • Characterization: I have an object of format F; what are its features?
  • Risk assessment: I have an object of format F; is it at risk of obsolescence?
  • Delivery: I have an object of format F; how can I render it?
  • (Abrams, Seaman: Towards a global digital format registry. IFLA 2003)
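The identification use case above can be approximated by looking at the first bytes of a file, since many formats begin with a fixed signature. The sketch below covers a handful of formats only and is not how any particular registry or tool works; the `SIGNATURES` table and the `identify` function are hypothetical names.

```python
from typing import Optional

# Leading byte signatures of a few well-known formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP"),
]


def identify(data: bytes) -> Optional[str]:
    """Guess a format from leading bytes; None if no signature matches."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


print(identify(b"%PDF-1.4\n..."))  # PDF
print(identify(b"plain text"))     # None
```

A matching signature only identifies a format; it does not validate it. Several container formats also share the ZIP signature, so validation and characterization need a deeper look at the file's structure.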

24. Format validation with JHOVE

  • JSTOR/Harvard Object Validation Environment
    • see: http://hul.harvard.edu/jhove/
  • The concept of representation format, or type, permeates all technical areas of digital repositories. Policy and processing decisions regarding object ingest, storage, access, and preservation are frequently conditioned on a per-format basis. In order to achieve necessary operational efficiencies, repositories need to be able to automate these procedures to the fullest extent possible.
  • How much technical metadata do I need?

25. Preservation Metadata

  • All Preservation strategies (migration, emulation, etc.) depend on the creation, capture and maintenance of suitable metadata:
    • "Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003)
    • "It's all about metadata" (Cedars project manager, ca. 2000)

26. Preservation Metadata

  • Specific preservation metadata are necessary to ensure that information can be accessed in the future, e.g. metadata about:
    • Provenance
    • Structure
    • File Format(s)
    • Technical Environment
    • Rights
  • Much of the necessary metadata can be extracted automatically, e.g. via tools like JHOVE
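The metadata categories listed above can be pictured as one record per digital object. The sketch below is a hypothetical structure for illustration only: the field names follow the bullet points on this slide, not PREMIS or any other schema, and the sample values are made up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PreservationRecord:
    """One record per object; fields mirror the categories listed above."""
    identifier: str
    provenance: str              # where the object came from
    structure: List[str]         # e.g. the files that make up the object
    file_formats: List[str]      # formats of those files
    technical_environment: str   # what is needed to render the object
    rights: str                  # what the archive may do with it
    notes: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the record so it can be stored next to the object."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical sample values.
record = PreservationRecord(
    identifier="obj-1",
    provenance="deposited by the author",
    structure=["thesis.pdf"],
    file_formats=["PDF"],
    technical_environment="any PDF viewer",
    rights="open access",
)
print(record.to_json())
```

Fields such as the file format are candidates for automatic extraction, while provenance and rights usually have to be supplied at deposit time.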

27. Preservation Policy

  • What do you want to preserve?
  • Why do you want to preserve?
  • How do you want to render an object in the future?
  • Furthermore ...
    • Documentation
    • Policy for short-term preservation
    • Policy for long-term preservation

28. Preservation Policy

  • What kind of digital objects is the repository responsible for?
    • Fixed format texts, images, web resources, complex digital objects, datasets, ...
  • What do you want to render in the future?
    • Keep the original?
    • What is the original?
    • Offer extended functionalities?

29. Preservation Policy

  • What are the significant properties of the object?
    • Appearance (layout, colour, font size, etc)
    • Behaviour (functionality, interaction, etc)
    • Structure (chapter, section, etc)
    • Content (text, video, audio, etc)
    • Context (cross-references, etc)
  • How do you want to provide access?
    • Designated User Community
    • Options for the user?

30. But Policies/Strategies are not enough ...

  • we need tools that
    • help choose & perform a strategy
    • make the strategy possible (emulators, migration tools)
    • maintain the link between originals and conversions
    • enable interoperability and co-operation between different repositories/archives
  • Tools have to be implemented in the archiving system and archiving workflow.
  • Preservation has to come to practice!

31. Legal Aspects

  • Copyright and other intellectual property rights (IPR) have a substantial impact on digital preservation
  • Preservation of digital materials is dependent on a range of strategies, which has implications for IPR in those materials
  • Consideration may need to be given not only to content but to any associated software
  • Specific permissions may be very challenging to obtain, e.g. for web archiving or digital art

32. Legal Aspects: Examples

  • What will be covered by legal deposit?
    • How much is served from within the country?
  • Strategy
    • The national publication archive
    • How are roles/responsibilities shared?
    • Web archiving initiatives (e.g. European Web Archive)
    • Development of electronic deposit systems
  • International collaboration
    • Other international repositories
    • Levels of redundancy
  • Access restrictions

33. ..., but

  • Digital preservation is often a legal grey area not yet understood or considered by legislators
  • Lack of legal certainty should not prevent digital preservation actions
  • Take action to manage risks

34. Trusted Repositories

  • Why trusted repositories?
    • It is very easy to manipulate digital information
    • The users need to trust the accessed information
    • Nobody is able to preserve everything => distributed preservation management
  • Criteria of trusted repositories (e.g. TRAC, nestor)
    • Administrative responsibility
    • Financial sustainability
    • Technical security
    • ...

35. Thank you very much for your attention!

  • Comments? Questions?
  • Stefan Strathmann, Göttingen State and University Library
  • [email_address]

36. Exercise

  • Which Digital Preservation issues are relevant in the context of your Digital Collection? How are they relevant?
  • Data creation?
  • Data management (collection management)?
  • Data storage?
  • Data documentation and description?
  • Data preservation?
  • Data use?
  • Rights management?
  • ...
  • Try to describe a digital preservation framework for your institution.

37. Session Outline

  • 10:00-10:45 Lecture
  • 10:45-11:00 Discussion
  • 11:00-11:30 Coffee Break
  • 11:30-12:30 Group Work
  • 12:30-13:00 Groups present their results
  • 13:00-13:15 Summary discussion