MetaArchive Distributed Digital Preservation Workshop Wednesday, May 30, 2007 Robert W. Woodruff Library Emory University Atlanta, Georgia

5/30/07 DDP Workshop -Session 1 Slide 2

Day One Overview

8:30 AM - 9:00 AM Light Breakfast and Welcome

9:00 AM - 10:30 AM Session 1. Overview of Distributed Digital Preservation Networks, M. Halbert

10:30 AM - 10:45 AM Break

10:45 AM - 12:15 PM Session 2. Content Management, C. Jannik and G. MacMillan

12:15 PM - 1:15 PM Lunch

1:15 PM - 2:45 PM Session 3. Costs and Operational Considerations, M. Halbert and K. Skinner

2:45 PM - 3:00 PM Break

3:00 PM - 4:30 PM Session 4. Organizational Agreements, D. Buttler and K. Skinner

4:30 PM - 4:45 PM Wrap Up

Purposes of this Workshop

• Foster discussion concerning distributed digital preservation strategies

• Share information and perspectives acquired in the course of the MetaArchive NDIIPP project

• Provide information and training for institutions seeking to build or join distributed digital preservation networks based on the LOCKSS software.

Introductions – Who We All Are

• Please introduce yourself

• Say where you are from

• Mention any particular things that you hope to get out of this workshop, and any other expectations you may have

• Identify any particular topics you hope we will spend time discussing

Learning Objectives for this Session

• Review the Day One workshop sessions

• Overview of some digital preservation basics

• Reasons to establish or join a network

• Models of network organization

• Defining partner/member responsibilities

• Overview of MetaArchive and LOCKSS

Overview of Some Digital Preservation Basics

The New Field of Digital Preservation

Cultural heritage organizations are rapidly expanding their digitization programs in an effort to provide better access to collections. As these digitization efforts go forward, and as an increasing number of born-digital acquisitions are made, there is a concomitant need to preserve these materials.

• The DigCCurr 2007 Conference was hosted in April 2007 by the School of Information and Library Science at the University of North Carolina at Chapel Hill in an explicit effort to define the new field of digital curation.

• The Consultative Committee for Space Data Systems (CCSDS) has, of necessity, created many working standards for the preservation of digital information. One of the most notable is the Reference Model for an Open Archival Information System (OAIS), which provides a broad vocabulary for discussing digital archive systems and processes.

• The National Digital Information Infrastructure and Preservation Program (NDIIPP) is the congressionally chartered national program to digitally preserve our national heritage.

• The Digital Preservation Management Workshop, hosted by Cornell University from 2003 to 2006, was an effort to collate and share relevant best practices and documentation from a large number of emerging projects and efforts related to digital preservation.

• In the UK, groups such as the Digital Curation Centre and the Digital Preservation Coalition have been formed to “foster joint action to address the urgent challenges of securing the preservation of digital resources in the UK and to work with others internationally to secure our global digital memory and knowledge base.”

The Data Loss Problem

The Data Loss Problem (cont.)

The Data Loss Problem (cont.)

The Data Loss Problem (cont.)

The Data Loss Problem (cont.)

From the NDIIPP website on the importance of digital preservation (http://www.digitalpreservation.gov/importance/):

National Digital Information Infrastructure and Preservation Program (NDIIPP)

Commentary

• Technology has so altered our world that most of what we now create begins life in a digital format.

• The artifacts that tell the stories of our lives no longer reside in a trunk in the attic, but on personal computers or Web sites, in e-mails or on digital photo and film cards.

• The flip side to the ease with which we are able to create digital content is the complexity of preservation and long-term retrieval of this content.

• We must contend with issues relating to hardware and software compatibility; long-term storage; organization of files for ease of search and retrieval; media quality; disaster recovery; and integrity of original data.

Making Our Digital Heritage a Top Priority

When we consider the ways in which the American story has been conveyed to the nation, we think of items such as the Declaration of Independence, Depression-era photographs, television transmission of the lunar landing and audio of Martin Luther King's "I Have a Dream" speech. Each of these is physically preserved and maintained according to the properties of the physical media on which it was created. Yet how will we preserve the essential pieces of our heritage that exist only in digital form?

• Web sites as they existed in the days following Sept. 11, 2001, or Hurricane Katrina?

• What about Web sites developed during the national elections?

• Executive correspondence generated via e-mail?

• Web sites dedicated to political, social and economic analyses?

• Data generated via geographical information systems, rather than physical maps?

• Digitally recorded music or video recordings?

• Web sites that feature personal information such as videos or photographs?

• Social networking sites?

• Should these be at a greater risk of loss, simply because they are not tangible?

• The content of digital archives at cultural heritage institutions, created with scarce resources in a time of great change?

The Gap in Digital Preservation Programs

• 66% of cultural heritage institutions (academic libraries, archives, art museums, public libraries, and other similar kinds of institutions) report that no one is responsible for digital preservation activities.

• 30% of all archives have been backed up one time or not at all.

Source: 2005 NEDCC Survey by Bishoff and Clareson

Reasons to Establish or Join a DDP Network

Backups versus Digital Preservation

What differentiates a schedule for data backups from a digital preservation program?

• Backups are tactical measures. Backups are typically stored in a single location (often near or collocated with the servers being backed up) and are performed only periodically. Backups are designed to address short-term data loss with a minimal investment of money and staff time. Backups are better than nothing, but they are not a comprehensive solution to the problem of preserving information over time.

• Digital preservation is strategic. A digital preservation program entails a geographically dispersed set of secure caches of critical information. A true digital preservation program will require multi-institutional collaboration and at least some ongoing investment to realistically address the issues involved in preserving information over time.

What is Digital Preservation?

• Digital preservation refers to the management of digital information over time.

• Unlike the preservation of paper or microfilm, the preservation of digital information demands ongoing attention. This constant input of effort, time, and money to handle rapid technological and organisational advance is considered the main stumbling block for preserving digital information beyond a couple of years.

• Digital preservation can therefore be seen as the set of processes and activities that ensure the continued access to information and all kinds of records, scientific and cultural heritage existing in digital formats.

http://en.wikipedia.org/wiki/Digital_preservation

Secure and Distributed Cache Networks

Why are geographic distribution and security so important? This strategy maximizes the survivability of content in both individual and collective terms:

• Security reduces the likelihood that any single cache will be compromised.

• Distribution reduces the likelihood that the loss of any single cache will lead to a loss of the preserved content.

By creating a collaborative network for secure and distributed preservation, a group can also work together on more complex issues such as format migration.
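The two bullets above can be made concrete with a simple probability model. The sketch below is a hypothetical illustration, not part of MetaArchive or LOCKSS: it assumes each cache is lost independently with the same probability p over some period, so the content disappears only if all n copies are lost, which happens with probability p^n. In these terms, security lowers p and distribution raises n. The independence assumption is exactly what geographic dispersal aims for; caches in one building or one flood zone tend to fail together, and then the model overstates survivability. The function name and the 5% figure are invented for illustration.

```python
def prob_all_copies_lost(p_cache_loss: float, n_caches: int) -> float:
    """Return the probability that every one of n caches loses the content.

    Hypothetical model: each cache fails independently, with the same
    probability p_cache_loss, over the period being considered.
    """
    if not 0.0 <= p_cache_loss <= 1.0:
        raise ValueError("p_cache_loss must be between 0 and 1")
    if n_caches < 1:
        raise ValueError("n_caches must be at least 1")
    # Content survives if at least one copy survives, so it is lost
    # only when all n independent copies are lost: p ** n.
    return p_cache_loss ** n_caches


if __name__ == "__main__":
    # Illustrative, made-up per-cache loss probability (not survey data).
    p = 0.05
    for n in (1, 3, 6):
        print(f"{n} cache(s): {prob_all_copies_lost(p, n):.2e}")
```

With p = 0.05, three caches give 0.05^3 = 0.000125, and six give roughly 1.6 × 10^-8. Adding copies shrinks the risk multiplicatively, which is why distribution pays off so quickly, provided the copies really do fail independently.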

Case Study from the Chirographic (Handwritten) Era: The Nag Hammâdi Library

• Collection of early Coptic texts discovered near the town of Nag Hammâdi in 1945

• Buried in the 4th century CE, when such texts were being censored

• Only extant copies of core early Gnostic scholarship

• Survived more than 15 centuries because they were part of a secure, distributed chirographic network

Shared Archiving Fails without a Pre-coordinated Digital Preservation Network in Place

The NDIIPP Archive Ingest and Handling Test (AIHT):

• Designed to document methods for preserving digital cultural materials and to identify areas that require further research

• Participants tested five different preservation systems

• Encountered many unexpected incompatibilities because of the different systems

• Realization that much of the cost in preserving digital material is in coordinating the organizational and institutional imperatives of preservation, and not the technological costs of storage space

Both Technical Networking and Organizational Networking are Required

• A single cultural heritage organization is unlikely to have the capability to operate several geographically dispersed and securely maintained servers

• Collaboration between institutions on technological solutions is essential

• Similarly, inter-institutional agreements must be put in place or there will be no commitment to act in concert over time

The increased number and diversity of those concerned with digital preservation—coupled with the current general scarcity of resources for preservation infrastructure—suggests that new collaborative relationships that cross institutional and sector boundaries could provide important and promising ways to deal with the data preservation challenge.  These collaborations could potentially help spread the burden of preservation, create economies of scale needed to support it, and mitigate the risks of data loss.

- "The Need for Formalized Trust in Digital Repository Collaborative Infrastructure," NSF/JISC Repositories Workshop (April 16, 2007)

Defining Partner/Member Responsibilities

Institutional and Consortial Roles

• Preservation Sites are entities responsible for the ongoing activity of preserving digital content. At a minimum, every preservation site must include responsible staff and a node server of the relevant preservation network. Preservation sites collectively comprise a preservation network.

• Development Sites are responsible for technical development of the computer systems that enable the preservation network. Obviously, development sites may also be preservation sites and/or contributing sites.

• A Preservation Network is composed of all preservation sites that work together to preserve at-risk digital content.

• Contributing (Content) Sites are institutions that need to preserve digital content, and therefore decide to contribute digital content into the preservation network. The preservation network acts for the common good to preserve the at-risk content submitted by the contributing sites. Contributing sites may also be preservation sites.

Individual Roles

• Selectors are staff who identify and prioritize content to be preserved. They will most often be knowledgeable about the content of an institution’s digital archives, and may be the same individuals who originally created or acquired the archives.

• System Administrators are staff members who maintain individual preservation node servers of the relevant preservation network.

• Data Wranglers are programmers and other technically adept workers who prepare local digital archives for ingestion into a preservation network.

• Program Managers are leaders who accept responsibility for coordinating the activities of a digital preservation network.

NOTE: All of the above roles may overlap in creative ways!

Models of Network Organization

Different Ways of Creating or Joining Digital Preservation Networks

Dedicated Network

Create a Dedicated Preservation Network:

• Provides the greatest organizational control

• You can set up the rules for the network

• Requires the greatest up-front investment to implement

Strategic Alliance

Build onto an Existing Preservation Network:

• Takes advantage of previous investments by others

• Requires understanding the rules of the existing network and abiding by them

• Still requires capital investment in infrastructure

Piggyback Ride Strategy

Arrange Contribution to an Existing Preservation Network:

• No capital investment in infrastructure required

• Maximum advantage from previous investments by others

• Requires abiding by the rules of the existing network

• Requires convincing the existing network to preserve your stuff; will likely entail fees

Network Security Factors

What level of security and control over access to your data do you need?

• Do you have sensitive assets that require access controls? If so, you may need a dedicated network in which you control access to the preservation nodes, or at least be able to join a network which provides such access assurances.

• Do you have some flexibility in adapting to other infrastructures and security policies? If so, it may be simplest to join and build your preservation nodes onto an existing network. The requirements may be readily acceptable.

• Do you have relaxed or no security/access expectations? If so, you may simply want to piggyback off an existing network and depend on their good graces.

Decisions on Degrees of Security

More security and access assurances drive up the required costs of a preservation network:

• Extra costs may very well be justified! The entire point of a preservation network is long-term security for your digital content.

• Strategic alliances can make a lot of sense. They leverage your resources, but still give you ownership of a portion of the infrastructure.

• If you have no infrastructural capacity, and little or no funding, a piggyback ride is better than nothing!

Overview of MetaArchive and LOCKSS

MetaArchive

A dedicated preservation network for digital archives, established under the auspices of and with funding from the National Digital Information Infrastructure and Preservation Program (NDIIPP):

• Based on LOCKSS technology, but a separate network with high-capacity nodes

• Highly distributed geographically across multiple states

• Node servers are very secure, with a variety of extra security-hardening measures added to each preservation node

• Memoranda of Understanding between participating sites concerning commitment to maintain each other’s data security and network integrity

• Motivation to preserve partners’ digital archives is based on signed agreements and commitment to the preservation network

• Available for others to join, either to build onto or to piggyback on

• Active development community, committed to ongoing exploration of distributed preservation technologies, digital curation tools, and format migration methods

• Fee structure to join as members or to piggyback on

LOCKSS

A dedicated preservation network for online journals, established with funding from the Mellon Foundation and new funding from the NDIIPP:

• The pioneering leader in distributed digital preservation

• Very highly distributed geographically across the world, with hundreds of sites

• Available for others to join, either to build onto or to piggyback on

• Fee structure for membership

• No signed agreements between sites; individual nodes may preserve content or withdraw at will

• Motivation to preserve content is based on members' interest in long-term access to the online journal content to which they subscribe

• Active development community, with new initiatives with publishers (CLOCKSS) and many other technical advancement directions

Q&A Discussion