Auditing Distributed Digital Preservation Micah Altman MIT Libraries Jonathan Crabtree Odum Institute UNC Chapel Hill Prepared for Aligning Digital Preservation across Nations Amsterdam 2013


Auditing Distributed Digital Preservation

Micah Altman, MIT Libraries

Jonathan Crabtree, Odum Institute, UNC Chapel Hill

Prepared for Aligning Digital Preservation across Nations, Amsterdam 2013


Collaborators*

Micah Altman, Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Tom Carsey, Thu-Mai Christian, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Sophia Lafferty-Hess, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrarra, Terry Rowland, Akio Sone, Bob Treacy

Research Support

Thanks to the IMLS (LG-05-09-0041-09), the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.

* And co-conspirators


Related Work

Reprints available from: http://futurelib.org

Altman, M., and J. Crabtree, 2011. “Using the SafeArchive System: TRAC-Based Auditing of LOCKSS”, Proceedings of Archiving 2011.

Thu-Mai Christian, Jonathan Crabtree, Nancy McGovern, et al. "Overview of SafeArchive: An Open-Source System for Automatic Policy-Based Collaborative Archival Replication." Proceedings of iPres 2011. (Forthcoming)

Altman, M., Beecher, B., and Crabtree, J.; with L. Andreev, E. Bachman, A. Buchbinder, S. Burling, P. King, M. Maynard. 2009. "A Prototype Platform for Policy-Based Archival Replication." Against the Grain. 21(2): 44-47.

Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. "Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist. 72(1): 169-182


Managing copies can be challenging


Why distributed digital preservation?


Potential Nexuses for Preservation Failure

Technical
• Media failure: storage conditions, media characteristics
• Format obsolescence
• Preservation infrastructure software failure
• Storage infrastructure software failure
• Storage infrastructure hardware failure

External Threats to Institutions
• Third party attacks
• Institutional funding
• Change in legal regimes

Quis custodiet ipsos custodes?
• Unintentional curatorial modification
• Loss of institutional knowledge & skills
• Intentional curatorial de-accessioning
• Change in institutional mission

Source: Reich & Rosenthal 2005


Why was SafeArchive Created?

Verified, geographically distributed replication of content is an essential component of any comprehensive digital preservation plan.

This requirement has emerged as a necessity for recognition and certification as a trusted repository.


What can you do with SafeArchive?

• Analyze any existing set of public LOCKSS systems or a Private LOCKSS Network
  • which collections are replicated?
  • when were they last verified and updated?
  • identify potential problems with the storage network

• Create formal TRAC policies
  • create operational policies for replication and distribution
  • create advisory policies for all TRAC criteria

• Audit your storage network against your policies
  • verify that collections are currently replicated, verified, and updated
  • create historical audit trails and evidence of long-term compliance

• Replicate content from web sites or digital repository systems
  • use SafeArchive/DVN plugins to replicate content in the Dataverse Network
  • use SafeArchive/LOCKSS plugins to replicate content through OAI or HTTP

• Automatically deploy and repair LOCKSS replication based on policy
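The idea of auditing a storage network against a replication policy can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the `Replica` and `Policy` types, their field names, and `audit_collection` are assumptions, not SafeArchive's actual schema or the LOCKSS API. The sketch assumes a policy states a minimum number of recently verified copies and a minimum number of distinct regions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Replica:
    host: str                # hypothetical peer name
    region: str              # hypothetical geographic label
    last_verified: datetime  # when this copy last passed verification


@dataclass(frozen=True)
class Policy:
    min_copies: int          # required number of freshly verified copies
    min_regions: int         # required number of distinct regions
    max_age: timedelta       # a verification older than this counts as stale


def audit_collection(replicas: List[Replica], policy: Policy, now: datetime) -> List[str]:
    """Return a list of policy violations; an empty list means compliant."""
    # Only copies verified recently enough count toward the policy.
    fresh = [r for r in replicas if now - r.last_verified <= policy.max_age]
    violations = []
    if len(fresh) < policy.min_copies:
        violations.append(
            f"{len(fresh)} freshly verified copies, policy requires {policy.min_copies}"
        )
    regions = {r.region for r in fresh}
    if len(regions) < policy.min_regions:
        violations.append(
            f"{len(regions)} regions covered, policy requires {policy.min_regions}"
        )
    return violations


if __name__ == "__main__":
    now = datetime(2013, 1, 1)
    policy = Policy(min_copies=3, min_regions=2, max_age=timedelta(days=30))
    replicas = [
        Replica("peer-a", "us-east", now - timedelta(days=2)),
        Replica("peer-b", "us-east", now - timedelta(days=5)),
        Replica("peer-c", "eu-west", now - timedelta(days=90)),  # stale copy
    ]
    for problem in audit_collection(replicas, policy, now):
        print(problem)
```

Storing each run's list of violations with a timestamp would give a simple historical audit trail; in this sketch a stale copy is treated as missing, which is one possible design choice rather than the only one.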


Why use SafeArchive?

SafeArchive provides the reliability of a top-down replication system with the resiliency of a peer-to-peer model.

- SafeArchive automates high-level replication and distribution policies
- SafeArchive automates multi-institutional replication
- SafeArchive facilitates sharing TRAC policies
- SafeArchive provides verification and audit trails for replication policies
- SafeArchive is Open Source, and integrates with LOCKSS and the Dataverse Network
- SafeArchive is Standards-Based, and supports DDI, OAI-PMH, and TRAC


Latest Research: Lessons Learned

Lesson 1: Replication agreement does not prove collection integrity
• Seek external evidence of correct harvesting

Lesson 2: Replication disagreement does not prove collection corruption
• Seek diagnostics

Lesson 3: Distributed digital preservation works
• ... with evidence-based tuning and adjustment
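Lessons 1 and 2 can be made concrete with a small sketch. It assumes, hypothetically, that each peer reports a SHA-256 digest for its copy and that the content source may separately publish a digest as external evidence; `digest` and `classify` are illustrative names, not part of LOCKSS or SafeArchive. The point is that peers agreeing with each other and peers holding what the source actually published are two separate checks.

```python
import hashlib
from typing import Dict, Optional


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a copy's content."""
    return hashlib.sha256(content).hexdigest()


def classify(replica_digests: Dict[str, str], source_digest: Optional[str]) -> str:
    """Classify a collection from peer digests plus optional external evidence."""
    if not replica_digests:
        raise ValueError("no replicas reported")
    distinct = set(replica_digests.values())
    if len(distinct) > 1:
        # Lesson 2: disagreement alone is not proof of corruption.
        return "disagreement: run diagnostics"
    if source_digest is None:
        # Lesson 1: agreement without external evidence proves little.
        return "agreement: integrity unknown"
    if distinct == {source_digest}:
        return "agreement: matches source"
    # Every peer holds the same copy, but it is not what the source published,
    # for example because all peers harvested the same truncated content.
    return "agreement: does not match source"


if __name__ == "__main__":
    full = digest(b"complete collection")
    partial = digest(b"complete coll")  # simulated incomplete harvest
    print(classify({"peer-a": partial, "peer-b": partial}, full))
```

In the example run, both peers agree with each other, yet the comparison against the source digest shows that the harvest was incomplete.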


Lessons Learned, Cont.

Lesson 4: All networks had substantial and unrecognized gaps
• Trust, but continuously verify

Lesson 5: Don’t aim for 100% performance, aim for 100% compliance

Lesson 6: Many different things can go wrong in distributed systems, without easily recognizable external symptoms
• Distributed preservation requires distributed auditing and analysis

Lesson 7: External information on system operation and collection characteristics is important for analyzing results
• Transparency helps preservation


Potential Alignment Areas

• Sharing experiences and solutions
• Sharing auditing tools
• Expand tool sets to additional audit standards
• Develop standardized audit interfaces to distributed digital preservation networks


Future SafeArchive Possibilities

Support additional audit standards
• Data Seal of Approval
• ISO 16363

Support additional replication networks
• iRODS
• Data Conservancy
• Others??

Audit other policy sets
• Data Management policies
• IRB Policies


Questions

Website
• www.safearchive.org

Sourceforge
• http://safearchive.sourceforge.net/

Contacts
• [email protected]
• [email protected]