Chronopolis – MetaArchive Improving and Strengthening Inter-Institutional Preservation

From Silos to Interoperability

• Digital preservation is still an emerging field

• Two successful approaches:
  – Integrated Rule-Oriented Data System (iRODS)
  – Lots of Copies Keep Stuff Safe (LOCKSS)

• Powerful technologies, currently isolated

• Seeking to bridge the gap and foster interoperability

Presentation sections

• Chronopolis Program overview

• MetaArchive Cooperative overview

• Current and proposed work to automate the exchange of data between the systems

Chronopolis Basic Facts

• Three-node federated data grid at UCSD/SDSC, NCAR and UMIACS, with capacity for up to 50 TB of data per node (150 TB total)

• Using the Storage Resource Broker (SRB) for data management (moving to iRODS)

• Using BagIt file packaging format and SRB tools to ingest and transfer data

• Using the Audit Control Environment (ACE) for integrity checking

Current Chronopolis collections

Spring 2010

Data Providers:

• Inter-university Consortium for Political and Social Research – preservation copy of collections, including 40 years of social science and Census data

• California Digital Library – political and government web crawls, Web-at-Risk collection

• SIO Explorer – data from 50 years of research voyages

• NCSU Libraries – state and local geospatial data

http://chronopolis.sdsc.edu

MetaArchive Basic Facts

• Established in 2004, preserving content for 15 member institutions

• Uses LOCKSS software to provide long-term care for materials in a distributed digital preservation network

• Sustainable organizational framework: membership organization with a 501(c)(3) host (Educopia Institute)

• 254 TB network capacity (adding more as new members join)

• Compliant as a Trustworthy Digital Repository (2009 TRAC audit available on our site)

MetaArchive Collections

Current Members/Contributors, Spring 2010

• Auburn University
• Boston College
• Clemson University
• Florida State University
• Folger Shakespeare Library
• Georgia Tech
• Library of Congress
• Penn State University
• PUC Rio de Janeiro
• Rice University
• University of Hull
• University of Louisville
• University of North Texas
• University of South Carolina
• Virginia Tech

Current Affiliates

• Library of Congress
• NDLTD
• SDSC Chronopolis

We welcome new members!

Collaboration Roadmap

• Chronopolis and MetaArchive realize the value in looking at inter-institutional preservation

• Have been pursuing informally

• Looking at ways of formalizing this process for long-term preservation goals

The Plan

• Develop tools and methods to automate exchange of data between MetaArchive Cooperative (LOCKSS-based) and Chronopolis (iRODS-based)

• Examine data transfer tools/protocols from:
  – California Digital Library micro-services
  – iRODS protocols for data transfer
  – LOCKSS “plug-in” approach for data transfer

• Goal: A highly robust, easy-to-use preservation “system,” allowing digital objects to be shared between several major preservation networks in the U.S.

Focus Issues

• What does it mean to unite systems?

• Ability to export data between systems
  – Verify appropriate fixity
  – Transparency for system administrators

• Ability to track collections between systems
  – Verify collections are retrievable
  – Verify collections retain original characteristics
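
One way to picture the tracking checks above is to compare a collection's file inventory as each system reports it. The following is only a sketch under stated assumptions: each system is assumed to be able to export a mapping of relative path to checksum, and the names `InventoryDiff` and `compare_inventories`, along with the sample data, are hypothetical and not part of LOCKSS, SRB/iRODS, or ACE.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryDiff:
    """Result of comparing one collection's inventory in two systems."""
    missing: list = field(default_factory=list)     # in source, absent from target
    unexpected: list = field(default_factory=list)  # in target, absent from source
    changed: list = field(default_factory=list)     # in both, but checksums differ

    @property
    def ok(self) -> bool:
        """True when the two inventories agree completely."""
        return not (self.missing or self.unexpected or self.changed)


def compare_inventories(source: dict, target: dict) -> InventoryDiff:
    """Compare two {relative_path: checksum} inventories of one collection."""
    diff = InventoryDiff()
    for path, digest in sorted(source.items()):
        if path not in target:
            diff.missing.append(path)       # not retrievable from the target
        elif target[path] != digest:
            diff.changed.append(path)       # fixity no longer matches
    diff.unexpected = sorted(set(target) - set(source))
    return diff


if __name__ == "__main__":
    # Made-up inventories, for illustration only.
    metaarchive = {"etd/0001.pdf": "a1", "etd/0002.pdf": "b2", "etd/0003.pdf": "c3"}
    chronopolis = {"etd/0001.pdf": "a1", "etd/0002.pdf": "XX", "etd/0004.pdf": "d4"}
    report = compare_inventories(metaarchive, chronopolis)
    print("in sync:", report.ok)
    print("missing:", report.missing)
    print("changed:", report.changed)
    print("unexpected:", report.unexpected)
```

A report like this covers both sub-points: `missing` speaks to retrievability, and `changed` speaks to whether files retain their original characteristics, at least as far as a checksum can show.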

Technical Issues

• What are the best ways to have an SRB/iRODS datagrid and a LOCKSS PLN interact?

• What does it mean to have an active system (MetaArchive) and an archival system (Chronopolis) work together?

• What are the appropriate transfer technologies?
  – iRODS and LOCKSS native tools
  – CDL Micro-services, e.g. BagIt

The Process

• Identify the atomic units in our process
  – E.g. ingest, verification, data transfer, fixity checking

• Identify commonalities and differences

• Resolve needed issues

Transfer technology: BagIt

• Hierarchical file packaging format for exchanging digital content
  – There is no software to install
  – Consists of base directory with manifest file & subdirectory with content
  – Manifest file has a row for each content file with:
    • Full path in content directory
    • A checksum for file

• “Holey” Bags
  – Have additional ‘fetch.txt’ file in base directory & empty content directory
  – URLs for each content file are listed in fetch.txt file
  – Can reduce transfer time by fetching content in parallel

http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf
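
Because a bag is just a directory layout plus a manifest, it can be written and checked with a few lines of standard-library code. The sketch below is illustrative rather than a reference implementation: the helper names `make_bag` and `verify_bag` are hypothetical, the default checksum algorithm (MD5) and the `BagIt-Version` value are assumptions that should be matched to the spec version actually in use, and holey bags (`fetch.txt`) are not handled.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(path: Path, algorithm: str = "md5") -> str:
    """Return the hex digest of a file, read in chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(base: Path, files: dict, algorithm: str = "md5") -> None:
    """Write a minimal bag: bagit.txt, a data/ payload directory, a manifest."""
    data = base / "data"
    data.mkdir(parents=True, exist_ok=True)
    for rel, content in files.items():
        target = data / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    # Bag declaration; the version string here is an assumption.
    (base / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One manifest row per payload file: checksum, then path relative to base.
    rows = [
        f"{checksum(p, algorithm)}  {p.relative_to(base).as_posix()}\n"
        for p in sorted(data.rglob("*")) if p.is_file()
    ]
    (base / f"manifest-{algorithm}.txt").write_text("".join(rows), encoding="utf-8")


def verify_bag(base: Path, algorithm: str = "md5") -> list:
    """Return manifest paths that are missing or whose checksum differs."""
    failures = []
    manifest = (base / f"manifest-{algorithm}.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        digest, rel = line.split(maxsplit=1)
        path = base / rel
        if not path.is_file() or checksum(path, algorithm) != digest:
            failures.append(rel)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example_bag"
        make_bag(bag, {"report.txt": b"hello", "images/scan.tif": b"\x00\x01"})
        print("failures after packaging:", verify_bag(bag))
        (bag / "data" / "report.txt").write_bytes(b"tampered")
        print("failures after tampering:", verify_bag(bag))
```

The receiving system only needs the manifest and the payload to repeat the same check, which is what makes the format attractive for exchange between otherwise unrelated systems.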

Initial development goals

• XML-standardized representation of common technical data that needs to be tracked for exchange and preservation of data and metadata

• Ingestion reference model and framework to enable automated and interoperable capture of metadata from files in MetaArchive and Chronopolis
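
To make the first goal more concrete, here is one possible shape for an XML record of per-file technical data. It is purely a sketch: the element and attribute names (`collection`, `file`, `size`, `fixity`, `algorithm`) and the sample values are invented for illustration and do not represent a schema adopted by either project.

```python
import xml.etree.ElementTree as ET


def build_record(collection_id: str, files: list) -> ET.Element:
    """Build an XML record of per-file technical data for one collection."""
    root = ET.Element("collection", {"id": collection_id})
    for f in files:
        item = ET.SubElement(root, "file", {"path": f["path"]})
        ET.SubElement(item, "size").text = str(f["size"])
        fixity = ET.SubElement(item, "fixity", {"algorithm": f["algorithm"]})
        fixity.text = f["digest"]
    return root


def parse_record(root: ET.Element) -> list:
    """Read the same record back into plain dictionaries."""
    return [
        {
            "path": item.get("path"),
            "size": int(item.findtext("size")),
            "algorithm": item.find("fixity").get("algorithm"),
            "digest": item.findtext("fixity"),
        }
        for item in root.findall("file")
    ]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    files = [
        {"path": "data/report.txt", "size": 5, "algorithm": "md5",
         "digest": "5d41402abc4b2a76b9719d911017c592"},
    ]
    xml_text = ET.tostring(build_record("example-collection", files), encoding="unicode")
    print(xml_text)
    print(parse_record(ET.fromstring(xml_text)) == files)
```

A shared record of this kind would let either system hand the other enough information to re-check size and fixity after a transfer, without depending on each other's internal databases.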

Procedural Issues

• What exactly are the inter-institutional ties?
  – “Just” backup?
  – Added service for our customers/members?
  – Will all customers want this?

• Legal issues with data owners

• MetaArchive and Chronopolis have very different management approaches. How do cross-institutional decisions get made?

Organizational Issues

• Having a “seat at the table” at meetings and planning processes

• Working together on staffing and hiring

• Working together to identify customers and new opportunities

The Big Win

• Important data preservation demonstration
  – No single system can solve all problems
  – No single system appeals to all user needs

• Practical, useful process for our organizations
  – Makes us individually stronger
  – Provides LOCKSS and iRODS systems with exit strategies if they ever prove necessary
  – Enables tools built for one system to be used by both

Contacts

• MetaArchive: http://www.metaarchive.org/

• Chronopolis: http://chronopolis.sdsc.edu/

• Katherine Skinner: [email protected]

• David Minor: [email protected]