Can a Consortium Build a Viable Preservation Repository? Presentation at CNI March 31, 2014 Bradley Daigle (APTrust – University of Virginia) Stephen Davis (Columbia University) Linda Newman (University of Cincinnati) Suzanne Thorin (APTrust – University of Virginia) Scott Turnbull (APTrust – University of Virginia) www.aptrust.o


Can a Consortium Build a Viable Preservation Repository?

Presentation at CNI

March 31, 2014

Bradley Daigle (APTrust – University of Virginia)
Stephen Davis (Columbia University)
Linda Newman (University of Cincinnati)
Suzanne Thorin (APTrust – University of Virginia)
Scott Turnbull (APTrust – University of Virginia)

www.aptrust.org


Academic Preservation Trust

The Academic Preservation Trust, a consortium of 17 institutions, is taking a community approach to building and managing a repository infrastructure that will provide long-term preservation of the scholarly record. APTrust will also be one of the first nodes of the Digital Preservation Network (DPN).


APTrust Institutions

Columbia University
Johns Hopkins University
Indiana University
North Carolina State University
Penn State University
Stanford University
Syracuse University
University of Chicago
University of Cincinnati
University of Connecticut
University of Maryland
University of Miami
University of Michigan
University of North Carolina
University of Notre Dame
University of Virginia
Virginia Tech


APTrust is hosted by the University of Virginia, which fully supports 5½ staff positions and provides their space and equipment.

Program Director
Lead Engineer
Junior Engineer
Systems Engineer
Content Lead (1/2 time)


Membership Dues

Member dues: $20,000 annually
Supports partner meetings, conference travel, contract and cloud services, marketing, and the website


What is the problem we are trying to solve?

Columbia University
University of Cincinnati
University of Virginia


Columbia University – Use Case 1

Columbia University Libraries / Information Services has made commitments …

to granting agencies, to provide long-term digital archiving for digital content created with grant funds
to third-party content creators, to provide permanent access to born-digital content acquired from them
to continuing to collect and preserve archival collections, which now consist partly or wholly of born-digital content
to the permanent preservation of University-generated archival and research content


Columbia University – Use Case 2

We must preserve the content of …

Local Digitization Projects
Preservation-Related Digitization
Institutional Repository / Data Sets
Born Digital Archival Content
Archived Web Sites
Super Dark Archives – highly secure


Columbia University – Questions

Why create our own single-institution long-term preservation repository?

Why divert scarce existing CUL/IS internal equipment funds to storage on a permanent basis?

Why divert scarce existing CUL/IS staff time to creation, enhancement and maintenance of our own local preservation repository, permanently?

Why undergo the costs and staff investment in obtaining local TRAC certification?


University of Cincinnati – Use Case

Question: Why is digital preservation important to us?

Answer: We have digital collections where the original source material has deteriorated or is about to be intentionally destroyed (magnetic tapes; nitrate negatives, which are considered flammable). The digital object is THE ONLY object.

Magnetic tape image by Daniel P. B. Smith. Released under the GNU Free Documentation License. http://en.wikipedia.org/wiki/File:Magtape1.jpg

Nitrate negative from Cincinnati Subway and Street Improvements (digital collection) http://drc.libraries.uc.edu/handle/2374.UC/702759


University of Cincinnati – Use Case

Question: Why is digital preservation important to us?

Answer: We just moved a repository system from Columbus, Ohio, to our Cincinnati campus.
10 TB of data, in 16 different VMDKs (virtual machine disk images), was transferred over the Internet.
Checksums were created for each VMDK and verified upon receipt; some took 24 hours to calculate.
Checksums were also created for more than one million files, compared with the information in the repository database, and compared again after the storage format was changed (from VMDK to NFS).
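The per-VMDK verification described above can be sketched in a few lines of Python. This is a minimal illustration, not the script Cincinnati used: the slides do not say which checksum algorithm was involved, so SHA-256 is an assumption, and `file_checksum` and `verify_transfer` are hypothetical names. Reading in fixed-size chunks keeps memory use flat, but hashing time still grows with file size, which is consistent with some checksums taking a full day.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Return the hex digest of a file, reading it in fixed-size chunks.

    Memory use stays constant regardless of file size, so the same code works
    for a small TIFF or a multi-terabyte disk image (only the run time grows).
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel keeps calling read() until end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path: str, expected: str, algorithm: str = "sha256") -> bool:
    """Recompute the checksum on the receiving side and compare it with the sender's value."""
    return file_checksum(path, algorithm) == expected.strip().lower()


if __name__ == "__main__":
    # A small temporary file stands in for a VMDK in this demo.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example disk image contents")
        name = tmp.name
    sent = file_checksum(name)          # value recorded before the transfer
    print(verify_transfer(name, sent))  # True, because the bytes are unchanged
    os.remove(name)
```

The same function can be run over every file in the repository to produce the per-file checksums that were compared against the database.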


University of Cincinnati – Use Case

Question: Why is digital preservation important to us?

Answer (continued): We decided to test a full backup and restore. This took over a week, and we discovered that 16 of our digital assets were corrupt. We diagnosed the cause, adjusted, and repeated the process without error. Had we not been comparing before-and-after checksums of all files, however, we would not have known about the corruption. This process took 1.5 months and offered a striking example of the care that must be taken to avoid losing data when moving large amounts of it.
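Finding the 16 corrupt assets depended on comparing checksums recorded before the backup with those computed after the restore. A minimal sketch of that comparison follows, assuming each manifest is a mapping from relative file path to checksum; `compare_manifests` is a hypothetical name and the sample values are shortened placeholders, not real checksums.

```python
from typing import Dict, List, Tuple


def compare_manifests(
    before: Dict[str, str], after: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two manifests that map relative file path -> checksum.

    Returns three sorted lists:
      changed    - in both manifests, but the checksum differs (likely corruption)
      missing    - recorded before the backup, absent after the restore
      unexpected - absent before the backup, present after the restore
    """
    common = before.keys() & after.keys()
    changed = sorted(path for path in common if before[path] != after[path])
    missing = sorted(before.keys() - after.keys())
    unexpected = sorted(after.keys() - before.keys())
    return changed, missing, unexpected


if __name__ == "__main__":
    # Placeholder checksum strings, shortened for readability.
    before = {"a.tif": "1f3c", "b.tif": "9e0d", "c.pdf": "77aa"}
    after = {"a.tif": "1f3c", "b.tif": "0000", "d.pdf": "5b21"}
    print(compare_manifests(before, after))
    # (['b.tif'], ['c.pdf'], ['d.pdf'])
```

In practice the "before" manifest should be stored somewhere other than the system being backed up, so that the same failure cannot damage both the data and the record used to check it.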


University of Cincinnati – Use Case

Question: Why is digital preservation important to us?

Answer: Our credibility is at stake. We want to be believed.

Photograph; President Nixon with Elvis Presley; 20 Dec 1970; Richard Nixon Presidential Library and Museum, Yorba Linda, California. http://www.nixonlibrary.gov/forresearchers/find/av/photo/images/12_20_70_3.gif


University of Cincinnati – Use Case

Question: Why is digital preservation important to us?

Answer (continued): We are promoting a new digital repository to our faculty. Its raison d'être, the reason researchers should deposit their digital assets in this repository rather than (or in addition to) the several short-term delivery systems on our campus, is long-term persistence.

We have promised that their assets will also be preserved in a dark archive such as the Academic Preservation Trust. We have stated that preservation means bit-level integrity and format migration.

We have asserted that the Libraries’ traditional mission of preservation of the cultural record now applies to the digital scholarly record.


University of Virginia Use Case

Integral part of our preservation and curatorial landscape

Soup-to-nuts process for analogue materials
◦ Selection
◦ Digitization
◦ Management
◦ Stewardship


UVa – continued

Born Digital
◦ It is all about transfer
◦ Disk images awaiting arrangement
◦ Need an I/O space
◦ Digital Scholarship

Wish we had this years ago


UVa Landscape

Local disk (temporary use only, please) / scratch disk
Spinning disk – still only backup
Local HSM – local tape backup
APTrust – more robust preservation actions
DPN – dark archive


Basic Technology Goals

Simple submission packaging – BagIt
Strong Chain of Custody – Logging
Format-agnostic basic preservation – Fixity
Strong auditing and reporting – PREMIS
Easily reference items between systems – Identifiers
Simple distribution package for restoration – BagIt
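For readers unfamiliar with BagIt, the sketch below writes a minimal bag by hand so the layout is visible: a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-md5.txt` listing each payload file's checksum and path relative to the bag root. It is an illustration under stated assumptions, not APTrust's submission profile: the version string, the choice of MD5, and the helper name `make_minimal_bag` are assumptions, and real workflows would normally use an existing tool such as the Library of Congress's bagit-python library.

```python
import hashlib
import os
import shutil
import tempfile


def make_minimal_bag(bag_dir: str, payload: dict) -> None:
    """Write a minimal BagIt bag: bagit.txt, a data/ payload directory and an MD5 manifest.

    payload maps a flat file name (e.g. "File1") to its bytes; subdirectories
    are not handled in this sketch.
    """
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)

    # Bag declaration tag file (the version string is an assumption for this sketch).
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8", newline="\n") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")

    manifest_lines = []
    for name, content in sorted(payload.items()):
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(content)
        # Manifest paths are relative to the bag root and use forward slashes.
        manifest_lines.append(f"{hashlib.md5(content).hexdigest()}  data/{name}\n")

    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w", encoding="utf-8", newline="\n") as f:
        f.writelines(manifest_lines)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    bag = os.path.join(root, "example_bag")
    make_minimal_bag(bag, {"File1": b"first", "File2": b"second", "File3": b"third"})
    print(sorted(os.listdir(bag)))  # ['bagit.txt', 'data', 'manifest-md5.txt']
    shutil.rmtree(root)
```

Because the manifest travels inside the bag, the same package can be checked for fixity on submission and again when it is handed back for restoration.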


Flow of Content in APTrust

Ingest: the Submission Bag contains
• Metadata (tag files)
• Preservation files: data/File1, data/File2, data/File3

The bag is broken apart and its contents are managed as separate, related Fedora objects: an Intellectual Object with Generic File1, Generic File2 and Generic File3.

Content is bagged separately in DPN (as multiple DPN Bags) to support versioning.

Restore: the objects are repackaged into the same bag format.
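The ingest step above, in which a bag is broken apart into an Intellectual Object and its Generic Files, might look roughly like the following. This is a hypothetical sketch, not APTrust's code: the `IntellectualObject` and `GenericFile` classes, the `institution/bag_name/path` identifier pattern, and the simplified event fields (loosely modeled on PREMIS fixity-check events) are all assumptions, and the step that stores each object in Fedora is omitted.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class GenericFile:
    identifier: str  # hypothetical pattern: institution/bag_name/data/FileN
    checksum_md5: str
    events: List[dict] = field(default_factory=list)


@dataclass
class IntellectualObject:
    identifier: str  # hypothetical pattern: institution/bag_name
    files: List[GenericFile] = field(default_factory=list)


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse BagIt manifest lines ("<checksum> <path>") into path -> lowercase checksum."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            checksum, path = line.split(maxsplit=1)
            entries[path.strip()] = checksum.lower()
    return entries


def break_apart_bag(institution: str, bag_name: str, manifest_text: str,
                    payload: Dict[str, bytes]) -> IntellectualObject:
    """Create one GenericFile per manifest entry and log a fixity-check event for each.

    Assumes every path listed in the manifest is present in payload.
    """
    obj = IntellectualObject(identifier=f"{institution}/{bag_name}")
    for path, expected in sorted(parse_manifest(manifest_text).items()):
        actual = hashlib.md5(payload[path]).hexdigest()
        gf = GenericFile(identifier=f"{obj.identifier}/{path}", checksum_md5=actual)
        gf.events.append({
            "event_type": "fixity check",  # simplified, PREMIS-style event record
            "date_time": datetime.now(timezone.utc).isoformat(),
            "outcome": "success" if actual == expected else "failure",
        })
        obj.files.append(gf)
    return obj


if __name__ == "__main__":
    payload = {"data/File1": b"one", "data/File2": b"two"}
    manifest = "".join(f"{hashlib.md5(b).hexdigest()}  {p}\n" for p, b in payload.items())
    obj = break_apart_bag("example.edu", "sample_bag", manifest, payload)
    for gf in obj.files:
        # Prints each Generic File identifier followed by its fixity outcome.
        print(gf.identifier, gf.events[0]["outcome"])
```

Recording an event per file at ingest is one way to start the chain-of-custody log that later audits and restores can build on.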


Challenges

Abstracting away from specific repository software
Identifying content across distributed systems
Scaling solutions are still a mixed bag
Managing dependencies in a consortium
Deleting content requires some more work


Sustainability of Service

Common development frameworks – Hydra
Use available cloud services – AWS
Align with evolving preservation ecosystem – OAIS & DDP
◦ Fedora 4
◦ Standards like OAIS and DDP


APTrust and TRAC Certification

APTrust is committed to working toward TRAC (Trustworthy Repositories Audit & Certification) certification.

APTrust is the first repository ever to be built from the ground up with TRAC taken into account.

A Certification Working Group has been established and will be advising and consulting with the APTrust staff and partners on TRAC objectives.

Initial development work is proceeding at the level of Digital Object Management and Infrastructure.


Examples of TRAC Requirements

“The repository shall have an appropriate succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.”

“The repository shall have short- and long-term business planning processes in place to sustain the repository over time.”

“The repository shall have contracts or deposit agreements which specify and transfer all necessary preservation rights, and those rights transferred shall be documented.”

“The repository shall have the appropriate number of staff to support all functions and services.”

“The repository shall have and use a convention that generates persistent, unique identifiers.”
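One way to meet the identifier requirement quoted above, sketched here under assumptions: identifiers are minted as `<institution domain>/<random UUID>`, which keeps them opaque (they encode nothing that might change later, such as a file name or storage location) and unique in practice. The convention, the function names and the in-memory `IdentifierRegistry` are hypothetical; the requirement as quoted does not name a particular scheme, and a production registry would need durable, audited storage to make identifiers genuinely persistent.

```python
import re
import uuid
from typing import Dict

# Hypothetical convention: "<institution domain>/<UUID version 4>", all lowercase.
_ID_PATTERN = re.compile(
    r"[a-z0-9.-]+\.[a-z]{2,}/"
    r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}"
)


def mint_identifier(institution_domain: str) -> str:
    """Mint a new opaque identifier scoped to the depositing institution."""
    return f"{institution_domain.strip().lower()}/{uuid.uuid4()}"


def is_valid_identifier(identifier: str) -> bool:
    """Check that an identifier follows the convention before it is stored or shared."""
    return _ID_PATTERN.fullmatch(identifier) is not None


class IdentifierRegistry:
    """In-memory record of assigned identifiers (a real registry needs durable storage)."""

    def __init__(self) -> None:
        self._assigned: Dict[str, str] = {}  # identifier -> description of the object

    def assign(self, institution_domain: str, description: str) -> str:
        """Mint an identifier that has never been assigned before and record it."""
        while True:
            identifier = mint_identifier(institution_domain)
            if identifier not in self._assigned:  # guard against an extremely unlikely collision
                self._assigned[identifier] = description
                return identifier

    def resolve(self, identifier: str) -> str:
        """Look up what an identifier refers to; entries are never reassigned or removed."""
        return self._assigned[identifier]


if __name__ == "__main__":
    registry = IdentifierRegistry()
    new_id = registry.assign("example.edu", "sample intellectual object")
    print(new_id, is_valid_identifier(new_id))
```

Keeping identifiers opaque means that renaming a file or moving content between storage tiers does not force a new identifier, which matters when the same item must be referenced across several systems.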


Academic Preservation Trust – part of the evolving national digital preservation infrastructure 

“The Task Force envisions the development of a national system of digital archives, which it defines as repositories of digital information that are collectively responsible for the long-term accessibility of the nation’s social, economic, cultural and intellectual heritage instantiated in digital form.”

Preserving Digital Information. Report of the Task Force on Archiving of Digital Information, commissioned by The Commission on Preservation and Access and the Research Libraries Group. May 1, 1996. Executive Summary, iii.