HATHI TRUST A Shared Digital Repository HathiTrust Digital Library Cooperation for Preservation


Page 1

HATHI TRUST A Shared Digital Repository

HathiTrust Digital Library

Cooperation for Preservation

Page 2

Outline

• About HathiTrust
  – Mission & Goals
• Background
• What we do
  – Services
• How we do it
  – Governance
  – Partnership & Resources
  – Technology
• Future Directions

Page 4

What is HathiTrust

• Shared Digital Repository
  – Launched 2008 by 25 institutions (now 26)
  – Initial focus on digitized book and journal content
  – Expanding to non-book/non-journal, born digital
  – “Light” archive
• Collaboration
  – Preservation and access
  – Print collections
  – Local services
  – Public Good

Page 5

Background

Page 6

History

• Michigan Digitization Project 2004
• “…U of M shall have the right to use the U of M Digital Copy, in whole or in part at U of M's sole discretion, as part of services offered in cooperation with partner research libraries such as the institutions in the Digital Library Federation…”

Page 7

History

• Collective Agreement with CIC Announced in June 2007

• CIC agreed to establish a shared digital repository

Page 8

History

Page 9

The Partners

• When announced in October 2008, partners included:
  – University of California system
  – CIC (Committee on Institutional Cooperation)
    University of Chicago
    University of Illinois
    Indiana University
    University of Iowa
    University of Michigan
    Michigan State University
    University of Minnesota
    Northwestern University
    Ohio State University
    Pennsylvania State University
    Purdue University
    University of Wisconsin-Madison
  – University of Virginia

Columbia University

Page 10

The Name

• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy

Page 11

Content Distribution

As of February 1:
5,323,716 - Total
764,481 - Public Domain

Page 12

Content Growth

Page 13

What we do

Page 14

Services

Page 15

How we do it

Page 16

Governance

[Diagram: HathiTrust governance structure]

• Executive Committee: Budget/Finances, Decision-making
• Strategic Advisory Board: Policy, Planning

Page 17

Executive Committee

• Paul Courant, University Librarian and Dean of Libraries, UM
• Laine Farley, Executive Director, CDL
• John King, Vice Provost for Academic Information, UM
• Paula Kaufman, University Librarian and Dean of Libraries, UI
• Brian Schottlaender, University Librarian, UCSD
• Ed Van Gemert, Director of Libraries, UW - Madison
• Brenda Johnson, Dean of Libraries, IU
• Brad Wheeler, Chief Information Officer, IU
• John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

Page 18

Strategic Advisory Board

• Ed Van Gemert (Chair), Director of Libraries, UW - Madison
• John Butler, Associate University Librarian for Information Technology, U Minn
• Patricia Cruse, Director, Preservation, CDL
• Bernie Hurley, Director, Library Technologies, UC Berkeley
• R. Bruce Miller, University Librarian, UC - Merced
• Sarah Pritchard, University Librarian, Northwestern
• Paul Soderdahl, Director, LIT, U Iowa
• John Wilkin, Executive Director, HathiTrust (ex officio)

Page 19

Partnership & Resources (1)

• Funded for an initial 5 years with base funding from partners

• Budget – separately held within UMich budget system, managed by the Executive Committee

• Cost Model – Per GB cost of storage per year with a one-time fee on new content to build a capital fund

• Review in 3rd yr of each 5 yr period
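The cost model on this slide (a per-GB yearly storage cost plus a one-time fee on new content) can be written out as simple arithmetic. The sketch below is only an illustration: the function name `annual_charge` and all rates are hypothetical, since the slides give no actual figures, and it assumes `stored_gb` already includes the content added that year.

```python
def annual_charge(stored_gb, new_gb, rate_per_gb_year, one_time_fee_per_gb):
    """Return one partner's charge for a single year.

    All parameters are illustrative assumptions, not HathiTrust figures:
      stored_gb            total GB the partner has in the repository
      new_gb               GB deposited for the first time this year
      rate_per_gb_year     recurring storage cost per GB per year
      one_time_fee_per_gb  one-time fee per new GB (feeds the capital fund)
    """
    if min(stored_gb, new_gb, rate_per_gb_year, one_time_fee_per_gb) < 0:
        raise ValueError("quantities and rates must be non-negative")
    storage = stored_gb * rate_per_gb_year
    capital_fund = new_gb * one_time_fee_per_gb
    return {
        "storage": storage,
        "capital_fund": capital_fund,
        "total": storage + capital_fund,
    }
```

With made-up numbers, a partner holding 1000 GB, 200 GB of it new, at a rate of 2 per GB per year and a one-time fee of 1 per GB, would owe 2000 for storage and 200 toward the capital fund.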

Page 20

Partnership & Resources (2)

• Staff/Expertise – highly integrated
  – Project managers, IT and communications staff, copyright experts, administrators (UM, Indiana and UC taking the lead)
• Working groups
• UM recently hired a Digital Preservation Librarian
• Shared development space

Page 21

Financial contributions of partners

HathiTrust Functional Framework

Page 22

Partnership & Resources (3)

• Toward a Cloud Library
  – CLIR, Mellon Foundation
  – OCLC Research, NYU, HathiTrust, Recap Libraries

• Objective: Characterize the near-term opportunity for externalizing management of academic research collections leveraging capacity of large-scale shared print and digital repositories*

• Outcomes: opportunity and risk assessment based on aggregate collection analysis; draft service agreement enabling generic consumer library to selectively outsource preservation and access of low-use research collections to large-scale print and digital repositories

*From the RLG Partner Update January 7, 2010

Page 23

Partnership & Resources (4)

• CRL TRAC Audit
  – Portico and HathiTrust assessments timely
  – “Certification will augment CRL’s strategic archiving of print, and support a responsible transition to electronic-only formats where appropriate.”
  – Work with UC to design shared print journal archiving effort
  – “With this hybrid strategy CRL hopes to enable its community to accelerate the shift to electronic-only resources in a careful and responsible manner.”

* http://www.crl.edu/archiving-preservation/digital-archives/certification-and-assessment-digital-repositories

Page 24

Partnership & Resources (5)

• New cost model
• Based on benefits to institutions
  – Public Domain
  – In-copyright
• Volumes “held”

Page 25

Partnership & Resources (6)

• Timeline:
  – Implement in 2013
  – Accept new partners now with costs based on overlap calculations
• Requirements:
  – Print holdings database
  – Update mechanisms
  – Manual remediation
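An overlap calculation of the kind mentioned above could look roughly like the following. This is a sketch under stated assumptions, not the method HathiTrust uses: it assumes a partner's print holdings have already been matched to repository volume identifiers, and that each repository volume carries a rights code of `"pd"` (public domain) or `"ic"` (in copyright). The function name, the identifiers, and the codes are all hypothetical.

```python
from typing import Dict, Set


def overlap_summary(print_holdings: Set[str],
                    repository_rights: Dict[str, str]) -> Dict[str, int]:
    """Count how a library's print holdings overlap with the repository.

    print_holdings     volume identifiers the library holds in print
                       (assumed to be pre-matched to repository IDs)
    repository_rights  repository volume ID -> "pd" or "ic" (assumed codes)
    """
    summary = {
        "held_public_domain": 0,
        "held_in_copyright": 0,
        "not_in_repository": 0,
    }
    for volume_id in print_holdings:
        rights = repository_rights.get(volume_id)
        if rights == "pd":
            summary["held_public_domain"] += 1
        elif rights == "ic":
            summary["held_in_copyright"] += 1
        else:
            # No match, or an unrecognized rights code: not counted as overlap.
            summary["not_in_repository"] += 1
    return summary
```

Counts like these only describe the overlap itself. How they translate into fees is not spelled out on the slides, so the sketch deliberately stops short of computing a cost.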

Page 26

Technology - OAIS

[Diagram: technical flow mapped to the OAIS model. Labels recoverable from the slide, grouped as they appear:]

• GRIN, Internal Data Loading
• Google, [OCA], In-house Conversion
• MARC record extensions (Aleph), Rights DB
• Page Turner, HathiTrust API, OAI, GeoIP DB, CNRI Handles, [Solr]
• METS/PREMIS object, TIFF G4/JPEG2000, OCR, MD5 checksums
• METS object, PNG, OCR, PDF
• Isilon, Site Replication, TSM, MD5 checksum validation
• GROOVE (JHOVE)

Page 27

Technology – Architecture

• Inbound validation, standards-based object storage and related metadata
• Storage in Ann Arbor and Indianapolis
• Encrypted backup to 3rd location
• Rights database for rights metadata
• Online catalog as source and storage for descriptive metadata

Page 28

Technology - Ingest

• Automatic validation in GROOVE
  – Check barcode check digit using Luhn algorithm
  – Fixity check on JPEG2000, TIFF, UTF-8 using MD5
  – Well-formedness and embedded metadata check on JPEG2000, TIFF, UTF-8 using JHOVE
• Creation of METS and PREMIS
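The first two validation checks above can be sketched in Python. This is a minimal illustration, not GROOVE's actual code: the function names `luhn_is_valid`, `md5_hex`, and `fixity_ok` are hypothetical, the barcode is assumed to be an all-digit string whose last digit is a standard Luhn check digit, and the expected MD5 value is assumed to arrive as a hex string alongside the file.

```python
import hashlib


def luhn_is_valid(barcode: str) -> bool:
    """Return True if an all-digit barcode passes the Luhn check."""
    if len(barcode) < 2 or not (barcode.isascii() and barcode.isdigit()):
        return False
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately left of the check digit.
    for position, char in enumerate(reversed(barcode)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the checksum supplied at ingest."""
    return md5_hex(path) == expected_md5.strip().lower()
```

A barcode that fails `luhn_is_valid` most likely contains a mistyped or misread digit, while a file that fails `fixity_ok` was altered or corrupted somewhere between its source and the repository. MD5 is used here as a fixity check against accidental change, not as protection against deliberate tampering.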

Page 29

Technology - Repository

• Isilon storage
• Simple filesystem layout
  – One directory per volume, zip file and METS file
  – Use of a namespace allows for conflicting identifiers
  – Namespaces for institutions and, if needed, types of identifiers within the institution
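To make the namespace idea concrete, here is a small sketch of how a namespace plus a local identifier could map to one directory holding a zip file and a METS file. The directory scheme, the file naming, and the names `volume_paths`, `mdp`, and `uc1` are assumptions for illustration only; the slides do not give the repository's actual path layout.

```python
from pathlib import PurePosixPath


def volume_paths(root: str, namespace: str, local_id: str) -> dict:
    """Build hypothetical paths for one volume: its directory, zip, and METS.

    The layout root/namespace/local_id/ is an assumed scheme, chosen only
    to show how a namespace keeps identical local IDs apart.
    """
    for part in (namespace, local_id):
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    volume_dir = PurePosixPath(root) / namespace / local_id
    return {
        "dir": volume_dir,
        "zip": volume_dir / f"{local_id}.zip",
        "mets": volume_dir / f"{local_id}.mets.xml",
    }


# The same local identifier under two namespaces yields two distinct
# directories, so identifiers that collide across institutions can coexist.
a = volume_paths("/repo", "mdp", "39015000000001")
b = volume_paths("/repo", "uc1", "39015000000001")
```

Because the namespace is part of the path, no institution has to renumber its volumes before depositing them.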

Page 30

Technology – METS Object

• Why METS?
  – Can serve as Archival Information Package and a Dissemination Information Package
  – Designed to record the relationship between pieces of complex digital objects
  – Can be created automatically as texts are loaded or reloaded
  – Preservation actions (PREMIS)

Page 31

Technology – METS Object

• What’s there?
  – metsHdr with an ID and CREATEDATE
  – 2 dmdSecs: MARCXML and mdRef
  – amdSec containing one techMD with PREMIS metadata
  – fileSec with 4 fileGrps (zip, images, OCR, hOCR)
  – Physical structMap tying together files with metadata (pg. numbers and features)
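The parts listed above can be located programmatically with Python's standard XML library. The sketch below parses a stripped-down METS skeleton and summarizes its sections. The sample document, its attribute values (IDs, dates, `USE` labels), and the function name `summarize_mets` are invented for illustration. The skeleton is not schema-valid and is not a real HathiTrust METS file; only the METS namespace URI and element names follow the METS standard.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical skeleton mirroring the structure described on the slide.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:metsHdr ID="HDR1" CREATEDATE="2010-02-01T00:00:00"/>
  <mets:dmdSec ID="DMD1"><mets:mdWrap MDTYPE="MARC"/></mets:dmdSec>
  <mets:dmdSec ID="DMD2"><mets:mdRef MDTYPE="MARC" LOCTYPE="OTHER"/></mets:dmdSec>
  <mets:amdSec ID="AMD1">
    <mets:techMD ID="PREMIS1"><mets:mdWrap MDTYPE="PREMIS"/></mets:techMD>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp USE="zip"/>
    <mets:fileGrp USE="images"/>
    <mets:fileGrp USE="OCR"/>
    <mets:fileGrp USE="hOCR"/>
  </mets:fileSec>
  <mets:structMap TYPE="physical"><mets:div TYPE="volume"/></mets:structMap>
</mets:mets>
"""


def summarize_mets(xml_text: str) -> dict:
    """Summarize the top-level sections of a METS document."""
    root = ET.fromstring(xml_text)
    header = root.find("mets:metsHdr", NS)
    return {
        "header_id": header.get("ID") if header is not None else None,
        "created": header.get("CREATEDATE") if header is not None else None,
        "dmd_sections": len(root.findall("mets:dmdSec", NS)),
        "tech_md": len(root.findall("mets:amdSec/mets:techMD", NS)),
        "file_groups": [
            group.get("USE")
            for group in root.findall("mets:fileSec/mets:fileGrp", NS)
        ],
        "struct_maps": [
            smap.get("TYPE") for smap in root.findall("mets:structMap", NS)
        ],
    }


summary = summarize_mets(SAMPLE_METS)
```

A summary like this offers a quick sanity check that a generated METS file has the expected shape before any deeper validation is run.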

Page 32

Future Directions

Page 33

Future Directions (1)

Page 34

Future Directions (2)

Page 35

Links

• Catalog, Full-text search, and Collection Builder
  – http://catalog.hathitrust.org
• METS and PREMIS implementation
  – http://www.hathitrust.org/preservation
• Technical profile:
  – http://www.hathitrust.org/technology
• Technical flow diagram
  – http://www.hathitrust.org/documents/HathiTrust-PASIG-200910.pdf
  – http://www.hathitrust.org/documents/HathiTrust-PASIG-notes-200910.pdf
• Rights management
  – http://www.hathitrust.org/rights_management
• TRAC
  – http://www.hathitrust.org/accountability

Page 36

Thank you

[email protected]

[email protected]

http://www.hathitrust.org