Goethals Harvard Library's Digital Preservation Repository

Page 1: Goethals Harvard Library's Digital Preservation Repository

Harvard Library’s Digital Preservation Repository, the Digital Repository Service (DRS)

NISO-NFAIS Joint Virtual Conference

Andrea Goethals, HL Digital Preservation Services 12/7/2016

AGENDA

• DRS Overview

• Highlights of Current Work

• Challenges

• Future Work

• Q&A

DRS

Overview

Current work

Challenges

Future work

Q & A

WHAT IS THE DRS?

• Harvard-maintained service for digital content for:
  • long-term preservation
    • keep the content safe
    • keep the information usable long-term on modern platforms
  • delivery to users

• A service - not storage or a tool
  • Includes preservation & IT staff actively monitoring the content and systems
  • Includes documented policies, practices & preservation plans
  • Uses technology & systems but these change over time

KEY POLICIES

• What can be deposited into the DRS

• Who can deposit to the DRS

• Obligations of collection managers

• Responsibilities of DRS staff

• Retention policies

• Discovery & access policies

• Delivery services

• Preservation services

DRS Policy Guide: http://hul.harvard.edu/ois/systems/drs/policyGuide/DRS_Policy_Guide-Printable.pdf

KEY STRATEGIES

• Format guidance (preferred & accepted for deposit)

• Deposit tools with automatic technical characterization

• Content validated against documented content models

• Constant bit integrity checking

• Regular storage refreshes

• Format migrations

• Expert networks
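
The bit integrity checking strategy listed above can be illustrated with a minimal fixity-check sketch. This is not the DRS implementation: the slides do not name a checksum algorithm or a storage layout, so the use of SHA-256, the manifest structure (relative path mapped to expected digest), and the function names `sha256_of` and `check_fixity` are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare stored digests with freshly computed ones.

    `manifest` is a hypothetical structure: relative path -> expected
    SHA-256 hex digest. Returns relative path -> 'ok', 'mismatch' or
    'missing'.
    """
    results = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif sha256_of(target) == expected.lower():
            results[rel_path] = "ok"
        else:
            results[rel_path] = "mismatch"
    return results
```

In a setup like this, a scheduled job would presumably walk each stored copy, run `check_fixity` against the digests recorded at deposit, and flag any `mismatch` or `missing` result for repair from another copy.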

WHAT’S IN THE DRS?

• 63 million files, 204 TB per copy

• Many formats
  • Images, audio, text, digitized books, web sites, documents, biomedical image stacks, email, “opaque objects” and soon video
  • Primarily digitized images and text

[Chart of DRS content by format: Image ~ 2/3; Text ~ 1/3; rest of formats (audio, documents, etc.)]

TECHNOLOGY

• Modular architecture
  • Front ends (deposit, delivery, management)
  • Middleware (APIs)
  • Back end (preservation storage, database, index)

• Combination of:
  • Third-party tools
    • Open-source and/or free software (Linux, Apache tools, Java, SOLR, etc.)
    • Commercial off-the-shelf software (Oracle, LuraTech Image Server)
  • Custom services (for authorization, authentication, persistent naming, viewing/updating metadata, deleting content, etc.)

COLLABORATIVELY MANAGED

DRS Business Owner: Digital Preservation Services

Key Responsibilities:
• Preservation & usage policies, strategies
• Manage & communicate about service
• Represent users’ & content’s preservation needs
• Define high-level enhancement roadmaps
• Preservation plans
• Preservation outreach, consulting, guidelines

DRS Technology Owner: Library Technology Services

Key Responsibilities:
• Technology & security policies, strategies
• Manage hardware, software, development
• Bug fixes & enhancements
• System monitoring & scaling
• Refine roadmaps based on resources
• System testing & documentation
• User support & training on systems

HIGHLIGHTS OF CURRENT WORK

• Metadata migration

• Rollout of video preservation services

METADATA MIGRATION PROJECT

• Last piece of the DRS2 Project (move to the next-generation DRS)

• Previously completed:
  • Transition all infrastructure to standards-based object model and metadata schemas
  • New modern management tools and API-based services layer
  • Support for more formats

• Metadata migration
  • Re-describe all the content at the object and file level
  • Result: more accurate and detailed metadata to support curatorial management and preservation planning

VIDEO SERVICES PROJECT

• Part of a larger project to add DRS support for formats most-requested by curators:
  • video
  • word processing
  • CAD (2D and 3D)
  • disk images
  • RAW camera images
  • (image sequences for scanned film)

• New fast-tracking process working with consultants to help with the analysis

VIDEO SERVICES PROJECT

• DRS support
  • Video content model
  • DRS tool enhancements
    • Deposit (FITS, etc.)
    • Delivery (Streaming Delivery Service based on JWPlayer)

• Media Preservation Services enhancements
  • Purchased video playback technology, professional digitization technology, additional SAN storage
  • Custom routing of data for deposit from MPS SAN to the DRS loader
  • Wrote custom DRS deposit tools
  • Staff training

CHALLENGES

• Long-running back-end projects, for example:
  • Format migrations
  • Architecture migrations
  • Metadata schema migrations
  • Data model migrations

Invisible to most + resource-heavy!

• Support for the long tail of formats
  • Necessary but arguably less-impactful

LONG TAIL

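[Chart: Curator Requests to Add Format Support to the HL DRS (2004-2016). Values, largest to smallest: 19%, 15%, 10%, 8%, 8%, 7%, 6%, 6%, 6%, 5%, 5%, 1%, 1%, 1%, 1%, 1%, 0%. Format labels are not recoverable from the slide text.]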

FUTURE WORK

• Format migrations (RealAudio, SMIL, Kodak PhotoCD)

• Easier deposits
  • User-friendly for humans
  • More automated streams from systems inside and outside of Harvard

• Medium-term preservation

• Full support through delivery for disk images, CAD, email

Q & A