An Introduction to the Merritt Curation Repository University of California Curation Center Team California Digital Library June 9, 2011 UC3 Summer Webinar Series

An Introduction to the Merritt Curation Repository

University of California Curation Center Team

California Digital Library

June 9, 2011

UC3 Summer Webinar Series

First, a word about the webinar series…

• A forum for timely topics of interest to the UC community

– Highlighting projects, services, and developments in the areas of digital preservation, web archiving, and data curation

– Intended to raise awareness of issues, and provide information on useful resources and services available to the UC community

– 2nd and 4th Thursday of the month, and as scheduled, featuring UC3 staff and UC librarians, content managers, and technologists

Teleconference: +1 (866) 740-1260, access code 9879016#

Webconference: http://bit.ly/jdjMAP

First, a word about the webinar series…

• Some logistics…

– Participant phones will be muted during the formal presentation, but we will be monitoring the online chat

– Slides, Q & A, and web and voice recordings will be posted after each presentation

– Schedule available at http://www.cdlib.org/uc3/uc3webinars.html

– Please suggest additional [email protected]

– Take the short survey: http://www.surveymonkey.com/s/XSGWP8R

Now on with the show…

• Today’s topic is an introduction to the Merritt curation repository

– Who is it for?

– What can it do?

– Why use it?

– What does it cost?

– Next steps?

– Q & A

What keeps you up at night?

Are there standards or best practices I should be aware of?

How much will it cost?

How can I transfer my content to an appropriate curation environment?

How do I know my content is safe?

What’s the best strategy to ensure permanent availability?

Do I need to create new derivatives just for preservation purposes?

How can I get a persistent reference to my content?

What if my content needs to evolve over time?

Can I control who can see my content?

I have a good discovery platform; how can I add preservation services?

“There’s an app for that”

Automatic replication and high-availability redundancy

Periodic fixity audit

Simple submission UI/API

METS “feeder” duplicates existing DPR workflow

Model free

No packaging, format, or metadata requirements

Strongly versioned

Integration with EZID and DataCite

Curator-defined access control rules

Modular micro-services “toolkit”

UC3 consultation

Storage at $1.04/GB/year
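
The periodic fixity audit listed above amounts to recomputing each file's digest and comparing it with the value recorded at deposit time. A minimal Python sketch of that idea follows; the function names, the SHA-256 default, and the inventory layout (relative path mapped to hex digest) are illustrative assumptions, not Merritt's implementation.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def audit(recorded, root, algorithm="sha256"):
    """Return the relative paths that are missing or no longer match.

    recorded: {relative path: hex digest} captured when the content was stored
              (a hypothetical inventory format chosen for this sketch).
    root:     directory that holds the stored files.
    """
    failures = []
    for relative_path, expected in recorded.items():
        candidate = Path(root) / relative_path
        if not candidate.is_file() or file_digest(candidate, algorithm) != expected:
            failures.append(relative_path)
    return failures
```

An empty list means every recorded file is present and unchanged; any path returned would need to be repaired from a replica.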

Merritt repository

• Merritt is available for use by all members of the UC community

– Libraries/archives/museums

– ORU/MRUs

– Faculty/staff

• Centrally hosted by UC3/CDL on behalf of the UC community

– Economies of scale

– Shared experience and expertise

Mediated through campus libraries

Modes of use: dark archive

• Pro-active preservation, but no expectation of direct end user access

– Legacy DPR content contributed by campus libraries

– Cultural heritage texts, master images, sound, moving image, data sets

– All DPR content will be automatically migrated to Merritt

Modes of use: bright archive

• Provide preservation and end user access

– NIH Healthy Pathways project on bio-demographics

• Multi-institutional: UC Davis, University of Colorado, University of Virginia, Syddansk University (Denmark)

• Need to restrict access to project partners initially, with eventual public access

Modes of use: bright archive

• Content discovery: search

Modes of use: bright archive

• Content discovery: browse

Modes of use: preservation “back end”

• Preservation only; content discovery/delivery provided by well-known external systems

– Using direct hooks into Merritt to retrieve content

– eScholarship: open access publishing

– Open Context: archaeological data publishing

– Investigating integration with Islandora/Drupal and Alfresco

Modes of use: distributed data grids

• DataONE: “Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it”

More information

• Online help http://merritt.cdlib.org/help

• FAQ http://merritt.cdlib.org/docs/merritt_handout.pdf

• User’s guide http://merritt.cdlib.org/docs/merritt_user_guide.pdf

• UC3 contact http://www.cdlib.org/uc3/[email protected]

Merritt cost model

• UC3 provides technical infrastructure, data center hosting, staff, monitoring, maintenance, enhancements, help, outreach, consultation, etc.

• Contributors are charged only for storage used, at the UC3 recovery rate of $1.04/GB/year

• Developing an “endowment” model: Pay once, preserve forever

• Will soon extend model for non-UC contributors

How does this compare?

• Cost of a physical book in RLF †: $4.62/year

• Cost of a digital book in HathiTrust ‡: $0.15/year

• Cost of a digital book in Merritt: $0.06/year

† Gary Lawrence (2007), Internal analysis, CDL.

‡ Paul Courant and Matthew Nielsen (2010), On the cost of keeping a book, HathiTrust.

Average collection sizes and costs

Collection        Objects   Size       Annual cost
CA DOE reports      8,000   12.0 GB    $12.48
Cal Cultures          420   65.6 GB    $68.22
eScholarship       46,425   118.6 GB   $123.34

A “cost calculator” spreadsheet is available at http://www.cdlib.org/uc3/docs/Merritt-cost-calculator-v3.xlsx
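
The annual costs in the table above follow directly from the stated recovery rate of $1.04/GB/year. The short Python sketch below reproduces that arithmetic; the function name and the half-up rounding rule are assumptions made for illustration and are not taken from the UC3 spreadsheet.

```python
from decimal import Decimal, ROUND_HALF_UP

# Recovery rate quoted in the slides, in dollars per GB per year.
RATE_PER_GB_YEAR = Decimal("1.04")


def annual_storage_cost(size_gb, rate=RATE_PER_GB_YEAR):
    """Annual cost in dollars for size_gb gigabytes, rounded to the cent."""
    cost = Decimal(str(size_gb)) * rate
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Collection sizes taken from the table above.
for name, size_gb in [
    ("CA DOE reports", "12.0"),
    ("Cal Cultures", "65.6"),
    ("eScholarship", "118.6"),
]:
    print(f"{name}: ${annual_storage_cost(size_gb)}")
```

Run as written, this prints $12.48, $68.22, and $123.34, matching the three rows of the table.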

Average ETD size and cost

Campus            ETD titles   Size      Annual cost
Berkeley                 797   12.4 GB   $12.88
Davis                    837   13.0 GB   $13.52
Irvine                   390   6.1 GB    $6.30
Los Angeles              720   11.2 GB   $11.63
Riverside                192   2.9 GB    $3.10
San Diego                558   8.7 GB    $9.02
San Francisco *          560   8.7 GB    $9.05
Santa Barbara            325   5.0 GB    $5.25
Santa Cruz               155   2.4 GB    $2.50

Based on 2009 holdings in ProQuest.

* UCSF based on total ETD holdings in Merritt.

Average research data size and cost

• Almost 50% of all research data is less than 1 GB

Source: Science 331:6018 (February 11, 2011): 692-693 <DOI: 10.1126/science.331.6018.692>

Size             Percentage   Annual cost
< 1 GB           48.3 %       < $1.04
1 – 100 GB       32.0 %       $1.04 – 104.00
100 GB – 1 TB    12.1 %       $104.00 – 1,040.00
> 1 TB           7.6 %        > $1,040.00

Next steps

• UC3 is working with campus partners to determine ongoing development and collection priorities

[Diagram labels: Annotation; Notification; Transformation; Characterization; Fixity; Linked data; Replication; IdM/Authn/Authz; Ingest, Access; Inventory, Queuing; Storage and Identity; Technology watch; Metadata standards; Policy and business model; Data management guidelines; Object and collection modeling; New content acquisition]

Next steps

In production

• Model-free objects

• Submission via UI and API

• Persistent identifiers

• Format identification

• Version provenance

• Automated replication

• Automated fixity audit

• Role-based access control

• Collections

• Semantic index and search

• Object/version/file download

In progress

• Simplified update

• Enhanced characterization (JHOVE2)

• Faceted search and browse (XTF)

• CMS/DAMS-like function (Islandora)

In planning

• Simplified batch

• UCTrust integration

• Linked data

• Transformation

• Notification

• Annotation

• Support for NGTS/DLSTF recommendations

We welcome your feedback on needs and priorities! http://www.cdlib.org/uc3/[email protected]

Simplified update

• Variant form of object update requiring the submission of only the changed components

• Client-side tools to simplify the creation of batch manifests

#%checkm_0.7
#%profile | http://uc3.cdlib.org/registry/ingest/mani
#%prefix | mrt: | http://merritt.cdlib.org/terms#
#%prefix | nfo: | http://www.semanticdesktop.org/onto
#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hash
http://merritt.cdlib.org/samples/goldenDragon.jpg | m
http://merritt.cdlib.org/samples/tumbleBug.jpg | md5
http://merritt.cdlib.org/samples/generalDrapery.jpg |
http://merritt.cdlib.org/samples/generalDrapery.jpg |
#%eof
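
A client-side helper for a simplified update could compare each component's digest with the one recorded for the previous version and list only what changed. The Python sketch below illustrates that idea using the field layout from the sample manifest above (file URL, hash algorithm, hash). The function names, the example.org URLs, and the dictionary inputs are hypothetical, and because the profile and prefix lines are cut off on the slide, they are passed in by the caller rather than hard-coded. It is not the Merritt client tooling.

```python
import hashlib

# Field declaration copied from the sample manifest on the slide.
FIELDS_LINE = "#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hash"


def md5_hex(content: bytes) -> str:
    """Hex MD5 digest of a component's bytes."""
    return hashlib.md5(content).hexdigest()


def changed_components(previous, current):
    """Map URL -> digest for components that are new or whose digest changed.

    previous: {url: md5 hex digest} for the prior version (hypothetical input)
    current:  {url: bytes} for the components as they are now
    """
    changed = {}
    for url, content in current.items():
        digest = md5_hex(content)
        if previous.get(url) != digest:
            changed[url] = digest
    return changed


def build_manifest(rows, header_lines=()):
    """Assemble manifest text; header_lines holds the #%profile / #%prefix lines."""
    lines = ["#%checkm_0.7", *header_lines, FIELDS_LINE]
    lines += [f"{url} | md5 | {digest}" for url, digest in rows.items()]
    lines.append("#%eof")
    return "\n".join(lines)


previous = {
    "http://example.org/a.jpg": md5_hex(b"original a"),
    "http://example.org/b.jpg": md5_hex(b"original b"),
}
current = {
    "http://example.org/a.jpg": b"original a",   # unchanged, so omitted
    "http://example.org/b.jpg": b"edited b",     # changed
    "http://example.org/c.jpg": b"brand new c",  # new
}
print(build_manifest(changed_components(previous, current)))
```

The sketch only covers added and modified components; how a deleted component would be expressed is not shown on the slide and is left out here.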

Enhanced characterization

• JHOVE2 next-generation framework for format-aware characterization http://jhove2.org/

– Automated extraction and inference of extensive technical metadata significant for preservation analysis and planning

"Module": {
  "scope": "ICCModule",
  "Header": {
    "scope": "ICCHeader",
    "ProfileSize": { "unit": "byte", "value": 60960 },
    "ProfileVersionNumber": "4.2.0.0",
    "ProfileDeviceClass_raw": "spac",
    "ProfileDeviceClass_descriptive": "ColorSpace Conversion profile",
    "ColourSpace_raw": "RGB ",
    "ColourSpace_descriptive": "rgbData",
    "ProfileConnectionSpace_raw": "Lab ",
    "ProfileConnectionSpace_descriptive": "labData"

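The characterization fragment above is cut off on the slide. As an illustration of how such output might be consumed, the Python sketch below assumes the fragment is wrapped in an outer object with its braces closed (an assumption, since the full output is not shown) and flattens it into dotted property paths. The `flatten` helper is hypothetical and is not part of JHOVE2.

```python
import json

# The slide's fragment, wrapped and closed so that it parses as JSON.
# The closing braces and the outer object are assumptions, not JHOVE2 output.
SAMPLE = """
{
  "Module": {
    "scope": "ICCModule",
    "Header": {
      "scope": "ICCHeader",
      "ProfileSize": {"unit": "byte", "value": 60960},
      "ProfileVersionNumber": "4.2.0.0",
      "ProfileDeviceClass_raw": "spac",
      "ProfileDeviceClass_descriptive": "ColorSpace Conversion profile",
      "ColourSpace_raw": "RGB ",
      "ColourSpace_descriptive": "rgbData",
      "ProfileConnectionSpace_raw": "Lab ",
      "ProfileConnectionSpace_descriptive": "labData"
    }
  }
}
"""


def flatten(node, prefix=""):
    """Yield (dotted path, value) pairs for every leaf of a nested dict."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, node


properties = dict(flatten(json.loads(SAMPLE)))
for path, value in properties.items():
    print(f"{path} = {value!r}")
```

A flat view like this makes it straightforward to compare technical properties across files when planning preservation actions.
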
Enhanced discovery via XTF

• eXtensible Text Framework http://xtf.cdlib.org/

– CDL developed/supported open source discovery platform

– Robust, scalable faceted search and browse

CMS/DAMS-like function

• Many campuses are looking for CMS/DAMS solutions

• Investigating integration with Islandora to provide a Drupal CMS/DAMS front-end to Merritt

http://islandora.ca/ http://drupal.org/

Questions?

Upcoming webinars

Date/time                       Topic
Wednesday, June 15, 12:30 pm    Data Sharing by Scientists: Practices and Perceptions (Carol Tenopir, Univ. Tennessee; Mike Frame, USGS)
Thursday, June 30, 2:00 pm      The Data Management Planning Tool (DMP Tool) (Trisha Cruse, UC3)
Thursday, July 14, 2:00 pm      Data as Publication (John Kunze, UC3; Catherine Mitchell, CDL Publishing Program)
Thursday, July 28, 2:00 pm      Merritt: Depositing Content and Providing Access
Thursday, August 11, 2:00 pm    DCXL (Data Curation Excel)

http://www.cdlib.org/uc3/uc3webinars.html

Please take the webinar survey http://www.surveymonkey.com/s/XSGWP8R

For more information

UC Curation Center

http://www.cdlib.org/uc3

http://www.cdlib.org/uc3/[email protected]

Stephen Abrams, Lisa Colvin, Patricia Cruse, Scott Fisher, Erik Hetzner, Greg Janée, John Kunze, Margaret Low, David Loy, Mark Reyes, Tracy Seneca, Joan Starr, Marisa Strong, Perry Willett

UC3 webinar series

http://www.cdlib.org/uc3/uc3webinars.html

Merritt repository

http://merritt.cdlib.org/

http://merritt.cdlib.org/help

http://merritt.cdlib.org/docs/merritt_handout.pdf

http://merritt.cdlib.org/docs/merritt_user_guide.pdf