Merritt Repository
Depositing Content and Providing Access
University of California Curation Center Team
California Digital Library
July 28, 2011
UC3 Summer Webinar Series
Merritt summary
• Curation repository
  – Supporting long-term preservation and access
  – Publish, share, preserve, discover, (re-)use
• “Model free”
  – There are no prescriptive requirements for content genre, format, structure, or accompanying metadata
• No service fee (for UC affiliates)
  – Contributors are billed only for storage, $1.04/GB/year

Cost of a physical book in offsite storage   $4.62/year
Cost of a digital book in HathiTrust         $0.15/year
Cost of a digital book in Merritt            $0.06/year
Cost of a dataset in Merritt                 $1.00/year

For more information, review the June 9 webinar: http://www.cdlib.org/uc3/uc3webinars.html
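As a rough illustration of the storage-only pricing above, the annual charge is the stored volume multiplied by the per-gigabyte rate. Only the $1.04/GB/year rate comes from the slide; the function name and the example sizes are illustrative assumptions.

```python
# Sketch: annual Merritt storage charge at the rate quoted on the slide.
RATE_USD_PER_GB_YEAR = 1.04  # from the slide (2011); not necessarily current


def annual_storage_cost(size_gb: float, rate: float = RATE_USD_PER_GB_YEAR) -> float:
    """Return the yearly storage charge in USD for size_gb gigabytes."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return round(size_gb * rate, 2)


if __name__ == "__main__":
    # Hypothetical collection sizes, not figures from the webinar.
    for size in (1, 50, 500):
        print(f"{size} GB -> ${annual_storage_cost(size):.2f}/year")
```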
Master recipe
• Registration (one time) [contributor → UC3, [email protected]]
• Submission [contributor → Merritt]
• Ingest [Merritt]
• Notification [Merritt → contributor]
• Discovery/delivery [consumer → Merritt → consumer]
Registration
• Contact Perry Willett, Merritt service manager [email protected]
Submission
• User interface: manual deposits
• METS feeder: existing DPR workflows
• API: automated deposits
UI submission
• The submission package is always a single file, which may be:
  – For a single object
    • The complete object
    • A multi-file object in a container (zip, gzip, tar.gz)
    • A multi-file object defined by a manifest
  – For a batch of objects
    • A manifest referring to single-file objects
    • A manifest referring to objects in containers
    • A manifest referring to objects defined by manifests
• An opportunity to supply descriptive metadata
Manifest
• A “packing slip” for an object, providing URLs for all of the object’s file components
  – Object manifest
• Algorithm = adler32, crc32, md2, md5, sha1, sha256, sha384, sha512
• See User’s Guide and online help for more information: http://merritt.cdlib.org/

fileURL | hashAlgorithm | hashValue | fileSize | fileName | mimeType...

#%checkm_0.7
#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-ingest-manifest
#%prefix | mrt: | http://merritt.cdlib.org/terms#
#%prefix | nfo: | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#
#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hashValue | nfo:fileSize | nfo:fileLastModified | nfo:fileName | mrt:mimeType
http://merritt.cdlib.org/samples/call911.jpg | md5 | 47d321056e60944a06973...
http://merritt.cdlib.org/samples/call911.txt | md5 | 77fe42b1055bbabe51648...
#%eof
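The object manifest above can be generated by a short script. The following is a minimal sketch, not the Merritt tooling: the header lines are copied from the slide, while the function names, the choice of md5, and the decision to leave nfo:fileLastModified empty are assumptions made for illustration. Check the User's Guide for which columns may be left blank.

```python
import hashlib

# Header lines copied from the slide's sample manifest.
HEADER = [
    "#%checkm_0.7",
    "#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-ingest-manifest",
    "#%prefix | mrt: | http://merritt.cdlib.org/terms#",
    "#%prefix | nfo: | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
    "#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hashValue | nfo:fileSize"
    " | nfo:fileLastModified | nfo:fileName | mrt:mimeType",
]


def manifest_row(url, content, file_name="", mime_type=""):
    """Build one pipe-delimited row for a file whose bytes are `content`."""
    digest = hashlib.md5(content).hexdigest()
    # Column order follows the #%fields line; fileLastModified is left
    # empty here (an assumption, not something the slide states).
    return " | ".join([url, "md5", digest, str(len(content)), "", file_name, mime_type])


def build_object_manifest(files):
    """files: iterable of (url, content_bytes, file_name, mime_type) tuples."""
    rows = [manifest_row(*entry) for entry in files]
    return "\n".join(HEADER + rows + ["#%eof"]) + "\n"


if __name__ == "__main__":
    # Hypothetical file; the URL must be web accessible in a real deposit.
    print(build_object_manifest(
        [("http://example.org/data/readme.txt", b"hello", "readme.txt", "text/plain")]
    ))
```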
Manifest
• A “packing slip” for a batch, providing URLs for all of the objects’ file components
  – Batch manifest
    • Batch of single-file objects
    • Batch of container objects
    • Batch of manifest objects
• An Excel macro is available for automatically generating manifests from spreadsheets: http://merritt.cdlib.org/docs/merrittManifest.xls
• See User’s Guide and online help for more information: http://merritt.cdlib.org/

fileURL | hashAlgorithm | hashValue | fileSize | fileName | primaryID | localID | creator | title | date...
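For depositors who prefer a script to the Excel macro, a batch manifest row can be assembled from a record of named values. This sketch covers only the columns visible on the slide (the list there is truncated with "..."), and it does not emit the batch manifest header, which the slide does not show. The function name and sample record are hypothetical.

```python
# Column order as shown on the slide; the slide truncates the list, so
# any further columns are deliberately not guessed at here.
BATCH_FIELDS = [
    "fileURL", "hashAlgorithm", "hashValue", "fileSize", "fileName",
    "primaryID", "localID", "creator", "title", "date",
]


def batch_row(record):
    """Return one pipe-delimited batch manifest row from a dict.

    Missing columns are written as empty strings.
    """
    unknown = set(record) - set(BATCH_FIELDS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    values = []
    for field in BATCH_FIELDS:
        value = str(record.get(field, ""))
        if "|" in value:
            # A literal pipe would be read as a column separator.
            raise ValueError(f"{field} must not contain '|'")
        values.append(value)
    return " | ".join(values)


if __name__ == "__main__":
    # Hypothetical object in a container file.
    print(batch_row({
        "fileURL": "http://example.org/batch/object1.zip",
        "creator": "Doe, Jane",
        "title": "Sample object",
        "date": "2011",
    }))
```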
Metadata
• Submission form
• Batch manifest
• Object component: mrt-erc.txt

erc:
who: Blaine, Tegan Woodward
what: Continuous measurements of atmospheric argon/nitrogen ...
when: 2005
where: ark:/20775/bb21509964

Dublin Kernel   Dublin Core element
who             creator               Responsible person or party
what            title                 Content description
when            date                  Lifecycle-meaningful date
where           identifier            Locally-meaningful identifier

http://dublincore.org/groups/kernel/spec/
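The kernel-to-Dublin-Core mapping in the table can be applied mechanically to an mrt-erc.txt file. The sketch below assumes one "label: value" pair per line, as in the sample record, and does not handle continuation lines or other ANVL features. The function name is hypothetical.

```python
# Mapping taken from the table on the slide.
KERNEL_TO_DC = {
    "who": "creator",
    "what": "title",
    "when": "date",
    "where": "identifier",
}


def parse_erc(text):
    """Map the who/what/when/where lines of an ERC record to Dublin Core names."""
    record = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "erc:":
            continue  # skip blanks and the record marker
        # Split on the first colon only, so values such as ARKs keep theirs.
        label, sep, value = line.partition(":")
        if sep and label in KERNEL_TO_DC:
            record[KERNEL_TO_DC[label]] = value.strip()
    return record


SAMPLE = """erc:
who: Blaine, Tegan Woodward
what: Continuous measurements of atmospheric argon/nitrogen ...
when: 2005
where: ark:/20775/bb21509964
"""

if __name__ == "__main__":
    for element, value in parse_erc(SAMPLE).items():
        print(f"{element}: {value}")
```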
METS feeder
• METS must conform to a profile documented in the CDL Guidelines for Digital Objects: http://www.cdlib.org/services/dsc/contribute/docs/GDO.pdf
  – METS, all referenced file components, and manifest must be web accessible
  – The Merritt IP address can be provided for configuring firewall rules
• Feeder manifest
  http://url/path/mets.xml
  http://url/path/mets.xml
  ...
• Submission
  http://feeder.cdlib.org/?userID=id&authCode=passwd&accessGroupID=collection&manifestURL=manifest
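The feeder submission URL can be assembled from its four query parameters. This is a sketch only: the parameter names and base URL come from the slide, while the function name, the example values, and the choice to percent-encode the manifest URL are assumptions.

```python
from urllib.parse import urlencode

FEEDER_BASE = "http://feeder.cdlib.org/"  # base URL as shown on the slide


def feeder_url(user_id, auth_code, access_group_id, manifest_url):
    """Return the METS feeder submission URL for one feeder manifest."""
    query = urlencode({
        "userID": user_id,
        "authCode": auth_code,
        "accessGroupID": access_group_id,
        # urlencode percent-encodes the nested URL so that its own
        # ':' and '/' characters cannot be misread as part of the query.
        "manifestURL": manifest_url,
    })
    return f"{FEEDER_BASE}?{query}"


if __name__ == "__main__":
    # Placeholder credentials and a hypothetical manifest location.
    print(feeder_url("id", "passwd", "collection",
                     "http://example.org/feeder-manifest.txt"))
```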
API submission
Field              Requirement  Value
filename           optional     File name
file               required     File contents
type               optional     File type: file, container, object-manifest, batch-manifest, container-batch-manifest, single-file-batch-manifest
profile            required     Profile (supplied by UC3)
primaryIdentifier  optional     Primary identifier (ARK)
localIdentifier    optional     Local identifier
digestType         optional     Message digest type: adler-32, crc-32, md2, md5, sha-1, sha-256, sha-384, sha-512
digestValue        optional     Message digest value (hexadecimal encoded)
creator            optional     Creator
title              optional     Title
date               optional     Date
note               optional     Descriptive note
responseForm       optional     Response form: anvl, json, xhtml, xml
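A client can check a set of form fields against the table before sending anything. The sketch below encodes only what the table states (which fields are required and which values are listed for type, digestType, and responseForm). The function name and error wording are hypothetical, and the service itself may apply further rules.

```python
# Allowed values transcribed from the field table.
ALLOWED_TYPES = {
    "file", "container", "object-manifest",
    "batch-manifest", "container-batch-manifest", "single-file-batch-manifest",
}
ALLOWED_DIGEST_TYPES = {
    "adler-32", "crc-32", "md2", "md5", "sha-1", "sha-256", "sha-384", "sha-512",
}
ALLOWED_RESPONSE_FORMS = {"anvl", "json", "xhtml", "xml"}
REQUIRED_FIELDS = ("file", "profile")


def check_ingest_fields(fields):
    """Return a list of problems found in `fields`; empty if none."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not fields.get(name):
            problems.append(f"missing required field: {name}")
    enumerated = (
        ("type", ALLOWED_TYPES),
        ("digestType", ALLOWED_DIGEST_TYPES),
        ("responseForm", ALLOWED_RESPONSE_FORMS),
    )
    for name, allowed in enumerated:
        value = fields.get(name)
        if value is not None and value not in allowed:
            problems.append(f"unsupported {name}: {value}")
    return problems


if __name__ == "__main__":
    # "demo_profile" is a placeholder; real profiles are supplied by UC3.
    print(check_ingest_fields({"file": b"...", "profile": "demo_profile", "type": "zip"}))
```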
API submission
POST /object/ingest HTTP/1.1
Host: merritt.cdlib.org
Content-type: multipart/form-data; boundary=boundary

--boundary
Content-disposition: form-data; name="file"; filename="filename"

file
--boundary
Content-disposition: form-data; name="type"

type
--boundary
Content-disposition: form-data; name="profile"

profile
--boundary
...
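The request body above can be produced with the Python standard library. This is a minimal sketch under stated assumptions: the endpoint and field names come from the slides, the per-part Content-Type header and the closing "--boundary--" delimiter follow the general multipart/form-data convention, and the function names are hypothetical. Credentials are omitted, and the request object is built but never sent.

```python
import urllib.request
import uuid

INGEST_URL = "http://merritt.cdlib.org/object/ingest"  # endpoint from the slide


def encode_multipart(fields, file_name, file_bytes):
    """Return (content_type, body) for a multipart/form-data request.

    file_name is assumed to contain no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = bytearray()
    # File part first, mirroring the request shown on the slide.
    body += f"--{boundary}".encode() + crlf
    body += f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode() + crlf
    body += b"Content-Type: application/octet-stream" + crlf + crlf
    body += file_bytes + crlf
    # Then one part per plain form field (type, profile, creator, ...).
    for name, value in fields.items():
        body += f"--{boundary}".encode() + crlf
        body += f'Content-Disposition: form-data; name="{name}"'.encode() + crlf + crlf
        body += str(value).encode("utf-8") + crlf
    body += f"--{boundary}--".encode() + crlf  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", bytes(body)


def build_ingest_request(fields, file_name, file_bytes, url=INGEST_URL):
    """Build (but do not send) a POST request for the ingest endpoint."""
    content_type, body = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


if __name__ == "__main__":
    # "demo_profile" is a placeholder; real profiles are supplied by UC3.
    request = build_ingest_request(
        {"type": "file", "profile": "demo_profile"}, "readme.txt", b"hello"
    )
    print(request.get_method(), request.full_url, len(request.data), "bytes")
```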
API submission
• cURL: http://curl.haxx.se/

% curl -s -u user:password -F "file=@manifest" -F "type=manifest-type" -F "profile=profile" -F "localIdentifier=identifier" -F "creator=creator" -F "title=title" http://merritt.cdlib.org/object/ingest
Ingest
• Primary identifier
  – ARK (required; auto-generated if not supplied)
  – DOI (can optionally be requested)
• Validation
• Characterization
• SIP → AIP
  ISO 14721, Open Archival Information System (OAIS)
Notification
• You will receive two separate email notifications
  – Initial notification that we have received your submission, and that it is queued for subsequent processing
  – Final notification that we have fully processed your submission
    • UC3’s preservation commitment starts at the time of final notification
Initial notification
From: UC3 Merritt Support [mailto:[email protected]]
Sent: Thursday, July 14, 2011 3:28 PM
To: Stephen Abrams
Subject: Completion of submission

Completion of submission - Notification
- Submission ID: bid-4ed4bf45-aa78-4da7-bb65-63b125d88150
- Job(s):
    Number of pending job(s): 1
    Number of completed job(s): 0
    Number of failed job(s): 0
- User agent: slabrams
- Submission date: 2011-07-14T15:27:41-07:00
- Status: QUEUED

Completion of submission - Notification Report
- Submission ID: bid-4ed4bf45-aa78-4da7-bb65-63b125d88150
- Job(s):
    - Job ID: jid-3498bef6-e296-429d-b652-da1f35f8bc04
    - Primary ID: ark:/20775/bb21509964
    - Local ID: http://libraries.ucsd.edu/ark:/20775/bb21509964;b4946677;umi-ucsd-1040
    - Filename: manifest2.txt
    - Object title: Continuous measurements of atmospheric argon/nitrogen as a tracer of air-sea heat flux : models, methods, and data
    - Object creator: Blaine, Tegan Woodward
    - Object date: 2005
    - Status: PENDING
- User agent: slabrams
- Submission date: 2011-07-14T15:27:41-07:00
- Status: QUEUED

With attachment, bid-4ed4bf45-aa78-4da7-bb65-63b125d88150.txt
Final notification
From: UC3 Merritt Support [mailto:[email protected]]
Sent: Thursday, July 14, 2011 3:28 PM
To: Stephen Abrams
Subject: Completion of ingest

Notification Summary
- Submission ID: bid-4ed4bf45-aa78-4da7-bb65-63b125d88150
- Job(s):
    Number of pending job(s): 0
    Number of completed job(s): 1
    Number of failed job(s): 0
- User agent: slabrams
- Queue Priority: 06
- Submission date: 2011-07-14T15:27:41-07:00
- Completion date: 2011-07-14T15:27:53-07:00
- Status: COMPLETED

With attachment, bid-4ed4bf45-aa78-4da7-bb65-63b125d88150.txt

Completion of ingest - Notification Report
- Submission ID: bid-4ed4bf45-aa78-4da7-bb65-63b125d88150
- Job(s):
    - Job ID: jid-3498bef6-e296-429d-b652-da1f35f8bc04
    - Primary ID: ark:/99999/fk4vm4kg6
    - Local ID: ark:/20775/bb21509964
    - Version: 3
    - Filename: manifest2.txt
    - Object title: Continuous measurements of atmospheric argon/nitrogen as a tracer of air-sea heat flux : models, methods, and data
    - Object creator: Blaine, Tegan Woodward
    - Object date: 2005
    - Object state: http://store-stage.cdlib.org:35121/state/2111/ark%3A%2F99999%2Ffk4vm4kg6?t=xhtml
    - Submission date: 2011-07-14T15:27:46-07:00
    - Completion date: 2011-07-14T15:27:53-07:00
    - Status: COMPLETED
- User agent: slabrams
- Queue Priority: 06
- Submission date: 2011-07-14T15:27:41-07:00
- Completion date: 2011-07-14T15:27:53-07:00
- Status: COMPLETED
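Because the preservation commitment begins at final notification, a depositor running automated deposits may want to confirm completion by script. The sketch below assumes the report text has one "- Label: value" field per line, as in the sample notifications. The function names are hypothetical, and the real attachment format may differ.

```python
def parse_report(text):
    """Collect 'Label: value' lines from a notification into a dict of lists.

    Labels such as Status can occur once per job and once for the whole
    submission, so every value is kept.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("-").strip()
        # Split on the first colon only; timestamps and ARKs contain more.
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(label.strip(), []).append(value.strip())
    return fields


def ingest_completed(text):
    """True if every reported status is COMPLETED and no job is pending or failed."""
    fields = parse_report(text)
    statuses = fields.get("Status", [])
    pending = int(fields.get("Number of pending job(s)", ["0"])[0])
    failed = int(fields.get("Number of failed job(s)", ["0"])[0])
    return (
        bool(statuses)
        and all(status == "COMPLETED" for status in statuses)
        and pending == 0
        and failed == 0
    )


if __name__ == "__main__":
    summary = (
        "- Job(s):\n"
        "    Number of pending job(s): 0\n"
        "    Number of completed job(s): 1\n"
        "    Number of failed job(s): 0\n"
        "- Status: COMPLETED\n"
    )
    print(ingest_completed(summary))
```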
Discovery/delivery
• Search
• Browse
Coming soon …
• Enhanced characterization
  – JHOVE2: http://jhove2.org/
• Faceted search/browse
  – XTF: http://xtf.cdlib.org/
• Investigation of CMS/DAMS-like function through integration with …
  – Islandora/Drupal (in cooperation with UCLA)
  – Alfresco (in cooperation with UCB)
  – Omeka (in cooperation with UCSC)
Questions?
Upcoming webinars
Date/time                       Topic
Thursday, August 11, 2:00 pm    EZID: Create and Manage Persistent Identifiers (Joan Starr, UC3/CDL)
Thursday, August 25, 2:00 pm    DCXL (Data Curation Excel) (Carly Strasser, UC3/CDL)
Thursday, Sept. 22, 2:00 pm     Data Management Planning Tool (Patricia Cruse/Tracy Seneca, UC3/CDL)

http://www.cdlib.org/uc3/uc3webinars.html
For more information
UC Curation Center
http://www.cdlib.org/uc3
http://www.cdlib.org/uc3/[email protected]

Stephen Abrams    David Loy
Lisa Colvin       Mark Reyes
Patricia Cruse    Abhishek Salve
Scott Fisher      Tracy Seneca
Erik Hetzner      Joan Starr
Greg Janée        Carly Strasser
John Kunze        Marisa Strong
Margaret Low      Perry Willett

UC3 webinar series
http://www.cdlib.org/uc3/uc3webinars.html

Merritt repository
http://merritt.cdlib.org/
http://merritt.cdlib.org/help
http://merritt.cdlib.org/docs/merritt_handout.pdf
http://merritt.cdlib.org/docs/merritt_user_guide.pdf