NISO Webinar: Metadata for Preservation:
A Digital Object's Best Friend
February 13, 2013
Speakers: Rebecca Guenther, Amy Kirchhoff
http://www.niso.org/news/events/2013/webinars/preservation
Metadata for Preservation: A Digital Object’s Best Friend
Introduction to Preservation Metadata
Rebecca Squire Guenther
Library of Congress, NDMSO and Consultant
NISO Webinar, Feb. 13, 2013
Digital preservation: imperative and challenge
More and more of scholarly and cultural record exists in digital form; steps must be taken to secure its long-term future
Groups such as Digital Preservation Coalition, NDIIPP and National Digital Stewardship Alliance have made significant progress in raising awareness about digital preservation imperative
Gradual shift in focus from articulating problem to solving it …
• Not so much “Why is digital preservation important” anymore; rather, “What must be done to achieve preservation objectives?”
Many practical challenges in implementing reliable, sustainable digital preservation programs
One key challenge: preservation metadata
Metadata and preservation metadata
METADATA
“Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource”
PRESERVATION METADATA
“Metadata that supports and documents the digital preservation process”
Preservation metadata includes:
Provenance:
• Who has had custody/ownership of the digital object?
Authenticity:
• Is the digital object what it purports to be?
Preservation Activity:
• What has been done to preserve it?
Technical Environment:
• What is needed to render and use it?
Rights Management:
• What IPR must be observed?
Makes digital objects self-documenting across time
[Figure: Content accompanied by Preservation Metadata over time: 10 years on, 50 years on, forever!]
Basics of preservation metadata
Digital preservation concentrates on well-designed formal systems based on digital library and trusted digital repository concepts
Information about what needs to be preserved, and how, is part of any preservation system
Since items aren’t on shelves, metadata is the only mechanism for actually keeping or finding anything
3 concepts are important
• Metadata about preservation of digital objects
• Preservation of metadata itself to ensure that content and metadata are preserved
• Use of metadata in a trusted digital repository
PREMIS Data Dictionary
May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group
• Version 2.0 (April 2008)
• Version 2.1 (January 2011)
• Version 2.2 (July 2012)
• Version 3.0 expected 2013
Includes:
• Data Dictionary
• Context/assumptions
• Data model
• Usage examples
• Conformance
• XML schema to support implementation
Data Dictionary:
• Core set of implementable, broadly applicable preservation metadata semantic units, supported by guidelines and recommendations for management and use
What does PREMIS cover?
Administrative metadata that supports the digital preservation process
Provides information to help manage a resource for preservation purposes
• Technical characteristics
• Information about actions on an object
• Relationships (structural and derivative)
  • Structural: indicates how compound objects are put together
  • Derivative: results of common preservation actions
• Rights metadata associated with preservation
In OAIS terms:
• Metadata as part of SIP, AIP or DIP
• Fits into Preservation Description Information (Reference, Context, Provenance, Fixity)
What PREMIS is and is not
What PREMIS is:
• Common data model for organizing/thinking about preservation metadata
• A checklist for core metadata in a repository
• Guidance for local implementations
• Standard for exchanging information packages between repositories
What PREMIS is not:
• Out-of-the-box solution: need to instantiate as metadata elements in repository system
• All needed metadata: excludes business rules, format-specific technical metadata, descriptive metadata for access, non-core preservation metadata
• Lifecycle management of objects outside repository
• Rights management: limited to permissions regarding actions taken within repository
PREMIS Data Model
Intellectual Entities
Objects
Rights Statements
Agents
Events
Intellectual Entities
Examples:
• Rabbit Run by John Updike (a book)
• “Maggie at the beach” (a photograph)
• The Library of Congress Website (a website)
• The Library of Congress: American Memory Home page (a web page)
Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)
May include other Intellectual Entities (e.g. a website that includes a web page)
**Has one or more digital representations**
Previously not fully described in PREMIS DD, but will be in scope in version 3.0
Objects
Examples:
• chapter1.pdf (a file)
• chapter1.pdf + chapter2.pdf + chapter3.pdf (representation of a book w/3 chapters)
• TIFF file containing header and 2 images (2 bitstreams (images), each with own set of properties (semantic units): e.g., identifiers, technical metadata, inhibitors, … )
Discrete unit of information in digital form
**Objects are what repository actually preserves**
Three types of Object:
• FILE: named and ordered sequence of bytes that is known by an operating system
• REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity
• BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be stand-alone file)
Intellectual entity will become another level of object
Object Example: book in two versions
Intellectual Entity: Da Vinci Code by Dan Brown
Representation 1: Page image version
• File 1: page1.tiff
• File 2: page2.tiff
• File N: pageN.tiff
• File N+1: METS.xml
Representation 2: ebook version
• File 1: book.lit
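The hierarchy in this example can be sketched in a few lines of Python. This is only an illustration: the class names, the page count of 3 standing in for N, and the placement of METS.xml in the page image representation are assumptions, not part of the PREMIS Data Dictionary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    name: str  # e.g. "page1.tiff"


@dataclass
class Representation:
    label: str
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    title: str
    representations: List[Representation] = field(default_factory=list)


n_pages = 3  # illustrative value standing in for N on the slide

# Representation 1: page images plus a structural metadata file (assumed placement)
page_images = Representation(
    label="Page image version",
    files=[File(f"page{i}.tiff") for i in range(1, n_pages + 1)] + [File("METS.xml")],
)

# Representation 2: a single ebook file
ebook = Representation(label="ebook version", files=[File("book.lit")])

book = IntellectualEntity("Da Vinci Code by Dan Brown", [page_images, ebook])

# Count the files that make up each representation
file_counts = {r.label: len(r.files) for r in book.representations}
print(file_counts)
```

One Intellectual Entity carries two Representations, and each Representation is the complete set of Files needed to render it.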
Events
Examples:
• Validation Event: use JHOVE tool to verify that chapter1.pdf is a valid PDF file
• Ingest Event: transform an OAIS SIP into an AIP
• Migration Event: create a new version of an Object in an up-to-date format
An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository
Helps document digital provenance. Can track history of Object through the chain of Events that occur during the Object's lifecycle
Determining which Events should be recorded, and at what level of granularity, is up to the repository
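As a rough sketch of tracking an Object's history through its chain of Events, the Python below filters a list of event records by object and sorts them by time. The record layout, dates, and object names are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical event records, deliberately stored out of order
events = [
    {"eventType": "migration", "eventDateTime": "2012-06-01T09:30:00",
     "linkingObjectIdentifier": "chapter1.pdf"},
    {"eventType": "ingest", "eventDateTime": "2010-03-15T14:00:00",
     "linkingObjectIdentifier": "chapter1.pdf"},
    {"eventType": "validation", "eventDateTime": "2010-03-15T14:05:00",
     "linkingObjectIdentifier": "chapter1.pdf"},
    {"eventType": "ingest", "eventDateTime": "2011-01-10T08:00:00",
     "linkingObjectIdentifier": "chapter2.pdf"},
]


def history(object_id, event_records):
    """Return the Events for one Object in chronological order."""
    related = [e for e in event_records if e["linkingObjectIdentifier"] == object_id]
    return sorted(related, key=lambda e: datetime.fromisoformat(e["eventDateTime"]))


chain = [e["eventType"] for e in history("chapter1.pdf", events)]
print(chain)
```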
Agents
Examples:
• Martha Anderson (a person)
• Library of Congress (an organization)
• Dark Archive in the Sunshine State implementation (a system)
• JHOVE version 1.0 (a software program)
Person, organization, or software program/system associated with an Event or a Right (permission statement)
Agents are associated only indirectly to Objects through Events or Rights
Not defined in detail in PREMIS DD; not considered core preservation metadata beyond identification
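The indirect link between Agents and Objects can be illustrated with a small lookup: to find the Agents connected to an Object, one walks through the Events that mention it. The identifiers and dictionary layout below are hypothetical, a sketch rather than a prescribed PREMIS structure.

```python
# Hypothetical agent records keyed by a local identifier
agents = {
    "agent-1": {"agentName": "JHOVE version 1.0", "agentType": "software"},
    "agent-2": {"agentName": "Martha Anderson", "agentType": "person"},
}

# Hypothetical events: each links one Object to one Agent
events = [
    {"eventType": "validation", "linkingObjectIdentifier": "chapter1.pdf",
     "linkingAgentIdentifier": "agent-1"},
    {"eventType": "ingest", "linkingObjectIdentifier": "chapter2.pdf",
     "linkingAgentIdentifier": "agent-2"},
]


def agents_for_object(object_id):
    """Return names of Agents tied to an Object through its Events."""
    agent_ids = {
        e["linkingAgentIdentifier"]
        for e in events
        if e["linkingObjectIdentifier"] == object_id
    }
    return sorted(agents[a]["agentName"] for a in agent_ids)


names = agents_for_object("chapter1.pdf")
print(names)
```

Note that the object record itself never names an Agent; the relationship exists only because an Event references both.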
Rights Statements
Example:
• Priscilla Caplan grants FCLA digital repository permission to make three copies of metadata_fundamentals.pdf for preservation purposes.
An agreement with a rights holder that grants permission for the repository to undertake an action(s) associated with an Object(s) in the repository.
Not a full rights expression language; focuses exclusively on permissions that take the form:
• Agent X grants Permission Y to the repository in regard to Object Z.
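The "Agent X grants Permission Y in regard to Object Z" pattern can be sketched as a simple record plus a lookup. The class name, field names, and the act value "replicate" are assumptions made for illustration; a repository would choose its own vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsStatement:
    granting_agent: str    # Agent X
    act: str               # Permission Y (hypothetical vocabulary)
    object_id: str         # Object Z
    restriction: str = ""  # free-text limit on the permission


statements = [
    RightsStatement(
        granting_agent="Priscilla Caplan",
        act="replicate",
        object_id="metadata_fundamentals.pdf",
        restriction="no more than three copies, for preservation purposes",
    )
]


def is_permitted(act, object_id, rights):
    """True if some statement grants this act for this Object."""
    return any(s.act == act and s.object_id == object_id for s in rights)


can_copy = is_permitted("replicate", "metadata_fundamentals.pdf", statements)
can_publish = is_permitted("disseminate", "metadata_fundamentals.pdf", statements)
print(can_copy, can_publish)
```

The sketch only answers whether a permission exists; enforcing the restriction text is left to the repository's own business rules, which the slides place outside PREMIS.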
Technical metadata pertaining to objects
Object identifier
Preservation level
Significant characteristics
Object characteristics
• fixity
• format
• size
• creating application
• inhibitors
• object characteristics extension
Creating application
Original name
Storage
Environment
• software
• hardware
Digital signatures
Relationships
Linking event identifier
Linking permission statement identifier
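Two of the object characteristics listed above, fixity and size, are easy to derive mechanically. The sketch below uses Python's standard hashlib; the choice of SHA-256 and the dictionary layout are assumptions, though messageDigestAlgorithm and messageDigest follow the names PREMIS uses under fixity.

```python
import hashlib


def object_characteristics(content: bytes, algorithm: str = "sha256") -> dict:
    """Compute fixity and size for a sequence of bytes."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return {
        "fixity": {
            "messageDigestAlgorithm": algorithm,
            "messageDigest": digest,
        },
        "size": len(content),  # size in bytes
    }


def fixity_matches(content: bytes, recorded: dict) -> bool:
    """Recompute the digest and compare it with the recorded value."""
    fixity = recorded["fixity"]
    current = hashlib.new(fixity["messageDigestAlgorithm"], content).hexdigest()
    return current == fixity["messageDigest"]


original = b"hello"
recorded = object_characteristics(original)
print(recorded["size"], fixity_matches(original, recorded))
```

A later fixity check simply recomputes the digest over the stored bytes; a mismatch signals that the Object has changed since the value was recorded.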
Semantic units pertaining to Events: provenance and preservation activity
Event identifier
Event type (e.g. capture, creation, validation, migration, fixity check)
Event dateTime
Event detail
Event outcome
Event outcome detail
Linking agent identifier
Linking object identifier
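To show how these semantic units might look when serialized, the sketch below builds a single event element with Python's xml.etree.ElementTree, using the PREMIS version 2 namespace. The identifier type "local" and all values are hypothetical, and the output has not been validated against the PREMIS XML schema.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_event(event_id, event_type, date_time, outcome, agent_id, object_id):
    event = ET.Element(q("event"))

    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "local"  # assumed type
    ET.SubElement(ident, q("eventIdentifierValue")).text = event_id

    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = date_time

    info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(info, q("eventOutcome")).text = outcome

    agent = ET.SubElement(event, q("linkingAgentIdentifier"))
    ET.SubElement(agent, q("linkingAgentIdentifierType")).text = "local"
    ET.SubElement(agent, q("linkingAgentIdentifierValue")).text = agent_id

    obj = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(obj, q("linkingObjectIdentifierType")).text = "local"
    ET.SubElement(obj, q("linkingObjectIdentifierValue")).text = object_id
    return event


event = build_event(
    "event-001", "validation", "2013-02-13T10:00:00",
    "valid", "jhove-1.0", "chapter1.pdf",
)
xml_text = ET.tostring(event, encoding="unicode")
print(xml_text)
```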
Semantic units pertaining to Rights
Rights Statement
• Rights Statement Identifier
• Rights Basis
• Copyright Information
• License Information
• Statute Information
• Other Rights Information
• Rights Granted
  • act
  • restriction
  • termOfGrant
  • rightsGranted
• Linking Object Identifier
• Linking Agent Identifier
rightsExtension
Semantic units pertaining to Agents
Agent Identifier
Agent Name
Agent Type
Agent Note
Agent Extension
Linking Event Identifier
Linking Rights Identifier
The State of PREMIS
de facto standard for preservation metadata; in some countries mandated for cultural heritage repositories
Won the Digital Preservation Award (2005) and was shortlisted for the DPC Decennial Award for outstanding contribution to digital preservation (2012)
PREMIS implementations are appearing in many places, many contexts, many forms
Experimentation has led to changes in the data dictionary and schema
PREMIS Implementation Fairs: attempts to consolidate implementation experiences, issues, best practices
Key features of PREMIS
Developed through international consensus-making process
• Mobilized community to address shared need
• Shared solution to a shared need
Implementation neutral
• Makes no assumptions about technology
• Can be flexibly adapted for use across all sorts of institutions, digital preservation contexts, repository systems
• Allows for extensibility
Supported by Maintenance Activity and Editorial Committee, under auspices of US Library of Congress
• PREMIS is sustained, maintained, and evolved
Extensive outreach to implementer community
• Tutorials, guides, implementation fairs, PIG Forum
• “Support system” in place for PREMIS implementers
PREMIS Maintenance Activity
Web site:
• Permanent Web presence, hosted by Library of Congress
• Central destination for PREMIS-related info, announcements, resources
• Home of the PREMIS Implementers’ Group (PIG) discussion list
PREMIS Editorial Committee:
• Set directions/priorities for PREMIS development
• Coordinate future revisions of Data Dictionary and XML schema
• Promote implementation
http://www.loc.gov/standards/premis/
Implementation resources
Tools:
• XML schema
• PREMIS-in-METS toolbox <http://pim.fcla.edu>
• Controlled vocabularies at http://id.loc.gov
• RDF/OWL ontology for use as Linked Data
Guidelines:
• PREMIS conformance statement
• PREMIS & METS guidelines
Community Working groups on special topics
Others:
• Understanding PREMIS (available in multiple languages)
• PIG Forum
• Implementation Registry
• Tools Registry
Some implementers …
DAITSS (Florida): a preservation repository for the use of the libraries of the public universities of Florida.
Ex Libris Rosetta: a commercial digital preservation system supporting acquisition, validation, ingest, storage, management, preservation and dissemination of different types of digital objects
National Digital Newspaper Program
Archivematica: comprehensive open-source digital preservation system
National Archives of Sweden, National Archives of Scotland
Carolina Digital Repository: repository for material in electronic formats produced by members of the University of North Carolina at Chapel Hill community.
British Library electronic journal archiving project
For more information see:
• http://www.loc.gov/premis/premis-registry.html
Impact
De facto international standard for preservation metadata
• Part of permanent infrastructure supporting digital preservation
• ISO standardization being considered
Wide applicability means benefits from PREMIS extend to entire digital preservation community
Ongoing work to revise/update Data Dictionary and create new supporting resources
• PREMIS is a dynamic resource that continues to generate new sources of value to implementer community
Stood the test of time:
• Seven years after initial release, is now indispensable part of digital preservation implementations around the world
• Not surpassed or replaced by other standard or resource
URLs, etc.
PREMIS Maintenance Activity: http://www.loc.gov/standards/premis/
PREMIS Data Dictionary for Preservation Metadata: http://www.loc.gov/standards/premis/v2/premis-2-2.pdf
Understanding PREMIS: http://www.loc.gov/standards/premis/understanding-premis.pdf
PREMIS Implementation Registry: http://www.loc.gov/standards/premis/premis-registry.php
PREMIS Implementers Group list: http://listserv.loc.gov/listarch/pig.html
Metadata for Preservation
A digital object’s best friend
Implementation!
Amy Kirchhoff
Archive Service Product Manager
Standards
framework for thinking
interchange specification
[The PREMIS documentation has an] emphasis on the need to know rather than the need to record or represent in any particular way.
Content Type
Content Set(s)
Archival Unit(s)
Content Unit(s)
Functional Unit(s)
Storage Unit(s)
Intellectual Entities
Objects
Rights Statements
Agents
Events
Digital preservation is the series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability and accessibility of content over the very long term.
Dublin Core
DIDL (from MPEG-21)
METS
OAIS
…
Experience
1. Content model
2. Metadata elements
3. Registries
Intellectual Entities
Objects
Rights Statements
Agents
Events
Identifiers
Books
Journals
Digitized Newspapers
Digitized Documents
Supplied Files
Archive Management Documents
Content Type
Content Set(s)
Archival Unit(s)
Content Unit(s)
Functional Unit(s)
Storage Unit(s)
Descriptive Metadata
Technical Metadata
Events Metadata
PMD
a thing of beauty
Semantic Units for Objects
1.1 objectIdentifier
1.2 objectCategory
1.3 preservationLevel
1.4 significantProperties
1.5 objectCharacteristics
1.6 originalName
1.7 storage
1.8 environment
1.9 signatureInformation
1.10 relationship
1.11 linkingEventIdentifier
1.12 linkingIntellectualEntityIdentifier
1.13 linkingRightsStatementIdentifier
Registries
Semantic Units for Events
2.1 eventIdentifier
2.2 eventType
2.3 eventDateTime
2.4 eventDetail
2.5 eventOutcomeInformation
2.6 linkingAgentIdentifier
2.7 linkingObjectIdentifier
Processing Records
Event Sets
Events
Some Portico Events
Edit Descriptive Metadata
Check Descriptive Metadata
Generate Descriptive Metadata
Ingest Into Archive
Create File
Generate Technical Metadata
Set Preservation Level
Generate Fixity
Portico Event Elements
Timestamp
Rationale
InputList
ArgList
Output
ToolWrapper
Tool Component List
Outcome
OutcomeDetailList
Semantic Units for Agents
3.1 agentIdentifier
3.2 agentName
3.3 agentType
3.4 agentNote
3.5 agentExtension
3.6 linkingEventIdentifier
3.7 linkingRightsStatementIdentifier
Semantic Units for Rights
4.1 rightsStatement
4.1.1 rightsStatementIdentifier
4.1.2 rightsBasis
4.1.3 copyrightInformation
4.1.4 licenseInformation
4.1.5 statuteInformation
4.1.6 otherRightsInformation
4.1.7 rightsGranted
4.1.8 linkingObjectIdentifier
4.1.9 linkingAgentIdentifier
4.2 rightsExtension
Easy
For Portico
For the moment
Questions?
All questions will be posted with presenter answers on the NISO website following the webinar:
http://www.niso.org/news/events/2013/webinars/preservation
Thank you for joining us today. Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU