2
Agenda
• Preservation Metadata
• PREMIS Overview
• Data Dictionary Conventions
• PREMIS Data Model
• The Data Dictionary
• PREMIS in Use
4
• Metadata is often defined as “data about data”.
• It records one or more characteristics of the data, such as:
– The data’s name, description, purpose, creation date and time, creator, and other basic information
• For example:
– Library catalogues: a small card holds a book’s title, author, subject, category, and shelf, which together describe a resource in the library
• Furthermore, it can be said that:
– “Metadata is commonly understood as an amplification of traditional bibliographic cataloguing practices in an electronic environment.”
Metadata
Metadata Meaning
wikipedia.org
5
• Descriptive
– Describes the identity and content of a resource, such as title and author.
• Administrative
– Helps to manage a resource, such as version number, archiving data, technical information, and rights management.
• Structural
– Describes relationships within and among resource objects, such as a web page that contains HTML files, image files, CSS files, JavaScript files, and links to other files.
Metadata
Metadata Categories
wikipedia.org
6
Preservation Metadata
Overview
• It is “an essential component of most digital preservation strategies”. [Wikipedia]
• Its basic requirements are: [OCLC]
– To store technical information that supports preservation decisions and actions
– To document actions taken, such as migration
– To record the effects of preservation strategies
– To ensure the authenticity of digital resources over the long term
– To note information about collection management and rights management
• Its basic functional objectives are: [OCLC]
– Providing the knowledge needed to maintain digital resources over the long term
– Ensuring that digital resources can still be rendered as they originally were
OCLC.org, wikipedia.org
7
Preservation Metadata
Basic features
According to the preservation requirements, preservation metadata should include the following information:
• Provenance
– Describes the history of creation, ownership, access, and change
• Authenticity
– Ensures trustworthiness (does the digital resource render as it originally did?)
• Preservation activities
– Records the processes that support preservation, such as migration
• Technical environment
– Gives the name and version of the hardware, platform, OS, and software required to render digital resources
• Rights management
– States the intellectual property rights and agreements that must be observed when executing a preservation process, e.g. does the creator allow his/her work to be copied?
OCLC.org, usenix.org, wikipedia.org
8
Preservation Metadata
Example
• Date
• Transcriber
• Producer
• Capture Device
• Capture Details
• Change History
• Validation Key
• Encryption
• Watermark
• Resolution
• Compression
• Source
• Color
• Color Management
• Color Bar/Gray-scale Bar
• Control Targets
16 preservation metadata elements (recommended by oclc.org, May 1998)
OCLC.org
9
Preservation Metadata Framework
Overview
• A framework is an overview or description of the types of digital preservation metadata and the associations among them
• According to OCLC/RLG, the framework should meet 3 requirements:
– Comprehensive
• The metadata covers all the information required by the overall digital preservation data structures and processes
– Structured
• Preservation metadata should be represented in a structured format that both humans and machines can understand clearly
– Broadly applicable
• Digital object types, preservation activities, and their relationships should be flexible enough to implement in real-world settings, such as institutions
OCLC.org
10
Preservation Metadata Framework
Overview
To meet these requirements, the framework should be developed in 3 steps:
1. Design a metadata model that supports the content model, long-term accessibility, and preservation activities.
2. Consider future interoperability, then modify the model to support metadata exchange and resource sharing.
3. Improve the model so that it is flexible enough to integrate with external archives.
OCLC.org
11
Preservation Metadata Framework
Example
AHDS
Technical Description
Persistent ID
File Description
Text
Format
Version
Structure division
Image
Format
Resolution
Size
Management Description
Created date
Storage information
Software Environment
Application required
OS Name Version
Functionalities
Ingest
Migrate
Agent
Date
Software Version
Access
Share
Modified from the AHDS Preservation Metadata Framework
AHDS.ac.uk
12
Summary
• Metadata is “Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” [LOC]
• Preservation Metadata is “A metadata that supports and documents the digital preservation process” [LOC]
• Preservation Metadata Framework is “An important contribution toward shaping an international consensus on the metadata requirements of archived digital objects and consolidating expertise on the use of metadata to support digital preservation” [OCLC]
LOC.gov, OCLC.org
14
• PREMIS stands for PREservation Metadata: Implementation Strategies
• Sponsored by the Library of Congress (LOC)
• “PREMIS” is commonly used to refer to its Data Dictionary
• Can be represented in XML format
PREMIS Overview
What?
LOC.gov, wikipedia.org
15
• Set of semantic units
• Metadata needed so that a digital object
– Can be read from media
– Can be rendered
– Is stored securely
– Has its format changes tracked
• Metadata scope
– Format-specific: e.g. audio, video, image, …
– Implementation-specific: how to access it (by application)
– Descriptive metadata: data properties, e.g. MARC, DC
– Detailed information (for media or hardware)
– Agent information: e.g. people, organizations, or software
– Rights information: e.g. license, permission
PREMIS Overview
PREMIS Data Dictionary
PREMIS from LOC.gov
16
PREMIS Overview
Where is PREMIS?
PREMIS positions itself as a coordinator among several types of metadata in order to perform the preservation function on all digital resources.
Thus, PREMIS is a small core at the heart of preservation metadata.
PREMIS from LOC.gov
17
• Administrative metadata that supports the process of digital preservation
• Information provided to support preservation management
– Technical information (characteristics)
• E.g. creator, creation date and time, creating software, …
– Information about actions on a digital object
• E.g. ingest, migrate, verify, …
– Relationships
• Structural: how objects are put together
• Derivative: results of preservation actions
– Rights
• E.g. rights and agreement metadata associated with preservation
PREMIS Overview
PREMIS data dictionary covers:
PREMIS from LOC.gov
18
• Supports managing a repository system
– Long-term preservation
– Repository migration (to another repository)
• Scope
– Repository design
– Repository evaluation
– Exchange of archived ‘information packages’ among repositories
• Development view
– Use PREMIS as a guideline for what information should be recorded
PREMIS Overview
Usefulness
PREMIS from LOC.gov
19
• Supports data preservation by recording
– Inhibitors
• Passwords, encryption, … required to access digital objects
– Digital provenance
• Records changes of object format, e.g. .DOC to .PDF
• Contains the application, version, environment, … needed to render digital objects
– Significant properties (if important)
• Object characteristics, e.g. font, formatting, color, …
• Look and feel
– Rights
• Copyright status, license terms
PREMIS Overview
Using PREMIS if you have to
PREMIS from LOC.gov
21
• Information a repository uses to support the digital preservation process
– Guidelines/recommendations that support the preservation process, such as creation, use, and management
• Information is defined as:
– Things that most working repositories are concerned with and need in order to support digital preservation
Data Dictionary Conventions
Data dictionary
PREMIS from LOC.gov
22
• PREMIS prefers the term “semantic unit” to “metadata element”.
• A semantic unit is an entry in the data dictionary
• A semantic unit is defined as a property of an entity in the PREMIS data model
• Semantic units also support recording relationships between objects
• Examples
– Identifier, size, format, environment, software, …
Data Dictionary Conventions
Semantic Unit
PREMIS from LOC.gov
24
What should we do if the value of a semantic unit has to carry several meanings?
Software = “Windows|XP|OperatingSystem”
The data dictionary allows the concept of a container, which groups a set of related semantic units together.
Software (container)
- swName = “Windows” (component)
- swVersion = “XP” (component)
- swType = “OperatingSystem” (component)
Data Dictionary Conventions
Container
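As a rough illustration of the container idea, the packed string can be split into named components. This is only a sketch: the `Software` class and the `from_packed` helper are hypothetical names chosen for this example, and only the three component names come from the slide.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Software:
    """Hypothetical container grouping the related semantic units from the slide."""
    swName: str
    swVersion: str
    swType: str


def from_packed(value: str) -> Software:
    """Split a packed 'name|version|type' string into separate components."""
    name, version, sw_type = value.split("|")
    return Software(swName=name, swVersion=version, swType=sw_type)


packed = "Windows|XP|OperatingSystem"
software = from_packed(packed)

# Each meaning now has its own addressable field instead of a position in a string.
print(asdict(software))
```

With separate components, a repository can query or validate `swVersion` directly instead of parsing a delimiter convention.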
26
• New in PREMIS 2.0
• Contains externally defined semantic units
• Allows PREMIS to be extended with semantic units that are more granular, non-core, or out of scope of the PREMIS data dictionary
• Data in the container may replace, refine, or be additional to the corresponding PREMIS semantic unit
• One schema per extension; if more schemas are needed, the extension element must be repeated
Data Dictionary Conventions
Extension Container (General)
PREMIS from LOC.gov
27
Data Dictionary Conventions
Example : <objectCharacteristicsExtension>
Normally, <objectCharacteristicsExtension> holds information that follows the PREMIS schema, for example:
PREMIS louis.xml from LOC.gov
28
Data Dictionary Conventions
Example : <objectCharacteristicsExtension>
If more information is needed apart from the PREMIS schema, information from other schemas (e.g. METS) can be placed in <objectCharacteristicsExtension>
PREMIS louis.xml from LOC.gov
30
PREMIS Data Model
Data Model
The data model includes:
• Entities
– Things relevant to digital preservation that are described by preservation metadata, namely Intellectual Entities, Objects, Events, Rights, and Agents
• Properties of entities (semantic units)
– Such as identifier, size, format, environment, software
• Relationships between entities
– Link entities together, e.g. isPartOf, isSourceOf, isDerivedFrom, …
– For example:
• Document X2 is a newer version of document X1
• Document AA is a chapter of document A
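The entity, property, and relationship ideas above can be sketched with a few plain Python classes. This is a minimal sketch, not the PREMIS schema: the class names, the `relate` and `related` helpers, and the choice of `isDerivedFrom` to express "newer version of" are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    relationship_type: str  # e.g. "isPartOf", "isDerivedFrom"
    target_id: str          # identifier of the related entity


@dataclass
class ObjectEntity:
    identifier: str
    properties: dict = field(default_factory=dict)     # semantic units
    relationships: list = field(default_factory=list)  # links to other entities

    def relate(self, relationship_type: str, target: "ObjectEntity") -> None:
        """Record a typed link from this entity to another one."""
        self.relationships.append(Relationship(relationship_type, target.identifier))


def related(entity: ObjectEntity, relationship_type: str) -> list:
    """Return identifiers of entities linked by the given relationship type."""
    return [r.target_id for r in entity.relationships
            if r.relationship_type == relationship_type]


doc_a = ObjectEntity("A", {"format": "application/pdf"})
doc_aa = ObjectEntity("AA", {"format": "application/pdf"})
doc_x1 = ObjectEntity("X1")
doc_x2 = ObjectEntity("X2")

doc_aa.relate("isPartOf", doc_a)        # AA is a chapter of A
doc_x2.relate("isDerivedFrom", doc_x1)  # X2 is a newer version of X1 (assumed mapping)

print(related(doc_aa, "isPartOf"))
print(related(doc_x2, "isDerivedFrom"))
```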
32
• May be called “bibliographic entities”
• A set of content that is considered a single intellectual unit for purposes of management and description
– E.g. book, map, photograph, or database
• Not fully described in the PREMIS Data Dictionary
– They can be described using other metadata standards, such as Dublin Core.
Intellectual Entities
Overview
Intellectual
Objects
Rights
Agents
Events
PREMIS tutorial from LOC.gov
33
• Objects are what is stored and managed in the preservation repository
• E.g.
– Intellectual Entity: “Thailand Map”
• Object Entity: image file
• 3 kinds of object
– File
• A computer file, like a PDF or JPEG
– Representation
• Set of files that work together
• E.g. a web page including HTML, image, CSS, and JavaScript files
– Bitstream
• A part of a file
• E.g. a frame image in a video file
Object Entities
Overview
Intellectual
Objects
Rights
Agents
Events
PREMIS tutorial from LOC.gov
34
• Chapter1.pdf is a File
• Chapter1.pdf + Chapter2.pdf + Chapter3.pdf is a Representation of a book with 3 chapters
• A TIFF file contains a header and 2 images
– This means there are 2 Bitstreams, one per image
– Each bitstream (image) has its own set of semantic units
Object Entities
Example
35
Object Entities
Example
Intellectual Entity: Thailand Map
• Object 1 (Representation): it can be a web page that contains 3 files
– HTML
– CSS
– JPEG
• Object 2 (File): 1 JPEG file
• Object 3 (File): 1 TIFF file that includes 3 bitstreams of images of map layers
– Province
– Mountain
– River
Example types of object that could be used to preserve the Thailand Map
36
• a unique identifier for the object (type and value),
• fixity information such as a checksum (message digest) and the algorithm used to derive it,
• the size of the object,
• the format of the object, which can be specified directly or by linking to a format registry,
• the original name of the object,
• information about its creation,
• information about inhibitors,
• information about its significant properties,
• information about its environment,
– e.g. OS: MacOS, browser: Safari
• where and on what medium it is stored,
• digital signature information,
• relationships with other objects and other types of entities.
Object Entities
Data Dictionary
PREMIS from LOC.gov
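Several of the object properties listed above (identifier, fixity, size, original name) can be gathered automatically from a file. The following is a minimal sketch using only the Python standard library; the `describe_file` function and the dictionary keys are hypothetical and only loosely follow the names in the list, not the exact PREMIS schema.

```python
import hashlib
import os
import tempfile


def describe_file(path, algorithm="sha256"):
    """Collect a few object-level properties for a single file."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    name = os.path.basename(path)
    return {
        "objectIdentifier": {"type": "local", "value": name},  # assumed local scheme
        "fixity": {
            "messageDigestAlgorithm": algorithm,
            "messageDigest": digest.hexdigest(),
        },
        "size": os.path.getsize(path),  # size in bytes
        "originalName": name,
    }


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "chapter1.txt")
    with open(sample, "wb") as handle:
        handle.write(b"hello")
    record = describe_file(sample)

print(record["size"], record["fixity"]["messageDigestAlgorithm"])
```

Recomputing the digest later and comparing it with the stored `messageDigest` is one simple way to carry out a fixity check.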
41
• An action that affects an object in the repository
– The action must have at least one object and one agent recorded
– An event must have an outcome (the result of the event), such as success or failure
Event Entities
Overview
Intellectual
Objects
Rights
Agents
Events
PREMIS tutorial from LOC.gov
42
Event Entities
Event Type
Event Type: Description
capture: the process whereby a repository actively obtains an object
compression: the process of coding data to save storage space or transmission time
creation: the act of creating a new object
deaccession: the process of removing an object from the inventory of a repository
decompression: the process of reversing the effects of compression
decryption: the process of converting encrypted data to plaintext
deletion: the process of removing an object from repository storage
1
PREMIS from LOC.gov
43
Event Entities
Event Type
Event Type: Description
digital signature validation: the process of determining that a decrypted digital signature matches an expected value
dissemination: the process of retrieving an object from repository storage and making it available to users
fixity check: the process of verifying that an object has not been changed in a given period
ingestion: the process of adding objects to a preservation repository
message digest calculation: the process by which a message digest (“hash”) is created
migration: a transformation of an object creating a version in a more contemporary format
PREMIS from LOC.gov
2
44
• a unique identifier for the event (type and value),
• the type of event (creation, ingestion, migration, etc.),
• the date and time the event occurred,
• a detailed description of the event,
• a coded outcome of the event (the result of the event: success | fail | …),
• a more detailed description of the outcome,
• agents involved in the event and their roles,
• objects involved in the event and their roles.
Event Entities
Data dictionary
PREMIS from LOC.gov
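The event properties listed above can be serialized as XML. The sketch below builds one event with Python's standard `xml.etree.ElementTree`; the element names follow the PREMIS 2.x data dictionary as far as I know them, the namespace URI is assumed to be the PREMIS v2 one, and the identifier values are invented. The output is not validated against the official schema, so treat it as an illustration only.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumed PREMIS 2.x namespace
ET.register_namespace("premis", PREMIS_NS)


def q(name):
    """Qualify an element name with the PREMIS namespace."""
    return "{%s}%s" % (PREMIS_NS, name)


def add_identifier(parent, wrapper, id_type, id_value):
    """Add a <wrapper> element holding <wrapperType> and <wrapperValue> children."""
    node = ET.SubElement(parent, q(wrapper))
    ET.SubElement(node, q(wrapper + "Type")).text = id_type
    ET.SubElement(node, q(wrapper + "Value")).text = id_value
    return node


def build_event(event_id, event_type, date_time, outcome, agent_id, object_id):
    event = ET.Element(q("event"))
    add_identifier(event, "eventIdentifier", "local", event_id)
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = date_time
    info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(info, q("eventOutcome")).text = outcome
    add_identifier(event, "linkingAgentIdentifier", "local", agent_id)
    add_identifier(event, "linkingObjectIdentifier", "local", object_id)
    return event


# All values below are made up for the example.
event = build_event("event-001", "migration", "2010-01-15T10:30:00",
                    "success", "agent-converter", "object-chapter1")
xml_text = ET.tostring(event, encoding="unicode")
print(xml_text)
```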
46
• Actors, e.g. a person, organization, or software
• Related metadata standards, e.g. FOAF, vCard, eduPerson, …
• Note: an agent can have many roles
– A role does not belong to the agent itself
– It is defined by the Event or Rights entity that refers to the agent
Agent Entities
Overview
Intellectual
Objects
Rights
Agents
Events
PREMIS tutorial from LOC.gov
47
• a unique identifier for the agent (type and value),
• the agent's name,
• designation of the type of agent (person, organization, software).
Agent Entities
Data dictionary
PREMIS from LOC.gov
49
• Information about rights and permissions that are directly relevant to preserving objects in the repository
– Rights: assertions of one or more rights or permissions pertaining to a digital object and/or an agent
• Example:
– John Hebeler grants the AIT digital repository permission to make 10 copies of Semantic_Web_Programming.pdf for preservation purposes
• Pattern
– Agent A grants permission B to the repository in regard to object C.
Rights Entities
Overview
Intellectual
Objects
Rights
Agents
Events
PREMIS tutorial from LOC.gov
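The "agent A grants permission B in regard to object C" pattern can be modelled as a small record plus a check. This is a sketch under assumptions: the `RightsStatement` class, its field names, the `"replicate"` act label, and the `may_copy` helper are hypothetical and chosen only to mirror the slide's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsStatement:
    granting_agent: str  # agent A
    act: str             # permission B (hypothetical label)
    object_id: str       # object C
    max_copies: int      # restriction on the permitted act


def may_copy(statement: RightsStatement, object_id: str, copies_made: int) -> bool:
    """Return True if one more preservation copy of the object is allowed."""
    return (statement.act == "replicate"
            and statement.object_id == object_id
            and copies_made < statement.max_copies)


statement = RightsStatement(
    granting_agent="John Hebeler",
    act="replicate",
    object_id="Semantic_Web_Programming.pdf",
    max_copies=10,
)

print(may_copy(statement, "Semantic_Web_Programming.pdf", copies_made=3))
print(may_copy(statement, "Semantic_Web_Programming.pdf", copies_made=10))
```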
50
• a unique identifier for the rights statement (type and value),
• whether the basis for claiming the right is copyright, license or statute,
• more detailed information about the copyright status, license terms, or statute, as applicable,
• the action(s) that the rights statement allows,
• any restrictions on the action(s),
• the term of grant, or time period in which the statement applies,
• the object(s) to which the statement applies,
• agents involved in the rights statement and their roles.
Rights Entities
Data dictionary
PREMIS from LOC.gov
53
The Data Dictionary
Example Data dictionary of semantic unit
Semantic Unit
Name of the semantic unit
PREMIS from LOC.gov
54
The Data Dictionary
Example Data dictionary of semantic unit
Semantic Component
If the semantic unit contains child components, they are listed here. Otherwise, “None” is displayed.
PREMIS from LOC.gov
55
The Data Dictionary
Example Data dictionary of semantic unit
Definition
Description of the semantic unit
PREMIS from LOC.gov
56
The Data Dictionary
Example Data dictionary of semantic unit
Rationale
The reason PREMIS includes this semantic unit
PREMIS from LOC.gov
57
The Data Dictionary
Example Data dictionary of semantic unit
Data constraint
Specifies constraints on the value of the semantic unit. For example:
• None (no constraint)
• Integer (the value must be an integer)
• Value from a controlled vocabulary (the value must come from a controlled vocabulary)
• Container (the unit is a container)
PREMIS from LOC.gov
58
The Data Dictionary
Example Data dictionary of semantic unit
Object category
This section describes rules that depend on each object type:
• Representation
• File
• Bitstream
PREMIS from LOC.gov
59
The Data Dictionary
Example Data dictionary of semantic unit
Applicability
Describes whether this semantic unit is applicable to the current object type. If “Not applicable”, the semantic unit can be omitted from the metadata. In this case, the semantic unit “size” applies to the object types “File” and “Bitstream” only, not to “Representation”.
PREMIS from LOC.gov
60
The Data Dictionary
Example Data dictionary of semantic unit
Example
An example of a value this semantic unit may take.
PREMIS from LOC.gov
61
The Data Dictionary
Example Data dictionary of semantic unit
Repeatability
Indicates whether the semantic unit can take multiple values within the same container.
“Not repeatable” = can be used at most once.
“Repeatable” = can be used more than once.
PREMIS from LOC.gov
62
The Data Dictionary
Example Data dictionary of semantic unit
Obligation
Indicates whether the semantic unit must be recorded in the metadata.
“Mandatory” = it is required.
“Optional” = it is not required.
PREMIS from LOC.gov
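Repeatability and obligation together amount to a cardinality rule that can be checked mechanically. The following is a minimal sketch of such a check; the `RULES` table is a small illustrative excerpt written for this example (not the full data dictionary), and `check_units` is a hypothetical helper.

```python
from collections import Counter

# Illustrative rule table: unit name -> (mandatory, repeatable).
RULES = {
    "objectIdentifier": (True, True),
    "size": (False, False),
    "originalName": (False, False),
}


def check_units(units, rules=RULES):
    """Check (name, value) pairs against obligation and repeatability rules."""
    counts = Counter(name for name, _ in units)
    problems = []
    for name, (mandatory, repeatable) in rules.items():
        used = counts.get(name, 0)
        if mandatory and used == 0:
            problems.append(f"{name}: mandatory but missing")
        if not repeatable and used > 1:
            problems.append(f"{name}: not repeatable but used {used} times")
    return problems


ok = check_units([("objectIdentifier", "obj-1"), ("size", 2048)])
bad = check_units([("size", 2048), ("size", 4096)])
print(ok)
print(bad)
```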
63
The Data Dictionary
Example Data dictionary of semantic unit
Creation / Maintenance Note
Further detail on how the values are created and/or updated.
In this case, the value is automatically generated by the repository.
PREMIS from LOC.gov
64
The Data Dictionary
Example Data dictionary of semantic unit
Usage notes
Provides information regarding the use of the semantic unit.
PREMIS from LOC.gov
65
The Data Dictionary
Example list of PREMIS Semantic Unit
Name : Name of semantic unit (It can be a container, if it has component units)
PREMIS from LOC.gov
66
The Data Dictionary
Example list of PREMIS Semantic Unit
M : Mandatory (Must define)
O : Optional (Not necessary to define)
PREMIS from LOC.gov
67
The Data Dictionary
Example list of PREMIS Semantic Unit
R : Repeatable (can be used more than once)
NR : Not repeatable (can be used at most once)
PREMIS from LOC.gov
68
The Data Dictionary
Example list of PREMIS Semantic Unit
Ends with [a,b] : Applies to specific object types, e.g. Representation and File
None : Applies to all object types
PREMIS from LOC.gov
69
• Although descriptive metadata is important for describing Intellectual Entities, it is not a focus of PREMIS because:
– Well-defined standards already exist, such as MARC, MODS, and Dublin Core.
– Descriptive metadata is often domain specific, so each domain should use an appropriate standard.
Data Dictionary Conventions
Limitation of Data Dictionary
PREMIS from LOC.gov
71
• Institution
– University of North Carolina at Chapel Hill
• Description
– The Carolina Digital Repository (CDR) is being designed as a repository for material in electronic formats produced by members of the University of North Carolina at Chapel Hill community. Its chief purpose is to provide for the long-term preservation of such materials. By preservation we mean the ability to ingest the material, index and search it, replicate it, and keep it safe from alteration. The project is recording and/or mapping to PREMIS elements as the repository with a preservation focus is built.
• Link
– http://www.lib.unc.edu/cdr/
• Tool
– Locally developed Java web apps plus Fedora Commons, iRODS data grid, Solr search engine and the Duke Data Accessioner
PREMIS in use
Carolina Digital Repository
PREMIS registry from LOC.gov
72
• Institution
– The National Archives of Sweden
• Description
– PREMIS is used for processing and storing digital objects in a digital repository. The National Archives is developing a transfer model for digital objects created in our scanning projects. A function is being developed for packaging and storing data about the digital objects in our archival information system ARKIS partly stored as PREMIS-metadata. The application is in use for storing data. An application for exporting PREMIS data as XML will be developed in the future.
• Tool
– ESSearch
PREMIS in use
Creating a digital repository at the Swedish National Archives using PREMIS
PREMIS registry from LOC.gov
73
• Institution
– Florida Center for Library Automation
• Description
– The FCLA Digital Archive is a preservation repository for the use of the libraries of the public universities of Florida. The FCLA Digital Archive uses a locally-developed software application called DAITSS, which implements most of the PREMIS data elements.
• Link
– http://www.fcla.edu/digitalArchive/
• Tool
– The archive is in production as of November 2005. Dissemination (DIPs) with PREMIS-conformant metadata is expected by July 2006.
• Document
– http://www.fcla.edu/digitalArchive/daInfo.htm
PREMIS in use
FCLA Digital Archive and DAITSS
PREMIS registry from LOC.gov
74
• Institution
– National Archives of Scotland
• Description
– The NAS is preparing for the ingest of digital objects from the Scottish Executive (the government of Scotland) and the Scottish Courts. An application is under development that aims to be compliant with OAIS, PD0008 and PREMIS to meet this requirement.
• Tool
– The DDA aims to implement the DROID API of PRONOM, developed by the National Archives, among other tools.
PREMIS in use
Digital Data Archive (DDA) Project
PREMIS registry from LOC.gov
76
References
• http://www.oclc.org/research/activities/past/orprojects/pmwg/presmeta_wp.pdf
Preservation Metadata for Digital Objects: A Review of the State of the Art. OCLC/RLG Working Group on Preservation Metadata, January 31, 2001
• http://en.wikipedia.org/wiki/Metadata
• http://en.wikipedia.org/wiki/Preservation_metadata
• http://www.usenix.org/event/tapp09/tech/full_papers/factor/factor.pdf
Authenticity and Provenance in Long Term Digital Preservation: Modeling and Implementation in Preservation Aware Storage. Michael Factor, Ealan Henis, Dalit Naor, Simona Rabinovici-Cohen, Petra Reshef, Shahar Ronen, IBM Research Lab in Haifa, Israel, and Giovanni Michetti, Maria Guercio, University of Urbino, Italy
• http://www.ahds.ac.uk/preservation/preservation-metadata-review.pdf
AHDS Preservation Metadata Framework. Raivo Ruusalepp, Estonian Business Archives, Ltd, September 2002
• http://www.loc.gov/standards/premis/understanding-premis.pdf
• http://www.loc.gov/standards/premis/v2/premis-2-0.pdf
• http://www.loc.gov/standards/premis/premis-registry.php
• http://www.loc.gov/standards/premis/tutorials.html
• http://www.loc.gov/standards/premis/louis-2-0.xml