Grace Agnew August 1, 2005 National Library of Medicine

Definition of Metadata

Data about Data

Data that describes, defines or manages data

“Pure” metadata has meaning only in relation to the primary data that is being described.

METADATA MAY BE:

auto-generated

automatically harvested from the resource

human-created

AUDIENCE MAY BE:

end user

metadata creator/manager

computer application/program

Data Model:

o Abstract characterization or “World View” of the data:

-- relationships between objects in the model

-- “living” data—events occur in the lifecycle of each object in the model

--context independent—so that any context can be supported

ORGANIZATION’S INFORMATION MODEL

ENTITIES: Metadata - Educational Objects - Metadata Creators - Users

ATTRIBUTES: Identify, Define Entities

MODEL: Relationships between Entities within a Domain

RELATIONSHIPS: One to one; One to many; Parent, child, sibling; Inheritance

The Structure of Information (IFLA)

Work: Distinct intellectual or artistic creation

Expression: Intellectual or artistic realization of a work (“interpretation”)

Manifestation: Physical manifestation of an expression. May differ in physical format, but not in content or interpretation

Item: Unique physical instance of a manifestation.

Work and Expression: Intellectual / artistic content

Manifestation: Physical recording of content

Item: Single physical representation of a recording

ABSTRACTION

Example: GONE WITH THE WIND

WORK: Gone with the Wind

EXPRESSION (interpretation): Novel, Movie, Script

MANIFESTATION: Paper, PDF, HTML, 70 MM Film, 35 MM Film, DVD, MPEG2

ITEM: Copy in Blockbuster, Atlanta, GA; 24 Reels of film, MGM Archive
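The four levels can be sketched as nested records. This is a minimal illustration, not an IFLA-defined schema: the class and field names are hypothetical, and the assignment of particular formats and items to particular expressions is an assumption, since the slide lists them without showing which belongs to which.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    location: str  # a single physical copy


@dataclass
class Manifestation:
    physical_format: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    form: str  # the "interpretation" of the work
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Walk down the hierarchy and collect every physical item."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Assumed grouping: which formats and items hang off which expression
# is a guess made for illustration only.
work = Work(
    title="Gone with the Wind",
    expressions=[
        Expression("Novel", [Manifestation("Paper"), Manifestation("PDF")]),
        Expression(
            "Movie",
            [
                Manifestation("DVD", [Item("Copy in Blockbuster, Atlanta, GA")]),
                Manifestation("35 MM Film", [Item("24 Reels of film, MGM Archive")]),
            ],
        ),
    ],
)
item_locations = [item.location for item in work.all_items()]
```

Each level only knows about the level directly below it, so a description entered at the Work level never has to be repeated for each copy.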

OAIS INFORMATION MODEL

[Figure: OAIS functional entities. The PRODUCER submits a SIP to Ingest; Ingest passes the AIP to Archival Storage along with Descriptive Info (DI); Access/Dissemination delivers a DIP to the CONSUMER.]

OAIS - Reference Model for an Open Archival Information System. From: CCSDS 650.0-R-1: Reference Model for an Open Archival Information System (OAIS). Red Book. Issue 1. May 1999. PDF. Available at: http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html

End-to-End Metadata Implementation

Data Model

Record Structure

Repository Design

Data Element Registration

Database Population

Dissemination to Users

Data interchange (other repositories)

Inside the Digital Information Repository

Persistent Objects:

Manage objects through changes to: hardware, software, players, search & retrieval systems, etc.

Persistent Metadata:

Manage metadata through schema and data element versioning changes, new metadata formats, I&R changes, hardware & database migrations.

Key Issue for Preservation

• Authenticity

-- integrity “digital document must be whole and undisturbed”

--provenance – must be tightly associated with its creator and act of creation

Gladney and Bennett. What do we mean by authentic? http://www.dlib.org/dlib/july03/gladney/07gladney.html

Authenticity

o In the analog space

-- Object in hand is compared with a conceptual (“canonical”) historical version

o In the digital space

-- Fidelity to the source artifact

-- Identical (true/false) to the digital canonical master

-- Accompanied by a “true” provenance statement

-- Proof: digital signature verifying that canonical object is unchanged. Digital audit trail documenting provenance and any changes to artifact or chain of provenance
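The "identical (true/false)" test and the audit trail can be sketched with a fixity digest. This is a simplified stand-in under stated assumptions: a SHA-256 digest only shows that the bytes are unchanged, whereas a real digital signature would also bind the digest to a signer's key. The function names, the agent string and the event fields are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, List


def fixity_digest(content: bytes) -> str:
    """Return a SHA-256 hex digest used as the fixity value."""
    return hashlib.sha256(content).hexdigest()


def is_identical_to_master(candidate: bytes, master_digest: str) -> bool:
    """True/false comparison against the canonical master's digest."""
    return fixity_digest(candidate) == master_digest


def record_event(trail: List[Dict[str, str]], agent: str, action: str, content: bytes) -> None:
    """Append one provenance event to the audit trail."""
    trail.append(
        {
            "when": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "digest": fixity_digest(content),
        }
    )


master = b"canonical master bytes"  # placeholder content
audit_trail: List[Dict[str, str]] = []
record_event(audit_trail, "archivist (hypothetical)", "ingest", master)

master_digest = audit_trail[0]["digest"]
unchanged = is_identical_to_master(master, master_digest)
tampered = is_identical_to_master(master + b"!", master_digest)
```

Here `unchanged` is true and `tampered` is false, because even a one-byte change produces a different digest.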

Metadata Managing the Resource

Administrative metadata: provenance, fixity, context, reference, and lifecycle management. Rights MD may be a subset.

Technical metadata: physical characteristics of the resource. Used to manage digital preservation and display of resource. May be a subset of Administrative MD. Also called Preservation Metadata.

Descriptive metadata: information to discover, identify, select and obtain the resource.

Structural metadata: information about the structured relationship between components of a complex object. May be a subset of Administrative MD.

Meta metadata: metadata that describes and manages the metadata record. Can add “intelligence” to metadata.

Repository design concatenates all types of metadata to support preservation and access to objects in the repository.

METADATA SCHEMA COMPONENTS

Data Element - Atomic unit of meaning; community defined

Attribute - Refines, extends, interprets data element

Value - Information unique to each data element instance

Constraint - Order imposed on data element expression for consistency; semantic viability

Label - Contextual instance of data element name. “How the data element displays on the web for the end user.”
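The five components can be sketched as one small record type. This is an illustration only: the class name, the use of a regular expression as the constraint, and the sample attribute and label are assumptions, not part of any particular schema.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataElement:
    name: str                                                  # data element
    value: str                                                 # value for this instance
    attributes: Dict[str, str] = field(default_factory=dict)  # refinements
    constraint: Optional[str] = None                           # regex the value must match
    label: Optional[str] = None                                # display name for end users

    def is_valid(self) -> bool:
        """Check the value against the constraint, if one is set."""
        if self.constraint is None:
            return True
        return re.fullmatch(self.constraint, self.value) is not None

    def display(self) -> str:
        """Render the element the way an end user would see it."""
        return f"{self.label or self.name}: {self.value}"


ISO_DATE = r"\d{4}-\d{2}-\d{2}"  # hypothetical constraint: YYYY-MM-DD

date_element = DataElement(
    name="date",
    value="2005-08-01",
    attributes={"type": "presented"},  # hypothetical attribute
    constraint=ISO_DATE,
    label="Date of presentation",      # hypothetical label
)
free_text_date = DataElement(name="date", value="August 1, 2005", constraint=ISO_DATE)
```

The constraint is what makes the two instances differ: `date_element.is_valid()` holds, while `free_text_date.is_valid()` does not, even though both carry the same information for a human reader.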

OAIS – Preservation and Access

File Encoding and Transport

METS: Metadata Encoding & Transmission Standard

• XML document format for encoding metadata for resource description and management.

• “wrapper” that concatenates digital object(s) in multiple formats, metadata, a structure map documenting the organization of the digital object(s), as well as behaviors that act upon digital object(s)

• standardized transmission of METS package between repositories and applications

METS Document has seven major sections:

METS Header: minimal descriptive metadata about the METS document itself

Descriptive Metadata: metadata describing the digital object, to enable discovery and evaluation.

Administrative Metadata: metadata about the creation, use and provenance of the digital object(s). Includes four subtypes: technical, source, rights and digital provenance metadata

File Section: Includes one or more <fileGrp> elements to group together related files, such as the different digital manifestations of a video title, e.g., the uncompressed digital master and the MPEG-4 and QuickTime access files.

Structural Map: Outlines hierarchical structure of a digital object and links the elements of that structure to relevant content files and metadata

Structural Links: Contains a single element, <smLink>. Used to record the existence of hyperlinks between items within the structural map.

Behavior: Used to associate executable behaviors with content within the METS document. For example, a behavior could automatically launch a video player application when a digital video file is selected for display.
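The seven sections can be laid out as an empty skeleton. This sketch uses the METS namespace and element names, but it is not schema-valid as written: real documents need content inside each section (for example a `div` in the structural map and wrapped or referenced metadata in each metadata section). The `ID` and `USE` values are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# The seven major sections, in document order.
SECTIONS = [
    "metsHdr",      # METS Header
    "dmdSec",       # Descriptive Metadata
    "amdSec",       # Administrative Metadata
    "fileSec",      # File Section
    "structMap",    # Structural Map
    "structLink",   # Structural Links
    "behaviorSec",  # Behavior
]

# The four administrative subtypes: technical, rights, source, digital provenance.
AMD_SUBTYPES = ["techMD", "rightsMD", "sourceMD", "digiprovMD"]


def qname(local: str) -> str:
    """Build a namespace-qualified tag name."""
    return f"{{{METS_NS}}}{local}"


def build_mets_skeleton() -> ET.Element:
    ET.register_namespace("mets", METS_NS)
    root = ET.Element(qname("mets"))
    for name in SECTIONS:
        ET.SubElement(root, qname(name))

    amd = root.find(qname("amdSec"))
    for sub in AMD_SUBTYPES:
        ET.SubElement(amd, qname(sub), ID=f"{sub}-1")  # hypothetical IDs

    file_sec = root.find(qname("fileSec"))
    ET.SubElement(file_sec, qname("fileGrp"), USE="master")  # hypothetical USE value
    return root


mets = build_mets_skeleton()
section_names = [child.tag.split("}")[1] for child in mets]
xml_text = ET.tostring(mets, encoding="unicode")
```

Printing `xml_text` shows the wrapper with its seven empty sections, ready to be filled by whatever descriptive and administrative schemas a repository chooses.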

FEDORA

Background:

o “Flexible, Extensible Digital Object Repository Architecture”

o Developed by Cornell University and University of Virginia via a Mellon Foundation Grant.

o Utilizes METS (v 2.0: FOXML, interoperable with METS)

http://www.fedora.info/

PREMIS Data Dictionary

o Sponsored by OCLC and RLG

o Defines a “core” set of preservation metadata elements

o Provides a data dictionary supporting the preservation of digital information

PREMIS Data Model

Intellectual Entities

Objects

Events

Agents

Rights

http://www.oclc.org/research/projects/pmwg/

MPEG-21 Multimedia Framework

o Transparent management and use of digital multimedia resources, from creation through consumption.

o Key concept is the Digital Item Declaration, which includes structure, resources and metadata bundled in the item.

o Repository architecture—LANL’s aDORe—modular digital object repository architecture modeled on MPEG21.

http://public.lanl.gov/herbertv/papers/aDORe_20050128_submission.pdf

MXF: Material Exchange Format

• “Open file format targeted at the interchange of audiovisual material, with associated data and metadata.”

• Intended to support file interoperability between content creation devices, servers and workstations. Supports integration of file-based and streaming resource formats.

• Maintains the “documentation chain” for metadata about audiovisual essences throughout the resource lifecycle—creation, broadcast, storage, re-use

MXF: Material Exchange Format

Example: Video footage of hurricane activity in the field has automatic GPS, date/time and duration capture as captions on the footage. MXF can maintain the essence and the metadata captured simultaneously by the camera for use in production, archiving and reuse, without the need to “recatalog” the information.

Example: Footage of a jaguar hunting in Brazil is captioned in the field and transferred with captions to a production facility, where it is packaged into a program, “The Vanishing Rainforest.” The footage is then licensed to a travelog production company. Footage of the jaguar on the DVD “This is Brazil” has online attribution to “The Vanishing Rainforest,” from metadata added in production, as well as attribution to the field cinematographer, location, date and time of capture, from the original captions, with no recreation of metadata.

MXF: Material Exchange Format

File Header: Header partition pack, Header metadata

File Body: Essence Container

File Footer: Footer partition pack

Every item in an MXF file is KLV (Key Length Value) encoded: identified by a unique 16-byte key and by its length. Anything that is not understood or needed (unrecognized keys) can be ignored and skipped over.
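The skip-what-you-do-not-understand behavior can be sketched with a small KLV reader. This is a minimal sketch, assuming lengths use BER encoding (one byte for lengths under 128, otherwise a count byte followed by that many big-endian length bytes). The two keys below are made-up placeholders, not registered SMPTE labels, and this is not a full MXF parser.

```python
from typing import Iterator, Tuple

KEY_SIZE = 16  # every KLV key is 16 bytes


def read_ber_length(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a BER length at pos; return (length, position after it)."""
    if pos >= len(buf):
        raise ValueError("truncated length field")
    first = buf[pos]
    pos += 1
    if first < 0x80:                     # short form: the byte is the length
        return first, pos
    count = first & 0x7F                 # long form: number of length bytes
    if count == 0 or pos + count > len(buf):
        raise ValueError("unsupported or truncated length field")
    return int.from_bytes(buf[pos:pos + count], "big"), pos + count


def encode_klv(key: bytes, value: bytes) -> bytes:
    """Encode one KLV triplet (used here only to build test data)."""
    if len(key) != KEY_SIZE:
        raise ValueError("key must be 16 bytes")
    size = len(value)
    if size < 0x80:
        length = bytes([size])
    else:
        count = (size.bit_length() + 7) // 8
        length = bytes([0x80 | count]) + size.to_bytes(count, "big")
    return key + length + value


def iter_klv(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs; the length tells us how far to jump."""
    pos = 0
    while pos < len(buf):
        if pos + KEY_SIZE > len(buf):
            raise ValueError("truncated key")
        key = buf[pos:pos + KEY_SIZE]
        length, pos = read_ber_length(buf, pos + KEY_SIZE)
        if pos + length > len(buf):
            raise ValueError("truncated value")
        yield key, buf[pos:pos + length]
        pos += length


KNOWN_KEY = bytes(range(16))   # placeholder key this reader "understands"
UNKNOWN_KEY = b"\xff" * 16     # placeholder key it does not

stream = encode_klv(UNKNOWN_KEY, b"x" * 200) + encode_klv(KNOWN_KEY, b"hello")
understood = [value for key, value in iter_klv(stream) if key == KNOWN_KEY]
```

Because the length always follows the key, the reader can step over the 200-byte unknown item without interpreting it and still find the item it recognizes.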

MXF: Material Exchange Format

Header Metadata:

• Metadata (DMS-1 or other schema)

• Timing and synchronization parameters

Synchronization and Description of the Essence through three packages:

• Material Package: Output timeline of the file (tracks and sequence)

• File Package: the essence itself

• Source Package: Derivation of the essence (“source film stock” descriptions, etc.)

Dublin Core

Content: Title, Subject, Description, Source, Language, Relation, Coverage

Intellectual Property: Creator, Publisher, Contributor, Rights

Instantiation: Date, Type, Format, Identifier

From “Description of Dublin Core Elements” http://purl.oclc.org/metadata/dublin_core_elements

Every element is optional, repeatable, with rules for format and values

DESCRIPTIVE METADATA SCHEMAS

DUBLIN CORE

+

• Provides a great deal of flexibility.

• Easy to learn.

• Ensures interoperability with other schemes.

• Good transport protocol when expressed as XML.

-

• Lacks support for multiple formats.

• Lacks support for seriality.

• Technical description (formats, containers, extent, etc.) is weak and not standardized.

PBCore

Intended to address description, preservation and access needs of television, radio, and associated web activities.

Based on Dublin Core—qualifies and expands the 15 Dublin Core data elements.

58 Data Elements (30 mandatory)

V 1.0 available free of charge for use, via the Corporation for Public Broadcasting.

Maps readily to other schema (Dublin Core, MPEG-7, MODS, etc.)

PBCore

• Data elements address descriptive and technical metadata for access and management

• Simple “linear” data model is easy to apply

• Like Dublin Core, does not address issue of “multiple manifestations” (Although both can be used within METS to address this issue).

PBCore – “Qualified” Dublin Core for DV

<FormatFileSize>296 MB</FormatFileSize>

<FormatImageFrameRate>30 fps</FormatImageFrameRate>

DC “Dumb Down”

<format>296 MB</format>

<format>30 fps</format>
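The "dumb down" step can be sketched as a lookup from qualified element names to their Dublin Core parent. This is an illustration only: the mapping covers just the two elements shown on the slide, and the `<record>` wrapper element is a hypothetical container added so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Qualified PBCore element -> unqualified Dublin Core element.
# Only the two pairs shown on the slide are listed.
DUMB_DOWN = {
    "FormatFileSize": "format",
    "FormatImageFrameRate": "format",
}


def dumb_down(qualified_xml: str) -> List[Tuple[str, str]]:
    """Return (dc_element, value) pairs; unmapped elements are dropped."""
    root = ET.fromstring(qualified_xml)
    result = []
    for child in root:
        dc_name = DUMB_DOWN.get(child.tag)
        if dc_name is not None:
            result.append((dc_name, (child.text or "").strip()))
    return result


def to_dc_xml(pairs: List[Tuple[str, str]]) -> str:
    """Serialize the pairs as simple, unqualified elements."""
    return "".join(f"<{name}>{value}</{name}>" for name, value in pairs)


pbcore_fragment = (
    "<record>"  # hypothetical wrapper element
    "<FormatFileSize>296 MB </FormatFileSize>"
    "<FormatImageFrameRate>30 fps</FormatImageFrameRate>"
    "</record>"
)
dc_pairs = dumb_down(pbcore_fragment)
dc_xml = to_dc_xml(dc_pairs)
```

The result keeps both values but loses the distinction between them: a consumer of the Dublin Core record sees two `format` statements and has to infer which is a file size and which is a frame rate.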

MPEG-7: Multimedia Content Description Interface

Synchronization between content and description

Textual indexing: Creation information, subjects, concepts, media profiles.

Non-textual indexing: melody and speech recognition, color, shape, scene changes, etc.

Textual format and binary format are completely equivalent. You can use any functionality in textual or nontextual form.

MPEG-7: Multimedia Content Description Interface

Does not support description of analog or textual resources.

High-level textual description of component parts (“table of contents”) does not exist.

Some duplication of descriptive information across MPEG-7 description schemes.

Documentation and examples are weak, and it is not widely adopted as a descriptive metadata standard.

[Figure: MPEG-7 textual and binary encoding paths. Labels: Content; Content description; MPEG-7 Textual Encoder; MPEG-7 Textual Decoder; MPEG-7 Binary Encoder (shown twice); Access Unit - Textual Format; Access Unit - Binary Format]

MPEG-7 Content Description: Low level Audio Visual descriptors

Video segments: Color, Camera motion, Motion activity, Mosaic

Still regions: Color, Shape, Position, Texture

Moving regions: Color, Motion trajectory, Parametric motion, Spatio-temporal shape

Audio segments: Spoken content, Spectral characterization, Music: timbre, melody

MPEG-7 Description Tools

Description Schemes (structure) and Descriptors (features)

[Figure: Overview of the DSs. Labels: Datatype & Structures; Link & Media Localization; Models; Navigation & Access; Content management; Content description; Collection & Classification; Summaries; Variations; Content organization; Creation & Production; Media Usage; Semantic structure; Spatio-temporal structure; Aspects; User Interaction; User Preferences; Usage History; Roots and Top-level Elements; Packages; Schema Tools; Partitions and Decompositions; Basic elements; Audio and Visual features]

Dublin Core vs. MPEG7 – The Challenges

• MPEG7 is a structured, hierarchical schema.

• “Work” described in CreationInformation DS

• Manifestation/Item described in MediaInformation and UsageInformation DSs

• Dublin Core is a “flat” schema that mixes “work” or intellectual content with single manifestation/item description (“1:1 principle”)

MANIFESTATION in DC and MPEG-7

[Figure: one work with two manifestations. Dublin Core labels: CREATOR, TITLE, SUBJECT, DATE, then IDENTIFIER, FORMAT, RIGHTS (shown twice). MPEG-7 labels: CreationInformation, then MediaProfile and UsageAvailability (shown twice), MediaInstance]
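The contrast between the two layouts can be sketched by flattening one hierarchical description into per-manifestation flat records. The field split follows the labels on the slide; the function name and every sample value other than the title are hypothetical placeholders.

```python
from typing import Dict, List

WORK_FIELDS = ("creator", "title", "subject", "date")     # described once per work
MANIFESTATION_FIELDS = ("identifier", "format", "rights")  # vary per manifestation


def flatten_to_flat_records(
    work: Dict[str, str], manifestations: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Produce one flat record per manifestation, repeating the work fields."""
    records = []
    for manifestation in manifestations:
        record = {name: work[name] for name in WORK_FIELDS}
        record.update({name: manifestation[name] for name in MANIFESTATION_FIELDS})
        records.append(record)
    return records


# Hierarchical form: work-level description stored once.
work_description = {
    "creator": "example creator",  # placeholder value
    "title": "Gone with the Wind",
    "subject": "example subject",  # placeholder value
    "date": "example date",        # placeholder value
}
manifestation_list = [
    {"identifier": "id-dvd", "format": "DVD", "rights": "example rights"},
    {"identifier": "id-35mm", "format": "35 MM Film", "rights": "example rights"},
]

flat_records = flatten_to_flat_records(work_description, manifestation_list)
```

Two manifestations yield two flat records that each repeat creator, title, subject and date, which is the duplication a hierarchical schema avoids by describing the work once.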

MODS: Metadata Object Description Schema

• XML representation of MARC21 data, to enable seamless transfer of MARC data to XML.

• Enables both original description of digital and analog resources and mapping of legacy metadata in MARC to MODS

• MODS is represented in application profiles for METS Descriptive MD and OAI-PMH for data sharing and transport

MXF DMS-1

Material Exchange Format – Descriptive Metadata Scheme-1 (SMPTE 380M-2004)

• Utilizes SMPTE RP 210, the Metadata Dictionary Registry of Metadata Element Descriptions

• Data model and core rules are taken from AAF, so that DMS-1 can be seen as an Application of AAF.

• Utilizes a collection of descriptive metadata frameworks.

• Supports migration of DM from one MXF file to another when essence is migrated or reused.

MXF DMS-1

Frameworks: “grouping of related descriptive metadata properties and sets, which describe the contents of an MXF file body.”

• Production framework: “provide[s] identification and ownership details of the audio-visual content in the file body.” “Applies to the complete input or output of the MXF file as a whole.”

• Clip framework: “provide[s] capture and creation information about the individual audio-visual clips in the file body.” “A ‘clip’ is a continuous essence element, or essence element interleave, in the essence container.”

MXF DMS-1

• Scene framework: “describe[s] actions and events within individual scenes of the audio-visual content of the file body.” “Scene is an editorial concept and describes a continuous section of content in an MXF file.”

MXF DMS-1 Production framework

[Figure: sets in the Production framework (some appear more than once in the figure): Award, Identification, Group Relationship, Branding, Titles, Participant, Metadata Server Locator, Event, Captions Description, Annotation, Setting/Period, Contract, Picture Format, Project, Publication, Classification, Cue Words, Related Material Locator, Rights]

MXF DMS-1 Clip framework

[Figure: sets in the Clip framework (some appear more than once in the figure): Project, Captions Description, Picture Format, Processing, Titles, Participant, Metadata Server Locator, Annotation, Scripting, Shot, Contract, Device Parameters, Scripting Locator, Cue Words, Related Material Locator, Classification, Key Point, Rights, Name-value]

MXF DMS-1 Scene framework

[Figure: sets in the Scene framework (some appear more than once in the figure): Setting period, Participant, Contacts List, Titles, Metadata Server Locator, Annotation, Shot, Cue Words, Related Material Locator, Classification, Key Point, Name-Value]

MXF DMS-1

[Figure: Participant and related sets: Participant, Person, Organization, Location, Address, Communications, Name-value]

Concatenate moving images for preservation and access through:

Union Catalog

Archive Directory

Education and Outreach Space

Cataloging Utility

Dynamic, contextual portals

MIC Organization Directory

• Contact information, home page URL, logo

• Collection descriptions

• Preservation activities

• Cataloging activities

• How to obtain materials

• Administrative information

• Shibboleth Authentication/Authorization

MIC Organization Directory

• Intersects with the Union Catalog for:

• pre-selection for union catalog searches

• providing information about the organization, particularly obtaining resources, audience served, location, etc.

[Figure: OrgID links the Org Directory and the Union Catalog]
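Pre-selection through a shared OrgID can be sketched as a two-step lookup: pick organizations in the directory first, then search only their catalog records. Everything here is hypothetical: the record layouts, field names, identifiers and sample organizations are invented for illustration and do not reflect the actual MIC data structures.

```python
from typing import Dict, List, Set

# Hypothetical directory: OrgID -> organization description.
org_directory: Dict[str, Dict[str, str]] = {
    "org-1": {"name": "Example Film Archive", "location": "Atlanta, GA"},
    "org-2": {"name": "Example University Library", "location": "Newark, NJ"},
}

# Hypothetical union catalog: each record carries its holder's OrgID.
union_catalog: List[Dict[str, str]] = [
    {"title": "Hurricane field footage", "org_id": "org-1"},
    {"title": "Jaguar hunting in Brazil", "org_id": "org-2"},
    {"title": "Hurricane newsreel", "org_id": "org-2"},
]


def preselect_orgs(directory: Dict[str, Dict[str, str]], field_name: str, wanted: str) -> Set[str]:
    """Return the OrgIDs whose directory entry matches the given field value."""
    return {
        org_id
        for org_id, org in directory.items()
        if org.get(field_name) == wanted
    }


def search_catalog(catalog: List[Dict[str, str]], org_ids: Set[str], term: str) -> List[Dict[str, str]]:
    """Search titles, restricted to records held by the pre-selected organizations."""
    needle = term.lower()
    return [
        record
        for record in catalog
        if record["org_id"] in org_ids and needle in record["title"].lower()
    ]


selected = preselect_orgs(org_directory, "location", "Atlanta, GA")
hits = search_catalog(union_catalog, selected, "hurricane")
holder_names = [org_directory[hit["org_id"]]["name"] for hit in hits]
```

The same key works in the other direction: once a catalog record is found, its OrgID retrieves the holder's directory entry, which is where a user would learn how to obtain the material.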

MIC PORTALS

• Resource and organization descriptions specific to community

• PortalID in both Org Directory and Union Catalog retrieves portal-specific information

[Figure: PortalID links the Org Directory and the Union Catalog]