Metadata Modularization Concepts and Tools Carl Lagoze CS502 2001-03-14

Metadata Modularization: Concepts and Tools

Carl Lagoze

CS502

2001-03-14

Metadata

Structured data about data.

Why is Metadata important?

Key to organizing, managing, preserving, and locating content and services in digital libraries

Why is Metadata difficult?

• Cost
• Interoperability
  – Syntax
  – Semantics
• Customizability
• Extensibility
• Distribution
• Integrity, Authenticity, Quality
• Human and Machine Factors
• Naming

Metadata Thoughts

• Metadata takes a variety of forms
  – descriptive cataloging
  – specialized
    • terms and conditions
    • administrative
    • content ratings
    • provenance
    • linkage

More Metadata Thoughts

• New metadata sets will continually evolve

• Many metadata sets are “community-specific”
  – administration
  – use

• Human and machine use

Dublin Core

• Metadata Set for Simple Resource Discovery

• 15 elements allowing simple descriptive sentences about document-like objects:
  – “Document has title Hamlet”
  – “Document has creator William Shakespeare”
  – “Document has subject love and anguish”

The Dublin Core 15

• Title
• Creator
• Subject / Keywords
• Description
• Publisher
• Other Contributor
• Date
• Resource Type
• Format
• Resource Identifier
• Source
• Language
• Relation
• Coverage
• Rights Management

A Scope for the Dublin Core

• Increase or decrease number of elements?

• Structured or Unstructured value syntax?

• Accommodate community extensions?

Warwick Framework

• Provide context for Dublin Core effort

• Integrate multiple sets of metadata addressing issues of:
  – individual integrity
  – distinct audiences
  – separate realms of responsibility and management

Warwick Framework Design

• Containers for aggregating …
• Packages of typed metadata sets
• General principles - information hiding:
  – only operation defined at container level returns sequence of contained packages
  – packages are opaque at the container level
  – access to package contents subject to terms and conditions

Package Types

• Simple metadata set
  – segregating distinct metadata into separate packages
• Recursive container
  – nesting semantically related metadata sets
• Indirect reference
  – allowing distribution and sharing of metadata sets
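
The container and package ideas from the last two slides can be sketched in Python. This is only an illustration under stated assumptions, not part of the Warwick Framework itself: the class names `MetadataSet`, `IndirectReference`, and `Container`, the `packages()` method, and the placeholder payloads are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class MetadataSet:
    """Simple package: one typed metadata set, opaque to its container."""
    type_name: str   # e.g. "Dublin Core" or "MARC record"
    payload: bytes   # never interpreted at the container level


@dataclass
class IndirectReference:
    """Package that points at a metadata set stored elsewhere."""
    uri: str


class Container:
    """A container is itself a package, so containers can be nested."""

    def __init__(self, packages: Iterable["Package"]) -> None:
        self._packages = list(packages)

    def packages(self) -> Iterator["Package"]:
        """The only container-level operation: return the contained packages in sequence."""
        return iter(self._packages)


Package = Union[MetadataSet, IndirectReference, Container]

# Hypothetical content; the payloads are placeholders, not real records.
nested = Container([MetadataSet("Terms and Conditions", b"placeholder")])
outer = Container([
    MetadataSet("Dublin Core", b"placeholder"),
    MetadataSet("MARC record", b"placeholder"),
    IndirectReference("URI"),
    nested,
])

kinds = [type(p).__name__ for p in outer.packages()]
print(kinds)
```

The container never looks inside a payload, which is the information-hiding principle: interpreting a package is left to whoever understands its type.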

Metadata Container

[Figure: a container holding four packages: Dublin Core, MARC record, Indirect Reference (pointing to a URI), and Terms and Conditions]

Open Implementation Issues

• Data encoding

• Semantic interaction of overlapping sets
  – between semantically-related packages
  – between semantically distinct packages

• Type registry

Modeling & Encoding Metadata Components: XML Namespaces

• Prevent term clash:
  – record?, creator?

• Establish concept spaces through URIs

xmlns:dc="http://purl.org/dc"
xmlns:abc="http://ilrt.ac.uk/abc"
<dc:creator>Herbert Van de Sompel</dc:creator>
<abc:organization>Cornell University</abc:organization>
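
A short sketch of how a namespace-aware parser keeps the two vocabularies apart, using Python's standard `xml.etree.ElementTree`. The `<record>` wrapper element is an assumption added only to make the fragment a complete document; the namespace URIs and child elements are the ones on the slide.

```python
import xml.etree.ElementTree as ET

# <record> is a hypothetical wrapper; everything inside it is from the slide.
doc = """
<record xmlns:dc="http://purl.org/dc"
        xmlns:abc="http://ilrt.ac.uk/abc">
  <dc:creator>Herbert Van de Sompel</dc:creator>
  <abc:organization>Cornell University</abc:organization>
</record>
""".strip()

root = ET.fromstring(doc)

# ElementTree expands each prefix to its full URI, so two vocabularies can
# use the same local name (e.g. "creator") without clashing.
tags = [child.tag for child in root]
creator = root.find("{http://purl.org/dc}creator").text

print(tags)
print(creator)
```

The prefix (`dc`, `abc`) is only shorthand; the URI is what identifies the concept space.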

Modeling & Encoding Metadata Components: RDF

• RDF (Resource Description Framework)
• The instantiation of the Warwick Framework on the Web
• Provides enabling technology for richly-structured metadata
• Rich data model supporting notions of distinct entities and properties
• Syntax expressed in XML

RDF Components

• Formal data model

• Syntax for interchange of data

• Schema Type system (schema model)

RDF Data Model

• Directed labeled graphs

• Model elements
  – Resource
  – Property
  – Value
  – Statement
  – Containers

RDF Model Primitives

[Figure: a Statement consists of a Resource, a Property, and a Value; the Value may itself be a Resource]
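
One way to picture these primitives is as (resource, property, value) statements. The sketch below is an illustration, not an RDF library: the `Resource` and `Statement` types and the `values` helper are hypothetical, and the sample data is borrowed from the deck's own “CIMI Presentation” example.

```python
from typing import List, NamedTuple, Union


class Resource(NamedTuple):
    uri: str


class Statement(NamedTuple):
    subject: Resource             # the resource being described
    prop: str                     # the property, i.e. the arc label in the graph
    value: Union[Resource, str]   # another resource or a literal


r = Resource("URI:R")
graph: List[Statement] = [
    Statement(r, "dc:Title", "CIMI Presentation"),
    Statement(r, "dc:Creator", "Eric Miller"),
]


def values(statements: List[Statement], subject: Resource, prop: str) -> list:
    """Return every value attached to `subject` by the property `prop`."""
    return [s.value for s in statements if s.subject == subject and s.prop == prop]


print(values(graph, r, "dc:Title"))
```

A set of such statements is exactly a directed labeled graph: subjects and values are nodes, properties are arcs.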

RDF Syntax Example

[Figure: resource URI:R with dc:Title “CIMI Presentation” and dc:Creator “Eric Miller”]

<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <dc:Creator>Eric Miller</dc:Creator>
  </Description>
</RDF>

RDF Model Example #2

[Figure: URI:R has dc:Title “CIMI Presentation” and oa:Creator URI:ERIC; URI:ERIC has bib:Name “Eric Miller”, bib:Email “[email protected]”, and bib:Aff URI:OCLC (“OCLC”)]

RDF Syntax Example #2

<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:bib="http://www.bib.org/persons#">
  <Description about="URI:R">
    <dc:Title>CIMI Presentation</dc:Title>
    <oa:Creator>
      <Description>
        <bib:Name>Eric Miller</bib:Name>
        <bib:Email>[email protected]</bib:Email>
        <bib:Aff resource="http://www.oclc.org" />
      </Description>
    </oa:Creator>
  </Description>
</RDF>

RDF Containers

• Permit the aggregation of several values for a property

• Express multiple aggregation semantics
  – unordered
  – sequential or priority order
  – alternative
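
The three aggregation semantics (which RDF calls Bag, Seq, and Alt) differ in how members are compared and used. The sketch below illustrates that difference with plain Python collections; the helper names are hypothetical, the subject values come from the Dublin Core example earlier in the deck, and the format values are invented for illustration.

```python
from collections import Counter


def same_bag(a, b):
    """Unordered: only membership (including duplicates) matters."""
    return Counter(a) == Counter(b)


def same_seq(a, b):
    """Sequential or priority order: position is significant."""
    return list(a) == list(b)


def choose_alt(alternatives, acceptable):
    """Alternative: one member is used; return the first acceptable one, else None."""
    for candidate in alternatives:
        if candidate in acceptable:
            return candidate
    return None


subjects = ["love", "anguish"]                 # from the Dublin Core example
formats = ["text/html", "application/pdf"]     # hypothetical alternatives

print(same_bag(subjects, ["anguish", "love"]))
print(same_seq(subjects, ["anguish", "love"]))
print(choose_alt(formats, {"application/pdf"}))
```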

RDF Schemas

• Declaration of vocabularies
  – properties defined by a particular community
  – characteristics of properties and/or constraints on corresponding values

• Schema Type System - Basic Types
  – Property, Class, SubClassOf, Domain, Range
  – Minimal (but extensible) at this time
  – minimize significant clashes with typing system designed for XML Schema WG

• Expressible in the RDF model and syntax
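
A minimal sketch of what the basic types buy you: a SubClassOf hierarchy plus Domain and Range declarations is enough to check whether a property is being applied sensibly. The `ex:` vocabulary and the helper names are hypothetical, and this is not how any particular RDF Schema processor is implemented.

```python
# Hypothetical vocabulary: class -> its direct superclass (SubClassOf).
SUBCLASS_OF = {"ex:Playwright": "ex:Person"}

# Hypothetical property declarations with Domain and Range constraints.
PROPERTIES = {
    "ex:birthplace": {"domain": "ex:Person", "range": "ex:Place"},
}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or reaches it through SubClassOf links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def check(prop, subject_class, value_class):
    """True if the statement respects the property's declared domain and range."""
    decl = PROPERTIES[prop]
    return is_a(subject_class, decl["domain"]) and is_a(value_class, decl["range"])


print(check("ex:birthplace", "ex:Playwright", "ex:Place"))
print(check("ex:birthplace", "ex:Place", "ex:Place"))
```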

Relationships among vocabularies

[Figure: related terms across vocabularies: dc:Creator, ms:director, marc:100, bib:Author]

Bringing it together

• RDF Data Model
  – Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources

• RDF Schema
  – Declare, define, reuse vocabularies

• RDF Metadata transmission
  – XML encoding

Interoperability among Metadata Vocabularies

[Figure: core classes shared among Dublin Core, MARC, INDECS, and IMS]

Attribute/Value approaches to metadata…

Hamlet has a creator Shakespeare
(subject: Hamlet; implied verb: has a; metadata noun: creator; literal: Shakespeare)

Metadata adjective: playwright

The playwright of Hamlet was Shakespeare

[Figure: R1 with dc:title “Hamlet” and dc:creator.playwright “Shakespeare”]

…run into problems for richer descriptions…

Hamlet has a creator Shakespeare

Hamlet has a creator Stratford (birthplace)

The playwright of Hamlet was Shakespeare, who was born in Stratford

[Figure: R1 with dc:creator.playwright “Shakespeare” and dc:creator.birthplace “Stratford”]

…because of their failure to model entity distinctions

[Figure: R1 has title “Hamlet” and creator R2; R2 has name “Shakespeare” and birthplace “Stratford”]
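
The contrast between the two models can be made concrete. In the sketch below, the flat record mirrors the attribute/value slide and the triples mirror the entity model; the helper functions are hypothetical, and resources are written as plain strings such as "R2" for brevity.

```python
# Flat attribute/value record: "Stratford" is attached to the play, so
# nothing says whose birthplace it is.
flat = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity model: the creator is a distinct resource, R2, with its own properties.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject, prop):
    """Return all values of `prop` on `subject`."""
    return [o for s, p, o in triples if s == subject and p == prop]


def creator_birthplaces(work):
    """Follow creator, then birthplace: two hops through the graph."""
    return [place for c in objects(work, "creator") for place in objects(c, "birthplace")]


print(creator_birthplaces("R1"))
```

With a second creator, the flat record could not say which name goes with which birthplace; the graph keeps each pairing on its own resource.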

Understanding Metadata based on Query Capabilities

• Simple boolean tags?

• Agent, time, place questions?
  – Who was responsible for what and when

Applying a Model-Centric Approach

• Formally define common entities and relationships underlying multiple metadata vocabularies

• Describe them (and their inter-relationships) in a simple logical model

• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.

Events are key to understanding metadata relationships?

• Recognizing inherent lifecycle aspects of digital content - transformation of “input” resources to “output” resources and of their descriptions. (e.g., IFLA model)

• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.

• Clarifying attachment points facilitates mapping across common entities in different vocabularies.
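
A sketch of an event modeled as a first-class object, with agent, role, time, and place attached to the event and not to the resource. All names and values here (`Event`, `who_what_when`, the sample translation event) are hypothetical illustrations, not data from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    """A transformation of input resources into output resources."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    agent: str = ""   # who was responsible
    role: str = ""    # in what capacity
    time: str = ""    # when
    place: str = ""   # where


# Hypothetical lifecycle: R1 is translated to produce R2.
events = [
    Event("E1", inputs=["R1"], outputs=["R2"],
          agent="agent-1", role="translator", time="1999", place="place-1"),
]


def who_what_when(resource: str) -> List[Tuple[str, str, str]]:
    """Answer "who was responsible for what and when" for a resource."""
    return [(e.agent, e.role, e.time) for e in events if resource in e.outputs]


print(who_what_when("R2"))
```

Because agent, role, and time hang off the event, a vocabulary's "translator" or "creation date" can be mapped to the same attachment point whichever schema it came from.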

Content, Events, & Descriptions

[Figure: resources R1, R2, R3, R4; events E1, E2, E3, E4; descriptions desc1, desc2]

Museum Data