Why is Metadata important?
Key to organizing, managing, preserving, and locating content and services in digital libraries
Why is Metadata difficult?
• Cost
• Interoperability
– Syntax
– Semantics
• Customizability
• Extensibility
• Distribution
• Integrity, Authenticity, Quality
• Human and Machine Factors
• Naming
Metadata Thoughts
• Metadata takes a variety of forms
– descriptive cataloging
– specialized
• terms and conditions
• administrative
• content ratings
• provenance
• linkage
More Metadata Thoughts
• New metadata sets will continually evolve
• Many metadata sets are “community-specific”
– administration
– use
• Human and machine use
Dublin Core
• Metadata Set for Simple Resource Discovery
• 15 elements allowing simple descriptive sentences about document-like objects:
– “Document has title Hamlet”
– “Document has creator William Shakespeare”
– “Document has subject love and anguish”
The Dublin Core 15
• Title
• Creator
• Subject / Keywords
• Description
• Publisher
• Other Contributor
• Date
• Resource Type
• Format
• Resource Identifier
• Source
• Language
• Relation
• Coverage
• Rights Management
A Scope for the Dublin Core
• Increase or decrease number of elements?
• Structured or Unstructured value syntax?
• Accommodate community extensions?
Warwick Framework
• Provide context for Dublin Core effort
• Integrate multiple sets of metadata addressing issues of:
– individual integrity
– distinct audiences
– separate realms of responsibility and management
Warwick Framework Design
• Containers for aggregating …
• Packages of typed metadata sets
• General principles - information hiding:
– only operation defined at container level returns sequence of contained packages
– packages are opaque at the container level
– access to package contents subject to terms and conditions
Package Types
• Simple metadata set
– segregating distinct metadata into separate packages
• Recursive container
– nesting semantically related metadata sets
• Indirect reference
– allowing distribution and sharing of metadata sets
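The container and package ideas above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Warwick Framework's own interface: the class names, the `type_name` labels, and the `example.org` URI are all assumptions made for the example. It shows a container whose single operation returns its packages in order, and the three package types (simple set, nested container, indirect reference).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class MetadataSet:
    """Simple package: one typed metadata set, opaque to the container."""
    type_name: str  # hypothetical type label, e.g. "dublin-core"
    payload: bytes  # never interpreted at the container level


@dataclass(frozen=True)
class IndirectReference:
    """Package that points at a metadata set held elsewhere."""
    uri: str


class Container:
    """A container is itself a package, so containers can nest."""

    def __init__(self, *packages: "Package") -> None:
        self._packages = tuple(packages)

    def packages(self) -> Tuple["Package", ...]:
        # The only container-level operation: the sequence of contained packages.
        return self._packages


Package = Union[MetadataSet, IndirectReference, Container]

# Hypothetical content, for illustration only.
inner = Container(MetadataSet("terms-and-conditions", b"..."))
outer = Container(
    MetadataSet("dublin-core", b"..."),
    IndirectReference("http://example.org/marc/record"),
    inner,
)
kinds = [type(p).__name__ for p in outer.packages()]
```

Because payloads stay as uninterpreted bytes, a client has to recognize a package's type before it can read the contents, which is where the type registry issue comes in.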
Metadata Container
[Figure: a Container holding four Packages (Dublin Core, MARC record, Indirect Reference, Terms and Conditions), with a URI label]
Open Implementation Issues
• Data encoding
• Semantic interaction of overlapping sets
– between semantically-related packages
– between semantically distinct packages
• Type registry
Modeling & Encoding Metadata Components: XML Namespaces
• Prevent term clash:
– record?, creator?
• Establish concept spaces through URIs
xmlns:dc="http://purl.org/dc"
xmlns:abc="http://ilrt.ac.uk/abc"
<dc:creator>Herbert Van de Sompel</dc:creator>
<abc:organization>Cornell University</abc:organization>
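To see how the prefixes resolve to URIs, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two namespace URIs are the ones on the slide; the enclosing `record` element is an assumption added only so the fragment parses as a complete document.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc"
ABC = "http://ilrt.ac.uk/abc"

# "record" is a hypothetical wrapper element; the slide shows only the fragment.
doc = f"""<record xmlns:dc="{DC}" xmlns:abc="{ABC}">
  <dc:creator>Herbert Van de Sompel</dc:creator>
  <abc:organization>Cornell University</abc:organization>
</record>"""

root = ET.fromstring(doc)

# ElementTree replaces each prefix with its namespace URI, so two vocabularies
# that both define "creator" would still yield distinct tag names.
tags = [child.tag for child in root]

# Lookups take a prefix-to-URI mapping; the prefix itself is only local shorthand.
creator = root.find("dc:creator", {"dc": DC}).text
```

Here `tags` holds the fully qualified names in the form `{namespace-uri}localname`, which is what makes the term unambiguous across communities.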
Modeling & Encoding Metadata Components: RDF
• RDF (Resource Description Framework)
• The instantiation of the Warwick Framework on the Web
• Provides enabling technology for richly-structured metadata
• Rich data model supporting notions of distinct entities and properties
• Syntax expressed in XML
RDF Components
• Formal data model
• Syntax for interchange of data
• Schema Type system (schema model)
RDF Data Model
• Directed labeled graphs
• Model elements
– Resource
– Property
– Value
– Statement
– Containers
RDF Syntax Example
[Figure: directed graph in which URI:R has dc:Title “CIMI Presentation” and dc:Creator “Eric Miller”]
<RDF xmlns = "http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc = "http://purl.org/dc/elements/1.0/">
  <Description about = "URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator> Eric Miller </dc:Creator>
  </Description>
</RDF>
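The same two statements can be written down as (resource, property, value) triples, which is all the data model requires. The sketch below is one possible in-memory representation, not an RDF library; the `Statement` type and the `values` helper are names invented for the example.

```python
from typing import List, NamedTuple

DC = "http://purl.org/dc/elements/1.0/"


class Statement(NamedTuple):
    """One arc of the graph: a resource, a property, and a value."""
    resource: str
    prop: str
    value: str


statements = [
    Statement("URI:R", DC + "Title", "CIMI Presentation"),
    Statement("URI:R", DC + "Creator", "Eric Miller"),
]


def values(resource: str, prop: str) -> List[str]:
    """Return every value attached to `resource` through `prop`."""
    return [s.value for s in statements
            if s.resource == resource and s.prop == prop]


creators = values("URI:R", DC + "Creator")
```

Property names are spelled out as full URIs here, mirroring what the `dc:` prefix expands to in the XML syntax.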
RDF Model Example #2
[Figure: directed graph in which URI:R has dc:Title “CIMI Presentation” and oa:Creator URI:ERIC; URI:ERIC has bib:Name “Eric Miller”, bib:Email “emiller@oclc.org”, and bib:Aff URI:OCLC (“OCLC”)]
RDF Syntax Example #2
<RDF xmlns = "http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc = "http://purl.org/dc/elements/1.0/"
     xmlns:bib = "http://www.bib.org/persons#">
  <Description about = "URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <oa:Creator>
      <Description>
        <bib:Name> Eric Miller </bib:Name>
        <bib:Email> emiller@oclc.org </bib:Email>
        <bib:Aff resource = "http://www.oclc.org" />
      </Description>
    </oa:Creator>
  </Description>
</RDF>
RDF Containers
• Permit the aggregation of several values for a property
• Express multiple aggregation semantics
– unordered
– sequential or priority order
– alternative
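RDF names these three container kinds Bag, Seq, and Alt. A rough way to see how the semantics differ is to ask when two containers count as equal, or which value a consumer should take. The helper functions below are hypothetical and only illustrate the distinction; they are not part of any RDF toolkit.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Set


def same_bag(a: Iterable[str], b: Iterable[str]) -> bool:
    """Unordered: the same members in any order are the same aggregation."""
    return Counter(a) == Counter(b)


def same_seq(a: Sequence[str], b: Sequence[str]) -> bool:
    """Sequential or priority order: position is part of the meaning."""
    return list(a) == list(b)


def pick_alt(alternatives: Sequence[str], acceptable: Set[str]) -> Optional[str]:
    """Alternative: any single member will do; take the first acceptable one."""
    return next((v for v in alternatives if v in acceptable), None)


# Illustrative values only.
unordered_equal = same_bag(["a", "b"], ["b", "a"])
ordered_equal = same_seq(["a", "b"], ["b", "a"])
chosen = pick_alt(["en", "fr", "de"], {"fr", "de"})
```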
RDF Schemas
• Declaration of vocabularies
– properties defined by a particular community
– characteristics of properties and/or constraints on corresponding values
• Schema Type System - Basic Types
– Property, Class, SubClassOf, Domain, Range
– Minimal (but extensible) at this time
– minimize significant clashes with typing system designed for XML Schema WG
• Expressible in the RDF model and syntax
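As a sketch of what Domain, Range, and SubClassOf buy you, the snippet below treats them as constraints on a statement, in line with the "constraints on corresponding values" bullet. The class names, the `birthplace` property, and the resource identifiers are all assumptions chosen for illustration, and a real schema processor may interpret these declarations differently.

```python
from typing import Dict, Optional

# Hypothetical vocabulary declared by some community.
subclass_of: Dict[str, str] = {"Playwright": "Person"}
properties = {"birthplace": {"domain": "Person", "range": "Place"}}

# Hypothetical typing of individual resources.
types: Dict[str, str] = {"R2": "Playwright", "Stratford": "Place"}


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """Walk the SubClassOf chain upward from `cls` looking for `ancestor`."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False


def conforms(subject: str, prop: str, value: str) -> bool:
    """Check a statement against the property's declared Domain and Range."""
    decl = properties[prop]
    return (is_a(types.get(subject), decl["domain"])
            and is_a(types.get(value), decl["range"]))


ok = conforms("R2", "birthplace", "Stratford")
reversed_ok = conforms("Stratford", "birthplace", "R2")
```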
Bringing it together
• RDF Data Model
– Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources
• RDF Schema
– Declare, define, reuse vocabularies
• RDF Metadata transmission
– XML encoding
Attribute/Value approaches to metadata…
Hamlet has a creator Shakespeare
– Hamlet: subject; has a: implied verb; creator: metadata noun; Shakespeare: literal
– playwright: metadata adjective
The playwright of Hamlet was Shakespeare
[Figure: R1 with dc:title “Hamlet” and dc:creator.playwright “Shakespeare”]
…run into problems for richer descriptions…
Hamlet has a creator Shakespeare
Hamlet has a creator Stratford
– birthplace: metadata adjective
The playwright of Hamlet was Shakespeare, who was born in Stratford
[Figure: R1 with dc:creator.playwright “Shakespeare” and dc:creator.birthplace “Stratford”]
…because of their failure to model entity distinctions
[Figure: R1 has title “Hamlet” and creator R2; R2 has name “Shakespeare” and birthplace “Stratford”]
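The difference between the two approaches shows up as soon as you try to query them. Below is a small sketch contrasting a flat attribute/value record with a graph that gives the creator its own node; the helper names `objects` and `creator_birthplaces` are made up for the example.

```python
from typing import List

# Flat attribute/value record: every value hangs directly off the play,
# so nothing records that "Stratford" describes the person, not the work.
flat = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity model: the creator is a resource (R2) with properties of its own.
graph = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject: str, prop: str) -> List[str]:
    """Values reachable from `subject` along `prop`."""
    return [o for s, p, o in graph if s == subject and p == prop]


def creator_birthplaces(work: str) -> List[str]:
    """Follow work -> creator -> birthplace, two hops through the graph."""
    return [place
            for person in objects(work, "creator")
            for place in objects(person, "birthplace")]


born_in = creator_birthplaces("R1")
```

With a second creator, the flat record could not say whose birthplace is meant, while the graph simply gains another node.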
Understanding Metadata based on Query Capabilities
• Simple boolean tags?
• Agent, time, place questions?
– Who was responsible for what and when
Applying a Model-Centric Approach
• Formally define common entities and relationships underlying multiple metadata vocabularies
• Describe them (and their inter-relationships) in a simple logical model
• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
Events are key to understanding metadata relationships?
• Recognizing inherent lifecycle aspects of digital content - transformation of “input” resources to “output” resources and of their descriptions. (e.g., IFLA model)
• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.
• Clarifying attachment points facilitates mapping across common entities in different vocabularies.
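One way to picture events as first-class objects is a record that ties input resources, output resources, an agent, a role, and a context together. The sketch below is an assumption about how such a model might look in code, not a rendering of any published schema; every field name and identifier (`res:work`, `agent:A`, `t1`, and so on) is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    """An event transforms input resources into output resources."""
    kind: str
    inputs: List[str]
    outputs: List[str]
    agent: str                   # who took part
    role: str                    # in what capacity
    time: Optional[str] = None   # context: when
    place: Optional[str] = None  # context: where


# Placeholder lifecycle: a work is created, then translated.
events = [
    Event("creation", [], ["res:work"], "agent:A", "author", time="t1"),
    Event("translation", ["res:work"], ["res:translation"],
          "agent:B", "translator", time="t2"),
]


def responsible_for(resource: str) -> List[Tuple[str, str, Optional[str]]]:
    """Answer "who was responsible for what and when" for one resource."""
    return [(e.agent, e.role, e.time) for e in events if resource in e.outputs]


answer = responsible_for("res:translation")
```

Agent, role, time, and place all attach to the event rather than to the resource, so a vocabulary that says "translator" and one that says "contributor" can both be mapped onto the same attachment point.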