Dublin Core and Emerging Conventions for a Semantic Web Thomas Baker Fraunhofer-Gesellschaft, Bonn ELPUB 2003, Guimaraes, Portugal 26 June 2003


Dublin Core and Emerging Conventions for a Semantic Web

Thomas Baker
Fraunhofer-Gesellschaft, Bonn

ELPUB 2003, Guimaraes, Portugal

26 June 2003


A particular set of metadata terms

• Dublin Core as a simple and semantically generic lingua franca
  – Fifteen “core” elements: Subject, Description, Title…
  – A metadata "pidgin" for "digital tourists" on a culturally diverse global Web
  – Limited grammar, easy to learn and use
  – Enough "as is" for many needs
  – 33 "element refinements" and 17 "encoding schemes" to qualify the elements for specialized purposes
  – A small set of 12 resource types for use with dc:type


A simple data model (resource with properties)

• 1996-1998: Collective realization that machine-processability requires a coherent data model
• 1996: “Warwick Framework” proposed at DC-2 workshop: DC as one specialized module (“resource discovery”)
• 1997: “Qualifiers” proposed for specifying meanings
  – Some early adopters took this to unintended extremes: “DC.Creator.telephone-number”
• 1998: DCMI involvement in emerging Resource Description Framework, clarification of simple data model
• 2000: First set of qualifiers approved


A typology of metadata terms ("grammar")

• Elements
  – (core) properties of resources
• Element Refinements
  – properties that semantically refine elements
• Encoding Schemes
  – give context to a metadata value
• Vocabulary Terms
  – constitute controlled lists of possible values


An emergent approach to "structured values"

• Implementers sometimes "shoehorn" complex sets of information into a single value
  – Creator: "name=Tom, affiliation=FHG, shoesize=47"
• In practice, a large variety of "structured values"
  – Labelled strings
  – Unlabelled strings
  – Marked-up strings (e.g., LaTeX, HTML)
  – Secondary resource descriptions (as above)
  – Post-processing ad-hoc constructs is messy and does not scale
• Andy Powell's model:
  – Elements can have string values (Simple DC)
  – A further requirement to point to linked metadata?
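
The "shoehorned" Creator value above hints at why post-processing ad-hoc constructs is messy. The following is only a minimal Python sketch, assuming a comma-separated `label=value` convention; the function name `parse_labelled_value` and the `unparsed` key are hypothetical, and real-world strings rarely follow a single convention.

```python
def parse_labelled_value(value: str) -> dict[str, str]:
    """Split a 'label=value, label=value' string into a dictionary.

    Assumes labels and values contain neither ',' nor '='. Parts that do
    not fit the pattern are collected under the key 'unparsed'.
    """
    parsed: dict[str, str] = {}
    leftovers: list[str] = []
    for part in value.split(","):
        label, sep, content = part.partition("=")
        if sep and label.strip():
            parsed[label.strip()] = content.strip()
        elif part.strip():
            leftovers.append(part.strip())
    if leftovers:
        parsed["unparsed"] = ", ".join(leftovers)
    return parsed


creator = parse_labelled_value("name=Tom, affiliation=FHG, shoesize=47")
print(creator)
```

An unlabelled string such as "Baker, Thomas" ends up entirely under `unparsed`, which suggests how quickly such heuristics break down across providers.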


A process for community standardization [10]

• 1995-1999: open workshops, unruly but stimulating meetings of minds, rough consensus
• 2000: qualifier vote: circa 25 voting members of an ad-hoc "Usage Committee"
• 2001: smaller Usage Board
  – Codification of formal process for editorial control
  – Two two-day face-to-face meetings per year
  – Mandate and responsibility to maintain standard, approve extensions and clarifications


...based editorial review by a Usage Board

• Term set must evolve as implementors coin new terms and usage patterns emerge
  – Working groups propose new terms or clarifications
  – Evaluate in light of grammatical principle, usefulness, clarity of definition, overlap with existing terms
  – Review application profiles based on Dublin Core
• Tiered model of approval status: conforming, recommended, obsolete, registered
• Meeting materials, mailing lists, and decisions archived and accessible on the open Web
• DCMI as maintenance agency for ISO 15836


A bias towards simple and generic

• DCMI Usage Board bias
  – Strength and value of DC lies in simplicity and generic applicability
  – Keep the core standard small, generic, and lightweight
  – Resist temptation to "complexify" – people want and need distinctions, but not in a "small standard"
  – DCMI Type Vocabulary has just 12 terms: user communities should invent or re-use their own more specific sub-types


A bias towards cooperation and re-use

• Help user communities define and use their own extensions
  – Cooperate with maintainers of specialized vocabularies on forms of mutual recognition
  – Provide a model for re-use


"Good neighbor" policies

• MARC Relators (roles such as "adapter", "artist")
  – DCMI: "use MARC Relators to refine dc:contributor"
  – LoC's RDF schema: "MARC Relators (identified with URIs) are sub-properties of dc:contributor"
• Encoding Schemes
  – DCMI term designates Library of Congress Subject Headings (http://purl.org/dc/terms/LCSH)
  – If LoC coins own term, DCMI should promote its use
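
One practical consequence of declaring relators as sub-properties of dc:contributor is that an application which only understands the core element can still use the more specific statements. The sketch below illustrates that idea in Python under stated assumptions: the relator URIs in the `example.org` namespace are placeholders invented for illustration (not the identifiers published by the Library of Congress), and `generalize` is a hypothetical helper.

```python
DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"

# Placeholder relator URIs for illustration only; these are NOT the
# identifiers actually published by the Library of Congress.
SUB_PROPERTY_OF = {
    "http://example.org/relators/adapter": DC_CONTRIBUTOR,
    "http://example.org/relators/artist": DC_CONTRIBUTOR,
}


def generalize(statements):
    """Rewrite each (property, value) pair to its parent property, if one is declared."""
    return [(SUB_PROPERTY_OF.get(prop, prop), value) for prop, value in statements]


record = [
    ("http://example.org/relators/artist", "A. Painter"),
    ("http://purl.org/dc/elements/1.1/title", "Sketches"),
]
print(generalize(record))
```

Properties with no declared parent, such as the title here, pass through unchanged.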


A "namespace policy" [20]

• All DCMI metadata terms are given unique identity within three namespaces:
  – http://purl.org/dc/elements/1.1/ - the core elements
  – http://purl.org/dc/terms/ - all other elements/qualifiers
  – http://purl.org/dc/dcmitype/ - a Type vocabulary
  – Example: http://purl.org/dc/elements/1.1/title
• Policy on long-term stability of namespace URIs
  – Changes not substantially “semantic” (i.e., corrections) will not result in change of namespace URIs
  – “Semantic” changes must trigger a change of name
  – Version turnover of a “document management” nature will have no effect on namespace URIs
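
The three namespaces listed above lend themselves to a simple lookup table. The Python sketch below shows one way to build a term URI from a namespace and a term name, and to split a URI back into those parts. The prefixes `dc`, `dcterms`, and `dcmitype` are common abbreviations assumed here for convenience, and the function names are hypothetical.

```python
# Namespace URIs as given in the DCMI Namespace Policy; the short
# prefixes are conventional abbreviations assumed for this sketch.
DCMI_NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
}


def term_uri(prefix: str, name: str) -> str:
    """Concatenate a namespace URI and a term name into a term URI."""
    return DCMI_NAMESPACES[prefix] + name


def split_uri(uri: str) -> tuple[str, str]:
    """Return (prefix, term name) for a URI in one of the three namespaces."""
    for prefix, namespace in DCMI_NAMESPACES.items():
        if uri.startswith(namespace):
            return prefix, uri[len(namespace):]
    raise ValueError(f"not a DCMI term URI: {uri}")


print(term_uri("dc", "title"))
print(split_uri("http://purl.org/dc/terms/audience"))
```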


A typology of metadata vocabularies

• Term declarations
  – Declare a unique set of elements and definitions
  – Each DCMI term is identified with a URI
  – Documented in HTML pages, formally declared as RDF schemas
• Application profiles
  – Declare how an application uses which terms in its metadata
  – May mix-and-match from multiple namespaces
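
To make the distinction concrete, an application profile can be thought of as a list of term URIs drawn from one or more namespaces, each with local usage notes. The Python sketch below is only an illustration of that idea: the class names, the `obligation` annotation, the profile name, and the `example.org` term are all hypothetical, and no DCMI-endorsed profile format is implied.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TermUsage:
    uri: str                        # URI of a term declared elsewhere
    obligation: str = "optional"    # hypothetical local annotation
    note: str = ""


@dataclass
class ApplicationProfile:
    name: str
    terms: list[TermUsage] = field(default_factory=list)

    def namespaces(self) -> set[str]:
        """Namespace part of every term URI (assumes '/'-terminated namespaces)."""
        return {usage.uri.rsplit("/", 1)[0] + "/" for usage in self.terms}


profile = ApplicationProfile(
    name="Example library profile",  # hypothetical profile
    terms=[
        TermUsage("http://purl.org/dc/elements/1.1/title", "mandatory"),
        TermUsage("http://purl.org/dc/terms/audience"),
        TermUsage("http://example.org/terms/shelfmark", note="local extension"),
    ],
)
print(sorted(profile.namespaces()))
```

The profile declares no terms of its own; it only records which existing terms are used and how, which is the sense of "mix-and-match" above.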


Why application profiles?

• People want them!
  – Most standards have them: IEEE/LOM, MARC, DOI...
  – As focus of dialogue and semantic negotiation
  – Deep human need to resist total standardization?
  – To identify emerging semantics "at the edges" of a standard
  – To know how colleagues and peers are designing metadata, and avoid "reinventing the wheel"
• To harmonize metadata usage within domains:
  – User communities (DC-Libraries, DC-Government)
  – Subject gateways (Renardus)


Dublin Core application profiles

• Declaration specifying which metadata terms an information provider uses in metadata
  – Identifies source of terms used
  – May provide additional documentation
• Designed to promote interoperability within constraints of Dublin Core model
• Draft guidelines sponsored by European Standardization Committee (CEN) to be progressed through DCMI process
  – http://www.cenorm.be/isss/Workshop/MMI-DC/application-profile-for-comment.pdf
• Caution: a documentary format cannot itself guarantee interoperability


A set of encoding practices

• Guidelines for encoding metadata records (or embedded metadata) in HTML, XML, RDF
  – Use of rdfs:label and rdf:value allows nesting of secondary resource descriptions
• A model for declaring terms "machine-processably" in RDF
  – Namespace Policy mandates this, though not specifically RDF
• Work item: a model for declaring application profiles machine-processably
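
As an illustration of embedded metadata in HTML, the Python sketch below renders a simple record as `<meta>` elements. It assumes the widely used convention of a `schema.DC` link plus `DC.`-prefixed meta names; the details of the DCMI guideline may differ, and the function name `to_html_meta` and the sample record are hypothetical.

```python
from html import escape

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def to_html_meta(record: dict[str, str]) -> str:
    """Render element/value pairs as HTML <link> and <meta> elements.

    Assumes the 'schema.DC' link plus 'DC.<element>' naming convention.
    """
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, value in record.items():
        lines.append(
            f'<meta name="DC.{escape(element)}" content="{escape(value)}">'
        )
    return "\n".join(lines)


print(to_html_meta({
    "title": "Dublin Core & the Semantic Web",  # sample values
    "creator": "Thomas Baker",
}))
```

Values are escaped so that characters such as `&` or `"` do not break the attribute syntax.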


CORES Resolution


Shared conventions for declaring namespaces? [30]

• Cross-community consensus-building
  – W3C metadata standards and URIs as a basis for interoperability among different standards?
• EU CORES Project (2002-2003)
  – Identify and explore areas of possible agreement among major standards initiatives
  – Interoperability Forum meeting in Brussels, November 2002


CORES Resolution on Identifying Metadata Elements

• http://www.cores-eu.net/interoperability/cores-resolution/
• Whereas
  – Our metadata standards have “elements” – units of meaning comparable and mappable to elements of other standards,
• We agree:
  – To assign Uniform Resource Identifiers to our elements;
  – To articulate and publish specific policies regarding the stability, persistence, and maintenance of the URIs assigned to the elements.


Clarifications to the CORES Resolution

• URIs not necessarily used in applications "as is"
  – In metadata records, maybe dc:contributor instead of http://purl.org/dc/elements/1.1/contributor
• Signatories decide what to identify with URIs
  – An individual element? An entire set of elements? A specific historical version of an element?
• No implication that URIs will "resolve" to anything
  – URIs may "get" something with HTTP on Web – or not!
  – E.g., resolve to a database query?
  – Resolve to an RDF schema?
  – Or even resolve to nothing at all ("file not found")!!


Signatories

• Eliot Christian, USGS, for GILS
• Brian Green, EDItEUR, for ONIX
• Rebecca Guenther, Library of Congress, for MARC21
• Keith Jeffery, EuroCRIS, for CERIF
• Norman Paskin, Int’l DOI Foundation, for DOI
• Robby Robson, IEEE LTSC, for IEEE/LOM
• Stuart Weibel, DCMI, for Dublin Core


Signatories’ Action Plan

• Action plan, November 2002 – May 2003:
  – Define and publish URI assignment mechanisms
  – Assign URIs to elements
  – Publish URI persistence policies
• Follow-up article scheduled for the July 2003 issue of D-Lib Magazine
  – Taken as a whole, a corpus of good-practice policies for others to discuss and emulate


Beyond the CORES Resolution [40]

• Benefits for signatories:
  – Important first step towards future interoperability applications (e.g., mapping, conversion)
  – Improve "citability" of elements between standards
• Potential areas of further work:
  – Provide persistent URIs for terms in taxonomies and ontologies
  – Shared conventions on declaring URIs in machine-processable forms
  – Shared conventions for application profiles and mapping constructs
  – Shared ontologies as targets for mapping


What exactly is being identified?

• Is a particular term the same when used in different contexts?
• A single term in a flat namespace?
  – http://ltsc.ieee.org/LOM/Identifier
• Or two terms in a flat namespace?
  – http://ltsc.ieee.org/LOM/GeneralIdentifier
  – http://ltsc.ieee.org/LOM/MetadataIdentifier
• Or two terms in a hierarchical namespace?
  – http://ltsc.ieee.org/LOM/General/Identifier
  – http://ltsc.ieee.org/LOM/Metadata/Identifier
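
The three options above amount to three different rules for turning an element's position in the LOM structure into a URI. The Python sketch below spells out those rules for comparison. The base URI and resulting URIs are the illustrative candidates from the slide, not identifiers known to have been assigned by IEEE LTSC, and the function names are hypothetical.

```python
LOM_BASE = "http://ltsc.ieee.org/LOM/"  # illustrative base URI from the slide


def single_term(path: list[str]) -> str:
    """One URI per element name, regardless of where the element occurs."""
    return LOM_BASE + path[-1]


def flat_term(path: list[str]) -> str:
    """One URI per context, with the context folded into the term name."""
    return LOM_BASE + "".join(path)


def hierarchical_term(path: list[str]) -> str:
    """One URI per context, with the context expressed as path segments."""
    return LOM_BASE + "/".join(path)


for path in (["General", "Identifier"], ["Metadata", "Identifier"]):
    print(single_term(path), flat_term(path), hierarchical_term(path))
```

Under the first rule both occurrences of Identifier share one URI; under the other two they are distinct terms, which is exactly the question the slide raises.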


What exactly is being identified?

• For purposes of identification, is a term "the same" through successive versions?
• At first, DC reflected version in the URI:
  – http://purl.org/dc/elements/1.1/title
• Then decided to keep URIs stable and define the limits of change in the Namespace Policy
  – http://purl.org/dc/terms/audience
• URIs for DC 1.1 kept for legacy reasons
• URIs for successive versions of a term used "behind the scenes" for tracking changes


Publishing and documenting a vocabulary


A method for maintaining (and versioning) a vocabulary

• Assume that vocabularies must evolve:
  – Anticipate need to understand discrete states of the standard
  – All documents, decisions, and term declarations must evolve
  – Versioning to support future automated methods for processing legacy metadata
• Numbered decisions linked to:
  – A specific historical version of a term
  – Supporting documentation for the decision
  – Historical record of the Usage Board meeting


Modes for publishing a vocabulary

• Multiple publication formats needed
  – Web pages for human use
  – RDF schemas for expressing relationships between terms in machine-processable form
  – OWL ontologies and rules languages will improve expressivity of these constructs
  – Future schemas may need to express versioning machine-processably
• Workflow
  – Web pages and schemas from a common source
  – XML data + XSLT scripts – simple, effective
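
The single-source workflow can be illustrated without XSLT. The sketch below uses Python's standard `xml.etree.ElementTree` in place of the XML + XSLT pipeline named above, purely to show one term list feeding both a human-readable page fragment and an RDF schema. The `TERMS` structure, the function names, and the choice of `rdf:Property`, `rdfs:label`, and `rdfs:comment` as the output shape are assumptions made for this example, not a description of DCMI's actual scripts.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Common source: a sample entry in a hypothetical in-memory format.
TERMS = [
    {
        "uri": "http://purl.org/dc/elements/1.1/title",
        "label": "Title",
        "comment": "A name given to the resource.",
    },
]


def to_html(terms) -> str:
    """Render the term list as an HTML definition list for human readers."""
    dl = ET.Element("dl")
    for term in terms:
        ET.SubElement(dl, "dt").text = f'{term["label"]} ({term["uri"]})'
        ET.SubElement(dl, "dd").text = term["comment"]
    return ET.tostring(dl, encoding="unicode")


def to_rdf_schema(terms) -> str:
    """Render the same term list as RDF/XML property declarations."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    root = ET.Element(f"{{{RDF}}}RDF")
    for term in terms:
        prop = ET.SubElement(
            root, f"{{{RDF}}}Property", {f"{{{RDF}}}about": term["uri"]}
        )
        ET.SubElement(prop, f"{{{RDFS}}}label").text = term["label"]
        ET.SubElement(prop, f"{{{RDFS}}}comment").text = term["comment"]
    return ET.tostring(root, encoding="unicode")


print(to_html(TERMS))
print(to_rdf_schema(TERMS))
```

Because both outputs derive from the same list, a correction to a definition needs to be made only once.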


A searchable "registry" of terms [50]

• DCMI Registry
  – Searchable database of metadata terms
  – Terms translated into various languages
  – Goal: application interface for Web services
  – Goal: harvest schemas directly from their maintainers
• An ecology of registries?
  – Harvest and merge element sets, vocabularies, profiles
    • For general overviews: SCHEMAS, CORES
    • Specific domains: MEG, GEM (education), FAO (agriculture)
  – Publication environment for information models
  – Tool for harmonization, mapping, conversion, merging


The evolving Web context


The Web as a new social context

• Something new in history
  – Not just an historical set of technologies (HTTP, URLs, HTML)
  – Platform for historically unprecedented forms of social and intellectual interaction
• Metadata as language for the Web
  – A language for statements about Web resources
  – Statements created and used both by humans and by machines
  – "Semantic Web" is about describing how resources relate to each other


Scale and automation

• The Web is too big to control
  – Metadata statements are expensive to make and maintain
  – Shift away from the metaphor of "library"?
  – NSF workshop on "Post Digital Library Futures"
    • http://www.sis.pitt.edu/~dlwkshop/
• Automated resource discovery (e.g. Google)
  – Using contextual information (e.g., URL structures) to infer "aboutness"
  – Natural-language technology, e.g. summarization


An evolving role for metadata

• Balance between human and machine
  – Automated methods to generate metadata
  – "Let Google do it" versus expert intervention
• Granularity of metadata
  – Describe each item or entire collections?
  – How much metadata is "enough" to improve discovery?
  – Semantic precision or tolerance of fuzziness?


Which aspects of Dublin Core will prove most useful over time?

• The elements and related sets of terms
• Open processes for community standardization
• Editorial review by a Usage Board
• A bias toward simple and generic metadata
• A bias toward cooperative re-use of vocabularies
  – The etiquette of mutual recognition
• A namespace policy for using URIs
• A typology of vocabularies (e.g. application profiles)
• A set of encoding practices (HTML, XML, RDF)
• Methods for maintaining and versioning a vocabulary
• Publishing a vocabulary for humans and machines
• Searchable registries of metadata terms