
Schema and Identity for Linked Data


Hideaki Takeda / National Institute of Informatics

Identity and schema for Linked Data

Hideaki Takeda

National Institute of Informatics

takeda@nii.ac.jp

2012 INTERNATIONAL ASIAN SUMMER SCHOOL IN LINKED DATA

IASLOD 2012, August 13-17, 2012, KAIST, Daejeon, Korea


How to put data into a computer?

• How to describe the data?
  – The way to describe individual data
    • Schema/Class/Concept
  – The way to describe relationships among schemas/classes/concepts
    • Ontology/Taxonomy/Thesaurus
• How to refer to the data?
  – The way to identify individual data
    • Identifier
  – Relationships among identifiers


Architecture for the Semantic Web

Tim Berners-Lee http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/

The world of instances (Linked Data)

The world of classes (Ontologies)


Layers of Semantic Web

• Ontology

– Descriptions on classes

– RDFS, OWL

– Challenges for ontology building

• Ontology building is difficult by nature

– Consistency, comprehensiveness, logicality

• Alignment of ontologies is more difficult

Tim Berners-Lee http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/

Descriptions on classes

Descriptions on instances

Ontology

Linked Data


Layers of Semantic Web

• Linked Data

– Descriptions on instances (individuals)

– RDF + (RDFS, OWL)

– Pros for Linked Data

• Easy to write (mainly fact description)

• Easy to link (fact to fact link)

– Cons for Linked Data

• Difficult to describe complex structures

• Still needs class descriptions (-> ontology)

Tim Berners-Lee http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/

Descriptions on classes

Description on instances

Ontology

Linked Data


Importance of Identifiers for Entities

• Everything should be identifiable!

• Humans can identify things with vague identifiers, or even without identifiers, with help from the context around the things

• On the web, the context is usually not available and the computer can seldom understand the context even if it exists

• So we need identifiers for all things


Identification System

• Identification is one of the primary functions of human information processing
  – Naming: e.g., names for people, pets, and some everyday things
    • OK if the number of things is not so big
  – Systematic identification
    • e.g., phone number, post-code, passport number, product number, ISBN
    • If the number of things is big enough
• Requirements for systematic identification
  – Identifier is stable and sustainable
  – Uniqueness is guaranteed
  – Identifier publisher is reliable and sustainable


Identification system for Web

• Not so different from conventional identification systems
• Differences
  – Cross-system use
  – Truly digitized
• Requirements for systematic identification for the web
  – Identifier is stable and sustainable (even after an entity may disappear)
  – Uniqueness is guaranteed over all systems
  – A description of the entity should be associated with its identifier
    • since entities may not be accessible
  – Identifier publisher is reliable and sustainable


Solutions for the Requirements by LOD

• Requirements for systematic identification for the web
  – 1. Identifier is stable and sustainable (even after an entity may disappear)
    • (up to each identifier publisher)
  – 2. Uniqueness is guaranteed over all systems
    • URI (not URN)
  – 3. A description of the entity should be associated with its identifier
    • Dereferenceable URI
      – If a URI is accessed, a description associated with it should be returned
  – 4. Identifier publisher is reliable and sustainable
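A minimal sketch of what "dereferencing" means from the client side, assuming the publisher supports HTTP content negotiation. The helper name `build_rdf_request` and the list of media types are illustrative assumptions; the URI is the Library of Congress authority identifier that appears later in these slides. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Media types an RDF-aware client might ask for (an assumption; servers differ).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_rdf_request(uri: str) -> Request:
    """Build an HTTP GET request that asks for an RDF description of `uri`."""
    return Request(uri, headers={"Accept": RDF_ACCEPT})


req = build_rdf_request("http://id.loc.gov/authorities/names/n79084664")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; whether RDF comes back depends on the identifier publisher, which is exactly requirement 3.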


Some examples: ISBN (International Standard Book Number)

• Abstract

– a unique numeric commercial book identifier

– 13 digits

• Prefix: 978 or 979 (for compatibility with EAN code)

• Group(language-sharing country group): 1 to 5 digits

• Publisher code:

• Item number:

• Check digit: 1 digit

– Management: two layers

• National ISBN Agency – Publisher

• Requirement Satisfaction

– 1. (Stable ID) Maybe (versioning often matters, and a publisher may sometimes re-use an ISBN)

– 2. (Unique ID) Uniqueness is guaranteed, but an ISBN is not a URI

– 3. (Dereferenceable) No mechanism (Amazon does this instead!)

– 4. (Reliable publisher) Yes
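The 13-digit structure above ends in a check digit, which can be sketched as follows. This uses the standard EAN-13 weighting (alternating 1 and 3 from the left); the function names and the sample ISBN are illustrative and do not come from the slides.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check a 13-digit ISBN, ignoring hyphens and spaces."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # False
```

The check digit only guards against typing errors; it says nothing about requirements 1 to 4.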


Some examples: DOI (Digital Object Identifier)

• Abstract

– An identifier for scientific digital objects (mostly scientific articles)

– A string with no fixed format: “prefix/suffix”

• Prefix: assigned to each publisher

• Suffix: assigned to each object

– Management: three layers

• IDF (International DOI Foundation) – Registration Agency – Publisher

• Requirement Satisfaction

– 1. (Stable ID) Yes (not re-usable)

– 2. (Unique ID) Uniqueness is guaranteed and URI accessible (http://dx.doi.org/”DOI”)

– 3. (Dereferenceable) Mapping to object pages but no RDF

– 4. (Reliable publisher) Maybe
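The "prefix/suffix" structure and the URI form can be sketched as below. The resolver base is the one quoted on the slide; the function names and the sample DOI `10.1000/182` are illustrative choices, not part of the original material.

```python
from urllib.parse import quote

DOI_RESOLVER = "http://dx.doi.org/"  # resolver base quoted on the slide


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix DOI: {doi!r}")
    return prefix, suffix


def doi_to_uri(doi: str) -> str:
    """Turn a DOI into an HTTP URI by appending it to the resolver base."""
    prefix, suffix = split_doi(doi)
    # Percent-encode suffix characters that are not safe in a URI path.
    return DOI_RESOLVER + prefix + "/" + quote(suffix, safe="/")


print(split_doi("10.1000/182"))   # ('10.1000', '182')
print(doi_to_uri("10.1000/182"))  # http://dx.doi.org/10.1000/182
```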


Some examples: DBpedia (as identifier)

• Abstract

– A Wikipedia page

– Name of the Wikipedia page

• Maintained manually

– Disambiguation page

– Redirect page

• Requirement Satisfaction

– 1. (Stable ID) Maybe (pages sometimes disappear, sometimes change names, sometimes change contents)

– 2. (Unique ID) Uniqueness is mostly guaranteed and URI accessible

– 3. (Dereferenceable) RDF

– 4. (Reliable publisher) Maybe


Identification of relationship between identifiers

• Co-existence of multiple identification systems in a field
  – Difference of coverage
  – Difference of viewpoint
• An entity can have multiple identifiers
• Need for mapping between identifiers in different identification systems
• Method: use special properties
  – owl:sameAs, (rdfs:seeAlso, skos:exactMatch)
  – http://sameas.org
• Some problems
  – Logical inconsistency with owl:sameAs
  – Maintenance
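A rough sketch of how owl:sameAs links turn separate identifiers into one identity: because owl:sameAs is symmetric and transitive, linked identifiers fall into clusters. The union-find helper and the `example.org` / `example.net` URIs are hypothetical; only the Library of Congress URI comes from these slides.

```python
from collections import defaultdict


def same_as_clusters(links):
    """Group identifiers connected by owl:sameAs (treated as symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted(sorted(c) for c in clusters.values())


LINKS = [
    ("http://example.org/authors/soseki", "http://id.loc.gov/authorities/names/n79084664"),
    ("http://id.loc.gov/authorities/names/n79084664", "http://example.net/people/42"),
    ("http://example.org/a", "http://example.org/b"),
]

clusters = same_as_clusters(LINKS)
for cluster in clusters:
    print(cluster)
```

The same mechanism illustrates the maintenance problem: a single wrong link merges two clusters, and every statement about one identifier then applies to all the others.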


LOD Cloud (Linking Open Data)


Summary for ID

• Identification is the crucial part in LOD

– Data availability

– Data inconsistency

– Data interoperability

• Establishing a good identification system leads to reliable and sustainable LOD.


Structuring Information

• A wide range of structuring information

– Keywords, tags

• A freely chosen word or phrase just indicating some features

– Controlled vocabulary

• Mapping to the fixed set of words or phrases

• e.g., the list of countries, the name authorities

– Classification

• System for classifying entities. Often hierarchical. Class may not carry meaning.

– Taxonomy

• Hierarchical term system for classification. Upper/lower relation usually means general/specific relation

• e.g., the subject headings of LC

– Thesaurus

• System for semantics. More types of relations: hypernym/hyponym, synonym, antonym, homonym, holonym, meronym

– Ontology

• System of concepts. Concepts rather than words. A wider variety of relations, plus definitions of concepts


Examples in Library Science

• Many systems in the library community
• Classification
  – Universal Decimal Classification (UDC)
• Controlled vocabulary
  – The authority files for person names, organizations, location names
    • Library of Congress: 8 million records, MADS & SKOS
    • British Library: 2.6 million records, FOAF & BIO (a vocabulary for biographical information)
    • National Diet Library (Japan): 1 million records, FOAF
    • Deutsche Nationalbibliothek (DNB, Germany): 1.8 & 1.3 million records (names & organizations)
    • Virtual International Authority File (VIAF): 4 million records
• Taxonomy
  – Subject headings: LC, NDL
    • Library of Congress: MADS & SKOS
    • British Library:
    • National Diet Library (Japan): 0.1 million records, SKOS
    • Deutsche Nationalbibliothek (DNB, Germany): 0.16 million records


UDC as Linked Data

UDC element | Definition | SKOS term | UDC subproperty
UDC number (notation) | UDC notation is a combination of symbols (numerals, signs and letters) that represent a class, its position in the hierarchy and its relation to other classes. Notation is a language-independent indexing term that enables mechanical sorting and filing of subjects. Also called 'UDC number' and 'UDC classmark' | skos:notation | ---
class identifier (URI) | A unique identifier assigned to each UDC class. It identifies the relationship between a class' meaning and its notational representation | skos:Concept | ---
broader class (URI) | Superordinate class: the class hierarchically above the class in question | skos:broader | ---
caption | Verbal description of the class content | skos:prefLabel | ---
including note | Extension of the caption containing verbal examples of the class content (usually a selection of important terms that do not appear in the subdivision) | skos:note | udc:includingNote
application note | Instructions for number building, further extension and specification of the class | skos:note | udc:applicationNote
scope note | Note explaining the extent and the meaning of a UDC class. Used to resolve disambiguation or to distinguish this class from other similar classes | skos:scopeNote | ---
examples | Examples of combination are used to illustrate UDC class building i.e. complex subject statements | skos:example | ---
see also reference | Indication of conceptual relationship between UDC classes from different hierarchies | skos:related | ---

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://udcdata.info/025553">
    <skos:inScheme rdf:resource="http://udcdata.info/udc-schema"/>
    <skos:broader rdf:resource="http://udcdata.info/025461"/>
    <skos:notation rdf:datatype="http://udcdata.info/UDCnotation">510.6</skos:notation>
    <skos:prefLabel xml:lang="en">Mathematical logic</skos:prefLabel>
    <skos:prefLabel xml:lang="ja">記号論理学</skos:prefLabel>
    <skos:related rdf:resource="http://udcdata.info/000016"/>
  </skos:Concept>
</rdf:RDF>

http://udcdata.info/

69,000 records

40 Languages
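As a sketch of how an application might read such a record, the snippet below parses the UDC concept shown above with Python's standard XML library. It assumes the record is wrapped in an `rdf:RDF` root with namespace declarations so that it parses on its own; `read_concept` is a hypothetical helper, and a plain XML parse like this is not a substitute for a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML = "http://www.w3.org/XML/1998/namespace"

# The record from the slide, wrapped in an rdf:RDF root (an assumption) so it is well-formed.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://udcdata.info/025553">
    <skos:inScheme rdf:resource="http://udcdata.info/udc-schema"/>
    <skos:broader rdf:resource="http://udcdata.info/025461"/>
    <skos:notation rdf:datatype="http://udcdata.info/UDCnotation">510.6</skos:notation>
    <skos:prefLabel xml:lang="en">Mathematical logic</skos:prefLabel>
    <skos:prefLabel xml:lang="ja">記号論理学</skos:prefLabel>
    <skos:related rdf:resource="http://udcdata.info/000016"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concept(xml_text):
    """Extract URI, notation, broader class and labels from one skos:Concept."""
    concept = ET.fromstring(xml_text).find(f"{{{SKOS}}}Concept")
    return {
        "uri": concept.get(f"{{{RDF}}}about"),
        "notation": concept.findtext(f"{{{SKOS}}}notation"),
        "broader": concept.find(f"{{{SKOS}}}broader").get(f"{{{RDF}}}resource"),
        "labels": {
            el.get(f"{{{XML}}}lang"): el.text
            for el in concept.findall(f"{{{SKOS}}}prefLabel")
        },
    }


record = read_concept(DOC)
print(record["notation"], record["labels"]["en"])  # 510.6 Mathematical logic
```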


http://id.loc.gov/authorities/names/n79084664.html

<http://id.loc.gov/authorities/names/n79084664> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.loc.gov/mads/rdf/v1#PersonalName> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.loc.gov/mads/rdf/v1#Authority> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#authoritativeLabel> "Natsume, Sōseki, 1867-1916"@en .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#elementList> _:bnode7authoritiesnamesn79084664 .
_:bnode7authoritiesnamesn79084664 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode8authoritiesnamesn79084664 .
_:bnode7authoritiesnamesn79084664 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:bnode010 .
_:bnode8authoritiesnamesn79084664 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.loc.gov/mads/rdf/v1#FullNameElement> .
_:bnode8authoritiesnamesn79084664 <http://www.loc.gov/mads/rdf/v1#elementValue> "Natsume, Sōseki,"@en .
_:bnode010 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode11authoritiesnamesn79084664 .
_:bnode010 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:bnode11authoritiesnamesn79084664 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.loc.gov/mads/rdf/v1#DateNameElement> .
_:bnode11authoritiesnamesn79084664 <http://www.loc.gov/mads/rdf/v1#elementValue> "1867-1916"@en .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#classification> "PL812.A8" .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#hasExactExternalAuthority> <http://viaf.org/viaf/sourceID/LC%7Cn+79084664#skos:Concept> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#isMemberOfMADSCollection> <http://id.loc.gov/authorities/names/collection_NamesAuthorizedHeadings> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#isMemberOfMADSScheme> <http://id.loc.gov/authorities/names> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#isMemberOfMADSCollection> <http://id.loc.gov/authorities/names/collection_LCNAF> .
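The `elementList` in the record above is an RDF collection: a chain of blank nodes linked by `rdf:first` and `rdf:rest` and terminated by `rdf:nil`. A minimal sketch of walking such a chain is given below. The triples are copied from the record with the blank-node labels shortened for readability, and the helper names `objects` and `list_items` are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MADS = "http://www.loc.gov/mads/rdf/v1#"

# The list-related triples from the record, as (subject, predicate, object) tuples.
TRIPLES = [
    ("_:bnode7", RDF + "first", "_:bnode8"),
    ("_:bnode7", RDF + "rest", "_:bnode010"),
    ("_:bnode8", MADS + "elementValue", "Natsume, Sōseki,"),
    ("_:bnode010", RDF + "first", "_:bnode11"),
    ("_:bnode010", RDF + "rest", RDF + "nil"),
    ("_:bnode11", MADS + "elementValue", "1867-1916"),
]


def objects(triples, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def list_items(triples, head):
    """Follow rdf:first / rdf:rest from `head` until rdf:nil; return the items in order."""
    items, node, seen = [], head, set()
    while node != RDF + "nil" and node not in seen:  # `seen` guards against cycles
        seen.add(node)
        items.extend(objects(triples, node, RDF + "first"))
        rest = objects(triples, node, RDF + "rest")
        node = rest[0] if rest else RDF + "nil"
    return items


elements = list_items(TRIPLES, "_:bnode7")
values = [objects(TRIPLES, e, MADS + "elementValue")[0] for e in elements]
print(values)  # ['Natsume, Sōseki,', '1867-1916']
```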


http://id.loc.gov/authorities/subjects/sh85008180.html


http://data.bnf.fr/11932084/intelligence_artificielle/


Some examples: Scientific Names for Species and Taxa

• Abstract
  – Names for biological species and other taxa (kingdom, division, class, order, family, tribe, genus)
  – A string
    • Binomial name for species
    • Academic societies maintain taxon names individually
  – E.g., Papilio xuthus (Asian Swallowtail, ナミアゲハ, 호랑나비)
• Requirement Satisfaction
  – 1. Mostly yes (names sometimes disappear, change, or change contents)
  – 2. Uniqueness is generally guaranteed but, strictly speaking, there is some ambiguity because names change
  – 3. No. Many systems exist but none covers all species
  – 4. Maybe


Taxon | Plants | Algae | Fungi | Animals
Domain | --- | --- | --- | ---
Kingdom | --- | --- | --- | ---
Division/Phylum | -phyta | -phyta | -mycota | ---
Subdivision/Subphylum | -phytina | -phytina | -mycotina | ---
Class | -opsida | -phyceae | -mycetes | ---
Subclass | -idae | -phycidae | -mycetidae | ---
Order | -ales | -ales | -ales | ---
Suborder | -ineae | -ineae | -ineae | ---
Superfamily | -acea | -acea | -acea | -oidea
Family | -aceae | -aceae | -aceae | -idae
Subfamily | -oideae | -oideae | -oideae | -inae
Tribe | -eae | -eae | -eae | -ini
Subtribe | -inae | -inae | -inae | -ina
Genus | --- | --- | --- | ---
Subgenus | --- | --- | --- | ---
Species | --- | --- | --- | ---
Subspecies | --- | --- | --- | ---


Ontology

An ontology is an explicit specification of a conceptualization [Gruber]

An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an Ontology is a systematic account of Existence. For AI systems, what "exists" is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, in the context of AI, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory.


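Conceptualization

[Figure: three alternative conceptualizations of the same target world]

• (a) Classes: object; box; red box, blue box, yellow box. Predicates: on_desk(A), on(A, B), put(A,B)

• (b) Classes: object; box, with attribute color:{red, blue, yellow}. Predicates: on_desk(A), on(A, B), put(A,B)

• (c) Classes: object; box, desk; box with attribute color:{red, blue, yellow}. Predicates: on(A/box, B/object), put(A/box,B/object)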

Trade off between generality and efficiency

There are many possible ways to conceptualize the target world


Types of Ontologies

• Upper (top-level) ontology vs. domain ontology
  – Upper ontology: a common ontology throughout all domains
  – Domain ontology: an ontology which is meaningful in a specific domain
• Object ontology vs. task ontology
  – Object ontology: an ontology on “things” and “events”
  – Task ontology: an ontology on “doing”
• Heavy-weight ontology vs. light-weight ontology
  – Heavy-weight ontology: fully described ontology including concept definitions and relations, in particular in a logical way
  – Light-weight ontology: partially described ontology, typically including only is-a relations


Top-level ontology

• An ontology which covers the whole world!
• Very... difficult
  – e.g., how does a thing exist?
    • Is a thing a four-dimensional existence?
    • Does a thing exist three-dimensionally over time?
• Common requirements
  – A small number of concepts can cover the world
  – Concepts can be used in lower ontologies
  – Concepts should be general and abstract


Top-level ontology

• Three approaches
  – Formal approach
    • Logical formalization
    • Fully abstract
    • Pros: clean
    • Cons: hardly understandable
    • e.g., Sowa’s top-level ontology, DOLCE
  – Linguistic approach
    • Use and extension of linguistic concepts
    • Partially abstract and partially general
    • Pros: understandable
    • Cons: limitation to the linguistic world
    • e.g., Penman Upper Model, WordNet
  – Empirical approach
    • Use and extension of everyday concepts
    • Mostly general
    • Pros: understandable and applicable to all the world
    • Cons: lack of solid foundation
    • e.g., SUMO, Cyc, EDR


Empirical top-level ontology

• SUMO (Suggested Upper Merged Ontology)

– Collection and organization of concepts used frequently

– Simple relationship between concepts

[Figure: excerpt of the SUMO concept hierarchy. Node labels:]

Entity, Physical, Abstract, Object, Process, SelfConnectedObject, CorpuscularObject, Collection, Substance, Organic, Inorganic, NaturalProcess, IntentionallyCausedProcess, BiologicalProcess, PathologicProcess, PhysiologicProcess, ChangeOfState, Transfer, SocialInteraction, Searching, ChangeOfPossession, Communication, BringingTogether, Meeting, Contest, Cooperation, Motion, Impelling, Transportation, Removing, Putting, Impacting, Separating


Formal Ontology: DOLCE

• DOLCE(a Descriptive Ontology for Linguistic and Cognitive Engineering)

– Intended as a reference system for top-level ontologies

– Logical definition

– Particular (DOLCE) vs. Universal

• Particular: ontology about things, phenomena, qualities…

• Universal: ontology for describing particulars, such as categories and attributes


Formal Ontology: DOLCE

• Concepts
  – Endurant / Perdurant / Quality / Abstract
    • Endurant
      – “Things”
      – An existence over time
      – May change its attributes
    • Perdurant
      – “Process”
      – No change over time
      – May switch a part to the other
• Relations
  – Parthood (abstract or perdurant)
  – Temporary parthood (endurant)
  – Constitution (endurant or perdurant)
  – Participation between perdurant and endurant

[Figure: DOLCE taxonomy of basic categories. Node labels (abbreviation, name):]

ALL Entity; ED Endurant; PD Perdurant/Occurrence; Q Quality; AB Abstract; PED Physical Endurant; NPED Non-Physical Endurant; AS Arbitrary Sum; M Amount of Matter; F Feature; POB Physical Object; APO Agentive Physical Object; NAPO Non-agentive Physical Object; NPOB Non-physical Object; SOB Social Object; MOB Mental Object; EV Event; STV Stative; ACH Achievement; ACC Accomplishment; ST State; PRO Process; TQ Temporal Quality; PQ Physical Quality; AQ Abstract Quality; TL Temporal Location; SL Spatial Location; Fact; Set; R Region; TR Temporal Region; T Time Interval; PR Physical Region; S Space Region; AR Abstract Region


Linguistic top-level ontology

• WordNet
  – A lexical reference system
    • “Link-based electronic dictionary”
  – Concepts
    • synset
      – Noun 79,689
      – Verb 13,508
  – Relations
    • synonym
    • hypernym/hyponym (is-a)
    • holonym/meronym (a-part-of)

http://www.cogsci.princeton.edu/cgi-bin/webwn


Linguistic top-level ontology

• WordNet
  – Top-level
    • { entity, physical thing (that which is perceived or known or inferred to have its own physical existence (living or nonliving)) }
    • { psychological_feature, (a feature of the mental life of a living organism) }
    • { abstraction, (a general concept formed by extracting common features from specific examples) }
    • { state, (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state") }
    • { event, (something that happens at a given place and time) }
    • { act, human_action, human_activity, (something that people do or cause to happen) }
    • { group, grouping, (any number of entities (members) considered as a unit) }
    • { possession, (anything owned or possessed) }
    • { phenomenon, (any state or process known through the senses rather than by intuition or reasoning) }


Summary for structuring information

• Keywords, tags/Controlled vocabulary /Classification/Taxonomy /Thesaurus/Ontology

– The differences between them are not clear-cut, and not important

– The trend is toward more structured ones

– The same requirements apply as for identification systems


Summary

• Requirements for Successful Structuring Systems

– 1. Entity is stable and sustainable

– 2. Uniqueness is guaranteed over all systems

– 3. A description should be associated with each entity

– 4. System publisher is reliable and sustainable

• Learn from success in the library community

LOD technology can help


Schema/Vocabulary for LOD

• Class/Concept description
  – Axiom of a concept in an ontology
  – Database schema for a table in a relational database
  – Object definition in object-oriented programming/DB
• Class description in the Semantic Web
  – RDFS/OWL description for a class
    • RDFS: simple class system
    • OWL: Description Logic-based
• Class description in Linked Data
  – Mostly RDFS-based (exception: owl:sameAs)
  – Simple structure (mostly property-value pairs)


Schema/Vocabulary for LOD

• The importance of sharing schema

– Interoperability

– Generic applications

• Some famous and frequently used schemata

– Dublin Core

– FOAF (Friend-Of-A-Friend)

– SKOS (Simple Knowledge Organization System)


Usage of Common Vocabularies

Prefix | Namespace | Used by
dc | http://purl.org/dc/elements/1.1/ | 66 (31.88 %)
foaf | http://xmlns.com/foaf/0.1/ | 55 (26.57 %)
dcterms | http://purl.org/dc/terms/ | 38 (18.36 %)
skos | http://www.w3.org/2004/02/skos/core# | 29 (14.01 %)
akt | http://www.aktors.org/ontology/portal# | 17 (8.21 %)
geo | http://www.w3.org/2003/01/geo/wgs84_pos# | 14 (6.76 %)
mo | http://purl.org/ontology/mo/ | 13 (6.28 %)
bibo | http://purl.org/ontology/bibo/ | 8 (3.86 %)
vcard | http://www.w3.org/2006/vcard/ns# | 6 (2.90 %)
frbr | http://purl.org/vocab/frbr/core# | 5 (2.42 %)
sioc | http://rdfs.org/sioc/ns# | 4 (1.93 %)

LDOW2011 Presentation, Christian Bizer (Freie Universität Berlin), 2011


(Simple) Dublin Core

• Started from the library community

• Now maintained by DCMI (Dublin Core Metadata Initiative)

• (Simple) Dublin Core
  – Just 15 elements
  – Simple is best
  – No range restriction
  – http://purl.org/dc/elements/1.1/
• 15 elements
  – Title
  – Creator
  – Subject
  – Description
  – Publisher
  – Contributor
  – Date
  – Type
  – Format
  – Identifier
  – Source
  – Language
  – Relation
  – Coverage
  – Rights
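To make "just 15 elements, no range restriction" concrete, here is a small sketch that writes a Simple Dublin Core description as Turtle. The helper names, the `example.org` resource URI, and the decision to emit every value as a plain single-line literal are assumptions for illustration; only the namespace and the element names come from the slide.

```python
DC = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def escape(text):
    """Escape backslashes and double quotes for a single-line Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def dc_turtle(resource_uri, **fields):
    """Serialize Simple Dublin Core fields as Turtle, every value as a plain literal."""
    if not fields:
        raise ValueError("at least one element is required")
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    pairs = [f'    dc:{name} "{escape(value)}"' for name, value in fields.items()]
    lines = [f"@prefix dc: <{DC}> .", "", f"<{resource_uri}>", " ;\n".join(pairs) + " ."]
    return "\n".join(lines)


doc = dc_turtle(
    "http://example.org/slides/identity-and-schema",  # hypothetical URI
    title="Identity and schema for Linked Data",
    creator="Hideaki Takeda",
    date="2012",
)
print(doc)
```

Because no range is imposed, `creator` here is just a string; the qualified terms discussed next constrain such values to typed resources.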


dc terms

• Qualified Dublin Core
  – Domain & Range
  – More precise terms
• Extension of simple dc

Properties in the /terms/ namespace: abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid

Properties in the /elements/1.1/ namespace: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type

Vocabulary Encoding Schemes: DCMIType, DDC, IMT, LCC, LCSH, MESH, NLM, TGN, UDC

Syntax Encoding Schemes: Box, ISO3166, ISO639-2, ISO639-3, Period, Point, RFC1766, RFC3066, RFC4646, RFC5646, URI, W3CDTF

Classes: Agent, AgentClass, BibliographicResource, FileFormat, Frequency, Jurisdiction, LicenseDocument, LinguisticSystem, Location, LocationPeriodOrJurisdiction, MediaType, MediaTypeOrExtent, MethodOfAccrual, MethodOfInstruction, PeriodOfTime, PhysicalMedium, PhysicalResource, Policy, ProvenanceStatement, RightsStatement, SizeOrDuration, Standard

DCMI Type Vocabulary: Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, Text

Terms related to the DCMI Abstract Model: memberOf, VocabularyEncodingScheme


dcterms property | subPropertyOf | Domain | Range
contributor | dc:contributor | rdfs:Resource | dcterms:Agent
creator | dc:creator, dcterms:contributor | rdfs:Resource | dcterms:Agent
coverage | dc:coverage | rdfs:Resource | dcterms:LocationPeriodOrJurisdiction
spatial | dc:coverage, dcterms:coverage | rdfs:Resource | dcterms:Location
temporal | dc:coverage, dcterms:coverage | rdfs:Resource | dcterms:PeriodOfTime
date | dc:date | rdfs:Resource | rdfs:Literal
available | dc:date, dcterms:date | rdfs:Resource | rdfs:Literal
created | dc:date, dcterms:date | rdfs:Resource | rdfs:Literal
dateAccepted | dc:date, dcterms:date | rdfs:Resource | rdfs:Literal
dateCopyrighted | dc:date, dcterms:date | rdfs:Resource | rdfs:Literal
dateSubmitted | dc:date, dcterms:date | rdfs:Resource | rdfs:Literal
issued | dc:date, dcterms:date | rdfs:Resource | rdfs:Literal
modified | dc:date, dcterms:date | rdfs:Resource | rdfs:Literal
valid | dc:date, dcterms:date | rdfs:Resource | rdfs:Literal
description | dc:description | rdfs:Resource | rdfs:Resource
abstract | dc:description, dcterms:description | rdfs:Resource | rdfs:Resource
tableOfContents | dc:description, dcterms:description | rdfs:Resource | rdfs:Resource
format | dc:format | rdfs:Resource | dcterms:MediaTypeOrExtent
extent | dc:format, dcterms:format | rdfs:Resource | dcterms:SizeOrDuration
medium | dc:format, dcterms:format | dcterms:PhysicalResource | dcterms:PhysicalMedium
identifier | dc:identifier | rdfs:Resource | rdfs:Literal
bibliographicCitation | dc:identifier, dcterms:identifier | dcterms:BibliographicResource | rdfs:Literal
language | dc:language | rdfs:Resource | dcterms:LinguisticSystem
publisher | dc:publisher | rdfs:Resource | dcterms:Agent
relation | dc:relation | rdfs:Resource | rdfs:Resource
source | dc:source, dcterms:relation | rdfs:Resource | rdfs:Resource
conformsTo | dc:relation, dcterms:relation | rdfs:Resource | dcterms:Standard
hasFormat | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
hasPart | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
hasVersion | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
isFormatOf | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
isPartOf | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
isReferencedBy | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
isReplacedBy | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
isRequiredBy | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
isVersionOf | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
references | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
replaces | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
requires | dc:relation, dcterms:relation | rdfs:Resource | rdfs:Resource
rights | dc:rights | rdfs:Resource | dcterms:RightsStatement
accessRights | dc:rights, dcterms:rights | rdfs:Resource | dcterms:RightsStatement
license | dc:rights, dcterms:rights | rdfs:Resource | dcterms:LicenseDocument
subject | dc:subject | rdfs:Resource | rdfs:Resource
title | dc:title | rdfs:Resource | rdfs:Resource rdfs:Literal
alternative | dc:title, dcterms:title | rdfs:Resource | rdfs:Resource rdfs:Literal
type | dc:type | rdfs:Resource | rdfs:Class
audience | --- | rdfs:Resource | dcterms:AgentClass
educationLevel | dcterms:audience | rdfs:Resource | dcterms:AgentClass
mediator | dcterms:audience | rdfs:Resource | dcterms:AgentClass
accrualMethod | --- | dcmitype:Collection | dcterms:MethodOfAccrual
accrualPeriodicity | --- | dcmitype:Collection | dcterms:Frequency
accrualPolicy | --- | dcmitype:Collection | dcterms:Policy
instructionalMethod | --- | rdfs:Resource | dcterms:MethodOfInstruction
provenance | --- | rdfs:Resource | dcterms:ProvenanceStatement
rightsHolder | --- | rdfs:Resource | dcterms:Agent

Sources: http://www.kanzaki.com/docs/sw/dc-domain-range.html, http://dublincore.org/documents/dcmi-terms/
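The subPropertyOf column is what lets a generic application that only knows simple `dc:` elements still use data written with the more precise `dcterms:` properties. The sketch below applies the RDFS sub-property rule to a few rows of the table; the `ex:` names are hypothetical, and a real application would normally rely on an RDFS reasoner rather than a hand-written dictionary.

```python
# A few rows of the table: property -> direct super-properties.
SUB_PROPERTY_OF = {
    "dcterms:creator": ["dc:creator", "dcterms:contributor"],
    "dcterms:contributor": ["dc:contributor"],
    "dcterms:issued": ["dc:date", "dcterms:date"],
    "dcterms:date": ["dc:date"],
}


def super_properties(prop):
    """Return every property reachable from `prop` via rdfs:subPropertyOf."""
    found, stack = set(), [prop]
    while stack:
        for parent in SUB_PROPERTY_OF.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def infer(triples):
    """Add (s, super, o) for each (s, p, o), mirroring the RDFS subPropertyOf rule."""
    inferred = set(triples)
    for s, p, o in triples:
        inferred.update((s, sup, o) for sup in super_properties(p))
    return inferred


data = {("ex:slides", "dcterms:creator", "ex:takeda")}  # hypothetical triple
for triple in sorted(infer(data)):
    print(triple)
```

A single `dcterms:creator` statement therefore also answers queries for `dc:creator`, `dcterms:contributor` and `dc:contributor`.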


The Friend of a Friend (FOAF)

• Metadata for describing persons and their relationships

• Voluntary project

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<#JW>
    a foaf:Person ;
    foaf:name "Jimmy Wales" ;
    foaf:mbox <mailto:[email protected]> ;
    foaf:homepage <http://www.jimmywales.com/> ;
    foaf:nick "Jimbo" ;
    foaf:depiction <http://www.jimmywales.com/aus_img_small.jpg> ;
    foaf:interest <http://www.wikimedia.org> ;
    foaf:knows [
        a foaf:Person ;
        foaf:name "Angela Beesley"
    ] .

<http://www.wikimedia.org>
    rdfs:label "Wikipedia" .

Classes: Agent | Document | Group | Image | LabelProperty | OnlineAccount | OnlineChatAccount | OnlineEcommerceAccount | OnlineGamingAccount | Organization | Person | PersonalProfileDocument | Project

Properties: account | accountName | accountServiceHomepage | age | aimChatID | based_near | birthday | currentProject | depiction | depicts | dnaChecksum | familyName | family_name | firstName | focus | fundedBy | geekcode | gender | givenName | givenname | holdsAccount | homepage | icqChatID | img | interest | isPrimaryTopicOf | jabberID | knows | lastName | logo | made | maker | mbox | mbox_sha1sum | member | membershipClass | msnChatID | myersBriggs | name | nick | openid | page | pastProject | phone | plan | primaryTopic | publications | schoolHomepage | sha1 | skypeID | status | surname | theme | thumbnail | tipjar | title | topic | topic_interest | weblog | workInfoHomepage | workplaceHomepage | yahooChatID
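Among the properties listed, `mbox_sha1sum` is a way to identify a person by mailbox without publishing the address itself: its value is the SHA-1 hash of the full `mailto:` URI. A minimal sketch follows; the function name and the `alice@example.org` address are hypothetical.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """SHA-1 hex digest of the mailto: URI, as used for foaf:mbox_sha1sum."""
    mailto = email if email.startswith("mailto:") else "mailto:" + email
    return hashlib.sha1(mailto.encode("utf-8")).hexdigest()


digest = mbox_sha1sum("alice@example.org")  # hypothetical address
print(f'foaf:mbox_sha1sum "{digest}" ;')
```

Two FOAF descriptions that carry the same hash can be taken to describe the same person, which makes this property another form of identifier in the sense discussed earlier.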


SKOS (Simple Knowledge Organization System)

• Metadata for taxonomy

– Hierarchical structure of concepts

• Designed to represent taxonomies such as subject headings

• Not the same as the subclass relationship among classes

• W3C Recommendation 18 August 2009


SKOS (Simple Knowledge Organization System)

• SKOS Core (hierarchical concept structure)

– skos:semanticRelation

– skos:broaderTransitive

– skos:narrowerTransitive

– skos:broader

– skos:narrower

– skos:related

– skos:prefLabel

– skos:altLabel

– skos:hiddenLabel

subPropertyOf
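In SKOS, `skos:broader` is a sub-property of `skos:broaderTransitive`, and only the latter is transitive, so "all ancestors of a concept" is a derived view over the direct links. A small sketch of that derivation is below. The first link is taken from the UDC record shown earlier in these slides; the second parent URI is hypothetical and added only so that the chain has two steps.

```python
# Direct skos:broader links (concept -> broader concepts).
BROADER = {
    "http://udcdata.info/025553": ["http://udcdata.info/025461"],
    "http://udcdata.info/025461": ["http://example.org/udc/mathematics"],  # hypothetical
}


def broader_transitive(concept, broader=BROADER):
    """All ancestors of `concept`: the skos:broaderTransitive view of skos:broader."""
    ancestors, stack = [], [concept]
    while stack:
        for parent in broader.get(stack.pop(), []):
            if parent not in ancestors:
                ancestors.append(parent)
                stack.append(parent)
    return ancestors


print(broader_transitive("http://udcdata.info/025553"))
```

Keeping the asserted links non-transitive lets a vocabulary record only direct parent links while applications compute the closure when they need it.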


SKOS (Simple Knowledge Organization System)

• SKOS Mapping

– skos:mappingRelation

– skos:closeMatch

– skos:exactMatch

– skos:broadMatch

– skos:narrowMatch

– skos:relatedMatch

subPropertyOf


Linked Open Vocabulary (LOV)

• A technical platform for search and quality assessment within the vocabulary ecosystem

– Register schemata

– Search schemata

• http://labs.mondeca.com/dataset/lov/


More Info.

• http://www.w3.org/2005/Incubator/lld/wiki/Vocabulary_and_Dataset


Summary for schema

• Some major schemata

– DC, DC terms, FOAF, SKOS …

• More domain-specific schemata

– CIDOC CRM

– PRISM

– …

• Reuse is highly recommended

– LOV


Summary

• Three layers

– Ontology/Thesaurus/Taxonomy

– Schema

– Identification

• Not just top-down, but rather bottom-up

• Each layer has its own role

• Do not pursue the value of each layer in isolation; rather, make a good combination of them