Accessing Cultural Heritage Collections using Semantic Web Techniques. Antoine ISAAC (including cool graphics by Frank van Harmelen), STITCH Project. Book & Digital Media Master, March 2nd, 2007

Background

• CATCH: Continuous Access To Cultural Heritage
  • Funded by NWO
  • 10 computer science research projects applied to the Cultural Heritage field
    • Personalization of access
    • Image and text analysis for creating metadata
    • …

• STITCH: SemanTic Interoperability To access Cultural Heritage
  • Exchanging and integrating metadata

Agenda

• Cultural Heritage and Semantic Web
• Two important issues
  • Publishing Cultural Heritage vocabularies on the Semantic Web
  • Vocabulary alignment
• Demo

Some Needs for Cultural Heritage Collections

• Representation of objects and knowledge about them
  • Pointing at collection objects
  • Describing them (creating metadata) according to specific
    • Metadata structures (schemes)
    • Controlled expert vocabularies (e.g. thesauri)

• Accessing objects using metadata
  • E.g. search using information contained in thesauri

KB Illustrated Manuscripts

The Semantic Web (1/4)

• Pointing at resources: documents, knowledge objects
  Uniform Resource Identifiers (URIs, ≈ URLs)

A Web of Resources

[Figure: a web of resources identified by URIs, such as http://www.ned.nl/rep321, rep321#paragraph3, The_Netherlands and Amsterdam]
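The figure's short labels can be read as references relative to the report URI. A minimal Python sketch, assuming that reading (the figure itself does not spell it out), resolves them with the standard library and recovers the containing document from a fragment URI:

```python
from urllib.parse import urldefrag, urljoin

# Base URI taken from the figure; treating the other labels as relative
# references against it is an assumption made for illustration.
BASE = "http://www.ned.nl/rep321"

def resolve(ref, base=BASE):
    """Resolve a short reference such as 'rep321#paragraph3' against the base URI."""
    return urljoin(base, ref)

def container(uri):
    """A fragment URI points inside a document: dropping '#...' yields the document."""
    return urldefrag(uri).url

paragraph = resolve("rep321#paragraph3")
```

Here `paragraph` is `http://www.ned.nl/rep321#paragraph3`, and `container(paragraph)` gives back the report URI, which is what lets a statement about a paragraph stay connected to its document.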

The Semantic Web (2/4)

• Pointing at resources: documents, knowledge objects

• Creating structured assertions involving resources
  RDF (Resource Description Framework): factual knowledge encoded as subject-property-object triples
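A minimal Python sketch of the triple model. The full URIs (including the http://www.geo.org/voc/ vocabulary) and the exact set of triples are assumptions modelled on the running example; the wildcard `match` function is a hypothetical stand-in for a real query language:

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str  # the slide's "property"
    obj: str

# Illustrative URIs modelled on the running example.
GEO = "http://www.geo.org/voc/"
DOC = "http://www.ned.nl/rep321"

graph = {
    Triple(DOC, "subject", GEO + "The_Netherlands"),
    Triple(DOC + "#paragraph3", "partOf", DOC),
    Triple(DOC + "#paragraph3", "subject", GEO + "Amsterdam"),
    Triple(GEO + "The_Netherlands", "hasCapital", GEO + "Amsterdam"),
}

def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )
```

For example, `match(graph, p="subject")` returns the two "subject" statements. A real system would use a triple store and a query language such as SPARQL; a Python set only makes the subject-property-object structure visible.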

Metadata in RDF

[Figure: RDF graph in which http://www.ned.nl/rep321, rep321#paragraph3, The_Netherlands and Amsterdam are connected by the properties subject, partOf and hasCapital]

The Semantic Web (3/4)

• Pointing at resources: documents, knowledge objects

• Enabling structured assertions

• Using “building blocks” with precise semantics
  Ontologies: formal definitions of shared conceptual vocabularies
  RDF Schema / OWL (Web Ontology Language)

Ontological information

[Figure: the same RDF graph extended with ontological information: rep321 has type Report, and Report is a subClassOf Document]

The Semantic Web (4/4)

• Pointing at resources: documents, knowledge objects

• Enabling structured assertions

• Using “building blocks” with precise semantics

• Checking existing facts, inferring new ones
  Part of the work is delegated from the user to inference engines that use the formal semantics of ontologies
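To make the inference step concrete, here is a minimal forward-chaining sketch of two RDFS entailment rules: subclass transitivity and type propagation along subClassOf. It reuses the Report/Document example from the figures with plain strings instead of URIs; a real inference engine covers many more rules:

```python
def rdfs_closure(triples):
    """Apply two RDFS rules until nothing new is derived:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)        => (x type B)
    """
    closed = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closed if p == "subClassOf"}
        derived = {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        derived |= {(x, "type", b) for x, p, a in closed if p == "type"
                    for a2, b in sub if a == a2}
        derived -= closed
        if not derived:  # fixpoint reached
            return closed
        closed |= derived

# The two asserted facts from the figure.
facts = {
    ("rep321", "type", "Report"),
    ("Report", "subClassOf", "Document"),
}
inferred = rdfs_closure(facts) - facts
```

Here `inferred` holds the single statement (rep321, type, Document): a fact the user no longer has to assert by hand.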

Ontological information

[Figure: the same graph, with an additional type link from http://www.ned.nl/rep321: the inferred statement that rep321 is a Document]

Building on top of XML

<rdf:Description rdf:about="http://www.ned.nl/doc321">
  <subject rdf:resource="http://www.geo.org/voc/The_Netherlands"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.geo.org/voc/The_Netherlands">
  <hasCapital rdf:resource="http://www.geo.org/voc/Amsterdam"/>
</rdf:Description>

(XML: eXtensible Markup Language)
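The fragment above has no rdf:RDF root and no namespace for subject and hasCapital, so it is not a complete RDF/XML document. The Python sketch below wraps the two statements, using http://example.org/terms# as a hypothetical namespace for the properties, and extracts triples with the standard xml.etree.ElementTree module. It only handles the rdf:Description form; a dedicated RDF library such as rdflib would handle the full syntax:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's statements wrapped in rdf:RDF; http://example.org/terms# is a
# hypothetical namespace, since the slide leaves the properties unqualified.
DATA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/terms#">
  <rdf:Description rdf:about="http://www.ned.nl/doc321">
    <subject rdf:resource="http://www.geo.org/voc/The_Netherlands"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.geo.org/voc/The_Netherlands">
    <hasCapital rdf:resource="http://www.geo.org/voc/Amsterdam"/>
  </rdf:Description>
</rdf:RDF>"""

def expand(tag):
    """Turn ElementTree's '{namespace}local' notation into a full URI."""
    ns, _, local = tag[1:].partition("}")
    return ns + local

def triples_from_rdfxml(text):
    """Extract (subject, property, object) triples from rdf:Description elements only."""
    root = ET.fromstring(text)
    out = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:  # no rdf:resource: treat the element content as a literal
                obj = (prop.text or "").strip()
            out.append((subject, expand(prop.tag), obj))
    return out
```

The XML layer only provides the serialization: the same two triples could be written in another syntax without changing their meaning.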

Building on top of the Web

• Web-based resources allow division/sharing of
  • document
  • vocabulary
  • metadata

[Figure: a single triple (par3, subject, Amsterdam) combines resources with different owners & locations: http://www.kb.nl/eDepot, http://www.geo.org/voc/ and http://www.ned.nl/rep321]

Cultural Heritage Collections and Semantic Web

• Need to categorize/classify things
• Need to structure representations

• Using metadata (MD) schemes is similar to using relations

Semantic Web techniques are good candidates for representing Cultural Heritage metadata

Agenda

• Cultural Heritage and Semantic Web
• Two important issues
  • Publishing Cultural Heritage vocabularies on the Semantic Web
  • Vocabulary alignment
• Demo

Publishing Cultural Heritage vocabularies on the Semantic Web

• Situation: a lot of knowledge out there
• Aim: providing domain expertise to the outside world
  • Thesaurus web services

• Aim: a global network of collections and vocabularies
  • Coordinating different vocabularies

• Problem: need to enforce some homogenization
  • Many different models and formats

SKOS

• Simple Knowledge Organization System
  • World Wide Web Consortium (W3C)

• Model to represent structured vocabularies (thesauri, classification schemes) on the Semantic Web

• Building blocks to create RDF/XML data
  • Concepts and concept schemes
  • Lexical properties (prefLabel, altLabel)
  • Semantic relations (broader, related)
  • Notes (scopeNote, definition)

SKOS: Nederlandse Basisclassificatie (KB)

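[Figure: SKOS representation of three concepts of the Nederlandse Basisclassificatie:
 nbc:nbc0200, skos:prefLabel "Wetenschap en cultuur in het algemeen" (science and culture in general)
 nbc:nbc0214, skos:prefLabel "Organisatie van wetenschap en cultuur" (organization of science and culture), skos:broader nbc:nbc0200
 nbc:nbc0230, skos:prefLabel "Museologie" (museology), skos:broader nbc:nbc0200, skos:scopeNote "Verwijzing: voor algemene musea, zie 02.14" (reference: for general museums, see 02.14)
 A skos:related link also appears between the concepts]

skos: = http://www.w3.org/2004/02/skos/core#
nbc: = http://www.kb.nl/nbc/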

SKOS: Nederlandse Basisclassificatie (KB)

<rdf:Description rdf:about="http://stitch.cs.vu.nl/nbc#nbc0200">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:prefLabel>wetenschap en cultuur in het algemeen</skos:prefLabel>
</rdf:Description>
<rdf:Description rdf:about="http://stitch.cs.vu.nl/nbc#nbc0214">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:prefLabel>organisatie van wetenschap en cultuur</skos:prefLabel>
  <skos:broader rdf:resource="http://stitch.cs.vu.nl/nbc#nbc0200"/>
</rdf:Description>
<rdf:Description rdf:about="http://stitch.cs.vu.nl/nbc#nbc0230">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:prefLabel>museologie</skos:prefLabel>
  <skos:broader rdf:resource="http://stitch.cs.vu.nl/nbc#nbc0200"/>
  <skos:scopeNote>voor algemene musea, zie: 02.14</skos:scopeNote>
</rdf:Description>
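A minimal Python sketch of how an application might hold these NBC concepts in memory. The `Concept` class and the helper functions are hypothetical, not part of SKOS or of any library; they show that skos:narrower links can be derived from skos:broader, and that broader links can be followed upward:

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """Hypothetical in-memory stand-in for a skos:Concept."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)
    scope_note: str = ""

NBC = "http://stitch.cs.vu.nl/nbc#"  # base URI used in the RDF/XML above

scheme = {c.uri: c for c in [
    Concept(NBC + "nbc0200", "wetenschap en cultuur in het algemeen"),
    Concept(NBC + "nbc0214", "organisatie van wetenschap en cultuur",
            broader=[NBC + "nbc0200"]),
    Concept(NBC + "nbc0230", "museologie", broader=[NBC + "nbc0200"],
            scope_note="voor algemene musea, zie: 02.14"),
]}

def narrower(scheme, uri):
    """Derive narrower concepts as the inverse of skos:broader."""
    return sorted(c.uri for c in scheme.values() if uri in c.broader)

def ancestors(scheme, uri):
    """Follow skos:broader links repeatedly, collecting every concept reached."""
    seen, stack = [], list(scheme[uri].broader)
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.append(u)
            stack.extend(scheme[u].broader)
    return seen

def find_by_label(scheme, text):
    """Case-insensitive lookup on prefLabel and altLabels."""
    t = text.lower()
    return [c.uri for c in scheme.values()
            if t == c.pref_label.lower() or t in (a.lower() for a in c.alt_labels)]
```

With this data, `narrower(scheme, NBC + "nbc0200")` yields nbc0214 and nbc0230, and `find_by_label(scheme, "Museologie")` yields nbc0230.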

SKOS: Brinkman Trefwoorden (KB)

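skos: = http://www.w3.org/2004/02/skos/core#
bk: = http://www.kb.nl/brinkman/

[Figure: SKOS representation of three Brinkman concepts:
 bk:075611791, skos:prefLabel "kindergeneeskunde" (paediatrics), skos:scopeNote "kinderen ouder dan 12 vallen niet onder kindergeneeskunde" (children older than 12 do not fall under paediatrics)
 bk:075607204, skos:prefLabel "geneeskunde" (medicine)
 bk:075607220, skos:prefLabel "medicijnen" (medicines), skos:altLabel "geneesmiddelen" (drugs)
 The concepts are connected by skos:broader and skos:related links]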

SKOS

• Open (future) standard
  • Web-compatible
  • Shareable

• Links and blocks have established meaning
  • Compliant with community needs

Agenda

• Cultural Heritage and Semantic Web
• Two important issues
  • Publishing Cultural Heritage vocabularies on the Semantic Web
  • Vocabulary alignment
• Demo

Cultural Heritage Interoperability Problems

• Current trend: accessing different collections simultaneously

• Problem: integrating different databases/metadata schemes/vocabularies

• Syntactic interoperability can be solved
  • Common metadata scheme
  • Common vocabulary model (SKOS?)

• How about conceptual heterogeneity?

The semantic interoperability problem

• There is no standard thesaurus
  • We don’t really want one
  Different vocabularies for different expertise domains, traditions, tasks

• Consequence:
  • “klassieke ruïnes” (classical ruins) vs. “landschap met ruïnes” (landscape with ruins)
  • “maagd Maria” (Virgin Mary) vs. “Heilige Moeder” (Holy Mother)

• Practical problem:
  • Searching for “Heilige Moeder” misses “maagd Maria”
  • Unless we know both vocabularies

Old situation

Vocabulary alignment

• STITCH aim: find correspondences between vocabulary elements
  • “klassieke ruïnes” ≈ “landschap met ruïnes”
  • “maagd Maria” = “Heilige Moeder”

• Doing it automatically
  • Vocabularies are big (tens of thousands of concepts)
  • They evolve
  • Applications can change their reference vocabularies

• Using techniques from
  • Linguistics
  • Computer science
  • Statistics
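How an alignment removes the practical problem (searching for “Heilige Moeder” misses “maagd Maria”) can be sketched in a few lines of Python. The collections, object identifiers and the "exact"/"close" labels are hypothetical; only the two term pairs come from the slides:

```python
# Hypothetical alignment: "exact" stands for =, "close" for ≈.
ALIGNMENT = [
    ("maagd Maria", "exact", "Heilige Moeder"),
    ("klassieke ruïnes", "close", "landschap met ruïnes"),
]

# Hypothetical collections, each indexed with a different vocabulary.
COLLECTION_A = {"ms-1": {"maagd Maria"}, "ms-2": {"klassieke ruïnes"}}
COLLECTION_B = {"ms-3": {"Heilige Moeder"}, "ms-4": {"landschap met ruïnes"}}

def expand_query(term, alignment, include_close=False):
    """Add the terms the alignment links to the query term (one hop only)."""
    terms = {term}
    for left, kind, right in alignment:
        if kind == "exact" or include_close:
            if term == left:
                terms.add(right)
            elif term == right:
                terms.add(left)
    return terms

def search(term, collections, alignment, include_close=False):
    """Return the ids of objects whose index terms overlap the expanded query."""
    terms = expand_query(term, alignment, include_close)
    return sorted(obj for coll in collections
                  for obj, index in coll.items() if index & terms)
```

Without an alignment, searching both collections for “Heilige Moeder” only finds ms-3; with it, ms-1 is found as well. Close (≈) matches are kept optional here, since ≈ is a weaker claim than =.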

New situation

Automatic alignment techniques

• Lexical: labels of entities and textual definitions

• Structural: structure of the formal definitions of entities, position in the hierarchy

• Statistical: object information (e.g. book indexing)

• Background knowledge: using a shared conceptual reference to find links

[Figure: example concepts “brain”, “Long tumor”, “tumor” and “Long”]
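The background-knowledge technique can be sketched as follows, assuming each vocabulary concept has already been anchored to a node of a shared reference hierarchy. The reference nodes (lung_neoplasm, brain_neoplasm, neoplasm) and the anchors are hypothetical illustrations, not taken from an actual STITCH resource:

```python
# Hypothetical shared reference hierarchy: child -> parent.
REFERENCE_PARENT = {
    "lung_neoplasm": "neoplasm",
    "brain_neoplasm": "neoplasm",
}

def reference_ancestors(node):
    """Walk up the shared reference from a node to its root."""
    chain = []
    while node in REFERENCE_PARENT:
        node = REFERENCE_PARENT[node]
        chain.append(node)
    return chain

def deduce(anchor_a, anchor_b):
    """Deduce a correspondence between two concepts from their anchors."""
    if anchor_a == anchor_b:
        return "equivalent"
    if anchor_b in reference_ancestors(anchor_a):
        return "narrower"  # concept A is more specific than concept B
    if anchor_a in reference_ancestors(anchor_b):
        return "broader"
    return None

# Hypothetical anchors for concepts from two different vocabularies.
ANCHORS_A = {"Long tumor": "lung_neoplasm"}
ANCHORS_B = {"tumor": "neoplasm", "brain tumor": "brain_neoplasm"}

def align_via_reference(anchors_a, anchors_b):
    """Compare every cross-vocabulary pair through the shared reference."""
    return [(a, rel, b)
            for a, ra in anchors_a.items()
            for b, rb in anchors_b.items()
            if (rel := deduce(ra, rb)) is not None]
```

The vocabularies never need to share labels: (Long tumor, narrower, tumor) follows from the reference alone, while “Long tumor” and “brain tumor”, siblings in the reference, get no link.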

Lexical alignment

• Compare each pair of concepts
  • Use labels and synonyms of concepts
  • Heuristic method to discover equivalence and specialization relations

[Figure: lexical comparison of “tumor”, “brain”, “Long tumor” and “Long”, with a “more specific than” link]
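One possible lexical heuristic, sketched in Python: labels are normalized (case, accents, whitespace), a shared label signals equivalence, and a label that ends with another concept's whole label plus extra modifier words signals specialization (“long tumor” vs. “tumor”). This head-word rule is an assumption for illustration, not necessarily the heuristic STITCH uses, and it misfires on phrases whose last word is not the head:

```python
import unicodedata

def normalize(label):
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())

def lexical_relation(labels_a, labels_b):
    """Compare two concepts through all their labels (preferred + synonyms).
    Returns 'equivalent', 'narrower' (a more specific than b), 'broader' or None."""
    a = {tuple(normalize(l).split()) for l in labels_a}
    b = {tuple(normalize(l).split()) for l in labels_b}
    if a & b:
        return "equivalent"

    def specializes(x, y):  # x = modifier words followed by all of y
        return len(x) > len(y) and x[-len(y):] == y

    if any(specializes(x, y) for x in a for y in b):
        return "narrower"
    if any(specializes(y, x) for x in a for y in b):
        return "broader"
    return None

def align(vocab_a, vocab_b):
    """Compare each pair of concepts; vocab_* map a concept id to its labels."""
    return [(ca, rel, cb)
            for ca, la in vocab_a.items()
            for cb, lb in vocab_b.items()
            if (rel := lexical_relation(la, lb)) is not None]

# Hypothetical vocabularies: concept id -> labels (first one preferred).
VOCAB_A = {"A1": ["Long tumor"], "A2": ["Maagd Maria", "Heilige Moeder"]}
VOCAB_B = {"B1": ["tumor"], "B2": ["heilige moeder"]}
```

On these toy vocabularies, `align(VOCAB_A, VOCAB_B)` finds A1 more specific than B1, and A2 equivalent to B2 through its synonym.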

Lexical alignment: Manuscripts case

[Figure: lexical alignment results on the manuscripts case, with broader and equivalent links]

Automatic Alignment Techniques

• Lexical: labels of entities and textual definitions

• Structural: structure of the formal definitions of entities, position in the hierarchy

• Statistical: object information (e.g. book indexing)

• Shared background knowledge: using a conceptual reference to deduce correspondences

[Figure: example concepts “brain”, “Long tumor”, “tumor” and “Long”]

Statistical alignment

Statistical approach: KB case

• Experiment with the GOO trefwoordenthesaurus (GTT) and the Brinkman thesaurus (BK)

Statistical approach: KB case

• Comparing books indexed with BK concepts and books indexed with GTT concepts
  • Overlap measure

[Figure: the books indexed with concept C1 [GTT] compared with the books indexed with concept C2 [BK]]
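The overlap idea can be sketched in Python with the Jaccard coefficient, a common set-overlap measure swapped in here for illustration. It is not necessarily the measure that produced the scores on the Results slide, which this sketch does not reproduce. The book identifiers are hypothetical toy data; only the concept names come from the slides:

```python
def jaccard(books_c1, books_c2):
    """|C1 ∩ C2| / |C1 ∪ C2| over the sets of books indexed with each concept."""
    union = books_c1 | books_c2
    return len(books_c1 & books_c2) / len(union) if union else 0.0

def rank_pairs(index_gtt, index_bk, min_common=1):
    """Score every GTT/BK concept pair sharing at least min_common books.
    Each result: (score, GTT concept, BK concept, |C1|, |C2|, |C1 ∩ C2|)."""
    scored = []
    for c1, b1 in index_gtt.items():
        for c2, b2 in index_bk.items():
            common = len(b1 & b2)
            if common >= min_common:
                scored.append((jaccard(b1, b2), c1, c2, len(b1), len(b2), common))
    return sorted(scored, reverse=True)

# Hypothetical toy indexing: concept -> ids of the books it was assigned to.
INDEX_GTT = {"Schilderijen": {1, 2, 3, 4}, "Diabetes mellitus": {5, 6}}
INDEX_BK = {"schilderkunst": {2, 3, 4, 7}, "suikerziekte": {5, 6}}
```

On this toy data, “Diabetes mellitus” and “suikerziekte” share all their books (score 1.0), while “Schilderijen” and “schilderkunst” share three of five (0.6). Raw Jaccard over-rewards rarely used concepts, so in practice a `min_common` threshold well above 1 would be sensible.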

Results

1: 9132.9 (1704 3479 976) Schilderijen - schilderkunst

2: 8088.5 (1204 2330 767) Kwaliteitszorg - kwaliteitsmanagement

3: 6232.7 (820 1572 543) Personeelsmanagement - personeelsbeleid

4: 5392.1 (1399 3271 622) Beeldende kunsten - beeldende kunst

5: 5063.1 (4951 1152 613) Nederlands - Nederlandse taalkunde

17: 3421.8 (280 714 243) Diabetes mellitus - suikerziekte

Agenda

• Cultural Heritage and Semantic Web
• Two important issues
  • Publishing Cultural Heritage vocabularies on the Semantic Web
  • Vocabulary alignment
• Demo

Demo

• KB Illuminated Manuscripts
• BNF Mandragore Manuscripts

Manuscripts, 2nd Collection: BNF Mandragore

Manuscripts vocabularies

• Mandragore
  • Big (16,000 terms)
  • Weakly structured (2 levels deep, multiple inheritance)
  • Alternative lexical forms
  • Definitions

• Iconclass
  • Huge (>24,000 subjects)
  • Richly structured: 10-level hierarchy, cross-references
  • Compound concepts: keys, structural digits…
  • Keywords

[Monolingual case: Mandragore is in French, and Iconclass comes in French as well as English]

Demo

• http://stitch.cs.vu.nl/rp33333/MANDRA-SV-ICE-mandraNewNONE
  • Amphibians
  • Wheat

Conclusion: Semantic Web can help Cultural Heritage

• Representation of collections and associated expert vocabularies
• Publication and access
• Semantic integration

New opportunities for making knowledge accessible

Cf. Dublin Core RDF Schema

Links

• Semantic Web at W3C
  • http://www.w3.org/2001/sw/

• Semantic Web at Vrije Universiteit
  • http://www.cs.vu.nl/ai/kr/
  • http://www.cs.vu.nl/bi/

• SKOS
  • http://www.w3.org/2004/02/skos/

• Other Cultural Heritage and Semantic Web projects
  • MuseumFinland, http://www.museosuomi.fi/
  • eCulture, http://e-culture.multimedian.nl/

Thanks!
