Mapping Cultural Heritage Information to CIDOC-CRM

Presentation at the Semantics & Cultural Heritage meet-up at The British Museum, London, September 12, 2014

Page 1: Mapping Cultural Heritage Information to CIDOC-CRM

Mapping Cultural Heritage Information to CIDOC-CRM

Maria Theodoridou

Foundation for Research and Technology – Hellas

Institute of Computer Science

Page 2: Mapping Cultural Heritage Information to CIDOC-CRM

BM meet-up “Semantics and Cultural Heritage”, London, September 12, 2014

Overview

X3ML: An interface for sustainable management of data mapping process

Use Case: Mapping the dFMRÖ coin database to CIDOC-CRM

Page 3: Mapping Cultural Heritage Information to CIDOC-CRM

X3ML

An interface for sustainable management of data mapping process

Haridimos Kondylakis, Martin Doerr (FORTH-ICS)

Gerald de Jong (Delving B.V.)

Dominic Oldman (British Museum)

Page 4: Mapping Cultural Heritage Information to CIDOC-CRM

Cultural Diversity and Data Standards

Cultural information is more than a domain:

Collection description (art, archaeology, natural history, ...)

Archives and literature (records, treaties, letters, artful works, ...)

Administration, preservation, conservation of material heritage

Science and scholarship – investigation, interpretation

Presentation – exhibition making, teaching, publication

But how to make a documentation standard?

Each aspect needs its methods, forms, communication means

Data overlap, but do not fit in one schema

Page 5: Mapping Cultural Heritage Information to CIDOC-CRM

“One model to rule them all”

The CIDOC CRM

The CIDOC Conceptual Reference Model

A collaboration with the International Council of Museums

An ontology of 86 classes and 137 properties for culture and more

With the capacity to explain hundreds of (meta)data formats

Accepted by ISO TC46 in September 2000

International standard since 2006 - ISO 21127:2006

Serving as:

An intellectual guide to create schemata, formats, profiles

A language for analysis of existing sources for integration/mediation

“Identify elements with common meaning”

A transport format for data integration / migration / Internet

Page 6: Mapping Cultural Heritage Information to CIDOC-CRM

Mappings

Mapping Rule

“a sufficient specification for the transformation of each instance of a source schema into an instance of a target schema while preserving as much as possible its initial ‘meaning’”

In practice, mappings are produced manually by domain/IT experts:

Labor-intensive

Error-prone

Time-consuming

[Diagram: databases DB1, DB2, ..., DBn connected to CIDOC-CRM through mappings]

Page 7: Mapping Cultural Heritage Information to CIDOC-CRM

Existing Mapping Approaches

An enormous amount of work has already been done:

Relational databases to RDF/S and OWL models

Files/XMLs to RDF/S

However, previous approaches lack understanding of:

the borders between semantics and programming

the semantic heterogeneity cases between models

the business process that should be part of it

Page 8: Mapping Cultural Heritage Information to CIDOC-CRM

X3ML Workflow

[Workflow diagram. Labels: source databases; CIDOC-CRM; Schema Matching (Domain Experts); Schema Matching Definition file; URI generation specification (IT Experts); Terminology Mapping]

Page 9: Mapping Cultural Heritage Information to CIDOC-CRM

X3ML Mapping format

Page 10: Mapping Cultural Heritage Information to CIDOC-CRM

X3ML – Additional Nodes

Page 11: Mapping Cultural Heritage Information to CIDOC-CRM

X3ML - Intermediate Paths

Page 12: Mapping Cultural Heritage Information to CIDOC-CRM

X3ML – Variables, Conditions, Info & Comments Blocks

<if>
  <exists>[xpath]</exists>
</if>

<if><not>
  <if><exists>[xpath]</exists></if>
</not></if>

<if>
  <equals value="[value-for-comparison]">[xpath]</equals>
</if>

<if><not>
  <if><equals value="[value-for-comparison]">[xpath]</equals></if>
</not></if>
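
To make the condition forms above concrete, here is a minimal Python sketch of how such blocks could be evaluated against one source record. It is not the X3ML engine's implementation. The `evaluate` function, the sample `COIN` record and its `MATERIAL` field are assumptions made for illustration, `[xpath]` is limited to the small path subset that Python's `xml.etree.ElementTree` supports, and treating an empty element as "not existing" is also an assumption.

```python
import xml.etree.ElementTree as ET


def evaluate(cond: ET.Element, source: ET.Element) -> bool:
    """Evaluate one condition element against a single source record."""
    if cond.tag == "if":
        # In the forms shown above, <if> wraps exactly one test element.
        return evaluate(cond[0], source)
    if cond.tag == "not":
        # <not> wraps a nested <if> and inverts its result.
        return not evaluate(cond[0], source)
    if cond.tag in ("exists", "equals"):
        # findtext returns None when the path matches nothing.
        value = source.findtext((cond.text or "").strip())
        if cond.tag == "exists":
            # Assumption: an empty element counts as "not existing".
            return value is not None and value.strip() != ""
        return value == cond.get("value")
    raise ValueError(f"unsupported condition element: {cond.tag}")


# Hypothetical source record; MATERIAL is an invented field name.
coin = ET.fromstring("<COIN><ID>42</ID><MATERIAL>AU</MATERIAL></COIN>")

has_id = ET.fromstring("<if><exists>ID</exists></if>")
not_silver = ET.fromstring(
    '<if><not><if><equals value="AR">MATERIAL</equals></if></not></if>'
)

print(evaluate(has_id, coin))
print(evaluate(not_silver, coin))
```

Nesting `<if>` inside `<not>` is handled by plain recursion, which mirrors the way the slide composes the negated forms from the positive ones.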

<x3ml>
  <info>
    ... various fields describing the mapping ...
  </info>
  <namespaces/>
  <mappings>
    <mapping>
      <domain>
        <comments>
          ... various notes about the domain ...
        </comments>
      </domain>
      <link>
        <path>
          <comments>
            ... various notes about the path ...
          </comments>
        </path>
        <range>
          <comments>
            ... various notes about the range ...
          </comments>
        </range>
      </link>
    </mapping>
  </mappings>
  <comments>
    ... various notes about the mappings ...
  </comments>
</x3ml>

<entity variable="p1">
  [generate the value]
</entity>

Page 13: Mapping Cultural Heritage Information to CIDOC-CRM

Use Case: Mapping the dFMRÖ coin database to CIDOC-CRM

Martin Doerr, Maria Theodoridou (FORTH-ICS)

Edeltraud Aspöck, Klaus Vondrovec (ÖAW)

Page 14: Mapping Cultural Heritage Information to CIDOC-CRM

ARIADNE: Advanced Research Infrastructure for Archaeological Dataset Networking in Europe

FP7-INFRASTRUCTURES-2012-1 EU project, no. 313193

http://www.ariadne-infrastructure.eu/

Primary goals

To integrate existing archaeological research infrastructures

To enable the use of distributed datasets and services

To develop new and powerful technologies as an integral component of the archaeological research methodology

Page 15: Mapping Cultural Heritage Information to CIDOC-CRM

CIDOC-CRM was chosen as ARIADNE’s integration platform since its primary role is to enable information exchange and integration between heterogeneous sources of cultural heritage information.

During the first year of ARIADNE, several mapping activities were initiated to convert existing schemata of archaeological data to CIDOC-CRM.

Content providers were supported by FORTH.

ÖAW worked on the mapping of four databases:

dFMRÖ, a relational database of ancient Roman coin finds from Austria and Romania

UK Material Pool Database (Site DB)

UK Thunau Database (Image DB)

Franzhausen Kokoron Database (Cemetery DB)

Page 16: Mapping Cultural Heritage Information to CIDOC-CRM

dFMRÖ: digitale FundMünzen der Römischen Zeit in Österreich (Digital Coin Finds of the Roman Period in Austria)

Austrian Academy of Sciences, Numismatic Commission

Klaus Vondrovec

Access DB since 1999; MySQL DB online since 2007

http://www.oeaw.ac.at/numismatik/projekte/dfmroe/dfmroe.html

Page 17: Mapping Cultural Heritage Information to CIDOC-CRM

Page 18: Mapping Cultural Heritage Information to CIDOC-CRM

Tables

Page 19: Mapping Cultural Heritage Information to CIDOC-CRM

dFMRÖ coin db: mapping Coin

Source Domain: //COIN

Target Domain: E22 Man-Made Object

Two approaches for defining Coin:

Introduce a specialization of E22 Man-Made Object: Exx Coin subclass of E22 Man-Made Object

Define the Type of E22 Man-Made Object: E22 Man-Made Object. P2 has type: E55 Type = “Coin”

To choose, we need to answer the question:

Does the new class Coin have new properties that are not available in E22?

[Diagram label: E55 Type “Coin”]
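
As an illustration of the second approach (typing the object instead of subclassing), the following Python sketch emits the statements for one coin as plain (subject, predicate, object) tuples. It is a sketch under assumptions, not output of the X3ML tools: the `BASE` URI, the URI patterns and the function name are hypothetical, and the CRM identifiers follow the naming of the CIDOC-CRM RDFS encoding.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/dfmroe/"  # hypothetical base URI, not from the slides


def coin_as_typed_object(coin_id: str) -> list[tuple[str, str, str]]:
    """Describe a coin as an E22 Man-Made Object with P2 has type 'Coin'."""
    coin = f"{BASE}coin/{coin_id}"
    coin_type = f"{BASE}type/coin"
    return [
        # The coin stays an instance of the existing class E22.
        (coin, RDF_TYPE, CRM + "E22_Man-Made_Object"),
        # "Coin" is carried by an E55 Type, so no new class is needed.
        (coin, CRM + "P2_has_type", coin_type),
        (coin_type, RDF_TYPE, CRM + "E55_Type"),
        (coin_type, RDFS_LABEL, "Coin"),
    ]


for triple in coin_as_typed_object("42"):
    print(triple)
```

With this approach the target schema is unchanged, which is appropriate as long as the answer to the question above is "no". If coins needed properties that E22 lacks, a subclass would be required instead.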

Page 20: Mapping Cultural Heritage Information to CIDOC-CRM

dFMRÖ coin db: Identifiers

Source Domain: //COIN

Source Path: ID

Source Range: ID

Target Domain: E22 Man-Made Object

Target Path: P1 is identified by

Target Range: E41 Appellation

[Diagram label: E55 Type “Coin”]

Guideline: We map local identifiers in relational database tables explicitly only if these identifiers are visible in the user interface and used in other documents as well. Alternatively, we use the local database identifiers only for generating URIs for the record instance (here, the coin instance) and do NOT map COIN.ID at all.
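
The guideline can be sketched in Python as follows. This is an illustrative sketch only: the `BASE` URI, the URI patterns, the `id_is_public` flag and the function name are assumptions, not part of the dFMRÖ mapping or of X3ML.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/dfmroe/"  # hypothetical base URI


def identifier_triples(coin_id: str, id_is_public: bool) -> list[tuple[str, str, str]]:
    """Apply the identifier guideline to one COIN row."""
    # The local ID always drives URI generation for the record instance.
    coin = f"{BASE}coin/{coin_id}"
    triples = [(coin, RDF_TYPE, CRM + "E22_Man-Made_Object")]
    if id_is_public:
        # Only an ID that users see and cite is mapped explicitly
        # (P1 is identified by -> E41 Appellation).
        appellation = f"{coin}/appellation"
        triples += [
            (coin, CRM + "P1_is_identified_by", appellation),
            (appellation, RDF_TYPE, CRM + "E41_Appellation"),
            (appellation, RDFS_LABEL, coin_id),
        ]
    return triples


print(len(identifier_triples("42", id_is_public=False)))
print(len(identifier_triples("42", id_is_public=True)))
```

In both branches the coin URI is the same, so switching between the two options later does not change how other statements refer to the coin.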

Page 21: Mapping Cultural Heritage Information to CIDOC-CRM

Mapping joins

Source Domain: //COIN

Source Path: COUNTRY_ID == COUNTRY_ID

Source Range: //COUNTRY

Target Domain: E22 Man-Made Object

Target Path: P108i was produced by -> E12 Production (p1) -> P10 falls within

Target Range: E4 Period
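
The join above can be sketched in Python over in-memory rows. This is a minimal sketch under assumptions: the sample rows, the `BASE` URI, the URI patterns and the function name are hypothetical, and the production node plays the role of the variable p1 on the slide.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/dfmroe/"  # hypothetical base URI


def production_period_triples(
    coins: list[dict[str, str]], countries: list[dict[str, str]]
) -> list[tuple[str, str, str]]:
    """Join COIN.COUNTRY_ID to COUNTRY.COUNTRY_ID and emit the target path."""
    known = {row["COUNTRY_ID"] for row in countries}
    triples = []
    for coin in coins:
        country_id = coin.get("COUNTRY_ID")
        if country_id not in known:
            continue  # no join partner: this link is simply not generated
        coin_uri = f"{BASE}coin/{coin['ID']}"
        production = f"{coin_uri}/production"  # intermediate node (p1)
        period = f"{BASE}country/{country_id}"
        triples += [
            (coin_uri, CRM + "P108i_was_produced_by", production),
            (production, RDF_TYPE, CRM + "E12_Production"),
            (production, CRM + "P10_falls_within", period),
            (period, RDF_TYPE, CRM + "E4_Period"),
        ]
    return triples


# Hypothetical rows; only ID and COUNTRY_ID are taken from the slide.
coins = [{"ID": "1", "COUNTRY_ID": "AT"}, {"ID": "2", "COUNTRY_ID": "XX"}]
countries = [{"COUNTRY_ID": "AT"}]
for triple in production_period_triples(coins, countries):
    print(triple)
```

The flat foreign key becomes a two-step path, and the production node it introduces is available as a hook for further statements about how the coin was made.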

Page 22: Mapping Cultural Heritage Information to CIDOC-CRM

Mixing categorical and factual info

Need to separate categorical and factual data

Inconsistent information:

Find spot -> for a specific coin

Historical facts -> for a category of coins

Page 23: Mapping Cultural Heritage Information to CIDOC-CRM

Categorical production

Need to extend the model in order to support categorical production (similar to FRBRoo R26 produced things of type and R7 is example of).

Type can take values such as "AU from Rome, mint ...", which characterize the "edition" of the mint: coins that can be recognized as the outcome of the same minting process. Typically we would assume that a unique stamp was used.

[Diagram. Nodes: E22 Man-Made Object “MyCoin”; E12 Production (p1); E55 Type “AU from Rome”. Properties: P108i was produced by; PC1 produced things of type; PC2 is example of]

Page 24: Mapping Cultural Heritage Information to CIDOC-CRM

Mixing categorical and factual info

[Diagram. Nodes: E22 Man-Made Object “MyCoin”; E12 Production (p1); E7 Activity (ia1); E55 Type “AU from Rome, mint …”; E55 Type “AU” (DENOMINATION); E55 Type “Issuing”. Properties: P108 has produced; PC1 produced things of type; P2 has type (twice); P17 was motivated by (needs specialization “gave order”)]

Page 25: Mapping Cultural Heritage Information to CIDOC-CRM

Issuer

Source Domain: //COIN

Source Path: ISSUER_ID == PR_ID

Source Range: //ISSUER

Target Domain: E22 Man-Made Object

Target Path: P108i was produced by -> E12 Production (p1) -> P17 was motivated by -> E7 Activity (ia1, P2 has type E55 Type "Issuing") -> P14 carried out by

Target Range: E39 Actor

"Issuer" is an accidental role: it does not characterize an actor independently of particular contexts of activity. Therefore the Actor does not have the type "Issuer"; only the activity has the type "Issuing".
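
The point about accidental roles can be sketched in Python as follows. The sketch assumes one reading of the slide's diagram, in which the production (p1) is motivated by an issuing activity (ia1) that the actor carried out. The `BASE` URI, the URI patterns and the function name are hypothetical.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/dfmroe/"  # hypothetical base URI


def issuer_triples(coin_id: str, issuer_id: str) -> list[tuple[str, str, str]]:
    """Link a coin to its issuer through a typed issuing activity."""
    coin = f"{BASE}coin/{coin_id}"
    production = f"{coin}/production"  # variable p1 on the slide
    issuing = f"{coin}/issuing"        # variable ia1 on the slide
    actor = f"{BASE}issuer/{issuer_id}"
    return [
        (coin, CRM + "P108i_was_produced_by", production),
        (production, RDF_TYPE, CRM + "E12_Production"),
        (production, CRM + "P17_was_motivated_by", issuing),
        (issuing, RDF_TYPE, CRM + "E7_Activity"),
        # The role lives on the activity ...
        (issuing, CRM + "P2_has_type", f"{BASE}type/issuing"),
        (issuing, CRM + "P14_carried_out_by", actor),
        # ... while the actor itself gets no "Issuer" type.
        (actor, RDF_TYPE, CRM + "E39_Actor"),
    ]


for triple in issuer_triples("42", "7"):
    print(triple)
```

Because the actor carries no role type, the same actor can appear in other activities in other roles without contradiction.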

Page 26: Mapping Cultural Heritage Information to CIDOC-CRM

dFMRÖ coin db

Page 27: Mapping Cultural Heritage Information to CIDOC-CRM

dFMRÖ coin db

Page 28: Mapping Cultural Heritage Information to CIDOC-CRM

dFMRÖ coin db

Page 29: Mapping Cultural Heritage Information to CIDOC-CRM

CIDOC CRM Mapping Repository

Published schema matching definitions are available at:

http://139.91.183.3:9080/mapping_technology/

The schema matching definition format (Version 1.0) is available at:

http://139.91.183.3:9080/mapping_technology/xsd/x3ml/x3ml_v1.0.xsd

The Mapping Memory Manager (3M) is available at:

http://139.91.183.3:9080/3M/

Domain experts are able to easily understand & edit X3ML mapping files

You are kindly invited to send us your schema matching definition.

Page 30: Mapping Cultural Heritage Information to CIDOC-CRM

Lessons from mapping experiences

Semantic interoperability can be defined by the capability of mapping

Mapping for epistemic networks is relatively simple:

Specialist/primary information databases frequently employ a flat schema, reducing complex relationships into simple fields

Source fields frequently map to composite paths under the CRM, making semantics explicit using a small set of primitives

Intermediate nodes are postulated or deduced (e.g., “production” from “coin”, “birth” from “person”). They are the hooks for integration with complementary sources

Cardinality constraints must not be enforced: sources may contain alternative or incomplete knowledge

Domain experts easily learn schema mapping

IT experts may not understand meaning, may underestimate it, or may be bored by it!

Intuitive tools for domain experts needed:

Separate identifier matching from schema mapping

Separate terminology mediation from schema mapping

Page 31: Mapping Cultural Heritage Information to CIDOC-CRM

Thank you!
