22
oMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR [email protected]

OMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR [email protected]

Embed Size (px)

Citation preview

oMAP: An Implemented Framework for Automatically Aligning OWL

Ontologies

SWAP, December, 2005

Raphaël Troncy, Umberto StracciaISTI-CNR

[email protected]

2

Outline• Motivations

• oMAP• A formal framework• The different classifiers used

• Evaluation

• Conclusion

3

Motivations• Heterogeneity of information systems

• Ontologies as a solution to data heterogeneity on the Web

• Ontologies are themselves heterogeneous:• knowledge representation language• degree of formalization

• Semantic Web• More and more OWL/RDF ontologies on the Web• Need for comparing/reusing/merging ontologies

• partially covering the same domain• different version of the same ontology

4

Motivations (cont.)

• Distributed Information Retrieval• Resource selection: The agent has to select a subset of some

relevant resources

• Query reformulation: For every selected resource, the agent has to re-formulate its information need accordingly

• Data fusion & rank aggregation: The results from the selected resources have finally to be merged together.

5

Aligning Ontologies• A matching operator:

• Input: a set of discrete entities (tables, XML elements, classes, properties…)

• Output:• relationship holding between the entities (subsumption,

equivalence, disjointness…)• a confidence measure

• Automatic vs manual techniques• Numerous work from various communities

• schema matching, machine learning, data integration

[ ]1..0∈v

6

Example

Equivalence

Subsumption

Disjointness

7

oMAP: A Formal Framework• Inspirations:

• Formal work in data exchange [Fagin et al., 2003]

• GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003]

• Notations:• A mapping is a tuple: M = (T, S, ∑)

• S et T are the source and target ontologies

• Si is an OWL entity (class, datatype property, object property) of the ontology

• ∑ is a set of mapping rules: αij Tj ← Si

8

oMAP: Overall Strategy• A three step process:

• Form possible ∑ sets and estimate its quality based on the quality measures for its mapping rules

• For each mapping rule Tj ← Si, estimate its confidence αij which also depends on the ∑ it belongs to

• Use heuristics to build iteratively the final set of mappings

9

oMAP: Combining Classifiers• Weight of a mapping rule:

• αij = w (Si,Tj, ∑)

• Using different classifiers:• w (Si,Tj,CLk) is the classifier's approximation of the

rule Tj ← Si

• Combining the approximations:• Use of a priority list: CL1 CL2 … CLn

p p p

10

Terminological Classifiers

• Same entity names (or URI)

• Same entity name stems

⎩⎨⎧

=otherwise 0

name, same have , if 1),,(

ji

Nji

TSCLTSw

⎩⎨⎧

=otherwise 0

stem, same have , if 1),,(

ji

Sji

TSCLTSw

11

Terminological Classifiers• String distance name

• WordNet distance name

• lcs is the longest common substring between Si and Tj

• sim =

))(),(max(

),(),,(

ji

jinLevenshteiLDji TlengthSlength

TSdistCLTSw =

⎪⎩

⎪⎨

⎟⎟⎠

⎞⎜⎜⎝

+

=otherwise

)()(

*2,max

synonyms, are , if 1

),,(

ji

ji

WNji

TlengthSlength

lcssim

TS

CLTSw

UI

)()(

)()(

ji

ji

TsynonymSsynonym

TsynonymSsynonym

12

Machine Learning-Based Classifiers

• Collecting individuals:• label for the named individuals• data value for the datatype properties• type for the anonymous individuals and the

range of object properties

• Recursion on the OWL definition:• depth parameter

13

Machine Learning-Based Classifiers

• ExampleIndividual (x1 type (Conference)

value (label "Int. Conf. on WISE") value (location x2) )

Individual (x2 type (Address)

value (city "New York city") value (country "USA") )

u1 = ("Int. Conf. on WISE", "Address")

u2 = ("Address", "New York City", "USA")

• Naïve Bayes text classifier∑ ∏∈ ∈

⋅=jTux um

iiNBji SmSCLTSw),(

)Pr()Pr(),,(

14

Structural and Semantics-Based Classifier

• If Si and Tj are property names:

• If Si and Tj are concept names1:

⎪⎩

⎪⎨⎧

Σ

Σ∉←=Σ

otherwise ),,('

if 0),,(

ji

ij

ji TSw

STTSw

⎪⎪⎪

⎪⎪⎪

⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎟⎠

⎞⎜⎜⎝

⎛Σ+Σ⋅

+

Σ∈←=Σ

Σ∉←

∑∈

otherwise ),,(max),,('1)Set(

1

and 0D if ),,('

if 0

),,(

),(

t

setDjCijisetji

ijji

ij

ji

DCwTSw

STTSw

ST

TSw

1 Where D = D(Si) * D(Tj) ; D(Si) represents the set of concepts directly parent of Si

15

Structural and Semantics-Based Classifier

• Let CS=(QR.C) and DT=(Q’R’.D), then1:

• Let CS=(op C1…Cm) and DT=(op’ D1…Dm), then2:

),,(),',()',(),,( Σ⋅Σ⋅=Σ DCwRRwQQwDCw QTS

),min(

),,(max

)',(),,(),(

nm

DCw

opopwDCwsetDjCi

jiset

opTS

⎟⎟⎠

⎞⎜⎜⎝

⎛Σ

⋅=Σ∑

1 Where Q,Q’ are quantifiers, R,R’ are property names and C,D concept expressions2 Where op, op’ are concept constructors and n,m ≥ 1

16

Structural and Semantics-Based Classifier

• Possible values for wop and wQ weights

wop wQ

⊓ ⊔ ¬

⊓ 1 1/4 0

⊔ 1 0

¬ 1

1 1/4

1

n

n

m

1 1/3

m

1

17

Evaluation• More and more techniques / tools for aligning

ontologies• difficult to compare all the approaches theoretically• pragmatism: evaluation campaign and contest

• I3CON : based on the NIST Text Retrieval Conference model • EON : systematic benchmark tests on all OWL constructs• OAEI : http://oaei.inrialpes.fr

• Alignment API [Euzenat, ISWC 2004]

• common format for representing / exchanging the alignments found

• tools and metrics for evaluating these alignments

18

• 3 series of tests on bibliographic ontologies:• simple tests: identity, specialization/generalization of the

language• systematic tests: some features of the initial ontology

are progressively discarded• complex tests: aligning 4 real ontologies available on

the Web

• The directory real world case consists of aligning web sites directory using the large dataset

19

20

Conclusion• oMAP: a formal framework for aligning

automatically OWL ontologies• Combining several specific classifiers

• terminological classifiers• machine learning-based classifiers• structural and semantics-based classifier

• Empirical evaluation on benchmark tests• using traditional information retrieval metrics• machine resources, memory, computation time…

not yet considered

21

Future Work• Alignment:

• Using additional classifiers:• kNN, KL-distance, WordNet or other terminological

resources• straightforward theoretically but practically difficult

• Finding complex alignment• name = firstName + lastName

• Distributed Information Retrieval• Automated relevant resource selection

22

Useful Links• oMAP: http://homepages.cwi.nl/~troncy/oMAP/

• Tutorial: Schema and Ontology Matching @ ESWC http://dit.unitn.it/~accord/Presentations/ESWC'05-MatchingHandOuts.pdf

• Alignment API: http://co4.inrialpes.fr/align/align.html

• OAEI: http://oaei.inrialpes.fr/

• State of the Art:• P. Shvaiko and J. Euzenat: A Survey of Shema-based Matching

Approaches. Journal on Data Semantics (JoDS), 2005• KW Consortium: State of the Art on Ontology Alignment. Knowledge

Web D2.2.3, 2004