Page 1: Khalid Saleem LIRMM

Discovering Mappings Between Schemas: The Different Approaches

Schema Matching: Different Approaches

Khalid Saleem

LIRMM

Page 2: Khalid Saleem LIRMM

Schema and Ontology

  • Schema represents the database community.
    • Schemas often do not provide explicit semantics for their data (ER, XML document schema).
  • Ontology represents the AI community.
    • Ontologies are logical systems that themselves obey some formal semantics.
    • They are designed to be interpreted by computers for reasoning (OWL).
  • Schemas and ontologies are similar in the sense that:
    • Both provide a vocabulary of terms that describes a domain.
    • Both constrain the meaning of the terms used in the vocabulary (hierarchy/relations).

[Figure: language layers XML, XML Schema, RDF, RDF Schema, OWL]

Page 3: Khalid Saleem LIRMM

Schema vs Ontology: Examples

DAML+OIL:

    class-def animal
    % plants are a class that is disjoint from animals
    class-def plant
      subclass-of NOT animal
    % it is necessary but not sufficient for a tree to be a plant:
    class-def tree
      subclass-of plant
    % branches are PART OF trees
    class-def branch
      slot-constraint is-part-of
        has-value tree
    % it is necessary and sufficient for a carnivore to be an animal:
    class-def defined carnivore
      subclass-of animal
      slot-constraint eats
        value-type animal
    % herbivores eat only plants OR part of plants
    class-def defined herbivore
      subclass-of animal
      slot-constraint eats
        value-type plant OR (slot-constraint is-part-of has-value plant)

XML:

    <class-def>
      <name>branch</name>
      <slot-constraint>
        <name>is-part-of</name>
        <has-value>tree</has-value>
      </slot-constraint>
    </class-def>

Page 4: Khalid Saleem LIRMM

Match

Takes two schemas/ontologies as input and produces a mapping between the elements of the two schemas that correspond semantically to each other.

Match types shown in the figure: 1-1 match, complex match.

BooksSource A

    price    book-title             author-name
    26,60    Harry Potter           J. K. Rowling
    11,50    Marie Des Intrigues    Juliette Benzoni

BooksSource B

    listed-price    title             a-fname    a-lname
    16,50           Nous Les Dieux    Bernard    Werber
    24              Pompei            Robert     Harris
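As a rough illustration of what such a mapping might look like in code, the Python sketch below encodes correspondences between the two book sources. The pairing of fields (price with listed-price, book-title with title, author-name split into a-fname and a-lname) and all helper names are assumptions made for illustration; the slide does not spell them out.

```python
# Hypothetical mapping from BooksSource A to BooksSource B.
# 1-1 matches: one element of A corresponds to exactly one element of B.
ONE_TO_ONE = {"price": "listed-price", "book-title": "title"}


def split_author(full_name):
    """Complex match: author-name -> (a-fname, a-lname).

    Assumes the last whitespace-separated token is the family name.
    """
    first, _, last = full_name.rpartition(" ")
    return first, last


def translate_row(row_a):
    """Rewrite a record of source A using the element names of source B."""
    row_b = {ONE_TO_ONE[key]: value
             for key, value in row_a.items() if key in ONE_TO_ONE}
    row_b["a-fname"], row_b["a-lname"] = split_author(row_a["author-name"])
    return row_b


row = {"price": "26,60", "book-title": "Harry Potter",
       "author-name": "J. K. Rowling"}
print(translate_row(row))
```

The 1-1 matches reduce to a renaming table, while the complex match needs a transformation function, which is why the two kinds are usually treated separately.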

Page 5: Khalid Saleem LIRMM

Schema Matching vs Ontology Matching

  • Schema matching is usually performed with techniques that try to guess the meaning encoded in the schemas.
  • Ontology matching tries to exploit the knowledge explicitly encoded in the ontologies.
  • In real-world applications, solutions from both domains are mutually beneficial.

Page 6: Khalid Saleem LIRMM

Application Domains

  • Traditional (static)
    • Schema integration
    • Data warehousing
    • E-commerce
    • Catalogue integration
  • New frontiers (dynamic)
    • Semantic query processing
    • Agent communication
    • Web services integration
    • P2P databases

Page 7: Khalid Saleem LIRMM

Basic Classification of Matchers [RB01]

  • Schema vs data instance
  • Element vs structure
  • Language vs constraint
    • String based: prefix, suffix, e.g. auth : author
    • Tokenization, lemmatization, elimination [GSY04], e.g. Tool_Kit : (Tool, Kit), Kits : Kit, IsRelatedTo : Related
    • Data types, value domain, e.g. 1..12 : month
  • Match cardinalities: 1:1, 1:n, n:m, e.g. (Tel Res, Other) : (Tel Day, Evening, Night)
  • Auxiliary information: global schema, dictionaries, thesauri, previous match decisions, user input
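The string-based techniques listed above can be sketched in a few lines of Python. This is a minimal illustration only: the stop-word list, the naive plural-stripping rule standing in for real lemmatization, and all function names are assumptions, not part of [RB01] or [GSY04].

```python
import re

# Assumed stop words for the elimination step.
STOP_WORDS = {"is", "to", "of", "the", "a", "in"}


def is_affix_match(a, b):
    """Prefix/suffix test, e.g. auth : author."""
    shorter, longer = sorted((a.lower(), b.lower()), key=len)
    return bool(shorter) and (longer.startswith(shorter) or longer.endswith(shorter))


def tokenize(label):
    """Split a label on separators and camelCase boundaries."""
    tokens = []
    for part in re.split(r"[_\-\s]+", label):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [token.lower() for token in tokens]


def lemmatize(token):
    """Very naive lemmatization: strip a plural 's' (an assumption)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(label):
    """Tokenization, lemmatization, then elimination of stop words."""
    lemmas = [lemmatize(token) for token in tokenize(label)]
    return [lemma for lemma in lemmas if lemma not in STOP_WORDS]


print(is_affix_match("auth", "author"))  # True
print(tokenize("Tool_Kit"))              # ['tool', 'kit']
print(normalize("Kits"))                 # ['kit']
print(normalize("IsRelatedTo"))          # ['related']
```

A production matcher would replace `lemmatize` with a dictionary-backed lemmatizer; the suffix rule here only covers the slide's own examples.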

Page 8: Khalid Saleem LIRMM

Basic Classification of Matchers [SE05]

  • Structure-level techniques
    • Graph matching: children, leaves, relations
    • Taxonomy-based techniques, e.g. if the super-concepts are the same then the sub-concepts are the same, or vice versa
    • Model based: ER, XML or XML Schema, OWL, OO, etc.
  • Combinational matchers [RB01]
    • Hybrid matcher
    • Multiple/composite matcher
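To make the idea of a composite matcher concrete, here is a small Python sketch that runs two independent matchers and combines their scores. The choice of matchers (a name similarity based on the standard library's `difflib.SequenceMatcher` and an exact data-type comparison), the equal weights, and the element representation as dictionaries are all assumptions for illustration, not the design of any tool named in this talk.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Element-level matcher: string similarity of the element names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_similarity(a, b):
    """Constraint-based matcher: 1.0 when the data types agree, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


# Assumed default: two matchers with equal weights.
DEFAULT_MATCHERS = ((name_similarity, 0.5), (type_similarity, 0.5))


def composite_similarity(a, b, matchers=DEFAULT_MATCHERS):
    """Combine independently computed scores into one weighted score."""
    return sum(weight * matcher(a, b) for matcher, weight in matchers)


price = {"name": "price", "type": "decimal"}
listed_price = {"name": "listed-price", "type": "decimal"}
title = {"name": "title", "type": "string"}

print(composite_similarity(price, listed_price))
print(composite_similarity(price, title))
```

A hybrid matcher would instead fold several criteria into a single algorithm, rather than combining the outputs of separate ones as done here.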

Page 9: Khalid Saleem LIRMM

Match Dimensions [SE05]

To design a match algorithm, we need to know how it will be used, i.e. its dimensions:

  • Input of the algorithm
    • Data or schema; element level or structure level
  • Characteristics of the matching process
    • Requires exact or approximate matching
    • Performance over quality
  • Output of the algorithm
    • The output is a graded result, or the algorithm is one of a set of match algorithms whose results are combined into a final mapping

Page 10: Khalid Saleem LIRMM

Existing Matching Tools

  • Cupid [MBR01]
  • COMA (COMA++) [ADMR05]
  • Similarity Flooding, SemInt, Artemis, DIKE, TransScm, AutoMed, Charlie [TBBT04]
  • Ontology specific: NOM/QOM, OLA, Anchor-PROMPT, S-Match [GSY04], HICAL, SKAT

Page 11: Khalid Saleem LIRMM

Matching Tools (continued)

  • Machine learning: GLUE (LSD, CGLUE) [DMDH02], Automatch

These tools do not completely fulfil the requirements of large-scale schema matching because they:

  • are not fully automated
  • place little emphasis on search space optimisation

Page 12: Khalid Saleem LIRMM

Our Approach

Motivation: large-scale scenario

  • Peer-to-peer information systems over the XML Web

[Figure: XML schema trees with abbreviated node labels, and the label matches a = w, b = o, f = d]

Legend: a: author, b: book, d: detail, f: information, g: general, h: birth, i: isbn, n: name, o: own-books, p: publisher, r: price, t: title, w: writer

Our schema matching and integration approach: tree mining techniques

  • Name matcher
  • Element-level matching
  • Structure-level matching
  • Search sub-trees

Page 13: Khalid Saleem LIRMM

Tree Mining Approach

  • Our work extends these data structures to the schema matching and integration process, to handle large sets of XML schema trees.
  • It employs:
    a) An element-level name matcher (same node label or synonym) to cluster similar/synonym labels.
    b) The node scope values, to extract semantics from the structure. E.g. the node with label name, n2 [2,2], is a descendant of the node with label author, n1 [1,2], and not of the node with label publisher, n3 [3,4]; this is verified using the descendant test.
  • Inspired by the tree mining algorithms and data structures based on node scope values (calculated by depth-first pre-order traversal), top-down [Z02].

Example tree (label, node number, scope):

    book n0 [0,5]
      author n1 [1,2]
        name n2 [2,2]
      publisher n3 [3,4]
        name n4 [4,4]
      title n5 [5,5]

Descendant node check: if the scope of node x is [X,Y] and the scope of a descendant node xd is [Xd,Yd], then Xd > X and Yd <= Y.
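The scope values and the descendant check can be reproduced with a short Python sketch. It assumes a tree is given as nested `(label, children)` tuples and that a node's scope is its own pre-order number paired with the pre-order number of its rightmost descendant, which is consistent with the example tree above; the function names are hypothetical.

```python
def assign_scopes(tree):
    """Return [(label, start, end)] in depth-first pre-order.

    start is the node's own pre-order number, end is the number of its
    rightmost descendant (equal to start for a leaf).
    """
    nodes = []

    def visit(node):
        label, children = node
        index = len(nodes)
        nodes.append([label, index, index])
        for child in children:
            visit(child)
        nodes[index][2] = len(nodes) - 1  # last node numbered in this subtree

    visit(tree)
    return [tuple(node) for node in nodes]


def is_descendant(scope_d, scope_x):
    """Descendant test: [Xd,Yd] lies below [X,Y] if Xd > X and Yd <= Y."""
    (xd, yd), (x, y) = scope_d, scope_x
    return xd > x and yd <= y


book = ("book", [
    ("author", [("name", [])]),
    ("publisher", [("name", [])]),
    ("title", []),
])

for label, start, end in assign_scopes(book):
    print(f"{label} n{start} [{start},{end}]")

print(is_descendant((2, 2), (1, 2)))  # True: name n2 is below author n1
print(is_descendant((2, 2), (3, 4)))  # False: name n2 is not below publisher n3
```

Because the test is two integer comparisons, ancestry can be checked without walking the tree, which is what makes scope values attractive for large forests.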

Page 14: Khalid Saleem LIRMM

Tree Mining Approach (continued)

  • Data structures used
    • Label list: sorted list of all node labels in the forest of XML schema trees.
    • xGrid: matrix in which each row represents one participating XML tree and each column represents the corresponding node label. Each cell contains the scope values, the parent node number and the mapping information.
  • Output
    • Creation of a mediated schema tree from the given forest of participating XML schema trees.
    • Generation of mapping information between the participating schema trees and the mediated schema tree.
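One possible in-memory layout for the label list and the xGrid is sketched below in Python. The representation is an assumption: trees are nested `(label, children)` tuples, a label is repeated in the label list as often as it occurs in any single tree, each cell is a dictionary, and the mapping field is left unset because filling it requires the mediated schema. None of the names come from the original work.

```python
from collections import Counter


def flatten(tree):
    """Pre-order list of (label, start, end, parent node number)."""
    nodes = []

    def visit(node, parent):
        label, children = node
        index = len(nodes)
        nodes.append([label, index, index, parent])
        for child in children:
            visit(child, index)
        nodes[index][2] = len(nodes) - 1

    visit(tree, -1)  # the root has parent -1
    return [tuple(node) for node in nodes]


def build_label_list(forest):
    """Sorted labels; a label repeats if one tree uses it several times."""
    needed = Counter()
    for tree in forest:
        counts = Counter(label for label, *_ in flatten(tree))
        for label, count in counts.items():
            needed[label] = max(needed[label], count)
    return sorted(needed.elements())


def build_xgrid(forest, labels):
    """One row per tree, one column per entry of the label list."""
    grid = []
    for tree in forest:
        row = [None] * len(labels)
        for label, start, end, parent in flatten(tree):
            # first still-empty column carrying this label
            col = next(i for i, name in enumerate(labels)
                       if name == label and row[i] is None)
            row[col] = {"scope": (start, end), "parent": parent, "mapping": None}
        grid.append(row)
    return grid


s1 = ("book", [("author", [("name", [])]),
               ("publisher", [("name", [])]),
               ("title", [])])
s4 = ("writer", [("name", []), ("birth", []),
                 ("own-books", [("title", [])])])

labels = build_label_list([s1, s4])
grid = build_xgrid([s1, s4], labels)
print(labels)
print(grid[0][labels.index("author")])
```

Keeping one column per label occurrence means that nodes with the same or clustered labels across trees line up in the same column, so candidate matches can be read column by column.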

Page 15: Khalid Saleem LIRMM

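Tree Mining Approach (continued)

Label list with the occurrences of each label, written <tree number, node number>:

    b: <1,0>, <2,0>, <3,0>, <4,3>
    f: <2,3>, <3,4>
    g: <3,1>
    h: <4,2>
    i: <2,6>
    n: <1,2>, <2,2>, <3,3>, <4,1>
    n: <1,4>, <2,5>, <3,6>
    p: <1,3>, <2,4>
    r: <3,5>
    t: <1,5>, <2,7>, <3,2>, <4,4>
    w: <1,1>, <2,1>, <4,0>

xGrid columns: a b d f g h i n n o p r t w R

xGrid rows (Sm and the participating trees S1 to S4), non-empty cells only. Each cell holds the scope values, the parent node number and the mapping information:

    Sm: b 1,11,0 | f 5,9,1 | g 11,11,1 | h 4,4,2 | i 8,8,5 | n 3,3,2 | n 7,7,6 | p 6,7,5 | r 9,9,5 | t 10,10,1 | w 2,4,1 | R 0,11,-1,-1
    S1: a 1,2,0,13 | b 0,5,-1,1 | n 2,2,1,7 | n 4,4,3,8 | p 3,4,0,10 | t 5,5,0,12
    S2: b 0,7,-1,1 | f 3,6,0,3 | i 6,6,3,6 | n 2,2,1,7 | n 5,5,4,8 | p 4,5,3,10 | t 7,7,0,12 | w 1,2,0,13
    S3: a 3,3,1,7 | b 0,4,-1,1 | d 4,6,0,3 | g 1,3,0,4 | p 6,6,4,8 | r 5,5,4,11 | t 2,2,1,12
    S4: h 2,2,0,5 | n 1,1,0,7 | o 3,4,0,1 | t 4,4,3,12 | w 0,4,-1,13

Mapping information is the column number of the node.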

Page 16: Khalid Saleem LIRMM

Conclusion

  • Element-level name and linguistic matching, supported by a thesaurus, is an integral part of every match system.
  • With systems moving towards schema/ontology-based manipulation, and in the absence of global schemas or previous matching results, structure-level matching is equally important for making out the semantics.
  • A peer-to-peer environment requires new methods that deliver both performance and mapping quality, i.e. the integration of tree mining techniques for matching and for search space optimisation.
  • Machine learning algorithms can be beneficial in the P2P environment at later stages, once training examples have been created from instance data, provided the target domain remains the same.

Page 17: Khalid Saleem LIRMM

References

[AH04] Antoniou G. and van Harmelen F. (2004) A Semantic Web Primer. The MIT Press.

[ADMR05] Aumueller D., Do H. H., Massmann S. and Rahm E. (2005) Schema and Ontology Matching with COMA++. Proceedings of the International Conference on Management of Data (SIGMOD) 2005.

[BR04] Bellahsène Z. and Roantree M. (2004) Querying Distributed Data in a Super-peer based Architecture. DEXA 2004.

[BMP04] Bernstein PA., Melnik S., Petropoulos M. and Quix C. (2004) Industrial-Strength Schema Mapping. SIGMOD Record, Vol. 33, No. 4, December 2004.

[DMDH02] Doan AH., Madhavan J., Domingos P. and Halevy A. (2002) Learning to Map Ontologies on the Semantic Web. WWW 2002.

[MBR01] Madhavan J., Bernstein PA. and Rahm E. (2001) Generic Schema Matching with Cupid. VLDB 2001.

[RB01] Rahm E. and Bernstein PA. (2001) A Survey of Approaches to Automatic Schema Matching. VLDB Journal 10(4): 334-350.

[SE05] Shvaiko P. and Euzenat J. (2005) A Survey of Schema-based Matching Approaches. Journal on Data Semantics, 2005.

[TBBT04] Tranier J., Baraer R., Bellahsène Z. and Teisseire M. (2004) Where’s Charlie: Family Based Heuristics for Peer-to-Peer Schema Integration. IDEAS 2004, 227-235.

[Z02] Zaki MJ. (2002) Efficiently Mining Frequent Trees in a Forest. 8th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, July 2002.

http://www.w3.org/TR/daml+oil-reference

http://www.doc.ic.ac.uk/automed/

Page 18: Khalid Saleem LIRMM

Thank you