Doan Dai Duong and Le Thi Thu Thuy, {Duong_Dai.Doan, Thuy_Thi_Thu.Le}@unb.ca, The University of New Brunswick, Fredericton, NB, Canada. A Unified Framework for the Semantic Integration of XML Databases. First IEEE International Conference on Digital Information Management (ICDIM), December 06-08, 2006. Presented by Virendrakumar C. Bhavsar.


1

Doan Dai Duong and Le Thi Thu Thuy
{Duong_Dai.Doan, Thuy_Thi_Thu.Le}@unb.ca

The University of New Brunswick, Fredericton, NB, Canada

A Unified Framework for the Semantic Integration of XML Databases

First IEEE International Conference on Digital Information Management (ICDIM)

December 06-08, 2006

Presented by Virendrakumar C. Bhavsar

2

Agenda

• Introduction
• XML Declarative Description (XDD)
• Modeling of Data Components
• Modeling of Processing Components
• Conclusion

3

Introduction

General model of XML database integration

[Figure: General model of XML database integration. Step 1: Schema Integration (RDS, RDS and OODS sources are converted to XML schema 1, XML schema 2, ..., XML schema N; Ontology; Integration System; XML Database Schema; Integrated XML schema; Set of mappings). Step 2: Query Processing (Users; query; Integrated schema; Integrated data; Local data).]

Sample student records shown in the figure:

<student source="A">
  <Fname>Xuan</Fname><room>G26</room><nationality>Vietnam</nationality>
</student>
<student source="B">
  <Fname>Phuoc</Fname><room>A12</room><nationality>Campuchia</nationality>
</student>

4

Proposed Integration Framework

• XDD provides powerful support for all tasks of the framework
  ◦ Input XML query, input XML data, output XML data
  ◦ Rules, constraints, mappings
  ◦ Metadata
• Being based on the standard XML format, XDD ties all tasks of the framework together tightly and makes data easy to manipulate
• Reduces syntax errors and the time and effort required of programmers and users

[Figure: integration system with its components (integrated schema, database sources, metadata, user query, integrated data), represented as XML Schema, XML database, XML data and XML query, with XDD as the underlying model.]

5

6

XML Declarative Description*

• XML Declarative Description (XDD) is an XML-based information representation
• Ordinary XML expressions (ground XML expressions) + variables = non-ground XML expressions
  ◦ Enhances expressive power and allows implicit information to be represented
• XML clauses have the form H ← B1, …, Bm, C1, …, Cn
  ◦ Able to express conditions and constraints

*Wuwongse, V., Anutariya, C., Akama, K., and Nantajeewarawat, E. XML Declarative Description (XDD): A Language for the Semantic Web. IEEE Intelligent Systems, Vol. 16, No. 3, (2001) 54-65
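The distinction between ground and non-ground XML expressions can be illustrated with a small Python sketch. It is not an XDD engine: it only handles string variables written in the `$S:name` notation that the later query slides use, and the helper names `is_ground` and `instantiate` are hypothetical.

```python
import re

# Assumption: only string variables of the form $S:name are recognised here;
# other XDD variable kinds (such as $E expressions) are ignored in this sketch.
STRING_VAR = re.compile(r"\$S:(\w+)")


def is_ground(expression: str) -> bool:
    """Return True when the expression contains no $S: variables."""
    return STRING_VAR.search(expression) is None


def instantiate(expression: str, bindings: dict) -> str:
    """Replace every bound $S: variable by its value; unbound ones are kept."""
    return STRING_VAR.sub(
        lambda match: bindings.get(match.group(1), match.group(0)), expression
    )


non_ground = "<Student><name>$S:name</name><GPA>$S:gpa</GPA></Student>"
ground = instantiate(non_ground, {"name": "Duong", "gpa": "4.2"})
print(ground)  # <Student><name>Duong</name><GPA>4.2</GPA></Student>
```

Binding every variable turns the non-ground expression into an ordinary (ground) XML expression, which is the sense in which "ordinary XML + variables" extends plain XML.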

7

Modeling of Data Components

• XML Databases
  ◦ Extension (actual data values): ground XML expressions
  ◦ Intension (schemas, logical specifications, relationships, indexes and constraints): non-ground XML expressions
• XML Queries
  ◦ Include constructor, patterns, and filters
  ◦ Correspond to the three parts (H, Bi, Cj) of an XDD rule H ← B1, …, Bm, C1, …, Cn

8

Modeling of Data Components

Query modeled by XDD

[Figure: an XML query with its three parts labeled: constructor, pattern, filter.]

9

Query Execution Example

Data source

<Student> <name>John</name> <nationality>Canadian</nationality> <GPA>4</GPA> <phone>234-7856</phone> <ID>3224567</ID> </Student>

<Student> <name>Duong</name> <nationality>Vietnamese</nationality> <GPA>4.2</GPA> <phone>456-3241</phone> </Student>

Query

result1

<Student> <name>John</name> <nationality>Canadian</nationality> <GPA>4</GPA> </Student>

result2

<Student> <name>Duong</name> <nationality>Vietnamese</nationality> <GPA>4.2</GPA> </Student>
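The query itself is not legible in the transcript, so the following Python sketch only illustrates how the three query parts could cooperate to turn the data source into result1 and result2. The `<Students>` wrapper element, the function name `run_query` and the GPA threshold used as the filter are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Data source from the slide, wrapped in a hypothetical <Students> root element.
DATA_SOURCE = """<Students>
  <Student><name>John</name><nationality>Canadian</nationality><GPA>4</GPA><phone>234-7856</phone><ID>3224567</ID></Student>
  <Student><name>Duong</name><nationality>Vietnamese</nationality><GPA>4.2</GPA><phone>456-3241</phone></Student>
</Students>"""


def run_query(xml_text, min_gpa=4.0):
    """Apply pattern, filter and constructor to every Student element."""
    results = []
    for student in ET.fromstring(xml_text).iter("Student"):
        # Pattern: bind the three values the results contain.
        bindings = {
            tag: student.findtext(tag) for tag in ("name", "nationality", "GPA")
        }
        if None in bindings.values():
            continue  # the pattern does not match this element
        # Filter: hypothetical condition, the slide does not show the real one.
        if float(bindings["GPA"]) < min_gpa:
            continue
        # Constructor: build the result element from the bound values only.
        result = ET.Element("Student")
        for tag, value in bindings.items():
            ET.SubElement(result, tag).text = value
        results.append(ET.tostring(result, encoding="unicode"))
    return results


for line in run_query(DATA_SOURCE):
    print(line)
```

With the assumed threshold both students pass the filter, and the constructor drops `phone` and `ID`, which matches the shape of result1 and result2.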

10

Modeling of Data Components

Mappings
• Describe the correspondence between an object in the integrated schema and its corresponding objects in the local schemas
• Support decomposing XML queries and converting data
• Are modeled by non-ground XML expressions

11

Sample of Mappings

[Figure: a sample mapping relating an object in the integrated schema to the corresponding object in schema A and the corresponding object in schema B.]

12

Modeling of Processing Components

Schema Integration Component
• The main task is to resolve conflicts between the schemas of the participating databases
• Conflict resolution between the various schemas is done at one time (one-shot strategy)
• Each local schema is one big non-ground XML expression ($E_variable)

13

Schema Integration Component

<Integrating_schema>
  <schema name="1">…</schema>
  <schema name="2">…</schema>
  <schema name="n">…</schema>
</Integrating_schema>

<schema name="1"></schema>: $E expression
<schema name="2"></schema>: $E expression
<schema name="n"></schema>: $E expression

XDD can interactively process all schemas as $E expressions

14

Schema Conflict Classification

Conflicts between schemas can be classified into four main kinds:

• Naming conflicts
  ◦ Synonyms
  ◦ Acronyms
  ◦ Homonyms
• Structural conflicts
  ◦ Missing item conflicts
  ◦ Internal path discrepancy conflicts
  ◦ Aggregation conflicts
  ◦ Generalization/specialization
• Constraint conflicts
  ◦ Numbers of occurrences of elements
  ◦ Fixed vs. default values
  ◦ Constraints of attributes
• Data type conflicts
  ◦ Disjoint or incompatible data types
  ◦ Compatible data types
  ◦ IDREF and IDREFS

15

Aggregation conflict

Local schemas: Professor (FName, MName, LName) and Professor (Name)

Union rule: Professor (FName, MName, LName, Name)

Aggregation checking and data type constructing rule: Professor (FName, MName, LName; Name); a new data type is created
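The two rules on this slide can be sketched in Python as follows. The exact shape of the resulting schema is not recoverable from the transcript, so the sketch assumes that the new data type is a `Name` element composed of `FName`, `MName` and `LName`. The dictionary representation, the function names and the `aggregates` table are hypothetical and are not the XDD rules themselves.

```python
def union_rule(schema_a, schema_b):
    """Merge two schemas given as {element: [child, ...]}, keeping child order."""
    merged = {}
    for schema in (schema_a, schema_b):
        for element, children in schema.items():
            target = merged.setdefault(element, [])
            for child in children:
                if child not in target:
                    target.append(child)
    return merged


def resolve_aggregation(schema, aggregates):
    """Nest the parts under their whole wherever both occur side by side.

    aggregates maps a whole (for example Name) to its parts; this knowledge
    is assumed to be supplied from outside the schemas.
    """
    resolved = {element: list(children) for element, children in schema.items()}
    for whole, parts in aggregates.items():
        for element, children in list(resolved.items()):
            if whole in children and all(part in children for part in parts):
                # Keep only the whole under the parent ...
                resolved[element] = [c for c in children if c not in parts]
                # ... and create a new data type that holds the parts.
                resolved[whole] = list(parts)
    return resolved


schema_a = {"Professor": ["FName", "MName", "LName"]}
schema_b = {"Professor": ["Name"]}

merged = union_rule(schema_a, schema_b)
integrated = resolve_aggregation(merged, {"Name": ["FName", "MName", "LName"]})
print(merged)      # {'Professor': ['FName', 'MName', 'LName', 'Name']}
print(integrated)  # {'Professor': ['Name'], 'Name': ['FName', 'MName', 'LName']}
```

The union step only collects the children of both Professor definitions; the aggregation step is where the conflict is actually detected and resolved.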

16

Query Decomposition

The main task is to yield n local subqueries from a global query.

Global query (integrated schema):

<student id="$S:id">
  <name>$S:name</name>
  <country>$S:country</country>
</student>

Subquery for source B:

<SATstudent key="$S:id" source="B">
  <fullname>$S:name</fullname>
  <country>$S:country</country>
</SATstudent>

Subquery for source A:

<SOMstudent id="$S:id" source="A">
  <name>$S:name</name>
  <nation>$S:country</nation>
</SOMstudent>

Integrated schema: student (id, name, country, field, position)

Schema for source B: SATstudent (key, fullname, country, fieldStudy)

Schema for source A: SOMstudent (id, name, nation, program, position)

17

Query Decomposition

A. Brief view

[Figure: a query enters Query Decomposition, which uses the mappings from global to local and produces one subquery for each local source.]

B. Solution

Input XML query

<student id="$S:id">
  <name>$S:name</name>
  <country>$S:country</country>
</student>

XML metadata

• XDD rules for transformation

Output XML queries

<SATstudent key="$S:id" source="B">
  <fullname>$S:name</fullname>
  <country>$S:country</country>
</SATstudent>

<SOMstudent id="$S:id" source="A">
  <name>$S:name</name>
  <nation>$S:country</nation>
</SOMstudent>

18

Query Decomposition Example

XDD rule:

<answer>$E:expression</answer>
  ← <Mapping> <student> <country>$S:country</country> </student> <local>$E:expression</local> </Mapping>

Mapping:

<Mapping>
  <student>
    <country>$S:country</country>
  </student>
  <local>
    <SATstudent source="B"> <country>$S:country</country> </SATstudent>
    <SOMstudent source="A"> <nation>$S:country</nation> </SOMstudent>
  </local>
</Mapping>

Result:

<answer>
  <SATstudent source="B"> <country>$S:country</country> </SATstudent>
  <SOMstudent source="A"> <nation>$S:country</nation> </SOMstudent>
</answer>

The SATstudent element is the local query for source B; the SOMstudent element is the local query for source A.

Steps shown in the figure: (1) matches with, (2) binds to, (3) infers to, (4) results in.
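The four steps of this example can be imitated in plain Python. This is a rough sketch rather than XDD inference: the matching is an exact structural comparison instead of real unification, and the function names `canonical` and `decompose` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The mapping from the slide: a global pattern plus its local counterparts.
MAPPING = """<Mapping>
  <student><country>$S:country</country></student>
  <local>
    <SATstudent source="B"><country>$S:country</country></SATstudent>
    <SOMstudent source="A"><nation>$S:country</nation></SOMstudent>
  </local>
</Mapping>"""


def canonical(element):
    """Fingerprint of an element: tag, sorted attributes, text and children."""
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),
        (element.text or "").strip(),
        tuple(canonical(child) for child in element),
    )


def decompose(query_xml, mapping_xml):
    """Return {source: local query} or None when the mapping does not apply."""
    query = ET.fromstring(query_xml)
    mapping = ET.fromstring(mapping_xml)
    global_pattern = mapping[0]  # the object in the integrated schema
    # Step 1: the global query must match the global side of the mapping.
    if canonical(query) != canonical(global_pattern):
        return None
    # Steps 2 to 4: the content of <local> plays the role of $E:expression
    # and becomes the answer, one local query per source.
    return {
        local_query.get("source"): ET.tostring(local_query, encoding="unicode").strip()
        for local_query in mapping.find("local")
    }


subqueries = decompose("<student><country>$S:country</country></student>", MAPPING)
print(subqueries["B"])  # <SATstudent source="B"><country>$S:country</country></SATstudent>
print(subqueries["A"])  # <SOMstudent source="A"><nation>$S:country</nation></SOMstudent>
```

Because the whole `<local>` content is bound at once, the subqueries for both sources come out of a single lookup.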

19

Query Decomposition

• Query decomposition uses the special structure of the mappings and applies XDD rules
• Subqueries for the distributed data sources are produced simultaneously
• Similarly, for data conversion, the extracted data are simultaneously converted to the global schema format
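Data conversion runs in the opposite direction, from local records to the global schema format. The Python sketch below assumes simple rename tables read off the schemas on the earlier Query Decomposition slide. The table `TO_GLOBAL`, the function `to_global` and the sample `id` and `key` values are hypothetical; in the framework itself the conversion is driven by the XDD mappings.

```python
import xml.etree.ElementTree as ET

# Hypothetical rename tables (local name -> global name) for each source.
TO_GLOBAL = {
    "A": {"tags": {"SOMstudent": "student", "nation": "country"}, "attrs": {}},
    "B": {"tags": {"SATstudent": "student", "fullname": "name"}, "attrs": {"key": "id"}},
}


def to_global(local_xml, source):
    """Convert one local record to the global schema format."""
    rules = TO_GLOBAL[source]

    def convert(element):
        # Rename the element and its attributes, then recurse into children.
        converted = ET.Element(rules["tags"].get(element.tag, element.tag))
        for key, value in element.attrib.items():
            converted.set(rules["attrs"].get(key, key), value)
        converted.text = element.text
        for child in element:
            converted.append(convert(child))
        return converted

    result = convert(ET.fromstring(local_xml))
    result.set("source", source)  # record where the data came from
    return ET.tostring(result, encoding="unicode")


extracted = [
    ('<SOMstudent id="3"><name>Phuoc</name><nation>Campuchia</nation></SOMstudent>', "A"),
    ('<SATstudent key="7"><fullname>Xuan</fullname><country>Vietnam</country></SATstudent>', "B"),
]
integrated = [to_global(record, source) for record, source in extracted]
for line in integrated:
    print(line)
```

Every extracted record goes through the same function, so data from all sources ends up in one uniform `<student>` format.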

20

Conclusion

• XDD is used to model all data components and processing components of the XML database integration framework
• Components of the system modeled by XDD can communicate and exchange data easily
• A special structure for XDD-based bidirectional mappings is designed. Information is produced efficiently for both query decomposition and data conversion, avoiding data redundancy
• The framework can
  ◦ Integrate n participating schemas
  ◦ Decompose a query into n subqueries at a time