Containment of Partially Specified Tree-Pattern Queries Dimitri Theodoratos (NJIT, USA) Theodore Dalamagas (NTUA, GREECE) Pawel Placek (NJIT, USA) Stefanos Souldatos (NTUA, GREECE) Timos Sellis (NTUA, GREECE)

Page 1: Containment of  Partially Specified  Tree-Pattern Queries

Containment of Partially Specified Tree-Pattern Queries

Dimitri Theodoratos (NJIT, USA)

Theodore Dalamagas (NTUA, GREECE)

Pawel Placek (NJIT, USA)

Stefanos Souldatos (NTUA, GREECE)

Timos Sellis (NTUA, GREECE)

Page 2: Containment of  Partially Specified  Tree-Pattern Queries

Introduction

Page 3: Containment of  Partially Specified  Tree-Pattern Queries

Stefanos Souldatos - SSDBM 2006 3

Motivating example

Consider a database including shops that sell spare parts for motorbikes.
The authors of the paper need spare parts for their motorbikes. BUT…

[Figure: the example tree of spare-part shops, rooted at r, with the node labels GREECE, USA, ATHENS, NJ, HONDA, YAMAHA, BMW, TRAVEL, ON-OFF, VARADERO, SERROW, F650, F650GS, 125cc, 1000cc, 200cc and 650cc.]

Page 4: Containment of  Partially Specified  Tree-Pattern Queries

1: Structural Differences

Dimitri Theodoratos lives in NY. He owns a Yamaha Serrow motorbike in Greece.
So, he searches for spare parts in Greece or the USA.
A structural difference emerges…

[Figure: the example tree of spare-part shops (root r; GREECE, USA; ATHENS, NJ; HONDA, YAMAHA, BMW; types, models and engine sizes below), annotated with a "?".]

Page 5: Containment of  Partially Specified  Tree-Pattern Queries

2: Structural Inconsistencies

Theodore Dalamagas owns a BMW motorbike and looks for spare parts worldwide.
A structural inconsistency emerges…

[Figure: the example tree of spare-part shops (root r; GREECE, USA; ATHENS, NJ; HONDA, YAMAHA, BMW; types, models and engine sizes below), highlighting the inconsistency 650cc/F650GS vs F650GS/650cc.]

Page 6: Containment of  Partially Specified  Tree-Pattern Queries

3: Unknown Structure

Stefanos Souldatos owns a Honda Varadero.
But he is not fully aware of the hierarchy...

[Figure: the example tree of spare-part shops (root r; GREECE, USA; ATHENS, NJ; HONDA, YAMAHA, BMW; types, models and engine sizes below).]

Page 7: Containment of  Partially Specified  Tree-Pattern Queries

4: Source Integration

Pawel Placek wants to buy a motorbike that he can easily find spare parts for.
He searches in many hierarchies with different structure…

[Figure: the example tree of spare-part shops (root r; GREECE, USA; ATHENS, NJ; HONDA, YAMAHA, BMW; types, models and engine sizes below), shown three times.]

Page 8: Containment of  Partially Specified  Tree-Pattern Queries

Motivation

Querying tree-structured data requires exact knowledge of the structure.
BUT, structure is not always strictly defined!
MOREOVER, the user needs to pose queries without always dealing with structure:
Find shops for Honda spare parts in Athens, Greece
…where Athens is a descendant of Greece
…but I don’t know if Athens is an ancestor or a descendant of Honda.

Page 9: Containment of  Partially Specified  Tree-Pattern Queries

Previous Approaches

Keyword-based search approach: total absence of structure.
Approximation techniques: relax the initial query and retrieve more answers.
Naive approach: all possible query patterns are generated and evaluated against the tree structure.
Traditional integration approach: global structure and mapping rules.

Page 10: Containment of  Partially Specified  Tree-Pattern Queries

Our Approach

We have designed a Query Language that allows for partial specification of the structure (Partially Specified Tree-Pattern Queries).

Query formulation and evaluation are based on Dimension Graphs that summarize the structure of the trees.

Dimension graphs group sets of semantically related nodes called Dimensions.

We study the problem of Query Containment for Partially Specified Tree-Pattern Queries.

Page 11: Containment of  Partially Specified  Tree-Pattern Queries

Data Model

Page 12: Containment of  Partially Specified  Tree-Pattern Queries

Dimension Graphs

Dimension graphs summarize the structure of the tree.

[Figure: the example tree of spare-part shops next to its dimension graph over the nodes R, C, B, T, M, L and E.]

DIMENSIONS
R(oot)
C(ountry) = {GREECE, USA}
B(rand)
T(ype)
L(ocation)
M(odel)
E(ngine)
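As a rough illustration of how a dimension graph summarizes a tree, the sketch below derives the graph's edges from parent-child pairs of the tree. The tree fragment, the `PARENT` and `DIMENSION` mappings, and the function name are assumptions made for this example; only C = {GREECE, USA} is spelled out on the slide, and using labels as keys only works because this fragment has no repeated labels.

```python
from collections import defaultdict

# Hypothetical fragment of the example tree: child label -> parent label.
PARENT = {
    "GREECE": "r",
    "ATHENS": "GREECE",
    "HONDA": "ATHENS",
    "USA": "r",
    "BMW": "USA",
}

# Dimension of each value. Only C = {GREECE, USA} comes from the slide;
# the remaining assignments are assumptions.
DIMENSION = {
    "r": "R", "GREECE": "C", "USA": "C",
    "ATHENS": "L", "HONDA": "B", "BMW": "B",
}


def dimension_graph(parent, dimension):
    """Return {dimension: set of child dimensions} summarizing the tree."""
    graph = defaultdict(set)
    for child, par in parent.items():
        # One graph edge per pair of dimensions linked by some tree edge.
        graph[dimension[par]].add(dimension[child])
    return dict(graph)


print(dimension_graph(PARENT, DIMENSION))
```

In this fragment both C -> L -> B (Greece/Athens/Honda) and C -> B (USA/BMW) appear, which is the kind of structural difference the graph is meant to capture.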

Page 13: Containment of  Partially Specified  Tree-Pattern Queries

Dimension Graphs…

Offer a summary of the structure of the tree.

Provide the necessary semantics for query formulation.

Set the framework for querying sources with structural differences and inconsistencies.

Support query evaluation and optimization.

[Figure: the dimension graph over the dimensions R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel) and E(ngine).]

Page 14: Containment of  Partially Specified  Tree-Pattern Queries

Partially Specified Tree-Pattern Queries

Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece.

[Figure: the dimension graph over the dimensions R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel) and E(ngine), next to the query nodes C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ?.]

Page 15: Containment of  Partially Specified  Tree-Pattern Queries

Partially Specified Tree-Pattern Queries

Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece.

The query is built from partially specified paths (PSP): PSP p1 and PSP *p2.

[Figure: the dimension graph over the dimensions R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel) and E(ngine), next to the query nodes C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ?.]

Page 16: Containment of  Partially Specified  Tree-Pattern Queries

Partially Specified Tree-Pattern Queries

Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece.

The query is built from partially specified paths (PSP): PSP p1 and PSP *p2.
The output path is marked with (*).

[Figure: the dimension graph over the dimensions R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel) and E(ngine), next to the query nodes C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ?.]

Page 17: Containment of  Partially Specified  Tree-Pattern Queries

Partially Specified Tree-Pattern Queries

Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece.

The query is built from partially specified paths (PSP): PSP p1 and PSP *p2.
The output path is marked with (*).
Edges between query nodes are either parent-child or ancestor-descendant.

[Figure: the dimension graph over the dimensions R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel) and E(ngine), next to the query nodes C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ?.]

Page 18: Containment of  Partially Specified  Tree-Pattern Queries

Partially Specified Tree-Pattern Queries

Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece.

The query is built from partially specified paths (PSP): PSP p1 and PSP *p2.
The output path is marked with (*).
Edges between query nodes are either parent-child or ancestor-descendant.
A node sharing expression (NSE) links nodes of different PSPs.

[Figure: the dimension graph over the dimensions R(oot), C(ountry), B(rand), T(ype), L(ocation), M(odel) and E(ngine), next to the query nodes C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ?.]
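The ingredients named on this slide (PSPs, an output path, parent-child and ancestor-descendant edges, node sharing expressions) could be held in a small data structure like the one sketched below. All class and field names are hypothetical. The way the example query is split into p1 and p2, the choice of edge kinds, and the reading of "?" as "no value constraint" are assumptions, since the slide figure does not survive in this transcript.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class QueryNode:
    dimension: str                            # e.g. "C", "B", "M"
    values: Optional[FrozenSet[str]] = None   # None models the "?" annotation


@dataclass
class PSP:
    name: str
    nodes: List[QueryNode]
    # (i, j, kind): nodes[i] is above nodes[j]; kind is "child" or "descendant".
    edges: List[Tuple[int, int, str]] = field(default_factory=list)
    is_output: bool = False                   # True for the path marked with *


@dataclass
class PSTPQ:
    paths: List[PSP]
    # Each NSE says two nodes, given as (path name, node index), coincide.
    node_sharing: List[Tuple[Tuple[str, int], Tuple[str, int]]] = field(
        default_factory=list
    )


# Assumed reading of the example query (BMW models and engines in Greece).
p1 = PSP(
    "p1",
    [QueryNode("C", frozenset({"Greece"})),
     QueryNode("B", frozenset({"BMW"})),
     QueryNode("M")],
    edges=[(0, 1, "descendant"), (1, 2, "descendant")],
)
p2 = PSP(
    "p2",
    [QueryNode("B", frozenset({"BMW"})), QueryNode("E")],
    edges=[(0, 1, "descendant")],
    is_output=True,
)
query = PSTPQ([p1, p2], node_sharing=[(("p1", 1), ("p2", 0))])
```

Keeping the paths separate and linking them only through node sharing expressions mirrors the idea that the user never has to say how the paths fit together into one tree.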

Page 19: Containment of  Partially Specified  Tree-Pattern Queries

Dimension Trees and Query Evaluation

Page 20: Containment of  Partially Specified  Tree-Pattern Queries

Dimension Trees

[Figure: the dimension graph over the dimensions R, C, B, T, M, L and E.]

1) We apply Inference Rules to the query to produce its Full Form.

2) We search for paths in the graph that match the paths of the query.

3) Paths in the graph correspond to Dimension Trees.

4) Dimension trees are translated into XPath queries.

[Figure: the query with PSP p1 and PSP *p2 over the nodes C = {Greece}, B = {BMW}, M = ? and E = ?.]

Page 22: Containment of  Partially Specified  Tree-Pattern Queries

Dimension Trees

[Figure: the dimension graph over the dimensions R, C, B, T, M, L and E.]

1) We apply Inference Rules to the query to produce its Full Form.

2) We search for paths in the graph that match the paths of the query.

3) Paths in the graph correspond to Dimension Trees.

4) Dimension trees are translated into XPath queries.

[Figure: the query with PSP p1 and PSP *p2 over the nodes C = {Greece}, B = {BMW}, M = ? and E = ?, and a matching dimension tree over R, C = {Greece}, B = {BMW}, T, M and E.]

Page 23: Containment of  Partially Specified  Tree-Pattern Queries

Dimension Trees

[Figure: the dimension graph over the dimensions R, C, B, T, M, L and E.]

r/Greece/BMW/*T[*E]/*M

1) We apply Inference Rules to the query to produce its Full Form.

2) We search for paths in the graph that match the paths of the query.

3) Paths in the graph correspond to Dimension Trees.

4) Dimension trees are translated into XPath queries.

[Figure: the query with PSP p1 and PSP *p2 over the nodes C = {Greece}, B = {BMW}, M = ? and E = ?, and a matching dimension tree over R, C = {Greece}, B = {BMW}, T, M and E.]
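Step 2 above, searching the graph for paths that match the paths of the query, can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the edges in `GRAPH` are an assumed reconstruction of the dimension graph (the figure's edges are lost in this transcript), a query path is reduced to a list of (upper dimension, lower dimension, kind) relations, and only maximal simple paths from the root are considered.

```python
# Assumed dimension graph edges; M and E point at each other to model the
# 650cc/F650GS vs F650GS/650cc inconsistency.
GRAPH = {
    "R": ["C"], "C": ["L", "B"], "L": ["B"], "B": ["T"],
    "T": ["M", "E"], "M": ["E"], "E": ["M"],
}


def maximal_simple_paths(graph, root):
    """Yield every simple path from root that cannot be extended further."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        nexts = [n for n in graph.get(path[-1], []) if n not in path]
        if not nexts:
            yield path
        for n in nexts:
            stack.append(path + [n])


def satisfies(path, relations):
    """Check each (upper, lower, kind) relation against positions in path."""
    pos = {dim: i for i, dim in enumerate(path)}
    for upper, lower, kind in relations:
        if upper not in pos or lower not in pos:
            return False
        gap = pos[lower] - pos[upper]
        if kind == "child" and gap != 1:
            return False
        if kind == "descendant" and gap < 1:
            return False
    return True


def matching_paths(graph, root, relations):
    return [p for p in maximal_simple_paths(graph, root) if satisfies(p, relations)]


loose = [("C", "B", "descendant"), ("B", "M", "descendant")]
strict = [("C", "B", "child"), ("B", "M", "descendant")]
print(len(matching_paths(GRAPH, "R", loose)))
print(len(matching_paths(GRAPH, "R", strict)))
```

Under these assumptions the ancestor-descendant version of the path matches graph paths both with and without L between C and B, while the parent-child version keeps only those where B directly follows C.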

Page 24: Containment of  Partially Specified  Tree-Pattern Queries

Dimension Trees

[Figure: the dimension graph over the dimensions R, C, B, T, M, L and E.]

So, Dimension Trees are equivalent to the Query in a specific Dimension Graph:

“Dimension Trees = Query + Graph”

[Figure: the query with PSP p1 and PSP *p2 over the nodes C = {Greece}, B = {BMW}, M = ? and E = ?, and a matching dimension tree over R, C = {Greece}, B = {BMW}, T, M and E.]

r/Greece/BMW/*T[*E]/*M
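The translation of a dimension tree into an expression such as r/Greece/BMW/*T[*E]/*M can be sketched with a small recursive renderer. The tree encoding as `(step, children)` tuples and the function name are hypothetical, the step strings are copied from the slide rather than derived, and the convention that the last child continues the main path while earlier children become bracketed predicates is an assumption chosen to reproduce the slide's expression.

```python
def to_xpath(node):
    """Render a (step, children) tree as a path expression.

    The last child continues the main path; every earlier child is
    rendered as a bracketed predicate on the current step.
    """
    step, children = node
    predicates = "".join("[" + to_xpath(child) + "]" for child in children[:-1])
    tail = "/" + to_xpath(children[-1]) if children else ""
    return step + predicates + tail


# Dimension tree R -> C = {Greece} -> B = {BMW} -> T, with T branching to E and M.
tree = ("r", [("Greece", [("BMW", [("*T", [("*E", []), ("*M", [])])])])])
print(to_xpath(tree))  # r/Greece/BMW/*T[*E]/*M
```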

Page 25: Containment of  Partially Specified  Tree-Pattern Queries

Query Containment

Page 26: Containment of  Partially Specified  Tree-Pattern Queries

Query Containment

Absolute Containment
Query Q1 is absolutely contained in query Q2 if the answers of Q1 in every dimension graph are also answers of Q2.

Relative Containment (with respect to a graph)
Query Q1 is contained in query Q2 with respect to graph G if the answers of Q1 in G are also answers of Q2 in G.
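The two definitions can be written compactly as below. The notation is an assumption introduced here, not taken from the slides: ans(Q, G) stands for the answers of query Q in dimension graph G, and the subscripts a and G distinguish absolute from relative containment.

```latex
\begin{align*}
Q_1 \subseteq_a Q_2 &\iff \forall G:\ \mathrm{ans}(Q_1, G) \subseteq \mathrm{ans}(Q_2, G) \\
Q_1 \subseteq_G Q_2 &\iff \mathrm{ans}(Q_1, G) \subseteq \mathrm{ans}(Q_2, G)
\end{align*}
```

Read this way, absolute containment implies relative containment with respect to every graph G, while the converse need not hold.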

Page 28: Containment of  Partially Specified  Tree-Pattern Queries

Checking Absolute Containment

Query Q1 is absolutely contained in query Q2 if there is a mapping from Q2 to Q1.

[Figure: example queries Q1 (PSPs *p1 and p2) and Q2 (PSPs *p3 and p4) over nodes of the dimensions C, B, M and E, all annotated with ?.]
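A heavily simplified sketch of such a mapping check follows. It is not the authors' procedure: it assumes each query mentions every dimension at most once (so the mapping from Q2 to Q1 is forced to send each dimension to itself), it ignores output paths and node sharing, and it represents a query as a hypothetical pair of (relations, annotations). Under those assumptions, Q1 is contained in Q2 when every constraint of Q2 is already implied by Q1.

```python
def implied_ancestors(relations):
    """Transitive closure of 'a is above b' over all relations of a query."""
    closure = {(a, b) for a, b, _ in relations}
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def absolutely_contained(q1, q2):
    """True if mapping each dimension of q2 to itself in q1 keeps q2's constraints."""
    rel1, ann1 = q1
    rel2, ann2 = q2
    for dim, vals2 in ann2.items():
        if dim not in ann1:
            return False                      # q2 node has no image in q1
        vals1 = ann1[dim]
        # None means "?" (no value constraint); q1 must be at least as strict.
        if vals2 is not None and (vals1 is None or not vals1 <= vals2):
            return False
    child1 = {(a, b) for a, b, kind in rel1 if kind == "child"}
    above1 = implied_ancestors(rel1)
    for a, b, kind in rel2:
        if kind == "child" and (a, b) not in child1:
            return False
        if kind == "descendant" and (a, b) not in above1:
            return False
    return True


# Hypothetical queries: q1 is more specific than q2.
q1 = ([("C", "B", "child"), ("B", "M", "descendant")],
      {"C": frozenset({"Greece"}), "B": frozenset({"BMW"}), "M": None})
q2 = ([("C", "M", "descendant")], {"C": None, "M": None})
print(absolutely_contained(q1, q2), absolutely_contained(q2, q1))
```

Because the check never looks at a dimension graph, it is cheap, which is consistent with absolute containment being the fast test in this talk.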

Page 29: Containment of  Partially Specified  Tree-Pattern Queries

Checking Relative Containment

Query Q1 is contained in query Q2 with respect to graph G if every dimension tree of Q1 is contained in a dimension tree of Q2.

[Figure: a dimension tree of Q1 over R, C, B, T, M and E, next to a dimension tree of Q2 over R, C, B, T and E.]

Page 30: Containment of  Partially Specified  Tree-Pattern Queries

Absolute & Relative Containment

Checking Absolute Containment: 1 msec
Checking Relative Containment: 100 msec to 1 sec

We suggest a technique that improves the performance of the Relative Containment check (Relative Containment 2, RC2).

It is a sound but not complete technique.
However, it maintains high accuracy for the containment check.

Page 31: Containment of  Partially Specified  Tree-Pattern Queries

Relative Containment 2 (RC2)

For each possible parent-child or ancestor-descendant relation in the query, we extract the relations that hold in all graph paths that include it:

R => C : R -> C
L -> B : R -> C, C -> L

(Here -> denotes a parent-child relation and => an ancestor-descendant relation.)

We add these relations to Q1, obtaining Q1', and check absolutely whether Q1' ⊆ Q2.

[Figure: the dimension graph over the dimensions R, C, B, T, M, L and E.]
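The extraction step of RC2 can be sketched as an intersection over graph paths. This is an illustration under stated assumptions, not the authors' implementation: the edges in `GRAPH` are an assumed reconstruction of the dimension graph, "graph paths" are taken to be all simple paths that start at the root, and the function names are hypothetical. The returned set contains the queried relation itself whenever it is a parent-child edge.

```python
# Assumed dimension graph edges (the figure's edges are lost in this transcript).
GRAPH = {
    "R": ["C"], "C": ["L", "B"], "L": ["B"], "B": ["T"],
    "T": ["M", "E"], "M": ["E"], "E": ["M"],
}


def paths_from_root(graph, root):
    """Yield every simple path (no repeated dimension) that starts at root."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        yield path
        for nxt in graph.get(path[-1], []):
            if nxt not in path:
                stack.append(path + [nxt])


def holds(path, upper, lower, kind):
    """True if the relation upper -> lower (child) or upper => lower holds in path."""
    if upper not in path or lower not in path:
        return False
    gap = path.index(lower) - path.index(upper)
    return gap == 1 if kind == "child" else gap >= 1


def common_relations(graph, root, upper, lower, kind):
    """Parent-child edges present in every graph path that includes the relation."""
    common = None
    for path in paths_from_root(graph, root):
        if holds(path, upper, lower, kind):
            edges = set(zip(path, path[1:]))
            common = edges if common is None else common & edges
    return common or set()


print(common_relations(GRAPH, "R", "L", "B", "child"))
print(common_relations(GRAPH, "R", "R", "C", "descendant"))
```

With this assumed graph, the relation L -> B yields R -> C and C -> L (plus L -> B itself), and R => C yields R -> C, in line with the two rules listed on the slide. The derived relations can then be added to Q1 before running the cheaper absolute check.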

Page 32: Containment of  Partially Specified  Tree-Pattern Queries

Example

Q1 is NOT absolutely contained in Q2.
But Q1 is relatively contained in Q2.

[Figure: query Q1 with PSP *p1 over the nodes L = ? and B = ?, and query Q2 with PSP *p2 over the nodes L = ? and C = ?.]

Page 33: Containment of  Partially Specified  Tree-Pattern Queries

Example

Q1 is NOT absolutely contained in Q2.
But Q1 is relatively contained in Q2.

Relative Containment 2: L -> B : R -> C, C -> L

[Figure: query Q1 with PSP *p1 over the nodes L = ? and B = ?, extended with the nodes C = ? and R = ? derived from graph G, and query Q2 with PSP *p2 over the nodes L = ? and C = ?.]

Page 34: Containment of  Partially Specified  Tree-Pattern Queries

Experiments

Page 35: Containment of  Partially Specified  Tree-Pattern Queries

Experiments

We measured…
execution time for checking Absolute Containment, Relative Containment and Relative Containment 2 (RC2)
the accuracy of RC2

…varying
the density of the dimension graphs (dimensions, root-to-leaf paths)
the size of the queries (PSPs, nodes per PSP)

Page 36: Containment of  Partially Specified  Tree-Pattern Queries

Varying Graphs

[Plot: time (sec) versus graph paths for Relative Containment, Relative Containment 2 and Absolute Containment, with 30 dimensions; annotated "RC2 accuracy > 80%" and "RC2 accuracy > 50%".]

Page 37: Containment of  Partially Specified  Tree-Pattern Queries

Varying Queries

[Plot: time (sec) versus dimensions per PSP for Relative Containment, Relative Containment 2 and Absolute Containment, with 2 PSPs; annotated "RC2 accuracy > 93%".]

Page 38: Containment of  Partially Specified  Tree-Pattern Queries

Conclusion and Future Work

Page 39: Containment of  Partially Specified  Tree-Pattern Queries

Conclusion

We addressed the problem of Query Containment for Partially Specified Tree-Pattern Queries (PSTPQs).

We suggested a sound technique for checking Relative Query Containment.

Experiments showed that the technique speeds up the checking of relative containment by one order of magnitude, with accuracy over 80% for graphs of common sizes (based on XML benchmarks, e.g. XMark, XMach etc.).

Page 40: Containment of  Partially Specified  Tree-Pattern Queries

Future Work

Suggest and experiment with a set of heuristics for checking Relative Query Containment.
Discuss the trade-off between time and accuracy for each of those heuristics.

Page 41: Containment of  Partially Specified  Tree-Pattern Queries

Questions?

Page 42: Containment of  Partially Specified  Tree-Pattern Queries

Links

Introduction (2-10)

Data Model (11-18)

Query Evaluation (19-24)

Query Containment (25-32)

Experiments (33-36)

Conclusion – Future Work (37-39)

Page 43: Containment of  Partially Specified  Tree-Pattern Queries

Appendix

Page 44: Containment of  Partially Specified  Tree-Pattern Queries

Dimension Trees

[Figure: the dimension graph over the dimensions R, C, B, T, M, L and E, the query with PSP p1 and PSP *p2 over the nodes C = {Greece}, B = {BMW}, M = ? and E = ?, and four dimension trees. Each tree starts with R, C = {Greece}, B = {BMW} and T, and the trees differ in how M and E are arranged below T.]

The expressions shown with the dimension trees are:

r/Greece/BMW/*T[*E]/*M
r/Greece/BMW/*T/*M[*E]
r/Greece/BMW/*T[*M/*E]/*E*M
r/Greece/BMW/*T/*E/*M