CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm Fall 2006

CSE 636 Data Integration

Answering Queries Using Views

Bucket Algorithm

Fall 2006

The Bucket Algorithm

• Each subgoal g of Q must be “covered” by some view
• Make a list of candidates (buckets) per query subgoal
• Consider combinations of candidates from different buckets
• Not all combos are “compatible”
• Keep the compatible ones and minimize them
• Discard the ones contained in another
• Take their union

The Bucket Algorithm

q(X,Y,R) :- ForSale(X,Y,C,“auto”), Review(X,R,“auto”), Y > 1985

Step 1: For each subgoal, put the relevant sources into a bucket:

V1(name, year) :- ForSale(name, year, “France”, “auto”), year > 1990 would be relevant
V3(name, year) :- ForSale(name, year, “France”, “cheese”) would be irrelevant

Step 2: Take the Cartesian product of the buckets
• Algorithm produces maximally contained rewriting
• Ignores interactions between subgoals in Step 1
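The relevance test of Step 1 can be sketched in Python. This is a minimal illustration, not the algorithm as specified in the slides: the `Const` wrapper, the `unifies` and `satisfiable` helpers, and the tuple encoding of atoms are all assumptions introduced here. Comparisons are checked over a dense order, and repeated variables inside one atom are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A constant argument; plain strings are treated as variables (assumed encoding)."""
    value: object


def unifies(query_atom, view_atom):
    """Same predicate and arity, and no position holds two different constants.

    Simplification: repeated variables inside one atom are not checked.
    """
    (q_pred, q_args), (v_pred, v_args) = query_atom, view_atom
    if q_pred != v_pred or len(q_args) != len(v_args):
        return False
    return not any(
        isinstance(q, Const) and isinstance(v, Const) and q != v
        for q, v in zip(q_args, v_args)
    )


def satisfiable(constraints):
    """Check (op, constant) bounds on ONE variable, interpreted over a dense order."""
    lo, lo_strict = float("-inf"), False
    hi, hi_strict = float("inf"), False
    for op, c in constraints:
        if op in (">", ">="):
            strict = op == ">"
            if c > lo or (c == lo and strict):
                lo, lo_strict = c, strict
        else:  # "<" or "<="
            strict = op == "<"
            if c < hi or (c == hi and strict):
                hi, hi_strict = c, strict
    if lo < hi:
        return True
    return lo == hi and not (lo_strict or hi_strict)


query_subgoal = ("ForSale", ("X", "Y", "C", Const("auto")))
v1_atom = ("ForSale", ("name", "year", Const("France"), Const("auto")))
v3_atom = ("ForSale", ("name", "year", Const("France"), Const("cheese")))

# V1: constants agree, and Y > 1985 together with year > 1990 is satisfiable.
v1_relevant = unifies(query_subgoal, v1_atom) and satisfiable([(">", 1985), (">", 1990)])
# V3: "cheese" clashes with "auto", so the view is irrelevant.
v3_relevant = unifies(query_subgoal, v3_atom)
print(v1_relevant, v3_relevant)  # True False
```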

The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98
V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)
V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95

Step 1: For each query subgoal, put the relevant sources into a bucket

The Bucket Algorithm: Example

Bucket for teaches(P,C,Q), with P→Prof, C→Crs, Q→Qtr:

Note: Arithmetic predicates don’t pose a problem

Buckets
teaches: V2, V4
reg: (not yet filled)
course: (not yet filled)

The Bucket Algorithm: Example

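Bucket for reg(S,C,Q), with S→Std, C→Crs, Q→Qtr:

Note:
• V3 doesn’t work: arithmetic predicates not consistent (V3 requires Qtr ≤ Aut94, q requires Q ≥ Aut95)
• V4 doesn’t work: S not in the output of V4

Buckets
teaches: V2, V4
reg: V1, V2
course: (not yet filled)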

The Bucket Algorithm: Example

Bucket for course(C,T), with C→Crs, T→Title:

Buckets
teaches: V2, V4
reg: V1, V2
course: V1, V4
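The bucket construction for this example can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: quarters are encoded as years (Aut98 becomes 1998), all comparisons are non-strict bounds on single variables, and the dictionary layout and the names `fits` and `bucket_for` are introduced here, not taken from the slides. A view enters a bucket when it has a subgoal with the same predicate, exports every distinguished query variable of the subgoal, and its bounds do not contradict the query's.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

# Quarters encoded as years (Aut98 -> 1998): an assumption made for this sketch.
VIEWS = {
    "V1": {"head": ("Std", "Crs", "Qtr", "Title"),
           "body": [("reg", ("Std", "Crs", "Qtr")), ("course", ("Crs", "Title"))],
           "lower": {"Crs": 500, "Qtr": 1998}, "upper": {}},
    "V2": {"head": ("Std", "Prof", "Crs", "Qtr"),
           "body": [("reg", ("Std", "Crs", "Qtr")), ("teaches", ("Prof", "Crs", "Qtr"))],
           "lower": {}, "upper": {}},
    "V3": {"head": ("Std", "Crs"),
           "body": [("reg", ("Std", "Crs", "Qtr"))],
           "lower": {}, "upper": {"Qtr": 1994}},
    "V4": {"head": ("Prof", "Crs", "Title", "Qtr"),
           "body": [("reg", ("Std", "Crs", "Qtr")), ("course", ("Crs", "Title")),
                    ("teaches", ("Prof", "Crs", "Qtr"))],
           "lower": {}, "upper": {"Qtr": 1997}},
}
QUERY = {"head": ("S", "C", "P"),
         "body": [("teaches", ("P", "C", "Q")), ("reg", ("S", "C", "Q")),
                  ("course", ("C", "T"))],
         "lower": {"C": 300, "Q": 1995}, "upper": {}}


def fits(mapping, query, view):
    """Check one query-variable -> view-variable mapping for a single subgoal."""
    for q_var, v_var in mapping.items():
        # A distinguished query variable must appear in the view's head.
        if q_var in query["head"] and v_var not in view["head"]:
            return False
        # The combined bounds on the variable must leave a non-empty range.
        low = max(query["lower"].get(q_var, NEG_INF), view["lower"].get(v_var, NEG_INF))
        high = min(query["upper"].get(q_var, POS_INF), view["upper"].get(v_var, POS_INF))
        if low > high:
            return False
    return True


def bucket_for(subgoal, query, views):
    """Names of the views with at least one body atom that can cover `subgoal`."""
    pred, q_args = subgoal
    return [
        name for name, view in views.items()
        if any(v_pred == pred and fits(dict(zip(q_args, v_args)), query, view)
               for v_pred, v_args in view["body"])
    ]


# Keying by predicate name works here because each predicate occurs once in q.
buckets = {pred: bucket_for((pred, args), QUERY, VIEWS) for pred, args in QUERY["body"]}
print(buckets)
```

Under these assumptions the sketch yields V2, V4 for teaches, V1, V2 for reg, and V1, V4 for course, matching the buckets on the slide.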

The Bucket Algorithm: Example

Step 2:
• Try all combos of views, one each from a bucket
• Test satisfaction of arithmetic predicates in each case
– e.g., two views may not overlap, i.e., they may be inconsistent
• Desired rewriting = union of surviving ones

Query rewriting 1:

q1(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S”,C,Q’,T)
– no problem from arithmetic predicates (none in V2)
– May or may not be minimal (why?)

Buckets
teaches: V2, V4
reg: V1, V2
course: V1, V4
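Step 2 can be sketched with `itertools.product`. The feasibility filter below is a deliberate simplification and an assumption of this sketch: it treats every chosen view as constraining the same quarter variable and only compares the Qtr bounds (quarters encoded as years), which is weaker than the containment check the algorithm requires for each candidate.

```python
from itertools import product

BUCKETS = {"teaches": ["V2", "V4"], "reg": ["V1", "V2"], "course": ["V1", "V4"]}

# (lower, upper) bound each view places on Qtr; Aut98 -> 1998 is an assumed encoding.
QTR_BOUNDS = {"V1": (1998, None), "V2": (None, None), "V4": (None, 1997)}
QUERY_QTR_LOWER = 1995  # Q >= Aut95 in the query


def quarter_feasible(combo):
    """True if the Qtr bounds of the chosen views and the query can hold together."""
    lows = [QUERY_QTR_LOWER] + [QTR_BOUNDS[v][0] for v in combo if QTR_BOUNDS[v][0] is not None]
    highs = [QTR_BOUNDS[v][1] for v in combo if QTR_BOUNDS[v][1] is not None]
    return not highs or max(lows) <= min(highs)


# One view per bucket: 2 * 2 * 2 = 8 candidate combinations.
candidates = list(product(*BUCKETS.values()))
survivors = [combo for combo in candidates if quarter_feasible(combo)]
distinct_view_sets = {frozenset(combo) for combo in survivors}

print(len(candidates))
print(survivors)
print(distinct_view_sets)
```

With this filter, every combination that uses both V1 and V4 is dropped, and the survivors use only two distinct sets of views: {V1, V2} and {V2, V4}.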

The Bucket Algorithm: Example

Unfolding of rewriting 1 (r, t, c abbreviate reg, teaches, course):
q1’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), c(C,T’), r(S”,C,Q’), c(C,T), C ≥ 500, Q ≥ Aut98, C ≥ 500, Q’ ≥ Aut98

• The subgoals r(S’,C,Q) and r(S”,C,Q’) can be mapped to r(S,C,Q): S’→S, S”→S, Q’→Q
• The subgoal c(C,T) can be mapped to c(C,T’): just extend the above mapping with T→T’

Minimized unfolding of rewriting 1:
q1m’(S,C,P) :- t(P,C,Q), r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98

Minimized rewriting 1:
q1m(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’)
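The minimization above can be checked mechanically. The sketch below is one possible way to do it, not the procedure from the slides: it drops a subgoal whenever a mapping from the full body into the reduced body exists that fixes the head variables. Encoding each comparison as a unary predicate (`ge500`, `geAut98`) is an assumption made here to keep the search purely syntactic, and `find_mapping` and `minimize` are hypothetical helper names.

```python
def find_mapping(source, target, mapping):
    """Backtracking search: extend `mapping` so every source atom lands on a target atom."""
    if not source:
        return mapping
    (pred, args), rest = source[0], source[1:]
    for t_pred, t_args in target:
        if t_pred != pred or len(t_args) != len(args):
            continue
        extended = dict(mapping)
        # setdefault binds a fresh variable or returns its existing image.
        if all(extended.setdefault(a, b) == b for a, b in zip(args, t_args)):
            result = find_mapping(rest, target, extended)
            if result is not None:
                return result
    return None


def minimize(head_vars, body):
    """Greedily remove atoms that the remaining atoms can cover."""
    fixed = {v: v for v in head_vars}      # head variables must map to themselves
    body = list(dict.fromkeys(body))       # drop exact duplicates, keep order
    for atom in list(body):
        reduced = [a for a in body if a != atom]
        if find_mapping(body, reduced, dict(fixed)) is not None:
            body = reduced
    return body


unfolding_1 = [
    ("r", ("S'", "C", "Q")), ("t", ("P", "C", "Q")), ("r", ("S", "C", "Q")),
    ("c", ("C", "T'")), ("r", ("S''", "C", "Q'")), ("c", ("C", "T")),
    ("ge500", ("C",)), ("geAut98", ("Q",)),
    ("ge500", ("C",)), ("geAut98", ("Q'",)),
]
minimized = minimize(("S", "C", "P"), unfolding_1)
print(minimized)
```

The greedy order keeps c(C,T) rather than c(C,T’); the result is the minimized unfolding shown above up to renaming of that one variable.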

The Bucket Algorithm: Example

Query Rewriting 2:

q2(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’)
q2’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98, r(S”,C,Q’), c(C,T), t(P’,C,Q’), Q’ ≤ Aut97
• This combo is infeasible: consider the conjunction of arithmetic predicates in V1 and V4 (Q ≥ Aut98 from V1 and Q’ ≤ Aut97 from V4 cannot both hold for the same quarter)

Query rewriting 3:

q3(S,C,P) :- V2(S’,P,C,Q), V2(S,P’,C,Q), V4(P”,C,T,Q’)

The Bucket Algorithm: Example

Unfolding of rewriting 3:
q3’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), t(P’,C,Q), r(S”,C,Q’), c(C,T), t(P”,C,Q’), Q’ ≤ Aut97
• The subgoals t(P,C,Q), r(S,C,Q) and c(C,T) can cover the remaining ones under the mapping: S’→S, S”→S, P’→P, P”→P, Q’→Q

Minimized rewriting 3:
q3m(S,C,P) :- V2(S,P,C,Q), V4(P,C,T,Q)

Verify that there are only two rewritings that are not covered by others

Maximally Contained Rewriting:
q’ = q1m ∪ q3m

The Bucket Algorithm: Example 2

Query:
q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y)

Views:
V4(A) :- cites(A,B), cites(B,A)
V5(C,D) :- sameTopic(C,D)
V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G)

Note: Should we list V4(X) twice in the buckets?

Buckets
cites(X,Y): V4, V6
cites(Y,X): V4, V6
sameTopic(X,Y): V5, V6

The Bucket Algorithm: Example 2

• Consider all combos & check for containment of the unfolded rewriting in Q
• V4(X) cannot be combined with anything (why?)
– Try q1(X) :- V4(X), V4(X), V5(X,Y)
– Try q2(X) :- V4(X), V6(X,Y), V5(X,Y)
• Does any of these work?
• When can we discard a view from consideration?

The Bucket Algorithm: Example 2

Here is a successful rewriting:
q3(X) :- V6(X,Y), V6(X,Y), V6(X,Y)

• By itself, it is not contained in Q
• But, with subgoal X=Y added, it is!

By minimizing the rewriting, we get:
q3m(X) :- V6(X,X)
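The containment claims for this example can be verified with a containment-mapping search. This is a minimal sketch for conjunctive queries without arithmetic predicates; `find_mapping` and `is_contained` are hypothetical helper names, and the fresh variables G, B, B2 in the unfoldings are introduced here. An unfolded rewriting is contained in Q exactly when Q's head and subgoals can be mapped onto the unfolding's head and subgoals.

```python
def find_mapping(source, target, mapping):
    """Backtracking search: extend `mapping` so every source atom lands on a target atom."""
    if not source:
        return mapping
    (pred, args), rest = source[0], source[1:]
    for t_pred, t_args in target:
        if t_pred != pred or len(t_args) != len(args):
            continue
        extended = dict(mapping)
        if all(extended.setdefault(a, b) == b for a, b in zip(args, t_args)):
            result = find_mapping(rest, target, extended)
            if result is not None:
                return result
    return None


def is_contained(rewriting, query):
    """rewriting is contained in query iff a containment mapping query -> rewriting exists."""
    (r_head, r_body), (q_head, q_body) = rewriting, query
    if len(r_head) != len(q_head):
        return False
    seed = {}
    for q_var, r_var in zip(q_head, r_head):
        if seed.setdefault(q_var, r_var) != r_var:
            return False
    return find_mapping(q_body, r_body, seed) is not None


Q = (("X",), [("cites", ("X", "Y")), ("cites", ("Y", "X")), ("sameTopic", ("X", "Y"))])

# Unfolding of V6(X,Y): G is the view's existential variable.
v6_xy = (("X",), [("cites", ("X", "G")), ("cites", ("G", "Y")), ("sameTopic", ("X", "G"))])
# Unfolding of V6(X,X), i.e. with X = Y added.
v6_xx = (("X",), [("cites", ("X", "G")), ("cites", ("G", "X")), ("sameTopic", ("X", "G"))])
# Unfolding of q1(X) :- V4(X), V4(X), V5(X,Y).
q1_unfolded = (("X",), [("cites", ("X", "B")), ("cites", ("B", "X")),
                        ("cites", ("X", "B2")), ("cites", ("B2", "X")),
                        ("sameTopic", ("X", "Y"))])

print(is_contained(v6_xy, Q), is_contained(v6_xx, Q), is_contained(q1_unfolded, Q))
# False True False
```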

The Bucket Algorithm: Example 2

Remarks:
• V4 didn’t contribute to any rewriting, but the bucket algorithm doesn’t recognize this ahead of time
• Consider:
q2(X,Y) :- cites(X,Y), cites(Y,X)
• Then both cites predicates can be folded into V4
– Not recognized by the bucket algorithm

The State of Affairs

• Bucket algorithm:
– deals well with predicates
– Cartesian product can be large (containment check required for every candidate rewriting)
• Inverse rules:
– modular (extensible to binding patterns, FDs)
– no treatment of predicates
– resulting rewritings need significant further optimization

Neither scales up

• The MINICON algorithm:
– change perspective: look at query variables

References

• Querying Heterogeneous Information Sources Using Source Descriptions
– By Alon Y. Levy, Anand Rajaraman and Joann J. Ordille
– VLDB, 1996

• Laks VS Lakshmanan
– Lecture Slides

• Alon Halevy
– Answering Queries Using Views: A Survey
– VLDB Journal, 2000
– http://citeseer.ist.psu.edu/halevy00answering.html