Upload
hilde
View
51
Download
0
Embed Size (px)
DESCRIPTION
Local as View: First steps. Introduction and an example Rewriting queries using views The Information Manifold system. Introduction and an example. LAV : local as view The sources are modeled as views, typically conjunctive, of the global (virtual) schema (why conjunctive?) - PowerPoint PPT Presentation
Citation preview
2005 lav-i 1
Local as View: First steps
Introduction and an exampleRewriting queries using viewsThe Information Manifold system
2005 lav-i 2
Introduction and an example
LAV: local as view
The sources are modeled as views, typically conjunctive, of the global (virtual) schema
(why conjunctive?)
Advantage: adding / removing a source are local to the source, and do not impact the global schema (if sufficiently general)
2005 lav-i 3
A source is associated with a view definition; what are the assumptions on its contents?
Sound: contains a subset of the view definition
Complete: contains all the data in the definition (possibly more)
Exact: contains exactly the data in the view definition
The common assumption: sound views (fits the WWW environment)
2005 lav-i 4
Answering a conjunctive query:
Rewrite the query in terms of the views
Equivalent rewriting: the rewriting is equivalent to the query
Contained rewriting: the rewriting is contained in the query
Maximally contained rewriting: • contained in query,
• not contained in another rewriting
For sound views, it is reasonable to search for maximally contained rewritings , then take their union
2005 lav-i 5
Example: a university db (each attribute is explained when it first occurs)
course(c#,ti) //c-number, title
teaches(pr, c#, qu) // prof, quarter
registered(st, c#, qu) //student
major(st, dp) //dept
advises(pr, st)
Views:
v0(p, s):- advises(p,s)
v1(p,s, q) :- registered(s,c,q), teaches(p,c,q), q>= w97 //winter97
2005 lav-i 6
Q: select a.pr, a.st, r.qu
from registered r, teaches t, advises a
where r.c# = t.c# and r.qu = t.qu and a.pr = t.pr a.st = r.st
and r.qu >=w98
As a conjunctive query : (with individual variables) jump
q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(p,s), q>=w98
Let Q’: q(p,s,q):- v1(p.s, q), v0(p,s), q>=w98
Expanding by the view defs:
Q’’: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), q>= w97 ,
advises(p,s), q>=w98
Q’’ is equivalent to Q, hence this is a good rewriting
2005 lav-i 7
Assume v1 is replaced by
v2(s,q) :- registered(s,c,q), teaches(p,c,q), q>= w98
Can we answer the query?
The expansion of v2(s,q), v0(p,s):
q(p,s,q) :- registered(s,c,q), teaches(p’,c,q), q>= w98 ,
advises(p,s),
We use p’ (rather than p), since it is an existential variable of v1
This is not equivalent to the query, nor contained in it
Assume q was dropped from both v1 and Q, could we answer Q using v0 and v1?
2005 lav-i 8
Assume v1 is replaced by v3:
select r.st, t.pr, r.qu
from registered r, teaches t
where r.c# = t.c# and r.qu >= win98
Convert to conjunctive!
Can we answer the query?
Interim summary:
To be useable in a query rewriting, views have to
• export variables that are subject to (arithmetic or join) conditions in the body of the query , or to contain such joins in their bodies,
• treat query head variables as head variables of the view
2005 lav-i 9
Rewriting queries using views
Scenario:
We have a collection of view definitions
V = {v1, v2, …, vn}
Given a query Q, we ask:
• is there a rewriting that uses the views?
• If there is, how can it be computed?
Applications:
• Query optimization by using materialized views
• Data integration
2005 lav-i 10
Example :
Q: q(X, U) :- p(X, Y), r(Y, Z), s(X, W), t(W, U)
V: v1(A,B) :- p(A,C), r(C,B), s(A, D)
A partial rewriting:
Q’: q(X, U) :- v1(X, Z), s(X, W), t(W, U)
The atom s(A, D) in v1 does not replace s(X, W) in Q (and Q’)
Given also v2(A,B) :- s(A, C), t(C, B), r(D, E)
Q’’: q(X, U) :- v1(X, Z), v2(X, U)
Is a complete equivalent rewriting
Note: looking at Q’, it is not evident that by replacing the last two atoms with v2 we obtain an equivalent rewriting; v2 contains r(D, E), and Q’ does not contain r.
2005 lav-i 11
Classification:
• Partial rewriting: only part of the query is replaced by views
useful in query optimization, not in data integration
• Complete rewriting: only views occur in it
useful in both scenarios
• Equivalent rewriting: yields the same answer on all db’s
useful in both cases
• Contained rewriting: its expansion is a contained query
not interesting for query optimization, useful for integration
• Maximal contained rewriting: contained in query, not in another rewriting
We assume (unless stated otherwise) conjunctive queries and views, no b.i. preds
2005 lav-i 12
Assume a query Q q(..):- Qbody is given.
A view is usable in an equivalent rewriting if it occurs in some partial equivalent rewriting of Q
Claim 1 : jump
That is, if v(D) is empty, for some db D, then so is Q(D)
Proof:
If the containment does not hold, then there is a D s.t. v(D) is empty, Q(D) is not, so the rewriting is not equivalent
If it holds, then Q’: q(..):- Qbody, v(Y) (Y new vars) is an equivalent partial rewriting
(bold letters – vectors of variables/constants)
(in equiv. rewriting)
(the projection of Q on the empty set of attributes is contained in that of v)
v is usable iff (Q) (v) Æ ÆP Í P
2005 lav-i 13
Corollary: checking for usability is NP-complete
Corollary : if v1,…, vk are usable, then
Q’: q(..):- Qbody, v1(Y1), …, vk(Yk) (Yi new and distinct) is an equivalent rewriting
Apply the containment map hi to each v(Yi), obtaining Q’’ that is contained in Q’ (and does not introduce any new vars)
Q’’ contains Q, hence Q, Q’ Q’’ are equivalent
Q’’ still contains all atoms of bodyQ; some of these may be removed by standard minimization; then some views may be removed
How many must be left?
(Q) (v):
iff there is a containment mapping from body(v) to body(Q)Æ ÆP Í P
2005 lav-i 14
Claim 2 : jump
If Qbody has n atoms, then a minimized, equivalent or contained, rewriting Q’ contains at most n (view or regular) atoms
Proof: consider the expansion exp(Q’), and the containment mapping from Q to exp(Q’); its image has at most n atoms
Q: q(..) :- p1(..), pk(..) pk+1(..) pn(..)
Q’: q(..):- p1,(..), , pk(..), v1(..), vm(..)
r1,1 .. r1k1 rm,1 .. rmk1
2005 lav-i 15
Claim 3: The problem:
is there a complete equivalent rewriting
is NP-complete
Proof: given v, construct
v’: Qhead(..):- Qbody, vbody
Then v’ is usable (NP-complete) iff v’ is a complete rewriting
Claim 4:
The problem above is NP-complete even if the query and the views do not contain repeated predicates in their bodies (which simplifies the search for containment mappings)
And also if we are looking for a complete contained rewriting
Proof: next page
2005 lav-i 16
Exact cover by 3-sets : (NP-complete problem)
Given s={e1,…,en} and sets s1,…,sk, each of size 3, is there a cover of s by a subset of the {sj} where each element occurs in just one set?
A reduction from the above to finding a complete rewriting:
The query Q: q( ) :- p1(A1,1,…,A1,k),…, pn(An,1,…, An,k)
Note: since each sj contains 3 elements, Sj occurs in 3 atoms – this is a join condition in the query body
The view vj (for sj): the 3 atoms (as above) for the elements of sj, and a head that contain all the Sm that occur in its body, but not Sj (in the body, Sj is existential)
In pi, Ai,j is Sj if ei sj, a new variable otherwiseÎ
2005 lav-i 17
Assume a complete contained rewriting Q’ exists
There is a containment mapping h : body(Q) body( exp(Q’))
Note: Sj in exp(vj) is renamed to a new var, say Gj, that does not occur in
expansion of other views, since it is existential
If h maps pi(..) from Q to the expansion of vj, then the expansion contain pi, so and h(Sj)=Gj
the atoms pm, pq in Q contain Sj
And the body of vj contains pi, pm, pq, with Gj instead of Sj
Since h(Sj) = Gj, h maps pm and pq from Q also to exp(vj)(Since vj does not export Sj, the join condition on Sj can be satisfied only if
all three atoms pi, pm, pq are mapped by h to vj)
The views with expansions in image(h) provide an exact cover
ei sjÎ
Assume sj={ei, em, eq}
2005 lav-i 18
The other direction (if an exact cover exists, it gives a complete rewriting) - left for you
Also: a complete contained rewriting in this case is always an equivalent rewriting
2005 lav-i 19
Comments:• For contained rewritings, the characterization (claim1, p.12)
of usability does not hold (even if v(D) is empty on some D, and Q(D) is not, v may be useful for a contained rewriting)
• Claim 2 (p. 14) holds also for contained rewritings : search for rewritings is restricted by size of query body
but: if db satisfies functional dependencies, the size bound fails
Example:
Database: a single relation e(X, Y, Z), with fd: X Y
Query Q: q(X, Y, Z):- e(X, Y, Z)
Views: v1(X,Y):- e(X, Y, Z), v2(X,Z):- e(X, Y, Z)
A complete rewriting, of minimal size :
Q’: q(X, Y, Z):- v1(X, Y), v2(X, Z)
2005 lav-i 20
If the query and views contain b.i. predicates (comparisons) :
Let Q’ be Q w/o the comparisons
If Q’’ is a complete rewriting for Q’, just add to it the comparisons in Q (with variables suitably renamed), to obtain a complete contained rewriting for Q (or a contradiction!)
Example: from p. 6
Views: v0(p, s):- advises(p,s)
v1(p,s, q) :- registered(s,c,q), teaches(p,c,q), q>= w97
A query: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(q,s), q>=w98
A rewriting: q(p,s,q):- v1(p.s, q), v0(p,s), q>=w98
Can we change the example to obtain a contradiction?
2005 lav-i 21
The Information Manifold system (IM)
A LAV system, implemented in Bell Labs around 94-96, supported about 100 WWW sources
Main ideas:
• Rewriting queries using views for answering queries
• Using fine-grained descriptions of sources to eliminate irrelevant sources
• Support for restricted-capability sources
From now, rewriting always means complete, contained
2005 lav-i 22
Rewriting views using views in IM – the bucket algorithm :
The goal: Finding rewritings is difficult in worst case
But good heuristics often work well in practice
Outline:
1) Find for each query atom – a subgoal, views that may be targets of a mapping from it; put them in a bucket for the subgoal (these are candidates) (hopefully, this filters out many candidates)
2) Combine views, one from each bucket; test if
1) there is a containment mapping;
2) adding constraints yields a satisfiable query on the views
3) Minimize each rewriting, eliminate those contained in others
4) Take the union
2005 lav-i 23
Step 1 – computing the bucket for a query atom :
Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q)
We add h(vj) to bucket(pi(Ui)) if:
• vj’s definition contains an atom pi(Y) ; // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y)
• if the k’th var of Ui is a head var of Q, then the k’th var of Y is a head var of vj (this var is needed for the query result)
• h renames variables of vj and (some vars of) Q as follows:• If y, k’th in Y, is a head var of vj, then rename to k’th var of Ui
// this may be a head var of Q, or used in a join
• Otherwise, h(y) is a new distinct var
• h(pi(Ui)) = h(pi(Y))
• h(C(Q)) and h(C(vj)) is satisfiable
2005 lav-i 24
Example (same university db) :
Views:
v1(s, c, q, t) :- registered(s, c, q), course(c, t), c>=500, q>=a98
v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
v3(s, c) :- registered(s, c, q), q<=a94
v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q<=a97
Query: q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
Bucket for registered(s, c, q): v1(s, c, q, t1), v2(s, p1, c, q) (but not v3 – for one(two?) reasons, not v4 – for one reason)
Bucket for teaches(p, c, q): v2(s1, p, c, q), v4(p, c, t2, q)
Bucket for course(c, t):
2005 lav-i 25
Step 2 – combining views, testing for satisfiability of rewriting :
Example (cont’d): Combining 1st element of each bucket:
Q1: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q), v1(s2, c, q1, t)
Minimize by s2s, cc, q1q, t t1 (3rd atom is removed)
Q1’: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q)
Expand:v1: registered(s, c, q), course(c, t1), c>=500, q>=a98,
v2: registered(s, c, q), teaches(p, c, q)
With query constraints (under containment mapping) c>=300, q>=a95, this is satisfiable, so we have a contained rewriting
2005 lav-i 26
Another combination:
Q2: q(s, p, c) :- v1(s, c, q, t1), v4(p, c, t2, q) , v4(p2, c, t, q2)
Can minimize, remove 3rd atom
Expansion:
v1: registered(s, c, q), course(c, t1), c>=500, q>=a98
v4 : registered(s, c, q), teaches(p, c, q), course(c, t2), q<=a97
The conjunction is unsatisfiable (conditions in two views are contradictory), this is not a rewriting
Taking the union of all the rewritings that pass the filters, we obtain (in this example) a maximally contained rewriting
2005 lav-i 27
Another example:
The database contains a single relation :
flight(from, to, carrier)
The views are
v1(F, T) :- flight(F, T, wn) // wn is Southwest airlines
v2(F, T) :- flight (F, T, ua) // United airlines
v3(F, T, C) :- flight(F, Z, C), flight(Z, T, C)
A user wants to fly from Tucson to S.F, w/o changing the airline, and with at most one stop
Q: q(C) :- flight(tus, sfo, C)
q(C) :- flight(tus, Z, C), flight(Z, sfo, C)
2005 lav-i 28
Buckets are computed for each sub-query: (an extension!)
For flight(tsu, sfo, C):
v1(tus, sfo), v2(tus, sfo), v3(tus, T1, C), v3(F1, sfo, C)
Each of these gives a candidate rewriting for q, of which only
q1(wn) :- v1(tus, sfo)
q2(ua) :- v2(tus, sfo)
remain
2005 lav-i 29
An alternative presentation of the bucket (p. 23)
Step 1 – computing the bucket for a query atom :
Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q)
We add h(vj) to bucket(pi(Ui)) if:
• vj’s definition contains an atom pi(Y) & pi(U) and pi(Y) unify // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y)
• Some more conditions (next page) are satisfied
Thus, if vj’s body contains pi twice, it may be added to the bucket (at most) twice
For h, see next page
2005 lav-i 30
The additional conditions & the definition of h:• (condition) if the k’th var of Ui is a head var of Q, or a join var of
Q, then the k’th var of Y is a head var of vj (this var is needed for the query result or for the join)
(the condition on join vars does not occur in Alon’s survey paper!?)
• h renames variables of vj as follows:• If y, k’th in Y, is a head var of vj, then rename to k’th var of Ui
• Otherwise, h(y) is a new distinct var
• h(pi(Ui)) = h(pi(Y))
this is simply a renaming of the variables of vj• (condition) Since pi(Y) and pi(U) are unifiable, h can be extended
to (some variables of) Q, so that h(pi(Ui)) = h(pi(Y))
• (condition) h(C(Q)) and h(C(vj)) is satisfiable (h is defined on some variables of C(Q), so it maps part of C(Q); that part
should be consistent with h(C(vj)) )