Upload
kolya
View
31
Download
0
Embed Size (px)
DESCRIPTION
Answering Queries using views: A survey. The problem…. We have a schema A and a set of materialized views V1, V2,…, Vn We are given a query Q over schema A Can we answer Q using another query Q’ referring only to (one or more) views? Depending on the specific context: - PowerPoint PPT Presentation
Citation preview
Answering Queries using views: A survey
The problem…
• We have a schema A and a set of materialized views V1, V2,…, Vn
• We are given a query Q over schema A• Can we answer Q using another query Q’
referring only to (one or more) views?• Depending on the specific context:
– Schema A can be either virtual or materialized– Q’ may be required to be an equivalent rewriting or a
maximally contained rewriting of Q
Equivalent and Containment rewritings
• Query Containment: Q1 is contained in Q2 (Q1Q2), if for all db instances D, Q1(D)Q2(D)
• Query Containment: Q1 is equivalent to Q2 if Q1Q2 and Q2 Q1
Let A a schema and V={V1, V2, …, Vm} a set of views over A and Q a query over A
• Q’ is a equivalent rewriting of Q, if– Q’ refers only to views in V
– Q’ is equivalent to Q
• Q’ is a maximally contained rewriting of Q, if– Q’ refers only to views in V
– Q’ Q
– There is no other rewriting Q1, such as Q’ Q1 Q
When the problem arises?
• Data integration – The sources are described as views over the mediated schema
• Data warehouse– The schema of the warehouse can be defined as views over the
schemata of the sources • Query Optimization
– Computing the query using previously materialized views– semantic caching
• Database design – Independence of the physical view of the data and logical view– Rewriting of a query over the logical schema into a query over the
physical view
The variations of the problemData integration
Data Warehouses
Query optimization
Database design
Equivalent rewriting
Only if the sources are complete
Maximally-contained rewriting
Views are materialized
The db schema is virtual
A running example…
• Suppose we have the following schemaStudent(st_id, st_name, dp_id)Professor(pr_id, pr_name)Department(dp_id, dp_description)Course(cr_id, cr_title, cr_semester)Prof_Dep(pr_id, dp_id, university)Student_course(st_id, cr_id)Prof_course(pr_id, cr_id)
Data Integration
Mediator
Student(st_id, st_name, dp_id)Professor(pr_id, pr_name)Department(dp_id, dp_description)Course(cr_id, cr_title, cr_semester, cr_field)Prof_Dep(pr_id, dp_id, university)Student_course(st_id, cr_id)Prof_course(pr_id, cr_id)
Create View DBProfs AsSelect cr_title, pr_nameFrom Course, Prof_course, ProfessorWhere Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.cr_field = “DB”
Source 1Professors Of Databases
Source 1Professors Of AUEB
Create View AuebProfs AsSelect cr_title, pr_nameFrom Course, Prof_course, ProfessorWhere Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.univ = “AUEB”
Data Integration
Mediator
Student(st_id, st_name, dp_id)Professor(pr_id, pr_name)Department(dp_id, dp_description)Course(cr_id, cr_title, cr_semester, cr_field)Prof_Dep(pr_id, dp_id, university)Student_course(st_id, cr_id)Prof_course(pr_id, cr_id)
Source 1Professors Of Databases
Source 1Professors Of AUEB
Query Q:Select pr_nameFrom Professor
Query Q’:Select pr_nameFrom DBProfsUnionSelect pr_nameFrom AUEBProfs
Q’ is a maximally-containedRewriting of Q(professors of Algorithms in the university of Piraeus cannotbe retrieved…)
Data Integration
Student(st_id, st_name, dp_id)Professor(pr_id, pr_name)Department(dp_id, dp_description)Course(cr_id, cr_title, cr_semester, cr_field)Prof_Dep(pr_id, dp_id, university)Student_course(st_id, cr_id)Prof_course(pr_id, cr_id)
Source 1Professors Of Databases
Source 1Professors Of AUEB
Query P:Select pr_nameFrom Course, Prof_course, ProfessorWhere Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.cr_field = “DB”And Course.univ = “AUEB”
P’ is an Equivalent Rewriting of P
Mediator
Query P’:Select pr_nameFrom DBProfs, AUEBProfsWhereDBProfs.pr_name = AUEBProfs.pr_name
Query Optimization• Suppose that we have a database with the schema of
our running example• Suppose that we have the following materialized views
– create view StudentCoursesView As
Select st_name, cr_id
from Student, Stud_course
where Student.st_id = Stud_course.st_id– create view ProfCoursesView As
Select pr_name, pr_id
from Professor, Prof_course
where Professor.pr_id = Prof_course.pr_id
Query Optimization
• Suppose we are given the query Select st_name, prof_name
from Student, Stud_course, Prof_course, Professor
where Student.st_id = Stud_course.cr_id
and Stud_course.cr_id = Prof_course.cr_id
and Prof_course.pr_id = Professor.pr_id
Query Optimization
• We can exploit the materialized views in order to eliminate joins
• There are many possible combinations
• One could be:Select st_name, pr_name
From StudentCoursesView, ProfessorCoursesView
Where StudentCoursesView.cr_id =
ProfessorCoursesView.cr_id
Query Optimization
• But:– Our aim here is not just to find an equivalent rewriting– We have to find all rewritings and evaluate their costs in order to
select the cheapest plan– In some cases views may not help due to loss of indexes – Therefore, we need to make a cost-based rewriting
• Similar case is the semantic caching– Client-server architecture– Query results can be temporary maintained as relations in the
client side to be exploited for the evaluation of future queries
A common question: When a view V is useful for a query Q?
• There must be a mapping μ from the relations of V to the relations of Q
• If so, if R1 is both in V and Q then:– If R1 participates in a join in Q with relation R2, then
• Either R1 and R2 must be joined in the same attributes with the same conditions in V
• Or R1 and R2 must be joined in V in the same attributes but with weaker conditions than those in Q, provided that the join attributes are projected in V
• Or only R1 is in V and the attributes of R1 participating in the join in Q, are projected in V
– If Q applies a selection predicate in an attribute A of R1, then V:• Either V must apply the same selection predicate in attribute A of R1• Or V must apply a weaker selection predicate, provided that attribute A is
projected in V• Or does not apply a predicate in attribute A of R1 at all and attribute A is
projected in V
• V must project at least all attributes projected in Q
ExamplesQ: Select st_name, prof_name
from Student, Stud_course, Prof_course, Professor
where Student.st_id = Stud_course.st_id
and Stud_course.cr_id = Prof_course.cr_id
and Prof_course.pr_id = Professor.pr_id
and Student.st_year > 2000
V1: Select st_name from Student where st_year>2000
V1: Select st_name, st_id from Student where st_year> 2000
V1: Select st_name, cr_name from Student, Stud_course, Course
where Student.st_id = Stud_course.st_idand Stud_course.cr_id = Course.cr_id
V1: Select st_name, st_id from Student where st_year >1996
V1: Select st_name, st_id, ma from Student where am >1996
V1: Select st_name, st_year, cr_name,cr_id
from Student, Stud_course, Course
where Student.st_id = Stud_course.st_idand Stud_course.cr_id = Course.cr_id
Algorithms for finding query rewritings using views
• In the query optimization context– A variation of the Selinger Algorithm
• Cost-based equivalent rewritings
• In the data integration context– The bucket algorithm– The inverse-rules algorithm– The MiniCon algorithm
Maximally-contained rewritings
Query Optimization – the Selinger Algorithm
• Recall: the Selinger Algorithm chooses the best join order
• A bottom-up dynamic programming algorithm
• Each iteration explores all join combinations of certain length
• For each combination of joins, it keeps only the cheapest plan, taking into account costs estimated in the previous iteration
A variation of the Selinger Algorithm
• The Selinger Algorithm can be modified in order to choose the cheapest equivalent rewriting
• We start with views that seem useful for a certain query
• We continue in a bottom-up fashion exploring all join combinations
• For each combination we prune plans estimated to be more expensive or less contributing than others
A variation of the Selinger Algorithm
• First Iteration:– Find all views that are relevant (useful) to the query – Choose the best access path for each view and evaluate its cost
• Second Iteration:– Find all pair combinations– For each pair find all possible Joins in all possible attributes that
lead to useful plans– Plans that are complete (i.e. are sufficient to answer the query) are
distinguished as final candidates– For the rest plans do the following:
• Compare pairs of plans• For each pair (p, p’), if p’ is cheaper and has greater or equal
contribution (covers more relations), then p is pruned
• Third Iteration:– Find all possible pairs of plans derived from the previous iteration
and the plans derived from the first iteration– …
A variation of the Selinger Algorithm
Query: R1 R2 R3 R4
Views: V1=R1 R2 , V2=R2 R3 R4
V3=R2 R3, V4=R3 R4,
V1 V2 V3 V4
V1-V2 V1-V3 V1-V4 V2-V3 V2-V4 V3-V4
(V1-V3)-V2 (V1-V3)-V4
Algorithms in the Data Integration Context
• We will discuss 2 algorithms– The bucket algorithm– The inverse-rules algorithm
• But first a short introduction to conjunctive queries
Conjunctive Queries• Conjunctive queries are able to express select-
project-join queries• h(X,…) :- a(Y,…), b(Z,…), …
• Head and subgoals are atoms • An atom consists of a predicate and a tuple of
arguments: a(Y, Z, ‘aueb’, W, 2006)
• Predicates stand for relations
head body subgoals
variables constants
Conjunctive Queries• Subgoals can also be arithmetic comparisons
h(X,…) :- a(Y,…), b(Z,…), Z>3• A variable that appears in the head is said to be
distinguished ; otherwise nondistinguished.h(X,Y):-a(X,Z),b(Z,Y)X,Y: distinguished, Z:nondestinguished
• project-select-join SQL queries as conjunctive queries:Select pr_name, dep_nameFrom Professor, DepartmentWhere Professor.pr_dep = Department.dep_idand Department.dep_year>1996and Professor.field = “DB”
q(pr_name, dep_name) :- Professor(pr_name, dep_id, “DB”),Department(dep_id, dep_name,
dep_year),dep_year>1996.
Conjunctive Queries• The problem of rewriting queries using
views in the notation of conjunctive queries:– He have
• a set of Database Relations• a set of views declared as conjunctive queries over
the Database Relations• a query q expressed as a conjunctive query over the
Database Relations
– We want to find:• A set of conjunctive queries over the view relations,
the disjunction of which being a maximally contained rewriting of q
example
Relations: R(S,C,Q), CO(C,T), TE(P,C,Q)
Views:V1(S, C, Q, T) :- R(S, C, Q), CO(C, T), C>500, Q>97
V2(S, P, C, Q) :- R(S, C, Q), TE(P, C, Q)
V3(S, C) :- R(S, C, Q), Q<94
V4(P, C, T, Q) :- R(S, C, Q), CO(C, T), TE(P, C, Q), Q<97
Query:Q(S, C, P) :- TE(P,C,Q), R(S,C,Q), CO(C,T), C>300, Q>95
The Bucket Algorithm• Create a ‘bucket’ for each subgoal of the query q• Fill each bucket with views that ‘cover’ the subgoal• A view V covers a subgoal p(A,B) of q if there is a
subgoal in V with the same predicate p(X,Y) and:– there is a mapping from the variables A, B of the subgoal of q to
the variables X, Y of the subgoal of the view– For variables of the subgoal of q appearing in the head of q, their
mapping variables in the subgoal of the view must also appear in the head of the view
– For each variable of the subgoal of q that also appear in arithmetic conditions C in q, if its mapping variable in the subgoal of the view also appear in arithmetic conditions C’ in the view, then C and C’ should not be mutually exclusive
• A solution must include q view from each bucket
The Bucket AlgorithmV1(S, C, Q, T) :- R(S, C, Q), CO(C, T), C>500, Q>97V2(S, P, C, Q) :- R(S, C, Q), TE(P, C, Q)V3(S, C) :- R(S, C, Q), Q<94V4(P, C, T, Q) :- R(S, C, Q), CO(C, T), TE(P, C, Q), Q<97Q(St, Cn, Pr) :- TE(Pr,Cn,Qu), R(St,Cn,Qu), CO(Cn,Tl),
Cn>300, Qu>95
V4(P,C,T’, Q)
V2(S’,P,C, Q)
TE R CO
V1(S,C,Q,T’)
V2(S,P’,C,Q)
V1(S’, C, Q’, T)
V4(S’, C, T, Q’)
The Bucket Algorithm• There are 8 combinations we have to check• For each, we equaling appropriate variables to enforce
joins• We check for condition confligs• If there are duplicate subgoals we try to eliminate them
V4(P,C,T’, Q)
V2(S’,P,C, Q)
TE R CO
V1(S,C,Q,T’)
V2(S,P’,C,Q)
V1(S’, C, Q’, T)
V4(S’, C, T, Q’)
q’(S,C,P):-V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S’,C,Q’,T)
q’(S,C,P):-V2(S’,P,C,Q), V1(S,C,Q,T’) The first solution
The inverse-rules algorithm
• Key idea: invert the view definitions to give the global predicates (not only those appearing in the query) in terms of views
• For each nondistinguished variable X, pick a new function symbol fExample:v(X,Y) :- p(X,Z), p(Z,Y)Replace Z by f(X,Y):v(X,Y) :- p(X,f(X,Y)), p(f(X,Y),Y)
The inverse-rules algorithm
• Replace a view definition by rules with:– A subgoal as the head, and– The view itself as the only subgoal of the
body.
• Example:v(X,Y) :- p(X,f(X,Y)), p(f(X,Y),Y)becomes:p(X,f(X,Y)) :- v(X,Y) p(f(X,Y),Y) :- v(X,Y)
The inverse-rules algorithm - example
View: v(D,C):-m(S,D), r(S, C)
Query: q(D) :- m(S,D), r(S, 444)
After applying the inverse-rules, we have:
m(f1(D,C), D) :- v(D, C)
r(f1(D,C), C) :- v(D,C)
The inverse-rules algorithm - example
• Supose that v has the following tuples:
{(CS, 444), (EE, 444), (CS, 333)}
• Applying the inverse-rules on these tuples we have:
m: { (f1(CS, 444), CS), (f1(EE, 444),EE), (f1(CS, 333), CS)}
r: { (f1(CS, 444), 444), (f1(EE, 444),444), (f1(CS, 333), 333)}
The inverse-rules algorithm - example
• We can now apply the query in this instance
q: {(CS),(EE)}
The MiniCon algorithm
• A variation of the bucket algorithm• It takes into account the non-distinguished variables as
well, which usually imply joins• If a view contains a ‘target’ subgoal, which participates
in a join in the query, then the ‘join’ variables should appear in the head of the view (unless the join takes place also within the view)
• It excludes much more views from participating in buckets (here called MCDs)
• There are fewer combination to check at the end
Discussion…