Answering Queries using views: A survey

Answering Queries using views: A survey

The problem…

• We have a schema A and a set of materialized views V1, V2,…, Vn

• We are given a query Q over schema A• Can we answer Q using another query Q’

referring only to (one or more) views?• Depending on the specific context:

– Schema A can be either virtual or materialized– Q’ may be required to be an equivalent rewriting or a

maximally contained rewriting of Q

Equivalent and Containment rewritings

• Query Containment: Q1 is contained in Q2 (Q1Q2), if for all db instances D, Q1(D)Q2(D)

• Query Containment: Q1 is equivalent to Q2 if Q1Q2 and Q2 Q1

Let A a schema and V={V1, V2, …, Vm} a set of views over A and Q a query over A

• Q’ is a equivalent rewriting of Q, if– Q’ refers only to views in V

– Q’ is equivalent to Q

• Q’ is a maximally contained rewriting of Q, if– Q’ refers only to views in V

– Q’ Q

– There is no other rewriting Q1, such as Q’ Q1 Q

When the problem arises?

• Data integration – The sources are described as views over the mediated schema

• Data warehouse– The schema of the warehouse can be defined as views over the

schemata of the sources • Query Optimization

– Computing the query using previously materialized views– semantic caching

• Database design – Independence of the physical view of the data and logical view– Rewriting of a query over the logical schema into a query over the

physical view

The variations of the problemData integration

Data Warehouses

Query optimization

Database design

Equivalent rewriting

Only if the sources are complete

Maximally-contained rewriting

Views are materialized

The db schema is virtual

A running example…

• Suppose we have the following schemaStudent(st_id, st_name, dp_id)Professor(pr_id, pr_name)Department(dp_id, dp_description)Course(cr_id, cr_title, cr_semester)Prof_Dep(pr_id, dp_id, university)Student_course(st_id, cr_id)Prof_course(pr_id, cr_id)

Data Integration

Mediator

Student(st_id, st_name, dp_id)Professor(pr_id, pr_name)Department(dp_id, dp_description)Course(cr_id, cr_title, cr_semester, cr_field)Prof_Dep(pr_id, dp_id, university)Student_course(st_id, cr_id)Prof_course(pr_id, cr_id)

Create View DBProfs AsSelect cr_title, pr_nameFrom Course, Prof_course, ProfessorWhere Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.cr_field = “DB”

Source 1Professors Of Databases

Source 1Professors Of AUEB

Create View AuebProfs AsSelect cr_title, pr_nameFrom Course, Prof_course, ProfessorWhere Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.univ = “AUEB”

Data Integration

Mediator




Query Q:Select pr_nameFrom Professor

Query Q’:Select pr_nameFrom DBProfsUnionSelect pr_nameFrom AUEBProfs

Q’ is a maximally-containedRewriting of Q(professors of Algorithms in the university of Piraeus cannotbe retrieved…)

Data Integration




Query P:Select pr_nameFrom Course, Prof_course, ProfessorWhere Course.cr_id = Prof_course.cr_id and Prof_course.pr_id = Professor.pr_id and Course.cr_field = “DB”And Course.univ = “AUEB”

P’ is an Equivalent Rewriting of P

Mediator

Query P’:Select pr_nameFrom DBProfs, AUEBProfsWhereDBProfs.pr_name = AUEBProfs.pr_name

Query Optimization• Suppose that we have a database with the schema of

our running example• Suppose that we have the following materialized views

– create view StudentCoursesView As

Select st_name, cr_id

from Student, Stud_course

where Student.st_id = Stud_course.st_id– create view ProfCoursesView As

Select pr_name, pr_id

from Professor, Prof_course

where Professor.pr_id = Prof_course.pr_id

Query Optimization

• Suppose we are given the query Select st_name, prof_name

from Student, Stud_course, Prof_course, Professor

where Student.st_id = Stud_course.cr_id

and Stud_course.cr_id = Prof_course.cr_id

and Prof_course.pr_id = Professor.pr_id

Query Optimization

• We can exploit the materialized views in order to eliminate joins

• There are many possible combinations

• One could be:Select st_name, pr_name

From StudentCoursesView, ProfessorCoursesView

Where StudentCoursesView.cr_id =

ProfessorCoursesView.cr_id

Query Optimization

• But:– Our aim here is not just to find an equivalent rewriting– We have to find all rewritings and evaluate their costs in order to

select the cheapest plan– In some cases views may not help due to loss of indexes – Therefore, we need to make a cost-based rewriting

• Similar case is the semantic caching– Client-server architecture– Query results can be temporary maintained as relations in the

client side to be exploited for the evaluation of future queries

A common question: When a view V is useful for a query Q?

• There must be a mapping μ from the relations of V to the relations of Q

• If so, if R1 is both in V and Q then:– If R1 participates in a join in Q with relation R2, then

• Either R1 and R2 must be joined in the same attributes with the same conditions in V

• Or R1 and R2 must be joined in V in the same attributes but with weaker conditions than those in Q, provided that the join attributes are projected in V

• Or only R1 is in V and the attributes of R1 participating in the join in Q, are projected in V

– If Q applies a selection predicate in an attribute A of R1, then V:• Either V must apply the same selection predicate in attribute A of R1• Or V must apply a weaker selection predicate, provided that attribute A is

projected in V• Or does not apply a predicate in attribute A of R1 at all and attribute A is

projected in V

• V must project at least all attributes projected in Q

ExamplesQ: Select st_name, prof_name

from Student, Stud_course, Prof_course, Professor

where Student.st_id = Stud_course.st_id

and Stud_course.cr_id = Prof_course.cr_id

and Prof_course.pr_id = Professor.pr_id

and Student.st_year > 2000

V1: Select st_name from Student where st_year>2000

V1: Select st_name, st_id from Student where st_year> 2000

V1: Select st_name, cr_name from Student, Stud_course, Course

where Student.st_id = Stud_course.st_idand Stud_course.cr_id = Course.cr_id

V1: Select st_name, st_id from Student where st_year >1996

V1: Select st_name, st_id, ma from Student where am >1996

V1: Select st_name, st_year, cr_name,cr_id

from Student, Stud_course, Course

where Student.st_id = Stud_course.st_idand Stud_course.cr_id = Course.cr_id

Algorithms for finding query rewritings using views

• In the query optimization context– A variation of the Selinger Algorithm

• Cost-based equivalent rewritings

• In the data integration context– The bucket algorithm– The inverse-rules algorithm– The MiniCon algorithm

Maximally-contained rewritings

Query Optimization – the Selinger Algorithm

• Recall: the Selinger Algorithm chooses the best join order

• A bottom-up dynamic programming algorithm

• Each iteration explores all join combinations of certain length

• For each combination of joins, it keeps only the cheapest plan, taking into account costs estimated in the previous iteration

A variation of the Selinger Algorithm

• The Selinger Algorithm can be modified in order to choose the cheapest equivalent rewriting

• We start with views that seem useful for a certain query

• We continue in a bottom-up fashion exploring all join combinations

• For each combination we prune plans estimated to be more expensive or less contributing than others


• First Iteration:– Find all views that are relevant (useful) to the query – Choose the best access path for each view and evaluate its cost

• Second Iteration:– Find all pair combinations– For each pair find all possible Joins in all possible attributes that

lead to useful plans– Plans that are complete (i.e. are sufficient to answer the query) are

distinguished as final candidates– For the rest plans do the following:

• Compare pairs of plans• For each pair (p, p’), if p’ is cheaper and has greater or equal

contribution (covers more relations), then p is pruned

• Third Iteration:– Find all possible pairs of plans derived from the previous iteration

and the plans derived from the first iteration– …


Query: R1 R2 R3 R4

Views: V1=R1 R2 , V2=R2 R3 R4

V3=R2 R3, V4=R3 R4,

V1 V2 V3 V4

V1-V2 V1-V3 V1-V4 V2-V3 V2-V4 V3-V4

(V1-V3)-V2 (V1-V3)-V4

Algorithms in the Data Integration Context

• We will discuss 2 algorithms– The bucket algorithm– The inverse-rules algorithm

• But first a short introduction to conjunctive queries

Conjunctive Queries• Conjunctive queries are able to express select-

project-join queries• h(X,…) :- a(Y,…), b(Z,…), …

• Head and subgoals are atoms • An atom consists of a predicate and a tuple of

arguments: a(Y, Z, ‘aueb’, W, 2006)

• Predicates stand for relations

head body subgoals

variables constants

Conjunctive Queries• Subgoals can also be arithmetic comparisons

h(X,…) :- a(Y,…), b(Z,…), Z>3• A variable that appears in the head is said to be

distinguished ; otherwise nondistinguished.h(X,Y):-a(X,Z),b(Z,Y)X,Y: distinguished, Z:nondestinguished

• project-select-join SQL queries as conjunctive queries:Select pr_name, dep_nameFrom Professor, DepartmentWhere Professor.pr_dep = Department.dep_idand Department.dep_year>1996and Professor.field = “DB”

q(pr_name, dep_name) :- Professor(pr_name, dep_id, “DB”),Department(dep_id, dep_name,

dep_year),dep_year>1996.

Conjunctive Queries• The problem of rewriting queries using

views in the notation of conjunctive queries:– He have

• a set of Database Relations• a set of views declared as conjunctive queries over

the Database Relations• a query q expressed as a conjunctive query over the

Database Relations

– We want to find:• A set of conjunctive queries over the view relations,

the disjunction of which being a maximally contained rewriting of q

example

Relations: R(S,C,Q), CO(C,T), TE(P,C,Q)

Views:V1(S, C, Q, T) :- R(S, C, Q), CO(C, T), C>500, Q>97

V2(S, P, C, Q) :- R(S, C, Q), TE(P, C, Q)

V3(S, C) :- R(S, C, Q), Q<94

V4(P, C, T, Q) :- R(S, C, Q), CO(C, T), TE(P, C, Q), Q<97

Query:Q(S, C, P) :- TE(P,C,Q), R(S,C,Q), CO(C,T), C>300, Q>95

The Bucket Algorithm• Create a ‘bucket’ for each subgoal of the query q• Fill each bucket with views that ‘cover’ the subgoal• A view V covers a subgoal p(A,B) of q if there is a

subgoal in V with the same predicate p(X,Y) and:– there is a mapping from the variables A, B of the subgoal of q to

the variables X, Y of the subgoal of the view– For variables of the subgoal of q appearing in the head of q, their

mapping variables in the subgoal of the view must also appear in the head of the view

– For each variable of the subgoal of q that also appear in arithmetic conditions C in q, if its mapping variable in the subgoal of the view also appear in arithmetic conditions C’ in the view, then C and C’ should not be mutually exclusive

• A solution must include q view from each bucket

The Bucket AlgorithmV1(S, C, Q, T) :- R(S, C, Q), CO(C, T), C>500, Q>97V2(S, P, C, Q) :- R(S, C, Q), TE(P, C, Q)V3(S, C) :- R(S, C, Q), Q<94V4(P, C, T, Q) :- R(S, C, Q), CO(C, T), TE(P, C, Q), Q<97Q(St, Cn, Pr) :- TE(Pr,Cn,Qu), R(St,Cn,Qu), CO(Cn,Tl),

Cn>300, Qu>95

V4(P,C,T’, Q)

V2(S’,P,C, Q)

TE R CO

V1(S,C,Q,T’)

V2(S,P’,C,Q)

V1(S’, C, Q’, T)

V4(S’, C, T, Q’)

The Bucket Algorithm• There are 8 combinations we have to check• For each, we equaling appropriate variables to enforce

joins• We check for condition confligs• If there are duplicate subgoals we try to eliminate them

V4(P,C,T’, Q)

V2(S’,P,C, Q)

TE R CO

V1(S,C,Q,T’)

V2(S,P’,C,Q)

V1(S’, C, Q’, T)

V4(S’, C, T, Q’)

q’(S,C,P):-V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S’,C,Q’,T)

q’(S,C,P):-V2(S’,P,C,Q), V1(S,C,Q,T’) The first solution

The inverse-rules algorithm

• Key idea: invert the view definitions to give the global predicates (not only those appearing in the query) in terms of views

• For each nondistinguished variable X, pick a new function symbol fExample:v(X,Y) :- p(X,Z), p(Z,Y)Replace Z by f(X,Y):v(X,Y) :- p(X,f(X,Y)), p(f(X,Y),Y)

The inverse-rules algorithm

• Replace a view definition by rules with:– A subgoal as the head, and– The view itself as the only subgoal of the

body.

• Example:v(X,Y) :- p(X,f(X,Y)), p(f(X,Y),Y)becomes:p(X,f(X,Y)) :- v(X,Y) p(f(X,Y),Y) :- v(X,Y)

The inverse-rules algorithm - example

View: v(D,C):-m(S,D), r(S, C)

Query: q(D) :- m(S,D), r(S, 444)

After applying the inverse-rules, we have:

m(f1(D,C), D) :- v(D, C)

r(f1(D,C), C) :- v(D,C)


• Supose that v has the following tuples:

{(CS, 444), (EE, 444), (CS, 333)}

• Applying the inverse-rules on these tuples we have:

m: { (f1(CS, 444), CS), (f1(EE, 444),EE), (f1(CS, 333), CS)}

r: { (f1(CS, 444), 444), (f1(EE, 444),444), (f1(CS, 333), 333)}


• We can now apply the query in this instance

q: {(CS),(EE)}

The MiniCon algorithm

• A variation of the bucket algorithm• It takes into account the non-distinguished variables as

well, which usually imply joins• If a view contains a ‘target’ subgoal, which participates

in a join in the query, then the ‘join’ variables should appear in the head of the view (unless the join takes place also within the view)

• It excludes much more views from participating in buckets (here called MCDs)

• There are fewer combination to check at the end

Discussion…

Documents

Answering Queries using views: A survey