Presentation Outline
21.4 Capability-Based Optimization
    21.4.1 The Problem of Limited Source Capabilities
    21.4.2 A Notation for Describing Source Capabilities
    21.4.3 Capability-Based Query-Plan Selection
    21.4.4 Adding Cost-Based Optimization
21.5 Optimizing Mediator Queries
    21.5.1 Simplified Adornment Notation
    21.5.2 Obtaining Answers for Subgoals
    21.5.3 The Chain Algorithm
    21.5.4 Incorporating Union Views at the Mediator
21.4 Capability-Based Optimization
Introduction
A typical DBMS estimates the cost of each query plan and picks the one it believes to be the best.
A mediator, by contrast, usually has little knowledge of how long its sources will take to answer.
Optimization of mediator queries therefore cannot rely on cost measures alone to select a query plan.
Instead, the mediator first applies capability-based optimization.
21.4.1 The Problem of Limited Source Capabilities
Many sources have only Web-based interfaces.
Web sources usually allow querying only through a query form.
E.g., the Amazon.com interface allows us to query about books in many different ways.
But we cannot ask questions that are too general, e.g., SELECT * FROM books;
Reasons why a source may limit the ways in which queries can be asked:
    The earliest databases did not use a relational DBMS that supports SQL queries.
    Indexes on a large database may make certain queries feasible, while others are too expensive to execute.
    Security reasons, e.g., a medical database may answer queries about averages but won't disclose the details of a particular patient's information.
21.4.2 A Notation for Describing Source Capabilities
For relational data, the legal forms of queries are described by adornments.
Adornments are sequences of codes that represent the requirements for the attributes of the relation, in their standard order:
    f (free): the attribute can be specified or not
    b (bound): we must specify a value for the attribute, but any value is allowed
    u (unspecified): we are not permitted to specify a value for the attribute
    c[S] (choice from set S): a value must be specified, and that value must be from the finite set S
    o[S] (optional from set S): we either do not specify a value, or we specify a value from the finite set S
A prime (e.g., f') indicates that the attribute is not part of the output of the query.
A capabilities specification is a set of adornments.
A query must match one of the adornments in its capabilities specification.
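E.g., Dealer 1 is a source of data in the form Cars(serialNo, model, color, autoTrans, navi).
The adornment for this query form is b'uuuu: a serial number must be supplied (and is not returned in the output), and no other attribute may be specified.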
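The adornment codes can be read as a checklist that a source query must pass. The following is a minimal sketch of such a check, not part of the original slides. It assumes an adornment is given as a list of (code, allowed set) pairs, that None means "no value specified", and that the serial number "12345" is just an illustrative value; the function and variable names are hypothetical.

```python
def query_matches(adornment, bindings):
    """Return True if the bindings of a source query satisfy the adornment.

    adornment: list of (code, allowed) pairs; code is one of
               "f", "b", "u", "c", "o", optionally followed by a prime (').
    bindings:  one entry per attribute; None means no value was specified.
    """
    if len(adornment) != len(bindings):
        return False
    for (code, allowed), value in zip(adornment, bindings):
        base = code.rstrip("'")  # a prime only affects the output, not the input
        if base == "b" and value is None:
            return False         # bound: a value is required
        if base == "u" and value is not None:
            return False         # unspecified: no value may be given
        if base == "c" and (value is None or value not in allowed):
            return False         # choice: a value from S is required
        if base == "o" and value is not None and value not in allowed:
            return False         # optional: if given, the value must be in S
        # base == "f": any value, or none, is acceptable
    return True


# Dealer 1: Cars(serialNo, model, color, autoTrans, navi) with adornment b'uuuu
DEALER1 = [("b'", None), ("u", None), ("u", None), ("u", None), ("u", None)]

print(query_matches(DEALER1, ["12345", None, None, None, None]))  # True
print(query_matches(DEALER1, [None, "Gobi", None, None, None]))   # False
```

A capabilities specification would then accept a query if any one of its adornments passes this check.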
21.4.3 Capability-Based Query-Plan Selection
Given a query at the mediator, a capability-based query optimizer first considers what queries it can ask at the sources to help answer the query.
If the answers to those queries would supply bindings for more variables, the optimizer reconsiders which further source queries have become possible.
The process is repeated until either:
    Enough queries have been asked at the sources to resolve all the conditions of the mediator query, so the query can be answered. Such a plan is called feasible.
    No more valid forms of source queries can be constructed, yet the mediator query still cannot be answered. In that case the mediator query is impossible.
The simplest form of mediator query for which we need to apply the above strategy is a join of relations.
E.g., we have two sources for Dealer 2:
    Autos(serial, model, color)
    Options(serial, option)
Suppose that ubf is the sole adornment for Autos, and that Options has two adornments, bu and uc[autoTrans, navi].
The query is: find the serial numbers and colors of Gobi models with a navigation system.
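To make the idea of a feasible plan concrete, here is a small sketch of two plans for the Gobi query that respect the adornments above. It is not from the original slides: the in-memory rows, the serial numbers, and the function names are all made up for illustration, and each function stands in for one legal query form at a source.

```python
# Hypothetical data standing in for the two Dealer 2 sources.
AUTOS = [("s1", "Gobi", "red"), ("s2", "Gobi", "blue"), ("s3", "Sahara", "red")]
OPTIONS = [("s1", "navi"), ("s1", "autoTrans"), ("s2", "autoTrans"), ("s3", "navi")]


def autos_by_model(model, color=None):
    """Autos with adornment ubf: model is required, color is optional."""
    return [(s, m, c) for (s, m, c) in AUTOS
            if m == model and (color is None or c == color)]


def options_by_serial(serial):
    """Options with adornment bu: serial is required, option may not be given."""
    return [(s, o) for (s, o) in OPTIONS if s == serial]


def options_by_option(option):
    """Options with adornment uc[autoTrans, navi]: option must be from the set."""
    if option not in {"autoTrans", "navi"}:
        raise ValueError("option must be autoTrans or navi")
    return [(s, o) for (s, o) in OPTIONS if o == option]


def plan_a():
    """Ask each source one independent query, then intersect at the mediator."""
    navi_serials = {s for (s, _) in options_by_option("navi")}
    return sorted((s, c) for (s, _, c) in autos_by_model("Gobi")
                  if s in navi_serials)


def plan_b():
    """Ask Autos first, then query Options once per Gobi serial number."""
    return sorted((s, c) for (s, _, c) in autos_by_model("Gobi")
                  if any(o == "navi" for (_, o) in options_by_serial(s)))


print(plan_a())  # [('s1', 'red')]
print(plan_b())  # [('s1', 'red')]
```

Both plans are feasible and return the same answer, but they send different numbers of queries to the sources, which is the kind of difference a cost-based step would have to weigh.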
21.4.4 Adding Cost-Based Optimization
The mediator's query optimizer is not done when the capabilities of the sources have been examined.
Having found the feasible plans, it must choose among them.
Making an intelligent, cost-based choice requires that the mediator know a great deal about the costs of the queries involved.
Sources are independent of the mediator, so these costs are difficult to estimate.
21.5 Optimizing Mediator Queries
Chain algorithm: a greedy algorithm that finds a way to answer the query by sending a sequence of requests to its sources.
    It will always find a solution, assuming at least one solution exists.
    The solution may not be optimal.
21.5.1 Simplified Adornment Notation
A query at the mediator is limited to b (bound) and f (free) adornments.
We use the following convention for describing adornments: name^adornments(attributes), where:
    name is the name of the relation
    the adornments are written as a superscript on the name, e.g., R^bf(1, a)
    the number of adornments equals the number of attributes
21.5.2 Obtaining Answers for Subgoals
Rules for subgoals and sources:
Suppose we have the subgoal R^x1x2…xn(a1, a2, …, an), and a source adornment for R is y1y2…yn.
The adornment on the subgoal matches the adornment at the source if, for every i:
    If yi is b or c[S], then xi = b.
    If xi = f, then yi is not output restricted (not primed).
Note that if yi is f, u, or o[S], then xi can be either b or f.
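The matching rule can be written as a short predicate. This is a sketch under assumptions that are not in the original slides: the subgoal adornment is a string of b and f, the source adornment is a list of code strings, and the set S of a c[S] or o[S] code is left out because it plays no role in this test. The name subgoal_matches is hypothetical.

```python
def subgoal_matches(subgoal, source):
    """Check whether a subgoal adornment matches a source adornment.

    subgoal: string over {"b", "f"}, e.g. "bf"
    source:  list of codes such as "b", "f", "u", "c", "o",
             each optionally followed by a prime ('), e.g. ["c'", "f"]
    """
    if len(subgoal) != len(source):
        return False
    for x, y in zip(subgoal, source):
        primed = y.endswith("'")
        base = y.rstrip("'")
        if base in ("b", "c") and x != "b":
            return False   # the source needs a value the mediator cannot supply
        if x == "f" and primed:
            return False   # the mediator needs an output the source withholds
    return True


print(subgoal_matches("bf", ["b", "f"]))   # True
print(subgoal_matches("ff", ["c'", "f"]))  # False
print(subgoal_matches("bf", ["c'", "f"]))  # True
print(subgoal_matches("ff", ["b", "u"]))   # False
```

Whether a bound value actually lies in S can only be decided per tuple, when the source query is issued.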
21.5.3 The Chain Algorithm
Maintains 2 types of information:
    An adornment for each subgoal.
    A relation X that is the join of the relations for all the subgoals that have been resolved.
Initially, the adornment for a subgoal has b in a position iff the mediator query provides a constant binding for the corresponding argument of that subgoal.
Initially, X is a relation over no attributes, containing just an empty tuple.
First, initialize the adornments of the subgoals and X.
Then, repeatedly select a subgoal that can be resolved. Let R^α(a1, a2, …, an) be the subgoal:
1. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X. Project X onto its variables that appear in R.
2. For each tuple t in the projection of X, issue a query to the source as follows (β is a source adornment matched by α):
    a. If a component of β is b, then the corresponding component of α is b, and we can use the corresponding component of t (or the constant in the subgoal) for the source query.
    b. If a component of β is c[S], and the corresponding component of t is in S, then the corresponding component of α is b, and we can use the corresponding component of t for the source query.
    c. If a component of β is f, and the corresponding component of α is b, provide the known value for the source query.
    d. If a component of β is u, provide no binding for this component in the source query.
    e. If a component of β is o[S], and the corresponding component of α is f, treat the component of β as if it were f.
    f. If a component of β is o[S], and the corresponding component of α is b, treat the component of β as if it were c[S].
3. Every variable among a1, a2, …, an is now bound. For each remaining unresolved subgoal, change its adornment so any position holding one of these variables is b.
4. Replace X with X ⋈ π_S(R), where S is the set of all the variables among a1, a2, …, an.
5. Project out of X all components that correspond to variables that do not appear in the head or in any unresolved subgoal.
If every subgoal is resolved, then X is the answer.
If some subgoal remains unresolved and none of the unresolved subgoals can be resolved, then the algorithm fails.
21.5.3 The Chain Algorithm Example
Mediator query:
    Q: Answer(c) ← R^bf(1, a) AND S^ff(a, b) AND T^ff(b, c)
Sources:
    Relation     R         S              T
    Adornment    bf        c'[2,3,5]f     bu
    Schema       (w, x)    (x, y)         (y, z)
    Data         (1, 2)    (2, 4)         (4, 6)
                 (1, 3)    (3, 5)         (5, 7)
                 (1, 4)                   (5, 8)
Initially, the adornments on the subgoals are the same as in Q, and X contains just an empty tuple.
S and T cannot be resolved yet: each has adornment ff, but the source for S requires its first argument to be chosen from {2, 3, 5} (c'[2,3,5]) and the source for T requires its first argument to be bound (b).
R^bf(1, a) can be resolved because its adornment is matched by the source's adornment bf.
Send the query R(w, x) with w = 1 to the source; it returns the tuples (1, 2), (1, 3), and (1, 4).
Project the subgoal's relation onto its second component, since only the second component of R(1, a) is a variable.
This is joined with X, resulting in X equaling this relation:
    a
    2
    3
    4
Change the adornment on S from ff to bf.
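Now we resolve S^bf(a, b):
Project X onto a; since a is the only attribute of X, the result is X itself.
For each value of a in X that belongs to {2, 3, 5}, query S for the tuples with that value in the first component. The value 4 is not in the set, so no query is issued for it. The result is:
    a  b
    2  4
    3  5
Join this relation with X, and remove a because it appears neither in the head nor in any unresolved subgoal:
    b
    4
    5
Change the adornment on T from ff to bf.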
Now we resolve T^bf(b, c), querying T once for each value of b in X:
    b  c
    4  6
    5  7
    5  8
Join this relation with X and project onto the c attribute to get the relation for the head.
The solution is {(6), (7), (8)}.
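The whole example can be replayed with a compact sketch of the Chain Algorithm. It is an illustration under stated assumptions, not the slides' own code: sources are simulated by in-memory lists, variables are strings and constants are non-strings, X is a list of dictionaries, queries are issued per tuple of X rather than per distinct projected tuple, and all function names are hypothetical.

```python
def matches(alpha, beta):
    """Subgoal adornment alpha (list of 'b'/'f') vs. source adornment beta."""
    for x, (code, _) in zip(alpha, beta):
        if code.rstrip("'") in ("b", "c") and x != "b":
            return False
        if x == "f" and code.endswith("'"):
            return False
    return True


def ask_source(rows, beta, known):
    """Simulate one source query; known holds a value or None per position."""
    sent = []
    for (code, allowed), value in zip(beta, known):
        base = code.rstrip("'")
        if base == "u":
            sent.append(None)        # never bind an unspecified position
        elif base in ("c", "o") and value is not None and value not in allowed:
            return []                # value outside S: no answer for this tuple
        else:
            sent.append(value)
    return [r for r in rows if all(s is None or s == v for s, v in zip(sent, r))]


def chain(head_vars, subgoals, sources):
    X, schema, pending = [{}], set(), list(subgoals)  # X starts as one empty tuple
    while pending:
        for goal in pending:         # greedily pick the first resolvable subgoal
            name, args = goal
            beta, rows = sources[name]
            alpha = ["f" if isinstance(a, str) and a not in schema else "b"
                     for a in args]
            if matches(alpha, beta):
                break
        else:
            return None              # nothing left can be resolved: failure
        pending.remove(goal)
        joined = []
        for t in X:
            known = [t.get(a) if isinstance(a, str) else a for a in args]
            for row in ask_source(rows, beta, known):
                # mediator-side filter for values the source was not told about
                if all(k is None or k == v for k, v in zip(known, row)):
                    new_t = dict(t)
                    new_t.update({a: v for a, v in zip(args, row)
                                  if isinstance(a, str)})
                    joined.append(new_t)
        schema |= {a for a in args if isinstance(a, str)}
        needed = set(head_vars) | {a for _, g in pending for a in g
                                   if isinstance(a, str)}
        schema &= needed             # step 5: drop variables no longer needed
        X, seen = [], set()
        for t in joined:
            key = tuple(sorted((v, t[v]) for v in schema))
            if key not in seen:
                seen.add(key)
                X.append(dict(key))
    return sorted(tuple(t[v] for v in head_vars) for t in X)


SOURCES = {
    "R": ([("b", None), ("f", None)], [(1, 2), (1, 3), (1, 4)]),
    "S": ([("c'", {2, 3, 5}), ("f", None)], [(2, 4), (3, 5)]),
    "T": ([("b", None), ("u", None)], [(4, 6), (5, 7), (5, 8)]),
}
QUERY = [("R", (1, "a")), ("S", ("a", "b")), ("T", ("b", "c"))]

print(chain(["c"], QUERY, SOURCES))  # [(6,), (7,), (8,)]
```

Because the subgoal is chosen greedily, the order of source requests it produces is one workable order, not necessarily the cheapest.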
21.5.4 Incorporating Union Views at the Mediator
This implementation of the Chain Algorithm does not consider that several sources can contribute tuples to a relation.
If specific sources have tuples to contribute that other sources may not have, resolving a subgoal becomes more complex.
To handle this, we can either consult all sources or make a best effort to return all the answers.
Consulting All Sources
    We can resolve a subgoal only when each source for its relation has an adornment matched by the current adornment of the subgoal.
    Less practical, because it makes queries harder to answer, and impossible to answer if any source is down.
Best Efforts
    We need only one source with a matching adornment to resolve a subgoal.
    We need to modify the Chain Algorithm to revisit each subgoal when more of its arguments become bound, since additional sources may then match.
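The two policies differ only in how the per-source matching results are combined. The sketch below is an illustration with assumed representations, not part of the original slides: each source is a list of adornments, each adornment is a list of code strings, and the names matches and resolvable are hypothetical.

```python
def matches(alpha, beta):
    """alpha: string over {"b", "f"}; beta: list of codes, optionally primed."""
    for x, y in zip(alpha, beta):
        if y.rstrip("'") in ("b", "c") and x != "b":
            return False
        if x == "f" and y.endswith("'"):
            return False
    return True


def resolvable(alpha, sources, policy):
    """Decide whether a subgoal over a union view can be resolved.

    sources: one list of adornments per source contributing to the relation
    policy:  "all" (consult all sources) or "best_effort"
    """
    usable = [any(matches(alpha, beta) for beta in adornments)
              for adornments in sources]
    return all(usable) if policy == "all" else any(usable)


# Two hypothetical sources for one relation: the first needs its first
# argument bound, the second accepts any query.
SOURCES = [[["b", "f"]], [["f", "f"]]]

print(resolvable("ff", SOURCES, "all"))          # False
print(resolvable("ff", SOURCES, "best_effort"))  # True
print(resolvable("bf", SOURCES, "all"))          # True
```

Under best efforts, the subgoal resolved with adornment ff would be worth revisiting once its first argument becomes bound, since the first source could then contribute tuples as well.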