View
0
Download
0
Category
Preview:
Citation preview
© Dennis Shasha, Philippe Bonnet - 2001
QUERY PROCESSINGBased on slides from Shasha et al, Kifer et al.
© Dennis Shasha, Philippe Bonnet - 2001
Query TuningSELECT s.RESTAURANT_NAME, t.TABLE_SEATING, to_char(t.DATE_TIME,'Dy, Mon FMDD') AS THEDATE, to_char(t.DATE_TIME,'HH:MI PM') AS THETIME,to_char(t.DISCOUNT,'99') || '%' AS AMOUNTVALUE,t.TABLE_ID, s.SUPPLIER_ID, t.DATE_TIME, to_number(to_char(t.DATE_TIME,'SSSSS')) AS SORTTIME
FROM TABLES_AVAILABLE t, SUPPLIER_INFO s,
(SELECT s.SUPPLIER_ID, t.TABLE_SEATING, t.DATE_TIME, max(t.DISCOUNT) AMOUNT, t.OFFER_TYPEFROM TABLES_AVAILABLE t, SUPPLIER_INFO WHERE t.SUPPLIER_ID = s.SUPPLIER_ID
and (TO_CHAR(t.DATE_TIME, 'MM/DD/YYYY') != TO_CHAR(sysdate, 'MM/DD/YYYY') OR TO_NUMBER(TO_CHAR(sysdate, 'SSSSS')) < s.NOTIFICATION_TIME - s.TZ_OFFSET)
and t.NUM_OFFERS > 0and t.DATE_TIME > SYSDATE and s.CITY = 'SF' and t.TABLE_SEATING = '2' and t.DATE_TIME between sysdate and (sysdate + 7) and to_number(to_char(t.DATE_TIME, 'SSSSS')) between 39600 and 82800 and t.O F F E R _T Y P E = 'D iscou nt„
GROUP BYs.SUPPLIER_ID, t.TABLE_SEATING, t.DATE_TIME, t.OFFER_TYP
) u
WHEREt.SUPPLIER_ID = s.SUPPLIER_ID
and u.SUPPLIER_ID = s.SUPPLIER_IDand t.SUPPLIER_ID = u.SUPPLIER_IDand t.TABLE_SEATING = u.TABLE_SEATINGand t.DATE_TIME = u.DATE_TIMEand t.DISCOUNT = u.AMOUNTand t.OFFER_TYPE = u.OFFER_TYPEand (TO_CHAR(t.DATE_TIME, 'MM/DD/YYYY') !=
TO_CHAR(sysdate, 'MM/DD/YYYY') OR TO_NUMBER(TO_CHAR(sysdate, 'SSSSS')) < s.NOTIFICATION_TIME - s.TZ_OFFSET)
and t.NUM_OFFERS > and t.DATE_TIME > SYSDATE and s.CITY = 'SF' and t.TABLE_SEATING = '2' and t.DATE_TIME between sysdate and (sysdate + 7) and
to_number(to_char(t.DATE_TIME, 'SSSSS')) between 39600 and 82800 and t.OFFER_TYPE = 'Discount'
ORDER BY AMOUNTVALUE DESC, t.TABLE_SEATING ASC, upper(s.RESTAURANT_NAME) ASC,SORTTIME ASC, t.DATE_TIME ASC
E xecution is too slow …1) How is this query executed?2) How to make it run faster?
© Dennis Shasha, Philippe Bonnet - 2001
Query Execution PlanOutput of the Oracle EXPLAIN tool
Physical Operators
Access Method Cost Model
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=165 Card=1 Bytes=106)
1 0 SORT (ORDER BY) (Cost=165 Card=1 Bytes=106)
2 1 NESTED LOOPS (Cost=164 Card=1 Bytes=106)
3 2 NESTED LOOPS (Cost=155 Card=1 Bytes=83)
4 3 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=28)
5 3 VIEW
6 5 SORT (GROUP BY) (Cost=83 Card=1 Bytes=34)
7 6 NESTED LOOPS (Cost=81 Card=1 Bytes=34)
8 7 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=24)
9 7 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=200)
10 2 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=460)
© Dennis Shasha, Philippe Bonnet - 20014
Query Processing Architecture
© Dennis Shasha, Philippe Bonnet - 2001
Agenda
• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation
© Dennis Shasha, Philippe Bonnet - 2001
Query Representation
• A query is decomposed into blocks– Aggregation– Order by– SPJ– Relations
• Each block is represented and optimized independently– Inter block optimization is possible/desirable.
© Dennis Shasha, Philippe Bonnet - 2001
Query Representation
• Query Tree • Query graph
Reserves Sailors
sid=sid
bid=100
sname(On-the-fly)
rating > 5(Scan;write to temp T1)
(Scan;write totemp T2)
(Sort-Merge Join)
Reserves
Sailors
rating > 5
bid=100
sid=sid
© Dennis Shasha, Philippe Bonnet - 2001
Query Execution PlanOutput of the Oracle EXPLAIN tool
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=165 Card=1 Bytes=106)
1 0 SORT (ORDER BY) (Cost=165 Card=1 Bytes=106)
2 1 NESTED LOOPS (Cost=164 Card=1 Bytes=106)
3 2 NESTED LOOPS (Cost=155 Card=1 Bytes=83)
4 3 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=28)
5 3 VIEW
6 5 SORT (GROUP BY) (Cost=83 Card=1 Bytes=34)
7 6 NESTED LOOPS (Cost=81 Card=1 Bytes=34)
8 7 TABLE ACCESS (FULL) OF 'TABLES_AVAILABLE' (Cost=72 Card=1 Bytes=24)
9 7 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=200)
10 2 TABLE ACCESS (FULL) OF 'SUPPLIER_INFO' (Cost=9 Card=20 Bytes=460)
© Dennis Shasha, Philippe Bonnet - 2001
Agenda
• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation
© Dennis Shasha, Philippe Bonnet - 200110
Access Path•• Access pathAccess path is the notion that denotes algorithm +
data structure used to locate rows satisfying some condition
• Examples:– File scan: can be used for any condition– Hash: equality search; all search key attributes of hash
index are specified in condition– B+ tree: equality or range search; a prefix of the search
key attributes are specified in condition• B+ tree supports a variety of access paths
– Binary search: Relation sorted on a sequence of attributes and some prefix of that sequence is specified in condition
© Dennis Shasha, Philippe Bonnet - 200111
Access Paths Supported by B+ tree• Example: Given a B+ tree whose search key is the
sequence of attributes a2, a1, a3, a4– Access path for search a1>5 AND a2=3 AND a3= ‘x’ (R): find
first entry having a2=3 AND a1>5 AND a3= ‘x’ and scan leaves from there until entry having a2>3 or a3 ‘x’. Select satisfying entries
– Access path for search a2=3 AND a3 >‘x’ (R): locate first entry having a2=3 and scan leaves until entry having a2>3. Select satisfying entries
– Access path for search a1>5 AND a3 = ‘x’ (R): Scan of R
© Dennis Shasha, Philippe Bonnet - 200112
Choosing an Access Path•• SelectivitySelectivity of an access path = number of pages
retrieved using that path• Size of domain of attribute is an indicator of the
selectivity of search conditions that involve that attribute
• Example: CrsCode= „C S 305‟ AND Grade= „B ‟ (TranscriptTranscript)– a B+ tree with search key CrsCode has lower selectivity
than a B+ tree with search key Grade
© Dennis Shasha, Philippe Bonnet - 2001
Agenda
• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation
© Dennis Shasha, Philippe Bonnet - 2001
Physical Operators
• Algorithms– Sort– Select – Project– Join
• Nested Loop• Sort-Merge• Hash-Join
© Dennis Shasha, Philippe Bonnet - 200115
External Sorting• Sorting is used in implementing many relational
operations• Problem:
– Relations are typically large, do not fit in main memory– So cannot use traditional in-memory sorting algorithms
• Approach used:– Combine in-memory sorting with techniques aimed at
minimizing I/O– I/O costs dominate => cost of sorting algorithm is measured
in the number of page transfers
© Dennis Shasha, Philippe Bonnet - 200116
E xternal S orting (cont‟d)
• External sorting has two main components:– Computation involved in sorting records in
buffers in main memory– I/O necessary to move records between mass
store and main memory
© Dennis Shasha, Philippe Bonnet - 200117
Simple Sort Algorithm• M = number of main memory page buffers• F = number of pages in file to be sorted• Typical algorithm has two phases:
– Partial sort phase: sort M pages at a time; create F/M sorted runsruns on mass store, cost = 2F
Example: M = 2, F = 7run
Original file
Partially sorted file
5 3 2 6 1 10 15 7 20 11 8 4 7 5
2 3 5 6 1 7 10 15 4 8 11 20 5 7
© Dennis Shasha, Philippe Bonnet - 200118
Simple Sort Algorithm– Merge Phase: merge all runs into a single run
using M-1 buffers for input and 1 output buffer• Merge step: divide runs into groups of size M-1 and merge each group into a run; cost = 2F
each step reduces number of runs by a factor of M-1
M pagesBuffer
© Dennis Shasha, Philippe Bonnet - 200119
Duplicate Elimination
• A major step in computing projection, union, and difference relational operators
• Algorithm:– Sort– At the last stage of the merge step eliminate
duplicates on the fly– No additional cost (with respect to sorting) in
terms of I/O
© Dennis Shasha, Philippe Bonnet - 200120
Sort-Based Projection
• Algorithm:– Sort rows of relation– Eliminate unwanted columns in partial sort
phase (no additional cost)– Eliminate duplicates on completion of last
merge step (no additional cost)• Cost: the cost of sorting
© Dennis Shasha, Philippe Bonnet - 200121
Computing Selection (attr op value)
• No index on attr:– If rows are not sorted on attr:
• Scan all data pages to find rows satisfying selection condition
• Cost = F– If rows are sorted on attr and op is =, >, < then:
• Use binary search (at log2 F ) to locate first data page containing row in which (attr = value)
• Scan further to get all rows satisfying (attr op value)• Cost = log2 F + (cost of scan)
© Dennis Shasha, Philippe Bonnet - 200122
Computing Selection (attr op value)• Clustered B+ tree index on attr (for “= ” or range search):
– Locate first index entry corresponding to a row in which (attr = value). CostCost = depth of tree
– Rows satisfying condition packed in sequence in successive data pages; scan those pages.CostCost: number of pages occupied by qualifying rows
B+ treeindex entries(containing rows)that satisfycondition
© Dennis Shasha, Philippe Bonnet - 200123
Computing Selection (attr op value)
• Unclustered B+ tree index on attr (for “= ” or range search):– Locate first index entry corresponding to a row in which (attr
= value). CostCost = depth of tree
– Index entries with pointers to rows satisfying condition are packed in sequence in successive index pages• Scan entries and sort record Ids to identify table data pages
with qualifying rowsAny page that has at least one such row must be fetched once.
•• CostCost: number of rows that satisfy selection condition
© Dennis Shasha, Philippe Bonnet - 200124
Unclustered B+ Tree Indexindex entries (containingrow Ids) that satisfycondition
data page
Data file
B+ Tree
© Dennis Shasha, Philippe Bonnet - 200125
Computing Selection (attr = value)
• Hash index on attr (for “= ” search only):– Hash on value. CostCost 1.2
• 1.2 – typical average cost of hashing (> 1 due to possible overflow chains)
• Finds the (unique) bucket containing all index entries satisfying selection condition
• Clustered index – all qualifying rows packed in the bucket (a few pages)CostCost: number of pages occupies by the bucket
• Unclustered index – sort row Ids in the index entries to identify data pages with qualifying rowsEach page containing at least one such row must be fetched onceCostCost: min(number of qualifying rows in bucket, number of pages in file)
© Dennis Shasha, Philippe Bonnet - 200126
Computing Joins• The cost of joining two relations makes the
choice of a join algorithm crucial•• SimpleSimple blockblock--nested loopsnested loops join algorithm
for computing r A=B s
foreach page pr in r doforeach page ps in s do
output pr A=B ps
© Dennis Shasha, Philippe Bonnet - 200127
Block-Nested Loop Illustrated
Output buffer
s
r
Input buffer for s
Input buffer for r
… and so on
r s
© Dennis Shasha, Philippe Bonnet - 200128
Index-Nested Loop Join r A=B s• Use an index on s with search key B (instead of
scanning s) to find rows of s that match tr–– Cost Cost = r + r + cost of outputting final result
– Effective if number of rows of s that match tuples in r is small (i.e., is small) and index is clustered
foreach tuple tr in r do {use index to find all tuples ts in s satisfying tr.A=ts.B;output (tr, ts)
}
Number of rows in r
avg cost of retrieving all rows in s that match tr
© Dennis Shasha, Philippe Bonnet - 200129
Sort-Merge Join r A=B ssort r on A;sort s on B;while !eof(r) and !eof(s) do {
Scan r and s concurrently until tr.A=ts.B=c;Output A=c(r)B=c (s)
}
r
s
B=c (s)
A=c(r)
© Dennis Shasha, Philippe Bonnet - 200130
Hash-Join r A=B s
• Step 1: Hash r on A and s on B into the same set of buckets
• Step 2: Since matching tuples must be in same bucket, read each bucket in turn and output the result of the join
•• CostCost:: 3 (r + s ) + cost of output of final result– assuming each bucket fits in memory
© Dennis Shasha, Philippe Bonnet - 2001
Agenda
• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation
© Dennis Shasha, Philippe Bonnet - 200132
Query Optimizer
• Uses heuristic algorithms to evaluate relational algebra expressions. This involves:– estimating the cost of a relational algebra expression– transforming one relational algebra expression to an
equivalent one– choosing access paths for evaluating the subexpressions
• Q uery optim izers do not “optim ize” – just try to find “reasonably good” evaluation strategies
© Dennis Shasha, Philippe Bonnet - 200133
Equivalence Preserving Transformations
• To transform a relational expression into another equivalent expression we need transformation rules that preserve equivalence
• Each transformation rule– Is provably correct (ie, does preserve equivalence)– Has a heuristic associated with it
© Dennis Shasha, Philippe Bonnet - 200134
Selection and Projection Rules
• Break complex selection into simpler ones:– Cond1Cond2 (R) Cond1 (Cond2 (R) )
• Break projection into stages:– attr (R) attr ( attr(R)), if attr attr
• Commute projection and selection:– attr (Cond(R)) Cond ( attr (R)),
if attr all attributes in Cond
© Dennis Shasha, Philippe Bonnet - 200135
Commutativity and Associativity of Join(and Cartesian Product as Special Case)
• Join commutativity: R S S R– used to reduce cost of nested loop evaluation strategies (smaller relation
should be in outer loop)
• Join associativity: R (S T) (R S) T– used to reduce the size of intermediate relations in computation of multi-
relational join – first compute the join that yields smaller intermediate result
• N-way join has T(N) N! different evaluation plans– T(N) is the number of parenthesized expressions– N! is the number of permutations
• Query optimizer cannot look at all plans (might take longer to find an optimal plan than to compute query brute-force). Hence it does not necessarily produce optimal plan
© Dennis Shasha, Philippe Bonnet - 200136
Pushing Selections and Projections
• Cond (R S) R Cond S – Cond relates attributes of both R and S– Reduces size of intermediate relation since rows can be
discarded sooner
• Cond (R S) Cond (R) S – Cond involves only the attributes of R– Reduces size of intermediate relation since rows of R are
discarded sooner
• attr(R S) attr(attr(R) S),if attributes(R) attr attr ∩ attributes(R)– reduces the size of an operand of product
© Dennis Shasha, Philippe Bonnet - 200137
Equivalence Example
• C1 C2 C3 (RR SS) C1 (C2 (C3 (RR SS) ) ) C1 (C2 (RR) C3 (SS) ) C2 (RR) C1C3 (SS)
assuming C2 involves only attributes of RR, C3 involves only attributes of SS, and C1 relates attributes of R and SS
© Dennis Shasha, Philippe Bonnet - 200138
SELECT P.NameFROM ProfessorProfessor P, TeachingTeaching TWHERE P.Id = T.ProfId -- join condition
AND P. DeptId = „C S ‟ AND T.Semester = „F 1994‟
Name(D eptId= ‘C S’ Sem ester= ‘F 1994’(ProfessorProfessor Id=ProfId TeachingTeaching))
Cost - Example 1
Name
D eptId= ‘C S’ Sem ester= ‘F 1994’
Id=ProfId
ProfessorProfessor TeachingTeaching
Master query execution plan (nothing pushed)
© Dennis Shasha, Philippe Bonnet - 200139
Metadata on Tables (in system catalogue)
–– ProfessorProfessor (Id, Name, DeptId)• size: 200 pages, 1000 rows, 50 departments• indexes: clustered, 2-level B+tree on DeptId, hash on Id
–– TeachingTeaching (ProfId, CrsCode, Semester)• size: 1000 pages, 10,000 rows, 4 semesters• indexes: clustered, 2-level B+tree on Semester;
hash on ProfId– Definition: WeightWeight of an attribute – average number
of rows that have a particular value• weight of Id = 1 (it is a key)• weight of ProfId = 10 (10,000 classes/1000 professors)
© Dennis Shasha, Philippe Bonnet - 200140
Estimating Cost - Example 1• Join - block-nested loops with 52 page buffer (50
pages – input for Professor, Professor, 1 page – input for TeachingTeaching, 1 – output page– Scanning ProfessorProfessor (outer loop): 200 page transfers,
(4 iterations, 50 transfers each)– Finding matching rows in TeachingTeaching (inner loop):
1000 page transfers for each iteration of outer loop– Total cost = 200+4*1000 = 4200 page transfers
© Dennis Shasha, Philippe Bonnet - 200141
Estimating Cost - E xam ple 1 (cont‟d)
• Selection and projection – scan rows of interm ediate file, discard those that don‟t satisfy selection, project on those that do, write result when output buffer is full.
• Complete algorithm:– do join, write result to intermediate file on disk– read intermediate file, do select/project, write final
result– Problem: unnecessary I/O
© Dennis Shasha, Philippe Bonnet - 200142
Pipelining• Solution: use pipeliningpipelining:
– join and select/project act as coroutines, operate as producer/consumer sharing a buffer in main memory.• When join fills buffer; select/project filters it and outputs
result• Process is repeated until select/project has processed last
output from join – Performing select/project adds no additional cost
joinjoin selectselect/projectprojectIntermediateresult
outputfinal result
buffer
© Dennis Shasha, Philippe Bonnet - 200143
Estimating Cost - E xam ple 1 (cont‟d)
• Total cost:4200 + (cost of outputting final result)
– We will disregard the cost of outputting final result in comparing with other query evaluation strategies, since this will be same for all
© Dennis Shasha, Philippe Bonnet - 200144
Name(Sem ester= ‘F 1994’ (D eptId= ‘C S’ (ProfessorProfessor) Id=ProfId TeachingTeaching))
Cost Example 2SELECT P.NameFROM ProfessorProfessor P, TeachingTeaching TWHERE P.Id = T.ProfId AND
P. DeptId = „C S ‟ AND T.Semester = „F 1994‟
Name
Sem ester= ‘F 1994’
DeptId= „C S ‟
ProfessorProfessor TeachingTeaching
Id=ProfId
Partially pushed plan: selection pushed to ProfessorProfessor
© Dennis Shasha, Philippe Bonnet - 200145
Cost Example 2 -- selection• Compute DeptId= „C S ‟ (ProfessorProfessor) (to reduce size of one
join table) using clustered, 2-level B+ tree on DeptId.– 50 departments and 1000 professors; hence weight
of DeptId is 20 (roughly 20 CS professors). These rows are in ~4 consecutive pages in ProfessorProfessor.• Cost = 4 (to get rows) + 2 (to search index) = 6• keep resulting 4 pages in memory and pipe to next step
clustered indexon DeptId rows of
ProfessorProfessor
© Dennis Shasha, Philippe Bonnet - 200146
Cost Example 2 -- join
• Index-nested loops join using hash index on ProfId of TeachingTeaching and looping on the selected professors (computed on previous slide)– Since selection on Semester was not pushed, hash
index on ProfId of TeachingTeaching can be used – Note: if selection on Semester were pushed, the
index on ProfId would have been lost – an advantage of not using a fully pushed query execution plan
© Dennis Shasha, Philippe Bonnet - 200147
Cost Example 2 – join (cont‟d)– Each professor matches ~10 TeachingTeaching rows. Since 20 CS
professors, hence 200 teaching records.– All index entries for a particular ProfId are in same bucket.
Assume ~1.2 I/Os to get a bucket.• Cost = 1.2 20 (to fetch index entries for 20 CS
professors) + 200 (to fetch TeachingTeaching rows, since hash index is unclustered) = 224
TeachingTeachinghash
1.2 20 10
ProfId
© Dennis Shasha, Philippe Bonnet - 200148
Cost Example 2 – select/project
• Pipe result of join to select (on Semester) and project (on Name) at no I/O cost
• Cost of output same as for Example 1• Total cost:
6 (select on ProfessorProfessor) + 224 (join) = 230• Comparison:
4200 (example 1) vs. 230 (example 2) !!!
© Dennis Shasha, Philippe Bonnet - 200149
Choosing Query Execution Plan
• Step 1: Choose a logical plan• Step 2: Reduce search space• Step 3: Use a heuristic search to further
reduce complexity
© Dennis Shasha, Philippe Bonnet - 200150
Step 1: Choosing a Logical Plan
• Involves choosing a query tree, which indicates the order in which algebraic operations are applied
• Heuristic: P ushed trees are good, but som etim es “nearly fully pushed” trees are better due to indexing (as w e saw in the example)
• So: T ake the initial “m aster plan” tree and produce a fully pushed tree plus several nearly fully pushed trees.
© Dennis Shasha, Philippe Bonnet - 200151
Step 2: Reduce Search Space
• Deal with associativity of binary operators (join, union, … )
A B C DA B C D
A B C DA B C D
DD
CC
A BA BLogical queryexecution plan
Equivalentquery tree Equivalent left left
deep query treedeep query tree
© Dennis Shasha, Philippe Bonnet - 200152
S tep 2 (cont‟d)
• Two issues:– Choose a particular shape of a tree (like in the
previous slide)• Equals the number of ways to parenthesize N-way
join – grows very rapidly– Choose a particular permutation of the leaves
• E.g., 4! permutations of the leaves A, B, C, DA, B, C, D
© Dennis Shasha, Philippe Bonnet - 200153
Step 2: Dealing With Associativity• Too many trees to evaluate: settle on a particular
shape: leftleft--deep tree.deep tree.– Used because it allows pipeliningpipelining:
– Property: once a row of XX has been output by P1, it need not be output again (but C may have to be processed several times in P2 for successive portions of XX)
– Advantage: none of the intermediate relations (X, YX, Y) have to be completely materialized and saved on disk.• Important if one such relation is very large, but the final result is
small
A BA B X CX C Y DY DXX YY
P1 P2 P3
© Dennis Shasha, Philippe Bonnet - 200154
Step 2: Dealing with Associativity
• consider the alternative: if we use the association ((A BA B) (C DC D))
A BA B
C DC D
X YX Y
XX
YY
Each row of XX mustbe processed againstall of YY. Hence all ofYY (can be very large)must be stored in P3, or P2 has torecompute it several times.
P1
P2P3
© Dennis Shasha, Philippe Bonnet - 200155
Step 3: Heuristic Search
• The choice of left-deep trees still leaves open too many options (N! permutations):– (((A BA B) CC) DD), – (((C AC A) DD) BB), … ..
• A heuristic (often dynamic programming based) algorithm is used to get a „good‟ plan
© Dennis Shasha, Philippe Bonnet - 200156
Step 3: Dynamic Programming Algorithm
• To compute a join of E1, E2, … , E N in a left-deep manner:– Start with 1-relation expressions (can involve , )– C hoose the best and “nearly best” plans for each (a plan is
considered nearly best if its out put has som e “interesting” form, e.g., is sorted)
– Combine these 1-relation plans into 2-relation expressions. Retain only the best and nearly best 2-relation plans
– Do same for 3-relation expressions, etc.
© Dennis Shasha, Philippe Bonnet - 2001
Performance Impact of the Query Rewritings
>10000
-10
0
10
20
30
40
50
60
70
80
Thro
ughp
ut ra
tio
SQLServer 2000Oracle 8iDB2 V7.1
Running Example100000 Employees100000 Students10 tech
© Dennis Shasha, Philippe Bonnet - 2001
Agenda
• Query Representation• Access Path• Physical Operators• Query Optimization• Back to Query Representation
© Dennis Shasha, Philippe Bonnet - 2001
SubqueriesTraditional optimization:• Block / Block
optimization• Nested loop across
blocks
select c custkey
from customer,
(select o custkey from orders
group by c custkey
having 1000000 < sum(o totalprice))
as AggResult
where o custkey = c custkey
© Dennis Shasha, Philippe Bonnet - 2001
Subqueries
• Using an algebraic notation to represent subqueries
• Apply is a higher order operator that takes as input a relation and a parametrized expression and outputs a relation.
© Dennis Shasha, Philippe Bonnet - 2001
Subqueries
• The query tree can be rewritten to eliminate the apply operator
1000000<X(customer A. G1 X=sum(o price) o custkey=c custkeyorders)
= 1000000<X Gc custkey;X=sum(o price)(customer ALOJ o custkey=c custkeyorders);
= 1000000<X Gc custkey;X=sum(o price)(customer LOJ o custkey=c custkeyorders);
= 1000000<X Gc custkey;X=sum(o price)(customer JOIN o custkey=c custkey orders);
Recommended