Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans
Kabra and DeWitt
Presented by Zack Ives, CSE 590DB, May 11, 1998
The Problem
Query execution plans are formulated based on estimated cost of operations, which in turn depend on estimates of table size and cardinality
These estimates may be highly inaccurate, especially for user-defined types or predicates
The errors become multiplicative as the number of joins increases
With more accurate estimates, we might have chosen a plan much closer to optimal
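For illustration (numbers assumed): if each join's selectivity estimate is off by a factor of 2, then after five joins the output-cardinality estimate can be off by as much as 2^5 = 32x, easily enough to make the chosen plan far from optimal.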
The Solution
We shall monitor how the query is doing at key points, and consider dynamically re-optimizing those portions of the query which have not yet been started
Since re-optimization is expensive, we shall only do it if we think we will see an improvement
Elements of the Algorithm
Annotated Query Execution Plans: annotate the plan with estimates of sizes
Runtime Collection of Statistics: statistics collectors embedded in the execution tree; keep overhead down
Dynamic Resource Re-allocation: reallocate memory to individual operations
Query Plan Modification: may wish to re-optimize the remainder of the query
Annotated Query Plans
At each point in the tree we save the expected:
  Sizes and cardinalities
  Selectivities of predicates
  Estimates of the number of groups to be aggregated
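A minimal sketch of such an annotated plan node; the field and operator names are illustrative, not Paradise's actual structures:

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    """A plan-tree node carrying the optimizer's estimates, so they can
    later be compared against statistics observed at run time."""
    op: str                            # e.g. "scan", "hash-join", "aggregate"
    children: list = field(default_factory=list)
    est_cardinality: float = 0.0       # expected number of output tuples
    est_size_bytes: float = 0.0        # expected output size
    est_selectivity: float = 1.0      # expected predicate selectivity
    est_num_groups: float = 0.0       # expected groups, for aggregates
```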
Statistics Collectors
Added into the tree; must be collectable in a single pass
Will only help with portions of the query “beyond” the current pipeline
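A sketch of the idea, assuming a pull-based iterator model; the stat_collector generator and the stats dict are hypothetical names, not the paper's code:

```python
def stat_collector(child_tuples, stats, tuple_size):
    """Wrap a child operator's tuple stream, counting tuples and bytes
    as they flow by; the only overhead is a few increments per tuple."""
    count = 0
    size = 0
    for t in child_tuples:
        count += 1
        size += tuple_size(t)
        yield t                        # pass the tuple through unchanged
    # Totals are only final once the stream is drained.
    stats["cardinality"] = count
    stats["size_bytes"] = size
```

Because the totals become available only when the wrapped stream is exhausted, the improved estimates can only benefit operators beyond the current pipeline, as the slide notes.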
Resource Re-Allocation
Based on improved estimates, we can modify the memory allocated to each operation
Results: less I/O, better performance
Only for operations that have not yet begun executing
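A toy sketch of the re-allocation step, assuming each pending (not-yet-started) operator carries an improved size estimate and a memory_budget field; both attributes are illustrative:

```python
def reallocate_memory(pending_ops, total_budget):
    """Divide the remaining memory budget among operators that have not
    yet started, in proportion to their improved input-size estimates."""
    total = sum(op.est_size_bytes for op in pending_ops) or 1.0
    for op in pending_ops:
        op.memory_budget = total_budget * op.est_size_bytes / total
```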
Plan Modification
Only re-optimize part not begun
Suspend query, save intermediate in temp file
Create new plan for remainder, treating temp as an input
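A sketch under assumed interfaces (temp_store and optimizer are hypothetical stand-ins, not Paradise's API):

```python
def modify_plan(completed_part, remainder, optimizer, temp_store):
    """Suspend the query, materialize the completed subtree to a temp
    file, and plan the remainder with that file as a base input."""
    temp = temp_store.write(completed_part.results())   # save intermediate
    # The temp file's size and cardinality are now known exactly, so the
    # optimizer plans the remainder from accurate input statistics.
    return optimizer.optimize(remainder, base_input=temp)
```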
Re-Optimization
When to re-optimize:
  Calculate the time the current plan should take (using the gathered statistics)
  Only consider re-optimization if:
    Our original estimate was off by at least some factor ε₂, and
    T_opt,estimated < ε₁ · T_cur-plan,improved, where ε₁ ≈ 5% and the cost of optimization depends on the number of operators, especially joins
Only modify the plan if the new estimate, including the cost of writing the temp file, is better
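A minimal sketch of this decision test; ε₁ and ε₂ are treated as tunable thresholds with assumed default values, and the cost model behind the time estimates is not shown:

```python
def should_reoptimize(est_card, observed_card,
                      t_reopt, t_remaining_improved,
                      eps2=2.0, eps1=0.05):
    """Re-optimize only if the estimate was badly wrong AND the
    optimizer itself is cheap relative to the remaining work."""
    lo, hi = sorted((est_card, observed_card))
    badly_off = hi >= eps2 * lo          # off by at least a factor of eps2
    cheap_enough = t_reopt < eps1 * t_remaining_improved
    return badly_off and cheap_enough
```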
Low-Overhead Statistics
Want to find the “most effective” statistics
Don’t want to gather statistics for “simple” queries
Want to limit the effect of the algorithm to a maximum overhead ratio
Factors:
  Probability of inaccuracy
  Fraction of query affected
Inaccuracy Potentials
The following heuristics are used:
  Inaccuracy potential = low, medium, or high
  Lower if we have more information on the table’s value distribution
  1 + max of the inputs for a multiple-input selection
  Always high for user-defined methods
  Always high for non-equijoins
  For most other operators, same as the worst of the inputs
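These rules might look roughly like the sketch below; the node shape, operator names, and have_distribution_stats predicate are assumptions, not the paper's code:

```python
LOW, MEDIUM, HIGH = 1, 2, 3

def inaccuracy_potential(node, have_distribution_stats):
    """Assign low/medium/high inaccuracy potential per the heuristics."""
    # Always high for user-defined methods and non-equijoins.
    if node.op in ("user-defined-method", "non-equijoin"):
        return HIGH
    # Base tables: lower if we know more about the value distribution.
    if node.op == "scan":
        return LOW if have_distribution_stats(node) else MEDIUM
    child = [inaccuracy_potential(c, have_distribution_stats)
             for c in node.children]
    if not child:
        return MEDIUM                  # assumed default for other leaves
    # Multiple-input selection: 1 + max of the inputs, capped at HIGH.
    if node.op == "select" and len(node.children) > 1:
        return min(HIGH, 1 + max(child))
    # Most other operators: same as the worst of the inputs.
    return max(child)
```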
More Heuristics
Check the fraction of the query affected
Check how many other operators use the same statistic
The winner:
  Higher inaccuracy potentials first
  Then, if a tie, the one affecting the larger portion of the plan
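Putting the two heuristics together, collector selection might be sketched as follows; the candidate objects, their attributes, and the overhead budget are hypothetical:

```python
def choose_statistics(candidates, overhead_budget):
    """Rank candidate statistics by inaccuracy potential, breaking ties
    by the fraction of the plan affected, and insert collectors until
    the maximum overhead budget is exhausted."""
    ranked = sorted(candidates,
                    key=lambda s: (s.inaccuracy_potential,
                                   s.fraction_affected),
                    reverse=True)
    chosen, spent = [], 0.0
    for s in ranked:
        if spent + s.overhead <= overhead_budget:
            chosen.append(s)
            spent += s.overhead
    return chosen
```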
Implementation
On top of Paradise (parallel database that supports ADTs, built on OO framework)
Uses a System-R-style optimizer
New SCIA (Stat Collector Insertion Algorithm) and Dynamic Re-Optimization modules
It Works!
Results are 5% worse for simple queries, much better for complex queries
  Of course, we would not really collect statistics on simple queries
Data skew made a slight difference: both normal and re-optimized queries performed slightly better