14
Efficient Mid-Query Re- Optimization of Sub-Optimal Query Execution Plans Kabra and DeWitt presented by Zack Ives CSE 590DB, May 11, 1998

Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

  • Upload
    adanne

  • View
    33

  • Download
    2

Embed Size (px)

DESCRIPTION

Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. Kabra and DeWitt presented by Zack Ives CSE 590DB, May 11, 1998. The Problem. Query execution plans are formulated based on estimated cost of operations, which in turn depend on estimates of table size and cardinality - PowerPoint PPT Presentation

Citation preview

Page 1: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Kabra and DeWitt

presented by Zack IvesCSE 590DB, May 11, 1998

Page 2: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

The Problem

Query execution plans are formulated based on estimated cost of operations, which in turn depend on estimates of table size and cardinality

These estimates may be highly inaccurate, especially for user-defined types or predicates

The errors become multiplicative as the number of joins increases

We might have chosen a more nearly optimal plan based on greater knowledge

Page 3: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

The Solution

We shall monitor how the query is doing at key points, and consider dynamically re-optimizing those portions of the query which have not yet been started

Since re-optimization is expensive, we shall only do it if we think we will see an improvement

Page 4: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Elements of the Algorithm

Annotated Query Execution Plans Annotate plan with estimates of size

Runtime Collection of Statistics Statistics collectors embedded in execution tree Keep overhead down

Dynamic Resource Re-allocation Reallocate memory to individual operations

Query Plan Modification May wish to re-optimize the remainder of query

Page 5: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Annotated Query Plans

We save at each point in the tree the expected: Sizes and cardinalities Selectivities of predicates Estimates of number of groups to be

aggregated

Page 6: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Statistics Collectors

Add into tree Must be

collectable in a single pass

Will only help with portions of query “beyond” the current pipeline

Page 7: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Resource Re-Allocation

Based on improved estimates, we can modify the memory allocated to each operation

Results: less I/O, better performance

Only for operations that have not yet begun executing

Page 8: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Plan Modification

Only re-optimize part not begun

Suspend query, save intermediate in temp file

Create new plan for remainder, treating temp as an input

Page 9: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Re-Optimization

When to re-optimize: Calculate time current should take (using gathered

stats) Only consider re-optimization if:

Our original estimate was off by at least some factor 2 and if

Topt, estimated < 1Tcur-plan,improved where 1 5% and cost of optimization depends on number of operators, esp. joins

Only modify the plan if the new estimate, including the cost of writing the temp file, is better

Page 10: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Low-Overhead Statistics

Want to find “most effective” statistics Don’t want to gather statistics for “simple”

queries Want to limit effect of algorithm to maximum

overhead ratio, Factors:

Probability of inaccuracyFraction of query affected

Page 11: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Inaccuracy Potentials

The following heuristics are used: Inaccuracy potential = low, medium, high Lower if we have more information on table

value distribution 1+max of inputs for multiple-input selection Always high for user-defined methods Always high for non-equijoins For most other operators, same as worst of

inputs

Page 12: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

More Heuristics

Check fraction of query affected Check how many other

operators use the same statistic

The winner: Higher inaccuracy

potentials first Then, if a tie, the one

affecting the larger portion of the plan

Page 13: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

Implementation

On top of Paradise (parallel database that supports ADTs, built on OO framework)

Using System-R optimizerNew SCIA (Stat Collector Insertion Algorithm)

and Dynamic Re-Optimization modules

Page 14: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans

It Works!

Results are 5% worse for simple queries, much better for complex queries Of course, we would not really collect statistics on

simple queries Data skew made a slight difference - both normal

and re-optimized queries performed slightly better