Proactive Query Re-optimization


by Çetin Meriçli

12.10.2005

Overview

Query Processing

Query Optimization
– Idea
– Problems

Solutions to problems in query optimization
– Reactive re-optimization
– Proactive re-optimization

RIO Implementation Details

Query Processing

A SQL statement is subjected to four phases of processing
– Parsing
– Optimization
– Code Generation
– Execution

Query Optimization

The same result set for a query can be obtained in more than one way.

Depending on the query, different execution plans may have different costs.

Query optimizers try to find an execution plan with the lowest cost for a given query, based on statistical estimates about the data.

Query Optimization (cont’d)

Traditional optimization follows a plan-first-execute-next approach

This approach enumerates all execution plans, computes the cost of each plan, and picks the plan with the lowest cost

Performance depends heavily on the accuracy of the estimated statistics used to compute costs
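The plan-first-execute-next approach can be illustrated with a minimal sketch. The plan names, cost functions, and statistics below are hypothetical and chosen only to show how a wrong estimate changes which plan looks cheapest; they are not taken from any real optimizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Plan:
    name: str
    cost_fn: Callable[[Dict[str, float]], float]  # statistics -> estimated cost


def pick_cheapest(plans: List[Plan], stats: Dict[str, float]) -> Plan:
    """Cost every candidate plan with the given statistics and keep the cheapest."""
    return min(plans, key=lambda p: p.cost_fn(stats))


# Two hypothetical plans with toy cost formulas.
plans = [
    Plan("scan_then_filter", lambda s: s["table_mb"]),
    Plan("index_lookup", lambda s: 10 + 4 * s["selected_mb"]),
]

estimated = {"table_mb": 500.0, "selected_mb": 50.0}   # what the optimizer believes
actual = {"table_mb": 500.0, "selected_mb": 200.0}     # what is true at run time

chosen = pick_cheapest(plans, estimated)  # plan fixed before execution starts
best = pick_cheapest(plans, actual)       # plan that would have been cheapest
```

With these toy numbers the plan chosen from the estimates differs from the plan that is cheapest under the actual statistics, which is the failure mode the following slides address.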


Example:

select * from R, S
where R.a = S.a and
R.b > K1 and
R.c > K2


Assume that
– DB buffer cache size is 200 MB
– |R| = 500 MB
– |S| = 160 MB
– |σ(R)| = 300 MB

Due to skew and correlations in the data, the optimizer estimates |σ(R)| to be 150 MB


Two parts of the query
– S
– σ(R) (result of the selection on R)

[Figure: two candidate plans, P1a and P1b; each is a HashJoin over σ(R) and S]


Since |σ(R)| is underestimated, P1a is selected as the optimal plan. P1b should have been selected, because the estimate is wrong and P1a becomes more costly for greater values of |σ(R)|.
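The effect can be reproduced with a toy cost model. Everything in the sketch below is an assumption made for illustration: that P1a builds its hash table on σ(R) and P1b builds on S, that building is charged twice the cost of probing, and that a build input larger than the 200 MB buffer forces both inputs to be written out and read back once. The slides do not give a cost formula.

```python
BUFFER_MB = 200  # DB buffer cache size from the example
S_MB = 160       # |S| from the example


def hash_join_cost(build_mb: float, probe_mb: float, buffer_mb: float = BUFFER_MB) -> float:
    """Toy cost in MB processed; hypothetical weights, not a real optimizer formula."""
    cost = 2 * build_mb + probe_mb          # build side weighted double
    if build_mb > buffer_mb:                # build input does not fit in memory
        cost += 2 * (build_mb + probe_mb)   # write and re-read partitions of both inputs
    return cost


def cost_p1a(sel_r_mb: float) -> float:
    """Assumed P1a: build on sigma(R), probe with S."""
    return hash_join_cost(build_mb=sel_r_mb, probe_mb=S_MB)


def cost_p1b(sel_r_mb: float) -> float:
    """Assumed P1b: build on S, probe with sigma(R)."""
    return hash_join_cost(build_mb=S_MB, probe_mb=sel_r_mb)


for size in (150, 300):  # estimated and actual |sigma(R)|
    print(size, cost_p1a(size), cost_p1b(size))
```

Under these assumptions P1a is slightly cheaper at the estimated 150 MB, but at the actual 300 MB its build input no longer fits in the buffer and P1b is far cheaper.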

Reactive Optimization

Reactive optimizers work in the following way
– Use a traditional optimizer to find the best plan.
– Use check operators to detect sub-optimality during execution.
– Trigger re-optimization, if required.
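A check operator can be sketched as a pass-through iterator that counts tuples and signals when the observed count leaves a range in which the current plan is assumed to remain a good choice. The `Reoptimize` exception, the `check` function, and the `[low, high]` range are hypothetical names for this sketch; the slides only state that check operators detect sub-optimality.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class Reoptimize(Exception):
    """Raised when the observed cardinality invalidates the current plan."""

    def __init__(self, observed: int) -> None:
        super().__init__(f"observed {observed} tuples")
        self.observed = observed


def check(rows: Iterable[T], low: int, high: int) -> Iterator[T]:
    """Pass rows through unchanged; raise if the tuple count leaves [low, high]."""
    count = 0
    for row in rows:
        count += 1
        if count > high:        # too many tuples: stop as soon as this is known
            raise Reoptimize(count)
        yield row
    if count < low:             # too few tuples: only known once input is exhausted
        raise Reoptimize(count)


try:
    consumed = list(check(range(20), low=1, high=10))
except Reoptimize as signal:
    print("re-optimize after", signal.observed, "tuples")
```

Note that any tuples already passed downstream before the exception are work the pipeline may have to discard, which is one of the problems listed on the next slide.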

Problems with Reactive Re-optimization

The optimizer may pick plans whose performance depends heavily on uncertain statistics, making re-optimization very likely

The partial work done in a pipelined plan is lost when re-optimization is triggered and the plan is changed

The ability to collect statistics both quickly and accurately during execution is limited

So, when re-optimization is triggered, the optimizer may make new mistakes, potentially leading to thrashing

Proactive Re-optimization

A novel approach

Uses bounding boxes instead of single-point estimates to represent uncertainty

Bounding boxes are used during optimization to generate robust and switchable plans, minimizing the need for re-optimization (and hence the loss of pipelined work)

Random-sample processing is merged with query execution to collect statistics quickly and accurately

Proactive Re-optimization

[Figure: proactive re-optimization flow. Optimization, given the query and the catalog: (1) compute bounding boxes for estimates; (2) use bounding boxes to pick robust or switchable plans. Execution: (3) execute the query and collect accurate run-time estimates. If an estimate is within its bounding box, use the robust or switchable plan; if not, re-optimize.]

Representing Uncertainty

Most current optimizers use single-point estimates of the statistics needed to cost plans

Using intervals instead of single points allows the optimizer to handle uncertainty about the estimates

As confidence in an estimate increases, its bounding box gets narrower

Representing Uncertainty

[Figure: bounding box for the example. |σ(R)| (in MB): potential min. 75, estimated 150, potential max. 300. |S| (in MB): potential min. 144, estimated 160, potential max. 192.]

Using Bounding-boxes During Optimization

There is always one optimal plan for a single-point estimate

For a bounding box B, the following cases can occur:
– Single optimal plan: A single plan is optimal at all points within B
– Single robust plan: There is a single plan whose cost is very close to the optimal at all points in B
– A switchable plan: Explained in the next slide
– None of the above: Different plans are optimal at different points in B, but no switchable plan is available

Switchable Plans

A switchable plan in B is a set S of plans with the following properties
– At each point pt in B, there is a plan p in S whose cost at pt is close to that of the optimal plan at pt
– The decision of which plan in S will be executed can be deferred until accurate estimates of uncertain statistics are available
– If the actual statistics lie within B, an appropriate plan from S can be picked and run without losing any significant fraction of the execution work done so far

RIO Implementation Details

Computing Bounding-boxes

Optimizing with Bounding-boxes
– Generating the Seed Plans
– Generating the Switchable Plan

Extensions to Query Execution Engine

Experiments

Computing Bounding-boxes

RIO restricts the computation of bounding boxes to size and selectivity estimates

For each such estimate E, a bounding box B is computed using the following process
– An uncertainty bucket U is assigned to E
– The bounding box is computed from the (E, U) pair

U takes an integer value in [0, 6], from 0 (no uncertainty) to 6 (very high uncertainty), assigned according to how E was obtained (for example, whether an accurate value of E exists in the catalog)
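The slides do not state how the box is derived from the (E, U) pair. The sketch below assumes a simple linear rule in which each uncertainty level shrinks the lower bound by 10% of E and raises the upper bound by 20% of E. Both factors, and the uncertainty values 5 and 1 used in the usage lines, are assumptions picked only because they reproduce the example intervals shown earlier (75 to 300 around an estimate of 150, and 144 to 192 around 160).

```python
from typing import Tuple


def bounding_box(estimate: float, uncertainty: int,
                 shrink: float = 0.1, grow: float = 0.2) -> Tuple[float, float]:
    """Interval around a size or selectivity estimate.

    `shrink` and `grow` are hypothetical per-level factors, not values
    taken from the slides.
    """
    if not 0 <= uncertainty <= 6:
        raise ValueError("uncertainty bucket must be an integer in [0, 6]")
    low = max(0.0, estimate * (1 - shrink * uncertainty))
    high = estimate * (1 + grow * uncertainty)
    return (low, high)


box_sel_r = bounding_box(150, uncertainty=5)  # wide box: estimate is very uncertain
box_s = bounding_box(160, uncertainty=1)      # narrow box: estimate is nearly certain
box_exact = bounding_box(160, uncertainty=0)  # U = 0 collapses the box to a point
```

With U = 0 the interval degenerates to the single-point estimate, so the traditional optimizer is the special case in which every estimate is treated as certain.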

Optimizing with Bounding-boxes

RIO computes bounding boxes for all input sizes used to cost plans

Then it tries to compute a switchable plan for each distinct (JS, IO) pair (JS: Join Subset, IO: Interesting Orders)

If RIO fails to find a switchable plan, it picks the optimal plan based on single-point estimates

Computing switchable plans

RIO computes switchable plans in two steps
– First, it finds three seed plans for each (JS, IO) pair
– Then, it creates the switchable plan from the seed plans

Generating seed plans

In RIO, each plan enumeration considers three different costs
– C_LOW
– C_EST
– C_HIGH

C_EST is the cost at the traditional single-point estimate

C_LOW and C_HIGH are the costs at the lower-left and upper-right corners of the bounding box

For each (JS, IO) pair, we end up with three seed plans
– BestPlanLow: plan with minimum cost C_LOW
– BestPlanEst: plan with minimum cost C_EST
– BestPlanHigh: plan with minimum cost C_HIGH
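Seed selection can be sketched as costing every candidate at three points and keeping the cheapest plan at each. For brevity the sketch uses a one-dimensional bounding box over |σ(R)|, and the two cost functions are hypothetical toy formulas, not RIO's cost model.

```python
from typing import Callable, Dict

CostFn = Callable[[float], float]  # input size in MB -> cost


def seed_plans(plans: Dict[str, CostFn], low: float, est: float, high: float) -> Dict[str, str]:
    """Return the cheapest plan name at the low, estimated, and high points."""
    def cheapest(size: float) -> str:
        return min(plans, key=lambda name: plans[name](size))

    return {
        "BestPlanLow": cheapest(low),
        "BestPlanEst": cheapest(est),
        "BestPlanHigh": cheapest(high),
    }


# Hypothetical cost functions: P1a spills once its input exceeds 200 MB.
plans: Dict[str, CostFn] = {
    "P1a": lambda x: 2 * x + 160 + (2 * (x + 160) if x > 200 else 0),
    "P1b": lambda x: 320 + x,
}

seeds = seed_plans(plans, low=75, est=150, high=300)
print(seeds)
```

In this toy setting the seeds are not all the same plan, so a single-point optimizer and a bounding-box optimizer would disagree about what to run.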

Generating the Switchable Plan

Given the seeds BestPlanLow, BestPlanEst and BestPlanHigh, one of the following cases arises
– C1 : The seeds are all the same plan
– C2 : They are not the same, but one is a robust plan
– C3 : The seeds are not all the same and none is a robust plan, but a switchable plan can be created from the seeds
– C4 : No single optimal plan, single robust plan, or switchable plan can be found

Generating the Switchable Plan (cont’d)

In C1, the single optimal plan is the switchable plan

In C2, RIO finds the robust plan among the seeds and uses it as a singleton switchable plan

In C3, RIO tries to find a switchable plan (next slide)

In C4, RIO picks BestPlanEst as the optimal plan
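The four cases can be sketched as a small decision function. The robustness test used here (cost within a fixed tolerance of the best seed at the low, estimated, and high points) and the `can_switch` callback are assumptions for illustration; the slides say only that a robust plan's cost is "very close" to optimal throughout the box.

```python
from typing import Callable, Dict, List, Sequence, Tuple

CostFn = Callable[[float], float]


def is_robust(plan: str, seeds: Sequence[str], costs: Dict[str, CostFn],
              points: Sequence[float], tolerance: float) -> bool:
    """True if `plan` costs at most (1 + tolerance) times the best seed at every point."""
    return all(
        costs[plan](pt) <= (1 + tolerance) * min(costs[s](pt) for s in seeds)
        for pt in points
    )


def classify(seeds: Tuple[str, str, str], costs: Dict[str, CostFn],
             points: Sequence[float], can_switch: Callable[[List[str]], bool],
             tolerance: float = 0.2) -> Tuple[str, List[str]]:
    """Map (BestPlanLow, BestPlanEst, BestPlanHigh) to a case and the plans to use."""
    distinct = list(dict.fromkeys(seeds))       # drop duplicates, keep order
    if len(distinct) == 1:
        return "C1", distinct                   # single optimal plan
    for plan in distinct:
        if is_robust(plan, distinct, costs, points, tolerance):
            return "C2", [plan]                 # singleton switchable plan
    if can_switch(distinct):
        return "C3", distinct                   # switchable plan built from the seeds
    return "C4", [seeds[1]]                     # fall back to BestPlanEst


# Hypothetical toy cost functions over |sigma(R)| in MB.
costs: Dict[str, CostFn] = {
    "P1a": lambda x: 2 * x + 160 + (2 * (x + 160) if x > 200 else 0),
    "P1b": lambda x: 320 + x,
}
points = (75, 150, 300)
case = classify(("P1a", "P1a", "P1b"), costs, points, can_switch=lambda s: True)
```

The outcome depends on the tolerance: a looser threshold lets one seed qualify as robust (C2), while a tighter one leaves only the switchable or fallback cases.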

Finding Switchable Plans

RIO tries to find the set S of plans satisfying the following constraints by enumerating the seeds
– All plans in S have a different join operator as the root operator
– All plans in S have the same subplan for the deep subtree input to the root operator
– All plans in S have the same base table, but not necessarily the same access path, as the other input to the root operator
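The three constraints translate into a direct check over a set of candidate plans. The `JoinPlan` record and its field names are hypothetical; a real plan would be a tree, and the deep subplan is represented here by an opaque identifier.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class JoinPlan:
    root_operator: str   # e.g. "HashJoin", "IndexNestedLoop"
    deep_subplan: str    # identifier of the deep-subtree input to the root
    base_table: str      # base table that is the other input to the root
    access_path: str     # how that base table is read; allowed to differ


def is_switchable(plans: Sequence[JoinPlan]) -> bool:
    """Check the three constraints listed above for a candidate set of plans."""
    if len(plans) < 2:
        return False
    roots = [p.root_operator for p in plans]
    different_roots = len(set(roots)) == len(roots)
    same_deep_subplan = len({p.deep_subplan for p in plans}) == 1
    same_base_table = len({p.base_table for p in plans}) == 1
    return different_roots and same_deep_subplan and same_base_table


candidates = [
    JoinPlan("HashJoin", "sigma(R)", "S", "table scan"),
    JoinPlan("IndexNestedLoop", "sigma(R)", "S", "index on S.a"),
]
print(is_switchable(candidates))
```

Because the deep subplan is shared, whatever it has already produced stays usable regardless of which root operator is finally chosen, which is why switching loses little work.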

Using Switchable Plans

Contrary to single-point optimization, P1b turns out to be the more robust plan, since its cost is near the optimal at all points in the bounding box. However, since the two plans are switchable as long as |σ(R)| lies within the bounding box, it is preferable to select the switchable plan P = {P1a, P1b} instead of P1b alone

Extensions to Query Execution Engine

The following extensions have been made to satisfy the requirements of proactive re-optimization
– A switch operator for handling switchable plans
– A buffer operator that buffers tuples until it can compute the input-size estimate needed by the switch operator
– Randomization-aware operators that perform random sampling for more accurate estimates of the statistics
– An inter-operator communication mechanism allowing operators to exchange estimates and random samples
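How the buffer and switch operators might cooperate can be sketched as follows. The functions `estimate_size_mb` and `switch`, the toy cost functions, and the use of a uniform random sample scaled up to the table size are all assumptions for this sketch; the slides do not describe RIO's actual operators at this level of detail.

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple


def estimate_size_mb(sample: Sequence[int], predicate: Callable[[int], bool],
                     table_mb: float) -> float:
    """Scale the fraction of sampled rows passing the predicate up to the table size."""
    hits = sum(1 for row in sample if predicate(row))
    return table_mb * hits / len(sample)


def switch(estimate_mb: float, box: Tuple[float, float],
           members: Dict[str, Callable[[float], float]]) -> Optional[str]:
    """Pick the cheapest member plan at the estimate, or None to request re-optimization."""
    low, high = box
    if not low <= estimate_mb <= high:
        return None                      # estimate fell outside the bounding box
    return min(members, key=lambda name: members[name](estimate_mb))


# Hypothetical members of a switchable plan, with toy cost functions.
members = {
    "P1a": lambda x: 2 * x + 160 + (2 * (x + 160) if x > 200 else 0),
    "P1b": lambda x: 320 + x,
}

rows = list(range(1000))                          # stand-in for table R
buffered = random.Random(0).sample(rows, 200)     # tuples held by the buffer operator
estimate = estimate_size_mb(buffered, lambda r: r % 2 == 0, table_mb=500)
chosen = switch(estimate, box=(75, 300), members=members)
```

The decision is deferred until the buffered sample yields an estimate, so no join work has been committed to either member plan at the moment the choice is made.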

Experiments

RIO has been tested with the following cases
– Two-way join queries
– Three-way join queries
– Correlation-based mistakes
– Thrashing
– Increased query complexity

Results for two-way join query experiments

Results for three-way join query experiments

Results for correlation mistake experiments

Results for increased query complexity experiments

Conclusions

Proactive re-optimization is a novel approach to query optimization

RIO is a prototype that uses
– Bounding boxes instead of single-point estimates for handling uncertainty
– Switchable plans for reducing the loss in pipelined work
– Random sampling techniques for collecting statistics quickly and more accurately

In the experiments, RIO outperforms the current re-optimizers by up to a factor of three


Thanks…

Any Questions?
