Proactive Query Re-optimization
by Çetin Meriçli
12.10.2005

Overview
Query Processing
Query Optimization
– Idea
– Problems
Solutions to problems in query optimization
– Reactive re-optimization
– Proactive re-optimization
RIO Implementation Details

Query Processing
A SQL statement is subjected to four phases of processing:
– Parsing
– Optimization
– Code generation
– Execution

Query Optimization
The same result set for a query can be obtained in more than one way.
Depending on the query, different execution plans may have different costs.
Query optimizers try to find the execution plan with the lowest cost for a given query, based on statistical estimates about the data.

Query Optimization (cont'd)
Traditional optimization follows a plan-first, execute-next approach.
This approach enumerates all execution plans, computes the cost of each plan, and picks the plan with the lowest cost.
Performance depends heavily on the accuracy of the estimated statistics used to compute costs.
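
The enumerate, cost, pick loop can be sketched as below. This is only an illustration: the three tables, their sizes, and the cost formula are made up here and are not taken from any real optimizer.

```python
from itertools import permutations

# Hypothetical single-point size estimates (in MB); T is an invented third table.
ESTIMATED_SIZE_MB = {"R": 500, "S": 160, "T": 40}


def plan_cost(join_order, sizes):
    """Toy cost of a left-deep join order.

    Assumption for illustration only: every join output is as large as
    the smaller of its two inputs, and cost is the total data touched.
    """
    current = sizes[join_order[0]]
    cost = current
    for table in join_order[1:]:
        cost += sizes[table]                  # read the next input
        current = min(current, sizes[table])  # assumed join output size
        cost += current                       # produce the join output
    return cost


def optimize(tables, sizes):
    """Plan first: enumerate every join order and keep the cheapest."""
    return min(permutations(tables), key=lambda order: plan_cost(order, sizes))


best_order = optimize(["R", "S", "T"], ESTIMATED_SIZE_MB)
best_cost = plan_cost(best_order, ESTIMATED_SIZE_MB)
```

Everything the optimizer knows enters through `ESTIMATED_SIZE_MB`; if those numbers are wrong, the order it hands to the execution phase is chosen on wrong costs.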

Query Optimization (cont'd)
Example:
  select * from R, S
  where R.a = S.a and
        R.b > K1 and
        R.c > K2

Query Optimization (cont'd)
Assume that
– The DB buffer cache size is 200 MB
– |R| = 500 MB
– |S| = 160 MB
– |σ(R)| = 300 MB
Due to skew and correlations in the data, the optimizer estimates |σ(R)| to be 150 MB.

Query Optimization (cont'd)
Two parts of the query:
– S
– σ(R) (the result of the selection on R)
[Figure: two candidate plans, P1a and P1b, each a HashJoin over σ(R) and S]

Query Optimization (cont'd)
Since |σ(R)| is underestimated, P1a is selected as the optimal plan. P1b should have been selected instead: the estimate is wrong, and P1a becomes more costly for larger values of |σ(R)|.
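
A rough numeric sketch of this mistake follows. It assumes that P1a builds its hash table on σ(R) and P1b on S; the slides do not spell this out, but it is consistent with P1a getting costlier as |σ(R)| grows. The cost formula and its constants are invented for illustration.

```python
BUFFER_MB = 200  # buffer cache size from the slide
S_MB = 160       # |S| from the slide


def hash_join_cost(build_mb, probe_mb, memory_mb=BUFFER_MB):
    """Toy I/O cost (MB transferred) of a hash join.

    Assumed model: if the build input fits in memory, each input is read
    once, with a 25% surcharge for building the hash table. Otherwise
    both inputs are partitioned to disk and re-read (three passes).
    """
    if build_mb <= memory_mb:
        return 1.25 * build_mb + probe_mb
    return 3 * (build_mb + probe_mb)


def cost_p1a(sel_r_mb):
    # Assumption: sigma(R) is the build input of P1a.
    return hash_join_cost(sel_r_mb, S_MB)


def cost_p1b(sel_r_mb):
    # Assumption: S is the build input of P1b.
    return hash_join_cost(S_MB, sel_r_mb)


def pick(sel_r_mb):
    """Return the cheaper of the two plans for a given |sigma(R)|."""
    return "P1a" if cost_p1a(sel_r_mb) < cost_p1b(sel_r_mb) else "P1b"


chosen_at_estimate = pick(150)  # what the optimizer believes
best_at_actual = pick(300)      # what is actually true
```

Under this toy model the optimizer picks P1a at the 150 MB estimate, but at the true 300 MB the build input no longer fits in the buffer cache and P1b is far cheaper.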

Reactive Optimization
Reactive optimizers work in the following way:
– Use a traditional optimizer to find the best plan.
– Use check operators to detect sub-optimality during execution.
– Trigger re-optimization, if required.
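
The three steps above can be sketched with a pass-through check operator. This is a simplified illustration, not any particular system's design: the validity range `[low, high]`, the exception name, and the helper `run_with_reoptimization` are all hypothetical.

```python
class ReoptimizationNeeded(Exception):
    """Raised when the observed row count leaves the validity range."""


def check(rows, low, high):
    """Pass rows through unchanged while counting them.

    Raises as soon as more than `high` rows are seen, or at end of input
    if fewer than `low` rows were seen. The range is assumed to come
    from the optimizer.
    """
    count = 0
    for row in rows:
        count += 1
        if count > high:
            raise ReoptimizationNeeded(f"saw more than {high} rows")
        yield row
    if count < low:
        raise ReoptimizationNeeded(f"saw only {count} rows")


def run_with_reoptimization(make_plan, reoptimize):
    """Run the original plan; on a failed check, run a re-optimized plan."""
    try:
        return list(make_plan())
    except ReoptimizationNeeded:
        # Rows already produced by the first plan are discarded here.
        return list(reoptimize())
```

Note that in this sketch the output of the first plan is thrown away when the check fires.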

Problems with Reactive Re-optimization
The optimizer may pick plans whose performance depends heavily on uncertain statistics, making re-optimization very likely
The partial work done in a pipelined plan is lost when re-optimization is triggered and the plan is changed
The ability to collect statistics both quickly and accurately during execution is limited
So, when re-optimization is triggered, the optimizer may make new mistakes, potentially leading to thrashing

Proactive Re-optimization
A novel approach
Uses bounding boxes instead of single-point estimates to represent uncertainty
Bounding boxes are used during optimization to generate robust and switchable plans, minimizing the need for re-optimization (and hence the loss of pipelined work)
Random-sample processing is merged with query execution to collect statistics quickly and accurately

Proactive Re-optimization
[Flowchart] Inputs: Query, Catalog
Optimization
1. Compute bounding boxes for estimates
2. Use bounding boxes to pick robust or switchable plans
Execution
3. Execute query; collect accurate run-time estimates of the statistics
Estimate within the bounding box?
– Yes: use the robust or switchable plan
– No: re-optimize

Representing Uncertainty
Most current optimizers use single-point estimates of the statistics needed to cost plans
Using intervals instead of single points allows the optimizer to handle uncertainty about the estimates
As confidence in an estimate increases, its bounding box gets narrower

Representing Uncertainty
[Figure: bounding box over two axes. |σ(R)| (in MB): potential min. 75, estimated 150, potential max. 300. |S| (in MB): potential min. 144, estimated 160, potential max. 192.]

Using Bounding Boxes During Optimization
There is always one optimal plan for a single-point estimate
For a bounding box B, the following cases can occur:
– Single optimal plan: a single plan is optimal at all points within B
– Single robust plan: there is a single plan whose cost is very close to the optimal at all points in B
– A switchable plan: explained in the next slide
– None of the above: different plans are optimal at different points in B, but no switchable plan is available

Switchable Plans
A switchable plan in B is a set S of plans with the following properties:
– At each point pt in B, there is a plan p in S whose cost at pt is close to that of the optimal plan at pt
– The decision of which plan in S will be executed can be deferred until accurate estimates of uncertain statistics are available
– If the actual statistics lie within B, an appropriate plan from S can be picked and run without losing any significant fraction of the execution work done so far

RIO Implementation Details
Computing bounding boxes
Optimizing with bounding boxes
– Generating the seed plans
– Generating the switchable plan
Extensions to the query execution engine
Experiments

Computing Bounding Boxes
RIO restricts the computation of bounding boxes to size and selectivity estimates
For each such estimate E, a bounding box B is computed using the following process:
– An uncertainty bucket U is assigned to E
– The bounding box is computed from the (E, U) pair
U is an integer in [0, 6], from 0 (no uncertainty) to 6 (very high uncertainty), assigned according to what is known about E (for example, whether an accurate value of E exists in the catalog)
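
A minimal sketch of turning an (E, U) pair into a bounding box is shown below. The slides do not give the formula. The rule used here (lower bound shrinks by 10% per uncertainty level, upper bound grows by 20% per level) is an assumption, chosen because with U = 1 and U = 5 it reproduces the intervals [144, 192] for |S| and [75, 300] for |σ(R)| from the earlier figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    low: float
    estimate: float
    high: float

    def contains(self, value):
        """True if a run-time value falls inside the box."""
        return self.low <= value <= self.high


def bounding_box(estimate, uncertainty):
    """Widen a single-point estimate E using an uncertainty level U in 0..6.

    Assumed rule (not stated in the slides): low = E * (1 - 0.1 * U) and
    high = E * (1 + 0.2 * U). Integer tenths keep the arithmetic exact.
    """
    if uncertainty not in range(7):
        raise ValueError("uncertainty must be an integer in [0, 6]")
    low = estimate * (10 - uncertainty) / 10
    high = estimate * (10 + 2 * uncertainty) / 10
    return BoundingBox(low, estimate, high)


box_s = bounding_box(160, 1)      # |S|: low uncertainty
box_sel_r = bounding_box(150, 5)  # |sigma(R)|: high uncertainty
```

With U = 0 the box collapses to the single-point estimate, which matches the idea that higher confidence gives a narrower box.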

Optimizing with Bounding Boxes
RIO computes bounding boxes for all input sizes used to cost plans
Then it tries to compute a switchable plan for each distinct (JS, IO) pair (JS: join subset, IO: interesting orders)
If RIO fails to find a switchable plan, it picks the optimal plan based on single-point estimates

Computing Switchable Plans
RIO computes switchable plans in two steps:
– First, it finds three seed plans for each (JS, IO) pair
– Then, it creates the switchable plan from the seed plans

Generating Seed Plans
In RIO, each enumeration of plans considers three different costs:
– C_LOW
– C_EST
– C_HIGH
C_EST is the cost at the traditional single-point estimate
C_LOW and C_HIGH are the costs at the lower-left and upper-right corners of the bounding box
For each (JS, IO) pair, we end up with three seed plans:
– BestPlanLow: the plan with minimum cost C_LOW
– BestPlanEst: the plan with minimum cost C_EST
– BestPlanHigh: the plan with minimum cost C_HIGH
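
Seed-plan selection can be sketched as three independent minimizations, one per corner. The cost model below is the same kind of toy hash-join model as in a plain textbook treatment and is purely an assumption, as is the choice of build input for P1a and P1b. The corner points are taken from the bounding-box figure.

```python
MEMORY_MB = 200  # buffer cache size from the running example


def hash_join_cost(build_mb, probe_mb):
    """Toy cost: one pass if the build side fits in memory, else three."""
    if build_mb <= MEMORY_MB:
        return 1.25 * build_mb + probe_mb
    return 3 * (build_mb + probe_mb)


# Assumption: P1a builds on sigma(R), P1b builds on S.
PLANS = {
    "P1a": lambda sel_r, s: hash_join_cost(sel_r, s),
    "P1b": lambda sel_r, s: hash_join_cost(s, sel_r),
}


def seed_plans(plans, low_point, est_point, high_point):
    """Return (BestPlanLow, BestPlanEst, BestPlanHigh) for one (JS, IO) pair.

    `plans` maps a plan name to a cost function of the input sizes; each
    point is a tuple of input sizes at one corner of the bounding box.
    """
    def best_at(point):
        return min(plans, key=lambda name: plans[name](*point))

    return best_at(low_point), best_at(est_point), best_at(high_point)


# Points are (|sigma(R)|, |S|) in MB: lower-left corner, estimate, upper-right corner.
seeds = seed_plans(PLANS, (75, 144), (150, 160), (300, 192))
```

Under these assumptions the low and estimate corners favor P1a while the high corner favors P1b, so the seeds are not all the same plan.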

Generating the Switchable Plan
Given the seeds BestPlanLow, BestPlanEst and BestPlanHigh, one of the following cases arises:
– C1: The seeds are all the same plan
– C2: The seeds are not all the same, but one of them is a robust plan
– C3: The seeds are not all the same and none of them is robust, but a switchable plan can be created from the seeds
– C4: Neither a single optimal plan, a single robust plan, nor a switchable plan can be found

Generating the Switchable Plan (cont'd)
In C1, the single optimal plan is the switchable plan
In C2, RIO finds the robust plan among the seeds and uses it as a singleton switchable plan
In C3, RIO tries to find a switchable plan (next slide)
In C4, RIO picks BestPlanEst as the optimal plan
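
The four cases can be sketched as a small decision function. Several things here are assumptions rather than RIO's actual rules: robustness is tested only at the three corner costs, "close to optimal" is taken to mean within a factor `tolerance` (1.2 is an arbitrary choice), and `can_switch` is a hypothetical callback standing in for the switchability test.

```python
def classify_seeds(costs, tolerance=1.2, can_switch=lambda names: False):
    """Return (case, chosen plan names) for one (JS, IO) pair.

    `costs` maps a plan name to its (C_LOW, C_EST, C_HIGH) triple.
    """
    # Seed i is the cheapest plan under cost i (low, est, high).
    seeds = [min(costs, key=lambda name: costs[name][i]) for i in range(3)]
    distinct = list(dict.fromkeys(seeds))  # de-duplicate, keep order

    if len(distinct) == 1:
        return "C1", distinct              # single optimal plan

    best = [costs[seeds[i]][i] for i in range(3)]
    for name in distinct:
        if all(costs[name][i] <= tolerance * best[i] for i in range(3)):
            return "C2", [name]            # singleton robust plan

    if can_switch(distinct):
        return "C3", distinct              # switchable plan from the seeds

    return "C4", [seeds[1]]                # fall back to BestPlanEst


# Hypothetical cost triples for two plans A and B.
case_same = classify_seeds({"A": (10, 20, 30), "B": (15, 25, 35)})
case_robust = classify_seeds({"A": (10, 20, 40), "B": (11, 21, 30)})
case_none = classify_seeds({"A": (10, 20, 90), "B": (30, 40, 50)})
case_switch = classify_seeds(
    {"A": (10, 20, 90), "B": (30, 40, 50)}, can_switch=lambda names: True
)
```

The order of the tests mirrors the order of the cases: a cheaper outcome (one plan, no switching machinery) is preferred whenever it is available.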

Finding Switchable Plans
RIO tries to find a set S of plans satisfying the following constraints by enumerating the seeds:
– All plans in S have a different join operator as the root operator
– All plans in S have the same subplan for the deep subtree input to the root operator
– All plans in S have the same base table, but not necessarily the same access path, as the other input to the root operator
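
The three constraints can be checked mechanically once a plan is described by its root operator and its two inputs. The `JoinPlan` record and its field names are hypothetical, and "different join operator" is read here as pairwise distinct root operators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPlan:
    root_operator: str  # e.g. "HashJoin", "IndexNestedLoopJoin"
    deep_subplan: str   # identifier of the deep subtree input
    base_table: str     # base table on the other input
    access_path: str    # e.g. "SeqScan", "IndexScan"; allowed to differ


def is_switchable_set(plans):
    """Check the three structural constraints on a candidate set S."""
    if not plans:
        return False
    roots = [p.root_operator for p in plans]
    distinct_roots = len(set(roots)) == len(roots)
    same_deep_subplan = len({p.deep_subplan for p in plans}) == 1
    same_base_table = len({p.base_table for p in plans}) == 1
    return distinct_roots and same_deep_subplan and same_base_table


candidate = [
    JoinPlan("HashJoin", "sub1", "S", "SeqScan"),
    JoinPlan("IndexNestedLoopJoin", "sub1", "S", "IndexScan"),
]
switchable = is_switchable_set(candidate)
```

Sharing the deep subplan is what lets the members reuse the same partial work; only the root operator and the access path to the base table change.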

Using Switchable Plans
Contrary to what single-point optimization suggests, P1b is the more robust plan, since its cost is close to the optimal at all points in the bounding box. However, since the two plans are switchable as long as |σ(R)| lies within the bounding box, it is preferable to select the switchable plan P = {P1a, P1b} instead of P1b alone.

Extensions to Query Execution Engine
The following extensions have been made to satisfy the requirements of proactive re-optimization:
– A switch operator for handling switchable plans
– A buffer operator for buffering the tuples until it can compute an input-size estimate needed by the switch operator
– Randomization-aware operators for performing random sampling for more accurate estimates of the statistics
– An inter-operator communication mechanism allowing operators to exchange estimates and random samples
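
How a buffer operator and a switch operator might cooperate is sketched below. This is a heavily simplified illustration, not RIO's implementation: the function names, the `(upper_bound, plan)` representation of the switchable plan's members, and the assumption that rows arrive in random order are all hypothetical.

```python
from itertools import chain, islice


def buffer_and_estimate(rows, predicate, total_rows, sample_rows):
    """Buffer a filtered prefix of the input and estimate the filtered size.

    Assumes `rows` arrives in random order, so the first `sample_rows`
    rows behave like a random sample of all `total_rows` rows.
    Returns (estimated output rows, stream of all qualifying rows).
    """
    it = iter(rows)
    sample = list(islice(it, sample_rows))
    buffered = [r for r in sample if predicate(r)]
    estimate = len(buffered) / len(sample) * total_rows if sample else 0
    remaining = (r for r in it if predicate(r))
    # Buffered rows are replayed first, so no work is lost.
    return estimate, chain(buffered, remaining)


def switch(estimate, box, members, reoptimize):
    """Pick a member plan if the estimate is inside the bounding box.

    `members` is a list of (upper_bound, plan) sorted by upper_bound;
    a plan is used for estimates up to its bound.
    """
    low, high = box
    if not (low <= estimate <= high):
        return reoptimize
    for upper_bound, plan in members:
        if estimate <= upper_bound:
            return plan
    return members[-1][1]


table = range(1000)  # stand-in for a randomly ordered 1000-row input
estimate, stream = buffer_and_estimate(table, lambda r: r % 2 == 0, 1000, 100)
members = [(400, "P1a"), (800, "P1b")]
chosen = switch(estimate, (100, 800), members, "re-optimize")
```

The point of the buffer is that the choice between member plans is made after some tuples have been seen, while those tuples are still handed on to whichever plan is chosen.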

Experiments
RIO has been tested with the following cases:
– Two-way join queries
– Three-way join queries
– Correlation-based mistakes
– Thrashing
– Increased query complexity

Results for the two-way join query experiments

Results for the three-way join query experiments

Results for the correlation-mistake experiments

Results for the increased query complexity experiments

Conclusions
Proactive re-optimization is a novel approach to query optimization
RIO is a prototype that uses
– Bounding boxes instead of single-point estimates for handling uncertainty
– Switchable plans for reducing the loss of pipelined work
– Random sampling techniques for collecting statistics quickly and more accurately
In the experiments, RIO outperforms the current re-optimizers by up to a factor of three

Thanks…
Any questions?