Multi-Query Optimization

Prasan Roy, Indian Institute of Technology - Bombay


Overview

Multi-Query Optimization: What?
  – Problem statement
Multi-Query Optimization: Why?
  – Application scenarios
Multi-Query Optimization: How?
  – A cost-based practical approach
  – Prototyping Multi-Query Optimization
    • On MS SQL-Server at Microsoft
    • Research prototype at IIT-Bombay

Multi-Query Optimization: What?

Exploit common subexpressions (CSEs) in query optimization
Consider DAG execution plans in addition to tree execution plans

Example

[Figure: two separate plan trees. One, over relations A, B, C, is labeled "Best Plan for A JOIN B JOIN C". The other, over relations B, C, D, is labeled "Best Plan for B JOIN C JOIN D".]

Example (contd)

Alternative:

[Figure: a single plan DAG over relations A, B, C, D, with one node labeled "Common Subexpression".]

Multi-Query Optimization: Why?

Queries on views, nested queries, …
Overlapping query batches generated by applications
Update expressions for materialized views
Query invocations with different parameters
. . .

Practical solutions needed!

Multi-Query Optimization: How?

Set up the search space
  – Identify the common subexpressions
Explore the search space efficiently
  – Find the best way to exploit the common subexpressions

Problems

Materializing and sharing a CSE not necessarily cheaper
Mutually exclusive alternatives
  (A JOIN B JOIN C)
  (B JOIN C JOIN D)
  (C JOIN D JOIN E)
  What to share: (B JOIN C) or (C JOIN D)?

Huge search space!

Earlier Work: Practical Solutions

As early as 1976
  – Preprocess query before optimization [Hall, IBM-JRD76]
As late as 1998
  – Postprocess optimized plans [Subramanian and Venkataraman, SIGMOD98]

Query optimizer is not aware!

Earlier Work: Theoretical Studies

[Sellis, TODS88], [Cosar et al., CIKM93], [Shim et al., DKE94], ...

Set of queries {Q1, Q2, …, Qn}
For each query Qi, set of execution plans {Pi1, Pi2, …, Pim}
Pij is a set of tasks from a common pool
Pick a plan for each query such that the cost of tasks in the union is minimized

Not integrated with existing optimizers, no practical study
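The task-based formulation above can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions: the task names (`AB`, `AB_C`, and so on) and all costs are hypothetical, and brute-force enumeration is used purely for clarity, not because any of the cited studies propose it.

```python
from itertools import product

def pick_plans(plans_per_query, task_cost):
    """Choose one plan per query so that the union of tasks is cheapest.

    plans_per_query: one list of candidate plans per query; a plan is a
                     frozenset of task names drawn from a common pool.
    task_cost:       dict mapping task name -> cost (a shared task is paid once).
    """
    best_choice, best_cost = None, float("inf")
    for choice in product(*plans_per_query):
        tasks = frozenset().union(*choice)      # shared tasks appear only once
        cost = sum(task_cost[t] for t in tasks)
        if cost < best_cost:
            best_choice, best_cost = choice, cost
    return best_choice, best_cost

# Hypothetical task pool for Q1 = A JOIN B JOIN C and Q2 = B JOIN C JOIN D.
task_cost = {"AB": 10, "AB_C": 20, "BC": 15, "A_BC": 20,
             "BC_D": 20, "CD": 10, "B_CD": 20}
q1_plans = [frozenset({"AB", "AB_C"}), frozenset({"BC", "A_BC"})]
q2_plans = [frozenset({"BC", "BC_D"}), frozenset({"CD", "B_CD"})]

best_choice, best_cost = pick_plans([q1_plans, q2_plans], task_cost)
print(best_cost, [sorted(p) for p in best_choice])
```

With these made-up costs, each query's individually cheapest plan costs 30, for a total of 60, while the pair of plans that shares the task `BC` costs 55 in total. The number of combinations grows as the product of the plan counts, which is one reason this formulation is hard to apply directly.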

Microsoft Experience

with Paul Larson, Microsoft Research

Prototyping MQO on SQL-Server

Add multi-query optimization capability to SQL-Server
Well integrated with the existing optimization framework
  – another optimization level
  – minimal changes, minimal extra lines of code
First cut: exhaustive
  – How slow can it be?
A working prototype by the summer-end

What (almost) already exists in the SQL-Server Optimizer

AND/OR Query-DAG representation of plan space

[Figure: Query DAG over base relations A, B, C, D, with nodes labeled "Group (OR node)" and "Op (AND node)".]

What actually exists in the SQL-Server Optimizer

Relations cloned for each use

[Figure: Query DAG over A, B1, C1, D, B2, C2, where B and C appear as separate clones B1, B2 and C1, C2.]

Preprocessing Step: Query-DAG Unification

Performed in a bottom-up traversal

[Figure: Query DAG over A, B1, C1, D, B2, C2 being unified.]

Common Subexpression Identification

Unified nodes are CSEs

[Figure: unified Query DAG over A, B, C, D, with the shared node labeled "Common Subexpression".]
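A minimal sketch of bottom-up unification and CSE identification follows. It assumes a simplified representation that is not taken from the slides: a query is a nested tuple `("JOIN", left, right)`, a base relation is a string, and clones such as B1 and B2 are already written with their base name. It only unifies syntactically equal subtrees (treating JOIN as commutative); a real optimizer's Query DAG also contains alternative join orders.

```python
def unify(queries):
    """Intern subexpressions bottom-up so equal subexpressions share one node."""
    table = {}    # canonical key -> node id
    keys = []     # node id -> canonical key
    parents = {}  # node id -> set of distinct parents (node ids or query tags)

    def intern(key):
        if key not in table:
            table[key] = len(keys)
            keys.append(key)
            parents[table[key]] = set()
        return table[key]

    def visit(expr):
        if isinstance(expr, str):                    # base relation
            return intern(("REL", expr))
        op, left, right = expr
        # Children are unified first (bottom-up); sorting makes JOIN commutative.
        kids = tuple(sorted((visit(left), visit(right))))
        node = intern((op,) + kids)
        for k in kids:
            parents[k].add(node)
        return node

    def show(n):
        key = keys[n]
        if key[0] == "REL":
            return key[1]
        return "(" + f" {key[0]} ".join(show(k) for k in key[1:]) + ")"

    for i, q in enumerate(queries):
        parents[visit(q)].add(("query", i))

    # A CSE is a non-leaf node that ended up with more than one parent.
    return [show(n) for n in range(len(keys))
            if keys[n][0] != "REL" and len(parents[n]) > 1]

q1 = ("JOIN", "A", ("JOIN", "B", "C"))
q2 = ("JOIN", ("JOIN", "B", "C"), "D")
cses = unify([q1, q2])
print(cses)
```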

Exploring the Search Space: A Naïve Algorithm

For each set S of common subexpressions
  – materialize each node in S
  – MatCost(S) = sum of materialization costs of the nodes in S
  – invoke optimizer to find the best plan for the root and for each node in S
  – CompCost(S) = sum of costs of above plans
  – Cost(S) = MatCost(S) + CompCost(S)
Pick S with the minimum Cost
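The naïve algorithm can be sketched over a toy AND/OR DAG. Everything concrete here is an assumption made for illustration: the DAG for the three queries A JOIN B JOIN C, B JOIN C JOIN D and C JOIN D JOIN E, the operator costs, and the `MAT_COST` and `REUSE_COST` tables. The function `best_cost` stands in for the optimizer call; it is not SQL-Server's interface.

```python
from itertools import combinations

# Hypothetical AND/OR DAG: group -> list of alternatives (operator cost, children).
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])],
    "D": [(10, [])], "E": [(10, [])],
    "AB": [(20, ["A", "B"])], "BC": [(30, ["B", "C"])],
    "CD": [(35, ["C", "D"])], "DE": [(20, ["D", "E"])],
    "ABC": [(10, ["AB", "C"]), (10, ["A", "BC"])],
    "BCD": [(10, ["BC", "D"]), (10, ["B", "CD"])],
    "CDE": [(10, ["CD", "E"]), (10, ["C", "DE"])],
}
ROOTS = ["ABC", "BCD", "CDE"]
MAT_COST = {"BC": 5, "CD": 5}     # assumed cost of writing a result out
REUSE_COST = {"BC": 5, "CD": 5}   # assumed cost of reading it back

def best_cost(group, materialized, top=True):
    """Min cost of computing `group`; materialized inputs cost only a reuse."""
    if not top and group in materialized:
        return REUSE_COST[group]
    return min(op + sum(best_cost(c, materialized, top=False) for c in kids)
               for op, kids in DAG[group])

def cost(S):
    """Cost(S) = MatCost(S) + CompCost(S), as defined on the slide."""
    mat_cost = sum(MAT_COST[n] for n in S)
    comp_cost = sum(best_cost(n, S) for n in list(ROOTS) + sorted(S))
    return mat_cost + comp_cost

def naive_mqo(candidates):
    """Try every subset S of the candidate CSEs and keep the cheapest."""
    subsets = (frozenset(c) for r in range(len(candidates) + 1)
               for c in combinations(candidates, r))
    return min(((cost(S), S) for S in subsets), key=lambda pair: pair[0])

best_total, best_set = naive_mqo(["BC", "CD"])
print(best_total, sorted(best_set))
```

With these made-up numbers, sharing nothing costs 190, sharing only (B JOIN C) costs 165, sharing only (C JOIN D) costs 170, and sharing both costs 190, so the two CSEs behave as mutually exclusive alternatives. The loop visits 2^k subsets for k candidates, with one optimizer invocation per root and per materialized node in each.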

Doing Better: Incremental Reoptimization

Goal: from the best plan for Si, derive the best plan for Sj
Observation
  – Best plans change for only the ancestors of nodes in Si XOR Sj
Algorithm:
  – Propagate changed costs in bottom-up topological order from nodes in Si XOR Sj
  – Update min-cost plan at each node visited
  – Do not propagate further up if min-cost plan remains unchanged at a node

Work done at IIT-Bombay
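The propagation step can be sketched as follows. This is an illustration under assumptions rather than the prototype's code: the DAG, its costs and `REUSE_COST` are hypothetical, the DAG dictionary is assumed to be listed bottom-up so that its insertion order serves as the topological order, and the sketch tracks only each node's minimum cost, using "cost unchanged" as the stopping test.

```python
import heapq

# Hypothetical AND/OR DAG, listed bottom-up (children before parents).
DAG = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])],
    "D": [(10, [])], "E": [(10, [])],
    "AB": [(20, ["A", "B"])], "BC": [(30, ["B", "C"])],
    "CD": [(35, ["C", "D"])], "DE": [(20, ["D", "E"])],
    "ABC": [(10, ["AB", "C"]), (10, ["A", "BC"])],
    "BCD": [(10, ["BC", "D"]), (10, ["B", "CD"])],
    "CDE": [(10, ["CD", "E"]), (10, ["C", "DE"])],
}
REUSE_COST = {"BC": 5, "CD": 5}               # assumed cost of reading a result
ORDER = {g: i for i, g in enumerate(DAG)}     # topological number of each group
PARENTS = {g: set() for g in DAG}
for g, alts in DAG.items():
    for _, kids in alts:
        for k in kids:
            PARENTS[k].add(g)

def compute(g, cost, S):
    """Min cost of computing g when the nodes in S are materialized."""
    def use(k):
        return REUSE_COST[k] if k in S else cost[k]
    return min(op + sum(use(k) for k in kids) for op, kids in DAG[g])

def optimize(S):
    """From-scratch optimization: one bottom-up pass over the whole DAG."""
    cost = {}
    for g in DAG:
        cost[g] = compute(g, cost, S)
    return cost

def reoptimize(cost, S_old, S_new):
    """Update `cost` in place from S_old to S_new; return the groups visited."""
    heap, queued, visited = [], set(), []

    def push_parents(g):
        for p in PARENTS[g]:
            if p not in queued:
                queued.add(p)
                heapq.heappush(heap, (ORDER[p], p))

    for g in S_old ^ S_new:          # only these nodes look different to parents
        push_parents(g)
    while heap:
        _, g = heapq.heappop(heap)   # lowest topological number first
        visited.append(g)
        new = compute(g, cost, S_new)
        if new != cost[g]:
            cost[g] = new
            if g not in S_new:       # parents of a materialized node see reuse cost
                push_parents(g)
    return visited

cost = optimize(frozenset())
visited = reoptimize(cost, frozenset(), frozenset({"BC"}))
print(visited, cost["ABC"], cost["BCD"], cost["CDE"])
```

In this example only the two parents of (B JOIN C) are revisited; the group for C JOIN D JOIN E is never touched, and the updated table matches what a from-scratch pass for {BC} would produce.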

Incremental Optimization: Example

Si = ∅

[Figure: Query DAG over A, B, C, D, with the min-cost plan marked.]

Incremental Optimization: Example

Si = ∅, Sj = {(B JOIN C)}

[Figure: Query DAG over A, B, C, D. The node (B JOIN C) is marked "Now materialized"; the previous min-cost and new min-cost plans are marked.]

Current Status

A first-cut implementation working
  – Lines of C++ code added: 1500 approx.

Future Work

Performance tuning and smarter data structures needed
Ways to restrict enumeration taking DAG structure into account

Research at IIT-Bombay: Heuristics for MQO

with S. Sudarshan, S. Seshadri

A Greedy Heuristic

Pick nodes for materialization one at a time, in “benefit” order
Benefit(n) = reduction in cost on materialization of n

Benefit computation is expensive
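The greedy loop can be sketched as below. The `COST` table is a hypothetical stand-in for optimizer invocations (each lookup would be a reoptimization in practice), and stopping when the best benefit is not positive is an assumption of this sketch.

```python
# Hypothetical total plan cost for each set of materialized nodes.
COST = {
    frozenset(): 190,
    frozenset({"BC"}): 165,
    frozenset({"CD"}): 170,
    frozenset({"BC", "CD"}): 190,
}

def greedy(candidates, cost):
    """Add the node with the largest benefit until no node reduces the cost."""
    chosen = frozenset()
    current = cost(chosen)
    remaining = set(candidates)
    while remaining:
        # Benefit(n) = reduction in cost if n is materialized as well.
        benefit = {n: current - cost(chosen | {n}) for n in remaining}
        best = max(sorted(remaining), key=benefit.get)
        if benefit[best] <= 0:
            break
        chosen = chosen | {best}
        current -= benefit[best]
        remaining.remove(best)
    return chosen, current

chosen, total = greedy(["BC", "CD"], COST.__getitem__)
print(sorted(chosen), total)
```

Every round recomputes the benefit of every remaining candidate, so the number of cost evaluations grows roughly quadratically with the number of candidates. That is the expense the slide refers to.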

Monotonicity Assumption

Benefit of a node does not increase due to materialization of other nodes

Exploited to avoid some benefit computations

Optimization costs decrease by 90%
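One way the monotonicity assumption can be exploited is a lazy variant of the greedy loop: a benefit computed earlier is treated as an upper bound, so a candidate is re-evaluated only when it reaches the top of a priority queue. The sketch below is a generic lazy-greedy formulation, not necessarily the exact scheme used in the research prototype, and the candidates X, Y, Z with their `COST` table are hypothetical.

```python
import heapq

# Hypothetical total plan cost per set of materialized nodes (benefits are monotone).
COST = {
    frozenset(): 100,
    frozenset("X"): 70, frozenset("Y"): 80, frozenset("Z"): 95,
    frozenset("XY"): 60, frozenset("XZ"): 66, frozenset("YZ"): 76,
    frozenset("XYZ"): 62,
}
calls = []                        # records every cost evaluation

def cost(S):
    calls.append(frozenset(S))
    return COST[frozenset(S)]

def lazy_greedy(candidates, cost):
    chosen = frozenset()
    current = cost(chosen)
    # Heap entry: (-benefit, node, len(chosen) when the benefit was computed).
    heap = [(-(current - cost(chosen | {n})), n, 0) for n in sorted(candidates)]
    heapq.heapify(heap)
    while heap:
        neg_benefit, n, stamp = heapq.heappop(heap)
        if stamp != len(chosen):
            # Stale value: under monotonicity it is only an upper bound, so refresh it.
            fresh = current - cost(chosen | {n})
            heapq.heappush(heap, (-fresh, n, len(chosen)))
            continue
        if -neg_benefit <= 0:     # the best up-to-date benefit is not positive
            break
        chosen = chosen | {n}
        current += neg_benefit    # subtract the benefit
    return chosen, current

chosen, total = lazy_greedy(["X", "Y", "Z"], cost)
print(sorted(chosen), total, len(calls))
```

In the second round Y's refreshed benefit (10) still exceeds Z's stale bound (5), so Z is not re-evaluated in that round. If the assumption does not hold for a workload, a stale bound may be too low and the lazy loop can pick a different node than the plain greedy loop would.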

A Postpass Heuristic: Volcano-SH

No change in Volcano best plan computation
Cost-based materialization of nodes in best Volcano plan
Implementation easy
Low overhead
Optimizer is not aware
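The slide does not state the decision rule behind "cost-based materialization". One plausible per-node test, given here purely as an assumption, compares recomputing a node at every use against computing it once, writing it out, and reading it back at every use. The parameter names are hypothetical.

```python
def should_materialize(cost, mat_cost, reuse_cost, num_uses):
    """Assumed rule: share a node only if sharing is cheaper than recomputing it.

    cost:       cost of computing the node once
    mat_cost:   cost of writing the result out
    reuse_cost: cost of reading the result back for one use
    num_uses:   number of times the node occurs in the best plan
    """
    recompute_total = cost * num_uses
    share_total = cost + mat_cost + reuse_cost * num_uses
    return share_total < recompute_total

print(should_materialize(cost=50, mat_cost=5, reuse_cost=5, num_uses=2))  # 65 < 100
print(should_materialize(cost=50, mat_cost=5, reuse_cost=5, num_uses=1))  # 60 < 50 fails
```

Because the test is applied only to nodes that already appear in the best plan found without sharing, a subexpression that would pay off only if the plans were chosen differently is never considered, which is the sense in which the optimizer "is not aware".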

A Volcano Variant: Volcano-RU

Volcano best plan search aware of best plans for earlier queries
  – Cost based materialization of best plan nodes that are used by later queries
Implementation easy
Low overhead
Local decisions, plan quality sensitive to query sequence

Experimental Conclusion

Greedy
  – Expensive, but practical
  – Overheads typically offset by plan quality
    • especially for expensive “canned” queries
  – Almost linear scaleup with query batch size
    • typically, only the width of the Query DAG affected
Volcano-RU
  – Mostly better than Volcano-SH, same overhead
  – Negligible overhead over Volcano
    • recommended for cheap but complex queries

Conclusion

Multi-query optimization is needed
Multi-query optimization is practical!
Multi-query optimization is an easy next step for DAG-based optimizers