28
1 Lecture 25: Query Optimization Wednesday, November 26, 2003

Lecture 25: Query Optimization

Embed Size (px)

DESCRIPTION

Lecture 25: Query Optimization. Wednesday, November 26, 2003. Outline. Cost-based optimization 16.5, 16.6. Cost-based Optimizations. Main idea: apply algebraic laws, until estimated cost is minimal Practically: start from partial plans, introduce operators one by one - PowerPoint PPT Presentation

Citation preview

Page 1: Lecture 25: Query Optimization

1

Lecture 25:Query Optimization

Wednesday, November 26, 2003

Page 2: Lecture 25: Query Optimization

2

Outline

• Cost-based optimization 16.5, 16.6

Page 3: Lecture 25: Query Optimization

3

Cost-based Optimizations

• Main idea: apply algebraic laws, until estimated cost is minimal

• Practically: start from partial plans, introduce operators one by one– Will see in a few slides

• Problem: there are too many ways to apply the laws, hence too many (partial) plans

Page 4: Lecture 25: Query Optimization

4

Cost-based Optimizations

Approaches:

• Top-down: the partial plan is a top fragment of the logical plan

• Bottom up: the partial plan is a bottom fragment of the logical plan

Page 5: Lecture 25: Query Optimization

5

Search Strategies

• Branch-and-bound:– Remember the cheapest complete plan P seen so far

and its cost C

– Stop generating partial plans whose cost is > C

– If a cheaper complete plan is found, replace P, C

• Hill climbing:– Remember only the cheapest partial plan seen so far

• Dynamic programming:– Remember the all cheapest partial plans

Page 6: Lecture 25: Query Optimization

6

Dynamic Programming

Unit of Optimization: select-project-join

• Push selections down, pull projections up

Page 7: Lecture 25: Query Optimization

7

Join Trees

• R1 ⋈ R2 ⋈ …. ⋈ Rn• Join tree:

• A plan = a join tree• A partial plan = a subtree of a join tree

R3 R1 R2 R4

Page 8: Lecture 25: Query Optimization

8

Types of Join Trees

• Left deep:

R3 R1

R5

R2

R4

Page 9: Lecture 25: Query Optimization

9

Types of Join Trees

• Bushy:

R3

R1

R2 R4

R5

Page 10: Lecture 25: Query Optimization

10

Types of Join Trees

• Right deep:

R3

R1R5

R2 R4

Page 11: Lecture 25: Query Optimization

11

Problem

• Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn

• Assume we have a function cost() that gives us the cost of every join tree

• Find the best join tree for the query

Page 12: Lecture 25: Query Optimization

12

Dynamic Programming

• Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset

• In increasing order of set cardinality:– Step 1: for {R1}, {R2}, …, {Rn}– Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn}– …– Step n: for {R1, …, Rn}

• It is a bottom-up strategy• A subset of {R1, …, Rn} is also called a subquery

Page 13: Lecture 25: Query Optimization

13

Dynamic Programming

• For each subquery Q ⊆ {R1, …, Rn} compute the following:– Size(Q)– A best plan for Q: Plan(Q)– The cost of that plan: Cost(Q)

Page 14: Lecture 25: Query Optimization

14

Dynamic Programming

• Step 1: For each {Ri} do:– Size({Ri}) = B(Ri)– Plan({Ri}) = Ri– Cost({Ri}) = (cost of scanning Ri)

Page 15: Lecture 25: Query Optimization

15

Dynamic Programming

• Step i: For each Q ⊆ {R1, …, Rn} of cardinality i do:– Compute Size(Q) (later…)– For every pair of subqueries Q’, Q’’

s.t. Q = Q’ Q’’compute cost(Plan(Q’) ⋈ Plan(Q’’))

– Cost(Q) = the smallest such cost– Plan(Q) = the corresponding plan

Page 16: Lecture 25: Query Optimization

16

Dynamic Programming

• Return Plan({R1, …, Rn})

Page 17: Lecture 25: Query Optimization

17

Dynamic Programming

To illustrate, we will make the following simplifications:

• Cost(P1 ⋈ P2) = Cost(P1) + Cost(P2) + size(intermediate result(s))

• Intermediate results:– If P1 = a join, then the size of the intermediate result is

size(P1), otherwise the size is 0– Similarly for P2

• Cost of a scan = 0

Page 18: Lecture 25: Query Optimization

18

Dynamic Programming

• Example:

• Cost(R5 ⋈ R7) = 0 (no intermediate results)

• Cost((R2 ⋈ R1) ⋈ R7) = Cost(R2 ⋈ R1) + Cost(R7) + size(R2 ⋈ R1) = size(R2 ⋈ R1)

Page 19: Lecture 25: Query Optimization

19

Dynamic Programming

• Relations: R, S, T, U• Number of tuples: 2000, 5000, 3000, 1000

• Size estimation: T(A ⋈ B) = 0.01*T(A)*T(B)

Page 20: Lecture 25: Query Optimization

20

Subquery Size Cost Plan

RS

RT

RU

ST

SU

TU

RST

RSU

RTU

STU

RSTU

Page 21: Lecture 25: Query Optimization

21

Subquery Size Cost Plan

RS 100k 0 RS

RT 60k 0 RT

RU 20k 0 RU

ST 150k 0 ST

SU 50k 0 SU

TU 30k 0 TU

RST 3M 60k (RT)S

RSU 1M 20k (RU)S

RTU 0.6M 20k (RU)T

STU 1.5M 30k (TU)S

RSTU 30M 60k+50k=110k (RT)(SU)

Page 22: Lecture 25: Query Optimization

22

Dynamic Programming

• Summary: computes optimal plans for subqueries:– Step 1: {R1}, {R2}, …, {Rn}– Step 2: {R1, R2}, {R1, R3}, …, {Rn-1, Rn}– …– Step n: {R1, …, Rn}

• We used naïve size/cost estimations• In practice:

– more realistic size/cost estimations (next time)– heuristics for Reducing the Search Space

• Restrict to left linear trees• Restrict to trees “without cartesian product”

– need more than just one plan for each subquery:• “interesting orders”

Page 23: Lecture 25: Query Optimization

23

Reducing the Search Space

• Left-linear trees v.s. Bushy trees

• Trees without cartesian product

Example: R(A,B) S(B,C) T(C,D)⋈ ⋈Plan: (R(A,B) T(C,D)) S(B,C) has a cartesian ⋈ ⋈

product – most query optimizers will not consider it

Page 24: Lecture 25: Query Optimization

24

Counting the Number of Join Orders

• The mathematics we need to know:

Given x1, x2, ..., xn, how many permutations can we construct ? Answer: n!

• Example: permutations of x1x2x3x4 are: x1x2x3x4

x2x4x3x1

.. . . (there are 4! = 4*3*2*1 = 24 permutations)

Page 25: Lecture 25: Query Optimization

25

Counting the Number of Join Orders

• The mathematics we need to know:

Given the product x0x1x2...xn, in how many ways can we place n pairs of parenthesis around them ? Answer: 1/(n+1)*C2n

n = (2n)!/((n+1)*(n!)2)

• Example: for n=3 ((x0x1)(x2x3))

((x0(x1x2))x3)

(x0((x1x2)x3))

((x0x1)x2)x3)

(x0(x1(x2x3)))• There are 6!/(4*3!*3!) = 5 ways

Page 26: Lecture 25: Query Optimization

26

Counting the Number of Join Orders (Exercise)

R0(A0,A1) R⋈ 1(A1,A2) . . . R⋈ ⋈ n(An,An+1)• The number of left linear join trees is:

• The number of left linear join trees without cartesian products is:

• The number of bushy join trees is:

• The number of bushy join trees without cartesian product is:

Page 27: Lecture 25: Query Optimization

27

Number of Subplans Inspected by Dynamic Programming

R0(A0,A1) R⋈ 1(A1,A2) . . . R⋈ ⋈ n(An,An+1)• The number of left linear subplans inspected is:

• The number of left linear subplans without cartesian products inspected is:

• The number of bushy join subplans inspected is:

• The number of bushy join subplans without cartesian product:

Page 28: Lecture 25: Query Optimization

28

Happy Thanksgivings !