Page 1: CS 245: Database System Principles

CS 245 Notes 6 1

CS 245: Database System Principles

Notes 6: Query Processing

Hector Garcia-Molina

Page 3: CS 245: Database System Principles

CS 245 Notes 6 3

Query Processing

Q -> Query Plan

Focus: Relational System

• Others?

Page 4: CS 245: Database System Principles

CS 245 Notes 6 4

Example

SELECT B, D
FROM R, S
WHERE R.A = “c” AND S.E = 2 AND R.C = S.C

Page 6: CS 245: Database System Principles

CS 245 Notes 6 6

R:  A  B  C          S:  C   D  E
    a  1  10             10  x  2
    b  1  20             20  y  2
    c  2  10             30  z  2
    d  2  35             40  x  1
    e  3  45             50  y  3

Answer:  B  D
         2  x

Page 7: CS 245: Database System Principles

CS 245 Notes 6 7

• How do we execute the query?

One idea:
- Do Cartesian product
- Select tuples
- Do projection

Page 9: CS 245: Database System Principles

CS 245 Notes 6 9

R X S:  R.A  R.B  R.C  S.C  S.D  S.E
        a    1    10   10   x    2
        a    1    10   20   y    2
        .    .
        c    2    10   10   x    2     <- Bingo! Got one...
        .    .
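The "one idea" above can be sketched in a few lines of Python. This is only an illustration under assumptions: relations are held as in-memory lists of tuples, and the function name `plan_one` is hypothetical, not part of the notes.

```python
from itertools import product

# Example relations from the slides: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def plan_one(r, s):
    """Cartesian product, then selection, then projection on (B, D)."""
    answer = []
    for (a, b, rc), (sc, d, e) in product(r, s):  # R X S: every pair of tuples
        if a == "c" and e == 2 and rc == sc:      # select
            answer.append((b, d))                 # project
    return answer

result = plan_one(R, S)
print(result)  # [(2, 'x')]
```

The product touches all 25 tuple pairs to produce a single answer tuple, which is what motivates the alternative plans.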

Page 11: CS 245: Database System Principles

CS 245 Notes 6 11

Relational Algebra - can be used to describe plans...

Ex: Plan I

        Π_{B,D}
           |
        σ_{R.A=“c” ∧ S.E=2 ∧ R.C=S.C}
           |
           X
          / \
         R   S

OR: Π_{B,D} [ σ_{R.A=“c” ∧ S.E=2 ∧ R.C=S.C} (R X S) ]

Page 12: CS 245: Database System Principles

CS 245 Notes 6 12

Another idea: Plan II

            Π_{B,D}
               |
               ⋈        (natural join)
             /   \
  σ_{R.A=“c”}     σ_{S.E=2}
        |             |
        R             S

Page 13: CS 245: Database System Principles

CS 245 Notes 6 13

R:  A  B  C          S:  C   D  E
    a  1  10             10  x  2
    b  1  20             20  y  2
    c  2  10             30  z  2
    d  2  35             40  x  1
    e  3  45             50  y  3

σ_{R.A=“c”}(R):  A  B  C
                 c  2  10

σ_{S.E=2}(S):    C   D  E
                 10  x  2
                 20  y  2
                 30  z  2
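Plan II can be sketched the same way: select on each relation first, then join the two small results. The helper names below (`natural_join_on_c`, `plan_two`) are hypothetical, and the join is implemented with a simple hash table as one possible choice; the notes do not prescribe a join method here.

```python
# Example relations from the slides: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def natural_join_on_c(r, s):
    """Natural join of R(A,B,C) and S(C,D,E) on the shared attribute C."""
    by_c = {}
    for sc, d, e in s:
        by_c.setdefault(sc, []).append((d, e))
    return [(a, b, c, d, e) for a, b, c in r for d, e in by_c.get(c, [])]

def plan_two(r, s):
    r_sel = [t for t in r if t[0] == "c"]   # sigma R.A = "c"
    s_sel = [t for t in s if t[2] == 2]     # sigma S.E = 2
    joined = natural_join_on_c(r_sel, s_sel)
    return r_sel, s_sel, [(b, d) for _, b, _, d, _ in joined]

r_sel, s_sel, result = plan_two(R, S)
print(r_sel, s_sel, result)
```

Here the join sees 1 tuple from R and 3 from S instead of 5 and 5.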

Page 15: CS 245: Database System Principles

CS 245 Notes 6 15

Plan III Use R.A and S.C Indexes

(1) Use R.A index to select R tuples with R.A = “c”
(2) For each R.C value found, use S.C index to find matching tuples
(3) Eliminate S tuples with S.E ≠ 2
(4) Join matching R,S tuples, project B,D attributes and place in result

Page 20: CS 245: Database System Principles

CS 245 Notes 6 20

R:  A  B  C          S:  C   D  E
    a  1  10             10  x  2
    b  1  20             20  y  2
    c  2  10             30  z  2
    d  2  35             40  x  1
    e  3  45             50  y  3

I1 = index on R.A        I2 = index on S.C

I1 lookup A = “c”   ->  <c,2,10>
I2 lookup C = 10    ->  <10,x,2>
check E = 2?        ->  output: <2,x>
next tuple: <c,7,15>
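The index-driven walk-through above can be mimicked with dictionaries standing in for the two indexes. This is a sketch under assumptions: a real system would use B-trees or hash indexes on disk, and `build_index` and `plan_three` are hypothetical names.

```python
from collections import defaultdict

# Example relations from the slides: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def build_index(rows, col):
    """Map each value of column `col` to the rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[col]].append(row)
    return index

I1 = build_index(R, 0)  # index on R.A
I2 = build_index(S, 0)  # index on S.C

def plan_three(i1, i2):
    answer = []
    for a, b, c in i1.get("c", []):       # (1) R tuples with R.A = "c"
        for sc, d, e in i2.get(c, []):    # (2) matching S tuples via S.C
            if e == 2:                    # (3) drop S tuples with S.E != 2
                answer.append((b, d))     # (4) join and project B, D
    return answer

result = plan_three(I1, I2)
print(result)  # [(2, 'x')]
```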

Page 21: CS 245: Database System Principles

CS 245 Notes 6 21

Overview of Query Optimization

Page 22: CS 245: Database System Principles

CS 245 Notes 6 22

SQL query
  -> parse                    -> parse tree
  -> convert                  -> logical query plan
  -> apply laws               -> “improved” l.q.p
  -> estimate result sizes    -> l.q.p. + sizes      (input: statistics)
  -> consider physical plans  -> {P1,P2,…..}
  -> estimate costs           -> {(P1,C1),(P2,C2)...}
  -> pick best                -> Pi
  -> execute                  -> answer

Page 23: CS 245: Database System Principles

CS 245 Notes 6 23

Example: SQL query

SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE ‘%1960’
);

(Find the movies with stars born in 1960)

Page 24: CS 245: Database System Principles

CS 245 Notes 6 24

Example: Parse Tree

<Query>
  <SFW>
    SELECT
    <SelList>
      <Attribute>: title
    FROM
    <FromList>
      <RelName>: StarsIn
    WHERE
    <Condition>
      <Tuple>
        <Attribute>: starName
      IN
      <Query>
        ( <Query> )
          <SFW>
            SELECT
            <SelList>
              <Attribute>: name
            FROM
            <FromList>
              <RelName>: MovieStar
            WHERE
            <Condition>
              <Attribute>: birthDate
              LIKE
              <Pattern>: ‘%1960’

Page 25: CS 245: Database System Principles

CS 245 Notes 6 25

Example: Generating Relational Algebra

          Π_title
             |
             σ
           /   \
    StarsIn     <condition>
               /     |     \
         <tuple>     IN     Π_name
            |                  |
       <attribute>       σ_{birthdate LIKE ‘%1960’}
            |                  |
        starName           MovieStar

Fig. 7.15: An expression using a two-argument σ, midway between a parse tree and relational algebra

Page 26: CS 245: Database System Principles

CS 245 Notes 6 26

Example: Logical Query Plan

          Π_title
             |
          σ_{starName=name}
             |
             X
           /   \
    StarsIn     Π_name
                   |
             σ_{birthdate LIKE ‘%1960’}
                   |
               MovieStar

Fig. 7.18: Applying the rule for IN conditions

Page 27: CS 245: Database System Principles

CS 245 Notes 6 27

Example: Improved Logical Query Plan

          Π_title
             |
             ⋈_{starName=name}
           /   \
    StarsIn     Π_name
                   |
             σ_{birthdate LIKE ‘%1960’}
                   |
               MovieStar

Fig. 7.20: An improvement on fig. 7.18.

Question: Push project to StarsIn?
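The rewrite from an IN subquery to a join can be checked on a toy database. The sketch below uses Python's built-in `sqlite3` module; the rows are made up for illustration, and it assumes `name` is unique in MovieStar (without that, the join form could return duplicate titles that the IN form does not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StarsIn (title TEXT, starName TEXT);
    CREATE TABLE MovieStar (name TEXT, birthdate TEXT);
    -- Hypothetical sample rows, not from the notes.
    INSERT INTO StarsIn VALUES ('Movie A', 'Star 1');
    INSERT INTO StarsIn VALUES ('Movie B', 'Star 2');
    INSERT INTO StarsIn VALUES ('Movie C', 'Star 1');
    INSERT INTO MovieStar VALUES ('Star 1', '05/03/1960');
    INSERT INTO MovieStar VALUES ('Star 2', '11/20/1975');
""")

# Original form: subquery under IN.
in_query = """
    SELECT title FROM StarsIn
    WHERE starName IN (SELECT name FROM MovieStar
                       WHERE birthdate LIKE '%1960')
"""
# Rewritten form: join StarsIn with the projected, filtered MovieStar.
join_query = """
    SELECT title FROM StarsIn
    JOIN (SELECT name FROM MovieStar
          WHERE birthdate LIKE '%1960') AS m
      ON StarsIn.starName = m.name
"""
in_rows = sorted(conn.execute(in_query).fetchall())
join_rows = sorted(conn.execute(join_query).fetchall())
print(in_rows, join_rows)
```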

Page 28: CS 245: Database System Principles

CS 245 Notes 6 28

Example: Estimate Result Sizes

Need expected size

StarsIn

MovieStar

Page 29: CS 245: Database System Principles

CS 245 Notes 6 29

Example: One Physical Plan

          Hash join        (Parameters: join order, memory size, project attributes, ...)
          /       \
    SEQ scan     index scan    (Parameters: select condition, ...)
       |             |
    StarsIn      MovieStar

Page 30: CS 245: Database System Principles

CS 245 Notes 6 30

Example: Estimate costs

          L.Q.P
        /   |    \
      P1    P2 ….  Pn
      C1    C2 ….  Cn

Pick best!

Page 31: CS 245: Database System Principles

CS 245 Notes 6 31

Textbook outline

Chapter 15

5  Algebra for queries [bags vs sets]   [Ch 5]
   - Select, project, join, …. [project list a, a+b->x, …]
   - Duplicate elimination, grouping, sorting
15.1  Physical operators   [15.1]
   - Scan, sort, …
15.2 - 15.6  Implementing operators + estimating their cost   [15.2-15.6]

Page 32: CS 245: Database System Principles

CS 245 Notes 6 32

Chapter 16

16.1 [16.1]  Parsing
16.2 [16.2]  Algebraic laws
16.3 [16.3]  Parse tree -> logical query plan
16.4 [16.4]  Estimating result sizes
16.5-7 [16.5-7]  Cost based optimization

Page 33: CS 245: Database System Principles

CS 245 Notes 6 33

Reading textbook - Chapters 15, 16

Optional:
– Sections 15.7, 15.8, 15.9 [15.7, 15.8]
– Sections 16.6, 16.7 [16.6, 16.7]

Optional: Duplicate elimination operator, grouping, aggregation operators

Page 34: CS 245: Database System Principles

CS 245 Notes 6 34

Query Optimization - In class order

• Relational algebra level (A)
• Detailed query plan level
  – Estimate Costs (B)
    • without indexes
    • with indexes
  – Generate and compare plans (C)

Page 35: CS 245: Database System Principles

CS 245 Notes 6 35

SQL query
  -> parse                    -> parse tree
  -> convert                  -> logical query plan
  -> apply laws               -> “improved” l.q.p      (A)
  -> estimate result sizes    -> l.q.p. + sizes        (input: statistics)
  -> consider physical plans  -> {P1,P2,…..}           (C)
  -> estimate costs           -> {(P1,C1),(P2,C2)...}  (B)
  -> pick best                -> Pi                    (C)
  -> execute                  -> answer

Page 36: CS 245: Database System Principles

CS 245 Notes 6 36

Relational algebra optimization

• Transformation rules (preserve equivalence)

• What are good transformations?

Page 37: CS 245: Database System Principles

CS 245 Notes 6 37

Rules: Natural joins & cross products & union

R ⋈ S = S ⋈ R
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)

Page 38: CS 245: Database System Principles

CS 245 Notes 6 38

Note:

• Carry attribute names in results, so order is not important
• Can also write as trees, e.g.:

        ⋈                 ⋈
       / \               / \
      ⋈   T      =      R   ⋈
     / \                   / \
    R   S                 S   T

Page 39: CS 245: Database System Principles

CS 245 Notes 6 39

Rules: Natural joins & cross products & union

R ⋈ S = S ⋈ R
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)

R x S = S x R
(R x S) x T = R x (S x T)

R U S = S U R
R U (S U T) = (R U S) U T
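A quick sanity check of the join laws, as a sketch: tuples are represented as Python dicts keyed by attribute name, which mirrors the note that attribute names are carried in results so column order does not matter. The relations and the helpers `natural_join` and `as_set` are made up for illustration.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[k] == u[k] for k in common):
                out.append({**t, **u})
    return out

def as_set(rel):
    """Order-insensitive view of a relation, for comparing results."""
    return {frozenset(t.items()) for t in rel}

# Hypothetical sample relations R(A,B), S(B,C), T(C,D).
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 5}, {"B": 4, "C": 6}]
T = [{"C": 5, "D": 7}]

commutes = as_set(natural_join(R, S)) == as_set(natural_join(S, R))
associates = (as_set(natural_join(natural_join(R, S), T))
              == as_set(natural_join(R, natural_join(S, T))))
print(commutes, associates)  # True True
```

One example does not prove the laws, but it shows what "equivalent plans" means operationally: different operator orders, same result.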

Page 41: CS 245: Database System Principles

CS 245 Notes 6 41

Rules: Selects

σ_{p1∧p2}(R) = σ_{p1}[σ_{p2}(R)]

σ_{p1∨p2}(R) = [σ_{p1}(R)] U [σ_{p2}(R)]

Page 43: CS 245: Database System Principles

CS 245 Notes 6 43

Bags vs. Sets

R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
R U S = ?

• Option 1 SUM: R U S = {a,a,b,b,b,b,b,c,c,c,d}
• Option 2 MAX: R U S = {a,a,b,b,b,c,c,d}

Page 45: CS 245: Database System Principles

CS 245 Notes 6 45

Option 2 (MAX) makes this rule work:

σ_{p1∨p2}(R) = σ_{p1}(R) U σ_{p2}(R)

Example: R = {a,a,b,b,b,c}
P1 satisfied by a,b; P2 satisfied by b,c

σ_{p1∨p2}(R) = {a,a,b,b,b,c}
σ_{p1}(R) = {a,a,b,b,b}
σ_{p2}(R) = {b,b,b,c}
σ_{p1}(R) U σ_{p2}(R) = {a,a,b,b,b,c}
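Both union options, and the selection rule, can be tried with Python's `collections.Counter`, whose `+` operator adds multiplicities (SUM) and whose `|` operator takes the maximum (MAX). The helper `select` is a hypothetical name for bag selection.

```python
from collections import Counter

R = Counter("aabbbc")   # bag {a,a,b,b,b,c}
S = Counter("bbccd")    # bag {b,b,c,c,d}

union_sum = R + S       # Option 1: multiplicities add
union_max = R | S       # Option 2: take the larger multiplicity

def select(bag, pred):
    """Bag selection: keep each qualifying value with its multiplicity."""
    return Counter({v: n for v, n in bag.items() if pred(v)})

p1 = lambda v: v in "ab"
p2 = lambda v: v in "bc"

lhs = select(R, lambda v: p1(v) or p2(v))   # sigma_{p1 or p2}(R)
rhs_max = select(R, p1) | select(R, p2)     # union with MAX semantics
rhs_sum = select(R, p1) + select(R, p2)     # union with SUM semantics

print(sorted(union_sum.elements()))
print(sorted(union_max.elements()))
print(lhs == rhs_max, lhs == rhs_sum)       # True False
```

Under SUM the three b tuples are counted twice on the right-hand side, which is why the rule fails for that option.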

Page 46: CS 245: Database System Principles

CS 245 Notes 6 46

“Sum” option makes more sense:

Senators (……)     Rep (……)

T1 = Π_{yr,state} Senators;  T2 = Π_{yr,state} Reps

T1:  Yr  State        T2:  Yr  State
     14  CA                16  CA
     16  CA                16  CA
     15  AZ                15  CA

Union?

Page 47: CS 245: Database System Principles

CS 245 Notes 6 47

Executive Decision

-> Use “SUM” option for bag unions
-> Some rules cannot be used for bags

Page 50: CS 245: Database System Principles

CS 245 Notes 6 50

Rules: Project

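Let: X = set of attributes
     Y = set of attributes
     XY = X U Y

Π_xy(R) ≠ Π_x[Π_y(R)] in general: Π_y(R) has already dropped the attributes of X that are not in Y. The cascade Π_x[Π_y(R)] = Π_x(R) is valid only when X ⊆ Y.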

Page 52: CS 245: Database System Principles

CS 245 Notes 6 52

Rules: σ + ⋈ combined

Let p = predicate with only R attribs
    q = predicate with only S attribs
    m = predicate with only R,S attribs

σ_p(R ⋈ S) = [σ_p(R)] ⋈ S

σ_q(R ⋈ S) = R ⋈ [σ_q(S)]

Page 54: CS 245: Database System Principles

CS 245 Notes 6 54

Some rules can be derived. Do one, others for homework:

σ_{p∧q}(R ⋈ S) = [σ_p(R)] ⋈ [σ_q(S)]

σ_{p∧q∧m}(R ⋈ S) = σ_m[(σ_p R) ⋈ (σ_q S)]

σ_{p∨q}(R ⋈ S) = [(σ_p R) ⋈ S] U [R ⋈ (σ_q S)]

Page 56: CS 245: Database System Principles

CS 245 Notes 6 56

--> Derivation for first one:

σ_{p∧q}(R ⋈ S)
  = σ_p[σ_q(R ⋈ S)]
  = σ_p[R ⋈ σ_q(S)]
  = [σ_p(R)] ⋈ [σ_q(S)]
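The derived rule can be checked on a small example. This is a sketch with made-up relations R(A,C) and S(C,E); `natural_join` and `select` are hypothetical helpers over lists of dicts.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    return [{**t, **u} for t in r for u in s
            if all(t[k] == u[k] for k in t.keys() & u.keys())]

def select(rel, pred):
    return [t for t in rel if pred(t)]

# Hypothetical sample data.
R = [{"A": 1, "C": 10}, {"A": 2, "C": 20}, {"A": 2, "C": 30}]
S = [{"C": 10, "E": 5}, {"C": 20, "E": 5}, {"C": 20, "E": 9}]

p = lambda t: t["A"] == 2   # predicate with only R attributes
q = lambda t: t["E"] == 5   # predicate with only S attributes

late = select(natural_join(R, S), lambda t: p(t) and q(t))   # select after join
early = natural_join(select(R, p), select(S, q))             # selections pushed down

print(late, early)
```

Both sides give the same single tuple, but the pushed-down form joins 2 x 2 tuples instead of 3 x 3.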

Page 59: CS 245: Database System Principles

CS 245 Notes 6 59

Rules: Π, σ combined

Let x = subset of R attributes
    z = attributes in predicate P (subset of R attributes)

Π_x[σ_p(R)] = Π_x{σ_p[Π_xz(R)]}

Page 61: CS 245: Database System Principles

CS 245 Notes 6 61

Rules: Π, ⋈ combined

Let x = subset of R attributes
    y = subset of S attributes
    z = intersection of R,S attributes

Π_xy(R ⋈ S) = Π_xy{[Π_xz(R)] ⋈ [Π_yz(S)]}

Page 63: CS 245: Database System Principles

CS 245 Notes 6 63

Π_xy{σ_p(R ⋈ S)} = Π_xy{σ_p[Π_xz’(R) ⋈ Π_yz’(S)]}

z’ = z U {attributes used in P}

Page 64: CS 245: Database System Principles

CS 245 Notes 6 64

Rules for σ, Π combined with X: similar...

e.g., σ_p(R X S) = ?

Page 65: CS 245: Database System Principles

CS 245 Notes 6 65

Rules: σ, U combined

σ_p(R U S) = σ_p(R) U σ_p(S)

σ_p(R - S) = σ_p(R) - S = σ_p(R) - σ_p(S)

Page 66: CS 245: Database System Principles

CS 245 Notes 6 66

Which are “good” transformations?

σ_{p1∧p2}(R)  ->  σ_{p1}[σ_{p2}(R)]
σ_p(R ⋈ S)    ->  [σ_p(R)] ⋈ S
R ⋈ S         ->  S ⋈ R
Π_x[σ_p(R)]   ->  Π_x{σ_p[Π_xz(R)]}

Page 67: CS 245: Database System Principles

CS 245 Notes 6 67

Conventional wisdom: do projects early

Example: R(A,B,C,D,E)   x = {E}   P: (A=3) ∧ (B=“cat”)

Π_x{σ_p(R)}  vs.  Π_E{σ_p{Π_ABE(R)}}

Page 68: CS 245: Database System Principles

CS 245 Notes 6 68

But: What if we have A, B indexes?

B = “cat” index lookup  +  A = 3 index lookup
  -> Intersect pointers to get pointers to matching tuples
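The pointer-intersection idea can be sketched with sets of row positions standing in for pointers. The relation contents and the helper `build_index` are made up for illustration; the point is that projecting R down to (A,B,E) first would bypass the indexes on the stored relation.

```python
from collections import defaultdict

# Hypothetical R(A, B, C, D, E); list positions play the role of tuple pointers.
R = [
    (3, "cat", 0, 0, "e1"),
    (3, "dog", 0, 0, "e2"),
    (7, "cat", 0, 0, "e3"),
    (3, "cat", 1, 1, "e4"),
]

def build_index(rows, col):
    """Map each value of column `col` to the set of row positions holding it."""
    index = defaultdict(set)
    for pos, row in enumerate(rows):
        index[row[col]].add(pos)
    return index

a_index = build_index(R, 0)
b_index = build_index(R, 1)

# Intersect the two pointer sets; only the matching tuples are then fetched.
pointers = a_index[3] & b_index["cat"]
result = [R[pos][4] for pos in sorted(pointers)]   # project E
print(sorted(pointers), result)  # [0, 3] ['e1', 'e4']
```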

Page 69: CS 245: Database System Principles

CS 245 Notes 6 69

Bottom line:

• No transformation is always good
• Usually good: early selections

Page 70: CS 245: Database System Principles

CS 245 Notes 6 70

In textbook: more transformations

• Eliminate common sub-expressions
• Other operations: duplicate elimination

Page 71: CS 245: Database System Principles

CS 245 Notes 6 71

Outline - Query Processing

• Relational algebra level
  – transformations
  – good transformations
• Detailed query plan level
  – estimate costs
  – generate and compare plans

Page 72: CS 245: Database System Principles

CS 245 Notes 6 72

• Estimating cost of query plan
  (1) Estimating size of results
  (2) Estimating # of IOs

Page 73: CS 245: Database System Principles

CS 245 Notes 6 73

Estimating result size

• Keep statistics for relation R
  – T(R) : # tuples in R
  – S(R) : # of bytes in each R tuple
  – B(R) : # of blocks to hold all R tuples
  – V(R, A) : # distinct values in R for attribute A

Page 75: CS 245: Database System Principles

CS 245 Notes 6 75

Example R

A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string

A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d

T(R) = 5      S(R) = 37
V(R,A) = 3    V(R,C) = 5
V(R,B) = 1    V(R,D) = 4
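The statistics for this example can be recomputed directly. The sketch below assumes the relation fits in a Python list and uses the attribute sizes stated on the slide; the variable names are illustrative only.

```python
# Example relation R(A, B, C, D) from the slide.
R = [
    ("cat", 1, 10, "a"),
    ("cat", 1, 20, "b"),
    ("dog", 1, 30, "a"),
    ("dog", 1, 40, "c"),
    ("bat", 1, 50, "d"),
]
ATTRS = ["A", "B", "C", "D"]
SIZES = {"A": 20, "B": 4, "C": 8, "D": 5}   # bytes per attribute, from the slide

T = len(R)                                   # T(R): number of tuples
S = sum(SIZES.values())                      # S(R): bytes per tuple
V = {attr: len({row[i] for row in R})        # V(R, attr): distinct values
     for i, attr in enumerate(ATTRS)}

print(T, S, V)  # 5 37 {'A': 3, 'B': 1, 'C': 5, 'D': 4}
```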

Page 77: CS 245: Database System Principles

CS 245 Notes 6 77

Size estimates for W = R1 x R2

T(W) = T(R1) × T(R2)

S(W) = S(R1) + S(R2)

Page 78: CS 245: Database System Principles

CS 245 Notes 6 78

Size estimate for W = σ_{A=a}(R)

S(W) = S(R)

T(W) = ?

Page 80: CS 245: Database System Principles

CS 245 Notes 6 80

Example R

V(R,A)=3
V(R,B)=1
V(R,C)=5
V(R,D)=4

W = σ_{Z=val}(R)   T(W) =

A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d

What is the probability that this tuple will be in the answer?

Page 81: CS 245: Database System Principles

CS 245 Notes 6 81

Example R

V(R,A)=3
V(R,B)=1
V(R,C)=5
V(R,D)=4

A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d

W = σ_{Z=val}(R)   T(W) = T(R) / V(R,Z)

Page 82: CS 245: Database System Principles

CS 245 Notes 6 82

Assumption:

Values in select expression Z = val are uniformly distributed over possible V(R,Z) values.

Page 83: CS 245: Database System Principles

CS 245 Notes 6 83

Alternate Assumption:

Values in select expression Z = val are uniformly distributed over domain with DOM(R,Z) values.

Page 85: CS 245: Database System Principles

CS 245 Notes 6 85

Example R Alternate assumption

V(R,A)=3 DOM(R,A)=10

V(R,B)=1 DOM(R,B)=10

V(R,C)=5 DOM(R,C)=10

V(R,D)=4 DOM(R,D)=10

A B C D

cat 1 10 a

cat 1 20 b

dog 1 30 a

dog 1 40 c

bat 1 50 d

W = σ_{Z=val}(R)   T(W) = ?

What is the probability that this tuple will be in the answer?

Page 86: CS 245: Database System Principles

CS 245 Notes 6 86

Example R Alternate assumption

V(R,A)=3 DOM(R,A)=10

V(R,B)=1 DOM(R,B)=10

V(R,C)=5 DOM(R,C)=10

V(R,D)=4 DOM(R,D)=10

A B C D

cat 1 10 a

cat 1 20 b

dog 1 30 a

dog 1 40 c

bat 1 50 d

W = σ_{Z=val}(R)   T(W) = T(R) / DOM(R,Z)

Page 87: CS 245: Database System Principles

CS 245 Notes 6 87

Selection cardinality

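SC(R,A) = average # records that satisfy equality condition on R.A

SC(R,A) = T(R) / V(R,A)      (values uniform over the V(R,A) values present)
   or
SC(R,A) = T(R) / DOM(R,A)    (values uniform over the whole domain)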
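As a worked instance of the two formulas, the sketch below plugs in the numbers from the running example (T(R) = 5, V(R,A) = 3, DOM(R,A) = 10). The function names are hypothetical.

```python
def sc_distinct(t_r, v_r_a):
    """Selection cardinality assuming values are uniform over the V(R,A) values present."""
    return t_r / v_r_a

def sc_domain(t_r, dom_r_a):
    """Selection cardinality assuming values are uniform over the whole domain."""
    return t_r / dom_r_a

est_distinct = sc_distinct(5, 3)   # about 1.67 tuples expected for A = val
est_domain = sc_domain(5, 10)      # 0.5 tuples expected for A = val
print(est_distinct, est_domain)
```

The two assumptions disagree by more than a factor of three on the same relation, which is one reason the estimate is only a guide.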

Page 90: CS 245: Database System Principles

CS 245 Notes 6 90

What about W = σ_{z ≥ val}(R) ?

T(W) = ?

• Solution # 1: T(W) = T(R)/2
• Solution # 2: T(W) = T(R)/3

Page 92: CS 245: Database System Principles

CS 245 Notes 6 92

• Solution # 3: Estimate values in range

Example: R has attribute Z with Min = 1, Max = 20, V(R,Z) = 10

W = σ_{z ≥ 15}(R)

f = (20-15+1) / (20-1+1) = 6/20   (fraction of range)

T(W) = f × T(R)

Page 93: CS 245: Database System Principles

CS 245 Notes 6 93

Equivalently:

f × V(R,Z) = fraction of distinct values

T(W) = [f × V(R,Z)] × T(R) / V(R,Z) = f × T(R)
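The range estimate can be written as a small function. This is a sketch assuming integer-valued Z and an inclusive range, as in the slide's arithmetic; T(R) = 1000 is a made-up value, since the slide does not give one.

```python
def range_fraction(z_min, z_max, cutoff):
    """Fraction of the integer range [z_min, z_max] that satisfies z >= cutoff."""
    return (z_max - cutoff + 1) / (z_max - z_min + 1)

f = range_fraction(1, 20, 15)   # (20-15+1)/(20-1+1) = 6/20

t_r = 1000                      # hypothetical T(R), not from the notes
v_r_z = 10                      # V(R,Z) from the slide

t_w = f * t_r                               # T(W) = f x T(R)
t_w_alt = (f * v_r_z) * (t_r / v_r_z)       # the "equivalently" form
print(f, t_w, t_w_alt)
```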

Page 95: CS 245: Database System Principles

CS 245 Notes 6 95

Size estimate for W = R1 ⋈ R2

Let x = attributes of R1
    y = attributes of R2

Case 1: X ∩ Y = ∅

Same as R1 x R2

Page 97: CS 245: Database System Principles

CS 245 Notes 6 97

Case 2: W = R1 ⋈ R2,  X ∩ Y = A

R1(A, B, C)    R2(A, D)

Assumption:

V(R1,A) ≤ V(R2,A)  =>  Every A value in R1 is in R2
V(R2,A) ≤ V(R1,A)  =>  Every A value in R2 is in R1

“containment of value sets” Sec. 7.4.4

Page 99: CS 245: Database System Principles

CS 245 Notes 6 99

Computing T(W) when V(R1,A) ≤ V(R2,A)

R1(A, B, C)    R2(A, D)

Take 1 tuple from R1 and match it against R2.

1 tuple matches with T(R2) / V(R2,A) tuples...

so T(W) = T(R2) × T(R1) / V(R2,A)

Page 100: CS 245: Database System Principles

CS 245 Notes 6 100

• V(R1,A) ≤ V(R2,A):  T(W) = T(R2) × T(R1) / V(R2,A)

• V(R2,A) ≤ V(R1,A):  T(W) = T(R2) × T(R1) / V(R1,A)

[A is common attribute]

Page 101: CS 245: Database System Principles

CS 245 Notes 6 101

In general W = R1 ⋈ R2

T(W) = T(R2) × T(R1) / max{ V(R1,A), V(R2,A) }
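The general formula is a one-liner in code. The sketch below uses made-up statistics (T(R1) = 100, T(R2) = 500, V(R1,A) = 20, V(R2,A) = 50), and `join_size` is a hypothetical name.

```python
def join_size(t_r1, t_r2, v_r1_a, v_r2_a):
    """Estimated T(W) for W = R1 join R2 on a single common attribute A,
    under the containment-of-value-sets assumption."""
    return t_r1 * t_r2 / max(v_r1_a, v_r2_a)

# Hypothetical statistics, not from the notes.
t_w = join_size(100, 500, 20, 50)
print(t_w)  # 1000.0
```

Taking the max covers both containment cases at once: each tuple of the relation with fewer distinct A values matches T/V tuples of the other relation.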

Page 102: CS 245: Database System Principles

CS 245 Notes 6 102

Case 2 with alternate assumption: values uniformly distributed over domain

R1(A, B, C)    R2(A, D)

This R1 tuple matches T(R2)/DOM(R2,A) tuples of R2, so

T(W) = T(R2) × T(R1) / DOM(R2,A) = T(R2) × T(R1) / DOM(R1,A)

[Assume the two domains are the same]

Page 103: CS 245: Database System Principles

CS 245 Notes 6 103

In all cases:

S(W) = S(R1) + S(R2) - S(A)      where S(A) = size of attribute A

Page 104: CS 245: Database System Principles

CS 245 Notes 6 104

Using similar ideas, we can estimate sizes of:

Π_AB(R) ….. Sec. 16.4.2 (same for either edition)

σ_{A=a ∧ B=b}(R) …. Sec. 16.4.3

R ⋈ S with common attribs. A,B,C …. Sec. 16.4.5

Union, intersection, diff, …. Sec. 16.4.7

Page 105: CS 245: Database System Principles

CS 245 Notes 6 105

Note: for complex expressions, need intermediate T,S,V results.

E.g. W = [σ_{A=a}(R1)] ⋈ R2

Treat σ_{A=a}(R1) as relation U:

T(U) = T(R1)/V(R1,A)     S(U) = S(R1)

Also need V(U, *) !!

Page 106: CS 245: Database System Principles

CS 245 Notes 6 106

To estimate Vs

E.g., U = σ_{A=a}(R1)
Say R1 has attribs A,B,C,D

V(U, A) =
V(U, B) =
V(U, C) =
V(U, D) =

Page 108: CS 245: Database System Principles

CS 245 Notes 6 108

Example R1

V(R1,A)=3
V(R1,B)=1
V(R1,C)=5
V(R1,D)=3

U = σ_{A=a}(R1)

A    B  C   D
cat  1  10  10
cat  1  20  20
dog  1  30  10
dog  1  40  30
bat  1  50  10

V(U,A) = 1     V(U,B) = 1     V(U,C) = T(R1) / V(R1,A)

V(U,D) ... somewhere in between

Page 109: CS 245: Database System Principles

CS 245 Notes 6 109

Possible Guess: U = σ_{A=a}(R)

V(U,A) = 1
V(U,B) = V(R,B)

Page 110: CS 245: Database System Principles

CS 245 Notes 6 110

For Joins U = R1(A,B) ⋈ R2(A,C)

V(U,A) = min { V(R1, A), V(R2, A) }
V(U,B) = V(R1, B)
V(U,C) = V(R2, C)

[called “preservation of value sets” in section 7.4.4]

Page 111: CS 245: Database System Principles

CS 245 Notes 6 111

Example:

Z = R1(A,B) ⋈ R2(B,C) ⋈ R3(C,D)

R1:  T(R1) = 1000   V(R1,A)=50    V(R1,B)=100
R2:  T(R2) = 2000   V(R2,B)=200   V(R2,C)=300
R3:  T(R3) = 3000   V(R3,C)=90    V(R3,D)=500

Page 112: CS 245: Database System Principles

CS 245 Notes 6 112

Partial Result: U = R1 ⋈ R2

T(U) = 1000 × 2000 / 200 = 10,000

V(U,A) = 50
V(U,B) = 100
V(U,C) = 300

Page 113: CS 245: Database System Principles

CS 245 Notes 6 113

Z = U ⋈ R3

T(Z) = 1000 × 2000 × 3000 / (200 × 300) = 100,000

V(Z,A) = 50
V(Z,B) = 100
V(Z,C) = 90
V(Z,D) = 500
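The three-way join example can be reproduced by applying the join size formula and the value-count rules twice. This is a sketch: `join_estimate` is a hypothetical helper that handles a single common attribute, and the statistics are the ones given on the slides.

```python
def join_estimate(t1, v1, t2, v2, attr):
    """Estimate (T, V) for a natural join on one common attribute `attr`.

    v1 and v2 map attribute names to distinct-value counts.
    T uses the max rule; V(attr) uses the min rule; other V values are preserved.
    """
    t = t1 * t2 / max(v1[attr], v2[attr])
    v = {**v1, **v2}
    v[attr] = min(v1[attr], v2[attr])
    return t, v

# U = R1(A,B) join R2(B,C)
t_u, v_u = join_estimate(1000, {"A": 50, "B": 100},
                         2000, {"B": 200, "C": 300}, "B")

# Z = U join R3(C,D)
t_z, v_z = join_estimate(t_u, v_u,
                         3000, {"C": 90, "D": 500}, "C")

print(t_u, v_u)
print(t_z, v_z)
```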

Page 114: CS 245: Database System Principles

CS 245 Notes 6 114

A Note on Histograms

[Figure: histogram of R.A; x-axis: A value ranges marked 10, 20, 30, 40; y-axis: number of tuples in R with A value in given range, marked 10 to 40]

σ_{A=val}(R) = ?

Page 115: CS 245: Database System Principles

CS 245 Notes 6 115

Summary

• Estimating size of results is an “art”

• Don’t forget: Statistics must be kept up to date… (cost?)

Page 116: CS 245: Database System Principles

CS 245 Notes 6 116

Outline

• Estimating cost of query plan
  – Estimating size of results: done!
  – Estimating # of IOs: next…

• Generate and compare plans