
CS257 Query Optimization


Page 1: CS257 Query Optimization


CS257

Query Optimization

Page 2: CS257 Query Optimization

2.4 An Algebraic Query Language

Page 3: CS257 Query Optimization

2.4 An Algebraic Query Language

• 2.4.1 Why Do We Need a Special Query Language?
• 2.4.2 What is an Algebra?
• 2.4.3 Overview of Relational Algebra
• 2.4.4 Set Operations on Relations
• 2.4.5 Projection
• 2.4.6 Selection
• 2.4.7 Cartesian Product
• 2.4.8 Natural Joins
• 2.4.9 Theta-Joins
• 2.4.10 Combining Operations to Form Queries
• 2.4.11 Naming and Renaming
• 2.4.12 Relationships Among Operations
• 2.4.13 A Linear Notation for Algebraic Expressions
• 2.4.14 Exercises for Section 2.4

Page 4: CS257 Query Optimization

2.4.1 Why Do We Need a Special Query Language?

Page 5: CS257 Query Optimization

2.4.2 What is an Algebra?

Numerical algebra operators: +, -, *, /

Relational algebra operators: Projection, Selection, Natural Join, Cartesian Product, Union, Intersect, Minus

Page 6: CS257 Query Optimization

2.4.3 Overview of Relational Algebra

• Both numerical algebra and relational algebra are defined by their respective sets of algebraic laws.

Page 7: CS257 Query Optimization

2.4.3 Overview of Relational Algebra

• The algebraic laws of numbers are the ones we have learned in algebra, calculus, and so on since our high school days.

Page 8: CS257 Query Optimization

2.4.3 Overview of Relational Algebra

• The algebraic laws of relational algebra are “new” to most of us.

Page 9: CS257 Query Optimization

2.4.3 Overview of Relational Algebra

Special Relational Operators

1. Projection
2. Selection
3. Natural Join

Traditional Set Operators

4. Cartesian Product
5. Union
6. Intersect
7. Minus

Page 10: CS257 Query Optimization

[Figure: schematic of the operators SELECT, PROJECT, PRODUCT, UNION, INTERSECTION, DIFFERENCE, (NATURAL) JOIN, and DIVIDE]

Page 11: CS257 Query Optimization

QUERY OPTIMIZATION

• In query optimization, the compiler transforms the query into an equivalent form that solves the problem “fastest”.

Page 12: CS257 Query Optimization

QUERY OPTIMIZATION

• Illustration: 2 * 3 + 5 * 3

• By the laws of numerical algebra, it can be computed as follows.

Method I: three operations

• 2 * 3 + 5 * 3 = 6 (first) + 15 (second) = 21 (third)

Page 13: CS257 Query Optimization

QUERY OPTIMIZATION

• Illustration: 2 * 3 + 5 * 3

• By the laws of numerical algebra, it can also be computed as follows.

Method II: transform it into an equivalent expression

• 2 * 3 + 5 * 3 = (2 + 5) * 3 = 7 (first) * 3 = 21 (second)

• This takes only two operations, so the compiler chooses Method II.
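The two methods can be compared with a minimal Python sketch. The function names `method_one` and `method_two` are illustrative only, and the operation counts are written out by hand to mirror the slide rather than measured.

```python
def method_one(a, b, c):
    """Evaluate a*c + b*c as written; returns (value, operation count)."""
    first = a * c               # operation 1
    second = b * c              # operation 2
    return first + second, 3    # the addition is operation 3


def method_two(a, b, c):
    """Evaluate the equivalent form (a + b) * c; returns (value, operation count)."""
    total = a + b               # operation 1
    return total * c, 2         # the multiplication is operation 2


value_one, ops_one = method_one(2, 5, 3)
value_two, ops_two = method_two(2, 5, 3)
print(value_one, ops_one)
print(value_two, ops_two)
```

Both functions return the same value; only the amount of work differs, which is the property an optimizer relies on when it rewrites an expression.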

Page 14: CS257 Query Optimization

Roles of Relational Algebra

• QUERY OPTIMIZATION

• A similar idea is used in relational algebra.

• So we need to know the algebraic laws of relational algebra well.

• That is the main goal of Ch5 and part of Ch16.

Page 15: CS257 Query Optimization

2.4.4 Set Operations on Relations

1. Recall Venn Diagram

[Figure: Venn diagram of sets X and Y with elements a, b, c, d, e]

Page 16: CS257 Query Optimization

Query Algebra

In a DBMS, a query is turned into a sequence of operators on U.

• These operators form the query algebra (extended relational algebra).

Page 17: CS257 Query Optimization

Query Algebra

Bag operations:

R ∪ S: SUM (# of times in R, # of times in S)

R ∩ S: MIN (# of times in R, # of times in S)

Page 18: CS257 Query Optimization

Examples

Two bags: R = {A, B, B}

S = {C, A, B, C}

R ∪ S = {A, A, B, B, B, C, C}

R ∩ S = {A, B}

R − S = {B} (Obvious?)
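The bag examples above can be reproduced with Python's standard `collections.Counter`, which stores one count per element. This is only an illustrative sketch: `Counter` addition sums the counts, `&` takes the minimum, and `-` subtracts counts and drops anything that would fall to zero or below.

```python
from collections import Counter

# Each bag is stored as element -> number of occurrences.
R = Counter(["A", "B", "B"])
S = Counter(["C", "A", "B", "C"])

bag_union = R + S          # counts are added
bag_intersection = R & S   # minimum of the two counts
bag_difference = R - S     # counts are subtracted, never below zero

print(sorted(bag_union.elements()))
print(sorted(bag_intersection.elements()))
print(sorted(bag_difference.elements()))
```

The difference keeps one B because B occurs twice in R and once in S, while the single A in R is cancelled by the A in S.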

Page 19: CS257 Query Optimization

Selection: σC(R)

SQL: Select *

from R

where C

C can involve computable formulas.
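A minimal Python sketch of selection, assuming a relation is represented as a list of dictionaries (a bag of tuples). The helper name `select` and the sample attributes `a` and `b` are hypothetical.

```python
def select(relation, condition):
    """Selection: keep the tuples of the relation for which condition(t) holds."""
    return [t for t in relation if condition(t)]


R = [{"a": 1, "b": 5}, {"a": 4, "b": 2}, {"a": 3, "b": 3}]

# The condition C may be a computed formula, here 2 * a > b.
selected = select(R, lambda t: 2 * t["a"] > t["b"])
print(selected)
```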

Page 20: CS257 Query Optimization

Projection: πL(R)

Select L

from R

Page 21: CS257 Query Optimization

Projection: πL(R)

• L can have:

1. A single attribute of R.

2. An expression x → y: we take the attribute x of R and rename it as y.

3. An expression E → z, where E is an expression involving attributes of R and z is a new name for the attribute that results from the calculation. Example: a + b → x
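The three kinds of list entries can be sketched in Python as follows. This assumes a relation is a list of dictionaries; the helper name `project` and the idea of describing L as a mapping from output names to functions are illustrative choices, not part of the slides.

```python
def project(relation, spec):
    """Generalized projection: spec maps each output name to a function of the input tuple."""
    return [{name: f(t) for name, f in spec.items()} for t in relation]


R = [{"a": 1, "b": 2, "x": 10}, {"a": 3, "b": 4, "x": 20}]

projected = project(R, {
    "a": lambda t: t["a"],           # 1. a single attribute of R
    "y": lambda t: t["x"],           # 2. x → y (rename x as y)
    "z": lambda t: t["a"] + t["b"],  # 3. a + b → z (computed attribute)
})
print(projected)
```

Because the result is a list, duplicate tuples are kept, which matches bag semantics.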

Page 22: CS257 Query Optimization

Theta Join:

R ⋈c S = σc(R × S)

Natural Join (Special Case of Theta Join)

R ⋈ S = πL(σc(R × S)), where c equates the attributes that R and S have in common, and L is the attribute list with the redundant attributes dropped.
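Both joins can be sketched in Python under the assumption that a relation is a list of dictionaries. The helper names and sample data are hypothetical. For the theta join the attribute names are assumed to be distinct already (prefixed with the relation name); the natural join merges matching tuples, so each shared attribute appears only once.

```python
def product(R, S):
    """Cartesian product; assumes R and S have no attribute names in common."""
    return [{**r, **s} for r in R for s in S]


def theta_join(R, S, c):
    """Theta join as a selection with condition c over the product."""
    return [t for t in product(R, S) if c(t)]


def natural_join(R, S):
    """Natural join: pair tuples that agree on all shared attributes."""
    out = []
    for r in R:
        for s in S:
            common = r.keys() & s.keys()
            if all(r[k] == s[k] for k in common):
                out.append({**r, **s})  # the shared attributes are kept once
    return out


P = [{"P.PNUM": 1, "P.WEIGHT": 12}, {"P.PNUM": 2, "P.WEIGHT": 8}]
SP = [{"SP.PNUM": 1, "SP.QTY": 100}, {"SP.PNUM": 1, "SP.QTY": 150},
      {"SP.PNUM": 2, "SP.QTY": 50}]
joined = theta_join(P, SP, lambda t: t["P.PNUM"] == t["SP.PNUM"])

A = [{"pnum": 1, "weight": 12}, {"pnum": 2, "weight": 8}]
B = [{"pnum": 1, "qty": 100}, {"pnum": 3, "qty": 5}]
natural = natural_join(A, B)
print(len(joined), natural)
```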

Page 23: CS257 Query Optimization

SORTING OPERATOR (τ)

• This is the SQL ORDER BY clause, denoted by the operator τ.

• τL(R), where R is a relation and L is a list of some of R’s attributes, is the relation R but with the tuples of R sorted in the order indicated by L.

• If L is a1, a2, …, an, then the tuples are sorted first by a1, with ties broken by a2, and so on up to an. By default, sorting is in ascending order.
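The sorting order described above corresponds to sorting on a tuple of key values. A minimal Python sketch, assuming a relation is a list of dictionaries; the helper name `sort_relation` and the sample data are illustrative.

```python
def sort_relation(relation, attrs):
    """Sort by the first attribute in attrs, breaking ties with the next ones, ascending."""
    return sorted(relation, key=lambda t: tuple(t[a] for a in attrs))


R = [{"a1": 2, "a2": "x"}, {"a1": 1, "a2": "z"}, {"a1": 1, "a2": "y"}]
ordered = sort_relation(R, ["a1", "a2"])
print(ordered)
```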

Page 24: CS257 Query Optimization

Grouping and Aggregation: γL(R)

Select L (= A, B)

from R

group by A

A: grouping attributes

B: aggregated attributes

Page 25: CS257 Query Optimization

Grouping and Aggregation: γL(R)

It partitions the tuples of R into groups and returns a relation with one tuple per group. Each group consists of all tuples having one particular assignment of values to the grouping attributes A in L.

L also contains B, the aggregated attributes, each in the form: aggregation operator → name (for example, SUM(qty) → sum).

Page 26: CS257 Query Optimization

Grouping and Aggregation

• Aggregation operators: AVG, SUM, COUNT, MIN, MAX

• Grouping: GROUP BY clause in SQL

• A HAVING clause must follow a GROUP BY clause

Page 27: CS257 Query Optimization

Grouping and Aggregation

• Grouping and aggregation are generally implemented together, so a single operator (γ) defines both.

• It is a generalized projection operator.

• The duplicate-elimination operator is a special case of the aggregation operator (group by all attributes, with no aggregates).

Page 28: CS257 Query Optimization

γpnum, SUM(qty) → sum (SP)

Select pnum, sum(qty) as sum

from SP

group by pnum;
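The grouping query above can be sketched in Python as follows. The relation SP is assumed to be a list of dictionaries, the sample rows are made up for illustration, and the helper `group_sum` handles only the SUM aggregate over a single grouping attribute.

```python
from collections import defaultdict


def group_sum(relation, group_attr, agg_attr, out_name):
    """Group by one attribute and sum another; returns one tuple per group."""
    totals = defaultdict(int)
    for t in relation:
        totals[t[group_attr]] += t[agg_attr]
    return [{group_attr: key, out_name: total} for key, total in totals.items()]


# Hypothetical shipment rows.
SP = [{"pnum": "P1", "qty": 100}, {"pnum": "P2", "qty": 50},
      {"pnum": "P1", "qty": 150}]

grouped = group_sum(SP, "pnum", "qty", "sum")
print(grouped)
```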

Page 29: CS257 Query Optimization

Some rules on selection:

1. σc1 and c2(R) = σc1(σc2(R))

2. σc1 or c2(R) = (σc1 R) ∪S (σc2 R)

where ∪S denotes set union (rule 2 assumes R is a set)
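Rules 1 and 2 can be checked on a small example. The sketch below assumes R is a set of tuples and that the union in rule 2 is set union; the conditions `c1` and `c2` and the data are made up for illustration.

```python
def select(relation, c):
    """Selection over a set of tuples."""
    return {t for t in relation if c(t)}


R = {(1, 10), (2, 20), (3, 30), (4, 40)}


def c1(t):
    return t[0] >= 2


def c2(t):
    return t[1] <= 30


# Rule 1: a conjunction can be split into a cascade of selections.
lhs1 = select(R, lambda t: c1(t) and c2(t))
rhs1 = select(select(R, c2), c1)

# Rule 2: a disjunction can be split into a set union of selections.
lhs2 = select(R, lambda t: c1(t) or c2(t))
rhs2 = select(R, c1) | select(R, c2)

print(lhs1 == rhs1, lhs2 == rhs2)
```

With bags instead of sets, the right-hand side of rule 2 would count a tuple twice when it satisfies both conditions, which is why the rule is stated with set union.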

Page 30: CS257 Query Optimization


Some Rules about Selection:

3.σc(R S) = (σcR ) (σc S)

4.σc(R S) = (σcR) (σc S)

5. σc(R ⋈ S) = (σc R) ⋈ (σc S)

whenever c applies to R or S

Page 31: CS257 Query Optimization


Rules about (generalized) Projection:

6. We may introduce a (generalized) projection anywhere in an expression tree, as long as it eliminates only attributes that are never used by any operator above

Page 32: CS257 Query Optimization

Some Rules about Projection:

7. πL(R ⋈c S) = πL(πM(R) ⋈c πN(S)), where M and N are the attributes of R and S, respectively, that appear in L or in c

8. πL(R ∪ S) = πL(R) ∪ πL(S)

Page 33: CS257 Query Optimization


Query Optimization Example

Select p.pname, p.pnum, sum(sp.qty) as sum

from Parts p, Shipments sp

where p.pnum = sp.pnum

and p.weight > 10

group by p.pname, p.pnum

having sum(sp.qty) >= 200;

Page 34: CS257 Query Optimization

Translating to Query Algebra

Step 1: P × SP

Step 2: σP.PNUM = SP.PNUM and P.WEIGHT > 10 (P × SP)

Step 3: σSUM >= 200 (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM (result of Step 2))

Step 4: πP.PNAME, P.PNUM, SUM (result of Step 3)

Page 35: CS257 Query Optimization

Expression tree (root at top, children indented):

πP.PNAME, P.PNUM, SUM
  σSUM >= 200
    γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM
      σP.PNUM = SP.PNUM and P.WEIGHT > 10
        ×
          P
          SP

Page 36: CS257 Query Optimization

Computing in QA

= πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σP.WEIGHT > 10 and P.PNUM = SP.PNUM (P × SP)))

=(1)

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σP.WEIGHT > 10 (σP.PNUM = SP.PNUM (P × SP))))

Page 37: CS257 Query Optimization

Computing in QA

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σP.WEIGHT > 10 (σP.PNUM = SP.PNUM (P × SP))))

=(6)

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σP.WEIGHT > 10 (πL (σP.PNUM = SP.PNUM (P × SP)))))

Page 38: CS257 Query Optimization

Computing in QA

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σP.WEIGHT > 10 (πL (σP.PNUM = SP.PNUM (P × SP)))))

=D

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σP.WEIGHT > 10 (P ⋈P.PNUM = SP.PNUM SP)))

Page 39: CS257 Query Optimization

Computing in QA

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σP.WEIGHT > 10 (P ⋈P.PNUM = SP.PNUM SP)))

=(5)

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM ((σP.WEIGHT > 10 (P)) ⋈P.PNUM = SP.PNUM (σP.WEIGHT > 10 (SP))))

Page 40: CS257 Query Optimization

Computing in QA

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM ((σP.WEIGHT > 10 (P)) ⋈P.PNUM = SP.PNUM (σP.WEIGHT > 10 (SP))))

=D

πP.PNAME, P.PNUM, SUM (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM ((σP.WEIGHT > 10 (P)) ⋈P.PNUM = SP.PNUM ((σP.WEIGHT > 10 = 1)(SP)))) =

Page 41: CS257 Query Optimization

Final Expression

= πP.PNAME, P.PNUM, SUM (σSUM >= 200 (γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM ((σP.WEIGHT > 10 (P)) ⋈P.PNUM = SP.PNUM (SP))))

Page 42: CS257 Query Optimization

Expression tree (root at top, children indented):

πP.PNAME, P.PNUM, SUM
  σSUM >= 200
    γP.PNAME, P.PNUM, SUM(SP.QTY) → SUM
      ⋈P.PNUM = SP.PNUM
        σP.WEIGHT > 10
          P
        SP

Page 43: CS257 Query Optimization

Final Expression

The final expression is computationally cheaper than the initial translation because σP.WEIGHT > 10 (P) is generally smaller than P, so the join processes fewer tuples.
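The saving can be illustrated with a small Python sketch that counts intermediate tuples for both plans. The sample rows for P and SP are made up for illustration, and counting the pairs examined is only a rough stand-in for real cost, not how a DBMS estimates it.

```python
# Hypothetical sample data: 4 parts and 6 shipments.
P = [{"P.PNUM": n, "P.WEIGHT": w} for n, w in [(1, 5), (2, 12), (3, 8), (4, 20)]]
SP = [{"SP.PNUM": n, "SP.QTY": q}
      for n, q in [(1, 100), (2, 150), (2, 80), (3, 40), (4, 60), (4, 30)]]

# Initial translation: build the full product, then select.
product = [{**p, **s} for p in P for s in SP]
naive = [t for t in product
         if t["P.PNUM"] == t["SP.PNUM"] and t["P.WEIGHT"] > 10]

# Final expression: select on P first, then join with SP.
heavy = [p for p in P if p["P.WEIGHT"] > 10]
pairs_examined = len(heavy) * len(SP)
optimized = [{**p, **s} for p in heavy for s in SP
             if p["P.PNUM"] == s["SP.PNUM"]]

print(len(product), pairs_examined, naive == optimized)
```

Both plans return the same tuples, but the second one examines half as many pairs on this data because only two of the four parts weigh more than 10.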

Page 44: CS257 Query Optimization

Expression tree (root at top, children indented):

πP.PNAME, P.PNUM, SUM
  σSUM >= 200
    ⋈P.PNUM = SP.PNUM
      σP.WEIGHT > 10
        P
      γP.PNUM, SUM(SP.QTY) → SUM
        SP

Page 45: CS257 Query Optimization

Expression tree (root at top, children indented):

πP.PNAME, P.PNUM, SUM
  ⋈P.PNUM = SP.PNUM
    σP.WEIGHT > 10
      P
    σSUM >= 200
      γP.PNUM, SUM(SP.QTY) → SUM
        SP

Page 46: CS257 Query Optimization

Sample Division

DEND DIVIDE BY DOR

DEND (S#, P#):

S#  P#
S1  P1
S1  P2
S1  P3
S1  P4
S1  P5
S1  P6
S2  P1
S2  P2
S3  P2
S4  P2
S4  P4
S4  P5

Three sample divisors DOR and the result of each division:

DOR (P#)                  Result (S#)
P1                        S1, S2
P2, P4                    S1, S4
P1, P2, P3, P4, P5, P6    S1
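The division results can be reproduced with a short Python sketch. It assumes DEND is a set of (S#, P#) pairs and DOR is a set of P# values; the function name `divide` is illustrative.

```python
def divide(dend, dor):
    """Return every S# that is paired in dend with all P# values in dor."""
    suppliers = {s for s, _ in dend}
    return {s for s in suppliers if all((s, p) in dend for p in dor)}


DEND = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"), ("S1", "P5"), ("S1", "P6"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
}

result_one = divide(DEND, {"P1"})
result_two = divide(DEND, {"P2", "P4"})
result_three = divide(DEND, {"P1", "P2", "P3", "P4", "P5", "P6"})
print(sorted(result_one), sorted(result_two), sorted(result_three))
```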

Page 47: CS257 Query Optimization

EXPRESSION TREES

• An expression tree is generated by combining several query algebra operators into one expression, applying one operator to the result(s) of one or more other operators.

• The leaves of the tree are names of relations.

• Interior nodes are operators, each applied to the relation(s) represented by its child or children.
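A minimal Python sketch of such a tree and its bottom-up evaluation. The class names `Leaf` and `Node`, the function `evaluate`, and the toy relations are all hypothetical; relations are simplified to sets of integers to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Leaf:
    name: str            # name of a stored relation


@dataclass
class Node:
    op: Callable         # operator applied to the children's results
    children: tuple      # one or more subtrees


def evaluate(tree, database):
    """Evaluate the tree bottom-up: leaves look up relations, nodes apply operators."""
    if isinstance(tree, Leaf):
        return database[tree.name]
    return tree.op(*(evaluate(child, database) for child in tree.children))


def union(r, s):
    return r | s


def select_odd(r):
    return {x for x in r if x % 2 == 1}


database = {"R": {1, 2, 3}, "S": {3, 4, 5}}

# Tree for: select_odd(R ∪ S)
tree = Node(select_odd, (Node(union, (Leaf("R"), Leaf("S"))),))
answer = evaluate(tree, database)
print(sorted(answer))
```

An optimizer rewrites such a tree into an equivalent one before evaluation, for example by moving a selection closer to the leaves.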