CS257 Query Optimization


2.4 An Algebraic Query Language

• 2.4.1 Why Do We Need a Special Query Language?
• 2.4.2 What is an Algebra?
• 2.4.3 Overview of Relational Algebra
• 2.4.4 Set Operations on Relations
• 2.4.5 Projection
• 2.4.6 Selection
• 2.4.7 Cartesian Product
• 2.4.8 Natural Joins
• 2.4.9 Theta-Joins
• 2.4.10 Combining Operations to Form Queries
• 2.4.11 Naming and Renaming
• 2.4.12 Relationships Among Operations
• 2.4.13 A Linear Notation for Algebraic Expressions
• 2.4.14 Exercises for Section 2.4

2.4.1 Why Do We Need a Special Query Language?

2.4.2 What is an Algebra?

Numerical algebra operators: +, -, *, /

Relational algebra operators: Projection, Selection, Natural Join, Cartesian Product, Union, Intersect, Minus

2.4.3 Overview of Relational Algebra

• Both numerical algebra and relational algebra are defined by their respective sets of algebraic laws.
• The algebraic laws of numbers are the ones we have learned in algebra and calculus since our high school days.
• The algebraic laws of relational algebra are new to most of us.

Special relational operators:
1. Projection
2. Selection
3. Natural Join

Traditional set operators:
4. Cartesian Product
5. Union
6. Intersect
7. Minus

[Figure: schematic illustrations of the operators SELECT, PROJECT, PRODUCT, UNION, INTERSECTION, DIFFERENCE, (NATURAL) JOIN, and DIVIDE]

QUERY OPTIMIZATION

• In query optimization, the compiler transforms the query into an equivalent form that can be evaluated fastest.

• Illustration: 2 * 3 + 5 * 3

• By the laws of numerical algebra, it can be computed in two ways.

Method I: evaluate the expression as written.

2 * 3 + 5 * 3 = 6 (first) + 15 (second) = 21 (third)

• It takes three operations.

Method II: transform the expression into an equivalent one.

2 * 3 + 5 * 3 = (2 + 5) * 3 = 7 (first) * 3 = 21 (second)

• It takes two operations.

So the compiler chooses Method II.
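The comparison of the two methods can be sketched in code. This is a minimal illustration and not part of the original slides: expressions are represented as nested tuples, the helper names `count_ops`, `evaluate`, and `factor_common` are hypothetical, and only the single rewrite a*c + b*c = (a + b)*c is handled.

```python
# Expressions are nested tuples (op, left, right); leaves are plain numbers.

def count_ops(expr):
    """Number of arithmetic operations needed to evaluate expr."""
    if not isinstance(expr, tuple):
        return 0
    _, left, right = expr
    return 1 + count_ops(left) + count_ops(right)

def evaluate(expr):
    """Evaluate an expression built from '+' and '*'."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b

def factor_common(expr):
    """Rewrite a*c + b*c as (a + b)*c; return expr unchanged otherwise."""
    if isinstance(expr, tuple) and expr[0] == "+":
        left, right = expr[1], expr[2]
        if (isinstance(left, tuple) and isinstance(right, tuple)
                and left[0] == right[0] == "*" and left[2] == right[2]):
            return ("*", ("+", left[1], right[1]), left[2])
    return expr

method_1 = ("+", ("*", 2, 3), ("*", 5, 3))   # 2 * 3 + 5 * 3
method_2 = factor_common(method_1)            # (2 + 5) * 3

print(count_ops(method_1), evaluate(method_1))
print(count_ops(method_2), evaluate(method_2))
```

Both forms evaluate to the same value, but the rewritten one needs one operation fewer, which is the reason an optimizer would prefer it.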

Roles of Relational Algebra

• Query optimization in a DBMS uses a similar idea, with relational algebra in place of numerical algebra.
• So we need to know the algebraic laws of relational algebra well.
• That is the main goal of Ch5 and part of Ch16.

2.4.4 Set Operations on Relations

1. Recall the Venn diagram.

[Figure: Venn diagram of two sets X and Y with elements a, b, c, d, e]

Query Algebra

• In a DBMS, a query is turned into a sequence of operators on U.
• These operators form the query algebra (extended relational algebra).

Query Algebra: Bag Operations

R ∪ S: SUM (# of times in R, # of times in S)
R ∩ S: MIN (# of times in R, # of times in S)

Examples

Two bags: R = {A, B, B}, S = {C, A, B, C}

R ∪ S = {A, A, B, B, B, C, C}
R ∩ S = {A, B}
R − S = {B}, since each element occurs max(0, # of times in R − # of times in S) times
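The three bag results can be checked mechanically. The sketch below is an illustration rather than part of the slides; it uses Python's `collections.Counter`, whose `+`, `&`, and `-` operators add counts, take the minimum of counts, and subtract counts (dropping anything that would fall to zero or below).

```python
from collections import Counter

# The two bags from the example, stored as element -> multiplicity.
R = Counter(["A", "B", "B"])
S = Counter(["C", "A", "B", "C"])

bag_union = R + S          # multiplicities are added
bag_intersection = R & S   # minimum of the two multiplicities
bag_difference = R - S     # multiplicities are subtracted, never below zero

print(sorted(bag_union.elements()))
print(sorted(bag_intersection.elements()))
print(sorted(bag_difference.elements()))
```

The difference contains a single B because B occurs twice in R and once in S, while the one A in R is cancelled by the one A in S.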

Selection: σ_C(R)

SQL:
Select *
from R
where C

C can involve computable formulas.
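As a rough sketch of selection, a relation can be modelled as a list of dictionaries and the condition C as a Python function of one tuple. The relation R, its attributes `a` and `b`, and the helper name `select` are assumptions made for illustration only.

```python
def select(relation, condition):
    """σ_C(R): keep the tuples of R for which condition(t) is true."""
    return [t for t in relation if condition(t)]

# Hypothetical relation R(a, b).
R = [{"a": 1, "b": 5}, {"a": 4, "b": 2}, {"a": 3, "b": 3}]

# The condition may involve a computed formula, here 2 * a > b.
result = select(R, lambda t: 2 * t["a"] > t["b"])
print(result)
```

The first tuple is dropped because 2 * 1 is not greater than 5; the other two tuples satisfy the condition and are kept unchanged.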

Projection: π_L(R)

SQL:
Select L
from R

L can contain:

1. A single attribute of R.
2. An expression x → y: take the attribute x of R and rename it y.
3. An expression E → z, where E is an expression involving attributes of R and z is a new name for the attribute that results from the calculation. Ex: a+b → x
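The three kinds of list entries can be illustrated with a small sketch. It assumes a relation stored as a list of dictionaries and represents each entry of L as a pair of an output name and a function of the tuple; the helper name `project` and the sample relation are hypothetical.

```python
def project(relation, items):
    """π_L(R), where L is a list of (output_name, function_of_tuple) pairs."""
    return [{name: f(t) for name, f in items} for t in relation]

# Hypothetical relation R(a, b).
R = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

L = [
    ("a", lambda t: t["a"]),            # 1. a single attribute
    ("y", lambda t: t["b"]),            # 2. b → y (rename)
    ("x", lambda t: t["a"] + t["b"]),   # 3. a+b → x (computed attribute)
]

result = project(R, L)
print(result)
```

Because the result is a list, duplicate tuples are kept, which matches the bag semantics used in this section.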

Theta Join:

R ⋈_c S = σ_c(R × S)

Natural Join (Special Case of Theta Join):

R ⋈ S = π_L(σ_c(R × S)), where c equates the attributes that R and S have in common and L is the attribute list with the redundant copies dropped.
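The two definitions can be sketched directly. The code below is an illustration under assumptions: relations are lists of dictionaries, the product qualifies each attribute as `relation.attribute` to avoid name clashes, and the sample relations P and SP with their contents are hypothetical.

```python
def product(R, S, r_name, s_name):
    """R × S, with every attribute qualified as 'relation.attribute'."""
    return [
        {**{f"{r_name}.{k}": v for k, v in r.items()},
         **{f"{s_name}.{k}": v for k, v in s.items()}}
        for r in R for s in S
    ]

def theta_join(R, S, r_name, s_name, cond):
    """Theta join as a selection on the product: σ_c(R × S)."""
    return [t for t in product(R, S, r_name, s_name) if cond(t)]

def natural_join(R, S):
    """Equate the common attributes and keep a single copy of each."""
    common = set(R[0]) & set(S[0]) if R and S else set()
    return [{**r, **s} for r in R for s in S
            if all(r[a] == s[a] for a in common)]

# Hypothetical sample data.
P = [{"pnum": "P1", "weight": 12}, {"pnum": "P2", "weight": 17}]
SP = [
    {"snum": "S1", "pnum": "P1", "qty": 300},
    {"snum": "S2", "pnum": "P2", "qty": 400},
    {"snum": "S1", "pnum": "P2", "qty": 200},
]

theta = theta_join(P, SP, "P", "SP", lambda t: t["P.pnum"] == t["SP.pnum"])
natural = natural_join(P, SP)
print(len(theta), sorted(theta[0]))
print(len(natural), sorted(natural[0]))
```

Both joins return the same number of tuples here, but the theta join keeps both copies of `pnum` while the natural join keeps only one.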

SORTING OPERATOR (τ)

• This is the SQL ORDER BY clause, denoted by the operator τ_L(R).
• τ_L(R), where R is a relation and L a list of some of R's attributes, is the relation R with its tuples sorted in the order indicated by L.
• If L is a1, a2, …, an, then tuples are sorted first by a1, with ties broken by a2, and so on through an. By default, sorting is in ascending order.
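A minimal sketch of the sorting operator, assuming a relation stored as a list of dictionaries; the helper name `tau` and the sample relation are hypothetical. Sorting on a tuple of attribute values gives the tie-breaking behaviour described above.

```python
def tau(relation, attrs):
    """Sort by attrs[0], break ties by attrs[1], and so on (ascending)."""
    return sorted(relation, key=lambda t: tuple(t[a] for a in attrs))

# Hypothetical relation R(a, b).
R = [{"a": 2, "b": 1}, {"a": 1, "b": 9}, {"a": 2, "b": 0}]

result = tau(R, ["a", "b"])
print(result)
```

The two tuples with a = 2 are ordered by b, the second attribute in the list.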

Grouping and Aggregation: γ_L(R)

SQL:
Select L (= A, B)
from R
group by A

A: grouping attributes
B: aggregated attributes

• It returns a relation that partitions the tuples of R into groups. Each group consists of all tuples having one particular assignment of values to the grouping attributes A in L.
• L also contains B, the aggregated attributes, in the form: aggregation operator(attribute) → name.

Grouping and Aggregation

• Aggregation operators: AVG, SUM, COUNT, MIN, MAX
• Grouping: GROUP BY clause in SQL
• A HAVING clause must follow a GROUP BY clause.
• Grouping and aggregation are generally implemented together, so a single operator defines both.
• It is a generalized projection operator.
• The duplicate-elimination operator is a special aggregation operator.

Example: γ_{pnum, SUM(qty) → sum}(SP)

Select pnum, sum(qty) as sum
from SP
group by pnum;
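The grouping query can be imitated in a few lines. This is a sketch under assumptions: only the SUM aggregate is implemented, the helper name `gamma_sum` is hypothetical, and the contents of SP are made up for illustration.

```python
from collections import defaultdict

def gamma_sum(relation, group_attrs, agg_attr, out_name):
    """Group by group_attrs and sum agg_attr into a column named out_name."""
    totals = defaultdict(int)
    for t in relation:
        key = tuple(t[a] for a in group_attrs)   # one group per distinct key
        totals[key] += t[agg_attr]
    return [{**dict(zip(group_attrs, key)), out_name: total}
            for key, total in totals.items()]

# Hypothetical shipments relation SP(snum, pnum, qty).
SP = [
    {"snum": "S1", "pnum": "P1", "qty": 300},
    {"snum": "S2", "pnum": "P1", "qty": 100},
    {"snum": "S1", "pnum": "P2", "qty": 200},
]

result = gamma_sum(SP, ["pnum"], "qty", "sum")
print(result)
```

Each distinct value of `pnum` produces exactly one output tuple, which is why duplicate elimination can be seen as grouping on all attributes with no aggregate.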

Some Rules about Selection:

1. σ_{c1 and c2}(R) = σ_{c1}(σ_{c2}(R))
2. σ_{c1 or c2}(R) = σ_{c1}(R) ∪_S σ_{c2}(R), where ∪_S is set union

3. σ_c(R ∪ S) = σ_c(R) ∪ σ_c(S)
4. σ_c(R − S) = σ_c(R) − σ_c(S)
5. σ_c(R ⋈ S) = σ_c(R) ⋈ σ_c(S)

whenever c applies to R or S.
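Laws of this kind can be spot-checked on small relations. The sketch below is an illustration, not a proof: it checks the splitting of a conjunction (rule 1) and the pushing of a selection into the join argument that carries the attribute. The relations P and SP, their contents, and the helper names are assumptions.

```python
def select(R, cond):
    """Keep the tuples of R that satisfy cond."""
    return [t for t in R if cond(t)]

def join_on_pnum(P, SP):
    """Join P and SP on equal pnum, keeping one copy of pnum."""
    return [{**p, **sp} for p in P for sp in SP if p["pnum"] == sp["pnum"]]

# Hypothetical sample data.
P = [{"pnum": "P1", "weight": 12}, {"pnum": "P2", "weight": 5}]
SP = [
    {"snum": "S1", "pnum": "P1", "qty": 300},
    {"snum": "S1", "pnum": "P2", "qty": 200},
]

heavy = lambda t: t["weight"] > 10
large = lambda t: t["qty"] >= 250
joined = join_on_pnum(P, SP)

# Rule 1: a conjunction equals two selections applied one after the other.
both_at_once = select(joined, lambda t: heavy(t) and large(t))
one_by_one = select(select(joined, large), heavy)

# Pushing the selection on weight below the join, onto P only.
select_after_join = select(joined, heavy)
select_before_join = join_on_pnum(select(P, heavy), SP)

print(both_at_once == one_by_one, select_after_join == select_before_join)
```

Selecting before the join gives the same answer while feeding fewer tuples into the join, which is what makes the rewrite attractive.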

Rules about (Generalized) Projection:

6. We may introduce a (generalized) projection anywhere in an expression tree, as long as it eliminates only attributes that are never used by any operator above it.

7. π_L(R ⋈_c S) = π_L(π_M(R) ⋈_c π_N(S)), where M and N are the attributes of R and S, respectively, that appear in L or in c
8. π_L(R ∪ S) = π_L(R) ∪ π_L(S)

Query Optimization Example

Select p.pname, p.pnum, sum(sp.qty) as sum
from Parts p, Shipments sp
where p.pnum = sp.pnum
and p.weight > 10
group by p.pname, p.pnum
having sum(sp.qty) >= 200;
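To see what the query returns, it can be run against a small in-memory database. The sketch uses Python's standard `sqlite3` module; the column layouts of Parts and Shipments and all the rows are hypothetical, chosen only so that each clause of the query has a visible effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts (pnum TEXT, pname TEXT, weight INTEGER);
    CREATE TABLE Shipments (snum TEXT, pnum TEXT, qty INTEGER);
    INSERT INTO Parts VALUES
        ('P1', 'nut', 12), ('P2', 'bolt', 17), ('P3', 'screw', 5);
    INSERT INTO Shipments VALUES
        ('S1', 'P1', 300), ('S2', 'P1', 100),
        ('S1', 'P2', 150), ('S1', 'P3', 400);
""")

rows = conn.execute("""
    SELECT p.pname, p.pnum, SUM(sp.qty) AS sum
    FROM Parts p, Shipments sp
    WHERE p.pnum = sp.pnum
      AND p.weight > 10
    GROUP BY p.pname, p.pnum
    HAVING SUM(sp.qty) >= 200
""").fetchall()

print(rows)
conn.close()
```

With this data, P3 is removed by the weight condition, P2 is removed by the HAVING clause because its total is only 150, and P1 remains with a total of 400.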

Translating to Query Algebra

Step 1: P × SP
Step 2: σ_{P.PNUM = SP.PNUM and P.WEIGHT > 10}(P × SP)
Step 3: σ_{SUM >= 200}(γ_{P.PNAME, P.PNUM, SUM(SP.QTY) → SUM}(result of Step 2))
Step 4: π_{P.PNAME, P.PNUM, SUM}(result of Step 3)

Expression tree, root first:

π_{P.PNAME, P.PNUM, SUM}
  σ_{SUM >= 200}
    γ_{P.PNAME, P.PNUM, SUM(SP.QTY) → SUM}
      σ_{P.PNUM = SP.PNUM and P.WEIGHT > 10}
        ×
          P
          SP

Computing in QA

The outer operators π_{P.PNAME, P.PNUM, SUM}(σ_{SUM >= 200}(γ_{P.PNAME, P.PNUM, SUM(SP.QTY) → SUM}( ... ))) stay fixed; only the innermost argument is rewritten. The labels (1), (5), and (6) name the rule applied, and (D) means "by definition".

σ_{P.WEIGHT > 10 and P.PNUM = SP.PNUM}(P × SP)

=(1) σ_{P.WEIGHT > 10}(σ_{P.PNUM = SP.PNUM}(P × SP))

=(6) σ_{P.WEIGHT > 10}(π_L(σ_{P.PNUM = SP.PNUM}(P × SP)))

=(D) σ_{P.WEIGHT > 10}(P ⋈_{P.PNUM = SP.PNUM} SP)

=(5) σ_{P.WEIGHT > 10}(P) ⋈_{P.PNUM = SP.PNUM} σ_{P.WEIGHT > 10}(SP)

=(D) σ_{P.WEIGHT > 10}(P) ⋈_{P.PNUM = SP.PNUM} SP, because P.WEIGHT is not an attribute of SP, so the condition is treated as always true (= 1) on SP

Final Expression

π_{P.PNAME, P.PNUM, SUM}(σ_{SUM >= 200}(γ_{P.PNAME, P.PNUM, SUM(SP.QTY) → SUM}(σ_{P.WEIGHT > 10}(P) ⋈_{P.PNUM = SP.PNUM} SP)))

Expression tree of the final expression, root first:

π_{P.PNAME, P.PNUM, SUM}
  σ_{SUM >= 200}
    γ_{P.PNAME, P.PNUM, SUM(SP.QTY) → SUM}
      ⋈_{P.PNUM = SP.PNUM}
        σ_{P.WEIGHT > 10}
          P
        SP

The final expression is computationally cheaper than the original one, because σ_{P.WEIGHT > 10}(P) is smaller than P.
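The cost difference can be made concrete by counting how many pairs of tuples each plan has to inspect. The sketch below is a simplified cost model under assumptions: it ignores grouping and projection, it counts only the pairs formed by the product or join, and the contents of P and SP are hypothetical.

```python
# Hypothetical sample data.
P = [
    {"pnum": "P1", "weight": 12}, {"pnum": "P2", "weight": 17},
    {"pnum": "P3", "weight": 5},  {"pnum": "P4", "weight": 8},
]
SP = [
    {"snum": "S1", "pnum": "P1", "qty": 300},
    {"snum": "S2", "pnum": "P1", "qty": 100},
    {"snum": "S1", "pnum": "P2", "qty": 150},
    {"snum": "S1", "pnum": "P3", "qty": 400},
    {"snum": "S2", "pnum": "P4", "qty": 50},
]

def original_plan(P, SP):
    """Select on the full product P × SP."""
    pairs = [(p, sp) for p in P for sp in SP]
    kept = [(p, sp) for p, sp in pairs
            if p["pnum"] == sp["pnum"] and p["weight"] > 10]
    return kept, len(pairs)

def optimized_plan(P, SP):
    """Filter P by weight first, then join the smaller relation with SP."""
    heavy = [p for p in P if p["weight"] > 10]
    pairs = [(p, sp) for p in heavy for sp in SP]
    kept = [(p, sp) for p, sp in pairs if p["pnum"] == sp["pnum"]]
    return kept, len(pairs)

original_result, original_count = original_plan(P, SP)
optimized_result, optimized_count = optimized_plan(P, SP)
print(original_count, optimized_count, original_result == optimized_result)
```

Both plans return the same tuples, but the second one inspects half as many pairs here because only two of the four parts survive the weight filter.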

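Alternative tree, with the grouping pushed below the join:

π_{P.PNAME, P.PNUM, SUM}
  σ_{SUM >= 200}
    ⋈_{P.PNUM = SP.PNUM}
      σ_{P.WEIGHT > 10}
        P
      γ_{SP.PNUM, SUM(SP.QTY) → SUM}
        SP

Alternative tree, with the selection on SUM also pushed below the join:

π_{P.PNAME, P.PNUM, SUM}
  ⋈_{P.PNUM = SP.PNUM}
    σ_{P.WEIGHT > 10}
      P
    σ_{SUM >= 200}
      γ_{SP.PNUM, SUM(SP.QTY) → SUM}
        SP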

Sample Division

DEND DIVIDE BY DOR

DEND:
S#   P#
S1   P1
S1   P2
S1   P3
S1   P4
S1   P5
S1   P6
S2   P1
S2   P2
S3   P2
S4   P2
S4   P4
S4   P5

DOR (P#)                   Result (S#)
P1                         S1, S2
P2, P4                     S1, S4
P1, P2, P3, P4, P5, P6     S1
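The three division results can be reproduced with a short sketch. The DEND pairs are taken from the sample; the helper name `divide` and the representation of relations as Python sets are assumptions made for illustration.

```python
def divide(dend, dor):
    """Suppliers that are paired in dend with every part listed in dor."""
    suppliers = sorted({s for s, _ in dend})
    return [s for s in suppliers if all((s, p) in dend for p in dor)]

# DEND(S#, P#) from the sample.
DEND = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"),
    ("S1", "P4"), ("S1", "P5"), ("S1", "P6"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
}

result_1 = divide(DEND, {"P1"})
result_2 = divide(DEND, {"P2", "P4"})
result_3 = divide(DEND, {"P1", "P2", "P3", "P4", "P5", "P6"})
print(result_1, result_2, result_3)
```

A supplier appears in the quotient only if it supplies every part in the divisor, so the larger the divisor, the smaller the result.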

EXPRESSION TREES

• An expression tree is generated by combining several query algebra operators into one expression, applying one operator to the result(s) of one or more other operators.
• The leaves of the tree are names of relations.
• Interior nodes are operators, which are applied to the relations represented by their child or children.
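An expression tree of this shape can be evaluated bottom-up. The sketch below is one possible representation, not the one used by any particular DBMS: the classes `Leaf` and `Node`, the operator functions, and the sample database are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Leaf:
    name: str            # name of a stored relation

@dataclass
class Node:
    op: Callable         # operator applied to the children's results
    children: List       # one child for unary operators, two for binary ones

def evaluate(tree, database):
    """Evaluate the children first, then apply the operator at the node."""
    if isinstance(tree, Leaf):
        return database[tree.name]
    args = [evaluate(child, database) for child in tree.children]
    return tree.op(*args)

def select_heavy(R):
    return [t for t in R if t["weight"] > 10]

def join_on_pnum(R, S):
    return [{**r, **s} for r in R for s in S if r["pnum"] == s["pnum"]]

# Hypothetical sample database.
database = {
    "P": [{"pnum": "P1", "weight": 12}, {"pnum": "P2", "weight": 5}],
    "SP": [{"snum": "S1", "pnum": "P1", "qty": 300},
           {"snum": "S1", "pnum": "P2", "qty": 200}],
}

# Tree for: (selection of heavy parts from P) joined with SP.
tree = Node(join_on_pnum, [Node(select_heavy, [Leaf("P")]), Leaf("SP")])
result = evaluate(tree, database)
print(result)
```

Rewriting a query with the algebraic laws amounts to replacing one such tree with an equivalent tree that is cheaper to evaluate.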
