8/12/2019 Fo Optimization2
1/32
Query processing and optimization
Reading (5th edition): Chapters 6.1-6.3, 15.1-15.3, 15.7-15.8.2
Jose M. Peña
[Figure: ER diagram, relational model, MySQL]
Relation schema
Attributes and their domains:
PNumber: yymmdd-xxxx
Name: textual string of fewer than 30 characters
Address: textual string of fewer than 30 characters
Telephone: rrr-nn nn nn
E-mail: aaaaannn
Age: positive integer
Relation (state)
PNumber      Name                 Address      Telephone     E-mail    Age
123456-7890  Anders Andersson     Rydsvägen 1  013-11 22 33  andan111  25
112233-4455  Veronika Pettersson  Alstersg. 2  013-22 33 44  verpe222  27
Tuple = list of values in the corresponding domains, or NULL
Key constraints
A relation is a set of tuples.
Hence, no duplicates are allowed.
Hence, every tuple is uniquely identifiable (superkey, candidate key, primary key, which are all time-invariant).
PNumber      Name                 Address      Telephone     E-mail    Age
123456-7890  Anders Andersson     Rydsvägen 1  013-11 22 33  andan111  25
112233-4455  Veronika Pettersson  Alstersg. 2  013-22 33 44  verpe222  27
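Because keys are time-invariant, a single relation state can refute a proposed superkey but can never prove one. A minimal Python sketch of that check, assuming a relation state is a list of dicts keyed by attribute name (the representation and the helper name `is_superkey_in_state` are hypothetical, and only three of the six attributes are included):

```python
def is_superkey_in_state(state, attrs):
    """Return False if two tuples of this state agree on every attribute in attrs.

    True only means this state does not refute attrs; being a superkey is a
    time-invariant property of the schema, not of a single state.
    """
    seen = set()
    for t in state:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False  # two tuples share these values, so attrs cannot tell them apart
        seen.add(value)
    return True

# Subset of the relation state shown above.
person = [
    {"PNumber": "123456-7890", "Name": "Anders Andersson", "Age": 25},
    {"PNumber": "112233-4455", "Name": "Veronika Pettersson", "Age": 27},
]
print(is_superkey_in_state(person, ["PNumber"]))  # True
print(is_superkey_in_state(person, []))           # False: the empty set identifies nothing
```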
Integrity constraints
Entity integrity constraint: no primary key value is NULL.
Let FK be a set of attributes of relation R1 and PK the primary key of relation R2. Conditions: (i) domain(FK) = domain(PK) and (ii) every value of FK in R1 refers to an existing tuple in R2 or is NULL.
Referential integrity constraint: conditions (i) and (ii) above hold.
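A minimal Python sketch of checking both constraints on data, assuming relation states are lists of dicts and that `None` stands for NULL. The states `R1`, `R2`, `R1_bad` and the function names are hypothetical; condition (i) concerns the schema, so only condition (ii) is checked here.

```python
# Hypothetical states in the slide's notation: R1.FK references the primary key PK of R2.
R2 = [{"PK": 1}, {"PK": 2}]
R1 = [{"FK": 1}, {"FK": None}]
R1_bad = [{"FK": 3}]  # 3 is not the PK value of any R2 tuple

def entity_integrity_ok(relation, pk):
    """Entity integrity: no primary key value is NULL."""
    return all(t[a] is not None for t in relation for a in pk)

def referential_integrity_ok(r1, fk, r2, pk):
    """Condition (ii): every FK value in r1 matches an existing PK value in r2 or is NULL.

    Condition (i), domain(FK) = domain(PK), is a schema property and is not checked.
    A composite FK counts as NULL if any component is None (a simplifying assumption).
    """
    pk_values = {tuple(t[a] for a in pk) for t in r2}
    for t in r1:
        value = tuple(t[a] for a in fk)
        if any(v is None for v in value):
            continue  # a NULL foreign key is allowed
        if value not in pk_values:
            return False
    return True

print(entity_integrity_ok(R2, ["PK"]))                       # True
print(referential_integrity_ok(R1, ["FK"], R2, ["PK"]))      # True
print(referential_integrity_ok(R1_bad, ["FK"], R2, ["PK"]))  # False
```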
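Select
Selects the tuples of a relation that satisfy some condition over its attributes.
$\sigma_{\text{condition}}(R)$, where the condition compares attributes (e.g. $A_1$, $A_2$, $A_3$) with constants (e.g. $X$, $Y$, $Z$) and combines the comparisons with AND ($\wedge$) and OR ($\vee$).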
Example: select
STUDENT:
PNum         Name     Address         TelNr
112233-4455  Elin     Rydsvägen 1     112233
223344-5566  Nisse    Alstersgatan 3  223344
334455-6677  Nisse    Rydsvägen 3     334455
113322-1122  Pelle    Rydsvägen 2     113322
552233-1144  Monika   Rydsvägen 4     443322
442211-2222  Patrik   Rydsvägen 6     111122
334433-1111  Camilla  Alstersgatan 1  665544

$\sigma_{(Name = \text{'Nisse'} \wedge TelNr = \text{'334455'}) \vee Name = \text{'Camilla'}}(STUDENT)$:
PNum         Name     Address         TelNr
334455-6677  Nisse    Rydsvägen 3     334455
334433-1111  Camilla  Alstersgatan 1  665544
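The select example can be replayed with a minimal Python sketch, assuming a relation is a list of dicts and the condition is a Python predicate (the helper name `select` is hypothetical):

```python
ATTRS = ("PNum", "Name", "Address", "TelNr")
ROWS = [
    ("112233-4455", "Elin", "Rydsvägen 1", "112233"),
    ("223344-5566", "Nisse", "Alstersgatan 3", "223344"),
    ("334455-6677", "Nisse", "Rydsvägen 3", "334455"),
    ("113322-1122", "Pelle", "Rydsvägen 2", "113322"),
    ("552233-1144", "Monika", "Rydsvägen 4", "443322"),
    ("442211-2222", "Patrik", "Rydsvägen 6", "111122"),
    ("334433-1111", "Camilla", "Alstersgatan 1", "665544"),
]
STUDENT = [dict(zip(ATTRS, row)) for row in ROWS]

def select(relation, condition):
    """sigma_condition(relation): keep the tuples for which condition(t) holds."""
    return [t for t in relation if condition(t)]

result = select(
    STUDENT,
    lambda t: (t["Name"] == "Nisse" and t["TelNr"] == "334455") or t["Name"] == "Camilla",
)
for t in result:
    print(t["PNum"], t["Name"])
# 334455-6677 Nisse
# 334433-1111 Camilla
```

Only one of the two Nisse tuples qualifies, because the AND part also requires TelNr = '334455'.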
Project
Projects a relation onto some of its attributes.
The result must be a relation, so duplicates are removed.
$\pi_{A_1, A_2, A_3}(R)$
Example: project
STUDENT:
PNum         Name   Address         TelNr
112233-4455  Elin   Rydsvägen 1     112233
223344-5566  Nisse  Alstersgatan 3  223344
334455-6677  Nisse  Rydsvägen 3     334455

$\pi_{PNum, Name}(STUDENT)$:
PNum         Name
112233-4455  Elin
223344-5566  Nisse
334455-6677  Nisse

$\pi_{Name}(STUDENT)$ = ?
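A minimal Python sketch of projection, assuming a relation is a list of dicts; collecting the projected values into a set of tuples is what removes duplicates (the helper name `project` is hypothetical):

```python
STUDENT = [
    {"PNum": "112233-4455", "Name": "Elin", "Address": "Rydsvägen 1", "TelNr": "112233"},
    {"PNum": "223344-5566", "Name": "Nisse", "Address": "Alstersgatan 3", "TelNr": "223344"},
    {"PNum": "334455-6677", "Name": "Nisse", "Address": "Rydsvägen 3", "TelNr": "334455"},
]

def project(relation, attrs):
    """pi_attrs(relation): keep only attrs; the set removes duplicate tuples."""
    return {tuple(t[a] for a in attrs) for t in relation}

print(sorted(project(STUDENT, ["PNum", "Name"])))  # three tuples: PNum keeps the two Nisses apart
print(sorted(project(STUDENT, ["Name"])))          # [('Elin',), ('Nisse',)]
```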
Union, intersection and difference
$R \cup S$, $R \cap S$, $R - S$
R and S must be union-compatible, i.e. have the same number of attributes with the same domains.
The result must be a relation, so duplicates are removed (union).
Example: intersection
STUDENT:
PNum         Name   Address         TelNr
112233-4455  Elin   Rydsvägen 1     112233
223344-5566  Nisse  Alstersgatan 3  223344
334455-6677  Nisse  Rydsvägen 3     334455

EMPLOYEE:
PNum         Name    Office address  TelNr
884455-4455  Monika  Teknikringen 1  111112
223344-5566  Nisse   Alstersgatan 3  223344
668877-7766  Patrik  Teknikringen 3  332211

$STUDENT \cap EMPLOYEE$:
PNum         Name   Address         TelNr
223344-5566  Nisse  Alstersgatan 3  223344
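Because relations are sets of tuples, Python's built-in set operators give a minimal sketch of union, intersection and difference. The compatibility check below is a simplification (hypothetical helper `union_compatible`): it compares only the number of attributes and assumes the domains match, which is why Address and Office address can be paired.

```python
STUDENT_ATTRS = ("PNum", "Name", "Address", "TelNr")
EMPLOYEE_ATTRS = ("PNum", "Name", "Office address", "TelNr")
STUDENT = {
    ("112233-4455", "Elin", "Rydsvägen 1", "112233"),
    ("223344-5566", "Nisse", "Alstersgatan 3", "223344"),
    ("334455-6677", "Nisse", "Rydsvägen 3", "334455"),
}
EMPLOYEE = {
    ("884455-4455", "Monika", "Teknikringen 1", "111112"),
    ("223344-5566", "Nisse", "Alstersgatan 3", "223344"),
    ("668877-7766", "Patrik", "Teknikringen 3", "332211"),
}

def union_compatible(r_attrs, s_attrs):
    # Only the number of attributes is compared; equal domains are assumed.
    return len(r_attrs) == len(s_attrs)

if union_compatible(STUDENT_ATTRS, EMPLOYEE_ATTRS):
    intersection = STUDENT & EMPLOYEE  # R ∩ S
    union = STUDENT | EMPLOYEE         # R ∪ S: the shared Nisse tuple appears once
    difference = STUDENT - EMPLOYEE    # R - S
    print(intersection)  # {('223344-5566', 'Nisse', 'Alstersgatan 3', '223344')}
    print(len(union), len(difference))  # 5 2
```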
Cartesian product
R:
Name           STATE
Los Angeles    Calif
Oakland        Calif
Atlanta        Ga
San Francisco  Calif
Boston         Mass

S:
Key  City
5    San Francisco
7    Oakland
8    Boston

$R \times S$:
Name           STATE  Key  City
Los Angeles    Calif  5    San Francisco
Los Angeles    Calif  7    Oakland
Los Angeles    Calif  8    Boston
Oakland        Calif  5    San Francisco
Oakland        Calif  7    Oakland
Oakland        Calif  8    Boston
Atlanta        Ga     5    San Francisco
Atlanta        Ga     7    Oakland
Atlanta        Ga     8    Boston
San Francisco  Calif  5    San Francisco
San Francisco  Calif  7    Oakland
San Francisco  Calif  8    Boston
Boston         Mass   5    San Francisco
Boston         Mass   7    Oakland
Boston         Mass   8    Boston
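A minimal Python sketch of the Cartesian product, assuming relations are lists of tuples: `itertools.product` pairs every tuple of R with every tuple of S, and each pair is concatenated (the helper name `cartesian_product` is hypothetical).

```python
from itertools import product

R = [("Los Angeles", "Calif"), ("Oakland", "Calif"), ("Atlanta", "Ga"),
     ("San Francisco", "Calif"), ("Boston", "Mass")]
S = [(5, "San Francisco"), (7, "Oakland"), (8, "Boston")]

def cartesian_product(r, s):
    """R x S: concatenate every tuple of r with every tuple of s."""
    return [tr + ts for tr, ts in product(r, s)]

rxs = cartesian_product(R, S)
print(len(rxs))  # 15 = 5 * 3
print(rxs[0])    # ('Los Angeles', 'Calif', 5, 'San Francisco')
```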
Join
Combines two tuples from two relations if they satisfy some condition over their attributes.
Join = Cartesian product followed by selection.
Tuples with NULL in the condition attributes do not appear in the result.
Recall: join only on foreign key-primary key attributes.
$R \bowtie_{R.A_1 = S.B_3 \text{ AND } R.A_5 \ldots} S$
Example: join
R:
Name           STATE
Los Angeles    Calif
Oakland        Calif
Atlanta        Ga
San Francisco  Calif
Boston         Mass

S:
Key  City
5    San Francisco
7    Oakland
8    Boston

$R \bowtie_{R.Name = S.City} S$:
Name           STATE  Key  City
Oakland        Calif  7    Oakland
San Francisco  Calif  5    San Francisco
Boston         Mass   8    Boston
$R \times S$, before the selection R.Name = S.City:
Name           STATE  Key  City
Los Angeles    Calif  5    San Francisco
Los Angeles    Calif  7    Oakland
Los Angeles    Calif  8    Boston
Oakland        Calif  5    San Francisco
Oakland        Calif  7    Oakland
Oakland        Calif  8    Boston
Atlanta        Ga     5    San Francisco
Atlanta        Ga     7    Oakland
Atlanta        Ga     8    Boston
San Francisco  Calif  5    San Francisco
San Francisco  Calif  7    Oakland
San Francisco  Calif  8    Boston
Boston         Mass   5    San Francisco
Boston         Mass   7    Oakland
Boston         Mass   8    Boston
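Join as "Cartesian product followed by selection" can be sketched in Python, assuming relations are lists of tuples and the condition is a predicate over a combined row keyed `R.<attr>` / `S.<attr>` (the helper names `join` and `sql_eq` are hypothetical). Python's `None == None` is `True`, unlike SQL NULL, so `sql_eq` makes any comparison involving `None` fail; this is what keeps tuples with NULL in the condition attributes out of the result.

```python
from itertools import product

R_ATTRS, S_ATTRS = ("Name", "STATE"), ("Key", "City")
R = [("Los Angeles", "Calif"), ("Oakland", "Calif"), ("Atlanta", "Ga"),
     ("San Francisco", "Calif"), ("Boston", "Mass")]
S = [(5, "San Francisco"), (7, "Oakland"), (8, "Boston")]

def sql_eq(a, b):
    """Equality where a comparison involving NULL (None) never holds."""
    return a is not None and b is not None and a == b

def join(r, r_attrs, s, s_attrs, condition):
    """R join_condition S: Cartesian product followed by a selection."""
    result = []
    for tr, ts in product(r, s):
        row = dict(zip(("R." + a for a in r_attrs), tr))
        row.update(zip(("S." + a for a in s_attrs), ts))
        if condition(row):
            result.append(tr + ts)
    return result

joined = join(R, R_ATTRS, S, S_ATTRS, lambda t: sql_eq(t["R.Name"], t["S.City"]))
for t in joined:
    print(t)
# ('Oakland', 'Calif', 7, 'Oakland')
# ('San Francisco', 'Calif', 5, 'San Francisco')
# ('Boston', 'Mass', 8, 'Boston')
```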
Example: join
R:
Name           Area
Los Angeles    2
Oakland        9
Atlanta        7
San Francisco  11
Boston         16

S:
Key  City
5    San Francisco
7    Oakland
8    Boston

$R \bowtie_{R.Area \ldots} S$:
Name         Area  Key  City
Los Angeles  2     5    San Francisco
Los Angeles  2     7    Oakland
Los Angeles  2     8    Boston
...
$R \times S$:
Name           Area  Key  City
Los Angeles    2     5    San Francisco
Los Angeles    2     7    Oakland
Los Angeles    2     8    Boston
Oakland        9     5    San Francisco
Oakland        9     7    Oakland
Oakland        9     8    Boston
Atlanta        7     5    San Francisco
Atlanta        7     7    Oakland
Atlanta        7     8    Boston
San Francisco  11    5    San Francisco
San Francisco  11    7    Oakland
San Francisco  11    8    Boston
Boston         16    5    San Francisco
Boston         16    7    Oakland
Boston         16    8    Boston
Variants of join
Theta join = join. Equijoin = join with only equality conditions.
Natural join = equijoin in which one of each pair of duplicate attributes is removed (the attributes in the conditions must have the same name).
Unless otherwise specified, natural join joins on all the attributes with the same name in R and S.
$R *_{A} S$
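A minimal Python sketch of natural join, assuming relations are given as an attribute-name tuple plus a list of value tuples (the helper name `natural_join` is hypothetical). It equates every shared attribute name and keeps one copy of each; in the example, S's City attribute has been renamed to Name (a hypothetical renaming) so that R and S share it, which makes the result lose the duplicate City column that the equijoin keeps.

```python
def natural_join(r_attrs, r, s_attrs, s):
    """R * S: equijoin on all shared attribute names, keeping one copy of each.

    With no shared attributes the condition is empty and the result is R x S.
    """
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in common]
    rows = []
    for tr in r:
        dr = dict(zip(r_attrs, tr))
        for ts in s:
            ds = dict(zip(s_attrs, ts))
            # NULL (None) never satisfies an equality condition.
            if all(dr[a] is not None and dr[a] == ds[a] for a in common):
                rows.append(tr + tuple(ds[a] for a in extra))
    return tuple(r_attrs) + tuple(extra), rows

attrs, rows = natural_join(
    ("Name", "STATE"),
    [("Los Angeles", "Calif"), ("Oakland", "Calif"), ("Atlanta", "Ga"),
     ("San Francisco", "Calif"), ("Boston", "Mass")],
    ("Key", "Name"),  # hypothetical: S with City renamed to Name
    [(5, "San Francisco"), (7, "Oakland"), (8, "Boston")],
)
print(attrs)  # ('Name', 'STATE', 'Key')
print(rows)   # [('Oakland', 'Calif', 7), ('San Francisco', 'Calif', 5), ('Boston', 'Mass', 8)]
```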
Example
Query trees
A query tree represents a relational algebra expression. Leaves = base tables. Internal nodes = relational algebra operators applied to the node's children. The tree is executed from the leaves to the root.
Example: List the last names of the employees born after 1957 who work on the project Aquarius.
SELECT E.LNAME
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.PNAME = 'Aquarius' AND P.PNUMBER = W.PNO AND W.ESSN = E.SSN AND E.BDATE > '1957-12-31'
Canonical query tree
SELECT attributes FROM A, B, C WHERE condition
corresponds to the tree $\pi_{attributes}(\sigma_{condition}((A \times B) \times C))$.
Construct the canonical query tree as follows:
1. Cartesian product of the FROM tables.
2. Select with the WHERE condition.
3. Project onto the SELECT attributes.
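The three construction steps can be sketched in Python, assuming a hypothetical tuple encoding of tree nodes; the FROM tables are combined left-deep and the tree is printed as an algebra expression, with `x`, `sigma` and `pi` standing for ×, σ and π (the function names are hypothetical):

```python
from functools import reduce

def canonical_tree(attributes, tables, condition):
    """Build pi[attributes](sigma[condition](T1 x T2 x ... x Tn)).

    Hypothetical node encoding: ("table", name), ("product", left, right),
    ("select", condition, child), ("project", attributes, child).
    """
    leaves = [("table", t) for t in tables]
    # 1. Cartesian product of the FROM tables, nested left-deep.
    product_tree = reduce(lambda left, right: ("product", left, right), leaves)
    # 2. Select with the WHERE condition; 3. project onto the SELECT attributes.
    return ("project", attributes, ("select", condition, product_tree))

def to_algebra(node):
    """Render a tree as a relational algebra string."""
    kind = node[0]
    if kind == "table":
        return node[1]
    if kind == "product":
        return f"({to_algebra(node[1])} x {to_algebra(node[2])})"
    if kind == "select":
        return f"sigma[{node[1]}]({to_algebra(node[2])})"
    if kind == "project":
        return f"pi[{', '.join(node[1])}]({to_algebra(node[2])})"
    raise ValueError(f"unknown node kind: {kind}")

tree = canonical_tree(["attributes"], ["A", "B", "C"], "condition")
print(to_algebra(tree))  # pi[attributes](sigma[condition](((A x B) x C)))
```

The tree is correct but, as the later slides note, usually very inefficient: the selection only runs after the full product has been built.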
Equivalent query trees
Overview
[Figure: users 1 to 4 send queries and updates to the database management system and receive answers; the system processes the queries and accesses the stored data in the physical database, which is a model of the real world.]
Query processing
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
SELECT movieTitle
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960');
Canonical query tree (usually very inefficient)
Parsing and validating
Check the relations used: they have to be declared in FROM and must exist in the database.
Check and resolve attributes: attributes must exist in the relations.
Type checking: attributes that are compared must be of the same type.
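A minimal Python sketch of the three checks, assuming a hypothetical catalog that maps relation names to attribute types (the types chosen for StarsIn and MovieStar are assumptions). For brevity the query is described as flat lists of relations, attributes and compared attribute pairs, which simplifies away the subquery's separate FROM clause.

```python
# Hypothetical catalog: relation name -> {attribute: type}. The types are assumptions.
CATALOG = {
    "StarsIn": {"movieTitle": "string", "movieYear": "int", "starName": "string"},
    "MovieStar": {"name": "string", "address": "string",
                  "gender": "string", "birthdate": "string"},
}

def validate(from_tables, attributes, comparisons, catalog=CATALOG):
    """Return a list of error messages; an empty list means the checks passed."""
    errors = []
    # Relations used must exist in the database.
    for table in from_tables:
        if table not in catalog:
            errors.append(f"unknown relation {table}")

    def attr_type(ref):
        # Resolve (relation, attribute): the relation must be declared in FROM
        # and the attribute must exist in it.
        table, attr = ref
        if table not in from_tables:
            errors.append(f"{table} is not declared in FROM")
            return None
        if attr not in catalog.get(table, {}):
            errors.append(f"unknown attribute {table}.{attr}")
            return None
        return catalog[table][attr]

    for ref in attributes:
        attr_type(ref)
    # Type checking: compared attributes must have the same type.
    for left, right in comparisons:
        lt, rt = attr_type(left), attr_type(right)
        if lt is not None and rt is not None and lt != rt:
            errors.append(f"type mismatch: {left[0]}.{left[1]} ({lt}) "
                          f"vs {right[0]}.{right[1]} ({rt})")
    return errors

# starName IN (SELECT name ...) compares StarsIn.starName with MovieStar.name.
print(validate(["StarsIn", "MovieStar"], [("StarsIn", "movieTitle")],
               [(("StarsIn", "starName"), ("MovieStar", "name"))]))  # []
print(validate(["StarsIn"], [("StarsIn", "movieTitle")],
               [(("StarsIn", "starName"), ("StarsIn", "movieYear"))]))
```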
Query optimizer: heuristic
Heuristic: use joins instead of Cartesian products followed by selections, and perform selections and projections as early as possible, to keep the intermediate tables as small as possible, because:
If the tables do not fit in memory, then fewer disk accesses are needed.
If the tables fit in memory, then less memory is used.
If the tables have to be sorted, joined, etc., then less computation is needed.
Projection first: $\sigma_{ENTRY\_DATE > \text{2001-08-30}}(\pi_{ORDER\_ID, ENTRY\_DATE}(ORDER))$
ORDER: n = 6 tuples, 4 + 4 + 27 (= 35) bytes each, total 210 bytes
After $\pi_{ORDER\_ID, ENTRY\_DATE}$: n = 6 tuples, 4 + 27 (= 31) bytes each, total 186 bytes
After $\sigma_{ENTRY\_DATE > \text{2001-08-30}}$: n = 2 tuples, 4 + 27 (= 31) bytes each, total 62 bytes
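Selection first: $\pi_{ORDER\_ID, ENTRY\_DATE}(\sigma_{ENTRY\_DATE > \text{2001-08-30}}(ORDER))$
ORDER: n = 6 tuples, 4 + 4 + 27 (= 35) bytes each, total 210 bytes
After $\sigma_{ENTRY\_DATE > \text{2001-08-30}}$: n = 2 tuples, 4 + 4 + 27 (= 35) bytes each, total 70 bytes
After $\pi_{ORDER\_ID, ENTRY\_DATE}$: n = 2 tuples, 4 + 27 (= 31) bytes each, total 62 bytes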
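The byte counts of the two plans can be reproduced with a few lines of Python; the tuple counts and tuple widths come from the example, and `plan_sizes` is a hypothetical helper that multiplies them per intermediate table.

```python
def plan_sizes(steps):
    """Each step is (number of tuples, bytes per tuple); return bytes per table."""
    return [n * width for n, width in steps]

FULL = 4 + 4 + 27    # bytes per ORDER tuple
PROJECTED = 4 + 27   # bytes per tuple after projecting onto ORDER_ID, ENTRY_DATE

project_first = plan_sizes([(6, FULL), (6, PROJECTED), (2, PROJECTED)])
select_first = plan_sizes([(6, FULL), (2, FULL), (2, PROJECTED)])
print(project_first)  # [210, 186, 62]
print(select_first)   # [210, 70, 62]
```

Both plans end with the same 62-byte result, but selecting first shrinks the intermediate table from 186 to 70 bytes, which is the point of the heuristic.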
Query optimizer: heuristic algorithm
1. Break up conjunctive selections into a cascade of selections.
2. Move selections down the tree as far as possible.
3. Rearrange the selections so that the most restrictive ones are executed first. Most restrictive: fewest tuples? Smallest size? Smallest selectivity? The DBMS catalog contains the required information.
4. Convert a Cartesian product followed by a selection into a join.
5. Move projections down the tree as far as possible. Create new projections so that only the required attributes are involved in the tree.
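Steps 1 and 2 can be sketched in Python on the employee query from the query-tree slide, assuming a hypothetical tuple encoding in which every selection records the attributes it uses. This is a minimal sketch, not a full optimizer: steps 3 to 5 are not implemented, and the catalog statistics that step 3 needs are not modeled.

```python
# Hypothetical tree encoding:
#   ("table", name, attributes)  base table (leaf)
#   ("product", left, right)     Cartesian product
#   ("select", cond, child)      selection; cond = (text, attributes it uses)

def attrs_of(node):
    """Attributes available at a node."""
    if node[0] == "product":
        return attrs_of(node[1]) | attrs_of(node[2])
    if node[0] == "select":
        return attrs_of(node[2])
    return node[2]  # table leaf

def break_up(conds, child):
    """Step 1: sigma[c1 AND ... AND cn](child) becomes sigma[c1](...sigma[cn](child))."""
    for cond in reversed(conds):
        child = ("select", cond, child)
    return child

def place(cond, tree):
    """Put selection cond as deep in tree as the attributes it uses allow."""
    needed = cond[1]
    if tree[0] == "product":
        left, right = tree[1], tree[2]
        if needed <= attrs_of(left):
            return ("product", place(cond, left), right)
        if needed <= attrs_of(right):
            return ("product", left, place(cond, right))
    elif tree[0] == "select":
        # Selections commute, so cond may slide below an existing selection.
        return ("select", tree[1], place(cond, tree[2]))
    return ("select", cond, tree)

def push_down(node):
    """Step 2: move every selection down the tree as far as possible."""
    if node[0] == "product":
        return ("product", push_down(node[1]), push_down(node[2]))
    if node[0] == "select":
        return place(node[1], push_down(node[2]))
    return node

E = ("table", "EMPLOYEE", frozenset({"E.SSN", "E.LNAME", "E.BDATE"}))
W = ("table", "WORKS_ON", frozenset({"W.ESSN", "W.PNO"}))
P = ("table", "PROJECT", frozenset({"P.PNUMBER", "P.PNAME"}))
c_name = ("P.PNAME = 'Aquarius'", frozenset({"P.PNAME"}))
c_pno = ("P.PNUMBER = W.PNO", frozenset({"P.PNUMBER", "W.PNO"}))
c_ssn = ("W.ESSN = E.SSN", frozenset({"W.ESSN", "E.SSN"}))
c_bdate = ("E.BDATE > '1957-12-31'", frozenset({"E.BDATE"}))

canonical = break_up([c_name, c_pno, c_ssn, c_bdate], ("product", ("product", E, W), P))
optimized = push_down(canonical)

def show(node):
    if node[0] == "table":
        return node[1]
    if node[0] == "product":
        return f"({show(node[1])} x {show(node[2])})"
    return f"sigma[{node[1][0]}]({show(node[2])})"

print(show(optimized))
```

In the result, the selection on P.PNAME sits directly above PROJECT and the one on E.BDATE directly above EMPLOYEE, while the two join conditions stay above the products they connect, which is where step 4 would turn them into joins.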
Equivalence rules
Exercises
True or false?
SELECT *
FROM ol_order_line, it_item
WHERE ol_item_id = it_item_id
AND ol_order_id = 1001
Optimize the queries below:
Execution plans
Execution plan: an optimized query tree extended with access methods and algorithms to implement the operations.