Query processing & Query optimization
Master: Dr. Alesheikh
Student: Mohsen Yousefzadeh (9413374)
What is Query Processing?
The activities involved in retrieving data from the database are called query processing.
It is a three-step process that transforms a high-level query (SQL) into an equivalent and more efficient lower-level query (in relational algebra).
Query
A query is a statement written by the user in a high-level language, here SQL.
Scanning, Parsing, and Validating
Scanner: identifies the query tokens, such as SQL keywords, attribute names, and relation names, that appear in the text of the query.
Parser: checks the query syntax to determine whether the query is formulated according to the grammar rules of the query language.
Validation: checks that all attribute and relation names are valid and semantically meaningful names in the schema of the database being queried.
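As a rough illustration of the scanning and validation steps, the following Python sketch splits a query string into tokens and then checks the names against a schema catalog. The regular expression, the KEYWORDS set, the CATALOG dictionary, and the function names are all assumptions made for this example. The syntax check performed by the parser is omitted, since a real DBMS uses a full grammar for it.

```python
import re

# Hypothetical keyword set and schema catalog, for illustration only.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND"}
CATALOG = {"EMPLOYEE": {"Lname", "Ssn", "Bdate"}}

# One alternative per token kind: number, quoted string, word, other symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|('[^']*')|([A-Za-z_]\w*)|(\S))")

def scan(text):
    """Scanner: split the query text into (kind, value) tokens."""
    tokens = []
    for number, string, word, symbol in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUMBER", number))
        elif string:
            tokens.append(("STRING", string))
        elif word:
            kind = "KEYWORD" if word.upper() in KEYWORDS else "NAME"
            tokens.append((kind, word))
        else:
            tokens.append(("SYMBOL", symbol))
    return tokens

def validate(tokens):
    """Validation: return every name that is not a known relation or attribute."""
    known = set(CATALOG)
    for attributes in CATALOG.values():
        known |= attributes
    return [value for kind, value in tokens if kind == "NAME" and value not in known]

tokens = scan("SELECT Lname FROM EMPLOYEE WHERE Bdate > '1957-12-31'")
unknown_names = validate(scan("SELECT Salary FROM EMPLOYEE"))
```

Here `tokens` holds eight tokens for the first query, and `unknown_names` lists `Salary`, because that attribute is not in the assumed catalog.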
Query optimizer
A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.
The query optimizer module has the task of producing a good execution plan. It selects the execution plan with the lowest estimated cost.
For example: "Find all female senators who own a business."
This query is actually a composition of two subqueries. "Find all female senators" is a selection query. "Find all senators who own a business" is a join query, because we combine two tables to process it.
SENATOR(name, Soc-sec, gender, District(polygon))
BUSINESS(B-name, owner, Soc-sec, Location(point))
The question is in which order these subqueries should be processed: select before join, or join before select?
Remember that a join is a multiscan query, whereas a select is a single-scan query. The select should therefore be done before the join, because it reduces the number of SENATOR rows that the join has to process.
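The effect of the ordering can be sketched in Python by counting how many row pairs a simple nested-loop join compares under each order. The sample rows, the lower-case column names, and the nested-loop strategy are assumptions made for this illustration; they only loosely follow the SENATOR and BUSINESS tables above.

```python
# Hypothetical sample rows, for illustration only.
senators = [
    {"name": "A", "soc_sec": 1, "gender": "F"},
    {"name": "B", "soc_sec": 2, "gender": "M"},
    {"name": "C", "soc_sec": 3, "gender": "F"},
    {"name": "D", "soc_sec": 4, "gender": "M"},
]
businesses = [
    {"b_name": "Bakery", "soc_sec": 1},
    {"b_name": "Garage", "soc_sec": 2},
    {"b_name": "Cafe", "soc_sec": 4},
]

def nested_loop_join(left, right):
    """Join on soc_sec and count how many row pairs were compared."""
    result, comparisons = [], 0
    for s in left:
        for b in right:
            comparisons += 1
            if s["soc_sec"] == b["soc_sec"]:
                result.append({**s, **b})
    return result, comparisons

def is_female(row):
    return row["gender"] == "F"

# Plan 1: join first, then select.
joined, cost_join_first = nested_loop_join(senators, businesses)
plan1 = [row for row in joined if is_female(row)]

# Plan 2: select first, then join the smaller input.
females = [row for row in senators if is_female(row)]
plan2, cost_select_first = nested_loop_join(females, businesses)
```

Both plans return the same rows, but on this sample data the join-first plan compares 12 row pairs while the select-first plan compares 6.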
There are two main techniques that are employed during query optimization:
The first technique is based on heuristic rules for ordering the operations in a query execution strategy. A heuristic is a rule that works well in most cases but is not guaranteed to work well in every case. The rules typically reorder the operations in a query tree.
The second technique involves systematically estimating the cost of different execution strategies and choosing the execution plan with the lowest cost estimate.
These techniques are usually combined in a query optimizer.
Using Heuristics in Query Optimization
The scanner and parser of an SQL query first generate a data structure that corresponds to an initial query representation, which is then optimized according to heuristic rules. This leads to an optimized query representation.
One of the main heuristic rules is to apply SELECT and PROJECT operations before applying the JOIN, because they reduce the size of the intermediate results that the JOIN has to process.
Example of Transforming a Query
Consider the following query Q on the database.
Q: Find the last names of employees born after 1957 who work on a project named 'Aquarius'.
This query can be specified in SQL as follows:
Q: SELECT Lname
   FROM EMPLOYEE, WORKS_ON, PROJECT
   WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn
         AND Bdate > '1957-12-31';
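To try query Q out, the following sketch runs it against an in-memory SQLite database using Python's standard sqlite3 module. The column types and the sample rows are assumptions made for this example, and only the attributes that Q mentions are created. Dates are stored as ISO-formatted text, so the string comparison on Bdate orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT, Bdate TEXT);
CREATE TABLE PROJECT (Pnumber INTEGER, Pname TEXT);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
-- Hypothetical sample rows, for illustration only.
INSERT INTO EMPLOYEE VALUES ('1', 'Smith', '1965-01-09');
INSERT INTO EMPLOYEE VALUES ('2', 'Wong', '1955-12-08');
INSERT INTO EMPLOYEE VALUES ('3', 'Zelaya', '1968-01-19');
INSERT INTO PROJECT VALUES (10, 'Aquarius');
INSERT INTO PROJECT VALUES (20, 'Orion');
INSERT INTO WORKS_ON VALUES ('1', 10);
INSERT INTO WORKS_ON VALUES ('2', 10);
INSERT INTO WORKS_ON VALUES ('3', 20);
""")

QUERY = """
SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn
      AND Bdate > '1957-12-31'
"""

rows = conn.execute(QUERY).fetchall()

# SQLite can also report the strategy its own optimizer chose;
# the exact wording of this output varies between SQLite versions.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
```

With these sample rows, only Smith both works on Aquarius and was born after 1957, so `rows` contains a single tuple.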
(a) Initial query tree for SQL query Q:
πLname
  σPname='Aquarius' AND Pnumber=Pno AND Essn=Ssn AND Bdate>'1957-12-31'
    ×
      ×
        EMPLOYEE
        WORKS_ON
      PROJECT
(b) Moving SELECT operations down the query tree:
πLname
  σEssn=Ssn
    ×
      σPnumber=Pno
        ×
          σPname='Aquarius'
            PROJECT
          WORKS_ON
      σBdate>'1957-12-31'
        EMPLOYEE
(c) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations:
πLname
  ⋈Essn=Ssn
    ⋈Pnumber=Pno
      σPname='Aquarius'
        PROJECT
      WORKS_ON
    σBdate>'1957-12-31'
      EMPLOYEE
(d) Moving PROJECT operations down the query tree:
πLname
  ⋈Essn=Ssn
    πEssn
      ⋈Pnumber=Pno
        πPnumber
          σPname='Aquarius'
            PROJECT
        πEssn,Pno
          WORKS_ON
    πSsn,Lname
      σBdate>'1957-12-31'
        EMPLOYEE
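The gain from these transformations can be sketched by evaluating tree (a) and tree (d) on a few rows and comparing the size of the largest intermediate result. The sample rows and the function names below are assumptions made for this illustration, and the joins are written as plain nested loops rather than as any particular DBMS algorithm.

```python
from itertools import product

# Hypothetical sample data, for illustration only.
EMPLOYEE = [
    {"Ssn": "1", "Lname": "Smith", "Bdate": "1965-01-09"},
    {"Ssn": "2", "Lname": "Wong", "Bdate": "1955-12-08"},
    {"Ssn": "3", "Lname": "Zelaya", "Bdate": "1968-01-19"},
]
WORKS_ON = [{"Essn": "1", "Pno": 10}, {"Essn": "2", "Pno": 10}, {"Essn": "3", "Pno": 20}]
PROJECT = [{"Pnumber": 10, "Pname": "Aquarius"}, {"Pnumber": 20, "Pname": "Orion"}]

def initial_plan():
    """Tree (a): full Cartesian product, then one SELECT, then PROJECT."""
    rows = [{**e, **w, **p} for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
    kept = [r for r in rows
            if r["Pname"] == "Aquarius" and r["Pnumber"] == r["Pno"]
            and r["Essn"] == r["Ssn"] and r["Bdate"] > "1957-12-31"]
    return sorted(r["Lname"] for r in kept), len(rows)

def optimized_plan():
    """Tree (d): SELECT and PROJECT early, then JOIN the reduced inputs."""
    # Select on Pname, keep only Pnumber.
    pnumbers = [p["Pnumber"] for p in PROJECT if p["Pname"] == "Aquarius"]
    # Join with WORKS_ON on Pnumber = Pno, keep only Essn.
    essns = [w["Essn"] for n in pnumbers for w in WORKS_ON if w["Pno"] == n]
    # Select on Bdate, keep only Ssn and Lname.
    young = [(e["Ssn"], e["Lname"]) for e in EMPLOYEE if e["Bdate"] > "1957-12-31"]
    # Join on Essn = Ssn, keep only Lname.
    result = sorted(lname for essn in essns for ssn, lname in young if ssn == essn)
    return result, max(len(pnumbers), len(essns), len(young))

names_a, largest_a = initial_plan()
names_d, largest_d = optimized_plan()
```

On this data both trees return the same last names, but tree (a) materializes an intermediate result of 18 rows, while the largest intermediate result in tree (d) has 2 rows.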
Using Selectivity and Cost Estimates in Query Optimization
A query optimizer does not depend solely on heuristic rules. It also estimates and compares the costs of executing a query using different execution strategies and algorithms, and then chooses the strategy with the lowest cost estimate.
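A minimal sketch of this idea in Python: estimate the cost of each candidate strategy from table sizes and a selectivity, then pick the cheapest. The cost formulas (row comparisons of a nested-loop join), the statistics, and all names are assumptions made for this example, not the cost model of any particular DBMS.

```python
def estimate_costs(n_employee, n_works_on, sel_bdate):
    """Estimate row comparisons for two hypothetical strategies.

    sel_bdate is the assumed fraction of EMPLOYEE rows that satisfy
    the Bdate condition (its selectivity).
    """
    # Strategy 1: nested-loop join of the full tables, select afterwards.
    join_then_select = n_employee * n_works_on
    # Strategy 2: one scan to select, then join only the surviving rows.
    selected = int(n_employee * sel_bdate)
    select_then_join = n_employee + selected * n_works_on
    return {
        "join_then_select": join_then_select,
        "select_then_join": select_then_join,
    }

costs = estimate_costs(n_employee=1000, n_works_on=5000, sel_bdate=0.5)
best = min(costs, key=costs.get)
```

With these assumed statistics, the estimates are 5,000,000 comparisons for joining first and 2,501,000 for selecting first, so `best` is `select_then_join`.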
Cost Components for Query Execution
1. Access cost to secondary storage
2. Disk storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
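One simple way to compare plans on these components is a weighted sum, sketched below in Python. The PlanCost class, the weights, and the numbers are hypothetical and only illustrate the bookkeeping; the source does not specify how the components are combined.

```python
from dataclasses import dataclass

@dataclass
class PlanCost:
    """Estimated cost components of one plan, in arbitrary hypothetical units."""
    access: float         # access cost to secondary storage
    storage: float        # disk storage cost of intermediate files
    computation: float    # computation (CPU) cost
    memory: float         # memory usage cost
    communication: float  # cost of shipping data between sites

    def total(self, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
        """Weighted sum of the five components."""
        parts = (self.access, self.storage, self.computation,
                 self.memory, self.communication)
        return sum(w * p for w, p in zip(weights, parts))

# Hypothetical estimates for two candidate plans.
plans = {
    "plan_a": PlanCost(access=120, storage=10, computation=5, memory=2, communication=0),
    "plan_b": PlanCost(access=40, storage=25, computation=8, memory=4, communication=0),
}
cheapest = min(plans, key=lambda name: plans[name].total())
```

Changing the weights models different environments; for example, a large weight on `communication` would matter for a distributed database, while it is zero for the single-site plans assumed here.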
Query code generator
The code generator generates the code to execute the execution plan chosen by the optimizer.
Runtime database processor
The runtime database processor has the task of running (executing) the query code, whether in compiled or interpreted mode, to produce the query result.
Code can be:
Executed directly (interpreted mode)
Stored and executed later whenever needed (compiled mode)
The term optimization is actually a misnomer, because in some cases the chosen execution plan is not the optimal (or absolute best) strategy; it is just a reasonably efficient strategy for executing the query.
Result of query
Query → Scanning, parsing, and validating → Query optimizer → Query code generator → Runtime database processor → Result of query