King Saud University, College of Computer & Information Sciences
IS 335 Database Management System
Lecture 6: Query Processing and Optimization (Practice)
Dr. Mourad YKHLEF
The slide content is derived from many references
IS 335 – Query Processing and Optimization - Dr. Mourad Ykhlef
Motivation (1)
• We would like to find the cheapest way
to compute the join of three tables:
• Sailors ⋈ Reserves ⋈ Boats
Motivation (2)
• We need to decide on the order of
operations:
(Sailors ⋈ Reserves) ⋈ Boats
or
Sailors ⋈ (Reserves ⋈ Boats)
• We need to decide which join algorithm
to use for each of the operations
• What information do we need?
Statistics Maintained by DBMS for Relations
• Cardinality NTuples(R): Number of tuples in each relation R
• Size NPages(R) : Number of pages in each relation R
Statistics Maintained by DBMS for Indexes
• Index Cardinality: Number of distinct key values NKeys(I) for each index I
• Index Size: Number of pages INPages(I) in each index I
• Index Height: Number of non-leaf levels IHeight(I) in each B+ Tree index I
• Index Range: The minimum value ILow(I) and maximum value IHigh(I) for each index I
Note
• The statistics are updated periodically
(not every time the underlying
relations are modified).
• Because the stored cardinality may be stale,
it cannot be used to answer
select count(*)
from R
Estimating Result Sizes
• Consider the query
SELECT attribute-list
FROM relation-list
WHERE term1 and ... and termn
• The maximum number of tuples in the result is the product of the cardinalities of the relations in the FROM clause
• Each term in the WHERE clause has an associated reduction factor, which reflects the impact of the term in reducing the result size.
Result Size
• Estimated result size =
maximum size × the product of the reduction factors
Assumptions
• Containment of value sets: if index I1 is on attribute Y of R, index I2 is on attribute Y of S, and NKeys(I1) < NKeys(I2), then every Y-value of R will be a Y-value of S
• Empirically-obtained reduction factor is 1/10 if no additional info is available
Estimating Reduction Factors
• column = value: 1/NKeys(I)
– There is an index I on column.
– This assumes a uniform distribution.
– Otherwise, use 1/10.
• column1 = column2: 1/Max(NKeys(I1),NKeys(I2))
– There is an index I1 on column1 and an index I2 on column2.
– This relies on the containment of value sets assumption.
– If only one column has an index I, use 1/NKeys(I).
– Otherwise, use 1/10.
Estimating Reduction Factors
• column > value:
(IHigh(I)-value)/(IHigh(I)-ILow(I)) if there
is an index I on column.
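The three reduction-factor rules above can be sketched in Python as follows. This is a minimal illustration, not the estimator of any particular DBMS: the function names are hypothetical, the arguments stand for the NKeys, ILow and IHigh statistics, and clamping the range factor to [0, 1] is an added assumption (the slides do not say what happens when the value lies outside the index range).

```python
from fractions import Fraction

# Fallback used by the slides when no statistics are available.
DEFAULT_RF = Fraction(1, 10)

def rf_equals_value(nkeys=None):
    """column = value: 1/NKeys(I) if an index I exists, otherwise 1/10."""
    return Fraction(1, nkeys) if nkeys else DEFAULT_RF

def rf_equals_column(nkeys1=None, nkeys2=None):
    """column1 = column2: 1/max(NKeys) over the indexes that exist, otherwise 1/10."""
    known = [n for n in (nkeys1, nkeys2) if n]
    return Fraction(1, max(known)) if known else DEFAULT_RF

def rf_greater_than(value, low, high):
    """column > value: (IHigh - value) / (IHigh - ILow), assuming high > low.

    Clamping to [0, 1] is an assumption added for this sketch.
    """
    rf = Fraction(high - value, high - low)
    return min(max(rf, Fraction(0)), Fraction(1))
```

Exact fractions are used instead of floats so that the products in a later estimate do not pick up rounding error.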
Example
Reserves(sid, agent), Sailors(sid, rating)
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid and S.rating > 3 and
R.agent = 'Joe'
• Cardinality(R) = 100,000
• Cardinality(S) = 40,000
• NKeys(Index on R.agent) = 100
• IHigh(Index on S.rating) = 10, ILow = 0
Example (cont.)
• Maximum cardinality: 100,000 * 40,000
• Reduction factor of R.sid = S.sid: 1/40,000
– sid is a primary key of S, so the index on S.sid has 40,000 distinct keys
• Reduction factor of S.rating > 3: (10-3)/(10-0) = 7/10
• Reduction factor of R.agent = 'Joe': 1/100
• Total estimated size: 100,000 * 40,000 * (1/40,000) * (7/10) * (1/100) = 700
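The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the calculation on the slide; the variable names are illustrative.

```python
from fractions import Fraction
from math import prod

# Cardinalities of the relations in the FROM clause: Reserves, Sailors.
cardinalities = [100_000, 40_000]

# One reduction factor per WHERE term.
reduction_factors = [
    Fraction(1, 40_000),         # R.sid = S.sid (sid is the primary key of S)
    Fraction(10 - 3, 10 - 0),    # S.rating > 3
    Fraction(1, 100),            # R.agent = 'Joe'
]

max_size = prod(cardinalities)
estimate = max_size * prod(reduction_factors)
print(estimate)
```

The maximum size is 4,000,000,000 tuples, and the three terms together reduce it to an estimate of 700.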
Creating Indexes Using Oracle
Index
• Map between
– the row key
– the row location
• Oracle has two kinds of indexes
– B* tree
– Bitmap
• Index entries are kept in sorted key order
B* tree
[Figure: B* tree]
Root: 14 19 24 33
Leaf entries: 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
Creating an Index
• Syntax:
create [unique | bitmap] index iname on
table(column [,column] . . .)
Unique Indexes
• Create an index that guarantees the uniqueness of the key. Creation fails if any duplicate already exists.
• When you create a table with a
– primary key constraint or
– unique constraint
a "unique" index is created automatically
create unique index rating_uq on Sailors(rating);
Bitmap Indexes
• Appropriate for columns that may have very few possible values
• For each value c that appears in the column, a vector v of bits is created, with a 1 in v[i] if the i-th row has the value c
– Vector length = number of rows
• Oracle can automatically convert bitmap entries to RowIDs during query processing
Bitmap Indexes: Example
Sid  Sname  age  rating
12   Jim    55   3
13   John   46   7
14   Jane   46   10
15   Sam    37   3
create bitmap index rating_bit on Sailors(rating);
• Corresponding bitmaps:
– 3: <1 0 0 1>
– 7: <0 1 0 0>
– 10: <0 0 1 0>
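The bitmaps in this example can be reproduced with a short Python sketch. It only illustrates the idea of one bit vector per distinct value; it says nothing about how Oracle stores or compresses bitmaps, and `build_bitmaps` is a hypothetical helper.

```python
def build_bitmaps(column_values):
    """Return {value: bit vector}, with a 1 at position i if row i holds that value."""
    n = len(column_values)
    bitmaps = {}
    for i, value in enumerate(column_values):
        # Create an all-zero vector the first time a value is seen, then set bit i.
        bitmaps.setdefault(value, [0] * n)[i] = 1
    return bitmaps

# rating column of the four Sailors rows, in row order
ratings = [3, 7, 10, 3]
bitmaps = build_bitmaps(ratings)

# A predicate such as "rating = 3 or rating = 10" becomes a bitwise OR of two vectors.
matches = [a | b for a, b in zip(bitmaps[3], bitmaps[10])]
```

Combining predicates with cheap bitwise operations is one reason bitmap indexes suit columns with very few distinct values.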
When to Create an Index
• Large tables, on columns that are likely
to appear in where clauses as a
simple equality, for example:
– where s.sname = 'John' and s.age = 50
– where s.age = r.age
Function-Based Indexes
• You can't use an index on sname for the
following query:
select *
from Sailors
where UPPER(sname) = 'SAM';
• You can create a function-based index to
speed up the query:
create index upp_sname on Sailors(UPPER(sname));
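The effect can be tried without an Oracle installation. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, which is an assumption of this example and not part of the slides: SQLite calls the feature an index on an expression, and its planner and plan output differ from Oracle's. The index name sname_idx and the helper plan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Sailors (sid integer primary key, sname text, "
    "age integer, rating integer)"
)
# An ordinary index on sname, which cannot serve a predicate on upper(sname).
conn.execute("create index sname_idx on Sailors(sname)")

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "select * from Sailors where upper(sname) = 'SAM'"

before = plan(query)   # plan with only the plain index on sname

# SQLite's counterpart of the function-based index on the slide.
conn.execute("create index upp_sname on Sailors(upper(sname))")
after = plan(query)    # plan once the expression index exists

print(before)
print(after)
```

The exact wording of the plan lines depends on the SQLite version, so the sketch prints them rather than assuming a fixed text.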
Index-Organized Tables
• An index-organized table keeps its data sorted by the
primary key
• Rows do not have physical RowIDs
• The table stores its data as if it were an index
create table Sailors(
sid number primary key,
sname varchar2(30),
age number,
rating number)
organization index;
Index-Organized Tables (2)
• What advantages does this have?
– Primary key is not duplicated in the index
– Improves performance of queries based on the primary key
• What disadvantages?
– Expensive to add columns
– Poorly suited to dynamic (frequently changing) data
• When to use?
– where clause on the primary key
– static data
Clustering Tables Together
• You can ask Oracle to store several tables
with common columns together on the disk
• This is useful if you often join these tables
• Cluster: area on the disk where the rows
of the tables are stored
• Cluster key: the columns by which the
tables are usually joined in a query
Clustering Tables Together: Syntax
• create cluster sailor_reserves (X number);
– Create a cluster with nothing in it
• create table Sailors(
sid number primary key,
sname varchar2(30),
age number,
rating number)
cluster sailor_reserves(sid);
– create the table in the cluster
Clustering Tables Together: Syntax (cont.)
• create index sailor_reserves_index on cluster sailor_reserves;
– Create an index on the cluster
• create table Reserves(
sid number,
bid number,
day date,
primary key(sid, bid, day) )
cluster sailor_reserves(sid);
– A second table is added to the cluster
Reserves
sid  bid  day
22   102  7/7/97
22   101  10/10/96
58   103  11/12/96
Sailors
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
Stored
sid  sname   rating  age   bid  day
22   Dustin  7       45.0  102  7/7/97
                           101  10/10/96
31   Lubber  8       55.5
58   Rusty   10      35.0  103  11/12/96
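The stored layout can be mimicked in Python by grouping the rows of both tables under their shared cluster key. This is a conceptual sketch only; it does not model Oracle's block format, and `cluster_by_sid` is a hypothetical helper.

```python
sailors = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]
reserves = [(22, 102, "7/7/97"), (22, 101, "10/10/96"), (58, 103, "11/12/96")]

def cluster_by_sid(sailors, reserves):
    """Group Sailors and Reserves rows under the cluster key sid."""
    cluster = {}
    for sid, sname, rating, age in sailors:
        cluster[sid] = {"sailor": (sname, rating, age), "reserves": []}
    for sid, bid, day in reserves:
        # A reservation whose sailor row is absent still gets a slot for its key.
        entry = cluster.setdefault(sid, {"sailor": None, "reserves": []})
        entry["reserves"].append((bid, day))
    return cluster

stored = cluster_by_sid(sailors, reserves)
for sid, entry in stored.items():
    print(sid, entry["sailor"], entry["reserves"])
```

Because each sid value appears once, followed by all rows that share it, a join on sid reads matching rows from the same place instead of looking them up in two separate tables.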
The Oracle Optimizer
Types of Optimizers
• There are different modes for the optimizer:
ALTER SESSION SET optimizer_mode = {choose|rule|first_rows(_n)|all_rows}
• RULE: Rule-based optimizer (RBO)
– deprecated
• CHOOSE: Cost-based optimizer (CBO); picks a plan based on statistics (e.g. number of rows in a table, number of distinct keys in an index)
– Need to analyze the data in the database using the analyze command
Types of Optimizers
• ALL_ROWS: execute the query so that
all of the rows are returned as quickly
as possible
– Merge Join has priority over Block Nested Loop Join
• FIRST_ROWS(n): execute the query so
that the first n rows are returned
as quickly as possible
– Block Nested Loop Join has priority over Merge Join
Analyzing the Data
analyze {table <table_name> | index <index_name>}
{compute statistics |
estimate statistics [sample <integer> {rows | percent}] |
delete statistics};
analyze table Sailors estimate statistics sample 25 percent;
Viewing the Execution Plan (Option 1)
• You need a PLAN_TABLE table. So, the first time that you want to see execution plans, run the command:
@$ORACLE_HOME/rdbms/admin/utlxplan.sql
or, on Windows,
@C:\oracle\ora92\rdbms\admin\utlxplan.sql
• Set autotrace on to see all plans
– Displays the execution path for each query,
after it has been executed
PLAN_TABLE
create table PLAN_TABLE (
  statement_id varchar2(30),
  plan_id number,
  timestamp date,
  remarks varchar2(4000),
  operation varchar2(30),
  options varchar2(255),
  object_node varchar2(128),
  object_owner varchar2(30),
  object_name varchar2(30),
  object_alias varchar2(65),
  object_instance numeric,
  object_type varchar2(30),
  optimizer varchar2(255),
  search_columns number,
  id numeric,
  parent_id numeric,
  depth numeric,
  position numeric,
  cost numeric,
  cardinality numeric,
  bytes numeric,
  other_tag varchar2(255),
  partition_start varchar2(255),
  partition_stop varchar2(255),
  partition_id numeric,
  other long,
  distribution varchar2(30),
  cpu_cost numeric,
  io_cost numeric,
  temp_space numeric,
  access_predicates varchar2(4000),
  filter_predicates varchar2(4000),
  projection varchar2(4000),
  time numeric,
  qblock_name varchar2(30) );
Viewing the Execution Plan (Option 2)
• Another option:
explain plan set statement_id='<name>' for <statement>
• For example:
explain plan
set statement_id='test'
for SELECT *
FROM Sailors S
WHERE sname='Joe';
Select … from Plan_Table where statement_id='test';
Operations that Access Tables
• TABLE ACCESS FULL: sequential table scan
– Oracle optimizes by reading multiple blocks
– Used whenever there is no where clause on a
query
select * from Sailors
• TABLE ACCESS BY ROWID: access rows by
their RowID values.
– How do you get the rowid? From an index!
select * from Sailors where sid > 10
Types of Indexes
• Unique: each row of the indexed table
contains a unique value for the
indexed column
• Nonunique: the row’s indexed values
can repeat
Operations that Use Indexes
• INDEX UNIQUE SCAN: Access of an
index that is defined to be unique
• INDEX RANGE SCAN: Access of an
index that is not unique or access of a
unique index for a range of values
When are Indexes Used/Not Used?
• An index is used if you set an indexed column equal to a value, e.g., sname = 'Jim'
• An index is used if you specify a range of values for an indexed column, e.g., sname like 'J%'
• An index is not used in these cases:
– sname like '%m': will not use an index
– UPPER(sname) like 'J%': will not use an index
– sname is null: will not use an index, since null values are not stored in the index
– sname is not null: will not use an index, since every value in the index would have to be accessed
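Why a leading-prefix pattern can use a sorted index while a trailing pattern cannot is sketched below in Python. A sorted list stands in for the index, which is an assumption of this sketch, and the function names are hypothetical.

```python
from bisect import bisect_left

# Sorted key list standing in for an index on sname.
names = sorted(["Jim", "John", "Jane", "Sam"])

def prefix_range_scan(sorted_keys, prefix):
    """like 'J%': binary-search to the first candidate, then read neighbours in order."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append(key)
    return result

def suffix_full_scan(sorted_keys, suffix):
    """like '%m': the sort order does not help, so every key must be inspected."""
    return [key for key in sorted_keys if key.endswith(suffix)]

print(prefix_range_scan(names, "J"))
print(suffix_full_scan(names, "m"))
```

All keys sharing a prefix sit next to each other in sorted order, so the prefix search touches only the matching range. Keys sharing a suffix are scattered, which is why the pattern '%m' forces a full scan.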
When are Indexes Used? (cont)
• 2*age = 20: Index on age will not be used. A function-based index on 2*age will be used.
• sname != 'Jim': Index will not be used.
• MIN and MAX functions: Index will be used
• Equality on a leading column of a multicolumn index. For example, suppose we have a multicolumn index on (sid, bid, day)
– sid = 12: Can use the index
– bid = 101: Cannot use the index
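The leading-column rule follows from the sort order of a multicolumn index, as the Python sketch below suggests. A sorted list of (sid, bid, day) tuples stands in for the index; the rows are illustrative and the function names are hypothetical.

```python
from bisect import bisect_left

# Entries of a multicolumn index on (sid, bid, day), kept in sorted order.
index = sorted([
    (22, 102, "7/7/97"),
    (22, 101, "10/10/96"),
    (58, 103, "11/12/96"),
])

def lookup_by_sid(index, sid):
    """sid is the leading column: binary-search to the first entry, then read in order."""
    start = bisect_left(index, (sid,))   # (sid,) sorts before every (sid, bid, day)
    result = []
    for entry in index[start:]:
        if entry[0] != sid:
            break
        result.append(entry)
    return result

def lookup_by_bid(index, bid):
    """bid is not a leading column: matching entries are scattered, so scan everything."""
    return [entry for entry in index if entry[1] == bid]

print(lookup_by_sid(index, 22))
print(lookup_by_bid(index, 101))
```

Entries are ordered by sid first, so equal sid values are adjacent, while equal bid values can appear anywhere in the index.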
Optimizer Hints
• You can give the optimizer hints about
how to perform query evaluation
• Hints are written in /*+ */ right after
the select
• Note: These are only hints. The Oracle
optimizer can choose to ignore your
hints
Hints
• FULL hint: tell the optimizer to perform a TABLE ACCESS FULL operation on the specified table
• ROWID hint: tell the optimizer to perform a TABLE ACCESS BY ROWID operation on the specified table
• INDEX hint: tell the optimizer to use an index-based scan on the specified table
Examples
Select /*+ FULL (sailors) */ sid
From sailors
Where sname='Joe';
Select /*+ INDEX (sailors) */ sid
From sailors
Where sname='Joe';
Select /*+ INDEX (S s_ind) */ S.sid
From sailors S, reserves R
Where S.sid=R.sid AND sname='Joe';