Upload
sage-computing-services
View
29
Download
0
Embed Size (px)
Citation preview
OHarmony – Finding Your Perfect Match
How the Optimizer Works
Penny Cookson SAGE Computing Services
SAGE Computing ServicesCustomised Oracle Training Workshops and
Consulting
WARNING
This presentation contains material which is not politically correct
Includes adult concepts
May contain strong language
SAGE Computing ServicesCustomised Oracle Training Workshops and
Consulting
Penny Cookson
Managing Director and Principal Consultant
Working with Oracle products since 1987
Oracle Magazine Educator of the Year 2004
www.sagecomputing.com.au
Optimiser Step 1 - Transformation
Query rewrite for materialised views
Transitive conditions
OR expansion
View merging
Predicate pushing
Join factorisation
COL1 = :B1AND COL2 >= :B2 AND COL2 < :B3AND COL3 >= :B4AND COL4 = :B5
…………………………………….
Col1
Col2
Col3
Col4
Col5
Col6 How many of
these are there likely to
be?
How many rows will be returned?
There are 23 million rows in this table
ATTRIBUTE1 = :B1AND ATTRIBUTE2 >= :B2 AND ATTRIBUTE2 < :B3AND ATTRIBUTE3 >= :B4AND ATTRIBUTE4 = :B5
How many of
these are there likely to
be?
Looking for your perfect match
There are 11 million malesin Australia
MARRIED = ‘N’AND AGE >=25 AND AGE <30AND HEIGHT >= 6ft 2 inAND JOB=‘DBA’
How many of
these are there likely to
be?
There are 11 million malesin Australia
The attribute Married has two distinct values Yes or No
50% 50%
We assume 50% of each
How many people satisfy the criteria Married = ‘N’?
There are 11 million males in Australia
Number of Unmarried males is 11,000,000/2 = 5,500,000
How many people satisfy the criteria Married = ‘N’?
More Statistics
6 in every 10 males are married
So - Number of Unmarried males is 4,400,000
This is what we did originallybegindbms_stats.gather_schema_stats(ownname=>'AUSOUG',method_opt => 'for all columns size 1' );end;
How can Oracle get better statistics?
begindbms_stats.gather_table_stats(ownname=>'AUSOUG',tabname=>'MEN', method_opt => 'for all columns size 1, for columns size auto married' );end;
Create a histogram only for Married
This is no longer relevant Because we have this
Cummulative
http://jonathanlewis.wordpress.com/2010/10/05/frequency-histogram-4/
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)FROM menWHERE married = 'N‘;
SELECT dbms_xplan.display_cursor('5zyycq4y22j9g',format=>'ALLSTATS LAST')FROM dual;
Now it gets it right
MARRIED = ‘N’
AND AGE >=25 AND AGE <30AND HEIGHT >= 6ft 2 inAND JOB=‘DBA’
How many of
these are there likely to
be?
There are 11 million malesin Australia
MARRIED = ‘N’ 40%
Age statistics
8.2% of men are BETWEEN 25 and 29
Age statistics
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)FROM menWHERE age >=25 and age < 30;
create or replace function raw_to_num(i_raw raw)return number as m_n number;begin dbms_stats.convert_raw_value(i_raw,m_n); return m_n;end;/ create or replace function raw_to_date(i_raw raw)return date as m_n date;begin dbms_stats.convert_raw_value(i_raw,m_n); return m_n;end;/ create or replace function raw_to_varchar2(i_raw raw)return varchar2 as m_n varchar2(20);begin dbms_stats.convert_raw_value(i_raw,m_n); return m_n;end;/
http://jonathanlewis.wordpress.com/2006/11/29/low_value-high_value/
SELECT column_name, decode(data_type, 'VARCHAR2',to_char(raw_to_varchar2(low_value)), 'DATE',to_char(raw_to_date(low_value)), 'NUMBER',to_char(round(raw_to_num(low_value),2)) ) low_value, decode(data_type, 'VARCHAR2',to_char(raw_to_varchar2(high_value)), 'DATE',to_char(raw_to_date(high_value)), 'NUMBER',to_char(round(raw_to_num(high_value),2)) ) high_valueFROM user_tab_columnsWHERE table_name='MEN';
http://jonathanlewis.wordpress.com/2006/11/29/low_value-high_value/
5.10204… % * 11,000,000 = 561,224
0 9825 30
SELECT TRUNC(11000000*(30-25)/(98)),trunc((30-25)/(98)*100,6) from dual
Add a histogram
begindbms_stats.gather_table_stats(ownname=>'AUSOUG',tabname=>'MEN', method_opt => 'for all columns size 1 for columns size auto married, for columns size 254 age ' );end;
SELECT trunc(h.endpoint_value,2),h.endpoint_repeat_countfrom user_tab_histograms h, user_tables tWHERE t.table_name = h.table_nameAND h.table_name = 'MEN'AND h.column_name = 'AGE' ORDER BY endpoint_value
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)FROM menWHERE age >=25 and age < 30;
SELECT dbms_xplan.display_cursor('6aub3fn962277',format=>'ALLSTATS LAST')FROM dual
Not perfect but better
MARRIED = ‘N’AND AGE >=25 AND AGE <30
AND HEIGHT >= 6ft 2 in (188cm)AND JOB=‘DBA’
How many of
these are there likely to
be?
There are 11 million malesin Australia
MARRIED = ‘N’ 40% AND AGE >=25 AND <30 8.2%
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM men WHERE height >= 188
SELECT dbms_xplan.display_cursor('bf4wbtgb9zptq', format=>'ALLSTATS LAST')FROM dual
How many men are >= 188cm (we have gathered a histogram)
MARRIED = ‘N’AND AGE >=25 AND AGE <30AND HEIGHT >= 6ft 2 in (188cm)
AND JOB=‘DBA’
How many of
these are there likely to
be?
There are 11 million malesin Australia
MARRIED = ‘N’ 40% AND AGE >=25 AND <30 8.2% AND HEIGHT >= 6ft 2 in (188cm) 2.9%
11000000*40/100 *8.2/100 * 2.9/100 =10,463
MARRIED = ‘N’ 40% AND AGE >=25 AND <30 8.2% AND HEIGHT >= 6ft 2 in (188cm) 2.9%
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM men WHERE married = 'N'AND age >=25 AND age < 30 AND height >= 188
Height Statistics
5.3% of males are >= 6ft 2inches
Are these statistics out
of date?- do select of last_analyzed
5.3% of males are >= 6ft 2inches
Note the correlation between
gender, age and height
SELECT c.column_name, t.num_rows "Number of Rows", c.num_distinct "Distinct Values", c.histogram "Histogram"FROM user_tables t, user_tab_col_statistics cWHERE t.table_name = c.table_nameAND t.table_name = 'MEN'
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM men WHERE age =6 AND height = 174
BEGINdbms_stats.gather_table_stats(ownname=>'AUSOUG',tabname=>'MEN', estimate_percent=>NULL,method_opt => 'for all columns size 1 for columns size auto married height, for columns size 254 age, for columns (age, height) size 2048 ' );END;
Gather extended statistics
SELECT * FROM dba_stat_extensions WHERE table_name = 'MEN'
SELECT c.column_name, t.num_rows "Number of Rows", c.num_distinct "Distinct Values", c.histogram "Histogram"FROM user_tables t, user_tab_col_statistics cWHERE t.table_name = c.table_nameAND t.table_name = 'MEN'
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM men WHERE age =6 AND height = 174
ALTER SESSION SET EVENTS ' 10053 trace name context forever ‘;EXPLAIN PLAN FOR …………;ALTER SESSION SET EVENTS ' 10053 trace name context off ‘;
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM men WHERE age = 30AND height = 174
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM men WHERE married = 'N'AND age >=25 AND age < 30 AND height >= 188
When we combine range checks it gets it wrong
10053 trace file – its not looking at the extended stats
ALTER SESSION SET EVENTS ' 10053 trace name context forever ‘;EXPLAIN PLAN FOR …………;ALTER SESSION SET EVENTS ' 10053 trace name context off ‘;
BEGINDBMS_STATS.DROP_EXTENDED_STATS('AUSOUG', 'MEN', '("AGE","HEIGHT","MARRIED")') ;END;
BEGINdbms_stats.purge_stats(sysdate);END;
What about Statistics Feedback?
Execute again
begindbms_spd.flush_sql_plan_directive;end;
Clear any existing SQL Plan Directives
SELECT d.directive_id, d.type, d.state, d.reason, d.created, o.object_name, o.subobject_name, o.notes, o.ownerFROM dba_sql_plan_directives d, dba_sql_plan_dir_objects oWHERE o.directive_id = d.directive_idAND o.owner NOT IN ('XDB','SYS','SYSTEM')ORDER BY directive_id desc;
begindbms_spd.drop_sql_plan_directive(3579438123315094543);end;
ALTER SYSTEM FLUSH shared_pool;
ALTER SESSION SET statistics_level = ALL;
Clear any existing plans and gather plan statistics
Execute each of these twice
Only the last two use statistics feedback
UNMARRIED = ‘Y’AND AGE >=25 AND AGE <30AND HEIGHT >= 6ft 2 in (188cm)
AND JOB=‘DBA’
How many of
these are there likely to
be?
There are 11 million malesin Australia
UNMARRIED = ‘Y’ 40% AND AGE >=25 AND <30 8.2% AND HEIGHT >= 6ft 2 in (188cm) 2.9%
We have no idea what percentage of males are DBAs
begindbms_stats.delete_schema_stats(ownname=>'AUSOUG' );end;
SELECT COUNT(*) FROM men WHERE job = 'DBA'
Actual value is 1,000,014
Now we know how many matches to expect what is the best way to get to them
Two main types:
Pretty cruisy really – almost anyone will do
Really very picky – he must be just right
The Oracle approach to
Pretty cruisy really – almost anyone will do
SELECT COUNT(*) FROM men WHERE age >=20 AND age < 50 AND married = 'N'AND job != 'LAWYER'
Full table scan
WHERE col1 = ' bbbbb '
aaaaabbbbbcccccdddddeeeee
aaaaagggggcccccdddddeeeee
aaaaakkkkkcccccdddddeeeee
aaaaabbbbbcccccdddddeeeee
aaaaabbbbbbbbbbdddddeeeee
Multi block read
SELECT COUNT(*) FROM men WHERE age >=20 AND age < 50 AND married = 'N'AND job != 'LAWYER'
Really very picky – he must be just right
OHarmony
I tell you what I want and you just give me the phone numbers where I can contact them
OHarmony
OHarmony
Brown
DBA
188
120,000
N
25Age From Age To 30
IQ (min)
Married
Smoker
Eyes
Height From
Salary (min annual)
120
N
195Height To
Preferred Job Type
Search
OHarmony
Brown
DBA
188
120,000
N
25Age From Age To 30
IQ (min)
Married
Smoker
Eyes
Height From
Salary (min annual)
120
N
195Height To
Preferred Job Type
Search
John Smith 9999 9999
Index range scan
a
z
bbbb rowidbbbb rowid
aaaaabbbbbcccccdddddeeeee
aaaaagggggcccccdddddeeeee
aaaaakkkkkcccccdddddeeeee
aaaaabbbbbcccccdddddeeeee
WHERE col1 = ' bbbbb '
SELECT id, surname, firstnameFROM menWHERE iq = 155;
SELECT COUNT(surname) FROM menWHERE iq > 154;
SELECT COUNT(surname) FROM menWHERE iq > 153;
9 rows0.00008%
137,488 rows1.25%
OHarmony
DEVELOPER
195
120,000
N
25Age From Age To 30
IQ (min)
Married
Smoker
Eyes
Height From
Salary (min annual)
N
Height To
Preferred Job Type
Search
Oharmony has provided 14 matches in 14 locations
Poorly clustered
Oharmony has provided 14 matches in 2 locations
Well clustered
SELECT i.index_name, i.distinct_keys, i.num_rows, i.clustering_factor, t.blocksFROM user_indexes i, user_tables tWHERE t.table_name = i.table_nameAND t.table_name = 'MEN';
CREATE TABLE men2AS SELECT * FROM menORDER BY IQ;
CREATE INDEX MEN2_IQ_N4 ON men2(iq);
begindbms_stats.gather_table_stats(ownname=>'AUSOUG',tabname=>'MEN2', estimate_percent => null, cascade=>true,method_opt => 'for all columns size 1 for columns size 2000 iq ' );end;
SELECT i.distinct_keys, i.num_rows, i.clustering_factor, t.blocksFROM user_indexes i, user_tables tWHERE t.table_name = i.table_nameAND i.index_name = 'MEN2_IQ_N4';
previously
previously
Optimizer Mode
Optimizer Mode
All_Rows
First_Rows
Tends towards:
SELECT id, surname, firstnameFROM menWHERE married = 'N' AND age_range = '25-29' AND height = 188AND iq = 120 AND job = 'DBA';
Bitmap indexes
With Bitmap indexes
Sorting
With B*Tree indexes
SELECT id, surname, firstnameFROM menWHERE married = 'N' AND age_range = '25-29‘AND height = 188AND iq = 120 AND job= 'DBA';
access conditions number of rows? access method?
access conditions number of rows? access method?
access conditions number of rows? access method?
Identify each join pathHow many rows am I likely to get?What is the best join method?
All I have talked about so far is Access to one table – how does it JOIN them together ?
access conditions number of rows? access method?
access conditions number of rows? access method?
access conditions number of rows? access method?
1
2
3
access conditions number of rows? access method?
access conditions number of rows? access method?
access conditions number of rows? access method?
1
2 3
access conditions number of rows? access method?
access conditions number of rows? access method?
access conditions number of rows? access method?
1
2 3
for each join path for each join – what is the best JOIN METHOD?
Joins methods – Nested Loop with index
1 2
A.COL1 = B.COL2
A B
COL1 = 1B.COL2index
ACCESS ROWS WHERE COL2 = 1
COL2 = 1COL2 = 1COL2 = 1
Joins methods – Hash join
A
B
HASH TABLE1
Joins methods – cartesian
A
B
SELECT COUNT(*) FROM men WHERE married = 'N'AND age < 12AND height >= 188
The wrong plan 0 rows
0 rows
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ = 5;
SELECT COUNT(*) FROM men WHERE married = 'N'AND age < 12AND height >= 188
This is even worse
SELECT /*+ INDEX_COMBINE(MEN MEN_AGE_N1,MEN_HEIGHT_N1) */ COUNT(*) FROM men WHERE married = 'N'AND age < 12AND height >= 188
0 rowsBetter
SELECT COUNT(surname) FROM men WHERE height between 159.4 AND 160
26188 rows
SELECT /*+ FULL(MEN) */ COUNT(surname)FROM men WHERE height between 159.4 AND 160
10,299,997 rowsSELECT COUNT(surname), COUNT( start_date)FROM men m, events eWHERE e.men_id (+) = m.idAND m.iq = 156
SELECT /*+ USE_HASH(M,E) */ COUNT(surname), COUNT( start_date)FROM men m, events eWHERE e.men_id (+) = m.idAND m.iq = 156
Questions?
Penny Cookson SAGE Computing Services
SAGE Computing ServicesCustomised Oracle Training Workshops and
Consulting