Recent-Rows-First Case Study February, 2010 ©2010 Dan Tow, All rights reserved [email protected] SingingSQL Presents SingingSQL


The Standard SQL-Tuning Problem Statement

• Find SQL that needs tuning.
• Take the SQL that needs tuning as a functional spec for what rows the application needs at that point in its flow of control.
• Find some database change (usually a new index) or some transformation of the SQL that absolutely guarantees (assuming no wrong-rows bug in Oracle, which is a safe assumption!) returning the same rows as the original SQL spec, only with a faster execution plan, and confirm that it is faster after the change.

Finding the Fastest Plan, Normally

• The fastest execution plan is usually the one that touches the fewest rows. That, in turn, is the plan that discards the fewest rows and discards them as early as possible, before wasting work on unneeded joins. Such a plan usually starts with the best filter condition (the one reaching the smallest fraction of its table) and reaches every large table by the fastest path (usually a filter-column index for the first table and nested loops to a join-key index for the later large tables).
• The goal is normally a query that runs in minutes, at most, without requiring parallel threads.

A Hideously Nasty Problem of Rare Proportions

• A new user report consistently timed out (snapshot-too-old error) after hours of runtime. The process was planned to run during business hours, so it should not use parallel threads.
• Several tables were on the order of a half-billion rows, so even gathering stats or trying out alternatives was very time-consuming.
• Sampling strategies helped the analysis (e.g., test joining only every hundredth row).
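One way such a sampling test could look is sketched below. It is not taken from the original analysis: it borrows the anonymized LA and HA tables and their HEADER_ID join from the query in this case study, and it assumes that HEADER_ID is numeric with values spread roughly evenly modulo 100, so that the MOD filter keeps about every hundredth row.

```sql
-- Sketch only: estimate the cost of the LA-to-HA join from a ~1% sample.
-- Assumption: LA.HEADER_ID is numeric and roughly uniform modulo 100.
SELECT COUNT(*) AS sampled_join_rows
FROM LA, HA
WHERE MOD(LA.HEADER_ID, 100) = 0   -- keep roughly every hundredth LA row
  AND HA.HEADER_ID = LA.HEADER_ID;
```

Scaling the sampled row count and join time by 100 gives only a rough projection for the full join. Oracle's SAMPLE clause (for example, FROM LA SAMPLE (1)) is an alternative when no suitable numeric key exists.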

The SQL Core, Anonymized

SELECT ...
FROM LB, IT, HB, LA, HA, CA, PGI
WHERE LB.DATE_T BETWEEN '1-NOV-2009' AND '6-NOV-2009'
  AND LB.A_ID = '9'
  AND LB.ID = IT.E_ID
  AND IT.B_ID = HB.ID
  AND IT.ID1 = LA.I_ID
  AND HB.C_NUM = LA.SC_NUM
  AND LA.L_STATUS_ID <> 10096
  AND LA.ATTRIBUTE8 = 'Mid-Term'
  AND LA.ATTRIBUTE9 = 'TO'
  AND LA.E_PRICE > 0
  AND LA.SUM_DET = 'D'
  AND HA.HEADER_ID = LA.HEADER_ID
  AND HA.ATTRIBUTE3 = 'Y'
  AND HA.STATUS_ID = 10047
  AND CA.SITE_ID = HA.INVOICE_SITE_ID
  AND CA.SITE_CODE = 'BILL_TO'
  AND PGI.COUNTRY_CODE = CA.COUNTRY
  AND UPPER(PGI.THEATER) = UPPER('EURO')

The SQL Diagram

[Diagram of the joins among IT, HB, LB, LA, HA, CA, and PGI, labeled with these filter ratios:]

LB 0.0008 (recent rows)
LA 0.0013
HA 0.028
PGI 0.13


Standard (heuristic-best) join order: LB, IT, HB, LA, HA, CA, PGI

Too slow: 4+ hours
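To time a particular join order such as the one above, the order can be forced with Oracle's LEADING and USE_NL hints. The sketch below is an assumption about how such a test might be written, not the author's original statement. It reuses the anonymized WHERE clause and substitutes COUNT(*) for the elided select list.

```sql
-- Sketch only: force the heuristic-best join order with nested loops.
SELECT /*+ LEADING(LB IT HB LA HA CA PGI)
           USE_NL(IT HB LA HA CA PGI) */
       COUNT(*)
FROM LB, IT, HB, LA, HA, CA, PGI
WHERE LB.DATE_T BETWEEN '1-NOV-2009' AND '6-NOV-2009'
  AND LB.A_ID = '9'
  AND LB.ID = IT.E_ID
  AND IT.B_ID = HB.ID
  AND IT.ID1 = LA.I_ID
  AND HB.C_NUM = LA.SC_NUM
  AND LA.L_STATUS_ID <> 10096
  AND LA.ATTRIBUTE8 = 'Mid-Term'
  AND LA.ATTRIBUTE9 = 'TO'
  AND LA.E_PRICE > 0
  AND LA.SUM_DET = 'D'
  AND HA.HEADER_ID = LA.HEADER_ID
  AND HA.ATTRIBUTE3 = 'Y'
  AND HA.STATUS_ID = 10047
  AND CA.SITE_ID = HA.INVOICE_SITE_ID
  AND CA.SITE_CODE = 'BILL_TO'
  AND PGI.COUNTRY_CODE = CA.COUNTRY
  AND UPPER(PGI.THEATER) = UPPER('EURO');
```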

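The SQL Diagram

[Same diagram, now annotated with row counts and combined filter ratios:]

Filter ratios: LB 0.0008, LA 0.0013 (no index), HA 0.028, PGI 0.13
Combined filter ratios: 0.008 and 0.00012
Row counts labeled on the diagram: 698M, 231, 1M, 479M, 2.7M, 703M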

Best join order: HA, CA, PGI, LA, HB, IT, LB

But there is a catch! (Takes 2:40)

Takes 2:40, but at the rate it begins (based on those sampling tests), it should take half as long!

Explanation: Rollback overhead!
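The slides do not say how the rollback overhead was confirmed. One plausible check, sketched here as an assumption, is to read the session statistics from Oracle's V$MYSTAT and V$STATNAME views in the same session right after the query finishes. A large count of undo records applied, relative to consistent gets, suggests that much of the runtime went into reconstructing read-consistent versions of changed blocks.

```sql
-- Sketch only: run in the same session, immediately after the slow query.
-- Requires SELECT privilege on the V$ views.
SELECT sn.name, ms.value
FROM v$mystat ms, v$statname sn
WHERE sn.statistic# = ms.statistic#
  AND sn.name IN ('consistent gets',
                  'data blocks consistent reads - undo records applied');
```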

Solution: Hit the recent rows first, as these are the most likely to require rollback! Takes 1:20!

The SQL Fix, Anonymized

SELECT /*+ leading(ha_iv la hb it lb) use_nl(ha la hb it lb) index(it IT_N2) */ …
FROM LA,
     (SELECT /*+ leading(HA) index_desc(HA HA_PK) first_rows */ …
      FROM HA, CA, PGI
      WHERE HA.HEADER_ID > 0 /* True always, drives desc range scan */
        AND HA.ATTRIBUTE3 = 'Y'
        AND HA.STATUS_ID = 10047 /* two filters, together, 29K/1M */
        AND CA.SITE_ID = HA.INVOICE_SITE_ID
        AND CA.SITE_CODE = 'BILL_TO'
        AND PGI.COUNTRY_CODE = CA.COUNTRY
        AND UPPER(PGI.THEATER) = UPPER('EURO')
        AND ROWNUM > 0) HA_IV,
     <Rest of FROM clause>
WHERE <rest of joins and filters>

Conclusions

• Most optimized queries drive to recent rows first, automatically.
• Recent rows are hottest: they are the most likely to require rollback.
• The best way to avoid rollback overheads for these queries is simply to find the fastest plan. With less runtime, there will be fewer rows requiring rollback.

Conclusions

• Sometimes, optimized queries drive from time-independent conditions.
• The best way to avoid rollback overheads for these is to speed up the query, resulting in fewer row changes during the query runtime!

Conclusions

• Sometimes, optimized queries drive from time-independent conditions.
• In rare cases, these queries run so long that when they reach hot rows, rollback overheads are high.
• In these cases, driving to recent rows first, independent of any filter on the driving table (something like a reverse full table scan), can avoid rollback overheads because the last rows reached (old rows) are the least likely to require rollback.
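The recent-rows-first pattern from the fix can be reduced to a generic template. The sketch below uses hypothetical names throughout (the ORDERS table, its primary-key index ORDERS_PK, the columns ORDER_ID, STATUS, and LINE_ID, and the detail table ORDER_LINES). It assumes the key comes from an ever-increasing sequence, so that a descending scan of the primary-key index reaches the newest rows first.

```sql
-- Sketch only; all table, column, and index names are hypothetical.
SELECT /*+ LEADING(o_iv ol) USE_NL(ol) */
       o_iv.order_id, ol.line_id
FROM (SELECT /*+ INDEX_DESC(o ORDERS_PK) */
             o.order_id
      FROM orders o
      WHERE o.order_id > 0      -- always true; enables the descending range scan
        AND o.status = 'OPEN'   -- the real, time-independent filter
        AND ROWNUM > 0          -- keeps the optimizer from merging this inline view
     ) o_iv,
     order_lines ol
WHERE ol.order_id = o_iv.order_id;
```

Because the oldest rows are read last, the blocks visited late in a long run are the ones least likely to have changed since the query started.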

Questions?
