Whitepaper: Mining the AWR repository for Capacity Planning and Visualization


Mining the AWR Repository for Capacity Planning, Visualization, and other Real World Stuff

Karl Arao
Oracle ACE, OCP-DBA, RHCE

New CPUs and storage arrays are getting faster, but these resources are finite and come at a cost. Hence, capacity planning plays a very important role to ensure proper resources are available and to handle expected and unexpected workloads. Another critical matter for the DBAs and IT managers is justifying the expense of adding resources on the system. With guesswork you'll end up getting the most expensive hardware. With proper measurement, proper planning, and management of growth, you'll be able to get just the right hardware for your workload with allowance for a particular growth period. This will result in huge savings for the company and a happier IT shop.

AWR is a built-in data store that started in 10gR1 and is very much like a "Statspack on steroids". It has improved significantly in 11gR2, enabling you to have far better workload information when going through all the AWR snapshots. From the AWR data samples, we could build amazing reports that will let us notice trends and make it possible to visualize data and use statistical methods for analysis. Even more surprising about the AWR data samples is we are able to define the database server's Capacity, Requirements, and Utilization in terms of CPU, IO, memory, and network, which are very important key metrics for Capacity Planning.

In this paper you will learn how to make use of the AWR, specifically the DBA_HIST views, to have a clear-cut measurement on resources to aid in Capacity Planning, Predictive Analysis, and Performance Firefighting.

PART 1 - Where it all started

I'm certain that DBAs or developers don't have enough time to spare to read 108 AWR reports in a day, even more 108 AWR reports in just an hour. How about 108 AWR reports in 5 minutes just to answer the question what period is the bottleneck? And what is the bottleneck? Well before it will take so much of my time just to generate these AWR reports by hand, and it is a daunting and repetitive execution of awrrpt.sql. You will be overwhelmed by the manual AWR reports generation, especially when you start reading each of them and you only need to see particular sections of the 1000+ lines of performance data to correlate it to the problem at hand.

Definitely this will lead to longer analysis periods hence longer time for a problem to be solved and as a database consultant I must also be aware on how to optimize my troubleshooting time. You can argue that there are visual tools already available to help with the troubleshooting but what if you are only left with just a command line or an SQL*Plus session?

This scenario triggered me to mine on the source tables of the AWR report to cut out the unnecessary and present the performance data in more meaningful manner that will be easier for me to notice trends and even possible for me to visualize the data, or even possible to do some statistics out of it.

PART 2 - How to mine the AWR

AWR is much like "Statspack on steroids"; it is a wonderful data collector for Oracle and OS statistics. The underlying sources of the AWR report are the DBA_HIST views which have grown from 67 in Oracle version 10.1 to 108 in Oracle version 11.2.

The AWR report provides a single summary report based upon an interval of time. On the image below we can create an AWR report for SNAP_ID 335 to 339 that is an interval time from 6:20 to 7:01AM. However when analyzing performance problems we are more interested to see what occurred during each of the samples (335, 336, 337, 338 and 339) within the specified interval. In that way we have a granular view of what's going on and have a better view on the workload change that's happening.

For the query output we are investigating the SYSSTAT statistic "physical reads" which is the total number of data blocks read from disk. It is also important to note this is a cumulative physical reads by all the database sessions since instance start. We are particularly interested on the delta of each SNAP_ID, that is end_value - start_value = delta, and transforming it to a more meaningful and readable output.

To transform the delta to a more meaningful output that we could easily understand we would apply the IO MB/s formula. See the example for SNAP_ID 338 below:

    IO MB/s = ((delta * <block_size>) / 1024 / 1024) / <snap_duration_in_seconds>
            = ((5663126 * 8192) / 1024 / 1024) / 603
            = 73.37 MB/s

To validate the accuracy of the derived value we need to compare it with the actual AWR report on SNAP_ID 338 - 339. The image below shows the delta we used to derive the MB/s is correct.

Also a run of Automatic Database Diagnostic Monitor (ADDM) on SNAP_ID 338 - 339 shows that we are reaching the throughput of 74 MB/s that is really close to our derived value.
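The IO MB/s arithmetic for SNAP_ID 338 can be rechecked with a few lines of Python. This is only a sketch of the formula as stated in the text; the function name `io_mb_per_sec` is made up for illustration, and the three inputs are the figures quoted above (delta of 5663126 blocks, 8192-byte block size, 603-second snap duration).

```python
def io_mb_per_sec(delta_blocks, block_size_bytes, snap_seconds):
    """Convert a delta of 'physical reads' (counted in blocks) into MB/s."""
    megabytes = delta_blocks * block_size_bytes / 1024 / 1024
    return megabytes / snap_seconds


# Figures quoted in the text for SNAP_ID 338: delta, block size, snap duration.
rate = io_mb_per_sec(delta_blocks=5663126, block_size_bytes=8192, snap_seconds=603)
print(f"{rate:.2f} MB/s")
```

The result rounds to the 73.37 MB/s derived in the text, which is in line with the 74 MB/s reported by ADDM.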

And checking it with the Enterprise Manager Performance page shows that the Disk IO is around our derived value.

The data shown above comes from the query below:

    SELECT * FROM
    (
    SELECT s0.snap_id snap_id,
      TO_CHAR(s0.END_INTERVAL_TIME,'YY/MM/DD HH24:MI') TIME,
      s10t0.stat_name,
      s10t0.value start_value,
      s10t1.value end_value,
      (s10t1.value - s10t0.value) delta,
      round(((((s10t1.value - s10t0.value) * 8192)/1024/1024) / ((round(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440
                + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60
                + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME)
                + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2))*60)
      ),2) as phyreads_mbps               -- physical reads, diffed
    FROM dba_hist_snapshot s0,
         dba_hist_snapshot s1,
         dba_hist_sysstat s10t0,
         dba_hist_sysstat s10t1
    WHERE s0.dbid              = 2607950532    -- DBID
    AND s1.dbid                = s0.dbid
    AND s10t0.dbid             = s0.dbid
    AND s10t1.dbid             = s0.dbid
    AND s0.instance_number     = 1             -- INSTANCE_NUMBER
    AND s1.instance_number     = s0.instance_number
    AND s10t0.instance_number  = s0.instance_number
    AND s10t1.instance_number  = s0.instance_number
    AND s1.snap_id             = s0.snap_id + 1
    AND s10t0.snap_id          = s0.snap_id
    AND s10t1.snap_id          = s0.snap_id + 1
    AND s10t0.stat_name        = 'physical reads'
    AND s10t1.stat_name        = s10t0.stat_name
    )
    WHERE snap_id in (335,336,337,338,339)
    ORDER BY snap_id ASC;


You may have noticed that I used the SQL trick below, which has a similar effect to the LAG function. It lets the query return the start_value and end_value on a single row, making it possible to get the delta value and apply the performance formula. The view DBA_HIST_SNAPSHOT also acts as the ultimate reference for snapshot information: joining it to the other DBA_HIST views provides meaningful data on other subsystems or workload performance data.

    AND s10t0.snap_id          = s0.snap_id
    AND s10t1.snap_id          = s0.snap_id + 1
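The effect of that self-join can be shown outside the database. The sketch below pairs each snapshot with the one whose snap_id is one higher and computes the delta of a cumulative counter, which is the same idea as the two predicates above. The function name `snapshot_deltas` and the sample values are hypothetical and are not taken from the paper's data.

```python
def snapshot_deltas(samples):
    """Pair each snapshot with snap_id + 1 and return the counter delta.

    samples: list of (snap_id, cumulative_value) tuples.
    Returns a list of (snap_id, start_value, end_value, delta) tuples.
    """
    by_id = dict(samples)
    rows = []
    for snap_id in sorted(by_id):
        next_id = snap_id + 1
        # Same pairing rule as: s10t1.snap_id = s0.snap_id + 1
        if next_id in by_id:
            start_value = by_id[snap_id]
            end_value = by_id[next_id]
            rows.append((snap_id, start_value, end_value, end_value - start_value))
    return rows


# Hypothetical cumulative "physical reads" values per snapshot.
samples = [(335, 1000), (336, 1500), (337, 2700), (338, 2900)]
deltas = snapshot_deltas(samples)
for row in deltas:
    print(row)
```

A snapshot with no successor (338 here) produces no row, just as the join drops it. If the counter is reset between two snapshots, for example by an instance restart, the delta comes out negative, so such rows need separate handling.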

The query I've shown you is just one part of the story; it only gives the "IO Read MB/s", an IO subsystem statistic. Ideally we must correlate the following subsystems of the database server to fully characterize the overall workload and performance:

1) Oracle: Oracle instance and database configuration
2) Operating System: CPU, memory, IO, and network
3) Application: SQLs and anything specific to the application

For the correlation we would be using the "3-circle analysis" technique [1], where each subsystem represents a circle and is diagnosed separately and then in combination. If the problem resides with the database server, the overlap of the 3 circles is the current performance problem. By doing this we will have a clear correlation of the workload and performance across subsystems and can target our efforts to improve the overall response time. In mining the AWR, having a query in a time series layout with only the relevant statistics shown side by side can be very useful in various ways. Even if the statistics can't be shown side by side, each bottleneck period relates to a particular SNAP_ID, so correlation across the various performance data is still possible.

Having this we would have the following advantages:

- Quickly notice trends for performance diagnosis
- We have the beautiful set of workload and performance data now in our control
- We have lots of data points for statistical and predictive analysis
- Faster analysis than ever!

As I go along with my research of mining the AWR I have created and collected some useful scripts that I have applied successfully on real world performance scenarios.

The chart below shows the categorical relationship of the scripts:

The table below shows the important details of the scripts and some reason behind how they are formatted and created:

IMPORTANT NOTE: Diagnostic Pack License is needed for the scripts

Script Name: awr_genwl
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
Data presented: AAS, CPU capacity, CPU requirements, Memory requirements, IO requirements, Logged on users, CPU Utilization
Description: This is the starting point. You first run this SQL to have an overview of the load of the database server. It clearly shows the relationship of the formula Utilization = Requirements / Capacity. The AAS column serves as a (golden) metric on finding the periods where the database could be having a bottleneck or just idle.

Script Name: awr_topevents
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SYSTEM_EVENT, DBA_HIST_SYS_TIME_MODEL
Data presented: Event, Event Rank, Waits, Time, Avgwt (ms), DB Time %, AAS, Wait Class
Description: This is a version of "Top 5 Timed Events" but across SNAP_IDs with AAS metric. Coming from the awr_genwl, for the AAS to be more useful we must be aware about the components of AAS (much like drilling down on the time components) and have this kind of data over a period of time (across SNAP_IDs). Graphing this data will be much like the Enterprise Manager that outputs a nice graph and slicing the AAS components to different wait classes giving you a broad "historical" view which you could go back and drill down on the past load activity.

Script Name: awr_services
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SERVICE_STAT
Data presented: Service Name, DB Time, DB CPU, Physical Reads, Logical Reads, AAS
Description: Service enables the grouping of common database connections or allowing the distribution of connections (e.g. RAC). This data is commonly seen on the Enterprise Manager to give us a classification of the application/module activity on the database. Showing this data in a time series manner and adding an AAS column will give us an idea if particular applications are driving most of the workload of the database.

Script Name: awr_sysstat
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
Data presented: AAS, LIO/s, DB Block Changes/s, User Calls/s, Parses/s, Hard Parses/s, Sorts/s, Logon/s, SQL*NET to client MB, SQL*NET to dblink MB
Description: This is a version of "Load Profile" but across SNAP_IDs with AAS metric. Useful to quickly notice the Oracle workload change. You may put any additional SYSSTAT statistics you want to monitor here.

Script Name: awr_topsqlx
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT, DBA_HIST_SQLTEXT
Data presented: SQL_ID, Plan Hash Value, Module, Elapsed Time (s), Elapsed Time / exec (s), CPU Time (s), IO Time (s), App Time (s), Concurrency Time (s), Cluster Wait (s), LIO, PIO, Direct Writes, Rows, Exec, Parse Count, PX Exec, Time Rank, AAS, SQL_TEXT
Description: The "SQL section" of the AWR report is usually segregated into sections ordered by Elapsed Time, CPU Time, Gets, Reads, Executions, and Parse Calls. Having the data for a particular problematic SQL_ID spread over 1000+ lines of report makes it hard to find every detail about its performance. I feel there's a better way to present the data. Here are the sections you'll get from the script, with a short description of each:

1) snap_id, time, instance, snap duration
The time period and snap_id can be used to show the SQLs for a given workload period. Let's say your usual work hours are 9-6pm; you could show just the SQLs for that period. There's a data range section at the bottom of the script that you can use if you want to filter.

2) sql_id, plan_hash_value, module
Use this info if you want to know where the SQL was executed (SQL*Plus, OWB, Toad, etc.). You could also compare the plan_hash_value, but I suggest you make use of Kerry Osborne's awr_unstable_plans.sql script if you'd like to search for unstable plans.

3) total elapsed time, elapsed time per exec, cpu time, io time, app wait time, concurrency wait time, cluster wait time
This is the time info. Even without tracing the SQL you'd know which time component is consuming the elapsed time of that particular SQL. Let's say your total elapsed time is 1000sec, with cpu time of 30sec and io time of 300sec: you would know that it is consuming significant IO, but you have to look for the other 670sec, which could be attributed to "other" wait events (like PX Deq Credit: send blkd, etc.).

4) LIOs, PIOs, direct writes, rows, executions, parse count, PX
Some other statistics about the SQL: whether you're incurring a lot of PIOs, how many times this SQL was executed in that period, the number of PX servers spawned. Just be careful with these numbers: if you have "executions" of, let's say, 8, you have to divide these values by 8, and the same goes for the time section. Only the "elapsed time per exec" is a per-execution value. This is for formatting reasons, because I can't fit them all on my screen.

5) AAS (Average Active Sessions), Time Rank, SQL type, SQL text
This is one of my favorites. It measures how the SQL is performing against the database server. I'm using the AAS and CPU count as my yardstick for a possible performance problem (I suggest reading Kyle's stuff about this):

    If AAS < 1
      -- Database is not blocked
    AAS ~= 0
      -- Database basically idle
      -- Problems are in the APP not DB
    AAS < # of CPUs
      -- CPU available
      -- Database is probably not blocked
      -- Are any single sessions 100% active?
    AAS > # of CPUs
      -- Could have performance problems
    AAS >> # of CPUs
      -- There is a bottleneck

So having the AAS as another metric on the TOP SQL is good stuff. I've also added the "time rank" column to show the SQL's ranking among the top SQL. Normally the default settings of the script show time rank 1 to 5. This is also useful if you are looking for a particular SQL that is at rank #15 while an ad hoc query at time rank #1 and #2 is affecting the database performance.

This script can also show SQLs that span across SNAP_IDs. I would order the output by SNAP_ID and filter on that particular SQL. If the SQL is still running and spans, let's say, 2 SNAP_IDs, the exec count would be 0 (zero) and elapsed time per exec would be 0 (zero); these values are populated only once the query has finished. I've noticed this behavior and it's the same thing that is shown on the AWR reports. You could go here for that scenario: http://karlarao.tiddlyspot.com/#%5B%5BTopSQL%20on%20AWR%5D%5D
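Two pieces of arithmetic from the awr_topsqlx description can be sketched in Python: the "other" time left after subtracting the known time components, and the per-execution normalization of totals. The function names are hypothetical. The 1000/30/300 second figures and the execution count of 8 come from the text, while the LIO total of 80000 is an invented input.

```python
def unaccounted_seconds(elapsed, cpu, io, app=0.0, concurrency=0.0, cluster=0.0):
    """Elapsed time not explained by the time components the script reports."""
    return elapsed - (cpu + io + app + concurrency + cluster)


def per_execution(total, executions):
    """Normalize a snapshot total to a per-execution value.

    Returns 0.0 when executions is 0, which the text says is what a statement
    still running across snapshots shows.
    """
    return total / executions if executions else 0.0


other = unaccounted_seconds(elapsed=1000, cpu=30, io=300)
lio_per_exec = per_execution(total=80000, executions=8)  # hypothetical LIO total
print(other, lio_per_exec)
```

The first value reproduces the 670 seconds that the text says must be attributed to other wait events.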

Script Name: awr_topsql
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT, DBA_HIST_SQLTEXT
Data presented: SQL_ID, Plan Hash Value, Module, Elapsed Time (s), Elapsed Time / exec (s), CPU Time (s), Cluster Wait (s), LIO, PIO, Rows, Exec, Parse Count, PX Exec, Time Rank, AAS
Description: Similar columns to awr_topsqlx, but this time showing just the top 20 SQLs across SNAP_IDs.

Script Name: awr_unstable_plans (by Kerry Osborne)
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT
Data presented: SQL_ID, Executions, Min/Max/Avg Etime, Avg LIO, STD_DEV
Description: This script finds SQL statements with plan instability. I like the clever use of standard deviation to show SQLs with variable elapsed time.
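To illustrate why standard deviation is a handy signal here, the sketch below compares two invented series of average elapsed time per execution. It is not Kerry Osborne's script and makes no claim about how that script computes its STD_DEV column. The function name and the data are hypothetical, and the sample standard deviation is an assumed choice.

```python
from statistics import stdev


def elapsed_time_spread(avg_etimes):
    """Sample standard deviation of per-snapshot average elapsed times (seconds)."""
    if len(avg_etimes) < 2:
        return 0.0  # not enough observations to measure spread
    return stdev(avg_etimes)


# Hypothetical average elapsed seconds per execution, one value per snapshot.
stable_sql = [1.0, 1.1, 0.9, 1.0]
unstable_sql = [1.0, 1.1, 45.0, 0.9]

print(round(elapsed_time_spread(stable_sql), 3))
print(round(elapsed_time_spread(unstable_sql), 3))
```

A statement whose elapsed time jumps in one snapshot stands out with a spread that is orders of magnitude larger, which makes it a candidate for a plan comparison.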

Script Name: awr_parm_mods (by Kerry Osborne)
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_PARAMETER, V$INSTANCE
Data presented: Parameter Name, Old Value, New Value
Description: This script shows all parameters (including hidden) that have been modified.

Script Name: awr_netwl
DBA_HIST views: DBA_HIST_SYSMETRIC_SUMMARY
Data presented: Network Minvalue (MB)/s, Network Maxvalue (MB)/s, Network Avgvalue (MB)/s, Network STD_DEV (MB)/s
Description: The data comes from the metric family of tables, using the metric "Network Traffic Volume Per Sec". Keep in mind that metrics are different from sysstat values. With sysstat you just get the delta and the rate. With metrics the sampling is different: if the snap duration is 10 minutes, the metric is sampled at 60-second intervals (num_interval) and the max, min, avg, and std_dev of those samples are recorded.
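The difference between a sysstat-style rate and a metric summary can be sketched as follows. The ten sample values are invented, the function name is hypothetical, and using the sample standard deviation is an assumption, since the text does not say which variant the metric tables store.

```python
from statistics import mean, stdev


def summarize_metric(samples):
    """Summarize per-interval samples the way a metric summary row does."""
    return {
        "num_interval": len(samples),
        "min": min(samples),
        "max": max(samples),
        "avg": mean(samples),
        "std_dev": stdev(samples),  # assumption: sample standard deviation
    }


# Ten hypothetical 60-second samples (MB/s) inside one 10-minute snapshot.
samples = [1.0, 1.2, 0.8, 5.0, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0]
summary = summarize_metric(samples)
print(summary)
```

A plain delta over the snapshot would report only the average of about 1.4 MB/s. The max of 5.0 and the standard deviation are what reveal the short burst.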

Script Name: awr_est_gc_traffic (by John Kanagaraj)
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SYSSTAT, DBA_HIST_DLM_MISC, V$DATABASE, V$PARAMETER
Data presented: Estimated Interconnect Traffic (KB)
Description: This script is ideal for RAC environments and shows the interconnect throughput of an instance. Very useful if you want to check whether the interconnect is being saturated.

Script Name: awr_iowl
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
Data presented: AAS, CPU IO WAIT, Utilization, OS Load, Single Block R/W IOPS, Multi Block R/W IOPS, R/W MB/s, Total R/W IOPS, R/W Ratio, HW Disk IOPS, HW # of Disks
Description: This script is ideal for monitoring the Oracle IO activity. Very useful for sizing and consolidating storage for Oracle databases. It can be used together with a storage monitoring tool to get a complete picture of IO performance. The last two columns use the formulas that storage engineers apply to determine the number of disks needed by the database:

    HW Disk IOPS  = (IOPS * Read Ratio) + (IOPS * Write Ratio * RAID penalty)
    HW # of Disks = Total disk IOPS / IOPS per disk

Of course the "HW # of Disks" is not the final number. There are other factors (bandwidth, throughput, service time, etc.) that need to be considered to determine the right storage for a particular IO workload, but this can be your starting point. Benchmarking will also help a lot with the storage decisions.
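The two storage formulas above are easy to try with numbers. In this sketch every input is hypothetical: 1000 total IOPS, a 70/30 read/write ratio, a RAID write penalty of 2 (the value commonly quoted for mirrored RAID levels), and 180 IOPS per disk. Rounding the disk count up to a whole disk is an addition made here and is not part of the formula in the text.

```python
import math


def hw_disk_iops(total_iops, read_ratio, write_ratio, raid_penalty):
    """HW Disk IOPS = (IOPS * Read Ratio) + (IOPS * Write Ratio * RAID penalty)."""
    return total_iops * read_ratio + total_iops * write_ratio * raid_penalty


def hw_disk_count(disk_iops, iops_per_disk):
    """HW # of Disks = Total disk IOPS / IOPS per disk, rounded up to whole disks."""
    return math.ceil(disk_iops / iops_per_disk)


# All inputs below are hypothetical and serve only to exercise the formulas.
backend_iops = hw_disk_iops(total_iops=1000, read_ratio=0.7, write_ratio=0.3, raid_penalty=2)
disks = hw_disk_count(backend_iops, iops_per_disk=180)
print(backend_iops, disks)
```

With these inputs the front-end load of 1000 IOPS becomes about 1300 back-end IOPS, which needs 8 disks before bandwidth and service time are taken into account.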

Script Name: awr_io_ts
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_FILESTATXS, DBA_HIST_TEMPSTATXS
Data presented: Tablespace R/W IOPS, Tablespace R/W latency
Description: This script shows the IO performance of the tablespaces. This is the same as what you see in AWR but across SNAP_IDs. The latency formula is as follows:

    latency (ms) = (readtim / phy reads) * 10

Keep in mind that in this script the IOPS and latency values are aggregated from all the datafiles of the tablespace. So the numbers from this script may not represent the actual latency of any single file, but the textual trends of high latency (ms) numbers can warn you about particular workload periods that are worth probing with small duration samples.
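The sketch below applies the latency formula to two invented datafiles and then to their tablespace-level aggregate, to show how aggregation can mask a slow file. The function name and all numbers are hypothetical. The factor of 10 is taken from the formula in the text and converts the read time, assumed here to be in hundredths of a second, into milliseconds.

```python
def avg_read_latency_ms(readtim, phy_reads):
    """latency (ms) = (readtim / phy reads) * 10, guarding against zero reads."""
    if phy_reads == 0:
        return 0.0
    return readtim / phy_reads * 10


# Hypothetical per-datafile deltas for one snapshot: (readtim, physical reads).
datafiles = [(100, 1000), (900, 100)]

per_file = [avg_read_latency_ms(t, r) for t, r in datafiles]
tablespace = avg_read_latency_ms(
    sum(t for t, _ in datafiles),
    sum(r for _, r in datafiles),
)
print(per_file, round(tablespace, 2))
```

One file averages 1 ms and the other 90 ms, yet the tablespace-level figure is about 9 ms. That gap is the reason to drill down to awr_io_file when a tablespace trend looks suspicious.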

Script Name: awr_io_file
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_FILESTATXS, DBA_HIST_TEMPSTATXS
Data presented: Datafile R/W IOPS, Datafile R/W latency
Description: This script shows the IO performance of the datafiles. This is the same as what you see in AWR but across SNAP_IDs. Keep in mind that the IOPS and latency values may be averaged out (normalized) when the snap interval is long (60 minutes and above) compared to 5-second samples or a 10-minute snap interval (see Appendix).

Script Name: r2toolkit [2]
DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_DATABASE_INSTANCE, DBA_HIST_SYSSTAT, DBA_HIST_SYSTEM_EVENT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_OSSTAT, DBA_HIST_WR_CONTROL
Data presented: Y and X values that can be plotted for Linear Regression
Description: This is a performance toolkit that uses AWR data and Linear Regression to identify which metric/statistic is driving the database server's workload. The data points can be very useful for capacity planning, letting you make informed decisions instead of relying on guesswork.
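As a rough illustration of the idea behind regressing a resource metric against a workload statistic, here is an ordinary least-squares fit in plain Python. It is a generic sketch and not code from the r2toolkit. The data points (logical IOs per second against CPU utilization percent) are invented.

```python
def linear_fit(xs, ys):
    """Ordinary least squares: return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data points, one per snapshot: X = LIO/s, Y = CPU utilization %.
lio_per_sec = [1000, 2000, 3000, 4000]
cpu_pct = [12.5, 21.5, 32.5, 41.5]

slope, intercept, r_squared = linear_fit(lio_per_sec, cpu_pct)
forecast = intercept + slope * 6000  # projected CPU % at 6000 LIO/s
print(round(slope, 4), round(intercept, 2), round(r_squared, 4), round(forecast, 1))
```

An R-squared close to 1 suggests the chosen statistic tracks the resource closely enough to be used for projection. A low value means another statistic is probably driving the workload.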

You can also do the same kind of mining with Statspack. Each DBA_HIST view has a counterpart Statspack table and you can achieve similar results:

    DBA_HIST_SNAPSHOT       = STATS$SNAPSHOT
    DBA_HIST_OSSTAT         = STATS$OSSTAT
    DBA_HIST_SYS_TIME_MODEL = STATS$SYS_TIME_MODEL
    DBA_HIST_SYSSTAT        = STATS$SYSSTAT

The scripts mentioned are freely downloadable, and you will find more details on the math and performance formulas (rates, time, IOPS, CPU, latency, utilization, AAS) when you look into the SQL code. If you are serious about mining the AWR, I would also suggest taking time to play further with the DBA_HIST tables and the underlying data. You'll come away with a better understanding of how the data on the plain AWR report is derived.

PART 3 - Visualization

Average Active Sessions (AAS) has become my default (golden) metric for finding the periods where the database could be having a bottleneck or is just idle. Essentially AAS is the database load; this value should not go above the CPU count (NUM_CPUS in DBA_HIST_OSSTAT), and if it does, the database is either working very hard or waiting a lot for something. Together, AAS and CPU count are used as a yardstick for a possible performance problem [3]:

    If AAS < 1
      -- Database is not blocked
    AAS ~= 0
      -- Database basically idle
      -- Problems are in the APP not DB
    AAS < # of CPUs
      -- CPU available
      -- Database is probably not blocked
      -- Are any single sessions 100% active?
    AAS > # of CPUs
      -- Could have performance problems
    AAS >> # of CPUs
      -- There is a bottleneck
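The yardstick above can be turned into a small classifier. The rules follow the list in the text, but two cut-offs are assumptions introduced here because the text does not quantify them: `idle_eps` stands in for "AAS ~= 0" and `severe_factor` stands in for "AAS >> # of CPUs". The CPU count of 4 in the usage lines is hypothetical.

```python
def classify_aas(aas, num_cpus, idle_eps=0.05, severe_factor=2.0):
    """Map an AAS value to the yardstick categories.

    idle_eps and severe_factor are assumed thresholds, not values from the paper.
    """
    if aas <= idle_eps:
        return "idle"                          # AAS ~= 0
    if aas < 1:
        return "not blocked"                   # AAS < 1
    if aas < num_cpus:
        return "CPU available"                 # AAS < # of CPUs
    if aas < num_cpus * severe_factor:
        return "possible performance problem"  # AAS > # of CPUs
    return "bottleneck"                        # AAS >> # of CPUs


# Hypothetical 4-CPU server; 2.2 and 3.5 are the AAS spike range from the text.
for value in (0.0, 0.5, 2.2, 3.5, 5.0, 12.0):
    print(value, classify_aas(value, num_cpus=4))
```

Applied per SNAP_ID, a rule like this gives a quick first pass over a long retention period before drilling into the wait classes.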

Just like a doctor's stethoscope, AAS can be your first instrument when investigating performance problems, but it doesn't stop there. For it to be more useful you must be aware of the components of AAS (much like drilling down on the time components) and have this kind of data over a period of time (across SNAP_IDs). Enterprise Manager draws these nice graphs on the "Performance" and "Top Activity" pages, slicing the AAS components into different "Wait Classes", and it has a "Historical" view which lets you go back and drill down on the past load activity.

But what could be the problem?

I know some of you have encountered this Enterprise Manager error at some point. You are configured with a long AWR retention period (365 days to exaggerate it) but Enterprise Manager won't let you go back farther, all because there was an instance shutdown between the date you want to go and the date you are now. Or could be some other issue where Enterprise Manager really can't just give you the visualization you need.

So what could be the alternative?

1) Use the Top Timed Events SQL (awr_topevents.sql) and focus on the AAS and wait class columns across SNAP_IDs
2) Or use the script together with Perfsheet, a great tool for ad-hoc performance visualization [4]

To be consistent with the initial example we will focus on the same interval time 6:20 to 7:01 AM, that is SNAP_ID 335-339. Note that the AAS during this period had a sudden spike that is on the range of 2.2 to 3.5.

[Figure: awr_genwl.sql output]

The image below is a stacked area chart of the awr_topevents.sql using Perfsheet. It's clear from the image that there's a big spike on the database load... but we want to know more about it by drilling down on the AAS components.

[Figure: Stacked area chart of AAS]

Looking at the "textual trends" of awr_topevents.sql output, just by looking at the AAS column we would easily know which AAS component is driving the workload of the database. For the particular SNAP_IDs of high activity, it's evident that there's a high User IO activity.

[Figure: awr_topevents.sql output]

Some more background

On the Enterprise Manager "Performance" and "Top Activity" pages you'll see the AAS components are sliced into different wait classes. But did you know that their data sources are different?

The 2nd slide of Kyle Hailey's presentation [3] on AAS (Average Active Sessions) says that there are two ways to derive the value:

1) Time Statistics
2) Sampling

AAS on the Performance Page uses "Time Statistics" and is actually from v$system_event + CPU from time model. This is also what the script awr_topevents.sql is doing... it unions the output of "events" on DBA_HIST_SYSTEM_EVENT and the "CPU" from time model view DBA_HIST_SYS_TIME_MODEL and then filters only the top 5 and does this across the SNAP_IDs. But for graphing purposes on the Perfsheet, to make it look similar to the Enterprise Manager Performance Page, I have to include all of the "events" so that all the AAS values will be counted. By the way, on 10g and below the load chart is coming from v$system_event + v$sysstat "CPU used by this session".

AAS on the Top Activity Page uses "Sampling" and by default is taking advantage of ASH (samples) and does it on a 15sec refresh rate... but as I have observed, when you switch from the Real Time 15 sec refresh to Historical then it also starts to behave like the Performance Page (pulls data from v$system_event + CPU from time model).

So what's the effect?

On a high CPU activity period you'll notice that there will be a higher AAS on the Top Activity Page compared to the Performance Page. Simply because ASH samples every second and it does that quickly on every active session (the only way to see CPU usage real time), while the time model CPU, although it updates quicker (5secs I think) than v$sysstat "CPU used by this session", could still have some lag time and will still be based on Time Statistics (one of two ways to calculate AAS), which could be affected by averages.

If you want more info about the details around the Performance and Top Activity page this is worth reading: History of Session Load [5] and AAS investigation [14]

Now time for Perfsheet a la Enterprise Manager!

Finding the AAS component that's driving the workload is a lot easier in graphics. The image below shows that we can create the same visualization like the Enterprise Manager broken down into "Wait Class".

[Figure: Stacked area chart AAS components - wait class]
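The two derivations of AAS discussed in this section, time statistics and sampling, can be compared on a toy example. This sketch assumes the usual definitions: accumulated active time divided by elapsed time, and the average count of active sessions per sample. The function names and the workload (8 sessions busy for the first 60 seconds of a 600-second interval) are hypothetical.

```python
def aas_from_time_statistics(active_seconds, elapsed_seconds):
    """Time statistics: accumulated session time (waits + CPU) over wall-clock time."""
    return active_seconds / elapsed_seconds


def aas_from_samples(active_session_counts):
    """Sampling: average number of active sessions seen per sample."""
    return sum(active_session_counts) / len(active_session_counts)


# Hypothetical 600-second interval with one sample per second:
# 8 sessions are active for the first 60 seconds, then the database is idle.
per_second_counts = [8] * 60 + [0] * 540

interval_aas = aas_from_time_statistics(active_seconds=8 * 60, elapsed_seconds=600)
sampled_aas = aas_from_samples(per_second_counts)
burst_aas = aas_from_samples(per_second_counts[:60])
print(interval_aas, sampled_aas, burst_aas)
```

Over the whole interval both methods give 0.8, but the samples from the first minute alone give 8.0. This is the averaging effect mentioned above: a coarse interval can hide a short burst that fine-grained samples still show.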

Page 13: Whitepaper: Mining the AWR repository for Capacity Planning and Visualization

Even more, we now have the data in our control, so we can play around with it and create interesting graphs. The chart below is broken down into “Wait Events”; aside from being more colorful, it lets you see which wait event is consuming most of the AAS.

Stacked area chart AAS components - wait events

Ooops, don’t get too excited.. an important reminder… the 2-dimensional stacked area chart that Enterprise Manager uses can hide important information and can sometimes be misleading [13]. It really helps to have another view and see the data clearly separated into its respective components, rather than being stacked. As you compare the charts above and below, you’ll know what I mean.. Wait Class and Wait Events in a 3D area chart view can tell a more meaningful story.

Compare the wait class charts… above, notice that the blue (Other wait class) is in the range of AAS of 1, while below it is in the range of 0.1 (hidden between CPU and System IO)… that’s a big difference!

Then compare the wait event charts… notice the big difference? Above you can’t really tell what’s happening, but in 3D you can see that only db file sequential read and direct path read are at the AAS of 1.6 on SNAP_ID 335 and 336. Yes, you will also not be fooled when you look at the raw data… but visualization is much easier and the way to go, as long as you are able to sense and validate whether it is driving you to bad conclusions.

3D area chart AAS components - wait class

3D area chart AAS components - wait events

AAS throughout the AWR retention period!

On my test machine I have a 365-day retention period. This enables me to have a data warehouse of performance data. You can see from the chart below (stacked area chart) that the period we are focusing on (6:20 to 7:01 AM, SNAP_ID 335-339) happens to be the highest load period of all the AAS samples for the lifetime of my test database. You can also see the periods of shutdowns (negative values) and other time periods where AAS went beyond my maximum CPU, which could justify a drill down on the specific SNAP_IDs or time frame… from there you could use ASH, run the AWR report, run ADDM, or make use of your high caliber scripts!


PART 4 - Capacity Planning

Utilization is the ultimate metric!

Capacity planning plays a very important role in ensuring that proper resources are available and able to handle expected and unexpected workloads. The primary principle is to ensure that the application workload requirements will fit into the available capacity of the database server, and for this we need a facility for workload measurement [7]. The good thing is that the data collection process is already being done by AWR. We just need to extract the data and present it in a more meaningful and useful manner.

Measuring the workload will give us the following advantages and benefits [7]:
- Have enough capacity and not overbuy
- Enable us to quantify the results of response time optimizations in the savings of system resources
- Enable us to quantify the benefit of workload reduction

In the Introduction to Oracle Server Consolidation paper [6] and in Chapter 9 of his book [8], Craig Shallahamer explains in detail what information you need in order to define the Database Server’s Capacity, Requirements, and Utilization.

Essentially, what we care about most in Capacity Planning is the database server utilization, and it is represented by this formula:

Utilization = Requirements / Capacity

As shown in the image below, the “empty pitcher” represents the database server capacity, while the “glass with water” and “another pitcher with beer” are the Oracle workload requirements. Typically the IT shop makes a decision to purchase the database server, that is, they define the capacity. Then they start pouring the applications into the server. And of course, the application requirements may or may not fit nicely into the available database server capacity. When they don’t fit nicely, there can be an excess of capacity, which means IT spent too much, or it could be the other way around, where the capacity is not enough for the requirements at hand.

The good thing here is, you are not guessing!
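To make the Utilization formula concrete, here is a minimal Python sketch. The function name and the sample numbers are my own illustration and are not taken from AWR or from the scripts in this paper; requirements and capacity only need to be expressed in the same unit (for example CPU seconds per second).

```python
def utilization(requirements: float, capacity: float) -> float:
    """Utilization = Requirements / Capacity, returned as a fraction (1.0 = 100%)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return requirements / capacity


# Hypothetical numbers: the workload consumes 1.2 CPU seconds per second
# on a server that can supply 4 CPU seconds per second (4 cores).
cpu_util = utilization(requirements=1.2, capacity=4.0)
print(f"CPU utilization: {cpu_util:.0%}")
```

A value above 1.0 means the requirements no longer fit into the pitcher, which is the "capacity is not enough" case described above.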


This simple and very useful concept can be applied in AWR as well. Using the awr_genwl.sql script, the data is presented in a manner that lets us easily abstract the performance statistics into the Utilization formula.

Having the data presented this way, we can easily apply filters to the data set and immediately find the workload periods with high workload requirements.
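As a rough illustration of this kind of filtering once the script output has been exported, here is a small Python sketch. The keys id, aas, oracpupct and oscpupct mirror the columns the script's filters refer to; the sample values, the function name and the default thresholds are made up for the example.

```python
# Hypothetical exported rows, one per snapshot (values are invented).
rows = [
    {"id": 335, "aas": 1.6, "oracpupct": 62.0, "oscpupct": 70.0},
    {"id": 336, "aas": 1.6, "oracpupct": 58.0, "oscpupct": 66.0},
    {"id": 337, "aas": 0.4, "oracpupct": 12.0, "oscpupct": 15.0},
    {"id": 338, "aas": 1.1, "oracpupct": 35.0, "oscpupct": 52.0},
]


def high_workload(rows, min_aas=1.0, min_oracpupct=50.0):
    """Keep snapshots whose AAS and Oracle CPU % both exceed the thresholds."""
    return [
        r for r in rows
        if r["aas"] > min_aas and r["oracpupct"] > min_oracpupct
    ]


busy = high_workload(rows)
print([r["id"] for r in busy])
```

In practice the same conditions are simply added to the WHERE clause of the script, so this is only meant to show the idea on a small data set.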


And we could do other filtering as well…

AAS range
    aas > 1
Per SNAP_ID or range of SNAP_IDs
    id in (336)
    where id >= 336 and id <= 340
Oracle CPU Utilization
    oracpupct > 50
OS CPU Utilization
    oscpupct > 50
Particular Workload periods
    AND TO_CHAR(s0.END_INTERVAL_TIME,'D') >= 1     -- Day of week: 1=Sunday 7=Saturday
    AND TO_CHAR(s0.END_INTERVAL_TIME,'D') <= 7
    AND TO_CHAR(s0.END_INTERVAL_TIME,'HH24MI') >= 0900     -- Hour
    AND TO_CHAR(s0.END_INTERVAL_TIME,'HH24MI') <= 1800
    AND s0.END_INTERVAL_TIME >= TO_DATE('2010-jan-17 00:00:00','yyyy-mon-dd hh24:mi:ss')    -- Data range
    AND s0.END_INTERVAL_TIME <= TO_DATE('2010-aug-22 23:59:59','yyyy-mon-dd hh24:mi:ss')

CPU sizing recommendations

This data output can easily be used as input to the CPU sizing of a database server.

The data points below came from an actual production server that needs to be migrated to a new machine. The server is a dual core machine that has been in use for almost 8 years, and a couple of hardware errors have occurred. The management would like to know what the ideal machine would be and how many cores will be needed to handle the workload of the database.

The formula used to derive the “CPU core need” [9] is as follows:

core need = # of cores * utilization * 1.25

The data points were very useful for characterizing the current utilization of the database server. Since it was collocated in a data center, we could opt to just upgrade to a newer model, though not the latest and greatest, or we could virtualize it on newer hardware.
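The “CPU core need” formula can be worked through with a short Python sketch. The function name and the 90% peak utilization are assumptions for illustration only; rounding up to whole cores is my own addition and is not part of the formula in [9]. Note that the result is expressed in cores of the current server, so faster cores on the new hardware would change the picture.

```python
import math


def core_need(cores: int, utilization: float, headroom: float = 1.25) -> float:
    """core need = # of cores * utilization * 1.25 (utilization as a fraction)."""
    return cores * utilization * headroom


# Hypothetical input: a dual core server peaking at 90% CPU utilization.
need = core_need(cores=2, utilization=0.90)
print(f"core need: {need:.2f} -> provision at least {math.ceil(need)} cores")
```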


But notice the outlier (shown in red above) representing a SNAP period with high CPU utilization. Statistically summarizing the data will tell me that most of the time I’m below 10% CPU utilization, but we don’t want to ignore the outlier just like that, because there might be a critical application process in that workload period.

Validating with the application owner, she confirmed that it was indeed an ad hoc process that is run once a year. Having this information, we can safely remove the outlier from the data points. Even if the ad hoc process runs again on the new server, we just have to make sure that it runs in an off-peak period so that it does not affect the connected users.

Storage sizing recommendations

This data output can easily be used as input to the storage sizing of a database server.


The data points below came from awr_genwl.sql as well, sizing storage for the same production system mentioned above. This shows the IOPS requirements needed to run the database on the new environment. This can be used together with a storage monitoring tool to have a complete picture of IO performance. Having the measured data easily transforms requirements to capacity.

Also take note that there are other factors (bandwidth, throughput, service time, etc.) that need to be considered to determine the right storage for a particular IO workload, but this can be your starting point. Benchmarking will also help a lot with the storage decisions.

For storage sizing purposes, I strongly recommend using the awr_iowl.sql


Real World Example

Diagnosing and Resolving GC Block Lost

The graph shown was a sudden slow down on a client running 2 nodes of RAC, during month end processing, so it’s the most critical week of the month. When interviewed, the DBA insisted that they had not made any changes to the database environment… well, that is what the majority of customers will say about a performance problem, so the task of finding where/when/why it went wrong is all left to us.

So it’s a sudden slow down, and I was thinking… if I could have the time series performance of both nodes plotted in one graph… that would answer a lot of questions. Coming from Tanel Poder’s seminar in Singapore, I was able to apply the things that I had learned. So I made use of Perfsheet, played around with the visualization, and was able to achieve what I had envisioned.

On the image above you can see the where, when, and why. Most of the load is on the first node. The peaks are the particular periods we are interested in, and the wait events contributing to the peaks are the suspects or possible culprits for the performance problem. Drilling down further on those peak periods and on the particular database sessions running the critical modules that were slow, plus correlating this with the database advisors and OS statistics (CPU, memory, network), we were able to conclude that it was a network interconnect switch problem.

If it weren’t for this visualization, the troubleshooting would have taken longer.

This is the image after replacing the network interconnect switch… this shows their normal workload.


Linear Regression of AAS and CPU on 2 node RAC

Mining the AWR, backed by solid statistical analysis [10] [11] [12], lets you do forecasts that can guide you toward targeted response time optimizations and workload reduction.

The graph shown below is a scatter plot of a production environment with 2 nodes of 11gR1 RAC running on 8core HS21 Bladeservers on a DS4800 SAN. Notice the R2 values of .97 and .89 respectively, which show a strong correlation between AAS and CPU utilization. Also, when CPU starts to queue at >80%, the AAS also shoots up!

Node 1 / Node 2 (scatter plots)

The drill down shown below, on the peak period with an AAS value of 10, shows that the workload is driven by high load SQL greatly affecting the overall performance of the database. Also note that the large chunk of the AAS is in the “CPU” component, hence you will see large LIOs and most of the elapsed time spent on CPU when looking at the SQL details in awr_topsqlx. Tuning the high load SQL will result in great workload reduction, response time optimization, and huge savings on system resources.

When the server’s workload is at the AAS value of 2.2, the CPU utilization, the latency, and the AAS component on “CPU” all seem to be low. You will also notice that the top SQL from the AAS of 10 is not there anymore.


Drilling down on the peak workload... with AAS of 10

1) General Workload report

2) Tablespace IO report

3) Top Timed Events

The performance toolkit uses AWR data and Linear Regression to identify what metric/statistic is driving the database server’s workload based on AAS. The data points can be very useful for capacity planning, giving you informed decisions and completely avoiding guesswork!

The toolkit contains 7 sections, see brief description below:
- CREATE USER - creates the r2toolkit user
- DROP TABLES - drop the tables for a fresh start
- CREATE THE r2 TABLES - create the main tables
- POPULATE y data - y data is the "dependent value", variable whose value is to be predicted
- ANALYZE r2 VALUES - get the stat names with high r2 values, to have a more accurate analysis
- POPULATE x and residual data - x data is the "independent value", used to predict the value of y
- R2 REPORT - generate the textual report and r2 values with or w/o outliers
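To show what the x, y, residual and r2 terms in the list above mean, here is a minimal least-squares sketch in plain Python. It is not the toolkit's code; the function name and the sample points are invented, with y standing for AAS (the dependent value) and x for one candidate statistic such as CPU utilization (the independent value).

```python
def linear_fit(x, y):
    """Fit y = intercept + slope * x; return slope, intercept, residuals, r2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed y minus the y predicted from x.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, residuals, r2


# Hypothetical samples: x = CPU utilization (%), y = AAS.
cpu_pct = [10, 20, 35, 50, 65, 80]
aas = [0.5, 1.1, 1.9, 2.4, 3.6, 4.2]

slope, intercept, residuals, r2 = linear_fit(cpu_pct, aas)
print(f"AAS = {intercept:.3f} + {slope:.3f} * CPU%   r2 = {r2:.3f}")
```

A statistic whose r2 against AAS is close to 1 explains most of the variation in AAS, which is why the toolkit keeps only the stat names with high r2 values; large residuals are the candidates to review as outliers.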


4) Top 20 SQLs

6) Top 5 SQLs of SNAP_ID 8631.. which by the way got an AAS of 10

Now on the low workload period… with AAS of 2.2


1) General Workload report

2) Tablespace IO report

3) Top Timed Events

4) Top 20 SQLs

No entry - the top SQL from AAS of 10 is not here anymore

6) Top 5 SQLs on SNAP_ID 8582

References
[1] Craig Shallahamer - Oracle Performance Firefighting - Chapter 1
[2] r2project - http://karlarao.tiddlyspot.com/#r2project
[3] Kyle Hailey Seminar - AAS presentation
[4] Tanel Poder - Perfsheet http://www.tanelpoder.com/files/PerfSheet.zip
[5] History of session load - http://sites.google.com/site/youvisualize/active-session-history
[6] Craig Shallahamer - Introduction To Oracle Server Consolidation
[7] Andy Rivenes - Oracle Workload Measurement
[8] Craig Shallahamer - Oracle Performance Firefighting - Chapter 9
[9] Husnu Sensoy - Database Consolidation Best Practices http://husnusensoy.files.wordpress.com/2010/05/database-consolidation-best-practices.pdf
[10] Forecasting Oracle Performance
[11] Statistics Without Tears


[12] Neeraj Bahatia - Linear Regression Paper
[13] Neil Gunther & Tanel Poder - Multidimensional Visualization of Oracle Performance using Barry007 http://arxiv.org/pdf/0809.2532
[14] AAS investigation http://goo.gl/5WaAg
Other references:
o http://karlarao.wordpress.com
o Storage IOPS, capacity, performance, cost - http://goo.gl/FCN0w
o http://karlarao.tiddlyspot.com/#Statistics
o http://karlarao.tiddlyspot.com/#OraclePerformance

Appendix - Average Latency Issue

The IO latency formula used in AWR is as follows:

latency (ms) = (readtim / phy reads) * 10

The images below show that latency values may be normalized (averaged out) if the snap interval is too long as compared to shorter snap intervals. Also read this link for the effects of CPU scheduling issues on latency: http://www.freelists.org/post/oracle-l/Disk-Device-Busy-What-exactly-is-this,7
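The averaging effect can be seen with a few made-up numbers. The sketch below assumes readtim is expressed in hundredths of a second, which is what the factor of 10 in the formula implies; the function name and the sample deltas are hypothetical.

```python
def avg_read_latency_ms(readtim: float, phy_reads: float) -> float:
    """latency (ms) = (readtim / phy reads) * 10, with readtim in centiseconds."""
    if phy_reads == 0:
        return 0.0
    return (readtim / phy_reads) * 10


# Hypothetical (readtim, phy reads) deltas for six 10-minute snapshots:
# five quiet intervals and one interval with slow reads.
ten_minute_deltas = [(500, 1000)] * 5 + [(4000, 1000)]

per_interval = [avg_read_latency_ms(t, r) for t, r in ten_minute_deltas]

# The same hour seen as a single 60-minute snapshot interval.
hour_readtim = sum(t for t, _ in ten_minute_deltas)
hour_reads = sum(r for _, r in ten_minute_deltas)
hourly = avg_read_latency_ms(hour_readtim, hour_reads)

print(per_interval)
print(round(hourly, 2))
```

With the short intervals the slow period stands out clearly, while the single hourly figure blends it into the quiet periods, which is the normalization effect described above.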