Understanding I/O Performance with
PATROL-Perform and PATROL-Predict
Debbie Sheetz
Sr. Staff Consultant
BMC Software
C4P075
I/O Performance Analysis Overview
• I/O metric definitions
• Baseline I/O performance analysis
• What-if I/O performance analysis
How Important is I/O to Performance?
• Predict/Visualizer presents a unified view of the system so that the relative contributions of CPU and disk I/O can be assessed
• Don’t solve a problem that you don’t have
CPU is the dominant factor here
Source of I/O Metrics
• Key to understanding I/O is to know your metrics
  • Disks are reported/collected as they are defined/known to UNIX or NT
    – This may or may not correspond 1-to-1 to physical units
  • Disk configuration is collected from standard interface for the particular OS
  • Disk statistics are collected from standard interface for the particular OS (same metrics used by iostat, etc.)
• Analyze/Predict interprets and reports based on these metrics
I/O Configuration Collection Issues
• Sometimes the disk configuration is reported as “Unknown”
• Three possible causes
  1. Disk configuration is not available from the OS
  2. Standard interface to OS fails to return the disk configuration
  3. Collected disk configuration is not matched by an entry in the hardware (.hrw) and .odm
• RAID is not collected directly
• This DOES NOT AFFECT the baseline metrics or baseline model calibration
• For certain ‘what-if’ disk modeling scenarios, the disk must be identified
Key I/O Metrics
• A few metrics tell most of the story about disk I/O
• Disk throughput
  • Data transferred (e.g. bytes, words, etc.)
    – Disk reads/writes
• Disk accesses
• Disk utilization (active time)
I/O Metrics: Throughput
• Data transferred (e.g. bytes, words, etc.)
• PATROL-Perform and Predict report I/Os in 4 KB units
  – Consistency for reporting (Analyze, Visualizer, Predict)
  – Ease of modeling I/O cross-node and cross-platform
• Units measured vary by platform
  – HP, OSF: words (Disk Statistics, Words Xfered)
  – Solaris, AIX: blocks (Disk Statistics, Blocks Read/Written)
  – NT: bytes (NT Physical Disk, Disk Read/Write)
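
The conversion to 4 KB reporting units can be sketched as follows. This is only an illustration, not the collector's actual code: the function name is hypothetical, and the word and block sizes in `BYTES_PER_UNIT` are assumed values that differ by platform and must be confirmed for the system being measured.

```python
# Sketch: express a natively measured transfer rate as 4 KB I/Os per second.
# ASSUMPTION: the unit sizes below are illustrative placeholders; the real
# word and block sizes depend on the platform and must be verified.
BYTES_PER_UNIT = {
    "words": 2,     # assumed word size in bytes
    "blocks": 512,  # assumed block size in bytes
    "bytes": 1,
}
IO_SIZE_BYTES = 4 * 1024  # the 4 KB reporting unit


def to_4kb_ios(rate_per_sec: float, unit: str) -> float:
    """Return the rate expressed in 4 KB I/Os per second."""
    return rate_per_sec * BYTES_PER_UNIT[unit] / IO_SIZE_BYTES
```

Under these assumed sizes, 800 blocks/sec and 409,600 bytes/sec both normalize to the same I/O rate, which is what makes cross-platform comparison straightforward.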
I/O Metrics: Throughput
• Disk accesses (i.e. transfers)
• Number of times an I/O request was made of the disk
  – Size of data transfer can vary
  – Doesn’t matter where the I/O is actually serviced:
    » Physical disk (seek, latency, and data transfer)
    » Cache on the disk
    » Cache on the disk controller
  – Doesn’t matter whether RAID or non-RAID
• Similar metrics collected for UNIX/NT
  – UNIX: Disk Statistics, Transfers
  – NT: NT Physical Disk, Disk Transfers/Sec
I/O Metrics: Throughput
• Disk reads/writes
• Number of times a read vs. write I/O request was made of the disk
  – Size of data transfer can vary
• Different metrics collected for UNIX/NT
  – Solaris, AIX: Disk Statistics, Blocks Read/Written
  – HP, OSF: Not Available
  – NT: NT Physical Disk, Disk Read/Write Bytes/Sec
• Reported in Analyze/Predict as I/O rates, where one I/O = 4 KB
I/O Metrics: Utilization (Active Time)
• Disk utilization (active time)
• Amount of time disk was observed to be actively servicing an I/O request
  – Doesn’t matter where the I/O is actually serviced:
    » Physical disk (seek, latency, and data transfer)
    » Cache on the disk
    » Cache on the disk controller
  – Doesn’t matter whether RAID or non-RAID
• Should reflect the relative efficiency of I/O processing when compared with disk throughput measures
  – Use disk service time for this (service time = utilization / IOs)
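
The service time relationship above (service time = utilization / IOs) can be written out as a small calculation. This is a minimal sketch; the function name and the sample figures in the usage note are hypothetical and not taken from any measured system.

```python
def service_time_ms(utilization: float, ios_per_sec: float) -> float:
    """Average service time per 4 KB I/O, in milliseconds.

    utilization  -- fraction of time the disk was busy (0.0 to 1.0)
    ios_per_sec  -- I/O rate in 4 KB I/Os per second
    """
    if ios_per_sec <= 0:
        raise ValueError("I/O rate must be positive to compute a service time")
    # Busy seconds per second divided by I/Os per second = seconds per I/O.
    return utilization / ios_per_sec * 1000.0
```

For example, a disk that is 40% busy while servicing 100 I/Os per second works out to about 4 ms per I/O.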
I/O Metrics: Utilization (Active Time)
• Disk active time
• Different metrics collected for UNIX/NT
  – UNIX: Disk Statistics, Active Time
  – NT: NT Physical Disk, % Disk Time
  – Windows 2000: NT Physical Disk, % Idle Time
• Windows/NT metrics are reinterpreted by Analyze
  – Perfmon caps calculated utilization at 100%
– Observations of collected Windows/NT disk data show utilizations well over 100%
– Analyze scales all collected NT times down
– Perfmon and Analyze/Predict will not match
I/O Metrics Collection Issues
• If “iostat” can’t see it, the collector can’t collect it
• The OS is supplying the metrics
• If the metrics are missing or incorrect, both “iostat” and PATROL-Perform/Predict, etc. will report the same
• Problem needs to be addressed by the OS vendor
• Refer any questions about valid I/O metrics to BMC Technical Support
• Always need to know the exact platform (e.g. HP 11.00, 64-bit)
• Run iostat and the collector in parallel
• Use current collector for the platform
Baseline I/O Performance Analysis Overview
• Observe key disk I/O metrics from baseline measurements
• Identify I/O patterns
  • For the system
  • For a disk or group of disks
    – Distribution amongst disks
  • For a workload/transaction
• Determine how important I/O is to overall performance
Baseline I/O Performance Analysis Overview
• Observe key disk I/O metrics from baseline measurements
• Identify I/O performance characteristics
  • Relative speed of I/O processing
  • Read/write ratios
  • Blocksize used
  • Disk utilization objectives
    – Distribution amongst disks
Baseline Case Study
CPU pattern doesn’t precisely match I/O pattern
Baseline Case Study
• I/O is dominated by one Oracle instance, but there are other contributors
• Study patterns within days and across days, weeks, etc.
Baseline Case Study
• I/O is the major component of response time during prime time
Baseline Case Study
Distribution of I/O amongst disks is fairly even
I/O Analysis Technique: CUTDISK
• How to filter I/O data so only the important disks are studied?
• Use “CUT DISK” feature
  • In Analyze
  • In Manager
  • If already specified in .an file input to Manager, don’t need Manager specification, too
• Analyze/Predict reports shorter, Visualizer files smaller, Visualizer database smaller, Visualizer graphics easier to present
I/O Analysis Technique: CUTDISK
• Concept is to aggregate I/O from less utilized disks, preserve important disks individually
• I/Os are NOT removed from the model
• Choose appropriate threshold
  • I/O rate or Disk utilization may be used
  • Threshold value can be set for a specific purpose
– Setting of 0 removes only disks which are not used at all
– Setting of 5% utilization removes most disks
– Paging disks are never removed
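
The CUTDISK concept described above can be illustrated with a short sketch. It is not the Analyze or Manager implementation: the `Disk` record, the `cut_disks` function, and the rule that a disk is cut when its metric is at or below the threshold are all assumptions made for illustration. What it does reflect from the slides is that disks below the threshold are aggregated rather than dropped, and that paging disks are always kept.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Disk:
    """Hypothetical per-disk record used only for this illustration."""
    name: str
    io_rate: float       # 4 KB I/Os per second
    utilization: float   # percent busy
    paging: bool = False


def cut_disks(disks: List[Disk], threshold: float,
              metric: str = "utilization") -> Tuple[List[Disk], float]:
    """Keep important disks individually; aggregate the I/O of the rest.

    Returns the kept disks and the combined I/O rate of the cut disks,
    so that no I/O is lost from the overall picture.
    """
    kept: List[Disk] = []
    aggregated_io = 0.0
    for disk in disks:
        value = getattr(disk, metric)  # "utilization" or "io_rate"
        # ASSUMPTION: a disk is cut when its metric is <= threshold.
        if disk.paging or value > threshold:
            kept.append(disk)          # paging disks are never cut
        else:
            aggregated_io += disk.io_rate
    return kept, aggregated_io
```

With a threshold of 0 only completely idle disks are folded into the aggregate; raising it to 5 (percent utilization) leaves just the busy disks and any paging disks listed individually.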
I/O Analysis Technique: CUTDISK
• Specify under Options, Cut Disk Options in Analyze
I/O Analysis Technique: CUTDISK
• Specify under Options, Advanced Features in Manager
Baseline Case Study
• Observe Disk Utilization patterns
Utilizations mostly even, most under 40%
Baseline Case Study
• Observe Disk processing efficiency
Looks good! Most service times under 5 ms per 4 KB transfer. A few outliers could use a closer look …
Baseline Case Study
• Look at ssd4
High service time isn’t so high after all: 12.69 transfers divided by 9.85 I/Os is 1.3. That means the 12.11 ms service time covers 1.3 actual data transfers, or 9.3 ms per physical transfer.
Baseline Case Study
• Look at ssd3
High service time isn’t really high here either: 10.66 transfers divided by 1.37 I/Os is 7.8. That means the 53.84 ms service time covers 7.8 actual data transfers, or 6.9 ms per transfer. Another way to think about this is that the average blocksize is 4 KB / 7.8, or about 0.5 KB.
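
The arithmetic used for ssd4 and ssd3 can be reproduced with a small helper. This is a sketch only; the function name is hypothetical, and the inputs are the figures quoted on the slides (I/O rate in 4 KB units, native transfer rate, and service time per 4 KB I/O in ms).

```python
IO_SIZE_KB = 4.0  # Analyze/Predict reporting unit


def per_transfer_view(ios_per_sec: float, transfers_per_sec: float,
                      service_ms_per_io: float):
    """Re-express a per-4-KB service time in terms of physical transfers.

    Returns (transfers per 4 KB I/O, ms per transfer, average blocksize in KB).
    """
    transfers_per_io = transfers_per_sec / ios_per_sec
    ms_per_transfer = service_ms_per_io / transfers_per_io
    avg_block_kb = IO_SIZE_KB / transfers_per_io
    return transfers_per_io, ms_per_transfer, avg_block_kb


# Figures quoted on the slides for the two outlier disks.
for name, ios, xfers, svc in [("ssd4", 9.85, 12.69, 12.11),
                              ("ssd3", 1.37, 10.66, 53.84)]:
    ratio, ms, block_kb = per_transfer_view(ios, xfers, svc)
    print(f"{name}: {ratio:.1f} transfers per I/O, "
          f"{ms:.1f} ms per transfer, {block_kb:.1f} KB average blocksize")
```

Because the slides round the ratio before dividing, the ssd4 result computed here comes out near 9.4 ms rather than the quoted 9.3 ms; the conclusion is unchanged.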
Baseline Case Study
• In fact, good (larger) blocksizes explain the good disk performance
These graphics show roughly a 2:1 ratio between I/Os and transfers, or an 8 KB blocksize
Baseline Case Study Conclusion
• Even though I/O is a major contributor to response time, there are no obvious tuning opportunities
• Continue to study the key I/O metrics over time
• Identify trends in I/O performance
What-if I/O Performance Analysis Overview
• Via the Predict model, you can change:
• I/O patterns
  • For the system
    – Change in workload volume
    – Change in the types of workloads
  • For a disk or group of disks
    – Distribution amongst disks
  • Change in amount of transaction I/O required
What-if I/O Performance Analysis Overview
• Via the Predict model, you can change:
• I/O performance characteristics
  • Relative speed of I/O processing
    – Disk configuration change
    – Blocksize used
What-if I/O Performance Analysis Overview
• Predict shows how this affects performance
• Performance objectives
• Workload/transaction response objectives
• Disk utilization objectives
• Reports I/O patterns
• System
• Distribution amongst disks
• Reports individual disk performance
• Can view results in Predict and/or Visualizer
What-if Case Study
• Management wants to know how performance will change if a new RAID disk technology is implemented
• Study strategy
  1. Perform Visualizer analysis of baseline I/O performance characteristics, build baseline model
  2. Perform Visualizer analysis of benchmark of I/O using new disk technology (IBM “Shark”)
  3. Use Predict to do ‘what-if’
What-if Case Study: Benchmark Data Analysis
• Benchmark demonstrates substantial I/O rate
  • Since current system has high I/O rates, a subset of the benchmark will be studied
What-if Case Study: Benchmark Data Analysis
• Selected subset of the benchmark
What-if Case Study: Benchmark Data Analysis
• Key I/O characteristics: I/Os vs. transfers
Ratio of I/Os to transfers is about 5.7, or 23 KB per native I/O access
What-if Case Study: Benchmark Data Analysis
• Key I/O characteristic: reads vs. writes
Ratio of reads to writes is about 1.5:1
What-if Case Study: Benchmark Data Analysis
• Key I/O characteristic: service time for 4 KB I/O
Predominant service time is about .5 ms
What-if Case Study: Benchmark Data Analysis
• Key I/O characteristic: service time for 4 KB I/O
View by controller, disks over 5% utilization
Note less efficiency at lower I/O load
What-if Case Study: Change Model
• Only one change is needed in the Predict model
• Set the disk service time/IO according to the benchmark
• DO NOT use the hardware table method because more specific info is available
  • Hardware table method applies ratio of new disk type to current disk type
  • Both disk types must be in the hardware table
  • Baseline disk type must be specified
What-if Case Study: Change Model
• Model must be baselined
• Two methods for changing service time
• Edit the disk service time/IO in the GUI
• Use a command file if there are many disks
Command file format
MODIFY DISK hdisk10
EDISKTIME .5
MODIFY DISK hdisk11
EDISKTIME .5
Etc.
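
When many disks need the same change, the command file can be generated rather than typed by hand. The sketch below only reproduces the two-line pattern shown above (`MODIFY DISK <name>` followed by `EDISKTIME <value>`); the function name and output file name are hypothetical, and the exact syntax accepted by Predict should be checked against the product documentation.

```python
from typing import Iterable


def build_command_file(disk_names: Iterable[str], service_time: str) -> str:
    """Return command-file text that sets the same service time on each disk.

    service_time is passed as text (e.g. ".5") so that it is written exactly
    as it appears in the slide's example.
    """
    lines = []
    for name in disk_names:
        lines.append(f"MODIFY DISK {name}")
        lines.append(f"EDISKTIME {service_time}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical disk list and file name, for illustration only.
    text = build_command_file(["hdisk10", "hdisk11", "hdisk12"], ".5")
    with open("whatif_disks.cmd", "w") as handle:
        handle.write(text)
```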
What-if Case Study: Modeling Results
• Model is evaluated and net change is observed
<< Baseline
What-if >>
What-if Case Study: Modeling Results
• The relative response time report shows the relative reduction in response time
Reduction of 26% for the workload of interest
What-if Case Study: Modeling Results
• Why not a larger reduction?
New service time/utilization is about 75% of baseline (.5 ms / .65 ms) for the disks doing the most I/O
<< Baseline
What-if >>
What-if Case Study: Modeling Results
• What else will improve performance more?
More even I/O distribution in benchmark
What-if Case Study: Modeling Results
• What else will improve performance more?
Possible use of more optimistic service time, e.g. .45 ms observed with CUTDISK set at 100 IO/sec
Should confirm with more benchmark data and/or vendor
What-if Case Study Conclusion
• Change to new technology will
  • Reduce I/O service time
  • Reduce I/O wait time
    – From reduced utilization (due to service time decrease)
    – From better I/O distribution (due to more even utilizations)
• Reduction not as large as expected because current I/O performance is already good (.65 ms vs. .5 ms)
• Allows for additional workload growth compared with current technology
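
The link between a shorter service time, lower utilization, and reduced wait time can be illustrated with a textbook single-server queue. This is an assumption made purely for illustration: the formula R = S / (1 - U) is the simple M/M/1 approximation, not a description of how Predict evaluates its model, and the 40% baseline utilization is a hypothetical value. Only the .65 ms and .5 ms service times come from the case study.

```python
def mm1_response_ms(service_ms: float, utilization: float) -> float:
    """Response time (service + wait) for a simple M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


def what_if(base_service_ms: float, base_util: float, new_service_ms: float):
    """Compare baseline and what-if disk response at the same I/O rate."""
    # At a fixed I/O rate, utilization scales with service time.
    new_util = base_util * new_service_ms / base_service_ms
    base_resp = mm1_response_ms(base_service_ms, base_util)
    new_resp = mm1_response_ms(new_service_ms, new_util)
    return new_util, base_resp, new_resp


# .65 ms and .5 ms are from the case study; 0.40 is a hypothetical utilization.
new_util, base_resp, new_resp = what_if(0.65, 0.40, 0.5)
print(f"utilization {new_util:.0%}, response {base_resp:.2f} ms -> {new_resp:.2f} ms")
```

Under this simplified model the disk response time falls by somewhat more than the service time alone, because the wait component shrinks as utilization drops. The gain is still bounded by how good the baseline service time already is.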