© 2017 VMware Inc. All rights reserved.
Toronto VMUG Q1 Meeting
Steve Sykes, Staff Engineer
Global Support, Premier Services
January 31, 2017
Troubleshooting Storage Performance
Agenda
• The ESXi Storage Stack
• Troubleshooting Performance
• Recommended Practices, Tools and Tips
• Steve’s 4-dimensional framework for latency
• Sample case # 1: Latency
• Sample case # 2: Unresponsive guests
• Community and other useful resources
VMware ESXi Architecture
Physical Hardware
ESXi
Virtual Machine
Guest OS
Monitor (BT, HW, PV)
Memory
Allocator
NIC Drivers
Virtual Switch
I/O Drivers
File System
Scheduler
Virtual NIC Virtual SCSI
TCP/IP
File
System
I/O Drivers
Disk I/O Latencies
Application / Guest OS
ESX Storage
Stack
VMM
Driver
KAVG
DAVG
GAVG
QAVG
Fabric
vSCSI
HBA
Time spent in the ESX storage stack is minimal, for all practical purposes
KAVG ~= QAVG
In a well-configured system, QAVG should be zero
* KAVG = GAVG – DAVG
Array SP
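The starred identity above can be checked mechanically. A minimal Python sketch, with illustrative values rather than numbers from a real host:

```python
def kavg(gavg_ms, davg_ms):
    """KAVG = GAVG - DAVG: time spent in the ESXi storage stack."""
    return round(gavg_ms - davg_ms, 3)

# Healthy host: nearly all guest-observed latency is device round-trip time.
print(kavg(10.2, 10.0))   # 0.2 ms, effectively zero kernel time

# Queuing problem: kernel time dominates; investigate (threshold: 1 ms).
print(kavg(25.0, 10.0))   # 15.0 ms of KAVG, well above 1 ms
```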
Disk I/O Queues
GQLEN – Guest Queue
AQLEN – Adapter Queue
WQLEN – World Queue
DQLEN – Device / LUN Queue
SQLEN – Array SP Queue
DQLEN
WQLEN
SQLEN
GQLEN
DQLEN can change dynamically when SIOC is enabled
AQLEN is reported in esxtop
Application / Guest OS
ESX Storage
Stack
VMM
Driver
Fabric
vSCSI
HBA
Array SP
Use the Right Tool
• esxtop
2 sec data points, VERY granular, not scalable across hosts
• vRealize Operations
5 min data points, very scalable, best starting view
• vCenter Performance Charts
20 sec data points, okay real-time data, poor history; recommend vROps
• VSAN Observer
Most detailed tool to troubleshoot VSAN related performance
• 3rd Party
Ensure you know what the counters mean and their sample rate
vRealize Operations
• vRealize Operations
– Manage storage performance at scale
– Integrate with your storage OEM
• VMware Virtual SAN with a management pack for storage monitoring:
– Virtual SAN object and component limits
– Disks / disk groups
– Virtual SAN datastore
VSAN Observer
• VSAN Observer is the engineering performance tool.
– Latency
– IOPS
– Congestion
– Outstanding IO
– Bandwidth
• Do not use esxtop for VSAN performance analysis
Storage: Key Indicators
• Kernel Latency Average (KAVG)
This counter tracks the latency of I/O passing through the VMkernel
Investigation Threshold: 1ms
• Device Latency Average (DAVG)
This is the latency seen at the device driver level. It includes the round-trip time between the HBA and the storage.
Investigation Threshold: 10-15ms, lower is better, some spikes okay
• Guest Latency Average (GAVG)
This is the latency seen at the guest level. It is effectively DAVG + KAVG. It is the key metric for network-attached storage.
Investigation Threshold: 10-15ms, lower is better, some spikes okay
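A minimal sketch of those investigation thresholds as code. The cutoffs (1 ms for KAVG, 15 ms for DAVG/GAVG) come from the slide's guidance, not a hard VMware limit, and the function name is my own:

```python
# Investigation thresholds in milliseconds, per the key-indicator guidance.
THRESHOLDS_MS = {"KAVG": 1.0, "DAVG": 15.0, "GAVG": 15.0}

def flag_latencies(sample):
    """Return the counters in `sample` that exceed their investigation threshold."""
    return [name for name, value in sample.items()
            if value > THRESHOLDS_MS.get(name, float("inf"))]

# Device latency is high but kernel time is fine: look downstream of the host.
print(flag_latencies({"KAVG": 0.1, "DAVG": 22.0, "GAVG": 22.1}))
```

Occasional spikes above the threshold can be okay; it is sustained breaches that warrant investigation.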
esxtop
– For live troubleshooting and root cause analysis; finer granularity (2 second)
– Lots of metrics reported
CPU scheduler: c, i, p
Memory scheduler: m
Virtual switch: n
vSCSI: d, u, v
• c: cpu (default)
• m: memory
• n: network
• p: power management
• i: interrupts
• d: disk adapter
• u: disk device
• v: disk VM
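For anything longer than a live session, esxtop also has a batch mode (`esxtop -b -d 2 -n 30 > perf.csv`) whose CSV you can post-process. A hedged sketch of pulling the latency counters out of such a capture; the exact counter names ("Average Device MilliSec/Command", etc.) are assumptions about the header format, so check the header row of your own capture:

```python
import csv, io

# Tiny inline stand-in for an esxtop batch capture (header names assumed).
SAMPLE = (
    '"(PDH-CSV 4.0)","\\\\esx1\\Physical Disk(vmhba1)\\Average Device MilliSec/Command",'
    '"\\\\esx1\\Physical Disk(vmhba1)\\Average Kernel MilliSec/Command"\n'
    '"01/31/2017 11:19:05","12.5","0.2"\n'
)

def latency_columns(text):
    """Map each latency counter name to its list of sampled values."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    return {name: [float(row[i]) for row in data]
            for i, name in enumerate(header)
            if "MilliSec/Command" in name}

for name, values in latency_columns(SAMPLE).items():
    print(name.rsplit("\\", 1)[-1], values)
```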
esxtop Screens
esxtop disk adapter screen (d)
Host bus adapters (HBAs) - includes SCSI,
iSCSI, RAID, and FC-HBA adapters
Latency stats from the Device,
Kernel and the Guest
DAVG/cmd - Average latency (ms) from the Device (LUN)
KAVG/cmd - Average latency (ms) in the VMKernel
GAVG/cmd - Average latency (ms) in the Guest
Guest Level Issues
Questions to ask
– Is in-guest / application latency > GAVG (vCenter) latency?
– Are latency and IOPS low, but performance still “bad”?
Guest-level queue
– Guest application tuning / threads / outstanding I/Os
– For very high IOP levels, use multiple vSCSI controllers / disks
Guest-level drivers
– PVSCSI: investigate interrupt coalescing
Alignment
Filesystem optimizations: fragmentation, sync/async, …
Application / Guest OS
ESXi Level Issues
Device Queue Overflow
World Queue Limiting
High %SYS/Chargeback or VMWAIT
– Blocked Waiting on I/O
– Blocked Waiting on Swapping
High Failed Disk IOPs
SIOC Kicked in – Latency Threshold
VM IOP Limit Set
Questions to ask
– Is KAVG > 1 ms?
– Is the device queue full?
– Is ESX host CPU > 85%?
– Is VM %SYS > 35%?
– Is VMWAIT > 5%?
ESX Storage
Stack
VMM
Driver
vSCSI
Array Level Issues
Engage your storage partner to assist in diagnosis
Questions to ask
– Is DAVG > 20 ms?
– What is the array health and utilization?
– What is the array reporting for service times?
Fabric
Array SP
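The "questions to ask" on the guest, ESXi, and array slides can be encoded as a first-pass triage. A hedged sketch; the thresholds come straight from the slides, the function and labels are my own, and a real investigation needs the surrounding context too:

```python
def triage(kavg_ms, davg_ms, host_cpu_pct, vm_sys_pct, vmwait_pct):
    """Map the slide thresholds to a suspected layer; a rough first pass only."""
    suspects = []
    if kavg_ms > 1.0 or vm_sys_pct > 35 or vmwait_pct > 5 or host_cpu_pct > 85:
        suspects.append("ESXi level (queues, %SYS, VMWAIT, host CPU)")
    if davg_ms > 20.0:
        suspects.append("Array / fabric level (engage your storage partner)")
    # Nothing above threshold: look inside the guest itself.
    return suspects or ["Guest level (app tuning, vSCSI layout, alignment)"]

print(triage(kavg_ms=0.1, davg_ms=35.0, host_cpu_pct=40, vm_sys_pct=5, vmwait_pct=1))
```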
Device Queue Full
– KAVG is non-zero: a queuing issue
– LUN queue depth is 32
– 32 I/Os in flight and 32 queued
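A tiny model of that picture: with DQLEN = 32 and 64 commands outstanding, 32 are in flight at the device and 32 wait in the kernel queue, which is exactly where KAVG accumulates. The numbers are illustrative:

```python
DQLEN = 32  # LUN queue depth from the slide

def split_outstanding(outstanding):
    """Split outstanding commands into (in flight at device, queued in kernel)."""
    in_flight = min(outstanding, DQLEN)
    queued = outstanding - in_flight
    return in_flight, queued

print(split_outstanding(64))   # queue overflow: expect non-zero KAVG
print(split_outstanding(20))   # below DQLEN: no queuing, KAVG ~ 0
```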
Disk I/O Queuing – World Queue
World ID
World queue length – modifiable via Disk.SchedNumReqOutstanding
Background: CPU State Times
Elapsed Time = RUN + RDY + CSTP + WAIT
– WAIT includes IDLE, VMWAIT, and SWPWT (blocked on I/O or swapping)
– RDY includes MLMTD (ready, but descheduled by a CPU limit)
Guest I/O chargeback: %SYS time
CPU frequency scaling:
– Turbo Boost: USED > (RUN – SYS)
– Power management: USED < (RUN – SYS)
Identifying storage connectivity issues
I/O activity to NFS
datastore
System time charged
for NFS activity
NFS Connectivity Issue (1 of 2)
Identifying storage connectivity issues
VM blocked,
connectivity lost to
NFS datastore
No I/O activity on the
NFS datastore
VM is not using
CPU
NFS Connectivity Issue (2 of 2)
Performance Impact of Swapping
Some swapping activity
Time spent in blocked
state due to swapping
Storage – Recommendations
• Use Multiple vSCSI Adapters
Allows for more queues and I/Os in flight
• Use the PVSCSI Adapter
More efficient I/Os per cycle
• Don’t Use RDMs
Unless needed for shared-disk clustering; no longer a performance advantage
• Leverage Your Storage OEM’s Integration Guide
They provide necessary guidance on items like multipathing; 80% of issues are solved here
“VMDK on VMFS” or “RDM”
• There really is no difference in performance between VMDK on VMFS and RDM
• https://blogs.vmware.com/vsphere/2013/01/vsphere-5-1-vmdk-versus-rdm.html
• Use RDMs ONLY when you require shared-disk clustering (or native SAN tools)
“Thick” vs “Thin” (MB/s I/O throughput)
• Thin (fully inflated and zeroed) disk performance = thick eager-zeroed disk performance
• The performance impact is due to zeroing, not the allocation of new blocks
• To get maximum performance from the start, use thick eager-zeroed disks (think business-critical apps)
• With lazy zeroing, maximum performance still arrives eventually, but blocks must be zeroed on first write before you reach it
http://www.vmware.com/pdf/vsp_4_thinprov_perf.pdf
Iometer
is an I/O subsystem measurement and characterization tool for single and clustered systems (Windows and Linux)
Free (Open Source)
Single or Multi-server capable
Multi-threaded
Metrics Collected
• Total I/Os per Sec.
• Throughput (MB/s)
• CPU Utilization
• Latency (avg. & max)
I/O Analyzer
is a virtual appliance that provides a simple, standardized way of measuring storage performance.
http://labs.vmware.com/flings/io-analyzer
Readily deployable virtual appliance
Easy configuration and launch of I/O
tests on one or more hosts
I/O trace replay as an additional
workload generator
Ability to upload I/O traces for
automatic extraction of vital metrics
Graphical visualization
Storage Profiling Tips and Tricks
Common IO profiles (database, web, etc.): http://blogs.msdn.com/b/tvoellm/archive/2009/05/07/useful-io-profiles-for-simulating-various-workloads.aspx
Make Sure to Check / Try:
– Load balancing / multi-pathing
– Queue depth & outstanding I/Os
– pvSCSI Device Driver
Look out for:
– I/O contention
– Disk Shares
– SIOC & SDRS
– IOP Limits
vscsiStats – DEEP Storage Diagnostics
vscsiStats characterizes IO for each virtual disk
• Allows us to separate each type of workload into its own container and observe trends
Histograms are only collected if enabled; no overhead otherwise
Metrics
I/O Size
Seek Distance
Outstanding I/Os
I/O Interarrival Times
Latency
Steve’s 4-dimensional framework for Latency discussions
• Magnitude
How high are the spikes when they happen?
• Frequency
How often / what times / days do the spikes occur?
• Duration
How long do they last when they occur?
• Spread
How many hosts / datastores are involved?
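The four dimensions make a natural per-incident record. A sketch; the field names mirror the framework above, while the types and example values are my own choices:

```python
from dataclasses import dataclass

@dataclass
class LatencySpike:
    magnitude_ms: float   # how high are the spikes?
    per_day: int          # frequency: how often do they occur?
    duration_s: float     # how long do they last?
    hosts: int            # spread: how many hosts are involved?
    datastores: int       # spread: how many datastores are involved?

# Illustrative incident: half-second spikes, ~12 a day, confined to one datastore.
incident = LatencySpike(magnitude_ms=520, per_day=12, duration_s=39, hosts=3, datastores=1)
print(incident)
```

Filling one of these in per incident keeps everyone on the same page about the symptoms before the finger-pointing starts.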
Magnitude – It Matters
Image Credit: http://housepetscomic.wikia.com/wiki/File:Order_Of_Magnitude.png
Magnitude – How High?
• Magnitude - minor
– Spikes of 30, 40, 50 (or even 100) milliseconds could be IOPS exceeding the underlying capacity of the hardware
– This level of magnitude will possibly cause small queues to develop
– Depending on the duration, this might cause applications to feel pain, but not intolerable – a “dull ache” periodically
• Magnitude - intermediate
– When the spikes get up towards 500 milliseconds or greater, not likely an IOPS issue
– This is approximately 50x as long as we would normally expect for each command to complete (i.e. 50 x 10 ms = 500 ms)
– Queues will most certainly develop, and the queues may get sufficiently long that the workload perceives it as an outage
• Magnitude - major
– If single SCSI commands take more than 1000 milliseconds (1 second) to execute, there are serious issues indeed
– Queues will almost certainly get sufficiently long that workloads will perceive storage is unavailable
– In both the intermediate and major cases here, duration and spread must be considered
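The three bands above can be expressed as a hedged helper. The boundaries (100 ms, 500 ms, 1000 ms) are taken from the slide's examples, not a VMware specification:

```python
def magnitude_band(spike_ms):
    """Classify a latency spike into the slide's minor / intermediate / major bands."""
    if spike_ms >= 1000:
        return "major"          # workloads may perceive storage as unavailable
    if spike_ms >= 500:
        return "intermediate"   # ~50x a normal 10 ms service time; queues build
    return "minor"              # possible IOPS overload; a periodic "dull ache"

print([magnitude_band(ms) for ms in (40, 520, 1500)])  # ['minor', 'intermediate', 'major']
```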
Frequency – Variations in patterns
Image Credit: https://www.sfu.ca/~truax/Frequency_Modulation.html
Frequency – How often / What days / times?
• Frequency - occasional
– Once in a while, no set pattern in terms of day of week, time of day
– Also seemingly “random” with regard to datastores, hosts
– Not consistent over time, appears to come and go
• Frequency – some patterns
– Sometimes we see certain time frames of the day; i.e. middle of the night
– These time slots are usually reserved for maintenance type activity
– A “flood” of activity much greater than the environment was engineered for can be the cause
• Frequency – all over the place
– In this scenario, we see events logged all through the day / night, and multiple days of week, weeks of month
– Workloads may perceive storage is unavailable (because of excessive queuing)
– As with magnitude, duration and spread must be considered together with frequency
Duration – How Long … ?
Image Credit: http://www.jqueryscript.net/images/Easy-Time-Duration-Picker-Plugin-with-jQuery-jQuery-UI.jpg
Duration – How Long Do the Spikes last?
• Duration – a “blip”
– Durations of a few milliseconds, or even up to a second, are not necessarily material (unless the Magnitude is high)
– Generally don’t last long enough to cause queuing
– Consider, however, the frequency – if they are all consecutive, even short duration “blips” can add up to longer periods
• Duration – moderate in length
– Generally these are greater than 1 second, but likely under a minute or two
– Cause may be some sort of queue development and clearing
– Here, other factors such as frequency are relevant – if they happen too frequently, the effect can be much worse
• Duration - elongated
– Depending on the Magnitude, if spikes go on and on for many seconds, the effect can be cumulative
– If the spike lasts for minutes, and the Magnitude is sufficiently high, workloads may perceive “outages”
– Again, most important to consider this factor together with Magnitude and Frequency
Spread – Hosts / Datastores
Image Credit: https://www.wired.com/wp-content/uploads/2015/04/epi-rail-web1-1024x1024.jpg
Spread – Confined vs Widespread?
• Spread – confined
– If the issue is on a single host (or a small subset of the total hosts), that suggests a line of inquiry
– Often it can be limited to a single cluster
– Same inference for single datastore, or small subset of datastores
• Spread – intermediate
– Multiple hosts, multiple clusters, multiple datacenter objects
– More than one array type, and/or significant representation of datastores
– This suggests more of a fabric issue, especially if multiple arrays involved
• Spread – widespread / universal
– Almost all hosts involved, many clusters and/or datacenters
– Most datastores involved also
– This may be a combination of fabric and array issues
Latency - Symptoms
Host logs: /var/run/log/vobd.log
Magnitude: over half a second. Duration: 39 seconds.
Latency - Evidence
Parsed from logs / imported to Excel for analysis
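The parsing step itself can be scripted rather than done by hand. A hedged sketch that extracts latency-deterioration events from vobd.log; the message text below matches the common "performance has deteriorated" observation, but treat the exact wording as an assumption and adapt the regex to your own logs (the device name and values here are made up):

```python
import re

# Regex modeled on the ESXi "performance has deteriorated" message format.
PATTERN = re.compile(
    r"(?P<ts>\S+Z).*?Device (?P<dev>naa\.\w+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<old>\d+) microseconds? "
    r"to (?P<new>\d+) microseconds?"
)

# Illustrative log line, not from a real host.
line = ("2017-01-31T11:19:05.105Z: [scsiCorrelator] Device naa.4a80 performance "
        "has deteriorated. I/O latency increased from average value of 1343 "
        "microseconds to 640900 microseconds.")

m = PATTERN.search(line)
print(m.group("ts"), m.group("dev"), "latency now", int(m.group("new")) / 1000.0, "ms")
```

Run over the whole file, this yields the timestamp / device / magnitude tuples behind the Excel chart on this slide.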
But where do we look?
DAVG spans the whole round trip: host HBA → driver → firmware → wire → switch → wire → array front end → LUN → media … and return!
Strategy / Approach
• Collect data from the hosts
– Based on vm-support log extracts
– Objective: Understand the Magnitude, Frequency, Duration and Spread
– Get everyone on the same page regarding the symptoms
• Share the data collaboratively
– Both storage and fabric support teams
– Get the correlating data from the array stats
– Compare with the hosts’ experience
• Does the array data agree with the host experience?
– If so, then array support / vendor can investigate / make changes
– If not, then issue must be in the fabric, so different direction for the investigation
– After any changes are made, collect fresh logs and perform comparative analysis
Configuration issues – many possibilities
• Round Robin PSP
– IOPS=1000 vs IOPS=1
• Fabric
– Switches (hardware / field upgradeable code such as firmware)
– Cable plant (defective or inferior quality cables / connectors)
– Zoning issues
• HBA issues
– Drivers / firmware / hardware issues
– Queue depth settings
• Array issues
– Front end processor issues
– Defective media
– De-duplication and other overhead activity
– High % of cache misses
Sample Case # 2 – Unresponsive guests
… and yet, can ping the hosts, no apparent network issues
Can this be a storage issue?
Let’s look in the logs
Host logs: /var/run/log/vobd.log
Between 11:19:05.105Z and 11:21:46.217Z, no I/O scheduled for datastore UUID 4a80
https://kb.vmware.com/kb/2136081 - "Understanding lost access to volume messages in ESXi 5.5/6.x"
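The outage window is just datetime arithmetic on the two timestamps from the log excerpt above, which is handy when triaging a batch of these events:

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"
lost = datetime.strptime("11:19:05.105", FMT)      # access to volume lost
restored = datetime.strptime("11:21:46.217", FMT)  # access restored
outage = (restored - lost).total_seconds()
print(f"no I/O scheduled for {outage:.3f} s")      # 161.112 s, ~2m41s
```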
And speaking of logs …
• Many people find this painful
– But it is not meant to be
– https://kb.vmware.com/kb/1008524 - "Collecting diagnostic information for VMware products"
– The above KB has links to every product (or should do – please report if not)
– If you have trouble collecting logs, then that’s a reason for an SR all by itself
• Which logs?
– For storage, almost always get vm-supports from ALL hosts in any cluster of interest
– If LUNs are presented to multiple clusters, then ALL hosts in EACH cluster
– Generally vCenter and vSphere client logs can be omitted
– But … it is the responsibility of the investigator (TSE) to prescribe which logs are needed
• Uploading
– https://kb.vmware.com/kb/2069559 - "Uploading diagnostic information for VMware through the Secure FTP portal"
– Make sure to use Binary transmission mode
– Make sure to change into the SR directory (after creating as necessary – directory name is SR #) before transferring files
Bottom Line Principles
• Optimally configured / engineered environments
– Should exhibit few, if any, latency alerts and/or VMFS heartbeat “timedout” events
– Log analysis can be done anytime, if problems are suspected
– Also, can be done when problems are NOT suspected – provides useful baseline info
– Better to cite evidence, than to throw darts
• Collaboration is key
– The root cause(s) of these issues are usually external to vSphere, BUT …
– ESXi log analysis can help direct the investigation, AND …
– Storage and fabric support teams are needed in addition to vSphere admins, AND …
– Vendors need to get engaged also
• It’s in everyone’s interest that things are smooth and stable
– Often, if these issues are chronic, word starts spreading that virtualized apps “can’t keep up” with physicals
– In most cases, that is no longer true
– And even if it is in some cases – we want to fix that!
Community Resources
VMware’s Performance Technology Pages (Whitepapers Here)
– http://www.vmware.com/technical-resources/performance/resources.html
VMware’s Tech-Marketing Performance Blog
– http://blogs.vmware.com/vsphere/performance/
VMware’s Perf-Eng Blog (VROOM!)
– http://blogs.vmware.com/performance
Performance Community Forum
– http://communities.vmware.com/community/vmtn/general/performance
VMware Performance Links – Master List
– https://communities.vmware.com/docs/DOC-25253
Virtualizing Business Critical Applications
– http://www.vmware.com/solutions/business-critical-apps/
Resources
VMware’s Performance – Technical Whitepapers
http://www.vmware.com/resources/techresources/cat/91,96
Resources
Performance Best Practices
http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.5.pdf
http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-perfbest-practices-vsphere6-0-white-paper.pdf
Troubleshooting Performance Related Problems in vSphere Environments
http://communities.vmware.com/docs/DOC-23094 (vSphere 5.x with vCOps)
Resources
Virtualizing Microsoft Business Critical Applications on VMware vSphere
by: Matt Liebowitz, Alexander Fontana
vSphere High Performance Cookbook
by: Prasenjit Sarkar
Troubleshooting Storage Performance
By: Mike Preston
VMware vSphere Performance: Designing CPU, Memory, Storage, and Networking for Performance-Intensive Workloads
By: Matt Liebowitz, Christopher Kusek, Rynardt Spies
Virtualizing SQL Server with VMware: Doing IT Right
By: Jeff Szastak, Michael Corey, Michael Webster
Virtualizing Oracle Databases on vSphere
By: Don Sullivan, Kannan Mani
VMware vRealize Operations Performance and Capacity Management
By: Iwan ‘e1’ Rahabok
Resources
VMware Hands-On-Labs
http://labs.hol.vmware.com/
HOL-SDC-1404:
vSphere Performance Optimization – This has always been one of the most popular labs, with content for both the beginner and the advanced vSphere administrator. You can learn the basics of vSphere performance or delve into esxtop or vNUMA.
http://labs.hol.vmware.com/HOL/#lab/1474
Thank You
Steve SykesStaff Engineer, Global Support, Premier [email protected]