ESXTOP

esxtop is a utility for examining real-time resource usage on the local ESX host; resxtop provides the same statistics for both ESX and ESXi hosts. esxtop can only be used on the local ESX machine, whereas resxtop can be run remotely, from another ESX/ESXi server or from the vSphere Management Assistant (vMA), to view the resource utilization of ESX/ESXi hosts.

esxtop/resxtop has three modes:

Interactive mode (the default mode): all statistics are displayed in real time, similar to Windows Task Manager. By default the screen refreshes every 5 seconds.
Batch mode: statistics are collected and saved to a CSV file, which can later be viewed and analyzed with Windows perfmon and other tools.
Replay mode: similar to a record-and-replay operation. Data collected by the vm-support command is interpreted and played back as esxtop statistics, so you can view the performance information captured over a particular time period as if it were real time. This makes it ideal for VMware support staff who need to understand what was happening on a server during that period.

Below is a snapshot of esxtop with the memory stats.





Below are the single-key commands for switching between statistics views while esxtop is running in interactive mode:

c: CPU view (the default screen when you start esxtop). Switches to the CPU resource utilization screen of the ESX server.
m: Memory view. Switches to the memory resource utilization screen of the ESX server.
d: Disk adapter view. Switches to the storage disk adapter resource utilization screen of the ESX server.
u: Disk device view. Switches to the storage disk device resource utilization screen of the ESX server.
v: Virtual disk view. Switches to the virtual disk resource utilization screen of the ESX server.
n: Network view. Switches to the network utilization screen of the ESX server.
y: Power management view. Switches to the power utilization screen of the ESX server.
h: Help. Displays the help screen for esxtop commands.

q: Quit. Exits esxtop interactive mode.
f: Add or remove fields in the current view. For example, in the memory view, press f to open the field list; to display MEM SIZE, press G to add that field, then press Enter to return to the view with the added field.

o: Change the order of the fields in the current view. Use the letters a to o to reorder them: an uppercase letter moves a field left, a lowercase letter moves it right.

s: Set the delay between screen refreshes. The default is 5 seconds. Press the space bar to refresh immediately.

W: Save the customized fields. Add or remove fields as you wish, then save the configuration either under the default name (/root/.esxtop4rc), so that it loads every time esxtop starts, or under a name of your choice.

To load esxtop with your customized fields:
1. Run the esxtop command.
2. Add or remove the fields you want; for example, I just pressed m to show the memory details.
3. Press W to save the configuration file.
4. When prompted for a config file location, type a location other than the default, for example /home/mohammedk/mystats.
5. Quit the current esxtop screen.
6. To load esxtop with your saved custom configuration, run esxtop -c followed by the file name; in our case: esxtop -c /home/mohammedk/mystats

ESXTOP Batch Mode

Batch mode: statistics are collected and saved to a file (CSV) that can later be viewed and analyzed with Windows perfmon and other tools.

To run esxtop in batch mode and save the output to a file for future analysis, use the following syntax:
esxtop -b -d 10 -n 5 > /home/mohammedk/esxstats.csv

-b enables batch mode, -d sets the number of seconds between refreshes, and -n sets the number of iterations to run. In the example above, esxtop runs for about 50 seconds (10-second delay × 5 iterations). The redirection (> /home/mohammedk/esxstats.csv) writes the esxtop statistics to the CSV file /home/mohammedk/esxstats.csv.
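As a minimal sketch of working with this file outside perfmon, the Python below reads the batch CSV and reports the number of samples, the average and the peak for every column whose header contains a given substring. It assumes the PDH-CSV layout that perfmon imports: a timestamp in the first column, then one \\host\object(instance)\counter column per statistic. The host name esx01, the VM name vm01 and the values in SAMPLE are made up for illustration. With -d 10 -n 5 you would expect five data rows, one per iteration.

```python
import csv
import io
from statistics import mean


def summarize_counters(csv_text, pattern):
    """Return {column header: (samples, average, peak)} for every counter
    column whose header contains `pattern` (case-insensitive).

    Assumes the first row is the header and column 0 holds timestamps.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for col, name in enumerate(header):
        if col == 0 or pattern.lower() not in name.lower():
            continue
        values = []
        for row in data:
            try:
                values.append(float(row[col]))
            except (IndexError, ValueError):
                pass  # skip missing or non-numeric cells
        if values:
            summary[name] = (len(values), mean(values), max(values))
    return summary


# Hypothetical two-iteration capture for one VM (host, VM and values are made up).
SAMPLE = r'''"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1234:vm01)\% Ready","\\esx01\Group Cpu(1234:vm01)\% Used"
"06/28/2012 13:51:29","2.00","40.00"
"06/28/2012 13:51:39","6.00","55.00"
'''

if __name__ == "__main__":
    for name, (samples, avg, peak) in summarize_counters(SAMPLE, "% ready").items():
        print(f"{name}: {samples} samples, avg {avg:.2f}, peak {peak:.2f}")
```

Matching on a substring of the header keeps the sketch independent of the exact host and VM names in a real capture. Note that it loads the whole file into memory, which may not suit very large -a captures.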

Once the command has completed, browse to /home/mohammedk to find the esxtop output file esxstats.csv. Transfer the CSV file to your Windows desktop using WinSCP and analyze it with Windows perfmon or esxplot.

ESXTOP Replay Mode

Replay mode is similar to a record-and-replay operation. Data collected by the vm-support command is interpreted and played back as esxtop statistics, so you can view the performance information captured over a particular time period as if it were real time, to see what was happening during that time.

This is very useful for VMware support engineers who don't have access to your system but need to troubleshoot a performance issue: they can run esxtop against the collected support file to analyze the issue that occurred during that particular period. Make sure you have enough free space on your server to save the support file; collecting performance snapshots for a longer duration consumes a large amount of disk space.

To run esxtop in replay mode, first run the vm-support command. I am running it from the directory /home/mohammedk, so the output file will be saved in the same directory:
vm-support -s -i 5 -d 10

-i is the number of iterations and -d is the delay between refreshes, so the command above collects stats for 50 seconds (10 seconds × 5 iterations). Once vm-support has completed, all the files are stored in /home/mohammedk. Extract the file esx-2012-06-2813.51.29993.tgz into the same directory with the following command:
tar -xzf esx-2012-06-2813.51.29993.tgz

To run esxtop in replay mode, run the following command against the directory extracted above, vm-support-vmware-arena-2012-06-2813.51.29993:
esxtop -R vm-support-vmware-arena-2012-06-2813.51.29993

The output looks just like regular esxtop output, but here we are replaying the support file data in esxtop replay mode.

ESXTOP is a fantastic tool for the VMware administrator troubleshooting performance issues in a vSphere environment. It has a somewhat steep learning curve, but it is well worth it. In this post I want to help you get a head start with ESXTOP. If you want a really good read, I recommend Duncan's very comprehensive post on the same subject.

ESXTOP is available in two ways: through the ESXi Shell, or through the vSphere Management Assistant with the command RESXTOP. In this article I will focus on ESXTOP from the ESXi Shell. Getting access to ESXTOP is very simple:

Step 1: Get access to the ESXi Shell. Open your vSphere Client, select the host, go to Configuration > Security Profile and start the ESXi Shell service (and the SSH service, if you want to connect remotely) on that ESXi host.
Step 2: Download PuTTY (or another SSH client) and create an SSH connection on port 22 to your ESXi host. Log in as root with your password.
Step 3: Type the command esxtop and press Enter.
Step 4: You are now looking at ESXTOP; it should look similar to this:

What you are looking at is the CPU screen in ESXTOP, which shows CPU-specific counters. You can browse through the different pages: type m to see memory metrics, n for network, and so on; type h to see all available commands. By default ESXTOP shows a lot of worlds (a world is similar to a process in Windows Task Manager). To filter out the VMkernel worlds, type uppercase V; you will then see only the virtual machines running on this specific ESXi host.

Now that you are inside ESXTOP, let's focus on some good counters to use for performance troubleshooting.

CPU

When troubleshooting CPU performance for your virtual machines, the following counters are the most important: %USED, %RDY, %CSTP.

%USED tells you how much time the virtual machine spent executing CPU cycles on the physical CPU.

%RDY is a key performance indicator! Always start with this one. It defines how much time your virtual machine wanted to execute CPU cycles but could not get access to the physical CPU; in other words, how much time it spent in a queue. I normally expect this value to be below 5% (this equals 1000 ms in the vCenter Performance Graphs; read about it in the post below).

CPU Ready 1000 ms. equals 5%
January 31, 2011, by Frank Brix

One of the key performance counters in a vSphere environment is CPU ready (%RDY in ESXTOP). CPU ready is the time a virtual CPU is ready to run but is not being scheduled on a physical CPU. Under normal circumstances this indicates that there are not enough physical CPU resources on an ESX/ESXi host. It is the first go-to counter when your users complain about bad performance.

The CPU ready counter is accessible from the vSphere Client and from ESXTOP. I have made two screenshots showing a virtual machine and its ready time:

vCenter Performance Graphs (value 1035 milliseconds)

ESXTOP (value 5.38%)
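The two screenshots show the same ready time in different units. Here is a small sketch of the conversion in both directions. It assumes, as the post explains next, that the vCenter real-time graph reports ready time in milliseconds per 20-second sample. The function names are hypothetical helpers, not part of any VMware tool.

```python
def ready_ms_to_percent(ready_ms, interval_s=20.0):
    """Convert a CPU ready value in ms per sample to a percentage of the sample interval."""
    return ready_ms * 100.0 / (interval_s * 1000.0)


def ready_percent_to_ms(ready_pct, interval_s=20.0):
    """Convert a CPU ready percentage back to ms per sample."""
    return ready_pct * interval_s * 1000.0 / 100.0


if __name__ == "__main__":
    print(f"1035 ms -> {ready_ms_to_percent(1035):.3f}%")
    for pct in (1, 5, 10, 100):
        print(f"{pct}% -> {ready_percent_to_ms(pct):,.0f} ms")
```

The interval is a parameter only so that the 20-second assumption is visible. A chart with a different sample length would need a different value.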

What we see is a virtual machine with a ready time of 1035 ms, or 5.38%. These numbers are actually telling us the same thing. The performance graphs update every 20 seconds (20,000 milliseconds), so a ready time of 1035 ms can be converted to a percentage:

(1035 ms × 100) / 20,000 ms = 5.175%

To interpret ready times, it is essential to know the relationship between the ESXTOP percentage and the milliseconds shown in the performance graphs. You are seeing the same numbers, one in milliseconds and the other as a percentage:

1% = 200 ms
5% = 1,000 ms
10% = 2,000 ms
100% = 20,000 ms

In general you want to see virtual machines with a ready time lower than 1000 ms, or 5%.

Read more about ESXTOP here. I just heard of a cool calculator to convert CPU ready times to a percentage: http://www.vmcalc.com/

%CSTP tells you how much time a virtual machine with multiple vCPUs spends waiting for its vCPUs to catch up with each other. If this number is higher than 3%, you should consider lowering the number of vCPUs in your virtual machine.

Memory

When troubleshooting memory performance, these are the counters you want to focus on from a virtual machine perspective: MCTL?, MCTLSZ, SWCUR, SWR/s, SWW/s.

MCTL? This column is either YES or NO. YES means that the balloon driver is installed. The balloon driver is installed automatically with VMware Tools and should be in every virtual machine. If this column says NO, figure out why.

MCTLSZ This column shows how inflated the balloon is in the virtual machine. If it says 500MB, the balloon driver inside the guest operating system has taken 500MB from Windows/Linux etc. You would expect to see a value of 0 (zero) in this column.

SWCUR tells you how much memory the virtual machine has in the .vswp file. If you see 500MB here, 500MB of the virtual machine's memory is in the swap file. This does not necessarily equal bad performance.
To figure out whether your virtual machine is suffering from hypervisor swapping, you need to look at the next two counters. In a healthy environment you would want SWCUR to be 0 (zero).

SWR/s This value tells you the read activity from your swap file. If you see a number here, your virtual machine is suffering from hypervisor swapping.

SWW/s This value tells you the write activity to your swap file. You want to see the number 0 (zero) here. Every number above 0 is BAD.

ESXTOP

This page is solely dedicated to one of the best tools in the world for ESX: esxtop.

Intro

I am a huge fan of esxtop! I read a couple of pages of the esxtop bible every day before I go to bed. Something I am always struggling with, however, is the thresholds of specific metrics. I fully understand that it is not black and white; performance is, in the end, the perception of a user.

There must be a certain threshold, however. For instance, it must be safe to say that when %RDY constantly exceeds the value of 20, it is very likely that the VM responds sluggishly. I want to use this article to define these thresholds, but I need your help. There are many people reading these articles; together we must know at least a dozen metrics, so let's collect and document them with possible causes, if known.

Please keep in mind that these should only be used as a guideline when doing performance troubleshooting! Also be aware that some metrics are not part of the default view. You can add fields to an esxtop view by pressing f followed by the corresponding character.

I used VMworld presentations, VMware whitepapers, VMware documentation, VMTN topics and of course my own experience as sources, and these are the metrics and thresholds I came up with so far. Please comment and help build the main source for esxtop thresholds.

Metrics and Thresholds

Display | Metric | Threshold | Explanation

CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP, or a limit has been set (check %MLMTD). See Jason's explanation for vSMP VMs.

CPU | %CSTP | 3 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities.

CPU | %SYS | 20 | The percentage of time spent by system services on behalf of the world. Most likely caused by a high-IO VM. Check other metrics and the VM for a possible root cause.

CPU | %MLMTD | 0 | The percentage of time the vCPU was ready to run but deliberately wasn't scheduled because that would violate the CPU limit settings. If larger than 0, the world is being throttled due to the CPU limit.

CPU | %SWPWT | 5 | VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.

MEM | MCTLSZ | 1 | If larger than 0, the host is forcing VMs to inflate the balloon driver to reclaim memory, as the host is overcommitted.

MEM | SWCUR | 1 | If larger than 0, the host has swapped memory pages in the past. Possible cause: overcommitment.

MEM | SWR/s | 1 | If larger than 0, the host is actively reading from swap (vswp). Possible cause: excessive memory overcommitment.

MEM | SWW/s | 1 | If larger than 0, the host is actively writing to swap (vswp). Possible cause: excessive memory overcommitment.

MEM | CACHEUSD | 0 | If larger than 0, the host has compressed memory. Possible cause: memory overcommitment.

MEM | ZIP/s | 0 | If larger than 0, the host is actively compressing memory. Possible cause: memory overcommitment.

MEM | UNZIP/s | 0 | If larger than 0, the host is accessing compressed memory. Possible cause: the host was previously overcommitted on memory.

MEM | N%L | 80 | If less than 80, the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and uses remote memory via the interconnect. Check GST_ND(X) to find out which NUMA nodes are used.

NETWORK | %DRPTX | 1 | Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization.

NETWORK | %DRPRX | 1 | Dropped packets received, hardware overworked. Possible cause: very high network utilization.

DISK | GAVG | 25 | Look at DAVG and KAVG, as the sum of both is GAVG.

DISK | DAVG | 25 | Disk latency most likely caused by the array.

DISK | KAVG | 2 | Disk latency caused by the VMkernel; high KAVG usually means queuing. Check QUED.

DISK | QUED | 1 | Queue maxed out. Possibly the queue depth is set too low. Check with the array vendor for the optimal queue depth value.

DISK | ABRTS/s | 1 | Aborts issued by the guest (VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, when paths failed or the array is not accepting any IO for whatever reason.

DISK | RESETS/s | 1 | The number of commands reset per second.

DISK | CONS/s | 20 | SCSI reservation conflicts per second. If many SCSI reservation conflicts occur, performance could be degraded due to the lock on the VMFS.
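A small checker can help you apply the table consistently across many VMs. The sketch below rests on several assumptions. It treats each row as "flag when the value reaches the threshold" and, for N%L, "flag when below". It uses the metric names from the table as plain dictionary keys rather than reading esxtop. The readings are invented. As noted above, these thresholds are only guidelines.

```python
# Thresholds transcribed from the table above. Metric names are used as
# plain keys; this does not read esxtop itself.
UPPER_THRESHOLDS = {
    "%RDY": 10, "%CSTP": 3, "%SYS": 20, "%MLMTD": 0, "%SWPWT": 5,
    "MCTLSZ": 1, "SWCUR": 1, "SWR/s": 1, "SWW/s": 1,
    "CACHEUSD": 0, "ZIP/s": 0, "UNZIP/s": 0,
    "%DRPTX": 1, "%DRPRX": 1,
    "GAVG": 25, "DAVG": 25, "KAVG": 2, "QUED": 1,
    "ABRTS/s": 1, "RESETS/s": 1, "CONS/s": 20,
}
# N%L is the only metric in the table where a low value is the problem.
LOWER_THRESHOLDS = {"N%L": 80}


def flag_metrics(sample):
    """Return the metrics in `sample` that reach their threshold.

    A value is flagged when it is above zero and at or above the upper
    threshold, so the 0-threshold rows mean "anything above 0".
    For N%L a value below the lower threshold is flagged.
    """
    flagged = []
    for metric, value in sample.items():
        if metric in UPPER_THRESHOLDS:
            if value > 0 and value >= UPPER_THRESHOLDS[metric]:
                flagged.append(metric)
        elif metric in LOWER_THRESHOLDS:
            if value < LOWER_THRESHOLDS[metric]:
                flagged.append(metric)
    return flagged


if __name__ == "__main__":
    # Hypothetical readings for one VM.
    readings = {"%RDY": 12.5, "%CSTP": 1.0, "SWR/s": 0, "N%L": 65, "KAVG": 2.4}
    print(flag_metrics(readings))
```

Metrics not listed in the table are ignored rather than reported, so the output only ever names rows from the table.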

Running esxtop

Although understanding all the metrics esxtop provides may seem impossible, using esxtop is fairly simple. When you get the hang of it, you will notice yourself staring at the metrics and thresholds more often than ever. The following keys are the ones I use the most.

Open a console session or SSH to ESX(i) and type:
esxtop

By default the screen is refreshed every 5 seconds; change this by typing:
s 2

Changing views is easy; type the following keys for the associated views:
c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device (includes NFS as of 4.0 Update 2)
v = disk VM
p = power states

V = only show virtual machine worlds
e = expand/roll up CPU statistics; show details of all worlds associated with a group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID); enables you to focus on one VM
# = limit the number of entities, for instance to the top 5

2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
e = statistics broken down per world
6 = statistics broken down per world

Add/Remove fields: f

Changing the order: o

Saving all the settings you've changed: W
Keep in mind that if you don't change the file name, the settings will be saved to and used as the default settings.

Help: ?

In very large environments esxtop can cause high CPU utilization due to the amount of data that needs to be gathered and the calculations that need to be done. If CPU appears to be highly utilized due to the number of entities (VMs, LUNs, etc.), a command-line option can be used that locks specific entities and keeps esxtop from gathering specific info, limiting the amount of CPU power needed:
esxtop -l
More info about this command-line option can be found here.

Capturing esxtop results

First things first: make sure you only capture relevant info. Ditch the metrics you don't need. In other words, run esxtop and remove/add (f) the fields you don't actually need or do need! When you are finished, make sure to write (W) the configuration to disk. You can either write it to the default config file (esxtop4rc) or write the configuration to a new file.

Now that you have configured esxtop as needed, run it in batch mode and save the results to a .csv file:
esxtop -b -d 2 -n 100 > esxtopcapture.csv

Where -b stands for batch mode, -d 2 is a delay of 2 seconds and -n 100 means 100 iterations. In this specific case esxtop will log the configured metrics for 200 seconds. If you want to record all metrics, make sure to add -a to your string.

Or what about zipping the output directly as well? These .csv files can grow fast, and zipping them saves a lot of precious disk space:
esxtop -b -a -d 2 -n 100 | gzip -9c > esxtopoutput.csv.gz

Please note that when a new VM is powered on, a VM is vMotioned to the host, or a new world is created, it will not show up within esxtop when -b is used, as the entities are locked! This behavior is similar to starting esxtop with -l.

Analyzing results

You can use multiple tools to analyze the captured data:
1. VisualEsxtop
2. perfmon
3. Excel
4. esxplot

What is VisualEsxtop? It is a relatively new tool (published 1st of July 2013). VisualEsxtop is an enhanced version of resxtop and esxtop. It can connect to VMware vCenter Server or ESX hosts and display ESX server stats with a better user interface and more advanced features.

That sounds nice, right? Let's have a look at how it works. This is what I did to get it up and running:
Go to http://labs.vmware.com/flings/visualesxtop and click download.
Unzip VisualEsxtop.zip into the folder where you want to store the tool.
Go to the folder.
Double-click visualesxtop.bat when running Windows (or follow William's tip for the Mac).
Click File and Connect to Live Server.
Enter the hostname, username and password and hit Connect.

That is it! Now some simple tips:
By default the refresh interval is set to 5 seconds. You can change this by clicking Configuration and then Change Interval.
You can also load batch output. This might come in handy when you are a consultant, for instance, and a customer sends you captured data. You can do this under File -> Load Batch Output.
You can filter output, which is very useful if you are looking for info on a specific virtual machine/world! See the filter section.

When you click Charts and double-click Object Types, you will see a list of metrics that you can create a chart with. Just unfold the ones you need and double-click them to add them to the right pane.

There are a bunch of other cool features in there, like color-coding of important metrics, for instance. The fact that you can show multiple windows at the same time is also useful if you ask me, and of course so are the tooltips that provide a description of each counter! If you ask me, this is a tool everyone should download and check out.

Let's continue with my second favorite tool, perfmon. I've used perfmon (part of Windows, also known as Performance Monitor) multiple times, and it's probably the easiest, as many people are already familiar with it. You can import a CSV as follows:
1. Run: perfmon
2. Right-click on the graph and select Properties.
3. Select the Source tab.
4. Select the Log files: radio button in the Data source section.
5. Click the Add button.
6. Select the CSV file created by esxtop and click OK.
7. Click the Apply button.
8. Optionally, reduce the range of time over which the data will be displayed by using the sliders under the Time Range button.
9. Select the Data tab.
10. Remove all counters.
11. Click Add and select the appropriate counters.
12. Click OK.
13. Click OK.
The result of the above would be:

With MS Excel it is also possible to import the data as a CSV. Keep in mind, though, that the amount of captured data is insane, so you might want to limit it by first importing it into perfmon, selecting the correct timeframe and counters, and exporting this to a CSV. When you have done so, you can import the CSV as follows:
1. Run: excel
2. Click on Data.
3. Click Import External Data and click Import Data.
4. Select Text files as Files of Type.
5. Select the file and click Open.
6. Make sure Delimited is selected and click Next.
7. Deselect Tab and select Comma.
8. Click Next and Finish.
All data should be imported and can be shaped/modelled/diagrammed as needed.

Another option is to use a tool called esxplot. It hasn't been updated in a while, and I am not sure what the state of the tool is. You can download the latest version here, but personally I would recommend using VisualEsxtop instead of esxplot, just because it is more recent.
1. Run: esxplot
2. Click File -> Import -> Dataset.
3. Select the file and click Open.
4. Double-click the host name and click on a metric.

As you can clearly see in the screenshot above, the legend (right of the graph) is too long. You can modify that as follows:
1. Click on File -> Preferences.
2. Select Abbreviated legends.
3. Enter an appropriate value.

For those using a Mac: esxplot uses specific libraries which are only available in the 32-bit version of Python. In order for esxplot to function correctly, set the following environment variable:
export VERSIONER_PYTHON_PREFER_32_BIT=yes

Limiting your view

In environments with a very high consolidation ratio (a high number of VMs per host), it could occur that the VM you need performance counters for isn't shown on your screen. This happens purely because the height of the screen limits what it can display. Unfortunately there is currently no command-line option for esxtop to specify which VMs should be displayed. However, you can export the current list of worlds and import it again to limit the number of VMs shown:
esxtop -export-entity filename
Now you should be able to edit your file and comment out the specific worlds that do not need to be displayed. Then run:
esxtop -import-entity filename

I figured that there should be a way to get the info through the command line, and this is what I came up with. Please note that <vmname> needs to be replaced with the name of the virtual machine that you need the GID for.
VMWID=`vm-support -x | grep <vmname> | awk '{gsub("wid=", ""); print $1}'`
VMXCARTEL=`vsish -e cat /vm/$VMWID/vmxCartelID`
vsish -e cat /sched/memClients/$VMXCARTEL/SchedGroupID

Now you can use the outcome within esxtop to limit (l) your view to that single GID. William Lam wrote an article a couple of days after I added the GID section. The following is a lot simpler than what I came up with, thanks William!
VM_NAME=STA202G; grep "${VM_NAME}" /proc/vmware/sched/drm-stats | awk '{print $1}'

Identify CPU, memory, network, disk device or disk issues using ESXTOP in interactive, batch or replay mode
Determine use cases for and apply esxtop interactive, batch and replay modes
Use vscsiStats to gather storage performance data
Use esxtop/resxtop to collect performance data

Switch display:
c: cpu
i: interrupt
m: memory
n: network
d: disk adapter
u: disk device
v: disk VM
p: power mgmt

f/F: Add or remove fields
o/O: Change the order of displayed fields
s: Set the delay in seconds between updates
#: Set the number of instances to display
W: Write configuration file ~/.esxtop50rc
k: Kill a world
e: Expand/Rollup Cpu Statistics
V: View only VM instances
L: Change the length of the NAME field
l: Limit display to a single group

usage: esxtop [-h] [-v] [-b] [-l] [-s] [-a] [-c config file] [-R vm-support-dir-path] [-d delay] [-n iterations] [-export-entity entity-file] [-import-entity entity-file]
-h prints this help menu.
-v prints version.
-b enables batch mode.
-l locks the esxtop objects to those available in the first snapshot.
-s enables secure mode.
-a show all statistics.
-c sets the esxtop configuration file, which by default is .esxtop50rc
-R enables replay mode.
-d sets the delay between updates in seconds.
-n runs esxtop for only n iterations.
Experimental Features
-export-entity writes the entity ids into a file, which can be modified to select interesting entities.
-import-entity reads the file of selected entities. If this option is used, esxtop only shows the data for the selected entities.
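If you script captures, the usage text above is enough to assemble the batch command. The sketch below is a hypothetical helper, not part of esxtop. It turns a desired capture length into the -n iteration count, rounding up, and adds -a or -c only when asked. It only builds the argument list. Actually running the command, for example with Python's subprocess module and stdout redirected to a CSV file, assumes you are on a host where esxtop exists.

```python
import math
import shlex


def esxtop_batch_argv(delay_s, duration_s, all_stats=False, config_file=None):
    """Build an esxtop batch-mode argument list that runs for at least
    `duration_s` seconds with `delay_s` seconds between samples.

    Uses only flags from the usage text: -b, -a, -c, -d and -n.
    """
    if delay_s <= 0 or duration_s <= 0:
        raise ValueError("delay and duration must be positive")
    iterations = math.ceil(duration_s / delay_s)  # -n is a count, so round up
    argv = ["esxtop", "-b"]
    if all_stats:
        argv.append("-a")
    if config_file:
        argv += ["-c", config_file]
    argv += ["-d", str(delay_s), "-n", str(iterations)]
    return argv


if __name__ == "__main__":
    # 200 seconds of all statistics at a 2-second delay.
    print(shlex.join(esxtop_batch_argv(2, 200, all_stats=True)))
```

Rounding up means a capture is never shorter than requested. For example, 55 seconds at a 10-second delay becomes 6 iterations.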

2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view

Type the command below to display all fields, not just the default ones:
~ # esxtop -a

Of course, even my screen is not big enough to show all of them. But here is the magic: when you are here, press h to go to the help screen. My concern here is not the help itself but how to sort the screen. For the view above, I have the following sort options:

CPU (%USED, %RDY, %CSTP)

Press h as mentioned, and you can sort by:
Sort by: U:%USED R:%RDY N:GID

When troubleshooting CPU performance for your virtual machines, the following counters are the most important.

%USED tells you how much time the virtual machine spent executing CPU cycles on the physical CPU.

%RDY is a key performance indicator! Always start with this one. It defines how much time your virtual machine wanted to execute CPU cycles but could not get access to the physical CPU; in other words, how much time it spent in a queue. I normally expect this value to be below 5% (this equals 1000 ms in the vCenter Performance Graphs; read about it here).

%CSTP tells you how much time a virtual machine with multiple vCPUs spends waiting for its vCPUs to catch up with each other. If this number is higher than 3%, you should consider lowering the number of vCPUs in your virtual machine.

Memory (MCTL?, MCTLSZ, SWCUR, SWR/s, SWW/s)

Sort by: M:MEMSZ B:MCTLSZ N:GID

When troubleshooting memory performance, these are the counters you want to focus on from a virtual machine perspective.

MCTL? This column is either YES or NO. YES means that the balloon driver is installed. The balloon driver is installed automatically with VMware Tools and should be in every virtual machine. If this column says NO, figure out why.

MCTLSZ This column shows how inflated the balloon is in the virtual machine. If it says 500MB, the balloon driver inside the guest operating system has taken 500MB from Windows/Linux etc. You would expect to see a value of 0 (zero) in this column.

SWCUR tells you how much memory the virtual machine has in the .vswp file. If you see 500MB here, 500MB of the virtual machine's memory is in the swap file. This does not necessarily equal bad performance. To figure out whether your virtual machine is suffering from hypervisor swapping, you need to look at the next two counters. In a healthy environment you would want SWCUR to be 0 (zero).

SWR/s This value tells you the read activity from your swap file. If you see a number here, your virtual machine is suffering from hypervisor swapping.

SWW/s This value tells you the write activity to your swap file. You want to see the number 0 (zero) here. Every number above 0 is BAD.

Sequence of memory bottleneck: if the ESXi host is under memory pressure, it starts with page sharing, then ballooning (MCTLSZ), then compression (CACHEUSD & ZIP/s), and finally swapping (SWR/s & SWW/s), which is really bad.

Network (SPEED, FDUPLX, UP, PKTTX/s, PKTRX/s, MbTX/s, MbRX/s)

Sort by: T:MbTX/s R:MbRX/s t:PKTTX/s r:PKTRX/s N:Default

SPEED (Mbps): The link speed in megabits per second. This information is only valid for a physical NIC.
FDUPLX: Y implies the corresponding link is operating at full duplex; N implies it is not. This information is only valid for a physical NIC.
UP: Y implies the corresponding link is up; N implies it is not. This information is only valid for a physical NIC.
PKTTX/s: The number of packets transmitted per second.
PKTRX/s: The number of packets received per second.
MbTX/s (Mbps): The megabits transmitted per second.
MbRX/s (Mbps): The megabits received per second.

Q: Why does MbRX/s not match PKTRX/s for different workloads?
A: Because the packet size may not be the same. The average packet size can be computed as average_packet_size = MbRX/s / PKTRX/s (megabits per packet). A large packet size may improve CPU efficiency of packet processing; however, it may potentially increase latency.

Storage (d: disk adapter, u: disk device, v: disk VM, vscsiStats)

Disk Adapters:

KAVG/cmd: Average ESXi VMkernel latency per command, in milliseconds.

DAVG/cmd: Average device latency per command, in milliseconds.

GAVG/cmd: Average virtual machine operating system latency per command, in milliseconds.

QAVG/cmd: Average queue latency per command, in milliseconds.

Metric | Threshold | What to Check

DAVG/cmd | > 20 | Storage processor/array performance for bottleneck.

KAVG/cmd | > 1 | Kernel driver firmware and adapter queue length.

GAVG/cmd | > 20 | DAVG/KAVG metrics, and guest OS performance.
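The table and the note on GAVG that follows suggest a simple decision order. Here is a sketch of it, with latencies in milliseconds. The function name, and the rule of reporting GAVG only when neither DAVG nor KAVG is high, are my reading of the text, not an established tool.

```python
# What to check, per the disk adapter threshold table above.
WHAT_TO_CHECK = {
    "DAVG": "storage processor/array performance",
    "KAVG": "kernel driver firmware and adapter queue length",
    "GAVG": "guest OS performance (DAVG/KAVG are within thresholds)",
}


def triage_latency(davg_ms, kavg_ms, gavg_ms=None):
    """Return the table keys whose thresholds are exceeded, in investigation order.

    GAVG defaults to DAVG + KAVG. It is only reported on its own when
    neither DAVG nor KAVG explains the latency.
    """
    if gavg_ms is None:
        gavg_ms = davg_ms + kavg_ms  # GAVG/cmd = KAVG/cmd + DAVG/cmd
    hits = []
    if davg_ms > 20:
        hits.append("DAVG")
    if kavg_ms > 1:
        hits.append("KAVG")
    if gavg_ms > 20 and not hits:
        hits.append("GAVG")
    return hits


if __name__ == "__main__":
    for davg, kavg in [(32.0, 0.2), (5.0, 3.0), (4.0, 0.1)]:
        hits = triage_latency(davg, kavg)
        print(davg, kavg, [WHAT_TO_CHECK[h] for h in hits] or ["nothing above threshold"])
```

You can pass the GAVG/cmd value esxtop reports instead of relying on the sum. That lets the function catch the case where GAVG is high while DAVG and KAVG look normal.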

Note that GAVG/cmd = KAVG/cmd + DAVG/cmd.

DAVG/cmd is the adapter device driver average latency per command. This is the round trip in milliseconds from the HBA to the storage array and the return acknowledgement. Typically, most admins like to see around 20 ms or less, though it can vary significantly depending on your workload and its sensitivity to latency. A high DAVG/cmd is a good indicator that you need to start your investigation outside of ESX, at the fabric and storage array levels.

KAVG/cmd is the adapter device VMkernel average latency per command. This is the average latency between when the HBA receives the data from the storage fabric and passes it along to the guest OS, or vice versa; basically the round-trip time in the kernel itself. So it should be a very low value, meaning that the I/O operation should spend as little time as possible in the kernel (zero or near zero is ideal).

GAVG/cmd is the adapter device guest OS average latency per command. This is the round trip in milliseconds from the guest OS (from its perspective) through the HBA to the storage array and back. This is why this number is the sum of DAVG/cmd + KAVG/cmd. If DAVG and KAVG are within normal thresholds but GAVG/cmd is high, this typically indicates that the VMs on that adapter, or at least one of them, are constrained by another resource and need more ESXi resources in order to process IOs more quickly. In my experience, however, a high GAVG/cmd will typically be accompanied by another high value in either DAVG or KAVG.

If KAVG/cmd is greater than 1 ms or so, check a couple of things:
1) Your device drivers are up to date and you are using compatible firmware versions, as incompatible versions can slow down the kernel IO path;
2) Your adapter optimization settings, which will be provided by the vendor (some of which we will discuss in the next post).

Disk Device:

Metric | Threshold | What to check
DQLEN | n/a | For reference; configured device queue length (LQLEN prior to 5.0)
BLKSZ | n/a | For reference; configured device block size (for alignment issues)
RESETS/s | >0 | Check paths and device availability; check storage fabric/array for a bottleneck
ABRTS/s, QUED | >0 | Check queue depth and storage fabric/array for a bottleneck
RESV/s | >0-1 | Compare to CONS/s
CONS/s | >RESV/s | Check for reservation conflicts with other ESXi hosts

DQLEN is the configured Device Queue Length. This is really a reference point to make sure you have configured your devices correctly; with a quick glance, as in the screenshot above, you might notice one misconfigured queue.

BLKSZ is the configured Device Block Size. This is another reference point to ensure that you have the correct block size for the type of workload you are running.

RESETS/s is the number of Device SCSI Reset Commands per Second. A SCSI reset command is issued when a SCSI operation fails to reach the target, and in a SAN environment it usually indicates a path-down or multipathing issue, i.e., ESXi thinks a path is fine but in reality it is faulty. On Cisco Nexus fabrics, for example, this commonly shows up as CRC errors on a port.

ABRTS/s is the number of Device SCSI Abort Commands per Second. A SCSI abort command is issued from the Guest OS when a command times out waiting for a response acknowledgement; in Windows 2008 and later, this timeout is 60 seconds by default. If you are encountering a large number of aborts, the storage fabric/array is typically causing a bottleneck and is the place to begin your investigation. If you are using something such as a NetApp FAS, be sure to run the GOS Timeout Script on your VM or VM template so that the proper timeout values (login required) are set, preventing a SCSI abort during a path failover or path problem.

QUED is the current number of Device Commands Queued in the VMkernel. As I explained previously, this number should be at or near zero; otherwise, something in the kernel is throttling the I/O throughput between the Guest OS and the HBA/storage fabric/array. Check firmware versions for correct revisions and review other performance tuning options within ESXi, especially vendor recommendations.

RESV/s is the number of Device SCSI Reservations per Second. SCSI reservations are commonplace; that's how SCSI commands work. This value is only important as it relates to CONS/s.

CONS/s is the number of Device SCSI Reservation Conflicts per Second. If this value is greater than RESV/s, it indicates that other ESXi hosts are holding reservations on this particular path that conflict with reservations currently held by this host. A very high value can be felt as sluggishness in the storage subsystem, because the kernel keeps requesting SCSI locks, being denied and, consequently, retrying.

Troubleshooting SCSI reservation conflicts can be challenging. Some helpful information can be found in the VMware KB deep-dive article "Troubleshooting SCSI Reservation Conflicts", as well as in VMware KB 1005009 and VMware KB 1002293.

From Virtual Machine Disk
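The Disk Device rules above can be sketched the same way. This minimal Python example assumes you already have one esxtop sample per device as a dict keyed by the esxtop column names (RESETS/s, ABRTS/s, QUED, RESV/s, CONS/s); the function name and the hint wording are hypothetical, and missing columns are simply treated as zero.

```python
def disk_device_flags(sample: dict[str, float]) -> list[str]:
    """Apply the Disk Device thresholds to one esxtop sample (hypothetical helper)."""
    def get(key: str) -> float:
        return float(sample.get(key, 0.0))  # missing column -> treat as 0

    flags = []
    if get("RESETS/s") > 0:
        flags.append("RESETS/s > 0: check paths and device availability")
    if get("ABRTS/s") > 0:
        flags.append("ABRTS/s > 0: check queue depth and storage fabric/array")
    if get("QUED") > 0:
        flags.append("QUED > 0: kernel is queuing I/O; check firmware and tuning")
    # Conflicts only matter relative to the reservation rate.
    if get("CONS/s") > get("RESV/s"):
        flags.append("CONS/s > RESV/s: check reservation conflicts with other hosts")
    return flags
```

A healthy device returns an empty list; a device with occasional aborts and more conflicts than reservations returns one hint for each of those two conditions.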

vscsiStats: please review.

You can output your results to a CSV file for further analysis:
vscsiStats -p all -c > /tmp/output.csv

Determine use cases for and apply esxtop/resxtop Interactive, Batch and Replay modes

Use cases: troubleshooting poor performance for a specific VM, or identifying issues with storage, network or memory.

Interactive mode (the default mode): all statistics are displayed in real time.

Batch mode: statistics can be collected and saved to a file (CSV), which can be viewed and analyzed later using Windows perfmon and other tools.
~ # esxtop -b -d 20 -n 2 -a > /tmp/20secsnds2intrpts.csv
This runs esxtop in batch mode (-b), collecting all statistics (-a) for 2 iterations (-n 2) with a 20-second delay between them (-d 20), and writes the output as CSV.
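Batch output is easy to post-process outside perfmon as well. The sketch below is a minimal Python example, assuming the file is a plain CSV whose first row holds the counter names; the function name is hypothetical, and so is any header fragment you pass in, so check the header row of your own file for the exact counter names before filtering on them.

```python
import csv
import io


def column_peaks(csv_text: str, name_fragment: str) -> dict[str, float]:
    """Return the maximum value of every column whose header contains name_fragment."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:          # empty file: nothing to report
        return {}
    wanted = [i for i, h in enumerate(header) if name_fragment in h]
    peaks: dict[str, float] = {}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue        # skip blank, short or non-numeric cells
            name = header[i]
            peaks[name] = max(value, peaks.get(name, value))
    return peaks
```

For example, read /tmp/20secsnds2intrpts.csv with open(path, newline="") and pass its contents together with a fragment of the latency counter name you found in the header; the result maps each matching counter to its peak value across the iterations.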

Replay mode: this is similar to a record-and-replay operation. Data collected by the vm-support command is interpreted and played back as esxtop statistics, so you can view the captured performance information for a particular time period as if it were real time and see what was happening during that window. It is ideal for VMware support staff, who can replay the stats to understand what was happening on the server at that time.
First, let us look at the vm-support switches:

So I run it with -p to collect performance data, -d for a duration of 100 seconds and -i for 2-second sampling intervals, with -w setting the directory the output is written to:
/vmfs/volumes/4aaa440f-1a187eb4-6f5e-0000c985147e/LoGs # vm-support -p -d 100 -i 2 -w /vmfs/volumes/4aaa440f-1a187eb4-6f5e-0000c985147e/LoGs
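The -d and -i values determine roughly how many performance snapshots end up in the bundle: 100 seconds at 2-second intervals gives about 50. A minimal Python sketch of that arithmetic, which also assembles the same command line; the helper name is hypothetical and the snapshot count is an approximation, since the exact number depends on when the first sample is taken.

```python
def vm_support_perf_cmd(duration_s: int, interval_s: int, workdir: str) -> tuple[list[str], int]:
    """Build the vm-support performance-capture command and estimate the snapshot count.

    Hypothetical helper: the flags (-p, -d, -i, -w) are the ones used above;
    the count is duration / interval, rounded down.
    """
    if interval_s <= 0 or duration_s < interval_s:
        raise ValueError("need a positive interval no longer than the duration")
    argv = ["vm-support", "-p",
            "-d", str(duration_s),   # total capture duration in seconds
            "-i", str(interval_s),   # seconds between snapshots
            "-w", workdir]           # directory that receives the bundle
    return argv, duration_s // interval_s
```

The returned list could be passed to subprocess.run on a host where vm-support is available; choosing a shorter interval gives finer-grained replay at the cost of a larger bundle.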

Then reconstruct the data:
/vmfs/volumes/4aaa440f-1a187eb4-6f5e-0000c985147e/LoGs # cd esx-esx01.com-2015-04-2104.56/
./reconstruct.sh