© 2009 VMware Inc. All rights reserved Confidential Understanding esxtop statistics Matthew Mancini, Phoenix VMUG Leader 02/25/2010 Phoenix VMUG

Page 1: Understanding  esxtop  statistics

© 2009 VMware Inc. All rights reserved

Confidential

Understanding esxtop statistics

Matthew Mancini, Phoenix VMUG Leader

02/25/2010 Phoenix VMUG

Page 2: Understanding  esxtop  statistics

Disclaimer... This presentation is not intended for any specific purpose.

These are simply my findings with ESXTOP

Use at your own risk...

Page 3: Understanding  esxtop  statistics

Speaker bio

Matt Mancini
Leader of the Phoenix VMUG
Server Room Administrator at John C Lincoln
20+ Years in IT
[email protected]
Blog -- vmexplorer.blogspot.com

Page 4: Understanding  esxtop  statistics

Agenda

References
What is esxtop
How to use it
- Basic commands
- Monitor CPU & Memory

Page 5: Understanding  esxtop  statistics

References

VMware Performance & VMmark Communities

http://communities.vmware.com/community/vmtn/general/performance?view=overview

** Interpreting esxtop Statistics **

http://communities.vmware.com/docs/DOC-9279

Resource Management Guide

http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_resource_mgmt.pdf

Understanding Performance

http://communities.vmware.com/docs/DOC-5250

Ready time

http://www.vmware.com/pdf/esx3_ready_time.pdf

Co-scheduling vCPUs

http://communities.vmware.com/docs/DOC-4960

Page 6: Understanding  esxtop  statistics

ESXTOP Translations...

Does anyone know what this means? Τα ελληνικά του σε με >> It's Greek to me

How about these 2, can you translate them?

Page 7: Understanding  esxtop  statistics

What is esxtop

esxtop allows monitoring and collection of data for all system resources: CPU, memory, disk and network (aka the 4 basic food groups).

When used interactively, data for these “food groups” can be viewed on separate screens: CPU, memory, network, and disk adapter statistics.

Used correctly, it can help you interpret the performance of your ESX server and the VMs running on it.

Page 8: Understanding  esxtop  statistics

What is esxtop

resxtop vs esxtop

The resxtop and esxtop command‐line utilities provide a detailed look at how ESX Server uses resources in real time. You can start either utility in one of three modes: interactive (default), batch, or replay.

The fundamental difference between resxtop and esxtop is that you can use resxtop remotely (or locally), whereas esxtop can be started only through the service console of a local ESX Server host.

The resxtop utility is a Remote Command Line Interface (Remote CLI) command and before you can use any Remote CLI commands, you must download, install, and configure the Remote CLI virtual appliance.

Page 9: Understanding  esxtop  statistics

How to Use it – Basic commands

To launch – type esxtop and press Enter
Can only be executed at the Service Console, by the root user

Press q to exit
Press c for CPU
Press m for Memory
Press d for Disk
Press n for Network
Press f for fields
Press Space to refresh (default refresh is 5 seconds)

Help
info esxtop << Detailed information
esxtop -h << Command line syntax

Page 10: Understanding  esxtop  statistics

How to Use it – Basic commands

At the root# prompt, type esxtop and press Enter

The default display is CPU metrics (same as pressing c)

Press m for Memory

Page 11: Understanding  esxtop  statistics

How to Use it – Basic commands

Press d for Disk

Press n for network

Page 12: Understanding  esxtop  statistics

How to Use it – Basic commands

Press f for fields

Fields allow you to choose the fields you’d like displayed

In this case I pressed b to remove the GID field, then pressed Space to exit

Now the GID field is not displayed

Note - Exiting esxtop resets the fields

Page 13: Understanding  esxtop  statistics

How to Use it – Basic commands

Press o for order

Use a-h to change the order
Uppercase moves a field left
Lowercase moves it right

In this case I pressed ‘b’ to move the GID field to the first position

Now the GID & ID fields are swapped

Note - Exiting esxtop resets the fields

Page 14: Understanding  esxtop  statistics

How to Use it – Basic commands

Set the screen delay with ‘-d’: how quickly the screen refreshes

esxtop -d XX, where XX is the delay in seconds

Specifies the delay between screen updates. Default is 5 seconds. Minimum is 2 seconds.

You can change this with the interactive command 's'. If a delay of less than 2 seconds is specified, the delay is set to 2 seconds.
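
The delay rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described on this slide; the function name `effective_delay` and the two constants are made up for the example and are not part of esxtop.

```python
DEFAULT_DELAY = 5  # seconds, the default stated on the slide
MIN_DELAY = 2      # seconds, the minimum stated on the slide


def effective_delay(requested=None):
    """Return the refresh delay that would apply for a requested -d value."""
    if requested is None:
        # No -d given: fall back to the default.
        return DEFAULT_DELAY
    # Anything below the minimum is raised to the minimum.
    return max(requested, MIN_DELAY)


print(effective_delay())    # no -d given
print(effective_delay(1))   # below the minimum
print(effective_delay(10))  # used as-is
```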

Page 15: Understanding  esxtop  statistics

How to Use it – Monitor CPU

Uptime - The elapsed time since the server was powered on
CPU load average - CPU load averaged over 1 minute, 5 minutes, and 15 minutes, based on 6-second samples.

PCPU(%) - The percentage CPU utilization per physical CPU
Used total - Average of the PCPUs
CCPU(%) - Percentages of total CPU time as reported by the ESX Service Console.

"us" is for percentage user time, "sy" is for percentage system time, "id" is for percentage idle time and "wa" is for percentage wait time. "cs/sec" is for the context switches per second recorded by the ESX Service Console.

Page 16: Understanding  esxtop  statistics

How to Use it – Monitor CPU

Esxtop uses worlds and groups as the entities to show CPU usage. A world is an ESX Server VMkernel schedulable entity, similar to a process or thread in other operating systems. A group contains multiple worlds.

Note: the sum of the NWLD column equals the total of worlds at the top.

IDs 50-71: names have been blanked out; these are VMs

Page 17: Understanding  esxtop  statistics

How to Use it – Monitor CPU

Note that ID 7 has 12 NWLD (NWLD = number of worlds in this group)

Press E to expand, type 7, press enter

ID 7 is now expanded and more detailed information is revealed

Page 18: Understanding  esxtop  statistics

How to Use it – Monitor CPU

%RDY – A world in a run queue is waiting for the CPU scheduler to let it run on a PCPU. %RDY is the percentage of time the world spends waiting this way, so for a single world it is always below 100%. It is the clearest CPU contention indicator: more than 5% is a concern.

%CSTP – a high value indicates co-scheduling overhead

%USED – the amount of CPU load incurred by the world
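
As a rough illustration of the "%RDY above 5% is a concern" rule of thumb, the sketch below filters a set of readings. The VM names and values are invented sample data, and `ready_time_concerns` is a hypothetical helper, not something esxtop provides.

```python
RDY_CONCERN_PERCENT = 5.0  # rule-of-thumb threshold from the slide


def ready_time_concerns(worlds, threshold=RDY_CONCERN_PERCENT):
    """Return (name, %RDY) pairs strictly above the threshold, worst first."""
    flagged = [(name, rdy) for name, rdy in worlds.items() if rdy > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented %RDY readings, as they might be copied from the CPU screen.
sample = {"vm-a": 1.2, "vm-b": 7.5, "vm-c": 12.0, "vm-d": 5.0}

for name, rdy in ready_time_concerns(sample):
    print(f"{name}: %RDY {rdy:.1f}")
```

Sorting worst first mirrors what the R command does on the CPU screen, so the two views can be compared side by side.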

Page 19: Understanding  esxtop  statistics

How to Use it – Drill down CPU

Other CPU Commands

R - Sort resource pools, virtual machines, and worlds by the resource pool’s or virtual machine’s %RDY column.

N - Sort resource pools, virtual machines, and worlds by the GID column. This is the default sort order.

** V - Display virtual machine instances only

Page 20: Understanding  esxtop  statistics

How to Use it – Monitor Memory

It is important to note that some statistics refer to guest physical memory while others refer to machine memory. "Guest physical memory" is the virtual-hardware physical memory presented to the VM. "Machine memory" is actual physical RAM in the ESX host. Let's use the following figure to explain. In the figure, two VMs are running on an ESX host, where each block represents 4 KB of memory and each color represents a different set of data on a block.

Inside each VM, the guest OS maps the virtual memory to its physical memory. ESX Kernel maps the guest physical memory to machine memory. Due to ESX Page Sharing technology, guest physical pages with the same content can be mapped to the same machine memory page.
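
The mapping described above can be sketched as a toy model in Python. This is not how the VMkernel implements page sharing (the real mechanism involves hashing and copy-on-write, which are not modelled here); the function name and the sample page contents are assumptions made purely for illustration.

```python
def map_to_machine_pages(guest_pages):
    """Map guest physical pages to machine pages, sharing identical content.

    guest_pages: dict of (vm_name, guest_page_number) -> page content (bytes).
    Returns (mapping, machine_pages), where mapping goes from the guest page
    key to a machine page number and machine_pages holds the stored contents.
    """
    content_to_machine = {}
    machine_pages = []
    mapping = {}
    for key, content in guest_pages.items():
        if content not in content_to_machine:
            # First time this content is seen: back it with a new machine page.
            content_to_machine[content] = len(machine_pages)
            machine_pages.append(content)
        # Identical content reuses the machine page that already holds it.
        mapping[key] = content_to_machine[content]
    return mapping, machine_pages


# Two VMs; each entry stands for one 4 KB page, the letter for its content.
guest = {
    ("vm1", 0): b"A", ("vm1", 1): b"B", ("vm1", 2): b"A",
    ("vm2", 0): b"A", ("vm2", 1): b"C",
}
mapping, machine = map_to_machine_pages(guest)
print(len(guest), "guest pages ->", len(machine), "machine pages")
```

In this toy data, the three guest pages holding "A" all end up on one machine page, which is the effect the page-sharing counters later in the deck are measuring.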

Page 21: Understanding  esxtop  statistics

How to Use it – Monitor Memory

Server uptime
Memory overcommit averages: 1-min, 5-min, 15-min (1.00 = 100%)
Worlds

Page 22: Understanding  esxtop  statistics

How to Use it – Monitor Memory

Memory overcommit, the basics. In short: When you assign more RAM to your VMs than available in your host.

Good memory overcommit: When you assign more RAM to your VMs than available in your host BUT never cross the line where the amount of RAM that is USED by your VMs is more than available in your host.

Bad memory overcommit: When you assign more RAM to your VMs than available in your host AND cross the line where the amount of RAM that is USED by your VMs is more than available in your host.

Reference - http://www.gabesvirtualworld.com/?p=930
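
The good/bad distinction above comes down to two comparisons, sketched here in Python. The function name, the labels it returns, and the sample sizes are made up for the example; where "used" comes from (for instance active or consumed memory per VM) is left open, as it is on the slide.

```python
def classify_overcommit(host_mb, assigned_mb, used_mb):
    """Classify memory overcommit using the definitions on this slide.

    host_mb: RAM available in the host.
    assigned_mb: RAM assigned to each VM.
    used_mb: RAM actually used by each VM.
    """
    total_assigned = sum(assigned_mb)
    total_used = sum(used_mb)
    if total_assigned <= host_mb:
        return "not overcommitted"
    if total_used <= host_mb:
        # More assigned than the host has, but usage still fits.
        return "good overcommit"
    return "bad overcommit"


host = 32768  # hypothetical host with 32 GB of RAM
print(classify_overcommit(host, [8192] * 3, [4096] * 3))
print(classify_overcommit(host, [8192] * 6, [4096] * 6))
print(classify_overcommit(host, [8192] * 6, [6144] * 6))
```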

Page 23: Understanding  esxtop  statistics

How to Use it – Monitor Memory

"PMEM" (MB) The machine memory statistics for the host.

"total": the total amount of machine memory in the server.

"cos" : the amount of machine memory for the ESX Service Console.

"vmk" : the amount of machine memory being used by the ESX VMKernel.

"other": the amount of machine memory being used by everything other than the ESX Service Console and ESX VMKernel. "other" contains not only the memory used by VM but also the user worlds that run directly on VMKernel.

"free" : the amount of machine memory that is free.

Page 24: Understanding  esxtop  statistics

How to Use it – Monitor Memory

"VMKMEM" (MB) The machine memory statistics for VMKernel.

"managed": the total amount of machine memory managed by VMKernel.

"minfree": the minimum amount of machine memory that VMKernel would like to keep free. This is because VMKernel needs to keep some amount of free memory for critical uses.

"rsvd" : the amount of machine memory that is currently reserved. "rsvd" is the sum of three parts: (a) the reservation setting of the groups; (b) the overhead reservation of the groups; (c) "minfree".

"ursvd" : the amount of machine memory that is currently unreserved. It is the memory available for reservation.

"state" : the free memory state.

Q: How do I know my ESX box is under memory pressure?

A: It is usually safe to say the ESX box is under memory pressure if "state" is "hard" or "low". But you also need to check "SWAP" and "MEMCTL" to find out whether you are experiencing memory problems. Basically, if there is not enough free memory and ESX is experiencing swapping or ballooning, the ESX box is under memory pressure.
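
The rule of thumb in this answer can be written as a small check. The sketch below is an assumption-laden simplification: the function name is invented, and treating a non-zero "curr" value for SWAP or MEMCTL as "experiencing swapping or ballooning" is my shortcut, since "curr" can also reflect activity that has already stopped.

```python
def under_memory_pressure(state, swap_curr_mb, memctl_curr_mb):
    """Apply the slide's rule: low free memory AND swapping or ballooning."""
    # "hard" and "low" are the two states the slide calls out.
    low_free = state in ("hard", "low")
    # Simplification: any swapped or ballooned memory counts as activity.
    reclaiming = swap_curr_mb > 0 or memctl_curr_mb > 0
    return low_free and reclaiming


print(under_memory_pressure("high", 0, 0))
print(under_memory_pressure("hard", 0, 512))
print(under_memory_pressure("low", 0, 0))
```

A stricter version could look at the SWAP "r/s" and "w/s" rates instead of "curr" to tell ongoing swapping from past swapping.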

Page 25: Understanding  esxtop  statistics

How to Use it – Monitor Memory

"COSMEM" (MB) The memory statistics reported by the ESX Service Console.

"free" : the amount of idle machine memory.

"swap_t": the total swap configured.

"swap_f": the amount of swap free.

"r/s" : the rate at which memory is swapped in from disk.

"w/s" : the rate at which memory is swapped out to disk.

Q: What does it mean if I see a high r/s or w/s?

A: Your console OS is swapping. It is highly likely that your COS free memory is low. You may either configure more memory for COS and restart your ESX box, or stop some programs running inside your COS.

Q: Why can't I see this COSMEM line?

A: You are using ESXi, not classic ESX.

Page 26: Understanding  esxtop  statistics

How to Use it – Monitor Memory

"PSHARE" (MB) The ESX page-sharing statistics.

"shared": the amount of guest physical memory that is being shared.

"common": the amount of machine memory that is common across World(s).

"saving": the amount of machine memory that is saved due to page-sharing.

Page 27: Understanding  esxtop  statistics

How to Use it – Monitor Memory

"SWAP" (MB) The ESX swap usage statistics.

"curr" : the current swap usage. This is the total swapped machine memory of all the groups.

"target": the expected swap usage. This is the total swap target of all the groups.

"r/s" : the rate at which machine memory is swapped in from disk.

"w/s" : the rate at which machine memory is swapped out to disk.

Page 28: Understanding  esxtop  statistics

How to Use it – Monitor Memory

"MEMCTL" (MB) The memory balloon statistics.

"curr" : the total amount of physical memory reclaimed by the balloon driver. This is the total ballooned memory of the VMs.

"target": the expected total amount of ballooned memory. This is the total of the VMs' balloon targets.

"max" : the maximum amount of physical memory reclaimable.

Page 29: Understanding  esxtop  statistics

How to Use it – Drill down Memory

"MEMSZ" (MB) For a VM, the amount of configured guest physical memory.

"SZTGT" (MB) This counter is used to determine future swapping and ballooning activities.

"TCHD" (MB) The amount of guest physical RAM recently used by the VM.

Page 30: Understanding  esxtop  statistics

How to Use it – Monitor Memory

"%ACTV" Percentage of active guest physical memory, current value.

"%ACTVS" Percentage of active guest physical memory, slow moving average.

"%ACTVF" Percentage of active guest physical memory, fast moving average.

"%ACTVN" Percentage of active guest physical memory in the near future. This is an estimated value.

"OVHDUW" (MB) Amount of overhead memory reserved for the vmx user world of a VM group. This is an internal counter.

"OVHD" (MB) Amount of overhead memory currently consumed by a VM.

"OVHDMAX" (MB) Amount of reserved overhead memory for the entire VM.

Page 31: Understanding  esxtop  statistics

Next VMUG Meeting –

esxtop Stats Part II –

Disk, Network, Batch Mode