VIRT3319BES
Deep Dive on pNUMA & vNUMA: Save your SQL VMs from certain Doom!
Rob Girard, Principal TME
#VMworld #VIRT3319BES
VMworld 2017 Content: Not for publication or distribution
Virtualizing Applications Track Sessions and Offerings
• 30 Breakout Sessions with 2 Panels & 3 Quick Talks
• 10 BCA Meet-The-Experts sessions (15min 1-on-1 appts)
• 2 Birds-of-a-Feather special invitation receptions (Oracle & SAP)
• 5 Group Discussions
• 3 Saturday - Full Day Applications Bootcamps
• Sign up for the Independent Oracle User Group (IOUG) VMware Special Interest Group (SIG): www.ioug.org/vmware
The Percentage of Applications in Virtualized Infrastructure Has Increased Dramatically Over the Last Few Years
(VMware Core Metrics Survey 2016)
% Respondents Running the Application in Virtualized Infrastructure (N = 1024)
Region columns: NA, EU, dAP, BRIC. Company Size columns: SMB, COMM, ENT.
Application                          Total NA    EU    dAP   BRIC  SMB   COMM  ENT
Microsoft SQL                        81%   80%   81%   75%   84%   75%   81%   86%
Custom/Industry-Specific Business…   65%   57%   70%   66%   71%   59%   70%   68%
Microsoft Exchange                   53%   52%   55%   49%   58%   48%   51%   60%
Microsoft SharePoint                 52%   61%   44%   43%   51%   41%   56%   60%
SAP                                  46%   36%   51%   48%   55%   32%   45%   59%
Oracle Databases                     33%   32%   29%   40%   38%   32%   35%   34%
IBM Middleware                       30%   38%   22%   24%   31%   24%   33%   34%
Oracle Applications                  29%   26%   28%   30%   36%   24%   37%   30%
High Performance Computing           29%   18%   29%   41%   40%   21%   31%   35%
Oracle Middleware                    22%   19%   20%   26%   29%   18%   24%   26%
Respondents (N)                      1024  388   289   139   208   401   217   406
#VIRT3319BES CONFIDENTIAL
Where Can I Learn More?
▪ Business Critical Applications VMware.com Homepage
• https://www.vmware.com/solutions/business-critical-apps.html
▪ VMware – DellEMC Collaborative Collateral and DBTA Surveys
• http://www.dbta.com/emc
▪ Blogs
• vSphere Blog
• https://blogs.vmware.com/vsphere/
• One Stop Shop - All Oracle on VMware SDDC
• https://blogs.vmware.com/apps/2017/01/oracle-vmware-collateral-one-stop-shop.html
• VMware IOUG Special Interest Group
• http://vmsig.org/
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
About Rob
Rob Girard
• Principal Technical Marketing Engineer @ Tintri since January 2014
• Working in IT since 1997, with >12 years of VMware experience
• vExpert, VCAP4/5-DCA, VCAP4-DCD, VCP2/4/5, MCSE, CCNA and TCSE
@robgirard www.linkedin.com/in/robgirard
2 Minute Version
• Always use a “Green Line” configuration to match the optimized VM size to the underlying physical topology, while presenting the correct sockets and cores to the guest OS
• Leave CPU Hot Add off
• Adjust Virtual Machine Advanced Settings:
  • numa.autosize.once FALSE
  • numa.autosize TRUE (deprecated in vSphere 6.5, which defaults to TRUE)
  – Leave everything else alone; VMware does a great job of managing vNUMA
• If you want to know why, what all the other knobs are and their impact, and how we tested to prove these settings… STICK AROUND!
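The Python sketch below audits an existing VM against this advice. It reads a .vmx file's text and lists any deviations. It makes three assumptions: the usual `key = "value"` line format, case-insensitive keys, and `vcpu.hotadd` as the key behind the CPU Hot Add checkbox. The helper names are hypothetical and are not part of any VMware tool.

```python
# Hypothetical helper: compare a .vmx file against the "2 Minute Version"
# advice. Assumes `key = "value"` lines and that `vcpu.hotadd` controls
# CPU Hot Add.


def parse_vmx(text):
    """Parse .vmx text into a dict, lower-casing keys (assumed case-insensitive)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def check_recommendations(text):
    """Return a list of deviations from the recommended settings."""
    vmx = parse_vmx(text)
    problems = []
    if vmx.get("vcpu.hotadd", "FALSE").upper() == "TRUE":
        problems.append("CPU Hot Add is on, which disables vNUMA")
    # A missing numa.autosize.once behaves as its default (TRUE), so flag it.
    if vmx.get("numa.autosize.once", "TRUE").upper() != "FALSE":
        problems.append('set numa.autosize.once = "FALSE"')
    # Flagged unless explicitly TRUE; harmless to set where it is deprecated.
    if vmx.get("numa.autosize", "FALSE").upper() != "TRUE":
        problems.append('set numa.autosize = "TRUE"')
    return problems


sample = '''
numvcpus = "24"
vcpu.hotadd = "FALSE"
numa.autosize.once = "FALSE"
numa.autosize = "TRUE"
'''
print(check_recommendations(sample))  # []
```

A file that only turns CPU Hot Add on is flagged three times, because the two numa.autosize keys are also missing.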
Introduction
Met at the SQL Elite Workshop, hosted by VMware and Tintri (April 2015)
Partnered to share expertise in different aspects of virtualization
Delivered the VAP6433 Group Discussion session @ VMworld 2015
This session summarizes the research & lab work behind that session
For those who want to understand how it works under the covers
Agenda
• Explain pNUMA and vNUMA
• How vNUMA works in VMware
• vNUMA balancing and boundaries
• Advanced vNUMA settings
• Lab results & findings
• Monitoring vNUMA
Non Uniform Memory Access (NUMA)
SMP vs NUMA
• Large physical machines ran into scale problems with memory access
• NUMA was created to divide up memory address space between CPUs
[Diagram: Symmetrical Multiprocessing (SMP): four CPUs sharing one memory controller and one I/O controller]
[Diagram: Non Uniform Memory Access (NUMA): two groups of four CPUs, each group with its own memory controller and I/O controller, joined by an interconnect]
AMD NUMA
There are 2 NUMA nodes per processor
A 4 socket server will have 8 NUMA nodes
[Diagram: four CPUs, each with its own memory controller, sharing an I/O controller]
Intel NUMA
Intel processors have one NUMA node per processor
Notice the QPI links between each CPU
[Diagram: four CPUs, each with its own memory controller, connected by QPI links, with two I/O controllers]
Intel Cluster On Die (COD)
COD splits each Intel processor into two NUMA nodes
Performance impact to ESXi varies up to 35%, depending on workload, according to VMware
Controlled in BIOS; recommend OEM default
Applies to processors with 10 cores or more
Available on Haswell (v3) and later
Graphic from https://www.starwindsoftware.com/blog/numa-and-cluster-on-die
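The node-count rules from the AMD, Intel and COD slides reduce to simple arithmetic. The hypothetical Python helper below encodes them for the processor generations discussed in this session. Newer CPUs may present different node counts, so check your own hardware.

```python
def numa_node_count(sockets, vendor, cod_enabled=False):
    """Physical NUMA nodes in a host, using the rules from the slides above.

    These rules describe the processor generations covered in this session,
    not every AMD or Intel CPU.
    """
    if vendor == "amd":
        nodes_per_socket = 2  # "There are 2 NUMA nodes per processor"
    elif vendor == "intel":
        # One node per processor; Cluster On Die splits each one in two.
        nodes_per_socket = 2 if cod_enabled else 1
    else:
        raise ValueError(f"unknown vendor: {vendor!r}")
    return sockets * nodes_per_socket


print(numa_node_count(4, "amd"))                      # 8
print(numa_node_count(2, "intel"))                    # 2
print(numa_node_count(2, "intel", cod_enabled=True))  # 4
```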
pNUMA vs vNUMA
vNUMA presents the pNUMA nodes to the virtual machine's OS
Because vNUMA is implemented in software, we can tune it when it does not automatically match the desired configuration
Windows, Linux, and SQL Server are all natively NUMA aware, and have been for a very long time
vNUMA: virtual NUMA presentation to a virtual machine
pNUMA: NUMA architecture of the physical machine
Soft NUMA
SQL Server has a concept called soft NUMA, which has been around for a long time
Changed in SQL Server 2016: SQL Server now creates logical NUMA nodes of up to 8 cores each
Works in conjunction with VMware vNUMA, not as a substitute
SQL Server and Intel have both found 8 cores to be the magic number for optimal memory throughput
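The sketch below makes the SQL Server 2016 rule concrete. It splits a node's cores into logical nodes of at most 8 cores each, keeping them as even as possible. This is a simplified model of the slide's statement that assumes an even split; SQL Server's own automatic soft-NUMA logic may choose differently.

```python
import math


def soft_numa_split(cores, max_per_node=8):
    """Split `cores` into the fewest logical nodes of at most `max_per_node`
    cores each, keeping node sizes as even as possible.

    Simplified model of the slide's "up to 8 cores each" rule; SQL Server's
    actual automatic soft-NUMA logic may differ.
    """
    if cores < 1:
        raise ValueError("cores must be positive")
    nodes = math.ceil(cores / max_per_node)
    base, extra = divmod(cores, nodes)
    # The first `extra` nodes get one additional core.
    return [base + 1] * extra + [base] * (nodes - extra)


print(soft_numa_split(12))  # [6, 6]
print(soft_numa_split(24))  # [8, 8, 8]
print(soft_numa_split(20))  # [7, 7, 6]
```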
vNUMA
[Diagram: APP and OS running on the hypervisor, spanning two NUMA nodes, each a CPU with its own memory controller]
By default, vNUMA only comes into play when a VM has 9 or more vCPUs
If your hosts have 4 or 6 core processors and your VMs have more vCPUs than a processor has cores, you WILL have NUMA issues!
Consider changing numa.vcpu.min on the virtual machine to allow vNUMA to take effect below this threshold
This can be set at the VM level.
Introduced in vSphere 5.0, but improved in 5.5, 6.0 & 6.5
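The threshold behaviour fits in a one-line check. This sketch encodes two rules stated in this deck: the numa.vcpu.min default of 9, and CPU Hot Add disabling vNUMA. It is not VMware's code, and the function name is hypothetical.

```python
def vnuma_exposed(num_vcpus, numa_vcpu_min=9, cpu_hot_add=False):
    """Return True if the guest OS should be presented a vNUMA topology,
    per the rules in this deck: vNUMA is skipped below numa.vcpu.min vCPUs
    (default 9) and is disabled entirely when CPU Hot Add is on."""
    return not cpu_hot_add and num_vcpus >= numa_vcpu_min


# A 6 vCPU VM on a host with 4-core processors spans two physical nodes,
# but by default the guest sees no vNUMA topology:
print(vnuma_exposed(6))                     # False
# Lowering numa.vcpu.min on that VM lets vNUMA take effect:
print(vnuma_exposed(6, numa_vcpu_min=6))    # True
print(vnuma_exposed(24, cpu_hot_add=True))  # False
```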
Test Methodology, Tools & Lab Setup
01 In-Guest analysis
02 Host Mem Usage analysis: ESXTOP (M for memory, f to choose fields, g for NUMA fields)
03 .vmx file analysis (to validate changes made via GUI, vMotions to other hardware, impact of reboot vs power cycle, FIRST BOOT vs others, etc.)
04 Worst Case Analysis – Pinning CPUs & Memory to specific cores & nodes
Test Methodology, Tools & Lab Setup – Con’t
Lab:
• 1 x AMD Server: 2 x 16 core + 256 GB RAM
• 1 x Intel Server: 2 x 16 core + 384 GB RAM
• Tintri VMstore for storage
• SQL VMs - Win 2012 R2 + SQL 2014
• Size varied for CPU & RAM
• HammerDB
– Master/Slave config: 10 VMs @ 8 vCPU each
– 16 virtual users per client against 24 vCPU SQL w/ 224GB RAM
Determine NUMA configuration from Windows
• Task Manager can show you NUMA nodes: right-click the CPU graph and choose Change graph to > NUMA nodes
Determine NUMA configuration from Windows – Con’t
• Resource Monitor (CPU tab) shows more detailed info about the CPUs and which NUMA node they belong to
Check NUMA in SQL – Con’t
select * from sys.dm_os_memory_nodes
Check NUMA in SQL
select * from sys.dm_os_schedulers
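When reading sys.dm_os_schedulers, a quick sanity check is whether user-visible schedulers are spread evenly across NUMA nodes. The Python sketch below assumes you have already fetched the `parent_node_id` and `status` columns with any client. The sample rows are made up for illustration. As we understand the DMV, only 'VISIBLE ONLINE' schedulers run user requests; hidden schedulers handle internal work, and the DAC scheduler serves the dedicated admin connection.

```python
from collections import Counter


def schedulers_per_node(rows):
    """Count 'VISIBLE ONLINE' schedulers per NUMA node.

    `rows` are (parent_node_id, status) pairs, e.g. from
    SELECT parent_node_id, status FROM sys.dm_os_schedulers;
    """
    return dict(Counter(node for node, status in rows
                        if status == "VISIBLE ONLINE"))


# Hypothetical sample rows for an 8 vCPU VM with two vNUMA nodes.
rows = [(0, "VISIBLE ONLINE")] * 4 + [(1, "VISIBLE ONLINE")] * 4 + [
    (0, "HIDDEN ONLINE"),
    (64, "VISIBLE ONLINE (DAC)"),
]
print(schedulers_per_node(rows))  # {0: 4, 1: 4}
```

Uneven counts per node, or a single node on a VM you expected to span two, point back to the vNUMA configuration.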
Checking NUMA on Host (ESXTOP)
Checking NUMA on Host (ESXTOP) – Con’t
NUMA Node Balancing
• Host: 2 sockets of 12 cores each, 384 GB of memory
• Each NUMA node is 12 cores and 192 GB of memory
• A VM with 12 vCPUs and 256 GB of memory will have two NUMA nodes, with 6 vCPUs and 128 GB of memory per node
• NUMA imbalance occurs when there is a mismatch between a virtual machine's vCPU count and memory size and the physical hardware
• Since a NUMA node is a collection of CPU and memory resources, ensure your VMs are sized correctly
• Two NUMA nodes means the memory is split in half
VMware rarely creates an imbalance when it auto-configures NUMA
[Diagram: NUMA nodes, each a CPU with its own memory controller]
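The balancing example above can be reproduced with a small calculation. Count how many physical nodes the VM needs for its vCPUs and how many for its memory, take the larger number, and split both evenly. This simplified model matches the slide's numbers under an even-split assumption. It is not ESXi's actual vNUMA placement algorithm.

```python
import math


def vnuma_split(vm_vcpus, vm_mem_gb, node_cores, node_mem_gb):
    """Simplified model of the NUMA Node Balancing example: use enough
    virtual nodes to fit both the vCPUs and the memory inside physical
    nodes, then split both evenly. Not ESXi's actual placement algorithm."""
    nodes = max(math.ceil(vm_vcpus / node_cores),
                math.ceil(vm_mem_gb / node_mem_gb))
    return nodes, vm_vcpus / nodes, vm_mem_gb / nodes


# Slide example: 12 vCPU / 256 GB VM on a host with 12-core, 192 GB nodes.
print(vnuma_split(12, 256, 12, 192))  # (2, 6.0, 128.0)
```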
NUMA Penalty
NUMA wants to schedule a thread on the CPU that is local to the memory assigned to that thread
When a thread runs but the memory it needs is in the other NUMA node, a remote memory lookup occurs
That lookup has a cost, which is known as the NUMA penalty
NUMA Penalty – Con’t
In our testing (HammerDB workload), we found the penalty to be as great as a 40% drop in performance!
Penalty varies by workload
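One way to picture the NUMA penalty is as a weighted average of local and remote memory latency. The 100 ns and 160 ns figures in this sketch are placeholder values, not measurements from this lab. Real throughput loss, such as the 40% HammerDB drop above, depends on more than raw latency.

```python
def effective_latency_ns(local_ns, remote_ns, remote_fraction):
    """Average memory access latency when `remote_fraction` of a thread's
    accesses must go to memory in the other NUMA node."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Placeholder latencies, not measurements from this lab.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, effective_latency_ns(100, 160, fraction))
```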
VM Sizing
• Size a VM to fit inside a single NUMA node for best performance
• Right size your workloads
• For best CPU scheduling, size all virtual machines so their vCPU count divides evenly into the number of cores in the processor
• Example: 12 core servers work best with virtual machines sized at 1, 2, 3, 4, 6, or 12 vCPUs
• An 8 vCPU machine will still run, but you will lose consolidation density
• For most SQL Server virtualization, consolidation is not the main goal
• For large machines, vCPU counts that are multiples of the number of cores run best: 12, 24, 36 vCPUs
• Remember to leave room for the hypervisor
https://www.vmware.com/techpapers/2017/Perf_Best_Practices_vSphere65.html
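The sizing guidance can be turned into a list of candidate vCPU counts. This hypothetical helper assumes one NUMA node per processor (for example, Intel without COD) and simply applies the slide's two rules.

```python
def recommended_vcpu_sizes(cores_per_processor, processors):
    """vCPU counts suggested by the VM Sizing slide.

    Assumes one NUMA node per processor (e.g. Intel without COD).
    Small VMs: sizes that divide evenly into one processor's cores.
    Large VMs: whole multiples of the core count, up to the host size.
    """
    small = [n for n in range(1, cores_per_processor + 1)
             if cores_per_processor % n == 0]
    large = [cores_per_processor * k for k in range(2, processors + 1)]
    return small + large


print(recommended_vcpu_sizes(12, 2))  # [1, 2, 3, 4, 6, 12, 24]
```

The largest values use every core in the host, so remember to leave room for the hypervisor.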
Virtual Nodes – 24 cores on a 16 core CPU
Cores vs Sockets
• 1 core per socket (“wide”) gives the CPU scheduler the most flexibility in scheduling, BUT can have a negative impact when interpreted by software
• vSphere will determine the best NUMA topology for a VM on first boot. This is set in the .VMX file
• Changing from 1 core per socket locks in the vNUMA configuration; vSphere cannot update it (Autosize settings are ignored)
• Use multiple cores per socket to save on licensing for applications that are licensed per socket
• If you are sure of the underlying hardware, you can change these settings to match NUMA boundaries (Recommended)
• If you desire a non-standard NUMA configuration, you can change them here
• Results do vary; you need to test to validate, because each workload is impacted by NUMA differently.
http://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf
Cores & Sockets – VM Settings
cpuid.coresPerSocket = 1 (default)
• Determines number of virtual cores per socket
numa.vcpu.followcorespersocket = 0 (default) – NEW IN vSPHERE 6.5
• If set to 1, reverts to the old behaviour, where virtual NUMA node sizing is tied to cpuid.coresPerSocket
Cores & Sockets – vSphere 6.5 “Green Line” Configurations
https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html
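Here is our reading of the linked VMware rule of thumb. A “Green Line” layout uses one virtual socket when the VM fits in a physical NUMA node. Otherwise it uses one socket per spanned node, with the vCPUs split evenly. The sketch below encodes that reading; verify it against the blog's tables before relying on it.

```python
import math


def green_line_topology(vcpus, cores_per_pnuma_node):
    """Suggest (virtual sockets, cores per socket) that mirror the physical
    NUMA layout: one socket when the VM fits in a node, otherwise one
    socket per physical node spanned, with vCPUs split evenly."""
    sockets = math.ceil(vcpus / cores_per_pnuma_node)
    if vcpus % sockets:
        raise ValueError(
            f"{vcpus} vCPUs cannot be split evenly across {sockets} nodes")
    return sockets, vcpus // sockets


print(green_line_topology(8, 12))   # (1, 8)
print(green_line_topology(16, 12))  # (2, 8)
print(green_line_topology(24, 12))  # (2, 12)
```

An odd count such as 13 vCPUs on 12-core nodes raises an error. That mirrors the sizing advice to avoid such sizes rather than guessing at an uneven layout.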
VM Advanced vNUMA Settings – con’t
numa.vcpu.maxPerVirtualNode = 8 (default)
• Maximum vCPUs per virtual NUMA node; lower it to span additional NUMA nodes
numa.vcpu.preferHT = False (default)
• Enable if you want to use HT cores and fewer NUMA nodes
numa.vcpu.min = 9 (default)
• Threshold for when vNUMA will take effect
numa.autosize.once = True (default)
• Recommended: False – Changes behavior to recalculate vNUMA on every power cycle
numa.autosize = False (default) DEPRECATED (v6.5)
• Change to True to have the VM recalculate vNUMA on every power cycle – *RECOMMENDED*
Auto-Generated Settings – LOOK, BUT DON’T TOUCH!
numa.autosize.cookie = [auto-generated value]
• What VMware calculated as your vNUMA config
• (160001) = 16 sockets, 1 core each
numa.autosize.vcpu.maxPerVirtualNode = [auto-generated value]
• How many cores per NUMA node, based on the autosize
• 8 shown in example – boundary of the host we are using (AMD 16 cores x 2 sockets)
NOTE: As of vSphere 6.5 (and the latest patches of vSphere 6.0), these settings are no longer visible in the UI, but can still be found in the .vmx files
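The cookie value appears to pack the socket and core counts into one number. This decoder assumes the layout implied by this deck's two examples (160001 and 240001), not any VMware documentation. Treat the cookie as read-only.

```python
def decode_autosize_cookie(cookie):
    """Decode numa.autosize.cookie as (sockets, cores per socket).

    Assumes value = sockets * 10000 + cores_per_socket, a layout inferred
    from this deck's examples (160001 = 16 sockets x 1 core), not from
    VMware documentation. Read it; don't edit it.
    """
    return divmod(int(cookie), 10000)


print(decode_autosize_cookie("160001"))  # (16, 1)
print(decode_autosize_cookie("240001"))  # (24, 1)
```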
VM Advanced vNUMA Settings – con’t
TIP: You can’t see Advanced config settings while a VM is running…
…But you CAN access the VMX file via the CLI or the Datastore Browser!
What Does Auto-Sized NUMA Look Like?
Note: If cpuid.coresPerSocket or numa.vcpu.maxPerVirtualNode is present in a VM’s VMX file, Autosize is ignored
numa.autosize.vcpu.maxPerVirtualNode = 12 (or 24, or 8, or …?)
numa.autosize.cookie = 240001
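Either key silently disables Autosize, so it can be worth scanning .vmx files for them. This sketch assumes the `key = "value"` format and case-insensitive keys. The auto-generated numa.autosize.vcpu.maxPerVirtualNode is deliberately not matched.

```python
def autosize_overrides(vmx_text):
    """Return the keys present in a .vmx file that, per the note above,
    cause Autosize to be ignored (matched case-insensitively)."""
    overriding = {"cpuid.corespersocket", "numa.vcpu.maxpervirtualnode"}
    found = []
    for line in vmx_text.splitlines():
        key = line.partition("=")[0].strip().lower()
        if key in overriding:
            found.append(key)
    return found


autosized = '''
numvcpus = "24"
numa.autosize.vcpu.maxPerVirtualNode = "12"
numa.autosize.cookie = "240001"
'''
print(autosize_overrides(autosized))  # []
print(autosize_overrides(autosized + 'cpuid.coresPerSocket = "12"\n'))  # ['cpuid.corespersocket']
```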
NUMA AutoSize
• Tested on 4 NUMA node system
VMware Hot-Add Gotchas
• When you turn on CPU hot add, it will disable vNUMA
• Memory hot add works fine, with one caveat:
– In VM hardware versions 8-10, memory hot-added to a vNUMA machine was added only to NUMA node 0
– You would then have a NUMA memory imbalance
– Correcting the imbalance requires a power cycle of the virtual machine
• Hardware versions 11+ (vSphere 6.0+) balance the memory across nodes as it is added
[Diagram: four NUMA nodes (CPU + memory controller), with RAM being added (+) to the nodes]
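A toy model of per-node memory illustrates the difference between hardware versions. It follows only the slide's description: all added memory goes to node 0 on versions 8-10, and it is spread evenly on 11+. The function is hypothetical.

```python
def hot_add_memory(node_mem_gb, added_gb, hw_version):
    """Per-node memory after hot-adding `added_gb` to a vNUMA VM, following
    the Hot-Add Gotchas slide: hardware versions 8-10 place it all on node 0,
    version 11+ spreads it evenly across the nodes."""
    if hw_version >= 11:
        share = added_gb / len(node_mem_gb)
        return [mem + share for mem in node_mem_gb]
    # Older behavior (the slide covers versions 8-10): node 0 gets everything.
    return [node_mem_gb[0] + added_gb] + list(node_mem_gb[1:])


print(hot_add_memory([64, 64], 32, hw_version=10))  # [96, 64]
print(hot_add_memory([64, 64], 32, hw_version=11))  # [80.0, 80.0]
```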
Update NUMA Configuration
• NUMA for a virtual machine is calculated at first power on.
• It only updates when you change the number of cores
• When a vMotion occurs between hosts with different underlying NUMA configurations, it is not updated.
• Three scenarios:
– NUMA node size configuration is smaller than or the same as the new host's: no real change
– New host's NUMA node is smaller and the VM's node size is not evenly divisible by it: NUMA is basically disabled
– NUMA node size is larger but evenly divisible: the NUMA node is divided up to match, however the OS will not know the memory locality
• To force a review and/or update of the VM's NUMA topology, add two settings to the advanced section of the VM:
– numa.autosize TRUE
– numa.autosize.once FALSE
http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
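The three vMotion scenarios compare the VM's existing virtual node size with the destination host's node size. The sketch below classifies them as we read the slide, assuming both sizes are expressed in cores.

```python
def vmotion_numa_outcome(vm_vnode_cores, host_pnode_cores):
    """Classify the three scenarios from the Update NUMA Configuration slide.

    vm_vnode_cores: the VM's virtual NUMA node size, fixed at first power on.
    host_pnode_cores: the NUMA node size of the destination host.
    """
    if vm_vnode_cores <= host_pnode_cores:
        return "no real change"
    if vm_vnode_cores % host_pnode_cores == 0:
        return "node divided to match; guest OS unaware of memory locality"
    return "NUMA basically disabled"


print(vmotion_numa_outcome(8, 12))   # no real change
print(vmotion_numa_outcome(16, 8))   # the divided-node case
print(vmotion_numa_outcome(12, 8))   # NUMA basically disabled
```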
Prefer HT
• Off by default
• Host setting and VM setting
• If using it, set it at the VM level in nearly all cases
• Only turn it on when you have more vCPUs than the NUMA node size but your memory still fits into one NUMA node
– This will allow all threads to schedule on one processor, with all memory local
– Workloads with lots of inter-thread communication will benefit
• Mileage may vary and you should test your workload each way; the answer will depend on the value of local memory vs having a full CPU cycle
https://blogs.vmware.com/vsphere/2014/03/perferht-use-2.html
[Diagram: Hyper-threading: doubling the number of processing threads per core]
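The condition for trying Prefer HT can be written as a check on vCPU count and memory. This sketch assumes 2 hardware threads per core and only flags candidates. As the slide says, you should still test the workload each way.

```python
def consider_prefer_ht(vm_vcpus, vm_mem_gb, node_cores, node_mem_gb,
                       threads_per_core=2):
    """True when the Prefer HT slide's condition holds: more vCPUs than
    physical cores in one NUMA node, yet few enough to fit on that node's
    hyper-threads, with memory that fits in one node."""
    fits_on_threads = node_cores < vm_vcpus <= node_cores * threads_per_core
    return fits_on_threads and vm_mem_gb <= node_mem_gb


# Host with 12-core / 192 GB NUMA nodes:
print(consider_prefer_ht(16, 128, 12, 192))  # True: candidate, test both ways
print(consider_prefer_ht(12, 128, 12, 192))  # False: already fits on cores
print(consider_prefer_ht(16, 256, 12, 192))  # False: memory spans nodes
```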
vNUMA Host Settings
• In nearly all cases, do not TOUCH!!!
• Mostly covers when and how a host will move a VM from one NUMA node to another
• Most large virtual machines are not impacted by this, as they don’t move
• Upon boot, a VM is assigned a NUMA node or nodes
• If too many VMs are running on one NUMA node and causing CPU pressure, ESXi will move a VM between nodes
• CPU threads move instantly; memory moves slowly
• ESXi will try to keep VMs that communicate with each other over the network together, for improved network speed
NUMA in BIOS
• NUMA can be turned off in the hardware BIOS
• Ensure it is enabled
• Every hardware vendor seems to call it something slightly different
• Most have NUMA enabled by default
• “Node Interleaving” is the most common name
– Node interleaving off means NUMA is on
– Node interleaving on for SMP configurations
Before you blame NUMA….
• An important finding throughout this testing is how much impact database optimization can have!
• More importantly, how negative the impact of NOT optimizing your database can be.
• HammerDB (a sample application) grinds to a crawl after prolonged use…. Optimization can breathe new life into it!
• In our case… 1.25 Million TPM down to <1000 TPM!!!!
• DB Size: 200 GB (2,000 warehouses) -> 245 GB -> 375 GB (optimized)
• NUMA should be one of the last things you look at if 1 core per socket is set
Additional Screenshots….
Closing Comments
When in doubt, DON’T TOUCH IT!
Topic only applies to very large VMs that don’t fit into NUMA nodes and require maximum performance
If you think you have a handle on NUMA, that may be even more dangerous!