NUMA and Virtualization: the case of Xen
Dario Faggioli, [email protected]
August 27-28, 2012, San Diego, CA, USA
NUMA and Virt. and Xen

Talk outline:
● What is NUMA
  – NUMA and Virtualization
  – NUMA and Xen
● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA
What is NUMA
● Non-Uniform Memory Access: it takes longer to access some regions of memory than others
● Designed to improve scalability on large SMP systems
● Bottleneck: contention on the shared memory bus
What is NUMA
● Groups of processors (NUMA nodes) have their own local memory
● Any processor can access any memory, including memory not "owned" by its group (remote memory)
● Non-uniform: accessing local memory is faster than accessing remote memory
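To make "non-uniform" concrete, here is a toy cost model. The 10-vs-20 values mimic the relative distances an ACPI SLIT table typically reports for a 2-node box; they are illustrative, not measured:

```python
# Toy NUMA access-cost model. The matrix mimics ACPI SLIT-style
# relative distances: 10 = local access, 20 = remote access
# (illustrative values for a hypothetical 2-node machine).
DIST = [[10, 20],
        [20, 10]]

def access_cost(cpu_node, mem_node):
    # Cost of a CPU on cpu_node touching memory on mem_node.
    return DIST[cpu_node][mem_node]

print(access_cost(0, 0))  # 10: local memory
print(access_cost(0, 1))  # 20: remote memory, twice as expensive here
```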
What is NUMA
Since:
● (most of the) memory is allocated at task startup,
● tasks are (usually) free to run on any processor,
both local and remote accesses can happen during a task's life.
[Diagram: two NUMA nodes, each with its own CPUs and MEM; task A may run on either node while its memory (mem A) stays on one of them]
NUMA and Virtualization
Short-lived tasks are just fine... but what about VMs?
[Diagram: two NUMA nodes; VM1 and VM2 run vCPUs on both nodes while mem VM1 and mem VM2 each sit on one node — and, unlike a short-lived task, a VM stays that way FOREVER!!]
NUMA and Xen
Xen allocates memory from all the nodes where the VM is allowed to run (when it is created)
[Diagram: VM1 and VM2 are allowed to run on both nodes, so each VM's memory is split across both nodes' MEM]
NUMA and Xen
Xen allocates memory from all the nodes where the VM is allowed to run (when it is created).
How to specify that?
vCPU pinning during VM creation (e.g., in the VM config file)
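For instance, a minimal domain config sketch. The `cpus =` option is what does the pinning; treating CPUs 0-7 as the CPUs of NUMA node 0 is an assumption about the example host:

```
# Hypothetical xl domain config file for VM1
name   = "vm1"
memory = 2048
vcpus  = 2
# Pin the vCPUs to CPUs 0-7 (node 0 on this example host):
# Xen will then allocate the VM's memory from that node
# when the VM is created.
cpus   = "0-7"
```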
Automatic Placement
1. What happens on the left is better than what happens on the right

[Diagram: left — VM1 runs on a single node with all of its memory local; right — VM1's vCPUs and memory are spread across both nodes, causing remote accesses]
Automatic Placement
2. What happens on the left can be avoided: look at what happens on the right

[Diagram: left — VM1 and VM2 crowd the same node while the other node sits idle; right — VM1 placed on the first node and VM2 on the second, each with local memory]
Automatic Placement
[Diagram: VM1 pinned to the first node and VM2 pinned to the second, each with local memory]
1. VM1 creation time: pin VM1 to the first node
2. VM2 creation time: pin VM2 to the second node, as the first one already has another VM pinned to it

This now happens automatically within libxl
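The idea behind the two steps above can be sketched like this. This is a simplified illustration, not the actual libxl placement code; the node dictionaries and the tie-breaking rule are assumptions:

```python
def place_vm(vm_mem, nodes):
    """Pick a NUMA node for a new VM: among the nodes with enough
    free memory, prefer the one with the fewest VMs already pinned
    to it (breaking ties in favour of more free memory)."""
    candidates = [n for n in nodes if n["free_mem"] >= vm_mem]
    if not candidates:
        return None  # no single node can host the VM
    best = min(candidates, key=lambda n: (n["num_vms"], -n["free_mem"]))
    best["free_mem"] -= vm_mem
    best["num_vms"] += 1
    return best["id"]

# Two empty nodes: VM1 lands on node 0; VM2 goes to node 1, since
# node 0 already has a VM pinned to it -- just like in the slide.
nodes = [{"id": 0, "free_mem": 8192, "num_vms": 0},
         {"id": 1, "free_mem": 8192, "num_vms": 0}]
print(place_vm(2048, nodes))  # 0
print(place_vm(2048, nodes))  # 1
```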
NUMA Aware Scheduling
Nice, but using pinning has a major drawback:

[Diagram: VM1, VM2 and VM3 each pinned to a node with local memory; VM1 is preempted while CPUs on the other nodes sit idle]
When VM1 is back, it has to wait, even if there are idle CPUs!
NUMA Aware Scheduling
Don't use pinning: tell the scheduler where a VM prefers to run (node affinity)

[Diagram: the same three VMs, but with node affinity instead of pinning; VM1 can also run on the idle CPUs of the other nodes]
Now VM1 can run immediately: remote accesses are better than not running at all!

Patches almost ready (targeting Xen 4.3)
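In (over)simplified form, the difference from pinning looks like this. Note this is only an illustration: the actual patches bias the credit scheduler's load balancing rather than using an explicit fallback like the one below:

```python
def pick_cpu(preferred_node, idle_cpus, cpu_to_node):
    """Choose a CPU for a vCPU that wants to run: prefer an idle CPU
    on the VM's preferred node (its node affinity), but fall back to
    any idle CPU -- remote accesses beat not running at all."""
    local = [c for c in idle_cpus if cpu_to_node[c] == preferred_node]
    if local:
        return local[0]
    return idle_cpus[0] if idle_cpus else None

# 2 nodes, 2 CPUs each. VM1 prefers node 0.
cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
print(pick_cpu(0, [2, 3], cpu_to_node))  # 2: node 0 busy, run remotely
print(pick_cpu(0, [1, 3], cpu_to_node))  # 1: a local CPU is free, use it
print(pick_cpu(0, [], cpu_to_node))      # None: nothing idle, must wait
```

With strict pinning, the first case would instead leave the vCPU waiting in node 0's runqueue while CPUs 2 and 3 idle.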
Experimental Results
Trying to verify the performance of:
● Automatic placement (pinning)
● NUMA aware scheduling (automatic placement + node affinity)
Testing Placement (thanks to Andre Przywara from AMD)
● Host: AMD Opteron(TM) 6276, 64 cores, 128 GB RAM, 8 NUMA nodes
● VMs: 1 to 49, 2 vCPUs, 2 GB RAM each. Resulting distribution of vCPUs across nodes:

  nr. vCPUs | Node
  ----------+-----
      14    |  0
      12    |  1
      12    |  2
      12    |  3
      12    |  4
      12    |  5
      12    |  6
      12    |  7
Some Traces about NUMA Aware Scheduling
Benchmarking Pinning and NUMA Aware Scheduling

● Host: Intel Xeon(R) E5620, 16 cores, 12 GB RAM, 2 NUMA nodes
● VMs: 2, 4, 6, 8 and 10 of them, 960 MB RAM each. Different experiments with 1, 2 and 4 vCPUs per VM
➔ SPECjbb2005 executed concurrently in all VMs
➔ 3 configurations: all-cpus, auto-pinning, auto-affinity
➔ Experiments repeated 3 times for each configuration
Benchmarking Pinning and NUMA Aware Scheduling
1 vCPU per VM: 20% to 16% improvement!
Benchmarking Pinning and NUMA Aware Scheduling
2 vCPUs per VM: 17% to 13% improvement!
Benchmarking Pinning and NUMA Aware Scheduling

4 vCPUs per VM:
16% to 4% improvement from all-cpus to auto-affinity
0.2% to 8% gap from auto-affinity to auto-pinning
Lots of Room for Further Improvement!
Tracking ideas, people and progress on http://wiki.xen.org/wiki/Xen_NUMA_Roadmap
● Dynamic memory migration
● IO NUMA
● Guest (or Virtual) NUMA
● Ballooning and memory sharing
● Inter-VM dependencies
● Benchmarking and performance evaluation
Dynamic Memory Migration

If VM2 goes away, it would be nice if we could change VM1's (or VM3's) node affinity on-line...

[Diagram: once VM2's node frees up, VM1's affinity is moved to it]
...and move its memory accordingly!

Also targeting Xen 4.3
IO NUMA
Different devices can be attached to different nodes: this needs to be considered during placement / scheduling
Guest NUMA
If a VM is bigger than 1 node, should it know?

Pros: performance (especially HPC)
Cons: what if things change?
● live migration

[Diagram: VM1 spanning two NUMA nodes, with memory on both of them, alongside VM2]
Ballooning and Sharing
Ballooning should be NUMA aware
Sharing: should we allow that cross-node?

[Diagram: when VM1 and VM2 sit on different nodes and share memory (shmem), one of them pays with remote accesses]
Inter-VM Dependencies

[Diagram: left — two communicating VMs packed on the same node; right — the same VMs split across nodes, each with its own local memory]
Are we sure the situation on the right is always better? It might be workload dependent (VM cooperation vs. competition)
Benchmarking and Performance Evaluation

How to verify we are actually improving:
● What kind of workload(s)?
● What VM configuration(s)?