
NUMA and Virtualization, the case of Xen


Page 1: NUMA and Virtualization, the case of Xen

August 27-28, 2012, San Diego, CA, USA

Dario Faggioli,[email protected]

NUMA and Virtualization, the case of Xen

Dario Faggioli, [email protected]

Page 2: NUMA and Virtualization, the case of Xen


NUMA and Virt. and Xen

Talk outline:

● What is NUMA
  – NUMA and Virtualization
  – NUMA and Xen

● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA


Page 4: NUMA and Virtualization, the case of Xen


What is NUMA

● Non-Uniform Memory Access: it will take longer to access some regions of memory than others

● Designed to improve scalability on large SMPs
● Bottleneck: contention on the shared memory bus

Page 5: NUMA and Virtualization, the case of Xen


What is NUMA

● Groups of processors (each group is a NUMA node) have their own local memory

● Any processor can access any memory, including memory not "owned" by its group (remote memory)

● Non-uniform: accessing local memory is faster than accessing remote memory

Page 6: NUMA and Virtualization, the case of Xen


What is NUMA

Since:
● (most of the) memory is allocated at task startup,
● tasks are (usually) free to run on any processor,
both local and remote accesses can happen during a task's life.

[Diagram: two NUMA nodes, each with its own CPUs and MEM; task A's memory ("mem A") lives on one node, but task A may run on either node, so its accesses can be local or remote]

Page 7: NUMA and Virtualization, the case of Xen


NUMA and Virtualization

Short-lived tasks are just fine... but what about VMs?

[Diagram: VM1's and VM2's memory sits on particular nodes, but their vCPUs may run anywhere, so remote accesses can keep happening... FOREVER!!]

Page 8: NUMA and Virtualization, the case of Xen


NUMA and Xen

Xen allocates memory from all the nodes where the VM is allowed to run (when it is created)

[Diagram: two nodes; both VM1's and VM2's memory ends up spread across both of them]

Page 9: NUMA and Virtualization, the case of Xen


NUMA and Xen

Xen allocates memory from all the nodes where the VM is allowed to run (when it is created)

[Diagram: with four nodes, VM1's and VM2's memory can be scattered across all of them]

Page 10: NUMA and Virtualization, the case of Xen


NUMA and Xen

Xen allocates memory from all the nodes where the VM is allowed to run (when it is created)

How to specify that?

vCPU pinning during VM creation (e.g., in the VM config file)
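For instance, a minimal sketch of what that could look like in an xl domain config file (the VM name and the node-to-CPU mapping are assumptions: node 0 is taken to be CPUs 0-7; check the real topology with "xl info -n"):

    # illustrative xl config fragment
    name   = "vm1"
    memory = 2048
    vcpus  = 2
    cpus   = "0-7"   # hard pinning: the vCPUs may only run here,
                     # so memory gets allocated on node 0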

Page 11: NUMA and Virtualization, the case of Xen


NUMA and Virt. and Xen

Talk outline:

● What is NUMA
  – NUMA and Virtualization
  – NUMA and Xen

● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA

Page 12: NUMA and Virtualization, the case of Xen


Automatic Placement

1. What happens on the left is better than what happens on the right

[Diagram: left, VM1's vCPUs and memory are all on one node; right, VM1's memory is split across two nodes]

Page 13: NUMA and Virtualization, the case of Xen


Automatic Placement

2. What happens on the left can be avoided: look at what happens on the right

[Diagram: left, VM1 and VM2 are both packed onto the same node while the other node is unused; right, VM1 and VM2 are placed on separate nodes, each with its memory local]

Page 14: NUMA and Virtualization, the case of Xen


Automatic Placement

[Diagram: VM1 placed on the first node and VM2 on the second, each with its memory local]

1. VM1 creation time: pin VM1 to the first node.

2. VM2 creation time: pin VM2 to the second node, as the first one already has another VM pinned to it.

This now happens automatically within libxl.
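A hedged way to check the outcome from dom0 (the domain names vm1/vm2 are illustrative):

    xl info -n        # host NUMA topology: nodes, their CPUs, free memory per node
    xl vcpu-list vm1  # which physical CPUs each of VM1's vCPUs is allowed to run on
    xl vcpu-list vm2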

Page 15: NUMA and Virtualization, the case of Xen


NUMA Aware Scheduling

Nice, but using pinning has a major drawback:

[Diagram: VM1, VM2 and VM3 are all pinned to the first node; the second node's CPUs sit idle]

When VM1 is back, it has to wait, even if there are idle CPUs!

Page 16: NUMA and Virtualization, the case of Xen


NUMA Aware Scheduling

Don't use pinning, tell the scheduler where a VM prefers to run (node affinity)

[Diagram: VM1, VM2 and VM3 have node affinity with the first node, but VM1 is running on the second node's idle CPUs]

Now VM1 can run immediately: remote accesses are better than not running at all!

Patches almost ready (targeting Xen 4.3)
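As a sketch of where this is heading: in later Xen releases node affinity becomes configurable as "soft" affinity. The fragment below assumes an xl version whose config file supports the cpus_soft option, and again assumes node 0 = CPUs 0-7:

    vcpus     = 2
    cpus_soft = "0-7"   # the scheduler prefers node 0's CPUs, but the vCPUs
                        # may still run anywhere if those CPUs are busy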

Page 17: NUMA and Virtualization, the case of Xen


Experimental Results

Trying to verify the performance of:

● Automatic placement (pinning)
● NUMA aware scheduling (automatic placement + node affinity)

Page 18: NUMA and Virtualization, the case of Xen


Testing Placement (thanks to Andre Przywara from AMD)

● Host: AMD Opteron(TM) 6276, 64 cores, 128 GB RAM, 8 NUMA nodes

● VMs: 1 to 49, 2 vCPUs, 2 GB RAM each.

Resulting vCPU distribution across nodes (all 49 VMs placed):

nr. vCPUs | Node
       14 |  0
       12 |  1
       12 |  2
       12 |  3
       12 |  4
       12 |  5
       12 |  6
       12 |  7

Page 19: NUMA and Virtualization, the case of Xen


Some Traces about NUMA Aware Scheduling

Page 20: NUMA and Virtualization, the case of Xen


Benchmarking Pinning and NUMA Aware Scheduling

● Host: Intel Xeon(R) E5620, 16 cores, 12 GB RAM, 2 NUMA nodes

● VMs: 2, 4, 6, 8 and 10 of them, 960 MB RAM each. Different experiments with 1, 2 and 4 vCPUs per VM

➔ SPECjbb2005 executed concurrently in all VMs

➔ 3 configurations: all-cpus, auto-pinning, auto-affinity

➔ Experiments repeated 3 times for each configuration

Page 21: NUMA and Virtualization, the case of Xen


Benchmarking Pinning and NUMA Aware Scheduling

1 vCPU per VM: 20% to 16% improvement!

Page 22: NUMA and Virtualization, the case of Xen


Benchmarking Pinning and NUMA Aware Scheduling

2 vCPUs per VM: 17% to 13% improvement!

Page 23: NUMA and Virtualization, the case of Xen


Benchmarking Pinning and NUMA Aware Scheduling

4 vCPUs per VM:

16% to 4% improvement from all-cpus to auto-affinity

0.2% to 8% gap from auto-affinity to auto-pinning

Page 24: NUMA and Virtualization, the case of Xen


Some Traces about NUMA Aware Scheduling

Page 25: NUMA and Virtualization, the case of Xen


Some Traces about NUMA Aware Scheduling

Page 26: NUMA and Virtualization, the case of Xen


NUMA and Virt. and Xen

Talk outline:

● What is NUMA
  – NUMA and Virtualization
  – NUMA and Xen

● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA

Page 27: NUMA and Virtualization, the case of Xen


Lots of Room for Further Improvement!

Tracking ideas, people and progress on http://wiki.xen.org/wiki/Xen_NUMA_Roadmap

● Dynamic memory migration

● IO NUMA

● Guest (or Virtual) NUMA

● Ballooning and memory sharing

● Inter-VM dependencies

● Benchmarking and performance evaluation

Page 28: NUMA and Virtualization, the case of Xen


Dynamic Memory Migration

If VM2 goes away, it would be nice if we could change VM1's (or VM3's) node affinity on-line, and move its memory accordingly!

[Diagram: once VM2 leaves, VM1's affinity and memory could be moved over to the freed node]

Also targeting Xen 4.3
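Changing where a VM is allowed to run is already possible on-line; what is missing is making the memory follow. A hedged example (node 1 assumed to be CPUs 8-15, domain name illustrative):

    # let all of VM1's vCPUs use the freed node's CPUs at runtime;
    # note that the memory does NOT migrate along with them
    xl vcpu-pin vm1 all 8-15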

Page 29: NUMA and Virtualization, the case of Xen


IO NUMA

Different devices can be attached to different nodes: this needs to be considered during placement / scheduling
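For example, on the Linux side the node a PCI device hangs off can be read from sysfs (the device address below is hypothetical; -1 means no NUMA information is available):

    cat /sys/bus/pci/devices/0000:01:00.0/numa_node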

Page 30: NUMA and Virtualization, the case of Xen


Guest NUMA

If a VM is bigger than 1 node, should it know?

[Diagram: a single VM whose vCPUs and memory span two NUMA nodes]

Pros: performance (especially HPC)

Cons: what if things change?

● live migration

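If the guest were given a virtual NUMA topology, a NUMA-aware OS inside it could see and exploit it with the usual tools; the output below is purely illustrative of a 2-node, 8-vCPU guest, not from a real run:

    numactl --hardware
    # available: 2 nodes (0-1)
    # node 0 cpus: 0 1 2 3
    # node 0 size: 2048 MB
    # node 1 cpus: 4 5 6 7
    # node 1 size: 2048 MB
    # node distances:
    # node   0   1
    #   0:  10  20
    #   1:  20  10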

Page 31: NUMA and Virtualization, the case of Xen


Ballooning and Sharing

Ballooning should be NUMA aware

Sharing: should we allow that cross-node?

[Diagram: with ballooning, a VM's new pages can end up on the wrong node; with cross-node page sharing, the shared page ("shmem") is always a remote access for one of the VMs]

Page 32: NUMA and Virtualization, the case of Xen


Inter-VM Dependencies

[Diagram: two VMs either packed on the same node or spread across two nodes]

Are we sure the situation on the right is always better? Might it be workload dependent (VM cooperation vs. competition)?


Page 33: NUMA and Virtualization, the case of Xen


Benchmarking and Performance Evaluation

How to verify we are actually improving:

● What kind of workload(s)?

● What VM configuration(s)?

Page 34: NUMA and Virtualization, the case of Xen


NUMA and Virt. and Xen

Talk outline:

● What is NUMA
  – NUMA and Virtualization
  – NUMA and Xen

● What I have done in Xen about NUMA
● What remains to be done in Xen about NUMA

Page 35: NUMA and Virtualization, the case of Xen


Thanks!

Any Questions?