36
ARM Virtualization: Performance and Architectural Implications Christoffer Dall, Shih-Wei Li, Jin Tack Lim, Jason Nieh, and Georgios Koloventzos

ISCA Presentation.key

  • Upload
    vothuy

  • View
    225

  • Download
    2

Embed Size (px)

Citation preview

ARM Virtualization: Performance and Architectural

Implications

Christoffer Dall, Shih-Wei Li, Jin Tack Lim, Jason Nieh, and Georgios Koloventzos

ARM Servers ARM Network Equipment

Virtualization

Virtualization

Hardware

Kernel

App App

Native

Hardware

Hypervisor

Virtual MachinesVM

Kernel

App App

VM

Kernel

App App

App

ARM Hardware Virtualization Support

Virtualization Extensions

x86

VM Exit

Root (Hypervisor) Non-Root (VM)

Save/Restore state to VMCS

ARM Virtualization Extensions

Kernel

User

Hypervisor

EL0

EL1

EL2

Trap

EL2

• Controlled by EL2 system registers

• Limited to support hypervisors, not OS kernels

_EL1 sysregs

User

_EL2 sysregs

EL0

EL1

EL2

1. Clear hierarchy from user to kernel to hypervisor

2. Reduced complexity

ARM Virtualization Extensions Design Choices

Kernel

User

Hypervisor

EL0

EL1

EL2

ARM Virtualization Performance ?

Measurement Study

• Micro-benchmarks: low-level hypervisor operations

• Macro-benchmarks: application workloads

Hardware SetupARM Hardware

• HP Moonshot m400

• 64-bit ARMv8-A

• 2.4 GHz APM Atlas CPU

• 8-way SMP

• 64 GB RAM (capped at 16 GB)

• 10 GB Ethernet

x86 Hardware

• Dell PowerEdge r320

• 64-bit x86_x64

• 2.1 GHz Intel Xeon ES-2450

• 8-way SMP

• 16 GB RAM

• 10 GB Ethernet

Software Setup

VM-to-Hypervisor Transitions

Hypervisor

VM

Kernel

App App

Interrupts

I/OMemory

Scheduling

No-Op Hypercall

Hypervisor

VM

Kernel

App App

Hypercall Return

App

Micro Results

1: ARM can be either much faster or slower than x86

CPU Clock Cycles ARM x86KVM Xen KVM Xen

Hypercall 6,500 376 1,300 1,228

VM Exit

Root (Hypervisor)

Non-Root (VM)

x86 ARM

Kernel

User

Hypervisor

EL0

EL1

EL2

Trap

Micro Results

1: ARM can be either much faster or slower than x86

-> x86 VM Exit more complicated than ARM Trap

2: KVM is much slower than Xen on ARM

CPU Clock Cycles ARM x86KVM Xen KVM Xen

Hypercall 6,500 376 1,300 1,228

Hypervisor Design

Hardware

Hypervisor

VM

Kernel

App App

VM

Kernel

App App

Type 1 (Bare-Metal)

Hardware

OS Kernel

VM

Kernel

App App

VM

Kernel

App App

Type 2 (Hosted)

Hypervisor

App

EL0

EL1

Xen ARM: Bare-MetalVM

Kernel

AppApp

VM

Kernel

AppApp

EL2 Xen

HVC Ret

EL0

EL1

EL2

KVM/ARM: HostedHost

Linux

AppApp

VM

Kernel

AppApp

KVM

KVM lowvisor

*ASPLOS 2014: KVM/ARM: The Design and Implementation of the Linux ARM Hypervisor

Software World Switchof all CPU state

EL0

EL1

EL2

KVM/ARM: HostedHost

Linux

AppApp

VM

Kernel

AppApp

KVM

KVM lowvisor

*ASPLOS 2014: KVM/ARM: The Design and Implementation of the Linux ARM Hypervisor

1. HVCswitch state

2. Ret3. HVC 4. Ret

Micro ResultsCPU Clock Cycles ARM x86

KVM Xen KVM XenHypercall 6,500 376 1,300 1,228

1: ARM can be either much faster or slower than x86

-> x86 VM Exit more complicated than ARM Trap

2: KVM is much slower than Xen on ARM

-> ARM architecture not designed for Type 2

Application WorkloadsApplication Description

Kernbench Kernel compile

Hackbench Scheduler stress

SPECjvm2008 Java workload

Netperf Network performance

Apache Web server stress

Memcached Key-Value store

MySQL Database workload

Application Performance

0.00

0.50

1.00

1.50

2.00

KernbenchHackbench

SpecJVM2008TCP_RR

TCP_STREAM

TCP_MAERTSApache

MemcachedMySQL

KVM ARM Xen ARM KVM x86 Xen x86

Normalized overhead (lower is better)

2.062.33

2.75 3.56 3.90

Virtualized I/O

Hardware

Hypervisor

VM

Kernel

App App

VM

Kernel

App App

I/O Trap

KVM I/O Model

Hardware

Linux

VM

Kernel

App App

VM

Kernel

App App

KVM

App

Xen Device Drivers

Hardware

Xen

VM: Dom0

Linux Kernel

VM

Kernel

App App App

Driver Driver

Xen I/O Model

Hardware

Xen

Dom0

Kernel

App App

VM

Kernel

App App

VM

Kernel

App App

VM Switch Trap

Copy Data VM-to-VM communication

• Trap is slow

• But all you do is a trap

• Trap is fast

• But you do much more…

KVM ARM I/O Xen ARM I/O

Architecture Improvements

VHEVirtualization Host Extension

KVM

EL0

EL1

EL2

Host

Linux

AppApp

VM

Kernel

AppApp

KVM

KVM lowvisorWorld Switch

~ 6,000 cycles fora hypercall!

KVM + VHE

EL0

EL1

EL2

Host

Linux

AppApp

VM

Kernel

AppApp

KVMTrap

VHE

1. Expand EL2 to support all EL1 features

2. EL1 register accesses to go to EL2

VHE

• Available in ARMv8.1

• No (public) hardware yet

Conclusions• Micro operations: ARM can be faster than x86

• Not achievable for Type 2 hypervisors

• Type 1 is dominated by other I/O costs

• ARM overhead is comparable to x86

• ARMv8.1 adds VHE for hosted hypervisors

• The software matters!