
Xen-lite for ARM: Adapting Xen for a Samsung Exynos MicroServer with Hybrid FPGA IO Acceleration

Presented by: Julian Chesterfield, Chief Scientific Officer, OnApp Ltd, [email protected]
Contributing work from: Anastassios Nanos, Xenia Ragiadakou, Michail Flouris, [email protected]

Xen Summit, Budapest, July 13th 2017

Setting the scene

• Increasing focus on embedded and integrated System-on-Chip hardware
  • mobile devices
  • autonomous vehicles
  • edge server multi-tenant devices
  • ….
• ARM features prominently in this landscape as mobile-class processors become a more power-efficient alternative
  • significantly lower power, but much smaller resources (RAM, CPU, network)
• Parallel growth of integrated accelerators such as GPU and/or FPGA hardware (co-processors: ARM/Xilinx, Intel/Altera)

OnApp focus on Hyper-Converged Embedded Devices

• Hyper-Converged Infrastructure:
  • Software Defined Compute (hypervisor virtualisation)
  • Software Defined Networking (SDN, OpenFlow etc.)
  • Software Defined Storage (SDS)
• Fast-growing infrastructure orchestration trend in the enterprise datacentre
• SDS: utilising commodity direct-attached storage devices
  • software-controlled distributed block storage for virtual machines
  • software control is extremely advantageous:
    • fast dynamic reconfiguration
    • feature updates
    • no hardware appliance dependency
  • but performance is significantly impacted
=> OnApp focus on merging commodity virtualisation with hardware-accelerated IO

OnApp/Kaleao Server Architecture

• Hardware-accelerated I/O
• Low-power
• Share-nothing
• UNIMEM coherent memory access across compute nodes
• Samsung Galaxy S6 Exynos 7420 chipset

[Board photo: 12x13cm compute node carrying Samsung Exynos 7420 SoCs and IO FPGAs]

Deployment: KALEAO KMAX

COMPUTE UNIT
• 1 big.LITTLE server, 8x ARM 64-bit cores
• 128GB NV-CACHE, 4GB DDR4 at 25 GB/s
• 20 Gb/s IO bandwidth
• 15W peak power

NODE (4x compute units)
• 2x Zynq FPGA SoCs
• 7.68 TB NVMe SSD, storage at 1.9 GB/s (NVMe over Fabric)
• 2x 10Gb Ethernet

BLADE (4x nodes)
• 30.8 TB NVMe
• 2x 40Gb/s uplinks
• embedded 10/40Gb Ethernet switch

3U CHASSIS (12 blades)
• 192x servers, 1,532x cores
• 370 TB NVMe flash
• 48x 40GbE (960 Gb/s) Ethernet (stackable)
• 3KW peak power, external 48V

RACKS (standard 42U)
• 21,504 ARM 64-bit cores
• 10.752 TB LPDDR4, 344 TB NV-Cache
• 5.16 PB of NVMe SSD
• 13,440 Gb/s Ethernet

KALEAO Integrated PCB (Compute Node)

[Diagram: compute units with dedicated PCI lanes, each attached via its own PCI bus to the IO FPGAs; IO mirroring; FPGA-defined network virtualisation and FPGA-defined storage virtualisation]

KALEAO Integrated PCB (Compute Node)

[Same diagram as the previous slide]

Software Defined Hardware!!

Emerging Software Defined Hardware IO Architectures

• KMAX represents a common emerging theme that other integrated SoC servers are moving towards
• Centralised, smart virtualisation of IO resources in hardware
  • hardware mapping of virtualised IO across non-cache-coherent endpoints
  • embedded PCI or fibre fabric
• Facebook 'Yosemite' architecture with Intel Xeon D processors
• NVMe over Fabric

But what about multi-tenancy….?


Multi-tenancy on low power ARM

• Multi-tenant server support is as important on ARM as on Intel architectures, if not more so
  • efficient utilisation of hardware resources
  • application execution and isolation on critical systems (unikernels)
• The ARM CPU architecture is really well suited to Xen!
  • EL0/EL1 hypervisor trap overhead into EL2 is lower than on Intel

but…

The context-switch overhead of handling IO via Dom0 or a stub domain significantly overshadows any Type 1 architecture benefits

["ARM Virtualization: Performance and Architectural Implications", C. Dall et al., ISCA 2016]

Third party comparisons of Xen vs KVM on ARM

• “operations such as accessing registers in the emulated GIC, sending virtual IPIs, and receiving virtual interrupts are much faster on Xen ARM than KVM ARM”


Poor I/O performance on Xen vs KVM

• “a VM performing I/O has to communicate with Dom0 and not just the Xen hypervisor, which means not just trapping to EL2, but also going to EL1 to run Dom0”


Revisiting some IO Architectural Assumptions for low power ARM SoCs


Usual hypervisor architecture


1. Move PV backend support into the VMM layer
2. Implement a unified IO backend in the VMM layer, combining packet-switch and block-translation logic to AoE frames (see the sketch below)
3. Experiment with a realtime service driver domain in EL1 (miniOS + integrated hardware driver) and/or an integrated driver in EL2
4. Integrate xenbus + xenstore into the VMM layer to reduce overhead and speed up device initialisation for integrated backend drivers
5. Lightweight network-based remote management interface
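To make point 2 concrete, here is a minimal sketch of how a guest block read could be translated into an ATA-over-Ethernet frame. The field layout follows the public AoE protocol (EtherType 0x88A2); the blk_req type, MAC addresses and encode_aoe_read() are hypothetical, and this is not the actual MicroVisor backend code.

```c
/*
 * Illustrative sketch only: encapsulating a guest block read request
 * into an ATA-over-Ethernet (AoE) frame.  Layout follows the public
 * AoE protocol (EtherType 0x88A2); struct blk_req and the function
 * name are hypothetical, not the MicroVisor implementation.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>            /* htons(), htonl() */

#define ETH_P_AOE 0x88A2

struct aoe_ata_frame {
    uint8_t  dst[6], src[6];      /* Ethernet header */
    uint16_t ethertype;           /* ETH_P_AOE */
    uint8_t  ver_flags;           /* AoE version (high nibble) + flags */
    uint8_t  error;
    uint16_t major;               /* target shelf address */
    uint8_t  minor;               /* target slot address */
    uint8_t  command;             /* 0 = issue ATA command */
    uint32_t tag;                 /* opaque, echoed back in the reply */
    /* ATA command payload */
    uint8_t  aflags;              /* bit 6 set = LBA48 addressing */
    uint8_t  err_feature;
    uint8_t  sector_count;
    uint8_t  cmd_status;          /* ATA opcode, e.g. 0x24 READ SECTORS EXT */
    uint8_t  lba[6];              /* 48-bit LBA, byte 0 = least significant */
    uint8_t  reserved[2];
} __attribute__((packed));

/* Hypothetical PV block request as seen by the in-VMM backend. */
struct blk_req {
    uint64_t sector;              /* start sector (512-byte units) */
    uint8_t  nr_sectors;          /* transfer length */
    uint32_t id;                  /* correlates request and completion */
};

static void encode_aoe_read(struct aoe_ata_frame *f,
                            const uint8_t dst_mac[6], const uint8_t src_mac[6],
                            uint16_t shelf, uint8_t slot,
                            const struct blk_req *req)
{
    memset(f, 0, sizeof(*f));
    memcpy(f->dst, dst_mac, 6);
    memcpy(f->src, src_mac, 6);
    f->ethertype    = htons(ETH_P_AOE);
    f->ver_flags    = 0x10;             /* AoE version 1, request */
    f->major        = htons(shelf);
    f->minor        = slot;
    f->tag          = htonl(req->id);
    f->aflags       = 0x40;             /* extended (LBA48) addressing */
    f->sector_count = req->nr_sectors;
    f->cmd_status   = 0x24;             /* ATA READ SECTORS EXT */
    for (int i = 0; i < 6; i++)
        f->lba[i] = (uint8_t)(req->sector >> (8 * i));
}

int main(void)
{
    const uint8_t dst[6] = {0x02,0,0,0,0,0x02}, src[6] = {0x02,0,0,0,0,0x01};
    struct blk_req req = { .sector = 2048, .nr_sectors = 8, .id = 1 };
    struct aoe_ata_frame frame;

    encode_aoe_read(&frame, dst, src, /*shelf=*/0, /*slot=*/1, &req);
    printf("AoE frame header bytes: %zu\n", sizeof(frame));
    return 0;
}
```

In the integrated backend described above, this translation would happen directly in EL2 on requests taken from the PV ring, so a block request would leave the node as a single raw ethernet frame without a Dom0 hop.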


Architectural Review


MicroVisor integrated architecture


FPGA Acceleration Integration - Software Defined Hardware


Clustered MicroVisor Management

[Diagram: each MicroVisor node runs a stats collector and virtxd managing CPU pools, physical + virtual networks, virtual block devices and software defined storage, exposed over a REST API; a Go API + data cache acts as the control aggregation point across Rack 0 .. Rack N, serving a management UI and custom OpenStack plugins (NOVA, NEUTRON, CINDER, CEILOMETER, KEYSTONE)]

RackScale Management UI


Example Usecases: NFV Service Function Chaining for the Edge


• Rapid instantiation of virtualised network functions
  • instantiate on demand
  • fast boot and init required, e.g. to respond to realtime TCP connection setup
• Lightweight packet processing in software with hardware passthrough of accelerated NIC functions (a minimal filtering sketch follows below)
  • IP firewalls
  • NAT gateways
  • custom traffic shapers
• FPGA handles SDN overlays and ethernet forwarding logic
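To illustrate the kind of lightweight software packet processing referred to above, the sketch below applies two made-up, stateless filter rules to a raw ethernet frame using the standard Linux header layouts. It is an illustration of the technique, not OnApp's VNF code; in practice it would be called from the forwarding loop on each frame handed over by the (FPGA-accelerated) NIC.

```c
/* Minimal sketch of stateless packet filtering over a raw ethernet
 * frame, illustrating lightweight VNF-style logic.  The filter rules
 * and the function name are illustrative only. */
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <arpa/inet.h>        /* ntohs(), ntohl(), IPPROTO_TCP */
#include <linux/if_ether.h>   /* struct ethhdr, ETH_P_IP */
#include <linux/ip.h>         /* struct iphdr */
#include <linux/tcp.h>        /* struct tcphdr */

/* Return true to forward the frame, false to drop it. */
bool filter_frame(const uint8_t *frame, size_t len)
{
    if (len < sizeof(struct ethhdr))
        return false;
    const struct ethhdr *eth = (const struct ethhdr *)frame;

    /* Pass non-IP traffic (ARP etc.) straight through. */
    if (ntohs(eth->h_proto) != ETH_P_IP)
        return true;

    if (len < sizeof(struct ethhdr) + sizeof(struct iphdr))
        return false;
    const struct iphdr *ip =
        (const struct iphdr *)(frame + sizeof(struct ethhdr));

    /* Example rule: drop anything sourced from 10.0.0.0/8. */
    if ((ntohl(ip->saddr) & 0xff000000u) == 0x0a000000u)
        return false;

    /* Example rule: drop TCP connections to port 23 (telnet). */
    if (ip->protocol == IPPROTO_TCP) {
        size_t tcp_off = sizeof(struct ethhdr) + ip->ihl * 4;
        if (tcp_off + sizeof(struct tcphdr) > len)
            return false;                 /* truncated packet: drop */
        const struct tcphdr *tcp = (const struct tcphdr *)(frame + tcp_off);
        if (ntohs(tcp->dest) == 23)
            return false;
    }
    return true;
}
```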


Superfluidity: Network Function Virtualisation


Superfluidity: Network Virtualisation Overlay Management

Drag-and-drop network function instances


Performance Results


Runtime Memory Footprint


MicroVisor guest boot time (vs stock Xen)

• spawn guests in parallel
• start timer at spawn
• stop timer at first ping from the guest (triggered from the last service in the boot chain)

(A sketch of this measurement harness follows below.)
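The following is a minimal sketch of the measurement method under simplifying assumptions: the guest config path and IP address are placeholders, and this harness probes the guest with ping rather than waiting for the guest-initiated ping used in the actual experiment. It is not the harness behind the published numbers.

```c
/* Minimal boot-time measurement sketch: start a timer when the guest
 * is spawned, stop it at the first successful ping.  Config path and
 * guest IP are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    const char *spawn = "xl create /etc/xen/guest.cfg";             /* placeholder cfg */
    const char *probe = "ping -c 1 -W 1 192.168.1.100 > /dev/null 2>&1"; /* placeholder IP */

    double start = now_seconds();
    if (system(spawn) != 0) {
        fprintf(stderr, "guest spawn failed\n");
        return 1;
    }

    /* Poll until the guest answers its first ping. */
    while (system(probe) != 0)
        ;   /* each probe waits at most 1s (-W 1) */

    printf("boot-to-first-ping: %.3f s\n", now_seconds() - start);
    return 0;
}
```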


VM boot time breakdown

[Chart: time (s) vs number of VMs, MicroVisor (MV) compared against stock Xen]

Intra-node Communication Latency


• X-Gene R1 (8x A57) + SF 7000


Inter-Node Communication Latency (raw ETH)

• Off-the-shelf Intel 10GbE
• 1-way latency
• No TCP/UDP protocols involved: a custom raw ethernet latency tool (a minimal sketch of such a tool follows below)

[Chart: time (us)]
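The latency tool itself is not published in these slides; the following is a minimal sketch of how such a probe sender could look, using a Linux AF_PACKET socket and an experimental EtherType. The interface name, destination MAC and EtherType are placeholders, it requires CAP_NET_RAW, and a reflector or synchronised clocks would still be needed to turn it into a true one-way measurement.

```c
/* Minimal raw-ethernet latency probe sender: stamps a CLOCK_MONOTONIC
 * timestamp into the payload of a frame with an experimental EtherType
 * and sends it via an AF_PACKET socket.  All identifiers below are
 * placeholders, not the tool used for the published results. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <net/if.h>
#include <arpa/inet.h>

#define ETH_P_LATENCY 0x88B5   /* experimental EtherType, placeholder */

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_LATENCY));
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_ll addr = {0};
    addr.sll_family   = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_LATENCY);
    addr.sll_ifindex  = if_nametoindex("eth0");               /* placeholder NIC */
    addr.sll_halen    = ETH_ALEN;
    uint8_t dst[ETH_ALEN] = {0x02,0x00,0x00,0x00,0x00,0x01};  /* placeholder MAC */
    memcpy(addr.sll_addr, dst, ETH_ALEN);

    /* Frame: ethernet header + monotonic send timestamp as payload. */
    uint8_t frame[ETH_ZLEN] = {0};
    struct ethhdr *eth = (struct ethhdr *)frame;
    memcpy(eth->h_dest, dst, ETH_ALEN);
    /* h_source left zeroed; a real tool would fill in the local MAC. */
    eth->h_proto = htons(ETH_P_LATENCY);

    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    memcpy(frame + sizeof(*eth), &ts, sizeof(ts));

    if (sendto(fd, frame, sizeof(frame), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("sendto");

    close(fd);
    return 0;
}
```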


Redis Unikernel (x86)


Summary & Status

• Many-core, integrated low-power SoC designs are becoming much more common across a variety of industries due to cost, power efficiency and performance
• Integrated hardware acceleration technology (FPGA, GPU) features prominently in emerging hardware
• Xen is well suited to ARM-architecture multi-tenant operation but loses significant performance on processing I/O
  • a particular problem for NFV on ARM edge devices
• Moving a minimal set of services into EL2 significantly improves performance:
  • PV backend drivers
  • integrated block and ethernet frame switch logic
  • xenbus/xenstore communication service
• Functional platform with all basic PV backend support and a lightweight remote management interface, implemented for both ARM and Intel platforms
  • integrated FPGA network/storage device drivers
  • PoC for the Intel ixgbe driver
• Xen-lite code changes will be open sourced shortly

Thanks!

More info: [email protected]
https://onapp.com
https://superfluidity.eu

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement no 671566 (SUPERFLUIDITY)