
Page 1: Live Migration of Direct-Access Devices

Live Migration of Direct-Access Devices

Asim Kadav and Michael M. Swift

University of Wisconsin - Madison

Page 2: Live Migration of Direct-Access Devices

Live Migration

• Migrating a VM across hosts without noticeable downtime

• Uses of Live Migration

– Reducing energy consumption by hardware consolidation

– Performing non-disruptive hardware maintenance

• Relies on hypervisor mediation to maintain connections to I/O devices

– Shared storage

– Virtual NICs


Page 3: Live Migration of Direct-Access Devices

Live Migration with virtual I/O

[Figure: a guest OS with a virtual driver runs on the source host's hypervisor + OS and hardware device; the VMM abstracts the hardware, so device state is available to the hypervisor and the VM can move to the destination host.]

Page 4: Live Migration of Direct-Access Devices

Access to I/O Devices

[Figure: virtual I/O architecture. The guest OS runs a virtual driver; the hypervisor + OS beneath it drives the physical hardware device.]

Virtual I/O
• Guest accesses virtual devices
• Drivers run outside the guest OS
• Hypervisor mediates all I/O

Properties: moderate performance; device independence; device sharing; enables migration.

Page 5: Live Migration of Direct-Access Devices

Direct access to I/O Devices

[Figure: direct I/O (pass-through) architecture. The real driver runs in the guest OS and accesses the physical hardware device directly, bypassing the hypervisor.]

Direct I/O
• Drivers run in the guest OS
• Guest directly accesses the device

Properties: near-native performance; no migration.

Page 6: Live Migration of Direct-Access Devices

Live Migration with Direct I/O

[Figure: source and destination hosts, each with a hardware device and a hypervisor + OS; the guest OS runs the real driver. On migration the device state is lost, and heterogeneous devices are not supported.]

Page 7: Live Migration of Direct-Access Devices

Live migration with Direct I/O

• Why not both performance and migration?

– Hypervisor unaware of device state

– Heterogeneous devices/drivers at source and destination

• Existing solutions:

– Detach device interface and perform migration [Xen 3.3]

– Detach device and divert traffic to virtual I/O [Zhai OLS 08]

– Modify driver and device [Varley (Intel) ISSOO3 08]


Page 8: Live Migration of Direct-Access Devices

Overview

• Problem
  – Direct I/O provides native throughput
  – Live migration with direct I/O is broken

• Solution
  – Shadow drivers in the guest OS capture device/driver state
  – Transparently re-attach the driver after migration

• Benefits
  – Requires no modifications to the driver or the device
  – Supports migration between different devices
  – Minimal performance overhead


Page 9: Live Migration of Direct-Access Devices

Outline

• Introduction

• Architecture

• Implementation

• Evaluation

• Conclusions


Page 10: Live Migration of Direct-Access Devices

Architecture

• Goals for Live Migration

– Low performance cost when not migrating

– Minimal downtime during migration

– No migration-specific activity required in the guest before migration

• Our Solution

– Introduce agent in guest OS to manage migration

– Leverage shadow drivers as the agent [Swift OSDI04]


Page 11: Live Migration of Direct-Access Devices

Shadow Drivers

• Kernel agent that monitors the state of the driver

• Recovers from driver failures

• Driver independent

• One implementation per device type

[Figure: the shadow driver sits in the guest OS kernel and interposes on communication between the kernel and the device driver through taps.]
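
To make the tap idea concrete, here is a minimal sketch of how a tap might wrap one driver entry point. This is illustrative, not the paper's actual code: all shadow_* names are hypothetical, and it uses the modern net_device_ops names for readability (the prototype's 2.6.18 kernel kept these function pointers directly on struct net_device).

#include <linux/netdevice.h>
#include <linux/if_ether.h>

struct shadow_log;                       /* hypothetical; see the log sketch later */
enum sh_op { SH_SET_MAC /* , ... */ };   /* hypothetical operation tags */
void shadow_log_record(struct shadow_log *log, enum sh_op op,
                       const void *arg, size_t len);

/* Hypothetical shadow state attached to each shadowed device. */
struct shadow_net {
    const struct net_device_ops *real_ops; /* the real driver's entry points */
    struct shadow_log *log;                /* log of state-changing calls */
    bool recovering;                       /* true while the driver is detached */
};
struct shadow_net *shadow_of(struct net_device *dev); /* hypothetical lookup */

/* Tap around the driver's set-MAC entry point: record the new setting,
 * then forward to the real driver (or proxy success during recovery). */
static int shadow_set_mac(struct net_device *dev, void *addr)
{
    struct shadow_net *sh = shadow_of(dev);

    shadow_log_record(sh->log, SH_SET_MAC, addr, ETH_ALEN); /* remember the MAC */
    if (sh->recovering)
        return 0;                 /* proxy: report success while detached */
    return sh->real_ops->ndo_set_mac_address(dev, addr);
}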

Page 12: Live Migration of Direct-Access Devices

Shadow Driver Operation

[Figure: in the guest OS kernel, taps route communication between the kernel and the device driver through the shadow driver, which maintains a log and an object tracker.]

• Normal Operation

- Intercept Calls

- Track shared objects

- Log state changing operations

• Recovery

- Proxy to kernel

- Release old objects

- Restart driver

- Replay log

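
A sketch of what the object tracker might look like: shared kernel objects handed to the driver are recorded so that recovery can release the old driver's instances before restarting. The structure and names here are hypothetical, for illustration only.

#include <linux/list.h>
#include <linux/slab.h>

/* One entry per kernel object currently shared with the driver
 * (e.g. sk_buffs in flight, registered timers, DMA mappings). */
struct tracked_obj {
    struct list_head link;
    void *obj;                    /* the shared kernel object */
    void (*release)(void *obj);   /* how to give it back during recovery */
};

static LIST_HEAD(tracked);        /* all objects the driver currently holds */

/* During recovery, release everything the old driver still held, so the
 * freshly started driver begins from a clean slate. */
static void tracker_release_all(void)
{
    struct tracked_obj *t, *tmp;

    list_for_each_entry_safe(t, tmp, &tracked, link) {
        t->release(t->obj);       /* return the object to the kernel */
        list_del(&t->link);
        kfree(t);
    }
}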

Page 13: Live Migration of Direct-Access Devices

Shadow Drivers for Migration

• Pre-migration
  – Record driver/device state in a driver-independent way
  – Shadow driver logs state-changing operations: configuration requests, outstanding packets

• Post-migration (recovery sequence sketched below)
  – Unload old driver
  – Start new driver
  – Replay log to configure the driver
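
A sketch of that post-migration path, reusing the hypothetical shadow_net from the earlier tap sketch: the shadow unloads the source's driver, starts the destination's, then replays the logged configuration so the new driver reaches the state applications expect. Every helper and the shadow_log_for_each iterator are hypothetical.

/* Hypothetical recovery sequence run by the shadow driver after the VM
 * resumes on the destination host. */
static int shadow_recover(struct shadow_net *sh, struct net_device *dev)
{
    struct shadow_log_entry *e;

    sh->recovering = true;            /* taps now proxy instead of forward */
    unload_driver(sh->old_driver);    /* release the source device's driver */
    load_driver(sh->new_driver, dev); /* probe the destination's device */

    shadow_log_for_each(sh->log, e) { /* replay state-changing operations */
        switch (e->op) {
        case SH_SET_MAC:
            sh->real_ops->ndo_set_mac_address(dev, e->arg);
            break;
        case SH_ADD_MCAST:            /* re-register a logged multicast address */
            add_mcast(dev, e->arg);
            break;
        }
    }
    sh->recovering = false;           /* resume forwarding to the real driver */
    return 0;
}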

Page 14: Live Migration of Direct-Access Devices

Shadow Drivers for Migration

• Transparency

– Taps route all I/O requests to the shadow driver

– The shadow driver can maintain the illusion that the device is up

• State Preservation

• State Preservation (see the sketch below)

– Store only the current state

– No history of changes is maintained

– Log size depends only on the driver's current state
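
One plausible way to get these properties, as a sketch: keep one slot per configuration item and overwrite it on each change, so the log holds only the current state and its size is bounded by the driver's state rather than by how often the state changed. All names are hypothetical.

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>

enum sh_op { SH_SET_MAC, SH_SET_MTU, SH_PROMISC, SH_OP_MAX };

/* One slot per setting: recording a change overwrites the previous value,
 * so there is never a history to replay, only the latest configuration. */
struct shadow_log {
    bool valid[SH_OP_MAX];              /* has this setting been made? */
    unsigned char value[SH_OP_MAX][8];  /* latest value only; no history */
};

static void shadow_log_record(struct shadow_log *log, enum sh_op op,
                              const void *arg, size_t len)
{
    log->valid[op] = true;
    memcpy(log->value[op], arg,
           min(len, sizeof(log->value[op])));  /* overwrite the old value */
}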

Page 15: Live Migration of Direct-Access Devices

Migration with shadow drivers

[Figure: the guest OS kernel contains the shadow driver with its taps, log, and object tracker. On the source hypervisor + OS, the shadow fronts the source network driver and card; after migration to the destination hypervisor + OS, it swaps in the destination network driver and card.]

Page 16: Live Migration of Direct-Access Devices

Outline

• Introduction

• Overview

• Architecture

• Implementation

• Evaluation

• Conclusions


Page 17: Live Migration of Direct-Access Devices

Implementation

• Prototype Implementation

– VMM: Xen 3.2 hypervisor

– Guest VM based on the linux-2.6.18.8-xen kernel

• New code: Shadow Driver implementation inside guest OS

• Changed code: Xen hypervisor to enable migration

• Unchanged code: Device drivers


Page 18: Live Migration of Direct-Access Devices

Changes to Xen Hypervisor

• Migration code modified to:

– Allow migration of PCI devices

– Unmap I/O memory mapped at the source

– Detach the virtual PCI bus just before VM suspension at the source and reconnect it at the destination

– Migrate between different devices (sequence sketched below)

[Figure: the Xen hypervisor exposes a virtual PCI bus to the guest VM.]
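
The order of these steps, as a sketch in pseudocode-style C. Every helper here is a hypothetical stand-in for Xen internals, not an actual Xen function; this only illustrates where the new operations fall in the migration path.

struct vm;
struct host;

/* Hypothetical outline of the modified migration path for a VM with a
 * pass-through PCI device. Helpers stand in for Xen internals. */
int migrate_vm_with_passthrough(struct vm *vm, struct host *dst)
{
    unmap_io_memory(vm);           /* undo I/O memory mapped at the source */
    detach_virtual_pci_bus(vm);    /* just before suspending the VM */
    copy_and_suspend(vm, dst);     /* the usual iterative pre-copy migration */
    reattach_virtual_pci_bus(vm);  /* at the destination; the physical device
                                    * behind the bus may be a different model */
    return resume_vm(vm, dst);
}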

Page 19: Live Migration of Direct-Access Devices

Modifications to the Guest OS

• Ported shadow drivers to the 2.6.18.8-xen kernel
  – Taps
  – Object tracker
  – Log

• Implemented a shadow driver for network devices (transmit tap sketched below)
  – Proxy by temporarily disabling the device
  – Log ioctl calls and multicast addresses
  – Recovery code
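
To illustrate "proxy by temporarily disabling the device": a sketch of how the transmit tap might behave while the device is detached, reusing the hypothetical shadow_net and shadow_of() from the earlier tap sketch (modern API names for readability).

#include <linux/netdevice.h>

/* While recovering, tell the network stack to back off rather than fail;
 * once the new driver is up, traffic flows through to it unchanged. */
static netdev_tx_t shadow_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct shadow_net *sh = shadow_of(dev);  /* hypothetical lookup */

    if (sh->recovering) {
        netif_stop_queue(dev);   /* pause the stack's transmit queue */
        return NETDEV_TX_BUSY;   /* the packet will be retried later */
    }
    return sh->real_ops->ndo_start_xmit(skb, dev);
}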

Page 20: Live Migration of Direct-Access Devices

Outline

• Introduction

• Architecture

• Implementation

• Evaluation

• Conclusions


Page 21: Live Migration of Direct-Access Devices

Evaluation

1. Cost when not migrating

2. Latency of Migration


Page 22: Live Migration of Direct-Access Devices

Evaluation Platform

• Host machines
  – 2.2 GHz AMD machines
  – 1 GB memory

• Direct-access devices
  – Intel Pro/1000 gigabit Ethernet NIC
  – NVIDIA MCP55 Pro gigabit NIC

• Tests
  – netperf on a local network
  – Migration with no applications inside the VM
  – Liveness tests from a third physical host

Page 23: Live Migration of Direct-Access Devices

Throughput

Throughput in Mbits/second:

                   Intel Pro/1000    NVIDIA MCP55
  Virtual I/O           698              706.2
  Direct I/O            773              940.5
  Direct I/O (SD)       769              937.7

Direct I/O is up to 33% better than virtual I/O (NVIDIA MCP55).

Direct I/O with the shadow driver: throughput within 1% of direct I/O alone.

Page 24: Live Migration of Direct-Access Devices

CPU Utilization

CPU utilization in the guest VM:

                   Intel Pro/1000    NVIDIA MCP55
  Virtual I/O          13.80%           17.70%
  Direct I/O            2.80%            8.03%
  Direct I/O (SD)       4.00%            9.16%

Direct I/O with the shadow driver: guest CPU utilization within roughly one percentage point of direct I/O alone.

Page 25: Live Migration of Direct-Access Devices

Migration Time

[Figure: timeline of migration events, with time in seconds: PCI unplug, VM migration, PCI replug, shadow recovery, driver uptime, ARP sent.]

Total downtime = 3.97 seconds

A significant share of the migration latency (56%) is driver uptime, i.e. waiting for the new driver to initialize.
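
The final "ARP sent" event announces the VM's new location so switches and peers update their forwarding tables. A sketch of how such an announcement could be issued from the guest: arp_send() is the kernel's real function, but the surrounding wrapper is hypothetical.

#include <net/arp.h>
#include <linux/inetdevice.h>

/* Broadcast a gratuitous ARP so the network learns that the VM's MAC/IP
 * now lives behind the destination host's switch port. */
static void shadow_announce(struct net_device *dev)
{
    __be32 ip = inet_select_addr(dev, 0, RT_SCOPE_LINK); /* our address */

    arp_send(ARPOP_REQUEST, ETH_P_ARP,
             ip, dev, ip,           /* target and sender IP are both ours */
             NULL,                  /* NULL destination HW = broadcast */
             dev->dev_addr, NULL);  /* our MAC as the sender */
}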

Page 26: Live Migration of Direct-Access Devices

Conclusions

• Shadow drivers serve as an in-guest agent for live migration of VMs that access devices directly

• Supports heterogeneous devices

• Requires no driver or hardware changes

• Minimal performance overhead and latency during migration

• Portable to other devices, OSes, and hypervisors


Page 27: Live Migration of Direct-Access Devices

Questions


Contact: {kadav, swift}@cs.wisc.edu

More details:

http://cs.wisc.edu/~swift/drivers/

http://cs.wisc.edu/~kadav/

Page 28: Live Migration of Direct-Access Devices

Backup Slides


Page 29: Live Migration of Direct-Access Devices

Complexity of Implementation

• ~19,000 lines of code

• The bulk of this (~70%) is wrappers around functions.

– Can be automatically generated by scripts


Page 30: Live Migration of Direct-Access Devices

Migration Time

[Backup copy of the migration timeline from Page 25: events are PCI unplug, VM migration, PCI replug, shadow recovery, driver uptime, and ARP sent; total downtime = 3.97 seconds; 56% of the migration latency is driver uptime.]

Page 31: Live Migration of Direct-Access Devices

Migration with shadow drivers

[Backup copy of the migration diagram from Page 15: the guest OS kernel's shadow driver, with its taps, log, and object tracker, swaps the source network driver and card on the source hypervisor + OS for the destination network driver and card on the destination hypervisor + OS.]