Virtual Machine Mobility with Self-Migration. Jacob Gorm Hansen, Department of Computer Science, University of Copenhagen (now at VMware)


Virtual Machine Mobility

with Self-Migration

Jacob Gorm Hansen

Department of Computer Science,

University of Copenhagen

(now at VMware)


Short Bio

• Studied CS at DIKU in Copenhagen

• Worked for Io Interactive on the first two Hitman games

• Master’s thesis 2002 on “Nomadic Operating Systems”

• Ph.D. thesis 2007 “Virtual Machine Mobility with Self-Migration”

– Early involvement in the Xen VMM project in Cambridge

– Worked on “Tahoma” secure browser at the University of Washington

– Interned at Microsoft Research Cambridge (2004) and Silicon Valley (2006) (security related projects)

• Presently working at VMware on top-secret cool stuff


Talk Overview

• Motivation & Background

• Virtual Machine Migration

– Live Migration in NomadBIOS

– Self-Migration in Xen

• Laundromat Computing

• Virtual Machines on the desktop

• Related & future work + conclusion


Motivation & Background


Motivation

• Researchers and businesses need computing power on-demand

– Science increasingly relies on simulation

– Web 2.0 startups grow quickly (and die just as fast)

• Hardware is cheap, manpower and electricity are not

– Idle machines are expensive

– Immobile jobs reduce utilization

– Fear of untrusted users stealing secrets or access

• We need a dedicated Grid/Utility computing platform:

– Simple configuration & instant provisioning

– Strong isolation of untrusted users

– Backwards compatible with legacy apps (C, Fortran, …)

– Location independence & automated load-balancing

– Pay-for-access without the lawyer part


Our Proposal

• Use virtual machines as containers for untrusted code

• Use live VM migration to make execution transient and location-independent

• Use micro-payments for pay-as-you-go computing


Grid & Utility Computing Vision

Jill creates a web site for sending greeting cards

She finds a Utility to host her application

She pays for access

Her application roams freely, looking for the cheapest and fastest resources


Can’t We Do This With UNIX?

• Configuration complexity & lack of isolation

– Hard to agree on a common software install (BSD, Redhat, Ubuntu?)

– Name-space conflicts, e.g., files in /tmp

– UNIX is designed for sharing, not security

• Mismatch of abstractions

– Process ≠ Application

– User login ≠ Customer

– Quota ≠ Payment

• Location-dependence

– No bullet-proof way of moving a running application to a new host

– Process migration in UNIX just doesn’t work


Use Virtual Machines Instead of Processes

“A virtual machine is […] an efficient, isolated duplicate of the real machine” [Popek & Goldberg, 1974]

“A virtual machine cannot be compromised by the operation of any other VM. It provides a private, secure, and reliable computing environment for its users, …” [Creasy, 1981]

[Figure: three VMs running on a VMM on top of the hardware]


Pros and Cons of VMs

• Pros:

– Strongly isolated

– Name-space is not shared

– More configuration freedom

– Simple interface to hardware

– VMs can migrate between hosts

• Cons:

– Memory and disk footprint of Guest OS

– Less sharing potential

– Extra layer adds I/O overhead

– Not processor-independent

[Figure: three VMs running on a VMM on top of the hardware]


Virtual Machine Migration


Why Process Migration Doesn’t Work

• Because of residual dependencies

• Interface between app and OS not clearly defined

• Part of application state resides in OS kernel

[Figure: diagram labelled "process", "file", "process"]


Virtual Machine Migration is Simpler

• A VM is self-contained

• Interface to virtual hardware is clearly defined

• All dependencies abstracted via fault-resilient network protocols

[Figure: diagram labelled "process", "file", "process", on top of two VMMs]


VMs, VM Migration & Utility Computing

• Utility Computing on Commodity hardware

• Let customers submit their application as VMs

• Minimum-complexity base install

– Stateless nodes are disposable

– Small footprint, no bugs or patches

• Can only provide the basic mechanisms

– Job submission

– Scheduling and preemption (migration)

– Pay-as-you-go accounting

• Essentially, a BIOS for Grid and Utility Computing


Live Migration in NomadBIOS

Joint work with Asger Jensen, 2002


NomadBIOS: Hypervisor on L4

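[Figure: NomadBIOS software stack. Applications (make, xeyes, bash, emacs, vi, gcc) run on two L4Linux guests, above NomadBIOS, the L4 micro-kernel, and the physical hardware; layers are marked "untrusted" and "trusted".]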


NomadBIOS Live Migration

[Figure: two hosts, each running VMs on NomadBIOS, L4, and hardware]

Pre-copy migration + gratuitous ARP = sub-second downtime


Why Migration Downtime Matters

• Upsets users of interactive applications such as games

• May trigger failure detectors in a distributed system


Live Migration Reduces Downtime

• The VM can still be used while it is migrating

• Data is transferred in the background, changes sent later


Multi-Iteration Pre-copy Technique

[Figure: percent of state transferred per pre-copy round. X-axis: iteration (1 to 9); y-axis: 0% to 100%; series: total memory, modified.]
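The multi-iteration pre-copy idea can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the NomadBIOS implementation: the function name, its parameters, and the model (the guest dirties random pages in proportion to how long each round takes) are all made up for illustration.

```python
import random

def precopy_rounds(total_pages, dirty_per_full_round,
                   max_rounds=9, stop_threshold=50, seed=0):
    """Simulate iterative pre-copy.

    Returns (pages sent in each round, pages left for the final stop-and-copy).
    All names and the workload model are hypothetical.
    """
    rng = random.Random(seed)
    to_send = set(range(total_pages))      # round 1 transfers every page
    sent_per_round = []
    for _ in range(max_rounds):
        sent_per_round.append(len(to_send))
        # The guest keeps running while this round is on the wire, so it
        # dirties pages in proportion to the length of the round.
        writes = dirty_per_full_round * len(to_send) // total_pages
        to_send = {rng.randrange(total_pages) for _ in range(writes)}
        if len(to_send) <= stop_threshold:
            break                          # small enough to copy during downtime
    return sent_per_round, len(to_send)

rounds, final = precopy_rounds(total_pages=10_000, dirty_per_full_round=2_000)
print(rounds, final)
```

Each round is shorter than the one before, so fewer pages are modified during it; only the small remainder has to be copied while the VM is paused, which is where the sub-second downtime comes from.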


Migration Downtime

• Two clients connected to a Quake2 server VM, 100 Mbit/s network

• Response time increases by ~50ms when server migrates


Lessons Learned from NomadBIOS

• Migration & TCP/IP resulted in 10-fold code size increase

– Simplicity/functionality tradeoff

• A lot of stuff was still missing:

– Threading

– Encryption & access control

– Disk access

[Figure: two stacks of VMs on a VMM over L4 and hardware; the second has "Migration + TCP/IP" added]


Self-Migration in Xen

Joint work with Cambridge University, 2004-2005


The Promise of Xen

• “Xen” open source VMM announced in late 2003

• Xen 1.0 was

– A lean system with many of the same goals as NomadBIOS

– Optimized for para-virtualized VM hosting

– Very low overhead (~5%)

• Our goal was to port Live Migration from NomadBIOS to Xen

– Xen lacked layers of indirection that L4 had

– Worse: They were removed for a reason

– Nasty control plane “Dom0” VM


Xen Control Plane (Dom0)

[Figure: three VMs and a Control Plane VM running on the VMM]

• Xen uses a “side-car” model, with a trusted control VM

– Has absolute powers

– Adds millions of lines of code to the TCB

• Security-wise, the control VM is the Achilles' Heel


Reduce Complexity with Self-Migration

• VM migration needs:

– TCP/IP for transferring system state

– Page-table access for checkpointing

• A VM is self-paging & has its own TCP/IP stack

• Reduce VMM complexity by performing migration from within the VM

• No need for networking, threading or crypto in the TCB

[Figure: three VMs on a VMM on hardware; each VM contains Paging and TCP/IP, and one also contains Migration]


An Inspiring Example of Self-Migration

von Münchhausen in the swamp


Simple Brute-Force Solution

• Reserve half of memory for a snapshot buffer

• Checkpoint by copying state into snapshot buffer

• Migrate by copying snapshot to destination host

[Figure: memory on the source and destination hosts]


Combination with Pre-copy

[Figure: percent of state transferred per pre-copy round. X-axis: iteration (1 to 9); y-axis: 0% to 100%; series: total memory, modified.]

Combine Pre-copy with Snapshot Buffer
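A snapshot buffer filled by copy-on-write can be sketched as follows. This is a minimal illustration, assuming that only pages written after the snapshot starts need a saved copy (which is why running pre-copy rounds first would let the buffer be much smaller than half of memory). The class and method names are hypothetical and do not come from the Xen self-migration code.

```python
class SnapshotMemory:
    """Guest memory with a snapshot buffer filled lazily by copy-on-write."""

    def __init__(self, pages):
        self.pages = list(pages)   # live memory, still used by the running guest
        self.snapshot = {}         # page index -> contents at snapshot time
        self.snapshotting = False

    def begin_snapshot(self):
        """Start a consistent checkpoint without copying anything yet."""
        self.snapshot.clear()
        self.snapshotting = True

    def write(self, index, value):
        # First write to a page after the snapshot began: save the old
        # contents so the checkpoint stays consistent.
        if self.snapshotting and index not in self.snapshot:
            self.snapshot[index] = self.pages[index]
        self.pages[index] = value

    def checkpoint(self):
        """Return memory as it looked when begin_snapshot() was called."""
        return [self.snapshot.get(i, page) for i, page in enumerate(self.pages)]

mem = SnapshotMemory(["a", "b", "c"])
mem.begin_snapshot()
mem.write(1, "B")          # old value "b" is saved before being overwritten
print(mem.checkpoint())    # consistent view from the start of the snapshot
print(mem.pages)           # live view, including the later write
```

The guest keeps running on `pages` while the migration code sends `checkpoint()` to the destination; the buffer only grows by one page per page touched during the final phase.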


First Iteration


Delta Iteration


Snapshot/Copy-on-Write Phase


Impact of Migration on Foreground Load

[Figure: httperf measurement]


Self-Migration Summary

• Pros:

– Self-Migration is more flexible, under application control

– Self-Migration removes hardcoded and complex features from the trusted install

– Self-Migration can work with direct-IO hardware

• Cons:

– Self-Migration is not transparent, has to be implemented by each OS

– Self-Migration cannot be forced from the outside


Laundromat Computing


Pay-as-you-go Processing

• Laundromats do this already

– Accessible to anyone

– Pre-paid & pay-as-you-go

– Small initial investment

• We propose to manage clusters the same way

– Micro-payment currency

– Pay from first packet

– Automatic garbage collection when payments run out


Token Payments

• Initial payment is enclosed in Boot Token

• Use a simple hash-chain for subsequent payments

– $H^n(s), H^{n-1}(s), \ldots, H(s), s$

• Boot Token signed by trusted broker service

• Broker handles authentication
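A hash-chain payment scheme of this kind can be sketched in a few lines. This is an illustrative sketch only: SHA-256 is an assumption (the slides do not name the hash function), and `make_chain` and `Verifier` are hypothetical names. The anchor value H^n(s) is assumed to arrive inside the signed Boot Token.

```python
import hashlib

def H(data: bytes) -> bytes:
    """One hash step; SHA-256 is an assumption, not taken from the talk."""
    return hashlib.sha256(data).digest()

def make_chain(seed: bytes, n: int) -> list:
    """Return [H^n(s), H^(n-1)(s), ..., H(s), s] for secret seed s."""
    chain = [seed]
    for _ in range(n):
        chain.append(H(chain[-1]))
    chain.reverse()
    return chain

class Verifier:
    """Host-side check: each payment must hash to the previous one."""

    def __init__(self, anchor: bytes):
        self.last = anchor      # H^n(s), taken from the signed Boot Token
        self.accepted = 0

    def accept(self, token: bytes) -> bool:
        if H(token) != self.last:
            return False        # forged, replayed, or out-of-order token
        self.last = token
        self.accepted += 1
        return True

chain = make_chain(b"customer secret", 3)
host = Verifier(chain[0])
for token in chain[1:]:
    print(host.accept(token))
```

The customer releases one preimage per payment period. The host needs to store only the last accepted value, and it cannot compute the next token itself because that would require inverting the hash.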


Injecting a New VM

• Two-stage boot loader handles different incoming formats

– ELF loader for injecting a Linux kernel image

– Checkpoint loader for injecting a migrating VM

• “Evil Man” service decodes Boot Token “magic ping”

• Evil Man is 500 lines of code + network driver
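The format dispatch in a two-stage loader might look like the sketch below. The ELF magic bytes are standard; the checkpoint marker `CKPT` and the function name `select_loader` are hypothetical, since the slides do not describe the checkpoint format.

```python
ELF_MAGIC = b"\x7fELF"        # standard ELF identification bytes
CHECKPOINT_MAGIC = b"CKPT"    # hypothetical marker, not from the talk

def select_loader(image: bytes) -> str:
    """Pick the second-stage loader from the first bytes of the image."""
    if image.startswith(ELF_MAGIC):
        return "elf"           # fresh kernel image to boot
    if image.startswith(CHECKPOINT_MAGIC):
        return "checkpoint"    # migrating VM to resume
    raise ValueError("unrecognized image format")

print(select_loader(b"\x7fELF\x01\x01\x01"))
```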


Laundromat Summary

• Pros:

– Simple and flexible model

– Hundreds instead of millions of lines of code

– Built-in payment system

– Supports self-scaling applications

• Cons:

– Needs direct network access

– Magic ping does not always get through firewalls etc.


Service-Oriented Model


Pull Instead of Push

• In real life, most Grid clusters are hidden behind NATs

– No global IP address for nodes

– No way to connect from the outside

– Usually allowed to initiate a connection from within

• Possible workarounds:

– Run a local broker at each site

– Port-forwarding in the NAT

– Switch to a pull-based model

• Pull model

– Boot VMs over HTTP

– Add HTTP client to trusted software for fetching a work description

– VMs run a web service for cloning and migration


Pull Model


Workload Description


Pulse Notifications

• Periodic polling works, but introduces latency

• What we have essentially is a cache invalidation problem

• Pulse is a simple and secure wide-area cache invalidation protocol

• Clients listen on H(s), publishers release s to invalidate

• We can preserve the pull model, without adding latency
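The "listen on H(s), release s" rule can be sketched as follows. This is a toy illustration, not the Pulse wire protocol: SHA-256, the class name `PulseListener`, and the method `on_pulse` are assumptions made for the example.

```python
import hashlib

def channel_id(secret: bytes) -> str:
    """Public channel name H(s); SHA-256 is an assumption."""
    return hashlib.sha256(secret).hexdigest()

class PulseListener:
    """Client that treats its cached copy as valid until s is revealed."""

    def __init__(self, channel: str):
        self.channel = channel   # H(s), safe to hand out to anyone
        self.stale = False

    def on_pulse(self, value: bytes) -> bool:
        # Only someone who knows s can produce a value hashing to H(s),
        # so a matching pulse proves the publisher asked for invalidation.
        if channel_id(value) == self.channel:
            self.stale = True
        return self.stale

secret = b"publisher secret"
client = PulseListener(channel_id(secret))
print(client.on_pulse(b"forged"))   # ignored
print(client.on_pulse(secret))      # cache invalidated, client re-pulls
```

After a valid pulse the client fetches the new state with an ordinary pull, so no connection ever has to be opened from the outside.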


Virtual Machines on the Desktop


Security Problems on the Desktop

• Web browsers handle sensitive data, such as e-banking logins

• Risk of worms or spy-ware creeping from one site to another

• VMs could provide strong isolation features


The Blink Display System

• VMs have traditionally had only simple 2D graphics

• Modern applications need 3D acceleration

• Cannot sacrifice safety for performance here

• Blink:

– JIT-compiled OpenGL stored procedures

– Flexible, efficient and safe control of the screen

– Blink VMs can be checkpointed and migrated to different graphics hardware


VMs on Desktop Summary

• VMs can have native performance graphics, without sacrificing safety

• Stored procedures more flexible than, e.g., shared memory off-screen buffers

• Introduces a new display model, but still backwards compatible


Concluding Remarks


Related Work

• All commercial VMMs have or will have live migration:

– VMware VMotion

– Citrix/XenSource XenMotion (derived from our work), Sun, Oracle

– Microsoft Hyper-V (planned)

• Huge body of previous process migration work

– Distributed V, Emerald cross-platform object mobility

– MOSIX

– Zap process group migration

• Grid/utility computing projects

– BOINC (SETI@Home) from Berkeley

– PlanetLab

– Shirako from Duke, Amazon EC2, Minimum Intrusion Grid, …

• Security

– L4 and EROS secure display systems

– L4 Nizza architecture


Future Work

• A stateless VMM

– All per-VM state stored sealed in the VM

– Seamless checkpointing and migration

– Cannot DoS the VMM or cause starvation of other VMs

• Migration-aware storage

– Failure-resilient network file system for virtual disks

– Peer-to-peer caching of common contents

• Self-Migration of a native OS, directly on the raw hardware

– Also useful for software-suspend / hibernation


Conclusion & Contributions

• Compared to processes, VMs offer superior functionality

– Control own paging and scheduling

– Provide file systems and virtual memory

– Backwards compatible

– Safe containers for untrusted code

• We have shown:

– How VMs can live-migrate across a network, with sub-second downtimes

– How VMs can self-migrate, without help from the VMM

• Furthermore:

– We have designed and implemented a “Laundromat Computing” system

– Reduced the network control plane from millions to hundreds of lines of code

– Pulse and Blink supporting systems


Questions


VMware is hiring in Aarhus

Thank You

http://www.diku.dk/~jacobg


Dealing with Network Side-effects

• The copy-on-write phase results in a network fork

• “Parent” and “child” overlap and diverge

• Firewall network traffic during final copy phase

• All traffic except migration traffic is silently dropped in the last phase


Re-routing Network Traffic

• Simple techniques

– IP redirection with gratuitous ARP

– MAC address spoofing

• Wide-area:

– IP-in-IP tunnelling
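For the local-network case, a gratuitous ARP is an ordinary ARP packet in which the sender and target IP address are both the migrated VM's address, broadcast so that switches and neighbours update their tables. The sketch below only builds the 42-byte Ethernet frame (request-style announcement); the function name and the example addresses are made up, and actually sending the frame would need a raw socket with suitable privileges, which is not shown.

```python
import socket
import struct

def gratuitous_arp(mac: bytes, ip: str) -> bytes:
    """Build an Ethernet frame announcing that `ip` now lives at `mac`."""
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = struct.pack("!6s6sH", broadcast, mac, 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6, 4,           # hardware / protocol address lengths
        1,              # opcode: request (used here as an announcement)
        mac, ip_bytes,  # sender MAC and IP: the VM on its new host
        b"\x00" * 6,    # target MAC: unknown / ignored
        ip_bytes,       # target IP equals sender IP, making it gratuitous
    )
    return ethernet + arp

frame = gratuitous_arp(bytes.fromhex("020000000001"), "10.0.0.5")
print(len(frame), frame.hex())
```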


Overhead Added by Continuous Migration


Control Models Compared


User-Space Migration Driver