Xen and the Art of Virtualization

Paul Barham*, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield

*Microsoft Research Cambridge, UK
University of Cambridge Computer Laboratory

19th ACM Symposium on Operating Systems Principles (SOSP '03)



Page 2: Xen  and the Art of Virtualization

Introduction

• Resurgence of interest in VM technology (2003)
  – Modern computers are sufficiently powerful to use virtualization.
• In this paper we present Xen:
  – a high-performance, resource-managed virtual machine monitor (VMM)

Page 3: Xen  and the Art of Virtualization

Problems to solve

• VM isolation:
  – It is not acceptable for the execution of one VM to adversely affect the performance of another.
• Support for different operating systems:
  – To accommodate the heterogeneity of popular applications.
• Performance overhead:
  – The performance overhead introduced by virtualization should be small.

Page 4: Xen  and the Art of Virtualization

XEN: APPROACH & OVERVIEW

• Traditional approach: full virtualization
  – The virtual hardware exposed is functionally identical to the underlying machine.
  – Benefit:
    • Allows unmodified operating systems to be hosted.
  – Drawback:
    • Support for full virtualization was never part of the x86 architectural design.

Page 5: Xen  and the Art of Virtualization

XEN: APPROACH & OVERVIEW (cont'd)

• Improvement:
  – Paravirtualization:
    • Presents a virtual machine abstraction that is similar but not identical to the underlying hardware.
    • Improved performance.
    • Requires modifications to the guest operating system.
      – But no modifications are required to guest applications, because the ABI is left unchanged.
        » ABI: an application binary interface describes the low-level interface between an application (or any type of program) and the operating system or another application.

Page 6: Xen  and the Art of Virtualization

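The Virtual Machine Interface

• Overview of the paravirtualized x86 interface:
  – Memory management
  – CPU
  – Device I/O
• Why x86?
  – x86 represents a worst case for virtualization: for example, its hardware-managed TLB is harder to virtualize efficiently than a software-managed one.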

Page 7: Xen  and the Art of Virtualization

Memory management

• Software-managed TLB vs. hardware-managed TLB
• Two decisions:
  – To ensure safety and isolation:
    • Guest OSes are responsible for allocating and managing the hardware page tables.
    • Minimal involvement from Xen.
  – To avoid a TLB flush when entering and leaving the hypervisor:
    • Xen exists in a 64MB section at the top of every address space.

Page 8: Xen  and the Art of Virtualization

Memory management (cont'd)

• Method:
  1. A guest OS requires a new page table.
     • Ex: a new process is being created.
  2. It allocates and initializes a page from its own memory reservation and registers it with Xen.
  3. The guest OS relinquishes direct write privileges to the page-table memory.
     • All subsequent updates must be validated by Xen.
  – Note:
    • Guest OSes may batch update requests to amortize the overhead of entering the hypervisor.
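The three steps and the batching note can be sketched as a toy model. This is only an illustration, not Xen's actual interface: the class `Hypervisor`, the methods `register_page_table` and `update_batch`, and the frame numbers are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    """Toy stand-in for Xen's page-table validation (hypothetical API)."""
    owned_frames: dict                                # domain id -> frames it owns
    page_tables: dict = field(default_factory=dict)   # (dom, pt_frame) -> {index: frame}
    hypercalls: int = 0

    def register_page_table(self, dom, pt_frame):
        # Step 2: the guest registers a page taken from its own reservation.
        if pt_frame not in self.owned_frames[dom]:
            raise PermissionError("page-table frame is outside the reservation")
        # Step 3: from here on, the guest only reads this page; writes go via Xen.
        self.page_tables[(dom, pt_frame)] = {}

    def update_batch(self, dom, pt_frame, updates):
        """Validate and apply a batch of (index, target_frame) updates."""
        self.hypercalls += 1  # one hypervisor entry for the whole batch
        table = self.page_tables[(dom, pt_frame)]
        # Validate every entry before applying any of them.
        for _index, target in updates:
            if target not in self.owned_frames[dom]:
                raise PermissionError(f"frame {target} not owned by domain {dom}")
        for index, target in updates:
            table[index] = target


xen = Hypervisor(owned_frames={1: {10, 11, 12, 13}, 2: {20, 21}})
xen.register_page_table(dom=1, pt_frame=10)
xen.update_batch(1, 10, [(0, 11), (1, 12), (2, 13)])  # three updates, one entry
```

Batching matters because each entry into the hypervisor has a fixed cost; here three updates are charged as a single call.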

Page 9: Xen  and the Art of Virtualization

CPU

• In order to paravirtualize the CPU, the hypervisor must run at a higher privilege level than the guest OS.
  – Prevents the guest OS from directly executing privileged instructions.
    • Isolation.
    • Ex: the memory management design discussed earlier.
  – On x86, the processor has 4 privilege levels in hardware.
    • The x86 privilege levels are generally described as rings.
      – From ring 0 to ring 3 (0 is the most privileged).
    • Therefore, the hypervisor runs in ring 0, the guest OS in ring 1, and applications in ring 3.
      – Any OS that follows the common arrangement (OS in ring 0, applications in ring 3, rings 1 and 2 unused) can be ported to Xen by modifying it to execute in ring 1.

Page 10: Xen  and the Art of Virtualization

CPU (cont'd)

• Exception handling:
  – Ex: page faults and software exceptions.
  – A table describing the handler for each type of exception is registered with Xen for validation.
    • Overhead.
    • Safety is ensured by validating exception handlers.
      – Validation checks that the handler's code segment does not specify execution in ring 0.
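The validation rule above can be sketched as follows. On x86 the low two bits of a segment selector hold the requested privilege level; everything else here (the function names, the table layout, and the selector and address values) is hypothetical and chosen only for illustration.

```python
def requested_ring(code_selector):
    # On x86, the low two bits of a segment selector are the privilege level.
    return code_selector & 0b11


def register_exception_table(table):
    """Accept a table of {vector: (code_selector, handler_address)}.

    Rejects the whole table if any handler's code segment asks for ring 0,
    which is reserved for the hypervisor in this model.
    """
    for vector, (selector, _address) in table.items():
        if requested_ring(selector) == 0:
            raise PermissionError(f"vector {vector}: handler would run in ring 0")
    return dict(table)


# Hypothetical guest table: page fault (vector 14) and a system-call vector,
# both with a ring-1 code selector (0x19 has low bits 0b01).
good_table = {14: (0x19, 0xC0100000), 0x80: (0x19, 0xC0100400)}
installed = register_exception_table(good_table)

# A selector with low bits 0b00 (ring 0) must be refused.
bad_table = {14: (0x08, 0xC0100000)}
```

Because the check runs once at registration time, it adds no cost on the exception path itself.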

Page 11: Xen  and the Art of Virtualization

CPU (cont'd)

• Exception handling (cont'd):
  – To deal with the overhead:
    • Only two types of exception occur frequently enough to affect system performance:
      – system calls (usually implemented via a software exception)
      – page faults (no fast path: the faulting address in register CR2 can only be read in ring 0)
    • System calls can be registered with a "fast" exception handler.
      – Accessed directly by the processor without indirecting via ring 0.

Page 12: Xen  and the Art of Virtualization

Device I/O

• Fully virtualized environments:
  – Emulate existing hardware devices.
• Paravirtualized:
  – Xen exposes a set of clean and simple device abstractions.
    • Objective: protection and isolation.
  – I/O data is transferred to and from each VM via Xen (described later).
    • In order to perform validation checks.
      – Ex: checking that buffers are contained within a domain's memory reservation.

Page 13: Xen  and the Art of Virtualization

Detailed Design

• Control transfer:
  – Hypercalls and events
• Data transfer:
  – I/O rings
• Subsystem virtualization:
  – CPU scheduling
  – Virtual address translation
  – Network
  – Disk

Page 14: Xen  and the Art of Virtualization

Control Transfer: Hypercalls and Events

• Hypercalls:
  – Synchronous calls from a VM to Xen.
    • Used to perform a privileged operation.
    • Ex: a VM requests a set of page-table updates.
• Events:
  – Notifications are delivered from Xen to a VM using an asynchronous event mechanism.
    • Replaces the usual delivery mechanisms for device interrupts.
    • Ex: indicating that new data has been received over the network.
    • The guest OS may specify an event-callback handler to respond to the notification.

Page 15: Xen  and the Art of Virtualization

Control Transfer: Hypercalls and Events (cont'd)

• Events (cont'd):
  – Pending events:
    • Stored in a per-domain bitmask which is updated by Xen.
  – How does a domain defer event handling?
    • It sets a Xen-readable software flag.
      – This is analogous to disabling interrupts on a real processor.
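The pending bitmask and the deferral flag can be modelled in a few lines. This is a sketch under assumptions: the class `EventChannel` and its methods are hypothetical names, and a real guest would share the bitmask with Xen through memory rather than through Python objects.

```python
class EventChannel:
    """Toy model of per-domain event delivery (hypothetical names)."""

    def __init__(self, callback):
        self.pending = 0        # per-domain bitmask, updated by "Xen"
        self.deferred = False   # software flag set by the guest, read by "Xen"
        self.callback = callback

    def notify(self, event_bit):
        # Xen side: mark the event pending, deliver unless the guest deferred.
        self.pending |= 1 << event_bit
        if not self.deferred:
            self._deliver()

    def set_deferred(self, deferred):
        # Guest side: analogous to disabling or re-enabling interrupts.
        self.deferred = deferred
        if not deferred and self.pending:
            self._deliver()

    def _deliver(self):
        # Hand all pending event numbers to the guest's callback and clear them.
        bits, self.pending = self.pending, 0
        self.callback([b for b in range(bits.bit_length()) if (bits >> b) & 1])


handled = []
chan = EventChannel(callback=handled.append)
chan.notify(3)            # delivered immediately
chan.set_deferred(True)
chan.notify(0)            # accumulates in the bitmask
chan.notify(5)
chan.set_deferred(False)  # both deferred events are delivered together
```

Events raised while the flag is set are not lost; they stay in the bitmask until the guest clears the flag.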

Page 16: Xen  and the Art of Virtualization

Data Transfer: I/O Rings

• Main idea of the data transfer mechanism:
  – Allow data to move vertically through the system with as little overhead as possible.
  – Minimize the work required to demultiplex data to a specific VM when an interrupt is received from a device.

Page 17: Xen  and the Art of Virtualization

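Data Transfer: I/O Rings

[Figure: I/O ring structure, with callouts 1 to 4]

• I/O data buffers are allocated out-of-band by the guest OS.
  – Zero copy: only a pointer to the buffer is transferred and its access permissions are changed; the data itself is not copied.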

Page 18: Xen  and the Art of Virtualization

Data Transfer: I/O Rings

• Order:
  – There is no requirement that requests be processed in order by Xen.
    • The guest OS associates a unique identifier with each request, which is reproduced in the associated response.
    • Reason: Xen may reorder I/O operations due to scheduling or priority considerations.
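A minimal sketch of such a ring, assuming a single shared array of slots with separate producer and consumer indices for requests and responses. The class `IORing`, its method names, and the buffer labels are hypothetical; descriptors carry only a reference to the buffer, never the data.

```python
class IORing:
    """Toy descriptor ring shared by a guest and "Xen" (hypothetical names)."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.req_prod = self.req_cons = 0    # requests: guest produces, Xen consumes
        self.resp_prod = self.resp_cons = 0  # responses: Xen produces, guest consumes

    def put_request(self, req_id, buffer_ref):
        # Guest side. A slot stays in use until its response has been consumed.
        if self.req_prod - self.resp_cons >= self.size:
            raise BufferError("ring full")
        self.slots[self.req_prod % self.size] = (req_id, buffer_ref)
        self.req_prod += 1

    def take_requests(self):
        # Xen side: consume every request queued so far.
        out = []
        while self.req_cons < self.req_prod:
            out.append(self.slots[self.req_cons % self.size])
            self.req_cons += 1
        return out

    def put_response(self, req_id, status):
        # Xen side: responses reuse slots of requests already consumed.
        assert self.resp_prod < self.req_cons, "no consumed request to answer"
        self.slots[self.resp_prod % self.size] = (req_id, status)
        self.resp_prod += 1

    def take_responses(self):
        # Guest side: responses are matched to requests by identifier.
        out = []
        while self.resp_cons < self.resp_prod:
            out.append(self.slots[self.resp_cons % self.size])
            self.resp_cons += 1
        return out


ring = IORing(size=4)
for req_id, buf in [(1, "buf-a"), (2, "buf-b"), (3, "buf-c")]:
    ring.put_request(req_id, buf)
requests = ring.take_requests()
for req_id, _buf in reversed(requests):   # serviced out of order
    ring.put_response(req_id, "ok")
responses = ring.take_responses()
```

The responses come back in a different order from the requests, and the identifier is what lets the guest pair them up again.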

Page 19: Xen  and the Art of Virtualization

CPU scheduling

• Scheduling algorithm:
  – Borrowed Virtual Time (BVT) scheduling algorithm [11]
    • Work-conserving.
    • Has a special mechanism for low-latency wake-up when a VM receives an event.

[11] K. J. Duda and D. R. Cheriton. Borrowed-Virtual-Time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler. In Proceedings of the 17th ACM SIGOPS Symposium on Operating Systems Principles, volume 33(5) of ACM Operating Systems Review, pages 261-276, Kiawah Island Resort, SC, USA, Dec. 1999.

Page 20: Xen  and the Art of Virtualization

BVT scheduling

• Main idea: virtual time.
  – Dispatch the runnable thread with the earliest effective virtual time (EVT).
    • The EVT for thread i is computed as: [equation image missing]
  – The scheduler runs thread i if it has the minimum EVT of all the runnable threads.
• Parameters: [list missing]

Page 21: Xen  and the Art of Virtualization

BVT scheduling

• Context switch decision: [equation image missing]
• Weighted fair sharing: [equation image missing]
  – (Note: k*mcu = thread i running for t ms)
• Parameters:
  – mcu = minimum charging unit
  – C = context switch allowance
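The scheduling rules can be sketched in code. The formulas below are assumptions based on the usual presentation of BVT in [11], not taken from these slides: effective virtual time E_i = A_i - (W_i if warped, else 0), actual virtual time A_i advancing by t / w_i for t units of real time, a switch only once a candidate is ahead by at least C / w_i, and a waking thread being brought forward to the scheduler virtual time. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    weight: float          # w_i: share of the CPU
    avt: float = 0.0       # A_i: actual virtual time
    warp: float = 0.0      # W_i: virtual time warp
    warped: bool = False   # whether the warp is currently applied
    runnable: bool = True

    @property
    def evt(self):
        # Assumed form: E_i = A_i - (W_i if warped else 0)
        return self.avt - (self.warp if self.warped else 0.0)


def pick_next(threads):
    # Dispatch the runnable thread with the earliest effective virtual time.
    return min((t for t in threads if t.runnable), key=lambda t: t.evt)


def charge(thread, real_time):
    # Weighted fair sharing: heavier threads accumulate virtual time more slowly.
    thread.avt += real_time / thread.weight


def should_switch(current, candidate, allowance):
    # Assumed form of the context switch allowance C, scaled by the weight.
    return candidate.evt <= current.evt - allowance / current.weight


def wake(thread, threads):
    # Sleeping adjustment: bring the waking thread forward to SVT.
    svt = min((t.avt for t in threads if t.runnable), default=thread.avt)
    thread.avt = max(thread.avt, svt)
    thread.runnable = True


def simulate(threads, steps, mcu=1.0):
    counts = {t.name: 0 for t in threads}
    for _ in range(steps):
        running = pick_next(threads)
        charge(running, mcu)
        counts[running.name] += 1
    return counts


# Weights 2 and 1 give the same 2:1 ratio as w = 2/3 and w = 1/3.
shares = simulate([Thread("gcc", weight=2), Thread("other", weight=1)], steps=30)

# A warped thread is dispatched first even with a later actual virtual time.
gcc = Thread("gcc", weight=1, avt=80.0)
mpeg = Thread("mpeg", weight=1, avt=100.0, warp=50.0, warped=True)
first = pick_next([gcc, mpeg]).name

sleeper = Thread("sleeper", weight=1, avt=0.0, runnable=False)
wake(sleeper, [gcc, mpeg, sleeper])
```

Without the sleeping adjustment, a thread that slept for a long time would wake with a very small virtual time and monopolize the CPU until it caught up.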

Page 22: Xen  and the Art of Virtualization

BVT scheduling

[Figure: virtual time (E_i) versus real time for two threads with weights w = 2/3 and w = 1/3]

Page 23: Xen  and the Art of Virtualization

BVT scheduling

• Sleeping adjustment:
  – Scheduler virtual time (SVT)
    • Scheduler variable indicating the minimum actual virtual time of any runnable thread.

[Figure: virtual time (E_i) versus real time. gcc sleeps for 15 real-time units; when gcc wakes up, its virtual time is brought forward to SVT.]

Page 24: Xen  and the Art of Virtualization

Low latency dispatch

• Recall: [equation image missing]
  – Can be set directly by a system call.
  – Larger warp values provide lower-latency dispatch than smaller values.
• Another parameter: [symbol missing]

Page 25: Xen  and the Art of Virtualization

Low latency dispatch

[Figure: virtual time (E_i) versus real time. mpeg wakes at t = 5 and t = 15 and executes for 2.5 time units each time.]

• mpeg runs first because it is warped back 50 virtual time units.

Page 26: Xen  and the Art of Virtualization

Low latency dispatch

[Figure: virtual time (E_i) versus real time, annotated at the point where L_i is exceeded]

Page 27: Xen  and the Art of Virtualization

Virtual address translation

• Xen need only be involved in page table updates.
  – This prevents guest OSes from making unacceptable changes.
• Approach:
  – Xen registers guest OS page tables directly with the memory management unit (MMU) and restricts guest OSes to read-only access.
  – Page table updates are passed to Xen via a hypercall.

Page 28: Xen  and the Art of Virtualization

Network

• Each VM has one or more virtual network interfaces (VIFs).
  – VIFs are attached to a virtual firewall-router (VFR).
  – Domain0 is responsible for inserting and removing rules on the VFR.
• A VIF contains:
  – Two I/O rings of buffer descriptors, one for transmit and one for receive.
  – Zero copy:
    • The guest OS exchanges an unused page frame for each packet it receives.
• Fairness:
  – Xen implements a simple round-robin packet scheduler.
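The fairness rule can be illustrated with a small round-robin sketch. The function name `round_robin` and the queue contents are hypothetical; the slides do not give the details of Xen's packet scheduler, so this shows only the general technique.

```python
from collections import deque


def round_robin(queues):
    """Yield (domain, packet), taking one packet per non-empty queue in turn."""
    pending = deque((dom, deque(pkts)) for dom, pkts in queues.items() if pkts)
    while pending:
        dom, queue = pending.popleft()
        yield dom, queue.popleft()
        if queue:
            # Domains with packets left rejoin at the back of the line.
            pending.append((dom, queue))


# Hypothetical transmit queues for two domains.
order = list(round_robin({"dom1": ["p1", "p2", "p3"], "dom2": ["q1"]}))
```

A domain with a long queue cannot starve another one: each pass over the domains sends at most one packet per domain.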

Page 29: Xen  and the Art of Virtualization

Disk

• Only Domain0 has direct unchecked access to physical (IDE and SCSI) disks.
  – VMs access persistent storage through the abstraction of virtual block devices (VBDs).
    • Accessed via the I/O ring mechanism.
  – A translation table is maintained within the hypervisor for each VBD.
    • Maps a VBD identifier and offset to the corresponding sector address and physical device.
  – Xen services batches of requests from competing domains in a simple round-robin fashion.
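The translation step can be sketched as a lookup over a list of extents. This assumes a VBD is described as a sequence of (physical device, start sector, length) extents; the class `VBD`, the table `vbd_table`, and the device names are hypothetical and used only for illustration.

```python
from bisect import bisect_right


class VBD:
    """Toy virtual block device built from extents (hypothetical layout)."""

    def __init__(self, extents):
        # extents: list of (physical_device, start_sector, length_in_sectors)
        self.extents = extents
        self.starts = []          # VBD offset at which each extent begins
        offset = 0
        for _device, _start, length in extents:
            self.starts.append(offset)
            offset += length
        self.size = offset

    def translate(self, offset):
        """Map a VBD offset to (physical_device, sector_address)."""
        if not 0 <= offset < self.size:
            raise ValueError("offset outside the virtual block device")
        i = bisect_right(self.starts, offset) - 1
        device, start, _length = self.extents[i]
        return device, start + (offset - self.starts[i])


# Translation table kept by the "hypervisor": VBD identifier -> VBD.
vbd_table = {7: VBD([("sda", 1000, 100), ("sdb", 0, 50)])}
location = vbd_table[7].translate(120)
```

Because the bounds check happens inside the translation, a guest cannot name a sector outside the extents it was given.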

Page 30: Xen  and the Art of Virtualization

Evaluation Environment

• Hardware:
  – Dell 2650 dual-processor 2.4GHz Xeon server
  – 2GB RAM
  – A Broadcom Tigon 3 Gigabit Ethernet NIC
  – A single Hitachi DK32EJ 146GB 10k RPM SCSI disk
• OS:
  – Linux version 2.4.21

Page 31: Xen  and the Art of Virtualization

Evaluation: Relative Performance

• Compares the performance of a VM with "bare metal".
• Bare metal: Linux installed directly on the physical machine.

Page 32: Xen  and the Art of Virtualization

Evaluation: Concurrent Virtual Machines

Page 33: Xen  and the Art of Virtualization

Conclusion

• This paper presents Xen, an x86 virtual machine monitor that:
  – allows multiple commodity operating systems to share conventional hardware,
  – without sacrificing either performance or functionality,
    • as our experimental results show.
• Ongoing work:
  – Porting BSD and Windows XP kernels to operate over Xen.

Page 34: Xen  and the Art of Virtualization

Comment

• Paravirtualization does deliver good performance.
• However, Domain-0 may become a bottleneck.
  – Much of the work requires Domain-0 to validate or execute it.
• The OS needs modification in order to run in a Xen VM.

Page 35: Xen  and the Art of Virtualization

Introduction of Domain-0

• Domain-0
  – A special privileged domain (VM)
  – Serves as an administrative interface to Xen
  – The first domain launched when the system is booted
• Note:
  – Domain-0 (Dom0) = privileged domain
  – Domain-U (DomU) = unprivileged domain

Page 36: Xen  and the Art of Virtualization

A simple Xen architecture

[Figure: Xen architecture, with three callouts]
1. Direct physical access to all hardware
2. Dom0 exports the simplified generic class devices to each DomU
3. Configuration and monitoring interface

Page 37: Xen  and the Art of Virtualization

TCP workflow example

[Figure: TCP data path through Xen with zero copy, steps 1 to 4]