46
1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented by Divya Parekh

1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

Embed Size (px)

Citation preview

Page 1: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

1

Disco: Running Commodity Operating Systems on Scalable Multiprocessors

Edouard Bugnion, Scott Devine, Mendel Rosenblum,Stanford University, 1997

Presented by Divya Parekh

Page 2: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

2

Outline

Virtualization Disco description Disco performance Discussion

Page 3: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

3

Virtualization “a technique for hiding the physical

characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. This includes making a single physical resource appear to function as multiple logical resources; or it can include making multiple physical resources appear as a single logical resource”

Page 4: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

4

Old idea from the 1960s IBM VM/370 – A VMM for IBM mainframe

Multiple OS environments on expensive hardware Desirable when few machine around

Popular research idea in 1960s and 1970s Entire conferences on virtual machine monitors Hardware/VMM/OS designed together

Interest died out in the 1980s and 1990s Hardware got more cheaper Operating systems got more powerful (e.g. multi-user)

Page 5: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

5

A Return to Virtual Machines Disco: Stanford research project (SOSP ’97)

Run commodity OSes on scalable multiprocessors Focus on high-end: NUMA, MIPS, IRIX

Commercial virtual machines for x86 architecture VMware Workstation (now EMC) (1999-) Connectix VirtualPC (now Microsoft)

Research virtual machines for x86 architecture Xen (SOSP ’03) plex86

OS-level virtualization FreeBSD Jails, User-mode-linux, UMLinux

Page 6: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

6

Overview Virtual Machine

A fully protected and isolated copy of the underlying physical machine’s hardware. (definition by IBM)”

Virtual Machine Monitor A thin layer of software that's between the hardware

and the Operating system, virtualizing and managing all hardware resources.

Also known as “Hypervisor”

Page 7: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

7

Classification of Virtual Machines

Page 8: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

8

Classification of Virtual Machines Type I

VMM is implemented directly on the physical hardware.

VMM performs the scheduling and allocation of the system’s resources.

IBM VM/370, Disco, VMware’s ESX Server, Xen

Type II VMMs are built completely on top of a host OS. The host OS provides resource allocation and

standard execution environment to each “guest OS.” User-mode Linux (UML), UMLinux

Page 9: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

9

Non-Virtualizable Architectures According to Popek and Goldberg,

” an architecture is virtualizable if the set of sensitive instructions is a subset of the set of privileged instructions.”

x86 Several instructions can read system state

in register CPL 3 without trapping MIPS

KSEG0 bypasses TLB, reads physical memory directly

Page 10: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

10

Type I contd..

Hardware Support for Virtualization

Figure: The hardware support approach to x86 VirtualizationE.g. Intel Vanderpool/VT and AMD-V/SVM

Page 11: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

11

Type I contd..

Full Virtualization

Figure : The binary translation approach to x86 VirtualizationE.g. VMware ESX server

Page 12: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

12

Type I contd..

Paravirtualization

Figure: The Paravirtualization approach to x86 Virtualization E.g. Xen

Page 13: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

13

Type II Hosted VM Architecture

E.g. VMware Workstation, Connectix VirtualPC

Page 14: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

14

Disco : VMM Prototype

Goals Extend modern OS to run efficiently on

shared memory multiprocessors without large changes to the OS.

A VMM built to run multiple copies of Silicon Graphics IRIX operating system on a Stanford Flash shared memory multiprocessor.

Page 15: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

15

Problem Description Multiprocessor in the market (1990s)

Innovative Hardware Hardware faster than System Software

Customized OS are late, incompatible, and possibly bug

Commodity OS not suited for multiprocessors Do not scale cause of lock contention,

memory architecture Do not isolate/contain faults

More Processors More failures

Page 16: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

16

Solution to the problems Resource-intensive Modification of OS (hard

and time consuming, increase in size, etc) Make a Virtual Machine Monitor (software)

between OS and Hardware to resolve the problem

Page 17: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

17

Two opposite Way for System Software Address these challenges in the operating

system: OS-Intensive Hive , Hurricane, Cellular-IRIX, etc innovative, single system image But large effort.

Hard-partition machine into independent failure units: OS-light Sun Enterprise10000 machine Partial single system image Cannot dynamically adapt the partitioning

Page 18: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

18

Return to Virtual Machine Monitors One Compromise Way between OS-intensive &

OS-light – VMM Virtual machine monitors, in combination with

commodity and specialized operating systems, form a flexible system software solution for these machines

Disco was introduced to allow trading off between the costs of performance and development cost.

Page 19: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

19

Architecture of Disco

Page 20: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

20

Advantages of this approach Scalability Flexibility Hide NUMA effect Fault Containment Compatibility with legacy applications

Page 21: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

21

Challenges Facing Virtual Machines Overheads

Trap and emulate privileged instructions of guest OS

Access to I/O devices Replication of memory in each VM

Resource Management Lack of information to make good policy

decisions Communication and Sharing

Stand alone VM’s cannot communicate

Page 22: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

22

Disco’s Interface Processors

MIPS R10000 processor Emulates all instructions, the MMU, trap architecture Extension to support common processor operations

Enabling/disabling interrupts, accessing privileged registers Physical memory

Contiguous, starting at address 0 I/O devices

Virtualize devices like I/O, disks, n/w interface exclusive to VM

Physical devices multiplexed by Disco Special abstractions for SCSI disks and network interfaces

Virtual disks for VMs Virtual subnet across all virtual machines

Page 23: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

23

Disco Implementation Multi threaded shared memory program Attention to NUMA memory placement, cache

aware data structures and IPC patterns Code segment of DISCO copied to each flash

processor – data locality Communicate using shared memory

Page 24: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

24

Virtual CPUs Direct Execution

execution of virtual CPU on real CPU Sets the real machine’s registers to the virtual CPU’s Jumps to the current PC of the virtual CPU, Direct

execution on the real CPU Challenges

Detection and fast emulation of operations that cannot be safely exported to the virtual machine privileged instructions such as TLB modification and Direct access to physical memory and I/O devices.

Maintains data structure for each virtual CPU for trap emulation

Scheduler multiplexes virtual CPU on real processor

Page 25: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

25

Virtual Physical Memory Address translation & maintains a physical-to-

machine address (40 bit) mapping. Virtual machines use physical addresses Software reloaded translation-lookaside buffer

(TLB) of the MIPS processor Maintains pmap data structure for each VM –

contains one entry for each physical to virtual mapping

pmap also has a back pointer to its virtual address to help invalidate mappings in the TLB

Page 26: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

26

Contd.. Kernel mode references on MIPS processors

access memory and I/O directly - need to re-link OS code and data to a mapped address space

MIPS tags each TLB entry with Address space identifiers (ASID)

ASIDs are not virtualized - TLB need to be flushed on VM context switches

Increased TLB misses in workloads Additional Operating system references VM context switches

TLB misses expensive - create 2nd level software - TLB . Idea similar to cache?

Page 27: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

27

NUMA Memory management Cache misses should be satisfied from local

memory (fast) rather than remote memory (slow)

Dynamic Page Migration and Replication Pages frequently accessed by one node are migrated Read-shared pages are replicated among all nodes Write-shared are not moved, since maintaining

consistency requires remote access anyway Migration and replacement policy is driven by cache-

miss-counting facility provided by the FLASH hardware

Page 28: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

28

Transparent Page Replication

1. Two different virtual processors of the same virtual machine logically read-share the same physical page, but each virtual processor accesses a local copy.2. memmap tracks which virtual page references each physical page. Used during TLB shootdown

Page 29: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

29

Disco Memory Management

Page 30: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

30

Virtual I/O Devices Disco intercepts all device accesses from the

virtual machine and forwards them to the physical devices

Special device drivers are added to the guest OS

Disco device provide monitor call interface to pass all the arguments in single trap

Single VM accessing a device does not require virtualizing the I/O – only needs to assure exclusivity

Page 31: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

31

Copy-on-write Disks Intercept DMA requests to translate the physical

addresses into machine addresses. Maps machine page as read only to destination

address page of DMA Sharing machine memory Attempts to modify a shared page will result in a

copy-on-write fault handled internally by the monitor.

Logs are maintained for each VM Modification Modification made in main memory

Non-persistent disks are copy on write shared E.g. Kernel text and buffer cache E.g. File systems root disks

Page 32: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

32

Transparent Sharing of Pages

Creates a global buffer cache shared across VM's and reduces memory foot print of the system

Page 33: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

33

Virtual Network Interface Virtual subnet and network interface use copy

on write mapping to share the read only pages Persistent disks can be accessed using

standard system protocol NFS Provides a global buffer cache that is

transparently shared by independent VMs

Page 34: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

34

Transparent sharing of pages over NFS

1. The monitor’s networking device remaps the data page from the source’s machine address space to the destination’s.2. The monitor remaps the data page from the driver’s mbuf to the clients buffer cache.

Page 35: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

35

Modifications to the IRIX 5.3 OS Minor changes to kernel code and data

segment – specific to MIPS Relocate the unmapped segment of the

virtual machine into the mapped supervisor segment of the processor– Kernel relocation

Disco drivers are same as original device drivers of IRIX

Patched HAL to use memory loads/stores instead of privileged instructions

Page 36: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

36

Modifications to the IRIX 5.3 OS Added code to HAL to pass hints to

monitor for resource management New Monitor calls to MMU to request

zeroed page, unused memory reclamation

Changed mbuf management to be page-aligned

Changed bcopy to use remap (with copy-on-write)

Page 37: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

37

SPLASHOS: A specialized OS

Thin specialized library OS, supported directly by Disco

No need for virtual memory subsystem since they share address space

Used for the parallel scientific applications that can span the entire machine

Page 38: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

38

Disco: Performance

Experimental Setup Disco targets the FLASH machine not

available that time Used SimOS, a machine simulator

that models the hardware of MIPS-based multiprocessors for the Disco monitor.

Simulator was too slow to allow long work loads to be studied

Page 39: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

39

Disco: Performance

Workloads

Page 40: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

40

Disco: Performance

Execution Overhead

Pmake overhead due to I/O virtualization, others due to TLB mappingReduction of kernel timeOn average virtualization overhead of 3% to 16%

Page 41: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

41

Disco: Performance

Memory Overheads

V: Pmake memory used if there is no sharingM: Pmake memory used if there is sharing

Page 42: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

42

Disco: Performance

Scalability

Partitioning of problem into different VM’s increases scalability.

Kernel synchronization time becomes smaller.

Page 43: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

43

Disco: Performance

Dynamic Page Migration and replication

Page 44: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

44

Conclusion

Disco VMM hides NUMA-ness from non-NUMA aware OS

Disco VMM is low(er) effort Moderate overhead due to

virtualization

Page 45: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

45

Discussion Was Disco- VMM done rightly?

Virtual Physical Memory on architectures other than MIPS

MIPS TLB is software managed Not sure of how well other OS perform on

Disco since IRIX was designed for MIPS Not sure how HIVE, Hurricane performs

comparatively Performance of long workloads on the

system Performance of heterogeneous VMs e.g.

Pmake case

Page 46: 1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997 Presented

46

Discussion

Are VMM Microkernels done right?