Disco: Running Commodity Operating Systems on Scalable Multiprocessors. Presented by Petar Bujosevic, 05/17/2005. Paper by Edouard Bugnion, Scott Devine, and Mendel Rosenblum.




Introduction

• More scalable systems on the market

• System software trailing hardware

• Development is resource-intensive

• Idea: insert an additional layer of software between OS and HW

• FLASH multiprocessor (ccNUMA)

• Multiple copies of commodity OSes run on top of the layer


Problem Description

• Innovative hardware (scalable shared-memory multiprocessors)

• Requires significant changes to system software to exploit the hardware's advantages

• High cost: large system software requires long development time and the resources of large software companies

• HW vs. SW: “impediment to innovation”

• Challenges: overhead, resource management, sharing/communication


Virtual Machine Monitors

● Run operating systems efficiently on scalable multiprocessor systems

● Insert additional layer of software between HW and OS

● Reduce overhead associated with the layer

● Small implementation effort with no major changes to the OS

● Virtual machines as units of HW fault containment

● Monitor handles all the NUMA-related issues so that UMA OSes do not need to be made aware of non-uniformity

● Challenges

  ● Overhead: due to memory replication in each VM

  ● Resource management: decisions made without high-level knowledge

  ● Communication and sharing: interoperating in a distributed environment

Independent stand-alone systems that simply happened to be sharing the same hardware


Disco Architecture

● Each virtual machine is assigned resources by Disco, which manages a pool of processing elements and memory resources

● Decouple the operating system from the machine hardware

● OS runs on a virtual machine


Disco Implementation

• Disco emulates the MMU and the trap architecture, allowing unmodified applications and OSes to run on the VM

• Frequently used kernel operations can be optimized. For instance, the OS disables interrupts by loading from and storing to special addresses rather than by executing privileged instructions that trap

• All I/O devices are virtualized, including network connections and disks, and all access to them must pass through Disco to be translated or emulated.
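
The load/store optimization in the second bullet can be sketched as follows. This is a minimal illustration, assuming a hypothetical page of virtualized registers shared between guest and monitor; `SharedRegisterPage`, `Monitor`, and `deliver_interrupt` are made-up names, not Disco's interface.

```python
class SharedRegisterPage:
    """Hypothetical page of virtualized privileged registers, mapped into the guest."""
    def __init__(self):
        self.interrupts_enabled = True


class Monitor:
    def __init__(self):
        self.traps = 0      # privileged instructions that had to be emulated
        self.pending = []   # interrupts held back while the guest has them disabled

    def emulate_privileged_instruction(self, page, enable):
        # Slow path: the guest executed a privileged instruction and trapped.
        self.traps += 1
        page.interrupts_enabled = enable

    def deliver_interrupt(self, page, irq):
        # The monitor reads the shared flag only when an interrupt arrives.
        if page.interrupts_enabled:
            return irq
        self.pending.append(irq)
        return None


page, monitor = SharedRegisterPage(), Monitor()
page.interrupts_enabled = False                      # fast path: plain store, no trap
deferred = monitor.deliver_interrupt(page, irq=3)
page.interrupts_enabled = True                       # fast path again
delivered = monitor.deliver_interrupt(page, irq=4)
```

The guest toggles the flag twice without entering the monitor once, which is the saving the slide refers to.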


Disco Implementation

Managing resources

• Virtual CPUs

• Virtual Physical Memory

• Advanced Hardware (NUMA)

• Virtual I/O devices

• Virtual Network interfaces


Virtual CPUs

• Schedules virtual machine/CPU as task

• Sets registers to virtual machine registers and runs the task directly

• Controlled (supervised) access to memory
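
A rough sketch of the scheduling idea in these bullets, assuming simple round-robin time slices and a register file reduced to a program counter. The classes and the `run` stand-in are illustrative assumptions, not Disco's scheduler.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VirtualCPU:
    name: str
    registers: dict = field(default_factory=lambda: {"pc": 0})


class PhysicalCPU:
    def __init__(self):
        self.registers = {}
        self.current = None

    def switch_to(self, vcpu):
        if self.current is not None:
            self.current.registers = dict(self.registers)  # save outgoing state
        self.registers = dict(vcpu.registers)              # load incoming state
        self.current = vcpu

    def run(self, instructions):
        # Stand-in for direct execution on the real processor.
        self.registers["pc"] += instructions


run_queue = deque([VirtualCPU("vm0.cpu0"), VirtualCPU("vm1.cpu0")])
cpu = PhysicalCPU()
for _ in range(4):                  # four round-robin time slices
    vcpu = run_queue.popleft()
    cpu.switch_to(vcpu)
    cpu.run(instructions=10)
    run_queue.append(vcpu)
cpu.switch_to(run_queue[0])         # final switch saves the last slice's state
```

Each virtual CPU resumes exactly where it left off because its registers are saved on every switch.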


Virtual Physical Memory

• Disco maintains a physical-to-machine address mapping.

• Machine addresses are FLASH's 40-bit addresses
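
The mapping can be pictured as a per-VM table from physical page numbers to machine page numbers. The sketch below assumes 4 KiB pages; the `PhysToMachine` class and the page numbers are hypothetical.

```python
PAGE_SHIFT = 12           # assumed 4 KiB pages
MACHINE_ADDR_BITS = 40    # FLASH machine addresses, per the slide


class PhysToMachine:
    def __init__(self):
        self.table = {}   # physical page number -> machine page number

    def map_page(self, ppn, mpn):
        assert mpn < (1 << (MACHINE_ADDR_BITS - PAGE_SHIFT)), "beyond 40-bit space"
        self.table[ppn] = mpn

    def translate(self, phys_addr):
        ppn = phys_addr >> PAGE_SHIFT
        offset = phys_addr & ((1 << PAGE_SHIFT) - 1)
        return (self.table[ppn] << PAGE_SHIFT) | offset


# Two VMs both believe they own physical page 0; each gets a different machine page.
vm0, vm1 = PhysToMachine(), PhysToMachine()
vm0.map_page(ppn=0, mpn=0x1234)
vm1.map_page(ppn=0, mpn=0x9ABCD)
```

Because every VM has its own table, each guest can see a contiguous physical address space starting at zero.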


Virtual Physical Memory

• When a heavyweight OS tries to update the TLB, Disco steps in and applies the physical-to-machine translation. Subsequent memory accesses can then go straight through the TLB

• Each VM has an associated pmap in the monitor

• Each pmap entry also has a back pointer to the virtual address that maps it, which helps invalidate mappings in the TLB
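
A small sketch of how these three bullets fit together, under the assumption that the TLB can be modelled as a dictionary. `PmapEntry`, `emulate_tlb_write`, and `reclaim` are hypothetical names chosen for illustration.

```python
class PmapEntry:
    def __init__(self, mpn):
        self.mpn = mpn       # machine page backing this physical page
        self.vaddr = None    # back pointer: virtual page that maps it, if any


class Monitor:
    def __init__(self, pmap):
        self.pmap = pmap     # physical page number -> PmapEntry (one pmap per VM)
        self.tlb = {}        # hardware TLB stand-in: virtual page -> machine page

    def emulate_tlb_write(self, vpn, ppn):
        # The guest asked for vpn -> ppn; the TLB really receives vpn -> mpn.
        entry = self.pmap[ppn]
        entry.vaddr = vpn
        self.tlb[vpn] = entry.mpn

    def reclaim(self, ppn):
        # Taking a page away: the back pointer locates the stale TLB entry directly.
        entry = self.pmap[ppn]
        if entry.vaddr is not None:
            self.tlb.pop(entry.vaddr, None)
            entry.vaddr = None


monitor = Monitor({7: PmapEntry(mpn=0x400)})
monitor.emulate_tlb_write(vpn=0x10, ppn=7)
mapped = dict(monitor.tlb)
monitor.reclaim(ppn=7)
```

Without the back pointer, `reclaim` would have to scan the whole TLB for entries that point at the page.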


Virtual Physical Memory

• MIPS has a tagged TLB; the tag is called the address space identifier (ASID).

• ASIDs are not virtualized, so the TLB must be flushed on VM context switches

• Second-level software TLB?
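
One way to read the last two bullets together: flushing on every switch makes TLB misses more frequent, and a software cache of recent translations makes each miss cheaper. The sketch below is an assumption-laden model, with the hardware TLB as a dictionary and a hypothetical `guest_lookup` callback standing in for the guest OS miss handler.

```python
class VirtualCPU:
    def __init__(self):
        self.l2_tlb = {}         # software cache: (asid, vpn) -> machine page


class Monitor:
    def __init__(self):
        self.hw_tlb = {}         # hardware TLB stand-in, keyed by (asid, vpn)
        self.current = None
        self.slow_refills = 0    # misses that had to go to the guest OS handler

    def switch_to(self, vcpu):
        if vcpu is not self.current:
            self.hw_tlb.clear()  # guests reuse the same ASIDs, so flush everything
            self.current = vcpu

    def tlb_miss(self, asid, vpn, guest_lookup):
        key = (asid, vpn)
        if key not in self.current.l2_tlb:
            self.slow_refills += 1
            self.current.l2_tlb[key] = guest_lookup(vpn)
        self.hw_tlb[key] = self.current.l2_tlb[key]
        return self.hw_tlb[key]


monitor, a, b = Monitor(), VirtualCPU(), VirtualCPU()
monitor.switch_to(a)
monitor.tlb_miss(asid=1, vpn=5, guest_lookup=lambda vpn: 0x100 + vpn)
monitor.switch_to(b)                       # flushes a's entries
flushed = len(monitor.hw_tlb) == 0
monitor.switch_to(a)
refilled = monitor.tlb_miss(asid=1, vpn=5, guest_lookup=lambda vpn: 0x100 + vpn)
```

The second miss on the same page is served from the software cache, so the guest handler runs only once.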


NUMA Management

• Cache misses are served faster from local memory than from remote memory

• Pages used heavily by a single node are migrated to that node; read-shared pages are replicated to the nodes that frequently access them

• Write-shared pages are not moved, since maintaining consistency requires remote access anyway

• Migration and replication policy is driven by cache miss counting
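
The policy in these bullets can be sketched as a decision function over per-node miss counts. The threshold value and the `placement_decision` function are hypothetical; the slides do not give the actual parameters.

```python
HOT_THRESHOLD = 64   # hypothetical miss count above which a page counts as "hot"


def placement_decision(misses_by_node, written_by_nodes, home_node):
    """Return ("migrate", node), ("replicate", nodes), or ("leave", None)."""
    hot = sorted(n for n, c in misses_by_node.items()
                 if c >= HOT_THRESHOLD and n != home_node)
    if not hot:
        return ("leave", None)            # no remote node misses often enough
    if len(written_by_nodes) > 1:
        return ("leave", None)            # write-shared: remote traffic either way
    sharers = [n for n, c in misses_by_node.items() if c > 0]
    if len(sharers) == 1:
        return ("migrate", hot[0])        # used by one node: move the page there
    if not written_by_nodes:
        return ("replicate", hot)         # read-shared: copy to the hot nodes
    return ("leave", None)


single_user = placement_decision({1: 100}, {1}, home_node=0)
read_shared = placement_decision({1: 100, 2: 80}, set(), home_node=0)
write_shared = placement_decision({1: 100, 2: 80}, {1, 2}, home_node=0)
already_local = placement_decision({0: 100}, set(), home_node=0)
```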


NUMA Management

• memmap has an entry for each machine page, recording which virtual machines and virtual pages reference it. It is used during TLB shootdown
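
A minimal sketch of a memmap-driven shootdown, assuming per-VM TLBs modelled as dictionaries. The `Monitor` class and its method names are illustrative only.

```python
from collections import defaultdict


class Monitor:
    def __init__(self):
        self.memmap = defaultdict(set)   # machine page -> {(vm, virtual page)}
        self.tlbs = defaultdict(dict)    # vm -> {virtual page: machine page}

    def map(self, vm, vpn, mpn):
        self.tlbs[vm][vpn] = mpn
        self.memmap[mpn].add((vm, vpn))

    def shootdown(self, mpn):
        # Invalidate every translation that still points at this machine page,
        # for example before migrating it to another node.
        for vm, vpn in self.memmap.pop(mpn, set()):
            self.tlbs[vm].pop(vpn, None)


monitor = Monitor()
monitor.map("vm0", 0x10, 0x400)
monitor.map("vm1", 0x22, 0x400)
monitor.map("vm1", 0x23, 0x401)
monitor.shootdown(0x400)
```

Only the two translations that referenced machine page `0x400` are removed; the unrelated mapping in `vm1` survives.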


Virtual I/O Devices

• All device accesses are intercepted by the monitor

• Disk reads for data already in machine memory can be serviced by the monitor; if the request size is a multiple of the machine page size, the monitor only has to remap machine pages into the VM's physical memory address space

• Remapped pages are read-only and will generate a copy-on-write fault if written to
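
The remap-and-copy-on-write path can be sketched as below. It assumes a 4 KiB page, single-page requests, and a hypothetical `block_cache` keyed by disk sector; none of these names come from Disco.

```python
PAGE_SIZE = 4096   # assumed machine page size


class Monitor:
    def __init__(self):
        self.block_cache = {}   # disk sector -> machine page already holding it
        self.next_mpn = 0x100
        self.disk_reads = 0

    def _alloc(self):
        self.next_mpn += 1
        return self.next_mpn

    def disk_read(self, vm_pmap, ppn, sector, size):
        if size % PAGE_SIZE == 0 and sector in self.block_cache:
            mpn = self.block_cache[sector]     # remap: no copy, no disk access
        else:
            # Simplification: every other request goes to the disk.
            self.disk_reads += 1
            mpn = self.block_cache[sector] = self._alloc()
        vm_pmap[ppn] = {"mpn": mpn, "writable": False}

    def write_fault(self, vm_pmap, ppn):
        # Copy-on-write: the writer gets a private page, the shared one is untouched.
        vm_pmap[ppn] = {"mpn": self._alloc(), "writable": True}


monitor, vm0, vm1 = Monitor(), {}, {}
monitor.disk_read(vm0, ppn=1, sector=80, size=PAGE_SIZE)
monitor.disk_read(vm1, ppn=9, sector=80, size=PAGE_SIZE)
shared = vm0[1]["mpn"] == vm1[9]["mpn"]
monitor.write_fault(vm1, ppn=9)
```

Two virtual machines reading the same sector end up sharing one machine page until one of them writes.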


Virtual Network Interface

• Communication between virtual machines by accessing data in shared cache

• Avoid duplication of data

• Use sharing whenever possible

• Affects data locality

Transparent Sharing of Pages over NFS


IRIX, HAL changes

• Minor changes to kernel code and data segment (unique to MIPS architecture)

• Disco uses original device drivers

• Added code to the HAL to pass hints to the monitor about physical memory use

• Request zeroed page, unused memory reclamation

• Change in mbuf freelist data structure

• Call to bcopy replaced by a remap function in the HAL


SPLASHOS

• Thin OS, supported directly by Disco (no need for virtual memory subsystem)

• Used for parallel scientific applications


Experiments

• Setup and Workloads

• Execution Overheads

• Memory Overheads

• Scalability

• Dynamic Page Migration and Replication


Related Work

• System Software for Scalable Shared Memory Machines

• Virtual Machine Monitors

• Other System Software Structuring Techniques

• ccNUMA Memory Management


Conclusion

• Developing system software for scalable shared-memory multiprocessors without huge development effort

• Adding a software layer between commodity OSes and raw HW

• Disco resolves problems of traditional virtual machines

• Global buffer cache transparently shared across all virtual machines

• Low / modest overhead

• Scalability and reliability

• Low implementation cost


Deficiencies

• Hardware failure analysis

• Larger vs. smaller number of processors

• Virtual Physical Memory on architectures other than MIPS


References

• Disco: Running Commodity Operating Systems on Scalable Multiprocessors, by Edouard Bugnion, Scott Devine, and Mendel Rosenblum, 1997

• Modern Operating Systems, Second Edition, Andrew S. Tanenbaum, 2001

• http://www-flash.stanford.edu/Disco

• http://www.cs.pdx.edu/~walpole/class/cs533/slides/151.ppt, Jeremy Greenwald, 2005

• http://www.core.org.cn/OcwWeb/Electrical-Engineering-and-Computer-Science/6-828Fall2003/LectureNotes/detail/virtual_machines-.htm

• http://www.cs.wisc.edu/~dusseau/Classes/CS736/CS736-S02/ReadingQuestions/Disco.html

• http://www.cs.northwestern.edu/~fabianb/classes/cs-443-s05/Disco.pps

• http://www.cs.washington.edu/sosp16/

• http://www.cs.berkeley.edu/~zf/cs262a/summary34.htm

• http://en.wikipedia.org/wiki/Microkernel