Duke Systems
Intro to Clouds
Jeff Chase
Dept. of Computer Science
Duke University
Virtual machines, Part 1
The story so far: OS platforms
• OS platforms let us run programs in contexts.
• Contexts are protected/isolated to varying degrees.
• The OS platform TCB offers APIs to create and manipulate protected contexts.
– It enforces isolation of contexts for running programs.
– It governs access to hardware resources.
• Classical example:
– Unix context: process
– Unix TCB: kernel
– Unix kernel API: syscalls
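To make the classical example concrete, here is a minimal sketch, assuming a Unix-like system. Python's `os` functions are thin wrappers over the kernel's syscalls; the helper name `run_child` is only illustrative.

```python
import os

def run_child(exit_code: int) -> int:
    """Fork a child process, let it exit, and return its exit code."""
    pid = os.fork()                  # syscall: create a new process context
    if pid == 0:
        # Child: an isolated context with its own copy of the address space.
        os._exit(exit_code)          # syscall: terminate this process at once
    _, status = os.waitpid(pid, 0)   # syscall: parent waits for the child
    return os.WEXITSTATUS(status)    # decode the exit code from the status

if __name__ == "__main__":
    print("child exited with code", run_child(7))
```

The parent and child only interact through the kernel API (here, the exit status), which is the isolation the slide describes.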
The story so far: layered platforms
• We can layer “new” platforms on “old” ones.
– The outer layer hides the inner layer,
– covering the inner APIs and abstractions, and
– replacing them with the model of the new platform.
• Example: Android over Linux
[Figure: Android (AMS, JVM+lib) layered over Linux]
Native virtual machines (VMs)
• Slide a hypervisor underneath the kernel.
– New OS/TCB layer: virtual machine monitor (VMM).
• Kernel and processes run in a virtual machine (VM).
– The VM “looks the same” to the OS as a physical machine.
– The VM is a sandboxed/isolated context for an entire OS.
• A VMM can run multiple VMs on a shared computer.
[Figure: a host hypervisor/VMM running three guest (tenant) VM contexts, guest VM1, guest VM2, and guest VM3, each with its own OS kernel (1, 2, 3) and processes (P1A, P2B, P3C)]
What is a “program” for a VM?
VMM/hypervisor is a new layer of OS platform, with a new kind of protected context. What is a program?
What kind of program do we launch into a VM context?
[Figure: apps running on a guest kernel over the hypervisor/VMM]
It’s called a virtual appliance or VM image.
A VM is called an instance of the image.
A virtual appliance contains a complete OS system image, with file tree and apps.
[Graphics are from rPath inc. and VMware inc.]
Thank you, VMware
When virtual is better than real
Motivation: support multiple OS
When virtual is better than real: everyone plays nicely together
[image from virtualbox.org]
The story so far: protected CPU mode
[Figure: timeline of control transfers between user mode and kernel mode (kernel “top half” and kernel “bottom half” (interrupt handlers)). Labeled events: syscall trap, fault, clock interrupt, interrupt return, u-start, u-return]
Kernel handler manipulates CPU register context to return to selected user context.
Any kind of machine exception transfers control to a registered (trusted) kernel handler running in a protected CPU mode.
A closer look
[Figure: the same timeline in more detail. Labeled events: boot, syscall trap, fault, clock interrupt, interrupt return, u-start, u-return. Also shown: the user stack and kernel stack in use around each event, and the handler dispatch table]
IA/x86 Protection Rings (CPL)
• Modern CPUs have multiple protected modes.
• History: IA/x86 rings (CPL)
– x86 has four built-in privilege levels (Rings 0, 1, 2, 3)
– Ring 0 – “Kernel mode” (most privileged)
– Ring 3 – “User mode” (least privileged)
• Unix uses only two modes:
– user – untrusted execution
– kernel – trusted execution
[Figure: Rings 0 to 3 drawn as concentric rings, with CPU privilege level (CPL) increasing toward Ring 0. Fischbach]
Protection Rings
• New Intel VT and AMD SVM CPUs introduce new protected modes for VMM hypervisors.
• We can think of it as a new inner ring: one ring to bind them all.
• Warning: this is an oversimplification: the actual architecture is more complex for backward compatibility.
[Figure: nested rings labeled hypervisor, kernel, user, redrawn as hypervisor, guest, user]
Protection Rings
• Computer scientists have drawn these rings since the 1960s.
• They represent layering: the outer ring “hides” the interface of the inner ring.
• The machine defines the events (exceptions) that transition to higher privilege (inner ring).
• Inner rings register handlers to intercept selected events.
• But the picture is misleading….
[Figure: the same ring diagram, Rings 0 to 3, with privilege increasing toward Ring 0. Fischbach]
Protection Rings
• We might just as well draw it “inside out”.
• Now the ring represents power: what the code at that ring can access or modify.
• Bigger rings have more power.
• Inclusion: bigger rings can see or do anything that the smaller rings can do.
• And they can manipulate the state of the rings they contain.
• But still misleading: there are multiple ‘instances’ of the weaker rings.
[Figure: rings drawn inside out, with hypervisor as the biggest ring, containing guest, containing user]
Maybe a better picture…
There are multiple ‘instances’ of the weaker rings.
And powers are nested: an outer ring limits the “sandbox” or scope of the rings it contains.
Post-note
• The remaining slides in the section are just more slides to reinforce these concepts.
• We didn’t see them in class.
• There is more detail in the reading…
[Figure: a CPU core with registers R0 through Rn, a program counter (PC), and a mode field]
CPU mode (a field in some status register) indicates whether a machine CPU (core) is running in a user program or in the protected kernel.
Some instructions or register accesses are legal only when the CPU (core) is executing in kernel mode.
CPU mode transitions to kernel mode only on machine exception events (trap, fault, interrupt), which transfer control to a handler registered by the kernel with the machine at boot time.
So only the kernel program chooses what code ever runs in the kernel mode (or so we hope and intend).
A kernel handler can read the user register values at the time of the event, and modify them arbitrarily before (optionally) returning to user mode.
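The rules above can be sketched as a toy simulation. This is only an illustrative model, not how any real CPU is programmed: the class `ToyCPU`, the event names, and the helper `boot` are all hypothetical, and the sketch always returns to user mode after a handler runs.

```python
USER, KERNEL = "user", "kernel"

class ToyCPU:
    """Toy model of a core with a mode field and a handler table."""

    def __init__(self):
        self.mode = KERNEL            # the machine boots in kernel mode
        self.regs = {"R0": 0, "PC": 0}
        self.handlers = {}            # event name -> kernel handler
        self.log = []

    def set_handler(self, event, handler):
        """Privileged operation: legal only in kernel mode, otherwise it faults."""
        if self.mode != KERNEL:
            self.exception("fault")
            return
        self.handlers[event] = handler

    def exception(self, event):
        """Trap, fault, or interrupt: enter kernel mode and run the handler."""
        self.mode = KERNEL
        self.handlers[event](self)    # handler may read/modify user registers
        self.mode = USER              # this sketch always returns to user mode

def boot():
    """Kernel registers its handlers at boot time, then drops to user mode."""
    cpu = ToyCPU()

    def on_trap(c):
        c.regs["R0"] *= 2             # kernel rewrites a user register value

    def on_fault(c):
        c.log.append("fault")

    cpu.set_handler("trap", on_trap)
    cpu.set_handler("fault", on_fault)
    cpu.mode = USER
    return cpu
```

After `boot()`, user code cannot replace a handler: calling `set_handler` in user mode only triggers the kernel's own fault handler, so the kernel alone chooses what runs in kernel mode.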
Kernel Mode
U/K
Exceptions: trap, fault, interrupt
• Synchronous (caused by an instruction)
– Intentional (happens every time): trap: system call (open, close, read, write, fork, exec, exit, wait, kill, etc.)
– Unintentional (contributing factors): fault (invalid or protected address or opcode, page fault, overflow, etc.)
• Asynchronous (caused by some other event)
– Intentional: “software interrupt” (software requests an interrupt to be delivered at a later time)
– Unintentional: interrupt (caused by an external event: I/O op completed, clock tick, power fail, etc.)
Kernel Stacks and Trap/Fault Handling
[Figure: process address spaces, each with data and a user stack, plus a per-process kernel stack in kernel space and the syscall dispatch table]
Processes execute user code on a user stack in the user portion of the process virtual address space.
Each process has a second kernel stack in kernel space (virtual memory accessible only to the kernel).
System calls and faults run in kernel mode on the process kernel stack.
Kernel code running in P’s process context (i.e., on its kstack) has access to P’s virtual memory.
The syscall handler makes an indirect call through the system call dispatch table to the handler registered for the specific system call.
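A minimal sketch of that indirect call, as a toy model in Python. The syscall numbers, the `proc` dictionary, and the function names are hypothetical and do not correspond to any real kernel.

```python
SYS_GETPID, SYS_WRITE = 0, 1          # hypothetical syscall numbers

def sys_getpid(proc):
    return proc["pid"]

def sys_write(proc, text):
    proc["out"].append(text)          # stand-in for writing to a device
    return len(text)

# The dispatch table maps a syscall number to its registered handler.
SYSCALL_TABLE = {SYS_GETPID: sys_getpid, SYS_WRITE: sys_write}

def syscall_trap(proc, number, *args):
    """Generic trap handler: look up the number, call the handler indirectly."""
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return -1                     # unknown syscall number: error code
    return handler(proc, *args)
```

The user program only supplies a number and arguments; it never chooses the address of the kernel code that runs.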
More on VMs
Recent CPUs support additional protected mode(s) for hypervisors. When the hypervisor initializes, it selects some set of event types to intercept, and registers handlers for them.
Selected machine events occurring in user mode or kernel mode transfer control to a hypervisor handler. For example, a guest OS kernel accessing device registers may cause the physical machine to invoke the hypervisor to intervene.
In addition, the VM architecture has another level of indirection in the MMU page tables: the hypervisor can specify and restrict what parts of physical memory are visible to each guest VM.
A guest VM kernel can map to or address a physical memory frame or command device DMA I/O to/from a physical frame if and only if the hypervisor permits it.
If any guest VM tries to do anything weird, then the hypervisor regains control and can see or do anything to any part of the physical or virtual machine state before (optionally) restarting the guest VM.
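The intercept and memory-permission rules above can be sketched as a toy model. Everything here is hypothetical (the class `ToyHypervisor`, the event names, the frame numbers); real hypervisors configure these rules in hardware structures rather than Python sets.

```python
class ToyHypervisor:
    """Toy model: intercepts chosen events and gates guest access to frames."""

    def __init__(self, intercepted_events):
        self.intercepted = set(intercepted_events)  # chosen at initialization
        self.frames = {}                            # VM name -> permitted frames
        self.exits = []                             # log of (vm, reason)

    def grant(self, vm, frames):
        """Hypervisor permits a guest VM to use these physical frames."""
        self.frames.setdefault(vm, set()).update(frames)

    def guest_event(self, vm, event):
        """Return who handles the event: hypervisor if intercepted, else guest."""
        if event in self.intercepted:
            self.exits.append((vm, event))
            return "hypervisor"
        return "guest"

    def guest_map(self, vm, frame):
        """A guest may map a frame if and only if the hypervisor permits it."""
        if frame in self.frames.get(vm, set()):
            return True
        self.exits.append((vm, "bad-frame:%d" % frame))
        return False
```

For example, with `hv = ToyHypervisor({"device-register-access"})` and `hv.grant("vm1", {0, 1})`, an ordinary syscall stays in the guest, while a device register access or an attempt to map frame 7 hands control to the hypervisor.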
If you are interested…
2.1 The Intel VT-x Extension
In order to improve virtualization performance and simplify VMM implementation, Intel has developed VT-x [37], a virtualization extension to the x86 ISA. AMD also provides a similar extension with a different hardware interface called SVM [3].
The simplest method of adapting hardware to support virtualization is to introduce a mechanism for trapping each instruction that accesses privileged state so that emulation can be performed by a VMM. VT-x embraces a more sophisticated approach, inspired by IBM’s interpretive execution architecture [31], where as many instructions as possible, including most that access privileged state, are executed directly in hardware without any intervention from the VMM. This is possible because hardware maintains a “shadow copy” of privileged state. The motivation for this approach is to increase performance, as traps can be a significant source of overhead.
VT-x adopts a design where the CPU is split into two operating modes: VMX root and VMX non-root mode. VMX root mode is generally used to run the VMM and does not change CPU behavior, except to enable access to new instructions for managing VT-x. VMX non-root mode, on the other hand, restricts CPU behavior and is intended for running virtualized guest OSes.
Transitions between VMX modes are managed by hardware. When the VMM executes the VMLAUNCH or VMRESUME instruction, hardware performs a VM entry; placing the CPU in VMX non-root mode and executing the guest. Then, when action is required from the VMM, hardware performs a VM exit, placing the CPU back in VMX root mode and jumping to a VMM entry point. Hardware automatically saves and restores most architectural state during both types of transitions. This is accomplished by using buffers in a memory resident data structure called the VM control structure (VMCS).
In addition to storing architectural state, the VMCS contains a myriad of configuration parameters that allow the VMM to control execution and specify which type of events should generate VM exits. This gives the VMM considerable flexibility in determining which hardware is exposed to the guest. For example, a VMM could configure the VMCS so that the HLT instruction causes a VM exit or it could allow the guest to halt the CPU. However, some hardware interfaces, such as the interrupt descriptor table (IDT) and privilege modes, are exposed implicitly in VMX non-root mode and never generate VM exits when accessed. Moreover, a guest can manually request a VM exit by using the VMCALL instruction.
Virtual memory is perhaps the most difficult hardware feature for a VMM to expose safely. A straw man solution would be to configure the VMCS so that the guest has access to the page table root register, %CR3. However, this would place complete trust in the guest because it would be possible for it to configure the page table to access any physical memory address, including memory that belongs to the VMM. Fortunately, VT-x includes a dedicated hardware mechanism, called the extended page table (EPT), that can enforce memory isolation on guests with direct access to virtual memory. It works by applying a second, underlying, layer of address translation that can only be configured by the VMM. AMD’s SVM includes a similar mechanism to the EPT, referred to as a nested page table (NPT).
From Dune: Safe User-level Access to Privileged CPU Features, Belay et al. (Stanford), OSDI, October 2012
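The second layer of address translation described in the excerpt can be illustrated with a toy two-stage lookup. This is a sketch under simplifying assumptions (single-level tables modeled as dictionaries, a 4096-byte page size, hypothetical function names), not the actual EPT or NPT format.

```python
PAGE = 4096  # assumed page size in bytes

def translate(addr, table):
    """Translate an address through one single-level page table (a dict)."""
    page, offset = divmod(addr, PAGE)
    if page not in table:
        raise LookupError("no mapping for page %d" % page)
    return table[page] * PAGE + offset

def guest_access(guest_virtual, guest_page_table, extended_page_table):
    """Stage 1 is controlled by the guest; stage 2 only by the VMM."""
    guest_physical = translate(guest_virtual, guest_page_table)
    return translate(guest_physical, extended_page_table)
```

Even if the guest writes an arbitrary entry into its own page table, the access still has to pass the second table, so a guest-physical page the VMM never mapped raises `LookupError` here instead of reaching host memory.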
VT in a Nutshell
• New VM mode bit
– Orthogonal to kernel/user mode or rings (CPL)
• If VM mode is off
– Machine looks just like it always did
• If VM bit is on
– Machine is running a guest VM
– “VMX non-root operation”
– Various events cause gated entry into hypervisor
– “virtualization intercept”
– Hypervisor can control which events cause intercepts
– Hypervisor can examine/manipulate guest VM state
Services, Part 2
There is another motivation for VMs and hypervisors. Application services and computational jobs need access to computing power “on tap”. Virtualization allows the owner of a server to “slice and dice” server resources and allocate the virtual slices out to customers as VMs. The customers can install and manage their own software their own way in their own VMs. That is cloud hosting.
Services
RPC
GET (HTTP)
End-to-end application delivery
Cloud and Software-as-a-Service (SaaS)
Rapid evolution, no user upgrade, no user data management.
Agile/elastic deployment on virtual infrastructure.
Where is your application?
Where is your data?
Where is your OS?
Networking
[Figure: node A and node B, each with an endpoint (port), joined by a connection (channel, binding)]
Some IPC mechanisms allow communication across a network.
E.g.: sockets using Internet communication protocols (TCP/IP).
Each endpoint on a node (host) has a port number.
Each node has one or more interfaces, each on at most one network.
Each interface may be reachable on its network by one or more names.
E.g. an IP address and an (optional) DNS name.
Operations: advertise (bind), listen, connect (bind), close, write/send, read/receive
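The socket operations listed above can be sketched with Python's standard `socket` module over the loopback interface. The function names `echo_once` and `demo` and the uppercase-echo behavior are only illustrative choices.

```python
import socket
import threading

def echo_once(server: socket.socket) -> None:
    """Accept one connection, read a message, reply in uppercase, close."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())      # read/receive, write/send

def demo() -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))              # advertise (bind); port 0 lets the OS pick
        server.listen(1)                           # listen
        port = server.getsockname()[1]             # the endpoint's port number
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:  # connect
            client.sendall(b"hello")               # write/send
            reply = client.recv(1024)              # read/receive
        worker.join()                              # both sockets are closed on exit
        return reply

if __name__ == "__main__":
    print(demo())
```

Both endpoints live on the same node here; pointing the client at another host's IP address or DNS name is the only change needed to cross a network.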
SaaS platform elements
[Figure labels: “Classical OS”, browser, container. Graphic from wiki.eeng.dcu.ie]
[Graphic from Amazon: Mike Culver, Web Scale Computing]
Motivation: “Success disaster”
Virtual Cloud hosting, Part 2
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
- US National Institute of Standards and Technology (NIST) http://www.csrc.nist.gov/groups/SNS/cloud-computing/
Client Server(s)
Cloud > server-based computing
• Client/server model (1980s - )
• Now called Software-as-a-Service (SaaS)
Host/guest model
[Figure: a Client talks to a Service, which runs as a Guest on a Host operated by Cloud Provider(s)]
• Service is hosted by a third party.
– flexible programming model
– cloud APIs for service to allocate/link resources
– on-demand: pay as you grow
IaaS: infrastructure services
[Figure: Client and Service; the Service runs on an OS over a VMM over the Physical Platform]
Deployment of private clouds is growing rapidly w/ open IaaS cloud software.
Hosting performance and isolation are determined by the virtualization layer.
Virtual machines: VMware, KVM, etc.
PaaS: platform services
[Figure: Client and Service; the Service runs on an OS over an optional VMM over the Physical Platform]
PaaS cloud services define the high-level programming models, e.g., for clusters or specific application classes.
Hadoop, grids, batch job services, etc. can also be viewed as belonging to the PaaS category.
Note: can deploy them over IaaS.
Elastic provisioning
[Figure panels: (1) varying workload, fixed system, varying performance; (2) varying workload, varying system, fixed performance; (3) varying workload, varying system, target performance, under “Elastic Cloud” resource control]
Managing Energy and Server Resources in Hosting Centers, SOSP, October 2001.
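As a rough illustration of "varying system, target performance", the sketch below sizes a server pool to the offered load. It is a simple assumed sizing rule, not the method of the cited paper; the function name, the per-server capacity, and the 70% utilization target are all hypothetical.

```python
def servers_needed(load_rps: int, capacity_rps: int,
                   target_util_pct: int = 70, min_servers: int = 1) -> int:
    """Smallest pool that keeps each server at or below the target utilization."""
    usable = capacity_rps * target_util_pct    # per-server usable capacity x 100
    needed = -(-load_rps * 100 // usable)      # integer ceiling division
    return max(min_servers, needed)

if __name__ == "__main__":
    # Re-evaluate the pool size as the workload varies over time.
    for load in (50, 700, 1500, 200):
        print(load, "req/s ->", servers_needed(load, capacity_rps=100), "servers")
```

With a fixed pool the same load trace would instead show up as varying utilization, and so as varying performance.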
EC2: The canonical public cloud
Virtual Appliance Image
OpenStack, the Cloud Operating System
Management Layer That Adds Automation & Control
[Anthony Young @ Rackspace]
IaaS Cloud APIs (OpenStack, EC2)
• Query of availability zones (i.e. clusters in Eucalyptus)
• SSH public key management (add, list, delete)
• VM management (start, list, stop, reboot, get console output)
• Security group management
• Volume and snapshot management (attach, list, detach, create, bundle, delete)
• Image management (bundle, upload, register, list, deregister)
• IP address management (allocate, associate, list, release)
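To show how a few of these operation groups fit together, here is a hypothetical in-memory model. It is not the OpenStack or EC2 API: the class `ToyCloud`, its method names, the ID formats, and the address pool are all invented for illustration.

```python
import itertools

class ToyCloud:
    """Hypothetical in-memory stand-in for an IaaS API (not OpenStack or EC2)."""

    def __init__(self):
        self.images = {}                                  # image id -> name
        self.vms = {}                                     # vm id -> record
        self.free_ips = ["10.0.0.%d" % i for i in range(2, 6)]
        self._ids = itertools.count(1)

    def register_image(self, name):                       # image management
        image_id = "img-%d" % next(self._ids)
        self.images[image_id] = name
        return image_id

    def start_vm(self, image_id):                         # VM management
        if image_id not in self.images:
            raise KeyError("unknown image: " + image_id)
        vm_id = "vm-%d" % next(self._ids)
        self.vms[vm_id] = {"image": image_id, "state": "running", "ip": None}
        return vm_id

    def associate_ip(self, vm_id):                        # IP address management
        ip = self.free_ips.pop(0)                         # allocate from the pool
        self.vms[vm_id]["ip"] = ip
        return ip

    def stop_vm(self, vm_id):
        vm = self.vms[vm_id]
        vm["state"] = "stopped"
        if vm["ip"] is not None:
            self.free_ips.append(vm["ip"])                # release the address
            vm["ip"] = None

    def list_vms(self):
        return sorted(self.vms)
```

A client would register an image, start one or more VM instances of it, and attach addresses to them; real IaaS APIs add authentication, security groups, volumes, and availability zones on top of this basic shape.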
Adding storage
Competing Cloud Models: PaaS vs. IaaS
• Cloud Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
• Cloud Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Amazon Elastic Compute Cloud (EC2), Eucalyptus, OpenNebula
Post-note
• The remaining slides weren’t discussed.
• Some give more info on the various forms of cloud computing following the NIST model. Just understand IaaS and PaaS hosting models.
• The “Adaptation” slides deal with resource management: what assurances does the holder of virtual infrastructure have about how much resource it will receive, and how good its performance will (therefore) be? We’ll discuss this more later.
• The last slide refers to an advanced cloud project at Duke and RENCI.org, partially funded by NSF Global Environment for Network Innovations (geni.net).
Managing images
• “Let a thousand flowers bloom.”
• Curated image collections are needed!
• “Virtual appliance marketplace”
Infrastructure as a Service (IaaS)
“Consumers of IaaS have access to virtual computers, network-accessible storage, network infrastructure components, and other fundamental computing resources…and are billed according to the amount or duration of the resources consumed.”
Cloud Models
• Cloud Software as a Service (SaaS)
– Use provider’s applications over a network
• Cloud Platform as a Service (PaaS)
– Deploy customer-created applications to a cloud
• Cloud Infrastructure as a Service (IaaS)
– Rent processing, storage, network capacity, and other fundamental computing resources
NIST Cloud Definition Framework
• Deployment Models: Private Cloud, Community Cloud, Public Cloud, Hybrid Clouds
• Service Models: Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
• Essential Characteristics: On Demand Self-Service, Broad Network Access, Resource Pooling, Rapid Elasticity, Measured Service
• Common Characteristics: Massive Scale, Resilient Computing, Homogeneity, Geographic Distribution, Virtualization, Service Orientation, Low Cost Software, Advanced Security
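Adaptations: Describing IaaS Services
[Figure: a computer offering CPU, memory, disk, and bandwidth (BW); requests a, b, and c drawn as resource vectors on axes of CPU shares and memory shares, with an axis mark at 16]
r_a = (8, 4)
r_b = (4, 8)
r_c = (4, 4)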
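One way to read the resource vectors is as (CPU shares, memory shares) requests that must fit together on one machine. The sketch below checks that, assuming the 16 marked on the figure's axis is the machine's capacity in each dimension; that reading and the function name `fits` are assumptions.

```python
def fits(requests, capacity):
    """Sum the request vectors per dimension and compare against capacity."""
    totals = [sum(dim) for dim in zip(*requests)]
    ok = all(total <= cap for total, cap in zip(totals, capacity))
    return ok, totals

if __name__ == "__main__":
    r_a, r_b, r_c = (8, 4), (4, 8), (4, 4)   # (CPU shares, memory shares)
    print(fits([r_a, r_b, r_c], capacity=(16, 16)))
```

Under this assumption the three requests sum to exactly (16, 16), so they fill the machine in both dimensions with nothing left over.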
Adaptations: service classes
• Must adaptations promise performance isolation?
• There is a wide range of possible service classes…to the extent that we can reason about them.
Continuum of service classes: available surplus, weak effort, best effort, proportional share, elastic reservation, hard reservation
Reflects load factor or overbooking degree
Reflects priority
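As one concrete point on this continuum, a proportional share class divides capacity among holders in proportion to their weights. The sketch below is a minimal illustration; the function name and the example weights are hypothetical.

```python
from fractions import Fraction

def proportional_share(capacity, weights):
    """Split capacity among holders in proportion to their integer weights."""
    total = sum(weights.values())
    return {name: Fraction(capacity * w, total) for name, w in weights.items()}

if __name__ == "__main__":
    print(proportional_share(12, {"a": 1, "b": 2, "c": 3}))
```

Note that a holder's absolute allocation shrinks as more weight is admitted, which is why proportional share sits below the reservation classes in the strength of its assurance.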
Constructing “slices”
• I like to use TinkerToys as a metaphor for creating a slice in the GENI federated cloud.
• The parts are virtual infrastructure resources: compute, networking, storage, etc.
• Parts come in many types, shapes, sizes.
• Parts interconnect in various ways.
• We combine them to create useful built-to-order assemblies.
• Some parts are programmable.
• Where do the parts come from?