1
VirtualizationVirtualizationRecapRecap
2
Recap 1
What is the user part of an ISA? What is the system part of an ISA? What functionality do they provide?
3
Recap 2
Application Programs
Hardware
Libraries
Operating System
Arrows?What runs in User / Kernel Mode?
4
Recap 3
Difference Process VM and System VM in terms of the interface virtualized?
Classify Java in the taxonomy
5
Recap 4
Sa
S'a
Sb
S'b
e
e'
6
Implementing Virtual Machines Implementing Virtual Machines with the Same ISAwith the Same ISA
7
Same ISA VMs
Emulation needed for different ISA VMs For same ISA: Theoretically, source instructions
can be executed directly on target Fastest Does this work for all
instructions?
Virtual Machine Monitor
Application Programs
Hardware
Libraries
Guest OS
8
Same ISA VMs (cont'd)
No: System ISA instructions need to be controlled When? System VMs with a guest OS How?
Guest OS runs in CPU User Mode System ISA instructions called in User Mode
activate Kernel Mode (trap) VMM in Kernel Mode then emulates system
instruction
9
Example: App Scheduling
Applications
Hardware
Operating System
Run
Interrupts, Traps, faults
Privileged instructions
User Mode
Kernel Mode
10
Example: App Scheduling
Applications
Hardware
Operating System
2. Run app in User Mode
3. Interrupt 1. Set interval timer
Kernel Mode
User Mode
11
Virtual Machine Monitor
Example: App Scheduling
Applications
Operating System
4. Run app in User Mode
Hardware
5. Interrupt 2. Set interval timer t'
Kernel Mode 1. Set interval timer t
3. Run OS in User Mode
User Mode
12
Virtual Machine Monitor
Example: App Scheduling
Applications
Operating System
4. Run app in User Mode
Hardware
5. Interrupt 2. Set interval timer t'
Kernel Mode 1. Set interval timer t
3. Run OS in User Mode
t = requested by OS
t' = granted by VMM for fair scheduling of multiple VMs
User Mode
13
Virtual Machine Monitor
Example: App Scheduling
Applications
Operating System
4. Run app in User Mode
Hardware
5. Interrupt 2. Set interval timer t'
1. Set interval timer t
3. Run OS in User Mode
Guest OS schedules Apps
VMM schedules VMs
14
x86 Same ISA Problems (1/3)
Normal: VM: OS no longer in kernel mode!
Apps
OSOS
OS + Apps
OSVMM
Ring 0: Kernel Mode Ring 3: User Mode
15
x86 Same ISA Problems (2/3)
In x86, not all system instructions in User Mode activate Kernel Mode!
When Guest OS runs in User Mode, not all system instruction calls observed by VMM
Old solution: patch all binary code! Replace these critical instructions with explicit traps
to Kernel Mode
16
x86 Same ISA Problems (3/3)
New solution: Intel VT-x Allows Guest OS to run in Kernel Mode (Ring 0) Shared resources still controlled by VMM Using extra mode: VMX
VMX Root for VMX VMX Non-root for Guest OS
“Ring -1” Also hardware support for
VM context switch
Apps
OS
OS
VMM
17
VM Memory Management
Normal: Guest OS has virtual
memory Guest OS maintains
mapping to physical memory
VM: Guest OS has virtual
memory Guest OS maintains
mapping to real memory
VMM maintains mapping to physical memory
Virtual Real Physical
18
Host OS
Native and Hosted VMs
Guest OS
Apps
VMM
Hardware
Guest OS
Apps
Hardware
VMM
(a) Native VM
(b) Dual-Mode Hosted VM
User Mode
Kernel Mode
19
VMWare Workstation
Install on top of existing host OS Easy to use Can use myriad of device drivers available in host
OS
20
VMWare Architecture
VMApp
Host OS
VMMonitor
X86 Hardware
VMDriver
Applications
Guest OS
22
VMWare I/O
VMApp
Host OS
VMMonitor
X86 Hardware
VMDriver
Applications
Guest OS
DeviceDriver
DeviceDriver
DeviceDriver
Direct support, e.g. IDE
Use host support, e.g. CD, sound, serial port
23
VMWare: New Capabilities
VMApp
Host OS
VMMonitor
X86 Hardware
VMDriver
Applications
Guest OS
DeviceDriver
DeviceDriver
COW
24
Operating System Support Operating System Support for Virtualizationfor Virtualization
25
Native, Hosted, Paravirtualized VMs
Guest OS
Apps
VMM
Hardware
Host OS
Guest OS
Apps
Hardware
VMM
Apps
VMM
Hardware
Guest OS
Modify the Guest OS!
26
Paravirtualization
System VMs can be faster when Guest OS can be modified for virtualization
Showcased in Xen Project Modified
Linux Windows XP
Near native performance!
27
Xen Evolution
Problems: only open-source OSes can be modified Xen implementation tricks not on x86-64
New approach: Start from Full virtualization with Hardware Support
Apply Paravirtualization in areas where speed can be gained:
1. Disk and network I/O2. Interrupts and timers3. Emulated motherboard, legacy boot4. Privileged instructions, page tables
28
Xen Mode: HVM
Source: Lars Kurth, http://wiki.xen.org/wiki/File:XenModes.pnghttp://wiki.xen.org/wiki/Virtualization_Spectrum
30
Xen Mode: HVM + PV Drivers
31
Xen Mode: PVHVM Drivers
32
Xen Mode: PVH
33
KVM
KVM
34
Applications
Guest OS(Modified)
Xen Hypervisor
Xen Architecture
Domain 0Toolstack
Host OS
X86 Hardware
Applications
Guest OS(Modified)
Domain
virtual x86 CPUScheduler MMU Timers
Drivers PV front PV front
35
Applications
Guest OS(Modified)
Xen Hypervisor
Xen Architecture
Domain 0
Toolstack
Host OS
X86 Hardware
Applications
Guest OS(Modified)
Domain
virtual x86 CPUScheduler MMU Timers
Drivers PV front PV front
36
Operating-System Level Virtualization
In between System VM and Process VM Not System VM:
Cannot choose OS Not Process VM:
Multiple processes, not isolated As if multiple instances of the same OS are running
on the same machine Example: Linux Containers
cf. Docker
37
Linux Host OS
Linux Containers
Applications
OS View
X86 Hardware
Namespaces CGroups
Applications
OS View
Applications
OS View
38
Linux Containers: Namespaces
Linux has a configuration Controlled via many files and outside input Idea: allow a configuration per process (group) E.g. for process A the hostname is X, for process B
the hostname is Y cf. chroot Now: configuration is set of 6 namespaces
Source: Rami Rosen, “Linux Kernel Networking”, APress.
http://www.haifux.org/lectures/299/netLec7.pdf
39
6 Namespaces of Linux
uts (hostname) mnt (mount points, filesystems) pid (processes) user (UIDs) ipc (System V IPC) net (network stack)
(plans to add more)
40
UTS Namespace (1/3)
“UNIX time sharing” ?! Contains 6 strings:
sysname Operating system name (e.g., "Linux") nodename Name within "some implementation
-defined network" release OS release (e.g., "2.6.28") version OS version machine Hardware identifier domainname NIS or YP domain name
i.e., control the names of the containerSource: uname(2)
41
UTS Namespaces (2/3)
The old implementation of gethostname():
asmlinkage long sys_gethostname(char __user *name, int len)
{
...
if (copy_to_user(name, system_utsname.nodename, I))
errno = -EFAULT;
...
}
system_utsname is a global variable
42
UTS Namespaces (3/3)The new implementation of gethostname():static inline struct new_utsname *utsname(void)
{
return ¤t->nsproxy->uts_ns->name;
}
SYSCALL_DEFINE2(gethostname, char __user *, name, int, len)
{
struct new_utsname *u;
...
u = utsname();
if (copy_to_user(name, u->nodename, i))
errno = -EFAULT;
43
MNT Namespace
View of which filesystems are mounted New mounts only visible in current mnt namespace Unless special flags are used:
mount –make-shared
/ (root)
/mnt /home
sdb1 sda3 mnt ns1
/ (root)
/mnt /home
sdc1 mnt ns1
44
PID Namespace
Processes in different PID namespaces can have the same process ID.
When creating the first process in a new namespace, its PID is 1.
Hierarchy of PID namespaces: PIDs visible to parent namespace Nested upto 32 levels deep
45
User namespace
New namespace = new set of UIDs and GIDs Hierarchy Existing UIDs are mapped into new space
E.g. UID 1000 becomes UID 0 in new space First process in the new space has “root”
Only for namespaces inside the new space! Create container:
1. Create new user namespace2. Create new UTS, MNT, PID, etc. namespaces from that
namespace Outside: permissions of parent UID
46
NET Namespace (1/2)
A network namespace is logically another copy of the network stack: own routes, own firewall rules, own network devices.
A network device belongs to exactly one network namespace
A socket belongs to exactly one network namespace
47
NET Namespace (2/2)
The initial network namespace includes: loopback device all physical devices, networking tables, etc.
New network namespace includes only the loopback device Real devices can be moved into NS Virtual devices can be added Control via ip netns command And e.g. /etc/netns/<nsname>/hosts
48
Containers via namespaces
Create a container: 1. Create a user namespace2. Create a PID and UTS namespace inside3. Create a MNT namespace to get your own
filesystem 4. Mount container disk image5. Create NET namespace, add virtual devices6. Connect virtual devices to real network via e.g.
virtual bridges
49
CGroups
Namespaces can give groups of processes: Same view of the OS Illusion there are no other groups
Control Groups is a mechanism for resource management for groups of processes: Set limits, e.g. on memory usage (main + FS cache) Set priorities (CPU or disk bandwidth) Accounting Checkpointing