1
Memory-efficient Virtual Machine High Availability
Karen Kai-Yuan Hou
Prof. Kang G. Shin
University of Michigan
Mustafa Uysal (VMware)
Arif Merchant (HP Labs)
Sharad Singhal (HP Labs)
2
Protect VM from Host Failures
• Set up a backup by replicating the primary VM
• Backup takes over execution promptly if the primary fails
• High memory cost
– E.g., to protect a 1 GB VM, an additional 1 GB of memory is reserved just to hold the backup.
[Figure: a primary VM (App 1, App 2) on the primary host's hypervisor and a backup VM (App 1, App 2) on the backup host's hypervisor; label: Physical Host Failure.]
3
Use a Shared Storage
• “Maintain” the backup VM in storage instead of RAM
• Improve resource and energy efficiency. Recover anywhere.
[Figure: Hosts 1 through n, each running a hypervisor, attached to shared storage. The primary VM (App 1, App 2) runs on Host 1, the backup VM is kept in shared storage, and the other hosts run other primary (active) VMs.]
4
Protection: Tracking Primary VM State
• Take checkpoints of the primary VM
– Incremental, periodic, copy-on-write checkpoints
[Figure: the memory space of the primary VM (App 1, App 2) is checkpointed into the VM fail-over image.]
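A minimal sketch of incremental, copy-on-write checkpointing follows. It is not the Xen-based prototype from the talk: `ToyVM`, its methods, and the 4096-byte page size are hypothetical names and values chosen for illustration. Each checkpoint covers only pages dirtied since the previous one, and a page that is overwritten before it has been committed is preserved first, so the VM can keep running during the commit.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


class ToyVM:
    """Hypothetical stand-in for a VM's memory: a list of pages plus a dirty set."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        # Before the first checkpoint every page counts as dirty.
        self.dirty = set(range(num_pages))
        # Pages owned by the in-flight checkpoint; None means "not overwritten yet".
        self.pending = {}

    def write(self, page_no, data):
        # Copy-on-write: keep the old contents if the checkpoint still needs them.
        if page_no in self.pending and self.pending[page_no] is None:
            self.pending[page_no] = self.pages[page_no]
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def begin_checkpoint(self):
        # The only step that needs a pause: record which pages this checkpoint covers.
        self.pending = {p: None for p in self.dirty}
        self.dirty = set()

    def commit_checkpoint(self, image):
        # Runs while the VM executes; preserved copies win over current contents.
        for page_no, saved in self.pending.items():
            image[page_no] = saved if saved is not None else self.pages[page_no]
        self.pending = {}
```

Calling `begin_checkpoint()` and then `commit_checkpoint(image)` at a fixed interval gives the periodic behavior; `image` here is a plain dict standing in for the fail-over image on shared storage.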
5
Fail-over: Bringing Up Backup VM
• Slim VM Restore
– Load only necessary information and switch on the backup VM quickly
– Fetch pages on-demand as the backup VM executes
[Figure: the VM fail-over image is loaded into the memory space of the restored backup VM (App 1, App 2).]
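The restore idea can be sketched as a lazily populated page table. This is an illustration under stated assumptions, not the prototype's mechanism: `LazyBackupVM`, `eager_pages`, and the dict used as the fail-over image are all hypothetical. Only a small set of pages is loaded before the backup VM is switched on, and every other page is fetched from the image the first time it is touched.

```python
class LazyBackupVM:
    """Hypothetical backup VM that fetches pages from the fail-over image on demand."""

    def __init__(self, image, eager_pages=()):
        # `image` maps page number -> page contents and stands in for shared storage.
        self.image = image
        # Slim restore: load only the pages assumed necessary to start running.
        self.resident = {p: image[p] for p in eager_pages}
        self.fetches = 0  # number of on-demand fetches performed so far

    def read(self, page_no):
        # A first touch of a non-resident page triggers a fetch from the image.
        if page_no not in self.resident:
            self.resident[page_no] = self.image[page_no]
            self.fetches += 1
        return self.resident[page_no]
```

The `fetches` counter makes the trade-off visible: fail-over is fast because little is loaded up front, and the cost is paid later as one small read per first touch.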
6
Improving I/O Efficiency with SSDs
• Small, random I/Os are more efficient on SSDs
• Primary side: updating the VM image continuously (small, random writes to the VM fail-over image)
• Restore side: fetching from the VM image on-demand (small, random reads from the VM fail-over image)
7
Preliminary Evaluation
• Prototype built on Xen 3.3.2
• Questions
– How much overhead does continuous checkpointing introduce on the primary VM?
– How does the shared storage support continuous updating of the fail-over image?
– How quickly can our system bring up a backup VM?
– How does the backup VM perform when it executes by fetching pages on-demand?
8
Checkpointing Overheads
[Figure: two bar charts, Kernel Compilation and RUBiS, showing checkpointing overhead (%) when checkpointing every 10 s, 5 s, and 2 s. Series: HD; HD, COW; SSD; SSD, COW.]
9
CoW and SSD Enhancements
• CoW reduces VM pause time for taking checkpoints
• Checkpoints commit faster on an SSD
[Figure: pause time (ms) when checkpointing every 10 s, 5 s, and 2 s, without COW vs. with COW.]
[Figure: commit time (sec) when checkpointing every 10 s, 5 s, and 2 s, HD vs. SSD.]
10
Fail-over Time and Demand Fetching
• Time required to bring up a backup VM
• Overheads of fetching VM pages on-demand
[Figure: fail-over time (sec) for Kernel Compilation, RUBiS, and Video Transcoding, HD vs. SSD.]
[Figure: demand-fetching overhead (%) for Kernel Compilation, RUBiS, and Video Transcoding, HD vs. SSD.]
11
Interesting Observations: Page Fetching Behavior
• How a VM uses (demand fetches) its pages while compiling a kernel:
12
Interesting Observations: Page Fetching Behavior
• What actually happens on disk (recorded by blktrace):
13
Conclusions
[Figure: save and restore timings. Surviving labels: 35; 113 ms, 10.1 ms, 10.1 ms; 20 s, 20 s, 20 s; 1.47 s; 35 s.]
14
• Thank you!