Reading Report Cost of VM Live Migration


By William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar Buyya (CLOUD 2009)

The paper’s goal

The paper gives a performance evaluation of live migration and shows that migration overhead is acceptable but cannot be disregarded, especially when SLAs are strict.

Related Works(1)

Multicore [8, 9], paravirtualization [1], hardware-assisted virtualization [10], and live migration [3].

Individual measurements of the VM runtime overhead imposed by hypervisors on a variety of workloads [1, 11, 12].

The impact of consolidating several applications on a single server running Xen [13].

Related Works(2)

Performance degradation when migrating CPU- and memory-intensive workloads, as well as when migrating multiple VMs at the same time in a stop-and-copy way [15].

Quantifying the effects of live migration on a set of four applications common to hosting environments, primarily focusing on downtime and total migration time and demonstrating the viability of live migration [3].

These works did not evaluate the effect of migration on the performance of modern Internet workloads, such as multi-tier and social network oriented applications.

Related Works

Evaluating the efficacy of migrating VMs across long distances, such as over the Internet [16].

The vConsolidate benchmark [14], which comprises a Web server, a database server, a Java server, a mail server, and an idle server.

The Cloudstone benchmark [17], which aims at computing the monetary cost, in dollars/user/month, of hosting Web 2.0 applications on cloud computing platforms.

Background: advantage of live migration

Live (or hot) migration: hypervisors allow an OS to be migrated while it continues to run.

Stop-and-copy (or cold) migration: the VM is halted, all its memory pages are copied to the destination, and then the new VM is started.

The advantage of live migration: an OS can be migrated with near-zero downtime.
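To make the difference between the two approaches concrete, the following is a minimal analytic sketch of iterative pre-copy live migration versus stop-and-copy. It assumes a constant link bandwidth and a constant page-dirtying rate; the stop threshold, round cap, and all numbers are hypothetical illustrations, not parameters from the paper or from Xen.

```python
# Simplified analytic model of pre-copy live migration vs. stop-and-copy.
# Assumptions (illustrative only): constant bandwidth, constant dirtying rate,
# and a hypothetical stop threshold / round cap.
from dataclasses import dataclass


@dataclass
class MigrationResult:
    total_time_s: float  # from migration start until the VM runs at the destination
    downtime_s: float    # time the VM is paused
    rounds: int          # pre-copy rounds before the final stop-and-copy phase


def stop_and_copy(memory_mb: float, bandwidth_mb_s: float) -> MigrationResult:
    # The VM is halted for the entire copy, so downtime equals total time.
    t = memory_mb / bandwidth_mb_s
    return MigrationResult(total_time_s=t, downtime_s=t, rounds=0)


def pre_copy(memory_mb: float, bandwidth_mb_s: float, dirty_mb_s: float,
             stop_threshold_mb: float = 50.0, max_rounds: int = 30) -> MigrationResult:
    to_send = memory_mb  # round 1 copies all memory while the VM keeps running
    elapsed = 0.0
    rounds = 0
    while to_send > stop_threshold_mb and rounds < max_rounds:
        round_time = to_send / bandwidth_mb_s
        elapsed += round_time
        rounds += 1
        # Pages dirtied during this round must be resent (never more than all memory).
        to_send = min(memory_mb, dirty_mb_s * round_time)
    # Final stop-and-copy: pause the VM and send what is still dirty.
    downtime = to_send / bandwidth_mb_s
    return MigrationResult(total_time_s=elapsed + downtime, downtime_s=downtime, rounds=rounds)


if __name__ == "__main__":
    # Hypothetical numbers: a 2 GB VM over a link sustaining ~100 MB/s.
    print(stop_and_copy(2048, 100))
    print(pre_copy(2048, 100, dirty_mb_s=20))
```

With these numbers the model needs 3 pre-copy rounds and pauses the VM for about 0.16 s, versus 20.48 s of downtime for stop-and-copy, at the price of a longer total migration time. Raising `dirty_mb_s` toward the bandwidth shows how a write-intensive workload can push downtime well above the millisecond level.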

Background: characteristics of modern Internet applications

Highly dynamic and interactive features have made Web 2.0 applications explode in popularity. Social networking features make each user’s actions affect many other users, which makes static load partitioning unsuitable as a scaling strategy.

Through blogs, photostreams, and tagging, users now publish content to one another rather than just consuming static content.

Testbed specifications

A cluster composed of 6 servers: 1 head-node and 5 VM hosts. Each is equipped with an Intel Xeon E5410 (2.33 GHz quad-core with 2x6 MB L2 cache and Intel VT technology), 4 GB memory, and a 7200 rpm hard drive, and all are connected through a Gigabit Ethernet switch.

Head-node: Ubuntu Server 7.10 with no hypervisor. Other nodes: Citrix XenServer Enterprise Edition 5.0.0. All VMs run 64-bit Ubuntu Linux 8.04 Server Edition with paravirtualized kernel version 2.6.24-23. The installed web server is Apache 2.2.8 running in prefork mode. PHP version is 5.2.4-2. MySQL, with the InnoDB engine, is version 5.1.32.

Workload

Olio [18] as a Web 2.0 application, combined with the Faban load generator [19]. Olio’s PHP implementation is used, employing the popular LAMP stack (Linux, Apache, MySQL, PHP). Olio/Faban was originally proposed as part of the Cloudstone benchmark [17].

The main metric: the Service Level Agreement (SLA) defined in Cloudstone.

Cloudstone's SLA

Table 1. The 90th/99th percentile of response times measured in any 5-minute window during steady state should not exceed the following values (in seconds)
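The SLA rule above can be sketched as a check over logged response times. This is a minimal illustration, assuming non-overlapping (tumbling) 5-minute windows and a nearest-rank percentile; Cloudstone's exact window and percentile definitions may differ. The per-action limits from Table 1 are not reproduced here, so the example uses only the 1-second homepage limit stated in the Fig. 3 caption, with synthetic samples.

```python
# Sketch of a Cloudstone-style SLA check over 5-minute windows.
# Assumptions: non-overlapping (tumbling) windows and a nearest-rank percentile;
# Cloudstone's exact window and percentile definitions may differ.
import math
from collections import defaultdict

WINDOW_S = 300  # 5 minutes


def percentile(values, p):
    """Nearest-rank p-th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def sla_violations(samples, limit_s, p):
    """samples: (timestamp_s, response_time_s) pairs for one user action.
    Returns start times of windows whose p-th percentile exceeds limit_s."""
    windows = defaultdict(list)
    for ts, rt in samples:
        windows[int(ts // WINDOW_S) * WINDOW_S].append(rt)
    return [start for start, rts in sorted(windows.items())
            if percentile(rts, p) > limit_s]


if __name__ == "__main__":
    # Synthetic homepage samples: a quiet window, then one with a short spike.
    samples = [(i * 3, 0.2) for i in range(100)]
    samples += [(300 + i * 3, 3.0 if i < 5 else 0.2) for i in range(100)]
    print(sla_violations(samples, limit_s=1.0, p=90))  # []
    print(sla_violations(samples, limit_s=1.0, p=99))  # [300]
```

Five slow requests out of 100 leave the 90th percentile untouched but push the 99th percentile over the limit, which is why a brief migration-induced spike threatens the stricter 99th percentile SLA first.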

Benchmarking architecture(1)

Benchmarking architecture(2)

MySQL tends to be CPU-bound when serving the Olio database, while Apache/PHP tends to be memory-bound [17].

All nodes share an NFS (Network File System) mounted storage device, which resides in the head-node and stores VM images and virtual disks.

A local virtual disk is hosted in the server that hosts MySQL.

The load is driven from the head-node, where the multi-threaded workload drivers run, along with Faban's master component.

Experimental objective

To quantify the slowdown and downtime experienced by the application when VM migrations are performed in the middle of a run.

In runs with a series of migrations, the VM was not simply migrated back and forth between the same two machines.

Preliminary experiments

Goal: to define exact VM sizes. Without migration, load was driven against Olio while gradually increasing the number of concurrent users. By analyzing the SLA, they found that 600 is the maximum number of concurrent users.

From memory and CPU usage, they found the minimum VM sizes for serving 600 users:
VM hosting Apache/PHP: 1 vCPU, 2 GB memory
VM hosting MySQL: 2 vCPUs, 1 GB memory

Hosting the MySQL database on NFS could only support 400 users, so MySQL uses a local virtual disk. Because live migration requires the VM's disk to reside on shared storage, the experiments do not include migration of the database server.
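The capacity search described above can be sketched as a linear ramp that stops at the first SLA violation. In this sketch `meets_sla` is a hypothetical callback standing in for one full benchmark run without migration followed by an SLA check; the step size and upper limit are assumptions, not values from the paper.

```python
# Sketch of the preliminary capacity search: raise the number of concurrent
# users step by step until the SLA is first violated.
# `meets_sla` is a hypothetical callback standing in for one full benchmark
# run (e.g. a Faban run without migration) followed by an SLA check.
from typing import Callable, Optional


def max_users_within_sla(meets_sla: Callable[[int], bool],
                         start: int = 100, step: int = 100,
                         limit: int = 2000) -> Optional[int]:
    """Return the largest tested user count that meets the SLA,
    or None if even `start` users violate it."""
    best = None
    users = start
    while users <= limit:
        if not meets_sla(users):
            break  # first violation: stop ramping up
        best = users
        users += step
    return best


if __name__ == "__main__":
    # Toy stand-in that mimics the reported result (SLA holds up to 600 users).
    print(max_users_within_sla(lambda users: users <= 600))  # 600
```

A linear ramp matches the "gradually increase" procedure; a binary search would need fewer benchmark runs but assumes SLA compliance is strictly monotonic in the number of users.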

Migration Experiment

First set of experiments with Olio: 10-minute and 20-minute benchmark runs with 600 concurrent users, to evaluate how the SLA is violated when the system is nearly oversubscribed but not overloaded, and to quantify the downtime when live migrations happen.

Then, the benchmark was run with smaller numbers of concurrent users, namely 100, 200, ..., 500, searching for a “safe” level (lower risk of SLA violation).

Result and Discussion(1)

Results show that the overhead due to live migration is acceptable but cannot be disregarded, especially in SLA-oriented environments requiring more demanding service levels.

Result and Discussion(2)

Fig. 2. Effects of a live migration on Olio's homepage loading activity

Result and Discussion(3)

Figure 2 shows the effect of a single migration performed after five minutes in steady state of one run. A downtime of 3 seconds is experienced near the end of a 44-second migration.

The highest peak in response times occurs immediately after the VM resumes on the destination node; 5 seconds elapse until the system can fully serve all requests that were initiated during the downtime. In spite of that, no requests were dropped or timed out due to application downtime.

The downtime experienced by Olio when serving 600 concurrent users is well above the millisecond-level downtime previously reported in the literature for a range of workloads [3]. This interesting result suggests that the workload complexity imposes an unusual memory access pattern.
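One simple way to read downtime off a request log like the one behind Fig. 2 is to look for the longest interval in which no request completed. This is a hypothetical sketch, not the paper's measurement method: it assumes steady load, so requests normally complete continuously, and the gap it reports is an upper bound because it also includes the response time of the first request served after the VM resumes.

```python
# Sketch: estimate client-visible downtime from a request log as the longest
# interval during which no request completed.
# Assumption: under steady load requests complete continuously, so a long gap
# indicates the VM was paused; the gap is an upper bound because it also
# includes the response time of the first request after resumption.
from typing import Iterable, Optional, Tuple


def longest_service_gap(completion_times: Iterable[float],
                        min_gap_s: float = 1.0) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the longest gap between consecutive completions,
    or None if no gap is at least min_gap_s long."""
    times = sorted(completion_times)
    best = None
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap >= min_gap_s and (best is None or gap > best[1] - best[0]):
            best = (prev, cur)
    return best


if __name__ == "__main__":
    # Synthetic log: completions every 0.5 s, paused between t=299.5 and t=303.
    log = [i * 0.5 for i in range(600)] + [303 + i * 0.5 for i in range(100)]
    print(longest_service_gap(log))  # (299.5, 303.0)
```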


Fig. 3. 90th and 99th percentile SLA computed for the homepage loading response time with 600 concurrent users. The maximum allowed response time is 1 second.

Result and Discussion(5)

Figure 3 presents the effect of multiple migrations on the homepage loading response times. These results correspond to the average of 5 runs.

It is paramount that this information be employed by SLA-oriented VM-allocation mechanisms to reduce the risk of SLA non-compliance in situations where VM migrations are inevitable.


Result and Discussion(7)

Table 2 presents more detailed results, listing maximum response times for all user actions as computed by the 99th percentile SLA formula when one migration was performed in the middle of a 10-minute run.

These results indicate that a workload of 500 users is the load level at which a live migration of the Web server should be carried out (e.g., to the least loaded server) in order to decrease the risk of SLA violation.
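An SLA-oriented allocator could act on this result with a simple threshold rule. The sketch below is a hypothetical illustration, not a mechanism from the paper: the `Host` type, its `load_users` field, and the policy of picking the host with the fewest users are assumptions; only the 500-user level comes from the results above.

```python
# Hypothetical threshold-based policy suggested by the 500-user result:
# once the Web server VM reaches the "safe" load level, pick the least loaded
# other host as the migration target instead of waiting for higher load.
# Host, load_users and the policy itself are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional

SAFE_USERS = 500  # load level identified as safe for live-migrating the Web server


@dataclass
class Host:
    name: str
    load_users: int  # concurrent users served by VMs already on this host


def pick_migration_target(current: Host, hosts: List[Host], web_users: int,
                          safe_users: int = SAFE_USERS) -> Optional[Host]:
    """Return the least loaded host other than `current` if migration should
    start now, otherwise None."""
    if web_users < safe_users:
        return None  # below the threshold: no need to migrate yet
    candidates = [h for h in hosts if h.name != current.name]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h.load_users)


if __name__ == "__main__":
    hosts = [Host("host1", 500), Host("host2", 300), Host("host3", 100)]
    print(pick_migration_target(hosts[0], hosts, web_users=450))  # None
    print(pick_migration_target(hosts[0], hosts, web_users=500))  # host3
```

Triggering at the safe level rather than at the 600-user capacity leaves headroom for the response-time spike that accompanies the migration itself.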

Reference(1)

Reference(2)

Reference(3)

