
SimGrid VM: Virtual Machine Support for a Simulation Framework of Distributed Systems

Takahiro Hirofuchi, Adrien Lebre, and Laurent Pouilloux

Abstract—As real systems become larger and more complex, the use of simulation frameworks grows in our research community. By leveraging them, users can focus on the major aspects of their algorithms, run in-silico experiments (i.e., simulations), and thoroughly analyze results, even for large-scale environments, without facing the complexity of conducting in-vivo studies (i.e., on real testbeds). Since virtual machine (VM) technology has become a fundamental building block of distributed computing environments, in particular in cloud infrastructures, our community needs a full-fledged simulation framework that enables the investigation of large-scale virtualized environments through accurate simulations. To be adopted, such a framework should provide easy-to-use APIs as well as accurate simulation results. In this paper, we present a highly-scalable and versatile simulation framework supporting VM environments. By leveraging SimGrid, a widely-used open-source simulation toolkit, our simulation framework allows users to launch hundreds of thousands of VMs in their simulation programs and to control VMs in the same manner as in the real world (e.g., suspend/resume and migrate). Users can execute computation and communication tasks on physical machines (PMs) and VMs through the same SimGrid API, which provides a seamless migration path to IaaS simulations for hundreds of SimGrid users. Moreover, SimGrid VM includes a live migration model implementing the precopy migration algorithm. This model correctly calculates the migration time as well as the migration traffic, taking into account the resource contention caused by other computations and data exchanges within the whole system. This allows users to obtain accurate results for dynamic virtualized systems. We confirmed the accuracy of both the VM and the live migration models by conducting several micro-benchmarks under various conditions. Finally, we conclude the article by presenting a first use-case of a consolidation algorithm dealing with a significant number of VMs/PMs. In addition to confirming the accuracy and scalability of our framework, this first scenario illustrates the main interest of SimGrid VM: investigating through in-silico experiments the pros and cons of new algorithms in order to limit expensive in-vivo experiments to only the most promising ones.

Index Terms—Virtual machine, simulation, cloud computing, live migration

1 INTRODUCTION

Nowadays, virtual machine (VM) technology plays one of the key roles in cloud computing environments. Large-scale data centers can manipulate up to one million VMs, each of them being dynamically created and destroyed according to user requests. Numerous studies on large-scale virtualized environments are being conducted by both academia and industry in order to improve the performance and reliability of such systems. However, these studies sometimes involve a potential pitfall: the number of virtual machines considered to validate research proposals (up to one thousand in the best case) is far less than the number actually hosted by real Infrastructure-as-a-Service (IaaS) platforms. Such a gap prevents researchers from corroborating the relevance of managing VMs in a more dynamic fashion, since valid assumptions on small infrastructures sometimes become completely erroneous on much larger ones.

The reason behind this pitfall is that it is not always possible for researchers to evaluate the robustness and performance of cloud computing platforms through large-scale "in vivo" (i.e., real-world) experiments. In addition to the difficulty of obtaining an appropriate testbed, such experiments imply expensive and tedious tasks such as controlling, monitoring and ensuring the reproducibility of the experiments. Hence, the correctness of most contributions in the management of IaaS platforms has been proved by means of ad-hoc simulators and confirmed, when available, with small-scale "in vivo" experiments. Even though such approaches enable our community to make a certain degree of progress, we advocate that leveraging ad-hoc simulators with unrepresentative "in vivo" experiments is not rigorous enough to compare and validate new proposals.

Like for Grids [1], P2P systems [2] and, more recently, the higher levels of cloud systems [3], the IaaS community needs an open simulation framework for evaluating concerns related to the management of VMs. Such a framework should allow investigating very large-scale simulations in an efficient and accurate way, and it should provide adequate analysis tools to make the discussion of results as clear as possible. By leveraging simulation results, researchers will be able to limit "in vivo" experiments to only the most relevant ones.

In this paper, we present SimGrid VM, the first highly-scalable and versatile simulation framework supporting VM environments. We chose to build it upon SimGrid [1] since its relevance in terms of performance and validity has already been demonstrated for many distributed systems [4].

Our simulation framework allows users to launch hundreds of thousands of VMs in their simulation programs and to accurately control VMs in the same manner as in the real world. The framework correctly calculates the resource allocation of each computation/communication task, taking VM placement on the simulated system into account.

• T. Hirofuchi is with ITRI, AIST, Tsukuba, Japan. E-mail: [email protected].
• A. Lebre and L. Pouilloux are with Inria, Nantes, France. E-mail: {adrien.lebre, laurent.pouilloux}@inria.fr.
Manuscript received 26 Mar. 2015; revised 13 Aug. 2015; accepted 4 Sept. 2015. Date of publication 23 Sept. 2015; date of current version 7 Mar. 2018. Recommended for acceptance by C. De Rose. Digital Object Identifier no. 10.1109/TCC.2015.2481422.

The extension to SimGrid has been designed and implemented to be as transparent as possible to other components: SimGrid users can now easily manipulate VMs while continuing to take advantage of the usual SimGrid MSG API for creating computation and communication tasks, either on PMs or on VMs. Such a choice provides a seamless migration path from traditional cluster simulations to IaaS ones for hundreds of SimGrid users. This has been made possible by introducing the concept of an abstraction level (i.e., physical and virtual) into the core of SimGrid. A virtual workstation model, inheriting from the physical workstation one, has been added into the framework; it overrides only the operations that are VM-specific.

In addition to the virtual workstation model, we also integrated a live migration model implementing the precopy migration algorithm. This model correctly calculates the migration time as well as the migration traffic, taking into account the resource contention caused by other computations and data exchanges within the whole system. This allows users to obtain accurate results for systems where migrations play a major role. This is an important contribution, as one might erroneously consider that live migrations can be simulated by simply leveraging data transfer models. As discussed in this article, several parameters, such as the memory update speed of a VM, govern live migration operations and should be considered in the model in order to deliver correct values.

The rest of the paper is organized as follows. Section 2 gives an overview of SimGrid. After summarizing the requirements for VM support in a simulation framework, Section 3 introduces SimGrid VM and explains in detail how the virtual workstation and live migration models have been designed. The evaluation of both models is discussed in Section 4. A first use-case of the VM extensions is presented in Section 5. Section 6 deals with related work and, finally, Section 7 concludes and gives some perspectives for SimGrid VM.

2 SIMGRID OVERVIEW

SimGrid is a simulation framework for studying the behavior of large-scale distributed systems such as Grids, HPC and P2P systems. There is a large, world-wide user community around SimGrid, and its design overview is described in [1]. SimGrid is carefully designed to be scalable and extensible. It is possible to run a simulation composed of 2,000,000 processors on a computer with 16 GB of memory. It allows running a simulation on an arbitrary network topology under dynamic compute and network resource availabilities, and it allows users to quickly develop a simulation program through easy-to-use APIs in C and Java.

Users can dynamically create two types of tasks, i.e., computation tasks and communication tasks, in a simulation world. As the simulation clock moves forward, these tasks are executed, consuming CPU and network resources in the simulation world. Behind the scenes, SimGrid formalizes constraint problems¹ to assign a resource share to each task. Even when there is complex resource contention, it can formulate constraint problems expressing such a situation. Until the simulation ends, it repeats formalizing constraint problems, solving them, and then continuing with the next simulation step. As input parameters, users prepare platform files that describe how their simulation environments are organized; for example, these files include the CPU capacity of each host and the network topology of the environment.
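For illustration, a minimal platform file could look like the sketch below. This is an assumption-laden example: the DTD location and the attribute names (e.g., power vs. speed) have varied across SimGrid releases, so treat it as a shape, not an exact schema.

  <?xml version="1.0"?>
  <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
  <platform version="3">
    <AS id="AS0" routing="Full">
      <!-- Two physical machines and the link between them. -->
      <host id="PM0" power="8E9"/>   <!-- 8 Gflop/s -->
      <host id="PM1" power="8E9"/>
      <link id="l0" bandwidth="1.25E8" latency="5E-5"/> <!-- 1 Gbps, 50 us -->
      <route src="PM0" dst="PM1"><link_ctn id="l0"/></route>
    </AS>
  </platform>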

Although SimGrid has many features, such as model checking, the simulation of MPI applications, and the task scheduling simulation of DAGs (Directed Acyclic Graphs), we limit our description to the fundamental parts that are directly related to the VM support.

SimGrid is composed of three different layers:

The MSG layer provides programming APIs for users. In most cases, users develop their simulation programs only using the APIs in this layer. It allows users to create a process on a host and to execute a computation/communication task on it.

The SIMIX layer is located in the middle of the components. This layer handles the synchronization and scheduling of process execution in a simulation. Roughly, it provides a functionality similar to system calls in the real world, i.e., a simcall in the SimGrid terminology. When a process issues a simcall to perform an operation (e.g., compute, or send/receive data) in the simulation world, the SIMIX layer converts it to an action in the corresponding simulation model (explained later). Then, the SIMIX layer blocks the execution of the process until the operation is completed in the simulation world. If there is another process that is not yet blocked, the SIMIX layer context-switches to that process, converts its operation to another action, and blocks its execution as well. After all processes are blocked (i.e., the SIMIX layer has converted the operations of all the currently-running processes to actions), the SIMIX layer requests each model to solve its constraint problems, which determines the resource share of each action. After all constraint problems are solved, the SIMIX layer sets the simulation clock ahead until at least one action is completed under the determined resource shares. Then, it restarts the execution of the processes owning the completed actions. These steps are repeated until the simulation is over.
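This loop can be summarized by the following schematic C sketch; it is an illustration of the description above, not the actual SimGrid source, and all helper functions are hypothetical.

  /* Schematic of the SIMIX run loop (hypothetical helpers). */
  extern int  simulation_over(void);
  extern void run_processes_until_all_blocked(void); /* turns simcalls into actions */
  extern void solve_all_model_constraints(void);     /* one solve per SURF model    */
  extern void advance_clock_to_next_completion(void);
  extern void wake_processes_of_completed_actions(void);

  void simix_run(void)
  {
      while (!simulation_over()) {
          run_processes_until_all_blocked();
          solve_all_model_constraints();
          advance_clock_to_next_completion();
          wake_processes_of_completed_actions();
      }
  }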

The SURF layer is the kernel of SimGrid, where a simulation model of each resource is implemented. A model formulates constraint problems according to requests from the SIMIX layer, and then actually solves them to get the resource share of each action. There are a CPU model and a network model, used for computation and communication tasks, respectively.

In addition to the MSG API, SimGrid provides several abstractions, such as the TRACE mechanisms, which enable fine-grained post-mortem analysis of the users' simulations with visualization tools like the Triva/Viva/PajeNG software suite [5]. SimGrid produces rich simulation outputs allowing users to perform statistical analysis on simulation results.

¹ A constraint problem, or constraint satisfaction problem, is a mathematical problem in which equations define requirements that variables should meet.


3 THE VM SUPPORT IN SIMGRID

The extension of SimGrid to support VM abstractions has been driven by the following requirements:

• Produce accurate results. Our simulation framework requires the capability to accurately determine resource shares on both virtualized and non-virtualized systems, which must take into account VM placement on physical resources as well as the mechanism of virtualization. In other words, if VMs are co-located on a PM, the system should calculate a correct CPU time for each VM. If a network link is shared by multiple data transfers from/to VMs or PMs, the system needs to assign a correct bandwidth to each transfer.

• Achieve high scalability. SimGrid has been carefully designed to be able to perform large-scale simulations. The VM support in SimGrid must also achieve high scalability for large simulation environments comprising a mix of thousands of VMs and PMs.

After giving an overview of how we introduced the concept of virtual and physical machine layers into the constraint-problem solving engine, we present the live migration model we implemented. Thanks to these extensions, users can now simulate the computation and communication of virtualized environments as well as investigate mechanisms involving VM live migration operations.

3.1 Adding a VM Workstation Model

In the extension for the VM support, we introduced the concept of an abstraction level in the core of SimGrid, i.e., the PM level and the VM level. This design enables us to leverage the constraint problem solver of SimGrid also for the VM support. No modification to the solver engine has been required.

Fig. 1 illustrates the design overview of SimGrid with the VM support. We added the virtual workstation model to the SURF layer, and also modified the SIMIX layer to be independent of the physical and virtual workstations. A workstation model is responsible for managing resources in the PM or VM layer. The virtual workstation model inherits most callbacks of the physical one, but implements VM-specific callbacks. When a process requests to execute a new computation task, the SIMIX layer calls the SURF API of the corresponding workstation model (i.e., depending on where the task is running, the PM or the VM workstation model is used). Then, the target workstation model creates a computation action and adds a new constraint problem into that layer.

In the SURF layer, the physical workstation model creates a PM resource object for each simulated PM. A PM resource object is composed of a CPU resource object and a network resource object. A CPU resource object has the capacity (flop/s, floating-point operations per second) of the PM. A network resource object corresponds to the network interface of the PM. A VM resource object basically has the same structure as a PM one, including a CPU resource object and a network object. However, a VM resource object holds a pointer to the PM resource object where the VM is running. It also has a dummy computation action, which represents the CPU share of the VM in the PM layer (i.e., a variable X_i in the following). Currently, we mainly focus on the simulation of CPU and network resources. Disk resource simulation will be integrated in the near future.
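The relationship between these objects can be pictured as follows; the field and type names below are hypothetical, chosen only to illustrate the description above.

  /* Hypothetical sketch of the SURF resource objects described above. */
  typedef struct cpu_resource cpu_resource_t;   /* capacity in flop/s        */
  typedef struct net_resource net_resource_t;   /* one network interface     */
  typedef struct action action_t;               /* a unit of CPU/network use */

  typedef struct pm_resource {
      cpu_resource_t *cpu;
      net_resource_t *net;
  } pm_resource_t;

  typedef struct vm_resource {
      cpu_resource_t *cpu;        /* VM-level CPU object                 */
      net_resource_t *net;        /* VM-level network object             */
      pm_resource_t  *host_pm;    /* the PM currently running this VM    */
      action_t       *dummy_cpu;  /* the VM's CPU share in the PM layer  */
  } vm_resource_t;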

As explained in Section 2, the SIMIX run function executes the processes of a simulation in a one-by-one fashion, and then requests each workstation model to calculate resource shares in each machine layer. We modified the SIMIX run function to be aware of the machine layers of the simulated system. In theory, it is possible to support the simulation of nested virtualization (i.e., executing a VM inside another VM) by adding another machine layer to the code.

The VM support solves constraint problems in two steps. First, the system solves the constraint problems of the PM layer, and obtains the values of how much resource is assigned to each VM (using the corresponding dummy action of each VM). Then, the system solves the constraint problems of the VM layer. From the viewpoint of a PM, a VM is considered as an ordinary task on the PM. From the viewpoint of a task inside a VM, the VM is considered as an ordinary host below the task.

Without the VM support, the solver engine solves all the constraint problems of a simulation at once. The left side of Fig. 2 shows a simple example where two computation tasks are executed on a PM. The PM has a CPU of capacity C (flop/s). The system then formulates a constraint problem,

$$X_1 + X_2 \le C \qquad (1)$$

Fig. 1. Design overview of the VM support in SimGrid. During a simulation, SimGrid repeats the following steps: (S1) The SIMIX run function switches the execution context to each process. (1-2-3) Each process calls an MSG API function, and the corresponding workstation model formulates constraint problems. (S2) The SIMIX run function requests to solve the constraint problems. Then, it updates the states of processes with the solved values and sets the simulation clock ahead.

Fig. 2. Resource share calculation with the VM support. C is the CPU capacity (i.e., the processing speed in flop/s) of a PM. X_i are the CPU resource shares assigned to tasks or VMs at the PM layer (from the host OS viewpoint, a VM is regarded as a task). X_{i,j} are those of the tasks running inside VM_i.


where X_1 and X_2 are the CPU shares of each task, respectively. If there are no other conditions, the solver engine assigns 50 percent of the computation capacity of the PM to each task.

The right side of Fig. 2 shows an example with the VM support. A PM has two VMs (VM1 and VM2) and a computation task. VM1 has two computation tasks, and VM2 has one computation task. First, the system formulates a constraint problem at the PM layer,

$$X_1 + X_2 + X_3 \le C \qquad (2)$$

where X_1, X_2, and X_3 are the CPU shares of VM1, VM2, and the task on the PM, respectively. If there are no other conditions, the solver engine assigns 33.3 percent of the computation capacity of the PM to each of VM1, VM2 and the task on the PM. Second, the system formulates constraint problems at the VM layer. For VM1, which executes two computation tasks, the solver engine makes

$$X_{1,1} + X_{1,2} \le X_1 \qquad (3)$$

where X_{1,1} and X_{1,2} are the CPU shares of the tasks on VM1. In the same manner, for VM2, which executes one computation task, the solver engine makes

$$X_{2,1} \le X_2 \qquad (4)$$

where X_{2,1} is the CPU share of the task on VM2. Thus, if there are no other conditions, each task on VM1 obtains 16.7 percent of the CPU capacity, while the VM2 one obtains 33.3 percent.

SimGrid allows end-users to set a priority for each task. This capability also works with the VM support. In the above example, if we give VM1 a twice-larger priority, VM1 obtains 50 percent of the computation capacity, while VM2 and the task on the PM each get only 25 percent. Additionally, we added a capping mechanism for the maximum CPU utilization of each task and VM. For example, we can cap the CPU utilization of VM1 at 10 percent of the capacity of the PM; even if we remove VM2 and the task on the PM, VM1 cannot obtain more than 10 percent. These features are useful to take virtualization overheads into account in simulations. In the real world, we can sometimes observe that the performance of a workload is degraded inside a VM. It is possible to simulate this kind of overhead by setting the priority and capping of tasks and VMs appropriately.
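At the API level, this could be expressed roughly as follows. This is a sketch: MSG_task_set_priority and MSG_vm_set_bound existed in the MSG API of that period, but exact names and signatures may differ between SimGrid releases.

  #include <simgrid/msg.h>

  /* Sketch: give a task twice the default priority and cap a VM at 10%
   * of its PM's capacity. 'task', 'vm1' and 'pm' are assumed to exist. */
  void tune_shares(msg_task_t task, msg_vm_t vm1, msg_host_t pm)
  {
      MSG_task_set_priority(task, 2.0);                     /* 2x weight */
      MSG_vm_set_bound(vm1, 0.10 * MSG_get_host_speed(pm)); /* 10% cap   */
  }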

The network resource calculation mechanism of the VM support is implemented in the same manner as the CPU mechanism. The network mechanism considers the resource contention on the PM and also that of shared network links.

3.2 Adding a Live Migration Model

Virtual machine monitors (VMMs) supporting live migration of VMs usually implement the precopy algorithm [6]. At a coarse grain, this algorithm transfers all memory pages to the destination PM before switching the execution host of a VM. Thus, one could erroneously envision that live migration operations can be simulated simply by one network exchange between the source and destination nodes. However, the precopy algorithm is more complex, and it is crucial to consider several parameters that clearly govern the time required to migrate a VM from one node to another. In this section, we first describe the successive steps of the precopy algorithm and show, by discussing several micro-benchmarks, that the memory update speed of the VM governs this algorithm. The design of our live migration model, which relies on this preliminary study, is finally introduced.

3.2.1 Live Migration Fundamentals

When a live migration is invoked for a particular VM, the VMM performs the following stages:

• Stage 1: Transfer all memory pages of the VM. Note that the guest operating system is still running at the source. Hence, some memory pages can be updated during this first transfer.

• Stage 2: Transfer the memory pages that have been updated during the previous copy phase. As in Stage 1, some memory pages will be updated during this second transfer. Hence, Stage 2 is iteratively performed until the number of updated memory pages becomes sufficiently small.

• Stage 3: Stop the VM. Transfer the rest of the memory pages and other negligibly small states (e.g., those of virtual CPU and devices). Finally, restart the VM on the destination.

Since Stage 3 involves a temporary pause of the VM, the VMM tries to minimize this downtime as much as possible, making it unnoticeable to users and applications. For example, the default maximum downtime of Qemu/KVM, the de facto Linux VMM [7], is 30 ms. During Stage 2, Qemu/KVM iteratively copies updated memory pages to the destination, until the size of the remaining memory pages becomes smaller than the threshold value that will achieve the 30 ms downtime. Consequently, the migration time mainly depends on the memory update speed of the VM and the network speed available to the migration traffic. A VM intensively updating memory pages will require a longer migration time, and in the worst case (i.e., the memory update speed is higher than the network bandwidth), the migration will not finish (i.e., not converge, in technical terms). Although libvirt [8] allows users to set a timeout value for a migration, this mechanism is not enabled in the default settings of Linux distributions.

3.2.2 Impact of the Memory Update Speed

In order to get an idea of the magnitude of the memory update speed in cloud applications, we performed preliminary experiments using representative workloads. This clarifies whether the memory update speed is large enough to be considered a predominant factor in a live migration model. First, we used a web server workload, and second a database server. We measured how the memory update speed changes in response to the CPU load change of the VM. Each workload runs on the guest OS of a VM.

To measure the memory update speeds, we extended Qemu/KVM to periodically output the current memory update speed of a VM. The VMM has a mechanism to track updated memory pages during a migration, i.e., dirty page tracking: the VMM maintains a bitmap recording the offsets of updated memory pages. With our extension, the VMM keeps dirty page tracking enabled; every second, it scans and clears the bitmap in order to count the number of updated pages. It is noteworthy that we did not observe noticeable CPU overhead due to this extension. Recent hardware supports dirty page tracking at the hardware level, and its CPU overhead is substantially small compared to the CPU consumption of a workload.

Web server. We set up a VM with one VCPU, 1 GB of memory, and one network interface on a PM. The network interface was bridged to the physical GbE network link. We configured an Apache-2.2 web server on the guest OS of the VM. The Apache server worked in the multi-threaded mode, handling each HTTP session with one thread. Another PM was used to launch the siege web server benchmark program [9], which randomly retrieves files available on the web server by performing HTTP GET requests. Static web contents had been generated in advance on the guest OS. The size of each file was 100 KB (i.e., a typical size of web content on the Internet) and the total size of the generated web contents was 2 GB (20 K files).

In order to investigate the impact of the page cache mechanism upon the memory update speed, we considered two particular cases:

• For the first case, the benchmark program randomly accessed all 2 GB of web contents. Because the RAM size of the VM is 1 GB, accessing more than 1 GB involves I/O accesses, since the whole contents cannot be cached by the guest OS. When a requested file is not in the page cache, the guest OS reads the content file from the virtual disk, involving memory updates.

• For the second case, we limited HTTP requests to only 512 MB of the web contents. The corresponding 5,000 files had been read on the guest OS before launching the benchmark program. By caching target files beforehand, we minimized memory updates due to page cache operations.

For both experiments, we gradually increased the number of concurrent accesses performed by the benchmark program: the number of concurrent sessions was increased by 16 every 60 seconds, up to 512. We measured the memory update speed and the CPU utilization every second. Figs. 3a and 3b show the correlation between the CPU utilization level of the VM and its memory update speed. When the number of concurrent sessions increased, the CPU utilization as well as the memory update speed became higher.

We expected that the memory update speed of the first case would be significant because of the refreshing of the page cache, and that of the second case would be small because all file contents were already cached. As shown in Fig. 3b, however, the second case also exhibits intensive memory updates (e.g., 30 MB/s at 60 percent CPU utilization), which are not negligible in comparison to the network bandwidth. Considering that the Apache server sends data through zero-copy operations (e.g., the use of sendpage(), or the combination of mmap() and send()), this memory update speed results from the pages used by the web server to manage HTTP sessions (i.e., the heap on the guest OS). The guest OS kernel will also update memory pages for TCP connections, receiving client requests and sending dynamically generated data (i.e., HTTP protocol headers).

Database server. The second study focuses on a postgresql-9.2.4 database server. The configuration of the VM was the same as that of the web server experiments. The pgbench benchmark [10] was used on the other PM. It emulates the TPC-B benchmark specification [11], which targets database management systems running batch applications and back-end database servers of that market segment. The default setting of the benchmark program aims at measuring the maximum performance of a database server and thus completely saturates the CPU utilization of the VM. To observe the behavior of the VM at various CPU utilization levels, we inserted a 20 ms delay at the end of each sequence of transaction queries.

After starting the experiment, we increased the number of concurrent database sessions by 2 every 60 seconds, up to 70 concurrent sessions. Similarly to the Apache experiments, we can observe in Fig. 3c a clear linear correlation between CPU utilization and memory update speed. Because the benchmark program was re-launched every 60 seconds with a new concurrency parameter, the sporadic points distant from the majority were caused by the establishment of new database sessions. As shown in the graph, there are intensive memory updates by the VM. In this experiment, when the CPU utilization was 60 percent, the memory update speed reached 40 MB/s (30 percent of the theoretical GbE bandwidth).

To conclude, although the points are scattered in the different graphs, we can observe a roughly proportional correlation between the CPU usage and the memory update speed in these experiments. Such a correlation is important, as it enables determining the memory update speed of a VM according to its CPU usage.

Fig. 3. The correlation between CPU utilization and memory update speed.

Summary. Through the above experiments, we have seen that the memory update speed can be quite significant in comparison with the network bandwidth. A naive model, which simply obtains the cost of a live migration by dividing the VM memory size by the available network bandwidth, is therefore not appropriate, since the memory update speed determines the duration of Stage 2 of the precopy algorithm. As an example, the naive model estimates the migration of the aforementioned database VM at 8 seconds (i.e., the time required for transferring 1 GB of data over a 1 Gbps link) in all situations. This can be dramatically far from the migration performance observed in the real world once the VM becomes active: when the database VM was utilizing 60 percent of the CPU resource, the live migration time of this VM was approximately 12 seconds, and the data transferred during the migration reached approximately 1,500 MB. This corresponds to 1.5 times the migration cost estimated by the naive model. Fig. 4 presents the theoretical estimation of the migration cost for a VM (1 GB RAM). The graphs clearly reveal that it is critical to consider the impact of memory updates in order to make an accurate estimation of the migration time and the generated network traffic.
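In equation form, the naive estimate and the measured reality compare as follows (numbers taken from the paragraph above, using decimal units):

$$T_{\text{naive}} = \frac{R}{B} = \frac{1\ \text{GB}}{1\ \text{Gbps}} = \frac{8 \times 10^9\ \text{bit}}{10^9\ \text{bit/s}} = 8\ \text{s}, \qquad T_{\text{real}} \approx 12\ \text{s}, \quad \text{traffic} \approx 1{,}500\ \text{MB} = 1.5\,R.$$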

Moreover, the time to perform the first two stages is easily and greatly affected by the activities of other VMs and workloads running in the system:

• From the network point of view, the traffic of a live migration can suffer from other network exchanges. For instance, the workload of a VM may create network traffic that competes with the migration one. Similarly, such competition occurs when multiple live migrations are performed simultaneously. In all these cases, the network bandwidth available to a live migration dynamically changes.

• When several VMs are co-located, they compete with each other for CPU resources. This contention may lead to performance degradation of workloads. Hence, the memory update speed of a VM dynamically changes due to the activities of other co-located VMs.

A naive model that miscalculates migration times and possibly under- or over-estimates the corresponding network traffic will produce erroneous simulation results, far from real-world behaviors. To accurately estimate the impact of the memory update speed, the resource sharing among workloads and VMs must be considered. In other words, the simulation framework should consider contention on both virtualized and non-virtualized systems: if VMs are co-located on a physical machine, the system should compute a correct CPU time for each VM; if a network link is shared by multiple data transfers, including those of migrations, the system needs to assign a correct bandwidth to each transfer.

3.2.3 Implementation

The live migration model we implemented in SimGrid performs the same operations as the precopy algorithm described in Section 3.2.1. We denote the memory size of a VM as R. The memory update intensity of the VM is given by a parameter α (bytes/flop), which denotes how many bytes of memory pages are marked dirty while one CPU operation (in the simulation world) is executed. We have published a Qemu extension (Qemu-DPT, dirty page tracking) to easily measure the current memory update speed of a VM, which enables SimGrid users to easily determine the memory update intensity of the workloads they want to study. For example, a user launches a VM with Qemu-DPT, starts a workload on the VM, changes the request rate of the workload, and measures the current CPU utilization level and the memory update speed. Users can easily confirm whether there is a linear correlation and determine the value of the correlation coefficient (i.e., the memory update intensity of the workload). If a linear correlation is not appropriate, it is possible to derive another correlation from the measured data.

During a live migration, our model repeats data transfers until the end of Stage 3. All transfers are simulated by using the send/recv operations of the SimGrid MSG API. We define the data size of the i-th transfer as X_i bytes. At Stage 1, the precopy algorithm sends all the memory pages, i.e., X_1 = R. In Stage 2, the algorithm sends the memory pages updated during the previous data transfer. Based on our preliminary experiments, we assume that there will be many situations where the memory update speed is roughly proportional to the CPU utilization level, and we use a linear correlation function as the first example of a correlation function: the size of the memory pages updated during a data transfer is proportional to the total amount of CPU shares assigned to the VM during this period. Thus, we obtain the data transfer size of the next phase as X_{i+1} = min(αS_i, R'), where S_i (in floating-point operations, flops) is the total amount of CPU shares assigned to the VM during the i-th data transfer (as a reminder, S_i is computed by the solver engine as described in Section 3.1), and R' is the memory size of the working set used by the workloads on the VM. The size of updated memory pages never exceeds the working set size.
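To make the recurrence concrete, the following self-contained program iterates it with assumed numbers (a 2 GB VM with a 90 percent working set, a 1 Gbps link, and the database workload's 60 MB/s update speed at full CPU). It illustrates the model, not the SimGrid implementation; in particular, the real model estimates the migration bandwidth from observed throughput, while this toy uses the nominal value.

  #include <stdio.h>

  /* Toy iteration of X_{i+1} = min(alpha * S_i, R') with assumed numbers. */
  int main(void)
  {
      double R      = 2e9;          /* VM memory size (bytes)             */
      double R_ws   = 0.9 * R;      /* working set size R' (bytes)        */
      double B      = 125e6;        /* migration bandwidth (bytes/s)      */
      double update = 60e6;         /* memory update speed (bytes/s)      */
      double d      = 0.030;        /* max downtime (s)                   */

      double X = R, time = 0.0, traffic = 0.0;
      while (X > B * d) {           /* Stage 1 and the Stage 2 rounds     */
          double t = X / B;         /* duration of this transfer          */
          time    += t;
          traffic += X;
          double next = update * t; /* bytes dirtied meanwhile            */
          X = next < R_ws ? next : R_ws;
      }
      time    += X / B;             /* Stage 3: final copy while paused   */
      traffic += X;
      printf("migration time %.1f s, traffic %.0f MB\n", time, traffic / 1e6);
      return 0;
  }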

The simulation framework formulates constraint problems to solve the duration of each data transfer, based on the network speed, the latency, and the other communication tasks sharing the same network links. If the maximum throughput of the migration traffic, B, is given, the model throttles the transfer speed of the migration data so as not to exceed this throughput. This parameter corresponds to migrate-set-speed in the libvirt API [8].

Fig. 4. Impact of memory updates on migration cost, for a VM with 1 GB of RAM over a 1 Gbps link. The values are theoretically estimated according to the precopy algorithm.

Every time migration data is sent in the simulation world, the model estimates the bandwidth available to the ongoing live migration, in the same manner as the hypervisor does in the real world. The current throughput of the migration traffic is estimated by dividing the size of the sent data by the time required for sending it. This estimated value is used to determine the acceptable size of the remaining data when moving from Stage 2 to Stage 3. If the maximum downtime d is 30 ms (i.e., the default value of Qemu), and if the migration bandwidth is estimated at 1 Gbps, the remaining data size must be less than 3,750 KB in Stage 3. The migration mechanism repeats the iterations of Stage 2 until this condition is met.
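That is, Stage 2 iterates until the remaining data X_n satisfies

$$X_n \le B \times d = 125\ \text{MB/s} \times 0.030\ \text{s} = 3{,}750\ \text{KB}.$$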

Finally, at Stage 3, our model creates a communication task of X_n bytes to transfer the rest of the memory pages. After this final task ends, the system switches the execution host of the VM to the destination.

Since a linear correlation explains the memory update speed in typical situations, we implemented it as the default correlation function. However, the proportional function used in the current implementation is just one example of a correlation function estimating the memory update speed of a VM. If this proportional function is not appropriate for a situation, it is possible to define another correlation function.

The correlation function of the memory update speed can be defined not only with the CPU utilization level but also with any other variable available in the simulation world. For example, the request rate of an application can be used, if it explains the memory update speed of the VM. We consider that in most cases a user will only need to modify one function, get_update_size(), in the VM extension, which calculates the size of the updated memory pages.
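As a sketch, the default linear correlation with its working-set cap could look like the code below. Only the hook name get_update_size() comes from the text above; the constants and the exact signature are assumptions.

  /* Hypothetical sketch of the default correlation: memory dirtied during
   * a transfer is proportional to the CPU shares (flops) the VM obtained
   * meanwhile, capped by the working set size R'. */
  static double get_update_size(double computed_flops)
  {
      const double alpha       = 0.06;  /* bytes dirtied per flop (assumed) */
      const double working_set = 1.8e9; /* R' in bytes (assumed)            */
      double dirtied = alpha * computed_flops;
      return dirtied < working_set ? dirtied : working_set;
  }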

3.3 SimGrid VM API (C and Java)

In our work, we extended the MSG programming API in order to manipulate a VM resource object, as shown in Table 1. Each operation in this API corresponds to a real-world VM operation such as create/destroy, start/shutdown, suspend/resume and migrate. We newly defined the msg_vm_t structure, which represents a VM object in the simulation world and supports these VM APIs. It should be noted that a VM object inherits all features from a PM object (msg_host_t). A VM resource object supports most existing operations of a PM, such as task creation and execution. From the viewpoint of users, a VM can be treated as an ordinary host, except that it additionally supports these VM-specific operations.

As discussed in Section 3.2.3, users need to specify the memory size of a VM, the memory update intensity of the VM, and the size of the working set of memory used by the workload on the VM. These parameters are specified either at VM creation or through the MSG_VM_set_params() function.

Fig. 5 shows an example code using the VM APIs. example() starts a VM on the given PM, and launches a worker process on the VM. The way of launching a process is exactly the same as for a PM; we can use MSG_process_create() also for a VM.

Although this example is in C, it is noteworthy that the Java SimGrid API has also been extended. Hence, end-users can develop their simulators either by interacting with the native C routines or by using the Java bindings.
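Since the code body of Fig. 5 does not survive the extraction, the following is a hedged sketch of the pattern it illustrates. The function names follow the MSG VM API of that period, but exact signatures may vary between SimGrid releases, and worker_main is a hypothetical process function.

  #include <simgrid/msg.h>

  /* Hypothetical worker: run one computation task, then exit. */
  static int worker_main(int argc, char *argv[])
  {
      msg_task_t work = MSG_task_create("work", 1e9, 0, NULL); /* 1 Gflop */
      MSG_task_execute(work);
      MSG_task_destroy(work);
      return 0;
  }

  /* Start a VM on the given PM and launch a process on it. */
  static void example(msg_host_t pm)
  {
      msg_vm_t vm = MSG_vm_create_core(pm, "VM0");
      MSG_vm_start(vm);
      /* A VM is treated as an ordinary host: same call as for a PM. */
      MSG_process_create("worker", worker_main, NULL, (msg_host_t) vm);
  }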

Finally, we highlight that we also extended the multicore support of SimGrid to allow the simulation of virtualized systems running on multicore servers. The set_affinity function pins the execution of a VM (or a task) to given CPU cores.

4 EVALUATION OF VM EXTENSIONS

In order to confirm the correctness and the accuracy of the VM extensions within SimGrid, we conducted several micro-benchmark experiments on Grid'5000 and compared their results with the simulated ones. We discuss the major ones in this section.

4.1 Experimental Conditions

In this section, we give the details that enable reproducing the experiments discussed in the next sections.

All experiments in the real world were conducted on the Grid'5000 Graphene cluster. Each PM has one Intel Xeon X3440 (4 CPU cores), 16 GB of memory, and a GbE NIC. The hardware virtualization mechanism (i.e., Intel VT) was enabled.

TABLE 1. The APIs to manipulate a VM resource object in the VM support. (The table itself is not reproduced here.)

Fig. 5. An example code using the VM APIs.

We used Qemu/KVM (Qemu-1.5 and Linux-3.2) as the hypervisor in the experiments. Assuming long-lived active VMs rather than idle VMs that never become active after being booted, we modified a few lines of the Qemu source code to disable the mechanism that avoids transferring zero-filled pages. This mechanism does not work effectively if the VM has been running for a while with active workloads; in such a case, non-zero data already exists in most memory pages. Moreover, Qemu-1.5 also supports the XBZRLE compression of migration data [12]. This mechanism, which is disabled in the default settings of major Linux distributions, picks up the updated regions of pages and sends them compressed (even though a memory page is marked as dirty, only a few bytes in the page may have been updated, so selecting only the updated data reduces the migration traffic). Although it would be possible to extend SimGrid to simulate the behavior of such compression mechanisms, we chose to focus our study on the development of a sound model that captures behaviors common among hypervisors, rather than on implementation-specific details of a particular hypervisor. Hence, we kept this mechanism disabled. The virtual disk of a migrating VM is shared between the source and destination physical machines by an NFS server; this is a widely-used storage configuration in cloud computing platforms. To cover more advanced configurations, we are extending our model to support virtual disks and understand the impact of I/O-intensive workloads. This effort will enable us to simulate the relocation of the associated VM images, which will be reported in our future work.

We used the Execo automatic deployment engine [13] to describe and perform the experiments using its Python API. According to the scenario of an experiment, the Execo deployment engine automatically reserves, installs and configures the nodes and network resources that are required before invoking the scripts of the experiment. This mechanism allows us to easily run and reproduce our experiments. All experiments were repeated at least five times, with no noticeable deviation in the obtained results. The same experiments were then also conducted with SimGrid.

Although the VM extension of SimGrid supports multicore CPU simulation, in the following micro-benchmarks the VMs were pinned to the first CPU core, both in the real-world and in the simulation experiments, so as to carefully discuss how resource contention impacts live migration performance.

Finally, in order to carefully investigate live migration behaviors, we developed a memory update program, memtouch, emulating the various load conditions produced by real applications. The memtouch program works as a workload that has a linear correlation between CPU utilization levels and memory update speeds. It produces the memory update speed corresponding to a given CPU utilization level by interleaving busy loops, memory updates and micro-sleeps in an appropriate ratio, as explained in our previous work [14].

The memory update program accepts two kinds of parameters. One is a target CPU utilization level (percent). The other is a memory update intensity value that characterizes an application. For example, we observed that the database workload in Section 3.2.2 had a linear correlation between memory update speeds (Y) and CPU utilization levels (X), i.e., Y = αX. This α is the key parameter modeling the memory update behavior of this database workload. From Fig. 3c, we can roughly estimate the memory update intensity α to be 60 MB/s at 100 percent CPU. This 60 MB/s is passed as an argument to the memory update program. If the given target CPU utilization level is 50 percent, the memory update speed of the program becomes 30 MB/s. Moreover, if other workloads or co-located VMs compete for CPU resources and the memory update program only gets 25 percent, the actual memory update speed becomes 15 MB/s. This behavior correctly emulates what happens in consolidated situations.
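A minimal sketch of such a generator is shown below. The real memtouch differs in its details [14]; the slice length and calibration here are assumptions.

  #include <stddef.h>
  #include <time.h>
  #include <unistd.h>

  static long elapsed_ns(const struct timespec *t0)
  {
      struct timespec t1;
      clock_gettime(CLOCK_MONOTONIC, &t1);
      return (t1.tv_sec - t0->tv_sec) * 1000000000L + (t1.tv_nsec - t0->tv_nsec);
  }

  /* Sketch of a memtouch-like loop: each 10 ms slice dirties
   * alpha * target_cpu * 0.01 bytes, burns the rest of the CPU budget in
   * a busy loop, then sleeps the remainder of the slice. */
  static void memtouch_loop(volatile char *buf, size_t size,
                            double target_cpu,  /* 0.0 .. 1.0            */
                            double alpha)       /* bytes/s at 100% CPU   */
  {
      const long slice_ns = 10000000L;          /* 10 ms                 */
      size_t pos = 0;
      for (;;) {
          struct timespec t0;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          long busy_ns = (long)(target_cpu * slice_ns);
          size_t bytes = (size_t)(alpha * target_cpu * (slice_ns / 1e9));
          for (size_t i = 0; i < bytes; i += 4096) { /* one byte per page */
              buf[pos] ^= 1;
              pos = (pos + 4096) % size;
          }
          while (elapsed_ns(&t0) < busy_ns)     /* burn remaining budget  */
              ;
          usleep((useconds_t)((slice_ns - busy_ns) / 1000));
      }
  }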

4.2 Evaluation of the VM Workstation Model

The CPU resource allocation of a VM impacts its memory update speed, and the memory update speed is a dominant parameter governing live migration time. Before conducting the experiments focusing on live migration, we therefore confirmed that the VM support of SimGrid correctly calculates CPU and network resource allocations for VMs.

We launched two VMs and four computation tasks in the same arrangement as the right side of Fig. 2: two VMs (VM1 and VM2) are launched on a PM; VM1 has two computation tasks, and VM2 has one computation task; the PM also has a computation task. All tasks tried to obtain 100 percent CPU utilization, competing with each other. We used the memory update intensity of the database benchmark discussed above (i.e., 60 MB/s at 100 percent CPU utilization).

Fig. 6a shows the CPU utilization level of each task in a real-world experiment on Grid'5000. First, VM1, VM2, and the task on the PM fairly shared the CPU resource of the PM, each consuming approximately 33.3 percent. On VM1, each task consumed approximately 50 percent of the CPU share assigned to VM1. At 30 seconds, we suspended VM1, which was running the two tasks. Thus, the task on VM2 and the task on the PM each consumed approximately 50 percent. At 60 seconds, we resumed VM1; the CPU utilization of each task recovered to its initial state. At 90 seconds, we shut down VM2. Then, the CPU shares of VM1 and of the task on the PM increased to approximately 50 percent each. These results are reasonable, considering that the hypervisor fairly assigns CPU resources to each VM and to the task on the PM.

Fig. 6. The CPU load of each task in a CPU resource contention experiment. In each graph, from top to bottom, the CPU usages of the task on VM2, the task on the PM, Task1 on VM1, and Task2 on VM1 are illustrated, respectively.

The same experiment was performed in simulation. As shown in Fig. 6b, the VM support of SimGrid correctly captured the CPU load change of each task for the most part. However, there are minor differences between the real-world and simulation experiments, especially just after a VM operation (i.e., suspend/resume or shutdown) is invoked. For example, the shutdown of VM2 took approximately 3 seconds to complete. When the suspension of a VM is invoked, the guest OS stops all the processes on it, and then commits all pending write operations to the virtual disks. In the real world, these operations sometimes have an impact on the other tasks on the same PM. We consider that, if a user needs to simulate VM behaviors in further detail, it is possible to add the overhead of VM operations into the VM model. As mentioned earlier, we are working, for example, on modeling VM boot and snapshotting costs related to I/O accesses to the VM images.

The second experiment we conducted aimed at observing the behavior of VMs under network resource contention. Because live migration time is also impacted by network resource availability, the VM support of SimGrid needs to correctly calculate the network resources assigned to each communication task. Although a former study on SimGrid [1] already proved the correctness of its network model, that study was done before our VM support was available. It was therefore necessary to confirm that the network model also works correctly at the VM level.

We launched two VMs and four communication tasks in the same arrangement as in the previous experiment, except that four communication tasks were launched instead of computation tasks. Each communication task continuously sent data to another PM, and the destination PM of each communication task was different.

Fig. 7a shows the result of a real-world experiment. iperf was used to send data to the destinations. The host operating system fairly assigned network resources to each TCP connection. For example, while VM1 was suspended, the two communication tasks on it could not send data, and the bandwidth of the 1 Gbps link was split between the two other TCP connections. As shown in Fig. 7b, the VM support of SimGrid correctly captured this behavior.

It should be noted that, in the real-world experiment, the packet queuing discipline of the physical network interface of the source PM needed to be set to Stochastic Fairness Queuing (SFQ), which enforces bandwidth fairness among TCP connections more strongly than the Linux default (basically, first-in first-out). Although the TCP algorithm is designed to achieve fairness among connections, we experienced that, without SFQ, the network traffic generated by the task running on the PM occupied more than 90 percent of the available bandwidth of the network interface. We consider that this problem was caused by the difference in packet flow paths between the task on the PM and the other tasks running on VMs. Although the latest libvirt code in its development repository uses SFQ as we did, the libvirt-0.9.12² used in the experiments did not.

In summary, through these experiments, we confirmed that the VM support of SimGrid correctly calculates CPU and network resource assignments to VMs and tasks. The prerequisite for obtaining sound simulation results of live migrations is met.

4.3 Live Migrations with Various CPU Levels

We conducted a series of experiments to confirm the correctness of the live migration model. The settings of the VMs were the same as in the above experiments. The memory update intensity was set to that of the database benchmark (i.e., 60 MB/s at 100 percent CPU utilization). The memory size of the VM was 2 GB, and the size of the working set memory was set to 90 percent of it.

Fig. 8 shows live migration times at different CPU utilization levels. The graph compares the real experiments on Grid'5000, the simulation experiments using our migration model, and the simulation experiments with a naive migration model (i.e., one not considering memory updates). These results confirm the preliminary investigations of Section 3.2.1: as the CPU utilization level of the VM increased, the live migration time of the VM became longer. Our simulation framework, implementing the precopy algorithm, successfully reproduced this upward curve, within a 2-second deviation (9 percent at most).

On the other hand, the naive simulation without the precopy algorithm failed to calculate correct migration times, especially at higher CPU utilization levels. In the 100 percent CPU case, the naive simulation underestimated the migration time by 50 percent. Simulation frameworks without such a migration model, e.g., CloudSim, will produce these erroneous results. The sketch below contrasts the two models.
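To make the difference between the two models concrete, the following self-contained sketch (plain C, independent of SimGrid, and not the exact SimGrid model) contrasts a naive estimate with an iterative precopy estimate in which pages dirtied during one copy round must be re-sent in the next. The parameter values come from this section; the stop-and-copy threshold and the round limit are assumptions for illustration.

  #include <stdio.h>

  #define MB (1024.0 * 1024.0)

  /* Naive model: one bulk copy of the whole RAM. */
  static double naive_migration_time(double ram, double bw)
  {
      return ram / bw;
  }

  /* Precopy model: iterate Stage 2 until the remaining dirty data fits
   * under a stop-and-copy threshold, or give up when it never shrinks. */
  static double precopy_migration_time(double ram, double bw,
                                       double dirty_rate, double working_set,
                                       double threshold, int max_rounds)
  {
      double total = 0.0, to_send = ram;      /* Stage 1: full RAM copy */
      for (int round = 0; round < max_rounds; round++) {
          double t = to_send / bw;
          total += t;
          /* Stage 2: pages dirtied meanwhile, capped by the working set. */
          to_send = dirty_rate * t;
          if (to_send > working_set)
              to_send = working_set;
          if (to_send <= threshold)
              return total + to_send / bw;    /* Stage 3: stop-and-copy */
      }
      return -1.0;  /* no convergence: dirty rate >= bandwidth */
  }

  int main(void)
  {
      double ram   = 2048 * MB;   /* 2 GB VM (Section 4.3) */
      double bw    = 1e9 / 8;     /* 1 Gbps link, in bytes/s */
      double dirty = 60 * MB;     /* 60 MB/s at 100 percent CPU */
      double ws    = 0.9 * ram;   /* 90 percent working set */
      printf("naive  : %5.1f s\n", naive_migration_time(ram, bw));
      printf("precopy: %5.1f s\n",
             precopy_migration_time(ram, bw, dirty, ws, 1 * MB, 1000));
      return 0;
  }

With these inputs the naive model yields roughly 17 seconds, whereas the precopy model converges to roughly 35 seconds, consistent with the 50 percent underestimation reported above.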

The small differences between the Grid'5000 results and our precopy model (e.g., up to 2 seconds in Fig. 8) can be explained by the fact that, in addition to the memtouch program, other programs and the guest kernel itself consume CPU cycles and slightly update memory pages of the VM in reality. Furthermore, because the system clock of the guest OS is less accurate than that of the host OS, the memory update speed of the program deviates slightly from the target speed. We will show that this deviation can be problematic in a few corner cases where the memory update speed and the available migration bandwidth are close to each other.

4.4 Live Migrations under CPU Contention

A live migration is deeply impacted by CPU resource contention on the source and destination PMs.

Fig. 7. The network throughput of each task in a network resource contention experiment. In each graph, from top to bottom, the network throughput of the Task on VM2, the Task on PM, Task1 on VM1, and Task2 on VM1 is illustrated, respectively.

Fig. 8. Comparison of live migration times. The X axis shows CPU utilization levels and corresponding memory update speeds. The memory update speed of the workload is 60 MB/s at 100 percent CPU.

2. It is included in the latest stable release (wheezy) of the Debian GNU/Linux distribution.



We performed migration experiments with different numbers of co-located VMs. In addition to the VM to be migrated, other VMs were launched on the source or destination PM. To study the serious resource contention sometimes found in dynamic VM packing systems, all the VMs were pinned to the first physical CPU core of the PM. The cpuset feature of libvirt [8] was used in the real-world experiments, and MSG_vm_set_affinity() was called in the simulation programs (see the sketch below). All the VMs executed the memtouch program on their guest OSes. The target CPU utilization of memtouch was set to 100 percent; although all the VMs on the PM tried to obtain 100 percent of the CPU resource, they actually obtained only a share of it due to co-location. Qemu/KVM fairly assigns CPU resources to each VM.3
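The following is a minimal sketch of how this co-location setup can look in a simulation program. MSG_vm_set_affinity() is the call named above; the MSG_vm_create() signature and its parameter meanings (RAM in MB, migration speed in MB/s, dirty-page intensity in percent) follow the SimGrid 3.11 examples as we recall them and should be treated as assumptions.

  #include <stdio.h>
  #include "msg/msg.h"

  static void launch_pinned_vms(msg_host_t pm, int nb_vms)
  {
      for (int i = 0; i < nb_vms; i++) {
          char name[16];
          snprintf(name, sizeof(name), "VM%d", i);
          /* 1 vCPU, 2 GB of RAM; migration speed and dirty-page intensity
           * roughly as in Section 4.1 (values here are illustrative). */
          msg_vm_t vm = MSG_vm_create(pm, name, 1, 2048, 125, NULL, 0,
                                      125, 90);
          MSG_vm_start(vm);
          /* Pin the VM to the first physical core (bit 0 of the mask). */
          MSG_vm_set_affinity(vm, pm, 0x01);
      }
  }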

Fig. 9a shows live migration times when the other VMs were launched on the source PM. As the number of co-located VMs grew, the actual live migration times decreased, approaching that of the no-memory-update case (i.e., approximately 20 seconds). This serious CPU resource contention reduced the actual memory update speed of the migrating VM, which resulted in shorter migration times. Our simulation framework correctly calculated the assigned CPU resources and captured this migration behavior.

Fig. 9b shows the case where the other VMs were launched on the destination PM. Because the migrating VM, which updates memory pages on the source PM, did not compete with the other VMs for CPU resources, the live migration times were not affected by the number of other VMs. This behavior was successfully reproduced by our simulation framework with the precopy model.

When there were other VMs on the destination PM, the migration times on Grid'5000 were slightly longer than that of the 1-VM case. During a live migration, the hypervisor launches a dummy VM process on the destination PM, which receives the migration data from the source PM. Although this receiving operation requires very little CPU resource, the data receiving speed was slightly reduced due to the CPU resource contention on the destination PM. We consider that the CPU overheads of sending and receiving migration data are negligible in most use-cases because they are substantially small compared to the resource usage of VM workloads. However, if a user of our simulation framework needs to carefully simulate such behaviors, it is possible to create micro computation tasks corresponding to the data sending/receiving overheads by leveraging the SimGrid MSG API. In addition, as discussed in Section 4.3, there were also small deviations between the results of Grid'5000 and the simulation. If these deviations are not negligible, it is also possible to improve the simulation mechanism by taking into account the CPU cycle consumption and memory updates of the guest kernel and other processes.

4.5 Live Migrations under Network Contention

A live migration time is also deeply impacted by the network bandwidth available for the migration. The hypervisor uses a TCP connection to transfer the migration data of a VM. It should be noted that, from the viewpoint of the host OS, this is a normal TCP connection opened by a userland process (i.e., the Qemu process of a VM). On the host OS and the network link, there is no difference between the TCP connection of a live migration and other TCP connections (including the NFS sessions used for disk I/O of VMs). We limited the available bandwidth for a migration to 25, 50, 75 and 100 percent of the GbE throughput, using the virsh migrate-setspeed command. These experiments are intended to correspond to real-world situations where concurrent live migrations are performed at once, or where live migration traffic is affected by other background traffic. Because disk I/O of a VM generates NFS requests, these experiments also cover situations where the disk I/O of VMs impacts live migration traffic. Figs. 9c, 9d, 9e, and 9f compare real and simulated migration times. The smaller the available bandwidth, the more seriously the simulation with the naive model underestimated live migration times. The real migration times increased dramatically as the available bandwidth became smaller. The precopy live migration model correctly simulated these trends in most cases. If the available bandwidth is smaller than the actual memory update speed of the VM, the precopy live migration algorithm never finishes: the iterative memory copy phase (i.e., Stage 2 in Section 3.2.1) continues until someone cancels (i.e., gives up) the live migration. This behavior was correctly simulated at the 60, 80 and 100 percent CPU utilization levels of the 25 percent GbE bandwidth case (Fig. 9c).

The only exceptional case where the precopy model did not follow the results of the real experiments was at the 100 percent CPU utilization level of the 50 percent GbE bandwidth case (Fig. 9d). The simulation using the precopy model predicted that this live migration would never finish, although the live migration on Grid'5000 finished in 330 seconds. In this condition, the migration bandwidth was 50 percent of 1 Gbps (i.e., 59.6 MB/s), while the memory update speed was 60 MB/s. Because the migration bandwidth is theoretically smaller than the memory update speed, the precopy migration model iterated Stage 2 of the precopy algorithm forever. On the other hand, as discussed in Section 4.3, since the actual memory update speed of the VM was slightly below the migration bandwidth, the real live migration finished in a finite period of time.
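The corner case boils down to a simple threshold test; the tiny helper below (illustrative, not part of SimGrid) makes the arithmetic of this paragraph explicit.

  #include <stdbool.h>
  #include <stdio.h>

  /* Precopy converges only if the migration bandwidth strictly exceeds
   * the effective memory update speed. */
  static bool precopy_converges(double bw_mb_s, double dirty_mb_s)
  {
      return bw_mb_s > dirty_mb_s;
  }

  int main(void)
  {
      double bw    = 0.5 * 1e9 / 8 / (1024 * 1024); /* 50% of 1 Gbps = 59.6 MB/s */
      double dirty = 60.0;                          /* target update speed */
      printf("converges in theory? %s\n",
             precopy_converges(bw, dirty) ? "yes" : "no");  /* prints "no" */
      /* In reality the guest updated memory slightly slower than 60 MB/s,
       * so the Grid'5000 migration did finish (in 330 seconds). */
      return 0;
  }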

We consider that this problem will not appear in most cases. However, if a user needs to simulate such an exceptional situation, where the migration bandwidth and the memory update speed of a VM are very close, it is necessary to give careful consideration to the accuracy of these simulation parameters. A simulation program may need to consider subtle memory updates by the guest kernel and the network bandwidth fluctuations caused by other background traffic.

To conclude, we can affirm that our experiments validated the accuracy of our extensions in most cases.

5 A FIRST USE CASE

To strengthen confidence in the accuracy of our extensions to the SimGrid toolkit and also to illustrate their usefulness, we implemented a first program dealing with the well-known VM placement problem, a combinatorial optimization problem that determines which VMs should be placed on which PMs. In this section, our VM placement mechanism dynamically migrates VMs among PMs in order to resolve the overloading of PMs.

3. It should be noted that even if multiple VMs share one PM, the number of memory pages associated with each VM does not change. The hypervisor assigns 2 GB of memory pages to each VM.




We believe that such a use-case is relevant as it will enable researchers to investigate new VM placement policies without performing large-scale in-vivo experiments.

5.1 Overview

The framework is composed of two processes. The first one creates n VMs, each of which is based on one of the predefined VM classes. A VM class is a template of the specification of a VM and its workload, described as nb_cpu:ramsize:net_bw:mig_speed:mem_speed. VMs are launched on PMs in a round-robin manner, i.e., each PM hosts almost the same number of VMs. Next, the process repeatedly changes the target CPU loads of the VMs: every t seconds, it selects one VM and changes its CPU load according to a Gaussian distribution, where t is a random variable that follows an exponential distribution with rate parameter λ. The Gaussian distribution is defined by a mean (μ) and a standard deviation (σ) given at each simulation. A sketch of this load-injection process is shown below.
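The following self-contained sketch (plain C, outside SimGrid) illustrates the load-injection process. The VM-class template and both distributions come from the text; vm_class_t, parse_vm_class() and the numeric units of the example string are hypothetical illustrations.

  #include <math.h>
  #include <stdio.h>
  #include <stdlib.h>

  #ifndef M_PI
  #define M_PI 3.14159265358979323846
  #endif

  typedef struct {
      int nb_cpu, ramsize, net_bw, mig_speed, mem_speed;
  } vm_class_t;

  /* Parse "nb_cpu:ramsize:net_bw:mig_speed:mem_speed" (numeric fields). */
  static int parse_vm_class(const char *spec, vm_class_t *c)
  {
      return sscanf(spec, "%d:%d:%d:%d:%d", &c->nb_cpu, &c->ramsize,
                    &c->net_bw, &c->mig_speed, &c->mem_speed) == 5;
  }

  /* Inter-event time t ~ Exp(lambda), by inverse-transform sampling. */
  static double exp_rand(double lambda)
  {
      return -log(1.0 - drand48()) / lambda;
  }

  /* Target CPU load ~ N(mu, sigma) (Box-Muller), clamped to [0, 100]. */
  static double gauss_load(double mu, double sigma)
  {
      double n = sqrt(-2.0 * log(1.0 - drand48())) *
                 cos(2.0 * M_PI * drand48());
      double load = mu + sigma * n;
      return load < 0.0 ? 0.0 : (load > 100.0 ? 100.0 : load);
  }

  int main(void)
  {
      vm_class_t c;
      parse_vm_class("1:2048:1000:1000:60", &c);  /* e.g., MB / Mbps units */

      int nb_vms = 192;
      double lambda = nb_vms / 300.0; /* each VM changes every ~5 min on avg */
      double clock = 0.0;
      srand48(42);                    /* fixed seed => reproducible runs */
      while ((clock += exp_rand(lambda)) < 3600.0) {
          int vm = (int)(lrand48() % nb_vms);  /* pick one VM uniformly */
          printf("t=%7.1f s: VM%-3d -> %5.1f%% CPU\n",
                 clock, vm, gauss_load(60.0, 20.0));
      }
      return 0;
  }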

The second process controls the relocation of the VMs each time it is needed. Every 60 seconds, the process checks the CPU loads of the VMs and, if it detects an overloaded PM (i.e., a PM that cannot satisfy the expectations of its VMs), it invokes the Entropy algorithm [15], [16] to solve the VM placement problem. Roughly speaking, the Entropy process is composed of two phases. During the first one, the computation phase, Entropy looks for a viable placement by solving a constraint satisfaction problem, then it calculates an optimal reconfiguration plan (i.e., the order of migrations that should be performed to switch from the previous configuration to the new one). The second phase consists in applying the returned reconfiguration plan. It is noteworthy that, because the VM placement problem is NP-hard, the duration of the computation phase is time-bounded by a predefined value set to min(nb_nodes/8, 300). A sketch of this control loop is given below.
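The following is a minimal sketch of the second process (the reconfiguration loop). The 60-second check period and the min(nb_nodes/8, 300) timeout come from the text; cluster_overloaded(), entropy_compute_plan() and apply_plan() are hypothetical stubs standing in for the Entropy invocation.

  #include <stdio.h>
  #include <stdlib.h>

  typedef struct { int nb_migrations; } plan_t;

  static int cluster_overloaded(void) { return rand() % 4 == 0; }  /* stub */

  static plan_t *entropy_compute_plan(double timeout_s)    /* phase 1 */
  {
      /* A real implementation solves a constraint satisfaction problem,
       * giving up after timeout_s seconds; this stub always "succeeds". */
      (void)timeout_s;
      plan_t *p = malloc(sizeof(*p));
      p->nb_migrations = rand() % 10;
      return p;
  }

  static void apply_plan(plan_t *p)                        /* phase 2 */
  {
      printf("applying plan: %d migration(s)\n", p->nb_migrations);
      free(p);
  }

  int main(void)
  {
      int nb_nodes = 32;
      /* The constraint solver is time-bounded: min(nb_nodes/8, 300) s. */
      double timeout = nb_nodes / 8.0 < 300.0 ? nb_nodes / 8.0 : 300.0;
      for (double clock = 60.0; clock <= 3600.0; clock += 60.0) {
          if (!cluster_overloaded())   /* every 60 s: check CPU loads */
              continue;
          plan_t *plan = entropy_compute_plan(timeout);
          if (plan)                    /* NULL: no viable placement in time */
              apply_plan(plan);
      }
      return 0;
  }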

5.2 Experimental Conditions and Results

The program has been implemented in Java, both on top of SimGrid and for real systems.4 A specific seed for the random-based operations enabled us to ensure reproducibility between the different in-vivo executions and the simulations. The in-vivo experiments were performed on the Grid'5000 Graphene cluster under the same conditions as before (Linux 3.2, Qemu 1.5, and the SFQ network policy enabled; see Section 4.1).

In order to reach over-provisioning situations, six VMs per node were initially launched. Each VM was created as one of eight VM classes, each corresponding to a type of workload with a different memory update intensity, i.e., the template was 1:1GB:1Gbps:1Gbps:X, where the memory update speed X was a value between 0 and 80 percent of the migration bandwidth (1 Gbps) in steps of 10. Starting from 0 percent, the load of each VM varied according to the exponential and Gaussian distributions previously described, with parameters λ = Nb_VMs/300, μ = 60, and σ = 20. Concretely, the load of each VM varied on average every 5 minutes in steps of 10 (with a significant part between 40 and 80 percent).

The memtouch program introduced in Section 4.1 was used to stress both the CPU and the memory accordingly. The duration of the experiment was set to 3,600 seconds.

The simulated environment reflects the real conditions. In particular, we configured the network model of SimGrid to match the network performance of the Graphene servers allocated to our experiment (6 MBytes for the TCP_gamma parameter and 0.88 for the bandwidth corrective factor).

Fig. 10 shows the cost of the two phases of the Entropy algorithm for each invocation when considering 32 PMs and 192 VMs, through simulations (top) and in reality (bottom). At a coarse grain, we can see that the simulation results successfully followed the in-vivo results. Diving into details, the difference between the in-siclo and in-vivo reconfiguration times fluctuated between 6 and 18 percent

Fig. 9. Live migration time with resource contention. In Figs. 9a and 9b, the number of co-located VMs is changed, on the source and destination PM, respectively. In Figs. 9c, 9d, 9e, and 9f, the available bandwidth is limited to 25, 50, 75, and 100 percent of GbE, respectively. The results in the gray zone (no convergence) correspond to cases where migrations never finished. Note that the scale of the Y-axis differs between graphs. The memory update speed of the workload is 60 MB/s at 100 percent CPU.

4. The code is available on the Beyond the Clouds GitHub pages, in the VMPlaceS and VMPlaceS-G5K repositories, respectively (http://beyondtheclouds.github.io).



(the median was around 12 percent). The worst case, i.e., 18 percent, was reached when multiple migrations were performed simultaneously toward the same destination node. In this case, and even though the SFQ network policy was enabled, we discovered that in reality the throughput of the migration traffic fluctuated when multiple migration sessions simultaneously shared the same destination node. We confirmed this point by analyzing TCP bandwidth sharing through iperf executions. We consider that the current simulation results are sufficiently accurate to capture performance trends. If necessary, we could further analyze the influence of this fluctuation by adding bandwidth deviations to competing TCP sessions in the simulated world. Finally, we found that, of the 170 GB transferred during the complete experiment, a bit more than 10 percent was caused by the data transfers of Stage 2 and Stage 3. The investigated algorithm performed migrations when PMs were overbooked (i.e., serious resource contention happened, and the memory update intensity had less impact on migration times). Our virtualization mechanism successfully simulated these behaviors by taking the resource contention among VMs into account.

To illustrate the relevance of the simulation framework, we conducted the same experiment with 2,048 PMs and 12,288 VMs; it ran in approximately two hours. Table 2 presents the results of the five invocations of Entropy performed during the 3,600 simulated seconds. We also conducted the same experiment with 4,096 PMs and 24,576 VMs; this simulation lasted approximately eight hours. Although the simulation time increased super-linearly with the number of nodes, it is possible to simulate a large-scale, complex scenario comprising tens of thousands of nodes in just one day. We are now optimizing the simulation code for further performance improvements.

The experiments enabled us to confirm that Entropy 2.0 does not scale well, as it did not succeed in finding a viable solution within the 300-second timeout and hence did not perform any reconfiguration. Discussing the efficiency of the Entropy algorithm in detail is out of the scope of this paper. However, thanks to the simulator, we can observe some important phenomena. In the first hundreds of seconds, the cluster did not experience any replacement of VMs, because the loads of the VMs were still small. However, once Entropy detected a non-viable configuration, applying a reconfiguration plan was much more time-consuming than computing it. The longest reconfiguration plan had not completed after more than 1,702 seconds. This result means that the VM placement problem also needs to address ways of shortening the reconfiguration phase, not only the computation phase. Leveraging the SimGrid toolkit enables us to observe detailed system behaviors without facing the burden of conducting large-scale experiments.

6 RELATED WORK

CloudSim [3] is a simulation framework that allows users to develop simulation programs of cloud datacenters. It has been used, for example, for studies regarding dynamic consolidation and resource provisioning. Although CloudSim may look like it shares the same goal as our SimGrid project, the current API of CloudSim is based on a relatively top-down viewpoint of cloud environments. The API provides a perspective of datacenters composed of application services and virtual and physical hosts; users create pseudo datacenter objects in their simulation programs. In their publication, a migration time is calculated by dividing a VM memory size by a network bandwidth.5 This model, however, cannot correctly simulate the many real environments where workloads perform substantial memory writes. On the other hand, we can say that SimGrid takes a bottom-up approach. Our ongoing project currently pays great attention to carefully simulating the behavior of a VM running under various conditions, which leads to well-fitting simulation results for cloud environments where many VMs run concurrently with various workloads. As far as we know, our virtualization support in SimGrid is the first simulation framework that implements a live migration model of the precopy algorithm. It will provide sound simulation results for dynamic consolidation studies. The SONGS project, which supports our activity of adding virtualization abstractions to SimGrid, is also working on providing the modeling of

Fig. 10. Entropy 2.0: comparison between simulated and in-vivo observations. The red parts correspond to the time periods where Entropy checks the viability of the current configuration and computes a new viable configuration if necessary. The black parts correspond to the application phase of the reconfiguration plans (i.e., the migrations of the VMs performed to apply a new reconfiguration plan).

TABLE 2
The Cost of Each Reconfiguration during the 2,048 PMs/12,288 VMs Simulation

  Simulation clock time when
  reconfig. was invoked (sec):          60     120    433    796    1,597
  Computation duration (sec):            9     301    301    301      301
  Reconfiguration duration (sec):  not needed   12     62    500    not finalized
  Nb of migrations:                      0       2     74    227      483

Due to the time needed to perform the computation and reconfiguration phases, Entropy was invoked only five times during the simulation.

5. In the source code of the latest release of CloudSim (i.e., cloudsim-3.0.3), the power-aware datacenter model of CloudSim (PowerDatacenter class) assumes that half of the network bandwidth of a target host is always used for a migration, because the other half is assumed to be used for other VM communications. According to the CloudSim FAQ [17], for those who directly program migrations in simulations, CloudSim provides an API to invoke a migration. However, users need to assign an estimated completion time to each migration; there is no mechanism to accurately estimate the completion time.



datacenters. An API at the same level as that of Amazon EC2 is supported in this ongoing work.

iCanCloud [18] is a simulation framework built around the job dispatch model of a virtualized datacenter. Its hypervisor module is composed of a job scheduler, waiting/running/finished job queues, and a set of VMs. For the use of commercial cloud services like Amazon EC2, there is a trade-off between financial cost and application performance; this framework was used to simulate provisioning and scheduling algorithms with a cost-per-performance metric. Koala [19] is a discrete-event simulator emulating the Amazon EC2 interface. It extends the cloud management framework Eucalyptus [20], modeling its cloud/cluster/node controllers; VM placement algorithms were compared using this simulator. These frameworks are designed to study a higher-level perspective of cloud datacenters, such as resource provisioning, scheduling, and energy saving. In contrast, SimGrid, originally designed for the simulation of distributed systems, performs computation and data sending/receiving in the simulation world. It simulates more fine-grained system behavior, which is necessary to thoroughly analyze cloud environments.

GreenCloud [21] is a simulator extending the NS2 network simulator. It allows simulating the energy consumption of a datacenter, considering workload distributions and the network topology. This simulator is intended to capture communication details through packet-level simulation. SimGrid does not simulate distributed systems at the packet level, but on a communication flow basis; it is designed to simulate distributed systems, composed of computation and communication, in a scalable manner. The SONGS project is also working on integrating energy models into SimGrid, which will enable energy consumption simulations of datacenters.

We consider that it would be possible to implement a precopy live migration model on these simulation toolkits. However, substantial extension of the existing code may be necessary. In order to correctly simulate the migration behavior of VMs, a simulation toolkit requires both a mechanism modeling the live migration algorithm and a mechanism to correctly calculate the resource shares of VMs.

SimGrid supports a formal verification mechanism for distributed systems, a simulation mechanism for parallel task scheduling with directed acyclic graph (DAG) models, and a simulation mechanism for unmodified MPI applications. The VM extension of SimGrid enables researchers to use these mechanisms also for virtualized environments. We have carefully designed the VM extension to be compatible with the existing components of SimGrid.

7 CONCLUSION

We are developing a scalable, versatile, and easy-to-use simulation framework supporting virtualized distributed environments, based on the widely-used, open-source simulation framework SimGrid. The extension to SimGrid is seamlessly integrated with the existing components of the simulation framework. Users can easily develop simulation programs comprising VMs and PMs through the same SimGrid API. We redesigned the constraint problem solver of SimGrid to support the resource share calculation of VMs, and developed a live migration model implementing the precopy migration algorithm of Qemu/KVM. Through micro-benchmarks, although we observed that a few corner cases cannot be easily simulated when the memory update speed is close to the network bandwidth, we confirmed that our precopy live migration model reproduces sound simulation results in most cases. In addition, we showed that a naive migration model, considering neither the memory updates of the migrating VM nor resource sharing competition, underestimates the live migration time as well as the resulting network traffic. We illustrated the advantage of our simulation framework by implementing and discussing a first use-case dealing with the dynamic placement of VMs. Although we succeeded in performing simulations of up to 4 K PMs and 25 K VMs, we discovered that the cost of our extensions grows super-linearly rather than proportionally with the number of PMs/VMs. We are working with the SimGrid core developers to investigate how the performance of our extensions can be improved; the first envisioned approach is to use co-routines instead of the pthread library. Finally, we highlight that our extensions have been integrated into the core of SimGrid since version 3.11, released in May 2014.

ACKNOWLEDGMENTS

This work is supported by the French ANR project SONGS (11-INFRA-13) and the JSPS KAKENHI Grant 25330097.

REFERENCES

[1] H. Casanova, A. Legrand, and M. Quinson, "SimGrid: A generic framework for large-scale distributed experiments," in Proc. 10th Int. Conf. Comput. Model. Simul., 2008, pp. 126–131.

[2] A. Montresor and M. Jelasity, "PeerSim: A scalable P2P simulator," in Proc. 9th Int. Conf. Peer-to-Peer Comput., Sep. 2009, pp. 99–100.

[3] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Softw. Pract. Exper., vol. 41, no. 1, pp. 23–50, Jan. 2011.

[4] SimGrid publications [Online]. Available: http://simgrid.gforge.inria.fr/Publications.html, 2015.

[5] L. M. Schnorr, A. Legrand, and J.-M. Vincent, "Detection and analysis of resource usage anomalies in large distributed systems through multi-scale visualization," Concurrency Comput.: Pract. Exper., vol. 24, no. 15, pp. 1792–1816, 2012.

[6] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proc. 2nd Conf. Symp. Netw. Syst. Des. Implementation, vol. 2, 2005, pp. 273–286.

[7] A. Kivity, "kvm: the Linux virtual machine monitor," in Proc. Ottawa Linux Symp., Jul. 2007, pp. 225–230.

[8] libvirt: The virtualization API [Online]. Available: http://libvirt.org, 2015.

[9] J. Fulmer. Siege [Online]. Available: http://www.joedog.org/siege-home, 2012.

[10] The PostgreSQL Global Development Group. PostgreSQL [Online]. Available: http://www.postgresql.org/, 2015.

[11] Transaction Processing Performance Council. (1994, Jun.). TPC Benchmark B, standard specification revision 2.0 [Online]. Available: http://www.tpc.org/tpcb/spec/tpcb_current.pdf

[12] P. Svard, J. Tordsson, B. Hudzia, and E. Elmroth, "High performance live migration through dynamic page transfer reordering and compression," in Proc. IEEE 3rd Int. Conf. Cloud Comput. Technol. Sci., 2011, pp. 542–548.

[13] INRIA. Execo: A generic toolbox for conducting and controlling large-scale experiments [Online]. Available: http://execo.gforge.inria.fr/, 2015.

[14] T. Hirofuchi, H. Nakada, S. Itoh, and S. Sekiguchi, "Reactive cloud: Consolidating virtual machines with postcopy live migration," IPSJ Trans. Adv. Comput. Syst., vol. ACS37, pp. 86–98, 2012.



[15] F. Hermenier, X. Lorca, J. M. Menaud, G. Muller, and J. Lawall, "Entropy: A consolidation manager for clusters," in Proc. ACM SIGPLAN/SIGOPS Int. Conf. Virtual Execution Environ., Mar. 2009, pp. 41–50.

[16] F. Hermenier, J. Lawall, and G. Muller, "BtrPlace: A flexible consolidation manager for highly available applications," IEEE Trans. Dependable Secure Comput., vol. 10, no. 5, pp. 273–286, Sep./Oct. 2013.

[17] CloudSim FAQ (retrieved 2015, Aug. 13). Advanced features: 1. How can I code VM migration inside a data center? [Online]. Available: https://code.google.com/p/cloudsim/wiki/FAQ

[18] A. Nunez, J. Vazquez-Poletti, A. Caminero, G. Castane, J. Carretero, and I. Llorente, "iCanCloud: A flexible and scalable cloud infrastructure simulator," J. Grid Comput., vol. 10, no. 1, pp. 185–209, 2012.

[19] K. Mills, J. Filliben, and C. Dabrowski, "Comparing VM-placement algorithms for on-demand clouds," in Proc. IEEE 3rd Int. Conf. Cloud Comput. Technol. Sci., 2011, pp. 91–98.

[20] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, "The Eucalyptus open-source cloud-computing system," in Proc. 9th IEEE/ACM Int. Symp. Cluster Comput. Grid, 2009, pp. 124–131.

[21] D. Kliazovich, P. Bouvry, Y. Audzevich, and S. Khan, "GreenCloud: A packet-level simulator of energy-aware cloud computing data centers," in Proc. Global Telecommun. Conf., 2010, pp. 1–5.

Takahiro Hirofuchi received the BS degree in geophysics from the Faculty of Science of Kyoto University in March 2002, and the PhD degree in engineering in March 2007 from the Nara Institute of Science and Technology (NAIST). He is a senior researcher at the National Institute of Advanced Industrial Science and Technology (AIST) in Japan. He is working on virtualization technologies for advanced cloud computing. He is an expert in operating system, virtual machine, and network technologies.

Adrien Lebre received the MS degree from the University of Grenoble in 2002, and the PhD degree from the Grenoble Institute of Technologies in September 2006. He is a full-time researcher at Inria (on leave from an assistant professor position at the Ecole des Mines de Nantes). His research interests are distributed and Internet computing. Since 2011, he has been a member of the architect board of Grid'5000. He has taken part in several program committees of conferences and workshops.

Laurent Pouilloux received the PhD degree from the Institut de Physique du Globe de Paris in 2007. He is a research engineer at Inria, also with the LIP laboratory at ENS Lyon, France. He has strong experience in teaching computer science. He is mainly involved in helping Grid'5000 users set up large-scale experiments, especially for virtual machine deployment and the automation of cloud computing experiments.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.
