Dynamic Cloud Resource Management: Scheduling, Migration, and Server Disaggregation

Petter Svärd

PhD Thesis, April 2014
Department of Computing Science
Umeå University
Sweden


Dynamic Cloud Resource Management

Scheduling, Migration, and Server Disaggregation

Petter Svärd

PhD Thesis, April 2014

Department of Computing Science
Umeå University
SE-901 87 Umeå, Sweden

[email protected]

Copyright © 2014
Paper I © ACM, 2011
Paper II © IEEE, 2011
Paper III © P. Svärd, B. Hudzia, S. Walsh, J. Tordsson, E. Elmroth, 2013
Paper IV © P. Svärd, B. Hudzia, J. Tordsson, E. Elmroth, 2014
Paper V © IEEE, 2012
Paper VI © IEEE, 2013
Paper VII © P. Svärd, W. Li, E. Wadbro, J. Tordsson, E. Elmroth, 2014

ISBN 978-91-7601-038-9
ISSN 0348-0542
UMINF 14.09

Printed by Print & Media, Umeå University, 2014

Abstract

A key aspect of cloud computing is the promise of infinite, scalable resources, and that cloud services should scale up and down on demand. This thesis investigates methods for dynamic resource allocation and management of services in cloud datacenters, introducing new approaches as well as improvements to established technologies.

Virtualization is a key technology for cloud computing as it allows several operating system instances to run on the same Physical Machine, PM, and cloud services normally consist of a number of Virtual Machines, VMs, that are hosted on PMs. In this thesis, a novel virtualization approach is presented. Instead of running each PM in isolation, resources from multiple PMs in the datacenter are disaggregated and exposed to the VMs as pools of CPU, I/O, and memory resources. VMs are provisioned by using the right amount of resources from each pool, thereby enabling both larger VMs than any single PM can host as well as VMs with tailor-made specifications for their application. Another important aspect of virtualization is live migration of VMs, which is the concept of moving VMs between PMs without interruption in service. Live migration allows for better PM utilization and is also useful for administrative purposes. In the thesis, two improvements to the standard live migration algorithm are presented, delta compression and page transfer reordering. The improvements can reduce migration downtime, i.e., the time that the VM is unavailable, as well as the total migration time. Postcopy migration, where the VM is resumed on the destination before the memory content is transferred, is also studied. Both userspace and in-kernel postcopy algorithms are evaluated in an in-depth study of live migration principles and performance.

Efficient mapping of VMs onto PMs is a key problem for cloud providers as PM utilization directly impacts revenue. When services are accepted into a datacenter, a decision is made on which PM should host the service VMs. This thesis presents a general approach for service scheduling that allows for the same scheduling software to be used across multiple cloud architectures. A number of scheduling algorithms to optimize objectives like revenue or utilization are also studied. Finally, an approach for continuous datacenter consolidation is presented. As VM workloads fluctuate and server availability varies, any initial mapping is bound to become suboptimal over time. The continuous datacenter consolidation approach adjusts this VM-to-PM mapping during operation based on combinations of management actions, like suspending/resuming PMs, live migrating VMs, and suspending/resuming VMs. Proof-of-concept software and a set of algorithms that allow cloud providers to continuously optimize their server resources are presented in the thesis.


Sammanfattning

A key property of cloud computing is the promise of infinite, scalable resources, i.e., that cloud services should be able to scale up and down on demand. This doctoral thesis investigates methods for allocating and reallocating resources in a datacenter in order to improve robustness, performance, and cost efficiency.

A central technology in cloud computing is virtualization, which allows several operating system instances to run on the same machine. Virtualization adds a software layer, a hypervisor, on top of the hardware platform. Virtual machines run on top of the hypervisor, which distributes resources to the virtual guest machines. The thesis presents a new approach to virtualization. Instead of running each PM as an isolated island, resources from multiple PMs in the datacenter are exposed as aggregated resource pools. VMs can then be created by selecting exactly the right amount of resources from each pool, which enables both virtual machines larger than the largest PM and virtual machines with specifications tailored to their use. Another aspect of virtualization is the ability to live migrate VMs between PMs in the datacenter, as this allows VMs to be moved during execution without interruption. The thesis proposes two improvements to the standard algorithm for live migration: delta compression and dynamic reordering of the transfer order of memory pages. These improvements can reduce the downtime during live migration, i.e., the time during which a VM is unavailable, as well as the total migration time. The thesis also studies postcopy migration, where a VM is started on the destination before its memory content is transferred. Both userspace and in-kernel algorithms are evaluated in an in-depth study of the principles and performance of live migration.

To ensure an optimal mapping between VMs and PMs, a general scheduling solution for cloud services is proposed. This approach makes it possible to use the same scheduling software in multiple deployment scenarios, and to use different algorithms to maximize factors such as revenue or resource utilization. A method for continuous consolidation of datacenter resources is also presented. Over time, the initial mapping between VMs and PMs becomes suboptimal, as the workloads of the VMs vary and so does PM availability. To handle this problem, the VM-to-PM mapping is adjusted during operation using combinations of fundamental management actions, namely suspending and resuming PMs, live migrating VMs, and suspending and resuming VMs. The proposed solution consists of software and a set of algorithms that allow cloud infrastructure providers to continuously optimize the use of their hardware resources regardless of the workload.


Preface

This thesis consists of a brief introduction to the field, a short discussion of the main problems studied, and the following papers.

Paper I P. Svärd, B. Hudzia, J. Tordsson, and E. Elmroth. Evaluation of Delta Compression Techniques for Efficient Live Migration of Large Virtual Machines. In The 2011 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 111–120. ACM, 2011.

Paper II P. Svärd, J. Tordsson, B. Hudzia, and E. Elmroth. High Performance Live Migration through Dynamic Page Transfer Reordering and Compression. In The 2011 IEEE International Conference on Cloud Computing Technology and Science, pages 542–548. IEEE, 2011.

Paper III P. Svärd, B. Hudzia, S. Walsh, J. Tordsson, and E. Elmroth. The Noble Art of Live VM Migration - Principles and Performance of Precopy, Postcopy and Hybrid Migration of Demanding Workloads. Technical Report, UMINF-14.10, (Submitted), 2013.

Paper IV P. Svärd, B. Hudzia, J. Tordsson, and E. Elmroth. Hecatonchire: Enabling Multi-Host Virtual Machines by Resource Aggregation and Pooling. Technical Report, UMINF-14.11, (Submitted), 2014.

Paper V W. Li, P. Svärd, J. Tordsson, and E. Elmroth. A General Approach to Service Deployment in Cloud Environments. In The 2012 International Conference on Cloud and Green Computing, pages 17–24. IEEE, 2012.

Paper VI W. Li, P. Svärd, J. Tordsson, and E. Elmroth. Cost-Optimal Cloud Service Deployment under Dynamic Pricing Schemes. In The 2013 IEEE/ACM International Conference on Utility and Cloud Computing, pages 187–194. IEEE/ACM, 2013.

Paper VII P. Svärd, W. Li, E. Wadbro, J. Tordsson, and E. Elmroth. Continuous Datacenter Consolidation. Technical Report, UMINF-14.08, (Submitted), 2014.


Financial support has in part been provided by the European Commission's Seventh Framework Programme (FP7/2010-2013) under grant agreements no. 215605 (RESERVOIR) and 257115 (OPTIMIS), the Swedish Research Council (VR) under contract number C0590801 for the project Cloud Control, and the Swedish Government's strategic effort eSSENCE.


Acknowledgments

First things first, I thank my supervisor Erik Elmroth for his help, valuable advice, and his support during my graduate studies. I'm also grateful for the continuous support of my co-supervisor Johan Tordsson and our interesting discussions concerning research, snowboarding, and other matters of importance. I also thank my second co-supervisor Benoit Hudzia, who introduced me to the Hecatonchire project and the finer aspects of live migration, the Linux kernel, and how to fit a europlug connector into a UK socket. I'm also grateful for him hosting me as a visiting PhD student at the SAP facility in Belfast during the fall of 2013. At SAP, Steve Walsh, Aidan Shribman, and Stuart Hacking have been invaluable concerning the design, implementation, and debugging issues with some of the software prototypes developed within the scope of this thesis. I also thank Ben Greene at SAP for most generously welcoming me into the lab and providing me with high-end equipment to perform my experiments.

All of my colleagues in the Distributed Systems group deserve a big thank you, but there are a few that I want to give a special mention. I thank my lab 'wingman', Daniel Espling, for his top quality work, insightful comments, and for our discussions on all topics, work related or not. I also thank him for seeing the light, i.e., supporting the one true team, Luleå Hockey. I also thank Wubin 'Viali' Li for his support, his humor, his positive attitude, and for generally being a pleasure to work and travel with. The help of P-O Östberg was invaluable to me as a fresh PhD student and I'm also grateful for our snowboarding trips. I thank the "Obi-Wan" of computers, Tomas 'Stric' Ögren, for his help with various computer-related issues and finally, Mina Sedaghat. Her continuous picking on me has inspired me to do better work.

I also thank Eddie Wadbro at the Computing Science department for his help with scheduling algorithms and Eric Jul at Bell Labs for his feedback on my licentiate thesis, which have improved the quality of this thesis.

Last but not least, I thank my parents, Hans and Hillevi, for their support during my studies and in general, my wife Caroline for putting up with my frequent travels and paper deadlines, and my smooth fox terrier 'Devil' for keeping me company during long nights of writing. I also thank my sister Anna-Maja and the rest of my family and friends. Without your support, this thesis would not exist. Finally, I thank my infant son Gustav. He won't understand much of this thesis, but hopefully, he will find pleasure in ripping it apart with his hands.


Contents

1 Introduction
2 Cloud Resource Management
   2.1 Service Deployment
      2.1.1 Scheduling of Services
      2.1.2 Scheduling Algorithms
3 Enabling Technologies
   3.1 Virtualization Technologies
      3.1.1 Server Disaggregation
   3.2 Live Migration
      3.2.1 Approaches to Live Migration
      3.2.2 Live Migration Performance
4 Thesis Contributions
   4.1 Paper I
   4.2 Paper II
   4.3 Paper III
   4.4 Paper IV
   4.5 Paper V
   4.6 Paper VI
   4.7 Paper VII

Paper I
Paper II
Paper III
Paper IV
Paper V
Paper VI
Paper VII


Chapter 1

Introduction

Cloud Computing promises access to virtually infinite, scalable resources that can be provisioned on demand [30]. However, applying this model to real-world datacenters is associated with several problems, for example, scaling of cloud services is hard and often requires that applications are re-written to take advantage of the cloud model. It is also hard to achieve good utilization of resources in a datacenter. In order not to risk hampered performance and broken Service Level Agreements, SLAs, Physical Machines, PMs, are often underprovisioned. A related problem is that the workloads of most cloud applications are user-driven, which means that their hardware resource demands are variable and non-deterministic. Sudden changes in usage patterns can make the previous mapping of Virtual Machines, VMs, onto the PMs non-optimal and in need of update.

The focus of this thesis is techniques, algorithms, and methods for dynamic resource management in cloud environments. A general architecture for scheduling of VMs in different deployment scenarios is proposed and investigated, as well as a number of scheduling algorithms in both static and dynamic pricing scenarios. To tackle the problem of non-deterministic VM resource demands, an approach for datacenter consolidation, or continuous optimization of scheduling in a datacenter, is introduced.

An important mechanism in this re-placement of VMs is the ability to move, or migrate, a VM from one physical host to another. If the migration is performed in such a way that the connected clients do not perceive any service interruption, it is known as live migration. Live migration is useful in many scenarios, for example, server consolidation is made easier if VMs do not have to be shut down before they are moved. The technique is also used for administrative purposes, for example, live migrating VMs to other hosts if a server needs to be taken off-line for some reason. Live migration is not limited to a single datacenter but can also be utilized to transfer VMs between cloud sites over Wide Area Networks, WANs.

The underlying concepts of live migration are investigated and a number of improvements are proposed and evaluated. These improvements focus on reducing migration downtime, during which service is interrupted, and also reducing the amount of transmitted data as well as the total migration time. Different approaches to live migration are discussed and experimental evaluations are used to illustrate their properties and determine their suitability in a number of scenarios.

The thesis also presents a novel approach to virtualization, server disaggregation, where hardware resources from several PMs are made available in aggregated pools of CPU, I/O, and memory. Server disaggregation facilitates a seamless increase of the amount of resources that are allocated to a VM, a concept known as vertical elasticity. Because most applications can take advantage of vertical elasticity without modifications, the use of server disaggregation means that more applications can benefit from running in a cloud environment.

The rest of this thesis is structured as follows. Chapter 2 gives a short introduction to cloud computing, cloud services, and the concept of VM scheduling. Chapter 3 discusses enabling technologies for cloud computing, such as virtualization and live migration, and introduces algorithms for improved live migration performance as well as a novel approach for de-coupling of resources from PMs, server disaggregation. The introductory chapters are followed by Chapter 4, in which the thesis contributions are summarized. Finally, the seven papers are presented.


Chapter 2

Cloud Resource Management

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction [30]. These resources can be accessed using three service models: Software as a Service, where applications in the cloud are made available to customers over a network; Platform as a Service, where developers can create applications using programming tools and libraries provided by the cloud provider; and finally, Infrastructure as a Service, where operators can provision computing resources and deploy arbitrary software on them. Note that the users have no control over the underlying hardware resources, as there is a layer of abstraction between these users and the fundamental computing resources.

Cloud computing shares many ideas with the field of grid computing [22], especially the concept of infinite, scalable resources and metered service. However, in grid environments the user often pays (or is granted) in advance for a certain amount of resource consumption, e.g., CPU hours, while in cloud environments, users are charged on a pay-per-use basis, e.g., in terms of an hourly price to run a certain VM instance [15]. Another difference between cloud and grid computing is that grid computing typically handles jobs, the batch execution of software that often has a known start and finish time, while cloud computing mainly handles provisioning of user-driven services that are less predictable in terms of resource needs.

A service is commonly defined as functionality accessible at a network endpoint. In the context of this thesis, however, a cloud service denotes a VM or a collection of VMs configured to run an application or an application stack. These services may consist of one or several components. The functional and non-functional requirements of the service, such as elasticity bounds or SLA terms, are described in a document that is bundled with the service.


The lifecycle of a cloud service consists of four phases, as illustrated in Figure 1. In the construction phase, the service is packaged as a VM or a set of VMs along with terms and conditions for hosting the service. The deployment phase involves negotiating SLAs, transferring of the service to the selected cloud provider and starting it, whereas in the operation phase, the service is running. Finally, the undeployment phase concerns shutting down the service and releasing its resources.

[Construction → Deployment → Operation → Undeployment]

Figure 1: Lifecycle of a cloud service.

2.1 Service Deployment

During the deployment phase, the service and its resources are transferred to the provider where it is to be hosted and configured to run correctly. This phase consists of several steps, including discovery of available providers and filtering these with respect to geographical or legislative criteria. Requests for hosting the service are then sent to the suitable providers. After a negotiation phase [20], one or more providers are selected and the packaged service is transferred to the provider. Before the service can be started, it also has to be configured correctly for the selected provider, a process known as contextualization [13].

In cloud computing, there are several service deployment scenarios, such as public clouds, private clouds, bursted clouds, and federated clouds [12, 35, 39, 47]. Services can be placed within a specific cloud, or several cloud providers can collaborate, meaning that a service can be placed on any of the providers, or split among them. This gives rise to different requirements and provides a certain degree of flexibility. Paper V introduces a general software package for scheduling and deploying VMs in multiple scenarios.

2.1.1 Scheduling of Services

Scheduling, or placement, of services is the process of deciding where services should be hosted. Scheduling is a part of the service deployment process and can take place both externally to the cloud, i.e., deciding on which cloud provider the service should be hosted, and internally, i.e., deciding which PM in a datacenter a VM should be run on. For external placement, the decision on where to host a service can be taken either by the owner of the service, or a third-party brokering service. In the first case, the service owner maintains a catalog of cloud providers and performs the negotiation with them for terms and costs of hosting the service. In the latter case, the brokering service takes responsibility for both discovery of cloud providers and the negotiation process [47].

Regarding internal placement, the decision of which PMs in the datacenter a service should be hosted by is taken when the service is admitted into the infrastructure. Depending on criteria such as the current load of the PMs, the size of the service, and any affinity or anti-affinity constraints [23], i.e., rules for co-location of service components, one or more PMs are selected to run the VMs that constitute the service. Figure 2 illustrates a scenario with new services of different sizes (small, medium, and large) arriving into a datacenter where a number of services are already running.


Figure 2: Internal scheduling of VMs in a datacenter.

The scheduling of services in a datacenter is often performed with respect to some high-level goal [36], like reducing energy consumption, increasing utilization [37] and performance [27], or maximizing revenue [17, 38]. However, during operation of the datacenter, the initial placement of a service might no longer be suitable, due to variations in application and PM load. Events like arrival of new services, existing services being shut down, or services being migrated out of the datacenter can also affect the quality of the initial placement. To avoid drifting too far from an optimal placement, thus reducing efficiency and utilization of the datacenter, scheduling should be performed repeatedly during operation. Information from monitoring probes [23], and events such as timers, arrival of new services, or startup and shutdown of PMs can be used to determine when to update the mapping between VMs and PMs. An approach to perform such continuous datacenter consolidation is presented in Paper VII.
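To make this event- and timer-driven rescheduling concrete, the following sketch outlines a consolidation loop in Python. It is purely illustrative, not the software developed in Paper VII; the monitor, scheduler, and actuator interfaces and all names are assumptions introduced for the example.

```python
import time

# Hypothetical illustration of event-driven rescheduling: re-evaluate the
# VM-to-PM mapping on a timer and on datacenter events, then apply the
# resulting management actions (e.g., migrate, suspend, resume).

RESCHEDULE_EVENTS = {"service_arrived", "service_departed", "pm_started", "pm_stopped"}

def consolidation_loop(monitor, scheduler, actuator, interval_s=300):
    """Poll monitoring data and adjust the VM placement when triggered."""
    last_run = 0.0
    while True:
        event = monitor.next_event(timeout=1.0)          # None if nothing happened
        timer_due = time.time() - last_run >= interval_s
        if event in RESCHEDULE_EVENTS or timer_due:
            load = monitor.current_load()                # per-PM and per-VM utilization
            plan = scheduler.replan(load)                # new VM-to-PM mapping as actions
            for action in plan:
                actuator.apply(action)
            last_run = time.time()
```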


2.1.2 Scheduling Algorithms

Scheduling of VMs can be considered a multi-dimensional variant of the Bin Packing [10] problem, where VMs with varying CPU, I/O, and memory requirements are placed on PMs in such a way that resource utilization and/or other objectives are maximized. The problem can be addressed, e.g., by using integer linear programming [52] or by performing an exhaustive search of all possible solutions. However, as the problem is complex and the number of possible solutions grows rapidly with the number of PMs and VMs, such approaches can be both time and resource consuming. A more resource efficient, and faster, way is the use of greedy approaches like the First-Fit algorithm that places a VM on the first available PM that can accommodate it. However, such approximation algorithms do not normally generate optimal solutions. All in all, approaches to solving the scheduling problem often lead to a trade-off between the time to find a solution and the quality of the solution found.
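For illustration, a minimal First-Fit placement heuristic of the kind mentioned above can be sketched as follows. The dictionary-based VM and PM descriptions (CPU cores and memory only) are simplifying assumptions for the example, not a scheduler from the thesis.

```python
# Minimal sketch of the First-Fit heuristic: place each VM on the first PM
# with enough free CPU and memory, without any optimization of the result.

def first_fit(vms, pms):
    """Return a mapping from VM id to the PM id it was placed on (or None)."""
    placement = {}
    free = {pm["id"]: {"cpu": pm["cpu"], "mem": pm["mem"]} for pm in pms}
    for vm in vms:
        for pm_id, cap in free.items():
            if cap["cpu"] >= vm["cpu"] and cap["mem"] >= vm["mem"]:
                cap["cpu"] -= vm["cpu"]          # reserve the resources
                cap["mem"] -= vm["mem"]
                placement[vm["id"]] = pm_id
                break
        else:                                    # no PM could host the VM
            placement[vm["id"]] = None
    return placement

pms = [{"id": "pm1", "cpu": 8, "mem": 32}, {"id": "pm2", "cpu": 16, "mem": 64}]
vms = [{"id": "vm1", "cpu": 4, "mem": 16}, {"id": "vm2", "cpu": 8, "mem": 48}]
print(first_fit(vms, pms))                       # {'vm1': 'pm1', 'vm2': 'pm2'}
```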

Hosting a service in the cloud comes at a cost, as most cloud providers are driven by economic incentives. However, the service workload and the available capacity in a datacenter can vary heavily over time, e.g., cyclically during the week but also more randomly [5]. It is therefore beneficial for providers to be able to dynamically adjust prices over time to match the variation in supply and demand. Paper VI investigates a number of algorithms for scheduling of services onto cloud providers under dynamic pricing conditions, ranging from simple heuristics to combinatorial optimization techniques.


Chapter 3

Enabling Technologies

This chapter discusses enabling technologies for cloud computing and management of cloud services.

3.1 Virtualization Technologies

Virtualization is the concept of running one or more VMs on top of the same hardware. Virtualization is a mature concept and has been used since the 1960s [11]. The physical machine is referred to as the host and the virtualized operating system instances are known as guests, or more commonly, VMs. Applications hosted in VMs are isolated from each other and the underlying hardware. Provisioning and sharing of the common hardware resources between the operating system instances, which do not normally need to be virtualization aware, is normally handled by a software layer known as a hypervisor [7]. In contrast to a full emulator, the hypervisor does not emulate the processor but instead makes use of CPU self-virtualization to provide a virtualized interface to the real hardware. Such virtualization extensions have been present in mainstream CPUs from Intel and AMD since 2005 and, because this hardware-accelerated virtualization results in less complex hypervisors and better VM performance [31], the use of virtualization has increased dramatically since its introduction [46, 32].

A common approach to virtualization is full virtualization, where a complete hardware platform is simulated. The VMs run on top of the virtualized hardware and do not have to be modified in any way. In fact, they need not be aware that they are running in a virtualized environment. Examples of hypervisors providing full virtualization are KVM [21], VMware [51] and Microsoft Virtual PC [1]. If the simulated platform is not identical to the underlying hardware, this type of virtualization is known as paravirtualization. Because the interface presented to the guest operating system differs slightly from the hardware, the operating system needs to be modified before it can run on the virtualized platform. An example of a paravirtualization hypervisor is Xen [7].

The ability to run several operating system instances on the same hardware is important as it facilitates administration, better resource utilization, and reduced power consumption [50]. Because virtualization provides the ability to quickly provision pre-configured VM instances, it is an enabling technology for cloud computing [6]. Virtualization also enables scalability and centralization of administrative functionality in cloud environments. Figure 3 shows an example of a small virtualized infrastructure. In this example, two servers running a hypervisor platform are used as hosts for the VMs and a shared storage solution is used to store the VMs' disk images.


Figure 3: Example of a virtualized infrastructure.

3.1.1 Server Disaggregation

Server disaggregation is a novel concept where the CPU, I/O, and memory resources of multiple PMs are aggregated and made available in pools of CPU, I/O, and memory, respectively. VMs can then be provisioned using precisely the right amount of resources from each pool, thereby tailoring the VMs to the application's needs without being restricted by the limitations of individual PMs. The server disaggregation approach also allows for creation of larger VMs than a single PM can host, making it useful for big data applications. In contrast to traditional virtualization, where a VM runs on top of exactly one host, server disaggregation allows VMs to use resources from several PMs, as illustrated in Figure 4.
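The following toy sketch (hypothetical code, with made-up capacities) illustrates the core idea: a VM that does not fit on any single PM can still be provisioned when resources are drawn from aggregated pools.

```python
# Contrast per-PM placement with provisioning from disaggregated pools.
# The VM below needs more memory than any single PM offers, but fits
# when the memory of several PMs is pooled.

pms = [{"cpu": 16, "mem_gb": 64}, {"cpu": 16, "mem_gb": 64}, {"cpu": 16, "mem_gb": 64}]

def fits_single_pm(vm):
    """Traditional virtualization: the VM must fit on one host."""
    return any(pm["cpu"] >= vm["cpu"] and pm["mem_gb"] >= vm["mem_gb"] for pm in pms)

def fits_pools(vm):
    """Server disaggregation: draw from aggregated CPU and memory pools."""
    cpu_pool = sum(pm["cpu"] for pm in pms)
    mem_pool = sum(pm["mem_gb"] for pm in pms)
    return cpu_pool >= vm["cpu"] and mem_pool >= vm["mem_gb"]

big_vm = {"cpu": 8, "mem_gb": 150}   # more RAM than any single 64 GB PM
print(fits_single_pm(big_vm))        # False
print(fits_pools(big_vm))            # True: 150 GB <= 3 * 64 GB pooled
```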

[VMs draw their resources from aggregated pools - a Compute Cloud, a Memory Cloud, and an I/O Cloud - built through compute, memory, and I/O aggregation across the PMs]

Figure 4: Example of a disaggregated virtualized infrastructure.

One motivation for server disaggregation is that CPU development is no longer following Moore's law [29]. CPU speeds are not increasing as fast as before and instead, the industry has moved towards parallelism with processors featuring more cores [16]. The same applies to RAM and disk where, from the CPU perspective, disk access has actually become slower over the last 30 years [40]. However, as seen in Figure 5, network bandwidth continues to increase rapidly. New interfaces such as InfiniBand [19] are providing interconnect speeds that are approaching internal bus speeds [40] and techniques like Remote Direct Memory Access, RDMA, are enabling fast access to remote memory. This means that the performance penalty for using cores from remote PMs is decreasing.

Figure 5: Development of network performance since 1979.

Today, with data from transactional enterprise applications, social networks, mobile devices, etc., the amount of data that needs to be processed is growing steadily and petabyte data sets are no longer exceptional [49]. To work upon such large data sets, large amounts of vertically scalable computing resources are required. By using server disaggregation, large VMs can be provisioned using resources from multiple PMs. Moreover, as these VMs can be scaled vertically (re-sized) on demand by adding more resources without considering the boundaries of a single PM, vertical scaling can be used instead of horizontal scaling (increasing the number of VMs), which is the most common solution for scaling in datacenters today. Because most applications need to be adapted to benefit from horizontal scaling but usually need no modification for vertical scaling, the use of server disaggregation means that more types of applications can benefit from running in a cloud environment.

Performance of running applications can increase as the need for using disk as secondary storage is reduced when remote memory can be accessed. Nodes can also be purpose-built specifically for I/O, memory, or CPU performance. Management and administration of the system are also simplified. Running applications on a few larger VMs instead of many small ones reduces complexity with respect to software installation, synchronization, etc. Due to this improved scalability, increased PM utilization, and simplified administration, server disaggregation shows great potential for cost savings in datacenter operations.

Another benefit of using server disaggregation, besides vertical scalability, is that it simplifies VM scheduling to the one-dimensional problem of allocating the right amount of resources from each pool. This can in turn lead to higher utilization of the PMs in the datacenter [42]. To summarize, these benefits mean that server disaggregation is a candidate for becoming the next step in the evolution of datacenters, illustrated in Figure 6.

[No virtualization → Basic consolidation → Flexible resource management (cloud) → Resource disaggregation]

Figure 6: Evolution of virtualization in datacenters.

Paper IV contains an architecture and reference implementation of a key concept in server disaggregation, memory scaleout. The approach is evaluated by running benchmarks and enterprise software with varying amounts of remote memory while measuring application performance.

3.2 Live Migration

One advantage of virtualization is that VMs can be migrated from one physical host to another. In order to do this, the VM's state, which consists of its RAM content and any disk images associated with it, must be transferred. To perform this (cold) migration of a VM, it is first suspended, whereupon the VM's memory content is written to a file. This file, the VM's description file, and its disk images are then transferred to the new host where the VM's execution is resumed.

However, if the VM continues to run during the transfer of its state, the migration can be performed with no perceivable interruption in service for the connected clients. Such VM migration is known as live migration and was first demonstrated by Clark et al. [9]. Notably, even when using live migration the VM is suspended for a short while, in order to transfer the complete state, as the VM's memory and disk are constantly updated during execution. While the VM is suspended, it is not accessible, so this migration downtime must be very short to achieve the goal of no perceivable service interruption.

Live migration is normally performed within the same datacenter, where the disk images are kept on a network storage device accessible to both the source and the destination hosts, which means that the disk images do not need to be migrated. If the VM is migrated within the same IP address space, an Address Resolution Protocol, ARP [33], message is used to notify clients of the VM's new location when the migration is complete.

Today, live migration is supported by most hypervisors, such as Xen [7], KVM [21], and VMware [51], which all perform live migration with downtimes ranging from tens of milliseconds to a second when migrating non-demanding workloads over LANs. However, live migration performance is highly affected by the type of workload the VM is running and the available bandwidth of the network link it is migrated over [43].

3.2.1 Approaches to Live Migration

Currently, there are two main approaches to live migration, both designed to live migrate VMs with minimal disruption. However, they differ in fundamental principles. The conceptual difference between them is at what point during the migration process the execution switches from the source to the destination.


Figure 7: In-datacenter (LAN) and cross-datacenter (WAN) migration.

Precopy Live Migration

With precopy migration, the VM state is transferred in the background in a series of iterations while the source VM is running and responding to requests. When enough of the state has been transferred, the source VM is suspended to stop further memory writes. The decision on when to switch is typically based on the amount of memory remaining to transfer, or the switch can be forced after a set amount of time. After the VM has been suspended, the remaining state is transferred and the VM is finally resumed on the destination host.
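A simplified sketch of this iterative precopy loop is given below. The vm and link objects and their methods are hypothetical stand-ins for hypervisor internals, used only to make the control flow explicit.

```python
# Sketch of iterative precopy migration: pages dirtied during one round are
# re-sent in the next, and the VM is suspended when the remaining set is
# small enough or a round limit is reached.

def precopy_migrate(vm, link, stop_threshold_pages=1000, max_rounds=30):
    to_send = set(vm.all_page_numbers())            # first round: full RAM
    rounds = 0
    while len(to_send) > stop_threshold_pages and rounds < max_rounds:
        for page in to_send:
            link.send(page, vm.read_page(page))     # VM keeps running meanwhile
        to_send = vm.dirty_pages_since_last_round() # re-send what was rewritten
        rounds += 1
    vm.suspend()                                    # migration downtime starts
    for page in to_send:
        link.send(page, vm.read_page(page))         # final remaining pages
    link.send_cpu_state(vm.cpu_state())
    link.resume_on_destination()                    # migration downtime ends
```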

[Precopy: transfer dirty pages iteratively → suspend VM at source → transfer CPU state and remaining pages → resume VM at destination. Postcopy: suspend VM at source → transfer CPU state → resume VM at destination → pull remaining pages. Both timelines mark the migration downtime and the total migration time.]

Figure 8: Precopy (left) and postcopy (right) migration.


Postcopy Migration

When using postcopy migration, only the CPU state is transferred before the VM is resumed on the destination. This means that the execution switch from source to destination is almost immediate and the migration downtime is normally very short. The remaining state is then pulled from the source over the network, either on demand, pre-emptively, or using a combination of the two. Figure 8 highlights the algorithmic principles of precopy (left) and postcopy (right) live migration. In the figure, the concepts of migration downtime and total migration time are also illustrated.
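As a companion to the precopy sketch above, the following hypothetical code illustrates the postcopy control flow: execution switches after only the CPU state is moved, and a destination-side fault handler pulls missing pages on demand.

```python
# Sketch of postcopy migration with the same hypothetical vm/link interfaces
# as the precopy example.

def postcopy_migrate(vm, link):
    vm.suspend()                                     # very short downtime:
    link.send_cpu_state(vm.cpu_state())              # only CPU state is moved
    link.resume_on_destination()                     # memory follows later

def handle_page_fault(page, source_link, local_ram):
    """Destination-side fault handler: fetch a not-yet-present page on demand."""
    if page not in local_ram:
        local_ram[page] = source_link.fetch_page(page)  # network round-trip
    return local_ram[page]
```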

In-kernel Migration

The part of the hypervisor that handles the live migration process normally resides in userspace. This means that to transfer the contents of a memory page from the source to the destination, it first has to be copied from kernel space to userspace on the source host, transferred to the destination host, and finally copied from userspace to kernel space on the destination. This ping-pong in and out of kernel space adds overhead to the migration process, thereby prolonging migration time and wasting resources. However, by using techniques such as RDMA it is possible to copy the page content directly from the kernel space on the source host to kernel space on the destination host. Paper IV investigates the effects of this approach on live migration performance.

WAN Migration

Migration can also be performed between datacenters over a WAN, as illustrated in Figure 7. However, WAN migration poses a number of difficulties compared to LAN migration. Firstly, a storage migration [54] must be performed, as it is not feasible to use a network storage device to share the disk images between the datacenters due to latency and performance issues. Storage migration, which is the concept of transferring the VM's disk image during operation, can be performed in a similar manner as memory migration, i.e., the storage blocks are transferred over the network from the source to the destination while the VM is running. If the precopy approach is used, blocks that are updated during this process have to be re-transferred; however, because disk writes are slower than memory updates, storage migration often requires fewer transfer iterations than memory migration [54].

Another difficulty is that, in contrast to LAN migration, the IP address space often differs between datacenters. Techniques to overcome this issue include, e.g., IP tunneling, Mobile IP, and light path reservation. Although hypervisor support for live migration usually is limited to LAN migration, live WAN migration has been demonstrated in several contributions [4, 8, 18, 34, 48, 53], mainly using variations on the precopy approach.


3.2.2 Live Migration Performance

Regardless of which live migration approach is used, the main objective is that users of the migrated VM should not perceive any interruption in service. Apart from this continuous operation objective, live migration performance can be measured in a number of ways. The total migration time and the migration downtime are obvious metrics, shorter times implying a more efficient algorithm. However, the resource consumption of an algorithm is also important because live migration is aimed to be transparent and efficient. An algorithm that consumes excessive resources risks affecting the performance of the migrating VM as well as any co-located VMs in a negative manner. Also, network and compute resources are valuable and should not be wasted if it can be avoided. In this section, we discuss a couple of methods to tackle the challenges of resource consumption, extended migration downtime, and performance degradation, and thus improve live migration performance.

When migrating VMs running CPU and/or memory intensive workloads, the migration downtime can easily reach several seconds or more [26, 41, 45]. In such cases, there is a high risk of service interruption, e.g., due to missed database timers or dropped network connections. To reduce the migration downtime, the memory pages can be compressed before transfer. Because less data is transferred in each iteration, the live migration process has a higher probability of keeping up with the pace with which the VM is writing to its memory. This means that the amount of data remaining to transfer when the source VM is suspended can be reduced, thereby shortening the migration downtime. In Paper I, a delta compression algorithm that is aimed at shortening migration downtime is proposed and evaluated. The algorithm works by reducing the amount of transferred data during the iterative phase of precopy live migration. The use of compression to improve live migration performance has also been proposed by Wood et al. [53]. Their strategy to increase live migration performance between cloud sites involves data deduplication [28] techniques in addition to compression.
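To illustrate the delta compression idea, the sketch below XORs a re-sent page against the previously transferred version and run-length encodes the unchanged (zero) runs, in the spirit of the XBZRLE algorithm from Paper I. This is a simplified illustration, not the actual QEMU/KVM implementation.

```python
# Delta-compress a re-sent page against its previously transferred version:
# XOR the two copies, then encode (zero_run_length, changed_bytes) pairs, so
# a page with few changed bytes costs only a few bytes on the wire.

def delta_encode(old: bytes, new: bytes) -> list:
    xor = bytes(a ^ b for a, b in zip(old, new))
    out, i = [], 0
    while i < len(xor):
        start = i
        while i < len(xor) and xor[i] == 0:          # skip unchanged bytes
            i += 1
        zeros = i - start
        start = i
        while i < len(xor) and xor[i] != 0:          # collect changed bytes
            i += 1
        out.append((zeros, xor[start:i]))
    return out

def delta_decode(old: bytes, encoded: list) -> bytes:
    page, pos = bytearray(old), 0
    for zeros, diff in encoded:
        pos += zeros                                 # unchanged run: keep old bytes
        for b in diff:
            page[pos] = old[pos] ^ b                 # XOR restores the new byte
            pos += 1
    return bytes(page)

old = bytes(4096)                                    # previously transferred page
new = bytearray(old); new[100:104] = b"abcd"         # only four bytes changed
enc = delta_encode(old, bytes(new))
assert delta_decode(old, enc) == bytes(new)          # lossless round-trip
```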

The complete memory content of the VM is transferred from the source to the destination host during live migration. As RAM sizes of several gigabytes are not uncommon and precopy live migration tends to re-send the same RAM pages several times, such live migration often involves transferring large data volumes. This leads to long total migration times. Even though compression techniques reduce the amount of transferred data, they do not necessarily reduce the number of page re-sends. To tackle this problem, Paper II proposes and evaluates a page reordering technique that reduces the amount of transmitted data by sending the memory pages in reverse order of usage frequency to avoid re-transfers. In a related contribution, Zheng et al. [54] leverage block update frequency when performing storage migration. Their idea is similar to the dynamic page transfer reordering scheme [45] but for storage migration instead of memory migration.
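A toy version of the reordering idea can be expressed as sorting pages by an observed per-page write counter, so that frequently dirtied pages are sent last; the counter values below are invented for the example.

```python
# Transmit pages in ascending dirtying frequency: the hottest pages go last
# and are therefore less likely to be dirtied again and need re-sending.

def transfer_order(dirty_counts: dict) -> list:
    """Return page numbers sorted from rarely-written to frequently-written."""
    return sorted(dirty_counts, key=lambda page: dirty_counts[page])

dirty_counts = {0: 0, 1: 42, 2: 3, 3: 0, 4: 17}      # page -> observed writes
print(transfer_order(dirty_counts))                  # [0, 3, 2, 4, 1]
```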

As postcopy migration algorithms only transfer pages once, this approach does not have the same drawback of excessive resource usage as precopy approaches. However, as postcopy algorithms pull missing pages from the source host over the network, performance for applications running in the migrated VM can suffer after the execution is moved from the source to the destination. Paper III [43] investigates this problem and evaluates an in-kernel hybrid live migration algorithm that addresses this issue by adding a short precopy phase to a postcopy algorithm.


Chapter 4

Thesis Contributions

This section summarizes the contributions of the papers that make up this thesis.

4.1 Paper I

Paper I [41] investigates the use of delta compression techniques to tackle the problem of memory pages being updated faster than they can be transferred over the network during live migration. By compressing the data stream during the transfer, migration throughput is increased, reducing the risk of the dirtying rate being higher than migration throughput. This means that migration downtime can be reduced. The delta compression extension studied in the paper is implemented as a modification to the standard KVM precopy algorithm and the performance of the algorithm is evaluated by migrating VMs running both real-world and benchmark applications. The evaluation demonstrates a significant decrease in migration downtime in all of the test cases, with a decrease by a factor of 100 in one case. Paper I also discusses some general effects of delta compression on live migration and contains an analysis of when it is beneficial to use the delta compression technique. The XBZRLE live migration algorithm proposed in this paper has recently been included in the official Qemu-KVM distribution [2].

4.2 Paper II

Paper II [45] introduces a live migration algorithm where the pages are transferred in a non-sequential order based on the page dirtying frequency. The algorithm, called Dynamic Page Transfer Reordering, is designed to reduce the number of page re-sends during migration, thereby reducing the total amount of data being sent, which in turn leads to a shorter migration time. The page dirtying frequency is tracked by adding a priority bitmap on top of the standard KVM dirty page bitmap. As pages are being dirtied, they move down in priority. Since the algorithm transfers pages by order of priority, the frequently dirtied pages are not transferred until the end of the migration process, reducing the risk of having to re-send them. The algorithm also includes the delta compression techniques from Paper I [41]. To evaluate the performance of the algorithm, a video streaming server as well as a benchmark application are live migrated. In the evaluation, total migration time was reduced by around 35%, and network bandwidth consumption lowered by 26% to 39%.

4.3 Paper III

In order to advance live migration beyond incremental performance improvements to existing algorithms, it is important to gain a deeper understanding of the live migration problem itself and its underlying principles. Paper III [43] is an in-depth study of live migration. It takes a step back and investigates the essential characteristics of live migration by identifying five fundamental properties: continuous service, low resource usage, robustness, predictability, and transparency. These properties are then used to investigate, categorize, and compare precopy, postcopy, and hybrid algorithms, including both well-known techniques and a novel RDMA in-kernel approach. A set of experiments is also performed where virtual machines with large memory sizes hosting workloads with high page dirtying rates are migrated to expose differences and limitations of the different approaches. The results from the experiments are used to validate the analysis of the fundamental properties. Finally, guidelines are provided for which approach to use in different scenarios.

4.4 Paper IV

Vertical elasticity, or scale-up of individual VMs, is hard to perform in today's cloud environments due to limitations in the amount of hardware resources available in single servers. To address this, Paper IV [42] investigates the concept of server disaggregation. This novel technique adds another abstraction layer on top of the hardware in order to divide CPU, memory, and I/O resources into pools which can be utilized to seamlessly provision VMs with the right amount of resources. In this paper, a reference architecture for server disaggregation is presented and implemented with key functionalities such as transparent and resilient memory aggregation and fast live migration. The approach is validated by a demonstration using benchmarks and a real-world big-data application. Performance results indicate a very low overhead, around 6%, when using aggregated memory, as well as a significant improvement in live migration performance.


4.5 Paper V

The cloud computing landscape has developed into a spectrum of cloud architectures, resulting in a broad range of management tools for similar operations but specialized for certain deployment scenarios. This both hinders the efficient reuse of algorithmic innovations and increases the heterogeneity between different management systems. In Paper V [24], a unified model to overcome these problems is proposed. Different cloud models are analyzed to find similarities and common factors in order to build a software architecture for service deployment in cloud environments, the Service Deployment Optimizer, SDO. The SDO is capable of deploying services across all common cloud scenarios. The design of the SDO framework is validated through a demonstration of how it can be used to deploy services, perform bursting and brokering, as well as mediate a cloud federation in the context of the OPTIMIS Toolkit [14].

4.6 Paper VI

Extending on Paper V, Paper VI [25] studies service placement under dynamic pricing scenarios. Until now, most research on service placement has focused on static pricing scenarios, where cloud providers offer their resources at fixed prices. However, with the recent trend of dynamic pricing of cloud resources [3], where the price of a compute resource can vary, e.g., depending on the available capacity and load of the provider, new placement algorithms are needed. In Paper VI, the SDO scheduling software from Paper V is used to investigate a number of dynamic pricing scheduling algorithms to optimize cost, ranging from heuristics to combinatorial optimization. The proposed algorithms are then evaluated by deploying a set of services across multiple cloud providers. Finally, the tradeoff between quality of solution and execution time is analyzed for the algorithms.

4.7 Paper VII

Paper VII [44] introduces a model for continuous consolidation of resources in datacenters. Efficient mapping of VMs onto PMs is a key problem for cloud infrastructure providers as hardware utilization directly impacts revenue. Today, this mapping is commonly performed only when new VMs are created, but as VM workloads fluctuate and PM availability varies, any initial mapping is bound to become suboptimal over time. In Paper VII, methods and algorithms are developed to optimize the resource utilization and minimize power consumption of a datacenter, thereby increasing the revenue for the provider. The approach includes a set of heuristic methods for continuous optimization of the VM-to-PM mapping based on combinations of fundamental management actions, namely suspending and resuming PMs and VMs, and live migrating VMs. To verify the real-world validity of the approach, a proof-of-concept datacenter management system is designed and implemented with the proposed algorithms. The complete continuous datacenter consolidation approach is then evaluated through a combination of simulations and real experiments where our system provisions a workload of benchmark applications. The results indicate that the proposed algorithms are feasible, the combined management approach achieves the best results, and that the VM suspend and resume mechanism has the largest impact on revenue.


Bibliography

[1] Microsoft Virtual PC. http://www.microsoft.com/windows/virtual-pc/default.aspx, visited on 2014-03-31.

[2] Qemu XBZRLE migration capability. http://lists.nongnu.org/archive/html/qemu-devel/2012-04/msg01250.html, visited on 2014-03-31.

[3] Agmon Ben-Yehuda, O., Ben-Yehuda, M., Schuster, A., and Tsafrir, D. Deconstructing Amazon EC2 spot instance pricing. ACM Trans. Econ. Comput., 1(3):16:1–16:20, 2013.

[4] Akiyama, S., Hirofuchi, T., Takano, R., and Honiden, S. Fast wide area live migration with a low overhead through page cache teleportation. In CCGrid '13: The 2013 IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 78–82, May 2013.

[5] Ali-Eldin, A., Rezaie, A., Mehta, A., Razroev, S., Sjöstedt-de Luna, S., Seleznjev, O., Tordsson, J., and Elmroth, E. How will your workload look like in 6 years? Analyzing Wikimedia's workload. In IC2E '14: Proceedings of the 2014 IEEE International Conference on Cloud Engineering, pages 349–354, 2014.

[6] Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., and Zaharia, M. A view of cloud computing. Commun. ACM, 53(4):50–58, 2010.

[7] Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. Xen and the art of virtualization. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 164–177. ACM, 2003.

[8] Bradford, R., Kotsovinos, E., Feldmann, A., and Schiöberg, H. Live wide-area migration of virtual machines including local persistent state. In VEE '07: Third International ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments, pages 169–179. ACM, 2007.


[9] Clark, C., Fraser, K., Hand, S., Hansen, J. G., Jul, E., Limpach, C., Pratt, I., and Warfield, A. Live migration of virtual machines. In NSDI '05: 2nd Symposium on Networked Systems Design and Implementation, pages 273–286. ACM, 2005.

[10] Coffman, E. G., Jr., Garey, M. R., and Johnson, D. S. Approximation algorithms for bin packing: A survey. In Approximation Algorithms for NP-hard Problems, pages 46–93. PWS Publishing Co., 1997.

[11] Creasy, R. J. The origin of the VM/370 time-sharing system. IBM Journal of Research & Development, 25:483–490, 1981.

[12] Distributed Management Task Force. Architecture for Managing Clouds - a white paper from the open cloud standards incubator. http://dmtf.org/sites/default/files/standards/documents/DSP-IS0102_1.0.0.pdf, visited on 2014-03-31.

[13] Espling, D., Armstrong, D., Tordsson, J., Djemame, K., and Elmroth, E. Contextualization: Dynamic configuration of virtual machines. Submitted, 2013.

[14] Ferrer, A. J., Hernandez, F., Tordsson, J., Elmroth, E., Ali-Eldin, A., Zsigri, C., Sirvent, R., Guitart, J., Badia, R. M., Djemame, K., Ziegler, W., Dimitrakos, T., Nair, S. K., Kousiouris, G., Konstanteli, K., Varvarigou, T., Hudzia, B., Kipp, A., Wesner, S., Corrales, M., Forgo, N., Sharif, T., and Sheridan, C. OPTIMIS: A Holistic Approach to Cloud Service Provisioning. Future Generation Computer Systems, 28(1):66–77, 2012.

[15] Foster, I., Zhao, Y., Raicu, I., and Lu, S. Cloud computing and grid computing 360-degree compared. In GCE '08: The 2008 Grid Computing Environments Workshop, pages 1–10, Nov 2008.

[16] Geer, D. Chip makers turn to multicore processors. Computer, 38(5):11–13, 2005.

[17] Goiri, I., Guitart, J., and Torres, J. Characterizing Cloud Federation for Enhancing Providers' Profit. In Proceedings of the IEEE 3rd International Conference on Cloud Computing (CLOUD'10), pages 123–130. IEEE, 2010.

[18] Harney, E., Goasguen, S., Martin, J., Murphy, M., and Westall, M. The efficacy of live virtual machine migrations over the internet. In VTDC '07: 2nd International Workshop on Virtualization Technologies in Distributed Computing, pages 1–7. ACM, 2007.

[19] InfiniBand Trade Association. InfiniBand Architecture Specification: Release 1.0. InfiniBand Trade Association, 2000.


[20] Jennings, N., Faratin, P., Lomuscio, A., Parsons, S., Wooldridge, M., and Sierra, C. Automated negotiation: prospects, methods and challenges. Group Decision and Negotiation, 10(2):199–215, 2001.

[21] Kernel Based Virtual Machine. KVM - kernel-based virtualization machine white paper. http://kvm.qumranet.com/kvmwiki, visited on 2012-05-23.

[22] Kesselman, C. and Foster, I. The grid: blueprint for a future computing infrastructure. Morgan Kaufmann Publishers, 1999.

[23] Larsson, L., Henriksson, D., and Elmroth, E. Scheduling and monitoring of internally structured services in cloud federations. In ISCC '11: The 2011 IEEE Symposium on Computers and Communications, pages 173–178, 2011.

[24] Li, W., Svärd, P., Tordsson, J., and Elmroth, E. A General Approach to Service Deployment in Cloud Environments. In CGC '12: Proceedings of the second International Conference on Cloud and Green Computing, pages 17–24. IEEE, 2012.

[25] Li, W., Svärd, P., Tordsson, J., and Elmroth, E. Cost-Optimal Cloud Service Deployment under Dynamic Pricing Schemes. In UCC '13: Proceedings of the sixth International Conference on Utility and Cloud Computing, pages 187–194. IEEE/ACM, 2013.

[26] Liu, P., Yang, Z., Song, X., Zhou, Y., Chen, H., and Zang, B. Heterogeneous live migration of virtual machines. Technical report, Parallel Processing Institute, Fudan University, 2009.

[27] Lucas-Simarro, J. L., Moreno-Vozmediano, R., Montero, R. S., and Llorente, I. M. Scheduling Strategies for Optimal Service Deployment across Multiple Clouds. Future Generation Computer Systems, 29(6):1431–1441, 2013.

[28] Mandagere, N., Zhou, P., Smith, M. A., and Uttamchandani, S. Demystifying data deduplication. In Proceedings of the ACM/IFIP/USENIX Middleware '08 Conference Companion, Companion '08, pages 12–17. ACM, 2008.

[29] Mann, C. C. The end of Moore's law. Technology Review, 103(3):42–48, 2000.

[30] Mell, P. and Grance, T. The NIST definition of cloud computing (draft). NIST special publication, 800(145):7, 2011.

[31] Neiger, G., Santoni, A., Leung, F., Rodgers, D., and Uhlig, R. Intel virtualization technology: Hardware support for efficient processor virtualization. Intel Technology Journal, 10:167–178, 2006.


[32] Nomura. Nomura VMWare Q4 report. Technical report, Nomura Research, 2012.

[33] Plummer, D. C. RFC 826: An Ethernet Address Resolution Protocol – or – Converting network protocol addresses to 48.bit Ethernet address for transmission on Ethernet hardware. 1982.

[34] Ramakrishnan, K. K., Shenoy, P., and Van der Merwe, J. Live data center migration across WANs: a robust cooperative context aware approach. In INM '07: The ACM SIGCOMM Workshop on Internet Network Management 2007, pages 262–267. ACM, 2007.

[35] Rochwerger, B., Breitgand, D., Levy, E., Galis, A., Nagin, K., Llorente, I., Montero, R., Wolfsthal, Y., Elmroth, E., Caceres, J., Ben-Yehuda, M., Emmerich, W., and Galan, F. The RESERVOIR model and architecture for open federated cloud computing. IBM Journal of Research and Development, 53(4):1–11, 2009.

[36] Sedaghat, M., Hernandez, F., and Elmroth, E. Unifying cloud management: Towards overall governance of business level objectives. In CCGrid '11: The 2011 International Symposium on Cluster, Cloud and Grid Computing. IEEE Computer Society, 2011.

[37] Song, W., Xiao, Z., Chen, Q., and Luo, H. Adaptive Resource Provisioning for the Cloud using Online Bin Packing. IEEE Transactions on Computers, 99, 2013.

[38] Srikantaiah, S., Kansal, A., and Zhao, F. Energy aware consolidation for cloud computing. In Proceedings of the 2008 conference on Power aware computing and systems, volume 10. USENIX Association, 2008.

[39] Strauch, S., Kopp, O., Leymann, F., and Unger, T. A taxonomy for cloud data hosting solutions. In CGC '11: Proceedings of the 2011 IEEE International Conference on Cloud and Green Computing, pages 577–584. IEEE Computer Society, 2011.

[40] Subramoni, H., Koop, M., and Panda, D. Designing next generation clusters: Evaluation of InfiniBand DDR/QDR on Intel computing platforms. In HOTI '09: The 2009 IEEE Symposium on High Performance Interconnects, pages 112–120, Aug 2009.

[41] Svärd, P., Hudzia, B., Tordsson, J., and Elmroth, E. Evaluation of delta compression techniques for efficient live migration of large virtual machines. In VEE '11: The 2011 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 111–120. ACM, 2011.


[42] Svärd, P., Hudzia, B., Tordsson, J., and Elmroth, E. Hecatonchire: Enabling Multi-Host Virtual Machines by Resource Aggregation and Pooling. Technical Report UMINF 14.11, Submitted, 2014.

[43] Svärd, P., Hudzia, B., Walsh, S., Tordsson, J., and Elmroth, E. The Noble Art of Live VM Migration - Principles and performance of precopy, postcopy and hybrid migration of demanding workloads. Technical Report UMINF 14.10, Submitted, 2014.

[44] Svärd, P., Li, W., Wadbro, E., Tordsson, J., and Elmroth, E. Continuous Datacenter Consolidation. Technical Report UMINF 14.08, Submitted, 2014.

[45] Svärd, P., Tordsson, J., Hudzia, B., and Elmroth, E. High performance live migration through dynamic page transfer reordering and compression. In CloudCom '11: 3rd IEEE International Conference on Cloud Computing Technology and Science, pages 542–548. IEEE, 2011.

[46] The Economist. Tanks in the cloud. http://www.economist.com/node/17797794, visited on 2014-03-31.

[47] Tordsson, J., Montero, R., Moreno-Vozmediano, R., and Llorente, I. Cloud brokering mechanisms for optimized placement of virtual machines across multiple providers. Future Generation Computer Systems, 28(2):358–367, 2012.

[48] Travostino, F., Daspit, P., Gommans, L., Jog, C., de Laat, C., Mambretti, J., Monga, I., van Oudenaarde, B., Raghunatha, S., and Wang, P. Y. Seamless live migration of virtual machines over the MAN/WAN. In SC '06: The International Conference for High Performance Computing, Networking, Storage and Analysis, page 290. ACM, 2006.

[49] Trelles, O., Prins, P., Snir, M., and Jansen, R. C. Big data, but are we ready? Nature Reviews Genetics, 12(3):224, 2011.

[50] Verma, A., Dasgupta, G., Nayak, T. K., De, P., and Kothari, R. Server workload analysis for power minimization using consolidation. In USENIX '09: The 2009 USENIX Annual Technical Conference, pages 28–28. USENIX, 2009.

[51] VMWARE. VMware VMotion: Live migration of virtual machines without service interruption datasheet. http://www.vmware.com/files/pdf/VMware-VMotion-DS-EN.pdf, visited on 2014-03-31.

[52] Wagner, H. M. An integer linear-programming model for machine scheduling. Naval Research Logistics Quarterly, 6(2):131–140, 1959.


[53] Wood, T., Ramakrishnan, K. K., Shenoy, P., and van der Merwe, J. CloudNet: dynamic pooling of cloud resources by live WAN migration of virtual machines. In VEE '11: The 2011 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 121–132. ACM, 2011.

[54] Zheng, J., Ng, T. S. E., and Sripanidkulchai, K. Workload-aware live storage migration for clouds. In VEE '11: The 2011 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 133–144. ACM, 2011.
