

Automated Control of Multiple Virtualized Resources

Pradeep Padala, Kai-Yuan Hou, Kang G. Shin, Xiaoyun Zhu, Mustafa Uysal, Zhikui Wang, Sharad Singhal, Arif Merchant

HP Laboratories
HPL-2008-123

Keyword(s): virtualization, consolidation, service level objective, control, service differentiation

Abstract: Virtualized data centers enable consolidation of multiple applications and sharing of multiple resources among these applications. However, current virtualization technologies are inadequate in achieving complex service level objectives (SLOs) for enterprise applications with time-varying multiple resource demands. In this paper, we present AutoControl, a resource control system that automatically adapts to dynamic changes in a shared virtualized infrastructure to achieve application SLOs. AutoControl is a combination of an online model estimator and a novel multi-input, multi-output (MIMO) resource controller. The model estimator captures the complex relationship between application performance and resource allocation, while the MIMO controller allocates the right amount of resources to achieve application SLOs. Our experimental results using RUBiS and TPC-W benchmarks along with production-trace-driven workloads indicate that AutoControl can detect and adapt to CPU and disk I/O bottlenecks that occur over time and across multiple nodes and allocate multiple virtualized resources accordingly to achieve application SLOs. It can also provide service differentiation according to the priorities of individual applications.

External Posting Date: October 6, 2008 [Fulltext] Approved for External Publication
Internal Posting Date: October 6, 2008 [Fulltext]

Copyright 2008 Hewlett-Packard Development Company, L.P.

Automated Control of Multiple Virtualized Resources

Pradeep Padala, Kai-Yuan Hou, Kang G. Shin

The University of Michigan

Xiaoyun Zhu, Mustafa Uysal, Zhikui Wang, Sharad Singhal, Arif Merchant

Hewlett Packard Laboratories

Abstract

Virtualized data centers enable consolidation of multiple applications and sharing of multiple resources among these applications. However, current virtualization technologies are inadequate in achieving complex service level objectives (SLOs) for enterprise applications with time-varying multiple resource demands. In this paper, we present AutoControl, a resource control system that automatically adapts to dynamic changes in a shared virtualized infrastructure to achieve application SLOs. AutoControl is a combination of an online model estimator and a novel multi-input, multi-output (MIMO) resource controller. The model estimator captures the complex relationship between application performance and resource allocation, while the MIMO controller allocates the right amount of resources to achieve application SLOs. Our experimental results using RUBiS and TPC-W benchmarks along with production-trace-driven workloads indicate that AutoControl can detect and adapt to CPU and disk I/O bottlenecks that occur over time and across multiple nodes and allocate multiple virtualized resources accordingly to achieve application SLOs. It can also provide service differentiation according to the priorities of individual applications.

1. INTRODUCTION

Virtualization is causing a disruptive change in enterprise data centers and giving rise to a new paradigm: shared virtualized infrastructure. In this new paradigm, multiple enterprise applications share dynamically allocated resources. These applications are also consolidated to reduce infrastructure and operating costs while simultaneously increasing resource utilization. As a result, data center administrators are faced with growing challenges to meet service level objectives (SLOs) in the presence of dynamic resource sharing and unpredictable interactions across many applications. These challenges include:

• Complex SLOs: It is non-trivial to convert individual application SLOs to corresponding resource shares in the shared virtualized platform. For example, determining the amount of CPU and the disk shares required to achieve a specified number of financial transactions is difficult.

Figure 1: Resource usage in a production SAP application server for a one-day period (CPU and disk utilization, in %, per 5-minute interval).

• Changing resource requirements over time: The intensity and the mix of enterprise application workloads change over time. As a result, the demand for individual resource types changes over the lifetime of the application. For example, Figure 1 shows the CPU and disk utilization of an SAP application for a day. The utilization for both resources varies over time considerably, and the peaks of the two resource types occurred at different times of the day. This implies that static resource allocation can meet application SLOs only when the resources are allocated for peak demands, wasting resources.

• Distributed resource allocation: Multi-tier applications spanning multiple nodes require resource allocations across all tiers to be at appropriate levels to meet end-to-end application SLOs.

• Resource dependencies: Application-level performance often depends on the application's ability to simultaneously access multiple system-level resources.

The traditional approach to meeting SLOs of enterprise applications involves resource over-provisioning based on cumbersome pre-production sizing and test cycles. Since resources can now be dynamically provisioned based on actual demand, the resource over-provisioning approach is wasteful. Furthermore, in a virtualized infrastructure, the performance of a given application depends on other applications sharing resources, making it difficult to replicate its behavior in pre-production environments.

In this paper, we address the problem of managing the allocation of computational resources in a shared, virtualized infrastructure to achieve application-level SLOs. Our solution to this problem is AutoControl, an automated resource control and adaptation system.

Our main contributions are twofold. First, we design an online model estimator to dynamically determine and capture the relationship between application-level performance and the allocation of individual resource shares. Our adaptive modeling approach captures the complex behavior of enterprise applications, including varying resource demands over time, resource demands from distributed application components, and shifting demands across multiple resource types. Second, we design a two-layered, multi-input, multi-output (MIMO) controller to automatically allocate multiple types of resources to multiple enterprise applications to achieve their SLOs. The first layer consists of a set of application controllers that automatically determine the amount of resources necessary to achieve individual application SLOs, using the estimated models and a feedback approach. The second layer is comprised of a set of node controllers that detect resource bottlenecks on the shared nodes and properly allocate resources of multiple types to individual applications. In overload cases, the node controllers can provide service differentiation by prioritizing allocation among different applications.

We have built a testbed using Xen [5], and evaluated our controller in various scenarios. Our experimental results indicate that AutoControl can detect and adapt to bottlenecks happening in both CPU and disk across multiple nodes. We show that the controller can handle multiple multi-tier applications running RUBiS and TPC-W benchmarks along with workloads driven by production traces. We also show that priorities can be enforced among different applications during resource contention.

The remainder of the paper is organized as follows. Section 2 provides an overview of AutoControl. This is followed by a detailed description of the design of the model estimator and the MIMO controller in Section 3. Section 4 discusses the experimental methodology and testbed setup. We present experimental evaluation results in Section 5, followed by a discussion of related work in Section 6. Section 7 delineates some of the limitations of this work along with suggestions for future research, and conclusions are drawn in Section 8.

Figure 2: Physical organization: Each node hosts multiple applications running on VMs. Applications can span multiple nodes.

Figure 3: Logical controller organization: Each application has one application controller. Each node has one node controller that arbitrates the requests from multiple application controllers.

2. OVERVIEW, ASSUMPTIONS AND GOALS

In this section, we present an overview of our system architecture and the assumptions and goals that drive our design. We assume that applications are hosted within containers or virtual machines (VMs) [5] to enable resource sharing within a virtualized server node. A multi-tier application may run on multiple VMs that span nodes. Figure 2 shows an example with three nodes hosting four applications.

In AutoControl, operators can specify the requirements for an application in a tuple (priority, metric, target), where priority represents the priority of the application, metric specifies the performance metric (e.g., transaction throughput, 90th percentile of response time), and target specifies the performance target. Currently, our implementation supports only a single metric specification within AutoControl at a time, although the architecture can be generalized to support different metrics for different applications.
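The tuple itself is not tied to any particular syntax. As a purely illustrative sketch (the names and values below are ours, not AutoControl's API), such a specification might be represented as:

```python
# Hypothetical representation of the (priority, metric, target) tuple described
# above; field names and example values are illustrative only.
from dataclasses import dataclass

@dataclass
class AppSpec:
    priority: int    # priority weight used for service differentiation
    metric: str      # e.g., "throughput" or "response_time"
    target: float    # desired value of the metric

# Example: a throughput target of 100 requests/sec and a 1.5-second
# response-time target for two co-hosted applications.
rubis_spec = AppSpec(priority=1, metric="throughput", target=100.0)
tpcw_spec = AppSpec(priority=2, metric="response_time", target=1.5)
```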

AutoControl can use any resources that affect the application metrics of interest and that can be allocated between the applications; in this paper, we use CPU and disk I/O as the resources, and application throughput or average response time as the performance metric. AutoControl consists of a set of application controllers (AppControllers) and a set of node controllers (NodeControllers). There is one AppController for each hosted application, and one NodeController for each virtualized node. Figure 3 shows the logical controller architecture for the system shown in Figure 2. For each application, its AppController periodically polls an application performance sensor for the measured performance. We refer to this period as the control interval. The AppController compares this measurement with the application performance target and, based on the discrepancy, automatically determines the resource allocations needed for the next control interval, and sends these requests to the NodeControllers for the nodes that host the application.

For each node, based on the collective requests from all relevant AppControllers, the corresponding NodeController determines whether it has enough resources of each type to satisfy all demands, and computes the actual resource allocation using the methods described in Section 3. The computed allocation values are fed into the resource schedulers in the virtualization layer for actuation, which allocate the corresponding portions of the node resources to the VMs in real time. Figure 3 shows CPU and disk schedulers as examples.
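To make the control loop concrete, the following minimal sketch walks through one control interval as described above. It is an assumption-laden outline in Python (the real controllers are written in Java with Python wrappers), and every class and method name here is a placeholder rather than the actual AutoControl interface:

```python
# Sketch of one AutoControl control interval; all names are illustrative.

def control_interval(app_controllers, node_controllers, perf_sensors, actuators):
    # 1. Each AppController polls its application's performance sensor and,
    #    using its estimated model, requests allocations for the next interval.
    requests = {}   # (node, app, resource, tier) -> requested share in [0, 1]
    for app, controller in app_controllers.items():
        measured = perf_sensors[app]()                 # throughput or response time
        requests.update(controller.compute_requests(measured))

    # 2. Each NodeController gathers the requests that touch its node, resolves
    #    any CPU or disk contention, and hands the final shares to the schedulers.
    for node, controller in node_controllers.items():
        local = {key: share for key, share in requests.items() if key[0] == node}
        allocations = controller.arbitrate(local)      # proportional or optimized split
        actuators[node](allocations)                   # set CPU caps and disk shares
```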

The AutoControl architecture allows the placement of AppControllers and NodeControllers in a distributed fashion. NodeControllers can be hosted in the physical node they are controlling. AppControllers can be hosted in a node where one of the application tiers is located. We do not mandate this placement, however, and the data center operator can choose to host a set of controllers in a node dedicated for control operations.

We assume that all nodes in the data center are connected with a high-speed network, so that sensor and actuation delays within AutoControl are small compared to the control interval. We also require accurate system sensors and actuators, and assume that the underlying system schedulers provide a rich enough interface to dynamically adjust resource shares for VMs.

We set the following goals in designing AutoControl:

Performance assurance: If all applications can meet their performance targets, AutoControl should allocate resources properly to achieve them. If they cannot be met, AutoControl should provide service differentiation according to application priorities.

Table 1: Notation

A            set of all hosted applications
T_a          set of all the tiers in application a ∈ A, e.g., T_a = {web, db}
R            set of all resource types controlled, e.g., R = {cpu, disk}
k            index for control interval
x(k)         value of variable x in control interval k
ur_{a,r,t}   requested allocation of resource type r to tier t of application a,
             0 ≤ ur_{a,r,t}(k) ≤ 1 (ur_{a,r} for single-tier applications)
u_{a,r,t}    actual allocation of resource type r to tier t of application a,
             0 ≤ u_{a,r,t}(k) ≤ 1 (u_{a,r} for single-tier applications)
y_a          measured performance of application a
yr_a         performance target for application a
yn_a         normalized performance for application a, where yn_a = y_a/yr_a
w_a          priority weight for application a
q            stability factor in the AppController

Automation: While performance targets and certain parameters within AutoControl may be set manually, all allocation decisions should be made automatically without human intervention.

Adaptation: The controller should adapt to variations in workloads or system conditions.

Scalability: The controller architecture should scale to a large environment with many applications and nodes by not requiring a single centralized controller.

3. DESIGN AND IMPLEMENTATION

This section details the design of both the AppController and the NodeController in the two-layered architecture of AutoControl. For easy reference, Table 1 summarizes the mathematical symbols that will be used for key parameters and variables in these controllers.

3.1 Design of AppController

As introduced in Section 2, every hosted application has an AppController associated with it. In order for each AppController to decide how much resource is needed for the application to meet its performance target, it first needs to determine the quantitative and dynamic relationship between the application's resource allocation and its performance. Such a relationship is captured in the notion of a "transfer function" in traditional control theory for modeling of physical systems. However, most computing systems, such as the one considered in this paper, cannot be represented by a single, linear transfer function (or model) because their behavior is often nonlinear and workload-dependent. We assume, however, that the behavior of the system can be approximately characterized locally by a linear model. We periodically re-estimate the model online based on real-time measurements of the relevant variables and metrics, allowing the model to adapt to different operating regimes and workload conditions.

Figure 4: AppController's internal structure

Every AppController has two modules as illustrated in Figure 4: (1) a model estimator that automatically learns in real time a model for the relationship between an application's resource allocation and its performance, and (2) an optimizer that predicts the resource allocation required for the application to meet its performance target based on the estimated model. For each application a ∈ A, let y_a(k) be the value of its performance metric provided by an application performance sensor at the end of control interval k, and let yr_a be the desired value for its performance. Furthermore, we define yn_a(k) = y_a(k)/yr_a to be the normalized performance value for interval k. We then define the resource-allocation variable u_a to be a vector that contains all the elements in the set {u_{a,r,t} : r ∈ R, t ∈ T_a}. For example, for a two-tier application whose performance depends on two critical resources, e.g., T_a = {web, db} and R = {cpu, disk}, u_a is a 4-dimensional vector. u_a(k) represents the resource-allocation values for application a during interval k (we represent all vectors in boldface).

3.1.1 Model estimator

For every control interval, the model estimator recomputes a linear model that approximates the nonlinear and time-varying relationship between the resource allocation to application a (u_a) and its normalized performance (yn_a) around the current operating point. More specifically, the following auto-regressive-moving-average (ARMA) model is used to represent this relationship:

yn_a(k) = a_1(k)\, yn_a(k-1) + a_2(k)\, yn_a(k-2) + b_0^T(k)\, u_a(k) + b_1^T(k)\, u_a(k-1),   (1)

where the model parameters a_1(k) and a_2(k) capture the correlation between the application's past and present performance, and b_0(k) and b_1(k) are vectors of coefficients capturing the correlation between the current performance and the recent resource allocations. Both u_a(k) and u_a(k-1) are column vectors, and b_0^T(k) and b_1^T(k) are row vectors. We chose a linear model because it is easy to estimate online and simplifies the corresponding controller design problem. In our experiments, we have found that the second-order ARMA model in Eq. (1) (i.e., one that takes into account the past two control intervals) can predict the application performance with adequate accuracy. (Some evidence of this will be presented later in Section 5.)

The reason why we model the normalized performance rather than the absolute performance is that the latter can have an arbitrary magnitude. The normalized performance yn_a has values that are comparable to those of the resource allocations in u_a, which are less than 1. This improves the numerical stability of the algorithm.

Note that the model represented in Eq. (1) is itself adaptive, because all the model parameters a_1, a_2, b_0 and b_1 are functions of the control interval k. These parameters can be re-estimated online using the recursive least squares (RLS) method [4]. At the end of every control interval k-1, the model estimator collects the newly measured performance value y_a(k-1), normalizes it by the performance target yr_a, and uses it to update the values of the model parameters. The approach assumes that drastic variations in workloads that cause significant model parameter changes occur infrequently relative to the control interval, thus allowing the model estimator to converge locally around an operating point and track changes in the operating point. The recursive nature of RLS makes the time taken for this computation negligible for control intervals longer than 10 seconds.
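As a concrete illustration of this step, the sketch below implements an RLS update for the second-order ARMA model of Eq. (1) using numpy. It is a minimal stand-in for the paper's Java estimator; the forgetting factor and the initial covariance are assumed values, not taken from the paper.

```python
import numpy as np

class RLSEstimator:
    """Recursive least squares for the model of Eq. (1):
    yn(k) = a1*yn(k-1) + a2*yn(k-2) + b0.u(k) + b1.u(k-1).
    A sketch only; the forgetting factor and initial covariance are assumptions."""

    def __init__(self, n_alloc, forgetting=0.98):
        n = 2 + 2 * n_alloc               # parameters: [a1, a2, b0..., b1...]
        self.theta = np.zeros(n)          # current parameter estimates
        self.P = np.eye(n) * 1000.0       # large initial covariance
        self.lam = forgetting

    def update(self, yn_k, yn_k1, yn_k2, u_k, u_k1):
        # Regressor built from past performance and the recent allocations.
        phi = np.concatenate(([yn_k1, yn_k2], u_k, u_k1))
        err = yn_k - phi @ self.theta                          # prediction error
        gain = self.P @ phi / (self.lam + phi @ self.P @ phi)  # RLS gain
        self.theta = self.theta + gain * err
        self.P = (self.P - np.outer(gain, phi) @ self.P) / self.lam
        return self.theta                 # [a1, a2, b0 (n_alloc), b1 (n_alloc)]
```

At the end of each control interval, the AppController would call update() with the newly measured normalized performance and the allocations applied in the last two intervals; the returned b_0 entries are the sensitivities the NodeController later uses during contention.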

3.1.2 Optimizer

The main goal of the optimizer is to determine the resource allocation required (ur_a) in order for the application to meet its target performance. An additional goal is to accomplish this in a stable manner, without causing large oscillations in the resource allocation. We achieve these goals by finding the value of ur_a that minimizes the following cost function:

J_a = (yn_a(k) - 1)^2 + q ||ur_a(k) - u_a(k-1)||^2.   (2)

To explain the intuition behind this function, we define J_p = (yn_a(k) - 1)^2 and J_c = ||ur_a(k) - u_a(k-1)||^2. It is easy to see that J_p is 0 when y_a(k) = yr_a, i.e., when application a is meeting its performance target. Otherwise, J_p serves as a penalty for the deviation of the application's measured performance from its target. Therefore, we refer to J_p as the performance cost. The second function J_c, referred to as the control cost, is included to improve controller stability. The value of J_c is higher when the controller makes a larger change in the resource allocation in a single interval. Because J_a = J_p + q · J_c, our controller aims to minimize a linear combination of both the performance cost and the control cost. Using the approximate linear relationship between the normalized performance and the resource allocation, as described by Eq. (1), we can derive the resource allocation required to minimize the cost function J_a, in terms of the recent resource allocation u_a and the corresponding normalized performance values yn_a:

ur*_a(k) = (b_0 b_0^T + qI)^{-1} ((1 - a_1 yn_a(k-1) - a_2 yn_a(k-2) - b_1^T u_a(k-1)) b_0 + q u_a(k-1)).   (3)

This is a special case of the optimal control law derived in [18]. Note that the dependency of the model parameters a_1, a_2, b_0 and b_1 on the control interval k has been dropped from the equation to improve its readability.

To understand the intuition behind this control law and the effect of the scaling factor q, we define Δyn_a(k) = 1 - a_1 yn_a(k-1) - a_2 yn_a(k-2) - b_1^T u_a(k-1). This indicates the discrepancy between the model-predicted value for yn_a(k) and its target (which is 1) that needs to be compensated by the next allocation (u_a(k)). For a small q value, ur*_a(k) is dominated by the effect of Δyn_a(k), and the controller reacts aggressively to tracking errors in performance. As the value of q increases, ur*_a(k) is increasingly dominated by the previous allocation (u_a(k-1)), and the controller responds slowly to the tracking error with less oscillation in the resulting resource allocation. In the extreme of an infinitely large q value, we have ur*_a(k) = u_a(k-1), meaning the allocation remains constant. As a result, the scaling factor q provides an intuitive way to control the trade-off between the controller's stability and its ability to respond to changes in the workloads and performance; hence it is referred to as the stability factor.
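The control law of Eq. (3) reduces to a few lines of linear algebra once the model parameters are available. The following numpy sketch is an illustrative translation using the notation above, not the paper's implementation; the clipping to [0, 1] simply reflects that allocations are shares of the total capacity.

```python
import numpy as np

def requested_allocation(theta, n_alloc, yn_k1, yn_k2, u_k1, q=2.0):
    """Evaluate ur*(k) from Eq. (3) for one application.
    theta = [a1, a2, b0..., b1...] as produced by the model estimator;
    q is the stability factor."""
    a1, a2 = theta[0], theta[1]
    b0 = theta[2:2 + n_alloc]
    b1 = theta[2 + n_alloc:]
    # Part of yn(k) already determined by past values; its gap to the target 1
    # is what the next allocation has to compensate for.
    delta = 1.0 - a1 * yn_k1 - a2 * yn_k2 - b1 @ u_k1
    A = np.outer(b0, b0) + q * np.eye(n_alloc)
    ur = np.linalg.solve(A, delta * b0 + q * u_k1)
    return np.clip(ur, 0.0, 1.0)   # allocations are fractions of capacity
```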

3.2 Design of NodeController

For each of the virtualized nodes, a NodeController determines the allocation of resources to the applications, based on the resources requested by the AppControllers and the resources available at the node. This is required because the AppControllers act independently of one another and may, in aggregate, request more resources than the node has available. The NodeController divides the resources between the applications as follows. For resources where the total resources requested are less than the available resources, the NodeController divides each resource in proportion to the requests from the AppControllers. For resources that are contested, that is, where the sum of the resource requests is greater than the available resource, the NodeController picks an allocation that locally minimizes the discrepancy between the resulting normalized application performance and its target value. More precisely, the cost function used is the weighted sum of the squared errors for the normalized application performance, where each application's weight represents its priority relative to other applications.

To illustrate this resource allocation method, let us take node 1 in Figures 2 and 3 as an example (denoted as "n1"). This node is being used to host the web tier of application 1 and application 2. Suppose CPU and disk are the two critical and controllable resources being shared by the two applications. Then, the resource request from application 1 consists of two elements, ur_{1,cpu,web} and ur_{1,disk,web}, one for each resource. Similarly, the resource request from application 2 consists of ur_{2,cpu} and ur_{2,disk}. Because resource allocation is defined as a percentage of the total shared capacity of a resource, the resource requests from both applications need to satisfy the following capacity constraints:

ur_{1,cpu,web} + ur_{2,cpu} ≤ 1   (4)

ur_{1,disk,web} + ur_{2,disk} ≤ 1   (5)

When constraint (4) is violated, we say the virtualized node suffers CPU contention. Similarly, disk contention refers to the condition of the node when constraint (5) is violated. Next, we describe the four possible scenarios for the virtualized node n1, and the NodeController algorithm for dealing with each scenario.

3.2.1 Scenario I: No CPU or disk contention

In this case, the node has adequate CPU and disk resources to meet all resource requests, and hence the resources are divided in proportion to the resource requests, as follows:

u_{1,cpu,web} = ur_{1,cpu,web}/(ur_{1,cpu,web} + ur_{2,cpu})   (6)

u_{2,cpu} = ur_{2,cpu}/(ur_{1,cpu,web} + ur_{2,cpu})   (7)

u_{1,disk,web} = ur_{1,disk,web}/(ur_{1,disk,web} + ur_{2,disk})   (8)

u_{2,disk} = ur_{2,disk}/(ur_{1,disk,web} + ur_{2,disk})   (9)

This allocation policy implies two things: (1) for each application and each resource, the requested allocation should be satisfied; (2) the excess capacity for each resource is allocated to both applications in proportion to their requests.
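The uncontended case is therefore a one-line scaling of the requests. The sketch below is an illustrative Python version of Eqs. (6)-(9) for a single resource; it is not the NodeController code itself.

```python
def divide_uncontended(requests):
    """Proportionally divide one resource among applications when the sum of
    requests does not exceed the capacity (Scenario I, Eqs. (6)-(9)).
    requests: dict mapping application/tier name -> requested share in [0, 1]."""
    total = sum(requests.values())
    if total == 0:
        return {app: 0.0 for app in requests}
    # Each application receives its request plus a proportional slice of the
    # excess capacity.
    return {app: share / total for app, share in requests.items()}

# Example: two VMs request 30% and 50% of the CPU; the remaining 20% is split
# in proportion to the requests.
print(divide_uncontended({"rubis-web": 0.3, "smedia1": 0.5}))
# -> {'rubis-web': 0.375, 'smedia1': 0.625}
```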

3.2.2 Scenario II: CPU contention only

In this scenario, node n1 has enough disk resource to meet the requests from the AppControllers, but not enough CPU resource; that is, constraint (4) is violated while constraint (5) is satisfied. The NodeController divides the disk resources in proportion to the requests, as in the previous case, using Eqs. (8) and (9). However, the applications will receive less CPU resource than requested; let us denote the deficiencies as Δu_{1,cpu,web} = ur_{1,cpu,web} - u_{1,cpu,web} and Δu_{2,cpu} = ur_{2,cpu} - u_{2,cpu}. The resulting discrepancy between the achieved and target normalized performance of application 1 can then be estimated as |∂yn_1/∂u_{1,cpu,web} · Δu_{1,cpu,web}|, and similarly for application 2. The sum of weighted mean squared discrepancies of the normalized performance values across the applications can then be estimated as:

J_{n1,cpu} = w_1 (∂yn_1/∂u_{1,cpu,web} · Δu_{1,cpu,web})^2 + w_2 (∂yn_2/∂u_{2,cpu} · Δu_{2,cpu})^2

The CPU resource allocation is found by optimizing this overall normalized performance discrepancy:

Minimize J_{n1,cpu} subject to

Δu_{1,cpu,web} + Δu_{2,cpu} ≥ ur_{1,cpu,web} + ur_{2,cpu} - 1   (10)

Δu_{1,cpu,web} ≥ 0   (11)

Δu_{2,cpu} ≥ 0   (12)

Note that constraint (10) is simply the capacity constraint (4), applied to actual allocations. Constraints (11) and (12) ensure that no application is throttled to increase the performance of another application beyond its target. In the minimization objective J_{n1,cpu}, the discrepancies for the applications are weighted by their priority weights, so that higher-priority applications experience less performance degradation.

From Eq. (1), we know that ∂yn_1/∂u_{1,cpu,web} = b_{0,1,cpu,web} and ∂yn_2/∂u_{2,cpu} = b_{0,2,cpu}. Both coefficients can be obtained from the model estimators in the AppControllers for both applications. This optimization problem is convex, and a closed-form solution exists for the case of two applications sharing the node. For more than two applications, the solution is more complex but can still be easily computed by a quadratic programming solver. We show later, empirically, that the performance overhead due to the optimization is small.
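For illustration, the sketch below solves the Scenario II problem numerically with scipy's SLSQP solver rather than the closed form or the Matlab quadratic-programming solver used in the paper; the variable names follow the notation above, and the upper bound delta_i ≤ ur_i (which keeps the actual shares nonnegative) is our addition.

```python
import numpy as np
from scipy.optimize import minimize

def allocate_contended(requests, sensitivities, weights, capacity=1.0):
    """Resolve contention for one resource on one node (cf. Scenario II).
    requests[i]: requested share ur_i; sensitivities[i]: dyn_i/du_i (the b0
    coefficient from application i's model); weights[i]: priority weight w_i.
    Returns the actual shares u_i = ur_i - delta_i."""
    ur = np.asarray(requests, dtype=float)
    b = np.asarray(sensitivities, dtype=float)
    w = np.asarray(weights, dtype=float)
    shortfall = ur.sum() - capacity             # total reduction to distribute

    def cost(delta):                            # weighted squared performance loss
        return float(np.sum(w * (b * delta) ** 2))

    cons = [{"type": "ineq", "fun": lambda d: d.sum() - shortfall}]  # Eq. (10)
    bounds = [(0.0, r) for r in ur]             # delta_i >= 0 (Eqs. (11), (12))
    x0 = np.full(len(ur), shortfall / len(ur))
    res = minimize(cost, x0, method="SLSQP", bounds=bounds, constraints=cons)
    return ur - res.x

# Example: two applications request 70% and 60% of the CPU; the higher-priority
# application (weight 2) gives up less of its request.
print(allocate_contended([0.7, 0.6], [1.0, 1.0], [2.0, 1.0]))  # ~[0.6, 0.4]
```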

3.2.3 Scenario III: Disk contention only

In this scenario, the disk resource is under contention but the CPU is not; that is, constraint (5) is violated while constraint (4) is satisfied. The NodeController follows the same policy for CPU allocation as in Scenario I, and solves the following optimization problem to compute the actual disk allocations:

Minimize J_{n1,disk} = w_1 (∂yn_1/∂u_{1,disk,web} · Δu_{1,disk,web})^2 + w_2 (∂yn_2/∂u_{2,disk} · Δu_{2,disk})^2

subject to

Δu_{1,disk,web} + Δu_{2,disk} ≥ ur_{1,disk,web} + ur_{2,disk} - 1   (13)

Δu_{1,disk,web} ≥ 0   (14)

Δu_{2,disk} ≥ 0   (15)

Figure 5: Average performance overhead (time in seconds per control interval for performance stats sensing, optimization, CPU actuation, disk actuation, and the total)

Figure 6: SLOC of major code components

Component                  SLOC
RLS Modeler                262
AppController              280
NodeController             327
Matlab Optimization code   35
Python wrappers            251

3.2.4 Scenario IV: CPU and disk contention

This is the scenario where both CPU and disk are under contention. In this scenario, the actual allocations of CPU and disk for both applications will be below the respective requested amounts. The NodeController determines the actual allocations by solving the following optimization problem.

Minimize J_{n1} = J_{n1,cpu} + J_{n1,disk}

subject to Eqs. (10), (11), (12), (13), (14), (15).

Note that the cost function here is the sum of the cost functions for the previous two cases, and the capacity constraints for both resources need to be considered. The optimization problem is more complex but remains convex, and its optimal solution can be computed by an existing solver.

3.3 Implementation and performance considerations

We have implemented the model estimator and the controllers in Java, and written Python wrappers for the sensors and actuators provided by the system. The controllers communicate with the clients and the Python wrappers using XML-RPC. The optimization code is written in Matlab, and the Java program communicates with Matlab using pipes. A more efficient implementation can be developed using JNI (Java Native Interface). The total number of lines of code in AutoControl is about 3000; a breakdown by component can be seen in Figure 6. Our code is written to be extensible, and new controllers can be plugged into the framework easily.

The controllers are designed to be scalable by limiting the number of control variables each controller has to deal with. More specifically, the number of variables for each AppController is the number of tiers multiplied by the number of controlled resources, and the number of variables for each NodeController is the number of VMs on that node multiplied by the number of controlled resources. The performance of each decision in AutoControl is mainly affected by three factors: (1) the time taken to collect statistics from clients, (2) the Matlab optimization time, and (3) the actuation time. Figure 5 shows the average time taken on our testbed for each of these factors. The total time is less than 1.5% of the control interval.

Figure 7: A virtualized node in the testbed

4. TESTBED AND EXPERIMENTATION

To evaluate AutoControl, we have built a testbed consisting of three virtualized nodes, each running multiple VMs hosting multiple applications. Clients running on other nodes generate workloads for these applications.

All the experiments were conducted on HP C-class blades, each equipped with two dual-core 2.2 GHz 64-bit processors, 4 GB of memory, two Gigabit Ethernet cards and two 146 GB hard disks. The blades were installed with OpenSuse 10.3, and we used the default Xen (2.6.22.5-31-xen SMP) available in OpenSuse to run the VMs. The VM images were built using the same distribution, and no changes were made to the kernel.

One network interface and one disk were dedicated to Dom0, which ran the monitoring framework and our controllers. The VMs were allocated the second network interface and disk. The clients connected to the VMs using the network interface dedicated to the VMs. The controller used its own network interface to poll application performance statistics from the clients. In order to demonstrate CPU bottlenecks more easily, we allocated one CPU to the VMs, and used the remaining CPUs for Dom0. Our controller is fully extensible to VMs sharing multiple processors as long as the CPU scheduler allows arbitrary slicing of CPU allocation. Figure 7 shows all the components in our experiments.

Our experiments were designed to answer the following questions:

1. Can the controller detect resource bottlenecks across time and application tiers?

2. Can the controller adjust resource allocations to overcome these bottlenecks?

3. Can the controller handle different application workloads?

4. Can the controller enforce performance targets for metrics including average throughput and response time?

5. Can the controller prioritize among applications when resources are scarce?

We used three different applications in our experiments: RUBiS [2], an online auction site benchmark; a Java implementation of the TPC-W benchmark [6]; and a custom-built secure media server.

RUBiS and TPC-W use a multi-tier setup consisting of web and database tiers. They both provide workloads of different mixes and time-varying intensity. For RUBiS, we used a workload mix called the browsing mix, which simulates a user browsing through an auction site. For TPC-W, we used the shopping mix, which simulates a user browsing through a shopping site. The browsing mix stresses the web tier, while the shopping mix exerts more demand on the database tier.

The custom-built secure media (smedia) server is representative of a media server that can serve encrypted media streams. The smedia server runs a certain number of concurrent threads, each serving a client that continuously requests media files from the server. A media client can request an encrypted or unencrypted stream. Upon receiving the request, the server reads the particular media file from the disk (or from memory if it is cached), optionally encrypts it, and sends it to the client. A closed-loop client model is used, where a new file is only requested after the previous request is complete. Reading a file from the disk consumes disk I/O resource, and encryption requires CPU resource. For a given number of threads, by changing the fraction of the client requests for encrypted media, we can vary the amount of CPU or disk I/O resource used. This flexibility allowed us to study our controller's behavior for CPU and disk I/O bottlenecks.

4.1 Simulating production traces

To test our controller under real workloads, we created workloads that simulate production traces. These traces were obtained from an SAP application server running in a production environment. We simulated the production traces using RUBiS and TPC-W by carefully selecting the number of concurrent threads at a particular time. For example, to create 40% average CPU utilization over a 5-minute period, we used 500 threads simulating 500 concurrent users. Note that we only simulated the CPU utilization of the production trace. We did not attempt to recreate the disk utilization, because the traces did not contain the needed metadata.

Figure 8: Simulating production traces (CPU utilization of the real trace and the simulated trace, sampled every 10 seconds)

Figure 8 shows the simulation of an SAP application server production trace using RUBiS. The real trace is followed closely by carefully selecting the number of threads that produces the same intensity.
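A sketch of this idea follows. The one calibration point (500 threads for roughly 40% average CPU utilization) comes from the text above; the assumption that the thread count scales roughly linearly with the target utilization is ours and would need recalibration per application.

```python
def threads_for_utilization(target_util_pct, threads_per_pct=500 / 40.0):
    """Pick the number of concurrent client threads expected to reproduce a
    target average CPU utilization from the trace. Linear scaling is an
    assumed approximation calibrated from one point (500 threads ~ 40%)."""
    return int(round(target_util_pct * threads_per_pct))

# Walk the trace (one CPU-utilization sample per window) and emit the number
# of client threads to run during each window.
trace = [22, 35, 40, 58]                             # example utilization percentages
print([threads_for_utilization(u) for u in trace])   # -> [275, 438, 500, 725]
```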

We also used traces generated from a media workload generator called MediSyn [24]. MediSyn generates traces that are based on analytical models drawn from real-world traces collected at an HP Labs production media server. It captures important properties of streaming media workloads, including file duration, popularity, encoding bit rate, and streaming session time. We re-created the access pattern of the trace by closely following the start times, end times, and bitrates of the sessions. We did not attempt to re-create the disk access pattern, because of the lack of metadata.

Figure 9: Experimental setup

4.2 Sensors

Our sensors periodically collect two types of statistics: real-time resource utilizations and performance of applications. CPU utilization statistics are collected using Xen's xm command. Disk utilization statistics are collected using the iostat command, which is part of the sysstat package. In our testbed, we measured both the application throughput and the server-side response time directly from the application, where throughput is defined as the total number of client requests serviced and, for each client request, response time is defined as the amount of time taken to service the request. In a real data center, application-level performance may be obtained from application logs or from tools like HP Openview.
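As a rough illustration of the disk side, the sketch below shells out to iostat and reads the %util column for one device; it assumes the sysstat extended device report, where %util is the last column, and a real deployment would pair it with a CPU sensor (e.g., parsing Xen's xm/xentop output) and the application-level sensors in the same way.

```python
import subprocess

def disk_utilization(device="sdb"):
    """Return the %util that `iostat -dx 1 2` reports for one device.
    Assumes the sysstat extended device report where %util is the last column;
    the value kept is from the second (1-second) sample, i.e., current load."""
    out = subprocess.run(["iostat", "-dx", "1", "2"],
                         capture_output=True, text=True, check=True).stdout
    util = None
    for line in out.splitlines():
        if line.startswith(device):
            util = float(line.split()[-1])     # overwrite: keep the last report
    return util
```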

4.3 Actuators

Our actuators included Xen's credit-based CPU scheduler and a custom-built proportional-share disk I/O scheduler. The credit scheduler provided by Xen allows each domain (or VM) to be assigned a cap. We used the cap to specify a CPU share for each VM. This non-work-conserving mode of CPU scheduling provided better performance isolation among applications running in different VMs. The proportional-share scheduler for disk I/O was designed to maximize the efficiency of disk access [11]. The scheduler is logically interposed between the virtual machines and the physical disks; we implemented it as a driver in Dom0. The controller interacts with the disk I/O scheduler by assigning a share to each VM in every control interval.
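On the CPU side, a minimal actuator only needs to set each VM's cap through Xen's credit scheduler. The sketch below assumes the xm sched-credit interface of Xen 3.x (as on this testbed); the custom disk I/O scheduler would be driven through its own share-setting interface, which is not shown here.

```python
import subprocess

def set_cpu_cap(domain, share):
    """Cap a VM's CPU consumption at `share` (a fraction in (0, 1]) of one
    physical CPU via Xen's credit scheduler. A cap of 0 means 'no cap' to the
    credit scheduler, so very small shares are floored at 1%."""
    cap = max(1, int(round(share * 100)))      # credit-scheduler caps are percentages
    subprocess.run(["xm", "sched-credit", "-d", str(domain), "-c", str(cap)],
                   check=True)

# Example: limit the smedia1 VM to 35% of a CPU for the next control interval.
# set_cpu_cap("smedia1", 0.35)
```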

5. EVALUATION RESULTS

We evaluated AutoControl in a number of experimental scenarios to answer the questions posed in Section 4. In all of the experiments, a control interval of 20 seconds was used.^1 In this section, we present the performance results from these experiments that demonstrate the effectiveness of the AutoControl design.

5.1 Scenario 1: Detecting and mitigating resource bottlenecks in multiple resources and across multiple application tiers

This scenario was designed to validate the following claims about AutoControl:

• Claim 1: It can automatically detect resource bottlenecks and allocate the proper amount of resources to each application such that all the applications can meet their performance targets if possible. This occurs for different types of bottlenecks and for bottlenecks across multiple tiers of an application.

• Claim 2: It can automatically detect the shift of a bottleneck from one type of resource to another, and still allocate resources appropriately to achieve application-level goals.

^1 A higher interval may cause the controller to miss some dynamic changes, while a smaller interval may cause oscillations.

Figure 10: Application throughput with bottlenecks in CPU or disk I/O and across multiple nodes: (a) RUBiS throughput (requests/sec), (b) smedia1 throughput, (c) smedia2 throughput, (d) smedia3 throughput, (e) smedia4 throughput (KBytes/sec); each panel compares AutoControl, work-conserving mode, and static allocation against the target over time intervals (every 20 secs).

Figure 11: Resource allocations to different applications or application tiers on different nodes: (a) CPU allocations on node 1, (b) disk allocations on node 1, (c) CPU allocations on node 2, (d) disk allocations on node 2 (percentage of shared CPU or disk I/O over time intervals, every 20 secs).

We use the experimental setup shown in Figure 9(a), where two physical nodes host one RUBiS application spanning two nodes, and four smedia applications. For RUBiS, we used the default browsing mix workload with 600 threads emulating 600 concurrent clients connecting to the RUBiS server, and used 100 requests/sec as the throughput target. Each of the smedia applications was driven with 40 threads emulating 40 concurrent clients downloading media streams at 350 KB/sec. We ran calibration experiments to measure the total throughput achievable for each smedia application alone. We observe that, with 50% of clients requesting encrypted streams, the application is CPU-bound and the maximum throughput is just above 7000 KB/sec. If, however, only 2% of clients are requesting encrypted streams, the application becomes disk-bound and the maximum throughput is around 3000 KB/sec.

Table 2: Percentage of encrypted streams in each smedia application in different time intervals

Intervals   smedia1   smedia2   smedia3   smedia4
1-29        50%       50%       2%        2%
30-59       2%        2%        2%        2%
60-89       2%        2%        50%       50%

We then ran an experiment for 90 control intervals and varied the percentage of encrypted streams to create a shift of the resource bottleneck in each of the virtualized nodes. Table 2 illustrates these transitions. For the first 29 intervals, smedia1 and smedia2 on node 1 were CPU-bound, whereas smedia3 and smedia4 on node 2 were disk-bound. We considered a scenario where smedia1 and smedia3 always had a throughput target of 2000 KB/sec each. We then set the throughput targets for smedia2 and smedia4 at 5000 and 1000 KB/sec, respectively. At interval 30, smedia1 and smedia2 on node 1 were switched to disk-bound, and so the throughput target for smedia2 was changed to 1000 KB/sec. At interval 60, smedia3 and smedia4 on node 2 were switched to CPU-bound, and so the throughput target for smedia4 was adjusted to 5000 KB/sec. The targets were chosen such that both nodes were running near their capacity limits for either CPU or disk I/O.

Figure 10 shows the throughput of all five applications as functions of the control interval. For the first 29 intervals, the RUBiS web tier, smedia1 and smedia2 contended for CPU on node 1, and the RUBiS db tier, smedia3 and smedia4 contended for disk I/O on node 2. AutoControl was able to achieve the targets for all the applications in spite of the fact that (i) the resource bottleneck occurred either in the CPU or in the disk, and (ii) both tiers of the RUBiS application, distributed across two physical nodes, experienced resource contention.

To help understand how the targets were achieved, Figures 11(a) and 11(b) show the CPU and disk I/O allocations to the RUBiS web tier, smedia1 and smedia2 on node 1. For the first 29 intervals, these three VMs were contending for CPU. The controller gave different portions of both CPU and disk resources to the three VMs such that all of their targets could be met. In the same time period (the first 29 intervals), on node 2, the RUBiS database tier, smedia3 and smedia4 were contending for disk I/O. Figures 11(c) and 11(d) show the CPU and disk I/O allocations for all three VMs on this node. The controller not only allocated the right proportion of disk I/O to smedia3 and smedia4 for them to achieve their throughput targets, it also allocated the right amount of CPU to the RUBiS database tier so that the two-tier application could meet its target.

At interval 30, the workloads for the smedia applications on node 1 were switched to be disk-heavy. As a result, smedia1 and smedia2 were contending for disk I/O, since the RUBiS web tier uses minimal disk resource. The controller recognized this change in resource bottleneck automatically and ensured that both smedia1 and smedia2 could meet their new throughput targets by allocating the right amount of disk resources to both smedia applications (see Figure 11(b)).

At interval 60, the workloads for the smedia applications on node 2 were switched to be CPU-heavy. Because the RUBiS db tier also requires a non-negligible amount of CPU (around 20%), smedia3 and smedia4 started contending for CPU with the RUBiS db tier on node 2. Again, the controller was able to automatically translate the application-level goals into appropriate resource allocations to the three VMs (see Figure 11(c)).

For comparison, we repeated the same experiment using two other resource allocation methods that are commonly used on consolidated infrastructure: a work-conserving mode and a static mode. In the work-conserving mode, the applications run in the default Xen settings, where a cap of zero is specified for the shared CPU on a node, indicating that the applications can use any amount of CPU resources. In this mode, our proportional-share disk scheduler was unloaded to allow unhindered disk access. In the static mode, the three applications sharing a node were allocated CPU and disk resources in the fixed ratio 20:50:30. The resulting application performance from both approaches is shown in Figure 10 along with the performance from AutoControl. As can be seen, neither approach was able to offer the degree of performance assurance provided by AutoControl.

For the work-conserving mode, RUBiS was able to achieve a throughput much higher than its target at the cost of performance degradation in the other applications sharing the same infrastructure. The remaining throughput on either node was equally shared by smedia1 and smedia2 on node 1, and smedia3 and smedia4 on node 2. This mode did not provide service differentiation between the applications according to their respective performance targets.

For the static mode, RUBiS was never able to reach its performance target given the fixed allocation, and the smedia applications exceeded their targets at some times and missed them at other times. Given the changes in workload behavior for the smedia applications, there is no fixed allocation ratio for both CPU and disk I/O that will guarantee the performance targets for all the applications. We chose to allocate both CPU and disk resources in the ratio 20:50:30, as a human operator might.

Figure 12: Internal workings of the AppController - model estimator performance: (a) model parameter values for smedia1 (b_{0,cpu}, b_{0,disk} and a_1 over time intervals, every 20 secs); (b) measured and model-predicted throughput for smedia2.

To understand further the internal workings of AutoControl, we now demonstrate a key element of our design: the model estimator in the AppController that automatically determines the dynamic relationship between an application's performance and its resource allocation. Our online estimator continuously adapts the model parameters as dynamic changes occur in the system. Figure 12(a) shows the model parameters (b_{0,cpu}, b_{0,disk}, and a_1) for smedia1 as functions of the control interval. For lack of space, we omit the second-order parameters and the parameter values for the other applications. As we can see, the value of b_{0,cpu}, representing the correlation between application performance and CPU allocation, dominated the b_{0,disk} and a_1 parameters for the first 29 intervals. The disk allocation also mattered, but was not as critical. This is consistent with our observation that node 1 had a CPU bottleneck during that period. After the 30th interval, when disk became a bottleneck on node 1 while the CPU became less loaded, the model coefficient b_{0,disk} exceeded b_{0,cpu} and became dominant after a period of adaptation. To assess the overall prediction accuracy of the linear models, we computed two measures, the coefficient of determination (R^2) and the mean absolute percentage error (MAPE), for each application.

Table 3: Predictive accuracy of linear models (in percentage)

        rubis   smedia1   smedia2   smedia3   smedia4
R^2     79.8    91.6      92.2      93.3      97.0
MAPE    4.2     5.0       6.9       4.5       8.5

R^2 and MAPE can be calculated as

R^2 = 1 - \frac{\sum_k (\hat{yn}_a(k) - yn_a(k))^2}{\sum_k (yn_a(k) - yn_{a,avg})^2}

MAPE = \frac{1}{K} \sum_{k=1}^{K} \left| \frac{\hat{yn}_a(k) - yn_a(k)}{yn_a(k)} \right|

where K is the total number of samples, \hat{yn}_a(k) and yn_a(k) are the model-predicted value and the measured value of the normalized performance of application a, and yn_{a,avg} is the sample mean of yn_a. Table 3 shows the values of these two measures for all five applications. As an example, we also show in Figure 12(b) the measured and the model-predicted throughput for smedia2. From both the table and the figure, we can see that our model is able to predict the normalized application performance accurately, with R^2 above 80% and MAPE below 10%. This validates our belief that low-order linear models, when adapted online, can be good enough local approximations of the system dynamics even though the latter is nonlinear and time-varying.
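For reference, both accuracy measures are straightforward to compute from the measured and model-predicted normalized performance series; the sketch below is a direct numpy transcription of the formulas above.

```python
import numpy as np

def prediction_accuracy(measured, predicted):
    """Coefficient of determination (R^2) and mean absolute percentage error
    (MAPE) of the model-predicted normalized performance."""
    y = np.asarray(measured, dtype=float)       # yn_a(k), measured
    y_hat = np.asarray(predicted, dtype=float)  # model-predicted values
    r2 = 1.0 - np.sum((y_hat - y) ** 2) / np.sum((y - y.mean()) ** 2)
    mape = float(np.mean(np.abs((y_hat - y) / y)))
    return r2, mape
```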

5.2 Scenario 2: Enforcing application priorities

In this scenario, we use the experimental setup shown in Figure 9(b) to substantiate the following two claims:

• Claim 3: AutoControl can support different multi-tier applications.

• Claim 4: During resource contention, if two applications sharing the same resource have different priority weights, the application with a higher priority weight will see a lower normalized tracking error (|yn_a - 1|) in its performance.

In this setup, we have two multi-tier applications, RUBiS and TPC-W, and four smedia applications spanning three nodes. We ran the same workloads used in Scenario 1 for RUBiS, smedia1 and smedia2. TPC-W was driven with the shopping mix workload with 200 concurrent threads. Each of the smedia applications on node 3 was driven with a workload of 40 concurrent users, where 50% of the clients requested encrypted streams (making it CPU-bound). We assume that the TPC-W application is of higher priority than the two smedia applications sharing the same node. Therefore, TPC-W is assigned a priority weight of w = 2 while the other applications have w = 1, in order to provide service differentiation.


Unlike the setup used in Scenario 1, there was no resource contention on node 2. For the first 29 intervals, all six applications were able to meet their goals. Figure 13 shows the throughput target and the achieved throughput for TPC-W and smedia3. (The other four applications are not shown to save space.) At interval 30, 800 more threads were added to the TPC-W client, simulating increased user activity. The throughput target for TPC-W was adjusted from 20 to 50 requests/sec to reflect this change. This increased the CPU load on the database tier, creating a CPU bottleneck on node 3. AutoControl responded to this change automatically and correctly re-distributed the resources. Note that not all three applications (TPC-W, smedia3, and smedia4) on node 3 could reach their targets. But the higher priority weight for TPC-W allowed it to still meet its throughput target while degrading performance for the other two applications.

The result from using the work-conserving mode for the same scenario is also shown in Figure 13. In this mode, smedia3 and smedia4 took up more CPU resource, causing TPC-W to fall below its target.

We also use this example to illustrate how a trade-off between controller stability and responsiveness can be handled by adjusting the stability factor q. Figure 14 shows the achieved throughput for TPC-W and smedia3 under the same workload condition, for q values of 1, 2, and 10. The result for q = 2 is the same as in Figure 13. For q = 1, the controller reacts to the workload change more quickly and aggressively, resulting in large oscillations in performance. For q = 10, the controller becomes much more sluggish and does not adjust resource allocations fast enough to track the performance targets.

5.3 Scenario 3: Production-trace-driven workloads

The last scenario is designed to substantiate the following two claims for AutoControl:

• Claim 5: It can be used to control application response time.

• Claim 6: It can manage workloads that are driven by real production traces.

In this scenario, we use the same setup as in Scenario 1, shown in Figure 9(a). We simulated the production workloads using the trace-driven approach described in Section 4. Each of the RUBiS, smedia1 and smedia2 applications was driven by a production trace, while both smedia3 and smedia4 ran a workload with 40 threads and a 2% chance of requesting an encrypted stream (making them disk-bound).

In this experiment, we use response time as the performance metric for two reasons:

1. Response time behaves quite nonlinearly with respect to the resource allocation, and can be used to evaluate how AutoControl copes with nonlinearity in the system.

2. It is possible to specify the same response time target for all applications even if they are different, whereas specifying throughput targets is harder since they may depend on the offered load.

Figure 13: Performance comparison between AutoControl and work-conserving mode, with different priority weights for TPC-W (w = 2) and smedia3 (w = 1): (a) TPC-W throughput (requests/sec) and (b) smedia3 throughput (KBytes/sec) versus time intervals (every 20 secs).

Figure 14: Performance results for TPC-W and smedia3 with stability factor q = 1, 2, 10: (a) TPC-W throughput and (b) smedia3 throughput versus time intervals (every 20 secs).

For brevity, we only show the results for the three applications running on node 1. Figures 15(a), 15(b) and 15(c) show the measured average response time of RUBiS, smedia1 and smedia2 as functions of the control interval, using AutoControl, work-conserving mode, and static allocation mode, respectively. We use a response time target of 1.5 seconds for all three applications, and set the CPU allocation at a fixed 33% for each application in the static mode. The dark-shaded regions show the time intervals when a CPU bottleneck occurred.

Figure 15: Performance comparison of AutoControl, work-conserving mode and static allocation mode, while running RUBiS, smedia1 and smedia2 with production-trace-driven workloads: (a) AutoControl, (b) work-conserving mode, (c) static allocation mode; response time (in secs) versus time intervals.

In the first region, for the work-conserving mode, both smedia1 and smedia2 had high CPU demands, causing not only response time target violations for themselves, but also a large spike of 6 seconds in the response time for RUBiS at the 15th interval. In comparison, AutoControl allocated higher shares of the CPU to both smedia1 and smedia2 without overly penalizing RUBiS. As a result, all three applications were able to meet the response time target most of the time, except for the small spike in RUBiS.

In the second shaded region, the RUBiS application became more CPU-intensive. Because there is no performance assurance in the work-conserving mode, the response time of RUBiS surged and resulted in a period of target violations, while both smedia1 and smedia2 had response times well below the target. In contrast, AutoControl allocated more CPU capacity to RUBiS when needed by carefully reducing the resource allocation to smedia2. The result was that there were almost no target violations for any of the three applications.

The performance result from the static allocation mode was similar to that from the work-conserving mode, except that the RUBiS response time was even worse in the second region.

Despite the fact that response time is a nonlinear function of resource allocation, and that the real traces used here were much more dynamic than the static workloads with step changes tested in Scenarios 1 and 2, AutoControl was still able to balance the resources and minimize the response time violations for all three applications.

6. RELATED WORK

In recent years, control theory has been applied to computer systems for resource management and performance control [13, 16]. Examples of its application include web server performance guarantees [1], dynamic adjustment of the cache size for multiple request classes [19], CPU and memory utilization control in web servers [9], adjustment of resource demands of virtual machines based on resource availability [27], and dynamic CPU allocations for multi-tier applications [18, 22]. These efforts concerned themselves with controlling only a single resource (usually CPU), used mostly single-input single-output (SISO) controllers (except in [9]), and required changes in the applications. In contrast, our MIMO controller operates on multiple resources (CPU and storage) and uses sensors and actuators at the virtualization layer, along with external QoS sensors, without requiring any modifications to applications.

In [9], the authors apply MIMO control to adjust two configuration parameters within Apache to regulate the CPU and memory utilization of the web server. They used static linear models, obtained through system identification, to model the system. Our earlier attempts at static models for controlling CPU and disk resources failed, and we therefore use a dynamic, adaptive model in this paper. Our work also extends MIMO control to multiple resources and to virtualization, which involves more complex interactions than controlling a single web server.
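As a rough illustration of the difference between a statically identified model and the online estimation referred to above, the sketch below maintains a recursively updated linear model with a forgetting factor so that it can track time-varying workloads. The class name, model structure, and parameter values are our own illustrative assumptions, not the exact estimator used in AutoControl.

    import numpy as np

    class OnlineLinearModel:
        """Recursive least-squares estimate of y ~ theta . u, where u holds
        resource allocations (e.g., CPU and disk shares) and y is a
        normalized performance measurement. Illustrative sketch only."""

        def __init__(self, n_inputs: int, forgetting: float = 0.95):
            self.theta = np.zeros(n_inputs)      # model parameters
            self.P = np.eye(n_inputs) * 1e3      # parameter covariance
            self.lam = forgetting                # forgetting factor

        def update(self, u: np.ndarray, y: float) -> None:
            # Standard RLS update with exponential forgetting.
            Pu = self.P @ u
            gain = Pu / (self.lam + u @ Pu)
            self.theta = self.theta + gain * (y - u @ self.theta)
            self.P = (self.P - np.outer(gain, Pu)) / self.lam

        def predict(self, u: np.ndarray) -> float:
            return float(u @ self.theta)

    # Example (hypothetical values): update with CPU and disk shares.
    # model = OnlineLinearModel(n_inputs=2)
    # model.update(np.array([0.4, 0.3]), 0.8)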

Prior work on controlling storage resources independently of CPU includes systems that provide performance guarantees in storage systems [7, 10, 14, 20]. However, one has to tune these tools to achieve application-level guarantees. Our work builds on our earlier work, where we developed an adaptive controller [17] to achieve performance differentiation and an efficient adaptive proportional share scheduler [11] for storage systems.

Traditional admission control to prevent computing systems from being overloaded has focused mostly on web servers. Control theory was applied in [15] for the design of a self-tuning admission controller for 3-tier web sites. In [17], a self-tuning adaptive controller was developed for admission control in storage systems based on online estimation of the relationship between the admitted load and the achieved performance. These admission control schemes are complementary to our approach, because the former shapes the resource demand entering a server system, whereas the latter adjusts the supply of resources for handling the demand.

Dynamic resource allocation in distributed systems has been studied extensively, but the emphasis has been on allocating resources across multiple nodes rather than over time, because of the lack of good isolation mechanisms such as virtualization. It was formulated as an online optimization problem in [3] using periodic utilization measurements, and resource allocation was implemented via request distribution. Resource provisioning for large clusters hosting multiple services was modeled as a "bidding" process in order to save energy in [8]; the active server set of each service was dynamically resized to adapt to the offered load. In [23], an integrated framework was proposed combining a cluster-level load balancer and a node-level class-aware scheduler to achieve both overall system efficiency and individual response time goals. However, these existing techniques are not directly applicable to allocating resources to applications running in VMs. They also fall short of providing a way to allocate resources to meet end-to-end SLOs.

7. DISCUSSION AND FUTURE WORK

This section describes some of the design issues in AutoControl, alternative methods, and directions for future work.

7.1 Migration for dealing with bottlenecks

Migration of VMs can also be used to handle an overloaded physical node [26]. The black-box migration strategy uses resource-utilization statistics to infer which VMs need to be migrated, and may not be aware of SLO violations. The gray-box migration strategy uses application statistics to infer resource requirements using queuing theory. However, the requirements of complex applications needing multiple resources cannot be easily predicted using queuing theory. In addition, migration of stateful applications (e.g., databases) might take too long to mitigate an SLO violation; in fact, the impact on the SLOs is compounded during the migration. In a heavily consolidated data center where most of the nodes are highly utilized, migration may not be viable. Finally, security concerns around migration [21] may lead vendors to add security features that will make migration much slower.

Despite these shortcomings, migration is useful when there is a long-term mismatch between node resources and application requirements. Migration can be added as an actuator to our system. The migration actuator works at a much more coarse-grained time scale, and as future work, we plan to extend AutoControl to use migration.

7.2 Actuator & sensor behavior, network and memory control

The behavior of sensors and actuators affects our control. In existing systems, most sensors return accurate information, but many actuators are poorly designed. We observed various inaccuracies with Xen's earlier SEDF scheduler and with the credit scheduler, which have also been identified by other researchers [12]. These inaccuracies cause VMs to gain more or less CPU than set by the controller. Empirical evidence shows that our controller is resistant to the CPU scheduler's inaccuracies.
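For concreteness, a CPU allocation actuator of the kind discussed here can be driven through Xen's credit scheduler (xm sched-credit). The fragment below is a hedged sketch of such an actuator wrapper; the domain name, weight, and cap values are purely illustrative and this is not the exact code used in our testbed.

    import subprocess

    def apply_cpu_cap(domain: str, cap_percent: int, weight: int = 256) -> None:
        """Set a CPU cap (and weight) for a Xen domain via the credit
        scheduler; a cap of 50 limits the domain to 50% of one CPU."""
        subprocess.check_call([
            "xm", "sched-credit", "-d", domain,
            "-w", str(weight), "-c", str(cap_percent),
        ])

    # Example (hypothetical domain name): cap an smedia VM at 35% of a CPU.
    # apply_cpu_cap("smedia1-vm", 35)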

Our initial efforts to add network resource control failed because of inaccuracies in network actuators. Since Xen's native network control is not fully implemented, we tried to use Linux's existing traffic controller (tc) to allocate network resources to VMs. We found that the network bandwidth setting in tc is not enforced correctly when heavy network workloads are run. We plan to fix these problems and add network control as well. The theory we developed in this paper is directly applicable to any number of resources.
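The following is a sketch of the kind of tc-based bandwidth cap we experimented with, wrapped in Python for consistency with the other examples; the interface name and rate are hypothetical, and, as noted above, such settings were not always enforced accurately under heavy load.

    import subprocess

    def cap_vif_bandwidth(iface: str, rate_mbit: int) -> None:
        """Cap bandwidth on a Xen backend interface (e.g. 'vif1.0') by
        attaching an HTB qdisc and a single rate-limited class."""
        subprocess.check_call(["tc", "qdisc", "replace", "dev", iface,
                               "root", "handle", "1:", "htb", "default", "10"])
        subprocess.check_call(["tc", "class", "replace", "dev", iface,
                               "parent", "1:", "classid", "1:10", "htb",
                               "rate", f"{rate_mbit}mbit",
                               "ceil", f"{rate_mbit}mbit"])

    # Example (hypothetical interface): cap a VM's backend vif at 10 Mbit/s.
    # cap_vif_bandwidth("vif1.0", 10)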

The memory ballooning supported in VMware [25] provides a way of controlling the memory available to a VM. However, the ballooning algorithm is not aware of application goals or multiple tiers, and only uses the memory pressure seen by the operating system. In the future, we also plan to add memory control actuators to our system.

7.3 Handling a combination of multiple targets

AutoControl has been shown to handle different metrics, including throughput and response time, as performance metrics. How do we handle applications that want to specify a combination of metrics for performance? We have developed a preliminary utility-based framework, in which we introduce the concept of utility as a representation of the "value" delivered to an application. For example, a utility function like U(y) = max{α(1 − e^(y − y_ref)), 0} assigns higher utility to an application as it approaches its target metric y_ref. Building on this, we can create a utility function from two or more metrics that are of interest to the application. An example utility function using both response time and throughput is U(thr, rt) = g(rt − rt_ref + f(thr_ref − thr)), where g(x) = (1 + erf(α·x))/2 and f(x) = β·g(x).


Figure 16: Combined utility as a function of throughput and response time, with thr_ref = 25, rt_ref = 10, α = 1, β = 1, thr = 0–50, rt = 0–15.

A 3-D visualization of the function is shown in Figure 16.
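For reference, the sketch below evaluates the utility functions as written above, using the parameter values from Figure 16; the helper names are ours.

    import math

    ALPHA, BETA = 1.0, 1.0
    THR_REF, RT_REF = 25.0, 10.0   # targets from Figure 16

    def g(x: float) -> float:
        # Smooth step between 0 and 1: g(x) = (1 + erf(alpha * x)) / 2.
        return (1.0 + math.erf(ALPHA * x)) / 2.0

    def f(x: float) -> float:
        # f(x) = beta * g(x).
        return BETA * g(x)

    def single_metric_utility(y: float, y_ref: float) -> float:
        # U(y) = max{alpha * (1 - e^(y - y_ref)), 0}.
        return max(ALPHA * (1.0 - math.exp(y - y_ref)), 0.0)

    def combined_utility(thr: float, rt: float) -> float:
        # U(thr, rt) = g(rt - rt_ref + f(thr_ref - thr)), per the text.
        return g(rt - RT_REF + f(THR_REF - thr))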

8. CONCLUSIONS

In this paper, we presented AutoControl, a feedback control system to dynamically allocate computational resources to applications in shared virtualized environments. AutoControl consists of an online model estimator that captures the relationship between application-level performance and resource allocation, and a novel MIMO resource controller that determines the appropriate allocation of multiple resources to achieve application-level SLOs. We evaluated AutoControl using a testbed consisting of Xen virtual machines and various single-tier and multi-tier applications and benchmarks. Our experimental results confirm that AutoControl can detect CPU and disk bottlenecks across multiple nodes and can adjust resource allocation to achieve end-to-end application-level SLOs. In addition, AutoControl can cope with shifting resource bottlenecks and provide a level of service differentiation according to the priority of individual applications. Finally, we showed that AutoControl can enforce performance targets for different application-level metrics, including throughput and response time.

9. REFERENCES

[1] Abdelzaher, T., Shin, K., and Bhatti, N. Performance guarantees for web server end-systems: A control-theoretical approach. IEEE Transactions on Parallel and Distributed Systems 13 (2002).

[2] Amza, C., Chanda, A., Cox, A., Elnikety, S., Gil, R., Rajamani, K., Cecchet, E., and Marguerite, J. Specification and implementation of dynamic Web site benchmarks. In Proc. of the IEEE 5th Annual Workshop on Workload Characterization (Oct. 2002).

[3] Aron, M., Druschel, P., and Zwaenepoel, W. Cluster reserves: A mechanism for resource management in cluster-based network servers. In Proc. of ACM SIGMETRICS (2000), pp. 90–101.

[4] Astrom, K., and Wittenmark, B. Adaptive Control. Addison-Wesley, 1995.

[5] Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. Xen and the art of virtualization. In Proc. of the 19th ACM Symposium on Operating Systems Principles (SOSP) (Oct. 2003), pp. 164–177.

[6] Cain, H., Rajwar, R., Marden, M., and Lipasti, M. H. An architectural evaluation of Java TPC-W. In HPCA (2001), pp. 229–240.

[7] Chambliss, D., Alvarez, G., Pandey, P., Jadav, D., Xu, J., Menon, R., and Lee, T. Performance virtualization for large-scale storage systems. In Proc. of the Symp. on Reliable Distributed Systems (SRDS) (Oct. 2003), pp. 109–118.

[8] Chase, J., Anderson, D., Thakar, P., Vahdat, A., and Doyle, R. Managing energy and server resources in hosting centers. In Proc. of the Symposium on Operating Systems Principles (SOSP) (Oct. 2001).

[9] Diao, Y., Gandhi, N., Hellerstein, J., Parekh, S., and Tilbury, D. MIMO control of an Apache web server: Modeling and controller design. In Proc. of the American Control Conference (ACC) (2002).

[10] Goyal, P., Modha, D., and Tewari, R. CacheCOW: Providing QoS for storage system caches. In Proc. of ACM SIGMETRICS (2003), pp. 306–307.

[11] Gulati, A., Merchant, A., Uysal, M., and Varman, P. Efficient and adaptive proportional share I/O scheduling. Tech. Rep. HPL-2007-186, HP Labs, Nov. 2007.

[12] Gupta, D., Cherkasova, L., Gardner, R., and Vahdat, A. Enforcing performance isolation across virtual machines in Xen. In Proc. of the International Middleware Conference (2006), vol. 4290, pp. 342–362.

[13] Hellerstein, J. L. Designing in control engineering of computing systems. In Proc. of the American Control Conference (2004).

[14] Jin, W., Chase, J., and Kaur, J. Interposed proportional sharing for a storage service utility. In Proc. of ACM SIGMETRICS (2004), pp. 37–48.

[15] Kamra, A., Misra, V., and Nahum, E. Yaksha: A self-tuning controller for managing the performance of 3-tiered web sites. In Proc. of the International Workshop on Quality of Service (IWQoS) (June 2004).

[16] Karamanolis, C., Karlsson, M., and Zhu, X. Designing controllable computer systems. In Proc. of HOTOS (June 2005), pp. 49–54.

[17] Karlsson, M., Karamanolis, C., and Zhu, X. Triage: Performance isolation and differentiation for storage systems. In Proc. of the IEEE Int. Workshop on Quality of Service (IWQoS) (2004).

[18] Liu, X., Zhu, X., Padala, P., Wang, Z., and Singhal, S. Optimal multivariate control for differentiated services on a shared hosting platform. In Proc. of the IEEE Conference on Decision and Control (CDC) (2007).

[19] Lu, Y., Abdelzaher, T., and Saxena, A. Design, implementation, and evaluation of differentiated caching services. IEEE Transactions on Parallel and Distributed Systems 15, 5 (May 2004).

[20] Lumb, C., Merchant, A., and Alvarez, G. Facade: Virtual storage devices with performance guarantees. In Proc. of File and Storage Technologies (FAST) (2003), USENIX.


[21] Oberheide, J., Cooke, E., and Jahanian, F. Empirical exploitation of live virtual machine migration. In Proc. of the BlackHat DC convention (2008).

[22] Padala, P., Zhu, X., Uysal, M., Wang, Z., Singhal, S., Merchant, A., Salem, K., and Shin, K. G. Adaptive control of virtualized resources in utility computing environments. In Proc. of ACM EuroSys (2007).

[23] Shen, K., Tang, H., Yang, T., and Chu, L. Integrated resource management for cluster-based Internet services. ACM SIGOPS Operating Systems Review 36, SI (2002), 225–238.

[24] Tang, W., Fu, Y., Cherkasova, L., and Vahdat, A. Long-term streaming media server workload analysis and modeling. Tech. Rep. HPL-2003-23, HP Labs, Feb. 7, 2003.

[25] Waldspurger, C. Memory resource management in VMware ESX server. In Proc. of the Symposium on Operating Systems Design and Implementation (OSDI) (Dec. 2002).

[26] Wood, T., Shenoy, P. J., Venkataramani, A., and Yousif, M. S. Black-box and gray-box strategies for virtual machine migration. In NSDI (2007), USENIX.

[27] Zhang, Y., Bestavros, A., Guirguis, M., Matta, I., and West, R. Friendly virtual machines: Leveraging a feedback-control model for application adaptation. In Proc. of Virtual Execution Environments (VEE) (2005), pp. 2–12.
