IEEE TRANSACTIONS ON COMPUTERS, VOL. , NO. , AUGUST 2013

Coordinated Power and Performance Guarantee with Fuzzy MIMO Control in Virtualized Server Clusters

Palden Lama and Xiaobo Zhou

Abstract: It is important but challenging to assure the performance of multi-tier Internet applications under a power consumption cap on virtualized server clusters, mainly because of the system complexity of shared infrastructure and the dynamic and bursty nature of workloads. This paper presents PERFUME, a system that simultaneously guarantees power and performance targets with flexible tradeoffs and service differentiation among co-hosted applications while assuring control accuracy and system stability. Based on the proposed fuzzy MIMO control technique, it effectively controls both the throughput and the percentile-based response time of multi-tier applications, owing to its novel self-adaptive fuzzy modeling that integrates the strengths of fuzzy logic, MIMO control and artificial neural networks. Furthermore, we address the important challenge of proactively avoiding violations of power and performance targets in anticipation of future workload changes. We implement PERFUME in a testbed of virtualized blade servers hosting multi-tier RUBiS applications. Performance evaluation based on synthetic and real-world Web workloads demonstrates its control accuracy, flexibility in selecting tradeoffs between conflicting targets, service differentiation capability and robustness against highly dynamic and bursty workloads. It outperforms a representative utility-based approach in guaranteeing the system throughput, percentile-based response time and power budget.

Index Terms: Power Budget, Performance Guarantee, Multi-tier Internet Services, Fuzzy MIMO Control, Proactive Control, Self-Adaptation, Server Virtualization.


1 INTRODUCTION

Modern datacenters apply virtualization technology to host multiple Internet applications that share underlying high-density server resources for server consolidation and system manageability. The widely used high-density blade servers impose stringent power and cooling requirements. Furthermore, datacenters may adopt the practice of over-subscription for power savings [27], allowing the sum of the possible peak power consumption of all servers to exceed the provisioned capacity. Hence, it is essential to precisely control power consumption to ensure that actual total power use stays below capacity.

A common technique for enforcing a power budget is to dynamically transition hardware components from high-power states to low-power states [7]. However, this has a significant influence on the performance of hosted applications, as it may violate service level agreements (SLAs) on the response time and throughput required by customers. Furthermore, such an approach is not easily applicable to virtualized environments where physical processors are shared by multiple virtual machines (VMs). Changing the power state of a processor will affect the performance of multiple VMs belonging to different applications. Thus, power management may threaten the performance isolation of hosted applications. It is important to consider a coordinated approach to controlling power and performance in virtualized datacenters.

• P. Lama is with the Department of Computer Science, University of Texas at San Antonio, TX, 78249.

• X. Zhou is with the Department of Computer Science, University of Colorado, Colorado Springs, CO, 80918.

• X. Zhou is the corresponding author.

Recent studies such as [33] found that Internet service workloads are highly dynamic and fluctuate over multiple time scales, which can have a significant impact on the processing demands imposed on datacenter servers. Furthermore, the burstiness of Internet workloads has a deleterious impact on client-perceived performance [29]. It is challenging to design autonomic resource provisioning techniques that are robust to dynamic variation and burstiness in workloads.

Many research studies have focused on treating either power or performance as the primary control target in a datacenter while satisfying the other objective in a best-effort manner. Power-oriented approaches [25], [32], [37] disregard the SLAs of hosted applications, while performance-oriented approaches do not have explicit control over power consumption [3]. Recently, vPnP [8] was proposed for explicit coordination of power and performance in virtualized datacenters using utility function optimization. The approach can achieve tradeoffs between power and performance in a flexible way. However, it lacks guarantees on the stability and performance of the server system, especially in the face of highly dynamic and bursty workloads.

Multiple-input-multiple-output (MIMO) control techniques have been applied to performance management of multi-tier applications and power control of high-density servers in an enclosure [37]. However, those MIMO control solutions do not provide explicit coordination between power and performance. They are designed based on offline system identification for specific workloads [19], [37]. Thus, they are not adaptive to abrupt workload changes, although they can theoretically achieve control accuracy and system stability within a range.

Fig. 1: PERFUME system overview. (Diagram labels: Power Budget, Performance Target, Preference Weights; PERFUME Coordinated Power and Performance Control; Placement Controller; Applications; VMs; Virtualized Server Clusters.)

An important goal in datacenters is to meet the SLAs with customers. Many studies focused on the average end-to-end response time within a multi-tier system [2], [8], [16], [31], [34]. However, an average response time guarantee is not sufficient for many applications, in particular interactive ones, as it is unable to represent the shape of a delay curve. Instead, providers of such services prefer percentile-based performance guarantees [26], [34], [39]. But it is challenging to control percentile-based performance, even without a power consumption cap, due to its strongly non-linear relation with resource allocation and workload dynamics [20].
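To illustrate why an average can hide the shape of a delay curve, the following minimal sketch compares two hypothetical sets of response times that share the same mean but differ sharply at the 95th percentile. The sample values and the nearest-rank percentile definition are assumptions chosen for illustration; they are not taken from the measurements in this paper.

```python
def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (integer p, 1..100) by the nearest-rank method."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100), 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in ms: 90 fast requests and 10 very slow ones.
heavy_tail = [100] * 90 + [2000] * 10
# A second hypothetical service in which every request takes 290 ms.
uniform = [290] * 100

mean_heavy_tail = sum(heavy_tail) / len(heavy_tail)
mean_uniform = sum(uniform) / len(uniform)
p95_heavy_tail = percentile_nearest_rank(heavy_tail, 95)
p95_uniform = percentile_nearest_rank(uniform, 95)

print(mean_heavy_tail, mean_uniform)  # identical means
print(p95_heavy_tail, p95_uniform)    # very different 95th percentiles
```

Both services would satisfy the same average response time target, yet only the percentile exposes the slow tail experienced by one request in ten.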

In this paper, we propose and develop PERFUME, a system that simultaneously provides explicit guarantees on the power consumption of underlying server clusters and the performance of multi-tier applications hosted on them. As shown in Figure 1, various applications are hosted on a virtualized server cluster according to an interference-aware application placement policy. Such a placement policy decides which applications should be co-located so that the performance interference between applications can be minimized [28]. PERFUME gives the system administrator the flexibility to set the power and performance targets, and to specify the trade-offs between the cost of power consumption and the business value of providing performance assurance to various applications.

PERFUME's core is a fuzzy MIMO (FUMI) controller that minimizes the deviation of power and performance from their respective targets while assuring control accuracy and system stability. FUMI control is capable of dealing with the complexity of multi-tier applications in a shared virtualized infrastructure and the inherent non-linearities that exist in a real Web system. This capability stems from its integration of fuzzy modeling logic, MIMO control and artificial neural networks. The control action is taken by adjusting the CPU usage limits among individual tiers of multiple applications in a coordinated manner. Furthermore, it provides service differentiation by prioritizing CPU allocations among different applications while avoiding power budget violations.

PERFUME provides performance guarantees for throughput and percentile-based response time in the face of highly dynamic and bursty workloads. Its novel FUMI control accurately captures the strong nonlinearity of percentile-based performance metrics, such as the 95th-percentile response time, by applying fuzzy models. It predicts the power consumption of the server clusters for various CPU usage limits in hosted applications. PERFUME is self-adaptive to highly dynamic workloads due to its online learning capability. It adjusts the fuzzy model parameters at run time using a weighted recursive least-squares (wRLS) method.

Furthermore, PERFUME addresses the important challenge of proactively avoiding violations of power and performance targets in anticipation of future workload changes. In spite of FUMI control's ability to adapt itself in the face of workload variations, target violations may occur to some extent. Such violations are even more significant in the case of continuously changing workloads, as seen in real-world Web traces. This is due to the purely reactive approach of updating the power and performance models in response to the measured modeling errors. Since it takes a few control intervals to accurately update the system model, the control actions may lag behind continuous variations in the workload. We enhance the proposed FUMI control by integrating a workload prediction component. The integration involves workload-aware MIMO fuzzy modeling of the virtualized server system and the design of proactive FUMI control based on this model. Our proactive FUMI control is able to make effective control decisions in anticipation of future workload changes. We apply a standard Kalman filtering technique to predict workload variations.

PERFUME is suitable for joint power and performance control at the server cabinet level. We further enhance its scalability by decomposing the global control problem into local sub-problems for each application hosted in the cluster. Then, we construct decentralized FUMI controllers for managing individual applications.

We implement PERFUME on a testbed of virtualized server clusters hosting RUBiS benchmark applications. The testbed consists of a cluster of HP ProLiant BL460C G6 blade servers using VMware VMs. We apply highly dynamic and bursty synthetic workloads as well as workloads based on real Web traces from the 1998 Soccer World Cup site [1]. Experimental results demonstrate that PERFUME's FUMI model significantly outperforms a recently applied modeling technique for Web systems, ARMA (Auto Regressive Moving Average) [8], [31], in terms of prediction accuracy for application performance and power usage.

Compared to vPnP [8], PERFUME delivers significantly improved performance in terms of power, throughput and response time assurance with respect to the given targets in the face of highly dynamic and bursty workloads. This is due to its modeling accuracy, self-adaptiveness and control-theoretical foundation. Note that vPnP was originally applied to a single tier, which was identified as the bottleneck of a Web application. However, in practice, the bottleneck can switch between multiple tiers depending on workload patterns. For a fair comparison, we extend the vPnP implementation to a multi-tier application consisting of a front-end Web tier that is responsible for HTTP request processing, a middle application tier that implements core application functionality, and a backend database tier.

Experimental results demonstrate that PERFUME delivers consistent performance for various control options that trade off between power and performance guarantees. PERFUME provides service differentiation according to the priorities of individual applications during power overload conditions. We also examine the impact of tuning important FUMI control parameters on the stability and responsiveness of the controller. We demonstrate the improvement in power and performance assurance due to the integration of workload prediction in PERFUME's architecture for proactive control. Finally, we evaluate the effectiveness of the decentralized FUMI controllers in providing application performance assurance and power control.

In the following, Section 2 discusses related work. The PERFUME system architecture is presented in Section 3. Section 4 describes the modeling of power and performance control. Section 5 discusses the design of FUMI control. Section 6 provides the testbed implementation details. Section 7 presents the experimental results and analysis. We conclude the paper with future work in Section 8.

2 RELATED WORK

Power management in computing systems is an important and challenging research area. There have been many studies of power management using Dynamic Voltage Scaling (DVS) in embedded mobile devices and Web servers [5], [7]. Today, popular Internet applications have a multi-tier architecture forming server pipelines. Applying independent DVS algorithms in a pipeline leads to inefficient usage of power for assuring an end-to-end delay guarantee due to the inter-tier dependency [13]. Wang et al. [37] proposed a MIMO controller to regulate power consumption by conducting processor frequency scaling for each server while optimizing multi-tier application performance. Such controllers are designed based on offline system identification for specific workloads. They are not adaptive to situations with abrupt workload changes.

Modern datacenters apply virtualization technology to consolidate workloads on fewer powerful servers for improving server utilization. Traditional power management techniques are not easily applicable to virtualized environments where physical processors are shared by multiple VMs. For instance, changing the power state of a processor by DVS will inadvertently affect the performance of multiple VMs belonging to different applications.

Recent studies have dealt with the limitation of the DVS technique in virtualized environments by applying 'soft' techniques that exploit a hypervisor's ability to limit hardware usage by guest VMs [8], [30]. Other studies have tackled the challenges of enforcing power budgets on virtualized environments by coordinating multiple power control knobs. Lim et al. [27] proposed a combination of hardware-based (e.g. DVS) and software-based (e.g. VM CPU time allocation) power control techniques to coordinate the power distribution among a large number of VMs within a given peak power capacity. Verma et al. [36] combined automatic VM resizing and live migration techniques to ensure that datacenters can deal both with temporary power outages and with surges in workload.

There is a trend toward jointly tackling power and performance management of virtualized multi-tier servers. However, this is challenging due to the inherently conflicting objectives.

Power-oriented approaches aim to ensure that a server system does not violate a given power budget while maximizing the performance of hosted applications [6], [25], [32], [37], [38] or increasing the number of services that can be deployed [9]. pMapper [35] tackles power-cost tradeoffs under a fixed performance constraint. vManage [18] performs VM placement to save power without degrading performance. Co-Con [38] is a two-level control architecture for power control in virtualized server clusters. It gives higher priority to power budget tracking and treats performance as a secondary goal.

Performance-oriented approaches aim to guarantee a performance target while minimizing the power consumption [15], [19], [24], [26]. They do not control the power consumption explicitly to meet the power budget.

Coordinated power and performance management with explicit trade-offs has recently been studied in virtualized servers [8], [12], [16]. Mistral [16] optimizes power consumption, performance benefit, and the transient costs incurred by adaptations in virtualized server clusters. vPnP [8] coordinates power and performance in virtualized servers using utility function optimization. It provides the flexibility to choose the tradeoff between power and performance, but lacks guarantees on system stability and performance, especially under highly dynamic workloads.

There are important studies in dynamic resource provisioning for delay guarantees in multi-tier Internet services [11], [20], [22], [26], [31], [34], [39]. For instance, Urgaonkar et al. [34] proposed a dynamic server provisioning approach based on queueing models, which requires extensive application profiling for each workload. An approach proposed in [39] can model the probability distributions of response time based on CPU allocations on VMs. However, the performance model is not adaptive to dynamic workloads. Furthermore, there was no power control or power-performance tradeoff capability.

Recent works have analyzed the performance interference between co-located applications in virtualized environments [10], [17]. There are performance isolation techniques based on interference-aware application placement [28] as well as dynamic resource management [23] in virtualized servers. In this paper, we consider that the first approach is in place to decide the application placement that mitigates or avoids performance interference.

The tradeoff flexibility and system stability requirements in the face of highly dynamic and bursty workloads demand novel techniques for autonomic performance and power control. In this paper, we propose a fuzzy MIMO control to coordinate power and performance of virtualized server clusters, and to provide service differentiation and proactive power control under dynamic workloads and power budgets.


Fig. 2: The system architecture of PERFUME. (Diagram labels: Power Monitor; Performance Monitor; FUMI Control; Resource Allocator; Virtualization Management Module; Resource Pool of Physical Servers hosting the Tier 1, Tier 2 and Tier 3 VMs of App 1, App 2, .., App n.)

3 PERFUME SYSTEM ARCHITECTURE

Figure 2 illustrates the system architecture of PERFUME. The system under control is a virtualized server cluster hosting multi-tier applications. Each tier of an application is deployed on a VM created from a resource pool. The power monitor periodically measures the average power consumption of the server cluster. The performance monitor periodically measures the performance of each multi-tier application. FUMI control determines the CPU usage limits on the Web, application and database tiers of multiple applications to regulate per-application performance and the total power consumption of the server cluster. The resource allocator limits the CPU usage of each VM by using the virtualization management module. Applying CPU usage limits on VMs constrains the utilization of underlying processors, and thereby regulates power consumption. This is feasible due to the idle power management of modern processors, which incorporate sleep states to achieve substantive power savings when a processor is idle [27].

4 MODELING OF COORDINATED POWER AND PERFORMANCE CONTROL

We apply fuzzy modeling to predict the performance of multi-tier applications for various CPU usage limits imposed on the VMs. Both throughput and the percentile-based response time are used as performance metrics. Fuzzy modeling also estimates the complex relationship between the power consumption of the resource pool and the CPU usage limits imposed on various tiers of the applications. A key strength of a fuzzy model is its ability to represent highly complex nonlinear systems by a combination of inter-linked subsystems with simple functional dependencies. A simple linear model is not sufficient in this case due to the complex inter-tier dependencies and the underlying complexity of virtualized server infrastructure.

4.1 The Fuzzy Model

We consider a number of multi-tier applications hosted in a virtual resource pool as a MIMO system. The inputs to the system are CPU usage limits set at various tiers of the applications. The outputs of the system are the measured performance of each application and the average power consumption of the shared resource pool. We obtain two separate models for power and performance of the system, respectively. The system is approximated by a collection of MIMO fuzzy models as follows:

y(k + 1) = R(ξ(k), u(k)). (1)

Let $y(k)$ be the output variable and $u(k) = [u_1(k), \ldots, u_m(k)]^T$ be the vector of current inputs at sampling interval $k$. The regression vector $\xi(k)$ includes current and lagged outputs:

$$\xi(k) = [y(k), \ldots, y(k - n_y)]^T \qquad (2)$$

where $n_y$ specifies the number of lagged values of the output variable. $R$ is a fuzzy model consisting of a set of fuzzy rules. Each fuzzy rule is described as follows:

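• $R_i$: If $\xi_1(k)$ is $\Omega_{i,1}$ and .. $\xi_\varrho(k)$ is $\Omega_{i,\varrho}$ and $u_1(k)$ is $\Omega_{i,\varrho+1}$ and .. $u_m(k)$ is $\Omega_{i,\varrho+m}$ then

$$y_i(k+1) = \zeta_i \xi(k) + \eta_i u(k) + \phi_i. \qquad (3)$$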

Here, $\Omega_i$ is a set of fuzzy values that describe the elements of the regression vector $\xi(k)$ and the current input vector $u(k)$ for the fuzzy rule $R_i$. The numeric values of $\xi(k)$ and $u(k)$ are mapped to fuzzy values by using the corresponding fuzzy membership functions. For example, the fuzzy membership function of $\Omega_{i,1}$ determines the degree to which it can accurately describe $\xi_1(k)$. $\varrho$ denotes the number of elements in the regression vector $\xi(k)$. $y_i(k+1)$ is the estimated model output according to the fuzzy rule $R_i$. $\zeta_i$ and $\eta_i$ are vectors containing the consequent parameters and $\phi_i$ is the offset vector.

Note that the fuzzy membership functions may overlap with each other. As a result, multiple fuzzy rules can be triggered by a given set of input values. Each fuzzy rule describes a region of the complex non-linear system model using a simple functional relation given by the rule's consequent part. The contribution of each rule to the model output is determined by its firing strength, $\beta_i$. It is the product of the membership degrees of the antecedent variables in that rule. The final model output is calculated as the weighted average of the linear consequents in the $K$ individual rules as follows.

$$y(k+1) = \frac{\sum_{i=1}^{K} \beta_i \left( \zeta_i \xi(k) + \eta_i u(k) + \phi_i \right)}{\sum_{i=1}^{K} \beta_i} \qquad (4)$$
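The inference in Eqs. (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration, not the PERFUME implementation: the function names, the two-rule model, all parameter values and the shared Gaussian width `sigma` are hypothetical, and Gaussian membership functions are assumed for every antecedent variable.

```python
import numpy as np


def firing_strengths(x, centers, sigma):
    """beta_i: product of the Gaussian membership degrees of all antecedents in rule i."""
    degrees = np.exp(-((x - centers) ** 2) / (2.0 * sigma ** 2))  # shape (K, d)
    return degrees.prod(axis=1)                                    # shape (K,)


def fuzzy_predict(xi, u, centers, sigma, zeta, eta, phi):
    """Eq. (4): firing-strength-weighted average of the K linear consequents."""
    beta = firing_strengths(np.concatenate([xi, u]), centers, sigma)
    local_outputs = zeta @ xi + eta @ u + phi  # Eq. (3), one value per rule
    return float(beta @ local_outputs / beta.sum())


# Hypothetical two-rule model: one lagged output (xi_1) and two CPU limits (u_1, u_2).
centers = np.array([[0.2, 0.3, 0.3],
                    [0.8, 0.7, 0.7]])  # membership means over [xi_1, u_1, u_2]
zeta = np.array([[0.02], [0.03]])
eta = np.array([[0.18, 0.14], [0.29, 0.23]])
phi = np.array([0.10, 0.08])

y_next = fuzzy_predict(np.array([0.5]), np.array([0.4, 0.6]),
                       centers, sigma=0.25, zeta=zeta, eta=eta, phi=phi)
print(y_next)
```

Because the memberships overlap, both rules fire for this input and the prediction falls between the two local linear outputs.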

We conduct a case study to demonstrate the accuracy of our MIMO fuzzy models in predicting the performance of a RUBiS application and the power consumption of the underlying virtualized server cluster. The data for modeling is collected by randomly allocating various CPU usage limits on the Web, application and database tiers of the RUBiS application, which faces a workload of 1000 concurrent users. We construct an initial fuzzy model by applying the subtractive clustering technique [4] on data collected from the system. Each obtained cluster represents a certain operating region of the system, where input-output data values are highly concentrated. Clustering partitions the input-output space to determine the number of fuzzy rules and the shape of membership functions.


[Figure 3: three plots of measured versus predicted values over sampling intervals 0 to 100. Vertical axes: throughput; 95th-percentile response time (ms); power (Watt).]

(a) Throughput. (b) The 95th-percentile response time. (c) The power consumption.

Fig. 3: The prediction accuracy of performance and power by fuzzy models.

[Figure 4: three bar charts comparing ARMA and FUMI. Horizontal axis: workload patterns (Browse, Browse->Bid, Bid->Browse). Vertical axis: normalized root mean square error, 0% to 40%.]

(a) Throughput. (b) The 95th-percentile response time. (c) The power consumption.

Fig. 4: The prediction accuracy comparison between fuzzy and ARMA models under a dynamic workload.

In this case study, we obtain four clusters in a five-dimensional space. The first four dimensions correspond to the three input variables, $[u_1(k), u_2(k), u_3(k)]$, and one regression vector element $\xi_1(k)$. The fifth dimension corresponds to the output variable, $y(k)$, which is expressed as a linear function of the input variables. Each cluster center describes a fuzzy rule in which the fuzzy values $\Omega_i$ corresponding to $u(k)$ and $\xi(k)$ are represented by Gaussian membership functions. The first four dimensions of the cluster center determine the means of the Gaussian functions. The function variance is determined by a tunable parameter in the subtractive clustering technique. For example, the four cluster centers in our performance model are [0.46, 0.61, 0.53, 0.6], [0.97, 0.4, 0.63, 0.81], [0.91, 0.87, 0.49, 0.94], and [0.55, 0.65, 0.93, 0.96]. These are normalized values with respect to the maximum CPU usage limits imposed at the three tiers of the RUBiS application and the average throughput in the previous sampling interval. We apply an adaptive network based fuzzy inference system [14] to further tune the fuzzy model parameters.

The consequent parameters of the fuzzy rules in our performance model corresponding to browsing and bidding workload mixes are shown in Table 1. Here, $\eta_1$, $\eta_2$, and $\eta_3$ represent the impact of CPU allocation at the Web, application, and database tiers respectively on the predicted application performance according to the four fuzzy rules. These values are significantly different for the two workload mixes since they exhibit different inter-tier dependencies. The number of fuzzy rules is the same in both cases, since the range of input-output space does not vary a lot for the two workload mixes with the same workload intensity.

TABLE 1: Consequent parameters of fuzzy rules in the performance model for browsing and bidding workload mixes.

Rule  Workload  η1     η2     η3     ζ1     ϕ
1     browse    0.183  0.144  0.106  0.021  0.108
1     bid       0.108  0.243  0.162  0.02   0.107
2     browse    0.291  0.229  0.168  0.025  0.08
2     bid       0.173  0.39   0.26   0.021  0.06
3     browse    0.317  0.25   0.183  0.03   0.09
3     bid       0.182  0.409  0.273  0.027  0.1
4     browse    0.27   0.213  0.156  0.024  0.107
4     bid       0.156  0.352  0.234  0.026  0.11

Figures 3(a), 3(b) and 3(c) show the accuracy of our fuzzy models in predicting the application throughput, the 95th-percentile response time and the average power consumption for various CPU allocations in the face of a browsing workload mix. The accuracy is measured by the normalized root mean square error (NRMSE), a standard metric for deviation. The case study shows that the checking (validation) data and the predicted data are very close, with NRMSEs of 12.5%, 17.6% and 15.2% in the three scenarios respectively. We use different data sets for training and validating the system models. We observe similar prediction accuracy of the fuzzy models in the case of the bidding workload mix.
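For reference, one way to compute an NRMSE is sketched below. The text does not state which normalization is used, so dividing the root mean square error by the range of the measured values is an assumption here (dividing by the mean is another common convention); the function name and the sample values are likewise hypothetical.

```python
import numpy as np


def nrmse(measured, predicted):
    """Root mean square error divided by the range of the measured values (assumed convention)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((measured - predicted) ** 2))
    return float(rmse / (measured.max() - measured.min()))


# Hypothetical checking data and model predictions.
measured = [0.0, 10.0]
predicted = [1.0, 9.0]
print(nrmse(measured, predicted))  # RMSE of 1 over a range of 10
```

Multiplying the result by 100 gives a percentage comparable in form to the figures quoted above.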

4.2 On-line Adaptation of the Fuzzy Model

Internet workloads to a datacenter vary dynamically in arrival rates as well as characteristics [33]. This results in significantly varying resource demands at multiple tiers of Internet applications. A static system model cannot provide sufficient prediction accuracy of power and performance for all possible variations in the workload. Hence, the system models need to adapt on-line in the face of dynamic workloads. We apply a wRLS method to adapt the consequent parameters of the fuzzy model obtained. The technique continuously samples new measurements from the runtime system. It updates the model parameters in response to the errors made by the existing fuzzy models in predicting the performance and power consumption. The recursive nature of the wRLS method makes the time taken for this computation negligible for a control interval that is more than 10 seconds. It applies exponentially decaying weights on the sampled data so that higher weights are assigned to more recent observations.

We express the fuzzy model output in Eq. (4) as follows:

y(k + 1) = Xθ(k) + e(k) (5)

where $e(k)$ is the error between the actual output of the system (i.e., measured performance or power) and the predicted output of the model. $\theta = [\theta_1^T\ \theta_2^T\ ..\ \theta_p^T]$ is a vector composed of the model parameters. $X = [w_1 X(k), w_2 X(k), .., w_p X(k)]$, where $w_i$ is the normalized degree of fulfillment or firing strength of the $i$th rule and $X(k) = [\xi^T(k), u(k)]$ is a vector containing current and previous outputs and inputs of the system. The parameter vector $\theta(k)$ is estimated so that the following cost function is minimized. That is,

$$\mathrm{Cost} = \sum_{j=1}^{k} \gamma^{k-j} e^2(j). \qquad (6)$$

Here $\gamma$ is a positive number less than one. It determines how strongly the current prediction error and older errors affect the update of the parameter estimate: the error from $k - j$ intervals ago is weighted by $\gamma^{k-j}$, so older errors count less. The parameters of the fuzzy model are updated according to the wRLS method as follows:

$$\theta(k) = \theta(k-1) + Q(k) X(k-1) \left[ y(k) - X(k-1)\theta(k-1) \right]. \qquad (7)$$

$$Q(k) = \frac{1}{\gamma} \left[ Q(k-1) - \frac{Q(k-1) X(k-1) X^T(k-1) Q(k-1)}{\gamma + X^T(k-1) Q(k-1) X(k-1)} \right]. \qquad (8)$$

Here $Q(k)$ is the updating matrix. The initial value $\theta(0)$ is equal to the value obtained in the off-line identification, and the initial value $Q(0)$ is equal to $(X^T X)^{-1}$.
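The update in Eqs. (7) and (8) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the PERFUME code: the class name `WeightedRLS`, the forgetting factor of 0.95, the regressor being treated as a one-dimensional vector, and the large identity matrix used for `Q0` in the demo are all hypothetical (the text initializes $Q(0)$ from off-line data instead).

```python
import numpy as np


class WeightedRLS:
    """Recursive least squares with exponential forgetting (sketch of Eqs. 7-8)."""

    def __init__(self, theta0, Q0, gamma=0.95):
        self.theta = np.array(theta0, dtype=float)  # parameter estimate theta(k-1)
        self.Q = np.array(Q0, dtype=float)          # updating matrix Q(k-1)
        self.gamma = gamma                          # forgetting factor, 0 < gamma < 1

    def update(self, x_prev, y_now):
        """Fold in one sample: regressor X(k-1) and measured output y(k)."""
        x = np.asarray(x_prev, dtype=float)
        Qx = self.Q @ x
        # Eq. (8): discount old information and add the new sample.
        self.Q = (self.Q - np.outer(Qx, x @ self.Q) / (self.gamma + x @ Qx)) / self.gamma
        # Eq. (7): correct theta in proportion to the prediction error.
        error = y_now - x @ self.theta
        self.theta = self.theta + (self.Q @ x) * error
        return error


# Hypothetical usage: recover made-up parameters from noise-free samples.
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0, 0.5])
rls = WeightedRLS(theta0=np.zeros(3), Q0=1000.0 * np.eye(3))
for _ in range(200):
    x = rng.uniform(-1.0, 1.0, size=3)
    rls.update(x, x @ true_theta)
print(rls.theta)
```

Each update costs only a few matrix-vector products, which is consistent with the observation above that the computation is negligible relative to a control interval of more than 10 seconds.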

To evaluate the self-adaptiveness of our fuzzy model, we measure its power and performance prediction accuracy when a RUBiS workload is changed from a browsing mix of 1000 concurrent users to a bidding mix of 500 concurrent users and vice versa. Our results are compared with a popular and recently used technique for modeling Internet systems, ARMA [8], [31]. As shown in Figures 4(a) and 4(b), our fuzzy model outperforms the ARMA model in predicting the performance of a multi-tier application. On average, the improvements in performance prediction accuracy for the throughput and the 95th-percentile end-to-end response time are 35% and 43%, respectively. The improvement in power prediction accuracy is shown in Figure 4(c). Compared to ARMA, our fuzzy models are more accurate in capturing the non-linear relationship between resource allocation and performance or power in a virtualized server system.

4.3 Integration of workload-aware fuzzy modeling for proactive FUMI Control

Although effective, the online adaptation of the fuzzy model takes a few control intervals to capture the changing system behavior. As a result, the control performance may be degraded if the workload is continuously changing. We address this challenge by the integration of workload-aware fuzzy modeling to achieve proactive control in anticipation of future workload changes. The novelty of proactive FUMI control lies in its ability to consider the impact of future workload changes as well as current control actions while solving the MIMO control problem.

We incorporate the time-varying workload intensity as a measured disturbance in the system model. The system is now approximated by a collection of MIMO fuzzy models as follows:

y(k + 1) = R(ξ(k), λ(k), u(k)). (9)

where the workload intensity at the sampling interval $k$ is denoted by $\lambda(k)$. Each fuzzy rule in the model is described as follows:

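• $R_i$: If $\xi_1(k)$ is $\Omega_{i,1}$ and .. $\xi_\varrho(k)$ is $\Omega_{i,\varrho}$ and $\lambda(k)$ is $\Omega_{i,\varrho+1}$ and $u_1(k)$ is $\Omega_{i,\varrho+2}$ and .. $u_m(k)$ is $\Omega_{i,\varrho+m+1}$ then

$$y_i(k+1) = \zeta_i \xi(k) + \theta_i \lambda(k) + \eta_i u(k) + \phi_i. \qquad (10)$$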

Here, a set of fuzzy values denoted by $\Omega_i$ describes $\lambda(k)$ in addition to other model parameters. The model output is calculated as:

$$y(k+1) = \frac{\sum_{i=1}^{K} \beta_i \left( \zeta_i \xi(k) + \theta_i \lambda(k) + \eta_i u(k) + \phi_i \right)}{\sum_{i=1}^{K} \beta_i} \qquad (11)$$

It can also be expressed in the form of

$$y(k+1) = \zeta^{*} \xi(k) + \theta^{*} \lambda(k) + \eta^{*} u(k) + \phi^{*}. \qquad (12)$$

The aggregated parameters $\zeta^{*}$, $\theta^{*}$, $\eta^{*}$ and $\phi^{*}$ are the weighted sums of the vectors $\zeta_i$, $\theta_i$, $\eta_i$ and $\phi_i$ respectively. They are applied to obtain a state-space system model and transform the complex control problem into a computationally efficient quadratic programming problem, which is described in Section 5.2.
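The step from Eq. (11) to Eq. (12) can be written out explicitly. The normalized weights $w_i$ below are introduced here only for illustration; they follow directly from dividing each firing strength in Eq. (11) by the sum of all firing strengths.

```latex
w_i = \frac{\beta_i}{\sum_{j=1}^{K} \beta_j}, \qquad \sum_{i=1}^{K} w_i = 1,
\\[4pt]
\zeta^{*} = \sum_{i=1}^{K} w_i \zeta_i, \quad
\theta^{*} = \sum_{i=1}^{K} w_i \theta_i, \quad
\eta^{*} = \sum_{i=1}^{K} w_i \eta_i, \quad
\phi^{*} = \sum_{i=1}^{K} w_i \phi_i,
\\[4pt]
y(k+1) = \sum_{i=1}^{K} w_i \left( \zeta_i \xi(k) + \theta_i \lambda(k) + \eta_i u(k) + \phi_i \right)
       = \zeta^{*} \xi(k) + \theta^{*} \lambda(k) + \eta^{*} u(k) + \phi^{*}.
```

Since each $\beta_i$ depends on the current values of $\xi(k)$, $\lambda(k)$ and $u(k)$, the aggregated parameters would presumably be re-evaluated at every sampling interval, which yields a model that is linear locally around the current operating point.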

We construct a workload-aware fuzzy MIMO model by applying the subtractive clustering technique and an adaptive network based fuzzy inference system on the data collected from the virtualized server system hosting multi-tier applications. For data collection, we measure the power consumption and performance achieved by the system for various combinations of CPU allocations and workload intensities. The power and performance models, which are learnt from the training data, have the ability to generalize the expected system behavior for previously unseen workloads and CPU allocations. Our workload-aware fuzzy model consists of six fuzzy rules.

At run time, we apply the wRLS method to update the model parameters if the system behaves differently than expected by the fuzzy model. Such a situation could arise due to a change in the workload characteristics.
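To give a feel for this kind of online update, the sketch below performs one step of a textbook weighted recursive least squares update for a single rule's linear consequent. It is an assumption-laden illustration, not the exact wRLS formulation used by PERFUME: the function name, the use of the rule's normalized firing strength as the weight, and the symmetric covariance matrix initialized to a large multiple of the identity are all choices made here for the example.

```python
def wrls_update(theta, P, x, y, weight):
    """One weighted RLS step for a linear model y ~ x . theta.

    theta  : current parameter estimates (list of n floats)
    P      : current n x n covariance matrix (assumed symmetric)
    x      : regressor for this sample (list of n floats)
    y      : measured output for this sample
    weight : how strongly this sample applies to the rule (assumed in [0, 1])
    """
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + weight * sum(x[i] * Px[i] for i in range(n))
    # Shrink the covariance along the direction of the new regressor.
    P_new = [[P[i][j] - weight * Px[i] * Px[j] / denom for j in range(n)]
             for i in range(n)]
    # Correct the parameters in proportion to the prediction error.
    error = y - sum(x[i] * theta[i] for i in range(n))
    gain = [weight * sum(P_new[i][j] * x[j] for j in range(n)) for i in range(n)]
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    return theta_new, P_new
```

A sample with zero weight leaves the parameters untouched, so each rule is only adapted by data from the operating region it covers.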

5 FUMI CONTROL DESIGN

We apply the model predictive control principle and fuzzy modeling to design the FUMI control. FUMI control is well suited for power and performance control in virtualized server clusters due to its capability to solve constrained MIMO control problems of complex non-linear systems. It determines control actions by optimizing a cost function, which expresses the control objectives and constraints over a time interval. We formulate the power and performance assurance of virtualized multi-tier applications as a predictive control problem. Then, we present detailed steps to transform the control formulation to a standard quadratic programming problem.


5.1 FUMI Control Problem Formulation

FUMI control aims to minimize the deviation of power consumption and performance of multi-tier applications from their respective targets. It decides the control actions at every control period k by minimizing the cost function:

$$
V(k) = \sum_{i=1}^{H_p} \left\| r_1 - y_1(k+i) \right\|_P^2
     + \sum_{i=1}^{H_p} \left\| r_2 - y_2(k+i) \right\|_Q^2
     + \sum_{j=0}^{H_c-1} \left\| \Delta u(k+j) \right\|_R^2 . \tag{13}
$$

Here, y1(k) is the power consumption of the resource pool. y2(k) is a vector containing the percentile-based end-to-end response time or the throughput of each application. The controller predicts both power and performance over Hp control periods, called the prediction horizon. It computes a sequence of control actions ∆u(k), ∆u(k+1), .., ∆u(k+Hc−1) over Hc control periods, called the control horizon, to keep the predicted power and performance close to their pre-defined targets r1 and r2 respectively. The control action is the change in CPU usage limits imposed on various tiers of the multi-tier applications. P and Q are the tracking error weights that determine the trade-off between power and performance. The tracking error weights also facilitate service differentiation between different applications sharing the resource pool. The third term represents the control penalty, weighted by R. It penalizes big changes in control action and contributes towards system stability.
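The cost function of Eq. (13) can be evaluated for a candidate control sequence as sketched below. This is only an illustration under simplifying assumptions: the weights P, Q and R are taken to be diagonal (P a scalar, Q one weight per application, R one weight per control input), and all names are hypothetical.

```python
def weighted_sq_norm(vector, diag_weights):
    """Return ||v||^2_W for a diagonal weight matrix given by its diagonal."""
    return sum(w * v * v for w, v in zip(diag_weights, vector))


def fumi_cost(r1, y1_pred, r2, y2_pred, du_seq, P, Q, R):
    """Evaluate V(k) for predicted outputs and a candidate control sequence.

    r1      : power target (scalar)
    y1_pred : predicted power for steps k+1 .. k+Hp
    r2      : performance target per application
    y2_pred : predicted performance per application for steps k+1 .. k+Hp
    du_seq  : CPU usage limit changes for steps k .. k+Hc-1
    P, Q, R : scalar, per-application and per-input weights (assumed diagonal)
    """
    power_term = sum(P * (r1 - y) ** 2 for y in y1_pred)
    perf_term = sum(
        weighted_sq_norm([r - y for r, y in zip(r2, step)], Q)
        for step in y2_pred)
    penalty_term = sum(weighted_sq_norm(du, R) for du in du_seq)
    return power_term + perf_term + penalty_term
```

Raising P relative to Q makes power deviations more expensive than performance deviations, which is how a power-preferred policy would be expressed in this form; a larger entry of Q for one application gives that application priority.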

A system administrator can select the power and performance targets and their respective weighting parameters by using an independent optimization framework, or by following best practices, based on the business value of providing various levels of performance and the cost of power consumption. A higher preference weight can be set for power consumption control when the power budget needs to be enforced. However, temporary violations of power budgets are allowable, as long as they are bounded. Thermal failover happens only when the power budget is violated long enough to create enough heat that increases the temperature beyond normal operational ranges. Hence, we do not treat power budget as a hard constraint in our problem formulation.

The control problem is subject to the constraint that the sum of CPU usage limits assigned to all multi-tier applications must be bounded by the total CPU capacity of the resource pool. The constraint is formulated as:

$$
\sum_{m=1}^{M} \left( \Delta u_m(k) + u_m(k) \right) \le U_{max} \tag{14}
$$

where M is the number of applications hosted and Umax is the total CPU capacity of the resource pool. The use of a resource pool enables resource management at the server cluster level, independently of the actual hosts that contribute to the resources. Hence, we do not consider the CPU capacity constraints of individual physical servers.

5.2 Transformation to Quadratic Programming

We transform the non-convex, time-consuming optimization involved in the MIMO control problem into a standard quadratic programming problem, which can be solved efficiently at run time. We express the objective of FUMI control, defined by Eq. (13), as a quadratic program:

$$
\text{Minimize} \quad \frac{1}{2} \Delta u(k)^T H \Delta u(k) + c^T \Delta u(k) \tag{15}
$$

subject to the constraint Ω∆u(k) ≤ ω. The matrices Ω and ω are chosen to formulate the constraints on CPU resource usage as described in Eq. (14). Here, ∆u(k) is a matrix containing the changes in the CPU usage limits on each VM over the entire control horizon Hc.
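As an illustration of how Eq. (14) maps onto the form Ω∆u(k) ≤ ω, consider the simplest case of a single control step (Hc = 1), so that ∆u(k) is the column vector of the M per-application changes. This is a sketch under that assumption; a longer control horizon would need one such row for each step.

$$
\sum_{m=1}^{M} \left( \Delta u_m(k) + u_m(k) \right) \le U_{max}
\;\Longleftrightarrow\;
\underbrace{\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}}_{\Omega}
\begin{bmatrix} \Delta u_1(k) \\ \vdots \\ \Delta u_M(k) \end{bmatrix}
\le
\underbrace{U_{max} - \sum_{m=1}^{M} u_m(k)}_{\omega}
$$

In this case Ω is a 1×M row of ones and ω is the CPU capacity still unallocated at the current operating point.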

For this transformation, first we linearize the fuzzy model at the current operating point and represent it as a state-space linear time-variant model in the following form:

$$
\begin{aligned}
x_{lin}(k+1) &= A(k)\, x_{lin}(k) + B_u(k)\, u(k) + B_\lambda(k)\, \lambda(k), \\
y_{lin}(k) &= C(k)\, x_{lin}(k).
\end{aligned} \tag{16}
$$

The vector for the state-space description is defined as

$$
x_{lin}(k+1) = [\xi^T(k),\, 1]^T . \tag{17}
$$

The matrices A(k), Bu(k), Bλ(k) and C(k) are constructed by freezing the parameters of the fuzzy model at a certain operating point y(k) and u(k). Comparing Eq. (12) and Eq. (16), the state matrices are computed as follows:

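$$
A = \begin{bmatrix}
\zeta^*_{1,1} & \zeta^*_{1,2} & \cdots & \cdots & \zeta^*_{1,\varrho} & \phi^*_1 \\
1 & 0 & \cdots & \cdots & 0 & 0 \\
0 & 1 & \ddots & & 0 & 0 \\
\vdots & & \ddots & \ddots & \vdots & \vdots \\
\zeta^*_{2,1} & \zeta^*_{2,2} & \cdots & \cdots & \zeta^*_{2,\varrho} & \phi^*_2 \\
\vdots & & \ddots & \ddots & \vdots & \vdots \\
\zeta^*_{p,1} & \zeta^*_{p,2} & \cdots & \cdots & \zeta^*_{p,\varrho} & \phi^*_p \\
\vdots & & \ddots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & 1
\end{bmatrix}
$$

$$
B_u = \begin{bmatrix}
\eta^*_{1,1} & \eta^*_{1,2} & \cdots & \eta^*_{1,m} \\
0 & \cdots & \cdots & 0 \\
\vdots & & & \vdots \\
\eta^*_{2,1} & \eta^*_{2,2} & \cdots & \eta^*_{2,m} \\
0 & \cdots & \cdots & 0 \\
\vdots & & & \vdots \\
\eta^*_{p,1} & \eta^*_{p,2} & \cdots & \eta^*_{p,m} \\
0 & \cdots & \cdots & 0 \\
0 & \cdots & \cdots & 0
\end{bmatrix}
\qquad
B_\lambda = \begin{bmatrix}
\theta^*_1 \\ 0 \\ \vdots \\ \theta^*_2 \\ 0 \\ \vdots \\ \theta^*_p \\ 0 \\ 0
\end{bmatrix}
$$

$$
C = \begin{bmatrix}
1 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\
\vdots & & \ddots & & & & \vdots \\
0 & \cdots & \cdots & \cdots & \cdots & 1 & 0
\end{bmatrix}
$$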

where ζ∗ij is the jth element of the aggregate parameter vector ζ∗ for application i, η∗ij is the jth element of the aggregate parameter vector η∗ for application i, and θ∗i is the aggregate parameter θ∗ for application i.

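To ensure offset-free reference tracking, the optimization problem is defined with respect to the increment in the control signal, ∆u(k), rather than the control signal u(k). The state-space description is extended correspondingly as follows:

$$
\begin{bmatrix} x_{lin}(k+1) \\ u(k) \end{bmatrix}
=
\begin{bmatrix} A(k) & B_u(k) \\ 0 & I \end{bmatrix}
\begin{bmatrix} x_{lin}(k) \\ u(k-1) \end{bmatrix}
+
\begin{bmatrix} B_u(k) \\ I \end{bmatrix} \Delta u(k)
+
\begin{bmatrix} B_\lambda(k) \\ 0 \end{bmatrix} \lambda(k)
$$

$$
y_{lin}(k) = \begin{bmatrix} C(k) & 0 \end{bmatrix}
\begin{bmatrix} x_{lin}(k) \\ u(k-1) \end{bmatrix}
$$

$$
\begin{aligned}
\mathbf{X}(k+1) &= \mathbf{A}\mathbf{X}(k) + \mathbf{B}_u \Delta u(k) + \mathbf{B}_\lambda \lambda(k), \\
\mathbf{Y}(k) &= \mathbf{C}\mathbf{X}(k).
\end{aligned} \tag{18}
$$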

We obtain the state-space description, shown in Eq. (18), corresponding to both power and performance models. Henceforth, we use the notations A1, B1u, B1λ, C1, X1(k), Y1(k) and A2, B2u, B2λ, C2, X2(k), Y2(k) for the state matrices that describe the performance model and the power model respectively.

Assuming that at time instant k, the state vector, the future control sequence and the future workload are known, the future process outputs can be predicted through successive substitution. The complete output sequence over the prediction horizon Hp is given by the following:

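$$
\begin{bmatrix} \mathbf{Y}_i(k+1) \\ \mathbf{Y}_i(k+2) \\ \vdots \\ \mathbf{Y}_i(k+H_p) \end{bmatrix}
= \mathbf{R}_{ix} \mathbf{A}_i \mathbf{X}_i(k)
+ \mathbf{R}_{iu} \begin{bmatrix} \Delta u(k) \\ \Delta u(k+1) \\ \vdots \\ \Delta u(k+H_c-1) \end{bmatrix}
+ \mathbf{R}_{i\lambda} \begin{bmatrix} \lambda(k) \\ \lambda(k+1) \\ \vdots \\ \lambda(k+H_p) \end{bmatrix}
$$

where

$$
\mathbf{R}_{ix} = \begin{bmatrix} \mathbf{C}_i \\ \mathbf{C}_i \mathbf{A}_i \\ \vdots \\ \mathbf{C}_i \mathbf{A}_i^{H_p-1} \end{bmatrix}
$$

$$
\mathbf{R}_{iu} = \begin{bmatrix}
\mathbf{C}_i \mathbf{B}_{iu} & 0 & \cdots & 0 \\
\mathbf{C}_i \mathbf{A}_i \mathbf{B}_{iu} & \mathbf{C}_i \mathbf{B}_{iu} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{C}_i \mathbf{A}_i^{H_p-1} \mathbf{B}_{iu} & \mathbf{C}_i \mathbf{A}_i^{H_p-2} \mathbf{B}_{iu} & \cdots & \mathbf{C}_i \mathbf{A}_i^{H_p-H_c} \mathbf{B}_{iu}
\end{bmatrix}
$$

$$
\mathbf{R}_{i\lambda} = \begin{bmatrix}
\mathbf{C}_i \mathbf{B}_{i\lambda} & 0 & \cdots & 0 \\
\mathbf{C}_i \mathbf{A}_i \mathbf{B}_{i\lambda} & \mathbf{C}_i \mathbf{B}_{i\lambda} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{C}_i \mathbf{A}_i^{H_p-1} \mathbf{B}_{i\lambda} & \mathbf{C}_i \mathbf{A}_i^{H_p-2} \mathbf{B}_{i\lambda} & \cdots & \mathbf{C}_i \mathbf{A}_i^{H_p-H_c} \mathbf{B}_{i\lambda}
\end{bmatrix}
$$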

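Combining the output sequence over the prediction horizon with the FUMI control objective, we get a quadratic programming problem defined in Eq. (15) where

$$
\begin{aligned}
H &= 2\left( \mathbf{R}_{1u}^T P \mathbf{R}_{1u} + \mathbf{R}_{2u}^T Q \mathbf{R}_{2u} + R \right), \\
c &= 2\left[ \mathbf{R}_{1u}^T P^T \left( \mathbf{R}_{1x} \mathbf{A}_1 \mathbf{X}_1(k) - r_1 + \mathbf{R}_{1\lambda} \boldsymbol{\lambda} \right)
 + \mathbf{R}_{2u}^T Q^T \left( \mathbf{R}_{2x} \mathbf{A}_2 \mathbf{X}_2(k) - r_2 + \mathbf{R}_{2\lambda} \boldsymbol{\lambda} \right) \right]^T
\end{aligned} \tag{19}
$$

where

$$
\boldsymbol{\lambda} = \begin{bmatrix} \lambda(k) \\ \lambda(k+1) \\ \vdots \\ \lambda(k+H_p) \end{bmatrix}
$$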
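The form of H and c can be checked with a short expansion. The following is a sketch, assuming that P, Q and R denote the weights stacked over the respective horizons, that r_1 and r_2 are the targets stacked likewise, and introducing the shorthand F_i (not used in the original text) for the part of the prediction that does not depend on the control moves:

$$
F_i = \mathbf{R}_{ix} \mathbf{A}_i \mathbf{X}_i(k) + \mathbf{R}_{i\lambda} \boldsymbol{\lambda}, \qquad
\Delta U = \begin{bmatrix} \Delta u(k) \\ \vdots \\ \Delta u(k+H_c-1) \end{bmatrix}
$$

$$
\begin{aligned}
V(k) &= \left( F_1 + \mathbf{R}_{1u} \Delta U - r_1 \right)^T P \left( F_1 + \mathbf{R}_{1u} \Delta U - r_1 \right)
      + \left( F_2 + \mathbf{R}_{2u} \Delta U - r_2 \right)^T Q \left( F_2 + \mathbf{R}_{2u} \Delta U - r_2 \right)
      + \Delta U^T R\, \Delta U \\
     &= \Delta U^T \left( \mathbf{R}_{1u}^T P \mathbf{R}_{1u} + \mathbf{R}_{2u}^T Q \mathbf{R}_{2u} + R \right) \Delta U
      + 2 \left[ (F_1 - r_1)^T P \mathbf{R}_{1u} + (F_2 - r_2)^T Q \mathbf{R}_{2u} \right] \Delta U
      + \text{const}
\end{aligned}
$$

Matching the quadratic term with (1/2) ∆U^T H ∆U gives H as in Eq. (19). The coefficient of the linear term is the row vector 2[R_{1u}^T P^T (F_1 − r_1) + R_{2u}^T Q^T (F_2 − r_2)]^T, which is the expression given for c. The constant does not depend on ∆U and can be dropped from the minimization.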

The integration of workload intensity λ in the FUMI control problem enables proactive control in the face of dynamic workload variations. The future values of workload intensity λ(k) over the prediction horizon Hp are predicted by applying the Kalman filtering technique.
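The text does not specify the state model behind the Kalman filter, so the sketch below should be read as one plausible instance rather than PERFUME's predictor. It assumes a local linear trend model (a level and a slope), scalar workload observations, and hypothetical noise settings q and r; the function name is also an assumption.

```python
def kalman_trend_forecast(observations, horizon, q=1e-2, r=1.0):
    """Forecast the next `horizon` values with a level-and-slope Kalman filter.

    observations : past workload intensities, oldest first
    horizon      : number of future steps to predict
    q, r         : assumed process and measurement noise variances
    """
    level, slope = observations[0], 0.0
    p00, p01, p11 = 1.0, 0.0, 1.0  # entries of the symmetric 2x2 covariance
    for z in observations[1:]:
        # Predict step: the level advances by the slope, uncertainty grows.
        level = level + slope
        p00, p01, p11 = p00 + 2.0 * p01 + p11 + q, p01 + p11, p11 + q
        # Update step: correct level and slope using the new measurement.
        s = p00 + r
        k0, k1 = p00 / s, p01 / s
        residual = z - level
        level += k0 * residual
        slope += k1 * residual
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01
    # Extrapolate the filtered trend over the requested horizon.
    return [level + (h + 1) * slope for h in range(horizon)]
```

For example, calling `kalman_trend_forecast(history, 4)` would produce the four-step look-ahead that a prediction horizon of four requires; the returned list plays the role of the stacked workload vector in the prediction equations.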

Note that we use the workload-aware fuzzy model for deriving a proactive FUMI control solution, assuming that workload predictions are sufficiently accurate. When the workload variations are abrupt and unpredictable, FUMI control uses the fuzzy model described in Section 4.1. In this case, we obtain the following:

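$$
\begin{aligned}
H &= 2\left( \mathbf{R}_{1u}^T P \mathbf{R}_{1u} + \mathbf{R}_{2u}^T Q \mathbf{R}_{2u} + R \right), \\
c &= 2\left[ \mathbf{R}_{1u}^T P^T \left( \mathbf{R}_{1x} \mathbf{A}_1 \mathbf{X}_1(k) - r_1 \right)
 + \mathbf{R}_{2u}^T Q^T \left( \mathbf{R}_{2x} \mathbf{A}_2 \mathbf{X}_2(k) - r_2 \right) \right]^T .
\end{aligned} \tag{20}
$$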

Fig. 5: Interface between FUMI control and learning components. (Block diagram: optimizer, fuzzy model, online learning algorithm and virtualized servers, with signals ref, u(k) and y(k+1).)

The matrices R1x, R1u are associated with the performance models of hosted applications and matrices R2x, R2u are associated with the power model of the resource pool.

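$$
\mathbf{R}_{ix} = \begin{bmatrix} \mathbf{C}_i \\ \mathbf{C}_i \mathbf{A}_i \\ \vdots \\ \mathbf{C}_i \mathbf{A}_i^{H_p-1} \end{bmatrix}
\qquad
\mathbf{R}_{iu} = \begin{bmatrix}
\mathbf{C}_i \mathbf{B}_i & 0 & \cdots & 0 \\
\mathbf{C}_i \mathbf{A}_i \mathbf{B}_i & \mathbf{C}_i \mathbf{B}_i & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{C}_i \mathbf{A}_i^{H_p-1} \mathbf{B}_i & \mathbf{C}_i \mathbf{A}_i^{H_p-2} \mathbf{B}_i & \cdots & \mathbf{C}_i \mathbf{A}_i^{H_p-H_c} \mathbf{B}_i
\end{bmatrix}
$$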

5.3 FUMI Control Interface

Figure 5 shows the interface between the FUMI online learning and MIMO control components. The continuous chain of interactions between various components of the FUMI control system is described by Algorithm 1.

Algorithm 1 The Control Algorithm.

1: Start with MIMO fuzzy models that are constructed from offline data collected from the system.
2: loop
3:   Linearize the fuzzy models at the current operating state and obtain the state-space matrices: A1, B1u, B1λ, C1, A2, B2u, B2λ, and C2.
4:   Predict the workload intensity of the hosted applications over the prediction horizon, Hp, if workload prediction capability is available.
5:   Solve the quadratic programming problem described in Section 5.2. Obtain a sequence of CPU resource adjustments, ∆u(k), over the control horizon, Hc.
6:   Apply the first control action from the computed sequence of actions, in terms of CPU resource adjustments on the hosted applications.
7:   Adapt the fuzzy models using the wRLS method, in response to any modeling errors observed.
8: end loop

5.4 FUMI Control Scalability Enhancement

We now present a methodology for decentralizing FUMI control, with the aim to further enhance its scalability. Given a total power budget of r1 and a total CPU capacity of Umax for a virtualized server cluster, we assign the power budget and resource constraint for each application according to its priority as follows:

r1a = r1 · Q(a) (21)

Umax,a = Umax · Q(a) (22)


Here, r1a and Umax,a are the power budget and the maximum amount of CPU resources that can be allocated to application a, respectively. Q(a) is the priority of application a as determined by the datacenter administrator based on the business value of providing performance assurance to application a. The performance target of each application is determined by its SLA. Thus, we can decompose the global control problem given by Eq. (13) into local sub-problems for individual applications. Each local control problem is solved by a decentralized FUMI controller. We follow the same FUMI design principles to construct such decentralized controllers. The control solutions are obtained by solving the quadratic programming problem given by Eq. (15). The CPU allocations set by these individual controllers are feasible as long as the total amount of CPU allocated across all the applications does not exceed the server cluster capacity. This is mainly due to the use of the VMware resource pool, which enables resource management and load balancing at the server cluster level. The decentralized control approach scales with the number of applications hosted in the cluster, since it reduces the problem size drastically by allowing each controller to manage one application only.
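The decomposition in Eqs. (21) and (22) amounts to a proportional split, sketched below. The function name is hypothetical, and the check that the priorities sum to one is an assumption made here so that the per-application limits add up to the cluster totals; the total CPU capacity in the usage note is likewise an illustrative value.

```python
def decompose_budgets(total_power, total_cpu, priorities):
    """Split the cluster power budget and CPU capacity by priority Q(a).

    priorities maps an application name to its priority Q(a); the
    priorities are assumed to sum to one.
    Returns a dict: application name -> (power budget, CPU capacity).
    """
    if abs(sum(priorities.values()) - 1.0) > 1e-9:
        raise ValueError("priorities are assumed to sum to 1")
    return {app: (total_power * q, total_cpu * q)
            for app, q in priorities.items()}
```

With four applications of equal priority 0.25 and a total power budget of 47 Watts, `decompose_budgets(47.0, 24000.0, {...})` would give each application a power budget of 11.75 Watts and a quarter of the assumed 24000 MHz CPU capacity.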

6 PERFUME IMPLEMENTATION

6.1 The Testbed

We have implemented PERFUME on a testbed consisting of two HP ProLiant BL460C G6 blade server modules and an HP EVA storage area network with 10 Gbps Ethernet and 8 Gbps Fibre/iSCSI dual channels. Each blade server is equipped with an Intel Xeon E5530 2.4 GHz quad-core processor and 32 GB PC3 memory. We use VMware ESX 4.1 for server virtualization, and create a resource pool from the virtualized server cluster to host multi-tier applications. An important feature of the VMware resource pool is that the VMs do not have a static mapping with the physical servers. VMware's Distributed Resource Scheduling mechanism dynamically changes the mapping for load balancing. Each tier of an application is hosted inside a VM with 2 VCPUs, 4 GB RAM and 15 GB disk space. The guest operating system used is Ubuntu Linux 10.04.

Like many related studies [8], [31], [34], [39], our work uses the multi-tier application benchmark RUBiS in the testbed. RUBiS implements the core functionality of an auction site. It has nine tables in the database and defines 26 interactions that can be accessed from the clients' Web browsers. The application contains a Java-based client that generates a session-oriented workload. RUBiS sessions have an average duration of 15 minutes and the average think time is five seconds. It defines two workload mixes: a browsing mix made up of only read-only interactions and a bidding mix that includes 15% read-write interactions.

6.2 PERFUME Components

1) Power Monitor: The average power consumption can be measured at the resource pool level or at the VM level by using a feature of VMware ESX 4.1. VMware gathers such data through its Intelligent Platform Management Interface (IPMI) sensors. The power monitor program uses the vSphere API to collect the power measurement data.

2) Performance Monitor: It uses a sensor program provided by the RUBiS client for performance monitoring. We modify the sensor to measure the client-perceived percentile-based response time and throughput over a period of time. The number of requests finished during a control interval is the throughput.

3) FUMI Controller: It determines the control actions at every interval of 30 seconds. This control interval is chosen by considering the trade-off between noise in the sensor measurements and faster response of the controller. It invokes a quadratic programming solver, quadprog, in MATLAB to execute the control algorithm described in Section 5.

4) Resource Allocator: It uses the vSphere API to impose CPU usage limits on the VMs. The vSphere module provides an interface to execute the method ReconfigVM_Task for this purpose.

In our testbed, the overhead of applying the wRLS technique on the fuzzy models and executing the control algorithm is less than half a second. It is negligible compared to the control interval of 30 seconds.

7 PERFORMANCE EVALUATION

7.1 Power and Performance Assurance

7.1.1 Flexible Tradeoffs

A key feature of PERFUME is its ability to assure joint power and performance guarantee with flexible tradeoffs while assuring control accuracy and system stability. The tradeoffs between inherently conflicting power and performance objectives can be specified by a datacenter administrator. The system stability is measured in terms of relative deviation of power and performance from their respective targets, as defined in vPnP [8]. The relative deviation is |y(k) − r|/r, where y(k) is the measured application performance or the power consumption of the resource pool at time interval k and r is the corresponding target value. We experiment with power-preferred, performance-preferred and balanced control options under a highly dynamic workload [20]. Figure 6(a) shows the dynamic changes in the number of concurrent users. The workload prediction component is not used in this case, due to the difficulty in achieving sufficient prediction accuracy for a randomly varying step-change workload.
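The relative deviation metric is simple to compute from a series of measurements. The sketch below averages the per-interval deviation over an experiment; the averaging step and the function name are assumptions made for illustration, since the text only defines the per-interval quantity.

```python
def mean_relative_deviation(measurements, target):
    """Average of |y(k) - r| / r over all measured intervals."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return sum(abs(y - target) / target for y in measurements) / len(measurements)
```

For instance, throughputs of 2400, 2500 and 2750 requests against a target of 2500 give per-interval deviations of 4%, 0% and 10%, and therefore a mean relative deviation of about 4.7%.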

PERFUME achieves the specified tradeoffs by tuning the tracking error weights, P and Q, in the MIMO control objective defined by Eq. (13). Figure 6(b) compares the control accuracy of vPnP with PERFUME in assuring the throughput target for various tradeoffs between power and performance. Our results demonstrate that, compared to vPnP, PERFUME delivers an average improvement of 30% in performance assurance in terms of relative deviation for various control options. We obtained similar results, with an average improvement of 25% in the relative deviation of power consumption with respect to its power budget, as shown in Figure 6(c). The control accuracy of the power-preferred option is the highest for power assurance but the lowest for throughput assurance. In contrast, the control accuracy of the performance-preferred option is the highest for throughput assurance and the lowest for power assurance. The balanced control option shows good control accuracy for both power and performance assurance.

Fig. 6: Comparison of PERFUME and vPnP for control accuracy under a dynamic workload. (a) A highly dynamic workload (number of users vs. time in minutes). (b) Throughput (relative deviation for the power_oriented, perf_oriented and balanced options). (c) Power consumption (relative deviation for the same three options).

Fig. 7: Comparison of PERFUME and vPnP for power and performance assurance under a dynamic workload. (a) Power consumption (power in Watt vs. sampling interval, every 30 secs; vPnP, PERFUME and the power budget). (b) Throughput (vPnP, PERFUME and the target). (c) CPU allocation (CPU usage limit in MHz).

Fig. 8: Per-tier CPU allocation for the bidding workload mix (CPU usage limit in MHz for the web, app and db tiers).

Fig. 9: Per-tier CPU allocation for the browsing workload mix (CPU usage limit in MHz for the web, app and db tiers).

7.1.2 System Stability

We now take a closer look at the system stability of PERFUME under the highly dynamic workload. We experiment with the power-performance balanced control option. Figures 7(a) and 7(b) illustrate that PERFUME offers more accurate assurance of power and performance targets compared to vPnP [8]. We show the results for only one of the hosted RUBiS applications. Similar results were obtained for the other. PERFUME is able to adapt itself and control both power consumption and throughput so that they eventually converge to the steady state. On the other hand, the results show more significant oscillations in power and performance under vPnP, due to its lack of control accuracy and system stability guarantees. There is an improvement of 25% and 32% in relative deviation of power consumption and throughput respectively.

Fig. 10: The 95th-percentile response time (in ms vs. sampling interval; vPnP, PERFUME and the target).

Figure 7(c) compares the total CPU usage limits allocated by vPnP and PERFUME at various sampling intervals. The total CPU usage limit is the sum of the CPU usage limits at the Web, application and database tiers. On average, PERFUME uses a similar amount of CPU resources as vPnP. However, there are significantly fewer fluctuations in resource allocation. The CPU allocations are adjusted to track the power and performance targets. For instance, when an increase in the workload intensity causes the power consumption to exceed the power budget, CPU allocation is reduced to bring down the power consumption.

We investigate PERFUME's ability to capture multi-tier resource dependency for making control decisions by applying different workload mixes in this experiment. Figures 8 and 9 show the distribution of CPU allocation among the three tiers of the RUBiS application when bidding and browsing workload mixes are used respectively. We observe that the proportion of CPU allocations at the three tiers varies with different workload mixes. The relative deviations of power and performance from their respective targets are similar in both cases. This is because PERFUME is able to satisfy the various resource demands at the three tiers of the RUBiS application depending on the type of workload mix used.

Fig. 11: Power and performance assurance under a bursty workload generated by 1000 users. (a) A bursty workload (number of active users vs. time in seconds). (b) Power consumption (power in Watt vs. time in seconds; vPnP, PERFUME and the power budget). (c) Throughput (vPnP, PERFUME and the target).

Fig. 12: Workload traffic trace and FUMI prediction (arrivals in sessions/min vs. time in minutes; actual and predicted).

7.1.3 Percentile-Based Response Time Guarantee

We now demonstrate the capability of PERFUME in accurately achieving the 95th-percentile response time guarantee in the face of a highly dynamic workload. Note that PERFUME is able to provide any percentile-based response time guarantee. Compared with the mean-based performance metric, a percentile-based response time introduces much stronger nonlinearity in the system. Figure 10 shows that, compared to vPnP [8], PERFUME delivers significantly improved control accuracy and performance assurance for highly non-linear percentile-based response times. For this experiment, we set the 95th-percentile response time target of a RUBiS application to two seconds. We observe that, compared with vPnP, PERFUME improves the relative deviation by 40%. This is mainly due to two reasons. First, PERFUME's FUMI model has better accuracy, even in the case of highly non-linear percentile-based performance. Second, it provides more accurate control and system stability due to the FUMI control design.

7.1.4 System Robustness under a Bursty Workload

We evaluate the robustness of PERFUME under a bursty workload. We use an approach proposed in [29] to inject burstiness into the arrival process of RUBiS clients according to the index of dispersion. The dispersion index modulates the think times of users between submission of consecutive requests. We set the index of dispersion to 4000 and the maximum number of concurrent users to 1000. Figure 11(a) shows the bursty workload in which the number of active users in a RUBiS application fluctuates over a period of 200 seconds.

Figures 11(b) and 11(c) illustrate that, compared to vPnP, PERFUME is able to provide better assurance of average power consumption and throughput targets in the face of the bursty workload. We choose a sampling interval of 20 seconds for both approaches. A smaller sampling interval provides better responsiveness to workload fluctuations, but increases the sensitivity towards random noise. The variations in the average power consumption and throughput are mainly due to burstiness in the workload and the control actions (CPU allocations) taken at each sampling interval. The robustness of PERFUME under bursty workloads is attributed to the fact that its control actions are based on a more accurate model of the system and a sound control-theoretic foundation. Moreover, PERFUME is more adaptive to variations in workload due to its fast online learning algorithm. We observe that, compared with vPnP, PERFUME improves the relative deviation of power and throughput by 32% and 44% respectively.

7.2 Impact of Proactive FUMI Control on Power and Performance Assurance

We demonstrate the benefit of integrating workload prediction with FUMI control. We modify the RUBiS client to generate workload based on the Web traces from the 1998 Soccer World Cup site [1]. These traces contain the number of arrivals per minute to this website over an 8-day period. As a case study, we choose the workload trace of a moderately busy day and compress the original 24-hour long trace to 1 hour, similar to the related work in [34]. Figure 12 shows the actual workload and the predictions made by the Kalman filtering technique. Our proactive FUMI controller acquires workload predictions over a horizon of four steps and computes the CPU resource adjustments required to meet the power and performance goals. The power budget and the performance target are set to be 15 Watts and 2500 requests per sampling interval respectively.

Figures 13(a) and 13(b) compare the average power consumption and system throughput achieved by PERFUME with and without the integration of workload prediction. In case of a purely reactive FUMI control, we observe significant violations of the power and performance targets during the time intervals 10-13 min, 21-29 min and 34-41 min. It is due to the continuous increase in the workload intensity during those intervals as shown in Figure 12. Without the integration of workload prediction, FUMI control incorrectly estimates the future states of the system assuming that the workload intensity will not change. Furthermore, it requires a few control intervals to update the system model in response to the workload variations. As a result, the control actions in terms of CPU resource adjustments lag behind the continuously increasing workload.

Fig. 13: Comparison of PERFUME with and without workload prediction of proactive FUMI control. (a) Power consumption (power in Watt vs. time in minutes; without prediction, with prediction and the target). (b) System throughput (vs. time in minutes, with the target). (c) Improvement in relative deviation (for power and throughput).

Fig. 14: Service differentiation between two RUBiS applications for varying power budgets. (a) Variation in power budget (in Watt). (b) Power consumption (in Watt). (c) Throughput (app1, app2 and the target).

On the other hand, the integration of workload prediction allows FUMI control to make proactive control decisions in anticipation of future changes in the workload intensity. Hence, it is able to avoid significant violations of power and performance in the face of dynamic workload variations. For instance, Figure 13(a) shows that the average power consumption starts decreasing at time 18 min due to proactive control action in anticipation of future workload change starting at time 21 min. As a result of this trend, the controller is able to keep the power consumption close to its target during the time interval 21-29 min in spite of the continuous increase in the workload. Figure 13(c) shows the improvement achieved by proactive FUMI control in terms of the relative deviation of power and performance.

7.3 Service Differentiation Provisioning

We evaluate the service differentiation capability of PERFUME in the face of a dynamic power budget as shown in Figure 14(a). In practice, the power budget might change due to thermal conditions or temporary reductions in cooling or power delivery capacity.

In this experiment, two RUBiS applications, app1 and app2, are hosted on a shared resource pool. For service differentiation, the performance tracking error weight Q for app1 is set to be larger than that of app2. Hence, PERFUME gives a higher priority to app1. As a case study, a power-preferred control policy is applied. Each application faces a workload of 1000 users. The throughput target is set to be 4000 requests per sampling interval.

Figures 14(b) and 14(c) show the average power consumption of the resource pool and the achieved throughput of the RUBiS applications. For the first 39 intervals, both applications are able to meet their performance goals while keeping the average power consumption below the specified power budget of 25 Watts. At interval 40, the power budget is changed to 15 Watts. This constrains the system in such a way that both applications cannot maintain their current performance level without violating the power budget. PERFUME responds to this situation automatically and re-distributes CPU resources so that the higher priority application, app1, still achieves an average throughput that is close to the target. On the other hand, the performance of the lower priority application, app2, is degraded. As a result of this service differentiation, PERFUME is able to avoid power budget violations.

7.4 Effect of Control Parameter Tuning

TABLE 2: Impact of control penalty weight R on rise time (tr) and percent overshoot (PO).

        R=0.02     R=0.04     R=0.06
tr      90 sec     120 sec    300 sec
PO      13.7%      2.8%       1.1%

Fig. 15: Performance results of PERFUME with different values of control penalty weight R (throughput vs. sampling interval for R = 0.02, 0.04 and 0.06, with the target).

Fig. 16: FUMI control performance as a function of Hp (relative deviation in throughput vs. prediction horizon Hp).

We now present the effect of tuning FUMI control parameters on the control performance. Figure 15 shows the performance of one RUBiS application under a workload of 1000 concurrent users, for different values of control penalty weight R. For R = 0.02, the controller adjusts the CPU resources more quickly and aggressively to meet the performance target. However, it results in large oscillations in performance. On the other hand, for R = 0.06 the controller avoids significant oscillations in performance. However, it becomes more sluggish and takes ten control intervals to meet the performance target. We quantify these observations by measuring two important properties of our control system. First, we measure rise time (tr), which is the time required for the throughput to reach its target value. Second, we measure percent overshoot (PO), which represents the maximum amount by which the throughput overshoots its target. It is desirable to minimize tr for control responsiveness, and minimize PO for control accuracy and stability. Hypothetically, an ideal control system will have a rise time equivalent to one control interval, and zero percent overshoot. However, in practice, these two properties often conflict with each other. Table 2 shows that the control penalty weight R = 0.04 balances this tradeoff.
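Rise time and percent overshoot can be extracted from a measured throughput series as sketched below. The example assumes the throughput starts below its target and is sampled once per control interval; the function name and the sample values in the usage note are illustrative only and are not measurements from the experiments.

```python
def rise_time_and_overshoot(samples, target, interval_sec):
    """Return (rise time in seconds, percent overshoot) for a throughput series.

    samples[i] is the throughput measured at the end of control interval i+1.
    The rise time is None if the target is never reached.
    """
    rise_time = None
    for i, y in enumerate(samples):
        if y >= target:
            rise_time = (i + 1) * interval_sec
            break
    peak = max(samples)
    overshoot = max(0.0, (peak - target) / target * 100.0)
    return rise_time, overshoot
```

With a 30-second control interval, a series such as 3000, 3600, 4200, 4100, 4000 against a target of 4000 reaches the target in the third interval, giving a rise time of 90 seconds and an overshoot of 5%.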

Next, we study the impact of tuning the prediction horizon Hp on the control performance. In this experiment, the RUBiS application faces a dynamic workload shown in Figure 12. The control performance is measured in terms of the relative deviation of average throughput with respect to the target of 2500 requests per control interval. Figure 16 shows that the control performance initially improves with the increase in Hp. Intuitively, as the controller looks further ahead, it anticipates future workload demands and takes control actions accordingly at the current time step itself. However, control performance degrades when Hp is larger than four. It is due to the fact that the prediction accuracy decreases with the increase in Hp. Therefore, Hp must be chosen carefully, considering the trade-off between look-ahead performance and estimation errors.

7.5 Evaluation of Decentralized FUMI Control

We evaluate the effectiveness of the decentralized FUMI control in assuring application performance and controlling the power consumption of the virtualized server cluster. In this experiment, four RUBiS applications are hosted on a resource pool of three virtualized blade servers. Each application is controlled by a decentralized FUMI controller. Each application faces a workload of 700 concurrent users. The throughput targets are set to 5400, 4800, 4200, and 3600 requests per sampling interval for app1, app2, app3, and app4 respectively. The priority, Q(a), for each application is set to be 0.25. According to the problem decomposition strategy, each application is assigned an equal fraction of the total power budget, which is 47 Watts. Figure 17 shows that each decentralized FUMI controller is able to meet the application performance target within six sampling intervals. As shown in Figure 18, the power consumption of each application increases in a controlled manner due to the CPU allocations made by each controller. As a result, the total power consumption of the virtualized server cluster stays close to the peak power budget.

Fig. 17: Performance assurance by decentralized FUMI control (throughput vs. sampling interval for app1, app2, app3 and app4, with the targets).

Fig. 18: Power control by decentralized FUMI control (power in Watts vs. sampling interval for app1, app2, app3 and app4, with the total power).

8 CONCLUSION

Datacenters face significant multifaceted challenges in power and performance management for meeting SLAs, resource utilization efficiency and power savings. PERFUME provides coordinated and self-adaptive power and performance control in a virtualized server cluster. As demonstrated by experimental results based on a testbed implementation, its main contributions are the precise and proactive control of power consumption of virtualized servers for power budgeting, average throughput and percentile-based response time guarantee of multi-tier applications, tradeoff flexibility between power and performance targets, and service differentiation among co-located applications, while assuring control accuracy and system stability in the face of highly dynamic and bursty workloads.

The main technical novelty of the PERFUME system lies in the proposed fuzzy MIMO (FUMI) control technique, which integrates the strengths of fuzzy logic, MIMO control and artificial neural network. It is self-adaptive to dynamic workloads due to online learning of fuzzy model parameters using a computationally efficient wRLS method. This is complemented by the integration of future workload prediction for proactive control.


Our future work will integrate power-aware consolidation techniques with PERFUME and explore autonomous performance and power control for building energy-efficient datacenters.

Acknowledgement

This research was supported in part by NSF CAREER award CNS-0844983 and research grant CNS-1217979. A preliminary version of the paper appeared in [21]. The authors thank the anonymous reviewers for their valuable suggestions for revising the manuscript.

REFERENCES

[1] M. Arlitt and T. Jin. Workload characterization of the 1998 world cup web site. Technical report HPL-99-35R, HP Labs, 1999.

[2] S. Chen, K. R. Joshi, M. A. Hiltunen, R. D. Schlichting, and W. H. Sanders. CPU gradients: Performance-aware energy conservation in multitier systems. In Proc. GREENCOMP, 2010.

[3] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam. Managing server energy and operational costs in hosting centers. In Proc. ACM SIGMETRICS, pages 303–314, 2005.

[4] S. Chiu. Fuzzy model identification based on cluster estimation. Journal of Intelligent & Fuzzy Systems, 2(3), 1994.

[5] M. Elnozahy, M. Kistler, and R. Rajamony. Energy conservation policies for Web servers. In Proc. USITS, 2003.

[6] X. Fan, W. D. Weber, and L. A. Barroso. Power provisioning for a warehouse-sized computer. ACM SIGARCH, 35(2):13–23, 2007.

[7] A. Gandhi, M. Harchol-Balter, R. Das, and C. Lefurgy. Optimal power allocation in server farms. In Proc. ACM SIGMETRICS, 2009.

[8] J. Gong and C.-Z. Xu. vpnp: Automated coordination of power and performance in virtualized datacenters. In Proc. IEEE IWQoS, 2010.

[9] S. Govindan, J. Choi, B. Urgaonkar, A. Sivasubramaniam, and A. Baldini. Statistical profiling-based techniques for effective power provisioning in data centers. In Proc. EuroSys, 2009.

[10] S. Govindan, J. Liu, A. Kansal, and A. Sivasubramaniam. Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines. In Proc. ACM SOCC, 2011.

[11] Y. Guo, P. Lama, and X. Zhou. Automated and agile server parameter tuning with learning and control. In Proc. IEEE IPDPS, 2012.

[12] Y. Guo and X. Zhou. Coordinated VM resizing and server tuning: Throughput, power efficiency and scalability. In Proc. IEEE MASCOTS, 2012.

[13] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu. Dynamic voltage scaling in multitier Web servers with end-to-end delay control. IEEE Trans. on Computers, 56(4):444–458, 2007.

[14] J.-S. Jang. ANFIS: adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man and Cybernetics, 23(3):665–685, 1993.

[15] C. Jiang, X. Xu, J. Wan, J. Zhang, X. You, and R. Yu. Power aware job scheduling with QoS guarantees based on feedback control. In Proc. IEEE IWQoS, 2010.

[16] G. Jung, M. A. Hiltunen, K. R. Joshi, R. D. Schlichting, and C. Pu. Mistral: Dynamically managing power, performance, and adaptation cost in cloud infrastructures. In Proc. IEEE ICDCS, 2010.

[17] Y. Koh, R. Knauerhase, P. Brett, M. Bowman, W. Zhihua, and C. Pu. An analysis of performance interference effects in virtual environments. In Proc. IEEE ISPASS, 2007.

[18] S. Kumar, V. Talwar, V. Kumar, P. Ranganathan, and K. Schwan. vmanage: Loosely coupled platform and virtualization management in data centers. In Proc. IEEE ICAC, 2009.

[19] D. Kusic, J. Kephart, J. Hanson, N. Kandasamy, and G. Jiang. Power and performance management of virtualized computing environments via lookahead control. In Proc. IEEE ICAC, 2008.

[20] P. Lama and X. Zhou. Autonomic provisioning with self-adaptive neural fuzzy control for end-to-end delay guarantee. In Proc. IEEE/ACM MASCOTS, pages 151–160, 2010.

[21] P. Lama and X. Zhou. PERFUME: Power and performance guarantee with fuzzy MIMO control in virtualized servers. In Proc. IEEE IWQoS, pages 345–354, 2011.

[22] P. Lama and X. Zhou. Efficient server provisioning with control for end-to-end delay guarantee on multi-tier clusters. IEEE Transactions on Parallel and Distributed Systems, 19 pages, 23(1), 2012.

[23] P. Lama and X. Zhou. Ninepin: Non-invasive and energy efficient performance isolation in virtualized servers. In Proc. IEEE/IFIP DSN, 2012.

[24] K. Le, R. Bianchini, M. Martonosi, and T. D. Nguyen. Cost- and energy-aware load distribution across data centers. In Proc. HotPower, 2009.

[25] C. Lefurgy, X. Wang, and M. Ware. Server-level power control. In Proc. IEEE ICAC, 2007.

[26] J. C. B. Leite, D. M. Kusic, D. Mosse, and L. Bertini. Stochastic approximation control of power and tardiness in a three-tier Web-hosting cluster. In Proc. IEEE ICAC, 2010.

[27] H. Lim, A. Kansal, and J. Liu. Power budgeting for virtualized data centers. In Proc. USENIX Annual Technical Conference, 2011.

[28] J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. Bubble-up: increasing utilization in modern warehouse scale computers via sensible co-locations. In Proc. IEEE/ACM MICRO, 2011.

[29] N. Mi, G. Casale, L. Cherkasova, and E. Smirni. Injecting realistic burstiness to a traditional client-server benchmark. In Proc. IEEE ICAC, 2009.

[30] R. Nathuji, K. Schwan, A. Somani, and Y. Joshi. VPM tokens: virtual machine-aware power budgeting in datacenters. Cluster Computing, 12(2):189–203, 2009.

[31] P. Padala, K.-Y. Hou, K. G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, and A. Merchant. Automated control of multiple virtualized resources. In Proc. EuroSys, pages 13–26, 2009.

[32] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu. No power struggles: Coordinated multi-level power management for the data center. In Proc. ASPLOS, 2008.

[33] R. Singh, U. Sharma, E. Cecchet, and P. Shenoy. Autonomic mix-aware provisioning for non-stationary data center workloads. In Proc. IEEE ICAC, pages 21–30, 2010.

[34] B. Urgaonkar, P. Shenoy, A. Chandra, P. Goyal, and T. Wood. Agile dynamic provisioning of multi-tier Internet applications. ACM Trans. on Autonomous and Adaptive Systems, 3(1):1–39, 2008.

[35] A. Verma, P. Ahuja, and A. Neogi. pmapper: power and migration cost aware application placement in virtualized systems. In Proc. ACM/IFIP/USENIX Middleware, 2008.

[36] A. Verma, P. De, V. Mann, T. Nayak, A. Purohit, G. Dasgupta, and R. Kothari. Brownmap: enforcing power budget in shared data centers. In Proc. ACM/IFIP/USENIX Middleware, 2010.

[37] X. Wang, M. Chen, and X. Fu. MIMO power control for high-density servers in an enclosure. IEEE Trans. on Parallel and Distributed Systems, 21(10):1412–1426, 2010.

[38] X. Wang and Y. Wang. Co-con: Coordinated control of power and application performance for virtualized server clusters. In Proc. IEEE IWQoS, 2009.

[39] B. J. Watson, M. Marwah, D. Gmach, Y. Chen, M. Arlitt, and Z. Wang. Probabilistic performance modeling of virtualized resource allocation. In Proc. IEEE ICAC, 2010.

Palden Lama received the BTech degree in electronics and communication engineering from the Indian Institute of Technology, in 2003. He received his PhD degree in Computer Science from the University of Colorado, Colorado Springs in 2013. Currently, he is an Assistant Professor in the Department of Computer Science at the University of Texas, San Antonio. His research interests include the areas of Cloud computing, sustainable computing, autonomic resource and power management, and big data processing in the Cloud.

Xiaobo Zhou received the BS, MS, and PhD degrees in Computer Science from Nanjing University, in 1994, 1997, and 2000, respectively. Currently he is a Professor and the Chairperson of the Department of Computer Science, University of Colorado, Colorado Springs. His research lies broadly in computer network systems, more specifically, autonomic and sustainable computing in datacenters, Cloud computing, server virtualization, scalable Internet services and architectures, and computer network security. His research was supported in part by the US NSF and Air Force Research Lab. He was a recipient of the NSF CAREER AWARD in 2009. He is a senior member of the IEEE.
