
Characterizing the Impact of the Workload on the Value of Dynamic Resizing in Data Centers

Kai Wang, Student Member, IEEE, Minghong Lin, Student Member, IEEE, Florin Ciucu, Member, IEEE, Adam Wierman, Member, IEEE, and Chuang Lin, Senior Member, IEEE

Abstract—Energy consumption imposes a significant cost for data centers; yet much of that energy is used to maintain excess service capacity during periods of predictably low load. As a result, there has recently been interest in developing designs that allow the service capacity to be dynamically resized to match the current workload. However, there is still much debate about the value of such approaches in real settings. In this paper, we show that the value of dynamic resizing is highly dependent on statistics of the workload process. In particular, both slow time-scale non-stationarities of the workload (e.g., the peak-to-mean ratio) and the fast time-scale stochasticity (e.g., the burstiness of arrivals) play key roles. To illustrate the impact of these factors, we combine optimization-based modeling of the slow time-scale with stochastic modeling of the fast time-scale. Within this framework, we provide both analytic and numerical results characterizing when dynamic resizing does (and does not) provide benefits.

Index Terms—Data Centers, Dynamic Resizing, Energy Efficient IT, Stochastic Network Calculus.


1 INTRODUCTION

Energy costs represent a significant, and growing, fraction of a data center's budget. Hence there is a push to improve the energy efficiency of data centers, both in terms of the components (servers, disks, network [30], [6], [8], [14]) and the algorithms [4], [11], [10], [26]. One specific aspect of data center design that is the focus of this paper is dynamically resizing the service capacity of the data center so that during periods of low load some servers are allowed to enter a power-saving mode (e.g., go to sleep or shut down).

The potential benefits of dynamic resizing have been a point of debate in the community [19], [11], [31]. On one hand, it is clear that, because data centers are far from perfectly energy proportional, significant energy is used to maintain excess capacity during periods of predictably low load when there is a diurnal workload with a high peak-to-mean ratio. On the other hand, there are also significant costs to dynamically adjusting the number of active servers. These costs come in terms of the engineering challenges in making this possible [13], [35], [5], as well as the latency, energy, and wear-and-tear costs of the actual "switching" operations involved [7], [11], [17].

• Kai Wang is with the Dept. of Computer Science and Technology, Tsinghua University, Beijing, 100084 China. This work was done when he was a visiting student in the Dept. of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125 USA, e-mail: [email protected].

• Minghong Lin and Adam Wierman are with the Dept. of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125 USA, e-mail: {mhlin, adamw}@caltech.edu.

• Florin Ciucu is with the T-Labs/TU Berlin, Ernst-Reuter-Platz 7, 10587 Berlin, Germany, e-mail: [email protected].

• Chuang Lin is with the Dept. of Computer Science and Technology, Tsinghua University, Beijing, 100084 China, e-mail: [email protected].

The challenges for dynamic resizing highlighted above have been the subject of significant research. At this point, many of the engineering challenges associated with facilitating dynamic resizing have been resolved, e.g., [13], [35], [5]. Additionally, the algorithmic challenge of deciding, without knowledge of the future workload, whether to incur the significant "switching costs" associated with changing the available service capacity has been studied in depth and a number of promising algorithms have emerged [26], [3], [11], [16].

However, despite this body of work, the question of characterizing the potential benefits of dynamic resizing has still not been properly addressed. Providing new insight into this topic is the goal of the current paper.

The perspective of this paper is that, apart from engineering challenges, the key determinant of whether dynamic resizing is valuable is the workload, and that proponents on different sides tend to have different assumptions in this regard. In particular, a key observation, which is the starting point for our work, is that there are two factors of the workload which give dynamic resizing the potential for savings:

(i) Non-stationarities at a slow time-scale, e.g., diurnal workload variations.

(ii) Stochastic variability at a fast time-scale, e.g., the burstiness of request arrivals.

The goal of this work is to investigate the impact of and interaction between these two features with respect to dynamic resizing.

To this point, we are not aware of any work characterizing the benefits of dynamic resizing that captures both of these features. There is one body of literature which provides algorithms that take advantage of (i), e.g., [11], [10], [26], [3]. This work tends to use an optimization-based approach to develop dynamic resizing algorithms. There is another body of literature which provides algorithms that take advantage of (ii), e.g., [16], [17]. This work tends to assume a stationary queueing model with Poisson arrivals to develop dynamic resizing algorithms.

The first contribution of the current paper is to provide an analytic framework that captures both effects (i) and (ii). We accomplish this by using an optimization framework at the slow time-scale (see Section 2), which is similar to that of [26], and combining this with stochastic network calculus and large deviations modeling for the fast time-scale (see Section 3), which allows us to study a wide variety of underlying arrival processes. We consider both light-tailed models with various degrees of burstiness and heavy-tailed models that exhibit self-similarity. The interface between the fast and slow time-scale models happens through a constraint in the optimization problem that captures the Service Level Agreement (SLA) for the data center, which is used by the slow time-scale model but calculated using the fast time-scale model (see Section 3).

Using this modeling framework, we are able to provide both analytic and numerical results that yield new insight into the potential benefits of dynamic resizing (see Section 4). Specifically, we use trace-driven numerical simulations to study (i) the role of burstiness for dynamic resizing, (ii) the role of the peak-to-mean ratio for dynamic resizing, (iii) the role of the SLA for dynamic resizing, and (iv) the interaction between (i), (ii), and (iii). The key realization is that each of these parameters is extremely important for determining the value of dynamic resizing. In particular, for any fixed choices of two of these parameters, the third can be chosen so that dynamic resizing does or does not provide significant cost savings for the data center. Thus, performing a detailed study of the interaction of these factors is important. To that end, Figures 12-14 provide concrete illustrations of the settings of peak-to-mean ratio, burstiness, and SLA for which dynamic resizing is and is not valuable. Hence, debate about the potential value of dynamic resizing can be transformed into debate about characteristics of the workload and the SLA.

There are some interesting facts about these parameters individually that our case studies uncover. Two important examples are the following. First, while one might expect that increased burstiness provides increased opportunities for dynamic resizing, it turns out that burstiness at the fast time-scale actually reduces the potential cost savings achievable via dynamic resizing. The reason is that dynamic resizing necessarily happens at the slow time-scale, and so increased burstiness at the fast time-scale results in the SLA constraint requiring that more servers be used at the slow time-scale, due to the possibility of a large burst occurring. Second, it turns out that the impact of the SLA can be quite different depending on whether the arrival process is heavy- or light-tailed. In particular, as the SLA becomes more strict, the cost savings possible via dynamic resizing under heavy-tailed arrivals decreases quickly; however, the cost savings possible via dynamic resizing under light-tailed workloads is unchanged.

In addition to detailed case studies, we provide analytic results that support many of the insights provided by the numerics. In particular, Theorems 1 and 2 provide monotonicity and scaling results for dynamic resizing in the case of Poisson arrivals and heavy-tailed, self-similar arrivals.

The remainder of the paper is organized as follows. The model is introduced in Sections 2 and 3, where Section 2 introduces the optimization model of the slow time-scale and Section 3 introduces the model of the fast time-scale and analyzes the impact different arrival models have on the SLA constraint of the dynamic resizing algorithm used in the slow time-scale. Then, Section 4 provides case studies and analytic results characterizing the impact of the workload on the benefits of dynamic resizing. Finally, Section 5 provides concluding remarks.

2 SLOW TIME-SCALE MODEL

In this section and the one that follows, we introduce our model. We start with the "slow time-scale model". This model is meant to capture what is happening at the time-scale of the data center control decisions, i.e., the time-scale at which the data center is willing to adjust its service capacity. For many reasons, this is a much slower time-scale than the time-scale at which requests arrive to the data center. We provide a model for this "fast time-scale" in the next section.

The slow time-scale model parallels closely the model studied in [26]. The only significant change is to add a constraint capturing the SLA to the cost optimization solved by the data center. This is a key change, which allows an interface to the fast time-scale model.

2.1 The Workload

At this time-scale, our goal is to provide a model which can capture the impact of diurnal non-stationarities in the workload. To this end, we consider a discrete-time model such that there is a time interval of interest which is evenly divided into "frames" k ∈ {1, ..., K}. In practice, the length of a frame could be on the order of 5-10 minutes, whereas the time interval of interest could be as long as a month/year. The mean arrival rate to the data center in frame k is denoted by λk, and non-stationarities are captured by allowing different rates during different frames. Although we could allow λk to have a vector value to represent more than one type of workload, as long as the resulting cost function is convex in our model, we assume λk to have a scalar value in this paper to simplify the presentation. Because the request inter-arrival times, typically on the order of 1-10 seconds, are much shorter than the frame length, capacity provisioning can be based on the average arrival rate during a frame.


2.2 The Data Center Cost Model

The model for data center costs focuses on the server costs of the data center, as minimizing server energy consumption also reduces cooling and power distribution costs. We model the cost of a server by the operating costs incurred by an active server, as well as the switching cost incurred to toggle a server into and out of a power-saving mode (e.g., off/on or sleeping/waking). Both components can be assumed to include energy cost, delay cost, and wear-and-tear cost.

Note that this model ignores many issues surrounding reliability and availability, which are key components of data center service level agreements (SLAs). In practice, a solution that toggles servers must still maintain the reliability and availability guarantees; however, this is beyond the scope of the current paper. See [35] for a discussion.

The Operating Cost

The operating costs are modeled by a convex function f(λi,k), which is the same for all the servers, where λi,k denotes the average arrival rate to server i during frame k. The convexity assumption is quite general and captures many common server models. One example, which we consider in our numeric examples later, is to say that the operating costs are simply equal to the energy cost of the server, i.e., the energy cost of an active server handling arrival rate λi,k. This cost is often modeled using an affine function as follows:

$$f(\lambda_{i,k}) = e_0 + e_1 \lambda_{i,k}\,, \qquad (1)$$

where e0 and e1 are constants [1], [6], [23]. Note that when servers use dynamic speed scaling, if the energy cost is modeled as polynomial in the chosen speed, the cost f(·) remains convex. In practice, we expect that f(·) will be empirically measured by observing the system over time.

The Switching Cost

The switching cost, denoted by β, models the cost of toggling a server back-and-forth between active and power-saving modes. The switching cost includes the costs of the energy used toggling a server, the delay in migrating connections/data when toggling a server, and the increased wear-and-tear on the servers from toggling.

2.3 The Data Center Optimization

Given the cost model above, the data center has two control decisions at each time: determining nk, the number of active servers in every time frame, and assigning arriving jobs to servers, i.e., determining λi,k such that Σ_{i=1}^{nk} λi,k = λk. All servers are assumed to be homogeneous with constant rate capacity µ > 0. Modeling heterogeneous servers is possible, but the online problem becomes more complicated [25], which is beyond the scope of this paper.

The goal of the data center is to determine nk and λi,k to minimize the cost incurred during [0, K], which is modeled as follows:

$$\min \sum_{k=1}^{K}\sum_{i=1}^{n_k} f(\lambda_{i,k}) + \beta \sum_{k=1}^{K} (n_k - n_{k-1})^+ \qquad (2)$$

$$\text{s.t.}\quad 0 \le \lambda_{i,k} \le \lambda_k\,,\quad \sum_{i=1}^{n_k} \lambda_{i,k} = \lambda_k\,,\quad \mathbb{P}(D_k > D) \le \epsilon\,, \qquad (3)$$

where the final constraint is introduced to capture the SLA of the data center. We use Dk to represent the steady-state delay during frame k, and (D, ε) to represent an SLA of the form "the probability of a delay larger than D must be bounded by probability ε".

This model generalizes the data center optimization problem from [26] by accounting for the additional SLA constraint. The specific values in this constraint are determined by the stochastic variability at the fast time-scale. In particular, we derive (for a variety of workload models) a sufficient constraint nk ≥ Ck(D, ε)/µ such that

$$n_k \ge \frac{C_k(D, \epsilon)}{\mu} \;\Longrightarrow\; \mathbb{P}(D_k > D) \le \epsilon\,. \qquad (4)$$

Here, µ is the constant rate capacity of each server and Ck(D, ε) is to be determined for each considered arrival model. One should interpret Ck(D, ε) as the overall effective capacity/bandwidth needed in the data center such that the SLA delay constraint is satisfied within frame k.

Note that the new constraint is only sufficient for the original SLA constraint. The reason is that Ck(D, ε) will be computed, in the next section, from upper bounds on the distribution of the transient delay within a frame.

With the new constraint, however, the optimization problem in (2) can be considerably simplified. Indeed, note that nk is fixed during each time frame k and the remaining optimization for λi,k is convex. Thus, we can simplify the form of the optimization problem by using the fact that the optimal dispatching strategy λ*i,k is load balancing, i.e., λ*1,k = λ*2,k = ... = λk/nk. This decouples dispatching λ*i,k from capacity planning nk, and so Eqs. (2)-(3) become:

Data Center Optimization Problem

$$\min \sum_{k=1}^{K} n_k f(\lambda_k/n_k) + \beta \sum_{k=1}^{K} (n_k - n_{k-1})^+ \qquad (5)$$

$$\text{s.t.}\quad n_k \ge \frac{C_k(D, \epsilon)}{\mu}\,.$$

Note that (5) is a convex optimization, since nk f(λk/nk) is the perspective function of the convex function f(·).
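To make the structure of (5) concrete, here is a minimal sketch that solves the offline problem with a generic convex solver, assuming the affine operating cost of Eq. (1), for which nk f(λk/nk) = e0 nk + e1 λk. The synthetic workload, the capacity targets standing in for Ck(D, ε)/µ, and all parameter values are hypothetical, and nk is relaxed to be continuous; this is an illustration, not the implementation used for the case studies.

```python
# Sketch: offline Data Center Optimization Problem (5) with affine f (Eq. (1)).
# The workload, capacity targets, and all parameter values are hypothetical.
import numpy as np
import cvxpy as cp

K = 144                                                  # 10-min frames in a day
lam = 500 + 400 * np.sin(np.linspace(0, 2 * np.pi, K))   # synthetic diurnal trace
mu, e0, e1, beta = 1.0, 1.0, 0.0, 6.0                    # hypothetical parameters
C_over_mu = 1.2 * lam / mu                               # stand-in for C_k(D,eps)/mu

n = cp.Variable(K)                                       # n_k, relaxed to reals
operating = cp.sum(e0 * n + e1 * lam)                    # sum_k n_k f(lam_k/n_k), f affine
switching = beta * (cp.pos(n[0]) + cp.sum(cp.pos(cp.diff(n))))  # beta*sum (n_k - n_{k-1})^+, n_0 = 0
prob = cp.Problem(cp.Minimize(operating + switching), [n >= C_over_mu])
prob.solve()
print("optimal cost:", prob.value)
```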

As we have already pointed out, the key difference between the optimization above and that of [26] is the SLA constraint. However, this constraint plays a key role in the current paper. It is this constraint that provides a bridge between the slow time-scale and fast time-scale models. Specifically, the fast time-scale model uses large deviations and stochastic network calculus techniques to calculate Ck(D, ε).

2.4 Algorithms for Dynamic Resizing

Though the Data Center Optimization Problem described above is convex, in practice it must be solved online, i.e., without knowledge of the future workload. Thus, in determining nk, the algorithm may not have access to the future arrival rates λl for l > k. This fact makes developing algorithms for dynamic resizing challenging. However, progress has been made recently [26], [27].

Deriving algorithms for this problem is not the goal of the current paper. Thus, we make use of a recent algorithm called Lazy Capacity Provisioning (LCP) [26]. We choose LCP because of the strong analytic performance guarantees it provides: LCP provides cost within a factor of 3 of optimal for any (even adversarial) workload process.

LCP works as follows. Let (n^L_{k,1}, ..., n^L_{k,k}) be the solution vector to the following optimization problem:

$$\min \sum_{l=1}^{k} n_l f(\lambda_l/n_l) + \beta \sum_{l=1}^{k} (n_l - n_{l-1})^+$$

$$\text{s.t.}\quad n_l \ge \frac{C_l(D, \epsilon)}{\mu}\,,\quad n_0 = 0\,.$$

Similarly, let (n^U_{k,1}, ..., n^U_{k,k}) be the solution vector to the following optimization problem:

$$\min \sum_{l=1}^{k} n_l f(\lambda_l/n_l) + \beta \sum_{l=1}^{k} (n_{l-1} - n_l)^+$$

$$\text{s.t.}\quad n_l \ge \frac{C_l(D, \epsilon)}{\mu}\,,\quad n_0 = 0\,.$$

Denote by (n)^b_a = max(min(n, b), a) the projection of n into the closed interval [a, b]. Then LCP can be defined using n^L_{k,k} and n^U_{k,k} as follows. Informally, LCP stays "lazily" between the upper bound n^U_{k,k} and the lower bound n^L_{k,k} in all frames.

Lazy Capacity Provisioning, LCP

Let n^{LCP} = (n^{LCP}_0, ..., n^{LCP}_K) denote the vector of active servers under LCP. This vector can be calculated online using the following forward recurrence relation:

$$n^{LCP}_k = \begin{cases} 0, & k \le 0 \\ \left(n^{LCP}_{k-1}\right)^{n^U_{k,k}}_{n^L_{k,k}}, & 1 \le k \le K\,. \end{cases}$$

Note that, in [26], LCP is introduced and analyzed for the optimization from Eq. (5) without the SLA constraint. However, it is easy to see that the algorithm and performance guarantee extend to our setting. Specifically, the guarantees on LCP hold in our setting because the SLA constraint can be removed by defining the operating cost to be ∞ instead of nk f(λk/nk) when nk < Ck(D, ε)/µ.

A last point to highlight about LCP is that, as described, it does not use any predictions about the workload in future frames. Such information could clearly be beneficial, and can be incorporated into LCP if desired; see [26].
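For concreteness, the following sketch implements the LCP recurrence under the same hypothetical affine-cost setup as the sketch in Section 2.3. It re-solves the two prefix problems from scratch at every frame, which is simple but not efficient; solve_prefix and lcp are illustrative names, not code from [26].

```python
# Sketch: the LCP forward recurrence with the affine cost of Eq. (1).
# Prefix problems are re-solved with cvxpy each frame; all names and
# parameter values here are illustrative.
import numpy as np
import cvxpy as cp

def solve_prefix(lam, lb, e0, e1, beta, up):
    """k-frame prefix problem; up=False penalizes (n_l - n_{l-1})^+,
    up=True penalizes (n_{l-1} - n_l)^+. Returns the optimal vector."""
    k = len(lam)
    n = cp.Variable(k)
    deltas = [n[0]] + [n[l] - n[l - 1] for l in range(1, k)]  # n_0 = 0
    switch = sum(cp.pos(-d) if up else cp.pos(d) for d in deltas)
    cost = cp.sum(e0 * n + e1 * lam) + beta * switch
    cp.Problem(cp.Minimize(cost), [n >= lb]).solve()
    return n.value

def lcp(lam, lb, e0=1.0, e1=0.0, beta=6.0):
    """Stay 'lazily' in [n^L_{k,k}, n^U_{k,k}]: the projection in the recurrence."""
    n_prev, out = 0.0, []
    for k in range(1, len(lam) + 1):
        n_low = solve_prefix(lam[:k], lb[:k], e0, e1, beta, up=False)[-1]
        n_up = solve_prefix(lam[:k], lb[:k], e0, e1, beta, up=True)[-1]
        n_prev = max(min(n_prev, n_up), n_low)   # (n_{k-1}) projected onto [n^L, n^U]
        out.append(n_prev)
    return np.array(out)
```

Calling lcp(lam, C_over_mu) with the trace and capacity targets from the previous sketch yields the online provisioning trajectory.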

3 FAST TIME-SCALE MODEL

Given the model of the slow time-scale in the previous section, we now zoom in to give a description of the fast time-scale model. By "fast" time-scale, we mean the time-scale at which requests arrive, as opposed to the "slow" time-scale at which dynamic resizing decisions are made by the data center. To model the fast time-scale, we evenly break each frame from the slow time-scale into "slots" t ∈ {1, ..., U}, such that frame length = U · slot length.

We consider a variety of models for the workload process at this fast time-scale, including both light-tailed models with various degrees of burstiness, as well as heavy-tailed models that exhibit self-similarity. In all cases, our assumption is that the workload is stationary over the slots that make up each time frame.

The goal of this section is to derive the value of Ck(D, ε) in the constraint nk ≥ Ck(D, ε)/µ from Eq. (4), and thus enable an interface between the fast and slow time-scales by parameterizing the Data Center Optimization Problem from Eq. (5) for a broad range of workloads.

Note that throughout this section we suppress the frame subscript k on nk, λk, Ck, and Dk, and focus on a generic frame.

Our approach for deriving the SLA constraint for the Data Center Optimization Problem will be to first derive an "aggregation property" which allows the data center to be modeled as a single server, and to then derive bounds on the distribution of the transient delay under a variety of arrival processes.

3.1 An Aggregation Property

Note that, if the arrival process were modeled as Poisson and job sizes were exponential, then an "aggregation property" would be immediate, since the response time distribution only depends on the load. Hence the SLA could be derived by considering a single server. Outside of this simple case, however, we need to derive a suitable single server approximation.

The aggregation result that we derive and apply is formulated in the framework of stochastic network calculus [9], and so we begin by briefly introducing this framework.

Denote the cumulative arrival (workload) process at the data center's dispatcher by A(t). That is, for each slot t = 1, ..., U, A(t) counts the total number of jobs arrived in the time interval [0, t]. Depending on the total number n of active servers, the arrival process is dispatched into the sub-arrival processes Ai(t) with i = 1, ..., n.


The cumulative response processes from the servers are denoted by Ri(t), whereas the total cumulative response process from the data center is denoted by R(t) = Σi Ri(t). All arrival and response processes are assumed to be non-negative, non-decreasing, and left-continuous, and satisfy the initial condition A(0) = R(0) = 0. For convenience we use the bivariate extensions A(s, t) := A(t) − A(s) and R(s, t) := R(t) − R(s).

The service provided by a server is modeled in terms of probabilistic lower bounds using the concept of a stochastic service process. This is a bivariate random process S(s, t) which is non-negative, non-decreasing, and left-continuous. Formally, a server is said to guarantee a (stochastic) service process S(s, t) if for any arrival process A(t) the corresponding response process R(t) from the server satisfies, for all t ≥ 0,

$$R(t) \ge A * S(t)\,, \qquad (6)$$

where '∗' denotes the min-plus convolution operator, i.e., for two (random) processes A(t) and S(s, t),

$$A * S(t) := \inf_{0 \le s \le t} \{A(s) + S(s, t)\}\,. \qquad (7)$$

The inequality in (6) is assumed to hold almost surely. Note that the lower bound set by the service process is invariant to the arrival processes.
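As a small numerical illustration of the operator in Eq. (7), the sketch below computes A ∗ S(t) for a constant-rate service process S(s, t) = C(t − s); the arrival sample path is made up for illustration.

```python
# Tiny illustration of the min-plus convolution of Eq. (7),
# A*S(t) = inf_{0<=s<=t} {A(s) + S(s,t)}, with S(s,t) = C*(t-s).
import numpy as np

def min_plus_conv(A, C):
    """A: cumulative arrivals A(0), ..., A(T) on a slot grid."""
    T = len(A) - 1
    return np.array([np.min(A[: t + 1] + C * (t - np.arange(t + 1)))
                     for t in range(T + 1)])

rng = np.random.default_rng(0)
A = np.concatenate(([0], np.cumsum(rng.poisson(3.0, 20))))  # cumulative arrivals
floor = min_plus_conv(A, C=4.0)   # per Eq. (6), R(t) >= floor[t] for such a server
```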

We are now ready to state the aggregation property. The proof is deferred to the appendix.

Lemma 1: Consider an arrival process A(t) which is dispatched to n servers. Each server i is work-conserving with constant rate capacity µ > 0. Arrivals are dispatched deterministically across the servers such that each server i receives a fraction 1/n of the arrivals. Then, the system has service process S(s, t) = nµ(t − s), i.e., R(t) ≥ A ∗ S(t).

The significance of the Lemma is that if the SLA is verified for the virtual server with arrival process A(t) and service process S(s, t), then the SLA is verified for each of the n servers.

3.2 Arrival Processes

Now that we can reduce the study of the multi-server system to the study of a single server system using Lemma 1, we can move to characterizing the impact of the arrival process on the SLA constraint in the Data Center Optimization Problem.

In particular, the next step in deriving the SLA constraint n ≥ C(D, ε)/µ is to derive a bound on the distribution of the delay at the virtual server with arrival process A(t) and service process S(s, t) = C(D, ε)(t − s), i.e.,

$$\mathbb{P}(D(t) > D) \le \bar{\epsilon}\,. \qquad (8)$$

It is important to observe that the violation probability ε̄ holds for the transient delay process D(t), which is defined as D(t) := inf{d : A(t − d) ≤ R(t)}, and which models the delay spent in the system by the job leaving the system, if any, at time t. However, the violation probability ε̄ is derived so that it is time invariant, which implies that it bounds the distribution of the steady-state delay D = lim_{t→∞} D(t) as well. Therefore, the value of C(D, ε) can be finally computed by solving the equation ε̄ = ε.

Fig. 1. Three synthetically generated traces within 1 frame, with λ = 300, of Poisson, Markov-Modulated (MM) (T = 1, λl = 0.5λ and λh = 2λ), and heavy-tailed arrivals (b = λ/3 and α = 1.5).

In the following, we follow the outline above to compute C(D, ε) for light- and heavy-tailed arrival processes. Figure 1 depicts examples of the three types of arrival processes we consider in 1 frame: Poisson, Markov-Modulated (MM), and heavy-tailed arrivals. In all three cases, the mean arrival rate is λ = 300. The figure clearly illustrates the different levels of burstiness of the three traces.

3.2.1 Light-tailed Arrivals

We consider two examples of light-tailed arrival processes: Poisson and Markov-Modulated (MM) processes. The latter is particularly interesting since it enables the adjustment of the burstiness level.

Following the theory of the stochastic network calculus, the computation of a probabilistic bound on the transient delay D(t) from Eq. (8) reduces to

$$\mathbb{P}(D(t) > D) = \mathbb{P}\bigl(A(t - D) > R(t)\bigr) \le \mathbb{P}\Bigl(\sup_{0 \le s \le t - D} \bigl\{A(s, t - D) - S(s, t)\bigr\} > 0\Bigr)\,, \qquad (9)$$

where S(s, t) = C(D, ε)(t − s). In the last line we used the definition of the service process from Eq. (7). This argument parallels that of [21], [29], except that it adds more generality by using the service process S(s, t), which can capture a broad range of scheduling and/or dispatching policies as constructed in Lemma 1. The last term is generally bounded, in both the large deviations and stochastic network calculus literatures, by

$$\sum_{s=0}^{t - D} \mathbb{P}\bigl(A(s, t - D) - S(s, t) > 0\bigr)\,, \qquad (10)$$

where the sum can be shown to converge in the case of light-tailed arrivals. This bound, which is a direct application of Boole's inequality, is not so bad if the variables A(s, t − D) − S(s, t) are rather uncorrelated (e.g., the Poisson case), but it is quite loose if they are highly correlated (e.g., the MM case with high burstiness levels) [34].

To derive tight bounds, especially in the MM case, we shall rely instead on maximal inequalities which provide refined bounds on the last term in Eq. (9). Such refined bounds have been derived for queueing systems with various classes of Markov arrival processes [15], [28], [9].

Poisson Arrivals

We start with the case of Poisson processes, which are characterized by a low level of burstiness, due to the independent increments property.

Let A(t) be a Poisson process with some rate λ > 0, and define

$$\theta^* := \sup\Bigl\{\theta > 0 : \frac{\lambda}{\theta}\bigl(e^\theta - 1\bigr) \le C(D, \epsilon)\Bigr\}\,. \qquad (11)$$

Then a bound on the transient delay process is given for all t ≥ 0 by

$$\mathbb{P}(D(t) > D) \le e^{-\theta^* C(D, \epsilon) D} =: \bar{\epsilon}\,. \qquad (12)$$

The proof of this equation is deferred to the appendix. Solving for C(D, ε) by setting the violation probability ε̄ equal to ε yields the implicit solution

$$C(D, \epsilon) = -\frac{1}{\theta^* D} \log \epsilon\,.$$

Further, using the monotonicity of the function λ(e^θ − 1)/θ in θ > 0, we immediately get the explicit solution

$$C(D, \epsilon) = \frac{K}{\log(1 + K)}\,\lambda\,, \qquad (13)$$

where

$$K = \frac{-\log \epsilon}{\lambda D}\,.$$

The alternative to using Eq. (12) is to use an exact result for the steady-state delay distribution at an M/D/1 queue, i.e., [20]

$$\mathbb{P}(D > D) = 1 - (1 - \rho)\, e^{\lambda D} \sum_{j=0}^{\lfloor D\, C(D, \epsilon) \rfloor} \frac{(j\rho - \lambda D)^j}{j!}\, e^{-(j-1)\rho}\,, \qquad (14)$$

where ρ = λ/C(D, ε) denotes the utilization factor and ⌊x⌋ denotes the largest integer less than or equal to x. This exact formula poses numerical complications when ρ approaches one, due to the appearance of large alternating, very nearly cancelling terms (note that the factor jρ − λD is negative). The difference between the exact result (evaluated using a sophisticated numerical algorithm [20]) and the bound from Eq. (12) is almost indistinguishable. See [12] for a detailed evaluation of the tightness. Moreover, unlike the exact result, the closed-form and explicit formula from Eq. (13) readily leads to qualitative properties on the scaling of the capacity C(D, ε) (see Theorem 1). For these reasons we shall use the bound from Eq. (13) throughout.

Markov-modulated Arrivals

Consider now the case of Markov-Modulated (MM) processes which, unlike Poisson processes, do not necessarily have independent increments. The key feature for the purposes of this paper is that the burstiness of MM processes can be arbitrarily adjusted.

We consider a simple MM process with two states. Let x(s) be a discrete and homogeneous Markov chain with two states denoted 'low' and 'high', and with transition probabilities ph and pl between the 'low' and 'high' states, and vice-versa, respectively. Assuming that a source produces at some constant rates λl > 0 and λh > λl while the chain x(s) is in the 'low' and 'high' states, respectively, the corresponding MM cumulative arrival process is

$$A(t) = \sum_{s=1}^{t} \bigl(\lambda_l I_{\{x(s) = \text{'low'}\}} + \lambda_h I_{\{x(s) = \text{'high'}\}}\bigr)\,, \qquad (15)$$

where I{·} is the indicator function. The average rate of A(t) is

$$\lambda = \frac{p_l}{p_h + p_l}\,\lambda_l + \frac{p_h}{p_h + p_l}\,\lambda_h\,.$$

To adjust the burstiness level of A(t) we introduce the parameter T := 1/ph + 1/pl, which is the average time for the Markov chain x(s) to change states twice. We note that the higher the value of T, the higher the burstiness level becomes (the time periods that x(s) spends in the 'high' or 'low' states get longer and longer).

To compute the delay bound, let us construct the matrix

$$\Psi(\theta) = \begin{pmatrix} (1 - p_h)\, e^{\theta \lambda_l} & p_h\, e^{\theta \lambda_h} \\ p_l\, e^{\theta \lambda_l} & (1 - p_l)\, e^{\theta \lambda_h} \end{pmatrix}\,,$$

for some θ > 0, and consider its spectral radius

$$\lambda(\theta) := \frac{(1 - p_h)\, e^{\theta \lambda_l} + (1 - p_l)\, e^{\theta \lambda_h} + \sqrt{\Delta}}{2}\,,$$

where

$$\Delta = \bigl((1 - p_h)\, e^{\theta \lambda_l} - (1 - p_l)\, e^{\theta \lambda_h}\bigr)^2 + 4\, p_h\, p_l\, e^{\theta(\lambda_l + \lambda_h)}\,.$$

Let also

$$K(\theta) := \max\left\{\frac{p_h\, e^{\theta \lambda_h}}{\lambda(\theta) - (1 - p_h)\, e^{\theta \lambda_l}},\; \frac{\lambda(\theta) - (1 - p_h)\, e^{\theta \lambda_l}}{p_h\, e^{\theta \lambda_h}}\right\}\,.$$

The two terms are the ratios of the elements of the right-eigenvector of the matrix Ψ(θ). Also, let

$$\theta^* := \sup\Bigl\{\theta > 0 : \frac{1}{\theta} \log \lambda(\theta) \le C(D, \epsilon)\Bigr\}\,.$$

Then a bound on the transient delay process immediately follows from the corresponding backlog bound (see [9], pp. 340), because of the constant rate service assumption:

$$\mathbb{P}(D(t) > D) \le K(\theta^*)\, e^{-\theta^* C(D, \epsilon) D} =: \bar{\epsilon}\,.$$

Setting the violation probability ε̄ equal to ε yields the implicit solution

$$C(D, \epsilon) = -\frac{1}{\theta^* D} \log \frac{\epsilon}{K(\theta^*)}\,. \qquad (16)$$


Fig. 2. Illustration of the traces used for numerical experiments: (a) Hotmail, (b) MSR.

3.3 Heavy-tailed and Self-similar Arrivals

We now consider the class of heavy-tailed and self-similar arrival processes. These processes are fundamentally different from light-tailed processes in that deviations from the mean increase in time and decay in probability as a power law, i.e., more slowly than exponentially.

To capture the properties of heavy-tailed and self-similar arrivals A(t), we use the following statistical envelope model, called htss-envelope [24], for all 0 ≤ s ≤ t and σ ≥ 0:

$$\mathbb{P}\bigl(A(s, t) > \lambda(t - s) + \sigma(t - s)^H\bigr) \le K \sigma^{-\alpha}\,. \qquad (17)$$

Here, λ is the long-term arrival rate. The tail index −α describes the shape of the tail, and smaller values of α indicate heavier tails. We are interested in the case when α ∈ (1, 2), i.e., arrivals with finite mean but infinite variance. The self-similarity index H (or Hurst parameter) satisfies H ∈ [1/α, 1) and describes the arrival process's behavior when rescaling time. The burst parameter σ captures the tail behavior, and K > 0 is a constant.

The key characteristic of the htss-envelope is that the error function ε(σ) = Kσ^{−α} is given by a power law. This means that the arrivals A(s, t) may deviate from the mean λ(t − s) by some very large values with non-negligible probabilities; moreover, these deviations increase as a function of time due to self-similarity [24].

As a concrete example of heavy-tailed and self-similar arrivals, consider the case of a source generating jobs in every slot according to i.i.d. Pareto random variables Xi with tail distribution, for all x ≥ b,

$$\mathbb{P}(X_i > x) = (x/b)^{-\alpha}\,, \qquad (18)$$

where 1 < α < 2. X has finite mean E[X] = αb/(α − 1) and infinite variance. The corresponding htss-envelope has the parameters [24]

$$\lambda = E[X]\,, \quad \alpha\,, \quad H = \frac{1}{\alpha}\,, \quad K \approx 1\,. \qquad (19)$$

The bound on the transient delay is then [24]

$$\mathbb{P}(D(t) > D) \le \bar{K}\, \bigl(C(D, \epsilon)\, D\bigr)^{1 - \alpha} =: \bar{\epsilon}\,, \qquad (20)$$

where

$$\bar{K} = \inf_{1 < \gamma < \frac{C(D,\epsilon)}{\lambda}} \left\{ \Bigl(\frac{C(D, \epsilon)}{\gamma} - \lambda\Bigr)^{-1} \frac{\alpha\, \gamma^{\frac{\alpha - 1}{\alpha}}}{(\alpha - 1) \log \gamma} \right\}\,.$$

This bound is state-of-the-art for the delay distribution in non-asymptotic regimes (i.e., for finite D). The bound is asymptotically tight in that it captures the exact decay exponent; however, it can be numerically loose due to the prefactor K̄. See [24] for a detailed discussion of the tightness of the bound.

Setting the violation probability ε̄ equal to ε, we get the implicit solution

$$\inf_{1 < \gamma < \frac{C(D,\epsilon)}{\lambda}} \left\{ \Bigl(\frac{C(D, \epsilon)}{\gamma} - \lambda\Bigr)^{-1} \frac{\alpha\, \gamma^{\frac{\alpha - 1}{\alpha}}}{(\alpha - 1) \log \gamma} \right\} C(D, \epsilon)^{1 - \alpha} = \epsilon\, D^{\alpha - 1}\,. \qquad (21)$$

It is important to point out that the value of K from Eq. (19) is an approximation (for technical justifications see [24]). However, there exists a constant value K which is uniformly bounded in s and t (see Proposition 1.1.1 in [36]); consequently, the scaling law to be determined in Eq. (23) holds.
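Numerically, Eq. (21) can be solved by nesting the infimum over γ (here a coarse grid, purely for illustration) inside a bisection on C(D, ε), assuming the bound of Eq. (20) is decreasing in C on the search range:

```python
# Sketch: solve Eq. (21) for C(D, eps) by bisection, evaluating the delay bound
# of Eq. (20) with a coarse grid for the infimum over gamma. Illustrative only.
import numpy as np

def delay_bound(C, D, lam, alpha):
    """Kbar * (C*D)**(1-alpha) from Eq. (20), with Kbar per its definition."""
    gam = np.linspace(1 + 1e-6, C / lam - 1e-6, 10000)
    Kbar = np.min((C / gam - lam) ** -1.0 * alpha * gam ** ((alpha - 1) / alpha)
                  / ((alpha - 1) * np.log(gam)))
    return Kbar * (C * D) ** (1 - alpha)

def capacity_ht(D, eps, lam, alpha):
    lo, hi = 1.01 * lam, 1000.0 * lam      # the bound decreases in C on this range
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if delay_bound(mid, D, lam, alpha) <= eps else (mid, hi)
    return hi                              # smallest capacity found meeting the SLA

print(capacity_ht(D=0.2, eps=1e-3, lam=300.0, alpha=1.5))
```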

The alternative to using Eq. (20) is to use large deviations results for the steady-state delay distribution [32]:

$$\mathbb{P}(D > D) \sim \frac{\lambda}{C - \lambda}\, \frac{b^{\alpha - 1}}{\alpha}\, (C D)^{1 - \alpha}\,, \qquad (22)$$

where f(x) ∼ g(x) stands for lim_{x→∞} f(x)/g(x) = 1. Such asymptotic approximations, however, can be inaccurate for typical D of interest [2]. Hence, we limit our focus to Eq. (20), which provides an upper bound on the delay distribution in non-asymptotic regimes, though the same qualitative results follow when using Eq. (22).

4 CASE STUDIES

Given the model described in the previous two sections, we are now ready to explore the potential of dynamic resizing in data centers, and how this potential depends on the interaction between non-stationarities at the slow time-scale and burstiness/self-similarity at the fast time-scale. Our goal in this section is to provide insight into which workloads dynamic resizing is valuable for. To accomplish this, we provide a mixture of analytic results and trace-driven numerical simulations.

It is important to note that the case studies that follow depend fundamentally on the modeling performed so far in the paper, which allows us to capture and adjust independently both fast time-scale and slow time-scale properties of the workload, thus allowing a rigorous study of the impact of the workload on the value of dynamic resizing.


Fig. 3. Impact of burstiness on provisioning nk for heavy-tailed arrivals (α = 1.1 to 1.9, and Poisson): (a) Hotmail, (b) MSR.

Fig. 4. Impact of burstiness on provisioning nk for MM arrivals (T = 10s, 1s, 0.1s, 0.01s, and Poisson): (a) Hotmail, (b) MSR.

4.1 Setup

Throughout the experimental setup, our aim is to choose parameters that provide conservative estimates of the cost savings from dynamic resizing. Thus, one should interpret the savings shown as a lower bound on the potential savings.

Model Parameters

The time frame for adapting the number of servers nk is assumed to be 10 min, and each time slot is assumed to be 1 s, i.e., U = 600. When not otherwise specified, we assume the following parameters for the data center SLA agreement: the delay upper bound D = 200 ms, and the delay violation probability ε = 10^{-3}.

The cost is characterized by the two parameters e0 and e1, and the switching cost β. We choose units such that the fixed energy cost is e0 = 1. The load-dependent energy consumption is set to e1 = 0, because the energy consumption of current servers at typical utilization levels is dominated by the fixed costs [1], [6], [23]. Note that adjusting e0 and e1 changes the magnitude of potential savings under dynamic resizing, but does not affect the qualitative conclusions about the impact of the workload. So, due to space constraints, we fix these parameters during the case studies.

The normalized switching cost β/e0 measures the duration a server must be powered down to outweigh the switching cost. Unless otherwise specified, we use β = 6, which corresponds to the energy consumption for one hour (six samples). This was chosen as an estimate of the time a server should sleep so that the wear-and-tear of power cycling matches that of operating [7], [26].

Workload Information

The workloads for these experiments are drawn from two real-world data center traces. The first set of traces is from Hotmail, a large email service running on tens of thousands of servers. We used I/O traces from 8 such servers over a 48-hour period, starting at midnight (PDT) on Monday, August 4, 2008 [35]. The second set of I/O traces is taken from 6 RAID volumes at MSR Cambridge. The traced period was 1 week starting from 5PM GMT on the 22nd February 2007 [35]. Thus, these activity traces represent a service used by millions of users and a small service used by hundreds of users. The traces are normalized to a peak load of λpeak = 1000, and are visualized in Figure 2. Both sets of traces show strong diurnal properties and have peak-to-mean ratios (PMRs) of 1.64 and 4.64 for Hotmail and MSR, respectively. Loads were averaged over disjoint 10 minute intervals.

The traces provide information for the slow time-scale model. To parameterize the fast time-scale model, we adapt the workload based on the mean arrival rate in each frame, i.e., λ. To parameterize the MM processes, we take λl = 0.5λ, λh = 2λ, and we adjust the burst parameter T while keeping λ fixed for each process. To parameterize the heavy-tailed processes, we adjust the tail index α for each process, and b in (18) is adapted accordingly in order to keep the mean fixed at λ. Unless otherwise stated, we fix α = 1.5 and T = 1.

Comparative Benchmark

We contrast three designs: (i) the optimal dynamic resizing, (ii) dynamic resizing via LCP, and (iii) the optimal 'static' provisioning.

The results for the optimal dynamic resizing should be interpreted as characterizing the potential of dynamic resizing. But realizing this potential is a challenge that requires both sophisticated online algorithms and excellent predictions of future workloads.1

1. Note that short-term predictions of workload demand within 24 hours can be quite accurate [18], [23].


Fig. 5. Impact of burstiness on the cost savings of dynamic resizing for different switching costs, β (T = 1s, Poisson, and α = 1.2, 1.5, 1.8): (a) Hotmail, (b) MSR.

Fig. 6. Impact of burstiness on the performance of LCP in the Hotmail trace, for β = 6 and β = 18: (a) cost savings, (b) relative savings of LCP.

The results for LCP should be interpreted as one example of how much of the potential for dynamic resizing can be attained with an online algorithm. One reason for choosing LCP is that it does not rely on predicting the workload in future frames, and thus provides a conservative bound on the achievable cost savings.

The results for the optimal static provisioning should be taken as an optimistic benchmark for today's data centers, which typically do not use dynamic resizing. We consider the cost incurred by an optimal static provisioning scheme that chooses a constant number of servers that minimizes the costs incurred based on full knowledge of the entire workload. This policy is clearly not possible in practice, but it provides a very conservative estimate of the savings from right-sizing since it uses perfect knowledge of all peaks and eliminates the need for overprovisioning in order to handle the possibility of flash crowds or other traffic bursts.

4.2 Results

Our experiments are organized to illustrate the impact of a wide variety of parameters on the cost savings attainable via dynamic resizing. The goal is to understand for which workloads dynamic resizing can provide large enough cost savings to warrant the extra implementation complexity. Remember, our setup is designed so that the cost savings illustrated are a conservative estimate of the true cost savings provided by dynamic resizing.

The Role of Burstiness

A key goal of our model is to expose the impact of burstiness on dynamic resizing, and so we start by focusing on that parameter. Recall that we can vary burstiness in both the light-tailed and heavy-tailed settings, using T for MM arrivals and α for heavy-tailed arrivals.

Fig. 7. Impact of peak-to-mean ratio on the cost savings of the optimal dynamic resizing: (a) Hotmail, (b) MSR.

The impact of burstiness on provisioning: A priori, one may expect that burstiness can be beneficial for dynamic resizing, since it indicates that there are periods of low load during which energy may be saved. However, this is not actually true, since resizing decisions must be made at the slow time-scale while burstiness is a characteristic of the fast time-scale. Thus, burstiness is actually detrimental for dynamic resizing, since it means that the provisioning decisions made on the slow time-scale must be made with the bursts in mind, which results in a larger number of servers being provisioned for the same average workload. This effect can be seen in Figures 3 and 4, which show the optimal dynamic provisioning as α and T vary. Recall that burstiness increases as α decreases and T increases.

The impact of burstiness on cost savings: The larger provisioning created by increased burstiness manifests itself in the cost savings attainable through dynamic capacity provisioning as well. This is illustrated in Figure 5, which shows the cost savings of the optimal dynamic provisioning as compared to the optimal static provisioning for varying α and T as a function of the switching cost β.

The impact of burstiness on LCP: Interestingly, though Figure 5 shows that the potential of dynamic resizing is limited by increased burstiness, it turns out that the relative performance of LCP is not hurt by burstiness. This is illustrated in Figure 6, which shows the percent of the optimal cost savings that LCP achieves. Importantly, it is nearly perfectly flat as the burstiness is varied.

The Role of the Peak-to-mean Ratio

The impact of the peak-to-mean ratio on the potential benefits of dynamic resizing is quite intuitive: if the peak-to-mean ratio is high, then there is more opportunity to benefit from dynamically changing capacity. Figure 7 illustrates this well-known effect. The workload for the figure is generated from the traces by scaling λk as λk = c(λk)^γ, varying γ and adjusting c to keep the mean constant.
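This trace transformation is easy to reproduce. The sketch below applies λk ← c·λk^γ with c chosen to preserve the mean; the diurnal trace here is synthetic, standing in for the Hotmail/MSR traces.

```python
# Sketch: sweep the peak-to-mean ratio (PMR) via lam_k <- c * lam_k**gamma,
# with c chosen to keep the mean fixed. The trace here is synthetic.
import numpy as np

def scale_pmr(lam, gamma):
    scaled = lam ** gamma
    return scaled * lam.mean() / scaled.mean()   # c = mean(lam) / mean(lam**gamma)

lam = 500 + 400 * np.sin(np.linspace(0, 2 * np.pi, 144))
for g in (0.5, 1.0, 2.0):
    s = scale_pmr(lam, g)
    print(g, round(s.max() / s.mean(), 2))       # larger gamma -> larger PMR
```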

In addition to illustrating that a higher peak-to-mean ratio makes dynamic resizing more valuable, Figure 7 also highlights that there is a strong interaction between burstiness and the peak-to-mean ratio: if there is significant burstiness, the benefits that come from a high peak-to-mean ratio may be diminished considerably.

Fig. 8. Impact of ε on provisioning nk for heavy-tailed arrivals (ε = 10^{-6} to 10^{-1}): (a) Hotmail, (b) MSR.

Fig. 9. Impact of ε on provisioning nk for MM arrivals (ε = 10^{-6} to 10^{-1}): (a) Hotmail, (b) MSR.

Fig. 10. Impact of D on provisioning nk for heavy-tailed arrivals (D = 100 ms to 600 ms): (a) Hotmail, (b) MSR.

The Role of the SLA

The SLA plays a key role in the provisioning of a data center. Here, we show that the SLA can also have a strong impact on whether dynamic resizing is valuable, and that this impact depends on the workload. Recall that in our model the SLA consists of a violation probability ε and a delay bound D. We deal with each of these in turn.

Figures 8 and 9 highlight the role the violation probability ε has on the provisioning of nk under the optimal dynamic resizing in the cases of heavy-tailed and MM arrivals. Interestingly, we see that there is a significant difference in the impact of ε depending on the arrival process. As ε gets smaller in the heavy-tailed case, the provisioning gets significantly flatter, until there is almost no change in nk over time. In contrast, no such behavior occurs in the MM case and, in fact, the impact of ε is quite small. This difference is a fundamental effect of the "heaviness" of the tail of the arrivals, i.e., a heavy tail requires significantly more capacity in order to counter a drop in ε.

This contrast between heavy- and light-tailed arrivals is also evident in Figure 11, which highlights the cost savings from dynamic resizing in each case as a function of ε. Interestingly, the cost savings under light-tailed arrivals is largely independent of ε, while under heavy-tailed arrivals the cost savings is monotonically increasing with ε.

The second component of the SLA is the delay bound D. The impact of D on provisioning is much less dramatic. We show an example in the case of heavy-tailed arrivals in Figure 10. Not surprisingly, the provisioning increases as D drops. However, the flattening observed as a result of ε is not observed here. The case of MM arrivals is qualitatively the same, and so we do not include it.

When is Dynamic Resizing Valuable?

Now, we are finally ready to address the question of when (i.e., for what workloads) dynamic resizing is valuable. To address this question, we must look at the interaction between the peak-to-mean ratio and the burstiness. Our goal is to provide a concrete understanding of the (peak-to-mean ratio, burstiness, SLA) settings for which the potential savings from dynamic resizing are large enough to warrant implementation. Figures 12-14 focus on this question. Our hope is that these figures highlight that a precursor to any debate about the value of dynamic resizing must be a joint understanding of the expected workload characteristics and the desired SLA, since for any fixed choices of two of these parameters (peak-to-mean ratio, burstiness, SLA), the third can be chosen so that dynamic resizing does or does not provide significant cost savings for the data center.

Fig. 11. Impact of ε on the cost savings of dynamic resizing: (a) Hotmail, (b) MSR.

Fig. 12. Characterization of burstiness and peak-to-mean ratio necessary for dynamic resizing to achieve different levels of cost reduction: (a) Hotmail, (b) MSR.

Starting with Figure 12, we see a set of curves for different levels of cost savings. The interpretation of the figures is that below (above) each curve the savings from optimal dynamic resizing is smaller (larger) than the specified value for the curve. Thus, for example, if the peak-to-mean ratio is 2 in the Hotmail trace, a 10% cost savings is possible for all levels of burstiness, but a 30% cost savings is only possible for α > 1.5. However, if the peak-to-mean ratio is 3, then a 30% cost savings is possible for all levels of burstiness. It is difficult to say what peak-to-mean and burstiness settings are "common" for data centers, but as a point of reference, one might expect large-scale services to have a peak-to-mean ratio similar to that of the Hotmail trace, i.e., around 1.5-2.5, and smaller scale services to have peak-to-mean ratios similar to that of the MSR trace, i.e., around 4-6. The burstiness also can vary widely, but as a rough estimate, one might expect α to be around 1.4-1.6.

Of course, many of the settings of the data center will affect the conclusions illustrated in Figure 12. Two of the most important factors whose effects need to be understood are the switching cost, β, and the SLA, particularly ε.

Figure 13 highlights the impact of the magnitude of

1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.90

2

4

6

8

10

α

peak

/mea

n ra

tio

β=72β=18β=6β=1

(a) Hotmail

1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.90

1

2

3

4

5

6

7

α

peak

/mea

n ra

tio

β=72β=18β=6β=1

(b) MSR

Fig. 13. Characterization of burstiness and peak-to-meanratio necessary for dynamic resizing to achieve 20% costreduction as a function of the switching cost, β.

[Two panels: (a) Hotmail; (b) MSR. Axes: α (1.1 to 1.9) vs. peak/mean ratio; curves for ε = 10^−4, 10^−3, 10^−2, 10^−1.]

Fig. 14. Characterization of burstiness and peak-to-mean ratio necessary for dynamic resizing to achieve 20% cost reduction as a function of the SLA, ε.

the switching costs on the value of dynamic resizing. The curves represent the threshold on peak-to-mean ratio and burstiness necessary to obtain 20% cost savings from dynamic resizing. As the switching costs increase, the workload must have a larger peak-to-mean ratio and/or less burstiness in order for dynamic resizing to be valuable. This is not unexpected. However, what is perhaps surprising is how small the impact of the switching cost is. The class of workloads where dynamic resizing is valuable shrinks only slightly as the switching cost is varied from the order of the cost of running a server for 10 minutes (β = 1) to that of running a server for 3 hours (β = 18).

Interestingly, while the impact of the switching costs on the value of dynamic resizing is small, the impact of the SLA is quite large. In particular, the violation probability ε can dramatically affect whether dynamic resizing is valuable or not. This is shown in Figure 14, in which the curves represent the threshold on peak-to-mean ratio and burstiness necessary to obtain 20% cost savings from dynamic resizing. We see that, as the violation probability is allowed to be larger, the impact of the peak-to-mean ratio on the potential savings from dynamic resizing disappears, and the value of dynamic resizing starts to depend almost entirely on the burstiness of the arrival process. The reason for this can be observed in Figure 8, which highlights that the optimal provisioning n_k becomes nearly flat as ε increases.


Supporting Analytic Results

To this point we have focused on numerical simulations, but in this section we provide analytic support for the behavior we observed in the experiments above. In particular, the following two theorems characterize the impact of burstiness and the SLA (D, ε) on the value of dynamic resizing under Poisson and heavy-tailed arrivals. This is accomplished by deriving the effect of these parameters on C(D, ε), which constrains the optimal provisioning n_k. A smaller (larger) C(D, ε) implies a smaller (larger) provisioning n_k, which in turn implies smaller (larger) costs.
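To make this chain concrete, here is a minimal illustrative sketch (not the paper's provisioning rule) of how a capacity constraint translates into a server count, assuming each active server contributes a constant rate μ as in Lemma 1; the ceiling mapping and the numbers are assumptions made only for illustration.

```python
import math

def provisioning(capacities, mu):
    """Illustrative mapping from per-slot capacity constraints C_k = C(D, eps)
    to server counts n_k, assuming each active server contributes a constant
    service rate mu (cf. Lemma 1 in the appendix). The ceiling mapping is an
    assumption of this sketch, not the paper's optimization."""
    return [math.ceil(c / mu) for c in capacities]

# A smaller C(D, eps) directly yields a smaller n_k, and hence a smaller cost.
print(provisioning([24.0, 31.5, 18.2], mu=4.0))  # -> [6, 8, 5]
```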

We start by providing a result for the case of Poisson arrivals. The proof is given in the appendix.

Theorem 1: The service capacity constraint from Eq. (13) increases as the delay constraint D or the violation probability ε decrease. It also satisfies the scaling law

\[
C(D, \varepsilon) = \Theta\left(\frac{D^{-1}\log\varepsilon^{-1}}{\log\left(D^{-1}\log\varepsilon^{-1}\right)}\right)
\]

as D^{−1} log ε^{−1} → ∞.

This theorem highlights that as ε and/or D decrease, C(D, ε), and thus the cost of the optimal provisioning, increases. This shows that the observations made in our numeric experiments hold more generally. Perhaps the most interesting point about this theorem, however, is the contrast of the growth rate with that in the case of heavy-tailed arrivals, which is summarized in the following theorem. The proof is given in the appendix.
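To make the Poisson constraint concrete, the following is a minimal numerical sketch (our illustration, not the paper's code) for computing C(D, ε). It assumes, based on the appendix proof of Eq. (12), that θ* is the largest θ > 0 with (λ/θ)(e^θ − 1) ≤ C, and that C(D, ε) is the smallest capacity for which the delay bound e^{−θ*CD} ≤ ε holds; the exact definition in Eq. (13) appears earlier in the paper.

```python
import math

def theta_star(C, lam):
    """Largest theta > 0 with (lam/theta)*(exp(theta)-1) <= C, found by
    bisection; assumes C > lam, since the LHS tends to lam as theta -> 0."""
    lo, hi = 1e-12, 1.0
    while (lam / hi) * (math.exp(hi) - 1) <= C:  # grow until infeasible
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if (lam / mid) * (math.exp(mid) - 1) <= C:
            lo = mid
        else:
            hi = mid
    return lo

def capacity_poisson(lam, D, eps):
    """Smallest C with exp(-theta_star(C)*C*D) <= eps, by bisection on C."""
    lo, hi = lam * (1 + 1e-9), lam + 1.0
    while math.exp(-theta_star(hi, lam) * hi * D) > eps:  # grow until feasible
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if math.exp(-theta_star(mid, lam) * mid * D) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

# C(D, eps) grows only logarithmically as eps shrinks, matching Theorem 1.
for eps in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5):
    print(f"eps={eps:.0e}  C={capacity_poisson(10.0, 1.0, eps):.2f}")
```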

Theorem 2: The implicit solution for the capacity constraint from Eq. (21) increases as the delay constraint D or the violation probability ε decrease, or as the value of α decreases. It also satisfies the scaling law

\[
C(D, \varepsilon) = \Theta\left(\left(\frac{1}{\varepsilon D^{\alpha-1}}\right)^{1/\alpha}\right) \qquad (23)
\]

as εD^{α−1} → 0, for any given α ∈ (1, 2).

A key observation about this theorem is that the growth rate of C(D, ε) with ε is much faster than in the Poisson case (polynomial instead of logarithmic). This supports what is observed in Figure 11. Additionally, Theorem 2 highlights the impact of the burstiness α, and shows that the behavior we have seen in our experiments holds more generally.
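By contrast, the heavy-tailed constraint can be explored with the sketch below, which numerically solves an implicit equation of the form inferred from the appendix manipulations of Eq. (21), namely the infimum over γ ∈ (1, C/λ) of γ^{1+a} / ((CD)^{α−1}(C − γλ) · a log γ) = ε, with a = (α − 1)/α. Since Eq. (21) itself appears earlier in the paper, this reconstructed form, the grid search, and the parameter values are assumptions of the illustration.

```python
import math

def lhs(C, lam, D, alpha, grid=5000):
    """Approximate, by grid search, the infimum over gamma in (1, C/lam) of
    gamma**(1+a) / ((C*D)**(alpha-1) * (C - gamma*lam) * a * log(gamma)),
    with a = (alpha-1)/alpha (the form inferred from the appendix proofs)."""
    a = (alpha - 1) / alpha
    best = float("inf")
    for i in range(1, grid):
        gamma = 1 + (C / lam - 1) * i / grid
        denom = (C * D) ** (alpha - 1) * (C - gamma * lam) * a * math.log(gamma)
        if denom > 0:
            best = min(best, gamma ** (1 + a) / denom)
    return best

def capacity_heavy_tailed(lam, D, eps, alpha):
    """The LHS decreases in C and blows up as C -> lam, so bisection applies."""
    lo, hi = lam * (1 + 1e-6), 2 * lam
    while lhs(hi, lam, D, alpha) > eps:  # grow until feasible
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if lhs(mid, lam, D, alpha) > eps:
            lo = mid
        else:
            hi = mid
    return hi

# The solution grows like eps**(-1/alpha) as eps -> 0, matching Eq. (23).
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"eps={eps:.0e}  C={capacity_heavy_tailed(1.0, 1.0, eps, 1.5):.1f}")
```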

5 CONCLUSION

Our goal in this paper is to provide new insight into the debate about the potential of dynamic resizing in data centers. Clearly, there are many facets of this issue relating to the engineering, algorithmic, and reliability challenges involved in dynamic resizing which we have ignored in this paper. These are all important issues when trying to realize the potential of dynamic resizing. But the point we have made in this paper is that, when quantifying the potential of dynamic resizing, it is of primary importance to understand the joint impact of workload and SLA characteristics.

To make this point, we have presented a new model that captures the impact of SLA characteristics in addition to both slow time-scale non-stationarities in the workload and fast time-scale burstiness in the workload. This model allows us to provide the first study of dynamic resizing that captures both the stochastic burstiness and the diurnal non-stationarities of real workloads. Within this model, we have provided both trace-based numerical case studies and analytical results. Perhaps most tellingly, our results highlight that even when two of the SLA, the peak-to-mean ratio, and the burstiness are fixed, the other one can be chosen to ensure that there either are or are not significant savings possible via dynamic resizing. Figures 12–14 illustrate how dependent the potential of dynamic resizing is on these three parameters. These figures highlight that a precursor to any debate about the value of dynamic resizing must be an understanding of the workload characteristics expected and the SLA desired. Then, one can begin to discuss whether this potential is obtainable.

Future work on this topic includes providing a more detailed study of how other important factors affect the potential of dynamic resizing, e.g., storage issues, reliability issues, and the availability of renewable energy. Note that provisioning capacity to take advantage of renewable energy when it is available is an important benefit of dynamic resizing that we have not considered at all in the current paper.

APPENDIX

In this section, we collect the proofs for the results in the body of the paper.

We start with the proof of Lemma 1, the aggregation property used to model the multiserver system with a single service process.

Proof of Lemma 1: Fix t ≥ 1. Because each server i has a constant rate capacity μ, it follows that the bivariate processes S_i(s, t) = μ(t − s) are service processes for the individual servers (see [9], pp. 167), i.e.,

\[
R_i(t) \geq \inf_{0\leq s\leq t}\left\{A_i(s) + \mu(t-s)\right\} = \frac{1}{n}\inf_{0\leq s\leq t}\left\{A(s) + n\mu(t-s)\right\},
\]

where R_i(t) is the departure process from server i. In the last line we used the load-balancing dispatching assumption, i.e., A_i(s) = (1/n)A(s). Adding the terms for i = 1, ..., n, it immediately follows that

\[
\sum_i R_i(t) \geq \inf_{0\leq s\leq t}\left\{A(s) + n\mu(t-s)\right\},
\]

which shows that the bivariate process S(s, t) = nμ(t − s) is a service process for the virtual system with arrival process A(t) = ∑_i A_i(t) and departure process R(t) = ∑_i R_i(t).
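To see the aggregation step numerically, here is a small sketch (our illustration, not part of the paper) in which n discrete-time servers of rate μ receive a balanced split of a random cumulative arrival process, and the aggregate departures are checked against the nμ service curve. The work-conserving recursion R(t) = min(A(t), R(t−1) + μ) is an assumed discrete-time server model.

```python
import random

def departures(A, mu):
    """Work-conserving constant-rate server in discrete time:
    R(t) = min(A(t), R(t-1) + mu), with cumulative arrivals A(0) = 0."""
    R = [0.0]
    for t in range(1, len(A)):
        R.append(min(A[t], R[-1] + mu))
    return R

def service_curve(A, rate, t):
    """Lower bound inf_{0<=s<=t} {A(s) + rate*(t-s)} from Lemma 1."""
    return min(A[s] + rate * (t - s) for s in range(t + 1))

rng = random.Random(0)
n, mu, T = 4, 2.0, 50
A = [0.0]
for _ in range(T):
    A.append(A[-1] + rng.uniform(0, 2 * n * mu))  # cumulative arrivals
Ai = [a / n for a in A]                           # balanced dispatching A_i = A/n
R = [n * r for r in departures(Ai, mu)]           # aggregate departures

assert all(R[t] >= service_curve(A, n * mu, t) - 1e-9 for t in range(T + 1))
print("aggregate departures dominate the n*mu service curve")
```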


Next, we prove the bound used for Poisson arrival processes, i.e., Eq. (12).

Proof of Eq. (12): The proof follows closely a technique from [22] for the analysis of GI/GI/1 queues. Denote for convenience C = C(D, ε). Fix t ≥ 1 and introduce the process

\[
T(s) = e^{\theta^*\left(A(t-D-s,\, t-D) - Cs\right)}
\]

for all 0 ≤ s ≤ t − D. Consider also the filtration of σ-algebras

\[
\mathcal{F}_s = \sigma\left\{A(t-D-s,\, t-D)\right\},
\]

i.e., F_s ⊆ F_{s+u} for all 0 ≤ s ≤ s + u ≤ t − D. Note that T(s) is F_s-measurable for all s (see [33], pp. 79). Then we can write for the conditional expectations, for all s, u ≥ 0 with s + u ≤ t − D,

\begin{align*}
E\left[T(s+u) \mid \mathcal{F}_s\right] &= E\left[T(s)\, e^{\theta^*\left(A(t-D-s-u,\, t-D-s) - Cu\right)} \mid \mathcal{F}_s\right] \\
&= T(s)\, E\left[e^{\theta^*\left(A(t-D-s-u,\, t-D-s) - Cu\right)} \mid \mathcal{F}_s\right] \\
&= T(s)\, E\left[e^{\theta^*\left(A(t-D-s-u,\, t-D-s) - Cu\right)}\right] \\
&= T(s)\, e^{\theta^*\left(\frac{\lambda}{\theta^*}\left(e^{\theta^*}-1\right) - C\right)u} \\
&\leq T(s).
\end{align*}

In the second line we used that T(s) is F_s-measurable, and then we used the independent increments property of the Poisson process A(t), i.e., A(t − D − s − u, t − D − s) is independent of F_s. Then we computed the moment generating function of the Poisson process, and finally we used the property of θ* from Eq. (11). Therefore, the process T(s) is a supermartingale, i.e.,

\[
E\left[T(s+u) \mid \mathcal{F}_s\right] \leq T(s).
\]

We can now continue Eq. (9) as follows:

\begin{align*}
P\left(D(t) > D\right) &\leq P\left(\sup_{0\leq s\leq t-D}\left\{A(s,\, t-D) - C(t-D-s)\right\} > CD\right) \\
&\leq P\left(\sup_{0\leq s\leq t-D} T(s) > e^{\theta^* CD}\right) \leq e^{-\theta^* CD},
\end{align*}

which proves Eq. (12). Note that in the last line we used a maximal inequality for the (continuous) supermartingale T(s) (see [33], pp. 54).
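As a numerical sanity check on this bound (our illustration, not from the paper), one can simulate a discrete-time queue with Poisson(λ) arrivals per slot served at constant rate C and compare the empirical delay tail against e^{−θ*CD}. Approximating the virtual delay by backlog/C, and taking θ* as the largest θ with (λ/θ)(e^θ − 1) ≤ C (the property of Eq. (11) used above), are assumptions of this sketch.

```python
import math
import random

def theta_star(C, lam):
    """Largest theta > 0 with (lam/theta)*(exp(theta)-1) <= C (bisection)."""
    lo, hi = 1e-12, 1.0
    while (lam / hi) * (math.exp(hi) - 1) <= C:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if (lam / mid) * (math.exp(mid) - 1) <= C:
            lo = mid
        else:
            hi = mid
    return lo

def simulate_delay_tail(lam, C, D, slots=200_000, seed=1):
    """Discrete-time queue: Poisson(lam) work arrives per slot, C units are
    served per slot; the virtual delay is approximated by backlog / C."""
    rng = random.Random(seed)
    q, violations = 0.0, 0
    for _ in range(slots):
        a, p, u = 0, math.exp(-lam), rng.random()
        cdf = p
        while u > cdf:  # sample Poisson(lam) by inversion
            a += 1
            p *= lam / a
            cdf += p
        q = max(q + a - C, 0.0)
        if q / C > D:
            violations += 1
    return violations / slots

lam, C, D = 10.0, 12.0, 1.0
print("simulated P(delay > D):", simulate_delay_tail(lam, C, D))
print("bound exp(-theta*CD)  :", math.exp(-theta_star(C, lam) * C * D))
```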

Finally, we prove the monotonicity and scaling results in Theorems 1 and 2.

Proof of Theorem 1: First, note that the monotonicity properties follow immediately from the fact that the function f(x) = (1 + x)^{1/x} is non-decreasing. Next, to prove the more detailed scaling law, simply notice that log(1 + c f(n)) = Θ(log f(n)) for some non-decreasing function f(n) and a constant c > 0. The result follows.

Proof of Theorem 2: We first consider the monotonicity properties and then the scaling law.

Monotonicity properties: To prove the monotonicity results in D and ε, observe that the left-hand side (LHS) of the implicit equation from Eq. (21) is a non-increasing function of C(D, ε), because increasing C(D, ε) expands the range of the infimum whereas the function inside the infimum decreases. Moreover, the LHS is unbounded at the boundary C(D, ε) = λ. The solution C(D, ε) is thus non-increasing in both D and ε.

Next, to prove monotonicity in α, fix α1 ≤ α2 and denote by C1 the implicit solution of Eq. (21) for α = α1. In the first step we prove that C1 D ≥ 1. Let C be the solution of

\[
\frac{1}{C^{\alpha_1}} = \varepsilon D^{\alpha_1-1},
\]

where the LHS was obtained by relaxing the LHS of Eq. (21) (we used that γ > 1, C − γλ < C, and x ≥ log x for all x ≥ 1). Consequently, C1 ≥ C, and by assuming that the units are properly scaled such that D ≥ 1, it follows that CD ≥ 1 and hence C1 D ≥ 1.

Secondly, we prove that γ, i.e., the optimal value in the solution of C1, satisfies γ < e. Consider the function f(γ) = γ^{a+1}/(a log γ) with a = (α − 1)/α. If, by contradiction, γ ≥ e, then f′(γ) > 0 and consequently f(γ) is increasing on [e, ∞). Since the function 1/(C1 − γλ) is also increasing in γ, we get a contradiction with γ being the optimal solution as assumed, and hence γ < e.

Finally, consider the function g(a) = γ^a/(a log γ) with a = (α − 1)/α. The previous property γ < e implies that g′(a) ≤ 0 and further that g is non-increasing in a, and hence in α as well. Since 1/(C1 D)^{α−1} is also non-increasing in α, we obtain that

\[
\inf_{1<\gamma<\frac{C_1}{\lambda}}\left\{\frac{\gamma\,\gamma^{\frac{\alpha_1-1}{\alpha_1}}}{(C_1 D)^{\alpha_1-1}\left(C_1-\gamma\lambda\right)\log\gamma^{\frac{\alpha_1-1}{\alpha_1}}}\right\}
\geq
\inf_{1<\gamma<\frac{C_1}{\lambda}}\left\{\frac{\gamma\,\gamma^{\frac{\alpha_2-1}{\alpha_2}}}{(C_1 D)^{\alpha_2-1}\left(C_1-\gamma\lambda\right)\log\gamma^{\frac{\alpha_2-1}{\alpha_2}}}\right\}.
\]

Using the monotonicity in C1 of the term inside the infimum, it follows that C1 ≥ C2, where C2 is the implicit solution of Eq. (21) for α = α2. Therefore, C(D, ε) is non-increasing in α.

Scaling law: To prove the scaling law, denote by C the implicit solution of the equation

\[
\inf_{1<\gamma<\max\left\{\frac{C}{\lambda},\, e^{\frac{2\alpha}{\alpha-1}}\right\}}\left\{\frac{1}{C^{\alpha}}\cdot\frac{\gamma^{\frac{\alpha-1}{\alpha}}}{\log\gamma^{\frac{\alpha-1}{\alpha}}}\right\} = \varepsilon D^{\alpha-1}. \qquad (24)
\]

The LHS here was constructed by relaxing the function inside the infimum in the LHS of Eq. (21) and extending the range of the infimum. This means that the implicit solution C is smaller than the implicit solution C(D, ε). The function inside the infimum of the LHS of Eq. (24) is convex on the domain of γ and attains its infimum at γ = e^{α/(α−1)}. Solving for C and using that C(D, ε) ≥ C proves the lower bound.


To prove the upper bound, let us fix α, D0 and ε0, and denote by C0(D0, ε0) the corresponding implicit solution. Using the monotonicity of the implicit solution in εD^{α−1}, as shown above, it follows that

\[
C(D, \varepsilon) \geq C_0(D_0, \varepsilon_0),
\]

where εD^{α−1} ≤ ε0 D0^{α−1}. Fixing γ0 = (C0(D0, ε0) + λ)/(2λ), let C be the solution of the equation

\[
\frac{\gamma_0}{C^{\alpha-1}\left(C-\gamma_0\lambda\right)}\cdot\frac{\gamma_0^{\frac{\alpha-1}{\alpha}}}{\log\gamma_0^{\frac{\alpha-1}{\alpha}}} = \varepsilon D^{\alpha-1}. \qquad (25)
\]

Because the range of γ in the solution of C(D, ε) includes γ0, it follows that C(D, ε) ≤ C. On the other hand, the LHS of Eq. (25) satisfies

\[
\frac{\gamma_0}{C^{\alpha-1}\left(C-\gamma_0\lambda\right)}\cdot\frac{\gamma_0^{\frac{\alpha-1}{\alpha}}}{\log\gamma_0^{\frac{\alpha-1}{\alpha}}} \leq \frac{K_0}{C^{\alpha}}, \qquad (26)
\]

where

\[
K_0 = \frac{\gamma_0\, C_0(D_0, \varepsilon_0)}{C_0(D_0, \varepsilon_0)-\gamma_0\lambda}\cdot\frac{\gamma_0^{\frac{\alpha-1}{\alpha}}}{\log\gamma_0^{\frac{\alpha-1}{\alpha}}}.
\]

Here we used that C ≥ C0(D0, ε0) (note that we showed before that C ≥ C(D, ε) and C(D, ε) ≥ C0(D0, ε0)). Finally, combining Eqs. (25) and (26) we immediately get the scaling law C = O((1/(εD^{α−1}))^{1/α}), and since C(D, ε) ≤ C the proof is complete.

REFERENCES

[1] SPEC power data on SPEC website at http://www.spec.org.
[2] J. Abate, G. L. Choudhury, and W. Whitt. Waiting-time tail probabilities in queues with long-tail service-time distributions. Queueing Systems, 16(3-4):311–338, Sept. 1994.
[3] F. Ahmad and T. N. Vijaykumar. Joint optimization of idle and cooling power in data centers while maintaining response time. In Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, ASPLOS '10, pages 243–256, New York, NY, USA, 2010. ACM.
[4] S. Albers. Energy-efficient algorithms. Comm. of the ACM, 53(5):86–96, 2010.
[5] H. Amur, J. Cipar, V. Gupta, G. R. Ganger, M. A. Kozuch, and K. Schwan. Robust and flexible power-proportional storage. In Proc. ACM SoCC, 2010.
[6] L. A. Barroso and U. Holzle. The case for energy-proportional computing. Computer, 40(12):33–37, 2007.
[7] P. Bodik, M. P. Armbrust, K. Canini, A. Fox, M. Jordan, and D. A. Patterson. A case for adaptive datacenters to conserve energy and improve reliability. Technical Report UCB/EECS-2008-127, University of California at Berkeley, 2008.
[8] E. V. Carrera, E. Pinheiro, and R. Bianchini. Conserving disk energy in network servers. In Proceedings of the 17th annual international conference on Supercomputing, ICS '03, pages 86–97, New York, NY, USA, 2003. ACM.
[9] C.-S. Chang. Performance Guarantees in Communication Networks. Springer Verlag, 2000.
[10] J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle. Managing energy and server resources in hosting centers. In Proceedings of the eighteenth ACM symposium on Operating systems principles, SOSP '01, pages 103–116, New York, NY, USA, 2001. ACM.
[11] G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, and F. Zhao. Energy-aware server provisioning and load dispatching for connection-intensive internet services. In Proc. USENIX NSDI, 2008.
[12] F. Ciucu. Network calculus delay bounds in queueing networks with exact solutions. In International Teletraffic Congress (ITC), pages 495–506, 2007.
[13] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In Proc. USENIX NSDI, pages 273–286, 2005.
[14] B. Diniz, D. Guedes, W. Meira, Jr., and R. Bianchini. Limiting the power consumption of main memory. In Proceedings of the 34th annual international symposium on Computer architecture, ISCA '07, pages 290–301, New York, NY, USA, 2007. ACM.
[15] N. G. Duffield. Exponential bounds for queues with Markovian arrivals. Queueing Systems, 17(3-4):413–430, Sept. 1994.
[16] A. Gandhi, V. Gupta, M. Harchol-Balter, and M. A. Kozuch. Optimality analysis of energy-performance trade-off for server farm management. Performance Evaluation, 67(11):1155–1171, 2010.
[17] A. Gandhi, M. Harchol-Balter, and M. A. Kozuch. The case for sleep states in servers. In Proceedings of the 4th Workshop on Power-Aware Computing and Systems, HotPower '11, pages 2:1–2:5, New York, NY, USA, 2011. ACM.
[18] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper. Workload analysis and demand prediction of enterprise data center applications. In Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization, IISWC '07. IEEE Computer Society, 2007.
[19] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel. The cost of a cloud: research problems in data center networks. SIGCOMM Comput. Commun. Rev., 39:68–73, December 2008.
[20] V. Iversen and L. Staalhagen. Waiting time distribution in M/D/1 queueing systems. Electronics Letters, 35(25):2184–2185, Dec. 1999.
[21] F. P. Kelly. Notes on effective bandwidths. In Stochastic Networks: Theory and Applications, Royal Statistical Society Lecture Notes Series, 4, pages 141–168. Oxford University Press, 1996.
[22] J. F. C. Kingman. A martingale inequality in the theory of queues. Cambridge Philosophical Society, 60(2):359–361, Oct. 1964.
[23] D. Kusic, J. O. Kephart, J. E. Hanson, N. Kandasamy, and G. Jiang. Power and performance management of virtualized computing environments via lookahead control. In Proceedings of the 2008 International Conference on Autonomic Computing, ICAC '08, 2008.
[24] J. Liebeherr, A. Burchard, and F. Ciucu. Delay bounds in communication networks with heavy-tailed and self-similar traffic. IEEE Transactions on Information Theory, 58(2):1010–1024, Feb. 2012.
[25] M. Lin, Z. Liu, A. Wierman, and L. L. Andrew. Online algorithms for geographical load balancing. Under submission.
[26] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska. Dynamic right-sizing for power-proportional data centers. In IEEE Infocom, pages 1098–1106, 2011.
[27] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska. Online dynamic capacity provisioning in data centers. In Allerton Conference on Communication, Control, and Computing, 2011.
[28] Z. Liu, P. Nain, and D. Towsley. Exponential bounds with applications to call admission. Journal of the ACM, 44(3):366–394, May 1997.
[29] R. R. Mazumdar. Performance Modeling, Loss Networks, and Statistical Multiplexing. Synthesis Lectures on Communication Networks. Morgan & Claypool Publishers, 2009.
[30] D. Meisner, B. T. Gold, and T. F. Wenisch. PowerNap: Eliminating server idle power. In ACM SIGPLAN Notices, volume 44, pages 205–216, Mar. 2009.
[31] D. Meisner, C. M. Sadler, L. A. Barroso, W.-D. Weber, and T. F. Wenisch. Power management of online data-intensive services. In Proceedings of the 38th annual international symposium on Computer architecture, ISCA '11, 2011.
[32] A. Pakes. On the tails of waiting-time distributions. Journal of Applied Probability, 12(3):555–564, Sept. 1975.
[33] D. Revuz and M. Yor. Continuous Martingales and Brownian Motion (3rd Edition). Springer Verlag, 1999.
[34] M. Talagrand. Majorizing measures: The generic chaining. The Annals of Probability, 24(3):1049–1103, 1996.
[35] E. Thereska, A. Donnelly, and D. Narayanan. Sierra: a power-proportional, distributed storage system. Technical Report MSR-TR-2009-153, Microsoft Research, 2009.
[36] V. Vinogradov. Refined large deviation limit theorems. Longman Scientific & Technical, 1994.