
2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), New Brunswick, NJ, USA, April 1-3, 2012



Selective Commitment and Selective Margin: Techniques to Minimize Cost in an IaaS Cloud

Yu-Ju Hong, Jiachen Xue, and Mithuna Thottethodi
School of Electrical and Computer Engineering

Purdue University
West Lafayette, IN, USA

{yujuhong,xuej,mithuna}@purdue.edu

Abstract—Cloud computing holds the exciting potential of elastically scaling computation to match time-varying demand, thus eliminating the need to provision for peak demand. However, the uncertainty of variable loads necessitates the use of margins – servers that must be held active to absorb unpredictable potential load bursts – which can be a significant fraction of overall cost. Further, naively switching to an on-demand cloud model can actually degrade true costs (server costs that would be incurred even if margin costs disappeared) because of the fundamental economic rule wherein on-demand services/goods cost more than reserved services/goods, where the user bears some commitment. On-demand customers pay a premium in exchange for not undertaking the fixed-cost risk that committed customers undertake.

This paper addresses the twin challenges of minimizing margin costs and true costs in an Infrastructure-as-a-Service (IaaS) cloud. Our paper makes the following two contributions. First, rather than use a fixed margin, we observe that the margin may be selectively used depending on load levels. Based on the above observation, we develop ShrinkWrap-opt, a dynamic programming algorithm that achieves optimal margin cost while satisfying the desired (statistical) response time guarantees. Second, we propose commitment straddling – the selective use of some reserved machines in conjunction with on-demand machines – to achieve optimal true cost. Simulations with real Web server load traces using the Amazon EC2 cost model reveal that our techniques save between 13% and 29% (21% on average) in cost while satisfying response-time targets.

I. INTRODUCTION

In the pre-cloud world, server operators had to either incur the cost of provisioning for the peak demand (or near-peak demand, if some modest dilution in server response time was acceptable [1]) or incur the cost of excessive degradation in response time. The emergence of commercially-available Infrastructure-as-a-Service (IaaS) cloud computing vendors such as Amazon EC2 has enabled a more elastic provisioning approach wherein on-demand computational resources can be "rented" at very short notice. Armbrust et al. provide an expanded overview of such tradeoffs in their white paper on cloud computing [2].

The cost-advantage of cloud computing for episodic computation demands (e.g., one-time document digitization, hosting sites covering major sporting events) is well understood; users with such one-time demands can avoid capital expenditure and instead utilize their financial resources solely for operational expenses. In contrast, the case for cloud computing for ongoing, day-to-day operations with long time horizons is less clear. There are many factors that may hinder cloud adoption, as described in [2]. This paper focuses on one important factor – costs incurred by the potential cloud user. The goal of this paper is to lower the cost of operating ongoing day-to-day computation in the cloud. Specifically, there are two key factors that affect cost. First, even though broad trends in load variation may be predictable (e.g., the diurnal/weekly patterns), prediction models are not perfect. Because a prediction model can underprovision servers, operators are forced to maintain a margin – a pool of servers beyond the expected load – which adds to the "true" cost (which is the cost if loads are known a priori without any uncertainty). Minimizing such margin cost is important.

One may think that starting on-demand machine instances reactively avoids the margin requirement because servers may be started whenever the load exceeds existing server capacity. However, such a reactive approach is not a viable option because IaaS vendors provide weak guarantees of launch time. For example, Amazon EC2 says that it "typically takes less than 10 minutes" for instances to begin their boot sequences (as listed at [3], observed on October 7th, 2011). As such, relying on reactively launching servers to handle surging load will result in minutes of underprovisioning – an unacceptable outcome.

Second, cloud vendors such as Amazon EC2 offer services at various commitment levels. For example, at the lowest commitment level, there are on-demand instances in which machine instances are acquired on an hourly basis with no longer-term commitment at all. At higher levels, there are "reserved instances" wherein the user may pay an upfront fixed cost to ensure discounted hourly pricing for various durations (e.g., 1 year, 3 years). Minimizing cost by acquiring machine instances at the cost-optimal commitment level for loads is also an important challenge.

978-1-4673-1146-5/12/$31.00 ©2012 IEEE

This paper makes two key contributions to reduce both the above costs for cloud users. Our first contribution is a technique to determine margins in such a way that margin costs are minimized under a given load volatility model. The technique has two innovations based on two observations we made in the request traces of real workloads. First, we observed that the margin requirements vary by load level. Unlike traditional load-oblivious margin mechanisms which use some fixed arithmetic transformation on the load to compute margins (e.g., translation with a fixed offset for constant margins, scaling with a fixed ratio for linear margins), our ShrinkWrap technique uses a table lookup to provide selective, load-dependent margins. ShrinkWrap reduces wasted margins by avoiding the one-size-fits-all approach (i.e., using the same absolute-margin/margin-ratio across all loads). Our second observation was driven by the fact that systems typically have some "tolerance" – the fraction of time where response time targets may not be met. We observe that the way in which the tolerance budget is expended affects cost because using the tolerance at some loads may result in more cost savings than at other loads. We develop a dynamic programming algorithm to optimally expend the tolerance budget to achieve maximum margin cost savings. By including our optimal tolerance expenditure algorithm with ShrinkWrap we get ShrinkWrap-opt.

Our second contribution addresses the true costs of serving requests by selectively varying commitment levels. We demonstrate that commitment straddling – the selective use of reserved servers in conjunction with on-demand servers – is fundamentally necessary to minimize cost, while meeting performance requirements. To understand why such commitment straddling is cost-optimal, we may conceptually view variation of loads as inducing varying utilization in a collection of servers, with some servers being heavily loaded and others being lightly loaded. Combining such variation in utilization with the well-known notion that reserved instances are less expensive than on-demand instances when high utilization is expected (say, utilization beyond a break-even ratio), we can divide the servers into two classes – those with higher utilization than the break-even ratio and those with lower utilization than the break-even ratio. Naturally, the optimal cost configuration will employ reserved servers for the first class and on-demand servers for the second class.
We show that cost-optimal commitment straddling can be computed if we know the load frequency distribution. Intuitively, one may think that commitment straddling is the equivalent of using reserved instances for the average load and on-demand instances for the peak load. However, our precise analysis provides a stronger result. For example, our results show that it takes a grossly underutilized workload (with more than 50% idle time) for an all-on-demand configuration to be cost-optimal. Similarly, it takes a workload where the peak load is sustained for nearly 50% of the time for the all-reserved configuration to be cost-optimal.
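The layered-utilization intuition behind commitment straddling can be sketched in a few lines. This is an illustrative reconstruction rather than the paper's algorithm: the prices and the toy load histogram are hypothetical, and the sketch simply reserves every "layer" of load whose busy fraction exceeds the break-even utilization ratio.

```python
def straddle(load_hours, c_od, c_op, c_fix):
    """Choose the number of reserved machines for a load histogram.

    load_hours[l] = number of hours during which exactly l machines
    were needed.  Layer k (the k-th machine) is busy whenever the
    load is >= k; reserve it iff its utilization exceeds the
    break-even ratio f0 = c_fix / (c_od - c_op).
    """
    total_hours = sum(load_hours.values())
    f0 = c_fix / (c_od - c_op)
    reserved = 0
    for k in range(1, max(load_hours) + 1):
        busy = sum(h for load, h in load_hours.items() if load >= k)
        if busy / total_hours > f0:
            reserved += 1          # layer k is above break-even: reserve it
        else:
            break                  # utilization only falls for higher layers
    return reserved

# Toy histogram: 2 idle hours, 3 hours at 1 machine, 4 at 2, 1 at 3.
print(straddle({0: 2, 1: 3, 2: 4, 3: 1}, c_od=0.68, c_op=0.28, c_fix=0.16))
```

Because layer utilization is non-increasing in k, the scan can stop at the first layer that falls below the break-even ratio; all higher layers are served on demand.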

Both the above optimal-cost techniques assume, in their proofs of optimality, that (1) workloads have known statistical behavior (frequency distributions), and that (2) the cloud model is ideal (fine-grained rental granularity). The first assumption is reasonable because (a) workload behaviors indeed have stable statistical behavior, and (b) it is impossible to optimize for an unknown workload. Our second assumption is needed to simplify the analysis. However, evaluation using practical (i.e., non-ideal) conditions reveals significant cost reductions from each of the two techniques, individually and in combination. Our techniques can save between 12% and 29% in cost (21% on average) while satisfying response-time targets for a range of real server traces. Specifically, we show that a 14.5% cost saving is possible for one of the world's top ten Websites (Wikimedia).

In summary, the two primary contributions of this paper are:

• We develop ShrinkWrap-opt, which combines two new techniques to achieve optimal margin cost for a given statistical model of load volatility. First, ShrinkWrap reduces wasted margin costs by using load-dependent margins instead of fixed, load-oblivious margins. Second, our dynamic programming approach provides an optimal solution to the problem of exploiting quality-of-service tolerance to minimize costs in ShrinkWrap-opt.

• We show that optimal commitment straddling – the combined use of reserved machines to serve part of the load and on-demand machines to serve the remainder of the load so as to minimize cost – is possible if the load frequency distribution is known.

The rest of the paper is organized as follows. Section II defines terms used in the rest of the paper. Section III describes the margin savings via ShrinkWrap-opt. Section IV discusses true-cost minimization via straddling. Section V describes our evaluation methodology. Section VI discusses experimental results. Related work is described in Section VII. Section VIII offers a brief discussion on possible extensions and limitations of our approach. Finally, Section IX concludes this paper.

II. TERMINOLOGY

We refer to a virtual machine instance in the cloud as a machine or a server. We refer to machines where the user assumes the fixed-cost risk (by using reserved instances in the cloud) as reserved machines. We refer to on-demand machine instances, where the user only pays for machine-hours that are used, as on-demand machines. We use lower case c with the appropriate subscripts to denote hourly costs and upper case C with the appropriate subscripts to denote aggregate costs over the duration of a workload.

The hourly costs of an active on-demand machine instance and an active reserved machine instance are cod and crs, respectively. Because the cloud vendor assumes the underutilization risk for on-demand machines, cod is always higher than crs. We refer to the difference between the on-demand cost and the reserved cost as the on-demand premium (= cod − crs).

The existence of the on-demand premium does not imply that reserved machines are always better than on-demand machines because unlike cod, which is charged only when machines are rented (and the machines are rented only when they are to be used), the fixed part of crs is incurred regardless of whether the machine is actively used or not. To incorporate the above notion, the hourly cost for an active reserved machine crs may be broken down into two components: hourly operational cost cop and hourly fixed cost cfix (i.e., crs = cop + cfix). The distinction between fixed costs and operational costs becomes relevant when we shut down a reserved instance when not in use. For such cases, we charge the fixed cost but not the operational cost. The terms and their meanings are summarized in Table I. Further, Table II includes the pricing values for Amazon EC2 for the various terms in dollars-per-hour for an extra-large machine instance, assuming a 1-year commitment on October 7th, 2011. The hourly fixed cost is computed by dividing the dollar-cost of the machine reservation by the number of hours in a year.

TABLE I
NOTATIONS

Symbol   Description
cod      hourly cost of an on-demand instance
crs      hourly cost of an active reserved instance; crs = cfix + cop
cfix     hourly fixed cost of a reserved instance
cop      hourly operational cost of a reserved instance
Crs      aggregate cost of the all-reserved configuration
Cod      aggregate cost of the all-on-demand configuration
f        utilization ratio of a server
f0       break-even utilization ratio
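To make the relationships among cod, crs, cfix, and cop concrete, the following numeric sketch derives the on-demand premium and the break-even utilization ratio f0 from Table I. The prices are hypothetical stand-ins, not the Table II values.

```python
# Hypothetical hourly prices (stand-ins, not Amazon's actual rates).
c_od = 0.68            # on-demand instance, per active hour
c_op = 0.28            # reserved instance, operational cost per active hour
c_fix = 0.16           # reserved instance, fixed cost amortized per hour

c_rs = c_op + c_fix    # cost of an *active* reserved instance
premium = c_od - c_rs  # the on-demand premium

# With utilization f, a reserved machine costs c_fix + f*c_op per hour
# (the fixed cost accrues even when idle) while on-demand costs f*c_od.
# Setting the two equal gives the break-even utilization ratio:
f0 = c_fix / (c_od - c_op)

print(f"premium = {premium:.2f} $/hr, break-even f0 = {f0:.0%}")
```

Above f0, a reserved machine is cheaper; below it, an on-demand machine is cheaper, which is the dividing line that commitment straddling exploits.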

The configuration which uses only on-demand machines is referred to as the all-on-demand configuration and its aggregate cost is represented by Cod. Similarly, the configuration which uses only reserved machines is referred to as the all-reserved configuration and its aggregate cost is represented by Crs. We focus solely on computing costs because it remains a barrier to cloud adoption; disk and network bandwidth costs are already more attractive in the cloud [2].

Model of Operation. We assume that requests must be satisfied within a target response time. The precise model we use to relate load, response time, and number of machines is described in detail in Section V and is not important for the exposition of our ideas. For now, it is sufficient to assume that the number of machines needed is monotonic with respect to load (i.e., heavier loads need more machines). Because there is some monotonic mapping from load to the number of machines needed to satisfy that load within its response time, we often specify loads in terms of machine units. For example, we use the phrase "a load of n machines" to mean the server load (in terms of requests per unit time) that can be served by n machines while satisfying the target response time.
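Purely as an illustration of such a monotonic mapping (the paper's actual model appears in Section V and differs), one might assume each server sustains a fixed request rate within the target response time:

```python
import math

def machines_needed(load_rps, per_server_rps=250):
    """Hypothetical monotonic load-to-machines mapping: assume one
    server meets the response-time target up to per_server_rps
    requests per second.  Heavier loads need more machines."""
    return max(1, math.ceil(load_rps / per_server_rps))

# Under this assumed mapping, "a load of 3 machines" denotes any
# request rate in (500, 750] requests per second.
print(machines_needed(600))
```

Only the monotonicity matters for the exposition; any such mapping lets loads be quoted in machine units.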

III. MITIGATING MARGIN COSTS

This section addresses margin costs, which arise due to uncertainty about load variation. While nominal load prediction using various techniques has been proposed [4], [5], there is typically an error distribution around the predicted values because of the unpredictable nature of fine-grained load variations. Because of the possibility of underprediction, a common practice is to speculatively maintain a margin – a pool of servers beyond the predicted number of servers that are available to handle underprediction. An obvious tradeoff here is that larger margins maximize the probability of satisfying the target response time, but at a higher cost. The goal of our technique is to minimize the margin requirements.

Our margin minimization techniques make the following three assumptions about the model of operation. First, we frame the problem of determining margins as the problem of determining the number of machines to keep active (ai+1) at the beginning of the (i + 1)th time interval given that we have some predicted load for the (i + 1)th interval (say pi+1). In general, pi+1 may depend on the prediction model used and on model-specific parameters which may in turn be dependent on parameters such as prior observed loads (say mi, mi−1, etc.), the time-of-day, and so on. While our technique is orthogonal to the prediction mechanism, we use an auto-regressive moving average model similar to those used in prior literature for data-center load prediction [4]. Note, because the predicted number of servers must be active at the beginning of the (i + 1)th interval, and because the prediction may require information from the end of the ith interval, our model implicitly assumes instantaneous machine startup. (We incorporate realistic startup time in our evaluation.) Second, we assume that, for each predicted load level, the distribution of prediction errors (i.e., differences of the actual load from the predicted load) can be accurately estimated. We refer to this distribution as the error distribution and it serves as our model for load-dependent volatility. Note, knowledge of error distributions is necessary for any margin mechanism and is not unique to our method. For example, one could not use a fixed margin without implicitly assuming that the fixed margin covers the error distributions. Finally, maintaining margins to satisfy all possible loads may be expensive and impractical. Just as it is undesirable to provision machines for peak loads, it is also undesirable to provision margin machines for peak volatility.
Consequently, margin mechanisms typically provision margins to achieve statistical quality of service (e.g., response time targets must be met 99% or 99.9% of the time) under the assumed error distribution. We refer to the time intervals where the response time need not be satisfied as the tolerance of the system. We define the satisfaction ratio to be the fraction of time intervals where the response time target is met. For example, if the satisfaction ratio requirement is 99%, the tolerance is 1%.

Operationally, margin mechanisms serve two key functions. First, they determine the number of machines needed in the interval (ai) based on the predicted load for that interval (pi) and the expected volatility (i.e., the error distribution).

The second function of margin mechanisms is to choose where the "tolerance budget" is spent (i.e., the choice of when response time targets may be violated), which is implicitly decided when ai is inadequate to serve the tail of the error distribution. Recall our key observation that spending the tolerance budget uniformly may not yield the optimal margin cost.

Fig. 1 illustrates how margin mechanisms achieve the two functionalities. Fig. 1 plots the absolute load (Y-axis) at various predicted loads (X-axis) for a simple example. Each dot in a square (x, y) represents a time interval where the predicted load is x while the actual load is y. Note that the prediction error is y − x, but we use the absolute load y to show the error distribution in this figure. The error distribution for a given (predicted) load x = L is represented by the range of possible values the load may take (non-empty squares stacked along the Y-axis where x = L) and the frequency of those values (the number of dots in each square). The position of the dots within a square is not meaningful. For example, the figure highlights the error distribution for a predicted load of 2 machines (x = 2). It shows that the probability that the actual load in the time interval is 2 machines (four dots) is twice that of the load being 1 machine (two dots).

[Fig. 1. Visualizing margin costs: (a) Fixed margin (margin = 2), (b) ShrinkWrap, (c) ShrinkWrap-opt. Each panel plots actual load (Y-axis, 1-7 machines) against predicted load (X-axis, 1-5 machines).]

To contrast our approach with prior approaches, consider that we want to minimize margin costs for the set of error distributions shown in Fig. 1 with a tolerance of two time intervals where we may violate the response time target. Prior work has proposed the use of constant margins [6]. Such a fixed margin may be graphically interpreted as the line ai = pi + c shown in Fig. 1. The line achieves both functions by (1) assigning a value for the margin (c = 2), and (2) omitting the two intervals that are to the top-left of the line.

The choice of using uniform margins can cause significant wastage when the volatility is non-uniform across loads. For example, in Fig. 1, a margin of −1 machine is adequate when the predicted load is 4 machines because the actual loads all lie at or below 3 machines. However, the margin is forced to a higher value of 6 at a load value of 4 machines because of the error distribution at other load levels. We wish to avoid such a one-size-fits-all approach. Our ShrinkWrap eliminates such wastage by setting the load-dependent margin in an arbitrary, per-load manner by using a table lookup (i.e., ai = table[pi]) instead of using less flexible approaches such as translation (i.e., ai = pi + c for constant margins) or other arithmetic transformations (e.g., scaling for linear margins, ai = αpi). Using the previous example with the predicted load of 4 servers and the margin of −1, 4 + (−1) = 3 servers should be provisioned. Thus we insert 3 into table[4]. In general, for predicted load pi, table[pi] holds pi + margin(pi), where margin(pi) is obtained by our dynamic programming algorithm to be described later.

ShrinkWrap's approach minimizes wasted margin costs since margins are customized for each predicted load, which enables a contour-hugging margin curve (as shown in Fig. 1(b)). Note, in Fig. 1(b), ShrinkWrap uses its tolerance budget in exactly the same way as the fixed margin (FM) approach (i.e., the same two time intervals, represented by two dots, are left unserved).

[Fig. 2. Dependence of margins on load level: probability density of the prediction error for two predicted-load ranges, [87,102) and [222,237) machines.]
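The two policies just contrasted can be sketched as follows. The entry table[4] = 3 comes from the running example in the text; the remaining table entries are made-up, illustrative values.

```python
def fixed_margin(p, c=2):
    """Load-oblivious policy: a_i = p_i + c for every predicted load."""
    return p + c

def shrinkwrap(p, table):
    """Load-dependent policy: a_i = table[p_i], one entry per load level."""
    return table[p]

# table[4] = 3 encodes the margin of -1 at predicted load 4 from the text;
# the remaining entries are hypothetical contour-hugging values.
table = {1: 3, 2: 4, 3: 5, 4: 3, 5: 7}

print(fixed_margin(4), shrinkwrap(4, table))
```

The table lookup costs one entry per predicted load level but removes the coupling between load levels that forces a fixed margin to cover the worst level everywhere.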

While the above example motivates the use of load-dependent margins using a toy example, the technique is driven by real-world traces. Fig. 2 illustrates the error distribution at two different load levels (load expressed as a range of machines; the two curves) for the Wikimedia Web traces (described in Section V), with error on the X-axis and frequency on the Y-axis. As can be seen, the error distribution is not uniform across loads.

ShrinkWrap decouples the two functions of a margin mechanism. We may freely choose where we truncate the tail of the error distribution (to exploit tolerance) and wrap the margin curve around what remains. In the remainder of this section, we design a dynamic programming algorithm to obtain optimal margin costs under the ShrinkWrap approach for a given tolerance and for a given set of load-dependent error distributions. Our algorithm achieves optimal cost by using two mechanisms: (1) careful choice of where to expend the tolerance budget and (2) use of the ShrinkWrap approach. The use of two different mechanisms throws open an interesting question on the relative value of the two factors. Later, we answer this question by considering non-optimal heuristics (Section VI-C).

A. Optimal margin minimization

Before we proceed to our margin minimization algorithm, we make two observations. First, consider the cost savings that accrue by using tolerance. Consider the error distribution of load pi = 2 in Fig. 1. To achieve 100% coverage, the number of machines ai would have to be 5 machines to satisfy the maximum load possible. However, by choosing not to satisfy the response time for the extreme end-point in the distribution, we can set the ai for pi = 2 to be 4 (which is the next populated box). Thus, tolerance reduces the margin for this particular predicted load by 1 machine. Further, because the margin is used as many times as there are instances in the error distribution (eight, because there are eight dots in the shaded region), the total cost savings equals 1 × 8 machine-intervals (an occupancy metric similar to machine-hours). Machine-time is the metric our algorithm minimizes. True costs will differ by some scaling values depending on whether the saved machine-intervals were on-demand or reserved.

With the above understanding of the cost savings from tolerance, we can now define our optimization problem. Fig. 1(c) illustrates the margins that achieve the optimal cost. Compared to Fig. 1(b), where the cost savings is 38 machine-intervals (= 2 × 15 + 1 × 8), the optimal margin reduces cost by 45 machine-intervals (= 3 × 15).

We formally cast the margin minimization problem in terms of (a) the tolerance $R$ and (b) a collection of error distributions, i.e., the frequency distribution of prediction errors of load levels for each unique predicted load level (which are represented in $\mathbf{L}$ and the collection of $L$ different $\mathbf{N}_i$'s, as described below). Note that the distributions are expressed using discrete counts rather than fractional probabilities.

$\mathbf{L}$: a vector representing the unique predicted load levels.
$L$: the length of the vector $\mathbf{L}$.
$\mathbf{N}_i$: the vector of actual loads for $\mathbf{L}(i)$, sorted in ascending order of the values of the loads (in number of machines).
$N_i$: the length of the vector $\mathbf{N}_i$.
$R$: the tolerance (in number of intervals).

To illustrate the above definitions with an example, we use the scenario shown in Fig. 1. For that example, the vector of unique predicted load levels is $\mathbf{L} = \langle 1, 2, 3, 4, 5 \rangle$; thus the length $L$ is 5. The vector $\mathbf{N}_2$ for predicted load $x = 2$ (corresponding to the highlighted column in Fig. 1) is $\langle 1, 1, 2, 2, 2, 2, 4, 5 \rangle$.
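Constructing $\mathbf{L}$ and the $\mathbf{N}_i$ vectors from a trace of (predicted, actual) load pairs is a straightforward grouping step. The sketch below rebuilds the Fig. 1 column for predicted load 2 from a toy trace; the trace itself is hypothetical.

```python
from collections import defaultdict

def build_error_distributions(trace):
    """Group a trace of (predicted, actual) load pairs into the vector L of
    unique predicted levels and, for each level, the ascending vector N_i of
    actual loads observed at that level (one entry per time interval)."""
    by_pred = defaultdict(list)
    for predicted, actual in trace:
        by_pred[predicted].append(actual)
    L = sorted(by_pred)
    N = [sorted(by_pred[p]) for p in L]
    return L, N

# Toy trace whose predicted-load-2 column matches N_2 = <1,1,2,2,2,2,4,5>.
trace = [(2, a) for a in (1, 1, 2, 2, 2, 2, 4, 5)] + [(1, 1), (3, 3)]
L, N = build_error_distributions(trace)
print(L, N[L.index(2)])
```

Keeping counts (one entry per interval) rather than normalized probabilities matches the discrete-count convention used by the dynamic program.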

We define a matrix $P$ (our dynamic programming matrix) of dimensions $L \times R$, where $L$ and $R$ are as defined above. We define $P(i, j)$ as the maximum cost savings (compared to a zero-tolerance design) when considering the first $i$ unique loads and using a tolerance of $j$ time intervals. With this definition, the bottom-up computation implicit in dynamic programming can be specified in terms of the initial conditions (to define the boundary conditions) and the recurrence (to bootstrap solutions to bigger problems in terms of solutions to smaller problems).

For initialization, consider the first row of the $P$ matrix. Because it deals with a single error distribution, the cost savings for various tolerances is computed using a similar process as described earlier in this section. In general, a tolerance of $j$ implies we can afford to lop off the top $j$ elements in the error distribution of the first load (i.e., elements $\mathbf{N}_1(N_1)$ through $\mathbf{N}_1(N_1 - j + 1)$) and determine the number of machines based on what remains in the distribution. Recall that the number of machines is also scaled by the number of time intervals $N_1$ to count savings over all $N_1$ intervals.

The initializations corresponding to the above intuition are shown in Equations 1 and 2. There are two cases to handle corner cases such as the tolerance exceeding the number of intervals in the error distribution (second choice in Equation 1) and vice versa (first choice in Equation 1 and all of Equation 2).

Initialization:

Case 1: |N_1| ≤ R

    P(1, j) = [N_1(|N_1|) − N_1(|N_1| − j)] × |N_1|,  where j < |N_1|
    P(1, j) = N_1(|N_1|) × |N_1|,                     where |N_1| ≤ j ≤ R     (1)

Case 2: |N_1| > R

    P(1, j) = [N_1(|N_1|) − N_1(|N_1| − j)] × |N_1|                           (2)

To bootstrap solutions to bigger problem sizes, we note that the optimal solution for an arbitrary P(i, j) must necessarily be one of the following exhaustive set of possibilities. The first possibility is that none of the tolerance budget is spent on N_i, which implies that all of the tolerance is spent on the earlier error distributions (i.e., P(i − 1, j)). The second possibility is that exactly one interval of the tolerance budget is spent on N_i, which implies that all-but-one of the tolerance is spent on the earlier error distributions (i.e., P(i − 1, j − 1)). And so on. The set of possibilities terminates when either we run out of tolerance budget (i.e., j, as in Equation 3, or R, as in Equation 5) or we run out of intervals in the error distribution (i.e., |N_i|, as in Equation 4). Because we can evaluate the cost savings from the known solutions for smaller problem sizes and from our knowledge of how tolerance affects cost savings in a single error distribution, we can exhaustively compare all choices to pick the optimal choice.

Bootstrapping, for 2 ≤ i ≤ |L|:

Case 1: |N_i| ≤ R

if j < |N_i|,

    P(i, j) = max { P(i − 1, j),
                    P(i − 1, j − 1) + [N_i(|N_i|) − N_i(|N_i| − 1)] × |N_i|,
                    ...,
                    P(i − 1, 0) + [N_i(|N_i|) − N_i(|N_i| − j)] × |N_i| }      (3)

if |N_i| ≤ j ≤ R,

    P(i, j) = max { P(i − 1, j),
                    P(i − 1, j − 1) + [N_i(|N_i|) − N_i(|N_i| − 1)] × |N_i|,
                    ...,
                    P(i − 1, j − |N_i|) + N_i(|N_i|) × |N_i| }                 (4)

Case 2: |N_i| > R

    P(i, j) = max { P(i − 1, j),
                    P(i − 1, j − 1) + [N_i(|N_i|) − N_i(|N_i| − 1)] × |N_i|,
                    ...,
                    P(i − 1, 0) + [N_i(|N_i|) − N_i(|N_i| − j)] × |N_i| }      (5)


[2012 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), New Brunswick, NJ, USA, April 1–3, 2012]

Once the recurrence computation is complete and the matrix P is fully populated, the entry P(|L|, R) provides the maximum cost savings possible. Since we are interested in the choice of time intervals that are tolerated in the optimal-cost configuration (and not only in the cost savings from such a configuration), we must introduce auxiliary data structures to remember our optimal choices in the max operator. Because such auxiliary data structures can be added in a fairly mechanical manner, we omit the details.

Complexity. The above algorithm must populate |L| × R elements, performing a maximum of R computations per element. Therefore, the worst-case complexity is O(|L|R²). Note that the R term introduces a pseudopolynomial element because the complexity is expressed in terms of the value of R, whereas the input R is provided in log R bits. However, practical values of R are small because R is typically 1% of the time intervals. In practice, the algorithm runs in seconds while analyzing fairly large distributions.
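The recurrence above can be sketched in code. The following is a minimal illustration (the function name, variable names, and list-based data layout are ours, not the paper's implementation); it returns only the optimal savings P(|L|, R) and omits the auxiliary structures that record which intervals were chosen:

```python
def shrinkwrap_opt(error_dists, R):
    """Sketch of the ShrinkWrap-opt dynamic program.

    error_dists: list of |L| vectors; error_dists[i] holds the actual loads
    (machine counts) observed for the i-th unique predicted load level,
    sorted ascending (the N_i vectors). R: tolerance in time intervals.
    Returns the maximum cost savings versus a zero-tolerance design.
    """
    def savings(d, k):
        # Savings from lopping off the top k intervals of distribution d:
        # the per-interval machine reduction, scaled by len(d) intervals.
        n = len(d)
        if k >= n:
            return d[-1] * n            # entire distribution lopped off (Eq. 4)
        return (d[-1] - d[n - 1 - k]) * n

    L = len(error_dists)
    # P[i][j]: best savings using the first i+1 distributions, tolerance j.
    P = [[0] * (R + 1) for _ in range(L)]
    for j in range(R + 1):              # initialization (Eqs. 1 and 2)
        P[0][j] = savings(error_dists[0], j)
    for i in range(1, L):               # bootstrapping (Eqs. 3-5)
        d = error_dists[i]
        for j in range(R + 1):
            # Spend k intervals of the budget on distribution i and the
            # remaining j-k on the earlier distributions; keep the best.
            P[i][j] = max(P[i - 1][j - k] + savings(d, k)
                          for k in range(min(j, len(d)) + 1))
    return P[L - 1][R]
```

As the complexity discussion notes, the nested loops touch |L| × R cells with up to R candidates each, giving the O(|L|R²) worst case.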

IV. MITIGATING TRUE COSTS

Consider the well-known tradeoffs of reserved vs. on-demand instances. If expected utilization is low, reserved instances incur unnecessary fixed costs for the entire duration, whereas on-demand achieves lower costs by paying a small premium to avoid the fixed costs. In contrast, at high utilization, the on-demand premium is unnecessarily incurred for the entire duration, making it more attractive to use reserved instances. There exists a break-even utilization ratio where the cost of a reserved instance equals the cost of an on-demand instance.

The above intuition can be quantified in the context of the utilization of a single server. Consider an ideal case where machine startup/shutdown in the cloud is instantaneous. For a server that is used for a fraction f of the time, the aggregate costs of a reserved instance (C_rs) and an on-demand instance (C_od) are given by C_rs = c_fix + c_op × f and C_od = c_od × f, respectively. Equating C_rs and C_od, we can solve for the break-even utilization ratio f_0 as f_0 = c_fix/(c_od − c_op). At utilization ratios higher (lower) than f_0, it is cheaper to use a reserved (on-demand) machine instance. Note that for the Amazon EC2 pricing structure shown in Table II, f_0 is approximately 0.47.
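The break-even derivation can be checked numerically with the Table II tariffs (variable names are ours):

```python
# Break-even utilization f0 where reserved and on-demand costs are equal:
#   c_fix + c_op * f = c_od * f   =>   f0 = c_fix / (c_od - c_op)
c_od, c_fix, c_op = 0.680, 0.208, 0.240   # $/hr, Amazon EC2 tariffs (Table II)

f0 = c_fix / (c_od - c_op)
# Above f0, reserve the machine; below f0, rent it on demand.
print(round(f0, 2))   # 0.47
```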

Optimizing true costs. Extending the above analysis to a collection of machines subjected to varying loads, we make two observations. First, a collection of machines can be imagined to have varying utilization by using the spatial-variation view described in Section I. Consider an ordered collection of k machines <m1, m2, m3, ..., mk>. Imagine that incoming server requests are routed in the specific machine order such that requests spill to machine mi only after all machines mj (j < i) are at capacity¹. Fig. 3(a) illustrates the application of the above model for an example load trace in which the

¹This is purely an academic exercise. We will not use such strict machine-by-machine ordered load allocation in practice.

curve plots the load (Y-axis) over N discrete time intervals (X-axis). The load-allocation model discussed above assumes that, in any interval, machine i serves all requests corresponding to the load in the semi-closed interval (i − 1, i]; because of this, the Y-values of the curve are all integers. As shown in Fig. 3(a), the utilization ratio (f_x) of an arbitrary xth machine is the ratio of the sum of the widths of the dark-shaded areas to the width of the entire duration of the trace (i.e., N). Note that the utilization of the ith machine is less than the utilization of the jth machine if i > j because of the way the allocation model works; there cannot be a time interval where the ith machine is utilized but the jth machine is not.

Second, the utilization of the ith machine can also be interpreted as the fraction of time for which the load is at least i machines. Graphically, an equivalent statement would be that the height of the curve in the dark-shaded region in Fig. 3(a) is at least i, which is obviously true. Consequently, a cumulative distribution that plots load levels (on the X-axis) against the fraction of time intervals that meet or exceed that load level (on the Y-axis) is equivalent to a curve that plots the utilization ratio (Y-axis) of the ith machine (X-axis) under our load-allocation model. Fig. 3(b) shows the load distribution (the number of intervals with load x for each load level) of Fig. 3(a); Fig. 3(c) shows the cumulative distribution. Such a CDF starts with a value of 1 at a load of zero machines, since the entire duration obviously has a load of zero or higher. Further, the curve is monotonically decreasing, with an eventual value of zero beyond the maximum load. Later, in Section V, we show the CDFs of real server traces.

The above two observations directly lead to the design of our optimal-cost commitment straddling configuration. The first observation answers the question of what the optimal-cost configuration is. If the utilization of the ith machine is known, then the cost-optimal configuration is to reserve n machines such that the utilization of the nth machine is no less than f_0 (the break-even ratio) and the utilization of the (n + 1)th machine is less than f_0. The load can then be served on reserved machine instances to the extent possible, and on on-demand machines for loads that exceed the capacity of the reserved machines.

The second observation tells us how such an optimal-cost commitment straddling configuration can be constructed in a practical way. The example in Fig. 3 assumed perfect knowledge of future load levels to obtain the utilization curve. In contrast, the equivalence of the CDF to the machine-utilization curve implies that we only need to know the load frequency distribution of the future load to construct the utilization curve. We outline the two-step constructive method to develop the optimal-cost configuration below.

• Using the known model of the load frequency distribution (as shown in Fig. 3(b)), construct the cumulative distribution function that maps load X to the fraction of time the load is expected to be at least X machines (as shown in Fig. 3(c)).

• Let the point where the above curve intersects the horizontal line defined by y = f_0 be (x_opt, f_0). Recall, f_0 is the break-even utilization ratio. A commitment



Fig. 3. True cost savings: (a) time-varying loads; (b) load distribution; (c) CDF of load distribution, with the break-even line f_0 = 0.47.

TABLE II
AMAZON EC2 TARIFFS ON OCTOBER 7, 2011

Symbol         c_od    c_rs    c_fix   c_op
Price ($/hr)   0.680   0.448   0.208   0.240

straddling configuration that reserves x_opt machines and uses on-demand machines for the remainder is the optimal-cost configuration. The proof is trivial from the above discussion because, by our method of construction, all machines beyond the x_opt-th machine have utilization lower than f_0 and all x_opt reserved machines have a utilization of at least f_0. For brevity, we refer to the optimal-cost commitment straddling configuration as the straddle configuration in the remainder of this paper.
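The two-step constructive method can be sketched as follows, assuming the load frequency distribution is given as a histogram (the function name and histogram layout are illustrative, not from the paper):

```python
def straddle_reserved_count(load_counts, f0):
    """Number of machines to reserve under commitment straddling.

    load_counts[x] = number of time intervals with a load of exactly x
    machines (x = 0 .. max load); f0 = break-even utilization ratio.
    Machine i's utilization equals the fraction of intervals whose load
    is at least i, so we reserve the largest x whose utilization is
    still >= f0; any load beyond that is served on demand.
    """
    total = sum(load_counts)
    at_least = 0          # intervals with load >= x, accumulated top-down
    for x in range(len(load_counts) - 1, 0, -1):
        at_least += load_counts[x]
        if at_least / total >= f0:
            return x      # machines 1..x all clear the break-even bar
    return 0              # no machine clears f0: all-on-demand is optimal
```

For example, with `load_counts = [2, 2, 4, 2]` (10 intervals total), machine 2's utilization is 6/10 = 0.6 and machine 3's is 2/10 = 0.2, so with f0 = 0.47 the method reserves 2 machines.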

One may draw the following conclusions from the above analysis. For Amazon EC2 cost parameters, the only way for an all-reserved configuration to be cost-optimal is if the machine with the lowest utilization achieves utilization higher than the break-even ratio f_0 = 0.47. This implies that the peak load must be sustained for nearly half the time for the all-reserved configuration to be cost-optimal. Similarly, the only way for an all-on-demand configuration to be cost-optimal is if the machine with the highest utilization (i.e., m1 in our machine-by-machine load-allocation strategy) has a utilization ratio lower than f_0 = 0.47. For Amazon EC2 cost parameters, this implies that a load must have more than 53% idle time for the all-on-demand configuration to be cost-optimal.

Finally, we note that once the straddle configuration is finalized, there is no need to allocate requests in a strict machine-by-machine order as assumed in the conceptual analysis. It is adequate to make sure that incoming server requests are served on reserved machines before being farmed out to on-demand machines. Within the reserved machines, we may vary the load assignment based on other considerations such as wear-leveling, load balance, and so on.

V. EXPERIMENTAL METHODOLOGY

We use an in-house trace-driven simulator that models a cloud vendor as seen by cloud clients. Our simulator assumes that on-demand machine instances can be started up in 10 minutes. This includes the queuing delay while a machine-startup request waits in the cloud vendor's request queue. As mentioned earlier, Amazon EC2 queue-time guarantees are indeed loosely specified as "typically under 10 minutes" [3]. Further, we mimic Amazon EC2's minimum rental granularity of one hour (except where specifically modified to study the impact of rental granularity).

Our simulator models the costs for on-demand and reserved machines based on Amazon EC2 (see Table II). Specifically, we use on-demand instance costs directly. We use the costs of a "reserved instance" with a commitment period of 1 year for the reserved machines. In both cases, we use numbers from the "Extra Large" instance. The reserved-instance costs include an up-front fixed cost, which we converted to an hourly cost as described earlier in Section II. Based on the above decisions, our normalized cost/hr ratios for on-demand (c_od), active reserved (c_rs = c_fix + c_op), and inactive reserved machines (c_fix) were 1, 0.66, and 0.31, respectively (see Table I). Our sensitivity studies (discussed briefly in Section VI-C) reveal that varying the ratios does not significantly alter our results.

We use the Allen-Cunneen approximation formula [7], [8] for the GI/G/m model to obtain the mean response time of the requests in each 10-minute interval of the traces. A GI/G/m queue models an m-server queuing system serving requests with general arrival and service time distributions. In the Allen-Cunneen approximation, the mean response time is the sum of the mean service time and the average waiting time, as follows:

    W = 1/µ + [P_m / (µ(1 − ρ))] × (C_A² + C_S²)/(2m)      (6)

where W is the mean response time, µ is the mean service rate of a server, λ is the mean request arrival rate, ρ = λ/(µm) is the average utilization of a server, and m is the number of servers available to serve the requests. P_m = ρ^((m+1)/2) for ρ ≤ 0.7 and P_m = (ρ^m + ρ)/2 for ρ > 0.7; C_A and C_S are the coefficients of variation of request inter-arrival times and service times, respectively.
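A direct transcription of Eq. 6 (function and parameter names are ours) is the following; note that for m = 1 and C_A = C_S = 1 it reduces to the exact M/M/1 response time 1/(µ(1 − ρ)):

```python
def allen_cunneen_w(lam, mu, m, ca, cs):
    """Mean response time per Eq. 6 (Allen-Cunneen GI/G/m approximation).

    lam: mean arrival rate; mu: per-server service rate; m: number of
    servers; ca/cs: coefficients of variation of inter-arrival and
    service times. Requires rho = lam/(mu*m) < 1 for a stable queue.
    """
    rho = lam / (mu * m)                # average server utilization
    if rho <= 0.7:
        pm = rho ** ((m + 1) / 2)
    else:
        pm = (rho ** m + rho) / 2
    wait = pm / (mu * (1 - rho)) * (ca**2 + cs**2) / (2 * m)
    return 1 / mu + wait                # mean service time + mean waiting time
```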

All our traces (except for one, detailed in the next section) have at least the following fields: requesting host, timestamp of the request arrival time, request URL, HTTP reply code, and size (in bytes) of the reply. Hence we can measure λ and C_A directly from the traces. We assume the service



TABLE III
CHARACTERISTICS OF THE WEB TRACES

Trace            Year   Length (days)   Avg-to-Peak Ratio
Purdue           2010   156             0.0690
Clarknet [9]     1995   14              0.5148
NASA [9]         1995   62              0.1551
UCBerkeley [9]   1996   18              0.5656
Wikimedia [10]   2010   92              0.4567

TABLE IV
NUMBER OF RESERVED MACHINES

Trace        FM    ShrinkWrap-opt
Purdue       157   146
Clarknet     253   225
NASA         264   240
UCBerkeley   254   247
Wikimedia    195   191

time depends on the amount of requested data. For requests of less than 100KB of data, the service time is 3 ms; the service time then increases linearly with a slope of 3 ms for every additional 100KB, up to a maximum of 6 ms. The number of requests for data larger than 200KB was extremely small in our traces. These assumptions ensure that most requests (<200KB) can be served under our target response time, which is fundamentally required for any feasible solution. We use these assumptions to compute C_S. In our experiments, we assume the target response time (target W) is 6 ms, consistent with [6], [4].

We use formula (6) to analyze the traces for provisioning and to help evaluate the performance of the provisioning policies in the experiments. First, in ShrinkWrap and straddle, we set W = 6 ms to compute the number of servers required to satisfy the target response time for a given load. Second, for every 10-minute interval in the simulation, we use the formula to compute the response time (W) in the past interval, where m is the number of active servers during that time. Because our provisioning policies launch/shut down servers every 10 minutes, m remains constant throughout a single interval. We record whether the response time is less than the 6 ms target for an interval and report the percentage of all intervals where the target was met.
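The first use of the formula, mapping a load to a server count, amounts to inverting Eq. 6: since W decreases as servers are added, one can scan m upward until the target is met. A minimal sketch (our own names; the paper does not specify its search procedure):

```python
def min_servers(lam, mu, ca, cs, target_w):
    """Smallest m whose Allen-Cunneen mean response time (Eq. 6) meets target_w.

    lam: arrival rate; mu: per-server service rate; ca/cs: coefficients of
    variation of inter-arrival and service times; target_w: response-time goal.
    """
    if 1 / mu >= target_w:
        raise ValueError("target unreachable: service time alone exceeds target")
    m = max(1, int(lam / mu) + 1)          # smallest m giving rho < 1
    while True:
        rho = lam / (mu * m)
        pm = rho ** ((m + 1) / 2) if rho <= 0.7 else (rho ** m + rho) / 2
        w = 1 / mu + pm / (mu * (1 - rho)) * (ca**2 + cs**2) / (2 * m)
        if w <= target_w:
            return m                       # W shrinks monotonically with m
        m += 1
```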

Workloads and Configurations. We use five different traces (see Table III) to drive our simulator. We use two newer traces: one obtained from a subset of server logs of Purdue University's College of Engineering website, which includes the Web presence of 10+ academic departments (05/26/2010 through 10/26/2010); and the other from the Wikimedia group of websites for 3 months (06/15/2010 through 09/15/2010). The Wikimedia trace is derived from publicly posted request statistics [10], in which the aggregated load level (in requests per second) over intervals of 10 minutes is reported in graphical format. Because our request for raw data went unanswered, we extracted the data using graph-data interpretation software (Engauge Digitizer v4.1 [11]). We estimate that errors are under 0.5% of true load levels. We further modeled the coefficient of

Fig. 4. CDF of load distribution of the Web traces (X-axis: Load/MaxLoad; Y-axis: fraction of time intervals with a higher than x load ratio; break-even line f_0 = 0.47).

variation of inter-arrival times and the distribution of requested data size after the most recent trace we have (Purdue). The three remaining traces (Clarknet, UC Berkeley, NASA) are relatively old [9]. Because these older traces have very low load, which can be served on a very small number of modern servers, we scaled the inter-arrival times uniformly to reach 60,000 requests per second, which is the average load of the Wikimedia trace. The relative load levels in the original traces are unaffected. The request sizes were assumed to have the same distribution as in the original traces.

Fig. 4 (similar to the cumulative distribution curves in Fig. 3(c)) illustrates how we use the CDFs to compute the number of reserved machines for each of the traces. The curves plot true load distributions and assume no margin. The X-axis represents the load relative to each trace's peak load, where the peak load corresponds to 1. Thus the X-axis cannot be compared across curves in an absolute sense. The Y-axis represents the fraction of time when the load ratio is higher than x, where x = [0, 1]. The horizontal break-even line intersects a distribution curve at a point whose ratio indicates the optimal number of machines divided by the maximum number of machines needed. In our experiments, we compute either the fixed margin or our margin table and then add the corresponding margins to the loads before applying the break-even ratio, while assuming the ideal 10-minute granularity. Table IV shows the number of reserved machines used for each trace for the FM and ShrinkWrap-opt configurations. The numbers for FM are consistently higher because FM needs a higher margin and thus more machines.

Load Prediction. We use an autoregressive moving-average model for workload prediction. The model is of the form:

    Y_t = Σ_{i=1}^{p} a_i Y_{t−i} + Σ_{i=1}^{q} c_i ε_{t−i} + ε_t      (7)

where Y_{t−1}, ..., Y_{t−p} are previous output values, ε_{t−1}, ..., ε_{t−q} are white-noise disturbance values, and a_i and c_i are parameters obtained by training the model with the traces. The parameters p and q are the orders of the autoregressive and moving-average terms, respectively. We use p = 4 and q = 2 in the experiments. Higher orders result in significantly diminishing returns (i.e., little impact on prediction errors)



Fig. 5. Total costs with 1% tolerance: (a) 1-hour rental granularity; (b) 10-minute rental granularity. For each trace (and the geometric mean), subgroups compare FM (Fixed Margin) and SO (ShrinkWrap-opt) under the All-Reserved, All-On-Demand, and Straddle policies, with each bar split into C_true, C_margin, and C_overhead, normalized to (FM, All-Reserved).

for our traces. The load predictor was trained for each trace with the same full trace. This is conservative because better predictors need smaller margins. Using any less-trained (i.e., less accurate) predictor may yield a higher margin in the base case, thus increasing our opportunity.
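For concreteness, a one-step forecast under Eq. 7 can be sketched as follows, with the unknown future noise ε_t taken as zero; the function name and calling convention are illustrative, and in practice the coefficients come from training (p = 4, q = 2 as above):

```python
def arma_predict(history, errors, a, c):
    """One-step ARMA(p, q) load forecast per Eq. 7.

    history: recent observed loads Y, newest last (needs >= len(a) entries);
    errors: recent one-step prediction errors epsilon, newest last
    (needs >= len(c) entries); a, c: trained AR and MA coefficients,
    a[0] multiplying Y_{t-1}, c[0] multiplying epsilon_{t-1}.
    """
    p, q = len(a), len(c)
    y = sum(a[i] * history[-1 - i] for i in range(p))   # autoregressive part
    y += sum(c[i] * errors[-1 - i] for i in range(q))   # moving-average part
    return y                                            # E[epsilon_t] = 0
```

In a provisioning loop, the predicted load would then be augmented by the (selective) margin before computing the server count.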

VI. RESULTS

The primary results of our evaluation are as follows.

1) Margin-cost reduction: ShrinkWrap-opt is the best practical margin-minimization policy, achieving 38% lower margin costs.

2) True-cost savings: The straddle configuration achieves, on average, 27% and 21% lower true cost than the all-reserved and all-on-demand configurations, respectively, while achieving the same (or better) satisfaction ratios.

3) Taken together, the two techniques yield cost reductions between 13% and 29% (21% on average).

In addition to the above primary results, Section VI-C presents additional results.

A. Total Cost Savings

In this section, we evaluate the total cost savings assuming 1% tolerance. Fig. 5 plots the total cost (Y-axis) for each of the traces (and the geometric mean) assuming 1-hour rental granularity (Fig. 5(a)) and 10-minute rental granularity (Fig. 5(b)). For each trace, we include two subgroups of bars (one for FM and another for ShrinkWrap-opt) with three bars in each subgroup (one for each of the three commitment policies). Each bar is subdivided into subbars to indicate true cost (the cost incurred by the fraction of servers that were actively serving requests), margin cost (the cost of servers that were active but did not have requests to serve), and overheads (the cost of machines beyond the margin that exist solely because they cannot be shut down due to rental granularity).

The following four observations can be made from the graph. First, ShrinkWrap-opt provides total cost savings over FM across all machine-acquisition policies because margin costs are always incurred (either as unnecessary operational costs on reserved machines or unnecessary rental costs on on-demand machines). However, in the remainder of this section, we focus on the straddle policy because it minimizes true costs (as shown later). Second, on average, ShrinkWrap-opt reduces the margin costs by 38% over FM in the practical case with 1-hour rental granularity. With 10-minute rental granularity, the margin-cost reduction is 42%. Third, ShrinkWrap-opt increases the overhead in the 1-hour granularity configuration because it attempts to shut down machines more frequently than FM. When these attempts fail because of the minimum rental granularity, the machines contribute to overhead costs by 7%. Effectively, the rental granularity prevents some of the efficiency of ShrinkWrap-opt from translating into cost reductions. Finally, the incremental total cost reduction of ShrinkWrap-opt over FM for straddle is 7%. The absolute total cost reduction of both straddle and ShrinkWrap-opt over the all-reserved configuration with FM is 21%.

We have also evaluated ShrinkWrap-opt with 5% tolerance and seen reduced benefits (10% margin reduction, 5% overall reduction; not shown).

B. Commitment Straddling

To focus on true costs, we assume that the loads can be perfectly predicted for all configurations, thus eliminating margin costs.

Fig. 6 plots the true cost (Y-axis) of each of the configurations (individual bars within groups), normalized to that of the all-reserved configuration, for each of the Web traces (groups of bars on the X-axis) for the 1-hour rental granularity (Fig. 6(a)) and the 10-minute rental granularity (Fig. 6(b)) cases. Further, each bar illustrates the breakdown of on-demand costs and reserved costs (the latter being the sum of the two sub-bars: fixed costs and operational costs). Conservatively, we let the all-reserved configuration achieve a 99% satisfaction ratio, whereas both all-on-demand and straddle achieve 100% coverage. In each graph, we include one additional group of bars for the geometric mean across all Web traces.

From Fig. 6(a) and Fig. 6(b), we observe that straddle uniformly achieves the least-cost configuration across all traces and across the different rental granularities. With the 1-hour granularity, the true cost savings of the straddle configuration are 27% and 21% over all-reserved and all-on-demand, respectively. With the 10-minute rental granularity, the cost savings of straddle are 26% and 17%,



Fig. 6. True costs with commitment straddling: (a) 1-hour rental granularity; (b) 10-minute rental granularity. For each trace (and the geometric mean), bars a/b/c compare All-Reserved, All-On-Demand, and Straddle, normalized to All-Reserved, with each bar split into C_fix, C_op, and C_od.

respectively. Note that, in this case, the 1-hour granularity hurts the base cases (all-reserved and all-on-demand) more than our design. Consequently, our cost savings are higher with 1-hour granularity than with 10-minute granularity.

C. Other results

This section briefly summarizes the results from several auxiliary studies to enhance the understanding of our techniques.

a) Sensitivity studies: (1) Our ShrinkWrap-opt configuration always achieves the lowest cost, regardless of the target tolerance (ranging from 0.90 to 0.99), for all traces. (2) When varying the relative ratio of reserved cost to on-demand cost (from 0.6 to 0.8), the gap with respect to all-on-demand decreases with an increasing c_rs/c_od ratio because reserved costs approach on-demand costs. The converse was true for the gap with respect to all-reserved.

b) Dynamic programming vs. heuristics: We compare our optimal dynamic programming algorithm in ShrinkWrap-opt with heuristic algorithms such as a greedy algorithm, which successively chooses the time intervals (to omit from service) that yield the maximum cost savings. The results show that the advantage of using an optimal algorithm is limited, especially when the tolerance is very low (due to an Amdahl's-law effect). This implies that ShrinkWrap's table-lookup-based technique contributes more to the cost savings than the dynamic programming approach. However, there is no reason to prefer heuristics, because the execution time of the optimal algorithm is negligible and the algorithm is not run frequently.

c) Inaccurate statistical models: Both of our techniques leverage accurate statistical models: the load frequency distribution (for commitment straddling) and the error distributions (for ShrinkWrap-opt). While developing such workload models is not the focus of our study (rather, our focus is on leveraging such models to reduce costs), we conducted an experiment to study the impact of inaccurate models. We used the distributions obtained from a partial trace as a predictor for the remainder of the trace. In the case of the Wikimedia trace, we captured 5% of cost improvement out of a maximum possible opportunity of 14.5%. Our short traces limited the accuracy of our models; nevertheless, this serves as a lower bound on the cost reduction from our techniques.

VII. RELATED WORK

There has been some work in predicting the demand ofenterprise applications [4], [5] by using various techniquesincluding pattern recognition and feedback control theory. Weuse one such prediction mechanism in our base case. Becauseperfect load prediction is unlikely, the use of margins tohandle prediction error is likely to remain. Also note thatour technique to reduce true costs via commitment straddlingwill remain useful even in the unlikely event of perfect loadprediction.

There are a number of techniques that target operational costs (with a focus on server power, cooling power, or some combination of the two) in data centers [6], [12], [4], [13], [14], [15], [1]. At a high level, shutting idle servers down for power savings is similar in spirit to shutting down machine instances to save cloud-user cost. Our comparison with the fixed-margin configuration covers such techniques, since it is similar to spatial subsetting [12] with the autoregressive moving-average-model-based load prediction [4] and a fixed margin similar to SurgeGuard [6].

Recent work by Hajjat et al. [16] examines ways to optimize the cost of migrating enterprise computation to the cloud while bounding performance degradation. Unlike our techniques, their model does not consider dynamic workloads. Bodik et al. [17] explore practical automatic resource allocation in data centers by applying statistical machine-learning techniques to performance modeling. Their techniques, which estimate the number of machines required to serve the predicted workload, can be used in conjunction with our work to achieve better resource provisioning.

VIII. DISCUSSION

One concern is how well our technique may adapt when the statistical properties of a server workload change. Such retraining must balance the twin concerns of (1) being responsive to changes in behavior and (2) being resistant to glitches and noisy behavior. For example, one simple approach is to place the margin-minimization technique in a feedback loop where the margins for any given week are computed using the error distribution of the immediately prior N weeks.

Though our paper talks about migrating server workloads tothe cloud in their entirety, the analysis can be easily modified



for alternative situations where organizations provision captive data centers for a base load² and spill excess load to the public cloud. One simple option is to subtract the base load and provision cloud resources only for the excess load, using our techniques to minimize margin/true costs. Another option is to treat the private cloud as a "commitment level" with its own cost structure and use our technique to provision both the private-cloud and public-cloud components.

One concern that our solution does not address is the "flash crowd" phenomenon: unanticipated extreme load surges tied to specific popular events that attract widespread attention. While our technique handles the bursty load behavior typical of day-to-day operation, "flash crowds" must be handled separately because (a) the cost of margins becomes very high if we attempt to handle such corner cases, and (b) no margin can guarantee flash-crowd resilience because, by definition, there is no upper limit on either the load level or the rate of change of load in a flash crowd. Since such unexpected spikes cannot be proactively handled without massive over-provisioning, reactive, large-scale rampup must be used. Depending on the system rampup time of the cloud services, some requests will inevitably be denied until new servers are launched. However, without the elasticity of the cloud, even large-scale reactive rampup would not be possible.

Another case where the system rampup time matters is when a datacenter encounters a local meltdown. Such a case, like a flash crowd, can only be handled reactively. However, cloud users are usually isolated from these problems because cloud vendors handle them. For example, the Amazon EC2 SLA commits to 99.95% availability for each region. Reactively launching new servers in another datacenter upon unexpected shutdowns is reasonable because the probability of such events is low (0.05%).

IX. CONCLUSIONS

Cost remains a significant barrier for adoption of cloudcomputing for ongoing computing operations (as opposed toepisodic computing demands). IaaS cloud operations incur twotypes of costs when serving variable workloads. They incurmargin costs to handle uncertainty of load and also true coststo serve requests. This paper addresses both costs optimally,given statistical properties of the workload.

To address margin costs, we develop ShrinkWrap-opt,which combines two key innovations. First, based on ourobservation that margin requirements differ according to load,ShrinkWrap avoids the one-size-fits-all approach to marginsand instead uses selective, load-dependent margins, thus re-ducing wastage. Second, we develop a dynamic programmingalgorithm that optimally “spends” its tolerance budget tominimize margin costs.

²The motivation for a private cloud may be a combination of multiple factors such as (1) performance due to proximity to users, (2) cost, and (3) privacy and security policies. We are only concerned with how such a private cloud affects the load that spills to the public cloud.

To address true costs, we exploit the various commitment levels offered by cloud vendors to show that the optimal-cost configuration requires commitment straddling – the selective use of both reserved and on-demand servers.
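The intuition behind commitment straddling is a per-server break-even test: the k-th server is worth reserving only if it is busy often enough that the reserved rate beats paying on-demand for those hours. The sketch below is a minimal illustration under a simplified commitment model (the amortized upfront fee and the reserved hourly rate are paid for every hour of the period, busy or not) with made-up prices; it is not Amazon's actual pricing or the paper's exact formulation.

```python
def straddle(demand, on_demand_rate, reserved_hourly, upfront_per_hour):
    """demand: servers needed in each hour of the planning period.
    Returns (number of servers to reserve, total cost of the straddled mix)."""
    hours = len(demand)
    reserved = 0
    cost = 0.0
    k = 1
    while True:
        busy = sum(1 for d in demand if d >= k)   # hours the k-th server is active
        od_cost = busy * on_demand_rate           # pay-as-you-go for those hours
        rsv_cost = hours * (reserved_hourly + upfront_per_hour)  # commit for whole period
        if rsv_cost < od_cost:
            reserved += 1                         # k-th server is busy enough: reserve it
            cost += rsv_cost
            k += 1
        else:
            # remaining demand above the reserved pool is served on demand
            cost += sum(max(d - reserved, 0) for d in demand) * on_demand_rate
            return reserved, cost
```

Because the busy fraction of the k-th server only shrinks as k grows, the loop stops at the first server that fails the break-even test; servers below that point are reserved and the residual peaks above it run on demand.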

While the proofs of optimality of both the above techniques are valid only under ideal conditions, the techniques do work well in practical conditions. Simulations using real workload traces and a real cloud pricing model (Amazon EC2) reveal that combining the two techniques yields 21% cost savings (on average) compared to the baseline configurations. Specifically, our results show that as much as 14.5% cost reduction is possible for Wikimedia.

ACKNOWLEDGMENT

We thank the anonymous reviewers and our shepherd, Ken Barr, for their feedback. This work is supported in part by the National Science Foundation (Grant no. CCF-0644183).

REFERENCES

[1] B. Urgaonkar, P. Shenoy, and T. Roscoe, “Resource overbooking and application profiling in shared hosting platforms,” ACM SIGOPS Oper. Syst. Rev., vol. 36, no. SI, pp. 239–254, 2002.

[2] M. Armbrust et al., “Above the clouds: A Berkeley view of cloud computing,” EECS Department, University of California, Berkeley, CA, Tech. Rep. EECS-2009-28, 2009.

[3] (2011, April) Amazon EC2 instance launch time. [Online]. Available: http://aws.amazon.com/ec2/faqs/#How quickly will systems be running

[4] Y. Chen et al., “Managing server energy and operational costs in hosting centers,” in Proc. ACM Int. Conf. on Measurement and Modeling of Computer Systems, Banff, Alberta, Canada, 2005, pp. 303–314.

[5] D. Gmach et al., “Workload analysis and demand prediction of enterprise data center applications,” in Proc. IEEE Int. Symp. on Workload Characterization, Boston, MA, 2007, pp. 171–180.

[6] F. Ahmad and T. N. Vijaykumar, “Joint optimization of idle and cooling power in data centers while maintaining response time,” SIGPLAN Not., vol. 45, no. 3, pp. 243–256, 2010.

[7] A. O. Allen, Probability, Statistics, and Queueing Theory with Computer Science Applications. San Diego, CA: Academic Press Professional, Inc., 1990.

[8] G. Bolch et al., Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. New York, NY: Wiley-Interscience, 1998.

[9] The Internet traffic archive. [Online]. Available: http://ita.ee.lbl.gov/html/traces.html

[10] Wikimedia statistics. [Online]. Available: http://meta.wikimedia.org/wiki/Statistics

[11] Engauge digitizer, v4.1. [Online]. Available: http://digitizer.sourceforge.net

[12] J. S. Chase et al., “Managing energy and server resources in hosting centers,” in Proc. ACM Symp. on Operating Systems Principles, Banff, Alberta, Canada, 2001, pp. 103–116.

[13] D. Meisner, B. T. Gold, and T. F. Wenisch, “PowerNap: eliminating server idle power,” in Proc. Int. Conf. on Architectural Support for Programming Languages and Operating Systems, 2009, pp. 205–216.

[14] J. Moore et al., “Making scheduling “cool”: temperature-aware workload placement in data centers,” in Proc. USENIX Annu. Tech. Conf., Anaheim, CA, 2005, pp. 5–5.

[15] R. K. Sharma et al., “Balance of power: Dynamic thermal management for internet data centers,” IEEE Internet Computing, vol. 9, no. 1, pp. 42–49, 2005.

[16] M. Hajjat et al., “Cloudward bound: planning for beneficial migration of enterprise applications to the cloud,” in Proc. ACM SIGCOMM, New Delhi, India, 2010, pp. 243–254.

[17] P. Bodik et al., “Statistical machine learning makes automatic control practical for internet datacenters,” in USENIX Workshop on Hot Topics in Cloud Computing, 2009.