Decision support for Amazon Spot Instance


Fei Dong
Duke University

[email protected]

December 6, 2011

Abstract

Infrastructure-as-a-Service (IaaS) provides an attractive computing paradigm for allocating cluster resources dynamically to enterprise customers and online businesses. Cloud providers offer biddable virtual machines for spare computing capacity, known as "Spot Instances", whose price is usually significantly lower than the fixed on-demand price. However, users must accept the risk of uncertain availability: when the bid price is lower than the spot price, the running instances are terminated by the cloud provider.

This report addresses the resulting optimization problem for cloud users. We first apply regression techniques to predict the spot price. Next, we propose a resource application mechanism that maximizes utility under deadline and budget constraints. Finally, we present experiments with real instance price traces.

1 Introduction

Infrastructure-as-a-Service (IaaS) cloud platforms have brought unprecedented changes to the cloud leasing market. Amazon EC2 [1] is a popular cloud provider that addresses these challenges by providing standard on-demand instances, reserved instances, and Spot Instances (SI) [3]. SI allow users to bid for spare capacity and run instances as long as the bid price is above the current spot price. For some applications (e.g., web crawling, image processing, big data), SI can reduce computing costs by 50%-60%. Table 1 shows the features and renting costs of some representative EC2 node types. We can formulate a simple pricing model to compute the total cost of each workload execution.

$$\text{total\_cost} = \text{cost\_unit} \times \text{num\_nodes} \times \text{exec\_time} \tag{1}$$

Here, cost_unit is the unit price in Table 1, num_nodes is the number of nodes in the cluster, and exec_time is the execution time. Based on Figure 1, if the user wants to minimize cost subject to an execution time of under 1 hour, then it is best to choose six c1.xlarge EC2 nodes.

Figure 1: Performance vs. pay-as-you-go costs for a workload that runs on different EC2 cluster resource configurations.
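As a small illustration of Equation 1, the sketch below evaluates the pricing model for the node types in Table 1. The function name `total_cost` and the dictionary `HOURLY_COST` are hypothetical names introduced here; the prices are copied from Table 1, and the one-hour execution time in the usage line is only an example value.

```python
# Hourly on-demand prices (U.S. $ per hour), copied from Table 1.
HOURLY_COST = {
    "m1.small": 0.085,
    "m1.large": 0.34,
    "m1.xlarge": 0.68,
    "c1.medium": 0.17,
    "c1.xlarge": 0.68,
}


def total_cost(node_type: str, num_nodes: int, exec_time_hours: float) -> float:
    """Equation 1: total_cost = cost_unit * num_nodes * exec_time."""
    return HOURLY_COST[node_type] * num_nodes * exec_time_hours


# Example: six c1.xlarge nodes running for one hour (illustrative duration).
print(f"${total_cost('c1.xlarge', 6, 1.0):.2f}")
```

The model is linear in both cluster size and execution time, so a larger cluster only pays off when it shortens the execution time by more than the factor by which it grows.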

More choices in the cloud raise new challenges for users: how many instances to rent, which type to choose (on-demand, spot, high-CPU, large disk), and what bid value to use for spot instances? In particular, renting on-demand instances risks high costs, while renting spot instances risks job interruption, and thus delayed completion, when the spot price exceeds the user's bid. However, this situation can be avoided by bidding slightly higher, which mitigates the uncertainty, or by using fault-tolerance techniques such as checkpointing [7].

To manage these tradeoffs and provide decision support on behalf of customers, we propose a scheme to optimize utility. The scheme relies on two components: (i) a price prediction model aimed at determining the lowest limit price to bid in order to achieve a given level of availability; and (ii) a strategy for requesting spot instances given time and budget constraints. Two alternative strategies are proposed in this report. The goal of the first is to finish jobs as soon as possible. The second, instead, emphasizes lowest monetary cost rather than execution time.

EC2 Node    CPU              Memory   Storage   I/O           Cost
Type        (# EC2 Units)    (GB)     (GB)      Performance   (U.S. $ per hour)

m1.small    1                1.7      160       moderate      0.085
m1.large    4                7.5      850       high          0.34
m1.xlarge   8                15       1,690     high          0.68
c1.medium   5                1.7      350       moderate      0.17
c1.xlarge   20               7        1,690     high          0.68

Table 1: Five representative EC2 node types, along with resources and costs

The rest of the report is organized as follows: Section 2 describes related work; Section 3 describes the proposed price prediction algorithm; Section 4 details the mechanisms that compose our bidding strategy; Section 5 presents experimental results and discussion; Section 6 concludes the report.


2 Related Work

2.1 Cloud-based Cluster Sizing Problem

Some work has proposed methods to predict job runtimes. Elastisizer [5] shows that the "one size fits all" notion does not apply to the estimation of job runtimes. In our scenario, runtime prediction aids the decision-making process in the following ways: (i) the user can use Elastisizer to estimate the running time on on-demand instances, from which we can tell whether it is possible to finish before the deadline; (ii) combined with information about current prices, we can estimate the cost of running a job on a given instance type, thus increasing the chances of meeting monetary constraints.

2.2 Cloud Management in Industry

Amazon launched the Spot Instance project in 2009. Although AWS [4] provides users with a dashboard to manage their account and usage, there are still companies that help customers launch, monitor, and manage multi-server deployments. One of these is RightScale [6], which claims that its dynamic server configuration and automation allow users to dramatically reduce deployment and operational costs in the cloud.

3 Prediction Model

The first part of the proposal is a model to predict an optimal limit price for a customer to bid on the spot market. In a Vickrey auction such as that used in the Amazon spot market, bidders have an incentive to bid truthfully rather than over- or under-bidding. Our objective is to bid in such a way as to achieve a desired level of availability.

To achieve this goal, we first need to collect historical data. Fortunately, Amazon Web Services allows us to download the price history of the latest 3 months. Cloudexchange [2] also displays spot price traces and provides the source data for download (see Figure 2). We report statistics of the spot prices over the period August 15, 2011 - November 15, 2011 (us-east-1 region), as well as the approximated values estimated from a distribution fitted by regression, using the mean and variance over 1,000 samples. According to Figure 3, the normal approximation is no better than the exponential distribution, as the distribution of the spot prices is more long-tailed: even though the first third is a good match, the minimum and maximum values in the historical data differ substantially from the approximation.
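One way to make this kind of comparison concrete is to set the quantiles of each fitted distribution against the empirical quantiles of the price sample. The following is a minimal sketch under stated assumptions, not the fitting procedure used for Figure 3: both distributions are fitted by simple moment matching, the helper names are hypothetical, and the `prices` list is made-up illustrative data rather than a real trace.

```python
import math
from statistics import NormalDist, fmean


def empirical_quantile(samples, q):
    """Value below which roughly a fraction q of the samples fall."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def normal_quantile(samples, q):
    """Quantile of a normal distribution fitted by sample mean and variance."""
    return NormalDist.from_samples(samples).inv_cdf(q)


def exponential_quantile(samples, q):
    """Quantile of an exponential distribution whose mean is the sample mean."""
    return -fmean(samples) * math.log(1.0 - q)


# Made-up, long-tailed sample of hourly spot prices ($/hour); not real data.
prices = [0.027, 0.027, 0.028, 0.029, 0.030, 0.031, 0.035, 0.040, 0.060, 0.120]

for q in (0.5, 0.9, 0.99):
    print(
        f"q={q:.2f} "
        f"empirical={empirical_quantile(prices, q):.3f} "
        f"normal={normal_quantile(prices, q):.3f} "
        f"exponential={exponential_quantile(prices, q):.3f}"
    )
```

Comparing the three columns at high quantiles shows how each fit treats the tail, which is the part of the distribution that matters when bidding for high availability.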

Another approach is to build a relationship between spot prices at different times. It is based on the observation that historical data at a specific moment of the day shares a similar pattern (e.g., prices late at night are cheaper than in the daytime), so we can infer the price at the next moment by referring to yesterday's price at the exact same moment. We suggest a formula (named the time series model) to predict the price.

$$\text{Predict}(moment) = \sum_{i=1}^{n} p(1-p)^{n-1} H(i, moment) \tag{2}$$

where p is a similarity factor and H(i, moment) denotes the historical price at that moment i days ago.
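The sketch below illustrates one possible reading of the time series model. It assumes, as an interpretation that the text does not confirm, that the weight of each day decays geometrically with its age, i.e. the weight of the price i days ago is p(1 - p)^(i - 1), so that more recent days count more. The function name `predict_price` and the sample history are hypothetical.

```python
def predict_price(history, p):
    """Weighted sum of past prices observed at the same moment of the day.

    history[0] is the price 1 day ago, history[1] the price 2 days ago, etc.
    ASSUMPTION: the weight of the price i days ago is p * (1 - p) ** (i - 1).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("similarity factor p must be in (0, 1]")
    return sum(p * (1.0 - p) ** k * price for k, price in enumerate(history))


# Made-up prices at 14:00 over the previous four days ($/hour).
same_moment_history = [0.031, 0.029, 0.035, 0.030]
print(f"{predict_price(same_moment_history, p=0.6):.4f}")
```

Under this reading the weights sum to 1 - (1 - p)^n rather than 1, so with a short history the prediction sits slightly below a true weighted average; dividing by the weight sum would remove that bias.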

Figure 2: Price history for (a) c1.medium and (b) m1.large Spot Instance types (in USD per hour; geographic zone us-east; operating system Linux/Unix).

Figure 3: Regression on price history for the m1.small Spot Instance type: (a) normal distribution, (b) exponential distribution.

In light of the above, we propose the following algorithm:

1. Collect the prices over a period of time in order to estimate their mean and variance.

2. Fit the exponential approximation, i.e., assume that spot prices are distributed according to the best fit.

3. Given the target availability PR, calculate the inverse of the CDF at PR, which yields a candidate price.

4. Compare the candidates from the time series model, the normal distribution, or others, and pick the maximum value.

5. If the bid price is smaller than the spot price, increase the bid by α for the next interval.
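The steps above can be sketched as three small functions. This is a minimal illustration under stated assumptions: the exponential distribution is fitted by its mean only, the function names are hypothetical, and the prices, the competing candidate, and the value of α are made up for the example.

```python
import math
from statistics import fmean


def exponential_candidate(prices, availability):
    """Steps 1-3: fit an exponential by its mean, then invert its CDF at PR."""
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must be strictly between 0 and 1")
    return -fmean(prices) * math.log(1.0 - availability)


def pick_bid(candidates):
    """Step 4: keep the largest candidate price among the competing models."""
    return max(candidates)


def next_bid(bid, spot_price, alpha):
    """Step 5: raise the bid by alpha if it fell below the observed spot price."""
    return bid + alpha if bid < spot_price else bid


# Made-up history and parameters, for illustration only.
history = [0.10, 0.11, 0.12, 0.10, 0.15, 0.11]
candidates = [
    exponential_candidate(history, availability=0.90),
    0.13,  # e.g. a candidate from another model (hypothetical value)
]
bid = pick_bid(candidates)
bid = next_bid(bid, spot_price=0.30, alpha=0.01)
print(f"bid for the next interval: ${bid:.3f}/hour")
```

Taking the maximum in step 4 is the conservative choice: it favors availability over savings, and step 5 only ever moves the bid upward.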

The algorithm does well, especially for large instances, in terms of monetary savings compared with "on-demand" instances. The highest bid for spot instances ($/hour) is:

• 0.229, achieving 98.11% uptime, compared to 0.34 for an on-demand m1.large instance

Notice that the "best" fit to bid with depends on the availability. If the user can accept a slightly lower availability, the bid price can be reduced significantly.

4 Bidding Engine

To simplify the approach, we assume that the instance type and bid price are fixed, and then focus on answering the last question.

Notation   Description

t          one spot instance type
n          number of instances of type t
s          spot instance price
A          user's application
B          budget
D          deadline
ET         estimated running time
EC         estimated cost
AT         available time (total time in-bid)
AR         availability rate AT/ET
ST         start execution time (clock time)
FT         finish execution time (clock time)
BPt        bid price on SI type t
PP         predicted price in a time window
M          real monetary cost
V          the overall value from executing all jobs
U          utility value when the user application completes

Table 2: Parameters and constraints

The customer's goal is to finish A, which consists of n jobs {J1, J2, ..., Jn}, by the deadline D. We employ a user model with hard deadline constraints: if A finishes at FT ≤ D, then the utility is U; otherwise the utility is 0. In a model with soft deadline constraints, the utility would instead decrease according to a reward function r(U). Here we consider V(D) = U rather than the latter.


The utility function is described by the tuple {D, B, ET, PP[time0, time0 + D], t, n}. Here PP can be obtained from the previous section.

U = F (D,B,ET, PP, t, n)

Our goal is to maximize the utility and find the best settings:

$$\text{settings}_{opt} = \arg\max_{c \in S} U$$

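1. Fastest Execution

$$\text{minimize} \quad FT \tag{3}$$

2. Minimize the cost

$$\text{minimize} \quad \sum_{i=1}^{D} n \cdot s_i \cdot x_i \tag{4}$$

subject to

$$x_i \in [0, 1] \quad \forall i \in \{1, \dots, D\} \tag{5}$$

$$\sum_{i=1}^{D} x_i \ge ET \tag{6}$$

$$M \le B \tag{7}$$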

Since we assume that the user sets the deadline within a small window (e.g., 12 hours or 1 day), the search space for the problem is limited. We can enumerate all feasible solutions and check the constraints by exhaustive search. Based on this idea, we developed a Python program (Figure 4) that predicts the price and simulates the bid engine. It returns the "optimal" solution containing "BPt", "EC", "ST", "FT", and "t" in less than 1 second.
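To make the enumeration concrete, here is a minimal sketch of such an exhaustive search. It is not the program shown in Figure 4. It assumes hourly granularity (each x_i is 0 or 1 rather than a fraction), that a running hour is charged at the predicted spot price for n nodes, and that the deadline D equals the length of the predicted price list. The names `Plan`, `simulate`, `best_plan`, and the `mode` values are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Plan(NamedTuple):
    bid: float     # BP_t, bid price in $/hour
    start: int     # ST, first hour considered
    finish: int    # FT, hour by which the job is done
    cost: float    # EC, estimated cost in $


def simulate(prices: Sequence[float], bid: float, start: int,
             et_hours: int, n_nodes: int) -> Optional[Tuple[int, float]]:
    """Run from `start`; an hour counts only when bid >= predicted price."""
    done, cost = 0, 0.0
    for hour in range(start, len(prices)):
        if bid >= prices[hour]:
            done += 1
            cost += n_nodes * prices[hour]
            if done == et_hours:
                return hour + 1, cost
    return None  # deadline reached before ET in-bid hours were collected


def best_plan(prices, bids, et_hours, n_nodes, budget, mode="cheap"):
    """Enumerate every (bid, start) pair and keep the best feasible plan."""
    best, best_key = None, None
    for bid in bids:
        for start in range(len(prices)):
            result = simulate(prices, bid, start, et_hours, n_nodes)
            if result is None or result[1] > budget:
                continue  # violates the deadline or the budget
            finish, cost = result
            key = (cost, finish) if mode == "cheap" else (finish, cost)
            if best_key is None or key < best_key:
                best, best_key = Plan(bid, start, finish, cost), key
    return best


# Made-up predicted prices for a 6-hour window ($/hour).
predicted = [0.3, 0.1, 0.1, 0.3, 0.1, 0.1]
print(best_plan(predicted, bids=[0.1, 0.3], et_hours=2, n_nodes=1, budget=1.0))
print(best_plan(predicted, bids=[0.1, 0.3], et_hours=2, n_nodes=1, budget=1.0,
                mode="fast"))
```

The search visits every combination of candidate bid and start hour, and each simulation scans at most D hours, so for a window of 12 or 24 hours the total work is small enough to run interactively.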

Figure 4: program interface


5 Evaluation

The goal of the experimental evaluation is to study the ability of the bidding engine to provide useful suggestions on the spot market. The evaluation methodology is as follows:

• We evaluate the predicted price for a specified cluster type.

• We compare the running time and cost of spot instances with on-demand instances.

• We evaluate the optimization capabilities of the bidding engine in finding a good time range and bid price to meet user requirements.

To evaluate the predicted prices, we first collect the price history of the m1.small instance from August 17, 2011 to November 17, 2011 as the training data. With the algorithm proposed in Section 3, we predict the 24 hourly prices for November 18. Comparing against the real prices on November 18, we find that the predictor captures the trend of the price, with a small variance of 0.001786. We can see that the price between Hour 13 and Hour 19 is clearly higher than in other time ranges, which suggests that more users bid in the afternoon. After carefully tuning some parameters, we obtain a more precise result with a variance of 0.000769, marked as "Predict Adjust" in Figure 5.

To evaluate the relationship between running time and monetary cost, we run a comparison experiment between on-demand and spot instances. In Section 1, we showed the performance on on-demand types. Figure 6 shows the running time of the workload when run with the same configuration settings across clusters, each with a different type of node. It is interesting to note the complex interactions between execution times and monetary costs as we vary the node type used in the clusters. As expected, using spot instances increases the running time (by 1 to 5 times) compared with on-demand instances. However, monetary cost is reduced by 1/2 to 2/3 in most cases.

To evaluate the optimization capabilities of the bidding engine, we take the m1.small instance as an example. In Figure 7, we show the predicted price and the real price as blue and green lines, respectively. Suppose a user wants to run an application whose ET is 8 hours, with a budget constraint of $20 and a deadline of 1 day. In economical mode, we set the bid price to $0.157/h as the bidding engine suggests; the timeline shows that the job starts at Hour 9 and ends at Hour 21. The duration is 12 hours, M is $8.04 (10 nodes), and AR is 67%. When we switch to fast-run mode, the bid price is $0.315, and the job starts at Hour 2 and ends at Hour 12. EC is $14.44, M is $11.98, and AR is 8 hours/10 hours = 80%. Note, however, that the on-demand price for m1.small is only $0.085/h, which is lower than the spot prices, so the best strategy for users in this case is actually to choose on-demand instances.
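The arithmetic behind this comparison can be checked with a few lines. The sketch assumes that an on-demand run would use the same 10 nodes and finish in exactly ET = 8 uninterrupted hours, and it computes the availability rate as ET divided by the elapsed window, as in the figures quoted above; the variable and function names are hypothetical.

```python
ON_DEMAND_PRICE = 0.085  # $/hour for m1.small, from Table 1
NUM_NODES = 10           # cluster size quoted for the economical run
ET_HOURS = 8             # estimated running time of the application


def availability_rate(et_hours, start_hour, end_hour):
    """Fraction of the elapsed window during which the job was running."""
    return et_hours / (end_hour - start_hour)


# ASSUMPTION: on-demand uses the same 10 nodes for 8 uninterrupted hours.
on_demand_cost = ON_DEMAND_PRICE * NUM_NODES * ET_HOURS
economical_ar = availability_rate(ET_HOURS, start_hour=9, end_hour=21)
fast_run_ar = availability_rate(ET_HOURS, start_hour=2, end_hour=12)

print(f"on-demand cost: ${on_demand_cost:.2f}")
print(f"economical AR:  {economical_ar:.0%}")
print(f"fast-run AR:    {fast_run_ar:.0%}")
```

Under these assumptions the on-demand run costs $6.80, which is below both reported spot costs ($8.04 and $11.98) and also finishes sooner, in line with the observation that on-demand is the better choice for m1.small on this day.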


Figure 5: Spot price prediction on m1.small, Europe time zone

Figure 6: On-demand instance vs. spot instance on running time and cost

Figure 7: Bid strategies on the m1.small spot instance on November 18, 2011.


6 Conclusion

Market-based cloud systems with spot instances offer the flexibility of free market economics and the possibility of low-cost utility computing. A major challenge is how to bid given the users' constraints, such as resource availability and the deadline for job completion. We propose an algorithm to predict spot instance prices. We then formulate a model that enables users to optimize monetary cost, performance, and availability as desired by tuning some parameters (cluster size, instance type). We evaluated the results by simulation with real price traces of Amazon's Spot Instances and workloads of real applications.

Some specific recommendations and general implications of this model are as follows.

• The approach is more cost-efficient than a fixed-size instance choice, reducing cost by more than 50% in most cases.

• Spot Instances do not always provide inexpensive resources for transient workloads; e.g., the m1.small spot price is even higher than the on-demand price.

• A user can change several of the knobs in order to achieve a suitable balance between monetary cost and desired service levels, such as the deadline for job execution or availability.

For future work, we can study the optimization problem when mixing of instance types (on-demand and SI together) is allowed. In this proposal the bid price is fixed entirely by the users, whereas we should also consider a dynamic environment where customer requirements change frequently. In addition, generalization of the price prediction mechanism and disaster recovery are issues to be addressed.

References

[1] Amazon Elastic MapReduce. http://aws.amazon.com/elasticmapreduce.

[2] Spot Instance History Price Trace. http://cloudexchange.org/.

[3] Amazon Spot Instance. http://aws.amazon.com/ec2/spot-instances/.

[4] Amazon Web Service. http://aws.amazon.com/.

[5] H. Herodotou, F. Dong, and S. Babu. No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics. In ACM Symposium on Cloud Computing, 2011.

[6] RightScale. http://www.rightscale.com.

[7] S. Yi, D. Kondo, and A. Andrzejak. Reducing Costs of Spot Instances via Checkpointing in the Amazon Elastic Compute Cloud. In IEEE 3rd International Conference on Cloud Computing, 2010.
