

Cost Minimization for Computational Applications on Hybrid Cloud Infrastructures

Maciej Malawski a,b, Kamil Figiela b, Jarek Nabrzyski a

a University of Notre Dame, Center for Research Computing, Notre Dame, IN 46556, USA

b AGH University of Science and Technology, Department of Computer Science, Mickiewicza 30, 30-059 Kraków, Poland

Abstract

We address the problem of task planning on multiple clouds formulated as a mixed integer nonlinear programming problem (MINLP). Its specification with the AMPL modeling language allows us to apply solvers such as Bonmin and Cbc. Our model assumes multiple heterogeneous compute and storage cloud providers, such as Amazon, Rackspace, GoGrid, ElasticHosts and a private cloud, parameterized by costs and performance, including constraints on the maximum number of resources at each cloud. The optimization objective is the total cost, under a deadline constraint. We compute the relation between deadline and cost for a sample set of data- and compute-intensive tasks, representing bioinformatics experiments. Our results illustrate typical problems when making decisions on deployment planning on clouds and how they can be addressed using optimization techniques.

Key words: Distributed systems, Cloud computing, Constrained optimization

1. Introduction

In contrast to the already well-established computing and storage resources (clusters, grids) for the research community, clouds in the form of infrastructure-as-a-service (IaaS) platforms (pioneered by Amazon EC2) provide on-demand resource provisioning with a pay-per-use model. These capabilities, together with the benefits introduced by virtualization, make clouds attractive to the scientific community [1]. In addition to public clouds

Email addresses: [email protected] (Maciej Malawski), [email protected] (Jarek Nabrzyski).

such as Amazon EC2 or Rackspace, private and community cloud installations have been deployed for the purposes of scientific projects, e.g. FutureGrid 1 or the campus-based private cloud at Notre Dame 2. As a result, multiple deployment scenarios differing in costs and performance, coupled with the new provisioning models offered by clouds, make the problem of resource allocation and capacity planning for scientific applications a challenge.

The motivation for this research comes from our previous work [2,3], in which we ran experiments with

1 http://futuregrid.org
2 http://www.cse.nd.edu/~ccl/operations/opennebula/

Preprint submitted to Elsevier January 16, 2013


a compute-intensive bioinformatics application on a hybrid cloud consisting of Amazon EC2 and a private cloud. The application is composed of a set of components (deployed as virtual machines) that communicate using a queue (Amazon SQS) and process data that is stored on cloud storage (Amazon S3). The results of these experiments indicate that clouds do not introduce significant delays in terms of virtualization overhead and deployment times. However, the multiple options for placement of application components and input/output data, which differ in their performance and costs, lead to non-trivial resource allocation decisions. For example, when data is stored on the public cloud, the data transfer costs between storage and a private cloud may become large enough to make it more economical to pay for compute resources from the public cloud than to transfer the data to a private cloud where computing is cheaper.

In this paper, we address the resource allocation problem by applying optimization techniques using the AMPL modeling language [4], which provides access to a wide range of ready-to-use solvers. Our model assumes multiple heterogeneous compute and storage cloud providers, such as Amazon, Rackspace, ElasticHosts and a private cloud, parameterized by costs and performance. We also assume that the number of resources of a given type in each cloud may be limited, which is often the case not only for private clouds, but also for larger commercial ones. The optimization objective is the total cost, under a deadline constraint. To illustrate how these optimization tools can be useful for planning decisions, we analyze the relations between deadline and cost for different task and data sizes, which are close to our experiments with bioinformatics applications.

The main contributions of the paper are the following:

– We formulate the problem of minimizing the cost of running a computational application on a hybrid cloud infrastructure as a mixed integer nonlinear programming problem, and specify it with the AMPL modeling language.

– We evaluate the model on scenarios involving limited and unlimited public and private cloud resources, for compute-intensive and data-intensive tasks, and for a wide range of deadline parameters.

– We discuss the results and lessons learned from the model and its evaluation.

The paper is organized as follows: after discussing the related work in Section 2, we introduce the details and assumptions of our application and infrastructure model in Section 3. Then, in Section 4 we formulate the problem using AMPL by specifying the variables, parameters, constraints and optimization goals. Section 5 presents the results we obtained by applying the model to scenarios involving multiple public and private clouds, overlapping computation and data transfers, and identifying special cases. In Section 6 we provide a sensitivity analysis of our model and show how such analysis can be useful for potential users or computing service resellers. In Section 7 we estimate how our model behaves if the task sizes are not uniform and change dynamically. The conclusions and future work are given in Section 8.

2. Related work

The problem of resource provisioning in IaaS clouds has recently been addressed in [5] and [6]. These works typically consider unpredictable dynamic workloads and optimize objectives such as cost, runtime or a utility function by autoscaling the resource pool at runtime. These approaches, however, do not address the problem of data transfer time and cost, which we consider an important factor.

An integer programming approach has been applied to the optimization of service selection for activities of QoS-aware grid workflows [7]. In our model, on the other hand, we assume an IaaS cloud infrastructure, and the objective function takes into account the costs and delays of data transfers associated with the tasks.

The cost minimization problem on clouds addressed in [8] uses a different model from ours. We impose a deadline constraint and assume that the number of instances available from providers may be limited. To satisfy these constraints, the planner has to choose resources from multiple providers. Our model also assumes that VM instances are billed per hour of usage.


3. Model

3.1. Application model

The goal of this research is to minimize the cost of processing a given number of tasks on a hybrid cloud platform, as illustrated in Fig. 1. We assume that tasks are independent from each other, but they have identical computational cost and require a constant amount of data transfer.

The assumption of homogeneous tasks can be justified by the fact that many scientific applications (e.g. scientific workflows or large parameter sweeps) include a stage with a high number of parallel, nearly identical tasks. Such examples can be found in typical scientific workflows executed using the Pegasus Workflow Management System, where e.g. the CyberShake or LIGO workflows have a parallel stage of nearly homogeneous tasks [9]. Other examples are the Wien2K and ASTRO workflows, which consist of iteratively executed parallel stages comprising homogeneous tasks [10]. Due to the high number of parallel branches, these stages accumulate the most significant computing time of the whole application, so optimizing the execution of this stage is crucial. Moreover, if the tasks are not ideally homogeneous, it is possible to approximate them using a uniform set of tasks with the mean computational cost and data sizes of the application tasks. Of course, in real execution the actual task performance may vary, so the solution obtained using our optimization method is only an approximation of the best allocation, and the actual cost may be higher and the deadline may be exceeded. In order to estimate the quality of this approximation, we evaluate the impact of dynamic task runtime and non-uniform tasks in Section 7.

We assume that for each task a certain amount of input data needs to be downloaded, and after it finishes, the output results need to be stored. In the case of data-intensive tasks, the transfers may contribute a significant fraction of the total task run time.

Figure 1. The model of application and infrastructure

3.2. Infrastructure model

Two types of cloud services are required to complete tasks: storage and virtual machines. Amazon S3 and Rackspace Cloud Files are considered as examples of storage providers, while Amazon EC2, Rackspace, GoGrid and ElasticHosts represent computational services. In addition, the model includes a private cloud running on the user's own hardware. Each cloud provider offers multiple types of virtual machine instances with different performance and price.

For each provider the number of running virtual machines may be limited. This is mainly the case for private clouds, which have a limited capacity, but public clouds also often impose limits on the number of virtual machines. For example, Amazon EC2 allows a maximum of 20 instances and requires a special permission request to increase that limit.

Cloud providers charge their users for each running virtual machine on an hourly basis. Additionally, users are charged for remote data transfer, while local transfer inside a provider's cloud is usually free. These two aspects of pricing policies may have a significant impact on the cost of completing a computational task.

Cloud services are characterized by their pricing and performance. Instance types are described by price per hour, relative performance and data transfer cost. To assess the relative performance of clouds it is possible to run application-specific benchmarks on all of them, or to use publicly available cloud benchmarking services, such as CloudHarmony 3. CloudHarmony measures the performance of cloud instances in units named CloudHarmony Compute Units (CCU), defined similarly to the Amazon EC2 Compute Unit (ECU), which is approximately equivalent to the CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Storage platforms include fees for data transfer. Additionally, a constant fee per task may be present, e.g. the price per request for a queuing service 4.

Our model includes all the restrictions mentioned above.

4. Problem formulation using AMPL

To optimize the total cost, a Mixed Integer Non-Linear Programming (MINLP) problem is formulated and implemented in A Mathematical Programming Language (AMPL) [4]. AMPL requires specifying the input data sets and the variables that define the search space, as well as the constraints and the objective function to be optimized.

4.1. Input data

The formulation requires the following input sets, which represent the cloud infrastructure model:

– S = {s3, cloudfiles} – available cloud storage sites,

– P = {amazon, rackspace, . . . } – possible computing cloud providers,

– I = {m1.small, . . . , gg.1gb, . . . } – instance types,

– PI_p ⊂ I – instances that belong to provider P_p,

– LS_s ⊂ P – compute cloud providers that are local to storage platform S_s.

Each instance type I_i is described by the following parameters:

– pI_i – fee in $ for running instance I_i for one hour,

– ccu_i – performance of the instance in CloudHarmony Compute Units (CCU),

– pI^out_i and pI^in_i – price for non-local data transfer to and from the instance, in $ per MiB.

3 http://blog.cloudharmony.com/2010/05/what-is-ecu-cpu-benchmarking-in-cloud.html
4 e.g. Amazon SQS

Storage sites are characterized by:

– pS^out_s and pS^in_s – price in $ per MiB for non-local data transfer.

Additionally, we need to provide the data transfer rates in MiB per second between storage and instances by defining a function r_{i,s} > 0.

We assume that the computation time of a task is known and constant; this also applies to the input and output data sizes. We also assume that tasks are atomic (non-divisible). The computation is characterized by the following parameters:

– A_tot – number of tasks,

– t_x – execution time in hours of one task on a 1 CCU machine,

– d_in and d_out – input and output data sizes for one task in MiB,

– p_R – price per request for the queuing service,

– t_D – total time for completing all tasks in hours (deadline).
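Before writing the AMPL model, it can help to see the input data laid out concretely. The sketch below encodes the sets and parameters above as plain Python dictionaries; all prices, rates and performance numbers are illustrative placeholders, not the CloudHarmony values used in the paper.

```python
# Illustrative encoding of the model's input data. Every number here is a
# placeholder chosen for the example, not data from the paper's benchmarks.
instances = {
    "m1.small": {"provider": "amazon", "price": 0.06, "ccu": 1.0,
                 "p_out": 0.0001, "p_in": 0.0001},
    "gg.1gb":   {"provider": "gogrid", "price": 0.10, "ccu": 1.2,
                 "p_out": 0.0002, "p_in": 0.0002},
}
storages = {
    "s3":         {"p_out": 0.0001, "p_in": 0.0001, "local": {"amazon"}},
    "cloudfiles": {"p_out": 0.0001, "p_in": 0.0001, "local": {"rackspace"}},
}
# transfer rate r[i, s] in MiB/s between instance type i and storage s
rate = {("m1.small", "s3"): 10.0, ("m1.small", "cloudfiles"): 2.0,
        ("gg.1gb", "s3"): 2.0,    ("gg.1gb", "cloudfiles"): 2.0}
# task parameters: count, execution time on 1 CCU (hours), data sizes (MiB),
# price per request, deadline (hours)
task = {"A_tot": 20000, "t_x": 0.1, "d_in": 0.25, "d_out": 0.25,
        "p_r": 0.000001, "t_d": 24}
```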

4.2. Auxiliary parameters

The formulation of the problem requires a set of precomputed parameters which are derived from the main input parameters of the model. The relevant parameters include:

\[ t^{net}_{i,s} = \frac{d_{in} + d_{out}}{r_{i,s} \cdot 3600} \tag{1} \]

\[ t^{u}_{i,s} = \frac{t_x}{ccu_i} + t^{net}_{i,s} \tag{2} \]

\[ c^{t}_{i,s} = d_{out} \cdot (pI^{out}_i + pS^{in}_s) + d_{in} \cdot (pS^{out}_s + pI^{in}_i) \tag{3} \]

\[ a^{d}_{i,s} = \left\lfloor \frac{t_D}{t^{u}_{i,s}} \right\rfloor \tag{4} \]

\[ t^{q}_{i,s} = \left\lceil t^{u}_{i,s} \right\rceil \tag{5} \]

\[ a^{q}_{i,s} = \left\lfloor \frac{t^{q}_{i,s}}{t^{u}_{i,s}} \right\rfloor \tag{6} \]

\[ t^{d}_{i,s} = \left\lceil \left\lfloor \frac{t_D}{t^{u}_{i,s}} \right\rfloor \cdot t^{u}_{i,s} \right\rceil \tag{7} \]


– t^net_{i,s} – transfer time: time for data transfer between I_i and S_s,

– t^u_{i,s} – unit time: time for processing a task on instance I_i using storage S_s, including computing and data transfer time (in hours),

– c^t_{i,s} – cost of data transfer between instance I_i and storage S_s,

– a^d_{i,s} – number of tasks that can be done on one instance I_i using storage S_s running for t_D hours,

– t^q_{i,s} – time quantum: the minimal increase of instance running time that is sufficient to increase the number of processed tasks, rounded up to a full hour,

– a^q_{i,s} – number of tasks that can be done in t^q_{i,s} hours,

– t^d_{i,s} – instance deadline: number of hours that can be effectively used for computation.
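The derived parameters (1)-(7) can be computed directly from the inputs. The following sketch transcribes the equations into a Python function; the argument names are ours, and the sample values in the usage note are illustrative.

```python
import math

def aux_params(t_x, ccu, d_in, d_out, r, p_i_out, p_i_in, p_s_out, p_s_in, t_d):
    """Derived parameters (1)-(7) for one (instance type, storage) pair."""
    t_net = (d_in + d_out) / (r * 3600)              # (1) transfer time (hours)
    t_u = t_x / ccu + t_net                          # (2) unit time per task
    c_t = d_out * (p_i_out + p_s_in) + d_in * (p_s_out + p_i_in)  # (3)
    a_d = math.floor(t_d / t_u)                      # (4) tasks per instance
    t_q = math.ceil(t_u)                             # (5) time quantum (hours)
    a_q = math.floor(t_q / t_u)                      # (6) tasks per quantum
    t_d_i = math.ceil(math.floor(t_d / t_u) * t_u)   # (7) instance deadline
    return t_net, t_u, c_t, a_d, t_q, a_q, t_d_i
```

For example, a task with t_x = 0.1 h on a 1 CCU instance, 0.25 MiB in and out at 10 MiB/s, and a 10-hour deadline yields a^d = 99 tasks per instance, a time quantum of 1 hour with a^q = 9 tasks, and an instance deadline of 10 hours.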

4.3. Variables

The variables that are optimized and define the solution space are listed below:

– N_i – number of instances of type I_i to be deployed,

– A_i – number of tasks to be processed on instances of type I_i,

– D_s – 1 iff S_s is used, otherwise 0; only one storage may be used,

– R_i – number of remainder (tail) hours for instance I_i,

– H_i – 1 iff R_i ≠ 0, otherwise 0.

Fig. 2 illustrates how the tasks are assigned to multiple running instances. The tasks are atomic and there is no checkpointing or preemption. Even though all the tasks have the same computational cost, their total processing time depends on the performance of the VM type and the data transfer rate. Moreover, the users are charged for the runtime of each VM rounded up to full hours. In our model the tasks are assigned to the instances in the following way. First, in the process of optimization, A_i tasks are assigned to instances of type I_i. This gives N_i instances of type I_i running for t^d_{i,s} hours. The remaining tasks are assigned to an additional instance running for R_i hours. We will refer to these as tail hours.

In order to enforce the provider's instance limit, H_i is introduced, which indicates whether I_i has any tail hours. E.g. in Fig. 2, instances of type 1 have 3 tail hours, and instances of type 2 have no tail hours.

Figure 2. Task scheduling policy (VM running time in hours per instance type; regular instances, the "tail" instance, idle time and task processing are marked)
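The split into regular and tail hours described above can be sketched directly. In the model, N_i and R_i are determined by the solver through constraints (14)-(17); the function below is only a simplified illustration of the policy for a single instance type, assuming a^d ≥ 1.

```python
import math

def split_tasks(a_i, a_d, t_u):
    """Split a_i tasks into fully loaded instances plus one tail instance.

    a_i -- tasks assigned to this instance type
    a_d -- tasks one instance can finish within the deadline (assumed >= 1)
    t_u -- unit time per task in hours
    Returns (regular instances, tail hours, tail indicator)."""
    n_i = a_i // a_d                      # instances doing a_d tasks each
    remaining = a_i - n_i * a_d           # tasks left for the tail instance
    r_i = math.ceil(remaining * t_u)      # tail hours, rounded to full hours
    h_i = 1 if r_i > 0 else 0             # H_i: does a tail instance exist?
    return n_i, r_i, h_i
```

For instance, 25 tasks with a^d = 10 and t^u = 1 hour give 2 regular instances and a tail instance running for 5 hours.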

4.4. Formulation of objectives

The cost of running a single task, which includes the cost of VM instance time required for data transfer and task computation, together with the data transfer and request costs, can be described as:

\[ (t^{net} + t^{u}) \cdot pI + d_{in} \cdot (pS^{out} + pI^{in}) + d_{out} \cdot (pI^{out} + pS^{in}) + pR. \]
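As a quick sanity check, the single-task cost expression can be evaluated numerically. The function below is a direct transcription of the displayed formula; the values in the test are arbitrary placeholders.

```python
def single_task_cost(t_net, t_u, p_i, d_in, d_out,
                     p_s_out, p_i_in, p_i_out, p_s_in, p_r):
    """Cost of one task: billed VM time (transfer plus unit time) times the
    hourly price, plus data transfer fees in both directions, plus the
    per-request fee. A direct transcription of the displayed expression."""
    return ((t_net + t_u) * p_i
            + d_in * (p_s_out + p_i_in)
            + d_out * (p_i_out + p_s_in)
            + p_r)
```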

The objective function represents the total cost of running multiple tasks of the application on the cloud infrastructure and is defined as:

\[ \text{minimize} \quad \sum_{i \in I} \left( \Big( \sum_{s \in S} D_s \cdot t^{d}_{i,s} \cdot N_i + R_i \Big) \cdot pI_i + A_i \cdot \Big( pR + \sum_{s \in S} D_s \cdot c^{t}_{i,s} \Big) \right) \tag{8} \]

subject to the constraints:


\[ \forall_{i \in I} \; A_i \in \mathbb{Z} \;\wedge\; 0 \le A_i \le A_{tot} \tag{9} \]

\[ \forall_{s \in S} \; D_s \in \{0, 1\} \tag{10} \]

\[ \forall_{i \in I} \; N_i \in \mathbb{Z} \;\wedge\; 0 \le N_i \le nI^{max}_i \tag{11} \]

\[ \forall_{i \in I} \; R_i \in \mathbb{Z} \;\wedge\; 0 \le R_i \le t_D - 1 \tag{12} \]

\[ \forall_{i \in I} \; H_i \in \{0, 1\} \tag{13} \]

\[ \forall_{i \in I} \; A_i \ge \sum_{s \in S} (N_i \cdot a^{d}_{i,s}) \cdot D_s \tag{14} \]

\[ \forall_{i \in I} \; A_i \le \sum_{s \in S} \big( N_i \cdot a^{d}_{i,s} + \max(a^{d}_{i,s} - 1, 0) \big) \cdot D_s \tag{15} \]

\[ \forall_{i \in I} \; R_i \ge \sum_{s \in S} (A_i - N_i \cdot a^{d}_{i,s}) \cdot t^{u}_{i,s} \cdot D_s \tag{16} \]

\[ \forall_{i \in I} \; R_i \le \sum_{s \in S} (A_i - N_i \cdot a^{d}_{i,s} + a^{q}_{i,s}) \cdot t^{u}_{i,s} \cdot D_s \tag{17} \]

\[ \sum_{i \in I} A_i = A_{tot} \tag{18} \]

\[ \sum_{s \in S} D_s = 1 \tag{19} \]

\[ \forall_{i \in I} \; H_i \le R_i \le \max(t_D - 1, 0) \cdot H_i \tag{20} \]

\[ \forall_{p \in P} \; \sum_{i \in PI_p} (H_i + N_i) \le nP^{max}_p \tag{21} \]

The interpretation of the constraints is as follows:

– (9) to (13) define weak constraints on a solution, i.e. they ensure that the required variables have the appropriate integer or binary values,

– (14) and (15) ensure that the number of instances is adequate for the number of assigned tasks; for the chosen storage D_s: N_i · a^d_{i,s} ≤ A_i ≤ N_i · a^d_{i,s} + max(a^d_{i,s} − 1, 0), where the lower bound is given by the full allocation of N_i machines and the upper bound includes a fully allocated tail machine,

– (16) and (17) ensure that the number of tail hours is adequate for the number of remaining tasks; they implement R_i = ⌈(A_i − N_i · a^d_{i,s}) · t^u_{i,s}⌉,

– (18) ensures that all tasks are processed,

– (19) ensures that only one storage site is selected,

– (20) ensures that R_i has the proper value; it implements H_i = 1 if R_i > 0, and H_i = 0 if R_i = 0,

– (21) enforces the providers' instance limits.

Defining the problem in AMPL enables choosing among a wide range of solvers that can be used as a backend. The problem itself is MINLP, but it can be reduced to an Integer Linear Programming (ILP) problem. The nonlinear part of the problem comes from the storage choice, so by fixing the storage provider and running the optimization procedure for each storage separately, the optimal solution is found.
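The storage-fixing trick above amounts to a small driver loop: solve one ILP per storage site and keep the cheapest solution. In the sketch below, `solve_ilp` is a hypothetical stand-in for a single solver run (e.g. via CBC) with D_s fixed; it returns the optimal cost or None if infeasible.

```python
def minimize_over_storages(storage_sites, solve_ilp):
    """Reduce the MINLP to one ILP per storage site: fix D_s, solve, and
    return the (storage, cost) pair with the lowest cost.

    solve_ilp -- assumed callable mapping a storage site to its optimal
                 cost (or None if no feasible schedule exists)."""
    best = None
    for s in storage_sites:
        cost = solve_ilp(s)  # one solver run with storage s fixed
        if cost is not None and (best is None or cost < best[1]):
            best = (s, cost)
    return best
```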

Initially we used the Bonmin [11] solver, but after the model was fully implemented and subjected to more tests, it turned out that the CBC [12] solver performs better with default options. This results from the fact that Bonmin is designed to solve MINLP problems and uses various heuristics, while CBC uses a branch-and-bound algorithm tuned for ILP. As the problem is linear and convex, CBC finds the global optimum.

The model was optimized so that it gives acceptable results in ∼0.10 seconds 5.

5. Results

To evaluate our model, we first ran the optimization process for two scenarios with a private cloud and (1) infinite public clouds (Section 5.1) and (2) finite public clouds (Section 5.2). We also evaluated the effect of possibly overlapping computations and data transfers (Section 5.3). When testing the model under various input parameters, we identified interesting special cases, which are described in Section 5.4. Finally, we performed a sensitivity analysis and discuss its implications and potential usage in Section 6.

We evaluated the model for two types of tasks: data-intensive tasks, which require a relatively large amount of data for a short computing time (we assumed 512 MiB of input and 512 MiB of output), and compute-intensive tasks, which require a relatively small amount of data (we assumed 0.25 MiB of input and 0.25 MiB of output). Each case consists of 20,000 tasks, each of which requires 0.1 hour of computing time on a VM with a performance of 1 CCU. Two scenarios with infinite and finite public clouds were considered, where for the costs and performance of the public clouds we used the data from CloudHarmony benchmarks. This dataset covers 4 compute cloud providers (Amazon EC2, Rackspace, ElasticHosts and GoGrid) and 2 storage providers (Amazon S3 and Rackspace Cloud Files), with instance prices between $0.06 and $2.40 per hour and performance between 0.92 and 27.4 CCU. We assumed that the private cloud instances have a performance of 1 CCU and $0 cost. For each scenario we varied the deadline parameter between 5 and 100 hours. Since we considered only two storage providers (S3 and Cloud Files), we ran the solver separately for these two parameters.

5 As measured on a quad-core Intel i5 machine.

Figure 3. Small data processed on infinite public cloud (cost in $ vs. time limit in hours; curves for Amazon S3, Rackspace Cloud Files, and the optimal choice)

5.1. Private + infinite public clouds

In this scenario we assumed that the private cloud can run a maximum of 10 instances, while the public clouds have unlimited capacity. The results for tasks that require a small amount of data are shown in Fig. 3. As the deadline is extended, the total cost drops linearly, as expected. As many tasks as possible are run on the private cloud, and for all the remaining tasks the instance type with the best price-to-performance ratio is selected.

This situation changes as the data size grows (Fig. 4): for data-intensive tasks the total cost is nearly constant, as all the work is done on a public cloud. This results from the fact that the cost of data transfer to and from the private cloud is higher than the cost of running instances on the public cloud. In our case it turns out that the lowest cost can be achieved when using the Cloud Files storage from Rackspace, since in our dataset the best instance type in terms of price to performance was available at Rackspace.

Figure 4. Large data processed on infinite public cloud (cost in $ vs. time limit in hours; curves for Amazon S3, Rackspace Cloud Files, and the optimal choice)

Figure 5. Small data processed on finite public cloud (cost in $ vs. time limit in hours; curves for Amazon S3, Rackspace Cloud Files, and the optimal choice)

5.2. Private + finite public clouds

If we assume that the public clouds are limited, the situation is not so straightforward (see Fig. 5 and 6). For relatively long deadlines, a single cloud platform can be used for both VMs and storage, which means that the data transfer is free. As the deadline shortens, we first observe a linear cost increase. At the point when the selected cloud platform reaches its VM limit, additional clouds need to be used, so we begin paying for the data transfer. Therefore the cost begins to increase more rapidly.

This effect is very significant in the case of data-intensive tasks (Fig. 6), as the cost growth may become very steep. For example, in our tests the cost of processing the tasks in 28 hours was $157.39, in 30 hours it was $131.14, and in 34 hours it was only $30.26. For longer deadlines there was no further decrease. We can also observe that for longer deadlines the Cloud Files storage provider is less expensive, for the same reason as in Fig. 4. Shorter deadlines, however, require running more powerful instances from other clouds (Amazon EC2), so it becomes more economical to use its local S3 storage.

Figure 6. Large data processed on finite public cloud (cost in $ vs. time limit in hours; regions marked: Rackspace instances; Rackspace and private instances; Amazon's and private instances; multiple providers; curves for Amazon S3, Rackspace Cloud Files, and the optimal choice)

5.3. Overlapping computation and data transfers

In our model we assume that computation and data transfers do not overlap. To achieve parallelism of these two processes, the model needs to be modified in the following way. We assume that the total task processing time is the maximum of the task execution time and the data transfer time. Additionally, the input for the first task and the output of the last task must be transferred sequentially. Equations 2, 4 and 7 are updated for this case as follows:

\[ t^{u}_{i,s} = \max\left( \frac{t_x}{ccu_i},\; t^{net}_{i,s} \right) \tag{22} \]

\[ a^{d}_{i,s} = \left\lfloor \frac{t_D - t^{net}_{i,s}}{t^{u}_{i,s}} \right\rfloor \tag{23} \]

\[ t^{d}_{i,s} = \left\lceil \left\lfloor \frac{t_D - t^{net}_{i,s}}{t^{u}_{i,s}} \right\rfloor \cdot t^{u}_{i,s} \right\rceil \tag{24} \]
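The updated equations (22)-(24) can be transcribed the same way as before. The sketch below computes the overlapped unit time, tasks per instance and instance deadline; the argument names are ours.

```python
import math

def overlapped_params(t_x, ccu, t_net, t_d):
    """Derived parameters when computation and data transfers overlap:
    the unit time is the max of compute and transfer time (22), and the
    first input transfer is paid up front before the pipeline fills."""
    t_u = max(t_x / ccu, t_net)                               # (22)
    a_d = math.floor((t_d - t_net) / t_u)                     # (23)
    t_d_i = math.ceil(math.floor((t_d - t_net) / t_u) * t_u)  # (24)
    return t_u, a_d, t_d_i
```

For a transfer-bound task with t_x = 0.1 h, 1 CCU, t^net = 0.5 h and a 10-hour deadline, the unit time becomes 0.5 h and one instance fits 19 tasks within an effective deadline of 10 hours.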

Fig. 7 shows the results for the same experiment as in Section 5.2 and Fig. 6, but with computation and transfers overlapping. Overlapping reduces the total cost, as the time required for task processing is significantly lower. This is especially true for shorter deadlines, when multiple clouds are used, since transfer rates between different clouds are lower than within one provider's infrastructure.

Figure 7. Large data processed on finite public cloud with overlapping computation and data transfer (cost in $ vs. time limit in hours; optimal with and without overlapping)

5.4. Identifying special cases

Running a test case with very large tasks, which cannot be completed within one hour on the largest available instance, revealed an interesting model behaviour. The results are shown in Fig. 8. Local minima may occur for certain deadlines, so the cost does not increase monotonically as the deadline decreases. This is a consequence of our task scheduling policy, as explained in Section 4.3. In the case of large tasks, the schedule for a deadline of 9 hours, as seen in Fig. 9a, costs less than for a deadline of 10 hours, as in Fig. 9b. In the first case the total number of VM-hours of the gg-8gb instance is 56, but in the case of deadline = 10 hours the total is 58, which results in a higher cost. This is the result of the policy, which tries to keep the VMs' running time as close to the deadline as possible. This policy is based on the general observation that for a given budget it is usually more economical to start fewer VMs for a longer time than to start more VMs for a shorter time. However, there are rare cases when such a policy may lead to non-optimal solutions. It should be noted that the solution of the model returned by the solver in this case is optimal, but it is the model itself that does not allow finding the minimum cost.

Moreover, for longer deadlines the cost is a step function, e.g. the cost for deadline = 18 hours is the same as for 14 hours. These two observations suggest that the model could be modified so that the deadline is also a variable with an upper-bound constraint. A similar result can be achieved in a simple way by solving the current model for multiple deadlines in the neighborhood of the desired deadline and selecting the best solution.
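The neighborhood scan just described can be sketched as a small wrapper: any schedule feasible for a shorter deadline is also feasible for the requested one, so we can solve for several deadlines at or below it and keep the cheapest. Here `solve` is a hypothetical stand-in for one optimization run returning the optimal cost (or None if infeasible); the costs in the test echo the $167.75 / $172.12 values from Fig. 9, with the rest made up.

```python
def best_near_deadline(solve, t_d, width=3):
    """Mitigate the local-minimum effect by solving the model for deadlines
    t_d - width .. t_d and returning the cheapest feasible cost.

    solve -- assumed callable mapping a deadline (hours) to its optimal
             cost, or None when no feasible schedule exists."""
    candidates = [solve(d) for d in range(max(1, t_d - width), t_d + 1)]
    return min(c for c in candidates if c is not None)
```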


Figure 10. Optimal cost for a wide range of deadline constraints and data sizes.

Figure 8. Special case with local minimum for deadline 9 (cost in $ vs. time limit in hours; curves for Amazon S3, Rackspace Cloud Files, and the optimal choice)

6. Sensitivity analysis

In order to better assess how our model behaves in response to changing constraints and varying input parameters, we performed a sensitivity analysis by sampling a large parameter space. Fig. 10 shows the solutions obtained for deadlines ranging from 2 to 40 hours and for input and output data sizes ranging from 0 to 256 MiB. As can be seen in the plot, for all data sizes the cost increases monotonically with the decrease of the deadline, which confirms that no anomalies are observed. The same data can also be viewed as an animated plot available as an on-line supplement 6.

6 See also http://youtu.be/FWjgMwLdZW4

Figure 9. Task schedule for the identified special case: (a) deadline = 9 hours (storage: CloudFiles, cost: $167.75, duration: 9 h, idle: 5.32%; instances: c1.medium × 20, rs-1gb × 15, gg-8gb × 7, gg-1gb × 3, eh-1gb-2gh × 20); (b) deadline = 10 hours (storage: CloudFiles, cost: $172.12, duration: 10 h, idle: 6.09%; instances: c1.medium × 20, rs-1gb × 15, gg-8gb × 5 + 1, gg-1gb × 4, eh-1gb-2gh × 20)

Fig. 11 presents these results from a perspective where each curve shows how the cost depends on data size for varying deadlines.

One of the questions that our model can help answer is how the changes in cost are sensitive to the changes



Figure 11. Cost as a function of data size for different deadlines. (Plot of cost in $ versus data size in MB, 0–250; one curve per deadline: 5, 10, 15, 20, 25, 30 and 35 hours.)

Figure 12. Sensitivity of costs in response to changes in deadline. (a) Cost as a function of deadline (cost in $ versus time limit in hours, input/output size 150 MiB; curves: Amazon S3, Rackspace Cloud Files, Optimal). (b) Elasticity of cost versus deadline.

of deadline. Fig. 12 shows the cost function as well as the corresponding elasticity, which is computed as E_f(x) = (x / f(x)) · f′(x), where in this case f is the cost and x is the deadline. Elasticity gives the percentage change of cost in response to a percentage change of deadline. The function has negative values, since the cost decreases as the deadline increases. It is interesting to see in which ranges the elasticity has a larger absolute value, which corresponds to a steeper cost function. Here we can see that the elasticity grows for short deadlines and close to the deadline of 25 hours, which is the point where the solution can no longer use only the VM instances from the most cost-effective clouds and requires more expensive ones to meet the deadline.
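The elasticity can also be estimated numerically from a sampled cost curve. A minimal sketch using central finite differences (the sample values in the usage note are illustrative, not taken from the experiments):

```python
def elasticity(deadlines, costs):
    """Estimate the point elasticity E_f(x) = (x / f(x)) * f'(x) at the
    interior sample points of a cost curve, using central differences."""
    E = []
    for i in range(1, len(deadlines) - 1):
        # central finite-difference approximation of f'(x)
        df = (costs[i + 1] - costs[i - 1]) / (deadlines[i + 1] - deadlines[i - 1])
        E.append(deadlines[i] / costs[i] * df)
    return E
```

For a linearly decreasing cost curve, e.g. costs [300, 200, 100] at deadlines [10, 20, 30], this yields an elasticity of −1 at the middle point: extending the deadline by 1% lowers the cost by about 1%.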

Identifying such ranges with a high absolute value of elasticity is important for potential users of the cloud system, including researchers (end users) and resellers (brokers). The end user can, for example, observe that changing the deadline from 40 to 30 hours, or even from 20 to 10 hours, will not incur much additional cost. However, changing the deadline from 30 to 20 hours is very costly, so it should be avoided. On the other hand, the situation looks different from the perspective of a reseller, who buys cloud resources from providers and offers them to end users in order to make a profit. The reseller can, for example, offer to compute the given set of tasks in 20 hours for $150 with minimal profit, but it can also offer to complete the same set of tasks in 30 hours for $50. Such a price can seem attractive to the end user, who pays 1/3 of the price for increasing the deadline by 50%, but it is in fact very beneficial for the reseller, whose profit reaches 100%. Such cost analysis is an important aspect of planning large computing experiments on cloud infrastructures.

7. Impact of dynamic environment

As stated in Section 3, we assume that the execution time, transfer rate and data access time for all the tasks are constant. However, in real environments the actual runtime of tasks will vary. The goal of the following simulation experiment was to estimate the impact of this runtime variation on the quality of results obtained using our optimization model.

For all the task assignment solutions presented in Fig. 10 we add variations to the task runtimes in the following way. For each task we generate a random error in the range from −v to +v using a uniform distribution, where v is the runtime variation range. We tested values of v = 10%, 20%, 30%, 40%, 50%. The modified task runtimes are then used to calculate the actual runtime and cost of VMs. Due to the variation of task runtimes, it is possible that some computations may not finish before the deadline (time overrun) and that the



Figure 13. Impact of runtime variation on cost overrun as a function of deadline. Results obtained for random variation of task runtime in the range from −10% to +10% (top) and from −50% to +50% (bottom). (Plots of cost overrun in % versus time limit in hours, 0–40.)

cost of additional VM-hours may need to be paid (cost overrun).
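The perturbation procedure can be sketched as follows. This is a deliberately simplified single-VM illustration (tasks assumed to run sequentially on one instance billed per started hour), not the full multi-cloud simulation used in the experiments:

```python
import math
import random

def simulate_overrun(task_runtimes, price_per_hour, deadline, v, seed=0):
    """Perturb each task runtime by a uniform relative error in [-v, +v],
    then recompute the billed VM-hours (rounded up to full hours) and
    report the relative cost overrun and the relative time overrun."""
    rng = random.Random(seed)
    actual = [t * (1.0 + rng.uniform(-v, v)) for t in task_runtimes]

    planned_cost = math.ceil(sum(task_runtimes)) * price_per_hour
    actual_cost = math.ceil(sum(actual)) * price_per_hour

    cost_overrun = (actual_cost - planned_cost) / planned_cost
    time_overrun = max(0.0, (sum(actual) - deadline) / deadline)
    return cost_overrun, time_overrun
```

As in the experiments, positive and negative errors partly cancel when many tasks are aggregated, so even a large v often produces only a small time overrun.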

We assume that task size variations also include variations of data transfer time. We do not take into account variations of data transfer costs, since the transfer cost for each task depends linearly on the data size, so the aggregated impact of positive and negative variations cancels out to nearly zero.

Fig. 13 shows the cost overrun for the 10% and 50% runtime variation ranges. The boxplots for each deadline represent averaged results for data sizes from 0 to 256 MiB as in Fig. 10, with box boundaries at the quartiles and whiskers at the minimum and maximum values. We observe that the highest overruns occur for the shortest deadlines, which results from the fact that when each VM instance runs for only a few hours, an additional hour incurs a relatively high additional cost. On the other hand, for longer deadlines the cost of an additional VM-hour becomes less significant. We also observe that the aggregated impact of positive and negative variations in task execution time may cancel out to nearly zero, and in some cases the cost overrun may even be negative.

In a similar way, Fig. 14 shows the deadline overrun in the presence of runtime variations. We can observe that the deadline overrun is much smaller than

Figure 14. Impact of runtime variation on deadline overrun as a function of deadline. Results obtained for random variation of task runtime in the range from −10% to +10% (top) and from −50% to +50% (bottom). (Plots of time overrun in % versus time limit in hours, 0–40.)

the cost overrun. This means that the actual finish time of all tasks is not significantly affected by the runtime variation, due to the cancellation effect. We observe that even for high variations, in the range from −50% to +50%, the actual runtime rarely exceeds the deadline by more than 10%. Such an overrun can thus be easily compensated by solving the optimization problem with a correspondingly shorter deadline, giving a safety margin when high variations of actual task runtimes are expected. A similar estimation is also possible when the application consists of many tasks of similar size whose distribution is known in advance.

8. Conclusions and Future Work

The results presented in this paper illustrate typical problems when making decisions on deployment planning on clouds and how they can be addressed using optimization techniques. We have shown how mixed integer nonlinear programming can be applied to model and solve the problem of resource allocation on multiple heterogeneous clouds, including private and public ones, taking into account the cost of compute instances and data transfers.



Our results show that the total cost grows slowly for long deadlines, since it is possible to use free resources from a private cloud. However, for short deadlines it is necessary to use instances from public clouds, starting from the ones with the best price/performance ratio. The shorter the deadline, the more costly the instance types that have to be added, so the cost grows more rapidly. Moreover, our results can also be useful for multi-objective optimization. In such a case, it would be possible to run the optimization algorithm in a certain neighbourhood of the desired deadline and select the best solution using a specified cost/time trade-off. Alternatively, multiple solutions, as in Figs. 6, 10 or 11, may be presented to the users, allowing them to select the most acceptable one. Our model can also be used as an approximate method for problems where task sizes are not ideally uniform but differ within a limited range.

Optimal task allocation in a hybrid cloud environment is not a trivial problem, as one needs to know estimates of the computational cost of tasks in advance. If such data are available, it is possible to use tools such as AMPL [4]. This approach may be successful as long as one is able to formulate the optimization model and select a suitable solver. These tasks are not straightforward, though, since a small change in the model may move the problem from one class to another (e.g. from mixed integer to MINLP), requiring another solver. Careful specification of the model is also important, as the same problem may be formulated in various ways, each of which may differ considerably in solver performance.

In future work we plan to experiment with variations of the model to represent other classes of applications, such as scientific workflows [1] that often consist of multiple stages, each characterized by different data and compute requirements.

References

[1] E. Deelman, "Grids and clouds: Making workflow applications work in heterogeneous distributed environments," International Journal of High Performance Computing Applications, vol. 24, no. 3, pp. 284–298, August 2010.

[2] M. Malawski, J. Meizner, M. Bubak, and P. Gepner, "Component approach to computational applications on clouds," Procedia Computer Science, vol. 4, pp. 432–441, May 2011.

[3] M. Malawski, T. Gubała, and M. Bubak, "Component-based approach for programming and running scientific applications on grids and clouds," International Journal of High Performance Computing Applications, vol. 26, no. 3, pp. 275–295, August 2012.

[4] R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming. Duxbury Press, 2002.

[5] J. Chen, C. Wang, B. B. Zhou, L. Sun, Y. C. Lee, and A. Y. Zomaya, "Tradeoffs between profit and customer satisfaction for service provisioning in the cloud," in Proceedings of the 20th International Symposium on High Performance Distributed Computing (HPDC '11). New York, NY, USA: ACM, 2011, pp. 229–238.

[6] H. Kim, Y. el-Khamra, I. Rodero, S. Jha, and M. Parashar, "Autonomic management of application workflows on hybrid computing infrastructure," Scientific Programming, vol. 19, pp. 75–89, April 2011.

[7] I. Brandic, S. Pllana, and S. Benkner, "Specification, planning, and execution of QoS-aware Grid workflows within the Amadeus environment," Concurrency and Computation: Practice and Experience, vol. 20, no. 4, pp. 331–345, 2008.

[8] S. Pandey, A. Barker, K. K. Gupta, and R. Buyya, "Minimizing execution costs when using globally distributed cloud services," in 24th IEEE International Conference on Advanced Information Networking and Applications. IEEE Computer Society, 2010, pp. 222–229.

[9] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H. Su, and K. Vahi, "Characterization of scientific workflows," in Third Workshop on Workflows in Support of Large-Scale Science (WORKS 2008). IEEE, Nov. 2008, pp. 1–10. [Online]. Available: http://dx.doi.org/10.1109/WORKS.2008.4723958

[10] R. Duan, R. Prodan, and X. Li, "A sequential cooperative game theoretic approach to storage-aware scheduling of multiple large-scale workflow applications in grids," in 13th ACM/IEEE International Conference on Grid Computing (GRID 2012). IEEE, 2012, pp. 31–39. [Online]. Available: http://dx.doi.org/10.1109/Grid.2012.14

[11] P. Bonami, L. T. Biegler, A. R. Conn, G. Cornuejols, I. E. Grossmann, C. D. Laird, J. Lee, A. Lodi, F. Margot, and N. Sawaya, "An algorithmic framework for convex mixed integer nonlinear programs," Discrete Optimization, vol. 5, no. 2, pp. 186–204, May 2008.

[12] J. Forrest, "Cbc (COIN-OR branch and cut) open-source mixed integer programming solver," 2012. [Online]. Available: https://projects.coin-or.org/Cbc
