
Estimation of the Available Bandwidth in Inter-Cloud Links for Task Scheduling in Hybrid Clouds

Thiago A. L. Genez, Luiz F. Bittencourt, Nelson L. S. da Fonseca, and Edmundo R. M. Madeira

Institute of Computing, University of Campinas (UNICAMP)
Av. Albert Einstein, 1251, 13083-852, Campinas, São Paulo, Brazil

{thiagogenez, bit, nfonseca, edmundo}@ic.unicamp.br

Abstract—In hybrid clouds, inter-cloud links play a key role in the execution of jobs with data dependencies. Insufficient available bandwidth in inter-cloud links can increase the makespan and the monetary cost of executing the application on public clouds. Imprecise information about the available bandwidth can lead to inefficient scheduling decisions. This paper evaluates the impact of imprecise information about the available bandwidth in inter-cloud links on workflow schedules, and it proposes a mechanism to cope with imprecise information about the available bandwidth and its impact on the makespan and cost estimates. The proposed mechanism applies a deflating factor to the available bandwidth value furnished as input to the scheduler. Simulation results showed that the mechanism is able to increase the number of solutions with makespans that are shorter than the defined deadline and to reduce the underestimations of the makespan and cost provided by workflow schedulers.

Index Terms—Cloud computing, hybrid clouds, workflow, scheduling


1 INTRODUCTION

In hybrid clouds, when computational demands exceed the capacity of the private pool of computational resources, users can lease resources from public cloud providers. This capacity to lease on-demand resources from public providers as needed is called "elasticity" [1].

A workflow scheduler is an agent responsible for determining on which resource the jobs of a workflow should run, distributing workflow jobs to different processing elements so that they can run in parallel, speeding up the execution of the application (the term job refers to any computational work to be executed, therefore comprising services or standard tasks) [2]. The scheduler is a central element in a cloud architecture that promotes elasticity.

When processing workflows with data dependencies, jobs can be placed on different clouds, which requires data transfer over Internet links. The available bandwidth on the Internet links fluctuates because it is a shared resource; it can decrease, which increases data transfer times and consequently the execution time (makespan) of workflows. Moreover, an increase in the makespan can create the need for longer rental periods, which increases costs. Such variability of data transfer times needs to be considered by a scheduler when producing a schedule. Moreover, the problem of workflow scheduling in hybrid clouds encompasses several variables, such as a bound on the execution time (deadline). A previous study [3] suggested that data transfer on the Internet links when processing workflows with data dependencies has a significant impact on the execution times and costs when using public clouds.

Most studies on cloud networking focus on datacenter networks [4], [5], including those aimed at guaranteeing bandwidth [6]. These studies do not address the problem of having imprecise information about the available bandwidth at scheduling time. In a previous work [3], we proposed a procedure for adjusting the estimates of the available bandwidth that are used as inputs to schedulers that were not designed to handle inaccurate information. The contribution of the present paper is twofold. The first is to provide an answer to the fundamental question of the impact of uncertainty in the value of the available inter-cloud bandwidth on the schedules produced by a hybrid cloud scheduler not originally designed to address such uncertainty. The second is the introduction of a mechanism to reduce the impact of such uncertainties on the schedules generated by scheduling algorithms proposed in the literature.

In an attempt to answer this question, the performance of three different scheduling algorithms that were not designed to use inaccurate information is evaluated to show their inability to cope with the variability of the available bandwidth in inter-cloud links. A procedure is then proposed for adjusting the estimate of the available bandwidth based on the expected uncertainty so that underestimations of the makespan and costs can be reduced.

The proposed procedure deflates the estimated available bandwidth that is used as an input to a hybrid cloud scheduler. The deflating factor is computed using multiple linear regression, and the execution history of a specific workflow is employed in the computation. The effectiveness of the procedure in handling imprecise information is evaluated with the HCOC (Hybrid Cloud Optimized Cost) [7] and PSO (Particle Swarm Optimization) [8] schedulers. The procedure was evaluated using several scenarios involving different clouds and their occupancy as well as a wide range of deadline constraints. Three types of scientific workflows available from the workflow generator gallery^1 were employed in the evaluation. This gallery contains workflows that use structures and parameters taken from real applications.

Simulation results showed that the use of the proposed procedures can reduce the cost estimation error as well as increase the number of qualified solutions. The proposed procedures were able to reduce the number of disqualified solutions by up to 70%, with costs reduced by up to 10%.

This paper is organized as follows. Section 2 reviews the workflow scheduling problem and some cloud schedulers that were not designed to handle inaccurate information about the available bandwidth. The proposed mechanism is introduced in Section 3 and evaluated in Section 4, and the results are discussed in Section 5. Section 6 presents related work, and Section 7 provides final remarks and conclusions.

2 WORKFLOW SCHEDULING PROBLEM

In this section, we describe workflow scheduling in hybrid clouds and present two scheduling algorithms; one is cost-aware, and the other is cost- and deadline-aware.

2.1 Workflow Representation

Workflow applications vary in size and computational demands, and both can depend on the input data. This makes this type of workflow suitable for processing in a hybrid cloud, because the computational demands can exceed the available capacity of private clouds. Data dependencies in non-cyclic workflows are commonly modeled as a directed acyclic graph (DAG), in which nodes represent jobs and arcs represent job dependencies. A topological ordering of a DAG is an easy way of visualizing the workflow. A job can only start after all of its data dependencies have been solved, i.e., after all of its parent jobs have finished and have sent the necessary data to the job. Labels on the nodes and arcs of a workflow DAG designate the computational and communication demands, respectively. The computational demands depict how much computing power a job requires to be executed, and the communication demands depict how much data must be transferred between two jobs.

Let G = (V, E) be a DAG with n nodes (or jobs), in which each v ∈ V is a workflow job with a non-negative computational demand w_v ∈ P, and e_{v_a,v_b} ∈ E, v_a ≠ v_b, is an arc that represents the data dependency between jobs j_a and j_b with a non-negative communication demand c_{a,b} ∈ C. An arc e_{v_a,v_b} of G represents a data file produced by v_a that will be consumed by v_b. The 4-tuple W = {V, E, P, C} represents the workflow and job demands.

1. https://confluence.pegasus.isi.edu/display/pegasus
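As an illustration, one possible in-memory encoding of the 4-tuple W = {V, E, P, C} and of the "ready job" test used by the schedulers discussed below is sketched here. The paper's schedulers were implemented in Java; this Python fragment and its names are merely illustrative, with node and edge weights drawn from the intervals used later in the evaluation.

    # Illustrative encoding of W = {V, E, P, C} for a 4-job diamond-shaped DAG.
    workflow = {
        "V": [0, 1, 2, 3],                      # jobs (DAG nodes)
        "P": {0: 4.0, 1: 2.5, 2: 7.0, 3: 6.0},  # computational demand w_v of each job
        "E": [(0, 1), (0, 2), (1, 3), (2, 3)],  # arcs e_(va,vb): data dependencies
        "C": {(0, 1): 55, (0, 2): 52,           # communication demand c_(a,b):
              (1, 3): 58, (2, 3): 50},          # amount of data to transfer
    }

    def ready_jobs(wf, finished):
        """Jobs whose parents have all finished, i.e., whose input files arrived."""
        return [v for v in wf["V"]
                if v not in finished
                and all(a in finished for (a, b) in wf["E"] if b == v)]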

Several scientific workflows generate petabytes or even exabytes of data, which are usually distributed for processing at different sites. Communication between jobs occurs via file transfer, usually over Internet links. Examples of scientific workflows include Montage^2 [9], LIGO^3 [10], AIRSN [11], and Chimera [12]. Moreover, a scheduler must produce schedules that support the QoS requirements of an application while considering the communication demands due to data dependencies. Therefore, the bandwidth availability in the communication channels that connect private and public clouds plays an important role in achieving a schedule objective. Neglecting the variability of the available bandwidth in the production of a schedule can hamper the execution of the workflow [13] and compromise the provisioning of quality of service (QoS) [14]. Uncertainties in the values of the input data to the scheduler require elaborate scheduling schemes.

2.2 Scheduling Workflows in Hybrid Clouds

A hybrid cloud (Figure 1) is composed of the resources of a private cloud, which are free of charge, as well as those of one or more public cloud providers, with the public cloud provider furnishing on-demand or reserved resources via long-term contracts. When public resources are leased, clients are charged on a pay-per-use basis, usually per hour of use, with reserved resources typically leased at lower prices than on-demand ones. The hybrid cloud scheduler must decide which resources should be leased from public cloud providers in the composition of the hybrid cloud so that the runtime of the application does not exceed the deadline and satisfies budget constraints. In this paper, we assume that the organization has a broker on its premises that is responsible for connecting resources from the public clouds to those of the private cloud. This broker receives requests for workflow execution and rents resources from the public cloud based on the decisions of schedulers.

The schedulers for hybrid clouds considered in this paper are not aware of the data center network topology. This paper considers only application schedulers, whose responsibility is to allocate tasks onto virtual machines already provisioned by the infrastructure provider. The allocation of virtual machines onto physical machines is done by the infrastructure provider, which uses mechanisms that are aware of the data center network topology. Examples of network-aware VM placement schedulers can be found in [15], [16].

2. http://montage.ipac.caltech.edu/
3. http://www.ligo.caltech.edu/


[Figure 1 depicts a user submitting a workflow to the organization, whose scheduler/broker in the private cloud connects to public IaaS providers 1 through n.]

Fig. 1. Hybrid cloud infrastructure and workflow submission.

The application scheduler in this paper does not receive as input information about the topology of the data center network, but it considers this information indirectly, since it uses historical data on the performance of the execution of workflows as well as estimations of the available bandwidth.

Except for a few simple cases [2], the scheduling problem is NP-Complete, which calls for heuristics to produce a near-optimum schedule. Scheduling is a decision-making process that involves characteristics of the application being scheduled, such as memory and disk space consumption, and of the target system, such as processing power, storage and bandwidth availability. Intra- and inter-cloud communication is important in hybrid clouds because large bandwidth availability decreases communication delays, which contributes to the minimization of the makespan. Information about the application and about cloud resources is commonly assumed to be precise; however, in a real environment, it is difficult to obtain precise estimates due to the limitations of measurement tools [17]. Information about the cost of resource usage is also required by the schedulers. By considering all of this information, a scheduler for hybrid clouds should be able to assign the jobs of a workflow to the resources of a cloud.

2.3 Cost-aware Scheduler

One class of schedulers that can be employed in hybrid clouds focuses only on cost minimization. One such algorithm employs Particle Swarm Optimization (PSO) [8]. A particle in the swarm is taken as a potential schedule; it represents a set of workflow jobs to be scheduled and is evaluated by a fitness function that computes the cost of execution. A heuristic is employed to iteratively utilize the results of the particle swarm optimization. This heuristic first computes the average computational and communication demands, which are expressed by the labels of nodes and arcs, respectively. It then computes an initial schedule for each "ready job", i.e., each job that has received the files necessary for its execution. PSO is then run over the next set of ready jobs, and these steps are repeated until all the jobs have been scheduled.

Algorithm 1 shows a scheduling heuristic that is based on PSO.

Algorithm 1 Scheduling Heuristic Algorithm Overview
1: Compute the average computing cost of all jobs on all resources
2: Compute the average cost of communication between resources
3: R is the set of all jobs in the DAG G
4: Call PSO(R)
5: repeat
6:   for all ready jobs ji ∈ R do
7:     Assign job ji to resource pj according to the solution provided by PSO
8:   end for
9:   Dispatch the mapped jobs
10:  R is the set of ready jobs in the DAG G
11:  Call PSO(R)
12: until all jobs are scheduled

Even though the heuristics that are based on PSO do not consider deadlines, they do not completely neglect the makespan. The cost function that is optimized by the PSO ensures that all of the jobs are not mapped onto the same resource, which promotes parallelism and consequently reduces the makespan by avoiding sequential execution of the workflow on the least expensive virtual machine available. The PSO algorithm has a time complexity of O(n² × r), where n is the number of workflow tasks, corresponding to the number of dimensions of a particle of the swarm, and r is the number of resources in the hybrid cloud. The number of particles in the swarm and the number of iterations (stopping criterion) are constant values.

To make this scheduler deadline-aware, the PSO fitness function was modified to calculate the monetary costs of only the particles that satisfy the deadline. If a particle does not satisfy the deadline, the solution is not "qualified", and the cost is given a value of infinity. The objective function consists of minimizing the monetary cost by considering only particles in the swarm that represent qualified solutions. In this scheduling algorithm, which is called adapted-PSO, the equation

    Cost(M) = max_{∀j∈P} (Ctotal(Mj))                              (1)

is replaced by

    Cost(M) = { Σ_{∀j∈P} Ctotal(Mj)   if makespan(M) ≤ DM
              { +∞                    otherwise                    (2)

where M is a schedule that is represented by a particle of the swarm, Ctotal(Mj) is the function that calculates the monetary cost of using resource j, and DM is the deadline required by the user. The cost Ctotal(Mj) of a resource j is the sum of the costs of execution and data transfer. The time complexity of the adapted-PSO algorithm is equal to that of the PSO algorithm.
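A minimal sketch of the adapted-PSO objective in Eq. (2) follows; the makespan and per-resource cost routines are passed in as parameters because they are assumed to be computed elsewhere, and all names are illustrative.

    import math

    def fitness(M, resources, deadline, makespan, c_total):
        """Eq. (2): total monetary cost of the schedule M (a particle) when
        it meets the deadline, +infinity otherwise, so that unqualified
        particles are never selected by the swarm."""
        if makespan(M) <= deadline:
            return sum(c_total(M, j) for j in resources)
        return math.inf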

2.4 Deadline- and Cost-Aware Scheduler

The Hybrid Cloud Optimized Cost (HCOC) [7] is an algorithm for workflow scheduling in hybrid clouds that attempts to minimize the execution cost while meeting specified deadlines. HCOC starts by scheduling the workflow in the private cloud because those resources are free of charge. If the predicted makespan exceeds the specified deadline, jobs are selected to be rescheduled in the public cloud.

HCOC accepts different heuristics to perform the initial schedule in the private cloud. In this paper, the first schedule is obtained by employing the path clustering heuristic (PCH) [18], which clusters jobs in the same path of the DAG. In HCOC, the selection of the job that will be rescheduled and the selection of the resource from the public cloud are directly influenced by the available bandwidth in the inter-cloud links. An incorrect estimate of the available bandwidth can lead to deadline violations and increased costs. Algorithm 2 details these steps. The PCH heuristic in the first step can be replaced by any other heuristic, such as HEFT [19]. Both the PCH and HCOC algorithms have O(n² × r) time complexity when considering n tasks and r resources.

Algorithm 2 HCOC Algorithm Overview
1: R is the set of all resources in the private cloud
2: Schedule G in the private cloud using PCH
3: while makespan(G) > D AND iteration < size(G) do
4:   iteration = iteration + 1
5:   select node ni such that its priority is max AND ni ∉ T
6:   H = R; T = T ∪ {ni}
7:   num_clusters = the number of clusters of nodes in T
8:   while num_clusters > 0 do
9:     select resource ri from the public clouds such that (pricei ÷ (num_coresi × pi)) is minimum AND num_coresi ≤ num_clusters
10:    H = H ∪ {ri}
11:    num_clusters = num_clusters − num_coresi
12:  end while
13:  for all ni ∈ T do
14:    schedule ni on hj ∈ H such that EFTi is minimum
15:    recalculate the attributes of each job
16:  end for
17: end while
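The outer logic of Algorithm 2 can be summarized as in the rough sketch below; pch_schedule, makespan, highest_priority_job, select_public_resources and reschedule are assumed helpers standing in for the steps described above, not the authors' code.

    def hcoc(dag, private, public, deadline):
        """Move jobs to public resources while the predicted deadline is missed."""
        schedule = pch_schedule(dag, private)   # initial schedule, free resources
        T, iteration = set(), 0                 # T: jobs chosen for rescheduling
        while makespan(schedule) > deadline and iteration < len(dag):
            iteration += 1
            T.add(highest_priority_job(dag, exclude=T))
            # cheapest public resources by price / (cores x performance),
            # with enough cores to cover the clusters of jobs in T
            pool = select_public_resources(T, public)
            schedule = reschedule(dag, private + pool, T)  # minimum-EFT placement
        return schedule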

3 HANDLING UNCERTAINTY IN AVAILABLE INTER-CLOUD BANDWIDTH VALUES

One of the problems faced in scheduling is the quality of the information that is provided to the scheduler [20]. Imprecision in the input data may lead to an underestimation of the makespan, and, since the service is offered on a pay-per-use basis, the customer will then potentially pay more to run the application. Communication between computational resources in the private cloud and in the public cloud is a major issue in hybrid clouds. Algorithms that are robust to low-quality information are of paramount importance to minimize the negative consequences of uncertainty in the available bandwidth information and to avoid underestimating the makespan and budget.

3.1 Procedure for Deflating Estimates of the Available Bandwidth in Inter-cloud Links

The quality of information model used in this paper is based on the model presented in [20]. The estimate of the available bandwidth that is utilized as an input to the scheduler at scheduling time may deviate from the value experienced at runtime by a percent error p. For example, in a scenario in which the expected uncertainty of the available bandwidth in the inter-cloud links is p = 50%, a data transfer that is initially estimated by the scheduler to take 30 seconds could take between 15 and 45 seconds during the execution of the workflow.

Overestimating the available bandwidth yields longer data transfer times than expected, which increases the makespan and can potentially exceed the established deadline. This can lead to increased costs due to the need for longer rental periods, which can surpass the defined budget. On the other hand, underestimating the available bandwidth can lead to unnecessary leasing of processing power; however, this does not call for a budget increase. Indeed, the precision of the input data that is provided to the scheduler is essential to the effectiveness of the schedule that is generated by the scheduling algorithm.

Traditional prediction methods, such as time series, can be used to predict the available bandwidth. However, although such predictions may be a good representation of the variability of the available bandwidth for a specific period, they would not represent the impact of the bandwidth estimates on the predicted makespan and costs. Thus, a procedure that correlates the uncertainty in the available inter-cloud bandwidth with the produced schedule is needed but had not yet been proposed.

Thus, a proactive procedure is proposed here to minimize the negative impact of imprecise information about the available bandwidth on the workflow execution in hybrid clouds. The proposed procedure produces a deflating multiplier that is applied to the estimated available bandwidth. In other words, the proposed mechanism applies a reduction factor U to the measured inter-cloud bandwidth at scheduling time to prevent misleading information from affecting the performance of the workflow execution. Information about the degree of uncertainty is provided to the scheduling algorithm, and the schedule is generated based on the expected uncertainty in the data transfer time. The proposed procedure calculates a U value for a workflow using a required deadline value, an estimate of the available bandwidth and the bandwidth variability, which indicates the uncertainty of the available bandwidth. The computation of the U value employs information from previous executions of the target workflow. Thus, for each type of workflow, the proposed method uses previously calculated U values, the available bandwidth and the error that occurred in previous estimates of the makespan and cost. The uncertainty value can be derived from benchmarks or from historical data of bandwidth availability in the inter-cloud links obtained from monitors [21], [22], [23].

Figure 2 illustrates the proposed approach. Information about the deadline violation and cost of a workflow, as well as about the bandwidth variabilities during its execution, is stored in a repository and contributes to the history of the execution of a specific workflow. This data is retrieved to compute the U value.

Fig. 2. Scheduler bandwidth adjustment mechanism.

3.2 Computation of the U value

The proposed procedure works between the available bandwidth estimation tool and the scheduler. It takes the available bandwidth value provided by the estimation tool and applies the deflating U value to it before providing the deflated bandwidth value as an input to the scheduler.

Historical information about previously computed U values and estimates of the available bandwidth is stored in the database, as well as the differences between previously predicted and measured makespan values and costs. A multiple linear regression (MLR) procedure is employed to compute the U value; these data are used to compute the values of the coefficients a, b and c in the equation f(x, y) = ax + by + c, where x is a variable that represents the current predicted uncertainty in the available bandwidth, y is an independent variable that represents the currently available bandwidth, and f(x, y) = U.

The data set HG is composed of 5-tuples h_i = ⟨bw, p, U, error^m_G, error^$_G⟩ that are used for the future production of schedules, where bw is the estimated available bandwidth in the inter-cloud channel for the scheduling of a workflow G, which is represented by a directed acyclic graph (DAG); error^m_G is the difference between the estimated makespan at scheduling time and the real observed runtime value of G; error^$_G is the difference between the estimated monetary execution cost at scheduling time and the real monetary cost of the execution of G; p is the maximum error between bw and the observed available bandwidth in the inter-cloud channel during the execution of the workflow on the hybrid cloud; and U is the reduction factor used at scheduling time.

The subset Hk ⊆ HG is used by the multiple linear regression procedure. For each (bw, p) pair in HG, the subset Hk is constructed by selecting two 3-tuples: (bw, p, U^m), which gives the minimum absolute value of error^m_G, and (bw, p, U^$), which gives the minimum absolute value of error^$_G. The construction of the Hk subset is illustrated in Algorithm 3.

Algorithm 3 Hk computation for a DAG Gi
1: H = history of past executions of Gi
2: Hk = ∅
3: for all (bwi, pi) pairs do
4:   minimumErrorm = +∞; minimumError$ = +∞
5:   for all 5-tuples ⟨bwj, pj, Uj, error^m_Gi, error^$_Gi⟩ ∈ H do
6:     if bwj = bwi and pj = pi then
7:       if |error^m_Gi| < minimumErrorm then
8:         minimumErrorm = |error^m_Gi|
9:         best_tuple_m = ⟨bwj, pj, Uj⟩
10:      end if
11:      if |error^$_Gi| < minimumError$ then
12:        minimumError$ = |error^$_Gi|
13:        best_tuple_$ = ⟨bwj, pj, Uj⟩
14:      end if
15:    end if
16:  end for
17:  Hk = Hk ∪ {best_tuple_m, best_tuple_$}
18: end for

This algorithm has a time complexity of O(n²), where n is the number of past executions of G (the size of the execution history of the DAG G); each of the n executions uses different values of bwj, pj, Uj, error^m_Gi and error^$_Gi.

When an application workflow G is about to be scheduled, the value of the reduction factor U is computed using the linear equation f(x, y) = ax + by + c, where the coefficients a, b, and c are given by a multiple linear regression that employs Hk as an input. The dependent variable U is computed from the explanatory (independent) variables p and bw, which are obtained from the repository. Setting x = p, the current predicted uncertainty, and y = bw, the current available bandwidth, a percentage reduction factor U = f(x, y) is computed for the next schedule computation. The proposed procedure requires a set of historical data for training before the MLR approach can be applied.
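The complete computation (Algorithm 3 followed by the MLR fit) can be sketched as follows, assuming the history is a list of 5-tuples (bw, p, U, error_m, error_cost). numpy's least-squares routine stands in for the MLR here, and all names are illustrative rather than taken from the authors' implementation.

    import numpy as np

    def build_Hk(history):
        """Algorithm 3: for each (bw, p) pair, keep the (bw, p, U) tuples whose
        U minimized |error_m| (makespan) and |error_cost| (monetary cost)."""
        Hk = []
        for key in {(bw, p) for (bw, p, U, em, ec) in history}:
            group = [h for h in history if (h[0], h[1]) == key]
            best_m = min(group, key=lambda h: abs(h[3]))  # min |error_m|
            best_c = min(group, key=lambda h: abs(h[4]))  # min |error_cost|
            Hk += [best_m[:3], best_c[:3]]
        return Hk

    def fit_U(Hk):
        """Least-squares fit of f(x, y) = a*x + b*y + c with x = p and y = bw."""
        X = np.array([[p, bw, 1.0] for (bw, p, U) in Hk])
        t = np.array([U for (bw, p, U) in Hk])
        (a, b, c), *_ = np.linalg.lstsq(X, t, rcond=None)
        return lambda p, bw: a * p + b * bw + c

    # Usage: U = fit_U(build_Hk(history))(current_p, current_bw)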

4 EVALUATION PROCEDURE

This section describes the evaluation of the efficacy of the proposed procedure. The performance of the scheduling algorithms used in this paper was evaluated in the publications in which they were originally presented [7], [8]. Comparing results derived using the proposed scheme with optimal values is infeasible due to the NP-Completeness of the scheduling problem. The objective of this section is to assess the proposed mechanism when attached to existing schedulers.

4.1 Workflows

The proposed procedure is evaluated using simulations with synthetic workflows from the workflow generator gallery^1. The gallery contains synthetic workflows that were modeled using patterns taken from real applications, such as Montage, LIGO and CyberShake. Workflows with different sizes, such as 50, 100, 200 and 300 tasks (DAG nodes), are employed. For each workflow application and for each workflow size, 100 different workflows were generated, differing in the computational demands (weights of the DAG nodes), uniformly distributed in the interval [1, 10], and the communication demands (weights of the DAG edges), uniformly distributed in the interval [50, 60]. The set of synthetic workflows contains 3 types of workflow applications, 4 different workflow sizes, and 100 workflows per application and size, resulting in a total of 1,200 synthetic workflows. The simulator reads the workflow description files in the DAX format used by the Pegasus Workflow Management System^4.

TABLE 1
Hybrid cloud environment

Provider      Type  Cores  Core performance  Price per time unit  Data transfer out price
A1 (private)  S     1      1.0               $0.00                $0.00
A2 (private)  S     1      1.0               $0.00                $0.00
              M     1      1.5               $0.00                $0.00
A3 (private)  S     1      1.0               $0.00                $0.00
              M     1      1.5               $0.00                $0.00
              G     2      2.0               $0.00                $0.00
B (public)    S     1      2.75              $0.104               $0.01
              M     2      2.75              $0.207               $0.01
              L     4      2.75              $0.415               $0.01
              XL    8      2.75              $0.829               $0.01
C (public)    S     1      1.0               $0.06                $0.12
              M     1      2.0               $0.12                $0.12
              L     2      2.0               $0.24                $0.12
              XL    4      2.0               $0.48                $0.12
              XXL   8      3.25              $0.90                $0.12

4.2 Hybrid Cloud Scenario

Hybrid cloud scenarios are shown in Table 1. They employ resources and prices based on the configurations of the Amazon EC2^5 and Google Compute Engine^6 cloud providers. Three different sizes of private cloud are used to simulate situations in which private resources are not sufficient to handle the computational demands of the workflow execution, so that leasing public resources becomes necessary to meet the application deadline.

4.3 Simulator

To evaluate the use of the proposed procedure, a cloud simulator was developed to calculate the estimated values of the makespan and monetary cost considering an uncertainty value p%. Figure 3a shows an overview of how the simulations were performed. The HCOC and PSO schedulers were implemented in Java according to their descriptions in [7] and [8], respectively, and the adapted-PSO according to the adjustments described in Section 2.3. The data inputs for the scheduler are a DAX file, a VMs file and an estimated value of the available inter-cloud bandwidth bw.

4. https://pegasus.isi.edu/
5. http://aws.amazon.com/ec2/
6. https://cloud.google.com/compute/

The DAX file contains the description of the workflow in XML format, while the VMs file contains information about the hybrid cloud (Table 1). For the HCOC and adapted-PSO schedulers, a deadline value for the workflow execution is also provided as input data. The estimated inter-cloud available bandwidth value bw is deflated by the factor U, and the result is used by the scheduler; U = 0 means that no reduction was applied, and U = 5 means that the deflated bandwidth is 95% of the value provided at the scheduling time.
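The deflation itself is a one-line transformation; as a sketch (with U given in percent, as in the text):

    def deflate(bw, U):
        """Bandwidth handed to the scheduler; U = 5 yields 95% of the estimate."""
        return bw * (1.0 - U / 100.0)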

After the schedule is produced by the scheduler, it is used as an input to the simulator, which estimates the values of the makespan and cost of the execution of the workflow. The simulator uses the uncertainty value p of the inter-cloud bandwidth. Every time the simulator needs to use the value of the inter-cloud available bandwidth, it takes a new value that is uniformly distributed in the interval [b − p%; b + p%]. This uncertainty p is only applied to the inter-cloud available bandwidth.
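The simulator's perturbation of the inter-cloud bandwidth can be sketched as a uniform draw over the interval above; random.uniform is used here only to illustrate the sampling, with p expressed as a fraction.

    import random

    def observed_bandwidth(b, p):
        """Draw a bandwidth value uniformly from [b*(1 - p), b*(1 + p)],
        e.g., p = 0.5 for an uncertainty of 50%."""
        return random.uniform(b * (1.0 - p), b * (1.0 + p))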

4.4 Experimental Parameters

A broad space of parameter values, including deadlines, was used, with cases ranging from tight deadlines, which imply that a small number of workflows can be completed with private resources, to looser deadlines, which imply that almost all of the workflows can be completed using only private resources. In addition, three levels of cloud utilization were employed, ranging from an idle private cloud, in which most of the executions could be performed without violating the deadline, to a busy private cloud, which calls for the use of a public cloud to accomplish most of the executions before the deadline.

The deadline values, D, varied from Tmax × 2/7 to Tmax × 6/7 in steps of 1/7, where Tmax is the makespan of the least expensive sequential execution of all the DAG nodes on a single resource. Deadlines of Tmax × 1/7 resulted in only disqualified solutions (makespans greater than the deadline value) for all executions. Schedules for deadlines of Tmax × 7/7 can be trivially achieved by scheduling all of the tasks sequentially on the least expensive resource and were not generated in this paper.
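In code, this deadline grid amounts to the following (the value of Tmax is an arbitrary example):

    Tmax = 700.0  # makespan of the cheapest sequential execution (example value)
    deadlines = [Tmax * k / 7 for k in range(2, 7)]  # Tmax*2/7, ..., Tmax*6/7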

Inter-cloud bandwidths of 10 to 60 Mbps and intra-cloud bandwidths of 1 Gbps were considered. We assume that the intra-cloud bandwidth is greater than the inter-cloud bandwidth, which is a reasonable assumption in real hybrid cloud environments. The available bandwidth values in the inter-cloud links were based on measurements of TCP connections for the Amazon EC2 cloud [21] and the Rackspace.com cloud [22].

[Figure 3 shows block diagrams of the simulation setup. In the first step (a), the scheduler receives the DAX file, the VMs file and the available bandwidth b deflated by the reduction factor U; the resulting schedule is fed to the simulator, which applies the uncertainty p and stores the resulting makespan and cost ($) in the database. In the second and third steps (b), the procedure combines the available bandwidth b reported by the resource monitor and the uncertainty p with the database history to compute the reduction factor U supplied to the scheduler, which then schedules the workflow onto the hybrid cloud.]

Fig. 3. Overview of the steps of the simulations.

4.5 Experimental Setup

The procedure was evaluated in three steps. In the first step, a database that contains historical information about the workflow execution is produced for each type of workflow, deadline and scheduler (Figure 3a) using a fixed bandwidth deflating factor U. In the second step, the coefficient values of the MLR equation are generated using information from the database. In the last step, when a new workflow is set to be scheduled, the MLR procedure is applied to deflate the bandwidth value, which is used as an input to the scheduler (Figure 3b).

Initially, we assessed how using the procedure with a fixed U factor ameliorates the negative impact of imprecise information about the inter-cloud available bandwidth on the execution of the workflow. To investigate the results of the produced schedules, fixed bandwidth deflating factors U ∈ {0, 5, 15, 25, 35, 45, 50} were used. The choice of the fixed U values was based on experiments performed for this purpose. Results indicated that when the measured available bandwidth values were deflated by more than 50%, the scheduler did not find qualified solutions within the established deadline in most of the experiments conducted. For this reason, U values less than or equal to 50% were used in this paper. For each value of U, the percent error p ranged from 10% to 99%. For example, given a U value, a percentage error p was introduced into the simulation to derive the makespan and costs. The scheduling and simulation results were stored in the database. Therefore, for each 3-tuple ⟨bw, p, U⟩, the cost and makespan estimates given before (by the original values from the scheduler) and after the simulations were stored.

In the second step, based on the type of workflow and the scheduler, the values of the coefficients of the multiple linear regression (MLR) equation are computed using Algorithm 3. The MLR derives the values of the coefficients a, b and c in the equation f(x, y) = ax + by + c. By checking the resource monitor and making x = p the currently predicted uncertainty and y = bw the currently available bandwidth, a deflating factor U = f(x, y) can be computed and used as an input to the scheduler.

In the last step, new simulations were run for each type of workflow to evaluate the behaviour of the proposed mechanism. For all U = f(x, y) and p values, average values were computed considering only the schedules that result in makespan values lower than the deadline.

5 RESULTS

This section presents numerical results that were produced to assess the impact of imprecise information about the available bandwidth in inter-cloud links as well as the effectiveness of using a constant value and a value derived by the MLR procedure for the deflating factor that is applied to the available bandwidth.

5.1 Impact of Imprecise Information about the Available Bandwidth on Inter-Cloud Links

To address the question posed in the introduction, uniformly-distributed errors were introduced into the estimates of the available bandwidth in inter-cloud links, and the increase in CPU usage and the achievement of the target objectives were assessed. The scheduling algorithms were not changed to cope with inaccurate estimates of the available bandwidth in this experiment. For each solution that satisfies the deadline (qualified solution) defined by the scheduler, the estimates of cost and makespan were obtained by applying a random error p to the estimated available bandwidth bw, which is considered to be 100% accurate at the scheduling time.

5.1.1 Resource Demand

The goal of this evaluation is to assess, for different bandwidth availabilities in the inter-cloud link, the need to lease public resources to meet the execution deadline of workflows. The CPU-time usage for the execution of workflows was evaluated considering three private clouds: A1, A2 and A3 (Table 1). Workflow templates with 50, 100, 200 and 300 tasks for three different applications were employed in the evaluation. For each workflow template, twenty workflows were created with different values of the weights of nodes and edges, for a total of 240 workflows. Each workflow was simulated using 3 different scheduling algorithms, 5 different deadline values, 6 inter-cloud estimates of available bandwidth and 11 different values of p. Therefore, results were derived from an extensive set of 712,800 simulations.


[Figure 4 contains three barplots of CPU-time usage, split between private and public resources, for the private clouds A1, A2 and A3 as the uncertainty p grows from 0% to 99%: (a) DG = Tmax × 2/7, (b) DG = Tmax × 4/7 and (c) DG = Tmax × 6/7.]

Fig. 4. Average CPU usage of private and public resources with the HCOC scheduler when scheduling the 50-task CyberShake application, with the error in the measured inter-cloud available bandwidth of 30 Mbps varying from 0% to 99%, for the private cloud sizes A1, A2 and A3 of Table 1.

[Figure 5 contains three barplots of the cost ($) of the Montage, LIGO and CyberShake applications as the uncertainty p grows from 0% to 99%: (a) Cost – HCOC scheduler, (b) Cost – adapted-PSO scheduler and (c) Cost – PSO scheduler.]

Fig. 5. Barplots of the cost estimates given by the HCOC, adapted-PSO and PSO schedulers when scheduling 50-task workflow applications, with the error in the estimated available bandwidth of 40 Mbps varying from 0% to 99%. For the HCOC and adapted-PSO schedulers, the deadline value was DG = Tmax × 4/7.

Figure 4 shows the average CPU usage in the private as well as the public cloud for the HCOC scheduler when scheduling a 50-task workflow of the CyberShake application, with the error in the measured inter-cloud available bandwidth of a 30 Mbps link varying from 0% to 99%; the usage is shown as a function of the size of the private cloud and of the p value. Figure 4 evinces the increase in CPU-time usage with increasing uncertainty in the available bandwidth estimates (for better visualization, the results for p values of 10%, 30%, 50%, 70% and 90% are not plotted). The increase in data transfer times implies longer periods of CPU usage because the jobs need to wait longer for the data they depend on. This increase in CPU-time usage lengthens the idle periods during which the CPU is allocated but unavailable for processing other dependent jobs, which can lead to the need to lease resources from the public cloud.

Figure 4 shows results for three different deadline values: a low, a median and a high value, each expressed as a proportion of Tmax. In most cases, the low deadline cannot be met when using private cloud A3, and resources from the public cloud must consequently be leased (Figure 4a). For a median deadline, however, the resources of private cloud A3 were generally sufficient, which was not the case when the workflows were executed in private clouds A1 and A2 (Figure 4b). For a high deadline value, execution only on private cloud A1 was not satisfactory (Figure 4c). In general, private clouds A2 and A3 have either sufficient or nearly sufficient resources to process the required workload. Because the aim of this study is to analyze the impact of the uncertainty of the available bandwidth in the inter-cloud links, results are shown hereafter only for private cloud A1, since its resources were not sufficient to meet the application deadline, independent of the workflow application, scheduling algorithm, deadline value, and available bandwidth in the inter-cloud links.

TABLE 2
Percentage of disqualified solutions of Figure 5

Scheduler      Application  Uncertainty p
                            0%    20%   40%   60%   80%   99%
HCOC           Montage      12%   58%   64%   82%   91%   92%
               LIGO         42%   47%   49%   56%   62%   79%
               CyberShake    1%   22%   24%   26%   28%   63%
Adapted-PSO    Montage       0%   16%   17%   18%   39%   95%
               LIGO          1%   20%   20%   25%   24%   74%
               CyberShake    3%   19%   21%   19%   23%   49%

5.1.2 Failure to Achieve the Established Objective

Various simulations were performed to assess the impact on the achievement of the objectives expressed in the objective function of the algorithms. This evaluation involved 1,200 workflows (3 types of application workflow templates × 4 different sizes per template × 100 workflows per application and size). These workflows were simulated using 6 different available bandwidth values, 11 p values and 5 deadline values. The evaluation summarizes the outcome of an extensive set of 79,200 simulations for the PSO scheduler and 396,000 simulations for the HCOC and adapted-PSO schedulers. The number of simulations using the PSO scheduler was five times lower since the deadline value is not part of its input data. Figure 5 shows the cost estimates given by the HCOC, adapted-PSO and PSO schedulers when scheduling a 50-task workflow application, with the error in the measured inter-cloud available bandwidth of a 40 Mbps link varying between 0% and 99%.

The Montage and CyberShake applications have a similar structural pattern in which the DAG jobs send the same intermediate file to two or more successor jobs. Moreover, depending on how the schedule was produced, sending multiple copies of the intermediate file can increase the traffic in the inter-cloud links [24]. Because the PSO algorithm does not consider the available bandwidth between resources but only the cost of data transmission from/to the public cloud, unnecessary flooding of the inter-cloud links can occur. This implies an increase in the execution cost of the Montage and CyberShake applications (Figure 5c). For example, when applying an uncertainty value of 20% to the inter-cloud available bandwidth of 40 Mbps that was used to produce the original schedule, the average cost of running the Montage application in hybrid clouds increased by 12%, while for an uncertainty value of 99%, the cost doubled. For CyberShake, the cost increased by 9% for a p value of 20%, while for a p value of 99% the cost doubled as well. Although the LIGO application sends fewer duplicate intermediate files to other dependent jobs, which contributes less traffic in the inter-cloud links, the uncertainty in the available bandwidth estimates still contributed to the increase in the cost estimates: the cost increased by 4% for p = 20% and by 47% for p = 99%.

For each qualified schedule, a variability p > 0 in the available bandwidth estimates was introduced by the simulator, which then checked whether the schedule remained qualified. Table 2 shows the percentage of solutions that missed the deadline (disqualified solutions) for the HCOC and adapted-PSO schedulers. Schedules produced by the PSO scheduler were not classified as either qualified or disqualified since it is a cost-aware scheduler.

Because both HCOC and the adapted-PSO are deadline- and cost-aware schedulers, they are expected to produce schedules that meet the deadline with low costs. However, the effect of errors in the inter-cloud available bandwidth estimates resulted in increased makespans, which caused the deadline to be missed. For example, of the 88% of the solutions for Montage produced by HCOC that were qualified when p = 0%, 58% were disqualified for p = 20%, and 91% were disqualified for p = 80%. The schedules produced by the adapted-PSO were less impacted: of the 100% of the Montage solutions that were qualified when p = 0%, 16% were disqualified when p = 20%, and 39% were disqualified when p = 80%. The cost estimates provided by HCOC increased by 5% when the p value varied from 0% to 20%, and by 30% when the p value varied from 0% to 80%. The cost estimates provided by the adapted-PSO also increased with increasing values of p: by 14% when the p value varied from 0% to 20%, and by 28% when the p value varied from 0% to 80%. Similar trends were obtained for the LIGO and CyberShake workflows; i.e., increasing the error in the estimated available bandwidth increases the number of disqualified schedules as well as the underestimation of costs.

This section showed results for schedulers that were not designed to handle inaccurate information about the available bandwidth and that consequently produced schedules with misleading cost and makespan estimates. The next section shows that the use of a deflating factor for the available bandwidth that is used as an input to the scheduler can reduce the disqualification of schedules by avoiding underestimates of the makespan and cost. Because the original PSO scheduler does not consider the available bandwidth to produce schedules, it is not evaluated hereafter.

5.2 Using a Fixed Deflating U value

Deflating the measured available bandwidth value using the U value and providing the deflated bandwidth estimate to the scheduler at scheduling time will hopefully reduce the negative impact of imprecise bandwidth information on the production of schedules. The results of the next experiments assess the extent to which a deflating factor U mitigates the negative impact of imprecise information about the available bandwidth on the produced schedules.

One thousand two hundred workflows were employed in this evaluation (3 types of application workflow templates × 4 different workflow sizes per template × 100 workflows per application and size). These workflows were used in simulations with 6 different available bandwidth values, 11 p values, 5 deadline values and 6 deflating U values. The experiment used fixed U values, U ∈ {0, 5, 15, 25, 35, 45, 50}, but for better visualization, results for U = 35 and U = 45 were not plotted.


[Figure 6: six barplots of cost ($) versus the uncertainty p (0% to 99%), with one bar per deflating factor U ∈ {0.0, 0.05, 0.15, 0.25, 0.50}. Panels: (a) LIGO – HCOC; (b) CyberShake – HCOC; (c) Montage – HCOC; (d) LIGO – Adapted-PSO; (e) CyberShake – Adapted-PSO; (f) Montage – Adapted-PSO.]

Fig. 6. Barplots of the costs of disqualified solutions when the measured available bandwidth of 40 Mbps was deflated by a fixed U value and provided as input to the HCOC (top) and adapted-PSO (bottom) schedulers to schedule application workflows of 50 tasks with the deadline value equal to DG = Tmax × 4/7. The x-axis represents the estimated error in the available bandwidth, which varies from 0% to 99%.

This experiment summarizes the outcome of a set of 2,376,000 simulations for each deadline- and cost-aware scheduler. Figure 6 shows the cost estimates given by the HCOC and adapted-PSO schedulers when scheduling a 50-task workflow application as the uncertainty of the measured inter-cloud available bandwidth of 40 Mbps varied from 0% to 99%. Table 3 shows the percentage of disqualified solutions in Figure 6 when the available bandwidth estimates were deflated by a fixed U value.

The results show a reduction in the percentage of disqualified schedules when the deflating factor U was employed, a trend present in most cases for the three types of applications. For example, using HCOC with U = 0.05 and p = 60%, the number of qualified solutions increased by 10%, 21% and 22% for the LIGO, CyberShake and Montage applications, respectively, while for the adapted-PSO the increases were 21%, 20% and 8%.

However, the use of a large deflating U value increased the number of disqualified solutions. For HCOC, when U = 0.5 and p = 60%, the number of disqualified solutions for LIGO, CyberShake and Montage increased by 15%, 5% and 13%, respectively, while the adapted-PSO produced increases of 15%, 20% and 24%. This experiment also suggests that the number of disqualified solutions can be reduced even in the presence of high variability of the available bandwidth. For example, for an uncertainty value of 99% and U = 0.05, the number of qualified solutions produced by HCOC increased by 23% for CyberShake and 3% for LIGO; for the adapted-PSO, the increases were 2% for Montage, 21% for LIGO, and 7% for CyberShake. In general, regardless of the scheduler and application, deflating the available bandwidth estimates increases the number of schedules that satisfy the deadline in the presence of variability of the available bandwidth of inter-cloud links.
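The interplay between U and p behind these numbers can be made concrete with a small numerical sketch (our own simplifying illustration, assuming transfer time = data size / bandwidth; all values are hypothetical):

    # Planned vs. realized transfer time for one inter-cloud data dependency
    data_mbit = 100.0   # hypothetical amount of data to transfer (megabits)
    bm = 40.0           # measured available bandwidth (Mbps)
    U, p = 0.05, 0.60   # deflating factor and actual estimation error

    planned_s = data_mbit / ((1.0 - U) * bm)   # scheduler's budget: ~2.63 s
    realized_s = data_mbit / ((1.0 - p) * bm)  # actually experienced: 6.25 s
    # With U much smaller than p, the budget is badly optimistic, which is how
    # makespans overshoot the deadline and schedules become disqualified; a U
    # close to p closes this gap, while U >> p over-provisions and raises costs.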

The use of a non-null deflating factor U did not imply an increase in estimated costs, as shown in Figure 6. For example, using HCOC to schedule the LIGO application with p = 40% led to costs 13% lower for U = 0.05 than for U = 0. In some cases, a large deflating U value reduced costs further than a small one: when the adapted-PSO algorithm scheduled the CyberShake application with p = 10%, the cost was 6% lower when using U = 0.25 instead of U = 0.05. Such cost decreases are possible because the set of virtual machines selected by the algorithm for the same task scheduling differs when the available bandwidth estimates are deflated by different U values.

These experiments suggest that the decrease in the number of disqualified solutions depends on the U value and that there is no clear trend for choosing an optimal value. This motivated the adoption of a procedure that considers the execution history to determine the most appropriate U value.


TABLE 3
Percentage of disqualified solutions in the evaluation of Figure 6 for the HCOC scheduler (top) and the adapted-PSO scheduler (bottom)

HCOC                          Uncertainty p%
Workflow     U value    10%   20%   40%   60%   80%   99%
LIGO         0.00       52%   55%   61%   67%   81%   85%
             0.05       43%   53%   53%   57%   56%   66%
             0.15       45%   46%   47%   59%   62%   70%
             0.25       46%   47%   47%   60%   58%   75%
             0.50       79%   80%   80%   82%   86%   88%
CyberShake   0.00       16%   17%   22%   23%   31%   80%
             0.05        2%    2%    2%    2%    5%   13%
             0.15        1%    1%    1%    1%    6%   24%
             0.25        6%    6%    6%    6%   10%   22%
             0.50       22%   24%   26%   28%   37%   57%
Montage      0.00       13%   58%   64%   82%   91%   92%
             0.05        1%   49%   52%   60%   79%   85%
             0.15        1%   74%   73%   75%   81%   87%
             0.25        2%   77%   79%   83%   87%   89%
             0.50       74%   90%   92%   95%   95%   95%

Adapted-PSO                   Uncertainty p%
Workflow     U value    10%   20%   40%   60%   80%   99%
LIGO         0.00       18%   19%   17%   22%   22%   61%
             0.05        3%    3%    3%    3%    3%    6%
             0.15       11%   11%   11%   11%   12%   13%
             0.25       14%   14%   14%   14%   16%   19%
             0.50       20%   20%   20%   32%   35%   40%
CyberShake   0.00       18%   17%   18%   25%   24%   52%
             0.05        5%    5%    5%    5%    7%    8%
             0.15       10%   10%   10%   11%   11%   13%
             0.25       14%   14%   14%   14%   15%   19%
             0.50       22%   23%   24%   40%   40%   45%
Montage      0.00       14%   15%   17%   13%   14%   67%
             0.05        2%    4%    4%    5%    7%   12%
             0.15       12%   13%   13%   15%   17%   19%
             0.25       14%   19%   19%   22%   26%   23%
             0.50       21%   33%   33%   37%   53%   65%


5.3 Using the Proposed Mechanism to Calculate the U value

When the U value is derived using the proposed procedure, qualified schedules were produced for almost all p values. The analysis of the execution history is the key to determining a specific deflating factor U. This section shows the results of simulations in which the proposed procedure computed a deflating factor U for the measured available bandwidth value used as input to the scheduler; the results indicate the extent to which a history-based U value can mitigate the negative impact of imprecise information about the available bandwidth. The experiment summarizes a set of 396,000 simulations for each deadline- and cost-aware scheduler (3 application workflow templates × 4 different sizes × 100 workflows per size × 6 available bandwidth values × 11 p values × 5 deadline values).

A comparison with the results obtained when the proposed MLR procedure was not employed (Section 5.1.2) shows a decrease in the percentage of disqualified schedules, independently of the p value, for the three types of applications and for both schedulers. For example, comparing Table 2 with Table 4 for the HCOC scheduler when the measured available bandwidth of 40 Mbps was deflated by a specific U value and p = 20%, the number of qualified solutions increased by 10%, 22% and 51% for the LIGO, CyberShake and Montage applications, respectively. When the adapted-PSO was employed, the number of qualified solutions increased by 17%, 16% and 13% for the LIGO, CyberShake and Montage applications, respectively. Even in scenarios with high uncertainty in the available bandwidth estimates, a noticeable increase in the number of qualified solutions was found. For example, using HCOC and p = 99%, the number of qualified solutions for the LIGO, CyberShake and Montage applications increased by 35%, 35% and 28%, respectively, while for the adapted-PSO the increases were 31%, 19% and 12%.

The use of a specific U value not only increases the number of qualified solutions but also yields slightly less expensive schedules, as shown in Figure 7. For example, using HCOC and p = 40%, the costs of the LIGO, CyberShake and Montage applications were, respectively, 9%, 5% and 4% lower than the original cost estimates (Figure 6 with U = 0). The adapted-PSO scheduler reduced the costs of the LIGO, CyberShake and Montage applications by 5%, 4% and 2%, respectively. These results show that when deflated values of the measured available bandwidth are used, the scheduler produces schedules with higher utilization of resources, ensuring that the schedules meet the deadlines while reducing misleading cost estimates.

A U value computed by the proposed procedure depends on several parameters, such as the type of workflow, the deadline value, the measured available inter-cloud bandwidth, and the errors in previous estimates of the makespan and cost caused by uncertainty in the bandwidth value. Because several parameters influence the U value, applying a fixed value as in the previous subsection does not yield a single best deflating factor: a fixed value can either underestimate or overestimate the available bandwidth, leading to a deadline violation or to the unnecessary rental of processing power on public clouds. Thus, for a particular workflow application, deadline value, available bandwidth value and uncertainty value, the execution history assists the proposed procedure in determining a specific U value that tries to avoid the disqualification of the schedule.

The computation of the U value depends on the size of the execution history. The greater the number of tuples 〈bm, p, U〉 in the Hk subset, the greater the amount of information used by the multiple linear regression to yield the linear equation U = f(x, y). For example, in our previous work [14], the multiple linear regressions were computed using 25%, 50%, 75% and 100% of the historical data.


[Figure 7: six barplots of cost ($) versus the uncertainty p (10% to 99%), with one bar per workflow (Montage, LIGO, CyberShake). Panels: (a) HCOC – 40 Mbps; (b) HCOC – 50 Mbps; (c) HCOC – 60 Mbps; (d) Adapted-PSO – 40 Mbps; (e) Adapted-PSO – 50 Mbps; (f) Adapted-PSO – 60 Mbps.]

Fig. 7. Barplots of the costs when the measured available bandwidths of 40 Mbps, 50 Mbps and 60 Mbps were deflated by a specific U value and provided as input to the HCOC (top) and adapted-PSO (bottom) schedulers to schedule application workflows of 50 tasks with the deadline value equal to DG = Tmax × 4/7.

Using half of the execution history produced better results than using a quarter of it, and a similar pattern of improvement was observed when three quarters of the historical data were used instead of half. However, there was little improvement in the results when the amount of historical data increased from 75% to 100%. Therefore, since the amount of execution history in this paper is the same as in the previous paper, we consider only the employment of 100% of the historical data in the evaluation of the proposed procedure. Due to space limitations, in [14] we presented evaluation results only for 50% and 100% of the historical data.
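As an illustration of this step, the regression can be sketched as follows (a minimal sketch assuming history tuples 〈bm, p, U〉 and an ordinary least-squares fit of U = f(bm, p); the function name and the toy history values are hypothetical):

    import numpy as np

    # Hypothetical execution-history subset Hk for one workflow/deadline class:
    # (measured bandwidth bm in Mbps, uncertainty p, U value that performed well).
    history = [(40.0, 0.10, 0.05), (40.0, 0.60, 0.25),
               (50.0, 0.20, 0.05), (60.0, 0.80, 0.25)]

    A = np.array([[1.0, bm, p] for bm, p, _ in history])  # design matrix [1, x, y]
    u = np.array([t[2] for t in history])                 # observed U values
    coef, _, _, _ = np.linalg.lstsq(A, u, rcond=None)     # fit U = c0 + c1*x + c2*y

    def predict_u(bm, p):
        # Clamp so the deflated bandwidth (1 - U) * bm stays positive.
        return float(min(max(coef[0] + coef[1] * bm + coef[2] * p, 0.0), 0.99))

With more tuples in Hk, the fitted plane tracks the relationship between the measured bandwidth, the expected uncertainty and the deflating factor more closely, which is why larger fractions of the execution history tend to improve the results.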

The results shown in this section indicate that the proposed mechanism reduces the impact of misleading estimates of the available bandwidth values and produces a large number of qualified solutions. When the proposed procedure is used with schedulers that were not designed to cope with inaccurate information about the available bandwidth, the costs were no higher than when the U value was not used, which implies that a higher utilization of resources in the private cloud was achieved.

6 RELATED WORK

Schedulers for cloud computing determine which jobs should use local resources and which resources should be leased from the public cloud to minimize the workflow execution time (makespan) and the overall cost. Resource allocation and workflow scheduling in heterogeneous systems, such as hybrid clouds, have been intensively studied [7], [8], [13], [25]. Nevertheless, scheduling workflows is an NP-complete problem [2]; as a result, alternatives are crucial for finding approximately optimal solutions. Heuristics [7], [13], meta-heuristics [8] and linear programming [25] are tools commonly used to solve scheduling problems.

Imprecision in the input information to the scheduler has been approached in the literature using several reactive techniques, such as dynamic scheduling [26], rescheduling [27], and self-adjusting allocation [28]. Reactive techniques are best suited to fluctuations in resource availability during the execution of the workflow. Because reactive mechanisms rely on monitoring resource utilization and application execution performance, imprecise information can be generated by the intrusiveness of the monitoring tools; as a result, unnecessary job migrations can occur in addition to the monitoring overhead [29].

An algorithm based on fuzzy optimization to schedule workflows in grids under uncertainty in both application demands and resource availability was presented in [29]. Although the authors compared the proposed algorithm to static schedulers for heterogeneous systems, the impact of imprecise estimates of the available bandwidth was not evaluated for utility grids and clouds.


TABLE 4
Percentage of disqualified solutions in the evaluation of Figure 7 for the HCOC scheduler (top) and the adapted-PSO scheduler (bottom)

HCOC                            Uncertainty p%
Bandwidth   Workflow     10%   20%   40%   60%   80%   99%
40 Mbps     LIGO         33%   37%   40%   47%   60%   70%
            CyberShake    0%    0%    3%    7%    7%   50%
            Montage       3%    7%   23%   27%   77%   90%
50 Mbps     LIGO         30%   20%   23%   37%   43%   70%
            CyberShake    0%    0%    3%    0%    7%   40%
            Montage       7%    0%    3%    7%   33%   87%
60 Mbps     LIGO          3%    3%    3%    0%    7%   43%
            CyberShake    3%   10%    0%    3%    3%    3%
            Montage       7%    3%    3%   13%   20%   83%

Adapted-PSO                     Uncertainty p%
Bandwidth   Workflow     10%   20%   40%   60%   80%   99%
40 Mbps     LIGO          3%    3%    3%    0%    7%   43%
            CyberShake    3%    3%    0%    3%    3%   30%
            Montage       7%    3%    3%   13%   20%   83%
50 Mbps     LIGO         30%   20%   23%   37%   43%   70%
            CyberShake    0%    3%    0%    0%    0%   20%
            Montage       7%    0%    3%    7%   33%   87%
60 Mbps     LIGO          0%    0%    0%    0%    3%   13%
            CyberShake    0%    0%    0%    3%    3%    7%
            Montage       7%    7%    3%    7%   10%   53%

Wang and Ng [5] conducted a measurement study to characterize the impact of virtualization on the network performance of Amazon EC2. Through the analysis of packet delay, TCP/UDP throughput and packet loss at Amazon EC2 virtual machines, they concluded that unstable network throughput and large delay variations can negatively impact the performance of scientific applications running on clouds. Indeed, the authors argued that task-mapping strategies need customization to handle inaccurate information about the available bandwidth.

Although these papers advance the state of the art in workflow scheduling for cloud computing environments, none of them considers the impact of bandwidth uncertainty on the schedule. In other words, the scheduling algorithms assume that the available bandwidth estimated at scheduling time is 100% precise. This assumption is not always true: tools for estimating the available bandwidth produce estimates with large variability [30], and schedules can be negatively affected by misleading bandwidth estimates during the execution of workflows, especially in hybrid clouds. Few works in the literature consider the impact of bandwidth uncertainty in communication channels at scheduling time.

7 CONCLUSIONS

The performance of workflow execution in hybrid clouds depends largely on the available bandwidth in the inter-cloud links, since this bandwidth determines the data transfer time between tasks residing in different clouds. The available bandwidth in inter-cloud links fluctuates considerably because Internet links are shared resources. This fluctuation contributes to imprecision in estimates of the available bandwidth, and the use of such estimates as input to a scheduler usually increases the workflow execution time as well as the costs. The negative impact of imprecise estimation of the available bandwidth can be reduced by adopting scheduling schemes that take uncertainties in input values into consideration.

The mechanism proposed in this paper applies a deflating factor to the available bandwidth value that is provided as input to the scheduler. The deflating factor is computed using multiple linear regression over the execution history of a specific workflow, and the procedure is employed before the information about the currently available bandwidth is provided to the scheduler.

We evaluated the proposed multiple linear regression mechanism using simulations of three different scheduling algorithms scheduling three different real-life scientific workflow applications. The results provide evidence that, in general, the proposal is capable of producing schedules with less misleading cost and makespan estimates, as well as increasing the number of qualified solutions (solutions within the application deadline). More specifically, the proposed procedure reduced the number of disqualified solutions by up to 70%, with costs reduced by up to 10%.

As future work, we intend to implement the proposed procedure in a real testbed in order to verify the efficacy of the mechanism in a hybrid cloud environment. We also intend to improve the performance of the proposed mechanism by updating the historical information and by deriving criteria for selecting obsolete information to be deleted from the database in order to maintain the scalability of the procedure.

ACKNOWLEDGMENT

The authors would like to thank CNPq, CAPES and the São Paulo Research Foundation (FAPESP grants 2012/02778-6 and 2014/08607-4) for their financial support.

REFERENCES

[1] Q. Zhang, L. Cheng, and R. Boutaba, "Cloud computing: state-of-the-art and research challenges," Journal of Internet Services and Applications, vol. 1, no. 1, pp. 7–18, 2010.

[2] M. L. Pinedo, Scheduling: Theory, Algorithms, and Systems. Springer Publishing Company, 2008.

[3] T. A. L. Genez, L. F. Bittencourt, N. L. S. da Fonseca, and E. R. M. Madeira, "Refining the estimation of the available bandwidth in inter-cloud links for task scheduling," in IEEE Global Communications Conference (GLOBECOM), Dec. 2014.

[4] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, "The cost of a cloud: Research problems in data center networks," SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 68–73, Dec. 2008.

[5] G. Wang and T. S. E. Ng, "The impact of virtualization on network performance of Amazon EC2 data center," in IEEE Conf. on Information Communications (INFOCOM), 2010, pp. 1163–1171.

[6] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang, "SecondNet: A data center network virtualization architecture with bandwidth guarantees," in 6th International Conference, ACM, 2010, pp. 15:1–15:12.

[7] L. F. Bittencourt and E. R. M. Madeira, "HCOC: a cost optimization algorithm for workflow scheduling in hybrid clouds," Journal of Internet Services and Applications, vol. 2, no. 3, pp. 207–227, 2011.

[8] S. Pandey, L. Wu, S. Guru, and R. Buyya, "A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments," in IEEE International Conference on Advanced Information Networking and Applications, Apr. 2010, pp. 400–407.

[9] E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz, "Pegasus: A framework for mapping complex scientific workflows onto distributed systems," Scientific Programming Journal, vol. 13, no. 3, pp. 219–237, 2005.

[10] A. Ramakrishnan, G. Singh, H. Zhao, E. Deelman, R. Sakellariou, K. Vahi, K. Blackburn, D. Meyers, and M. Samidi, "Scheduling data-intensive workflows onto storage-constrained distributed resources," in IEEE International Symposium on Cluster Computing and the Grid (CCGRID), May 2007, pp. 401–409.

[11] Y. Zhao, J. Dobson, I. Foster, L. Moreau, and M. Wilde, "A notation and system for expressing and executing cleanly typed workflows on messy scientific data," ACM SIGMOD Record, vol. 34, no. 3, pp. 37–43, Sep. 2005.

[12] J. Annis, Y. Zhao, J. Voeckler, M. Wilde, S. Kent, and I. Foster, "Applying Chimera virtual data concepts to cluster finding in the Sloan Sky Survey," in ACM/IEEE Conference on Supercomputing, Nov. 2002.

[13] L. F. Bittencourt, E. R. M. Madeira, and N. L. S. da Fonseca, "Scheduling in hybrid clouds," IEEE Communications Magazine, vol. 50, no. 9, pp. 42–47, 2012.

[14] L. Bittencourt, E. Madeira, and N. da Fonseca, "Impact of communication uncertainties on workflow scheduling in hybrid clouds," in IEEE Global Communications Conference (GLOBECOM), Dec. 2012, pp. 1623–1628.

[15] O. Biran, A. Corradi, M. Fanelli, L. Foschini, A. Nus, D. Raz, and E. Silvera, "A stable network-aware VM placement for cloud systems," in 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2012, pp. 498–506.

[16] K. LaCurts, S. Deng, A. Goyal, and H. Balakrishnan, "Choreo: Network-aware task placement for cloud applications," in Conference on Internet Measurement Conference, 2013, pp. 191–204.

[17] R. F. da Silva, W. Chen, G. Juve, K. Vahi, and E. Deelman, "Community resources for enabling research in distributed scientific workflows," in 10th IEEE International Conference on e-Science, Nov. 2014.

[18] L. F. Bittencourt and E. R. M. Madeira, "A performance-oriented adaptive scheduler for dependent tasks on grids," Concurrency and Computation: Practice and Experience, vol. 20, no. 9, pp. 1029–1049, 2008.

[19] H. Topcuoglu, S. Hariri, and M.-Y. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260–274, 2002.

[20] H. Casanova, A. Legrand, D. Zagorodnov, and F. Berman, "Heuristics for scheduling parameter sweep applications in grid environments," in 9th Heterogeneous Computing Workshop, 2000, pp. 349–363.

[21] S. Sanghrajka, N. Mahajan, and R. Sion, "Cloud performance benchmark series – network performance: Amazon EC2," Stony Brook University, Tech. Rep., 2011.

[22] S. Sanghrajka and R. Sion, "Cloud performance benchmark series – network performance: Rackspace.com," Stony Brook University, Tech. Rep., 2011.

[23] R. Wolski, N. T. Spring, and J. Hayes, "The network weather service: A distributed resource performance forecasting service for metacomputing," Future Generation Computer Systems, vol. 15, no. 5–6, pp. 757–768, Oct. 1999.

[24] G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta, and K. Vahi, "Characterizing and profiling scientific workflows," Future Generation Computer Systems, vol. 29, no. 3, pp. 682–692, Mar. 2013.

[25] T. A. L. Genez, L. F. Bittencourt, and E. R. M. Madeira, "Workflow scheduling for SaaS/PaaS cloud providers considering two SLA levels," in IEEE/IFIP Network Operations and Management Symposium (NOMS 2012), IEEE, 2012.

[26] G. Allen, D. Angulo, I. Foster, G. Lanfermann, C. Liu, T. Radke, E. Seidel, and J. Shalf, "The Cactus Worm: Experiments with dynamic resource discovery and allocation in a grid environment," Journal of High Performance Computing Applications, vol. 15, 2001.

[27] R. Sakellariou and H. Zhao, "A low-cost rescheduling policy for efficient mapping of workflows on grid systems," Scientific Programming, vol. 12, no. 4, pp. 253–262, Dec. 2004.

[28] D. M. Batista, N. L. S. da Fonseca, F. K. Miyazawa, and F. Granelli, "Self-adjustment of resource allocation for grid applications," Computer Networks, vol. 52, no. 9, pp. 1762–1781, Jun. 2008.

[29] D. M. Batista and N. L. S. da Fonseca, "Robust scheduler for grid networks under uncertainties of both application demands and resource availability," Computer Networks, vol. 55, no. 1, pp. 3–19, 2011.

[30] D. M. Batista, L. J. Chaves, N. L. Fonseca, and A. Ziviani, "Performance analysis of available bandwidth estimation tools for grid networks," Journal of Supercomputing, vol. 53, no. 1, pp. 103–121, Jul. 2010.

Thiago Augusto Lopes Genez received his bachelor's degree in Computer Science from the State University of Londrina, Brazil, in 2009, and his master's degree in Computer Science from the University of Campinas, Brazil, in 2012. He is currently a Ph.D. student at the University of Campinas, Brazil, and his research interests involve aspects of cloud computing. He has been researching algorithms for the problem of workflow scheduling in hybrid and public clouds.

Luiz Fernando Bittencourt is an Assistant Professor of Computer Science at the University of Campinas, Brazil. He received his Ph.D. in 2010 from the University of Campinas, and his research interests involve aspects of resource allocation and management in heterogeneous distributed systems, focusing on algorithms for scheduling in hybrid clouds. He is the recipient of the 2013 IEEE ComSoc Latin America Young Professional Award.

Nelson Luis Saldanha da Fonseca received his Ph.D. degree in Computer Engineering from the University of Southern California in 1994. He is a Full Professor at the Institute of Computing of the University of Campinas, Campinas, Brazil. He has published 350+ papers and supervised 50+ graduate students. He is Director for Conference Development of the IEEE Communications Society (ComSoc). He served as Vice President Member Relations of ComSoc, Director of the Latin America Region and Director of On-line Services. He is the recipient of the 2012 IEEE ComSoc Joseph LoCicero Award for Exemplary Service to Publications, the Medal of the Chancellor of the University of Pisa (2007) and the Elsevier Computer Networks Journal Editor of the Year 2001 award. He is past Editor-in-Chief of IEEE Communications Surveys and Tutorials. He is Senior Editor for IEEE Communications Surveys and Tutorials and for IEEE Communications Magazine, and a member of the editorial boards of Computer Networks, Peer-to-Peer Networking and Applications, Journal of Internet Services and Applications and the International Journal of Communication Systems. He founded IEEE LATINCOM and was technical chair of over 10 IEEE conferences.

Edmundo Roberto Mauro Madeira is a Full Professor of Computer Science at the University of Campinas, Brazil, and a member of the editorial board of the Journal of Network and Systems Management (JNSM), Springer. He received his Ph.D. in Electrical Engineering from the University of Campinas in 1991, and his research interests involve aspects of computer networks, distributed system management, grid and cloud computing, mobility-aware technologies and wireless networks.