CSU498 PROJECT REPORT

WORKFLOW SCHEDULING IN GRID COMPUTING

Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science and Engineering

Submitted by
SAMBIT KUMAR SAHOO B080322CS
SHASHI KUMAR B080442CS
VIBHUTI BHUSHAN B080487CS
VIVEK RANJAN B080572CS

Under the guidance of
Mr. Vinod Pathari
Assistant Professor
CSED, NIT Calicut

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY CALICUT
NIT CAMPUS PO, CALICUT
KERALA, INDIA 673601

April 18, 2012



Abstract

Grid computing is considered a promising next-generation computational platform that supports wide-area, distributed computing. Applications that run on Grids are generally regarded as workflows. Workflow management system models and workflow task scheduling algorithms are critical research areas. Scheduling workflows subject to Quality of Service (QoS) requirements is a challenging problem.

The aim of this project is to study the scheduling algorithms currently deployed in workflow systems, to modify one of these algorithms so that it handles multiple QoS requirements efficiently, to simulate the proposed algorithm, and to analyse its performance against the existing one.


Contents

1 Problem Definition

2 Introduction
  2.1 Grid Computing
  2.2 Workflow Management System
  2.3 Workflow Scheduling Algorithm For Grid Computing
    2.3.1 Scheduling Problem Overview[16]
    2.3.2 Best Effort Based Workflow Scheduling
  2.4 QoS constraint based scheduling
  2.5 Genetic Algorithms
    2.5.1 Requirements
    2.5.2 Initialization
    2.5.3 Selection
    2.5.4 Reproduction
    2.5.5 Termination
  2.6 Ant Colony Optimization
    2.6.1 Edge selection
    2.6.2 Pheromone update

3 Work Done
  3.1 Selection of Benchmark
    3.1.1 Genetic Algorithm[16]
  3.2 Overview of GridSim
  3.3 Modification and Design
  3.4 Result and Analysis

4 Conclusion and Future Work

References


Chapter 1

Problem Definition

The aim of this project is to study job-scheduling algorithms for QoS-based workflow scheduling that follow the meta-heuristic approaches of Genetic Algorithms (GA) and Ant Colony Optimization (ACO). We then simulate one of these existing algorithms (our benchmark) [16] using GridSim [2], propose modifications to the algorithm for an optimised solution, simulate the modified algorithm, and analyse its performance against the benchmark.


Chapter 2

Introduction

2.1 Grid Computing

• “Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal.”[10]

• “The grid can be thought of as loosely coupled, heterogeneous distributed system with non-interactive workloads.”[10]

Grids [7] have emerged as a global cyber-infrastructure for the next generation of e-science and e-business applications, by integrating large-scale, distributed and heterogeneous resources. A number of Grid middleware and management tools such as Globus [6], UNICORE [1] and Gridbus [3] have been developed in order to provide the infrastructure that enables users to access remote resources transparently over a secure, shared, scalable world-wide network. More recently, Grid computing has progressed towards a service-oriented paradigm which defines a new way of service provisioning based on utility computing models. Within utility Grids, each resource is represented as a service for which consumers can negotiate their usage and Quality of Service.

Scientific communities in areas such as high-energy physics, gravitational-wave physics, geophysics, astronomy and bioinformatics are utilizing Grids to share, manage and process large data sets. In order to support complex scientific experiments, distributed resources such as computational devices, data, applications and scientific instruments need to be orchestrated while managing the application workflow operations within Grid environments. Workflow is concerned with the automation of procedures, whereby files and other data are passed between participants according to a defined set of rules in order to achieve an overall goal. A workflow management system defines, manages and executes workflows on computing resources.

• Workflow Management is the key service in Grid Computing.


2.2 Workflow Management System

• It is a computer system that defines and manages a series of tasks within an organization to produce a final outcome or outcomes.

Workflow is concerned with the automation of procedures where documents, information or tasks are passed between participants according to a defined set of rules to achieve or contribute to an overall business goal.

Realizing workflow management for Grid computing requires a number of challenges to be overcome. They include workflow application modeling, workflow scheduling, resource discovery, information services, data management and fault management. However, from the user’s perspective, two important barriers that need to be overcome are:

1. the complexity of developing and deploying workflow applications

2. their scheduling on heterogeneous and distributed resources to enhance the utility of resources and meet user Quality of Service (QoS) demands.

Figure 2.1: Workflow Scheduling System for Grid [16]


2.3 Workflow Scheduling Algorithm For Grid Computing

Challenges in scheduling workflow applications in a Grid environment arise because:

1. Resources are shared on Grids and many users compete for resources.

2. Resources are not under the control of the scheduler.

3. Resources are heterogeneous and may not all perform identically for any given task.

4. Many workflow applications are data-intensive and large data sets are required to be transferred between multiple sites.

Therefore, Grid workflow scheduling is required to consider non-dedicated and heterogeneous execution environments. It also needs to address the issue of large data transmission across various data communication links.

2.3.1 Scheduling Problem Overview[16]

We model a workflow application as a Directed Acyclic Graph (DAG). Let $\Gamma$ be the finite set of tasks $T_i$ ($1 \le i \le n$). Let $\Lambda$ be the set of directed arcs of the form $(T_i, T_j)$, where $T_i$ is called a parent task of $T_j$, and $T_j$ the child task of $T_i$. We assume that a child task cannot be executed until all of its parent tasks have been completed. Let $m$ be the total number of services available. There is a set of services $S_i^j$ ($1 \le i \le n$, $1 \le j \le m_i$, $m_i \le m$) capable of executing the task $T_i$, but each task can only be assigned for execution on one of these services. Services have varied processing capability delivered at different prices. We denote $t_i^j$ as the sum of the processing time and data transmission time, and $c_i^j$ as the sum of the service price and data transmission cost for processing $T_i$ on service $S_i^j$.

Let $B$ be the cost constraint (budget) and $D$ be the time constraint (deadline) specified by the users for workflow execution. The budget constrained scheduling problem is to map every $T_i$ onto a suitable $S_i^j$ to minimize the execution time of the workflow and complete it within $B$. The deadline constrained scheduling problem is to map every $T_i$ onto a suitable $S_i^j$ to minimize the execution cost of the workflow and complete it within $D$.
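The model above can be made concrete with a small sketch. The `Service` tuple, the `evaluate` helper and the four-task example are illustrative assumptions, not part of [16]; data transmission is assumed to be folded into the per-service time and cost, as in the definitions of the processing time and cost above, and resource contention is ignored.

```python
from collections import namedtuple

# Hypothetical container: time and cost of running one task on one service,
# with data transmission already folded in (t_i^j and c_i^j in the text).
Service = namedtuple("Service", ["time", "cost"])

def evaluate(parents, services, assignment):
    """Return (makespan, total_cost) of a schedule.

    parents[t]    -> list of parent tasks of t (the arcs of the DAG)
    services[t]   -> list of Service options able to run t
    assignment[t] -> index j of the service chosen for t
    Assumes every task runs on its own service instance, so a task only
    waits for its parents, never for a busy resource.
    """
    finish = {}

    def finish_time(t):
        if t not in finish:
            ready = max((finish_time(p) for p in parents[t]), default=0.0)
            finish[t] = ready + services[t][assignment[t]].time
        return finish[t]

    makespan = max(finish_time(t) for t in parents)
    total_cost = sum(services[t][assignment[t]].cost for t in parents)
    return makespan, total_cost

# A four-task diamond: T1 -> {T2, T3} -> T4.
parents = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}
services = {t: [Service(time=2.0, cost=10.0), Service(time=5.0, cost=4.0)]
            for t in parents}
fast = {t: 0 for t in parents}
cheap = {t: 1 for t in parents}
print(evaluate(parents, services, fast))   # fastest service for every task
print(evaluate(parents, services, cheap))  # cheapest service for every task
```

A budget constrained scheduler would search over `assignment` for the smallest makespan whose total cost stays within B; a deadline constrained scheduler would search for the smallest cost whose makespan stays within D.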

2.3.2 Best Effort Based Workflow Scheduling

Community Grids, in which resources are shared by different organizations, are targeted by best effort based workflow algorithms. These algorithms attempt to complete the execution at the earliest possible time, that is, to minimize the makespan (the time from the start of an application until its outputs are available to the user).

In general, best effort scheduling algorithms are derived from either a heuristic or a meta-heuristic approach.


Figure 2.2: Taxonomy of Workflow Scheduling[16]

Heuristics

A heuristic is a rule of thumb, intuitive judgment, educated guess or other experience-based technique for problem solving, used to speed up the search for a solution where exhaustive search is impractical [12].

The heuristics proposed for workflow scheduling can be classified into two categories:

• task level

• workflow level

Task level heuristics make scheduling decisions based solely on the data available about the independent tasks at hand, while workflow level heuristics take the whole workflow into account. Min-Min, Max-Min and Suffrage are three major task level heuristics employed for scheduling workflows on Grids. Two workflow level heuristics have been employed by the ASKALON project [14]: one is based on Genetic Algorithms and the other is the Heterogeneous-Earliest-Finish-Time (HEFT) algorithm [13] (Figure 2.3).
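As an illustration of a task level heuristic, the following is a minimal sketch of Min-Min for a set of independent tasks. It assumes that the execution time of every task on every resource is known in advance; the function name and the example data are hypothetical.

```python
def min_min(exec_time):
    """Min-Min for independent tasks.

    exec_time[t][r] is the (assumed known) execution time of task t on
    resource r. Returns (schedule, makespan), where schedule maps each
    task to the index of the resource it was assigned to.
    """
    n_resources = len(next(iter(exec_time.values())))
    ready = [0.0] * n_resources          # time at which each resource is free
    unscheduled = set(exec_time)
    schedule = {}
    while unscheduled:
        best = None                      # (completion_time, task, resource)
        for t in sorted(unscheduled):    # sorted only for reproducibility
            # Minimum completion time of t over all resources.
            ct, r = min((ready[r] + exec_time[t][r], r)
                        for r in range(n_resources))
            if best is None or ct < best[0]:
                best = (ct, t, r)
        # Schedule the task whose minimum completion time is smallest.
        ct, t, r = best
        schedule[t] = r
        ready[r] = ct
        unscheduled.remove(t)
    return schedule, max(ready)

times = {"A": [3.0, 5.0], "B": [2.0, 4.0], "C": [6.0, 1.0]}
schedule, makespan = min_min(times)
print(schedule, makespan)
```

For the three tasks above, C is placed on resource 1 first (completion time 1), then B and A are placed on resource 0, giving a makespan of 5. Max-Min differs only in the selection step: it schedules the task whose minimum completion time is largest.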

Figure 2.3: Comparison of Best Effort Workflow Scheduling Algorithms (Heuristics)[16]

Metaheuristics

Metaheuristics are approximate optimization techniques developed over the last two decades to tackle complex combinatorial optimization problems where classical heuristics and optimization methods fail to be effective. A metaheuristic optimizes a problem by iteratively improving a candidate solution with regard to a given measure of quality. It does not guarantee an optimal solution, but it can search a large space of candidate solutions while making few or no assumptions about the problem. Figure 2.4 summarizes the breakthrough algorithms developed in the last decade or so [11].

Figure 2.4: Comparison of Best Effort Workflow Scheduling Algorithms (Meta-heuristics)[16]


Deadline Constrained      Budget Constrained

Back-tracking             LOSS and GAIN
Deadline distribution     Genetic algorithms
Genetic algorithms        Genetic algorithms

Table 2.1: QoS based algorithms on Budget and Deadline

2.4 QoS constraint based scheduling

“Many workflow applications require some assurances of quality of services (QoS). Workflow scheduling is required to be able to analyze users’ QoS requirements and map workflow on suitable resources such that the workflow execution can be completed to satisfy users’ QoS constraints.

However, whether the execution can be completed within a required QoS not only depend on the global scheduling decision of the workflow scheduler but also depend on the local resource allocation model of each execution site. If the execution of every single task in the workflow cannot be completed as what the scheduler expects, it is impossible to guarantee the entire workflow execution. Instead of scheduling tasks on community Grids, QoS-constraint based schedulers should be able to interact with service-oriented Grid services to ensure resource availability and QoS levels. It is required that the scheduler can negotiate with service providers to establish a service level agreement (SLA) which is a contract specifying the minimum expectations and obligations between service providers and consumers. Users normally would like to specify a QoS constraint for entire workflow. The scheduler needs to determine a QoS constraint for each task in the workflow, such that the QoS of entire workflow is satisfied.

In general, service-oriented Grid services are based on utility computing models. Users need to pay for resource access. Service pricing is based on the QoS level and current market supply and demand. Therefore, unlike the scheduling strategy deployed in community Grids, QoS constraint based scheduling may not always need to complete the execution at earliest time. They sometimes may prefer to use cheaper services with a lower QoS that is sufficient to meet their requirements.”[16]

We now discuss the two meta-heuristic approaches: Genetic Algorithms (Section 2.5) and Ant Colony Optimization (Section 2.6).

2.5 Genetic Algorithms

“A genetic algorithm (GA) is a search meta-heuristic that is used to generate useful solutions to optimization and search problems. It belongs to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection and crossover.”[9]


Figure 2.5: A General Flow of a Genetic Algorithm

2.5.1 Requirements

A genetic algorithm requires:

1. A genetic representation of the solution domain

In a GA, each candidate solution (individual) must be encoded according to a certain set of characteristics. A solution is generally represented by an array of bits, though arrays of other types and structures can also be used. A fixed-size representation facilitates the crossover operator.

2. A fitness function to evaluate the solution

A fitness function is required for the following purposes:

• Parent selection

• Discarding individuals

• Measure for convergence

It is always problem dependent. The evolution of the solution begins with a random population and proceeds in generations. In every cycle, a few individuals are selected from the present population of solutions based on their fitness measure. In other words, the fitness function measures how well an individual fares in the struggle for survival.

After the genetic representation and the fitness function have been defined, we initialise a population of solutions (randomly generated, or occasionally generated by some heuristic) and optimize it by repeatedly applying stochastic operators such as selection, crossover and mutation.


Figure 2.6: A Simple Genetic Algorithm[9]

2.5.2 Initialization

Initially, many individual solutions are generated to form the initial population. Generally, the initial solutions cover the entire range of possible solutions. Sometimes, however, solutions are picked from the regions where optimal solutions are most likely to occur.

2.5.3 Selection

In every cycle, a few individual solutions are selected, based on their evaluation by the fitness function, to reproduce new solutions.

2.5.4 Reproduction

Next, a new generation of solutions is produced from the current population using the following genetic operators:

• crossover (also called recombination): It creates new individuals from the current population by combining or rearranging parts of the existing individuals.

• mutation: It occasionally occurs to allow a child to obtain features that are not possessed by either parent.

2.5.5 Termination

The reproduction cycle terminates in the following cases:

• A solution satisfying the minimum criteria is obtained.

• No new solution can be reproduced.

• A fixed number of cycles has been reached.

• The allocated budget or deadline is exhausted.

• A combination of the above.
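The four steps of Sections 2.5.2 to 2.5.5 can be put together in a short sketch. Everything here is an illustrative assumption rather than the algorithm of [16]: chromosomes are fixed-length integer arrays, the fitness is minimised, selection uses a binary tournament as a simple stand-in, and the best individual is carried over unchanged (elitism).

```python
import random

def genetic_algorithm(fitness, n_genes, n_values, pop_size=20,
                      generations=50, mutation_rate=0.1, target=None, seed=0):
    """Minimise `fitness` over integer arrays of length n_genes (>= 2).

    Each gene takes a value in range(n_values). All parameter names and
    defaults are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    # Initialization: random individuals spread over the search space.
    population = [[rng.randrange(n_values) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):          # termination: fixed cycle count
        if target is not None and fitness(best) <= target:
            break                         # termination: criteria satisfied
        children = [best[:]]              # keep the best individual (elitism)
        while len(children) < pop_size:
            # Selection: binary tournament (a simple stand-in).
            p1 = min(rng.sample(population, 2), key=fitness)
            p2 = min(rng.sample(population, 2), key=fitness)
            # Crossover: single cut point on fixed-length chromosomes.
            cut = rng.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally replace a gene with a random value.
            for g in range(n_genes):
                if rng.random() < mutation_rate:
                    child[g] = rng.randrange(n_values)
            children.append(child)
        population = children
        best = min(population, key=fitness)
    return best

# Toy problem: the fitness counts non-zero genes, so the optimum is all zeros.
result = genetic_algorithm(lambda ind: sum(1 for g in ind if g), 8, 3)
print(result, sum(1 for g in result if g))
```

For workflow scheduling, a gene would hold the index of the service chosen for one task, and the fitness would be derived from the cost or completion time of the resulting schedule.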


2.6 Ant Colony Optimization

The ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs [5].

In the natural world, ants (initially) wander randomly and, upon finding food, return to their colony while laying down pheromone trails. If other ants find such a path, they follow the trail instead of continuing to travel at random. Over time, however, the pheromone trail starts to evaporate, thus reducing its attractive strength. The more time it takes for an ant to travel down the path and back again, the more time the pheromones have to evaporate. A short path, by comparison, gets marched over more frequently, and thus the pheromone density becomes higher on shorter paths than on longer ones.

If there were no pheromone evaporation at all, the paths chosen by the first ants would tend to be excessively attractive to the following ones. In that case, the exploration of the solution space would be constrained.

Thus, when one ant finds a good (i.e., short) path from the colony to a food source, other ants are more likely to follow that path, and positive feedback eventually leads all the ants to follow a single path.

Figure 2.7: A Natural Path Optimization by ants[4]

The idea of ant colony optimization is inspired by the above behavior of ants.

2.6.1 Edge selection

An ant is a computational agent in ACO that iteratively constructs a solution for the problem at hand. At each iteration of the algorithm, each ant moves from a state x to a state y, corresponding to a more complete intermediate solution. For ant k, the probability $p^k_{xy}$ of moving from state x to state y depends on the combination of two values:

• the attractiveness $\eta_{xy}$ of the move, indicating the desirability that biases the ants depending upon the problem

• the trail level $\tau_{xy}$ of the move, indicating how proficient it has been in the past to make that particular move


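In general, the kth ant moves from state x to state y with probability (as given in [4])

$$p^k_{xy} = \frac{(\tau_{xy}^{\alpha})(\eta_{xy}^{\beta})}{\sum_{z \in \mathrm{allowed}_x} (\tau_{xz}^{\alpha})(\eta_{xz}^{\beta})} \qquad (2.1)$$

where
$\tau_{xy}$ is the amount of pheromone deposited for the transition from state x to y,
$\alpha \ge 0$ is a parameter to control the influence of $\tau_{xy}$,
$\eta_{xy}$ is the desirability of the state transition xy (from prior knowledge as computed by heuristics, typically $1/d_{xy}$, where d is the distance),
$\beta \ge 1$ is a parameter to control the influence of $\eta_{xy}$, and
the sum in the denominator runs over all states z that the ant may move to from x.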
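The edge selection rule can be sketched in a few lines. The function name, the default values of alpha and beta, and the two-move example are assumptions chosen for illustration; the normalisation runs over the moves still open to the ant.

```python
def transition_probabilities(tau, eta, alpha=1.0, beta=2.0):
    """Probability of moving from the current state x to each candidate y.

    tau[y] is the pheromone level and eta[y] the desirability of the move
    x -> y, for every y the ant is still allowed to visit.
    """
    weights = {y: (tau[y] ** alpha) * (eta[y] ** beta) for y in tau}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}

# Two candidate moves with equal pheromone; eta = 1 / distance.
probs = transition_probabilities(tau={"a": 1.0, "b": 1.0},
                                 eta={"a": 1 / 2.0, "b": 1 / 4.0})
print(probs)
```

With equal pheromone and beta = 2, the move that is half as long is four times as likely to be chosen, so the probabilities here are 0.8 and 0.2.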

2.6.2 Pheromone update

When all the ants have completed a solution, the trails are updated by (as given in [4])

$$\tau^k_{xy} = (1 - \rho)\,\tau^k_{xy} + \Delta\tau^k_{xy} \qquad (2.2)$$

where
$\tau^k_{xy}$ is the amount of pheromone deposited for a state transition xy,
$\rho$ is the pheromone evaporation coefficient and
$\Delta\tau^k_{xy}$ is the amount of pheromone deposited.
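A minimal sketch of the update follows. It assumes a single shared trail table and adds up the deposits of all ants after evaporation, which is one common reading of the rule; the function name, the default value of rho and the example deposits are hypothetical.

```python
def update_pheromone(tau, deposits, rho=0.5):
    """Evaporate every trail, then add what each ant deposited.

    tau maps a transition (x, y) to its pheromone level; deposits is a list
    with one dict per ant, mapping the transitions that ant used to the
    amount it lays down (for example 1 / tour_length, an assumed choice).
    """
    new_tau = {edge: (1.0 - rho) * level for edge, level in tau.items()}
    for ant in deposits:
        for edge, amount in ant.items():
            new_tau[edge] = new_tau.get(edge, 0.0) + amount
    return new_tau

tau = {("x", "y"): 1.0, ("x", "z"): 1.0}
deposits = [{("x", "y"): 0.25}, {("x", "y"): 0.25}]   # both ants chose x -> y
print(update_pheromone(tau, deposits))
```

In this example the unused transition x -> z loses half of its pheromone, while x -> y is restored to its previous level by the two deposits.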

In the next chapter, we discuss our contribution to this project.


Chapter 3

Work Done

3.1 Selection of Benchmark

We selected the genetic algorithm approach of [16] as our benchmark.

3.1.1 Genetic Algorithm[16]

Initial Solution

Generally, initial solutions are randomly generated. The algorithm also takes advantage of Greedy time-cost distribution (CD) [15] and Greedy cost-time distribution (TD) [15] for the initial solution.

Fitness Function

For budget constrained scheduling, the cost-fitness function is

$$F_{cost}(I) = \frac{c(I)}{B^{\alpha} \cdot \mathit{maxCost}^{(1-\alpha)}}, \quad \alpha \in \{0, 1\} \qquad (3.1)$$

where c(I) is the total cost of individual I, maxCost is the cost of the most expensive solution in the current population and B is the budget of the workflow.

For deadline constrained scheduling, the time-fitness function is

$$F_{time}(I) = \frac{t(I)}{D^{\beta} \cdot \mathit{maxTime}^{(1-\beta)}}, \quad \beta \in \{0, 1\} \qquad (3.2)$$

where t(I) is the total completion time of individual I, maxTime is the largest completion time in the current population and D is the deadline of the workflow.
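The two benchmark fitness functions translate directly into code. The sketch below only evaluates the formulas; the function and parameter names and the sample numbers are illustrative assumptions.

```python
def cost_fitness(cost, budget, max_cost, alpha):
    """Cost fitness: cost / (budget**alpha * max_cost**(1 - alpha)), alpha in {0, 1}."""
    return cost / (budget ** alpha * max_cost ** (1 - alpha))

def time_fitness(time, deadline, max_time, beta):
    """Time fitness: time / (deadline**beta * max_time**(1 - beta)), beta in {0, 1}."""
    return time / (deadline ** beta * max_time ** (1 - beta))

# With alpha = 1 the cost is normalised by the budget, so a value above 1
# means the individual exceeds the budget; with alpha = 0 it is normalised
# by the most expensive individual of the current population.
print(cost_fitness(cost=6000.0, budget=5000.0, max_cost=8000.0, alpha=1))
print(cost_fitness(cost=6000.0, budget=5000.0, max_cost=8000.0, alpha=0))
```

Because the exponent is either 0 or 1, each function simply switches between two normalisations: by the user constraint or by the worst individual in the population.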

Selection Scheme

Roulette wheel selection scheme is used [16].
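Roulette wheel selection picks an individual with probability proportional to its share of the wheel. The sketch below assumes that fitness values are positive and that lower values are better, so each share is taken as the inverse of the fitness; this inversion is an assumption of the sketch, and [16] may weight individuals differently.

```python
import random

def roulette_wheel_select(population, fitness_values, rng=random):
    """Pick one individual with probability proportional to 1 / fitness.

    Assumes all fitness values are positive and that lower is better; the
    inversion is an assumption made for this sketch.
    """
    weights = [1.0 / f for f in fitness_values]
    spin = rng.random() * sum(weights)    # where the wheel stops
    running = 0.0
    for individual, w in zip(population, weights):
        running += w
        if spin < running:
            return individual
    return population[-1]                 # guard against rounding error

rng = random.Random(42)
picks = [roulette_wheel_select(["good", "poor"], [0.5, 2.0], rng)
         for _ in range(1000)]
print(picks.count("good") / len(picks))   # share of picks won by "good"
```

Here the weights are 2.0 and 0.5, so the fitter individual is expected to win about four picks out of five, while the weaker one still has a chance of being selected.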

Experiment

To evaluate the algorithm, it was implemented and its test results were compared with a set of non-GA heuristics for two different types of workflow applications on a simulated Grid testbed.


Workflow Application

Workflow application structures can be categorized as either balanced or unbalanced:

1. A balanced-structure application consists of several parallel pipelines, which require the same types of services but process different data sets.

2. In an unbalanced-structure application, many parallel tasks require different types of services, and their workloads and I/O data vary significantly.

3.2 Overview of GridSim

The GridSim toolkit provides a comprehensive facility for the simulation of different classes of heterogeneous resources, user applications, resource brokers and schedulers [3]. It can be used to simulate application schedulers for single or multiple administrative domain distributed computing systems such as clusters and grids.

System Architecture

Figure 3.1: System Architecture of GridSim [3]


To evaluate the performance of the workflow scheduling algorithms, we have to simulate a grid broker using the GridSim package, into which we can insert these optimized scheduling algorithms.

Figure 3.2: Broker Architecture [3]

The following key classes are implemented by the grid broker package.

• class Experiment: acts as a placeholder for representing the simulation experiment configuration, which includes the synthesized application (a set of Gridlets stored in GridletList), user requirements such as D and B factors or deadline and budget constraints, and the optimization strategy. The user entity invokes the broker entity and passes its requirements via the experiment object. On receiving an experiment from its user, the broker schedules Gridlets according to the optimization policy set for the experiment.

• class UserEntity: a GridSim entity that simulates the user. It invokes the broker and passes the user requirements. When it receives the results of application processing, it records parameters of interest with the gridsim.Statistics entity.

• class Broker: a GridSim entity that simulates the Grid resource broker. On receiving an experiment from the user entity, it carries out resource discovery, determines deadline and budget values based on D and B factors, and then proceeds with scheduling. It schedules Gridlets on resources depending on user constraints, optimization strategy, and the cost of resources and their availability.


• class BrokerResource: acts as a placeholder for the broker to maintain a detailed record of the resources it uses for processing user applications. It maintains resource characteristics, a list of Gridlets assigned to the resource, the actual amount of MIPS available to the user, and a report on the Gridlets processed.

• class ReportWriter: a user-defined, optional GridSim entity which is meant for creating a report at the end of each simulation by interacting with the gridsim.Statistics entity.

The interaction between the different entities of the broker is shown in Figure 3.3.

Figure 3.3: Activity Diagram for interaction between different entities of broker


Simulation Environment

Figure 3.4 shows the simulation environment, in which the simulated services are discovered by querying the GridSim Indexing Service (GIS). Every service is able to handle a free slot query, a reservation request and a commitment.

Figure 3.4: Simulation Environment of Gridsim[15]

In the next section, we discuss the modifications we made to the benchmark algorithm. We also cover the modifications required in GridSim to simulate the proposed algorithm.


3.3 Modification and Design

While simulating the algorithms, we faced several difficulties because the GridSim toolkit lacks a few functionalities. We therefore made the following changes:

• The GridSim toolkit is designed for homogeneous subtasks (a subtask is represented by a gridlet in the toolkit). We therefore modified the gridsim.Gridlet class to include a resource list containing all the resources that can process that gridlet (subtask).

• The GridSim toolkit is designed for independent subtasks, which it keeps in a linked list. We modified the gridsim.Gridlet class to include a dependency list consisting of the parent gridlets. This allows the workflow to be represented as a DAG. We then performed a topological sort to obtain the DAG order from the linked list.

[Figures: gridlet representation earlier and now]

• To process the Gridlets (subtasks) in accordance with their dependencies, we modified the method “dispatcher()” in the “gridbroker.Broker” class to send the scheduled Gridlets in dependency order rather than in ResourceList order.
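The dependency handling described in the list above can be sketched independently of GridSim. GridSim itself is written in Java; the Python sketch below does not reproduce the modified classes. It only shows how a dependency list per gridlet yields a dispatch order through a topological sort (Kahn's algorithm), and the function name and gridlet ids are hypothetical.

```python
from collections import deque

def dispatch_order(parents):
    """Return the gridlet ids in an order that respects dependencies.

    parents[g] is the dependency list of gridlet g (ids of its parents).
    Uses Kahn's algorithm; raises ValueError if the graph has a cycle.
    """
    remaining = {g: len(ps) for g, ps in parents.items()}
    children = {g: [] for g in parents}
    for g, ps in parents.items():
        for p in ps:
            children[p].append(g)
    # Gridlets without parents can be dispatched immediately.
    queue = deque(g for g, n in remaining.items() if n == 0)
    order = []
    while queue:
        g = queue.popleft()
        order.append(g)
        for c in children[g]:
            remaining[c] -= 1
            if remaining[c] == 0:        # all parents of c are dispatched
                queue.append(c)
    if len(order) != len(parents):
        raise ValueError("dependency lists contain a cycle")
    return order

deps = {0: [], 1: [0], 2: [0], 3: [1, 2]}
print(dispatch_order(deps))
```

In a simulation, a child gridlet would additionally have to wait until its parents have actually finished executing, not merely until they have been dispatched.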

To obtain better solutions, we used the Min-min strategy (a best-effort heuristic approach) to find initial solutions for the genetic approach.
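How a heuristic solution enters the initial population is not detailed here. One plausible arrangement, shown purely as an assumption, is to place the heuristic individual in the population and fill the remaining slots with random individuals; the function name and the encoding (one service index per task) are hypothetical.

```python
import random

def seeded_population(heuristic_solution, n_values, pop_size, rng):
    """Initial population: one heuristic individual plus random ones.

    heuristic_solution is a list of service indices, one per task, for
    example produced by Min-min. How many seeded copies or perturbed
    variants to include is a design choice not fixed here.
    """
    population = [list(heuristic_solution)]
    while len(population) < pop_size:
        population.append([rng.randrange(n_values)
                           for _ in heuristic_solution])
    return population

pop = seeded_population([0, 2, 1, 1], n_values=3, pop_size=5,
                        rng=random.Random(1))
print(pop[0])   # the heuristic individual is kept unchanged
```

Keeping random individuals alongside the seed preserves diversity, so the search is not confined to the neighbourhood of the heuristic solution.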


We proposed new fitness functions.

For Budget Optimization:

$$F_c(I) = \frac{C(I) - \mathit{minCost}}{\mathit{maxCost} - \mathit{minCost}} \cdot \frac{C(I)}{B} \qquad (3.3)$$

where
$F_c(I)$ is the cost fitness function of individual I in the population,
$C(I)$ is the cost of individual I,
maxCost is the cost of the most expensive solution of the current population,
minCost is the cost of the least expensive solution of the current population, and
B is the user budget.

• An individual with lower $F_c(I)$ is considered better (more fit).

• If $C(I)$ for an individual is greater than B, then the term $C(I)/B$ is greater than one and thus acts as a penalty for the individual.

For Deadline Optimization:

$$F_t(I) = \frac{T(I) - \mathit{minTime}}{\mathit{maxTime} - \mathit{minTime}} \cdot \frac{T(I)}{D} \qquad (3.4)$$

where
$F_t(I)$ is the time fitness function of individual I in the population,
$T(I)$ is the time of individual I,
maxTime is the time of the most expensive (in terms of time) solution of the current population,
minTime is the time of the least expensive (in terms of time) solution of the current population, and
D is the user deadline.

• An individual with lower $F_t(I)$ is considered better (more fit).

• If $T(I)$ for an individual is greater than D, then the term $T(I)/D$ is greater than one and thus acts as a penalty for the individual.

For Cost-Time Optimization:

$$F(I) = \alpha \cdot F_c(I) + (1 - \alpha) \cdot F_t(I) \qquad (3.5)$$

where $\alpha$ is the budget factor, $0 \le \alpha \le 1$, and $(1 - \alpha)$ is the deadline factor.
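The proposed fitness functions can be evaluated with a short sketch. The function names and the sample population are illustrative assumptions. The case where all individuals have the same cost (or time) leaves the first factor undefined; returning 0.0 there is a choice made for this sketch only.

```python
def normalised_penalty(value, lowest, highest, limit):
    """(value - lowest) / (highest - lowest) * value / limit.

    Shared form of Equations 3.3 and 3.4. If every individual has the same
    value the first factor is undefined; returning 0.0 in that case is an
    assumption of this sketch.
    """
    if highest == lowest:
        return 0.0
    return (value - lowest) / (highest - lowest) * (value / limit)

def combined_fitness(cost, time, costs, times, budget, deadline, alpha):
    """Weighted sum alpha * Fc + (1 - alpha) * Ft; lower is better."""
    fc = normalised_penalty(cost, min(costs), max(costs), budget)
    ft = normalised_penalty(time, min(times), max(times), deadline)
    return alpha * fc + (1 - alpha) * ft

# Population of three individuals given as (cost, time) pairs.
population = [(4000.0, 90.0), (5000.0, 60.0), (6000.0, 30.0)]
costs = [c for c, _ in population]
times = [t for _, t in population]
for c, t in population:
    print(c, t, combined_fitness(c, t, costs, times,
                                 budget=5000.0, deadline=60.0, alpha=0.5))
```

With a budget of 5000, a deadline of 60 and equal weights, the middle individual, which meets both constraints exactly, obtains the lowest combined value. Note that the cheapest individual of a population always has a cost fitness of zero, whatever the budget, because the first factor vanishes.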


3.4 Result and Analysis

While simulating the benchmark and our proposed algorithms, we used 100 gridlets and 12 resources. The specification of the resources is as follows:

The following abbreviations denote the approaches compared:

GA: Genetic algorithm with initial solutions randomly selected (Benchmark).
CD: Greedy time-cost distribution approach.
TD: Greedy cost-time distribution approach.
TD+GA: Genetic algorithm with initial solutions generated by TD.
CD+GA: Genetic algorithm with initial solutions generated by CD.
MIN+GA: Genetic algorithm with initial solutions generated by the min-min heuristic.

The simulation results are shown in Fig. 3.5 and Fig. 3.6.

Cost optimization within a set deadline

Figure 3.5: Execution cost of Budget-Constraint Approaches


None of the approaches can satisfy the low budget constraint. At a low budget, the genetic algorithm has little budget left to optimise with. Hence the general cost optimization strategy CD performs better than the basic GA and the modified GA seeded by min-min (MIN+GA). In the case of CD+GA, the GA does not improve on the initial solution, so we get the same result. MIN+GA performs worst, because its initial solutions have already been optimized for best-effort execution and it then tries to optimise them under the budget constraint.

At a medium budget (5000), GA performs better than CD because the task assignment decisions of CD are based only on local budget constraints and do not consider task dependencies. CD distributes the budget among the tasks and then finds the fastest service for each, so it is not very efficient. For the same reason, GA performs better than CD+GA and MIN+GA.

At a high budget, the distribution of the budget constraint can be relaxed. Hence CD and GA perform almost the same. Moreover, CD+GA and MIN+GA produce better results.

Time optimization within a set budget

Figure 3.6: Execution time of Deadline-Constraint Approaches

It is hard for all approaches to meet the deadline constraints. MIN+GA performs better, however, because the min-min heuristic has already selected the fastest solution for the initial population.

Since TD distributes the overall deadline between tasks based on both task workload and task dependencies, GA and TD perform almost the same as the deadline increases. Both TD+GA and MIN+GA perform better than the other two. MIN+GA gives better results than TD+GA because it starts from a best-effort solution.

This shows that genetic algorithms can improve the overall results by employing heuristics to obtain better individuals in their initial solutions.


Chapter 4

Conclusion and Future Work

We studied different scheduling strategies employed in grid computing, both heuristic and meta-heuristic. We then described the two meta-heuristic approaches (GA and ACO) required for this project. We selected the Genetic Algorithm to work on and chose a benchmark for it. Next, a simulation environment, GridSim, was selected to evaluate the algorithms’ performance against the results claimed by the benchmark.

Finally, we modified the algorithm given in the benchmark [16], simulated it using GridSim and compared the performance.

This project aims to contribute to better deadline and budget constrained algorithms. We used min-min and other heuristics for the initial solution in our genetic approach. However, the min-min heuristic does not normally consider users’ QoS requirements, so QoS guided min-min [8] can be used instead. In our work, we considered only deadline and budget as users’ QoS requirements. Security can also be taken into consideration, and a security fitness function can be defined.

Moreover, our work can be continued along the same lines towards an improved ACO scheduling algorithm by selecting a better pheromone update equation.


Bibliography

[1] J. Almond and D. Snelling. Unicore: uniform access to supercomputing as an element of electronic commerce. Future Generation Computer Systems, 15(5-6):539–548, 1999.

[2] R. Buyya and M. Murshed. Gridsim: A toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing. Concurrency and Computation: Practice and Experience, 14(13-15):1175–1220, 2002.

[3] R. Buyya and S. Venugopal. The gridbus toolkit for service oriented grid and utility computing: An overview and status report. In Grid Economics and Business Models, 2004. GECON 2004. 1st IEEE International Workshop on, pages 19–66. IEEE, 2004.

[4] M. Dorigo, M. Birattari, and T. Stutzle. Ant colony optimization. Computational Intelligence Magazine, IEEE, 1(4):28–39, 2006.

[5] M. Dorigo and T. Stutzle. Ant colony optimization. MIT Press, 2004.

[6] I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. International Journal of High Performance Computing Applications, 11(2):115, 1997.

[7] I. Foster, C. Kesselman, et al. The grid: blueprint for a future computing infrastructure, 1999.

[8] X.S. He, X.H. Sun, and G. Von Laszewski. Qos guided min-min heuristic for grid task scheduling. Journal of Computer Science and Technology, 18(4):442–451, 2003.

[9] http://en.wikipedia.org/wiki/Genetic_algorithm.

[10] http://en.wikipedia.org/wiki/Grid_computing.

[11] I.H. Osman and G. Laporte. Metaheuristics: A bibliography. Annals of Operations Research, 63:513–623, 1996.

[12] J. Pearl. Heuristics: intelligent search strategies for computer problem solving. 1984.


[13] H. Topcuoglu, S. Hariri, and M. Wu. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems, pages 260–274, 2002.

[14] M. Wieczorek, R. Prodan, and T. Fahringer. Scheduling of scientific workflows in the askalon grid environment. ACM SIGMOD Record, 34(3):56–62, 2005.

[15] J. Yu. Qos-based scheduling of workflows on global grids. 2007.

[16] J. Yu, R. Buyya, and K. Ramamohanarao. Workflow scheduling algorithms for grid computing. Metaheuristics for Scheduling in Distributed Computing Environments, pages 173–214, 2008.
