Local Search Based Approach in Grid Scheduling Using Simulated Annealing
Rajmohan Goswami, Tarun Kumar Ghosh and Subhabrata Barman
Abstract— Grid computing is a High Performance Computing environment. Load balancing, which provides the necessary resource management features, is an important aspect of Grids; the corresponding decisions are made by a resource management component of the Grid, namely, the Grid scheduler. While Grid users are often interested in satisfaction of their Quality of Service (QoS) requirements, these cannot be satisfactorily handled by commonly used queue-based approaches. In this paper, a local search based approach using Simulated Annealing (SA) is proposed as a periodical optimizer complementing other dynamic, space-shared and schedule-based scheduling policies. SA is found to be a most reliable method to apply in practice; using SA, it has been shown that convergence to the best solution is possible.
Index Terms—Grid Computing, Resource Broker, Grid Scheduling, Local Search, Simulated Annealing, GridSim.
I. INTRODUCTION
We are living in an exponential world. Moore's Law states that transistor count doubles every 18 months; storage density doubles every 12 months. In recent years, parallel and distributed systems (PDSs) have emerged as viable solutions to meet these ever increasing needs for computational power and data management capability. These systems offer the speedup in computational performance that is necessary to support computationally intensive grand challenge applications, such as drug discovery, economic forecasting, seismic analysis, and back office data processing in support of e-commerce and Web services [1], [2].
A parallel system consists of multiple processors in close communication, normally located within the same machine. In parallel computing, all processors have access to a shared memory. A distributed system spreads out the computation among autonomous computers, which are in general physically distributed. Each distributed computer accesses its own local memory and communicates with other computers through networks such as the Internet. A distributed system supports both resource sharing and load sharing, and is fault-tolerant through the replication of devices and data in separate physical locations. It also provides a good price/performance ratio. A Grid is a very large-scale distributed system that can scale to Internet-size environments with resources distributed across multiple organizations and administrative domains [1].
R. Goswami (email: [email protected]), corresponding author, T. K. Ghosh (email: tarun ghosh [email protected]) and S. Barman (email: subhabrata [email protected]) are with the Department of Computer Science and Engineering, Haldia Institute of Technology, Haldia, Purba Medinipur-721 657, West Bengal, India.
Balancing and sharing resources are important aspects of a Grid, which provides the necessary resource management features. This decision is made by a resource management component of the Grid, namely, the Grid scheduler. If a system in the Grid is over-loaded, the Grid scheduling algorithm reschedules some of the tasks to other systems that are idle or less loaded. In this way the Grid scheduling algorithm transparently transfers the tasks to a less loaded system, thereby making use of the under-utilized resources [3].
This paper is organized as follows. Section II briefly outlines the relevant past work on scheduling in the Grid environment. In Section III, the Grid scheduling problem is shown to be NP (nondeterministic polynomial)-complete. Section IV highlights the Simulated Annealing (SA) technique. Section V describes the implementation of SA in Grid scheduling. While Section VI presents the simulation of the proposed algorithm, Section VII presents the results obtained from the simulation. Finally, Section VIII concludes the paper.
II. GRID RESOURCE MANAGEMENT AND SCHEDULING
ISSUES
A schedule is a complex data structure which maps jobs onto existing resources in time. Scheduling is the process of allocating jobs onto available resources in time. Such a process has to respect constraints imposed by the jobs and the Grid. Scheduling has several phases such as schedule creation, schedule modification and/or schedule optimization. Usually one or more optimization criteria are used to make scheduling decisions [4].
A. Resource Broker
Because of the heterogeneous and dynamic nature of the Grid, scheduling in the Grid environment is significantly complicated. Most Grid systems use the Grid resource broker for resource discovery, deciding the allocation of a job to a particular resource, binding of user applications (files) to hardware resources, initiating computations, adapting to changes in Grid resources and presenting the Grid to the user as a single, unified resource. It finally controls the physical allocation of the tasks and manages the available resources constantly while dynamically updating the Grid scheduler whenever there is a change in resource availability [5].
International Conference on Computer & Communication Technology (ICCCT)-2011
978-1-4577-1386-6/11/$26.00 ©2011 IEEE 340
B. Taxonomy for Grid Scheduling Algorithms
Out of the various scheduling policies known, static and dynamic scheduling are important and easy to implement.
Static scheduling, also called the off-line scheduling technique, is scheduling in which all decisions are taken before the execution of a schedule. It is suitable when all the tasks (or applications) and resources are known in advance.
Dynamic scheduling, also called on-line scheduling, is scheduling in which some or all of the decisions are taken during the execution. It is suitable when jobs and machines come on-line or go off-line due to failures, when the speed of each processor varies during the scheduling, or when the cost of applications is difficult to predict. Dynamic mapping is performed when the arrival of tasks is not known beforehand [6].
C. Quality of Service (QoS)
A job is a user-defined task that is scheduled to be carried out by an execution subsystem. A resource is an entity that is useful in a Grid environment. The term usually encompasses entities that are pooled (e.g., hosts, software licenses, IP addresses) or that provide a given capacity (e.g., disks, networks, memory, databases). However, entities such as processes, print jobs, database query results and virtual organizations may also be represented and handled as resources [7].
Scheduling in Grid computing means that the jobs of one or more users may be scheduled without knowing where the resources are located or even who owns them.
A Grid scheduler or scheduling algorithm has to guarantee the QoS of a job's execution, which includes a good price/performance ratio and, of course, completion of the job as early as possible before its deadline.
III. PROBLEM DESCRIPTION
NP is the set of all decision problems solvable by nondeterministic algorithms in polynomial time. A problem is said to be NP-hard if it is at least as hard as any problem in NP. A problem is said to be NP-complete if it is NP-hard, and also in NP itself [8].
The Grid scheduling problem has been shown to be NP-complete in its general as well as in some restricted forms.
A. Formulation of Grid Scheduling
The Grid scheduling problem is formulated considering a set T of tasks, a number of processors m ∈ Z+, a length l(t) ∈ Z+ for each t ∈ T, and a deadline D ∈ Z+.

The problem is to decide whether there is an m-processor schedule for T that meets the overall deadline D, i.e., a function σ : T → Z0+ such that, for all u ≥ 0, the number of tasks t ∈ T for which σ(t) ≤ u < σ(t) + l(t) is not more than m, and such that, for all t ∈ T, σ(t) + l(t) ≤ D.
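This decision version can be checked directly from the definition: given a candidate σ, verify the processor bound at every instant and every task's deadline. A minimal Python sketch (not from the paper; discrete integer time is assumed):

```python
def is_feasible(sigma, lengths, m, D):
    """sigma: dict task -> start time (sigma(t) in Z0+); lengths: dict task -> l(t).
    Checks that every task finishes by D and at most m tasks run at any instant u."""
    if any(sigma[t] + lengths[t] > D for t in sigma):
        return False
    for u in range(D):  # discrete time steps suffice for integer data
        running = sum(1 for t in sigma if sigma[t] <= u < sigma[t] + lengths[t])
        if running > m:
            return False
    return True

# Two processors, deadline 4: tasks a and b in parallel, then c
print(is_feasible({"a": 0, "b": 0, "c": 2}, {"a": 2, "b": 2, "c": 2}, m=2, D=4))  # True
# Starting all three at once exceeds the two processors
print(is_feasible({"a": 0, "b": 0, "c": 0}, {"a": 2, "b": 2, "c": 2}, m=2, D=4))  # False
```

The checker runs in O(D·|T|) time, which is polynomial in the numeric value of D; the hardness of the problem lies in finding σ, not in verifying it.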
It remains NP-complete for m = 2, but can be solved in pseudo-polynomial time for any fixed m. It is NP-complete in the strong sense for arbitrary m. If all tasks have the same length, then this problem is trivial to solve in polynomial time, even for "different speed" processors [9].
To cope with NP-complete problems, computer scientists no longer focus on finding an optimal solution, but instead try to find a "good" solution within an acceptable amount of time. Algorithms that do this are loosely termed heuristic algorithms, since they frequently are based on sensible rules of thumb. The most widely applied technique is Local Search. Algorithms based on Local Search (LS) belong to the family of meta-heuristic algorithms. LS based techniques start with an existing, so-called initial schedule and improve this complete schedule during the computation. The initial schedule may be randomly generated, or some other technique such as a dispatching rule may be used for its creation [9].
The neighbourhood of a schedule is any new schedule which is created from the previous one with a single local change. A local change usually means moving one job to a new position or swapping two jobs [10].
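Such local changes are easy to sketch in code. The following Python fragment (illustrative only; the list-of-jobs representation is an assumption, not the paper's data structure) generates a neighbour by either moving one job or swapping two:

```python
import random

def move_neighbour(schedule):
    """Neighbour obtained by moving one randomly chosen job to a new position."""
    new = list(schedule)
    job = new.pop(random.randrange(len(new)))
    new.insert(random.randrange(len(new) + 1), job)
    return new

def swap_neighbour(schedule):
    """Neighbour obtained by swapping two randomly chosen jobs."""
    new = list(schedule)
    i, j = random.sample(range(len(new)), 2)
    new[i], new[j] = new[j], new[i]
    return new

schedule = ["j1", "j2", "j3", "j4"]
neighbour = swap_neighbour(schedule)
```

Both moves preserve the multiset of jobs, so any neighbour remains a complete schedule; only the ordering changes.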
B. Framework and Notation
The objective to be minimized is always a function of the completion times of the jobs, which, of course, depend on the schedule. The completion time of the execution of job j on machine i is denoted by Cij. The time job j exits the system (that is, its completion time on the last machine on which it requires processing) is denoted by Cj. The objective may also be a function of the due dates dj. The lateness of job j is defined as
Lj = Cj − dj
which is positive when job j is completed late and negative when it is completed early. The tardiness of job j is defined as
Tj = max(Cj − dj , 0) = max(Lj , 0)
The makespan, defined as max(C1, . . . , Cn), is equivalent to the completion time of the last job to leave the system. A minimum makespan usually implies a good utilization of the machine(s) [10].
The lateness, the tardiness and the unit penalty are the threebasic due date related penalty functions.
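The definitions above translate directly into code; a small illustrative Python sketch (the completion times and due dates are made-up values):

```python
def lateness(C_j, d_j):
    # L_j = C_j - d_j: positive if the job finishes late, negative if early
    return C_j - d_j

def tardiness(C_j, d_j):
    # T_j = max(L_j, 0): only late completions are penalized
    return max(C_j - d_j, 0)

def makespan(completion_times):
    # max(C_1, ..., C_n): completion time of the last job to leave the system
    return max(completion_times)

# Illustrative: three jobs with completion times C and due dates d
C = [5, 9, 12]
d = [6, 8, 12]
print([lateness(c, dd) for c, dd in zip(C, d)])   # [-1, 1, 0]
print([tardiness(c, dd) for c, dd in zip(C, d)])  # [0, 1, 0]
print(makespan(C))                                # 12
```

Note that lateness rewards early completion (it can go negative), whereas tardiness is clipped at zero, which is why the two give different schedules when used as objectives.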
Job j also requires Ri,j CPUs for its execution (Ri,j > 0). Resources are computational machines with known capacity Ri, representing the number of CPUs. All CPUs within one machine have the same speed si, representing the number of operations per second. Different machines may have different speeds and numbers of CPUs. All machines use the space-shared processor allocation policy, which allows parallel execution of k jobs on machine i if Ri ≥ ∑ Ri,j, summing over the k jobs j = 1, . . . , k [11].
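The space-shared feasibility condition is a plain capacity comparison; a minimal Python sketch (the function and variable names are illustrative, not from the paper):

```python
def can_run_in_parallel(machine_cpus, job_cpu_requests):
    """True if machine i with R_i CPUs can run the given jobs simultaneously,
    i.e. R_i >= sum of the jobs' CPU requirements R_ij (space sharing)."""
    return machine_cpus >= sum(job_cpu_requests)

print(can_run_in_parallel(8, [2, 4, 2]))  # True: 8 >= 8
print(can_run_in_parallel(8, [2, 4, 4]))  # False: 8 < 10
```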
IV. SIMULATED ANNEALING (SA) - A META-HEURISTIC
SA is a variant of local search that finds a good solution to an optimization problem by trying random variations of the current solution. Traditional local search (e.g. steepest descent for minimization) always moves in a direction of improvement. SA allows non-improving moves, to avoid getting stuck at a local optimum, with a probability that decreases as the computation proceeds. The slower the cooling schedule, or rate of decrease, the more likely the algorithm is to find an optimal or near-optimal solution [12], [13].
The idea of SA comes from a paper published by Metropolis et al. in 1953 [14]. The algorithm in that paper simulated the cooling of material in a heat bath, a process known as annealing.
In 1983, Kirkpatrick et al. [15] took the idea of the Metropolis algorithm and applied it to optimisation problems. The idea is to use SA to search for feasible solutions and converge to an optimal solution.
A. Basic Notions of SA
Thermodynamics states that at temperature t, the probability of an increase in energy of magnitude dE is

P(dE) = e−dE/kBt

where kB is Boltzmann's constant [10], [13].
SA generates a perturbation and calculates the resulting energy change. If the energy has decreased then the system moves to this new state; otherwise the new state is accepted with the above probability. Boltzmann's constant is dropped, since kB does not have a physical analogy here, but t is still referred to as temperature, which is reduced according to a cooling schedule [16], [13].
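The resulting acceptance rule can be sketched as follows (a generic Metropolis-style helper, not code from the paper):

```python
import math
import random

def accept_move(delta_e, t):
    """Metropolis rule: always accept an improvement; accept a worsening
    move (delta_e > 0) with probability e^(-delta_e / t)."""
    if delta_e <= 0:
        return True
    return random.random() < math.exp(-delta_e / t)

# The same uphill move is far likelier to pass when the system is hot
print(math.exp(-1.0 / 10.0))  # ~0.905 at t = 10
print(math.exp(-1.0 / 0.1))   # ~4.5e-05 at t = 0.1
```

As t shrinks, the acceptance probability for any fixed uphill move collapses towards zero, so the search gradually turns into pure descent.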
B. An Example
One instance of the Travelling Salesman Problem (TSP), a graph of 48 cities representing 48 capital cities in the United States, is one for which the SA technique has been successful.
Firstly, random sampling is performed on the TSP. The best solution found after testing 1.5 million random permutations has cost 101,712.8. This is more than three times the cost of the optimal tour of 33,523.7.

Next, hill climbing is performed on the TSP. With a total of over 1.5 million tour evaluations on the 48-city TSP instance, the best local search tour has a length of 40,121.2, only 19.6% more than the optimal tour.

Lastly, SA is performed on the TSP. SA gives a solution of cost 36,617.4, only 9.2% over the optimum. Letting it run for 5,000,000 iterations gets the score down to 34,254.9, or 2.2% over the optimum. The SA solution works admirably [17].
V. APPLICATION OF SA IN GRID SCHEDULING
This paper proposes an SA optimization algorithm to increase the machine usage and decrease the number of waiting jobs. SA is periodically used to optimize the initial solution according to the objective function. The initial solution is created using the Earliest Gap - Earlier Deadline First scheduling policy proposed by Dalibor Klusacek [11].
A. Earliest Gap - Earlier Deadline First (EG-EDF) Scheduling Policy
EG-EDF is a schedule-based dynamic scheduling policy (see Algorithm 1). Klusacek defined a gap as a period of idle CPU time. If the number of currently available CPUs of a machine is greater than the number of CPUs requested by the job(s) in the given time period, a gap appears. When a new job arrives, all the fitting machines are tested for a suitable gap. If there exists more than one appropriate gap, the earliest one is chosen (the Earliest Gap (EG) policy) and the new schedule is evaluated by the AcceptanceCriterion function (see Algorithm 2). Otherwise the new schedule is built by the EDF policy, which finds the first job jobk whose deadline djobk > djob; the incoming job is placed between jobk−1 and jobk. Klusacek mentioned that the EG-EDF policy focuses only on the newly arriving job; previously scheduled jobs are not reconsidered. In such a case many gaps may remain in the schedule, while some previously assigned jobs may be starving [11].
At that point the SA optimization technique (see Algorithm 3) is used in the proposed algorithm to fill the gaps efficiently. At the outset, the initial value of the control parameter t, the temperature decrement function, the number of iterations and the time limit are defined.
B. Implementation of SA
In the proposed implementation of the SA technique, the Grid schedule is represented by an array of the particular machines' schedules. A single machine's schedule is represented by a linear list of jobs. In each iteration, a machine is chosen randomly and a starving job is removed randomly from that machine's schedule. If the optimizer finds a suitable gap, the annealing criterion is computed using the formula
d = (diffu × 1.0) + (diffst × 1.0) + (diffsd × 1.0)

where d is the decision value and

diffu (usage difference) = (current usage − previous usage) / u, with u = max(0.0, previous usage)

diffst (start time difference) = (previous start − current start) / st, with st = max(0.0, previous start)

diffsd (slowdown difference) = (previous sd − current sd) / sd, with sd = max(1.0, previous sd)
If the new decision value is greater than zero, then annealing takes place. If the new decision value is less than zero, then probabilistic acceptance of the non-improving move is allowed. Otherwise the job is removed from the gap, the temperature is updated and the next iteration starts. This procedure continues until the defined number of iterations or the defined time limit, whichever comes earlier, is reached.
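Under these definitions, the decision value can be sketched as follows (variable names mirror the paper's; the tiny 1e-9 floor is an added safety assumption, since the paper's max(0.0, ·) denominators can be zero):

```python
def decision(prev, cur):
    """Annealing criterion d = diff_u*1.0 + diff_st*1.0 + diff_sd*1.0.
    prev/cur describe a job's previous and current placement; the 1e-9
    floor replacing max(0.0, ...) is an assumption to avoid division by zero."""
    u = max(1e-9, prev["usage"])
    st = max(1e-9, prev["start"])
    sd = max(1.0, prev["sd"])
    diff_u = (cur["usage"] - prev["usage"]) / u    # machine usage should grow
    diff_st = (prev["start"] - cur["start"]) / st  # the job should start earlier
    diff_sd = (prev["sd"] - cur["sd"]) / sd        # slowdown should drop
    return 1.0 * diff_u + 1.0 * diff_st + 1.0 * diff_sd

d = decision({"usage": 0.5, "start": 100.0, "sd": 2.0},
             {"usage": 0.6, "start": 90.0, "sd": 1.5})
print(d > 0)  # True: usage up, earlier start, lower slowdown
```

A positive d means the move improves all-weighted criteria on balance and is accepted outright; a negative d falls through to the probabilistic Metropolis acceptance.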
All machines use the space-sharing processor allocation policy, which allows parallel execution of k jobs on machine i.
Algorithm 1 Earliest Gap - Earlier Deadline First (job)
1: schedule_initial ⇐ [machsched_1, . . . , machsched_m]
2: schedule_new ⇐ ø
3: schedule_best ⇐ ø
4: gap_found ⇐ false
5: k ⇐ 0
6: for i ⇐ 0 to m do
7:   if machine_i is suitable to perform job then
8:     if a suitable gap for job was found in schedule_new[i] then
9:       gap_found ⇐ true
10:      schedule_new ⇐ schedule_initial
11:      schedule_new[i] ⇐ place job into the found gap in schedule_new[i]
12:    else if gap_found = false then
13:      schedule_new ⇐ schedule_initial
14:      k ⇐ index of the first job_k ∈ schedule_new[i] whose d_jobk > d_job
15:      schedule_new[i] ⇐ insert job into schedule_new[i] between job_k−1 and job_k
16:    end if
17:    if AcceptanceCriterion(schedule_best, schedule_new) = true then
18:      schedule_best ⇐ schedule_new
19:    end if
20:  end if
21: end for
22: return schedule_best
VI. EVALUATION BY SIMULATION
The proposed algorithm was simulated in the Grid scheduling simulation environment Alea 2.1 [18]. Alea 2.1 is an extension of the Java package GridSim. GridSim is a toolkit for the modeling and simulation of distributed resource management and scheduling for Grid computing [19].
Algorithm 2 AcceptanceCriterion(schedule_best, schedule_new)
1: if schedule_best = ø then
2:   return true
3: end if
4: compute makespan_best and nondelayed_best according to schedule_best
5: compute makespan_new and nondelayed_new according to schedule_new
6: weight_makespan ⇐ (makespan_best − makespan_new) / makespan_best
7: weight_deadline ⇐ (nondelayed_new − nondelayed_best) / nondelayed_best
8: weight ⇐ weight_makespan + weight_deadline
9: if weight > 0.0 then
10:  return true
11: else
12:  return false
13: end if
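Algorithm 2 reduces to comparing the relative makespan improvement with the relative gain in non-delayed jobs; a direct Python sketch (passing the makespan and non-delayed counts in precomputed is a simplification of the schedule evaluation):

```python
def acceptance_criterion(best, new):
    """best/new are (makespan, nondelayed) pairs; best may be None
    (empty schedule). Accept when the combined relative improvement
    in makespan and non-delayed jobs is positive."""
    if best is None:
        return True
    makespan_best, nondelayed_best = best
    makespan_new, nondelayed_new = new
    w_makespan = (makespan_best - makespan_new) / makespan_best
    w_deadline = (nondelayed_new - nondelayed_best) / nondelayed_best
    return (w_makespan + w_deadline) > 0.0

# Shorter makespan and more jobs meeting deadlines: accepted
print(acceptance_criterion((100.0, 10), (90.0, 12)))   # True
# Longer makespan outweighs the small deadline gain: rejected
print(acceptance_criterion((100.0, 10), (140.0, 12)))  # False
```

Because both terms are normalized by the best schedule's values, a 10% makespan improvement and a 10% deadline gain carry equal weight in the decision.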
Algorithm 3 Simulated annealing for a minimization problem with solution space S, objective function f and neighbourhood structure N
1: Initial solution S_0 ∈ N(S) is created using EG-EDF.
2: Select an initial temperature t_0 > 0
3: Select a temperature reduction function a
4: repeat
5:   repeat
6:     Randomly select S_i
7:     α ⇐ f(S_i) − f(S_0)
8:     if α < 0 then
9:       S_0 ⇐ S_i
10:    else
11:      generate x randomly, uniformly in the range (0, 1)
12:      if x < e^(−α/t) then
13:        S_0 ⇐ S_i
14:      end if
15:    end if
16:  until iteration = nrep
17:  t ⇐ a(t)
18: until stopping condition = true
19: S_0 is the approximation to the optimal solution
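Algorithm 3 is the classical SA loop. A self-contained Python sketch on a toy minimization problem (the objective and neighbourhood below are illustrative stand-ins for the schedule-based ones used in the paper):

```python
import math
import random

def simulated_annealing(f, neighbour, s0, t0=10.0, a=0.9, nrep=50, t_min=1e-3):
    """Minimize f starting from s0 (Algorithm 3 with a geometric
    temperature reduction t <- a*t and nrep inner iterations per level)."""
    s_cur = s0
    t = t0
    while t > t_min:  # stopping condition
        for _ in range(nrep):
            s_i = neighbour(s_cur)          # randomly select a neighbour
            alpha = f(s_i) - f(s_cur)
            if alpha < 0:                   # improving move: always accept
                s_cur = s_i
            elif random.random() < math.exp(-alpha / t):
                s_cur = s_i                 # worse move: accept with prob e^(-alpha/t)
        t = a * t                           # temperature reduction function
    return s_cur

# Toy problem (illustrative, not the Grid schedule): minimize (x - 3)^2
random.seed(0)
x = simulated_annealing(lambda v: (v - 3.0) ** 2,
                        lambda v: v + random.uniform(-0.5, 0.5),
                        s0=20.0)
print(abs(x - 3.0) < 1.0)  # the walk settles near the optimum
```

For the Grid schedule, f would be the decision/weight function and `neighbour` the move-a-starving-job-into-a-gap perturbation described above.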
A. Alea 2.1
Alea 2.1 is used for the evaluation of various job scheduling techniques. The main part of Alea 2.1 is the Scheduler entity.

The Scheduler takes newly incoming jobs from the JobLoader entity and determines where to send them. It uses both queue-based and schedule-based scheduling algorithms. The body() method of the Scheduler entity is responsible for communication; scheduling algorithms are placed outside the body() method. The schedule of each resource is placed in objects of the ResourceInfo entity. One ResourceInfo object represents one resource [20].
Functions that approximate makespan, tardiness, and other values important for the scheduling algorithms are implemented
in Alea 2.1.
The evaluation of the proposed algorithm was a large simulation. Running Alea 2.1 during the simulation required a lot of memory, since many objects were created. The simulation was performed on an Intel Pentium 4 2.8 GHz machine with 512 MB RAM.
B. Resource Modelling
A number of space-shared resources with different characteristics, configurations, and capabilities are modelled and simulated from those in the World-Wide Grid (WWG) testbed. The latest CPU models released by renowned manufacturers have been selected. The processing capability of these Processing Elements (PEs) in simulation time units is modelled in the form of MIPS (Million Instructions Per Second) ratings.
C. Application Modelling
Four task farming applications that consist of 6000, 12000, 3000 and 5000 jobs have been modelled. In GridSim, these jobs are packaged as Gridlets, whose contents include the job length in MI (Million Instructions) and the size of the job input and output data in bytes, along with various other execution related parameters, when they move between the broker and the resources.
D. Simulation Parameters
The t1 ≥ t2 ≥ t3 ≥ · · · > 0 are control parameters referred to as cooling parameters or temperatures (in analogy with the annealing process mentioned above) [10]. Values of the control parameters are shown in Table I:
TABLE I
SIMULATION PARAMETERS

Parameter                        Value
temperature decrement function   a^n
a                                0.9 (initially)
n                                0 to 499
time limit                       800
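Reading the a^n entry of Table I as a geometric schedule t_n = t0 · a^n (t0 is an assumed initial temperature, not listed in the table), the temperature decays as:

```python
def temperature(n, t0=1.0, a=0.9):
    # Geometric cooling: t_n = t0 * a**n (our reading of Table I's a^n entry)
    return t0 * a ** n

print(temperature(0))             # 1.0
print(round(temperature(10), 4))  # 0.3487
print(temperature(499) < 1e-22)   # essentially frozen well before n = 499
```

With a = 0.9 the temperature halves roughly every 7 steps, so by the last of the 500 iterations almost no non-improving moves are accepted.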
VII. SIMULATION OUTPUT
Current Grid scheduling systems are all queue-based systems. These systems use one or more incoming queues where jobs are stored until they are scheduled for execution. All such systems use the basic First Come First Served (FCFS) scheduling policy.
Through the simulation, the proposed SA based scheduling algorithm has been compared with the FCFS algorithm. Figure 1 and Figure 2 present graphs depicting the number of waiting and running jobs per day, following the FCFS and SA algorithms respectively, as generated by Alea 2.1 during the simulation. These graphs demonstrate major differences between the algorithms. Concerning machine usage, FCFS generates poor results in not being able to utilize the available resources, which is depicted by the existence of waiting jobs even in the last part of the simulation. In contrast, it is observed that the SA based approach is able to manage the load efficiently through its local search technique, which is clearly depicted by the increase in the number of non-waiting jobs in the last part of the simulation.
Fig. 1. Waiting and running jobs by FCFS
Fig. 2. Waiting and running jobs by SA
VIII. CONCLUSION AND FUTURE WORK
Grid scheduling is a process of reserving resources for future use by a planned task. The goal of this process is to minimize various optimization criteria such as makespan, lateness and tardiness. Finding an optimal solution in the Grid environment is an NP-complete problem, which is practically intractable for larger sets of jobs.
However, current Grid schedulers usually use very simple algorithms based on queues. In this paper, a scheduling algorithm based on schedule optimization using SA is proposed for the job scheduling problem on computational Grids. Job scheduling algorithms based on SA can be applied in existing computational Grid environments.
An endeavour has been made to compare the performance of the proposed approach with that of the existing queue-based FCFS approach through simulation. This comparison demonstrates that the proposed approach generates an optimal schedule, one that increases the machine usage and the number of non-waiting jobs.
Future work may incorporate a hybridization technique (memetic algorithms) to make Grid scheduling more efficient.
ACKNOWLEDGMENT
The authors would like to acknowledge all researchers of the simulation tools described in this paper and thank them for their outstanding work. Sincere thanks are also due to the anonymous reviewers whose thought-provoking comments have enriched the paper.
REFERENCES
[1] Chee Shin Yeo, Anthony Sulistio and Rajkumar Buyya. A taxonomy of computer-based simulations and its mappings to parallel and distributed systems simulation tools. Software - Practice and Experience, 34:653–673, April 2004.
[2] Ian Foster and Carl Kesselman, editors. The Grid: Blueprint for a new computing infrastructure. Morgan Kaufmann, San Francisco, CA, 1999.
[3] Kiat-An Tan, Frederic Magoules, Jie Pan and Abhinit Kumar. Introduction to Grid Computing. CRC Press, London, UK, 2009.
[4] Dalibor Klusacek. Scheduling in Grid Environment. PhD thesis, MasarykUniversity, Brno, 2008.
[5] Rajkumar Buyya, Ajith Abraham and Baikunth Nath. Nature's heuristics for scheduling jobs on computational grids. In P.S. Sinha and R. Gupta, editors, Proceedings of the 8th IEEE International Conference on Advanced Computing and Communications (ADCOM 2000), pages 45–52. Tata McGraw-Hill Publishing Co. Ltd, New Delhi, 2000.
[6] T.L. Casavant and J.G. Kuhl. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Transactions on Software Engineering, 14(2):141–154, 1988.
[7] F. Dong and S.G. Akl. Scheduling algorithms for grid computing: State of the art and open problems. Technical Report 2006/504, 2006.
[8] Sartaj Sahni, Ellis Horowitz and Sanguthevar Rajasekaran. Fundamentals of Computer Algorithms. Universities Press Private Limited, Hyderabad, India, second edition, 2010.
[9] Michael R. Garey and David S. Johnson. Computers and Intractability.Bell Telephone Laboratories, USA, 1979.
[10] Michael Pinedo. Scheduling: Theory, Algorithms, and Systems. Springer Science + Business Media, New York, USA, third edition, 2008.
[11] Dalibor Klusacek and Hana Rudova. Improving QoS in computational grids through schedule-based approach. In Scheduling and Planning Applications Workshop (SPARK) at the International Conference on Automated Planning and Scheduling (ICAPS'08), Sydney, 2008.
[12] Paul E. Black, editor. Dictionary of Algorithms and Data Structures [online]. U.S. National Institute of Standards and Technology, http://xw2k.nist.gov/dads/HTML/simulatedAnnealing.html, 2009.
[13] K.A. Dowsland. Modern Heuristic Techniques for Combinatorial Problems (editor C.R. Reeves). McGraw-Hill, 1995.
[14] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21(6):1087–1092, 1953.
[15] S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[16] Rui Chibante, editor. Simulated Annealing: Theory with Applications.Sciyo, Rijeka, Croatia, 2010.
[17] Steven S. Skiena. The Algorithm Design Manual. Springer Science +Business Media, London, UK, second edition, 2008.
[18] Ludek Matyska, Dalibor Klusacek and Hana Rudova. Alea - Grid scheduling simulation environment. In Proceedings of the 7th International Conference on Parallel Processing and Applied Mathematics (PPAM 2007), volume 4967, pages 1029–1038. Springer, 2008.
[19] Rajkumar Buyya and Manzur Murshed. GridSim: a toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing. Concurrency and Computation: Practice and Experience (CCPE), 14:1175–1220, Nov.–Dec. 2002.
[20] Dalibor Klusacek and Hana Rudova. Alea 2 - job scheduling simulator. In Proceedings of the 3rd International Conference on Simulation Tools and Techniques (SIMUTools 2010), Torremolinos, Malaga, Spain, March 2010.