
Advances in Engineering Software 40 (2009) 941–946


Service scheduling and rescheduling in an applications integration framework

Lei Yu, Frédéric Magoulès *

Applied Mathematics and Systems Laboratory, Ecole Centrale Paris, Grande Voie des Vignes, 92295 Châtenay-Malabry Cedex, France

Article info

Article history: Received 3 November 2008; Accepted 1 December 2008; Available online 7 April 2009.

Keywords: Web services; Scheduling algorithm; WSRF; Application integration; Scheduler service; Rescheduling


* Corresponding author. Tel.: +33 1 41 13 10 00. E-mail address: [email protected] (F. Magoulès).

Abstract

Grid technologies are evolving towards a service oriented architecture (SOA) and the traditional client/server architecture of heterogeneous computing (HC) can be transformed into a grid service oriented architecture. In this architecture, when more than one service fulfills the user request, a service which can make scheduling decisions is essential. A scheduling service has been proposed in a framework which achieves the dynamic deployment and scheduling of scientific and engineering applications. The framework treats all components (resource service and scheduler service) as WSRF-compliant services, which supports application integration with underlying native platform facilities and facilitates the construction of a hierarchical scheduling system. In order to enhance the system performance, we replace the MWL scheduling algorithm with an MCT algorithm and integrate a rescheduling mechanism in the framework. The experiments show that the MCT algorithm achieves a smaller makespan and that the rescheduling mechanism ensures task execution even if an application is removed from the Resource Service.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Grid technologies are evolving towards a service oriented architecture (SOA) in which all the entities (e.g. computational and storage resources, networks, programs, databases) are treated as services. These services are built on concepts and technologies from both the grid and Web services communities and are defined by a set of specifications which specify standard mechanisms for creating, naming, and discovering transient grid service instances [1]. Therefore, the traditional client/server architecture of heterogeneous computing (HC) can be transformed into a grid service oriented architecture where the two main roles are the service provider and the service consumer. In a SOA, service providers register the services they provide into a directory, and service consumers then discover their requested services from the directory and send their service requests. Nevertheless, when two or more similar services are registered, a scheduling service is lacking in such a SOA: we cannot request a directory service to collect computing resource information, to analyze client criteria and to make scheduling decisions.

Scheduling algorithms have been well studied in heterogeneous computing (HC) systems, but in a SOA grid environment the realization and evaluation of such algorithms are less often addressed. Yu and Magoulès have proposed a framework to achieve dynamic deployment of scientific and engineering applications [2]. In this framework, a scheduling service is realized and a very simple scheduling algorithm is implemented to make scheduling decisions. The algorithm makes use of static and dynamic information about the computing resources (e.g. free CPUs) to schedule service requests, and the service whose hosting computing resource has the lowest load (e.g. more free CPUs, fewer waiting jobs per CPU) is selected as the best service for the consumer. But this algorithm does not take execution information (e.g. execution time) into account, resulting in load imbalance between the available resources when the execution time of the services varies widely.

In this paper, a minimum completion time (MCT) scheduling algorithm is implemented and evaluated in the previously developed framework [2]. The minimum completion time (MCT) heuristic assigns each task to the machine that results in that task's earliest completion time. Moreover, the framework lacks a rescheduling mechanism to ensure the execution of jobs when a requested application is removed dynamically from some computing resources [2]. Thus a rescheduling algorithm is also proposed and implemented.

The next section of the paper briefly presents scheduling algorithms in HC systems. Following this, the architecture and the implementation of the framework [2] are described in Section 3. In Section 4, the MCT and rescheduling algorithms are presented and the details of the implementation of these algorithms are described. Next, in Section 5, experiments are presented to compare the performance of the different algorithms and to evaluate the rescheduling algorithm. Related work is discussed in Section 6. Finally, we conclude with a brief discussion of future research.

Fig. 1. The architecture of the framework. (Figure: a Client interacts with the Scheduler Service (Meta-Scheduler); Resource Services are deployed on the Computing Resources, each with a local JDS; an AdminTool interacts with the Resource Services.)


2. Scheduling algorithms in heterogeneous computing systems

In general, heterogeneous computing (HC) is the coordinated use of different types of machines with different capabilities, networks, and interfaces to maximize their combined performance and/or cost-effectiveness [3]. HC is an important technique for efficiently solving collections of computationally intensive problems and HC research provides the foundation for grid computing.

A schedule (or task schedule) is the assignment of tasks to resources in specific time intervals, such that no two tasks are assigned to any resource at the same time, or such that the capacity of the resource is not exceeded by the tasks. In order to clearly describe the scheduling algorithms, some preliminary terms must be defined. Machine availability time, mat(j), is the earliest time a machine j can complete the execution of all the tasks that have previously been assigned to it. Let ETC(i, j) be defined as the estimated time to compute for task i on machine j. Completion time, ct(i, j), is the machine availability time plus the execution time of task i on machine j: ct(i, j) = mat(j) + ETC(i, j). Let t be the number of tasks to be executed and m the number of machines in the HC system. The maximum value of ct(i, j), for 0 ≤ i < t and 0 ≤ j < m, is known as the makespan. Each scheduling algorithm attempts to minimize the makespan.
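As an illustration only (a minimal Java sketch with invented names, not code from the framework), the following fragment updates mat(j) as tasks are placed and returns the resulting makespan for a given assignment and ETC matrix.

public class ScheduleDefinitions {
    // Plays an assignment of tasks to machines forward in time:
    // ct(i, j) = mat(j) + ETC(i, j), mat(j) is updated after each placement,
    // and the makespan is the largest completion time reached.
    static double makespan(double[][] etc, int[] assignment, int machines) {
        double[] mat = new double[machines];   // machine availability times, initially 0
        double makespan = 0.0;
        for (int i = 0; i < assignment.length; i++) {
            int j = assignment[i];             // machine chosen for task i
            double ct = mat[j] + etc[i][j];    // completion time of task i on machine j
            mat[j] = ct;                       // machine j is now busy until ct
            makespan = Math.max(makespan, ct);
        }
        return makespan;
    }
}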

In a general HC system, two types of scheduling algorithms are intensively researched: static and dynamic. Dynamic methods perform the scheduling as tasks arrive. This is in contrast to static techniques, where the complete set of tasks to be scheduled is known a priori, the scheduling is done prior to the execution of any of the tasks, and more time is available to make the scheduling decision. Both static and dynamic methods are widely adopted in grid computing. Dynamic scheduling is more appropriate than static scheduling in a grid environment because of the dynamic availability and load variability of computing resources.

According to [3], dynamic scheduling algorithms can be grouped into two categories: on-line mode and batch mode heuristics. In the on-line mode, a task is scheduled onto a machine as soon as it arrives at the scheduler. In the batch mode, tasks are not scheduled onto the machines as they arrive; instead they are collected into a set that is prepared for scheduling at a pre-defined time interval. The most popular heuristics of the on-line mode are minimum completion time (MCT), minimum execution time (MET) and k-percent best (KPB). The Min–min and sufferage heuristics are the most studied and implemented batch mode heuristics. Normally, KPB provides the minimum makespan among the on-line mode heuristics and the sufferage heuristic gives the smallest makespan among the batch mode heuristics [3].

3. Framework for dynamic deployment of applications

The evolution of grid computing generates new requirements for distributed application development and deployment. Unfortunately, most of the applications which need to be deployed are command-line applications written in FORTRAN, C and scripting languages. Although these applications are fast, efficient and easy to use, they are usually platform dependent and it is difficult to make them interact with applications from other communities. Moreover, there is no standard way of registering these applications so that they can be discovered by interested clients and end-users.

The service oriented architecture (SOA) is an ideal technology to integrate legacy applications into the grid. Adopting this service oriented architecture, a framework [2] is implemented to achieve the dynamic deployment and scheduling of scientific applications. The architecture of the framework is illustrated in Fig. 1. The resource service is deployed in each computing resource and virtualizes the computing resource through the encapsulation of scientific and engineering applications behind a common interface. User applications interact with the scheduler service, via a uniform user interface, to discover applications, to submit applications and to monitor execution status. In each resource service, jobs which execute the scientific applications are described in the job description schema [4] and these job description files are saved in the local job description storehouse (JDS). An AdminTool, which provides a graphical interface, can be used by the local administrator to interact with the resource service, which takes the responsibility to add, delete and modify application descriptions.

The framework treats all components (e.g. the resource service and the scheduler service) as WSRF-compliant services [5] which support application integration with underlying native platform facilities and facilitate the construction of a hierarchical scheduling system. The local administrator can dynamically make applications available or unavailable on the resource service without stopping the execution of the computing resources. The scheduler service implements a simple scheduling algorithm which realizes job scheduling and selects the best resource service to which to submit the users' jobs. The performance of the framework has been evaluated in several experiments.

4. Minimum completion time (MCT) algorithm implementation

4.1. The algorithm implemented in the framework

The framework [2] is implemented with Globus Toolkit (GT) 4 [6], a Web service-based version which provides significant improvements over previous releases in terms of robustness, performance, usability, documentation, standards compliance, and functionality. The Globus Toolkit's Monitoring and Discovery System (MDS) defines and implements mechanisms for service and resource discovery and monitoring in distributed environments [7]. MDS can also be configured in a hierarchical fashion, with upper levels of the hierarchy aggregating information from the lower-level MDS. The upper levels are identified as upstream resources in the hierarchy, and the lower levels are identified as downstream resources [8]. Thus, from each computing resource, the scheduler service can gather the dynamic and static resource information needed to make scheduling decisions.


Using the information gathered from each computing resource, the scheduler service tries to map tasks to the resource which has the minimum work load. The resulting scheduling algorithm is shown in Algorithm 4.1. When the scheduler service finds that there is more than one available resource service which conforms to the user requirement, it compares the number of available CPUs of each computing resource. The resource which has the most available CPUs is assigned to execute the user task. If the numbers of available CPUs are similar, the scheduler service calculates the value of WaitingJobs/TotalCPUs for each computing resource, where WaitingJobs is the number of jobs waiting in the local job queue and TotalCPUs is the number of CPUs of the computing resource. The resource which has the smallest value is chosen to perform the user task. We designate this algorithm as the minimum work load (MWL) heuristic because the scheduler service always tries to assign jobs to the resource with the minimum work load.

Algorithm 4.1. Minimum work load (MWL) algorithm

task t_j arrives
for each resource m_i which has the wanted application do
    factor_i = waiting jobs in m_i / total CPUs of m_i
    f_i = free CPUs in m_i
end for
for each resource m_i which has the wanted application do
    if f_i has the largest value, or f_k of m_k has the same largest value as m_i but factor_i < factor_k then
        assign task t_j to resource m_i
    end if
end for
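For illustration (the class and field names below are hypothetical, not the framework's API), the MWL selection rule of Algorithm 4.1 can be sketched in Java as follows, assuming the free CPUs, total CPUs and waiting jobs of each candidate resource have already been gathered from MDS.

import java.util.List;

public class MwlSelection {
    // Illustrative snapshot of a computing resource, as gathered from MDS.
    static class Resource {
        final String name;
        final int freeCpus, totalCpus, waitingJobs;
        Resource(String name, int freeCpus, int totalCpus, int waitingJobs) {
            this.name = name; this.freeCpus = freeCpus;
            this.totalCpus = totalCpus; this.waitingJobs = waitingJobs;
        }
        double factor() { return (double) waitingJobs / totalCpus; }
    }

    // MWL rule: most free CPUs wins; ties are broken by the smallest
    // WaitingJobs/TotalCPUs ratio.
    static Resource select(List<Resource> candidates) {
        Resource best = null;
        for (Resource r : candidates) {
            if (best == null
                    || r.freeCpus > best.freeCpus
                    || (r.freeCpus == best.freeCpus && r.factor() < best.factor())) {
                best = r;
            }
        }
        return best;
    }
}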

Fig. 2. The simulation procedure to estimate the machine availability time. (Figure: a job queue of six jobs T1–T6 with execution times 3, 5, 4, 5, 6 and 4, replayed on three processors P1–P3 over the time intervals t1–t4.)

4.2. MCT algorithm

The minimum completion time (MCT) algorithm (Algorithm 4.2) assigns each task to the resource that results in the task's earliest completion time. As a task arrives, all the resources which satisfy the task's requirements are examined to determine the resource that gives the earliest completion time for the task [9]. This may cause some tasks to be assigned to machines that do not have the minimum execution time for them.

Algorithm 4.2. Minimum completion time (MCT) algorithm

task t_j arrives
for each resource m_i which has the wanted application do
    calculate the completion time of t_j on m_i: ct = mat(m_i) + ETC(t_j, m_i)
end for
for each resource m_i which has the wanted application do
    if t_j has the minimum completion time on resource m_i then
        assign t_j to m_i
    end if
end for
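Again as an illustration only (hypothetical names, not the framework's API), a Java sketch of the MCT rule of Algorithm 4.2, assuming mat(j) and ETC(i, j) have been estimated as described in the remainder of this section:

public class MctSelection {
    // Returns the index of the resource giving the earliest completion time
    // ct(i, j) = mat(j) + ETC(i, j) for task i, or -1 if no candidate resource
    // offers the required application.
    static int select(int i, double[] mat, double[][] etc, boolean[] hasApplication) {
        int best = -1;
        double bestCt = Double.MAX_VALUE;
        for (int j = 0; j < mat.length; j++) {
            if (!hasApplication[j]) continue;      // resource lacks the application
            double ct = mat[j] + etc[i][j];        // completion time of task i on resource j
            if (ct < bestCt) { bestCt = ct; best = j; }
        }
        return best;
    }
}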

MCT is an easy but effective scheduling algorithm in the on-line mode of dynamic scheduling [3]. The difficulty of implementing MCT in the framework is the estimation of the machine availability time, mat(j), of each computing resource. In the Globus job description language, there are several elements to set the job wall-time (e.g. maxCpuTime, maxTime and maxWallTime). The maxTime is used to set the task execution time when the local administrator defines the job description for an application. In the framework, each task is submitted in a WSRF-resource [5] which can be used to keep the task execution status. The definition of the resource properties is shown as follows:

<!-- RESOURCE PROPERTIES -->
<xsd:element name="JobStatus" type="xsd:string"/>
<xsd:element name="EstimatedTerminationTime" type="xsd:int"/>
<xsd:element name="CurrentTime" type="xsd:time"/>
<xsd:element name="SubmitTime" type="xsd:time"/>

<xsd:element name="GridResourceProperties">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="tns:JobStatus" minOccurs="1" maxOccurs="1"/>
      <xsd:element ref="tns:EstimatedTerminationTime" minOccurs="1" maxOccurs="1"/>
      <xsd:element ref="tns:CurrentTime" minOccurs="1" maxOccurs="1"/>
      <xsd:element ref="tns:SubmitTime" minOccurs="1" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

The EstimatedTerminationTime (es_i) is the task's estimated execution time, which is initialized according to the maxTime in the job description. When the task is submitted to a computing resource, the time is saved in SubmitTime (st_i). Moreover, when the job is actually executed in the computing resource (the job status is active), the time is set in CurrentTime (cut_i). Therefore, a task's execution time (exe_i) can be calculated as follows:

1. if job i is activated, exe_i = es_i − (now − cut_i), where now is the current system time;
2. if es_i < (now − cut_i), exe_i = (now − cut_i);
3. if job i is not activated, exe_i = es_i.
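As a minimal sketch of the three cases above (illustrative names, times in seconds), assuming es, cut and the job status have been read from the resource properties:

public class ExecutionTimeEstimate {
    // Remaining execution time exe_i of a task, following the three cases above.
    // es  = EstimatedTerminationTime, cut = CurrentTime (activation time),
    // now = current system time; all values in seconds.
    static double exe(boolean activated, double es, double cut, double now) {
        if (!activated) {
            return es;                    // case 3: not yet started, use the estimate
        }
        double elapsed = now - cut;       // how long the job has already been running
        if (es < elapsed) {
            return elapsed;               // case 2: the estimate is already exceeded
        }
        return es - elapsed;              // case 1: remaining part of the estimate
    }
}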

Having the execution time (exe_i) of each task, we can estimate the resource's machine availability time, mat(j), by a simulation. The simulation procedure is illustrated in Fig. 2. We suppose that there are six jobs in the job queue and that the resource has three processors. The execution time of each job is shown below the job's name (e.g. 3, 5). The scheduling strategy of the job queue is FIFS (first in, first served). Thus at the beginning, the first three jobs (T1, T2 and T3) are executed on the three processors of the resource. After t1, T1 completes and T4 is started. Then, after t2, T3 finishes and T5 is executed. Next, T6 is started when T2 completes after t3. Now all the jobs in the job queue have been started and t4 is the shortest completion time of the jobs which are still running. Therefore, the machine availability time of this resource can be calculated as mat(j) = t1 + t2 + t3 + t4.
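A minimal sketch of this simulation (illustrative code, not the scheduler service's implementation): the remaining execution times of the queued and running jobs are replayed over the resource's processors in FIFS order, and mat(j) is the earliest time at which a processor becomes free with no job left to start.

import java.util.PriorityQueue;

public class MachineAvailability {
    // Replays the job queue over 'processors' identical processors and returns
    // the estimated machine availability time mat(j).
    static double estimateMat(double[] remainingExecTimes, int processors) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>();  // when each processor becomes free
        for (int p = 0; p < processors; p++) freeAt.add(0.0);
        for (double exe : remainingExecTimes) {
            double start = freeAt.poll();     // the earliest processor to become free
            freeAt.add(start + exe);          // it is busy until start + exe
        }
        return freeAt.peek();                 // earliest moment a processor is free again
    }

    public static void main(String[] args) {
        // The example of Fig. 2: jobs T1-T6 with execution times 3, 5, 4, 5, 6, 4
        // on three processors; prints 8.0 (= t1 + t2 + t3 + t4).
        System.out.println(estimateMat(new double[]{3, 5, 4, 5, 6, 4}, 3));
    }
}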

4.3. Rescheduling

In the previously developed framework [2], it is possible that a deployed application is removed (deleted) by the local administrator. Some submitted jobs which want to execute this application will then be blocked in the resource service or will cause system errors. The rescheduling of such jobs must be taken into account. Two rescheduling mechanisms are described in [10]: rescheduling by processor swapping and rescheduling by stop and restart. To enable processor swapping, the application is launched with more machines than will actually be used for the computation. There are two sets of machines: the active set (machines that are part of the computation) and the inactive set (machines that do nothing initially). During the execution, a monitor periodically checks the performance of the machines and swaps slower machines in the active set with faster machines in the inactive set. In the stop/restart approach, the application is suspended and migrated only when better resources are found for the application execution.

A rescheduling algorithm (swapping/start, Algorithm 4.3) is proposed to achieve job rescheduling in the framework. We suppose that jobs which have been activated by a computing resource cannot be canceled and rescheduled; jobs which are already running in the computing resource are considered to complete correctly even if the required application is removed from the resource service. When the Scheduler Service detects that a job description has been removed from a resource service, it searches its job queue to find all the jobs which have been mapped to this resource service and want to execute the removed application. For jobs which have not been activated by the computing resource, the Scheduler Service cancels the job submission and inserts these jobs into a rescheduling queue. Then a mechanism regularly scans the jobs in the rescheduling queue and tries to assign them to other resources.

5. Evaluation

As introduced in Section 2, each scheduling algorithm aims to minimize the makespan. Therefore, the performance of the scheduling algorithms can be evaluated by comparing the job completion times achieved with each scheduling algorithm. The framework [2] realizes the dynamic integration of scientific and engineering applications, and we have implemented a rescheduling algorithm in it to ensure the execution of the jobs. Thus the capacity of dynamically adding and removing applications in the framework must also be evaluated. The performance of the framework and of the scheduling algorithms is measured by three types of experiments:

1. All the jobs have the same estimated execution time (5 min). A set of jobs is scheduled to computing resources twice, using the two different scheduling algorithms (MWL and MCT). Each job's completion time is measured.

2. There are three types of jobs, with different estimated execution times (5, 10 and 15 min). A set of jobs selected randomly from the three types is mapped to computing resources with the different scheduling algorithms.

3. An application is removed and deployed dynamically in a resource service. Each job's completion time is measured.

Algorithm 4.3. Swapping/start rescheduling algorithm

application Ap is removed from resource service RS_j
for each job J_i in the job queue of the scheduler service do
    if J_i executes Ap then
        if J_i is not activated in RS_j then
            cancel the submission of J_i
            remove J_i from the job queue of RS_j
            remove J_i from the job queue of the scheduler service
            insert J_i into the job rescheduling queue
        end if
    end if
end for
for each job J_i in the job rescheduling queue do
    find the resource service RS_k which has the minimum completion time for J_i
    assign J_i to RS_k
end for
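The following Java sketch illustrates the first phase of Algorithm 4.3 (pulling non-activated jobs back into a rescheduling queue); the Job and ResourceService types and the method names are hypothetical placeholders rather than the framework's actual API, and the queued jobs would subsequently be reassigned with the MCT rule.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class Rescheduler {
    static class Job {
        String application;         // application the job wants to execute
        ResourceService mappedTo;   // resource service the job was scheduled to
        boolean activated;          // whether the computing resource has started it
    }
    interface ResourceService {
        void cancel(Job job);       // cancel a pending submission on this resource
    }

    private final Queue<Job> reschedulingQueue = new ArrayDeque<>();

    // Called when application 'app' is removed from resource service 'rs'.
    void onApplicationRemoved(String app, ResourceService rs, List<Job> schedulerQueue) {
        List<Job> pulled = new ArrayList<>();
        for (Job j : schedulerQueue) {
            if (j.application.equals(app) && j.mappedTo == rs && !j.activated) {
                rs.cancel(j);        // cancel the pending submission
                pulled.add(j);
            }
        }
        schedulerQueue.removeAll(pulled);   // drop them from the scheduler's job queue
        reschedulingQueue.addAll(pulled);   // queue them for reassignment (MCT)
    }
}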

5.1. Experimental set-up

The experimental set-up, shown in Fig. 3, is as follows. The resource service is deployed and tested in two Condor clusters [11] named C1 and C2, each with three servers. Each server has two Pentium 4 3.20 GHz processors and 1 GB of RAM. The scheduler service is deployed on a PC powered by a Pentium 4 3.00 GHz with 512 MB of RAM. All the machines are connected by 100 Mb Ethernet. GT 4 is installed on the central manager of each Condor pool and on the scheduler machine, and scheduler adapters are configured to support job submission into the Condor pools.

Fig. 3. The experimental set-up. (Figure: a Client and a Scheduler Service (Meta-Scheduler) with MDS on GT 4, connected over a LAN to two Condor pools C1 and C2, each with a Resource Service on GT 4; an AdminTool; arrows indicate the interaction between services.)

A laptop is used as a client to interact with the scheduler service. MDS is configured in a hierarchical fashion, with the upper level of the hierarchy (the meta-scheduler) aggregating information from the lower-level MDS (C1 and C2).

5.2. Five-minute job scheduling experiments

A C application which executes for 5 min on one of the servers of the Condor clusters is prepared. In order to execute the application in the standard universe, condor_compile must be used to relink the application with the Condor libraries [11]. Then the AdminTool is used to dynamically deploy this application in both the C1 and C2 clusters. Next, the client submits 40 jobs to the scheduler service, which implements the MCT algorithm; the submission interval is 30 s. From the user's point of view, a job submission is a sequence of finding the application, scheduling, getting the job status and job completion. In order to compare the two scheduling algorithms, the submission of the 40 jobs is performed once again with the MWL algorithm.

Fig. 4. The comparison of completion time of jobs with the two scheduling algorithms. (Plot: Completion Time (Second) versus Interval (Second); two curves: 40 jobs with the MCT algorithm and 40 jobs with the MWL algorithm.)

Fig. 4 shows that the MCT algorithm achieves a more balanced and effective job scheduling: the average job completion time is much shorter than that of the MWL algorithm. We notice that the completion times of the thirty-third and thirty-fourth jobs with the MCT algorithm are longer. This can be explained by perturbation from other users of the servers: when a user performs operations on one of the servers of a cluster, the operations occupy a CPU and suspend the activated job. Moreover, the poor performance of the MWL algorithm is partly due to the performance of the MDS system. The MWL algorithm needs dynamic information to make scheduling decisions, but MDS can only provide this information with a certain delay, which prevents the scheduler from making precise scheduling decisions.

5.3. Multi-type jobs scheduling experiments

In this part of the experiments, three types of applications are prepared according to the job's estimated execution time: 5 min, 10 min and 15 min. We make a set of 40 jobs which includes 20 jobs of 5 min, 10 jobs of 10 min and 10 jobs of 15 min. The sequence of job submission is created randomly. The submission is performed twice: once with the MWL algorithm and once with the MCT algorithm.

The average completion time for each type of job is calculated. Then we define the variable ratio, which equals the average completion time divided by the estimated execution time for each type of job. The ratio reflects the job waiting time in a job queue relative to the job execution time: the larger the value of the ratio, the longer the job waits in the job queue. For example, a 5 min job whose average completion time is 7 min has a ratio of 1.4.

The experimental results are illustrated in Fig. 5. The ratios of jobs with the MCT algorithm are smaller than those of jobs with the MWL algorithm, which means that jobs wait longer in the job queues when the scheduler service uses the MWL scheduling algorithm. Similarly, when the execution time of the jobs is shorter, the ratio is greater for both scheduling algorithms.

Fig. 5. The comparison of ratios (CT/ET) of multi-type jobs between MWL and MCT. (Plot: Ratios (CT/ET) versus Job Type (Minute); two curves: 40 jobs with the MCT algorithm and 40 jobs with the MWL algorithm.)

5.4. Rescheduling and dynamic deployment experiments

In order to have more jobs waiting in the job queue, we reduce the submission interval to 15 s. The client submits 40 jobs to the scheduler service with the MCT algorithm and waits for the completion of all the jobs. In the case of normal submission, no application is removed from C1. For the rescheduling experiments, when the eleventh job is submitted, we remove the application from C1. At this time, the ninth, tenth and eleventh jobs are waiting in C1 and are not yet activated. Thus these three jobs are rescheduled to C2 and are executed on this resource. The jobs which follow have a greater completion time because only C2 provides the service. Next, after the submission of the twenty-second job, the application is dynamically deployed in C1 again. The completion time of the subsequent jobs then drops and the scheduling becomes stable and balanced. Fig. 6 shows the results; for comparison, 40 jobs are resubmitted with the same interval of 15 s under normal submission.

Fig. 6. The performance of rescheduling and dynamic deployment. (Plot: Completion Time (Second) versus Interval (Second); two curves: 40 jobs with rescheduling and dynamic deployment and 40 jobs with normal submission.)

6. Related work

In the context of computational grids, we can mention the following meta-scheduling projects: Condor-G [12], which provides user tools with fault tolerance capabilities to submit jobs to a Globus-based grid; Nimrod/G [13], designed specifically for parameter sweep applications (PSA), optimizing user-supplied parameters like deadline or budget; the GridLab Resource Management System (GRMS) [14], which is a meta-scheduler component to deploy resource management systems for large scale infrastructures; the community scheduler framework (CSF) [15], an implementation of an OGSA-based meta-scheduler; and the Enabling Grids for E-sciencE (EGEE) resource broker [16], which handles job submission and accounting. Finally, GridWay gives end users, application developers and managers of Globus infrastructures a scheduling functionality, including support for the DRMAA GGF standard [17].

There are several research efforts aiming at automating the transformation of legacy code into a grid service. Most of these solutions are based on the general framework to transform legacy applications into Web services outlined in [18], and use Java wrapping in order to generate stubs automatically. One example can be found in [19], where the authors describe a semi-automatic conversion of legacy C code into Java using JNI (Java Native Interface) [20].

Compared to Java wrapping, some solutions [2,20–22] are based on a different principle. They offer a front-end grid service layer that communicates with the client in order to pass input and output parameters, and contacts a local job manager to submit the legacy computational job. The grid service is defined by OGSA [23], which supports, via standard interfaces and conventions, the creation, termination, management, and invocation of stateful and transient services as named and managed entities with a dynamic and managed lifetime. To deploy a legacy application as a grid service there is no need for the source code: the user only has to describe the legacy parameters in a pre-defined description file and to transfer that file to a factory service.

The paper [24] presents a lightweight grid solution for the deployment of multi-parameter applications on a set of clusters protected by firewalls. The system uses a hierarchical design based on Condor for managing each cluster locally and XtremWeb for enabling resource sharing among the clusters. This approach fulfills the requirements of grid deployments, ensuring strong security and fault tolerance using resilient components which fetch their context before restarting.

New grid scheduling and rescheduling methods [10] are introduced in GrADS. GrADS uses Autopilot to monitor the agreement between the application demands and the resource capabilities. Once the contract is violated, a simple stop/migrate/restart approach and a process-swapping approach are applied to reschedule grid applications, improving the performance of the system.

7. Conclusion and future work

This paper realized a more effective scheduling algorithm and implemented a rescheduling mechanism in a framework which can deploy scientific and engineering applications into a grid environment. In this framework, the local administrator can dynamically make applications available or unavailable on the resource service without stopping the execution of the Globus Toolkit Java Web Services container. In order to enhance system performance, a more efficient algorithm (MCT) has been implemented in the scheduler service of the framework. The difficulty of implementing MCT in the framework is the estimation of the machine availability time, mat(j), of each computing resource. In order to estimate mat(j), three resource properties were defined to keep the task execution status, and a formula was proposed to estimate the remaining execution time of each task, from which mat(j) is simulated. The experiments show that MCT realizes a smaller makespan of jobs than MWL. Moreover, a rescheduling algorithm and mechanism are proposed in the framework. The system can dynamically detect the deployment and undeployment of applications and schedule or reschedule tasks to available resources without human intervention.

We plan to realize more complex scheduling algorithms (e.g. batch mode heuristics) and to integrate workflow support in the scheduler service. Moreover, the interaction between scheduler services, or between a scheduler service and other meta-schedulers, can be realized with Web service standards, so we would like to create a hierarchy of meta-schedulers to realize distributed scheduling.

References

[1] Foster I, Kesselman C, Nick JM, Tuecke S. Grid services for distributed system integration. Computer 2002;35(6):37–46.

[2] Yu L, Magoulès F. A framework for dynamic deployment of scientific applications based on WSRF. In: Advances in grid and pervasive computing. Paris, France: Springer-Verlag; May 2007.

[3] Siegel HJ, Ali S. Techniques for mapping tasks to machines in heterogeneous computing systems. J Syst Architect 2000;46:627–39.

[4] Globus team. GT 4.0 WS GRAM: job description schema doc. <http://www.globus.org/toolkit/docs/4.0/execution/wsgram/schemas/gram_job_description.html>.

[5] Foster I, Czajkowski K, Ferguson D, Frey J, Graham S, Maguire T, et al. Modeling and managing state in distributed systems: the role of OGSI and WSRF. In: Proceedings of the IEEE, vol. 93; March 2005. p. 604–12.

[6] Foster I. Globus toolkit version 4: software for service-oriented systems. In: IFIP international conference on network and parallel computing, LNCS 3779. Springer-Verlag. p. 2–13.

[7] Schopf JM, Raicu I, Pearlman L, Miller N, Kesselman C, Foster I, D'Arcy M. Monitoring and discovery in a web services framework: functionality and performance of Globus Toolkit MDS4. Technical report ANL/MCS-P1248-0405, Argonne National Laboratory, Argonne, IL; 2005.

[8] Mausolf J. Grid in action: monitor and discover grid services. <http://www-128.ibm.com/developerworks/grid/library/gr-gt4mds/index.html>.

[9] Maheswaran M, Ali S, Siegel HJ, Hensgen D, Freund RF. Dynamic mapping of a class of independent tasks onto heterogeneous computing systems. J Parallel Distributed Comput 1999;59(2):107–31.

[10] Berman F, Casanova H, Chien A, Cooper K, Dail H, Dasgupta A, et al. New grid scheduling and rescheduling methods in the GrADS project. Int J Parallel Program 2005;33(2):209–29.

[11] Condor team. Condor user's manual. <http://www.cs.wisc.edu/condor/manual/v6.8/2_4Road_map_Running.html>.

[12] Frey J, Tannenbaum T, Livny M, Foster I, Tuecke S. Condor-G: a computation management agent for multi-institutional grids. Cluster Comput 2002;5(3):237–46.

[13] Buyya R, Abramson D, Giddy J. A computational economy for grid computing and its implementation in the Nimrod-G resource broker. Future Generation Comput Syst 2002;18:1061–74.

[14] Seidel E, Allen G, Merzky A, Nabrzyski J. GridLab – a grid application toolkit and testbed. Future Generation Comput Syst 2002;18(8):1143–53.

[15] Platform Computing team. Open source metascheduling for virtual organizations with the community scheduler framework (CSF). Technical report, Platform Computing; 2003.

[16] EGEE team. EGEE middleware architecture and planning (Release 2). Technical report, DJRA1.4, EGEE; 2005.

[17] Huedo E, Montero RS, Llorente IM. A modular meta-scheduling architecture for interfacing with pre-WS and WS grid resource management services. Future Generation Comput Syst 2007;23:252–61.

[18] Kuebler D, Eibach W. Adapting legacy applications as web services. IBM DeveloperWorks; 2002. <http://www-128.ibm.com/developerworks/library/ws-legacy/>.

[19] Huang Y, Taylor I, Walker D, Davies R. Wrapping legacy codes for grid-based applications. In: Proceedings of the international parallel and distributed processing symposium; April 2003. p. 22–6.

[20] Kacsuk P, Goyeneche A, Delaitre T, Kiss T, Farkas Z, Boczko T. High-level grid application environment to use legacy codes as OGSA grid services. In: Proceedings of the fifth IEEE/ACM international workshop on grid computing; 2004. p. 428–35.

[21] Kandaswamy G, Fang L, Huang Y, Shirasuna S, Gannon D. A generic framework for building services and scientific workflows for the grid. In: The 2005 ACM/IEEE conference on supercomputing; 2005.

[22] Gannon D, Ananthakrishnan R, Krishnan S, Govindaraju M, Ramakrishnan L, Slominski A. Grid web services and application factories. In: Fox, Berman, Hey, editors. Grid computing: making the global infrastructure a reality. Wiley; 2003.

[23] Foster I, Kesselman C, Nick J, Tuecke S. The physiology of the grid: an open grid services architecture for distributed systems integration; 2002.

[24] Lodygensky O, Fedak G, Cappello F, Neri V, Livny M, Thain D. XtremWeb and Condor: sharing resources between Internet connected Condor pools. In: Proceedings of the third IEEE/ACM international symposium on cluster computing and the grid; 12–15 May 2003. p. 382–9.