
Scheduling Workflow-based Parameter-Sweep Applications with Best-Intermediate-Result-First Heuristic

Kunaporn Srimanotham
Scientific Parallel Computer Engineering Lab, Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
[email protected]

Veera Muangsin
Scientific Parallel Computer Engineering Lab, Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand
[email protected]

Abstract

Workflow-based parameter-sweep applications are an important class of parallel jobs on clusters and grids today. Conventional batch schedulers and parameter study tools are not effective for this type of application. In particular, their scheduling policies are usually designed to minimize the makespan of the whole parameter study. However, many parameter-sweep applications also have a primary objective to obtain the best or a few top-ranked results from a large parameter space.

This paper describes a new heuristic for scheduling parameter-sweep workflows in order to minimize the turnaround time of the workflows that give the best results. The algorithm is based on priorities that are dynamically adjusted according to intermediate data obtained at some stage in the workflow. The technique is applied to a high-throughput drug screening application. The experimental results show that our technique can significantly improve the correlation between the ranking of the final results and the order of completion of the workflows.

1. Introduction

Workflow-based parameter-sweep applications are an important class of parallel jobs on clusters and grids today. Many parameter-sweep applications require a very long time to finish all tasks. However, there are many classes of applications whose objective is to find the best result, or just a few top-ranked ones, among a large number of tasks.

Such applications include a class of applications called search applications. A search application finds the final result by exploring many dimensions. When the computation on each dimension is finished, a few intermediate results are produced, selected according to some objective function, and passed to the next computation. Finally, the best result can be found in the last dimension.

Also, many applications exploit evolutionary computing techniques such as genetic algorithms, exploring a large number of solutions by repeatedly creating and evaluating groups of candidate solutions. The best intermediate solutions are chosen for creating the next generation of solutions.

For example, a drug-screening project involves a large number of drug candidates to be tested for their interactions with a targeted protein, but only a small number of the candidates will be chosen for further analysis. Similarly, in an automated engineering design process, many models are systematically built and tested, and only a few pass to the prototyping stage.

Given enough time, all possible solutions can be evaluated to get the globally optimal result. However, due to time constraints, the user often has to take the best solution available. Usually, batch jobs are scheduled and results are obtained in an undetermined order. On average, the best result is obtained halfway through all the experiments, and in the worst case it is obtained at the end.

Therefore, getting better results out earlier would be highly desirable, even though all experiments may still need to be done for completeness.

These observations motivated the work presented in this paper. The main contributions include:

• A novel heuristic for scheduling parameter-sweep applications that minimizes the completion time of the workflows that provide better results.

• A workflow scheduling mechanism based on dynamically assigned priority levels.



The rest of the paper is organized as follows. Section 2 outlines the background. The proposed method is introduced in Section 3. Section 4 describes an implementation of the proposed algorithm. Section 5 discusses the target application. Section 6 describes the experimental evaluation method. Section 7 presents the experimental results and discussion. Related work is given in Section 8. Section 9 summarizes the paper and proposes some future work.

2. Workflow-based parameter-sweep scheduling

A parameter-sweep application is a set of multiple tasks, each of which is executed with a distinct set of parameters. Parameter-sweep tasks can be submitted together as a batch by using a parameter study tool such as Nimrod [1]. Often, each task is not a single program but a workflow consisting of interacting component programs that execute in a partial order determined by data dependency. A workflow can be represented as a directed acyclic graph (DAG). Conventional batch schedulers and parameter study tools in cluster environments do not handle a workflow as a single entity.

Most scheduling algorithms aim to minimize the makespan or minimize the average completion times. However, our objective is to minimize the completion time of the workflow that gives the best result. In order to achieve this objective, the ability to treat a workflow as a single entity must be added to the scheduler.

3. Best-Intermediate-Result-First scheduling

3.1. Problem statement

Given a set of individual instances of a DAG representation of a workflow, W = {w_i | i = 1..m}, let the set of application components in w_i be denoted by A_i = {a_ij | j = 1..n} and the set of computing nodes by C = {c_i | i = 1..k}. The scheduling problem is to map the application components in W onto elements of C such that w_x is given a higher priority than w_y if the output of w_x is evaluated as better than that of w_y.
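To make the notation concrete, the following minimal Python sketch (hypothetical names, not part of the original paper) shows the entities the scheduler manipulates: the workflow instances w_i, their application components a_ij, and the pool of computing nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One application component a_ij of a workflow w_i."""
    name: str                                       # e.g. "AutoDock"
    depends_on: list = field(default_factory=list)  # names of predecessor components (DAG edges)

@dataclass
class Workflow:
    """One workflow instance w_i of the parameter sweep."""
    wid: int
    components: list                 # the set A_i = {a_ij}
    priority: int = 0                # adjusted after intermediate evaluation
    intermediate_score: float = None # filled in at the evaluation point

# The computing nodes C = {c_1, ..., c_k}; here just identifiers.
nodes = [f"node{k}" for k in range(1, 5)]
```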

3.2. Mechanisms and algorithms

The ideal outcome is to have the workflows complete in the same order in which their final results are ranked. However, the achievable goal is to minimize the turnaround time of the workflows that give the best results. To achieve this, the scheduling algorithm is based on priorities that are dynamically adjusted according to intermediate data obtained at some stage in the workflow.

An application that can benefit from this technique must have at least one stage that produces an intermediate result that can be used to predict the quality of the final result. In addition, the intermediate result should be obtained and evaluated with relatively small computing and I/O costs. When a workflow reaches the point where the intermediate result is created, the intermediate result is evaluated and compared to the intermediate results of other workflows. For simplicity, we first consider the case in which a workflow has only one evaluation point. Therefore, a workflow is divided into two phases, namely before-evaluation and after-evaluation.

Figure 1: Workflow and intermediate evaluation

Figure 1 depicts a sample workflow with five stages, A, B, C, D and E. Stage D produces the intermediate result. When D is completed, the workflow reaches the evaluation point. The available result is passed to an application-specific program that extracts or calculates a scoring value. The score is passed to the evaluator program. The evaluator collects the scores from the workflows, sorts them, and sets the priority of the workflows accordingly.
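As a rough illustration of the evaluator's role (not the authors' actual code; names are hypothetical), the sketch below collects intermediate scores from workflows that have reached the evaluation point, ranks them, and assigns priorities so that better-scoring workflows are favored. It assumes a lower score (e.g. a more negative binding energy) is better.

```python
from types import SimpleNamespace

def rank_and_set_priorities(evaluated_workflows):
    """Rank workflows by intermediate score and set their priorities.

    Assumes each workflow object carries .intermediate_score (lower is better)
    and a .priority attribute that the underlying scheduler honors."""
    ranked = sorted(evaluated_workflows, key=lambda w: w.intermediate_score)
    for rank, wf in enumerate(ranked):
        # The best-ranked workflow gets the highest priority value; the exact
        # rank-to-priority mapping is an implementation choice.
        wf.priority = len(ranked) - rank
    return ranked

# Illustrative use: two workflows that have passed the evaluation point.
wfs = [SimpleNamespace(wid=1, intermediate_score=-5.2, priority=0),
       SimpleNamespace(wid=2, intermediate_score=-7.8, priority=0)]
rank_and_set_priorities(wfs)  # wid=2 (better score) ends up with the higher priority
```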

All stages that begin execution after the evaluation point are given the new priority. The stages that have already started are not affected. For example, stage E always runs in the after-evaluation phase because it runs after stage D. However, C can run in either the before-evaluation or the after-evaluation phase, depending on whether it starts before or after the evaluation point.

This algorithm is on-line: it does not require all workflows to be evaluated before ranking. Only the workflows that have passed the evaluation point are involved in the intermediate ranking process.

This raises a new problem. While the scheduler selects the top-ranked evaluated workflow to run, there are still workflows that have not been evaluated, and better results may come from them. The scheduler must balance between speeding up the evaluated workflows and looking for better results among new workflows.

Our solution is based on the use of two queues, namely the before-evaluation queue and the after-evaluation queue. Figure 2 shows the structure of the scheduling system. When a workflow is submitted, it is first put into the before-evaluation queue. After the workflow has executed past the evaluation point, it is moved to the after-evaluation queue. The before-evaluation queue is a FIFO queue, while the after-evaluation queue is a priority queue.

Figure 2: Workflow scheduling system (workflow instances described by workflow descriptions enter the workflow engine, which maintains the before-evaluation and after-evaluation queues and dispatches jobs to the batch job scheduler's queue)

When a machine is available, the scheduler first selects between the two queues based on an adjustable probability condition. Let P1 and P2 be the probabilities that the before-evaluation and after-evaluation queues are selected, respectively. If we want to explore more samples for better candidate solutions, P1 should be greater than P2. With more workflows being compared, the ranking is more accurate but the results will be delayed. If we want to get results from the evaluated workflows quickly, P2 should be greater. If P1 is 1 and P2 is 0, all workflows are evaluated and ranked before the next phase can begin. If P1 is 0 and P2 is 1, it becomes the conventional algorithm.

Our approach tries to balance both situations. When there are more unevaluated workflows than evaluated ones, especially at the beginning, more chances are given to evaluating the newcomers. When there are more evaluated candidates, more chances are given to the best candidates to complete.

Let N1 and N2 be the number of workflows in the before-evaluation queue and the after-evaluation queue, respectively. The scheduler draws a random number, R, between 1 and N1+N2. If R is less than or equal to N1, the before-evaluation queue is selected; otherwise, the after-evaluation queue is selected. Therefore, P1 is N1/(N1+N2) and P2 is N2/(N1+N2). The scheduler then takes a workflow from the selected queue.
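The queue-selection step can be summarized by the following sketch, assuming a FIFO deque for the before-evaluation queue and a list kept sorted by intermediate score (best first) standing in for the after-evaluation priority queue; this is illustrative only.

```python
import random
from collections import deque

def pick_next_workflow(before_q: deque, after_q: list):
    """Choose the next workflow to dispatch when a machine becomes free.

    before_q: FIFO queue of workflows that have not reached the evaluation point.
    after_q:  evaluated workflows, kept sorted best-first by intermediate score.
    The selection follows the paper: P1 = N1/(N1+N2), P2 = N2/(N1+N2)."""
    n1, n2 = len(before_q), len(after_q)
    if n1 + n2 == 0:
        return None
    r = random.randint(1, n1 + n2)
    if r <= n1:                    # probability N1/(N1+N2)
        return before_q.popleft()  # explore: start evaluating a new workflow
    return after_q.pop(0)          # exploit: advance the best evaluated workflow
```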

4. Implementation

To demonstrate the feasibility of the proposed technique, it was implemented using available tools in a cluster environment. We built a higher-level scheduler on top of Torque, a batch scheduler based on PBS, to handle workflow jobs and to provide workflow-wide operations. Torque's dependency feature is used for representing a DAG, and its priority feature is used for scheduling a workflow as a single entity. The proposed algorithm is implemented as described in Section 3. This system is also used as a simulation system for evaluation purposes.
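The paper does not list the exact commands, but a sketch of how a workflow could be expressed on top of Torque is shown below: job dependencies (`-W depend=afterok:...`) encode the DAG edges, the priority option (`-p`) tags every job of a workflow, and `qalter -p` raises or lowers the not-yet-started jobs after the intermediate evaluation. The script names and the dependency structure are assumptions for illustration.

```python
import subprocess

def qsub(script, depends_on=None, priority=0):
    """Submit a job to Torque/PBS and return the job id printed by qsub."""
    cmd = ["qsub", "-p", str(priority)]
    if depends_on:
        cmd += ["-W", "depend=afterok:" + ":".join(depends_on)]
    cmd.append(script)
    return subprocess.check_output(cmd, text=True).strip()

def submit_workflow(priority=0):
    """Submit one drug-screening workflow as a set of dependent jobs
    (hypothetical script names; dependency structure assumed from Figure 4)."""
    gpf   = qsub("mkgpf.sh", priority=priority)
    dpf   = qsub("mkdpf.sh", priority=priority)
    grid  = qsub("autogrid.sh", depends_on=[gpf], priority=priority)
    dock  = qsub("autodock.sh", depends_on=[grid, dpf], priority=priority)
    gauss = qsub("gaussian.sh", depends_on=[dock], priority=priority)
    return [gpf, dpf, grid, dock, gauss]

def set_workflow_priority(job_ids, new_priority):
    """After intermediate evaluation, adjust the priority of still-queued jobs."""
    for jid in job_ids:
        subprocess.call(["qalter", "-p", str(new_priority), jid])
```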

5. Target application

The proposed technique has been applied to high-throughput drug screening, the process of screening a database of ligands (drugs or small molecules) to find the most compatible drug for a particular protein. Generally, drug screening consists of two major steps, namely molecular docking and scoring [2]. The docking process predicts the ligand conformation within a targeted binding site. The scoring process evaluates the predicted conformations.

In our particular drug screening application, two major programs, AutoDock [3] and Gaussian [4], are used in complement. AutoDock is a docking program that prepares ligand-protein complexes. Its scoring scheme is based on a molecular mechanics calculation of free energy and binding energy. Gaussian uses a quantum mechanics calculation to provide another scoring scheme for selecting the most suitable binding modes provided by AutoDock.

Figure 4 shows the workflow of the drug screening process. A few small programs are involved in data preparation for AutoDock, namely mkgpf, mkdpf and AutoGrid. Another program is needed to extract the scoring result from the AutoDock output file.

Figure 4: Drug screening workflow

6. Experimental evaluation

The preliminary evaluation was performed with a small set of samples consisting of 24 ligands and one protein. The ligand and target protein data were real data obtained from the Computational Chemistry Lab. Twenty-four instances of the workflow were created by varying the ligands. Initially, each workflow instance was run on a PC with a 2.8 GHz Pentium IV processor in order to obtain result and timing data. The AutoDock and Gaussian programs are compute-intensive and produce small text files (a few hundred kilobytes). The results occupy a specifically labeled section of a few lines that can be quickly extracted with a simple Perl script.
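The authors used a simple Perl script for this extraction; purely as an illustration, a Python equivalent might look like the sketch below. It assumes the score of interest is the estimated free energy of binding reported in the AutoDock log; the exact label and layout of that line are an assumption and should be adapted to the actual output format.

```python
import re

# Assumed form of the AutoDock result line, e.g.
#   "Estimated Free Energy of Binding    =   -7.52 kcal/mol"
ENERGY_RE = re.compile(r"Estimated Free Energy of Binding\s*=\s*(-?\d+\.\d+)")

def extract_autodock_score(log_path):
    """Return the best (most negative) binding energy found in an AutoDock log."""
    best = None
    with open(log_path) as f:
        for line in f:
            m = ENERGY_RE.search(line)
            if m:
                energy = float(m.group(1))
                if best is None or energy < best:
                    best = energy
    return best
```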

To quickly start a pilot study, we used the workflow scheduler we had implemented and a small cluster with four compute nodes as a simulated environment. Each compute node was assigned only one job at a time. Application components in the workflows were replaced with sleep commands. The sleep times were set according to the run times of the component programs obtained from the real execution of each sample. The collected run times were scaled down by a factor of 30 in the simulated runs.
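As a small illustration of this setup, each component of a workflow can be replaced by a sleep command whose duration is the measured run time scaled down by a factor of 30 (hypothetical sketch):

```python
def stand_in_command(component_name, measured_runtime_s, scale=30):
    """Build a shell command that sleeps for the scaled-down run time,
    standing in for the real component in the simulated run."""
    return f"sleep {measured_runtime_s / scale:.1f}  # stand-in for {component_name}"

print(stand_in_command("AutoDock", 3600))  # -> "sleep 120.0  # stand-in for AutoDock"
```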

The proposed technique was compared against a conventional FIFO scheduling scheme. In that scheme, the workflow with the earliest arrival time was selected from the waiting queue, and once started it ran until it finished. Intermediate evaluation was performed but not used for scheduling.

Each workflow instance was selected at random and submitted to the simulator in sequence without delay. This step was repeated 50 times; in other words, 50 random orderings of the 24 workflows were generated for the evaluation. Each sequence was evaluated with both the proposed technique and the conventional technique.

7. Results and discussion

The practicality of the proposed algorithm depends on the correlation between the intermediate and final results. Each compound is given two ranking numbers between 1 and 24 according to the results from AutoDock and Gaussian respectively.

Figure 5: Intermediate and final ranking (final ranking versus intermediate ranking, both from 1 to 24)

The relation between the two ranking numbers is shown in Figure 5. The dashed line represents the ideal case. The difference between the two ranking numbers was 5.08 on average, and their correlation coefficient was 0.61. Therefore, there was indeed a correlation between them, even though it was not high.
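For reference, the correlation between the two rankings can be computed as a Pearson correlation over the rank numbers (equivalent to Spearman's rank correlation when the ranks are untied permutations of 1..n); a minimal sketch with made-up rankings, not the paper's data:

```python
import math

def rank_correlation(ranks_x, ranks_y):
    """Pearson correlation of two equal-length rank lists."""
    n = len(ranks_x)
    mean_x = sum(ranks_x) / n
    mean_y = sum(ranks_y) / n
    cov   = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks_x, ranks_y))
    var_x = sum((x - mean_x) ** 2 for x in ranks_x)
    var_y = sum((y - mean_y) ** 2 for y in ranks_y)
    return cov / math.sqrt(var_x * var_y)

# Illustrative only: identical rankings give 1.0, a reversed ranking gives -1.0.
print(rank_correlation([1, 2, 3, 4], [1, 2, 3, 4]))   # 1.0
print(rank_correlation([1, 2, 3, 4], [4, 3, 2, 1]))   # -1.0
```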

In addition, the average before-evaluation and after-evaluation times are 17.25% and 82.75% of the completion time, respectively.

Figure 6: Average workflow ranking of completed workflows with the conventional scheduling algorithm (average workflow ranking versus order of workflow completion, shown for ranking by intermediate result and ranking by final result)


Figure 6 shows the average rank of the workflows at each completion position on the conventional system. Ranks by both the intermediate results and the final results are shown. Since the order of completion was not affected by the results, the average rank at every completion position was close to 12.5 (the mean of the numbers from 1 to 24). The anomalies at both ends of the graph were due to the particular data set being used: a few workflows with the shortest completion times gave very low-ranked results, and a few workflows that took the longest time happened to be among the top ranked.

Figure 7: Improved average workflow ranking with the proposed technique (average workflow ranking versus order of workflow completion, shown for ranking by intermediate result and ranking by final result)

Figure 7 shows the relation between the order of completion and the average rank when our technique was applied. The correlation coefficient between the order of completion and the ranking by the intermediate results was 0.97, indicating that the algorithm performed very well. The correlation coefficient between the order of completion and the ranking by the final results was 0.66, due to the differences between the intermediate ranking and the final ranking mentioned above. In an ideal situation in which the final results agree with the intermediate results, our algorithm would perform even better.

Figure 8: Average start and finish times (average start/finish time in hr:min versus order of workflow completion, showing the original and new start and finish times)

Figure 8 shows the average start time and average finish time for each order of completion. The parallel lines in the middle are start and finish times of workflows on the conventional system. The bottom and the top lines are start and finish times on the new system.

At first glance, the new system produced output later than the conventional one. In about six hours the conventional system had finished three workflows while the new system had barely finished the first. However, looking more closely and in combination with the previous graphs, we can see the true benefit of the new system. With the proposed technique, workflows started to execute earlier than with the conventional technique. At the beginning there were more workflows in the before-evaluation phase than in the after-evaluation phase, so there was a higher probability of running before-evaluation workflows. The graphs show that most workflows started within 10 hours, and by that time most of them had passed the evaluation point.

Therefore, the first results of the new system came out later than in the conventional system. Also, since the workflows with lower intermediate ranks might be preempted by higher-ranked ones, their execution in the second phase was delayed and their completion times were extended.

In about 18 hours, halfway through the total time, the conventional system had finished 13 workflows while the new system had finished only 8. However, the average ranking of those 13 workflows was 13.1 and 13.4 for the intermediate and final rankings, respectively, whereas the average ranking of the 8 workflows was 6.2 and 10.4. That means the new technique made most of the good results come out before the bad ones.

Within 24 hours the new system had finished 12 workflows. All of them had an average intermediate ranking below 9, and six of them had an average final ranking below 12, i.e., in the better half of all workflows. By the same time the conventional system had finished 17 workflows, with average rankings of 12.9 and 13.2 for the intermediate and final rankings, respectively.

After 24 hours, the new system finished workflows more frequently than the conventional system. The last workflows finished at about the same time. That means the makespans of both systems were the same.

In the conventional system, the completion times were about the same for all completion positions. In the new system, the workflows that completed earlier had shorter completion times; the completion times became much longer after the first 10 workflows had completed.


8. Related work

Nimrod [1] is a parameter-sweep scheduling tool that has been available for a long time; it has also been used for a drug screening application [5]. In [6], priority-based algorithms for scheduling workflows with parameter-sweep tasks are proposed. That work focuses on workflows that have parameter-sweep tasks as workflow nodes, while we are interested in parameter-sweep applications consisting of many independent workflows.

Resource allocation strategies for user-directed parameter search are studied in [7]. In that work, the user interactively gives priorities to groups of jobs and resources are allocated accordingly. The experiments are done on a search application based on genetic algorithms. The jobs in that study are independent, whereas in our study they are parameter-sweep workflow jobs.

Most research on scheduling workflow and parameter-sweep applications [8, 9, 10, 11] addresses the problem of minimizing makespans by using various heuristics based on estimates of completion time and earliest start time. The heuristics include min-min, max-min, sufferage, and their variations.

9. Conclusions and future work

We have presented a novel strategy for scheduling workflow-based parameter-sweep applications onto clusters. It aims to make the workflow that gives the best result finish as soon as possible. The technique is based on the evaluation of intermediate results and dynamic priority adjustment. A preliminary evaluation of the technique on a drug screening application, in comparison with a conventional scheduling strategy, has shown promising results. We have also implemented the scheduling strategy on top of a batch job scheduler called Torque.

However, we have made many simplifying assumptions and therefore need more experiments to evaluate the applicability of the algorithm in general. First, it must be tested on a larger simulated environment with other applications, for example an application that uses a genetic algorithm, similar to the one in [7]. We also need to explore the effects of some parameters, such as the choice of P1 and P2, and the effectiveness of the algorithm over ranges of application characteristics, namely the correlation between the quality of intermediate results and of final results, the fraction of the computation required to obtain final results from intermediate results, and the cost of intermediate evaluation.

We plan to investigate the technique with on-line submission of a larger number of more complicated workflows on a larger cluster. We are also improving the scheduler to work on the Grid.

10. Acknowledgments

We would like to thank Nadtanet Nunthaboot and Koonwadee Rathanasak at the Computational Chemistry Lab, Chulalongkorn University for their help on drug screening. This work is supported by Chulalongkorn University and the Thai National Grid Project.

11. References

[1] D. Abramson, R. Sosic, J. Giddy and B. Hall, "Nimrod: A Tool for Performing Parametised Simulations using Distributed Workstations," The 4th IEEE Symposium on High Performance Distributed Computing, 1995.

[2] D. Kitchen, H. Decornez, J. R. Furr and J. Bajorath, "Docking and Scoring in Virtual Screening for Drug Discovery: Methods and Applications," Nature Reviews Drug Discovery, 3:935-949, 2004.

[3] G. M. Morris, D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. Belew and A. J. Olson, "Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function," Journal of Computational Chemistry, 19(14):1639-1662, 1998.

[4] Æ. Frisch and M. J. Frisch, "Gaussian 98 User's Reference," 2nd ed., Gaussian, Inc., Pittsburgh, 1998.

[5] R. Buyya, K. Branson, J. Giddy and D. Abramson, "The Virtual Laboratory: Enabling On Demand Drug Design with the Worldwide Grid," Concurrency and Computation: Practice and Experience, 15(1), 2003.

[6] T. Ma and R. Buyya, "Critical-Path and Priority based Algorithms for Scheduling Workflows with Parameter Sweep Tasks on Global Grids," The 17th International Symposium on Computer Architecture and High Performance Computing, 2005.

[7] M. Faerman, A. Birnbaum, H. Casanova and F. Berman, "Resource Allocation for Steerable Parallel Parameter Searches," Proceedings of the Grid Computing Workshop, Baltimore, Maryland, November 2002.

[8] A. Mandal, K. Kennedy, C. Koelbel, G. Marin, B. Liu and L. Johnsson, "Scheduling Strategies for Mapping Application Workflows onto the Grid," The 14th IEEE Symposium on High Performance Distributed Computing, 2005.

[9] H. Casanova, A. Legrand, D. Zagorodnov and F. Berman, "Heuristics for Scheduling Parameter Sweep Applications in Grid Environments," Proceedings of the 9th Heterogeneous Computing Workshop (HCW), May 2000.

[10] R. Sakellariou and H. Zhao, "A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems," The 13th International Heterogeneous Computing Workshop, 2004.

[11] A. Kamthe and S.Y. Lee, "A Stochastic Approach to Estimating Earliest Start Times of Nodes for Scheduling DAGs on Heterogeneous Distributed Computing Systems," Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), 2005.