A Steering Portal for Condor/DAGMAN


1

A Steering Portal for Condor/DAGMAN

Naoya Maruyama on behalf of Akiko Iino

Hidemoto Nakada, Satoshi Matsuoka
Tokyo Institute of Technology

2

Background

Common Grid usage scenario
  Zillions of batch jobs scheduled over a combination of private/public resources within a VO
Some jobs require steering during the workflow
  “Human decision required”
Most previous steering work focused on GUI-level interactivity
  Real-time, interactive steering of the application itself
  Does not meld well with batch jobs
  Needs significant application customization

3

Objectives and Contributions

Objectives
  A steering portal for workflow (DAGMAN) jobs with easy descriptions, without application, Condor, or DAGMAN modifications
Contributions
  A portal that allows steering with simple additions to DAGMAN scripts
  Confirmed low overhead with exemplar applications
  Quantitative assessment of the user steps required

4

Outline

Background
Motivating example
Required features of steering
Steering example
Overview and prototype implementation
Evaluation
Conclusion

5

Exemplar Application: Phylogenetic Tree Inference

Infer phylogenetic relationships between different species from their genomic sequences [Hasegawa & Shimodaira 04]

App characteristics
  Basically executes multiple parallel jobs in sequence => workflow of batch jobs
  But it is difficult to judge the termination condition of the application phases => needs human steering

[Figure label: Common Ancestor]

6

Phylogenetic Tree Inference Breakdown

Narrow down the candidate phylogenetic trees:
  Compute posterior probability: “MrBayes”
  Compute likelihood value: “PAML”
  Test: “CONSEL”
Hard to automate => batch jobs difficult

7

List of Applications in the Workflow

Job     | Description                   | Input                                  | Output             | Compute Time Required
MrBayes | Compute posterior probability | Initial topology                       | List of topologies | ~2 weeks on 24 high-end CPUs
PAML    | Compute likelihood value      | List of topologies                     | Likelihood values  | ~10 days on 26 high-end CPUs
CONSEL  | Test                          | List of topologies & likelihood values | Probability values | 1-2 hours on 1 CPU

8

The Actual Workflow

1. Execute MrBayes
2. Termination judgement
3. Manual input of new parameters
4. Post-process MrBayes
5. Execute PAML
6. Execute CONSEL

[Figure: workflow diagram in which steps 1 and 5 each appear as several parallel jobs, with a “Need Steering” annotation]

9

MrBayes Example and Problems

As a standalone app, MrBayes requests interactive input
  It is up to the user to judge computational convergence
  But it lacks an information display that allows good judgment (“Not on this screen!”)

Problems:
1. The user needs to periodically poll the screen and provide interactive input
2. The user must also look at output files from 1000 jobs!

10

MrBayes Example and Problems (2)

[Figure labels: Output file; Visualize; Decide on convergence; Decide on next parameter]

Problems:
3. Manual conversion to graphical display
4. Changing appropriate parameters


12

Steering portal features for batch workflows with interactivity elements

Pausing/resuming computation
  Progress the computation as far as possible until user input is absolutely needed
  Resume immediately after input
Allow flexible parameter modifications
  Various ways to specify parameters for output and input
  Various ways to notify users: interactive screen, email, etc.
  Various ways to observe parameters: various portal functions
  Various ways to modify parameters
    Even switching back and forth between your terminal and a cell phone 10,000 miles away!


14

Example (1): Job Submission

Standard Condor/DAGMAN job submission
  But includes steering functions in the job description

15

Example (2): User Notification

Various notification methods, incl. email
Displays the portal URL in the message
Works on various devices, incl. cell phones

16

Example (3): Steering Portal

Parameter input
Visualize current status
Continuation of the workflow
The portal generates steering web pages dynamically, depending on the workflow context


18

Overview of our Steering Portal

[Figure: architecture diagram. Labels: Workflow and steering description; Submission; DAGMAN/Condor; Individual job submissions; Condor pool; POST scripting features; Retry function; Steering Portal (web page generation and job control); User notification; Steering notification; Steering display; Steering input]

19

Overview of Steering Portal (2)

The user defines several steering components for the steering portal in a script, as listed below:
A) A set of applications in the workflow
B) Condor/DAGMan + steering workflow description
  A) Translator for converting output to input to continue the workflow
  B) Visualization program to display application output on the steering web page
  C) Application input/output specifications
  D) Parameters that require steering

The steering portal:
  Reads the above script
  Automatically generates the steering web page
  Interacts with DAGMAN to notify users (email, etc.) and take input from the web portal
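As an illustration of automatic page generation, the following is a minimal sketch, not the portal's actual code. `SteeringSpec`, `render_steering_page`, and the parameter name `generations` are hypothetical; the slides do not show the real steering description format.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class SteeringSpec:
    """Hypothetical, simplified stand-in for a steering description."""
    job_name: str
    visualization_url: str  # assumed location of the visualization output
    steered_params: dict = field(default_factory=dict)  # name -> current value


def render_steering_page(spec: SteeringSpec) -> str:
    """Build a minimal HTML form with one input per steered parameter."""
    rows = [
        f'<label>{escape(name)} '
        f'<input name="{escape(name)}" value="{escape(str(value))}"></label>'
        for name, value in spec.steered_params.items()
    ]
    return "\n".join([
        f"<h1>Steering: {escape(spec.job_name)}</h1>",
        f'<img src="{escape(spec.visualization_url)}" alt="current status">',
        '<form method="post">',
        *rows,
        '<button name="action" value="continue">Continue workflow</button>',
        "</form>",
    ])


page = render_steering_page(
    SteeringSpec("mrbayes", "plot.png", {"generations": 10000})
)
```

The page content follows from the description alone, so a different workflow node would yield a different form without any change to the application.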

20

Prototype Implementation

Coordination between DAGMAN and the Steering Portal
  Use the DAGMan POST scripting function to invoke the steering portal
  Use the DAGMan Retry function to resume workflow execution
Prototype implementation of the Steering Portal
  Interpretation of the steering descriptions embedded in the DAGMAN workflow
  Appropriate and multiple notifications and steering interfaces available
    Notification and interfaces currently selected according to the script
    Automated selection for the future
  Mail and messaging notification function with embedded services
  CGI web page generation onto the portal server using ssh
  Steering from anywhere, anytime (incl. cell phones and PDAs)
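One way the POST/Retry coordination could work is sketched below. It relies on the fact that DAGMan treats a non-zero POST script exit code as a node failure, which a `RETRY` entry then reruns. Everything else is an assumption for illustration: the decision file, its JSON layout, and the function names are hypothetical, and the prototype's real mechanism is not shown in the slides.

```python
import json
import sys
import time
from pathlib import Path
from typing import Optional

# Exit codes: 0 marks the node as succeeded; non-zero marks it as failed,
# so a RETRY entry for the node would run it again.
CONTINUE, RERUN, TIMEOUT = 0, 1, 2


def wait_for_decision(decision_file: Path, poll_seconds: float = 5.0,
                      max_polls: Optional[int] = None) -> Optional[dict]:
    """Poll until the (hypothetical) portal writes the user's decision as JSON."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if decision_file.exists():
            return json.loads(decision_file.read_text())
        time.sleep(poll_seconds)
        polls += 1
    return None


def post_script(decision_file: Path, params_file: Path,
                poll_seconds: float = 5.0,
                max_polls: Optional[int] = None) -> int:
    """Return the exit code a POST script could use to steer the workflow."""
    decision = wait_for_decision(decision_file, poll_seconds, max_polls)
    if decision is None:
        return TIMEOUT
    decision_file.unlink()  # consume the decision so the next retry waits afresh
    if decision.get("action") == "continue":
        return CONTINUE
    # Any other action: store the new parameters for the rerun, then fail the node.
    params_file.write_text(json.dumps(decision.get("params", {})))
    return RERUN


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(post_script(Path(sys.argv[1]), Path(sys.argv[2])))
```

In a DAG file, such a script would be attached to a node with a `SCRIPT POST` line, and the node would carry a `RETRY` count large enough for the expected number of steering rounds.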


22

Evaluation

Apply to sample applications (simple pi calculation and more complex phylogenetic tree example)
Evaluate the necessary “work steps”
Items of evaluation
A) Modification to the application program itself
B) Condor/DAGMan workflow description
C) Translator for converting output to input to continue the workflow
D) Visualization program to display application output on the steering web page
E) Application input/output specifications
F) Parameters that require steering
G) Modifications to the Condor job submit file

23

Sample Pi Program

Eval. Item | Details
A          | No modification to the original program
E          | Input: 4 inputs from stdin; Output: 3 number columns
F          | 2 inputs out of the 4 stdin inputs

Eval. Item | # Files | # Lines in Total
B          | 2       | 4
C          | 0       | 0
D          | 1       | 3
G          | 1       | 6

24

Phylogenetic Tree Program

Eval. Item | Details
A          | No modification to the original program
E          | Input: 1 setup file, 1 data file; Output: 2 files
F          | 1 parameter value

Eval. Item | # Files | # Lines in Total
B          | 3       | 6
C          | 1       | 40
D          | 1       | 16
G          | 20 (1)  | 180

(1) 20 nine-line files; only 1 line differs amongst them
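Because the 20 submit files differ in a single line, they could be generated from one template, as in the hedged sketch below. The template contents are an assumption: the executable name `mb`, the file names, and the choice of `arguments` as the differing line are hypothetical, since the slides do not show the actual submit files.

```python
# Hypothetical nine-line Condor submit description; only the arguments line varies.
SUBMIT_TEMPLATE = """\
universe = vanilla
executable = mb
arguments = {input_file}
output = mb.$(Cluster).out
error = mb.$(Cluster).err
log = mb.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue
"""


def make_submit_files(n: int = 20) -> dict:
    """Return a mapping of submit file name -> contents for n parallel jobs."""
    return {
        f"mb_{i:02d}.sub": SUBMIT_TEMPLATE.format(input_file=f"run_{i:02d}.nex")
        for i in range(1, n + 1)
    }


files = make_submit_files()
total_lines = sum(len(text.splitlines()) for text in files.values())
```

With 20 files of nine lines each, `total_lines` comes to 180, matching the count reported for item G, while the hand-written part stays at one nine-line template.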

25

Conclusion and Future Work

Conclusion
  Proposed a Steering Portal that allows interactive steering of batch-scheduled jobs in Condor/DAGMAN
  Created prototypes with flexible notification and visualization/steering features
  Applied to sample apps including Pi and phylogenetic trees
Future work
  Support and automatically select various interfaces
  Apply to other applications, esp. with larger workflows and more complex interactions
  Apply to other workflow engines

26

Contact info

Satoshi Matsuoka, matsu@is.titech.ac.jp, Tokyo Institute of Technology
