
A Steering Portal for Condor/DAGMAN


Page 1: A Steering Portal for Condor/DAGMAN


A Steering Portal for Condor/DAGMAN

Naoya Maruyama on behalf of Akiko Iino

Hidemoto Nakada, Satoshi Matsuoka
Tokyo Institute of Technology

Page 2: A Steering Portal for Condor/DAGMAN

Background

- Common Grid usage scenario
  - Zillions of batch jobs scheduled over a combination of private/public resources within a VO
  - Some jobs require steering during the workflow: "human decision required"
- Most previous steering work focused on GUI-level interactivity
  - Real-time, interactive steering of the application itself
  - Does not meld well with batch jobs
  - Needs significant application customization

Page 3: A Steering Portal for Condor/DAGMAN

Objectives and Contributions

- Objectives
  - A steering portal for workflow (DAGMAN) jobs with easy descriptions, without modifying the application, Condor, or DAGMAN
- Contributions
  - A portal that allows steering with simple additions to DAGMAN scripts
  - Confirmed low overhead with exemplar applications
  - Quantitative assessment of the user steps required

Page 4: A Steering Portal for Condor/DAGMAN

Outline

- Background
- Motivating example
- Required features of steering
- Steering example
- Overview and prototype implementation
- Evaluation
- Conclusion

Page 5: A Steering Portal for Condor/DAGMAN

Exemplar Application: Phylogenetic Tree Inference

- Infer phylogenetic relationships between different species from their genomic sequences [Hasegawa&Shimodaira04]
- Application characteristics
  - Basically executes multiple parallel jobs in sequence => workflow of batch jobs
  - But the termination condition of the application phases is difficult to judge => needs human steering

[Figure: phylogenetic tree; label: Common Ancestor]

Page 6: A Steering Portal for Condor/DAGMAN

Phylogenetic Tree Inference Breakdown

1. Compute posterior probability: "MrBayes"
2. Compute likelihood value: "PAML"
3. Test: "CONSEL"

Narrowing down the candidate phylogenetic trees is hard to automate => difficult to run as batch jobs

Page 7: A Steering Portal for Condor/DAGMAN

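List of Applications in the Workflow

| Job     | Description                   | Input                                  | Output             | Compute Time Required        |
|---------|-------------------------------|----------------------------------------|--------------------|------------------------------|
| MrBayes | Compute posterior probability | Initial topology                       | List of topologies | ~2 weeks on 24 high-end CPUs |
| PAML    | Compute likelihood value      | List of topologies                     | Likelihood values  | ~10 days on 26 high-end CPUs |
| CONSEL  | Test                          | List of topologies & likelihood values | Probability values | 1~2 hours on 1 CPU           |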

Page 8: A Steering Portal for Condor/DAGMAN

The Actual Workflow

1. Execute MrBayes
2. Termination judgement
3. Manual input of new parameters
4. Post-process MrBayes
5. Execute PAML
6. Execute CONSEL

[Figure: workflow diagram with multiple parallel nodes for steps 1 and 5 and a "Need Steering" label]
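The six steps above amount to a loop around MrBayes followed by a fixed tail. The following is only a rough sketch of that control flow, not the authors' code: `run_workflow`, the `run_mrbayes` and `ask_user` callbacks, and the `generations` parameter used in the usage note are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def run_workflow(
    run_mrbayes: Callable[[Params], str],
    ask_user: Callable[[str], Dict],
    params: Params,
    max_rounds: int = 10,
) -> List[str]:
    """Repeat step 1 until the user accepts convergence, then run steps 4-6."""
    trace: List[str] = []
    for round_no in range(1, max_rounds + 1):
        output = run_mrbayes(params)                 # step 1: batch computation
        trace.append(f"mrbayes round {round_no}")
        decision = ask_user(output)                  # step 2: human judgement
        if decision["converged"]:
            break
        # step 3: merge the manually entered parameters into the next run
        params = {**params, **decision["new_params"]}
    else:
        raise RuntimeError("no convergence decision within max_rounds")
    trace += ["post-process", "paml", "consel"]      # steps 4-6, no steering
    return trace
```

A caller would pass a function that submits the MrBayes jobs and one that blocks until the user answers, for example `run_workflow(submit_jobs, wait_for_portal, {"generations": 1000})`. Only steps 2 and 3 need a human, which is why they are isolated behind a single callback.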

Page 9: A Steering Portal for Condor/DAGMAN

MrBayes Example and Problems

- As a standalone application, MrBayes requests interactive input
  - It is up to the user to judge computational convergence
  - But the screen lacks the information needed for a good judgment ("Not on this screen!")

Problems:

1. The user needs to periodically poll the screen and make interactive input
2. The user also has to look at output files from 1000 jobs!

Page 10: A Steering Portal for Condor/DAGMAN

MrBayes Examples and Problems (2)

[Figure labels: Output file; Visualize; Decide on convergence; Decide on next parameter]

Problems:

3. Manual conversion to graphical display
4. Changing the appropriate parameters


Page 12: A Steering Portal for Condor/DAGMAN

Steering portal features for batch workflows with interactivity elements

- Pausing/resuming computation
  - Progress computation as much as possible until user input is absolutely needed
  - Resume immediately after input
- Allow flexible parameter modifications
  - Various ways to specify parameters for output and input
  - Various ways to notify users: interactive screen, email, etc.
  - Various ways of parameter observation: various portal functions
  - Various ways to modify parameters
  - Even switching back and forth between your terminal and a cell phone 10,000 miles away!


Page 14: A Steering Portal for Condor/DAGMAN

Example (1): Job submission

- Standard Condor/DAGMAN job submission
- But includes steering functions in the job description

Page 15: A Steering Portal for Condor/DAGMAN

Example (2): User Notification

- Various notification methods, incl. email
- Displays the portal URL in the message
- Works on various devices, incl. cell phones
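An email notification that carries the portal URL could look roughly like this. It is a minimal sketch using Python's standard `email` package; the sender address, subject format, and example URL are assumptions, since the slides do not show the actual message.

```python
from email.message import EmailMessage


def build_notification(to_addr: str, job: str, portal_url: str) -> EmailMessage:
    """Build a plain-text mail that points the user at the steering page."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = "steering-portal@example.org"      # hypothetical sender
    msg["Subject"] = f"[steering] input needed for {job}"
    # Plain text keeps the message readable on small devices such as phones.
    msg.set_content(
        f"Workflow node {job} is waiting for your decision.\n"
        f"Open the steering page: {portal_url}\n"
    )
    return msg
```

The resulting message can be handed to `smtplib.SMTP(...).send_message(msg)`. Other channels (instant messaging, for instance) would need only a different delivery step for the same text.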

Page 16: A Steering Portal for Condor/DAGMAN

Example (3): Steering Portal

- Parameter input
- Visualize current status
- Continuing of workflow

The portal generates steering web pages dynamically depending on the workflow context


Page 18: A Steering Portal for Condor/DAGMAN

Overview of our Steering Portal

[Figure: architecture diagram. Components: Workflow and Steering description; DAGMAN/Condor (POST scripting features, Retry function); Condor Pool (individual job submissions); Steering Portal (user notification, web page generation and job control). Labeled interactions: submission, steering-notification, steering-display, steering-input.]

Page 19: A Steering Portal for Condor/DAGMAN

Overview of Steering Portal (2)

The user defines several steering components for the steering portal in a script, as follows:

A) A set of applications in the workflow
B) Condor DAGMan + steering workflow description
   A) Translator for converting output to input to continue the workflow
   B) Visualization program to display application output on the steering web page
   C) Application input/output specifications
   D) Parameters that require steering

The steering portal:

- Reads the above script
- Automatically generates the steering web page
- Interacts with DAGMAN to notify users (email, etc.) and takes input from the web portal
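To make the "read the script, generate the page" step concrete, here is a small sketch under stated assumptions. The slides do not give the syntax of the steering description, so `SteeringSpec` and its field names are hypothetical stand-ins for the listed components, and `render_form` is an illustrative generator, not the portal's real one.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class SteeringSpec:
    """Hypothetical in-memory form of one node's steering description."""
    job: str
    translator: str            # program converting output to the next input
    visualizer: str            # program rendering output for the web page
    outputs: List[str]         # application output files to display
    steered_params: List[str] = field(default_factory=list)


def render_form(spec: SteeringSpec, action_url: str) -> str:
    """Generate one input field per steered parameter plus a continue button."""
    rows = "\n".join(
        f'  <label>{escape(p)} <input name="{escape(p)}"></label>'
        for p in spec.steered_params
    )
    return (
        f'<form method="post" action="{escape(action_url)}">\n'
        f"  <h1>Steering: {escape(spec.job)}</h1>\n"
        f"{rows}\n"
        '  <button name="decision" value="continue">Continue workflow</button>\n'
        "</form>"
    )
```

Because the page is derived from the description rather than written by hand, adding a steered parameter only requires one more entry in the description.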

Page 20: A Steering Portal for Condor/DAGMAN

Prototype Implementation

- Coordination between DAGMAN and the Steering Portal
  - Use the DAGMan POST scripting function to invoke the steering portal
  - Use the DAGMan Retry function to resume workflow execution
- Prototype implementation of the Steering Portal
  - Interpretation of the steering descriptions embedded in the DAGMAN workflow
  - Appropriate and multiple notifications and steering interfaces available
    - Notification and interfaces currently selected according to the script
    - Automated selection for the future
  - Mail and messaging notification function with embedded services
  - CGI web page generation onto the portal server using ssh
  - Steering from anywhere, anytime (incl. cell phones and PDAs)
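The POST-script and Retry coordination can be illustrated by generating a small DAG file. `JOB`, `SCRIPT POST`, `RETRY`, `PARENT ... CHILD`, and the `$RETURN` macro are standard DAGMan syntax; the node names, the `.sub` file naming, and the `steer_post.sh` script are hypothetical. The sketch also assumes that the POST script exits non-zero when the user asks for another round, so that DAGMan's retry re-runs the node; the slides do not spell out this mapping.

```python
from typing import List


def build_dag(steered: str, downstream: List[str], post_script: str, retries: int) -> str:
    """Emit a DAG whose first node is steered via a POST script and RETRY."""
    lines = [
        f"JOB {steered} {steered}.sub",
        # The POST script runs after the job and receives its exit code.
        f"SCRIPT POST {steered} {post_script} $RETURN",
        # A failing POST script fails the node; RETRY re-runs it up to N times.
        f"RETRY {steered} {retries}",
    ]
    prev = steered
    for name in downstream:
        lines.append(f"JOB {name} {name}.sub")
        lines.append(f"PARENT {prev} CHILD {name}")   # simple linear chain
        prev = name
    return "\n".join(lines) + "\n"
```

For the phylogenetic workflow, `build_dag("mrbayes", ["paml", "consel"], "steer_post.sh", 20)` yields a file that can be passed to `condor_submit_dag`. Neither Condor nor DAGMan needs modification because steering lives entirely in the POST script.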


Page 22: A Steering Portal for Condor/DAGMAN

Evaluation

- Apply to sample applications (a simple pi calculation and the more complex phylogenetic tree example)
- Evaluate the necessary "work steps"
- Items of evaluation
  A) Modification to the application program itself
  B) Condor DAGMan workflow description
  C) Translator for converting output to input to continue the workflow
  D) Visualization program to display application output on the steering web page
  E) Application input/output specifications
  F) Parameters that require steering
  G) Modifications to the Condor job submit file

Page 23: A Steering Portal for Condor/DAGMAN

Sample Pi Program

- A: No modification to the original program
- E: Input: 4 inputs from stdin; Output: 3 number columns
- F: 2 inputs out of the 4 from stdin

| Eval. Item | # Files | # Lines in Total |
|------------|---------|------------------|
| B          | 2       | 4                |
| C          | 0       | 0                |
| D          | 1       | 3                |
| G          | 1       | 6                |

Page 24: A Steering Portal for Condor/DAGMAN

Phylogenetic Tree Program

- A: No modification to the original program
- E: Input: 1 setup file, 1 data file; Output: 2 files
- F: 1 parameter value

| Eval. Item | # Files | # Lines in Total |
|------------|---------|------------------|
| B          | 3       | 6                |
| C          | 1       | 40               |
| D          | 1       | 16               |
| G          | 20 (1)  | 180              |

(1) 20 9-line files; only 1 line differs amongst them
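Since the 20 submit files differ in a single line, they lend themselves to generation from one template. The sketch below is illustrative only: the slides do not show the files, so the template contents, the file names, and the choice of `input` as the differing line are assumptions. The submit commands themselves are ordinary Condor ones.

```python
from typing import Dict

# Hypothetical 9-line submit description; only the `input` line varies.
TEMPLATE = """\
universe = vanilla
executable = mrbayes
input = {input_file}
output = mb.$(Cluster).out
error = mb.$(Cluster).err
log = mb.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue
"""


def make_submit_files(count: int) -> Dict[str, str]:
    """Return a mapping from file name to submit-file text for `count` runs."""
    files: Dict[str, str] = {}
    for i in range(1, count + 1):
        files[f"mb_{i:02d}.sub"] = TEMPLATE.format(input_file=f"run_{i:02d}.nex")
    return files
```

Calling `make_submit_files(20)` gives 20 files of 9 lines each, 180 lines in total, so the effective user effort for item G is closer to one 9-line template than to 180 hand-written lines.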

Page 25: A Steering Portal for Condor/DAGMAN

Conclusion and Future Work

- Conclusion
  - Proposed a Steering Portal that allows interactive steering of batch-scheduled jobs in Condor/DAGMAN
  - Created prototypes with flexible notification and visualization/steering features
  - Applied to sample apps including Pi and phylogenetic trees
- Future work
  - Support and automatically select various interfaces
  - Apply to other applications, esp. with larger workflows and more complex interactions
  - Apply to other workflow engines

Page 26: A Steering Portal for Condor/DAGMAN

Contact info

Satoshi Matsuoka, [email protected], Tokyo Institute of Technology