
Page 1: Pegasus and Condor

Pegasus and Condor

Gaurang Mehta, Ewa Deelman, Carl Kesselman, Karan Vahi

Center For Grid Technologies USC/ISI

Page 2: Pegasus and Condor

04/19/23 Condor-Week 2

PEGASUS

Pegasus – Planning for Execution in Grids

Pegasus is a configurable system that can plan, schedule, and execute complex workflows on the Grid.
– Algorithmic and AI-based techniques are used.

Pegasus takes an abstract workflow as input. The abstract workflow describes the transformations and data in terms of their logical names.

Pegasus then queries the Replica Location Service (RLS) for the existence of any materialized data. If any derived data already exists, it is reused and a workflow reduction is performed.

Page 3: Pegasus and Condor


Workflow Reduction

[Figure: in the original abstract workflow, execution nodes E1 and E2 derive f.b and f.c from the raw inputs f.a1 and f.a2, and E3 combines f.b and f.c to produce f.d. Because f.b and f.c already exist in RLS, E1 and E2 are pruned, leaving a reduced workflow of just E3 with inputs f.b and f.c and output f.d. Legend: execution nodes, transfer nodes, registration nodes.]
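The pruning rule illustrated above can be sketched in a few lines of Python. This is a hand-rolled illustration: the function and dictionary names are invented here and are not part of Pegasus, which implements reduction inside its own planner.

```python
# Sketch of Pegasus-style workflow reduction (illustrative only).
# A compute job is pruned when every file it derives is already
# registered in the Replica Location Service (RLS).

ABSTRACT_JOBS = {
    # job id -> logical input/output file names, as in the figure above
    "E1": {"inputs": {"f.a1"}, "outputs": {"f.b"}},
    "E2": {"inputs": {"f.a2"}, "outputs": {"f.c"}},
    "E3": {"inputs": {"f.b", "f.c"}, "outputs": {"f.d"}},
}

def reduce_workflow(jobs, materialized):
    """Drop every job whose outputs all exist already; the registered
    copies are then reused instead of being re-derived."""
    return {
        job_id: job
        for job_id, job in jobs.items()
        if not job["outputs"] <= materialized
    }

# f.b and f.c already exist in RLS, so E1 and E2 are pruned.
reduced = reduce_workflow(ABSTRACT_JOBS, {"f.b", "f.c"})
```

Note that in a deeper workflow the real reduction also cascades upward: if a final product already exists, the whole subtree that would have derived it becomes unnecessary. The one-pass rule above is only enough for this three-node example.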

Page 4: Pegasus and Condor


Pegasus (cont.)

Pegasus then locates physical instances of both components (transformations and data).
– Uses the Globus Replica Location Service (RLS) and the Transformation Catalog (TC).

It finds appropriate resources on which to execute.
– Via the Globus Monitoring and Discovery Service (MDS).

It adds stage-in jobs to transfer raw and materialized input files to the computation sites.

It adds stage-out jobs to transfer derived data to the user-selected storage location.
– Both input and output staging are done with Globus GridFTP.

It publishes newly derived data products for reuse.
– RLS, Chimera Virtual Data Catalog (VDC).
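The staging step can be sketched in the same illustrative style (function and node names are invented, not Pegasus's API). For the reduced workflow's single compute job it emits stage-in transfers, the job itself, a stage-out transfer, and a registration node:

```python
# Illustrative sketch of workflow modification: wrap a compute job with
# stage-in transfer nodes for inputs not already at the execution site,
# a stage-out transfer per derived output, and a registration node that
# publishes the outputs (e.g. into RLS) for later reuse.

def add_staging(job_id, inputs, outputs, files_at_site=frozenset()):
    """Return the final node sequence for one compute job."""
    nodes = []
    n = 0
    for f in sorted(inputs - files_at_site):
        n += 1
        nodes.append(("T%d" % n, "stage-in", f))   # transfer node
    nodes.append((job_id, "compute", None))        # execution node
    for f in sorted(outputs):
        n += 1
        nodes.append(("T%d" % n, "stage-out", f))  # transfer node
    nodes.append(("R1", "register", tuple(sorted(outputs))))
    return nodes

# E3 needs f.b and f.c staged in and f.d staged out and registered.
final_dag = add_staging("E3", {"f.b", "f.c"}, {"f.d"})
```

Run on the reduced workflow, this yields the node sequence T1, T2, E3, T3, R1 of the final DAG.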

Page 5: Pegasus and Condor


Workflow Modification

[Figure: the reduced workflow (E3 with inputs f.b and f.c and output f.d) becomes the final DAG by adding transfer nodes T1 and T2 to stage in f.b and f.c, transfer node T3 to stage out f.d, and registration node R1 to register the derived data. Legend: execution nodes, transfer nodes, registration nodes.]

Page 6: Pegasus and Condor


Pegasus (cont.)

Pegasus generates the concrete workflow in Condor DAGMan format and submits it to DAGMan/Condor-G for execution on the Grid.

These concrete DAGs carry the concrete locations of the data and the sites where the computation is to be performed.

Condor-G submits the jobs via Globus GRAM to remote schedulers running Condor, PBS, LSF, and Sun Grid Engine.

Pegasus is part of a software package distributed by GriPhyN called the Virtual Data System (VDS).

VDS 1.2.3 (Pegasus + Chimera) is currently included in the Virtual Data Toolkit (VDT) 1.1.13.
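To make the hand-off concrete, here is a rough sketch of what a generated DAGMan input file and one accompanying Condor-G submit description of that era might look like. All file names, the site name, and the executable path are invented for illustration, not taken from actual Pegasus output:

```
# Sketch of a DAGMan input file for the reduced workflow
JOB T1 T1.sub
JOB T2 T2.sub
JOB E3 E3.sub
PARENT T1 T2 CHILD E3

# E3.sub -- a Condor-G (globus universe) submit description
universe        = globus
globusscheduler = compute.example.org/jobmanager-pbs
executable      = /bin/analyze
arguments       = f.b f.c -o f.d
queue
```

DAGMan enforces the PARENT/CHILD ordering, while each .sub file tells Condor-G which remote GRAM jobmanager to route the job to.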

Page 7: Pegasus and Condor


Workflow Construction

[Figure: the user writes VDL, which Chimera records in the Virtual Data Catalog (VDC) and converts, via VDLX, into an abstract workflow (DAX); a user-supplied DAX may also be given directly. Pegasus consumes the DAX, consulting the Transformation Catalog (TC), MDS, and RLS, and produces the DAG/submit files that DAGMan/Condor-G executes on the Grid.]

Page 8: Pegasus and Condor


Current System

[Figure: the original abstract workflow is fed to the current Pegasus, i.e. Pegasus(Abstract Workflow); the resulting concrete workflow goes to DAGMan(CW) for workflow execution.]

Page 9: Pegasus and Condor


Deferred Planning in Pegasus

The current Pegasus implementation plans the entire workflow before submitting it for execution (full-ahead planning).

Grids are very dynamic, and resources come and go quite often.

We are currently adding support for deferred planning, in which only a part of the workflow is planned and executed at a time:

– Chop the abstract workflow into partitions.
– Plan one partition and submit it to DAGMan/Condor-G.
– The last job in the partition calls Pegasus again, which plans the next partition, and so on.
– Initial partitions will be level-based, derived by breadth-first search.
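The level-based partitioning can be sketched as follows (illustrative Python, not Pegasus code; here `deps` maps each job to the set of jobs it depends on):

```python
from collections import deque

# Sketch of level-based partitioning of an abstract workflow by
# breadth-first search, as proposed for deferred planning.

def partition_by_level(deps):
    """Assign each job the level 1 + max(parent levels), roots at
    level 0, then return one partition (set of jobs) per level."""
    level = {}
    queue = deque(job for job, parents in deps.items() if not parents)
    for job in queue:                       # roots form level 0
        level[job] = 0
    while queue:
        job = queue.popleft()
        for child, parents in deps.items():
            if (job in parents and child not in level
                    and all(p in level for p in parents)):
                level[child] = 1 + max(level[p] for p in parents)
                queue.append(child)
    partitions = {}
    for job, lvl in level.items():
        partitions.setdefault(lvl, set()).add(job)
    return [partitions[lvl] for lvl in sorted(partitions)]

# The three-node example: E1 and E2 are roots, E3 depends on both.
parts = partition_by_level({"E1": set(), "E2": set(), "E3": {"E1", "E2"}})
```

Each returned set would become one partial workflow, planned and submitted in turn.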

Page 10: Pegasus and Condor


Incremental Refinement

[Figure: the abstract workflow is partitioned into partial workflows PW A, PW B, and PW C; under a particular partitioning, the partial workflows themselves form a new abstract workflow.]

Page 11: Pegasus and Condor


Meta-DAGMan

[Figure: a meta-DAG in which Pegasus(A) feeds DAGMan(Su(A)), followed by Pegasus(B) feeding DAGMan(Su(B)) and Pegasus(C) feeding DAGMan(Su(C)).]

Pegasus(X) – Pegasus generates the concrete workflow and the submit files for X, i.e. Su(X).

DAGMan(Su(X)) – DAGMan executes the concrete workflow for X.
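This structure could itself be written as a DAGMan input file, a "DAG of DAGs". A rough, invented sketch (file names are made up, and the actual partition ordering depends on the chosen partitioning):

```
# Sketch of a meta-DAG: plan each partition, then run its sub-workflow
JOB PlanA plan_A.sub      # runs Pegasus on partition A, producing Su(A)
JOB RunA  dagman_SuA.sub  # runs DAGMan over Su(A)
JOB PlanB plan_B.sub
JOB RunB  dagman_SuB.sub
JOB PlanC plan_C.sub
JOB RunC  dagman_SuC.sub
PARENT PlanA CHILD RunA
PARENT RunA  CHILD PlanB PlanC
PARENT PlanB CHILD RunB
PARENT PlanC CHILD RunC
```

Because each partition is planned only when its predecessor has run, site and replica decisions reflect the state of the Grid at execution time rather than at submission time.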

Page 12: Pegasus and Condor


Current Condor Technologies Used

DAGMan to manage the dependencies in the acyclic workflow.
– Provides support for resuming a failed workflow using the rescue DAG generated by DAGMan.

Condor-G to submit jobs to the Grid (Globus jobmanager).
– Jobs are submitted using Globus GRAM, and stdout/stdin/stderr are streamed back using Globus GASS.

Condor as a scheduler to harness idle CPU cycles on existing desktops.
– ISI has a small 36-node Condor pool consisting primarily of Linux and Solaris machines.

Page 13: Pegasus and Condor


Future Condor Technologies to Be Integrated

NeST
– We are looking at integrating support for NeST, which allows disk-space reservation on remote sites.

Stork (data placement scheduler)
– Supports multiple transfer protocols (ftp, http, nest/chirp, gsiftp, srb, file).
– Reliably transfers your files across the Grid.

Page 14: Pegasus and Condor


Applications Using Pegasus and Condor DAGMan

GriPhyN experiments
– Laser Interferometer Gravitational-Wave Observatory (LIGO) (Caltech/UWM)
– ATLAS (U. of Chicago)
– SDSS (Fermilab)
– Also iVDGL/Grid3

National Virtual Observatory and NASA
– Montage

Biology
– BLAST (ANL, PDQ-funded)

Neuroscience
– Tomography for Telescience (SDSC, NIH-funded)

Page 15: Pegasus and Condor


A small Montage workflow (1202 nodes)

Page 16: Pegasus and Condor


Pegasus Acknowledgements

Ewa Deelman, Carl Kesselman, Gaurang Mehta, Karan Vahi, Mei-Hui Su, Saurabh Khurana, Sonal Patil, Gurmeet Singh (Center for Grid Technologies, ISI)

James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)

Collaboration with Miron Livny and the Condor team (UW Madison)

Collaboration with Mike Wilde, Jens Voeckler (U. of Chicago) – Chimera

Research funded as part of the NSF GriPhyN, NVO, and SCEC projects and the EU-funded GridLab.

For more information:
– http://pegasus.isi.edu
– http://www.griphyn.edu/workspace/vds
– Contacts: deelman, gmehta, vahi @isi.edu