Pegasus and Condor
Gaurang Mehta, Ewa Deelman, Carl Kesselman, Karan Vahi
Center For Grid Technologies USC/ISI
04/19/23 Condor-Week 2
PEGASUS
Pegasus – Planning for Execution in Grids
Pegasus is a configurable system that can plan, schedule and execute complex workflows on the Grid.
– Algorithmic and AI-based techniques are used.
Pegasus takes an abstract workflow as input. The abstract workflow describes the transformations and data in terms of their logical names.
It then queries the Replica Location Service (RLS) for the existence of any materialized data. If any derived data exists, it is reused and the workflow is reduced accordingly.
Workflow Reduction
[Figure: the original abstract workflow (jobs E1, E2, E3; files f.a1, f.a2, f.b, f.c, f.d) next to the reduced workflow (job E3; files f.b, f.c, f.d). f.b and f.c exist in RLS. Legend: execution nodes, transfer nodes, registration nodes.]
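The reduction step can be sketched as a backward traversal from the requested output. This is a minimal sketch under stated assumptions, not the Pegasus implementation: the wiring of jobs to files (E1 turns f.a1 into f.b, E2 turns f.a2 into f.c, E3 turns f.b and f.c into f.d) is read off the figure labels, and `reduce_workflow`, `abstract` and the plain set standing in for RLS are hypothetical names.

```python
from typing import Dict, Iterable, List, Set, Tuple

# job name -> (input files, output files); the wiring is an assumption
Workflow = Dict[str, Tuple[List[str], List[str]]]


def reduce_workflow(jobs: Workflow, targets: Iterable[str], existing: Set[str]) -> Set[str]:
    """Return the jobs that still have to run to produce `targets`."""
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    needed: Set[str] = set()
    seen: Set[str] = set()
    pending = list(targets)
    while pending:
        f = pending.pop()
        if f in seen:
            continue
        seen.add(f)
        if f in existing:
            continue  # already materialized: reuse it, do not trace further back
        job = producer.get(f)
        if job is None:
            continue  # raw input with no producing job
        if job not in needed:
            needed.add(job)
            pending.extend(jobs[job][0])  # the job's inputs are now needed too
    return needed


abstract: Workflow = {
    "E1": (["f.a1"], ["f.b"]),
    "E2": (["f.a2"], ["f.c"]),
    "E3": (["f.b", "f.c"], ["f.d"]),
}
rls = {"f.b", "f.c"}  # stand-in for a Replica Location Service lookup
print(sorted(reduce_workflow(abstract, ["f.d"], rls)))  # prints ['E3']
```

With an empty replica set the same call returns all three jobs, which is the unreduced workflow.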
Pegasus (Cont)
It then locates physical locations for both components (transformations and data).
– Uses the Globus Replica Location Service (RLS) and the Transformation Catalog (TC).
Finds appropriate resources on which to execute.
– Via the Globus Monitoring and Discovery Service (MDS).
Adds stage-in jobs to transfer raw and materialized input files to the computation sites.
Adds stage-out jobs to transfer derived data to the user-selected storage location.
– Both input and output staging are done using Globus GridFTP.
Publishes newly derived data products for reuse.
– RLS, Chimera virtual data catalog (VDC)
Workflow Modification
[Figure: the reduced workflow (job E3; files f.b, f.c, f.d) next to the final DAG (nodes T1, T2, E3, T3, R1). Legend: execution nodes, transfer nodes, registration nodes.]
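How the final DAG grows out of the reduced workflow can be sketched as follows. The node roles are an assumption based on the legend and the node names: T1 and T2 stage in f.b and f.c, T3 stages out f.d, and R1 registers f.d. `add_staging` is a hypothetical helper, not a Pegasus function.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (parent, child)


def add_staging(job: str, inputs: List[str], outputs: List[str]) -> List[Edge]:
    """Wrap one execution node with transfer (T) and registration (R) nodes."""
    edges: List[Edge] = []
    t = 0
    for _ in inputs:
        t += 1
        edges.append((f"T{t}", job))  # stage-in must finish before the job runs
    for r, _ in enumerate(outputs, start=1):
        t += 1
        edges.append((job, f"T{t}"))      # stage-out runs after the job
        edges.append((f"T{t}", f"R{r}"))  # register the output once it is stored
    return edges


final_dag = add_staging("E3", ["f.b", "f.c"], ["f.d"])
for parent, child in final_dag:
    print(parent, "->", child)
```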
Pegasus (Cont)
Pegasus generates the concrete workflow in Condor DAGMan format and submits it to DAGMan/Condor-G for execution on the Grid.
These concrete DAGs contain the concrete location of the data and the site where the computation is to be performed.
Condor-G submits these jobs via Globus GRAM to remote schedulers running Condor, PBS, LSF and Sun Grid Engine.
Pegasus is part of a software package distributed by GriPhyN called the Virtual Data System (VDS).
VDS-1.2.3 (Pegasus + Chimera) is currently included in the Virtual Data Toolkit (VDT) 1.1.13.
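To make the DAGMan format concrete, the sketch below renders a list of dependencies as DAGMan input text using the `JOB` and `PARENT ... CHILD` keywords. The node names, the `<node>.sub` submit-file naming and the `to_dagman` helper are assumptions for illustration; the files Pegasus actually writes contain more detail.

```python
from typing import List, Tuple


def to_dagman(edges: List[Tuple[str, str]]) -> str:
    """Render (parent, child) edges as DAGMan input text."""
    nodes: List[str] = []
    for parent, child in edges:
        for name in (parent, child):
            if name not in nodes:
                nodes.append(name)  # keep first-seen order for readability
    lines = [f"JOB {name} {name}.sub" for name in nodes]  # hypothetical submit files
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


dag_text = to_dagman([("T1", "E3"), ("T2", "E3"), ("E3", "T3"), ("T3", "R1")])
print(dag_text)
```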
Workflow Construction
[Figure: workflow construction. Components: Chimera, VDC, Pegasus, TC, MDS, RLS, DAGMan/Condor-G, Grid. Artifacts passed between them: VDL, VDLX, DAX (or a user-supplied DAX), DAG/submit files.]
Current System
[Figure: current Pegasus. Original abstract workflow -> Pegasus(abstract workflow) -> concrete workflow (CW) -> DAGMan(CW) -> workflow execution.]
Deferred Planning in Pegasus
The current Pegasus implementation plans the entire workflow before submitting it for execution (full-ahead planning).
Grids are very dynamic, and resources come and go frequently.
We are currently adding support for deferred planning, in which only a part of the workflow is planned and executed at a time.
Chop the abstract workflow into partitions.
Plan one partition and submit it to DAGMan/Condor-G.
The last job in the partition calls Pegasus again to plan the next partition, and so on.
Initial partitions will be level-based, using a breadth-first search.
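A level-based partitioning can be sketched with a breadth-first pass over the DAG. This is one plausible reading of "level-based", not the Pegasus partitioner: a job's level is taken to be the length of the longest path from any root, so every job lands in a later partition than all of its parents. `level_partitions` and the example job names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def level_partitions(nodes: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Group the nodes of an acyclic workflow by level (longest distance from a root)."""
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    roots = [n for n in nodes if indegree[n] == 0]
    level: Dict[str, int] = {n: 0 for n in roots}
    queue = deque(roots)
    while queue:  # breadth-first; a node is queued once all its parents are done
        node = queue.popleft()
        for child in children[node]:
            level[child] = max(level.get(child, 0), level[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    partitions: Dict[int, List[str]] = {}
    for n in nodes:  # assumes the input is acyclic, so every node has a level
        partitions.setdefault(level[n], []).append(n)
    return [partitions[lvl] for lvl in sorted(partitions)]


diamond = level_partitions(["A", "B", "C", "D"],
                           [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
print(diamond)  # prints [['A'], ['B', 'C'], ['D']]
```

Each inner list would then be planned and submitted as one partition, in order.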
Incremental Refinement
Partition the abstract workflow into partial workflows.
[Figure: a particular partitioning of the abstract workflow into partial workflows PW A, PW B and PW C, shown alongside the new abstract workflow.]
Meta-DAGMan
[Figure: meta-workflow with nodes Pegasus(A), Pegasus(B), Pegasus(C), their outputs Su(A), Su(B), Su(C), and nodes DAGMan(Su(A)), DAGMan(Su(B)), DAGMan(Su(C)).]
Pegasus(X): Pegasus generates the concrete workflow and the submit files for partition X; the result is Su(X).
DAGMan(Su(X)): DAGMan executes the concrete workflow for X.
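The meta-workflow can be sketched as a DAG that alternates planning and execution nodes. The edge structure here is an assumption drawn from the deferred-planning description: each partition is planned immediately before it runs, and a partition is planned only after the partitions it depends on have executed. `meta_dag` and the chain A, B, C are hypothetical.

```python
from typing import Dict, List, Tuple


def meta_dag(partition_parents: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Build (parent, child) edges of the meta-workflow.

    `partition_parents` maps each partition to the partitions it depends on.
    """
    edges: List[Tuple[str, str]] = []
    for part, parents in partition_parents.items():
        plan = f"Pegasus({part})"
        run = f"DAGMan(Su({part}))"
        for parent in parents:
            # plan this partition only after its parent partitions have executed
            edges.append((f"DAGMan(Su({parent}))", plan))
        edges.append((plan, run))  # execute right after planning
    return edges


chain = meta_dag({"A": [], "B": ["A"], "C": ["B"]})
for parent, child in chain:
    print(parent, "->", child)
```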
Current Condor Technologies Used
DAGMan to manage the dependencies in the acyclic workflow.
– Provides support to resume a failed workflow using the rescue DAG generated by DAGMan.
Condor-G to submit jobs to the grid (globus-jobmanager).
– Jobs are submitted using Globus GRAM, and stdout/stdin/stderr are streamed back using Globus GASS.
Condor as a scheduler to harness idle CPU cycles on existing desktops.
– ISI has a small 36-node Condor pool consisting primarily of Linux and Solaris machines.
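The idea behind a rescue DAG can be sketched as rewriting the DAG with finished nodes marked `DONE`, so that a resubmission skips them. This is a rough illustration only: the node names, the `<node>.sub` file naming and the `rescue_dag` helper are assumptions, and the exact file DAGMan writes depends on its version.

```python
from typing import List, Set, Tuple


def rescue_dag(nodes: List[str], edges: List[Tuple[str, str]], done: Set[str]) -> str:
    """Render a DAG in which already completed nodes carry the DONE keyword."""
    lines = []
    for name in nodes:
        suffix = " DONE" if name in done else ""
        lines.append(f"JOB {name} {name}.sub{suffix}")  # hypothetical submit files
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


# Suppose both stage-in transfers finished but the execution node E3 failed.
rescue_text = rescue_dag(
    ["T1", "T2", "E3", "T3", "R1"],
    [("T1", "E3"), ("T2", "E3"), ("E3", "T3"), ("T3", "R1")],
    done={"T1", "T2"},
)
print(rescue_text)
```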
Future Condor Technologies to be integrated.
NeST
– We are looking at integrating support for NeST, which allows disk space reservation on remote sites.
Stork (Data Placement Scheduler)
– Allows support of multiple transfer protocols (ftp, http, nest/chirp, gsiftp, srb, file).
– Reliably transfers your files across the grid.
Applications Using Pegasus and Condor DAGMan
GriPhyN Experiments
– Laser Interferometer Gravitational Wave Observatory (Caltech/UWM)
– ATLAS (U of Chicago)
– SDSS (Fermilab)
Also iVDGL/Grid3
National Virtual Observatory and NASA
– Montage
Biology
– BLAST (ANL, PDQ-funded)
Neuroscience
– Tomography for Telescience (SDSC, NIH-funded)
A small Montage workflow
1202 nodes
Pegasus Acknowledgements
Ewa Deelman, Carl Kesselman, Gaurang Mehta, Karan Vahi, Mei-Hui Su, Saurabh Khurana, Sonal Patil, Gurmeet Singh (Center for Grid Computing, ISI)
James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
Collaboration with Miron Livny and the Condor Team (UW Madison)
Collaboration with Mike Wilde, Jens Voeckler (UofC) - Chimera
Research funded as part of the NSF GriPhyN, NVO and SCEC projects and EU-funded GridLab
For more information
– http://pegasus.isi.edu
– http://www.griphyn.edu/workspace/vds
– Contacts: deelman, gmehta, vahi @isi.edu