Pegasus: Running Large-Scale
Scientific Workflows on the TeraGrid
Ewa Deelman
USC
Information Sciences Institute
http://pegasus.isi.edu www.isi.edu/~deelman
Ewa Deelman, [email protected] www.isi.edu/~deelman pegasus.isi.edu
Acknowledgements
Carl Kesselman, Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi (Center for Grid Technologies, ISI)
James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
http://pegasus.isi.edu
Research funded as part of the NSF GriPhyN, NVO, and SCEC projects, the NIH-funded CRCNS project, and the EU-funded GridLab project
Thanks for the use of the TeraGrid
Outline
Applications as workflows
Pegasus (Planning for Execution in Grids)
Montage application (Astronomy, NSF & NASA)
CyberShake (Southern California Earthquake Center)
Results from running on the TeraGrid
Conclusions
Today’s Scientific Applications
Applications are increasing in complexity:
  Use of individual application components
  Components supplied by various individuals
  Reuse of individual intermediate data products (files)
The execution environment is complex and very dynamic:
  Resources come and go
  Data is replicated
  Components can be found at various locations or staged in on demand
There is a separation between the application description and the actual execution description
Applications are being described in terms of workflows
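The idea of describing an analysis as a workflow can be sketched as a small directed acyclic graph of components. This is an illustrative Python sketch, not anything from Pegasus itself; the component names d1 and d2 and the use of Python's graphlib are our own:

```python
# Illustrative sketch: an abstract workflow is a DAG of components,
# and a valid execution order is a topological order of that DAG.
from graphlib import TopologicalSorter

# Hypothetical two-step analysis: d1 turns input "a" into "b",
# and d2 turns "b" into the final data product "c".
workflow = {
    "d1": set(),    # d1 depends on no other component
    "d2": {"d1"},   # d2 consumes d1's output
}

order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['d1', 'd2']
```

The separation the slide describes falls out naturally: the DAG above says nothing about where or how each component runs; that is decided later, at mapping time.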
Scientific Analysis: Workflow Evolution
[Diagram: the analysis evolves in stages:
  Construct the Analysis (Workflow Template)
  Select the Input Data (Abstract Workflow)
  Map the Workflow onto Available Resources (Executable Workflow)
  Execute the Workflow (Tasks to be executed on Grid Resources)]
Scientific Analysis: Workflow Evolution in the Execution Environment
[Diagram: the same stages, ranging from user-guided to automated, each drawing on the execution environment:
  Construct the Analysis (Workflow Template), using a Library of Application Components and component characteristics
  Select the Input Data (Abstract Workflow), using Data Catalogs and data properties
  Map the Workflow onto Available Resources (Executable Workflow), using Information Services and resource availability and characteristics
  Execute the Workflow (Tasks to be executed on Grid Resources)]
Executable Workflow Generation and Mapping
[Diagram: an Abstract Workflow can be produced in application-dependent ways:
  Intelligent workflow composition tools (WINGS and CAT; natural language processing)
  Virtual Data Language (VDL) (GTOMO, HEP, Biology, others)
  Application-specific Abstract Workflow services (LIGO, SCEC, Montage)
Pegasus (application-independent) maps the Abstract Workflow to an Executable Workflow, which Condor DAGMan runs as jobs on Grid Resources, returning Results]
WINGS and CAT were developed at ISI by Y. Gil; VDL was developed at ANL & U of C by I. Foster, J. Voeckler & M. Wilde
Pegasus: Planning for Execution in Grids
Maps from an abstract to an executable workflow
Automatically locates physical locations for both workflow components and data
Finds appropriate resources to execute the components
Reuses existing data products where applicable
Publishes newly derived data products
Provides provenance information
Information Components used by Pegasus
Globus Monitoring and Discovery Service (MDS) (or a static file):
  Locates available resources
  Finds resource properties
    Dynamic: load, queue length
    Static: location of GridFTP server, RLS, etc.
Globus Replica Location Service (RLS):
  Locates data that may be replicated
  Registers new data products
Transformation Catalog (TC):
  Locates installed executables
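A toy sketch of how these three information sources might feed a planning decision. The site names, resource properties, and selection policy below are hypothetical; the real Globus and Pegasus services are queried over the Grid, not read from dictionaries:

```python
# Illustrative stand-ins for the three catalogs Pegasus consults.
mds = {  # resource properties from MDS (or a static file)
    "siteB": {"load": 0.3, "gridftp": "gsiftp://b.example.org"},
}
rls = {  # logical file name -> physical replica locations
    "b": ["gsiftp://a.example.org/data/b"],
}
tc = {   # transformation -> sites where the executable is installed
    "d2": ["siteB"],
}

def plan_site(transformation):
    """Pick a site that has the executable installed (a toy policy:
    least-loaded candidate according to MDS)."""
    candidates = tc.get(transformation, [])
    return min(candidates, key=lambda s: mds[s]["load"], default=None)

def locate(lfn):
    """Where can a logical file be fetched from? (an RLS-style lookup)"""
    return rls.get(lfn, [])

print(plan_site("d2"), locate("b"))
```

The same lookups drive the reduction and staging decisions on the following slides: the RLS answer decides what can be skipped, the TC answer decides where a job can run, and MDS properties decide which candidate to prefer.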
Example Workflow Reduction
Original abstract workflow
If “b” already exists (as determined by query to the RLS), the workflow can be reduced
Also useful in case of failures
[Diagram: original workflow a -> d1 -> b -> d2 -> c; reduced workflow b -> d2 -> c]
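The reduction step can be sketched as a filter over the jobs: any job whose output already has a registered replica is dropped. The job table and the stand-in for the RLS query are hypothetical:

```python
# Illustrative sketch of workflow reduction. In the slide's example,
# d1 produces "b" and d2 produces "c"; "b" is already registered,
# so d1 need not run and d2 can fetch "b" from its existing replica.
jobs = {
    "d1": {"in": ["a"], "out": "b"},
    "d2": {"in": ["b"], "out": "c"},
}
rls = {"b"}  # toy stand-in for "files with a registered replica"

def reduce_workflow(jobs, rls):
    kept = {}
    for name, job in jobs.items():
        if job["out"] in rls:
            continue  # output already exists: skip this job
        kept[name] = job
    return kept

print(sorted(reduce_workflow(jobs, rls)))  # ['d2']
```

The same check is what makes the reduction useful after a failure: whatever a previous run already produced and registered is not recomputed.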
Mapping from abstract to executable
Query the RLS, MDS, and TC; schedule the computation and data movement
[Diagram: for the reduced workflow b -> d2 -> c, the executable workflow is:
  Move b from A to B
  Execute d2 at B
  Move c from B to U
  Register c in the RLS]
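Those four operations can be sketched as the output of a toy mapping function. The sites A, B, and U and the file names mirror the slide's example and are illustrative only:

```python
# Illustrative sketch: expand one reduced job into an ordered list of
# Grid operations (stage-in, execute, stage-out, register).
def make_executable_workflow(job, output, exec_site, replicas, output_site):
    # Stage in every input whose replica lives somewhere else.
    steps = [f"move {f} from {src} to {exec_site}"
             for f, src in replicas.items() if src != exec_site]
    steps.append(f"execute {job} at {exec_site}")                     # compute
    steps.append(f"move {output} from {exec_site} to {output_site}")  # stage out
    steps.append(f"register {output} in the RLS")                     # publish
    return steps

plan = make_executable_workflow("d2", "c", "B", {"b": "A"}, "U")
print(plan)
# ['move b from A to B', 'execute d2 at B',
#  'move c from B to U', 'register c in the RLS']
```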
Mosaic of M42 created on the TeraGrid resources using Pegasus
Pegasus improved the runtime of this application by 90% over the baseline case
Workflow with 4,500 nodes
Bruce Berriman, John Good (Caltech); Joe Jacob, Dan Katz (JPL); Gurmeet Singh, Mei Su (ISI)
Small Montage Workflow
~1200 nodes
Montage
[Architecture diagram: the User Portal takes a region name and size in degrees; an Abstract Workflow Service (mDAG files) produces the abstract workflow, and a 2MASS Image List Service (m2MASSList) produces the input image list; a Grid Scheduling and Execution Service (mGridExec) runs Pegasus to produce a concrete workflow, which Condor DAGMan executes on the Computational Grid (TeraGrid clusters at SDSC and NCSA, plus the ISI Condor pool); a User Notification Service (mNotify) reports back to the user. Services are hosted at JPL, IPAC, and ISI.]
Initial prototype implemented and tested on the TeraGrid
Montage performance evaluations
Production Montage portal open to the astronomy community this year
Collaboration with JPL & IPAC
SCEC: Derive Probabilistic Hazard Curves & Maps for the Los Angeles Area
6 sites in 2005, 625 in 2006, and 10,000 in 2007
A hazard curve gives the probability of a certain ground motion during a certain period of time
[Figure: Hazard Map]
SCEC workflows on the TG
[Diagram: the steps, with their inputs and outputs:
  Provision the resources
  Map the Workflow onto the Grid resources (using Resource Descriptions; produces the Executable Workflow)
  Run the Workflow on the Grid Resources (submits Tasks)
  Record Information about the Workflow (Task Info)]
SCEC Workflows on the TG
[Diagram: the same steps annotated with the components used:
  Condor Glide-in provisions the resources (with thanks to the TeraGrid staff)
  The Executable Workflow is produced from the Resource Descriptions
  Condor's DAGMan (University of Wisconsin-Madison) runs the Tasks from the local machine
  VDS Kickstart & the Provenance Tracking Catalog (PTC) record Task Info (Jens Voeckler, Mike Wilde; U of C, ANL)]
Gaurang Mehta at ISI ran the experiments
SCEC computations so far
Pasadena: 33 workflows
USC: 26 workflows
Each workflow: 11 to 1,000 jobs
23 days total runtime on the NCSA & SDSC TeraGrid resources
Failed job recovery: retries and rescue DAGs
Total number of jobs: 261,823
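A minimal sketch of DAGMan-style failure recovery, assuming a per-job retry limit and a rescue list for jobs that never succeed. The job names and the toy runner below are hypothetical:

```python
# Illustrative sketch: retry each job a few times; jobs that still fail
# are collected so that only the unfinished part of the workflow needs
# to be resubmitted later (the idea behind a rescue DAG).
def run_with_recovery(jobs, run, max_retries=3):
    done, rescue = [], []
    for job in jobs:
        for attempt in range(max_retries):
            if run(job, attempt):
                done.append(job)
                break
        else:
            rescue.append(job)  # goes into the rescue list
    return done, rescue

# Toy runner: "flaky" succeeds on its second attempt, "broken" never does.
outcomes = {"ok": 0, "flaky": 1, "broken": 99}
run = lambda job, attempt: attempt >= outcomes[job]
print(run_with_recovery(["ok", "flaky", "broken"], run))
# (['ok', 'flaky'], ['broken'])
```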
[Bar chart: total and failed jobs by job type (PeakValCalc_Okaya, SeismogramGen_Li, Data Transfer, Data Registration); y-axis 0 to 100,000]
So far 2 SCEC sites done (Pasadena and USC)
Number of jobs per day and number of CPU hours per day over the 23 days: 261,823 jobs total; 15,706 CPU hours total (1.8 years)
[Chart: daily JOBS and HRS from 10/19 to 11/10, log scale 1 to 100,000]
Distribution of seismogram jobs
[Histogram: number of seismogram jobs by runtime, Time (mins) bins from 10 to 4,200 minutes (70 hours), log-scale counts 1 to 100,000]
Observations from working with the Scientists
It is a two-way street: they give us feedback on our technologies, and we show them how things run (and break) at scale
We have seen great performance improvements in the codes
[Chart: number of jobs and days per execution site (local, NCSA, SDSC), log scale 1 to 1,000,000]
Some other Pegasus Application Domains
Laser Interferometer Gravitational-Wave Observatory (LIGO)
Galaxy morphology (NVO)
Tomography for neural structure reconstruction (NIH)
High-energy physics
Gene alignment
Natural language processing
LIGO has used Pegasus to run on the Open Science Grid at SC’05
Courtesy of David Meyers, Caltech
Benefits of the workflow & Pegasus approach
Pegasus can run the workflow on a variety of resources
Pegasus can run a single workflow across multiple resources
Pegasus can opportunistically take advantage of available resources (through dynamic workflow mapping)
Pegasus can take advantage of pre-existing intermediate data products
Pegasus can improve the performance of the application
Benefits of the workflow & Pegasus approach
Pegasus shields the user from the Grid details
The workflow exposes the structure of the application and its maximum parallelism
Pegasus can take advantage of that structure to:
  Set a planning horizon (how far into the workflow to plan)
  Cluster a set of workflow nodes to be executed as one (for performance)
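Node clustering can be sketched as batching the independent nodes of one workflow level so that each batch is submitted as a single Grid job, cutting per-job scheduling overhead. The cluster size and job names below are illustrative:

```python
# Illustrative sketch of node clustering: nodes at the same level of the
# workflow have no dependencies on each other, so any grouping of them
# can safely run as one submitted job.
def cluster_level(nodes, cluster_size):
    """Group a level's independent nodes into clusters of at most cluster_size."""
    return [nodes[i:i + cluster_size] for i in range(0, len(nodes), cluster_size)]

# e.g. 7 independent jobs from one level, clustered 3 at a time
print(cluster_level([f"job{i}" for i in range(7)], 3))
# [['job0', 'job1', 'job2'], ['job3', 'job4', 'job5'], ['job6']]
```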
Pegasus Research
Resource discovery and assessment
Resource selection
Resource provisioning
Workflow restructuring: tasks merged together or reordered to improve overall performance
Adaptive computing: workflow refinement adapts to the changing execution environment
Workflow debugging
Software releases
Pegasus (http://pegasus.isi.edu) is released as part of the GriPhyN Virtual Data System (VDS): http://vds.isi.edu
Collaborators in VDS: Ian Foster (ANL), Mike Wilde (ANL), and Jens Voeckler (U of C)