24
Pegasus: Planning for Execution in Grids Ewa Deelman Information Sciences Institute University of Southern California

Pegasus: Planning for Execution in Grids Ewa Deelman Information Sciences Institute University of Southern California

Embed Size (px)

Citation preview

Pegasus: Planning for Execution in

Grids

Ewa DeelmanInformation Sciences Institute

University of Southern California

Ewa Deelman Information Sciences Institute

Pegasus Acknowledgements

Ewa Deelman, Carl Kesselman, Saurabh Khurana, Gaurang Mehta, Sonal Patil, Gurmeet Singh, Mei-Hui Su, Karan Vahi (ISI)

James Blythe, Yolanda Gil (ISI) http://pegasus.isi.edu Research funded as part of the NSF

GriPhyN, NVO and SCEC projects.

Ewa Deelman Information Sciences Institute

Outline

General Scientific Workflow Issues on the Grid Mapping complex applications onto the Grid

Pegasus Pegasus Application Portal

LIGO-gravitational-wave physics Montage-astronomy

Incremental Workflow Refinement Futures

Ewa Deelman Information Sciences Institute

Grid Applications Increasing in the level of complexity Use of individual application components Reuse of individual intermediate data products (files) Description of Data Products using Metadata Attributes

Execution environment is complex and very dynamic Resources come and go Data is replicated Components can be found at various locations or staged in on

demand

Separation between the application description the actual execution description

Ewa Deelman Information Sciences Institute

FFT

FFT filea

/usr/local/bin/fft /home/file1

transfer filea from host1://home/filea

to host2://home/file1

ApplicationDomain

AbstractWorkflow

ConcreteWorkflow

ExecutionEnvironment

host1 host2

Data

Data

host2

App

licat

ion

Dev

elop

men

t and

Exe

cutio

n P

roce

ss

DataTransfer

Resource SelectionData Replica Selection

Transformation InstanceSelection

ApplicationComponentSelection

Retry

Pick different Resources

Specify aDifferentWorkflow

Failure RecoveryMethod

Abstract Workflow

Generation

ConcreteWorkflow

Generation

Ewa Deelman Information Sciences Institute

Why Automate Workflow Generation?

Usability: Limit User’s necessary Grid knowledge Monitoring and Directory Service Replica Location Service

Complexity: User needs to make choices

Alternative application components Alternative files Alternative locations

The user may reach a dead end Many different interdependencies may occur among

components Solution cost:

Evaluate the alternative solution costs Performance Reliability Resource Usage

Global cost: minimizing cost within a community or a virtual organization requires reasoning about individual user’s choices in light of

other user’s choices

Ewa Deelman Information Sciences Institute

Executable Workflow Construction Chimera builds an abstract workflow based

on VDL descriptions Pegasus takes the abstract workflow and

produces and executable workflow for the Grid

Condor’s DAGMan executes the workflow

AbstractWorfklow

Concrete Workflow

Jobs

Chimera Pegasus DAGMan

Ewa Deelman Information Sciences Institute

Pegasus:Planning for Execution in Grids

Maps from abstract to concrete workflow Algorithmic and AI based techniques

Automatically locates physical locations for both components (transformations) and data Use Globus RLS and the Transformation Catalog

Finds appropriate resources to execute via Globus MDS

Reuses existing data products where applicable Publishes newly derived data products

Chimera virtual data catalog

Ewa Deelman Information Sciences Institute

GridGridGrid

workflow executor (DAGman)Execution

WorkflowPlanning

Globus Replica Location Service

Globus Monitoring and Discovery

Service

Information and Models

detector

Raw data

Co

ncr

ete

Wo

rkfl

ow

Replica LocationAvailable Reources

Abstract Worfklow

Dyn

amic

in

form

atio

n

Request Manager

Replica and Resource SelectorSubmission and

Monitoring System

Workflow Reduction

DataPublication

Virtual Data Language Chimera

Data Management

TransformationCatalog

Chimera is developed at ANLBy I. Foster, M. Wilde, and J. Voeckler

Ewa Deelman Information Sciences Institute

Example Workflow Reduction

Original abstract workflow

If “b” already exists (as determined by query to the RLS), the workflow can be reduced

d1 d2ba c

d2b c

Ewa Deelman Information Sciences Institute

Mapping from abstract to concrete

Query RLS, MDS, and TC, schedule computation and data movement

Execute d2 at B

Move b from A

to B

Move c from B

to U

Register c in the

RLS

d2b c

Ewa Deelman Information Sciences Institute

LIGO-specific interface

Montage-specific

Interface

Globus MDS

Globus RLS

Chimera

User

Metadata/

VDLMetadata/

Abstract Workflow

PegasusPortal

The Grid

Metadata CatalogService

TransformationCatalog

MyProxy

DAGMan

Authentication

VDL/

Abst

ract

Wor

fklo

w

Execution

records

Abstract Workflow/

Information

Information

Concr

ete

Wor

kflo

w/

Info

rmat

ion

Jobs/Information

Information

Sim

plifi

ed V

iew

of S

C 2

003

Por

tal

Ewa Deelman Information Sciences Institute

LIGO Scientific Collaboration

Continuous gravitational waves are expected to be produced by a variety of celestial objects

Only a small fraction of potential sources are known Need to perform blind searches, scanning the regions of

the sky where we have no a priori information of the presence of a source

Wide area, wide frequency searches Search is performed for potential sources of continuous

periodic waves near the Galactic Center and the galactic core

The search is very compute and data intensive LSC used the occasion of SC2003 to initiate a month-long

production run with science data collected during 8 weeks in the Spring of 2003

Ewa Deelman Information Sciences Institute

Additional resources used: Grid3 iVDGL resources

Ewa Deelman Information Sciences Institute

LIGO Acknowledgements Bruce Allen, Scott Koranda, Brian Moe, Xavier Siemens,

University of Wisconsin Milwaukee, USA

Stuart Anderson, Kent Blackburn, Albert Lazzarini, Dan Kozak, Hari Pulapaka, Peter Shawhan, Caltech, USA

Steffen Grunewald, Yousuke Itoh, Maria Alessandra Papa, Albert Einstein Institute, Germany

Many Others involved in the Testbed

www.ligo.caltech.edu www.lsc- group.phys.uwm.edu/lscdatagrid/ http://pandora.aei.mpg.de/merlin/

LIGO Laboratory operates under NSF cooperative agreement PHY-0107417

Ewa Deelman Information Sciences Institute

Montage Montage (NASA and NVO) Deliver science-grade

custom mosaics on demand

Produce mosaics from a wide range of data sources (possibly in different spectra)

User-specified parameters of projection, coordinates, size, rotation and spatial sampling.

Mosaic created by Pegasus based Montage from a run of the M101 galaxy images on the Teragrid.

Ewa Deelman Information Sciences Institute

Small Montage Workflow

~1200 nodes

Ewa Deelman Information Sciences Institute

Montage Acknowledgments Bruce Berriman, John Good, Anastasia Laity,

Caltech/IPAC Joseph C. Jacob, Daniel S. Katz, JPL http://montage.ipac. caltech.edu/

Testbed for Montage: Condor pools at USC/ISI, UW Madison, and Teragrid resources at NCSA, PSC, and SDSC.

Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.

Ewa Deelman Information Sciences Institute

Other Applications Using Chimera and Pegasus

Other GriPhyN applications: High-energy physics: Atlas, CMS (many) Astronomy: SDSS (Fermi Lab, ANL)

Astronomy: Galaxy Morphology (NCSA, JHU, Fermi,

many others, NVO-funded) Biology

BLAST (ANL, PDQ-funded) Neuroscience

Tomography (SDSC, NIH-funded)

Ewa Deelman Information Sciences Institute

Current System

Original Abstract Workflow

Current Pegasus

Pegasus(Abstract Workflow)

DAGMan(CW))

Co

ncre

te W

orfklo

w

Workflow Execution

Ewa Deelman Information Sciences Institute

time

Levels ofabstraction

Application-level

knowledge

Logicaltasks

Tasksbound toresources

and sent forexecution

User’sRequest

Relevantcomponents

Fullabstractworkflow

Partialexecution

Not yetexecuted

executed

Workflow refinement

Taskmatchmaker

Workflow repair

Policyinfo

Workflow Refinement and execution

Ewa Deelman Information Sciences Institute

Incremental Refinement Partition Abstract workflow into partial

workflows

PW A

PW B

PW C

A Particular PartitioningNew Abstract

Workflow

Ewa Deelman Information Sciences Institute

Meta-DAGMan

Pegasus(A)

Pegasus(B)

Pegasus(C)

DAGMan(Su(A))

Su(B)

Su(C)

DAGMan(Su(B))

DAGMan(Su(C))

Pegasus(X) –Pegasus generates the concrete workflow and the submit files for X = Su(X)

DAGMan(Su(X))—DAGMan executes the concrete workflow for X

Ewa Deelman Information Sciences Institute

Future Directions

Incorporate AI-planning technologies in production software (Virtual Data Toolkit)

Investigate various scheduling techniques Investigating fault tolerance issues

Selecting resources based on their reliability Responding to failures

http://pegasus.isi.edu