Pegasus on the Virtual Grid: A Case Study of Workflow Planning over Captive Resources

Yang-Suk Kee, Eun-Kyu Byun, Ewa Deelman, Karan Vahi, Jin-Soo Kim

Oracle US Inc
Korea Advanced Institute of Science and Technology
Information Sciences Institute/University of Southern California
Sungkyunkwan University



Overview

Motivation
Background
– Pegasus
– Virtual Grid
Pegasus-VG Proxy
Conclusion
Discussion

Motivation

Challenges in scientific application development
– Data/control flow, task scheduling, data replication, fault tolerance, etc.

Challenges in resource management
– Availability, performance, cost, reliability, fault tolerance, etc.

How can existing cyberinfrastructure be leveraged for easy and efficient scientific computing?

Separation of Concerns

Application domain
– Workflow management: application management can be conducted independently of target execution environments.
– E.g., Pegasus, Askalon, Triana

Resource domain
– Resource provisioning: resource management can be encapsulated underneath abstractions or virtualizations.
– E.g., Virtual Grid, virtual cluster, cloud

Workflow planning and execution over provisioned resources

Pegasus

A framework for workflow planning and execution

Workflow lifecycle
– Design: describe the data/control flows of the application via an abstract workflow
– Planning: map the workflow tasks onto physical resources
– Execution: schedule and run the workflow tasks on the mapped resources
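The design step can be illustrated with a small sketch of an abstract workflow held as a dependency graph. This is not Pegasus's own workflow format; it is a plain Python dictionary, and the task names borrow the diamond-shaped example (preprocess, two findrange tasks, analyze) with the suffixes `_1` and `_2` added here only to tell the two findrange tasks apart. The helper name `execution_order` is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Abstract workflow: each task maps to the set of tasks it depends on.
# The diamond shape is assumed from the task names in the deck's example.
ABSTRACT_WORKFLOW = {
    "preprocess": set(),
    "findrange_1": {"preprocess"},
    "findrange_2": {"preprocess"},
    "analyze": {"findrange_1", "findrange_2"},
}


def execution_order(workflow):
    """Return one valid order in which the tasks could be run."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    print(execution_order(ABSTRACT_WORKFLOW))
```

Nothing in this description names a site or a processor, which is the point of an abstract workflow: mapping tasks to resources is left to the planning step.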

Pegasus Workflow Management

[Figure: Pegasus workflow management. Components shown: abstract workflow, Pegasus mapper, executable workflow, Condor DAGMan, Condor, and the computing environment (Condor pool). Arrows are labeled "tasks" and "monitoring information / provenance".]

Virtual Grid

A programmable virtualized resource provisioning framework

Components
– vgDL (Virtual Grid Description Language): specifies resource requirements
– vgES (Virtual Grid Execution System): compiles and coordinates resources
– PC (Personal Cluster): provides uniform job management

[Figure: Virtual Grid resource abstraction. An application workflow (tasks A, B, C, D) runs on a VG that abstracts timeshare, lease, and batch (PBS) resources. Stages shown: classification, selection, binding, environment.]

vgdl = clusterof (node) [2] { node = [Processor == "P4"] }

Pegasus on Virtual Grid

Scope
– A basic integration for workflow planning and execution over provisioned resources

Issues
– Resource capacity estimation: resource specification (vgDL) synthesis for Virtual Grid
– Resource information publication: site catalog generation for Pegasus

Resource Capacity Estimation

What Virtual Grid expects from Pegasus
– vgDL description

Available information
– Task execution time, data transfer time, performance metrics, minimum memory capacity, cost, deadline, etc.

Unknown information
– Number of virtual processors

Resource capacity estimate
– Find the minimum number of processors that can execute a workflow within a deadline
– BTS (Balanced Time Scheduling)

Ref: e-Science'08, E.-K. Byun, Y.-S. Kee, et al.

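[Figure: example workflow of six tasks (task 1 at the top, tasks 2 to 5 in the middle row, task 6 at the bottom), with a schedule on two processors, p1 and p2, drawn along a time axis.]

Task ID   1  2  3  4  5  6
ET        1  5  2  2  1  1

Schedule shown: tasks 1, 2, 6 in one column and tasks 3, 4, 5 in the other.

How many processors do we need to run this workflow within 7 units?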
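The question can be explored with a small sketch. This is not the BTS algorithm from the cited paper; it is a plain list-scheduling heuristic (longest remaining path first) tried with 1, 2, 3, ... processors until the deadline is met. The execution times come from the slide; the dependency structure (task 1 feeds tasks 2 to 5, which feed task 6) is an assumption read off the figure layout, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Execution time (ET) per task ID, as listed on the slide.
EXEC_TIME: Dict[int, int] = {1: 1, 2: 5, 3: 2, 4: 2, 5: 1, 6: 1}
# Assumed fork-join dependencies: task -> list of parent tasks.
PARENTS: Dict[int, List[int]] = {
    1: [], 2: [1], 3: [1], 4: [1], 5: [1], 6: [2, 3, 4, 5],
}


def makespan(exec_time: Dict[int, int], parents: Dict[int, List[int]],
             num_procs: int) -> int:
    """List-schedule the DAG on num_procs identical processors."""
    children: Dict[int, List[int]] = {t: [] for t in exec_time}
    for task, preds in parents.items():
        for p in preds:
            children[p].append(task)

    # Priority of a task = length of the longest path from it to the exit.
    rank: Dict[int, int] = {}

    def upward_rank(task: int) -> int:
        if task not in rank:
            tail = max((upward_rank(c) for c in children[task]), default=0)
            rank[task] = exec_time[task] + tail
        return rank[task]

    for task in exec_time:
        upward_rank(task)

    finish: Dict[int, int] = {}
    proc_free = [0] * num_procs      # time at which each processor is idle
    remaining = set(exec_time)
    while remaining:
        ready = sorted(t for t in remaining
                       if all(p in finish for p in parents.get(t, [])))
        task = max(ready, key=lambda t: rank[t])
        earliest = max((finish[p] for p in parents.get(task, [])), default=0)
        # Choose the processor on which the task can start soonest.
        proc = min(range(num_procs),
                   key=lambda k: max(proc_free[k], earliest))
        start = max(proc_free[proc], earliest)
        finish[task] = start + exec_time[task]
        proc_free[proc] = finish[task]
        remaining.remove(task)
    return max(finish.values())


def min_processors(exec_time: Dict[int, int], parents: Dict[int, List[int]],
                   deadline: int) -> Optional[int]:
    """Smallest processor count for which this heuristic meets the deadline."""
    for n in range(1, len(exec_time) + 1):
        if makespan(exec_time, parents, n) <= deadline:
            return n
    return None  # the critical path alone exceeds the deadline
```

Under these assumptions, `min_processors(EXEC_TIME, PARENTS, 7)` returns 2: one processor needs 12 time units, while two processors finish in 7, which equals the critical path 1 → 2 → 6. Because list scheduling is a heuristic, its answer is in general an upper bound on the true minimum rather than a guarantee.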

Example

Execution time of each task: on a Xeon processor
Data transfer time: network with 1 Gbps bandwidth
Deadline: 1 hour

Diamond = ClusterOf [2] (nd) [, 0:30:00] { nd = [Processor == "Xeon"] }

[Figure: diamond workflow. Labels shown: f.input, preprocess, findrange (twice), analyze, f.output.]
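Once a processor count has been estimated, the vgDL request can be synthesized as a string. The sketch below only mirrors the one-line `ClusterOf (nd) [2] { ... }` pattern that appears in this deck; it does not cover the full vgDL grammar (for instance the time field in the example above), and the function name `synthesize_vgdl` and its parameters are hypothetical.

```python
def synthesize_vgdl(name: str, num_nodes: int, processor: str,
                    node_var: str = "nd") -> str:
    """Build a one-line vgDL cluster request in the style shown in the slides."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return (
        f"{name} = ClusterOf ({node_var}) [{num_nodes}] "
        f'{{ {node_var} = [Processor == "{processor}"] }}'
    )


if __name__ == "__main__":
    # Request two Xeon nodes for the diamond workflow.
    print(synthesize_vgdl("vgdl", 2, "Xeon"))
```

Keeping the synthesis in one small function makes it straightforward to plug in whatever processor count the capacity estimator produces.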

Resource Information Publication

What Pegasus expects from Virtual Grid
– Site catalog

Virtual Grid
– VG instance

Resource information publication
– Devirtualize a VG instance and generate a site catalog for Pegasus


Personal Cluster

A partition of resources dedicated to a user under the control of a user-level resource manager during a limited time period

[Figure: personal cluster; resource labels: GT4/PBS]

Ref: HCW’08 Y.-S. Kee and C. Kesselman

Site Catalog Publication

<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" …>
  …
    <profile namespace="env" key="PEGASUS_HOME">/home/globus/pegasus-2.1.0</profile>
    <profile namespace="condor" key="grid_type">gt4</profile>
    <profile namespace="condor" key="jobmanager_type">PBS</profile>
    <lrc url="rlsn://cat7.kaist.ac.kr" />
    <gridftp url="gsiftp://cat7.kaist.ac.kr:2811" storage="/home/globus" major="4" minor="0" patch="7" />
    <jobmanager universe="transfer" url="https://cat7.kaist.ac.kr:9000/wsrf/services/ManagedJobFactoryService" major="4" minor="0" patch="7" total-nodes="2" />
    <jobmanager universe="vanilla" url="https://cat7.kaist.ac.kr:9000/wsrf/services/ManagedJobFactoryService" major="4" minor="0" patch="7" total-nodes="2" />
    <workdirectory>$HOME/workdir</workdirectory>
  </site>
  …
</sitecatalog>
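A catalog of this shape could be generated from a devirtualized VG instance roughly as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the proxy's actual implementation. The element and attribute names are copied from the listing above; the function name `build_site_catalog`, its parameters, and the `handle` attribute on `<site>` are assumptions, since the opening `<site>` tag is elided in the slide.

```python
import xml.etree.ElementTree as ET

SITE_CATALOG_NS = "http://pegasus.isi.edu/schema/sitecatalog"


def build_site_catalog(host: str, total_nodes: int, pegasus_home: str,
                       site_handle: str = "vg_site",
                       version: tuple = ("4", "0", "7")) -> str:
    """Return a site catalog XML string for one GT4/PBS personal cluster."""
    major, minor, patch = version
    ver = {"major": major, "minor": minor, "patch": patch}

    root = ET.Element("sitecatalog", {"xmlns": SITE_CATALOG_NS})
    # 'handle' is an assumed attribute; the slide elides the <site> tag.
    site = ET.SubElement(root, "site", {"handle": site_handle})

    for namespace, key, value in [
        ("env", "PEGASUS_HOME", pegasus_home),
        ("condor", "grid_type", "gt4"),
        ("condor", "jobmanager_type", "PBS"),
    ]:
        profile = ET.SubElement(site, "profile",
                                {"namespace": namespace, "key": key})
        profile.text = value

    ET.SubElement(site, "lrc", {"url": f"rlsn://{host}"})
    ET.SubElement(site, "gridftp",
                  {"url": f"gsiftp://{host}:2811",
                   "storage": "/home/globus", **ver})

    factory = f"https://{host}:9000/wsrf/services/ManagedJobFactoryService"
    for universe in ("transfer", "vanilla"):
        ET.SubElement(site, "jobmanager",
                      {"universe": universe, "url": factory, **ver,
                       "total-nodes": str(total_nodes)})

    ET.SubElement(site, "workdirectory").text = "$HOME/workdir"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_site_catalog("cat7.kaist.ac.kr", 2,
                             "/home/globus/pegasus-2.1.0"))
```

The `total_nodes` argument is where the number of processors bound to the VG instance would be published, so that the planner sees only the provisioned capacity.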

Workflow Planning over Provisioned Resources

[Figure: workflow planning over provisioned resources, in three stages: creation, planning, and scheduling/execution. Labels shown: abstract workflow, BTS, vgDL, Virtual Grid, VG, devirtualization, site catalog, Pegasus, VG-Pegasus Proxy, executable workflow, GT4+PBS.]

vgdl = ClusterOf (nd) [2] { nd = [Proc == "Xeon"] }

Conclusion

Pegasus on Virtual Grid
– Implements workflow planning and execution over on-demand captive resources
– Enables easy and efficient application development and execution

Issues
– Resource capacity estimation
– Site catalog publication

Discussion

Effective performance
– What is the cost that a user has to pay for a successful execution?

Ongoing studies
– Fine-grained planning for resource provisioning: performance, cost, reliability
– Workflow execution for virtualization: recovery of failed tasks

Need More Information?

Pegasus
– http://pegasus.isi.edu

VGrADS
– Tuesday, 11:30am, RENCI booth (2633)
– Wednesday, noon, GCAS booth (285)
– Wednesday, 2:00pm, SDSC booth (568)
– Wednesday, 4:00pm, RENCI booth (2633)

Questions & Answers