Under the Hood of a Workflow Manager

Matthew Shields, BiodiversityWorld GRID workshop, NeSC, 30 June - 1 July




Matthew Shields, Cardiff University

Outline

What is Workflow management?

Why should I care?

Current State of the Art

Workflow Languages

Other Projects

Triana, Architecture & Services

Extending Triana for BDWorld

Conclusion


What is Workflow Management?

The concept comes from the business world

Many years of research and practice

Process capture and reuse

Repeatability, provenance, audit trails & accountability

Domain expert knowledge capture

Analysis and optimization


What Can a Workflow Manager do for Me?

Scientific workflow has a different focus from business workflow:

Large-scale data collection

Querying

Analysis

Visualization

Similar goals:

Component & workflow reuse

Knowledge capture

Additional goals:

Simplified application/experiment design

Environment/Complexity abstraction


State of the Art

Schedule workflow tasks (Grid/distributed environment)

Monitor/Control execution

Active visualization and computational steering

User interaction

Pause and restart

Data provenance

Component and sub-workflow reuse

Analysis and optimization


Workflow Languages

No currently agreed standard

Most projects use DAGs (directed acyclic graphs) or Petri nets

Data vs control flow

Dependency vs scripting language

Many XML schemas

Business workflow standards, e.g. BPEL: not a good enough fit

GGF Workflow Management Research Group (WFM-RG): attempting to solicit agreement on standards
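As a concrete illustration of the DAG, data-flow style that most projects use, here is a minimal Python sketch (not the format or engine of any project named here): each task is a function, the edges say which parent outputs it consumes, and the engine runs tasks in a topological order. The task names and functions are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical three-task workflow: each task is a function of its parents' outputs.
tasks = {
    "load": lambda: [3, 1, 2],
    "sort": lambda data: sorted(data),
    "total": lambda data: sum(data),
}
# Data-flow edges, written as task -> list of parent tasks whose outputs it consumes.
parents = {"load": [], "sort": ["load"], "total": ["load"]}


def run_dag(tasks, parents):
    """Run every task once, in a topological order, feeding it its parents' outputs."""
    results = {}
    for name in TopologicalSorter(parents).static_order():
        args = [results[p] for p in parents[name]]
        results[name] = tasks[name](*args)
    return results


results = run_dag(tasks, parents)
print(results["sort"], results["total"])  # [1, 2, 3] 6
```

Because the graph must be acyclic, `static_order()` raises `graphlib.CycleError` if a loop is added; cyclic models such as Petri nets need a different execution strategy.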


Workflow Management Projects

myGrid/Taverna - Southampton & others

XML/DAG-based workflow language

Initially a WS choreography tool, now incorporates local tools/components

Grid integration with databases via the OGSA Distributed Query Processor

Main users (myGrid project): bioinformatics

Kepler - SDSC

Based on Ptolemy: modeling, simulation & design of real-time & concurrent systems

Concurrent dataflow

Actors (components), Directors (workflow engines)

Local, Web Service & Grid Service actors

Domains: ecology, biology, chemistry, oceanography and the geosciences
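Kepler's split between actors and directors can be sketched in a few lines. The Python below is a simplified illustration, not the Kepler or Ptolemy II API: `Actor`, `connect` and `DataflowDirector` are hypothetical names, and the director uses a naive rule (fire any actor that has a token on every input port) loosely in the spirit of dataflow scheduling.

```python
from collections import deque


class Actor:
    """A component with named input queues and a function applied on each firing."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = {port: deque() for port in inputs}
        self.targets = []  # (actor, port) pairs that receive this actor's output

    def can_fire(self):
        # Actors with no input ports are never fired by this director;
        # tokens are injected into the first actor directly instead.
        return bool(self.inputs) and all(self.inputs.values())

    def fire(self):
        args = [self.inputs[port].popleft() for port in self.inputs]
        out = self.func(*args)
        for actor, port in self.targets:
            actor.inputs[port].append(out)
        return out


def connect(src, dst, port):
    """Route every output token of src to the named input port of dst."""
    src.targets.append((dst, port))


class DataflowDirector:
    """Hypothetical director: keeps firing ready actors until none can fire."""

    def __init__(self, actors):
        self.actors = actors

    def run(self):
        fired = True
        while fired:
            fired = False
            for actor in self.actors:
                if actor.can_fire():
                    actor.fire()
                    fired = True


collected = []
double = Actor("double", lambda x: 2 * x, inputs=["in"])
sink = Actor("sink", collected.append, inputs=["in"])
connect(double, sink, "in")
for token in [1, 2, 3]:
    double.inputs["in"].append(token)
DataflowDirector([double, sink]).run()
print(collected)  # [2, 4, 6]
```

The actors never decide when they run; in Kepler, swapping the director changes the execution model without changing the actors. Only one naive director is shown here.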


Workflow Management Projects (continued)

Karajan/Commodity Grid (CoG) Kit, Argonne & Berkeley

Scripting workflow language for Grid tasks

Integration with Globus Toolkit GT3 & GT4

Pure control flow

Data flow performed by data tasks - GridFTP

And many more; see:

http://www.gridworkflow.org/snips/gridworkflow/

http://www.extreme.indiana.edu/swf-survey/


Triana

Cardiff University, PPARC funded

Java-based scientific workflow tool or problem solving environment (PSE)

Originally designed for signal processing

Now domain independent:

Bioinformatics - obviously!

Signal Processing - gravitational wave detection & radio astronomy

Design optimisation

Data mining

Medical imaging

Distributed Audio Processing


Triana Components

Local Java components

Service-oriented components:

Web services as components (WSRF coming soon): Web service workflow

Peer-to-peer services as components: distributed service workflow

Grid-oriented components:

Grid file and job primitives as components: complex Grid workflow

Legacy code components via GridMonSteer

Mix and match composition
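"Mix and match" composition relies on every component, local or remote, presenting the same interface to the workflow. A minimal Python sketch under that assumption follows; `Component`, `LocalComponent`, `ServiceComponent` and the injected `fake_invoke` transport are hypothetical and merely stand in for Triana's Java units and its Web service or P2P proxies.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class Component(ABC):
    """Common interface: every component maps input data to output data."""

    @abstractmethod
    def process(self, data: Any) -> Any: ...


class LocalComponent(Component):
    """Wraps an in-process function (standing in for a local Java unit)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def process(self, data):
        return self.func(data)


class ServiceComponent(Component):
    """Wraps a remote call; `invoke` is a hypothetical transport callable
    (e.g. a Web service or P2P client), injected so the sketch runs offline."""

    def __init__(self, endpoint: str, operation: str,
                 invoke: Callable[[str, str, Any], Any]):
        self.endpoint, self.operation, self.invoke = endpoint, operation, invoke

    def process(self, data):
        return self.invoke(self.endpoint, self.operation, data)


def run_pipeline(components, data):
    """Apply components in sequence; the caller never checks which are remote."""
    for component in components:
        data = component.process(data)
    return data


def fake_invoke(endpoint, operation, data):
    # Pretend remote service: upper-cases text for the "upper" operation.
    return data.upper() if operation == "upper" else data


pipeline = [
    LocalComponent(str.strip),
    ServiceComponent("http://example.org/text", "upper", fake_invoke),
    LocalComponent(lambda s: s + "!"),
]
print(run_pipeline(pipeline, "  hello "))  # HELLO!
```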


Workflow

Inherently data-flow based; control flow through “messages”

XML/DCG (directed cyclic graph) workflow format; internally workflow-language independent

Migration to a standards-based language

Simple Parent/Child relationship between tasks

Context-based implied actions:

Local file -> local file = file copy

Local file -> remote file = file transfer

Import/export other workflow formats:

Pegasus/EGEE read/write DAGMan format
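Two of the points above can be sketched together: choosing an implied action from the context of a file-to-file connection, and exporting the parent/child task relationship as a Condor DAGMan input file. This is an illustrative Python sketch, not Triana's exporter; the rule that any non-`file` URI scheme counts as remote (so remote-to-remote also becomes a transfer) is an assumption. The `JOB` and `PARENT ... CHILD` lines follow DAGMan's input file syntax; the job and submit file names are hypothetical.

```python
from urllib.parse import urlparse


def is_local(uri: str) -> bool:
    """Assumption: plain paths and file:// URIs are local, any other scheme is remote."""
    return urlparse(uri).scheme in ("", "file")


def implied_action(src: str, dst: str) -> str:
    """Pick the operation implied by connecting two file components."""
    if is_local(src) and is_local(dst):
        return "copy"
    return "transfer"


def to_dagman(jobs: dict, edges: list) -> str:
    """Write a DAGMan input file: one JOB line per task, one PARENT/CHILD line per edge.
    `jobs` maps job name -> Condor submit file; `edges` holds (parent, child) pairs."""
    lines = [f"JOB {name} {submit}" for name, submit in jobs.items()]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


print(implied_action("/data/in.txt", "/tmp/in.txt"))  # copy
print(implied_action("/data/in.txt", "gsiftp://host.example.org/data/in.txt"))  # transfer
print(to_dagman({"A": "a.sub", "B": "b.sub"}, [("A", "B")]), end="")
```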


Triana Architecture

[Figure: Triana, a graphical Grid computing environment or portal, built on two middleware-independent interfaces]

Service based computing, via the GAP Interface: deployment, discovery and communication with distributed services, e.g. P2P, (GSI) Web services and Grid services

GAP bindings: P2PS (discovery, pipes), JXTA (discovery, pipes), Web services (UDDI, SOAP)

Grid computing, via the GAT Interface: job submission, file services

GAT bindings: Condor, Globus, RLS, Unicore, PBS, GridLab GRMS, SGE, SSH, WSRF, LDR, .NET, GridFTP, others
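The GAT Interface hides many job and file middlewares behind one API. One way to structure such an interface is the adaptor pattern: the application makes one call and the toolkit tries each available adaptor until one succeeds. The Python below is a hypothetical sketch of that pattern only; the class names, the `submit` method and the fallback order are assumptions, not the GridLab GAT API.

```python
class AdaptorError(Exception):
    """Raised by an adaptor that cannot handle a request."""


class JobAdaptor:
    """Hypothetical base class for one middleware binding (e.g. Condor, Globus, SSH)."""
    name = "base"

    def submit(self, executable: str) -> str:
        raise NotImplementedError


class GlobusAdaptorStub(JobAdaptor):
    """Stand-in for a binding whose middleware is unavailable on this machine."""
    name = "globus"

    def submit(self, executable):
        raise AdaptorError("no Globus gatekeeper configured")


class LocalAdaptor(JobAdaptor):
    """Stand-in for a local fork binding; returns a fake job id."""
    name = "local"

    def submit(self, executable):
        return f"local-job:{executable}"


def submit_job(executable, adaptors):
    """Try each adaptor in turn; the application makes the same call whatever runs it."""
    errors = []
    for adaptor in adaptors:
        try:
            return adaptor.name, adaptor.submit(executable)
        except AdaptorError as exc:
            errors.append(f"{adaptor.name}: {exc}")
    raise AdaptorError("all adaptors failed: " + "; ".join(errors))


print(submit_job("/bin/hostname", [GlobusAdaptorStub(), LocalAdaptor()]))
# ('local', 'local-job:/bin/hostname')
```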


Triana in a SO World

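[Figure: Triana invokes the BabelFish translation service at babelfish.altavista.com across the network, sending "hello" with mode en_fr and receiving "bonjour"]

Service discovery: dynamic? Decentralized?

Communication: message format (SOAP?) and transport protocol (TCP? UDP?)

These choices are what the GAP Interface abstracts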
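The message-format question on this slide can be made concrete with a SOAP 1.1 request envelope for the BabelFish example. In this Python sketch the envelope namespace is the standard SOAP 1.1 one, but the service namespace `urn:example:babelfish`, the operation `translate` and the parameter names `mode` and `text` are hypothetical; they are not taken from the real BabelFish service description.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:babelfish"  # hypothetical service namespace

ET.register_namespace("soapenv", SOAP_ENV)  # serialize the envelope as soapenv:...


def build_request(operation: str, params: dict) -> str:
    """Build a SOAP 1.1 request: Envelope > Body > operation element > one child per parameter."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


request = build_request("translate", {"mode": "en_fr", "text": "hello"})
print(request)
```

The envelope says nothing about transport: the same XML could be posted over HTTP, or sent over TCP, UDP or a P2P pipe, which is why transport is listed as a separate question.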


GAP Interface

A simple service-based API for:

Service deployment

Service discovery

Pipe-based communication

Static application interface with multiple middleware bindings: P2PS, JXTA, Web services

[Figure: the GAP Interface and its bindings: P2PS (discovery, pipes), JXTA (discovery, pipes), Web services (UDDI, SOAP)]
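The GAP idea, one static interface with several middleware bindings, can be sketched as an abstract class plus a single toy binding. In this Python sketch the method names `deploy`, `discover` and `send` and the `mem://` endpoint scheme are hypothetical, not GAP's real API, and pipe-based communication is reduced to one request/response call.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class ServiceBinding(ABC):
    """Hypothetical middleware-neutral API in the spirit of GAP."""

    @abstractmethod
    def deploy(self, name: str, handler: Callable[[str], str]) -> None: ...

    @abstractmethod
    def discover(self, name: str) -> List[str]: ...

    @abstractmethod
    def send(self, endpoint: str, message: str) -> str: ...


class InMemoryBinding(ServiceBinding):
    """Toy binding keeping services in a dict; a P2PS, JXTA or Web services
    binding would implement the same three methods over real middleware."""

    def __init__(self):
        self._services: Dict[str, Callable[[str], str]] = {}

    def deploy(self, name, handler):
        self._services[name] = handler

    def discover(self, name):
        return [f"mem://{name}"] if name in self._services else []

    def send(self, endpoint, message):
        return self._services[endpoint.removeprefix("mem://")](message)


def call_service(binding: ServiceBinding, name: str, message: str) -> str:
    """Application code written once against the interface, not the binding."""
    endpoints = binding.discover(name)
    if not endpoints:
        raise LookupError(f"no service named {name!r}")
    return binding.send(endpoints[0], message)


binding = InMemoryBinding()
binding.deploy("reverse", lambda text: text[::-1])
print(call_service(binding, "reverse", "triana"))  # anairt
```

Supporting another middleware would mean adding another `ServiceBinding` subclass; `call_service` would not change.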


WSPeer

High-level interface to Web services: discovery, invocation, deployment, hosting

Abstracts away from the usual Web service discovery and communication mechanisms (i.e. UDDI and HTTP)

P2PS Web service discovery?

Uses Apache AXIS as its SOAP engine and extends the capabilities of Apache AXIS:

Stubless invocation (including complex types)

Non-standard transports (e.g. P2PS)
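Stubless invocation means calling a service operation without first generating client stub classes for it. The Python sketch below shows the client-side idea with a dynamic proxy; it is not WSPeer or Apache Axis code (both are Java), and `DynamicProxy`, `fake_transport` and the `urn:example:translator` endpoint are hypothetical. The fake transport stands in for SOAP over HTTP or P2PS.

```python
from typing import Any, Callable, Dict

Transport = Callable[[str, str, Dict[str, Any]], Any]


class DynamicProxy:
    """Client-side proxy built at run time: any public attribute becomes an operation.
    Nothing is generated ahead of time, unlike compiling stubs from a WSDL file."""

    def __init__(self, endpoint: str, transport: Transport):
        self._endpoint = endpoint
        self._transport = transport

    def __getattr__(self, operation: str) -> Callable[..., Any]:
        # Only called for attributes not found normally; private names are refused.
        if operation.startswith("_"):
            raise AttributeError(operation)

        def call(**params: Any) -> Any:
            return self._transport(self._endpoint, operation, params)

        return call


def fake_transport(endpoint, operation, params):
    # Stand-in for a real transport: looks the operation up in a local table.
    services = {
        ("urn:example:translator", "translate"):
            lambda mode, text: {"en_fr": {"hello": "bonjour"}}[mode][text],
    }
    return services[(endpoint, operation)](**params)


proxy = DynamicProxy("urn:example:translator", fake_transport)
print(proxy.translate(mode="en_fr", text="hello"))  # bonjour
```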


WSPeer

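[Figure: an Application uses WSPeer's deploy, publish, locate and invoke operations through two bindings. WSPeer – HTTP/UDDI: deploy launches an HTTP server, publish and locate go through a UDDI registry, and invoke calls the service hosted on that server. WSPeer – P2PS: deploy, publish, locate and invoke are all carried over P2PS.]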


Extending Triana for BDWorld

BDWorld proxy components talk to Web Services

Workflow Design Assistant (WfDA): selection and composition of BDWorld workflows from available services

Uses the Meta Data Repository (MDR) & Meta Data Agent (MDA)

MDR contains the mapping from proxies to resources

WfDA captures domain knowledge in constraints

Constraints used to limit the possible components at each stage of composition

Simplifies valid workflow creation
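Constraint-based composition can be illustrated by filtering a component registry at each step: a component is offered only if it accepts the data type produced so far and satisfies every domain constraint. The Python sketch below assumes that simple type-matching model; `ComponentSpec`, `candidates`, the component names and the `no_maps` constraint are hypothetical and do not describe the actual WfDA, MDR or BDWorld components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSpec:
    """Hypothetical metadata record, as a metadata repository might hold."""
    name: str
    input_type: str
    output_type: str


def candidates(registry, current_type, constraints=()):
    """Components that can follow a step producing `current_type`:
    type-compatible ones that also pass every domain constraint."""
    return [c for c in registry
            if c.input_type == current_type and all(rule(c) for rule in constraints)]


registry = [
    ComponentSpec("SpeciesLookup", "SpeciesName", "OccurrenceRecords"),
    ComponentSpec("ClimateModel", "OccurrenceRecords", "ClimateEnvelope"),
    ComponentSpec("MapPlotter", "OccurrenceRecords", "Map"),
    ComponentSpec("SequenceAligner", "Sequences", "Alignment"),
]


def no_maps(component):
    """Hypothetical domain constraint: this study wants no plain map outputs."""
    return component.output_type != "Map"


print([c.name for c in candidates(registry, "OccurrenceRecords")])
# ['ClimateModel', 'MapPlotter']
print([c.name for c in candidates(registry, "OccurrenceRecords", [no_maps])])
# ['ClimateModel']
```

Each added constraint can only shrink the candidate list, which is how encoded domain knowledge narrows the choices offered at each stage of composition.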


Conclusion

A workflow manager should:

Simplify scientific experimentation

Enable reuse at multiple levels:

Components, sub-workflows/compound components, collaboration

Abstract component and environment complexities:

Think of all components as services that perform a known task

Implied/context-based operations, e.g. file copy/move

Put the scientist back in control of the science, not the computing