Under the Hood of a Workflow Manager
Matthew Shields, BiodiversityWorld GRID workshop, NeSC, 30 June - 1 July
Matthew Shields, Cardiff University
Outline
What is Workflow management?
Why should I care?
Current State of the Art
Workflow Languages
Other Projects
Triana, Architecture & Services
Extending Triana for BDWorld
Conclusion
What is Workflow Management?
Concept comes from business world
Many years of research and practice
Process capture and reuse
Repeatability, provenance, audit trails & accountability
Domain expert knowledge capture
Analysis and optimization
What Can a Workflow Manager do for Me?
Scientific workflow has a different focus to business workflow
  Large-scale data collection
  Querying
  Analysis
  Visualization
Similar goals
  Component & workflow reuse
  Knowledge capture
Additional goals
  Simplified application/experiment design
  Environment/Complexity abstraction
State of the Art
Schedule workflow tasks (Grid/distributed environment)
Monitor/Control execution
Active visualization and computational steering
User interaction
Pause and restart
Data provenance
Component and sub-workflow reuse
Analysis and optimization
Workflow Languages
No current agreed standard
Most projects use DAG or Petri-Net
Data vs control flow
Dependency vs scripting language
Many XML schemas
Business workflow standards - BPEL
  Not a good enough fit
GGF WFM-RG
  Attempting to solicit agreement on standards
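The DAG-style dependency model mentioned above can be illustrated with a minimal sketch. It is not the format of any project named in these slides: the task names and functions are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real workflow engine.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each receives a dict of upstream results.
def load(inputs):
    return [3, 1, 2]

def sort_data(inputs):
    return sorted(inputs["load"])

def summarise(inputs):
    data = inputs["sort_data"]
    return {"min": data[0], "max": data[-1]}

TASKS = {"load": load, "sort_data": sort_data, "summarise": summarise}

# Dependency graph: task name -> set of tasks it depends on.
DEPENDS_ON = {
    "load": set(),
    "sort_data": {"load"},
    "summarise": {"sort_data"},
}

def run_workflow(tasks, depends_on):
    """Run every task once, after all of its dependencies have run."""
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = {dep: results[dep] for dep in depends_on[name]}
        results[name] = tasks[name](upstream)
    return results

results = run_workflow(TASKS, DEPENDS_ON)
```

Here the graph only states dependencies; the order of execution is derived from it, which is the "dependency" side of the dependency vs scripting language distinction.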
Workflow Management Projects
myGrid/Taverna - Southampton & others
  XML/DAG based workflow language
  Initially WS choreography tool - now incorporates local tools/components
  Grid integration with databases via OGSA Distributed Query Processor
  myGrid Project main users - Bioinformatics
Kepler - SDSC
  Based on Ptolemy - modeling, simulation & design of real time & concurrent systems
  Concurrent dataflow
  Actors (components), Directors (workflow engines)
  Local, Web Service & Grid Service actors
  Ecology, biology, chemistry, oceanography, and the geosciences
WM Projects 2
Karajan/Commodity Grid (CoG) Kit, Argonne & Berkeley
Scripting workflow language for Grid tasks
Integration with Globus Toolkit GT3 & GT4
Pure control flow
Data flow performed by data tasks - GridFTP
And many more… See:
http://www.gridworkflow.org/snips/gridworkflow/
http://www.extreme.indiana.edu/swf-survey/
Triana
Cardiff University! PPARC funded
Java based Scientific Workflow Tool or PSE
Originally designed for Signal Processing
Now domain independent
  Bioinformatics - obviously!
  Signal Processing - gravitational wave detection & radio astronomy
  Design optimisation
  Data mining
  Medical imaging
  Distributed Audio Processing
Triana Components
Local Java components
Service-oriented Components
  Web services as components (WSRF coming soon)
  Web service workflow
  Peer 2 Peer services as components
  Distributed service workflow
Grid-oriented Components
  Grid file and job primitives as components
  Complex Grid workflow
Legacy code components via GridMonSteer
Mix and Match composition
Workflow
Inherently data flow based
  Control flow through “messages”
XML/DCG workflow format
  Internally workflow language independent
  Migration to standards based language
Simple Parent/Child relationship between tasks
Context based implied actions
  Local file -> local file = file copy
  Local file -> remote file = file transfer
Import/Export other workflow formats
  Pegasus/EGEE read/write DAGMan format
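The context based implied actions above can be sketched as a small rule that inspects the two ends of a connection. This is an illustration only, not Triana code: `FileRef` and `implied_action` are hypothetical names, and treating every connection with a remote end as a transfer is an assumption, since the slide only covers the local-to-local and local-to-remote cases.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FileRef:
    """Hypothetical file location; host is None for a local file."""
    path: str
    host: Optional[str] = None

    @property
    def is_remote(self):
        return self.host is not None

def implied_action(source, target):
    """Pick the action implied by connecting source to target."""
    if not source.is_remote and not target.is_remote:
        return "file copy"
    # Assumption: any connection with a remote end becomes a transfer.
    return "file transfer"

local_in = FileRef("input.dat")
local_out = FileRef("copy.dat")
remote_out = FileRef("copy.dat", host="example.org")

copy_action = implied_action(local_in, local_out)
transfer_action = implied_action(local_in, remote_out)
```

The point of the design is that the user only draws the connection; the kind of operation follows from the context of its endpoints.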
Triana Architecture
[Figure: Triana architecture. Diagram labels: GAP Interface (P2PS discovery and pipes, JXTA discovery and pipes, Web Services via UDDI and SOAP); GAT Interface; Condor, Globus, RLS, Unicore, PBS, GridLab GRMS, SGE, SSH, WSRF, LDR, .NET, GridFTP, other.]
A Graphical Grid Computing Environment or Portal
Grid Computing: job submission, file services
Service Based Computing: deployment, discovery and communication with distributed services, e.g. P2P and (GSI) Web services, Grid services
Triana in a SO World
[Figure: a BabelFish translation service (babelfish.altavista.com) reached over the network through GAP; the en_fr operation maps "hello" to "bonjour".]
Service Discovery
  Dynamic?
  Decentralized?
Communication
  Message format: SOAP?
  Transport protocol: TCP? UDP?
GAP Interface
A simple service based API for
  Service Deployment
  Service Discovery
  Pipe Based Communication
Static application interface with multiple middleware bindings
  P2PS
  JXTA
  Web services
[Figure: GAP Interface bindings. Diagram labels: P2PS (discovery, pipes), JXTA (discovery, pipes), Web Services (UDDI, SOAP).]
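The idea of a static application interface with multiple middleware bindings can be sketched as follows. None of these names come from the GAP API: `ServiceBinding`, `InMemoryBinding` and `translate` are hypothetical, and the in-memory class merely stands in for a real P2PS, JXTA or Web services binding. The en_fr example reuses the translation service from the earlier slide.

```python
from abc import ABC, abstractmethod

class ServiceBinding(ABC):
    """Hypothetical binding: deployment, discovery, communication."""

    @abstractmethod
    def deploy(self, name, handler): ...

    @abstractmethod
    def discover(self, name): ...

    @abstractmethod
    def send(self, name, message): ...

class InMemoryBinding(ServiceBinding):
    """Stand-in for a real middleware binding (assumption)."""

    def __init__(self, label):
        self.label = label
        self._services = {}

    def deploy(self, name, handler):
        self._services[name] = handler

    def discover(self, name):
        return name in self._services

    def send(self, name, message):
        return self._services[name](message)

def translate(binding, word):
    """Application code written once, against the interface only."""
    if not binding.discover("en_fr"):
        raise LookupError("service en_fr not found")
    return binding.send("en_fr", word)

def en_fr(word):
    return {"hello": "bonjour"}.get(word, word)

results = []
for binding in (InMemoryBinding("p2ps-like"), InMemoryBinding("ws-like")):
    binding.deploy("en_fr", en_fr)
    results.append((binding.label, translate(binding, "hello")))
```

Swapping the binding changes how discovery and communication happen, while `translate` stays untouched.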
WSPeer
High Level Interface to Web Services
  Discovery
  Invocation
  Deployment
  Hosting
Abstracts from usual Web Service Discovery and Communication Mechanisms (i.e. UDDI and HTTP)
  P2PS Web Service Discovery?
Uses Apache AXIS as SOAP Engine
Extends Capabilities of Apache AXIS
  Stubless Invocation (including complex types)
  Non Standard Transports (i.e. P2PS)
WSPeer
[Figure: an application using WSPeer through two bindings. WSPeer – HTTP/UDDI: deploy (launch HTTP server), publish and locate via UDDI, invoke. WSPeer – P2PS: deploy, publish, locate, invoke.]
Extending Triana for BDWorld
BDWorld proxy components talk to Web Services
Workflow Design Assistant (WfDA): selection and composition of BDWorld workflows from available services
Uses Meta Data Repository (MDR) & Meta Data Agent (MDA)
MDR contains mapping from proxies to resources
WfDA captures domain knowledge in constraints
Constraints used to limit the possible components at each stage of composition
Simplifies valid workflow creation
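How constraints can limit the possible components at each stage can be sketched with a toy catalogue. This is not the WfDA implementation: the component names, the type labels, and the choice of "input type must match the previous output type" as the constraint are all assumptions made for illustration.

```python
# Hypothetical catalogue: component name -> (input type, output type).
CATALOGUE = {
    "SpeciesNameLookup": ("text", "species_name"),
    "OccurrenceQuery": ("species_name", "occurrence_records"),
    "DistributionModel": ("occurrence_records", "distribution_map"),
    "MapViewer": ("distribution_map", "image"),
}

def candidates(previous_output):
    """Components whose input type matches the previous stage's output."""
    return sorted(
        name
        for name, (input_type, _output_type) in CATALOGUE.items()
        if input_type == previous_output
    )

# After a stage that produces a species name, only one component is valid.
next_after_lookup = candidates("species_name")

# Nothing in the catalogue consumes an image, so composition ends there.
next_after_viewer = candidates("image")
```

At each stage the user chooses only from `candidates(...)`, so every workflow that can be built this way is valid by construction.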
Conclusion
A workflow manager should:
Simplify scientific experimentation
Enable reuse at multiple levels
  Component
  Sub-workflow/Compound components
  Collaboration
Abstract component and environment complexities
  Think of all components as a service that performs a known task
  Implied/Context based operations - file copy/move
Put the scientist back in control of the science, not the computing