23
Programming Scientific and Distributed Workflow with Triana Services Matthew Shields, GGF10 Workflow Workshop, 9th March

Programming Scientific and Distributed Workflow with Triana Services Matthew Shields, GGF10 Workflow Workshop, 9th March

Embed Size (px)

Citation preview

Programming Scientific and Distributed Workflow with Triana Services

Matthew Shields, GGF10 Workflow Workshop, 9th March

Matthew Shields, Cardiff University

Presentation Outline

TrianaOverviewTriana services and their distribution Distribution policiesThe GAP interface and its relation to the Gridlab GAT

Scientific WorkflowBinary Inspiral Algorithm Example

Dynamic Distributed WorkflowService Composition on the GridService Usage, dynamically distributing a Triana workflow

Conclusion

Matthew Shields, Cardiff University

What is Triana?

Matthew Shields, Cardiff University

GAP Any

GAP service

e.g. Web service

Triana Distributed Work-flow

Network

ActionCommand

s

Workflow, e.g. BPEL4WS

TrianaEngine

Triana ControllingService (TCS)

TrianaService &

Engine

TrianaService &

Engine

OtherEngine

Distributed Triana Work-flow- flexible distribution: based around Triana Groups- HPC and Pipelined distribution

Triana Gateway

Matthew Shields, Cardiff University

GAP Overview based around a series of Java interface classesConcrete implementations that form the GAP bindings The core interface is the

Service Creation and DiscoveryPipe Creation and DiscoveryMessage CommunicationInformationJob SubmmissionData Management - transfers - logical lookup

Will be become an adapter for the GridLab Java GAT, providing:

Advertisement, Discovery, deployment and communication of servicesGRMS job submission adapterData Management Services

Matthew Shields, Cardiff University

Jxtaserve GSI EnabledNS-2And more..

Java GAT Prototype

Jxta

GridLab GAT (www.gridlab.org)

• Advertising• Discovery• Communication

GAP (Java Prototype)

Web Services

P2PS

Job Submission (GRMS)

• Generic Job Submission• Virtual filename data accessData

Management

• Set of generic Java interfaces• high level abstractions to Grid services• Factory design – dynamic pluggable services

OGSA(planned)

Matthew Shields, Cardiff University

Triana Prototype

Distributed Triana Prototype Based around Triana Groups i.e. aggregate toolsEach group can be distributedDistribution policies:

HTC - high throughput/task farmingPipeline - allow node to node communication

Each service can be a gateway to finer granularities of distribution:

PipelineDistribution

Task-FarmingDistribution

Triana Service

Triana Service

Triana Service

Triana Service

Triana Service

Triana Service

Matthew Shields, Cardiff University

Triana Workflow

Triana is inherently flow basedData flow - data arriving at component triggers executionControl flow - control commands trigger execution

Decentralised executionData or Control messages sent along communication “pipes” from sender to receiver causes receiver to executeSynchronous or Asynchronous messaging (Implementation dependant)Multiple inputs can block or trigger immediately (Component designer defined)

Matthew Shields, Cardiff University

Components and Definitions

Component is unit of executionComponents are defined in XML files:

Naming informationInput and output portsParameter information

Why Components?To simplify the application design process and to speed up application development

The component model provides an infrastructure for the interaction of components

Matthew Shields, Cardiff University

Taskgraph

Internal object based workflow graph representationTaskgraph - DAGTasksConnections

External XML representationSimple XML syntaxList of participating Task definitionsParent/Child connectionHierarchical (Compound components)

Alternative Languages & Syntaxe.g. BPEL4WSAvailable through pluggable readers & writers.

Matthew Shields, Cardiff University

Workflow

No explicit language support for control constructsLoops and execution branching handled by components

Loop component - controls loop over sub-workflowLogical component - control workflow branching

Unlike BPEL4WS or similarFlexibility of control - constraint based loops etc…

Matthew Shields, Cardiff University

Distributing Triana Workflow

Deploying Remote Services on ResourcesService application installation Service executionService discovery

Mapping tasks or groups of tasks to Services

Workflow rewiring, XML definition for connections modified for remote location - sub-workflows duplicatedData distribution, annotated sub-sections of taskgraph passed to resources

Matthew Shields, Cardiff University

GEO 600 Inspiral Search

BackgroundCompact binary stars orbiting each other in a close orbit

among the most powerful sources of gravitational waves

As the orbital radius decreases a characteristic chirp waveform is produced - amplitude and frequency increase with time until eventually the two bodies merge together

Computing Need 10 Gigaflops to keep up with real time data (modest search..)

Data 8kHz in 24-bit resolution (stored in 4 bytes) -> Signal contained within 1 kHz = 2000 samples/seconddivided into chunks of 15 minutes in duration (i.e. 900 seconds) = 8MB

Algorithm Data is transmitted to a nodeNode initialises i.e. generates its templates (around 10000)fast correlates its templates with data

Matthew Shields, Cardiff University

Coalescing Binary Search

GEO 600 Coalescing

Binary Search Algorithm

implemented as a Triana workflow

Matthew Shields, Cardiff University

Coalescing Binary Scenario

GridlabTest-bed

GW Data

Distributed Storage

Logical File Name

CB Search

Controller

GAT (GRMS, Adaptive)

GW Data

GAT (Data Management)

• Submit Job• Optimised Mapping

Email, SMS notification

Matthew Shields, Cardiff University

GRMS Web Service

rage1.man.poznan.pl

GridlabTestbed

GAP

Triana Service Job Submission

Matthew Shields, Cardiff University

Triana GRMS Component

Front end to GridLab GRMS Web ServiceJob Submission Service - interfaces with GRAM

GAP Web Service binding + GSI AuthenticationJava CoG Kit

X509 Certificate handlingAxis authentication & communication

GRMS executes applications on GridLab Testbed

Heterogeneous hardware platforms Default software - Globus 2.4, GSISSH, cc, cvs, c++, F90, make, perl, mpicc

Matthew Shields, Cardiff University

Service Composition Workflow

Multiple GRMS Components

Install Applications (ftp, tar, ant)

Start installed Triana Services

Matthew Shields, Cardiff University

Dynamic Distributed Workflow

Distribution units are standard Triana tools, enabling users to create their own custom distributions

DistributionUnit

Wave Grapher

GaussianFFT

GaussianFFT

RemoteServices

LocalTriana

The workflow is cloned/split/rewired to achieve the required distribution topology

Custom distribution units allow sub-workflows to be distributed in parallel or pipelined

Matthew Shields, Cardiff University

Conclusion

GridlabTest-bed

GW Data

Distributed Storage

Logical File Name

CB Search

Controller

GAT (GRMS, Adaptive)

GW Data

GAT (Data Management)

• Submit Job• Optimised Mapping

Email, SMS notification

Matthew Shields, Cardiff University

Conclusion

Shown three distinct workflowsService composition workflow to submit grid jobs that deploys multiple Triana Services on remote resourcesLocal scientific workflow representing the algorithmDynamic distributed workflow - rewire local workflow for data parallelism across multiple Triana Services

GAP APIWeb Service binding + GSI - Grid Job SubmissionP2PS binding - service discovery + service communication

Combined to perform parallel scientific computation

Matthew Shields, Cardiff University

Thanks !

• The Astronomers: Prof. B Sathyaprakash, David Churches, Roger Philp and Craig Robinson

• The Triana team: Ian Wang, Andrew

Harrison, Omer Rana, Diem Lam and Shalil Majithia

• All the partners in the GridLab project

Matthew Shields, Cardiff University

Thanks !Information & Software

http://www.trianacode.org/

http://www.gridlab.org/