A Hybrid Decomposition Scheme for Building Scientific Workflows Wei Lu Indiana University


Page 1: A Hybrid Decomposition Scheme for Building Scientific Workflows

A Hybrid Decomposition Scheme for Building Scientific Workflows

Wei Lu

Indiana University

Page 2: A Hybrid Decomposition Scheme for Building Scientific Workflows

Application Decomposition

• Large scientific applications require
  – Decomposing the problem into manageable units
  – Units need to be
    • Self-described
    • Self-encapsulated
    • Independently developed and deployed
    • Composable
• Two decomposition dimensions
  – Functional Decomposition (a.k.a. Spatial Decomposition)
    • C/C++, Java
    • Component
  – Temporal Decomposition
    • Unix Pipe
    • Workflow
  – However, most PSEs provide only one approach to the exclusion of the other

Our work

Page 3: A Hybrid Decomposition Scheme for Building Scientific Workflows

Common Component Architecture (CCA)

• Scientific computing imposes special requirements
  – Support for legacy software
  – Performance is crucial
  – Languages and data types
    • Fortran, C/C++, Python, Java, etc.
    • Complex numbers and arrays (as first-class objects)
  – Support for the various parallel run-time platforms
• CCA
  – Component framework specification
  – Designed for scientific high-performance computing
  – Aims at improving the reuse of scientific software

Page 4: A Hybrid Decomposition Scheme for Building Scientific Workflows

CCA Component

• Each component describes
  – What functionality it fulfills
    • Provide port
  – What functionality it needs to fulfill its task
    • Use port
• Use-Provide pattern
  – Plug-and-play
• The port is described in SIDL
  – Scientific Interface Definition Language
  – Partially derived from CORBA IDL
  – With constructs to describe complex numbers, arrays, etc.
  – Babel: Language Interoperability Tool

[Figure: components NonlinearFunction and LinearFunction (each with a FunctionPort) and MidpointIntegrator (FunctionPort, IntegratorPort), annotated with the implementation languages C, Fortran, and Python.]

Page 5: A Hybrid Decomposition Scheme for Building Scientific Workflows

Example of the CCA Composition

interface IntegratorPort extends gov.cca.Port {
    double integrate(in double lowBound, in double upBound, in int count);
}
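The use-provide wiring behind this interface can be sketched in Python. This is only an illustration under stated assumptions: the `FunctionPort` method name `evaluate`, the `connect_function` helper, and the integrand f(x) = 2x + 1 are hypothetical and do not come from the slides or from the CCA specification.

```python
from abc import ABC, abstractmethod


class FunctionPort(ABC):
    """Port interface: something that can evaluate f(x). (Method name is assumed.)"""

    @abstractmethod
    def evaluate(self, x):
        ...


class IntegratorPort(ABC):
    """Python rendering of the SIDL IntegratorPort signature."""

    @abstractmethod
    def integrate(self, low_bound, up_bound, count):
        ...


class LinearFunction(FunctionPort):
    """Provides FunctionPort; f(x) = 2x + 1 is an arbitrary illustrative choice."""

    def evaluate(self, x):
        return 2.0 * x + 1.0


class MidpointIntegrator(IntegratorPort):
    """Provides IntegratorPort and uses a FunctionPort."""

    def __init__(self):
        self._function = None

    def connect_function(self, port):
        # The framework, not the component, decides which provider is plugged in.
        self._function = port

    def integrate(self, low_bound, up_bound, count):
        if self._function is None:
            raise RuntimeError("FunctionPort is not connected")
        width = (up_bound - low_bound) / count
        # Midpoint rule: sample the integrand at the centre of each sub-interval.
        return width * sum(
            self._function.evaluate(low_bound + (i + 0.5) * width)
            for i in range(count)
        )


integrator = MidpointIntegrator()
integrator.connect_function(LinearFunction())   # plug-and-play wiring
result = integrator.integrate(0.0, 1.0, 100)
```

Swapping `LinearFunction` for another `FunctionPort` provider requires no change to `MidpointIntegrator`, which is the plug-and-play property the use-provide pattern is meant to give.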

Page 6: A Hybrid Decomposition Scheme for Building Scientific Workflows

Ccaffeine

• Parallel implementation of the CCA framework
• SCMD (Single Component Multiple Data)
  – Inter-component communication
    • Virtual function call in the same address space
  – Intra-component communication
    • Could be MPI, PVM, etc.

Page 7: A Hybrid Decomposition Scheme for Building Scientific Workflows

Kepler

• Scientific workflow environment
  – Data-flow oriented
• Basic unit: Actor
  – Input, Output
  – Typed dataflow structure
  – Many domain-specific actors supporting
    • biology, ecology, astronomy
  – General facility actors
    • Grid service actor
    • Web service actor
• Wire the actors by piping

[Figure: a GridFtp actor piped to a Classifier actor; port labels: URL, Credential, localFilePath.]
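As a rough illustration of wiring actors by piping, the sketch below chains two actors producer-to-consumer using Python generators. It is not Kepler's API (Kepler actors are Java classes); the `Actor` and `pipe` names, the URLs, and the two stand-in functions are all hypothetical, and no real GridFTP transfer takes place.

```python
from typing import Callable, Iterable, Iterator


class Actor:
    """A dataflow unit: consumes input tokens and produces output tokens."""

    def __init__(self, name: str, fire: Callable[[object], object]) -> None:
        self.name = name
        self._fire = fire

    def run(self, tokens: Iterable[object]) -> Iterator[object]:
        # One output token per input token; evaluation is lazy.
        for token in tokens:
            yield self._fire(token)


def pipe(source: Iterable[object], *actors: Actor) -> Iterator[object]:
    """Wire actors producer-to-consumer, in the order given."""
    stream: Iterable[object] = source
    for actor in actors:
        stream = actor.run(stream)
    return iter(stream)


# Hypothetical stand-ins: "fetch" maps a URL to a local file path,
# "classify" labels the file by its extension.
fetch = Actor("fetch", lambda url: "/tmp/" + url.rsplit("/", 1)[-1])
classify = Actor(
    "classify",
    lambda path: (path, "csv" if path.endswith(".csv") else "other"),
)

urls = ["gsiftp://host/data/a.csv", "gsiftp://host/data/b.bin"]
results = list(pipe(urls, fetch, classify))
```

Each actor only sees the tokens arriving on its input, so the producer and consumer stay loosely coupled.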

Page 8: A Hybrid Decomposition Scheme for Building Scientific Workflows

Compare Side by Side

Workflow (Kepler)

• Actor
  – Stands for one function
• Port
  – Input/Output
  – A data-structure definition
• Connection
  – Producer to Consumer
• Composition defines “How”
• Advantages
  – Loosely coupled
  – Supports distributed resource sharing

Component (CCA)

• Component
  – Stands for one class
• Port
  – Provide/Use
  – An interface signature
• Connection
  – Caller to Callee
• Composition defines “What”
• Advantages
  – Good performance
  – Supports parallel programming model

Page 9: A Hybrid Decomposition Scheme for Building Scientific Workflows

A Hybrid solution

• Typical scientific applications
  – Involve multiple distributed data-processing phases
  – Among those phases there are a number of computationally intensive cores, which
    • are often classical numerical algorithms
    • need a high-performance execution environment
• The hybrid scheme
  – First use the workflow scheme to decompose based on the distribution of the resources
  – Then use the component scheme to further decompose the computationally intensive sub-problems to form the parallel solution
• Benefit from both schemes

Page 10: A Hybrid Decomposition Scheme for Building Scientific Workflows

Service over Components

• Building web services over the CCA
  – Web service = good interoperability
  – Kepler supports web services as actors
  – More resources and protocols (e.g., WS-BPEL)
• Façade pattern
  – External view through the coarse-grained web service
  – Internal functionality through the fine-grained components
• Factory pattern
  – A workflow needs
    • a task-specific service rather than a meta-level service
  – The task-specific service
    • should be created dynamically and on demand
  – But a service is not instantiable!

[Figure: a service creates a task-specific service.]

Page 11: A Hybrid Decomposition Scheme for Building Scientific Workflows

Architecture

• Job
  – A specific task performed by a group of wired components
• Two-phase execution
  – Compose the job
  – Run the job
• Two explicitly separated web services (CCA-Services)
  – Factory Service
  – Job Proxy

[Figure: architecture diagram; labels: Factory Service, Ccaffeine Framework, IPC, Job Proxy, Composer, User, Invocation, Job description.]

Page 12: A Hybrid Decomposition Scheme for Building Scientific Workflows

Job Factory Service

• A Façade for the ccaffeine framework
  – Connects to the ccaffeine muxer via a socket
  – Maintains the job table and the job lifecycle
• Create
  – Parameters
    • Gateway port
      – the task-specific interface
    • Composition description
      – how the components are wired to support the Gateway port
  – Convert the SIDL to WSDL
    • Gateway port definition to the equivalent WSDL
  – Forward the composition commands to the ccaffeine muxer
    • Will be executed in parallel
  – Maintain job records internally
  – Create the Job Proxy service
    • Return its WSDL URL
• Modify
  – Change the composition without impacting the service interface
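The Create and Modify operations can be sketched as a small factory that keeps a job table and forwards composition commands. Everything named here is an assumption for illustration: `FakeMuxer` only records commands instead of talking to the ccaffeine muxer over a socket, the command strings are made up and are not ccaffeine syntax, the base URL is a placeholder, and the SIDL-to-WSDL conversion step is left out.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    gateway_port: str      # SIDL text of the task-specific interface
    composition: list      # composition commands, kept as opaque strings
    state: str = "composed"


class FakeMuxer:
    """Stand-in for the socket to the ccaffeine muxer; it only records commands."""

    def __init__(self):
        self.sent = []

    def send(self, command):
        self.sent.append(command)


class JobFactory:
    def __init__(self, muxer, base_url="http://example.org/jobs"):
        self._muxer = muxer
        self._base_url = base_url          # placeholder address
        self._ids = itertools.count(1)
        self.job_table = {}

    def _proxy_wsdl_url(self, job_id):
        return f"{self._base_url}/{job_id}?wsdl"

    def create(self, gateway_port, composition):
        job = Job(next(self._ids), gateway_port, list(composition))
        for command in job.composition:
            self._muxer.send(command)      # forward the composition commands
        self.job_table[job.job_id] = job   # keep the job record
        return self._proxy_wsdl_url(job.job_id)

    def modify(self, job_id, composition):
        # The gateway port is untouched, so the proxy's WSDL URL stays the same.
        job = self.job_table[job_id]
        job.composition = list(composition)
        for command in job.composition:
            self._muxer.send(command)
        return self._proxy_wsdl_url(job_id)


muxer = FakeMuxer()
factory = JobFactory(muxer)
wsdl_url = factory.create(
    "interface IntegratorPort extends gov.cca.Port { ... }",
    ["wire Integrator -> LinearFunction"],        # made-up command text
)
same_url = factory.modify(1, ["wire Integrator -> NonlinearFunction"])
```

Because `modify` changes only the composition, a workflow that already holds the WSDL URL keeps working against the same interface.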

Page 13: A Hybrid Decomposition Scheme for Building Scientific Workflows

Job Proxy Service

• Façade for the wired components
• With a task-specific WSDL interface
• When getting the SOAP message
  – Extract the arguments from the message
  – Pass the arguments to ccaffeine
  – Invoke ccaffeine
  – Get the result from the Driver and send the SOAP response

[Figure: message flow; labels: User, SOAP request, Job Proxy, Arguments, Driver.]
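The request-handling steps of the Job Proxy can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under assumptions: the argument names and types are taken from the IntegratorPort example, the message layout (un-namespaced operation and argument elements inside a SOAP 1.1 body) is assumed, and `fake_driver` is a stand-in that does not invoke ccaffeine.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)

# Argument types of the task-specific operation (from the IntegratorPort example).
ARG_TYPES = {"lowBound": float, "upBound": float, "count": int}


def handle_soap(request_xml, driver):
    """Extract the arguments, call the driver, and wrap the result in a response."""
    envelope = ET.fromstring(request_xml)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    operation = body[0]                       # first child names the operation
    args = {child.tag: ARG_TYPES[child.tag](child.text) for child in operation}
    result = driver(**args)

    response = ET.Element(f"{{{SOAP_NS}}}Envelope")
    response_body = ET.SubElement(response, f"{{{SOAP_NS}}}Body")
    response_op = ET.SubElement(response_body, operation.tag + "Response")
    ET.SubElement(response_op, "return").text = repr(result)
    return ET.tostring(response, encoding="unicode")


def fake_driver(lowBound, upBound, count):
    # Stand-in for the wired components: returns the interval width, not an integral.
    return upBound - lowBound


request = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <integrate><lowBound>0.0</lowBound><upBound>1.0</upBound><count>4</count></integrate>
  </soap:Body>
</soap:Envelope>"""

response = handle_soap(request, fake_driver)
```

A real proxy would derive `ARG_TYPES` from the job's WSDL rather than hard-coding it.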

Page 14: A Hybrid Decomposition Scheme for Building Scientific Workflows

Example

[Figure: example interaction; labels: Factory Service, socket, Composer, Gateway port, composition, Job Proxy, Go, User, SOAP, Job WSDL, Job table.]

Page 15: A Hybrid Decomposition Scheme for Building Scientific Workflows

Convert SIDL to WSDL

• SIDL
  – Port interface (methods)
  – Object oriented
  – Port interface
    • A virtual interface
    • Inheritance, polymorphism
    • Can be referred to as a function parameter type
  – No data structure so far
• WSDL
  – PortType (operations)
  – Wire-format description
  – PortType
    • A group of message exchanges
    • No inheritance, no polymorphism
    • Cannot be referred to as a method parameter type
  – Any type is essentially a data structure (by XML Schema)

Challenge: no way to figure out the structural information from a SIDL port interface!

Current workaround: only allow methods with primitive argument types

Introducing structure in SIDL will alleviate the problem reasonably

Page 16: A Hybrid Decomposition Scheme for Building Scientific Workflows

Example

interface IntegratorPort extends gov.cca.Port {
    double integrate(in double lowBound, in double upBound, in int count);
}

<wsdl:message name="integrateInput">
  <wsdl:part name="lowBound" type="xsd:double"/>
  <wsdl:part name="upBound" type="xsd:double"/>
  <wsdl:part name="count" type="xsd:integer"/>
</wsdl:message>
<wsdl:message name="integrateOutput">
  <wsdl:part name="return" type="xsd:double"/>
</wsdl:message>
<wsdl:portType name="integrator.IntegratorPort_PortType">
  <wsdl:operation name="integrate">
    <wsdl:input message="integrateInput"/>
    <wsdl:output message="integrateOutput"/>
  </wsdl:operation>
</wsdl:portType>
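A conversion of this kind, restricted to primitive argument types as in the stated workaround, could look roughly like the Python sketch below. It is an assumption-laden illustration, not the authors' converter: the function names are hypothetical, only `in` parameters of type `double` and `int` are handled, the `int` to `xsd:integer` mapping simply follows the example above, and the output is a fragment without the enclosing `wsdl:definitions` element.

```python
import re

# Type mapping copied from the slide's example; other SIDL types are rejected.
XSD_TYPES = {"double": "xsd:double", "int": "xsd:integer"}
METHOD_RE = re.compile(r"(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;")


def xsd_type(sidl_type):
    try:
        return XSD_TYPES[sidl_type]
    except KeyError:
        raise ValueError(f"non-primitive SIDL type not supported: {sidl_type}") from None


def sidl_method_to_wsdl(port_name, method_decl):
    """Emit WSDL message and portType elements for one SIDL method declaration."""
    match = METHOD_RE.search(method_decl)
    if match is None:
        raise ValueError("not a SIDL method declaration")
    return_type, name, params = match.groups()

    lines = [f'<wsdl:message name="{name}Input">']
    for param in filter(None, (p.strip() for p in params.split(","))):
        tokens = param.split()
        if len(tokens) != 3 or tokens[0] != "in":
            raise ValueError(f"only 'in <type> <name>' parameters are handled: {param}")
        _, sidl_type, param_name = tokens
        lines.append(f'  <wsdl:part name="{param_name}" type="{xsd_type(sidl_type)}"/>')
    lines += [
        "</wsdl:message>",
        f'<wsdl:message name="{name}Output">',
        f'  <wsdl:part name="return" type="{xsd_type(return_type)}"/>',
        "</wsdl:message>",
        f'<wsdl:portType name="{port_name}_PortType">',
        f'  <wsdl:operation name="{name}">',
        f'    <wsdl:input message="{name}Input"/>',
        f'    <wsdl:output message="{name}Output"/>',
        "  </wsdl:operation>",
        "</wsdl:portType>",
    ]
    return "\n".join(lines)


wsdl = sidl_method_to_wsdl(
    "integrator.IntegratorPort",
    "double integrate(in double lowBound, in double upBound, in int count);",
)
```

Any parameter whose type is another port or a structured type raises `ValueError`, which mirrors the challenge noted earlier: a SIDL port interface carries no structural information to translate.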

Page 17: A Hybrid Decomposition Scheme for Building Scientific Workflows

Kepler Web Service Actor

• Kepler provides a general web service actor
• For a method defined in the WSDL
  – The actor dynamically adjusts its input/output settings

Page 18: A Hybrid Decomposition Scheme for Building Scientific Workflows

Kepler CCA-Service Actor

• For the CCA-Service
  – Recall that we have 2 explicit steps
  – The JobProxy service is dynamically created
  – We need to hide the procedure of creating the JobProxy service from the user
• CCA-Service Actor
  – Extended from the web service actor
  – First calls the JobFactory service to create the JobProxy service
  – With the WSDL of the JobProxy, it does the same thing as a general web service actor does
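The way the CCA-Service actor hides the creation step can be sketched as a subclass that obtains the JobProxy WSDL first and then defers to the general actor. This is not Kepler code (Kepler actors are written in Java); the class names, the `create_job` and `fetch_operations` callables, and the URL are hypothetical stand-ins.

```python
class WebServiceActor:
    """Stand-in for a general web service actor: its ports are derived from a WSDL."""

    def __init__(self, wsdl_url, fetch_operations):
        self.wsdl_url = wsdl_url
        # fetch_operations is a hypothetical callable:
        # WSDL URL -> {operation name: [input part names]}
        self.operations = fetch_operations(wsdl_url)

    def input_ports(self, operation):
        return list(self.operations[operation])


class CCAServiceActor(WebServiceActor):
    """Calls the factory first, then behaves like a plain web service actor."""

    def __init__(self, create_job, gateway_port, composition, fetch_operations):
        # Step 1 (hidden from the user): ask the JobFactory for a JobProxy.
        wsdl_url = create_job(gateway_port, composition)
        # Step 2: configure ports from the JobProxy's WSDL, as the base actor does.
        super().__init__(wsdl_url, fetch_operations)


# Stubs standing in for the JobFactory service and for WSDL parsing.
def stub_create_job(gateway_port, composition):
    return "http://example.org/jobs/1?wsdl"


def stub_fetch_operations(wsdl_url):
    return {"integrate": ["lowBound", "upBound", "count"]}


actor = CCAServiceActor(
    stub_create_job,
    "interface IntegratorPort extends gov.cca.Port { ... }",
    [],
    stub_fetch_operations,
)
ports = actor.input_ports("integrate")
```

From the workflow author's point of view only one actor is configured; the two-step create-then-invoke procedure stays inside the constructor.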

Page 21: A Hybrid Decomposition Scheme for Building Scientific Workflows

Change the GUI from socket-stream based to SOAP-message based.

Page 23: A Hybrid Decomposition Scheme for Building Scientific Workflows

Conclusion

• A hybrid decomposition scheme for scientific applications
• The workflow scheme is used first, based on the resource distribution
• The component scheme is used to further decompose the core parts
• The web service interface is the key to the integration
• CCA integrates into Kepler as a special actor, with a GUI supporting a unified visual environment
• Converting SIDL to WSDL is inherently challenging; structure is useful for distributed systems, so we need to introduce structure into SIDL

Page 24: A Hybrid Decomposition Scheme for Building Scientific Workflows

Thanks

• Thanks to the reviewers for their valuable comments