An Extensible System for Design and Execution of Scientific Workflows
San Diego Supercomputer Center (SDSC), University of California, San Diego (UCSD)
Kepler (UCSD and UC Davis)
Scientific workflow management system based on Ptolemy II
Allows scientists to visually design and execute scientific workflows
Actor-oriented model with directors acting as the main workflow engine
Enables different models of computation
Modeling the flow of data from one step to another in a series of computations to achieve a scientific goal
Ptolemy II: software system for modeling, simulation, and design of concurrent, real-time, embedded systems, developed at UC Berkeley
Objective: “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”
Directors, Actors, Ports, Relations
[Figure: actors, each with ports and attributes, connected by links through a relation]
Directors control execution of the workflow (scheduling, dispatching threads, etc.)
Actors are the executable components of a workflow
Directors govern execution of Actors
Actor-/Dataflow Orientation vs. Object-/Control-flow Orientation
Every Kepler workflow needs a director
Execute networks of components under multiple execution models
› Synchronous vs. parallel vs. dataflow vs. time-based vs. event-based vs. all combined
Computation model dictates semantics for component interaction
Make use of separation of concerns
› e.g., component execution, workflow execution, and provenance tracking
Managers act like a “common execution environment”
› governing different concerns related to execution of the network and services
CT – continuous time modeling
DE – discrete event systems
FSM – finite state machines
PN – process networks
SDF – synchronous dataflow
DDF – dynamic dataflow
SR – synchronous/reactive systems
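Of these models, SDF is the easiest to illustrate. Every actor produces and consumes a fixed number of tokens per firing, so a director can work out before execution how often each actor must fire. It does this by solving the balance equations: on every connection, producer firings × tokens produced = consumer firings × tokens consumed. The Python sketch below handles only a simple linear chain of actors. The function name and the (produce, consume) input format are illustrative assumptions and are not part of Kepler or Ptolemy II.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def sdf_repetitions(chain):
    """Smallest positive firing counts for a linear SDF chain.

    chain[i] = (produce, consume) describes the edge from actor i to actor i + 1:
    actor i writes `produce` tokens per firing and actor i + 1 reads `consume`.
    """
    # Relative firing rates from the balance equation
    # q[i] * produce == q[i + 1] * consume, with actor 0 fixed at rate 1.
    rates = [Fraction(1)]
    for produce, consume in chain:
        rates.append(rates[-1] * Fraction(produce, consume))

    # Scale the rational rates to the smallest all-integer solution.
    scale = lcm(*(r.denominator for r in rates))
    counts = [int(r * scale) for r in rates]
    common = reduce(gcd, counts)
    return [c // common for c in counts]


print(sdf_repetitions([(2, 3)]))          # [3, 2]
print(sdf_repetitions([(2, 3), (1, 2)]))  # [3, 2, 1]
```

For example, suppose actor A produces 2 tokens per firing and actor B consumes 3. Then A must fire 3 times and B twice per iteration, so that 6 tokens are produced and 6 consumed. A real SDF scheduler solves the same equations over an arbitrary graph and then orders the firings.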
Reusable components that execute a variety of functions
Communicate with other actors in workflow through ports
Composite actor – aggregation of actors
Composite actor may have a local director
Top-level workflows can be a conceptual representation of the science process
Drilling down reveals increasing levels of detail
Composing models using hierarchy promotes development of reusable components
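The hierarchy above can be sketched in a few lines of Python. A composite actor holds inner actors plus its own local director. From the outside it fires like any other actor, so composites can be nested. All class names here are hypothetical and greatly simplified; this is not the Ptolemy II CompositeActor API.

```python
class Scale:
    """Hypothetical atomic actor: multiplies its input token by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def fire(self, token):
        return token * self.factor


class Offset:
    """Hypothetical atomic actor: adds a constant to its input token."""

    def __init__(self, delta):
        self.delta = delta

    def fire(self, token):
        return token + self.delta


class SequentialDirector:
    """Hypothetical local director: fires the inner actors in a fixed order."""

    def run(self, actors, token):
        for actor in actors:
            token = actor.fire(token)
        return token


class CompositeActor:
    """Aggregates actors; to the enclosing workflow it is just another actor."""

    def __init__(self, actors, director=None):
        self.actors = list(actors)
        self.director = director if director is not None else SequentialDirector()

    def fire(self, token):
        # Delegate to the local director, which governs the inner actors.
        return self.director.run(self.actors, token)


inner = CompositeActor([Scale(2), Offset(1)])  # computes 2x + 1
outer = CompositeActor([inner, Scale(10)])     # nests the composite: 10(2x + 1)
print(outer.fire(3))  # 70
```

Because the director is a constructor argument, a composite could in principle be given a different local director without changing its inner actors. This mirrors, in toy form, the slide's point that a composite actor may have its own director.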
Each actor implements several methods
› initialize() – initializes state variables
› prefire() – indicates whether the actor wants to fire
› fire() – main point of execution: read inputs, produce outputs, read parameter values
› postfire() – update persistent state, determine whether execution is complete
› wrapup() – called once at the end of execution
Each director calls these methods according to its model of computation
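The following is a minimal Python sketch of this method contract. It assumes a toy director that fires a linear chain of actors once per iteration. The class names (Actor, Ramp, RunningSum, Recorder, SequentialDirector) are hypothetical. The stopping rule is also a simplification of what real Kepler/Ptolemy II directors do: the director halts once any actor's postfire() returns False.

```python
class Actor:
    """Base class: the five lifecycle methods, with no-op defaults."""

    def initialize(self):
        pass

    def prefire(self):
        return True  # willing to fire this iteration

    def fire(self, token):
        return token

    def postfire(self):
        return True  # True means "keep iterating"

    def wrapup(self):
        pass


class Ramp(Actor):
    """Source: emits 0, 1, 2, ... and asks to stop after `limit` tokens."""

    def __init__(self, limit):
        self.limit = limit

    def initialize(self):
        self.count = 0

    def fire(self, token):
        return self.count  # a source ignores its input

    def postfire(self):
        self.count += 1  # persistent state changes only in postfire()
        return self.count < self.limit


class RunningSum(Actor):
    """Outputs the sum of all inputs seen so far."""

    def initialize(self):
        self.total = 0

    def fire(self, token):
        self.pending = self.total + token  # compute, but do not commit yet
        return self.pending

    def postfire(self):
        self.total = self.pending  # commit the new state
        return True


class Recorder(Actor):
    """Sink: records every token it receives."""

    def initialize(self):
        self.history = []
        self.closed = False

    def fire(self, token):
        self.last = token
        return token

    def postfire(self):
        self.history.append(self.last)
        return True

    def wrapup(self):
        self.closed = True


class SequentialDirector:
    """Toy director: fires a linear chain of actors once per iteration."""

    def __init__(self, actors, max_iterations=1000):
        self.actors = actors
        self.max_iterations = max_iterations

    def run(self):
        for actor in self.actors:
            actor.initialize()
        try:
            for _ in range(self.max_iterations):
                token, fired = None, []
                for actor in self.actors:
                    if actor.prefire():
                        token = actor.fire(token)
                        fired.append(actor)
                # Postfire every actor that fired; stop if any asks to.
                if not all([actor.postfire() for actor in fired]):
                    break
        finally:
            for actor in self.actors:
                actor.wrapup()


recorder = Recorder()
SequentialDirector([Ramp(4), RunningSum(), recorder]).run()
print(recorder.history)  # [0, 1, 3, 6]
```

Note the split between fire() and postfire(). RunningSum computes its output in fire() but commits the new total only in postfire(). This separation is what allows a director to fire an actor more than once before committing its state, for example during fixed-point iteration.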
Copy actor – copy files from one resource to another during execution
› Stage actor – local to remote host
› Fetch actor – remote to local host
Job execution actor – submit and run a remote job
Monitoring actor – notify the user of failures
Service discovery actor – import web services from a service repository or website
RExpression actors
MatlabExpression actors
Web services actors – given a WSDL and the name of an operation of a web service, dynamically customize themselves to implement and execute that method
Database connection and query actors
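To illustrate what a database query actor does, the sketch below wraps a SQL query so that each firing returns the result rows as record-like dicts. It uses Python's built-in sqlite3 module and an in-memory table purely as a stand-in. Kepler's own database actors are configured differently, and the class name, table, and values here are all hypothetical.

```python
import sqlite3


class DatabaseQueryActor:
    """Hypothetical query actor: runs a parameterized SQL query on each
    firing and outputs the result rows as a list of record-like dicts."""

    def __init__(self, connection, query):
        self.connection = connection
        self.query = query

    def fire(self, params=()):
        cursor = self.connection.execute(self.query, params)
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in data source: an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (name TEXT, lat REAL)")
conn.executemany("INSERT INTO sites VALUES (?, ?)",
                 [("site_a", 34.7), ("site_b", 32.6)])

query_actor = DatabaseQueryActor(
    conn, "SELECT name, lat FROM sites WHERE lat > ? ORDER BY name")
print(query_actor.fire((33.0,)))  # [{'name': 'site_a', 'lat': 34.7}]
```

Emitting one record per row keeps the output in a shape that record-aware downstream actors can consume field by field.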
Ports are used to produce and consume data and to communicate with other actors in the workflow
› Input port – data consumed by the actor
› Output port – data produced by the actor
› Input/output port – data both produced and consumed
Relations direct the same input or output to more than one port
Example: direct an output to (1) a display actor that shows intermediate results and (2) an operational actor for further processing
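This Python sketch shows the fan-out. It assumes that a relation simply broadcasts each token to every linked input port. It also assumes tokens are immutable, so the same object can be delivered to all receivers without copying. Port and Relation here are hypothetical toy classes, not the Ptolemy II classes of the same name.

```python
from collections import deque


class Port:
    """Hypothetical input port: a FIFO queue of tokens waiting to be read."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def put(self, token):
        self.queue.append(token)

    def get(self):
        return self.queue.popleft()

    def has_token(self):
        return bool(self.queue)


class Relation:
    """Hypothetical relation: broadcasts each token to every linked port."""

    def __init__(self):
        self.receivers = []

    def link(self, port):
        self.receivers.append(port)

    def send(self, token):
        # Tokens are assumed immutable, so all receivers can share one object.
        for port in self.receivers:
            port.put(token)


display_in = Port("display.input")  # e.g. an actor showing intermediate results
process_in = Port("process.input")  # e.g. an actor doing further processing
relation = Relation()
relation.link(display_in)
relation.link(process_in)
relation.send(42)
print(display_in.get(), process_in.get())  # 42 42
```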
Execution options:
› inside the GUI
› at the command line
› distributed computing
Kepler components can be shared by exporting a workflow or component into a Kepler Archive (KAR) file (an extension of the JAR file format)
The Component Repository is a centralized system for sharing Kepler workflows
Users can search the repository for components from within Vergil
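A KAR file extends the JAR format, and JAR files are ZIP archives, so generic ZIP tooling can be used to look inside one. The sketch below relies only on that assumption. It lists the entries and parses a JAR-style META-INF/MANIFEST.MF if one is present. It knows nothing about Kepler-specific entries, and the workflow.xml entry in the demo is a hypothetical placeholder.

```python
import io
import zipfile

MANIFEST = "META-INF/MANIFEST.MF"


def describe_kar(source):
    """Return (entry names, manifest attributes) for a KAR file.

    `source` is a path or binary file object. Assumes only that a KAR, as an
    extension of the JAR format, is a ZIP archive that may hold a manifest.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        manifest = {}
        if MANIFEST in names:
            for line in archive.read(MANIFEST).decode("utf-8").splitlines():
                if line.startswith(" "):
                    continue  # simplification: skip manifest continuation lines
                key, sep, value = line.partition(":")
                if sep:
                    manifest[key.strip()] = value.strip()
    return names, manifest


# Demo on an in-memory archive; "workflow.xml" is a hypothetical entry name.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(MANIFEST, "Manifest-Version: 1.0\n")
    archive.writestr("workflow.xml", "<entity/>")
buffer.seek(0)
names, manifest = describe_kar(buffer)
print(names)     # ['META-INF/MANIFEST.MF', 'workflow.xml']
print(manifest)  # {'Manifest-Version': '1.0'}
```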
Kepler provides direct access to scientific data held in many commonly used data archives
› e.g., data stored on the Knowledge Network for Biocomplexity (KNB) Metacat server and described using Ecological Metadata Language (EML)
Additional supported data sources
› DiGIR protocol, OPeNDAP protocol, GridFTP, JDBC, SRB, and others
Kepler ships by default with:
› Globus actors
› GridFTP actors
No BES implementation*
Job submission to OpenPBS and gLite
Kepler actors capable of using UNICORE, by EUFORIA (Poznań SC)
TeraGrid gateways exist that use Kepler
Actor Data Polymorphism:
› Add numbers (int, float, double, complex)
› Add strings (concatenation)
› Add complex types (arrays, records, matrices)
› Add user-defined types
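Below is a rough Python analogue of a polymorphic Add actor, built with functools.singledispatch. Numbers and strings fall through to +. Arrays (lists, with matrices as nested lists) are added element-wise, and records (dicts) field by field. User-defined types take part by defining __add__. This is a sketch under those assumptions: Kepler/Ptolemy II resolves types through its own type system, not through Python-style runtime dispatch. Python's float is double precision, so it stands in for both float and double here.

```python
from dataclasses import dataclass
from functools import singledispatch


@singledispatch
def add(a, b):
    """Default: numbers (int, float, complex) add, strings concatenate."""
    return a + b


@add.register
def _(a: list, b: list):
    # Arrays (and matrices as nested lists): element-wise, recursively.
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [add(x, y) for x, y in zip(a, b)]


@add.register
def _(a: dict, b: dict):
    # Records: field by field over identical field names.
    if a.keys() != b.keys():
        raise ValueError("records must have the same fields")
    return {key: add(a[key], b[key]) for key in a}


@dataclass(frozen=True)
class Vec2:
    """Hypothetical user-defined type; it takes part by defining __add__."""
    x: float
    y: float

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)


print(add(2, 3), add(1.5, 2), add(1 + 2j, 1))       # 5 3.5 (2+2j)
print(add("Kep", "ler"))                            # Kepler
print(add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))  # [[11, 22], [33, 44]]
print(add({"n": 1, "s": "a"}, {"n": 2, "s": "b"}))  # {'n': 3, 's': 'ab'}
print(add(Vec2(1, 2), Vec2(3, 4)))                  # Vec2(x=4, y=6)
```

Because the list and record cases call add() recursively, the same function handles an array of records or a record containing arrays without extra code.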
Distributed execution of workflow parts (peer to peer)
Efficient data transfer
Provenance tracking of data and processes
Tracking workflow evolution
Streaming data analysis
Easy-to-deploy batch interfaces
Intuitive workflow design
Customizable semantic typing
Interoperability with other workflow and analytical environments (at execution level)
Ecology
› SEEK: Ecological Niche Modeling and climate change
› REAP: Modeling parasite invasions in grasslands using sensor networks
› NEON: Ecological sensor networks; COMET: Environmental science
Geosciences
› GEON: LiDAR data processing, Geological data integration
› NEESit: Earthquake engineering
Molecular biology
› SDM: Gene promoter identification and ScalaBLAST
› ChIP-chip: Genome-scale research; CAMERA: Metagenomics
Oceanography
› REAP: SST data processing; LOOKING/OOI CI: ocean observing CI
› ROADNet: real-time data modeling and analysis
› ATOL: Processing Phylodata; CiPRES: Phylogenetic tools
Chemistry
› Resurgence: Computational chemistry; DART/ARCHER: X-ray crystallography
Library science
› DIGARCH: Digital preservation; UK Text Mining Center: Cheshire feature and archival
Conservation biology
› SanParks: Thresholds of Potential Concerns
Physics
› SDM: astrophysics TSI-1 and TSI-2; CPES: Plasma fusion simulation; ITER-EU: ITM fusion workflows