National Aeronautics and Space Administration
Jet Propulsion Laboratory

Supporting Science Through Workflows: Infrastructure, Architecture and Modeling

David Woollard
NASA Jet Propulsion Laboratory
University of Southern California


D.M. Woollard. Supporting Science Through Workflows. 2


Agenda

» Motivation
» Classification of in silico Experimentation
» Research Problem
» Related Work
» Introduction to Workflow Systems
» Research Goals
» Methodology
» Refactoring existing software
» Domain Specific Software Architecture
» Evaluation
» Conclusions & Future Work


Motivation

• The nature of scientific investigations has changed.

• Two major trend lines:

– For many scientists, computer simulation has replaced in vivo and in vitro science.

– Collaborations are growing (system of systems science).

• New discoveries in materials science, chemistry, physics, planetary science, and even the social sciences are made via in silico experimentation.


in silico Experimentation

• Discovery is a phase in which a scientist rapidly prototypes, tests hypotheses, and develops a methodology.

[Figure: the Discovery, Production, and Distribution phases of in silico experimentation; labels Theory, Practice, Development, Execution, and "Lone Researcher" [Kepner 03].]


in silico Experimentation

• Production is the engineering of replicating an experiment on large volumes of data.


We will focus on production systems in this talk.


in silico Experimentation

• Distribution is a phase in which data is dispersed to peers for review and further experimentation, including:
  – Papers
  – Federated data
  – Digital libraries


The Role of Technology

• In silico science, especially system of systems science, is facilitated by the Grid.

“The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering.”

I. Foster, C. Kesselman, and S. Tuecke, The Anatomy of the Grid (2001)


Research Problem

• Scientists harness complex hardware and software systems in order to conduct scientific research in silico.

• Once algorithms and processes are established, production systems are created to produce large volumes of data.

• Designing a production system is a complex engineering task as well as a complex scientific task.

Meeting these production requirements forces either scientists to engineer a production system themselves or software engineers to rewrite scientific code. Both options are inefficient and costly.


Introduction to Workflows

Grid systems have traditionally focused on creating virtual organizations.

In Grids, workflows orchestrate processing tasks in production systems.

Workflows are a processing model that incorporates actors, tasks, data, and rules.

Workflow management systems execute a task on its data once the task's dependencies are satisfied, as determined by rules.

[Figure: Production Systems, Grid Systems, and Workflows; an example workflow of tasks T0 to T4.]
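The firing rule above can be made concrete with a small dependency-driven scheduler. This is a minimal Python sketch, not the scheduling logic of OODT or any other system named in this talk; the task names T0 to T4 echo the figure, but the dependency edges and the task bodies are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Set, Tuple

# A task receives the outputs of the tasks it depends on, keyed by task name.
Task = Callable[[Dict[str, Any]], Any]


def run_workflow(
    tasks: Dict[str, Task], deps: Dict[str, Set[str]]
) -> Tuple[List[str], Dict[str, Any]]:
    """Execute each task once all of its dependencies have completed.

    Returns the execution order and every task's output.
    Raises ValueError if some task can never become ready (e.g. a cycle).
    """
    waiting = {name: set(deps.get(name, set())) for name in tasks}
    ready = deque(sorted(name for name, pending in waiting.items() if not pending))
    outputs: Dict[str, Any] = {}
    order: List[str] = []

    while ready:
        name = ready.popleft()
        # Rule: fire the task with the outputs of its (now satisfied) dependencies.
        inputs = {dep: outputs[dep] for dep in deps.get(name, set())}
        outputs[name] = tasks[name](inputs)
        order.append(name)
        # Completing this task may satisfy the last dependency of other tasks.
        for other in sorted(waiting):
            if name in waiting[other]:
                waiting[other].discard(name)
                if not waiting[other]:
                    ready.append(other)

    if len(order) != len(tasks):
        raise ValueError("unsatisfiable dependencies (cycle or missing task)")
    return order, outputs


# Hypothetical workflow: T0 feeds T1 and T2, both feed T3, and T3 feeds T4.
deps = {"T1": {"T0"}, "T2": {"T0"}, "T3": {"T1", "T2"}, "T4": {"T3"}}
tasks: Dict[str, Task] = {
    "T0": lambda inp: 2,
    "T1": lambda inp: inp["T0"] + 1,
    "T2": lambda inp: inp["T0"] * 10,
    "T3": lambda inp: inp["T1"] + inp["T2"],
    "T4": lambda inp: inp["T3"] * 2,
}
order, outputs = run_workflow(tasks, deps)
print(order)          # ['T0', 'T1', 'T2', 'T3', 'T4']
print(outputs["T4"])  # 46
```

Real workflow management systems add scheduling policies, resource selection, and fault handling on top of this basic "run when dependencies are satisfied" rule.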


Workflow System Model


Workflows, Workflows Everywhere

Condor-G, Pegasus, Wings, Taverna, Grid Workflow, YAWL, DAGMan, Triana, ICENI, VDS, GridAnt, GrADS, GridFlow, Unicore, Gridbus, Askalon, Kepler, Karajan, SciFlow, OODT


Bottom-up Taxonomy

• Yu & Buyya presented a taxonomy of Grid workflow management systems [Yu & Buyya 05].

– It is based on workflow properties such as model representation and scheduling policy.

– It illustrates the divergence in the field.

However, no taxonomy classifies workflow systems by their interface to task code.


Insights from an Architect

• Each production workflow task is a complex software application with two primary stakeholders: the scientist and the engineer.

• Software architectures are a system's blueprint: its form, elements, and rationale [Perry & Wolf, 92].

• An architecture provides an appropriate view for each stakeholder. It encapsulates computation in components and communication in connectors, and arranges them in a topology.

• Reifying architectural elements in code is one way to bridge the gap between design and implementation. First-class connectors and explicit interfaces are such reifications.
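As an illustration of reifying these elements, the sketch below makes the connector a first-class object that owns the topology, while components expose only an explicit event interface. `Event`, `Component`, `EventConnector`, `Solver`, and `Archiver` are hypothetical names chosen for this example; this is not Prism-MW's API. The `Solver` stands in for the scientist's code and the `Archiver` for an engineering concern, so each stakeholder's part can change without touching the other.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    name: str
    payload: Dict[str, Any] = field(default_factory=dict)


class Component(ABC):
    """Explicit interface: a component only receives and emits events."""

    def __init__(self) -> None:
        self.connector: Optional["EventConnector"] = None

    def emit(self, event: Event) -> None:
        if self.connector is None:
            raise RuntimeError("component is not attached to a connector")
        self.connector.route(event)

    @abstractmethod
    def handle(self, event: Event) -> None:
        """React to an event delivered by the connector."""


class EventConnector:
    """First-class connector: an object that owns the communication topology."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Component]] = {}

    def attach(self, component: Component, *event_names: str) -> None:
        component.connector = self
        for name in event_names:
            self.subscribers.setdefault(name, []).append(component)

    def route(self, event: Event) -> None:
        # Components never reference each other; delivery is decided here.
        for component in self.subscribers.get(event.name, []):
            component.handle(event)


class Solver(Component):
    """Stands in for scientific code: emits a result, ignores incoming events."""

    def handle(self, event: Event) -> None:
        pass

    def solve(self, x: float) -> None:
        self.emit(Event("result", {"value": x * x}))


class Archiver(Component):
    """Stands in for an engineering concern, such as storing data products."""

    def __init__(self) -> None:
        super().__init__()
        self.stored: List[Any] = []

    def handle(self, event: Event) -> None:
        self.stored.append(event.payload["value"])


bus = EventConnector()
solver, archiver = Solver(), Archiver()
bus.attach(solver)
bus.attach(archiver, "result")
solver.solve(3.0)
print(archiver.stored)  # [9.0]
```

Because the topology lives in `EventConnector`, swapping the archiver for another consumer is a change to `attach` calls only; the solver's code is untouched.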


Research Goals

• Develop a Domain Specific Software Architecture (DSSA) for tasks in scientific workflows.

• Develop a methodology for refactoring existing scientific code into this DSSA.

• Minimize overhead (computation time and memory footprint).

• Maximize science code reuse.


Decomposing Software

• Decomposition, the first step in the approach, is a process in which scientific modules are identified and the control flow between them is determined.

• Scientific modules are like functions: they have internal scope, a single entry point, and a single exit point. In graph-theoretic terms, the call dominance tree for the basic blocks in a module has only one source and one sink.

• The proper level of decomposition depends on both scientific functionality and engineering requirements. Therefore, it should be “tunable.”

[Figure: the three-step approach: Decomposition, Architecting, Deployment.]
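The single-entry, single-exit condition can be approximated mechanically once a module's basic blocks are available as a directed graph. This sketch assumes that graph is given as a plain adjacency dictionary with hypothetical block names, and it simply counts blocks with no predecessors and no successors; extracting the graph from Fortran or C source, and building the call dominance tree itself, are out of scope here.

```python
from typing import Dict, List, Set, Tuple

# Control flow between basic blocks: block name -> successor block names.
BlockGraph = Dict[str, List[str]]


def sources_and_sinks(blocks: BlockGraph) -> Tuple[List[str], List[str]]:
    """Return blocks with no predecessors (sources) and no successors (sinks)."""
    nodes: Set[str] = set(blocks)
    for successors in blocks.values():
        nodes.update(successors)
    has_predecessor = {s for successors in blocks.values() for s in successors}
    sources = sorted(n for n in nodes if n not in has_predecessor)
    sinks = sorted(n for n in nodes if not blocks.get(n))
    return sources, sinks


def is_candidate_module(blocks: BlockGraph) -> bool:
    """True if the region has exactly one entry block and one exit block.

    Simplification: assumes the entry block is not itself a loop target,
    since a back edge into it would give it a predecessor.
    """
    sources, sinks = sources_and_sinks(blocks)
    return len(sources) == 1 and len(sinks) == 1


# Hypothetical module with a loop: a single entry and a single exit.
loop_module = {"entry": ["header"], "header": ["body", "exit"], "body": ["header"], "exit": []}
# Hypothetical region with two entry points: not a self-contained module.
two_entries = {"a": ["c"], "b": ["c"], "c": []}

print(is_candidate_module(loop_module))  # True
print(is_candidate_module(two_entries))  # False
```

Making decomposition "tunable" would then amount to choosing which single-entry, single-exit regions to promote to modules, coarser or finer depending on the production requirements.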


“Injecting” Architecture


• In the second part of the approach, these modules are “architected” into a workflow task, with connectors to services at the levels needed to satisfy production requirements.

• We use Prism-MW wrappers to encapsulate and componentize the decomposed modules. This gives each module a standard interface and utilities for event-based communication.

• We use the Exogenous Connector style [Lau et al.] to mimic the original control and data flow within the workflow task, and we augment these connectors with a specialized version of the invoking connector.
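In the exogenous style, control flow lives in the connector rather than in the components. The sketch below wraps plain functions, standing in for decomposed legacy modules, behind a uniform `invoke` interface and lets a connector replay the original call sequence and data flow. `ModuleWrapper`, `InvokingConnector`, and the hook mechanism are hypothetical stand-ins; they do not reproduce Prism-MW's wrappers or the specialized invoking connector described above.

```python
from typing import Any, Callable, Dict, List, Tuple


class ModuleWrapper:
    """Hypothetical wrapper giving a decomposed legacy module a uniform interface."""

    def __init__(self, name: str, fn: Callable[..., Any]) -> None:
        self.name = name
        self.fn = fn

    def invoke(self, *args: Any) -> Any:
        # The wrapped module is called unchanged; its code is reused as-is.
        return self.fn(*args)


class InvokingConnector:
    """Exogenous connector: it, not the modules, owns control and data flow."""

    def __init__(self) -> None:
        self.steps: List[Tuple[ModuleWrapper, List[str], str]] = []
        self.hooks: List[Callable[[str, Any], None]] = []

    def then(self, module: ModuleWrapper, inputs: List[str], output: str) -> "InvokingConnector":
        """Append one step: call `module` on named inputs, store under `output`."""
        self.steps.append((module, inputs, output))
        return self

    def run(self, **initial: Any) -> Dict[str, Any]:
        data: Dict[str, Any] = dict(initial)
        for module, inputs, output in self.steps:
            data[output] = module.invoke(*(data[name] for name in inputs))
            # Production concerns (logging, checkpointing) attach here, not in science code.
            for hook in self.hooks:
                hook(module.name, data[output])
        return data


# Hypothetical legacy routines, standing in for decomposed scientific modules.
def make_matrix(n: int) -> List[List[float]]:
    return [[float(i + j) for j in range(n)] for i in range(n)]


def scale(m: List[List[float]], k: float) -> List[List[float]]:
    return [[k * x for x in row] for row in m]


def trace(m: List[List[float]]) -> float:
    return sum(m[i][i] for i in range(len(m)))


pipeline = (
    InvokingConnector()
    .then(ModuleWrapper("make_matrix", make_matrix), ["n"], "m")
    .then(ModuleWrapper("scale", scale), ["m", "k"], "scaled")
    .then(ModuleWrapper("trace", trace), ["scaled"], "result")
)
log: List[str] = []
pipeline.hooks.append(lambda name, value: log.append(name))
print(pipeline.run(n=3, k=2.0)["result"])  # 12.0
print(log)  # ['make_matrix', 'scale', 'trace']
```

Because the legacy routines are called unchanged, reuse of scientific code stays high, and production features attach to the connector instead of the science code.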


Deploying to the Grid


• Deployment is the last step in our approach.

• We currently deploy the resulting workflow component into the OODT Science Data System environment, a grid workflow management system used at JPL.

• This choice is purely for developer convenience; the approach should be deployable to any target workflow management system.


SWSA Architecture

Scientific Workflow Software Architecture (SWSA): a domain-specific software architecture for workflow tasks.


Preliminary Evaluation

• We chose a canonical scientific application (matrix multiplication), implemented in both Fortran and C.

• Six metrics were measured:
  – Execution time for:
    • Base application
    • Wrapper (no data exchanged)
    • Wrapper (data exchanged)
  – Memory footprint for:
    • Base application
    • Wrapper (no data exchanged)
    • Wrapper (data exchanged)
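A measurement harness for these six metrics might look like the following. It is a sketch under stated assumptions: a pure-Python matrix multiplication stands in for the Fortran and C kernels, `wrapped` is a hypothetical wrapper that models data exchange by copying, and `tracemalloc` reports only Python heap allocations, not a process's full memory footprint. Its numbers are not comparable to the study's results.

```python
import copy
import time
import tracemalloc
from typing import Callable, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix multiplication: the 'base application' stand-in."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def wrapped(a: Matrix, b: Matrix, exchange_data: bool) -> Matrix:
    """Hypothetical wrapper. With exchange_data, inputs and result are copied
    to model marshalling data across a component boundary."""
    if exchange_data:
        a, b = copy.deepcopy(a), copy.deepcopy(b)
        return copy.deepcopy(matmul(a, b))
    return matmul(a, b)


def best_time(call: Callable[[], object], repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds (measured without tracemalloc)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        best = min(best, time.perf_counter() - start)
    return best


def peak_bytes(call: Callable[[], object]) -> int:
    """Peak traced Python heap allocation during one call."""
    tracemalloc.start()
    try:
        call()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


n = 40
a = [[float(i * n + j) for j in range(n)] for i in range(n)]
b = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
runs = {
    "base application": lambda: matmul(a, b),
    "wrapper (no data exchanged)": lambda: wrapped(a, b, False),
    "wrapper (data exchanged)": lambda: wrapped(a, b, True),
}
base = best_time(runs["base application"])
for label, call in runs.items():
    t = best_time(call)
    print(f"{label}: {t * 1e3:.2f} ms, peak {peak_bytes(call) / 1024:.1f} KiB, "
          f"time overhead {100 * (t - base) / base:+.1f}%")
```

Timing and memory are measured in separate passes because `tracemalloc` itself slows allocation and would inflate the execution times.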


Preliminary Evaluation

Refactoring Methodology Example: Molecular Dynamics Simulation

Performance results are very promising:

• Time overhead: 1.85%

• Code reuse: 96.77%
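The slide does not say how the two percentages were computed. One plausible reading, offered as an assumption rather than the study's definition, is:

```latex
\text{Time overhead} = \frac{T_{\text{wrapped}} - T_{\text{base}}}{T_{\text{base}}} \times 100\%,
\qquad
\text{Code reuse} = \frac{\mathrm{LOC}_{\text{reused}}}{\mathrm{LOC}_{\text{original}}} \times 100\%
```

Under this reading, a 1.85% time overhead means the wrapped run took about 1.0185 times as long as the base run.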


Conclusions & Future Work

• Scientific Workflow Software Architecture (SWSA) improves upon existing workflow systems by providing:

– A methodology for accessing services.

– A separation of concerns between the scientific algorithms and the production features of code.

– A clean separation of roles between the scientist and the engineer.

• SWSA satisfies the “cult of performance.”

• Future Work

– Extended evaluation on more advanced simulation codes.

– Expansion of the architecture to support parallel codes.


Thank You

Portions of this research were conducted at the Jet Propulsion Laboratory, managed by the California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

For more information, please see:

• D. Woollard, N. Medvidovic, Y. Gil, and C. Mattmann. “Scientific Software as Workflows: From Discovery to Distribution.” To appear in IEEE Software Special Issue on Developing Scientific Software, 2008.

• D. Woollard, D. Freeborn, E. Kay-Im, and S. LaVoie. “Case Studies in Science Data Systems: Meeting Software Challenges in Competitive Environments.” To appear in Proceedings of the 10th International Conference on Space Operations (SpaceOps-2008), AIAA Press, Heidelberg, Germany, May 2008.

• D. Woollard. “Supporting Scientific Workflows Through First-Class Connectors.” Qualifying Examination Report, University of Southern California, May 2007.

• D. Woollard, C. Mattmann, and N. Medvidovic. “Injecting Software Architectural Constraints into Legacy Scientific Applications.” USC Center for Software Engineering Technical Report USC-CSE-2007-701, January 2007.