What do we need to manage end-to-end scientific workflows for efficiency and productivity?
Lavanya Ramakrishnan ([email protected])


Page 1: What do we need to manage end-to-end scientific workflows for efficiency and productivity?

Lavanya Ramakrishnan
[email protected]

Page 2: (CS-Biased) View of Workflow Challenges: LEAD North American Mesoscale (NAM) Forecast (2009)

[Figure: LEAD NAM forecast workflow. Components: Terrain PreProcessor, WRF Static, Lateral Boundary Interpolator, ADAS Interpolator, ARPS2WRF, and WRF. Per-step runtimes range from 4 secs to 4570 secs (the longest step runs on 16 processors); data sizes range from 0.2 MB to 2422 MB.]

• Mostly simple workflows
• Repetitive, iterative, parametric studies
• Provenance
• Workflow and data sharing
• Mix of single-core and multi-core jobs managed by a service-based architecture

In practice, people use ad-hoc scripts, keep notes in text files, and encode metadata in file names.
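The ad-hoc practice mentioned above, encoding run metadata in file names, can be sketched as follows. The naming scheme, field names, and example file name are hypothetical, for illustration only:

```python
import re
from datetime import datetime

# Hypothetical naming convention: <model>_<domain>_<YYYYMMDDHH>_<member>.nc
FILENAME_PATTERN = re.compile(
    r"(?P<model>[A-Za-z0-9]+)_(?P<domain>[A-Za-z0-9]+)_"
    r"(?P<cycle>\d{10})_(?P<member>\d+)\.nc$"
)

def parse_metadata(filename):
    """Recover run metadata that was encoded in a file name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename}")
    meta = match.groupdict()
    meta["cycle"] = datetime.strptime(meta["cycle"], "%Y%m%d%H")
    meta["member"] = int(meta["member"])
    return meta

print(parse_metadata("wrf_nam_2009061500_3.nc"))
```

This works until the naming convention drifts, which is exactly why the talk argues for tools that manage metadata explicitly instead.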

Page 3:

DDM: Dula’s Data Management (2012)

• Way too many hard drives
• Multiple hours per week spent manually copying/erasing data
• EVERY BEAMLINE FOR ITSELF!

Advanced Light Source

Page 4:

But haven’t we solved the workflow and data management problems?

What about all the workflow tools out there?

Page 5: Downloading and pre-processing data can be complex

[Figure: MODIS data pipeline. A NASA FTP server (metadata and data) feeds data transfer nodes and batch queue nodes through an FTP queue and a job queue; output data and metadata land in the HPSS file system and a local metadata catalog. Steps: determine the files to download for a tile, create a file download record, create a job record, and finally access the re-projected data.]

http://modis.nersc.gov
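The record-driven pattern in this pipeline (one download record per remote file, one job record per tile, each moving through its own queue) can be sketched minimally as follows. The record fields, tile naming, and file-listing stub are hypothetical, not the actual MODIS pipeline code:

```python
from collections import deque

def files_for_tile(tile):
    """Determine which remote files a tile needs (stub for illustration)."""
    return [f"{tile}_band{b}.hdf" for b in (1, 2)]

def build_queues(tiles):
    """Create one download record per remote file and one job record per tile."""
    ftp_queue, job_queue = deque(), deque()
    for tile in tiles:
        for fname in files_for_tile(tile):
            # Download records are consumed by the data transfer nodes.
            ftp_queue.append({"tile": tile, "file": fname, "state": "pending"})
        # Job records wait until all of the tile's downloads have completed,
        # then run re-projection on the batch queue nodes.
        job_queue.append({"tile": tile, "state": "waiting"})
    return ftp_queue, job_queue

ftp_q, job_q = build_queues(["h08v05"])
print(len(ftp_q), len(job_q))  # 2 download records, 1 job record
```

Keeping persistent records (rather than in-memory state) is what lets a pipeline like this resume after transfer or job failures.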

Page 6: AmeriFlux data processing pipeline for Net Ecosystem Exchange (NEE): atmosphere-biosphere interactions in climate models

Pipeline stages:
1. Data Collection (high-frequency and meteorological data)
2. Pre-Processing and Sensor Calibration
3. Processing into Fluxes
4. Post-Processing and Product Generation: QA/QC flagging and visual checks, Ustar calculation and filtering, gapfilling, flux partitioning, generation of data products
5. Synthesis Studies, Models, Simulations

The structure of the workflow does not capture the data complexities.

http://ameriflux.lbl.gov
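The stages listed above form a linear pipeline where each stage's output feeds the next. A generic sketch of composing such stages (stage names follow the slide; the implementations are trivial stubs, since real AmeriFlux processing is far more involved):

```python
# Chain the NEE processing stages as a linear pipeline of functions.

def qa_qc_flagging(data):
    return [x for x in data if x is not None]   # drop flagged/missing values

def ustar_filtering(data):
    return data                                  # placeholder stub

def gapfilling(data):
    return data                                  # placeholder stub

def flux_partitioning(data):
    return data                                  # placeholder stub

def run_pipeline(data, stages):
    for stage in stages:
        data = stage(data)                       # output of one stage feeds the next
    return data

stages = [qa_qc_flagging, ustar_filtering, gapfilling, flux_partitioning]
result = run_pipeline([1.2, None, 3.4], stages)
print(result)  # [1.2, 3.4]
```

The slide's point is precisely that this clean linear structure hides the hard part: the data-quality decisions and versioned intermediate products inside each stage.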

Page 7: Networking paths are still pretty complex: Daya Bay networking

• Relay Path: requires daisy-chaining two data transfers (default)
• Direct Path: requires two data transfers out of the site
• DayaNet: dedicated 150 Mbps optical link
• CSTNet: Chinese national network
• GLORIAD: trans-Pacific scientific network (NSF)
• ASGC: trans-Pacific eScience network (fallback for GLORIAD)
• ESnet: US national Energy Sciences Network
• Hot-swappable disk transport between Daya Bay and Hong Kong in case of long-term failure of either DayaNet or CSTNet

http://dayabay.ihep.ac.cn/

Page 8: Failures of all magnitudes can occur

[Same Daya Bay networking figure and path descriptions as Page 7.]

Page 9: Real-time data processing presents additional challenges

[Figure: Framework for real-time beamline analysis. A beamline user's experiment sends images through a data transfer node into HPC storage and archive; a data pipeline and a prompt analysis pipeline run on HPC compute resources using metadata and reference data, with control and feedback flowing back to a remote user.]

Page 10: How are we going to manage simulation and analysis workflows? (NERSC Cori System)

[Figure: Cori system diagram (Cray Cascade, 64 cabinets). Major components: 9,304 KNL compute nodes; 1,920 Haswell compute nodes; 384 Burst Buffer nodes; 14 esLogin nodes; 2 esMS nodes; 28 MOM nodes; 32 DSL nodes; 32 DVS server nodes; 68 LNET routers; 10 RSIP network nodes; 4 network nodes; 2 boot and 2 SDB nodes; redundant core IB switches; a parallel file system with 12 scalable units (2 MGS, 4 MDS, and 96 OSS nodes on DDN WolfCreek and 5 NetApp E2700 arrays); FDR InfiniBand and 40 GigE links to the NERSC network, plus 32 FDR IB links to NGF.]

Page 11:

It takes a village to build and run pipelines …

• Application Scientists (and Students and Postdocs)
  – Know the use cases
  – Write application codes
  – Often write some scripts
  – Are the users who run the final workflow
• Workflow Developers
  – Compose workflows
• System Integrators
  – Write middleware/software pieces
  – Computational model experts
• System/Facility Experts
  – HPC center and ESnet staff

These roles are not nearly as well defined in practice.

Page 12:

Tigres: a workflow system “library”/“toolkit”

Page 13:

What do we need for end-to-end pipelines?

• Workflow tools need to be more than what runs on clusters/HPC
  – Data transfer, data processing, and storage environments
  – Integral to scientific processes
• User/project requirements are changing, and users have to play a key role in tool development
  – Scientists should focus on science/algorithms
  – It is not practical to have a workflow developer for every scientist
• We need to think beyond our individual boxes
  – Data, workflow, network, and resource management have to happen in conjunction
• Efficiencies (performance, energy, …), reliability, and productivity are only becoming more important

Even CS folks need workflow tools (e.g., Prabhat)!

Page 14: Tigres: design templates for common scientific workflow patterns

[Figure: base Tigres templates are specialized into “LightSrc” domain templates, which scale up into applications “LightSrc-1” and “LightSrc-2”; the cycle is create and debug, share, then create and debug again.]

Implement templates as a library in an existing language.

An early (friendly) release is now available! http://tigres.lbl.gov

Page 15:

Key Aspects of Tigres

• Targeted at large-scale, data-intensive workflows
  – Motivated by the “MapReduce” model
  – No centralized management model
• Library model embedded in existing languages such as Python and C
  – “Extend current scripting/programming tools”
  – API-based, embedded in code
• Lightweight execution framework
  – “As easy to run as an MPI program on an HPC resource”
  – No persistent services
• Scientist-centered design process
  – Get feedback from users before writing all the code

Page 16: Tigres Templates

[Figure: the four base templates. Sequence chains Task1 through TaskN one after another; Parallel runs Task1 through TaskN side by side; Split fans a single task out into Task1 through TaskN; Merge fans Task1 through TaskN back into a single task.]

Page 17:

Templates

• Sequence(name, task_array, input_array)
  – e.g., output[] = Sequence(“my seq”, task_array_12, input_array_12)
• Parallel(name, task_array, input_array)
  – e.g., output[] = Parallel(“abc”, task_array_12, input_array_12)
• Split(name, split_task, split_input_values, task_array, task_array_in)
  – e.g., Split(“split”, task_x1, input_value_1, spl_t_arr, spl_i_arr)
• Merge(name, task_array, input_array, merge_task, merge_input_values)
  – e.g., Merge(“merge”, syn_t_arr, syn_i_arr, task_x1, input_value_1)
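To make the template semantics concrete, here is a plain-Python sketch of what Sequence and Parallel compute. The signatures mirror the slide conceptually, but this is NOT the actual Tigres API, just an illustration of the pattern:

```python
from concurrent.futures import ThreadPoolExecutor

def sequence(name, task_array, input_array):
    """Run tasks one after another; each task's output feeds the next task."""
    value = input_array
    for task in task_array:
        value = task(value)
    return value

def parallel(name, task_array, input_array):
    """Run task[i] on input[i] concurrently; return outputs in task order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda pair: pair[0](pair[1]),
                             zip(task_array, input_array)))

double = lambda x: x * 2
inc = lambda x: x + 1
print(sequence("my seq", [double, inc], 5))    # (5 * 2) + 1 = 11
print(parallel("abc", [double, inc], [3, 4]))  # [6, 5]
```

Split and Merge combine these two: Split is one task followed by a Parallel; Merge is a Parallel followed by one task.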

Page 18:

Scientist-Centered Design Process

[Figure: the scientist-centered design process connects design, the API, implementation, optimizations, and the execution environment.]

Requirements gathering is not the same as usability studies.

Page 19:

Impact of Scientist-Centered Design

• Concept understanding by users
• Changes to nomenclature
• Support in C also important
• Priorities for the first prototype: desktop-to-NERSC execution, monitoring, and intermediate state management

[Figure: same design-process diagram as Page 18.]

It took days rather than weeks (or months) for the first stub implementation!

Page 20:

Summary

•  User-centered design processes are vital for development of next-generation tools

•  We need libraries/toolkits that can be customized for specific needs

•  Next-generation workflow/software ecosystems need to be holistic

Page 21:

Acknowledgements

• Tigres Team
  – Deb Agarwal (PI), Lavanya Ramakrishnan, Daniel Gunter
  – Gilberto Pastorello, Valerie Hendrix, Ryan Rodriguez
• NERSC/CRD
  – John Shalf, Nicholas Wright, Christopher Daley
• Science Use Cases
  – Daniel Chivers, John Kua, Michael Quinlan
  – Craig Tull, Dula Parkinson, Gilberto Pastorello
  – …and many others

[email protected]
