Building Scalable Scientific Applications using Makeflow Dinesh Rajan and Peter Sempolinski University of Notre Dame

Page 1: Building Scalable Scientific Applications using Makeflow

Building Scalable Scientific Applications using Makeflow

Dinesh Rajan and Peter Sempolinski

University of Notre Dame

Page 2: Building Scalable Scientific Applications using Makeflow

Cooperative Computing Lab

http://www.nd.edu/~ccl

University of Notre Dame

Page 3: Building Scalable Scientific Applications using Makeflow

The Cooperative Computing Lab

• We collaborate with people who have large scale computing problems in science, engineering, and other fields.

• We operate computer systems on the order of 10,000 cores: clusters, clouds, grids.

• We conduct computer science research in the context of real people and problems.

• We develop open source software for large scale distributed computing.

http://www.nd.edu/~ccl

Page 4: Building Scalable Scientific Applications using Makeflow

Plan for Today’s Tutorial

1. Our CCTools Software
   i. Makeflow, Work Queue, Parrot, Chirp

2. Makeflow
   i. Lecture: Overview, features
   ii. Tutorial: Write simple Makeflows

3. Work Queue
   i. Lecture: Overview, features
   ii. Tutorial: Write simple WQ programs

Page 5: Building Scalable Scientific Applications using Makeflow

Science Depends on Computing!

Page 6: Building Scalable Scientific Applications using Makeflow

The Good News: Computing is Plentiful!

Page 7: Building Scalable Scientific Applications using Makeflow
Page 8: Building Scalable Scientific Applications using Makeflow
Page 9: Building Scalable Scientific Applications using Makeflow

Superclusters by the Hour

http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars

Page 10: Building Scalable Scientific Applications using Makeflow

I have a standard, debugged, trusted application that runs on my laptop. A toy problem completes in one hour. A real problem will take a month (I think).

Can I get a single result faster? Can I get more results in the same time?

Last year, I heard about this grid thing.

This year, I heard about this cloud thing.

Page 11: Building Scalable Scientific Applications using Makeflow

I have allocations on clusters (unlimited) + grids (limited) + clouds ($)!

How do I run my application on those machines?

Page 12: Building Scalable Scientific Applications using Makeflow

Should I port my program to MPI or Hadoop?

• Learn MPI / Hadoop
• Learn C / Java
• Re-architect
• Re-write
• Re-test
• Re-debug
• Re-certify

Page 13: Building Scalable Scientific Applications using Makeflow

And my application looks like this…

Page 14: Building Scalable Scientific Applications using Makeflow

Makeflow & Work Queue

Easy to scale from one desktop to national scale infrastructure.

Harness all available resources: desktops, clusters, clouds, grids.

Portable across operating systems, storage systems, batch systems.

No special privileges required.

Page 15: Building Scalable Scientific Applications using Makeflow

Makeflow

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 > out1

out2: part2 mysim.exe
	./mysim.exe part2 > out2

out3: part3 mysim.exe
	./mysim.exe part3 > out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result

Page 16: Building Scalable Scientific Applications using Makeflow

Work Queue Library

http://www.nd.edu/~ccl/software/workqueue

#include "work_queue.h"

while( not done ) {

    while (more work ready) {
        task = work_queue_task_create();
        // add some details to the task
        work_queue_submit(queue, task);
    }

    task = work_queue_wait(queue);
    // process the completed task
}
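The submit/wait loop above can be tried out in plain Python to see its shape. This is only a sketch of the pattern: the standard-library `concurrent.futures` thread pool stands in for the queue, nothing here is the Work Queue API, and `run_all`, `max_in_flight`, and the `echo` commands are hypothetical names chosen for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_all(commands, max_in_flight=4):
    """Submit shell commands while work is ready, then collect finished ones.

    A thread pool plays the role of the queue and each future plays the
    role of a task. Commands are assumed to be distinct strings.
    """
    pending = list(commands)   # work not yet submitted
    in_flight = {}             # future -> command it is running
    results = {}               # command -> captured standard output

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        while pending or in_flight:            # "while not done"
            # Submit while more work is ready and there is room for it.
            while pending and len(in_flight) < max_in_flight:
                cmd = pending.pop(0)
                fut = pool.submit(subprocess.run, cmd, shell=True,
                                  capture_output=True, text=True)
                in_flight[fut] = cmd

            # Wait for at least one task, then process what completed.
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for fut in done:
                cmd = in_flight.pop(fut)
                results[cmd] = fut.result().stdout.strip()
    return results


if __name__ == "__main__":
    print(run_all(["echo part1", "echo part2", "echo part3"]))
```

The two nested loops mirror the slide: the inner loop keeps the queue full, and the wait step hands back one or more completed tasks for processing.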

Page 17: Building Scalable Scientific Applications using Makeflow

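Parrot Virtual File System

[Diagram: an ordinary application uses the filesystem interface (open/read/write/close) through Parrot. Parrot has drivers for Local, HTTP, CVMFS, Chirp, and iRODS, which reach web servers, the CVMFS network, a Chirp server, and an iRODS server.]

Parrot and Chirp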

Page 18: Building Scalable Scientific Applications using Makeflow

http://www.nd.edu/~ccl/software/manuals

Page 19: Building Scalable Scientific Applications using Makeflow

Source code in GitHub: http://github.com/cooperative-computing-lab/cctools

Page 20: Building Scalable Scientific Applications using Makeflow

Makeflow & Work Queue

Federate/harness all available resources: desktops, clusters, clouds, grids.

Simple interfaces & API

Part of CCTools software

No special privileges required to install.

Page 21: Building Scalable Scientific Applications using Makeflow

Makeflow Lecture: Outline

1. What is Makeflow?
   – Portable: One Makeflow program for SGE, Condor, PBS

2. How to write an application using Makeflow?
   – Simple rule-based syntax

3. How to run Makeflow?
   – Features, commands, using Work Queue

Page 22: Building Scalable Scientific Applications using Makeflow

An Old Idea: Makefiles

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 > out1

out2: part2 mysim.exe
	./mysim.exe part2 > out2

out3: part3 mysim.exe
	./mysim.exe part3 > out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result

Page 23: Building Scalable Scientific Applications using Makeflow

Makeflow Language - Rules

Each rule specifies:

• a set of target files to create;
• a set of source files needed to create them;
• a command that generates the target files from the source files.

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 > out1

out2: part2 mysim.exe
	./mysim.exe part2 > out2

out3: part3 mysim.exe
	./mysim.exe part3 > out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result

A single rule, with targets before the colon, sources after it, and the command on the next line:

out1 : part1 mysim.exe
	./mysim.exe part1 > out1
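To make the three parts of a rule concrete, here is a minimal Python sketch that splits one rule into its targets, sources, and command. It assumes the two-line layout used in these examples (header line, then command line); `parse_rule` is a hypothetical helper for illustration and is not part of Makeflow.

```python
def parse_rule(text):
    """Split one rule into (targets, sources, command).

    Assumes "targets : sources" on the first line and the command on the
    second line, as in the examples above.
    """
    header, command = text.strip().split("\n", 1)
    targets, sources = header.split(":", 1)
    return targets.split(), sources.split(), command.strip()


if __name__ == "__main__":
    rule = "out1: part1 mysim.exe\n\t./mysim.exe part1 > out1"
    targets, sources, command = parse_rule(rule)
    print("targets:", targets)   # files the rule creates
    print("sources:", sources)   # files the rule needs
    print("command:", command)   # how the targets are made from the sources
```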

Page 24: Building Scalable Scientific Applications using Makeflow

You must state all the files needed by the command.
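A quick way to catch a forgotten file is to compare the words in the command against the files the rule declares. The Python sketch below does this with a deliberately crude heuristic, which is an assumption of the sketch and not something Makeflow does: a word counts as a file name if it contains a dot. `undeclared_files` is a hypothetical helper.

```python
def undeclared_files(targets, sources, command):
    """Return file-like words in the command that the rule does not declare.

    Heuristic (an assumption of this sketch): a word is treated as a file
    name if it contains a dot once redirection characters and a leading
    "./" are removed.
    """
    declared = set(targets) | set(sources)
    missing = []
    for word in command.split():
        name = word.lstrip("<>")
        if name.startswith("./"):
            name = name[2:]
        if "." in name and name not in declared and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # calib.dat is used by the command but was left out of the source list.
    print(undeclared_files(
        targets=["out.10"],
        sources=["in.dat", "sim.exe"],
        command="./sim.exe -p 10 in.dat calib.dat > out.10",
    ))
```

A file the program opens internally without naming it on the command line would not be caught this way, which is why the rule has to list it explicitly.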

Page 25: Building Scalable Scientific Applications using Makeflow

sims.mf

out.10 : in.dat calib.dat sim.exe
	./sim.exe -p 10 in.dat > out.10

out.20 : in.dat calib.dat sim.exe
	./sim.exe -p 20 in.dat > out.20

out.30 : in.dat calib.dat sim.exe
	./sim.exe -p 30 in.dat > out.30
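Because the rules in sims.mf differ only in the parameter value, they can be generated instead of typed. The Python sketch below writes one rule per value. `sweep_rules` is a hypothetical helper, and the generated command uses `./sim.exe` and `in.dat` so that it matches the declared source files.

```python
def sweep_rules(params, sources=("in.dat", "calib.dat", "sim.exe")):
    """Build Makeflow rule text with one simulation rule per parameter value."""
    rules = []
    for p in params:
        header = f"out.{p} : " + " ".join(sources)
        command = f"\t./sim.exe -p {p} in.dat > out.{p}"
        rules.append(header + "\n" + command + "\n")
    return "\n".join(rules)


if __name__ == "__main__":
    # Print the rules; redirect the output to a file such as sims.mf to use it.
    print(sweep_rules([10, 20, 30]), end="")
```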

Page 26: Building Scalable Scientific Applications using Makeflow

Makeflow = Make + Workflow

Provides portability across batch systems.

Enables parallelism (but not too much!)

Provides fault tolerance at multiple scales.

Handles data and resource management.

[Diagram: Makeflow sits on top of Local, Condor, SGE, and Work Queue.]

http://www.nd.edu/~ccl/software/makeflow
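The parallelism comes from the file dependencies: any rules whose source files already exist can run at the same time. The Python sketch below groups rules into stages on that basis. It is a simplification for illustration, not a description of Makeflow's scheduler, and `parallel_stages` and the rule names are hypothetical.

```python
def parallel_stages(rules):
    """Group rules into stages; rules within one stage can run concurrently.

    `rules` maps a rule name to (targets, sources). A rule is ready once
    every source either exists up front or was made by an earlier stage.
    """
    produced = {t for targets, _ in rules.values() for t in targets}
    available = set()
    for _, sources in rules.values():
        available |= set(sources) - produced   # files that exist up front

    remaining = dict(rules)
    stages = []
    while remaining:
        ready = sorted(name for name, (_, sources) in remaining.items()
                       if set(sources) <= available)
        if not ready:
            raise ValueError("circular dependency among: "
                             + ", ".join(sorted(remaining)))
        for name in ready:
            available |= set(remaining.pop(name)[0])
        stages.append(ready)
    return stages


if __name__ == "__main__":
    workflow = {
        "split": (["part1", "part2", "part3"], ["input.data", "split.py"]),
        "sim1": (["out1"], ["part1", "mysim.exe"]),
        "sim2": (["out2"], ["part2", "mysim.exe"]),
        "sim3": (["out3"], ["part3", "mysim.exe"]),
        "join": (["result"], ["out1", "out2", "out3", "join.py"]),
    }
    for number, stage in enumerate(parallel_stages(workflow), start=1):
        print(number, stage)
```

For the split/simulate/join workflow, the three simulation rules land in the same stage, which is where the speedup comes from.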

Page 27: Building Scalable Scientific Applications using Makeflow

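Makeflow + Batch System

[Diagram: Makeflow reads a Makefile plus local files and programs. Four resources are shown: a private cluster, a campus Condor pool, a public cloud provider, and an XSEDE cluster. Two are reached with "makeflow -T sge" and "makeflow -T condor"; the other two are labeled "Work Queue".]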

Page 28: Building Scalable Scientific Applications using Makeflow

How to run a Makeflow

Run a workflow locally:
% makeflow -T local sims.mf

Run the workflow on SGE:
% makeflow -T sge sims.mf

Run the workflow on Condor:
% makeflow -T condor sims.mf

Clean up the workflow outputs:
% makeflow -c sims.mf

Page 29: Building Scalable Scientific Applications using Makeflow

Makeflow Syntax Checker

Makeflow can verify that your Makeflow file is syntactically correct:
% makeflow -k sims.mf
Makeflow: Syntax OK.

Makeflow will point out syntax errors, if any:
% makeflow -k sims.mf
makeflow: out10 is defined multiple times at out.10:1 and out.10:4

Page 30: Building Scalable Scientific Applications using Makeflow

Makeflow Visualization

Makeflow can output a Makeflow file as a Dot graph.

% makeflow -D dot sims.mf
digraph {
node [shape=ellipse,color = green,style = unfilled,fixedsize = false];
N2 [label="sim.exe"];
N1 [label="sim.exe"];
N0 [label="sim.exe"];
node [shape=box,color=blue,style=unfilled,fixedsize=false];
F3 [label = "out.30"];
F0 [label = "sim.exe"];
F5 [label = "out.10"];
F2 [label = "in.dat"];
F1 [label = "calib.dat"];
F4 [label = "out.20"];
....

Page 31: Building Scalable Scientific Applications using Makeflow

Example App: Biocompute Portal

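[Diagram: the portal generates a Makefile and runs the workflow with Makeflow, which submits tasks to a Condor pool. The transaction log is used to update status and drive a progress bar. Applications shown: BLAST, SSAHA, SHRIMP, EST, MAKER, …]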

Page 32: Building Scalable Scientific Applications using Makeflow

Makeflow + Work Queue

Page 33: Building Scalable Scientific Applications using Makeflow

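Makeflow + Batch System

[Diagram: the same four resources (private cluster, campus Condor pool, public cloud provider, XSEDE cluster) with Makeflow reading a Makefile plus local files and programs. "makeflow -T sge" and "makeflow -T condor" cover two of them; the other two are marked "???".]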

Page 34: Building Scalable Scientific Applications using Makeflow

Makeflow + Work Queue

[Diagram: Makeflow reads a Makefile plus local files and programs and submits tasks to workers (W) running on the XSEDE cluster, the campus Condor pool, the public cloud provider, and the private cluster. Workers are started with ssh, sge_submit_workers, and condor_submit_workers.]

Thousands of Workers in a Personal Cloud

Page 35: Building Scalable Scientific Applications using Makeflow

Advantages of Work Queue

Scalability: Harness multiple infrastructures simultaneously.

Elasticity: Scale resources up & down as needed.

Data Management: Remote data caching.

Data Locality: Matches tasks to nodes with data.

Page 36: Building Scalable Scientific Applications using Makeflow

Fault Tolerance

MF + WQ is fault tolerant:

If Makeflow crashes (or is killed), it recovers by reading its log and continues where it left off.

If a worker crashes, the master will detect and restart the task elsewhere.

Workers can be added and removed any time during execution.
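The recovery idea (record each finished task, then skip recorded tasks after a restart) can be sketched in a few lines of Python. This is an illustration under assumptions: the one-name-per-line log is made up for the sketch and is not Makeflow's log format, and `run_with_log` is a hypothetical helper.

```python
import os
import tempfile


def run_with_log(names, log_path, run):
    """Run tasks in order, skipping any whose name is already in the log.

    Each finished task is appended to the log immediately, so a restart
    after a crash resumes with the first task that was not logged.
    """
    done = set()
    if os.path.exists(log_path):
        with open(log_path) as log:
            done = {line.strip() for line in log if line.strip()}

    executed = []
    with open(log_path, "a") as log:
        for name in names:
            if name in done:
                continue              # finished in an earlier run
            run(name)
            log.write(name + "\n")
            log.flush()
            executed.append(name)
    return executed


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "workflow.log")

    def crash_on_out2(name):
        if name == "out2":
            raise RuntimeError("simulated crash")

    try:
        run_with_log(["out1", "out2", "out3"], log_path, crash_on_out2)
    except RuntimeError:
        pass

    # The second run skips out1 and continues where the first run stopped.
    print(run_with_log(["out1", "out2", "out3"], log_path, lambda name: None))
```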

Page 37: Building Scalable Scientific Applications using Makeflow

Makeflow and Work Queue

To start the Makeflow:
% makeflow -T wq sims.mf
Could not create work_queue on port 9123.

% makeflow -T wq -p 0 sims.mf
Listening for workers on port 8374…

To start one worker:
% work_queue_worker ccl.cse.nd.edu 8374

Page 38: Building Scalable Scientific Applications using Makeflow

Start Workers Everywhere!

Submit workers to SGE:
% sge_submit_workers ccl.cse.nd.edu 8374 25

Submit workers to Condor:
% condor_submit_workers ccl.cse.nd.edu 8374 25

Submit workers to Torque:
% torque_submit_workers ccl.cse.nd.edu 8374 25

Page 39: Building Scalable Scientific Applications using Makeflow

Keeping track of port numbers gets old fast…

Page 40: Building Scalable Scientific Applications using Makeflow

Project Names

[Diagram: Makeflow (port 4057), started with "makeflow … -a -N myproject", advertises to the Catalog. A Worker started with "work_queue_worker -a -N myproject" queries the Catalog, learns that “myproject” is at ccl.cse.nd.edu:4057, and connects to ccl.cse.nd.edu:4057.]
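The advertise/query exchange amounts to a lookup table keyed by project name. The Python sketch below models it in memory. It is an illustration only, not the catalog server's protocol, and the `Catalog` class and its method names are hypothetical.

```python
class Catalog:
    """In-memory stand-in for a catalog: project name -> (host, port)."""

    def __init__(self):
        self._projects = {}

    def advertise(self, project, host, port):
        """Called by the master side: record where a project is listening."""
        self._projects[project] = (host, port)

    def query(self, project):
        """Called by the worker side: return (host, port), or None if unknown."""
        return self._projects.get(project)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.advertise("myproject", "ccl.cse.nd.edu", 4057)

    # A worker only needs the project name to find out where to connect.
    print(catalog.query("myproject"))
    print(catalog.query("unknown-project"))
```

With this indirection the port can change on every run, since workers look it up by name each time.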

Page 41: Building Scalable Scientific Applications using Makeflow

Makeflow with Project Names

Start Makeflow with a project name:
% makeflow -T wq -p 0 -a -N xsede-tutorial sims.mf
Listening for workers on port XYZ…

Start one worker:
% work_queue_worker -N xsede-tutorial

Start many workers:
% sge_submit_workers -N xsede-tutorial 5

Page 42: Building Scalable Scientific Applications using Makeflow

http://www.nd.edu/~ccl/software/makeflow/

Page 43: Building Scalable Scientific Applications using Makeflow

Makeflow

• Portable: One program for clusters, grids, clouds

• Simple syntax: inputs, outputs, command
   – All files needed by command must be specified

• Makeflow with Work Queue
   – Federation, Elasticity, Data management

• Project Names
   – Easy to remember locations of Makeflow masters

Page 44: Building Scalable Scientific Applications using Makeflow

Acknowledgements

Chris Hempel (TACC)

David Gignac (TACC)

Page 45: Building Scalable Scientific Applications using Makeflow

Go to: http://www.nd.edu/~ccl

Click on “Tutorial at XSEDE 2013”

Page 46: Building Scalable Scientific Applications using Makeflow

Click on “Tutorial” under Makeflow