Building Scalable Scientific Applications using Makeflow
Dinesh Rajan and Peter Sempolinski
University of Notre Dame
Cooperative Computing Lab
http://www.nd.edu/~ccl
The Cooperative Computing Lab
• We collaborate with people who have large scale computing problems in science, engineering, and other fields.
• We operate computer systems on the order of 10,000 cores: clusters, clouds, grids.
• We conduct computer science research in the context of real people and problems.
• We develop open source software for large scale distributed computing.
http://www.nd.edu/~ccl
Plan for Today’s Tutorial
1. Our CCTools Software
   i. Makeflow, Work Queue, Parrot, Chirp
2. Makeflow
   i. Lecture: Overview, features
   ii. Tutorial: Write simple Makeflows
3. Work Queue
   i. Lecture: Overview, features
   ii. Tutorial: Write simple WQ programs
Science Depends on Computing!
The Good News: Computing is Plentiful!
Superclusters by the Hour
http://arstechnica.com/business/news/2011/09/30000-core-cluster-built-on-amazon-ec2-cloud.ars
I have a standard, debugged, trusted application that runs on my laptop. A toy problem completes in one hour. A real problem will take a month (I think.)
Can I get a single result faster? Can I get more results in the same time?
Last year, I heard about this grid thing.
This year, I heard about this cloud thing.
I have allocations on clusters (unlimited) + grids (limited) + clouds ($)!
How do I run my application on those machines?
Should I port my program to MPI or Hadoop?
• Learn MPI / Hadoop
• Learn C / Java
• Re-architect
• Re-write
• Re-test
• Re-debug
• Re-certify
And my application looks like this…
Makeflow & Work Queue
• Easy to scale from one desktop to national scale infrastructure.
• Harness all available resources: desktops, clusters, clouds, grids.
• Portable across operating systems, storage systems, batch systems.
• No special privileges required.
Makeflow
part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result
Work Queue Library
http://www.nd.edu/~ccl/software/workqueue
#include "work_queue.h"

while( not done ) {
    while (more work ready) {
        task = work_queue_task_create();
        // add some details to the task
        work_queue_submit(queue, task);
    }
    task = work_queue_wait(queue);
    // process the completed task
}
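The submit/wait loop above can be tried out locally before touching Work Queue. The sketch below is only an analogy: it uses Python's standard thread pool in place of remote workers, and `simulate`, `run_all`, and `max_in_flight` are hypothetical names, not part of the Work Queue API.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def simulate(part):
    # Hypothetical stand-in for a task that would run on a remote worker.
    return f"out{part}"


def run_all(parts, max_in_flight=2):
    """Mimic the slide's loop: submit while work is ready, then wait for a result."""
    results = []
    todo = list(parts)
    in_flight = set()
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        while todo or in_flight:                           # while (not done)
            while todo and len(in_flight) < max_in_flight:  # while (more work ready)
                in_flight.add(pool.submit(simulate, todo.pop(0)))
            # Block until at least one task finishes, like work_queue_wait.
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())            # process the completed task
    return sorted(results)


print(run_all([1, 2, 3]))
```

The inner loop keeps the pool busy, and the outer loop ends only when nothing is left to submit and nothing is still running.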
Parrot Virtual File System
[Figure: an ordinary application uses the filesystem interface (open/read/write/close) through Parrot, which offers Local, HTTP (web servers), CVMFS (CVMFS network), Chirp (Chirp server), and iRODS (iRODS server) back ends.]
Parrot and Chirp
http://www.nd.edu/~ccl/software/manuals
Source code in GitHub: http://github.com/cooperative-computing-lab/cctools
Makeflow & Work Queue
• Federate/harness all available resources: desktops, clusters, clouds, grids.
• Simple interfaces & API
• Part of CCTools software
• No special privileges required to install.
Makeflow Lecture: Outline
1. What is Makeflow?
   • Portable: One Makeflow program for SGE, Condor, PBS
2. How to write an application using Makeflow?
   • Simple rule-based syntax
3. How to run Makeflow?
   • Features, commands, using Work Queue
An Old Idea: Makefiles
part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result
Makeflow Language - Rules
Each rule specifies:
• a set of target files to create;
• a set of source files needed to create them;
• a command that generates the target files from the source files.
out1 : part1 mysim.exe
	mysim.exe part1 > out1
You must state all the files needed by the command.
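To make the three parts of a rule concrete, here is a minimal sketch that splits one rule into targets, sources, and command. `Rule` and `parse_rule` are hypothetical helpers written for illustration; this is not how Makeflow itself parses its input.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    targets: list   # files the command creates
    sources: list   # files the command needs, including the executable
    command: str    # shell command that turns sources into targets


def parse_rule(text):
    """Parse one 'targets : sources' line followed by one indented command line."""
    header, command = text.strip("\n").split("\n", 1)
    targets, sources = header.split(":", 1)
    return Rule(targets.split(), sources.split(), command.strip())


rule = parse_rule("out1: part1 mysim.exe\n\t./mysim.exe part1 > out1\n")
print(rule.targets)   # ['out1']
print(rule.sources)   # ['part1', 'mysim.exe']
print(rule.command)   # ./mysim.exe part1 > out1
```

Note that mysim.exe appears among the sources: the executable is one of the files the command needs, so it must be stated too.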
sims.mf
out.10 : in.dat calib.dat sim.exe
	sim.exe -p 10 in.dat > out.10

out.20 : in.dat calib.dat sim.exe
	sim.exe -p 20 in.dat > out.20

out.30 : in.dat calib.dat sim.exe
	sim.exe -p 30 in.dat > out.30
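Rules that differ only in one parameter are easy to generate rather than type. The sketch below builds the text of a sweep like sims.mf; `sweep_rules` and its arguments are hypothetical names, and the input file is assumed to be in.dat as listed in the rule sources.

```python
def sweep_rules(params, inputs=("in.dat", "calib.dat"), exe="sim.exe"):
    """Return Makeflow rule text with one rule per parameter value."""
    sources = " ".join(list(inputs) + [exe])
    rules = []
    for p in params:
        target = f"out.{p}"
        # The first input is passed on the command line; all inputs are declared.
        command = f"{exe} -p {p} {inputs[0]} > {target}"
        rules.append(f"{target}: {sources}\n\t{command}\n")
    return "\n".join(rules)


text = sweep_rules([10, 20, 30])
print(text)
```

Writing `text` to a file named sims.mf gives a workflow that can be passed to makeflow; a sweep over hundreds of values only changes the list argument.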
Makeflow = Make + Workflow
• Provides portability across batch systems.
• Enables parallelism (but not too much!)
• Fault tolerance at multiple scales.
• Data and resource management.
[Figure: Makeflow on top of Local, Condor, SGE, and Work Queue.]
http://www.nd.edu/~ccl/software/makeflow
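Parallelism falls out of the declared sources and targets: any rules whose sources already exist can run at the same time. The sketch below groups the split/simulate/join example into such batches. It is a simplified illustration under that assumption, not Makeflow's actual scheduler, and `ready_batches` is a hypothetical name.

```python
def ready_batches(rules):
    """Group rules into batches; every rule in a batch can run in parallel.

    `rules` maps a tuple of target files to the list of source files it needs.
    """
    produced = {t for targets in rules for t in targets}
    available = set()      # targets built so far
    pending = dict(rules)
    batches = []
    while pending:
        # A rule is ready when each source is either an existing input
        # (not produced by any rule) or has already been built.
        batch = [targets for targets, sources in pending.items()
                 if all(s not in produced or s in available for s in sources)]
        if not batch:
            raise ValueError("circular dependency among rules")
        for targets in batch:
            available.update(targets)
            del pending[targets]
        batches.append(batch)
    return batches


workflow = {
    ("part1", "part2", "part3"): ["input.data", "split.py"],
    ("out1",): ["part1", "mysim.exe"],
    ("out2",): ["part2", "mysim.exe"],
    ("out3",): ["part3", "mysim.exe"],
    ("result",): ["out1", "out2", "out3", "join.py"],
}
for number, batch in enumerate(ready_batches(workflow), 1):
    print(number, batch)
```

The three simulation rules land in the same batch, so they can be dispatched together once the split step has finished.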
[Figure: Makeflow + Batch System. Labels: Makefile; Makeflow; Local Files and Programs; makeflow -T sge; makeflow -T condor; Work Queue; Private Cluster; Campus Condor Pool; Public Cloud Provider; XSEDE Cluster.]
How to run a Makeflow
Run a workflow locally:
    % makeflow -T local sims.mf
Run the workflow on SGE:
    % makeflow -T sge sims.mf
Run the workflow on Condor:
    % makeflow -T condor sims.mf
Clean up the workflow outputs:
    % makeflow -c sims.mf
Makeflow Syntax Checker
Makeflow can verify whether your Makeflow file is syntactically correct:
    % makeflow -k sims.mf
    Makeflow: Syntax OK.
Makeflow will point out syntax errors, if any:
    % makeflow -k sims.mf
    makeflow: out10 is defined multiple times at out.10:1 and out.10:4
Makeflow Visualization
Makeflow can output a Makeflow file as a Dot graph.
    % makeflow -D dot sims.mf
    digraph {
    node [shape=ellipse,color = green,style = unfilled,fixedsize = false];
    N2 [label="sim.exe"];
    N1 [label="sim.exe"];
    N0 [label="sim.exe"];
    node [shape=box,color=blue,style=unfilled,fixedsize=false];
    F3 [label = "out.30"];
    F0 [label = "sim.exe"];
    F5 [label = "out.10"];
    F2 [label = "in.dat"];
    F1 [label = "calib.dat"];
    F4 [label = "out.20"];
    ....
Example App: Biocompute Portal
[Figure: Biocompute Portal workflow. Labels: Generate Makefile; Makeflow; Run Workflow; Progress Bar; Transaction Log; Update Status; Submit Tasks; Condor Pool; applications BLAST, SSAHA, SHRIMP, EST, MAKER, …]
Makeflow + Work Queue
[Figure: Makeflow + Batch System. Makeflow reads the Makefile and local files and programs; makeflow -T sge and makeflow -T condor are shown alongside the private cluster and campus Condor pool, while two links to the remaining resources (public cloud provider, XSEDE cluster) are marked "???".]
[Figure: Makeflow + Work Queue. Makeflow submits tasks to thousands of workers (W) in a personal cloud; workers are started with ssh, sge_submit_workers, and condor_submit_workers across the XSEDE cluster, campus Condor pool, public cloud provider, and private cluster.]
Advantages of Work Queue
• Scalability: Harness multiple infrastructures simultaneously.
• Elasticity: Scale resources up & down as needed.
• Data Management: Remote data caching.
• Data Locality: Matches tasks to nodes with data.
Fault Tolerance
MF + WQ is fault tolerant:
• If Makeflow crashes (or is killed), it recovers by reading its log and continues where it left off.
• If a worker crashes, the master will detect the failure and restart the task elsewhere.
• Workers can be added and removed at any time during execution.
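The log-based recovery in the first bullet can be sketched in a few lines. This is an illustration of the idea only: the list `log` stands in for an append-only file, the real Makeflow log format is different, and `run_with_log` is a hypothetical name.

```python
def run_with_log(rules, run, log):
    """Run rules in order, skipping any whose target is already in the log.

    `rules` is a list of (target, command) pairs, `run` executes a command,
    and `log` is a list standing in for an append-only file on disk.
    """
    finished = set(log)            # replay the log after a restart
    for target, command in rules:
        if target in finished:
            continue               # completed before the crash
        run(command)
        log.append(target)         # record completion immediately


rules = [("out1", "cmd1"), ("out2", "cmd2"), ("out3", "cmd3")]
log, executed = [], []
crash_once = {"cmd2"}


def run(command):
    # Simulate a crash the first time cmd2 is attempted.
    if command in crash_once:
        crash_once.discard(command)
        raise RuntimeError("simulated crash")
    executed.append(command)


try:
    run_with_log(rules, run, log)   # first attempt dies during cmd2
except RuntimeError:
    pass
run_with_log(rules, run, log)       # restart: cmd1 is skipped, not repeated
print(executed)
print(log)
```

Because completion is recorded right after each command, the restart repeats only the work that was in progress when the crash happened.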
Makeflow and Work Queue
To start the Makeflow:
    % makeflow -T wq sims.mf
    Could not create work_queue on port 9123.
If the default port is taken, -p 0 lets Makeflow pick an available port:
    % makeflow -T wq -p 0 sims.mf
    Listening for workers on port 8374…
To start one worker:
    % work_queue_worker ccl.cse.nd.edu 8374
Start Workers Everywhere!
Submit workers to SGE:
    % sge_submit_workers ccl.cse.nd.edu 8374 25
Submit workers to Condor:
    % condor_submit_workers ccl.cse.nd.edu 8374 25
Submit workers to Torque:
    % torque_submit_workers ccl.cse.nd.edu 8374 25
Keeping track of port numbers gets old fast…
Project Names
[Figure: Makeflow (port 4057), started with makeflow … -a -N myproject, advertises to the catalog. A worker started with work_queue_worker -a -N myproject queries the catalog, learns that “myproject” is at ccl.cse.nd.edu:4057, and connects to ccl.cse.nd.edu:4057.]
Makeflow with Project Names
Start Makeflow with a project name:
    % makeflow -T wq -p 0 -a -N xsede-tutorial sims.mf
    Listening for workers on port XYZ…
Start one worker:
    % work_queue_worker -N xsede-tutorial
Start many workers:
    % sge_submit_workers -N xsede-tutorial 5
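The advertise/query exchange behind project names amounts to a name lookup. The toy sketch below keeps the catalog in memory to show the idea; `Catalog`, `advertise`, and `query` are hypothetical names, and the real catalog server is a network service with its own protocol.

```python
class Catalog:
    """Toy in-memory stand-in for the catalog server."""

    def __init__(self):
        self._masters = {}

    def advertise(self, project, host, port):
        # Master side: publish where this project is listening.
        self._masters[project] = (host, port)

    def query(self, project):
        # Worker side: look up the master by name; None if unknown.
        return self._masters.get(project)


catalog = Catalog()
catalog.advertise("xsede-tutorial", "ccl.cse.nd.edu", 4057)
print(catalog.query("xsede-tutorial"))
print(catalog.query("unknown-project"))
```

Workers only need the project name; if the master restarts on a different port, it advertises again and the same name keeps working.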
http://www.nd.edu/~ccl/software/makeflow/
Makeflow
• Portable: One program for clusters, grids, clouds
• Simple syntax: inputs, outputs, command
  • All files needed by command must be specified
• Makeflow with Work Queue
  • Federation, Elasticity, Data management
• Project Names
  • Easy to remember locations of Makeflow masters
Acknowledgements
Chris Hempel (TACC)
David Gignac (TACC)
Go to: http://www.nd.edu/~ccl
Click on “Tutorial at XSEDE 2013”
Click on “Tutorial” under Makeflow