Condor Project
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor
Master/Worker and Condor
Barcelona, 2006
Agenda
Extended user's tutorial
Advanced Uses of Condor
  Java programs
  DAGMan
  Stork
  MW
  Grid Computing
Case studies, and a discussion of your application's needs
Why Master Worker?
MW addresses a weakness in Condor:
Short jobs
Excellent for dynamic, parallel workflows
A Workflow Problem
A problem requires that we do A 60,000 times and B 100,000 times
  A takes 1 second
  B takes 3 seconds
Computation time for the problem is
(60,000 x 1) + (100,000 x 3) = 360,000 seconds, or 100 hours
Condor Runs the Workflow
Assume that the overhead Condor adds to running each instance of A or B is 20 seconds (an optimistic figure; the real overhead is usually larger)
Time for Condor to do the problem is
(60,000 x 21) + (100,000 x 23) = 3,560,000 seconds, or about 989 hours
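The arithmetic on these two slides can be checked with a few lines of C++. This is only a sketch: the names Workload, total_seconds and to_hours_rounded are made up for illustration, and the 20-second overhead is the assumption stated on the slide.

```cpp
#include <cassert>
#include <cstdint>

// One kind of work item in the example: how many runs, and seconds per run.
struct Workload {
    std::int64_t count;
    std::int64_t seconds_each;
};

// Total seconds when every run is submitted as its own job and pays a fixed
// per-job overhead. An overhead of 0 gives the pure computation time.
std::int64_t total_seconds(const Workload& a, const Workload& b,
                           std::int64_t overhead_per_job) {
    assert(overhead_per_job >= 0);
    return a.count * (a.seconds_each + overhead_per_job) +
           b.count * (b.seconds_each + overhead_per_job);
}

// Convert seconds to hours, rounded to the nearest whole hour.
std::int64_t to_hours_rounded(std::int64_t seconds) {
    return (seconds + 1800) / 3600;
}
```

With A = {60000, 1} and B = {100000, 3}, an overhead of 0 reproduces the 360,000 seconds above and an overhead of 20 reproduces the 3,560,000 seconds.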
A Condor Job…
An Often Considered Solution
Bundle several As or Bs into a single Condor job
Must address further issues:
  Partial failures
  Load balancing
  Dynamic creation of work
[Figure: several As bundled into one Condor job]
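To see why bundling is tempting, the sketch below estimates the total time when the per-job overhead is paid once per bundle instead of once per run. The function name bundled_seconds and the bundle size used in the note are assumptions for illustration; the sketch ignores the failure, load-balancing and dynamic-work issues listed above.

```cpp
#include <cassert>
#include <cstdint>

// Seconds needed to run `count` items of `seconds_each` when they are packed
// `bundle_size` to a job and each job pays `overhead_per_job` exactly once.
std::int64_t bundled_seconds(std::int64_t count, std::int64_t seconds_each,
                             std::int64_t bundle_size,
                             std::int64_t overhead_per_job) {
    assert(bundle_size > 0);
    // Number of jobs is count / bundle_size, rounded up.
    std::int64_t jobs = (count + bundle_size - 1) / bundle_size;
    return count * seconds_each + jobs * overhead_per_job;
}
```

With a 20-second overhead and a hypothetical bundle size of 100, the 60,000 As need 72,000 seconds and the 100,000 Bs need 320,000 seconds, about 109 hours in total, compared with 989 hours for one run per job.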
Basics of MW
The master gives tasks to the workers.
Workers and Tasks
Each worker serially takes on tasks, as assigned by the master
[Figure: one worker with the tasks "feed me", "change diaper", "bathe me"]
Relating MW to Condor
There is 1 master
The master determines the number of workers
Each worker is a Condor job
Each worker receives tasks serially
Many workers do tasks at the same time (in parallel)
Workers communicate only with the master
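The rules above can be illustrated with a small simulation rather than real Condor jobs. The sketch below is not MW code: simulate_makespan is a hypothetical function that models one master handing each task, in order, to whichever worker becomes idle first, with every worker running its tasks serially.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Simulated wall-clock time when a single master hands each task, in order,
// to whichever worker becomes idle first. Each worker runs its tasks serially.
std::int64_t simulate_makespan(const std::vector<std::int64_t>& task_seconds,
                               int num_workers) {
    assert(num_workers > 0);
    // Min-heap of the times at which each worker next becomes idle.
    std::priority_queue<std::int64_t, std::vector<std::int64_t>,
                        std::greater<std::int64_t>> idle_at;
    for (int i = 0; i < num_workers; ++i) idle_at.push(0);

    std::int64_t finish = 0;
    for (std::int64_t t : task_seconds) {
        std::int64_t start = idle_at.top();  // earliest idle worker gets the task
        idle_at.pop();
        std::int64_t done = start + t;
        if (done > finish) finish = done;
        idle_at.push(done);                  // worker is busy until `done`
    }
    return finish;
}
```

Because a worker only asks for more work when it is idle, long and short tasks balance out across workers without the workers ever talking to each other.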
Solution: Lightweight Tasks
Multiplexed on top of Jobs
The analogy: Process is to Thread as Condor Job is to an MW Task
A Condor job may take minutes to create and dispatch; an MWTask dispatch takes milliseconds
MW is
C++ Framework
A way to re-use Condor worker jobs
Each worker may run many tasks
Results in a very parallel application
MW is not
MPI (Message Passing Interface)
A general parallel programming scheme
MW in action
[Figure: condor_submit on the submit machine, the master exe, three workers, and tasks (T)]
You Must Write 3 Classes, the Subclasses of...
MWDriver
MWTask
MWWorker
[Figure: master exe and worker exe]
An MWTask
Subclass MWTask
  Data members for inputs
  Data member for results
Serialization of inputs and results
Distinct instances on each side
The Four Task Methods
void MyTask::pack_work(void);
void MyTask::unpack_work(void);
void MyTask::pack_results(void);
void MyTask::unpack_results(void);
Also constructors and destructors!
RMC
Resource Management and Communication
An abstraction to set up communication, to specify resource requirements, etc.
RMC->pack(int *array, int length);
RMC->unpack(int *array, int length);
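A task's round trip between master and worker might look like the sketch below. It does not use the real MW headers: FakeRMC and the MWTask base class here are stand-ins written for this example, with only the method names and the pack/unpack signatures taken from the slides. MyTask and its lo, hi and sum members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for MW's RMC object: a FIFO buffer of ints. The real RMC moves
// the packed data between master and worker; this one only stores it.
class FakeRMC {
public:
    void pack(int* array, int length) {
        buf_.insert(buf_.end(), array, array + length);
    }
    void unpack(int* array, int length) {
        assert(pos_ + static_cast<std::size_t>(length) <= buf_.size());
        for (int i = 0; i < length; ++i) array[i] = buf_[pos_++];
    }
private:
    std::vector<int> buf_;
    std::size_t pos_ = 0;
};

// Stand-in base class with the four methods named on the slide.
class MWTask {
public:
    virtual ~MWTask() {}
    virtual void pack_work() = 0;
    virtual void unpack_work() = 0;
    virtual void pack_results() = 0;
    virtual void unpack_results() = 0;
};

// Hypothetical task: sum the integers from lo to hi.
class MyTask : public MWTask {
public:
    explicit MyTask(FakeRMC* rmc) : RMC(rmc) {}
    int lo = 0, hi = 0;  // inputs, filled in by the master
    int sum = 0;         // result, filled in by the worker

    // Master side packs the inputs; worker side unpacks them in the same order.
    void pack_work() override   { RMC->pack(&lo, 1);   RMC->pack(&hi, 1); }
    void unpack_work() override { RMC->unpack(&lo, 1); RMC->unpack(&hi, 1); }
    // Worker side packs the result; master side unpacks it.
    void pack_results() override   { RMC->pack(&sum, 1); }
    void unpack_results() override { RMC->unpack(&sum, 1); }
private:
    FakeRMC* RMC;
};
```

The master's MyTask and the worker's MyTask are separate objects; the only thing they share is the data that passes through pack and unpack, which is why the order of the calls must match on both sides.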
MWWorker
• Just one method:
executeTask(MWTask *t)
• Also constructor and destructor!
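A worker subclass could be as small as the sketch below. The MWTask and MWWorker classes here are stand-ins for the real MW base classes, using the executeTask name from the slide; MyTask, MyWorker and the summing work are hypothetical.

```cpp
#include <cassert>

// Stand-ins for the MW base classes (not the real headers).
class MWTask {
public:
    virtual ~MWTask() {}
};

class MWWorker {
public:
    virtual ~MWWorker() {}
    virtual void executeTask(MWTask* t) = 0;
};

// Hypothetical task carrying two inputs and one result.
class MyTask : public MWTask {
public:
    int lo = 0, hi = 0;
    int sum = 0;
};

// The worker's only job: compute the result from the task's inputs
// and store it back in the same task object.
class MyWorker : public MWWorker {
public:
    void executeTask(MWTask* t) override {
        MyTask* task = dynamic_cast<MyTask*>(t);
        assert(task != nullptr);  // this worker only understands MyTask
        task->sum = 0;
        for (int i = task->lo; i <= task->hi; ++i) task->sum += i;
    }
};
```

The framework passes the task in as a base-class pointer, so the worker casts it back to its own task type before reading the inputs.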
MWDriver (the master)
get_userinfo(int argc, char **argv)
  RMC->add_executable(char *exe, char *requirements);
setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
act_on_completed_task(MWTask *t)
  RMC->add_task(MWTask *t)
Also constructor and destructor
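Dynamic creation of work happens in act_on_completed_task. The sketch below shows one way that could look, assuming a task can come back marked as needing to be split. FakeRMC, MyDriver, MyTask and the needs_split flag are all invented for this example; only the names act_on_completed_task and RMC->add_task come from the slide.

```cpp
#include <cassert>
#include <deque>

// Stand-in for the MW base class (not the real header).
class MWTask {
public:
    virtual ~MWTask() {}
};

// Hypothetical task covering the range lo..hi.
class MyTask : public MWTask {
public:
    MyTask(int lo_, int hi_) : lo(lo_), hi(hi_) {}
    int lo, hi;
    bool needs_split = false;  // set by the worker in this sketch
};

// Stand-in for the part of RMC that queues new work.
class FakeRMC {
public:
    void add_task(MWTask* t) { queue.push_back(t); }
    std::deque<MWTask*> queue;
};

class MyDriver {
public:
    explicit MyDriver(FakeRMC* rmc) : RMC(rmc) {}

    // Called once per finished task. If the worker could not finish the
    // range, split it in two and queue both halves as new tasks.
    void act_on_completed_task(MWTask* t) {
        MyTask* done = dynamic_cast<MyTask*>(t);
        assert(done != nullptr);
        if (done->needs_split && done->lo < done->hi) {
            int mid = done->lo + (done->hi - done->lo) / 2;
            RMC->add_task(new MyTask(done->lo, mid));
            RMC->add_task(new MyTask(mid + 1, done->hi));
        }
    }
private:
    FakeRMC* RMC;
};
```

This is the pattern branch and bound codes rely on: finishing one task can create more tasks, so the total amount of work is not known when the run starts.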
MWTask ***init_tasks
pointer to the array -> array of pointers to tasks -> task
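The three levels of indirection are easier to follow in code. The sketch below writes setup_initial_tasks as a free function with the parameter list shown on the slide; the MWTask stand-in, MyTask and the free_tasks helper are assumptions made for this example and say nothing about who owns the memory in the real framework.

```cpp
#include <cassert>

// Stand-in for the MW base class (not the real header).
class MWTask {
public:
    virtual ~MWTask() {}
};

// Hypothetical task that just remembers its index.
class MyTask : public MWTask {
public:
    explicit MyTask(int id_) : id(id_) {}
    int id;
};

// The caller passes the address of its own MWTask** variable, and this
// function makes that variable point at a newly allocated array.
void setup_initial_tasks(int num_tasks, MWTask*** init_tasks) {
    assert(num_tasks > 0 && init_tasks != nullptr);
    *init_tasks = new MWTask*[num_tasks];   // the array of pointers
    for (int i = 0; i < num_tasks; ++i)
        (*init_tasks)[i] = new MyTask(i);   // each entry points to one task
}

// Helper for this sketch only: release what setup_initial_tasks allocated.
void free_tasks(int num_tasks, MWTask** tasks) {
    for (int i = 0; i < num_tasks; ++i) delete tasks[i];
    delete[] tasks;
}
```

A caller would declare MWTask **tasks, pass &tasks, and afterwards read tasks[i] as a pointer to the i-th task.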
Putting it all together: examples/new_skel
./new_app MY_PROJECT
  A Perl script to create appropriately named files containing skeleton code
Use configure --help for options
make
Running an application
Just launch the appropriate master
Use condor_q to see it in action
Real MW Applications
MWFATCOP (Chen, Ferris, Linderoth)
  A branch and cut code for linear integer programming
MWMINLP (Goux, Leyffer, Nocedal)
  A branch and bound code for nonlinear integer programming
MWQPBB (Linderoth)
  A (simplicial) branch and bound code for solving quadratically constrained quadratic programs
MWAND (Linderoth, Shen)
  A nested decomposition based solver for multistage stochastic linear programming
MWATR (Linderoth, Shapiro, Wright)
  A trust-region-enhanced cutting plane code for linear stochastic programming and statistical verification of solution quality
MWQAP (Anstreicher, Brixius, Goux, Linderoth)
  A branch and bound code for solving the quadratic assignment problem
Other resources
http://www.cs.wisc.edu/condor/mw
Online manual
MW-users mailing list
Extra Slides
Advice for Large Runs
Use Personal Condor
Flock, glidein, schedd-on-side, hobblein
Use checkpoints!
Set worker_increment high
Debugging with Independent Mode
Special RMComm for debugging
Single process, can run under gdb
MW Philosophy
Reuse either code or concept
Key idea: Late binding
User-level Checkpoints
MWTask::write_chkpt_info(FILE *)
MWTask::read_chkpt_info(FILE *)
MWDriver::read_master_state(FILE *)
MWDriver::write_master_state(FILE *)
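On the task side, a checkpoint hook only has to write the task's state to the file it is given and read it back in the same format. The sketch below assumes a stand-in MWTask base class with the two method names from the slide; MyTask, its fields and the text format are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in base class with the two checkpoint hooks named on the slide.
class MWTask {
public:
    virtual ~MWTask() {}
    virtual void write_chkpt_info(FILE* fp) = 0;
    virtual void read_chkpt_info(FILE* fp) = 0;
};

// Hypothetical task whose whole state is two integers.
class MyTask : public MWTask {
public:
    int lo = 0, hi = 0;

    // Write the state as one line of text.
    void write_chkpt_info(FILE* fp) override {
        std::fprintf(fp, "%d %d\n", lo, hi);
    }

    // Read the state back in the same order it was written.
    void read_chkpt_info(FILE* fp) override {
        int fields = std::fscanf(fp, "%d %d", &lo, &hi);
        assert(fields == 2);  // both values must be present
        (void)fields;         // avoids an unused warning when asserts are off
    }
};
```

The master-side read_master_state and write_master_state hooks would follow the same pattern for whatever state the driver keeps between tasks.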
Example codes with MW
Matmul
Blackbox
knapsack
More on MW
http://www.cs.wisc.edu/condor/mw
Version 0.2 is the latest
  It is more stable than the version number suggests!
Mailing list available for discussion
Active development by the Condor team