Master/Worker and Condor Barcelona, 2006

Condor Project
Computer Sciences Department
University of Wisconsin-Madison

condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor

Master/Worker and Condor

Barcelona, 2006

Agenda

• Extended user’s tutorial
• Advanced Uses of Condor
    • Java programs
    • DAGMan
    • Stork
    • MW
    • Grid Computing
• Case studies, and a discussion of your application’s needs

Why Master Worker?

MW addresses a weakness in Condor:

Short jobs

Excellent for dynamic, parallel workflows

A Workflow Problem

A problem requires that we do A 60,000 times and B 100,000 times

• A takes 1 second
• B takes 3 seconds

Computation time for the problem is
(60000 x 1) + (100000 x 3) = 360,000 seconds, or 100 hours

Condor Runs the Workflow

Assume that the overhead Condor adds to running each instance of A or B is 20 seconds (this estimate is much too small; the real overhead is larger)

Time for Condor to do the problem is
(60000 x 21) + (100000 x 23) = 3,560,000 seconds, or about 989 hours
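
The two totals above can be checked with a few lines of C++. This is only a sketch of the slide arithmetic; the names `Workload`, `total_seconds`, and `rounded_hours` are invented for illustration and are not part of Condor or MW.

```cpp
#include <cstdint>

// Counts and per-run durations, as given on the slides.
struct Workload {
    std::int64_t runs_a;     // number of times A runs
    std::int64_t runs_b;     // number of times B runs
    std::int64_t seconds_a;  // time for one run of A
    std::int64_t seconds_b;  // time for one run of B
};

// Total seconds when every single run pays a fixed per-job overhead.
std::int64_t total_seconds(const Workload& w, std::int64_t overhead_per_job) {
    return w.runs_a * (w.seconds_a + overhead_per_job)
         + w.runs_b * (w.seconds_b + overhead_per_job);
}

// Convert seconds to hours, rounded to the nearest whole hour.
std::int64_t rounded_hours(std::int64_t seconds) {
    return (seconds + 1800) / 3600;
}
```

With `Workload{60000, 100000, 1, 3}`, an overhead of 0 gives 360,000 seconds (100 hours) and an overhead of 20 gives 3,560,000 seconds (about 989 hours), so roughly nine tenths of the wall-clock time is overhead.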

A Condor Job…

An Often Considered Solution

Bundle several As or Bs into a single Condor job

Must address further issues:
• Partial failures
• Load balancing
• Dynamic creation of work

[Diagram: several As inside one Condor job]
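
To see why bundling is tempting, here is a rough sketch of how the per-job overhead is amortized when several runs share one Condor job. The bundle size is a free parameter chosen by the user, not a figure from the slides, and the function names are hypothetical. The sketch ignores the issues listed above (partial failures, load balancing, dynamic work).

```cpp
#include <cstdint>

// Number of Condor jobs needed when `bundle_size` runs share one job.
std::int64_t jobs_needed(std::int64_t runs, std::int64_t bundle_size) {
    return (runs + bundle_size - 1) / bundle_size;  // ceiling division
}

// Total seconds for `runs` runs of `seconds_per_run` each,
// with the per-job overhead paid once per bundle instead of once per run.
std::int64_t bundled_seconds(std::int64_t runs,
                             std::int64_t seconds_per_run,
                             std::int64_t bundle_size,
                             std::int64_t overhead_per_job) {
    return runs * seconds_per_run
         + jobs_needed(runs, bundle_size) * overhead_per_job;
}
```

With an assumed bundle size of 100 and the 20-second overhead used earlier, the 60,000 runs of A cost 72,000 seconds and the 100,000 runs of B cost 320,000 seconds. A bundle size of 1 reproduces the unbundled figure of 60000 x 21 seconds for A.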

Basics of MW

The master gives tasks to the workers.

Workers and Tasks

Each worker serially takes on tasks, as assigned by the master

[Diagram: one worker with its tasks: "feed me", "change diaper", "bathe me"]

Relating MW to Condor

• There is 1 master
• The master determines the number of workers
• Each worker is a Condor job
• Each worker receives tasks serially
• Many workers do tasks at the same time (in parallel)
• Workers communicate only with the master
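
The relationship above can be illustrated with a small single-process simulation: one master hands tasks, in order, to whichever worker becomes free first, and each worker runs its tasks one after another. This is a sketch of the scheduling idea only, with an invented function name; it does not use MW or Condor, and it ignores dispatch cost.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <stdexcept>
#include <vector>

// Simulated finish time (in seconds) when tasks are handed out in order
// to the earliest-free worker. Each worker processes its tasks serially.
long makespan(const std::vector<long>& task_seconds, int num_workers) {
    if (num_workers < 1) throw std::invalid_argument("need at least one worker");

    // Min-heap of the times at which each worker next becomes free.
    std::priority_queue<long, std::vector<long>, std::greater<long>> free_at;
    for (int i = 0; i < num_workers; ++i) free_at.push(0);

    long finish = 0;
    for (long seconds : task_seconds) {
        long start = free_at.top();  // the worker that is idle soonest
        free_at.pop();
        long end = start + seconds;  // that worker runs this task next
        free_at.push(end);
        finish = std::max(finish, end);
    }
    return finish;
}
```

For six tasks of 1, 1, 1, 3, 3, and 3 seconds, one worker needs 12 seconds while three workers need 4, which is the parallel speedup the slide describes.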

Solution: Lightweight Tasks Multiplexed on top of Jobs

The analogy: Process is to Thread as Condor Job is to an MW Task

A Condor job may take minutes to create and dispatch; an MWTask dispatch takes milliseconds

MW is

• C++ Framework
• A way to re-use Condor worker jobs
• Each worker may run many tasks
• Results in a very parallel application

MW is not

• MPI (Message Passing Interface)
• General parallel programming scheme

MW in action

[Diagram: condor_submit on the submit machine starts the master exe, which holds tasks (T) and hands them to three workers]

You Must Write 3 Classes, the Subclasses of...

• MWDriver
• MWTask
• MWWorker

[Diagram: the classes are built into two executables, the master exe and the worker exe]

An MWTask

• Subclass MWTask
• Data members for inputs
• Data member for results
• Serialization of inputs and results
• Distinct instances on each side

The Four Task Methods

void MyTask::pack_work(void);
void MyTask::unpack_work(void);
void MyTask::pack_results(void);
void MyTask::unpack_results(void);

Also constructors and destructors!

RMC

Resource Management and Communication

An abstraction to set up communication, to specify resource requirements, etc.

RMC->pack(int *array, int length);
RMC->unpack(int *array, int length);
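
A minimal sketch of how the four task methods might use pack and unpack is given below. Everything here is a stand-in: `MockComm` is a hypothetical in-memory buffer written for this example, not the MW library's RMC class, and `MyTask` is a plain class rather than a real MWTask subclass. The point is only the pairing: what one side packs, the other side unpacks in the same order, into its own distinct instance.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RMC object (NOT the MW library's class).
// It appends ints to a buffer and reads them back in the same order.
class MockComm {
public:
    void pack(int* array, int length) {
        buffer_.insert(buffer_.end(), array, array + length);
    }
    void unpack(int* array, int length) {
        for (int i = 0; i < length; ++i) array[i] = buffer_.at(read_pos_++);
    }
    void reset() { buffer_.clear(); read_pos_ = 0; }  // start a new message

private:
    std::vector<int> buffer_;
    std::size_t read_pos_ = 0;
};

MockComm mock_comm;
MockComm* RMC = &mock_comm;  // named RMC only to mirror the calls on the slide

// Illustrative task: sum the integers in [lo, hi].
class MyTask {
public:
    int lo = 0, hi = 0;  // inputs
    int sum = 0;         // result

    void pack_work()      { RMC->pack(&lo, 1);   RMC->pack(&hi, 1); }    // master side
    void unpack_work()    { RMC->unpack(&lo, 1); RMC->unpack(&hi, 1); }  // worker side
    void pack_results()   { RMC->pack(&sum, 1); }                        // worker side
    void unpack_results() { RMC->unpack(&sum, 1); }                      // master side

    // The work itself; in MW this would be driven by the worker.
    void compute() {
        sum = 0;
        for (int i = lo; i <= hi; ++i) sum += i;
    }
};
```

In use, the master's copy calls `pack_work()`, a separate worker copy calls `unpack_work()` and `compute()`, then `pack_results()`, and the master's copy finishes with `unpack_results()`.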

MWWorker

• Just one method:

executeTask(MWTask *t)

• Also constructor and destructor!

MWDriver (the master)

get_userinfo(int argc, char **argv)
    RMC->add_executable(char *exe, char *requirements);

setup_initial_tasks(int num_tasks, MWTask ***init_tasks)

act_on_completed_task(MWTask *t)
    RMC->add_task(MWTask *t)

Also constructor and destructor
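
The division of labor between master and worker can be sketched in a single process. The classes below (`SquareTask`, `SketchWorker`, `SketchDriver`) are hypothetical and do not derive from the MW base classes; they borrow the method names from the slides to show where each hook fits, including a completed task creating follow-up work. A direct function call stands in for dispatch to a remote worker.

```cpp
#include <deque>
#include <memory>
#include <utility>

// Hypothetical task: square the input `n`. Not an MWTask subclass.
struct SquareTask {
    int n = 0;       // input
    int result = 0;  // output, filled in by the worker
};

// Stand-in worker with the single method named on the MWWorker slide.
struct SketchWorker {
    void executeTask(SquareTask* t) { t->result = t->n * t->n; }
};

// Stand-in master: keeps a to-do queue and reacts to each finished task.
class SketchDriver {
public:
    void add_task(std::unique_ptr<SquareTask> t) { todo_.push_back(std::move(t)); }

    // Called once per finished task; may create new work dynamically.
    void act_on_completed_task(SquareTask* t) {
        total_ += t->result;
        if (t->n > 1) {
            std::unique_ptr<SquareTask> next(new SquareTask());
            next->n = t->n - 1;
            add_task(std::move(next));
        }
    }

    // Runs until the queue is empty and returns the accumulated total.
    int run(SketchWorker& worker) {
        while (!todo_.empty()) {
            std::unique_ptr<SquareTask> t = std::move(todo_.front());
            todo_.pop_front();
            worker.executeTask(t.get());
            act_on_completed_task(t.get());
        }
        return total_;
    }

private:
    std::deque<std::unique_ptr<SquareTask>> todo_;
    int total_ = 0;
};
```

Starting with one task whose `n` is 3, the driver ends up running three tasks (3, 2, 1) and returns 9 + 4 + 1 = 14.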

MWTask ***init_tasks

• task
• array of pointers to tasks
• pointer to the array
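
The three levels of indirection are easier to follow in code. The sketch below uses a hypothetical `Task` struct in place of an MWTask subclass and follows the parameter list shown on the slide. The `free_tasks` helper is an addition for this example only, so that the sketch cleans up after itself; who owns the tasks in a real MW application is not covered here.

```cpp
// Hypothetical minimal task type; a real application would use its MWTask subclass.
struct Task {
    int id;
    explicit Task(int i) : id(i) {}
};

// Fills *init_tasks with a newly allocated array of `num_tasks` task pointers.
// init_tasks points at the caller's Task** variable, hence the third star.
void setup_initial_tasks(int num_tasks, Task*** init_tasks) {
    *init_tasks = new Task*[num_tasks];   // the array of pointers to tasks
    for (int i = 0; i < num_tasks; ++i) {
        (*init_tasks)[i] = new Task(i);   // each element points to one task
    }
}

// Releases everything setup_initial_tasks allocated.
void free_tasks(int num_tasks, Task** tasks) {
    for (int i = 0; i < num_tasks; ++i) delete tasks[i];
    delete[] tasks;
}
```

A caller declares `Task** tasks = nullptr;` and passes `&tasks`; after the call, `tasks[i]` is a pointer to the i-th task.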

Putting it all together: examples/new_skel

./new_app MY_PROJECT
A Perl script to create appropriately named files containing skeleton code

Use configure --help for options

make

Running an application

Just launch the appropriate master

Use condor_q to see it in action

Real MW Applications

• MWFATCOP (Chen, Ferris, Linderoth): A branch and cut code for linear integer programming
• MWMINLP (Goux, Leyffer, Nocedal): A branch and bound code for nonlinear integer programming
• MWQPBB (Linderoth): A (simplicial) branch and bound code for solving quadratically constrained quadratic programs
• MWAND (Linderoth, Shen): A nested decomposition based solver for multistage stochastic linear programming
• MWATR (Linderoth, Shapiro, Wright): A trust-region-enhanced cutting plane code for linear stochastic programming and statistical verification of solution quality
• MWQAP (Anstreicher, Brixius, Goux, Linderoth): A branch and bound code for solving the quadratic assignment problem

Other resources

http://www.cs.wisc.edu/condor/mw

Online manual

MW-users mailing list

Extra Slides

Advice for Large Runs

• Use Personal Condor
• Flock, glidein, schedd-on-side, hobblein
• Use checkpoints!
• Set worker_increment high

Debugging with Independent Mode

Special RMComm for debugging

Single process, can run under gdb

MW Philosophy

Reuse either code or concept

Key idea: Late binding

User-level Checkpoints

MWTask::write_chkpt_info(FILE *)
MWTask::read_chkpt_info(FILE *)

MWDriver::read_master_state(FILE *)
MWDriver::write_master_state(FILE *)
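
As an illustration of what a user-level checkpoint pair could look like for a task, the sketch below writes a task's fields to a `FILE *` as text and reads them back. `CheckpointedTask` is a hypothetical stand-alone struct, not an MWTask subclass. The slides do not show return types, so the `bool` returned by the read method is a choice made for this sketch, as is the `round_trip` helper.

```cpp
#include <cstdio>

// Hypothetical task with two integer fields; a real task would subclass MWTask.
struct CheckpointedTask {
    int lo = 0;
    int hi = 0;

    // Write the task's state as one line of text.
    void write_chkpt_info(std::FILE* f) const {
        std::fprintf(f, "%d %d\n", lo, hi);
    }

    // Read the state back; true only if both fields were parsed.
    bool read_chkpt_info(std::FILE* f) {
        return std::fscanf(f, "%d %d", &lo, &hi) == 2;
    }
};

// Writes `in` to a temporary file and restores it into `out`.
// Returns false if the temporary file could not be created or parsed.
bool round_trip(const CheckpointedTask& in, CheckpointedTask& out) {
    std::FILE* f = std::tmpfile();
    if (!f) return false;
    in.write_chkpt_info(f);
    std::rewind(f);
    bool ok = out.read_chkpt_info(f);
    std::fclose(f);
    return ok;
}
```

A text format keeps the checkpoint readable in an editor while debugging; the same pattern would apply to the master's state.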

Example codes with MW

Matmul

Blackbox

knapsack

More on MW

• http://www.cs.wisc.edu/condor/mw
• Version 0.2 is the latest
    • It is more stable than the version number suggests!
• Mailing list available for discussion
• Active development by the Condor team
