Condor Project Computer Sciences Department University of Wisconsin-Madison [email protected] http://www.cs.wisc.edu/condor Master/Worker and Condor Barcelona, 2006

Page 1: Master/Worker and Condor Barcelona, 2006

Condor Project
Computer Sciences Department
University of Wisconsin-Madison

[email protected]
http://www.cs.wisc.edu/condor

Master/Worker and Condor

Barcelona, 2006

Page 2: Master/Worker and Condor Barcelona, 2006

Agenda
• Extended user’s tutorial
• Advanced Uses of Condor
  • Java programs
  • DAGMan
  • Stork
  • MW
  • Grid Computing
• Case studies, and a discussion of your application’s needs

Page 3: Master/Worker and Condor Barcelona, 2006

Why Master Worker?

MW addresses a weakness in Condor:

Short jobs

Excellent for dynamic, parallel workflows

Page 4: Master/Worker and Condor Barcelona, 2006

A Workflow Problem
• A problem requires that we do A 60,000 times and B 100,000 times
  • A takes 1 second
  • B takes 3 seconds
• Computation time for the problem is (60,000 x 1) + (100,000 x 3) = 360,000 seconds, or 100 hours

Page 5: Master/Worker and Condor Barcelona, 2006

Condor Runs the Workflow
• Assume that the overhead Condor adds to running each instance of A or B is 20 seconds (an optimistic figure; the real overhead is much larger)
• Time for Condor to do the problem is (60,000 x 21) + (100,000 x 23) = 3,560,000 seconds, or about 989 hours
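The arithmetic on the last two slides can be checked with a short sketch. The `WorkKind` struct and the `total_seconds` function are illustrative names, not part of Condor or MW; the numbers are the ones from the slides.

```cpp
#include <cassert>
#include <cstdint>

// One kind of work item: how many times it runs and how long each run takes.
struct WorkKind {
    std::int64_t count;        // number of instances
    std::int64_t run_seconds;  // compute time per instance
};

// Total serial time when every instance pays a fixed per-job overhead.
// (Hypothetical helper for this example only.)
std::int64_t total_seconds(const WorkKind* kinds, int n,
                           std::int64_t overhead_seconds) {
    assert(n >= 0 && overhead_seconds >= 0);
    std::int64_t total = 0;
    for (int i = 0; i < n; ++i) {
        total += kinds[i].count * (kinds[i].run_seconds + overhead_seconds);
    }
    return total;
}
```

With A = {60000, 1} and B = {100000, 3}, an overhead of 0 gives 360,000 seconds and an overhead of 20 gives 3,560,000 seconds, so the assumed overhead makes the run almost ten times longer.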

Page 6: Master/Worker and Condor Barcelona, 2006

A Condor Job…

Page 7: Master/Worker and Condor Barcelona, 2006

An Often Considered Solution
• Bundle several As or Bs into a single Condor job
• Must address further issues:
  • Partial failures
  • Load balancing
  • Dynamic creation of work
[Figure: several A tasks bundled into one Condor job]
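To see why bundling is tempting, here is a rough sketch of how the per-job overhead is amortized. The function name and the bundle size are assumptions made for illustration; the sketch deliberately ignores the further issues listed on the slide (partial failures, load balancing, dynamic work).

```cpp
#include <cassert>

// Serial time in seconds for `count` tasks of `run_seconds` each when they are
// grouped into Condor jobs of `bundle_size` tasks, and each job pays
// `overhead_seconds` once. (Hypothetical helper for this example only.)
long long bundled_seconds(long long count, long long run_seconds,
                          long long overhead_seconds, long long bundle_size) {
    assert(count >= 0 && bundle_size > 0);
    long long jobs = (count + bundle_size - 1) / bundle_size;  // ceiling division
    return count * run_seconds + jobs * overhead_seconds;
}
```

With the earlier numbers and a hypothetical bundle size of 100, the As take 72,000 seconds and the Bs take 320,000 seconds, 392,000 seconds in total (about 109 hours). A bundle size of 1 reproduces the unbundled case.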

Page 8: Master/Worker and Condor Barcelona, 2006

Basics of MW

The master gives tasks to the workers.

Page 9: Master/Worker and Condor Barcelona, 2006

Workers and Tasks
Each worker serially takes on tasks, as assigned by the master
[Figure: one worker with three tasks in turn: "feed me", "change diaper", "bathe me"]

Page 10: Master/Worker and Condor Barcelona, 2006

Relating MW to Condor
• There is 1 master
• The master determines the number of workers
• Each worker is a Condor job
• Each worker receives tasks serially
• Many workers do tasks at the same time (in parallel)
• Workers communicate only with the master

Page 11: Master/Worker and Condor Barcelona, 2006

Solution: Lightweight Tasks Multiplexed on top of Jobs
• The analogy: Process is to Thread as Condor Job is to MW Task
• A Condor job may take minutes to create and dispatch; an MWTask dispatch takes milliseconds
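The effect of multiplexing tasks on long-lived worker jobs can be illustrated with a small simulation. This is only a sketch: `makespan_ms` is a made-up name, the master is modeled as handing each task to whichever worker becomes free first, and the startup and dispatch costs are parameters you choose, not measured values.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Simulated finish time (in milliseconds) when a master hands tasks, in order,
// to whichever worker becomes free first. Each worker pays job_startup_ms once;
// each task pays dispatch_ms on top of its own run time.
long long makespan_ms(const std::vector<long long>& task_ms, int workers,
                      long long job_startup_ms, long long dispatch_ms) {
    assert(workers > 0);
    // Min-heap of the times at which each worker next becomes free.
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> free_at;
    for (int w = 0; w < workers; ++w) free_at.push(job_startup_ms);

    long long finish = 0;
    for (long long t : task_ms) {
        long long start = free_at.top();  // earliest available worker
        free_at.pop();
        long long end = start + dispatch_ms + t;
        free_at.push(end);
        if (end > finish) finish = end;
    }
    return finish;
}
```

For example, four tasks of 1, 1, 3 and 3 seconds on two workers, with an assumed 60-second job startup and 5 ms dispatch, finish at 64,010 ms: the startup cost is paid once per worker rather than once per task.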

Page 12: Master/Worker and Condor Barcelona, 2006

MW is
• A C++ framework
• A way to re-use Condor worker jobs
• Each worker may run many tasks
• Results in a very parallel application

Page 13: Master/Worker and Condor Barcelona, 2006

MW is not
• MPI (Message Passing Interface)
• A general parallel programming scheme

Page 14: Master/Worker and Condor Barcelona, 2006

MW in action
[Figure: condor_submit; the submit machine running the master exe with a queue of tasks (T); three workers, each with a task (T)]

Page 15: Master/Worker and Condor Barcelona, 2006

You Must Write 3 Classes, the Subclasses of...
• MWDriver
• MWTask
• MWWorker
[Figure labels: Master exe, Worker exe]

Page 16: Master/Worker and Condor Barcelona, 2006

An MWTask
• Subclass MWTask
  • Data members for inputs
  • Data member for results
• Serialization of inputs and results
• Distinct instances on each side

Page 17: Master/Worker and Condor Barcelona, 2006

The Four Task Methods
• void MyTask::pack_work(void);
• void MyTask::unpack_work(void);
• void MyTask::pack_results(void);
• void MyTask::unpack_results(void);
• Also constructors and destructors!

Page 18: Master/Worker and Condor Barcelona, 2006

RMC
• Resource Management and Communication
• An abstraction to set up communication, to specify resource requirements, etc.
• RMC->pack(int *array, int length);
• RMC->unpack(int *array, int length);
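The round trip implied by the four task methods and by pack/unpack can be sketched without the MW headers. Everything here is a stand-in: `FakeChannel` is a hypothetical in-memory replacement for the RMC object, and `SumTask` does not derive from the real MWTask. Unlike the slide's signatures, these methods take the channel as a parameter so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RMC object: an in-memory FIFO of ints.
class FakeChannel {
public:
    void pack(const int* array, int length) {
        assert(length >= 0);
        buf_.insert(buf_.end(), array, array + length);
    }
    void unpack(int* array, int length) {
        assert(length >= 0 &&
               read_ + static_cast<std::size_t>(length) <= buf_.size());
        for (int i = 0; i < length; ++i) array[i] = buf_[read_++];
    }
private:
    std::vector<int> buf_;
    std::size_t read_ = 0;
};

// Illustrative task: sum a list of ints. Not derived from the real MWTask.
struct SumTask {
    std::vector<int> inputs;  // data members for inputs
    int result = 0;           // data member for the result

    void pack_work(FakeChannel& ch) const {  // master side: send inputs
        int n = static_cast<int>(inputs.size());
        ch.pack(&n, 1);
        ch.pack(inputs.data(), n);
    }
    void unpack_work(FakeChannel& ch) {      // worker side: receive inputs
        int n = 0;
        ch.unpack(&n, 1);
        inputs.resize(static_cast<std::size_t>(n));
        ch.unpack(inputs.data(), n);
    }
    void pack_results(FakeChannel& ch) const { ch.pack(&result, 1); }  // worker side
    void unpack_results(FakeChannel& ch) { ch.unpack(&result, 1); }    // master side
};
```

The master's instance packs the work, a separate instance on the worker unpacks it, computes, and packs the result, and the master's instance unpacks that result. This mirrors the "distinct instances on each side" point: the two objects share only what is serialized.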

Page 19: Master/Worker and Condor Barcelona, 2006

MWWorker

• Just one method:

executeTask(MWTask *t)

• Also constructor and destructor!
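A worker's single method typically receives a base-class pointer and works on the concrete task type. The sketch below shows that pattern with hypothetical `TaskBase` and `WorkerBase` classes standing in for MWTask and MWWorker; the real base classes have more members than shown here.

```cpp
#include <cassert>

// Hypothetical stand-ins for the framework base classes.
struct TaskBase {
    virtual ~TaskBase() = default;
};
struct WorkerBase {
    virtual ~WorkerBase() = default;
    virtual void executeTask(TaskBase* t) = 0;
};

// Illustrative task: square one integer.
struct SquareTask : TaskBase {
    int input = 0;
    long long result = 0;
};

// Illustrative worker: downcasts to the concrete task and fills in the result.
struct SquareWorker : WorkerBase {
    void executeTask(TaskBase* t) override {
        auto* task = dynamic_cast<SquareTask*>(t);
        assert(task != nullptr);  // this worker only understands SquareTask
        task->result = static_cast<long long>(task->input) * task->input;
    }
};
```

The worker writes its answer into the task's result member, which is what would then be packed and sent back to the master.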

Page 20: Master/Worker and Condor Barcelona, 2006

MWDriver (the master)
• get_userinfo(int argc, char **argv)
  • RMC->add_executable(char *exe, char *requirements);
• setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
• act_on_completed_task(MWTask *t)
  • RMC->add_task(MWTask *t)
• Also constructor and destructor
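The driver's role, creating the initial tasks and reacting to each completed one (possibly by adding more work), can be sketched as follows. This is a simplified, single-process illustration: `TreeDriver` and `NodeTask` are invented for the example, the method signatures are reduced, and the `run` loop stands in for the framework's own loop, which in real MW sends tasks to remote workers.

```cpp
#include <cassert>
#include <deque>

// Hypothetical task: "explore one node of a binary tree at this depth".
struct NodeTask {
    int depth;
};

class TreeDriver {
public:
    explicit TreeDriver(int max_depth) : max_depth_(max_depth) {}

    // Create the first piece of work: the root node.
    void setup_initial_tasks() { add_task(NodeTask{0}); }

    // Called once per finished task; may create more work dynamically.
    void act_on_completed_task(const NodeTask& t) {
        ++completed_;
        if (t.depth < max_depth_) {
            add_task(NodeTask{t.depth + 1});  // left child
            add_task(NodeTask{t.depth + 1});  // right child
        }
    }

    // Stand-in for the framework loop: tasks "complete" in-process, in order.
    int run() {
        setup_initial_tasks();
        while (!todo_.empty()) {
            NodeTask t = todo_.front();
            todo_.pop_front();
            act_on_completed_task(t);
        }
        return completed_;
    }

private:
    void add_task(const NodeTask& t) { todo_.push_back(t); }

    std::deque<NodeTask> todo_;
    int max_depth_;
    int completed_ = 0;
};
```

With a maximum depth of 3 the driver completes 1 + 2 + 4 + 8 = 15 tasks, all but the first created in response to earlier completions. This is the "dynamic creation of work" that a fixed bundle of jobs cannot express.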

Page 21: Master/Worker and Condor Barcelona, 2006

MWTask ***init_tasks
[Figure: pointer to the array; array of pointers to tasks; task]
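The three levels of indirection are ordinary C++: the caller passes the address of its own array pointer, and the function fills it in. In this sketch `Task` is a placeholder type, and `free_tasks` is an assumption added so the example cleans up after itself; who owns and frees the tasks in real MW is not stated on the slides.

```cpp
#include <cassert>

// Placeholder task type for illustration.
struct Task {
    int id;
    explicit Task(int i) : id(i) {}
};

// Fills *init_tasks with a newly allocated array of num_tasks task pointers.
//   init_tasks       : pointer to the caller's array pointer
//   *init_tasks      : the array of pointers to tasks
//   (*init_tasks)[i] : pointer to one task
void setup_initial_tasks(int num_tasks, Task*** init_tasks) {
    assert(num_tasks >= 0 && init_tasks != nullptr);
    *init_tasks = new Task*[num_tasks];
    for (int i = 0; i < num_tasks; ++i) {
        (*init_tasks)[i] = new Task(i);
    }
}

// Hypothetical cleanup helper for this example only.
void free_tasks(int num_tasks, Task** tasks) {
    for (int i = 0; i < num_tasks; ++i) delete tasks[i];
    delete[] tasks;
}
```

A caller declares `Task** tasks = nullptr;`, calls `setup_initial_tasks(3, &tasks);`, and can then read `tasks[2]->id`.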

Page 23: Master/Worker and Condor Barcelona, 2006

Putting it all together: examples/new_skel
• ./new_app MY_PROJECT
  • A Perl script to create appropriately named files containing skeleton code
• Use configure --help for options
• make

Page 24: Master/Worker and Condor Barcelona, 2006

Running an application
• Just launch the appropriate master
• Use condor_q to see it in action

Page 25: Master/Worker and Condor Barcelona, 2006

Real MW Applications
• MWFATCOP (Chen, Ferris, Linderoth)
  • A branch and cut code for linear integer programming
• MWMINLP (Goux, Leyffer, Nocedal)
  • A branch and bound code for nonlinear integer programming
• MWQPBB (Linderoth)
  • A (simplicial) branch and bound code for solving quadratically constrained quadratic programs
• MWAND (Linderoth, Shen)
  • A nested decomposition based solver for multistage stochastic linear programming
• MWATR (Linderoth, Shapiro, Wright)
  • A trust-region-enhanced cutting plane code for linear stochastic programming and statistical verification of solution quality
• MWQAP (Anstreicher, Brixius, Goux, Linderoth)
  • A branch and bound code for solving the quadratic assignment problem

Page 26: Master/Worker and Condor Barcelona, 2006

Other resources

http://www.cs.wisc.edu/condor/mw

Online manual

MW-users mailing list

Page 27: Master/Worker and Condor Barcelona, 2006

Extra Slides

Page 28: Master/Worker and Condor Barcelona, 2006

Advice for Large Runs
• Use Personal Condor
  • Flock, glidein, schedd-on-side, hobblein
• Use checkpoints!
• Set worker_increment high

Page 29: Master/Worker and Condor Barcelona, 2006

Debugging with Independent Mode

Special RMComm for debugging

Single process, can run under gdb

Page 30: Master/Worker and Condor Barcelona, 2006

MW Philosophy

Reuse either code or concept

Key idea: Late binding

Page 31: Master/Worker and Condor Barcelona, 2006

User-level Checkpoints
• MWTask::write_chkpt_info(FILE *)
• MWTask::read_chkpt_info(FILE *)
• MWDriver::read_master_state(FILE *)
• MWDriver::write_master_state(FILE *)
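A user-level checkpoint amounts to writing enough state to a `FILE *` that the object can be rebuilt from it later. The sketch below shows one way a task might do this; `CountTask` and its fields are hypothetical, and the `bool` return on the read side is a choice made for this example rather than the MW signature.

```cpp
#include <cassert>
#include <cstdio>

// Illustrative task state: how far the work has progressed and a running total.
struct CountTask {
    int next_index = 0;
    long long partial = 0;

    // Write the fields as plain text, one record per line.
    void write_chkpt_info(std::FILE* fp) const {
        std::fprintf(fp, "%d %lld\n", next_index, partial);
    }

    // Read the fields back; returns true only if both were parsed.
    bool read_chkpt_info(std::FILE* fp) {
        return std::fscanf(fp, "%d %lld", &next_index, &partial) == 2;
    }
};
```

The read method must consume exactly what the write method produced, in the same order. The driver-side pair (read_master_state and write_master_state) follows the same rule for the master's own state.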

Page 32: Master/Worker and Condor Barcelona, 2006

Example codes with MW

Matmul

Blackbox

knapsack

Page 33: Master/Worker and Condor Barcelona, 2006

More on MW
• http://www.cs.wisc.edu/condor/mw
• Version 0.2 is the latest
  • It is more stable than the version number suggests!
• Mailing list available for discussion
• Active development by the Condor team