Greg Thain Computer Sciences Department University of Wisconsin-Madison [email protected] http://www.cs.wisc.edu/condor/mw Master-Worker Tutorial Condor Week 2006

Agenda

› What is M-W

› When to use M-W

› How to build a simple M-W application

› Q & A

Why M-W?

› M-W addresses a weakness in Condor: short jobs

› It also suits dynamic, parallel workflows

A Condor Job…

An easy solution:

› Why not just wrap up smaller jobs into a bigger Condor job?

  • Partial failures?

  • Load balancing?

  • Dynamic creation of work?

Solution: Lightweight Tasks Multiplexed on top of Jobs

› Process : Thread :: Condor Job : MW Task

› MWTask dispatch takes milliseconds; Condor job dispatch can take minutes

MW is…

› A C++ framework

› That re-uses Condor worker jobs

› So that each job runs many tasks

› Resulting in a highly parallel application

MW is not

› MPI

› A general parallel programming scheme

MW in action

[Diagram: condor_submit on the submit machine starts the master exe, which holds the pool of tasks (T); each of the three Workers is shown holding one task.]

You Must Write 3 Classes

Subclasses of …

› MWDriver (in the master exe)

› MWTask (in both the master exe and the worker exe)

› MWWorker (in the worker exe)

Your_MWTask

› Subclass MWTask

› Data members for inputs

› Data member for results

› Serialization of inputs and results

› Distinct instances on each side

The Four Task Methods

› void MyTask::pack_work(void);

› void MyTask::unpack_work(void);

› void MyTask::pack_results(void);

› void MyTask::unpack_results(void);

› Also ctor/dtor!

RMComms

› Abstraction for communication

  • (and some other stuff…)

› RMC->pack(int *array, int length);

› RMC->unpack(int *array, int length);
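
A minimal sketch of how the four task methods and the RMC pack/unpack calls might fit together. It does not use the MW headers: `FakeComm` and `TaskBase` are hypothetical stand-ins for the real RMComm object and MWTask base class, and `SumTask`, `compute()` and `round_trip()` are invented for illustration. The master and worker copies of the task are separate instances that only share the packed data.

```cpp
// Stand-alone sketch. FakeComm and TaskBase are hypothetical stand-ins,
// NOT the real MW RMComm / MWTask classes.
#include <cassert>
#include <deque>

// Stand-in for the RMC object: a FIFO of ints instead of a network link.
struct FakeComm {
    std::deque<int> wire;
    void pack(const int *array, int length) {
        for (int i = 0; i < length; ++i) wire.push_back(array[i]);
    }
    void unpack(int *array, int length) {
        for (int i = 0; i < length; ++i) {
            assert(!wire.empty());
            array[i] = wire.front();
            wire.pop_front();
        }
    }
};

// Stand-in for MWTask: only the four methods named on the slide.
struct TaskBase {
    virtual ~TaskBase() {}
    virtual void pack_work() = 0;
    virtual void unpack_work() = 0;
    virtual void pack_results() = 0;
    virtual void unpack_results() = 0;
};

// A task that asks a worker to sum the integers in [first, last].
class SumTask : public TaskBase {
public:
    explicit SumTask(FakeComm *rmc, int first = 0, int last = 0)
        : RMC(rmc), first_(first), last_(last), sum_(0) {}

    // Inputs travel master -> worker.
    void pack_work() override { RMC->pack(&first_, 1); RMC->pack(&last_, 1); }
    void unpack_work() override { RMC->unpack(&first_, 1); RMC->unpack(&last_, 1); }

    // Results travel worker -> master.
    void pack_results() override { RMC->pack(&sum_, 1); }
    void unpack_results() override { RMC->unpack(&sum_, 1); }

    // The work itself (in MW this would be driven by the worker class).
    void compute() {
        sum_ = 0;
        for (int i = first_; i <= last_; ++i) sum_ += i;
    }
    int sum() const { return sum_; }

private:
    FakeComm *RMC;
    int first_, last_, sum_;
};

// Simulates one master -> worker -> master round trip in a single process.
int round_trip(int first, int last) {
    FakeComm comm;
    SumTask master_copy(&comm, first, last);
    SumTask worker_copy(&comm);      // distinct instance on the worker side
    master_copy.pack_work();
    worker_copy.unpack_work();
    worker_copy.compute();
    worker_copy.pack_results();
    master_copy.unpack_results();
    return master_copy.sum();
}
```

Note that each unpack must read fields in the same order the matching pack wrote them; in this sketch that ordering is the only "protocol" between the two sides.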

MWWorker

› Just one method:

› executeTask(MWTask *t)

› Also ctor/dtor!
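
A minimal sketch of the worker side, assuming the worker receives a base-class task pointer and downcasts it to its own task type. `TaskBase` and `WorkerBase` are hypothetical stand-ins for MWTask and MWWorker, and `SquareTask`, `SquareWorker` and `run_square()` are invented for illustration; only the `executeTask` name comes from the slide.

```cpp
// Stand-alone sketch. TaskBase and WorkerBase are hypothetical stand-ins,
// NOT the real MWTask / MWWorker classes.
#include <cassert>

struct TaskBase {
    virtual ~TaskBase() {}
};

struct WorkerBase {
    virtual ~WorkerBase() {}
    virtual void executeTask(TaskBase *t) = 0;  // the one method to implement
};

// Task with one input and one result.
struct SquareTask : TaskBase {
    int input;
    int result;
    explicit SquareTask(int in) : input(in), result(0) {}
};

struct SquareWorker : WorkerBase {
    void executeTask(TaskBase *t) override {
        // The framework hands over a base pointer; recover the concrete type.
        SquareTask *st = dynamic_cast<SquareTask *>(t);
        assert(st != nullptr);
        st->result = st->input * st->input;  // store the result in the task
    }
};

// Drives the worker through the base-class interface, as a framework would.
int run_square(int x) {
    SquareTask task(x);
    SquareWorker worker;
    WorkerBase *w = &worker;
    w->executeTask(&task);
    return task.result;
}
```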

MWDriver

› get_userinfo(int argc, char **argv)

  • RMC->add_executable(char *exe, char *requirements);

› setup_initial_tasks(int num_tasks, MWTask ***init_tasks)

› act_on_completed_task(MWTask *t)

  • RMC->add_task(MWTask *t)

› Also ctor/dtor
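
A minimal sketch of the driver's control flow: seed some tasks, then react to each completed task, optionally adding new work. `DriverBase` is a hypothetical toy scheduler, not the real MWDriver. It uses a `std::vector` rather than the `MWTask ***` argument on the slide, and it runs each task inline instead of sending it to a worker. `DoublingDriver` and `run_driver_total()` are invented for illustration.

```cpp
// Stand-alone sketch. DriverBase is a hypothetical toy, NOT the real MWDriver.
#include <cassert>
#include <queue>
#include <vector>

struct Task {
    int n;
    int result;
    explicit Task(int v) : n(v), result(0) {}
};

class DriverBase {
public:
    virtual ~DriverBase() {}
    virtual void setup_initial_tasks(std::vector<Task> &init) = 0;
    virtual void act_on_completed_task(Task &t) = 0;

    void add_task(const Task &t) { todo_.push(t); }

    // Toy main loop: dispatch until no work is left.
    void go() {
        std::vector<Task> init;
        setup_initial_tasks(init);
        for (const Task &t : init) add_task(t);
        while (!todo_.empty()) {
            Task t = todo_.front();
            todo_.pop();
            t.result = t.n * 2;        // stands in for a worker's executeTask
            act_on_completed_task(t);  // may call add_task() again
        }
    }

private:
    std::queue<Task> todo_;
};

class DoublingDriver : public DriverBase {
public:
    int total = 0;
    int completed = 0;

    void setup_initial_tasks(std::vector<Task> &init) override {
        for (int i = 1; i <= 3; ++i) init.push_back(Task(i));
    }

    void act_on_completed_task(Task &t) override {
        total += t.result;
        ++completed;
        // Dynamic creation of work: a small result spawns a follow-up task.
        if (t.result < 4) add_task(Task(t.result));
    }
};

int run_driver_total() {
    DoublingDriver d;
    d.go();
    assert(d.completed == 4);  // three initial tasks plus one follow-up
    return d.total;
}
```

Here the task with n = 1 yields 2, which spawns one extra task, so four tasks complete in total.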

Putting it all together: new_skel

› ./new_skel MY_PROJECT

› Use ./configure --help for options

› make

Debugging with Independent Mode

› Special RMComm for debugging

› Single process, can run under gdb

Running on the Grid…

› Just launch the appropriate master

› condor_q to see it in action

Advice for Large Runs

› Use a personal Condor

  • Flock, glide-in, schedd-on-side, hobblein

› Use checkpointing!

› Set_worker_increment high

User-level Checkpointing

› MWTask::write_chkpt_info(FILE *)

› MWTask::read_chkpt_info(FILE *)

› MWDriver::read_master_state(FILE *)

› MWDriver::write_master_state(FILE *)
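
A minimal sketch of what a task-level checkpoint pair could look like, assuming the state is written as plain text to the `FILE *` the framework supplies. `CheckpointedTask` is a hypothetical stand-alone struct, not a real MWTask subclass; the `bool` return value on the read side and the `checkpoint_round_trip()` helper are assumptions made so the sketch can check itself.

```cpp
// Stand-alone sketch. CheckpointedTask is hypothetical, NOT a real MWTask.
#include <cassert>
#include <cstdio>

struct CheckpointedTask {
    int first;
    int last;
    int sum;

    // Write every field needed to rebuild the task after a restart.
    void write_chkpt_info(FILE *f) const {
        std::fprintf(f, "%d %d %d\n", first, last, sum);
    }

    // Read the fields back in the same order; report whether all were found.
    bool read_chkpt_info(FILE *f) {
        return std::fscanf(f, "%d %d %d", &first, &last, &sum) == 3;
    }
};

// Writes a task to a temporary file and restores it into a fresh instance.
bool checkpoint_round_trip() {
    CheckpointedTask saved = {1, 100, 5050};
    CheckpointedTask restored = {0, 0, 0};

    FILE *f = std::tmpfile();
    if (f == nullptr) return false;

    saved.write_chkpt_info(f);
    std::rewind(f);
    bool ok = restored.read_chkpt_info(f);
    std::fclose(f);

    return ok && restored.first == saved.first &&
           restored.last == saved.last && restored.sum == saved.sum;
}
```

The master-state pair would presumably follow the same write/read symmetry for whatever the driver subclass keeps between tasks.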

Example codes with MW

› Matmul

› Blackbox

› knapsack

MW Philosophy

› Reuse either code or concept

› Key idea: Late binding

Other resources

› http://www.cs.wisc.edu/condor/mw

› Online manual

› MW-users mailing list

Thank You!

Questions?

MW Home page: http://www.cs.wisc.edu/condor/mw