
MPI Scheduling in Condor: An Update
Paradyn/Condor Week, Madison, WI, 2002





Outline
Review of Dedicated/MPI Scheduling in Condor
Dedicated vs. Opportunistic
Backfill
Supported MPI Implementations
Supported Platforms
Future Work
What is MPI?
MPI is the “Message Passing Interface”


Condor and MPI
Dedicated vs. Opportunistic
Fixed number of nodes
Dedicated Scheduling in Condor
To schedule MPI jobs, Condor must have access to dedicated resources
More and more Condor pools are being formed from dedicated resources
Few schedulers handle both dedicated and non-dedicated resources at the same time
Dedicated resources are not really dedicated
Most software for controlling clusters relies on dedicated scheduling algorithms
These algorithms assume constant availability of resources in order to compute fixed schedules
Due to hardware and software failures, dedicated resources are not always available over the long term
Condor overcomes these difficulties by combining aspects of dedicated and opportunistic scheduling into a single system
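To make the "fixed schedule" problem concrete, here is a toy planner, assuming simple first-come, first-served placement (not the algorithm of Condor or any particular cluster manager). It computes every reservation once, up front, as if all nodes stay up forever, and then reports which reservations a single node failure leaves without a plan. The names `fixed_schedule`, `invalidated_by_failure`, and `Reservation` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Reservation:
    job: str
    nodes: FrozenSet[str]
    start: int  # first time slot the job owns its nodes
    end: int    # exclusive end slot


def fixed_schedule(jobs: List[Tuple[str, int, int]], nodes: List[str]) -> List[Reservation]:
    """Compute a schedule once, assuming every node stays available forever.

    Each job is (name, nodes_needed, duration). Jobs are placed in order on
    the nodes that become free earliest.
    """
    free_at = {n: 0 for n in nodes}  # slot at which each node is next free
    plan = []
    for name, width, duration in jobs:
        if width > len(nodes):
            raise ValueError(f"{name} needs more nodes than the cluster has")
        chosen = sorted(nodes, key=lambda n: free_at[n])[:width]
        start = max(free_at[n] for n in chosen)
        for n in chosen:
            free_at[n] = start + duration
        plan.append(Reservation(name, frozenset(chosen), start, start + duration))
    return plan


def invalidated_by_failure(plan: List[Reservation], node: str, at: int) -> List[str]:
    """Jobs whose reservation uses `node` and has not finished by time `at`.

    A purely fixed schedule has no answer for these jobs once the node fails.
    """
    return [r.job for r in plan if node in r.nodes and r.end > at]
```

For example, after planning jobs A (4 nodes), B and C (2 nodes each) on four nodes, a failure of one node at time 12 breaks only the reservation still running there; a combined dedicated/opportunistic system can instead re-plan around the lost node.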
Opportunistic scheduling involves placing jobs on non-dedicated resources under the assumption that the resources might not be available for the entire duration of the jobs
This is what Condor has been doing for years
Condor manages all resources and jobs within a single system
Administrators only have to maintain one system, saving time and money
Users can submit a wide variety of jobs:
Serial or parallel (including PVM + MPI)
Spend less time learning different scheduling tools, more time doing science
Claiming Resources for Dedicated Jobs
When the dedicated scheduler (DS) has idle jobs, it queries the collector to find all dedicated resources
DS does match-making to decide which resources it wants
DS sends requests to the opportunistic scheduler to claim those resources
DS claims resources and has exclusive control (until it releases them)
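The four claiming steps above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not Condor's daemon protocol: `Collector`, `OpportunisticScheduler`, `DedicatedScheduler`, `Machine`, and the memory-only "match-making" rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Machine:
    name: str
    dedicated: bool                   # advertised as a dedicated resource
    memory_mb: int
    claimed_by: Optional[str] = None  # None while unclaimed


class Collector:
    """Toy stand-in for the collector: answers queries over machine ads."""

    def __init__(self, machines: List[Machine]):
        self.machines = machines

    def query(self, predicate: Callable[[Machine], bool]) -> List[Machine]:
        return [m for m in self.machines if predicate(m)]


class OpportunisticScheduler:
    """Toy stand-in: grants a claim only if nobody else holds the machine."""

    def request_claim(self, machine: Machine, requester: str) -> bool:
        if machine.claimed_by is None:
            machine.claimed_by = requester
            return True
        return False


class DedicatedScheduler:
    NAME = "DS"

    def __init__(self, collector: Collector, opportunistic: OpportunisticScheduler):
        self.collector = collector
        self.opportunistic = opportunistic
        self.claimed: List[Machine] = []

    def claim_for_job(self, nodes_needed: int, min_memory_mb: int) -> List[str]:
        # Step 1: query the collector for every dedicated resource.
        dedicated = self.collector.query(lambda m: m.dedicated)
        # Step 2: match-making (here simply a memory requirement).
        matches = [m for m in dedicated if m.memory_mb >= min_memory_mb]
        # Step 3: ask the opportunistic scheduler to claim the matches.
        got: List[Machine] = []
        for m in matches:
            if len(got) == nodes_needed:
                break
            if self.opportunistic.request_claim(m, self.NAME):
                got.append(m)
        if len(got) < nodes_needed:
            for m in got:  # not enough nodes: hand back the partial claims
                m.claimed_by = None
            return []
        # Step 4: the DS now has exclusive control until it calls release().
        self.claimed.extend(got)
        return [m.name for m in got]

    def release(self) -> None:
        for m in self.claimed:
            m.claimed_by = None
        self.claimed = []
```

Releasing partial claims when a job cannot get enough nodes keeps this toy from holding machines it cannot use; how the real dedicated scheduler handles that case is not covered by these slides.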
The traditional solution for holes in a dedicated schedule is backfilling:
Use lower priority parallel jobs
Use serial jobs
However, if you can’t checkpoint the serial jobs, and/or you don’t have any parallel jobs of the right size and duration, you’ve still got holes
Backfilling: The Condor Solution
In Condor, we already have an infrastructure for managing non-dedicated nodes with opportunistic scheduling, so we use that to fill the holes in the dedicated schedule
Our opportunistic jobs can be checkpointed and migrated when the dedicated scheduler needs the resources again
Allows dedicated resources to be used for opportunistic jobs as needed
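The backfilling idea above (fill holes with opportunistic jobs, checkpoint and migrate them when the dedicated scheduler needs the nodes back) can be modeled in a few lines. This is a minimal sketch assuming unit time steps and a single FIFO queue; `SerialJob`, `Node`, `fill_holes`, `tick`, and `reclaim` are hypothetical names, and "checkpointing" here just means the remaining work is kept when the job is evicted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SerialJob:
    name: str
    work_left: int        # time units of computation still needed
    checkpoints: int = 0  # how often the job was checkpointed and evicted


@dataclass
class Node:
    name: str
    mpi_job: Optional[str] = None         # dedicated job holding the node
    backfill: Optional[SerialJob] = None  # opportunistic job filling a hole


def fill_holes(nodes: List[Node], queue: List[SerialJob]) -> None:
    """Start queued opportunistic jobs on nodes the dedicated scheduler is not using."""
    for node in nodes:
        if node.mpi_job is None and node.backfill is None and queue:
            node.backfill = queue.pop(0)


def tick(nodes: List[Node], finished: List[str]) -> None:
    """Advance time by one unit for every running opportunistic job."""
    for node in nodes:
        job = node.backfill
        if job is not None:
            job.work_left -= 1
            if job.work_left == 0:
                finished.append(job.name)
                node.backfill = None


def reclaim(node: Node, mpi_job: str, queue: List[SerialJob]) -> None:
    """The dedicated scheduler wants the node back.

    The opportunistic job keeps its remaining work (a stand-in for a
    checkpoint) and is requeued so it can migrate to another idle node.
    """
    if node.backfill is not None:
        node.backfill.checkpoints += 1
        queue.append(node.backfill)
        node.backfill = None
    node.mpi_job = mpi_job
```

Because an evicted job resumes with its saved progress on whichever node frees up next, no work is lost when the dedicated scheduler reclaims a node; without checkpointing, the same eviction would throw that work away, which is the "you've still got holes" problem.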
MPICH uses rsh to spawn jobs
Condor provides its own rsh tool
Older versions of MPICH need to be built without a hard-coded path to rsh
Newer versions of MPICH ( and later) support an environment variable, P4_RSHCOMMAND, which specifies what program should be used
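A launcher that honors P4_RSHCOMMAND only needs to set that variable in the environment of `mpirun`. The sketch below builds the argument list and environment without running anything; the path in `CONDOR_RSH` is a placeholder assumption, since the real location of Condor's rsh tool depends on the installation.

```python
import os
from typing import Dict, List, Tuple

# Placeholder: the actual path to Condor's rsh tool depends on the installation.
CONDOR_RSH = "/path/to/condor/rsh"


def mpich_launch(program: str, nprocs: int, rsh: str = CONDOR_RSH) -> Tuple[List[str], Dict[str, str]]:
    """Return argv and environment for starting an MPICH job whose remote
    processes are spawned through `rsh` instead of the system rsh."""
    env = dict(os.environ)      # copy, so the caller's environment is untouched
    env["P4_RSHCOMMAND"] = rsh  # names the program MPICH should use to spawn
    argv = ["mpirun", "-np", str(nprocs), program]
    return argv, env
```

The pair can then be handed to `subprocess.run(argv, env=env)`. Older MPICH builds with a hard-coded rsh path ignore the variable, which is why they have to be rebuilt.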
We’ve investigated supporting MPIPro jobs with Condor
MPIPro has some issues with selecting a port for the head node in your computation, and we’re looking for a good solution
Condor + LAM = “LAMdor”
LAM's API is better suited for a dynamic environment, where hosts can come and go from your MPI universe
LAM has a different mechanism for spawning jobs than MPICH
Condor is working to support LAM’s methods for spawning
LAMdor (Cont’d)
LAM is working to understand, expand, and fully implement the dynamic scheduling calls in its API
LAM also considering using Condor’s libraries to support checkpointing of MPI computations
What are people using?
Do you want to see Condor support any other MPI implementations?
If so, let us know by sending email to:
Supported Platforms
Condor’s MPI support is now available on all Condor platforms:
Windows (new since last year)
NT, 2000
Future Work
Integrating Condor’s user priority system with its dedicated scheduling
Adding support for user-specified job priorities (among their own jobs)
Condor-MPI support for the Tool Daemon Protocol
Solving problems w/ MPI on the Grid
"Flocking" MPI jobs to remote pools, or even spanning pools with a single computation
Solving issues of resource ownership on the Grid (i.e. how do you handle multiple dedicated schedulers on the grid wanting to control a given resource?)
Generic dedicated jobs
We gather and schedule the resources, then call your program, give it the list of machines, and let the program spawn itself
Linda (parallel programming interface)
“Checkpointing” vanilla jobs to swap space
Checkpointing entire MPI computations
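The "generic dedicated jobs" item describes a hand-off: the scheduler gathers the resources, then calls the user's program with the list of machines and lets it spawn itself. A minimal sketch of the program side, assuming the machines arrive as command-line arguments and that remote starts go through rsh; `./worker` and its `--rank`/`--head` flags are hypothetical, and the functions only build the commands rather than executing them.

```python
from typing import List

# Hypothetical worker binary and flags, for illustration only.
WORKER = "./worker"


def spawn_commands(machines: List[str], program: str = WORKER, rsh: str = "rsh") -> List[List[str]]:
    """One remote-start command per allocated machine; the first machine is the head."""
    head = machines[0]
    return [[rsh, host, program, "--rank", str(rank), "--head", head]
            for rank, host in enumerate(machines)]


def launcher_main(argv: List[str]) -> List[List[str]]:
    """Entry point the scheduler would call once it has gathered the resources.

    Assumption: the allocated machines arrive as command-line arguments.
    """
    machines = argv[1:]
    if not machines:
        raise ValueError("the scheduler passed no machines")
    return spawn_commands(machines)
```

Each returned command could be started with `subprocess.Popen`; keeping command construction separate from execution makes it easy to swap rsh for Condor's own spawning tool.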
How do I start using MPI with Condor?
MPI support added and tested in the current development series (6.3.X)
MPI support is a built-in feature of the next stable series of Condor (6.4.X)
6.4.0 will be released Any Day Now™
Come to the MPI “BoF”, Wednesday, 3/6/02, 11am-noon, 3385 CS
For more information:
