7
Condor and Multi-core Scheduling Dan Bradley [email protected] University of Wisconsin enerous support from the National Science Foundati oductive collaboration with Red Hat

Condor and Multi-core Scheduling

Embed Size (px)

DESCRIPTION

Condor and Multi-core Scheduling. Dan Bradley [email protected] University of Wisconsin. with generous support from the National Science Foundation and productive collaboration with Red Hat. What Exists Today. Parallel scheduling Designed for MPI-type jobs - PowerPoint PPT Presentation

Citation preview

Page 1: Condor and Multi-core Scheduling

Condor and Multi-core Scheduling

Dan Bradley

[email protected]

University of Wisconsin

with generous support from the National Science Foundationand productive collaboration with Red Hat

Page 2: Condor and Multi-core Scheduling

Dan Bradley (Wisconsin) @ CERN multi-core workshop 2 10 Oct 2008

What Exists Today

• Parallel scheduling– Designed for MPI-type jobs– No way to limit matchmaking to slots on

same machine

• Custom Batch Slots– One “slot” can be <> one core– Slot policy can depend on type of job

Page 3: Condor and Multi-core Scheduling

Dan Bradley (Wisconsin) @ CERN multi-core workshop 3 10 Oct 2008

Example: Custom Batch Slots

• Slot 1, 2, 3– only accept 1-core jobs

• Slot 4– when claimed by normal 1-core job,

behaves normally– when claimed by 4-core job

• stop accepting jobs for slots 1, 2, and 3• suspends 4-core job until slots 1, 2, and 3 drain

Related Condor How-to: http://nmi.cs.wisc.edu/node/1482

Page 4: Condor and Multi-core Scheduling

Dan Bradley (Wisconsin) @ CERN multi-core workshop 4 10 Oct 2008

shortcomings

• Awkward to extend to N-core jobs, but doable

• Very awkward to extend to N-core jobs and also support dynamic partitioning of memory, etc.

• Requires custom configuration by admins; no standard JDL for submitting grid jobs to sites

• Accounting does not charge multi-core job at higher rate than 1-core job

Page 5: Condor and Multi-core Scheduling

Dan Bradley (Wisconsin) @ CERN multi-core workshop 5 10 Oct 2008

in development: dynamic slots

• Start with one “batch slot” representing whole machine

• Trim slot to what job requires (cores, memory, disk, network, etc.)

• Leftovers assigned to new slot

Page 6: Condor and Multi-core Scheduling

Dan Bradley (Wisconsin) @ CERN multi-core workshop 6 10 Oct 2008

TODO

• improve accounting and any other monitoring to support multi-core slots

• support standard JDL for requesting multiple cores

• deal with preemption and/or mechanism for preventing starvation of multi-core jobs

• speed up dynamic slot creation

Page 7: Condor and Multi-core Scheduling

Dan Bradley (Wisconsin) @ CERN multi-core workshop 7 10 Oct 2008

Questions

• Is it desirable to have the capability to lock a job to a CPU?

• Does Globus provide RSL attributes with well-defined semantics for multi-core jobs?– example: (count=N)(jobtype=single)– in my experience, this is not well-defined– can mean N cores x 1 job or N x job