Transcript
Page 1: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

UK Condor WeekNeSC – Scotland – Oct 2004

Condor TeamComputer Sciences DepartmentUniversity of Wisconsin-Madisonhttp://www.cs.wisc.edu/condor

The Bologna Batch System: Flexible Policy

with Condor

Page 2: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

2www.cs.wisc.edu/condor

The Bologna Batch System

› Custom batch scheduling system for local users at INFN in Bologna, Italy. “Istituto Nazionale di Fisica Nucleare” Dr. Paolo Mazzanti initiated the idea.

› Implement on a small subset of machines within the larger nationwide INFN Condor pool: INFN Condor Pool: ~300 CPUs INFN-Bologna Condor: ~100 CPUs Bologna Batch System: ~50 CPUs

Page 3: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

3www.cs.wisc.edu/condor

Where We Started› Basic Condor Policy

Opportunistic resources• Jobs only run when machines are otherwise idle• Jobs can be preempted for machine owners or

higher-priority users Fair-share across INFN pool

• Highest priority user in the pool gets first crack at a given resource

• The more you use, the worse your priority becomes

› Some problems: Long-running vanilla jobs (with no checkpointing)

were frequently preempted before running to completion

Users dislike waiting for a resource if they only want to run a short job

High-priority users from other INFN sites running on local resources while lower-priority local users wait.

Page 4: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

4www.cs.wisc.edu/condor

BBS Policy Requirements

› Prioritize local work Share resources, but run outside jobs as backfill

› Treat local servers as “dedicated” resources for local jobs, but “opportunistic” resources for other jobs. Run outside Condor jobs only if the server is idle. Run local batch jobs regardless of other system load

or console activity. Preempt outside Condor jobs to allow local batch jobs

to run, but don’t preempt local jobs for outside work.

Page 5: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

5www.cs.wisc.edu/condor

BBS Policy Requirements

› Ensure resource availability for both short and long-running jobs Prioritize short batch jobs so that they are never kept

waiting by long batch jobs. Prevent long batch jobs from being preempted or

starved by short jobs.

› Never waste resources No idle CPUs when jobs are waiting to run! No preemption of vanilla jobs!

• Preemption ideal if you can checkpoint, but here we can’t…

Page 6: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

6www.cs.wisc.edu/condor

A Contradiction!

› No way to guarantee resource availability for short or long jobs without “reserving” some CPUs for each…

› ...But no way to avoid idle CPUs without allowing them to start any kind of job: If CPUs reserved for short jobs are used for long jobs,

they become unavailable to run short jobs. If CPUs reserved for short jobs are not used for long

jobs, they’re being wasted when there are no short jobs to run.

› What to do, what to do…

Page 7: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

7www.cs.wisc.edu/condor

A Solution!

› Allow resources to be temporarily overcommitted We treat one CPU as two… On a two-CPU machine, define four Condor VMs (virtual

machines): two for short jobs and two for long jobs.

› Allow jobs to be suspended rather than preempted Think of as “checkpointing to swap”…

› OR allow jobs to be “de-prioritized” temporarily If memory is adequate, allow “suspended” long jobs to

continue running at a poor OS priority and steal cycles whenever “active” short jobs are busy doing I/O.

Page 8: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

8www.cs.wisc.edu/condor

Everybody wins!

› Short jobs start right away on dedicated “short” VMs

› Long jobs aren’t preempted by short jobs, but rather suspend temporarily or run at a lower priority.

› Outside jobs run only when no Bologna jobs waiting.

› All CPUs available to all types of jobs. No idle CPUs when jobs are waiting.

Page 9: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

9www.cs.wisc.edu/condor

Okay, how?

› Flipside of flexibility is complexity!

› It’s pretty cool that Condor allows you to combine dedicated and opportunistic scheduling in one system, but it takes a bit of work to get it all set up…

› Luckily for y’all, we’ve already done the hard part, and now you can copy it.

Page 10: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

10www.cs.wisc.edu/condor

Copy it from where?

› Bologna Batch System document http://www.cs.wisc.edu/~pfc/bbs.doc

› A detailed walk-through of the specific policies and the necessary Condor configuration to make each one work.

› Line by line examples of how we implemented each.

› What’s in it? Let’s take a look…

Page 11: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

11www.cs.wisc.edu/condor

First Step: No hand waving!!

› Bologna Batch Jobs are specially-designated jobs which may run only on specially-designated Bologna Batch Servers.

› Only users in Bologna may submit Bologna Batch Jobs.› Bologna Batch Jobs must be vanilla-universe jobs (and therefore are not

capable of checkpointing and resuming), and thus once they start they must not be preempted for other jobs.

› Bologna Batch Servers prefer Bologna Batch Jobs over other Condor jobs, and will start Bologna Batch Jobs regardless of system load or console activity.

› There are two types of Bologna Batch Jobs, short-running and long-running. Bologna Batch Jobs are assumed to be short-running unless they are explicitly labeled as long-running when they are submitted.

› A short-running Bologna Batch Job must not be forced to wait for the completion of a long-running Bologna Batch Job before starting.

› When short and long-running Bologna Batch Jobs are running simultaneously on the same physical machine, the short-running job processes should run at a lower (better) OS priority than the long-running jobs.

› A short-running Bologna Batch Job may only run for one hour, after which point it should be killed and removed from the queue.

› Bologna Batch Jobs have priority over other Condor jobs. This means two things: other jobs must never preempt Bologna Batch Jobs, and Bologna Batch Jobs must always immediately preempt other jobs.

Page 12: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

12www.cs.wisc.edu/condor

Review…› Job

Requirements› Machine

START PREEMPT RANK WANT_SUSPEND, … JOB_RENICE_INCREMENT

› PREEMPTION_REQUIREMENTS› STARTD_EXPRS, SUBMIT_EXPRS

Page 13: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

13www.cs.wisc.edu/condor

Requirement #1, “Bologna Batch Jobs are specially-designated jobs which may run only on specially-

designated Bologna Batch Servers.”: To identify the servers, place into local condor

config:BolognaBatchServer = TrueSTARTD_EXPRS = $(STARTD_EXPRS) BolognaBatchServer

To indentify Bologna Batch Jobs by inserting the following line into their job submit description files:

+BolognaBatchJob = True

Now Bologna Batch Jobs and Servers can identify one another, users ensure that Bologna Batch Jobs run only on Bologna Batch Servers by specifying a job requirement:

Requirements = (BolognaBatchServer == True)

Page 14: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

14www.cs.wisc.edu/condor

Requirement #2, “Only users in Bologna may submit Bologna Batch

Jobs.”: Each Bologna Batch Server double-checks the origin of a

job claiming to be a Bologna Batch Job :IsBBJob = ( TARGET.BolognaBatchJob =?= True \ && TARGET.SUBMIT_SITE_DOMAIN ==

$(SUBMIT_SITE_DOMAIN) )

SUBMIT_SITE_DOMAIN is an attribute that INFN defines on all machines, and which they previously configured the Condor schedd to automatically add to each job’s classad . Individual Condor users are not able to override it:

SUBMIT_SITE_DOMAIN = "$(UID_DOMAIN)"SUBMIT_EXPRS = $(SUBMIT_EXPRS) SUBMIT_SITE_DOMAIN

Page 15: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

15www.cs.wisc.edu/condor

Requirement #3, “BB Jobs must be vanilla-universe jobs, and thus once

they start they must not be preempted“

Next we modified each Bologna Batch Server’s WANT_SUSPEND_VANILLA and PREEMPT expressions, which Condor uses to decide when to suspend or preempt a vanilla job, so that INFN’s default preemption policy would only affect non-Bologna Batch Jobs. :

IsNotBBJob = ( $(IsBBJob) =!= True )WANT_SUSPEND_VANILLA = ( $(IsNotBBJob) && ($

(WANT_SUSPEND_VANILLA)) )PREEMPT = ( $(IsNotBBJob) && ($(PREEMPT)) )

Page 16: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

16www.cs.wisc.edu/condor

Requirement #4, “Bologna Batch Servers prefer Bologna Batch Jobs

over other Condor jobs, and will start Bologna Batch Jobs regardless of system load or console activity“

RANK = $(IsBBJob)INFN_START = ( (LoadAvg - CondorLoadAvg) < 0.3 \ && KeyboardIdle > (15 * 60) \ && TotalCondorLoadAvg <= 1.0 )START = ( $(IsBBJob) || ($(INFN_START)) )

Page 17: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

17www.cs.wisc.edu/condor

Requirement #5, “There are two types of Bologna Batch Jobs, short-running and long-running. Bologna Batch Jobs are assumed to be short-

running unless they are explicitly labeled as long-running when they

are submitted.“

Declare long running jobs by placing the following into submit file:

+LongRunningJob = True

The in the config file, take advantage of meta-operators:

IsLongBBJob = ( $(IsBBJob) && TARGET.LongRunningJob =?= True )

IsShortBBJob = ( $(IsBBJob) && TARGET.LongRunningJob =!= True )

Page 18: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

18www.cs.wisc.edu/condor

Requirement #6, “A short-running Bologna Batch Job must not be forced to wait for the completion of a long-running Bologna Batch Job before

starting..“Declare more Virtual Machines than there are actual CPUs (dual CPU = 2

short VMs, 4 long):NUM_SHORT_RUNNING_VMS = 2IsShortRunningVM = (VirtualMachineID <= $(NUM_SHORT_RUNNING_VMS))IsLongRunningVM = (VirtualMachineID > $(NUM_SHORT_RUNNING_VMS))

Change the start expression:SHORT_RUNNING_VM_START = ( $(IsShortBBJob) \ || ( $(IsNotBBJob) && $(INFN_START) ) )LONG_RUNNING_VM_START = $(IsLongBBJob)

START = ( ( $(IsShortRunningVM) && $(SHORT_RUNNING_VM_START) ) \ || ( $(IsLongRunningVM) && $(LONG_RUNNING_VM_START) ) )

Page 19: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

19www.cs.wisc.edu/condor

Requirement #7, “When short and long-running BB Jobs are running

simultaneously on the same physical machine, the short-running job processes should run at a lower

(better) OS priority”JOB_RENICE_INCREMENT = ( 5 + ( 10 * ( LongRunningJob =?= True \ || BolognaBatchJob =!= True ) )

If LongRunningJob is true in the job classad, the expression evaluates to (5 + (10 * 1)), or 15. If LongRunningJob is undefined or false in the job classad, but BolognaBatchJob is true, the expression evaluates to (5 + (10 * 0)), or 5. If neither is defined, the expression evaluates to (5 + (10 * 1)), or 15

Page 20: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

20www.cs.wisc.edu/condor

Requirement #8, “A short-running Bologna Batch Job may only run for

one hour, after which point it should be killed and removed from the

queue.Declare long running jobs by placing the following into

submit file:PREEMPT = ( ( $(IsNotBBJob) && ($(PREEMPT)) ) \ || ( $(IsShortBBJob) && ($(ActivityTimer) >

60*60) ) )SHORT_RUNNING_VM_START = (( $(IsShortBBJob) \ && (RemoteWallClockTime<60*60) =!= False) \ || ( $(IsNotBBJob) && ($(INFN_START)) ) )

To remove from the queue, in the job ad add:Periodic_Remove = ( LongRunningJob =!= True \ && (RemoteWallClockTime < 60*60) )

Page 21: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

21www.cs.wisc.edu/condor

Requirement #9, “Bologna Batch Jobs have priority over other Condor jobs: other jobs must never preempt

BBJobs, and BB Jobs must always immediately preempt other jobs..

RANK already dealt with, now priority preemption:INFN_PREEMPTION_REQUIREMENTS = ( $(StateTimer) > (2 * (60 * 60)) \ && RemoteUserPrio > SubmittorPrio * 1.2 )

PREEMPTION_REQUIREMENTS = \ (( BolognaBatchServer=!=True && $

(INFN_PREEMPTION_REQUIREMENTS)) \ || (BolognaBatchServer =?= True \ && ( BolognaBatchJob =!= True \ && ( TARGET.BolognaBatchJob =?= True \ || $

(INFN_PREEMPTION_REQUIREMENTS) ))))

Page 22: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

22www.cs.wisc.edu/condor

Wrap condor_submit to make it easy for users

bbs_submit_short / bbs_submit_long

#!/bin/sh_CONDOR_APPEND_REQ_VANILLA='(BolognaBatchServer

== True)'export _CONDOR_APPEND_REQ_VANILLAcondor_submit -a '+BolognaBatchJob = True' \ -a 'should_transfer_files = IF_NEEDED' \ -a 'when_to_transfer_output = ON_EXIT' \ -a 'universe = vanilla' \ -a 'periodic_remove = ( LongRunningJob =!= True && (RemoteWallClockTime > 60*60) ) ' \ $*

Page 23: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

23www.cs.wisc.edu/condor

Simple for Users

› Although policy is complicated, the interface for users is kept simple: Users call bbs_submit_long or bbs_submit_short,

just as they would condor_submit…• Short jobs start quickly, but those that run for >1 hour

are killed.• Long jobs will run to completion...

bbs_submit_* scripts automatically add the appropriate classad attributes to the job to take advantage of the long or short running VMs on Bologna Batch Servers.

Page 24: UK Condor Week NeSC – Scotland – Oct 2004 Condor Team Computer Sciences Department University of Wisconsin-Madison  The Bologna

24www.cs.wisc.edu/condor

Any Questions?

› Email me at [email protected].› Check the Bologna Batch System

document at http://www.cs.wisc.edu/~pfc/bbs.doc

› Thanks!


Recommended