
Real-Time Syst (2011) 47:356–377. DOI 10.1007/s11241-011-9120-2

Sticky-ERfair: a task-processor affinity aware proportional fair scheduler

Arnab Sarkar · Sujoy Ghose · P.P. Chakrabarti

Published online: 26 February 2011
© Springer Science+Business Media, LLC 2011

Abstract A drawback of current proportional fair schedulers is that they ignore task-to-processor mutual affinities, thereby causing frequent inter-processor task migrations and cache misses. This paper presents the Sticky-ERfair Scheduler, an Early-Release Fair (ERfair) scheduler that attempts to minimize task migrations and preemptions. Given any processor Vi, Sticky-ERfair allocates to it the most recently executed ready task that previously executed on Vi (thus restricting migrations and preemptions) in such a way that this allocation does not cause any ERfairness violations in the system at any time during the schedule length. Experimental results reveal that Sticky-ERfair can achieve more than a 40-fold reduction in both the number of migrations and the number of preemptions suffered with respect to Basic-ERfair (for a set of 25 to 100 tasks running on 2 to 10 processors) while simultaneously guaranteeing ERfairness at each time slot.

Keywords Proportional fairness · ERfair scheduling · Real-time scheduling · Task migration · Task preemptions · Cache affinity

1 Introduction

Proportionate fair (Pfair) schedulers (Anderson and Srinivasan 2000, 2004; Baruah et al. 1996, 1995) form a very effective resource management strategy for scheduling

A. Sarkar (✉) · S. Ghose · P.P. Chakrabarti
Computer Science & Engineering Department, Indian Institute of Technology, Kharagpur, WB 721 302, India
e-mail: [email protected]

S. Ghose
e-mail: [email protected]

P.P. Chakrabarti
e-mail: [email protected]


recurrent hard real-time task sets with fully dynamic priorities on multiprocessors. They are optimal in the sense that any feasible task set (Jeffay and Goddard 1999, 2001) may be scheduled using these strategies with the maximum possible fairness accuracy that is practical.

Given a set of n periodic tasks to be scheduled on m identical processors, where each task Ti = (ei, pi) is characterized by two parameters, an execution requirement ei and a period pi, Pfair schedulers guarantee that not only are all task deadlines met, but also each task executes at a consistent rate proportional to its weight (ei/pi). Typically, Pfair algorithms consider discrete time lines and divide the tasks into equal-sized subtasks (time is therefore measured in terms of the number of time slots or time quanta t elapsed since the start of the schedule; a single subtask may be executed within the time period of a slot). Subtasks are scheduled appropriately to ensure fairness. The fairness accuracy is generally defined in terms of the lag between the amount of time that has actually been allocated to a task and the amount of time that would be allocated to it in an ideal system with a time quantum approaching zero. Formally, the lag of task Ti at time t, denoted lag(Ti, t), is defined as follows:

lag(Ti, t) = (ei/pi) · t − allocated(Ti, t),

where allocated(Ti, t) is the amount of processor time allocated to Ti in [0, t). A schedule is Pfair (Anderson and Srinivasan 2004) iff:

(∀ Ti, t :: −1 < lag(Ti, t) < 1)

Informally, the allocation error associated with each task must always be less than one time quantum.
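For concreteness, a minimal sketch (not from the paper) of this lag bookkeeping is shown below; exact rational arithmetic avoids floating-point drift when checking the fairness conditions over long schedules.

```python
# Lag computation underlying the Pfair/ERfair conditions; ei, pi and the
# allocation count are integers measured in time slots.
from fractions import Fraction

def lag(e_i, p_i, allocated, t):
    """lag(Ti, t) = (ei/pi) * t - allocated(Ti, t), computed exactly."""
    return Fraction(e_i, p_i) * t - allocated

def is_pfair(lag_value):
    return -1 < lag_value < 1   # allocation error within one quantum

def is_erfair(lag_value):
    return lag_value < 1        # only the upper lag bound is retained
```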

Each subtask stij of a task Ti has a pseudo-release time prij and a pseudo-deadline pdij defined as: prij = ⌊(j − 1)/wti⌋ and pdij = ⌈j/wti⌉ − 1, where wti denotes the weight (ei/pi) of Ti. The scheduling bandwidth or window (win(stij)) for each subtask is given by: win(stij) = [prij, pdij].

The notion of early-release fair scheduling (ERfair) (Anderson and Srinivasan 2000) is obtained from the definition of Pfair scheduling by simply dropping the −1 lag constraint. Formally, a schedule is early-release fair (ERfair) iff:

(∀ T, t :: lag(T, t) < 1)

Hence, in an ERfair system, a subtask becomes eligible for execution immediately after its previous subtask completes execution. Each subtask in an ERfair system also has a pseudo-deadline, and it is the same as the Pfair pseudo-deadline pdij.
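The windows can be computed directly from these formulas; the helper below is a sketch (assumed, not from the paper) using integer arithmetic so the floor and ceiling are exact.

```python
def subtask_window(e_i, p_i, j):
    """Return (pr_ij, pd_ij) for the j-th (1-based) subtask of Ti = (ei, pi)."""
    pr = ((j - 1) * p_i) // e_i        # floor((j-1)/wt_i)
    pd = -((-j * p_i) // e_i) - 1      # ceil(j/wt_i) - 1, ceiling via negation
    return pr, pd

# Example: Ti = (3, 10) yields windows [0,3], [3,6], [6,9] for j = 1, 2, 3.
```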

However, in spite of their theoretical importance and usefulness, actual implementations of these fair schedulers are limited, primarily due to the fact that they are usually ignorant of the affinities between tasks and their executing processors, which may cause unrestricted inter-processor task migrations and preemptions, thus incurring high overheads.

Task preemptions primarily result in delay suffered by resumed threads of execution due to compulsory and conflict cache misses while populating the caches with their evicted working sets. Therefore, a processor is affined to the task it executed last because its working set currently exists in cache (and is valid (non-dirty)) and hence its execution results in cache hits (Gupta et al. 1991; Squillante and Lazowska 1993; Squillante and Nelson 1991; Torrellas et al. 1993). Although, even after intermediate preemptions, some traces of the cache data of a task, say Ti (which executed previously on a given processor), may still remain valid in that processor's cache, most of Ti's cache contents will typically be swapped out even by a single distinct task executing for just one time slot between two consecutive executions of Ti. This will be true for all practical time slot lengths and cache sizes (Berg and Hagersten 2004; Negi et al. 2003).

Task migration related overheads refer to the time spent by the operating system to transfer the complete state of a thread from the processor where it had been executing to the processor where it will execute next after a migration. Obviously, the more loosely coupled a system, the higher this overhead will be. Task migrations may also cause some cache-miss related overheads, although in general these are small in comparison to the communication related overheads.

These expensive overheads underline the importance of devising suitable scheduling techniques that first attempt to maximize the time for which a task executes on a particular processor and then try to execute, on a given processor, those recently executed ready tasks whose working sets currently reside in its cache.

Restricting migrations/preemptions has been an area of considerable interest over the years (Anderson et al. 2005; Baruah and Fisher 2006; Block and Anderson 2006; Funk et al. 2001; Harizopoulos and Ailamaki 2002; Lopez et al. 2000; Phillips et al. 1997). The traditional approach to avoiding migration has been to adopt a partition oriented scheduling approach where, once a task is allocated to a processor, it is exclusively executed on that processor (Carpenter et al. 2004; Malkevitch 2004). One of its major problems is that none of the algorithms here can achieve an overall system utilization (∑ⁿᵢ₌₁ ei/pi) greater than (m + 1)/2 in a system of m processors in the worst case (Andersson and Jonsson 2003). That is, there are situations where no more than 50% of the system capacity can be used in order to ensure that all deadlines are met. However, this worst-case condition may be relaxed either by bounding the maximum weight of any individual task under a certain value (Lopez et al. 2000) or by allowing individual tasks to be split across more than one processor (Andersson and Bletsas 2008; Andersson and Tovar 2006). Lopez et al. (2000) proved that if all the tasks have a weight under a value α, the worst-case achievable utilization becomes:

Uwc(m, β) = (βm + 1)/(β + 1),   (1)

where β = ⌊1/α⌋. Thus, as α approaches 0, Uwc(m, β) approaches m, and when α = 1, Uwc(m, β) becomes (m + 1)/2. The EKG algorithm (Andersson and Tovar 2006), on the other hand, attempts to improve system utilization by allowing tasks to be split into at most two portions if necessary. Tasks are assigned one processor at a time. The first portion of a split task is allocated to the processor currently being assigned while the second portion goes to the next processor on which tasks will be assigned. These two portions are scheduled exclusively. The principal facet of this approach is that it allows a trade-off on the upper bound of the achievable system utilization with the number of preemptions by adjusting a parameter k (where 2 ≤ k ≤ m; m denotes the number of processors). When k < m, the achievable utilization becomes k/(k + 1) while incurring at most 2k preemptions per job per hyper-period (a hyper-period represents the time period denoted by the least common multiple of the periods of all the tasks). Thus, the achievable system utilization is 66% when k = 2, incurring 4 migrations per job every hyper-period. For k = m, the achievable utilization becomes 100%, although at the expense of more preemptions.
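A tiny sketch (assumed helper, not from any of the cited papers) evaluating the bound in (1) makes its behavior at the two extremes easy to check:

```python
from math import floor

def u_wc(m, alpha):
    """Lopez et al. (2000): U_wc = (beta*m + 1)/(beta + 1), beta = floor(1/alpha),
    for task sets whose individual weights are all below alpha."""
    beta = floor(1 / alpha)
    return (beta * m + 1) / (beta + 1)

# u_wc(4, 1.0)  -> 2.5  (the classical (m+1)/2 partitioning bound)
# u_wc(4, 0.25) -> 3.4  (lighter tasks raise the achievable utilization)
```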

Anderson et al. (2005) have provided an interesting algorithm to restrict task migrations that provides bounded deadline tardiness and does not pose any restriction on the overall system utilization. The algorithm is also important from the standpoint that it ensures that over a long range of time, all tasks execute at their prescribed rates (given by their weights); thus, it takes the first steps at attempting to develop a partition oriented real-time rate-based scheduler. However, being based on EDF (Liu and Layland 1973), a non-rate-based algorithm, its rate-based properties are weak. The other limitations of the algorithm are that it does not allow task priorities to change within a job and that it requires individual task weights to be capped at at most 1/2.

On the other hand, even though the generic global scheduling methodology (which allows a task to execute on any processor when resuming after having been preempted) as employed by the Pfair schedulers may cause an unrestricted number of preemptions/migrations, it possesses many attractive features like flexible resource management, dynamic load distribution, fault resilience, etc. (Srinivasan et al. 2003). Algorithms like BF (Zhu et al. 2003), LLREF (Cho et al. 2006) and NVNLF (Funaoka et al. 2008) attempt to lower the number of preemptions by enforcing the Pfair/ERfair constraints only at task period boundaries. Kimbrel et al. (2006) have provided a global scheduler to minimize task migrations. It attempts a trade-off between the amount of deviation from perfect fairness and the number of migrations. The drawback of this algorithm, however, stems from its limited scope of applicability: it works only for a set of persistent (non-dynamic) equal-priority tasks where all tasks have the same weight.

In this paper, we propose a modified version of the Basic-ERfair algorithm called Sticky-ERfair to minimize the number of inter-processor task migrations and preemptions. Although there exist other optimal low-overhead algorithms like EKG (Andersson and Tovar 2006), LLREF (Cho et al. 2006), etc., this algorithm is important in systems where strict maintenance of fairness (as that provided by ERfair) is not an option but a necessity. These include dynamic task systems which allow tasks to join and leave the system at any time (provided that certain feasibility conditions are satisfied), streaming multimedia systems requiring strict execution progress guarantees for each stream, and today's embedded systems which concurrently run a mix of different independent applications like real-time audio processing, interactive gaming, web browsing, etc. Before proceeding further, we first provide a brief overview of the Basic-ERfair algorithm.

Given a set of m processors V1, V2, . . . , Vm and n (≥ m) tasks, Basic-ERfair chooses the m most urgent tasks from a priority queue at each time slot and allocates processors to these tasks in the order in which they have been extracted from the priority queue (thus, the first task extracted from the queue is allocated V1, the second task is allocated V2, and so on). Thus, Basic-ERfair is completely oblivious of the processor on which a task executed the last time it was scheduled and hence incurs an unrestricted number of migrations. It is also oblivious of the task which executed last on a particular processor and therefore incurs an unrestricted number of preemptions.
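This per-slot allocation can be stated in a few lines; the sketch below (assumed structure, not from the paper) makes the obliviousness explicit: the extraction order alone decides the processor.

```python
import heapq

def basic_erfair_slot(ready_heap, m):
    """ready_heap: heapified list of (pd_next, task_id) tuples.
    Allocates processors 1..m in pure extraction order."""
    allocation = {}
    for v in range(1, m + 1):
        if not ready_heap:
            break
        _, task = heapq.heappop(ready_heap)
        allocation[v] = task   # ignores where the task last executed
    return allocation
```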

Sticky-ERfair minimizes the number of migrations and preemptions by: (I) keeping track of the processor where a task last executed, and (II) utilizing task over-allocations in under-loaded ERfair systems. For any given processor Vi and the set of tasks T = {T1, T2, . . . , Tσ} which executed on Vi the last time they were allocated a processor, Sticky-ERfair attempts to execute the most recently executed task Tρ from the set T, provided such an execution does not generate any possibility of future ERfairness violations in the system. In addition to saving cache misses by reducing preemptions, this methodology also allows re-use of any traces of non-dirty data of previously executed tasks that still remain in cache. This is because the more recent the last execution of a task on a given processor, the higher the probability that some of its cache contents still remain valid. Experimental results reveal that Sticky-ERfair can achieve more than a 40-fold reduction in both the number of migrations and the number of preemptions suffered with respect to Basic-ERfair (for a set of 25 to 100 tasks running on 2 to 10 processors) while simultaneously guaranteeing ERfairness at each time slot. A theoretical analysis of the migrations suffered by equal-priority tasks shows that, unlike for Basic-ERfair, the number of migrations under Sticky-ERfair is independent of the number of tasks.

This paper is organized as follows. In the next section, we present some important terminology that will be required in the later sections. Section 3 describes the Sticky-ERfair algorithm along with an illustrative example. We provide an analysis of the algorithm in Sect. 4. Experimental results are presented in Sect. 5. We conclude in Sect. 6.

2 Terminology

– t: Time; represents the t-th time slot.
– n: Total number of tasks.
– m: Number of processors used.
– T: The set of tasks. Symbolically, T = {T1, T2, T3, . . . , Tn}, where Ti is the i-th task.
– stij: The j-th subtask of task Ti. Each subtask defines a code segment which may be executed within a period of one time slot.
– ei: Execution requirement of Ti (in number of time slots).
– eci: Number of time slots of execution already completed by Ti. Therefore, ei − eci gives the remaining execution requirement.
– pi: Period within which Ti's execution must complete to meet its deadline.
– lei: Id of the processor where Ti last executed.
– Vi: Denotes the i-th processor (processor id).
– k: Migration ratio. Given a set of tasks, k denotes the ratio of the number of migrations suffered by a Basic-ERfair schedule to the number of migrations suffered by a Sticky-ERfair schedule.
– R: Preemption ratio; denotes the ratio of the number of preemptions suffered by a Basic-ERfair schedule to the number of preemptions suffered by a Sticky-ERfair schedule.
– wti: Weight of a task Ti. It is given by the ratio of its execution requirement ei to the period pi within which to execute it:

  wti = ei/pi.   (2)

– W: Total system weight; the sum of the weights of the currently active tasks.
– prij: Pseudo-release time of the j-th subtask of task Ti. It is used in Pfair algorithms to denote the time slot from which the j-th subtask of Ti becomes ready and is considered for execution. It is given by:

  prij = ⌊(j − 1) · pi/ei⌋.   (3)

– pdij: Pseudo-deadline of the j-th subtask of task Ti. It denotes the time slot before which Ti must complete executing its j-th subtask to remain Pfair/ERfair. It is given by:

  pdij = ⌈j · pi/ei⌉ − 1.   (4)

– win(stij): Used in Pfair scheduling; denotes the scheduling bandwidth or window for the j-th subtask of Ti. It is given by:

  win(stij) = [prij, pdij].   (5)

  So the window length, denoted by |win(stij)|, is:

  |win(stij)| = pdij − prij + 1.   (6)

– δij: Deadline of postponement of the j-th subtask of task Ti. It denotes the time slot up to which the execution of the j-th subtask of Ti may be safely postponed (suspended from the ready state) without any possibility of the system violating ERfairness in a Sticky-ERfair schedule. It is given by:

  δij = pdij − ⌊pi/ei⌋ − 2.   (7)

  When used without the second subscript, δi simply denotes the deadline of postponement of the next subtask of task Ti. Theorem 1 (in Sect. 4) proves that if the execution of any subtask is not delayed beyond its deadline of postponement, system ERfairness can never be violated.
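The deadline of postponement can be computed alongside the pseudo-deadline; below is a small sketch (assumed helper, not from the paper) in exact integer arithmetic:

```python
def pseudo_deadline(e_i, p_i, j):
    return -((-j * p_i) // e_i) - 1              # pd_ij = ceil(j*pi/ei) - 1

def postponement_deadline(e_i, p_i, j):
    return pseudo_deadline(e_i, p_i, j) - (p_i // e_i) - 2   # delta_ij, Eq. (7)

# Example: for Ti = (2, 7), the second subtask has pd = 6 and
# delta = 6 - 3 - 2 = 1, so it may be postponed only through slot 1.
```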

3 The Sticky-ERfair algorithm

Given a set of n tasks T = {T1, T2, . . . , Tn} to be scheduled on a set of m identical processors V = {V1, V2, . . . , Vm}, the Sticky-ERfair algorithm works as follows:


1. At each time slot t, Sticky-ERfair sequentially selects the most urgent m tasks (these tasks have the earliest pseudo-deadlines for their next subtasks) from an AVL tree A which contains the set of ready tasks.
2. Those tasks (say m1 in number) among these m tasks whose deadline of postponement (δ; refer Sect. 2) is already over must execute in the current time slot. To each such task, say Ti, of these m1 tasks, Sticky-ERfair attempts to allocate the processor, say Vj, where Ti executed the last time it was allocated a processor, provided Vj has not already been allotted another task (among these m1 tasks).
3. Let m2 denote the number of tasks which could not be allocated due to such a clash of more than one task on the same processor. Although these m2 tasks must execute in the current time slot t, incurring a migration, they are temporarily stored in a list l2, delaying their processor allotment.
4. The execution of the remaining m − m1 tasks may be safely postponed without the possibility of future ERfairness violations by any task, and hence they are separated out into another list l1.
5. Sticky-ERfair now attempts to replace these tasks with tasks that may allow it to avoid migrations and/or cache misses on any m − m1 of the m − m1 + m2 processors still awaiting task allocation.
6. On any of these processors (say Vj), the algorithm attempts to allocate the most recently executed runnable task which executed on Vj the last time it was scheduled. To be able to efficiently discover these tasks, the scheduler maintains for each processor Vi an AVL tree ATi containing the ids of those tasks which were last scheduled on Vi. The task ids in ATi are ordered in terms of the pseudo-deadlines of their next subtasks.
7. Finally, the tasks in list l2 are allocated any of the remaining free processors, and the unallocated tasks in list l1 are re-inserted back into the AVL tree A.

A noteworthy aspect of the algorithm is the use of the AVL tree A as the priority queue of ready tasks instead of a min-heap, the more commonly used data structure for holding ready tasks in proportional fair schedulers (Baruah et al. 1996). Sticky-ERfair uses an AVL tree because, unlike the existing proportional fair schedulers, it not only needs to extract the most urgent ready tasks at each time slot but may also need to extract an arbitrary task when allocating to a processor the most recent task which last executed on it. While each such arbitrary extraction incurs an overhead of O(lg n) on AVL trees, it would result in an O(n · lg n) overhead on heaps. On similar grounds, the priority queues corresponding to each processor have been structured as AVL trees (ATi) instead of lists or heaps.
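The operational difference is easy to see in code. The sketch below (an illustration under assumed names, not the authors' implementation) uses a sorted Python list in place of the AVL tree: both support ordered extraction, but only a balanced tree also gives logarithmic arbitrary removal.

```python
import bisect

A = []                                  # sorted list of (pd_next, task_id)

def insert(entry):
    bisect.insort(A, entry)             # O(lg n) locate (+ O(n) shift in a list)

def pop_most_urgent():
    return A.pop(0)                     # earliest pseudo-deadline first

def remove_arbitrary(entry):
    i = bisect.bisect_left(A, entry)    # O(lg n) locate
    assert A[i] == entry
    del A[i]                            # O(n) in a list; O(lg n) on an AVL tree
```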

Algorithm 1 presents the pseudo-code of the Sticky-ERfair algorithm.

Example Let us consider (n =) 5 tasks, T1, T2, T3, T4 and T5, to be scheduled on a system of (m =) 3 processors V1, V2 and V3. Consider a time instant, say t = 25, when le1 = le2 = le3 = 1, le4 = 2, and le5 = 3. Let the δi values for the next subtasks of these tasks be 22, 25, 26, 26 and 27, respectively, and their pdi values be 30, 30, 33, 33 and 36. The Sticky-ERfair algorithm will first choose the three most urgent tasks, that is, T1, T2 and T3, in the same manner as Basic-ERfair does. After this, T1 is allotted processor V1. As t = δ2 = 25, T2 must be executed in the current time slot to avoid a possible ERfairness violation and is transferred to list l2; T2 will incur a migration. T3 is transferred to list l1. T4 is executed on V2 in place of T3, thus saving one migration with respect to ERfair in this time slot. Finally, T2 is allotted processor V3.

Algorithm 1 Sticky-ERfair
1: {Given: A set of n tasks and m processors}
2: for each time slot t do
3:   Select the most urgent m tasks from the AVL tree A in ERfair fashion.
4:   for each task Ti of these m tasks do
5:     if t ≥ δi then {Ti may possibly violate ERfairness if not executed in the current time slot t}
6:       if Vlei is free then {Vlei denotes the processor where Ti last executed}
7:         Allot Vlei to Ti.
8:       else if Ti was executed more recently on Vlei than the currently allocated task Tc on Vlei then {this is determined by the relative positions of Ti and Tc in the AVL tree ATlei}
9:         Allot Vlei to Ti; transfer Tc to a separate list l2.
10:      else
11:        Transfer Ti to list l2.
12:    else
13:      Transfer Ti to list l1.
14: {Let sg1, sg2, . . . , sgf be the indices of the processors yet to be allocated a task in this time slot. Lines 15 to 25 attempt to replace the execution of tasks in l1 by executing the most recently executed runnable tasks from the AVL trees ATsgj (1 ≤ j ≤ f; f = |l1| + |l2|).}
15: k = j = 1
16: while k ≤ h and j ≤ f do {h denotes the number of tasks in l1}
17:   if ATsgj is not empty then
18:     Extract the index c of the most recently executed task in ATsgj.
19:     if task Tc is one of the m tasks selected in step 3 then
20:       Extract Tc from list l1.
21:     else
22:       Extract Tc from the AVL tree A.
23:     Allocate Tc to processor Vsgj; increment k; increment j.
24:   else
25:     Increment j.
26: Allocate the tasks in l2 to the remaining free processors. {Each allocation incurs a migration}
27: while there is still a free processor Vx do
28:   Allocate the next task in l1 to Vx. {Each allocation incurs a migration}
29: Reinsert the remaining tasks in l1 into the AVL tree A.


4 Analysis of the algorithm

This section provides an analysis of the Sticky-ERfair algorithm. We first prove that Sticky-ERfair never violates ERfairness. Then we show that the complexity of Sticky-ERfair is O(m · lg(n)), which is typically of the same order as that of Basic-ERfair. Finally, in Theorem 3 we provide an analysis of the number of migrations suffered by Sticky-ERfair for a set of equal-weighted tasks on fully loaded processors and show that in this case, the number of migrations is independent of the number of tasks.

Lemma 1 The maximum Pfair window length maxj |win(stij)| is given by:

maxj |win(stij)| = ⌊pi/ei⌋ + 2

Proof The length of the Pfair execution window (Anderson and Srinivasan 2004) is given by:

|win(stij)| = ⌈j · pi/ei⌉ − ⌊(j − 1) · pi/ei⌋

Therefore, maxj |win(stij)| = maxj (⌈j · pi/ei⌉ − ⌊(j − 1) · pi/ei⌋). Let pi = f1 · ei + r1 such that f1, r1 ∈ I; 0 ≤ r1 < ei. Substituting in the above expression,

⇒ maxj (⌈jf1 + jr1/ei⌉ − ⌊jf1 − f1 + r1(j − 1)/ei⌋)
= maxj (⌈jr1/ei⌉ − ⌊r1(j − 1)/ei⌋ + f1)

Here, either r1 = 0 or r1 > 0. If r1 = 0, the above expression reduces to f1 = pi/ei. If r1 > 0, putting jr1 = f2 · ei + r2, f2, r2 ∈ I; 0 ≤ r2 < ei in the above expression,

⇒ maxj (⌈(f2 · ei + r2)/ei⌉ − ⌊(f2 · ei + r2 − r1)/ei⌋) + f1
= maxj (⌈r2/ei⌉ − ⌊(r2 − r1)/ei⌋) + f1

Here, either r2 − r1 < 0 or r2 − r1 > 0. If r2 − r1 > 0, the above expression reduces to f1 + 1 = ⌊pi/ei⌋ + 1. If r2 − r1 < 0, the above expression reduces to f1 + 2 = ⌊pi/ei⌋ + 2.

Therefore, maxj |win(stij)| = ⌊pi/ei⌋ + 2. ∎

Theorem 1 Sticky-ERfair never violates ERfairness within its entire schedule length.

Proof The pseudo-deadline (pdij) for subtasks used in Sticky-ERfair is the same as that used in other Pfair and ERfair algorithms. So, the possibility of an ERfairness violation may arise in the Sticky-ERfair algorithm only because it allows postponement of subtask execution to avoid task migrations and/or cache misses.

Sticky-ERfair never postpones a subtask beyond its deadline of postponement. Hence, we need to prove that system ERfairness can never be violated if no subtask stil is allowed to remain suspended from the ready state beyond its deadline of postponement δil.

From the definition (Sect. 2), pdil − δil = ⌊pi/ei⌋ + 2. By Lemma 1, the maximum length of the Pfair execution window is maxj |win(stij)| = ⌊pi/ei⌋ + 2. Thus, pdil − δil = maxj |win(stij)|. So, Sticky-ERfair's execution window for any given subtask is always greater than or equal to its corresponding Pfair execution window.

Hence, Sticky-ERfair can never violate ERfairness. ∎

Theorem 2 Sticky-ERfair has a scheduling complexity of O(m · lg(n)).

Proof Let us analyze the complexity of the steps executed in each time slot of algorithm Sticky-ERfair.

Line 3 involves m searches on the AVL tree A, each of them being an O(lg(n)) operation. Hence, the complexity of this step is O(m · lg(n)).

Lines 4–13 contain a for loop that allocates each task Ti either to the processor Vlei or to lists l1 or l2. Allocation of Vlei takes constant time. As l1 and l2 are maintained as binary search trees, insertion into l1 or l2 takes O(lg(m)) time. The loop executes m times, once for each task selected in line 3. Its complexity therefore becomes O(m · lg(m)).

The while loop in lines 16–25 tries to allocate to each remaining free processor Vi the most recently executed task that last executed on Vi, as long as the number of such free processors is more than the number of tasks in l2. Line 20 within the loop does a search in list l1. As this list is of size O(m), the step has a complexity of O(lg(m)) (l1 is maintained as a binary search tree). Line 22 within the loop involves an AVL tree search and thus has an overhead of O(lg(n)). The other steps inside the loop run in O(1) time. Because the number of free processors can be at most m − 1 and l2 may possibly be empty, the overall complexity of the loop becomes O(m · lg(n)).

Line 26 allots a processor to each task in l2. l2 can contain at most m − 1 tasks. So, this step incurs an overhead of O(m).

Lines 27–28 contain a while loop that allocates tasks in l1 to the processors still remaining free. This loop also has an overhead of O(m).

The last line (line 29) of the algorithm involves AVL tree insert operations and has a complexity of O(m · lg(n)).

Therefore, the overall scheduling complexity of Sticky-ERfair is O(m · lg(n)). ∎

Note: Typically, ERfair also has a scheduling complexity of O(m · lg(n)) (Anderson and Srinivasan 2000; Baruah et al. 1995).

We now show that the number of migrations suffered by a set of equal-priority tasks when scheduled on fully loaded processors using Sticky-ERfair is independent of the number of tasks, and we provide a bound on the number of such migrations.


Theorem 3 Given a set of n equal-weighted tasks to be scheduled on m identical processors (m ≤ n) such that n = qm + r, q ∈ N, r ∈ I and 0 ≤ r < m, and total system load = 100% (fully loaded system), algorithm Sticky-ERfair will produce a schedule where the number of inter-processor task migrations every n time slots is upper bounded by 2 · r(m − r).

Proof Sticky-ERfair selects the m most urgent tasks at each time slot. In the given scenario, as the system is 100% loaded, task execution postponement by Sticky-ERfair is not possible. Again, as all the tasks are equal-weighted, selection of the m most urgent tasks will result in an exact round-robin execution order for the tasks.

If, during the course of round-robin execution, two tasks allocated to the same processor (say Vi) require to be scheduled in the same time slot, one of the tasks is executed on a free processor (a processor is considered to be free at a particular time slot if no task which executed on that processor the last time it was scheduled executes at that time slot). Each such execution incurs a maximum of two migrations: one migration for one of each clashing pair of tasks to the free processor and another migration to return to its previously allocated processor Vi in a subsequent scheduling step.

Let IRi (1 ≤ i ≤ m) be the set of tasks that executed on processor Vi the first time they were scheduled. As the execution sequence is strictly round-robin, r among these m sets IRi (1 ≤ i ≤ m) will contain exactly q + 1 distinct tasks and the rest m − r sets will contain q distinct tasks.

Let the r sets among the sets IRi (1 ≤ i ≤ m) which contain q + 1 distinct tasks be denoted by ISj (1 ≤ j ≤ r). As the tasks are equal-weighted and the system is fully loaded, each of the q + 1 tasks in ISj will execute m times within every period of n time slots. Therefore, at the end of n time slots it is obvious that there will be:

(q + 1)m − n = ((n − r)/m + 1) · m − n = m − r

instances where two tasks allocated to the same processor will clash and require to execute in the same time slot, and each such clash will incur no more than 2 migrations.

Therefore, each of the r sets ISj (1 ≤ j ≤ r) will incur at most 2(m − r) migrations. Hence, the number of migrations in a Sticky-ERfair schedule is upper bounded by 2r(m − r) per n time slots. ∎

Note: Basic-ERfair may incur an unbounded number of migrations in the worst case here.

Example Let us consider n = 5 equal-weighted tasks and m = 3 processors (thus, here r = 2). Figure 1 shows the first five time slots of the schedules produced using the ERfair and Sticky-ERfair algorithms. From Fig. 1(a), it may be observed that the ERfair schedule incurs n · (m − 1) = 10 migrations, while Fig. 1(b) shows that Sticky-ERfair incurs only 2 migrations (arrows in the figure indicate the migrations), which is below the upper bound of 2r(m − r) = 4 migrations obtained in Theorem 3.

Fig. 1 The first five time slots (T1 to T5) of two different schedules of five equal-weighted tasks on three identical processors (P1 to P3) generated using the ERfair and Sticky-ERfair algorithms

5 Experiments and results

We have experimentally evaluated the migration and preemption overheads of the proposed Sticky-ERfair algorithm and compared it against the Basic-ERfair algorithm. The evaluation methodology is based on simulation studies using an experimental framework which is described below.

5.1 Experimental setup

The experimentation framework used is as follows. The data sets consist of randomly generated hypothetical periodic tasks whose execution periods (pi) and weights (ei/pi) have been taken from normal distributions. The model of the system on which tasks are scheduled consists of a set of synchronous processors which ideally work in lock-step fashion.

Given the total number of tasks to be generated (n) and the summation of the weights of the n tasks (U), the task weights have been generated from a distribution with standard deviation σ = 0.1 and mean μ = U/2. The summation of the weights of the tasks as generated through the above procedure is not constant. However, making the summation of weights constant helps in the evaluation and comparison of the algorithms. Therefore, the weights have been scaled uniformly to make the cumulative weight of each distribution constant and equal to U. All the task periods have also been generated from a normal distribution having σ = 3500 and μ = 4000. Different types of data sets have then been generated by setting different values for the following parameters (a code sketch of this generation procedure follows the list):

1. Task set size n: sizes considered were 25, 50 and 100 tasks.
2. Number of processors m: multiprocessor systems consisting of 2 to 10 processors were considered.
3. Workload: three different workloads were considered; we have considered cases when the processor is 80%, 90% or 100% loaded.
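A sketch of the task-set generation described above (assuming NumPy; the clipping and rounding details are not specified in the paper and are illustrative assumptions):

```python
import numpy as np

def generate_taskset(n, U, rng=None):
    """Weights ~ N(U/2, 0.1), rescaled to sum exactly to U;
    periods ~ N(4000, 3500), clamped to at least one slot."""
    rng = rng or np.random.default_rng()
    w = np.abs(rng.normal(loc=U / 2, scale=0.1, size=n))  # keep weights positive
    w *= U / w.sum()                                      # cumulative weight = U
    p = np.maximum(rng.normal(4000, 3500, size=n), 1).round().astype(int)
    e = np.maximum(np.rint(w * p), 1).astype(int)         # ei = wti * pi (rounding
    return list(zip(e.tolist(), p.tolist()))              # perturbs weights slightly)
```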

During experimentation, no slack has been provided between the periods of two consecutive instances of a task. This has been done to keep the total load on the system constant throughout the schedule.


Table 1 Migrations: Sticky-ERfair vs. Basic-ERfair

m    WL      n = 25                     n = 50                     n = 100
             SERF    BERF    k          SERF    BERF    k          SERF    BERF    k

2    80%     0.126   0.793    6.300     0.133   0.793    5.961     0.183   1.077    5.888
     90%     0.129   0.800    6.203     0.145   0.848    5.920     0.186   1.081    5.814
     100%    0.429   0.845    1.970     0.447   0.862    1.883     0.618   1.134    1.835

4    80%     0.139   2.409   17.331     0.149   2.456   16.485     0.185   2.870   15.514
     90%     0.146   2.447   16.761     0.158   2.499   15.819     0.193   2.878   14.916
     100%    0.926   2.492    2.691     0.990   2.501    2.527     1.220   2.907    2.383

6    80%     0.152   4.122   27.120     0.161   4.192   26.039     0.188   4.674   24.865
     90%     0.164   4.149   25.301     0.174   4.265   24.516     0.201   4.682   23.298
     100%    1.369   4.227    3.088     1.458   4.283    2.938     1.691   4.729    2.797

8    80%     0.165   6.122   37.104     0.170   6.134   36.084     0.191   6.471   33.884
     90%     0.181   6.151   32.195     0.197   6.172   31.326     0.214   6.505   30.398
     100%    1.863   6.191    3.323     1.909   6.217    3.257     2.181   6.545    3.001

10   80%     0.172   7.782   45.247     0.179   7.785   43.497     0.197   8.257   41.917
     90%     0.202   7.821   38.722     0.213   7.865   36.926     0.239   8.322   34.824
     100%    2.113   8.012    3.792     2.377   8.051    3.387     2.587   8.420    3.255

n: task set size; m: total number of processors; WL: total workload percentage; SERF: migrations per time slot suffered by Sticky-ERfair; BERF: migrations per time slot suffered by Basic-ERfair; k: migration ratio (Basic-ERfair : Sticky-ERfair)

Each result has been generated by running both the Basic-ERfair and Sticky-ERfair algorithms on 100 different instances of each data set type and then taking the average of these hundred runs. The schedule length has been taken to be 100 000 time slots. In all the experiments presented in the paper, we have included a monitor routine to check the ERfairness of each task at each time slot. No ERfairness violations were observed at any instant within the entire schedule length.

5.2 Migration measurement results

The number of inter-processor task migrations suffered by both the Sticky-ERfair and Basic-ERfair algorithms has been measured by running them on 100 different instances of each data set type. We have presented the actual number of migrations suffered per time slot by Sticky-ERfair and have also computed the ratio k (called the migration ratio) of the number of migrations suffered by Basic-ERfair to the number of migrations suffered by Sticky-ERfair. Table 1 summarizes the results for different numbers of tasks, processors and system load values. Figure 2 contains the plots for task sets consisting of 100 tasks on different numbers of processors under varying workloads. Plots for different task set sizes and different numbers of processors on 90% loaded systems are presented in Fig. 3.

From Fig. 2, it may be observed that the migration ratio k decreases and the actual number of migrations per time slot increases as the system workload increases.


Fig. 2 Plots showing the number of migrations per time slot and the migration ratio k for 100 tasks on 2 to 10 processors under 80%, 90% and 100% loaded systems

These plots also reveal that the numbers of migrations per time slot for the two algorithms are close to each other in a fully loaded system. However, although the number of migrations per time slot for Basic-ERfair continues to grow at a roughly similar rate even at lower workloads (90% and 80%), the growth rate for Sticky-ERfair reduces drastically at lower system loads.


Fig. 3 Plots showing the number of migrations per time slot and the migration ratio k for 25, 50 and 100 tasks on 2 to 10 processors under 90% loaded systems

This is because, when the system is very heavily loaded, the deadline of postponement in Sticky-ERfair is very stringent for all tasks, forcing a lot of migrations. But as the workload reduces and the deadline of postponement becomes more relaxed, many tasks can be postponed to save migrations. Hence, migrations reduce drastically at lower workloads for Sticky-ERfair.


Table 2 Preemptions: Sticky-ERfair vs. Basic-ERfair

m    WL      n = 25                     n = 50                     n = 100
             SERF    BERF    R          SERF    BERF    R          SERF    BERF    R

2    80%     0.173   1.922   11.112     0.191   1.951   10.217     0.215   1.980    9.212
     90%     0.186   1.968   10.584     0.203   1.981    9.759     0.240   1.983    8.266
     100%    1.976   1.982    1.003     1.980   1.981    1.000     1.983   1.983    1.000

4    80%     0.187   3.793   20.283     0.197   3.827   19.426     0.225   3.848   17.103
     90%     0.199   3.819   19.190     0.219   3.840   17.534     0.291   3.856   13.251
     100%    3.847   3.866    1.005     3.859   3.863    1.001     3.863   3.864    1.000

6    80%     0.207   5.617   27.138     0.218   5.651   25.923     0.246   5.659   22.986
     90%     0.255   5.671   22.239     0.274   5.681   20.734     0.358   5.686   15.883
     100%    5.640   5.685    1.008     5.671   5.683    1.002     5.686   5.687    1.000

8    80%     0.222   7.553   34.027     0.245   7.578   30.934     0.289   7.586   26.251
     90%     0.304   7.567   24.893     0.372   7.578   20.372     0.468   7.586   16.211
     100%    7.489   7.587    1.013     7.556   7.587    1.004     7.587   7.587    1.000

10   80%     0.240   9.393   39.141     0.289   9.414   32.582     0.356   9.413   26.439
     90%     0.405   9.412   23.241     0.481   9.414   19.573     0.634   9.416   14.853
     100%    9.241   9.417    1.019     9.349   9.415    1.007     9.409   9.418    1.001

n: task set size; m: total number of processors; WL: total workload percentage; SERF: preemptions per time slot suffered by Sticky-ERfair; BERF: preemptions per time slot suffered by Basic-ERfair; R: preemption ratio (Basic-ERfair : Sticky-ERfair)

From Fig. 3, we observe that at 90% workload, the number of migrations for Sticky-ERfair increases slightly with the number of tasks, but the migration ratio k decreases. The figure also shows that for a given number of processors, the number of migrations for Basic-ERfair remains roughly the same irrespective of the task set size.

5.3 Preemption measurement results

We have determined the actual number of preemptions suffered by both Sticky-ERfair and Basic-ERfair and have also found the preemption ratio R, such that the number of preemptions suffered using Basic-ERfair is R times that suffered using Sticky-ERfair. As with the migration measurements, each result here is the averaged value obtained from executions on 100 instances of each data set type. Table 2 summarizes the preemption results for task sets consisting of 25, 50 and 100 tasks under 80%, 90% and 100% workloads on systems consisting of 2 to 10 processors. In Fig. 4, we present the plots for task sets consisting of 100 tasks on different numbers of processors under varying workloads. Figure 5 contains the plots for the number of preemptions suffered per time slot by the Sticky-ERfair algorithm as well as plots for the preemption ratio R for different task set sizes and different numbers of processors on 90% loaded systems.


Fig. 4 Plots showing the number of preemptions per time slot and the preemption ratio R for 100 tasks on 2 to 10 processors under 80%, 90% and 100% loaded systems

From Table 2 and Figs. 4 and 5, it may be observed that the numbers of preemptions for Sticky-ERfair and Basic-ERfair are almost the same when the system is fully loaded, as almost all subtasks are urgent at each time slot.


Fig. 5 Plots showing the number of preemptions per time slot and the preemption ratio R for 25, 50 and 100 tasks on 2 to 10 processors under 90% loaded systems

However, Sticky-ERfair incurs a lower number of migrations here (refer Fig. 2) by attempting to maximize the time for which a task executes on a given processor. Preemptions reduce drastically for Sticky-ERfair as the system workload reduces. For Basic-ERfair, preemptions grow linearly as the number of processors increases.


5.3.1 Sticky-ERfair vs. Basic-ERfair: time gain

The reduction in preemption overheads obtained by using Sticky-ERfair, when translated into the corresponding reduction in terms of actual time, directly gives a measure of the actual time gained through these reductions relative to the Basic-ERfair algorithm. However, the actual time gain also depends heavily on the overhead of a single preemption on a given system and on the size of the time slot. Realistic values for the preemption overhead may typically vary from below 1 µs in closely-coupled multi-core systems to more than 100 µs in loosely-coupled multiprocessor systems. The size of a time slot may vary from 500 µs to 10 ms (Srinivasan et al. 2003).

As an example of the obtainable time gain, let us consider a system of n = 100 tasks being scheduled on m = 8 processors, with the average overhead of a single preemption being 0.01 ms (this is an approximate mean value covering both preemptions that result in a migration and preemptions that do not) and a time slot size of 1 ms. The system is 80% loaded. From Table 2, it may be observed that such a system, when scheduled using Sticky-ERfair, suffers 0.29 preemptions per time slot, and the preemption ratio R is about 26 with respect to Basic-ERfair. So, after a period of, say, 100 s, Sticky-ERfair will incur (0.29 × 100 × 1000 ≈) 3 × 10⁴ preemptions while Basic-ERfair will incur about 7.5 × 10⁵ preemptions. The time gain after 100 s is given by: (7.5 × 10⁵ − 3 × 10⁴) × 0.01 ms ≈ 7.2 s (about 7%).
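The same back-of-the-envelope computation in code (values taken from Table 2; the 0.01 ms preemption cost and 1 ms slot size are the assumed figures stated above):

```python
slots        = 100 * 1000                    # 100 s of 1 ms time slots
sticky       = 0.29 * slots                  # ~2.9e4 preemptions (Sticky-ERfair)
basic        = 26 * sticky                   # R ~ 26  ->  ~7.5e5 (Basic-ERfair)
gain_seconds = (basic - sticky) * 0.01 / 1000
print(round(gain_seconds, 1))                # ~7.2 s, i.e. about 7% of 100 s
```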

This gain in time gives us spare processor bandwidth which may be useful in various scenarios. Examples include completion of tasks which misbehave at run-time by taking more time than they were stipulated to take, execution of non-real-time and aperiodic tasks on a best-effort basis along with the real-time periodic tasks, implementation of power management strategies like processor slowdown and processor shutdown, fault tolerance, etc.

6 Conclusions

We have presented a new multiprocessor ERfair scheduling algorithm that minimizes inter-processor task migrations and preemptions under a global scheduling scenario (a single priority queue stores all the ready tasks; at each scheduler invocation, the most appropriate m tasks are selected from this queue). Analytical results show that Sticky-ERfair achieves this reduction in overheads while simultaneously maintaining the same order of fairness and complexity as Basic-ERfair. Experimental results reveal that Sticky-ERfair can achieve more than a 40-fold reduction in both the number of migrations and the number of preemptions suffered with respect to Basic-ERfair (for a set of 25 to 100 tasks running on 2 to 10 processors and varying cache sizes) while simultaneously guaranteeing ERfairness at each time slot. However, some more tuning may be possible while implementing the scheduler on a particular multiprocessor system.

Acknowledgement We thank the reviewers for their comments and suggestions. Arnab Sarkar was supported by a Microsoft Research India Ph.D. Research Fellowship Award.


References

Anderson J, Bud V, Devi UC (2005) An EDF-based scheduling algorithm for multiprocessor soft real-time systems. In: Proceedings of the 17th Euromicro conference on real-time systems (ECRTS'05), Washington, DC, USA. IEEE Computer Society, Los Alamitos, pp 199–208

Anderson J, Srinivasan A (2000) Early-release fair scheduling. In: Proceedings of the 12th Euromicro conference on real-time systems, Jun 2000, pp 35–43

Anderson J, Srinivasan A (2004) Mixed Pfair/ERfair scheduling of asynchronous periodic tasks. J Comput Syst Sci 68(1):157–204

Andersson B, Bletsas K (2008) Sporadic multiprocessor scheduling with few preemptions. In: ECRTS '08: Proceedings of the 20th Euromicro conference on real-time systems, Prague, Jul 2008, pp 243–252

Andersson B, Jonsson J (2003) The utilization bounds of partitioned and pfair static-priority scheduling on multiprocessors are 50%. In: Proceedings of the 15th Euromicro conference on real-time systems, Jul 2003, pp 33–40

Andersson B, Tovar E (2006) Multiprocessor scheduling with few preemptions. In: Proceedings of the 12th IEEE international conference on embedded and real-time computing systems and applications, pp 322–334

Baruah S, Cohen N, Plaxton CG, Varvel D (1996) Proportionate progress: a notion of fairness in resource allocation. Algorithmica 15(6):600–625

Baruah S, Fisher N (2006) The partitioned multiprocessor scheduling of deadline-constrained sporadic task systems. IEEE Trans Comput 55(7):918–923

Baruah S, Gehrke J, Plaxton CG (1995) Fast scheduling of periodic tasks on multiple resources. In: Proceedings of the 9th international parallel processing symposium, Apr 1995, pp 280–288

Berg E, Hagersten E (2004) StatCache: a probabilistic approach to efficient and accurate data locality analysis. In: ISPASS '04: Proceedings of the 2004 IEEE international symposium on performance analysis of systems and software, Washington, DC, USA. IEEE Computer Society, Los Alamitos, pp 20–27

Block A, Anderson J (2006) Accuracy versus migration overhead in real-time multiprocessor reweighting algorithms. In: Proceedings of the 12th international conference on parallel and distributed systems, Washington, DC, USA. IEEE Computer Society, Los Alamitos, pp 355–364

Carpenter J, Funk S, Holman P, Srinivasan A, Anderson J, Baruah S (2004) A categorization of real-time multiprocessor scheduling problems and algorithms. In: Handbook of scheduling algorithms, methods and models

Cho H, Ravindran B, Jensen ED (2006) An optimal real-time scheduling algorithm for multiprocessors. In: RTSS '06: Proceedings of the 27th IEEE international real-time systems symposium, Washington, DC, USA. IEEE Computer Society, Los Alamitos, pp 101–110

Funaoka K, Kato S, Yamasaki N (2008) Work-conserving optimal real-time scheduling on multiprocessors. In: ECRTS '08: Proceedings of the 20th Euromicro conference on real-time systems, Prague, Jul 2008, pp 13–22

Funk S, Goossens J, Baruah S (2001) On-line scheduling on uniform multiprocessors. In: Proceedings of the 22nd IEEE real-time systems symposium, Dec 2001

Gupta A, Tucker A, Urushibara S (1991) The impact of operating system scheduling policies and synchronization methods on the performance of parallel applications. In: ACM SIGMETRICS conference on measurement and modeling of computer systems, pp 120–132

Harizopoulos S, Ailamaki A (2002) Affinity scheduling in staged server architectures, Mar 2002

Jeffay K, Goddard S (1999) A theory of rate-based execution. In: Proceedings of the 20th IEEE real-time systems symposium, pp 304–314

Jeffay K, Goddard S (2001) Rate-based resource allocation models for embedded systems. In: Lecture notes in computer science, vol 2211, p 204

Kimbrel T, Schieber B, Sviridenko M (2006) Minimizing migrations in fair multiprocessor scheduling of persistent tasks. J Sched 9(4):365–379

Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard-real-time environment. J ACM 20(1):46–61

Lopez JM, Garcia M, Diaz JL, Garcia DF (2000) Worst-case utilization bound for EDF scheduling on real-time multiprocessor systems. In: Proceedings of the 12th Euromicro conference on real-time systems, Jun 2000, pp 25–33

Malkevitch J (2004) Bin packing and machine scheduling. Feature column from the AMS: monthly essays on mathematical topics, Jun 2004

Negi HS, Mitra T, Roychoudhury A (2003) Accurate estimation of cache-related preemption delay. In: CODES+ISSS '03: Proceedings of the 1st IEEE/ACM/IFIP international conference on hardware/software codesign and system synthesis, pp 201–206

Phillips C, Stein C, Torng E, Wein J (1997) Optimal time-critical scheduling via resource augmentation. In: Proceedings of the twenty-ninth annual ACM symposium on theory of computing, pp 140–149

Squillante MS, Lazowska ED (1993) Using processor-cache affinity information in shared-memory multiprocessor scheduling. IEEE Trans Parallel Distrib Syst 4(2):131–143

Squillante MS, Nelson RD (1991) Analysis of task migration in shared-memory multiprocessor scheduling. In: ACM SIGMETRICS conference on measurement and modeling of computer systems, pp 143–155

Srinivasan A, Holman P, Anderson J (2003) The case for fair multiprocessor scheduling. In: Proceedings of the 11th international workshop on parallel and distributed real-time systems, Nice, France, Apr 2003

Torrellas J, Tucker A, Gupta A (1993) Benefits of cache-affinity scheduling in shared-memory multiprocessors: a summary. ACM SIGMETRICS Perform Eval Rev 21(1):272–274

Zhu D, Mosse D, Melhem R (2003) Multiple-resource periodic scheduling problem: how much fairness is necessary. In: Proceedings of the 24th IEEE real-time systems symposium, Dec 2003, p 142

Arnab Sarkar received the B.Sc. degree in Computer Science in 2000 and the B.Tech degree in Information Technology in 2003 from the University of Calcutta, Kolkata, India. He received the M.S. degree in Computer Science and Engineering at the Indian Institute of Technology (IIT), Kharagpur, India in 2006 and is currently pursuing his Ph.D. in the same institute. He received the National Doctoral Fellowship (NDF) from AICTE, Ministry of HRD, Govt. of India, in 2006 and the MSR India Ph.D. fellowship from Microsoft Research Lab India, in 2007. He is currently pursuing his research as a Microsoft Research Fellow. His current research interests include real-time scheduling, system software for embedded systems and computer architectures.

Sujoy Ghose received the B.Tech. degree in Electronics and Electrical Communication Engineering from the Indian Institute of Technology, Kharagpur, in 1976, the M.S. degree from Rutgers University, Piscataway, NJ, and the Ph.D. degree in computer science and engineering from the Indian Institute of Technology. He is currently a Professor in the Department of Computer Science and Engineering, Indian Institute of Technology. His research interests include design of algorithms, artificial intelligence, and computer networks.


P.P. Chakrabarti received the B.Tech and Ph.D. degrees in computer science and engineering from the Indian Institute of Technology (IIT), Kharagpur, in 1985 and 1988, respectively. He joined the Department of Computer Science and Engineering, IIT, as a faculty member in 1988 and is currently a professor in the Computer Science and Engineering Department, where he currently holds the position of Dean (Sponsored Research and Industrial Consultancy) and where he was the professor in charge of the state-of-the-art VLSI Design Laboratory. He has published more than 100 papers and collaborated with a number of world-class companies. His areas of interest include artificial intelligence, CAD for VLSI, and algorithm design. He received the President of India Gold Medal, the Swarnajayanti Fellowship, and the Shanti Swarup Bhatnagar Prize from the Government of India for his contributions.