CPU Scheduling


CS 519 -- Operating Systems -- Fall 2000

CPU Scheduling
CS 519: Operating System Theory
Computer Science, Rutgers University

Instructor: Thu D. Nguyen
TA: Xiaoyan Li
Spring 2002

What and Why?
- What is processor scheduling?
- Why?
  - At first, to share an expensive resource (multiprogramming)
  - Now, to perform concurrent tasks because the processor is so powerful
  - The future looks like the past plus the present:
    - Rent-a-computer approach: large data/processing centers use multiprogramming to maximize resource utilization
    - Systems are still powerful enough for each user to run multiple concurrent tasks

Assumptions
- Pool of jobs contending for the CPU
- CPU is a scarce resource
- Jobs are independent and compete for resources (this assumption is not always used)
- The scheduler mediates between jobs to optimize some performance criteria

What Do We Optimize?
- System-oriented metrics:
  - Processor utilization: percentage of time the processor is busy
  - Throughput: number of processes completed per unit of time
- User-oriented metrics:
  - Turnaround time: interval between submission and termination (including any waiting time); appropriate for batch jobs
  - Response time: for interactive jobs, time from the submission of a request until the response begins to be received
  - Deadlines: when process completion deadlines are specified, the percentage of deadlines met should be maximized

Design Space
- Two dimensions:
  - Selection function: which of the ready jobs should be run next?
  - Preemption:
    - Preemptive: the currently running job may be interrupted and moved to the Ready state
    - Non-preemptive: once a process is in the Running state, it continues to execute until it terminates or it blocks for I/O or a system service

Job Behavior
[Figure]
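The user-oriented metrics above can be sketched as small functions over per-job timestamps. This is a minimal illustration, not from the slides; the `Job` record and its field names are hypothetical.

```python
# Minimal sketch of the user-oriented scheduling metrics.
# Job and its field names are illustrative; times are in arbitrary units.
from dataclasses import dataclass

@dataclass
class Job:
    submit: float   # time the job was submitted
    start: float    # time its response/execution first began
    finish: float   # time it terminated

def turnaround_time(job):
    # interval between submission and termination, including waiting
    return job.finish - job.submit

def response_time(job):
    # time from submission until the response begins to be received
    return job.start - job.submit

def throughput(jobs, interval):
    # processes completed per unit of time
    return len(jobs) / interval
```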

Job Behavior
- I/O-bound jobs
  - Jobs that perform lots of I/O
  - Tend to have short CPU bursts
- CPU-bound jobs
  - Jobs that perform very little I/O
  - Tend to have very long CPU bursts
- The distribution tends to be hyper-exponential:
  - A very large number of very short CPU bursts
  - A small number of very long CPU bursts
[Figure: alternating CPU and disk bursts]

Histogram of CPU-burst Times
[Figure]

Example Job Set

  Process   Arrival Time   Service Time
     1            0              3
     2            2              6
     3            4              4
     4            6              5
     5            8              2

Behavior of Scheduling Policies
[Figure: timelines for the scheduling policies on the example job set]
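As a concrete baseline for the example job set above, non-preemptive FCFS can be simulated in a few lines. This is an illustrative sketch, not code from the slides.

```python
# Sketch: non-preemptive FCFS on the example job set.
def fcfs(jobs):
    """jobs: list of (arrival, service) tuples, sorted by arrival.
    Returns each job's finish time; turnaround = finish - arrival."""
    clock = 0
    finish = []
    for arrival, service in jobs:
        # wait for the job to arrive if the CPU is idle, then run it to completion
        clock = max(clock, arrival) + service
        finish.append(clock)
    return finish

jobs = [(0, 3), (2, 6), (4, 4), (6, 5), (8, 2)]
finish = fcfs(jobs)
turnaround = [f - arrival for f, (arrival, _) in zip(finish, jobs)]
# mean turnaround under FCFS: (3 + 7 + 9 + 12 + 12) / 5 = 8.6
```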

Behavior of Scheduling Policies (continued)
[Figure]

Multilevel Queue
- The ready queue is partitioned into separate queues:
  - Foreground (interactive)
  - Background (batch)
- Each queue has its own scheduling algorithm:
  - Foreground: RR
  - Background: FCFS
- Scheduling must also be done between the queues:
  - Fixed-priority scheduling, i.e., serve all from foreground, then from background. Possibility of starvation.
  - Time slicing: each queue gets a certain amount of CPU time which it can schedule amongst its processes, e.g.:
    - 80% to foreground in RR
    - 20% to background in FCFS

Multilevel Queue Scheduling
[Figure]

Multilevel Feedback Queue
- A process can move between the various queues; aging can be implemented this way.
- A multilevel-feedback-queue scheduler is defined by the following parameters:
  - Number of queues
  - Scheduling algorithm for each queue
  - Method used to determine when to upgrade a process
  - Method used to determine when to demote a process
  - Method used to determine which queue a process will enter when that process needs service

Multilevel Feedback Queues
[Figure]

Example of Multilevel Feedback Queue
- Three queues:
  - Q0: time quantum 8 milliseconds
  - Q1: time quantum 16 milliseconds
  - Q2: FCFS
- Scheduling:
  - A new job enters queue Q0, which is served FCFS. When it gains the CPU, the job receives 8 milliseconds. If it does not finish in 8 milliseconds, the job is moved to queue Q1.
  - At Q1 the job is again served FCFS and receives 16 additional milliseconds. If it still does not complete, it is preempted and moved to queue Q2.

Traditional UNIX Scheduling
- Multilevel feedback queues
- 128 possible priorities (0-127); a lower number means a higher priority
- One round-robin queue per priority
- At every scheduling event, the scheduler picks the lowest-priority non-empty queue and runs its jobs round-robin
- Scheduling events:
  - Clock interrupt
  - Process does a system call
  - Process gives up the CPU, e.g., to do I/O
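The queue-demotion rule in the three-queue feedback example above can be sketched for a single job. The function name is hypothetical; it ignores contention from other jobs and only traces which queue serves each portion of the burst.

```python
# Sketch of the three-queue feedback discipline for one job in isolation:
# 8 ms quantum in Q0, 16 ms in Q1, then FCFS (run to completion) in Q2.
def mlfq_trace(burst_ms):
    """Return a list of (queue, ms used there) until the job finishes."""
    trace = []
    for queue, quantum in (("Q0", 8), ("Q1", 16)):
        used = min(burst_ms, quantum)
        trace.append((queue, used))
        burst_ms -= used
        if burst_ms == 0:
            return trace            # finished within this queue's quantum
    trace.append(("Q2", burst_ms))  # remainder runs FCFS in Q2
    return trace
```

For example, a 30 ms burst is demoted twice: 8 ms in Q0, 16 ms in Q1, and the final 6 ms in Q2.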

Traditional UNIX Scheduling
- All processes are assigned a baseline priority based on their type and current execution status:
  - swapper: 0
  - waiting for disk: 20
  - waiting for lock: 35
  - user-mode execution: 50
- At scheduling events, all processes' priorities are adjusted based on the amount of CPU used, the current load, and how long the process has been waiting.
- Most processes are not running, so lots of computing shortcuts are used when computing new priorities.

UNIX Priority Calculation
- Every 4 clock ticks a process's priority is updated:

      priority = baseline + utilization / 4 + 2 * niceFactor

- The utilization is incremented by 1 on every clock tick that the process runs.
- The niceFactor allows some control of job priority. It can be set from -20 to 20.
- Jobs using a lot of CPU increase their priority value. Interactive jobs not using much CPU will return to the baseline.
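A minimal sketch of this update, assuming the standard 4.3BSD-style rule (priority = baseline + utilization/4 + 2·niceFactor, larger value = lower effective priority); the function and constant names are illustrative.

```python
# Sketch of the 4.3BSD-style priority recomputation described above.
# Recomputed every 4 clock ticks; a LARGER value means a WORSE priority.
USER_MODE_BASELINE = 50  # baseline for user-mode execution, per the slide

def unix_priority(utilization, nice_factor, baseline=USER_MODE_BASELINE):
    """utilization: clock ticks of CPU used; nice_factor: -20..20."""
    return baseline + utilization // 4 + 2 * nice_factor
```

A CPU-hungry process with 40 accumulated ticks climbs from 50 to 60, while nicing it to -10 pulls it back below an idle process at the baseline.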

UNIX Priority Calculation
- Very long-running CPU-bound jobs would get stuck at the highest (worst) priority value.
- A decay function is used to weight utilization toward recent CPU usage. A process's utilization at time t is decayed every second:

      utilization = (2 * load) / (2 * load + 1) * utilization + niceFactor

- The system-wide load is the average number of runnable jobs during the last 1 second.

UNIX Priority Decay
- One job on the CPU, so load will be 1. Assume niceFactor is 0. The per-second decay factor is then 2·1 / (2·1 + 1) = 2/3.
- Let u(N) be the utilization at time N:
  - +1 second: u(N+1) = (2/3) u(N)
  - +2 seconds: u(N+2) = (2/3)^2 u(N)
  - +k seconds: u(N+k) = (2/3)^k u(N)
- Past CPU usage thus decays geometrically: after 5 seconds only (2/3)^5, about 13%, of it still counts.
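The geometric decay is easy to check numerically. This sketch assumes the decay rule utilization = (2·load)/(2·load + 1)·utilization + niceFactor with load = 1 and niceFactor = 0, so each second multiplies utilization by 2/3.

```python
# Sketch: one second of utilization decay, load = 1, niceFactor = 0.
def decay(utilization, load=1, nice_factor=0):
    return (2 * load) / (2 * load + 1) * utilization + nice_factor

u = 90.0                 # illustrative starting utilization
for _ in range(5):       # five seconds of decay
    u = decay(u)
# after 5 seconds only (2/3)^5, roughly 13%, of the original remains
```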

Scheduling Algorithms
- FIFO is simple but leads to poor average response times: short processes are delayed by long processes that arrive before them.
- RR eliminates this problem, but favors CPU-bound jobs, which have longer CPU bursts than I/O-bound jobs.
- SJN, SRT, and HRRN alleviate the problem with FIFO, but require information on the length of each process. This information is not always available (although it can sometimes be approximated based on past history or user input).
- Feedback is a way of alleviating the problem with FIFO without information on process length.

It's a Changing World
- The assumption of a bi-modal workload no longer holds.
- Interactive continuous-media applications are sometimes processor-bound but require good response times.
- The new computing model requires more flexibility:
  - How to match priorities of cooperative jobs, such as client/server jobs?
  - How to balance execution between multiple threads of a single process?

Lottery Scheduling
- A randomized resource allocation mechanism.
- Resource rights are represented by lottery tickets.
- Scheduling proceeds in rounds of a lottery.
- In each round, the winning ticket (and therefore the winner) is chosen at random.
- Your chance of winning depends directly on the number of tickets that you hold:
  P[winning] = t/T, where t = your number of tickets and T = total number of tickets

Lottery Scheduling
- After n rounds, your expected number of wins is
  E[wins] = n * P[winning]
- The expected number of lotteries that a client must wait before its first win is
  E[wait] = 1 / P[winning]
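One lottery round is just a weighted random draw over ticket holdings. This is an illustrative sketch (function and variable names are hypothetical); with a 1:3 ticket split, client B should win about 75% of rounds.

```python
# Sketch: a single lottery round picks a ticket uniformly at random,
# so a client holding t of T tickets wins with probability t/T.
import random

def hold_lottery(tickets, rng):
    """tickets: dict mapping client -> ticket count. Returns the winner."""
    total = sum(tickets.values())
    draw = rng.randrange(total)          # winning ticket number in [0, total)
    for client, count in tickets.items():
        if draw < count:
            return client
        draw -= count

rng = random.Random(0)                   # fixed seed for a repeatable demo
tickets = {"A": 1, "B": 3}               # P[B winning] = 3/4
wins = {"A": 0, "B": 0}
for _ in range(10_000):
    wins[hold_lottery(tickets, rng)] += 1
# B's observed win fraction should be close to 0.75
```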

- Lottery scheduling implements proportional-share resource management.
- Ticket currencies allow isolation between users, processes, and threads.
- OK, so how do we actually schedule the processor using lottery scheduling?

Implementation
[Figure]

Performance

Allocated and observed execution ratios between two tasks running the Dhrystone benchmark. With the exception of the 10:1 allocation ratio, all observed ratios are close to the allocations.

Short-term Allocation Ratio
[Figure]

Isolation

Five tasks running the Dhrystone benchmark. Let amount.currency denote a ticket allocation of amount denominated in currency. Tasks A1 and A2 have allocations 100.A and 200.A, respectively. Tasks B1 and B2 have allocations 100.B and 200.B, respectively. Halfway through the experiment, B3 is started with allocation 300.B. This inflates the number of tickets in B from 300 to 600. There is no effect on tasks in currency A or on the aggregate iteration ratio of A tasks to B tasks. Tasks B1 and B2 slow to half their original rates, corresponding to the factor-of-2 inflation caused by B3.

Borrowed-Virtual-Time (BVT) Scheduling
- Current scheduling in general-purpose systems does not support rapid dispatch of latency-sensitive applications.
- Examples include continuous-media applications such as teleconferencing, movie playback, and voice-over-IP.
- What's the problem with the traditional Unix scheduler?
- The beauty of BVT is its simplicity.
  - Corollary: there is not that much to say.
- The tricky part is figuring out the appropriate parameters; much of the paper is about this (which I'm going to skip).

BVT Scheduling: Basic Idea
- Scheduling is done based on virtual time.
- Each thread has:
  - EVT (effective virtual time)
  - AVT (actual virtual time)
  - W (warp factor)
  - warpBack (whether warp is on or not)
- The EVT of a thread is computed as:

      EVT = AVT - (warpBack ? W : 0)

- Threads accumulate virtual time as they run.
- The thread with the earliest EVT is scheduled next.
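The dispatch decision above reduces to "take the minimum EVT, where warped threads subtract their warp." A minimal sketch, assuming EVT = AVT − (warpBack ? W : 0); the thread record and field names are hypothetical.

```python
# Sketch: picking the next thread under BVT.
from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    avt: float            # actual virtual time accumulated so far
    warp: float = 0.0     # W, the warp factor
    warped: bool = False  # warpBack flag

def evt(t):
    # effective virtual time: warped threads "borrow" virtual time
    return t.avt - (t.warp if t.warped else 0.0)

def pick_next(threads):
    # the thread with the earliest EVT runs next
    return min(threads, key=evt)

batch = Thread("batch", avt=100.0)
video = Thread("video", avt=105.0, warp=50.0, warped=True)  # latency-sensitive
```

Even though the video thread has consumed more actual virtual time (105 vs. 100), its warp pulls its EVT down to 55, so it is dispatched first; with warp off, the batch thread would win.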

BVT Scheduling: Details
- A switch is allowed only every C time units, to prevent thrashing.
- Threads can accumulate virtual time at different rates.
  - This allows weighted fair sharing of the CPU.
- To make sure that latency-sensitive threads are scheduled right away, give these threads high warp values.
- Limits on how much and for how long a thread can warp prevent abuse.

BVT Scheduling: Performance
[Figure]


BVT vs. Lottery
- How do the two compare?

Parallel Processor Scheduling

Simulating Ocean Currents
- Model the ocean as two-dimensional grids.
- Discretize in space and time.
  - Finer spatial and temporal resolution => greater accuracy.
- Many different computations per time step.
  - Set up and solve equations.
- Concurrency across and within grid computations.
[Figure: (a) cross sections; (b) spatial discretization of a cross section]

Case Study 2: Simulating Galaxy Evolution
- Simulate the interactions of many stars evolving over time.
- Computing forces is expensive:
  - O(n^2) brute-force approach.
- Hierarchical methods take advantage of the force law:

      F = G * m1 * m2 / r^2

Case Study 2: Barnes-Hut
- Many time steps; plenty of concurrency across stars within each.
- Locality goal: particles close together in space should be on the same processor.
- Difficulties: nonuniform, dynamically changing distribution.
[Figure: spatial domain and corresponding quad-tree]

Case Study 3: Rendering Scenes by Ray Tracing
- Shoot rays into the scene through pixels in the projection plane.
- The result is a color for each pixel.
- Rays shot through pixels in the projection plane are called primary rays.
- Rays reflect and refract when they hit objects.
  - This recursive process generates a ray tree per primary ray.
- Tradeoffs between execution time and image quality.
[Figure: viewpoint, projection plane, 3D scene, primary and dynamically generated rays]

Partitioning
- Need dynamic assignment.
- Use contiguous blocks to exploit spatial coherence among neighboring rays, plus tiles for task stealing.
  - A block is the unit of assignment.
  - A tile is the unit of decomposition and stealing.

Sample Speedups
[Figure]
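The O(n^2) brute-force force computation that the hierarchical methods in the galaxy case study avoid can be sketched in one dimension with G = 1; all names here are illustrative, not from the slides.

```python
# Sketch: brute-force pairwise gravity, the O(n^2) baseline that
# hierarchical methods like Barnes-Hut approximate.
def pairwise_force(m1, m2, r, G=1.0):
    # magnitude of the attraction: G * m1 * m2 / r^2
    return G * m1 * m2 / r**2

def total_forces(masses, positions, G=1.0):
    """Net signed 1-D force on each body from every other body."""
    n = len(masses)
    forces = [0.0] * n
    for i in range(n):                      # O(n^2) pair loop
        for j in range(n):
            if i == j:
                continue
            r = positions[j] - positions[i]
            sign = 1.0 if r > 0 else -1.0   # attraction pulls toward the other body
            forces[i] += sign * pairwise_force(masses[i], masses[j], abs(r), G)
    return forces
```

Two unit masses two units apart attract each other with force 1/4, equal and opposite, as Newton's third law requires.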

Coscheduling (Gang)
- Cooperating processes may interact frequently.
  - What problem does this lead to?
- Fine-grained parallel applications have a process working set.
- Two things are needed:
  - Identify the process working set.
  - Coschedule it.
- Assumption: the process working set is explicitly identified.
  - Some good recent work has shown that it may be possible to identify the process working set dynamically.

Coscheduling
- What is coscheduling?
  - Coordinating across nodes to make sure that processes belonging to the same process working set are scheduled simultaneously.
- How might we do this?

Impact of OS Scheduling Policies and Synchronization on Performance
- Consider performance for a set of applications under:
  - Feedback priority scheduling with:
    - Spinning
    - Blocking
    - Spin-and-block
    - Block-and-hand-off
    - Block-and-affinity
  - Gang scheduling (time-sharing coscheduling)
  - Process control (space-sharing coscheduling)

Applications
[Figure]

Normal Scheduling with Spinning
[Figure]

Normal Scheduling with Blocking Locks
[Figure]

Gang Scheduling
[Figure]

Process Control (Space-Sharing)
[Figure]

Multiprocessor Scheduling
- Load sharing: simple; good processor utilization; but poor locality and poor synchronization behavior. Affinity or per-processor queues can improve locality.
- Gang scheduling: central control; fragmentation, i.e., unnecessary processor idle time (e.g., two applications with P/2 + 1 threads each); good synchronization behavior; if careful, good locality.
- Hardware partitions: poor utilization for I/O-intensive applications; fragmentation, i.e., unnecessary processor idle time when the partitions left over are small; good locality and synchronization behavior.