Scheduling and Policing 2007. Outline What is scheduling? Why we need it? What is scheduling? Why we need it? Requirements of a scheduling discipline

Scheduling and Policing

2007

Outline What is scheduling? Why we need it?What is scheduling? Why we need it? Requirements of a scheduling Requirements of a scheduling

disciplinediscipline Fundamental choicesFundamental choices Scheduling best effort connectionsScheduling best effort connections Scheduling guaranteed-service Scheduling guaranteed-service

connectionsconnections Packet drop strategiesPacket drop strategies PolicingPolicing

Scheduling SharingSharing always results in always results in contentioncontention A A scheduling disciplinescheduling discipline resolves resolves

contention: who’s next?contention: who’s next? Problem: given N packet streams Problem: given N packet streams

contending for the same channel, how contending for the same channel, how to schedule pkt transmissions?to schedule pkt transmissions?

Key to Key to fairly fairly sharing resourcessharing resources and and providing performance providing performance guaranteesguarantees - - divide apple piedivide apple pie

Components A scheduling discipline does two things:A scheduling discipline does two things:

decides decides service orderservice order manages manages queuequeue of service requests of service requests

-- loss when Q is full-- loss when Q is full Example: web serverExample: web server

consider queries awaiting web serverconsider queries awaiting web server scheduling discipline decides service scheduling discipline decides service

orderorder and also if some query should be ignoredand also if some query should be ignored

Where? AnywhereAnywhere where contention may occur where contention may occur At every layer of protocol stackAt every layer of protocol stack Usually studied at network layer, at Usually studied at network layer, at output output

queuesqueues of switches of switches

Why do we need one? Because future applications need itBecause future applications need it We expect two types of future applicationsWe expect two types of future applications

best-effortbest-effort (adaptive, non-real time) (adaptive, non-real time) elasticelastice.g. email, some types of file transfere.g. email, some types of file transfer

guaranteed serviceguaranteed service (non-adaptive, real (non-adaptive, real time)time)e.g. packet voice, interactive video, e.g. packet voice, interactive video,

stock quotes stock quotes -- 64 kbps, 150msRTT -- 64 kbps, 150msRTT

Why and where scheduling?

What can scheduling disciplines do? Give different users different qualities of serviceGive different users different qualities of service Example of passengers waiting to board a planeExample of passengers waiting to board a plane

early boarders spend less time waitingearly boarders spend less time waiting bumped off passengers are ‘lost’!bumped off passengers are ‘lost’!

Scheduling disciplines can allocateScheduling disciplines can allocate bandwidthbandwidth delaydelay lossloss

They also determine how They also determine how fairfair the network isthe network is

The Conservation Law Traffic class/Connection Traffic class/Connection i: i: (KL_Q_v2_117) (KL_Q_v2_117) λλi i xxii ρρii = =λλi i xxi i qqi i -- mean waiting time-- mean waiting time

Example 9.2 Example 9.2 155 Mbps ATM -- two virtual 155 Mbps ATM -- two virtual circuits A: 10 Mbps, 0.5 circuits A: 10 Mbps, 0.5 ms --> 0.1 ms ms --> 0.1 ms B: 25 Mbps, 0.5 ms --> mean Q delay D = ? B: 25 Mbps, 0.5 ms --> mean Q delay D = ? 10/155*0.5+25/155*0.5 = 10/155*0.1+25/155*D 10/155*0.5+25/155*0.5 = 10/155*0.1+25/155*D D =0.66 ms D =0.66 ms

Constant 1

N

iiiq



connectionsconnections Packet drop strategiesPacket drop strategies

Requirements easy to implementeasy to implement fair (for best effort)fair (for best effort)

protected against abusive sourcesprotected against abusive sources provides performance bounds provides performance bounds

(for guaranteed service) (for guaranteed service) allows easy allows easy admission controladmission control

decisions (for guaranteed service)decisions (for guaranteed service) to decide whether a new flow can be to decide whether a new flow can be

allowedallowed

Requirement 1. Ease of implementation Scheduling discipline has to make a decision Scheduling discipline has to make a decision

once every once every few nsfew ns ! ! 40 B-packet / 40 Gbps --> 8 40 B-packet / 40 Gbps --> 8 nsns

Should be implementable in a few instructions Should be implementable in a few instructions or hardwareor hardware for hardware: critical constraint is VLSI for hardware: critical constraint is VLSI spacespace

Work per packet should scale less than linearly Work per packet should scale less than linearly with number of active connections with number of active connections Scheduling Scheduling overheadoverhead - independent of - independent of

number of active connections number of active connections

Requirement 2. Fairness Scheduling discipline Scheduling discipline allocatesallocates a a resourceresource An allocation is fair if it satisfiesAn allocation is fair if it satisfies max-min fairnessmax-min fairness

resource is equally shared resource is equally shared each connection getseach connection gets no more than what it wants no more than what it wants the excess, if any, is equally sharedthe excess, if any, is equally shared

The minimum of the flows should be as large as The minimum of the flows should be as large as possiblepossible

A B C A B C

Transfer half of excess

Unsatisfied demand

Fairness (cont.) Definition: max-min fair allocates Definition: max-min fair allocates

-- it maximizes the minimum share of a -- it maximizes the minimum share of a resource whose demand is not filly satisfiedresource whose demand is not filly satisfied Resources are allocated in order of increasing demandResources are allocated in order of increasing demand No resource gets a resource share large than its No resource gets a resource share large than its

demanddemand Sources with unsatisfied demands get an equally shareSources with unsatisfied demands get an equally share

K-ex 9.3K-ex 9.3 resource capacity -- 10resource capacity -- 10 4 connection demand -- 2, 2.6, 4, 54 connection demand -- 2, 2.6, 4, 5 allocate 2, 2.6, 2.7, 2.7allocate 2, 2.6, 2.7, 2.7 10/4=2.5, 10/4=2.5, 2.5-2=0.5, 0.5/3=0.166,2.5-2=0.5, 0.5/3=0.166, 2.5+0.166-2.6=0.066, 0.06/2=0.0332.5+0.166-2.6=0.066, 0.06/2=0.033

Fairness (cont.) Fairness is Fairness is intuitively intuitively a good idea a good idea But it also provides But it also provides protectionprotection

traffic hogs cannot overrun otherstraffic hogs cannot overrun others automatically builds automatically builds firewallsfirewalls around around

heavy usersheavy users Fairness is a Fairness is a global global objective, but scheduling objective, but scheduling

isis local local flow is a flow is a global global objective, switch is local objective, switch is local Each endpoint must restrict its flow to the Each endpoint must restrict its flow to the

smallest fair allocation along its pathsmallest fair allocation along its path Dynamics + delayDynamics + delay => global fairness may => global fairness may

never be achievednever be achieved

Requirement 3. Performance bounds

What is it?What is it? A way to obtain a desired level of serviceA way to obtain a desired level of service

Can be Can be deterministic deterministic or or statisticalstatistical Common parameters areCommon parameters are

bandwidthbandwidth delaydelay delay-jitter delay-jitter lossloss




Fundamental choices1. 1. Number of priority levelsNumber of priority levels

2. Work-conserving vs. 2. Work-conserving vs.

non-work-conservingnon-work-conserving

3. Degree of aggregation3. Degree of aggregation

4. Service order within a level4. Service order within a level

Choices: 1. Priority Packet is served from a given priority level Packet is served from a given priority level

only if no packets exist at higher levels only if no packets exist at higher levels ((multilevel priority with exhaustive servicemultilevel priority with exhaustive service))

Highest level gets lowest delayHighest level gets lowest delay Watch out for starvation!Watch out for starvation! Usually map priority levels to delay classesUsually map priority levels to delay classes

Low bandwidth urgent messagesLow bandwidth urgent messages

RealtimeRealtime

Non-realtime Non-realtime

Priority

Choices: 2. Work conserving vs. non-work- conserving Work conserving discipline is Work conserving discipline is never idlenever idle when when

packets await servicepackets await service Why non-work conserving? -> less burstWhy non-work conserving? -> less burst

A - smooth A - burstA - smooth A - burst

B -burst B - burstB -burst B - burst

Non-work-conserving disciplines (Regulators)

Key conceptual idea: Key conceptual idea: delay packetdelay packet till till eligibleeligible Reduces delay-jitterReduces delay-jitter => less => less burstburst , , fewer fewer

buffersbuffers, less , less lossloss in network in network How to choose eligibility time?How to choose eligibility time?

rate-jitter regulatorrate-jitter regulatorbounds maximum outgoing ratebounds maximum outgoing rate

delay-jitter regulatordelay-jitter regulatorcompensates for variable delay at compensates for variable delay at

previous hopprevious hop

Do we need non-work-conservation? Can remove delay-jitter at an endpoint insteadCan remove delay-jitter at an endpoint instead

but also reduces size of switch buffers…but also reduces size of switch buffers… Increases Increases mean delaymean delay

not a problem for not a problem for playbackplayback applications applications Wastes Wastes bandwidthbandwidth

can serve best-effort packets insteadcan serve best-effort packets instead Always punishes a misbehaving sourceAlways punishes a misbehaving source

can’t have it both ways ?can’t have it both ways ? Bottom line: not too bad, implementation cost Bottom line: not too bad, implementation cost

may be the biggest problem may be the biggest problem - calendar Q - K-p228 - calendar Q - K-p228

Outline What is schedulingWhat is scheduling Why we need itWhy we need it Requirements of a scheduling Requirements of a scheduling



Scheduling best-effort connections

Main requirement is Main requirement is fairnessfairness Fairness GoalsFairness Goals

Allocate resources fairly Allocate resources fairly Isolate ill-behaved usersIsolate ill-behaved users

Router does not send explicit feedback to Router does not send explicit feedback to sourcesource

Still needs e2e congestion controlStill needs e2e congestion control Still achieve statistical muxingStill achieve statistical muxing

One flow can fill entire pipe if no contendersOne flow can fill entire pipe if no contenders Work conserving Work conserving scheduler never idles scheduler never idles

link if it has a packetlink if it has a packet

Generalized Processor Sharing (GPS) Main requirement is Main requirement is fairness fairness (max-min fair (max-min fair

allocates)allocates) Achievable using Achievable using Generalized Processor Generalized Processor

Sharing Sharing (GPS)(GPS) Visit each non-empty queue in turnVisit each non-empty queue in turn Serve Serve infinitesimal infinitesimal from eachfrom each Why is this fair?Why is this fair? How can we give weights to connections?How can we give weights to connections?

More on GPS GPS is GPS is ideal ideal and unimplementable!and unimplementable!

we cannot serve infinitesimal, only packetswe cannot serve infinitesimal, only packets No packet discipline can be as fair as GPSNo packet discipline can be as fair as GPS

while a packet is being served, we are while a packet is being served, we are unfair to othersunfair to others

Degree of unfairness can be boundedDegree of unfairness can be bounded Define: Define: workwork((I,a,bI,a,b)) = # bits transmitted for = # bits transmitted for

connection I in time [a,b]connection I in time [a,b] AbsoluteAbsolute fairness bound for discipline S fairness bound for discipline S

Max (work_GPS(I,a,b) - work_S(I, a,b))Max (work_GPS(I,a,b) - work_S(I, a,b)) RelativeRelative fairness bound for discipline S fairness bound for discipline S

Max (work_S(I,a,b) - work_S(J,a,b))Max (work_S(I,a,b) - work_S(J,a,b))

Weighted Fair Queueing (WFQ) GPS is fairest disciplineGPS is fairest discipline Find the Find the finish timefinish time of a packet, of a packet, had we been had we been

doing GPSdoing GPS Then serve packets in order of their finish Then serve packets in order of their finish

timestimes Deals better with Deals better with variable sizevariable size packets and packets and

weightsweights

Weighted round robinWeighted round robin UnfairUnfair if packets are of different length or if packets are of different length or

weights are not equalweights are not equal

WFQ: first cut Suppose, in each Suppose, in each round, round, the server served the server served

one bitone bit from each active connection from each active connection Round numberRound number is the number of rounds is the number of rounds

already completed. It can bealready completed. It can be fractional fractional If a packet of length If a packet of length p p arrives to an empty arrives to an empty

queue when the round number is queue when the round number is RR, it will , it will complete service when the round numbercomplete service when the round number

is is R + p => R + p => finish numberfinish number is is R + pR + p independent of the number of other connections!independent of the number of other connections!

If a packet arrives to a non-empty queue, and If a packet arrives to a non-empty queue, and the previous packet has a finish number of the previous packet has a finish number of ff, , then the packet’s finish number is then the packet’s finish number is f + pf + p

Serve packets in order of finish numbers (Serve packets in order of finish numbers (≠ ≠ finish timefinish time))

WFQ continued To sum up, assuming we know the current To sum up, assuming we know the current

round number round number RR Finish number of packet of length Finish number of packet of length pp

if arriving to active connection = previous if arriving to active connection = previous finish number finish number ff + + pp

if arriving to an inactive connection = if arriving to an inactive connection = RR + + pp (How should we deal with weights?)(How should we deal with weights?) To implement, we need to know two things:To implement, we need to know two things:

is connection active?is connection active? if not, what is the current round number?if not, what is the current round number?

Answer to both questions depends on Answer to both questions depends on computing the current round number (why?)computing the current round number (why?)

WFQ: computing the round number Naively: round number = number of rounds Naively: round number = number of rounds

of service completed so farof service completed so far what if a server has not served all what if a server has not served all

connections in a round?connections in a round? what if new conversations join in halfway what if new conversations join in halfway

through a round?through a round? RedefineRedefine round number round number as a as a realreal-valued -valued

variable that increases at a rate variable that increases at a rate inverselyinversely proportional to the number of currently proportional to the number of currently activeactive connections connections this takes care of both problems (why?)this takes care of both problems (why?)

With this change, WFQ emulates GPS With this change, WFQ emulates GPS instead of bit-by-bit RRinstead of bit-by-bit RR

WFQ: computing the round number + K - Ex 9.10 K - Ex 9.10 arrival: arrival: A: (size=10,t=0) (20,40), B: (20,0), C: (20,0)A: (size=10,t=0) (20,40), B: (20,0), C: (20,0)

service rate: 1 unit/s, equally service rate: 1 unit/s, equally weightedweighted

0

5

10

15

20

25

30

35

40

0 10 20 30 40 50 60 70 80

Ti me

Round number

A

B,C

Roundnumber

Problem: iterated deletion

Ex:Ex: Keshav p242 Keshav p242 t= act connections rate round numt= act connections rate round num 0 5 1/5 00 5 1/5 0 5 1 --> 1.05 --> 5 1 --> 1.05 -->

1.05+1.05+ 4 a depart -> 4 ---> 1/4 --------------^4 a depart -> 4 ---> 1/4 --------------^ ,-------------------------------’,-------------------------------’ 4.9 b depart -> 3 ---> 1/3 --------------------------^4.9 b depart -> 3 ---> 1/3 --------------------------^ A sever recomputes round number on each A sever recomputes round number on each

packet arrival/deptpacket arrival/dept

Need Need improveimprove WFQ WFQ

Round number # active conversations - at arrival/departure

Problem: iterated deletion (cont)

TrickTrick use previous count to compute round use previous count to compute round

numbernumber if this makes some conversation inactive, if this makes some conversation inactive,

recomputerecompute repeat until no conversations become repeat until no conversations become

inactiveinactive Complex computation at arrival/departureComplex computation at arrival/departure

problem in high speed networksproblem in high speed networks overcoming iterated deletion problem,overcoming iterated deletion problem,

e.g. Self-clocked fair queuing (SCFQ) e.g. Self-clocked fair queuing (SCFQ)

Round number # active conversations - at arrival/departure

Evaluation ProsPros

deals better with variable size packets and deals better with variable size packets and weightsweights

like GPS, it provides protectionlike GPS, it provides protection can obtain worst-case can obtain worst-case end-to-end delay boundend-to-end delay bound gives users incentive to use intelligent flow gives users incentive to use intelligent flow

control (& provides rate information implicitly)control (& provides rate information implicitly) ConsCons

needs per-connection stateneeds per-connection state iterated deletion is complicatediterated deletion is complicated requires a priority queuerequires a priority queue

Variants of WFQVariants of WFQ - implementing easily - implementing easily e.g. Self-clocked fair queuing (SCFQ), e.g. Self-clocked fair queuing (SCFQ), start-time fair queuingstart-time fair queuing

Since 1996, WFQ in Cisco, FORE, ..Since 1996, WFQ in Cisco, FORE, ..

What does “fairness” divide between?

At what granularity?At what granularity? Flows, connections, domains?Flows, connections, domains?

What if users have different RTTs/links/etc.What if users have different RTTs/links/etc. Should it share a link fairly or be TCP fair?Should it share a link fairly or be TCP fair?

Basically a tough question to answer – Basically a tough question to answer – typically design mechanisms instead of typically design mechanisms instead of policypolicy User = arbitrary granularityUser = arbitrary granularity Paper has a nice argument for (src, dst) Paper has a nice argument for (src, dst)

pairspairs




Scheduling guaranteed-service connections

With best-effort connections, goal is With best-effort connections, goal is fairnessfairness

With guaranteed-service connectionsWith guaranteed-service connections what performance guarantees are what performance guarantees are

achievable?achievable? how easy is admission control?how easy is admission control?

We now study some scheduling We now study some scheduling disciplines that provide performance disciplines that provide performance guaranteesguarantees

WFQ WFQ also provides performance guaranteesWFQ also provides performance guarantees Bandwidth boundBandwidth bound

ratio of weights * link capacityratio of weights * link capacity e.g. connections with weights 1, 2, 7; link e.g. connections with weights 1, 2, 7; link

capacity 10capacity 10 connections get at least 1, 2, 7 units of b/w connections get at least 1, 2, 7 units of b/w

eacheach End-to-end delay boundEnd-to-end delay bound

assumes that the connection doesn’t send assumes that the connection doesn’t send ‘too much’ more ‘too much’ more

precisely, connection should beprecisely, connection should be leaky-leaky-bucketbucket regulatedregulated

Parekh-Gallager theoremParekh-Gallager theorem

Parekh-Gallager theorem Let it be Let it be leaky-bucket leaky-bucket regulated such that # regulated such that #

bits sent in time [tbits sent in time [t11, t, t22] <= ρ(t] <= ρ(t22 - t - t11) + σ ) + σ Let a connection be allocated weights at each Let a connection be allocated weights at each

WFQ scheduler along its path, so that the WFQ scheduler along its path, so that the least bandwidth it is allocated is least bandwidth it is allocated is g -- gg -- g<<gg((kk)) gg((kk) – assigned service rate) – assigned service rate

Let the connection pass through Let the connection pass through KK schedulers, where the schedulers, where the kkth scheduler has a th scheduler has a link rate link rate r(k)r(k)

Let largest packet allowed in the network be Let largest packet allowed in the network be PP

1

1 1

)(/)(//__K

k

K

k

krPkgPgdelayendend

Significance Theorem shows that WFQ can provide end-to-Theorem shows that WFQ can provide end-to-

end delay boundsend delay bounds So WFQ provides both fairness and So WFQ provides both fairness and

performance guaranteesperformance guarantees Bound holds regardless of cross traffic Bound holds regardless of cross traffic

behavior behavior Can be generalized for networks where Can be generalized for networks where

schedulers are variants of WFQ, and the link schedulers are variants of WFQ, and the link service rate changes over timeservice rate changes over time

Problems WFQ couples delay and bandwidth allocationsWFQ couples delay and bandwidth allocations

low delay requires allocating more low delay requires allocating more bandwidthbandwidth

wastes bandwidth for low-bandwidth low-wastes bandwidth for low-bandwidth low-delay sourcesdelay sources

gg can be very large, in some cases can be very large, in some cases 80 times80 times the peak rate!the peak rate!

Sources must be Sources must be leaky-bucketleaky-bucket regulated regulated but choosing leaky-bucket parameters is but choosing leaky-bucket parameters is

problematicproblematic

Delay- EDD (Earliest Due Date) EDD: packet with earliest deadline selectedEDD: packet with earliest deadline selected Delay-EDD prescribes how to assign deadlines Delay-EDD prescribes how to assign deadlines

to packetsto packets Deadline = expected arrival time + delay boundDeadline = expected arrival time + delay bound If a source sends faster than contract, delay bound If a source sends faster than contract, delay bound

will not applywill not apply A source is required to send slower than its A source is required to send slower than its

peak ratepeak rate Bandwidth at scheduler reserved at peak rate Bandwidth at scheduler reserved at peak rate Each packet gets a hard delay boundEach packet gets a hard delay bound

Delay bound is Delay bound is independentindependent of bandwidth of bandwidth requirementrequirement but reservation is at a connection’s peak ratebut reservation is at a connection’s peak rate

Implementation requires per-connection state Implementation requires per-connection state and a priority queueand a priority queue

Rate-controlled scheduling A A classclass of disciplines of disciplines

two components: regulator and schedulertwo components: regulator and scheduler incoming packets are placed in regulator incoming packets are placed in regulator

where they wait to become eligiblewhere they wait to become eligible then they are put in the schedulerthen they are put in the scheduler

RegulatorRegulator shapesshapes the traffic, scheduler the traffic, scheduler provides performance guaranteesprovides performance guarantees

Summary Two sorts of applications: best effort and Two sorts of applications: best effort and

guaranteed serviceguaranteed service Best effort connections require fair serviceBest effort connections require fair service

provided by GPS, which is unimplementableprovided by GPS, which is unimplementable emulated by WFQ and its cheaper variantsemulated by WFQ and its cheaper variants

Guaranteed service connections require Guaranteed service connections require performance guarantees performance guarantees provided by WFQ, but this is expensive provided by WFQ, but this is expensive

(low delay -> large b/w) (low delay -> large b/w) may be better to use rate-controlled may be better to use rate-controlled

schedulersschedulers K-p253 TableK-p253 Table




Packet dropping

Packets that cannot be served Packets that cannot be served immediately are bufferedimmediately are buffered

Full buffers => Full buffers => packet drop strategypacket drop strategy Packet losses happen almost always Packet losses happen almost always

from best-effort connections from best-effort connections (why?)(why?)

Shouldn’t drop packets unless Shouldn’t drop packets unless imperativeimperative packet drop wastes resources packet drop wastes resources (why?)(why?)

Classification of drop strategies

1. 1. Degree of aggregationDegree of aggregation

2. Drop priorities2. Drop priorities

3. Early or late3. Early or late

4. Drop position4. Drop position

1. Degree of aggregation (classify packets )

Degree of discrimination in selecting a packet Degree of discrimination in selecting a packet to dropto drop E.g. in vanilla FIFO, all packets are in the E.g. in vanilla FIFO, all packets are in the

same classsame class Instead, can classify packets and drop Instead, can classify packets and drop

packets selectively packets selectively The finer the classification the better the The finer the classification the better the

protection protection Max-min fair allocation of buffers to classesMax-min fair allocation of buffers to classes

share buffershare buffer drop packet from class with the longest drop packet from class with the longest

queue queue (why?)(why?)

2. Drop priorities Drop lower-priority packets first e.g. divide video Drop lower-priority packets first e.g. divide video

streamstream How to choose?How to choose?

endpoint marks packetsendpoint marks packets regulator marks packets regulator marks packets congestion loss priority (CLP-ATM) bit in packet congestion loss priority (CLP-ATM) bit in packet

headerheader

3. Early vs. late drop: RED

RED (RED (RandomRandom early detection) makes three early detection) makes three improvementsimprovements

Metric is moving average of queue lengthsMetric is moving average of queue lengths small bursts pass through unharmedsmall bursts pass through unharmed only affects sustained overloadsonly affects sustained overloads

Packet Packet dropdrop probability is a linear function of probability is a linear function of mean queue lengthmean queue length prevents severe reaction to mild overloadprevents severe reaction to mild overload

Can Can markmark packets instead of dropping them packets instead of dropping them allows sources to detect network state allows sources to detect network state

without losseswithout losses

Weighted average queue length

AvgLen = (1-Weight)AvgLen + (Weight)SampleLenAvgLen = (1-Weight)AvgLen + (Weight)SampleLen Filter out short-term changes in queue lengthFilter out short-term changes in queue length

Queue length

Instantaneous

A verage

T ime

Drop probability function

queue drop with P drop queue drop with P drop

packet packet packetpacket packet packet

P(drop)

1.0

MaxP

MinThresh MaxThresh

A vgLen

RED benefits RED improves performance of a network of RED improves performance of a network of

cooperating TCP sourcescooperating TCP sources No bias against bursty sources - good No bias against bursty sources - good

for TCP new openfor TCP new open Controls queue length regardless of Controls queue length regardless of

endpoint cooperationendpoint cooperation RED only works if senders obey packet loss RED only works if senders obey packet loss

as congestion signal and back offas congestion signal and back off RED improves fairnessRED improves fairness

randomrandom

4. Drop position Can drop a packet from head, tail, or random position Can drop a packet from head, tail, or random position

in the queuein the queue TailTail

easyeasy default approachdefault approach

HeadHead harder to implementharder to implement lets source detect loss earlierlets source detect loss earlier



connectionsconnections Packet drop strategiesPacket drop strategies Core-Stateless Fair QueuingCore-Stateless Fair Queuing Policing (Regulating, Shaping )Policing (Regulating, Shaping )

Fair Queuing Tradeoffs

FQ can control congestion by monitoring flowsFQ can control congestion by monitoring flows Non-adaptive flows can still be a problem – why?Non-adaptive flows can still be a problem – why?

Complex stateComplex state Must keep queue per flowMust keep queue per flow

Hard in routers with many flows (e.g., Hard in routers with many flows (e.g., backbone routers)backbone routers)

Flow aggregation is a possibility (e.g. do Flow aggregation is a possibility (e.g. do fairness per domain)fairness per domain)

Complex computationComplex computation Classification into flows may be hardClassification into flows may be hard Must keep queues sorted by finish timesMust keep queues sorted by finish times dR/dt changes whenever the flow count changesdR/dt changes whenever the flow count changes

Core-Stateless Fair Queuing Key problem with FQ is Key problem with FQ is core routerscore routers

core – an Internet AS backbonecore – an Internet AS backbone Must maintain state for many (50-100k!) flowsMust maintain state for many (50-100k!) flows Must update state at Gbps line speedsMust update state at Gbps line speeds

CSFQ (Core-Stateless FQ) objectivesCSFQ (Core-Stateless FQ) objectives Edge routersEdge routers should do should do complex complex tasks since they tasks since they

have fewer flows (1000s)have fewer flows (1000s) Core routersCore routers can do can do simplesimple tasks tasks

No per-flow state/processing No per-flow state/processing this means this means that core routers can only decide on dropping that core routers can only decide on dropping packets not on order of processingpackets not on order of processing

Can only provide max-min bandwidth fairness Can only provide max-min bandwidth fairness not delay allocationnot delay allocation

Key ideas

DPS (Dynamic Packet State)DPS (Dynamic Packet State) Edges estimate Edges estimate arrival ratearrival rate for each flow for each flow

(per-flow state)(per-flow state) Core routers useCore routers use

Estimated arrival Estimated arrival ratesrates from edge from edge Internal measure of Internal measure of fair-sharefair-share To generate a To generate a drop probabilitydrop probability. Labels . Labels

changed on outbound flow with new changed on outbound flow with new post-drop arrival rate.post-drop arrival rate.

Estimation for fair-share value converges Estimation for fair-share value converges rapidlyrapidly

Edge Router Behavior

Monitor each flow i to Monitor each flow i to measuremeasure its arrival rate ( its arrival rate (RRii)) EWMA EWMA of rateof rate

t_i^k, l_i^k = arrival time, length of kth packet in flow it_i^k, l_i^k = arrival time, length of kth packet in flow i Non-constant EWMA constant Non-constant EWMA constant

T_i^k = interarrival (time since last pkt) (t_i^k – t_{i-1}^k)T_i^k = interarrival (time since last pkt) (t_i^k – t_{i-1}^k) Constant: eConstant: e-T/K-T/K where T , K = constant where T , K = constant Ri new = (1 – const)* length/interarrival + const*(Ri new = (1 – const)* length/interarrival + const*(RRi old)i old) Helps adapt to different packet sizes and arrival Helps adapt to different packet sizes and arrival

patternspatterns• Intuition: Trusts the “old” values less as the Intuition: Trusts the “old” values less as the

time interval increases (time interval increases (negativenegative T) T) Rate is attached to each packetRate is attached to each packet

Core Router Behavior

Drop probability for packet = Drop probability for packet = max (1- max (1- /r, 0)/r, 0) Track aggregate input ATrack aggregate input A Track accepted rate F(Track accepted rate F()) Estimate Estimate fair share rate fair share rate

Solve Solve F(F() = C) = C; but this is hard:; but this is hard: Note: Increasing Note: Increasing does not increase load (F) by does not increase load (F) by

N * N * ΔΔ (why?) (why?) F(F() = ) = ii min(r min(rii, , )) what does this look like? what does this look like?

max-min fairness: allallocations converge to max-min fairness: allallocations converge to some value some value such that all offered loads such that all offered loads rrii< < are are

given a rate given a rate rrii, while all inputs , while all inputs rrii> > are given a are given a

rate equal to rate equal to

F vs. Alpha

New alpha

C [linked capacity]

r1 r2 r3 old alphaalpha

FF

Estimating Fair Share Need F(Need F() = capacity = C) = capacity = C

Can’t keep map of F(Can’t keep map of F() values ) values would require per would require per flow stateflow state

If we’re overutilized:If we’re overutilized: Since F(Since F() is concave, piecewise-linear) is concave, piecewise-linear

F(0) = 0 and F(F(0) = 0 and F() = current accepted rate = F) = current accepted rate = Fcc

F(F() = F) = Fcc/ / F(F(newnew) = C ) = C newnew = = oldold * C/F * C/Fcc

If underutilized:If underutilized: = max_i (ri) (No drops at all)= max_i (ri) (No drops at all)

What if a mistake was made?What if a mistake was made? Forced into dropping packets due to buffer capacityForced into dropping packets due to buffer capacity When queue overflows When queue overflows is decreased slightly is decreased slightly Note that this is an increase/decrease rule in Note that this is an increase/decrease rule in

disguise. disguise.



connectionsconnections Packet drop strategiesPacket drop strategies Core-Stateless Fair QueuingCore-Stateless Fair Queuing Policing Policing (Regulating, Shaping )(Regulating, Shaping )

Policing Mechanisms

Goal:Goal: limit traffic to not exceed declared limit traffic to not exceed declared

parametersparametersThree common-used criteria: Three common-used criteria: (Long term) Average Rate:(Long term) Average Rate: how many pkts can be how many pkts can be

sent per unit time (in the long run)sent per unit time (in the long run) crucial question: what is the interval length: 100 crucial question: what is the interval length: 100

packets per sec or 6000 packets per min have packets per sec or 6000 packets per min have same average!same average!

Peak Rate:Peak Rate: e.g., 6000 pkts per min. (ppm) avg.; 1500 e.g., 6000 pkts per min. (ppm) avg.; 1500 ppm peak rateppm peak rate

(Max.) Burst Size:(Max.) Burst Size: max. number of pkts sent max. number of pkts sent consecutively (with no intervening idle)consecutively (with no intervening idle)

Leaky bucket

Token BucketToken Bucket:Token Bucket: limit input to specified Burst Size and limit input to specified Burst Size and

Average Rate. Average Rate.

bucket can hold bucket can hold b tokensb tokens tokens generatedtokens generated at rate at rate r token/secr token/sec unless bucket fullunless bucket full

over interval of lengthover interval of length t: t: number of packets admittednumber of packets admitted less less than orthan or equal equal to to (r t + b). (r t + b). a linear function of timea linear function of time Linear Bounded Arrival ProcessLinear Bounded Arrival Process (LBAP) (LBAP)

Choosing Token Bucket parameters

Tradeoff between Tradeoff between r r and b and b Minimal descriptorMinimal descriptor

doesn’t simultaneously have smaller doesn’t simultaneously have smaller rr and band b

presumably costs lesspresumably costs less How to choose minimal descriptor?How to choose minimal descriptor? Three way tradeoffThree way tradeoff

loss rateloss rate choice of bchoice of b (data bucket size)(data bucket size) choice of choice of rr

Choosing minimal parameters Keeping loss rate the sameKeeping loss rate the same

if bif b is more, is more, r r is less (smoothing) is less (smoothing) for each for each rr we have least we have least bb

Choose knee of curveChoose knee of curve

Policing Mechanisms (more) token bucket, WFQtoken bucket, WFQ combine to provide combine to provide

guaranteed upper bound on delay, i.e., guaranteed upper bound on delay, i.e., QoS QoS guaranteeguarantee!!

WFQ

token rate, r

bucket size, b

per-flowrate, R

D = b/Rmax

arrivingtraffic

Leaky bucket Popular in practice and in academiaPopular in practice and in academia

sort of representativesort of representative verifiableverifiable sort of preservablesort of preservable sort of usablesort of usable

Problems with multiple time scale trafficProblems with multiple time scale traffic large burst messes up thingslarge burst messes up things

Fig. 5-25 token bucketFig. 5-25 token bucket r T + b = MTr T + b = MT (c)(c) 2T + 250 = 25T 2T + 250 = 25T T = 11 T = 11

Documents

Scheduling and Policing 2007. Outline What is scheduling? Why we need it? What is scheduling? Why we need it? Requirements of a scheduling discipline