

Journal of Algorithms 52 (2004) 57–81

www.elsevier.com/locate/jalgor

Minimizing end-to-end delay in high-speed networks with a simple coordinated schedule

Matthew Andrews and Lisa Zhang

Bell Laboratories, Murray Hill, NJ, USA

Received 11 November 2003

Available online 15 April 2004

Abstract

We study the problem of providing end-to-end delay guarantees in connection-oriented networks. In this environment, multiple-hop sessions coexist and interfere with one another.

Parekh and Gallager showed that the Weighted Fair Queueing (WFQ) scheduling discipline provides a worst-case delay guarantee comparable to $(1/\rho_i) \times K_i$ for a session with rate $\rho_i$ and $K_i$ hops. Such delays can occur since a session-$i$ packet can wait for time $1/\rho_i$ at every hop.

We describe a randomized work-conserving scheme that guarantees, with high probability, an additive delay bound of approximately $1/\rho_i + K_i$. This bound is smaller than the multiplicative bound $(1/\rho_i) \times K_i$ of WFQ, especially when the hop count $K_i$ is large. We call our scheme COORDINATED-EARLIEST-DEADLINE-FIRST (CEDF) since it uses an earliest-deadline-first approach in which simple coordination is applied to the deadlines for consecutive hops of a session. The key to the bound is that once a packet has passed through its first server, it can pass through all its subsequent servers quickly.

We conduct simulations to compare the delays actually produced by the two scheduling disciplines. In many cases, these actual delays are comparable to their analytical worst-case bounds, implying that CEDF outperforms WFQ.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Packet routing; Scheduling; Earliest deadline first; Weighted fair queueing; Delay bounds

E-mail addresses: [email protected] (M. Andrews), [email protected] (L. Zhang).

0196-6774/$ – see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.jalgor.2004.03.004

1. Introduction

The provision of end-to-end delay guarantees in high-speed networks remains one of the most important and widely studied Quality-of-Service (QoS) issues. Many real-time audio and video applications rely on the ability of the network to provide small delays. One key mechanism for achieving this aim is scheduling at the outputs of the switches. In this paper, we attempt to minimize end-to-end delay using a novel scheduling scheme.

Before we introduce our scheme we first recall the delay bounds for the much studied Weighted Fair Queueing (WFQ) scheduling discipline, also known as Packet-by-Packet Generalized Processor Sharing (PGPS). In their seminal papers [1,2], Parekh and Gallager showed that WFQ achieves the following session-$i$ delay bound for Rate Proportional Processor Sharing (RPPS):

$$\frac{\sigma_i + (K_i - 1)L_i}{\rho_i} + \sum_{m=1}^{K_i} \frac{L_{\max}}{r_m}. \tag{1}$$

For session $i$, $L_i$ is the maximum packet size, $K_i$ is the number of servers and $r_m$ is the service rate of the $m$th server. The maximum packet size over all sessions is $L_{\max}$. Session $i$ is leaky-bucket constrained with burst size $\sigma_i$ and rate $\rho_i$. Throughout this paper, we assume that all service is non-cut-through and non-preemptive.

To understand the delay guarantee of (1) better, we compare the delay bound when session $i$ has a single hop ($K_i = 1$) with the bound when session $i$ has multiple hops ($K_i > 1$). We observe the following. When the burst size $\sigma_i$ is large, the multiple-hop delay bound is much less than $K_i$ times the single-hop delay bound. However, when $\sigma_i$ is small, the multiple-hop delay can be approximately $K_i$ times the single-hop delay. To see this, let us assume a uniform packet size for all sessions ($L_i = 1$) and a uniform service rate for all servers ($r_m = 1$). The delay bound of (1) now becomes

$$\frac{\sigma_i + K_i - 1}{\rho_i} + K_i.$$

Hence, for a small burst size, e.g., $\sigma_i = 1$, the multiple-hop delay is essentially

$$\frac{1}{\rho_i} \times K_i,$$

and the single-hop delay is essentially $1/\rho_i$. Moreover, it is possible to construct an example in which this bound is achieved since a packet can wait for time $1/\rho_i$ at every hop. This illustrates our earlier observation.
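As a numerical illustration of this comparison, the following minimal Python sketch evaluates the right-hand side of bound (1). The function name `wfq_delay_bound`, its argument names and the example values are ours and purely illustrative; the rate 0.25 is an assumption chosen so that the arithmetic is exact.

```python
def wfq_delay_bound(sigma_i, rho_i, L_i, L_max, server_rates):
    """Evaluate the right-hand side of bound (1) for one session.

    server_rates lists r_m for the K_i servers on the session's path.
    """
    K_i = len(server_rates)
    burst_term = (sigma_i + (K_i - 1) * L_i) / rho_i
    service_term = sum(L_max / r_m for r_m in server_rates)
    return burst_term + service_term


# Uniform case from the text: L_i = L_max = 1, r_m = 1, small burst sigma_i = 1.
rho = 0.25  # illustrative session rate (an assumption, not from the paper)
single_hop = wfq_delay_bound(1, rho, 1, 1, [1])
for K in (1, 4, 16):
    multi_hop = wfq_delay_bound(1, rho, 1, 1, [1] * K)
    # Print the hop count, the bound, and its ratio to the single-hop bound.
    print(K, multi_hop, multi_hop / single_hop)
```

With $\sigma_i = 1$ the uniform bound simplifies to $K_i(1/\rho_i + 1)$, so the ratio to the single-hop bound is exactly $K_i$. With a larger burst, the term $\sigma_i/\rho_i$ is paid only once and the ratio falls below $K_i$.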

In this paper, we demonstrate with both analysis and simulation that even for small burst sizes, a bound of $(1/\rho_i) \times K_i$ is not necessary, i.e., the $K$-hop delay does not have to be $K$ times the 1-hop delay. Indeed, in the case of uniform packet sizes, uniform service rates and small burst sizes, [3] showed that each session $i$ can achieve a delay bound (see footnote 1)

$$O\left(\frac{1}{\rho_i} + K_i\right),$$

using a centralized scheme. The same paper also proposed a simple distributed protocol with a slightly weaker bound

$$O\left(\frac{1}{\rho_i} + K_i \log\left(\frac{n}{\rho_{\min}}\right)\right).$$

Here, $n$ is the number of servers in the network and $\rho_{\min}$ is the minimum session rate.

1 The bound $O(1/\rho_i + K_i)$ is best possible up to a constant factor. To see this, under non-cut-through service all sessions must suffer delay $K_i$. Moreover, examples can be constructed in which some sessions must suffer delay $1/\rho_i$.

1.1. Our results

In Section 3 we generalize the simple protocol in [3] to accommodate arbitrary packet sizes and arbitrary server rates. We derive the following exact delay bound which allows us to provide a direct comparison with (1):

$$\frac{\sigma_i + 4L_i/\varepsilon}{\rho_i} + \sum_{m=1}^{K_i} \frac{L_{\max}}{r_m} \log(\cdot). \tag{2}$$

The parameter $\varepsilon$ reflects the server utilization and, in particular, an $\varepsilon$ fraction of the server rates is not utilized. The logarithmic factor, although small, is somewhat involved. We give the full definition later. We note that $r_m$, the server rate, is faster, and typically much faster, than $\rho_i$, the session rate, since the server rate needs to be at least the sum of all the session rates. In addition, $L_i$ is often close to $L_{\max}$, since the largest packet size from each session is usually similar. Therefore, our bound of (2) is an improvement over the WFQ bound of (1). In Section 4 we provide simulation results to compare the actual performance of our protocol and WFQ.

The basic ideas of our protocol are an earliest-deadline-first approach coupled with randomization and coordination. We assign a deadline for every server through which a packet passes. By introducing some randomness, the deadlines can be sufficiently “spread out” so that all the packets can meet all their deadlines. By introducing simple coordination among the deadlines, we can ensure that once a packet has passed through its first server, it can pass through all its subsequent servers quickly. We refer to our protocol as COORDINATED-EARLIEST-DEADLINE-FIRST (CEDF).

The traffic lights in Manhattan provide an intuitive analogue to CEDF. Since the lights are coordinated, when one traffic light turns green, many lights further down the street turn green also. This means that once a car waits through one red light it can then drive through many green lights quickly. In this way, delay does not have to accumulate at every light.

From now on, we refer to a delay bound of the form $(1/\rho_i) \times K_i$ as a multiplicative bound and a bound of the form $1/\rho_i + K_i$ as an additive bound. In Figs. 1 and 2, we plot these bounds for different values of $K_i$ and $\rho_i$. The curves for the multiplicative bound have different slopes for different $\rho_i$, whereas the curves for the additive bound all have the same slope. We can see that in general it is desirable to have an additive bound. We note that the bound (2) of CEDF is close to an additive bound. (It does not contain a term $K_i/\rho_i$.) Apart from the bound in reference [3] we know of no previous end-to-end delay bound that is close to an additive bound.
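The difference between the two forms can be tabulated directly. The short Python sketch below evaluates both bounds over a few hop counts and rates; the function names and the sample values are illustrative assumptions and are not the values plotted in Figs. 1 and 2.

```python
def multiplicative_bound(rho_i, K_i):
    """Bound of the form (1/rho_i) x K_i."""
    return (1.0 / rho_i) * K_i


def additive_bound(rho_i, K_i):
    """Bound of the form 1/rho_i + K_i."""
    return 1.0 / rho_i + K_i


# Illustrative rates and hop counts (assumptions chosen for readability).
for rho_i in (0.5, 0.25, 0.125):
    for K_i in (5, 10, 20, 40):
        print(rho_i, K_i,
              multiplicative_bound(rho_i, K_i),
              additive_bound(rho_i, K_i))
```

Each extra hop adds $1/\rho_i$ to the multiplicative bound but only 1 to the additive bound, which is why the additive curves share one slope regardless of the session rate.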

Fig. 1. A plot of the multiplicative delay bound $(1/\rho_i) \times K_i$. Each curve represents a different value of $\rho_i$. The delays are plotted against $K_i$.

Fig. 2. A plot of the additive delay bound $1/\rho_i + K_i$.

In our simulations, we observe that the actual delays under WFQ and CEDF are often comparable to their analytical bounds. In many scenarios, the former exhibits the behavior of a multiplicative bound, and the latter exhibits the behavior of an additive bound. For these scenarios, CEDF produces significantly lower delays. In other scenarios where there is less contention between sessions, both protocols exhibit the behavior of an additive bound.

CEDF has other desirable properties. First, we do not need traffic reshaping between hops. Second, we only need to do per-session processing at the points where the sessions enter the network. That is, we do no per-session processing within the network.

1.2. Previous work

The Earliest-Deadline-First (EDF) scheduling discipline when applied to a single server has received much attention. For example, Ferrari and Verma [4] and Verma, Zhang and Ferrari [5] showed that it can provide delay bounds and delay-jitter bounds. Georgiadis, Guérin and Parekh [6] and Liebeherr, Wrege and Ferrari [7] proved that EDF is delay-optimal in the sense that if a set of delay bounds is achievable then it can be achieved by EDF. Necessary and sufficient conditions for a set of delay bounds to be achievable were given. Liebeherr et al. also presented schemes with low implementation complexity that approximate EDF [7,8]. For networks, Georgiadis, Guérin, Peris and Sivarajan [9] showed that EDF can be sub-optimal. Nevertheless they proved that if the traffic is correctly reshaped after each node then EDF can outperform Weighted Fair Queueing. However, the best explicit bound on end-to-end delay given in [9] is the same as Eq. (1). General techniques for calculating end-to-end delay bounds were obtained by Goyal, Lam and Vin [10] and Goyal and Vin [11].

A number of papers have simulated end-to-end delay performance. Simulation results for EDF are presented in [4,5]. Clark, Shenker and Zhang [12] used simulation to compare WFQ with variants of FIFO. Yates, Kurose, Towsley and Hluchyj [13] examined end-to-end delay distributions for WFQ, FIFO and Golestani’s Stop-and-Go Fair Queueing [14,15]. They found that the analytic delay bounds can be too pessimistic. Grossglauser and Keshav [16] showed that FIFO can outperform the Weighted Round Robin (WRR) and Round Robin (RR) disciplines for CBR traffic.

Our protocol CEDF is motivated by techniques of Leighton, Maggs and Rao [17] and Leighton, Maggs and Richa [18] for static packet scheduling. In this static setting, all packets are present in the network initially. Similar techniques were used by Rabani and Tardos [19] and Ostrovsky and Rabani [20]. For an overview of different scheduling disciplines, see [21,22].

The rest of the paper is divided into sections as follows. We define our model and briefly review WFQ and RPPS in Section 2. Our protocol CEDF is described and analyzed in Section 3. The simulation results are presented in Section 4. We give our conclusions in Section 5.

2. Model and definitions

We consider a packet-based connection-oriented network. We equate each link in the network with the server that schedules the sessions on the link. Each session is specified by a fixed path through the network. Let $K_i$ be the number of servers along the path of session $i$, and let $m_1^{(i)}, m_2^{(i)}, \ldots, m_{K_i}^{(i)}$ be these servers. When it causes no confusion, we drop the superscript $(i)$. We define $L_i$ to be the maximum size of a session-$i$ packet in bits. Let $L_{\max} = \max_i L_i$ and $L_{\min} = \min_i L_i$.

We use the $(\sigma,\rho)$ traffic model introduced by Cruz [23,24] in which the traffic entering the network is leaky-bucket constrained. The session-$i$ traffic is characterized by a burst size $\sigma_i$ and a session rate $\rho_i$. If $A_i(t_1, t_2)$ denotes the amount of session-$i$ traffic (in bits) entering the network during the time interval $(t_1, t_2]$, then

$$A_i(t_1, t_2) \le \sigma_i + \rho_i (t_2 - t_1), \quad \forall\, t_2 \ge t_1 \ge 0.$$
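For a finite trace of packet arrivals, this constraint can be checked mechanically. The Python sketch below is a minimal illustration and assumes that every packet arrives instantaneously at a single time stamp; under that assumption the tightest intervals start just before one arrival and end at another, so it suffices to test every pair of arrivals. The function name and the trace format are ours.

```python
def conforms_to_leaky_bucket(arrivals, sigma_i, rho_i):
    """Check A_i(t1, t2) <= sigma_i + rho_i * (t2 - t1) on a finite trace.

    arrivals is a list of (time, bits) pairs; each packet is assumed to
    arrive instantaneously at its time stamp (an assumption of this sketch).
    """
    arrivals = sorted(arrivals)
    for first in range(len(arrivals)):
        bits = 0.0
        for last in range(first, len(arrivals)):
            bits += arrivals[last][1]
            span = arrivals[last][0] - arrivals[first][0]
            # Interval starting just before arrival `first`, ending at `last`.
            if bits > sigma_i + rho_i * span:
                return False
    return True


# Hypothetical trace: one 1000-bit packet every 10 time units.
trace = [(0, 1000), (10, 1000), (20, 1000)]
print(conforms_to_leaky_bucket(trace, sigma_i=1000, rho_i=100))
```

The quadratic scan keeps the sketch short; a running token-bucket counter would perform the same check in a single pass.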

Let $r_m$ be the service rate of server $m$, i.e., $m$ can service at most $r_m(t_2 - t_1)$ bits during the interval $(t_1, t_2]$. Let $I(m)$ be the set of sessions served by server $m$. We require the following stability condition for every server $m$:

$$\sum_{i \in I(m)} \rho_i \le (1 - \varepsilon) r_m \quad \text{for some } \varepsilon > 0.$$

The parameter $\varepsilon$ reflects the server utilization. It is crucial in allowing us to use coordination to achieve low delay bounds.

We adopt the non-cut-through and non-preemptive convention for scheduling. First, no packet is eligible for service until its last bit has arrived. Second, once a server begins serving a packet, it must continue until the whole packet has been serviced.

2.1. Review of Weighted Fair Queueing

Since we refer frequently to Weighted Fair Queueing, we now provide a brief definition. For details see [1,2,25]. WFQ attempts to emulate the Generalized Processor Sharing (GPS) scheme, in which all backlogged sessions receive service simultaneously. In particular, if session $i$ is backlogged at server $m$ then under GPS it receives service at rate

$$\frac{\Phi_i^m}{\sum_{j \in B^m} \Phi_j^m}\, r_m,$$

where $B^m$ is the set of backlogged sessions at server $m$ and the $\Phi_i^m$ are a set of allocated weights.

WFQ is a non-preemptive scheme that emulates GPS on a packet-by-packet basis. In particular, if a server needs to select a packet for transmission at time $t$ then it selects the first packet that would complete service under GPS if no additional packets were to arrive after time $t$.

In this paper we restrict our attention to a special case of WFQ known as Rate Proportional Processor Sharing (RPPS) in which $\Phi_i^m = \rho_i$ for all sessions $i$ and servers $m$. The end-to-end delay bound for RPPS derived in [2] is stated in Eq. (1).
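The GPS service rate above is easy to evaluate for a given backlogged set. The following Python sketch does so with RPPS weights; the function name, the session labels and the numerical rates are hypothetical and serve only as an illustration.

```python
def gps_service_rates(weights, backlogged, r_m):
    """GPS rate of each backlogged session at a server with service rate r_m.

    weights maps a session to its weight; under RPPS the weight of
    session i is its rate rho_i.
    """
    total = sum(weights[j] for j in backlogged)
    return {i: weights[i] / total * r_m for i in backlogged}


# Hypothetical RPPS weights (session rates) at one server with r_m = 1.
rho = {"s1": 0.125, "s2": 0.375, "s3": 0.25}
print(gps_service_rates(rho, {"s1", "s2"}, r_m=1.0))
print(gps_service_rates(rho, {"s1", "s2", "s3"}, r_m=1.0))
```

The full server rate is always divided among the backlogged sessions, so a session is served faster when fewer competitors are backlogged.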


3. Analytical bound

3.1. Overview

The basic idea of COORDINATED-EDF is very simple. For each packet $p$, we assign deadlines $D_1, D_2, \ldots, D_K$ for every server, $m_1, m_2, \ldots, m_K$, through which $p$ passes. The deadlines at a server $m$ are defined using a parameter $G_m$, where $G_m$ is essentially $(L_{\max}/r_m)\log(\cdot)$. (We define the logarithmic factor in $G_m$ later.) In particular, $D_1$ is $\mathit{rand} + G_{m_1}$ time after $p$'s injection, where $\mathit{rand}$ is a random number chosen from an appropriate range. Each subsequent deadline $D_{k+1}$ is $D_k + G_{m_{k+1}}$. CEDF gives priority to the packet with the earliest deadline if more than one packet is waiting for a server. Ties are broken arbitrarily.

Note that randomness is only added to the first deadline of each packet. This randomness has the important effect of “spreading out” the deadlines. If $\mathit{rand}$ is chosen from a large enough range, i.e., proportional to $L_i/\rho_i$ for session $i$, then deadlines from different sessions do not cluster together. In this way, packets do not compete for the same server simultaneously, and hence all packets are able to meet all their deadlines.

The $G_m$'s provide coordination among the deadlines. We point out that the values of the $G_m$'s are usually small, especially in high-speed networks where the server rates $r_m$ are large. This means that once a packet passes through its first server, it passes through all its subsequent servers quickly. As an analogy to our strategy, consider the traffic lights on an avenue in Manhattan. If a car is stopped at a red light then once that light turns green, many of the subsequent lights turn green also. In other words, the coordination of the lights means that once the car has passed through one light, it can quickly travel through many lights in succession.

We emphasize that the $G_m$'s are dependent on $\varepsilon$, the server utilization parameter, but not on the rate of each individual session that goes through the server. Under CEDF, session-$i$ packets do not accumulate a delay of $1/\rho_i$ for each server that they pass through. Hence, CEDF does not have a multiplicative term of the form $(1/\rho_i) \times K_i$ in its delay bound. This provides a significant contrast with the delay bound of WFQ. We discuss in more detail the advantages of CEDF in Section 3.2.

3.2. Protocol

3.2.1. Parameters

We define parameters $T_i$ and $M$ for generating random numbers. Roughly speaking, $M$ serves as the “period” of the deadlines. Once the deadlines are defined in an interval of length $M$, all deadlines are defined. The parameter $T_i$ is the size of the intervals from which the random numbers for session $i$ are chosen. When $T_i$ is about $2L_i/(\varepsilon\rho_i)$, the amount of randomness is sufficient to “spread out” the deadlines. We choose to write $T_i$ in the following (slightly complicated) form, because it ensures that $M$ is an integral multiple of all the $T_i$'s. For reasons that will become clear later, we also define $S_i$ such that $S_i/T_i$ is slightly greater than the session rate $\rho_i$. Let

$$T_i = 2^{\lceil \log_2 (2L_i/(\varepsilon\rho_i)) \rceil},$$

$$M = \max_i T_i,$$

$$S_i = T_i \rho_i (1 + \varepsilon/2).$$

We define $G_m$ for each server $m$, which determines how the deadline for a packet is incremented when it advances from one server to the next. Let

$$G_m = \alpha \frac{L_{\max}}{r_m} \log_e\left(\frac{n M r_m \varepsilon}{L_{\min}}\right),$$

where $L_{\max} = \max_i L_i$ and $L_{\min} = \min_i L_i$. The parameter

$$\alpha = O\left(\varepsilon^{-3} \log \frac{1}{1 - p_{\mathrm{suc}}}\right),$$

where $p_{\mathrm{suc}}$ is the success probability of the protocol. (We discuss this success probability in the Analysis section.) Note that $\alpha$ is independent of $L_i$, $\sigma_i$, $\rho_i$, $K_i$ and $r_m$.
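The parameter definitions can be evaluated with a few lines of Python. The sketch below is a minimal illustration under two stated assumptions: the rounding in $T_i$ is taken to be a ceiling (the choice that the inequalities used later in the analysis require), and $\alpha$ is given the explicit value that appears in the proof of Lemma 2. All function names and the two sample sessions are hypothetical.

```python
import math


def token_interval(L_i, rho_i, eps):
    """T_i: the smallest power of two >= 2*L_i/(eps*rho_i) (ceiling assumed)."""
    return 2 ** math.ceil(math.log2(2 * L_i / (eps * rho_i)))


def alpha_from_proof(eps, p_suc):
    """Explicit alpha used in the proof of Lemma 2 (natural logarithm)."""
    return 48.0 / (eps ** 3 * (1.0 - eps)) * math.log(1.0 / (1.0 - p_suc))


def deadline_increment(alpha, L_max, L_min, r_m, n, M, eps):
    """G_m = alpha * (L_max / r_m) * ln(n * M * r_m * eps / L_min)."""
    return alpha * (L_max / r_m) * math.log(n * M * r_m * eps / L_min)


# Two hypothetical sessions given as (L_i in bits, rho_i) with eps = 0.5.
eps = 0.5
sessions = {"a": (1, 0.25), "b": (1, 0.125)}
T = {s: token_interval(L, rho, eps) for s, (L, rho) in sessions.items()}
M = max(T.values())
S = {s: T[s] * rho * (1 + eps / 2) for s, (L, rho) in sessions.items()}
alpha = alpha_from_proof(eps, p_suc=0.9)
G = deadline_increment(alpha, L_max=1, L_min=1, r_m=1.0, n=4, M=M, eps=eps)
print(T, M, S, G)
```

Because every $T_i$ is a power of two, the largest one is an integral multiple of all the others, which is the property of $M$ that the text asks for.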

3.2.2. Tokens

We use tokens to define deadlines. For session $i$, let $\tau_1, \tau_2, \ldots, \tau_{M/T_i}$ be numbers chosen uniformly at random from each of the intervals $[0, T_i), [T_i, 2T_i), \ldots, [M - T_i, M)$. Session-$i$ tokens appear periodically with period $M$ at the following times.

$$\begin{array}{cccc}
\tau_1 & \tau_2 & \cdots & \tau_{M/T_i} \\
\tau_1 + M & \tau_2 + M & \cdots & \tau_{M/T_i} + M \\
\tau_1 + 2M & \tau_2 + 2M & \cdots & \tau_{M/T_i} + 2M \\
\tau_1 + 3M & \tau_2 + 3M & \cdots & \tau_{M/T_i} + 3M \\
\vdots & \vdots & & \vdots
\end{array}$$

3.2.3. Deadlines

Let $m_1, m_2, \ldots, m_{K_i}$ be the servers on the path of session $i$. For each session-$i$ packet, we define a sequence of deadlines $D_1, D_2, \ldots, D_{K_i}$ for traversing the servers.

When a packet of size $\ell$ bits obtains a token, it consumes $\ell$ bits from that token. At most $S_i$ bits can be consumed from each session-$i$ token. (Note that multiple packets may consume the same token.) Suppose a session-$i$ packet $p$ is injected at time $t_{\mathrm{inj}}$ and has $\ell_p$ bits. Suppose also that the session-$i$ packet injected immediately before $p$ obtains its token at time $t_{\mathrm{prev}}$. Packet $p$ obtains the first session-$i$ token at or after time $\max\{t_{\mathrm{inj}}, t_{\mathrm{prev}}\}$ that has at least $\ell_p$ bits unconsumed. Let $\tau$ be the time that the token appears. The deadlines are defined as follows:

$$D_1 = \tau + G_{m_1},$$

$$D_j = D_{j-1} + G_{m_j}.$$

Now that all deadlines are defined, each server gives priority to the packet that has the earliest deadline.
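The token and deadline rules can be sketched in Python as follows. This is a minimal illustration for a single session and not an implementation taken from the paper: the class `SessionTokens`, the function `deadlines` and all numerical values are hypothetical, and the linear scan over tokens is chosen for clarity rather than efficiency.

```python
import itertools
import random


class SessionTokens:
    """Token bookkeeping for one session (illustrative names, not the paper's)."""

    def __init__(self, T_i, M, S_i, rng):
        assert M % T_i == 0, "M must be an integral multiple of T_i"
        self.M = M
        self.S_i = S_i
        # tau_1, ..., tau_{M/T_i}: one uniform draw from each [k*T_i, (k+1)*T_i).
        self.taus = [k * T_i + rng.random() * T_i for k in range(M // T_i)]
        self.consumed = {}   # token index -> bits already consumed
        self.t_prev = 0.0    # token time obtained by the previous packet

    def token_time(self, index):
        """Time at which the index-th token appears (tokens repeat with period M)."""
        period, k = divmod(index, len(self.taus))
        return self.taus[k] + period * self.M

    def obtain(self, t_inj, size):
        """Return tau, the time of the token obtained by a packet of `size` bits."""
        assert size <= self.S_i, "a packet must fit into a single token"
        earliest = max(t_inj, self.t_prev)
        # Tokens of earlier periods appear before `earliest`, so skip them.
        index = int(earliest // self.M) * len(self.taus)
        while True:
            tau = self.token_time(index)
            used = self.consumed.get(index, 0)
            if tau >= earliest and used + size <= self.S_i:
                self.consumed[index] = used + size
                self.t_prev = tau
                return tau
            index += 1


def deadlines(tau, G_on_path):
    """D_1 = tau + G_{m_1} and D_j = D_{j-1} + G_{m_j} along the path."""
    return list(itertools.accumulate(G_on_path, initial=tau))[1:]


session = SessionTokens(T_i=8, M=16, S_i=5, rng=random.Random(0))
tau = session.obtain(t_inj=0.0, size=3)
print(tau, deadlines(tau, [2.0, 3.0, 2.0]))
```

Only `obtain` needs per-session state; once $\tau$ is known, `deadlines` depends on the $G_m$ values of the path alone.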


Remarks.

1. The only coordination required comes from the above iterative definition of the deadlines. This coordination can be achieved simply by stamping each packet with its current deadline (see footnote 2). Each server can then update the deadlines of its pending packets autonomously, i.e., we do not require explicit communication among servers.

2. We do not place tokens at times $T_i, 2T_i, 3T_i$, etc., but rather we introduce some randomness. This randomness is essential for spacing out the deadlines so that not many deadlines contend for the same server simultaneously. Once the tokens are chosen, the deadlines are chosen deterministically.

3. We emphasize that our protocol is work conserving and requires no traffic shaping. As long as some packets are waiting for a server, the packet with the earliest deadline gets serviced. In particular, a packet can be serviced before it obtains a token. The concept of a “packet obtaining/consuming a token” is merely a method of counting for the purpose of assigning deadlines.

4. The only per-session processing is the determination of which token a packet obtains. This can be done at the point on the edge of the network where the session enters. Once the token has been obtained, the deadlines for the packet are independent of its session parameters. This means that we need no per-session state within the network.

3.3. Analysis

In this section we prove the following end-to-end delay bound.

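Theorem 1. With high probability, the end-to-end delay guarantee for session $i$ is

$$\frac{\sigma_i + 4L_i/\varepsilon}{\rho_i} + \alpha \sum_{k=1}^{K_i} \frac{L_{\max}}{r_{m_k}} \log_e\left(\frac{n M r_{m_k} \varepsilon}{L_{\min}}\right).$$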
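To make the structure of this bound concrete, the Python sketch below evaluates it for given parameters. The function name and arguments are ours; $\alpha$ is passed in as a plain number because the protocol description fixes only its order of growth, and the sample values are assumptions.

```python
import math


def cedf_delay_bound(sigma_i, rho_i, L_i, eps, alpha, path_rates,
                     L_max, L_min, n, M):
    """Evaluate the bound of Theorem 1 for one session.

    path_rates lists r_{m_k} for the K_i servers on the session's path.
    """
    # Term paid once, at injection: (sigma_i + 4 L_i / eps) / rho_i.
    injection_term = (sigma_i + 4 * L_i / eps) / rho_i
    # Term paid per hop; note that it does not involve rho_i.
    path_term = alpha * sum(
        (L_max / r) * math.log(n * M * r * eps / L_min) for r in path_rates
    )
    return injection_term + path_term


# Hypothetical values; alpha = 1 is used only to keep the numbers readable.
for hops in (1, 2, 4):
    print(hops, cedf_delay_bound(1, 0.25, 1, 0.5, 1.0, [1.0] * hops,
                                 L_max=1, L_min=1, n=2, M=16))
```

Adding a hop increases the value by one $G_m$ term whatever the session rate is, which is the additive behavior discussed in Section 1.1.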

To prove Theorem 1, we prove two statements. First, with high probability the protocol is successful. (See Lemmas 2 and 3.) We say that a protocol is successful if every packet meets all of its deadlines. The success of the protocol is equivalent to the successful placing of a finite number of tokens due to the periodicity of the token placement. Hence, we can use a Chernoff-bound argument to analyze the success probability. Second, $\tau$ is at most $t_{\mathrm{inj}} + \sigma_i/\rho_i + 4L_i/(\varepsilon\rho_i)$ for each session-$i$ packet, where $t_{\mathrm{inj}}$ is the injection time of that packet. (See Lemma 4.)

Consider a server $m$ and a time interval $I$. Let $P$ be the set of packets that have a deadline for server $m$ in interval $I$. If the total size of the packets in $P$ is $x$, then we say that $I$ services $x$ bits at server $m$.

Lemma 2. Consider any server $m$ and any time interval $I = [t - G_m, t]$, where $t$ is a potential deadline for some session at server $m$. With high probability, any such interval $I$ services fewer than $G_m r_m$ bits at server $m$.

2 This can be done using techniques similar to the protocols of [26].


Proof. Let $X_i$ be the number of session-$i$ bits that $I$ services at server $m$. The expectation of $X_i$, $E[X_i]$, is at most $(S_i/T_i)G_m$. This is because one session-$i$ token is placed at random in each of the intervals $[0, T_i), [T_i, 2T_i)$, etc., and the deadlines for each session are a fixed amount of time after the tokens. In addition, each token is consumed by at most $S_i$ bits. Let $N$ be the set of sessions whose paths pass through $m$. By linearity of expectation, and the fact that $S_i/T_i = (1 + \varepsilon/2)\rho_i$ and $\sum_{i \in N} \rho_i \le (1 - \varepsilon) r_m$,

$$E\left[\sum_{i \in N} X_i\right] \le \sum_{i \in N} \frac{S_i}{T_i} G_m \le (1 - \varepsilon/2)\, r_m G_m.$$

We use a Chernoff-type argument to show that $\Pr[\sum_{i \in N} X_i \ge r_m G_m]$ is small. In particular,

$$\Pr\left[\sum_{i \in N} X_i \ge r_m G_m\right] \le e^{-\varepsilon^3 (1 - \varepsilon) G/(48 L_{\max})}.$$

In detail, consider any token for any session and call this token $j$. Let $Y_j$ be the number of bits serviced by $I$ at $m$ due to $j$. The $Y_j$'s are independent random variables, since the delays inserted after the tokens are independent from one another. We also note that $\sum_i X_i = \sum_j Y_j$, and $Y_j \le S_{\max}$ where $S_{\max}$ is the maximum $S_i$ over all sessions $i$. It is easy to verify that $S_{\max} \le 4L_{\max}(1 + \varepsilon/2)/\varepsilon$ from the definition of $S_i$. Let $\gamma = \log_e(1 + \varepsilon/2)/S_{\max}$ and let $G = r_m G_m$.

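$$\begin{aligned}
\Pr\Big[\sum_j Y_j > G\Big]
&= \Pr\big[e^{\gamma \sum_j Y_j} > e^{\gamma G}\big] \\
&\le e^{-\gamma G} \cdot E\big[e^{\gamma \sum_j Y_j}\big] && \text{(Markov's inequality)} \\
&= e^{-\gamma G} \cdot \prod_j E\big[e^{\gamma Y_j}\big] && \text{(independence of the $Y_j$'s)} \\
&\le e^{-\gamma G} \cdot \prod_j E\big[1 + (e^{\gamma S_{\max}} - 1)\, Y_j / S_{\max}\big] && \text{(convexity of $f(Y_j) = e^{\gamma Y_j}$; $0 \le Y_j \le S_{\max}$)} \\
&= e^{-\gamma G} \cdot \prod_j E\big[1 + Y_j\, \varepsilon/(2 S_{\max})\big] && \text{(definition of $\gamma$)} \\
&= e^{-\gamma G} \cdot \prod_j \big(1 + E[Y_j]\, \varepsilon/(2 S_{\max})\big) && \text{(linearity of expectation)} \\
&\le e^{-\gamma G} \cdot \prod_j e^{E[Y_j]\, \varepsilon/(2 S_{\max})} && (1 + ax \le e^{ax}) \\
&= e^{-\gamma G} \cdot e^{E[\sum_j Y_j]\, \varepsilon/(2 S_{\max})} \\
&\le (1 + \varepsilon/2)^{-G/S_{\max}} \cdot e^{(1 - \varepsilon/2)\, \varepsilon G/(2 S_{\max})} && \text{(definition of $\gamma$; $\textstyle\sum_i X_i = \sum_j Y_j$)} \\
&\le \left(\frac{e^{\varepsilon/2}}{(1 + \varepsilon/2)^{(1 + \varepsilon/2)}}\right)^{(1 - \varepsilon/2) G/S_{\max}} \\
&\le e^{-\varepsilon^2 (1 - \varepsilon/2) G/(12 S_{\max})} && \Big(\frac{e^x}{(1 + x)^{1 + x}} \le e^{-x^2/3} \text{ for } 0 \le x \le 1\Big) \\
&\le e^{-\varepsilon^3 (1 - \varepsilon) G/(48 L_{\max})} && (S_{\max} \le 4 L_{\max}(1 + \varepsilon/2)/\varepsilon).
\end{aligned}$$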

Since the token placement is periodic with period $M$, we only need to consider a fixed time period of length $M$. For each server $m$, only $M/T_i$ intervals $I = [t - G_m, t]$ can have $t$ as a deadline for a session-$i$ packet in that time period. There are $n$ servers in the network. Hence, the total number of such intervals $I$ is

$$n \sum_i \frac{M}{T_i} \le n \sum_i \frac{M \varepsilon \rho_i}{2 L_i} \le \frac{n M r_m \varepsilon}{2 L_{\min}}.$$

Let $\alpha = (48/(\varepsilon^3(1 - \varepsilon))) \log_e(1/(1 - p_{\mathrm{suc}}))$. Recall that

$$G_m = \alpha \left(\frac{L_{\max}}{r_m}\right) \log_e\left(\frac{n M r_m \varepsilon}{L_{\min}}\right).$$

By a union bound argument, the probability that some server $m$ services at least $G_m r_m$ bits during some interval $I$ is at most

$$\left(\frac{n M r_m \varepsilon}{2 L_{\min}}\right) \left(e^{-\varepsilon^3 (1 - \varepsilon) r_m G_m/(48 L_{\max})}\right) \le \left(\frac{n M r_m \varepsilon}{2 L_{\min}}\right) \left(\frac{L_{\min}}{n M r_m \varepsilon}\right)^{\alpha \varepsilon^3 (1 - \varepsilon)/48} \le 1 - p_{\mathrm{suc}}.$$

We can choose $p_{\mathrm{suc}}$, the success probability of the protocol, to be close to 1. □

Lemma 3. If the assumption in Lemma 2 holds, then every packet meets all its deadlines.

Proof. For the purpose of contradiction, let $D$ be the first deadline that is missed. This implies that all deadlines earlier than $D$ are met. Let $p$ be the packet that misses deadline $D$ for server $m$. Suppose that packet $p$ has length $\ell_p$. Since packet $p$ meets its previous deadlines, it must be waiting at server $m$ at time $D - G_m$. Hence, server $m$ is servicing other packets from time $D - G_m$ to $D - \ell_p/r_m$. Let $p'$ be such a packet, then $p'$ must have a deadline $D' \le D$ by the definition of EDF. Moreover, $D' \ge D - G_m$ since $D$ is the first deadline missed. Hence, the total size of packets that have deadlines in $[D - G_m, D]$ is at least $r_m G_m$. This contradicts the assumption of Lemma 2. □

Lemma 2 and Lemma 3 imply that each session-$i$ packet $p$ reaches its destination by time $\tau + \sum_{j=1}^{K_i} G_{m_j}$. To complete our analysis, we upper bound $\tau$ as follows.

Lemma 4. For each session-$i$ packet $p$ injected at $t_{\mathrm{inj}}$, we have

$$\tau \le t_{\mathrm{inj}} + \frac{\sigma_i}{\rho_i} + \frac{4 L_i}{\varepsilon \rho_i}.$$


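Proof. Let $t_0$ be the last time before $t_{\mathrm{inj}}$ that no session-$i$ packet is waiting to obtain a token. During $(t_0, \tau)$ every session-$i$ token must consume packets injected during $(t_0, t_{\mathrm{inj}}]$ only and each token must consume more than $S_i - L_i$ bits. Otherwise, either $(t_0, t_{\mathrm{inj}})$ contains a time when no session-$i$ packet is waiting or $p$ would obtain a token before $\tau$. The total number of bits injected during $(t_0, t_{\mathrm{inj}}]$ is at most

$$\sigma_i + (t_{\mathrm{inj}} - t_0)\rho_i.$$

The total number of session-$i$ tokens during $(t_0, \tau]$ is at least $(\tau - t_0 - T_i)/T_i$. Therefore, the total number of session-$i$ bits consumed during $[t_0, \tau]$ is at least

$$\frac{\tau - t_0 - T_i}{T_i}(S_i - L_i).$$

Hence, using $S_i = T_i\rho_i(1 + \varepsilon/2) \ge \rho_i T_i + L_i$ (which holds since $T_i \ge 2L_i/(\varepsilon\rho_i)$) in the second step and $T_i \le 4L_i/(\varepsilon\rho_i)$ in the last step,

$$\begin{aligned}
& \frac{\tau - t_0 - T_i}{T_i}(S_i - L_i) \le \sigma_i + (t_{\mathrm{inj}} - t_0)\rho_i \\
\Rightarrow\ & \frac{\tau - t_0 - T_i}{\rho_i T_i}(\rho_i T_i + L_i - L_i) \le \frac{\sigma_i}{\rho_i} + t_{\mathrm{inj}} - t_0 \\
\Rightarrow\ & \tau \le t_{\mathrm{inj}} + \frac{\sigma_i}{\rho_i} + \frac{4 L_i}{\varepsilon \rho_i}. \qquad \square
\end{aligned}$$

The factor $1/\varepsilon$ in the term $4L_i/\varepsilon$ is needed in the proof of the above lemma. However, we conjecture that in many situations it will be possible to obtain a delay bound in which the term $4L_i/\varepsilon$ is replaced by $4L_i$.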

4. Simulation results

Our experiments simulate a simple situation with uniform packet sizes and uniform server rates, as considered analytically in [3]. Since CEDF involves many parameters, we simulate a simplified version, SIMPLE-CEDF, which nevertheless contains the essence of CEDF. Under S-CEDF, the deadline for the first server is chosen randomly (without reference to periodic tokens). Every subsequent deadline is the deadline for the previous server incremented by one packet service time. (See Fig. 3.) As we shall see, the performance of S-CEDF corresponds to the analytical bounds of Section 3.

We compare the performance of WFQ and SIMPLE-CEDF (S-CEDF) using the mean end-to-end delay and the 98%-percentile end-to-end delay. We use the following simulation parameters. The link speed is set to 1 Mb/sec and all packets have a size of 1000 bits. The packet service time on each link is therefore 1 ms. The end-to-end delay consists of the packet service time and the queueing time, i.e., the time that the packet spends waiting in a buffer. Buffers have a large size and no packet is dropped in any experiment.

packet size    link speed    packet service time    buffer size
1000 b         1 Mb/sec      1 ms                   ∞

• $p$: A session-$i$ packet
• $t_{\mathrm{inj}}$: Injection time of $p$
• $D_k$: Deadline of $p$ at its $k$th hop

1  $D_1$ := randomly chosen from $[t_{\mathrm{inj}}, t_{\mathrm{inj}} + 1/\rho_i]$
2  $D_k$ := $D_{k-1}$ + one packet service time
3  Each link gives priority to the packet with the earliest deadline.

Fig. 3. S-CEDF, the SIMPLE-CEDF protocol.
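The three S-CEDF rules of Fig. 3 can be sketched in Python as follows. This is an illustrative sketch and not the simulator used for the experiments: the names `scedf_deadlines` and `Link` are ours, and we assume that $\rho_i$ is expressed in packets per millisecond so that $1/\rho_i$ and the service time share one time unit.

```python
import heapq
import random


def scedf_deadlines(t_inj, rho_i, hops, service_time, rng):
    """Deadlines D_1, ..., D_hops of one packet under the rules of Fig. 3."""
    # Rule 1: D_1 is drawn uniformly from [t_inj, t_inj + 1/rho_i].
    first = rng.uniform(t_inj, t_inj + 1.0 / rho_i)
    # Rule 2: every later deadline adds one packet service time.
    return [first + k * service_time for k in range(hops)]


class Link:
    """Rule 3: a link serves the queued packet with the earliest deadline."""

    def __init__(self):
        self._heap = []
        self._arrivals = 0  # tie-breaker, so equal deadlines never compare packets

    def enqueue(self, deadline, packet):
        heapq.heappush(self._heap, (deadline, self._arrivals, packet))
        self._arrivals += 1

    def next_packet(self):
        _deadline, _order, packet = heapq.heappop(self._heap)
        return packet


rng = random.Random(0)
print(scedf_deadlines(t_inj=0.0, rho_i=0.1, hops=5, service_time=1.0, rng=rng))
```

A binary heap is one convenient way to realize the earliest-deadline rule; ties are resolved here in arrival order, which is one admissible choice because the protocol allows ties to be broken arbitrarily.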

4.1. Single long session

We begin with a very simple configuration as illustrated in Fig. 4. The network consists of a line of $N$ links. A long session of $N$ hops travels through the network sharing each hop with a short session of 1 hop. These short sessions provide the “cross-traffic” for the long session. The length $N$ of the long session varies from 5 to 40. The link utilization is set to 0.8 (i.e., $\varepsilon = 0.2$). The rate of the long session $\rho_\ell$ varies in the range from 0.03 to 0.7. The rate of each short session $\rho_s$ is set to $0.8 - \rho_\ell$. Experiments of a similar setup were conducted in other simulation studies, e.g., [5,16,27].

We first use a deterministic injection model that conforms to the $(\sigma,\rho)$ traffic model with $\sigma = 1$ for each session. Figures 5–8 illustrate the end-to-end delay experienced by the long session. We note the striking resemblance between the curves for these actual delays and the curves for the analytical delay bounds. (Recall Figs. 1 and 2.) These plots demonstrate that for small values of $\rho_\ell$, S-CEDF has a significant advantage over WFQ in terms of the end-to-end delay of the long session. The two disciplines present similar behavior for larger values of $\rho_\ell$.

We take a closer look at the behavior of the long session for small $\rho_\ell$. Under WFQ, packets from the long session are frequently delayed by packets from the 1-hop sessions, since $\rho_s$ is much larger than $\rho_\ell$. Furthermore, a packet from the long session suffers from a similar amount of queueing delay at each link. This behavior of WFQ supports the analytical bound of the multiplicative form $(1/\rho_\ell) \times K$.

Under S-CEDF, the long session behaves differently. When traversing the first few links, a packet from the long session is likely to queue in the buffers. This is because the initial deadline is chosen from the range [tinj, tinj + 1/ρℓ]. When ρℓ is smaller than ρs, the long session is likely to have later deadlines than the interfering 1-hop sessions at the beginning of its path, and hence its packets are delayed. However, as the packet from the long session moves further along its path, its deadline becomes earlier in comparison to the deadlines

Fig. 4. Session 0 is the long session with 5 hops. Sessions 1 through 5 are the 1-hop sessions.


Fig. 5. Mean delay of the long session due to WFQ.

Fig. 6. Mean delay of the long session due to S-CEDF.


Fig. 7. 98%-percentile delay of the long session due to WFQ.

Fig. 8. 98%-percentile delay of the long session due to S-CEDF.


of the 1-hop sessions, and hence it suffers less delay. This behavior of S-CEDF supports the analytical bound of the additive form 1/ρℓ + K.
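To make the gap between the two forms concrete, the following sketch evaluates (1/ρ) × K and 1/ρ + K at the extremes of the simulated ranges. It deliberately ignores the constants and logarithmic factors of the actual analytical bounds, so the numbers indicate only the shape of the curves; the function names are illustrative.

```python
def multiplicative_form(rho: float, hops: int) -> float:
    """Shape of the WFQ bound: a delay of 1/rho at each of the K hops."""
    return (1.0 / rho) * hops


def additive_form(rho: float, hops: int) -> float:
    """Shape of the S-CEDF bound: 1/rho once, plus one unit per hop."""
    return 1.0 / rho + hops


for rho in (0.03, 0.7):
    for hops in (5, 40):
        m = multiplicative_form(rho, hops)
        a = additive_form(rho, hops)
        print(f"rho={rho:<5} K={hops:<3} multiplicative={m:8.1f} additive={a:6.1f}")
```

The ratio between the two forms is largest when ρ is small and K is large, which matches the regime in which the plots show the biggest advantage for S-CEDF.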

Despite the fact that the long sessions with small ρℓ have much smaller end-to-end delay under S-CEDF than under WFQ, the 1-hop sessions do not suffer a great deal under S-CEDF. The following table summarizes the mean delay of the 1-hop sessions.

ρℓ        0.03    0.1     0.2     0.3     0.4     0.5     0.6     0.7
ρs        0.77    0.7     0.6     0.5     0.4     0.3     0.2     0.1
WFQ       1.0     1.0     1.0     1.0     1.4     1.5     1.8     2.2
S-CEDF    1.06    1.16    1.26    1.3     1.42    1.53    1.79    2.15

Variations of the above experiments are conducted. We first vary the configuration of the network and the sessions. For example, instead of having one 1-hop session at each link, we use multiple 1-hop sessions at each link where the total rates of these 1-hop sessions add up to 0.8 − ρℓ. As another example, we experiment with a ring of 40 nodes and 40 links. Multiple long sessions wrap around the ring, interfering with one another in addition to the 1-hop sessions on each link. These experiments yield similar results to those shown in Figs. 5–8.

We also vary the injection patterns at the source for the single long session configuration shown in Fig. 4. Experiments with a larger burst size, e.g., σ = 10, yield plots similar to Figs. 5–8. A probabilistic on-off source with exponentially distributed on and off times yields the plots in Figs. 9–12.
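The text does not spell out how the on-off source injects packets, so the sketch below is only one plausible reading: on and off durations are drawn from exponential distributions, and during an on period packets are injected at a constant spacing. The parameter names (`rate_on`, `mean_on`, `mean_off`, `horizon`) and the constant spacing are assumptions made for illustration.

```python
import random


def on_off_injections(rate_on, mean_on, mean_off, horizon, seed=0):
    """Return injection times in [0, horizon) for an on-off source.

    On and off durations are exponentially distributed with the given means.
    While the source is on, it injects one packet every 1/rate_on time units
    (assumption); while it is off, it injects nothing.
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while t < horizon:
        on_end = t + rng.expovariate(1.0 / mean_on)     # end of this on period
        next_packet = t
        while next_packet < min(on_end, horizon):
            times.append(next_packet)
            next_packet += 1.0 / rate_on
        t = on_end + rng.expovariate(1.0 / mean_off)    # skip the off period
    return times


times = on_off_injections(rate_on=0.5, mean_on=20.0, mean_off=20.0, horizon=1000.0)
print(len(times), "packets injected")
```

Under these assumptions the long-run average rate is roughly rate_on × mean_on / (mean_on + mean_off), so the on-period rate has to exceed the target session rate.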

We have results for similar experiments using the FIFO discipline. In this setting, the delays produced by FIFO are close to the delays of WFQ, i.e., the delays can be approximated by a multiplicative formula.

4.2. Multiple long sessions

We now consider a more complicated configuration. We use a ring of 40 nodes, where neighboring nodes are connected by 8 links. Sessions with hops 1, 5, 10, 15, 20, 25, 30, 35 and 40 coexist and interfere with one another in this network. The paths and rates of these sessions are chosen as follows. We first choose a set of 40-hop paths. Each path begins with a random node and then follows the ring. Each hop of the path between two neighboring nodes can follow any of the 8 links between these nodes. The choice is made randomly subject to the constraint that the number of paths going through each link is the same. We now cut some of these 40-hop paths into shorter paths. Some 40-hop paths are divided into a 5-hop path and a 35-hop path, others are divided into a 10-hop path and a 30-hop path, etc. After this process, the network has paths with lengths 5, 10, 15, ..., 40. We also have some 1-hop paths. All sessions have the same rate. By varying the number of the original 40-hop paths, we achieve the desired session rates. Figures 13–16 summarize the performance of WFQ and S-CEDF. As we can see, the curves for WFQ have the multiplicative characteristic, although it is less pronounced than in Figs. 5 and 7. The curves for S-CEDF have the additive characteristic. We also observe that long sessions


Fig. 9. Probabilistic on-off source. Mean delay due to WFQ.

Fig. 10. Probabilistic on-off source. Mean delay due to S-CEDF.


Fig. 11. Probabilistic on-off source. 98%-percentile delay due to WFQ.

Fig. 12. Probabilistic on-off source. 98%-percentile delay due to S-CEDF.


Fig. 13. Multiple long sessions. Mean delay due to WFQ.

Fig. 14. Multiple long sessions. Mean delay due to S-CEDF.


Fig. 15. Multiple long sessions. 98%-percentile delay due to WFQ.

Fig. 16. Multiple long sessions. 98%-percentile delay due to S-CEDF.


perform better under S-CEDF than under WFQ, whereas short sessions perform marginally better under WFQ.
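The path construction for the ring of 40 nodes with 8 parallel links can be sketched as follows. This is one way to satisfy the equal-load constraint, not necessarily the procedure the authors used: at every ring position the paths are assigned to the parallel links through a shuffled, balanced list of labels, which requires the number of paths to be a multiple of 8. All function and parameter names are hypothetical, and the 1-hop paths are omitted because the text does not say how they are placed.

```python
import random
from collections import Counter


def build_ring_paths(n_paths, n_nodes=40, n_parallel=8, seed=0):
    """Build n_paths paths that each traverse the whole ring once.

    A hop (v, l) uses parallel link l from node v to node (v + 1) % n_nodes.
    Every individual link ends up carrying exactly n_paths / n_parallel paths.
    """
    if n_paths % n_parallel != 0:
        raise ValueError("n_paths must be a multiple of n_parallel")
    rng = random.Random(seed)
    # link_choice[v][p]: parallel link that path p takes when leaving node v.
    link_choice = []
    for _ in range(n_nodes):
        labels = [p % n_parallel for p in range(n_paths)]
        rng.shuffle(labels)
        link_choice.append(labels)
    paths = []
    for p in range(n_paths):
        start = rng.randrange(n_nodes)          # random starting node
        path = []
        for h in range(n_nodes):
            v = (start + h) % n_nodes
            path.append((v, link_choice[v][p]))
        paths.append(path)
    return paths


def cut_path(path, first_len):
    """Split one path into a first_len-hop path and the remaining hops."""
    return path[:first_len], path[first_len:]


paths = build_ring_paths(n_paths=16)
load = Counter(hop for path in paths for hop in path)
short, rest = cut_path(paths[0], 5)
print(len(load), set(load.values()), len(short), len(rest))
```

Cutting a path does not change how many sessions cross each link, so the per-link load stays balanced after the 40-hop paths are divided.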

We finally note that the analytical bound for WFQ is a worst-case bound, and therefore can be overly conservative. In our experiments, we have encountered situations in which WFQ behaves in a similar manner to S-CEDF, i.e., the additive form of 1/ρ + K is more apparent. In one such experiment, we consider a line of 41 nodes and 80 links, where neighboring nodes are connected by double links. All sessions have 40 hops, starting from the node on the left end and finishing at the node on the right end. Each hop along the path of a session can follow either the upper or the lower link. The choice is made randomly, subject to the constraint that each link has an equal number of sessions passing through it. All sessions have the same injection rate. We vary the number of sessions in order to achieve the desired session rate. Figures 17 and 19 illustrate the end-to-end delays due to WFQ averaging over all the 40-hop sessions. These delays have little multiplicative behavior. This is because in this network there is little contention among packets. S-CEDF produces similar end-to-end delays.

5. Conclusion

We have described a work-conserving scheduling discipline COORDINATED-EARLIEST-DEADLINE-FIRST with end-to-end delay bound

\[
\frac{\sigma_i + 4L_i/\varepsilon}{\rho_i} + \sum_{k=1}^{K_i} \frac{L_{\max}}{r_{m_k}} \log(\cdot).
\]

CEDF uses randomization and simple coordination to ensure that once a packet passes through its first server it can pass through all its subsequent servers quickly. Under CEDF, a session-i packet does not accumulate a delay of LiKi/ρi over Ki hops, and therefore its delay bound is smaller than that of the Weighted Fair Queueing discipline. We have also presented simulation results to show that the performance of CEDF and WFQ can be comparable to the analytical bounds.

The major open problem is to reduce the delay bound still further. The ultimate goal is a simple protocol with a delay bound

\[
\frac{\sigma_i + L_i}{\rho_i} + \sum_{k=1}^{K_i} \frac{L_{\max}}{r_{m_k}}.
\]
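To give a feel for the magnitudes involved, the sketch below evaluates the target bound and the accumulated term LiKi/ρi for parameters resembling the simulation setup. The unit conventions are assumptions made for illustration: σi, Li and Lmax are taken in bits, ρi and the server rates r in bits per second, and a session rate of 0.1 is read as 0.1 of a 1 Mb/sec link. The function names are hypothetical.

```python
def target_bound_s(sigma_bits, l_bits, rho_bps, l_max_bits, server_rates_bps):
    """(sigma_i + L_i) / rho_i plus one term L_max / r per server on the path."""
    per_hop = sum(l_max_bits / r for r in server_rates_bps)
    return (sigma_bits + l_bits) / rho_bps + per_hop


def accumulated_delay_s(l_bits, hops, rho_bps):
    """The L_i * K_i / rho_i term: a delay of L_i / rho_i at every hop."""
    return l_bits * hops / rho_bps


HOPS = 40
RATES = [1_000_000] * HOPS           # forty servers at 1 Mb/sec (assumption)
RHO = 0.1 * 1_000_000                # session rate of 0.1 link speeds (assumption)

print(target_bound_s(1000, 1000, RHO, 1000, RATES))   # about 0.06 s
print(accumulated_delay_s(1000, HOPS, RHO))           # about 0.4 s
```

With these illustrative numbers the accumulated per-hop term is several times larger than the target bound, which is the gap the open problem refers to.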

Acknowledgments

We thank Antonio Fernández, Mor Harchol-Balter and Tom Leighton for their help in earlier stages of this work. Antonio Fernández also provided many detailed comments on a preliminary draft of this paper. We thank Jorg Liebeherr for his insight on implementation issues.


Fig. 17. Double-link network. Mean delay due to WFQ.

Fig. 18. Double-link network. Mean delay due to S-CEDF.


Fig. 19. Double-link network. 98%-percentile delay due to WFQ.

Fig. 20. Double-link network. 98%-percentile delay due to S-CEDF.


References

[1] A.K. Parekh, R.G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the single-node case, IEEE/ACM Trans. Netw. 1 (3) (1993) 344–357.

[2] A.K. Parekh, R.G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the multiple-node case, IEEE/ACM Trans. Netw. 2 (2) (1994) 137–150.

[3] M. Andrews, A. Fernández, M. Harchol-Balter, T. Leighton, L. Zhang, Dynamic packet routing with per-packet delay guarantees of O(distance + 1/session rate), in: Proceedings of the 38th Annual Symposium on Foundations of Computer Science, Miami Beach, FL, 1997, pp. 294–302.

[4] D. Ferrari, D. Verma, A scheme for real-time channel establishment in wide-area networks, IEEE J. Selected Areas Commun. 8 (3) (1990) 368–379.

[5] D. Verma, H. Zhang, D. Ferrari, Guaranteeing delay jitter bounds in packet switching networks, in: Proceedings of Tricomm ’91, Chapel Hill, NC, 1991.

[6] L. Georgiadis, R. Guérin, A. Parekh, Optimal multiplexing on a single link: delay and buffer requirements, IEEE Trans. Inform. Theory 43 (5) (1997) 1518–1535.

[7] J. Liebeherr, D. Wrege, D. Ferrari, Exact admission control for networks with a bounded delay service, IEEE/ACM Trans. Netw. 4 (6) (1996) 885–901.

[8] D. Wrege, J. Liebeherr, A near-optimal packet scheduler for QoS networks, in: Proceedings of IEEE INFOCOM ’97, 1997.

[9] L. Georgiadis, R. Guérin, V. Peris, K. Sivarajan, Efficient network QoS provisioning based on per node traffic shaping, in: Proceedings of IEEE INFOCOM ’96, 1996, pp. 102–110.

[10] P. Goyal, S. Lam, H. Vin, Determining end-to-end delay bounds in heterogeneous networks, in: Proceedings of the Fifth International Workshop on Network and Operating System Support for Digital Audio and Video, Durham, NH, 1995, pp. 287–298.

[11] P. Goyal, H. Vin, Generalized guaranteed rate scheduling algorithms: a framework, Technical Report TR-95-30, University of Texas, Austin, September 1995.

[12] D. Clark, S. Shenker, L. Zhang, Supporting real-time applications in an integrated services packet network: architecture and mechanism, in: Proceedings of ACM SIGCOMM ’92, 1992, pp. 14–26.

[13] D. Yates, J. Kurose, D. Towsley, M. Hluchyj, On per-session end-to-end delay distributions and the call admission problem for real time applications with QOS requirements, in: Proceedings of ACM SIGCOMM ’93, 1993, pp. 2–12.

[14] S.J. Golestani, A framing strategy for congestion management, IEEE J. Selected Areas Commun. 9 (7) (1991) 1064–1077.

[15] S.J. Golestani, Congestion-free communication in high-speed packet networks, IEEE Trans. Commun. 39 (12) (1992) 1802–1812.

[16] M. Grossglauser, S. Keshav, On CBR service, in: Proceedings of IEEE INFOCOM ’96, 1996, pp. 129–136.

[17] F.T. Leighton, B.M. Maggs, S.B. Rao, Packet routing and job-shop scheduling in O(congestion + dilation) steps, Combinatorica 14 (2) (1993) 167–186.

[18] F.T. Leighton, B.M. Maggs, A.W. Richa, Fast algorithms for finding O(congestion + dilation) packet routing schedules, Technical report CMU-CS-96-152, Carnegie Mellon University, 1996.

[19] Y. Rabani, E. Tardos, Distributed packet switching in arbitrary networks, in: Proceedings of the 28th Annual ACM Symposium on Theory of Computing, Philadelphia, PA, 1996.

[20] R. Ostrovsky, Y. Rabani, Local control packet switching algorithm, in: Proceedings of the 29th Annual ACM Symposium on Theory of Computing, 1997.

[21] S. Keshav, An Engineering Approach to Computer Networking, Addison–Wesley, Reading, MA, 1997.

[22] H. Zhang, Service disciplines for guaranteed performance service in packet-switching networks, in: Proceedings of IEEE, 1995.

[23] R.L. Cruz, A calculus for network delay, part I: Network elements in isolation, in: IEEE Trans. Inform. Theory, 1991, pp. 114–131.

[24] R.L. Cruz, A calculus for network delay, part II: Network analysis, in: IEEE Trans. Inform. Theory, 1991, pp. 132–141.


[25] A. Demers, S. Keshav, S. Shenker, Analysis and simulation of a fair queueing algorithm, J. Internetworking: Research and Experience 1 (1990) 3–26.

[26] A. Banerjea, D. Ferrari, B. Mah, M. Moran, D. Verma, H. Zhang, The Tenet real-time protocol suite: design, implementation, and experiences, IEEE/ACM Trans. Netw. 4 (1) (1996) 1–11.

[27] D. Stiliadis, Traffic scheduling in packet-switched networks: analysis design and implementation, PhD thesis, UCSC, 1996.