
Lecture Notes on Stochastic Networks

Frank Kelly and Elena Yudovina

Contents

Preface
Overview
  0.1 Queueing and loss networks
  0.2 Decentralized optimization
  0.3 Random access networks
  0.4 Broadband networks
  0.5 Internet modelling

Part I

1 Markov chains
  1.1 Definitions and notation
  1.2 Time reversal
  1.3 Erlang’s formula
  1.4 Further reading

2 Queueing networks
  2.1 An M/M/1 queue
  2.2 A series of M/M/1 queues
  2.3 Closed migration processes
  2.4 Open migration processes
  2.5 Little’s law
  2.6 Linear migration processes
  2.7 Generalizations
  2.8 Further reading

3 Loss networks
  3.1 Network model
  3.2 Approximation procedure
  3.3 Truncating reversible processes
  3.4 Maximum probability
  3.5 A central limit theorem
  3.6 Erlang fixed point
  3.7 Diverse routing
  3.8 Further reading

Part II

4 Decentralized optimization
  4.1 An electrical network
  4.2 Road traffic models
  4.3 Optimization of queueing and loss networks
  4.4 Further reading

5 Random access networks
  5.1 The ALOHA protocol
  5.2 Estimating backlog
  5.3 Acknowledgement-based schemes
  5.4 Distributed random access
  5.5 Further reading

6 Effective bandwidth
  6.1 Chernoff bound and Cramér’s theorem
  6.2 Effective bandwidth
  6.3 Large deviations for a queue with many sources
  6.4 Further reading

Part III

7 Internet congestion control
  7.1 Control of elastic network flows
  7.2 Notions of fairness
  7.3 A primal algorithm
  7.4 Modelling TCP
  7.5 What is being optimized?
  7.6 A dual algorithm
  7.7 Time delays
  7.8 Modelling a switch
  7.9 Further reading

8 Flow level Internet models
  8.1 Evolution of flows
  8.2 α-fair rate allocations
  8.3 Stability of α-fair rate allocations
  8.4 What can go wrong?
  8.5 Linear network with proportional fairness
  8.6 Further reading

Appendix A Continuous-time Markov processes
Appendix B Little’s law
Appendix C Lagrange multipliers
Appendix D Foster-Lyapunov criteria
Notes
References
Index

Preface

Communication networks underpin our modern world, and provide fascinating and challenging examples of large-scale stochastic systems. Randomness arises in communication systems at many levels: for example, the initiation and termination times of calls in a telephone network, or the statistical structure of the arrival streams of packets at routers in the Internet. How can routing, flow control and connection acceptance algorithms be designed to work well in uncertain and random environments? And can we design these algorithms using simple local rules so that they produce coherent and purposeful behaviour at the macroscopic level?

The first two parts of these lecture notes will describe a variety of classical models that can be used to help understand the behaviour of large-scale stochastic networks. Queueing and loss networks will be studied, as well as random access schemes and the concept of an effective bandwidth. Parallels will be drawn with models from physics, and with models of traffic in road networks.

The third part of the notes will study more recently developed models of packet traffic and of congestion control algorithms in the Internet. This is an area of some practical importance, with network operators, content providers, hardware and software vendors, and regulators actively seeking ways of delivering new services reliably and effectively. The complex interplay between end-systems and the network has attracted the attention of economists as well as mathematicians and engineers.

We describe enough of the technological background to communication networks to motivate our models, but no more. Some of the ideas described in these notes are finding application to financial, energy and economic networks, as computing and communication technologies transform these areas. But communication networks currently provide the richest and best-developed area of application within which to present a connected account of the ideas.

The lecture notes have been used for a Masters course (“Part III”) in the Faculty of Mathematics at Cambridge University. This is a one-year postgraduate course which assumes a mathematically mature audience, albeit from a variety of different mathematical backgrounds. Familiarity with undergraduate courses on Optimization and Markov Chains is helpful, but not absolutely necessary. Appendices are provided on continuous-time Markov processes, Little’s law, Lagrange multipliers and Foster-Lyapunov criteria, reviewing the material needed. At Cambridge students may attend other courses where topics touched on in these notes, for example Poisson point processes, large deviations or game theory, are treated in more depth, and this course can serve as additional motivation.

Suggestions on further reading are given at the end of each chapter; these include reviews where the historical development and attribution of results can be found – we have not attempted this here – as well as some recent papers which give current research directions.

The authors are grateful to students of the course at Cambridge (and, for one year, Stanford) for their questions and insights, and especially to Joe Hurd, Damon Wischik, Gaurav Raina and Neil Walton, who produced earlier sets of these notes.

Frank Kelly and Elena Yudovina, May 2013

Overview

These notes are about stochastic networks, and their applications. Large-scale systems of interacting components have long been of interest to physicists. For example, the behaviour of the air in a room can be described at the microscopic level in terms of the position and velocity of each molecule. At this level of detail a molecule’s velocity appears as a random process. Consistent with this detailed microscopic description of the system is macroscopic behaviour best described by quantities such as temperature and pressure. Thus the pressure on the wall of the room is an average over an area and over time of many small momentum transfers as molecules bounce off the wall, and the relationship between temperature and pressure for a gas in a confined volume can be deduced from the microscopic behaviour of molecules.

Economists, as well as physicists, are interested in large-scale systems, driven by the interactions of agents with preferences rather than inanimate particles. For example, from a market with many heterogeneous buyers and sellers there may emerge the notion of a price at which the market clears.

In the last century, some of the most striking examples of large-scale systems have been technological in nature and constructed by us, from the telephony network through to the Internet. Can we relate the microscopic description of these systems in terms of calls or packets to macroscopic consequences such as blocking probabilities or throughput rates? And can we design the microscopic rules governing calls or packets to produce more desirable macroscopic behaviour? These are some of the questions we address in these notes. We shall see that there are high level constructs which parallel fundamental concepts from physics or economics such as energy or price, and which allow us to reason about the systems we design.

In this Chapter we’ll briefly introduce some of the models that we will encounter. We’ll see in later chapters that for the systems we ourselves construct, we are sometimes able to use simple local rules to produce macroscopic behaviour which appears coherent and purposeful.

0.1 Queueing and loss networks

We begin Chapter 1 with a brief introduction to Markov chains and Markov processes, which will be the underlying probabilistic model for most of the systems we consider. In Chapter 2 we study queueing networks, in which customers (or jobs or packets) wait for service by one or several servers.

Figure 0.1 A network of four queues.

We will first look at a single queue. We will see how to model it as a Markov process, and derive information on the distribution of the queue size. We will then look briefly at a network of queues. Starting from a simplified description of each queue, we will obtain information about the system behaviour. We’ll define a traffic intensity for a simple queue, and identify Poisson flows in a network of queues.

A natural queueing discipline is first-come-first-served, and we’ll also look at processor sharing, where the server shares its effort equally over all the customers present in the queue.

Figure 0.2 A loss network with some of the routes highlighted.

In Chapter 3, we move on to consider loss networks. A loss network consists of several links, each of which may have a number of circuits. The classical example of its use is to model landline telephone connections between several cities. When a telephone call is placed between one node and another, it needs to hold simultaneously a circuit on each of the links on a route between the two nodes – otherwise, the call is lost. Note the key differences from a queueing network.

Queueing vs. loss networks

  Queueing networks: sequential use of resources; congestion ⇒ delay.
  Loss networks: simultaneous resource possession; congestion ⇒ loss.

We’ll first treat loss networks where routing is fixed in advance. We will show that the simple rules for call acceptance lead to a stationary distribution for the system state that is centred on the solution to a certain optimization problem, and we’ll relate this problem to a classical approximation procedure.

Next we’ll allow calls to be rerouted, if they are blocked on their primary route. One example we’ll consider is the following model of a loss network on a complete graph. Suppose that if a call arrives between two nodes and there is a circuit available on the direct link between the two nodes, then the call is carried on the direct link. Otherwise, we attempt to redirect the call via another node (chosen at random), on a path through two links. (Figure 0.2 shows the single links and some of the two-link rerouting options.)

Figure 0.3 Hysteresis: the probability of losing a call, plotted against the load.

What is the loss probability in such a network, as a function of the arrival rates? In Figure 0.3 we sketch the proportion of calls lost, as the load on the system is varied. The interesting phenomenon that can occur is as follows: as the load is slowly increased, the loss probability increases slowly to begin with, but then jumps suddenly to a higher value; if the load is then slowly decreased, the loss probability does not trace the same curve, but drops back at a noticeably lower level of load. The system exhibits hysteresis, a phenomenon more often associated with magnetic materials.

How can this be? We’ll model the network as an irreducible finite-state Markov chain, so it must have a unique stationary distribution, hence a unique probability of losing a call at a given arrival rate.

On the other hand, it makes sense. If the proportion of occupied circuits is low, a call is likely to be carried on the direct route through a single link; if the proportion of occupied circuits is high, more calls will be carried along indirect routes through two circuits, which may in turn keep link utilization high.

How can both of these insights be true? We’ll see that the resolution concerns two distinct scaling regimes, obtained by considering either longer and longer time intervals, or larger and larger numbers of nodes.

We’ll end Chapter 3 with a discussion of a simple dynamic routing strategy which allows a network to respond robustly to failures and overloads. We’ll use this discussion to illustrate a more general phenomenon, resource pooling, that arises in systems where spare capacity in one part of a network can be used to deal with excess load elsewhere. Resource pooling allows systems to operate more efficiently, but we’ll see that this is sometimes at the cost of losing early warning signs that the system is about to saturate.

0.2 Decentralized optimization

A major practical and theoretical issue in the design of communication networks concerns the extent to which control can be decentralized, and in Chapter 4 we’ll place this issue in a wider context through a discussion of some ideas from physics and economics.

In our study of loss networks we’ll have seen an optimization problem which a loss network is implicitly solving in a decentralized manner. This is reminiscent of various models in physics. We’ll look at a very simple model of electron motion and establish Thomson’s principle: the pattern of potentials in a network of resistors is just such that it minimizes heat dissipation for a given level of current flow. The local, random behaviour of the electrons causes the network as a whole to solve a rather complex optimization problem.

Of course simple local rules may lead to poor system behaviour if the rules are the wrong ones. We’ll look at a simple model of a road traffic network to provide a chastening example of this. Braess’s paradox describes how, if a new road is added to a congested network, the average speed of traffic may fall rather than rise, and indeed everyone’s journey time may lengthen. It is possible to alter the local rules, by the imposition of appropriate tolls, so that the network behaves more sensibly, and indeed road traffic networks have long provided an example of the economic principle that externalities need to be appropriately penalized if the invisible hand is to lead to optimal behaviour.

In a queueing or loss network we can approach the optimization of the system at various levels, corresponding to the two previous models: we can dynamically route customers or calls according to the instantaneous state of the network locally, and we can also measure aggregate flows over longer periods of time, estimate externalities and use these estimates to decide where to add additional capacity.

0.3 Random access networks

In Chapter 5, we will consider the following model. Suppose multiple base stations are connected with each other via a satellite, as in Figure 0.4. If a base station needs to pass a message to another station it sends it via the satellite, which then broadcasts the message to all of them (the address is part of the message, so it gets decoded by the correct station at the end). If transmissions from two or more stations overlap, they interfere and need to be retransmitted. The fundamental issue here is contention resolution: how should stations arrange their transmissions so that at least some of them get through?

Figure 0.4 Multiple base stations in contact via a satellite

If stations could instantaneously sense when another station is transmitting, there would be no collisions. The problem arises because the finite speed of light causes a delay between the time when a station starts transmitting and the time when other stations can sense this interference. As processing speeds increase, speed-of-light delays pose a problem over shorter and shorter distances, so that the issue now arises even for distances within a building or less.

Consider some approaches.

  • We could divide time into slots, and assign, say, every fourth slot to each of the stations. However, this only works if we know how many stations there are and the load from each of them.

  • We could implement a token ring: set up an order between the stations, and have the last thing that a station transmits be a “token” which means that it is done transmitting. Then the next station is allowed to start transmitting (or it may simply pass on the token). However, if there is a large number of stations, then simply passing the token around all of them will take a very long time.

  • The ALOHA protocol: listen to the channel; if nobody else is transmitting, just start your own transmission. Since it takes some time for messages to reach the satellite and be broadcast back, it’s possible that there will be collisions (if two stations decide to start transmitting at sufficiently close times). If this happens, stop transmitting and wait for a random amount of time before trying to retransmit.

  • The Ethernet protocol: after k unsuccessful attempts, wait for a random time with mean 2^k before retransmitting.

We’ll study the last two approaches, which have advantages for the challenging case where the number of stations is large or unknown, each needing to access the channel infrequently.
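
To get a feel for how such a scheme behaves, here is a minimal simulation sketch (not from the notes) of slotted ALOHA with a fixed retransmission probability; the slot structure, the arrival probability and the retransmission probability are illustrative assumptions rather than part of the protocols described above. Chapter 5 examines when schemes of this kind are stable.

```python
import random

def slotted_aloha(lam=0.25, retransmit_p=0.1, slots=20_000, seed=1):
    """Crude slotted-ALOHA sketch: in each slot a fresh packet arrives with
    probability lam, and every backlogged packet retransmits independently with
    probability retransmit_p.  A slot with exactly one attempt succeeds; two or
    more attempts collide, and colliding packets join (or stay in) the backlog."""
    rng = random.Random(seed)
    backlog, delivered = 0, 0
    for _ in range(slots):
        new = 1 if rng.random() < lam else 0
        retries = sum(rng.random() < retransmit_p for _ in range(backlog))
        attempts = new + retries
        if attempts == 1:
            delivered += 1
            if retries == 1:
                backlog -= 1          # a backlogged packet finally got through
        else:
            backlog += new            # idle slot or collision: new packet (if any) is backlogged
    return delivered / slots, backlog

throughput, final_backlog = slotted_aloha()
print(f"throughput = {throughput:.3f} packets/slot, final backlog = {final_backlog}")
```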

We’ll end Chapter 5 with a discussion of distributed random access, where each station can hear only a subset of the other stations.

0.4 Broadband networks

Consider a communication link of total capacity C, which is being shared by lots of connections, each of which needs a randomly fluctuating rate. We are interested in controlling the probability that the link is overloaded by controlling the admission of connections into the network.

If the rate is constant, as in the first panel of Figure 0.5, then we could control admission as for a single-link loss network: don’t admit another connection if the sum of the rates would exceed the capacity. But what should we do if the rate needed by a connection fluctuates, as in the second panel? A conservative approach might be to add up the peaks and not admit another connection if the sum would exceed the capacity. But this looks very conservative, and would waste a lot of capacity if the peak/mean ratio is high. Or we might add up the means and only refuse a connection if the sum of the means would exceed the capacity.

Figure 0.5 Possible bandwidth profiles (bandwidth against time t); mean and peak rates: a telephone call using a constant rate, and a fluctuating rate with its mean and peak marked.

If there is enough buffering available to smooth out the load then we might expect a queueing model to be stable, but connections would suffer long delays and this would not work for real-time communication.

In Chapter 6, we’ll use a large deviations approach to define an effective bandwidth somewhere between the mean and the peak: this will depend on the statistical characteristics of the bandwidth profiles and on the desired low level of overload probability. Connection acceptance takes on a simple form: add the effective bandwidth of a new request to the effective bandwidths of connections already in progress and accept the new request if the sum satisfies a bound. This approach allows the insights available from our earlier model of a loss network to transfer to the case where the bandwidth required by a connection fluctuates.
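
As a small numerical sketch of the admission rule just described (illustrative only; Chapter 6 gives the precise definitions, which may also involve a time-scale parameter), one common form of the effective bandwidth of a source with load X is α(s) = (1/s) log E[e^{sX}], which does indeed sit between the mean and the peak:

```python
import math

def effective_bandwidth(mgf, s):
    """alpha(s) = (1/s) * log E[exp(s*X)], where mgf(s) = E[exp(s*X)]."""
    return math.log(mgf(s)) / s

def on_off_mgf(peak, p_on):
    """Moment generating function of an on-off source: rate `peak` with probability p_on, else 0."""
    return lambda s: (1.0 - p_on) + p_on * math.exp(s * peak)

# Illustrative numbers: peak rate 10, on 20% of the time, space parameter s = 0.5.
s = 0.5
alpha = effective_bandwidth(on_off_mgf(10.0, 0.2), s)
print(f"mean = {10.0 * 0.2}, effective bandwidth = {alpha:.2f}, peak = 10.0")

def admit(current_alphas, new_alpha, capacity):
    """Connection acceptance: accept if the summed effective bandwidths satisfy the bound."""
    return sum(current_alphas) + new_alpha <= capacity
```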

If the connections are transferring a file, rather than carrying a real-time video conversation, then another approach is possible which makes more efficient use of capacity, as we next describe.

0.5 Internet modelling

What happens when I want to see a web page? My device (e.g. computer, laptop, phone) sends a request to the server, which then starts sending the page to me as a stream of packets through the very large network that is the Internet. The server sends one packet at first; my device sends an acknowledgement packet back; as the server receives acknowledgements it sends further packets, at increasing rate. If a packet gets lost (which will happen if a buffer within the network overflows) it is detected by the end-points (packets have sequence numbers on them) and the server slows down. The process is controlled by TCP, the transmission control protocol of the Internet, implemented at the endpoints (on my device and on the server) – the network itself is essentially dumb.

Figure 0.6 A schematic diagram of the Internet. Squares correspond to devices and servers, the network contains resources with buffers, and many flows traverse the network.
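
One common caricature of this increase-and-back-off behaviour is additive-increase, multiplicative-decrease dynamics for the sender’s congestion window. The sketch below is only that caricature, with a made-up per-round loss probability; it is not a description of any particular TCP implementation, nor of the models developed in Chapter 7.

```python
import random

def aimd_window(loss_prob=0.005, rounds=2000, seed=2):
    """Toy additive-increase / multiplicative-decrease dynamics: the window grows by
    one packet per round trip without loss and is halved when a loss is detected.
    The chance of a loss in a round is taken (crudely) proportional to the window."""
    rng = random.Random(seed)
    cwnd, trace = 1.0, []
    for _ in range(rounds):
        if rng.random() < loss_prob * cwnd:
            cwnd = max(1.0, cwnd / 2)     # multiplicative decrease on loss
        else:
            cwnd += 1.0                   # additive increase per round trip
        trace.append(cwnd)
    return trace

trace = aimd_window()
print(f"average window = {sum(trace) / len(trace):.1f} packets")
```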

Even this greatly simplified model of the Internet raises many fascinating questions. Each flow is controlled by its own feedback loop, and these control loops interact with each other, through their competition for shared resources. Will the network be stable, and what does this mean? How will the network end up sharing the resources between the flows?

In Chapter 7 we will discuss various forms of fairness, and their interpretation in terms of optimization problems. We will study dynamical models of congestion control algorithms that correspond to primal or dual approaches to the solution of these optimization problems.

Over longer time scales, flows will come and go, as web pages and files are requested or their transfers are completed. On this time scale the network’s overall behaviour resembles a processor sharing queueing system, with congestion decreasing flow rates and lengthening the delay before a flow is completed. We’ll look at models of this behaviour in Chapter 8.

We’ll see that to understand the Internet’s behaviour we need to consider many time and space scales, depending on the level of aggregation and the time intervals of interest. At the packet level the system looks like a queueing network, and we need to model packet-level congestion control algorithms and scheduling disciplines; but if we look on a longer time scale then a flow makes simultaneous use of resources along its route, and flow rates converge to the solution of an optimization problem; and over longer time scales still, where the numbers of flows can fluctuate significantly, the system behaves as a processor sharing queue.

Part I

1 Markov chains

Our fundamental model will be a Markov chain. We do not presume to give an introduction to the theory of Markov chains; for that, see Norris (1998). In this chapter, we briefly review the basic definitions.

1.1 Definitions and notation

Let S be a countable state space, and let P = (p(i, j), i, j ∈ S) be a matrix of transition probabilities. That is, p(i, j) ≥ 0 for all i, j, and ∑_j p(i, j) = 1. A collection of random variables (X(n), n ∈ Z+), defined on a common probability space and taking values in S, is a Markov chain with transition probabilities P if, for j, k ∈ S and n ∈ Z+, we have

P(X(n+1) = k | X(n) = j, X(n−1) = x_{n−1}, . . . , X(0) = x_0) = p(j, k)

whenever the conditioning event has positive probability. That is, the conditional probability depends only on j and k, and not on the earlier states x_0, . . . , x_{n−1}, or on n (the Markov property). Here we have used capital letters to denote random variables, and lower case letters to refer to specific states in S. Thus X(0) is a random variable, and x_0 is a particular element of S.

We will assume that our Markov chains are irreducible, and often we shall additionally assume aperiodicity. A Markov chain is irreducible if every state in S can be reached, directly or indirectly, from every other state. A Markov chain is periodic if there exists an integer δ > 1 such that P(X(n+τ) = j | X(n) = j) = 0 unless τ is divisible by δ; otherwise the chain is aperiodic.

A Markov chain evolves in discrete time, potentially changing state at integer times. Next we consider continuous time. Let Q = (q(i, j), i, j ∈ S) be a matrix with q(i, j) ≥ 0 for i ≠ j, q(i, i) = 0, and 0 < q(i) ≡ ∑_{j∈S} q(i, j) < ∞ for all i. Let

p(j, k) = q(j, k)/q(j),   j, k ∈ S,

so that P = (p(i, j), i, j ∈ S) is a matrix of transition probabilities. Informally, a Markov process in continuous time rests in state i for a time that is exponentially distributed with parameter q(i) (hence mean q(i)^{−1}), and then jumps to state j with probability p(i, j). Note that p(i, i) = 0 in our definition, so the Markov process must change state when it jumps.

Formally, let (X^J(n), n ∈ Z+) be a Markov chain with transition probabilities P, and let T_0, T_1, . . . be an independent sequence of independent exponentially distributed random variables with unit mean. Now let S_n = q(X^J(n))^{−1} T_n: thus if X^J(n) = i then S_n is an exponentially distributed random variable with parameter q(i). (The letter S stands for “sojourn time”.) We shall define a process evolving in continuous time by using the Markov chain X^J to record the sequence of states occupied, and the sequence S_n to record the time spent in successive states. To do this, let

X(t) = X^J(N)   for S_0 + . . . + S_{N−1} ≤ t < S_0 + . . . + S_N.

Then it can be shown that the properties of the Markov chain X^J together with the memoryless property of the exponential distribution imply that for t_0 < t_1 < . . . < t_n < t_{n+1},

P(X(t_{n+1}) = x_{n+1} | X(t_n) = x_n, X(t_{n−1}) = x_{n−1}, . . . , X(t_0) = x_0) = P(X(t_{n+1}) = x_{n+1} | X(t_n) = x_n)

whenever the event {X(t_n) = x_n} has positive probability; and, further, that for any pair j ≠ k ∈ S

lim_{δ→0} P(X(t + δ) = k | X(t) = j)/δ = q(j, k)

exists and depends only on j and k, whenever the event {X(t) = j} has positive probability. Intuitively, we can think of conditioning on the entire trajectory of the process up to time t – we claim that the only important information is the state at time t, not how the process got there. We call q(j, k) the transition rate from j to k. The Markov chain X^J is called the jump chain. The sum ∑_{n=0}^{∞} S_n may be finite, in which case an infinite number of jumps takes place in a finite time (and the process “runs out of instructions”); if not, we have defined the Markov process (X(t), t ∈ R+).
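
The jump-chain construction translates directly into a simulation recipe: draw an exponential sojourn with parameter q(i), then move according to the jump chain. The sketch below is an illustrative implementation for a finite state space; the example rate matrix is made up.

```python
import random

def simulate_markov_process(q, x0, t_end, seed=0):
    """Simulate a continuous-time Markov process from its transition rates via the
    jump-chain construction.  q maps each state i to a dict {j: q(i, j)} of rates."""
    rng = random.Random(seed)
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        rates = q[x]
        q_i = sum(rates.values())                 # total rate of leaving state x
        t += rng.expovariate(q_i)                 # exponential sojourn, parameter q(x)
        if t >= t_end:
            return path
        u, acc = rng.random() * q_i, 0.0
        for j, rate in rates.items():             # jump to j with probability q(x, j)/q(x)
            acc += rate
            if u <= acc:
                x = j
                break
        path.append((t, x))

# Example: a birth-death process on {0, ..., 5} with made-up rates.
rates = {n: {} for n in range(6)}
for n in range(5):
    rates[n][n + 1] = 1.0
for n in range(1, 6):
    rates[n][n - 1] = 2.0
print(simulate_markov_process(rates, x0=0, t_end=5.0)[:10])
```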

We will only be concerned with countable state space processes. There is a rich theory of Markov chains/processes defined on an uncountable state space; and in some other texts the term “Markov process” refers to continuous space, rather than continuous time. Even with only a countable state space it is possible to construct Markov processes with strange behaviour. These will be excluded: we shall assume that all our Markov processes are irreducible, remain in each state for a positive length of time, and are incapable of passing through an infinite number of states in finite time. See Appendix A for more on this.

A Markov chain or process may possess an equilibrium distribution, that is a collection π = (π(j), j ∈ S) of positive numbers summing to unity that satisfy

π(j) = ∑_{k∈S} π(k) p(k, j)   ∀ j ∈ S,    (1.1)

for a Markov chain, or

π(j) ∑_{k∈S} q(j, k) = ∑_{k∈S} π(k) q(k, j)   ∀ j ∈ S    (1.2)

for a Markov process (the equilibrium equations).

Under the assumptions we have made for a Markov process, if π exists then it will be unique. It will then also be the limiting, ergodic, and stationary distribution –

Limiting: ∀ j ∈ S, P(X(t) = j) → π(j) as t → ∞.

Ergodic: ∀ j ∈ S, (1/T) ∫_0^T I{X(t) = j} dt → π(j) as T → ∞ with probability 1. (The integral is the amount of time, between 0 and T, that the process spends in state j.)

Stationary: If P(X(0) = j) = π(j) for all j ∈ S, then P(X(t) = j) = π(j) for all j ∈ S and all t ≥ 0.

If an equilibrium distribution does not exist then P(X(t) = j) → 0 as t → ∞ for all j ∈ S. An equilibrium distribution will not exist if we can find a collection of positive numbers satisfying the equilibrium equations whose sum is infinite. When the state space S is finite, an equilibrium distribution will always exist. All of this paragraph remains true for Markov chains, except that for a periodic chain there may not be convergence to a limiting distribution.

A chain or process for which P(X(0) = j) = π(j) for all j ∈ S is termed stationary. A stationary chain or process can be defined for all t ∈ Z or R – we imagine time has been running from −∞. We shall often refer to a stationary Markov process as being in equilibrium.
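
When S is finite the equilibrium equations (1.2) are a finite linear system, so π can be found numerically. Here is an illustrative sketch using NumPy; the three-state rate matrix is made up.

```python
import numpy as np

def equilibrium(rates):
    """Solve the equilibrium equations (1.2) for a finite-state Markov process.
    rates[i, j] = q(i, j) for i != j, with zero diagonal (our convention q(i, i) = 0)."""
    n = rates.shape[0]
    Q = rates - np.diag(rates.sum(axis=1))        # generator matrix: diagonal entries -q(i)
    # Solve pi Q = 0 with one equation replaced by the normalization sum(pi) = 1.
    A = np.vstack([Q.T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

rates = np.array([[0.0, 2.0, 1.0],
                  [1.0, 0.0, 3.0],
                  [2.0, 1.0, 0.0]])
pi = equilibrium(rates)
print(pi, pi @ (rates - np.diag(rates.sum(axis=1))))   # the second vector should be ~0
```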

Exercises

Exercise 1.1 Let S be an exponentially distributed random variable. Show that for any s, t > 0

P(S > s + t | S > t) = P(S > s),

the memoryless property of the exponential distribution.
Let S, T be independent random variables having exponential distributions with parameters λ, µ. Show that

Z = min{S, T}

also has an exponential distribution, with parameter λ + µ.

Exercise 1.2 A Markov process has transition rates (q(j, k), j, k ∈ S), and equilibrium distribution (π(j), j ∈ S). Show that its jump chain has equilibrium distribution (π^J(j), j ∈ S) given by

π^J(j) = G^{−1} π(j) q(j)

provided

G = ∑_j ∑_k π(j) q(j, k) < ∞.    (1.3)

Exercise 1.3 Show that the Markov property is equivalent to the following statement: for any t_{−k} < . . . < t_{−1} < t_0 < t_1 < . . . < t_n,

P(X(t_n) = x_n, X(t_{n−1}) = x_{n−1}, . . . , X(t_{−k}) = x_{−k} | X(t_0) = x_0)
  = P(X(t_n) = x_n, . . . , X(t_1) = x_1 | X(t_0) = x_0) × P(X(t_{−1}) = x_{−1}, . . . , X(t_{−k}) = x_{−k} | X(t_0) = x_0)

whenever the conditioning event has positive probability. That is, “conditional on the present, the past and the future are independent”.

1.2 Time reversal

For a stationary chain or process, we can consider what happens if we run time backwards. Because the Markov property is symmetric in time (Exercise 1.3), the reversed chain or process will again be Markov. Let’s calculate its transition rates.

Proposition 1.1 Suppose (X(t), t ∈ R) is a stationary Markov process with transition rates (q(i, j), i, j ∈ S) and equilibrium distribution π = (π(j), j ∈ S). Let Y(t) = X(−t), t ∈ R. Then (Y(t), t ∈ R) is a stationary Markov process with the same equilibrium distribution π and transition rates

q′(j, k) = π(k) q(k, j)/π(j),   ∀ j, k ∈ S.

Proof We know (Y(t), t ∈ R) is a Markov process: we determine its transition rates by an application of Bayes’ theorem.

P(Y(t + δ) = k | Y(t) = j) = P(Y(t + δ) = k, Y(t) = j) / P(Y(t) = j)
                           = P(X(−t − δ) = k, X(−t) = j) / π(j)
                           = π(k) P(X(−t) = j | X(−(t + δ)) = k) / π(j)
                           = π(k)(q(k, j)δ + o(δ)) / π(j).

Now divide both sides by δ and let δ → 0 to obtain the desired expression for q′(j, k).
Of course, P(Y(t) = i) = π(i) for all i ∈ S and for all t, and so π is the stationary distribution. □

If the reversed process has the same transition rates as the original process then we call the process reversible. In order for this to hold, i.e. to have q(j, k) = q′(j, k), we need the following detailed balance equations to be satisfied:

π(j) q(j, k) = π(k) q(k, j),   ∀ j, k ∈ S.    (1.4)

Detailed balance says that in equilibrium transitions from j to k happen “as frequently” as transitions from k to j.

Note that detailed balance implies the equilibrium equations (1.2), which are sometimes known as full balance. When it holds, detailed balance is much easier to check than full balance. Consequently, we will often look for equilibrium distributions of Markov processes by trying to solve detailed balance equations.
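
As a tiny numerical illustration of that strategy (not from the notes), the sketch below builds a candidate distribution for a birth-death process by solving the detailed balance equations along the chain, and then checks that the full balance equations (1.2) hold too. The parameter values anticipate the link model of the next section.

```python
def birth_death_equilibrium(birth, death):
    """Solve detailed balance pi(j-1)*birth[j-1] = pi(j)*death[j] on states 0..n-1,
    then normalize.  birth[j] is the rate j -> j+1, death[j] the rate j -> j-1."""
    pi = [1.0]
    for j in range(1, len(death)):
        pi.append(pi[-1] * birth[j - 1] / death[j])
    total = sum(pi)
    return [p / total for p in pi]

def satisfies_full_balance(pi, birth, death, tol=1e-9):
    """Check equations (1.2): probability flux out of each state equals the flux in."""
    n = len(pi)
    for j in range(n):
        out_flux = pi[j] * ((birth[j] if j < n - 1 else 0.0) + (death[j] if j > 0 else 0.0))
        in_flux = (pi[j - 1] * birth[j - 1] if j > 0 else 0.0) \
                + (pi[j + 1] * death[j + 1] if j < n - 1 else 0.0)
        if abs(out_flux - in_flux) > tol:
            return False
    return True

lam, mu, C = 3.0, 1.0, 4                       # illustrative values
birth = [lam] * C + [0.0]                      # no arrivals accepted when all C circuits are busy
death = [0.0] + [j * mu for j in range(1, C + 1)]
pi = birth_death_equilibrium(birth, death)
print(pi, satisfies_full_balance(pi, birth, death))
```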

Exercises

Exercise 1.4 Check that the distribution (π(j), j ∈ S) and the transition rates (q′(j, k), j, k ∈ S) found in Proposition 1.1 satisfy the equilibrium equations.

Establish the counterpart to Proposition 1.1 for a Markov chain.

Exercise 1.5 Associate a graph with a Markov process by letting an edge join states j and k if either q(j, k) or q(k, j) is positive. Show that if the graph associated with a stationary Markov process is a tree, then the process is reversible.

1.3 Erlang’s formula

In the early 20th century Agner Krarup Erlang worked for the Copenhagen Telephone Company, and he was interested in determining how many parallel circuits were needed on a telephone link to provide service (say, out of a single town, or between two cities). We will later look at generalizations of this problem for networks of links; right now, we will look at a single link.

Suppose that calls arrive to a link as a Poisson process of rate λ. The link has C parallel circuits, and while a call lasts, it uses one of the circuits. We will assume that each call lasts for an exponentially distributed amount of time with parameter µ, and that these call holding periods are independent of each other and of the arrival times. If a call arrives to the link and finds that all C circuits are busy, it’s simply lost. We would like to estimate the probability that an arriving call is lost.

Let X(t) = j be the number of busy circuits on the link. This is a Markov process, with transition rates given by

q(j, j + 1) = λ, j = 0, 1, . . . , C − 1;    q(j, j − 1) = jµ, j = 1, 2, . . . , C.

The upward transition is equivalent to the assumption that the arrival process is Poisson of rate λ; the downward transition rate arises since we are looking for the first of j calls to finish – recall Exercise 1.1.

We try to solve the detailed balance equations:

π(j − 1) q(j − 1, j) = π(j) q(j, j − 1)
  ⇒ π(j) = (λ/(jµ)) π(j − 1) = · · · = (λ/µ)^j (1/j!) π(0).

Adding the additional constraint ∑_{j=0}^{C} π(j) = 1, we find

π(0) = ( ∑_{j=0}^{C} (λ/µ)^j / j! )^{−1}.

This is the equilibrium probability of all circuits being free.

The equilibrium probability that all circuits are busy is therefore

π(C) = E(λ/µ, C),    where    E(ν, C) = (ν^C / C!) / ( ∑_{j=0}^{C} ν^j / j! ).    (1.5)

This is known as Erlang’s formula.
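
Formula (1.5) is straightforward to evaluate; for large C it is numerically nicer to use the standard recursion E(ν, C) = νE(ν, C − 1)/(C + νE(ν, C − 1)) with E(ν, 0) = 1, which follows from (1.5) although it is not derived here. A short illustrative implementation:

```python
import math

def erlang_b_direct(nu, C):
    """Erlang's formula (1.5): E(nu, C) = (nu**C / C!) / sum_{j=0}^{C} nu**j / j!."""
    terms = [nu ** j / math.factorial(j) for j in range(C + 1)]
    return terms[-1] / sum(terms)

def erlang_b(nu, C):
    """The same quantity via the recursion, avoiding large factorials."""
    E = 1.0
    for c in range(1, C + 1):
        E = nu * E / (c + nu * E)
    return E

nu, C = 120.0, 130                 # illustrative offered load (in erlangs) and circuit count
print(erlang_b(nu, C), erlang_b_direct(nu, C))
```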

This was a model that assumed that the arrival rate of calls is constant at all times. Suppose, however, that we have a finite population of M callers, so that those already connected will not be trying to connect again, and suppose someone not already connected calls at rate η. We consider a link with C parallel circuits, where C < M.

Letting X(t) = j be the number of busy lines, we now have the transition rates

q(j, j + 1) = η(M − j), j = 0, . . . , C − 1;    q(j, j − 1) = µj, j = 1, 2, . . . , C.

The equilibrium distribution π_M(j) now satisfies (check this!)

π_M(j − 1) q(j − 1, j) = π_M(j) q(j, j − 1)
  ⇒ π_M(j) = π_M(0) (M choose j) (η/µ)^j.

Next we find the distribution of the number of busy circuits when a new call is initiated. Now

P(j lines are busy and a call is initiated in (t, t + δ)) = π_M(j)(η(M − j)δ + o(δ)),   j = 0, 1, . . . , C,

and the probability of a call being initiated is just the sum of these terms over all j. Letting δ → 0, we find that the conditional probability of j lines being busy when a new call arrives is

π_M(j) η(M − j) / ∑_i π_M(i) η(M − i) ∝ π_M(j)(M − j) ∝ (M choose j)(M − j)(η/µ)^j ∝ (M − 1 choose j)(η/µ)^j.

The symbol “∝” means “is proportional to”. We have used the trick of only leaving the terms that depend on j: the normalization constant can be found later, from the fact that we have a probability distribution that adds up to 1.

If we now enforce this condition, we see

P(j lines are busy when a call is initiated) = π_{M−1}(j),   j = 0, 1, . . . , C.

Thus

P(an arriving call is lost) = π_{M−1}(C) < π_M(C):

an arriving call sees a distribution of busy lines that is not the time-averaged distribution of busy lines, π_M.

Note that if M → ∞ with ηM held fixed at λ, we recover the previous model, and the probability an arriving call finds all lines busy approaches the time-averaged probability, given by Erlang’s formula. This is an example of a more general phenomenon, called the PASTA property (“Poisson arrivals see time averages”).
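
A quick numerical check of the finite-population calculation above (a sketch, with made-up parameter values): build π_M from the detailed balance relation, reweight it by the call-initiation rates η(M − j), and compare with π_{M−1}.

```python
import math

def finite_population_pi(M, C, eta, mu):
    """Equilibrium distribution pi_M(j) on 0..C: proportional to (M choose j)*(eta/mu)**j."""
    w = [math.comb(M, j) * (eta / mu) ** j for j in range(C + 1)]
    total = sum(w)
    return [x / total for x in w]

M, C, eta, mu = 20, 5, 0.3, 1.0
pi_M = finite_population_pi(M, C, eta, mu)
pi_M_minus_1 = finite_population_pi(M - 1, C, eta, mu)

# Distribution seen by an arriving call: weight pi_M(j) by the initiation rate eta*(M - j).
w = [pi_M[j] * (M - j) for j in range(C + 1)]
seen = [x / sum(w) for x in w]

print(max(abs(a - b) for a, b in zip(seen, pi_M_minus_1)))      # ~0: arrivals see pi_{M-1}
print(pi_M_minus_1[C], "<", pi_M[C])                            # loss probability comparison
```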

Exercises

Exercise 1.6 A Markov process has transition rates (q(j, k), j, k ∈ S), and equilibrium distribution (π(j), j ∈ S). A Markov chain is formed by observing the transitions of this process: at the successive jump times of the process, the state j just before, and the state k just after, the jump are recorded as an ordered pair (j, k) ∈ S². Write down the transition probabilities of the resulting Markov chain, and show that it has equilibrium distribution

π′(j, k) = G^{−1} π(j) q(j, k)

provided (1.3) holds.
Give an alternative interpretation of π′(j, k) in terms of the conditional probability of seeing the original process jump from state j to state k in the interval (t, t + δ), given that the process is in equilibrium and a jump occurs in that interval.

Exercise 1.7 Show that the mean number of circuits in use in the model leading to Erlang’s formula, i.e. the mean of the equilibrium distribution π, is

ν(1 − E(ν, C)).

Exercise 1.8 Show that

d/dν E(ν, C) = −(1 − E(ν, C))(E(ν, C) − E(ν, C − 1)).

Exercise 1.9 Car parking spaces are labelled n = 1, 2, . . . , where the label indicates the distance (in car lengths) to walk to a shop, and an arriving car parks in the lowest numbered free space. Cars arrive as a Poisson process of rate ν, and parking times are exponentially distributed with unit mean, and are independent of each other and of the arrival process. Show that the distance at which a newly arrived car parks has expectation

∑_{C=0}^{∞} E(ν, C),

where E(ν, C) is Erlang’s formula.
[Hint: calculate the probability the first C spaces are occupied.]

1.4 Further reading

Norris (1998) is a lucid and rigorous introduction to discrete and continuous time Markov chains. The fourth chapter of Whittle (1986) and the first chapter of Kelly (2011) give more extended discussions of reversibility than we have time for here.

2 Queueing networks

In this Chapter we look at simple models of a queue and of a network of queues. We begin by studying the departure process from a queue with Poisson arrivals and exponential service times. This will be important to understand if customers leaving the queue move on to another queue in a network.

2.1 An M/M/1 queue

Suppose that the stream of customers arriving at a queue (the arrival process) forms a Poisson process of rate λ. Suppose further there is a single server and that customers’ service times are independent of each other and of the arrival process and are exponentially distributed with parameter µ. Such a system is called an M/M/1 queue, the M’s indicating the memoryless (exponential) character of the interarrival and service times and the final digit indicating the number of servers. Let X(t) = j be the number of customers in the queue, including the customer being served. Then it follows from our description of the queue that X is a Markov process with transition rates

q(j, j + 1) = λ, j = 0, 1, . . . ;    q(j, j − 1) = µ, j = 1, 2, . . .

If the arrival rate λ is less than the service rate µ then the distribution

π(j) = (1 − ρ)ρ^j,   j = 0, 1, . . .

satisfies the detailed balance equations (1.4) and is thus the equilibrium distribution, where ρ = λ/µ is the traffic intensity.
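
As an illustrative check (not part of the notes), the following sketch simulates an M/M/1 queue for a long time and compares the fraction of time spent in each state with the geometric distribution above.

```python
import random

def mm1_occupancy(lam, mu, t_end, seed=3):
    """Simulate an M/M/1 queue and return the fraction of time spent with j customers."""
    rng = random.Random(seed)
    t, x, time_in = 0.0, 0, {}
    while t < t_end:
        rate = lam + (mu if x > 0 else 0.0)        # total transition rate out of state x
        dt = rng.expovariate(rate)
        time_in[x] = time_in.get(x, 0.0) + min(dt, t_end - t)
        t += dt
        if rng.random() < lam / rate:
            x += 1                                  # arrival
        else:
            x -= 1                                  # departure
    return {j: s / t_end for j, s in time_in.items()}

lam, mu = 1.0, 2.0
rho = lam / mu
empirical = mm1_occupancy(lam, mu, t_end=100_000.0)
for j in range(5):
    print(j, round(empirical.get(j, 0.0), 4), round((1 - rho) * rho ** j, 4))
```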

Let (X(t), t ∈ R) be a stationary M/M/1 queue. Since it is reversible, it is indistinguishable from the same process run backwards in time:

(X(t), t ∈ R) =^D (X(−t), t ∈ R),

where =^D means equal in distribution. That is, there is no statistic that can distinguish the two processes.

Now consider the sample path of the queue. We can represent it in terms of two point processes (viewed here as sets of random times): the arrival process A, and the departure process D.

Figure 2.1 Sample path of the M/M/1 queue. + marks the points of A, − marks the points of D.

The arrival process A records the jumps up of the Markov process X(t); the departure process D records the jumps down of the Markov process X(t). By assumption, A is a Poisson process of rate λ. On the other hand, A is defined on (X(t), t ∈ R) exactly as D is defined on (X(−t), t ∈ R) – the departures of the original process are the arrivals of the reversed process. Since (X(t), t ∈ R) and (X(−t), t ∈ R) are distributionally equivalent, we conclude that D is also a Poisson process of rate λ. That is, we have shown the following theorem.

jumps up of the Markov processX(t); the departure processD records thejumps down of the Markov processX(t). By assumption,A is a Poissonprocess of rateλ. On the other hand,A is defined on (X(t), t ∈ R) exactly asD is defined on (X(−t), t ∈ R) – the departures of the original process arethe arrivals of the reversed process. Since (X(t), t ∈ R) and (X(−t), t ∈ R)are distributionally equivalent, we conclude thatD is also a Poisson processof rateλ. That is, we have shown the following theorem.

Theorem 2.1 (Burke, Reich) In equilibrium, the departure process from an M/M/1 queue is a Poisson process.

Remark 2.2 Time reversibility for the systems we consider holds only in equilibrium! For example, if we condition on the queue size being positive, then the time until the next departure is again exponential, but now with parameter µ rather than λ. A related comment is that A and D are both Poisson processes, but they are by no means independent.

To investigate the dependence structure, we introduce the following notation. For random variables or processes B_1 and B_2 we will write B_1 ⊥⊥ B_2 to mean that B_1 and B_2 are independent. Now, for a fixed time t_0 ∈ R, the state of the queue up to time t_0 is independent of the future arrivals:

(X(t), t ≤ t_0) ⊥⊥ (A ∩ (t_0, ∞)).

Applying this to the reversed process, we get for every fixed time t_1 ∈ R

(X(t), t ≥ t_1) ⊥⊥ (D ∩ (−∞, t_1)).

In particular, when the queue is in equilibrium, the number of people that are present in the queue at time t_1 is independent of the departure process up to time t_1. (But clearly it isn’t independent of the future departure process!)

Exercises

Exercise 2.1 Upon an M/M/1 queue is imposed the additional constraint that arriving customers who find N customers already present leave and never return. Find the equilibrium distribution of the queue.

Exercise 2.2 An M/M/2 queue has two identical servers at the front. Find a Markov chain representation for it, and determine the equilibrium distribution of a stationary M/M/2 queue. Is the queue reversible?
[Hint: you can still use the number of customers in the queue as the state of the Markov chain.]

Deduce from the equilibrium distribution that the proportion of time both servers are idle is

π(0) = (1 − ρ)/(1 + ρ),   where ρ = λ/(2µ) < 1.

Exercise 2.3 An M/M/1 queue has arrival rate ν and service rate µ, where ρ = ν/µ < 1. Show that the sojourn time (= queueing time + service time) of a typical customer is exponentially distributed with parameter µ − ν.

Exercise 2.4 Consider an M/M/∞ queue with servers numbered 1, 2, . . . On arrival a customer chooses the lowest numbered server which is free. Calculate the equilibrium probability that server j is busy.
[Hint: compare with Exercise 1.9, and calculate the expected number out of the first C servers that are busy.]

2.2 A series of M/M/1 queues

The most obvious application of Theorem 2.1 is to a series of J queues arranged so that when a customer leaves a queue she joins the next one, until she has passed through all the queues, as illustrated in Figure 2.2.

Suppose the arrival stream at queue 1 is Poisson at rate ν, and that service times at queue j are exponentially distributed with parameter µ_j, where ν < µ_j for j = 1, 2, . . . , J. Suppose further that service times are independent of each other, including those of the same customer in different queues, and of the arrival process at queue 1.

Figure 2.2 A series of M/M/1 queues: arrival rate ν, service rates µ_1, µ_2, . . . , µ_J, queue lengths n_1, n_2, . . . , n_J.

By the analysis in the previous section, if we look at the system in equilibrium, then (by induction) each of the queues has a Poisson arrival process, so this really is a series of M/M/1 queues. Let n(t) = (n_1(t), . . . , n_J(t)) be the Markov process giving the number of customers in each of the queues at time t. In equilibrium we know the marginal distribution of the components:

π_j(n_j) = (1 − ρ_j) ρ_j^{n_j},   n_j = 0, 1, . . .

where ρ_j = ν/µ_j. But what is the joint distribution of all J queue lengths? Equivalently, what is the dependence structure between them? To discover this, we’ll develop a little further the induction that gave us the marginal distributions.

For a fixed time t_0, consider the following quantities:

1. n_1(t_0);
2. departures from queue 1 prior to t_0;
3. service times of customers in queues 2, 3, . . . , J;
4. the remaining vector (n_2(t_0), . . . , n_J(t_0)).

In the previous section, we have established (1) ⊥⊥ (2). Also, (3) ⊥⊥ (1, 2) by construction: the customers’ service times at queues 2, . . . , J are independent of the arrival process at queue 1 and service times there. Therefore, (1), (2), and (3) are mutually independent, and in particular (1) ⊥⊥ (2, 3). On the other hand, (4) is a function of (2, 3), so we conclude that (1) ⊥⊥ (4).

Similarly, for each j we have n_j(t_0) ⊥⊥ (n_{j+1}(t_0), . . . , n_J(t_0)). Therefore the equilibrium distribution π(n) factorizes:

π(n_1, . . . , n_J) = ∏_{j=1}^{J} π_j(n_j).
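
The factorization can be tested numerically. Below is an illustrative simulation (not from the notes) of two M/M/1 queues in series; it compares the empirical joint distribution of the two queue lengths with the product of the geometric marginals.

```python
import random

def tandem_two_queues(nu, mu1, mu2, t_end, seed=4):
    """Simulate two M/M/1 queues in series; record time-weighted joint queue lengths."""
    rng = random.Random(seed)
    t, n1, n2 = 0.0, 0, 0
    occupancy = {}
    while t < t_end:
        rates = {"arrival": nu,
                 "serve1": mu1 if n1 > 0 else 0.0,
                 "serve2": mu2 if n2 > 0 else 0.0}
        total = sum(rates.values())
        dt = rng.expovariate(total)
        occupancy[(n1, n2)] = occupancy.get((n1, n2), 0.0) + min(dt, t_end - t)
        t += dt
        u, acc = rng.random() * total, 0.0
        for event, rate in rates.items():
            acc += rate
            if u <= acc:
                break
        if event == "arrival":
            n1 += 1
        elif event == "serve1":
            n1, n2 = n1 - 1, n2 + 1        # a departure from queue 1 joins queue 2
        else:
            n2 -= 1                         # a departure from queue 2 leaves the system
    return {state: s / t_end for state, s in occupancy.items()}

nu, mu1, mu2 = 1.0, 2.0, 3.0
rho1, rho2 = nu / mu1, nu / mu2
emp = tandem_two_queues(nu, mu1, mu2, t_end=50_000.0)
for a in range(3):
    for b in range(3):
        product = (1 - rho1) * rho1 ** a * (1 - rho2) * rho2 ** b
        print((a, b), round(emp.get((a, b), 0.0), 4), round(product, 4))
```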

You can use this technique to show a number of other results for a series of M/M/1 queues; for example, Exercise 2.5 shows that the sojourn times of a customer in successive queues are independent. However, the technique is fragile; in particular, it does not allow a customer to leave a later queue and return to an earlier one. We will next develop a set of tools that doesn’t give such fine-detail results, but can tolerate more general flow patterns.

Exercises

Exercise 2.5 Suppose that customers at each queue in a stationary series of M/M/1 queues are served in the order of their arrival. Note that, from a sample path such as that illustrated in Figure 2.1, each arrival time can be matched to a departure time, corresponding to the same customer. Argue that the sojourn time of a customer in queue 1 is independent of departures from queue 1 prior to her departure. Deduce that in equilibrium the sojourn times of a customer at each of the J queues are independent.

2.3 Closed migration processes

In this section, we will analyze a generalization of the series of queues example. It is simplest to just give a Markovian description.

Figure 2.3 Closed migration process (five colonies, with n_j individuals in colony j).

The state space of the Markov process is S = {n ∈ Z_+^J : ∑_{j=1}^{J} n_j = N}. Each state is written as n = (n_1, . . . , n_J), where n_j is the number of individuals in colony j. For two different colonies j and k, define the operator T_{jk} as

asn = (n1, . . . , nJ) wherenj is the number ofindividualsin colony j.For two different coloniesj andk, define the operatorT jk as

T_{jk}(n_1, . . . , n_J) = (n_1, . . . , n_j − 1, . . . , n_k + 1, . . . , n_J), j < k;
T_{jk}(n_1, . . . , n_J) = (n_1, . . . , n_k + 1, . . . , n_j − 1, . . . , n_J), j > k.

That is, T_{jk} transfers one individual from colony j to colony k.

We now describe the rate at which transitions occur in our state space. We will only allow individuals to move one at a time, so transitions can only occur between a state n and T_{jk}n for some j, k. We will assume that the transition rates have the form

q(n, T_{jk}n) = λ_{jk} φ_j(n_j),   φ_j(0) = 0.

That is, it is possible to factor the rate into a product of two functions: one depending only on the two colonies j and k, and another depending only on the number of individuals in the “source” colony j.

We will suppose that n is irreducible in S (in particular, that it is possible for individuals to get from any colony to any other colony, possibly in several steps). In this case, we call n a closed migration process.

We can model an s-server queue at colony j by taking φ_j(n) = min(n, s). Each of the customers requires an exponential service time with parameter λ_j = ∑_k λ_{jk}; and once service is completed, the individual goes to colony k with probability λ_{jk}/λ_j.

Another important example is φ_j(n) = n for all j. These are the transition rates we get if individuals move independently of one another. This can be thought of as a network of infinite-server queues (corresponding to s = ∞ above), and is an example of a linear migration process; we’ll study these further in Section 2.6. If N = 1 then the single individual performs a random walk on the set of colonies, with equilibrium distribution (α_j), where the α_j satisfy

α_j > 0,   ∑_j α_j = 1,
α_j ∑_k λ_{jk} = ∑_k α_k λ_{kj},   j = 1, 2, . . . , J.    (2.1)

We refer to these equations as the traffic equations, and we use them to define the quantities (α_j) in terms of (λ_{jk}) for a general closed migration process.

Remark 2.3 Note that the quantities λ_{jk} and φ_j(·) are only well-defined up to a constant factor. In particular, the (α_j) are only well-defined after we’ve picked the particular set of (λ_{jk}).

Theorem 2.4 The equilibrium distribution for a closed migration process is

π(n) = G_N^{−1} ∏_{j=1}^{J} ( α_j^{n_j} / ∏_{r=1}^{n_j} φ_j(r) ),   n ∈ S.

Here, G_N is a normalizing constant, chosen so the distribution sums to 1, and (α_j) are the solution to the traffic equations (2.1).

Remark 2.5 Although this expression looks somewhat complicated, its form is really quite simple: the joint distribution factors as a product over individual colonies.

Proof In order to check that this is the equilibrium distribution, it suffices to verify that for each n the full balance equations hold:

π(n) ∑_j ∑_k q(n, T_{jk}n) ?= ∑_j ∑_k π(T_{jk}n) q(T_{jk}n, n).

These will be satisfied provided the following set of partial balance equations hold:

π(n) ∑_k q(n, T_{jk}n) ?= ∑_k π(T_{jk}n) q(T_{jk}n, n)   ∀ j.

That is, it suffices to check that, from any state, the rate of individuals leaving a given colony j is the same as the rate of individuals arriving into it. We now recall

q(n, T_{jk}n) = λ_{jk} φ_j(n_j),    q(T_{jk}n, n) = λ_{kj} φ_k(n_k + 1)

and from the claimed form for π we have that

π(T_{jk}n) = π(n) · (φ_j(n_j)/α_j) · (α_k/φ_k(n_k + 1)).

(T_{jk}n has one more customer in colony k than n does, hence the appearance of n_k + 1 in the arguments.) After substituting and cancelling terms, we see that the partial balance equations are equivalent to

∑_k λ_{jk} ?= (1/α_j) ∑_k α_k λ_{kj},

which is true by the definition of the α_j. □

Remark 2.6 The full balance equations state that the total probability flux into and out of any state is the same. The detailed balance equations state that the total probability flux between any pair of states is the same. Partial balance says that, for a fixed state, there is a subset of the states for which the total probability flux into and out of the subset is equal.

Example 2.7 A telephone banking facility has N incoming lines and a single (human) operator. Calls to the facility are initiated as a Poisson process of rate ν, but calls initiated when all N lines are in use are lost. A call finding a free line has then to wait for the operator to answer. The operator deals with waiting calls one at a time, and takes an exponentially distributed length of time with parameter λ to check the caller’s identity, after which the call is passed to an automated handling system for the caller to transact banking business, and the operator is freed to deal with another caller. The automated handling system is able to serve up to N callers simultaneously, and the time it takes to serve a call is exponentially distributed with parameter µ. All these lengths of time are independent of each other and of the Poisson arrival process.

Figure 2.4 Closed migration process for the telephone banking facility: $n_1$ lines free, $n_2$ lines waiting for service, $n_3$ connected calls.

We model this system as a closed migration process as in Figure 2.4. The transition rates correspond to
\[
\lambda_{12} = \nu, \quad \phi_1(n_1) = I[n_1 > 0]; \qquad
\lambda_{23} = \lambda, \quad \phi_2(n_2) = I[n_2 > 0]; \qquad
\lambda_{31} = \mu, \quad \phi_3(n_3) = n_3.
\]
We can easily solve the traffic equations
\[
\alpha_1 : \alpha_2 : \alpha_3 = \frac{1}{\nu} : \frac{1}{\lambda} : \frac{1}{\mu}
\]
since we have a random walk on three vertices, and these are the average amounts of time it spends in each of the vertices. Therefore, by Theorem 2.4,
\[
\pi(n_1, n_2, n_3) \propto \frac{1}{\nu^{n_1}}\, \frac{1}{\lambda^{n_2}}\, \frac{1}{\mu^{n_3}}\, \frac{1}{n_3!}.
\]
For example, the proportion of incoming calls that are lost is, by the PASTA property,
\[
P(n_1 = 0) = \sum_{n_2 + n_3 = N} \pi(0, n_2, n_3).
\]
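For a concrete feel for this example, here is a small sketch (with hypothetical parameter values, none of them from the text) that evaluates the loss probability $P(n_1 = 0)$ by summing the unnormalized equilibrium weights over the state space $\{n_1 + n_2 + n_3 = N\}$.

```python
from itertools import product
from math import factorial

def loss_probability(N, nu, lam, mu):
    """P(n1 = 0) for the telephone banking model, by direct summation of the
    unnormalized equilibrium weights of Example 2.7 over {n1 + n2 + n3 = N}."""
    weight = lambda n1, n2, n3: nu ** (-n1) * lam ** (-n2) * mu ** (-n3) / factorial(n3)
    states = [n for n in product(range(N + 1), repeat=3) if sum(n) == N]
    total = sum(weight(*n) for n in states)
    blocked = sum(weight(*n) for n in states if n[0] == 0)
    return blocked / total

# Hypothetical values: 5 lines, 20 calls/hour, operator rate 30/hour, handling rate 10/hour.
print(loss_probability(N=5, nu=20.0, lam=30.0, mu=10.0))
```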


Exercises

Exercise 2.6 For the telephone banking example, show that the proportion of calls lost has the form
\[
\frac{H(N)}{\sum_{n=0}^{N} H(n)}
\]
where
\[
H(n) = \Big(\frac{\nu}{\lambda}\Big)^{n} \sum_{i=0}^{n} \Big(\frac{\lambda}{\mu}\Big)^{i} \frac{1}{i!}.
\]

Exercise 2.7 A restaurant has $N$ tables, with a customer seated at each table. Two waiters are serving them. One of the waiters moves from table to table taking orders for food. The time that he spends taking orders at each table is exponentially distributed with parameter $\mu_1$. He is followed by the wine waiter who spends an exponentially distributed time with parameter $\mu_2$ taking orders at each table. Customers always order food first and then wine, and orders cannot be taken concurrently by both waiters from the same customer. All times taken to order are independent of each other. A customer, after having placed her two orders, completes her meal at rate $\nu$, independently of the other customers. As soon as a customer finishes her meal, she departs and a new customer takes her place and waits to order. Model this as a closed migration process. Show that the stationary probability that both waiters are busy can be written in the form
\[
\frac{G(N-2)}{G(N)} \cdot \frac{\nu^2}{\mu_1\mu_2}
\]
for a function $G(\cdot)$, which may also depend on $\nu, \mu_1, \mu_2$, to be determined.

2.4 Open migration processes

It is simple to modify the previous model so as to allow customers to enter and exit the system. Define the operators
\[
T_{j\to} n = (n_1, \dots, n_j - 1, \dots, n_J), \qquad T_{\to k} n = (n_1, \dots, n_k + 1, \dots, n_J).
\]
$T_{j\to}$ corresponds to an individual from colony $j$ departing the system; $T_{\to k}$ corresponds to an individual entering colony $k$ from the outside world. We'll assume that the transition rates associated with these extra possibilities are
\[
q(n, T_{jk} n) = \lambda_{jk}\phi_j(n_j); \qquad q(n, T_{j\to} n) = \mu_j\phi_j(n_j); \qquad q(n, T_{\to k} n) = \nu_k.
\]


That is, immigration into colony $k$ is Poisson. We can think of $\to$ as simply another colony, but with an infinite number of individuals, so that individuals entering or leaving it do not change the overall rate.

If the resulting process $n$ is irreducible in $\mathbb{Z}_+^J$, we call $n$ an open migration process. As before, we define quantities $(\alpha_1, \dots, \alpha_J)$, which satisfy the new traffic equations
\[
\alpha_j\Big(\mu_j + \sum_k \lambda_{jk}\Big) = \nu_j + \sum_k \alpha_k \lambda_{kj} \qquad \forall j. \tag{2.2}
\]
These equations have a unique, positive solution (Exercise 2.9).

Since the state-space of an open migration process is infinite, it is possible that there may not exist an equilibrium distribution. Define the constants
\[
g_j = \sum_{n=0}^{\infty} \frac{\alpha_j^{n}}{\prod_{r=1}^{n} \phi_j(r)}, \qquad j = 1, 2, \dots, J.
\]

Theorem 2.8 If $g_1, \dots, g_J < \infty$, then $n$ has the equilibrium distribution
\[
\pi(n) = \prod_{j=1}^{J} \pi_j(n_j), \qquad \pi_j(n_j) = g_j^{-1}\, \frac{\alpha_j^{n_j}}{\prod_{r=1}^{n_j} \phi_j(r)}.
\]

Proof Once again, we need to check for each $n$ the full balance equations
\[
\pi(n)\Big(\sum_j \sum_k q(n, T_{jk} n) + \sum_j q(n, T_{j\to} n) + \sum_k q(n, T_{\to k} n)\Big)
\overset{?}{=} \sum_j \sum_k \pi(T_{jk} n)\, q(T_{jk} n, n) + \sum_j \pi(T_{j\to} n)\, q(T_{j\to} n, n) + \sum_k \pi(T_{\to k} n)\, q(T_{\to k} n, n),
\]
which will be satisfied if we can solve the partial balance equations
\[
\pi(n)\Big(\sum_k q(n, T_{jk} n) + q(n, T_{j\to} n)\Big)
\overset{?}{=} \sum_k \pi(T_{jk} n)\, q(T_{jk} n, n) + \pi(T_{j\to} n)\, q(T_{j\to} n, n) \tag{2.3}
\]
and
\[
\pi(n)\sum_k q(n, T_{\to k} n) \overset{?}{=} \sum_k \pi(T_{\to k} n)\, q(T_{\to k} n, n). \tag{2.4}
\]


(These look a lot like the earlier partial balance equations with an added colony called $\to$.) Substitution of the claimed form for $\pi(n)$ will verify that equations (2.3) are satisfied (check this!).

To see that equations (2.4) are satisfied, we substitute
\[
q(n, T_{\to k} n) = \nu_k, \qquad \pi(T_{\to k} n) = \pi(n)\, \frac{\alpha_k}{\phi_k(n_k + 1)}, \qquad q(T_{\to k} n, n) = \mu_k \phi_k(n_k + 1)
\]
to get after cancellation
\[
\sum_k \nu_k \overset{?}{=} \sum_k \alpha_k \mu_k.
\]
This is not directly one of the traffic equations (2.2): instead, it is the sum of these equations over all $j$. $\square$
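As a small illustration of Theorem 2.8 (a sketch only, with hypothetical rates rather than anything from the text): the traffic equations (2.2) are a linear system and can be solved directly, and with single-server colonies, $\phi_j(n) = I[n > 0]$, the marginal of $n_j$ is then geometric with parameter $\alpha_j$.

```python
import numpy as np

# A small open migration network of single-server colonies (hypothetical rates).
lam = np.array([[0.0, 0.5, 0.2],   # lam[j, k]: rate factor for a j -> k move
                [0.1, 0.0, 0.4],
                [0.0, 0.3, 0.0]])
mu = np.array([0.8, 0.6, 0.9])     # departure rate factors mu_j
nu = np.array([0.3, 0.1, 0.2])     # external Poisson arrival rates nu_j
J = len(nu)

# Traffic equations (2.2): alpha_j (mu_j + sum_k lam[j,k]) = nu_j + sum_k alpha_k lam[k,j].
M = np.diag(mu + lam.sum(axis=1)) - lam.T
alpha = np.linalg.solve(M, nu)
print("alpha =", alpha)

# With phi_j(n) = 1{n > 0}, g_j = 1/(1 - alpha_j) when alpha_j < 1, so the
# equilibrium marginal of n_j is geometric: pi_j(n) = (1 - alpha_j) * alpha_j ** n.
assert (alpha < 1).all(), "an equilibrium distribution requires alpha_j < 1 for every j"
for j in range(J):
    print(f"colony {j}: P(n_j = 0) = {1 - alpha[j]:.4f},  E[n_j] = {alpha[j] / (1 - alpha[j]):.4f}")
```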

We conclude that, in equilibrium, at any fixed time $t$ the random variables $n_1(t), n_2(t), \dots, n_J(t)$ are independent (although in most open migration networks they won't be independent as processes).

A remarkable property of the distribution $\pi_j(n_j)$ is that it is the same as if the arrivals at colony $j$ were a Poisson process of rate $\alpha_j\lambda_j$, with departures happening at rate $\lambda_j\phi_j(n_j)$, where we define $\lambda_j = \mu_j + \sum_k \lambda_{jk}$. However, in general the entire process of arrivals of individuals into a colony is not Poisson. For example, consider the system in Figure 2.5 below. If $\nu$ is quite small but $\lambda_{21}/\mu_2$ is quite large, the typical arrival process into queue 1 will look like the right-hand picture: rare bursts of arrivals of geometric size.

Figure 2.5 Simple open migration network. The typical arrival process into queue 1 is illustrated on the right.

While an open migration process is not in general reversible (the detailed balance equations do not hold), it does behave nicely under time reversal: the reversed process also looks like an open migration process.

Theorem 2.9 If $(n(t), t \in \mathbb{R})$ is a stationary open migration process, then so is the reversed process $(n(-t), t \in \mathbb{R})$.

Proof This requires checking that the transition rates have the required form. Using Proposition 1.1 we have
\[
q'(n, T_{jk} n) = \frac{\pi(T_{jk} n)}{\pi(n)}\, q(T_{jk} n, n) = \frac{\phi_j(n_j)}{\alpha_j}\, \frac{\alpha_k}{\phi_k(n_k+1)}\, \lambda_{kj}\phi_k(n_k+1) = \lambda'_{jk}\phi_j(n_j), \quad \text{where } \lambda'_{jk} = \frac{\alpha_k}{\alpha_j}\lambda_{kj}.
\]
Similarly we find (check this) that the remaining transition rates of the reversed open migration process have the form
\[
q'(n, T_{j\to} n) = \mu'_j\phi_j(n_j), \qquad q'(n, T_{\to k} n) = \nu'_k,
\]
where $\mu'_j = \nu_j/\alpha_j$ and $\nu'_k = \alpha_k\mu_k$. $\square$

Corollary 2.10 The exit process from colony $k$, i.e. the stream of individuals leaving the system from colony $k$, is a Poisson process of rate $\alpha_k\mu_k$.

Of course, as Figure 2.5 shows, not all of the internal streams of individuals can be Poisson! However, slightly more is true than we have said so far.

Remark 2.11 Consider the open migration process illustrated in Figure 2.6; the circles are colonies and the arrows indicate positive values of $\lambda$, $\nu$, $\mu$.

Figure 2.6 Example open migration network. The dashed sets of colonies will be useful in Exercise 2.10.

In Exercise 2.10 we will show that the stream of customers is Poisson not only along those arrows marked with a P, but also along all the boldfaced arrows.

We have so far tacitly assumed that the open migration process has a finite number of colonies, but we haven't explicitly used this assumption; the only extra condition we require to accommodate $J = \infty$ is to insist that the equilibrium distribution can be normalized, i.e. that $\prod_{j=1}^{\infty} g_j^{-1} > 0$. We will now discuss an example of an open migration process with $J$ infinite.


Family size process

This process was studied by Kendall (1975), who was interested in family names in Yorkshire. An immigrating individual has a distinguishing characteristic, such as a surname or a genetic type, which is passed on to all his descendants. The population is divided into families, each of which consists of all those individuals alive with a given characteristic. The process can also be used to study models of preferential attachment, where new objects (e.g. web pages, or news stories) tend to attach to popular objects.

Let $n_j$ be the number of families of size $j$. Then the family size process $(n_1, n_2, \dots)$ is a linear open migration process with transition rates
\[
\begin{aligned}
q(n, T_{j,j+1} n) &= j\lambda n_j, \quad j = 1, 2, \dots &&(\lambda \text{ is the birth rate})\\
q(n, T_{j,j-1} n) &= j\mu n_j, \quad j = 2, 3, \dots &&(\mu \text{ is the death rate})\\
q(n, T_{\to 1} n) &= \nu &&(\text{immigration})\\
q(n, T_{1\to} n) &= \mu n_1 &&(\text{extinction of a family}).
\end{aligned}
\]
This corresponds to an open migration process with $\phi_j(n_j) = n_j$ (and, for example, $\lambda_{j,j+1} = j\lambda$). Observe that a family is the basic unit which moves through the colonies of the system, and that the movements of different families are independent.

The traffic equations have a solution (check!)
\[
\alpha_j = \frac{\nu}{\lambda j}\Big(\frac{\lambda}{\mu}\Big)^{j}
\]
with
\[
g_j = \sum_{n=0}^{\infty} \frac{\alpha_j^{n}}{n!} = e^{\alpha_j}, \qquad \prod_{i=1}^{\infty} g_i^{-1} = e^{-\sum_i \alpha_i} > 0 \quad \text{if } \lambda < \mu.
\]
The condition $\lambda < \mu$ is to be expected, for stability. Under this condition the equilibrium distribution is
\[
\pi(n) = \prod_{j=1}^{\infty} e^{-\alpha_j}\, \frac{\alpha_j^{n_j}}{n_j!}
\]
and thus the number of families of size $j$, $n_j$, has a Poisson distribution with mean $\alpha_j$, and $n_1, n_2, \dots$ are independent. Hence the total number of distinct families, $N = \sum_j n_j$, has a Poisson distribution with mean
\[
\sum_j \alpha_j = -\frac{\nu}{\lambda}\log\Big(1 - \frac{\lambda}{\mu}\Big),
\]
from the series expansion of $-\log(1 - x)$.
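A short numerical sketch of these formulas (with hypothetical parameter values; none of the numbers come from the text): summing the $\alpha_j$ over a long truncated range reproduces the closed-form mean of $N$, and the same sum weighted by $j$ gives the expected population size (a simple consequence of the geometric series, not stated in the text).

```python
import numpy as np

# Family size process with hypothetical parameters (stability needs lam < mu).
nu, lam, mu = 2.0, 1.0, 1.5

j = np.arange(1, 200)                      # truncate the infinite range of family sizes
alpha = nu / (lam * j) * (lam / mu) ** j   # alpha_j = E[number of families of size j]

# E[N]: truncated sum of the alpha_j versus the closed form -(nu/lam) log(1 - lam/mu).
print(alpha.sum(), -(nu / lam) * np.log(1 - lam / mu))

# Expected population size E[M] = sum_j j * alpha_j = (nu/lam) (lam/mu) / (1 - lam/mu).
print((j * alpha).sum(), (nu / lam) * (lam / mu) / (1 - lam / mu))
```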


Exercises

Exercise 2.8 Show that the reversed process obtained from a stationary closed migration process is also a closed migration process, and determine its transition rates.

Exercise 2.9 Show that there exists a unique, positive solution to the traffic equations (2.2). [Hint: define $\alpha_\to = 1$.]

Exercise 2.10 Consider Figure 2.6. Show that each of the indicated subsets of colonies is itself an open migration process. [Hint: induction.]

Suppose that, in an open migration process, individuals from colony $k$ cannot reach colony $j$ without leaving the system. Show that the process of individuals transitioning directly from colony $j$ to colony $k$ is Poisson. Conclude that the processes of individuals transitioning along bold arrows are Poisson.
[Hint: write $j \leadsto k$ if individuals leaving colony $j$ can reach colony $k$ by some path with positive probability. Form subsets of colonies as equivalence classes, using the equivalence relation: $j \sim k$ if both $j \leadsto k$ and $k \leadsto j$.]

Exercise 2.11 In the family size process show that the points in time at which family extinctions occur form a Poisson process.

Exercise 2.12 In the family size process, let
\[
M = \sum_{j=1}^{\infty} j\, n_j,
\]
the population size. Determine the distribution of $n$ conditional on $M$, and show that it can be written in the form
\[
\pi(n \mid M) = \binom{\theta + M - 1}{M}^{-1} \prod_{j=1}^{M} \Big(\frac{\theta}{j}\Big)^{n_j} \frac{1}{n_j!} \tag{2.5}
\]
where $\theta = \nu/\lambda$.
[Hint: consider the coefficients of $x^M$ in the power series expansions of the identity
\[
(1 - x)^{-\theta} = \prod_{j=1}^{\infty} \exp\Big(\frac{\theta}{j} x^j\Big). \;]
\]


Exercise 2.13 (The Chinese restaurant process) An initially empty restaurant has an unlimited number of tables each capable of sitting an unlimited number of customers. Customers numbered $1, 2, \dots$ arrive one by one and are seated as follows. Customer 1 sits at a new table. For $m \ge 1$ suppose $m$ customers are seated in some arrangement: then customer $m+1$ sits at a new table with probability $\theta/(\theta + m)$ and sits at a given table already seating $j$ customers with probability $j/(\theta + m)$.

Let $n_j$ be the number of tables occupied by $j$ customers. Show that after the first $M$ customers have arrived the distribution of $(n_1, n_2, \dots)$ is given by expression (2.5).

Deduce that for the family size process the expected number of distinct families (e.g. surnames, genetic types, or news stories) given the population size is
\[
E(N \mid M) = \sum_{i=1}^{M} \frac{\theta}{\theta + i - 1}.
\]

2.5 Little’s law

Consider a stochastic process $(X(t), t \ge 0)$ on which is defined a function $n(t) = n(X(t))$, the number of customers in the system at time $t$. Suppose that with probability 1 there exists a time $T_1$ such that the continuation of the process beyond $T_1$ is a probabilistic replica of the process starting at time 0: this implies the existence of further times $T_2, T_3, \dots$ having the same property as $T_1$. These times are called regeneration points. Suppose also that the regeneration point is such that $n(T_1) = 0$. (For example, in an open migration process a regeneration point may be "entire system is empty".) Suppose $E T_1 < \infty$. Then we can define the mean number of customers in the system
\[
\lim_{T\to\infty} \frac{1}{T}\int_0^T n(s)\, ds \;\overset{\text{w.p.1}}{=}\; \lim_{T\to\infty} \frac{1}{T}\int_0^T E\, n(s)\, ds = \frac{E\int_0^{T_1} n(s)\, ds}{E T_1} =: L
\]
(that is, the average number of customers in the system can be computed from a single regeneration cycle – treat this as an assertion for the moment).

Also, let $W_n$, $n = 1, 2, \dots$ be the amount of time spent by the $n$th customer in the system. Then we can define the mean time spent by a customer in the system
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} W_i \;\overset{\text{w.p.1}}{=}\; \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E W_i = \frac{E\sum_{n=1}^{N} W_n}{E N} =: W,
\]
where $N$ is the number of customers who arrive during the first regeneration cycle $[0, T_1]$.

Finally, we can define the mean arrival rate
\[
\lim_{T\to\infty} \frac{1}{T}(\text{number of arrivals in } [0, T]) \;\overset{\text{w.p.1}}{=}\; \lim_{T\to\infty} \frac{1}{T} E(\text{number of arrivals in } [0, T]) = \frac{E N}{E T_1} =: \lambda.
\]
Note that we do not assume that the arrivals are Poisson; $\lambda$ is simply the long-run average arrival rate.

Remark 2.12 We will not prove any of these statements. (They follow from renewal theory, and are discussed further in Appendix B.) But we will establish a straightforward consequence of these statements.

Note that the definition of what constitutes the system, or a customer, is left flexible. We require only consistency of meaning in the phrases "number of customers in the system", "time spent in the system", "number of arrivals". For example, suppose that in an open migration process we want to define "the system" to be a subset of the colonies. Then $n(t)$ is the number of customers present in the subset of the colonies, but we have a choice for the arrivals and time periods. We could view each visit of an individual to the subset as a distinct arrival, generating a distinct time spent in "the system". Alternatively, we could count each distinct individual as a single arrival, and add up the times of that individual's visits into a single time spent in the system.

Theorem 2.13(Little’s law) L = λW.

Proof Note

L =E

∫ T1

0n(s)ds

ET1=E

∫ T1

0n(s)ds

ENENET1

whereas

λW =ENET1

E∑N

i=1 Wi

EN.

Thus, it suffices to show∫ T1

0n(s)ds=

N∑

i=1

Wi . (2.6)

Figure 2.7 illustrates this equality for a simple queue: the shaded area canbe calculated as an integral over time, or as a sum over customers. More

Page 46: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

40 Queueing networks

generally we see equality holds by the following argument. Imagine eachcustomer pays at a rate of 1 per unit time while in the system (andso thetotal paid by thenth customer in the system is justWn). Then the two sidesof (2.6) are just different ways of calculating the total amount paid duringa regenerative cycle.

Figure 2.7 Shaded area is $\int_0^{T_1} n(s)\, ds$, but also $\sum_{i=1}^{N} W_i$.

Remark 2.14 Little’s law holds whenever we can reasonably define thequantitiesL, λ, andW. The proof of the relationship between them is goingto be essentially the same; the question is simply whether itmakes sense totalk of the average arrival rate, number of customers in the system, or timespent in the system.

Exercises

Exercise 2.14 The Orchard tea garden remains open 24 hours per day, 365 days per year. The total number of customers served in the tea garden during 2012 was 21% greater than the total for 2011. In each year, the number of customers in the tea garden was recorded at a large number of randomly selected times, and the average of those numbers in 2012 was 16% greater than the average in 2011. By how much did the average duration of a customer visit to the Orchard increase or decrease?

Exercise 2.15 Use Little’s law to check the result in Exercise 1.7.

Exercise 2.16 Customers arrive according to a Poisson process with rate $\nu$ at a single server, but a restricted waiting room causes those who arrive when $n$ customers are already present to be lost. Accepted customers have service times which are independent and identically distributed with mean $\mu$, and independent of the arrival process. (Service times are not necessarily exponentially distributed.) Identify a regeneration point for the system. If $P_j$ is the long-run proportion of time $j$ customers are present, show that
\[
1 - P_0 = \nu\mu(1 - P_n).
\]

Exercise 2.17 Consider two possible designs for an $S$-server queue. We may have all the customers wait in a single queue; the person at the head of the line will be served by the first available server. (This is the arrangement at the Cambridge railway ticket office.) Alternatively, we might have $S$ queues (one per server), with each customer choosing a queue to join when she enters the system. Assuming that customers can switch queues, and in particular will not allow a server to go idle while anyone is waiting, is there a difference in the expected waiting time between the two systems? Is there a difference in the variance?

2.6 Linear migration processes

In this Section we shall consider systems which have the property that after individuals (or customers or particles) have entered the system they move independently through it. We've seen the example of linear open migration networks, where in equilibrium the numbers in distinct colonies are independent Poisson random variables. We'll see that this result can be substantially generalized, to time-dependent linear migration processes on a general space. We'll need a little more mathematical infrastructure, which is useful also to better comprehend Poisson processes.

Let $\mathcal{X}$ be a set, and $\mathcal{F}$ a $\sigma$-algebra of measurable subsets of $\mathcal{X}$. (If you are not familiar with measure theory, then just view the elements of $\mathcal{F}$ as the subsets of $\mathcal{X}$ in which we are interested.)

A Poisson point process with mean measure $M$ is a point process for which, if $E_1, \dots, E_k \in \mathcal{F}$ are disjoint sets, then the numbers of points in $E_1, \dots, E_k$ are independent Poisson random variables, with $M(E_i)$ the mean number of points in $E_i$.

As a simple example, let $\mathcal{X} = \mathbb{R}$, let $\mathcal{F}$ be the Borel $\sigma$-algebra generated by open intervals, and let $M$ be Lebesgue measure (intuitively, $M(E)$ measures the size of the set $E$ – its length if $E$ is an interval – and $E \in \mathcal{F}$ if the size can be sensibly defined). The probability of no points in the interval $(0, t)$ is, from the Poisson distribution, $e^{-t}$; hence the distance from 0 to the first point to the right of 0 has an exponential distribution. Similarly the various other properties of a Poisson process indexed by time can be deduced. For example, given there are $N$ points in the interval $(0, t)$, their positions are independent and uniformly distributed.

Another example is a Poisson process on $[0,1] \times [0,\infty)$, with the Borel $\sigma$-algebra and Lebesgue measure. We can think of $[0,1]$ as "space" and $[0,\infty)$ as "time": then this is a sequence of points arriving as a unit rate Poisson process in time, and each arriving point is distributed uniformly on $[0,1]$, independently of all the other points. The collections of points arriving to different subsets of $[0,1]$, or over different time periods, are independent. If we colour red those points whose space coordinate lies in $(0,p)$, and blue those points whose space coordinate lies in $(p,1)$, for $0 < p < 1$, then the times of arrival of red and blue points form independent Poisson processes with rates $p$ and $1-p$ respectively. The arrival process of red points is formed by thinning the original arrival process: an arriving point is retained with probability $p$, independently from point to point. Similarly if two independent Poisson processes with mean measures $M_1, M_2$ are superimposed we obtain a Poisson process with mean measure $M_1 + M_2$.

Suppose now that individuals arrive into $\mathcal{X}$ as a Poisson stream of rate $\nu$, and then proceed to move independently through $\mathcal{X}$ according to some stochastic process before (possibly) leaving $\mathcal{X}$. We would like to understand the collection of points that represents the (random) set of individuals in the system at any given time.

Example 2.15 Recall our earlier discussion of linear migration processes, in Section 2.4. Then we can take $\mathcal{X}$ to be the set of colonies $\{1, 2, \dots, J\}$ or $\{1, 2, \dots\}$. We assume that individuals enter the system into colony $j$ as a Poisson process of rate $\nu_j$. After arriving at colony $j$ an individual stays there for a certain random amount of time $T_j$ with mean $\lambda_j^{-1}$, the holding time of the individual in colony $j$. At the end of its holding time in colony $j$, an individual moves to colony $k$ with probability $p(j,k)$, or leaves the system with probability $p(j,\to)$.

In our earlier discussion of linear migration processes in Section 2.4, the times $T_j$ were exponentially distributed with parameter $\lambda_j$, and all holding times were independent. But now, in this example, we relax this assumption, allowing $T_j$ to be arbitrarily distributed with mean $\lambda_j^{-1}$, and we allow the holding times of an individual at the colonies she visits to be dependent. (But we insist these holding times are independent of those of other individuals, and of the Poisson arrival process.) In Exercise 2.20, we will show that this does not change the limiting distribution for the numbers in different colonies, a result known as insensitivity.

Example 2.16 Consider cars moving on a highway. The set $\mathcal{X} = \mathbb{R}_+$ corresponds to a semi-infinite road. Cars arrive at the point 0 as a Poisson process, and then move forward (i.e., right). If the highway is wide and uncongested, we may assume that they move independently of each other, passing if necessary. At any time, the set of car locations is a random collection of points in $\mathcal{X}$, and we are interested in properties of this random collection as it evolves in time. To determine the stochastic process fully, we need to specify the initial state of the highway at some time $t = 0$, and the way in which each car moves through the system. We will do this in Exercise 2.18.

For $E \in \mathcal{F}$ define the quantity
\[
P(u, E) = P(\text{individual is in } E \text{ a time } u \text{ after her arrival into the system}).
\]

Theorem 2.17 (Bartlett's theorem) If the system is empty at time 0 then at time $t$ individuals are distributed over $\mathcal{X}$ according to a Poisson process with mean measure
\[
M(t, E) = \nu \int_0^t P(u, E)\, du, \qquad E \in \mathcal{F}.
\]

Remark 2.18 The form for the mean measure should not be a surprise: the mean number of individuals in $E$ at time $t$ is obtained by integrating over $(0, t)$ the rate at which individuals arrive multiplied by the probability that at time $t$ they are in $E$. But we need to prove that the numbers in disjoint subsets are independent, with a Poisson distribution.

Proof Let $E_1, \dots, E_k \in \mathcal{F}$ be disjoint, and let $n_j(t)$ be the number of individuals in $E_j$ at time $t$. In order to calculate the joint distribution of the random variables $n_j(t)$, we will compute the joint probability generating function. We will use the shorthand $z^{n(t)}$ to mean $z_1^{n_1(t)} \cdots z_k^{n_k(t)}$; the probability generating function we want is then $E z^{n(t)}$.

Let $m$ be the number of arrivals into the system in $(0, t)$. Conditional on $m$, the arrival times $\tau_1, \dots, \tau_m$ are independent and uniform in $(0, t)$ (unordered, of course). Let $A_{ri}$ be the event that the individual who arrived at time $\tau_r$ is in $E_i$ at time $t$; then $P(A_{ri}) = P(t - \tau_r, E_i)$.


Now,
\[
E[z^{n(t)} \mid m, \tau_1, \dots, \tau_m] = E\Big[\prod_{r=1}^{m}\prod_{i=1}^{k} z_i^{I[A_{ri}]} \;\Big|\; m, \tau_1, \dots, \tau_m\Big]
= \prod_{r=1}^{m} E\Big[\prod_{i=1}^{k} z_i^{I[A_{ri}]} \;\Big|\; \tau_r\Big] \quad \text{(by independence of } \tau_r\text{)}
\]
\[
= \prod_{r=1}^{m}\Big(1 - \sum_{i=1}^{k}(1 - z_i)\, P(t - \tau_r, E_i)\Big),
\]
by considering the various places the arrival at time $\tau_r$ could be (in at most one of $E_i$, $i = 1, 2, \dots, k$). Taking the average over $\tau_r$, we get
\[
E[z^{n(t)} \mid m] = \prod_{r=1}^{m}\Big(1 - \sum_{i=1}^{k}(1 - z_i)\frac{1}{t}\int_0^t P(t - \tau, E_i)\, d\tau\Big)
= \Big(1 - \sum_{i=1}^{k}(1 - z_i)\frac{1}{t}\int_0^t P(t - \tau, E_i)\, d\tau\Big)^{m}.
\]
To take the average over $m$, note that $m$ is a Poisson random variable with mean $\nu t$, and for a Poisson random variable $X$ with mean $\lambda$ we have $E z^X = e^{-(1-z)\lambda}$. Therefore,
\[
E[z^{n(t)}] = \exp\Big(-\nu t\sum_{i=1}^{k}(1 - z_i)\frac{1}{t}\int_0^t P(u, E_i)\, du\Big) = \prod_{i=1}^{k}\exp\big(-(1 - z_i) M(t, E_i)\big).
\]
This shows that the joint probability generating function of $n_1(t), \dots, n_k(t)$ is the product of the probability generating functions of Poisson random variables with means $M(t, E_i)$; therefore, $n_1(t), \dots, n_k(t)$ are independent Poisson random variables with the claimed means, as required. $\square$

Remark 2.19 Some intuition (and a rather slicker proof) for the above result can be developed as follows. Colour an individual who arrives at the system at time $\tau$ with colour $j$ if that individual is in $E_j$ at time $t$, for $j = 1, \dots, J$, and leave it uncoloured if it is not in any of these sets at time $t$. Then the colours of individuals are independent, and the probability an arrival at time $\tau$ is coloured $j$ is $P(t - \tau, E_j)$. We are interested in the colours of the arrivals over $(0, t)$, and the result then follows from general thinning and superposition results for Poisson processes.

Corollary 2.20 As $t \to \infty$ the distribution of individuals over $\mathcal{X}$ approaches a Poisson process with mean measure
\[
M(E) = \nu \int_0^{\infty} P(u, E)\, du, \qquad E \in \mathcal{F}.
\]

Remark 2.21 Observe that $\int_0^{\infty} P(u, E)\, du$ is the mean time spent by an individual in the set $E$, and so this expression for the mean number in the set $E$ is Little's law. (Of course Little's law does not give the distribution, only the mean.)
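Bartlett's theorem, and the insensitivity it carries, are easy to see in simulation. The sketch below (hypothetical parameters; a single colony with a deliberately non-exponential holding time) counts the individuals present at time $t$ over many replications; the empirical mean and variance should both be close to the Poisson mean $\nu E[\min(T, t)]$.

```python
import random
import statistics

# One colony: Poisson arrivals of rate nu, i.i.d. holding times T ~ Uniform(0, 2)
# (deliberately non-exponential). By Bartlett's theorem the number present at time t
# is Poisson with mean nu * E[min(T, t)]. All parameters below are hypothetical.
nu, t, reps = 3.0, 5.0, 20_000
rng = random.Random(42)

def number_present():
    count, s = 0, 0.0
    while True:
        s += rng.expovariate(nu)              # next arrival time
        if s > t:
            return count
        if s + rng.uniform(0.0, 2.0) > t:     # still in the system at time t?
            count += 1

samples = [number_present() for _ in range(reps)]
predicted_mean = nu * 1.0                     # E[min(T, t)] = E[T] = 1 since T <= 2 < t
print("empirical mean, variance:", statistics.mean(samples), statistics.pvariance(samples))
print("Poisson prediction:      ", predicted_mean)
```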

Exercises

Exercise 2.18 Cars arrive at the beginning of a long highway in a Poisson stream of rate $\nu$ from time $t = 0$ onwards. A car has a fixed velocity $V > 0$ which is a random variable. The velocities of cars are independent and identically distributed, and independent of the arrival process. Cars can overtake each other freely. Show that the number of cars on the first $x$ miles of road at time $t$ has a Poisson distribution with mean
\[
\nu\, E\Big[\frac{x}{V} \wedge t\Big].
\]

Exercise 2.19 Airline passengers arrive at a passport control desk in accordance with a Poisson process of rate $\nu$. The desk operates as a single-server queue at which service times are independent and exponentially distributed with mean $\mu\; (< \nu^{-1})$ and are independent of the arrival process. After leaving the passport control desk a passenger must pass through a security check. This also operates as a single-server queue, but one at which service times are all equal to $\tau\; (< \nu^{-1})$. Show that in equilibrium the probability both queues are empty is
\[
(1 - \nu\mu)(1 - \nu\tau).
\]
If it takes a time $\sigma$ to walk from the first queue to the second, what is the equilibrium probability that both queues are empty and there is no passenger walking between them?
[Hint: Use Little's law to find the probability the second queue is empty; use the approach of Section 2.2 to establish independence; and use the properties of a Poisson process to determine the distribution of the number of passengers walking between the two queues.]

Exercise 2.20 Consider the linear migration process of Example 2.15. Write down the limiting distribution of the vector $(n_1, n_2, \dots, n_J)$, the numbers of individuals in the $J$ colonies. Show that the limiting distribution of $n_j$ depends on the distribution of $T_j$ only through $E T_j$, the expected amount of time that an individual customer spends in colony $j$. The limiting distribution is insensitive to the precise form of the holding time distribution, as well as to dependencies between an individual's holding times at colonies visited.

2.7 Generalizations

The elementary migration processes considered earlier in this Chapter can be generalized in various ways, while still retaining the simplicity of a product-form equilibrium distribution. We shall not go into this in great detail, but with a few examples we will briefly sketch two generalizations: namely multiple classes of customer and general service distributions.

Example 2.22 (Communication network/optimal allocation) Our first example is of a network (e.g., of cities) with one-way communication links between them. The links are used to transmit data packets; and if there are capacity constraints on the links, it is possible that packets may have to queue for service. If we model the links as queues (as in the right-hand diagram of Figure 2.8), we get something that isn't quite an open migration network; in an open migration network, individuals can travel in cycles, whereas packets would not do that. (In fact, a natural model of this system is as a multi-class queueing network, but we will not be developing that framework here.)

Figure 2.8 A communication network. On the right, the same network with links modelled as queues.

Let $a_j$ be the average arrival rate of messages at queue $j$, and $n_j$ be the number of messages in queue $j$. Suppose that we can establish that in equilibrium
\[
P(n_j = n) = \Big(1 - \frac{a_j}{\phi_j}\Big)\Big(\frac{a_j}{\phi_j}\Big)^{n},
\]
where $\phi_j$ is the service rate at queue $j$. (It is possible to prove under some assumptions that the equilibrium distribution of a multi-class queueing network is of product form, with these distributions as marginal distributions.)

Suppose now that we are allowed to choose the service rates (capacities) $\phi_1, \dots, \phi_J$, subject only to the budget constraint $\sum \phi_j = F$, where all summations are over $j = 1, 2, \dots, J$. What choice will minimize the mean number of packets in the system, or (equivalently, in view of Little's law) the mean period spent in the system by a message?

The optimization problem we are interested in solving is:
\[
\text{minimize } \; E\Big(\sum n_j\Big) = \sum \frac{a_j}{\phi_j - a_j} \qquad \text{subject to } \; \sum \phi_j = F, \qquad \text{over } \; \phi_j \ge 0 \; \forall j.
\]
We shall solve this problem using Lagrangian techniques (see Appendix C for a review). Introduce a Lagrange multiplier $y$ for the budget constraint, and let
\[
L(\phi; y) = \sum \frac{a_j}{\phi_j - a_j} + y\Big(\sum \phi_j - F\Big).
\]
Setting $\partial L/\partial\phi_j = 0$, we find that $L$ is minimized over $\phi$ by the choice
\[
\phi_j = a_j + \sqrt{a_j/y}.
\]
Substitution of this into the budget constraint and solving for $y$ shows that we should choose
\[
\phi_j = a_j + \frac{\sqrt{a_j}}{\sum_k \sqrt{a_k}}\Big(F - \sum_k a_k\Big).
\]
This is known as Kleinrock's square root channel capacity assignment: any excess capacity (beyond the bare minimum required to service the mean arrival rates) is allocated proportionally to the square root of the arrival rates. This approach was used by Kleinrock (1964, 1976) for the early computer network ARPANET, the forerunner of today's Internet, when capacity was expensive and queueing delays were often lengthy.
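The assignment is a one-line computation. Here is a small sketch (the arrival rates and budget are made up for illustration) that evaluates the square-root rule and, as a sanity check, compares the resulting mean number in the system with a naive proportional split of the excess capacity.

```python
import numpy as np

def kleinrock_assignment(a, F):
    """phi_j = a_j + sqrt(a_j) / sum_k sqrt(a_k) * (F - sum_k a_k), for F > sum(a)."""
    a = np.asarray(a, dtype=float)
    excess = F - a.sum()
    if excess <= 0:
        raise ValueError("the budget must exceed the total arrival rate")
    return a + np.sqrt(a) / np.sqrt(a).sum() * excess

a = np.array([1.0, 4.0, 9.0])                    # hypothetical arrival rates
phi = kleinrock_assignment(a, F=20.0)
print(phi, phi.sum())                            # the capacities sum to F
print(np.sum(a / (phi - a)))                     # mean number in system at the optimum

# Sanity check: splitting the excess capacity proportionally to a_j does worse.
phi_prop = a + a / a.sum() * (20.0 - a.sum())
print(np.sum(a / (phi_prop - a)))
```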

Example 2.23 (Processor sharing) Consider a model in which $K$ individuals, or jobs, oscillate between being served by a processor and being idle, where the processor has a total service rate $\mu$ that it shares out equally among all the $n_0$ active jobs. (One might imagine $K$ customers using a web server for their shopping, with idleness corresponding to waiting for a customer's response.) Each job has an exponential service requirement of unit mean at the processor, queue 0, and so if $n_0$ jobs are at queue 0 then each one of them departs at rate $\mu/n_0$. Suppose when job $k$ becomes idle it remains so for an exponentially distributed time with parameter $\lambda_k$. The schematic diagram appears in Figure 2.9.

Figure 2.9 Processor sharing system: $n_0$ jobs are currently being served, and $n_k = 0$ or 1 for $k \neq 0$.

In Exercise 2.21, you will check that the stationary distribution for this system is
\[
\pi(n_0, \dots, n_K) \propto n_0!\, \mu^{-n_0} \prod_{k=1}^{K} \lambda_k^{-n_k} \tag{2.7}
\]
and satisfies detailed balance.

The interesting fact about this example is that while in order to check detailed balance we need quite specific distributional assumptions (exponential service requirements and idle times), the stationary distribution for $(n_0, \dots, n_K)$ is actually the same if the distributions are quite arbitrary with the same mean. There are several scheduling disciplines for which this sort of insensitivity holds, including the queues of this example. The Erlang model of Section 1.3 is insensitive to the distribution of call holding periods, as is its network generalization of the next Chapter. We looked briefly at this phenomenon in Exercise 2.20, but we shall not study it in more detail. (Notably, a single-server queue with first-come first-serve scheduling discipline is sensitive to its service time distribution.)

Example 2.24 (A wireless network) Our next example is a model of a network of $K$ nodes sharing a wireless medium. Let the nodes form the vertex set of a graph, and place an edge between two nodes if the two nodes interfere with each other. Call the resulting undirected graph the interference graph of the network. Let $S \subset \{0,1\}^K$ be the set of vectors $n = (n_1, n_2, \dots, n_K)$ with the property that if $i$ and $j$ have an edge between them then $n_i \cdot n_j = 0$. Let $n$ be a Markov process with state space $S$ and transition rates
\[
q(n, n - e_k) = \mu_k \quad \text{if } n_k = 1, \qquad q(n, n + e_k) = \nu_k \quad \text{if } n + e_k \in S,
\]
where $e_k \in S$ is a unit vector with a 1 as its $k$th component and 0s elsewhere. We interpret $n_k = 1$ as indicating that node $k$ is transmitting. Thus a node is blocked from transmitting whenever any of its neighbours are active. Node $k$ starts transmitting at rate $\nu_k$ provided it is not blocked, and transmits for an exponentially distributed period of time with parameter $\mu_k$.

It is immediate that the stationary distribution for this system is
\[
\pi(n_1, \dots, n_K) \propto \prod_{k=1}^{K} \Big(\frac{\nu_k}{\mu_k}\Big)^{n_k} \tag{2.8}
\]
normalized over the state space $S$, and that it satisfies detailed balance. (Indeed the distribution is unaltered if the periods for which node $k$ transmits are independent random variables arbitrarily distributed with mean $1/\mu_k$, and if the periods for which node $k$ waits before attempting a transmission following either the successful completion of, or a blocked attempt at, a transmission are independent random variables arbitrarily distributed with mean $1/\nu_k$.)

We’ll look at variants of this model in Chapters 3, 5 and 7.

Exercises

Exercise 2.21 Check (by verifying detailed balance) that expression (2.7) gives the stationary distribution for the processor sharing example.

Exercise 2.22 Deduce, from the form (2.8), that if $\lambda = \nu_k/\mu_k$ does not depend on $k$ then the stationary probability of any given configuration $(n_1, \dots, n_K)$ depends on the number of active transmissions, $\sum_k n_k$, but not otherwise on which stations are transmitting.

An independent vertex set is a set of vertices of the interference graph with the property that no two vertices are joined by an edge. Show that as $\lambda \to \infty$ the stationary probability approaches a uniform distribution over the independent vertex sets of maximum cardinality.

2.8 Further reading

Whittle (1986), Walrand (1988), Bramson (2006) and Kelly (2011) give more extended treatments of migration processes and queueing networks, including the generalizations of Section 2.7. Pitman (2006) provides an extensive treatment of the Chinese restaurant process (Exercise 2.13). Baccelli and Brémaud (2003) develop a very general framework for the study of queueing results such as Little's law and the PASTA property. Kingman (1993) is an elegant text on Poisson processes on general spaces.

The short-hand queueing notation was suggested by Kendall (1953): thus a D/G/K queue has a deterministic arrival process, general service times, and $K$ servers. Asmussen (2003) provides an authoritative treatment of queues, and of the basic mathematical tools for their analysis.


Erlang’sformula3

Loss networks

We've seen, in Section 1.3, Erlang's model of a telephone link. But what happens if the system consists of many links, and if calls of different types (perhaps voice, video or conference calls) require different resources? In this Chapter we describe a generalization of Erlang's model which treats a network of links, and which allows the number of circuits required to depend upon the call. The classical example of this model is a telephone network, and it is natural to couch its definition in terms of calls, links and circuits. Circuits may be physical (in Erlang's time, a strand of copper; or for a wireless link, a radio frequency) or virtual (a fixed proportion of the transmission capacity of a communication link such as an optical fibre). The term 'circuit-switched' is common in some application areas, where it is used to describe systems in which, before a request (which may be a call, a task or a customer) is accepted, it is first checked that sufficient resources are available to deal with each stage of the request. The essential features of the model are that a call makes simultaneous use of a number of resources and that blocked calls are lost.

The simplest model of a loss network is a single link with $C$ circuits, where the arrivals are Poisson of rate $\nu$, the holding times are exponential with mean 1, and blocked calls are lost. In this case, we know from Erlang's formula (1.5)
\[
P(\text{an arriving call is lost}) = E(\nu, C) = \frac{\nu^C}{C!}\Bigg(\sum_{j=0}^{C} \frac{\nu^j}{j!}\Bigg)^{-1}.
\]
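For later numerical experiments it is convenient to have $E(\nu, C)$ as a function. The sketch below is not from the text; it uses the standard recursion $E(\nu, c) = \nu E(\nu, c-1)/(c + \nu E(\nu, c-1))$ with $E(\nu, 0) = 1$, which is algebraically equivalent to the formula above but avoids large factorials.

```python
def erlang_b(nu, C):
    """Erlang's formula E(nu, C) computed by the recursion
    E(nu, 0) = 1,  E(nu, c) = nu * E(nu, c-1) / (c + nu * E(nu, c-1))."""
    E = 1.0
    for c in range(1, C + 1):
        E = nu * E / (c + nu * E)
    return E

print(erlang_b(nu=10.0, C=12))   # blocking probability for a single link (made-up values)
```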

3.1 Network model

Let the set of links be $\mathcal{J} = \{1, 2, \dots, J\}$, and let $C_j$ be the number of circuits on link $j$. A route $r$ is a subset of $\{1, 2, \dots, J\}$, and the set of all routes is called $\mathcal{R}$ (this is a subset of the power set of $\mathcal{J}$). Let $R = |\mathcal{R}|$. Figure 3.1 shows an example of a network with 6 links, in which "routes" are pairs of adjacent links.

Figure 3.1 A loss network with 6 links and 6 two-link routes.

Calls requesting route $r$ arrive as a Poisson process of rate $\nu_r$, and as $r$ varies it indexes independent Poisson streams. A call requesting route $r$ is blocked and lost if on any link $j \in r$ there is no free circuit. Otherwise the call is connected and simultaneously holds one circuit on each link $j \in r$ for the holding period of the call. The call holding period is exponentially distributed with unit mean and independent of earlier arrival and holding times.

Let $A$ be the link-route incidence matrix,
\[
A_{jr} = \begin{cases} 1, & j \in r \\ 0, & j \notin r. \end{cases}
\]

Remark 3.1 Often the set of links comprising a route will form a path between two nodes for some underlying graph, but we do not make this a requirement: for example for a conference call the set of links may instead form a tree, or a complete subgraph, in an underlying graph. Later, from Section 3.3, we shall allow entries of $A$ to be arbitrary non-negative integers, with the interpretation that a call requesting route $r$ requires $A_{jr}$ circuits from link $j$, and is lost if on any link $j = 1, \dots, J$ there are fewer than $A_{jr}$ circuits free. But for the moment assume $A$ is a 0-1 matrix, and conveys the same information as the relation $j \in r$.

Let $n_r$ be the number of calls in progress on route $r$. The number of circuits busy on link $j$ is given by $\sum_r A_{jr} n_r$. Let $n = (n_r, r \in \mathcal{R})$, and let $C = (C_j, j \in \mathcal{J})$. Then $n$ is a Markov process with state space
\[
\{n \in \mathbb{Z}_+^R : An \le C\}
\]
(the inequality is to be read componentwise). This process is called a loss network with fixed routing.


We will later compute the exact equilibrium distribution for $n$. But first we describe a widely used approximation procedure.

Exercises

Exercise 3.1 Suppose that $\mu_k = 1$, $k = 1, \dots, K$ in the model of Example 2.24. Show that the model is a loss network, with the resource set $\mathcal{J}$ taken as the set of edges in the interference graph.

3.2 Approximation procedure

The idea underlying the approximation is simple to explain. Let $B_j$ be the probability of blocking on link $j$. Suppose that a Poisson stream of rate $\nu_r$ is thinned by a factor $1 - B_i$ at each link $i \in r \setminus j$ before being offered to link $j$. If these thinnings could be assumed independent both from link to link and over all routes passing through link $j$ (they clearly are not), then the traffic offered to link $j$ would be Poisson at rate
\[
\sum_r A_{jr}\nu_r \prod_{i \in r\setminus\{j\}} (1 - B_i)
\]
(the reduced load), and the blocking probability at link $j$ would be given by Erlang's formula
\[
B_j = E\Big(\sum_r A_{jr}\nu_r \prod_{i \in r\setminus\{j\}} (1 - B_i),\; C_j\Big), \qquad j = 1, \dots, J. \tag{3.1}
\]
Does a solution exist to these equations? First, let's recall (or note) the Brouwer fixed point theorem: a continuous map from a compact, convex set to itself has at least one fixed point. But (3.1) defines a continuous map $F : [0,1]^J \to [0,1]^J$ via
\[
(B_j,\; j = 1, \dots, J) \mapsto \Big(E\Big(\sum_r A_{jr}\nu_r \prod_{i \in r\setminus\{j\}} (1 - B_i),\; C_j\Big),\; j = 1, \dots, J\Big),
\]
and $[0,1]^J$ is convex and compact. Hence there exists a fixed point. We will later see that this fixed point is unique, and we call it, i.e. the solution to (3.1), the Erlang fixed point.

Example 3.2 Consider the following network with three cities linked via a transit node in the middle. We are given the arrival rates $\nu_{12}, \nu_{23}, \nu_{31}$. (We write $\nu_{12}$ for the more cumbersome $\nu_{\{1,2\}}$.) The approximation reads
\[
B_1 = E(\nu_{12}(1 - B_2) + \nu_{31}(1 - B_3),\; C_1)
\]
\[
B_2 = E(\nu_{12}(1 - B_1) + \nu_{23}(1 - B_3),\; C_2)
\]
\[
B_3 = E(\nu_{23}(1 - B_2) + \nu_{31}(1 - B_1),\; C_3)
\]
with corresponding loss probability along the route $\{1, 2\}$
\[
L_{12} = 1 - (1 - B_1)(1 - B_2)
\]
and similarly for $L_{23}$, $L_{31}$.

Figure 3.2 Telephone network for three cities

Remark 3.3 How can we solve the equations (3.1)? If we simply iterate the transformation $F$, we will usually converge to the fixed point. This method, called repeated substitution, is often used in practice. If we use a sufficiently damped iteration, i.e. take a convex combination of the previous point and its image under $F$ as the next point, we are guaranteed convergence. The complication of damping can be avoided by a variant of repeated substitution whose convergence you will establish later, in Exercise 3.7.
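Below is a small sketch of undamped repeated substitution for (3.1), applied to the three-city network of Example 3.2 with made-up rates and capacities; it converges here, though as noted above a damped iteration is the safe choice in general. The Erlang formula is evaluated with the standard recursion used earlier.

```python
import numpy as np

def erlang_b(nu, C):
    E = 1.0
    for c in range(1, int(C) + 1):
        E = nu * E / (c + nu * E)
    return E

def erlang_fixed_point(A, nu, C, iters=200):
    """Undamped repeated substitution for the fixed point equations (3.1).
    A is the J x R 0-1 link-route incidence matrix, nu the route arrival rates,
    C the link capacities."""
    J, R = A.shape
    B = np.zeros(J)
    for _ in range(iters):
        reduced = np.zeros(J)
        for j in range(J):
            for r in range(R):
                if A[j, r]:
                    thin = np.prod([1.0 - B[i] for i in range(J) if A[i, r] and i != j])
                    reduced[j] += nu[r] * thin
        B = np.array([erlang_b(reduced[j], C[j]) for j in range(J)])
    return B

# The three-city network of Example 3.2, with made-up rates and capacities.
A = np.array([[1, 0, 1],    # link 1 is used by routes {1,2} and {3,1}
              [1, 1, 0],    # link 2 by routes {1,2} and {2,3}
              [0, 1, 1]])   # link 3 by routes {2,3} and {3,1}
nu = np.array([10.0, 6.0, 4.0])
C = np.array([12, 10, 8])
B = erlang_fixed_point(A, nu, C)
print("link blocking B =", B, " route {1,2} loss =", 1 - (1 - B[0]) * (1 - B[1]))
```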

The approximation is often surprisingly accurate. Can we provide any insight into why it works well when it does? This will be a major aim in this Chapter.

3.3 Truncating reversible processes

We now determine a precise expression for the equilibrium distribution of a loss network with fixed routing.

Consider a Markov process with transition rates $(q(j,k),\; j, k \in \mathcal{S})$. Say that it is truncated to a set $\mathcal{A} \subset \mathcal{S}$ if $q(j,k)$ is changed to 0 for $j \in \mathcal{A}$, $k \in \mathcal{S}\setminus\mathcal{A}$, and if the resulting process is irreducible within $\mathcal{A}$.


Lemma 3.4 If a reversible Markov process with state space $\mathcal{S}$ and equilibrium distribution $(\pi(j),\; j \in \mathcal{S})$ is truncated to $\mathcal{A} \subset \mathcal{S}$, then the resulting Markov process is reversible and has equilibrium distribution
\[
\pi(j)\Big(\sum_{k \in \mathcal{A}} \pi(k)\Big)^{-1}, \qquad j \in \mathcal{A}. \tag{3.2}
\]

Proof
\[
\pi(j)\, q(j,k) = \pi(k)\, q(k,j)
\]
by the reversibility of the original process, so the probability distribution (3.2) satisfies detailed balance. $\square$

Remark 3.5 If the original process is not reversible, then (3.2) is the equilibrium distribution of the truncated process if and only if
\[
\pi(j)\sum_{k \in \mathcal{A}} q(j,k) = \sum_{k \in \mathcal{A}} \pi(k)\, q(k,j).
\]
Earlier, equations of this form were termed "partial balance".

Now, consider a loss network with fixed routing for which $C_1 = \dots = C_J = \infty$. If this is the case, arriving calls are never blocked; they simply arrive (at rate $\nu_r$), remain for an exponentially distributed amount of time with unit mean, and leave. Henceforth allow the link-route incidence matrix $A$ to have entries in $\mathbb{Z}_+$, not just 0 or 1.

This system is described by a linear migration process with transition rates
\[
q(n, T_{\to r} n) = \nu_r, \qquad q(n, T_{r\to} n) = n_r
\]
and equilibrium distribution
\[
\prod_{r \in \mathcal{R}} e^{-\nu_r}\frac{\nu_r^{n_r}}{n_r!}, \qquad n \in \mathbb{Z}_+^R.
\]
(Since the capacities are infinite, the individual routes become independent.) If we now truncate $n$ to $S(C) = \{n : An \le C\}$, we will get precisely the original loss network with fixed routing (and finite capacities). Therefore, its equilibrium distribution is
\[
\pi(n) = G(C)\prod_r \frac{\nu_r^{n_r}}{n_r!}, \qquad n \in S(C) = \{n : An \le C\} \tag{3.3}
\]


with
\[
G(C) = \Bigg(\sum_{n \in S(C)} \prod_r \frac{\nu_r^{n_r}}{n_r!}\Bigg)^{-1}.
\]
Further, the equilibrium probability that a call on route $r$ will be accepted is
\[
1 - L_r = \sum_{n \in S(C - A e_r)} \pi(n) = \frac{G(C)}{G(C - A e_r)},
\]
where $e_r \in S(C)$ is the unit vector describing one call in progress on route $r$.

Remark 3.6 This is not a directly useful result for large ($C$ big) or complex ($R$ big) networks, because of the difficulty of computing the normalizing constant (in the case of an arbitrary matrix $A$, this is an NP-hard problem). However, we might hope for a limit result. We know that for large $\nu$ and $C$ the Poisson distribution is going to be well approximated by a multivariate normal. Conditioning a multivariate normal on an inequality will have one of two effects: if the center is on the feasible side of the inequality, then the constraint has very little effect; if the center is on the infeasible side of the inequality, we will effectively restrict the distribution to the boundary of the constraint (because the tail of the normal distribution dies off very quickly), and the restriction of a multivariate normal to an affine subspace is again a multivariate normal distribution. In later sections we shall make this more precise.
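For networks small enough that enumeration is feasible, the exact acceptance probabilities $1 - L_r = G(C)/G(C - A e_r)$ can be computed by brute force. The sketch below does this for the three-city example, with the same made-up rates and capacities used earlier, and could be compared against the Erlang fixed point approximation of Section 3.2.

```python
import itertools
from math import factorial
import numpy as np

def unnormalized_sum(A, nu, C):
    """sum over {n : A n <= C} of prod_r nu_r^{n_r} / n_r!, i.e. 1/G(C) in (3.3)."""
    J, R = A.shape
    n_max = [min(C[j] for j in range(J) if A[j, r]) for r in range(R)]
    total = 0.0
    for n in itertools.product(*[range(m + 1) for m in n_max]):
        if np.all(A @ np.array(n) <= C):
            w = 1.0
            for r in range(R):
                w *= nu[r] ** n[r] / factorial(n[r])
            total += w
    return total

# The three-city network again, with the same made-up rates and capacities.
A = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]])
nu = np.array([10.0, 6.0, 4.0])
C = np.array([12, 10, 8])

S_C = unnormalized_sum(A, nu, C)
for r in range(A.shape[1]):
    S_reduced = unnormalized_sum(A, nu, C - A[:, r])
    print(f"route {r}: exact acceptance probability 1 - L_r = {S_reduced / S_C:.4f}")
```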

We end this section with some further examples of truncated processes. These examples show that a variety of models may be equivalent to a loss network with fixed routing (and in particular the equilibrium distribution is given by independent Poisson random variables conditioned on a set of linear inequalities).

Example 3.7 (Call repacking) Consider the network in Figure 3.3 joining three nodes. Suppose that calls can be rerouted, even while in progress, if this will allow another call to be accepted. (We also, as usual, assume that arrivals are Poisson and holding times are exponential.)

Figure 3.3 Loss network with call repacking

Let $n_{\alpha\beta}$ be the number of calls in progress between $\alpha$ and $\beta$. Then $n = (n_{12}, n_{23}, n_{31})$ is Markov, since all we need to know is the number of calls between each pair of cities. Further, we claim $n = (n_{12}, n_{23}, n_{31})$ is a linear migration process with equilibrium distribution
\[
\prod_r e^{-\nu_r}\frac{\nu_r^{n_r}}{n_r!}, \qquad r = \{1,2\}, \{2,3\}, \{3,1\},
\]
truncated to the set
\[
\mathcal{A} = \{n : n_{12} + n_{23} \le C_3 + C_1,\; n_{23} + n_{31} \le C_1 + C_2,\; n_{31} + n_{12} \le C_2 + C_3\}.
\]

Indeed, it's clear that these restrictions are necessary: that is, if the state $n$ is not in $\mathcal{A}$ then it cannot be reached. (Each of the three inequalities is a cut constraint separating one of the vertices from the rest of the network.) Next we show they are sufficient: that is, if $n \in \mathcal{A}$ then it is feasible to pack the calls so that all the calls in $n$ are carried. First note that if all three inequalities of the form $n_{23} \le C_1$ hold (one for each direct route), the state $n$ is clearly feasible. If not, suppose without loss of generality $n_{23} > C_1$. Then we can route $C_1$ of the calls directly, and reroute the remaining $n_{23} - C_1$ calls via node 1. This will be possible provided $n_{23} - C_1 \le \min(C_3 - n_{12}, C_2 - n_{31})$, i.e. provided $n_{12} + n_{23} \le C_1 + C_3$ and $n_{23} + n_{31} \le C_1 + C_2$, which will necessarily hold since $n \in \mathcal{A}$. Thus in this case too the state $n$ is feasible.

Thus, the equilibrium distribution is
\[
\pi(n) = G(\mathcal{A})\prod_r \frac{\nu_r^{n_r}}{n_r!}, \qquad n \in \mathcal{A}, \quad r = \{1,2\}, \{2,3\}, \{3,1\}.
\]
Note that the process $n$ is equivalent to the loss network with fixed routing illustrated in Figure 3.4: the transition rates, as well as the state space and the equilibrium distribution, are identical.

Figure 3.4 An equivalent network to the repacking model

Example 3.8 (A cellular network) Consider a cellular network, with seven cells arranged as in Figure 3.5. There are $C$ distinct radio communication channels, but two adjacent cells cannot use the same channel. (A channel may be a frequency or a time slot, and interference prevents the same channel being used in adjacent cells. Two cells are adjacent in Figure 3.5 if they share an edge.) Calls may be reallocated between channels, even while they are in progress, if this will allow another call to be accepted. It is clear that $n_\alpha + n_\beta + n_\gamma \le C$ for all cells $\alpha, \beta, \gamma$ that meet at a vertex in Figure 3.5. In Exercise 3.3 it is shown that these are the only constraints. Thus the network is equivalent to a loss network with fixed routing: it is as if there is a virtual link of capacity $C$ associated with each vertex, and a call arriving at a cell requires one circuit from each of the virtual links associated with the vertices of that cell.

Figure 3.5 A cellular network: adjacent cells cannot use the same radio channel.

Exercises

Exercise 3.2 Let $m = C - An$, so that $m_j$ is the number of free circuits on link $j$, and let $\pi'(m) = \sum_{n : An = C - m} \pi(n)$ be the equilibrium probability of the event that $m_j$ circuits are free on link $j$ for $j \in \mathcal{J}$. Establish the Kaufman-Dziong-Roberts recursion
\[
(C_j - m_j)\,\pi'(m) = \sum_{r : A e_r \le C - m} A_{jr}\nu_r\, \pi'(m + A e_r).
\]


[Hint: Calculate $E[n_r \mid m]$, using the detailed balance condition for $\pi(n)$. Note that the recursion can be used to solve for $\pi'(m)$ in terms of $\pi'(C)$, using the boundary condition that $\pi'(m) = 0$ if $m_j > C_j$ for any $j$.]

Exercise 3.3 Establish the result claimed in Example 3.8: that is, there is an allocation of radio channels to cells such that no channel is used in two adjacent cells if and only if $n_\alpha + n_\beta + n_\gamma \le C$ for all cells $\alpha, \beta, \gamma$ that meet at a vertex in Figure 3.5.
[Hint: Begin by allocating channels $1, \dots, n_1$ to cell 1, and channels $n_1 + 1, \dots, n_1 + n_2$ to cell 2, $n_1 + 1, \dots, n_1 + n_4$ to cell 4, and $n_1 + 1, \dots, n_1 + n_6$ to cell 6. Generalizations of this example are treated in Pallant and Taylor (1995) and Kind et al. (1998).]

3.4 Maximum probability

To obtain some insight into the probability distribution (3.3) we begin by looking for its mode, that is $\max \pi(n)$ over $n \in \mathbb{Z}_+^R$ and $An \le C$. Write
\[
\log\pi(n) = \log\Bigg(G(C)\prod_r \frac{\nu_r^{n_r}}{n_r!}\Bigg) = \log G(C) + \sum_r \big(n_r\log\nu_r - \log(n_r!)\big).
\]
By Stirling's approximation,
\[
n! \sim \sqrt{2\pi}\, n^{n + \frac{1}{2}} e^{-n} \quad \text{as } n \to \infty,
\]
and so
\[
\log(n!) = n\log n - n + O(\log n).
\]
We will now replace the discrete variable $n$ by a continuous variable $x$, and also ignore the $O(\log n)$ terms to obtain the following problem.

PRIMAL:
\[
\text{maximize } \; \sum_r (x_r\log\nu_r - x_r\log x_r + x_r) \qquad \text{subject to } \; Ax \le C \qquad \text{over } \; x \ge 0. \tag{3.4}
\]

The optimum in this problem is attained, since the feasible region is compact. The Strong Lagrangian Principle holds (Appendix C) since the objective function is concave, and the feasible region is defined by a set of linear inequalities. The Lagrangian is
\[
L(x, z; y) = \sum_r (x_r\log\nu_r - x_r\log x_r + x_r) + \sum_j y_j\Big(C_j - \sum_r A_{jr}x_r - z_j\Big).
\]

Here, $z \ge 0$ are the slack variables in the constraint $Ax \le C$ and $y$ are Lagrange multipliers. Rewrite as
\[
L(x, z; y) = \sum_r x_r + \sum_r x_r\Big(\log\nu_r - \log x_r - \sum_j y_j A_{jr}\Big) + \sum_j y_j C_j - \sum_j y_j z_j,
\]
which we attempt to maximize over $x, z \ge 0$. To obtain a finite maximum when we maximize the Lagrangian over $z \ge 0$ we require $y \ge 0$; and given this we have at the optimum $y \cdot z = 0$. Further, differentiating with respect to $x_r$ gives
\[
\log\nu_r - \log x_r - \sum_j y_j A_{jr} = 0,
\]
and hence at the maximum
\[
x_r = \nu_r e^{-\sum_j y_j A_{jr}}.
\]
Thus if for given $y$ we maximize the Lagrangian over $x, z \ge 0$ we get
\[
\max_{x, z \ge 0} L(x, z; y) = \sum_r \nu_r e^{-\sum_j y_j A_{jr}} + \sum_j y_j C_j
\]
provided $y \ge 0$. This gives the following as the Lagrangian dual problem.

DUAL:

minimize∑

r

νre−∑

j y j A jr +∑

j

yjC j (3.5)

over y ≥ 0.

By the Strong Lagrangian Principle, there existsy such that the Lagrangianform L(x, z; y) is maximized atx = x, z = z that are feasible (i.e.x ≥0, z = C − Ax ≥ 0), andx, z are then optimal for the PRIMAL problem.Necessarilyy ≥ 0 (dual feasibility) andy·z= 0 (complementary slackness).

Let

e−y j = 1− Bj .

We can rewrite the statement that there existy, z satisfying the conditions

Page 67: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.4 Maximum probability 61

Lagrangemulti-plier!complementaryslackness

StrongLagrangianPrinciple

Lagrangemultiplier

StrongLagrangianPrinciple

y ≥ 0 andy · z = 0 in terms of the transformed variables as follows: thereexist (B1, . . . , BJ) ∈ [0, 1)J that satisfy

CONDITIONS ON B:

r

Ajrνr

i

(1− Bi)Air = C j if Bj > 0

≤ C j if Bj = 0.(3.6)

(To check the correspondence with the complementary slackness condi-tions, note thatBj > 0 if and only if yj > 0. If you don’t know the StrongLagrangian Principle, don’t worry: in a moment we’ll establish directly theexistence of a solution to the conditions onB.)

Remark 3.9 The conditions onB have an elegant fluid flow interpreta-tion. With the substitutione−y j = 1−Bj the flowxr = νr

i(1−Bi)Air lookslike a thinning of the arrival streamνr by a factor (1− Bi) for each circuitrequested from linki for each linki that the router goes through. Then∑

r Ajr νr∏

i(1− Bi)Air is the aggregated flow on linkj. The conditions onBtell us that the aggregated flow on linkj does not exceed the link’s capacity,and that blocking only occurs on links that are at full capacity.

Let’s summarize what we have shown so far, adding a few additionalpoints.

Theorem 3.10 There exists a unique optimal solutionx = (xr , r ∈ R) tothe PRIMAL problem(3.4). It can be expressed in the form

xr = νr

j

(1− Bj)A jr , r ∈ R,

where B= (B1, . . . , BJ) is any solution to(3.6), the conditions on B. Therealways exists a vector B satisfying these conditions; it is unique if A hasrank J. There is a one-to-one correspondence between vectors satisfyingthe conditions on B and optima of the DUAL problem(3.5), given by

1− Bj = e−y j , j = 1, . . . , J.

Proof Strict concavity of the PRIMAL objective function gives unique-ness of the optimumx. The form of the optimumx was found above interms of the Lagrange multipliersy, or equivalentlyB.

The explicit form of the DUAL problem will allow us to establish di-rectly the existence ofB satisfying the conditions onB, without relying onthe Strong Lagrangian Principle. Note that in the DUAL problem weare

Page 68: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

62 Loss networks

minimising a convex, differentiable function over the positive orthant, andthe function grows to infinity asyj → ∞ for each j = 1, . . . , J. Therefore,the function achieves its minimum at some pointy in the positive orthant.If the minimumy hasyj > 0, then the partial derivative∂/∂yj | y = 0. Ifyj = 0, then the partial derivative must be nonnegative. (See Figure 3.6.)

Figure 3.6 Minimum of a convex function over nonnegativevalues occurs either where the derivative is zero, or at 0 with thederivative nonnegative.

But this partial derivative of the DUAL objective function is simply

−∑

r

Ajr νre−∑

i yi Air +C j ,

and so the minimum overy ≥ 0 will be at a point where

r

Ajr νre−∑

i yi Air = C j if yj > 0

≤ C j if yj = 0.

Note that these are precisely the conditions onB, under the substitutionBj = 1 − e−y j ; this gives the existence of a solution to the conditions onB, and establishes the one-to-one correspondence with the optimaof theDUAL problem.

Finally, note that the DUAL objective function∑

r νre−∑

j y j A jr is strictlyconvex in the components ofyA. If A has rankJ, the mappingy 7→ yA isone-to-one, and therefore the objective function is also strictly convex inthe components ofy, and so there is a unique optimum. �

Example 3.11 (Rank-deficiency) IfA is rank-deficient, the DUAL objec-tive function may not be strictly convex in the components ofy, and the

Page 69: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.5 A central limit theorem 63

optimizingy may not be unique. Consider

A =

1 0 0 11 1 0 00 1 1 00 0 1 1

corresponding to the system in Figure 3.7. The routes inRare{1, 2}, {2, 3},

1 3

4

2

Figure 3.7 A network with a rank-deficient matrix

{3, 4}, {4, 1}, and the matrixA has rank 3. In Exercise 3.4 it is shown that asolution to the conditions onB will, in general, be non-unique.

Remark 3.12 A will have rankJ if there is some single-link traffic oneach link (i.e. there exists a router = { j} for each j ∈ R), since then thematrix A will contain the columns of theJ × J identity matrix amongst itscolumns. This is a natural assumption in many cases.

Exercises

Exercise 3.4 Consider Example 3.11. Show that ifB1, B2, B3, B4 > 0solve the conditions onB, then so do

1− d(1− B1), 1− d−1(1− B2), 1− d(1− B3), 1− d−1(1− B4),

i.e. only (1− Bodd)(1− Beven) is fixed. Deduce that for this example, wherethe matrixA is rank-deficient, a solution to the conditions onB may benon-unique.

3.5 A central limit theorem

We noted earlier that we expect the truncated multivariate Poisson distri-bution (3.3) to approach a conditioned multivariate normal distribution as

Page 70: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

64 Loss networks

arrival rates and capacities become large. In this section we make this pre-cise, and we shall see that the solutions to the optimizationproblems of thelast section play a key role in the form of the limit.

We need first to define a limiting regime.

3.5.1 A limiting regime

Consider a sequence of networks as follows: replaceν = (νr , r ∈ R) andC = (C j , j ∈ J) by nu(N) = (νr(N), r ∈ R) andC(N) = (C j(N), j ∈ J) forN = 1, 2, . . . . We will consider the scaling regime in which

1Nνr(N)→νr for r ∈ R

1N

C j(N)→C j for j ∈ J

asN → ∞, whereνr > 0, C j > 0 for all r and j. Write B(N), x(N), etc. forquantities defined for theNth network. ThusB(N) solves the conditions onB (3.6) withν,C replaced byν(N),C(N).

From now on we will assume thatA has full rankJ, and moreover thatthe unique solution to the conditions onB (3.6) satisfies

r

Ajrνr

i

(1− Bi)Air < C j if Bj = 0,

i.e. the inequality is strict. This will simplify the statements and proofs ofresults.

Lemma 3.13 As N→ ∞, B(N)→ B and 1N x(N)→ x.

Proof The sequenceB(N), N = 1, 2, . . . takes values in the compact set[0, 1]J. Select a convergent subsequence, sayB(Nk), k = 1, 2, . . . , and letB′ = limk→∞ B(Nk). Then forN = Nk large enough,

r

Ajr νr(N)∏

i

(1− Bi(N))Air = C j(N) if B′j > 0

≤ C j(N) if B′j = 0.

(Note that this is true for allk with Bj(Nk) on the right-hand side, butB′j >0 =⇒ Bj(Nk) > 0 for all k large enough and allj = 1, . . . , J.)

Divide by Nk, and letNk → ∞, to obtain∑

r

Ajrνr

i

(1− B′i )Air = C j if B′j > 0

≤ C j if B′j = 0.

Page 71: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.5 A central limit theorem 65

Note that this showsB′j , 1 for any j (it can’t be equal to 1 in the secondline, and it can’t be equal to 1 in the first line becauseC j > 0 for all j).

By the uniqueness of solutions to this set of relations,B′ = B.To finish the proof thatB(N)→ B, we use a standard analysis argument.

We have shown that any convergent sequence ofB(N) converges toB, andsince we’re in a compact set, any infinite subset of{B(N)} has a convergentsubsequence. Consider an open neighbourhoodO of B; we will show thatall but finitely many terms ofB(N) lie in O. Indeed, the set [0, 1]J \ O isstill compact; if infinitely many termsB(Nk) were to lie in it, they wouldhave a convergent subsequence; but that convergent subsequence would(by above) converge toB, a contradiction.

Finally, since

xr(N) = νr(N)∏

i

(1− Bi(N))Air ,

we conclude thatxr (N)N → xr . �

We have shown that in the limiting regime of high arrival rates andlargecapacities, we can approximate the most likely state of a large network bythe most likely state of the limit. Our next task is to show that the dis-tribution of n(N), appropriately scaled, converges to a normal distributionconditioned on belonging to a subspace. The definitions below will be use-ful in defining the subspace.

Let

pN(n(N)) =∏

r

νr (N)nr (N)

nr (N)!,

(the unnormalised distribution ofn(N)), and also let

mj(N) = C j(N) −∑

r

Ajr nr (N),

the number of spare circuits on linkj in theNth network. Let

B = { j : Bj > 0}, AB = (Ajr , j ∈ B, r ∈ R).

The matrixAB contains those rows ofA that correspond to links with pos-itive entries ofB. Intuitively, those are the links where we expect to havepositive blocking probability.

Page 72: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

66 Loss networks

3.5.2 Limit theorems

Choose a sequence of statesn(N) ∈ S(N), N = 1, 2, . . . and let

ur (N) = N−1/2(nr (N) − xr(N)), r ∈ R.

Thus we are centering the vector (ur (N), r ∈ R) on the approximate modeas found in the last section, and using the appropriate scaling for conver-gence to a normal distribution.

Theorem 3.14 The distribution of u(N) = (ur(N), r ∈ R) convergesto the distribution of the vector u= (ur , r ∈ R) formed by conditioningindependent normal random variables ur ∼ N(0, xr ), r ∈ R, on ABu = 0.Moments converge also, and hence

1NE[nr(N)] → xr , r ∈ R.

Remark 3.15 The proof proceeds by directly estimating the unnormalisedprobability masspN(n(N)), and using Stirling’s formula in the form

n! = (2πn)1/2 exp(n logn− n+ θ(n)),

where 112n+1 < θ(n) < 1

12n. We’ll not go through the more tedious partsof the proof (controlling error terms in the tail of the distribution) but willsketch the key ideas.

Sketch of proof Restrict attention initially to sequencesn(N) ∈ S(N),N = 1, 2, . . . , such that

r ur(N)2 is bounded above: thennr (N) ∼ xr (N) ∼xr N asN→ ∞. By Stirling’s formula,

pN(n(N)) =∏

r

(2πnr (N))−1/2

· exp

r

(nr (N) logνr(N) − nr (N) lognr (N) + nr(N))

︸ ︷︷ ︸

X

+O(N−1)

.

We now expandX:

X = −∑

r

nr (N) lognr(N)eνr(N)

= −∑

r

nr(N) logxr(N)νr(N)

︸ ︷︷ ︸

first term

−∑

r

nr (N) lognr (N)exr(N)

︸ ︷︷ ︸

second term

.

The first term, by the construction ofx as a solution to an optimization

Page 73: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.5 A central limit theorem 67

problem, is∑

r

nr (N)∑

j

yj(N)Ajr =∑

j

yj(N)∑

r

Ajr nr (N) =∑

j

yj(N)(C j(N)−mj(N)).

To deal with the second term, we note from a Taylor series expansion that

(1+ a)(1− log(1+ a)) = 1− 12

a2 + o(a2) asa→ 0,

and therefore

−nr (N) lognr (N)exr(N)

= (xr (N) + ur(N)N1/2)

(

1− log(1+ur (N)N1/2

xr(N))

)

= xr (N)

1−12

(

ur(N)N1/2

xr (N)

)2

+ o(N−1)

= xr (N) − ur (N)2

2xr+ o(1)

(we absorb the error from replacingxr (N)/N by xr into the o(1) term).Putting these together,

X =∑

j

yj(N)C j(N) +∑

r

xr(N) −∑

j

yj(N)mj(N) −∑

r

ur (N)2

2xr+ o(1),

and thus

pN(n(N)) exp(−∑

r

xr (N) −∑

j

yj(N)C j(N))

︸ ︷︷ ︸

term 1

=

j

(1− Bj(N))mj (N)

︸ ︷︷ ︸

term 2

r

(2πnr (N))−1/2

︸ ︷︷ ︸

term 3

exp

(

−ur (N)2

2xr+ o(1)

)

︸ ︷︷ ︸

term 4

Remark 3.16 We have established this form uniformly over all sequencesn(N) such that

r ur (N)2 is bounded above and thus uniformly overu(N) inany compact set. To be sure that this is enough to determine the limit distri-bution we need to ensure tightness for the distributions ofu(N): that is, weneed to know that probability mass doesn’t leak to infinity asN → ∞, sothat we can compute the limit distribution by normalizing the above formto be a probability distribution. We also need to ensure that theprobabil-ity mass outside of a compact set decays quickly enough that momentscan be calculated from the limit distribution. It is possible to ensure this,

Page 74: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

68 Loss networks

Little’s law by crudely bounding the error terms in Stirling’s formula and the Taylorseries expansion, but we shall omit this step.

Note that term 1 is a constant that does not depend onn (it does dependonN, of course): it will be absorbed in the normalizing constant. Similarly,sincenr (N) ∼ xr N, term 3 can be absorbed in the normalizing constant,with an error that can absorbed into theo(1) term. Term 4 looks promis-ing: we are trying to relate the probability distribution ofu(N) to the formexp(−u2

r /2xr ) corresponding to aN(0, xr) random variable. But what aboutterm 2?

We will now show that term 2 has the effect of conditioning the normaldistribution on the setABu = 0. For j ∈ B, i.e. Bj > 0, we consider thequantity

mj(N) = C j(N) −∑

r

Ajr nr (N) = C j(N) −∑

r

Ajr

(

xr (N) + ur(N)N1/2)

= C j(N) −∑

r

Ajr xr (N)

︸ ︷︷ ︸

=0 for N large enough

r

Ajr ur(N)

N1/2.

Sincemj(N) ≥ 0, we deduce that∑

r Ajr ur (N) ≤ 0 for j ∈ B andN largeenough. Further, term 2 decays geometrically withmj(N), for eachj ∈ B.Thus the relative mass assigned bypN(n(N)) to values ofmj(N) > mdecaysto zero asm increases, and so the relative mass assigned bypN(n(N)) tovalues of−∑

r Ajr ur (N) larger than any givenǫ > 0 decays to zero asNincreases. Therefore, the normalized limiting distribution is concentratedon the setABu = 0, as required.

Corollary 3.17 For r ∈ R, and as N→ ∞,

Lr (N)→ Lr ≡ 1−∏

i

(1− Bi)Air .

Here, Lr(N) is the probability that a call on route r is lost (in network N).

Proof By Little’s law,

(1− Lr (N))νr (N)︸ ︷︷ ︸

λ

· 1︸︷︷︸

W

= E[nr (N)]︸ ︷︷ ︸

L

.

Therefore,

1− Lr (N) =E[nr (N)]νr(N)

=E[nr (N)]/Nνr(N)/N

→ xr

νr=

i

(1− Bi)Air .

Page 75: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.6 Erlang fixed point 69

Erlang fixedpoint—(

Remark 3.18 Observe that if there is a linkr = { j}, then the Corollaryshows thatLr (N)→ Bj .

Example 3.19 Consider the system in Figure 3.8, with link-route inci-dence matrix

A =

(

1 0 10 1 1

)

.

����������

����������������������

������������

��������

��������

����������

����������

n1

n3

n2 n2

C2C1

n3

n1

m1

m2

Figure 3.8 Shared communication link, and break-down of thenumber of circuits in use/ free on each of the links

SupposeB = {1, 2}, so AB = A. Then prior to conditioning onABu =0 we haveur = (nr − xr )/

√N → N(0, xr ), three independent normals.

However, since|B| = 2, the conditionABu = 0 reduces the support by twodimensions, constrainingu to a one-dimensional subspace, wheren1+n3 ≈C1 andn2 + n3 ≈ C2.

The free circuit processesm1 andm2 control the system: linkj blockscalls if and only ifmj = 0. Thusm1 andm2 are often not zero. But in thelimiting regimem1 andm2 are onlyO(1), while n1, n2, n3,C1 andC2 allscale asO(N).

3.6 Erlang fixed point

Consider the equations

E j = E

(1− E j)−1

r

Ajr νr

i

(1− Ei)Air ,C j

(3.7)

(this is the generalisation of the Erlang fixed point equations to matricesAthat may not be 0-1). Our goal now is to show that there exists a uniquesolution to these equations, and that in the limiting regime of the solution

Page 76: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

70 Loss networks

converges to the correct limitB, namely that arising from the maximumprobability optimization problem.

Theorem 3.20 There exists a unique solution(E1, . . . ,EJ) ∈ [0, 1]J sat-isfying(3.7).

Proof We will prove this theorem by showing that we can rewrite theequations (3.7) as the stationary conditions for an optimization problem(with a unique optimum).

Define a functionU(y,C) : R+ × Z+ → R+ by the implicit relation

U(− log(1− E(ν,C)),C) = ν(1− E(ν,C)).

The interpretation is thatU(y,C) is theutilization or mean number of cir-cuits in use in the Erlang model, when the blocking probabilityis E =1− e−y. (See Exercise 3.5.) Observe that asν increases continuously from0 to∞ the first argument ofU increases continuously from 0 to∞, and sothis implicit relation defines a functionU(y,C) : R+ × Z+ → R+. Sinceboth the utilizationν(1− E(ν,C)) and the blocking probabilityE(ν,C) arestrictly increasing functions ofν, the functionU(y,C) is a strictly increas-ing function ofy. Therefore, the function

∫ y

0U(z,C) dz

is a strictly convex function ofy.Consider now the optimization problem

REVISED DUAL:

minimize∑

r

νre−∑

j y j A jr +∑

j

∫ y j

0U(z,C j) dz (3.8)

over y ≥ 0.

Note that this problem looks a lot like (3.5), except that we havereplacedthe linear term in the earlier objective function by a strictly convex term.

By the strict convexity of its objective function (3.8), the REVISEDDUAL has a unique minimum. Differentiating, the stationary conditionsyield that at the unique minimum

r

Ajrνre−∑

i yi Air = U(yj ,C j), j = 1, . . . , J. (3.9)

Now supposeE solves the Erlang fixed point equations (3.7), and defineyj

Page 77: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.6 Erlang fixed point 71

by E j = 1− e−y j (i.e.,yj = − log(1− E j)). We can rewrite (3.7) in terms ofy as

E(ey j

r

Ajr νre−∑

i yi Air ,C j) = 1− e−y j , j = 1, . . . , J

or, moving things from side to side and multiplying by an arbitraryrealparameterλ,

λe−y j = λ

1− E(ey j

r

Ajrνre−∑

i yi Air ,C j)

. (3.10)

But if we make the choice

λ = ey j

r

Ajr νre−∑

i yi Air

and use the definition ofU then (3.10) will become precisely the statementof the stationary conditions (3.9). Since the solution to the stationary condi-tions is unique, we deduce that there exists a unique solutionto the Erlangfixed point equations. �

We will now show that the objective function of the REVISED DUALasymptotically approaches that of the DUAL problem: first we show thatU(z,C j) is close toC j (except of course forz= 0).

Lemma 3.21

U(y,C) = C − (ey − 1)−1 + o(1)

as C → ∞, uniformly over y∈ [a, b] ⊂ (0,∞) (i.e. y on compact sets,bounded away from 0).

Proof Consider an isolated Erlang link of capacityC offered Poisson traf-fic at rateν. Then

π( j) =ν j

j!

C∑

k=0

νk

k!

−1

.

Let ν,C → ∞ with C/ν → 1 − B for someB > 0. (That is, the capacityis smaller than the arrival rate by a constant factor, and the ratio givesB.)Then

π(C) =1

1+ Cν+

C(C−1)ν2+ . . .

→ B,

since the denominator converges to 1+ (1− B) + (1− B)2 + . . . .Now suppose a more precise relationship betweenν,C: suppose that as

Page 78: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

72 Loss networks

C → ∞, ν takes values so thatπ(C) = B for fixed B ∈ (0, 1). Then theexpectation (with respect toπ) of the number of free circuits is

C∑

m=0

mπ(C −m) = π(C)

(

0+ 1 · Cν+ 2 · C(C − 1)

ν2+ . . .

)

π(C)∞∑

m=0

mCm

νm= π(C)

C/ν(1−C/ν)2

→ B(1− B)B2

=1− B

B.

Indeed, we have bounded convergence: the elements of the series convergeterm-by-term to the geometric bound, which is finite. This impliesthat inthe limit we have equality in the place of≤, that is,

Eπ[number of free circuits]→ 1− BB=

e−y

1− e−y= (ey − 1)−1

where, as usual,B = 1 − e−y. Since the utilizationU(y,C) is the expectednumber of busy circuits when the blocking probability isB = 1 − e−y, wehave established the pointwise limit stated in the result. Todeduce uniformconvergence, observe that forB in a compact set contained within (0, 1) theerror in the bounded convergence step can be controlled uniformlyover thecompact set. �

Let E j(N) be the Erlang fixed point, that is the solution to (3.7), fornetwork N in the sequence considered in the last section. As in the lastsection, assumeA is of full rank, so that the solution to (3.6), the conditionson B, is unique.

Corollary 3.22 As N→ ∞, Ej(N)→ Bj .

Proof The Erlang fixed point is the unique minimum of the REVISEDDUAL objective function, and from Lemma 3.21 we can write this objec-tive function as

r

νr(N)e−∑

j y j A jr +∑

j

∫ y j

0U(z,C j(N))dz

= N

r

νre−∑

j y j A jr +∑

j

yjC j + o(1)

,

where the convergence is uniform on compact subsets of (0,∞)J. Thatis, the REVISED DUAL objective function, scaled byN, converges uni-formly to the DUAL objective function of (3.5). Since the DUAL objectivefunction is strictly convex and continuous, the minimum of theREVISEDDUAL objective function converges to it asN→ ∞, as required.

Page 79: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.7 Diverse routing 73

Erlang’sformula

Erlang fixedpoint—)

Remark 3.23 The corollary to Theorem 3.14 showed that the limitingloss probabilityLr is as if links block independently, with linkj rejectinga request for a circuit with probabilityBj where (B1, . . . , BJ) is the uniquesolution to the conditions onB. The above corollary shows, reassuringly,that the Erlang fixed point converges to the same vector (B1, . . . , BJ).

Exercises

Exercise 3.5 Let y andC be given. Consider a single Erlang link of ca-pacityC, whose arrival rateν = ν(y,C) is such that the probability of a lostcall is equal to 1− e−y. Use Erlang’s formula to relatey andν. Then useExercise 1.7 (or Exercise 2.15) to find the mean number of circuitsin use,U(y,C), and verify that it has the form given in proof of Theorem 3.20.

Exercise 3.6 Consider the isolated Erlang link of Lemma 3.21 under thelimiting regime considered there. Show that the distribution ofthe num-ber free circuits on the link converges to a geometric distribution, whoseprobability of being equal to 0 is the blocking probabilityB.

Exercise 3.7 In section 3.2 repeated substitution was noted as a methodfor finding the Erlang fixed point. A variant of repeated substitution is tostart from a vectorB, perhaps (0, 0, . . . , 0), and use the right-hand side of(3.1) to update the components ofB cyclically, one component at a time.Show that, whenA is a 0-1 matrix, this corresponds to finding the minimumin each coordinate direction of the function (3.8) cyclically, one coordinateat a time.[Hint: Show that solving thej th equation (3.9) foryj , for given values ofyi , i , j, corresponds to finding the minimum of the function (3.8) in thej th coordinate direction.]

Deduce from the strict convexity of the continuously differentiable func-tion (3.8) that this method converges.

Describe a generalization of this method, with proof of convergence, forthe case whenA is not a 0-1 matrix.

3.7 Diverse routing

The limit theorems of earlier sections concern networks with increasinglink capacities and loads, but fixed network topology. Next webriefly con-sider a different form of limiting regime, where link capacities and loads

Page 80: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

74 Loss networks

Erlang fixedpoint

Erlang fixedpoint

are fixed or bounded and where the numbers of links and routes in thenetwork increase to infinity.

We begin with a simple example. Consider the star network withn links,as in Figure 3.9. An arriving call requires a single circuit from each of two

Figure 3.9 Star network with 6 links, and one of the routes.

randomly chosen links. Suppose we leave constant the capacities of links,but let the number of links increase to infinity, and suppose the total arrivalrate of calls in the entire network increases at the same rate. Thenit ispossible to show that the Erlang fixed point emerges under the limit asn→ ∞.

Note that the probability that two calls sharing a link actually share theirsecond link also tends to 0 asn→ ∞, since this will be true for a networkwith all capacities infinite, and at any timet the calls present in a finite ca-pacity network are a subset of those present in the infinite capacity network(under the natural coupling).

The Erlang fixed point also emerges for various other network topolo-gies, provided the routing is sufficiently diverse, in the sense that the prob-ability approaches zero that two calls through a given link sharea link else-where in the network. This is unsurprising (although non-trivial toprove),since we would expect the assumption of diverse routing to leadquite natu-rally to links becoming less dependent, and the Erlang fixed point is basedon a link independence approximation.

The Erlang fixed point is appealing as an approximation proceduresinceit works well in a range of limiting regimes, and these regimes cover net-works with large capacities and/or diverse routing, for which exact answersare hard to compute.

Page 81: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.7 Diverse routing 75

Erlang fixedpoint

Erlang fixedpoint

3.7.1 Non-uniqueness

We’ve seen that under fixed routing the Erlang fixed point is unique. Nextwe’ll consider an example of alternative routing interesting in its own rightwhere the fixed point emerges from a diverse routing limit, but is notunique.

Consider a network which is a complete graph onn nodes and symmet-ric: the arrival rate between each pair of nodes isν, and the number ofcircuits on each of the links isC. Suppose that an arriving call is routed di-

Figure 3.10 Complete graph topology, with 5 nodes. A route(solid) and an alternative route (dashed) are highlighted.

rectly if possible, and otherwise a randomly chosen 2-link alternative routeis tried; if that route happens to be blocked at either link thenthe call islost.

We will develop an Erlang fixed point equation for this model, based onthe same underlying approximation that links block independently. In thatcase, ifB is the blocking probability on a link, taken to be the same foreach link, then

P(incoming call is accepted)= (1− B)︸ ︷︷ ︸

can route directly

+ B(1− B)2

︸ ︷︷ ︸

can’t route directly,but can via 2-link detour

and the expected number of circuits per link that are busy isν(1 − B) +2νB(1− B)2. Thus, we look for a solution to

B = E(ν(1+ 2B(1− B)),C) (3.11)

(since if an arrival rateν(1+ 2B(1− B)) is thinned by a factor 1− B we getthe desired expected number of busy circuits).

Remark 3.24 An alternative way to derive this arrival rate is to just count

Page 82: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

76 Loss networks

hysteresis the calls that come for linki j : we have an arrival rate ofν for the callsi ↔ j, plus for each other nodek we have a traffic of νB(1 − B) · 1

n−2 forcalls i ↔ k which get rerouted viaj (and the same for callsj ↔ k whichget rerouted viai). Adding these 2(n− 2) terms gives the total arrival rateν(1+ 2B(1− B)).

The solutions to the fixed point equation (3.11) are illustratedin theFigure 3.11. The curve corresponding toC = ∞ arises as follows. Supposethatν,C→ ∞ while keeping their ratio fixed. Then

limN→∞

E(νN,CN) = (1−C/ν)+

wherex+ = max(x, 0) is the positive part ofx. The fixed point equation(3.11) therefore simplifies to

B = [1−C/ν(1+ 2B(1− B))]+ ,

and the locus of points satisfying this equation is plotted asthe dashedcurve in Figure 3.11. Observe that for some values ofν andC there aremultiple solutions (for example whenC = ∞ we have a cubic equation forB, which has multiple roots).

0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5

0.0

0.1

0.2

0.3

0.4

0.5

0.6

load ν

C

B

Figure 3.11 Blocking probabilities from the fixed point equationfor C = 100,C = 1000 (solid) and the limitingC = ∞ (dashed).

Page 83: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.7 Diverse routing 77

Simulations of the system withn large show hysteresis, as in Figure 3.12.When the arrival rate is slowly increased from 0, the blocking probability

P(losing a call)

loadν/C

Figure 3.12 Sketch of loss probability, as we slowly increase orslowly decrease the arrival rate. The dashed curve sketchesthesolution of the Erlang fixed point equation on the same axes.

follows the lower solution forB until that has an infinite derivative, thenjumps to the higher value. If the arrival rate is then slowly decreased, theblocking probability follows the higher solution forB until that has an in-finite derivative, then jumps to the lower value.

An intuitive explanation is as follows. The lower solution correspondsto a mode in which blocking is low, calls are mainly routed directly andrelatively few calls are carried on two-link paths. The upper solution corre-sponds to a mode in which blocking is high and many calls are carried overtwo-link paths. Such calls use two circuits, and this additional demand onnetwork resources may cause a number of subsequent calls also toattempttwo-link paths. Thus a form of positive feedback may keep the system inthe high blocking mode.

And yet the system is an an irreducible finite-state Markov chain,so itmust have auniquestationary distribution, and hence auniqueprobabilityof a link being full, at any given arrival rate.

How can both these insights be true? We can indeed reconcile them, byobserving that they concern two different scaling regimes:

(1) if we leave the arrival rate constant and simulate the network for longenough, then the proportion of time a link is full or the proportionof lostcalls will indeed converge to the stationary probability coming from theunique stationary distribution;

(2) if, however, we fix a time period [0,T], fix the per-link arrival rate

Page 84: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

78 Loss networks

of calls, and let the number of links tend to infinity, then thetime to jumpfrom one branch of the graph above to the other will tend to infinity, i.e.the system will freeze in one of the two modes (low-utilisation orhigh-utilisation).

Remark 3.25 In case (1), the unique stationary distribution for the chainmay be bimodal, with a separation between the two modes that becomesmore pronounced asn increases. If you’re familiar with the Ising modelof a magnet, a similar phenomenon occurs there. The Ising model consid-ers particles located at lattice points with states±1, which are allowed toswitch states, and the particle is more likely to switch to a state where itagrees with the majority of its neighbours. The unique stationary distribu-tion of any finite-sized system is symmetric with mean zero; on the otherhand, if we look over a finite time horizon at ever-larger squares, for someparameter values the system will freeze into a mode where almost all par-ticles are+1 or almost all particles are−1.

Remark 3.26 Some further insight into the instability apparent in thisexample can be obtained as follows. The fixed point equations (3.11) locatethe stationary points of

νe−y + νe−2y(1− 2/3e−y)︸ ︷︷ ︸

non-convex!

+

∫ y

0U(z,C)dz, (3.12)

whereU is the utilization we defined in the proof of Theorem 3.20. (Youwill show this in Exercise 3.8). Since this is a non-convex function, chang-ing the parameters slightly can change the location of the stationary pointsquite a lot. For an example of this phenomenon, see Figure 3.13, whichplots a family of non-convex curves (not this family, however).

Figure 3.13 As we transition from the solid to the dashed andthen the dotted curve, the location of the minimum changesabruptly. The value of the minimum will change continuously.

Page 85: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.7 Diverse routing 79

The diverse routing regime can be used to provide a deeper understand-ing of this example. LetQn j(t) be the number of links withj busy circuitsat time t. Let Xn j(t) = n−1Qn j(t) and letXn(t) = (Xn j, j = 0, 1, . . . ,C).Note that (Xn(t), t ∈ [0,T]) is a random function. But it converges to adeterministic function asn → ∞. Indeed it can be established (Cram-etz and Hunt (1991)) that ifXn(0) converges in distribution toX(0) then(Xn(t), t ∈ [0,T]) converges in distribution to (X(t), t ∈ [0,T]) whereX(·) =(x0(·), x1(·), . . . , xC(·)) is the unique solution to the equations

ddt

j∑

i=0

xi(t)

= ( j + 1)xj+1(t) − (ν + σ(t))xj(t) j = 0, 1, . . . ,C − 1 (3.13)

where

σ(t) = 2νxC(t)(1− xC(t)).

An intuition into these equations is as follows: the sum on theleft is theproportion of links with j or fewer busy circuits; this will increase when alink with j + 1 busy circuits has a circuit become free, and decrease whena call arrives at a link withj busy circuits.

Exercise 3.9 shows that if a probability distributionx = (x0, x1, . . . , xC)is a fixed point of the above system of differential equations thenxC =

B, whereB is a solution to the fixed point equation (3.11), and hence astationary point of the function (3.12). The upper and lower solutions forBin Figure 3.11 correspond to minima in Figure 3.13 and stable fixedpointsof the differential equations, while the middle solution corresponds to theintervening maximum in Figure 3.13 and to an unstable fixed point of thedifferential equation.

3.7.2 Sticky random routing

Mismatches between traffic and capacity are common in communicationnetworks, often caused by forecast errors or link failures. Can some formof alternative routing allow underutilised links to help with traffic over-flowing from busy links? The last section has shown what can go wrong.And if we allow calls to try more and more alternative routes the prob-lem becomesworse, as a higher and higher proportion of calls are carriedover two circuits. How can we control this given that wedo want to allowrerouting of calls? We describe a simple and effective method that was firstimplemented in the BT network and elsewhere in the 1990s. There are twokey ideas.

Page 86: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

80 Loss networks

trunkreservation

trunkreservation

resourcepooling

Erlang’sformula

trunkingefficiency

resourcepooling

First: trunk reservation. A link will accept an alternatively routed callonly if there are≥ s circuits on the link, wheres≪ C is a small constant(for example,s= 5 for C = 100, 1000). This is an idea with a long history,and achieves a form of priority for directly routed calls over alternativelyrouted calls when the link is busy.

Second: the followingsticky random scheme. A call i → j is routeddirectly if there is a free circuit on the link between them. Otherwise, wetry to reroute the call via a tandem nodek(i, j) (which is stored ati). Iftrunk reservation allows the rerouting, the call is accepted. If not, the callis lost, and the tandem nodek(i, j) is reset randomly. Note especially thatthe tandem node is not reselected if the call is successfully routed on eitherthe direct link or the two-link alternative route. The intuitionis that thesystem will look for links with spare capacity in the network, and that trunkreservation will discourage two-link routing except where there issparecapacity.

Dynamic routing strategies such as this are effective in allowing a net-work to respond robustly to failures and overloads. Good strategies effec-tively pool the resources of the network, so that spare capacity in part ofthe network can be available to deal with excess traffic elsewhere.

3.7.3 Resource pooling

Next we look at a very simple model to illustrate an interesting conse-quence ofresource pooling. Consider a number of independent and par-allel Erlang links, each with the same load and capacity. Whatcould begained by pooling them, so that a call for any of the links coulduse a cir-cuit from any of them? Well, to calculate the benefit we simply need torecalculate Erlang’s formulaE(ν,C) with both its parameters increased bythe same factor, the number of links pooled. The effect on the loss prob-ability is shown in Figure 3.14. Intuitively, the aggregating of load and ofcapacity lessens the effect of randomness and this reduces the loss proba-bility, a phenomenon known in the telecommunications industry astrunk-ing efficiency. (If the loads differed from link to link, the reduction in lossprobability from pooling would be even greater.)

Thus as the amount of resource pooling increases, that is asC increases,the blocking probability decreases for a given loadν/C. Indeed, in the limitasC → ∞, the blocking probability approaches 0 for any load less than 1.This admirable state of affairs has one unfortunate consequence. Imaginethe load on the network is gradually increasing, perhaps over months, andthat the blocking probability is used as an indication of the health of the

Page 87: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

3.7 Diverse routing 81

resourcepooling

Figure 3.14 Erlang blocking probability forC = 100,C = 1000(solid) and the limit asC→ ∞ (dashed).

network; in particular, suppose an increase in blocking is usedto indicatea need for capacity expansion, which may take time to implement. WhenC is small the gradual increase in blocking as load increases gives plentyof time for capacity expansion. But whenC is large there is a problem. Inthe limiting case ofC = ∞, nothing is noticed until the load passes through1: at this point the blocking probability curve has a discontinous deriva-tive, and blocking increases rapidly. The moral of this discussion is thatin networks with very efficient resource pooling, the blocking probabilityalone is not a good measure of how close the system is to capacity, andadditional information may be needed. We return to this point at the end ofSection 4.3.

Exercises

Exercise 3.8 Show that the fixed point equations (3.11) locate the sta-tionary points of the function (3.12).[Hint: Use the definition of the functionU from the proof of Theorem 3.20.]

Exercise 3.9 Show that if a probability distributionx = (x0, x1, . . . , xC)is a fixed point of the system of differential equations (3.13), then it is alsothe stationary distribution for an Erlang link of capacityC with a certainarrival rate. Deduce that necessarilyxC = B, whereB is a solution to thefixed point equation (3.11).

Exercise 3.10 Suppose that when a calli → j is lost in the sticky randomscheme the tandem nodek(i, j) is reset either at random from the set of

Page 88: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

82 Loss networks

resourcepooling

trunkreservation

lossnetwork—)

nodes other thani and j, or by cycling through a random permutation ofthis set of nodes. Letpk(i, j) denote the long-run proportion of callsi → jwhich are offered to tandem nodek, and letqk(i, j) be the long-run propor-tion of those callsi → j and offered to tandem nodek which are blocked.Show that

pa(i, j)qa(i, j) = pb(i, j)qb(i, j), a, b , i, j.

Deduce that if the blocking fromi → j is high on the path via the tandemnodek, then the proportion of overflow routed via nodek will be low.

Exercise 3.11 Consider poolingC independentM/M/1 queues, each withthe same parameters, into a singleM/M/1 queue with service rateC. Showthat the mean sojourn time in the system of a customer is divided byC. Sketch a diagram parallel to Figure 3.14, plotting mean sojourntimeagainst load. Note that in the limiting case ofC = ∞, the mean sojourntime is discontinuous at a load of 1.

Exercise 3.12 Trunk reservation has several other uses. Consider a singlelink of capacityC at which calls of typek, requiringAk circuits, arrive asindependent Poisson streams of rateνk. Let n = (nk)k describe the numberof calls of each type in progress. Suppose calls of typek are accepted ifand only if the the vectorn prior to admission satisfies

r

Arnr ≤ C − sk

where the trunk reservation parameterssk satisfysk ≥ Ak. Note that varyingsk varies the relative loss probabilities for different types of call. Show thatif, for eachk, sk = maxr Ar then the loss probability does not depend uponthe call type.

3.8 Further reading

Extensive reviews of work on loss networks are given by Kelly (1991) andRoss (1995); Zachary and Ziedins (2011) review recent work on dynamicalbehaviour. Sticky random schemes are surveyed by Gibbenset al. (1995).

Exercise 3.3 is taken from Pallant and Taylor (1995).

Page 89: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

Part II

83

Page 90: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness
Page 91: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

Braess’sparadox

4

Decentralized optimization

A major practical and theoretical issue in the design of communicationnetworks concerns the extent to which control can be decentralized. Overa period of time the form of the network or the demands placed on it maychange, and routings may need to respond accordingly. It is rarely the case,however, that there should be a central decision-making processor, decid-ing upon these responses. Such a centralized processor, even ifit were itselfcompletely reliable and could cope with the complexity of thecomputa-tional task involved, would have its lines of communicationthrough thenetwork vulnerable to delays and failures. Rather, control should be de-centralized and of a simple form: the challenge is to understand how suchdecentralized control can be organized so that the network as a whole reactssensibly to fluctuating demands and failures.

The behaviour of large-scale systems has been of great interestto math-ematicians for over a century, with many examples coming from physics.For example, the behaviour of a gas can be described at the microscopiclevel in terms of the position and velocity of each molecule. At this levelof detail a molecule’s velocity appears as a random process, with a station-ary distribution as found by Maxwell. Consistent with this detailed micro-scopic description of the system is macroscopic behaviour best describedby quantities such as temperature and pressure. Similarly the behaviourof electrons in an electrical network can be described in terms of randomwalks, and yet this simple description at the microscopic level leads torather sophisticated behaviour at the macroscopic level: thepattern of po-tentials in a network of resistors is just such that it minimizesheat dissi-pation for a given level of current flow. The local, random behaviour ofthe electrons causes the network as a whole to solve a rather complex opti-mization problem.

Of course simple local rules may lead to poor system behaviour if therules are the wrong ones. Road traffic networks provide a chastening exampleof this. Braess’s paradox describes how, if a new road is added to acon-

85

Page 92: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

86 Decentralized optimization

loss network gested network, the average speed of traffic may fall rather than rise, andindeed everyone’s journey time may lengthen. The attempts of individu-als to do the best for themselves lead to everyone suffering. It is possibleto alter the local rules, by the imposition of appropriate tolls, so that thenetwork behaves more sensibly, and indeed road traffic networks providedan early example of the economic principle that externalities need to beappropriately penalized for decentralized choices to lead to good systembehaviour.

In this chapter, we discuss these examples of decentralized optimization.In Section 4.1 we discuss a simple model of the motion of electrons ina network of resistors, and in Section 4.2 we describe a model of roadtraffic. We shall see that our earlier treatment of loss networks in Chapter 3can be placed in this more general context, and we shall work throughsome examples which present ideas we shall see again in later Chapters oncongestion control.

4.1 An electrical network

We begin with a description of a symmetric random walk on a graph, whichwe will later relate to a model of an electrical network.

4.1.1 A game

Consider the following game. On a graph with vertex setG, you perform asymmetric random walk with certain transition rates until you hita subsetS ⊂ V. (A random walk on a graph with transition ratesγ = γ jk is aMarkov process with state spaceG and transition ratesqjk = γ jk if ( j, k) isan edge, andqjk = 0 otherwise. A symmetric random walk hasγ jk = γk j.)The game ends when you reach some vertex inS, and you receive a rewardthat depends on the particular vertex you hit: if the vertex isi, the rewardis vi . How much should you pay to play this game? That is, what is yourexpected reward?

γ23γ12

γ15 γ25

γ45

γ34

1 2 3

45

Figure 4.1 Random walk on a graph with transition ratesγi j

Page 93: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

4.1 An electrical network 87

Kirchhoff’sequationsClearly, the answer depends on the starting position, so letpj be the ex-

pected reward starting fromj. If j ∈ S then, of course,pj = vj . The randomwalk with transition ratesγ jk is reversible, and its invariant distribution isuniform. By conditioning on the first vertexi to which the random walkjumps from j, we obtain the relations

pj =∑

i

γ ji∑

k γ jkpi , for j ∈ G \ S.

We can rewrite this set of equations as follows:

0 =∑

i

current︷ ︸︸ ︷

γi j︸︷︷︸

1/R

(pi − pj)︸ ︷︷ ︸

∆V

, j ∈ G \ S

pj = vj , j ∈ S

Interpretingpi as the voltage at vertex (node)i, andγi j as theconductance(inverse resistance) of the edge (i j ), the first line is asserting that the sumof the currents through all the edges into a given nodei is 0. These areKirchhoff’s equations for an electrical networkG, in which nodesi and jare joined by a resistance ofγ−1

i j , and nodesj ∈ S are held at potentialvj .Can we develop a better understanding of why Kirchhoff’s equations for

the flow of current appear in this game?

4.1.2 Button model

v0 = 0

v1 = 1

Figure 4.2 Button model

Let us consider a variant of the game. We will look at the special case ofS = {0, 1} with v0 = 0 andv1 = 1.

Suppose that on every node of the graph there is a button. Buttonsonnodesj andk swap at rateγ jk; that is, associated to each edge there is anindependent Poisson clock which ticks at rateγ jk, and when it ticks, the two

Page 94: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

88 Decentralized optimization

Kirchhoff’sequations buttons at the vertices of that edge swap places. When a buttonarrives at

node 1, it is painted black; and when it arrives at node 0, it is painted white.(A button may be repainted many times.) You should convince yourselfthat this defines a Markov process, whose state is the set of nodes wherethe buttons are coloured black.

As you will show in Exercise 4.1, each individual button is performing asymmetric random walk on the graph. Thus, for a given button, asking thequestion “am I black?” amounts to asking whether that button’s individualrandom walk has more recently been to 1 or to 0. Since the random walk isreversible, the probability that the button is black is equal tothe probabilitythat the random walk started from that point will in the future visitnode1 before visiting node 0. Thus, from the point of view of each individualbutton, it is playing the game we described above, withS = {0, 1}.

Let us look at another equivalent model for this Markov process.Sup-pose that electrons perform a random walk on the graph with exclusion.That is, an electron in nodei attempts to jump to a neighbouring nodejat rateγi j , but if there is already an electron inj, then that jump will beblocked, and the state will be unchanged. Suppose also that electrons arepushed in at node 1, i.e., as soon as an electron leaves node 1, anotherone appears there; and electrons are pulled out at node 0, i.e., assoon asan electron appears there, it is removed from the system. (You will showthe equivalence of the two models in Exercise 4.2.) This is a rather simplemodel of electron movement in an electrical network, with node 1beingheld at a higher voltage than node 0.

Let pj be the (stationary) probability that nodej is occupied by an elec-tron. Then

p0 = 0, p1 = 1, pj =∑

i

γ ji∑

k γ jkpi , j , 0, 1

Equivalently,

0 =∑

i

γi j (pi − pj), j , 0, 1

pj = vj , j = 0, 1

which are precisely Kirchhoff’s equations for an electrical network inwhich nodesi and j are joined by a resistanceγ−1

i j and nodes 0, 1 are heldat a voltage ofv0 = 0, v1 = 1.

Thenetflow of electrons from nodej to nodek is

γ jkP( j occupied,k empty)− γk jP(k occupied,j empty).

Page 95: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

4.1 An electrical network 89

Now, γ jk = γk j. Also, we can rewrite the difference of probabilities byadding and subtracting the event thatj andk are both occupied, as follows:

P( j occupied,k empty)− P(k occupied,j empty)

=(

P( j occupied,k empty)+ P( j occupied,k occupied))

− (

P( j occupied,k occupied)+ P(k occupied,j empty))

= pj − pk.

Therefore, the net flow of electrons from nodej to nodek is

γ jk(pj − pk),

i.e. we have recovered Ohm’s law that the current flow fromj to k is pro-portional to the voltage difference; the constant of proportionality is theconductance (inverse resistance)γ jk.

We could also prove a result that says that, during a long time interval,the net number of electrons that have moved fromj tok will be proportionalto this quantity, and even give a central limit theorem for the “shot noise”around the average rate of electron motion fromj to k.

4.1.3 Extremal characterization

Next we give another angle of attack on current flow in networks, whichwas developed in the late 19th century.

Let ujk be the current flowing fromj to k, and letr jk = (1/γ jk) be theresistance between nodesj andk. Then the heat dissipation in the networkis

12

j

k

r jku2jk

(the factor of12 is there because we are double-counting each edge). Sup-

pose we want to minimize the heat dissipation subject to a given total cur-rent U flowing from node 0 to node 1, and we are free to choose wherecurrent flows, subject only to flow balance at the other nodes. The problem

Page 96: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

90 Decentralized optimization

Lagrangemultiplier

Lagrangemultiplier

Kirchhoff’sequations

Thomson’sprinciple

is thus as follows.

minimize12

j

k

r jku2jk

over ujk = −uk j, j, k ∈ G

such that∑

k

ujk =

0, j ∈ G

−U, j = 0

+U, j = 1

We can write the Lagrangian as

L(u; p) =12

j<k

r jku2jk +

j

pj

k

ujk

+ p0U − p1U;

we will see that the notationpj for the Lagrange multipliers isn’t acciden-tal. To deal with the conditionujk = −uk j, we will simply eliminateujk withj > k, so that the Lagrangian involves onlyujk with j < k. (In particular,we readuk j for k > j as−ujk in the second term. Also the summation inthe first term is overj < k rather than over allj andk; this corresponds todividing the original objective function by 2.) Differentiating with respectto ujk,

∂L∂ujk

= r jkujk + pj − pk,

so the solution is

ujk =pk − pj

r jk.

That is, the Lagrange multipliers really are potentials in Kirchhoff’s equa-tions, and the currentsujk obey Ohm’s law with these potentials.

This is known asThomson’s principle: the flow pattern of current withina network of resistors is such as to minimize the heat dissipation for a giventotal current. (The Thomson in question is the physicist William Thomson,later Lord Kelvin.)

Extremal characterizations are very useful in deriving certain mono-tonicity properties of current flows. For example, suppose that we removean edge from the network (i.e., assign an infinite resistance to it).What willthis do to the effective resistance of the network? Intuitively, it seems clearthat the effective resistance should increase, but how can we prove it?

This becomes an easy problem if we have found an extremal charac-terization. The effective resistance is the ratio of the heat dissipated to the

Page 97: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

4.1 An electrical network 91

Kirchhoff’sequations

Dirichletproblem

loss networkKirchhoff’s

equationsbistabilitytrunk

reservation

square of the current, and so we will be done if we can show that removingan edge increases the minimum attained in the optimization problem. Butremoving an edge from the graph restricts the set of possible solutions, andso the minimal heat dissipation can only increase (or not go down, anyway)if we enforceujk = 0. It is quite tricky to prove this result without usingthe extremal characterization.

If we evaluate the heat dissipation in the network in terms of thepoten-tial differences rather than currents, then we are led to a dual form of theoptimization problem:

minimize12

j

k

γ jk(pj − pk)2

over pj , j ∈ G

such thatp0 = 0, p1 = 1

This gives the optimality conditions

∂L∂pj=

k

γ jk(pj − pk) = 0, j , 0, 1;

which are again Kirchhoff’s equations.You may recognise the problem of minimising a quadratic form sub-

ject to boundary conditions as the Dirichlet problem. (To get theclassicalDirichlet problem on continuous spaces, imagine a network thatis a squaregrid, and allow the grid to get finer and finer.)

Remark 4.1 It is interesting to note the parallels between the loss net-work of the last Chapter and our model of an electrical network. At themicroscopic level, we have a probabilistic model of call arrivals as a Pois-son process and rules for whether a call is accepted; this parallels ourdescription of electron motion as a random walk with exclusion.At themacroscopic level, we have quantities describing average behaviour overa period of time, such as blocking probabilities or currents and potentials,with relationships determining them, such as the conditions on B or Ohm’slaw and Kirchhoff’s equations. Finally, at what might be termed theteleo-logical level, we have for both networks an extremal characterization – anobjective function that the network is “trying” to optimize.

We saw, in Figures 3.12 and 3.13, that instability of alternative routing could be interpreted in terms of multiple minima of a function. Viewed in this light, the bistability is a natural consequence of the form of the function that the network is trying to minimize. One way of interpreting the trunk reservation, sticky random scheme is as an attempt to align more closely the microscopic rules with the desired macroscopic consequences. (For example, in Exercise 3.10 you showed that the proportion of traffic rerouted from i → j via node k is inversely proportional to the loss rate on this route.)

Exercises

Exercise 4.1 Convince yourself that, if we look only at a single button in the button game, it is performing a symmetric random walk on a graph with transition rates γ_jk. (Of course, the random walks of different buttons will not be independent.)

Exercise 4.2 Convince yourself that, if we identify "electron" = "black button" and "no electron" = "white button", then the button model is equivalent to the model of electron motion, in the sense that the set of nodes occupied by a black button and the set of nodes occupied by an electron both define the same Markov process. Why does swapping two black buttons correspond to a blocked electron jump?

Exercise 4.3 Show that the effective resistance of a network of resistors is not decreased if the resistance of an edge is increased.

4.2 Road traffic models

In the last Section we saw that simple local rules may allow a network to solve a large-scale optimization problem. This is an encouraging insight; but local rules may lead to poor system behaviour if the rules are the wrong ones. We begin with an example that arises in the study of road traffic.

4.2.1 Braess's paradox

Figure 4.3 depicts a road network in which cars travel from South to North. The labels on the edges, or one-way roads, indicate the delay that will be incurred by the cars travelling on that road, as a function of the traffic y (number of cars per unit time) travelling along it.

Let us fix the total flow from South to North at 6 cars per unit time. The left-hand diagram of Figure 4.4 shows how the cars will distribute themselves. Note that all routes from S to N have the same total delay of 83 time units, so no driver has an incentive to switch route.

Figure 4.3 A road network with delays. [Edge labels: two roads with delay 10y and two with delay y + 50, connecting S to N via W and E.]

Figure 4.4 Equilibrium when flow from S to N is 6 (left). Equilibrium when a new road is added (right). [The new road, joining W and E, has delay label y + 10.]

In the right-hand diagram of Figure 4.4, we have introduced an extra road with its own delay function, and found the new distribution of traffic such that no driver has an incentive to switch route. Note that all routes again have the same total delay, but it is now 92 time units! That is, adding extra road capacity has increased everyone's delay. This is in sharp contrast to electrical networks, where adding a link could only make it easier to traverse the network.
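As a quick arithmetic check (a sketch, not from the text), the delays above can be verified directly. The assumed layout is the standard one: each original route uses one road with delay 10y and one with delay y + 50, and the new route uses both 10y roads plus the y + 10 link.

# Braess check: total flow of 6 cars per unit time from S to N.
def delays(x1, x2, x3):
    """Route delays for flows (x1, x2) on the two original routes and x3 on the new route."""
    yA, yB = x1 + x3, x2 + x3                  # flows on the two 10y roads
    d1 = 10 * yA + (x1 + 50)                   # original route 1
    d2 = 10 * yB + (x2 + 50)                   # original route 2
    d3 = 10 * yA + (x3 + 10) + 10 * yB         # new route: both 10y roads plus the y+10 link
    return d1, d2, d3

print(delays(3, 3, 0))    # without the new road: both used routes cost 83
print(delays(2, 2, 2))    # with the new road: every route costs 92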

Next we define and analyse a general model of a road traffic network, with a view to figuring out why the paradox occurs and how we might avoid or fix it.

4.2.2 Wardrop equilibrium

We will model the network as a set J of directed links. (If we want some of the roads to be two-way, we can introduce two links, one going in each direction.) The set of possible routes through the network is R ⊂ 2^J; each route is a subset of links (we will not need to know in what order these are traversed). Let A be the link-route incidence matrix, so A_jr = 1 if j ∈ r and A_jr = 0 otherwise.

Let x_r be the flow on route r, and x = (x_r, r ∈ R) the vector of flows. Then the flow on link j is given by

y_j = ∑_{r∈R} A_jr x_r,   j ∈ J.

We can equivalently write y = Ax. The delay that is incurred on a single link j is given by a function D_j(y_j), which we assume to be continuously differentiable and increasing. (We might also expect it to be convex, but we will not be using this assumption.) We will treat this delay as a steady-state quantity; it doesn't build up from one time period to another. The end-to-end delay along a route will simply be the sum of the delays along each link.

Figure 4.5 Possible delay D_j(y_j) as a function of traffic y_j on a link: often, but not necessarily, D_j'(0) = 0; possibly a vertical asymptote.


The basic premise of our model is that the individual cars care about getting from their origin to their destination, but don't care about the route they take. Let S be the set of source-destination pairs. For us, a source-destination pair is simply a set of routes which serve it. We let H be the route-destination incidence matrix with H_sr = 1 if the source-destination pair s is served by route r and H_sr = 0 otherwise. (Column sums of H are one, i.e. each route has a single source-destination pair that it serves.) The flow f_s on a source-destination pair s is the sum of the flows along all the routes serving it:

f_s = ∑_{r∈R} H_sr x_r,   s ∈ S.

Equivalently, we write f = Hx. We would like to answer the following question. Does there always exist a stable routing pattern, where none of the drivers has any incentive to switch routes? If so, can we characterise this routing pattern in a way that provides some insight into the paradox?

To illustrate the concepts of the link-route and route-destination incidence matrices, consider the network in Figure 4.6.

Figure 4.6 Links on an example network: nodes a, b, c and links 1–5. Routes are ab = {1}, ac = {1, 3}, ba = {2}, bc = {3}, ca1 = {5}, ca2 = {4, 2}, cb1 = {4}, cb2 = {5, 1}.

We will take it to have the link-route incidence matrix

         ab  ac  ba  bc  ca1 ca2 cb1 cb2
A =  1 (  1   1   0   0   0   0   0   1 )
     2 (  0   0   1   0   0   1   0   0 )
     3 (  0   1   0   1   0   0   0   0 )
     4 (  0   0   0   0   0   1   1   0 )
     5 (  0   0   0   0   1   0   0   1 )


and the corresponding route-destination incidence matrix

          ab  ac  ba  bc  ca1 ca2 cb1 cb2
H =  ab (  1   0   0   0   0   0   0   0 )
     ac (  0   1   0   0   0   0   0   0 )
     ba (  0   0   1   0   0   0   0   0 )
     bc (  0   0   0   1   0   0   0   0 )
     ca (  0   0   0   0   1   1   0   0 )
     cb (  0   0   0   0   0   0   1   1 )

(Note that we are omitting the potential two-link route serving ba and consisting of links 3 and 5; our modelling assumptions allow us to consider any set of routes, not necessarily all the physically possible ones.)

Let us address what we mean by a stable routing pattern. Given a set of flows x_r, we can determine the delay of a driver on each of the routes. To find the delay of a driver on route r, we find the traffic y_j along each link of the network, evaluate the associated delay D_j(y_j), and add up the delays along all the links of a given route. A routing pattern will be stable if none of the drivers has any incentive to switch: that is, for any route r serving a given source-destination pair and carrying a positive amount of traffic, the aggregate delay along all other routes serving the same source-destination pair is at least as large. Put into our notation, this becomes

x_r > 0  ⟹  ∑_{j∈J} D_j(y_j) A_jr ≤ ∑_{j∈J} D_j(y_j) A_jr′   for all r′ ∈ s(r),

where s(r) is the set of routes serving the same source-destination pair as r.

Such a stable pattern has a name.

Definition 4.2 A Wardrop equilibrium is a vector of flows along routes x = (x_r, r ∈ R) such that

x_r > 0  ⟹  ∑_{j∈J} D_j(y_j) A_jr = min_{r′∈s(r)} ∑_{j∈J} D_j(y_j) A_jr′,

where y = Ax.

From our definition, it is not clear that a Wardrop equilibrium exists and how many of them there are. We will now show existence of a Wardrop equilibrium, by exhibiting an alternative characterization of it.

Theorem 4.3 A Wardrop equilibrium exists.


Proof Consider the optimization problem

minimize   ∑_{j∈J} ∫_0^{y_j} D_j(u) du
over   x ≥ 0, y
subject to   Hx = f, Ax = y.

(Note that we are leaving y unconstrained; we will get y ≥ 0 automatically, because y = Ax.)

The feasible region is convex and compact, and the objective function is differentiable and convex (because D_j is an increasing, continuous function). Thus, an optimum exists and can be found by Lagrangian techniques. The Lagrangian for the problem is

L(x, y; λ, µ) = ∑_{j∈J} ∫_0^{y_j} D_j(u) du + λ · (f − Hx) − µ · (y − Ax).

To minimize, we differentiate:

∂L/∂y_j = D_j(y_j) − µ_j,   ∂L/∂x_r = −λ_{s(r)} + ∑_j µ_j A_jr.

We want to find a minimum over x_r ≥ 0 and all y_j ∈ ℝ. This means that at the minimum the derivative with respect to y_j is equal to 0, i.e. we can identify µ_j = D_j(y_j) as the delay on link j. The derivative with respect to x_r must be nonnegative, and 0 if x_r > 0, so

λ_{s(r)} = ∑_j µ_j A_jr   if x_r > 0,
λ_{s(r)} ≤ ∑_j µ_j A_jr   if x_r = 0.

Therefore, we can interpret λ_{s(r)} as the minimal delay available to the source-destination pair s(r).

Consequently, solutions of this optimization problem are in one-to-one correspondence with Wardrop equilibria. □
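The proof also suggests a computation: a Wardrop equilibrium can be found numerically by minimizing ∑_j ∫_0^{y_j} D_j(u) du. The sketch below (not from the text) does this for the augmented Braess network of Section 4.2.1, where the linear delays make the integrals quadratics; the solver choice and starting point are incidental.

import numpy as np
from scipy.optimize import minimize

# Links: two 10y roads, two y+50 roads, one y+10 road;  D_j(y) = a_j * y + b_j.
a = np.array([10.0, 10.0, 1.0, 1.0, 1.0])
b = np.array([0.0, 0.0, 50.0, 50.0, 10.0])
# Link-route incidence matrix (columns: route 1, route 2, new route).
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
f = 6.0                                        # total flow from S to N

def objective(x):
    y = A @ x
    return np.sum(a * y ** 2 / 2 + b * y)      # sum_j int_0^{y_j} D_j(u) du for linear D_j

res = minimize(objective, x0=np.full(3, f / 3), bounds=[(0, None)] * 3,
               constraints=[{'type': 'eq', 'fun': lambda x: x.sum() - f}])
y = A @ res.x
print(np.round(res.x, 3))                      # route flows: approximately (2, 2, 2)
print(np.round(A.T @ (a * y + b), 2))          # route delays: all approximately 92

The minimizer reproduces the equilibrium of Figure 4.4: all three routes carry flow 2 and see delay 92, even though the objective being minimized is not the total delay.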

Remark 4.4 Can we interpret the function ∫_0^{y_j} D_j(u) du appearing in the above optimization? Not that easily: the glib answer is that it is the function whose derivative is D_j(y_j). The reader with a well-developed physics insight might recall the relationship between potential energy and force: if the force acting on an object is a function of position only, then the object's potential energy is the function whose derivative is the (negative of the) force. Despite this indirect definition, we have a well-developed intuition about energy, and of the tendency of physical systems to move to minimum potential energy configurations.

Later, in Exercise 7.13, we shall touch on an economic parallel where an abstract concept, utility, is defined as a function whose derivative gives a measurable quantity, demand.

Remark 4.5 Is the Wardrop equilibrium unique? Strictly speaking, no. If we assume that all D_j are strictly increasing, then we can conclude that there is a unique optimum for the link flows y in the optimization problem above, since the objective function will be a strictly convex function of y. However, for a general link-route incidence matrix, there is no reason to expect uniqueness in x. For example, in the network in Figure 4.7 we clearly can shift traffic between the solid and the dashed routes while keeping the traffic on each link the same.

Of course, as in the previous chapter, if we assume the existence of single-link routes, the Wardrop equilibrium will be unique, but it is somewhat less natural to assume the existence of single-link traffic here.

Figure 4.7 Network with four links, two 2-link routes (solid), and two 3-link routes (dashed). This network has many traffic patterns x giving the same link flows y.

In the course of showing existence of the Wardrop equilibrium, we have arrived at a certain extremal characterisation of it: it is a set of traffic flows that optimizes, subject to constraints, the quantity ∑_{j∈J} ∫_0^{y_j} D_j(u) du. If we add an extra link (and, therefore, extra possible routing options), then this quantity will decrease. However, this tells us nothing about the changes to the average delay.

How might we design a network so that the individual users' preferences would line up with a societal goal of minimizing the average delay?


Consider the same problem with a different objective function:

minimize   ∑_{j∈J} y_j D_j(y_j)
over   x ≥ 0, y
such that   Hx = f, Ax = y.

The quantity ∑_j y_j D_j(y_j) is the rate at which total delay is incurred by users in the system. (In our model, which has a constant flow of users through the system per unit time, this is the natural quantity to consider.) Applying the same techniques to this new minimization problem, we get the Lagrangian

L(x, y; λ, µ) = ∑_{j∈J} y_j D_j(y_j) + λ · (f − Hx) − µ · (y − Ax).

When we differentiate it, we obtain

∂L/∂y_j = D_j(y_j) + y_j D_j′(y_j) − µ_j,   ∂L/∂x_r = −λ_{s(r)} + ∑_j µ_j A_jr.

At the optimum, we still have

λ_{s(r)} = ∑_j µ_j A_jr   if x_r > 0,
λ_{s(r)} ≤ ∑_j µ_j A_jr   if x_r = 0,

i.e. λ is the minimal total "something" along the route. The quantity µ_j, however, now has an extra term:

µ_j = D_j(y_j) + y_j D_j′(y_j).

If we interpret T_j(y_j) = y_j D_j′(y_j) as a congestion toll that the users of link j must pay, then µ_j is their total cost from both the delay and toll, and λ_{s(r)} is the minimal cost available to source-destination pair s(r). This suggests that, by adding tolls on the links, we may be able to encourage users towards collectively more desirable behaviour. It is of course still a very simple model of a complex social problem. For example, it assumes that delay and toll are weighted equally by each driver. But the optimization formulations do give us insight into the original paradox. The pattern of traffic in Figure 4.4 is optimizing a certain function, but it is not the "right" one. So it should not surprise us that the solution has counter-intuitive properties.


Exercises

Exercise 4.4 Calculate the toll T_j(y) = y D_j′(y) on each of the links in Figure 4.4. Observe that, if this toll is applied just after the new road is built (i.e., while nobody is using it), there is no incentive for any driver to use the new road.

Exercise 4.5 In the definition of a Wardrop equilibrium f_s is the aggregate flow for source-sink pair s, and is assumed fixed. Suppose now that the aggregate flow between source-sink pairs is not fixed, but is a continuous, strictly decreasing function B_s(λ_s), where λ_s is the minimal delay over all routes serving the source-sink pair s, for each s ∈ S. For the extended model, show that an equilibrium exists and solves the optimization problem

minimize   ∑_{j∈J} ∫_0^{y_j} D_j(u) du − G(f)
over   x ≥ 0, y, f
subject to   Hx = f, Ax = y,

for a suitable choice of the function G(f). Interpret your result in terms of a fictitious additional one-link "stay-at-home" route for each source-destination pair s, with appropriate delay characteristics.

[Hint: G(f) = ∑_{s∈S} ∫^{f_s} B_s^{−1}(u) du.]

Exercise 4.6 Recall the DUAL optimization problem for loss networks:

minimize   ∑_r ν_r e^{−∑_j y_j A_jr}  +  ∑_j y_j C_j
such that   y ≥ 0,

where the first term is the carried traffic. Thus the function being minimized does not readily align with, for example, the sum of carried traffic, which we might want to maximize. Consider a three link loss network with

A = ( 1 0 0 1 )
    ( 0 1 0 1 )
    ( 0 0 1 1 )

with ν_j = C_j = 100, j = 1, 2, 3. Show that if ν_123 is large enough, then reducing the capacity of link 1 to zero will increase the carried traffic.


4.3 Optimization of queueing and loss networks

Models similar to those of Section 4.2 have been used for traffic in communication networks: in this Section we consider a queueing network, where the ideas cross over easily, and also a loss network.

4.3.1 A queueing network

Consider an open network of queues. As in the previous models, we will assume that there are routes through this network; a route is simply a set of queues that the customer must traverse. Let φ_j be the service rate at queue j, and let ν_r be the arrival rate of customers on route r. We shall suppose the mean sojourn time of customers in queue j is 1/(φ_j − λ_j), where λ_j = ∑_{r: j∈r} ν_r. (This model would arise from a network of ·/M/1 queues, an example of a network with a product form stationary distribution.)

Suppose that a unit delay of a customer on route r costs w_r; then the mean cost per unit time in the system is

W(ν; φ) = ∑_r w_r ∑_{j: j∈r} ν_r / (φ_j − ∑_{r′: j∈r′} ν_r′),

where ν = (ν_r)_r, φ = (φ_j)_j. There may be multiple routes r that serve the same source-destination traffic, and we may be able to choose how we divide the traffic across these routes. Can we tell if it would be desirable to shift traffic from one such route to another?

Now, consider increasing the arrival rate on route r slightly. The change in total delay will be given by

dW/dν_r = ∑_{j∈r} [ w_r / (φ_j − λ_j)  +  ∑_{r′: j∈r′} ν_r′ w_r′ / (φ_j − λ_j)² ],

where, for each queue j on route r, the first term is the extra delay at queue j of another customer on route r, and the second is the knock-on cost, or externality, to later customers on routes through j of another customer on route r. The expression gives the exact partial derivative, while the annotation gives an intuitively appealing interpretation: a single extra customer on route r will incur delay itself at each queue along its route, and it will additionally cause a knock-on cost to other customers passing through these queues.

If two or more routes can substitute for each other, then the objective function W(ν; φ) can be improved by shifting traffic towards routes with smaller derivatives. (Gallager (1977) describes an algorithm to implement this via routing rules held at individual nodes of a data network.)
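A quick numerical sanity check (a sketch with made-up rates, not from the text): the analytic derivative above can be compared against a finite-difference estimate of W.

import numpy as np

routes = [[0, 1], [0]]                   # route 0 uses queues 0 and 1; route 1 uses queue 0 only
w = np.array([1.0, 2.0])                 # cost of a unit delay on each route
phi = np.array([10.0, 8.0])              # service rates
nu = np.array([3.0, 4.0])                # arrival rates per route

def W(nu):
    lam = np.zeros_like(phi)
    for r, qs in enumerate(routes):
        for j in qs:
            lam[j] += nu[r]
    return sum(w[r] * sum(nu[r] / (phi[j] - lam[j]) for j in routes[r]) for r in range(len(routes)))

lam = np.array([nu[0] + nu[1], nu[0]])
# dW/dnu_0: own delay plus the externality at each queue on route 0.
analytic = sum(w[0] / (phi[j] - lam[j])
               + sum(nu[r] * w[r] for r in range(2) if j in routes[r]) / (phi[j] - lam[j]) ** 2
               for j in routes[0])

eps = 1e-6
numeric = (W(nu + np.array([eps, 0.0])) - W(nu)) / eps
print(round(analytic, 6), round(numeric, 6))   # the two agree (about 1.8756 with these values)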


4.3.2 Implied costs in loss networks

What happens if we explicitly attempt to align routing in a loss network, so as to optimize a weighted sum of carried traffics? In this section, we use the Erlang fixed point to conduct an analysis for a loss network similar to that for the queueing network of the last section. The calculations are more involved for a loss network, since an increase in traffic on a route affects not just the resources on that route: the knock-on effects spread out throughout the network. Nevertheless we shall see that the equations we derive still have a local character.

Consider then the fixed point model described in Chapter 3. Let the blocking probabilities B_1, B_2, . . . , B_J be a solution to the equations

B_j = E(ρ_j, C_j),   j = 1, 2, . . . , J,   (4.1)

where

ρ_j = (1 − B_j)^{−1} ∑_r A_jr ν_r ∏_i (1 − B_i)^{A_ir}   (4.2)

and the function E is Erlang's formula

E(ν, C) = (ν^C / C!) [ ∑_{n=0}^{C} ν^n / n! ]^{−1}.   (4.3)

Then an approximation for the loss probability on route r is given by

1 − L_r = ∏_j (1 − B_j)^{A_jr}.   (4.4)

Suppose that each call carried on route r is worth w_r. Then, under the approximation (4.4), the rate of return from the network will be

W(ν; C) = ∑_r w_r λ_r   where   λ_r = ν_r (1 − L_r)

corresponds to the traffic carried on route r. We emphasise the dependence of W on the vectors of offered traffics ν = (ν_r, r ∈ R) and capacities C = (C_1, C_2, . . . , C_J), and we shall be interested in how W(ν; C) varies with changes in traffic on different routes, or with changes in capacity at different links. Erlang's formula can be extended to non-integral values of its second argument in various ways, but for our purpose there is definitely a preference: we extend the definition (4.3) to non-integral values of the scalar C by linear interpolation, and at integer values of C_j we define the derivative of W(ν; C) with respect to C_j to be the left derivative. (These definitions are explored in Exercise 4.8.)


Let

δ_j = ρ_j ( E(ρ_j, C_j − 1) − E(ρ_j, C_j) ).

(This is Erlang's improvement formula – the increase in carried traffic that comes from a unit increase in capacity.)

Theorem 4.6

d/dν_r W(ν; C) = (1 − L_r) s_r   (4.5)

and

d/dC_j W(ν; C) = c_j,   (4.6)

where s = (s_r, r ∈ R) and c = (c_1, c_2, . . . , c_J) are the unique solution to the linear equations

s_r = w_r − ∑_j c_j A_jr,   (4.7)

c_j = δ_j [ ∑_r A_jr λ_r (s_r + c_j) ] / [ ∑_r A_jr λ_r ].   (4.8)

Remark 4.7 We can interpret s_r as the surplus value of a call on route r: if such a call is accepted it will earn w_r directly but at an implied cost of c_j for each circuit used from link j. The implied costs c measure the expected knock-on effects of accepting a call upon later arrivals at the network. From (4.6) it follows that c_j is also a shadow price, measuring the sensitivity of the rate of return to the capacity C_j of link j. Note the local character of equations (4.7) and (4.8): the right-hand side of (4.7) involves costs c_j only for links j on the route r, while (4.8) exhibits c_j in terms of a weighted average, over just those routes through link j, of s_r + c_j. We can interpret (4.8) as follows: a circuit used for a call on link j will cause the loss of an additional arriving call with probability δ_j; given that it does, the probability the additional lost call is on route r is proportional to A_jr λ_r; and this will lose w_r directly but have a positive knock-on impact elsewhere on route r, captured by the term s_r + c_j.
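The theorem is easy to exercise numerically. The sketch below (not from the text) computes the Erlang fixed point (4.1)–(4.2) by repeated substitution and then solves the linear equations (4.7)–(4.8) for the implied costs c and surplus values s, on a network with the same topology as Exercise 4.6 but with made-up traffic values.

import numpy as np

def erlang_B(rho, C):
    """Erlang's formula E(rho, C), via the recursion E(rho, c) = rho E(rho, c-1) / (c + rho E(rho, c-1))."""
    B = 1.0
    for c in range(1, int(C) + 1):
        B = rho * B / (c + rho * B)
    return B

A = np.array([[1, 0, 0, 1],              # routes {1}, {2}, {3} and {1,2,3}
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)
Cap = np.array([100, 100, 100])
nu = np.array([80.0, 80.0, 80.0, 15.0])  # offered traffic per route (illustrative)
w = np.array([1.0, 1.0, 1.0, 3.0])       # worth of a carried call on each route

B = np.zeros(3)
for _ in range(500):                     # repeated substitution for (4.1)-(4.2)
    passed = np.prod((1 - B)[:, None] ** A, axis=0)       # prod_i (1 - B_i)^{A_ir}
    rho = (A @ (nu * passed)) / (1 - B)
    B = np.array([erlang_B(rho[j], Cap[j]) for j in range(3)])

lam = nu * np.prod((1 - B)[:, None] ** A, axis=0)          # carried traffic, via (4.4)
delta = np.array([rho[j] * (erlang_B(rho[j], Cap[j] - 1) - B[j]) for j in range(3)])

# Equations (4.7)-(4.8) are linear in c once s = w - A^T c is substituted.
Alam = A * lam
M = np.diag((1 - delta) * Alam.sum(axis=1)) + (delta[:, None] * Alam) @ A.T
c = np.linalg.solve(M, delta * (Alam @ w))
s = w - A.T @ c
print("blocking B:", np.round(B, 4))
print("implied costs c:", np.round(c, 4), "  surplus values s:", np.round(s, 4))

A route with negative surplus value s_r is one whose calls cost the network more, in lost future revenue, than the w_r they earn directly; by (4.5), offering more traffic to such a route lowers the rate of return.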

Proof Note that in many ways this is not a complicated statement, as all we are doing is differentiating an implicit function. However, we are going to be somewhat clever about it.

We shall prove the Theorem for the case where (A_jr) is a 0–1 matrix. Suppose, without loss of generality, that there exist marker routes {j} ∈ R for j = 1, 2, . . . , J, with ν_{{j}} = w_{{j}} = 0. For notational simplicity, we will simply write ν_j for ν_{{j}}.

Exercise 4.8 shows that, in order to compute dW/dC_j, it is sufficient to compute dW/dν_j instead. The key observation is that if, when we change ν_j, we also appropriately change the values of all the other ν_k, then we can keep the loads ρ_k and blocking probabilities B_k on all links k ≠ j unchanged. This makes it easy to compute the resulting change in W. (To avoid trivial complications with what it means to have ν_j < 0, we can compute the derivatives around some point where all ν_j are positive; the algebraic machinery will be exactly the same.)

We proceed as follows. Alter the offered traffic ν_j. This will affect directly the blocking probability B_j at link j, and hence the carried traffics λ_r for routes r through link j. This in turn will have indirect effects upon other links through which these routes pass. We can, however, cancel out these indirect effects by judicious alterations to ν_k for k ≠ j. The alterations to ν_k have to be such as to leave the reduced load ρ_k constant for k ≠ j, since then, from (4.1), the blocking probability B_k will be left constant for k ≠ j.

Let us begin by calculating the direct effect of the change in ν_j on the carried traffic λ_r, along a route through link j. From the relation

λ_r = ν_r ∏_k (1 − B_k)^{A_kr},

assuming we can arrange for all B_k, k ≠ j, to be unchanged, the direct effect is

dλ_r = −A_jr (1 − B_j)^{−1} λ_r · (∂B_j/∂ν_j) · dν_j.

Here the partial derivative ∂B_j/∂ν_j is calculated by allowing ν_j to vary, but holding the other traffic offered to link j fixed. We can use Exercise 4.8 to relate δ_j to this partial derivative, and deduce that

∂B_j/∂ν_j = (d/dρ_j) E(ρ_j, C_j) = (1 − B_j) δ_j ρ_j^{−1}.

Thus

dλ_r = −A_jr λ_r δ_j ρ_j^{−1} · dν_j.

Next we calculate the necessary alterations to ν_k for k ≠ j. In order that ρ_k be left constant, for each route r passing through k we must compensate for the change dλ_r by a change

dν_k = −A_kr (1 − B_k)^{−1} · dλ_r.

Indeed, from (4.2), reduced load ρ_k receives a contribution A_kr (1 − B_k)^{−1} λ_r from route r. Observe that, apart from marker routes, the only routes for which λ_r changes are routes through link j. Therefore, we can calculate the effect of this combination of changes of the ν_j on W(ν; C), since we have computed dλ_r for those routes already. Putting the various terms together, we obtain

[ d/dν_j + ∑_r A_jr λ_r δ_j ρ_j^{−1} ∑_{k≠j} A_kr (1 − B_k)^{−1} d/dν_k ] W(ν; C) = −∑_r A_jr λ_r δ_j ρ_j^{−1} w_r.   (4.9)

Further, using (4.6) and Exercise 4.8, we obtain

c_j = −(1 − B_j)^{−1} d/dν_j W(ν; C).   (4.10)

Substituting (4.10) into (4.9) gives (4.8), where we use (4.7) to define s_r.

To calculate the derivative (4.5), compensate for the change in ν_r by alterations to the ν_j that hold ρ_j, and hence B_j, constant: that is

dν_j = −A_jr (1 − B_j)^{−1} ∏_k (1 − B_k)^{A_kr} · dν_r.

The change in carried traffic on route r, worth w_r, is (1 − L_r) dν_r. Thus

[ d/dν_r − ∑_j A_jr (1 − B_j)^{−1} ∏_k (1 − B_k)^{A_kr} d/dν_j ] W(ν; C) = (1 − L_r) w_r.

The derivative (4.5) now follows from the identity (4.10). Finally, uniqueness of the solution to the equations (4.7)–(4.8) is shown in Exercise 4.9. □

Remark 4.8 If traffic serving a source-sink pair can be spread over two or more routes, then the derivatives we've calculated for queueing and loss networks can be used as the basis for a gradient descent algorithm to find a local optimum of the function W. This form of routing optimization is called quasi-static: it corresponds to probabilistically splitting traffic over routes according to proportions that vary slowly, so that the traffic offered to each route is approximately Poisson over short time periods. In contrast, dynamic schemes, such as the sticky random scheme of the last Chapter, route calls according to the instantaneous state of the network.

The discussion of resource pooling at the end of Section 3.7 showed that we should in general expect dynamic schemes to achieve better performance than quasi-static schemes, as a consequence of trunking efficiency. But we saw, in the same discussion, that in networks with efficient resource pooling, there may be little warning that the system as a whole is close to capacity. Implied costs can be calculated for dynamic routing schemes, where they can help anticipate the need for capacity expansion before it becomes apparent through loss of traffic (Key (1988)). Together with measurements of load these calculations can be very helpful with longer-term network planning, since externalities, such as those that caused the difficulties with Braess's paradox, are taken into account.

Exercises

Exercise 4.7 Show that the function W(ν; φ) defined in sub-section 4.3.1 for a queueing network is a strictly convex function of ν, and deduce that a gradient descent algorithm will eventually find the minimum. Show that when w_r does not depend upon r, the queueing network can be recast in terms of the final optimization in Section 4.2, and check that the toll T_j(y) defined there is just the externality identified in sub-section 4.3.1.

Exercise 4.8 For scalar ν, C recall from Exercise 1.8 that

d/dC E(ν, C) ≡ E(ν, C) − E(ν, C − 1) = −(1 − E(ν, C))^{−1} d/dν E(ν, C).

For vector ν, C show that

d/dC_j W(ν; C) = −(1 − B_j)^{−1} d/dν_j W(ν; C).

Exercise 4.9 Show that the equations (4.7)–(4.8) may be rewritten in an equivalent form as a matrix identity

c (I + A λ A^T γ) = w λ A^T γ,

where we define diagonal matrices

λ = diag(λ_r, r ∈ R),   γ = diag( δ_j / [(1 − δ_j) ∑_r A_jr λ_r], j ∈ J ).

Note that (I + γ^{1/2} A λ A^T γ^{1/2}) is positive definite and hence invertible. Deduce that the equations have the unique solution

c = w λ A^T γ^{1/2} (I + γ^{1/2} A λ A^T γ^{1/2})^{−1} γ^{1/2}.

4.4 Further reading

For the reader interested in the history of the subject, Erlang (1925) gives an insight into the influence of Maxwell's law (for the distribution of molecular velocities) on Erlang's thinking, while Doyle and Snell (2000) give a beautifully written account of work on random walks and electric networks going back to Lords Rayleigh and Kelvin. Whittle (2007) treats the optimization of network design in a wide range of areas, with fascinating insights into many of the topics covered in these notes.


5 Random access networks

Consider the situation illustrated in Figure 5.1. We have multiple base stations which cannot talk to each other directly because the Earth is in the way. (The diagram is not to scale.) Instead, they send messages to a satellite. The satellite will broadcast the message (with its address) back to all the stations.

Figure 5.1 Multiple base stations contacting a satellite.

If two stations attempt to transmit messages simultaneously, the messages will interfere with each other, and will need to be retransmitted. So the fundamental question we will be addressing here is contention resolution, i.e. avoiding collisions in such a set-up.

Further examples of the same fundamental problem occur with mobile telephones attempting connection to a base station, or wireless devices communicating with each other. In all cases, if we could instantaneously sense whether someone else is transmitting, there would be no collisions. The problem arises because the finite speed of light causes a delay between the time when a station starts transmitting and the time when the other stations sense this interference. As processing speeds increase, the speed-of-light delays pose a problem over shorter and shorter distances.

One approach might be to use a token ring. Label the stations in a cyclical order. One by one, each station will transmit its messages to the satellite. When it is done, it will send a "token" message to indicate that it is finished; this lets the next station begin transmitting. If a station has no messages to transmit, it will simply pass on the token. This approach has difficulties if the token becomes corrupted, e.g. by ambient noise. But it also does not work well in an important limiting regime where the number of stations is large (and the total traffic is constant), since simply passing on the token around all the stations will take a long time.

Our focus in this chapter will be on protocols that are useful when the number of stations is large or unknown. These protocols use randomness in an intrinsic manner in order to schedule access to the channel, and hence the title of the chapter.

5.1 The ALOHA protocol

The ALOHA protocol was developed for a radio broadcast network connecting terminals on the Hawaiian islands to a central computing facility, and hence its name.

We will assume that time is divided into discrete slots, and all transmission attempts take up exactly one time slot. Let the channel state be Z_t ∈ {0, 1, ∗}, t ∈ ℤ_+. We put Z_t = 0 if no transmission attempts were made during time slot t, Z_t = 1 if exactly one transmission attempt was made, and Z_t = ∗ if more than one transmission attempt was made.

We will suppose that new packets for transmission arrive in a Poisson stream of rate ν, so that in each time slot the random number of newly arriving packets has a Poisson distribution with mean ν. We will model each packet as corresponding to its own station; once the packet is successfully transmitted, the station leaves the system. Thus, there is no queueing of packets at a station in this model.

Let Y_t be the number of new packets arriving during the time slot t − 1. These are all at new stations, and we suppose each of them attempts transmission during time slot t. In addition, stations that previously failed to transmit their packets may attempt to retransmit them during the time slot. A transmission attempt during time slot t will be successful only if Z_t = 1; if Z_t = ∗, there is a collision, and none of the packets involved will be successfully transmitted.

We have so far not described the retransmission policy. We now define it as follows. Let f ∈ (0, 1) be a fixed parameter.

Definition 5.1 (ALOHA protocol) After an unsuccessful transmission attempt, a station attempts retransmission after a delay that is a geometrically distributed random variable with mean f^{−1}, independently of everything else. Equivalently, after its first attempt, a station independently retransmits its packet with probability f in each following slot until it is successfully transmitted.

Let N_t be the backlog of packets awaiting retransmission, i.e. packets that arrived before time slot t − 1 and have not yet been successfully transmitted. Then, under the ALOHA protocol, the number of packets that will attempt retransmission during the tth slot is binomial with parameters N_t and f. Consequently, the total number of transmission attempts during time slot t is

A_t = Y_t + Binom(N_t, f).

The channel state Z_t is 0, 1, or ∗ according to whether A_t is 0, 1, or > 1. The backlog evolves as

N_{t+1} = N_t + Y_t − I{Z_t = 1},

and under our Poisson and geometric assumptions N_t is a Markov chain. We will try to determine whether it is recurrent. First, let's look at the drift of the backlog, that is, the conditional expectation of its change:

E[N_{t+1} − N_t | N_t = n] = E[Y_t − I{Z_t = 1} | N_t = n] = ν − P(Z_t = 1 | N_t = n),

where the first term follows since Y_t is an independent Poisson variable with mean ν. To calculate the second term, note that Z_t = 1 can occur only if Y_t = 1 and there are no retransmission attempts, or if Y_t = 0 and there is exactly one retransmission attempt. If Y_t > 1 then we are guaranteed that Z_t = ∗. Thus

P(Z_t = 1 | N_t = n) = e^{−ν} · n f (1 − f)^{n−1} + ν e^{−ν} · (1 − f)^n.

We conclude that the drift is positive (i.e., backlog is "on average" growing) if

ν > e^{−ν} (n f + (1 − f) ν) (1 − f)^{n−1}.

For any fixed retransmission probability f, the quantity on the right-hand side tends to 0 as n → ∞. Consequently, for any positive arrival rate of messages ν, if the backlog is large enough, we expect it to grow even larger. This strongly suggests that the backlog will be transient – not a good system feature!
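A short simulation (a sketch, not from the text; the values ν = 0.3 and f = 0.1 are arbitrary choices) makes the instability visible: the backlog ratchets upwards and successful slots become increasingly rare.

import numpy as np

rng = np.random.default_rng(1)
nu, f, T = 0.3, 0.1, 20000
N, successes, last_success = 0, 0, 0
for t in range(1, T + 1):
    Y = rng.poisson(nu)                  # new packets, each transmitted in this slot
    R = rng.binomial(N, f)               # retransmission attempts from the backlog
    if Y + R == 1:                       # Z_t = 1: the single attempt succeeds
        successes += 1
        last_success = t
        if R == 1:
            N -= 1                       # a backlogged packet got through
    else:                                # Z_t = 0 or *: all new packets join the backlog
        N += Y
print("backlog:", N, " successful slots:", successes, " last success at t =", last_success)

Typical runs end with a backlog in the thousands and no successes for a long final stretch, in line with Proposition 5.3 below.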

Remark 5.2 The drift condition does not prove that N_t is transient: there are recurrent Markov chains with positive drift. For example, consider the Markov chain where from each state n there are two possible transitions: to n + 2 with probability 1 − 1/n, and to 0 with probability 1/n. Then the drift is 1 − 2/n → 1, but P(hit 0 eventually) = 1, so the chain is recurrent. (The drift condition does show that the chain cannot be positive recurrent.)

For our process, the jumps are constrained, and it is easy to show that E[(N_{t+1} − N_t)² | N_t] is uniformly bounded; this together with the drift analysis can lead to a proof of transience. We will not pursue this approach.

We will see that the system has an even more clear failing than a growing backlog.

Proposition 5.3 Consider the ALOHA protocol, with arrival rate ν > 0 and retransmission probability f ∈ (0, 1). Almost surely, there exists a finite (random) time after which we will always have Z_t = ∗. That is, ALOHA transmits only finitely many packets and then "jams" forever. Formally,

P(∃ J < ∞ : Z_t = ∗ for all t ≥ J) = 1.

Remark 5.4 The time J is not a stopping time. That is, given the history of the system up until time t we cannot determine if J < t and the channel has jammed.

Proof For each size of the backlog, consider the probability that the channel will unjam before the backlog increases:

p(n) = P(channel unjams before backlog increases | N_0 = n)
     = P(∃ T < ∞ : N_1, . . . , N_T = n, Z_T = 0 or 1 | N_0 = n).

To compute p(n), let us think of this as a game. Suppose we reach time t without either the channel unjamming or the backlog increasing. If at time t, Z_t = 0 or 1 then T = t and we "win". If Z_t = ∗ and N_{t+1} > n then no such T exists and we "lose". Otherwise, Z_t = ∗ and N_{t+1} = n, and we are in the same situation at time t + 1; call this outcome "try again". In such a game the probability of eventually winning is

p(n) = P(win) / (P(win) + P(lose)) = P(Z_t = 0 or 1 | N_t = n) / (1 − P(N_{t+1} = n, Z_t = ∗ | N_t = n))
     = [ e^{−ν}(1 + ν)(1 − f)^n + e^{−ν} n f (1 − f)^{n−1} ] / [ 1 − e^{−ν}(1 − (1 − f)^n − n f (1 − f)^{n−1}) ]
     ∼ n f (1 − f)^{n−1} / (e^ν − 1)   as n → ∞.


As we expected, p(n) → 0 as n → ∞. Moreover, these probabilities are summable:

∑_{n=0}^{∞} p(n) < ∞.

Summable sequences of probabilities bring to mind the first Borel–Cantelli lemma.

Theorem 5.5 (First Borel–Cantelli Lemma) If (A_n) is a sequence of events such that ∑_{n=1}^{∞} P(A_n) < ∞, then the probability that infinitely many of the events occur is 0.

Proof

E[number of events occurring] = E[ ∑_n I[A_n] ] = ∑_n E[I[A_n]] = ∑_n P(A_n) < ∞.

Since the number of events occurring has finite expectation, it must be finite with probability 1. □

We cannot apply the Borel–Cantelli lemma directly to the sequence p(n): we need a sequence of events that occur once or not at all, whereas there may be many times when the backlog reaches level n. Instead, we will look at the record values hit at record times. Set R(1) = 1 and let R(r + 1) = inf{t > R(r) : N_t > N_{R(r)}}.

Figure 5.2 A possible trajectory of N_t, and its record values and record times.

By definition, we hit each record value only once. Moreover, with probability 1 the sequence of record values is infinite, i.e. the backlog is unbounded. Finally, since we showed ∑_{n=1}^{∞} p(n) < ∞, we certainly have

∑_{r=1}^{∞} p(N_{R(r)}) ≤ ∑_{n=0}^{∞} p(n) < ∞.

Let A(r) be the event that the channel unjams after the backlog hits its rth record value. By the first Borel–Cantelli Lemma, only finitely many of the events A(r) occur. However, since we hit the rth record value at a finite time, this means that there will be only finitely many successful transmissions:

P(∃ J < ∞ : Z_t = ∗ for all t ≥ J) = 1,

for any arrival rate ν > 0. □

Figure 5.3 Qualitative behaviour of the equilibrium distribution of the backlog with a finite number of stations. [Sketch: horizontal axis N_t, running up to M; vertical axis P(N_t); the distribution has modes at small and at large backlogs.]

Remark 5.6 Suppose the number of stations is large but finite, say M, and that stations that have a backlog do not generate new packets. In that case, the natural assumption is that the number of new arrivals in a time slot has a binomial distribution Y_t ∼ Binom(M − N_t, q), rather than a Poisson distribution.

Since N_t is now an irreducible Markov chain on a finite state space, it can't be transient. But M f may still be large, in which case the equilibrium distribution of the backlog has the form sketched in Figure 5.3. While the backlog is small it tends to stay small. However, if the backlog grows large at some point, it will tend to stay large for a really long time. This bistability for finite M corresponds to transience in the Poisson limit (M → ∞ while Mq = ν).


It may take a very long time for the system to traverse from the left to the right of Figure 5.3 (cf. the discussion of bistability in Section 3.7.1), and the theoretical instability may not be a real problem for many systems. Nevertheless, motivated by large-scale systems where there may be no apparent upper bound on the number of stations, the theoretical question arises: is it possible to construct a random access scheme that has a stable throughput (i.e. a long-run proportion of successfully occupied slots that is positive) for some rate ν > 0? We'll address this in the next two sections. And we'll return to consider a model of a finite number of stations in Section 5.4.

Exercises

Exercise 5.1 In this exercise we show that if we can accomplish some positive throughput, then with longer messages we can accomplish a throughput that's arbitrarily close to 1.

Suppose that a message comprises a single "random access" packet and a random number K of data packets. A station that successfully transmits a random access packet is able to reserve K available slots for the rest of the message. (For example, the station might announce it will use the next K slots to transmit the data packets, and all the other stations avoid transmitting during those slots, as in Figure 5.4. Or the channel might be time divided, so that every kth slot is used by random access packets, and a successful random access packet reserves slots at times not divisible by k.) Indicate why, if a random access scheme achieves a stable throughput η > 0 for the random access packets, then a stable throughput of

EK / (EK + η^{−1})

should be expected in such a hybrid system.

Figure 5.4 A sequence of "random access" and "data" packets from two stations. White transmits one random access packet followed by the rest of the message; three random access packets collide; grey successfully transmits; white transmits again.

Exercise 5.2 In this exercise we look briefly at a distributed model of random access, with a finite number of stations. We've seen a model of this in continuous time, Example 2.24. We now describe a variant of this model in discrete time, to allow for delay in sensing a transmission. Recall that the interference graph has as vertices the stations, and stations i and j are joined by an edge if the two stations interfere with each other: write i ∼ j in this case. First suppose there is an infinite backlog of packets to transmit at each station, and suppose that station j transmits a packet in a time slot with probability f_j, independently over all time slots and all stations. Deduce that successful transmissions from station j form a Bernoulli sequence of rate

f_j ∏_{i∼j} (1 − f_i).

Deduce that if, for each j, packet arrivals at station j occur at a rate less than this and station j transmits with probability f_j if it has packets queued, then the system will be stable.

5.2 Estimating backlog

The problem with the ALOHA protocol is that when the backlog gets large, the probability of successfully transmitting a packet gets very small because packets almost invariably collide. Now, if the stations knew the backlog (they don't!), they could reduce their probability of retransmission when the backlog gets large, thus reducing the number of collisions.

Let us now not distinguish between old and new packets, and simply let N_t be the number of packets awaiting transmission. Suppose that each of them will be transmitted or not with the same probability f, and let us compute the optimal probability of retransmission for a given backlog size. The probability of exactly one transmission attempt given that the backlog is N_t = n is then n f (1 − f)^{n−1}. If we maximize this with respect to f, we will get

0 = ∂/∂f [ n f (1 − f)^{n−1} ] = n (1 − f)^{n−1} − n (n − 1) f (1 − f)^{n−2}   ⟹   f = 1/n,

in which case the probability of a single retransmission attempt is

(1 − 1/n)^{n−1} → e^{−1}   as n → ∞.

Thus, if the stations knew the size of the backlog, we would expect them to be able to maintain a throughput of up to 1/e ≈ 0.368.

Now suppose that stations don't know the backlog size, but can observe the channel state. We will assume that all stations are observing the channel from time 0, so they have access to the sequence Z_1, Z_2, . . . . They will maintain a counter S_t (which we hope to use as a proxy for N_t), the same for all stations, even those with nothing to transmit. Once a station has a packet to transmit, it will attempt transmission with probability 1/S_t.

Intuitively, the estimate S_t should go up if Z_t = ∗, because a collision means that stations are transmitting too aggressively. It should go down if Z_t = 0, because that means the stations aren't being aggressive enough, and have wasted a perfectly good time slot for transmissions. If Z_t = 1, we might let S_t decrease (there was a successful transmission), or stay constant (the transmission probability is about right). So, for example, we might try

S_{t+1} = max(1, S_t − 1)   if Z_t = 0 or 1,
S_{t+1} = S_t + 1   if Z_t = ∗   (naïve values).   (5.1)

(We don't want S_t to fall below 1, because we are using 1/S_t as the transmission probability.) More generally, we will let

S_{t+1} = max(1, S_t + a I{Z_t = 0} + b I{Z_t = 1} + c I{Z_t = ∗})   (5.2)

for some triplet (a, b, c), where we expect a < 0 and c > 0. Our hope is that if S_t can track N_t, then we may be able to achieve a positive throughput. Notice that N_t alone and S_t alone are not Markov chains, but the pair (S_t, N_t) is a Markov chain (why?). We next compute the drift of this Markov chain.

Computing the probability of 0, 1, or more than 1 transmission attempts during a time slot, we have (check these!)

E[S_{t+1} − S_t | S_t = s, N_t = n] = a (1 − 1/s)^n + b (n/s)(1 − 1/s)^{n−1} + c [ 1 − (1 − 1/s)^n − (n/s)(1 − 1/s)^{n−1} ]

and

E[N_{t+1} − N_t | S_t = s, N_t = n] = ν − (n/s)(1 − 1/s)^{n−1}.

Consider now a deterministic approximation to our system by a pair of coupled differential equations: that is, trajectories (s(t), n(t)) with

ds/dt = E[S_{t+1} − S_t | S_t = s, N_t = n],   dn/dt = E[N_{t+1} − N_t | S_t = s, N_t = n].

We shall investigate the drifts as a function of κ(t) = n(t)/s(t), for large values of n, s when, in the limit,

ds/dt = (a − c) e^{−κ} + (b − c) κ e^{−κ} + c,   dn/dt = ν − κ e^{−κ}.   (5.3)

We aim to show that trajectories of solutions to these differential equations converge towards the origin, as in Figure 5.5.

Suppose we choose (a, b, c) so that

(a − c) e^{−κ} + (b − c) κ e^{−κ} + c  < 0   for κ < 1,
(a − c) e^{−κ} + (b − c) κ e^{−κ} + c  > 0   for κ > 1.   (5.4)

Then ds/dt < or > 0 according as κ(t) < or > 1, and trajectories cross the diagonal κ = 1 vertically, and in a downwards direction provided ν < 1/e. This looks promising, since once the trajectory is below κ = 1 we have ds/dt < 0. But we need to know more when κ(t) > 1.

Suppose we insist that (a, b, c) also satisfy

ν − κ e^{−κ} < κ ( (a − c) e^{−κ} + (b − c) κ e^{−κ} + c )   for κ > 1.   (5.5)

Then (Exercise 5.3) the derivative dκ(t)/dt < 0 for κ(t) > 1, so that in the region κ > 1 trajectories approach and cross the diagonal κ = 1.

For either of the choices

(a, b, c) = (2 − e, 0, 1)   or   (1 − e/2, 1 − e/2, 1)   (5.6)

(or many other choices) and any value of ν < e^{−1}, both of these conditions are satisfied (Exercise 5.3), and we can deduce that all trajectories approach the origin. In Figure 5.5, we plot the vector field when (a, b, c) = (1 − e/2, 1 − e/2, 1) and ν = 1/3.

Remark 5.7 Suppose the triplet (a, b, c) satisfies the conditions (5.4) and (5.5), and that ν < 1/e. Then along a trajectory κ(t) converges to a value κ_ν < 1, and κ_ν ↑ 1 as ν ↑ 1/e.

The triplet (a, b, c) = (−1, −1, 1) suggested in (5.1) does not manage to satisfy the conditions (5.4) and (5.5), and its maximal throughput is less than 1/e (but still positive). In Figure 5.6, we show the vector field of the equation for ν = 1/3 (now unstable) and ν = 1/6 (stable).

Note that the triplet (a, b, c) = (1 − e/2, 1 − e/2, 1) only requires the stations to know whether a collision occurred – it does not require a station to distinguish between channel states 0 and 1.

Figure 5.5 Vector field of (s, n) when (a, b, c) = (1 − e/2, 1 − e/2, 1) and ν = 1/3.

Figure 5.6 Vector field of (s, n) with the "naïve" parameters (a, b, c) = (−1, −1, 1). Left: ν = 1/3 (now unstable). Right: ν = 1/6 (stable).

Just as in Remark 5.2 drift analysis didn't conclusively prove transience, here we have not conclusively proven positive recurrence yet. Negative drift does mean that the chain will not be transient (Meyn and Tweedie, 1993, Theorem 8.4.3), but the expected return time may be infinite, as the following argument easily shows. Consider a Markov chain that jumps from 0 to 2^n − 1, for n ≥ 1, with probability 2^{−n}, and for n > 1 transitions deterministically n → n − 1. Then, starting from 0, the return time will be 2^n with probability 2^{−n}, and so the expected return time will be infinite.

We will prove positive recurrence of this system later, in Exercise D.3. Rather than finding an explicit equilibrium distribution (which would be quite hard in this case), we will use the Foster–Lyapunov criteria, a stricter version of the drift conditions.
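Before moving on, the scheme is easy to simulate. The sketch below (not from the text) uses the triplet (1 − e/2, 1 − e/2, 1) and ν = 1/3 < 1/e; the event ordering within a slot is a simplifying assumption. When the scheme is stable the backlog stays bounded, S_t tracks N_t, and the long-run throughput is close to ν.

import numpy as np

rng = np.random.default_rng(0)
a, b, c = 1 - np.e / 2, 1 - np.e / 2, 1.0     # triplet from (5.6)
nu, T = 1 / 3, 200000
N, S, successes = 0, 1.0, 0
for t in range(T):
    N += rng.poisson(nu)                      # new packets join the pool awaiting transmission
    attempts = rng.binomial(N, 1 / S)         # each packet attempts with probability 1/S_t
    if attempts == 1:                         # Z_t = 1
        N -= 1
        successes += 1
        S = max(1.0, S + b)
    elif attempts == 0:                       # Z_t = 0
        S = max(1.0, S + a)
    else:                                     # Z_t = *
        S = max(1.0, S + c)
print("backlog N:", N, " estimate S:", round(S, 1), " throughput:", round(successes / T, 3))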

Exercises

Exercise 5.3 Show that condition (5.5) is sufficient to ensure that κ(t) is decreasing when κ(t) > 1.

Show that either of the choices (5.6) satisfies both conditions (5.4) and (5.5) provided ν < 1/e.

Exercise 5.4 Show that if c ≤ 0 then the conclusion of Proposition 5.3 holds – the channel jams after a finite time – for the scheme (5.2).

Exercise 5.5 Repeat the argument of this section with the "naïve" values a = −1, b = −1, c = 1 of (5.1). Check that the drift of S is negative if N_t < κ S_t, and positive if N_t > κ S_t, for κ ≈ 1.68. What throughput should this system allow?

5.3 Acknowledgement-based schemes

Often it is not possible for stations to observe the channel. For example, a machine using the Internet typically learns that a sent packet has been successfully received when it receives an acknowledgement packet, or ACK. We will now consider what happens when the only information a station has concerning other stations or the use of the channel is the history of its own transmission attempts.

Let us model the situation as follows. Packets arrive from time 0 onwards in a Poisson stream of rate ν > 0. A packet arriving in slot t will attempt transmission in slots t + x_0, t + x_1, t + x_2, . . . until the packet is successfully transmitted, where 1 = x_0 < x_1 < x_2 < . . . , and X = (x_1, x_2, . . . ) is a random (increasing) sequence. Assume the choice of the sequence X is independent from packet to packet and of the arrival process. (We could imagine the packet arrives equipped with the set of times at which it will attempt retransmission.) Write h(x) for the probability that the packet attempts transmission in slot t + x, so that h(1) = 1.


Example 5.8 (ALOHA) For the ALOHA protocol of Section 5.1, h(x) = f for all x > 1.

Example 5.9 (Ethernet) The Ethernet protocol is defined as follows. After r unsuccessful attempts a packet is retransmitted after a period of time chosen uniformly from the set {1, 2, 3, . . . , 2^r}. (This is called binary exponential backoff.)

Here, there isn't an easy expression for h(x), but we can enumerate the first few values. As always, h(1) = 1; h(2) = 1/2; h(3) = 5/8; h(4) = 17/64; h(5) = 305/1024; and so on. (Check these!)
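These values can be checked by a short enumeration over the dyadically growing backoff distributions; the sketch below (not from the text) tracks the exact distribution of each successive attempt time, under the convention that every attempt is unsuccessful.

from fractions import Fraction

def ethernet_h(max_x):
    """h(x) = P(x is an attempt time) for binary exponential backoff, all attempts unsuccessful."""
    h = [Fraction(0)] * (max_x + 1)
    dist = {1: Fraction(1)}                  # distribution of the current attempt time; x_0 = 1
    r = 1                                    # the next backoff is uniform on {1, ..., 2**r}
    while dist:
        for x, p in dist.items():
            h[x] += p
        nxt = {}
        for x, p in dist.items():
            for d in range(1, 2 ** r + 1):
                if x + d <= max_x:
                    nxt[x + d] = nxt.get(x + d, Fraction(0)) + p * Fraction(1, 2 ** r)
        dist, r = nxt, r + 1
    return h

h = ethernet_h(5)
print([str(h[x]) for x in range(1, 6)])      # ['1', '1/2', '5/8', '17/64', '305/1024']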

We are interested in the question of whether, like ALOHA, the channel will eventually jam forever, or if we can guarantee an infinite number of successful transmissions.

For the purpose of argument suppose that the channel is externally jammed from time 0 onwards, so that no packets are successfully transmitted, and hence all potential packet retransmissions of the arriving packets occur. Let Y_t be the number of newly arrived packets that arrive during slot t − 1 and are first transmitted in slot t: then Y_1, Y_2, . . . are independent Poisson random variables with mean ν.

Since the packets' retransmission sequences X are independent, the number of transmissions occurring in slot t will be Poisson with mean (h(1) + . . . + h(t)) ν. (The number of packets arriving in slot 0 and transmitting in slot t is Poisson with mean ν h(t), the number of packets arriving in slot 1 and transmitting in slot t is Poisson with mean ν h(t − 1), and so on.)

The probability of less than two attempts in any given time slot t is

P_t = P( Poisson( (h(1) + . . . + h(t)) ν ) = 0 or 1 ) = ( 1 + ν ∑_{r=1}^{t} h(r) ) e^{−ν ∑_{r=1}^{t} h(r)}.

Let Φ̄ be the set comprising those slots in which less than two attempts are made in this externally jammed channel. Then the expected number of such slots is

H(ν) ≡ E|Φ̄| = ∑_{t=1}^{∞} P_t.

(The numbers of attempts in different slots are not independent, but the linearity of expectation is all we've used here.) The probabilities P_t, and hence H(ν), are decreasing in ν. Let

ν_c = inf{ν : H(ν) < ∞}.

Thus if ν > ν_c then E|Φ̄| < ∞ and hence Φ̄ is finite with probability one.


We will first show that ν_c is the critical rate. For arrival rates below ν_c, with probability 1 infinitely many packets get transmitted; for arrival rates above ν_c, only finitely many packets do. We will then compute the value of ν_c for some schemes, including ALOHA and Ethernet.

Now, let's remove the external jamming, and let Φ be the set comprising those slots in which less than two attempts are made. Then Φ̄ ⊂ Φ, since any element of Φ̄ is necessarily an element of Φ. We will now show that if ν > ν_c, then with positive probability the containment is equality.

Consider an arrival rate ν > ν_c. Subject the system to an additional, independent, Poisson arrival stream of rate ε. There is a positive probability that the additional arrivals jam every slot in the set Φ̄, since Φ̄ is finite with probability one. Thus there is positive probability, say p > 0, that in every slot two or more transmissions are attempted, even without any external jamming. You should convince yourself that in this case Φ = Φ̄ = ∅.

Suppose that did not occur, and consider the situation just after the firstslot, say slotτ1, in which either 0 or 1 transmissions are attempted. It looksjust like time 0, except we likely have a backlog of packets already present.These packets only create more collisions; so the probability that the chan-nel will never unjam from timeτ1 onwards is at leastp.

Similarly, following each time slot in which either 0 or 1 transmissionsare attempted, the probability that the system jams forever from this timeonwards is≥ p. Therefore, the number of successful packet transmissionsis bounded above by a geometric random variable with parameter 1− p.Sinceν andǫ are arbitrary subject toν > νc, ǫ > 0 it follows that for asystem with arrival rateν > νc the expected number of successful packettransmissions is finite. In particular, the number of successfulpacket trans-missions is finite with probability one.

Remark 5.10 This style of argument is known ascoupling: in order toshow that the jamming probability from timeτ1 onwards is at leastp, weconsider realisations of “system starting empty” and “systemstarting witha backlog” where pathwise the backlog can only create more collisions.

Next we show that ifνc > 0 and the arrival rate satisfiesν < νc , thenthe setΦ is infinite with probability one. Suppose otherwise: then thereispositive probability that the setΦ is finite. Now subject the system to anadditional, independent, Poisson arrival stream of rateǫ whereν + ǫ < νc.Then there is a positive probability, sayp > 0, that ineveryslot two or moretransmissions are attempted. Indeed, the earlier coupling argument showsthe number of such slots is bounded by a geometric random variable withparameter 1− p. But this then implies that, for the system with arrival rate


ν + ǫ, E|Φ| < ∞ and hence E|Φ̄| < ∞. But this contradicts the definition of νc. Hence we deduce that the set Φ is infinite with probability one for any arrival rate ν < νc.

We have shown that the channel will unjam infinitely many times for ν < νc, but does this imply that there are infinitely many successful transmissions? To address this question, let Ψi be the set comprising those slots where i retransmission attempts are made, for i = 0, 1. Then Φ ⊂ Ψ0 ∪ Ψ1 (the inclusion may be strict, since the definition of Ψ0, Ψ1 does not involve first transmission attempts). Now

|Φ| = ∑_{t=1}^{∞} ( I[t ∈ Ψ1, Yt = 0] + I[t ∈ Ψ0, Yt = 1] + I[t ∈ Ψ0, Yt = 0] ).

But Yt is independent of I[t ∈ Ψ0] and so with probability one

∑_{t=1}^{∞} ( I[t ∈ Ψ0, Yt = 1] + I[t ∈ Ψ0, Yt = 0] ) = ∞   =⇒   ∑_{t=1}^{∞} I[t ∈ Ψ0, Yt = 1] = ∞.

Since |Φ| = ∞ with probability one we deduce that

∑_{t=1}^{∞} ( I[t ∈ Ψ1, Yt = 0] + I[t ∈ Ψ0, Yt = 1] ) = ∞

with probability one; in each such slot exactly one transmission is attempted, and it is therefore successful. We have thus shown the following result.

Theorem 5.11 If ν ∈ (0, νc) then with probability one an infinite number of packets are successfully transmitted.

If ν > νc then with probability one only a finite number of packets are successfully transmitted; further, the expected number of packets successfully transmitted is finite.

Thus, the question we would like to answer is: What back-off schemes (equivalently, choices of the function h(x)) have νc > 0?

Example 5.12 (ALOHA) Recall that for the ALOHA scheme, h(x) = f for all x ≥ 2, and so Pt ∼ ν f t e^{−ν f t}. Hence H(ν) < ∞ for any ν > 0. Thus νc = 0; as we already saw, with any positive arrival rate, the system will eventually jam. More generally νc = 0 whenever

(log t)^{−1} ∑_{x=1}^{t} h(x) → ∞ as t → ∞,          (5.7)


since this implies ∑_{t=1}^{∞} Pt < ∞ for any ν > 0 (Exercise 5.8). In this sense, the expected number of successful transmissions is finite for any acknowledgement-based scheme with slower than exponential backoff.

Example 5.13 (Ethernet) For the Ethernet protocol described above, as t → ∞,

∑_{r=1}^{t} h(r) ∼ log_2 t.

(The expected time for a packet to make k retransmission attempts is roughly 2^k.) Therefore,

e^{−ν ∑_{r=1}^{t} h(r)} = t^{−ν/log 2 + o(1)},

hence

Pt = ν t^{−ν/log 2 + o(1)} log_2 t,

and ∑_{t=1}^{∞} Pt < ∞ or = ∞ according as ν > or < log 2 ≈ 0.693. That is, νc = log 2.
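A rough numerical illustration (not from the original text): substituting the asymptotic ∑_{r≤t} h(r) ≈ log_2 t into the formula for Pt, the partial sums of Pt level off for ν above log 2 but keep growing for ν below it. The o(1) corrections are ignored, so this is only indicative.

import math

def partial_sum(nu, T):
    total = 0.0
    for t in range(2, T):
        S = math.log(t, 2)                       # crude stand-in for sum of h(r), r <= t
        total += (1 + nu * S) * math.exp(-nu * S)
    return total

for nu in (0.4, 1.0):                            # below and above log 2 ~ 0.693
    print(nu, [round(partial_sum(nu, T), 1) for T in (10**4, 10**5, 10**6)])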

Remark 5.14 Notice that this doesn't mean that the Markov chain describing the backlog of packets is positive recurrent. Indeed, suppose we had a positive recurrent system, with π0 the equilibrium probability of 0 retransmissions, and π1 the equilibrium probability of 1 retransmission in any given time slot. Then the expected number of (new and old) packets successfully transmitted in a slot is

π1 e^{−ν} + π0 ν e^{−ν},

while the expected number of arrivals is ν. Now, π0 + π1 ≤ 1 and clearly ν ≤ 1, so π1 e^{−ν} + π0 ν e^{−ν} ≤ e^{−ν}, and we get

ν ≤ e^{−ν} =⇒ ν ≤ 0.567.

Thus, for 0.567 < ν < 0.693 an acknowledgement-based scheme is most definitely not positive recurrent, even though Ethernet will have an infinite number of packets successfully transmitted in this range.
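The constant 0.567... is the root of ν = e^{−ν}; a two-line check by bisection (purely illustrative):

import math

lo, hi = 0.0, 1.0
for _ in range(60):                  # bisection for the root of nu = exp(-nu)
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mid < math.exp(-mid) else (lo, mid)
print(round(lo, 4))                  # 0.5671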

In fact, (a geometric version of) the Ethernet scheme will not be positive recurrent for any positive arrival rate.

Theorem 5.15 (Aldous (1987)) Suppose the backoff after r unsuccessful attempts is distributed geometrically with mean 2^r. (This will give the same asymptotic behaviour of h(t), but a nicer Markov chain representation.) Then the Markov chain describing the backlog of untransmitted packets is


transient for all ν > 0. Further, if N(t) is the number of packets successfully transmitted by time t, then with probability one N(t)/t → 0 as t → ∞.

It is not known whether there exists any acknowledgement-based scheme and arrival rate ν > 0 such that the scheme is positive recurrent. Theorem 5.15 suggests not, and that some form of channel sensing or feedback, as in the last section, is necessary for positive recurrence. (In practice, acknowledgement-based schemes generally discard packets after a finite number of attempts.)

Remark 5.16 The models of this Chapter assumed that the time axis can be slotted. In view of the very limited amount of information available to a station under an acknowledgement-based scheme, it is worth noting what happens without this assumption. Suppose that a packet that arrives at time t ∈ R+ is first transmitted in the interval (t, t + 1); the transmission is unsuccessful if any other station transmits for any part of the interval, and otherwise the transmission is successful. Suppose that after r unsuccessful attempts, the packet is retransmitted after a period of time chosen uniformly from the real interval (1, ⌊b^r⌋). Then it is known that the probability infinitely many packets are transmitted successfully is one or zero according as ν ≤ νc or ν > νc where νc = (1/2) log b (Kelly and MacPhee (1987)). Comparing this with Exercise 5.7 we see that moving from a slotted to a continuous time model halves the critical value νc.

Exercises

Exercise 5.6 Theorem 5.11 left open the case ν = νc. Show that this case can be decided according as H(νc) is finite or not.
[Hint: Is the additional Poisson arrival stream of rate ǫ really needed in the proof of the Theorem?]

Exercise 5.7 Consider the following generalization of the Ethernet protocol. Suppose that after r unsuccessful attempts, a packet is retransmitted after a period of time chosen uniformly from {1, 2, 3, . . . , ⌊b^r⌋}. Thus b = 2 corresponds to binary exponential backoff. Show that νc = log b.

Exercise 5.8 Check the claim that νc = 0 if condition (5.7) is satisfied. Show that if

∑_{x=1}^{∞} h(x) < ∞

then a packet is transmitted at most a finite number of times (equivalently, it is discarded after a finite number of attempts), and that νc = ∞.


[Hint: First Borel-Cantelli Lemma.]
Suppose that a packet is retransmitted a time x after its first transmission with probability h(x), independently for each x until it is successful. Show that if

h(x) = 1/(x log x),   x ≥ 2,

then no packets are discarded and νc = ∞.

Exercise 5.9 Any shared information between stations may aid coordination. For example, suppose that in an acknowledgement-based scheme a station can distinguish even from odd slots, according to a universal clock, after its first transmission. Show that if a station transmits for the first time as soon as it can, but confines retransmission attempts to even slots, then the number of packets successfully transmitted grows at least at rate νe^{−ν}/2.

5.4 Distributed random access

In this section we consider a finite number of stations attempting to share a channel using the ALOHA protocol, where not all stations interfere with each other. We begin by supposing stations form the vertex set of an interference graph, with an edge between two stations if they interfere with each other. This would be a natural representation of a wireless network distributed over a geographical area.

We've seen already in Example 2.24 a simple model operating in continuous time. Recalling that model, let S ⊂ {0, 1}^R be the set of vectors n = (nr, r ∈ R) with the property that if i and j have an edge between them then ni · nj = 0. Let n be a Markov process with state space S and transition rates

q(n, n − er) = 1 if nr = 1,
q(n, n + er) = exp(θr) if n + er ∈ S

for r ∈ R, where er ∈ S is a unit vector with a 1 as its rth component and 0s elsewhere. We interpret nr = 1 as indicating that station r is transmitting, and we assume that other stations within interference range of station r (i.e. stations that share an edge with station r) sense this and do not transmit. Station r transmits for a time that is exponentially distributed with unit mean. If nr = 0 and no station is already transmitting within interference range of station r then station r starts transmitting at rate exp(θr). We ignore for the moment any delays in sensing, thus any collisions.


The equilibrium distribution for n = (nr, r ∈ R) can, from Example 2.24 or Section 3.3, be written in the form

π(n) = exp(n · θ) / ∑_{m∈S} exp(m · θ),   n ∈ S          (5.8)

where θ = (θr, r ∈ R). The equilibrium probability that station r is transmitting is then

Eθ[nr] = ∑_{n∈S} nr π(n) = ∑_{n∈S} nr exp(n · θ) / ∑_{m∈S} exp(m · θ).          (5.9)
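For small interference graphs the distribution (5.8) and the marginals (5.9) can be computed exactly by enumerating the feasible states. The following Python sketch does this for a hypothetical path graph on four stations (the graph and the values of θ are illustrative, not from the text):

from itertools import product
import math

edges = {(0, 1), (1, 2), (2, 3)}                 # interference graph: a path 0-1-2-3

def feasible(n):
    return all(not (n[i] and n[j]) for i, j in edges)

def marginals(theta):
    # exact E_theta[n_r] under (5.8), by enumerating the feasible states S
    states = [n for n in product((0, 1), repeat=len(theta)) if feasible(n)]
    weights = [math.exp(sum(t * x for t, x in zip(theta, n))) for n in states]
    Z = sum(weights)
    return [sum(w * n[r] for w, n in zip(weights, states)) / Z
            for r in range(len(theta))]

print([round(m, 3) for m in marginals([1.0, 1.0, 1.0, 1.0])])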

If we imagine that packets arrive at station r at a mean rate λr and are served when station r is transmitting, then we should expect the system to be stable provided λr < Eθ[nr] for r ∈ R. By varying the parameters θ = (θr, r ∈ R), what range of values for λ = (λr, r ∈ R) does this allow?

Let

Λ = { λ ≥ 0 : ∃ p(n) ≥ 0, ∑_{n∈S} p(n) = 1 such that ∑_{n∈S} p(n) nr ≥ λr for r ∈ R }.

We'll now show we can allow any vector in the interior of Λ.

Theorem 5.17 (Jiang and Walrand (2010)) Provided λ lies in the interior of the region Λ there exists a vector θ such that

Eθ[nr] > λr,   r ∈ R,

where Eθ[nr] is given by equation (5.9).

Proof Consider the optimization problem

minimize ∑_{n∈S} p(n) log p(n)          (5.10)
subject to ∑_{n∈S} p(n) nr ≥ λr,   r ∈ R,
and ∑_{n∈S} p(n) = 1
over p(n) ≥ 0,   n ∈ S.

Since λ lies in the interior of Λ there exists a feasible solution to this problem, and indeed there exists a feasible solution for any sufficiently small perturbation of the right-hand sides of the constraints. Further the optimum is attained, since the feasible region is compact. The objective function is convex and the constraints are linear, and so the Strong Lagrangian Principle holds (Appendix C).


The Lagrangian for the problem is

L(p, z; θ, κ) = ∑_{n∈S} p(n) log p(n) + ∑_{r∈R} θr ( λr + zr − ∑_{n∈S} p(n) nr ) + κ ( 1 − ∑_{n∈S} p(n) )

where zr, r ∈ R, are slack variables and κ, θr, r ∈ R, are Lagrange multipliers for the constraints. We now attempt to minimize L over p(n) ≥ 0 and zr ≥ 0. To obtain a finite minimum over zr ≥ 0 we require θr ≥ 0; and given this we have at the optimum θr · zr = 0. Further, differentiating with respect to p(n) gives

∂L/∂p(n) = 1 + log p(n) − ∑_{r∈R} θr nr − κ.

By the Strong Lagrangian Principle, we know there exist Lagrange multipliers κ, θr, r ∈ R, such that the Lagrangian is minimized at p, z that are feasible, and p, z are then optimal, for the original problem.

At a minimum over p(n)

p(n) = exp( κ − 1 + ∑_{r∈R} θr nr ).

Choose κ so that (p(n), n ∈ S) sum to one: then

p(n) = exp( ∑_{r∈R} θr nr ) / ∑_{m∈S} exp( ∑_{r∈R} θr mr ),

precisely of the form (5.8). Since zr ≥ 0, we have found a vector θ such that

∑_{n∈S} p(n) nr ≥ λr,

with equality if θr > 0. To obtain the strict inequality of the theorem, note that since λ lies in the interior of the region Λ we can use a perturbation of λ in the above construction.

Remark 5.18 If we had used the transition rates

q(n, n − er) = exp(−θr) if nr = 1,
q(n, n + er) = 1 if n + er ∈ S

for r ∈ R, then the equilibrium distribution (5.8) would be unaltered. Under the first variant, stations transmit for a period with unit mean, whatever the value of θ, while under this second variant a station attempts to start transmission at unit rate whatever the value of θ. A disadvantage of the first variant is that if components of θ are large then collisions (caused by a delay in sensing that a station within interference range has also started transmitting) will become more frequent. The second variant avoids this, and so while transmissions may occasionally collide, there is no tendency for this to increase as loads increase. But a disadvantage of the second variant is that if components of θ are large then stations may hog the system, blocking other stations, for long periods of time.

If we could scale up all the transition rates by the same factor then this would reduce the hogging periods by the same factor; but of course speed-of-light delays limit this.

Remark 5.19 The objective function (5.10) is (minus) the entropy of the probability distribution (p(n), n ∈ S), and so the optimization problem finds the distribution of maximum entropy subject to the constraints. This functional form of objective produces the product-form (5.8) for p at an optimum. We've seen something formally similar in Theorem 3.10 where the solution to the PRIMAL problem (3.4) had a product-form. In each case there are dual variables for each constraint, although these constraints are of very different forms. As in the earlier problem, there are many variations on the objective function (5.10) which will leave the optimum of product-form, and we consider one in Exercise 5.11.

Example 5.20 (Optical wavelength-routed networks) So far in this section we have defined the state space S in terms of an interference graph, but all that is needed is that S be a subset of {0, 1}^R, with the following hierarchical property: n + er ∈ S =⇒ n ∈ S. As an example we'll describe a simple model of an optical wavelength-routed network. Such networks are able to provide very substantial capacity in the core of a communication network.

In optical networks each fibre is partitioned into a number of wavelengths, each capable of transmitting data. The fibres are connected by switches (called optical cross-connects): the fibres are the edges and the switches the nodes of the physical topology. A connection across the network must be routed over the physical topology and must also be assigned a wavelength. The combination of physical route and wavelength is known as a lightpath. We'll consider an all-optical network, where the same wavelength is used along the path (if the wavelength can be converted at intermediate nodes, the model resembles a loss network as in Chapter 3).

Lightpaths carry data between end-points of the physical topology, providing capacity between pairs of end-points. But how should lightpaths be provided, so that the capacity between end-points is aligned with demand?

Suppose that on each fibre there are the same set of (scalar) C wavelengths, and let r label a lightpath. We suppose r identifies a source-sink pair s(r), a set of fibres, and a wavelength, and we let R be the set of lightpaths. It will be convenient to write r ∈ s if source-sink pair s is identified by lightpath r. Note that we do not assume that all lightpaths r ∈ s identify the same set of fibres: a source-sink pair may be served by multiple paths through the physical topology, as well as by multiple wavelengths on a given path.

Let nr = 0 or 1 indicate that lightpath r is inactive or active respectively, and let n = (nr, r ∈ R). Then a state is feasible if active lightpaths using the same wavelength have no fibres in common: let S(C) ⊂ {0, 1}^R be the set of feasible states. With a mild risk of confusion we'll let S be the set of source-sink pairs.

Suppose that each source-sink pair s maintains a parameter θs, and suppose that n is a Markov process with state space S(C) and transition rates

q(n, n − er) = 1 if nr = 1,
q(n, n + er) = exp(θ_{s(r)}) if n + er ∈ S(C)

for r ∈ R, where er ∈ S(C) is a unit vector with a 1 as its rth component and 0s elsewhere.

We recognise this model as just another example of a truncated reversible process, and its stationary distribution is given by expression (5.8) where θr = θ_{s(r)}. But what range of loads λ = (λs, s ∈ S) does this allow?

Let

Λ = { λ ≥ 0 : ∃ p(n) ≥ 0, ∑_{n∈S(C)} p(n) = 1 such that ∑_{r∈s} ∑_{n∈S(C)} p(n) nr ≥ λs for s ∈ S }.

In Exercise 5.14 it is shown that θ = (θs, s ∈ S) can be chosen to support any vector in the interior of Λ.

Remark 5.21 If we imagine time is slotted, then we can multiply wavelengths in the above model by time as well as frequency division. This might help if we want to increase the ratio of raw wavelengths to source-sink pairs.

We've used the properties of a wavelength-routed network to motivate a particular model of routing, and such models may arise in a variety of other contexts. We should note that wide-area optical wavelength-routed networks may provide the basic link capacities in a loss network, and in this role we would expect the allocation of lightpaths to source-sink pairs to change rather slowly, perhaps over a daily cycle. But the above discussion does suggest that simple decentralized algorithms driven by random local choices have the potential to be remarkably effective.

Exercises

Exercise 5.10 By substituting the values of p, z optimizing the Lagrangian, show that the dual to problem (5.10) is

maximize V(θ) = ∑_{r∈R} λr θr − log( ∑_{n∈S} exp( ∑_{r∈R} θr nr ) )
over θr ≥ 0, r ∈ R.

Show that

∂V/∂θr = λr − Eθ[nr]

where the expectation is calculated under the distribution (5.8).
(We shall use this result later, in Exercise 7.20.)

Exercise 5.11 If we imagine station r as a server for the packets that arrive there, then a very crude model for the mean number of packets queued at station r is λr/(E[nr] − λr): this is the approximation we used in Example 2.22 leading to Kleinrock's square root assignment. Consider the problem

minimize ∑_{n∈S} p(n) log p(n) + D ∑_{r∈R} λr/zr
subject to ∑_{n∈S} p(n) nr = λr + zr,   r ∈ R,
and ∑_{n∈S} p(n) = 1
over p(n) ≥ 0, n ∈ S;   zr ≥ 0, r ∈ R.

We've added to the objective function our crude representation of the mean number of packets at stations, weighted by a positive constant D. Show that provided λ lies in the interior of the region Λ the optimum is again of the form (5.8).


Show that the dual to the optimization problem is

maximize V(θ) = ∑_{r∈R} ( λr θr + 2(D λr θr)^{1/2} ) − log( ∑_{n∈S} exp( ∑_{r∈R} θr nr ) )
over θr ≥ 0, r ∈ R,

and that

∂V/∂θr = λr + (D λr / θr)^{1/2} − Eθ[nr].

Exercise 5.12 Consider the choice of the constant D in the model of the last Exercise. Write θr = θr(D), r ∈ R, to emphasise the dependence of the Lagrange multipliers on D. From the form of the optimizing variables z show that the mean number of packets at station r is

( θr(D) λr / D )^{1/2}.

Deduce that, as D → ∞, the form (5.8) concentrates probability mass on vectors n that each identify a maximal independent vertex set (that is an independent vertex set that is not a strict subset of any other independent vertex set).

Exercise 5.13 Show that it is possible to view the state space S(C) of Example 5.20 as derived from an interference graph, to be described.

Exercise 5.14 Establish the claim in Example 5.20.
[Hint: Consider optimization problem (5.10), but with the first constraint replaced by

∑_{r∈s} ∑_{n∈S(C)} p(n) nr ≥ λs for s ∈ S. ]

5.5 Further reading

Our treatment, and use in Section 5.2, of the Foster-Lyapunov criteria follows Hajek (2006), which is an excellent text on several of the topics covered in these notes. Goldberg et al. (2004) provides a review of early work on acknowledgement-based schemes; Jiang and Walrand (2012) and Shah and Shin (2012) discuss recent work on distributed random access.

The problem of decentralized communication becomes harder in a network setting, particularly if packets need to be transmitted across multiple links, or "hops". Fixed point approximations for networks with random access policies (in the spirit of the Erlang fixed point approximation) are discussed in Marbach et al. (2011). A popular alternative to distributed random access in the multihop setting is to use gossiping algorithms to coordinate the communications; Modiano et al. (2006) discusses this approach.


6 Effective bandwidth

Until now, our model of communication has assumed that an individual connection requests some amount of bandwidth (measured, for example, in circuits) and receives all or none of it. We then used this model to develop Erlang's formula and our analysis of loss networks. However, it is possible that the bandwidth profile of a connection is different, and in particular it might fluctuate. Thus, the peak rate required may be greater than the mean rate achieved over the holding period of the connection, as illustrated in Figure 6.1.

Figure 6.1 Possible bandwidth profiles; mean and peak rates

Our goal in this Chapter will be to understand how we might perform admission control for such a system, so as to achieve an acceptable balance between utilizing the capacity of the resources and the probability that the total required rate will exceed the capacity of a resource causing packets to be lost.

There are two extreme strategies that we can identify immediately, each of which reduces the problem to our earlier model of a loss network. At one extreme, we could assume that each connection has its peak rate reserved for it at each resource it passes through. This will mean that packets do not get lost, but may require a lot of excess capacity at resources. At the other extreme, we could reserve for each connection its mean rate and allow packets to queue for resources if there are instantaneous overloads. With intelligent management of queued packets it may be possible to stabilize the resulting queueing network, but at the cost of large delays in packets being delivered. Our goal is to find a strategy between these two extremes.

Exercises

Exercise 6.1 Consider the following (over-)simplified model, based on Example 2.22. Suppose that a connection on route r produces a Poisson stream of packets at rate λr, and that a resource is well described as an M/M/1 queue with arrival rate ∑_{r: j∈r} nr λr and service rate φj. Suppose that a new connection is accepted if and only if the resulting vector n = (nr, r ∈ R) satisfies

∑_{r: j∈r} nr λr < φj − ǫ

for each j ∈ J. Show that, under the M/M/1 assumption, the mean packet delay at each queue is held below 1/ǫ.

6.1 Chernoff bound and Cramer’s theorem

Our intuition for not wanting to use the peak rate for every user is based on the fact that if the bandwidth requirements of the different users are independent, then the total bandwidth requirement when we have many users should scale approximately as the mean. We will now attempt to understand the deviations: that is, with n independent users, how much larger can their total bandwidth requirement get than the sum of the mean rates?

Let X1, . . . , Xn be independent identically distributed copies of a random variable X. Define M(s) = log E e^{sX}, the log-moment generating function. When we need to indicate dependence on the variable X, we will write MX(s). We will make the assumption that M(s) is finite for real s in an open neighbourhood of the origin, and thus that the moments of X are finite.

Now, for any random variable Y and any s ≥ 0 we have the bound

P(Y > 0) = E[I[Y > 0]] ≤ E[e^{sY}].          (6.1)

This is illustrated in Figure 6.2.

Figure 6.2 I[Y > 0] ≤ e^{sY}

Optimizing over s gives us the Chernoff bound:

log P(Y > 0) ≤ inf_{s≥0} log E[e^{sY}].

Now M_{X1+...+Xn}(s) = n MX(s) and so

log P(X1 + . . . + Xn > 0) ≤ n inf_{s≥0} M(s).

We can easily put a number other than 0 in the right-hand side:

log P(X1 + . . . + Xn > nc) = log P((X1 − c) + . . . + (Xn − c) > 0) ≤ n inf_{s≥0} [MX(s) − cs],

since M_{X−c}(s) = MX(s) − cs. This gives us an upper bound on the probability that the sum X1 + . . . + Xn is large; we will next show that this upper bound is asymptotically exact as n → ∞.
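As a concrete illustration (with made-up numbers, not from the text), the bound can be compared with the exact tail probability for Bernoulli random variables:

import math

p, c, n = 0.3, 0.5, 200                          # Bernoulli(p) summands, threshold c > p

def M(s):                                        # log-moment generating function
    return math.log(1 - p + p * math.exp(s))

chernoff = min(M(s) - c * s for s in (i * 0.001 for i in range(5000)))   # crude grid search
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(int(n * c) + 1, n + 1))
print(n * chernoff, math.log(exact))             # upper bound vs. exact log-probability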

Remark 6.1 We've already assumed that M(s) is finite for real s in an open neighbourhood of the origin. A consequence is that M(s) is differentiable in the interior of the interval for which it's finite, with

M′(s) = E(X e^{sX}) / E(e^{sX})

as we'd expect, but we shan't prove this. (For a careful treatment, see Ganesh et al. (2004), Chapter 2.) We'll assume that the infimum in the following statement of Cramer's theorem is attained at a point in the interior of the interval for which M(s) is finite.

Theorem 6.2 (Cramer's theorem) Let X1, . . . , Xn be independent identically distributed random variables with E[X1] < c, P(X1 > c) > 0, and suppose the log-moment generating function M(s) satisfies the above assumptions. Then

lim_{n→∞} (1/n) log P(X1 + . . . + Xn > nc) = inf_{s≥0} [M(s) − cs].

Proof We've shown the right-hand side is an upper bound; let's now establish the lower bound. Let s∗ > 0 be the value that achieves the infimum (it's greater than 0, since M′(0) = E[X1] < c).

We will use tilting. We will define a new set of independent identically distributed random variables, X̃1, . . . , X̃n, whose density is written in terms of the density of X1. The tilted random variable X̃1 will have mean E[X̃1] = c and finite non-zero variance (Exercise 6.3), and so by the central limit theorem

P( c < (1/n)(X̃1 + . . . + X̃n) < c + ǫ ) → 1/2 as n → ∞.

Since the densities of Xi and X̃i are related, this will allow us to bound from below the corresponding probability for Xi.

Let f_{X1}(·) be the density of X1. Define the density of the exponentially tilted random variable X̃1 to be

f_{X̃1}(x) = e^{−M(s∗)} e^{s∗x} f_{X1}(x).

We need to check two things: first, that this is a density function, i.e. integrates to 1; and second, that X̃1 has mean c, as advertised. We first check that f_{X̃1}(·) is a density function:

∫ f_{X̃1}(x) dx = e^{−M(s∗)} ∫ e^{s∗x} f_{X1}(x) dx = e^{−M(s∗)} E[e^{s∗X1}] = 1

by the definition of M. We next check that X̃1 has mean c. To do this, we recall that s∗ was chosen as the point that minimized M(s) − cs; therefore, at s∗, the derivative of this function is 0, i.e. c = M′(s∗). We now consider

E[X̃1] = ∫ x f_{X̃1}(x) dx = ∫ x e^{−M(s∗)} e^{s∗x} f_{X1}(x) dx = E[X1 e^{s∗X1 − M(s∗)}] = M′(s∗) = c


as required.

We now bound the probability that X1 + . . . + Xn > nc from below as follows:

P(X1 + . . . + Xn > nc) > P( nc < X1 + . . . + Xn < n(c + ǫ) )
  = ∫_{nc < x1+...+xn < n(c+ǫ)} f_{X1}(x1) dx1 · · · f_{Xn}(xn) dxn
  = ∫_{nc < x1+...+xn < n(c+ǫ)} e^{M(s∗)} e^{−s∗x1} f_{X̃1}(x1) dx1 · · · e^{M(s∗)} e^{−s∗xn} f_{X̃n}(xn) dxn
  = e^{nM(s∗)} ∫_{nc < x1+...+xn < n(c+ǫ)} e^{−s∗(x1+...+xn)} f_{X̃1}(x1) dx1 · · · f_{X̃n}(xn) dxn
  ≥ e^{nM(s∗)} e^{−s∗n(c+ǫ)} P( nc < X̃1 + . . . + X̃n < n(c + ǫ) ).

The latter probability, as we discussed earlier, converges to 1/2; so we conclude

(1/n) log P(X1 + . . . + Xn > nc) ≥ (M(s∗) − s∗c) − s∗ǫ − (1/n) log 2.

Letting n → ∞ and ǫ → 0 gives us the result we wanted. □

Remark 6.3 An equivalent way of thinking about tilting is that we are changing the underlying probability measure P to Q, with dQ/dP = e^{−M(s∗)} e^{s∗x}. (This means that, for any measurable set S, Q(S) = ∫_S e^{−M(s∗)} e^{s∗x} dP(x).) Under Q, we have E_Q X1 = c. The random variable X̃1 is simply X1, and we compare P- and Q-probabilities of the event (1/n)(X1 + . . . + Xn) ≥ c. This way of thinking about it doesn't require X1 to have a density, which can be useful.

We know more from the proof than we have stated. Suppose the event X1 + . . . + Xn > nc occurs. If n is large, then conditional on that event we can be reasonably certain of two things. First, the inequality is very close to equality. Second, the conditional distribution of X1, . . . , Xn is very close to the distribution of n independent identically distributed random variables sampled from the distribution of X̃1. That is, when the rare event occurs, we know rather precisely how it occurs; results from the theory of large deviations often have this flavour.
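Tilting also underlies a standard simulation technique: sample from the tilted distribution, under which the rare event is typical, and reweight by the likelihood ratio. A minimal sketch for Bernoulli variables (illustrative parameters; here the tilted variable is simply Bernoulli(c)):

import math, random

p, c, n, trials = 0.3, 0.5, 200, 20000
s_star = math.log(c * (1 - p) / (p * (1 - c)))   # solves M'(s) = c for Bernoulli(p)
M_star = math.log(1 - p + p * math.exp(s_star))

random.seed(0)
est = 0.0
for _ in range(trials):
    x = sum(random.random() < c for _ in range(n))   # sample under the tilted measure
    if x > n * c:
        est += math.exp(n * M_star - s_star * x)     # reweight by the likelihood ratio dP/dQ
print(math.log(est / trials))                        # close to n * (M(s*) - c s*)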


Exercises

Exercise 6.2 In this exercise, we record some properties of the log-moment generating function, MX(s) = log E[e^{sX}]. Check the following.

1. If X and Y are independent random variables, M_{X+Y}(s) = MX(s) + MY(s).
2. MX(s) is convex. [Hint: Check its derivative is increasing.]
3. If X is a normal variable with mean λ and variance σ² then MX(s) = λs + (σs)²/2.

Exercise 6.3 We assumed that the infimum in the statement of Theorem 6.2 is attained in the interior of the interval for which M(s) is finite. Check that this implies the tilted random variable X̃1 has finite variance.

6.2 Effective bandwidth

Consider a single resource of capacity C, which is being shared by J types of connection. (For example connections might be real-time audio, or real-time video conference calls). Let nj be the number of connections of type j = 1, . . . , J. Let

S = ∑_{j=1}^{J} ∑_{i=1}^{nj} Xji

where Xji are independent random variables, and where the distribution of Xji can depend on j but not on i. Let

Mj(s) = log E[e^{sXji}].

We view Xji as the bandwidth requirement of the ith connection of type j. We should clearly expect to have ES < C and we will investigate what it takes to assure a small value for P(S > C).

Given C and information about the number and type of connections, the bound (6.1) implies that for any s ≥ 0,

log P(S > C) ≤ log E[e^{s(S−C)}] = ∑_{j=1}^{J} nj Mj(s) − sC.

In particular,

inf_{s≥0} [ ∑_{j=1}^{J} nj Mj(s) − sC ] ≤ −γ   =⇒   P(S > C) ≤ e^{−γ}.          (6.2)


This is useful for deciding whether we can add another call of class k and still retain a service quality guarantee. Let

A = { n ∈ R^J_+ : ∑_{j=1}^{J} nj Mj(s) − sC ≤ −γ for some s ≥ 0 }.

From the implication (6.2)

(n1, · · · , nJ) ∈ A =⇒ P(S > C) ≤ e^{−γ}.

Call A the acceptance region: a new connection can be accepted, without violating the service quality guarantee that P(S > C) ≤ e^{−γ}, if it leaves the vector n inside A.

Remark 6.4 The acceptance region A is conservative, since the implications are in one direction. Cramer's theorem shows that, for large values of C, n and γ, the converse holds, i.e. P(S > C) ≤ e^{−γ} is approximately equivalent to n ∈ A (Exercise 6.4).

Figure 6.3 The convex complement of the acceptance region (axes n1 and n2).

What does the acceptance region A look like? For each s the inequality

∑_{j=1}^{J} nj Mj(s) − sC ≤ −γ

defines a half-space of R^J, and as s varies it indexes a family of half-spaces. The region A has a convex complement in R^J_+ since this complement is just the intersection of R^J_+ with this family of half-spaces. Figure 6.3 illustrates the case J = 2. (As s varies the boundary of the half-space rolls over the boundary of A, typically forming a tangent plane.)


Define

αj(s) = Mj(s)/s = (1/s) log E[e^{sXji}].

Rewriting the inequality, we see that if

∑_{j=1}^{J} nj αj(s) ≤ C − γ/s          (6.3)

then we are guaranteed to have P(S > C) ≤ e^{−γ}. But how should we choose s? If we know the typical operating region, the natural choice is to look for a point n∗ on the boundary of A corresponding to the typical traffic mix, and then choose s to attain the infimum in (6.2) for n = n∗, as illustrated in Figure 6.4.

Figure 6.4 Approximating the acceptance region by a half-space bounded by the tangent at n∗.

We thus have a simple admission control: accept a new connection if it leaves the inequality (6.3) satisfied. We call αj(s) the effective bandwidth of a source of class j. The admission control simply adds the effective bandwidth of a new request to the effective bandwidths of connections already in progress and accepts the new request if the sum satisfies a bound.
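For example, for on-off sources with the effective bandwidth of Exercise 6.6 the rule (6.3) is a one-line check. The parameters below (capacity, γ, s and the two traffic classes) are purely illustrative:

import math

def alpha(s, peak, prob):
    # effective bandwidth of an on-off source: P(X = peak) = prob, P(X = 0) = 1 - prob
    return math.log(prob * math.exp(s * peak) + 1 - prob) / s

C, gamma, s = 1000.0, 20.0, 0.05
classes = [(10.0, 0.2), (2.0, 0.5)]              # (peak rate, activity probability)
n = [150, 220]                                   # connections currently in progress

load = sum(nj * alpha(s, h, p) for nj, (h, p) in zip(n, classes))
print(load, C - gamma / s, load <= C - gamma / s)   # admit while the sum stays below C - gamma/s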

If a similar admission control is enforced at each resource of a network, then the network will behave as a loss network. Note that the effective bandwidth of a connection may vary over the resources of the network (since different resources may choose different values of s, depending on their mix of traffic), just as in our earlier model of a loss network Ajr may vary over resources j for a given route r. Techniques familiar from loss networks can be used, for example trunk reservation to control relative priorities for different types of connection (Exercise 3.12).


Remark 6.5 The admission criterion (6.3) is conservative for two reasons. First the acceptance region A is conservative, as noted in Remark 6.4. Secondly, the boundary of A may not be well approximated by a tangent plane; we explore this next.

Example 6.6 (Gaussian case) How well approximated is the acceptance region A by the linear constraint (6.3)? Some insight can be obtained from the Gaussian case, where explicit calculations are easy. Suppose that

αj(s) = λj + s σj² / 2,

corresponding to a normally distributed load with mean λj and variance σj². Then (Exercise 6.7) the acceptance region is

A = { n : ∑_j nj λj + (2γ ∑_j nj σj²)^{1/2} ≤ C }

and the tangent plane at a point n∗ on the boundary of A is of the form (6.3) with

αj(s) = λj + γ σj² / (C(1 − ρ∗)),

where ρ∗ = ∑_j n∗j λj / C, the traffic intensity associated with the point n∗.

Thus the effective bandwidths αj(s) will be relatively insensitive to the traffic mix n∗ provided (1 − ρ∗)^{−1} does not vary too much with n∗, or, equivalently, provided the traffic intensity is not too close to 1 on the boundary of A. Now, traffic intensity very close to 1 is only sustainable if the variance of the load is very small: this might arise near the nk axis if traffic type k is nearly deterministic.

Exercises

Exercise 6.4 Use Cramer's theorem to show that

lim_{N→∞} (1/N) log P( ∑_{j=1}^{J} ∑_{i=1}^{nj N} Xji > CN ) = inf_{s≥0} [ ∑_{j=1}^{J} nj Mj(s) − sC ].

In this sense the converse to the implication (6.2) holds as the number of sources increases and the tail probability decreases.


Exercise 6.5 In this exercise, we explore properties of the effective bandwidth formula

α(s) = (1/s) log E[e^{sX}].

Show that α(s) is increasing in s over the range s ∈ (0,∞) and that it lies between the mean and the peak:

EX ≤ α(s) ≤ sup{x : P(X > x) > 0}.

[Hint: Jensen's inequality.] Show further that

lim_{s→0} α(s) = EX,   and   lim_{s→∞} α(s) = ess sup X.

Exercise 6.6 If X is a random variable with P(X = h) = p = 1 − P(X = 0), show that its effective bandwidth is

(1/s) log(p e^{sh} + 1 − p).

Exercise 6.7 Consider the Gaussian case, Example 6.6. Show that the infimum in relation (6.2) with n = n∗ is attained at

s = ( C − ∑_j n∗j λj ) / ( ∑_j n∗j σj² ),

and hence deduce the expressions for A and for a tangent plane given in Example 6.6.

Check that, with this value of s, the right-hand side of inequality (6.3) can be written as

C − γ ∑_j n∗j σj² / (C(1 − ρ∗)) = C − γ (variance of load) / (mean free capacity).

For the Gaussian case, we can compute a necessary and sufficient condition on n such that P(S > C) ≤ e^{−γ}: show that it takes the form

∑_j nj λj + φ ( ∑_j nj σj² )^{1/2} ≤ C

where φ = Φ^{−1}(1 − e^{−γ}) and Φ is the standard normal distribution function.

6.3 Large deviations for a queue with many sources

Next we consider a model which includes a buffer: that way, we can allow the total bandwidth requirements of all the connections to exceed the capacity for a while. The challenge is to work out for how long!


Figure 6.5 Three users (of three different types) feeding through a buffer.

We will consider the following model. There is a potentially infinite queue in which unprocessed work resides. As long as the queue is positive, the work is removed from it at rate C; when the queue is empty, work is processed at rate C or as quickly as it arrives. The system experiences "pain" (packets dropped, or held up at earlier queues, etc.) when the queue level exceeds B; this event is called buffer overflow. We would like to give a description of the set of vectors n (giving the number of connections of each type) for which the probability of buffer overflow is small.

Because the queue introduces a time component into the picture, we will need slightly more notation. Let Xji[t1, t2] be the workload arriving from the ith source of type j during the interval [t1, t2]. We will assume that the random processes (Xji) are independent, that the distributions may depend upon the type j but not upon i, and each process has stationary increments: that is, the distribution of Xji[t1, t2] depends on t1, t2 only through t2 − t1.

We next give some simple examples of processes with stationary increments.

Example 6.7 (Constant rate sources) Suppose Xji[t1, t2] = (t2 − t1)Xji, where Xji is a non-negative random variable. That is, each connection has a constant (but random) bandwidth requirement, whose distribution depends on the user type j. This essentially reproduces the bufferless model of the last section: if ∑ Xji < C, buffer overflow never occurs; if ∑ Xji > C, buffer overflow will definitely occur from a finite time onwards, no matter how large the buffer.

Example 6.8 (M/G/1 queue) Suppose Xji[0, t] = ∑_{k=1}^{N(t)} Zk, where N(t) is a Poisson process and Zk are independent identically distributed random variables. This models the arrival of work at an M/G/1 queue: the interarrival times are exponential (hence the first M), and the service requirements of different "customers" are independent but have a general, not necessarily exponential, distribution (hence the G).

Remark 6.9 We might be more interested in the case of a truly finite buffer, in which work that arrives when the buffer is full is never processed. We might expect that buffer overflow would occur more rarely in this system, since its queue should be shorter. In the regime where the buffer rarely overflows (this is the regime we'd like to be in!), the overflows that do occur will be small; so there is not much difference between dropping this work and keeping it. This can be made precise; see Ganesh et al. (2004), Chapter 7.

Suppose the number of connections of type j is nj, let n = (n1, · · · , nJ), and let the workload arriving at the queue over the interval [t1, t2] be

S[t1, t2] = ∑_{j=1}^{J} ∑_{i=1}^{nj} Xji[t1, t2].

We shall suppose the queue has been evolving from time −∞, and look at the queue size at time 0; because the arrival processes Xji are time-homogeneous, this will have the same distribution as the queue size at any other time t. The queue size at time 0 is the random variable

Q(0) = sup_{0≤τ<∞} ( S[−τ, 0] − Cτ ).          (6.4)

To see this, note that the queue at time 0 must be at least S[−τ, 0] − Cτ for any τ, because it cannot clear work any faster than that; that the queue size is indeed given by (6.4) is shown in Exercise 6.8. There we shall also see that the random variable −τ is the time the queue was last empty before time 0 (more exactly, in the case of non-uniqueness, the random variable −τ is such that Q(−τ) = 0 and the queue drains at the full rate C over (−τ, 0)).
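In discrete time, formula (6.4) can be checked against the usual queue recursion Q_t = max(Q_{t−1} + A_t − C, 0). A small simulation sketch (arrival distribution and parameters chosen arbitrarily for illustration):

import random

random.seed(1)
C, T = 5.0, 1000
A = [random.expovariate(1 / 4.5) for _ in range(T)]   # i.i.d. arrivals, mean 4.5 < C

Q = 0.0
for a in A:                                            # queue started empty, Lindley recursion
    Q = max(Q + a - C, 0.0)

S, best = 0.0, 0.0                                     # sup over look-back windows, as in (6.4)
for tau, a in enumerate(reversed(A), start=1):
    S += a
    best = max(best, S - C * tau)

print(Q, best)                                         # the two agree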

The random variable Q(0) may be infinite. For example with constant rate sources of total rate S the queue size is infinite with probability P(S > C), precisely the object of study in Section 6.2, as we noted in Example 6.7.

Let L(C, B, n) = P(Q(0) > B); this is the probability we would like to control. We are going to look at this in the regime where capacity, number of connections, and buffer size are all large, while the number of connection types is fixed.


Theorem 6.10

lim_{N→∞} (1/N) log L(CN, BN, nN) = sup_{t≥0} inf_{s≥0} [ st ∑_{j∈J} nj αj(s, t) − s(B + Ct) ]

where

αj(s, t) = (1/(st)) log E e^{sXji[0,t]}.          (6.5)

Sketch of proof.

P(QN(0) > BN) = P( SN[−t, 0] > (B + Ct)N for some t ).

This probability is bounded from below by P(SN[−t, 0] > (B + Ct)N) for any fixed t, and from above by ∑_{t≥0} P(SN[−t, 0] > (B + Ct)N) (at least in discrete time).

For any fixed t, Cramer's theorem gives

(1/N) log P( ∑_{j=1}^{J} ∑_{i=1}^{nj N} Xji[−t, 0] > (B + Ct)N ) → inf_s [ st ∑_{j=1}^{J} nj αj(s, t) − s(B + Ct) ].

This gives us the lower bound directly.

The upper bound can be derived by noticing that the terms decay exponentially in N, and therefore we can ignore all but the largest term. Making this precise takes some work, but see Exercise 6.9 for a start. □

Suppose that s∗ and t∗ are extremizing parameter values in Theorem 6.10. The sketched proof allows an interpretation of these values. We see that if buffer overflow occurs, then with high probability this happens because over a time period t∗, the total work that arrived into the system was greater than B + Ct∗. Thus, t∗ is the typical time scale for buffer overflow. The most likely trajectory for the amount of work in the buffer over the time scale [−t∗, 0] (where t = 0 is the time of the buffer overflow event) depends on the temporal correlation structure of the processes Xji. Figure 6.6 sketches some possibilities.

The parameter s∗ is the exponential tilt for the distribution of S, and for each of the Xji, that makes the sum S over the time period t∗ likely to be B + Ct∗.
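For sources with Gaussian, independent-increment workload, X[0, t] ∼ N(λt, σ²t), we have α(s, t) = λ + sσ²/2 and the inner infimum can be done in closed form, leaving a one-dimensional search over t. A numerical sketch with illustrative parameters (a single class, in the scaled units of the theorem; this specific source model is an assumption for illustration, not part of the text):

lam, sigma2, n, C, B = 0.8, 1.0, 1.0, 1.0, 0.5    # per-unit-N load, variance, capacity, buffer

def rate(t):
    spare = B + (C - n * lam) * t                 # spare room accumulated over a window of length t
    if spare <= 0:
        return 0.0
    return -(spare ** 2) / (2 * n * sigma2 * t)   # inf over s >= 0, attained at s* = spare/(n*sigma2*t)

best_t, best = max(((k / 100, rate(k / 100)) for k in range(1, 10000)),
                   key=lambda pair: pair[1])
print(best_t, best)   # t* = B/(C - n*lam) = 2.5 is the typical overflow time scale; decay rate -0.2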

Figure 6.6 Most likely trajectories leading to buffer overflow. The most likely trajectory is linear (left) if S has independent increments. The image on the right corresponds to traffic described by a fractional Brownian motion with Hurst parameter > 1/2; this is a process exhibiting positive correlations over time and long-range dependence. The thick line is the limit as N → ∞, the thin line is a possible instance for finite N.

An acceptance region can be defined as in the last section. Although it will no longer have convex complement in general, the expressions (6.5) do define the tangent plane at a typical boundary point, and hence have interpretations as effective bandwidths.

Remark 6.11 The notion of an effective bandwidth can be developed in various other frameworks. For example, Exercise 6.10 develops a linearly bounded acceptance region for a model where the log-moment generating function may not be defined (other than at 0). But the large deviations framework sketched in this section does give a broad concept of effective bandwidth that takes into account the statistical characteristics of a connection over different time and space scales.

Exercises

Exercise 6.8 In this Exercise we derive equation (6.4) for the queue size. We suppose the queue has been evolving from time −∞. If the queue is draining at the full rate C for the entire period (−∞, 0) there may be an ambiguity in the definition of the queue size. This might occur, for example, with constant rate sources such that S = C, since then the queue size is a fixed constant for all t, which we can interpret as the initial queue size at time −∞. We assume the queue size is zero at least once in the past, possibly in the limit as t → −∞.

Suppose that τ < ∞ achieves the supremum in equation (6.4). Show that

• S[−τ,−t] ≥ C(τ − t) for all 0 ≤ t ≤ τ, and deduce that the queue must be draining at the full rate C for the entire period [−τ, 0].
• Show that S[−t,−τ] ≤ C(t − τ) for all t ≥ τ, and deduce that Q(−t) ≥ Q(−τ) for all t ≥ τ.
• Deduce that Q(−τ) = 0, and hence equation (6.4).

Finally obtain equation (6.4) in the case where the supremum attained is finite, but is only approached as τ → ∞.

Exercise 6.9 [Principle of the Largest Term] Consider two sequences aN, bN, with

(1/N) log aN → a,   (1/N) log bN → b.

If a ≥ b, show that

(1/N) log(aN + bN) → a.

Conclude that for finitely many sequences pN(0), . . . , pN(T) with limits p(0), . . . , p(T)

(1/N) log( ∑_{t=0}^{T} pN(t) ) → max_{0≤t≤T} p(t)   as N → ∞.

That is, the growth of a finite sum is governed by the largest growth rate of any term in it; this is known as the principle of the largest term.

In the proof of Theorem 6.10, we need to bound ∑_{t≥0} P( SN[−t, 0] > (B + Ct)N ), which involves extending this argument to a countable sum.

Exercise 6.10 The expected amount of work in an M/G/1 queue, Example 6.8, is

EQ = ν(µ² + σ²) / (2(C − νµ))

where ν is the rate of the Poisson arrival process and µ, σ² are the mean and variance respectively of the distribution G (from the Pollaczek-Khinchine formula). Now suppose that

G(x) = ∑_{j=1}^{J} pj Gj(x),   ν = ∑_{j=1}^{J} νj nj,   pj = νj nj / ν

corresponding to nj sources of type j, with νj the arrival rate and Gj the service requirement distribution of type j. Let µj, σj² be the mean and variance respectively of the distribution Gj. Show that the service quality guarantee EQ ≤ B is satisfied if and only if

∑_{j=1}^{J} αj nj ≤ C

where

αj = νj [ µj + (1/(2B))(µj² + σj²) ].

6.4 Further reading

Ganesh et al. (2004) give a systematic account of how large deviations theory can be applied to queueing problems, and give extensive references to the earlier papers. For a more detailed development of the examples given in this Chapter, the reader is referred to Kelly (1996) and to Courcoubetis and Weber (2003). For a treatment of effective bandwidths within a more general setting of performance guarantees, see Chang (2000). Mazumdar (2010) emphasises the role of effective bandwidths as mappings from queueing level phenomena to loss network models, and provides a concise mathematical treatment of a number of topics from these notes.

The effective bandwidth of a connection is a measure of the connection's consumption of network resources, and we might expect it to be relevant in the calculation of a charge: see Songhurst (1999) for a detailed discussion of usage-based charging schemes, and Courcoubetis and Weber (2003) for a broad overview of the complex subject of pricing communication networks.

After a flow has passed through a resource, its characteristics may change: Wischik (1999) considers a model where the relevant statistical characteristics of a flow of traffic are preserved by passage through a resource, in the limit where the number of inputs to that resource increases.


Part III



7 Internet congestion control

In the previous Chapter we looked at organising a network of users with time-varying bandwidth requirements into something that resembles a loss network, by finding a notion of effective bandwidth. In this Chapter we consider an alternative approach, where the users actively participate in the sharing of the network resources.

7.1 Control of elastic network flows

How should available resources be shared between competing streams of elastic traffic? There are at least two aspects to this question. First, why would one allocation of resources be preferred to another? Secondly, what control mechanisms could be implemented in a network to achieve any preferred allocation? To make progress with these questions, we need to fix some notation, and define some terms.

Consider a network with a set J of resources, and let Cj be the finite capacity of resource j. (We occasionally refer to the resources as links.) Let a route r be a non-empty subset of J, and write R for the set of possible routes. As before, we will use the link-route incidence matrix A to indicate which resources belong to which routes. A is a |J| × |R| matrix with Ajr = 1 if j ∈ r, and Ajr = 0 otherwise.

We will identify a user with a route. For example, when Elena at the University of Michigan downloads the web page at www.google.com, the corresponding "user" (route) is sending data along 16 links, including the link from Elena's laptop to the nearest wireless access point. If Elena also downloads the web page www.cam.ac.uk, this (different!) "user" is using 18 links; five of the links are common with the route to Google.

Suppose that if a rate xr is allocated to user r then this has a utility Ur(xr). We formalize the notion of elastic traffic by assuming that the utility function Ur(·) is increasing and concave, and that our objective is to maximize the sum of user utilities. For convenience we will also assume Ur(·) is strictly concave, continuously differentiable on (0,∞), and satisfies U′r(0) = ∞ and U′r(∞) = 0.

Remark 7.1 Elastic traffic, that is concave utility functions, captures the notion of users preferring to share. The aggregate utility of two users both getting a medium download rate is higher than if one of them got a very high rate and the other a very low one. If more users come into the system, then it is better for all the users to decrease their rate of sending data, so that everyone has a chance to use the network. This isn't true of all types of traffic. For real-time voice communication, the utility function might have the form shown on the right in Figure 7.1 if speech becomes unintelligible at low rates. In this case users may require a certain set of resources in order to proceed, and attempting to share would decrease aggregate utility. If there are too many users then some should get served and some not; in a loss network the implicit randomization is effected by the admission control mechanism.

Figure 7.1 Elastic and inelastic demand (left: elastic traffic, prefers to share; right: inelastic traffic, prefers to randomize).

Let $C = (C_j, j \in J)$ be the vector of capacities, let $U = (U_r(\cdot), r \in R)$ be the collection of user utilities, and let $x = (x_r, r \in R)$ be an allocation of flow rates. The feasible region is then given by $x \geq 0$, $Ax \leq C$, and under the model we have described the system optimal rates solve the following problem.

SYSTEM$(U, A, C)$:
$$\text{maximize} \quad \sum_{r \in R} U_r(x_r)$$
$$\text{subject to} \quad Ax \leq C \qquad (7.1)$$
$$\text{over} \quad x \geq 0.$$


While this optimization problem is mathematically tractable (with an objective function that is strictly concave and a convex feasible region), it involves utilities $U$ that are unlikely to be known by the network; further, even the matrix $A$ describing the network topology is not likely to be known in its entirety at any single point in the network. We address the issue of unknown utilities first, by decomposing the optimization problem into simpler problems. We leave the second issue for later sections.

Suppose that user $r$ may choose an amount to pay per unit time, $w_r$, and will then receive in return a flow rate $x_r$ proportional to $w_r$, where $1/\lambda_r$ is the known constant of proportionality. Then the utility maximization problem for user $r$ is as follows.

USER$_r(U_r; \lambda_r)$:
$$\text{maximize} \quad U_r\!\left(\frac{w_r}{\lambda_r}\right) - w_r \qquad (7.2)$$
$$\text{over} \quad w_r \geq 0.$$

Here we could interpret $w_r$ as user $r$'s budget and $\lambda_r$ as the price per unit of flow; note that no capacity constraints are involved.

Suppose next that the network knows the vector $w = (w_r, r \in R)$ and solves a variant of the SYSTEM problem using a particular set of utility functions, $U_r(x_r) = w_r \log x_r$, $r \in R$.

NETWORK$(A, C; w)$:
$$\text{maximize} \quad \sum_{r \in R} w_r \log x_r$$
$$\text{subject to} \quad Ax \leq C \qquad (7.3)$$
$$\text{over} \quad x \geq 0.$$

We can find a pleasing relationship between these various problems, summarized in the following theorem.

Theorem 7.2 (Problem decomposition) There exist vectors $\lambda = (\lambda_r, r \in R)$, $w = (w_r, r \in R)$ and $x = (x_r, r \in R)$ such that

• $w_r = \lambda_r x_r$ for all $r \in R$;
• $w_r$ solves USER$_r(U_r; \lambda_r)$ for all $r \in R$;
• $x$ solves NETWORK$(A, C; w)$.

Moreover, the vector $x$ then also solves SYSTEM$(U, A, C)$.

Remark 7.3 Thus the system problem can be solved by simultaneously solving the network and user problems. (In later sections we shall address the issue of how to solve the problems simultaneously, but you might anticipate that some form of iterative procedure will be involved.)

There are many other ways to decompose the SYSTEM problem. For example, we might pick a version of the NETWORK problem that uses a different set of "placeholder" utilities, instead of the family $w_r \log(\cdot)$; we'll consider a more general class in Section 8.2. As we'll see, the logarithmic utility function occupies a privileged position within a natural family of scale invariant utility functions, and it corresponds to a particular notion of fairness.

Proof Note first of all that there exists a unique optimum for the SYSTEM problem, because we are optimizing a strictly concave function over a closed convex set; and the optimum is interior to the positive orthant since $U_r'(0) = \infty$. Our goal is to show that this optimum is aligned with an optimum for each of the NETWORK and the USER problems.

The Lagrangian for SYSTEM is
$$L_{\text{SYSTEM}}(x, z; \mu) = \sum_{r \in R} U_r(x_r) + \mu^T (C - Ax - z),$$
where $z = (z_j, j \in J)$ is a vector of slack variables, and $\mu = (\mu_j, j \in J)$ is a vector of Lagrange multipliers, for the inequality constraints. By the Strong Lagrangian Principle (Appendix C), there exists a vector $\mu$ such that the Lagrangian is maximized at a pair $x$, $z$ that solve the original problem, SYSTEM.

In order to maximize the Lagrangian over $x_r$, we differentiate:
$$\frac{\partial}{\partial x_r} L_{\text{SYSTEM}}(x, z; \mu) = U_r'(x_r) - \sum_{j \in r} \mu_j,$$
since differentiating the term $\mu^T A x$ will leave only those $\mu_j$ with $j \in r$. The Lagrangian for NETWORK is
$$L_{\text{NETWORK}}(x, z; \tilde\mu) = \sum_{r \in R} w_r \log x_r + \tilde\mu^T (C - Ax - z),$$
and differentiating with respect to $x_r$ we obtain
$$\frac{\partial}{\partial x_r} L_{\text{NETWORK}}(x, z; \tilde\mu) = \frac{w_r}{x_r} - \sum_{j \in r} \tilde\mu_j.$$


(There's no reason at this point to believe that $\mu$ and $\tilde\mu$ have anything to do with each other; they are simply the Lagrange multipliers for two different optimization problems.)

Finally, for USER the objective function is $U_r(w_r/\lambda_r) - w_r$, and
$$\frac{\partial}{\partial w_r}\left[ U_r\!\left(\frac{w_r}{\lambda_r}\right) - w_r \right] = \frac{1}{\lambda_r}\, U_r'\!\left(\frac{w_r}{\lambda_r}\right) - 1.$$

Suppose now that user $r$ always picks her budget $w_r$ to optimize the USER objective function. Then we must have
$$U_r'\!\left(\frac{w_r}{\lambda_r}\right) = \lambda_r.$$

Therefore, if we pick
$$\tilde\mu_j = \mu_j, \qquad \lambda_r = \sum_{j \in r} \mu_j, \qquad w_r = \lambda_r (U_r')^{-1}(\lambda_r), \qquad x_r = \frac{w_r}{\lambda_r},$$
then the NETWORK and SYSTEM Lagrangians will have the same derivatives. Hence, if $\mu_j$ are optimal dual variables for the SYSTEM problem, then this choice of the remaining variables simultaneously solves all three problems, and establishes the existence of the vectors $\lambda$, $w$ and $x$ satisfying conditions 1, 2 and 3 of the theorem. Conversely, suppose vectors $\lambda$, $w$ and $x$ satisfy conditions 1, 2 and 3: then $x$ solves the NETWORK problem, and so can be written in terms of $\tilde\mu$, and setting $\mu = \tilde\mu$ shows that $x$ also solves the SYSTEM problem. □

We can identify $\mu_j$ as the shadow price per unit flow through resource $j$. The price for a unit of flow along route $r$ is $\lambda_r$, and decomposes as a sum over the resources involved.
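The decomposition can be checked numerically. The Python sketch below takes a single link of capacity $C$ shared by two users with the illustrative utilities $U_r(x) = 2 a_r \sqrt{x}$ (so $U_r'(x) = a_r/\sqrt{x}$, which satisfies the assumptions above); the capacity and the parameters $a_r$ are assumptions made purely for this example, not values from the text.

import math

# Single link of capacity C shared by two users with utilities
# U_r(x) = 2 * a_r * sqrt(x); parameter values are illustrative assumptions.
C = 10.0
a = [1.0, 2.0]

# SYSTEM: maximise sum_r 2 a_r sqrt(x_r) subject to x_1 + x_2 <= C.
# Stationarity gives U_r'(x_r) = a_r / sqrt(x_r) = mu (a single shadow price,
# since there is one resource), so x_r = (a_r / mu)^2, and the capacity
# constraint determines mu.
mu = math.sqrt(sum(ar ** 2 for ar in a) / C)
x = [(ar / mu) ** 2 for ar in a]

# Theorem 7.2: take lambda_r = mu and w_r = lambda_r * x_r.
lam = [mu, mu]
w = [lam[r] * x[r] for r in range(2)]

# (i) USER_r optimality: U_r'(w_r / lambda_r) = lambda_r.
for r in range(2):
    assert abs(a[r] / math.sqrt(w[r] / lam[r]) - lam[r]) < 1e-9

# (ii) NETWORK optimality on a single link: x_r = (w_r / sum_s w_s) * C
#      (this form of the NETWORK solution is derived in Exercise 7.1).
for r in range(2):
    assert abs(x[r] - w[r] / sum(w) * C) < 1e-9

print("rates x =", [round(v, 3) for v in x],
      " shadow price mu =", round(mu, 4),
      " payments w =", [round(v, 3) for v in w])

Here $x = (2, 8)$: the user with the larger utility parameter chooses a larger payment and receives a proportionally larger rate, exactly as the decomposition predicts.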

Next we determine the dual problem. The value of the NETWORK Lagrangian is
$$\max_{x, z \geq 0} L_{\text{NETWORK}}(x, z; \mu) = \sum_{r \in R} w_r \log \frac{w_r}{\sum_{j \in r} \mu_j} + \mu^T C.$$
If we subtract the constant $\sum_r w_r \log w_r$, and change sign, we obtain the following optimization problem.


DUAL$(A, C; w)$:
$$\text{maximize} \quad \sum_{r \in R} w_r \log\Bigl(\sum_{j \in r} \mu_j\Bigr) - \sum_{j \in J} \mu_j C_j$$
$$\text{over} \quad \mu \geq 0.$$

This rephrases the NETWORK problem as a problem of setting up prices $\mu_j$ per unit of flow through each resource so as to optimize a certain revenue function.

The treatment in this section suggests that one way to solve the resource allocation problem SYSTEM without knowing the individual user utilities would be to charge the users for access to the network, and allow users to choose the amounts they pay. But, at least in the early days of the Internet, charging was both impractical and likely to ossify a rapidly developing technology. In the next section we shall discuss how one might compare alternative resource allocations using the ideas in this section but without resorting to charging.

Exercises

Exercise 7.1 In the USER problem, user $r$ acts as a price-taker: she does not anticipate the consequences of her choice of $w_r$ upon the price $\lambda_r$. In this exercise we explore the effect of more strategic behaviour.

Consider a single link of capacity $C$, shared between users in $R$. Suppose that user $r$ chooses $w_r$: show that under the solution to NETWORK, user $r$ receives a rate
$$\frac{w_r}{\sum_{s \in R} w_s}\, C.$$

Now suppose user $r$ anticipates that this will be the outcome of the choices $w = (w_s, s \in R)$ and attempts to choose $w_r$ to maximize
$$U_r\!\left(\frac{w_r}{\sum_{s \in R} w_s}\, C\right) - w_r$$
over $w_r > 0$, for given values of $w_s$, $s \neq r$. Show that there exists a vector $w$ that solves this problem simultaneously for each $r \in R$, and that it is the


unique solution to the problem
$$\text{maximize} \quad \sum_{r \in R} \hat{U}_r(x_r)$$
$$\text{subject to} \quad \sum_{r \in R} x_r \leq C$$
$$\text{over} \quad x_r \geq 0,$$
where
$$\hat{U}_r(x_r) = \left(1 - \frac{x_r}{C}\right) U_r(x_r) + \frac{x_r}{C}\left(\frac{1}{x_r}\int_0^{x_r} U_r(y)\, dy\right).$$

Thus the choices of price-anticipating users will not in general maximize the sum of user utilities, although if many users share a link and $x_r \ll C$, the effect is likely to be small.

Exercise 7.2 In this exercise we extend the model of this section to include routing.

Let $s \in S$ now label a user (perhaps a source-destination pair), and suppose $s$ is identified with a subset of $R$, namely the routes available to serve the user $s$. Define an incidence matrix $H$ as in Section 4.2.2, by setting $H_{sr} = 1$ if $r \in s$ and $H_{sr} = 0$ otherwise. Thus if $y_r$ is the flow on route $r$ and $y = (y_r, r \in R)$, then $x = Hy$ gives the aggregate flows achieved by each user $s \in S$. Let SYSTEM$(U, H, A, C)$ be the problem

$$\text{maximize} \quad \sum_{s \in S} U_s(x_s)$$
$$\text{subject to} \quad x = Hy, \quad Ay \leq C$$
$$\text{over} \quad x, y \geq 0,$$

and let NETWORK$(H, A, C; w)$ be the problem

$$\text{maximize} \quad \sum_{s \in S} w_s \log x_s$$
$$\text{subject to} \quad x = Hy, \quad Ay \leq C$$
$$\text{over} \quad x, y \geq 0.$$

Show that there exist vectors $\lambda = (\lambda_s, s \in S)$, $w = (w_s, s \in S)$ and $x = (x_s, s \in S)$ satisfying $w_s = \lambda_s x_s$ for $s \in S$, such that $w_s$ solves USER$_s(U_s; \lambda_s)$ for $s \in S$ and $x$ solves NETWORK$(H, A, C; w)$; show further that $x$ is then the unique vector with the property that there exists a vector $y$ such that $(x, y)$ solves SYSTEM$(U, H, A, C)$.


Exercise 7.3 Requests to view a web page arrive as a Poisson process of rate $C$. Each time a page is viewed an advert is displayed within the page. The advert is selected randomly from a pool of $R$ adverts; advert $r \in R$ is selected with probability $w_r / \sum_{s \in R} w_s$, where $w_r$ is the payment per unit time (e.g. per day) made by advertiser $r$. Advert $r$, when displayed, is clicked upon by a viewer of the page with probability $1/A_r$. Advertiser $r$ does not observe $C$, $A_s$ for $s \in R$, or $w_s$ for $s \neq r$. Assume that advertiser $r$ does observe
$$x_r \equiv \frac{w_r}{\sum_{s \in R} w_s} \cdot \frac{C}{A_r};$$

this is a reasonable assumption, since the number of clicks on advert $r$ has mean $x_r$ under the above model. Advertiser $r$ can thus deduce
$$\lambda_r \equiv \frac{w_r}{x_r},$$

the amount she has paid per click.
Show that under the usual assumptions on $U_r(\cdot)$, $r \in R$, there exist vectors $\lambda = (\lambda_r, r \in R)$, $w = (w_r, r \in R)$ and $x = (x_r, r \in R)$ such that the above two identities are satisfied and such that, for each $r \in R$, $w_r$ maximizes $U_r(w_r/\lambda_r) - w_r$. Show that the vector $x$ then also maximizes $\sum_{r \in R} U_r(x_r)$ over all $x$ satisfying $\sum_{r \in R} A_r x_r = C$.

7.2 Notions of fairness

We would like the allocation of flow rates to the users to be fair in some sense, but what do we mean by this? There are several widely used notions of fairness; we will formulate a few here. Later we will encounter a generalized notion of fairness which encompasses all of these as special cases.

Max-min fairness. We say that an allocation $x = (x_r, r \in R)$ is max-min fair if it is feasible (that is, $x \geq 0$ and $Ax \leq C$) and if, for any other feasible allocation $y$,
$$\exists\, r : y_r > x_r \implies \exists\, s : y_s < x_s \leq x_r.$$

That is, in order for user $r$ to benefit, someone else (user $s$) who was worse off than $r$ needs to get hurt. The compactness and convexity of the feasible region imply that such a vector $x$ exists and is unique (check this!). The term "max-min" comes from the fact that we're maximizing the minimal amount that anyone is getting: that is, the max-min fair allocation will give


a solution to
$$\max_{x \text{ feasible}} \; \Bigl( \min_r x_r \Bigr).$$

Actually, the max-min fair allocation will give more: once we've fixed the smallest amount that anyone is getting, we then maximize the second-smallest, then the third-smallest, and so on.

The concept of max-min fairness has been much discussed by political philosophers. In A Theory of Justice, Rawls (1971) proposes that, if one were to design society from scratch without knowing where in it one might end up, then one would want it to be max-min fair. However, for our purposes it seems a bit extreme, because it places such overwhelming emphasis on maximizing the lowest rate.

Proportional fairness. We say an allocation $x = (x_r, r \in R)$ is proportionally fair if it is feasible, and if for any other feasible allocation $y$ the aggregate of proportional changes is nonpositive:
$$\sum_r \frac{y_r - x_r}{x_r} \leq 0.$$

This still places importance on increasing small flows (for which the denominator is smaller), but it isn't quite as overwhelming.

There may be a priori reasons why some users should be given more weight than others: perhaps a route is providing a flow of value to a number of individuals (these lecture notes are being downloaded to a class), or different users invested different amounts initially to construct the network. Let $w = (w_r, r \in R)$ be a vector of weights: then $x = (x_r, r \in R)$ is weighted proportionally fair if it is feasible, and if for any other feasible vector $y$
$$\sum_r w_r \frac{y_r - x_r}{x_r} \leq 0. \qquad (7.4)$$

Figure 7.2 The difference between max-min and proportional fairness, for two links each of capacity $C = 1$ shared by one two-link route and two one-link routes: the proportionally fair allocation gives $2/3$ to each one-link route and $1/3$ to the two-link route, while the max-min fair allocation gives $1/2$ to every route. Note that the allocation maximizing total throughput is $(1, 1, 0)$.


In Figure 7.2 we show the max-min and the proportionally fair allocations for a particular network example with two resources, each of unit capacity, and three routes. The throughput-optimal allocation is simply to award all the capacity to the single-resource routes; this will have a throughput of 2. The proportionally fair allocation has a total throughput of 1.67, and the max-min fair one has a total throughput of 1.5.
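A few lines of Python make the comparison concrete. Here route 0 is taken to be the two-link route, so the throughput-maximizing allocation $(1, 1, 0)$ of the figure caption appears as $(0, 1, 1)$ in this ordering; the brute-force grid search is, of course, only an illustration.

import math

# Two links of unit capacity; route 0 uses both links, routes 1 and 2 use
# one link each.  At the proportionally fair optimum both constraints bind,
# so x_1 = x_2 = 1 - x_0, and we only need to search over x_0.
best_val, best_x0 = -float("inf"), None
for k in range(1, 1000):
    x0 = k / 1000.0
    val = math.log(x0) + 2 * math.log(1.0 - x0)
    if val > best_val:
        best_val, best_x0 = val, x0

pf = (best_x0, 1.0 - best_x0, 1.0 - best_x0)   # tends to (1/3, 2/3, 2/3)
mm = (0.5, 0.5, 0.5)                           # max-min fair allocation
tp = (0.0, 1.0, 1.0)                           # throughput-optimal allocation

for name, alloc in [("proportionally fair", pf),
                    ("max-min fair       ", mm),
                    ("throughput optimal ", tp)]:
    print(name, tuple(round(v, 3) for v in alloc), "total", round(sum(alloc), 3))

The printed totals are roughly 1.667, 1.5 and 2, matching the figures quoted above.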

There is a relationship between the notion of (weighted) proportional fairness and the NETWORK problem we introduced in the last section.

Proposition 7.4 A vector $x$ solves NETWORK$(A, C; w)$ if and only if it is weighted proportionally fair.

Proof The objective function of NETWORK is strictly concave and continuously differentiable, the feasible region is compact and convex, and the vector $x$ that solves NETWORK is unique. Consider a perturbation of $x$, $y = x + \delta x$ (so $y_r = x_r + \delta x_r$). The objective function $\sum_{r \in R} w_r \log x_r$ of NETWORK will change by an amount
$$\sum_{r \in R} w_r \frac{\delta x_r}{x_r} + o(\delta x).$$
But the first term is simply the expression (7.4). The result follows, using the concavity of the objective function to deduce the inequality (7.4) for non-infinitesimal variations from $x$. □

Remark 7.5 The proportionally fair allocation has various other related interpretations, including as a solution to Nash's bargaining problem and as a market clearing equilibrium.

Bargaining problem, Nash (1950). Consider the problem of several players bargaining among a set of possible choices. The Nash bargaining solution is the unique vector satisfying the following axioms:

• Invariance under scaling. If we rescale the feasible region, the bargaining solution should simply rescale.
• Pareto efficiency. An allocation is Pareto inefficient if there exists an alternative allocation that improves the amount allocated to at least one player without reducing the amount allocated to any other players.
• Symmetry. If the feasible region is symmetric, then the players should be allocated an equal amount.
• Independence of Irrelevant Alternatives. If we restrict the set of feasible allocations to some subset that contains the original maximum, then the maximum should not change.


A general set of weights $w$ corresponds to a situation with unequal bargaining power; this will modify the notion of symmetry.

Market clearing equilibrium (Gale, 1960, Chapter 8.5). A market-clearing equilibrium is a set of resource prices $p = (p_j, j \in J)$ and an allocation $x = (x_r, r \in R)$ such that

• $p \geq 0$, $Ax \leq C$ (feasibility);
• $p^T (C - Ax) = 0$ (complementary slackness): if the price of a resource is positive then the resource is used up;
• $w_r = x_r \sum_{j \in r} p_j$, $r \in R$: if user $r$ has an endowment $w_r$, then all endowments are spent.

Exercises

Exercise 7.4 In this exercise, we will show that the Nash bargaining solution associated with the problem of assigning flows $x_r$ subject to the constraints $x \geq 0$, $Ax \leq C$ is the solution of NETWORK$(A, C; w)$ with $w_r = 1$ for all $r \in R$.

1. Show that the feasible region can be rescaled by a linear transformation so that the maximum of $\prod_r x_r$ is located at the point $(1, \ldots, 1)$.
2. Argue that this implies that the reparametrized region must lie entirely below the hyperplane $\sum_r x_r = |R|$, and hence that $(1, \ldots, 1)$ is a Pareto efficient point.
3. By considering a suitable symmetric set that contains the feasible region, and applying Independence of Irrelevant Alternatives, argue that the bargaining solution to the reparametrized problem must be equal to $(1, \ldots, 1)$.
4. Conclude that the bargaining solution to the original problem must have been the point that maximizes $\prod_r x_r$, or, equivalently, the point that maximizes $\sum_r \log x_r$.

Exercise 7.5 Show that a market-clearing equilibrium $(p, x)$ can be found by solving NETWORK$(A, C; w)$ for $x$, with $p$ identified as the vector of Lagrange multipliers (or shadow prices) associated with the resource constraints.


7.3 A primal algorithm

We will now look at an algorithm that can accomplish fair sharing of resources in a distributed manner. We begin by describing the mechanics of data transfer over the Internet.

Figure 7.3 A schematic diagram of the Internet. Squares correspond to sources and destinations (such as a server and a device), the network contains resources with buffers, and many flows traverse the network.

The transfer of a file (for example, a web page) over the Internet begins when the device wanting the file sends a packet with a request to the server that has the file stored. The file is broken up into packets, and the first packet is sent back to the device. If the packet is received successfully, then an acknowledgement packet is sent back to the server, and this prompts further packets to be sent. This process is controlled by TCP, the transmission control protocol of the Internet. TCP is part of the software of the machines that are the source and destination of the data (the server and the device in our example). The general approach is as follows. When a resource within the network becomes heavily loaded, one or more packets are lost or marked. The lost or marked packet is taken as an indication of congestion, the destination informs the source, and the source slows down. TCP then gradually increases the sending rate until it receives another indication of congestion. This cycle of increase and decrease serves to discover and utilize available capacity, and to share it between flows.

A consequence of this approach is that "intelligence" and control are end-to-end rather than hidden in the network: the machines within the network forward packets, but the machines that are the source and destination regulate the rate. We will now describe a model that attempts to capture this notion of end-to-end control.

Consider a network with parameters $A$, $C$, $w$ as before, but now suppose the flow rate $x_r(t)$ on route $r$ is time-varying. We define the primal algorithm to be the system of differential equations


$$\frac{d}{dt} x_r(t) = \kappa_r \Bigl( w_r - x_r(t) \sum_{j \in r} \mu_j(t) \Bigr), \qquad r \in R, \qquad (7.5)$$
where
$$\mu_j(t) = p_j\Bigl( \sum_{s : j \in s} x_s(t) \Bigr), \qquad j \in J, \qquad (7.6)$$
$w_r, \kappa_r > 0$ for $r \in R$, and the function $p_j(\cdot)$ is a non-negative, continuous, and increasing function, not identically zero, for $j \in J$.

The primal algorithm is a useful caricature of end-to-end control: congestion indication signals are generated at resource $j$ at rate $\mu_j(t)$; these signals reach each user $r$ whose route passes through resource $j$; and the response to this feedback by the user $r$ is a decrease at rate proportional to the stream of feedback signals received, together with a steady increase at rate proportional to the weight $w_r$. We could imagine that the function $p_j(\cdot)$ measures the "pain" of resource $j$ as a function of the total traffic going through it. (For example, it could be the probability of a packet being dropped or marked at resource $j$.)

Note the local nature of the primal algorithm: the summation in equation (7.5) concerns only resources $j$ that are used by route $r$, and the summation in equation (7.6) concerns only routes $s$ that pass through resource $j$. Nowhere in the network is there a need to know the entire link-route incidence matrix $A$.

Starting from an initial state $x(0)$, the dynamical system (7.5)-(7.6) defines a trajectory $(x(t), t \geq 0)$. We next show that whatever the initial state, the trajectory converges to a limit, and that the limit solves a certain optimization problem. We shall establish this by exhibiting a Lyapunov function for the system of differential equations. (A Lyapunov function is a scalar function of the state of the system, $x(t)$, that is monotone in time $t$ along trajectories.)

Theorem 7.6 (Global stability) The strictly concave function
$$\mathcal{U}(x) = \sum_{r \in R} w_r \log x_r - \sum_{j \in J} \int_0^{\sum_{s : j \in s} x_s} p_j(y)\, dy$$
is a Lyapunov function for the primal algorithm. The unique value maximizing $\mathcal{U}(x)$ is an equilibrium point of the system, to which all trajectories converge.


Proof The assumptions on $w_r > 0$, $r \in R$, and $p_j(\cdot)$, $j \in J$, ensure that $\mathcal{U}(\cdot)$ is strictly concave with a maximum which is interior to the positive orthant; the maximizing value of $x$ is thus unique. Moreover, $\mathcal{U}(x)$ is continuously differentiable with
$$\frac{\partial \mathcal{U}}{\partial x_r}(x) = \frac{w_r}{x_r} - \sum_{j \in r} p_j\Bigl( \sum_{s : j \in s} x_s \Bigr),$$
and setting these derivatives to zero identifies the maximum, $\bar{x}$ say. The derivative (7.5) is zero at $\bar{x}$, and hence $\bar{x}$ is an equilibrium point.

Further,
$$\frac{d}{dt} \mathcal{U}(x(t)) = \sum_{r \in R} \frac{\partial \mathcal{U}}{\partial x_r} \frac{d}{dt} x_r(t) = \sum_{r \in R} \frac{\kappa_r}{x_r(t)} \Bigl( w_r - x_r(t) \sum_{j \in r} p_j\Bigl( \sum_{s : j \in s} x_s(t) \Bigr) \Bigr)^2 \geq 0, \qquad (7.7)$$
with equality only at $\bar{x}$. This almost, but not quite, proves the desired convergence to $\bar{x}$: we need to make sure that the derivative (7.7) isn't so small that the system grinds to a halt far away from $\bar{x}$. However, this is easy to show.

Consider an initial state $x(0)$ in the interior of the positive orthant. The trajectory $(x(t), t \geq 0)$ cannot leave the compact set $\{x : \mathcal{U}(x) \geq \mathcal{U}(x(0))\}$. Consider the complement, $\mathcal{C}$, within this compact set of an open $\epsilon$-ball centred on $\bar{x}$. Then $\mathcal{C}$ is compact, and so the continuous derivative (7.7) is bounded away from zero on $\mathcal{C}$. Hence the trajectory can only spend a finite time in $\mathcal{C}$ before entering the $\epsilon$-ball. Thus $x(t) \to \bar{x}$. □
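As an illustration of Theorem 7.6, the following Python sketch integrates the primal algorithm (7.5)-(7.6) by Euler steps on the two-link, three-route network of Figure 7.2 and monitors $\mathcal{U}(x(t))$ along the trajectory. The gain, the weights, the step size and the congestion functions $p_j(y) = (y/C_j)^5$ are illustrative assumptions; $p_j$ is non-negative, continuous and increasing, as the theorem requires.

import numpy as np

A = np.array([[1, 1, 0],
              [1, 0, 1]], dtype=float)   # link-route incidence matrix
C = np.array([1.0, 1.0])
w = np.array([1.0, 1.0, 1.0])
kappa, B = 0.1, 5                        # gain and congestion exponent (assumed)
p = lambda y: (y / C) ** B               # congestion functions p_j

dt, steps = 0.01, 20000
x = np.full(3, 0.1)                      # initial rates in the positive orthant
U_prev, monotone = -np.inf, True
for _ in range(steps):
    mu = p(A @ x)                        # congestion signals mu_j(t)
    x = x + dt * kappa * (w - x * (A.T @ mu))      # equation (7.5)
    y = A @ x
    # Lyapunov function of Theorem 7.6, using int_0^y (z/C)^B dz = y^(B+1)/((B+1) C^B)
    U = np.sum(w * np.log(x)) - np.sum(y ** (B + 1) / ((B + 1) * C ** B))
    if U < U_prev - 1e-6:
        monotone = False
    U_prev = U

print("equilibrium rates:", np.round(x, 3))
print("link loads       :", np.round(A @ x, 3))
print("U(x(t)) non-decreasing along the trajectory:", monotone)

The trajectory settles near $(0.36, 0.71, 0.71)$; the link loads slightly exceed the capacities because this $p_j$ is only a soft penalty, in line with the cost-function interpretation discussed next.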

We have shown that the primal algorithm optimizes the function $\mathcal{U}(x)$. Now we can view
$$C_j\Bigl( \sum_{s : j \in s} x_s \Bigr) = \int_0^{\sum_{s : j \in s} x_s} p_j(y)\, dy$$
as a cost function penalizing use of resource $j$. If we take
$$p_j(y) = \begin{cases} \infty, & y > C_j, \\ 0, & y \leq C_j, \end{cases}$$
then maximizing the Lyapunov function $\mathcal{U}(x)$ becomes precisely the problem NETWORK$(A, C; w)$. While this choice of $p_j$ violates the continuity assumption on $p_j$, we could approximate it arbitrarily closely by smooth


functions, and so it would seem that this approach will generate arbitrarily good approximations of the NETWORK problem. However, we shall see later that, in the presence of propagation delays in the network, large values of $p_j'(\cdot)$ compromise stability, so there is a limit to how closely we can approximate the NETWORK problem using this approach.

Exercises

Exercise 7.6 Suppose that the parameter $\kappa_r$ in the primal algorithm is replaced by a function $\kappa_r(x_r(t))$, for $r \in R$. Check that Theorem 7.6 remains valid, with an unaltered proof, for any initial state $x(0)$ in the interior of the positive orthant provided the functions $\kappa_r(\cdot)$, $r \in R$, are continuous and positive on the interior of the positive orthant.
[In Section 7.7 we shall be interested in the choice $\kappa_r(x_r(t)) = \kappa_r x_r(t)/w_r$.]

Exercise 7.7 Suppose there really is a cost $C_j(z) = z/(C_j - z)$ to carrying a load $z$ through resource $j$, and that it is possible to choose the functions $p_j(\cdot)$. (Assume $C_j(z) = \infty$ if $z \geq C_j$.) Show that the sum of user utilities minus costs is maximized by the primal algorithm with the choices $p_j(z) = C_j/(C_j - z)^2$, $j \in J$.

The above function $C_j(\cdot)$ arises from a simple queueing model of a resource (cf. Section 4.3.1). Next we consider a simple time-slotted model of a resource.

Suppose we model the number of packets arriving at resource $j$ in a time slot as a Poisson random variable with mean $z$, and that if the number is above a limit $N_j$ there is some cost to dealing with the excess, so that
$$C_j(z) = e^{-z} \sum_{n > N_j} (n - N_j) \frac{z^n}{n!}.$$

Show that the sum of user utilities minus costs is maximized by the primal algorithm with the choices
$$p_j(z) = e^{-z} \sum_{n \geq N_j} \frac{z^n}{n!}.$$

Observe that $p_j$ will be the proportion of packets marked at resource $j$ if the following mechanism is adopted: mark every packet in any time slot in which $N_j$ or more packets arrive.

Exercise 7.8 In this exercise we generalize the primal algorithm to the case where $s \in S$ labels a source-destination pair served by routes $r \in s$, as in Exercise 7.2.


Let $s(r)$ identify the unique source-destination pair served by route $r$. Suppose that the flow along route $r$ evolves according to

$$\frac{d}{dt} y_r(t) = \kappa_r \Bigl( w_{s(r)} - \Bigl( \sum_{a \in s(r)} y_a(t) \Bigr) \sum_{j \in r} \mu_j(t) \Bigr)$$
(or zero if this expression is negative and $y_r(t) = 0$), where
$$\mu_j(t) = p_j\Bigl( \sum_{r : j \in r} y_r(t) \Bigr),$$

and let
$$\mathcal{U}(y) = \sum_{s \in S} w_s \log\Bigl( \sum_{r \in s} y_r \Bigr) - \sum_{j \in J} \int_0^{\sum_{r : j \in r} y_r} p_j(z)\, dz.$$

Show that
$$\frac{d}{dt} \mathcal{U}(y(t)) > 0$$
unless $y$ solves the problem NETWORK$(H, A, C; w)$ from Exercise 7.2.

7.4 Modelling TCP

Next we describe in slightly more detail the congestion avoidance algorithm of TCP, due to Jacobson (1988). Let $T$, the round-trip time, be the time between a source sending a packet and the source receiving an acknowledgement. The source attempts to maintain a window (of size cwnd) of packets that have been sent but not yet acknowledged. The rate $x$ of our model represents the ratio cwnd$/T$. If the acknowledgement is positive then cwnd is increased by $1/$cwnd, while if the acknowledgement is negative (a packet was lost or marked) then cwnd is halved.

Remark 7.7 Even our more detailed description of TCP is simplified, omitting discussion of timeouts (which trigger retransmissions, with binary exponential backoff) and the slow start phase (during which the sending rate grows exponentially). But the above description is sufficient for us to develop a system of differential equations to compare with those of the previous section.

Figure 7.4 shows the typical evolution of window size, in increments of the round-trip time, for TCP. Modelling this behaviour by a differential equation is at first sight implausible: the rate $x$ is very clearly not smooth. It's helpful to think first of a weighted version of TCP, MulTCP,


Figure 7.4 Typical window size for TCP: packets per round-trip time (the "window") plotted against time in round-trip times, showing the exponential "slow start" phase, the linear congestion avoidance phase, and the window being halved on packet loss.

due to Crowcroft and Oechslin (1998), which is in general smoother. Let $w$ be a weight parameter, and suppose:

• the rate of additive increase is multiplied by $w$, so that each acknowledgement increases cwnd by $w/$cwnd; and
• the multiplicative decrease factor becomes $1 - \frac{1}{2w}$, so that after a congestion indication the window size becomes $(1 - \frac{1}{2w})$cwnd.

The weight parameter $w$ is designed to imitate the user sending traffic via $w$ distinct TCP streams; the original algorithm corresponds to $w = 1$. This modification results in a smoother trajectory for the rate $x$ for larger values of $w$. It is also a crude model for the aggregate of $w$ distinct TCP flows over the same route. (Observe that if a congestion indication is received by one of $w$ distinct TCP flows then only one flow will halve its window.)
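A minimal per-acknowledgement simulation of the MulTCP update rule just described is sketched below; the round-trip time, weight and per-packet congestion probability are illustrative assumptions, and the slow-start phase of Figure 7.4 is omitted. The resulting trace is a sawtooth of the kind the figure depicts.

import random

random.seed(1)
T, w, p = 0.1, 4.0, 0.01        # round-trip time, MulTCP weight, congestion probability (assumed)
cwnd, t, t_end = 1.0, 0.0, 200.0
area = 0.0                      # integral of cwnd over time, for the time-average

while t < t_end:
    dt = T / cwnd               # approximate time until the next acknowledgement
    area += cwnd * dt
    t += dt
    if random.random() < p:     # congestion indication: multiplicative decrease
        cwnd *= 1.0 - 1.0 / (2.0 * w)
    else:                       # positive acknowledgement: additive increase
        cwnd += w / cwnd

print("time-average window:", round(area / t_end, 1), "packets")
print("time-average rate  :", round(area / (t_end * T), 1), "packets per unit time")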

Let us try to approximate the rate obtained by MulTCP by a differential equation. Let $p$ be the probability of a congestion indication being received during an update step. The expected change in the congestion window cwnd per update step is approximately
$$\frac{w}{\text{cwnd}} (1 - p) - \frac{\text{cwnd}}{2w}\, p. \qquad (7.8)$$

Since the time between update steps is approximately $T/$cwnd, the expected


change in the rate $x$ per unit time is approximately
$$\frac{\bigl( \frac{w}{\text{cwnd}} (1 - p) - \frac{\text{cwnd}}{2w}\, p \bigr) / T}{T/\text{cwnd}} = \frac{w}{T^2} - \Bigl( \frac{w}{T^2} + \frac{x^2}{2w} \Bigr) p.$$

Motivated by this calculation, we model MulTCP by the system of differential equations
$$\frac{d}{dt} x_r(t) = \frac{w_r}{T_r^2} - \Bigl( \frac{w_r}{T_r^2} + \frac{x_r^2}{2 w_r} \Bigr) p_r(t),$$
where
$$p_r(t) = \sum_{j \in r} \mu_j(t)$$
and $\mu_j(t)$ is given by (7.6). Here, $T_r$ is the round-trip time and $w_r$ is the weight parameter for route $r$. If congestion indication is provided by dropping a packet then $p_r(t)$ approximates the probability of a packet drop along a route by the sum of the packet drop probabilities at each of the resources along the route.

Theorem 7.8 (Global stability) The function
$$\mathcal{U}(x) = \sum_{r \in R} \frac{\sqrt{2}\, w_r}{T_r} \arctan\Bigl( \frac{x_r T_r}{\sqrt{2}\, w_r} \Bigr) - \sum_{j \in J} \int_0^{\sum_{s : j \in s} x_s} p_j(y)\, dy$$
is a Lyapunov function for the above system of differential equations. The unique value $\bar{x}$ maximizing $\mathcal{U}(x)$ is an equilibrium point of the system, to which all trajectories converge.

Proof Observe that
$$\frac{\partial \mathcal{U}}{\partial x_r}(x) = \frac{w_r}{T_r^2} \Bigl( \frac{w_r}{T_r^2} + \frac{x_r^2}{2 w_r} \Bigr)^{-1} - \sum_{j \in r} p_j\Bigl( \sum_{s : j \in s} x_s \Bigr),$$
and so
$$\frac{d}{dt} \mathcal{U}(x(t)) = \sum_{r \in R} \frac{\partial \mathcal{U}}{\partial x_r} \frac{d}{dt} x_r(t) = \sum_{r \in R} \Bigl( \frac{w_r}{T_r^2} + \frac{x_r^2}{2 w_r} \Bigr)^{-1} \Bigl( \frac{d}{dt} x_r(t) \Bigr)^2 \geq 0.$$
The proof proceeds as before. □

At the stable point,
$$x_r = \frac{w_r}{T_r} \Bigl( 2\, \frac{1 - p_r}{p_r} \Bigr)^{1/2}.$$
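A quick Euler integration confirms that the single-route differential equation settles at this value; here the end-to-end congestion probability is held fixed at an illustrative value, rather than being generated by the functions $p_j(\cdot)$.

import math

w, T, p = 1.0, 0.1, 0.01        # illustrative parameter values
x, dt = 1.0, 0.001
for _ in range(200000):
    x += dt * (w / T ** 2 - (w / T ** 2 + x ** 2 / (2.0 * w)) * p)

print("simulated equilibrium:", round(x, 2))
print("stable point formula :", round((w / T) * math.sqrt(2.0 * (1.0 - p) / p), 2))

Both numbers come out close to 140.7 for these parameters.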


If we think of $x_r$ as the aggregate of $w_r$ distinct TCP flows on the same route, then each of these flows has an average throughput that is given by this expression with $w_r = 1$.

The constant $2^{1/2}$ appearing in this expression is sensitive to the difference between MulTCP and $w$ distinct TCP flows. With $w$ distinct TCP flows the flow affected by a congestion indication is more likely to be one with a larger window. This bias towards the larger of the $w$ windows increases the final term of the expectation (7.8) and decreases the constant, but to prove this would require a more sophisticated analysis, such as in Ott (2006).

Remark 7.9 It is instructive to compare our modelling approach in this Chapter with that of earlier Chapters. Our models of a loss network in Chapter 3, and of an electrical network in Chapter 4, began with detailed probabilistic descriptions of call acceptance and electron motion respectively, and we derived macroscopic relationships for blocking probabilities and currents such as the Erlang fixed point and Ohm's law. In this Chapter we have not developed a detailed probabilistic description of queueing behaviour at resources within the network: instead we have crudely summarized this behaviour with the functions $p_j(\cdot)$. We've adopted this approach since the control exerted by end-systems, the additive increase and multiplicative decrease rules for window sizes, is the overwhelming influence at the packet level on flow rates. It is these rules which lead to the particular form of a flow rate's dependence on $T_r$ and $p_r$, which we'll explore in more detail in the next section.

Exercises

Exercise 7.9 Check that if $p_r$ is small, corresponding to a low end-to-end loss probability, then at the stable point
$$x_r = \frac{w_r}{T_r} \sqrt{\frac{2}{p_r}} + o\bigl(p_r^{-1/2}\bigr).$$

Check that if $x_r$ is large, again corresponding to a low end-to-end loss probability, then
$$U_r(x_r) = \frac{\sqrt{2}\, w_r}{T_r} \arctan\Bigl( \frac{x_r T_r}{\sqrt{2}\, w_r} \Bigr) = \text{const.} - \frac{2 w_r^2}{T_r^2 x_r} + o\Bigl( \frac{1}{x_r} \Bigr).$$


7.5 What is being optimized?

From Theorem 7.8 we see that TCP (or at least our differential equation model of it) is behaving as if it is maximizing the sum of user utilities, subject to a cost function penalizing proximity to the capacity constraints. The user utility for a single TCP flow ($w_r = 1$) is, from Exercise 7.9, of the form
$$U_r(x_r) \approx \text{const.} - \frac{2}{T_r^2 x_r}.$$

Further, the rate allocated to a single flow on route $r$ is approximately proportional to
$$\frac{1}{T_r\, p_r^{1/2}}, \qquad (7.9)$$

so that the rate achieved is inversely proportional to the round-trip time $T_r$ and to the square root of the packet loss probability $p_r$.
The proportionally fair allocation of rates would be inversely proportional to $p_r$ and would have no dependence on $T_r$. Relative to proportional fairness, the above rate penalizes $T_r$ and under-penalizes $p_r$: it penalizes distance and under-penalizes congestion.

For many files downloaded over the Internet, the round-trip times are negligible compared to the time it takes to download the file: the round-trip time is essentially a speed-of-light delay, and is of the order of 50 ms for a transatlantic connection, but may be much smaller. If limited capacity is causing delay in downloading a file then it would be reasonable to expect that the user would not care very much whether the round-trip time is in fact 1 ms or 50 ms, and we would not expect a big difference in user utility.

Another way to illustrate the relative impact of distance and congestion is the problem of cache location. Consider the network in Figure 7.5. On the short route, from a user to cache location A, the round-trip time is $T = 2T_1$, and the probability of congestion indication is $p \approx 2p_1$ (we are assuming $p_1$ and $p_2$ are small). On the long route, from the user to cache location B, we have $T = T_1 + T_2$ and $p \approx p_1 + p_2$.

Suppose now that link 2 is a high-capacity long-distance optical fibre. Plausible parameters for it are $p_2 = 0$ (resource 2 is underused), but $T_2 = 100\, T_1$. Then the ratio of TCP throughputs along the two routes is
$$\frac{T_1 + T_2}{2 T_1} \cdot \frac{\sqrt{p_1 + p_2}}{\sqrt{2 p_1}} = \frac{101}{2\sqrt{2}} \approx 36.$$
That is, the throughput along the short route is much higher than along the


Figure 7.5 A network with three links, and two routes from a user: a long route (solid), through one link with parameters $T_1, p_1$ and one with parameters $T_2, p_2$, to cache location B; and a short route (dotted), through two links each with parameters $T_1, p_1$, to cache location A. The round-trip times and packet drop probabilities are indicated on the links.

long one, even though the short route is using two congested resources, and the long route only uses one.
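The arithmetic behind the ratio fits in a few lines of Python; $T_1$ and $p_1$ cancel, so the particular values assumed below are immaterial.

T1, p1 = 1.0, 0.01              # illustrative values; they cancel in the ratio
T2, p2 = 100.0 * T1, 0.0
ratio = ((T1 + T2) / (2.0 * T1)) * ((p1 + p2) / (2.0 * p1)) ** 0.5
print(round(ratio, 1))          # 101 / (2 * sqrt(2)), roughly 36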

Suppose that we want to place a cache with lots of storage space somewhere in the network, and must choose between putting it at A or at B. The cache will appear more effective if we put it at A, further loading the already-congested resources. This is an interesting insight into the consequences of the implicit optimization inherent in the form (7.9): the dependence of throughput on round-trip times means that decentralized decisions on caching will tend to overload the congested edges of the network and underload the high-capacity long-distance core.

Exercises

Exercise 7.10 What is the ratio of throughputs along the two routes under proportional fairness?

Exercise 7.11 Our models of congestion control assume file sizes are large, so that there is time to control the flow rate while the file is in the process of being transferred. Many files transferred by TCP have only one or a few packets, and are essentially uncontrolled (mice rather than elephants). Suppose the uncontrolled traffic through resource $j$ is $u_j$, and $p_j(y)$ is replaced by $p_j(y + u_j)$ in our systems of differential equations. Find the function $\mathcal{U}$ that is being implicitly optimized.
(Note: There will be good reasons for placing mice, if not elephants, in a cache within a short round-trip time of users.)


7.6 A dual algorithm

Recall the optimization problem DUAL$(A, C; w)$:
$$\text{maximize} \quad \sum_{r \in R} w_r \log\Bigl( \sum_{j \in r} \mu_j \Bigr) - \sum_{j \in J} \mu_j C_j \quad \text{over } \mu \geq 0.$$

Define the dual algorithm
$$\frac{d}{dt} \mu_j(t) = \kappa_j \mu_j(t) \Bigl( \sum_{r : j \in r} x_r(t) - C_j \Bigr),$$
where
$$x_r(t) = \frac{w_r}{\sum_{k \in r} \mu_k(t)},$$
and $\kappa_j > 0$, $j \in J$. That is, the rate of change of $\mu_j$ is proportional to $\mu_j$ and to the excess demand for link $j$. In economics, such a process is known as a "tâtonnement process" ("tâtonnement" is French for "groping"). It is a natural model for a process that attempts to balance supply and demand.

For the dual algorithm, the intelligence of the system is at the resources $j \in J$, rather than at the users $r \in R$. That is, the end systems simply maintain $x_r(t) = w_r / \sum_{k \in r} \mu_k(t)$ and it is the resources that have the trickier task of adjusting $\mu_j(t)$. In the primal algorithm, by contrast, the intelligence was placed with the end systems.

Let us check that the dual algorithm solves the DUAL problem. We'll do this by showing the objective function of DUAL is a Lyapunov function. Let
$$V(\mu) = \sum_{r \in R} w_r \log\Bigl( \sum_{j \in r} \mu_j \Bigr) - \sum_{j \in J} \mu_j C_j,$$

and let $\mu(0)$ be an initial state in the interior of the positive orthant. Then
$$\frac{d}{dt} V(\mu(t)) = \sum_{j \in J} \frac{\partial V}{\partial \mu_j} \frac{d}{dt} \mu_j(t) = \sum_{j \in J} \kappa_j \mu_j(t) \Bigl( \sum_{r : j \in r} x_r(t) - C_j \Bigr)^2 \geq 0,$$
with equality only at a point which is both an equilibrium point of the dual algorithm and a maximum of the concave function $V(\mu)$. If the matrix $A$ is of full rank then the function $V(\mu)$ is strictly concave and there is a unique optimum, to which trajectories necessarily converge.
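The Python sketch below runs the dual algorithm on the two-link, three-route network of Figure 7.2 with unit weights and capacities; the gains and step size are illustrative choices. The prices approach $3/2$ on both links and the rates approach the proportionally fair allocation $(1/3, 2/3, 2/3)$ computed earlier.

import numpy as np

A = np.array([[1, 1, 0],
              [1, 0, 1]], dtype=float)    # link-route incidence matrix
C = np.array([1.0, 1.0])
w = np.array([1.0, 1.0, 1.0])
kappa = 0.5                               # gains kappa_j (assumed)

mu = np.array([1.0, 1.0])                 # initial prices in the interior of the orthant
dt = 0.001
for _ in range(100000):
    x = w / (A.T @ mu)                    # x_r(t) = w_r / sum_{k in r} mu_k(t)
    mu = mu + dt * kappa * mu * (A @ x - C)
    mu = np.maximum(mu, 1e-12)            # guard against numerical undershoot

print("prices mu:", np.round(mu, 3))
print("rates  x :", np.round(w / (A.T @ mu), 3))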


Exercises

Exercise 7.12 Suppose that the matrix $A$ is not of full rank. Show that while the value of $\mu$ maximizing the concave function $V(\mu)$ may not be unique, the vector $\mu^T A$ is unique, and hence the optimal flow pattern $x$ is unique. Prove that, along any trajectory $(\mu(t), t \geq 0)$ of the dual algorithm starting from an initial state $\mu(0)$ in the interior of the positive orthant, $x(t)$ converges to this optimal flow pattern.
[Hint: Consider the set of points $\mu$ maximizing the function $V(\mu)$, and show that $\mu(t)$ approaches this set.]

Exercise 7.13 Consider a generalized model of the dual algorithm, where the flow rate is given by
$$x_r(t) = D_r\Bigl( \sum_{j \in r} \mu_j(t) \Bigr).$$
The function $D_r(\cdot)$ can be interpreted as the demand for traffic on route $r$: assume it is a positive, strictly decreasing function of the total price on route $r$.

Let
$$V(\mu) = \sum_{r \in R} \int^{\sum_{j \in r} \mu_j} D_r(\xi)\, d\xi - \sum_{j \in J} \mu_j C_j.$$
By considering $\frac{\partial V}{\partial \mu_j}$, show that $\frac{d}{dt} V(\mu(t))$ will be nonnegative provided we set $\frac{d}{dt} \mu_j(t) \gtrless 0$ according as
$$\sum_{r : j \in r} x_r(t) \gtrless C_j.$$

This suggests we can choose from a large family of price update mechanisms, provided we increase the price of those resources where the capacity constraint is binding, and decrease the price of those resources where it is not. We shall see that in the presence of time delays this freedom is actually somewhat limited.

7.7 Time delays

In earlier sections the parameters $\kappa_r$, $r \in R$, or $\kappa_j$, $j \in J$, were assumed to be positive, but were otherwise arbitrary. This freedom was possible since the feedback of information between resources and sources was assumed to be instantaneous. In this section we briefly explore what happens when


feedback is assumed to be fast, e.g. at the speed of light, but nonetheless finite.

Consider first a very simple, single resource model of the primal algorithm, but one where the change in flow rate at time $t$ depends on congestion information from a time $T$ units earlier. (In the Internet, a natural value for $T$ is the round-trip time; information about the fate of packets on route $r$ takes approximately that long to reach the source.)

$$\frac{d}{dt} x(t) = \kappa \bigl( w - x(t - T)\, p(x(t - T)) \bigr).$$

This system has an equilibrium point $x(t) = x_e$, which solves
$$w = x_e\, p(x_e).$$

Our interest is in how the system behaves if we introduce a small fluctuation about this rate. Write $x(t) = x_e + u(t)$. Let $p_e = p(x_e)$ and $p_e' = p'(x_e)$. Linearizing the time-delayed equation about $x_e$, by neglecting second order terms in $u(\cdot)$, we obtain
$$\frac{d}{dt} u(t) \approx \kappa \bigl( w - (x_e + u(t - T))(p_e + p_e'\, u(t - T)) \bigr) \approx -\kappa (p_e + x_e p_e')\, u(t - T).$$

Now, the solutions to a single-variable linear differential equation with delay,
$$\frac{d}{dt} u(t) = -\alpha\, u(t - T), \qquad (7.10)$$

are quite easy to find. Figure 7.6 plots the qualitative behaviour of the solutions, as the gain $\alpha$ increases from 0 to $\infty$.

Figure 7.6 Solutions $u(t)$ of the differential equation for different values of $\alpha$: for small $\alpha$ the solution decays monotonically; as $\alpha$ increases the decay becomes oscillatory, then the oscillations become sustained, and for large $\alpha$ they grow.

To find the value $\alpha_c$ that corresponds to the periodic solution, try using $u(t) = \sin(\lambda t)$. Then
$$\lambda \cos \lambda t = -\alpha_c \sin \lambda(t - T) = -\alpha_c \bigl( \sin \lambda t \cos \lambda T - \cos \lambda t \sin \lambda T \bigr),$$


which gives $\cos \lambda T = 0$, and $\alpha_c = \lambda = \frac{\pi}{2} \cdot \frac{1}{T}$. The periodic solution marks the boundary between stability and instability of equation (7.10), and hence local stability and instability of the earlier non-linear system.

Remark 7.10 Note that we could have gotten $\alpha_c \propto \frac{1}{T}$ from dimensional analysis alone: given that the critical value for $\alpha$ exists, it has dimension of inverse time, and $T$ is the only time parameter in the problem. For our single resource model, the condition for local stability about the equilibrium point becomes
$$\kappa T (p_e + x_e p_e') < \frac{\pi}{2}.$$

Thus a larger derivative $p_e'$ requires a smaller gain $\kappa$.

The argument suggests that the gain $\kappa$ in the update mechanism should be limited by the fact that the product $\kappa T$ should not get too large. Of course, in the absence of time delays, a larger value for $\kappa$ corresponds to faster convergence to equilibrium, so we would like $\kappa$ to be as large as possible.
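The change of behaviour at $\alpha_c = \pi/(2T)$ is easy to see numerically. The Euler scheme below, with an illustrative step size and a constant initial history, simulates equation (7.10) for values of $\alpha$ on either side of the threshold (here $T = 1$, so $\alpha_c \approx 1.57$).

def simulate(alpha, T=1.0, dt=0.001, t_end=60.0):
    # Euler integration of du/dt = -alpha * u(t - T) with history u(t) = 1 for t <= 0.
    n_delay = int(T / dt)
    u = [1.0] * (n_delay + 1)
    for _ in range(int(t_end / dt)):
        u.append(u[-1] - dt * alpha * u[-1 - n_delay])
    # amplitude over roughly the last oscillation period
    return max(abs(v) for v in u[-5 * n_delay:])

for alpha in (0.5, 1.4, 1.8):   # pi / (2T) is about 1.57 for T = 1
    print(f"alpha = {alpha}: late-time amplitude ~ {simulate(alpha):.3g}")

The amplitude is tiny for the two values below the threshold and grows large for the value above it, matching the qualitative picture in Figure 7.6.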

We briefly comment, without proofs, on the extension of these ideas to a network. For each $j, r$ such that $j \in r$ let $T_{rj}$ be the time delay for packets from the source for route $r$ to reach the resource $j$, and let $T_{jr}$ be the return delay for information from the resource $j$ to reach the source. We suppose these delays satisfy
$$T_{rj} + T_{jr} = T_r, \qquad j \in r, \; r \in R,$$

where $T_r$ is the round-trip time on route $r$.
Consider the time-delayed dynamical system
$$\frac{d}{dt} x_r(t) = \kappa_r x_r(t - T_r) \Bigl( 1 - p_r(t) - p_r(t) \frac{x_r(t)}{w_r} \Bigr), \qquad (7.11)$$
where
$$p_r(t) = 1 - \prod_{j \in r} \bigl( 1 - \mu_j(t - T_{jr}) \bigr)$$
and
$$\mu_j(t) = p_j\Bigl( \sum_{s : j \in s} x_s(t - T_{sj}) \Bigr),$$

and $\kappa_r > 0$, $r \in R$. (This model arises in connection with variants of TCP: comparing the right-hand side of equation (7.11) with expression (7.8) we see that $x_r(t - T_r)(1 - p_r(t))$ and $x_r(t - T_r)\, p_r(t)$ represent the stream of returning positive and negative acknowledgements.) If packets are dropped


or marked at resource $j$ with probability $p_j$ independently from other resources, then $1 - p_r$ measures the probability that a packet traverses the entire route $r$ intact. To see where the delays enter, note that the feedback seen at the source for route $r$ at time $t$ is carried on a flow that passed through the resource $j$ a time $T_{jr}$ previously, and the flow rate on route $s$ that is seen at resource $j$ at time $t$ left the source for route $s$ a time $T_{sj}$ previously.

A plausible choice for the marking probability $p_j(\cdot)$ is
$$p_j(y_j) = \Bigl( \frac{y_j}{C_j} \Bigr)^B;$$

this is the probability of finding a queue of size $\geq B$ in an M/M/1 queue with arrival rate $y_j$ and service rate $C_j$. (While we don't expect the individual resources to behave precisely like independent M/M/1 queues with fixed arrival rates, this is a sensible heuristic.) It can be shown that the system of differential equations with delay will be locally stable about its equilibrium point provided
$$\kappa_r T_r B < \frac{\pi}{2}, \qquad \forall r \in R.$$

This is a strikingly simple result: the gain parameter for route $r$ is limited only by the round-trip time $T_r$, and not by the round-trip times of other routes. As $B$ increases, and thus as the function $p_j(\cdot)$ approaches a sharp transition at the capacity $C_j$, the gain parameters must decrease.

For the dual algorithm of Section 7.6 the time-delayed dynamical system becomes
$$\frac{d}{dt} \mu_j(t) = \kappa_j \mu_j(t) \Bigl( \sum_{r : j \in r} x_r(t - T_{rj}) - C_j \Bigr),$$
where
$$x_r(t) = \frac{w_r}{\sum_{k \in r} \mu_k(t - T_{kr})},$$

and $\kappa_j > 0$, $j \in J$. It can be shown that this system will be locally stable about its equilibrium point provided
$$\kappa_j C_j \bar{T}_j < \frac{\pi}{2}, \qquad \forall j \in J, \qquad (7.12)$$
where
$$\bar{T}_j = \frac{\sum_{r : j \in r} x_r T_r}{\sum_{r : j \in r} x_r},$$
the average round-trip time of packets through resource $j$. Again we see a


strikingly simple result, where the gain parameter at resource $j$ is limited by characteristics local to resource $j$.

Exercises

Exercise 7.14 In the single-resource model of delay (7.10), find the value of $\alpha$ that corresponds to the transition from the first to the second graph in Figure 7.6.
[Hint: Look for solutions of the form $u(t) = e^{-\lambda t}$ with $\lambda$ real. Answer: $\alpha = \frac{1}{eT}$.]

Exercise 7.15 Check that at an equilibrium point for the system (7.11)
$$x_r = w_r \frac{1 - p_r}{p_r},$$
independent of $T_r$.

Exercise 7.16 The simple, single resource model of the dual algorithm becomes
$$\frac{d}{dt} \mu(t) = \kappa \mu(t) \Bigl( \frac{w}{\mu(t - T)} - C \Bigr).$$
Linearize this equation about its equilibrium point, to obtain an equation of the form (7.10): what is $\alpha$? Check that the condition $\alpha T < \pi/2$ for local stability is a special case of equation (7.12).

7.8 Modelling a switch

In this chapter we've described packet-level algorithms implemented at end-points, for example in our discussion of TCP. Our model of resources within the network has been very simple, captured by linear constraints of the form $Ax \leq C$. In this section we shall look briefly at a more developed model of an Internet router, and see how these linear constraints could emerge.

Consider then the following model. There is a set of queues indexed by $R$. Packets arrive into queues as independent Poisson processes with rates $\lambda_r$. There are certain constraints on the sets of queues that can be served simultaneously. Assume that at each discrete time slot there is a finite set $S$ of possible schedules, only one of which can be chosen. A schedule $\sigma$ is a vector of non-negative integers $(\sigma_r)_{r \in R}$, which describes the number of packets to be served from each queue. We assume that the


Figure 7.7 Input-queued switch with three input and three output ports, and thus nine queues. One possible matching of input ports to output ports is highlighted in bold. The figure is borrowed from Shah and Wischik (2012).

schedule provides an upper bound; that is, if $\sigma$ is a schedule, and $\pi \leq \sigma$ componentwise, then $\pi$ is also a schedule.

We are interested in when this system may be stable. Write $\lambda \in \Lambda$ if $\lambda$ is non-negative and if there exist constants $c_\sigma$, $\sigma \in S$, with $c_\sigma \geq 0$, $\sum_\sigma c_\sigma = 1$, such that $\lambda \leq \sum_\sigma c_\sigma \sigma$. Call $\Lambda$ the admissible region.
We would not expect the system to be stable if the arrival rate vector $\lambda$ lies outside of $\Lambda$, since then there is a weighted combination of workloads in the different queues whose drift is upwards whatever scheduling strategy is used (Exercise 7.17).

Example 7.11 (An input-queued switch) An internet router has $N$ input ports and $N$ output ports. A data transmission cable is attached to each of these ports. Packets arrive at the input ports. The function of the router is to work out which output port each packet should go to, and to transfer packets to the correct output ports. This last function is called switching. There are a number of possible switch architectures; in this example we will describe the common input-queued switch architecture.

A queue $r$ is held for each pair of an input port and an output port. In each time slot, the switch fabric can transmit a number of packets from input ports to output ports, subject to the constraints that each input can transmit at most one packet and that each output can receive at most one packet. In other words, at each time slot the switch can choose a matching from inputs to outputs. Therefore, the set $S$ of possible schedules is the set of all possible matchings between inputs and outputs (Figure 7.7). The indexing set $R$ is the set of all input-output pairs.

A matching identifies a permutation of the set $\{1, 2, \ldots, N\}$ and so for


this example we can identify $S$ with the set of permutation matrices. The Birkhoff-von Neumann theorem states that the convex hull of $S$ is the set of doubly stochastic matrices (i.e. matrices with non-negative entries such that all row and column sums are 1). We deduce that for an input-queued switch
$$\Lambda = \Bigl\{ \lambda \in [0, 1]^{N \times N} : \sum_{i=1}^N \lambda_{ij} \leq 1, \; j = 1, 2, \ldots, N, \quad \sum_{j=1}^N \lambda_{ij} \leq 1, \; i = 1, 2, \ldots, N \Bigr\}. \qquad (7.13)$$
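The Birkhoff-von Neumann decomposition invoked here can be computed greedily for small examples: at each step there is a permutation supported on the positive entries of what remains. The 3×3 doubly stochastic matrix in the Python sketch below is an illustrative choice.

import itertools
import numpy as np

lam = np.array([[0.5, 0.3, 0.2],
                [0.2, 0.5, 0.3],
                [0.3, 0.2, 0.5]])          # doubly stochastic (illustrative)
N = lam.shape[0]
remaining = lam.copy()
decomposition = []

while remaining.max() > 1e-12:
    # find a permutation (a schedule in S) supported on positive entries
    perm = next(p for p in itertools.permutations(range(N))
                if all(remaining[i, p[i]] > 1e-12 for i in range(N)))
    c = min(remaining[i, perm[i]] for i in range(N))
    decomposition.append((c, perm))
    for i in range(N):
        remaining[i, perm[i]] -= c

print("weights c_sigma   :", [round(c, 3) for c, _ in decomposition])
print("sum of the weights:", round(sum(c for c, _ in decomposition), 6))
print("permutations used :", [perm for _, perm in decomposition])

For this matrix the decomposition uses three permutation matrices with weights 0.5, 0.3 and 0.2, which sum to 1 as the theorem guarantees.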

Example 7.12 (A wireless network) $K$ stations share a wireless medium: the stations form the vertex set of an interference graph, and there is an edge between two stations if they interfere with each other's transmissions. A schedule $\sigma \in S \subset \{0, 1\}^K$ is a vector $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_K)$ with the property that if $i$ and $j$ have an edge between them then $\sigma_i \cdot \sigma_j = 0$.

We've looked at similar networks previously, in Example 2.24 and Exercise 5.2, without explicit coordination between nodes. In this section we suppose there is some centralized form of scheduling, so that $\sigma$ can be chosen as a function of queue sizes at the stations.

We return now to the more general model, with queues indexed by $R$ and schedules indexed by $S$. We will look at a family of algorithms, the MaxWeight-$\alpha$ family, which we shall show stabilizes the system for all arrival rates in the interior of $\Lambda$.

Fix $\alpha \in (0, \infty)$. At time $t$, pick a schedule $\sigma \in S$ to solve
$$\text{maximize} \quad \sum_{r \in R} \sigma_r\, q_r(t)^\alpha \quad \text{subject to} \quad \sigma \in S,$$

where $q_r(t)$ is the number of packets in queue $r$ at time $t$. If the maximizing $\sigma$ is not unique, pick a schedule at random from amongst the set of schedules that attain the maximum. Note the myopic nature of the algorithm: the schedule chosen at time $t$ depends only on the current queue size vector $q(t) = (q_r(t), r \in R)$, which is thus a Markov process under the assumption of Poisson arrivals.

As $\alpha \to 0$, the strategy aims to maximize the number of packets served, corresponding to throughput maximization. If $\alpha = 1$ the schedule $\sigma$ maximizes a weighted sum of the number of packets served, with weights the queue lengths (a maximum weight schedule). When $\alpha \to \infty$, the schedule $\sigma$ aims to maximize service to the longest queue, then to the second longest, and so on, reminiscent of max-min fairness.
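A minimal simulation of MaxWeight-$\alpha$ for a 2×2 input-queued switch is sketched below. The arrival rates (row and column sums 0.8, inside the admissible region (7.13)), the choice $\alpha = 1$, and the tie-breaking by first maximizer are illustrative simplifications; the queues remain small over the run, as the stability result below (Theorem 7.13) leads us to expect.

import itertools
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 2, 1.0
lam = np.array([[0.4, 0.4],
                [0.4, 0.4]])               # arrival rates, row/column sums 0.8 < 1
# the schedules are the N! permutation matrices (matchings of inputs to outputs)
schedules = [np.eye(N)[list(p)] for p in itertools.permutations(range(N))]

q = np.zeros((N, N))
peak = 0.0
for _ in range(50000):
    weights = [np.sum(s * q ** alpha) for s in schedules]
    sigma = schedules[int(np.argmax(weights))]   # MaxWeight-alpha schedule (ties: first)
    arrivals = rng.poisson(lam)
    q = np.maximum(q + arrivals - sigma, 0.0)    # serve one packet per matched queue
    peak = max(peak, q.sum())

print("final total queue:", q.sum(), "  peak total queue:", peak)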


Theorem 7.13 For any $\alpha \in (0, \infty)$, and Poisson arrivals with rates $\lambda = (\lambda_r, r \in R)$ that lie in the interior of $\Lambda$, the MaxWeight-$\alpha$ algorithm stabilizes the system. That is, the Markov chain $q(t) = (q_r(t), r \in R)$ describing the evolution of queue sizes is positive recurrent.

Sketch of proof Consider the Lyapunov function
$$L(q) = \sum_r \frac{q_r^{\alpha + 1}}{\alpha + 1}.$$

This is a nonnegative function. We will argue that the drift of $L$ should be negative provided $\|q(t)\|$ is large:
$$\mathbb{E}[L(q(t+1)) - L(q(t)) \mid q(t) \text{ large}] < 0.$$
(This suggests that $L(q(t))$ should stay fairly small, and hence that $q(t)$ should be positive recurrent. However, as we have seen in Section 5.2, simply having negative drift is insufficient to conclude positive recurrence of a Markov chain, so to turn the sketch into a proof we need to do more work, later.)

Let us approximate the drift of $L$ by a quantity that is easier to estimate. When $q(t)$ is large, we can write
$$L(q(t+1)) \approx L(q(t)) + \sum_r \frac{\partial L(q(t))}{\partial q_r} \bigl( q_r(t+1) - q_r(t) \bigr).$$

Therefore
$$\mathbb{E}[L(q(t+1)) - L(q(t)) \mid q(t)] \approx \sum_r q_r(t)^\alpha\, \mathbb{E}[q_r(t+1) - q_r(t) \mid q(t)], \qquad (7.14)$$

provided $q(t)$ is large.
Let $\sigma^*(t)$ be the MaxWeight-$\alpha$ schedule selected at time $t$. Then
$$q_r(t)^\alpha\, \mathbb{E}[q_r(t+1) - q_r(t) \mid q(t)] = q_r(t)^\alpha \bigl( \lambda_r - \sigma^*_r(t) \bigr).$$
(If $q_r(t) = 0$ then $\mathbb{E}[q_r(t+1) - q_r(t) \mid q(t)]$ may not be $\lambda_r - \sigma^*_r(t)$, but the product is still zero.)

Now, the definition of $\sigma^*(t)$ means that
$$\sum_r \sigma^*_r(t)\, q_r(t)^\alpha \geq \sum_r \sum_\sigma c_\sigma \sigma_r\, q_r(t)^\alpha$$
for any other convex combination of feasible schedules. Since we know there is a convex combination of feasible schedules that dominates $\lambda$, we must have
$$\sum_r \sigma^*_r(t)\, q_r(t)^\alpha > \sum_r \lambda_r\, q_r(t)^\alpha,$$


provided at least one of the queues is nonzero. This means
$$\mathbb{E}[L(q(t+1)) - L(q(t)) \mid q(t) \text{ large}] < 0, \quad \text{under (7.14)}. \qquad (7.15)$$

In order to make the above argument rigorous, we require somewhat stronger conditions on the drift of $L$. Specifically, we need to check that there exist constants $K > 0$, $\epsilon > 0$, and $b$ for which the following holds. First, if $\|q(t)\| > K$, then $L$ has downward drift that is bounded away from 0:
$$\mathbb{E}[L(q(t+1)) - L(q(t)) \mid \|q(t)\| > K] < -\epsilon < 0.$$
Second, if $\|q(t)\| \leq K$, then the upward jumps of $L$ need to be uniformly bounded:
$$\mathbb{E}[L(q(t+1)) - L(q(t)) \mid \|q(t)\| \leq K] < b.$$

Foster-Lyapunov criteria (Proposition D.1) then assert that the Markov chain $q(t)$ is positive recurrent.

You will check in Exercise 7.18 that these inequalities hold for the quantity appearing on the right-hand side of (7.14). Showing that the approximation made in (7.14) doesn't destroy the inequalities requires controlling the tail of the distribution of the arrivals and services happening over a single time step; see Example D.4 and Exercise D.1. □

Remark 7.14 We have shown that when the queues q(t) are large, the MaxWeight-α algorithm chooses the schedule that will maximize the one-step downward drift of L(q(t)). Thus, it belongs to a class of greedy, or myopic, policies, which optimize some quantity over a single step. It may be surprising that we don't need to think about what happens on subsequent steps, but it certainly makes the algorithm easier to implement!

Remark 7.15 Thus arrival rates λ can be supported by a scheduling algorithm if λ is in the interior of the admissible region Λ. Since the set of schedules S is finite the admissible region Λ is a polytope that can be written in the form Λ = {λ ≥ 0 : Aλ ≤ C} for some A, C with non-negative components. For the input-queued switch, Example 7.11, the linear constraints Aλ ≤ C represent the 2N constraints identified in the representation (7.13). For this example, each constraint corresponds to the capacity of a data transmission cable connected to a port.

Note that the relevant time scales for switch scheduling are much shorter than round-trip times in a wide area network, by the ratio of the distances concerned (centimetres, rather than kilometres or thousands of kilometres). TCP's congestion control will change arrival rates at a router as a response


to congestion, but this occurs over a slower time scale than that relevant for the model of this section.

Exercises

Exercise 7.17 If the arrival rate vector λ lies outside the admissible region Λ find a vector w such that the weighted combination of workloads in the different queues, w · q, has upwards drift whatever scheduling strategy is used.
[Hint: Since Λ is closed and convex, the separating hyperplane theorem shows there is a pair of parallel hyperplanes separated by a gap in between Λ and λ.]

Exercise 7.18 Show that the right-hand side of (7.14) satisfies

∑_r q_r(t)^α E[q_r(t + 1) − q_r(t) | ‖q(t)‖ > K] < −ǫ ‖q(t)‖^α

for some ǫ > 0 and an appropriately large constant K.
[Hint: MaxWeight-α is invariant under rescaling all queue sizes by a constant factor, so reduce to the case ‖q(t)‖ = 1, and use the finiteness of the set S.]

Show also that

∑_r q_r(t)^α E[q_r(t + 1) − q_r(t) | ‖q(t)‖ ≤ K] < b

for some constant b.

Example D.4 and Exercise D.1 derive similar bounds for the quantity on the left-hand side of (7.14), which are needed to apply the Foster-Lyapunov criteria.

Exercise 7.19 Suppose a wireless network is serving the seven cells illustrated in Figure 3.5, where there is a single radio channel which cannot be simultaneously used in two cells which are adjacent. Show that the admissible region is

Λ = {λ ∈ [0, 1]^7 : λ_α + λ_β + λ_γ ≤ 1 for all α, β, γ that meet at a vertex}.

Exercise 7.20 In this section we've assumed some centralized form of scheduling, so that the schedule σ can be chosen as a function of queue sizes at the stations. But this is often a quite difficult computational task: to select a maximum weight schedule is in general an NP-hard problem. In this and the next exercises we'll explore a rather different approach, based on a randomized and decentralized approximation algorithm.


Recall that we have seen a continuous time version of the wireless network of Example 7.12 in Section 5.4, operating under a simple decentralized policy. In this exercise we'll see that it is possible to tune this earlier model so that it slowly adapts its parameters to meet any arrival rate vector λ in the interior of the admissible region Λ.

Suppose the state σ ∈ S ⊂ {0, 1}^K evolves as a Markov process with transition rates as in Section 5.4, and with equilibrium distribution (5.8) determined by the vector of parameters θ. Then the proportion of time that station r is transmitting is given by expression (5.9) as

x_r(θ) ≡ ∑_{σ∈S} σ_r exp(σ · θ) / ∑_{m∈S} exp(m · θ).

Suppose now that for r ∈ R station r varies θ_r = θ_r(t), but so slowly that x_r(θ) is still given by this expression. If V(θ) is defined as in Exercise 5.10, show that (d/dt) V(θ(t)) will be nonnegative provided we set

(d/dt) θ_r(t) ≷ 0 according as λ_r ≷ x_r(θ(t)).
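A quick way to experiment with this adaptation rule is to compute x_r(θ) directly from its defining expression; the following Python sketch (not from the text) assumes the feasible states S are listed explicitly and uses a crude discretized version of the rule with a made-up step size and target rates.

```python
import numpy as np

def x_of_theta(theta, states):
    # x_r(theta) = sum_sigma sigma_r exp(sigma . theta) / sum_m exp(m . theta)
    S = np.array(states, dtype=float)           # rows are the feasible states sigma in {0,1}^K
    w = np.exp(S @ theta)
    return (S * w[:, None]).sum(axis=0) / w.sum()

# Two stations that cannot transmit together: states (0,0), (1,0), (0,1).
states = [(0, 0), (1, 0), (0, 1)]
lam = np.array([0.3, 0.4])                      # target rates, assumed inside the admissible region
theta = np.zeros(2)
for _ in range(20_000):                         # crude analogue of d(theta_r)/dt having the sign of lam_r - x_r(theta)
    theta += 0.01 * np.sign(lam - x_of_theta(theta, states))
print(x_of_theta(theta, states))                # close to lam
```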

Exercise 7.21 Consider the following variant of the optimization problem (5.10):

maximize ∑_{r∈R} w_r log x_r − ∑_{n∈S} p(n) log p(n)

subject to ∑_{n∈S} p(n) n_r = x_r, r ∈ R, and ∑_{n∈S} p(n) = 1

over p(n) ≥ 0, n ∈ S; x_r, r ∈ R.

The first term of the objective function is the usual weighted proportionally fair utility function and the second term is, as in Section 5.4, the entropy of the probability distribution (p(n), n ∈ S). Show that the Lagrangian dual can be written (after omitting constant terms) in the form

maximize V(θ) = ∑_{r∈R} w_r log θ_r − log ( ∑_{n∈S} exp ( ∑_{r∈R} θ_r n_r ) )

over θ_r ≥ 0, r ∈ R.

Exercise 7.22 We continue with the model of Exercise 7.20. Again suppose that for r ∈ R station r can vary θ_r, but slowly so that x_r(θ) tracks its value under the equilibrium distribution determined by θ. Specifically,


suppose that

(d/dt) θ_r(t) = κ_r ( w_r − θ_r(t) x_r(θ(t)) )

where κ_r > 0, for r ∈ R. Show that the strictly concave function V(θ) of Exercise 7.21 is a Lyapunov function for this system, and that the unique value maximizing this function is an equilibrium point of the system, to which all trajectories converge.

Exercise 7.23 Consider again Example 5.20, of wavelength routing. Suppose source-sink s varies θ_r = θ_r(t), for r ∈ s, but so slowly that x_r(θ) tracks its value under the equilibrium distribution determined by θ, given by the expression

x_r(θ) ≡ ∑_{n∈S(C)} n_r exp(n · θ) / ∑_{m∈S(C)} exp(m · θ).

Specifically, suppose that θ_r(t) = θ_s(t) for r ∈ s, and that

(d/dt) θ_s(t) = κ_s ( w_s − θ_s(t) ∑_{r∈s} x_r(θ(t)) ),

where κ_s > 0, for s ∈ S.

Show that the strictly concave function of (θ_s, s ∈ S)

V(θ) = ∑_{s∈S} w_s log θ_s − log ( ∑_{n∈S(C)} exp ( ∑_{s∈S} θ_s ∑_{r∈s} n_r ) )

is a Lyapunov function for this system, and the unique value maximizing this function is an equilibrium point to which all trajectories converge.

Observe that this model gives throughputs on source-sink pairs s that are weighted proportionally fair, with weights (w_s, s ∈ S).

7.9 Further reading

Srikant (2004) and Shakkottai and Srikant (2007) provide a more extensive introduction to the mathematics of Internet congestion control, and survey work up until their dates of publication. The review by Chiang et al. (2007) develops implications for network architecture (i.e. which functions are performed where in a network). Johari and Tsitsiklis (2004) treat the strategic behaviour of price-anticipating users, a topic touched upon in Exercise 7.1, and Berry and Johari (2013) is a recent monograph on economic models of engineered networks.


For more on the particular primal and dual models with delays of Section 7.7 see Kelly (2003a); a recent text on delay stability for networks is Tian (2012). There is much interest in the engineering and implementation of congestion control algorithms: see Vinnicombe (2002) and Kelly (2003b) for the TCP variant of Section 7.7, Kelly and Raina (2011) for a dual algorithm using the local stability condition (7.12), and Wischik et al. (2011) for an account of recent work on multipath algorithms.

Tassiulas and Ephremides (1992) established the stability of the maximum weight scheduling algorithm. Shah and Wischik (2012) is a recent paper on MaxWeight-α scheduling which discusses the interesting conjecture that for the switch of Example 7.11 the average queueing delay decreases as α decreases. Our assumption, in Theorem 7.13, of Poisson arrivals is stronger than needed: Bramson (2006) shows that when a carefully defined fluid model of a queueing network is stable, the queueing network is positive recurrent.

Research proceeds apace on randomized and decentralized approximation algorithms for hard scheduling problems. For recent reviews see Jiang and Walrand (2012), Shah and Shin (2012) and Chen et al. (2013): a key issue concerns the separation of time scales implicit in these algorithms.


8 Flow level Internet models

In the previous Chapter, we considered a network with a fixed number of users sending packets. In this chapter, we will look over longer time scales, where the users may leave because their files have been transferred, and new users may arrive into the system. We will develop a stochastic model to represent the randomly varying number of flows present in a network where bandwidth is dynamically shared between flows, where each flow corresponds to the continuous transfer of an individual file or document. We will assume that the rate control mechanisms we discussed in the previous Chapter work on a much faster time scale than these changes occur, so that the system reaches its equilibrium rate allocation very quickly.

8.1 Evolution of flows

We suppose that a flow is transferring a file. For example, when Elena on her home computer is downloading files from her office computer, each file corresponds to a separate flow. In this chapter we shall want to allow the number of flows using a given route to fluctuate. Let n_r be the number of active flows along route r. Let x_r be the rate allocated to each flow along route r (we assume that it is the same for each flow on the same route); then the capacity allocated to route r is n_r x_r at each resource j ∈ r. The vector x = (x_r, r ∈ R) will be a function of n = (n_r, r ∈ R); for example, it may be the equilibrium rate allocated by TCP, when there are n_r users on route r.

We assume that new flows arrive on route r as a Poisson process of rate ν_r, and that files transferred over route r have a size which is exponentially distributed with parameter µ_r. Assume file sizes are independent of each other and of arrivals. Thus, the number of flows evolves as follows:

n_r → n_r + 1 at rate ν_r,   n_r → n_r − 1 at rate µ_r n_r x_r(n),

for each r ∈ R. With this definition, n is a Markov process. The ratio ρ_r := ν_r/µ_r is the load on route r.


Remark 8.1 This model corresponds to a time-scale separation: given the number of flows n, we assume the rate control mechanism instantly achieves rates x_r(n). The assumption that file sizes are exponentially distributed is not an assumption likely to be satisfied in practice, but fortunately the model is not very sensitive to this assumption.

Exercises

Exercise 8.1 Suppose the network comprises a single resource, of capacity C, and that x_r(n) = C/N where N = ∑_r n_r, so that the capacity is shared equally over all file transfers in progress. Show that provided

ρ := ∑_{r∈R} ρ_r < C

the Markov process n = (n_r, r ∈ R) has equilibrium distribution

π(n) = (1 − ρ/C) \binom{N}{n_1, n_2, . . . , n_R} ∏_{r∈R} (ρ_r/C)^{n_r},

and deduce that in equilibrium N has a geometric distribution with mean ρ/(C − ρ). Observe that if ρ ≥ C, then omitting the term (1 − ρ/C) in the above expression for π still gives a solution to the equilibrium equations: but now its sum is infinite and so no equilibrium distribution exists (if ρ = C the process is null recurrent, while if ρ > C the process is transient).

Show that the mean file size over all files arriving to be transferred is ρ/∑_r ν_r. Deduce that when file sizes arriving at the resource are distributed as a mixture of exponential random variables, the distribution of N depends only on the mean file size, and not otherwise on the distribution of file sizes.

Exercise 8.2 Suppose the shared part of the network comprises a single resource of capacity C, but that each flow has to pass through its own access link of capacity L as well, so that x_r(n) = min{L, C/N} where N is the number of flows present. Suppose flows arrive as a Poisson process, and files are exponentially distributed. Show that if C = sL then the number of flows in the system evolves as an M/M/s queue.

8.2 α-fair rate allocations

Next we define the rate allocation x = x(n). We will define a family of allocation policies which includes as special cases several examples we've seen earlier.


Let w_r > 0, r ∈ R, be a set of weights, and let α ∈ (0, ∞) be a fixed constant. The weighted α-fair allocation x = (x_r, r ∈ R) is, for α ≠ 1, the solution of the following optimization problem.

maximize ∑_r w_r n_r x_r^{1−α}/(1 − α)   (8.1)

such that ∑_{r: j∈r} n_r x_r ≤ C_j, j ∈ J

over x_r ≥ 0, r ∈ R.

For α = 1 it is the solution of the same optimization problem but with objective function ∑_r w_r n_r log x_r; Exercise 8.3 shows why this is the natural continuation of the objective function through α = 1.

For all α ∈ (0, ∞) and all n the objective function is a strictly concave function of (x_r : r ∈ R, n_r > 0); the problem is a generalization of NETWORK(A, C; w), from Section 7.1.

We can characterize the solutions of the problem (8.1) as follows. Let (p_j, j ∈ J) be Lagrange multipliers for the constraints. Then, at the optimum, x = x(n) and p = p(n) satisfy

x_r = ( w_r / ∑_{j∈r} p_j )^{1/α},   r ∈ R,

where

x_r ≥ 0, r ∈ R,   ∑_{r: j∈r} n_r x_r ≤ C_j, j ∈ J   (primal feasibility);

p_j ≥ 0, j ∈ J   (dual feasibility);

p_j · ( C_j − ∑_{r: j∈r} n_r x_r ) = 0, j ∈ J   (complementary slackness).
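For small examples the problem (8.1) can be handed to a generic solver; the sketch below (not from the text) uses scipy's SLSQP method, with routes given as sets of resource indices, and should be read as an illustration under those assumptions rather than a robust implementation.

```python
import numpy as np
from scipy.optimize import minimize

def alpha_fair(n, w, routes, C, alpha):
    # maximize sum_r w_r n_r x_r^{1-alpha}/(1-alpha)  (or w_r n_r log x_r if alpha = 1)
    # subject to sum_{r: j in r} n_r x_r <= C_j, x_r >= 0.
    n, w, C = np.asarray(n, float), np.asarray(w, float), np.asarray(C, float)
    active = n > 0
    def neg_objective(x):
        if alpha == 1.0:
            val = np.sum(w[active] * n[active] * np.log(x[active]))
        else:
            val = np.sum(w[active] * n[active] * x[active] ** (1 - alpha) / (1 - alpha))
        return -val
    constraints = [{"type": "ineq",
                    "fun": lambda x, j=j: C[j] - sum(n[r] * x[r] for r in range(len(n)) if j in routes[r])}
                   for j in range(len(C))]
    res = minimize(neg_objective, x0=np.full(len(n), 0.1), method="SLSQP",
                   bounds=[(1e-6, None)] * len(n), constraints=constraints)
    return res.x

# Two resources of unit capacity; route 0 uses both, routes 1 and 2 one each (the linear network of Section 8.5).
x = alpha_fair(n=[2, 1, 3], w=[1, 1, 1], routes=[{0, 1}, {0}, {1}], C=[1.0, 1.0], alpha=1.0)
print(x)   # for alpha = 1 this should match x_0 = 1/(n_0 + n_1 + n_2) and n_l x_l = 1 - n_0 x_0
```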

The form of an α-fair rate allocation captures several of the fairness definitions we've seen earlier.

• As α → 0 and with w_r ≡ 1 the total throughput, ∑_r n_r x_r, approaches its maximum.
• If α = 1 then, from Proposition 7.4, the rates x_r are weighted proportionally fair.
• If α = 2 and w_r = 1/T_r^2 then the rates x_r are TCP fair, i.e. they are of the form (7.9).
• As α → ∞ and with w_r ≡ 1 the rates x_r approach max-min fairness.


Exercises

Exercise 8.3 Check that the objective function of the problem (8.1) is concave for each of the three cases 0 < α < 1, α = 1 and 1 < α < ∞. Show that the derivative of the objective function with respect to x_r has the same form, namely w_r n_r x_r^{−α}, for all α ∈ (0, ∞).

Exercise 8.4 Check the claimed characterization of the solutions to the problem (8.1).

Exercise 8.5 Consider the α-fair allocation as α → 0, the allocation maximizing the total throughput. Show that however large the number of distinct routes |R|, typically only |J| of them are allocated a non-zero rate.
[Hint: Show that as α → 0 the problem (8.1) approaches a linear program, and consider basic feasible solutions.]

8.3 Stability of α-fair rate allocations

Consider now the behaviour of the Markov process n under an α-fair allocation. Is it stable? This will clearly depend on the parameters ν_r, µ_r, and we next show that the condition is as simple as it could be: all that is needed is that the load arriving at the network for resource j is less than the capacity of resource j, for every resource j.

Theorem 8.2 The Markov process n is positive recurrent (i.e., has an equilibrium distribution) if and only if

∑_{r: j∈r} ρ_r < C_j for all j ∈ J.   (8.2)

Sketch of proof If condition (8.2) is violated for some j then the system is not stable, by the following coupling argument. Suppose that j is the only resource with finite capacity in the network, with all the other resources given infinite capacity. This cannot increase the work that has been processed at resource j before time t, for any t ≥ 0. But the resulting system would be the single server queue considered in Exercise 8.1, which is transient or at best null recurrent.

Our approach to showing that n is positive recurrent under condition (8.2) will be the same as in the proof of Theorem 7.13. We will write down a Lyapunov function, approximate its drift, and show that the approximation is negative when n is large, and bounded when n is small. This is almost enough to apply the Foster-Lyapunov criteria (Proposition D.3); you will check the remaining details in Exercise D.2.


We begin by calculating the drift of n:

E[n_r(t + δt) − n_r(t) | n(t)] ≈ ( ν_r − µ_r n_r x_r(n(t)) ) δt.

This would be an equality if we could guarantee that not all of the n_r(t) flows depart by time t + δt. As n_r(t) → ∞, the approximation gets better.

Next, we consider the Lyapunov function

L(n) = ∑_r (w_r/µ_r) ρ_r^{−α} n_r^{α+1}/(α + 1).

This is a nonnegative function. We will argue that the drift of L should be negative provided n(t) is large. We can approximate the change of L over a small time period by

(1/δt) E[L(n(t + δt)) − L(n(t)) | n(t)] ≈ ∑_{r∈R} (∂L/∂n_r) · (1/δt) E[n_r(t + δt) − n_r(t) | n(t)]

= ∑_r (w_r/µ_r) ρ_r^{−α} n_r^α ( ν_r − µ_r n_r x_r(n(t)) )   (8.3)

= ∑_r w_r ρ_r^{−α} n_r^α ( ρ_r − n_r x_r(n(t)) ).

Our goal is to show that this is negative. At this point it is helpful to rewrite the optimization problem (8.1) in terms of X_r = n_r x_r, r ∈ R, as:

maximize G(X) = ∑_r w_r n_r^α X_r^{1−α}/(1 − α)   (8.4)

such that ∑_{r: j∈r} X_r ≤ C_j, j ∈ J

over X_r ≥ 0, r ∈ R.

Since G is concave, for every U inside the feasible region of (8.4)

G′(U) · (U − X) ≤ G(U) − G(X) ≤ 0   (8.5)

where X is the optimum (Figure 8.1 illustrates).

Now, if the stability conditions (8.2) are satisfied, then ∃ ǫ > 0 such that U = (ρ_r(1 + ǫ), r ∈ R) is also inside the region (8.2). Therefore, by (8.5),

∑_r w_r n_r^α (ρ_r(1 + ǫ))^{−α} ( ρ_r(1 + ǫ) − X_r ) ≤ 0.


Figure 8.1 Let X be the optimum and let U be another point in the feasible region of (8.4). Since G(·) is concave, the tangent plane at U lies above G(·), so G(X) ≤ G(U) + G′(U) · (X − U).

Combining this with (8.3), we see that for large n, the drift of L(n) is negative:

(1/δt) E[L(n(t + δt)) − L(n(t)) | n(t)] ≤ −ǫ ∑_r w_r n_r^α ρ_r^{1−α}.

In order to apply the Foster-Lyapunov criteria (Proposition D.3), we also need to bound the drift of L(n) from above. You will do this (for the quantity on the right-hand side of (8.3)) in Exercise 8.6. This can be done using the fact that flows arrive as Poisson processes, and depart at most as quickly as a Poisson process. □

Remark 8.3 It's interesting to compare the model of an α-fair rate allocation with the earlier model of a MaxWeight-α queue schedule. The models are operating on very different time scales (the time taken for a packet to pass through a router, versus the time taken for a file to pass across a network) but nevertheless the stability results established are of the same form.

If we view n_r as the size of the queue of files that are being transmitted along route r then an α-fair rate allocation chooses the rates X_r allocated to these queues to solve the optimization problem (8.4). A MaxWeight-α schedule would instead choose the rates X_r allocated to these queues to solve

max ∑_r n_r^α X_r

s.t. ∑_{r: j∈r} X_r ≤ C_j, j ∈ J

X_r ≥ 0, r ∈ R.

In either case we observe that if a queue n_r grows to infinity while other


queues are bounded, then this must eventually cause the rate X_r allocated to this queue to increase towards its largest feasible value. In the next Section we'll consider a rate allocation scheme for which this is not the case.
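The MaxWeight-α choice of rates above is a linear program in X, so it tends to put all the available capacity on a few routes; here is a small sketch (not from the text) using scipy, for a made-up two-resource example in which route 0 uses both resources and routes 1 and 2 use one each.

```python
import numpy as np
from scipy.optimize import linprog

# maximize sum_r n_r^alpha X_r  subject to  X_0 + X_1 <= 1,  X_0 + X_2 <= 1,  X >= 0.
n, alpha = np.array([2.0, 1.0, 3.0]), 1.0
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
res = linprog(c=-(n ** alpha), A_ub=A, b_ub=[1.0, 1.0], bounds=[(0, None)] * 3)
print(res.x)   # all capacity goes to routes 1 and 2 here: X = (0, 1, 1)
```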

Exercises

Exercise 8.6 Show that the right-hand side of (8.3) satisfies

∑_r (w_r/µ_r) ρ_r^{−α} n_r^α (1/δt) E[n_r(t + δt) − n_r(t) | ‖n(t)‖ ≤ K] ≤ b

for some constant b. Why does that not follow immediately from the calculation in the proof of Theorem 8.2?

Exercise D.2 in the Appendix asks you to prove a similar inequality for the quantity on the left-hand side of (8.3), which is needed to apply the Foster-Lyapunov criteria (Proposition D.3).

8.4 What can go wrong?

The stability condition (8.2) is so natural that it might be thought that it would apply to most rate allocation schemes. In this Section we show otherwise.

Figure 8.2 A system with three flows and two resources: flows n_0, n_1, n_2 carry loads ρ_0, ρ_1, ρ_2, flow 0 uses both resources, and each resource has capacity C = 1.

Consider the network illustrated in Figure 8.2 with just two resources and three routes: two one-resource routes and one route using both resources. With any α-fair allocation of the traffic rates, as we saw in the last section, the stability condition is ρ_0 + ρ_1 < 1, ρ_0 + ρ_2 < 1. Provided this


condition is satisfied, the Markov process describing the number of files on each route is positive recurrent.

Let us consider a different way of allocating the available capacity. Suppose that streams 1 and 2 are given absolute priority at their resources: that is, if n_1 > 0 then n_1 x_1 = 1 (and hence n_0 x_0 = 0), and if n_2 > 0 then n_2 x_2 = 1 (and again n_0 x_0 = 0). Then stream 0 will only get served if n_1 = n_2 = 0, i.e. if there is no work in any of the high-priority streams.

What is the new stability condition? Resource 1 is occupied by stream 1 a proportion ρ_1 of the time; independently, resource 2 is occupied by stream 2 a proportion ρ_2 of the time. (Because of the absolute priority, neither stream 1 nor stream 2 "sees" stream 0.) Therefore, both of these resources are free for stream 0 to use a proportion (1 − ρ_1)(1 − ρ_2) of the time, and the stability condition is thus

ρ_0 < (1 − ρ_1)(1 − ρ_2).

This is a strictly smaller stability region than before. Figure 8.3 shows a slice of the stability region for a fixed value of ρ_0.

Figure 8.3 Stability regions under α-fair and priority schemes, shown in the (ρ_1, ρ_2) plane for a fixed value of ρ_0; each axis ranges up to 1 − ρ_0.

The system is failing to realize its capacity, because there is starvation of resources: the high-priority flows prevent work from reaching some of the available resources.
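A concrete (made-up) set of loads makes the gap visible: the point checked below satisfies the α-fair condition but not the priority condition.

```python
rho0, rho1, rho2 = 0.2, 0.7, 0.7
print(rho0 + rho1 < 1 and rho0 + rho2 < 1)   # True: stable under any alpha-fair allocation
print(rho0 < (1 - rho1) * (1 - rho2))        # False: 0.2 >= 0.09, unstable under strict priority
```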

Remark 8.4 Suppose that we put a very large buffer between resources 1 and 2, allowing jobs on route 0 to be served by resource 1 and then wait for service at resource 2. This would increase the stability region, and might be an adequate solution in some contexts. However, in the Internet the buffer sizes needed at internal nodes would be unrealistically large, and additional buffering between resources would increase round-trip times.

Another way of giving priority to customers on routes 1 and 2 would be to use the weighted α-fair scheme and assign these flows very high weight;


as we know, any weighted α-fair scheme results in a positive recurrent Markov process. Why are the results we get for the two schemes so different? In both schemes, the number of files on route 0, n_0, will eventually get very large. In an α-fair scheme, that means that route 0 will eventually receive service; in the strict priority setting n_0 doesn't matter at all.

Strict priority rules often cause starvation of network resources and a reduction in the stability region. Exercise 8.7 gives a queueing network example.

Exercises

Exercise 8.7 Consider the network in Figure 8.4. Jobs in the network need to go through stations 1–4 in order, but a single server is working at stations 1 & 4, and another single server at stations 2 & 3. Consequently, at any time, either jobs at station 1, or jobs at station 4, but not both, may be served; similarly for jobs at stations 2 and 3.

Figure 8.4 The Lu-Kumar network. Jobs go through stations 1–4 in order, but a single server is working at stations 1 & 4, and at 2 & 3.

We will assume that jobs at station 4 have strict priority over station 1, and jobs at station 2 have strict priority over station 3. That is, if the server at stations 1 & 4 can choose a job at station 1 or at station 4, she will process the job at station 4 first, letting the jobs at station 1 queue.

Assume that jobs enter the system deterministically, at times 0, 1, 2, and so on. We will also assume that the service times of all the jobs are deterministic: a job requires time 0 at the low-priority stations 1 and 3, and time 2/3 at the high-priority stations 2 and 4. The 0-time jobs are, of course, an idealisation, but we'll see that they still make a difference due to the priority rules.

Assume that service completions happen "just before schedule". In particular, a job completing service at station 3 at time n will arrive at station 4 before the external arrival at time n comes to station 1.


1. Check that each of the two servers individually is not overloaded, i.e. work is arriving at the system for each of them at a rate smaller than 1.

2. Let M be a large integer divisible by 3, and consider this system starting at time 0 with M jobs waiting to be processed by station 1, and no jobs anywhere else. At time 0, these jobs all move to station 2. What is the state of the system at time 2M/3? At time M? When will the system first start processing jobs at station 3?

3. Once the jobs are processed through station 3, they move on to station 4, which has priority over station 1. When will jobs be processed next through station 1? What is the queue at station 1 at that time?

The above example shows that, if we start the system at time 0 with a queue of M jobs waiting for service at station 1, then at time 4M there will be a queue of 2M jobs there. That is, the system is not stable. This is happening because the priority rules cause starvation of some of the servers, forcing them to be idle a large portion of the time.

8.5 Linear network with proportional fairness

In this section, we will analyse an example of a simple network with a proportionally fair rate allocation. A linear network (illustrated in Figure 8.5) is in some respects what a more complicated network looks like from the point of view of a single file; consequently, it is an important model to get to understand. We shall see that it is possible to find its equilibrium distribution explicitly, at least for certain parameter values.

Figure 8.5 Linear network with L resources and L + 1 flows: n_0 flows at rate x_0 use every resource, n_l flows at rate x_l use resource l only, and C_1 = C_2 = . . . = C_L = 1.

Take α = 1 and w_i = 1 for all i, i.e. proportional fairness. Let us compute the proportionally fair rate allocation by maximizing ∑_i n_i log x_i over the feasible region for x. Clearly, if n_l > 0 then n_0 x_0 + n_l x_l = 1, as otherwise


we can increase x_l. Therefore, the optimization problem requires that we maximize

n_0 log x_0 + ∑_{l=1}^{L} n_l log ( (1 − n_0 x_0)/n_l ),

where the summation runs over l for which n_l > 0. Differentiating, at the optimum we have

n_0/x_0 = ∑_{l=1}^{L} n_l n_0/(1 − n_0 x_0)   ⟹   1 − n_0 x_0 = x_0 ∑_{l=1}^{L} n_l

or

x_0 = 1/(n_0 + ∑_{l=1}^{L} n_l),   n_0 x_0 = n_0/(n_0 + ∑_{l=1}^{L} n_l) = 1 − n_l x_l.

We now compute the equilibrium distribution for this system explicitly. The general formulae for transition rates are

q(n, n + e_r) = ν_r,   q(n, n − e_r) = x_r(n) n_r µ_r

and thus in this example

q(n, n − e_0) = µ_0 n_0/(n_0 + ∑_{l=1}^{L} n_l),   n_0 > 0

and

q(n, n − e_i) = µ_i ∑_{l=1}^{L} n_l/(n_0 + ∑_{l=1}^{L} n_l),   n_i > 0, i = 1, . . . , L.

Theorem 8.5 The stationary distribution for the above network is

π(n) = ( ∏_{l=1}^{L} (1 − ρ_0 − ρ_l) / (1 − ρ_0)^{L−1} ) \binom{∑_{l=0}^{L} n_l}{n_0} ∏_{l=0}^{L} ρ_l^{n_l}

provided ρ_0 + ρ_l < 1 for l = 1, 2, . . . , L.

Proof We check the detailed balance equations. For route 0,

π(n) q(n, n + e_0) = π(n + e_0) q(n + e_0, n),

where q(n, n + e_0) = ν_0, π(n + e_0) = π(n) · ( (n_0 + ∑_{l=1}^{L} n_l + 1)/(n_0 + 1) ) · (ν_0/µ_0), and q(n + e_0, n) = µ_0 (n_0 + 1)/(n_0 + ∑_{l=1}^{L} n_l + 1); and for i = 1, . . . , L,

π(n) q(n, n + e_i) = π(n + e_i) q(n + e_i, n),

where q(n, n + e_i) = ν_i, π(n + e_i) = π(n) · ( (n_0 + ∑_{l=1}^{L} n_l + 1)/(∑_{l=1}^{L} n_l + 1) ) · (ν_i/µ_i), and q(n + e_i, n) = µ_i (∑_{l=1}^{L} n_l + 1)/(n_0 + ∑_{l=1}^{L} n_l + 1),


as required. It remains only to check that π sums to one. But

∑_{n_0, n_1, ..., n_L} \binom{n_0 + n_1 + . . . + n_L}{n_0} ρ_0^{n_0} ρ_1^{n_1} · · · ρ_L^{n_L}

= ∑_{n_1, ..., n_L} ρ_1^{n_1} · · · ρ_L^{n_L} ∑_{n_0} \binom{n_0 + . . . + n_L}{n_0} ρ_0^{n_0}

= ∑_{n_1, ..., n_L} ρ_1^{n_1} · · · ρ_L^{n_L} · 1/(1 − ρ_0)^{n_1 + . . . + n_L + 1}   (negative binomial expansion)

= (1/(1 − ρ_0)) ∑_{n_1, ..., n_L} (ρ_1/(1 − ρ_0))^{n_1} (ρ_2/(1 − ρ_0))^{n_2} · · · (ρ_L/(1 − ρ_0))^{n_L}   if ρ_i < 1 − ρ_0 ∀ i

= (1 − ρ_0)^{L−1} / ∏_{l=1}^{L} (1 − ρ_0 − ρ_l),

and thus π does indeed sum to 1, provided the stability conditions are satisfied. □
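The stationary distribution can also be checked by simulation; the sketch below (not from the text) runs the transition rates displayed above for L = 2 with made-up loads, and compares the time-average of n_l with the mean ρ_l/(1 − ρ_0 − ρ_l) that Exercise 8.9 below asks you to derive from Theorem 8.5.

```python
import numpy as np

rng = np.random.default_rng(2)
L = 2
nu, mu = np.array([0.3, 0.25, 0.25]), np.ones(3)   # routes 0, 1, 2; rho_l = nu_l / mu_l
rho = nu / mu

n, t, T, acc = np.zeros(L + 1, dtype=int), 0.0, 100_000.0, np.zeros(L + 1)
while t < T:
    tot = n.sum()
    down = np.zeros(L + 1)
    if tot > 0:
        down[0] = mu[0] * n[0] / tot                                   # q(n, n - e_0)
        down[1:] = np.where(n[1:] > 0, mu[1:] * n[1:].sum() / tot, 0)  # q(n, n - e_i)
    rates = np.concatenate([nu, down])
    dt = rng.exponential(1.0 / rates.sum())
    acc += n * dt
    t += dt
    k = rng.choice(2 * (L + 1), p=rates / rates.sum())
    n[k % (L + 1)] += 1 if k < L + 1 else -1

print(acc / t)                              # empirical E n_l
print(rho[1:] / (1 - rho[0] - rho[1:]))     # predicted E n_l for l = 1, ..., L
```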

Remark 8.6 The above network is an example of a quasi-reversible queue. That is, if we draw a large box around the linear network, then arrivals of each file type will be Poisson processes by assumption, and the departures will be Poisson as well. (We saw this for a sequence of M/M/1 queues at the beginning of the course.) This lets us embed this into a bigger network as a unit, in the same way we did with M/M/1 queues. We could for example replace the simple processor sharing queue 0 in Figure 2.9 by the above linear network and retain a product-form. It also can be used to show insensitivity, i.e. that the stationary distribution depends only on the mean file sizes, and not otherwise on their distributions.

Remark 8.7 We can compute the average time to transfer a file on route r but, since files on different routes may have very different sizes, a more useful measure is the flow throughput on route r defined as

(average file size on route r) / (average time required to transfer a file on route r).

The average file size we know is 1/µ_r. The average transfer time can be derived from Little's law L = λW, which tells us

E(n_r) = ν_r · average time required to transfer a file on route r.


Combining these, and recalling ρ_r = ν_r/µ_r, we get

flow throughput = ρ_r/E n_r.

Since we know π(n), we can calculate this throughput, at least for the particular model of this section: see Exercise 8.8.

Exercises

Exercise 8.8 Show that the flow throughput, ρ_r/E n_r, for the linear network is given by

1 − ρ_0 − ρ_l,   l = 1, . . . , L,

(1 − ρ_0) [ 1 + ∑_{l=1}^{L} ρ_0/(1 − ρ_0 − ρ_l) ]^{−1},   l = 0.

Note that 1 − ρ_0 − ρ_l is the proportion of time that resource l is idle. Thus, the flow throughput on route l = 1, . . . , L is as if a file on this route gets a proportion 1 − ρ_0 − ρ_l of resource l to itself.

Exercise 8.9 Use the stationary distribution found in Theorem 8.5 to show that under this distribution n_1, n_2, . . . , n_L are independent, and n_l is geometrically distributed with mean ρ_l/(1 − ρ_0 − ρ_l).
[Hint: Sum π(n) over n_0, and use the negative binomial expansion.]

Exercise 8.10 Consider a network with resources J = {1, 2, 3, 4} each of unit capacity and routes R = {12, 23, 34, 41}, where we use ij as a convenient shorthand for {i, j}. Given n = (n_r, r ∈ R), find the rate x_r of each flow on route r, for each r ∈ R, under a proportionally fair rate allocation. Show, in particular, that if n_{12} > 0 then

x_{12} n_{12} = (n_{12} + n_{34})/(n_{12} + n_{23} + n_{34} + n_{41}).

Suppose now that flows describe the transfer of files through a network, that new flows originate as independent Poisson processes of rates ν_r, r ∈ R, and that file sizes are independent and exponentially distributed with parameter µ_r on route r ∈ R. Determine the transition rates of the resulting Markov process n = (n_r, r ∈ R). Show that the stationary distribution of the Markov process n = (n_r, r ∈ R) takes the form

π(n) = B \binom{n_{12} + n_{23} + n_{34} + n_{41}}{n_{12} + n_{34}} ∏_r (ν_r/µ_r)^{n_r},


provided it exists.

8.6 Further reading

Theorem 8.2 and the examples of Sections 8.4 and 8.5 are from Bonald and Massoulie (2001); for further background to the general approach of this Chapter see Ben Fredj et al. (2001).

The network in Exercise 8.7 was introduced by Lu and Kumar (1991). Analyzing the set of arrival rates for which a given queueing network with a given scheduling discipline will be stable turns out to be a surprisingly non-trivial problem. The book of Bramson (2006) discusses in detail a method of showing stability using fluid models, which generalise the drift analysis we have seen in several chapters.

Kang et al. (2009) show that a generalization of the independence result of Exercise 8.9 holds approximately for a wide class of networks operating under proportional fairness.

In this Chapter we have assumed a time-scale separation of packet-level dynamics, such as those described in Section 7.8 for MaxWeight scheduling, and flow-level dynamics, such as those described in Section 8.1. See Walton (2009) and Moallemi and Shah (2010) for more on the joint dynamics of these two time scales. Shah et al. (2012) is a recent paper exploring connections between scheduling strategies such as those in Section 7.8 and rate allocations such as those in Section 8.2.


Appendix A

Continuous-time Markov processes

Let T be a subset of R. A collection of random variables (X(t), t ∈ T) defined on a common probability space and taking values in a countable set S has the Markov property if for t_0 < t_1 < . . . < t_n < t_{n+1} all in T

P(X(t_{n+1}) = x_{n+1} | X(t_n) = x_n, X(t_{n−1}) = x_{n−1}, . . . , X(t_0) = x_0) = P(X(t_{n+1}) = x_{n+1} | X(t_n) = x_n)

whenever the event {X(t_n) = x_n} has positive probability. This is equivalent to the property that for t_0 < t_1 < . . . < t_n all in T and for 0 < p < n

P(X(t_i) = x_i, 0 ≤ i ≤ n | X(t_p) = x_p) = P(X(t_i) = x_i, 0 ≤ i ≤ p | X(t_p) = x_p) P(X(t_i) = x_i, p ≤ i ≤ n | X(t_p) = x_p)

whenever the event {X(t_p) = x_p} has positive probability. Note that this statement has no preferred direction of time: it states that, conditional on X(t_p) (the present), (X(t_i), 0 ≤ i ≤ p) (the past) and (X(t_i), p ≤ i ≤ n) (the future) are independent.

Our definition in Chapter 1 of a Markov chain corresponds to the case where T is Z+ or Z, and where the transition probabilities are time homogeneous. Our definition of a Markov process corresponds to the case where T is R+ or R, where the process is time-homogeneous, and where there is an additional assumption we now explore.

A possibility that arises in continuous, but not in discrete, time is explosion. Consider a Markov process which spends an exponentially distributed time, with mean (1/2)^k, in state k before moving to state k + 1, and suppose the process starts at time 0 in state 0. Since ∑_k (1/2)^k = 2 we expect the process to have run out of instructions by time 2. The trajectories will look something like Figure A.1. In particular, there will be an infinite number of transitions during a finite time period.

A related possibility is a process "returning from ∞": consider a process that at time 0− is in state 0, then "jumps to ∞", and then runs a time-reversed version of the above explosion.


Figure A.1 Trajectories of an explosive process. On average, the time when the process "runs out of instructions" is 2.

Then at any time t > 0, the process is (with probability 1) in a finite state; but, as t → 0 from above, P(X(t) = j) → 0 for any finite j. In particular, this process spends a finite time in state 0, but "goes nowhere" once it leaves state 0 (in this example, q(0, j) = 0 for all j > 0). Such a process is known as nonconservative.

Throughout these notes we assume that any Markov process with which we deal remains in each state for a positive period of time and is incapable of passing through an infinite number of states in a finite time. This assumption excludes both of the above phenomena and ensures the process can be constructed as in Chapter 1 (from a jump chain and a sequence of independent exponential random variables) with q(i) ≡ ∑_{j∈S} q(i, j) the transition rate out of state i. It will usually be clear that the assumption holds for particular processes, for example, from the definition of the process in terms of arrival and service times.

In Chapter 1 we defined an equilibrium distribution for a chain or process to be a collection π = (π(j), j ∈ S) of positive numbers summing to unity that satisfy the equilibrium equations (1.1) or (1.2). An equilibrium distribution exists if and only if the chain or process is positive recurrent, i.e. from any state, the mean time to return to that state after leaving it is finite. If an equilibrium distribution does not exist then the chain or process may be either null recurrent, i.e. the time to return to a state is finite but has infinite mean, or transient, i.e. on leaving a state there is a positive probability the state will never be visited again.

When an equilibrium distribution exists, the process can be explicitly

Page 209: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

Continuous-time Markov processes 203

constructed for t < 0 by constructing the reversed process (X(−t), t ≥ 0), using the transition rates (q′(j, k), j, k ∈ S) given in Proposition 1.1. For example, in Figure 2.1, given the number in the queue at time t = 0, we can use the Poisson point processes A and D to construct the process both before and after time 0. This provides an alternative to starting the process with the equilibrium distribution at time t_0, and letting t_0 → −∞.

Consider a Markov process (X(t), t ∈ T) where T is R+ or R. A random variable T is a stopping time (for the process) if for each t ∈ T the event {T ≤ t} depends only on (X(s), s ≤ t); that is, it is possible to determine whether T ≤ t based only on observing the sample path of X up through time t. A fundamental result is the strong Markov property: if T is a stopping time then conditional on T < ∞ and X(T) = x_p, the past (X(t), t ≤ T) and the future (X(t), t ≥ T) are independent. If T is deterministic then this becomes our earlier restatement of the Markov property: for fixed t_0 the distribution of (X(t), t ≥ t_0) is determined by and determines the finite dimensional distributions, i.e. the distributions of (X(t_i), 0 ≤ i ≤ n) for all t_0 < t_1 < . . . < t_n, and any finite n (Norris (1998), section 6). Finally we note that our construction of a Markov process from a jump chain followed the usual convention and made the process right-continuous, but this was arbitrary, and is a distinction not detected by the finite dimensional distributions.


Appendix B

Little’s law

Let X_1, X_2, . . . be independent identically distributed nonnegative random variables. These will represent times between successive renewals. Let

N(t) = max{n : X_1 + . . . + X_n ≤ t}

be the number of renewals that have occurred by time t. N(t) is called a renewal process.

Suppose that a reward Y_n is earned at the time of the nth renewal. The reward Y_n will usually depend on X_n, but we suppose the pairs (X_n, Y_n), n = 1, 2, . . . , are independent and identically distributed, and that Y_n, as well as X_n, is non-negative. The renewal reward process

Y(t) = ∑_{n=1}^{N(t)} Y_n

is the total reward earned by time t.

Theorem B.1 (Renewal Reward Theorem) If both E X_n and E Y_n are finite then

lim_{t→∞} Y(t)/t = E Y_n/E X_n = lim_{t→∞} E[Y(t)/t],

where the first equality holds with probability 1.

Remark B.2 In some applications the reward may be earned over the course of the cycle, rather than at the end. Provided the partial reward earned over any interval is non-negative the theorem still holds. The theorem also holds if the initial pair (X_1, Y_1) has a different distribution from later pairs, provided its components have finite mean. For example, if the system under study is a Markov process with an equilibrium distribution, the system might start in equilibrium.

The reader will note the similarity of the "with probability one" part of the Theorem to the ergodic property of an equilibrium distribution for a Markov process: both results are consequences of the strong law of large


numbers. Let’s prove just this part of the renewal reward theorem (the“ex-pectation” part needs a little more renewal theory).

Proof (of the w. p. 1 part). Write (X, Y) for a typical pair (X_n, Y_n); we allow (X_1, Y_1) to have a different distribution. First N(t) → ∞ as t → ∞ with probability one, since the Xs have finite mean and are therefore proper random variables. Now

∑_{n=1}^{N(t)} X_n ≤ t < ∑_{n=1}^{N(t)+1} X_n

by the definition of N(t), and so

( ∑_{n=1}^{N(t)} X_n )/N(t) ≤ t/N(t) < ( ∑_{n=1}^{N(t)+1} X_n/(N(t) + 1) ) · ( (N(t) + 1)/N(t) ),

where, with probability 1, the two averages of the X_n converge to E X and the final factor converges to 1. Thus t/N(t) → E X with probability 1, and so N(t)/t → 1/E X with probability 1. But

∑_{n=1}^{N(t)} Y_n ≤ Y(t) ≤ ∑_{n=1}^{N(t)+1} Y_n

and so

( ∑_{n=1}^{N(t)} Y_n/N(t) ) · ( N(t)/t ) ≤ Y(t)/t ≤ ( ∑_{n=1}^{N(t)+1} Y_n/(N(t) + 1) ) · ( (N(t) + 1)/N(t) ) · ( N(t)/t ),

where, with probability 1, the two averages of the Y_n converge to E Y, the factors N(t)/t converge to 1/E X, and (N(t) + 1)/N(t) converges to 1; thus Y(t)/t → E Y/E X with probability 1. □

To deduce Little’s Law in Section 2.5, we needed to make three applica-tions of the renewal reward theorem. Consider first a renewal reward pro-cess where each customer generates unit reward per unit of time itspends inthe system. Then we used this process, together with the abovetheorem, todefineL in terms of the mean rewards earned over, and the mean length of,a regenerative cycle. Our definition of the mean arrival arrival rateλ useda different reward structure, where each customer generates unit rewardwhen it enters the system. Finally, to defineW we construct a discrete timerenewal process, as follows. The process tracks the number of customersin the system at the times of customer arrivals and departures (cf. the jumpchain for a Markov process). The first customer to enter the system fol-lowing a regeneration point of the system marks a regeneration point onthis discrete time scale. If each customer generates a reward on leaving the

Page 212: Lecture Notes on Stochastic Networks · Communication networks underpin our modern world, and provide fasci-nating and challenging examples of large-scale stochastic systems. Ran-domness

206 Little’s law

Little’s law—) system equal to its time spent in the system, then we defineW in terms ofthe mean reward earned over, and the mean length of, a regenerative cycleon the discrete time scale.


Appendix C

Lagrange multipliers

Let P(b) be the optimization problem

minimize f(x)
subject to h(x) = b
over x ∈ X.

Let X(b) = {x ∈ X : h(x) = b}. Say that x is feasible for P(b) if x ∈ X(b). Define the Lagrangian

L(x; y) = f(x) + y^T(b − h(x)).

Typically X ⊂ R^n, h : R^n → R^m, with b, y ∈ R^m. The components of y are called Lagrange multipliers.

Theorem C.1 (Lagrange Sufficiency Theorem) If x̄ and ȳ exist such that x̄ is feasible for P(b) and

L(x̄; ȳ) ≤ L(x; ȳ) ∀ x ∈ X,

then x̄ is optimal for P(b).

Proof For all x ∈ X(b) and all y we have

f(x) = f(x) + y^T(b − h(x)) = L(x; y).

Now x̄ ∈ X(b) ⊂ X, and thus

f(x̄) = L(x̄; ȳ) ≤ L(x; ȳ) = f(x) ∀ x ∈ X(b).

Section 2.7 gives an example of the use of a Lagrange multiplier, and of the application of the above Theorem.

Let

φ(b) = inf_{x∈X(b)} f(x)   and   g(y) = inf_{x∈X} L(x; y).


Then for all y

φ(b) = inf_{x∈X(b)} L(x; y) ≥ inf_{x∈X} L(x; y) = g(y).   (C.1)

Let Y = {y : g(y) > −∞}. From inequality (C.1) we immediately have the following.

Theorem C.2 (Weak Duality Theorem)

inf_{x∈X(b)} f(x) ≥ sup_{y∈Y} g(y).

The left-hand side of this inequality, that is P(b), is called the primal problem, and the right-hand side is called the dual problem. The Lagrange multipliers y appearing in the dual problem are also known as dual variables. In general, the inequality in Theorem C.2 may be strict: in this case there is no ȳ that allows the Lagrange Sufficiency Theorem to be applied.

Say that the Strong Lagrangian Principle holds for P(b) if there exists ȳ such that

φ(b) = inf_{x∈X} L(x; ȳ);

equality then holds in Theorem C.2, from (C.1), and we say there is strong duality.

What are sufficient conditions for the Strong Lagrangian Principle to hold? We'll state, but not prove, some sufficient conditions that are satisfied for the various optimization problems we've seen in earlier Chapters (for more see Whittle (1971); Boyd and Vandenberghe (2004)).

Theorem C.3 (Supporting Hyperplane Theorem) Suppose φ is convex and b lies in the interior of the set of points where φ is finite. Then there exists a (non-vertical) supporting hyperplane to φ at b.

Theorem C.4 The following are equivalent for the problem P(b):

• there exists a (non-vertical) supporting hyperplane to φ at b;
• the Strong Lagrangian Principle holds.

Corollary C.5 If the Strong Lagrangian Principle holds and if φ(·) is differentiable at b, then ȳ = φ′(b).

Remark C.6 The optimal value achieved in problem P(b) is φ(b), and thus Corollary C.5 shows that the Lagrange multipliers ȳ measure the sensitivity of the optimal value to changes in the constraint vector b. For this reason Lagrange multipliers are sometimes called shadow prices.
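Corollary C.5 is easy to check numerically on a toy problem (not one from the text): minimize x_1² + x_2² subject to x_1 + x_2 = b, for which φ(b) = b²/2, φ′(b) = b, and the Lagrange multiplier at the optimum is also b.

```python
from scipy.optimize import minimize

def phi(b):
    # phi(b) = inf { x1^2 + x2^2 : x1 + x2 = b }
    res = minimize(lambda x: x[0] ** 2 + x[1] ** 2, x0=[0.0, 0.0],
                   constraints=[{"type": "eq", "fun": lambda x: x[0] + x[1] - b}])
    return res.fun

b, h = 1.0, 1e-4
print((phi(b + h) - phi(b - h)) / (2 * h))   # numerical phi'(b) ~ 1.0, matching the multiplier ybar = b
```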


Sometimes our constraints are of the form h(x) ≤ b rather than h(x) = b. In this case we can write the problem in the form P(b) as

minimize f(x)
subject to h(x) + z = b   (C.2)
over x ∈ X, z ≥ 0,

where z is a vector of slack variables. Note that x, z now replaces x, and we look to minimize

L(x, z; y) = f(x) + y^T(b − h(x) − z)

over x ∈ X and z ≥ 0. For this minimization to give a finite value it is necessary that y ≥ 0, which is thus a constraint on the feasible region Y of the dual problem; and given this, any minimizing z satisfies y^T z = 0, a condition termed complementary slackness. Section 3.4 uses this formulation.

Theorem C.7 For a problem P(b) of the form (C.2), if X is a convex set and f and h are convex then φ is convex.

Thus, if X is a convex set, f and h are convex functions, and b lies in the interior of the set of points where φ is finite, we conclude that the Strong Lagrangian Principle holds.


Appendix D

Foster-Lyapunov criteria

Our interest here is to be able to prove that certain Markov chains are positive recurrent without explicitly deriving the equilibrium distribution for them. We will borrow an idea from the analysis of ordinary differential equations.

In Section 7.7.3 we studied the long-term behaviour of the trajectories of systems of ordinary differential equations. We were able to establish convergence without explicitly computing the trajectories themselves by exhibiting a Lyapunov function, that is a scalar function of the state of the system that was monotone in time.

A similar argument works for Markov chains. Consider an irreducible countable state-space Markov chain (X_n)_{n≥0}. Analogously to the differential equation case, a Lyapunov function for a Markov chain is a function of the state X whose expectation is monotone in time: E[L(X(n + 1)) | X(n)] ≤ L(X(n)). However, as we noted in Chapter 5 (Remark 5.2), a simple drift condition is not enough to show positive recurrence; we need some further constraints bounding the upward jumps of L.

Proposition D.1 (Foster-Lyapunov stability criterion for Markov chains) Let (X(n))_{n≥0} be an irreducible Markov chain with state space S and transition matrix P = (p(i, j)). Suppose L : S → R+ is a function such that, for some constants ǫ > 0 and b, some finite exception set K ⊂ S, and all i ∈ S,

E[L(X(n + 1)) − L(X(n)) | X(n) = i] ≤ −ǫ if i ∉ K, and ≤ b − ǫ if i ∈ K.

Then the expected return time to K is finite, and X is a positive recurrent Markov chain.

Proof Let τ = min{n ≥ 1 : X(n) ∈ K} be the first time after t = 0 of entering the exception set. We will show that E[τ | X(0) = i] is finite for all i ∈ S.


Consider the inequality

E[L(X(t + 1)) | X(t)] + ǫ ≤ L(X(t)) + b I{X(t) ∈ K},

which holds for all times t. Add this up over all t : 0 ≤ t ≤ τ − 1 and take expectations to get

∑_{t=1}^{τ} E[L(X(t))] + ǫ E[τ] ≤ ∑_{t=0}^{τ−1} E[L(X(t))] + b.   (D.1)

We would like to cancel like terms in the sums to get a bound on E[τ], but the sums may be infinite. We get around this difficulty as follows. Let τ_n be the first time when L(X(t)) ≥ n, and let

τ̄_n = min(τ, n, τ_n).

Then we have

∑_{t=1}^{τ̄_n} E[L(X(t))] + ǫ E[τ̄_n] ≤ ∑_{t=0}^{τ̄_n−1} E[L(X(t))] + b   (D.2)

and here everything is finite, so we may indeed cancel like terms:

ǫ E[τ̄_n] ≤ L(X(0)) − L(X(τ̄_n)) + b ≤ L(X(0)) + b.   (D.3)

Increasing n → ∞ will increase τ_n → ∞, hence τ̄_n → τ. Monotone convergence now implies

E[τ] ≤ (L(X(0)) + b)/ǫ.

We have thus shown that the mean return time to the exception set K is finite. This implies (and is equivalent to) positive recurrence (Asmussen, 2003, Lemma I.3.10). □
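As a sanity check of Proposition D.1 on a toy chain (not from the text): a reflected random walk on {0, 1, 2, . . .} that steps up with probability p < 1/2 and otherwise steps down (staying at 0 if already there) satisfies the drift condition with L(x) = x and K = {0}, and its empirical mean return time to 0 is finite.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 0.3                                  # upward probability; drift off K is p - (1 - p) = -0.4

def return_time_to_zero():
    x, steps = 0, 0
    while True:
        x = max(x + (1 if rng.random() < p else -1), 0)
        steps += 1
        if x == 0:
            return steps

print(np.mean([return_time_to_zero() for _ in range(10_000)]))   # finite, consistent with positive recurrence
```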

Remark D.2 This technique for proving stability of Markov chains was introduced in Foster (1953). Many variants of the Foster-Lyapunov criteria exist; this version has been adapted from Hajek (2006).

Often we want to work with (continuous-time) Markov processes, rather than (discrete-time) Markov chains. The Foster-Lyapunov stability criterion is very similar, once we have defined the drift appropriately. Looking


over a small time interval of length δ, we have

E[L(X(t + δ)) − L(X(t)) | X(t) = i]/δ = ∑_j [ P(X(t + δ) = j | X(t) = i)/δ ] (L(j) − L(i)),

and P(X(t + δ) = j | X(t) = i)/δ → q(i, j) as δ → 0, where (q(i, j)) is the matrix of transition rates for the Markov process. Thus, we expect the drift condition for Markov processes to use the quantity ∑_j q(i, j)(L(j) − L(i)).

Proposition D.3 (Foster-Lyapunov stability criterion for Markov processes) Let (X(t))_{t≥0} be a (time-homogeneous, irreducible, nonexplosive, conservative) continuous-time Markov process with countable state space S and matrix of transition rates (q(i, j)). Suppose L : S → R+ is a function such that, for some constants ǫ > 0 and b, some finite exception set K ⊂ S, and all i ∈ S,

∑_j q(i, j)(L(j) − L(i)) ≤ −ǫ if i ∉ K, and ≤ b − ǫ if i ∈ K.   (D.4)

Then the expected return time to K is finite, and X is positive recurrent.

Proof Let X^J be the jump chain of X. This is the Markov chain obtained by looking at X just after it has jumped to a new state. Recall, from Chapter 1, that the transition probabilities of the jump chain are p(i, j) = q(i, j)/q(i), where q(i) = ∑_j q(i, j). Let N = min{n ≥ 1 : X^J(n) ∈ K}, and let τ be the time of the Nth jump. Equivalently, τ is the first time after leaving state X(0) that the process X(t) is in the set K.

Now, we can rewrite the condition (D.4) as

∑_j p(i, j)(L(j) − L(i)) ≤ −ǫ(i) = −ǫ q(i)^{−1} if i ∉ K, and ≤ b(i) − ǫ(i) = (b − ǫ) q(i)^{−1} if i ∈ K.

The argument for discrete-time chains gives

E[ ∑_{n=0}^{N−1} ǫ(X^J(n)) ] ≤ L(X(0)) + b(X(0)).

Recall that q(i)^{−1} is the expected time it takes the Markov process X to jump from state i, so the inequality above can be rewritten as

ǫ E[τ] ≤ L(X(0)) + b(X(0)),


and hence E[τ] < ∞. A continuous-time version of (Asmussen, 2003, Lemma I.3.10) then implies that X is positive recurrent. □

Foster-Lyapunov criteria provide a powerful technique for showingpos-itive recurrence of Markov chains, because constructing a Lyapunov func-tion is often easier than constructing an explicit, summable equilibrium dis-tribution. However, verifying the conditions of the criteria canbe a lengthyprocess. An example computation from Section 7.7.8 appears below in Ex-ample D.4 and Exercise D.1.

The question of determining when queueing systems are stable is an active area of research. Lyapunov functions offer one technique, but they come with limitations, both because of the Markovian assumptions required, and because writing down an explicit Lyapunov function can be tricky – see Exercise D.3 below! One commonly adopted approach involves constructing fluid limits, which describe the limiting trajectories of the system under a certain rescaling. Analyzing properties of these trajectories is often simpler than constructing an explicit Lyapunov function, and can also be used to show positive recurrence. To read about this technique, see Bramson (2006).

Example D.4 (MaxWeight-α) Recall the model of a switched network running the MaxWeight-α algorithm. There is a set of queues indexed by R. Packets arrive into queues as independent Poisson processes with rates λ_r. Due to constraints on simultaneous activation of multiple queues, at each discrete time slot there is a finite set S of possible schedules, only one of which can be chosen. A schedule σ is a vector of non-negative integers (σ_r)_{r∈R}, which describes the number of packets to be served from each queue. (If q_r < σ_r, i.e. there aren't enough packets in queue r, the policy will serve all the packets in queue r.) The MaxWeight-α policy will, at every time slot, pick a schedule σ that solves

\[
\max \sum_{r \in R} \sigma_r q_r(t)^{\alpha} \quad \text{s.t. } \sigma \in S.
\]

Note that this is a Markovian system, since the scheduling decision depends only on the current state of the system. We would like to know for what arrival rate vectors λ the Markov chain is positive recurrent. Theorem 7.13 asserts that this is true for all λ in the interior of the admissible region Λ, given by

\[
\Lambda = \Big\{\lambda \ge 0 : \exists\, c_\sigma \ge 0,\ \sigma \in S,\ \sum_\sigma c_\sigma = 1 \text{ such that } \lambda \le \sum_\sigma c_\sigma \sigma\Big\}.
\]
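To make the scheduling rule concrete, here is a minimal sketch (not from the text) of a single MaxWeight-α decision; the schedule set S below (the two matchings of a 2×2 input-queued switch, so R has four queues), the queue lengths and the value of α are illustrative assumptions.

    # Minimal sketch: one MaxWeight-alpha decision, maximizing sum_r sigma_r * q_r(t)**alpha over S.
    alpha = 1.0

    # Each schedule is a vector (sigma_r) over the four queues; only one may be used per slot.
    S = [
        (1, 0, 0, 1),   # serve queues 1 and 4
        (0, 1, 1, 0),   # serve queues 2 and 3
    ]

    def maxweight_schedule(q, schedules, alpha):
        """Return a schedule maximizing sum_r sigma_r * q_r**alpha."""
        return max(schedules, key=lambda sigma: sum(s * x**alpha for s, x in zip(sigma, q)))

    q = (5, 2, 7, 1)
    print(maxweight_schedule(q, S, alpha))   # (0, 1, 1, 0): weight 2 + 7 beats 5 + 1

Only the current queue lengths enter the decision, which is why the resulting system is Markovian, as noted above.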


In our sketch of the proof of this result, we considered the Lyapunov function

\[
L(q) = \sum_r \frac{q_r^{\alpha+1}}{\alpha+1}.
\]

In our analysis in Chapter 7, we approximated its true drift by a quantity that was easier to work with, see (7.14). Let us now show rigorously that this function L satisfies the conditions of Proposition D.1.

We will show that the drift of L(q(t)) can be bounded as follows:

\[
E[L(q(t+1)) - L(q(t)) \mid q(t)] \le
\underbrace{\sum_r q_r(t)^{\alpha} E[q_r(t+1) - q_r(t) \mid q_r(t)]}_{\text{main term}}
+ \underbrace{\delta \sum_r q_r(t)^{\alpha} + \sum_r b_r}_{\text{error}}, \tag{D.5}
\]

where we can take the constant δ > 0 to be arbitrarily small, and then pick constants b_r to be sufficiently large. The main term can be bounded above by −ǫ‖q(t)‖ as in Exercise 7.18, for some ǫ > 0. Clearly, if δ is small enough, the main term will dominate for large enough queues. Thus, we can pick a constant K > 0 for which the conditions of Proposition D.1 are satisfied.

It remains to show the bound (D.5). You will do this in Exercise D.1.

Exercise D.1 Write E[q_r(t+1)^{α+1} − q_r(t)^{α+1} | q(t)] as

\[
q_r(t)^{\alpha+1}\, E\left[\left(1 + \frac{q_r(t+1) - q_r(t)}{q_r(t)}\right)^{\alpha+1} - 1 \;\middle|\; q(t)\right].
\]

The (random) increment |q_r(t+1) − q_r(t)| is bounded above by the maximum of a Poisson random variable of rate λ_r (the arrivals) and the finite set of values σ_r (the services). Call this maximum Y_r, and notice that Y_r is independent of q_r(t).

1. Show that we can bound the drift above by

\[
q_r(t)^{\alpha+1}\, E\left[\left(1 + \frac{Y_r}{q_r(t)}\right)^{\alpha+1} - 1 \;\middle|\; q_r(t)\right].
\]

2. Let K_r be a large constant, let δ > 0 be small, and suppose q_r(t) > K_r. By splitting the expectation (over Y_r) into the cases of Y_r > δK_r and


Y_r ≤ δK_r, show that (a numerical check of this bound is sketched after the exercise)
\[
E\left[\left(1 + \frac{Y_r}{q_r(t)}\right)^{\alpha+1} - 1 \;\middle|\; q_r(t)\right]
\le \left|(1+\delta)^{\alpha+1} - 1\right|
+ E\left[\left(1 + \frac{Y_r}{K_r}\right)^{\alpha+1} + 1 \;\middle|\; Y_r > \delta K_r\right] P(Y_r > \delta K_r).
\]

3. Show that all moments of Y_r are finite. Conclude that the second term tends to zero as K_r → ∞.

4. If q_r(t) ≤ K_r, show that the right-hand side is bounded by a constant b_r (which may depend on K_r).
5. Show that
\[
E\left[\sum_r \frac{1}{\alpha+1}\left(q_r(t+1)^{\alpha+1} - q_r(t)^{\alpha+1}\right) \;\middle|\; q(t)\right]
\le \sum_r q_r(t)^{\alpha} E[q_r(t+1) - q_r(t) \mid q_r(t)]
+ 2\delta \sum_r q_r(t)^{\alpha} + \sum_r b_r
\]
provided δ is chosen small enough, then K_r large enough, then b_r large enough.
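The following Monte Carlo sketch (not from the text) illustrates the split-expectation bound of step 2; the values of λ_r, the largest service value, α, δ, K_r and the sample size are all illustrative assumptions.

    # Minimal sketch: empirical check of the bound in step 2 of Exercise D.1.
    import numpy as np

    rng = np.random.default_rng(0)
    lam, sigma_max, alpha, delta, Kr = 0.7, 2, 1.0, 0.1, 200
    q = Kr + 1                              # we are in the case q_r(t) > K_r

    # Y_r: the maximum of a Poisson(lambda_r) arrival count and the largest service value.
    Y = np.maximum(rng.poisson(lam, size=10**6), sigma_max)

    lhs = np.mean((1 + Y / q) ** (alpha + 1) - 1)

    tail = Y > delta * Kr
    rhs = abs((1 + delta) ** (alpha + 1) - 1)
    if tail.any():
        rhs += np.mean((1 + Y[tail] / Kr) ** (alpha + 1) + 1) * tail.mean()

    print(lhs, "<=", rhs)                   # the bound of step 2 holds comfortably here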

Exercise D.2 Do a similar analysis for the α-fair allocation of Section 8.3. The Lyapunov function is defined in the proof of Theorem 8.2.

Exercise D.3 In this exercise we revisit the backlog estimation scheme considered in Section 5.2, and show that it is positive recurrent for ν < e^{−1} and a choice of parameters (a, b, c) = (2 − e, 0, 1). Finding the Lyapunov function is quite tricky here; the suggestion below is adapted from (Hajek, 2006, Proof of Proposition 4.2.1).

Recall that κ_ν < 1 was defined as the limiting value of κ(t) = n(t)/s(t) for a trajectory of the system of differential equations approximating the Markov chain (N(t), S(t)) (see (5.3)). To construct the Lyapunov function, we observe that for the differential equation either n(t) ≈ κ_ν s(t), and then n(t) is decreasing, or else the deviation |n(t) − κ_ν s(t)| tends to decrease. Thus, the Lyapunov function below includes contributions from both of those quantities.

Define

\[
L(N, S) = N + \phi(|\kappa_\nu S - N|), \qquad
\phi(u) =
\begin{cases}
u^2/M, & 0 \le u \le M^2 \\
M(2u - M^2), & u > M^2
\end{cases}
\tag{D.6}
\]

for some sufficiently large constant M. Show that L(n(t), s(t)) is monotonic when (n(t), s(t)) solve (5.3). Then


use the techniques in this appendix to argue that L satisfies the conditions of the Foster-Lyapunov criteria for the Markov chain (N_t, S_t).
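The penalty φ in (D.6) is quadratic near the origin and linear, with slope capped at 2M, far from it, so increments of L stay bounded even when the deviation |κ_ν S − N| is large. A minimal sketch (not from the text) of L and φ follows; the values of M and κ_ν are illustrative assumptions, and the differential equations (5.3) are not reproduced here.

    # Minimal sketch: the Lyapunov function (D.6) with its quadratic-then-linear penalty phi.
    M = 10.0
    kappa_nu = 0.5     # illustrative value of kappa_nu < 1

    def phi(u):
        return u * u / M if u <= M * M else M * (2 * u - M * M)

    def L(N, S):
        return N + phi(abs(kappa_nu * S - N))

    # The two branches of phi agree at the breakpoint u = M**2 (both equal M**3).
    assert abs(phi(M * M) - M ** 3) < 1e-9

    print(L(40, 100), L(90, 100))   # small versus large deviation from N = kappa_nu * S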


Notes



References

Aldous, D. 1987. Ultimate instability of exponential back-off protocol for acknowledgement-based transmission control of random access communication channels. IEEE Transactions on Information Theory, 33, 219–223.
Asmussen, S. 2003. Applied Probability and Queues. 2nd edn. Springer.
Baccelli, F., and Bremaud, P. 2003. Elements of Queueing Theory. Springer.
BenFredj, S., Bonald, T., Proutiere, A., Regnie, G., and Roberts, J. W. 2001. Statistical bandwidth sharing: a study of congestion at flow level. Computer Communication Review, 31, 111–122.
Berry, R.A., and Johari, R. 2013. Economic Modeling: a Primer with Engineering Applications. Foundations and Trends in Networking, NoW Publishers.
Bonald, T., and Massoulie, L. 2001. Impact of fairness on Internet performance. Performance Evaluation Review, 29, 82–91.
Boyd, S., and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press.
Bramson, M. 2006. Stability and Heavy Traffic Limits for Queueing Networks: St. Flour Lecture Notes. Springer.
Chang, C.-S. 2000. Performance Guarantees in Communication Networks. Springer.
Chen, M., Liew, S., Shao, Z., and Kai, C. 2013. Markov approximation for combinatorial network optimization. IEEE Transactions on Information Theory.
Chiang, M., Low, S.H., Calderbank, A.R., and Doyle, J.C. 2007. Layering as optimization decomposition: a mathematical theory of network architectures. Proceedings of the IEEE, 95, 255–312.
Courcoubetis, C., and Weber, R. 2003. Pricing Communication Networks: Economics, Technology and Modelling. Wiley.
Crametz, J.-P., and Hunt, P.J. 1991. A limit result respecting graph structure for a fully connected loss network with alternative routing. Annals of Applied Probability, 1, 436–444.
Crowcroft, J., and Oechslin, P. 1998. Differentiated end-to-end Internet services using a weighted proportionally fair sharing TCP. Computer Communications Review, 28, 53–69.
Doyle, P.G., and Snell, J.L. 2000. Random Walks and Electric Networks. Carus Mathematical Monographs.
Erlang, A.K. 1925. A proof of Maxwell's law, the principal proposition in the kinetic theory of gases. Pages 222–226 of: E. Brockmeyer, H.L. Halstrom, and Jensen, A. (eds), The life and works of A.K. Erlang. Copenhagen: Academy of Technical Sciences, 1948.
Foster, F.G. 1953. On the stochastic matrices associated with certain queueing processes. Annals of Mathematical Statistics, 24, 355–360.
Gale, D. 1960. The Theory of Linear Economic Models. University of Chicago.
Gallager, R.G. 1977. A minimum delay routing algorithm using distributed computation. IEEE Transactions on Communications, 25, 73–85.
Ganesh, A., O'Connell, N., and Wischik, D. 2004. Big Queues. Springer.
Gibbens, R.J., Kelly, F.P., and Key, P.B. 1995. Dynamic Alternative Routing. Pages 13–47 of: Steenstrup, Martha (ed), Routing in Communication Networks. Prentice Hall.
Goldberg, L., Jerrum, M., Kannan, S., and Paterson, M. 2004. A bound on the capacity of backoff and acknowledgement-based protocols. SIAM J. Comput., 33, 313–331.
Hajek, B. 2006. Notes for ECE 467: Communication Network Analysis. http://www.ifp.illinois.edu/~hajek/Papers/networkanalysis.html.
Jacobson, V. 1988. Congestion avoidance and control. Computer Communication Review, 18, 314–329.
Jiang, L., and Walrand, J. 2010. A distributed CSMA algorithm for throughput and utility maximization in wireless networks. IEEE/ACM Transactions on Networking, 18, 960–972.
Jiang, L., and Walrand, J. 2012. Stability and delay of distributed scheduling algorithms for networks of conflicting queues. Queueing Systems, 72, 161–187.
Johari, R., and Tsitsiklis, J.N. 2004. Efficiency loss in a network resource allocation game. Mathematics of Operations Research, 29, 407–435.
Kang, W.N., Kelly, F.P., Lee, N.H., and Williams, R.J. 2009. State space collapse and diffusion approximation for a network operating under a fair bandwidth-sharing policy. Annals of Applied Probability, 19, 1719–1780.
Kelly, F., and Raina, G. 2011. Explicit congestion control: charging, fairness and admission management. Pages 257–274 of: Ramamurthy, B., Rouskas, G., and Sivalingam, K. (eds), Next-Generation Internet Architectures and Protocols. Cambridge University Press.
Kelly, F.P. 1991. Loss networks. Annals of Applied Probability, 1, 319–378.
Kelly, F.P. 1996. Notes on effective bandwidths. Pages 141–168 of: Kelly, F.P., Zachary, S., and Ziedins, I.B. (eds), Stochastic Networks: Theory and Applications. Oxford University Press.
Kelly, F.P. 2003a. Fairness and stability of end-to-end congestion control. European Journal of Control, 9, 159–176.
Kelly, F.P. 2011. Reversibility and Stochastic Networks. Cambridge University Press.
Kelly, F.P., and MacPhee, I.M. 1987. The number of packets transmitted by collision detect random access schemes. Annals of Probability, 15, 1557–1668.
Kelly, T. 2003b. Scalable TCP: improving performance in highspeed wide area networks. Computer Communication Review, 33, 83–91.
Kendall, D.G. 1953. Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded Markov chain. Annals of Mathematical Statistics, 24, 338–354.
Kendall, D.G. 1975. Some problems in mathematical genealogy. Pages 325–345 of: Gani, J. (ed), Perspectives in Probability and Statistics: Papers in Honour of M.S. Bartlett. Applied Probability Trust / Academic Press.
Key, P.B. 1988. Implied cost methodology and software tools for a fully connected network with DAR and trunk reservation. British Telecom Technology Journal, 6, 52–65.
Kind, J., Niessen, T., and Mathar, R. 1998. Theory of maximum packing and related channel assignment strategies for cellular radio networks. Mathematical Methods of Operations Research, 48, 1–16.
Kingman, J.F.C. 1993. Poisson Processes. Oxford University Press.
Kleinrock, L. 1964. Communication Nets: Stochastic Message Flow and Delay. McGraw Hill.
Kleinrock, L. 1976. Queueing Systems, vol II: Computer Applications. Wiley.
Lu, S.H., and Kumar, P.R. 1991. Distributed scheduling based on due dates and buffer priorities. IEEE Transactions on Automatic Control, 36, 1406–1416.
Marbach, P., Eryilmaz, A., and Ozdaglar, A. 2011. Asynchronous CSMA Policies in Multihop Wireless Networks With Primary Interference Constraints. IEEE Transactions on Information Theory, 57, 3644–3676.
Mazumdar, R. 2010. Performance Modeling, Loss Networks, and Statistical Multiplexing. Morgan and Claypool.
Meyn, S.P., and Tweedie, R.L. 1993. Markov Chains and Stochastic Stability. Springer.
Moallemi, C., and Shah, D. 2010. On the flow-level dynamics of a packet-switched network. Performance Evaluation Review, 38, 83–94.
Modiano, E., Shah, D., and Zussman, G. 2006. Maximizing throughput in wireless networks via gossiping. Performance Evaluation Review, 34, 27–38.
Nash, J.F. 1950. The bargaining problem. Econometrica, 18, 155–162.
Norris, J.R. 1998. Markov Chains. Cambridge University Press.
Ott, T.J. 2006. Rate of convergence for the 'square root formula' in the Internet transmission control protocol. Advances in Applied Probability, 38, 1132–1154.
Pallant, D.L., and Taylor, P.G. 1995. Modeling handovers in cellular mobile networks with dynamic channel allocation. Operations Research, 43, 33–42.
Pitman, J. 2006. Combinatorial Stochastic Processes. Springer.
Rawls, J. 1971. A Theory of Justice. Harvard University Press.
Ross, K.W. 1995. Multiservice Loss Models for Broadband Communication Networks. Springer.
Shah, D., and Shin, J. 2012. Randomized scheduling algorithm for queueing networks. Annals of Applied Probability, 22, 128–171.
Shah, D., and Wischik, D. 2012. Switched networks with maximum weight policies: fluid approximation and multiplicative state space collapse. Annals of Applied Probability, 22, 70–127.
Shah, D., Walton, N.S., and Zhong, Y. 2012. Optimal queue-size scaling in switched networks. Performance Evaluation Review, 40, 17–28.
Shakkottai, S., and Srikant, R. 2007. Network Optimization and Control. Foundations and Trends in Networking, NoW Publishers.
Songhurst, D.J. 1999. Charging Communication Networks: from Theory to Practice. Elsevier.
Srikant, R. 2004. The Mathematics of Internet Congestion Control. Birkhauser.
Tassiulas, L., and Ephremides, A. 1992. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Transactions on Automatic Control, 37, 1936–1948.
Tian, Y.-P. 2012. Frequency-Domain Analysis and Design of Distributed Control Systems. Wiley.
Vinnicombe, G. 2002. On the stability of networks operating TCP-like congestion control. Proc IFAC, 15, 217–222.
Walrand, J. 1988. An Introduction to Queueing Networks. Prentice-Hall.
Walton, N.S. 2009. Proportional fairness and its relationship with multi-class queueing networks. Annals of Applied Probability, 19, 2301–2333.
Whittle, P. 1971. Optimization Under Constraints. Wiley.
Whittle, P. 1986. Systems in Stochastic Equilibrium. Wiley.
Whittle, P. 2007. Networks: Optimisation and Evolution. Cambridge University Press.
Wischik, D. 1999. The output of a switch, or, effective bandwidths for networks. Queueing Systems, 32, 383–396.
Wischik, D., Raiciu, C., Greenhalgh, A., and Handley, M. 2011. Design, implementation and evaluation of congestion control for multipath TCP. Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation.
Zachary, S., and Ziedins, I. 2011. Loss networks. Pages 701–728 of: Boucherie, R.J., and van Dijk, N.M. (eds), Queueing Networks: a Fundamental Approach. Springer.


Index

ALOHA protocol, 8, 109–115, 119–122, 125
ARPANET, 47
balance
  detailed, 19, 20, 24, 30, 34, 48, 49, 55, 59, 196
  full, 19, 30
  partial, 30, 33, 55
Bartlett's theorem, 43
binary exponential backoff, 120, 124, 166
bistability, 91, 113, 114
Borel-Cantelli lemma, 112, 113, 125
Braess's paradox, 6, 85, 92–94, 106
Chinese restaurant process, 38
Dirichlet problem, 91
effective bandwidth, 1, 9, 133–148, 151
entropy, 128, 183
equilibrium equations, 17, 19, 187, 202
Erlang fixed point, 53, 69–75, 77, 102, 132, 169
Erlang's formula, 20–23, 51, 53, 73, 80, 102, 133
Ethernet protocol, 8, 120–124
fairness
  max-min, 158, 159, 179, 188
  proportional, 159, 170, 171, 188, 195–199
  TCP, 188
  weighted α, 187–193
  weighted proportional, 159, 160, 183, 184, 188
family size process, 36–38
Foster-Lyapunov criteria, 2, 119, 131, 181, 182, 189, 191, 192, 210–216
hysteresis, 5, 6, 76
independent vertex set, 49, 131
insensitivity, 43, 45, 48, 197
interference graph, 49, 53, 115, 125, 128, 131, 179
Kirchhoff's equations, 87, 88, 90, 91
Kleinrock's square root assignment, 47, 130
Lagrange multiplier, 2, 47, 60, 61, 90, 154–155, 161, 188, 207–209
  complementary slackness, 60, 61, 161, 188, 209
  slack variables, 60, 209
Little's law, 2, 38–41, 45, 47, 50, 68, 197, 204–206
loss network, 1, 4–9, 51–82, 86, 91, 100–107, 128, 130, 133, 134, 140, 148, 151, 152, 169
Lyapunov function, 163–164, 168, 172, 180, 184, 189–192, 210–216
market clearing equilibrium, 160, 161
MaxWeight-α algorithm, 179–182, 185, 191, 213
Nash's bargaining problem, 160, 161
negative binomial expansion, 37, 197, 198
PASTA property, 22, 31, 50
Poisson point process, 41, 203
Pollaczek-Khinchine formula, 147
processor sharing, 4, 10, 11, 47–49, 197
quasi-reversible, 197
recurrence, 110, 111
  null, 187, 189, 202
  positive, 111, 123, 124, 180, 185, 189, 193, 194, 202, 210–213
  transience, 187, 189, 202
resource pooling, 6, 80–82, 106
shadow price, 103, 155, 161, 208
Strong Lagrangian Principle, 59–61, 126, 127, 154, 207–209


Thomson's principle, 90
token ring, 8, 108–109
transmission control protocol (TCP), 10, 162, 166–171, 175, 177, 181, 185, 186
trunk reservation, 80, 82, 91, 140
trunking efficiency, 80, 106

Wardrop equilibrium, 94–100