92 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 39, NO. 1, JANUARY 1993

Recurrence Times of Buffer Overflows in Jackson Networks

Sean P. Meyn, Member, IEEE, and Michael R. Frater, Member, IEEE

Abstract-The estimation of the statistics of buffer overflows in networks of queues by means of simulation is inherently costly, simply because of the rarity of these events. An alternative analytic approach with very low computational cost is presented for calculating the recurrence time of buffer overflows in Jackson networks, in which the recurrence time of buffer overflows in a network is expressed in terms of the recurrence times for overflows of individual buffers, isolated from the network. This result is applied to the buffer allocation problem for queueing networks, providing extensions and further justification for a previously derived heuristic approach to this problem.

Index Terms- Jackson networks, buffer overflows, Markov processes.

I. INTRODUCTION

IN A properly dimensioned queueing network with finite buffers, the estimation of the recurrence times of buffer overflows by direct simulation is very costly, simply because of their rarity. In this paper, we present an alternative analytic approach for the calculation of the expected recurrence times of buffer overflows in Jackson networks.

Importance sampling techniques based on large deviations theory for the optimally efficient (in the sense of variance) simulation of rare events have been proposed in [1]. These results were applied to queueing networks in [2]. The basic idea of the importance sampling approach is as follows. Given a rare event A in a system S, we seek a system S̃ and an event Ã defined on S̃ such that Ã is less rare than A, and such that, given the probability of Ã in S̃, we can find the probability of our original event A in S. Ideally, we would like to find S̃ and Ã such that the simulation cost is minimized. Neither [1] nor [2] provides an analytic solution for such an optimal transformation. For the case of queueing networks, an explicit analytic solution for the simulation system, based on [2], has also been derived [3], [4].

In each of the previously described papers, the quantity of interest is the expected time, starting with the network empty, for an overflow to occur. In practice, the expected time

Manuscript received December 10, 1989. This work was supported by the Australian Telecommunications and Electronics Research Board (ATERB) and ANU Centre for Information Science Research (CISR). This work was presented at the 29th IEEE Conference on Decision and Control, Honolulu, HI, December 1990.

S. P. Meyn is with the Coordinated Science Laboratory, University of Illinois, 1101 W. Springfield Avenue, Urbana, IL 61801.

M. R. Frater is with the Department of Electrical Engineering, University College, Australian Defence Force Academy, Northcott Drive, Campbell ACT 2600, Australia.

IEEE Log Number 9203007.

between overflows is often of interest, and it is this problem that is considered here.

Linear-algebraic techniques, based upon the methods of [5], can also be used to compute the statistics of exit times. While this approach has the advantage of finding distributions rather than just their expected values, it involves complex manipulations of the probability transition matrix, and in systems of large dimension, such as queueing networks, this may not be feasible.

The approach presented here permits an exact calculation of the mean time between overflows for Jackson networks. This mean time is expressed in terms of the exit times for the network’s component queues operating with Poisson arrival streams of the same average rate as the arrival stream provided by the network.

Section II contains a brief summary of the major theoretical ideas used in this work. The main result, Theorem 1, is presented in Section III together with a proof. In Section IV, we show how Theorem 1 may be used to derive an optimal buffer allocation rule for Jackson networks, which becomes the rule of thumb of [6] when the total buffer space becomes large.

The Appendix provides a proof of a key result which is used to prove Theorem 1.

II. BACKGROUND THEORY

Here, we introduce some results from the theories of Markov processes and queueing networks that will be needed later in the paper.

A. Markov Processes

Let (Pt) denote a Markov transition semigroup, where t takes values in the nonnegative real numbers R+. The state space, denoted X, is assumed to be a countable set.

For each fixed t and x ∈ X, P_t(x, ·) is a probability on X, and the Chapman-Kolmogorov equation holds:

P_{t+s}(x, y) = Σ_{z∈X} P_t(x, z) P_s(z, y),    t, s ∈ R+.

For a probability μ on X, t ∈ R+, and bounded function f on X, we define the probability μP_t and the function P_t f as

μP_t(A) = Σ_{x∈X} μ(x) P_t(x, A),    P_t f(x) = Σ_{y∈X} P_t(x, y) f(y).

Given an initial distribution μ₀ on X, we may define the stochastic process x evolving on X by defining a probability

0018-9448/93$03.00 © 1993 IEEE


MEYN AND FRATER RECURRENCE TIMES OF BUFFER OVERFLOWS IN JACKSON NETWORKS

~

93

P_{μ₀} on an appropriate subset of the sample space X^{R+} via the relations

P_{μ₀}{x₀ = x} = μ₀(x),
P_{μ₀}{x_t = x | F_s} ≜ P_{t−s}(x_s, x),    0 ≤ s < t,

where F_s = σ{x_u : 0 ≤ u ≤ s}. The resulting stochastic process x is called a Markov process with Markov transition semigroup (P_t). The second relation above expresses the Markov property: the future behavior of x given the present and past depends only on the present.
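The semigroup calculus above can be checked numerically on a small finite chain, where P_t = e^{tQ} for a generator Q. The sketch below is illustrative only (the two-state generator Q is hypothetical, not from the paper); it verifies the Chapman-Kolmogorov equation and the invariance πP_t = π.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical two-state generator (rates chosen only for illustration).
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])

def P(t):
    """Transition semigroup P_t = exp(tQ) for a finite chain."""
    return expm(t * Q)

# Chapman-Kolmogorov equation: P_{t+s} = P_t P_s.
assert np.allclose(P(0.8), P(0.5) @ P(0.3))

# Invariant probability pi solves pi Q = 0; here pi = (2/3, 1/3).
pi = np.array([2.0 / 3.0, 1.0 / 3.0])
assert np.allclose(pi @ Q, 0.0, atol=1e-12)
assert np.allclose(pi @ P(1.7), pi)  # pi P_t = pi for every t
```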

The Markov processes considered in this paper will be assumed to possess sample paths that are piecewise constant and right continuous. Under these assumptions, the strong Markov property holds:

E_{μ₀}[f(x_t) ∘ θ^τ | F_τ] = E_{x_τ}[f(x_t)],    (4)

when τ is an F_t-adapted stopping time. For a precise definition of the shift operator θ^t and the σ-algebra F_τ, see [7], [8].

It follows from the previous definition that x_t has distribution μ₀P_t for all t when x₀ ∼ μ₀. In particular, if f is a bounded function on X and if E_{μ₀} denotes the expectation operator associated with P_{μ₀}, then we have E_{μ₀}[f(x_t)] = μ₀P_t f.

When μ₀ is concentrated on a single point x ∈ X, we write P_x for the resulting probability on the sample space, and E_x for the corresponding expectation operator.

An invariant probability π is a probability on X which is left invariant by the transition matrices {P_t}. Hence, for all t ∈ R+, x ∈ X,

P_π{x_t = x} = πP_t(x) = π(x),    (5)

and it may be shown that x is in fact a strictly stationary stochastic process when x₀ ∼ π.

The rate from x to y is defined as

q_{xy} ≜ lim_{t↓0} (1/t) P_t(x, y),    y ≠ x.

We let

q_x ≜ Σ_{y≠x} q_{xy}.

Our definition of q_x differs from the usual notation since we have omitted a minus sign.

For the Markov processes considered in this paper, we will always assume that a unique invariant probability π exists and that, for some 0 < c ≤ C < ∞,

c ≤ q_x ≤ C, for all x ∈ X of positive π-measure.    (6)

We define the random times {T_k : k ≥ 0} by

T₀ = 0,    T_{k+1} = min{t > T_k : x_t ≠ x_{T_k}}.

Under the conditions assumed in this paper, the embedded Markov chain (x_{T_k} : k ≥ 0) possesses a unique invariant probability π₀, which is defined for x ∈ X by

π₀(x) = π(x) q_x / Σ_{y∈X} π(y) q_y.    (7)
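As a numeric sketch, assuming the standard relation π₀(x) ∝ π(x)q_x between the continuous-time invariant probability and that of the embedded jump chain (the three-state generator below is hypothetical, chosen only for illustration), one can verify that π₀ is invariant for the jump-chain transition matrix q_{xy}/q_x:

```python
import numpy as np

# Illustrative three-state generator (rates are hypothetical).
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -3.0,  2.0],
              [ 2.0,  1.0, -3.0]])
q = -np.diag(Q)  # exit rates q_x

# Invariant probability pi: solve pi Q = 0 subject to sum(pi) = 1.
A = np.vstack([Q.T, np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]

# Jump-chain invariant probability: pi0(x) proportional to pi(x) q_x.
pi0 = pi * q
pi0 /= pi0.sum()

# Jump-chain transition matrix: P_jump[x, y] = q_{xy} / q_x, zero diagonal.
P_jump = Q / q[:, None]
np.fill_diagonal(P_jump, 0.0)

assert np.allclose(pi0 @ P_jump, pi0)  # pi0 is invariant for the jump chain
```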

This paper is concerned with the expected value of the first exit time from a given subset of the state space in which the process possesses certain desirable properties. The Markov processes considered here will be used to model queueing networks, and hence the first exit time from a bounded subset of the state space is a quantity which is of considerable practical interest.

Let A ⊂ X be a given set, and let τ_{A^c} denote the first exit time from A. In general, for a set B ⊂ X, we define the first hitting time τ_B as

τ_B = min{t ≥ 0 : x_t ∈ B}.

The kth hitting times (τ_B^k : k ≥ 1) are defined by τ_B^1 = τ_B, and τ_B^k = S_B^{k−1} + θ_{S_B^{k−1}} τ_B, where S_B^{k−1} = τ_B^{k−1} + θ_{τ_B^{k−1}} τ_{B^c}, and the definition of the shift operator θ may be found in [7], [8].

B. Queueing Networks

In its simplest form, a queueing system is described as a process where jobs arrive into a system demanding service, wait (if necessary) until they may be serviced, and after service is completed a customer reenters the queue with probability r and exits the system with probability 1 − r. Such a system is called an M/M/1 queue if the arrival process is Poisson with rate γ and consecutive virtual service times are i.i.d. with a common exponential distribution with parameter μ.

Letting x_t denote the number of jobs present in the buffer of an M/M/1 queue at time t, it may be shown that x_t is a Markov process evolving on Z+.

Let λ ≜ γ/(1 − r) denote the effective rate of customers to the single queue. If ρ ≜ λ/μ denotes the load on the system, then an invariant probability π exists if and only if ρ < 1, and if this is the case then π is defined uniquely by

π(n) = (1 − ρ) ρⁿ,    n ∈ Z+.
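As a quick numeric sanity check of the geometric form of the M/M/1 invariant probability (a sketch only; the load ρ = 0.8 is arbitrary), the truncated distribution sums to one and reproduces the classical mean queue length ρ/(1 − ρ):

```python
import numpy as np

rho = 0.8                        # illustrative load, rho < 1
n = np.arange(0, 200)            # truncation; the tail beyond 200 is negligible
pi = (1 - rho) * rho**n          # geometric invariant probability

# The truncated probabilities sum to 1 up to the (tiny) tail mass rho**200.
assert abs(pi.sum() - 1.0) < 1e-12

# Mean queue length matches the classical formula rho / (1 - rho).
assert abs((n * pi).sum() - rho / (1 - rho)) < 1e-6
```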

In this paper, we will be concerned with queueing networks, which are finite collections of queueing systems connected together. Consider a queueing network consisting of d M/M/1 queues numbered 1, 2, …, d. The required service time at queue i is exponentially distributed with parameter μ_i, and jobs arrive from outside the network into node i in a Poisson stream with rate γ_i. Whenever a job has completed service at system i, it is routed to system j with probability r_{i,j} and leaves the system with probability r_{i,0}. If the network is also open, that is, every job entering the system may leave the system with positive probability, then it is called a Jackson network.

The arrival stream at a queue in a Jackson network will not be Poisson in general. However the result below shows that it will retain certain important properties of Poisson arrival streams when the system is stable and is operating in steady


state. Let {λ_i} denote the constants defined by the so-called traffic equations

λ_i = γ_i + Σ_{j∈I} λ_j r_{j,i},    i ∈ I,

where I ≜ {1, …, d}. Let π_i denote the invariant probability for an M/M/1 queue with effective arrival rate λ_i and service rate μ_i, and let x_t denote the vector (x_t(1), …, x_t(d)), where x_t(i) denotes the number of jobs awaiting service at the ith queue at time t.

For a proof of the following result, see [9], [10].

Jackson's Theorem: The process x_t is a Markov process on X ≜ Z+^d. An invariant probability π exists if and only if ρ_i ≜ λ_i/μ_i < 1 for each 1 ≤ i ≤ d, and in this case it is given uniquely by

π{(x_1, …, x_d)} = π_1(x_1) × ⋯ × π_d(x_d).
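Jackson's Theorem reduces the joint invariant probability to a product of M/M/1 marginals once the traffic equations are solved. A minimal sketch for a hypothetical two-node tandem network (all rates below are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical 2-node tandem Jackson network: external arrivals only at
# node 1; node 1 routes everything to node 2, which routes to the exit.
gamma = np.array([1.0, 0.0])          # external arrival rates gamma_i
R = np.array([[0.0, 1.0],             # routing matrix r_{i,j}, i,j in {1,2}
              [0.0, 0.0]])            # (exit probabilities r_{i,0} implicit)
mu = np.array([2.0, 3.0])             # service rates mu_i

# Traffic equations: lambda_i = gamma_i + sum_j lambda_j r_{j,i}.
lam = np.linalg.solve(np.eye(2) - R.T, gamma)
rho = lam / mu

assert np.allclose(lam, [1.0, 1.0])   # in a tandem, all nodes carry the input
assert (rho < 1).all()                # stable, so Jackson's Theorem applies

# Product-form invariant probability of the joint state (x1, x2).
def pi_joint(x1, x2):
    return (1 - rho[0]) * rho[0]**x1 * (1 - rho[1]) * rho[1]**x2
```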

III. EXIT TIMES FOR A NETWORK IN TERMS OF EXIT TIMES FOR CONSTITUENT QUEUES

In this section, we present the main result of the paper, which allows the computation of the mean time between buffer overflows for a Jackson network.

Let x be the Markov process whose components indicate the number of customers at each node in a Jackson network. We assume that the Markov process x is defined so that the maximum load, defined as ρ ≜ max_{1≤i≤d} ρ_i, is less than unity. That is, the system is stable, and hence x possesses a unique invariant probability π. It is easy to see that the rate condition (6) is also satisfied.

For each i ∈ I, let y(i) denote a single queue with the same service rate and load as queue i in the queueing network described by x. The probability that a customer is rerouted to the single queue after a service is completed is equal to r_{i,i}; the probability that the customer leaves the queue is equal to 1 − r_{i,i} = Σ_{j≠i} r_{i,j}. The arrival stream is assumed Poisson with rate γ_i' = (1 − r_{i,i}) ρ_i μ_i, so that the load at y(i) is equal to ρ_i.

The processes {y(i) : 1 ≤ i ≤ d} are assumed to be mutually independent, independent of x, and may be defined on separate probability spaces. Let N_i ∈ Z+ denote an upper bound, which we interpret as the maximum acceptable number of customers in the ith buffer, i ∈ I. We let E_i ≜ N_i + 1, G = {x ∈ X : x(i) ≤ N_i, 1 ≤ i ≤ d}, and E = {x ∈ X : x(j) = E_j for exactly one j ∈ I, and x(i) ≤ N_i for all i ≠ j}.

Our main result concerns the first exit time from the set G or, equivalently, the first entrance time to the set E.

Let η denote the distribution of x immediately after recovery from an overflow. The distribution η may be expressed as

η{x} = P_{π₀}{x_{T₁} = x | x_{T₁} ∈ G, x_{T₀} ∈ E}.    (8)

We may now state our main result, which shows how the expected time between overflows in a network may be related to the expected overflow times of its constituent queues.

Theorem 1: Let τ_E denote the first exit time from G for the Jackson network modeled by x and, for i ∈ I, let τ_{E_i} denote the corresponding exit time for the process y(i). Then we have

E_η[τ_E]^{-1} = Σ_{i=1}^d (1 + ε_i)^{-1} E_{N_i}[τ_{E_i}]^{-1},

where

0 ≤ ε_i = O(ρ^{N̄}),

with N̄ = min_i N_i, and ρ = max_i ρ_i. □

In words, our main result states that the rate of overflows in the network is less than, and approximately equal to, the sum of the rates of its constituent queues.

Remark 1: The result applies to a network with infinite buffers. The same result will apply to a network with finite buffers, such as the model treated in [11], but the distribution η may only be an approximation to that occurring immediately after recovery from an overflow. This follows from the observation that a modification of the process on the event of an overflow does not influence the distribution of the overflow time.

Remark 2: An exact expression for ε_i may be found in the proof of Theorem 1; see (12). From the theorem we see that ε_i = 0 if and only if r_{i,i} + r_{i,0} = 1. Hence, for large N̄, the overflow time for the network behaves like the overflow time for d independent queues operating in parallel.

Remark 3: Computing the quantity E_{N_i}[τ_{E_i}] is trivial: We have by (14),

E_{N_i}[τ_{E_i}] = [λ_i (1 − r_{i,i})(1 − ρ_i) ρ_i^{N_i}]^{-1}.    (9)
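Given the single-queue formula of Remark 3, the approximation suggested by Theorem 1 (the network's overflow rate is approximately the sum of the isolated queues' overflow rates) is straightforward to evaluate. The following sketch uses hypothetical parameters and ignores the ε_i corrections, which vanish as the buffers grow:

```python
import numpy as np

# Hypothetical per-queue parameters: effective rates lambda_i, feedback
# probabilities r_ii, service rates mu_i, and buffer sizes N_i.
lam = np.array([1.0, 1.0])
r_ii = np.array([0.0, 0.0])
mu = np.array([2.0, 3.0])
N = np.array([10, 10])
rho = lam / mu

# Overflow rate of isolated queue i, i.e. 1 / E_{N_i}[tau_{E_i}] by (9).
rate = lam * (1 - r_ii) * (1 - rho) * rho**N

# Theorem 1 (dropping the epsilon_i terms): the network's mean time
# between overflows is approximately the inverse of the summed rates.
mean_time_between_overflows = 1.0 / rate.sum()
```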

Proof of Theorem 1: The proof of Theorem 1 is based upon the following lemma, for which we require two technical definitions. Firstly, we define the rate from A to A^c, denoted q(A, A^c), as

q(A, A^c) ≜ π(A)^{-1} Σ_{x∈A} Σ_{y∈A^c} π(x) q_{xy}.    (10)

The quantity π(A) q(A, A^c) is the steady-state average number of transitions from A to A^c (see (22)).

Next we define the steady-state distribution of x on arrival to A. This probability shall be denoted α, and is defined for x ∈ X by

α(x) = P_{π₀}{x_{T₁} = x | x_{T₁} ∈ A, x_{T₀} ∈ A^c},    (11)

where T_k denotes the time at which x undergoes its kth jump, and π₀ denotes the invariant probability for the jump process {x_{T_k} : k ≥ 1}, defined in (7).


Lemma 1: Let x be a Markov process satisfying (6), and let A ⊂ X be a finite set satisfying 0 < π(A) < 1, where π denotes the unique invariant probability for the process. Then,

E_α[τ_{A^c}] = q(A, A^c)^{-1}. □

For instance, in the case where A = {a} is a singleton, Lemma 1 becomes

E_a[τ_{A^c}] = (Σ_{x≠a} q_{ax})^{-1}.
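Lemma 1 can be checked numerically on a small chain: compute q(A, A^c) from π and the rates, compute E_x[τ_{A^c}] by first-step analysis, and average over the arrival distribution α. The generator below is hypothetical, chosen only for illustration:

```python
import numpy as np

# Small illustrative generator on states {0, 1, 2}, with A = {0, 1}.
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -3.0,  2.0],
              [ 2.0,  1.0, -3.0]])

# Invariant probability pi: solve pi Q = 0 subject to sum(pi) = 1.
pi = np.linalg.lstsq(np.vstack([Q.T, np.ones(3)]),
                     np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]

A_set = [0, 1]   # the set A; its complement is {2}

# Rate from A to A^c, per (10): pi(A)^{-1} sum_{x in A} pi(x) q(x, A^c).
qA = sum(pi[x] * Q[x, 2] for x in A_set) / pi[A_set].sum()

# Expected exit times h(x) = E_x[tau_{A^c}] by first-step analysis:
# (Q restricted to A) h = -1, with h = 0 on A^c.
h = np.linalg.solve(Q[np.ix_(A_set, A_set)], -np.ones(2))

# Arrival distribution alpha on A: proportional to the flux from A^c into x.
flux_in = np.array([pi[2] * Q[2, x] for x in A_set])
alpha = flux_in / flux_in.sum()

# Lemma 1: E_alpha[tau_{A^c}] = q(A, A^c)^{-1}.
assert np.isclose(alpha @ h, 1.0 / qA)
```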

The proof of Lemma 1 is postponed to the Appendix.

To prove Theorem 1 we first observe that the distribution η used in Theorem 1 is equal to the distribution α defined in (11).

It follows by [10], Lemma 1.4, that the conservation of flux equation holds:

where we have used the product form of the invariant probability π. The constant ε_i may be written explicitly as

Applying Lemma 1, we obtain

and by specializing (13) to the case where d = 1 we obtain, for each i,

E_{N_i}[τ_{E_i}]^{-1} = λ_i (1 − r_{i,i})(1 − ρ_i) ρ_i^{N_i}.    (14)

Equations (12)-(14) imply Theorem 1. □

IV. A BUFFER ALLOCATION PROBLEM

Here, we show how Theorem 1 may be used to extend, and give further justification for, the rule of thumb of [6].

Suppose that we are given N buffers which must be distributed among the d nodes of a Jackson network in such a way that the expected time between overflows is maximized.

Let {β_i : 1 ≤ i ≤ d} denote the proportion of buffer space allocated to the various nodes. We require that β_i ≥ 0 for all i, and that Σ_{i=1}^d β_i = 1.

We will solve this allocation problem by setting it up as a standard constrained minimization problem.

Our objective is to maximize E_η[τ_E], which, for large N, may be closely approximated by minimizing the function F on the simplex in R^d defined as

F(β_1, …, β_d) = Σ_{i=1}^d λ_i (1 − r_{i,i})(1 − ρ_i) ρ_i^{β_i N}    (since β_i N ≈ N_i)
              = Σ_{i=1}^d E_{N_i}[τ_{E_i}]^{-1} ≈ E_η[τ_E]^{-1}.

Letting K_i = λ_i(1 − r_{i,i})(1 − ρ_i), A_i = ρ_i^N, and defining the Lagrangian

L = Σ_{i=1}^d K_i A_i^{β_i} + ξ (Σ_{i=1}^d β_i − 1),

we obtain, as a necessary condition for an optimum,

∂L/∂β_i = K_i (log A_i) A_i^{β_i} + ξ = 0,    i ∈ I.

By convexity of the function F, a unique minimum exists, and hence, the previous equation shows that the buffer space should be distributed so that

λ_i (1 − r_{i,i})(1 − ρ_i) N (log ρ_i) ρ_i^{β_i N} = const.,    (15)

where the constant is independent of i but may depend on the other variables.

The questions addressed in [6] are concerned with what happens when N is large. In this case, we may use (15) to obtain, for all 1 ≤ i < j ≤ d,

[log(−λ_i(1 − r_{i,i})(1 − ρ_i) N log ρ_i) + N β_i log ρ_i] / [log(−λ_j(1 − r_{j,j})(1 − ρ_j) N log ρ_j) + N β_j log ρ_j] = 1.

Letting N → ∞ (recall that β_i is a function of N), we obtain

lim_{N→∞} (β_i log ρ_i) / (β_j log ρ_j) = 1,

which expresses Anantharam's rule of thumb:

"In allocating N buffers to maximize the time to buffer overflow, one should allocate roughly a fraction β_i of the N buffers to node i, where β_i is proportional to (log ρ_i)^{-1}."


Fig. 1. A sample path of 1_A(x_t), with the hitting times τ_A, τ_A^2, the exit times S_A, S_A^2, and the values N(T) = 0, 1, 2 of the counting variable indicated.

V. CONCLUSION

The previous results demonstrate that it is possible to calculate analytically the mean recurrence time of buffer overflows in a Jackson network without large cost in computation. Further work needs to be done to establish whether similar results hold in more general classes of queueing networks for which product-form solutions exist for the invariant probability.

APPENDIX PROOF OF LEMMA 1

The proof of Lemma 1 is similar to the standard proof of Little's Theorem, which is based upon the ergodic theorem for stationary stochastic processes.

Suppose that x₀ ∼ π (x₀ has distribution π) so that x becomes an ergodic, stationary process, and consider the process (1_A(x_t) : t ≥ 0). A typical sample path of this process is illustrated in Fig. 1. Fig. 1 also provides interpretations for some of the random variables introduced in this section.

We will require two other random variables in our analysis:

N(T) ≜ #{k : x_{T_k} ∈ A, x_{T_{k+1}} ∈ A^c, 0 ≤ T_{k+1} ≤ T}
     = number of times that x jumps from A to A^c in [0, T],

J(T) ≜ max{k : T_k ≤ T, k ∈ Z+}
     = number of jumps in [0, T].

It follows by the ergodic theorem for Markov chains that

E_α[τ_{A^c}] = lim_{T→∞} (T / N(T)) · (1/T) ∫₀^T 1_A(x_t) dt.    (16)

To prove Lemma 1 we will evaluate the limit of each of the two factors on the right-hand side of (16). The first factor may be expressed as

lim_{T→∞} N(T)/T = lim_{T→∞} (N(T)/J(T)) (J(T)/T),    (17)

which may be further decomposed as

By (19) and the ergodic theorem for Markov chains, we have

lim_{T→∞} T/J(T) = E_{π₀}[T₁],

where we have used the ergodic theorem together with (6), which implies that (E_{x_{T_k}}[T₁] − (T_{k+1} − T_k), F_{T_{k+1}}) is an L₂-bounded martingale difference process. The last equality follows from the expression E_x[T₁] = q_x^{-1}.

Combining (17), (20), and (21) gives

We now consider the second factor in (16). Define the function f : X → R+ as

f(x) ≜ E_x[τ_{A^c}].

The ergodic theorem for stationary processes and the previous definitions imply that

π(A) = lim_{T→∞} (1/T) ∫₀^T 1_A(x_t) dt.

We require that τ_{A^c}, and hence f, have a second moment. Because A is finite and satisfies 0 < π(A) < 1, and because x is ergodic, there exists a set A₀ ⊆ A satisfying π(A \ A₀) = 0, a constant m ≥ 1, and a constant ρ < 1 such that

max_{x∈A₀} P_x{τ_{A^c} ≥ m} = ρ.

Using this fact together with the Markov property, it follows that

P_x{τ_{A^c} ≥ Nm} = E_x[1{τ_{A^c} ≥ (N−1)m} P_{x_{(N−1)m}}{τ_{A^c} ≥ m}]
                 ≤ ρ P_x{τ_{A^c} ≥ (N−1)m}
                 ≤ ρ^N

for any N ≥ 1, and hence E_x[τ_{A^c}²] is bounded, which is what was wanted.

Hence, by the strong Markov property, the adapted process

(w_k, G_k) ≜ (f(x_{τ_A^k}) − θ_{τ_A^k} τ_{A^c}, F_{τ_A^k})    (18)


is a martingale difference process with a bounded second moment.

By the law of large numbers for orthogonal random variables [12], we have

Furthermore, by the law of large numbers for Markov chains it may be shown that

and combining this relation with (16) and (22) completes the proof of Lemma 1. □

REFERENCES

[1] M. Cottrell, J.-C. Fort, and G. Malgouyres, "Large deviations and rare events in the study of stochastic algorithms," IEEE Trans. Automat. Contr., vol. AC-28, pp. 907-918, Sept. 1983.

[2] S. Parekh and J. Walrand, "A quick simulation method for excessive backlogs in networks of queues," IEEE Trans. Automat. Contr., vol. 34, pp. 54-66, Jan. 1989.

[3] M. R. Frater and B. Anderson, "Fast estimation of the statistics of excessive backlogs in tandem networks of queues," Australian Telecommun. Res., vol. 23, pp. 49-55, May 1989.

[4] M. R. Frater, T. M. Lennon, and B. Anderson, "Optimally efficient estimation of the statistics of rare events in queueing networks," IEEE Trans. Automat. Contr., vol. 36, Dec. 1991.

[5] M. F. Neuts, Matrix-Geometric Solutions in Stochastic Models. Baltimore, MD: Johns Hopkins Univ. Press, 1981.

[6] V. Anantharam, "The optimal buffer allocation problem," IEEE Trans. Inform. Theory, vol. 35, pp. 721-725, July 1989.

[7] K. L. Chung, Markov Chains with Stationary Transition Probabilities. Berlin: Springer, 1960.

[8] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability (Control and Communication in Engineering). New York: Springer-Verlag, 1992.

[9] I. Mitrani, Modelling of Computer and Communication Systems. Cambridge: Cambridge Univ. Press, 1987.

[10] F. P. Kelly, Reversibility and Stochastic Networks. New York: John Wiley, 1987.

[11] N. M. van Dijk, "On Jackson's product form with 'jump-over' blocking," Oper. Res. Lett., vol. 7, pp. 233-235, Oct. 1988.

[12] J. L. Doob, Stochastic Processes. New York: John Wiley, 1953.