Part B Applied Probability

Matthias Winkel
Department of Statistics, University of Oxford
MT 2007



Part B Applied Probability

16 lectures, MT 2007

Aims

This course is intended to show the power and range of probability by considering real examples in which probabilistic modelling is inescapable and useful. Theory will be developed as required to deal with the examples.

Synopsis

Poisson processes and birth processes. Continuous-time Markov chains. Transition rates, jump chains and holding times. Forward and backward equations. Class structure, hitting times and absorption probabilities. Recurrence and transience. Invariant distributions and limiting behaviour. Time reversal.

Applications of Markov chains in areas such as queues and queueing networks – M/M/s queue, Erlang's formula, queues in tandem and networks of queues, M/G/1 and G/M/1 queues; insurance ruin models; epidemic models; applications in applied sciences.

Renewal theory. Limit theorems: strong law of large numbers, central limit theorem, elementary renewal theorem, key renewal theorem. Excess life, inspection paradox. Applications.

Reading

• J.R. Norris, Markov chains, Cambridge University Press (1997)

• G.R. Grimmett and D.R. Stirzaker, Probability and Random Processes, 3rd edition, Oxford University Press (2001)

• G.R. Grimmett and D.R. Stirzaker, One Thousand Exercises in Probability, Oxford University Press (2001)

• D.R. Stirzaker, Elementary Probability, Cambridge University Press (1994)

• S.M. Ross, Introduction to Probability Models, 4th edition, Academic Press (1989)


Contents

1 Introduction: Poisson processes, generalisations, applications 1

1.1 Poisson processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 The Markov property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Brief summary of the course . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Conditioning and stochastic modelling 7

2.1 Modelling of events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 Conditional probabilities, densities and expectations . . . . . . . . . . . . 9

2.3 Independence and conditional independence . . . . . . . . . . . . . . . . 11

3 Birth processes and explosion 13

3.1 Definition and an example . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.2 Tonelli’s Theorem, monotone and dominated convergence . . . . . . . . . 14

3.3 Explosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4 Birth processes and the Markov property 17

4.1 Statements of the Markov property . . . . . . . . . . . . . . . . . . . . . 17

4.2 Application of the Markov property . . . . . . . . . . . . . . . . . . . . . 18

4.3 Proof of the Markov property . . . . . . . . . . . . . . . . . . . . . . . . 19

4.4 The strong Markov property . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.5 The Markov property for Poisson processes . . . . . . . . . . . . . . . . . 21

5 Continuous-time Markov chains 23

5.1 Definition and terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 23

5.2 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

5.3 M/M/1 and M/M/s queues . . . . . . . . . . . . . . . . . . . . . . . . . 26

6 Transition semigroups 29

6.1 The semigroup property of transition matrices . . . . . . . . . . . . . . . 29

6.2 Backward equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

6.3 Forward equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

6.5 Matrix exponentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


7 The class structure of continuous-time Markov chains 35

7.1 Communicating classes and irreducibility . . . . . . . . . . . . . . . . . . 35

7.2 Recurrence and transience . . . . . . . . . . . . . . . . . . . . . . . . . . 37

7.3 Positive and null recurrence . . . . . . . . . . . . . . . . . . . . . . . . . 39

7.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

8 Convergence to equilibrium 41

8.1 Invariant distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

8.2 Convergence to equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . 43

8.3 Detailed balance equations and time reversal . . . . . . . . . . . . . . . . 44

8.4 Erlang’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

9 The Strong Law of Large Numbers 47

9.1 Modes of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

9.2 The first Borel-Cantelli lemma . . . . . . . . . . . . . . . . . . . . . . . . 49

9.3 The Strong Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . 49

9.4 The second Borel-Cantelli lemma . . . . . . . . . . . . . . . . . . . . . . 51

9.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

10 Renewal processes and equations 53

10.1 Motivation and definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

10.2 The renewal function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

10.3 The renewal equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

10.4 Strong Law and Central Limit Theorem of renewal theory . . . . . . . . 56

10.5 The elementary renewal theorem . . . . . . . . . . . . . . . . . . . . . . 57

11 Excess life and stationarity 59

11.1 The renewal property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

11.2 Size-biased picks and stationarity . . . . . . . . . . . . . . . . . . . . . . 61

12 Convergence to equilibrium – renewal theorems 65

12.1 Convergence to equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . 65

12.2 Renewal theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

12.3 The coupling proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

12.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

13 M/M/1 queues and queueing networks 69

13.1 M/M/1 queues and the departure process . . . . . . . . . . . . . . . . . 70

13.2 Tandem queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

13.3 Closed and open migration networks . . . . . . . . . . . . . . . . . . . . 73

14 M/G/1 and G/M/1 queues 75

14.1 M/G/1 queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

14.2 Waiting times in M/G/1 queues . . . . . . . . . . . . . . . . . . . . . . . 77

14.3 G/M/1 queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78


15 Markov models for insurance 81

15.1 The insurance ruin model . . . . . . . . . . . . . . . . . . . . . . . . . . 81

15.2 Aggregate claims and reserve processes . . . . . . . . . . . . . . . . . . . 82

15.3 Ruin probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

15.4 Some simple finite-state-space models . . . . . . . . . . . . . . . . . . . . 86

16 Conclusions and perspectives 87

16.1 Summary of the course . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

16.2 Duration-dependent transition rates . . . . . . . . . . . . . . . . . . . . . 88

16.3 Time-dependent transition rates . . . . . . . . . . . . . . . . . . . . . . . 88

16.4 Spatial processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

16.5 Markov processes in uncountable state spaces (R or Rd) . . . . . . . . . . 89

16.6 Lévy processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

16.7 Stationary processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

16.8 4th year dissertation projects . . . . . . . . . . . . . . . . . . . . . . . . 90

A Assignments I

A.1 The lack of memory property . . . . . . . . . . . . . . . . . . . . . . . . III

A.2 Poisson processes and birth processes . . . . . . . . . . . . . . . . . . . . V

A.3 Continuous-time Markov chains . . . . . . . . . . . . . . . . . . . . . . . VII

A.4 Continuous-time Markov chains II . . . . . . . . . . . . . . . . . . . . . . IX

A.5 Renewal theory I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XI

A.6 Renewal theory II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XIII

B Solutions XV

B.1 The lack of memory property . . . . . . . . . . . . . . . . . . . . . . . . XV

B.2 Poisson processes and birth processes . . . . . . . . . . . . . . . . . . . . XX

B.3 Continuous-time Markov chains . . . . . . . . . . . . . . . . . . . . . . . XXV

B.4 Continuous-time Markov chains II . . . . . . . . . . . . . . . . . . . . . . XXX

B.5 Renewal theory I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XXXV

B.6 Renewal theory II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XL


Lecture 1

Introduction: Poisson processes, generalisations and applications

Reading: Part A Probability; Grimmett-Stirzaker 6.1, 6.8 up to (10)
Further reading: Ross 4.1, 5.3; Norris Introduction, 1.1, 2.4

This course is, in the first place, a course for 3rd year undergraduates who did Part A Probability in their second year. Other students from various M.Sc.'s are welcome as long as they are aware of the prerequisites of the course. These are essentially an introductory course in probability not based on measure theory. It will be an advantage if this included the central aspects of discrete-time Markov chains, by the time we get to Lecture 5 in week 3.

The aim of Lecture 1 is to give a brief overview of the course. To do this at an appropriate level, we begin with a review of Poisson processes, which were treated at the very end of the Part A syllabus. The parts most relevant to us in today's lecture are again included here, and some more material is on the first assignment sheet.

This is a mathematics course. "Applied probability" means that we apply probability: not so much Part A Probability as further probability building on Part A and not covered there. Effectively, we will be spending a lot of our time developing theory as required for certain examples and applications.

For the rest of the course, let N = {0, 1, 2, . . .} denote the natural numbers including zero. Apart from very few exceptions, all stochastic processes that we consider in this course will have state space N (or a subset thereof). However, most results in the theory of Markov chains will be treated for any countable state space S, which does not pose any complications as compared with N, since one can always enumerate all states in S and hence give them labels in N. For uncountable state spaces, however, several technicalities arise that are beyond the scope of this course, at least in any reasonable generality – we will naturally come across a few examples of Markov processes in R towards the end of the course.


1.1 Poisson processes

There are many ways to define Poisson processes. We choose an elementary definition that happens to be the most illustrative since it allows one to draw pictures straight away.

Definition 1 Let (Zn)n≥0 be a sequence of independent exponential random variables Zn ∼ Exp(λ) for a parameter (inverse mean) λ ∈ (0,∞), and set T0 = 0, Tn = Z0 + . . . + Zn−1, n ≥ 1. Then the process X = (Xt)t≥0 defined by

Xt = #{n ≥ 1 : Tn ≤ t}, t ≥ 0,

is called a Poisson process with rate λ.

Note that (Xt)t≥0 is not just a family of (dependent!) random variables but indeed t ↦ Xt is a random right-continuous function. This view is very useful since it is the formal justification for pictures of "typical realisations" of X.

Think of Tn as arrival times of customers (arranged in increasing order). Then Xt is counting the number of arrivals up to time t for all t ≥ 0, and we study the evolution of this counting process. Instead of customers, one might be counting particles detected by a Geiger counter or cars driving through St. Giles, etc. Something more on the link and the important distinction between real observations (cars in St. Giles) and mathematical models (Poisson process) will be included in Lecture 2. For the moment we have a mathematical model, well specified in the language of probability theory. Starting from a simple sequence of independent random variables (Zn)n≥0 we have defined a more complex object (Xt)t≥0, that we call a Poisson process.
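Definition 1 translates directly into a simulation recipe: generate the inter-arrival times Zn and count how many partial sums Tn fall below t. A minimal Python sketch, with function names of our own choosing (not part of the notes):

```python
import random

def poisson_process_arrivals(lam, horizon, seed=0):
    """Arrival times T_1 < T_2 < ... <= horizon of a Poisson process
    with rate lam, built from Exp(lam) inter-arrival times Z_n."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(lam)  # Z_n ~ Exp(lam), mean 1/lam
        if t > horizon:
            return arrivals
        arrivals.append(t)

def count_at(arrivals, t):
    """X_t = #{n >= 1 : T_n <= t}, the counting process of Definition 1."""
    return sum(1 for a in arrivals if a <= t)

arrivals = poisson_process_arrivals(lam=10.0, horizon=1.0)
assert count_at(arrivals, 1.0) == len(arrivals)
```

Plotting t ↦ count_at(arrivals, t) produces exactly the right-continuous step pictures of "typical realisations" mentioned above.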

Let us collect some properties that, apart from some technical details, can serve as an alternative definition of the Poisson process.

Remark 2 A Poisson process X with rate λ has the following properties.

(i) Xt ∼ Poi(λt) for all t ≥ 0, where Poi(λt) refers to the Poisson distribution with mean λt.

(ii) X has independent increments, i.e. for all t0 ≤ . . . ≤ tn, the increments Xtj − Xtj−1, j = 1, . . . , n, are independent.

(iii) X has stationary increments, i.e. for all s ≤ t, Xt+s − Xt ∼ Xs, where ∼ means "has the same distribution as".

To justify (i), calculate

E(q^{Xt}) = Σ_{n=0}^{∞} q^n P(Xt = n) = Σ_{n=0}^{∞} q^n P(Tn ≤ t, Tn+1 > t)

= Σ_{n=0}^{∞} q^n (P(Tn ≤ t) − P(Tn+1 ≤ t)) = 1 − Σ_{j=1}^{∞} q^{j−1}(1 − q) P(Tj ≤ t)

= 1 − ∫_0^t Σ_{j=1}^{∞} q^{j−1}(1 − q) (λ^j / (j−1)!) z^{j−1} e^{−λz} dz

= 1 − ∫_0^t (1 − q) λ e^{−λz+λqz} dz = e^{−λt(1−q)},


where we used the well-known fact that Tn, as a sum of n independent Exp(λ) variables, has a Gamma(n, λ) distribution. It is now easily checked that this is the probability generating function of the Poisson distribution with parameter λt. We conclude by the Uniqueness Theorem for probability generating functions. Note that we interchanged summation and integration. This may be justified by an expansion of e^{−λz} into a power series and using uniform convergence of power series twice. We will see another justification in Lecture 3.
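The formula E(q^{Xt}) = e^{−λt(1−q)} can also be checked empirically: averaging q^{Xt} over many simulated copies of Xt should approximate the probability generating function. A small Monte Carlo sketch, illustrative and not from the notes:

```python
import math
import random

def sample_X_t(lam, t, rng):
    """One draw of X_t: count Exp(lam) inter-arrival times until time t."""
    n, s = 0, 0.0
    while True:
        s += rng.expovariate(lam)
        if s > t:
            return n
        n += 1

rng = random.Random(1)
lam, t, q, reps = 2.0, 3.0, 0.5, 20000
pgf = sum(q ** sample_X_t(lam, t, rng) for _ in range(reps)) / reps
# The derivation above gives E(q^{X_t}) = exp(-lam * t * (1 - q))
assert abs(pgf - math.exp(-lam * t * (1 - q))) < 0.01
```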

(ii)-(iii) can be derived from the following Proposition 3, see also Lecture 4.

1.2 The Markov property

Let S be a countable state space, typically S = N. Let Π = (πrs)r,s∈S be a Markov transition matrix on S. For every s0 ∈ S this specifies the distribution of a Markov chain (Mn)n≥0 starting from s0 (under Ps0, say), by

Ps0(M1 = s1, . . . , Mn = sn) = P(M1 = s1, . . . , Mn = sn | M0 = s0) = Π_{j=1}^{n} π_{sj−1,sj}.

We say that (Mn)n≥0 is a Markov chain with transition matrix Π starting from s0. There are several formulations of the Markov property:

• For all paths s0, . . . , sn+1 ∈ S of positive probability, we have

P(Mn+1 = sn+1 | M0 = s0, . . . , Mn = sn) = P(Mn+1 = sn+1 | Mn = sn) = π_{sn,sn+1}.

• For all s ∈ S and events {(M0, . . . , Mn) ∈ A} and {(Mn, Mn+1, . . .) ∈ B}, we have: if P(Mn = s, (Mj)0≤j≤n ∈ A) > 0, then

P((Mn+k)k≥0 ∈ B | Mn = s, (Mj)0≤j≤n ∈ A) = P((Mn+k)k≥0 ∈ B | Mn = s).

• (Mj)0≤j≤n and (Mn+k)k≥0 are conditionally independent given Mn = s, for all s ∈ S. Furthermore, given Mn = s, (Mn+k)k≥0 is a Markov chain with transition matrix Π starting from s.

Informally: no matter how we got to a state, the future behaviour of the chain is as if we were starting a new chain from that state. This is one reason why it is vital to study Markov chains not starting from one initial state but from any state in the state space.
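A transition matrix is all one needs to simulate such a chain: from the current state, the next state is drawn according to the corresponding row of Π, and the probability of a finite path is the product of one-step probabilities, as in the display above. A sketch in Python (the matrix entries and names are illustrative, not from the notes):

```python
import random

# A Markov transition matrix Pi on S = {0, 1, 2}; rows sum to 1.
# The numbers are made up for illustration.
PI = [[0.5, 0.5, 0.0],
      [0.2, 0.3, 0.5],
      [0.0, 0.4, 0.6]]

def run_chain(pi, s0, n, rng):
    """Sample a path (M_0, ..., M_n) with transition matrix pi from s0."""
    path = [s0]
    for _ in range(n):
        row = pi[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

def path_probability(pi, path):
    """P_{s0}(M_1 = s_1, ..., M_n = s_n) = product of pi[s_{j-1}][s_j]."""
    p = 1.0
    for r, s in zip(path, path[1:]):
        p *= pi[r][s]
    return p

path = run_chain(PI, s0=0, n=5, rng=random.Random(0))
assert path_probability(PI, path) > 0  # sampled paths have positive probability
```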

In analogy, we will here study Poisson processes X starting from initial states X0 = k ∈ N (under Pk), by which we just mean that we consider X̃t = k + Xt, t ≥ 0, where X is a Poisson process starting from 0 as in the above definition.

Proposition 3 (Markov property) Let X be a Poisson process with rate λ starting from 0 and t ≥ 0 a fixed time. Then the following hold.

(i) For all k ∈ N and events {(Xr)r≤t ∈ A} and {(Xt+s)s≥0 ∈ B}, we have: if P(Xt = k, (Xr)r≤t ∈ A) > 0, then

P((Xt+s)s≥0 ∈ B | Xt = k, (Xr)r≤t ∈ A) = P((Xt+s)s≥0 ∈ B | Xt = k) = Pk((Xs)s≥0 ∈ B).


(ii) Given Xt = k, (Xr)r≤t and (Xt+s)s≥0 are independent, and the conditional distribution of (Xt+s)s≥0 is that of a Poisson process with rate λ starting from k.

(iii) (Xt+s − Xt)s≥0 is a Poisson process with rate λ starting from 0, independent of (Xr)r≤t.

We will prove a more general Proposition 17 in Lecture 4. Also, in Lecture 2, we will revise and push further the notion of conditioning. For this lecture we content ourselves with the formulation of the Markov property and proceed to the overview of the course.

Markov models (models that have the Markov property) are useful in a wide range of applications, e.g. price processes in Mathematical Finance, evolution of genetic material in Mathematical Biology, evolutions of particles in space in Mathematical Physics. The Markov property is a property that makes the model somewhat simple (not easy, but it could be much less tractable). We will develop tools that support this statement.

1.3 Brief summary of the course

Two generalisations of the Poisson process and several applications make up this course.

• The Markov property of Proposition 3(ii) can be used as a starting point to a bigger class of processes, so-called continuous-time Markov chains. They are analogues of discrete-time Markov chains, and they are often better adapted to applications. On the other hand, new aspects arise that did not arise in discrete time, and connections between the two will be studied. Roughly, the first half of this course is concerned with continuous-time Markov chains. Our main reference book will be Norris's book on Markov Chains.

• The Poisson process is the prototype of a counting process. For the Poisson process, "everything" can be calculated explicitly. In practice, though, this is often only helpful as a first approximation. E.g. in insurance applications, the Poisson process is used for the arrival of claims. However, there is empirical evidence that inter-arrival times are neither exponentially distributed nor independent nor identically distributed. The second approximation is to relax exponentiality of inter-arrival times but keep their independence and identical distribution. This class of counting processes is called renewal processes. Since exact calculations are often impossible or not helpful, the most important results of renewal theory are limiting results. Our main reference will be Chapter 10 of Grimmett and Stirzaker's book on Probability and Random Processes.

• Many applications that we discuss are in queueing theory. The easiest, so-called M/M/1 queue consists of a server and customers arriving according to a Poisson process. Independently of the arrival times, each customer has an exponential service time for which he will occupy the server, when it is his turn. If the server is busy, customers queue until being served. Everything has been designed so that the queue length is a continuous-time Markov chain, and various quantities can be studied or calculated (equilibrium distribution, lengths of idle periods, waiting time distributions etc.). More complicated queues arise if the Poisson process is replaced by a renewal process or the exponential service times by any other distribution. There are also systems with k = 2, 3, . . . , ∞ servers. The abstract queueing systems can be more concretely applied in telecommunication, computing networks, etc.

• Some other applications include insurance ruin and propagation of diseases.
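To make the M/M/1 description above concrete, here is a minimal simulation sketch of the queue-length process (our own illustration, not part of the notes): from a non-empty state n the next jump occurs after an Exp(λ + μ) holding time and is an arrival with probability λ/(λ + μ), a departure otherwise; from the empty state only arrivals occur.

```python
import random

def mm1_queue_length(lam, mu, horizon, seed=0):
    """Simulate the M/M/1 queue-length process up to time `horizon`.
    Jumps up at rate lam (arrivals); when n >= 1, down at rate mu
    (service completions). Returns the jump times and states."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, states = [0.0], [0]
    while True:
        rate = lam + (mu if n > 0 else 0.0)  # total jump rate in state n
        t += rng.expovariate(rate)           # holding time ~ Exp(rate)
        if t > horizon:
            return times, states
        n += 1 if rng.random() < lam / rate else -1  # arrival vs departure
        times.append(t)
        states.append(n)

times, states = mm1_queue_length(lam=1.0, mu=2.0, horizon=100.0)
assert min(states) >= 0  # the queue length never becomes negative
```

Note that when n = 0 we have rate = λ, so the "arrival" branch is taken with probability one: no departures from an empty queue.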


Lecture 2

Conditioning and stochastic modelling

Reading: Grimmett-Stirzaker 3.7, 4.6
Further reading: Grimmett-Stirzaker 4.7; CT4 Unit 1

This lecture consolidates the ideas of conditioning and modelling, preparing a more varied range of applications and a less mechanical use of probability than was the focus of Part A. Along the way, we explain the meaning of statements such as the Markov properties of Lecture 1.

2.1 Modelling of events

Much of probability theory is about events and probabilities of events. Informally, this is an easy concept. Events like A1 = "the die shows an even number" and A2 = "the first customer arrives before 10am" make perfect sense in real situations. When it comes to assigning probabilities, things are less clear. We seem to be able to write down some (P(A1) = 0.5?) probabilities directly without much sophistication (still making implicit assumptions about the fairness of the die and the conduct of the experiment). Others (P(A2)) definitely require a mathematical model.

Hardly any real situations involve genuine randomness. It is rather our incomplete perception/information that makes us think there was randomness. In fact, assuming a specific random model in our decision-making can be very helpful and lead to decisions that are sensible/good/beneficial in some sense.

Mathematical models always make assumptions and reflect reality only partially. Quite commonly, we have the following phenomenon: the better a model represents reality, the more complicated it is to analyse. There is a trade-off here. In any case, we must base all our calculations on the model specification, the model assumptions. Translating reality into models is a non-mathematical task. Analysing a model is purely mathematical. Models have to be consistent, i.e. not contain contradictions. This statement may seem superfluous, but there are models that have undesirable features that cannot be easily removed, least of all by postulating the contrary. E.g., you may wish to specify a model for customer arrival where arrival counts over disjoint time intervals are independent, arrival counts over time intervals of equal lengths have the same distribution (cf. Remark 2 (ii)-(iii)), and times between two arrivals have a non-exponential distribution. Well, such a model does not exist (we won't prove this statement now, it's a bit too hard at this stage). On the other hand, within a consistent model, all properties that were not specified in the model assumptions have to be derived from these. Otherwise it must be assumed that the model may not have the property.

Suppose we are told that a shop opens at 9.30am, and on average, there are 10 customers per hour. One model could be to say that a customer arrives exactly every six minutes. Another model could be to say that customers arrive according to a Poisson process at rate λ = 10 (time unit = 1 hour). Whichever model you prefer, the fact is, you can "calculate" P(A2), and it is not the same in the two models, so we should reflect this in our notation. We don't want it to be A2 that changes, so it must be P, and we may wish to write P̃ for the second model. P should be thought of as defining the randomness. Similarly, we can express dependence on a parameter by P(λ), dependence on an initial value by Pk. Informally, for a Poisson process model, we set Pk(A) = P(A|X0 = k) for all events A (formally you should wonder whether P(X0 = k) > 0).

Aside: Formally, there is a way to define random variables as functions Xt : Ω → N, Zn : Ω → [0,∞) etc. Pk can then be defined as a measure on Ω for all k, and this measure is compatible with our distributional assumptions which claim that

the probability that Xt = k + j is ((λt)^j / j!) e^{−λt}

in that

Pk(Xt = j + k) = Pk({ω ∈ Ω : Xt(ω) = j + k}) = ((λt)^j / j!) e^{−λt}.

In the mathematical sense, the set Aj+k := {Xt = j + k} := {ω ∈ Ω : Xt(ω) = j + k} ⊂ Ω is called an event. Technically¹, we cannot in general call all subsets of Ω events if Ω is uncountable, but we will not worry about this, since it is very hard to find examples of non-measurable sets. ω should be thought of as a scenario, a realisation of all the randomness. Ω collects the possibilities and P tells us how likely each event is to occur. If we denote the set of all events by A, then (Ω, A, P) is called a stochastic basis. Its form is usually irrelevant. It is important that it exists for all our purposes to make sure that the random objects we study exist. We will assume that all our random variables can be defined as (measurable) functions on Ω. This existence can be proved for all our purposes, using measure theory. In fact, when we express complicated families of random variables such as a Poisson process (Xt)t≥0 in terms of a countable family (Zn)n≥1 of independent random variables, we do this for two reasons. The first should be apparent: countable families of independent variables are conceptually easier than uncountable families of dependent variables. The second is that a result in measure theory says that there exists a stochastic basis on which we can define countable families of independent variables, whereas any more general result for uncountable families or dependent variables requires additional assumptions or other caveats.

¹The remainder of this paragraph is in a smaller font. This means (now and whenever something is in small font) that it can be skipped on first reading, and the reader may or may not want to get back to it at a later stage.


It is very useful to think about random variables Zn as functions Zn(ω), because it immediately makes sense to define a Poisson process Xt(ω) as in Definition 1, by defining new functions in terms of old functions. When learning probability, it is usual to first apply analytic rules to calculate distributions of functions of random variables (transformation formula for densities, expectation of a function of a random variable in terms of its density or probability mass function). Here we are dealing more explicitly with random variables and events themselves, operating on them directly.

This course is not based on measure theory, but you should be aware that some of the proofs are only mathematically complete if based on measure theory. Ideally, this only means that we apply a result from measure theory that is intuitive enough to believe without proof. In a few cases, however, the gap is more serious. Every effort will be made to point out technicalities, but without drawing attention away from the probabilistic arguments that constitute this course and that are useful for applications.

B10a Martingales Through Measure Theory provides as pleasant an introduction to measure theory as can be given. That course nicely complements this course in providing the formal basis for probability theory in general and hence this course in particular. However, it is by no means a co-requisite, and when we do refer to that course, it is likely to be to material that has not yet been covered there. Williams' Probability with Martingales is the recommended book reference.

2.2 Conditional probabilities, densities and expectations

Conditional probabilities were introduced in Part A (or even Mods) as

P(B|A) = P(B ∩ A) / P(A),

where we require P(A) > 0.

Example 4 Let X be a Poisson process. Then

P(Xt = k + j | Xs = k) = P(Xt − Xs = j, Xs = k) / P(Xs = k) = P(Xt − Xs = j) = P(Xt−s = j),

by the independence and stationarity of increments, Remark 2 (ii)-(iii).

Conditional densities were introduced as

fS|T(s|t) = fS|T=t(s) = fS,T(s, t) / fT(t).

Example 5 Let X be a Poisson process. Then, for t > s,

fT2|T1=s(t) = fT1,T2(s, t) / fT1(s) = fZ0,Z1(s, t − s) / fZ0(s) = fZ0(s) fZ1(t − s) / fZ0(s) = fZ1(t − s) = λ e^{−λ(t−s)},

by the transformation formula for bivariate densities to relate fT1,T2 to fZ0,Z1, and independence of Z0 and Z1.
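The conclusion of Example 5, that given T1 = s the second arrival time behaves like s plus a fresh Exp(λ) variable, can be checked empirically by conditioning on T1 falling near a fixed value. A small sketch of our own (illustrative, not from the notes):

```python
import random

rng = random.Random(2)
lam, s = 1.5, 1.0
residuals = []
for _ in range(50000):
    t1 = rng.expovariate(lam)        # T_1 = Z_0
    t2 = t1 + rng.expovariate(lam)   # T_2 = Z_0 + Z_1
    if s - 0.1 < t1 < s + 0.1:       # condition on T_1 near s
        residuals.append(t2 - s)     # residual of T_2 beyond s
mean = sum(residuals) / len(residuals)
# Exp(lam) has mean 1/lam; given T_1 = s, T_2 - s should be close to Exp(lam)
assert abs(mean - 1 / lam) < 0.05
```

The small discrepancy comes from conditioning on an interval around s rather than on the exact value; shrinking the interval (with more samples) brings the empirical mean closer to 1/λ.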


Conditioning has to do with available information. Many models are stochastic only because the detailed deterministic structure is too complex. E.g. the counting process of natural catastrophes (hurricanes, earthquakes, volcanic eruptions, spring tides) etc. is genuinely deterministic. Since we cannot observe and model precisely weather, tectonic movements etc., it is much more fruitful to write down a stochastic model, e.g. a Poisson process, as a first approximation. We observe this process over time, and we can update the stochastic process by its realisation. Suppose we know the value of the intensity parameter λ ∈ (0,∞). (If we don't, any update will lead to a new estimate of λ, but we do not worry about this here.) If the first arrival takes a long time to happen, this gives us information about the second arrival time T2, simply since T2 = T1 + Z1 > T1. When we eventually observe T1 = s, the conditional density of T2 given T1 = s takes into account this observation and captures the remaining stochastic properties of T2. The result of the formal calculation to derive the conditional density is in agreement with the intuition that if T1 = s, T2 = T1 + Z1 ought to have the distribution of Z1 shifted by s.

Example 6 Conditional probabilities and conditional densities are compatible in that

P(S ∈ B | T = t) = ∫_B fS|T=t(s) ds = lim_{ε↓0} P(S ∈ B | t ≤ T ≤ t + ε),

provided only that fT is right-continuous. To see this, when fS,T is sufficiently smooth, write for all intervals B = (a, b)

P(S ∈ B | t ≤ T ≤ t + ε) = P(S ∈ B, t ≤ T ≤ t + ε) / P(t ≤ T ≤ t + ε) = [ (1/ε) ∫_t^{t+ε} ∫_B fS,T(s, u) ds du ] / [ (1/ε) P(t ≤ T ≤ t + ε) ]

and under the smoothness condition (by dominated convergence etc.), this tends to

[ ∫_B fS,T(s, t) ds ] / fT(t) = ∫_B fS|T=t(s) ds = P(S ∈ B | T = t).

Similarly, we can also define

P(X = k | T = t) = lim_{ε↓0} P(X = k | t ≤ T ≤ t + ε).

One can define conditional expectations in analogy with unconditional expectations, e.g. in the latter case by

E(X | T = t) = Σ_{j=0}^{∞} j P(X = j | T = t).

Proposition 7 a) If X and Y are (dependent) discrete random variables in N, then

E(X) = Σ_{n=0}^{∞} E(X | Y = n) P(Y = n).

If b) X and T are jointly continuous random variables in (0,∞), or c) if X is discrete and T is continuous, and if T has a right-continuous density, then

E(X) = ∫_0^{∞} E(X | T = t) fT(t) dt.


Proof: c) We start at the right-hand side

∫_0^{∞} E(X | T = t) fT(t) dt = ∫_0^{∞} Σ_{j=0}^{∞} j P(X = j | T = t) fT(t) dt

and calculate

P(X = j | T = t) = lim_{ε↓0} P(X = j, t ≤ T ≤ t + ε) / P(t ≤ T ≤ t + ε)

= lim_{ε↓0} [ (1/ε) P(t ≤ T ≤ t + ε | X = j) P(X = j) ] / [ (1/ε) P(t ≤ T ≤ t + ε) ]

= fT|X=j(t) P(X = j) / fT(t)

so that we get on the right-hand side

∫_0^{∞} Σ_{j=0}^{∞} j P(X = j | T = t) fT(t) dt = Σ_{j=0}^{∞} j P(X = j) ∫_0^{∞} fT|X=j(t) dt = E(X)

after interchanging summation and integration. This is justified by Tonelli's Theorem that we state in Lecture 3.

b) is similar to c). a) is more elementary and left to the reader. □

Statement and argument hold for left-continuous densities and approximations from the left, as well. For continuous densities, one can also approximate {T = t} by {t − ε ≤ T ≤ t + ε} (for ε < t, and normalisation by 2ε, as adequate).
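Part a) of Proposition 7 can be verified on a small example with exact rational arithmetic; the joint distribution below is made up for illustration and is not from the notes.

```python
from fractions import Fraction as F

# Joint pmf of (X, Y) on {0,1,2} x {0,1}; the numbers are invented
# for illustration and chosen to sum to 1.
joint = {(0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
         (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8)}

def p_y(n):
    """Marginal P(Y = n)."""
    return sum(p for (x, y), p in joint.items() if y == n)

def cond_exp(n):
    """E(X | Y = n) = sum_j j P(X = j, Y = n) / P(Y = n)."""
    return sum(x * p for (x, y), p in joint.items() if y == n) / p_y(n)

lhs = sum(x * p for (x, y), p in joint.items())   # E(X) computed directly
rhs = sum(cond_exp(n) * p_y(n) for n in (0, 1))   # partitioned over Y
assert lhs == rhs  # Proposition 7 a) holds exactly here
```

Using Fraction avoids floating-point error, so the two sides agree exactly rather than approximately.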

Recall that we formulated the Markov property of the Poisson process as

P((Xt+u)u≥0 ∈ B | Xt = k, (Xr)r≤t ∈ A) = Pk((Xu)u≥0 ∈ B)

for all events {(Xr)r≤t ∈ A} such that P(Xt = k, (Xr)r≤t ∈ A) > 0, and {(Xt+u)u≥0 ∈ B}. For certain sets A with zero probability, this can still be established by approximation.

2.3 Independence and conditional independence

Recall that independence of two random variables is defined as follows. Two discrete random variables X and Y are independent if

P(X = j, Y = k) = P(X = j) P(Y = k) for all j, k ∈ S.

Two jointly continuous random variables S and T are independent if their joint density factorises, i.e. if

fS,T(s, t) = fS(s) fT(t) for all s, t ∈ R, where fS(s) = ∫_R fS,T(s, t) dt.

12 Lecture Notes – Part B Applied Probability – Oxford MT 2007

Recall also (or check) that this is equivalent, in both cases, to

P(S ≤ s, T ≤ t) = P(S ≤ s)P(T ≤ t) for all s, t ∈ R.

In fact, it is also equivalent to

P(S ∈ A, T ∈ B) = P(S ∈ A)P(T ∈ B) for all (measurable) A, B ⊂ R,

and we define more generally:

Definition 8 Let X and Y be two random variables with values in any, possibly different, spaces X and Y. Then we call X and Y independent if

P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B) for all (measurable) A ⊂ X and B ⊂ Y.

We call X and Y conditionally independent given a third random variable Z if for all z ∈ S (if Z has values in S) or z ∈ [0,∞) (if Z has values in [0,∞)),

P(X ∈ A, Y ∈ B|Z = z) = P(X ∈ A|Z = z)P(Y ∈ B|Z = z).

Remark and Fact 9² Conditional independence is in many ways like ordinary (unconditional) independence. E.g., if X is discrete, it suffices to check the condition for A = {x}, x ∈ X. If Y is real-valued, it suffices to consider B = (−∞, y], y ∈ R. If Y is bivariate, it suffices to consider all B of the form B = B1 × B2.

If X = {f : [0, t] → N right-continuous}, it suffices to consider A = {f ∈ X : f(r1) = n1, . . . , f(rm) = nm} for all 0 ≤ r1 < . . . < rm ≤ t and n1, . . . , nm ∈ N. This is the basis for the meaning of Proposition 3(ii).

We conclude with a fact that may seem obvious, but does not follow immediately from the definition. Also, the approximation argument only gives some special cases.

Fact 10 Let X be any random variable, and T a real-valued random variable with right-continuous density. Then, for all (measurable) f : X × [0,∞) → [0,∞), we have

E(f(X, T )|T = t) = E(f(X, t)|T = t).

Furthermore, if X and T are independent and g : X → [0,∞) (measurable) we have

E(g(X)|T = t) = E(g(X)).

If X also takes values in [0,∞), examples for f are e.g. f(x, t) = 1_{x+t>s}, where 1_{x+t>s} := 1 if x + t > s and 1_{x+t>s} := 0 otherwise; or f(x, t) = e^{λ(x+t)}, in which case the statements are

P(X + T > s|T = t) = P(X + t > s|T = t)   and   E(e^{λ(X+T)}|T = t) = e^{λt} E(e^{λX}|T = t),

and the condition T = t can be removed on the right-hand sides if X and T are independent. This can be shown by the approximation argument.

The analogue of Fact 10 for discrete T is elementary.
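One consequence of Fact 10 is easy to check numerically: integrating E(e^{λ(X+T)}|T = t) = e^{λt} E(e^{λX}) against f_T gives E(e^{λ(X+T)}) = E(e^{λX}) E(e^{λT}) for independent X and T. A minimal Monte Carlo sketch (Python; the choices X, T ∼ Exp(1) and λ = 1/4 are arbitrary, so each factor equals 1/(1 − 1/4) = 4/3):

```python
import random, math

random.seed(8)

runs, lam = 200_000, 0.25
tot = 0.0
for _ in range(runs):
    x = random.expovariate(1.0)   # X ~ Exp(1)
    t = random.expovariate(1.0)   # T ~ Exp(1), independent of X
    tot += math.exp(lam * (x + t))
mean = tot / runs
print(mean)   # E(e^{(X+T)/4}) = (4/3) * (4/3) = 16/9, approximately 1.778
```

The small λ keeps the integrand's variance finite, which a Monte Carlo estimate needs.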

²Facts are results that are true, but that we cannot prove in this course. Note also that there is a grey zone between facts and propositions, since proofs or partial proofs sometimes appear on assignment sheets, in the main or optional parts.

Lecture 3

Birth processes and explosion

Reading: Norris 2.2-2.3, 2.5; Grimmett-Stirzaker 6.8 (11),(18)-(20)

In this lecture we introduce birth processes in analogy with our definition of Poisson processes. The common description (also for Markov chains) is always as follows: given that the current state is m, what is the next part of the evolution, and (for the purpose of an inductive description) how does it depend on the past? (Answer to the last bit: conditionally independent given the current state, but here this can be expressed in terms of genuine independence.)

3.1 Definition and an example

If we consider the Poisson process as a model for a growing population, it is not always sensible to assume that new members are born at the same rate regardless of the size of the population. One would rather expect this rate to increase with size (more births in larger populations), although some saturation effects may make sense as well.

Also, if the Poisson process is used as a counting process of alpha particle emissions of a decaying radioactive substance, it makes sense to assume that the rate is decreasing with the number of emissions, particularly if the half-life is short.

Definition 11 A stochastic process X = (Xt)t≥0 of the form

Xt = k + #{n ≥ 1 : ∑_{j=0}^{n−1} Zj ≤ t}

is called a simple birth process of rates (λn)n≥0 starting from X0 = k ∈ N, if the inter-arrival times Zj, j ≥ 0, are independent exponential random variables with parameters λ_{k+j}, j ≥ 0. X is called a (k, (λn)n≥0)-birth process.

Note that the parameter λn is attached to height n. The so-called holding time at height n has an Exp(λn) distribution.

“Simple” refers to the fact that no two births occur at the same time, which one would call “multiple” births. Multiple birth processes can be studied as well, and, given certain additional assumptions, form examples of continuous-time Markov chains, like simple birth processes do, as we will see soon.


Example 12 Consider a population in which each individual gives birth after an exponential time of parameter λ, all independently and repeatedly.

If n individuals are present, then the first birth will occur after an exponential time of parameter nλ. Then we have n + 1 individuals and, by the lack of memory property (see Assignment 1), the process begins afresh with n + 1 exponential clocks independent of the previous evolution (n − 1 clocks still running with residual Exp(λ) times and 2 new clocks from the individual that has just given birth and the new-born individual). By induction, the size of the population performs a birth process with rates λn = nλ.

Let Xt denote the number of individuals at time t and suppose X0 = 1. Write T for the time of the first birth. Then by Proposition 7c)

E(Xt) = ∫_0^∞ λe^{−λu} E(Xt|T = u) du = ∫_0^t λe^{−λu} E(Xt|T = u) du + e^{−λt}

since Xt = X0 = 1 if T > t.

Put µ(t) = E(Xt); then for 0 < u < t, intuitively E(Xt|T = u) = 2µ(t − u), since from time u, the two individuals perform independent birth processes of the same type and we are interested in their population sizes t − u time units later. We will investigate a more formal argument later, when we have the Markov property at our disposal.

Now

µ(t) = ∫_0^t 2λe^{−λu} µ(t − u) du + e^{−λt}

and setting r = t − u

e^{λt} µ(t) = 1 + 2λ ∫_0^t e^{λr} µ(r) dr.

Differentiating we obtain

µ′(t) = λµ(t),

so the mean population size grows exponentially, and X0 = 1 implies

E(Xt) = µ(t) = e^{λt}.
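The conclusion E(Xt) = e^{λt} can be checked by simulating the process of Example 12 directly from Definition 11: at size n, wait an Exp(nλ) holding time before the next birth. A sketch (Python; the parameter choices λ = 1, t = 2 and the function name are arbitrary illustrations, not from the notes):

```python
import random

random.seed(2)

lam, t_end, runs = 1.0, 2.0, 20_000

def yule_size():
    """Population size at time t_end of the Example 12 process, X0 = 1."""
    n, t = 1, 0.0
    while True:
        t += random.expovariate(n * lam)   # holding time Exp(n * lam) at size n
        if t > t_end:
            return n
        n += 1

mean = sum(yule_size() for _ in range(runs)) / runs
print(mean)   # E(X_t) = e^{lam * t_end} = e^2, approximately 7.389
```

The empirical mean matches the exponential growth e^{λt} up to Monte Carlo error.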

3.2 Tonelli's Theorem, monotone and dominated convergence

The following is a result from measure theory that we cannot prove or dwell on here.

Fact 13 (Tonelli) You may interchange the order of integration, countable summation and expectation whenever the integrands/summands/random variables are nonnegative, e.g.

E(∑_{n≥0} Xn) = ∑_{n≥0} E(Xn),    ∫_0^∞ ∑_{n≥0} fn(x) dx = ∑_{n≥0} ∫_0^∞ fn(x) dx,

∫_0^∞ ∫_0^x f(x, y) dy dx = ∫_0^∞ ∫_y^∞ f(x, y) dx dy.


There were already two applications of this, one each in Lectures 1 and 2. The focus was on other parts of the argument, but you may wish to consider the justification of Remark 2 and Proposition 7 again now.

Interchanging limits is more delicate; monotone and dominated convergence serve this purpose. In this course we will only interchange limits when this is justified by monotone or dominated convergence, but we do not have the time to work out the details. Here are indicative statements.

Fact 14 (Monotone convergence) Integrals (expectations) of an increasing sequence of nonnegative functions (random variables Yn) converge, in the sense that lim_{n→∞} E(Yn) = E(lim_{n→∞} Yn).

Fact 15 (Dominated convergence) Integrals (expectations) of a pointwise convergent sequence of functions fn → f (applied to a random variable) converge, if the sequence is dominated, |fn| ≤ g, by an integrable function g (a function which, when applied to the random variable, has finite expectation), i.e.

∫ g(x) dx < ∞  ⇒  lim_{n→∞} ∫ fn(x) dx = ∫ lim_{n→∞} fn(x) dx,

E(g(X)) < ∞  ⇒  lim_{n→∞} E(fn(X)) = E(lim_{n→∞} fn(X)).

We refer to B10a Martingales Through Measure Theory for those of you who follow that course. In any case, our working hypothesis is that, in practice, we may interchange all limits that come up in this course, but we also know that these two theorems are required for formal justification, and refer to them.

3.3 Explosion

If the rates (λn)n∈N increase too quickly, it may happen that infinitely many individuals are born in finite time (as with deterministic inter-birth times). We call this phenomenon explosion. Formally, we can express the possibility of explosion by P(T∞ < ∞) > 0, where T∞ = lim_{n→∞} Tn. Remember that it is not a valid argument to say that this is ridiculous for the application we are modelling and hence cannot occur in our model. We have to check whether it can occur under the model assumptions. And if it does occur and is ridiculous for the application, it means that the model is not a good model for the application. For simple birth processes, we have the following necessary and sufficient condition.

Proposition 16 Let X be a (k, (λn)n≥0)-birth process. Then

P(T∞ < ∞) > 0   if and only if   ∑_{m=k}^∞ 1/λm < ∞.

Furthermore, in this case P(T∞ < ∞) = 1.


Proof: Note that

E(T∞) = E(∑_{n=0}^∞ Zn) = ∑_{n=0}^∞ E(Zn) = ∑_{m=k}^∞ 1/λm,

where Tonelli's Theorem allows us to interchange summation and expectation. Therefore, if the series is finite, then E(T∞) < ∞ implies P(T∞ < ∞) = 1.

Since, in general, P(T∞ < ∞) = 1 does not imply E(T∞) < ∞, we are not yet done for the converse. However, using monotone convergence, and also the independence of the Zn, we can calculate

− log E(e^{−T∞}) = − ∑_{n=0}^∞ log E(e^{−Zn}) = ∑_{m=k}^∞ log(1 + 1/λm).

Either this latter sum is greater than log(2) ∑_{n=n0}^∞ 1/λn if λn ≥ 1 for n ≥ n0, by linear interpolation and concavity of the logarithm. Or otherwise a restriction to any subsequence λ_{nk} ≤ 1 shows that the sum is infinite, as each of these summands contributes at least log(2).

Therefore, if the series diverges, then E(e^{−T∞}) = 0, i.e. P(T∞ = ∞) = 1. □

Note that we have not explicitly specified what happens after T∞ if T∞ < ∞. With a population size model in mind, Xt = ∞ for all t ≥ T∞ is a reasonable convention. Formally, this means that X is a process in N̄ = N ∪ {∞}. This process is often called the minimal process, since it is “active” on a minimal time interval. We will show the Markov property for minimal processes. It can also be shown that there are other ways to specify X after explosion that preserve the Markov property. The next natural thing to do is to start afresh after explosion. Such a process is then called non-minimal.
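Proposition 16 can be illustrated numerically. For the explosive rates λn = (n + 1)², the proof gives E(T∞) = ∑_{m≥0} 1/(m+1)² = π²/6. A sketch (Python; the rate choice and the truncation at 1000 terms are illustrative assumptions, the neglected tail of the sum being of order 10⁻³):

```python
import random, math

random.seed(3)

def explosion_time(n_terms=1000):
    """Approximate T_infinity: sum of independent Exp(lambda_n) holding
    times with explosive rates lambda_n = (n + 1)^2, truncated at n_terms."""
    return sum(random.expovariate((n + 1) ** 2) for n in range(n_terms))

runs = 2000
mean_Tinf = sum(explosion_time() for _ in range(runs)) / runs
print(mean_Tinf)   # E(T_infinity) = sum of 1/(n+1)^2 = pi^2 / 6, about 1.645
```

Every sampled T∞ is finite, in line with P(T∞ < ∞) = 1 when the series converges.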

Lecture 4

Birth processes and the Markov property

Reading: Norris 2.4-2.5; Grimmett-Stirzaker 6.8 (21)-(25)

In this lecture we discuss in detail the Markov property for birth processes.

4.1 Statements of the Markov property

Proposition 17 (Markov property) Let X be a simple birth process with rates (λn)n≥0 starting from k ≥ 0, and let t ≥ 0 be a fixed time and ℓ ≥ k a fixed height. Then given Xt = ℓ, (Xr)r≤t and (Xt+s)s≥0 are conditionally independent, and the conditional distribution of (Xt+s)s≥0 is that of a simple birth process with rates (λn)n≥0 starting from ℓ; we use the notation

(Xr)r≤t  ∐_{Xt=ℓ}  (Xt+s)s≥0 ∼ (ℓ, (λn)n≥0)-birth process.

Example 18 In formulas, we can write (and apply)

P((Xr)r≤t ∈ A, (Xt+s)s≥0 ∈ B|Xt = ℓ) = P((Xr)r≤t ∈ A|Xt = ℓ) P((Xt+s)s≥0 ∈ B|Xt = ℓ)
                                      = P((Xr)r≤t ∈ A|Xt = ℓ) P((Xs)s≥0 ∈ B),

where, by slight abuse of notation, X on the right-hand side denotes an (ℓ, (λn)n≥0)-birth process, and A, B are arbitrary sets of paths, e.g. A = {f : [0, t] → N : f(rj) = nj, j = 1, . . . , m} for some 0 ≤ r1 < . . . < rm ≤ t, nj ∈ N. Note that

{(Xr)r≤t ∈ A}  ⇐⇒  Xrj = nj for all j = 1, . . . , m.

Therefore, in particular,

P(Xr1 = n1, . . . , Xrm = nm, Xt+s1 = p1, . . . , Xt+sh = ph|Xt = ℓ)
    = P(Xr1 = n1, . . . , Xrm = nm|Xt = ℓ) P(Xs1 = p1, . . . , Xsh = ph).


The Markov property as formulated in Proposition 17 (or Example 18) says that “past and future are (conditionally) independent given the present”. We can say this in a rather different way as “the past is irrelevant for the future, only the present matters”. Let us derive this reformulation using the elementary rules of conditional probability: the Markov property is about three events,

past E = {(Xr)r≤t ∈ A},   future F = {(Xt+s)s≥0 ∈ B}   and present C = {Xt = ℓ},

and states P(E ∩ F|C) = P(E|C)P(F|C). This is a special property that does not hold for general events E, F and C. In fact, by the definition of conditional probabilities, we always have

P(E ∩ F|C) = P(E ∩ F ∩ C)/P(C) = P(F|E ∩ C)P(E ∩ C)/P(C) = P(F|E ∩ C)P(E|C),

so that, by comparison, if P(E ∩ C) > 0, we here have P(F|E ∩ C) = P(F|C), and we deduce

Corollary 19 (Markov property, alternative formulation) For all t ≥ 0, ℓ ≥ k and sets of paths A and B with P(Xt = ℓ, (Xr)r≤t ∈ A) > 0, we have

P((Xt+s)s≥0 ∈ B|Xt = ℓ, (Xr)r≤t ∈ A) = P((Xt+s)s≥0 ∈ B|Xt = ℓ) = P((Xs)s≥0 ∈ B),

where X on the right-hand side denotes an (ℓ, (λn)n≥0)-birth process.

In fact, the condition P(Xt = ℓ, (Xr)r≤t ∈ A) > 0 can often be waived, if the conditional probabilities can still be defined via an approximation by events of positive probability. This is a very informal statement, since different approximations could, in principle, give rise to different values for such conditional probabilities. This is, in fact, very subtle, and uniqueness cannot be achieved in a strong sense, but only some exceptional points usually occur. Formally, one would introduce versions of conditional probabilities, and sometimes a nice version can be identified. For us, this is just a technical hurdle that we do not attempt. If we did attempt it, we would find enough continuity (or right-continuity), and see that we usually integrate conditional probabilities, so that any non-uniqueness would not affect our final answers. See B10a Martingales Through Measure Theory for details.

4.2 Application of the Markov property

We make (almost) rigorous the intuitive argument used in Example 12 to justify E(Xu|T = t) = 2E(Xu−t), where u ≥ t, X is a (1, (nλ)n≥0)-birth process and T = inf{t ≥ 0 : Xt = 2}. In fact, we now have

{T = t} = {Xr = 1, r < t; Xt = 2} ⊂ {Xt = 2},

so that for all n ≥ 0

P(Xu = n|T = t) = P(Xu = n|Xr = 1, r < t; Xt = 2) = P(Xu = n|Xt = 2) = P(Xu−t = n),


where X on the right-hand side denotes a (2, (nλ)n≥0)-birth process. This also gives

E(Xu|T = t) = ∑_{n∈N} n P(Xu = n|T = t) = ∑_{n∈N} n P(Xu−t = n|X0 = 2) = E(Xu−t|X0 = 2) = 2E(Xu−t),

since, by model assumption, the families of the two individuals evolve completely independently, like separate populations starting from one individual each.

Remark: The event {T = t} has probability zero (P(T = t) = 0, as T is exponentially distributed). Therefore any conditional probabilities given T = t have to be approximated (using {t − ε < T ≤ t}) in order to justify the application of the Markov property. We will not go through the details of this in this course and leave this to the insistent reader. Our focus is on the probabilistic argument that the application of the Markov property constitutes here.

4.3 Proof of the Markov property

Proof: Assume for simplicity that X0 = 0. The general case can then be deduced. On {Xt = k} = {Tk ≤ t < Tk+1} we have

X̃s := Xt+s = #{n ≥ 1 : ∑_{j=0}^{n−1} Zj ≤ t + s}
            = k + #{n ≥ k + 1 : ∑_{j=0}^{n−1} Zj − t ≤ s}
            = k + #{m ≥ 1 : ∑_{j=0}^{m−1} Z̃j ≤ s},

where Z̃j = Z_{k+j}, j ≥ 1, and Z̃0 = Tk+1 − t. Therefore, X̃ has the structure of a birth process starting from k, since given Xt = k, the Z̃j ∼ Exp(λ_{k+j}) are independent (conditionally given Xt = k). For j = 0 note that

P(Z̃0 > z|Xt = k) = P(Zk > (t − Tk) + z|Zk > t − Tk ≥ 0) = P(Zk > z),

where we applied the lack of memory property of Zk to the independent threshold t − Tk. This actually requires a slightly more thorough explanation, since we are dealing with repeated conditioning (first Xt = k, then Zk > t − Tk), but the key result that we need is

Lemma 20 (Exercise A.1.4(b)) The lack of memory property of the exponential distribution holds at independent thresholds, i.e. for Z ∼ Exp(λ) and L a random variable independent of Z, the following holds: given Z > L, Z − L ∼ Exp(λ) and Z − L is conditionally independent of L.

The proof is now completed again by the lack of memory property, to see that Z̃0 (as well as the Z̃j, j ≥ 1) is conditionally independent of Z0, . . . , Zk−1 given Xt = k, and the assertion follows. □


4.4 The strong Markov property

The Markov property means that at whichever fixed time we inspect our process, information about the past is not relevant for its future behaviour. If we think of an example of radioactive decay, this would allow us to describe the behaviour of the emission process, say, after the experiment has been running for 2 hours. Alternatively, we may wish to reinspect after 1000 emissions. This time is random, but we can certainly carry out any action we wish at the time of the 1000th emission. Such times are called stopping times.

We will only look at stopping times of the form

Tn = inf{t ≥ 0 : Xt = n}   or   TC = inf{t ≥ 0 : Xt ∈ C}

for n ∈ N or C ⊂ N (or later C ⊂ S, a countable state space).

Proposition and Fact 21 (Strong Markov property) (i) Let X be a simple birth process with rates (λn)n≥0 and T ≥ 0 a stopping time. Then for all k ∈ N with P(XT = k) > 0 we have: given T < ∞ and XT = k, (Xr)r≤T and (XT+s)s≥0 are conditionally independent, and the conditional distribution of (XT+s)s≥0 is that of a simple birth process with rates (λn)n≥0 starting from k.

(ii) In the special case where X is a Poisson process and P(T < ∞) = 1, (XT+s − XT)s≥0 is a Poisson process starting from 0, independent of T and (Xr)r≤T.

The proof of the strong Markov property (in full generality) is beyond the scope of this course, but we will use the result from time to time. See the Appendix of Norris's book for details. Note, however, that the proof of the strong Markov property for first hitting times Ti is not so hard, since then P(XTi = i) = 1, so the only relevant statement is for k = i, and (Xr)r≤Ti can be expressed in terms of Zj, 0 ≤ j ≤ i − 1, and (XTi+s − XTi)s≥0 in terms of Zj, j ≥ i. In fact, this is not just a sketch of a proof, but a complete proof, if we make two more remarks. First, conditioning on XTi = i is like not conditioning at all, because this event has probability 1. Second, it is really enough to establish independence of holding times, because “can be expressed in terms of” is actually “is a function G of”, where the first function e.g. goes from [0,∞)^i to the space

X = {f : [0, t] → N, f right-continuous, t ≥ 0}.

Now, we have a general result saying that if G : A → X and H : B → Y are (measurable) functions, and A and B are independent random variables in A and B, then G(A) and H(B) are also independent. To prove this using our definition of independence, just note that for all E ⊂ X and F ⊂ Y (measurable), we have

P(G(A) ∈ E, H(B) ∈ F) = P(A ∈ G^{−1}(E), B ∈ H^{−1}(F))
                       = P(A ∈ G^{−1}(E)) P(B ∈ H^{−1}(F))
                       = P(G(A) ∈ E) P(H(B) ∈ F).

A more formal definition of stopping times is as follows.


Definition 22 (Stopping time) A random time T taking values in [0,∞] is called a stopping time for a continuous-time process X = (Xt)t≥0 if, for all t ≥ 0, the event {T ≤ t} can be expressed (in a measurable way) in terms of (Xr)r≤t.

Of course, you should think of X as an N-valued simple birth process for the moment, but you will appreciate that this definition makes sense in much more generality. In our next lecture we will see examples where we are observing several independent processes on the same time scale. The easiest example is two independent birth processes X = (X^{(1)}, X^{(2)}), modelling e.g. two populations that we observe simultaneously.

Example 23 1. Let X be a simple birth process starting from X0 = 0. Then for all i ≥ 1, Ti = inf{t ≥ 0 : Xt = i} is a stopping time, since {Ti ≤ t} = {∃s ≤ t : Xs = i} = {Xt ≥ i} (the latter equality uses the property that birth processes do not decrease; thus, strictly speaking, this equality is to mean that the two events differ by sets of probability zero, in the sense that we write E = F if P(E \ F) = P(F \ E) = 0). Ti is called the first hitting time of i. Clearly, for X modelling a Geiger counter and i = 1000, we are in the situation of our motivating example.

2. Let X be a simple birth process. Then for ε > 0, the random time Tε = inf{Ti ≥ T1 : Ti − Ti−1 < ε}, i.e. the first time that two births have occurred within time at most ε of one another, is a stopping time.

In general, the first time that something happens, or that several things have happened successively, is a stopping time. It is essential that we don't have to look ahead to decide. In particular, the last time that something happens, e.g. the last birth time before time t, is not a stopping time, and the statement of the strong Markov property is usually wrong for such times.
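The strong Markov property at the stopping time Tε of Example 23.2 can be illustrated by simulation: after Tε, a Poisson process restarts afresh, so the number of arrivals in (Tε, Tε + u] should be Poisson(λu) with mean λu. A sketch (Python; λ = 1, ε = 0.1, u = 2 and the function name are arbitrary illustrative choices):

```python
import random

random.seed(4)

lam, eps, u, runs = 1.0, 0.1, 2.0, 20_000

def arrivals_after_T_eps():
    """Run a rate-lam Poisson process until T_eps (first time two successive
    arrivals are within eps), then count arrivals in the next u time units."""
    while True:
        gap = random.expovariate(lam)   # inter-arrival time T_i - T_{i-1}
        if gap < eps:
            break                       # T_eps reached
    n, s = 0, random.expovariate(lam)   # fresh arrivals after T_eps
    while s <= u:
        n += 1
        s += random.expovariate(lam)
    return n

mean = sum(arrivals_after_T_eps() for _ in range(runs)) / runs
print(mean)   # strong Markov property: close to lam * u = 2
```

Conditioning on a small gap having just occurred does not bias the future of the process, exactly because Tε is a stopping time.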

4.5 The Markov property for Poisson processes

In Proposition 3, we reformulated the Markov property of the Poisson process in terms of genuine independence rather than just conditional independence. Let (Xt)t≥0 be a Poisson process with rate λ > 0, i.e. a (0, (λn)n≥0)-birth process with λn = λ for all n ≥ 0; then

(Xr)r≤t  ∐  (Xt+s − Xt)s≥0 ∼ (Xs)s≥0.

Proof: By the version of the Markov property given in Proposition 17, we have for all ℓ ≥ 0 that

(Xr)r≤t  ∐_{Xt=ℓ}  (Xt+s)s≥0 ∼ (ℓ + Xs)s≥0,

also since (ℓ + Xs)s≥0 is a Poisson process starting from ℓ. Specifically, it is not hard to see that for a general (k, (λn)n≥0)-birth process, the process (ℓ + Xs)s≥0 is a (k + ℓ, (λ_{n−ℓ})n≥0)-birth process (for any choice of λn, −ℓ ≤ n ≤ 0), and here λn = λ for all n. Clearly also

(Xr)r≤t  ∐_{Xt=ℓ}  (Xt+s − ℓ)s≥0 ∼ (Xs)s≥0.


However, conditionally given Xt = ℓ, we have Xt+s − ℓ = Xt+s − Xt for all s ≥ 0, so that

(Xr)r≤t  ∐_{Xt=ℓ}  (Xt+s − Xt)s≥0 ∼ (Xs)s≥0.

Since the distribution of (Xt+s − Xt)s≥0 (the distribution of a Poisson process of rate λ starting from 0) does not depend on ℓ ∈ N, we can now apply Exercise A.1.5 to conclude that this conditional independence is in fact genuine independence, and the common conditional distribution of (Xt+s − Xt)s≥0 given Xt = ℓ, ℓ ∈ N, is in fact also the unconditional distribution.

More precisely, we apply the statement of the exercise with N = Xt, Z = (Xt+s − Xt)s≥0 and M = (Xr)r≤t. Strictly speaking, Z and M are here random variables taking values in suitable function spaces, but this does not pose any problems from a measure-theoretic perspective, once “suitable” has been defined; this definition of “suitable” is again beyond the scope of this course. □

We can now also establish our assertions in Remark 2, namely that X has stationary increments, since

(Xt+s − Xt)s≥0 ∼ (Xs)s≥0  ⇒  Xt+u − Xt ∼ Xu for all u ≥ 0.

The subtlety of this statement is that we have, for all (measurable) B ⊂ {f : [0,∞) → N},

P((Xt+s − Xt)s≥0 ∈ B) = P((Xs)s≥0 ∈ B),

and for all n ≥ 0 and B = Bu,n = {f : [0,∞) → N : f(u) = n} this gives P(Xt+u − Xt = n) = P(Xu = n), as required. Independence of two increments Xt and Xt+u − Xt also follows directly. A careful inductive argument allows us to extend this to the required independence of m successive increments Xtj − Xtj−1, j = 1, . . . , m, because

P(Xtm+1 − Xtm = nm+1, Xtm − Xtm−1 = nm, . . . , Xt1 − Xt0 = n1)
    = P(Xtm+1 − Xtm = nm+1) P(Xtm − Xtm−1 = nm, . . . , Xt1 − Xt0 = n1)

follows from the Markov property at t = tm.

An alternative way to establish Proposition 3 is to give a direct proof of the stationarity and independence of increments in Remark 2, based on the lack of memory property, and to deduce the Markov property from there.

Lecture 5

Continuous-time Markov chains

Reading: Norris 2.1, 2.6
Further reading: Grimmett-Stirzaker 6.9; Ross 6.1-6.3; Norris 2.9

In this lecture, we generalise the notion of a birth process to allow deaths and other transitions, in fact transitions between any two states in a state space S, just as for discrete-time Markov chains, and using the notion of a discrete-time Markov chain.

Continuous-time Markov chains are similar in many respects to discrete-time Markov chains, but they also show important differences. Roughly, we will spend Lectures 5 and 6 exploring differences and tools to handle them, then similarities in Lectures 7 and 8.

5.1 Definition and terminology

Definition 24 Let (Mn)n≥0 be a discrete-time Markov chain on S with transition probabilities πij, i, j ∈ S. Let (Zn)n≥0 be a sequence of conditionally independent exponential random variables with conditional distributions Exp(λ_{Mn}) given (Mn)n≥0, where λi ∈ (0,∞), i ∈ S. Then the process X = (Xt)t≥0 defined by

Xt = Mn for Tn ≤ t < Tn+1, n ≥ 0;   Xt = ∞ for T∞ ≤ t < ∞,

where T0 = 0 and Tn = Z0 + . . . + Zn−1, n ≥ 1, is called a (minimal) continuous-time Markov chain with jump probabilities (πij)i,j∈S and holding rates (λi)i∈S.

Usually, T∞ = ∞, but the explosion phenomenon studied in Lecture 3 for the special case of a birth process has to be taken into account in a general definition. This is the so-called jump-chain holding-time definition of continuous-time Markov chains. There are others, and we will point these out when we have established relevant connections.

Here Zn ∼ Exp(λ_{Mn}) given Mn is short for Zn ∼ Exp(λk) conditionally given Mn = k, for all k ∈ S, and conditional independence given (Mn)n≥0 means that for all m ≥ 0, Z0, . . . , Zm are conditionally independent given M0 = k0, . . . , Mm = km.

Example 25 (Birth processes) For (k, (λn)n≥0)-birth processes, we have Mn = k + n deterministic, i.e. πi,i+1 = 1. Conditional independence of the Zn given (Mn)n≥0 is just independence, and Exp(λ_{Mn}) = Exp(λ_{k+n}) is the unconditional distribution of Zn.


The representation of the distribution of X by (πij)i,j∈S and (λi)i∈S is unique if we assume furthermore πii ∈ {0, 1}, so that M either jumps straight away or remains in a given state forever, and by setting λi = 0 if πii = 1. This eliminates the possibility of the discrete chain proposing a “jump” from state i to itself.

It is customary to represent the transition probabilities πij and the holding rates λi in a single matrix, called the Q-matrix, as follows. Define for i ≠ j

qij = λi πij   and   qii = −λi.

Remark 26 qii = −∑_{j≠i} qij, since either ∑_{j≠i} πij = 1 or λi = 0. As a consequence, the row sums of a Q-matrix vanish.

X is then also referred to as a continuous-time Markov chain with Q-matrix Q, or a (k, Q)-Markov chain if starting from X0 = M0 = k.

Example 25 (continued) For birth processes, we obtain

    Q = [ −λ0    λ0     0     0   · · · ]
        [   0   −λ1    λ1     0   · · · ]
        [   0     0   −λ2    λ2   · · · ]
        [   ⋮     ⋱     ⋱     ⋱         ]
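The passage from jump probabilities and holding rates to a Q-matrix, and the vanishing row sums of Remark 26, can be illustrated in a few lines of code. The sketch below (Python) builds Q for a birth process truncated to finitely many states; the truncation with an absorbing last state (rate 0) is purely an artefact of working with finite matrices, not part of the notes:

```python
def q_matrix(pi, rates):
    """q_ij = rates[i] * pi[i][j] for i != j, and q_ii = -rates[i]."""
    n = len(rates)
    return [[(-rates[i] if i == j else rates[i] * pi[i][j]) for j in range(n)]
            for i in range(n)]

# Truncated birth process: jump probabilities pi_{i,i+1} = 1 and holding
# rates lambda_i = i + 1; the last state is made absorbing so that the
# finite truncation still has vanishing row sums.
n = 5
pi = [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n)]
rates = [float(i + 1) for i in range(n)]
rates[-1] = 0.0
Q = q_matrix(pi, rates)
row_sums = [sum(row) for row in Q]
print(Q[0], row_sums)   # first row (-1, 1, 0, 0, 0); every row sum is 0
```

The first row reproduces the top row of the birth-process Q-matrix above with λ0 = 1.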

As with discrete-time chains, it is sometimes useful to specify an initial distribution for X0 that we call ν, i.e. we let νi = P(X0 = i), i ∈ S. Such a continuous-time Markov chain will be referred to as a (ν, Q)-Markov chain. Often

νi = δ_{i,i0} = { 1 if i = i0; 0 if i ≠ i0 },   or short ν = δ_{i0};

δ_{i,i0}, as a number derived from two arguments, here i and i0, is called the Kronecker delta; δ_{i0}, as a distribution only charging one point, here i0, is called a Dirac delta.

As an example of how an initial distribution can arise in practice, consider the number of customers arriving before a shop opens at time t = 0. As this number typically varies from day to day, it is natural to model it by a random variable, and to specify its distribution.

5.2 Construction

Defining lots of conditional distributions for infinite families of random variables requires some care in a measure-theoretic context. Also outside the measure-theoretic context, it is conceptually easier to express complicated random objects, such as continuous-time Markov chains, in terms of a countable family of independent random variables. We have already done this for Poisson processes and can therefore use independent Poisson processes as building blocks. This leads to the following maze construction of a continuous-time Markov chain. It is the second appearance of the theory of competing exponentials and nicely illustrates the evolution of continuous-time Markov chains:


Proposition 27 Let M0 ∼ ν and let (N^{ij}_t)t≥0, i, j ∈ S, i ≠ j, be independent Poisson processes with rates qij. Then define T0 = 0 and, for n ≥ 0,

Tn+1 = inf{t > Tn : N^{Mn,j}_t ≠ N^{Mn,j}_{Tn} for some j ≠ Mn}

and

Mn+1 = j if Tn+1 < ∞ and N^{Mn,j}_{Tn+1} ≠ N^{Mn,j}_{Tn}.

Then

Xt = Mn for Tn ≤ t < Tn+1, n ≥ 0;   Xt = ∞ for T∞ ≤ t < ∞,

is a (ν, Q)-Markov chain.

Think of the state space S as a maze, where qij > 0 signifies that there is a gate from state i ∈ S to state j ∈ S. Each gate (i, j) opens at the event times of a Poisson process N^{ij}. If after a given number of transitions the current state is i, then the next jump time of X is when the next gate leading away from i opens. If this gate leads from i to j, then j is the new state for X. Think of each Poisson process as a clock that rings at its event times. A ringing clock here corresponds to a gate opening instantaneously (i.e. immediately closing afterwards).
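The maze construction is easy to simulate: by the lack of memory property, the next gate to open out of state i is the minimum of independent Exp(qij) clocks, and the winning gate is j with probability qij/λi = πij. A sketch (Python; the 3-state Q-matrix and the function name are arbitrary illustrative choices, not from the notes):

```python
import random

random.seed(6)

# hypothetical 3-state Q-matrix (row sums vanish), purely for illustration
Q = [[-3.0,  2.0,  1.0],
     [ 1.0, -1.0,  0.0],
     [ 0.5,  0.5, -1.0]]

def next_state(i):
    """Maze step from i: one Exp(q_ij) clock per open gate; earliest wins."""
    clocks = {j: random.expovariate(Q[i][j])
              for j in range(len(Q)) if j != i and Q[i][j] > 0}
    return min(clocks, key=clocks.get)

runs = 50_000
counts = [0, 0, 0]
for _ in range(runs):
    counts[next_state(0)] += 1
freqs = [c / runs for c in counts]
print(freqs)   # jump probabilities out of 0: pi_01 = 2/3, pi_02 = 1/3
```

The empirical jump frequencies out of state 0 recover q01/λ0 = 2/3 and q02/λ0 = 1/3, as in the proof below.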

Proof: We have to check that the process defined here has the correct jump chain, holding times and dependence structure. Clearly M0 = X0 has the right starting distribution. Given M0 = i, the first jump occurs at the first time at which one of the Poisson processes N^{ij}, j ≠ i, has its first jump. This time is a minimum of independent exponential random variables of parameters qij, T1 = inf{T^{ij}_1 : j ≠ i}, for which

P(T1 > t) = P(T^{ij}_1 > t for all j ≠ i) = ∏_{j≠i} P(T^{ij}_1 > t) = exp(−t ∑_{j≠i} qij) = e^{−λi t},

i.e. Z0 = T1 ∼ Exp(λ_{M0}) given M0. Furthermore,

P(T1 = T^{ij}_1) = P(T^{ij}_1 < inf{T^{ik}_1 : k ∉ {i, j}}) = qij/λi = πij.

For independence, the second and inductively all further holding times and transitions, we apply the strong Markov property of the Poisson processes (Fact 21, or a combination of the lack of memory property at minima of exponential variables T^{ij}_1 and at an independent exponential variable T1 for N^{kj}, k ≠ i, as on assignment sheet 1) to see that the post-T1 Poisson processes (N^{ij}_{T1+s} − N^{ij}_{T1})s≥0 are Poisson processes themselves, and therefore the previous argument completes the induction step and hence the proof. T1 is a stopping time since

{T1 ≤ t} = ⋃_{j≠i} {T^{ij}_1 ≤ t}

and the latter events were expressed in terms of (N^{ij}_r)r≤t, j ≠ i, respectively, in Example 23. □


Corollary 28 (Markov property) Let X be a (ν, Q)-Markov chain and t ≥ 0 a fixed time. Then given Xt = k, (Xr)r≤t and (Xt+s)s≥0 are conditionally independent, and the conditional distribution of (Xt+s)s≥0 is that of a (k, Q)-Markov chain.

Proof: The post-t Poisson processes (N^{ij}_{t+s} − N^{ij}_t)s≥0 are themselves Poisson processes, independent of the pre-t Poisson processes (N^{ij}_r)0≤r≤t. The post-t behaviour of X only depends on Xt and the post-t Poisson processes. If we condition on {Xt = k}, then clearly (Xt+s)s≥0 is starting from k. □

Continuous-time Markov chains also have the strong Markov property. We leave theformulation to the reader. Its proof is beyond the scope of this course.

5.3 M/M/1 and M/M/s queues

Example 29 (M/M/1 queue) Let us model by Xt the number of customers in a single-server queueing system at time t ≥ 0, including any customer currently being served. We assume that new customers arrive according to a Poisson process with rate λ, and that service times are independent Exp(µ) distributed.

Given a queue size of n, two transitions are possible. If a customer arrives (at rate λ), X increases to n + 1. If the customer being served leaves (at rate µ), X decreases to n − 1. If no customer is in the system, only the former can happen. This amounts to a Q-matrix

    Q = [ −λ      λ      0     0   · · · ]
        [  µ   −µ−λ      λ     0   · · · ]
        [  0      µ   −µ−λ     λ   · · · ]
        [  ⋮      ⋱      ⋱     ⋱         ]

X is indeed a continuous-time Markov chain: given state n ≥ 1 (or n = 0), there are two (one) independent exponential clocks ticking, Exp(λ) and Exp(µ) (only Exp(λ) if n = 0), and the theory of competing exponential clocks (Exercise A.1.2) shows that after each transition the system starts afresh, with the residual clock and the new clock (except when n = 1 and the transition is to 0) exponential and independent of the past, and the induction proceeds.
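The M/M/1 dynamics can be simulated directly by the competing-clocks argument above. As a check on the output, for λ < µ the long-run average queue length is ρ/(1 − ρ) with ρ = λ/µ; this is a standard fact, following from the invariant distribution treated later in the course, not from this lecture. A sketch (Python; λ = 1, µ = 2 and the horizon are arbitrary choices):

```python
import random

random.seed(7)

lam, mu, horizon = 1.0, 2.0, 50_000.0

t, n, area = 0.0, 0, 0.0                   # area accumulates integral of X_t dt
while t < horizon:
    rate = lam + (mu if n > 0 else 0.0)    # total rate of the competing clocks
    hold = random.expovariate(rate)        # holding time until next transition
    area += n * min(hold, horizon - t)
    t += hold
    if n == 0 or random.random() < lam / rate:
        n += 1                             # arrival clock rings first
    else:
        n -= 1                             # service clock rings first
avg = area / horizon
print(avg)   # long-run mean queue length, about rho/(1-rho) = 1 for rho = 1/2
```

Replacing the two clocks by a single Exp(λ + µ) holding time and a coin flip with success probability λ/(λ + µ) is exactly the jump-chain holding-time description of Definition 24.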

Example 30 (M/M/s queue) If there are s ≥ 1 servers in the system, the rate at which customers leave is s-fold, provided there are at least s customers in the system.


We obtain the Q-matrix

$$Q = \begin{pmatrix}
-\lambda & \lambda & & & & & \\
\mu & -\mu-\lambda & \lambda & & & & \\
 & 2\mu & -2\mu-\lambda & \lambda & & & \\
 & & \ddots & \ddots & \ddots & & \\
 & & & s\mu & -s\mu-\lambda & \lambda & \\
 & & & & s\mu & -s\mu-\lambda & \ddots \\
 & & & & & \ddots & \ddots
\end{pmatrix},$$

which is maybe better represented by its entries: q_{i,i+1} = λ for all i ≥ 0, q_{i,i−1} = iµ for 1 ≤ i ≤ s, q_{i,i−1} = sµ for i ≥ s, q_{ii} = −iµ − λ for 0 ≤ i ≤ s, q_{ii} = −sµ − λ for i ≥ s, and q_{ij} = 0 otherwise. A slight variation of the argument for Example 29 shows that the M/M/s queue is a continuous-time Markov chain.
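These entries are easy to generate mechanically; a small sketch (hypothetical helper, truncating the infinite state space at n_max so the matrix can be built in memory):

```python
def mms_q_matrix(lam, mu, s, n_max):
    """Build the Q-matrix of an M/M/s queue, truncated to states 0..n_max.

    Entries follow the description above: q_{i,i+1} = lam,
    q_{i,i-1} = min(i, s) * mu, and the diagonal is chosen so that each
    row sums to zero (the last row is truncated, so only its within-range
    entries appear).
    """
    Q = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    for i in range(n_max + 1):
        if i < n_max:
            Q[i][i + 1] = lam            # arrival
        if i > 0:
            Q[i][i - 1] = min(i, s) * mu # departure: at most s busy servers
        Q[i][i] = -sum(Q[i])             # zero row sums
    return Q
```

For s = 1 this reduces to the M/M/1 matrix of Example 29.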


Lecture 6

Transition semigroups

Reading: Norris 2.8, 3.1
Further reading: Grimmett-Stirzaker 6.8 (12)-(17), 6.9; Ross 6.4; Norris 2.7, 2.10

In this lecture we establish transition matrices P(t), t ≥ 0, for continuous-time Markov chains. This family of matrices is the analogue of the n-step transition matrices Π_n = (π^{(n)}_{ij})_{i,j∈S}, n ≥ 0, for discrete-time Markov chains. While we will continue to use the Q-matrix to specify the distribution of a continuous-time Markov chain, the transition matrices P(t) give some of the most important probabilities related to a continuous-time Markov chain, but they are available explicitly only in a limited range of examples.

6.1 The semigroup property of transition matrices

As a consequence of the Markov property of continuous-time Markov chains, the probabilities P(X_{t+s} = j | X_t = i) do not depend on t. We denote by

$$p_{ij}(s) = P(X_{t+s} = j \mid X_t = i) \quad\text{and}\quad P(s) = (p_{ij}(s))_{i,j \in S}$$

the s-step transition probabilities and the s-step transition matrix.

Example 31 For a Poisson process with rate λ we have, for j ≥ i and n ≥ 0 respectively,

$$p_{ij}(t) = \frac{(\lambda t)^{j-i}}{(j-i)!}\, e^{-\lambda t} \quad\text{or}\quad p_{i,i+n}(t) = \frac{(\lambda t)^n}{n!}\, e^{-\lambda t},$$

by Remark 2. For fixed t ≥ 0 and i ≥ 0, these are Poi(λt) probabilities, shifted by i.

Proposition 32 (P(t))_{t≥0} is a semigroup, i.e. for all t, s ≥ 0 we have P(t)P(s) = P(t + s) in the sense of matrix multiplication, and P(0) = I, the identity matrix.

Proof: Just note that for all i, k ∈ S

$$p_{ik}(t+s) = \sum_{j \in S} P(X_{t+s} = k, X_t = j \mid X_0 = i) = \sum_{j \in S} P(X_t = j \mid X_0 = i)\, P(X_{t+s} = k \mid X_t = j, X_0 = i) = \sum_{j \in S} p_{ij}(t)\, p_{jk}(s),$$

where we applied the Markov property. □
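The semigroup identity can be checked numerically for the Poisson transition probabilities of Example 31; the matrix product reduces to a finite sum because p_{ij}(t) = 0 for j < i and p_{jk}(s) = 0 for j > k. A sketch (function names are illustrative):

```python
from math import exp, factorial

def p(i, j, t, lam):
    """Transition probability p_ij(t) of a rate-lam Poisson process (Example 31)."""
    if j < i:
        return 0.0
    return exp(-lam * t) * (lam * t) ** (j - i) / factorial(j - i)

def chapman_kolmogorov_gap(i, k, t, s, lam):
    """|sum_j p_ij(t) p_jk(s) - p_ik(t+s)|, the defect in the semigroup
    identity of Proposition 32; the sum over j runs over i..k only."""
    lhs = sum(p(i, j, t, lam) * p(j, k, s, lam) for j in range(i, k + 1))
    return abs(lhs - p(i, k, t + s, lam))
```

The gap should vanish up to floating-point error for any choice of states, times and rate.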

We will remind ourselves of a fixed initial state X0 = i by writing P(·|X0 = i) or Pi(·).


6.2 Backward equations

The following result is useful to calculate transition probabilities.

Proposition 33 The transition matrices (P(t))_{t≥0} of a minimal (ν,Q)-Markov chain satisfy the backward equation

$$P'(t) = Q P(t)$$

with initial condition P(0) = I, the identity matrix. Furthermore, P(t) is the minimal nonnegative solution, in the sense that any other nonnegative solution $\tilde P(t)$ satisfies $\tilde p_{ik}(t) \ge p_{ik}(t)$ for all i, k ∈ S.

Proof: We first show that (P(t))_{t≥0} solves P′(t) = QP(t), i.e. for all i, k ∈ S and t ≥ 0,

$$p'_{ik}(t) = \sum_{j \in S} q_{ij}\, p_{jk}(t).$$

We start by a one-step analysis (using the strong Markov property at the first jump time T_1, or directly identifying the structure of the post-T_1 process) to get

$$\begin{aligned}
p_{ik}(t) = P_i(X_t = k) &= \int_0^\infty P_i(X_t = k \mid T_1 = s)\, \lambda_i e^{-\lambda_i s}\, ds \\
&= \delta_{ik} e^{-\lambda_i t} + \int_0^t \sum_{j \in S} P_i(X_t = k, X_s = j \mid T_1 = s)\, \lambda_i e^{-\lambda_i s}\, ds \\
&= \delta_{ik} e^{-\lambda_i t} + \int_0^t \sum_{j \in S} P_i(X_t = k \mid X_s = j, T_1 = s)\, P_i(X_s = j \mid T_1 = s)\, \lambda_i e^{-\lambda_i s}\, ds \\
&= \delta_{ik} e^{-\lambda_i t} + \int_0^t \sum_{j \ne i} p_{jk}(t-s)\, \pi_{ij}\, \lambda_i e^{-\lambda_i s}\, ds \\
&= \delta_{ik} e^{-\lambda_i t} + \int_0^t \sum_{j \ne i} q_{ij}\, p_{jk}(u)\, e^{-\lambda_i (t-u)}\, du,
\end{aligned}$$

where we substituted u = t − s and used q_{ij} = λ_i π_{ij}, i.e.

$$e^{\lambda_i t}\, p_{ik}(t) = \delta_{ik} + \int_0^t \sum_{j \ne i} q_{ij}\, p_{jk}(u)\, e^{\lambda_i u}\, du.$$

Clearly this implies that p_{ik} is differentiable, and we obtain

$$e^{\lambda_i t}\, p'_{ik}(t) + \lambda_i e^{\lambda_i t}\, p_{ik}(t) = \sum_{j \ne i} q_{ij}\, p_{jk}(t)\, e^{\lambda_i t},$$

which after cancellation of $e^{\lambda_i t}$, and by λ_i = −q_{ii}, is what we require.

Suppose now we have another nonnegative solution $\tilde p_{jk}(t)$. Then, by integration, $\tilde p_{jk}(t)$ also satisfies the above integral equations (the δ_{ik} come from the initial conditions). Trivially,

$$T_0 = 0 \;\Rightarrow\; P_i(X_t = k,\, t < T_0) = 0 \le \tilde p_{ik}(t) \quad \text{for all } i, k \in S \text{ and } t \ge 0.$$


If for some n ∈ N

$$P_i(X_t = k,\, t < T_n) \le \tilde p_{ik}(t) \quad \text{for all } i, k \in S \text{ and } t \ge 0,$$

then, as above,

$$P_i(X_t = k,\, t < T_{n+1}) = e^{-\lambda_i t} \delta_{ik} + \int_0^t \sum_{j \ne i} q_{ij}\, P_j(X_u = k,\, u < T_n)\, e^{-\lambda_i (t-u)}\, du \le e^{-\lambda_i t} \delta_{ik} + \int_0^t \sum_{j \ne i} q_{ij}\, \tilde p_{jk}(u)\, e^{-\lambda_i (t-u)}\, du = \tilde p_{ik}(t),$$

and therefore

$$p_{ik}(t) = \lim_{n \to \infty} P_i(X_t = k,\, t < T_n) \le \tilde p_{ik}(t),$$

as required. We conclude that p_{ik}(t) is the minimal nonnegative solution to the backward equation. □

Note that non-minimal solutions that satisfy the conditions for transition matrices can only exist if $\sum_{k \in S} p_{ik}(t) < 1$ for some i ∈ S and t ≥ 0, i.e. the continuous-time Markov chain must be explosive in the sense that P(T_∞ < ∞) > 0, and then $p_{i\infty}(t) = 1 - \sum_{k \in S} p_{ik}(t) > 0$.

6.3 Forward equations

Proposition 34 If S is finite, then the transition matrices (P(t))_{t≥0} of a (ν,Q)-Markov chain satisfy the forward equation

P ′(t) = P (t)Q

with initial condition P (0) = I, the identity matrix.

Proof: See Assignment question A.3.5. □

Fact 35 If S is infinite, then the statement of the proposition still holds for minimal (ν,Q)-Markov chains.

Furthermore, P (t) is the minimal nonnegative solution.

The proof of the proposition can be adapted under a uniformity assumption. This assumption will be sufficient for most practical purposes, but the general case is best proved by conditioning on the last jump before t. Since this is not a stopping time, the Markov property does not apply and the calculations have to be done by hand, which is quite technical; see Norris 2.8.

In fact, both forward and backward equations admit unique solutions if the corresponding continuous-time Markov chain does not explode. This is the case in all practically relevant situations. The non-uniqueness arises since Markovian extensions of explosive chains, other than the minimal extension that we consider, will also have transition semigroups that satisfy the backward equations.


Remark 36 Transition semigroups and the Markov property can form the basis for a definition of continuous-time Markov chains. In order to match our definition, we could say that a (ν,Q)-Markov chain is a process with right-continuous sample paths in S such that

$$P(X_{t_{n+1}} = i_{n+1} \mid X_{t_0} = i_0, \ldots, X_{t_n} = i_n) = p_{i_n, i_{n+1}}(t_{n+1} - t_n)$$

for all 0 ≤ t_0 ≤ t_1 ≤ … ≤ t_{n+1} and i_0, …, i_{n+1} ∈ S, where P(t) satisfies the forward equations. See Norris 2.8.

6.4 Example

Example 37 For the Poisson process, q_{i,i+1} = λ, q_{ii} = −λ, q_{ij} = 0 otherwise; hence we have the forward equations

$$p'_{ii}(t) = -\lambda p_{ii}(t), \quad i \in \mathbb{N},$$
$$p'_{i,i+n}(t) = \lambda p_{i,i+n-1}(t) - \lambda p_{i,i+n}(t), \quad i \in \mathbb{N},\ n \ge 1,$$

and it is easily seen inductively (fix i and proceed n = 0, 1, 2, …) that the Poisson probabilities

$$p_{i,i+n}(t) = \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}$$

are solutions. Writing i + n rather than j is convenient because of the stationarity of increments in this special case of the Poisson process. Alternatively, we may consider the backward equations

$$p'_{ii}(t) = -\lambda p_{ii}(t), \quad i \in \mathbb{N},$$
$$p'_{ij}(t) = \lambda p_{i+1,j}(t) - \lambda p_{ij}(t), \quad i \in \mathbb{N},\ j \ge i+1,$$

and solve inductively (fix j and proceed i = j, j − 1, …, 0). We have seen an easier way to derive the Poisson transition probabilities in Remark 2. The link between the two ways is revealed by the passage to probability generating functions

$$G_i(z,t) = E_i\left(z^{X_t}\right),$$

which then have to satisfy the differential equations

$$\frac{\partial}{\partial t} G_i(z,t) = \sum_{n=0}^{\infty} z^{i+n}\, p'_{i,i+n}(t) = \lambda(z-1)\, G_i(z,t), \qquad G_i(z,0) = E_i(z^{X_0}) = z^i.$$

Solutions for these equations are obvious. In general, if we have G_i sufficiently smooth in t and z, we can derive from differential equations for probability generating functions differential equations for moments

$$m_i(t) = E_i(X_t) = \left.\frac{\partial}{\partial z} G_i(z,t)\right|_{z=1-}$$


that yield here

$$m'_i(t) = \left.\frac{\partial}{\partial z}\frac{\partial}{\partial t} G_i(z,t)\right|_{z=1-} = \lambda\, G_i(z,t)\big|_{z=1-} = \lambda, \qquad m_i(0) = E_i(X_0) = i.$$

Often (even in this case), this can be solved more easily than the differential equation for probability generating functions. Together with a similar equation for the variance, we can capture the two most important distributional features of a model.

6.5 Matrix exponentials

It is tempting to say that the differential equation P′(t) = QP(t), P(0) = I, has as its unique solution P(t) = e^{tQ}, that the same is true for P′(t) = P(t)Q, P(0) = I, and that the functional equation P(s)P(t) = P(s + t) also has as its solutions precisely P(t) = e^{tQ} for some Q. Remember, however, that Q is a matrix and that, in general, the state space is countably infinite. Therefore, we have to define e^{tQ} in the first place, and to do this, we could use the (minimal or unique) solutions to the differential equations. Another, more direct, possibility is the exponential series

$$e^{tQ} := \sum_{n \ge 0} \frac{(tQ)^n}{n!},$$

where tQ is scalar multiplication, i.e. multiplication of each entry of Q by the scalar t, and (tQ)^n is an n-fold matrix product. Let us focus on a finite state space S, so that the only limiting procedure is the series over n ≥ 0. It is natural to consider a series of matrices as a matrix consisting of the series of the corresponding entries. In fact, this works in full generality, as long as S is finite; see Norris 2.10.

For an infinite state space, this is much harder, since every entry in a matrix product is then already a limiting quantity, and one will either need uniform control over the entries or use operator norms to make sense of the series of matrices. The limited benefits of such a theory are not worth setting up the technical apparatus in our context.


Lecture 7

The class structure of continuous-time Markov chains

Reading: Norris 3.2-3.5
Further reading: Grimmett-Stirzaker 6.9

In this lecture, we introduce for continuous-time chains the notions of irreducibility and positive recurrence that will be needed for the convergence theorems in Lecture 8.

7.1 Communicating classes and irreducibility

We define the class structure characteristics as for discrete-time Markov chains.

Definition 38 Let X be a continuous-time Markov chain.

(a) We say that i ∈ S leads to j ∈ S, and write i → j, if

P_i(X_t = j for some t ≥ 0) = P_i(T_j < ∞) > 0, where T_j = inf{t ≥ 0 : X_t = j}.

(b) We say i communicates with j and write i↔ j if both i→ j and j → i.

(c) We say A ⊂ S is a communicating class if it is an equivalence class for the equivalence relation ↔ on S, i.e. if for all i, j ∈ A we have i ↔ j, and A is maximal with this property (for all k ∈ S − A, i ∈ A, at most one of i → k, k → i holds).

(d) We say A is a closed class if there is no i ∈ A, j ∈ S − A with i → j, i.e. the chain cannot leave A.

(e) We say that i is an absorbing state if {i} is a closed class.

(f) We say that X is irreducible if S is (the only) communicating class.

In the following, we denote by M = (M_n)_{n≥0} the jump chain and by (Z_n)_{n≥0} the holding times that we used in the construction of X = (X_t)_{t≥0}.


Proposition 39 Let X be a minimal continuous-time Markov chain. For i, j ∈ S, i ≠ j, the following are equivalent:

(i) i→ j for X.

(ii) i→ j for M .

(iii) There is a sequence (i_0, …, i_n) of states, from i_0 = i to i_n = j, such that

$$\prod_{k=0}^{n-1} q_{i_k, i_{k+1}} > 0.$$

(iv) pij(t) > 0 for all t > 0.

(v) pij(t) > 0 for some t > 0.

Proof: Implications (iv)⇒(v)⇒(i)⇒(ii) are clear.

(ii)⇒(iii): From the discrete-time theory, we know that i → j for M implies that there is a path (i_0, …, i_n) from i to j with

$$\prod_{k=0}^{n-1} \pi_{i_k, i_{k+1}} > 0, \quad\text{hence}\quad \prod_{k=0}^{n-1} q_{i_k, i_{k+1}} = \prod_{k=0}^{n-1} \pi_{i_k, i_{k+1}}\, \lambda_{i_k} > 0,$$

since λ_m = 0 if and only if π_{mm} = 1.

(iii)⇒(iv): If q_{ij} > 0, then we can get a lower bound for p_{ij}(t) by only allowing one transition in [0, t]:

$$p_{ij}(t) \ge P_i(Z_0 \le t, M_1 = j, Z_1 > t) = P_i(Z_0 \le t)\, P_i(M_1 = j)\, P(Z_1 > t \mid M_1 = j) = (1 - e^{-\lambda_i t})\, \pi_{ij}\, e^{-\lambda_j t} > 0$$

for all t > 0; hence in general, for the path (i_0, …, i_n) given by (iii),

$$p_{ij}(t) = P_i(X_t = j) \ge P_i(X_{kt/n} = i_k \text{ for all } k = 1, \ldots, n) = \prod_{k=0}^{n-1} p_{i_k, i_{k+1}}(t/n) > 0$$

for all t > 0. For the last equality, we used the Markov property, which implies that for all m = 1, …, n,

$$P(X_{mt/n} = i_m \mid X_{kt/n} = i_k \text{ for all } k = 0, \ldots, m-1) = P(X_{mt/n} = i_m \mid X_{(m-1)t/n} = i_{m-1}) = p_{i_{m-1}, i_m}(t/n). \qquad\Box$$

Condition (iv) shows that the situation is simpler than in discrete time, where it may be possible to reach a state, but only after a certain length of time, and then only periodically.


7.2 Recurrence and transience

Definition 40 Let X be a continuous-time Markov chain.

(a) i ∈ S is called recurrent if

P_i({t ≥ 0 : X_t = i} is unbounded) = 1.

(b) i ∈ S is called transient if

P_i({t ≥ 0 : X_t = i} is bounded) = 1.

Note that if X can explode starting from i, and if X is a minimal continuous-time Markov chain, then i is certainly not recurrent.

Recall that N_i = inf{n ≥ 1 : M_n = i} is called the first passage time of M to state i. We define

$$H_i = T_{N_i} = \inf\{t \ge T_1 : X_t = i\},$$

the first passage time of X to state i. Note that we require the chain to make at least one jump. This is to force X to leave i first if X_0 = i. We also define the successive passage times by $N_i^{(1)} = N_i$ and $N_i^{(m+1)} = \inf\{n > N_i^{(m)} : M_n = i\}$, m ≥ 1, for M, and

$$H_i^{(m)} = T_{N_i^{(m)}}, \quad m \ge 1,$$

for X.

Proposition 41 i ∈ S is recurrent (transient) for a minimal continuous-time Markov chain X if and only if i is recurrent (transient) for the jump chain M.

Proof: Suppose i is recurrent for the jump chain M, i.e. M visits i infinitely often, at steps $(N_i^{(m)})_{m \ge 1}$. If we denote by $1_{\{X_0=i\}}$ the random variable that is 1 if X_0 = i and 0 otherwise, the total amount of time that X spends at i is

$$Z_0 1_{\{X_0=i\}} + \sum_{m \ge 1} Z_{N_i^{(m)}} = \infty$$

with probability 1, by the argument for Proposition 16 (convergent and divergent sums of independent exponential variables), since $Z_{N_i^{(m)}} \sim \mathrm{Exp}(\lambda_i)$ and the sum of their (identical!) inverse parameters is infinite. In particular, {t ≥ 0 : X_t = i} must be unbounded with probability 1.

Suppose i is transient for the jump chain M; then there is a last step L < ∞ away from i, and

$$\{t \ge 0 : X_t = i\} \subset [0, T_L)$$

is bounded with probability 1.

The inverse implications are now obvious, since i can only be either recurrent or transient for M, and we constructed all minimal continuous-time Markov chains from jump chains. □


From this result and the analogous properties for discrete-time Markov chains, we deduce

Corollary 42 Every state i ∈ S is either recurrent or transient for X.

Recall that a class property is a property of states that either all states in a (communicating) class have, or all states in a (communicating) class don't have.

Corollary 43 Recurrence and transience are class properties.

Proof: If i is recurrent and i ↔ j for X, then i is recurrent and i ↔ j for M. From discrete-time Markov chain theory, we know that j is recurrent for M. Therefore j is recurrent for X.

The proof for transience is similar. □

Proposition 44 For any i ∈ S the following are equivalent:

(i) i is recurrent for X.

(ii) λi = 0 or Pi(Hi <∞) = 1.

(iii) $\int_0^\infty p_{ii}(t)\, dt = \infty$.

Proof: (iii)⇒(ii): One can deduce this from the corresponding discrete-time result, but we give a direct argument here. Assume λ_i > 0 and h_i = P_i(H_i = ∞) > 0. Then the strong Markov property at $H_i^{(m)}$ states that, given $H_i^{(m)} < \infty$, the post-$H_i^{(m)}$ process $X^{(m+1)} = (X_{H_i^{(m)}+t})_{t \ge 0}$ is distributed as X and independent of the pre-$H_i^{(m)}$ process. Now the total number G of visits of X to i must have a geometric distribution with parameter h_i, since P_i(G = 1) = h_i and P_i(G = m | G ≥ m) = h_i, m ≥ 2. Therefore, the total time spent in i is

$$\sum_{m=0}^{G-1} Z_{N_i^{(m)}} \sim \mathrm{Exp}(h_i \lambda_i), \quad \text{cf. Solution to Exercise A.2.6.}$$

With notation $1_{\{X_t=i\}} = 1$ if X_t = i and $1_{\{X_t=i\}} = 0$ otherwise, we obtain by Tonelli's theorem

$$\int_0^\infty p_{ii}(t)\, dt = \int_0^\infty E_i\left(1_{\{X_t=i\}}\right) dt = E_i\left(\int_0^\infty 1_{\{X_t=i\}}\, dt\right) = E_i\left(\sum_{m=0}^{G-1} Z_{N_i^{(m)}}\right) = \frac{1}{h_i \lambda_i} < \infty.$$

The other implications can be established using similar arguments. □


7.3 Positive and null recurrence

As in the discrete-time case, there is a link between recurrence and the existence of invariant distributions. More precisely, recurrence is strictly weaker. The stronger notion required is positive recurrence:

Definition 45 A state i ∈ S is called positive recurrent if either λi = 0 or mi = Ei(Hi) <∞. Otherwise, we call i null recurrent.

Fact 46 Positive recurrence is a class property.

7.4 Examples

Example 47 The M/M/1 queue with λ > 0 and µ > 0 is irreducible, since for all m > n ≥ 0 we have q_{m,m−1} ⋯ q_{n+1,n} = µ^{m−n} > 0 and q_{n,n+1} ⋯ q_{m−1,m} = λ^{m−n} > 0, and Proposition 39 yields m ↔ n.

λ > µ means that customers arrive at a higher rate than they leave. Intuitively, this means that X_t → ∞ (this can be shown by comparison of the jump chain with a simple random walk with up probability λ/(λ + µ) > 1/2). As a consequence, L_i = sup{t ≥ 0 : X_t = i} < ∞ for all i ∈ N, and since {t ≥ 0 : X_t = i} ⊂ [0, L_i], we deduce that i is transient.

λ < µ means that customers arrive at a slower rate than they can leave. Intuitively, this means that X_t will return to zero infinitely often. The mean of the return time can be estimated by comparison of the jump chain with a simple random walk with up probability λ/(λ + µ) < 1/2:

$$E_0(H_0) = E\left(\sum_{k=0}^{N_0-1} Z_k\right) = \sum_{n=2}^{\infty} P(N_0 = n)\, E\left(\left.\sum_{k=0}^{n-1} Z_k \,\right|\, N_0 = n\right) = \frac{1}{\lambda} + \sum_{n=2}^{\infty} P(N_0 = n)\, E\left(\sum_{k=1}^{n-1} Y_k\right) = \frac{1}{\lambda} + \sum_{n=2}^{\infty} P(N_0 = n)\, \frac{n-1}{\lambda+\mu} = \frac{1}{\lambda} + \frac{E_0(N_0) - 1}{\lambda+\mu} < \infty,$$

where Y_1, Y_2, … ∼ Exp(λ + µ). Therefore, 0 is positive recurrent. Since positive recurrence is a class property, all states are positive recurrent.

For λ = µ, the same argument shows that 0 is null recurrent, by comparison with the simple symmetric random walk.

Note in each case that the jump chain is not a simple random walk, but coincides with a simple random walk until it hits zero. This is enough to calculate E_0(N_0).

Example 48 Let λ ≥ 0 and µ ≥ 0. Consider a simple birth and death process with Q-matrix Q = (q_{nm})_{n,m∈N}, where q_{nn} = −n(λ + µ), q_{n,n+1} = nλ, q_{n,n−1} = nµ, and q_{nm} = 0 otherwise.


• If µ = 0 and λ = 0, then Q ≡ 0 and all states are absorbing, so the communicating classes are {n}, n ∈ N. They are all closed and positive recurrent.

• If µ = 0 and λ > 0, then 0 is still absorbing since q_{00} = 0, but otherwise n → m if and only if 1 ≤ n ≤ m. Again, the communicating classes are {n}, n ∈ N; {0} is closed and positive recurrent, but {n} is open and transient for all n ≥ 1, since the process will not return after the Exp(nλ) holding time.

• If µ > 0 and λ = 0, then 0 is still absorbing, and each {n}, n ≥ 1, is an open transient class.

• If µ > 0 and λ > 0, then 0 is still absorbing, and {1, 2, …} is an open and transient communicating class. It can be shown that the process, when starting from i ≥ 1, will be absorbed at 0 with probability 1 if λ ≤ µ, and that it will do so with a probability in (0, 1) if λ > µ.


Lecture 8

Convergence to equilibrium

Reading: Norris 3.5-3.8Further reading: Grimmett-Stirzaker 6.9; Ross 6.5-6.6

In Lecture 7 we studied the class structure of continuous-time Markov chains. We cansummarize the findings by saying that the state space can be decomposed into (disjoint)communicating classes

S =⋃

m≥1

Rm ∪⋃

m≥1

Tm,

where the (states in) Rm are recurrent, hence closed, and the Tm are transient, whetherclosed or not. This is the same as for discrete-time Markov chains, in fact equivalent tothe decomposition for the associated jump chain. Furthermore, each recurrent class iseither positive recurrent or null recurrent.

To understand equilibrium behaviour, one should look at each recurrent class separately. The complete picture can then be put together from its pieces on the separate classes. This is relevant in some applications, but not for the majority, and not for those we want to focus on here. We therefore only treat the case where we have only one class, which is recurrent. We called this case irreducible. The reason for this name is that we cannot further reduce the state space without changing the transition mechanisms. We will further focus on the positive recurrent case.

8.1 Invariant distributions

Note that for an initial distribution ν on S, X_0 ∼ ν, we have

$$P(X_t = j) = \sum_{i \in S} P(X_0 = i)\, P(X_t = j \mid X_0 = i) = (\nu P(t))_j,$$

where νP(t) is the product of the row vector ν with the matrix P(t), and we extract the jth component of the resulting row vector.

Definition 49 A distribution ξ on S is called invariant for a continuous-time Markov chain if ξP(t) = ξ for all t ≥ 0.


If we take an invariant distribution ξ as initial distribution, then X_t ∼ ξ for all t ≥ 0. We then say that X is in equilibrium.

Proposition 50 If S is finite, then ξ is invariant if and only if ξQ = 0.

Proof: If ξP(t) = ξ for all t ≥ 0, then by the forward equation

$$\xi Q = \xi P(t) Q = \xi P'(t) = \xi \lim_{h \to 0} \frac{P(t+h) - P(t)}{h} = \lim_{h \to 0} \frac{\xi P(t+h) - \xi P(t)}{h} = 0.$$

If ξQ = 0, we have

$$\xi P(t) = \xi P(0) + \xi \int_0^t P'(s)\, ds = \xi + \int_0^t \xi Q P(s)\, ds = \xi,$$

where we applied the backward equation. Here, also the integration is understood componentwise.

Interchanging limits/integrals and matrix multiplication is justified since S is finite. □

Fact 51 Let S be infinite, Q a Q-matrix and (P(t))_{t≥0} the transition matrices of the minimal continuous-time Markov chain associated with Q. Then ξQ = 0 if and only if ξP(t) = ξ for all t ≥ 0.

As a consequence, ξ can then only exist if X is non-explosive in the sense that P(T_∞ = ∞) = 1.

Fact 52 An irreducible (minimal) continuous-time Markov chain is positive recurrent if and only if it has an invariant distribution. An invariant distribution ξ can then be given by

$$\xi_i = \frac{1}{m_i \lambda_i}, \quad i \in S,$$

where m_i = E_i(H_i) is the mean return time to i and λ_i = −q_{ii} the holding rate in i.

The proof is quite technical and does not give further intuition. The analogous result for discrete chains holds and gives η_i = 1/E_i(N_i) as invariant distribution. The further factor λ_i occurs because a chain in stationarity is likely to be found in i if the return time is short and the holding time is long; both observations are reflected through the inverse proportionality to m_i and λ_i, respectively. Since this is a key result for both the Convergence Theorem and the Ergodic Theorem, the diligent reader may want to refer to Norris, Theorem 3.5.3.


Example 53 Consider the M/M/1 queue of Example 29. The equations ξQ = 0 are given by

$$-\lambda \xi_0 + \mu \xi_1 = 0, \qquad \lambda \xi_{i-1} - (\lambda+\mu)\xi_i + \mu \xi_{i+1} = 0, \quad i \ge 1.$$

This system of linear equations (for the unknowns ξ_i, i ∈ N) has a probability mass function as its solution if and only if λ < µ. It is given by the geometric probabilities

$$\xi_i = \left(\frac{\lambda}{\mu}\right)^i \left(1 - \frac{\lambda}{\mu}\right), \quad i \in \mathbb{N}.$$

By Fact 52, we can calculate E_i(H_i) = m_i = 1/(λ_i ξ_i). In particular, for i = 0, we have the length of a full cycle beginning and ending with an empty queue. Since the initial empty period has average length 1/λ, the busy period has mean length

$$E_0(H_0) - \frac{1}{\lambda} = \frac{1}{\lambda(1 - \lambda/\mu)} - \frac{1}{\lambda} = \frac{1}{\mu - \lambda}.$$

Note that this tends to infinity as λ ↑ µ.

8.2 Convergence to equilibrium

The convergence theorem is of central importance in applications, since it is often assumed that a system is in equilibrium. The convergence theorem is a justification for this assumption, since it means that a system only needs to have been running long enough to be (approximately) in equilibrium.

Theorem 54 Let X = (X_t)_{t≥0} be a (minimal) irreducible positive-recurrent continuous-time Markov chain, X_0 ∼ ν, and ξ an invariant distribution. Then

$$P(X_t = j) \to \xi_j \text{ as } t \to \infty, \quad \text{for all } j \in S.$$

This result can be deduced from the convergence result for discrete-time Markov chains by looking at the processes $Z_n^{(h)} = X_{nh}$, which are easily seen to be Markov chains with transition matrices P(h).

However, it is more instructive to see a (very elegant) direct argument, using the coupling method in continuous time.

Sketch of proof: Let X be the continuous-time Markov chain starting according to ν, and Y an independent continuous-time Markov chain with the same Q-matrix, but starting from the invariant distribution ξ. Choose i ∈ S and define T = inf{t ≥ 0 : (X_t, Y_t) = (i, i)}, the time they first meet (in i, to simplify the argument). A third process X̃ is constructed by X̃_t = X_t, t < T (following the ν-chain before T), and X̃_t = Y_t, t ≥ T (following the ξ-chain after T). The following three steps complete the proof:

1. the meeting time T is finite with probability 1 (this is because η_{ij} = ξ_i ξ_j is a stationary distribution for the bivariate process (X, Y), and the existence of a stationary distribution implies positive recurrence of (X, Y), by Fact 52);


2. the third chain X̃ has the same distribution as the ν-chain X;

3. the third chain (which eventually coincides with the ξ-chain Y) is asymptotically in equilibrium in the sense of the convergence statement in Theorem 54. □

Note that we obtain the uniqueness of the invariant distribution as a consequence, since the marginal distribution of a Markov chain starting from a second invariant distribution would also remain invariant and converge to ξ.

Theorem 55 (Ergodic theorem) In the setting of Theorem 54, with X_0 ∼ ν,

$$P\left( \frac{1}{t} \int_0^t 1_{\{X_s = i\}}\, ds \to \xi_i \text{ as } t \to \infty \right) = 1.$$

Proof: A proof using renewal theory is in assignment question A.5.4. □

We interpret this as follows. For any initial distribution, the long-term proportion of time spent in any state i approaches the invariant probability of this state. This result establishes a time-average analogue of the spatial average in Theorem 54. This is of great practical importance, since it allows us to observe the invariant distribution by looking at proportions of time over a long period. If we tried to observe the stationary distribution using Theorem 54, we would need many independent observations of the same system at a large time t to estimate ξ.
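The time-average statement can be watched in simulation. A sketch for a two-state chain with rates q01 = 1, q10 = 2, hence ξ_0 = 2/3 (this example is illustrative and not from the notes):

```python
import random

def occupation_fraction(t_max, seed=1):
    """Fraction of time in [0, t_max] that a two-state chain with rates
    q01 = 1, q10 = 2 spends in state 0.  By the Ergodic Theorem this
    approaches xi_0 = 2/3 as t_max grows."""
    rng = random.Random(seed)
    t, state, time_in_0 = 0.0, 0, 0.0
    while t < t_max:
        rate = 1.0 if state == 0 else 2.0      # holding rate of the state
        hold = min(rng.expovariate(rate), t_max - t)
        if state == 0:
            time_in_0 += hold
        t += hold
        state = 1 - state                      # jump to the other state
    return time_in_0 / t_max
```

A single long run gives an estimate of ξ_0; Theorem 54 alone would require many independent runs, as remarked above.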

8.3 Detailed balance equations and time reversal

Proposition 56 Consider a Q-matrix Q. If the detailed balance equations

$$\xi_i q_{ij} = \xi_j q_{ji}, \quad i, j \in S,$$

have a solution ξ = (ξ_i)_{i∈S} that is a probability mass function, then ξ is a stationary distribution.

Proof: Let ξ be such that all detailed balance equations hold. Then fix j ∈ S and sum the equations over i ∈ S to get

$$(\xi Q)_j = \sum_{i \in S} \xi_i q_{ij} = \sum_{i \in S} \xi_j q_{ji} = \xi_j \sum_{i \in S} q_{ji} = 0,$$

since the row sums of any Q-matrix vanish (see Remark 26). Therefore ξQ = 0, as required. □


Note that (in the case of finite #S = n), while ξQ = 0 is a set of as many equations as unknowns, n, the detailed balance equations form a set of n(n − 1)/2 different equations for n unknowns, so one would not expect solutions in general. However, if the Q-matrix is sparse, i.e. contains lots of zeros, the corresponding equations will be automatically satisfied, and these are the cases where we will successfully apply detailed balance equations.

The class of continuous-time Markov chains for which the detailed balance equations have solutions can be studied further. They also arise naturally in the context of time reversal, a tool that may seem of little practical relevance, since our world lives forward in time, but sometimes it is useful to model an unknown past by a random process. Sometimes one can identify a duality relationship between two different processes, both forward in time, which reveals that the behaviour of one is the same as the behaviour of the time reversal of the other. This can allow us to translate known results for one into interesting new results for the other.

Proposition 57 Let X be an irreducible positive recurrent (minimal) continuous-time Markov chain with Q-matrix Q, starting from the invariant distribution ξ. Let t > 0 be a fixed time and set X̂_s = X_{(t−s)−}. Then the process X̂ is a continuous-time Markov chain with Q-matrix Q̂ given by ξ_j q̂_{ji} = ξ_i q_{ij}.

Proof: First note that Q̂ has the properties of a Q-matrix, in being nonnegative off the diagonal and satisfying

$$\sum_{i \ne j} \hat q_{ji} = \sum_{i \ne j} \frac{\xi_i}{\xi_j}\, q_{ij} = -q_{jj} = -\hat q_{jj}$$

by the invariance of ξ. Similarly, we define ξ_j p̂_{ji}(t) = ξ_i p_{ij}(t) and see that the P̂(t) have the properties of transition matrices. In fact, the transposed forward equation P′(t) = P(t)Q yields P̂′(t) = Q̂P̂(t), the backward equation for P̂(t). Now X̂ is a continuous-time Markov chain with transition probabilities p̂_{ij}(t) since

$$P_\xi(\hat X_{t_0} = i_0, \ldots, \hat X_{t_n} = i_n) = P_\xi(X_{t-t_n} = i_n, \ldots, X_{t-t_0} = i_0) = \xi_{i_n} \prod_{k=1}^n p_{i_k, i_{k-1}}(t_k - t_{k-1}) = \xi_{i_0} \prod_{k=1}^n \hat p_{i_{k-1}, i_k}(t_k - t_{k-1}).$$

From this we can deduce the Markov property. More importantly, the finite-dimensional distributions of X̂ are those of a continuous-time Markov chain with transition matrices P̂(t). Together with the path structure, Remark 36 implies that X̂ is a Markov chain with Q-matrix Q̂. □

If Q̂ = Q, X is called reversible. It is evident from the definition of Q̂ that ξ then satisfies the detailed balance equations ξ_i q_{ij} = ξ_j q_{ji}, i, j ∈ S.


8.4 Erlang’s formula

Example 58 Consider the birth-death process with birth rates q_{i,i+1} = λ_i and death rates q_{i,i−1} = µ_i, q_{ii} = −λ_i − µ_i, i ∈ N, all other entries zero (and also µ_0 = 0). (This is standard notation for this type of process, but note that λ_i = q_{i,i+1}; we will not use the earlier notation λ_i = −q_{ii}.)

We recognise birth processes and queueing systems as special cases.

To calculate invariant distributions, we solve ξQ = 0, i.e.

$$\xi_1 \mu_1 - \xi_0 \lambda_0 = 0 \quad\text{and}\quad \xi_{n+1}\mu_{n+1} - \xi_n(\lambda_n + \mu_n) + \xi_{n-1}\lambda_{n-1} = 0, \quad n \ge 1,$$

or, more easily, the detailed balance equations

$$\xi_i \lambda_i = \xi_{i+1} \mu_{i+1},$$

giving

$$\xi_n = \frac{\lambda_{n-1} \cdots \lambda_0}{\mu_n \cdots \mu_1}\, \xi_0,$$

where ξ_0 is determined by the requirement that ξ be a probability mass function, i.e.

$$\xi_0 = \frac{1}{S}, \quad\text{where } S = 1 + \sum_{n=1}^{\infty} \frac{\lambda_{n-1} \cdots \lambda_0}{\mu_n \cdots \mu_1},$$

provided S is finite.

If S is infinite, then there does not exist an invariant distribution. This cannot be deduced from the detailed balance equations, but can be argued directly by showing that ξQ = 0 does not have a solution that is a probability mass function. It does not necessarily mean explosion in finite time, but includes all simple birth processes, since they model growing populations and cannot be in equilibrium. By Fact 52, it means that X is then null recurrent or transient.

On the other hand, if λ_0 = 0, as in many population models, then the invariant distribution is concentrated at 0, i.e. ξ_0 = 1 and ξ_n = 0 for all n ≥ 1.

Many special cases can be given more explicitly. E.g., if λ_n = λ, n ≥ 0, and µ_n = nµ, we get

$$\xi_n = \frac{(\lambda/\mu)^n}{n!}\, e^{-\lambda/\mu}.$$

You recognise the Poisson probabilities. What is this model? We can give two different interpretations, both of which tie in with models that we have studied. First, as a population model, λ_n = λ means that arrivals occur according to a Poisson process; this can model immigration. µ_n = nµ is obtained from as many Exp(µ) clocks as there are individuals in the population, i.e. independent Exp(µ) lifetimes for all individuals. Second, as a queueing model with arrivals according to a Poisson process, each individual leaves the system after an Exp(µ) time, no matter how many other people are in the system – this can be obtained from infinitely many servers working at rate µ.
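The recursion ξ_{n+1} = ξ_n λ_n/µ_{n+1} from the detailed balance equations gives a generic numerical route to these invariant distributions; a sketch checking the infinite-server case against the Poisson probabilities (function names are illustrative, and the state space is truncated where essentially all mass lies):

```python
from math import exp, factorial

def birth_death_invariant(lam_rates, mu_rates, n_max):
    """Invariant probabilities of a birth-death process from the detailed
    balance recursion xi_{n+1} = xi_n * lam_n / mu_{n+1} of Example 58,
    normalised over the states 0..n_max."""
    xi = [1.0]
    for n in range(n_max):
        xi.append(xi[-1] * lam_rates(n) / mu_rates(n + 1))
    total = sum(xi)
    return [x / total for x in xi]

# Immigration/infinite-server model: lam_n = lam, mu_n = n*mu
# should reproduce Poisson(lam/mu) probabilities.
lam, mu = 3.0, 2.0
xi = birth_death_invariant(lambda n: lam, lambda n: n * mu, 60)
poisson = [exp(-lam / mu) * (lam / mu) ** n / factorial(n) for n in range(61)]
```

For an M/M/1-type choice of rates the same helper reproduces the geometric distribution of Example 53.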


Lecture 9

The Strong Law of Large Numbers

Reading: Grimmett-Stirzaker 7.2; David Williams “Probability with Martingales” 7.2
Further reading: Grimmett-Stirzaker 7.1, 7.3-7.5

With the Convergence Theorem (Theorem 54) and the Ergodic Theorem (Theorem 55) we have two very different statements of convergence of something to a stationary distribution. We are looking at a recurrent Markov chain (Xt)t≥0, i.e. one that visits every state at arbitrarily large times, so clearly Xt itself does not converge as t → ∞. In this lecture, we look more closely at the different types of convergence and develop methods to show the so-called almost sure convergence, of which the statement of the Ergodic Theorem is an example.

9.1 Modes of convergence

Definition 59 Let Xn, n ≥ 1, and X be random variables. Then we define

1. Xn → X in probability, if for all ε > 0, P(|Xn − X| > ε) → 0 as n → ∞.

2. Xn → X in distribution, if P(Xn ≤ x) → P(X ≤ x) as n → ∞, for all x ∈ R at which x ↦ P(X ≤ x) is continuous.

3. Xn → X in L1, if E(|Xn|) < ∞ for all n ≥ 1 and E(|Xn − X|) → 0 as n → ∞.

4. Xn → X almost surely (a.s.), if P(Xn → X as n → ∞) = 1.

Almost sure convergence is the notion that we will study in more detail here. It helps to consider random variables as functions Xn : Ω → R on a sample space Ω, or at least as functions of a common, typically infinite, family of independent random variables. What is different here from previous parts of the course (except for the Ergodic Theorem, which we have yet to inspect more thoroughly) is that we want to calculate probabilities that fundamentally depend on an infinite number of random variables. So far, we have been able to revert to events depending on only finitely many random variables by conditioning. This will not work here.


Let us start by recalling the definition of convergence of sequences: as n → ∞,

xn → x ⇐⇒ ∀m≥1 ∃nm≥1 ∀n≥nm |xn − x| < 1/m.

If we want to consider all sequences (xn)n≥1 of possible values of the random variables (Xn)n≥1, then

nm = inf{k ≥ 1 : ∀n≥k |xn − x| < 1/m} ∈ N ∪ {∞}

will vary as a function of the sequence (xn)n≥1, and so it will become a random variable

Nm = inf{k ≥ 1 : ∀n≥k |Xn − X| < 1/m} ∈ N ∪ {∞}

as a function of (Xn)n≥1. This definition of Nm permits us to write

P(Xn → X) = P(∀m≥1 Nm < ∞).

This will occasionally help when we are given almost sure convergence, but it is not much use when we want to prove almost sure convergence. To prove almost sure convergence, we can transform as follows:

P(Xn → X) = P(∀m≥1∃N≥1∀n≥N |Xn −X| < 1/m) = 1

⇐⇒ P(∃m≥1∀N≥1∃n≥N |Xn −X| ≥ 1/m) = 0.

We are used to events such as Am,n = {|Xn − X| ≥ 1/m}, and we understand events as subsets of Ω, or loosely identify this event with the set of all ((xk)k≥1, x) for which |xn − x| ≥ 1/m. This is useful, because we can now translate ∃m≥1∀N≥1∃n≥N into set operations and write

P(∪m≥1 ∩N≥1 ∪n≥N Am,n) = 0.

This event can only have zero probability if all events ∩N≥1 ∪n≥N Am,n, m ≥ 1, have zero probability (formally, this follows from the sigma-additivity of the measure P). The Borel-Cantelli lemma will give a criterion for this.

Proposition 60 The following implications hold:

    Xn → X almost surely
            ⇓
    Xn → X in probability ⇒ Xn → X in distribution
            ⇑
    Xn → X in L1 ⇒ E(Xn) → E(X)

No other implications hold in general.

Proof: Most of this is Part A material. Some counterexamples are on Assignment 5. It remains to prove that almost sure convergence implies convergence in probability. Suppose Xn → X almost surely; then the above considerations yield P(∀m≥1 Nm < ∞) = 1, i.e. P(Nk < ∞) ≥ P(∀m≥1 Nm < ∞) = 1 for all k ≥ 1.

Now fix ε > 0. Choose m ≥ 1 such that 1/m < ε. Then clearly |Xn − X| > ε > 1/m implies Nm > n, so that

P(|Xn −X| > ε) ≤ P(Nm > n) → P(Nm = ∞) = 0,

as n → ∞, for any ε > 0. Therefore, Xn → X in probability. □


9.2 The first Borel-Cantelli lemma

Let us now work on a sample space Ω. It is safe to think of Ω = R^N × R and ω ∈ Ω as ω = ((xn)n≥1, x), the set of possible outcomes for an infinite family of random variables (and a limiting variable).

The Borel-Cantelli lemmas are useful to prove almost sure results. Particularly limiting results often require certain events to happen infinitely often (i.o.) or only a finite number of times. Logically, this can be expressed as follows. Consider events An ⊂ Ω, n ≥ 1. Then

ω ∈ {An i.o.} ⇐⇒ ∀n≥1 ∃m≥n ω ∈ Am ⇐⇒ ω ∈ ⋂n≥1 ⋃m≥n Am.

Lemma 61 (Borel-Cantelli (first lemma)) Let A = ⋂n≥1 ⋃m≥n Am be the event that infinitely many of the events An occur. Then

∑n≥1 P(An) < ∞ ⇒ P(A) = 0.

Proof: We have A ⊂ ⋃m≥n Am for all n ≥ 1, and so

P(A) ≤ P(⋃m≥n Am) ≤ ∑m≥n P(Am) → 0 as n → ∞,

whenever ∑n≥1 P(An) < ∞. □

9.3 The Strong Law of Large Numbers

Theorem 62 Let (Xn)n≥1 be a sequence of independent and identically distributed (iid) random variables with E(X1^4) < ∞ and E(X1) = µ. Then

Sn/n := (1/n) ∑_{i=1}^{n} Xi → µ almost surely.

Fact 63 Theorem 62 remains valid without the assumption E(X1^4) < ∞, just assuming E(|X1|) < ∞.

The proof of the general result is hard, but under the extra moment condition E(X1^4) < ∞ there is a nice proof.

Lemma 64 In the situation of Theorem 62, there is a constant K < ∞ such that for all n ≥ 0

E((Sn − nµ)^4) ≤ Kn^2.


Proof: Let Zk = Xk − µ and Tn = Z1 + . . . + Zn = Sn − nµ. Then

E(Tn^4) = E((∑_{i=1}^{n} Zi)^4) = nE(Z1^4) + 3n(n − 1)E(Z1^2 Z2^2) ≤ Kn^2

by expanding the fourth power and noting that most terms vanish, such as

E(Z1 Z2^3) = E(Z1)E(Z2^3) = 0.

Here K was chosen appropriately, say K = 4 max{E(Z1^4), (E(Z1^2))^2}. □

Proof of Theorem 62: By the lemma,

E((Sn/n − µ)^4) ≤ Kn^{−2}.

Now, by Tonelli’s theorem,

E(∑_{n≥1} (Sn/n − µ)^4) = ∑_{n≥1} E((Sn/n − µ)^4) < ∞  ⇒  ∑_{n≥1} (Sn/n − µ)^4 < ∞ a.s.

But if a series converges, the underlying sequence converges to zero, and so

(Sn/n − µ)^4 → 0 almost surely  ⇒  Sn/n → µ almost surely. □

This proof did not use the Borel-Cantelli lemma, but we can also conclude by the Borel-Cantelli lemma:

Proof of Theorem 62: We know by Markov’s inequality that

P((1/n)|Sn − nµ| ≥ n^{−γ}) ≤ E((Sn/n − µ)^4) / n^{−4γ} = Kn^{−2+4γ}.

For γ ∈ (0, 1/4), define

An = {(1/n)|Sn − nµ| ≥ n^{−γ}}.  Then  ∑_{n≥1} P(An) < ∞  ⇒  P(A) = 0

by the first Borel-Cantelli lemma, where A = ⋂n≥1 ⋃m≥n Am. But now, the event A^c happens if and only if

∃N ∀n≥N |Sn/n − µ| < n^{−γ},  which implies  Sn/n → µ. □
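Either proof can be sanity-checked by simulation. A minimal sketch (not in the original notes; Exp(1) samples are an arbitrary choice with all moments finite, so the fourth-moment hypothesis of Theorem 62 holds):

```python
import random

random.seed(0)

def running_means(n, sampler):
    # successive sample means S_k / k, k = 1, ..., n
    s, out = 0.0, []
    for k in range(1, n + 1):
        s += sampler()
        out.append(s / k)
    return out

means = running_means(200_000, lambda: random.expovariate(1.0))
print(means[-1])  # close to mu = E(X_1) = 1
```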


9.4 The second Borel-Cantelli lemma

We won’t need the second Borel-Cantelli lemma in this course, but include it for completeness.

Lemma 65 (Borel-Cantelli (second lemma)) Let A = ⋂n≥1 ⋃m≥n Am be the event that infinitely many of the events An occur. Then

∑n≥1 P(An) = ∞ and (An)n≥1 independent ⇒ P(A) = 1.

Proof: The conclusion is equivalent to P(A^c) = 0. By de Morgan’s laws,

A^c = ⋃n≥1 ⋂m≥n Am^c.

However,

P(⋂m≥n Am^c) = lim_{r→∞} P(⋂_{m=n}^{r} Am^c) = ∏m≥n (1 − P(Am)) ≤ ∏m≥n exp(−P(Am)) = exp(−∑m≥n P(Am)) = 0

whenever ∑n≥1 P(An) = ∞. Thus

P(A^c) = lim_{n→∞} P(⋂m≥n Am^c) = 0. □

As a technical detail: to justify some of the limiting probabilities, we use “continuity of P” along increasing and decreasing sequences of events, which follows from the sigma-additivity of P, cf. Grimmett-Stirzaker, Lemma 1.3.(5).

9.5 Examples

Example 66 (Arrival times in Poisson process) A Poisson process has independent and identically distributed inter-arrival times (Zn)n≥0 with Zn ∼ Exp(λ). We denoted the partial sums (arrival times) by Tn = Z0 + . . . + Zn−1. The Strong Law of Large Numbers yields

Tn/n → 1/λ almost surely, as n → ∞.


Example 67 (Return times of Markov chains) For a positive-recurrent discrete-time Markov chain (Mn)n≥0, we denoted by

Ni = Ni^{(1)} = inf{n > 0 : Mn = i},  Ni^{(m+1)} = inf{n > Ni^{(m)} : Mn = i},  m ∈ N,

the successive return times to i. By the strong Markov property, the random variables Ni^{(m+1)} − Ni^{(m)}, m ≥ 1, are independent and identically distributed. If we define Ni^{(0)} = 0 and start from i, then this holds for m ≥ 0. The Strong Law of Large Numbers yields

Ni^{(m)}/m → Ei(Ni) almost surely, as m → ∞.

Similarly, in continuous time, for

Hi = Hi^{(1)} = inf{t ≥ T1 : Xt = i},  Hi^{(m)} = T_{Ni^{(m)}},  m ∈ N,

we get

Hi^{(m)}/m → Ei(Hi) = mi almost surely, as m → ∞.

Example 68 (Empirical distributions) If (Yn)n≥1 is an infinite sample (independent and identically distributed random variables) from a discrete distribution ν on S, then the random variables Bn^{(i)} = 1{Yn=i}, n ≥ 1, are also independent and identically distributed for each fixed i ∈ S, as functions of independent variables. The Strong Law of Large Numbers yields

νi^{(n)} = #{k = 1, . . . , n : Yk = i}/n = (B1^{(i)} + . . . + Bn^{(i)})/n → E(B1^{(i)}) = P(Y1 = i) = νi

almost surely, as n → ∞. The probability mass function ν^{(n)} is called the empirical distribution. It lists relative frequencies in the sample and, for a specific realisation, can serve as an approximation of the true distribution. In applications of statistics, it is the sample distribution associated with a population distribution. The result that empirical distributions converge to the true distribution is true uniformly in i and in higher generality; it is usually referred to as the Glivenko-Cantelli theorem.
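The convergence of empirical distributions is easy to watch numerically. A sketch (not in the original notes; the three-point distribution ν is a made-up example):

```python
import random
from collections import Counter

random.seed(2)
nu = {"a": 0.5, "b": 0.3, "c": 0.2}          # a made-up true distribution on S
states = list(nu)
weights = [nu[i] for i in states]
n = 100_000
sample = random.choices(states, weights, k=n)
counts = Counter(sample)
empirical = {i: counts[i] / n for i in nu}   # nu^(n), the empirical distribution
max_err = max(abs(empirical[i] - nu[i]) for i in nu)
print(max_err)  # shrinks (a.s.) as n grows
```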

Remark 69 (Discrete ergodic theorem) If (Mn)n≥0 is a positive-recurrent discrete-time Markov chain, the Ergodic Theorem is a statement very similar to the example of empirical distributions:

#{k = 0, . . . , n − 1 : Mk = i}/n → Pη(M0 = i) = ηi almost surely, as n → ∞,

for a stationary distribution η, but of course the Mn, n ≥ 0, are not independent (in general). Therefore, we need to work a bit harder to deduce the Ergodic Theorem from the Strong Law of Large Numbers.


Lecture 10

Renewal processes and equations

Reading: Grimmett-Stirzaker 10.1-10.2; Ross 7.1-7.3

10.1 Motivation and definition

So far, the topic has been continuous-time Markov chains, and we’ve introduced them as discrete-time Markov chains with exponential holding times. In this setting we have a theory very much similar to the discrete-time theory, with independence of future and past given the present (Markov property), transition probabilities, invariant distributions, class structure, convergence to equilibrium, ergodic theorem, time reversal, detailed balance etc. A few odd features can occur, mainly due to explosion.

These parallels are due to the exponential holding times and their lack of memory property, which is the key to the Markov property in continuous time. In practice, this assumption is often not reasonable.

Example 70 Suppose that you count the changing of batteries for an electrical device. Given that the battery has been in use for time t, is its residual lifetime distributed as its total lifetime? We would assume this if we were modelling with a Poisson process.

We may wish to replace the exponential distribution by other distributions, e.g. one that cannot take arbitrarily large values or, for other applications, one that can produce clustering effects (many short holding times separated by significantly longer ones). We started the discussion of continuous-time Markov chains with birth processes as generalised Poisson processes. Similarly, we start here by generalising the Poisson process to have non-exponential but independent identically distributed inter-arrival times.

Definition 71 Let (Zn)n≥0 be a sequence of independent identically distributed positive random variables and let Tn = ∑_{k=0}^{n−1} Zk, n ≥ 1, be the partial sums. Then the process X = (Xt)t≥0 defined by

Xt = #{n ≥ 1 : Tn ≤ t}

is called a renewal process. The common distribution of the Zn, n ≥ 0, is called the inter-arrival distribution.


Example 72 If (Yt)t≥0 is a continuous-time Markov chain with Y0 = i, then the Zn = Hi^{(n+1)} − Hi^{(n)}, the times between successive returns to i by Y, are independent and identically distributed (by the strong Markov property). The associated counting process

Xt = #{n ≥ 1 : Hi^{(n)} ≤ t},

counting the visits to i, is thus a renewal process.

10.2 The renewal function

Definition 73 The function t ↦ m(t) := E(Xt) is called the renewal function.

It plays an important role in renewal theory. Remember that for Zn ∼ Exp(λ) we had Xt ∼ Poi(λt) and in particular m(t) = E(Xt) = λt.

To calculate the renewal function for general renewal processes, we should investigate the distribution of Xt. Note that, as for birth processes,

Xt = k ⇐⇒ Tk ≤ t < Tk+1,

so that we can express

P(Xt = k) = P(Tk ≤ t < Tk+1) = P(Tk ≤ t) − P(Tk+1 ≤ t)

in terms of the distributions of Tk = Z0 + . . . + Zk−1, k ≥ 1.

Recall that for two independent continuous random variables S and T with densities f and g, the random variable S + T has density

f and g, the random variable S + T has density

(f ∗ g)(u) =

∫ ∞

−∞

f(u− t)g(t)dt, u ∈ R,

the convolution (product) of f and g, and if S ≥ 0 and T ≥ 0, then

(f ∗ g)(u) =

∫ u

0

f(u− t)g(t)dt, u ≥ 0.

It is not hard to check that the convolution product is symmetric, associative and distributes over sums of functions. While the first two of these properties translate as S + T = T + S and (S + T) + U = S + (T + U) for associated random variables, the third property has no such meaning, since sums of densities are no longer probability densities. However, the definition of the convolution product makes sense for general nonnegative integrable functions, and we will meet other relevant examples soon. We can define convolution powers f^{∗(1)} = f and f^{∗(k+1)} = f ∗ f^{∗(k)}, k ≥ 1. Then

P(Tk ≤ t) = ∫_0^t fTk(s)ds = ∫_0^t f^{∗(k)}(s)ds,

if the Zn, n ≥ 0, are continuous with density f.
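These convolution powers are straightforward to approximate on a grid. A rough numerical sketch (not in the original notes; the midpoint grid and step size are my choices; Exp(1) interarrivals are used because T2 = Z0 + Z1 is known to have the Gamma(2, 1) density u e^{−u}, so the approximation can be checked):

```python
import math

def convolve(f, g, h):
    # Riemann-sum approximation of (f*g)(u) = int_0^u f(u-t) g(t) dt
    # for densities sampled on the midpoint grid u_i = (i + 0.5) * h
    return [sum(f[i - j] * g[j] for j in range(i + 1)) * h for i in range(len(f))]

h, n = 0.01, 500                                   # grid on [0, 5)
grid = [(i + 0.5) * h for i in range(n)]
f = [math.exp(-u) for u in grid]                   # Exp(1) density
f2 = convolve(f, f, h)                             # approximates f*(2), the density of T_2
err = max(abs(a - u * math.exp(-u)) for a, u in zip(f2, grid))
print(err)  # O(h) discretisation error
```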


Proposition 74 Let X be a renewal process with interarrival density f. Then m(t) = E(Xt) is differentiable in the weak sense that it is the integral function of

m′(s) := ∑_{k=1}^{∞} f^{∗(k)}(s).

Lemma 75 Let X be an N-valued random variable. Then E(X) = ∑k≥1 P(X ≥ k).

Proof: We use Tonelli’s Theorem:

∑k≥1 P(X ≥ k) = ∑k≥1 ∑j≥k P(X = j) = ∑j≥1 ∑_{k=1}^{j} P(X = j) = ∑j≥0 jP(X = j) = E(X). □

Proof of Proposition 74: Let us integrate ∑_{k=1}^{∞} f^{∗(k)}(s) using Tonelli’s Theorem:

∫_0^t ∑_{k=1}^{∞} f^{∗(k)}(s)ds = ∑_{k=1}^{∞} ∫_0^t f^{∗(k)}(s)ds = ∑_{k=1}^{∞} P(Tk ≤ t) = ∑_{k=1}^{∞} P(Xt ≥ k) = E(Xt) = m(t). □

10.3 The renewal equation

For continuous-time Markov chains, conditioning on the first transition time was a powerful tool. We can do this here and get what is called the renewal equation.

Proposition 76 Let X be a renewal process with interarrival density f. Then m(t) = E(Xt) is the unique (locally bounded) solution of

m(t) = F(t) + ∫_0^t m(t − s)f(s)ds,  i.e.  m = F + f ∗ m,

where F(t) = ∫_0^t f(s)ds = P(Z1 ≤ t).

Proof: Conditioning on the first arrival will involve the process X̃u = X_{T1+u}, u ≥ 0. Note that X̃0 = 1 and that X̃ − 1 is a renewal process with interarrival times Z̃n = Zn+1, n ≥ 0, independent of T1. Therefore, E(Xt | T1 = s) = E(X̃_{t−s}) = 1 + m(t − s) for s ≤ t, and Xt = 0 on {T1 > t}, so

E(Xt) = ∫_0^∞ f(s)E(Xt | T1 = s)ds = ∫_0^t f(s)(1 + m(t − s))ds = F(t) + ∫_0^t f(s)m(t − s)ds.


For uniqueness, suppose that also ℓ = F + f ∗ ℓ; then α = ℓ − m is locally bounded and satisfies α = f ∗ α = α ∗ f. Iteration gives α = α ∗ f^{∗(k)} for all k ≥ 1 and, summing over k, gives for the right-hand side something finite:

|(∑k≥1 α ∗ f^{∗(k)})(t)| = |(α ∗ ∑k≥1 f^{∗(k)})(t)| = |(α ∗ m′)(t)| = |∫_0^t α(t − s)m′(s)ds| ≤ (sup_{u∈[0,t]} |α(u)|) m(t) < ∞,

but the left-hand side is infinite unless α(t) = 0. Therefore ℓ(t) = m(t), for all t ≥ 0. □

Example 77 We can express m as follows: m = F + F ∗ ∑k≥1 f^{∗(k)}. Indeed, we check that ℓ = F + F ∗ ∑k≥1 f^{∗(k)} satisfies the renewal equation:

F + f ∗ ℓ = F + F ∗ f + F ∗ ∑j≥2 f^{∗(j)} = F + F ∗ ∑k≥1 f^{∗(k)} = ℓ,

just using properties of the convolution product. By Proposition 76, ℓ = m.
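The renewal equation also gives a practical way to compute m(t): discretise m = F + f ∗ m and solve forward in t. A sketch (not in the original notes; the trapezoidal discretisation is my choice; Exp(1) interarrivals serve as a test case because there m(t) = λt = t is known exactly):

```python
import math

# Numerically solve m = F + f*m on the grid t_i = i*h by the trapezoidal rule.
h, n = 0.01, 501
f = [math.exp(-i * h) for i in range(n)]        # interarrival density f(t) = e^{-t}
F = [1.0 - math.exp(-i * h) for i in range(n)]  # F(t) = P(Z_1 <= t)
m = [0.0] * n
for i in range(n):
    interior = sum(f[j] * m[i - j] for j in range(1, i)) * h
    # endpoint terms of the trapezoid: j = 0 gives (h/2) f(0) m(t_i),
    # absorbed implicitly into m[i]; j = i gives (h/2) f(t_i) m(0) = 0
    m[i] = (F[i] + interior) / (1.0 - 0.5 * h * f[0])
print(m[500])  # approximates m(5.0) = 5 up to discretisation error
```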

Unlike Poisson processes, general renewal processes do not have a linear renewal function, but it will be asymptotically linear (Elementary Renewal Theorem, as we will see). In fact, renewal functions are in one-to-one correspondence with interarrival distributions – we do not prove this, but it should not be too surprising given that m = F + f ∗ m is almost symmetric in f and m. Unlike the Poisson process, increments of general renewal processes are neither stationary (unless we change the distribution of Z0 in a clever way, as we will see) nor independent. Some of the important results in renewal theory are asymptotic results.

These asymptotic results will, in particular, allow us to prove the Ergodic Theorem for Markov chains.

10.4 Strong Law and Central Limit Theorem of renewal theory

Theorem 78 (Strong Law of renewal theory) Let X be a renewal process with mean interarrival time µ ∈ (0, ∞). Then

Xt/t → 1/µ almost surely, as t → ∞.

Proof: Note that X is constant on [Tn, Tn+1) for all n ≥ 0, and therefore constant on [T_{Xt}, T_{Xt+1}) ∋ t. Therefore, as soon as Xt > 0,

T_{Xt}/Xt ≤ t/Xt < T_{Xt+1}/Xt = (T_{Xt+1}/(Xt + 1)) × ((Xt + 1)/Xt).


Now P(Xt → ∞) = 1, since X∞ ≤ n ⇐⇒ Tn+1 = ∞, which is absurd, since Tn+1 = Z0 + . . . + Zn is a finite sum of finite random variables. Therefore, we conclude from the Strong Law of Large Numbers for Tn that

T_{Xt}/Xt → µ almost surely, as t → ∞.

Therefore, if Xt → ∞ and Tn/n → µ, then the sandwich above gives

µ ≤ lim inf_{t→∞} t/Xt ≤ lim sup_{t→∞} t/Xt ≤ µ,

but this means P(Xt/t → 1/µ) ≥ P(Xt → ∞, Tn/n → µ) = 1, as required. □

Try to do this proof for convergence in probability. The nasty ε expressions are not very useful in this context, and the proof is very much harder. But we can now deduce a corresponding Weak Law of Renewal Theory, because almost sure convergence implies convergence in probability.
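A simulation sketch of the strong law (not in the original notes; Uniform(0, 2) interarrival times are an arbitrary choice with µ = 1):

```python
import random

random.seed(3)

def renewal_count(t, sampler):
    # X_t = #{n >= 1 : T_n <= t}
    s, count = 0.0, 0
    while True:
        s += sampler()
        if s > t:
            return count
        count += 1

t = 50_000.0
x = renewal_count(t, lambda: random.uniform(0.0, 2.0))  # mean interarrival mu = 1
print(x / t)  # close to 1/mu = 1
```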

We also have a Central Limit Theorem:

Theorem 79 (Central Limit Theorem of Renewal Theory) Let X = (Xt)t≥0 be a renewal process whose interarrival times (Yn)n≥0 satisfy 0 < σ² = Var(Y1) < ∞ and µ = E(Y1). Then

(Xt − t/µ) / √(tσ²/µ³) → N(0, 1) in distribution, as t → ∞.

The proof is not difficult and left as an exercise on Assignment 5.

10.5 The elementary renewal theorem

Theorem 80 Let X be a renewal process with mean interarrival time µ and m(t) = E(Xt). Then

m(t)/t = E(Xt)/t → 1/µ as t → ∞.

Note that this does not follow easily from the strong law of renewal theory, since almost sure convergence does not imply convergence of means (cf. Proposition 60; see also the counterexample on Assignment 5). In fact, the proof is longer and not examinable: we start with a lemma.

Lemma 81 For a renewal process X with arrival times (Tn)n≥1, we have

E(T_{Xt+1}) = µ(m(t) + 1), where m(t) = E(Xt), µ = E(T1).


This ought to be true, because T_{Xt+1} is the sum of Xt + 1 interarrival times, each with mean µ. Taking expectations, we should get m(t) + 1 times µ. However, if we condition on Xt, we have to know the distribution of the residual interarrival time after t, and without lack of memory, we are stuck.

Proof: Let us do a one-step analysis on the quantity of interest g(t) = E(T_{Xt+1}):

g(t) = ∫_0^∞ E(T_{Xt+1} | T1 = s)f(s)ds = ∫_0^t (s + g(t − s))f(s)ds + ∫_t^∞ sf(s)ds = µ + (g ∗ f)(t).

This is almost the renewal equation. In fact, g1(t) = g(t)/µ − 1 satisfies the renewal equation

g1(t) = (1/µ) ∫_0^t g(t − s)f(s)ds = ∫_0^t (g1(t − s) + 1)f(s)ds = F(t) + (g1 ∗ f)(t),

and, by Proposition 76, g1(t) = m(t), i.e. g(t) = µ(1 + m(t)), as required. □

Proof of Theorem 80: Clearly t < E(T_{Xt+1}) = µ(m(t) + 1) gives the lower bound

lim inf_{t→∞} m(t)/t ≥ 1/µ.

For the upper bound we use a truncation argument and introduce

Z̃j = Zj ∧ a = Zj if Zj < a, and a if Zj ≥ a,

with associated renewal process X̃ and renewal function m̃(t) = E(X̃t). Now Z̃j ≤ Zj for all j ≥ 0 implies X̃t ≥ Xt for all t ≥ 0, hence m̃(t) ≥ m(t). Putting things together, we get from the lemma again, with µ̃ = E(Z̃1),

t ≥ E(T̃_{X̃t}) = E(T̃_{X̃t+1}) − E(Z̃_{X̃t}) = µ̃(m̃(t) + 1) − E(Z̃_{X̃t}) ≥ µ̃(m̃(t) + 1) − a.

Therefore

m(t)/t ≤ m̃(t)/t ≤ 1/µ̃ + (a − µ̃)/(µ̃t),

so that

lim sup_{t→∞} m(t)/t ≤ 1/µ̃.

Now µ̃ = E(Z̃1) = E(Z1 ∧ a) → E(Z1) = µ as a → ∞ (by monotone convergence). Therefore

lim sup_{t→∞} m(t)/t ≤ 1/µ. □

Note that truncation was necessary to get E(Z̃_{X̃t}) ≤ a. It would have been enough if we had E(Z_{Xt}) = E(Z1) = µ, but this is not true. Look at the Poisson process as an example. We know that the residual lifetime alone already has mean µ = 1/λ, but there is also the part of Z_{Xt} before time t. We will explore this in Lecture 11 when we discuss residual lifetimes in renewal theory.


Lecture 11

Excess life and stationarity

Reading: Grimmett-Stirzaker 10.3-10.4; Ross 7.7

So far, we have studied the behaviour of one-dimensional marginals Xt:

Xt/t → 1/µ,   E(Xt)/t → 1/µ,   (Xt − t/µ)/√(tσ²/µ³) → N(0, 1).

For Poisson processes we also studied finite-dimensional marginals and described the joint distributions of

Xt, Xt+s − Xt, . . . : stationary, independent increments.

In this lecture, we will make some progress with such a programme for renewal processes.

11.1 The renewal property

To begin with, let us study the post-t process of a renewal process X, i.e. (Xt+s − Xt)s≥0. For fixed t, this is not a renewal process in the strict sense, but for certain random t = T we have:

Proposition 82 Let X be a renewal process and Ti = inf{t ≥ 0 : Xt = i}. Then (Xr)r≤Ti and (X_{Ti+s} − X_{Ti})s≥0 are independent, and (X_{Ti+s} − X_{Ti})s≥0 has the same distribution as X.

Proof: The proof is the same as (actually easier than) the proof of the strong Markov property of birth processes at Ti, cf. Exercise A.2.3(a). We identify the interarrival times Z̃n = Zi+n, n ≥ 0, independent of Z0, . . . , Zi−1. Here, as for the special case of Poisson processes, we describe X̃ = (X_{Ti+s} − X_{Ti})s≥0 rather than (X_{Ti+s})s≥0, since the former is a renewal process (in the strict sense) whereas the latter starts at i. □

Here are two simple examples that show why this cannot be extended to fixed times t = T.


Example 83 If the interarrival times are constant, say P(Zn = 3) = 1, then X̃ = (Xt+s − Xt)s≥0 has a first arrival time Z̃0 with P(Z̃0 = 3 − t) = 1, for 0 ≤ t < 3.

The second example shows that the independence of the pre-t and post-t processes also fails.

Example 84 Suppose the interarrival times can take three values, say P(Zn = 1) = 0.7, P(Zn = 2) = 0.2 and P(Zn = 19) = 0.1. Note that

E(Zn) = 0.7 × 1 + 0.2 × 2 + 0.1 × 19 = 3.

Let us investigate a potential renewal property at time t = 2. Denoting X̃ = (Xt+s − Xt)s≥0 with holding times Z̃n, n ≥ 0, we have

(1) X2 = 2 implies X1 = 1, Z0 = Z1 = 1, and we get P(Z̃0 = j | X2 = 2) = P(Z2 = j);

(2) X2 = 1 and X1 = 1 imply Z0 = 1 and Z1 ≥ 2, and we get P(Z̃0 = 1 | X2 = 1, X1 = 1) = P(Z1 = 2 | Z1 ≥ 2) = 0.2/0.3 = 2/3;

(3) X2 = 1 and X1 = 0 imply Z0 = 2, and we get P(Z̃0 = 1 | X2 = 1, X1 = 0) = 0.7;

(4) X2 = 0 implies Z0 = 19, and we get P(Z̃0 = 17 | X2 = 0) = 1.

From (1) and (4) we obtain that Z̃0 is not independent of X2, and hence X̃ depends on (Xr)r≤2.

From (2) and (3) we obtain that Z̃0 is not conditionally independent of X1 given X2 = 1, so X̃ depends on (Xr)r≤2 even conditionally given X2 = 1.

In general, the situation is more involved, but you can imagine that knowledge of the age At = t − T_{Xt} at time t gives you information about the excess lifetime (residual lifetime) Et = T_{Xt+1} − t, which is the first arrival time Z̃0 of the post-t process. In particular, At and Et are not independent, and also the distribution of Et depends on t. Note that if the distribution of Et did not depend on At, we would get (something very close to, though not exactly, since At is random) the lack of memory property, and would expect the Zj to be exponential.

However, this is all about Z̃0; we have not met any problems for the following interarrival times. Therefore, we define “delayed” renewal processes, where the first renewal time is different from the other inter-renewal times. In other words, the typical renewal behaviour is delayed until the first renewal time. Our main interpretation will be that Z0 is just part of an inter-renewal time, but we define a more general concept within which we develop the idea of partial renewal times, which will show some effects that seem paradoxical at first sight.

Definition 85 Let (Zn)n≥1 be a family of independent and identically distributed interarrival times and Z0 an independent interarrival time with a possibly different distribution. Then the associated counting process X with interarrival times (Zn)n≥0 is called a delayed renewal process.


As a corollary to Proposition 82, we obtain:

Corollary 86 (Renewal property) Proposition 82 also remains true for delayed renewal processes. In this case, the post-Ti process will be undelayed.

Just as the Markov property holds at more general stopping times, the renewal property also holds at more general stopping times, provided that they take values in the set {Tn, n ≥ 0} of renewal times. Here is a heuristic argument that can be made precise: we apply Proposition 82 conditionally on T = Ti and get conditional independence and the conditional distribution of the post-T process given T = Ti. We see that the conditional distribution of the post-T process does not depend on i and apply Exercise A.1.5 to deduce the unconditional distribution of the post-T process and unconditional independence.

In particular, the renewal property does not apply to fixed T = t, but it does apply, e.g., to the first arrival (renewal) after time t, which is T_{Xt+1}. Actually, this special case of the renewal property can be proved directly by conditioning on Xt.

Proposition 87 Given a (possibly delayed) renewal process X, for every t ≥ 0, the post-t process X̃ = (Xt+s − Xt)s≥0 is a delayed renewal process with Z̃0 = Et.

Remark 88 Note that we make no statement about the dependence on the pre-t process.

Proof: We apply the renewal property at the renewal time T_{Xt+1}, the first renewal time after t. This establishes that (Z̃n)n≥1 are independent identically distributed interarrival times, independent of the past, in particular of Z̃0 = Et. X̃ therefore has the structure of a delayed renewal process. □

11.2 Size-biased picks and stationarity

An important step towards further limit results will be to choose a good initial delay distribution. We cannot achieve independence of increments, but we will show that we can achieve stationarity of increments.

Proposition 89 Let X be a delayed renewal process. If the distribution of the excesses Et does not depend on t ≥ 0, then X has stationary increments.

Proof: By Proposition 87, the post-t process X̃ = (Xt+s − Xt)s≥0 is a delayed renewal process with Z̃0 = Et. If the delay distribution does not depend on t, then the distribution of X̃ does not depend on t. In particular, the distribution of X̃s = Xt+s − Xt, an increment of X, does not depend on t. □


In fact, the converse is true as well. Also, it can be shown that E is a Markov process on the uncountable state space S = [0, ∞), and we are looking for its invariant distribution.

Since we have not identified the invariant distribution, we should look at Et for large t for a non-delayed renewal process, since equilibrium (for Markov chains at least) is established asymptotically. We may either appeal to the Convergence Theorem and look at the distribution of Et, say P(Et > x) for large t, or we may appeal to the Ergodic Theorem and look at the proportion of s ∈ [0, t] for which Es > x.

Example 90 Let us continue the discussion of Example 84 and look at Et for large t. First we notice that the fractional parts of Et have a deterministic sawtooth behaviour forever, since the Zn, n ≥ 0, only take integer values. So, look at En for large integer n. 1. What is the probability that n falls into a long interarrival time? Or 2. What proportion of ns fall into a long interarrival time? 10%?

In fact, (En) is a discrete-time Markov chain on {1, 2, . . . , 19} with transition probabilities

πi+1,i = 1, i ≥ 1,   π1,j = P(Z1 = j),

which is clearly irreducible and positive recurrent if E1(H1) = E(Z1) < ∞, so it has a unique stationary distribution. This stationary distribution can be calculated in a general setting of N-valued interarrival times, see Assignment 6. Let us here work on the intuition: the answer to the first question is essentially the probability of being in a state of {3, 4, . . . , 19} under the stationary distribution (Convergence Theorem for P(En = j)), whereas the answer to the second question is essentially the probability of being in a state of {3, 4, . . . , 19} under the stationary distribution (Ergodic Theorem for n^{−1}#{k = 0, . . . , n − 1 : Ek = j}).

In a typical period of 30 time units, we will have 10 arrivals, of which two will be separated by one long interarrival time. More precisely, on average 11 ns out of 30 will fall into small interarrival times, and 19 will fall into the long interarrival time. This is called a size-biased pick, since 19 is the size of the long interarrival time; 1 and 2 are the sizes of the short interarrival times, weighted with their seven resp. two times higher proportion of occurrence.

Furthermore, of these 19 that fall into the long interarrival time, one each has an excess of k for k = 1, . . . , 19, i.e. we may expect the split of the current interarrival time into age and excess to be uniformly distributed over the interarrival time.

Definition 91 With a probability mass function p on N with µ = ∑n≥0 npn < ∞ we associate the size-biased pick distribution p^sb given by

pn^sb = npn/µ, n ≥ 1.

With a probability density function f on [0, ∞) with µ = ∫_0^∞ tf(t)dt < ∞, we associate the size-biased pick distribution

f^sb(z) = zf(z)/µ, z ≥ 0.
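A size-biased pick arises naturally as the length of the interarrival interval covering a fixed large time point, as in Example 90. The sketch below (not in the original notes) uses the distribution of Example 84 and estimates by Monte Carlo the probability that a late time point falls in a long interval; by the heuristics above this should be near the size-biased value 19 × 0.1/µ = 19/30, not 0.1:

```python
import random

random.seed(4)
values, probs = (1, 2, 19), (0.7, 0.2, 0.1)    # interarrival law from Example 84
mu = sum(v * p for v, p in zip(values, probs))  # = 3

def sample_z():
    u = random.random()
    return 1 if u < 0.7 else (2 if u < 0.9 else 19)

def covering_length(t):
    # length Z_{X_t} of the interarrival interval containing the time point t
    s = 0
    while True:
        z = sample_z()
        if s + z > t:
            return z
        s += z

n = 20_000
freq_19 = sum(covering_length(200.5) == 19 for _ in range(n)) / n
print(freq_19)  # near the size-biased probability 19/30, far above 0.1
```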


Example 90 is for an N-valued interarrival distribution, where the renewal process (Xt)t≥0 will only see renewals at integer times, so it makes sense to consider the discretised process (Xn)n≥0 as well as a discretised process (En)n≥0 of excesses at integer times. The story with continuous interarrival times is completely analogous but does not tie in with the theory of discrete-time Markov chains, rather with [0, ∞)-valued Markov processes. We can still calculate

(1/Tk) ∫_0^{Tk} 1{Es>y} ds = (1/Tk) ∑_{j=0}^{k−1} (Zj − y)1{Zj>y},

using the Strong Law of Large Numbers on (Zj)j≥0 and on Yj = (Zj − y)1{Zj>y}, j ≥ 0, to obtain

(k/∑_{j=0}^{k−1} Zj) × (∑_{j=0}^{k−1} Yj/k) → E(Y1)/E(Z1) = (1/E(Z1)) ∫_y^∞ (z − y)f(z)dz

as (survival function P(Z0 > y) of the) proposed stationary distribution for (Et)t≥0.

It is not hard to show that if L has the size-biased pick distribution and U is uniform on [0, 1], then LU has density

f0(y) = (1/µ) F̄(y), where F̄(y) = 1 − F(y) is the survival function of f.

Just check

P(LU > x) = ∫_0^1 P(L > x/u | U = u)du = ∫_0^1 ∫_{x/u}^∞ (z/µ)f(z)dz du = ∫_x^∞ ∫_{x/z}^1 du (z/µ)f(z)dz = (1/µ) ∫_x^∞ (z − x)f(z)dz

and differentiate to get

f0(x) = −(d/dx) P(LU > x) = (1/µ)xf(x) + (1/µ)F̄(x) − (1/µ)xf(x) = (1/µ)F̄(x).

If there is uniqueness of the stationary distribution, an Ergodic Theorem, etc. for the Markov process (Et)t≥0, we obtain the following result.

Proposition 92 Let X be a delayed renewal process where P(Zn > y) = F̄(y), n ≥ 1, and P(Z0 > y) = F̄0(y), the survival function associated with the density f0. Then X has stationary increments, and Et ∼ LU, where L ∼ F^sb and U ∼ U(0, 1) are independent, for all t ≥ 0.

Proof: A proof within the framework of this course is on Assignment 6. □

Example 93 Clearly, for the Exp(λ) distribution as inter-arrival distribution, we get f0(x) = λe^{−λx} also. Note, however, that

f^sb(x) = xf(x)/µ = λ²xe^{−λx}, x ≥ 0,

is the probability density function of a Gamma(2, λ) distribution, the distribution of the sum of two independent Exp(λ) random variables.
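Proposition 92 and Example 93 can be checked by simulation: for Exp(λ) interarrivals, L ∼ Gamma(2, λ) and the stationary delay E = LU should again be Exp(λ). A sketch (not in the original notes; λ = 1 chosen for convenience):

```python
import math
import random

random.seed(5)
lam = 1.0
n = 100_000
# L ~ Gamma(2, lam) (the size-biased pick of Exp(lam)), U ~ U(0,1) independent
samples = [(random.expovariate(lam) + random.expovariate(lam)) * random.random()
           for _ in range(n)]
# the stationary delay E = L*U should again be Exp(lam): P(E > 1) = e^{-1}
frac = sum(s > 1.0 for s in samples) / n
print(frac)  # near exp(-1) ≈ 0.368
```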


Lecture 12

Convergence to equilibrium – renewal theorems

Reading: Grimmett-Stirzaker 10.4

Equilibrium for renewal processes X (increasing processes!) cannot be understood in the same way as for Markov chains. We rather want the increment processes X_{t+s} − X_t, t ≥ 0, to form a stationary process, for each fixed s. Note that we look at X_{t+s} − X_t as a process in t, so we are looking at all increments over a fixed interval length. This is not the post-t process that takes X_{t+s} − X_t as a process in s.

12.1 Convergence to equilibrium

We will have to treat the discrete and continuous cases separately in most of the sequel. More precisely, periodicity is an issue for discrete interarrival times. The general notion needed is that of arithmetic and non-arithmetic distributions:

Definition 94 A nonnegative random variable Z (and its distribution) is called d-arithmetic if P(Z ∈ dN) = 1 and d is maximal with this property. If Z is not d-arithmetic for any d > 0, it is called non-arithmetic.

We think of Z as an interarrival time. All continuous distributions are non-arithmetic, but there are others, which will not be relevant for us. We will mostly focus on this case. Our second focus is the 1-arithmetic case, i.e. the integer-valued case where additionally P(Z ∈ dN) < 1 for all d ≥ 2. It is not difficult to formulate corresponding results for d-arithmetic interarrival distributions and also for non-arithmetic ones, once an appropriate definition of a size-biased distribution is found.

Fact 95 (Convergence in distribution) Let X be a (possibly delayed) renewal process having interarrival times with finite mean µ.


(a) If the interarrival distribution is continuous, then

X_{t+s} − X_t → X̂_s in distribution, as t → ∞,

where X̂ is an associated stationary renewal process.

Also (A_t, E_t) → (L(1 − U), LU) in distribution, as t → ∞, where L has the size-biased density f_sb and U ∼ Unif(0, 1) is independent of L.

(b) If the interarrival distribution is integer-valued and 1-arithmetic, then

X_{n+s} − X_n → X̂_s in distribution, as n → ∞,

where X̂ is an associated delayed renewal process, not with continuous first arrival LU, but with a discrete version: let L have size-biased probability mass function p_sb and Z_0 conditionally uniform on {1, . . . , L}. Also, (A_n, E_n) → (L − Z_0, Z_0) in distribution, as n → ∞.

We will give a sketch of the coupling proof for the arithmetic case below.

12.2 Renewal theorems

The renewal theorems are now extensions of Fact 95 to convergence of certain moments. The renewal theorem itself concerns means. It is a refinement of the Elementary Renewal Theorem to increments, i.e. a second-order result.

Fact 96 (Renewal theorem) Let X be a (possibly delayed) renewal process having interarrival times with finite mean µ and renewal function m(t) = E(X_t).

(a) If the interarrival times are non-arithmetic, then for all h ≥ 0

m(t + h) − m(t) → h/µ as t → ∞.

(b) If the interarrival times are 1-arithmetic, then for all h ∈ N

m(t + h) − m(t) → h/µ as t → ∞.
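As a numerical illustration (my sketch, not part of the notes), one can solve the renewal equation m(t) = F(t) + ∫_0^t m(t − s) f(s) ds on a grid for Gamma(2, 1) interarrivals (so µ = 2) and watch m(t + 1) − m(t) approach 1/µ = 0.5; one can also check that m(t) = t/2 − 1/4 + e^{−2t}/4 in closed form for this particular distribution, which the numerics reproduce.

```python
import numpy as np

dt, T = 0.01, 40.0
t = np.arange(0.0, T + dt, dt)
f = t * np.exp(-t)                    # Gamma(2,1) interarrival density, mean mu = 2
F = 1.0 - np.exp(-t) * (1.0 + t)      # its cdf

# discretised renewal equation m(t) = F(t) + int_0^t m(t-s) f(s) ds
m = np.zeros_like(t)
for i in range(1, len(t)):
    m[i] = F[i] + dt * np.dot(m[i - 1::-1], f[1:i + 1])

print(m[3100] - m[3000])              # increment over h = 1 near t = 30; close to h/mu = 0.5
```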

As a generalisation that is often useful in applications, we mention a special case of the key renewal theorem:

Fact 97 (Key renewal theorem) Let X be a renewal process with continuous interarrival time distribution and m(t) = E(X_t). If g : [0,∞) → [0,∞) is integrable (over [0,∞) in the limiting sense) and non-increasing, then

(g ∗ m′)(t) = ∫_0^t g(t − x) m′(x) dx → (1/µ) ∫_0^∞ g(x) dx as t → ∞.


There are generalisations that allow bigger classes of g (directly Riemann-integrable functions) and put no restrictions on the interarrival distribution (at the cost of some limits through discrete lattices in the d-arithmetic case). Even an infinite mean can be shown to correspond to zero limits.

Note that for g_h(x) = 1_{[0,h]}(x), we get

∫_0^t g_h(t − x) m′(x) dx = ∫_{t−h}^t m′(x) dx = m(t) − m(t − h)

and this leads to the renewal theorem. The renewal theorem and the key renewal theorem should be thought of as results where time windows are sent to infinity, and a stationary picture is obtained in the limit. In the case of the renewal theorem, we are only looking at the mean of an increment. In the key renewal theorem, we can consider other quantities related to the mean behaviour in a window. E.g., moments of excess lifetimes fall into this category. Note the general scheme that e.g. {E_t ≤ h} is a quantity only depending on a time window of size h.

Vice versa, the key renewal theorem can be deduced from the renewal theorem by approximating g by step functions.

12.3 The coupling proofs

Suppose now that X is a renewal process with integer-valued interarrival times, and suppose that P(Z_1 ∈ dN) < 1 for all d ≥ 2, i.e. suppose that Z_1 is 1-arithmetic. Let X̂ be a renewal process that is not itself stationary, but that is such that (X̂_n)_{n≥0} has stationary increments and such that X̂ also has a 1-arithmetic first arrival time. This can be achieved by choosing L with the size-biased pick distribution p_sb, which is 1-arithmetic, and choosing Ẑ_0 conditionally uniform on {1, . . . , L}.

We want to couple these two independent processes X and X̂. Define N = inf{n ≥ 1 : T_n = T̂_n}, the first index of simultaneous arrivals. Note that we require not only that arrivals happen at the same time, but that the index is the same. We can show that P(N < ∞) = 1 by invoking a result on centred random walks. In fact N is the first time the random walk S_n = T_n − T̂_n hits 0. S is not a simple random walk, but it can be shown that it is recurrent (hence hits 0 in finite time) since E(Z_1 − Ẑ_1) = 0.

Now the idea is similar to the one in the coupling proof of the convergence theorem for Markov chains. We construct a new process

Z̄_n = Z_n if n < N,    Z̄_n = Ẑ_n if n ≥ N,

so that

X̄_t = X_t if t < T_N,    X̄_t = X̂_t if t ≥ T_N.


Now X̄ ∼ X and we get, in fact,

P((X̄_{t+s} − X̄_t)_{s≥0} ∈ A) = P(T_N ≤ t, (X̂_{t+s} − X̂_t)_{s≥0} ∈ A) + P(T_N > t, (X̄_{t+s} − X̄_t)_{s≥0} ∈ A)
→ P((X̂_{t+s} − X̂_t)_{s≥0} ∈ A),

for all suitable A ⊂ {f : [0,∞) → N}, since P(T_N > t) → 0. This shows convergence of the distribution of (X_{t+s} − X_t)_{s≥0} to the distribution of X̂ in so-called total variation, which is actually stronger than the convergence in distribution that we claim. In particular, we can now take sets A = {f : [0,∞) → N : f(s) = k} to conclude.

For the renewal theorem, we deduce from this stronger form of convergence that the means of X_{t+s} − X_t converge as t → ∞.

For the non-arithmetic case, the proof is harder, since then N = ∞ almost surely, and times N_ε = inf{n ≥ 1 : |T_n − T̂_n| < ε} do not achieve a perfect coupling.

There is also a proof using renewal equations.

12.4 Example

Let us investigate the asymptotics of E(E_t^r). We condition on the last arrival time before t and that it is the kth arrival:

E(E_t^r) = E(((Z_0 − t)^+)^r) + ∑_{k≥1} ∫_0^t E(((Z_k − (t − x))^+)^r) f^{∗(k)}(x) dx
= E(((Z_0 − t)^+)^r) + ∫_0^t E(((Z_1 − (t − x))^+)^r) m′(x) dx.

Let us write h(y) = E(((Z_1 − y)^+)^r). This is clearly a nonnegative nonincreasing function of y, and it can be seen to be integrable if and only if E(Z_1^{r+1}) < ∞ (see below). The Key Renewal Theorem gives

E(E_t^r) → (1/µ) ∫_0^∞ h(y) dy = E(Z_1^{r+1}) / (µ(r + 1))

since

E(((Z_1 − x)^+)^r) = ∫_x^∞ (z − x)^r f(z) dz = ∫_0^∞ y^r f(y + x) dy

and hence

∫_0^∞ E(((Z_1 − x)^+)^r) dx = ∫_0^∞ y^r ∫_0^∞ f(y + x) dx dy = ∫_0^∞ y^r F̄(y) dy = E(Z_1^{r+1}) / (r + 1).

It is now easy to check that, in fact, these are the moments of the limit distribution LU for the excess life E_t.
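A quick numerical check (my sketch, not in the notes): over one interarrival interval of length Z the excess decreases linearly from Z to 0, so ∫_0^Z (Z − s)^r ds = Z^{r+1}/(r + 1) exactly, and the long-run time average of E_s^r is a ratio of sums over the sampled interarrival times. For U(0, 1) interarrivals (µ = 1/2), the limit E(Z^{r+1})/(µ(r + 1)) works out to 2/((r + 1)(r + 2)).

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.uniform(0.0, 1.0, 1_000_000)   # interarrival times Z_j ~ U(0,1), mu = 1/2

results = {}
for r in (1, 2):
    # time average of E_s^r: sum over cycles of Z^{r+1}/(r+1), divided by total time
    results[r] = np.sum(z ** (r + 1)) / ((r + 1) * np.sum(z))
    print(r, results[r], 2.0 / ((r + 1) * (r + 2)))   # limits 1/3 and 1/6
```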


Lecture 13

M/M/1 queues and queueing networks

Reading: Norris 5.2.1-5.2.6; Grimmett-Stirzaker 11.2, 11.7; Ross 6.6, 8.4

Consider a single-server queueing system in which customers arrive according to a Poisson process of rate λ and service times are independent Exp(µ). Let X_t denote the length of the queue at time t, including any customer that is currently served. This is the setting of Exercise A.4.2 and from there we recall that

• An invariant distribution exists if and only if λ < µ, and is given by

ξ_n = (λ/µ)^n (1 − λ/µ) = ρ^n (1 − ρ), n ≥ 0,

where ρ = λ/µ is called the traffic intensity. Clearly λ < µ ⇐⇒ ρ < 1. By the ergodic theorem, the server is busy a (long-term) proportion ρ of the time.

• ξ_n can best be obtained by solving the detailed balance equations. By Proposition 57, X is reversible in equilibrium.

• The embedded "jump chain" (M_n)_{n≥0}, M_n = X_{T_n}, has a different invariant distribution η ≠ ξ, since the holding times are Exp(λ + µ) everywhere except in 0, where they are Exp(λ), hence rather longer, so that X spends "more time" in 0 than M. Hence η puts higher weight on 0, again by the ergodic theorem, now in discrete time. Let us state more explicitly the two ergodic theorems. They assert that we can obtain the invariant distributions as almost sure limits as n → ∞

(1/T_n) ∫_0^{T_n} 1{X_s = i} ds = (1/T_n) ∑_{k=0}^{n−1} Z_k 1{M_k = i} → ξ_i

(1/n) ∑_{k=0}^{n−1} 1{M_k = i} → η_i,

for all i ≥ 0; actually, in the first case more generally as t → ∞, where t replaces the special choice t = T_n. Note how the holding times change the proportions as weights in the sums, T_n = Z_0 + . . . + Z_{n−1} being just the sum of the weights.

• During any Exp(µ) service time, a geom(λ/(λ+ µ)) number of customers arrives.
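These ergodic statements are easy to test in simulation. The sketch below is my code (arbitrary parameters λ = 1, µ = 2): it runs the jump chain together with its holding times and checks that the long-run proportion of time the server is idle is close to ξ_0 = 1 − ρ = 1/2.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, mu, T = 1.0, 2.0, 100_000.0         # rho = 1/2

x, t, idle = 0, 0.0, 0.0
while t < T:
    rate = lam + (mu if x > 0 else 0.0)  # total jump rate in the current state
    hold = rng.exponential(1.0 / rate)
    if x == 0:
        idle += hold
    t += hold
    if rng.random() < lam / rate:        # arrival w.p. lam/rate, else departure
        x += 1
    else:
        x -= 1

print(idle / t)                          # long-run idle proportion, near 1 - rho = 0.5
```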


13.1 M/M/1 queues and the departure process

Define D0 = 0 and successive departure times

D_{n+1} = inf{t > D_n : X_t − X_{t−} = −1}, n ≥ 0.

Let us study the process V_n = X_{D_n}, n ≥ 0, i.e. the process of queue lengths after departures. By the lack of memory property of Exp(λ), the geometric random variables N_n, n ≥ 1, that record the number of new customers between D_{n−1} and D_n, are independent. Therefore, (V_n)_{n≥0} is a Markov chain, with transition probabilities

d_{k,k−1+m} = (λ/(λ + µ))^m · µ/(λ + µ), k ≥ 1, m ≥ 0.

For k = 0, we get d_{0,m} = d_{1,m}, m ≥ 0, since the next service only begins when a new customer enters the system.

Proposition 98 V has invariant distribution ξ.

Proof: A simple calculation shows that, with ρ = λ/µ and q = λ/(λ + µ),

∑_{k∈N} ξ_k d_{k,n} = ξ_0 d_{0,n} + ∑_{k=1}^{n+1} ξ_k d_{k,n} = (1 − ρ) q^n (1 − q) + (1 − ρ)(1 − q) q^{n+1} ∑_{k=1}^{n+1} (ρ/q)^k = ξ_n,

after bringing the partial geometric progression into closed form and appropriate cancellations. □
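The invariance ∑_k ξ_k d_{k,n} = ξ_n can also be confirmed numerically (my sketch, with arbitrary λ = 1, µ = 2 and a hypothetical truncation level K): since d_{k,n} = 0 for k > n + 1, truncating the state space does not disturb the identity for n below the cut-off.

```python
import numpy as np

lam, mu = 1.0, 2.0
rho, q = lam / mu, lam / (lam + mu)
K = 100                                        # hypothetical truncation level

d = np.zeros((K + 1, K + 1))
for k in range(1, K + 1):
    for j in range(k - 1, K + 1):
        d[k, j] = q ** (j - k + 1) * (1 - q)   # d_{k,k-1+m} = q^m (1 - q)
d[0, :] = d[1, :]                              # d_{0,m} = d_{1,m}

xi = (1 - rho) * rho ** np.arange(K + 1)
resid = xi @ d - xi
print(np.max(np.abs(resid[:K])))               # ~ machine precision below the cut-off
```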

Note that the conditional distribution of D_{n+1} − D_n given V_n = k is the distribution of a typical service time G ∼ Exp(µ) if k ≥ 1, and the distribution of Y + G, where Y ∼ Exp(λ) is a typical interarrival time, if k = 0, since we have to wait for a new customer and his service. We can also calculate the unconditional distribution of D_{n+1} − D_n, at least if V is in equilibrium.

Proposition 99 If X (and hence V) is in equilibrium, then the D_{n+1} − D_n are independent Exp(λ) distributed.

Proof: Let us first study D_1. We can calculate its moment generating function by Proposition 7 a), conditioning on V_0, which has the stationary distribution ξ:

E(e^{γD_1}) = E(e^{γD_1} | V_0 = 0) P(V_0 = 0) + ∑_{k=1}^∞ E(e^{γD_1} | V_0 = k) P(V_0 = k)
= (λ/(λ − γ)) (µ/(µ − γ)) (1 − λ/µ) + (µ/(µ − γ)) (λ/µ)
= (λ/(µ − γ)) · (µ − λ + λ − γ)/(λ − γ) = λ/(λ − γ)

and identify the Exp(λ) distribution.


For independence of V_1 and D_1 we have to extend the above calculation and check that

E(e^{γD_1} α^{V_1}) = (λ/(λ − γ)) · (µ − λ)/(µ − αλ),

because the second ratio is the probability generating function of the geom(λ/µ) stationary distribution ξ. To do this, condition on V_0 ∼ ξ and then on D_1:

E(e^{γD_1} α^{V_1}) = ∑_{k=0}^∞ ξ_k E(e^{γD_1} α^{V_1} | V_0 = k)

and use the fact that given V_0 = k ≥ 1, V_1 = k + N_1 − 1, where N_1 ∼ Poi(λx) conditionally given D_1 = x, because N_1 is counting Poisson arrivals in an interval of length D_1 = x:

E(e^{γD_1} α^{V_1} | V_0 = k) = α^{k−1} ∫_0^∞ E(e^{γD_1} α^{N_1} | V_0 = k, D_1 = x) f_{D_1}(x) dx
= α^{k−1} ∫_0^∞ e^{γx} exp{−λx(1 − α)} f_{D_1}(x) dx
= α^{k−1} E(e^{(γ−λ(1−α))D_1}) = α^{k−1} µ/(µ − γ + λ(1 − α)).

For k = 0, we get the same expression without α^{k−1} and with a factor λ/(λ − γ), because D_1 = Y + G, where no arrivals occur during Y, and N_1 is counting those during G ∼ Exp(µ). Putting things together, we get

E(e^{γD_1} α^{V_1}) = (1 − ρ) ( λ/(λ − γ) + ρ/(1 − ρα) ) · µ/(µ − γ + λ(1 − α)),

which simplifies to the expression claimed.

Now an induction shows D_{n+1} − D_n ∼ Exp(λ), and they are independent, because the strong Markov property at D_n makes the system start afresh conditionally independently of the past given V_n. Since D_1, . . . , D_n − D_{n−1} are independent of V_n, they are then also independent of the whole post-D_n process. □

The argument is very subtle, because the post-D_n process is actually not independent of the whole pre-D_n process, just of the departure times. The result, however, is not surprising, since we know that X is reversible, and the departure times of X are the arrival times of the time-reversed process, which form a Poisson process of rate λ.

In the same way, we can study A_0 = 0 and successive arrival times

A_{n+1} = inf{t > A_n : X_t − X_{t−} = 1}, n ≥ 0.

Clearly, these also have Exp(λ) increments, since the arrival process is a Poisson process with rate λ. We study X_{A_n} in the next lecture in a more general setting.


13.2 Tandem queues

The simplest non-trivial network of queues is a so-called tandem system that consists of two queues with one server each, having independent Exp(µ_1) and Exp(µ_2) service times, respectively. Customers join the first queue according to a Poisson process of rate λ, and on completing service immediately enter the second queue. Denote by X_t^{(1)} the length of the first queue at time t and by X_t^{(2)} the length of the second queue at time t.

Proposition 100 The queue length process X = (X^{(1)}, X^{(2)}) is a continuous-time Markov chain with state space S = N² and non-zero transition rates

q_{(i,j),(i+1,j)} = λ, q_{(i+1,j),(i,j+1)} = µ_1, q_{(i,j+1),(i,j)} = µ_2, i, j ∈ N.

Proof: Just note that in state (i + 1, j + 1), three exponential clocks are ticking, leading to transitions at the rates described. Similarly, there are fewer clocks for (0, j + 1), (i + 1, 0) and (0, 0), since one or both servers are idle. The lack of memory property makes the process start afresh after each transition. Standard reasoning completes the proof. □

Proposition 99 yields that the departure process of the first queue, which is now also the arrival process of the second queue, is a Poisson process with rate λ, provided that the queue is in equilibrium. This can be achieved if λ < µ_1.

Proposition 101 X is positive recurrent if and only if ρ_1 := λ/µ_1 < 1 and ρ_2 := λ/µ_2 < 1. The unique stationary distribution is then given by

ξ_{(i,j)} = ρ_1^i (1 − ρ_1) ρ_2^j (1 − ρ_2),

i.e. in equilibrium, the lengths of the two queues at any fixed time are independent.

Proof: As shown in Exercise A.4.3, ρ_1 ≥ 1 would prevent equilibrium for X^{(1)}, and expected return times for X and X^{(1)} then clearly satisfy m_{(0,0)} ≥ m_0^{(1)} = ∞. If ρ_1 < 1 and X^{(1)} is in equilibrium, then by Proposition 99 the arrival process for the second queue is a Poisson process at rate λ, and ρ_2 ≥ 1 would prevent equilibrium for X^{(2)}. Specifically, if we assume m_{(0,0)} < ∞, then we get the contradiction ∞ = m_0^{(2)} ≤ m_{(0,0)} < ∞.

If ρ_1 < 1 and ρ_2 < 1, then ξ as given in the statement of the proposition is an invariant distribution; it is easily checked that the (i + 1, j + 1) entry of ξQ = 0 holds:

ξ_{(i,j+1)} q_{(i,j+1),(i+1,j+1)} + ξ_{(i+2,j)} q_{(i+2,j),(i+1,j+1)} + ξ_{(i+1,j+2)} q_{(i+1,j+2),(i+1,j+1)} + ξ_{(i+1,j+1)} q_{(i+1,j+1),(i+1,j+1)} = 0

for i, j ∈ N, and similar equations hold for states (0, j + 1), (i + 1, 0) and (0, 0). It is unique since X is clearly irreducible (we can find paths between any two states in N²). □
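The full-balance computation can be verified numerically on a truncated grid (my sketch; the parameters are arbitrary choices with ρ_1, ρ_2 < 1): at each interior state (i, j), the flow in from (i − 1, j), (i + 1, j − 1) and (i, j + 1) exactly cancels the flow out at rate λ + µ_1 + µ_2.

```python
import numpy as np

lam, mu1, mu2 = 1.0, 2.0, 4.0              # rho1 = 0.5, rho2 = 0.25
r1, r2 = lam / mu1, lam / mu2
K = 30                                      # hypothetical truncation of N^2

xi = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        xi[i, j] = r1**i * (1 - r1) * r2**j * (1 - r2)

err = 0.0
for i in range(1, K - 1):
    for j in range(1, K - 1):
        flow = (lam * xi[i - 1, j] + mu1 * xi[i + 1, j - 1] + mu2 * xi[i, j + 1]
                - (lam + mu1 + mu2) * xi[i, j])
        err = max(err, abs(flow))
print(err)                                  # zero up to floating-point roundoff
```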

We stressed that queue lengths are independent at fixed times. In fact, they are not independent in a stronger sense, e.g. (X_s^{(1)}, X_t^{(1)}) and (X_s^{(2)}, X_t^{(2)}) for s < t turn out to be dependent. More specifically, consider X_s^{(1)} − X_t^{(1)} = n for big n; then it is easy to see that 0 < P(X_t^{(2)} = 0 | X_s^{(1)} − X_t^{(1)} = n) → 0 as n → ∞, since at least n customers will then have to have been served by server 2 also.


13.3 Closed and open migration networks

More general queueing systems are obtained by allowing customers to move in a system of m single-server queues according to a Markov chain on {1, . . . , m}. For a single customer, no queues ever occur, since he is simply served where he goes. If there are r customers in the system, with no new customers arriving or existing customers departing, the system is called a closed migration network. If at some (or all) queues also new customers arrive according to a Poisson process, and at some (or all) queues customers served may leave the system, the system is called an open migration network.

The tandem queue is an open migration network with m = 2, where new customers only arrive at the first queue and existing customers only leave the system after service from the second server. The Markov chain is deterministic and sends each customer from state 1 to state 2: π_{1,2} = 1. Customers then go into an absorbing exit state 0, say, π_{2,0} = 1, π_{0,0} = 1.

Fact 102 If service times are independent Exp(µ_k) at server k ∈ {1, . . . , m}, arrivals occur according to independent Poisson processes of rates λ_k, k = 1, . . . , m, and departures are modelled by transitions to another server or an additional state 0, according to transition probabilities π_{k,ℓ}, then the queue-lengths process X = (X^{(1)}, . . . , X^{(m)}) is well-defined and a continuous-time Markov chain. Its transition rates can be given as

q_{x,x+e_k} = λ_k, q_{x,x−e_k+e_ℓ} = µ_k π_{k,ℓ}, q_{x,x−e_k} = µ_k π_{k,0}

for all k, ℓ ∈ {1, . . . , m} and x = (x_1, . . . , x_m) ∈ N^m such that x_k ≥ 1 for the latter two, where e_k = (0, . . . , 0, 1, 0, . . . , 0) is the kth unit vector.

Fact 103 If X = (X^{(1)}, . . . , X^{(m)}) models a closed migration network with irreducible migration chain, then the total number of customers X_t^{(1)} + . . . + X_t^{(m)} remains constant over time, and for any such constant r, say, X has a unique invariant distribution given by

ξ_x = B_r ∏_{k=1}^m η_k^{x_k}, for all x ∈ N^m such that x_1 + . . . + x_m = r,

where η is the invariant distribution of the continuous-time migration chain and B_r is a normalising constant.

Note that ξ has a product form, but the queue lengths at servers k = 1, . . . , m under the stationary distribution are not independent, since the admissible x-values are constrained by x_1 + . . . + x_m = r.


Lecture 14

M/G/1 and G/M/1 queues

Reading: Norris 5.2.7-5.2.8; Grimmett-Stirzaker 11.1, 11.3-11.4; Ross 8.5, 8.7
Further reading: Grimmett-Stirzaker 11.5-11.6

The M/M/1 queue is the simplest queueing model. We have seen how it can be applied/modified in queueing networks, with several servers etc. These were all continuous-time Markov chains. It was always the exponential distribution that described interarrival times as well as service times. In practice, this assumption is often unrealistic. If we keep exponential distributions for either interarrival times or service times, but allow more general distributions for the other, the model can still be handled using Markov techniques that we have developed.

We call M/G/1 queue a queue with Markovian arrivals (Poisson process of rate λ), a General service time distribution (we also use G for a random variable with this general distribution on (0,∞)), and 1 server.

We call G/M/1 queue a queue with a General interarrival distribution and Markovian service times (exponential with rate parameter µ), and 1 server.

There are other queues that have names in this formalism. We have seen M/M/s queues (Example 30), and also M/M/∞ (queues with an infinite number of servers) – this model is the same as the immigration-death model that we formulated at the end of Example 58.

14.1 M/G/1 queues

An M/G/1 queue has independent and identically distributed service times with any distribution on (0,∞), but independent Exp(λ) interarrival times. Let X_t be the queue length at time t. X is not a continuous-time Markov chain, since the service distribution does not have the lack of memory property (unless it is exponential, which brings us back to M/M/1). This means that after an arrival, we have a nasty residual service distribution. However, after departures, we have exponential residual interarrival distributions:


Proposition 104 The process of queue lengths V_n = X_{D_n} at successive departure times D_n, n ≥ 0, is a Markov chain with transition probabilities

d_{k,k−1+m} = E( ((λG)^m / m!) e^{−λG} ), k ≥ 1, m ≥ 0,

and d_{0,m} = d_{1,m}, m ≥ 0. Here G is a (generic) service time.

Proof: The proof is not hard, since we recognise the ingredients. Given G = t, the number N of arrivals during the service time has a Poisson distribution with parameter λt. Therefore, if G has density g,

P(N = m) = ∫_0^∞ P(N = m | G = t) g(t) dt = ∫_0^∞ ((λt)^m / m!) e^{−λt} g(t) dt = E( ((λG)^m / m!) e^{−λG} ).

If G is discrete, a similar argument works. The rest of the proof is the same as for M/M/1 queues (cf. the discussion before Proposition 98). In particular, when the departing customer leaves an empty system behind, there has to be an arrival before the next service time starts. □
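As a consistency check (my sketch, arbitrary parameters): for Exp(µ) service the expectation E((λG)^m e^{−λG}/m!) can be evaluated numerically and should reproduce the geometric M/M/1 transition probabilities (µ/(λ + µ))(λ/(λ + µ))^m from Lecture 13.

```python
import numpy as np
from math import factorial

lam, mu = 1.0, 2.0
q = lam / (lam + mu)

dt = 1e-4
tgrid = np.arange(0.0, 40.0, dt)
g = mu * np.exp(-mu * tgrid)                 # Exp(mu) service density

vals = []
for m in range(6):
    integrand = (lam * tgrid) ** m / factorial(m) * np.exp(-lam * tgrid) * g
    vals.append(np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dt)  # trapezoid rule
    print(m, vals[m], (1 - q) * q ** m)      # the two columns should agree
```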

For the M/M/1 queue, we defined the traffic intensity ρ = λ/µ, in terms of the arrival rate λ = 1/E(Y) and the (potential) service rate µ = 1/E(G) for a generic interarrival time Y ∼ Exp(λ) and service time G ∼ Exp(µ). We say "potential" service rate, because in the queueing system the server may have idle periods (empty system), during which there is no service. Indeed, a main reason to consider traffic intensities is to describe whether there are idle periods, i.e. whether the queue length is a recurrent process.

If G is not exponential, we can interpret "service rate" as an asymptotic rate: consider a renewal process N with interrenewal times distributed as G. By the strong law of renewal theory, N_t/t → 1/E(G). It is therefore natural, for the M/G/1 queue, to define the traffic intensity as ρ = λE(G).

Proposition 105 Let ρ = λE(G) be the traffic intensity of an M/G/1 queue. If ρ < 1, then V has a unique invariant distribution ξ. This ξ has probability generating function

∑_{k∈N} ξ_k s^k = (1 − ρ)(1 − s) · 1/(1 − s/E(e^{λ(s−1)G})).

Proof: We define ξ via its probability generating function

φ(s) = ∑_{k∈N} ξ_k s^k := (1 − ρ)(1 − s) · 1/(1 − s/E(e^{λ(s−1)G}))


and note that ξ_0 = φ(0) = 1 − ρ. To identify ξ as solution of

ξ_j = ∑_{i=0}^{j+1} ξ_i d_{i,j}, j ≥ 0,

we can check the corresponding equality of probability generating functions. The probability generating function of the left-hand side is φ(s). To calculate the probability generating function of the right-hand side, calculate first

∑_{m∈N} d_{k+1,k+m} s^m = ∑_{m∈N} E( ((sλG)^m / m!) e^{−λG} ) = E(e^{(s−1)λG})

and then we have to check that the following sum is equal to φ(s):

∑_{j∈N} ∑_{i=0}^{j+1} ξ_i d_{i,j} s^j = ∑_{j∈N} ξ_0 d_{0,j} s^j + ∑_{k∈N} ∑_{m∈N} ξ_{k+1} d_{k+1,k+m} s^{k+m}
= E(e^{(s−1)λG}) ( ξ_0 + ∑_{k∈N} ξ_{k+1} s^k )
= E(e^{(s−1)λG}) s^{−1} (φ(s) − (1 − ρ)(1 − s)),

but this follows using the definition of φ(s). This completes the proof, since uniqueness follows from the irreducibility of V. □
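To see the proposition in action (a sketch of mine, not from the notes), take deterministic service G ≡ 1 and λ = 1/2, so ρ = 1/2 and E(e^{λ(s−1)G}) = e^{λ(s−1)}; expanding φ(s) as a power series recovers the stationary probabilities of the M/D/1 departure chain, with leading coefficient ξ_0 = 1 − ρ.

```python
import sympy as sp
import math

s = sp.symbols('s')
lam = sp.Rational(1, 2)               # arrival rate; deterministic service time G = 1
rho = lam                              # rho = lam * E(G) = 1/2

B = sp.exp(lam * (s - 1))              # E(e^{lam(s-1)G})
phi = (1 - rho) * (1 - s) / (1 - s / B)

ser = sp.expand(sp.series(phi, s, 0, 8).removeO())
xi = [float(ser.coeff(s, k)) for k in range(8)]
print(xi)                              # xi[0] = 0.5, xi[1] = 0.5*(e^{1/2} - 1), ...
```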

14.2 Waiting times in M/G/1 queues

An important quantity in queueing theory is the waiting time of a customer. Here we have to be specific about the service discipline. We will assume throughout that customers queue and are served in their order of arrival. This discipline is called FIFO (First In First Out). Other disciplines like LIFO (Last In First Out), with or without interruption of current service, can also be studied.

Clearly, under the FIFO discipline, the waiting time of a given customer depends on the service times of customers in the queue when he arrives. Similarly, all customers in the system when a given customer leaves have arrived during his waiting and service times.

Proposition 106 If X is such that V is in equilibrium, then the waiting time of any customer has distribution given by

E(e^{γW}) = (1 − ρ)γ / (λ + γ − λE(e^{γG})).


Proof: Unfortunately, we have not established equilibrium of X at the arrival times of customers. Therefore, we have to argue from the time when a customer leaves. Due to the FIFO discipline, he will leave behind all those customers that arrived during his waiting time W and his service time G. Given T = W + G = t, their number N has a Poisson distribution with parameter λt, so that

E(s^N) = ∫_0^∞ E(s^N | T = t) f_T(t) dt = ∫_0^∞ e^{λt(s−1)} f_T(t) dt = E(e^{λT(s−1)}) = E(e^{λ(s−1)W}) E(e^{λ(s−1)G}).

From Proposition 105 we take E(s^N), and putting γ = λ(s − 1), we deduce the formula required by rearrangement. □

Corollary 107 In the special case of M/M/1, the distribution of W is given by

P(W = 0) = 1 − ρ and P(W > w) = ρ e^{−(µ−λ)w}, w ≥ 0.

Proof: We calculate the moment generating function of the proposed distribution:

e^{γ·0} (1 − ρ) + ∫_0^∞ e^{γt} ρ(µ − λ) e^{−(µ−λ)t} dt = (µ − λ)/µ + (λ/µ) · (µ − λ)/(µ − λ − γ) = ((µ − λ)/µ) · (µ − γ)/(µ − λ − γ).

From the preceding proposition we get, for our special case,

E(e^{γW}) = (γ(µ − λ)/µ) / (λ + γ − λµ/(µ − γ)) = ((µ − λ)/µ) · (µ − γ)γ / ((λ + γ)(µ − γ) − λµ)

and we see that the two are equal. We conclude by the Uniqueness Theorem for moment generating functions. □
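The algebra in this proof is mechanical and easy to confirm symbolically (a sketch, using the formulas above):

```python
import sympy as sp

gamma, lam, mu = sp.symbols('gamma lam mu', positive=True)
rho = lam / mu

# Proposition 106 with Exp(mu) service, where E(e^{gamma G}) = mu/(mu - gamma)
mgf_106 = (1 - rho) * gamma / (lam + gamma - lam * mu / (mu - gamma))

# MGF of the claimed law: atom of mass 1-rho at 0, plus rho times Exp(mu - lam)
mgf_mix = (1 - rho) + rho * (mu - lam) / (mu - lam - gamma)

print(sp.simplify(mgf_106 - mgf_mix))   # 0
```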

14.3 G/M/1 queues

For G/M/1 queues, the arrival process is a renewal process. Clearly, by the renewal property and by the lack of memory property of the service times, the queue length process X starts afresh after each arrival, i.e. U_n = X_{A_n}, n ≥ 0, is a Markov chain on {1, 2, 3, . . .}, where A_n is the nth arrival time. It is actually more natural to consider the Markov chain Ū_n = U_n − 1 = X_{A_n−} on N.

It can be shown that for M/M/1 queues the invariant distribution of U is the same as the invariant distribution of V and of X. For general G/M/1 queues we get:

Proposition 108 Let ρ = 1/(µE(A_1)) be the traffic intensity. If ρ < 1, then U has a unique invariant distribution given by

ξ_k = (1 − q) q^k, k ∈ N,

where q is the smallest positive root of q = E(e^{µ(q−1)A_1}).


Proof: First note that given an interarrival time Y = y, a Poi(µy) number of customers are served, so U has transition probabilities

a_{i,i+1−j} = E( ((µY)^j / j!) e^{−µY} ), j = 0, . . . , i; a_{i,0} = 1 − ∑_{j=0}^i a_{i,i+1−j}.

Now, for any geometric ξ, we get, for k ≥ 1, from Tonelli's theorem,

∑_{i=k−1}^∞ ξ_i a_{i,k} = ∑_{j=0}^∞ ξ_{j+k−1} a_{j+k−1,k} = ∑_{j=0}^∞ (1 − q) q^{j+k−1} E( ((µY)^j / j!) e^{−µY} ) = (1 − q) q^{k−1} E(e^{−µY(1−q)}),

and clearly this equals ξ_k = (1 − q)q^k if and only if q = E(e^{µ(q−1)Y}) =: f(q), as required. Note that both sides are continuously differentiable on [0, 1), and on [0, 1] if and only if the limits as q ↑ 1 are finite; f(0) > 0, f(1) = 1 and f′(1) = E(µY) = 1/ρ, so there is a solution if ρ < 1, since then f(1 − ε) < 1 − ε for ε small enough. The solution is unique, since there is at most one stationary distribution for the irreducible Markov chain U. The case k = 0 can be checked by a similar computation, so ξ is indeed a stationary distribution. □
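The fixed-point equation for q is easy to solve numerically. A sketch (my arbitrary choices: deterministic interarrival times A_1 ≡ a, so E(e^{µ(q−1)A_1}) = e^{µa(q−1)}): iterating q ← f(q) from q = 0 increases monotonically to the smallest root.

```python
import math

mu, a = 2.0, 1.0                          # Exp(mu) service; deterministic A1 = a
rho = 1.0 / (mu * a)                      # traffic intensity 1/2 < 1

def f(q):
    return math.exp(mu * a * (q - 1.0))   # E(e^{mu(q-1)A1}) for deterministic A1

q = 0.0
for _ in range(200):                      # monotone convergence to the smallest root
    q = f(q)

print(q, 1 - q)                           # q ~ 0.203 here; P(W = 0) = 1 - q
```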

Proposition 109 The waiting time W of a customer arriving in equilibrium has distribution

P(W = 0) = 1 − q, P(W > w) = q e^{−µ(1−q)w}, w ≥ 0.

Proof: In equilibrium, an arriving customer finds a number N ∼ ξ of customers in the queue in front of him, each with a service of G_j ∼ Exp(µ). Clearly P(W = 0) = ξ_0 = 1 − q. Also, since the conditional distribution of N given N ≥ 1 is geometric with parameter q, and geometric sums of exponential random variables are exponential, we have that W given N ≥ 1 is exponential with parameter µ(1 − q). □

Alternatively, we can write this proof in formulas as a calculation of P(W > w) by conditioning on N:

P(W > w) = ∑_{n=0}^∞ P(N = n) P(W > w | N = n)
= 0 + ∑_{n=1}^∞ q^n (1 − q) ∫_w^∞ (µ^n / (n − 1)!) x^{n−1} e^{−µx} dx
= ∫_w^∞ e^{−µx} qµ(1 − q) ∑_{n=1}^∞ ((µqx)^{n−1} / (n − 1)!) dx
= q ∫_w^∞ µ(1 − q) exp{−µx + µqx} dx = q exp{−µ(1 − q)w},

where we used that the sum of n independent identically exponentially distributed random variables is Gamma distributed.


Lecture 15

Markov models for insurance

Reading: Ross 7.10; CT4 Unit 6
Further reading: Norris 5.3

15.1 The insurance ruin model

Insurance companies deal with large numbers of insurance policies at risk. These are grouped according to type and various other factors into so-called portfolios. Let us focus on such a portfolio and model the associated claim process, the claim sizes and the reserve process. We make the following assumptions.

• Claims arrive according to a Poisson process (Xt)t≥0 with rate λ.

• Claim amounts (A_j)_{j≥1} are positive, independent of the arrival process and identically distributed with common probability density function k(a), a > 0, and mean µ = E(A_1).

• The insurance company provides an initial reserve of u ≥ 0 money units.

• Premiums are paid continuously at constant rate c, generating a linear premium income accumulating to ct at time t. We assume c > λµ, so as to have more premium income than claim outgo, on average.

• We ignore all expenses and other influences.

In this setting, we define the following objects of interest

• The aggregate claims process C_t = ∑_{n=1}^{X_t} A_n, t ≥ 0.

• The reserve process Rt = u+ ct− Ct, t ≥ 0.

• The ruin probability ψ(u) = Pu(Rt < 0 for some t ≥ 0), as a function of R0 = u ≥ 0.


15.2 Aggregate claims and reserve processes

Proposition 110 C and R have stationary independent increments. Their moment generating functions are given by

E(e^{γC_t}) = exp{ λt ∫_0^∞ (e^{γa} − 1) k(a) da }

and

E(e^{βR_t}) = exp{ βu + βct − λt ∫_0^∞ (1 − e^{−βa}) k(a) da }.

Proof: First calculate the moment generating function of C_t:

E(e^{γC_t}) = E( exp{ γ ∑_{j=1}^{X_t} A_j } )
= ∑_{n∈N} E( exp{ γ ∑_{j=1}^n A_j } ) P(X_t = n)
= ∑_{n∈N} (E(e^{γA_1}))^n P(X_t = n)
= exp{ λt (E(e^{γA_1}) − 1) },

which, in the case where A_1 has a density k, gives the formula required. The same calculation for the joint moment generating function of C_t and C_{t+s} − C_t, or more increments, yields stationarity and independence of increments (only using the stationarity and independence of increments of X, and the independence of the (A_j)_{j≥1}).

The statements for R follow easily with β = −γ. □

The moment generating function is useful to calculate moments.

Example 111 We differentiate the moment generating functions at zero to obtain

E(C_t) = (∂/∂γ) exp{ λt (E(e^{γA_1}) − 1) } |_{γ=0} = λt (∂/∂γ) E(e^{γA_1}) |_{γ=0} = λtµ

and E(R_t) = u + ct − λtµ = u + (c − λµ)t. Note that the Strong Law of Large Numbers, applied to the increments Z_n = R_n − R_{n−1}, yields

R_n/n = u/n + (1/n) ∑_{j=1}^n Z_j → E(Z_1) = c − λµ > 0 a.s., as n → ∞,

confirming our claim that c > λµ means that, on average, there is more premium income than claim outgo. In particular, this implies R_n → ∞ a.s. as n → ∞. Does this imply that inf{R_s : 0 ≤ s < ∞} > −∞? No. It is conceivable that between integers, the reserve process takes much smaller values. But we will show that inf{R_s : 0 ≤ s < ∞} > −∞.
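The differentiation in this example is easy to reproduce symbolically; the sketch below picks Exp(β) claim amounts (an arbitrary concrete choice of mine, so µ = 1/β) and recovers E(C_t) = λtµ.

```python
import sympy as sp

gamma, lam, t, beta = sp.symbols('gamma lam t beta', positive=True)

# Exp(beta) claim sizes: E(e^{gamma A}) = beta/(beta - gamma)
mgf_A = beta / (beta - gamma)
mgf_C = sp.exp(lam * t * (mgf_A - 1))        # MGF of the compound Poisson C_t

ECt = sp.diff(mgf_C, gamma).subs(gamma, 0)   # first moment at gamma = 0
print(sp.simplify(ECt))                       # lam*t/beta, i.e. lambda t mu
```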


There are other random walks embedded in the reserve process:

Example 112 Consider the process at claim times W_n = R_{T_n}, n ≥ 0, where (T_n)_{n≥0} are the event times of the Poisson process (and T_0 = 0). Now

W_{n+1} − W_n = R_{T_{n+1}} − R_{T_n} = c(T_{n+1} − T_n) − A_{n+1}, n ≥ 0,

are also independent identically distributed increments with E(W_{n+1} − W_n) = c/λ − µ > 0, and the Strong Law of Large Numbers yields

W_n/n = u/n + (1/n) ∑_{j=1}^n (W_j − W_{j−1}) → c/λ − µ a.s. as n → ∞.

Again, we conclude Wn → ∞, but note that Wn are the local minima of R, so

I∞ := infRt : 0 ≤ t <∞ = infWn, n ≥ 0 > −∞.

As a consequence, if we denote R0t = ct− Ct with associated I0

∞, then

ψ(u) = Pu(Rt < 0 for some t ≥ 0) = P(I0∞ ≤ −u) → P(I0

∞ = −∞) = 0

as u→ ∞, but this is then ψ(∞) = 0.

15.3 Ruin probabilities

We now turn to studying the ruin probabilities ψ(u), u ≥ 0.

Proposition 113 The ruin probabilities ψ(u) satisfy the renewal equation

ψ(x) = g(x) + ∫_0^x ψ(x − y) f(y) dy,  x ≥ 0,

where

f(y) = (λ/c) K(y) = (λ/c) ∫_y^∞ k(x) dx  and  g(x) = (λµ/c) K_0(x) = (λ/c) ∫_x^∞ K(y) dy.

Proof: Condition on T_1 ∼ Exp(λ) and A_1 ∼ k(a) to obtain

ψ(x) = ∫_0^∞ ∫_0^∞ ψ(x + ct − a) k(a) da λe^{−λt} dt = ∫_x^∞ (λ/c) e^{−(s−x)λ/c} ∫_0^∞ ψ(s − a) k(a) da ds,

where we use the convention that ψ(x) = 1 for x < 0.


Differentiation w.r.t. x yields

ψ′(x) = (λ/c) ψ(x) − (λ/c) ∫_0^∞ ψ(x − a) k(a) da
       = (λ/c) ψ(x) − (λ/c) ∫_0^x ψ(x − a) k(a) da − (λ/c) K(x).

Note that we also have a terminal condition ψ(∞) = 0. With this terminal condition, this integro-differential equation has a unique solution. It therefore suffices to check that any solution of the renewal equation also solves the integro-differential equation.

For the renewal equation, we only sketch the argument since the technical details would distract from the main steps: note that differentiation (we skip the details for differentiation under the integral sign!) yields (setting s = x − y in the convolution integral)

ψ′(x) = g′(x) + ψ(x) f(0) + ∫_0^x ψ(s) f′(x − s) ds = −(λ/c) K(x) + (λ/c) ψ(x) − (λ/c) ∫_0^x ψ(x − a) k(a) da,

and note also that, with the convention ψ(x) = 1 for x < 0, we can write the renewal equation as

ψ(x) = ∫_0^∞ ψ(x − y) f(y) dy,

where f is a nonnegative function with ∫_0^∞ f(y) dy = λµ/c < 1, so for any nonnegative solution ψ ≥ 0, ψ(x) is less than an average of ψ on (−∞, x], and hence ψ is decreasing (this requires a bit more care), so ψ(∞) exists with (by monotone convergence, using ψ(x − y) ↓ ψ(∞) as x → ∞)

ψ(∞) = lim_{x→∞} ∫_0^∞ ψ(x − y) f(y) dy = ∫_0^∞ ψ(∞) f(y) dy = (λµ/c) ψ(∞)  ⇒  ψ(∞) = 0.  □

Example 114 We can calculate ψ(0) = g(0) = λµ/c. In particular, zero initial reserve does not entail ruin with probability 1. In other words, ψ jumps at u = 0 from ψ(0−) = 1 to ψ(0) = ψ(0+) = λµ/c < 1.

Corollary 115 If λµ ≤ c, then ψ is given by

ψ(x) = g(x) + ∫_0^x g(x − y) u(y) dy,  where u(y) = ∑_{n≥1} f^{*(n)}(y).

Proof: This is an application of Exercise A.6.2(c), the general solution of the renewal equation. Note that f is not a probability density for λµ < c, but the results (and arguments) are valid for nonnegative f with ∫_0^∞ f(y) dy ≤ 1. □
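The renewal equation can also be solved numerically on a grid. The following sketch (not from the notes; the parameter values and the trapezoidal discretisation are illustrative choices) treats Exp(1/µ) claims, where the closed form of Example 117 below is available for comparison:

```python
import math

# Solve psi(x) = g(x) + int_0^x psi(x-y) f(y) dy on a grid by the trapezoidal
# rule, for Exp(1/mu) claims: f(y) = (lam/c) e^{-y/mu}, g(x) = (lam*mu/c) e^{-x/mu}.
lam, c, mu = 1.0, 2.0, 1.5          # illustrative; lam*mu = 1.5 < c = 2
h, n = 0.01, 500                    # grid spacing and number of steps

def f(y): return (lam / c) * math.exp(-y / mu)
def g(x): return (lam * mu / c) * math.exp(-x / mu)

psi = [g(0.0)]                      # psi(0) = g(0) = lam*mu/c
for i in range(1, n + 1):
    interior = sum(psi[i - j] * f(j * h) for j in range(1, i))
    rhs = g(i * h) + h * (interior + 0.5 * psi[0] * f(i * h))
    psi.append(rhs / (1.0 - 0.5 * h * f(0.0)))   # solve for the implicit endpoint

x = 3.0
closed_form = (lam * mu / c) * math.exp(-(c - lam * mu) * x / (c * mu))
print(psi[round(x / h)], closed_form)   # the two values should agree closely
```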


Where is the renewal process? For λµ < c, there is no renewal process with interarrival density f in the strict sense, since f is not a probability density function. One can associate a defective renewal process that only counts a geometric number of points, and the best way to motivate this is by looking at λµ = c, where the situation is nicer. It can be shown that the renewal process is counting new minima of R or R^0, not in the time parameterisation of R^0, but in the height variable, i.e.

Y_h = #{n ≥ 1 : W_n ∈ [−h, 0] and W_n = min{W_0, …, W_n}},  h ≥ 0,

is a renewal process. Note that f is the distribution of LU, where L is a size-biased claim and U ∼ Unif(0, 1). Intuitively, this is because big claims are more likely to exceed the previous minimal reserve level, hence the size-biased L, but the previous level will only be exceeded by a fraction LU, since R will not be at its minimum when the claim arrives.

So what happens if λµ < c? There will only be a finite number of claims that exceed the previous minimal reserve level, since now R_t → ∞, and Y remains constant for all lower levels of h.

This is not very explicit. To conclude, let us derive more explicit estimates of ψ.

Proposition 116 Assume that there is α > 0 such that

1 = ∫_0^∞ e^{αy} f(y) dy = (λ/c) ∫_0^∞ e^{αy} K(y) dy.

Then there is a constant C > 0 such that

ψ(x) ∼ C e^{−αx} as x → ∞.

Proof: Define a probability density function f̂(y) = e^{αy} f(y), and set ĝ(y) = e^{αy} g(y) and ψ̂(x) = e^{αx} ψ(x). Then ψ̂ satisfies

ψ̂(x) = ĝ(x) + ∫_0^x ψ̂(x − y) f̂(y) dy.

The solution (obtained as in Corollary 115) converges by the key renewal theorem:

ψ̂(x) = ĝ(x) + ∫_0^x ĝ(x − y) û(y) dy → (1/µ̂) ∫_0^∞ ĝ(y) dy =: C as x → ∞,

where µ̂ = ∫_0^∞ y f̂(y) dy is the mean of f̂ and

û(x) = ∑_{n≥1} f̂^{*(n)}(x).

Note that ĝ is not necessarily non-increasing, but it can be checked that it is integrable, and a version of the key renewal theorem still applies. □
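In examples, the adjustment coefficient α usually has to be found numerically. A sketch by bisection (not from the notes; parameters are illustrative), assuming Exp(1/µ) claims so that the exact root α = 1/µ − λ/c is available for comparison:

```python
import math

# Bisection for alpha solving 1 = (lam/c) * int_0^infty e^{alpha*y} K(y) dy
# with K(y) = e^{-y/mu} (Exp(1/mu) claims); exact root: alpha = 1/mu - lam/c.
lam, c, mu = 1.0, 2.0, 1.5          # illustrative values

def tilted_mass(alpha, grid=20000, upper=100.0):
    # midpoint-rule approximation of (lam/c) * int_0^upper e^{(alpha-1/mu)y} dy
    h = upper / grid
    total = sum(math.exp((alpha - 1.0 / mu) * (i + 0.5) * h) for i in range(grid))
    return (lam / c) * h * total

lo, hi = 0.0, 1.0 / mu - 1e-6       # integral converges only for alpha < 1/mu
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if tilted_mass(mid) < 1.0:
        lo = mid                    # tilted mass too small: increase alpha
    else:
        hi = mid
alpha = 0.5 * (lo + hi)
print(alpha, 1.0 / mu - lam / c)    # both approximately 1/6
```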


Example 117 If A_n ∼ Exp(1/µ), then in the notation of Proposition 113

g(x) = (λµ/c) e^{−x/µ}  and  f(y) = (λ/c) e^{−y/µ},

so that the renewal equation becomes

e^{x/µ} ψ(x) = λµ/c + (λ/c) ∫_0^x ψ(y) e^{y/µ} dy.

In particular, ψ(0) = λµ/c. After differentiation and cancellation,

ψ′(x) = (λ/c − 1/µ) ψ(x)  ⇒  ψ(x) = (λµ/c) exp( −(c − λµ)x/(cµ) ).
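The closed form makes this example a convenient test case for simulation (a sketch, not part of the notes). Since the W_n of Example 112 are the local minima of R, ruin can be detected by simulating the reserve at claim times only; stopping paths at a high barrier is a simulation shortcut, not part of the model:

```python
import random, math

# Monte Carlo estimate of psi(u) for Exp(1/mu) claims, compared with the
# closed form psi(u) = (lam*mu/c) * exp(-(c - lam*mu)*u/(c*mu)).
lam, c, mu, u = 1.0, 2.0, 1.5, 1.0   # illustrative values
rng = random.Random(1)
paths, barrier = 10000, 60.0         # barrier: psi(60) is negligible here
ruined = 0
for _ in range(paths):
    w = u
    while 0.0 <= w <= barrier:
        # increment between claim times: premium income minus next claim
        w += c * rng.expovariate(lam) - rng.expovariate(1.0 / mu)
    if w < 0.0:
        ruined += 1
estimate = ruined / paths
closed_form = (lam * mu / c) * math.exp(-(c - lam * mu) * u / (c * mu))
print(estimate, closed_form)         # both should be near 0.635
```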

15.4 Some simple finite-state-space models

Example 118 (Sickness-death) In health insurance, the following model arises. Let S = {H, S, ∆} consist of the states healthy, sick and dead. Clearly, ∆ is absorbing. All other transitions are possible, at different rates. Under the assumption of full recovery after sickness, the state of health of the insured can be modelled by a continuous-time Markov chain.
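Such a chain is straightforward to simulate from its jump chain and exponential holding times. A sketch (the transition rates below are invented illustration values, not calibrated to any data):

```python
import random

# Healthy-sick-dead chain: hypothetical rates H->S, H->D, S->H, S->D.
Q = {
    'H': {'S': 0.5, 'D': 0.01},
    'S': {'H': 1.0, 'D': 0.1},
}   # 'D' is absorbing, so it has no entry

def simulate(horizon, rng):
    """Return the state occupied at time `horizon`, starting healthy."""
    t, state = 0.0, 'H'
    while state != 'D':
        rates = Q[state]
        total = sum(rates.values())
        t += rng.expovariate(total)          # exponential holding time
        if t > horizon:
            return state
        r = rng.uniform(0.0, total)          # next state ~ jump chain
        for nxt, q in rates.items():
            r -= q
            if r <= 0.0:
                state = nxt
                break
    return 'D'

rng = random.Random(7)
n = 5000
dead = sum(simulate(40.0, rng) == 'D' for _ in range(n))
print(dead / n)   # estimated probability of death within 40 time units
```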

Example 119 (Multiple decrement model) A life assurance often pays benefits not only upon death but also when a critical illness or certain losses of limbs, sensory losses or other disability are suffered. The assurance is not usually terminated upon such an event.

Example 120 (Marital status) Marital status has a non-negligible effect for various insurance types. The state space is S = {B, M, D, W, ∆} to model bachelor, married, divorced, widowed, dead. Not all direct transitions are possible.

Example 121 (No claims discount) In automobile and some other general insurances, you get a discount on your premium depending on the number of years without (or with at most one) claim. This gives rise to a whole range of models, e.g. S = {0%, 20%, 40%, 50%, 60%}.

In all these examples, the exponential holding times are not particularly realistic. There are usually costs associated either with the transitions or with the states. Also, estimation of transition rates is of importance. A lot of data are available and sophisticated methods have been developed.


Lecture 16

Conclusions and Perspectives

16.1 Summary of the course

This course was about stochastic process models X = (X_t)_{t≥0} in continuous time and (mostly) a discrete state space S, often N. Applications include those where X describes

• counts of births, atoms, bacteria, visits, trials, arrivals, departures, insurance claims, etc.

• the size of a population, the number of buses in service, the length of a queue

and others can be added. What matters is the real structure, the real transition mechanism that we wish to model by X. Memory plays an important role. We distinguish

• Markov property (lack of memory, exponential holding times; past irrelevant for the future except for the current state)

• Renewal property (information on previous states irrelevant, but duration in state relevant; Markov property at transition times)

• Stationarity, equilibrium (behaviour homogeneous in time; for Markov chains, invariant marginal distribution; for renewal processes, stationary increments)

• Independence (of individuals in population models, of counts over disjoint time intervals, etc.)

Once we are happy that such conditions are met, we have a model X for the real process, and we study it under our model assumptions. We study

• different descriptions of X (jump chain – holding times, transition probabilities – forward-backward equations, infinitesimal behaviour)

• convergence to equilibrium (invariant distributions, convergence of transition probabilities, ergodic theorem; strong law and CLT of renewal theory, renewal theorems)

• hitting times, excess life, recurrence times, waiting times, ruin probabilities



• odd behaviour (explosion, transience, arithmetic interarrival times)

Techniques

• Conditioning, one-step analysis

• detailed balance equations

• algebra of limits for almost sure convergence

16.2 Duration-dependent transition rates

Renewal processes can be thought of as having duration-dependent transition rates. If the interarrival distribution is not exponential, then (at least some) residual distributions will not be the same as the full interarrival distribution, but we can still express, say for Z with density f, that

P(Z − t > s | Z > t) = P(Z > t + s) / P(Z > t)  and  f_{Z−t|Z>t}(s) = f(t + s) / P(Z > t).

If we define

λ(t) = f(t) / P(Z > t) = −F̄′(t) / F̄(t),

where F̄(t) = P(Z > t) and in particular F̄(0) = 1, we can write

F̄(t) = exp( −∫_0^t λ(s) ds )  and  f(t) = λ(t) exp( −∫_0^t λ(s) ds ).

We can then also express the residual distributions in terms of λ(s):

P(Z − t > s | Z > t) = exp( −∫_t^{t+s} λ(r) dr ).

λ(t) can be interpreted as the instantaneous arrival rate at time t after the previous arrival. Similarly, we can use this idea in Markov models and split a holding rate λ_i(d), depending on the duration d of the current visit to state i, into transition rates q_{ij}(d) with λ_i(d) = ∑_{j≠i} q_{ij}(d).
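The identity F̄(t) = exp(−∫_0^t λ(s) ds) also gives a sampling recipe: invert the cumulative hazard at an exponential variable. A sketch (not from the notes) with the illustrative choice λ(t) = 2t, so that Λ(t) = t² and T = √(−log U) for U ∼ Unif(0, 1):

```python
import math, random

# Sample an interarrival time from its hazard rate lambda(t) = 2t:
# Fbar(t) = exp(-t^2), so T = Lambda^{-1}(-log U) = sqrt(-log U).
# This is a Weibull variable with E(T) = sqrt(pi)/2.
rng = random.Random(3)

def sample_from_hazard(rng):
    # 1 - random() lies in (0, 1], avoiding log(0)
    return math.sqrt(-math.log(1.0 - rng.random()))

n = 100000
mean = sum(sample_from_hazard(rng) for _ in range(n)) / n
print(mean, math.sqrt(math.pi) / 2)   # should agree to about two decimals
```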

16.3 Time-dependent transition rates

A different type of varying transition intensities is obtained if we make the rates λ(t) dependent on global time t. Here, the time passed in a given state is irrelevant; only the actual time matters. This is useful to model seasonal effects, e.g. the intensity of road accidents may be considered higher in winter than in summer. So a Poisson process to model this could have intensity λ(t) = λ_0 + λ_1 cos(2πt). This can also be built into birth and death rates of population models, insurance models, etc.
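Such an inhomogeneous Poisson process can be simulated by thinning a homogeneous process of rate λ_0 + λ_1 (a standard technique, not covered in the notes; parameter values below are illustrative):

```python
import math, random

# Thinning: generate a homogeneous process of rate lam_max = lam0 + lam1 and
# keep each point at time t with probability lambda(t)/lam_max, where
# lambda(t) = lam0 + lam1*cos(2*pi*t).
lam0, lam1, horizon = 10.0, 3.0, 1000.0
lam_max = lam0 + lam1
rng = random.Random(11)

points, t = [], 0.0
while True:
    t += rng.expovariate(lam_max)
    if t > horizon:
        break
    intensity = lam0 + lam1 * math.cos(2.0 * math.pi * t)
    if rng.random() < intensity / lam_max:
        points.append(t)

# over whole periods the cosine integrates to zero, so E(#points) = lam0*horizon
print(len(points) / horizon)   # should be close to lam0 = 10
```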


16.4 Spatial processes

In the case of Poisson counts, one can also look at intensity functions on R^2 or R^d and look at "arrivals" as random points in the plane. One requires

N([0, t] × [0, z]) = X(t, z) ∼ Poi( ∫_0^t ∫_0^z λ(s, y) dy ds ),

and that counts in disjoint rectangles are independent Poisson variables.
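A spatial Poisson process on a rectangle can be simulated by the same thinning idea: scatter a Poisson number of uniform candidate points and keep each with probability proportional to the intensity there (a sketch; the intensity function below is an invented illustration):

```python
import random

# Spatial Poisson process on [0,T] x [0,Z] with intensity lambda(s,y) = 50(s+y),
# by thinning a homogeneous process whose rate dominates the intensity.
T, Z = 2.0, 1.0
def intensity(s, y): return 50.0 * (s + y)   # maximum on the rectangle: 150
lam_max = 50.0 * (T + Z)
rng = random.Random(5)

def poisson(mean, rng):
    # sample Poi(mean) by counting Exp(1) arrivals below the mean
    n, s = 0, rng.expovariate(1.0)
    while s <= mean:
        n += 1
        s += rng.expovariate(1.0)
    return n

points = []
for _ in range(poisson(lam_max * T * Z, rng)):
    s, y = rng.uniform(0.0, T), rng.uniform(0.0, Z)
    if rng.random() < intensity(s, y) / lam_max:
        points.append((s, y))

# E(#points) = int_0^T int_0^Z 50(s+y) dy ds = 150 for these parameters
print(len(points))
```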

16.5 Markov processes in uncountable state spaces (R or R^d)

We have come across some processes for which we could have proved a Markov property: the age process (A_t)_{t≥0} of a renewal process, the excess process (E_t)_{t≥0} of a renewal process, but also the processes (C_t)_{t≥0} and (R_t)_{t≥0} with stationary independent increments that arose in insurance ruin by combining Poisson arrival times with jump sizes. A systematic study of such Markov processes in R is technically much harder, although many ideas and results transfer from our countable state space model.

Diffusion processes, as a special class of such Markov processes, are studied in a Finance context in B10b, in the context of Stochastic Differential Equations in a Part C course, and in a Genetics context in another Part C course.

16.6 Lévy processes

A particularly nice class of processes are processes with stationary independent increments, so-called Lévy processes. If you have learned or are about to learn about Brownian motion in another course (B10b), then you know most Lévy processes, since a general Lévy process can be written as X_t = µt + σB_t + C_t, where µ is a drift coefficient, σ a scale parameter for the Brownian motion process B, and C is a limit of compound Poisson processes like the claim size process above. In fact, C may have infinitely many jumps in a finite interval, which are summable in some sense, but not necessarily absolutely summable; these jumps can be described by a family of independent Poisson processes with associated independent jump heights, in fact a Poisson measure on [0, ∞) × R^* with intensity function ...

There is a Part C course on Lévy processes in Finance.

16.7 Stationary processes

We have come across stationary Markov chains and stationary increments of other processes. Stationarity is a concept that can be studied separately. In our examples, the dependence structure of processes was simple: independent increments, or Markovian dependence, independent holding times, etc. More complicated dependence structures may be studied.


16.8 4th year dissertation projects

Other than this, if you wish to study any of the stochastic processes or the applications more deeply in your fourth year, there are several people in the Statistics Department and the Mathematical Institute who would be willing to supervise you. Think about this, maybe over the Christmas vacation, since the Easter vacation is close to the exams. Dissertation projects for Maths & Stats students are arranged before the summer to ensure that every student obtains a project well in advance; you can then start working on your project during the summer. For Maths students, dissertation projects are optional. Please get in touch with me or other potential supervisors in Hilary term if you wish to discuss your options. I am also happy to direct you to other people.