
Stat333 Lecture Notes

Applied Probability Theory

Jiahua Chen

Department of Statistics and Actuarial Science

University of Waterloo

© Jiahua Chen

Fall, 2003


Course Outline

Stat333

Review of basic probability. Generating functions and their applications. Simple random walk, branching process and renewal events. Discrete time Markov chain. Poisson process and continuous time Markov chain. Queueing theory and renewal processes.


Contents

1 Introduction
1.1 Probability Model
1.2 Conditional Probabilities and Independence
1.3 Bayes Formula
1.4 Key Facts
1.5 Problems

2 Random Variables
2.1 Random Variable
2.2 Discrete Random Variables
2.3 Continuous Random Variables
2.4 Expectations
2.5 Joint Distribution
2.6 Independence
2.7 Formulas for Expectations
2.8 Key Results and Concepts
2.9 Problems

3 Conditional Expectation
3.1 Introduction
3.2 Formulas
3.3 Comment
3.4 Problems

4 Generating Functions
4.1 Introduction
4.2 Probability Generating Functions
4.3 Convolution
4.3.1 Key Facts
4.4 The Simple Random Walk
4.4.1 First Passage Times
4.4.2 Returns to Origin
4.4.3 Some Key Results in the Simple Random Walk
4.5 The Branching Process
4.5.1 Mean and Variance of Zn
4.5.2 Probability of Extinction
4.5.3 Some Key Results in the Branching Process
4.6 Problems

5 Renewal Events
5.1 Introduction
5.2 The Renewal and Lifetime Sequences
5.3 Some Properties
5.4 Delayed Renewal Events
5.5 Summary
5.6 Problems

6 Discrete Time MC
6.1 Introduction
6.2 Chapman-Kolmogorov Equations
6.3 Classification of States
6.4 Limiting Probabilities
6.5 Mean Time Spent in Transient States
6.6 Problems

7 Exponential and Poisson
7.1 Definition and Some Properties
7.2 Properties of Exponential Distribution
7.3 The Poisson Process
7.3.1 Inter-arrival and Waiting Time Distributions
7.4 Further Properties
7.5 Conditional Distribution of the Arrival Times
7.6 Problems

8 Continuous Time Markov Chain
8.1 Birth and Death Process
8.2 Kolmogorov Differential Equations
8.3 Limiting Probabilities
8.4 Problems

9 Queueing Theory
9.1 Cost Equations
9.2 Steady-State Probabilities
9.3 Exponential Model
9.4 Single Server
9.5 Network of Queues
9.5.1 Open System
9.5.2 Closed Systems
9.6 Problems

10 Renewal Process
10.1 Distribution of N(t)
10.2 Limiting Theorems and Their Applications
10.3 Problems

11 Sample Exam Papers
11.1 Quiz 1: Winter 2003
11.2 Quiz 2: Winter 2003
11.3 Final Exam: Winter 2003


Chapter 1

Introduction

1.1 Probability Model

A probability model consists of three parts: sample space, a collection of

events, and a probability measure.

Assume an experiment is to be done. The set of all possible outcomes is

called Sample Space. For example, if we roll a die, {1, 2, 3, 4, 5, 6} is the

sample space. We use notation S for the sample space. Every element of

S is called a sample point. Mathematically, the sample space is merely an

arbitrary set. There is no need for a corresponding experiment.

Roughly speaking, every subset of S is an event. The collection of events

is then all possible subsets of S. In some cases, however, we only admit a

specific class of subsets of S as events. We do not discuss this point in this

course.

For every event, we assign a probability to it. To make it meaningful,

we have to maintain some internal consistency. That is, the assignment is

required to have some properties. The following conditions are placed on

assigning probabilities.

Axioms of Probability Measure

A probability measure P is a function of events such that:

1. 0 ≤ P (E) ≤ 1 for any event E;


2. P (S) = 1;

3. P(∪_{i=1}^∞ Ei) = ∑_{i=1}^∞ P(Ei) for any mutually exclusive events Ei, i = 1, 2, . . ., i.e. EiEj = φ for all i ≠ j.

Mathematically, the above definition does not depend on the hypothetical

experiment. A probability model consists of a sample space S, a σ-algebra B (a collection of subsets of S with some properties), and a probability measure

P .

The axioms for a probability model imply that the probability measure

has many other properties not explicitly stated as axioms. For example, since

P (φ ∪ φ) = P (φ) + P (φ), we must have P (φ) = 0.

Let Ec be the complement of event E which consists of all sample points

which do not belong to E. Axioms 2 and 3 imply that

1 = P (S) = P (E ∪ Ec) = P (E) + P (Ec).

Hence, P (Ec) = 1− P (E).

For any two events E1 and E2, we have

P (E1 ∪ E2) = P (E1) + P (E2)− P (E1E2).

In general,

P(∪_{i=1}^n Ei) = ∑_i P(Ei) − ∑_{i1<i2} P(Ei1Ei2) + · · · + (−1)^{n+1} P(∩_{i=1}^n Ei).

Example 1.1

At a party, n men throw their hats into the centre of the room. Each man randomly picks up a hat. What is the probability that nobody gets his own hat? What is the limit of this probability as n → ∞?

Solution: Let Ai be the event that the ith man gets his own hat, for i = 1, 2, . . . , n. Then the event that nobody gets his own hat is [∪Ai]^c. Note that

P(∪iAi) = nP(A1) − C(n, 2)P(A1A2) + · · · ,

where C(n, k) denotes the binomial coefficient “n choose k”.


Using the classical definition of the probability measure (which satisfies the three axioms),

P(A1) = (n − 1)!/n!,  P(A1A2) = (n − 2)!/n!,

and so on. We get

P(∪iAi) = 1 − 1/2! + 1/3! − · · · + (−1)^{n+1}/n!.

The answer to the question is then

1 − P(∪iAi) = 1 − [1 − 1/2! + 1/3! − · · · + (−1)^{n+1}/n!].

The limit as n → ∞ is then exp(−1). ♦
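As a quick numerical check (an addition, not part of the original notes), the following Python sketch evaluates the inclusion-exclusion sum 1 − P(∪iAi) = ∑_{k=0}^n (−1)^k/k! for several n and compares it with exp(−1); by the alternating series bound the error at n is at most 1/(n + 1)!, so convergence is very fast.

```python
import math

def prob_no_match(n):
    # inclusion-exclusion gives P(nobody gets his own hat)
    # = sum_{k=0}^{n} (-1)^k / k!
    return sum((-1) ** k / math.factorial(k) for k in range(n + 1))

for n in [2, 3, 5, 10]:
    print(n, prob_no_match(n))
print("limit:", math.exp(-1))
```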

1.2 Conditional Probabilities and Independence

Two events A and B are independent if and only if

P (AB) = P (A)P (B).

Some people may have a probabilistic intuition for why this relationship describes independence, and why our notion of independence implies this relationship. However, once the notion of independence is defined as above, this relationship serves as our gold standard. We always try to verify it, whether we work on assignment problems or on applications of the concept of independence. For instance, to test whether being a smoker is independent of having heart disease, we check whether the above relationship holds by collecting data on their incidence.

A sequence of events A1, . . . , An is mutually independent if and only if

P(∩_{i∈I} Ai) = ∏_{i∈I} P(Ai)

for all subsets I of {1, 2, . . . , n}. We would like to emphasize that pairwise independence does not imply

the overall independence.


Let E and F be two events with P(F) > 0. We define the conditional probability of E given F by

P(E|F) = P(EF)/P(F).

As already defined, two events E and F are independent if and only if

P (EF ) = P (E)P (F ). When events E and F are independent, we find

P (E|F ) = P (E)

when P (F ) > 0. However, we should not use this relationship as the defi-

nition of independence. When P (F ) = 0, the conditional probability is not

defined, but E and F can still be two independent events.

1.3 Bayes Formula

Let Fi, i = 1, 2, . . . , n be mutually exclusive events such that ∪Fi = S, and

P (E) > 0. Then

P(Fk|E) = P(EFk)/P(E) = P(E|Fk)P(Fk) / ∑_i P(E|Fi)P(Fi).

The Bayes formula is a mathematical consequence of defining the condi-

tional probability. However, this formula has generated a lot of thinking in

statistics. We could think of E as an event (a subset of the sample space) of some experiment to be done, and the Fi's classify the sample points of the same experiment according to a possibly different rule (than the rule of E). Somehow,

E is readily observed, but Fi’s are not. Before the experiment is done, we

may have some prior information on what probabilities of Fi’s are. However,

when the experiment is done and the outcome (the sample point) is known

to belong to E, but its membership in Fi remains unknown, this Bayes for-

mula allows us to update our assessment of the chance for Fi in view of the

occurrence of E. For example, before we toss a die, it is known that the chance of observing a 2 is 1/6. After the die is tossed, if you are told that the outcome is an even number, then the conditional probability becomes 1/3.

Here is a less straightforward example.


Example 1.2

There are three coins in a box: 1. two-headed; 2. fair; 3. biased with P(H) = 0.75.

When one of the coins is selected at random and flipped, it shows a head. What is the probability that it is the two-headed coin?

Solution: Let C1, C2 and C3 represent the events that the two-headed, fair or biased coin is selected, respectively. We want to find P(C1|H).

P(C1|H) = P(H|C1)P(C1) / ∑_{i=1}^3 P(H|Ci)P(Ci).

The answer is 4/9. ♦

Remark: It is not so important to memorize the Bayes formula; what matters is the definition of conditional probability. Once you understand conditional probability, you can work out the formula easily.
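As a small illustration (added here; not in the original notes), the following Python sketch evaluates the Bayes formula for Example 1.2 directly:

```python
# Three equally likely coins: P(H|C1) = 1 (two-headed),
# P(H|C2) = 0.5 (fair), P(H|C3) = 0.75 (biased).
prior = [1 / 3, 1 / 3, 1 / 3]
likelihood = [1.0, 0.5, 0.75]                              # P(H | Ci)

evidence = sum(p * l for p, l in zip(prior, likelihood))   # P(H)
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
print(posterior[0])                                        # 0.4444... = 4/9
```

The denominator computed here is exactly the sum ∑i P(H|Ci)P(Ci) in the Bayes formula.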

1.4 Key Facts

A probability space consists of three components: Sample space, the collec-

tion of events, and the probability measure. The probability measure satisfies

three axioms, from which we introduce the concepts of conditional prob-

ability and independence. The Bayes theorem is a simple consequence of

manipulating the idea of conditional probability. However, the result incited

philosophical debate in statistics.

1.5 Problems

1. Suppose that in an experiment, a fair die is rolled twice. Let A={the

first outcome is even}, B={the total score is 4}, C= the total score,

D=the absolute difference between two scores.

(a) Which of A, B, C, D are events? Which of them are random

variables?

(b) Which of the following make sense? Which of them do not?

(i) A ∪B, (ii) P (C), (iii) E(A), (iv) Var(D).


2. Let S be the sample space of a particular experiment, A and B be events, and P be a probability measure. Which of the following are axioms, which are definitions, and which are formulas?

(i) P (A ∪B) = P (A) + P (B)− P (AB).

(ii) P (S) = 1.

(iii) P(A|B) = P(AB)/P(B) when P(B) ≠ 0.

3. Using only the axioms of probability, show that

1) P (A ∪B) = P (A) + P (B)− P (AB)

2) P (A∪B∪C) = P (A)+P (B)+P (C)−P (AB)−P (AC)−P (BC)+

P (ABC).

4. a) Prove that P (ABC) = P (A|BC)P (B|C)P (C).

b) Prove that if A and B are independent, then so are Ac and Bc.

5. Let A and B be two events.

(a) Show that in general, if A and B are mutually exclusive, then they

are not necessarily independent.

(b) Find a particular pair of events A and B such that they are both

mutually exclusive and independent.

6. Prove Boole’s inequalities:

(a) P(∪_{i=1}^n Ai) ≤ ∑_{i=1}^n P(Ai); (b) P(∩_{i=1}^n Ai) ≥ 1 − ∑_{i=1}^n P(Ai^c).

7. Let A1 ⊃ A2 ⊃ · · · be a sequence of events. If ∩_{i=1}^∞ Ai = φ (empty), show that

lim_{n→∞} P(An) = 0.


Chapter 2

Random Variables

2.1 Random Variable

In practice, we may describe the outcomes of an experiment by any termi-

nology. For example, if Mary and Paul compete in a game, the outcomes can

be: Mary wins; Mary loses; it is a draw.

However, it is more convenient in mathematics to code the outcomes by

numbers. For example, we may define the outcome as 1 if Mary wins, as −1 if Mary loses, and as 0 if it is a draw. That is, we can transform

the outcomes in S into numbers. There are many ways to transform the

outcomes.

In probability theory, we call the mechanism of transforming sample points into numbers a Random Variable. More formally, we define a

random variable as a function on the sample space S.

We use capital letters X, Y , and so on for random variables.

In most applications, we focus mainly on the value of the function (ran-

dom variables). That is why it appears that the random variables are num-

bers, rather than mechanisms of transforming sample points into numbers.

As a function, a random variable is totally deterministic. There is nothing

random. However, the inputs of this function are random, and this implies that the outcome of the transformation is random. This is how we get the notion

that random variables are random.

Example 2.1


Let S be the ordered outcomes of rolling two fair dice. Define X to be the sum of the two outcomes. If ω = (2, 5), which is a sample point, then X(ω) = 7. Nothing is random. ♦

Since in a specific experiment, we are not certain in advance whether the

two outcomes will be ω = (2, 5), we hence do not know whether the outcome

of X will be 7. This gives us the illusion of X being random. Its randomness

is inherited from the randomness of the outcome in S.

When we use notation “X = 7”, we often do not mean that the outcome

of X is 7 in a specific experiment. Rather, we define it as

“X = 7” = “the set of sample points which make X equal 7”.

Hence, in this example,

“X = 7” = {(1, 6), (2, 5), . . . , (6, 1)}

which is a subset of S. Consequently, it is an event. When the dice are fair,

the classical definition assigns a probability of 1/6 to this event.

If the dice are not fair, we usually assign a different value to it, or we

do not know what value is most suitable in this application. However, we

believe that a suitable value must exist, and that it does not have any

effect on the definition of X.

There is another reason for not focusing on the fact that a random variable

X is a function. We care more about probabilities associated with events in

the form of “X ≤ x” than about how X maps S into real numbers. Once

P (X ≤ x) is available for all real numbers x, we then classify X according

to the form of this function, and ignore X itself.

Example 2.2

Toss a coin until the first head appears. Suppose in each trial, P (H) = p

and trials are independent. Define X = the number of tosses when the

experiment is completed.

In this experiment, the sample space is

S = {H,TH, TTH, . . .}.

Page 14: Stat333 Lecture Notes - Daum

2.2. DISCRETE RANDOM VARIABLES 9

The corresponding values of X are

{1, 2, 3, . . .}.

We find

P(X = n) = p(1 − p)^{n−1}

for all n ≥ 1. Once this is done, we say X has a geometric distribution. How this X is defined becomes irrelevant. ♦

If X is a random variable, we call

F (x) = P (X ≤ x)

the cumulative distribution function (c.d.f.). It is known that F(x) is a

c.d.f. of some random variable in some probability model if and only if

1. F (x) ≥ 0;

2. F (∞) = 1, F (−∞) = 0;

3. F (x) is monotone increasing and right continuous.

That is, we can construct a sample space together with a probability measure

and a random variable, so that the cumulative distribution function of this

random variable is given by F (x).

2.2 Discrete Random Variables

If the set of all possible outcomes of a random variable X is countable, then

we say that the random variable X is discrete.

For example, if a random variable can only take values {.2, .5, √2, π}, it

is discrete. More commonly seen discrete random variables in our textbooks

take integer values. However, we should remember that discrete random

variables can take any values, as long as the set of possible values remains

countable.

By the way, the notion of countable needs to be clarified. If we can find

a one-to-one map from a set to a set of integers, then this set is count-

able. The set of all even numbers is countable. The set of the numbers

Page 15: Stat333 Lecture Notes - Daum

10 CHAPTER 2. RANDOM VARIABLES

{1, .1, .01, .001, . . .} is also countable. Being countable implies that we can

arrange the elements in the set into a sequence. We often represent a count-

able set of real numbers as {x1, x2, . . .}.

If {t1, t2, t3, . . .} is the set of possible outcomes of X, we call the function

f(ti) = P(X = ti)

the probability (mass) function (p.m.f.) of X.

Note that in this definition, I used notation ti for possible values of the

random variable X. Although it is a common practice that we use xi’s for

possible values of the random variable X, this is not a requirement. It is

very important for us to make a distinction between (the notation of) the

possible values of X, and X itself.

2.3 Continuous Random Variables

If the c.d.f. of a random variable F (x) = P (X ≤ x) can be written as

F(x) = ∫_{−∞}^x f(t) dt,

for some non-negative f(t), we say X is absolutely continuous. We have

f(x) = dF (x)/dx

(for almost all x) and f(x) is called the density function of X.

We classify random variables according to their cumulative distribution

functions, probability functions or density functions. We usually do not mind

how these random variables are defined.

Example 2.3

1. X has a Binomial (n, p) distribution if f(i) = P(X = i) = C(n, i) p^i (1 − p)^{n−i} for i = 0, 1, . . . , n.

2. X has a Poisson (µ) distribution if

f(i) = P(X = i) = (µ^i / i!) exp(−µ)

for i = 0, 1, 2, . . ..


3. X has uniform [0, 1] distribution if F (x) = P (X ≤ x) = x for x ∈ [0, 1],

or f(x) = 1 for x ∈ [0, 1].

4. X has an exponential distribution with mean parameter θ if its c.d.f. is given by F(x) = 1 − exp(−x/θ) or if its p.d.f. is given by f(x) = (1/θ) exp(−x/θ) for x ≥ 0.

♦

Note that we do not have to specify the sample space, probability mea-

sure, and how the random variables are defined in the above example.

Two basic types of random variables have been introduced. In theory,

there is a third type of random variables. However, the third type of random

variables is usually not discussed in elementary probability courses. Notice

that the sum of two random variables is clearly another random variable.

When we add a continuous random variable to a discrete random variable,

the new random variable is neither discrete nor continuous. That is, we cannot always classify a random variable into one of the three possible types. A measure theory result states, however, that any random variable can be written as a linear combination of three random variables, one of each type.

2.4 Expectations

A proper definition of the expectation of a random variable needs advanced

knowledge of real analysis. We give a simplified definition as follows.

If X is discrete with possible values {x0, x1, x2, . . .}, then we calculate

its expectation as

E(X) = ∑_{i=0}^∞ xi P(X = xi)

when the summation converges absolutely.

If X is (absolutely) continuous with density function f(x), then we calculate its expectation as

E(X) = ∫_{−∞}^∞ t f(t) dt

when the integration converges absolutely.

Page 17: Stat333 Lecture Notes - Daum

12 CHAPTER 2. RANDOM VARIABLES

When the convergence does not hold, we say the expectation does not

exist.

To calculate the expectation of any random variable, we should pay close attention to the “if” part before we start. Many students lose the thread because they ignore this part of the definition.

Example 2.4

Calculate the expectations of Binomial and Exponential random variables. ♦
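A short numerical sketch (added; not from the notes) can confirm the familiar answers E(X) = np for the Binomial (n, p) and E(X) = θ for the Exponential (θ), using the two definitions above; the parameter values are purely illustrative.

```python
import math

# discrete case: E(X) = sum_i x_i P(X = x_i), Binomial(n, p)
n, p = 10, 0.3
binom_mean = sum(i * math.comb(n, i) * p**i * (1 - p)**(n - i)
                 for i in range(n + 1))
print(binom_mean, "vs n*p =", n * p)

# continuous case: E(X) = integral of t f(t) dt, Exponential(theta),
# approximated by a Riemann sum on a fine grid (tail truncated at 50)
theta, dt = 2.0, 1e-3
expo_mean = sum(k * dt * (1 / theta) * math.exp(-k * dt / theta) * dt
                for k in range(int(50 / dt)))
print(expo_mean, "vs theta =", theta)
```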

2.5 Joint Distribution

Let X and Y be two random variables. Note that it is possible to define two

functions on the same sample space. For example, suppose our sample space

is [0, 1]×[0, 1], the unit square. Every sample point can be represented as

(w1, w2). Let

X(w1, w2) = w1, Y (w1, w2) = w2

and assume the probability measure on [0, 1]×[0, 1] is uniform. Then both

X and Y have uniform distribution. We find

P(X ≤ s, Y ≤ t) = st

when (s, t) ∈ [0, 1]×[0, 1].

Let Z be another random variable such that

Z(w1, w2) = 1 − w1.

We find that Z also has a uniform distribution. However,

P(X ≤ s, Z ≤ t) ≠ st

in general.

The moral of this example is: knowing individual distributions of X, Y

and Z is not enough to tell their joint behavior.

The joint random behavior of two random variables X and Y is charac-

terized by their joint c.d.f. defined as

F (x, y) = P (X ≤ x, Y ≤ y).


The joint c.d.f. of more random variables is defined similarly.

Let us point out again that the lower case letters x, y are notation for

dummy variables. They do not have to be associated with the random variables X

and Y . That is, we may use

F (s, t) = P (X ≤ s, Y ≤ t)

to represent exactly the same joint c.d.f. It is the appearance of X, Y in the

definition that makes F the joint c.d.f. of X and Y.

The marginal c.d.f. of X or Y can be obtained by taking limits:

FX(s) = P(X ≤ s) = lim_{t→∞} F(s, t),

FY(y) = P(Y ≤ y) = lim_{s→∞} F(s, y).

Note that I used (s, t, y) on purpose. It is certainly not a good practice,

but the point is, X does not have to be linked with x.

When both X and Y are discrete, it is more convenient to work with the

joint probability (mass) function:

f(x, y) = P (X = x, Y = y);

When there exists a non-negative function f(x, y) such that

F(x, y) = ∫_{−∞}^y ∫_{−∞}^x f(s, t) ds dt

for all real numbers (x, y), we say that X and Y are jointly (absolutely)

continuous and f(x, y) is their joint density function.

The marginal probability function (for discrete case) can be obtained as

fX(x) = ∑_y f(x, y).

The marginal density function (for continuous case) can be obtained as

fX(x) = ∫ f(x, y) dy.


2.6 Independence

If the joint c.d.f. of X and Y satisfies F (x, y) = FX(x)FY (y) for all x, y, then

we say X and Y are independent.

When both X and Y are discrete, then the independence is equivalent to

f(x, y) = fX(x)fY (y)

for all (x, y) where f(x, y) is the joint probability function. When X and Y

are jointly continuous, then the independence is equivalent to

f(x, y) = fX(x)fY (y)

for almost all (x, y), where f(x, y) is the joint density function.

2.7 Formulas for Expectations

Let X and Y be two random variables. We define

Var(X) = E(X − E(X))^2 = E(X^2) − (EX)^2;

Cov(X, Y ) = E[(X − EX)(Y − EY )].

It is known that

E(aX + bY ) = aEX + bEY ;

Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)

where a, b are two real numbers (constants).

Let Z = X + Y be a newly created random variable. Its c.d.f. can be

derived from the joint c.d.f. of X and Y . This task is not always simple.

There are two special cases.

First, assume X and Y are independent and jointly continuous. Assume

that X has density function f(x) and Y has density function g(y). Then

we know that the joint density function f(x, y) = f(x)g(y). The density

function of Z = X + Y is given by

fZ(t) = ∫_{−∞}^∞ f(t − y) g(y) dy.


Second, assume X and Y are independent, take non-negative integer values only, with probability functions f(x) and g(y). (Note the notation looks the same as before.) The probability function of Z = X + Y is

P(Z = n) = ∑_{i=0}^n f(i) g(n − i).

Example 2.5 Derivation of the distribution of X + Y .

1. Both X and Y have an exponential distribution with common density f(x) = exp(−x) for x ≥ 0.

2. Both X and Y have Poisson distributions with means µ1 and µ2.
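As a numerical illustration of the second special case (an added sketch, not part of the notes), the convolution of two Poisson probability functions can be compared with the Poisson (µ1 + µ2) probability function, which is the answer to part 2:

```python
import math

def poisson_pmf(mu, j):
    return mu**j / math.factorial(j) * math.exp(-mu)

mu1, mu2 = 1.5, 2.5                        # illustrative means
for n in range(6):
    # P(Z = n) = sum_i P(X = i) P(Y = n - i)
    conv = sum(poisson_pmf(mu1, i) * poisson_pmf(mu2, n - i)
               for i in range(n + 1))
    print(n, conv, poisson_pmf(mu1 + mu2, n))   # the two columns agree
```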

2.8 Key Results and Concepts

Random variables are real valued functions defined on the sample space.

Their randomness is the consequence of the randomness of the outcome from

the sample space. We classify them according to their cumulative distribu-

tion functions or equivalently, their probability mass functions or probability

density functions.

A discrete random variable takes at most a countable number of possible

values. An absolutely continuous random variable has cumulative distribu-

tion function which can be obtained from a density function by integration.

(Or roughly, its cumulative distribution function is differentiable). The third

type of random variable is not discussed.

A random variable has, say, Poisson distribution if its probability function

has the form (µ^n / n!) exp(−µ), n = 0, 1, 2, . . . .

In general, the distribution of a random variable is named after the form of

its cumulative distribution function.

The mean, variance, moments of a random variable are determined by

its distribution. In many examples, they can be obtained easily by summation or integration. In other examples, the mean and variance

of a random variable can be obtained via its relationship to other random

variables. Thus, memorizing some formulas is useful.

2.9 Problems

1. If X and Y are two random variables, what do we mean by

(i) F (x) is the cumulative distribution function of X?

(ii) X ≤ 4 is independent of Y ≥ 2?

2. Let X be a random variable with a Binomial distribution with parameters n = 3, p = 0.4, i.e.

pX(k) = C(3, k)(0.4)^k(1 − 0.4)^{3−k} when k = 0, 1, 2, 3.

Let Y = (X − 1)^2.

(i) Let FX(x) be the cumulative distribution function of X. Calculate

FX(2.4).

(ii) Tabulate the probability function of Y .

(iii) Tabulate the probability function of X given Y = 1.

(iv) Tabulate E(X|Y).

3. A random number N of fair dice is thrown, with P(N = n) = 2^{−n}, n ≥ 1.

Let S be the sum of the scores. Find the probability that

a) N = 2 given S = 4

b) S = 4 given N = 2.

c) S = 4 given N is even

d) the largest number shown by any die is r.

4. A coupon is selected at random from a series of k coupons and placed

in each box of cereal. A house-husband has bought N boxes of cereal.

Page 22: Stat333 Lecture Notes - Daum

2.9. PROBLEMS 17

What is the probability that all k coupons are obtained? (Hint: Con-

sider the event that the ith coupon is not obtained. The answer is in a nice summation form.)

5. If birthdays are equally likely to fall in each of the twelve months of

the year, find the probability that all twelve months are represented

among the birthdays of 20 people selected at random.

(Hint: let Ai be the event that the ith month is not included and

consider A1 ∪ A2 · · · ∪ A12)

6. Let X be a random variable and g(·) be a real valued function.

(a) What do we mean by X is discrete?

(b) If X is a discrete random variable, argue that g(X) is also a random

variable and discrete.

(c) If X is a continuous random variable, is g(X) necessarily a contin-

uous random variable? Why?

7. Let a and b be independent random variables uniformly distributed in

(0, 1). What is the probability that x^2 + ax + b = 0 has no real roots?

8. Express the distribution functions of

X^+ = max{0, X}, X^− = −min{0, X}, |X| = X^+ + X^−, −X

in terms of the distribution function F of the random variable X.

9. Is it generally true that E(1/X) = 1/E(X)? Is it ever true that

E(1/X) = 1/E(X)?

10. Suppose that 10 cards, of which 5 are red and 5 are green, are put at random into 10 envelopes, of which 7 are red and 3 are green, so that each envelope will contain a card. Determine the probability that exactly k envelopes will contain a card with a matching color (k = 0, 1, . . . , 10).


Chapter 3

Conditional Distribution and

Expectations

3.1 Introduction

Suppose both X and Y are discrete and hence have a joint probability func-

tion f(x, y). Then, we have

P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = f(x, y)/fY(y).

Of course, this is meaningful only if P (Y = y) = fY (y) > 0.

When we pay no attention to the part Y = y, this is a function of x only. However, this function (or the way of crunching the number x and reporting a number called a probability) is determined by X, Y and the number y

jointly. As a function of x, it is a probability function. Since it is determined

by X and Y = y, we say it is the conditional probability function of X given

Y = y. A commonly used notation is fX|Y (x|y).

Example 3.1

There are two urns. The first contains 4 white and 6 black balls, and the

second contains 2 white balls and 8 black balls. An urn is selected randomly,

and then we randomly pick 5 balls from the urn (with replacement). Define


X = the number of white balls selected. What is the probability function of

X?

Solution: Consider the situations when different urns are selected. Define

Y = i if the ith urn is selected.

Let us work on the conditional probability functions first.

P(X = j|Y = 1) = C(5, j)(.4)^j(.6)^{5−j}

and

P(X = j|Y = 2) = C(5, j)(.2)^j(.8)^{5−j}

for j = 0, 1, . . . , 5.

The marginal probability function of X is given by

P(X = j) = (0.5) C(5, j)(.4)^j(.6)^{5−j} + (0.5) C(5, j)(.2)^j(.8)^{5−j}.

♦

As we have noticed, when Y = 1 is given, X has a binomial distribution

with n = 5, p = 0.4. This distribution has expectation 2. We use notation

E(X|Y = 1) = 2.

In general, we define

E(X|Y = y) = ∑_x x P(X = x|Y = y)

where the sum is over all possible values of X.

Remark: Again, we should always first determine whether X is discrete.

If it is, then determine what the possible values of X are before this formula is applied.

When both X and Y are discrete, E(X|Y = y) is well defined. There are

several components in this definition. Whenever we use a new value y, the

outcome will probably change. In the last example,

E(X|Y = 1) = 2, E(X|Y = 2) = 1.


When we focus on the value of Y in this expression, we find we have a

function of y defined as

φ(y) = E(X|Y = y).

Just like a function such as g(y) = y^2, we know that φ(Y) is also a random

variable. Thus, we might want to know the expectation of this new random

variable. It turns out that

E[φ(Y)] = ∑_y φ(y)P(Y = y)

= ∑_y E(X|Y = y)P(Y = y)

= ∑_y [∑_x x P(X = x|Y = y)] P(Y = y)

= ∑_{x,y} x P(X = x, Y = y)

= E(X).

To be more concrete, we do not use φ(Y ) in textbooks, but write it as

E(X|Y) and call it the conditional expectation of X given Y. For those with mathematical curiosity, we may write

E(X|Y) = E[X|Y = y]|_{y=Y}.

Hence, the above identity can be stated as

E[E(X|Y )] = E(X).
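To make this identity concrete, the two sides can be computed numerically in the two-urn setting of Example 3.1 (an added sketch, not from the notes):

```python
import math

def binom_pmf(n, p, j):
    return math.comb(n, j) * p**j * (1 - p)**(n - j)

p_y = {1: 0.5, 2: 0.5}            # P(Y = i): each urn equally likely
p_white = {1: 0.4, 2: 0.2}        # P(white ball | urn i), five draws

# E(X | Y = y) for each urn, then the weighted average E[E(X|Y)]
e_x_given_y = {y: sum(j * binom_pmf(5, p_white[y], j) for j in range(6))
               for y in (1, 2)}
lhs = sum(e_x_given_y[y] * p_y[y] for y in (1, 2))

# E(X) computed from the marginal probability function of X
rhs = sum(j * sum(p_y[y] * binom_pmf(5, p_white[y], j) for y in (1, 2))
          for j in range(6))
print(lhs, rhs)                   # both equal 1.5
```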

One intuitive interpretation of this result is: the grand average is the weighted

average of sub-averages. To find the average mark of students in stat230, we

may first calculate the average in each of 6 sections. Hence, we obtain 6

conditional expectations (conditioning on which section a student is in). We

then calculate the weighted average of section averages according to the size

of each section. This is the second expectation being applied on the left hand

side of the above formula.

It turns out that this concept applies to continuous random variables too.

If (X,Y ) are jointly continuous, we define the conditional density function


of X given Y = y as

fX|Y(x|y) = f(x, y)/fY(y)

where f(x, y) is the joint density, fX and fY are marginal density functions,

and we assume that fY(y) is greater than zero.

The conditional expectation will then be defined as

E(X|Y = y) = ∫_{−∞}^∞ x [f(x, y)/fY(y)] dx

which is again a function of y. The same argument implies we could define

E(X|Y ) in exactly the same way as before. It is easy to verify that

E[E(X|Y )] = E(X).

In fact, this equality is true regardless of the type of random variables (after

they are properly defined). The only restriction is: all relevant quantities

exist.

3.2 Formulas

Most formulas for ordinary expectation remain valid for the conditional ex-

pectation. For example,

E(aX + bY |Z) = aE(X|Z) + bE(Y |Z).

If g(·) is a function, we have

E[g(Y )X|Y ] = g(Y )E[X|Y ]

as g(Y) is regarded as non-random once Y is given.

Finally, we define

Var(X|Y) = E[(X − E(X|Y))^2 | Y].

Then

Var(X) = E[Var(X|Y)] + Var[E(X|Y)].


To show this, notice that

E[Var(X|Y)] = E{E[(X − E(X|Y))^2 | Y]} = E{E(X^2|Y) − [E(X|Y)]^2} = E(X^2) − E{[E(X|Y)]^2},

and

Var(E(X|Y)) = E{[E(X|Y)]^2} − [E{E(X|Y)}]^2 = E{[E(X|Y)]^2} − [E(X)]^2.

Adding them up, we get the conclusion.

Example 3.2

A miner is trapped in a mine with 3 doors. If he uses the first door, he will

be free 2 hours later. If he uses the second, he will be back to the same spot

3 hours later. If he uses the third door, he will be back to the same spot 5

hours later. Assume that he does not have memory and will always pick a

door at random. What is the expected time it takes for him to get free?

Solution: Let X be the number of hours it takes until he gets free. We

are asked to calculate E(X).

It seems that the expectation is simpler if we know which door he selected

in the first place. For this reason, we define random variable Y to be the

door he selects in the first try.

Now it is simple to write down

E(X|Y = 1) = 2.

However, we only have

E(X|Y = 2) = 3 + E(X), E(X|Y = 3) = 5 + E(X).

Even though it does not directly answer our question, we do have

E(X) = E[E(X|Y)] = (1/3)[2 + (3 + EX) + (5 + EX)].


This is a simple linear equation; we find E(X) = 10. ♦

Can we use the same idea to calculate Var(X)?

It is seen that

Var(X|Y = 1) = 0; Var(X|Y = 2) = Var(X|Y = 3) = Var(X).

Hence,

E[Var(X|Y)] = (2/3)Var(X),

Var[E(X|Y)] = (1/3)[2^2 + 13^2 + 15^2] − 10^2 = 98/3.

Consequently, we find

Var(X) = (2/3)Var(X) + 98/3

and hence Var(X) = 98. ♦
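A quick Monte Carlo sketch (added; not from the notes) supports both answers:

```python
import random

# Example 3.2: the memoryless miner; door 1 frees him after 2 hours,
# doors 2 and 3 return him to the same spot after 3 and 5 hours.
def escape_time():
    t = 0.0
    while True:
        door = random.randint(1, 3)
        if door == 1:
            return t + 2
        t += 3 if door == 2 else 5

samples = [escape_time() for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)    # close to E(X) = 10 and Var(X) = 98
```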

Remark: We certainly do not believe that the miner will be memoryless.

Such an example might be useful to model a trapped mouse. We might be

able to infer whether a mouse learns after repeating this

experiment a number of times. We could compare the observed average with

this theoretical average under memoryless assumption. Any discrepancy may

point to the possibility that the mouse is in fact learning.

3.3 Comment

It could be claimed that probability theory is a special case of mea-

sure theory in mathematics. However, the concepts of independence and

conditional expectation allow probability theory to be a separate scientific

discipline.

Our subsequent developments depend heavily on the use of conditional

expectation.


3.4 Problems

1. Let X be a random variable such that

P(X = n) = p(1 − p)^n, n = 0, 1, 2, . . .

is its probability function and 0 < p < 1.

(i) Show that P(X ≥ k) = (1 − p)^k for k = 0, 1, 2, . . ..

(ii) Prove the memoryless property:

P(X ≥ k1 + k2 | X ≥ k1) = P(X ≥ k2)

for all non-negative integers k1 and k2.

(iii) Calculate the probability that X is even.

2. Suppose X and Y are independent and exponentially distributed with

parameter λ > 0. Their common probability density function is

f(t) = λe^{−λt}, t ≥ 0.

(i) Calculate P (X > 5|X > 3).

(ii) Calculate P (X + Y ≤ 1).

3. There are two TA’s for a certain course. For a particular assignment

handed in, if it were marked by the first TA, the mark would be random

with mean 75% and variance (0.1)^2; while if it were marked by the

second TA, the mark would be random with mean 70% and variance (0.05)^2. The first TA has a 40% chance of marking any single assignment.

Let X be the mark of the particular assignment. Calculate the mean

and variance of X.

4. Let X1, X2, X3, . . . be independently distributed random variables such

that Xn has probability mass function

fn(k) = P(Xn = k) = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, . . . , n.


(a) Find the probability generating function of Xn.

(b) Find the probability generating function of X1 +X2 +X3.

(c) Let N be a positive integer valued random variable with probability

generating function G(s) and assume it is independent of X1, X2, . . ..

Find the probability generating function of XN.

(d) Continuing from (c), find the probability generating function of XN + XN+1.

5. An integer N is chosen from the geometric distribution with probability

function

fN(n) = θ(1 − θ)^{n−1}, n = 1, 2, . . .

Given N = n, X has the uniform distribution on 1, 2, . . . , n.

a) Find the joint p.f. of X and N .

b) Find the conditional p.f. of N given X = x.

6. The number of fish that Elise catches in a day is a Poisson random

variable with mean 30. However, on the average, Elise tosses back two

out of every three fish she catches. What is the probability that, on a given day, Elise takes home n fish? What are the mean and variance of

(a) the number of fish she catches,

(b) the number of fish she takes home?

(What independence assumptions have you made?)

7. Let X1, X2, X3 be independent random variables taking values in the

positive integers and having probability function given by P(Xi = x) = (1 − pi) pi^{x−1} for x = 1, 2, . . . , and i = 1, 2, 3.

(a) Show that

P(X1 < X2 < X3) = (1 − p1)(1 − p2) p2 p3^2 / [(1 − p2p3)(1 − p1p2p3)].

(b) Find P (X1 ≤ X2 ≤ X3).


8. Suppose that 13 cards are selected at random from a regular deck of 52

playing cards. (a) If it is known that at least one ace has been selected,

what is the probability that at least two aces have been selected? (b)

If it is known that the ace of hearts has been selected, what is the

probability that at least two aces have been selected?

9. The number of children N in a randomly chosen family has mean µ

and variance σ^2. Each child is male with probability p, independently,

and X represents the number of male children in a randomly chosen

family. Find the mean and variance of X.

10. Suppose we have ten coins which are such that if the ith one is flipped

then heads will appear with probability i/10, i = 1, 2, . . . , 10. When

one of the coins is randomly selected and flipped, it shows head. What

is the conditional probability that it was the fifth coin?


Chapter 4

Generating functions and their

applications

4.1 Introduction

Suppose that {aj} = {a0, a1, . . .} is a sequence of real numbers. If

A(s) = ∑_{j=0}^∞ aj s^j = a0 + a1 s + a2 s^2 + · · ·   (4.1)

converges in some interval |s| ≤ s0 where s0 > 0, then A(s) is called the

generating function of the sequence {aj}_0^∞. The generating function provides

a convenient summary of a real number sequence. In many examples, simple

and explicit expressions of A(s) can be obtained. This enables us to study

the properties of {aj}_0^∞ conveniently.

Example 4.1

The Fibonacci sequence {fj} is defined by f0 = 0, f1 = 1 and the recursive

relationship

fj = fj−1 + fj−2, j = 2, 3, . . . . (4.2)

We use the tool of generating functions to find an explicit expression for fj.


Solution: Multiplying by s^j and summing over j gives

∑_{j=2}^∞ fj s^j = ∑_{j=2}^∞ f_{j−1} s^j + ∑_{j=2}^∞ f_{j−2} s^j.   (4.3)

Note the summation starts from j = 2 because (4.2) is valid only when

j = 2, 3, . . .. By defining F(s) = ∑_{j=0}^∞ fj s^j, we get

∑_{j=2}^∞ fj s^j = ∑_{j=0}^∞ fj s^j − f0 − f1 s = F(s) − s.

With similar treatment of the right hand side of (4.3), we obtain

F(s) − s = sF(s) + s^2 F(s).   (4.4)

Ignoring the convergence issue for the moment, we find

F(s) = s/(1 − s − s^2).

This is surely a simple and explicit generating function. To study other

properties of the sequence, let us note that, in general, a generating function has the Maclaurin series expansion

A(s) = A(0) + A′(0)s + A′′(0)s^2/2! + · · ·

which by comparison with (4.1) gives

aj = A^{(j)}(0)/j!.

This, of course, requires the function to be analytic at 0, which is true when A(s) converges in a neighbourhood of 0. An obvious conclusion is: real number sequences and generating functions are in one-to-one correspondence when the convergence and analyticity properties hold.

Now let us get back to the example. F(s) clearly converges at least for |s| ≤ 0.5. This allows us to look for its Maclaurin series expansion. Note that

1 − s − s^2 = (1 − ((1 + √5)/2) s)(1 − ((1 − √5)/2) s)


and by the method of partial fractions,

F(s) = (1/√5) [∑_{j=0}^∞ ((1 + √5)/2)^j s^j − ∑_{j=0}^∞ ((1 − √5)/2)^j s^j].

Recalling the one-to-one correspondence property,

fj = (1/√5) [((1 + √5)/2)^j − ((1 − √5)/2)^j], j = 0, 1, 2, . . . .

♦

It is interesting to note that

lim_{j→∞} fj/f_{j−1} = (1 + √5)/2,

which is the golden ratio, to which the ancient Egyptians attributed many mystical qualities.
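A few lines of Python (added as a sanity check; not part of the original notes) confirm the closed form against the recursion (4.2):

```python
import math

phi = (1 + math.sqrt(5)) / 2           # golden ratio
psi = (1 - math.sqrt(5)) / 2

f = [0, 1]
for j in range(2, 12):
    f.append(f[-1] + f[-2])            # the recursion (4.2)

for j, fj in enumerate(f):
    closed = (phi**j - psi**j) / math.sqrt(5)   # the closed form above
    print(j, fj, round(closed))        # the two columns agree
```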

In this example, the generating function has been used as a tool for solving

the difference equation (4.2). The generating functions will be seen to be

far more useful than just this. For example, if A(s) converges in |s| ≤ s0

with s0 > 1, then

A(1) =∞∑j=1

aj, A′(1) =∞∑j=1

jaj

and so on.

Example 4.2

Consider the following sequences:

aj = 1, j = 0, 1, 2, . . . ;

bj = 1/j!, j = 0, 1, 2, . . . ;

c0 = 0, cj = 1/j, j = 1, 2, . . . .

Easy calculation shows their corresponding generating functions are A(s) = (1 − s)^{−1}, B(s) = e^s and C(s) = −log(1 − s), where the regions of convergence are |s| < 1 for A(s) and C(s), and all real s for B(s).


4.2 Probability Generating Functions

Let X be a random variable taking non-negative integer values with proba-

bility function {pj}, where

pj = P{X = j}, j = 0, 1, 2, . . . .

The generating function of {pj} is called the probability generating func-

tion of X and we write

G(s) = GX(s) = E{s^X} = p0 + p1 s + p2 s^2 + · · · .   (4.5)

Of course, this function provides a convenient summary of the probability

function of X. Note that it converges at least for |s| ≤ 1 since, for s in this

interval,

∑_{j=0}^∞ pj |s|^j ≤ ∑_{j=0}^∞ pj = 1.

Using some mathematical tools, we can easily find

G′(1) = E(X) = ∑_{j=0}^∞ j pj,  G^{(r)}(1) = E(X^{(r)}) = ∑_{j=0}^∞ j^{(r)} pj

whenever the corresponding quantities exist. Otherwise, G^{(r)}(1) has to be replaced by lim_{s→1−} G^{(r)}(s), and an infinite outcome is allowed. Note that j^{(r)} = j(j − 1) · · · (j − r + 1) and E(X^{(r)}) is the rth factorial moment of X. The variance of X can be expressed as

Var(X) = E(X^{(2)}) + E(X) − [E(X)]^2 = G′′(1) + G′(1) − [G′(1)]^2.

Example 4.3

Suppose X has geometric distribution with parameter p so that

pj = P(X = j) = p(1 − p)^j, j = 0, 1, 2, . . . .

The probability generating function of X is

G(s) = E(s^X) = ∑_{j=0}^∞ p(1 − p)^j s^j = p[1 − (1 − p)s]^{−1}


for |s| < (1 − p)^{−1}. The mean of X is

E(X) = G′(1) = p(1 − p)[1 − (1 − p)s]^{−2}|_{s=1} = p^{−1} − 1.

From

E(X^{(2)}) = G′′(1) = 2p(1 − p)^2 [1 − (1 − p)s]^{−3}|_{s=1} = 2p^{−2}(1 − p)^2,

we have

Var(X) = 2p^{−2}(1 − p)^2 + p^{−1} − 1 − (p^{−1} − 1)^2 = p^{−2}(1 − p).

Let us note that the definition of the geometric distribution can differ from place to place. We should check which one is being used before we start.
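A brief numerical check of these two results (an added sketch; the infinite sum is simply truncated):

```python
# geometric pmf p_j = p(1 - p)^j, j = 0, 1, 2, ...
p = 0.3
pmf = [p * (1 - p) ** j for j in range(2000)]     # truncated support
mean = sum(j * pj for j, pj in enumerate(pmf))
var = sum(j * j * pj for j, pj in enumerate(pmf)) - mean**2
print(mean, 1 / p - 1)                 # E(X) = p^{-1} - 1
print(var, (1 - p) / p**2)             # Var(X) = p^{-2}(1 - p)
```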

Consider now the sequence of tail area probabilities {qj} defined by

qj = P (X > j) = pj+1 + pj+2 + · · · , j = 0, 1, 2, . . . .

Let Q(s) = ∑_{j=0}^∞ qj s^j be the corresponding generating function and note that, since qj ≤ 1 for all j, it follows that

∑_{j=0}^∞ qj s^j ≤ ∑_{j=0}^∞ s^j = (1 − s)^{−1}

for |s| < 1. Note that q0 = 1 − p0 and

qj = qj−1 − pj, j = 1, 2, . . . . (4.6)

Again, (4.6) is true for all j starting from 1. Multiplying (4.6) by s^j and summing over j, we obtain

∑_{j=1}^∞ qj s^j = ∑_{j=1}^∞ q_{j−1} s^j − ∑_{j=1}^∞ pj s^j

so that

Q(s) − (1 − p0) = sQ(s) − [G(s) − p0].

Thus, for all |s| < 1, we have that

Q(s) = [1 − G(s)]/(1 − s).   (4.7)


♦

Since G(1) = 1, it follows from (4.7) and the Mean Value Theorem in

calculus that, for given |s| < 1, there exists s∗ ∈ (s, 1) such that

Q(s) = G′(s∗).

It follows that

lim_{s→1−} Q(s) = lim_{s→1−} G′(s∗) = E(X).

Note that we have, in fact, proved

E(X) = ∑_{j=0}^∞ qj.

As a way to check whether you understand the technique of obtaining Q(s),

please use a similar technique to find the generating function of P(X ≤ j).

4.3 Convolutions and Sums of Independent

Random Variables

Let {aj} and {bj} be two sequences of real numbers and let {cj} be the sequence defined by

cj = ∑_{l=0}^j al b_{j−l} = a0 bj + a1 b_{j−1} + · · · + aj b0, j = 0, 1, 2, . . . .

The new sequence is called the convolution of {aj} and {bj}; we write

{cj} = {aj} ∗ {bj}.

Theorem 4.1

If A(s), B(s) and C(s) are the generating functions of {aj}, {bj} and {cj} =

{aj} ∗ {bj} respectively, then (when they all exist at s)

C(s) = A(s)B(s).

Proof Let bj = 0 when j = −1,−2, . . .. Hence,

cj = ∑_{l=0}^∞ al b_{j−l}.


Thus,

C(s) = ∑_{j=0}^∞ cj s^j = ∑_{j=0}^∞ ∑_{l=0}^∞ al b_{j−l} s^j

= ∑_{l=0}^∞ ∑_{j=0}^∞ al b_{j−l} s^j = ∑_{l=0}^∞ [al s^l ∑_{j=0}^∞ b_{j−l} s^{j−l}]

= ∑_{l=0}^∞ [al s^l B(s)] = A(s)B(s).

♦

If X and Y are two non-negative integer valued independent random variables and A(s) and B(s) are their probability generating functions, then the probability generating function of Z = X + Y is

C(s) = E(s^Z) = E(s^{X+Y}) = E(s^X)E(s^Y) = A(s)B(s).

Thus, the above theorem implies that the probability function of Z is the convolution of those of X and Y. Namely,

P(Z = j) = ∑_{l=0}^j P(X = l)P(Y = j − l), j = 0, 1, 2, . . . ,

a fact we knew already.

However, the theorem is more useful than this. First, we may directly

identify the distribution of Z by the form of C(s) rather than by the form of

P (Z = j). Second, by expanding C(s), we can avoid the direct summation

to find P (Z = j).

Example 4.4

Assume X and Y are independent and have binomial distributions with pa-

rameters (n, p) and (m, p) respectively. Then

A(s) = ∑_{j=0}^n C(n, j) p^j (1 − p)^{n−j} s^j = ∑_{j=0}^n C(n, j) (ps)^j (1 − p)^{n−j} = (1 − p + ps)^n,


and B(s) = (1 − p + ps)^m. Hence, the probability generating function of X + Y is C(s) = A(s)B(s) = (1 − p + ps)^{m+n}. Thus, we know right away that X + Y has a binomial distribution, and it is simple to expand C(s) to find

P(X + Y = j) = C(m + n, j) p^j (1 − p)^{m+n−j}, j = 0, 1, . . . , m + n.
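Theorem 4.1 can also be checked mechanically (an added sketch, not part of the notes): multiplying the coefficient lists of two pgfs is exactly the convolution, and for Example 4.4 it reproduces the Binomial (m + n, p) probabilities.

```python
import math

def binom_probs(n, p):
    # coefficient list of the pgf (1 - p + ps)^n
    return [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def convolve(a, b):
    # {c} = {a} * {b}: c_j = sum_l a_l b_{j-l}
    c = [0.0] * (len(a) + len(b) - 1)
    for l, al in enumerate(a):
        for k, bk in enumerate(b):
            c[l + k] += al * bk
    return c

n, m, p = 3, 2, 0.4                     # illustrative parameters
print(convolve(binom_probs(n, p), binom_probs(m, p)))
print(binom_probs(n + m, p))            # identical lists
```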

4.3.1 Key Facts

Suppose that X has probability generating function G(s). Then

E(X) = G′(s)|_{s=1},  Var(X) = [G′′(s) + G′(s) − {G′(s)}^2]|_{s=1}.

If X and Y are independent random variables,

GX+Y (s) = GX(s)GY (s).

4.4 The Simple Random Walk

Let Z1, Z2, . . . be a sequence of independent random variables with P(Zn = 1) = p and P(Zn = −1) = q = 1 − p, 0 < p < 1. Let X0 = 0 and Xn = X_{n−1} + Zn for n ≥ 1. The stochastic process {Xn}_{n=0}^∞ is called a

simple random walk. By plotting Xn against n, we may obtain a figure

as follows.

When used to model gambling, Xn would be the net winnings of a gambler

after n games, where each game results in a gain of one dollar with probability

p, or a loss of one dollar with probability q = 1 − p. When used to model

the movement of a particle, Xn would be the location of the particle on a

line after n unit times. If we use it to model the movement of a not so sober

individual walking on a line, then Xn would be its position after n steps. This gives us an idea of why this process has such a name.


[Figure: a zig-zag sample path of the simple random walk, plotting Xn against n.]

Here, we use generating functions to examine properties of the process

{Xn}. Some quantities to be investigated are

un = P(Xn = 0)

fn = P(X1 ≠ 0, . . . , X_{n−1} ≠ 0, Xn = 0)

λn = P(X1 < 1, . . . , X_{n−1} < 1, Xn ≥ 1)

λn^{(r)} = P(X1 < r, . . . , X_{n−1} < r, Xn ≥ r)

λn^{(−r)} = P(X1 > −r, . . . , X_{n−1} > −r, Xn ≤ −r)

for n = 1, 2, . . . and r = 1, 2, . . ..

For convenience, we define u0 = 1, f0 = λ0 = λ0^{(r)} = λ0^{(−r)} = 0. In the simple random walk as presented, Zn can be either 1 or −1. Thus, it is impossible for “X_{n−1} < r, Xn > r” to occur for any n. We insist on using X_{n−1} < r, Xn ≥ r instead of X_{n−1} < r, Xn = r in the definitions of λn^{(r)}. This has the advantage of being able to retain the same definition for more general random walks.

Each of these quantities represents the probability of a particular outcome

of the simple random walk after n trials. We summarize them in the following

table.

Symbol Probability of

un return to 0 at trial n

fn first return to 0 at trial n

λn first passage through 1 at trial n

λn^{(r)} first passage through r at trial n


Clearly, λn^{(1)} = λn. It is also easily seen that

u_{2n+1} = f_{2n+1} = 0; λ_{2n} = 0

because, for example, it is impossible for an odd number of ±1's to sum to 0.

Since X_{2n} = ∑_{i=1}^{2n} Zi, if X_{2n} = 0 we must have an equal number of Zi being 1 and being −1. Thus, it is simple to find

u_{2n} = C(2n, n)(pq)^n, n = 1, 2, . . . .

You can try to verify that the generating function of {un} is given by

U(s) = (1 − 4pqs^2)^{−1/2}.
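A Monte Carlo sketch (added; not in the notes) can be used to check the formula for u_{2n}:

```python
import math, random

p = 0.6                       # illustrative value; q = 1 - p
n = 3                         # examine trial 2n = 6
trials = 200_000
hits = 0
for _ in range(trials):
    x = sum(1 if random.random() < p else -1 for _ in range(2 * n))
    hits += (x == 0)          # did the walk sit at 0 at trial 2n?
print(hits / trials)
print(math.comb(2 * n, n) * (p * (1 - p)) ** n)   # u_{2n} = C(2n, n)(pq)^n
```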

To find the generating functions of the other sequences, let F(s), Λ(s), and Λ^{(r)}(s) be the generating functions of {fn}, {λn} and {λn^{(r)}}. We will get them through some difference equations.

4.4.1 First Passage Times

The first thing we do is to find a relationship between λn and λn^{(2)}. Note that for the random walk to reach 2 at trial n, it has to reach 1 at some time between trials 1 and n. Let k be the first time the walk reaches 1, after which it reaches 2 at trial n. This event can equivalently be described as

Ak = {X1 − X0 < 1, X2 − X0 < 1, . . . , X_{k−1} − X0 < 1, Xk − X0 = 1} ∩ {X_{k+1} − Xk < 1, . . . , X_{n−1} − Xk < 1, Xn − Xk = 1} = B_{0k} B_{kn}

where B_{0k} and B_{kn} are independent events. Clearly, P(B_{0k}) = λk and P(B_{kn}) = λ_{n−k}. Thus, we have

λn^{(2)} = P(∪_{k=1}^{n−1} Ak) = ∑_{k=1}^{n−1} λk λ_{n−k} = ∑_{k=0}^n λk λ_{n−k}


since λ0 = 0. Note this identity is still true even when n = 0. Therefore, we have found {λn^{(2)}} = {λn} ∗ {λn} (a convolution) and Λ^{(2)}(s) = [Λ(s)]^2. In like manner,

Λ^{(r)}(s) = [Λ(s)]^r, r = 2, 3, . . . .

Although the above relationship is neat, we cannot solve it to obtain an explicit expression for the λn's yet. Let us work on another relationship between {λn^{(2)}} and {λn}. It is obvious that λ1 = p. If the first passage through 1 occurs at trial n with n > 1, it requires Z1 = X1 = −1. After that, it requires the simple random walk to gain a value of 2 in exactly n − 1 steps. Thus

λn = q λ_{n−1}^{(2)}, n = 2, 3, . . . .   (4.8)

Multiplying both sides of (4.8) by s^n and summing over n, with care over its range, we have

∑_{n=2}^∞ λn s^n = q ∑_{n=2}^∞ λ_{n−1}^{(2)} s^n.

We find

Λ(s) − ps = qsΛ^{(2)}(s) = qs[Λ(s)]^2

from the first relationship.

It is easy to find the two possible forms:

Λ(s) = [1 ± √(1 − 4pqs^2)] / (2qs).

When s → 0, we should have Λ(s) → 0, so we must have

Λ(s) = [1 − √(1 − 4pqs^2)] / (2qs) = −(2qs)^{−1} ∑_{j=1}^∞ C(1/2, j)(−4pqs^2)^j

where the binomial expansion has been used, with C(1/2, j) the generalized binomial coefficient. From this we find λ_{2n} = 0 and

λ_{2n−1} = −(2q)^{−1} C(1/2, n)(−4pq)^n = (2n − 1)^{−1} C(2n − 1, n) p^n q^{n−1}, n = 1, 2, . . . .

The generating function Λ(s) will tell us more about the simple random

walk. Since

Λ(s) = ∑_{n=0}^∞ λn s^n,


Λ(1) = λ0 + λ1 + λ2 + · · · = P(first passage through 1 ever occurs)

= (1 − √(1 − 4p + 4p^2))/(2q) = (1 − |p − q|)/(2q)

= 1 if p ≥ q, and p/q if p < q.

The walk is certain to pass through 1 when p > q, or even when p = q = 1/2.

If p ≥ q, we may define the random variable N which is the waiting time

until first passage through 1 occurs. That is

N = min{n : Xn = 1}

and we know, in this case, that P (N < ∞) = 1. Since P (N = j) = λj, the

probability generating function of N is Λ(s) and

E(N) = Λ′(1−) = (p − q)^{−1} if p > q, and ∞ if p = q.

Can we still define N when p < q?

If the walk is used to model gambling, the above conclusions amount to saying: the gambler is certain to have a positive net winning at some time if p ≥ 1/2. If, however, p < 1/2, the gambler may never have a net winning at any time. If p = 1/2, the average waiting time until the gambler wins some money can be infinite although the win itself is certain to occur.

4.4.2 Returns to Origin

For a first return to the origin at trial n, the walk must either begin with

X1 = −1 or X1 = 1. In the first case, the event can be written as

A = {X1 = −1, X2−X1 < 1, X3−X1 < 1, . . . , Xn−1−X1 < 1, Xn−X1 ≥ 1}.

Hence P (A) = qλn−1. In the second case, the event becomes

B = {X1 = 1,−(X2 −X1) < 1, . . . ,−(Xn−1 −X1) < 1,−(Xn −X1) ≥ 1}.


Note that {−Xn} is also a simple random walk, with P(−Xn = 1) = q rather than p. Hence the event B has a similar structure to the event A. Let

λn^{(−1)} = P(−X1 < 1, −X2 < 1, . . . , −X_{n−1} < 1, −Xn = 1).

Then {λn^{(−1)}} has the same generating function as that of {λn}, except with p and q switched. In addition, P(B) = P(−X1 = −1) λ_{n−1}^{(−1)} and therefore, for n ≥ 1,

fn = P(A) + P(B) = p λ_{n−1}^{(−1)} + q λ_{n−1}.

Equivalently,

F(s) = psΛ^{(−1)}(s) + qsΛ(s)

= ps [1 − √(1 − 4pqs^2)]/(2ps) + qs [1 − √(1 − 4pqs^2)]/(2qs)

= 1 − √(1 − 4pqs^2).

The probability that the process ever returns to the origin is

F(1) = ∑_{n=0}^∞ fn = 1 − |p − q|

and so a return is certain only if p = q = 1/2. In this case, the mean time to

return is

F′(1−) = lim_{s→1−} (d/ds)[1 − √(1 − s^2)] = ∞.

Thus, if the game is fair and you have lost some money at the moment, we have good news for you: the chance that you will win back all your money

is 1. The bad news is, the above result also tells you that on average, you

may not live that long to see it.

4.4.3 Some Key Results in the Simple Random Walk

Symbol     Expression                               Generating function
u_{2n}     C(2n, n)(pq)^n                           U(s) = (1 − 4pqs²)^{−1/2}
f_{2n}     (2n − 1)^{−1} C(2n, n)(pq)^n             F(s) = 1 − √(1 − 4pqs²)
λ_{2n−1}   (2n − 1)^{−1} C(2n − 1, n) p^n q^{n−1}   Λ(s) = (2qs)^{−1}[1 − √(1 − 4pqs²)]

(Entries at the remaining indices vanish: u_{2n+1} = f_{2n+1} = 0 and λ_{2n} = 0.)
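As a sanity check on the table, one can build the three sequences from their closed forms and verify the identities U(s)[1 − F(s)] = 1 and qs[Λ(s)]² − Λ(s) + ps = 0 coefficient by coefficient, using truncated power-series arithmetic. A sketch (p = 0.3 is an arbitrary test value):

    from math import comb, isclose

    p = 0.3; q = 1 - p
    N = 20   # truncation order

    def mul(a, b):
        """Multiply two truncated power series (coefficient lists)."""
        c = [0.0] * N
        for i, ai in enumerate(a):
            for j in range(N - i):
                c[i + j] += ai * b[j]
        return c

    u = [0.0] * N; f = [0.0] * N; lam = [0.0] * N
    for n in range(N // 2):
        u[2 * n] = comb(2 * n, n) * (p * q) ** n
        if n >= 1:
            f[2 * n] = u[2 * n] / (2 * n - 1)
    for n in range(1, N // 2 + 1):
        if 2 * n - 1 < N:
            lam[2 * n - 1] = comb(2 * n - 1, n) * p ** n * q ** (n - 1) / (2 * n - 1)

    # U(s)(1 - F(s)) = 1, i.e. F(s) = 1 - 1/U(s)
    one_minus_f = [1.0 - f[0]] + [-x for x in f[1:]]
    prod = mul(u, one_minus_f)
    assert all(isclose(prod[n], 1.0 if n == 0 else 0.0, abs_tol=1e-9) for n in range(N))

    # qs L(s)^2 - L(s) + ps = 0, coefficient by coefficient
    L2 = mul(lam, lam)
    for n in range(N):
        lhs = (q * L2[n - 1] if n >= 1 else 0.0) - lam[n] + (p if n == 1 else 0.0)
        assert abs(lhs) < 1e-9
    print("identities verified up to order", N - 1)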


The following are the key steps in deriving the results in the above table:

qs[Λ(s)]² − Λ(s) + ps = 0;
F(s) = 1 − [U(s)]^{−1};
F(s) = psΛ^(−1)(s) + qsΛ(s);
Λ^(2)(s) = [Λ(s)]².

4.5 The Branching Process

Now let us study the second example of simple stochastic processes. Here we have particles that are capable of producing particles of like kind. Assume that all such particles act independently of one another, and that each particle has a probability p_j of producing exactly j new particles, j = 0, 1, 2, . . ., with Σ p_j = 1. For simplicity, we assume that the 0th generation consists of a single particle; the direct descendants of that particle form the first generation. Similarly, the direct descendants of the nth generation form the (n + 1)th generation.

[Figure: a family tree of a branching process with generation sizes Z_0 = 1, Z_1 = 4, Z_2 = 5, Z_3 = 9.]

Let Z_n be the population of the nth generation, so that Z_0 = 1 and P(Z_1 = j) = p_j, j = 0, 1, 2, . . .. Let X_{ni} be the number of direct descendants of the ith individual in the nth generation. Hence, we have

Z_{n+1} = Σ_{i=1}^{Z_n} X_{ni}

for all n ≥ 0. In addition, all X_{ni} are independent and have the same distribution as that of Z_1.


Due to the assumption Z_0 = 1, we have Z_1 = X_{01}. That is, the population size of the first generation is the same as the family size of the founding member.

Let H_n(s) = E(s^{Z_n}) and G(s) = E(s^{X_{01}}). With the assumption Z_0 = 1, we have H_0(s) = s and H_1(s) = G(s). For an obvious reason, we call G(s) the probability generating function of the family size distribution.

If Z_n = k is given, we would have

Z_{n+1} = Σ_{i=1}^{k} X_{ni}.

Thus,

E(s^{Z_{n+1}} | Z_n = k) = E[s^{Σ_{i=1}^k X_{ni}}] = G^k(s).

Recalling the definition of conditional expectation, we have shown

E(s^{Z_{n+1}} | Z_n) = G^{Z_n}(s). (4.9)

To avoid confusion, let us write

H_n(t) = E[t^{Z_n}],

and substitute t = G(s) into it. Then we get

H_n(G(s)) = E{[G(s)]^{Z_n}} = E{E(s^{Z_{n+1}} | Z_n)} = E(s^{Z_{n+1}}) = H_{n+1}(s)

with the help of (4.9). That is, we have found the iterative relationship

H_{n+1}(s) = H_n(G(s)) = · · · = G(H_n(s)), n = 1, 2, . . . . (4.10)

This relationship can be used to calculate Hn(s) recursively, although the

calculations are generally not pleasant.
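When the family size has finite support, H_n(s) is a polynomial and the composition in (4.10) can be carried out exactly in code, yielding the distribution of Z_n. A minimal sketch; the family-size law G(s) = 0.2 + 0.5s + 0.3s² is hypothetical, chosen only for illustration:

    def poly_mul(a, b):
        """Multiply two polynomials given as coefficient lists."""
        c = [0.0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] += ai * bj
        return c

    def poly_compose(f, g):
        """Coefficients of f(g(s)) via Horner's scheme."""
        result = [0.0]
        for coef in reversed(f):
            result = poly_mul(result, g)
            result[0] += coef
        return result

    # hypothetical family-size p.g.f. G(s) = 0.2 + 0.5 s + 0.3 s^2
    G = [0.2, 0.5, 0.3]

    H = [0.0, 1.0]          # H_0(s) = s, since Z_0 = 1
    for _ in range(2):      # compose twice: H_2(s) = G(G(s))
        H = poly_compose(H, G)

    # H now holds P(Z_2 = k) for k = 0, 1, ..., 4; coefficients sum to 1
    print([round(c, 4) for c in H], sum(H))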

4.5.1 Mean and Variance of Zn

If the expectation of Z_1 exists, the mean population of the nth generation is

μ_n = E(Z_n) = H_n′(1)


and from (4.10) it follows that

μ_n = G′(H_{n−1}(1)) H′_{n−1}(1) = θμ_{n−1}, n = 1, 2, . . . , (4.11)

where θ = G′(1) is the mean family size and we have used H_{n−1}(1) = 1. Since μ_0 = 1, it follows from (4.11) that μ_n = θ^n. Thus, if θ > 1, the average population size increases exponentially. If θ < 1, E(Z_n) approaches 0 at an exponential rate as n → ∞. The case θ = 1 gives the curious result that E(Z_n) = 1 for all n.

More directly,

Var(Z_n) = E[Var(Z_n | Z_{n−1})] + Var[E(Z_n | Z_{n−1})]
         = E(Z_{n−1}σ²) + Var(Z_{n−1}θ).

Thus

σ²_n = μ_{n−1}σ² + θ²σ²_{n−1}, n = 1, 2, . . . ,

where σ²_n = Var(Z_n) and σ² is the variance of the family size distribution. Noting that σ²_0 = 0 and μ_{n−1} = θ^{n−1}, we find

σ²_n = θ^{n−1}(1 − θ^n)/(1 − θ) · σ², n = 1, 2, . . . ,

for θ ≠ 1 (when θ = 1 the recursion gives σ²_n = nσ²), which can be established by an inductive argument.

Remark: The mean of Z_n can also be obtained by a direct argument, and the variance of Z_n can also be obtained by using the iterative relationship (4.10). These alternative derivations are about as difficult as the ones presented here.
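A simulation offers a third route: generate many copies of Z_n and compare the sample mean and variance with θ^n and θ^{n−1}(1 − θ^n)σ²/(1 − θ). A sketch, again with a hypothetical family-size law:

    import random

    # hypothetical family-size law: P(0)=0.2, P(1)=0.5, P(2)=0.3
    pmf = [0.2, 0.5, 0.3]
    theta = sum(k * pk for k, pk in enumerate(pmf))                # mean family size
    var = sum(k * k * pk for k, pk in enumerate(pmf)) - theta ** 2  # its variance

    def simulate_Zn(n):
        z = 1
        for _ in range(n):
            # draw z i.i.d. family sizes in one call
            z = sum(random.choices(range(len(pmf)), weights=pmf, k=z)) if z else 0
        return z

    random.seed(3)
    n, reps = 5, 100_000
    sample = [simulate_Zn(n) for _ in range(reps)]
    mean = sum(sample) / reps
    svar = sum((z - mean) ** 2 for z in sample) / (reps - 1)

    print(mean, theta ** n)                                           # ~ theta^n
    print(svar, theta ** (n - 1) * (1 - theta ** n) / (1 - theta) * var)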

4.5.2 Probability of Extinction

Let q_n represent the probability that the population is extinct by the nth generation. That is, define

q_n = P(Z_n = 0) = H_n(0), n = 0, 1, 2, . . . .

Thus q_0 = 0 and, by (4.10),

q_n = G(q_{n−1}). (4.12)


Note that q_0 ≤ q_1 ≤ q_2 ≤ · · · and q_j ≤ 1 for all j. Thus,

q = lim_{n→∞} q_n

exists and represents the probability that the population ever becomes extinct. From (4.12), it follows that q is a fixed point of the probability generating function G(s); that is,

q = G(q).

This gives us the idea that we need only solve the equation G(s) − s = 0 to obtain the probability of extinction. However, when the equation has more than one solution, which one gives the probability of extinction?

Theorem 4.2
Let {Z_n}_{n=0}^∞ be a branching process as specified in this section such that Z_0 = 1, and let the family size generating function be G(s). Then the probability of extinction q for this branching process is the smallest solution of the equation

s = G(s)

in the interval [0, 1].

Proof: Assume the smallest solution in [0, 1] is q*; we want to show that q = q*.

Let q_n = P(Z_n = 0) for n = 0, 1, . . .. Clearly, q_0 = 0 ≤ q*. Assume that q_k ≤ q* for some k. Note that G(s) is an increasing function for s ∈ [0, 1]. Hence, q_{k+1} = G(q_k) ≤ G(q*) = q*. By induction, q_n ≤ q* for all n. Letting n → ∞, we obtain q ≤ q*. Since q is also a solution in [0, 1], and q* is the smallest such solution, we must have q = q*. ♦

In many situations, we do not have to solve the equation to determine the value of q. Let h(s) = G(s) − s. One obvious solution in [0, 1] is s = 1. Note that

h′(s) = G′(s) − 1,  h″(s) = G″(s) = Σ_{j=2}^∞ j(j − 1)p_j s^{j−2} ≥ 0 for s ∈ [0, 1].

Thus, h(s) is a convex function.

There are several possibilities:

1. If h′(1) = G′(1) − 1 = θ − 1 > 0, the curve of h(s) goes down from s = 0 and then comes up to hit 0 at s = 1. Since h(0) = P(X_{01} = 0) ≥ 0, the curve crosses the zero line exactly once before s = 1. Since q is the smallest solution in [0, 1], we must have q < 1 in this case.

2. If h′(1) = G′(1) − 1 = θ − 1 < 0, we must have h(0) = P(X_{01} = 0) > 0. Thus, the curve decreases all the way down to 0 at s = 1, so s = 1 is the unique solution in [0, 1] and we must have q = 1.

3. Suppose h′(1) = 0. If, further, h(0) = P(X_{01} = 0) > 0, then we are in the same situation as in case 2. On the other hand, h(0) = P(X_{01} = 0) = 0 implies the family size is fixed at 1, hence q = 0.

Remark: Despite the above summary, most students tend to always solve the equation to find the probability of ultimate extinction. This is often more than what is needed; the case analysis frequently settles the answer without any algebra.

Example 4.5


Lotka (see Feller 1968, page 294) showed that, to a reasonable approximation, the distribution of the number of male offspring in an American family was described by

p_0 = 0.4825, p_k = (0.2126)(0.5893)^{k−1}, k ≥ 1,

which is a geometric distribution with a modified first term. The corresponding probability generating function is

G(s) = (0.4825 − 0.0717s) / (1 − 0.5893s)

and G′(1) = 1.261. Thus, for example, in the 16th generation, the average population of male descendants of a single root ancestor is

θ^16 = (1.261)^16 = 40.685.

The probability of extinction, however, is the smallest solution of

q = (0.4825 − 0.0717q) / (1 − 0.5893q).

Thus, we find q = 0.8197. This suggests that for those names that do survive to the 16th generation, the average size is very much more than 40.685. (All the calculations are subject to the original round-off errors.) ♦
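Numerically, the smallest root is most easily found by iterating (4.12) from q_0 = 0; since the q_n increase to q, no equation solving is needed. A minimal sketch for Lotka's numbers:

    def G(s):
        # Lotka's p.g.f. for the number of male offspring
        return (0.4825 - 0.0717 * s) / (1 - 0.5893 * s)

    q = 0.0
    for _ in range(200):    # q_n = G(q_{n-1}) is increasing and bounded
        q = G(q)
    print(q)                # approaches the smallest root, about 0.8197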

Example 4.6

From the point of view of epidemiology, it is more important to control the spread of a disease than to cure the infected patients. Suppose that the spread of a disease can be modeled by a branching process. Then it is very important to make sure that the average number of people infected by a patient is less than 1. If so, the probability of extinction will be one. However, even if the average number of people infected is larger than one, there is still a positive chance that the disease will die out.

A scientist at Health Canada analyzed the data from the SARS (severe acute respiratory syndrome) epidemic in the year 2003. It was noticed that many interesting phenomena could be partially explained by results on branching processes.


First, many countries imported SARS patients, but these did not cause epidemics. This can be explained by the fact that the probability of extinction is not small (even when the average number of people infected by a single patient is larger than 1).

Second, a few patients were nicknamed “super-spreaders”. They might simply correspond to the portion of branching processes which do not become extinct.

Third, after government intervention, the average number of people infected by a single patient was substantially reduced. When it fell below 1, the epidemic was doomed to extinction.

Finally, it was not cost effective to screen all airplane passengers; it was better to take strict and quick measures to quarantine new and old cases. When the average number of people infected by a single patient falls below one, the disease will be controlled with probability one. ♦

4.5.3 Some Key Results in the Branching Process

For simplicity, we assumed that the population starts from a single individual: Z_0 = 1. We also assumed that the numbers of offspring of the various individuals are independent and have the same distribution.

Under these assumptions, we have shown that

μ_n = θ^n

and

σ²_n = (θ^n − 1)/(θ − 1) · θ^{n−1}σ²,

where θ and σ² are the mean and the variance of the family size, and μ_n and σ²_n are the mean and the variance of the size of the nth generation.

We have shown that the probability of extinction, q, is the smallest non-negative solution to

G(s) = s,

where G(s) is the probability generating function of the family size. Further, it is known that q = 1 when θ < 1, while q < 1 when θ > 1. When θ = 1, q = 1 unless the family size is non-random (always exactly one), in which case q = 0.


These results can all be derived from the fact that

H_n(s) = H_{n−1}(G(s)),

where H_n(s) is the probability generating function of the population size of the nth generation.

4.6 Problems

1. Find the mean and variance of X when

(a) X has a Poisson distribution with p(x) = (μ^x / x!) e^{−μ}, x = 0, 1, . . ..

(b) X has an exponential distribution with f(x) = λe^{−λx}, x ≥ 0.

2. (a) If X and Y are exponentially distributed with rate λ = 1 and

independent of each other, find the density function of X + Y .

(b) If X and Y are geometrically distributed with parameter p and

independent of each other, find the probability mass function of X+Y .

(c) Find a typical discrete distribution and a typical continuous distribution (not discussed in class) and repeat questions (a) and (b).

3. Suppose that given N = n, X has binomial distribution with parame-

ters n and p. Suppose also N has Poisson distribution with parameter

µ. Use the technique of generating functions to find

(a) the marginal distribution of X.

(b) the distribution of N −X.

4. Let X1, X2, X3, . . . be independent and identically distributed random

variables such that X1 has probability mass function

f(k) = P (X1 = k) = p(1− p)k k = 0, 1, 2, . . . .

(a) Find the probability generating function of X1.


(b) Let I_n = 1 if X_n ≥ n and I_n = 0 if X_n < n, for n = 0, 1, 2, . . .. That is, I_n is an indicator random variable. Show that the probability generating function of I_n is given by

H_n(s) = 1 + (s − 1)(1 − p)^n.

(c) Let N be a random variable with probability generating function G(s) and assume it is independent of X_1, X_2, . . .. Let I_N = I_n when N = n, where I_n is the indicator random variable defined in (b). Show that

E[s^{I_N} | N] = H_N(s) = 1 + (s − 1)(1 − p)^N.

Find the probability generating function of I_N.

5. A coin is tossed repeatedly, heads appearing with probability p = 2/3

on each toss.

(a) Let X be the number of tosses until the first occasion by which two

heads have appeared successively. Write down a difference equation for

f(k) = P (X = k). Assume that f(0) = 0.

(b) Show that the generating function of f(k) is given by

F(s) = (4/27) s² [ 2/(1 − (2/3)s) + 1/(1 + (1/3)s) ].

(c) Find an explicit expression for f(k) and calculate E(X).

6. Let X and Y be independent random variables with negative binomial distribution and probability function

p_i = C(−k, i) p^k (p − 1)^i, i = 0, 1, . . . .

(a) Show that the probability generating function of X is given by

G(s) = p^k / (1 + (p − 1)s)^k.

(b) Find the probability function of X + Y.

(c) Calculate E(e^X) and Var(e^X); what condition on the size of p is needed?


7. Give the sequences generated by the following:

1) A(s) = (1 − s)^{−1.5};
2) B(s) = (s² − s − 12)^{−1};
3) C(s) = s log(1 − θs²)/log(1 − θ);
4) D(s) = s/(5 + 3s);
5) E(s) = (3 + 2s)/(s² − 3s − 4);
6) F(s) = (p + qs)^n.

8. Turn the following systems of equations into equations in generating functions.

1) b_0 = 1; b_j = b_{j−1} + 2a_j, j = 1, 2, . . .; a_0 = 0.
2) b_0 = 0, b_1 = p, b_n = q Σ_{r=1}^{n−1} b_r b_{n−1−r}, n = 2, 3, . . . .

9. 1) Find the generating function of the sequence aj = j(j + 1), j =

0, 1, 2, . . ..

2) Find the generating function of the sequence aj = j/(j + 1), j =

0, 1, 2, . . ..

3) Let X be a non-negative integer valued random variable and define

rj = P (X ≤ j). Find the generating function of {rj} in terms of the

probability generating function of X.

10. 1) Negative binomial:

p_j = C(−k, j)(−p)^j (1 − p)^k, j = 0, 1, . . . ,

where k > 0 and 0 < p < 1.

2) Let r_0 = 0, r_j = c/[j(j + 2)], j = 1, 2, . . . (find the constant c yourselves).

Find the means and the variances of the above distributions, whichever exist.


11. Find the probability generating function of the following distributions:

1. Discrete uniform on 0, 1, . . . , N .

2. Geometric.

3. Binomial.

4. Poisson.

12. Let {an} be a sequence with generating function A(s), |s| < R, R > 0.

Find the generating functions of

1) {c+ an} where c is a real number.

2) {can} where c is a real number.

3) {an + an+2}.

4) {(n+ 1)an}.

5) {a2n} = {a0, 0, a2, 0, a4, . . .}.

6) {a3n} = {a0, 0, 0, a3, 0, 0, a6, . . .}.

13. Consider a usual branching process: let the population size of the nth generation be X_n and the family size of the ith family in the nth generation be Z_{n,i}. Thus, X_n = Σ_{i=1}^{X_{n−1}} Z_{n,i} and X_0 = 1. Assume the Z_{n,i} are independent and identically distributed, and

P(Z_{1,1} = 0) = 1/2 + a; P(Z_{1,1} = 1) = 1/4 − 2a; P(Z_{1,1} = 3) = 1/4 + a,

for some a.

(a) Find the probability generating function of the family size. When a = 1/8, find the probability generating function of X_2.

(b) Find the range of a such that the probability of extinction is less than 1.

(c) When a = 1/8, find the expectation and variance of the population size of the 5th generation and the probability of extinction.

14. For a branching process with family size distribution given by

P0 = 1/6, P2 = 1/3, P3 = 1/2;


calculate the probability generating function of Z_2 given Z_0 = 1, where Z_2 is the population of the second generation. Find also the mean and variance of Z_2 and the probability of extinction. Repeat the same calculation when Z_0 = 3 and

P_0 = 1/6, P_1 = 1/2, P_3 = 1/3.

15. Let the probability p_n that a family has exactly n children be αp^n when n ≥ 1, and p_0 = 1 − αp(1 + p + p² + · · ·). Assume that all 2^n sex sequences in a family of n children have probability 2^{−n}. Show that for k ≥ 1, the probability that a family has exactly k boys is 2αp^k/(2 − p)^{k+1}. Given that a family includes at least one boy, what is the probability that there are two or more boys?

16. Let Xi, i ≥ 1, be independent uniform (0, 1) random variables, and

define N by

N = min{n : Xn < Xn+1}

where X0 = x. Let f(x) = E(N).

(a) Derive an integral equation for f(x) by conditioning on X1.

(b) Differentiate both sides of the equation derived in (a).

(c) Solve the resulting equation obtained in (b).

17. Consider a sequence defined by r0 = 0, r1 = 1 and rj = rj−1 + 2rj−2,

j ≥ 2. Find the generating function R(s) of {rj}, determine r25. For

what region of s values does the series for R(s) converge?

18. Let X_1, X_2, . . . be independent random variables with common p.g.f. G(s) = E(s^{X_i}). Let N be a random variable with p.g.f. H(s), independent of the X_i's. Let T be defined as 0 if N = 0 and Σ_{i=1}^N X_i if N > 0. Show that the p.g.f. of T is given by H(G(s)). Hence find E(T) and Var(T) in terms of E(X), Var(X), E(N) and Var(N).

19. Consider a branching process in which the family size distribution is

Poisson with mean λ.


(a) Under what condition will the probability of extinction of the pro-

cess be less than 1?

(b) Find the extinction probability when λ = 2.5 numerically.

(c) When λ = 2.5 find the expected size of the 10th generation, and

the probability of extinction by the 5th generation. Comment on the

relationship between this second number and the ultimate extinction

probability obtained in (b).

20. Consider a branching process in which the family size distribution is geometric with parameter p. (The geometric distribution has p.m.f. p_j = p(1 − p)^j, j = 0, 1, . . ..)

(a) Under what condition will the probability of extinction of the pro-

cess be less than 1?

(b) Find the probability of extinction when p = 1/3.

(c) When p = 1/3, find the expectation and variance of the size of the

10th generation and the probability of extinction by the 5th generation.

21. Let {Z_n}_{n=0}^∞ be a usual branching process with Z_0 = 1. It is known that P_0 = p, P_1 = pq, P_2 = q², with 0 ≤ p ≤ 1 and q = 1 − p.

1) Find a condition on the size of p such that the probability of extinction is 0.

2) Find the range of p such that the probability of extinction is smaller than 1. Calculate the probability of extinction when p = 1/2.

3) Calculate the mean and the variance of Z_n when p = 1/2.

22. Let X_1, X_2, . . . be independent random variables with common p.g.f. G(s) = E(s^{X_i}). Let N be a random variable with p.g.f. H(s), independent of the X_i's. Show that

T = { Σ_{i=1}^N X_i,  N ≥ 1
    { 0,              N = 0

has p.g.f. H(G(s)). Hence, find the mean and variance of T in terms of the means and variances of X_i and N. Remark: Can you see the relevance between this problem and the usual branching process?


23. Branching with immigration Each generation of a branching pro-

cess (with a single progenitor) is augmented by a random number of

immigrants who are indistinguishable from the other members of the

population. Suppose that the numbers of immigrants in different gen-

erations are independent of each other and of the past history of the

branching process, each such number having probability generating

function H(s). Show that the probability generating function Gn of

the size of the nth generation satisfies Gn+1(s) = Gn(G(s))H(s), where

G is the probability generating function of a typical family of offspring.

24. Consider the random walk X0 = 0, Xn = Xn−1 + Zn where P (Zn =

+1) = p, P (Zn = −1) = q, n = 1, 2, . . . independently (p + q = 1).

Find the probability that the event Xn = r will ever occur where r is

a fixed positive integer. If p > q, find the expected time until its first

occurrence.

25. Consider the random walk X_0 = 0, X_n = X_{n−1} + Z_n, where P(Z_n = +1) = p, P(Z_n = −2) = q, n = 1, 2, . . ., independently (p + q = 1). Let Λ^(r)(s) and λ_n be defined as in class. Show that Λ^(r)(s) = [Λ(s)]^r and derive the relationship

λ_n = qλ^(3)_{n−1}, n = 2, 3, . . . .

Hence, show that

qs[Λ(s)]³ − Λ(s) + ps = 0.

26. Consider the random walk X_0 = 0, X_n = X_{n−1} + Z_n, where Z_1, Z_2, . . . are independent, but P(Z_n = 1) = p, P(Z_n = −1) = q and P(Z_n = 0) = r, n = 1, 2, . . . (p + q + r = 1). Let

f_n = P(X_1 ≠ 0, . . . , X_{n−1} ≠ 0, X_n = 0),
λ_n = P(X_1 < 1, . . . , X_{n−1} < 1, X_n ≥ 1).

Find their generating functions F(s) and Λ(s). Hence, obtain the probability that 0 ever recurs and the probability that the walk ever passes through 1.


27. If an unbiased coin is tossed repeatedly, show that the probability that the number of heads ever exceeds twice the number of tails is (√5 − 1)/2.

28. Let p_{r,k} be the probability that the simple random walk visits state r (r > 0) exactly k times.

a) If p = q = 0.5, show that p_{r,k} = 0 for k = 0, 1, 2, . . ..

b) If p > q, show that

p_{r,k} = { 0,                 k = 0;
          { (1 − θ)^{k−1} θ,   k = 1, 2, . . . ,

where θ = |p − q|.

c) If p < q, show that

p_{r,k} = { 1 − λ,              k = 0;
          { λ(1 − θ)^{k−1} θ,   k = 1, 2, . . . ,

where λ = (p/q)^r.

29. Consider a gambler who at each play of the game has probability p of winning one unit and probability q = 1 − p of losing one unit. Assuming that successive plays of the game are independent, what is the probability that, starting with i units, the gambler's fortune will reach N before reaching 0? Hint: Let P_i, i = 0, . . . , N, denote the probability that, starting with i, the gambler's fortune will eventually reach N. Derive a relationship among the P_i's.

30. Use the fact that u_{2n+1} = 0 and u_{2n} = C(2n, n) p^n q^n to show that

U(s) = (1 − 4pqs²)^{−1/2}.

31. (a) Consider a simple random walk with reflecting barrier. Let Z1, Z2, . . .

be independent and identically distributed random variables such that

P (Zn = 1) = p, P (Zn = −1) = 1 − p = q. Assume 0 < p < 1 and

X0 = 1. Also Xn+1 = Xn +Zn if Xn > 0 and Xn+1 = 1 if Xn = 0. Ver-

ify that {Xn} is a Markov chain and find its transition matrix. Classify

the state space.


(b) Let f_n = P(X_1 ≠ 0, X_2 ≠ 0, . . . , X_{n−1} ≠ 0, X_n = 0 | X_0 = 1) for n = 1, 2, . . . and f_0 = 0. It is known that the generating function of f_n is given by

F(s) = (1 − √(1 − 4pqs²)) / (2ps)

and √(1 − 4pq) = |p − q|. Find the probability that 0 will ever be reached.

(c) Find the range of p such that state 0 is recurrent.


Chapter 5

Renewal Events and Discrete Renewal Processes

5.1 Introduction

Consider a sequence of trials that are not necessarily independent and let

λ represent some property which, on the basis of the outcomes of the first

n trials, can be said unequivocally to occur or not to occur at trial n. By

convention, we suppose that λ has just occurred at trial 0, and En represents

the “event” that λ occurs at trial n, n = 1, 2, . . ..

We call λ an event in renewal theory. However, it is not an event in the

sense of probability models in which events are subsets of the sample space.

Taking the simple random walk {Xn} as an example, we regard Xn as the

outcome of the nth trial. Thus, {Xn} themselves are outcomes of a sequence

of trials. An event λ_1 can be used to describe: the outcome X_n is 0. That is, “the event λ_1 has just occurred at trial n” is the event

E_n = {X_n = 0}

for a given n.

Similarly, another possible event λ_2 can be defined such that “λ_2 has just occurred at trial n” is the event

E_n = {X_n − X_{n−1} = 1, X_{n−1} − X_{n−2} = −1}, n = 2, 3, . . . .


The events E0 and E1 have to be defined separately.

In general, if we have a well defined event λ, then we can easily describe the event E_n for every n. Conversely, if we have a complete description of the event E_n for every n, the event λ is well defined. It is convenient to define f_0 = 0 and, for n ≥ 1,

f_n = P(E_1^c E_2^c · · · E_{n−1}^c E_n).

Thus, f_n is the probability that λ occurs for the first time at trial n (after trial 0).

We say that λ is a renewal event if each time λ occurs, the process undergoes a renewal or regeneration. That is to say, at the point when λ occurs, the outcomes of the successive trials have the same stochastic properties as the outcomes of the successive trials started at time 0. In particular, the probability that λ will next occur after n additional trials is f_n, n = 1, 2, . . .. Mathematically, it means

1. P(E_{n+m} | E_n) = P(E_m | E_0);
2. P(E_{n+m} E^c_{n+m−1} · · · E^c_{n+2} E^c_{n+1} | E_n) = P(E_m E^c_{m−1} · · · E^c_2 E^c_1 | E_0).

Another simple (but not rigorous) way to define a renewal event is: independently of the previous outcomes of the trials, once λ occurs, the waiting time for the next occurrence of λ has the same fixed distribution.

Example 5.1

Consider a sequence of Bernoulli trials in which P (S) = p and P (F ) = q

with p+ q = 1. Let λ represent the event that trials n−2, n−1 and n result

respectively in F, S and S. We shall say that λ is the “event” FSS. It is

clear that λ is a renewal event. If λ occurs at n, the process regenerates and

the waiting time for its next occurrence has the same distribution as had the

waiting time for the first occurrence. ♦

Example 5.2

In the same situation as above, let λ represent the “event” SS. That is, λ is

said to occur at trial n if trials n− 1 and n both give S as the outcome. In

this case, λ is not a renewal event; the occurrence of λ does not constitute a

renewal of the process. The reason is, if λ has occurred at trial n, the chance

it will recur at trial n+ 1 is p, but the chance that λ occurs on the first trial

is 0. ♦


Example 5.3

In most situations, the event of record breaking is not a renewal event. Consider the record high temperature. The record always gets higher, which makes it harder to break; thus, the waiting time for the next occurrence is likely to grow longer. Hence, it cannot be a renewal event. ♦

Example 5.4

The simple random walk provides a rich source for examples of renewal

events. As before, we assume X0 = 0 and Xn = Xn−1 + Zn, where Zn = +1

or −1 with respective probabilities p and q, independently, n = 1, 2, . . ..

a) Let λ represent “return to the origin”. Then λ is a renewal event. In

fact, the notation that we used in our analysis of the simple random walk

will motivate our choice of notation for recurrent events as introduced in the

next section.

b) Let λ represent a “ladder point” in the walk. By this we mean that λ occurs at trial n if

X_n = max{X_0, X_1, . . . , X_{n−1}} + 1

and we assume λ to have occurred at trial 0. Thus, the first occurrence of

λ corresponds to first passage through 1, the second occurrence of λ corre-

sponds to first passage through 2, and so on. Here again, λ is a renewal

event, since each ladder point corresponds to a regeneration of the process.

c) As a final example, suppose that λ is said to occur at trial n if the num-

ber of positive values in Z1, . . . , Zn is exactly twice the number of negative

values. Equivalently, λ occurs at trial n if and only if Xn = n/3. ♦

5.2 The Renewal and Lifetime Sequences

Let λ represent a renewal event and as before define the lifetime sequence

{fn} where f0 = 0 and

fn = P{λ occurs for the first time at trial n}, n = 1, 2, . . . .


In like manner, we define the renewal sequence un, where u0 = 1 and

un = P{λ occurs at trial n}, n = 1, 2, . . . .

Let F(s) = Σ f_n s^n and U(s) = Σ u_n s^n be the generating functions of {f_n} and {u_n}. Note that

f = Σ f_n = F(1) ≤ 1,

since f has the interpretation of the probability that λ recurs at some time in the sequence. Since the event may not recur at all, it is possible for f to be less than 1. Clearly, 1 − f represents the probability that λ never recurs in the infinite sequence of trials.

If f < 1, the waiting time for λ to occur is not really a random variable. This is because it has probability 1 − f of being infinite, which is not allowed for a random variable. For this kind of renewal event, after each occurrence of λ, there is a probability 1 − f that it will never occur again. The probability that it will occur exactly k times is f^k(1 − f), k = 0, 1, . . .. (Use the model that we toss a coin to decide whether it will recur.)

We may compute the chance of λ occurring at most 100 times as

Σ_{k=0}^{100} f^k(1 − f) = 1 − f^{101}.

More generally, the chance of it occurring at most m times is

Σ_{k=0}^{m} f^k(1 − f) = 1 − f^{m+1}.

Thus, as m tends to infinity, the chance for λ to occur no more than m times tends to 1. Based on this fact, we say that such a renewal event occurs finitely often; it is transient.

If f = 1, then λ will occur some time in the future with probability one. Only then can we discuss the waiting time for the next occurrence of λ. For a renewal event λ with this property, the waiting times from the nth occurrence to the (n + 1)th (hence called inter-occurrence times) are independent and have the same distribution. The function F(s) defined earlier is the corresponding probability generating function. A renewal event λ with f = 1 is called recurrent.

For a recurrent event, F(s) is a probability generating function. The mean inter-occurrence time is

μ = F′(1) = Σ_{n=0}^∞ n f_n.

If μ < ∞, we say that λ is positive recurrent. If μ = ∞, we say that λ is null recurrent.

Finally, if λ can occur only at n = t, 2t, 3t, . . . for some positive integer

t > 1, we say that λ is periodic with period t. More formally, let t =

g.c.d.{n : fn > 0}. (g.c.d. stands for the greatest common divisor). If t > 1,

the recurrent event λ is said to be periodic with period t. If t = 1, λ is said

to be aperiodic.

Note that even if the first few f_n values are zero, the renewal event can still be aperiodic. Many students believe that if f_1 = f_2 = 0, the period of the renewal event must be at least 3. This is wrong. The renewal event can still be aperiodic if, say, f_8 > 0, f_11 > 0 and so on: the greatest common divisor of 8 and 11 is one, and no additional information is needed.

Another remark: suppose f_i > 0 and f_j > 0 for some integers i and j which are mutually prime. Then the greatest common divisor of the set {i, j, any additional numbers} is 1. That is, we know the period is 1 already; there is no need to look further.

To show that the greatest common divisor is some t larger than 1, we have to make sure that f_n = 0 whenever n is not a multiple of t. This is much harder in general.

In the simple random walk, the renewal event of “returning to zero” has period 2. This is because f_n > 0 only if we win and lose an equal number of games in a total of n games; thus n must be even whenever f_n > 0. The period is t = 2, rather than anything larger, because f_2 > 0, so the greatest common divisor cannot exceed 2.


5.3 Some Properties

For a renewal event λ to occur at trial n ≥ 1, either λ occurs for the first time at n, with probability f_n = f_n u_0, or λ occurs for the first time at some intermediate trial k < n and then occurs again at n; the probability of this event is f_k u_{n−k}. Noting that f_0 = 0 and u_0 = 1, we therefore have

u_n = f_0 u_n + f_1 u_{n−1} + · · · + f_{n−1} u_1 + f_n u_0, n = 1, 2, . . . .

This equation is called the renewal equation.

Using the typical generating function methodology, we get

U(s) − 1 = F(s)U(s).

Hence

U(s) = 1/(1 − F(s)) or F(s) = 1 − 1/U(s).

Recall that when we discussed the simple random walk, we found in that context

U(s) = (1 − 4pqs²)^{−1/2}, F(s) = 1 − √(1 − 4pqs²).

It is simple to check that this pair indeed satisfies the relationship.
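The renewal equation also gives a direct numerical route between the two sequences: given {u_n}, solve u_n = Σ_{k=1}^n f_k u_{n−k} recursively for f_n. The sketch below tests this on the return-to-origin sequence of the simple random walk, with the illustrative value p = 0.4:

    from math import comb

    p = 0.4; q = 1 - p
    N = 400

    # u_{2m} = C(2m, m) (pq)^m, u_odd = 0
    u = [comb(n, n // 2) * (p * q) ** (n // 2) if n % 2 == 0 else 0.0
         for n in range(N + 1)]

    f = [0.0] * (N + 1)
    for n in range(1, N + 1):
        # renewal equation: u_n = sum_{k=1}^{n} f_k u_{n-k}, with u_0 = 1
        f[n] = u[n] - sum(f[k] * u[n - k] for k in range(1, n))

    print(sum(f), 1 - abs(p - q))   # partial sum of f_n vs F(1) = 0.8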

The concepts defined in the last section are all related to the {un} se-

quence and we summarize this in the following.

Theorem 5.1
The renewal event λ is

1. transient if and only if u = Σ u_n = U(1) < ∞,
2. recurrent if and only if u = ∞,
3. periodic if t = g.c.d.{n : u_n > 0} is greater than 1, and aperiodic if t = 1,
4. null recurrent if and only if Σ u_n = ∞ and u_n → 0 as n → ∞.


Proof of 1 and 2:

u = Σ_{n=0}^∞ u_n = lim_{s→1−} U(s) = lim_{s→1−} [1 − F(s)]^{−1}.

It follows that u < ∞ and u = ∞ when f = F(1) < 1 and f = 1, respectively. The event λ is transient in the former case and recurrent in the latter.

3. If λ has period d > 1, then F(s) = Σ f_n s^n contains only powers of s^d. Since

U(s) = [1 − F(s)]^{−1} = 1 + F(s) + F²(s) + · · · ,

it follows that U(s) = Σ u_n s^n also contains only powers of s^d, so every n with u_n > 0 is a multiple of d, and hence d | t for t = g.c.d.{n : u_n > 0}. But since u_n = 0 implies f_n = 0, t divides every n with f_n > 0, so t | d. Hence t = d.

4. This result will follow from the renewal theorem.

The following is the famous renewal theorem.

Theorem 5.2 (The renewal theorem)
Let λ be a recurrent and aperiodic renewal event and let

μ = Σ n f_n = F′(1)

be the mean inter-occurrence time. Then

lim_{n→∞} u_n = μ^{−1}.

Proof: See Feller (1968, page 335).

When μ = ∞, which is the null recurrent case, μ^{−1} = 0. This proves 4) in the last theorem.

For a recurrent periodic renewal event λ, we might be able to re-scale the time unit and then make use of this theorem. Suppose that λ has period d > 1. We can define a new sequence of trials so that each new trial is a combination of d original trials. That is, if the outcomes of the original trials are X_1, X_2, . . ., define

Y_{m+1} = (X_{md+1}, X_{md+2}, . . . , X_{(m+1)d}).

The new sequence {Y_1, Y_2, . . .} can also be used to define the renewal event λ. However, in this case λ becomes aperiodic, and the theorem can then be applied.


Example 5.5

Let λ represent the occurrence of FFS in a sequence of Bernoulli trials with P(S) = p and P(F) = q (p + q = 1). In this case

u_0 = 1, u_1 = u_2 = 0,

and

u_n = pq², n = 3, 4, . . . .

Thus

U(s) = 1 + pq²(s³ + s⁴ + · · ·) = 1 + pq²s³/(1 − s), |s| < 1,

and

F(s) = 1 − [U(s)]^{−1} = pq²s³/(1 − s + pq²s³).

Note that F(1) = f = 1, so that λ is recurrent. Since u_n → pq² > 0 as n → ∞, it follows that λ is positive recurrent and the mean inter-occurrence time is μ = (pq²)^{−1}. ♦
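Both conclusions are easy to corroborate by simulating a long run of Bernoulli trials and recording where the pattern FFS is completed; a minimal sketch with the illustrative value p = 1/2:

    import random

    p = 0.5; q = 1 - p
    random.seed(5)

    n_trials = 1_000_000
    seq = ['S' if random.random() < p else 'F' for _ in range(n_trials)]

    # trials at which the pattern FFS is completed
    hits = [n for n in range(2, n_trials)
            if seq[n - 2] == 'F' and seq[n - 1] == 'F' and seq[n] == 'S']

    frac = len(hits) / n_trials
    gaps = [b - a for a, b in zip(hits, hits[1:])]
    mean_gap = sum(gaps) / len(gaps)

    print(frac, p * q * q)             # ~ pq^2 = 0.125
    print(mean_gap, 1 / (p * q * q))   # ~ 8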

Example 5.6

Consider again the simple random walk and let λ represent return to the origin. It is known that

F(s) = 1 − √(1 − 4pqs²), |s| ≤ 1,

and

U(s) = (1 − 4pqs²)^{−1/2}.

Since f_n = 0 for all odd n and non-zero for even n ≥ 2, it follows that λ is periodic with period d = 2. If p = q, F(1) = f = 1, so that λ is in this case recurrent. If p ≠ q,

f = F(1) = 1 − |p − q| < 1

and λ is transient. When p = q, μ = lim_{s→1−} F′(s) = ∞, so that λ is null recurrent.


5.4 Delayed Renewal Events

In a simple random walk with X_0 = 0, the event of the walk returning to 0 is a renewal event. When X_n = 0 for some n, the process renews itself: it behaves as if we have just observed X_0 = 0, and we can re-set the clock back to 0. More specifically, if X_10 = 0, then {X_10 = 0, X_11, X_12, . . .} is stochastically the same as {X_0 = 0, X_1, X_2, . . .}. However, if we let λ be the event X_n = 1, then λ is not a renewal event. When X_5 = 1, {X_5, X_6, . . .} does not behave the same as {X_0, X_1, . . .}. Hence, we cannot re-set the clock back to 0 and pretend that nothing happened.

If we observe that X_5 = 1 and X_19 = 1, then {X_19 = 1, X_20, . . .} will have the same stochastic properties as the system {X_5 = 1, X_6, X_7, . . .}. Hence, the event λ does not renew the process at its first occurrence, but after the first occurrence, each future occurrence of λ renews the process to the time when λ first occurred. Such events are called delayed renewal events.

The main difference between delayed renewal events and the usual renewal events is this: the waiting time for the first occurrence of λ has a different distribution from that of the inter-occurrence times. An informal way to describe a delayed renewal event is: we missed the beginning and started observing from the middle of the sequence.

Suppose that λ is a delayed renewal event. Let us define some quantities:

1) {b_n}: the probability that λ first occurs on trial n, n = 0, 1, 2, . . .;
2) {f_n}: the probability that λ next occurs n trials after an occurrence of λ;
3) {u_n}: the probability that λ occurs on trial n, given that λ occurred on trial 0;
4) {v_n}: the unconditional probability that λ occurs on trial n.

By convention, we suppose that f_0 = 0, but we do allow b_0 > 0, so that λ may occur for the first time at trial 0. Let B(s), F(s), U(s) and V(s) be the corresponding generating functions. We have

U(s) = [1 − F(s)]^{−1}, |s| < 1,

which can be proved in the same way as for renewal events.

Let b = Σ_{n=0}^∞ b_n be the probability that λ ever occurs. The properties of the delayed renewal event λ are determined by the {f_n} sequence. Thus, λ is recurrent if f = Σ f_n = 1 and transient if f < 1. Periodicity is determined by examining g.c.d.{n : f_n > 0}. Note that it is possible for λ to be a recurrent event and yet have a non-zero probability of never occurring; but once it does occur, it then occurs infinitely often.

To find V(s), let us note that when λ occurs at trial n ≥ 1, either λ occurs for the first time at n, with probability b_n = b_n u_0, or λ occurs for the first time at some intermediate trial k < n and then occurs again at n. Thus,

v_n = b_0 u_n + b_1 u_{n−1} + · · · + b_n u_0, n = 0, 1, 2, . . . .

We recognize the right side as the convolution of {b_n} with {u_n}, and so

V(s) = B(s)U(s) = B(s)[1 − F(s)]^{−1}, |s| < 1.

Example 5.7

Consider the simple random walk and let λ represent passage through 1. Thus, λ occurs at trial n if X_n = 1. Then λ is a delayed renewal event. In the notation used for the random walk,

b_n = λ_n = P(first passage through 1 occurs at trial n)

and B(s) = Λ(s). Once λ occurs, the probability that it recurs after n additional steps is the same as the probability of a return to the origin in n steps. Thus

U(s) = (1 − 4pqs²)^{−1/2}

and {v_n}, where v_n = P(X_n = 1), n = 1, 2, . . ., has generating function

V(s) = Λ(s)U(s) = (1 − √(1 − 4pqs²)) / (2qs √(1 − 4pqs²)). ♦

Theorem: If λ is a delayed renewal event that is recurrent and aperiodic, then

lim_{n→∞} v_n = b lim_{n→∞} u_n = bμ^{−1}.

That is, we have to wait until the event occurs once, and from then on it behaves as an ordinary renewal event. The conclusions obtained here will be very useful when we discuss Markov chains.
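For the passage-through-1 example, the convolution v_n = Σ_k b_k u_{n−k} behind V(s) = B(s)U(s) can be verified directly, since v_n = P(X_n = 1) has the explicit form C(n, (n+1)/2) p^{(n+1)/2} q^{(n−1)/2} for odd n (and 0 for even n). A sketch using the closed forms derived earlier (p = 0.55 is an arbitrary test value):

    from math import comb, isclose

    p = 0.55; q = 1 - p
    N = 31

    # u_n: return to origin; b_n = lambda_n: first passage through 1
    u = [comb(n, n // 2) * (p * q) ** (n // 2) if n % 2 == 0 else 0.0
         for n in range(N + 1)]
    b = [comb(n, (n + 1) // 2) * p ** ((n + 1) // 2) * q ** ((n - 1) // 2) / n
         if n % 2 == 1 else 0.0
         for n in range(N + 1)]

    for n in range(1, N + 1):
        v = sum(b[k] * u[n - k] for k in range(n + 1))
        direct = (comb(n, (n + 1) // 2) * p ** ((n + 1) // 2) * q ** ((n - 1) // 2)
                  if n % 2 == 1 else 0.0)
        assert isclose(v, direct, abs_tol=1e-12)
    print("v_n = (b*u)_n checked for n <=", N)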


5.5 Summary

Table 5.1: Summary of some concepts

“Event”: a property of a stochastic process; its occurrence (or not) at trial n can be determined from the first n trials.

Renewal Event: when this type of event occurs, the stochastic process undergoes a renewal; the random behavior of the process from this point on is the same as that of the process from time zero.

Delayed Renewal Event: at the second and later occurrences of this type of event, the process undergoes a renewal; the random behavior of the process from this point on is the same as that of the process from the time when the event occurred for the first time.

Recurrent: the renewal event will recur with probability 1.

Transient: the renewal event may never recur.

Positive Recurrent: the renewal event is recurrent and the expected waiting time for the next occurrence is finite.

Null Recurrent: the renewal event is recurrent but the expected waiting time for the next occurrence is infinite.

Period: the greatest common divisor of the numbers of trials after which the renewal event can occur.

Aperiodic: the period of the renewal event is 1.

5.6 Problems

1. A fair die is rolled repeatedly. We keep a record of the score of each roll. Let λ be the renewal event that the scores 1, 2, 3 occur consecutively.

(a) Let u_n = P(λ occurs at trial n) = P(X_{n−2} = 1, X_{n−1} = 2, X_n = 3). Show that u_n = 1/216 for most n. Find the values of n for which u_n ≠ 1/216.


Recall that we assume u0 = 1.

(b) Obtain the generating function of the sequence u_n and therefore show that the generating function of

f_n = P(λ occurs at trial n for the first time)

is given by

F(s) = s³ / (s³ + 216(1 − s)).

(c) Is the renewal event periodic? Is it recurrent? If so, find the mean inter-occurrence time.

2. In a simple random walk examine if the following are renewal events.

(a) ε is said to occur at trial n if at trial n, a return to origin from the

positive side takes place.

(b) ε is said to occur at trial n if at trial n, the walk is to the right side

of the origin.

3. The lifetime distribution of a fuse is given by f_n = θ^{n−1}(1 − θ), n = 1, 2, . . ..

(a) Show that P(X = m + n | X > m) = f_n, n = 1, 2, . . ..

(b) Suppose that a new fuse is placed in service on day 0 and immediately upon its failure is replaced with an identical fuse. Also assume that the lifetimes are i.i.d. random variables with the distribution given above. The event λ is said to occur at trial n if a new fuse is put in at trial n. Note that λ is a recurrent event. Obtain F(s) and U(s), and hence determine u_n for this event.

(c) Let T be the survival time of the fuse in service at time n (if a failure occurs at time n, the fuse in service is the replacement). Write T as a sum of indicator variables Y_0, Y_1, . . ., where Y_i = 1 if the fuse in service at n is also in service at time i, and 0 otherwise. Show that

E(T) = (1 + θ − θ^{n+1}) / (1 − θ).


Note that as n→∞, E(T )→ (1 + θ)/(1− θ) which is strictly greater

than the mean inter-occurrence time. Can you explain this fact on

intuitive grounds?

4. In a symmetric random walk in two dimensions, a particle begins at the

origin and then moves 1 unit to the N, S, E, or W each with probability

1/4. Let λ designate return to the origin.

(a) Show that u_{2n+1} = 0 and

u_{2n} = 4^{−2n} C(2n, n) Σ_{i=0}^n C(n, i)².

(b) Show that the particle returns to the origin with probability 1.

Argue from this result that the particle must pass through every point

in the integer lattice.

5. Consider a renewal event with the {f_n} sequence having generating function F(s). Let N_k denote the number of occurrences in the first k trials and let q_{k,n} = P(N_k = n).

Show that Q_n(s) = Σ_k q_{k,n} s^k is given by

Q_n(s) = {1 − F(s)} F^n(s) / (1 − s).

Hint: q_{k,n} = Σ_{combinations of α} P(T_1 = α_1) P(T_2 = α_2) · · · P(T_n = α_n) P(T_{n+1} > β), where T_i is the number of trials between the (i − 1)th and the ith occurrence and β = k − α_1 − · · · − α_n.

6. A coin is tossed repeatedly, heads appearing with probability p = 2/3

on each toss. Let λ be the renewal event that THH occurs consecutively.

(a). Let un = P (λ occurs at trial n). Show that un = 4/27 for most n.

Find the values of n for which un = 4/27 does not hold. (e.g. u0 = 1

by convention).

(b). Obtain the generating function of un and therefore show that the

generating function of

fn = P (λ occurs at trial n for the first time)


is given by

F(s) = 4s³ / (27 − 27s + 4s³).

(c). Is it a recurrent renewal event? If so, find the mean inter-

occurrence time. Otherwise, find the probability that λ will ever occur

again.

7. (Self-organizing data retrieval system.) Consider a shelf containing two books, B1 and B2 (among others). These books have two possible

orders on the shelf, namely B1B2 or B2B1. Assume that at epochs

n = 0, 1, 2, . . ., a book is required by a library user, and that at any

epoch the probability that Bj is needed is pj, j = 1, 2, independently of

what happens in other epochs. Assume p1 > 0, p2 > 0, p1 + p2 < 1. To

obtain the required book, the librarian always searches the book-shelf

from left to right, so that average search time for the requested book

is minimized if the book with higher pj value is on the left.

However, the librarian does not know which book is more popular, and

therefore cannot decide whether B1B2 or B2B1 is the better arrange-

ment. To increase the chance of having the requested book nearer the

left much of the time, the following algorithm has been devised. When-

ever any book is requested, it is placed to the end of the shelf when it

is returned. Thus if B2 is demanded and the shelf order is B1B2, the

new arrangement will be B2B1 once B2 is returned.

(a) Let λ be the renewal event “shelf order of B1 and B2 is B1B2”.

Let fn be the lifetime sequence for λ, having generating function F (s).

Show that

F(s) = (1 − p_2)s + p_1 p_2 s² / (1 − (1 − p_1)s).

(b) Show that λ is aperiodic and recurrent. Hence determine lim u_n = P(λ at epoch n). Determine also the long run probability that the shelf order is B2B1.


Chapter 6

Discrete Time Markov Chain

6.1 Introduction

We have studied properties of several special stochastic processes. There is often no precise definition of a stochastic process. A rough definition is that a stochastic process is a collection of random variables. Thus, even a single random variable qualifies as a stochastic process. The simple random walk, the branching process, and the renewal process are all typical examples of stochastic processes. Some common features of these processes are: each process contains countably many random variables; these random variables are arranged in some order, that is, we know exactly which one precedes another; even though the random variables under investigation are not necessarily independent of each other, they are often functions of a sequence of independent and identically distributed random variables.

Why do these stochastic processes share these properties? It is not because stochastic processes in the real world happen to have these properties. Rather, it is because our current knowledge does not allow us to draw meaningful conclusions from more general processes. We have to limit ourselves to these simple, yet useful, stochastic processes.

In this chapter, we consider a somewhat more general stochastic process {X_n, n = 0, 1, 2, . . .}. Note that we have allowed only countably many random variables, and they are arranged in a fixed order. In addition, we will only allow X_n to take countably many possible values. With this restriction, all X_n


are discrete random variables. Since it is always possible to label the set of possible values of the X_n's by non-negative integers, we assume X_n takes only non-negative integer values. When a stochastic process takes values other than non-negative integers, most of our conclusions still hold.

The most important additional assumption we make on the stochastic process is the following Markov property:

P{X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, . . . , X_1 = i_1, X_0 = i_0}
 = P{X_{n+1} = j | X_n = i}
 = P{X_1 = j | X_0 = i}
 = p_{ij}.

The first equality specifies the Markov property. It is often described as the property that, given the present (X_n = i), the future (X_{n+1} = j) is independent of the past (the outcomes of X_1, . . . , X_{n−1}). The second equality further requires the Markov property to be time homogeneous; that is, the conditional probability does not depend on the time n. The third equality simply assigns a special notation. We call this quantity the transition probability from state i to state j.

The set of all possible values of the X_n's is called the state space. The subindex of X_n is regarded as time; that is, the value of X_n is the state of the process at time n. Unless otherwise mentioned, the state space will be denoted as {0, 1, 2, . . .} and the time will also run over {0, 1, 2, . . .}. If X_n = i, we say that the Markov chain is in state i at time n.

It should be clear that the state space contains all the possible values of

X1, and all possible values of X2, and all possible values of X3 and so on. It

is not dictated by any single random variable.

Example 6.1

Suppose X_0 = 0, X_1 has a discrete uniform distribution on {0, 1}, X_2 has a uniform distribution on {0, 1, 2}, and so on. In general, X_n has a discrete uniform distribution on {0, 1, 2, . . . , n} for n = 0, 1, 2, . . ..

The stochastic process {X_n}_{n=0}^∞ has state space S = {0, 1, 2, . . .}. The state space is NOT {0, 1, . . . , n}. ♦


Definition 6.1

A stochastic process is a discrete time Markov chain if it

1. consists of a sequence of random variables (that is, countably many),

2. has a countable state space, and

3. has the Markov property.

♦

As already mentioned, the notation p_{ij} is used for the transition probability P(X_{n+1} = j | X_n = i). It is also called the one-step transition probability from state i to state j. Obviously,

p_{ij} ≥ 0, i, j ≥ 0;  Σ_{j=0}^∞ p_{ij} = 1, i = 0, 1, 2, . . . .

It is most convenient to use the matrix notation P = [p_{ij}]. All entries of P are non-negative, and the sum of every row of P equals 1.

Example 6.2 (A communication system)

Consider a communication system which transmits the digits 0 and 1. Each digit transmitted must pass through several stages; at each stage there is a probability p that the digit entering will be unchanged when it leaves. Let X_n be the digit entering the nth stage. Then {X_n}_{n=1}^∞ is a two-state Markov chain. The state space is {0, 1}. The transition matrix is

P = [ p      1 − p ]
    [ 1 − p  p     ].

Example 6.3 (Simple random walk):


It is more convenient to denote the state space as {0, ±1, ±2, . . .}. The transition probabilities are

p_{ij} = { p,      j = i + 1;
         { 1 − p,  j = i − 1;
         { 0,      |i − j| ≠ 1.

Example 6.4 (Branching process):

Assume Z_0 = 1 and that the family size has a Binomial(2, p) distribution. Then the state space is {0, 1, 2, . . .}, and the transition probabilities are

p_{ij} = C(2i, j) p^j (1 − p)^{2i−j}, 0 ≤ j ≤ 2i.

♦

To model an experiment (or a real world phenomenon) by a (discrete time) Markov chain, we should go through the following three steps:

• identify a sequence of random variables;

• identify the corresponding state space;

• obtain the transition probabilities.

We should further confirm that the state space is countable, and that the Markov property is suitable. In the above examples, you may easily find that the state spaces are countable. If we work more carefully, we often find that the Markov property holds, as we can calculate the transition probabilities without conditioning on the state of X_{n−1} and so on.

Finally, we should certainly make sure that the Markov chain so defined serves the purpose of solving the problem under investigation. We may define a flawless Markov chain that nevertheless does not help us to answer the question asked.

Example 6.5 (Weather forecasting):


Suppose that there are only two possible weather conditions for any single day: rain or sunny. In addition, we assume that tomorrow's weather depends on today's weather, but not on previous weather conditions once today's weather is given. Also, the chance of rain tomorrow given that today is rainy is α, and the chance of being sunny tomorrow given that today is sunny is β. We can model this experiment as a discrete time Markov chain.

First, we define

Y_n = { 1, if it is sunny on the nth day;
      { 0, if it rains on the nth day.

The state space is clearly {0, 1} by the above definition, and it is countable. The Markov property is satisfied as it is clearly stated in the description of the problem. We certainly cannot blindly believe that the real world can indeed be modeled by a stochastic process with the Markov property; at the same time, it is hoped that this is a harmless mathematical assumption. The transition probability matrix is

P = [ α      1 − α ]
    [ 1 − β  β     ].

♦

The model in the above example might be too simplistic to be useful in the real world. One possible remedy, giving a more realistic model which is still simple enough mathematically, is to assume that tomorrow's weather depends only on the weather of yesterday and today. Let us use the notation R for rain and S for sunny. Assume the transition probabilities are as follows:

Yesterday (n−1)   Today (n)   P(Tomorrow = R)
R                 R           0.7
S                 R           0.5
R                 S           0.4
S                 S           0.2

Under the current specification, {Y_n} defined in the last example is no longer a Markov chain: the probability that Y_{n+1} = 1 depends on both Y_n and Y_{n−1}. (Can we verify this mathematically?) However, it is possible to build from this process a different process which is a Markov chain.

Let us define, for n ≥ 1,

Yesterday (n−1)   Today (n)   X_n
R                 R           0
S                 R           1
R                 S           2
S                 S           3

We may have to ignore X_0. The stochastic process {X_n}_{n=1}^∞ has state space {0, 1, 2, 3}, which is still countable. It now has the Markov property, as

P(X_{n+1} = 0 | X_n = 0, whatever X_{n−1}, X_{n−2}, . . .) = 0.7.

Note that X_{n−1} is assumed to be consistent with X_n, so that the probability of the event in the conditioning part is non-zero. In general, we may define P(A | B) to be anything if P(B) = 0.

To verify the Markov property, we need only work out the transition matrix. It is simple to see, for instance, that

P(X_{n+1} = 0 | X_n = 0, whatever values of X_{n−1} and so on) = 0.7.

The transition matrix turns out to be

P = [ 0.7  0    0.3  0   ]
    [ 0.5  0    0.5  0   ]
    [ 0    0.4  0    0.6 ]
    [ 0    0.2  0    0.8 ].

It is simple to check again that p_{ij} ≥ 0 and Σ_j p_{ij} = 1.
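Such checks are conveniently delegated to code. The sketch below (assuming numpy is available) rebuilds the matrix, confirms the row sums, and simulates a short path of the chain:

    import numpy as np

    # states: 0 = (R,R), 1 = (S,R), 2 = (R,S), 3 = (S,S)
    P = np.array([[0.7, 0.0, 0.3, 0.0],
                  [0.5, 0.0, 0.5, 0.0],
                  [0.0, 0.4, 0.0, 0.6],
                  [0.0, 0.2, 0.0, 0.8]])

    assert np.allclose(P.sum(axis=1), 1.0)   # every row sums to 1

    rng = np.random.default_rng(0)
    x, path = 0, [0]
    for _ in range(10):
        x = rng.choice(4, p=P[x])            # next state drawn from row x
        path.append(int(x))
    print(path)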

Remark: There is more than one way to label the state space of the stochastic process. If we exchange the states 0 and 1 in the above definition, then the first and second rows, and the first and second columns, of the transition matrix are exchanged too. This may create some problems when marking your assignments, but it is not a problem in theory.


When trying to use a probability model to describe a real world phenomenon, we can be certain that the model is not exactly correct. We are happy to learn that it might still be very useful. Further, by increasing the model complexity, we can often obtain a model that is closer to reality and very useful.

Example 6.6

Suppose there are 3 white and 3 black balls distributed in two urns, each containing 3 balls. At each step, we draw a ball randomly from each urn. We then exchange the two balls and put them back into the urns. We are interested in the number of white balls in the first urn after n exchanges.

Let X_n be the number of white balls in the first urn after n exchanges. We will show that {X_n} is a Markov chain.

Step 1: It is obvious that X_n can only be 0, 1, 2, or 3, regardless of n. Therefore, the state space is {0, 1, 2, 3}, which is countable.

Step 2: We need to verify the Markov property. First note that

P(X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, . . .) = 0 if |i − j| ≥ 2.

The notation i_{n−1} in the above expression simply means some number. The equation implies that the transition probability from state i to state j does not depend on the value of i_{n−1} or others; it equals 0 as long as |i − j| ≥ 2.

We have a number of other cases. If i = 0, we find

P(X_{n+1} = 0 | X_n = 0, whatever others) = 0.

This is because we will definitely obtain a white ball from the second urn. Obviously,

P(X_{n+1} = 1 | X_n = 0, whatever others) = 1.

There is no need to consider other cases when i = 0.

If i = 3, we have

P(X_{n+1} = 2 | X_n = 3, whatever others) = 1.

If i = 2, we have

P(X_{n+1} = 2 | X_n = 2, whatever others) = 4/9,
P(X_{n+1} = 1 | X_n = 2, whatever others) = 4/9,
P(X_{n+1} = 3 | X_n = 2, whatever others) = 1/9.

If i = 1, we have

P(X_{n+1} = 2 | X_n = 1, whatever others) = 4/9,
P(X_{n+1} = 1 | X_n = 1, whatever others) = 4/9,
P(X_{n+1} = 0 | X_n = 1, whatever others) = 1/9.

As none of the above transition probabilities depend on the “whatever” part, the Markov property has been verified. The transition probability matrix is

P = [ 0    1    0    0   ]
    [ 1/9  4/9  4/9  0   ]
    [ 0    4/9  4/9  1/9 ]
    [ 0    0    1    0   ].

This completes the task of modeling this experiment as a Markov chain.

The above model turns out to be very useful for describing diffusion processes in the physical world. If you drop some colored water into a cup of clean water, soon the color will spread out; it becomes perfectly mixed with the water without our help. By imagining molecules moving randomly from one part of the cup to another, we can see that in the limit, the colored molecules will be distributed uniformly over the whole cup.

6.2 Chapman-Kolmogorov Equations

Suppose that in Example 6.6 it is known that X_0 = 1. Given this information, what is the probability that X_2 = 1? More generally, what would be the probability that X_100 = 1?


The first question has a specific answer; the second one can be answered in principle.

P(X_2 = 1 | X_0 = 1) = P(X_2 = 1, X_0 = 1) / P(X_0 = 1)
 = Σ_{j=0}^3 P(X_2 = 1, X_1 = j, X_0 = 1) / P(X_0 = 1)
 = Σ_{j=0}^3 P(X_2 = 1 | X_1 = j) P(X_1 = j | X_0 = 1)
 = Σ_{j=0}^3 p_{1j} p_{j1} = 41/81.

The solution to P (X100 = 1|X0 = 1) can be obtained in the same way, except we need to work with sums of terms, each a product of 100 numbers. Due to the Markov property and temporal homogeneity, we know that

P (Xn+2 = j|Xn = i) = P (X2 = j|X0 = i),

which does not depend on n. We hence call the matrix of these probabilities the two-step transition matrix.

Using the notation p_{ij}^{(2)}, we find

p_{ij}^{(2)} = ∑_k p_{ik} p_{kj}.

The above formula may look complex. However, expressed in matrix format, it becomes

P^{(2)} = P²

where P^{(2)} is the two-step transition probability matrix and P is the one-step transition probability matrix. In general, we have

P^{(m)} = P^m.

When the state of the Markov chain at time 0 is not given, but we know

the distribution of X0 as

P (X0 = i) = αi,


we can then find the distribution of X1 as follows:

P (X1 = j) = ∑_k α_k p_{kj}.

Let α be the row vector of the α_i and β the row vector of the P (X1 = j); then

β = αP.

Similarly, if β_m is the row vector of the P (Xm = j), we have

β_m = αP^m.

The formula P^{(m)} = P^m above and its generalized form

P^{(m)} P^{(n)} = P^{(m+n)}

are called the Chapman-Kolmogorov equations. They are simple and straightforward. I am sure that if you had been born a few centuries earlier, it could be your name that is attached to these formulas.
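The Chapman-Kolmogorov equations are also easy to exploit numerically. The following numpy sketch (our own illustration) computes P² for the urn chain of Example 6.6, confirming the value 41/81, and obtains the distribution of X100 given X0 = 1 from P¹⁰⁰:

import numpy as np

# One-step transition matrix of the urn model in Example 6.6.
P = np.array([
    [0,   1,   0,   0  ],
    [1/9, 4/9, 4/9, 0  ],
    [0,   4/9, 4/9, 1/9],
    [0,   0,   1,   0  ],
])

P2 = np.linalg.matrix_power(P, 2)
print(P2[1, 1], 41/81)                         # both print 0.50617...

alpha = np.array([0.0, 1.0, 0.0, 0.0])         # X_0 = 1 with certainty
print(alpha @ np.linalg.matrix_power(P, 100))  # distribution of X_100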

6.3 Classification of States

In almost any discipline, it is often possible to classify the objects under investigation into several groups, where the objects in the same group share some common properties.

Recall that all possible values of all Xn, n = 0, 1, . . ., form the state space

S of the Markov chain. For convenience, we denote them as {0, 1, 2, . . .} most of the time. In this section, we plan to classify these states according to the

stochastic properties of the Markov chain. As you may have already guessed

correctly, the transition probability matrix P , together with state space S,

almost completely determine the properties of the Markov chain.

Accessible: We say that state j is accessible from state i if there exists m ≥ 0 such that

p_{ij}^{(m)} = P (Xn+m = j|Xn = i) > 0.

This implies that if the Markov chain ever enters state i, it is possible for it

to enter state j in the future.


Communicate: If states i and j are accessible from each other, we say

that they communicate. We use notation i ↔ j. We now try to use this

relation to classify the states.

We first claim that any state of the Markov chain communicates with itself. This may be interpreted as saying that state i is accessible from itself in 0 steps. It may also be regarded simply as a convention.

Second, it is easy to see that if state i communicates with state j, then state j communicates with state i. This is a consequence of the definition of “communicate” being symmetric in states i and j.

Third, if state i communicates with state j, and state j communicates

with state k, then state i communicates with state k too. This can be proved

as follows. The assumptions imply the existence of integers m1 and m2 such

that

P (Xm1 = j|X0 = i) > 0,

P (Xm2 = k|X0 = j) > 0.

Hence,

P (Xm1+m2 = k|X0 = i) = ∑_l P (Xm1+m2 = k|Xm1 = l)P (Xm1 = l|X0 = i)

≥ P (Xm1+m2 = k|Xm1 = j)P (Xm1 = j|X0 = i)

> 0.

Similarly, we can show

P (Xm1+m2 = i|X0 = k) > 0

for a possibly different pair of m1 and m2.

The above three properties show that “communicate” is an equivalence relation. An equivalence relation divides the state space uniquely into equivalence classes. The equivalence classes have the following two properties:

1. Any two states in the same class are equivalent to each other.

2. Two states from two different classes are not equivalent.


Consequently, the relationship “communicate” divides up the state space

of a Markov chain into distinct classes. The states in the same class commu-

nicate with each other. States from different classes do not communicate.

If all states communicate with each other, then there is only one class in

the state space. In this case, we say that the Markov chain is irreducible.

The idea behind this notion is: if a Markov chain is reducible, we might

be able to reduce it before we study its other properties. One principle in

mathematics is always to simplify the problem under investigation.

Example 6.7

Consider a Markov chain consisting of four states 0, 1, 2, 3 and having

transition matrix

P =
[ 1/2  1/2  0    0   ]
[ 1/2  1/2  0    0   ]
[ 1/4  1/4  1/4  1/4 ]
[ 0    0    0    1   ] .

Note that our setup assumes that {Xn}∞n=0 has been defined, the state

space is {0, 1, 2, 3} and the transition probabilities are already specified by the

transition probability matrix. In many of our assignment problems, you may

have to go through these omitted steps. When you get lost on an assignment problem before you even get started, first ask the question of what these things are.

[Transition diagram: states 0 and 1 point to each other; state 2 has arrows to itself and to 0, 1 and 3; state 3 has only a self-loop.]


It is obvious that states 0 and 1 communicate with each other. Hence,

they belong to the same class. All states can be accessed from state 2.

However, state 2 can only be accessed from itself. Therefore, state 2 does

not belong to the class that contains states 0 and 1. It has to form its own

class. State 3 is an absorbing state. When the Markov chain enters this

state, it will stay there forever. Consequently, it does not communicate with

other states either. Hence, state 3 also forms its own class.

In summary, the state space of this Markov chain is divided into three

classes: {0, 1}, {2}, and {3}. ♦

What is the consequence of a Markov chain being reducible? In the above example, if X0 = 0, then states 2 and 3 will never be reached. Thus, the Markov chain effectively has the “reduced” state space {0, 1}.

Next, we consider the properties of the states in the same class.
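Classifying states by hand follows exactly the strongly-connected-component structure of the directed graph with an edge i → j whenever pij > 0. If scipy is available, this can be automated; the sketch below (our own, applied to Example 6.7) recovers the three classes {0, 1}, {2} and {3}.

import numpy as np
from scipy.sparse.csgraph import connected_components

P = np.array([
    [1/2, 1/2, 0,   0  ],
    [1/2, 1/2, 0,   0  ],
    [1/4, 1/4, 1/4, 1/4],
    [0,   0,   0,   1  ],
])

# States communicate iff they lie in the same strongly connected component.
n, labels = connected_components(P > 0, directed=True, connection='strong')
print(n, labels)   # 3 components; the labels group {0, 1}, {2}, {3}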

It is obvious that the event that the Markov chain enters state i is a renewal event, for any given i. More precisely, it is a delayed renewal event, as we cannot assume X0 = i for every i. However, once the Markov chain is

in state i at time n, the future will not depend on the outcome of Xk, k < n

(conditionally). This justifies the claim that Xn = i is a (delayed) renewal

event.

With this, we may ask the question: is the renewal event “Xn = i”

transient, positive recurrent or null recurrent? In short, we simply ask the

question: is state i transient, positive recurrent or null recurrent? To avoid the complication raised by the “delayed” part, we answer this question under the assumption that X0 = i whenever we investigate the property of state i.

Recall that a renewal event is recurrent if the probability of its occurrence

in the future is one. Let

fi = P (Xn = i for some n ≥ 1|X0 = i).

Then state i is recurrent if and only if fi = 1. If fi < 1, state i is transient. In this case, the Markov chain will only enter state i a finite number of times. Sooner or later, the Markov chain will leave this state, never to return.

Consider the Markov chain defined in Example 6.7. It is easily seen that

state 2 is transient. Starting from state 2, the Markov chain has probability


0.25 to stay in the same state after one transition. However, once it leaves

state 2, the Markov chain will never re-enter this state.

According to our discussion on renewal events, it is also true that state i is transient if and only if

∑_n P (Xn = i|X0 = i) = ∑_n p_{ii}^{(n)} < ∞.

(Recall we obtained this theorem by using generating functions.) Consequently, state i is recurrent if and only if

∑_n P (Xn = i|X0 = i) = ∑_n p_{ii}^{(n)} = ∞.

Recall that state i being transient implies that the Markov chain will only visit i a finite number of times (over the infinite time horizon). If a Markov chain has only a finite number of states, at least one of the states must be visited infinitely many times (over the infinite time horizon). We hence conclude:

Corollary 6.1

If a Markov chain has a finite state space, then at least one of its states is positive recurrent. ♦

Further, we find the following is true.

Corollary 6.2

If state i is recurrent, and states i and j communicate, then state j is also

recurrent.

Proof: We use the criterion that a renewal event is recurrent if and only if ∑_n u_n = ∞; see Theorem 5.1. For state j, the u_n is p_{jj}^{(n)}.

Since i and j communicate, there exist m1 and m2 such that

p_{ij}^{(m1)} > 0,   p_{ji}^{(m2)} > 0.

In addition,

p_{jj}^{(m1+n+m2)} ≥ p_{ji}^{(m2)} p_{ii}^{(n)} p_{ij}^{(m1)}.


Now we sum over n on both sides and find

∑_n p_{jj}^{(m1+n+m2)} ≥ p_{ji}^{(m2)} [∑_n p_{ii}^{(n)}] p_{ij}^{(m1)} = ∞.

Note that this implies ∑_n p_{jj}^{(n)} = ∞. ♦

Remarks: The above result implies that recurrence is a class property. If one state in a class is recurrent, then all the states in the same class are recurrent. Further, we see that transience is also a class property: if one state is transient, then all states in the same class are transient.

We claim without proof that positive recurrence and periodicity are also class properties.

Example 6.8

Let the Markov chain consisting of the states 0, 1, 2, and 3 have the transition

probability matrix

P =
[ 0  0  1/2  1/2 ]
[ 1  0  0    0   ]
[ 0  1  0    0   ]
[ 0  1  0    0   ] .

Determine which states are transient and which are recurrent.

Solution: The Markov chain is irreducible and has a finite state space; hence all states are recurrent. ♦

Example 6.9

Consider the Markov chain having states 0, 1, 2, 3 and 4 with the transition

probability matrix

P =
[ 1/2  1/2  0    0    0   ]
[ 1/2  1/2  0    0    0   ]
[ 0    0    1/2  1/2  0   ]
[ 0    0    1/2  1/2  0   ]
[ 1/4  1/4  0    0    1/2 ] .

Classify the state space (identifying the transient, positive recurrent, and null recurrent classes, and the period of each class).


Solution: This chain consists of the three classes {0, 1}, {2, 3} and {4}. The first two classes are positive recurrent and the third is transient. All classes are aperiodic. ♦

Example 6.10 (Simple random walk)

We already know many properties of the simple random walk. It is simple

to see that the chain is irreducible. When p = 0.5, all states are recurrent. When p ≠ 0.5, all states are transient. Note that this chain has an infinite state space.

We may verify the recurrence property of state 0 directly.

Recall

p_{00}^{(2n)} = [(2n)!/(n! n!)] {p(1 − p)}^n

for all n = 1, 2, . . .. Using Stirling's formula, we find

p_{00}^{(2n)} ≈ {4p(1 − p)}^n / √(nπ).

Hence, when 4p(1 − p) < 1, ∑_n p_{00}^{(n)} < ∞ and state 0 is transient. This happens when p ≠ 0.5. Otherwise the sum is infinite and state 0 is recurrent. ♦

It turns out that the two-dimensional simple random walk (on the grid) has a similar property: if the probabilities of moving in the 4 directions are all equal, then all states are recurrent. Simple random walks in three or higher dimensions lose this property; all their states are transient.
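The dichotomy in Example 6.10 can be seen numerically from the partial sums of p_{00}^{(2n)} = C(2n, n){p(1 − p)}^n. A small Python sketch of ours follows; it builds each term from the previous one to avoid overflow, and the generating function identity ∑_{n≥1} C(2n, n)x^n = 1/√(1 − 4x) − 1 tells us the p = 0.6 sums should converge to 4.

def partial_sum(p, N):
    # Sum of C(2n, n) * (p(1-p))**n for n = 1..N, built term by term.
    x = p * (1 - p)
    term, total = 1.0, 0.0
    for n in range(1, N + 1):
        term *= (2 * n) * (2 * n - 1) / (n * n) * x   # ratio of consecutive terms
        total += term
    return total

for p in (0.5, 0.6):
    print(p, [round(partial_sum(p, N), 3) for N in (10, 100, 1000)])
# p = 0.6: the sums settle near 4 (state 0 transient);
# p = 0.5: they grow like sqrt(N) without bound (state 0 recurrent).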

Remark: It is often asked whether a closed class must be a recurrent class. The simple random walk example answers this question: when p ≠ 0.5, the state space is a closed class, but it is also a transient class.

Remark: We have discussed many ways to find out whether a state is recurrent or transient. It may not be clear to inexperienced users which method should be applied. My rules of thumb are:

1. When confused, ask yourself what the definitions of recurrence and transience are.


2. Try first to see if the state is transient. If it belongs to an open class, then it is transient.

3. If it belongs to a closed class, check the finiteness of the class. If the class is finite, state i is positive recurrent.

4. If it belongs to a class which is closed but infinite, try to see if you can tell whether ∑_n p_{ii}^{(n)} is finite or infinite.

Unfortunately, you have to be resourceful in order to use the last criterion. Thus, you should use it only as a last resort.

6.4 Limiting Probabilities

Recall that a renewal event is aperiodic if its period is one. A Markov chain is irreducible if all its states belong to the same class. A positive recurrent, aperiodic state is called ergodic. These properties are all class properties. That is, if one state is found to have the property, all states in the same class share it.

It is seen that the precise distribution of Xn is often hard to obtain for each n. However, when the process has been running for a long time, the distribution of Xn seems to stabilize. For example, shortly after we drop coloured water into a cup of clean water, we cannot tell exactly where the colour molecules are. After we have waited long enough, we are certain that they have spread out very uniformly. Mathematically, we can find a limit for P (Xn = j) as n → ∞. The result is summarized by the

following theorem.

Theorem 6.1

For an irreducible ergodic Markov chain, lim_{n→∞} p_{ij}^{(n)} exists and is independent of the initial state i. Further, letting

π_j = lim_{n→∞} p_{ij}^{(n)},


then the π_j are the unique nonnegative solution of

π_j = ∑_{i=0}^{∞} π_i p_{ij},   ∑_j π_j = 1.

♦

We do not present a proof of this theorem. However, since the Markov chain entering state j is a renewal event, the renewal theorem applies. Hence, under the conditions of this theorem, lim p_{jj}^{(n)} exists and equals µ_j^{-1}, where µ_j is the average time it takes for the Markov chain to come back to state j. Further, when the Markov chain is irreducible and ergodic, for any state j, the probability that the chain eventually enters j is 1. This implies that the limit of p_{ij}^{(n)} is the same as that of p_{jj}^{(n)}, regardless of the state from which the chain starts.

Let π be the vector of the limiting probabilities, and β_n the vector of the P (Xn = j|X0 = i). From the Chapman-Kolmogorov equations, we have

β_{n+1} = β_n P.

Letting n → ∞ on both sides, under the assumption that the limit of p_{ij}^{(n)} exists, we have

π = πP.

That is, π must be a solution of this equation. However, P − I does not have full rank, so there exist many solutions. The one which also satisfies ∑_j π_j = 1 gives the limiting probabilities, and the solution with this property is unique.

The renewal theorem claims π_j = µ_j^{-1}, where µ_j is the expected inter-occurrence time. Hence π_j is the long-run proportion of time the Markov chain spends in state j.

When the Markov chain is irreducible and positive recurrent, but not

aperiodic, we may still have a unique non-negative solution of

π = πP


satisfying ∑_j π_j = 1. In this case, π_j is still the long-run proportion of time the Markov chain spends in state j, but the limit of p_{ij}^{(n)} may not exist. We still have π_j = µ_j^{-1}; that is, the expected inter-occurrence time is given by π_j^{-1}.

When lim_{n→∞} p_{ij}^{(n)} exists and is the same for all i, then lim_{n→∞} P (Xn = j) = π_j.

If a solution to

π = πP

satisfying ∑_j π_j = 1 exists and the Markov chain is irreducible, then all states

are positive recurrent.

Recall that P (Xn = j) = ∑_{i=0}^{∞} p_{ij}^{(n)} P (X0 = i). Letting n → ∞ results in

lim P (Xn = j) = π_j.

If βn = π, then βn+m = π for all m = 1, 2, . . .. Hence, we say π is the

stationary distribution of the Markov chain. In some books, it is also

called the steady state of the Markov chain. It can be seen that the sta-

tionary distribution may exist even when the Markov chain is reducible (not

irreducible). In this case, there can exist more than one stationary distribution.

When β_n = π, we also say that the Markov chain has reached equilibrium. In this state, the rate of the chain entering any given state is the

same as the rate of the chain leaving this state.

Example 6.11

A problem of interest to sociologists is to determine the proportion of soci-

ety that has an upper or lower class occupation. One possible mathematical

model would be to assume that transition between social classes of the suc-

cessive generations in a family can be regarded as transitions of a Markov

chain. Let us assume that we examine a single family tree of their first child.

Let Xn = 0, 1, 2 depending on the social class of the child in the nth genera-

tion. Suppose Xn is a Markov chain and the transition probability matrix is

given by

P =
[ 0.45  0.48  0.07 ]
[ 0.05  0.70  0.25 ]
[ 0.01  0.50  0.49 ] .

Solving the equation πP = π and ∑ π_j = 1, we will get

π0 = 0.07, π1 = 0.62, π2 = 0.31.


In other words, in the long run, the child under consideration has a 7% chance of belonging to class 0. If this model applies to all individuals in the society, about 7% of the population will belong to class 0 in the long run.
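The fixed point π = πP is a linear system, so the limiting probabilities can be found with a few lines of numpy. In the sketch below (ours), one redundant equation of π(P − I) = 0 is replaced by the normalization ∑ π_j = 1:

import numpy as np

P = np.array([
    [0.45, 0.48, 0.07],
    [0.05, 0.70, 0.25],
    [0.01, 0.50, 0.49],
])

# pi (P - I) = 0 transposes to (P^T - I) pi^T = 0; swap in the normalization.
A = np.vstack([(P.T - np.eye(3))[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi.round(3))   # roughly (0.06, 0.62, 0.31); compare with the values quoted above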

Example 6.12

Consider a large population of individuals and consider their genotype at a

specific locus. Each individual has a pair of genes at this locus. A gene can

have different forms called alleles. In this example, we assume that there are

only two possible alleles named A and a. In generation 0, the proportion

of individuals with genotype AA, aa or Aa are respectively p0, q0 and r0

(p0 + r0 + q0 = 1). Mendel's law states that a child inherits one gene from each parent, and each of the two genes of a parent is equally likely to be transmitted to its offspring.

It is a bit difficult to define a sequence of random variables Xn explicitly

here. Consider a line of individuals so that each person is the first child

of the person considered in the last generation. This person serves as a

representative of the general population.

Let Xn be the genotype of the nth person in this line, n = 0, 1, 2, . . ..

Assume

P (X0 = AA) = p0, P (X0 = Aa) = r0, and P (X0 = aa) = q0.

That is, the first individual is chosen randomly from the population.

In addition, we assume his/her spouse will be selected from the population

randomly (at least in terms of his/her genotype). Let Yn be the genotype of

the spouse of the “Xn”. We assume that Yn has the same distribution as Xn.

P (X1 = AA|X0 = AA)

= P (X1 = AA, Y0 = AA|X0 = AA)

+P (X1 = AA, Y0 = Aa|X0 = AA)

+P (X1 = AA, Y0 = aa|X0 = AA)

= p0 + r0/2 + 0.


If Yn has distribution pn, rn, qn, we have

P (Xn+1 = AA|Xn = AA)

= P (Xn+1 = AA, Yn = AA|Xn = AA)

+P (Xn+1 = AA, Yn = Aa|Xn = AA)

+P (Xn+1 = AA, Yn = aa|Xn = AA)

= pn + rn/2 + 0.

Note that P (Yn = AA|Xn = AA) = P (Yn = AA) due to independence.

Straightforward calculation reveals that the transition probability matrix

for the nth generation to the (n+ 1)th generation is given by

Pn =
[ pn + rn/2      0               qn + rn/2          ]
[ 0              qn + rn/2       pn + rn/2          ]
[ pn/2 + rn/4    qn/2 + rn/4     pn/2 + qn/2 + rn/2 ] ,

with rows and columns ordered (AA, aa, Aa)

. Using the notation (pn, rn, qn) for the distribution of Xn, we have, for all n ≥ 1,

(pn, rn, qn) = [ (p0 + r0/2)², 2(p0 + r0/2)(q0 + r0/2), (q0 + r0/2)² ].

Our computational results imply that the distribution of Xn stabilizes after one generation. It is simple to verify that all Pn, n ≥ 0, are in fact equal. Hence {Xn}∞n=0 is a Markov chain. Its limiting probability is given by β1 and its transition probability matrix by P1. ♦
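This one-generation stabilization (the Hardy-Weinberg phenomenon) is easy to confirm numerically. In the sketch below (the starting frequencies are arbitrary choices of ours), a parent transmits allele A with probability θ = p + r/2, so the next generation is (θ², 2θ(1 − θ), (1 − θ)²), and applying the map twice changes nothing:

def next_gen(p, r, q):
    # One generation of random mating: new frequencies of (AA, Aa, aa).
    theta = p + r / 2          # probability a random parent transmits A
    return theta**2, 2 * theta * (1 - theta), (1 - theta)**2

p0, r0, q0 = 0.5, 0.2, 0.3
gen1 = next_gen(p0, r0, q0)
gen2 = next_gen(*gen1)
print(gen1)   # (0.36, 0.48, 0.16) here, since theta = 0.6
print(gen2)   # identical: the distribution is stationary after one generation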

Example 6.13 (Renewal events):

Consider a sequence of Bernoulli trials with outcomes H and T so that

P (H) = p and P (T ) = q. We may model any pattern as a state of a

Markov chain. For instance, if we are interested in the occurrence of TH, we

may define Xn from the outcomes at trials n − 1 and n:

outcome at (n−1, n)    Xn
TT                     0
TH                     1
HT                     2
HH                     3


We may have to define X1 differently. This is obviously an irreducible, ergodic Markov chain. It is obvious that for all n ≥ 2,

P (Xn = 1) = pq.

By Theorem 6.1 on the limiting probabilities of a Markov chain, we have π1 = pq. The average time it takes for the Markov chain to come back to state 1 (starting from 1) is therefore 1/(pq).

Since it happens that starting from state 1 (i.e., from TH) is equivalent to starting from nothing, the expected waiting time for TH to occur is the same as the average inter-occurrence time. ♦

This result can also be regarded as an application of the Renewal Theorem (Theorem 5.2).

Example 6.14

If we want to know the expected time for the occurrence of the pattern THT ,

some caution is required. Using the same argument, we can get the average time to the next appearance of THT starting from THT, which is 1/(pq²). However, starting from THT is different from starting from nothing.

You may notice that THT is a delayed renewal event.

To calculate the average waiting time for the first occurrence of THT

from the beginning, we take note of the following fact. Once T has occurred,

the waiting time distribution of the occurrence of THT from this moment

is just the same as the waiting time distribution for THT from the moment

when THT occurred. Therefore, we can simply calculate the average waiting

time for the first occurrence of T . It happens that T is a renewal event and

the technique used in Example 6.13 can be used.

The renewal theorem, or the limiting probability theorem for Markov chains, tells us that this average waiting time is 1/q. Consequently, the average waiting time for THT to occur from the beginning is

1/q + 1/(pq²).

♦

Can you justify now that

E[waiting time until THTH appears] = 1/(pq) + 1/(p²q²)?
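You can check all three formulas by simulation. The sketch below (ours; the trial count is chosen for speed) estimates the expected waiting times for TH, THT and THTH with a fair coin, where the formulas give 4, 10 and 20 respectively:

import random

def mean_wait(pattern, p, trials=100_000):
    # Monte Carlo estimate of E[first time the pattern appears].
    total = 0
    for _ in range(trials):
        window, t = "", 0
        while not window.endswith(pattern):
            window += "H" if random.random() < p else "T"
            t += 1
        total += t
    return total / trials

p = 0.5; q = 1 - p
print(mean_wait("TH", p),   1 / (p * q))                        # about 4
print(mean_wait("THT", p),  1 / q + 1 / (p * q * q))            # about 10
print(mean_wait("THTH", p), 1 / (p * q) + 1 / (p * p * q * q))  # about 20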


6.5 Mean Time Spent in Transient States

If a state is transient, then the Markov chain will leave this state for good after some finite amount of time. Let fi be the probability that the chain will return to state i (starting from i). Then the number of future visits Ni (starting from i) has the geometric distribution

P (Ni = n) = fi^n (1 − fi), n = 0, 1, . . . .

Hence E[Ni] = fi/(1 − fi), so the expected total number of visits, counting the one at time 0, is (1 − fi)^{-1}. Please note that the probability of success (leaving forever from now on) is 1 − fi in this case.

It is, however, not obvious how to calculate fi. Assume we have a finite

class of transient states T = {1, 2, . . . , t}. Let

PT = [pij]

be the matrix of transition probabilities within this class. This is just a submatrix

of the full transition probability matrix.

We claim that at least one of its rows has a sum less than 1. Otherwise,

the Markov chain will stay inside this class forever. If so, this class is not a

transient class.

Let sij denote the expected number of time periods that the chain spends in state j, given that it starts from i (the number of periods itself is a random variable; sij is its expectation). Let δi,j = 1 when i = j and 0 otherwise.

Recall that if the chain leaves this class, it will never come back. Otherwise, there would be states from another class communicating with states in this class, contradicting the definition of a class. Using the conditional expectation technique, we get

sij = δi,j + ∑_{k=1}^{t} pik skj.

Letting S = [sij], we may write the above equation as

S = I + PTS.

Hence S = (I − PT )−1.

Example 6.15


Consider the gambler’s ruin problem with p = 0.4 and N = 7. The class of

transient states consists of {1, 2, . . . , 6}. We can easily find pi,i+1 = 0.4 and

pi,i−1 = 0.6 for these transient states. Inverting I − PT gives a big matrix (using Splus or whatever software you can think of). It turns out that

s3,5 = 0.9228, s3,2 = 2.3677.

How does the above calculation relate to the fi we discussed earlier? Let fij be the probability that, starting from state i, the Markov chain will ever visit state j (visit it again, if i = j). Hence, fi = fii. It can be seen that

sij = (δij + sjj)fij + δij(1 − fij) = δij + fij sjj.

This is a very simple relation. ♦
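For Example 6.15, the whole computation is a one-line matrix inversion once PT is written down. The sketch below (ours) computes S = (I − PT)^{-1} for the gambler's ruin chain with p = 0.4 and N = 7 and should reproduce the values of s3,5 and s3,2 quoted above; remember that state 1 sits at index 0 of the matrix:

import numpy as np

t = 6                               # transient states 1, 2, ..., 6
PT = np.zeros((t, t))
for i in range(t):
    if i + 1 < t:
        PT[i, i + 1] = 0.4          # win a bet
    if i - 1 >= 0:
        PT[i, i - 1] = 0.6          # lose a bet
S = np.linalg.inv(np.eye(t) - PT)
print(S[2, 4].round(4), S[2, 1].round(4))   # s_{3,5} and s_{3,2}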

6.6 Problems

1. Let {Xn}∞n=0 be a stochastic process.

(a) If for each fixed n, Xn has density function

f(x) = 1 when x ∈ [n, n+ 1],

write down the state space of this process.

(b) If for each fixed n, P (Xn = n) = P (Xn = −1) = 0.5, write down

the state space of this process.

(c) Which of the above state spaces, the one in (a) or in (b), is count-

able?

2. Suppose that whether or not it rains today depends on previous weather

conditions through the last three days. Show how this system may

be analyzed by using a Markov chain. How many states are needed?

Define the stochastic process, list the state space.

Suppose also that if it has rained for the past three days, then it will

rain today with probability 0.8; if it did not rain for any of the past


three days, then it will rain today with probability 0.2; and in any other

case the weather today will, with probability 0.6, be the same as the

weather yesterday. Determine the transition matrix.

3. Let the transition probability matrix of a two-state Markov chain be

given by

P =
[ p      1 − p ]
[ 1 − p  p     ] .

Show by mathematical induction that

P^{(n)} =
[ 0.5 + 0.5(2p − 1)^n   0.5 − 0.5(2p − 1)^n ]
[ 0.5 − 0.5(2p − 1)^n   0.5 + 0.5(2p − 1)^n ] .

4. Let the one step transition matrix of an MC be

P =
[ 1 − a  a     ]
[ b      1 − b ] ,   0 < a, b < 1.

Show that the n-step transition matrix

P^n = (1/(a + b)) ×
[ b  a ]
[ b  a ]
+ ((1 − a − b)^n/(a + b)) ×
[ a   −a ]
[ −b  b  ] .

Use matrix multiplication directly to obtain P 3 when a = b = 0.25.

Verify the result by using the formula you just obtained.

5. Specify the classes of the following Markov chains and determine whether

they are transient or recurrent, whether they are periodic or aperiodic.

For recurrent states, find their mean recurrence time.

P1 =

0 0.5 0.5

0.5 0 0.5

0.5 0.5 0

P2 =

0 0 0 1

0 0 0 1

.5 .5 0 0

0 0 1 0

P3 =

1/2 0 1/2 0 0

1/4 1/2 1/4 0 0

1/2 0 1/2 0 0

0 0 0 1/2 1/2

0 0 0 1/2 1/2

P4 =

1/4 3/4 0 0 0

1/2 1/2 0 0 0

0 0 1 0 0

0 0 1/3 2/3 0

1 0 0 0 0


P5 =

0 0 1 0

1 0 0 0

0.5 0.5 0 0

1/3 1/3 1/3 0

P6 =

0 1 0 0

0 0 0 1

0 1 0 0

1/3 0 2/3 0

.

P7 =

1/3 2/3 0 0 0 0

2/3 1/3 0 0 0 0

0 0 1/4 3/4 0 0

0 0 1/5 4/5 0 0

1/4 0 1/4 0 1/4 1/4

1/6 1/6 1/6 1/6 1/6 1/6

P8 =

1 0 0 0 0 0

0 3/4 1/4 0 0 0

0 1/8 7/8 0 0 0

1/4 1/4 0 1/8 3/8 0

1/3 0 1/6 1/6 1/3 0

0 0 0 0 0 1

6. Let {Xn}∞n=0 be a Markov Chain with transition probability matrix

P =
[ 2/3  1/3  0    0   ]
[ 3/4  1/4  0    0   ]
[ 1/3  0    1/3  1/3 ]
[ 0    0    0    1   ] .

1) Classify the state space into classes. Assume the state space is {0, 1, 2, 3}.

2) Which of them are recurrent, and which are transient?

3) Find the period of state 2.

4) Find the expected inter-occurrence times for all recurrent states.

(The answers to some states should be obvious; Limiting probabilities

are useful if you know how to get them).

7. Prove that if the number of states in a Markov chain is M , and if state

j can be reached from state i, it can be reached in M steps or less.

8. A transition matrix P is said to be doubly stochastic if the sum over

each column equals one; that is,

∑_i pij = 1, for all j.

If such a chain is irreducible and aperiodic and consists of M+1 states,

0, 1, . . .M , show that the limiting probabilities are given by


πj = 1/(M + 1),   j = 0, 1, . . . ,M.

9. Let {Xn}∞n=0 be a Markov Chain with transition probability matrix

P =
[ 1/3  2/3  0    0   ]
[ 1/2  1/2  0    0   ]
[ 1/4  0    1/4  1/2 ]
[ 0    0    0    1   ] .

1) Classify the state space into classes.

2) Which of them are recurrent, or transient?

3) Find the period of state 2. (assume the state space is { 0, 1, 2, 3}).

4) Find the expected inter-occurrence times for all recurrent states. (The answers for some states should be obvious; limiting probabilities are useful.)

10. Consider the transition matrix

P =
[ 1/4  3/4  0  0    0 ]
[ 3/4  1/4  0  0    0 ]
[ 1/2  1/2  0  0    0 ]
[ 0    0    0  1    0 ]
[ 0    1/3  0  2/3  0 ] .

(a) Show that S consists of 2 closed classes and 2 open classes. What

are these classes?

(b) Determine the period of each of the closed classes.

Note that it is impossible to return to either of the transient states 2

and 4 in this chain. In this case, we set the period of the state to be

infinity, to indicate that the chain cannot return to this state.

(c) Find the unique steady state corresponding to each of the closed

classes.

(d) Write down the general form of all steady states for P .


(e) If X0 = 2, what is the probability of absorption into the class {0, 1}? If X0 = 4, what is the probability of absorption into the class {0, 1}?

11. Consider the transition matrix

P =
[ 1/5  1/5  1/5  0    1/5  1/5 ]
[ 0    1/3  0    2/3  0    0   ]
[ 0    0    1/2  0    1/2  0   ]
[ 0    3/5  0    2/5  0    0   ]
[ 0    0    1/2  0    1/2  0   ]
[ 1/4  1/4  0    0    1/2  0   ] .

(a) Show that S consists of two closed classes and one open class.

(b) Find the period of each of the three classes.

(c) Find the unique steady state corresponding to each closed class,

and write down the general form of all steady states for P .

(d) Find the probability of absorption into {1, 3} from state 0 and

the probability of absorption into {1, 3} from state 5. What can you

say about the probabilities of absorption in {2, 4} from states 0 and 5

respectively?

12. Consider the transition matrix

P =
[ 0  1/3  2/3  0    0   ]
[ 0  0    0    1/4  3/4 ]
[ 0  0    0    1/4  3/4 ]
[ 1  0    0    0    0   ]
[ 1  0    0    0    0   ] .

(a) Check that P is irreducible and find the period of P .

(b) Solve for the unique steady state of P .

(c) Use the periodic form of the Main Convergence Theorem to find

the mean recurrence time of each of the states.


13. Consider a chain with states 0, 1, 2, . . ., a with

P0,1 = 1,   Pa,a−1 = 1

and

Pij = i²/a²          if j = i − 1,
Pij = (a − i)²/a²    if j = i + 1,
Pij = 2i(a − i)/a²   if j = i,    (i ≠ 0, a).

Show that the chain is ergodic and obtain the stationary distribution.

14. One form of a random walk with two reflecting barriers has transition

matrix given by

P00 = 1− p, P01 = p;

Pj,j−1 = q, Pj,j = r, Pj,j+1 = p when 0 < j < a;

Pa,a−1 = q, Pa,a = 1− q.where p+q+r = 1 and all non-zero. Show that the chain is irreducible

and aperiodic. Determine the stationary distribution for this chain.

15. Let {Zn}∞n=0 be a branching process with the family size distribution

given by P (X = 0) = 1/3, P (X = 2) = 2/3.

1) State the definition of the Markov chain.

2) Verify that {Zn}∞n=0 is a Markov chain. Calculate the transition

probabilities pij. (Think about situations such as i = 0; j = 0; j is odd

etc).

3) Classify the state space. Indicate whether they are recurrent or

transient. Give a one line explanation.

4) Can you find a stationary distribution?

16. Each morning an individual leaves his house and goes for a run. He is

equally likely to leave either from his front or back door. Upon leaving

the house, he chooses a pair of running shoes (or goes running barefoot

if there are no shoes at the door from which he departed). On his return

he is equally likely to enter, and leave his running shoes, either by the

front or back door. If he owns a total of k pairs of running shoes, what

proportion of the time does he run barefooted?


17. The proof copy of a book is read by an infinite sequence of editors

checking for mistakes. Each mistake is detected with probability p

at each reading; between readings the printer corrects the detected

mistakes but introduces a random number of new errors (errors may

be introduced even if no mistakes were detected). Assume as much

independence as usual and that the numbers of new errors after dif-

ferent readings are independent and have Poisson distribution. Find

the stationary distribution of the number Xn of errors after the nth

editor-printer cycle.

18. For a series of dependent trials the probability of success on any trial

is (k + 1)/(k + 2) where k is equal to the number of successes on the

previous two trials. Compute limn→∞ P (success on the nth trial).

19. It is known that for a Markov chain, the limiting probabilities exist if it is irreducible and ergodic. Find a simple example of a Markov chain that does not satisfy all the conditions but for which the limiting probabilities still exist.

20. Suppose there are 5 white and 5 black balls in an urn. On each day, a ball is selected randomly and replaced by a ball of the other color. Let Xn = 0 if a white ball is selected and Xn = 1 otherwise. Also, let Yn be the number of white balls in the urn after the nth selection. Are {Xn} and {Yn} Markov chains? If not, explain why. If yes, list the state space and obtain the transition matrix.

21. Suppose 4 balls are placed into two urns A and B. On each day, one ball is selected, each of the four balls being equally likely to be selected, and the ball is then placed into the other urn.

Let Xn be the number of balls in urn A on the nth day. Let Yn be the

number of balls in urn A on the 2nth day.

a) Are {Xn}∞n=0 and {Yn}∞n=0 Markov chains? If any of them are, write

down their state spaces and transition matrices and do the usual clas-

sification.

b) Given X0 = 1, find the probability function of X2.


c) In the long run, what proportion of the time is at least one urn empty?

d) Given X0 = k, calculate the probability that number of balls in

urn A reaches 0 before the number of balls in urn B reaches 0 for

k = 0, 1, 2, 3 and 4.

22. Suppose that coin 1 has probability 0.7 of coming up heads, and coin 2

has probability 0.6 of coming up heads. If the coin flipped today comes

up heads, then we select coin 1 to flip tomorrow, and if it comes up

tails, then we select coin 2 to flip tomorrow. If the coin initially (on

the 0th day) flipped is equally likely to be coin 1 or coin 2, then what

is the probability that the coin flipped on the third day after the initial

flip is coin 1?

23. A particle moves on a circle through points which have been marked

0, 1, 2, 3, 4 (in clockwise order). At each step it has a probability p of

moving to the right (clockwise) and 1−p to the left (counterclockwise).

Let Xn denote its location on the circle after the nth step. Show that

the process {Xn, n ≥ 0} is a Markov chain.

(a) Find the transition probability matrix.

(b) If we know X1 = 2, what is the probability of X3 = 4?

(c) If X0 is equally likely to be 0, 1, 2, 3, 4, what is the probability of

X3 = 4?

24. Consider a process {Xn, n = 1, 2, . . .} which takes on the values 0, 1,

or 2. Suppose

P{Xn+1 = j|Xn = i, Xn−1 = in−1, . . . , X0 = i0} =
P^I_{ij}     when n is even,
P^{II}_{ij}  when n is odd,

where ∑_{j=0}^{2} P^I_{ij} = ∑_{j=0}^{2} P^{II}_{ij} = 1, i = 0, 1, 2. Is {Xn, n ≥ 0} a Markov

chain? If not, then show how, by enlarging the state space, we may

transform it into a Markov chain.

25. Show that if state i is recurrent and state i does not communicate with

state j, then pij = 0. This implies that once a process enters a recurrent


class of states it can never leave that class. For this reason, a recurrent

class is often referred to as a closed class.


Chapter 7

Exponential Distribution and

the Poisson Process

Recall that we commented that real world processes are often too complex to be analyzed based on our current mathematical knowledge. We hence often

restrict ourselves to stochastic processes with simple and nice mathematical

properties. Hopefully, the results we developed are still applicable to the real

world approximately. If the model is too far off, we may then increase the

complexity of our model to see if the generalized model helps.

The discrete time Markov chain ignores the duration between two tran-

sitions. It is often still satisfactory when we use it to model the gambling

problem, an English text, or even music notes. When it is used to model a population size, the idea of a generation is obviously too rough. It is very important to recognize that some individuals may give birth at a younger age than

The waiting time for the next transition (when one gives birth, for ex-

ample) should clearly be regarded as the outcome of a random variable. It

turns out that a simple yet more realistic assumption on its distribution is

exponential. Unlike the normal distribution, the exponential distribution is

non-negative, its cumulative distribution function has a simple form, and it has the memoryless property.


7.1 Definition and Some Properties

The density function of an exponential distribution with intensity param-

eter λ is given by

f(x) = λ exp(−λx), x ≥ 0.

Its c.d.f. is given by

F (x) = 1 − exp(−λx),   x ≥ 0.

It is simple to find that the moment generating function is given by

φ(t) = λ/(λ − t)

which is defined for all t < λ.

If X has exponential distribution with parameter λ, then

E(X) = 1/λ,   Var(X) = 1/λ².

Please note that if we call µ = E(X), then µ = 1/λ. We also call λ the

rate of the exponential distribution. It is more convenient to parameterize

the distribution in the current way in this course. However, it might be more

convenient to use mean as a parameter in other courses such as STAT230.

Hence, when reading the phrase that X has exponential distribution with pa-

rameter equaling 2.4 (say), we must first clarify what this parameter stands

for. Two possibilities are 1) the mean equals 2.4, or 2) the intensity param-

eter or the rate equals 2.4.
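The same ambiguity appears in software. For instance, numpy's exponential sampler is parameterized by the mean (it calls it "scale"), so a rate of 2.4 must be passed as scale = 1/2.4. The short check below (ours) confirms the moment formulas:

import numpy as np

rate = 2.4
rng = np.random.default_rng(0)
x = rng.exponential(scale=1/rate, size=1_000_000)  # scale is the MEAN, 1/lambda
print(x.mean(), 1/rate)        # both about 0.4167
print(x.var(),  1/rate**2)     # both about 0.1736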

7.2 Properties of Exponential Distribution

The Exponential distribution has the well known memoryless property. If X

has exponential distribution, then for any s, t > 0,

P (X > s+ t|X > t) = P (X > s).

In fact, the exponential distribution family is the only class of distributions

with the above property.


The memoryless property has interesting implications. If how long we live had an exponential distribution, then no matter how old each of us is now, the expected remaining lifetime would be the same. An insurance company should then not ask older folks for a higher life insurance premium. For any continuous random variable, we may calculate the conditional probability

P (X ∈ (t, t + dt)|X > t) ≈ f(t)dt/(1 − F (t)) = r(t)dt.

We call

r(t) = f(t)/(1 − F (t))

the hazard rate. The hazard rate of the exponential distribution is a constant.

That is, it does not depend on t.

Our lifetime distribution has a non-constant hazard rate for obvious reasons. Hence, it does not make sense for insurance companies to use an exponential model. However, the hazard remains almost constant over some periods, say from age 22 to 40. An exponential model is still helpful in many ways.

Example 7.1

Let X1, X2, . . . , Xn be independent exponential random variables with respective rates λ1, . . . , λn, where λi ≠ λj when i ≠ j. Let N be a random variable independent of these random variables, such that

pj = P (N = j),   ∑_{j=1}^{n} pj = 1.

Then XN is said to be a hyper-exponential random variable. Its density function is a mixture:

f(t) = ∑_{j=1}^{n} pj λj exp{−λj t}.

It can be shown that the hazard rate r(t) converges to min(λj) as t → ∞.


Example 7.2

Let Xi, i = 1, 2, . . . , n be iid random variables with exponential distribution and rate λ. Then the density function of Sn = X1 + . . . + Xn is

fn(t) = [λ^n/(n − 1)!] t^{n−1} exp(−λt)

for t ≥ 0. This distribution is called the Gamma with n degrees of freedom and scale parameter 1/λ. ♦

Example 7.3

Assume X1 and X2 are two independent exponential random variables with

rates λ1 and λ2. Then

1. min(X1, X2) has exponential distribution with rate λ1 + λ2.

2. P (X1 < X2) = λ1/(λ1 + λ2).

3. Further, the event “X1 < X2” is independent of the event min(X1, X2) ≤ t, for any t.

♦

We can generalize the above result easily. Suppose Xi, i = 1, 2, . . . , n are independent exponential random variables with rates λi. Then

1. min{Xi : i = 1, . . . , n} has exponential distribution with rate ∑ λi.

2. The probability that X1 has the smallest value (among the Xi) is λ1/∑_i λi.

3. The event “Xi has the smallest value” is independent of the actual

value of the smallest random variable.
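These three facts are easy to confirm by simulation. The sketch below (ours; the rates 1 and 3 are arbitrary choices) checks, for two variables, that the minimum is exponential with rate λ1 + λ2, that P(X1 < X2) = λ1/(λ1 + λ2), and that the identity of the minimizer is independent of the size of the minimum:

import numpy as np

rng = np.random.default_rng(1)
lam1, lam2, n = 1.0, 3.0, 1_000_000
x1 = rng.exponential(1/lam1, n)
x2 = rng.exponential(1/lam2, n)
m = np.minimum(x1, x2)

print(m.mean(), 1/(lam1 + lam2))             # exponential with rate 4
print((x1 < x2).mean(), lam1/(lam1 + lam2))  # P(X1 < X2) = 0.25
for t in (0.1, 0.5, 2.0):
    sel = m <= t
    print(t, (x1[sel] < x2[sel]).mean())     # all about 0.25, independent of t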


7.3 The Poisson Process

We start with the simplest continuous time stochastic process. Suppose that, starting from a conceptual beginning at t = 0, we are able to determine the value of N(t) for each given t > 0, where N(t) represents the number of occurrences of some incident (event). We then say {N(t) : t ≥ 0} is a counting process.

Let {N(t) : t ≥ 0} be a stochastic process. To qualify as a counting

process, it must satisfy

(i) N(t) ≥ 0;

(ii) N(t) is integer valued;

(iii) If s < t, then N(s) ≤ N(t).

We say the counting process has independent increments if

N(t2)−N(t1), N(s2)−N(s1)

are independent whenever t2 ≥ t1 ≥ s2 ≥ s1. Assume N(t) represents

the number of phone calls by time t from the beginning of the day. The

independent increment property implies that the number of calls I receive in

the first hour is independent of the number of phone calls in the second hour.

If I receive more phone calls in the second hour, it is because it happens to be so, not because I received fewer calls than usual in the first hour.

We say that the counting process has stationary increments if the dis-

tribution of

N(t+ s)−N(t)

depends only on s, but not on t. If the number of students coming to my office hours could be modeled by a counting process with stationary increments, then I should receive a similar number of visits every week. This is obviously not true, since I do observe a significant increase in weeks when an assignment is due. Thus, this counting process does not have stationary increments.

The most important special case of the counting process is the Poisson

process. Here we put forward a regular, but slightly different definition from

Stat230.

Definition 7.1 Poisson Process Definition 1


The counting process {N(t), t ≥ 0} is said to be a Poisson process having

rate λ, λ > 0, if

(i) N(0) = 0;

(ii) the process has independent increments;

(iii) the number of events (incidents) in any interval of length t is Poisson distributed with mean λt. That is, for all s, t ≥ 0,

P [N(s + t) − N(s) = n] = [(λt)^n/n!] exp(−λt),   n = 0, 1, . . .

♦

Remark: Note that the “number of events” is a random variable. Hence, its random behavior is specified by a cumulative distribution function. This definition requires this distribution to be Poisson in order for the process to be called a Poisson process.

This definition may not be very easy to use: in applications, it is hard to see why the third condition should be satisfied. That is why an equivalent definition is called for. To this aim, we define the notation o(h) (read it as “small o of h”).

Definition 7.2 Definition of o(h).

If a function f(h) satisfies

lim_{h→0} f(h)/h = 0,

we say f(h) = o(h). ♦

A simple non-trivial example is f(h) = 1 − cos(h) = o(h).

Here comes the second equivalent definition of Poisson process.

Definition 7.3 Poisson Process Definition 2

The counting process {N(t), t ≥ 0} is said to be a Poisson process having

rate λ, λ > 0, if

(i) N(0) = 0.

(ii) The process has stationary and independent increments.

(iii) P [N(h) = 1] = λh+ o(h).

(iv) P [N(h) ≥ 2] = o(h). ♦


Theorem 7.1

The two definitions of the Poisson process are equivalent.

Proof: We will only show that conditions (iii) and (iv) in Definition 2 imply condition (iii) in Definition 1.

We first work on P (N(t) = 0). Define P0(t) = P (N(t) = 0).

Let h > 0 be a small number. Then

P0(t + h) = P (N(t) = 0, N(t + h) − N(t) = 0)

= P (N(t) = 0)P (N(t + h) − N(t) = 0)   (independent increments)

= P0(t)P (N(h) = 0)   (stationary increments)

= P0(t){1 − P (N(h) = 1) − P (N(h) ≥ 2)}

= P0(t){1 − λh + o(h)}.

Consequently, we have

[P0(t + h) − P0(t)]/h = −λP0(t) + o(h)/h.

Let h → 0. The left-hand side converges to the derivative of P0(t), and the right-hand side to −λP0(t). That is,

P′0(t) = −λP0(t).

The solution of this differential equation is given by

P0(t) = exp(−λt)

in view of the boundary condition P0(0) = 1.

Next, we build on top of this result. We use mathematical induction for

other cases. Define

Pn(t) = P (N(t) = n)

and assume

Pk(t) = [(λt)^k/k!] exp{−λt}

for k = 0, 1, . . . , n − 1. We have shown that this assumption is true when

n = 1.


Based on the above induction assumption, we try to show the expression

is true when k = n.

We have

Pn(t+ h) = P (N(t+ h) = n)

= P (N(t) = n,N(t+ h)−N(t) = 0)

+P (N(t) = n− 1, N(t+ h)−N(t) = 1)

+P (N(t+ h) = n,N(t+ h)−N(t) ≥ 2)

= Pn(t){1− λh+ o(h)}+ (λh)Pn−1(t) + o(h).

Note that we have used the fact o(h) + o(h) = o(h). Thus, we get

[Pn(t + h) − Pn(t)]/h = −λPn(t) + λPn−1(t) + o(h)/h.

Letting h → 0, we get

P′n(t) = −λPn(t) + λPn−1(t)

= −λPn(t) + λ[(λt)^{n−1}/(n − 1)!] exp(−λt).

It is a differential equation which can be solved by standard methods. We

however choose to point out that the analytical form suggested for Pn(t)

solves the above equation. ♦

7.3.1 Inter-arrival and Waiting Time Distributions

Let T1 be the waiting time for the first event in a Poisson process with rate

λ. It is obvious that for all t ≥ 0,

P (T1 > t) = P (N(t) = 0) = exp(−λt).

Hence, T1 has exponential distribution with rate λ.

Now, let T2 be the waiting time for the second event after the first event

has occurred. We call it “inter-arrival time”. What is the distribution of T2?

Note that

P (T2 > t|T1 = s) = P (N(s+ t)−N(s) = 0) = exp(−λt).


Hence, T2 has exponential distribution with rate λ too.

One caution is that P (T1 = s) = 0, so the above argument is not fully mathematically satisfactory. We, however, do not have the tools to completely avoid this problem.

Theorem 7.2

The inter-arrival times Tn, n = 1, 2, . . . are independent and identically dis-

tributed with common exponential distribution having rate λ (or mean 1/λ).

♦

Let Sn = ∑_{i=1}^{n} Ti. Then

P (Sn ≤ t) = P (N(t) ≥ n).

The density function of Sn is

f(t) = [λ(λt)^{n−1}/(n − 1)!] exp(−λt).

The corresponding distribution is called Gamma distribution with n de-

grees of freedom and scale parameter 1/λ.
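Both claims are convenient to verify by simulation: build Sn as a sum of exponential inter-arrival times and compare P(Sn ≤ t) with the Poisson tail P(N(t) ≥ n). A sketch of ours, with arbitrary λ, n and t:

import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 2.0, 5, 200_000

T = rng.exponential(1/lam, size=(reps, n))   # iid inter-arrival times
Sn = T.sum(axis=1)
print(Sn.mean(), n/lam)                      # Gamma mean n/lam = 2.5
print(Sn.var(),  n/lam**2)                   # Gamma variance n/lam^2 = 1.25

t = 3.0
N = rng.poisson(lam * t, size=reps)          # N(t) ~ Poisson(lam t)
print((Sn <= t).mean(), (N >= n).mean())     # nearly equal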

7.4 Further Properties

Suppose that the events in a Poisson process can be classified into two types:

I and II. Further, this classification is random, and it is independent of the

process itself. For example, suppose we can model the number of customers entering a store as a Poisson process. We classify customers into two types: class one consists of customers who will buy something; class two consists of customers who will just have a look. If we further assume that their purchas-

ing decisions are made independently, then we are in a situation where the

following model will apply.

Let N(t) be the original process. Let N1(t) be the number of first type

events occurred in [0, t]. Similarly define N2(t).

Theorem 7.3


Under the assumption that each event in a Poisson process can be independently classified as type I or type II, the two sub-counting processes are both Poisson processes, with rates λp and λ(1 − p), where p is the probability of an event being type I.

Proof: In this situation, it is more convenient to use the first definition of

the Poisson process.

We calculate P (N1(t) = n, N2(t) = m) for each pair of non-negative integers. This will give us the joint distribution of N1(t) and N2(t). Whether N1(t) and N2(t) are independent, and whether they both have Poisson distributions, are then questions easily answered. The other conditions in the definitions are obvious.

Here is our calculation:

P (N1(t) = n,N2(t) = m) = P (N1(t) = n,N1(t) +N2(t) = n+m)

= P (N1(t) = n,N(t) = n+m)

= P (N1(t) = n|N(t) = n+m)P (N(t) = n+m)

= C(n + m, n) p^n (1 − p)^m × [(λt)^{n+m}/(n + m)!] exp(−λt)

= [(pλt)^n/n!] exp(−pλt) × [{(1 − p)λt}^m/m!] exp{−(1 − p)λt}.

Obviously, N1(t) and N2(t) are independent and both have Poisson distribu-

tion. ♦
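Since, given N(t), the type-I count is binomial, the thinning theorem can be probed with two lines of sampling. The sketch below (ours; λ, p and t are arbitrary choices) shows that the two sub-counts have the advertised Poisson means and are empirically uncorrelated:

import numpy as np

rng = np.random.default_rng(3)
lam, p, t, reps = 5.0, 0.3, 2.0, 200_000

N = rng.poisson(lam * t, size=reps)     # total events in [0, t]
N1 = rng.binomial(N, p)                 # each event is type I with probability p
N2 = N - N1

print(N1.mean(), p * lam * t)           # Poisson(p lam t): mean 3.0
print(N2.mean(), (1 - p) * lam * t)     # Poisson((1-p) lam t): mean 7.0
print(np.cov(N1, N2)[0, 1])             # near 0, consistent with independence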

7.5 Conditional Distribution of the Arrival

Times

Suppose that in a Poisson process, it is known that N(t) = 1. We want to know when this event occurred during the period [0, t].

As before, let T1 be the time when the first event occurred. For any s ≤ t, we have

P (T1 ≤ s|N(t) = 1) = P (T1 ≤ s, N(t) = 1)/P (N(t) = 1)


= P (N(s) = 1, N(t) − N(s) = 0)/P (N(t) = 1)

= s/t.

That is, the first event is equally likely to have occurred at any moment in

[0, t]. This is more evidence of uniformity.

Let Si, i = 1, 2, . . . be the time when the ith event occurred. Given

N(t) = n, what is the conditional joint distribution of Si, for i = 1, 2, . . . , n?

For this purpose, let si, i = 1, 2, . . . , n be an increasing sequence of positive numbers such that sn < t and no two of them are equal.

the probability of the event

“Si ∈ (si, si + dsi) for all i = 1, 2, . . . , n′′.

The dsi are just some imaginary small numbers. Roughly, we may believe that

P (Si ∈ (si, si + dsi), i = 1, 2, . . . , n|N(t) = n)

= P (Si ∈ (si, si + dsi), i = 1, 2, . . . , n, N(t) = n)/P (N(t) = n)

= [P (N(si + dsi) − N(si) = 1, N(si) − N(si−1 + dsi−1) = 0, i = 1, 2, . . . , n, N(t) − N(sn + dsn) = 0)]/[P (N(t) = n)]

≈ (n!/t^n) ds1 ds2 · · · dsn.

That is, the joint density function of the Si is given by

f(s1, . . . , sn) = n!/t^n

for all 0 ≤ s1 ≤ s2 ≤ · · · ≤ sn ≤ t. Note that adding the equalities does not change the density function.

The question is then, what story does this density tell us? Suppose

Y1, Y2, . . ., Yn are n independent and identically distributed uniform ran-

dom variables in [0, t]. Arranging them in increasing order and denoting the

resulting sequence as Y(i), i = 1, 2, . . . , n, we get order statistics. It turns out


that the ordered independent uniform random variables on [0, t] have the joint density function

f(s1, . . . , sn) = n!/t^n.

The moral is: the joint occurrence of n events in [0, t] is again uniform.
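A simulation makes the moral vivid: conditional on N(t) = n, the arrival epochs behave like sorted uniforms. The sketch below (ours; λ, t and n are arbitrary choices) compares the average arrival epochs with the means kt/(n + 1) of the uniform order statistics:

import numpy as np

rng = np.random.default_rng(4)
lam, t, n = 0.5, 10.0, 4

arrivals = []
for _ in range(100_000):
    s = np.cumsum(rng.exponential(1/lam, 20))   # more than enough arrivals
    pts = s[s <= t]
    if len(pts) == n:                           # condition on N(t) = 4
        arrivals.append(pts)

print(np.mean(arrivals, axis=0))                   # close to (2, 4, 6, 8)
print([k * t / (n + 1) for k in range(1, n + 1)])  # uniform order-statistic means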

7.6 Problems

1. For a Poisson process show, for s < t, that

P{N(s) = k|N(t) = n} = C(n, k)(s/t)^k (1 − s/t)^{n−k},   k = 0, 1, . . . , n.

2. The number of bankruptcies in normal years Canada-wide follows a

Poisson process with intensity parameter λ = 10000 per month. Each bankruptcy has probability 0.98 of being a personal bankruptcy; the others are business bankruptcies. Assume the independence of the two

types of bankruptcies. Assume also that all 12 months are of equal

length.

(a) If in a particular month, 8500 bankruptcies have been observed,

what is the probability that no more than 10 of these are business

bankruptcies (expression only)?

(b) If in another particular month, 1000 business bankruptcies were

observed, what is the expected total number of bankruptcies in that

month?

(c) Assume the debt for each personal bankruptcy has normal dis-

tribution with mean µ = $100, 000 and standard deviation $40, 000,

independent of each other. Let X be the total debt of the personal

bankruptcies in a month. Calculate the mean and variance of X.

3. The number of meteorites that hit the Earth follows a Poisson process

with intensity parameter λ = 200 per month. Each meteorite has

probability p of reaching the ground, otherwise it burns up in the air.

Assume the usual independence when necessary.


(a) If 200 meteorites have hit the Earth in a particular month, what is

the expected number of them that reached the ground?

(b) If in another particular month, 20 meteorites were found to have

reached ground, what is the expected number of meteorites (including

those burnt up in the air) to have hit the Earth in that month?

(c) Assume the mass of meteorites have an exponential distribution

with mean µ = 1000 kg, independent of each other. Let X be the total

mass of meteorites that hit the Earth in a year. Calculate the mean

and variance of X. Assume a year equals exactly 12 months.

4. Cars pass a point on the highway at a Poisson rate of one per minute.

If five percent of the cars on the road are Dodges, then

(a) what is the probability that at least one Dodge passes by during an hour?

(b) given that ten Dodges have passed by in an hour, what is the

expected number of cars to have passed by in that time?

(c) if 50 cars have passed by in an hour, what is the probability that

five of them were Dodges?

5. Let {N(t), t ≥ 0} be a Poisson process with rate λ. Let Sn denote the

time of the nth event. Find

(a) E(S4),

(b) E[S4|N(1) = 2],

(c) E[N(4)−N(2)|N(1) = 3].

6. Two individuals, A and B, both require kidney transplants. If she does

not receive a new kidney, then A will die after an exponential time with

rate µA, and B after an exponential time with rate µB. New kidneys

arrive in accordance with a Poisson process having rate λ. It has been

decided that the first kidney will go to A (or to B if B is alive and A

is not at that time) and the next one to B (if still living).

(a) What is the probability A obtains a new kidney?

(b) What is the probability B obtains a new kidney?


7. Telephone calls arrive at a switchboard in a Poisson process at the rate

of 2 per minute. A random one-tenth of the calls are long distance.

(a) What is the probability that no call arrives between 9:00-9:05am?

(b) What is the probability that at least 2 calls arrive between 10:00-

10:02am?

(c) What is the probability of at least one long distance call in a ten

minute period?

(d) Given that there have been 8 long distance calls in an hour, what

is the expected number of calls to have arrived in the same period?

(e) Given that there were 90 calls in an hour, what is the probability

that 10 were long distance?

8. Three customers A, B and C enter a bank. A and B to deposit money

and C to buy a money order. Suppose that the time it takes to deposit

money is exponentially distributed with mean 2 minutes, and that the

time it takes to buy a money order is exponentially distributed with

mean 4 minutes. If all three customers are served immediately, what is

the probability that C is finished first? That A is finished last?


Chapter 8

Continuous Time Markov

Chain

One of the shortcomings of the discrete time Markov chain is that it can only be used to model situations in which transitions occur only at discrete times. This is not a problem when modeling the outcome of gambling, an English text or a music piece. It might also be an ideal mathematical model for DNA sequences. However, it is a bit of a stretch for modeling the sizes of some animal populations.

The continuous time Markov chain represents one of the directions in which the discrete time Markov chain is generalized. Except that the inter-arrival time between two transitions is now a continuous random variable, we retain the other requirements of the corresponding stochastic process.

Let {X(t), t ≥ 0} be a stochastic process. It is a continuous time Markov

chain if it has the following two properties:

(i) It has countable state space;

(ii) It has Markov property:

P (X(t+ s) = j|X(s) = i,X(u) = x(u), 0 ≤ u < s)

= P (X(t + s) = j|X(s) = i).

The concept remains the same. Given the present (X(s) = i), the future

outcome X(s+ t) = j is independent of the past (X(u) = x(u), 0 ≤ u < s).


When P (X(t+s) = j|X(s) = i) = P (X(t) = j|X(0) = i), this Markov chain

is homogeneous in time.

Naturally, we call

pij(t) = P (X(t) = j|X(0) = i)

the transition probability of the Markov chain from state i to state j in a

period of time t. We use notation P (t) for the matrix formed by these

transition probabilities.

Recall that the state space is countable; hence we denote it as {0, 1, 2, . . .}. Suppose X(0) = 3. As time t goes on, X(t) may move out of state 3. Let T3 be the time it takes for this transition to occur. Due to the Markov property, which can also be interpreted as the memoryless property, T3 must have an exponential distribution with some rate. Let us call this rate ν3. At the moment

t = T3 + δ, where δ is an imaginary tiny quantity, X(t) could equal 0, 1,

2, 4, 5, . . .. What is the probability for X(t) = 4 given X(T3)? Since it

is occurring in a very short period of time, we use p34 for this probability

and call it instantaneous transition probability. Since this probability is

computed under the condition that a transition has occurred, we must have

p33 = 0.

In general, we have

pii = 0, ∑j pij = 1

for all i = 0, 1, 2, . . .. We use notation P for the instantaneous transition

probability matrix.

Be aware that this P is different from the P defined for the discrete time Markov chain.

Example 8.1

Consider two machines that are maintained by a single repair-person. Ma-

chine i functions for an exponential time with rate λ before breaking down.

The repair time for either machine is exponential with rate µ.

Let X(t) = the number of machines functioning at time t. Then, we get

a continuous time stochastic process {X(t), t ≥ 0}. The state space of this process is obviously {0, 1, 2}, hence countable.


The Markov property can be verified because the waiting time for a transition is exponential regardless of which state the Markov chain is in at the moment. For example, if X(0) = 0, the waiting time for a transition to occur is the same as the waiting time for the repair-person to get one machine repaired.

This waiting time is exponential with rate µ.

If X(0) = 1, a transition occurs either when the broken machine is repaired or when the functioning machine breaks, whichever occurs first. Assuming the independence of the two waiting times, the shorter of the two has an exponential distribution with rate λ + µ.

If X(0) = 2, a transition occurs when one of the machines breaks down.

Again, under independence assumption, the waiting time for the first break

down is exponential with rate 2λ.

Note also that not only is the waiting time for a transition independent of the past (given the present), the transition probability is also independent of the past.

When X(0) = 0, the only possible transition is from 0 to 1. Hence,

p01 = 1.

When X(0) = 1, it moves to 0 if the functioning machine breaks down before the broken machine is repaired (the chance that both occur at exactly the same time is nil). This occurs with probability λ/(λ + µ),

and this event is independent of the occurrence time. (Review our discussion

on exponential distributions). Hence, p10 = λ/(λ+ µ), and p12 = µ/(λ+ µ).

When X(0) = 2, the only possible transition is from 2 to 1. Hence,

p21 = 1.

The above discussion fully verifies the Markov property, and we find

P =
[ 0          1    0         ]
[ λ/(λ+µ)    0    µ/(λ+µ)   ]
[ 0          1    0         ]

The rates for T0, T1, T2 are ν0 = µ, ν1 = λ + µ and ν2 = 2λ. ♦

In the future, we will tie P and the νi together so they are simpler to memorize.


Note that to verify a stochastic process as a continuous time Markov chain, we go through the following three steps.

Step 0: Define the process {X(t) : t ≥ 0} if it is not given;

Step 1: Identify the state space and check its countability.

Step 2: (i) Verify that the distribution of the waiting time for a transition is exponential for every state i. (ii) Verify that the instantaneous transition probabilities pij do not depend on the history of the stochastic process given the present state i.

Finally, we normally present the transition matrix P and the rates of transition νi at the concluding step.

8.1 Birth and Death Process

The example we just gave is a special case of the birth and death process, while the latter is a special continuous time Markov chain.

Suppose we are investigating a specific biological species. Somehow we

have a starting point t = 0, and define

X(t) = Population size at time t.

We now have a continuous time stochastic process with countable state space.

To qualify as a Markov chain, we make several assumptions on its random

behavior:

(i) When X(t) = n, the waiting time for the next birth to occur has

exponential distribution with rate λn for n ≥ 0.

(ii) When X(t) = n, the waiting time for the next death to occur has ex-

ponential distribution with rate µn for n ≥ 1. (µ0 = 0). Also, the occurrences

of the birth and the death are independent of each other.

We call such a stochastic process a "birth and death" process. It is seen that

(a) State space: S = {0, 1, 2, . . .};

(b) (i) The waiting time for the next transition to occur has an exponential distribution with rate λn + µn for n = 1, 2, . . .. (ii) The instantaneous transition probabilities are:


p01 = 1 (no twins);

pi,i+1 = λi/(λi + µi),  pi,i−1 = µi/(λi + µi)  (i ≥ 1).

Unless we are asked to model the birth and death process as a continuous

time Markov chain, we need only specify the birth and death rates in order

to have the birth and death process defined.

Let us consider a special birth and death process.

Example 8.2

A birth and death process is said to be a pure birth process if µn = 0 for all n. It further has a linear birth rate if λn = nλ. ♦

Example 8.3 (A linear growth model)

Consider a birth and death process with linear birth and death rates. That

is, we have a birth and death process {X(t) : t ≥ 0} with λn = nλ and

µn = nµ for n = 0, 1, . . ..

Assume X(0) = 1. What would be the value of E[X(t)|X(0) = 1]?

Define M(t) = E[X(t)|X(0) = 1]. Let T1 be the time when the first event

occurs, whether it is a death or a birth. Then

M(t) = E{E[X(t)|T1, X(0) = 1]}.

Depending on the value of T1, the conditional expectation has different

outcomes. (That is why we say that the conditional expectation is a function

of T1).

If T1 < t and the event is a death, we have E[X(t)|T1, X(0) = 1] = 0.

If T1 = s < t and the event is a birth, we have E[X(t)|T1, X(0) = 1] = E[X(t)|X(s) = 2] = 2M(t − s).

If T1 > t, E[X(t)|T1, X(0) = 1] = 1.

Combining these cases together, we have

M(t) = P(T1 > t) + ∫_0^t [λ/(λ + µ)] [2M(t − s)] (λ + µ) exp{−(λ + µ)s} ds
     = exp{−(λ + µ)t} + 2λ ∫_0^t M(t − s) exp{−(λ + µ)s} ds.


Taking the derivative with respect to t and simplifying, we have

M′(t) = (λ − µ)M(t).

Therefore, we must have M(t) = exp{(λ − µ)t}. It is simple to see that if X(0) = i, then Mi(t) = i exp{(λ − µ)t}. This completes the example. ♦

Example 8.4

Let us consider a birth and death process {X(t), t ≥ 0} with birth and death rates given by λi and µi, with µ0 = 0.

Let Ti be the time it takes for the process, starting from state i, to enter state i + 1 for the first time.

Assume λi > 0 for all i. We have

E(T0) = 1/λ0.

What can we say about the expectation of Ti with i ≥ 1?

Let us define

Ii = 1 if the first transition after X(0) = i is a birth, and Ii = 0 if it is a death.

Then we have the following.

E[Ti|Ii = 1] = expected time until the first event occurs = 1/(λi + µi).

E[Ti|Ii = 0] = expected time until the first event occurs
             + expected time to go from i − 1 to i
             + expected time to go from i to i + 1
             = 1/(λi + µi) + E[Ti−1] + E[Ti].

Using the formula for conditional expectation, we get

E[Ti] = P(Ii = 1)E[Ti|Ii = 1] + P(Ii = 0)E[Ti|Ii = 0]
      = 1/(λi + µi) + [µi/(λi + µi)][E(Ti) + E(Ti−1)].


We hence arrive at a recursive relationship:

E(Ti) = 1/λi + (µi/λi)E(Ti−1)

for i = 1, 2, . . ..

In particular, if the birth and death rates are constant, we have

E(Ti) = (1/λ)[1 + µ/λ + (µ/λ)^2 + · · · + (µ/λ)^i] = [1 − (µ/λ)^{i+1}]/(λ − µ).
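A short computational sketch of this recursion (the helper function and the rate values are ours, for illustration):

```python
# Expected first-passage times E[T_i] from state i to i+1 via the recursion
# E[T_i] = 1/lam_i + (mu_i/lam_i) E[T_{i-1}], starting from E[T_0] = 1/lam_0.
def expected_passage_times(lam, mu):
    E = [1.0 / lam[0]]
    for i in range(1, len(lam)):
        E.append(1.0 / lam[i] + (mu[i] / lam[i]) * E[i - 1])
    return E

# Constant rates: compare with the closed form (1 - (mu/lam)^(i+1)) / (lam - mu).
lam_c, mu_c = 2.0, 1.0
E = expected_passage_times([lam_c] * 5, [0.0] + [mu_c] * 4)
print(E[4], (1 - (mu_c / lam_c) ** 5) / (lam_c - mu_c))   # both 0.96875
```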

8.2 Kolmogorov Differential Equations

Recall the Chapman–Kolmogorov equations for the discrete time Markov chain. They tell us that the n-step transition matrix is the product of n one-step transition matrices. This equation system remains true for the continuous time Markov chain, with some modifications.

Lemma 8.1

Suppose {X(t) : t ≥ 0} is a continuous time Markov chain. Let pij(t) = P[X(t) = j|X(0) = i] be its transition probability function. We have

pij(t + s) = ∑_{k=0}^∞ pik(t) pkj(s).

In matrix form, we have

P(t + s) = P(t)P(s) = P(s)P(t). ♦

The proof is the same as that for the discrete time Markov chain.

For the discrete time Markov chain, the shortest time unit is 1. There is no shortest time unit for the continuous time Markov chain. If P(0.01) is known, we can in principle work out P(0.01n) for every positive integer n: we need only multiply P(0.01) by itself n times, even though you might be bored to death by this task. The real challenge, however, is to compute, say, P(0.002) from that. Can we find an analytical form for P(t) based on the parameters νi and pij, the instantaneous transition probabilities? The answer is positive, in principle.

Lemma 8.2

Suppose {X(t) : t ≥ 0} is a continuous time Markov chain with exponen-

tial rates νi and instantaneous transition probabilities pij. Let pij(t) be its

transition probabilities for a time period of t. Then we have:

(a) lim_{h→0} [1 − pii(h)]/h = νi;

(b) lim_{h→0} pij(h)/h = νi pij for j ≠ i.

Proof: We have, by definition,

pii(h) = P[X(h) = i|X(0) = i]
       = P{no transitions in (0, h]} + P{2 or more transitions in (0, h]}
       = exp{−νih} + o(h).

Therefore, we have

[1 − pii(h)]/h = [1 − exp{−νih} + o(h)]/h,

whose limit is νi as h → 0. This proves (a).

Similarly,

pij(h) = P{X(h) = j|X(0) = i}
       = P{one transition from i to j in (0, h]} + P{2 or more transitions resulting in j during (0, h]}
       = pij P{one transition in (0, h]} + o(h)
       = νi pij h + o(h).


The result (b) is now obvious. ♦

Let V = diag(ν0, ν1, . . .) and define G = V(P − I), where I is the identity matrix. The above result can then be extended and summarized in a neat matrix form.

Theorem 8.1 Kolmogorov’s Backward Equations

For a continuous Markov chain, we have

P ′(t) = GP (t)

where P ′(t) is a component-wise derivative of P (t) with respect to t.

Proof It is simple to see that

P (t+ h)− P (t)

h=

[P (h)− I]P (t)

h.

The limit is obviously GP (t). ♦

Theorem 8.2 Kolmogorov’s forward Equations

For a continuous Markov chain, we have

P ′(t) = P (t)G

where P ′(t) is a component-wise derivative of P (t) with respect to t.

Proof It is simply to note

P (t+ h)− P (t)

h=P (t)[P (h)− I]

h.

The limit is obviously P (t)G. ♦Unfortunately, the proofs above are not truly rigorous. The problem is

the order of transition matrix, which could be ∞×∞. In the case when the

state space is infinity (but countable, of course), the matrix multiplication

involves summation of infinite terms. The above manipulation implies taking

derivatives term by term in the summation. This is not always valid of taking

derivatives of the summation. Therefore, the theorem on forward equation

must include some regularity conditions. While we do not specify them here,


we would like to mention that they are satisfied whenever the state space is finite. They also hold for birth and death processes.

Let us assume all the processes to be considered in this course

are regular.

The matrix G plays an important role in these two equations. It is called the infinitesimal generator. In the backward equations, P(t) multiplies G from the back (P′(t) = GP(t)), while in the forward equations it multiplies G from the front (P′(t) = P(t)G).

In principle, once G is known, we can solve the backward equation to find

the transition matrix P (t). In reality, this is not always feasible. We have a

few examples for which this can be done.

Example 8.5

A pure birth process with constant birth rate λ is, in other words, a Poisson process. In effect, we already used this differential equation to show that the number of events in a fixed period of time has a Poisson distribution. ♦

Example 8.6

Consider a lab with one machine. The waiting time until it breaks is ex-

ponential with rate λ, and when it is broken, the waiting time until it is

repaired is exponential with rate µ. Let us define

X(t) = 0 if the machine works at time t, and X(t) = 1 if the machine is under repair at time t.

It is obvious that {X(t) : t ≥ 0} is a birth and death process. Its infinitesimal generator is

G = [ −λ    λ ]
    [  µ   −µ ].

How do we get the transition probability matrix P (t)?

Solution: According to Kolmogorov’s backward equation, we know that

P ′(t) = GP (t)

and P (0) = I. Let us try to solve the equation.


Component-wise, we have

p′00(t) = λ[p10(t)− p00(t)],

p′10(t) = µ[p00(t)− p10(t)].

We can then find

µp′00(t) + λp′10(t) = 0.

This implies

µp00(t) + λp10(t) = C

where C is a constant. Checking the value at t = 0, we find C = µ. Hence λp10(t) = µ[1 − p00(t)]. Substituting back, we find

p′00(t) = µ− (λ+ µ)p00(t).

Solving this equation, we find

p00(t) = [λ/(λ + µ)] exp{−(λ + µ)t} + µ/(λ + µ).

We can similarly work out other components of P (t).

Using this result, we are able to answer questions such as: if the machine works at t = 0, what is the probability that it is working at t = 10? The answer is:

p00(10) = [λ/(λ + µ)] exp{−10(λ + µ)} + µ/(λ + µ).

Note that as t → ∞, the limit of p00(t) is µ/(λ + µ). Thus, the long-term proportion of time when the machine is working is µ/(λ + µ). The answer is very reasonable. ♦
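As a numerical cross-check (a sketch with illustrative parameter values, not part of the notes): the backward equation P′(t) = GP(t) with P(0) = I has the solution P(t) = exp(tG), so a matrix exponential should reproduce the analytic p00(t).

```python
import numpy as np
from scipy.linalg import expm

lam, mu, t = 1.0, 2.0, 10.0
G = np.array([[-lam,  lam],
              [  mu,  -mu]])
P_t = expm(t * G)   # solves P'(t) = G P(t), P(0) = I
p00 = lam / (lam + mu) * np.exp(-(lam + mu) * t) + mu / (lam + mu)
print(P_t[0, 0], p00)   # both ≈ mu/(lam + mu) = 2/3
```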


8.3 Limiting Probabilities

Similar to the discrete time Markov chain, when t → ∞, pij(t) often has a limit which does not depend on i. The conditions for the validity of this result are also similar. However, we are no longer bothered by periodicity.

Theorem 8.3

Suppose {X(t) : t ≥ 0} is a continuous time Markov chain with infinitesimal

generator G. Suppose

(a) all states of the Markov chain communicate with each other:

P{X(t) = j for some t > 0 | X(0) = i} > 0

for all i, j (irreducibility);

(b) letting Tij be the amount of time from X(0) = i until X(t) = j for the first time, we have E(Tij) < ∞.

Then lim_{t→∞} pij(t) = πj exists for all i, j, and the vector π satisfies

πG = 0, ∑j πj = 1. ♦

Remark

1. The limiting probability πi still has the interpretation of the long-run proportion of time the Markov chain stays in state i.

2. Assume that the Markov chain is irreducible, and a non-zero solution

to πG = 0 exists. Then the limiting probability exists and all the states

are positive recurrent. That is, we need not verify condition (b) before solving for the limiting probabilities.

3. With the notation G = V(P − I), the equation that π satisfies can be written as

πjνj = ∑k πkνk pkj.

We may regard πjνj as the rate of the Markov chain leaving state j, and ∑k πkνk pkj as the rate of the Markov chain entering state j. When


the time t goes to infinity, the Markov chain reaches equilibrium: the rates of entering and leaving a state are the same for every state. For this reason, when the Markov chain reaches this stage, it is said to be in equilibrium.

4. When the limiting probability π exists, the Markov chain is called er-

godic. The limiting probability vector is also a stationary probability

distribution, or equilibrium distribution.

5. The expected inter-occurrence time of state j is again given by µj = 1/πj.

We do not have to rely on the equation πG = 0 to find the π. See the

following example.

Example 8.7 Birth and death process

Consider a typical birth and death process with birth and death rates λn and

µn. We easily set up the following table:

State    Rate of leaving      Rate of entering
0        π0λ0                 µ1π1
1        π1(λ1 + µ1)          µ2π2 + π0λ0
2        π2(λ2 + µ2)          µ3π3 + π1λ1
3        π3(λ3 + µ3)          µ4π4 + π2λ2
· · ·
n        πn(λn + µn)          µn+1πn+1 + πn−1λn−1
· · ·

Since the birth and death process has to settle down to some states, the rates

of moving between states have to be balanced. This observation gives

State    Rate of up     Rate of down
0        π0λ0           µ1π1
1        π1λ1           µ2π2
2        π2λ2           µ3π3
3        π3λ3           µ4π4
· · ·
n        πnλn           µn+1πn+1
· · ·


In this case, we get

π1 = (λ0/µ1)π0;
π2 = (λ1/µ2)π1;
π3 = (λ2/µ3)π2;

and so on.

From the fact that ∑n πn = 1, we find

1 = π0 [1 + ∑_{n=1}^∞ ∏_{i=0}^{n−1} (λi/µi+1)].

A meaningful solution exists if and only if

∑_{n=1}^∞ ∏_{i=0}^{n−1} (λi/µi+1) < ∞.

This is the necessary and sufficient condition for the birth and death process

to reach equilibrium.

When this condition is satisfied, we find

πn = ∏_{i=0}^{n−1} (λi/µi+1) / [1 + ∑_{n=1}^∞ ∏_{i=0}^{n−1} (λi/µi+1)]. ♦

Remark

(i) When the birth rates are too high, the population will keep increasing. No equilibrium can be reached.

(ii) When λn = 0 for some n = N, the population size will be capped by N. It is easy to see that equilibrium is then always possible.
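For computation, the product formula is easy to implement once we truncate the state space at some finite n_max (the function below is our own sketch, not from the notes):

```python
# pi_n is proportional to prod_{i=0}^{n-1} lam(i) / mu(i+1); pass the birth
# and death rates as callables so state-dependent rates are allowed.
def bd_limiting_probs(lam, mu, n_max):
    weights = [1.0]
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * lam(n - 1) / mu(n))
    total = sum(weights)
    return [w / total for w in weights]

# Constant rates with lam < mu: the answer should be geometric, (1 - r) r^n.
pi = bd_limiting_probs(lambda n: 1.0, lambda n: 2.0, 50)
print(pi[0], pi[1])   # ≈ 0.5 and 0.25
```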

Example 8.8

A job shop has M machines and one repair person. Assume each machine works for an exponential time with rate λ, independently of the others, and the repair time is also exponential with rate µ, regardless of how many machines are working at the moment.

Define X(t) to be the number of machines not working at time t. Then

{X(t) : t ≥ 0} is a birth and death process:

State Space 0 1 2 · · · M

Birth Rates Mλ (M − 1)λ (M − 2)λ · · · 0

Death Rates 0 µ µ · · · µ

(a) What is the average number of machines not working in the long run?

We need to work out the limiting probabilities to answer this question. Using the argument of rates of movement, we note that the rate up from n to n + 1 is (M − n)λπn and the rate down from n + 1 to n is µπn+1.

Thus,

πn+1 = [(M − n)λ/µ] πn = (λ/µ)^{n+1} [(M − n)(M − n + 1) · · · M] π0.

From ∑i πi = 1, we find

π0 = [∑_{i=0}^M (M!/(M − i)!) (λ/µ)^i]^{−1}.

There is no closed-form solution. The average number of machines not working is

lim_{t→∞} E[X(t)] = ∑_{n=0}^M n πn.

(b) In the long run, the proportion of machines which are working is

1 − (∑_{n=0}^M n πn)/M.
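A numeric sketch of this example (M, λ and µ below are illustrative values of our own choosing):

```python
from math import factorial

# pi_n / pi_0 = (lam/mu)^n * M! / (M - n)!, from the recursion above.
def machine_repair_pi(M, lam, mu):
    weights = [factorial(M) // factorial(M - n) * (lam / mu) ** n
               for n in range(M + 1)]
    total = sum(weights)
    return [w / total for w in weights]

M, lam, mu = 5, 1.0, 4.0
pi = machine_repair_pi(M, lam, mu)
avg_down = sum(n * p for n, p in enumerate(pi))
print(avg_down, 1 - avg_down / M)   # average number down; working proportion
```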

Example 8.9


When the birth and death rates are all constant (they do not depend on the state), the solution for the limiting probability is very simple. The limiting probabilities are given by

πn = (λ/µ)^n (1 − λ/µ), n = 0, 1, 2, . . . ,

when λ < µ.

This model is also called an M/M/1 queue: a work station has a single server who works at a constant rate, and a steady stream of customers arrives for service. If the service rate is larger than the arrival rate, the system is stable. A customer will on average find λ/(µ − λ) customers already in the system upon arrival. ♦

8.4 Problems

1. Suppose that a one-celled organism can be in one of two states– either A

or B. An individual in state A will change to state B at an exponential

rate α; an individual in state B divides into two new individuals of

type A at an exponential rate β. Define an appropriate continuous-

time Markov chain for a population of such organisms and determine

the appropriate parameters for this model.

2. Potential customers arrive at a single-server station in accordance with

a Poisson process with rate λ. However, if the arrival finds n customers

already in the station, then he will enter the system with probability

αn. Assuming an exponential service rate µ, set this up as a birth and

death process and determine the birth and death rates.

3. Consider a birth and death process with birth rates λi = (i+1)λ, i ≥ 0,

and death rates µi = iµ, i ≥ 0.

(a) Determine the expected time to go from state 0 to state 2.

(b) Determine the expected time to go from state 2 to state 3.

(c) Determine the variances in parts (a) and (b).


4. There are two TA’s for a particular course who answer questions in a

tutorial center. The number of students who come to ask questions

can be modeled by a Poisson process with intensity λ = 15/hour. The

amount of time it takes to answer questions for a single student has an

exponential distribution with rate µ = 12/hour. Assume the center is

large enough for 4 students, including those who are asking questions; new arrivals will not enter when the room is full.

(a) Set up a birth and death process to model this process. This in-

cludes: define {X(t), t ≥ 0}; write down its state space and its birth

and death rates.

(b) Write down its infinitesimal generator G.

(c) Obtain the limiting probabilities of this process.

(d) What proportion of the time is the room full? Assume the center

has been at service for a very long time.

(e) What proportion of the time can at least one of the TA’s have a

rest?

5. A job shop consists of three machines and two repairmen. The amount

of time a machine works before breaking down is exponentially dis-

tributed with mean 10. If the amount of time it takes a single repair-

man to fix a machine is exponentially distributed with mean 8, then

(a) what is the average number of machines not in use?

(b) what proportion of the time are both repairmen busy?

6. Each individual in a biological population is assumed to give birth at

an exponential rate λ, and to die at an exponential rate µ. In addition,

there is an exponential rate of increase θ due to immigration. However,

neither immigration nor birth is allowed when the population size reaches N.

(a) Set this up as a birth and death model.

(b) If N = 3, θ = λ = 1, µ = 2, determine the proportion of time that

immigration is restricted.


7. Potential customers arrive at a full-service, one-pump gas station at a

Poisson rate of 20 cars per hour. However, customers will only enter

the station for gas if there are no more than two cars (including the

one currently being attended to) at the pump. Suppose the amount of

time required to service a car is exponentially distributed with a mean

of five minutes.

(a) What fraction of the attendant’s time will be spent servicing cars?

(b) What fraction of potential customers are lost?

8. A parking lot has N spaces. The incoming traffic is of Poisson type at a

rate of λ cars per hour whereas the occupancy times are exponentially

distributed with a mean of β hours.

(1) Find the appropriate differential equations for the probabilities,

Pn(t), of finding exactly n spaces occupied at time t.

(2) When N = 5, λ = 2 and β = 1, obtain the variance of the number

of spaces occupied if the process has been operating for a very long

time.

9. A small barbershop, operated by a single barber, has room for at most

two customers. Potential customers arrive at a Poisson rate of three

per hour, and the successive service times are independent exponential

random variables with mean 1/4 hour. What is

(a) the average number of customers in the shop?

(b) the proportion of potential customers that enter the shop?

(c) If the barber could work twice as fast, how much more business

would he do?

10. Consider two machines, both of which have an exponential lifetime with

mean 1/λ. There is a single repairman that can service machines at an

exponential rate µ. Set up the Kolmogorov backward equations; you

need not solve them. If you can solve this equation, what questions

will you be able to answer?


11. Consider two machines. Machine i operates for an exponential time

with rate λi and then fails; its repair time is exponential with rate

µi, i = 1, 2. The machines act independently of each other. Define

a four-state continuous-time Markov chain which jointly describes the

condition of the two machines. Use the assumed independence to com-

pute the transition probabilities for this chain and then verify that these

transition probabilities satisfy the forward and backward equations.

12. There are 6 copies of the movie Toy Story in a 24-hour video rental

shop. The demand for this movie is a Poisson process with intensity

parameter µ = 5/day. Once a tape is rented, the waiting time for its return has an exponential distribution with rate λ = 1/day. Define {X(t), t ≥ 0} to be the number of copies of Toy Story in the shop at time t.

(a) Model {X(t), t ≥ 0} as a birth and death process. Specify its state space and its birth and death rates.

(b) Write down its infinitesimal generator G.

(c) Obtain the limiting probabilities of this process.

(d) What proportion of the time is the shop out of copies of Toy Story? Assume the shop has been in service for a very long time.

(e) If the owner charges $2.5 per day for the rental of one copy, how much money does the owner make from Toy Story per day? (Assume the owner charges $1.25 for half a day, and so on, for simplicity.)

13. Let {X(t)} be a typical birth and death process with birth rates λn and

death rates µn, n = 0, 1, . . ., and µ0 = 0. (You are responsible for knowing any other assumptions made in a general birth and death process.)

(a) In this setup, let Bn be the waiting time until a birth when X(t) = n and Dn be the waiting time until a death when X(t) = n. What are the distributions of Bn and Dn and their related parameters?

(b) Let Tn = min{Bn, Dn}. Calculate P (Tn > t) for t > 0. What is

the distribution of Tn?

(c) Calculate P (Bn < Dn) for any non-negative integer n.


(d) Verify that {X(t)} is a continuous time Markov chain. Identify the corresponding parameters: the exponential rates νn and the conditional transition probabilities pij (given that X(t) is leaving i, the chance that it enters j).

14. A computer can handle N tasks simultaneously. The tasks are submit-

ted to the computer as a Poisson process with a rate of λ per second

and the amount of time it takes to complete a task is independent of

other tasks and has exponential distribution with a mean of β seconds.

Tasks submitted while the computer is at full load are lost without any warning.

(a) Set up a birth and death process to model this process. This includes: define {X(t), t ≥ 0}; write down its state space and its birth and death rates.

(b) Write down its infinitesimal generator G.

(c) Assume N = 3, λ = 4 and β = 1,

(i) obtain the limiting probabilities of this process.

(ii) obtain the mean number of tasks the computer handles at any

moment if the computer has been operating for a very long time.

(iii) what proportion of the jobs you submit will be lost in the long run?


Chapter 9

Queueing Theory

Queueing theory is closely related to the continuous time Markov chain. It has the following basic setup: there is a service station with several servers.

Customers come for service. They leave after being served. There are three

important factors that determine the properties of a queueing system.

The first factor is the random mechanism of the arrival of the customers.

Is the waiting time for the next customer a constant? Is it independent of

what happened already?

The second factor is the number of servers. How many customers can be

served simultaneously?

The third factor is the random mechanism of the service time. How long

does it take to serve a customer? Is it random?

The model becomes more complex if the number of servers changes ac-

cording to the length of the queue. Customers may also be divided into

several classes so that some of them receive priority service.

There are also several questions whose answers we might be interested in. On average, how long does a customer have to wait before being served? What proportion of the time is the server idle? Once we have a sufficient understanding of the queue, the system can be optimized.

9.1 Cost Equations

Let us define the following quantities:


L, the average number of customers in the system;

LQ, the average number of customers waiting in the queue;

W , the average amount of time a customer spends in the system;

WQ, the average amount of time a customer spends waiting in the queue.

These quantities are obviously not independent of each other. A lower average number of waiting customers implies a shorter waiting time. The fundamental constraint is that there is a balance between staying in the queue and being served in the system.

To make this balance relationship more explicit, we imagine that each customer pays to stay in the system, and hence the system makes money. For a balanced system,

the amount customers pay = the amount the system earns.

When the above equality is computed per unit time, it becomes

average rate the system earns = (average amount an entering customer pays) × (rate of entering customers).

Using this argument, when every customer pays one dollar per unit time,

we find

L = λaW

where λa is the rate of entering customers.

If customers pay only when they are waiting, not when they are being

served, then the relation becomes

LQ = λaWQ.

If customers pay for the service time only, we get

average number of customers in service = λaE[S],

where E(S) is the average amount of time a customer spends on service.

In order for these identities to hold, the system has to be able to reach an equilibrium. That is, at some time in the distant future, the rate of entering equals the rate of leaving.


9.2 Steady-State Probabilities


Let us now define a stochastic process for the queueing system. At any

given time t, we might be interested in several aspects of the queueing system.

We define

X(t) = number of customers in the system at time t.

Hence X(t) is the total number of customers including those being served at

the moment and those who are waiting. One quantity of interest is P{X(t) =

n} for each n. Namely, the probability (mass) function of X(t) for each

given t. Mathematically, this is often too hard to compute analytically.

Instead, consider

πn = lim_{t→∞} P{X(t) = n}

when it exists. Under certain conditions, computing this limit is simple. This quantity can be interpreted as the long-run proportion of time when there are exactly n customers in the system. It is also referred to as the steady-state probability of exactly n customers in the system. It is usually true that πn is the long-run proportion of time when the system contains n customers. If π3 = 0.2, then about 20% of the time the system contains 3 customers. On average (in the long run), there are ∑n nπn customers in the system.

Let Tm be the arrival time of the mth customer; then X(Tm−) is the number of customers in the system when the mth customer arrives. Define

an = lim_{m→∞} P(X(Tm−) = n).

That is, the system is sampled when a new customer arrives.

Let Sm be the departure time of the mth customer. Let

dn = lim_{m→∞} P(X(Sm+) = n).

So, we sample the system when a customer leaves.

All three limits πn, an and dn are long-run proportions of time when there are n customers in the system, under the specific sampling plan.


Example 9.1

Consider a queueing model in which all customers have their service time

equal to 1, and where the times between successive customers are always

greater than 1. In this case, the system is always empty when a new customer

arrives, and when a customer leaves. We hence find

a0 = d0 = 1.

However, π0 < 1 as long as there is a steady stream of customers arriving, so π0 ≠ a0. ♦

If you work in a service station with this property, your supervisor can always pick the right times so that you are found idle every time, even though you are very busy in between.

Example 9.2

If there are no multiple arrivals, and there is only one server, then an = dnfor all n.

If the system reaches a balance, the long term number of transitions of

X(t) from n to n+ 1 have to be the same as the number of transitions from

n + 1 to n. The former represents an and the later represents dn. So they

are equal.

The conditions on single arrival and single server make sure that transi-

tions such as from n to n+ 2 cannot happen. ♦

Example 9.3

If the customers arrive according to a Poisson process model, then

πn = an.

Due to possible sampling bias, the supervisor may not always know how busy you are on average. However, if he/she picks the next inspection time according to an exponential distribution, he/she will not be at risk of misjudging your average workload in the long run. ♦


9.3 Exponential Model

A special queueing model arises when (i) customers arrive according to the conditions of a Poisson process with rate λ; (ii) the service station has one server; (iii) the service time has an exponential distribution with rate µ. This type of queueing model is called an M/M/1 model. The letter M stands for the Markov property, that is, the memoryless property of the exponential distributions used to describe the arrivals and the service. The digit 1 stands for the number of servers. Obviously, if X(t) is the number of customers in the system at

time t, then {X(t) : t ≥ 0} is simply a birth and death process. If πn is the

limit of P [X(t) = n], then it satisfies equation πG = 0. From another point

of view, considering the rate of X(t) moving up and down, we have

State    Up      Down
0        λπ0     µπ1
1        λπ1     µπ2
2        λπ2     µπ3
· · ·
n        λπn     µπn+1
· · ·

Equating these pairs of rates, we find

πn = (λ/µ)^n π0.

Hence, when λ < µ so that the limits exist,

π0 = 1 − (λ/µ).

As a distribution, π is geometric.

On average, how many customers are there in the system? The answer is simple; the geometric distribution has mean

L = 1/(1 − λ/µ) − 1 = λ/(µ − λ).

(Pay attention that this π starts from π0.) The average waiting time is W = L/λ = (µ − λ)^{−1}.


Let S be the service time a customer receives. Then, by assumption, it

has exponential distribution with rate µ. Hence, E(S) = µ−1. (Too bad we

use µ as a notation for rate, not for its mean). So, on average, a customer

spends the following amount of time in the queue (not being served):

WQ = W − E(S) = λ/[µ(µ − λ)].

The average number of customers waiting in the queue is

LQ = λWQ = λ^2/[µ(µ − λ)].

Let W∗ be the amount of time an arbitrary customer spends in the system. Note that before taking the average, it is random. Its average (expectation) is given by W = (µ − λ)^{−1}. What is its distribution?

If we know the number of customers in the system when this customer arrives, then the conditional distribution is gamma. Let N be the number of customers in the system at the moment when this customer arrives. Given N = n, the customer must wait through n + 1 independent exponential service times (the remaining service of the customer being served, which is again exponential by the memoryless property; the n − 1 full services ahead of him; and his own). Then

P[W∗ ≤ t | N = n] = ∫_0^t [µ(µs)^n/n!] exp(−µs) ds.

Since Poisson arrivals see time averages, we have

P(N = n) = πn = (λ/µ)^n (1 − λ/µ).

Hence

P[W∗ ≤ t] = E{P[W∗ ≤ t | N]} = 1 − exp{−(µ − λ)t}.

So, the waiting time of a randomly selected customer is also exponentially

distributed.
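The formulas of this section fit in a few lines of code (a sketch; the function name and sample rates are ours). It requires λ < µ for the limits to exist:

```python
def mm1_metrics(lam, mu):
    L = lam / (mu - lam)             # average number in the system
    W = 1.0 / (mu - lam)             # average time in the system
    Wq = lam / (mu * (mu - lam))     # average time waiting in the queue
    Lq = lam * Wq                    # average number waiting in the queue
    return {"L": L, "W": W, "Wq": Wq, "Lq": Lq}

print(mm1_metrics(lam=1.0, mu=2.0))
# {'L': 1.0, 'W': 1.0, 'Wq': 0.5, 'Lq': 0.5}
```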

9.4 A Single-server Exponential Queueing System Having Finite Capacity

Assume the service station can only hold N customers. When the system is

full, the arrivals get lost. This system can be analyzed in the same fashion

as before.


State    Up        Down
0        λπ0       µπ1
1        λπ1       µπ2
2        λπ2       µπ3
· · ·
N − 1    λπN−1     µπN

Note that this balance sheet stops at N.

Now, we have

π0 = (1 − λ/µ)/[1 − (λ/µ)^{N+1}].

In case you have not noticed the difference, here we put down a list:

(1) This result remains true regardless of whether µ > λ. If there are more customers than the system can handle, we simply turn them away.

(2) We have a sum of a finite number of terms. Sharpening your memory of geometric summation is necessary.

It is often the case that the finite case is harder than the infinite case. We have, for this system,

L = ∑_{n=0}^N n πn = λ{1 + N(λ/µ)^{N+1} − (N + 1)(λ/µ)^N} / [(µ − λ){1 − (λ/µ)^{N+1}}].

What is the average amount of time a customer spends in the system? This depends on whether we count those who are turned away. If they are counted (their time is zero), the answer is L/λ. Otherwise, only λa = λ(1 − πN) customers enter the system per unit time, and their average time spent in the system is L/{λ(1 − πN)}.
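A computational sketch of the finite-capacity queue (the function name is ours); note that λ > µ is allowed here, since excess customers are turned away:

```python
def mm1n_metrics(lam, mu, N):
    r = lam / mu
    weights = [r ** n for n in range(N + 1)]   # finite geometric weights
    total = sum(weights)
    pis = [w / total for w in weights]
    L = sum(n * p for n, p in enumerate(pis))
    lam_a = lam * (1 - pis[N])                 # rate of customers who enter
    return {"pi0": pis[0], "piN": pis[N], "L": L, "W_entering": L / lam_a}

print(mm1n_metrics(lam=2.0, mu=1.0, N=3))     # stable even though lam > mu
```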

Example 9.4

Suppose it costs cµ dollars per hour to provide service at rate µ. Suppose

also that we profit A dollars for each customer served. If the system has

capacity N , what service rate µ maximizes our total profit?


Solution: We can work out the relationship between the net profit and the service rate µ, together with the arrival rate λ. Let us assume that the finite-capacity M/M/1 model is suitable.

Net profit per hour = λ(1 − πN)A − cµ = λA[1 − (λ/µ)^N]/[1 − (λ/µ)^{N+1}] − cµ.

We cannot find the µ that maximizes the above expression analytically. If N = 2, λ = 1, A = 10 and c = 1, then

Net profit per hour = 10(µ^3 − µ)/(µ^3 − 1) − µ.

We may find the value of µ that maximizes the above numerically. The answer is approximately µ = 2. ♦
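A brute-force numerical check of this answer (our own sketch; a simple grid search over µ rather than a formal optimizer):

```python
def net_profit(mu, lam=1.0, A=10.0, c=1.0, N=2):
    r = lam / mu
    return lam * A * (1 - r ** N) / (1 - r ** (N + 1)) - c * mu

# Search mu over (1, 4]; mu = 1 is excluded to avoid dividing by zero.
best_mu = max((1.001 + 0.001 * k for k in range(3000)), key=net_profit)
print(best_mu, net_profit(best_mu))   # best_mu ≈ 2, in line with the text
```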

Example 9.5 A shoeshine shop

It is not entirely clear why we should be interested in a shoeshine shop. This example, however, shows what happens when the possible paths of a continuous time Markov chain form a net. This structure makes solving the equations for the limiting probabilities harder.

Ignoring the background, this continuous time Markov chain has 5 states; they are:

(0, 0): No customers in the system;

(1, 0): One customer in the system and receiving type I service;

(0, 1): Receiving type II service;

(1, 1): Two customers, receiving types I and II services respectively;

(b, 1): Two customers, one has finished type I service, and the other is receiving type II service.

The system can accept at most two customers. A customer who arrives when the system is in state (1, 0), (1, 1) or (b, 1) will be turned away.

The service time of type I is exponential with rate µ1, and that of type II with rate µ2. The arrival rate of new customers is λ. You bet that everything is exponential.

The following transitions are possible:

(0, 0)→ (1, 0),


(1, 0)→ (0, 1),

(0, 1)→ (1, 1),

(0, 1)→ (0, 0),

(1, 1)→ (1, 0),

(1, 1)→ (b, 1),

(b, 1)→ (0, 1).

Let us equate the rate of entering and the rate of leaving:

state    Leave              Enter
(0, 0)   λπ0,0              µ2π0,1
(1, 0)   µ1π1,0             λπ0,0 + µ2π1,1
(0, 1)   (λ + µ2)π0,1       µ1π1,0 + µ2πb,1
(1, 1)   (µ1 + µ2)π1,1      λπ0,1
(b, 1)   µ2πb,1             µ1π1,1

General solutions are involved. We work on special cases:

(a) If λ = 1, µ1 = 1, µ2 = 2, we find

π[(0, 0), (1, 0), (1, 1), (0, 1), (b, 1)] = (12, 16, 2, 6, 1)/37.

So, L = 28/37 and W = 28/18. (We should be careful to use the rate of actually entering customers.)

(b) If λ = 1, µ1 = 2, µ2 = 1, we find

π[(0, 0), (1, 0), (1, 1), (0, 1), (b, 1)] = (3, 2, 1, 3, 2)/11.

So, L = 1 and W = 11/6. This is the better arrangement: the faster first stage makes room for the next customer sooner. ♦
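These limiting probabilities are easy to verify numerically by solving πG = 0 with ∑π = 1 (a sketch of our own; the state ordering in the generator below is (0,0), (1,0), (0,1), (1,1), (b,1)):

```python
import numpy as np

lam, mu1, mu2 = 1.0, 1.0, 2.0   # case (a)
# Infinitesimal generator; off-diagonal entries are transition rates,
# diagonals make each row sum to zero.
G = np.array([
    [-lam,  lam,          0.0,          0.0,  0.0],  # (0,0)->(1,0)
    [ 0.0, -mu1,          mu1,          0.0,  0.0],  # (1,0)->(0,1)
    [ mu2,  0.0, -(lam + mu2),          lam,  0.0],  # (0,1)->(0,0),(1,1)
    [ 0.0,  mu2,          0.0, -(mu1 + mu2),  mu1],  # (1,1)->(1,0),(b,1)
    [ 0.0,  0.0,          mu2,          0.0, -mu2],  # (b,1)->(0,1)
])
# Solve pi G = 0 subject to sum(pi) = 1 (replace one balance equation).
A = np.vstack([G.T[:-1], np.ones(5)])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi * 37)   # ≈ (12, 16, 6, 2, 1), matching case (a)
```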

Example 9.6 A queueing system with bulk service

Consider a system in which the single server can serve two customers at a time, and the service times for the two customers are identical. An elevator is such an example. Because of this, it is still a one-server system.

What makes it special is this: when there are two or more customers waiting after the server finishes serving the previous customers, it takes the next two customers. If there is only one customer waiting, it takes just that one. When nobody is waiting, it idles. It seems more convenient to let

X(t) = the number of customers in the line.

When nobody is waiting, we still have two different situations: the server is idle, or the server is busy. So, we define X(t) = 0 when no one is in the line but the server is busy, and X(t) = 0′ when no one is in the line and the server is idle.

To find the limiting probabilities, we notice that there is still a general direction when the Markov chain moves: it either moves up or down by one state.

state    Up       Down
0′       λπ0′     µπ0
0        λπ0      µ(π1 + π2)
1        λπ1      µ(π2 + π3)
· · ·
n        λπn      µ(πn+1 + πn+2)

Although it is possible to solve this system of equations with generating functions, we need only use this idea to justify that the solution has the form

πn = α^n π0

for all n = 0, 1, 2, . . .. Substituting into the relationship, we find

α = [√(1 + 4λ/µ) − 1]/2.

Further, from the fact that ∑ πi = 1, we find

π0 = λ(1 − α)/[λ + µ(1 − α)].

The rest of the limiting probabilities can then be easily calculated. For instance,

π0′ = (µ/λ)π0.

One remark is that the solution makes sense only if α < 1. This requires

2µ > λ, which is obviously necessary.


The proportion of customers who are served alone is (λπ0′ + µπ1)/λ, and

LQ = λα/{(1 − α)[λ + µ(1 − α)]}.

It is seen that

WQ = LQ/λ

and so on. ♦
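A quick numeric sketch of these formulas (the parameter values are illustrative; stability requires λ < 2µ):

```python
import math

lam, mu = 1.0, 1.0
alpha = (math.sqrt(1 + 4 * lam / mu) - 1) / 2
pi0 = lam * (1 - alpha) / (lam + mu * (1 - alpha))
pi0_idle = (mu / lam) * pi0       # the state written 0' in the text
Lq = lam * alpha / ((1 - alpha) * (lam + mu * (1 - alpha)))
print(alpha, pi0, pi0_idle, Lq, Lq / lam)   # the last number is WQ
```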

9.5 Network of Queues

9.5.1 Open System

Consider a two-server system in which customers arrive at a Poisson rate λ at server 1. After being served by server 1, they then join the queue in front of server 2. Assume there is infinite waiting space at both servers. Each server serves one customer at a time. The service time of server i has an exponential distribution with rate µi, i = 1, 2. Such a system is called a tandem or sequential system. Note that this system is similar to the shoeshine example. The difference is that the shoeshine example has finite capacity.

It is more convenient to define

X(t) = (n,m)

if there are n customers in the queue for server 1 and m customers in the

queue for server 2 at time t. The state space of this stochastic process is

obviously countable. Although it is possible to identify the corresponding in-

finitesimal generator, we may find it not so useful in determining the limiting

probabilities.

Note that from state (n,m), the Markov chain may enter three possible states: (n − 1, m + 1) if a customer completes service at server 1 first; (n, m − 1) if a customer completes service at server 2 first; (n + 1, m) if a new customer arrives before anyone completes service. If either n or m is zero, we need to make some minor adjustments.


Now consider the states from which the Markov chain may enter state

(n,m). They include (n− 1,m), (n+ 1,m− 1), (n,m+ 1). Again, we make

some minor adjustments if any of n,m is zero.

So the general balance equation is

πn,m(µ1 + µ2 + λ) = λπn−1,m + µ1πn+1,m−1 + µ2πn,m+1.

For special cases, we have

λπ0,0 = µ2π0,1;
(λ + µ1)πn,0 = µ2πn,1 + λπn−1,0;
(λ + µ2)π0,m = µ2π0,m+1 + µ1π1,m−1.

Rather than solving this system of equations directly, it is more convenient to guess the solution and verify it. The idea is this: the system under consideration is similar to two M/M/1 systems. If equilibrium is to be reached, the arrival rate for the second server also has to be λ.

πn,· = (1− λ/µ1)(λ/µ1)n;

and

π·,m = (1− λ/µ2)(λ/µ2)m.

And we guess

πn,m = πn,·π·,m.

Needless to say, this guess is correct. We may verify it quickly.

Note that the limiting distribution is that of two independent geometric random variables. The total number of customers in the system has expectation given by

L = λ/(µ1 − λ) + λ/(µ2 − λ).

9.5.2 Closed Systems

If customers come and go, the system is called open. If no new customers enter, and existing ones never depart, the system is closed. My best mental image of such a system is one so large that it includes all the service stations you could possibly have.


Suppose we have m customers in the system and there are k service stations. Whenever a customer completes a service, he immediately gets into a queue in front of another server (it could be the same one). If we follow a single customer and define Xn to be the index of the server whose line this customer enters after completing n services, we can easily see that {Xn} is a discrete time Markov chain. Certainly, we have to assume that the choice of the next server depends only on which service this customer has just completed, not on the history of the services. Let P be the transition probability matrix for this discrete time Markov chain. Assume further that this Markov chain is irreducible with a stationary distribution π. It is known that

π = πP

with ∑_{j=1}^k πj = 1. Note that it is more convenient to write the state space as {1, 2, . . . , k}.

From the experience of the last example, it seems possible to believe that

the arrival process at service station j is again a Poisson process with some rate, say λm(j). If so, we must have

λm = λmP.

As the solution to the above type of equation is unique up to a scalar, we must have

λm = π‖λm‖,

where ‖λm‖ = ∑j λm(j). We may interpret ‖λm‖ as the average service completion rate of the entire system. It is the system throughput rate.

I would like to add that these arguments still rest on the exponential service time assumption, with service rate µj at station j. We must also assume that the service stations are independent of each other. Our next question is: how do these m customers distribute themselves among the k servers?

Let Y(t) = (n1, n2, . . . , nk) be the vector whose jth component equals the number of customers at the jth station at time t. Then {Y(t) : t ≥ 0} is a continuous time Markov chain (with vector-valued random variables). Let

Pm(n1, n2, . . . , nk) = lim_{t→∞} P[Y(t) = (n1, n2, . . . , nk)].


It can be shown that, if it exists,

Pm(n1, n2, . . . , nk) = Km ∏_{j=1}^k (λm(j)/µj)^{nj}

for all possible vectors (n1, n2, . . . , nk). Note that Km is a normalizing constant.

Due to the relationship between πj and λm(j), we have

Pm(n1, n2, . . . , nk) = Cm ∏_{j=1}^k (πj/µj)^{nj}.

Note that the normalizing constant Km becomes Cm now.

At first sight, I would claim this is a multinomial probability function. It looks like one and fits my intuition. Unfortunately, it is not. The main difference is that the individual terms do not have multinomial coefficients. Because of this, even if the πj and µj are given, it is still hard to determine Cm when m is large. There are (m + k − 1 choose m) possible vectors (n1, n2, . . . , nk) such that ∑ nj = m.

Even without knowing what Cm equals numerically, we can still learn a lot from the above expression. Consider the moment when a particular customer has just completed service i and will enter service j. What is the probability that s/he will see (m1, m2, . . . , mk) customers in the k stations? Note that we have ∑_{j=1}^k mj = m − 1.

P(seeing (m1, m2, . . . , mk) | i → j)
= P(Y(t) = (m1, m2, . . . , mi + 1, . . . , mk), i → j) / P(i → j)
= P(Y(t) = (m1, m2, . . . , mi + 1, . . . , mk)) µi Pij / ∑ P(Y(t) = (n1, n2, . . . , ni + 1, . . . , nk)) µi Pij
= K (πi/µi) ∏_{j=1}^k (πj/µj)^{mj}
= C ∏_{j=1}^k (πj/µj)^{mj}.


Since C is a normalizing constant, we find this conditional probability function is the same as Pm−1. Hence, we claim:

Theorem 9.1 The arrival theorem

In the closed network system with m customers, the system as seen by arrivals to server j is distributed as the stationary distribution of the same network system when there are only m − 1 customers. ♦

That is, this customer may pretend that s/he is an observer from outside.

Let Lm(j) and Wm(j) be the average number of customers and the aver-

age time a customer spends at server j when there are m customers in the

network. Upon conditioning on the number of customers found at server j

by an arrival to that server, it follows that

Wm(j) = (1 + Em[nj])/µj = (1 + Lm−1(j))/µj.

Replacing Em[nj] by Lm−1(j) in the last equality is based on the arrival theorem. (Sorry that we have used lower case for the random variable and upper case for the expected value here.)

In addition, since λm−1(j) = λm−1πj, the cost equation implies

Lm−1(j) = λm−1πjWm−1(j).

Substituting back into Wm(j), we get

Wm(j) = [1 + λm−1πjWm−1(j)]/µj.

Since ∑j Lm−1(j) = m − 1, we obtain

m − 1 = λm−1 ∑j πjWm−1(j),

or

λm−1 = (m − 1)/∑j πjWm−1(j).

These manipulations result in

Wm(j) = 1/µj + (m − 1)πjWm−1(j)/[µj ∑i πiWm−1(i)].


After so much work, we may rightfully ask: so what? Note that W1(j) =

1/µj which is very easy to calculate. The above relationship enables us to

obtain W2(j), and from Wm−1(j), we can easily get Wm(j). Thus, we can

compute Wm(j) iteratively. The cost equation will then make it possible to

calculate all Lm(j).
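The iteration just described is easy to code (a sketch of our own; pi is the stationary distribution of the routing chain, mu the vector of service rates):

```python
# Mean value analysis: returns the waiting times W_m(j) for a closed network
# with m circulating customers, built up from W_1(j) = 1/mu_j.
def mva_waiting_times(pi, mu, m):
    k = len(pi)
    W = [1.0 / mu[j] for j in range(k)]
    for n in range(2, m + 1):
        denom = sum(pi[i] * W[i] for i in range(k))
        W = [1.0 / mu[j] + (n - 1) * pi[j] * W[j] / (mu[j] * denom)
             for j in range(k)]
    return W

# Two identical stations visited equally often, three customers circulating.
print(mva_waiting_times(pi=[0.5, 0.5], mu=[1.0, 1.0], m=3))   # [2.0, 2.0]
```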

9.6 Problems

1. Consider a single-server bank for which customers arrive in accordance

with a Poisson process with rate λ. If a customer only will enter the

bank if the server is free when he arrives, and if the service time of a

customer has the distribution G, then what proportion of time is the

server busy?

2. Consider the following queueing system. Customers arrive in a Poisson

process at rate λ > 0 and are served, in order of arrival, by a single

server. Service times are independent; however, they are not identi-

cally distributed since it has been observed that the server works more

quickly when there are a number of customers waiting in the queue.

To model this phenomenon of state-dependent service times, assume that when there are j customers in the system the server provides exponential service at rate jµ, j = 1, 2, . . ..

(a) Show that {πn}, the equilibrium probability distribution for the

number of customers in the system (including the one being served), is

Poisson with mean ρ = λ/µ.

(b) Let W be the equilibrium waiting time for a customer who joins the

queue, and suppose that W has pdf w(x) with corresponding Laplace

transform w(s). If Π(z) = ∑_{n=0}^∞ πn z^n, show that

(i) Π(z) = w(λ − λz);

(ii) w(s) = e^{−s/µ}.

(c) Using the results of (b) or otherwise, find E(W ).


Chapter 10

Renewal Process

In the Poisson process model, the inter-arrival times are assumed to be independent and identically distributed exponential random variables. We now seek to relax this requirement slightly.

Definition 10.2

Let X1, X2, . . . be a sequence of independent and identically distributed non-negative random variables. Define

N(t) = max{n : ∑_{i=1}^n Xi ≤ t}

for all t ≥ 0. Then {N(t) : t ≥ 0} is called a renewal process.

Compared to the Poisson process, the inter-arrival times of a renewal process no longer have to have an exponential distribution. Thus, the renewal process loses the Markov or memoryless property. If it has been a while since the occurrence of the last event, the waiting time for the next event from that moment may be substantially shorter than the usual waiting time.

At the same time, if an event has just occurred, the waiting time for the

next event has the same distribution no matter how often events occurred

before the time of last occurrence. In this sense, the process renews itself at

the moment when an event occurs. We may now link this process with the

concepts of renewal events discussed before.


Let us define S0 = 0 and Sn = ∑_{i=1}^n Xi for n ≥ 1. Assume that

P(X1 = 0) < 1.

Let µ = E[X1]. Obviously, µ > 0. We have not really paid attention to whether N(t) is well defined for each t. Is it possible that N(t) < 200, for instance, no matter how large t is?

It turns out that this cannot happen. According to the strong law of large numbers, we have Sn/n → µ almost surely as n → ∞. It is hence true that Sn ≈ nµ. Thus, as n increases to infinity, Sn also increases to infinity almost surely. By the definition of N(t), we easily see that

P(N(t) < ∞) = 1

for all t ≥ 0. At the same time, lim_{t→∞} N(t) = ∞.

10.1 Distribution of N(t)

For each given t, N(t) is a random variable. What is its distribution? The

answer is usually not available unless the distribution of X1 has a convenient

form. Some discussion is still possible.

Note that the event N(t) ≥ n is the same as Sn ≤ t. Thus, it is seen that

P{N(t) = n} = P{N(t) ≥ n} − P{N(t) ≥ n + 1} = P{Sn ≤ t} − P{Sn+1 ≤ t}.

Denoting by Fn(t) = P{Sn ≤ t} the convolution of the distributions of X1, . . . , Xn, we have the expression

P{N(t) = n} = Fn(t) − Fn+1(t).

As indicated earlier, this expression does not provide any practical means

of computing the distribution of N(t).


Example 10.1

Suppose that in a renewal process, the inter-arrival times X1, X2, . . . , are

uniformly distributed on the unit interval [0, 1]. Then for 0 ≤ t ≤ 1,

Fn(t) = t^n/n!

for n = 1, 2, . . .. However, the expression for t > 1 is usually very complex.

Example 10.2

Suppose that in a renewal process, the inter-arrival times X1, X2, . . . , are

discretely uniform on the integers {0, 1, 2, 3}. Then the expressions for F1(t) and F2(t) are easy to obtain:

F1(i) = (i + 1)/4, i = 0, 1, 2, 3.

The probability mass function f2(t) and the cumulative distribution function F2(t) are given by

t           0   1   2   3   4   5   6
16 f2(t)    1   2   3   4   3   2   1
16 F2(t)    1   3   6  10  13  15  16
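The table can be checked with a short enumeration (a small sketch of our own):

```python
from itertools import product

# Convolution of two discrete uniforms on {0, 1, 2, 3}: count pairs by sum.
counts = [0] * 7
for x, y in product(range(4), repeat=2):
    counts[x + y] += 1
print(counts)   # [1, 2, 3, 4, 3, 2, 1], out of 16 equally likely pairs
```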

♦

It is also possible to find examples where a simple expression for the distribution of N(t) exists. Other than the standard special case of the Poisson process, we have the following examples.

Example 10.3

Consider the renewal process whose inter-arrival times have geometric dis-

tribution such that

P (X = i) = p(1− p)i−1, i ≥ 1.


It is seen that

P(Sn = k) = (k−1 choose n−1) p^n (1 − p)^{k−n}, k ≥ n.

Thus, we have

P(N(t) = n) = ∑_{k=n}^{[t]} (k−1 choose n−1) p^n (1−p)^{k−n} − ∑_{k=n+1}^{[t]} (k−1 choose n) p^{n+1} (1−p)^{k−n−1}.

♦

Let us now consider the problem of computing the mean-value function

m(t) = E[N(t)].

This function is also called the renewal function.

Recall an expectation formula derived for non-negative integer-valued random variables:

m(t) = ∑_{n=1}^∞ P{N(t) ≥ n} = ∑_{n=1}^∞ Fn(t).

It can be shown (by using characteristic functions) that the renewal function and the inter-arrival distribution uniquely determine each other. Thus, if the inter-arrival time distribution is exponential with rate λ = 2, then we find

m(t) = λt = 2t.

Conversely, if the renewal function is m(t) = λt = 2t, we know that {N(t) : t ≥ 0} is a Poisson process with rate λ = 2.

One mathematical problem is the finiteness of m(t) for any given t. Sup-

pose P (X1 > 0) > 0. It can be shown that for any given t, Fn(t) decreases

at an exponential rate when n is large. Thus, m(t) is always finite when

P (X1 > 0) > 0.

The relationship between m(t) and the distribution of the inter-arrival

time is made explicit in the following theorem.


Theorem 10.1

Let m(t) be the renewal function of the renewal process {N(t) : t ≥ 0} and

F (t) be the distribution of the inter-arrival time. Assume that F (0) < 1.

Then

m(t) = F(t) + ∫_0^t m(t − x) dF(x).

Proof: We have

m(t) = E{E[N(t)|X1]}
     = ∫_0^∞ E[N(t)|X1 = x] dF(x)
     = ∫_0^t [1 + m(t − x)] dF(x)   (for x > t, E[N(t)|X1 = x] = 0)
     = F(t) + ∫_0^t m(t − x) dF(x).
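The renewal equation also lends itself to numerical solution by discretizing the convolution (the scheme below is a sketch of our own); for exponential inter-arrivals with rate λ the exact answer m(t) = λt provides a check:

```python
import math

lam, T, h = 2.0, 3.0, 0.001
n = int(T / h)
F = [1 - math.exp(-lam * k * h) for k in range(n + 1)]   # cdf on the grid
m = [0.0] * (n + 1)
for k in range(1, n + 1):
    # m(kh) = F(kh) + sum_j m(kh - jh) * [F(jh) - F((j-1)h)]
    conv = sum(m[k - j] * (F[j] - F[j - 1]) for j in range(1, k + 1))
    m[k] = F[k] + conv
print(m[n], lam * T)   # both ≈ 6
```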

10.2 Limiting Theorems and Their Applications

Theorem 10.2

Suppose that {N(t) : t ≥ 0} is a renewal process and the inter-arrival time

X1 has non-zero expectation µ. Then

N(t)/t → 1/µ

almost surely as t → ∞.

Proof: Let Sn be the occurrence time of the nth event as before. By the

definition of the renewal process, we have

SN(t) ≤ t ≤ SN(t)+1,

which implies

SN(t)/N(t) ≤ t/N(t) ≤ SN(t)+1/N(t).


By the law of large numbers, we have Sn/n → µ almost surely. Since N(t) → ∞ almost surely as t → ∞, we have

SN(t)/N(t) → µ,

and

SN(t)+1/N(t) = [SN(t)+1/(N(t) + 1)][1 + 1/N(t)] → µ.

Thus, we have the result. ♦

The elementary renewal theorem is as follows.

Theorem 10.3

Suppose that {N(t) : t ≥ 0} is a renewal process and the inter-arrival time

X1 has non-zero expectation µ. Then the renewal function satisfies

m(t)/t → 1/µ

as t → ∞. ♦

We do not provide a proof here. It should be noted that this result cannot be directly obtained from the last theorem.

If the renewal theorem is assumed, the limiting probabilities of the dis-

crete time Markov chain can be derived as follows.

Example 10.4

Let {Xn : n = 0, 1, . . .} be a discrete time Markov chain. Assume that it is

irreducible, aperiodic and positive recurrent. Let the state space be denoted

as S = {0, 1, . . .}. Consider the case when X0 = i for some i. Define Tk to be the time between the (k − 1)th and kth visits of the Markov chain to state i. Thus, we can define a renewal process Ni(t) as the number of times state i is visited by time t.


By the renewal theorem, the long-run proportion of time that state i is visited is given by

\lim_{n \to \infty} \frac{N_i(n)}{n} = \mu_i^{-1},

where µ_i = E[T_1]. That is, π_i = µ_i^{-1}. ♦
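The identity π_i = µ_i^{-1} can be checked by direct counting. A minimal sketch, for an illustrative two-state chain (its stationary distribution is π = (0.8, 0.2), so the mean return time to state 0 is µ_0 = 1/0.8 = 1.25):

```python
# Check pi_i = 1/mu_i by counting visits to state 0 in a two-state chain.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])     # stationary distribution: pi = (0.8, 0.2)
rng = np.random.default_rng(0)

x, visits, steps = 0, 0, 200_000
for _ in range(steps):
    if x == 0:
        visits += 1
    x = rng.choice(2, p=P[x])  # one step of the chain
print(visits / steps)          # approximately 0.8 = pi_0 = 1/mu_0
```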

Example 10.5

Let {N(t) : t ≥ 0} be a renewal process and X_1, X_2, . . . be the inter-arrival times. Let µ = E[X_1] > 0. For any given n, the event N(t) + 1 = n implies that the (n−1)th event has occurred by time t but the nth event has not occurred yet. In other words, we know that

\sum_{i=1}^{n-1} X_i \le t < \sum_{i=1}^{n} X_i.

Consequently, the event has nothing to do with the values of X_{n+1}, X_{n+2}, . . .. A random variable T = N(t) + 1, with the property that the event T = n is independent of the future outcomes X_{n+1}, X_{n+2}, . . ., is called a stopping time. It can be shown that for a stopping time,

E\left[\sum_{i=1}^{T} X_i\right] = E[T] \, E[X].

In our case, we have

E\left[\sum_{i=1}^{N(t)+1} X_i\right] = E[N(t) + 1] \, E[X].

♦

It turns out that N(t) itself is not a stopping time: the event N(t) = n is determined by S_n ≤ t < S_{n+1} and hence depends on X_{n+1}. The above formula is therefore not applicable to N(t).
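Both halves of this example can be seen numerically: the identity holds for the stopping time N(t) + 1 but fails for N(t). A minimal sketch, with exponential(1) inter-arrival times (so E[X] = 1 and {N(t)} is a rate-1 Poisson process):

```python
# Wald's identity holds for the stopping time N(t) + 1 but fails for N(t).
import numpy as np

rng = np.random.default_rng(0)
t, reps = 10.0, 100_000
s_stop, s_last, n_t = [], [], []

for _ in range(reps):
    s, n = 0.0, 0
    while True:
        x = rng.exponential(1.0)    # E[X] = 1
        if s + x > t:
            s_last.append(s)        # S_{N(t)}   (sum of N(t) terms)
            s_stop.append(s + x)    # S_{N(t)+1} (sum of N(t)+1 terms)
            n_t.append(n)
            break
        s += x
        n += 1

n_t = np.array(n_t)
print(np.mean(s_stop), (n_t.mean() + 1) * 1.0)  # equal: Wald applies
print(np.mean(s_last), n_t.mean() * 1.0)        # not equal: N(t) is not a stopping time
```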

10.3 Problems

1. Suppose that the inter-arrival times for a renewal process have the shifted Poisson distribution

P(X_n = k) = \frac{\mu^{k-1}}{(k-1)!} \exp(-\mu), \qquad k = 1, 2, \ldots,

so that X_n − 1 is Poisson distributed with mean µ.


(a) Find the distribution of Sn.

(b) Calculate P (N(t) ≥ n).

(c) Find m(t) = E[N(t)] (not necessarily in closed form).

2. Mr. Smith works on a temporary basis. The mean length of each job he gets is three months. If the amount of time he spends between jobs is exponentially distributed with mean 2 months, then at what rate does Mr. Smith get new jobs?

3. Each time a machine is repaired it remains up for an exponentially distributed time with rate λ. It then fails, and its failure is one of two types. If it is a type 1 failure, then the time to repair the machine is exponential with rate µ1; if it is a type 2 failure, then the repair time is exponential with rate µ2. Each failure is, independently of the time it took the machine to fail, a type 1 failure with probability p and a type 2 failure with probability 1 − p. What proportion of time is the machine down due to a type 1 failure? What proportion of time is the machine down due to a type 2 failure? What proportion of time is it up?

4. A machine in use is replaced by a new machine either when it fails or when it reaches the age of T years. If the lifetimes of successive machines are independent with a common distribution F having density f, show that

(a) the long-run rate at which machines are replaced equals

\left[ \int_0^T x f(x) \, dx + T(1 - F(T)) \right]^{-1};

(b) the long-run rate at which machines in use fail equals

\frac{F(T)}{\int_0^T x f(x) \, dx + T[1 - F(T)]}.

5. Machines in a factory break down at an exponential rate of six per hour.

There is a single repairman who fixes machines at an exponential rate


of eight per hour. The cost incurred in lost production when machines

are out of service is $10 per hour per machine. What is the average

cost rate incurred due to failed machines?

6. The manager of a market can hire either Mary or Alice. Mary, who

gives service at an exponential rate of 20 customers per hour, can be

hired at a rate of $3 per hour. Alice, who gives service at an exponential

rate of 30 customers per hour, can be hired at a rate of $C per hour.

The manager estimates that, on the average, each customer’s time is

worth $1 per hour and should be accounted for in the model. If customers

arrive at a Poisson rate of 10 per hour, then

(a) what is the average cost per hour if Mary is hired? if Alice is hired?

(b) find C if the average cost per hour is the same for Mary and Alice.

7. Consider a renewal process {N(t), t ≥ 0} having a gamma (r, λ) inter-

arrival distribution. That is, the inter-arrival density is

f(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{r-1}}{(r-1)!}, \qquad x > 0.

(a) Show that

P\{N(t) \ge n\} = \sum_{i=nr}^{\infty} \frac{e^{-\lambda t} (\lambda t)^i}{i!}.

(b) Use (a) to show that

m(t) = \sum_{i=r}^{\infty} \left[\frac{i}{r}\right] \frac{e^{-\lambda t} (\lambda t)^i}{i!},

where [i/r] is the largest integer less than or equal to i/r.


Chapter 11

Sample Exam Papers

11.1 Quiz 1: Winter 2003

1. [4] Using only the axioms of probability, show that if A and B are two events such that A ⊂ B, then

P (A) ≤ P (B).

2. [2] Two independent random variables X and Y have probability mass functions

P(X = k) = 1/3, \qquad k = 0, 1, 2,

and

P(Y = k) = (1/2)^{k+1}, \qquad k = 0, 1, 2, \ldots.

That is, X has a uniform distribution on {0, 1, 2}, and Y has a geometric distribution.

[3] (a) Find the probability generating function of X.

[3] (b) Find a closed form expression for the probability generating

function of Y . (This means that leaving it as a summation is not

enough).

[3] (c) Find the probability generating function of XY .


3. [2] Let X be a random variable with probability function given by the following table:

x    -2    -1     0     1     2
p    0.3   0.2   0.1   0.2   0.2

Let Y = (X + 1)^2.

[3] (a) Tabulate the probability function of Y .

[3] (b) Tabulate the conditional probability function of X given Y = 0.

[3] (c) Tabulate E(X|Y ).

[3] (d) Compute Var[E(X|Y)].

4. [3] The number of claims received at an insurance company during a

week is a random variable with mean 20 and variance 120. The amount

paid in each claim is a random variable with mean 350 and variance

10000. Assume that the amounts of different claims are independent.

(a) [4] Suppose this company received exactly 3 claims in a particular

week. The amount of each claim is still random as already specified.

What are the mean and variance of the total amount paid to these 3

claims in this week?

(b) [4] Assume that in one week, all claims received the same payment of 300. What are the mean and variance of the total amount paid in this week?

(c) [4] What are the mean and variance of the total amount paid to

claims in an ordinary week?

5. [3] A secretary puts n letters into n envelopes at random. Let A_n be the event that at least one letter is in the correct envelope.

[4] (a) Show that for all n = 1, 2, . . .,

p_n = 1 - P(A_n) = 1 + \frac{(-1)}{1!} + \frac{(-1)^2}{2!} + \cdots + \frac{(-1)^n}{n!}.

Hint: for each given n, define B_i = the event that the ith letter is in the ith envelope, i = 1, 2, . . . , n. Then A_n = B_1 ∪ B_2 ∪ · · · ∪ B_n.


[4] (b) Define p_0 = 0; find the generating function of {p_n}_{n=1}^{\infty}.

Hint: obtain a difference equation first.

11.2 Quiz 2: Winter 2003

1. [3] State the definition of Markov chain (discrete time).

2. [4] State the definitions of the concepts of transient, positive recurrent

and null recurrent for a renewal event.

3. Assume that a DNA sequence in a region without genes can be modeled as a random sample of the symbols A, G, T, C with corresponding probabilities

P_A = 0.2, \quad P_G = 0.2, \quad P_T = 0.4, \quad P_C = 0.2.

Assume X_0 = T.

[3] (a) Let

u_n = P(\text{TAT occurs at trial } n) = P(X_{n-2} = T, X_{n-1} = A, X_n = T \mid X_0 = T).

Show that u_n = 0.032 for n ≥ 3. Find the values of u_n for n = 0, 1, 2.

[3] (b) Obtain the generating function of the sequence u_n.

[2] (c) Is TAT a renewal or delayed renewal event? Give a one-sentence justification.

[3] (d) Show that TAT is recurrent.

[3] (e) Using the renewal theorem, compute the mean inter-occurrence time for TAT after its first occurrence.

4. Assume {X_n}_{n=0}^{\infty} is a Markov chain with transition probability matrix

P = \begin{pmatrix}
0 & 0.3 & 0.2 & 0.5 \\
0.3 & 0 & 0.5 & 0.2 \\
0 & 0 & 0.4 & 0.6 \\
0 & 0 & 0.3 & 0.7
\end{pmatrix}.


[3] (a) Find the two-step transition probability matrix.

[3] (b) Suppose that the probability function of X_1 is given by the vector \beta_1 = (0, 0.5, 0, 0.5)^{\tau}. Find the probability function of X_3.

[6] (c) Classify the state space. For each class, determine whether it is recurrent or transient. Determine their periods.

[2] (d) What is meant by “irreducible”? Is this MC reducible?

[5] (e) Find the long-run proportions of time when the MC is in state 0 and in state 2. (Do not blindly solve πP = π.)

[3] (f) Calculate \lim_{n\to\infty} E[X_n].

5. Let {Z_n}_{n=0}^{\infty} be a usual branching process with Z_0 = 1 and Z_n = \sum_{j=1}^{Z_{n-1}} X_{n-1,j} for n > 0, with family sizes X_{n,j} being iid random variables.

Assume X_{0,1} has the discrete uniform distribution on {0, 1, . . . , k} for some positive integer k. For example, if k = 3, then P(X_{0,1} = j) = 0.25 for j = 0, 1, 2, 3.

[2] (a) For what values of k is the probability of extinction 1?

[4] (b) When k = 3, compute the probability of extinction.

[4] (c) When k = 5, calculate the mean and variance of Z_5.

6. In a more complex random walk, Z_1, Z_2, . . . are independent and identically distributed random variables with

P(Z_1 = 1) = p, \quad P(Z_1 = 0) = r, \quad P(Z_1 = -2) = q, \qquad (11.1)

such that p + r + q = 1 and all of p, r, q are non-zero. As usual, X_n = \sum_{i=1}^{n} Z_i for n ≥ 1, with X_0 = 0.

Define (as usual)

\lambda_n^{(r)} = P(X_n = r, X_{n-1} < r, X_{n-2} < r, \ldots, X_2 < r, X_1 < r \mid X_0 = 0)

for r > 0.


You are given that the corresponding generating functions satisfy

\Lambda^{(r)}(s) = [\Lambda(s)]^r.

[4] (a) Show that Λ(s) satisfies the equation

qs[\Lambda(s)]^3 + (rs - 1)\Lambda(s) + ps = 0.

[4] (b) When p = 0.25, q = 0.25 and r = 0.5, find the probability that X_n = 1 will ever occur for some n ≥ 1.

[bonus 5] (c) Show that when p = 2q > 0, X_n = 0 is a recurrent renewal event (the marks are mainly for the recurrence part).

11.3 Final Exam: Winter 2003

1. Let X_n, n = 1, 2, . . ., be a sequence of independent and identically distributed geometric random variables such that X_n (for all n) has probability mass function

f(k) = P(X_n = k) = p(1-p)^k, \qquad k = 0, 1, 2, \ldots,

for some parameter p ∈ (0, 1).

(a) [5] Find the probability generating function of X_n.

(b) [5] Find the probability generating function of X_1 + X_3.

(c) [5] Let N be a Poisson random variable with mean µ, independent of X_1, X_2, . . .. Find the probability generating function of T_N = \sum_{i=1}^{2N+1} X_i.

(d) [5] Compute E(T_N) and Var(T_N), where T_N is defined as in (c).

2. Suppose 4 balls are placed into two urns A and B. On each day, one ball is selected, each of the four balls being equally likely, and the selected ball is then placed into the other urn.

Let X_n be the number of balls in urn A on the nth day, and Y_n be the number of balls in urn A on the 2nth day, for n = 0, 1, 2, . . ..


a) [6] Are {X_n}_{n=0}^{\infty} and {Y_n}_{n=0}^{\infty} Markov chains? If any of them are, write down their state spaces and transition matrices and do the usual classification.

b) [4] Given X_0 = 3, find the probability function of X_2.

c) [4] In the long run, what proportion of the time is at least one urn empty?

d) [6] Given X_0 = k, calculate the probability that the number of balls in urn A reaches 0 before the number of balls in urn B reaches 0, for k = 0, 1, 2, 3 and 4.

3. A student has a practically infinite number of assignment problems to work on at the moment. The time it takes to solve a problem is random with an exponential distribution whose mean is 10 minutes. The probability that her solution is correct is 80%. Assume that the solution times and the correctness of the answers are all independent.

The worth of each correct answer is random with a uniform distribution on 1, 2, 3 marks (no partial marks for wrong answers, for simplicity).

(a) [4] What is the probability that she solved exactly 10 problems in

an hour?

(b) [4] What is the probability that she used more than 1 hour to solve

the first 3 problems?

(c) [4] If she solved 10 problems in an hour, calculate the probability that she got at least 9 correct.

(d) [4] Suppose she solved 9 problems correctly in one hour. Given this,

what is her expected number of problems solved in the same period?

(e) [4] Suppose she worked on the assignment for one hour and handed in whatever she completed in that hour. Let T be the total mark of her hand-in. Calculate the mean and variance of T.

4. A professor has access to two computer servers, and he has a computing

job to be done.


Assume that the time it takes to complete a job is random with expo-

nential distribution, independent of each other, for both servers. The

rates are λ1 = 3/hour and λ2 = 2/hour for Servers 1 and 2 respectively.

Let X_1(t) and X_2(t) be the numbers of jobs in the queues for Servers 1 and 2, respectively, at time t. The jobs in the queues do not switch between servers even if the other server is sometimes idle.

The professor submitted the same job to both servers at time t = 0. His job is done as soon as one of the two servers completes it.

Suppose X1(0) = 1 and X2(0) = 1.

[5] (1) What is the probability that Server 1 will start work on his job

before Server 2?

[5] (2) What is the probability that he has to wait for 0.5 hours or

longer before any server starts working on his job?

[5] (3) What is the probability that his job is completed by Server 1

before Server 2 starts working on this job?

[5] (4) What is the probability that he has to wait at least 2 hours

before the job is done?

5. A closed population has N individuals. Assume the number of flu cases can be modeled by a birth and death process. Let X(t) be the number of flu cases in this population at time t. The birth rate is λ_k = (k + 1)(N − k)λ and the death rate (not the death of an individual, but the death of the “flu”) is µ_k = k^2 µ when X(t) = k, k = 0, 1, . . . , N.

(a) [5] Given X(0) = 0, what are the expected waiting times until X(t) = 1 and until X(t) = 2?

(b) [5] Given X(0) = k for some 0 < k < N, what is the probability that after the next transition there will be one more case rather than one fewer case?

(c) [5] Assume λ = 1, µ = 9 and N = 5. In the long run, what is the proportion of time during which there are no flu cases in the population?

(d) [5] Assume λ = 1, µ = 9 and N = 5. What is the average number of flu cases at any moment in the long run?


(e) [bonus 2] Answer (c) for a general N .