Markov Models - meyn.ece.ufl.edu€¦ · Markov Models This appendix describes stability theory and ergodic theory for Markov chains on a countable state space that provides foundations

Appendix A

Markov Models

This appendix describes stability theory and ergodic theory for Markov chains on acountable state space that provides foundations for the development in Part III of thisbook. It is distilled from Meyn and Tweedie [367], which contains an extensive bibli-ography (the monograph [367] is now available on-line.)

The term “chain” refers to the assumption that the time-parameter is discrete. TheMarkov chains that we consider evolve on a countable state space, denotedX, withtransition law defined as follows,

P (x, y) := P{X(t+ 1) = y | X(t) = x} x, y ∈ X, t = 0, 1, . . . .

The presentation is designed to allow generalization to more complex general statespace chains as well as reflected Brownian motion models.

Since the publication of [367] there has been a great deal of progress on the theoryof geometrically ergodic Markov chains, especially in the context of Large Deviationstheory. See Kontoyiannis et. al. [312, 313, 311] and Meyn [364] for some recent results.The website [444] also contains on-line surveys on Markov and Brownian models.

A.1 Every process is (almost) Markov

Why do we focus so much attention on Markov chain models? An easy responseis to cite the powerful analytical techniques available, such as the operator-theoretictechniques surveyed in this appendix. A more practical reply is that most processes canbe approximated by a Markov chain.

Consider the following example:Z is a stationary stochastic process on the non-negative integers. A Markov chain can be constructed that has the same steady-statebehavior, and similar short-term statistics. Specifically, define the probability measureon Z+ × Z+ via,

Π(z0, z1) = P{Z(t) = z0, Z(t+ 1) = z1}, z0, z1 ∈ Z+.

Note thatΠ captures the steady-state behavior by construction. By considering thedistribution of the pair(Z(t), Z(t+ 1)) we also capture some of the dynamics ofZ.

538

Control Techniques for Complex Networks Draft copy April 22, 2007 539

The first and second marginals ofΠ agree, and are denotedπ,

π(z0) = P{Z(t) = z0} =∑

z1∈Z+

Π(z0, z1), z0,∈ Z+.

The transition matrix for the approximating process is defined as the ratio,

P (z0, z1) =Π(z0, z1)

π(z0), z0, z1 ∈ X ,

with X = {z ∈ Z+ : π(z) > 0}.The following simple result is established in Chorin [111], but the origins are

undoubtedly ancient. It is a component of the model reduction techniques pioneeredby Mori and Zwanzig in the area of statistical mechanics [375, 505].

Proposition A.1.1. The transition matrixP describes these aspects of the stationaryprocessZ:

(i) One-step dynamics:P (z0, z1) = P{Z(t+ 1) = z1 | Z(t) = z0}, z0, z1 ∈ X.

(ii) Steady-state:The probabilityπ is invariant forP ,

π(z1) =∑

z0∈X

π(z0)P (z0, z1), z1,∈ X.

Proof. Part (i) is simply Baye’s rule

P{Z(t+ 1) = z1 | Z(t) = z0} =P{Z(t+ 1) = z1, Z(t) = z0}

P{Z(t) = z0}=

Π(z0, z1)

π(z0).

The definition ofP givesπ(z0)P (z0, z1) = Π(z0, z1), and stationarity ofZ impliesthat

∑z0π(z0)P (z0, z1) =

∑z0

Π(z0, z1) = π(z1), which is (ii). ⊓⊔

PropositionA.1.1 is just one approach to approximation. IfZ is not stationary, analternative is to redefineΠ as the limit,

Π(z0, z1) = limN→∞

1

N

N−1∑

t=0

P{Z(t) = z0, Z(t+ 1) = z1},

assuming this exists for eachz0, z1 ∈ Z+. Similar ideas are used in Section9.2.2to prove that an optimal policy for a controlled Markov chaincan be taken stationarywithout loss of generality.

Another common technique is to add some history toZ via,

X(t) := [Z(t), Z(t− 1), . . . , Z(t− n0)],

wheren0 ∈ [1,∞] is fixed. Ifn0 = ∞ then we are including the entire history, and inthis caseX is Markov: For any possible valuex1 of X(t+ 1),

P{X(t + 1) = x1 | X(t),X(t − 1), . . . } = P{X(t+ 1) = x1 | X(t)}


A.2 Generators and value functions

The main focus of the Appendix is performance evaluation, where performance is de-fined in terms of a cost functionc : X → R+. For a Markov model there are severalperformance criteria that are well-motivated and are also conveniently analyzed usingtools from the general theory of Markov chains:

Discounted cost For a given discount-parameterγ > 0, recall that the discounted-cost value function is defined as the sum,

hγ(x) :=

∞∑

t=0

(1 + γ)−t−1Ex[c(X(t))], X(0) = x ∈ X. (A.1)

Recall from (1.18) that the expectations in (A.1) can be expressed in terms of thet-steptransition matrix via,

E[c(X(t)) | X(0) = x] = P tc (x), x ∈ X, t ≥ 0.

Consequently, denoting theresolventby

Rγ =∞∑

t=0

(1 + γ)−t−1P t, (A.2)

the value function (A.1) can be expressed as the “matrix-vector product”,

hγ(x) = Rγc (x) :=∑

y∈X

Rγ(x, y)c(y), x ∈ X.

Based on this representation, it is not difficult to verify the following dynamic program-ming equation. The discounted-cost value function solves

Dhγ = −c+ γhγ , (A.3)

where thegeneratorD is defined as the difference operator,

D = P − I . (A.4)

The dynamic programming equation (A.3) is a first step in the development of dynamicprogramming for controlled Markov chains contained in Chapter 9.

Average cost The average cost is the limit supremum of the Cesaro-averages,

ηx := lim supr→∞

1

r

r−1∑

t=0

Ex

[c(X(t))

], X(0) = x ∈ X.

A probability measure is called invariant if it satisfies theinvariance equation,


∑

y∈X

π(x)D(x, y) = 0, x ∈ X, (A.5)

Under mild stability and irreducibility assumptions we findthat the average cost coin-cides with the spatial averageπ(c) =

∑x′ π(x′)c(x′) for each initial condition. Under

these conditions the limit supremum in the definition of the average cost becomes alimit, and it is also the limit of the normalized discounted cost for vanishing discount-rate,

ηx = π(c) = limr→∞

1

r

r−1∑

t=0

Ex

[c(X(t))

]= lim

γ↓0γhγ(x). (A.6)

In a queueing network model the followingx∗-irreducibility assumption fre-quently holds withx∗ ∈ X taken to represent a network free of customers.

Definition A.2.1. Irreducibility

The Markov chainX is called

(i) x∗-Irreducible if x∗ ∈ X satisfies for one (and hence any)γ > 0,

Rγ(x, x∗) > 0 for eachx ∈ X.

(ii) The chain is simply calledirreducible if it is x∗-irreducible for eachx∗ ∈ X.

(iii) A x∗-irreducible chain is calledaperiodic if there existsn0 < ∞ such thatPn(x∗, x∗) > 0 for all n ≥ n0.

When the chain isx∗-irreducibile, we find that the most convenient sample pathrepresentations ofη are expressed with respect to thefirst return timeτx∗ to the fixedstatex∗ ∈ X. From PropositionA.3.1 we find thatη is independent ofx within thesupport ofπ, and has the form,

η = π(c) =(Ex∗

[τx∗

])−1Ex∗

[τx∗−1∑

t=0

c(X(t))]. (A.7)

Considering the functionc(x) = 1{x 6= x∗} gives,

Theorem A.2.1. (Kac’s Theorem) If X is x∗-irreducible then it is positive recurrentif and only if Ex∗ [τx∗ ] < ∞. If positive recurrence holds, then lettingπ denote theinvariant measure forX, we have

π(x∗) = (Ex∗ [τx∗ ])−1. (A.8)


Total cost and Poisson’s equationFor a given functionc : X → R with steady statemeanη, denote the centered function byc = c−η. Poisson’s equation can be expressed,

Dh = −c (A.9)

The functionc is called theforcing function, and a solutionh : X → R is known as arelative value function. Poisson’s equation can be regarded as a dynamic programmingequation; Note the similarity between (A.9) and (A.3).

Under thex∗-irreducibility assumption we have various representations of the rel-ative value function. One formula is similar to the definition (A.6):

h(x) = limγ↓0

(hγ(x) − hγ(x

∗)), x ∈ X. (A.10)

Alternatively, we have a sample path representation similar to (A.7),

h(x) = Ex

[τx∗−1∑

t=0

(c(X(t)) − η

)], x ∈ X. (A.11)

This appendix contains a self-contained treatment of Lyapunov criteria for stabil-ity of Markov chains to validate formulae such as (A.11). A central result known astheComparison Theoremis used to obtain bounds onη or any of the value functionsdescribed above.

These stability criteria are all couched in terms of the generator forX . The mostbasic criterion is known as condition (V3): for a functionV : X → R+, a functionf : X → [1,∞), a constantb <∞, and a finite setS ⊂ X,

DV (x) ≤ −f + b1S(x), x ∈ X , (V3)

or equivalently,

E[V (X(t + 1)) − V (X(t)) | X(t) = x] ≤{−f(x) x ∈ Sc

−f(x) + b x ∈ S.(A.12)

Under thisLyapunov drift conditionwe obtain various ergodic theorems in SectionA.5.The main results are summarized as follows:

Theorem A.2.2. Suppose thatX is x∗-irreducible and aperiodic, and that there ex-ists V : X → (0,∞), f : X → [1,∞), a finite setS ⊂ X, and b < ∞ such thatCondition (V3) holds. Suppose moreover that the cost function c : X → R+ satisfies‖c‖f := supx∈X c(x)/f(x) ≤ 1.

Then, there exists a unique invariant measureπ satisfyingη = π(c) ≤ b, and thefollowing hold:

(i) Strong Law of Large Numbers: For each initial condition,1

n

n−1∑

t=0

c(X(t)) → η

a.s. asn→ ∞.


(ii) Mean Ergodic Theorem: For each initial condition,Ex[c(X(t))] → η ast→ ∞.

(iii) Discounted-cost value functionhγ : Satisfies the uniform upper bound,

hγ(x) ≤ V (x) + bγ−1, x ∈ X.

(iv) Poisson’s equationh: Satisfies, for someb1 <∞,

|h(x) − h(y)| ≤ V (x) + V (y) + b1, x, y ∈ X.⊓⊔

Proof. The Law of Large numbers is given in TheoremA.5.8, and the Mean ErgodicTheorem is established in TheoremA.5.4 based on couplingX with a stationary ver-sion of the chain.

The boundη ≤ b along with the bounds onh andhγ are given in TheoremA.4.5.⊓⊔

These results are refined elsewhere in the book in the construction and analysis ofalgorithms to bound or approximate performance in network models.

A.3 Equilibrium equations

In this section we consider in greater detail representations forπ andh, and begin todiscuss existence and uniqueness of solutions to equilibrium equations.

A.3.1 Representations

Solving either equation (A.5) or (A.9) amounts to a form of inversion, but there aretwo difficulties. One is that the matrices to be inverted may not be finite dimensional.The other is that these matrices arenever invertable! For example, to solve Poisson’sequation (A.9) it appears that we must invertD. However, the functionf which isidentically equal to one satisfiesDf ≡ 0. This means that the null-space ofD isnon-trivial, which rules out invertibility.

On iterating the formulaPh = h− c we obtain the sequence of identities,

P 2h = h− c− P c =⇒ P 3h = h− c− P c− P 2c =⇒ · · · .

Consequently, one might expect a solution to take the form,

h =

∞∑

i=0

P ic. (A.13)

When the sum converges absolutely, then this function does satisfy Poisson’s equation(A.9).

A representation which is more generally valid is defined by arandom sum. Definethe first entrance time and first return time to a statex∗ ∈ X by, respectively,


σx∗ = min(t ≥ 0 : X(t) = x∗) τx∗ = min(t ≥ 1 : X(t) = x∗) (A.14)

PropositionA.3.1 (i) is contained in [367, Theorem 10.0.1], and (ii) is explained inSection 17.4 of [367].

Proposition A.3.1. Letx∗ ∈ X be a given state satisfyingEx∗ [τx∗ ] <∞. Then,

(i) The probability distribution defined below is invariant:

π(x) :=(Ex∗

[τx∗

])−1Ex∗

[τx∗−1∑

t=0

1(X(t) = x)], x ∈ X. (A.15)

(ii) Withπ defined in (i), suppose thatc : X → R is a function satisfyingπ(|c|) <∞. Then, the function defined below is finite-valued onXπ := the support ofπ,

h(x) = Ex

[τx∗−1∑

t=0

c(X(t))]

= Ex

[σx∗∑

t=0

c(X(t))]− c(x∗), x ∈ X. (A.16)

Moreover,h solves Poisson’s equation onXπ.⊓⊔

The formulae forπ andh given in PropositionA.3.1 are perhaps the most com-monly known representations. In this section we develop operator-theoretic representa-tions that are truly based on matrix inversion. These representations help to simplify thestability theory that follows, and they also extend most naturally to general state-spaceMarkov chains, and processes in continuous time.

Operator-theoretic representations are formulated in terms of the resolventresol-vent matrixdefined in (A.2). In the special caseγ = 1 we omit the subscript andwrite,

R(x, y) =

∞∑

t=0

2−t−1P t(x, y), x, y ∈ X. (A.17)

In this special case, the resolvent satisfiesR(x,X) :=∑

y R(x, y) = 1, and hence itcan be interpreted as a transition matrix. In fact, it is precisely the transition matrix fora sampled process. Suppose that{tk} is an i.i.d. process with geometric distributionsatisfyingP{tk = n} = 2−n−1 for n ≥ 0, k ≥ 1. Let {Tk : k ≥ 0} denote thesequence of partial sums,

T0 = 0, andTk+1 = Tk + tk+1 for k ≥ 0.

Then, the sampled process,

Y (k) = X(Tk), k ≥ 0, (A.18)

is a Markov chain with transition matrixR.Solutions to the invariance equations forY andX are closely related:


Proposition A.3.2. For any Markov chainX onX with transition matrixP ,

(i) The resolvent equation holds,

DR = RD = DR, whereDR = R− I. (A.19)

(ii) A probability distributionπ onX is P -invariant if and only if it isR-invariant.

(iii) Suppose that an invariant measureπ exists, and thatg : X → R is given withπ(|g|) < ∞. Then, a functionh : X → R solves Poisson’s equationDh = −gwith g := g − π(g), if and only if

DRh = −Rg. (A.20)

Proof. From the definition ofR we have,

PR =∞∑

t=0

2−(t+1)P t+1 =∞∑

t=1

2−tP t = 2R− I.

HenceDR = PR−R = R− I, proving (i).To see (ii) we pre-multiply the resolvent equation (A.19) by π,

πDR = πDR

Obviously then,πD = 0 if and only if πDR = 0, proving (ii). The proof of (iii) issimilar. ⊓⊔

The operator-thoretic representations ofπ andh are obtained under the follow-ing minorization condition: Suppose thats : X → R+ is a given function, andν is aprobability onX such that

R(x, y) ≥ s(x)ν(y) x, y ∈ X. (A.21)

For example, ifν denotes the probability onX which is concentrated at a singletonx∗ ∈ X, ands denotes the function onX given bys(x) :=R(x, x∗), x ∈ X, then we dohave the desired lower bound,

R(x, y) ≥ R(x, y)1x∗(y) = s(x)ν(y) x, y ∈ X.

The inequality (A.21) is a matrix inequality that can be written compactly as,

R ≥ s⊗ ν (A.22)

whereR is viewed as a matrix, and the right hand side is the outer product of thecolumn vectors, and the row vectorν. From the resolvent equation and (A.22) we cannow give a roadmap for solving the invariance equation (A.5). Suppose that we alreadyhave an invariant measureπ, so that

πR = π.


Then, on subtractings⊗ ν we obtain,

π(R− s⊗ ν) = πR− π[s⊗ ν] = π − δν,

whereδ = π(s). Rearranging gives,

π[I − (R− s⊗ ν)] = δν. (A.23)

We can now attempt an inversion. The point is, the operatorDR := I − R is notinvertible, but by subtracting the outer products⊗ν there is some hope in constructingan inverse. Define thepotential matrixas

G =

∞∑

n=0

(R− s⊗ ν)n . (A.24)

Under certain conditions we do haveG = [I − (R− s⊗ ν)]−1, and hence from (A.23)we obtain the representation ofπ,

π = δ[νG]. (A.25)

We can also attempt the ‘forward direction’ to constructπ: Given a pairs, ν satis-fying the lower bound (A.22), wedefineµ := νG. We must then answer two questions:(i) when isµ invariant? (ii) when isµ(X) < ∞? If both are affirmative, then we dohave an invariant measure, given by

π(x) =µ(x)

µ(X), x ∈ X.

We will show thatµ always exists as a finite-valued measure onX, and that it is alwayssubinvariant,

µ(y) ≥∑

x∈X

µ(x)R(x, y), y ∈ X.

Invariance and finiteness both require some form ofstability for the process.The following result shows that the formula (A.25) coincides with the representa-

tion given in (A.15) for the sampled chainY .

Proposition A.3.3. Suppose thatν = δx∗ , the point mass at some statex∗ ∈ X, andsuppose thats(x) := R(x, x∗) for x ∈ X. Then we have for each bounded functiong : X → R,

(R− s⊗ ν)ng (x) = Ex[g(Y (n))1{τYx∗ > n}], x ∈ X, n ≥ 1, (A.26)

whereτYx∗ denotes the first return time tox∗ for the chainY defined in (A.18). Conse-quently,

Gg (x) :=

∞∑

n=0

(R− s⊗ ν)ng (x) = Ex

[τYx∗

−1∑

t=0

g(Y (t))].


Proof. We have(R − s ⊗ ν)(x, y) = R(x, y) − R(x, x∗)1y=x∗ = R(x, y)1y 6=x∗ . Or,in probabilistic notation,

(R − s⊗ ν)(x, y) = Px{Y (1) = y, τYx∗ > 1}, x, y ∈ X.

This establishes the formula (A.26) for n = 1. The result then extends to arbitraryn ≥ 1 by induction. If (A.26) is true for any givenn, then(R− s⊗ ν)n+1(x, g) =

∑

y∈X

[(R−s⊗ ν)(x, y)

][(R− s⊗ ν)n(y, g)

]

=∑

y∈X

Px{Y (1) = y, τYx∗ > 1}Ey[g(Y (n))1{τYx∗ > n}]

= Ex

[1{τYx∗ > 1}E[g(Y (n+ 1))1{Y (t) 6= x∗, t = 2, . . . , n+ 1} | Y (1)]

]

= Ex

[g(Y (n + 1))1{τYx∗ > n+ 1}

]

where the second equation follows from the induction hypothesis, and in the third equa-tion the Markov property was applied in the form (1.19) for Y . The final equationfollows from the smoothing property of the conditional expectation. ⊓⊔

A.3.2 Communication

The following result shows that one can assume without loss of generality that the chainis irreducible by restricting to anabsorbingsubset ofX. The setXx∗ ⊂ X defined inPropositionA.3.4 is known as acommunicating class.

Proposition A.3.4. For eachx∗ ∈ X the set defined by

Xx∗ = {y : R(x∗, y) > 0} (A.27)

is absorbing:P (x,Xx∗) = 1 for eachx ∈ Xx∗. Consequently, ifX is x∗-irreduciblethen the process may be restricted toXx∗, and the restricted process is irreducible.

Proof. We haveDR = R− I, which implies thatR = 12(RP + I). Consequently, for

anyx0, x1 ∈ X we obtain the lower bound,

R(x∗, x1) ≥ 12

∑

y∈X

R(x∗, y)P (y, x1) ≥ 12R(x∗, x0)P (x0, x1).

Consequently, ifx0 ∈ Xx∗ andP (x0, x1) > 0 thenx1 ∈ Xx∗. This shows thatXx∗ isalways absorbing. ⊓⊔

The resolvent equation in PropositionA.3.2 (i) can be generalized to any one ofthe resolvent matrices{Rγ}:

Proposition A.3.5. Consider the family of resolvent matrices (A.2). We have the tworesolvent equations,


(i) [γI −D]Rγ = Rγ [γI −D] = I, γ > 0.

(ii) For distinctγ1, γ2 ∈ (1,∞),

Rγ2 = Rγ1 + (γ1 − γ2)Rγ1Rγ2 = Rγ1 + (γ1 − γ2)Rγ2Rγ1 (A.28)

Proof. For anyγ > 0 we can express the resolvent as a matrix inverse,

Rγ =

∞∑

t=0

(1 + γ)−t−1P t = [γI −D]−1, x ∈ X, (A.29)

and from (A.29) we deduce (i). To see (ii) write,

[γ1I −D] − [γ2I −D] = (γ1 − γ2)I

Multiplying on the left by[γ1I −D]−1 and on the right by[γ2I −D]−1 gives,

[γ2I −D]−1 − [γ1I −D]−1 = (γ1 − γ2)[γ1I −D]−1[γ2I −D]−1

which is the first equality in (A.28). The proof of the second equality is identical.⊓⊔

When the chain isx∗-irreducible then one can solve the minorization conditionwith s positive everywhere:

Lemma A.3.6. Suppose thatX is x∗-irreducible. Then there existss : X → [0, 1]and a probability distributionν onX satisfying,

s(x) > 0 for all x ∈ X andν(y) > 0 for all y ∈ Xx∗.

Proof. Chooseγ1 = 1, γ2 ∈ (0, 1), and defines0(x) = 1x∗(x), ν0(y) = Rγ2(x∗, y),

x, y ∈ X, so thatRγ2 ≥ s0 ⊗ ν0. From (A.28),

Rγ2 = R1 + (1 − γ2)R1Rγ2 ≥ (1 − γ2)R1[s0 ⊗ ν0].

Settings = (1 − γ2)R1s0 andν = ν0 givesR = R1 ≥ s ⊗ ν. The functions ispositive everywhere due to thex∗-irreducibility assumption, andν is positive onXx∗sinceRγ2(x

∗, y) > 0 if and only ifR(x∗, y) > 0. ⊓⊔

The following is the key step in establishing subinvariance, and criteria for invari-ance. Note that LemmaA.3.7 (i) only requires the minorization condition (A.22).

Lemma A.3.7. Suppose that the functions : X → [0, 1) and the probability distribu-tion ν onX satisfy (A.22). Then,

(i) Gs (x) ≤ 1 for everyx ∈ X.

(ii) (R− s⊗ ν)G = G(R − s⊗ ν) = G− I.

(iii) If X is x∗-irreducible ands(x∗) > 0, thensupx∈XG(x, y) < ∞ for eachy ∈ X.


Proof. ForN ≥ 0, definegN : X → R+ by

gN =

N∑

n=0

(R− s⊗ ν)ns.

We show by induction thatgN (x) ≤ 1 for everyx ∈ X andN ≥ 0. This will establish(i) sincegN ↑ Gs, asN ↑ ∞.

For eachx we haveg0(x) = s(x) = s(x)ν(X) ≤ R(x,X) = 1, which verifiesthe induction hypothesis whenN = 0. If the induction hypothesis is true for a givenN ≥ 0, then

gN+1(x) = (R− s⊗ ν)gN (x) + s(x)

≤ (R− s⊗ ν)1(x) + s(x)

= [R(x,X) − s(x)ν(X)] + s(x) = 1,

where in the last equation we have used the assumption thatν(X) = 1.Part (ii) then follows from the definition ofG.To prove (iii) we first apply (ii), givingGR = G − I + Gs ⊗ ν. Consequently,

from (i),GRs = Gs − s+ ν(s)Gs ≤ 2 onX. (A.30)

Under the conditions of the lemma we haveRs (y) > 0 for everyy ∈ X, and thiscompletes the proof of (iii), with the explicit bound,

G(x, y) ≤ 2(Rs (y))−1 for all x, y ∈ X.

⊓⊔

It is now easy to establish subinvarance:

Proposition A.3.8. For ax∗-irreducible Markov chain, and any small pair(s, ν), themeasureµ = νG is always subinvariant. Writingp(s,ν) = νGs, we have

(i) p(s,ν) ≤ 1;

(ii) µ is invariant if and only ifp(s,ν) = 1.

(iii) µ is finite if and only ifνG (X) <∞.

Proof. Result (i) follows from LemmaA.3.7and the assumption thatν is a probabilitydistribution onX. The final result (iii) is just a restatement of the definitionof µ. For(ii), write

µR =∞∑

n=0

ν(R− s⊗ ν)nR

=

∞∑

n=0

ν(R− s⊗ ν)n+1 +

∞∑

n=0

ν(R− s⊗ ν)ns⊗ ν

= µ− ν + p(s,ν)ν ≤ µ.

⊓⊔


It turns out that the casep(s,ν) = 1 is equivalent to a form of recurrence.

Definition A.3.1. Recurrence

A x∗-irreducible Markov chainX is called,

(i) Harris recurrent, if the return time (A.14) is finite almost-surely from each ini-tial condition,

Px{τx∗ <∞} = 1, x ∈ X.

(ii) Positive Harris recurrent, if it is Harris recurrent, and an invariant measureπexists.

For a proof of the following result the reader is referred to [388]. A key step inthe proof is the application of PropositionA.3.3.

Proposition A.3.9. Under the conditions of PropositionA.3.8,

(i) p(s,ν) = 1 if and only ifPx∗{τx∗ < ∞} = 1. If either of these conditions holdthenGs(x) = Px{τx∗ <∞} = 1 for eachx ∈ Xx∗ .

(ii) µ(X) <∞ if and only ifEx∗ [τx∗ ] <∞. ⊓⊔

To solve Poisson’s equation (A.9) we again apply PropositionA.3.2. First note thatthe solutionh is not unique since we can always add a constant to obtain a newsolutionto (A.9). This gives us some flexibility:assumethatν(h) = 0, so that(R− s⊗ ν)h =Rh. This combined with the formulaRh = h − Rf + η given in (A.20) leads to afamiliar looking identity,

[I − (R− s⊗ ν)]h = Rc.

Provided the inversion can be justified, this leads to the representation

h = [I − (R− s⊗ ν)]−1Rc = GRc. (A.31)

Based on this we define thefundamental matrix,

Z :=GR(I − 1 ⊗ π), (A.32)

so that the function in (A.31) can be expressedh = Zc.

Proposition A.3.10. Suppose thatµ(X) <∞. If c : X → R is any function satisfyingµ(|c|) < ∞ then the functionh = Zc is finite valued on the support ofν and solvesPoisson’s equation.


Proof. We haveµ(|c|) = ν(GR|c|), which shows thatν(GR|c|) < ∞. It follows thath is finite valued a.e.[ν]. Note also from the representation ofµ,

ν(h) = ν(GRc) = µ(Rc) = µ(c) = 0.

To see thath solves Poisson’s equation we write,

Rh = (R− s⊗ ν)h = (R− s⊗ ν)GRc = GRc−Rc,

where the last equation follows from LemmaA.3.7 (ii). We conclude thath solves theversion of Poisson’s equation (A.20) for the resolvent with forcing functionRc, andPropositionA.3.2 then implies thath is a solution forP with forcing functionc. ⊓⊔

A.3.3 Near-monotone functions

A function c : X → R is callednear-monotoneif the sublevel set,Sc(r) := {x : c(x) ≤r} is finite for eachr < supx∈X c(x). In applications the functionc is typically a costfunction, and hence the near monotone assumption is the natural condition that largestates have relatively high cost.

The functionc = 1{x∗}c is near monotone sinceSc(r) consists of the singleton{x∗} for r ∈ [0, 1), and it is empty forr < 0. A solution to Poisson’s equation withthis forcing function can be constructed based on the samplepath formula (A.16),

h(x) = Ex

[τx∗−1∑

t=0

1{x∗}c(X(t)) − π({x∗}c)]

= (1 − π({x∗}c)Ex[τx∗ ] − 1x∗(x) = π(x∗)Ex[σx∗ ]

(A.33)

The last equality follows from the formulaπ(x∗)Ex∗ [τx∗ ] = 1 (see (A.15)) and thedefinitionσx∗ = 0 whenX(0) = x∗.

The fact thath is bounded from below is a special case of the following generalresult.

Proposition A.3.11. Suppose thatc is near monotone withη = π(c) <∞. Then,

(i) The relative value functionh given in (A.31) is uniformly bounded from be-low, finite-valued onXx∗ , and solves Poisson’s equation on the possibly larger setXh = {x ∈ X : h(x) <∞}.

(ii) Suppose there exists a non-negative valued function satisfying g(x) < ∞ forsomex ∈ Xx∗, and the Poisson inequality,

Dg (x) ≤ −c(x) + η, x ∈ X. (A.34)

Theng(x) = h(x) + ν(g) for x ∈ Xx∗, whereh is given in (A.31). Consequently,g solves Poisson’s equation onXx∗ .


Proof. Note that ifη = supx∈X c(x) thenc(x) ≡ η on Xx∗, so we may takeh ≡ 1 tosolve Poisson’s equation.

We henceforth assume thatη < supx∈X c(x), and defineS = {x ∈ X : c(x) ≤ η}.This set is finite sincec is near-monotone. We have the obvious boundc(x) ≥ −η1S(x)for x ∈ X, and hence

h(x) ≥ −ηGR1S (x), x ∈ X.

LemmaA.3.7and (A.30) imply thatGR1S is a bounded function onX. This completesthe proof thath is bounded from below, and PropositionA.3.10establishes Poisson’sequation.

To prove (ii) we maintain the notation used in PropositionA.3.10. On applyingLemmaA.3.6 we can assume without loss of generality that the pair(s, ν) used in thedefinition ofG are non-zero onXx∗ . Note first of all that by the resolvent equation,

Rg − g = RDg ≤ −Rc.We thus have the bound,

(R− s⊗ ν)g ≤ g −Rc− ν(g)s,

and hence for eachn ≥ 1,

0 ≤ (R − s⊗ ν)ng ≤ g −n−1∑

i=0

(R− s⊗ ν)iRc− ν(g)n−1∑

i=0

(R − s⊗ ν)is.

On lettingn ↑ ∞ this gives,

g ≥ GRc+ ν(g)Gs = h+ ν(g)h0,

whereh0 :=Gs. The functionh0 is identically one onXx∗ by PropositionA.3.9, whichimplies thatg − ν(g) ≥ h onXx∗ . Moreover, using the fact thatν(h) = 0,

ν(g − ν(g) − h) = ν(g − ν(g)) − ν(h) = 0.

Henceg − ν(g) − h = 0 a.e.[ν], and this implies thatg − ν(g) − h = 0 on Xx∗ asclaimed. ⊓⊔

Bounds on the potential matrixG are obtained in the following section to obtaincriteria for the existence of an invariant measure as well asexplicit bounds on therelative value function.

A.4 Criteria for stability

To compute the invariant measureπ it is necessary to compute the mean random sum(A.15), or invert a matrix, such as through an infinite sum as in (A.24). To verify theexistenceof an invariant measure is typically far easier.

In this section we describe Foster’s criterion to test for the existence of an invari-ant measure, and several variations on this approach which are collectively called theFoster-Lyapunov criteriafor stability. Each of these stability conditions can be inter-preted as a relaxation of the Poissoninequality(A.34).


x

∆(x)

X(t)

(x, V (x))

V (X(t))

Figure A.1:V (X(t)) is decreasing outside of the setS.

A.4.1 Foster’s criterion

Foster’s criterion is the simplest of the “Foster-Lyapunov” drift conditions for stability.It requires that for a non-negative valued functionV on X, a finite setS ⊂ X, andb <∞,

DV (x) ≤ −1 + b1S(x), x ∈ X. (V2)

This is precisely Condition (V3) (introduced at the start ofthis chapter) usingf ≡ 1.The construction of theLyapunov functionV is illustrated using the M/M/1 queue inSection3.3.

The existence of a solution to (V2) is equivalent to positiverecurrence. This issummarized in the following.

Theorem A.4.1. (Foster’s Criterion) The following are equivalent for ax∗-irreducibleMarkov chain

(i) An invariant measureπ exists.

(ii) There is a finite setS ⊂ X such thatEx[τS ] <∞ for x ∈ S.

(iii) There existsV : X → (0,∞], finite at somex0 ∈ X, a finite setS ⊂ X, andb <∞ such that Foster’s Criterion (V2) holds.

If (iii) holds then there existsbx∗ <∞ such that

Ex[τx∗ ] ≤ V (x) + bx∗ , x ∈ X.

Proof. We just prove the implication (iii)=⇒ (i). The remaining implications may befound in [367, Chapter 11].


Take any pair(s, ν) positive onXx∗ and satisfyingR ≥ s ⊗ ν. On applyingPropositionA.3.8 it is enough to shown thatµ(X) <∞ with µ = νG.

Letting f ≡ 1 we have under (V2)DV ≤ −f + b1S , and on applyingR to bothsides of this inequality we obtain using the resolvent equation (A.19), (R − I)V =RDV ≤ −Rf + bR1S , or on rearranging terms,

RV ≤ V −Rf + bR1S . (A.35)

From (A.35) we have(R−s⊗ν)V ≤ V −Rf+g, whereg :=bR1S . On iteratingthis inequality we obtain,

(R− s⊗ ν)2V ≤ (R − s⊗ ν)(V −Rf + g)

≤ V −Rf + g

−(R− s⊗ ν)Rf

+(R− s⊗ ν)g.

By induction we obtain for eachn ≥ 1,

0 ≤ (R − s⊗ ν)nV ≤ V −n−1∑

i=0

(R− s⊗ ν)iRf +n−1∑

i=0

(R− s⊗ ν)ig .

Rearranging terms then gives,

n−1∑

i=0

(R− s⊗ ν)iRf ≤ V +

n−1∑

i=0

(R− s⊗ ν)ig,

and thus from the definition (A.24) we obtain the bound,

GRf ≤ V +Gg. (A.36)

To obtain a bound on the final term in (A.36) recall thatg := bR1S. From itsdefinition we have,

GR = G[R − s⊗ ν] +G[s⊗ ν] = G− I + (Gs) ⊗ ν,

which shows thatGg = bGR1S ≤ b[G1S + ν(S)Gs].

This is uniformly bounded overX by LemmaA.3.7. Sincef ≡ 1 the bound (A.36)implies thatGRf (x) = G(x,X) ≤ V (x) + b1, x ∈ X, with b1 an upper bound onGg.

Integrating both sides of the bound (A.36) with respect toν gives,

µ(X) =∑

x∈X

ν(x)G(x,X) ≤ ν(V ) + ν(g).

The minorization and the drift inequality (A.35) give

sν(V ) = (s⊗ ν)(V ) ≤ RV ≤ V − 1 + g,


which establishes finiteness ofν(V ), and the bound,

ν(V ) ≤ infx∈X

V (x) − 1 + g(x)

s(x).

⊓⊔

The following result illustrates the geometric considerations that may be requiredin the construction of a Lyapunov function, based on the relationship between the gra-dient∇V (x), and thedrift vector field∆: X → R

ℓ defined by

∆(x) := E[X(t+ 1) −X(t) | X(t) = x], x ∈ X. (A.37)

This geometry is illustrated in FigureA.1 based on the following proposition.

Proposition A.4.2. Consider a Markov chain onX ⊂ Zℓ+, and a C1 function

V : Rℓ → R+ satisfying the following conditions:

(a) The chain is skip-free in the mean, in the sense that

bX := supx∈X

E[‖X(t + 1) −X(t)‖ | X(t) = x] <∞;

(b) There existsε0 > 0, b0 <∞, such that,

〈∆(y),∇V (x)〉 ≤ −(1 + ε0) + b0(1 + ‖x‖)−1‖x− y‖, x, y ∈ X. (A.38)

Then the functionV solves Foster’s criterion (V2).

Proof. This is an application of the Mean Value Theorem which asserts that there existsa stateX ∈ R

ℓ on the line segment connectingX(t) andX(t+ 1) with,

V (X(t+ 1)) = V (X(t)) + 〈∇V (X), (X(t + 1) −X(t))〉,

from which the following bound follows:

V (X(t+ 1)) ≤ V (X(t)) − (1 + ε0) + b0(1 + ‖X(t)‖)−1‖X(t+ 1) −X(t)‖

Under the skip-free assumption this shows that

DV (x) = E[V (X(t+1)−V (X(t)) | X(t) = x] ≤ −(1+ε0)+b0(1+‖x‖)−1bX , ‖x‖ ≥ n0.

Hence Foster’s Criterion is satisfied with the finite set,S = {x ∈ X : (1+‖x‖)−1bX ≥ε0}. ⊓⊔

sean

Sticky Note

This result and all that lean on it are in need of work! I should NOT write <Delta(y), nabla V(x)> < ...I need a bound on <Delta(x), nabla V(x)>and a separate bound on Delta(x)-Delta(x+y)


A.4.2 Criteria for finite moments

We now turn to the issue of performance bounds based on the discounted-cost defined in(A.2) or the average costη = π(c) for a cost functionc : X → R+. We also introducemartingale methods to obtain performance bounds. We let{Ft : t ≥ 0} denote thefiltration, or history generated by the chain,

Ft := σ{X(0), . . . ,X(t)}, t ≥ 0.

Recall that a random variableτ taking values inZ+ is called astopping timeif for eacht ≥ 0,

{τ = t} ∈ Ft.That is, by observing the processX on the time interval[0, t] it is possible to determinewhether or notτ = t.

The Comparison Theorem is the most common approach to obtaining bounds onexpectations involving stopping times.

Theorem A.4.3. (Comparison Theorem) Suppose that the non-negative functionsV, f, g satisfy the bound,

DV ≤ −f + g. x ∈ X. (A.39)

Then for eachx ∈ X and any stopping timeτ we have

Ex

[τ−1∑

t=0

f(X(t))]≤ V (x) + Ex

[τ−1∑

t=0

g(X(t))].

Proof. DefineM(0) = V (X(0)), and forn ≥ 1,

M(n) = V (X(n)) +

n−1∑

t=0

(f(X(t)) − g(X(t))).

The assumed inequality can be expressed,

E[V (X(t + 1)) | Ft] ≤ V (X(t)) − f(X(t)) + g(X(t)), t ≥ 0,

which shows that the stochastic processM is asuper-martingale,

E[M(n + 1) | Fn] ≤M(n), n ≥ 0.

Define forN ≥ 1,

τN = min{t ≤ τ : t+ V (X(t)) + f(X(t)) + g(X(t)) ≥ N}.

This is also a stopping time. The processM is uniformly bounded below by−N2

on the time-interval(0, . . . , τN − 1), and it then follows from the super-martingaleproperty that


E[M(τN )] ≤ E[M(0)] = V (x), N ≥ 1.

From the definition ofM we thus obtain the desired conclusion withτ replaced byτN : For each initial conditionX(0) = x,

Ex

[τN−1∑

t=0

f(X(t))]≤ V (x) + Ex

[τN−1∑

t=0

g(X(t))].

The result then follows from the Monotone Convergence Theorem since we haveτN ↑ τ asN → ∞. ⊓⊔

In view of the Comparison Theorem, to boundπ(c) we search for a solution to (V3)or (A.39) with |c| ≤ f . The existence of a solution to either of these drift inequalitiesis closely related to the following stability condition,

Definition A.4.1. Regularity

Suppose thatX is a x∗-irreducible Markov chain, and thatc : X → R+ is a givenfunction. The chain is calledc-regular if the following cost over ay-cycle is finite foreach initial conditionx ∈ X, and eachy ∈ Xx∗:

Ex

[τy−1∑

t=0

c(X(t))]<∞.

Proposition A.4.4. Suppose that the functionc : X → R satisfiesc(x) ≥ 1 outside ofsome finite set. Then,

(i) If X is c-regular then it is positive Harris recurrent andπ(c) <∞.

(ii) Conversely, ifπ(c) < ∞ then the chain restricted to the support ofπ is c-regular.

Proof. The result follows from [367, Theorem 14.0.1]. To prove (i) observe thatX isHarris recurrent sincePx{τx∗ <∞} = 1 for all x ∈ X when the chain isc-regular. Wehave positivity andπ(c) <∞ based on the representation (A.15). ⊓⊔

Criteria forc-regularity will be established through operator manipulations similarto those used in the proof of TheoremA.4.1 based on the following refinement ofFoster’s Criterion: For a non-negative valued functionV on X, a finite setS ⊂ X,b <∞, and a functionf : X → [1,∞),

DV (x) ≤ −f(x) + b1S(x), x ∈ X. (V3)

The functionf is interpreted as a bounding function. In TheoremA.4.5 we considerπ(c) for functionsc bounded byf in the sense that,

‖c‖f := supx∈X

|c(x)|f(x)

<∞. (A.40)


Theorem A.4.5. Suppose thatX is x∗-irreducible, and that there existsV : X →(0,∞), f : X → [1,∞), a finite setS ⊂ X, andb < ∞ such that (V3) holds. Then forany functionc : X → R+ satisfying‖c‖f ≤ 1,

(i) The average cost satisfies the uniform bound,

ηx = π(c) ≤ b <∞, x ∈ X.

(ii) The discounted-cost value function satisfies the followinguniform bound, forany given discount parameterγ > 0,

hγ(x) ≤ V (x) + bγ−1, x ∈ X.

(iii) There exists a solution to Poisson’s equation satisfying, for someb1 <∞,

h(x) ≤ V (x) + b1, x ∈ X.

Proof. Observe that (ii) and the definition (A.6) imply (i).To prove (ii) we apply the resolvent equation,

PRγ = RγP = (1 + γ)Rγ − I. (A.41)

Equation (A.41) is a restatement of Equation (A.29). Consequently, under (V3),

(1 + γ)RγV − V = RγPV ≤ Rγ [V − f + b1S ].

Rearranging terms givesRγf + γRγV ≤ V + bRγ1S. This establishes (ii) sinceRγ1S (x) ≤ Rγ(x,X) ≤ γ−1 for x ∈ X.

We now prove (iii). Recall that the measureµ = νG is finite and invariant sincewe may apply TheoremA.4.1when the chain isx∗-irreducible. We shall prove that thefunctionh = GRc given in (A.31) satisfies the desired upper bound.

The proof of the implication (iii)=⇒ (i) in TheoremA.4.1 was based upon thebound (A.36),

GRf ≤ V +Gg,

whereg := bR1S. Although it was assumed there thatf ≡ 1, the same steps lead tothis bound for generalf ≥ 1 under (V3). Consequently, since0 ≤ c ≤ f ,

GRc ≤ GRf ≤ V +Gg.

Part (iii) follows from this bound and LemmaA.3.7with b1 := supGg (x) <∞. ⊓⊔

PropositionA.4.2can be extended to provide the following criterion for finitemo-ments in a skip-free Markov chain:

Proposition A.4.6. Consider a Markov chain onX ⊂ Rℓ, and a C1 function

V : Rℓ → R+ satisfying the following conditions:


(i) The chain is skip-free in mean-square:

bX2 := supx∈X

E[‖X(t + 1) −X(t)‖2 | X(t) = x] <∞;

(ii) There existsb0 <∞ such that,

〈∆(y),∇V (x)〉 ≤ −‖x‖ + b0‖x− y‖2, x, y ∈ X. (A.42)

Then the functionV solves (V3) withf(x) = 1 + 12‖x‖. ⊓⊔

A.4.3 State-dependent drift

In this section we consider consequences of state-dependent drift conditions of the form

∑

y∈X

Pn(x)(x, y)V (y) ≤ g[V (x), n(x)], x ∈ Sc, (A.43)

wheren(x) is a function fromX to Z+, g is a function depending on which type ofstability we seek to establish, andS is a finite set.

The functionn(x) here provides the state-dependence of the drift conditions, sincefrom anyx we must waitn(x) steps for the drift to be negative.

In order to develop results in this framework we work with a sampled chainX.Usingn(x) we define the new transition law{P (x,A)} by

P (x,A) = Pn(x)(x,A), x ∈ X, A ⊂ X, (A.44)

and letX denote a Markov chain with this transition law. This Markov chain can beconstructed explicitly as follows. The timen(x) is a (trivial) stopping time. Let{nk}denote its iterates: That is, along any sample path,n0 = 0, n1 = n(x) and

nk+1 = nk + n(X(nk)).

Then it follows from the strong Markov property that

X(k) = X(nk), k ≥ 0 (A.45)

is a Markov chain with transition lawP .Let Fk = Fnk

be theσ-field generated by the events “beforenk”: that is,

Fk := {A : A ∩ {nk ≤ n} ∈ Fn, n ≥ 0}.

We let τS denote the first return time toS for the chainX. The timenk and the event{τS ≥ k} areFk−1-measurable for anyS ⊂ X.

The integernτS is a particular time at which the original chain visits the set S.Minimality implies the bound,

nτS ≥ τS. (A.46)


By adding the lengths of the sampling timesnk along a sample path for the sampledchain, the timenτS can be expressed as the sum,

nτS =

τS−1∑

k=0

n(X(k)). (A.47)

These relations enable us to first apply the drift condition (A.43) to bound the index atwhich X reachesS, and thereby bound the hitting time for the original chain.

We prove here a state-dependent criterion for positive recurrence. Generalizationsare described in the Notes section in Chapter10, and Theorem10.0.1contains strength-ened conclusions for the CRW network model.

Theorem A.4.7. Suppose thatX is a x∗-irreducible chain onX, and letn(x) be afunction fromX to Z+. The chain is positive Harris recurrent if there exists somefinitesetS, a functionV : X → R+, and a finite constantb satisfying

∑

y∈X

Pn(x)(x, y)V (y) ≤ V (x) − n(x) + b1S(x), x ∈ X (A.48)

in which case for allxEx[τS ] ≤ V (x) + b. (A.49)

Proof. The state-dependent drift criterion for positive recurrence is a direct conse-quence of thef -regularity results of TheoremA.4.3, which tell us that without anyirreducibility or other conditions onX, if f is a non-negative function and

∑

y∈X

P (x, y)V (y) ≤ V (x) − f(x) + b1S(x), x ∈ X (A.50)

for some setS then for eachx ∈ X

Ex

[τS−1∑

t=0

f(X(t))]≤ V (x) + b. (A.51)

We now apply this result to the chainX defined in (A.45). From (A.48) we canuse (A.51) for X, with f(x) taken asn(x), to deduce that

Ex

[τS−1∑

k=0

n(X(k))]≤ V (x) + b. (A.52)

Thus from (A.46,A.47) we obtain the bound (A.49). TheoremA.4.1 implies thatX ispositive Harris. ⊓⊔

A.5 Ergodic theorems and coupling

The existence of a Lyapunov function satisfying (V3) leads to the ergodic theorems(1.23), and refinements of this drift inequality lead to stronger results. These results arebased on the coupling method described next.


A.5.1 Coupling

Couplingis a way of comparing the behavior of the process of interestX with anotherprocessY which is already understood. For example, ifY is taken as the stationaryversion of the process, withY (0) ∼ π, we then have the trivial mean ergodic theorem,

limt→∞

E[c(Y (t))] = E[c(Y (t0))], t0 ≥ 0 .

This leads to a corresponding ergodic theorem forX provided the two processes couplein a suitably strong sense.

To precisely define coupling we define a bivariate process,

Ψ(t) =

(X(t)

Y (t)

), t ≥ 0,

whereX andY are two copies of the chain with transition probabilityP , and differentinitial conditions. It is assumed throughout thatX is x∗-irreducible, and we define thecoupling timefor Ψ as the first time both chains reachx∗ simultaneously,

T = min(t ≥ 1 : X(t) = Y (t) = x∗) = min(t : Ψ(t) =

(x∗x∗

)).

To give a full statistical description ofΨ we need to explain howX andY arerelated. We assume a form of conditional independence fork ≤ T :

P{Ψ(t+ 1) = (x1, y1)T | Ψ(0), . . . ,Ψ(t);Ψ(t) = (x0, y0)

T, T > t}

= P (x0, x1)P (y0, y1).(A.53)

It is assumed that the chains coellesce at timeT , so thatX(t) = Y (t) for t ≥ T .The processΨ is not itself Markov since givenΨ(t) = (x, x)T with x 6= x∗ it is

impossible to know ifT ≤ t. However, by appending the indicator function of thisevent we obtain a Markov chain denoted,

Ψ∗(t) = (Ψ(t),1{T ≤ t}),

with state spaceX∗ = X × X × {0, 1}. The subsetX × X × {1} is absorbing for thischain.

The following two propositions allow us to infer propertiesof Ψ∗ based on prop-

erties ofX. The proof of PropositionA.5.1 is immediate from the definitions.

Proposition A.5.1. Suppose thatX satisfies (V3) withf coercive. Then (V3) holdsfor the bivariate chainΨ∗ in the form,

E[V∗(Ψ(t+ 1)) | Ψ(t) = (x, y)T] ≤ V∗(x, y) − f∗(x, y) + b∗,

with V∗(x, y) = V (x) + V (y), f∗(x, y) = f(x) + f(y), andb∗ = 2b. Consequently,there existsb0 <∞ such that,

E

[T−1∑

t=0

(f(X(t)) + f(Y (t))

)]≤ 2[V (x) + V (y)] + b0, x, y ∈ X.

⊓⊔


A necessary condition for the Mean Ergodic Theorem for arbitrary initial condi-tions is aperiodicity. Similarly, aperiodicity is both necessary and sufficient forx∗∗-irreducibility of Ψ∗ with x∗∗ := (x∗, x∗, 1)T ∈ X

∗:

Proposition A.5.2. Suppose thatX is x∗-irreducible and aperiodic. Then the bivari-ate chain isx∗∗-irreducible and aperiodic.

Proof. Fix anyx, y ∈ X, and define

n0 = min{n ≥ 0 : Pn(x, x∗)Pn(y, x∗) > 0}.

The minimum is finite sinceX is x∗-irreducible and aperiodic. We haveP{T ≤ n} =0 for n < n0 and by the construction ofΨ,

P{T = n0} = P{Ψ(n0) = (x∗, x∗)T | T ≥ n0} = Pn0(x, x∗)Pn0(y, x∗) > 0.

This establishesx∗∗-irreducibility.Forn ≥ n0 we have,

P{Ψ∗(n) = x∗∗} ≥ P{T = n0, Ψ∗(n) = x∗∗} = Pn0(x, x∗)Pn0(y, x∗)Pn−n0(x∗, x∗).

The right hand side is positive for alln ≥ 0 sufficiently large sinceX is aperiodic. ⊓⊔

A.5.2 Mean ergodic theorem

A mean ergodic theorem is obtained based upon the followingcoupling inequality:

Proposition A.5.3. For any giveng : X → R we have,∣∣E[g(X(t))] − E[g(Y (t))]

∣∣ ≤ E[(|g(X(t))| + |g(Y (t))|)1(T > t)].

If Y (0) ∼ π so thatY is stationary we thus obtain,

|E[g(X(t))] − π(g)| ≤ E[(|g(X(t))| + |g(Y (t))|)1(T > t)].

Proof. The differenceg(X(t)) − g(Y (t)) is zerofor t ≥ T . ⊓⊔

Thef -total variation normof a signed measureµ onX is defined by

‖µ‖f = sup{|µ(g)| : ‖g‖f ≤ 1}.

Whenf ≡ 1 then this is exactly twice thetotal-variation norm: For any two probabilitymeasuresπ, µ,

‖µ− π‖tv := supA⊂X

|µ(A) − π(A)|.

Theorem A.5.4. Suppose thatX is aperiodic, and that the assumptions of Theo-remA.4.5hold. Then,


(i) ‖P t(x, · ) − π‖f → 0 ast→ ∞ for eachx ∈ X.

(ii) There existsb0 <∞ such that for eachx, y ∈ X,

∞∑

t=0

‖P t(x, · ) − P t(y, · )‖f ≤ 2[V (x) + V (y)] + b0.

(iii) If in additionπ(V ) <∞, then there existsb1 <∞ such that

∞∑

t=0

‖P t(x, · ) − π‖f ≤ 2V (x) + b1.

The coupling inequality is only useful if we can obtain a bound on the expectationE[|g(X(t))|1(T > t)]. The following result shows that this vanishes whenX andY

are each stationary.

Lemma A.5.5. Suppose thatX is aperiodic, and that the assumptions of Theo-remA.4.5hold. Assume moreover thatX(0) andY (0) each have distributionπ, andthatπ(|g|) <∞. Then,

limt→∞

E[(|g(X(t))| + |g(Y (t))|)1(T > t)] = 0.

Proof. Suppose thatX, Y are defined on the two-sided time-interval with marginaldistributionπ. It is assumed that these processes are independent on{0,−1,−2, . . . }.By stationarity we can write,

Eπ[|g(X(t))|1(T > t)] = Eπ[|g(X(t))|1{Ψ(i) 6= (x∗, x∗)T, i = 0, . . . , t}]

= Eπ[|g(X(0))|1{Ψ(i) 6= (x∗, x∗)T, i = 0,−1, . . . ,−t}] .

The expression within the expectation on the right hand sidevanishes ast → ∞ withprobability one by(x∗, x∗)T-irreducibility of the stationary process{Ψ(−t) : t ∈ Z+}.The Dominated Convergence Theorem then implies that

limt→∞

E[|g(X(t))|1(T > t)] = Eπ[|g(X(0))|1{Ψ(i) 6= (x∗, x∗)T, i = 0,−1, . . . ,−t}] = 0.

Repeating the same steps withX replaced byY we obtain the analogous limit bysymmetry. ⊓⊔

Proof of TheoremA.5.4. We first prove (ii). From the coupling inequality we have,with X(0) = x,X◦(0) = y,

|P tg (x) − P tg (y)| = |E[g(X(t))] − E[g(Y (t))]|

≤ E[(|g(X(t))| + |g(Y (t))|

)1(T > t)

]

≤ ‖g‖fE[(f(X(t)) + f(Y (t))

)1(T > t)

]


Taking the supremum over allg satisfying‖g‖f ≤ 1 then gives,

‖P t(x, · ) − P t(y, · )‖f ≤ E[(f(X(t)) + f(Y (t))

)1(T > t)

], (A.54)

so that on summing overt,

∞∑

t=0

‖P t(x, · ) − P t(y, · )‖f ≤∞∑

t=0

E[(f(X(t)) + f(Y (t))

)1(T > t)

]

= E

[T−1∑

t=0

(f(X(t)) + f(Y (t))

)].

Applying PropositionA.5.1completes the proof of (ii).To see (iii) observe that,

∑

y∈X

π(y)|P tg (x) − P tg (y)| ≥∣∣∣∑

y∈X

π(y)[P tg (x) − P tg (y)]∣∣∣ = |P tg (x) − π(g)|.

Hence by (ii) we obtain (iii) withb1 = b0 + 2π(V ).

Finally we prove (i). Note that we only need establish the mean ergodic theorem in(i) for a single initial conditionx0 ∈ X. To see this, first note that we have the triangleinequality,

‖P t(x, · )−π( · )‖f ≤ ‖P t(x, · )−P t(x0, · )‖f+‖P t(x0, · )−π( · )‖f , x, x0 ∈ X.

From this bound and Part (ii) we obtain,

lim supt→∞

‖P t(x, · ) − π( · )‖f ≤ lim supt→∞

‖P t(x0, · ) − π( · )‖f .

Exactly as in (A.54) we have, withX(0) = x0 andY (0) ∼ π,

‖P t(x0, · ) − π( · )‖f ≤ E[(f(X(t)) + f(Y (t))

)1(T > t)

]. (A.55)

We are left to show that the right hand side converges to zero for somex0. ApplyingLemmaA.5.5we obtain,

limt→∞

∑

x,y

π(x)π(y)E[[f(X(t)) + f(Y (t))]1(T > t) | X(0) = x, Y (0) = y

]= 0.

It follows that the right hand side of (A.55) vanishes ast → ∞ whenX(0) = x0 andY (0) ∼ π. ⊓⊔

A.5.3 Geometric ergodicity

TheoremA.5.4 provides a mean ergodic theorem based on the coupling timeT . If wecan control the tails of the coupling timeT then we obtain a rate of convergence ofP t(x, · ) to π.

The chain is calledgeometrically recurrentif Ex∗ [exp(ετx∗)] < ∞ for someε >0. For such chains it is shown in TheoremA.5.6that for a.e.[π] initial conditionx ∈ X,the total variation norm vanishes geometrically fast.


Theorem A.5.6. The following are equivalent for an aperiodic,x∗-irreducible Markovchain:

(i) The chain is geometrically recurrent.

(ii) There existsV : X → [1,∞] withV (x0) <∞ for somex0 ∈ X, ε > 0, b <∞,and a finite setS ⊂ X such that

DV (x) ≤ −εV (x) + b1S(x), x ∈ X. (V4)

(iii) For somer > 1,

∞∑

n=0

‖Pn(x∗, ·) − π(·)‖1rn <∞.

If any of the above conditions hold, then withV given in (ii), we can findr0 > 1 andb <∞ such that the stronger mean ergodic theorem holds: For eachx ∈ X, t ∈ Z+,

‖P t(x, · ) − π( · )‖V := sup|g|≤V

∣∣Ex[g(X(t)) − π(t)]∣∣ ≤ br−t0 V (x). (A.56)

⊓⊔

In applications TheoremA.5.6 is typically applied by constructing a solution tothe drift inequality (V4) to deduce the ergodic theorem in (A.56). The following resultshows that (V4) is not that much stronger than Foster’s criterion.

Proposition A.5.7. Suppose that the Markov chainX satisfies the following threeconditions:

(i) There existsV : X → (0,∞), a finite setS ⊂ X, andb < ∞ such that Foster’sCriterion (V2) holds.

(ii) The functionV is uniformly Lipschitz,

lV := sup{|V (x) − V (y)| : x, y ∈ X, ‖x− y‖ ≤ 1} <∞.

(iii) For someβ0 > 0, b1 <∞,

b1 := supx∈X

Ex[eβ0‖X(1)−X(0)‖] <∞.

Then, there existsε > 0 such that the controlled process isVε-uniformly ergodic withVε = exp(εV ).

Proof. Let ∆V = V (X(1)) − V (X(0)), so thatEx[∆V ] ≤ −1 + b1S(x) under (V2).Using a second order Taylor expansion we obtain for eachx andε > 0,


[Vε(x)]−1PVε (x) = Ex

[exp

(ε∆V

)]

= Ex

[1 + ε∆V + 1

2ε2∆2

V exp(εϑx∆V

)]

≤ 1 + ε(−1 + b1S(x)

)+ 1

2ε2Ex

[∆2V exp

(εϑx∆V

)](A.57)

whereϑx ∈ [0, 1]. Applying the assumed Lipschitz bound and the bound12z

2 ≤ ez forz ≥ 0 we obtain, for anya > 0,

12∆2

V exp(εϑx∆V

)≤ a−2 exp

((a+ ε)

∣∣∆V

∣∣)

≤ a−2 exp((a+ ε)lV

∥∥X(1) −X(0)∥∥)

Settinga = ε1/3 and restrictingε > 0 so that(a + ε)lV ≤ β0, the bound (A.57) and(iii) then give,

[Vε(x)]−1PVε (x) ≤ (1 − ε) + εb1S(x) + ε4/3b1

This proves the theorem, since we have1− ε+ ε4/3b1 < 1 for sufficiently smallε > 0,and thus (V4) holds forVε. ⊓⊔

A.5.4 Sample paths and limit theorems

We conclude this section with a look at the sample path behavior of partial sums,

Sg(n) :=

n−1∑

t=0

g(X(t)) (A.58)

We focus on two limit theorems under (V3):

LLN TheStrong Law of Large Numbersholds for a functiong if for each initial con-dition,

limn→∞

1

nSg(n) = π(g) a.s.. (A.59)

CLT The Central Limit Theoremholds forg if there exists a constant0 < σ2g < ∞

such that for each initial conditionx ∈ X,

limn→∞

Px

{(nσ2

g)−1/2Sg(n) ≤ t

}=

∫ t

−∞

1√2πe−x

2/2 dx

whereg = g − π(g). That is, asn→ ∞,

(nσ2g)

−1/2Sg(n)w−→ N(0, 1).

The LLN is a simple consequence of the coupling techniques already used to provethe mean ergodic theorem when the chain is aperiodic and satisfies (V3). A slightlydifferent form of coupling can be used when the chain is periodic. There is only roomfor a survey of theory surrounding the CLT, which is most elegantly approached usingmartingale methods. A relatively complete treatement may be found in [367], and themore recent survey [282].

The following versions of the LLN and CLT are based on Theorem17.0.1 of [367].


Theorem A.5.8. Suppose thatX is positive Harris recurrent and that the functiongsatisfiesπ(|g|) <∞. Then the LLN holds for this function.

If moreover (V4) holds withg2 ∈ LV∞ then,

(i) Letting g denote the centered functiong = g −∫g dπ, the constant

σ2g := Eπ[g

2(X(0))] + 2

∞∑

t=1

Eπ[g(X(0))g(X(t))] (A.60)

is well defined, non-negative and finite, and

limn→∞

1

nEπ

[(Sg(n)

)2]= σ2

g . (A.61)

(ii) If σ2g = 0 then for each initial condition,

limn→∞

1√nSg(n) = 0 a.s..

(iii) If σ2g > 0 then the CLT holds for the functiong.

⊓⊔

The proof of the theorem in [367] is based on consideration of the martingale,

Mg(t) := g(X(t)) − g(X(0)) +

t−1∑

i=0

g(X(i)), t ≥ 1,

with Mg(0) := 0. This is a martingale since Poisson’s equationP g = g − g gives,

E[g(X(t)) | X(0), . . . ,X(t− 1)] = g(X(t− 1)) − g(X(t− 1)),

so that,E[Mg(t) | X(0), . . . ,X(t− 1)] = Mg(t− 1).

The proof of the CLT is based on the representationSg(t) = Mg(t) + g(X(t)) −g(X(0)), combined with limit theory for martingales, and the boundson solutions toPoisson’s equation given in TheoremA.4.5.

An alternate representation for the asymptotic variance can be obtained through thealternate representation for the martingale as the partialsums of a martingale differencesequence,

Mg(t) =t∑

i=1

∆g(i), t ≥ 1,

with {∆g(t) := g(X(t)) − g(X(t − 1)) + g(X(t − 1))}. Based on the martingaledifference property,

E[∆g(t) | Ft−1] = 0, t ≥ 1,


it follows that these random variables are uncorrelated, sothat the variance ofM g canbe expressed as the sum,

E[(Mg(t))2] =

t∑

i=1

E[(∆g(i))2], t ≥ 1.

In this way it can be shown that the asymptotic variance is expressed as the steady-statevariance of∆g(i). For a proof of (A.62) (under conditions much weaker than assumedin PropositionA.5.9) see [367, Theorem 17.5.3].

Proposition A.5.9. Under the assumptions of TheoremA.5.8the asymptotic variancecan be expressed,

σ2g = Eπ[(∆g(0))

2] = π(g2 − (P g)2) = π(2gg − g2). (A.62)

⊓⊔

A.6 Converse theorems

The aim of SectionA.5 was to explore the application of (V3) and the coupling method.We now explain why (V3) isnecessaryas well as sufficient for these ergodic theoremsto hold.

Converse theorems abound in the stability theory of Markov chains. Theo-rem A.6.1 contains one such result: Ifπ(f) < ∞ then there is a solution to (V3),defined as a certain “value function”. For ax∗-irreducible chain the solution takes theform,

PVf = Vf − f + bf1x∗ , (A.63)

where the Lyapunov functionVf defined in (A.64) is interpreted as the ‘cost to reachthe statex∗’. The identity (A.63) is a dynamic programming equation for theshortestpath problemdescribed in Section9.4.1.

Theorem A.6.1. Suppose thatX is ax∗-irreducible, positive recurrent Markov chainonX and thatπ(f) <∞, wheref : X → [1,∞] is given. Then, with

Vf (x) := Ex

[σx∗∑

t=0

f(X(t))], x ∈ X, (A.64)

the following conclusions hold:

(i) The setXf = {x : Vf (x) <∞} is non-empty and absorbing:

P (x,Xf ) = 1 for all x ∈ Xf .

(ii) The identity (A.63) holds withbf := Ex∗

[ τx∗∑

t=1

f(X(t))]<∞.


(iii) For x ∈ Xf ,

limt→∞

1

tEx[Vf (X(t))] = lim

t→∞Ex[Vf (X(t))1{τx∗ > t}] = 0.

Proof. Applying the Markov property, we obtain for eachx ∈ X,

PVf (x) = Ex

[EX(1)

[σx∗∑

t=0

f(X(t))]]

= Ex

[E

[ τx∗∑

t=1

f(X(t)) | X(0),X(1)]]

= Ex

[ τx∗∑

t=1

f(X(t))]

= Ex

[ τx∗∑

t=0

f(X(t))]− f(x), x ∈ X.

On noting thatσx∗ = τx∗ for x 6= x∗, the identity above implies the desired identity in(ii).

Based on (ii) it follows thatXf is absorbing. It is non-empty since it containsx∗,which proves (i).

To prove the first limit in (iii) we iterate the idenitity in (ii) to obtain,

Ex[Vf (X(t))] = P tVf (x) = Vf (x) +t−1∑

k=0

[−P kf (x) + bfPk(x, x∗)], t ≥ 1.

Dividing by t and lettingt→ ∞ we obtain, wheneverVf (x) <∞,

limt→∞

1

tEx[Vf (X(t))] = lim

t→∞1

t

t−1∑

k=0

[−P kf (x) + bfPk(x, x∗)].

Applying (i) and (ii) we conclude that the chain can be restricted toXf , and the re-stricted process satisfies (V3). Consequently, the conclusions of the Mean ErgodicTheoremA.5.4hold for initial conditionsx ∈ Xf , which gives

limt→∞

1

tEx[Vf (X(t))] = −π(f) + bfπ(x∗),

and the right hand side is zero for by (ii).By the definition ofVf and the Markov property we have for eachm ≥ 1,

Vf (X(m)) = EX(m)

[σx∗∑

t=0

f(X(t))]

= E

[ τx∗∑

t=m

f(X(t)) | Fm], on{τx∗ ≥ m}.

(A.65)


Moreover, the event{τx∗ ≥ m} is Fm measurable. That is, one can determine ifX(t) = x∗ for somet ∈ {1, . . . ,m} based onFm :=σ{X(t) : t ≤ m}. Consequently,by the smoothing property of the conditional expectation,

Ex[Vf (X(m))1{τx∗ ≥ m}] = E

[1{τx∗ ≥ m}E

[ τx∗∑

t=m

f(X(t)) | Fm]]

= E

[1{τx∗ ≥ m}

τx∗∑

t=m

f(X(t))]≤ E

[ τx∗∑

t=m

f(X(t))]

If Vf (x) < ∞, then the right hand side vanishes asm → ∞ by the Dominated Con-vergence Theorem. This proves the second limit in (iii). ⊓⊔

Proposition A.6.2. Suppose that the assumptions of TheoremA.6.1hold: X is ax∗-irreducible, positive recurrent Markov chain onX with π(f) < ∞. Suppose that there

existsg ∈ Lf∞ andh ∈ LVf∞ satisfying,

Ph = h− g.

Thenπ(g) = 0, so thath is a solution to Poisson’s equation with forcing functiong.Moreover, forx ∈ Xf ,

h(x) − h(x∗) = Ex

[τx∗−1∑

t=0

g(X(t))]. (A.66)

Proof. LetMh(t) = h(X(t)) − h(X(0)) +∑t−1

k=0 g(X(k)), t ≥ 1,Mh(0) = 0. ThenMh is a zero-mean martingale,

E[Mh(t)] = 0, and E[Mh(t+ 1) | Ft] = Mh(t), t ≥ 0.

It follows that the stopped process is a martingale,

E[Mh(τx∗ ∧ (r + 1)) | Fr] = Mh(τx∗ ∧ r), r ≥ 0.

Consequently, for anyr,

0 = Ex[Mh(τx∗ ∧ r)] = Ex

[h(X(τx∗ ∧ r)) − h(X(0)) +

τx∗∧r−1∑

t=0

g(X(t))].

On rearranging terms and subtractingh(x∗) from both sides,

h(x) − h(x∗) = Ex

[[h(X(r)) − h(x∗)]1{τx∗ > r} +

τx∗∧r−1∑

t=0

g(X(t))], (A.67)

where we have used the fact thath(X(τx∗ ∧ t)) = h(x∗) on{τx∗ ≤ t}.

Applying TheoremA.6.1 (iii) and the assumption thath ∈ LVf∞ gives,


lim supr→∞

∣∣∣Ex[(h(X(r)) − h(x∗)

)1{τx∗ > r}

]∣∣∣

≤ (‖h‖Vf+ |h(x∗)|) lim sup

r→∞Ex[Vf (X(r))1{τx∗ > r}] = 0.

Hence by (A.67), for anyx ∈ Xf ,

h(x) − h(x∗) = limr→∞

Ex

[τx∗∧r−1∑

t=0

g(X(t))].

Exchanging the limit and expectation completes the proof. This exchange is justifiedby the Dominated Convergence Theorem wheneverVf (x) <∞ sinceg ∈ Lf∞. ⊓⊔

Documents

Markov Models - meyn.ece.ufl.edu€¦ · Markov Models This appendix describes stability theory and ergodic theory for Markov chains on a countable state space that provides foundations