CONVERGENCE ANALYSIS OF BIRTH-DEATH MARKOV CHAINS AND GIBBS SAMPLERS

By

TRUNG HA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2016

© 2016 Trung Ha

To my family

ACKNOWLEDGMENTS

Most of all, I owe my gratitude to my advisor, Dr. James Hobert. He has been a

great mentor and supporter. He gave me advice to take some very helpful courses for

my research. He suggested many research projects to me and gave me the freedom to

select my favorite ones. He was patient with my research progression when I needed

time to read some textbooks for my research. He strongly supported me when I had

problems with my visa, so I could keep my mind on my research.

I would like to thank my dissertation committee of Dr. Brett Presnell, Dr. Kshitij

Khare, and Dr. Taylor Stein. Dr. Presnell helped me to improve my notation. Dr. Khare

shared some important notes with me. Dr. Taylor Stein quickly helped me to set up my

exam schedule.

I would like to thank the Vietnam Education Foundation for giving me the chance to

attend a PhD program in the United States. I would also like to thank the Hanoi Institute

of Mathematics, especially my advisor Dr. Nguyen Dinh Cong, for supporting my plan to

study in the United States.

I would like to thank the faculty at the Department of Statistics at the University of

Florida for teaching me a great deal of knowledge in statistics.

I would like to thank Dr. Dinh Quang Luu (1947-2005) and my undergraduate

advisor, Dr. Bui Khoi Dam, for inspiring me in my field of study, probability and statistics.

Last but not least, this dissertation is dedicated to the memory of my late father and

my family, especially my wife and my children, for their endless love and support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION
   1.1 Background on General State Space Markov Chains
      1.1.1 Basic Definitions and Convergence Concepts
      1.1.2 Spectral Theory
      1.1.3 $C_p$ Class
      1.1.4 Gibbs Sampler
   1.2 Overview of the Remaining Chapters
      1.2.1 Chapter 2
      1.2.2 Chapter 3
      1.2.3 Chapter 4

2 CHARACTERIZATION OF COMPACTNESS FOR BIRTH-DEATH MARKOV OPERATORS WITH APPLICATIONS TO GIBBS SAMPLING
   2.1 Summary
   2.2 Compactness of Birth-Death Markov Operators
   2.3 Application to a Family of Gibbs Samplers
   2.4 Birth-Death Chains Are Not Uniformly Ergodic

3 SPECTRAL ANALYSIS OF GIBBS SAMPLERS FOR BAYESIAN LINEAR MIXED MODELS
   3.1 Summary
   3.2 The Models and the Gibbs Samplers
      3.2.1 Proper Prior
      3.2.2 Improper Prior
   3.3 Hobert & Geyer’s Gibbs Sampler Is Not Hilbert-Schmidt
   3.4 The Gibbs Sampler with Improper Priors (and Alternative Blocking) Is Not Trace Class

4 CHARACTERIZATION OF GEOMETRIC ERGODICITY FOR BIRTH-DEATH MARKOV CHAINS
   4.1 Summary
   4.2 Some Known Results on the Geometric Ergodicity of Birth-Death Chains
      4.2.1 Orthogonal Polynomial Method
      4.2.2 Spectral Method
   4.3 Drift Condition Method
      4.3.1 Geometric Ergodicity of Birth-Death Chains
      4.3.2 Application to a Family of Gibbs Samplers
      4.3.3 Geometric Ergodicity for a Family of Random Walks on Z
      4.3.4 Some Other Results

APPENDIX

A SOME LEMMAS AND EXAMPLES

B CHI-SQUARE DISTANCE

C F-GEOMETRIC ERGODICITY

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF FIGURES

Figure

1-1 Pr(U = i, V = j)

A-1 Graph in R^2

A-2 Graph in R^3

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CONVERGENCE ANALYSIS OF BIRTH-DEATH MARKOV CHAINS AND GIBBS SAMPLERS

By

Trung Ha

August 2016

Chair: James P. Hobert

Major: Statistics

Markov chain Monte Carlo (MCMC) is one of the most powerful computational tools

in statistics, and the Gibbs sampler (GS) is an important special case of MCMC. The

GS is often used to approximate Bayesian statistical estimators (posterior expectations).

While it is usually simple to apply the GS in practice, the Markov chain convergence

analysis that is required to ensure that the results are reasonable can be very difficult.

In particular, one must demonstrate that the underlying Markov chain converges at a

geometric rate. The two most popular methods of establishing this are the drift and

minorization method and the spectral method. We perform a spectral analysis of the

toy GS from Tan et al. (2013) by exploiting the fact that one of the marginal chains

associated with this GS is a birth-death Markov chain. In particular, we develop a

general necessary and sufficient condition for the compactness of birth-death Markov

operators, and use this to get a necessary and sufficient condition for compactness of

the Markov operator associated with the marginal Gibbs Markov chain. We note that

(under standard regularity conditions) compactness of a Markov operator implies a

geometric convergence rate for the corresponding Markov chain. We also use spectral

theory to study two different practically relevant GSs for Bayesian linear mixed models,

one with proper priors and a standard parametrization, and the other with improper

priors and an alternative parametrization. The GS for the model with proper priors is

known to be geometrically ergodic (Hobert and Geyer, 1998). We prove that neither of

the corresponding Markov operators is trace class. This is contrary to recent results

showing that several common GSs have trace class Markov operators. Finally, we

use drift methodology to find a necessary and sufficient condition for the geometric

ergodicity of a general birth-death Markov chain.

CHAPTER 1
INTRODUCTION

1.1 Background on General State Space Markov Chains

1.1.1 Basic Definitions and Convergence Concepts

Let $\Phi = \{X_i\}_{i=0}^{\infty}$ be an irreducible, aperiodic, and Harris positive recurrent Markov
chain on a countably generated measure space $(\mathsf{X}, \mathcal{F})$ with a Markov transition kernel
$P(x, dy)$ and a stationary distribution $\pi$, i.e.

\[
\pi(A) = \int_{\mathsf{X}} P(x, A)\, \pi(dx), \quad \forall A \in \mathcal{F}.
\]

Those conditions guarantee that $\Phi$ is ergodic (see Meyn and Tweedie, 2009, chap. 5 and
chap. 13), i.e.

\[
\|P^n(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}} \to 0, \quad \forall x \in \mathsf{X},
\]

where $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm and $P^n$ denotes the $n$-step Markov
transition kernel of $\Phi$. Therefore, for any $x \in \mathsf{X}$, we can approximate $\pi$ by
$P^n(x, \cdot)$ for large $n$.

Given a $\pi$-integrable function $f$ on the probability space $(\mathsf{X}, \mathcal{F}, \pi)$, we denote

\[
\pi f = \int_{\mathsf{X}} f \, d\pi.
\]

When $\Phi$ is ergodic with stationary distribution $\pi$, the strong law of large numbers (SLLN)
holds for any $\pi$-integrable function $f$ and initial distribution $\nu$ (see Meyn and Tweedie,
2009, chap. 17), i.e.

\[
\bar{f}_n := \frac{1}{n} \sum_{i=0}^{n-1} f(X_i) \to \pi f \quad P_\nu\text{-a.s.},
\]

where $P_\nu$ denotes the probability of events when the chain has initial distribution $\nu$. That
means we can approximate $\pi f$ by $\bar{f}_n$ for large $n$. We say that a $\pi$-integrable function $f$
satisfies a central limit theorem (CLT) if there exists some constant $\sigma^2 < \infty$ such that for
any initial distribution $\nu$,

\[
\frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} \left[ f(X_i) - \pi f \right] \xrightarrow{d} N(0, \sigma^2).
\]

Ergodicity does not tell us about the rate of convergence of $\|P^n(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}}$ and is
not sufficient for the CLT. One of the most popular ways to obtain both is to find
conditions under which $\|P^n(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}}$ converges at an exponential rate. $P$ is
geometrically ergodic if there exist a function $M : \mathsf{X} \to [0, \infty)$ and a constant $r \in [0, 1)$ such that

\[
\|P^n(x, \cdot) - \pi(\cdot)\|_{\mathrm{TV}} \le M(x)\, r^n, \quad \forall x \in \mathsf{X},\ \forall n \in \mathbb{N}.
\]

If $M(\cdot)$ is bounded, then the chain is called uniformly ergodic. Geometric ergodicity
for Markov chains on uncountable state spaces is usually established by building drift
and/or minorization conditions (Jones and Hobert, 2001; Meyn and Tweedie, 2009;
Roberts and Rosenthal, 2004). In practice, it is very difficult to construct drift conditions.
Consequently, very few practical Monte Carlo Markov chains have been shown to
be geometrically ergodic. Even when drift and minorization conditions can be
constructed, they typically lead to very poor bounds on $M(\cdot)$ and $r$ (Diaconis et al.,
2008). Another way to obtain geometric ergodicity is via spectral theory, as we explain in
the next section.
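The definition of geometric ergodicity can be made concrete on a finite chain. The following is a minimal sketch, not taken from the dissertation: the two-state transition matrix, the names `step`, `tv_distance`, and `tv_path`, and the convention that the total variation distance is sup_A |mu(A) - nu(A)| are all assumptions made for illustration. For this particular chain one can check by hand that the bound holds with M(x) = 0.6 and r = 0.5 when the chain starts in state 0.

```python
def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv_distance(mu, nu):
    """Total variation distance, taken here as sup_A |mu(A) - nu(A)| (an assumed convention)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def tv_path(P, pi, start, n_steps):
    """Return [ ||P^n(start, .) - pi||_TV for n = 0, ..., n_steps ]."""
    dist = [1.0 if i == start else 0.0 for i in range(len(P))]
    path = []
    for _ in range(n_steps + 1):
        path.append(tv_distance(dist, pi))
        dist = step(dist, P)
    return path


# Hypothetical two-state chain (illustration only, not from the text).
a, b = 0.3, 0.2
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]  # stationary distribution (0.4, 0.6)

tv = tv_path(P, pi, start=0, n_steps=10)
```

In this toy case the distance halves at every step, so the rate r coincides with the second eigenvalue 1 - a - b of the transition matrix, which anticipates the spectral viewpoint of the next section.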

1.1.2 Spectral Theory

Define the Hilbert spaces

\[
L^2(\pi) = \left\{ f \text{ is } \mathcal{F}\text{-measurable} : \pi f^2 < \infty \right\}
\]

and

\[
L^2_0(\pi) = \left\{ f \text{ is } \mathcal{F}\text{-measurable} : \pi f = 0,\ \pi f^2 < \infty \right\}
\]

with the inner product $\langle f, g \rangle = \pi(fg)$ and the corresponding norm $\|f\| = \sqrt{\pi f^2}$.
$L^2_0(\pi)$ is a closed subspace of $L^2(\pi)$. $P$ defines a Markov operator from $L^2_0(\pi)$ (or $L^2(\pi)$)
to itself, which is also denoted by $P$, as follows:

\[
(Pf)(x) = \int_{\mathsf{X}} f(y)\, P(x, dy), \quad x \in \mathsf{X}.
\]

The norm of the operator $P$ on the space $L^2_0(\pi)$ is defined by

\[
\|P\| = \sup_{f \in L^2_0(\pi),\ \|f\| = 1} \|Pf\|.
\]

By Jensen’s inequality, we can show that $\|Pf\| \le 1$ for $\|f\| \le 1$ and $f \in L^2(\pi)$. So
$\|P\| \le 1$. One of the reasons for studying $P$ on $L^2_0(\pi)$ instead of on $L^2(\pi)$ is that, on
$L^2(\pi)$, $\|P\|$ is always 1, because 1 is always an eigenvalue of $P$ (see, e.g., Hobert
et al., 2011). On the other hand, on $L^2_0(\pi)$, neither 1 nor $-1$ is an eigenvalue of $P$
(see Chan and Geyer, 1994, p.1753), and, moreover, $\|P\|$ is a measure of the speed
of convergence of the Markov chain, with smaller values corresponding to faster
convergence.

It is well known that if ‖P‖ < 1 then Φ is geometrically ergodic, and if P is self-

adjoint then ‖P‖ < 1 if and only if Φ is geometrically ergodic (Liu et al., 1995; Roberts

and Rosenthal, 1997; Roberts and Tweedie, 2001). Proposition B.7 gives some detail.

However, it is not easy to show that ‖P‖ < 1 in practice. A sufficient condition for

‖P‖ < 1 is the compactness of P , so the compactness of P implies geometric ergodicity

of the corresponding Markov chain. (An operator is called compact if it maps any

bounded set to a relatively compact set.) Other than this implication, not much is

known about the relationship between compactness and geometric/uniform ergodicity.

It is known that very few MCMC operators are compact. Indeed, Chan and Geyer

(1994, p.1755) show that a Metropolis-Hastings algorithm with non-zero rejection

probabilities cannot be compact. However, if P is Hilbert-Schmidt (see Section 1.1.3

for the definition), then we have a closed-form expression for the chi-square distance

between $P^n(x, \cdot)$ and $\pi$ (Diaconis et al., 2008). Unfortunately, in general, checking

compactness is also difficult.

1.1.3 $C_p$ Class

The Hilbert-Schmidt Markov operators are a subset of the compact Markov operators,
and the trace class Markov operators are a subset of the Hilbert-Schmidt Markov
operators. (Definitions are provided later in this section.) Fortunately, there are simple
techniques for checking whether or not a given Markov operator is Hilbert-Schmidt or
trace class. This provides an avenue for establishing compactness and hence geometric
ergodicity. Moreover, in practice, it is sometimes easier to show that the trace class (or
Hilbert-Schmidt) condition is satisfied than it is to construct a geometric drift condition.
Some recent studies have shown that several practically relevant Markov operators are
trace class (Choi and Hobert, 2013; Jung and Hobert, 2014; Khare and Hobert, 2011;
Pal et al., 2015).

The following facts are true for both $L^2(\pi)$ and $L^2_0(\pi)$, so we only discuss $L^2(\pi)$.
Denote the dimension of the Hilbert space $L^2(\pi)$ by $\dim L^2(\pi)$ and the set of all bounded
linear operators from $L^2(\pi)$ to itself by $B(L^2(\pi))$. For definitions see Ringrose (1971). For
$1 \le p < \infty$, denote by $C_p$ the set of all operators $T$ in $B(L^2(\pi))$ such that

\[
\sum_{j \in J} |\langle T\varphi_j, \varphi_j \rangle|^p < \infty,
\]

where $\{\varphi_j : j \in J\}$ is any orthonormal system of $L^2(\pi)$ such that the cardinality of $J$ is
less than or equal to $\dim L^2(\pi)$, and the above sum is an unordered sum.

From Theorem 1.8.7 in Ringrose (1971), each operator in $C_p$ is compact. If $T$ is a
compact self-adjoint operator, then $T$ has countably many non-zero real eigenvalues $\{\lambda_n\}$,
counting multiplicity, and $T \in C_p$ if and only if $\sum_n |\lambda_n|^p < \infty$ (see Ringrose, 1971, p.86).

Below we suppose that $P(x, dx') = k(x, x')\mu(dx')$ and $\pi(dx) = \pi(x)\mu(dx)$ for some
positive measure $\mu$ (we use $\pi$ to denote both the measure and its density function), and that
$\pi(x)$ is positive $\mu$-almost everywhere.

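$C_2$ is called the Hilbert-Schmidt class of operators on $L^2(\pi)$. Given any orthonormal
bases $\{\varphi_j : j \in J\}$ and $\{\psi_j : j \in J\}$ in $L^2(\pi)$, $T \in C_2$ if and only if

\[
\sum_{i \in J} \sum_{j \in J} \langle T\varphi_i, \psi_j \rangle^2 = \sum_{j \in J} \|T\varphi_j\|^2 < \infty
\]

(see Ringrose, 1971, p.102). So those sums do not depend on the
specific orthonormal bases $\{\varphi_j : j \in J\}$ and $\{\psi_j : j \in J\}$. $P$ is Hilbert-Schmidt if and only
if (see Ringrose, 1971, p.104)

\[
\int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{k(x, x')}{\pi(x')} \right]^2 \pi(x)\, \pi(x')\, \mu(dx)\, \mu(dx') < \infty.
\]

When $P \in C_2$ and $P$ is self-adjoint, $P$ has countably many non-zero real eigenvalues $\{\beta_n\}$,
counting multiplicity, and hence

\[
\sum_{i \in J} \sum_{j \in J} \langle P\varphi_i, \psi_j \rangle^2 = \sum_{j \in J} \|P\varphi_j\|^2 = \sum_n \beta_n^2
= \int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{k(x, x')}{\pi(x')} \right]^2 \pi(x)\, \pi(x')\, \mu(dx)\, \mu(dx')
\]

for any orthonormal bases $\{\varphi_j : j \in J\}$ and $\{\psi_j : j \in J\}$ (see Ringrose, 1971, p.107).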

$C_1$ is called the trace class of operators on $L^2(\pi)$. Given any orthonormal basis
$\{\varphi_j : j \in J\}$ in $L^2(\pi)$, if $T \in C_1$ then $\sum_{j \in J} \langle T\varphi_j, \varphi_j \rangle$ exists and does not depend on the
specific orthonormal basis $\{\varphi_j : j \in J\}$ (see Ringrose, 1971, p.82). If $P$ is a positive
self-adjoint operator, $P$ is trace class if and only if (personal communication with K.
Khare)

\[
\sum_n \beta_n = \int_{\mathsf{X}} k(x, x)\, \mu(dx) < \infty.
\]

Because $C_p \subset C_q$ if $1 \le p \le q \le \infty$ (see Ringrose, 1971, p.76), the trace class is a subset
of the Hilbert-Schmidt class. Note that $0 < \beta_n \le 1$ when $P$ is positive, so $\beta_n^2 \le \beta_n$.
Therefore, if $P$ is a positive self-adjoint operator, then

\[
H := \int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{k(x, x')}{\pi(x')} \right]^2 \pi(x)\, \pi(x')\, \mu(dx)\, \mu(dx') \le \int_{\mathsf{X}} k(x, x)\, \mu(dx).
\]

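We now provide a direct proof of this fact. Since $P$ is a positive self-adjoint operator, there
exists another Markov transition kernel $s(x, x')$, which is also self-adjoint with respect to $\pi$,
such that (see Kadison and Ringrose, 1997, p.247 or Li, 2003, p.37)

\[
k(x, x') = \int_{\mathsf{X}} s(x'', x')\, s(x, x'')\, \mu(dx'').
\]

By Jensen’s inequality,

\[
[k(x, x')]^2 = \left[ \int_{\mathsf{X}} s(x'', x')\, s(x, x'')\, \mu(dx'') \right]^2
\le \int_{\mathsf{X}} s^2(x'', x')\, s(x, x'')\, \mu(dx'').
\]

Now,

\[
H = \int_{\mathsf{X}} \int_{\mathsf{X}} \frac{\pi(x)}{\pi(x')}\, k^2(x, x')\, \mu(dx)\, \mu(dx')
\le \int_{\mathsf{X}} \int_{\mathsf{X}} \int_{\mathsf{X}} \frac{\pi(x)}{\pi(x')}\, s^2(x'', x')\, s(x, x'')\, \mu(dx'')\, \mu(dx)\, \mu(dx').
\]

But $s(x, x'')\pi(x) = s(x'', x)\pi(x'')$, so

\begin{align*}
H &\le \int_{\mathsf{X}} \int_{\mathsf{X}} \int_{\mathsf{X}} \frac{\pi(x'')}{\pi(x')}\, s^2(x'', x')\, s(x'', x)\, \mu(dx'')\, \mu(dx)\, \mu(dx') \\
  &= \int_{\mathsf{X}} \int_{\mathsf{X}} \frac{\pi(x'')}{\pi(x')}\, s^2(x'', x')\, \mu(dx'')\, \mu(dx').
\end{align*}

Using the self-adjointness again, we have $s(x'', x')\pi(x'') = s(x', x'')\pi(x')$, so

\[
H \le \int_{\mathsf{X}} \int_{\mathsf{X}} s(x'', x')\, s(x', x'')\, \mu(dx'')\, \mu(dx') = \int_{\mathsf{X}} k(x', x')\, \mu(dx').
\]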
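The Hilbert-Schmidt integral, the trace integral, and the inequality between them can be checked numerically on a finite state space. The sketch below is an illustration under our own assumptions: µ is taken to be counting measure, so that k(x, x') is simply the transition matrix, and the three-state matrix is a hypothetical positive, reversible example whose eigenvalues on L^2(π) are 1, 1/2, and 0 (verified by hand, not taken from the text).

```python
def hilbert_schmidt_integral(P, pi):
    """Sum over (x, x') of [k(x, x') / pi(x')]^2 * pi(x) * pi(x'), with counting measure mu."""
    n = len(P)
    return sum((P[x][y] / pi[y]) ** 2 * pi[x] * pi[y]
               for x in range(n) for y in range(n))


def trace_integral(P):
    """Sum over x of k(x, x), with counting measure mu."""
    return sum(P[x][x] for x in range(len(P)))


# Hypothetical positive self-adjoint (reversible) example.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
pi = [0.25, 0.5, 0.25]
eigenvalues = [1.0, 0.5, 0.0]  # eigenvalues of P on L^2(pi), checked by hand

H = hilbert_schmidt_integral(P, pi)
trace = trace_integral(P)
```

Here H equals the sum of the squared eigenvalues and the trace equals the sum of the eigenvalues, so H is at most the trace, in line with the inequality proved above.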

1.1.4 Gibbs Sampler

We now discuss the relationship between the Markov chains associated with
a Gibbs sampler (GS). Suppose that $(X, Y)$ has a joint distribution $\pi(dx, dy) =
\pi(x, y)\mu(dx)\nu(dy)$ on a countably generated measure space $(\mathsf{X} \times \mathsf{Y}, \mathcal{F} \otimes \mathcal{G})$. Also,
suppose that it is easy to simulate from the conditional kernels $\pi_{X|Y}(y, dx) = \pi_{X|Y}(x|y)\mu(dx)$
and $\pi_{Y|X}(x, dy) = \pi_{Y|X}(y|x)\nu(dy)$. Denote the marginal distribution of $X$ by $\pi_X$. The GS
for the above joint distribution is based on a Markov chain $\{(X_n, Y_n)\}_{n=0}^{\infty}$, which is called
the $(x, y)$-chain or the Gibbs Markov chain. If the current state is $(X_n, Y_n) = (x, y)$, we
simulate the next state $(X_{n+1}, Y_{n+1})$ by:

1. Draw $Y_{n+1} \sim \pi_{Y|X}(\cdot|x)$. Call the observed value $y'$.

2. Draw $X_{n+1} \sim \pi_{X|Y}(\cdot|y')$.
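The two steps above can be sketched in code for a discrete joint distribution. This is a minimal illustration, not the dissertation's implementation: the joint pmf `joint` and the helper names `conditional`, `draw`, `gibbs_step`, and `run_gibbs` are hypothetical, and the conditionals are obtained by brute-force normalisation, which is only feasible for small finite supports.

```python
import random

# Hypothetical joint pmf on {0, 1} x {0, 1, 2} (illustration only).
joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
         (1, 0): 0.25, (1, 1): 0.05, (1, 2): 0.30}


def conditional(joint, given_index, given_value):
    """Conditional pmf of the other coordinate given coordinate `given_index` = `given_value`."""
    other = 1 - given_index
    weights = {}
    for state, prob in joint.items():
        if state[given_index] == given_value:
            weights[state[other]] = weights.get(state[other], 0.0) + prob
    total = sum(weights.values())
    return {value: w / total for value, w in weights.items()}


def draw(pmf, rng):
    """Draw one value from a pmf given as a dict {value: probability}."""
    values = list(pmf)
    return rng.choices(values, weights=[pmf[v] for v in values])[0]


def gibbs_step(x, joint, rng):
    """One Gibbs update; note that the new state depends on the old one only through x."""
    y_new = draw(conditional(joint, 0, x), rng)      # step 1: Y_{n+1} ~ pi_{Y|X}(. | x)
    x_new = draw(conditional(joint, 1, y_new), rng)  # step 2: X_{n+1} ~ pi_{X|Y}(. | y')
    return x_new, y_new


def run_gibbs(x0, n_steps, joint, seed=0):
    """Simulate n_steps of the (x, y)-chain started from X_0 = x0."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n_steps):
        x, y = gibbs_step(x, joint, rng)
        path.append((x, y))
    return path


path = run_gibbs(x0=0, n_steps=200, joint=joint)
```

Because the update ignores the current y, the sequence of x-values alone is again a Markov chain.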

The $(x, y)$-chain has the Markov transition kernel

\[
P((x, y), d(x', y')) := k((x, y), (x', y'))\, \mu(dx')\, \nu(dy') = \pi_{X|Y}(x'|y')\, \pi_{Y|X}(y'|x)\, \mu(dx')\, \nu(dy')
\]

and the stationary distribution $\pi$. The sequence $\{X_n\}_{n=0}^{\infty}$, which is called the $x$-chain or
the marginal chain, is also a Markov chain with the Markov transition kernel

\[
P_X(x, dx') := k_X(x, x')\, \mu(dx') = \left( \int_{\mathsf{Y}} \pi_{X|Y}(x'|y)\, \pi_{Y|X}(y|x)\, \nu(dy) \right) \mu(dx')
\]

and the stationary distribution $\pi_X$. $P_X$ is a positive self-adjoint operator on $L^2(\pi_X)$. The
Gibbs Markov chain and its marginal chain are closely related: if one of them is
geometrically ergodic, then so is the other (Diaconis et al., 2008).

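The $x$-chain is trace class if and only if the $(x, y)$-chain is Hilbert-Schmidt because

\begin{align*}
&\int_{\mathsf{Y}} \int_{\mathsf{X}} \int_{\mathsf{Y}} \int_{\mathsf{X}} \left[ \frac{k((x, y), (x', y'))}{\pi(x', y')} \right]^2 \pi(x, y)\, \pi(x', y')\, \mu(dx)\, \nu(dy)\, \mu(dx')\, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \int_{\mathsf{Y}} \int_{\mathsf{X}} \left[ \frac{\pi_{X|Y}(x'|y')\, \pi_{Y|X}(y'|x)}{\pi(x', y')} \right]^2 \pi(x, y)\, \pi(x', y')\, \mu(dx)\, \nu(dy)\, \mu(dx')\, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{\pi_{X|Y}(x'|y')\, \pi_{Y|X}(y'|x)}{\pi(x', y')} \right]^2 \pi_X(x)\, \pi(x', y')\, \mu(dx)\, \mu(dx')\, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{\pi(x, y')}{\pi_Y(y')\, \pi_X(x)} \right]^2 \pi_X(x)\, \pi(x', y')\, \mu(dx)\, \mu(dx')\, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \left[ \frac{\pi(x, y')}{\pi_Y(y')\, \pi_X(x)} \right]^2 \pi_X(x)\, \pi_Y(y')\, \mu(dx)\, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \pi_{X|Y}(x|y')\, \pi_{Y|X}(y'|x)\, \mu(dx)\, \nu(dy') \\
&\quad = \int_{\mathsf{X}} k_X(x, x)\, \mu(dx).
\end{align*}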
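The chain of equalities above can be verified numerically on a small discrete example. In the sketch below, the joint pmf is hypothetical, µ and ν are taken to be counting measures, and every joint probability is strictly positive so that the density ratio is defined everywhere; these are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical joint pmf with full support on {0, 1} x {0, 1, 2}.
joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
         (1, 0): 0.25, (1, 1): 0.05, (1, 2): 0.30}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
pX = {x: sum(joint[(x, y)] for y in ys) for x in xs}  # marginal pi_X
pY = {y: sum(joint[(x, y)] for x in xs) for y in ys}  # marginal pi_Y


def k(x, y, x2, y2):
    """Gibbs transition density k((x, y), (x', y')) = pi_{X|Y}(x'|y') * pi_{Y|X}(y'|x)."""
    return (joint[(x2, y2)] / pY[y2]) * (joint[(x, y2)] / pX[x])


# Hilbert-Schmidt integral of the (x, y)-chain (first line of the display).
hs_xy = sum((k(x, y, x2, y2) / joint[(x2, y2)]) ** 2 * joint[(x, y)] * joint[(x2, y2)]
            for x, y, x2, y2 in product(xs, ys, xs, ys))

# Trace integral of the x-chain (last line of the display).
trace_x = sum((joint[(x, y)] / pY[y]) * (joint[(x, y)] / pX[x])
              for x in xs for y in ys)
```

The two quantities agree up to floating-point error, as the derivation predicts.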

1.2 Overview of the Remaining Chapters

1.2.1 Chapter 2

The GS is an indispensable tool for exploring intractable probability distributions

(see, e.g., Brooks et al., 2011), but there is still very little known about its convergence

properties. In particular, with the exception of a few scattered examples, the qualitative

convergence rates of most GSs that are used in practice are unknown. That is, it is

unknown whether these Gibbs Markov chains are uniformly ergodic, geometrically

ergodic, or sub-geometrically ergodic. This is a serious practical problem because the

techniques that are typically used to process GS simulations are predicated on the

existence of a CLT, which may not exist if the Markov chain in question is sub-geometric

(see, e.g., Roberts and Rosenthal, 1998; Flegal et al., 2008).

The main reason why so little is known about the GSs that are used in practice

is that the underlying Markov chains are complex, high dimensional, and difficult to

analyze. This has led to the study of families of GSs that are not practically relevant, in

the sense that their invariant distributions are not intractable, but are still challenging

from a convergence rate analysis standpoint. For example, Diaconis et al. (2008)

performed a thorough spectral analysis of the GSs that result when a density in the

one-parameter exponential family is combined with the natural conjugate prior. Other

examples include Tan et al. (2013), Jovanovski and Madras (2014) and Hobert and

Khare (2014). Note that all finite state space, irreducible, and aperiodic Markov chains

are uniformly ergodic (see Billingsley, 1995, p.131). Here we consider the family

introduced and studied in Tan (2009) and Tan et al. (2013). These GSs are, in a sense,

the simplest ones whose state space is not finite. We now describe this family.

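Let $\{a_i\}_{i=1}^{\infty}$ and $\{b_i\}_{i=1}^{\infty}$ be two sequences of strictly positive real numbers such that
$\sum_{i=1}^{\infty} a_i + \sum_{i=1}^{\infty} b_i = 1$. Denote by $\mathbb{N}$ the set of natural numbers. Let $(U, V)$ be a discrete
bivariate random vector supported on $\mathbb{N} \times \mathbb{N}$ whose joint probability mass function (pmf)
is given by (see Figure 1-1)

\[
\Pr(U = i, V = j) =
\begin{cases}
a_i & \text{if } i = j \text{ and } j = 1, 2, \dots \\
b_j & \text{if } i = j + 1 \text{ and } j = 1, 2, \dots \\
0 & \text{otherwise.}
\end{cases}
\tag{1--1}
\]

For convenience, we define $b_0 = 0$ and $a_0 = 1$. Let $\{(X_n, Y_n)\}_{n=0}^{\infty}$ be a Markov chain
on $\mathbb{N} \times \mathbb{N}$ such that

\[
\Pr(X_{n+1} = i', Y_{n+1} = j' \mid X_n = i, Y_n = j) = \Pr(V = j' \mid U = i)\, \Pr(U = i' \mid V = j'),
\tag{1--2}
\]

where, for any $j \in \mathbb{N}$,

\[
\Pr(U = i \mid V = j) = \frac{a_j}{a_j + b_j}\, I(i = j) + \frac{b_j}{a_j + b_j}\, I(i = j + 1),
\]

and, likewise, for any $i \in \mathbb{N}$,

\[
\Pr(V = j \mid U = i) = \frac{a_i}{a_i + b_{i-1}}\, I(j = i) + \frac{b_{i-1}}{a_i + b_{i-1}}\, I(j = i - 1).
\]

[Figure 1-1. Pr(U = i, V = j). Horizontal axis: i; vertical axis: j. The masses a_1, a_2, ... lie on the
diagonal i = j, and the masses b_1, b_2, ... lie on the line i = j + 1.]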

It is easy to see that this Markov chain is irreducible, aperiodic and Harris positive

recurrent, with stationary mass function (1–1).

It would be a trivial matter to simulate independent and identically distributed (iid)

random vectors from the joint pmf (1–1), so the above GS would never be used to

explore (1–1). However, it is still worthy of analysis since, as we now explain, these

GSs are the simplest ones that are not automatically uniformly ergodic. First, it is clear

that one cannot construct a GS using a single random variable, so a bivariate random

vector is the simplest place to start. (In the trivial case where the two components

are independent, the GS converges in one iteration.) Second, if either (or both) of the

components of a bivariate random vector have finite support, then the corresponding GS

is automatically uniformly ergodic (see, e.g., Diebolt and Robert, 1994). Furthermore,

if ai or bi is 0 for some i, then the Gibbs chain is not irreducible. So, given that both

variables must have infinite support in order for the corresponding GS to be interesting,

the support of (U, V ) seems quite minimal. We note that a member of this family of

Gibbs samplers was recently used as a counterexample in Łatuszynski et al. (2013).

Tan et al. (2013) developed conditions on the $a_i$s and $b_i$s that guarantee geometric
ergodicity. The marginal $x$-chain $\{X_n\}_{n=0}^{\infty}$ is a special case of a birth-death chain. This
leads us to the study of general birth-death chains. We use the same notation $X =
\{X_n\}_{n=0}^{\infty}$ to denote a birth-death Markov chain with state space $\mathbb{N}$ and Markov transition
matrix (Mtm) given by

\[
M =
\begin{pmatrix}
r_1 & p_1 & 0 & 0 & 0 & \cdots \\
q_2 & r_2 & p_2 & 0 & 0 & \cdots \\
0 & q_3 & r_3 & p_3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
\]

Of course, the $(i, j)$th entry of $M$ represents $\Pr(X_{n+1} = j \mid X_n = i)$, so $q_i + r_i + p_i = 1$
for all $i \in \mathbb{N}$, where $q_1 \equiv 0$. In Section 2.2, we characterize compactness of the
Markov operator $M$ of the birth-death chain $X$. Using the result of Section 2.2, we
then characterize, in Section 2.3, compactness of the Markov operator of the marginal
$x$-chain of $\{(X_n, Y_n)\}_{n=0}^{\infty}$. Comparing Tan’s conditions for geometric ergodicity with
our conditions for compactness shows that compactness is a much stronger property.
Finally, in Section 2.4, we show that birth-death chains cannot be uniformly ergodic.
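To see concretely why the marginal x-chain of the toy Gibbs sampler is a birth-death chain, one can compute its one-step probabilities by summing the product in (1–2) over the intermediate value j. The sketch below does this with exact rational arithmetic. The specific choice a_i = b_i = 2^{-(i+1)} and the function names are hypothetical, chosen only so that the two series sum to 1; the resulting formulas for q_i, r_i, and p_i are our own derivation from (1–2), not quoted from the text.

```python
from fractions import Fraction


def a(i):
    """Hypothetical a_i = 2^{-(i+1)} for i >= 1, with the convention a_0 = 1."""
    return Fraction(1) if i == 0 else Fraction(1, 2 ** (i + 1))


def b(i):
    """Hypothetical b_i = 2^{-(i+1)} for i >= 1, with the convention b_0 = 0."""
    return Fraction(0) if i == 0 else Fraction(1, 2 ** (i + 1))


def v_given_u(j, i):
    """Pr(V = j | U = i)."""
    denom = a(i) + b(i - 1)
    if j == i:
        return a(i) / denom
    if j == i - 1:
        return b(i - 1) / denom
    return Fraction(0)


def u_given_v(i, j):
    """Pr(U = i | V = j)."""
    denom = a(j) + b(j)
    if i == j:
        return a(j) / denom
    if i == j + 1:
        return b(j) / denom
    return Fraction(0)


def x_chain_row(i):
    """(q_i, r_i, p_i): probabilities of moving from i to i - 1, i, and i + 1."""
    def prob(i_next):
        # Sum over the intermediate value j of V; only j = i - 1 and j = i contribute.
        return sum(v_given_u(j, i) * u_given_v(i_next, j)
                   for j in (i - 1, i) if j >= 1)
    return prob(i - 1), prob(i), prob(i + 1)


rows = {i: x_chain_row(i) for i in range(1, 21)}
```

With this choice of weights, the chain moves at most one step at a time and every row sums to 1, which is exactly the tridiagonal structure of the matrix M.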

1.2.2 Chapter 3

Consider the frequentist unbalanced one-way random effects model (Searle et al.,
1992)

\[
y_{ij} = \theta_i + \varepsilon_{ij}, \quad i = 1, 2, \dots, K, \quad j = 1, 2, \dots, m_i,
\tag{1--3}
\]

where the random effects $\theta_i$ are iid $N(\mu, \lambda_\theta^{-1})$, the white noise terms $\varepsilon_{ij}$ are iid $N(0, \lambda_e^{-1})$,
the $\theta_i$ and $\varepsilon_{ij}$ are independent for all $i$ and $j$, $\mu$, $\lambda_\theta$, and $\lambda_e$ are unknown parameters, and
$K \ge 2$ and $m_i \ge 2$ for all $i = 1, 2, \dots, K$ are known constants.

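Given positive numbers $\alpha$ and $\beta$, we write $X \sim \mathrm{Gamma}(\alpha, \beta)$ for a random
variable with density function $f(x) = (\beta^\alpha / \Gamma(\alpha))\, x^{\alpha - 1} e^{-x\beta} I\{x > 0\}$. Throughout, $\mathbf{1}$ is a $K \times 1$ column
vector of ones, $I$ is the $K \times K$ identity matrix, $\theta = (\theta_1, \dots, \theta_K)^T$ is a $K \times 1$ column vector,
$\zeta = (\theta_1, \dots, \theta_K, \mu)^T$ is a $(K + 1) \times 1$ column vector, $\lambda = (\lambda_\theta, \lambda_e)^T$ is a $2 \times 1$ column vector,
and $y$ denotes the collection of all the $y_{ij}$.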

A Bayesian version of the frequentist model (1–3) with proper priors is

\[
y_{ij} \mid \theta, \lambda_e, \lambda_\theta, \mu \sim N(\theta_i, \lambda_e^{-1}), \quad i = 1, 2, \dots, K, \quad j = 1, 2, \dots, m_i,
\]

\[
\theta \mid \mu, \lambda_\theta, \lambda_e \sim N(\mathbf{1}\mu, I\lambda_\theta^{-1}),
\]

where $\lambda_e \sim \mathrm{Gamma}(a_2, b_2)$, $\mu \sim N(\mu_0, \lambda_0^{-1})$, $\lambda_\theta \sim \mathrm{Gamma}(a_1, b_1)$, the $y_{ij}$ are independent
given $\theta$, $\lambda_e$, $\lambda_\theta$, and $\mu$, $\mu_0$ is a known constant, $a_1$, $a_2$, $b_1$, $b_2$, and $\lambda_0$ are strictly positive
known constants, and $\mu$, $\lambda_\theta$, and $\lambda_e$ are mutually independent. The posterior distribution
is $\pi(\zeta, \lambda \mid y)$. To simplify notation, we suppress the dependence on $y$, e.g.
$\pi(\zeta, \lambda) := \pi(\zeta, \lambda \mid y)$. Hobert and Geyer (1998) studied a Block GS for the above model
which is based on $\pi(\zeta \mid \lambda)$ and $\pi(\lambda \mid \zeta)$ and established that it is geometrically ergodic
under some simple conditions. In Section 3.3, we show that the marginal $\lambda$-chain of
the Block Gibbs chain in Hobert and Geyer (1998) is never Hilbert-Schmidt. This is a
negative result, but it tells us something about the spectrum of the Markov operator.
For example, the Markov operator of that marginal $\lambda$-chain is either not compact, or it is
compact but the sum of the squares of its eigenvalues is infinite.

A Bayesian version of the frequentist model (1–3) with an improper prior is (Tan and
Hobert, 2009)

\[
y_{ij} \mid \theta, \mu, \lambda_\theta, \lambda_e \sim N(\theta_i, \lambda_e^{-1}), \quad i = 1, 2, \dots, K, \quad j = 1, 2, \dots, m_i,
\]

\[
\theta \mid \mu, \lambda_\theta, \lambda_e \sim N(\mathbf{1}\mu, I\lambda_\theta^{-1}),
\]

\[
f(\lambda_\theta, \lambda_e, \mu) \propto \lambda_\theta^{a-1} \lambda_e^{b-1}, \quad \lambda_\theta, \lambda_e > 0,
\]

where the $y_{ij}$ are independent given $\theta$, $\lambda_e$, $\lambda_\theta$, and $\mu$, and $a$ and $b$ are known constants. Tan and
Hobert (2009) studied a Block GS for the above model which is based on $\pi(\zeta \mid \lambda)$ and
$\pi(\lambda \mid \zeta)$ and established that it is geometrically ergodic under some simple conditions.
Since improper priors are unlikely to improve matters, we did not expect Tan and
Hobert’s (2009) chains to be better behaved than Hobert and Geyer’s (1998) chains, and
we therefore do not study the spectral properties of Tan and Hobert’s (2009) Markov
chains. A different parametrization, however, might change things.
Therefore, we consider an alternative Block GS that is based on $\pi(\theta \mid \mu, \lambda)$ and $\pi(\mu, \lambda \mid \theta)$
(personal communication with A. Tan). It is currently an open problem whether this
alternative Block Gibbs chain is geometrically ergodic. In Section 3.4, we show that this
alternative Block Gibbs chain is not trace class in most cases.

1.2.3 Chapter 4

In this chapter, we return to the birth-death chains studied in Chapter 2. van Doorn
and Schrijner (1995), Mao (2010), and Tan et al. (2013) have studied the geometric
ergodicity of birth-death chains. van Doorn and Schrijner (1995) used orthogonal
polynomials. Theorem 3.4 in van Doorn and Schrijner (1995) gives a necessary and
sufficient condition for the geometric ergodicity of birth-death chains, but it is very
difficult to verify in practice. Theorem 3.5 in van Doorn and Schrijner (1995) therefore
develops a more practical necessary condition for the geometric ergodicity of birth-death
chains. Tan et al. (2013) studied a subset of the set of all general birth-death chains.
Lemma 3 in Tan et al. (2013) suggested the idea behind part of our conditions
for the geometric ergodicity of birth-death chains. Theorem 4.3 in Mao (2010) used
both the spectral gap and a drift condition to find the most practical necessary and sufficient
condition for the geometric ergodicity of birth-death chains among known results. That
result is quite similar to our main result in this chapter. We use the drift condition method,
which is different from Mao’s (2010) method, to find a necessary and sufficient condition
for the geometric ergodicity of birth-death chains that is simpler than the condition
in Mao (2010). We study properties of all possible drift functions and find a necessary
condition for the geometric ergodicity of birth-death chains. That condition is strong
enough that it is also sufficient for the geometric ergodicity of birth-death chains. Our
method can be applied to other models, such as a family of random walks on $\mathbb{Z}$, where $\mathbb{Z}$
is the set of integers. We do not know whether Mao’s (2010) method can be applied to
those random walks on $\mathbb{Z}$.

The chapter is organized as follows. Section 4.2 reviews the main results in van
Doorn and Schrijner (1995) and Mao (2010) on the geometric ergodicity of birth-death
chains. Section 4.3.1 develops a simple necessary and sufficient condition for the
geometric ergodicity of birth-death chains. Section 4.3.2 uses the results of Section 4.3.1
to study the toy GS from Tan et al. (2013). Section 4.3.3 uses the same method as
Section 4.3.1 to study a family of random walks on $\mathbb{Z}$. Section 4.3.4 gives some further
results related to birth-death chains.

CHAPTER 2
CHARACTERIZATION OF COMPACTNESS FOR BIRTH-DEATH MARKOV OPERATORS WITH APPLICATIONS TO GIBBS SAMPLING

2.1 Summary

In this chapter, we characterize compactness of the Markov operator of the birth-death
chain. We then apply this characterization to study Tan’s (2009) GS Markov chains, which
were introduced in Section 1.2.1. In particular, we look at the relationship between
compactness and geometric ergodicity. Finally, we show that birth-death chains cannot be
uniformly ergodic.

2.2 Compactness of Birth-Death Markov Operators

Recall from Section 1.2.1 that $X = \{X_n\}_{n=0}^{\infty}$ is a birth-death Markov chain with state
space $\mathbb{N}$ and Markov transition matrix (Mtm) given by

\[
M =
\begin{pmatrix}
r_1 & p_1 & 0 & 0 & 0 & \cdots \\
q_2 & r_2 & p_2 & 0 & 0 & \cdots \\
0 & q_3 & r_3 & p_3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
\]

It is well-known (see, e.g., Karlin and Taylor, 1975, p.108) that the birth-death chain $X$ is
irreducible, aperiodic, and positive recurrent if and only if the following three conditions
hold: (i) $p_i > 0$ for all $i \in \mathbb{N}$, (ii) $r_i > 0$ for some $i \in \mathbb{N}$, and (iii) $c = \sum_{i=1}^{\infty} c_i < \infty$, where

\[
c_1 = 1, \quad c_i = \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_i}, \quad i = 2, 3, \dots
\]

We note that Karlin and Taylor (1975) actually refer to $X$ as a random walk chain rather
than a birth-death chain, but the latter appears to be more standard. When the three
regularity conditions hold, there exists a set of (strictly positive) stationary probabilities
$\pi = \{\pi_i\}_{i=1}^{\infty}$ with $\pi_i = c_i / c$ for $i = 1, 2, \dots$ Note that $\pi_{i+1} / \pi_i = p_i / q_{i+1}$ for all $i \in \mathbb{N}$. Also, it
is easy to see that $X$ is reversible; i.e., $\pi_i m_{ij} = \pi_j m_{ji}$ for all $i, j \in \mathbb{N}$.
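The weights c_i and the reversibility relation can be illustrated with a short computation. The sketch below uses a hypothetical birth-death chain with p_1 = 1/2, p_i = 1/6 and q_i = 1/3 for i ≥ 2 (so r_i = 1/2 for all i); these values and the function names are assumptions chosen for illustration. For this chain the series of c_i is geometric and sums to c = 4, so π_1 = 1/4.

```python
from fractions import Fraction


def p(i):
    """Hypothetical birth probabilities: p_1 = 1/2 and p_i = 1/6 for i >= 2."""
    return Fraction(1, 2) if i == 1 else Fraction(1, 6)


def q(i):
    """Hypothetical death probabilities: q_1 = 0 and q_i = 1/3 for i >= 2."""
    return Fraction(0) if i == 1 else Fraction(1, 3)


def r(i):
    """Holding probabilities, so that q_i + r_i + p_i = 1."""
    return 1 - p(i) - q(i)


def c_weights(n):
    """[c_1, ..., c_n] with c_1 = 1 and c_i = (p_1 ... p_{i-1}) / (q_2 ... q_i)."""
    c = [Fraction(1)]
    for i in range(2, n + 1):
        c.append(c[-1] * p(i - 1) / q(i))
    return c


c = c_weights(30)
partial_sum = sum(c)  # approaches c = 4 for this particular chain

# Detailed balance c_i p_i = c_{i+1} q_{i+1}, which is reversibility up to the constant c.
balanced = all(c[i - 1] * p(i) == c[i] * q(i + 1) for i in range(1, 30))
```

Dividing each c_i by the limit c = 4 gives the stationary probabilities, and the detailed balance check is the finite counterpart of π_i m_{ij} = π_j m_{ji}.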

Denote by $\mathbb{R}$ the set of real numbers. Let $L^2_0(\pi)$ denote the set of functions $f : \mathbb{N} \to \mathbb{R}$ such
that

\[
\sum_{i=1}^{\infty} f_i^2\, \pi_i < \infty \quad \text{and} \quad \sum_{i=1}^{\infty} f_i\, \pi_i = 0.
\]

$L^2_0(\pi)$ is a Hilbert space with the inner product

\[
\langle f, g \rangle = \sum_{i=1}^{\infty} f_i\, g_i\, \pi_i.
\]

The Mtm $M$ defines an operator, which we also call $M$, that maps $f = \{f_i\}_{i=1}^{\infty} \in L^2_0(\pi)$ to
$Mf = \{(Mf)_i\}_{i=1}^{\infty} \in L^2_0(\pi)$, where

\[
(Mf)_i = \sum_{j=1}^{\infty} m_{ij} f_j.
\]

Our main result for the birth-death chain is a simple characterization of compactness.

Proposition 2.1. The operator $M$ is compact in $L^2_0(\pi)$ if and only if $r_i \to 0$ and $p_i \to 0$.

Proof. Lemma A.1 in the Appendix shows that compactness in $L^2(\pi)$ is equivalent to
compactness in $L^2_0(\pi)$. Hence, it suffices to prove that $M$ is a compact operator in $L^2(\pi)$
if and only if $r_i \to 0$ and $p_i \to 0$. For $i \in \mathbb{N}$, let $e^{(i)} \in L^2(\pi)$ denote the vector that has
$i$th coordinate equal to $1/\sqrt{\pi_i}$, and has every other coordinate equal to 0. Note that
$\langle e^{(i)}, e^{(j)} \rangle$ equals 1 if $i = j$, and equals 0 otherwise. Hence, the $e^{(i)}$s form an orthonormal
basis of $L^2(\pi)$. Now let $M^*$ denote the matrix representation of the linear operator $M$ in
$L^2(\pi)$ with respect to this basis (see, e.g., Akhiezer and Glazman, 1993, p.48). That is,

\[
m^*_{ij} = \sqrt{\frac{\pi_i}{\pi_j}}\, m_{ij}.
\]

Thus, we have

\[
M^* =
\begin{pmatrix}
r_1 & \sqrt{\frac{\pi_1}{\pi_2}}\, p_1 & 0 & 0 & 0 & \cdots \\
\sqrt{\frac{\pi_2}{\pi_1}}\, q_2 & r_2 & \sqrt{\frac{\pi_2}{\pi_3}}\, p_2 & 0 & 0 & \cdots \\
0 & \sqrt{\frac{\pi_3}{\pi_2}}\, q_3 & r_3 & \sqrt{\frac{\pi_3}{\pi_4}}\, p_3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
\]

(Note the correspondence between M∗ and the finite-dimensional matrix D−1PD on

page 42 of Diaconis and Stroock (1991).) Now, since m∗ij = 0 whenever |i − j| > 2,

the results in Section 28 of Akhiezer and Glazman (1993) imply that M is a compact

operator if and only if

limi,j→∞

m∗ij = 0 . (2–1)

Now, using the reversibility of the birth-death chain, we have

m∗ji =

√πjπimji =

√πjπi

πimij

πj=

√πiπjmij = m∗ij ,

so M∗ is a symmetric matrix. Moreover, since πi+1/πi = pi/qi+1, we have√πiπi+1

pi =

√πi+1

πiqi+1 =

√piqi+1 .

Thus, (2–1) holds if and only if ri → 0 and √piqi+1 → 0. Clearly, if ri → 0 and pi → 0,

then ri → 0 and √piqi+1 → 0. To prove the other direction, assume that ri → 0 and√piqi+1 → 0. Assume also that lim sup pi > 0. Then there exists an ε ∈ (0, 1) such that

lim sup pi > ε. Fix δ > 0 and choose N such that ri and piqi+1 are both less than δ for all

i > N . Since lim sup pi > ε, there exists j > N such that pj > ε. Then because j > N ,

we have qj+1 < δ/ε and rj+1 < δ. Thus, pj+1 = 1 − rj+1 − qj+1 > 1 − δ − δ/ε. Assuming

ε is sufficiently small, and taking δ = ε2, we have pj+1 > 1 − ε2 − ε > ε. By repeating

the argument, we get that pi > ε for all i ≥ j. But piqi+1 → 0, so we must have qi → 0.

Consequently, πi/πi+1 = qi+1/pi → 0. But this implies that πi+1 > πi for all large i, which

contradicts the fact that πi → 0. We conclude that pi → 0.
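The symmetrization used in the proof can be checked numerically on a finite truncation. The sketch below is only illustrative: the truncation to six states (with the leftover mass of the last row placed on the diagonal), the particular values of `p` and `q`, and the helper names are all assumptions made for this example, not part of the original argument.

```python
import math

def birth_death_matrix(p, q):
    """Finite tridiagonal transition matrix; the diagonal absorbs the leftover mass."""
    n = len(p)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            M[i][i - 1] = q[i]
        if i < n - 1:
            M[i][i + 1] = p[i]
        M[i][i] = 1.0 - sum(M[i])  # r_i (diagonal is still 0 when the sum is taken)
    return M

def stationary(p, q):
    """pi_i = c_i / c with c_1 = 1 and c_i = c_{i-1} * p_{i-1} / q_i."""
    c = [1.0]
    for i in range(1, len(p)):
        c.append(c[-1] * p[i - 1] / q[i])
    total = sum(c)
    return [ci / total for ci in c]

def symmetrize(M, pi):
    """m*_ij = sqrt(pi_i / pi_j) * m_ij."""
    n = len(M)
    return [[math.sqrt(pi[i] / pi[j]) * M[i][j] for j in range(n)] for i in range(n)]

# Hypothetical example values (q[0] = 0 because state 1 has no downward move).
p = [0.30, 0.25, 0.20, 0.15, 0.10, 0.05]
q = [0.00, 0.40, 0.45, 0.50, 0.55, 0.60]
M = birth_death_matrix(p, q)
pi = stationary(p, q)
S = symmetrize(M, pi)

asymmetry = max(abs(S[i][j] - S[j][i]) for i in range(6) for j in range(6))
offdiag_gap = max(abs(S[i][i + 1] - math.sqrt(p[i] * q[i + 1])) for i in range(5))
```

Under these assumptions, `asymmetry` and `offdiag_gap` should both be at floating-point noise level, reflecting the symmetry of $M^*$ and the identity for its off-diagonal entries, $\sqrt{p_i q_{i+1}}$.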

2.3 Application to a Family of Gibbs Samplers

Recall from Section 1.2.1 that $\{a_i\}_{i=1}^{\infty}$ and $\{b_i\}_{i=1}^{\infty}$ are two sequences of strictly positive real numbers such that $\sum_{i=1}^{\infty} a_i + \sum_{i=1}^{\infty} b_i = 1$, and $\{(X_n, Y_n)\}_{n=0}^{\infty}$ is the Gibbs Markov chain, whose transition probabilities are given by

\[
\Pr(X_{n+1} = i', Y_{n+1} = j' \mid X_n = i, Y_n = j) = \Pr(V = j' \mid U = i)\, \Pr(U = i' \mid V = j') , \tag{2–2}
\]

where, for any $j \in \mathbb{N}$,

\[
\Pr(U = i \mid V = j) = \frac{a_j}{a_j + b_j}\, I(i = j) + \frac{b_j}{a_j + b_j}\, I(i = j + 1) ,
\]

and, likewise, for any $i \in \mathbb{N}$,

\[
\Pr(V = j \mid U = i) = \frac{a_i}{a_i + b_{i-1}}\, I(j = i) + \frac{b_{i-1}}{a_i + b_{i-1}}\, I(j = i - 1) .
\]

It is well known that the two marginal sequences, $\{X_n\}_{n=0}^{\infty}$ and $\{Y_n\}_{n=0}^{\infty}$, are themselves reversible Markov chains, and that geometric ergodicity is a solidarity property for the three chains. That is, either $\{(X_n, Y_n)\}_{n=0}^{\infty}$, $\{X_n\}_{n=0}^{\infty}$ and $\{Y_n\}_{n=0}^{\infty}$ are all geometrically ergodic, or none of them is (Diaconis et al., 2008; Roberts and Rosenthal, 1997; Liu et al., 1994). We analyze the marginal $x$-chain $\{X_n\}_{n=0}^{\infty}$. A simple calculation shows that the $x$-chain is a special case of the birth-death chain, with transition probabilities given by

\[
p_i = \frac{a_i b_i}{(a_i + b_{i-1})(a_i + b_i)} \quad \text{and} \quad q_i = \frac{a_{i-1} b_{i-1}}{(a_i + b_{i-1})(a_{i-1} + b_{i-1})} ,
\]

and $r_i = 1 - p_i - q_i$. (Note that $q_1 = 0$ due to the definitions of $a_0$ and $b_0$.) Let $M_X$ denote the Markov operator defined by the $x$-chain. We use Proposition 2.1 to establish necessary and sufficient conditions for its compactness.
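The formulas for $p_i$, $q_i$ and $r_i$ translate directly into code. The following is a minimal sketch under stated assumptions: it takes $a_0 = b_0 = 0$ and sets $q_1 = 0$ directly, it uses hypothetical example weights, and it assumes that the marginal weight of $U = i$ is $a_i + b_{i-1}$ (which is what the conditional probabilities above suggest). The function name `x_chain_probs` is illustrative only.

```python
def x_chain_probs(a, b):
    """Return (p, q, r) for states i = 1..n, where a[k], b[k] hold a_{k+1}, b_{k+1}.

    Assumption: a_0 = b_0 = 0, and q_1 is set to 0 directly.
    """
    p, q, r = [], [], []
    for k in range(len(a)):
        b_prev = b[k - 1] if k > 0 else 0.0
        p_i = a[k] * b[k] / ((a[k] + b_prev) * (a[k] + b[k]))
        if k == 0:
            q_i = 0.0
        else:
            a_prev = a[k - 1]
            q_i = a_prev * b_prev / ((a[k] + b_prev) * (a_prev + b_prev))
        p.append(p_i)
        q.append(q_i)
        r.append(1.0 - p_i - q_i)
    return p, q, r

# Hypothetical weights; normalization is irrelevant because only ratios enter p, q, r.
n = 8
a = [1.0 / (k + 1) ** 2 for k in range(n)]
b = [0.5 / (k + 1) ** 2 for k in range(n)]
p, q, r = x_chain_probs(a, b)

# Assumed marginal weight of U = i: a_i + b_{i-1}.
pi_u = [a[k] + (b[k - 1] if k > 0 else 0.0) for k in range(n)]
# Detailed balance of a birth-death chain: pi_i * p_i = pi_{i+1} * q_{i+1}.
balance_gap = max(abs(pi_u[k] * p[k] - pi_u[k + 1] * q[k + 1]) for k in range(n - 1))
```

If the assumptions hold, `balance_gap` is at rounding-error level, which is consistent with the $x$-chain being a reversible birth-death chain.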

Proposition 2.2. The operator $M_X$ of the marginal $x$-chain of $\{(X_n, Y_n)\}_{n=0}^{\infty}$ is compact if and only if

\[
b_i/a_i \to 0 \quad \text{and} \quad a_{i+1}/b_i \to 0 . \tag{2–3}
\]

Proof. We start with necessity. Assume $M_X$ is compact. Then Proposition 2.1 implies that $r_i \to 0$. In this case,

\[
r_i = 1 - p_i - q_i = \frac{a_i^2}{(a_i + b_{i-1})(a_i + b_i)} + \frac{b_{i-1}^2}{(a_i + b_{i-1})(a_{i-1} + b_{i-1})} .
\]

Thus, $r_i \to 0$ if and only if

\[
\frac{a_i^2}{(a_i + b_{i-1})(a_i + b_i)} = \frac{1}{(1 + b_{i-1}/a_i)(1 + b_i/a_i)} \to 0 , \tag{2–4}
\]

and

\[
\frac{b_{i-1}^2}{(a_i + b_{i-1})(a_{i-1} + b_{i-1})} = \frac{1}{(1 + a_i/b_{i-1})(1 + a_{i-1}/b_{i-1})} \to 0 . \tag{2–5}
\]

Now consider (2–4). Since $1/(1 + b_{i-1}/a_i)$ and $1/(1 + b_i/a_i)$ are both in $(0, 1)$, their product converges to zero if and only if

\[
\min\left\{ \frac{1}{1 + b_{i-1}/a_i},\ \frac{1}{1 + b_i/a_i} \right\} \to 0 .
\]

It follows that $r_i \to 0$ if and only if

\[
\max\left\{ \frac{b_{i-1}}{a_i},\ \frac{b_i}{a_i} \right\} \to \infty , \tag{2–6}
\]

and

\[
\max\left\{ \frac{a_i}{b_{i-1}},\ \frac{a_{i-1}}{b_{i-1}} \right\} \to \infty . \tag{2–7}
\]

Now let $N$ be such that $(b_i \vee b_{i-1})/a_i > 1$ and $(a_i \vee a_{i-1})/b_{i-1} > 1$ for all $i > N$, where $\vee$ denotes maximum. Assume that $i > N$. If $a_{i-1} \le b_{i-1}$, then $a_i > b_{i-1}$ and $a_i < b_i$. Consequently,

\[
a_{i-1} \le b_{i-1} < a_i < b_i < a_{i+1} < b_{i+1} < \cdots .
\]

But this contradicts the fact that $a_i \to 0$. We conclude that $a_{i-1} > b_{i-1}$ for all $i > N$. Then, since we also know that $(b_i \vee b_{i-1})/a_i > 1$ for all $i > N$, it follows that

\[
a_N > b_N > a_{N+1} > b_{N+1} > a_{N+2} > b_{N+2} > \cdots .
\]

Thus, for $i > N$, $b_i \vee b_{i-1} = b_{i-1}$ and $a_i \vee a_{i-1} = a_{i-1}$. It follows from (2–6) and (2–7) that $b_{i-1}/a_i \to \infty$ and $a_i/b_i \to \infty$. Thus, (2–3) holds.

We now establish sufficiency. Assume that (2–3) holds. Then there exists an $N$ such that $b_i/a_i$ and $a_{i+1}/b_i$ are both smaller than 1 for all $i \ge N$. Therefore,

\[
a_N > b_N > a_{N+1} > b_{N+1} > a_{N+2} > b_{N+2} > \cdots .
\]

Thus, the arguments in the necessity part of the proof imply that $r_i \to 0$. Finally, it is easy to see that

\[
p_i = \frac{1}{(1 + b_{i-1}/a_i)(1 + a_i/b_i)} \to 0 .
\]

The following result gives a sufficient condition for geometric ergodicity.

Proposition 2.3. (Tan, 2009) The Markov chains $\{(X_n, Y_n)\}_{n=0}^{\infty}$, $\{X_n\}_{n=0}^{\infty}$ and $\{Y_n\}_{n=0}^{\infty}$ are all geometrically ergodic if

\[
\limsup_{i \to \infty} \frac{p_i}{q_i} < 1 \quad \text{and} \quad \liminf_{i \to \infty} q_i > 0 . \tag{2–8}
\]

Note that the conditions of Proposition 2.3 are much weaker than the conditions of Proposition 2.1. We now apply our results to two examples of the joint pmf (1–1). We begin with a chain that converges at a geometric rate, but is not compact.

Example 2.4. (Tan, 2009) For $i \in \mathbb{N}$, let $a_i = c e^{-i}$ and $b_i = e^{-i}$, where $c = e - 2$. Then

\[
p_i = \frac{c}{(c + e)(c + 1)} \quad \text{and} \quad q_i = \frac{c e}{(c + e)(c + 1)} ,
\]

so $p_i/q_i = 1/e$. Thus, by Proposition 2.3, the Gibbs Markov chain is geometrically ergodic. However, since $b_i/a_i = 1/c$ for all $i$, which does not converge to 0, Proposition 2.2 implies that the operator is not compact.
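The constants in Example 2.4 are easy to confirm numerically. The sketch below simply evaluates the general formulas for $p_i$ and $q_i$ at the example's sequences for a handful of states $i \ge 2$; the helper names and the chosen range of states are assumptions for illustration.

```python
import math

c = math.e - 2.0  # the constant from Example 2.4

def a(i):
    return c * math.exp(-i)

def b(i):
    return math.exp(-i)

def p(i):
    # p_i = a_i b_i / ((a_i + b_{i-1}) (a_i + b_i))
    return a(i) * b(i) / ((a(i) + b(i - 1)) * (a(i) + b(i)))

def q(i):
    # q_i = a_{i-1} b_{i-1} / ((a_i + b_{i-1}) (a_{i-1} + b_{i-1})), used here for i >= 2
    return a(i - 1) * b(i - 1) / ((a(i) + b(i - 1)) * (a(i - 1) + b(i - 1)))

states = list(range(2, 12))
ratios = [p(i) / q(i) for i in states]
p_closed = c / ((c + math.e) * (c + 1.0))
q_closed = c * math.e / ((c + math.e) * (c + 1.0))
ba_ratios = [b(i) / a(i) for i in states]
```

Every entry of `ratios` should agree with $1/e$ and every entry of `ba_ratios` with $1/c$ up to rounding, matching the two claims of the example: (2–8) holds, while $b_i/a_i$ does not tend to 0.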

We end with an example of a compact Gibbs Markov chain.

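Example 2.5. For $i \in \mathbb{N}$, let

\[
a_i' = \frac{1}{(2i - 1)^{2i-1}} \quad \text{and} \quad b_i' = \frac{1}{(2i)^{2i}} .
\]

Note that $M := \sum_{i=1}^{\infty} (a_i' + b_i') = \sum_{i=1}^{\infty} i^{-i} < \infty$, and let $a_i = a_i'/M$ and $b_i = b_i'/M$. Now

\[
\frac{b_i}{a_i} = \frac{(2i - 1)^{2i-1}}{(2i)^{2i}} = \left[ \frac{2i - 1}{2i} \right]^{2i-1} \frac{1}{2i} \le \frac{1}{2i} \to 0 .
\]

In addition,

\[
\frac{a_{i+1}}{b_i} = \frac{(2i)^{2i}}{(2i + 1)^{2i+1}} = \left[ \frac{2i}{2i + 1} \right]^{2i} \frac{1}{2i + 1} \le \frac{1}{2i + 1} \to 0 .
\]

Thus, by Proposition 2.2, the operator is compact. Of course, this implies that the chain is geometrically ergodic.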
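The two bounds in Example 2.5 can also be verified exactly with rational arithmetic. In the sketch below the normalizing constant $M$ is omitted because it cancels in both ratios; the function names and the selection of indices are illustrative assumptions.

```python
from fractions import Fraction

def a_prime(i):
    return Fraction(1, (2 * i - 1) ** (2 * i - 1))

def b_prime(i):
    return Fraction(1, (2 * i) ** (2 * i))

indices = [1, 2, 5, 10, 20]
# b_i / a_i and a_{i+1} / b_i; the normalizing constant M cancels in both.
ba = {i: b_prime(i) / a_prime(i) for i in indices}
ab = {i: a_prime(i + 1) / b_prime(i) for i in indices}

for i in indices:
    print(i, float(ba[i]), float(ab[i]))
```

Both columns of the printed table should decrease toward 0 and stay below $1/(2i)$ and $1/(2i+1)$ respectively, which is condition (2–3).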

2.4 Birth-Death Chains Are Not Uniformly Ergodic

In this section, we prove the following result.

Proposition 2.6. The birth-death chain $X$ is not uniformly ergodic.

We can prove this proposition by using Theorem 16.0.2(vi) in Meyn and Tweedie (2009). However, we provide a simple proof here.

Proof. Recall that $\pi$ is the stationary distribution. There exists $N$ such that $\pi(N) > 0$. Suppose that $X$ is uniformly ergodic. Then there exist a positive constant $M < \infty$ and a constant $0 < t < 1$ such that, for all $i$ and $n$,

\[
\sum_{j=1}^{\infty} |P^n(i, j) - \pi(j)| \le M t^n .
\]

So

\[
|P^n(N + n + 1, N) - \pi(N)| \le M t^n , \quad n \ge 1 .
\]

If the chain starts at the state $N + n + 1$, it takes at least $n + 1$ steps to move to the state $N$. Thus, $P^n(N + n + 1, N) = 0$ for all $n$ and hence

\[
|P^n(N + n + 1, N) - \pi(N)| = \pi(N) \le M t^n , \quad n \ge 1 .
\]

This cannot happen because $t^n \to 0$ and $\pi(N) > 0$. Thus, $X$ is not uniformly ergodic.

We can use the same argument to prove that a more general chain is not uniformly ergodic: $P$ is not uniformly ergodic if $P$ has a stationary distribution and there exists $\rho > 0$ such that

\[
P(i, j) = 0 \quad \text{for } |i - j| > \rho , \quad i, j = 1, 2, \ldots
\]
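The key step of the proof, that $P^n(N + n + 1, N) = 0$, can be seen on a small example. The sketch below uses a finite truncation with constant, hypothetical values $p = 0.2$ and $q = 0.5$ (these are assumptions for illustration; the truncation does not affect the entries inspected) and 0-based state indices.

```python
def mat_mul(A, B):
    """Plain dense matrix product for square lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def banded_chain(m, p=0.2, q=0.5):
    """Tridiagonal transition matrix on m states; the diagonal takes the leftover mass."""
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        if i > 0:
            P[i][i - 1] = q
        if i < m - 1:
            P[i][i + 1] = p
        P[i][i] = 1.0 - sum(P[i])
    return P

m, N = 12, 0
P = banded_chain(m)
Pn = [row[:] for row in P]           # holds P^n, starting with n = 1
far_entries, next_entries = [], []
for n in range(1, 6):
    far_entries.append(Pn[N + n + 1][N])   # P^n(N+n+1, N): needs n+1 steps, so 0
    Pn = mat_mul(Pn, P)                    # now holds P^(n+1)
    next_entries.append(Pn[N + n + 1][N])  # P^(n+1)(N+n+1, N): reachable, so > 0
```

The entries in `far_entries` are exactly zero, while those in `next_entries` are strictly positive; since $\pi(N) > 0$ is fixed, the total variation distance from a far-away start cannot shrink uniformly at a geometric rate.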


CHAPTER 3
SPECTRAL ANALYSIS OF GIBBS SAMPLERS FOR BAYESIAN LINEAR MIXED MODELS

3.1 Summary

In Section 3.3, we prove that the marginal chain of Hobert and Geyer's (1998) Block Gibbs chain is never Hilbert-Schmidt. For the improper prior, we consider an alternative Block GS instead of Tan and Hobert's (2009) chain. In Section 3.4, we show that this alternative Block Gibbs chain is not trace class in most cases.

3.2 The Models and the Gibbs Samplers

3.2.1 Proper Prior

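Recall the model with proper prior in Section 1.2.2:

\[
\begin{aligned}
y_{ij} \mid \theta, \lambda_e, \lambda_\theta, \mu &\sim N(\theta_i, \lambda_e^{-1}), \quad i = 1, 2, \ldots, K, \ j = 1, 2, \ldots, m_i, \\
\theta \mid \mu, \lambda_\theta, \lambda_e &\sim N(\mathbf{1}\mu, I\lambda_\theta^{-1}), \qquad \lambda_e \sim \text{Gamma}(a_2, b_2), \\
\mu &\sim N(\mu_0, \lambda_0^{-1}), \qquad \lambda_\theta \sim \text{Gamma}(a_1, b_1).
\end{aligned}
\]

In addition to the notation of Section 1.2.2, we introduce the following. Denote $\bar{y}_i = m_i^{-1} \sum_{j=1}^{m_i} y_{ij}$, $N = \sum_{i=1}^{K} m_i$, $A_1 = a_1 + K/2$, and $A_2 = a_2 + N/2$. We write $\lambda > 0$ to mean $\lambda_\theta, \lambda_e > 0$. Denote by $f(\cdot \mid \cdot)$ a generic conditional density and by $f$ a generic density. For any matrix $M$, let $M(i, j)$ denote the entry in the $i$th row and $j$th column of $M$. Recall that $\zeta = (\theta_1, \ldots, \theta_K, \mu)^T$ and $\lambda = (\lambda_\theta, \lambda_e)^T$. The posterior density is

\[
\pi(\zeta, \lambda \mid y) \propto \left[ \prod_{i=1}^{K} \prod_{j=1}^{m_i} f(y_{ij} \mid \theta, \lambda_e) \right] f(\theta \mid \mu, \lambda_\theta)\, f(\mu)\, f(\lambda_\theta)\, f(\lambda_e).
\]

Recall that we simplify notation by suppressing the dependence on $y$, e.g., $\pi(\zeta, \lambda) := \pi(\zeta, \lambda \mid y)$.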


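Hobert and Geyer (1998) studied the following Block GS for the above model:

\[
\begin{aligned}
\lambda_\theta \mid \lambda_e, \mu, \theta &\sim \text{Gamma}\left( A_1,\ b_1 + \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \right), \\
\lambda_e \mid \lambda_\theta, \mu, \theta &\sim \text{Gamma}\left( A_2,\ b_2 + \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right), \\
\zeta \mid \lambda_\theta, \lambda_e &\sim N(\zeta_\lambda, V_\lambda),
\end{aligned}
\tag{3–1}
\]

where

\[
V_\lambda^{-1} = \begin{pmatrix} D_\lambda^2 & -\lambda_\theta \mathbf{1} \\ -\lambda_\theta \mathbf{1}^T & \lambda_0 + K\lambda_\theta \end{pmatrix},
\]

$D_\lambda$ is a $K \times K$ diagonal matrix whose $i$th diagonal element is $d_{\lambda,ii} = \sqrt{\lambda_\theta + m_i \lambda_e}$, and

\[
V_\lambda^{-1} \zeta_\lambda = \begin{pmatrix} \lambda_e m_1 \bar{y}_1 \\ \lambda_e m_2 \bar{y}_2 \\ \vdots \\ \lambda_e m_K \bar{y}_K \\ \lambda_0 \mu_0 \end{pmatrix}. \tag{3–2}
\]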

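As we introduced in Chapter 1, there are three Markov chains associated with this GS. The Block Gibbs chain has Markov transition density (Mtd)

\[
\begin{aligned}
k(\lambda', \zeta' \mid \lambda, \zeta) &= \pi(\lambda' \mid \zeta')\, \pi(\zeta' \mid \lambda) \\
&\propto \left[ b_1 + \frac{1}{2} \sum_{i=1}^{K} (\theta_i' - \mu')^2 \right]^{A_1} \lambda_\theta'^{\,A_1 - 1} \exp\left[ -b_1 \lambda_\theta' - \frac{1}{2} \sum_{i=1}^{K} (\theta_i' - \mu')^2 \lambda_\theta' \right] \\
&\quad \times \left[ b_2 + \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i')^2 \right]^{A_2} \lambda_e'^{\,A_2 - 1} \exp\left[ -b_2 \lambda_e' - \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i')^2 \lambda_e' \right] \\
&\quad \times |V_\lambda^{-1}|^{1/2} \exp\left[ -\frac{1}{2} (\zeta' - \zeta_\lambda)^T V_\lambda^{-1} (\zeta' - \zeta_\lambda) \right],
\end{aligned}
\]

for $\lambda, \lambda' > 0$ and $\zeta, \zeta' \in \mathbb{R}^{K+1}$.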


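The marginal $\lambda$-chain has Mtd

\[
\begin{aligned}
k(\lambda' \mid \lambda) &= \int_{\mathbb{R}^{K+1}} \pi(\lambda' \mid \zeta)\, \pi(\zeta \mid \lambda)\, d\zeta \\
&\propto \int_{\mathbb{R}^{K+1}} \left[ b_1 + \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \right]^{A_1} \lambda_\theta'^{\,A_1 - 1} \exp\left[ -b_1 \lambda_\theta' - \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \lambda_\theta' \right] \\
&\quad \times \left[ b_2 + \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right]^{A_2} \lambda_e'^{\,A_2 - 1} \exp\left[ -b_2 \lambda_e' - \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \lambda_e' \right] \\
&\quad \times |V_\lambda^{-1}|^{1/2} \exp\left[ -\frac{1}{2} (\zeta - \zeta_\lambda)^T V_\lambda^{-1} (\zeta - \zeta_\lambda) \right] d\zeta, \quad \lambda, \lambda' > 0.
\end{aligned}
\]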

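The marginal $\zeta$-chain has Mtd

\[
\begin{aligned}
k(\zeta' \mid \zeta) &= \int_{\lambda > 0} \pi(\zeta' \mid \lambda)\, \pi(\lambda \mid \zeta)\, d\lambda \\
&\propto \int_{\lambda > 0} \left[ b_1 + \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \right]^{A_1} \lambda_\theta^{A_1 - 1} \exp\left[ -b_1 \lambda_\theta - \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \lambda_\theta \right] \\
&\quad \times \left[ b_2 + \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right]^{A_2} \lambda_e^{A_2 - 1} \exp\left[ -b_2 \lambda_e - \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \lambda_e \right] \\
&\quad \times |V_\lambda^{-1}|^{1/2} \exp\left[ -\frac{1}{2} (\zeta' - \zeta_\lambda)^T V_\lambda^{-1} (\zeta' - \zeta_\lambda) \right] d\lambda, \quad \zeta, \zeta' \in \mathbb{R}^{K+1}.
\end{aligned}
\]

Either all three of the above chains are geometrically ergodic, or none of them is (Diaconis et al., 2008).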

3.2.2 Improper Prior

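Recall the model with improper prior in Section 1.2.2:

\[
\begin{aligned}
y_{ij} \mid \theta, \mu, \lambda_\theta, \lambda_e &\sim N(\theta_i, \lambda_e^{-1}), \quad i = 1, 2, \ldots, K, \ j = 1, 2, \ldots, m_i, \\
\theta \mid \mu, \lambda_\theta, \lambda_e &\sim N(\mathbf{1}\mu, I\lambda_\theta^{-1}), \\
f(\lambda_\theta, \lambda_e, \mu) &\propto \lambda_\theta^{a-1} \lambda_e^{b-1}, \quad \lambda_\theta, \lambda_e > 0.
\end{aligned}
\]

We also use some extra notation here. Denote $\bar{y}_i = m_i^{-1} \sum_{j=1}^{m_i} y_{ij}$, $N = \sum_{i=1}^{K} m_i$, $A = a + K/2$, and $B = b + N/2$. We write $\lambda > 0$ to mean $\lambda_\theta, \lambda_e > 0$. Denote by $f(\cdot \mid \cdot)$ a generic conditional density and by $f$ a generic density. The posterior density is

\[
\pi(\theta, \mu, \lambda \mid y) \propto \left[ \prod_{i=1}^{K} \prod_{j=1}^{m_i} f(y_{ij} \mid \theta_i, \lambda_e) \right] f(\theta \mid \mu, \lambda_\theta)\, f(\lambda_\theta, \lambda_e, \mu).
\]

This posterior is proper if and only if (see Lemma A.2)

\[
a < 0, \quad a + \frac{K}{2} > \frac{1}{2}, \quad \text{and} \quad a + b > \frac{1 - N}{2} .
\]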

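Again, we suppress the dependence on $y$ in the notation. We consider a Block GS which is based on $\pi(\theta \mid \mu, \lambda)$ and $\pi(\mu, \lambda \mid \theta)$, so we need to calculate $\pi(\theta, \mu, \lambda)$, $\pi(\theta \mid \mu, \lambda)$, and $\pi(\mu, \lambda \mid \theta)$. Since

\[
\begin{aligned}
\pi(\theta, \mu, \lambda) &\propto \left\{ \prod_{i=1}^{K} \prod_{j=1}^{m_i} \lambda_e^{1/2} \exp\left[ -\frac{\lambda_e}{2} (y_{ij} - \theta_i)^2 \right] \right\} \left\{ \prod_{i=1}^{K} \lambda_\theta^{1/2} \exp\left[ -\frac{\lambda_\theta}{2} (\theta_i - \mu)^2 \right] \right\} \lambda_\theta^{a-1} \lambda_e^{b-1} \\
&= \lambda_\theta^{A-1} \lambda_e^{B-1} \exp\left[ -\frac{\lambda_\theta}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 - \frac{\lambda_e}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right], \quad \lambda > 0,
\end{aligned}
\tag{3–3}
\]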

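we have

\[
\begin{aligned}
\pi(\theta \mid \mu, \lambda) &\propto \pi(\theta, \mu, \lambda) \\
&\propto \exp\left[ -\frac{\lambda_\theta}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 - \frac{\lambda_e}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right] \\
&\propto \exp\left[ -\frac{\lambda_\theta}{2} \sum_{i=1}^{K} (\theta_i^2 - 2\mu\theta_i) - \frac{\lambda_e}{2} \sum_{i=1}^{K} (m_i \theta_i^2 - 2 m_i \bar{y}_i \theta_i) \right] \\
&= \exp\left[ -\frac{1}{2} \sum_{i=1}^{K} \left[ (\lambda_\theta + m_i \lambda_e)\theta_i^2 - 2(\lambda_\theta \mu + \lambda_e m_i \bar{y}_i)\theta_i \right] \right].
\end{aligned}
\]

So

\[
\theta \mid \mu, \lambda \sim N(\theta_{\mu,\lambda}, V_\lambda), \tag{3–4}
\]

where $V_\lambda$ is a $K \times K$ diagonal matrix whose $i$th diagonal element is $(\lambda_\theta + m_i \lambda_e)^{-1}$, and $\theta_{\mu,\lambda}$ is a $K \times 1$ column vector whose $i$th element is

\[
\theta_{\mu,\lambda,i} = \frac{\lambda_\theta \mu + \lambda_e m_i \bar{y}_i}{\lambda_\theta + m_i \lambda_e}, \quad 1 \le i \le K. \tag{3–5}
\]

Note that

\[
\theta_{\mu,\lambda}^T V_\lambda^{-1} = \begin{pmatrix} \lambda_\theta \mu + \lambda_e m_1 \bar{y}_1 \\ \vdots \\ \lambda_\theta \mu + \lambda_e m_K \bar{y}_K \end{pmatrix}^T . \tag{3–6}
\]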

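From $\pi(\mu, \lambda \mid \theta) \propto \pi(\theta, \mu, \lambda)$, $\lambda_e$ and $(\lambda_\theta, \mu)$ are independent given $\theta$, with

\[
\lambda_e \mid \lambda_\theta, \mu, \theta \sim \text{Gamma}\left( B,\ \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right), \tag{3–7}
\]

\[
\pi(\lambda_\theta, \mu \mid \lambda_e, \theta) \propto \lambda_\theta^{A-1} \exp\left[ -\frac{\lambda_\theta}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \right].
\]

Writing $\bar{\theta} = K^{-1} \sum_{i=1}^{K} \theta_i$, we can show that

\[
\begin{aligned}
\mu \mid \lambda_\theta, \theta, \lambda_e &\sim N\left( \bar{\theta},\ (K\lambda_\theta)^{-1} \right), \\
\lambda_\theta \mid \theta, \lambda_e &\sim \text{Gamma}\left( A - \frac{1}{2},\ \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right).
\end{aligned}
\tag{3–8}
\]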
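One sweep of the Block GS for the improper prior can be sketched from (3–4), (3–7) and (3–8). The code below is a hedged illustration rather than the implementation used in this work: the data `y`, the hyperparameters `a` and `b`, the starting values and the function name `gibbs_sweep` are all hypothetical, and the Gamma distributions are read in the shape-rate parametrization, so the rate is converted to the scale argument that Python's `random.gammavariate` expects.

```python
import random

def gibbs_sweep(y, mu, lam_theta, lam_e, a, b, rng):
    """One Block GS sweep: theta | (mu, lambda), then (mu, lambda) | theta."""
    K = len(y)
    N = sum(len(row) for row in y)
    A, B = a + K / 2.0, b + N / 2.0

    # theta_i | mu, lambda: independent normals, see (3-4) and (3-5).
    theta = []
    for row in y:
        m_i = len(row)
        ybar_i = sum(row) / m_i
        prec = lam_theta + m_i * lam_e
        mean = (lam_theta * mu + lam_e * m_i * ybar_i) / prec
        theta.append(rng.gauss(mean, prec ** -0.5))

    # lambda_e | theta ~ Gamma(B, rate = SSE / 2), see (3-7); scale = 1 / rate.
    sse = sum((y_ij - th) ** 2 for row, th in zip(y, theta) for y_ij in row)
    lam_e_new = rng.gammavariate(B, 2.0 / sse)

    # lambda_theta | theta ~ Gamma(A - 1/2, rate = SST / 2), see (3-8).
    theta_bar = sum(theta) / K
    sst = sum((th - theta_bar) ** 2 for th in theta)
    lam_theta_new = rng.gammavariate(A - 0.5, 2.0 / sst)

    # mu | lambda_theta, theta ~ N(theta_bar, 1 / (K * lambda_theta)), see (3-8).
    mu_new = rng.gauss(theta_bar, (K * lam_theta_new) ** -0.5)
    return theta, mu_new, lam_theta_new, lam_e_new

# Hypothetical data: K = 4 groups with m_i = 3 observations each.
y = [[1.2, 0.8, 1.5], [2.1, 2.9, 2.4], [0.3, -0.2, 0.6], [1.9, 1.1, 1.4]]
a, b = -0.5, 0.0   # satisfies a < 0, a + K/2 > 1/2, a + b > (1 - N)/2
rng = random.Random(0)
mu, lam_theta, lam_e = 0.0, 1.0, 1.0
draws = []
for _ in range(200):
    theta, mu, lam_theta, lam_e = gibbs_sweep(y, mu, lam_theta, lam_e, a, b, rng)
    draws.append((mu, lam_theta, lam_e))
```

The sweep updates $\theta$ first and then $(\mu, \lambda)$, matching the two blocks $\pi(\theta \mid \mu, \lambda)$ and $\pi(\mu, \lambda \mid \theta)$; the draw of $(\mu, \lambda_\theta)$ is factored as $\lambda_\theta \mid \theta$ followed by $\mu \mid \lambda_\theta, \theta$.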


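By (3–8), we have

\[
\begin{aligned}
\pi(\mu, \lambda_\theta \mid \lambda_e, \theta) &= \pi(\lambda_\theta \mid \theta)\, \pi(\mu \mid \lambda_\theta, \theta) \\
&\propto \left[ \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right]^{A - \frac{1}{2}} \lambda_\theta^{A - \frac{3}{2}} \exp\left[ -\frac{1}{2} \lambda_\theta \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right] \lambda_\theta^{\frac{1}{2}} \exp\left[ -\frac{1}{2} K \lambda_\theta (\mu - \bar{\theta})^2 \right] \\
&= \lambda_\theta^{A-1} \left[ \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right]^{A - \frac{1}{2}} \exp\left\{ -\frac{\lambda_\theta}{2} \left[ K(\mu - \bar{\theta})^2 + \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right] \right\} \\
&= \lambda_\theta^{A-1} \left[ \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right]^{A - \frac{1}{2}} \exp\left\{ -\frac{\lambda_\theta}{2} \left[ K\mu^2 - 2K\mu\bar{\theta} + K\bar{\theta}^2 + \sum_{i=1}^{K} \theta_i^2 - K\bar{\theta}^2 \right] \right\} \\
&= \lambda_\theta^{A-1} \left[ \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right]^{A - \frac{1}{2}} \exp\left[ -\frac{\lambda_\theta}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \right], \quad \lambda_\theta > 0, \ \mu \in \mathbb{R}.
\end{aligned}
\tag{3–9}
\]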

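We have three Markov chains. The Block GS chain has Mtd

\[
\begin{aligned}
k(\mu', \lambda', \theta' \mid \mu, \lambda, \theta) &= \pi(\mu', \lambda' \mid \theta')\, \pi(\theta' \mid \mu, \lambda) \\
&= \pi(\mu', \lambda_\theta' \mid \lambda_e', \theta')\, \pi(\lambda_e' \mid \theta')\, \pi(\theta' \mid \mu, \lambda) \\
&\propto \lambda_\theta'^{\,A-1} \left[ \sum_{i=1}^{K} (\theta_i' - \bar{\theta}')^2 \right]^{A - \frac{1}{2}} \exp\left[ -\frac{\lambda_\theta'}{2} \sum_{i=1}^{K} (\theta_i' - \mu')^2 \right] \\
&\quad \times \left[ \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i')^2 \right]^{B} \lambda_e'^{\,B-1} \exp\left[ -\frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i')^2 \lambda_e' \right] \\
&\quad \times |V_\lambda^{-1}|^{1/2} \exp\left[ -\frac{1}{2} (\theta' - \theta_{\mu,\lambda})^T V_\lambda^{-1} (\theta' - \theta_{\mu,\lambda}) \right],
\end{aligned}
\]

for $\lambda, \lambda' > 0$, $\mu, \mu' \in \mathbb{R}$, and $\theta, \theta' \in \mathbb{R}^K$.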


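The marginal $(\mu, \lambda)$-chain has Mtd

\[
\begin{aligned}
k(\mu', \lambda' \mid \mu, \lambda) &= \int_{\mathbb{R}^K} \pi(\mu', \lambda' \mid \theta)\, \pi(\theta \mid \mu, \lambda)\, d\theta \\
&= \int_{\mathbb{R}^K} \pi(\mu', \lambda_\theta' \mid \lambda_e', \theta)\, \pi(\lambda_e' \mid \theta)\, \pi(\theta \mid \mu, \lambda)\, d\theta \\
&\propto \int_{\mathbb{R}^K} \lambda_\theta'^{\,A-1} \left[ \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right]^{A - \frac{1}{2}} \exp\left[ -\frac{\lambda_\theta'}{2} \sum_{i=1}^{K} (\theta_i - \mu')^2 \right] \\
&\quad \times \left[ \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right]^{B} \lambda_e'^{\,B-1} \exp\left[ -\frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \lambda_e' \right] \\
&\quad \times |V_\lambda^{-1}|^{1/2} \exp\left[ -\frac{1}{2} (\theta - \theta_{\mu,\lambda})^T V_\lambda^{-1} (\theta - \theta_{\mu,\lambda}) \right] d\theta, \quad \lambda, \lambda' > 0, \ \mu, \mu' \in \mathbb{R}.
\end{aligned}
\]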

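The marginal $\theta$-chain has Mtd

\[
\begin{aligned}
k(\theta' \mid \theta) &= \int_{\lambda > 0} \int_{\mu = -\infty}^{\infty} \pi(\theta' \mid \mu, \lambda)\, \pi(\mu, \lambda \mid \theta)\, d\mu\, d\lambda \\
&\propto \int_{\lambda > 0} \int_{\mu = -\infty}^{\infty} \lambda_\theta^{A-1} \left[ \sum_{i=1}^{K} (\theta_i - \bar{\theta})^2 \right]^{A - \frac{1}{2}} \exp\left[ -\frac{\lambda_\theta}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \right] \\
&\quad \times \left[ \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right]^{B} \lambda_e^{B-1} \exp\left[ -\frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \lambda_e \right] \\
&\quad \times |V_\lambda^{-1}|^{1/2} \exp\left[ -\frac{1}{2} (\theta' - \theta_{\mu,\lambda})^T V_\lambda^{-1} (\theta' - \theta_{\mu,\lambda}) \right] d\mu\, d\lambda, \quad \theta, \theta' \in \mathbb{R}^K.
\end{aligned}
\]

Either all three of the above chains are geometrically ergodic, or none of them is (Diaconis et al., 2008).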

3.3 Hobert & Geyer’s Gibbs Sampler Is Not Hilbert-Schmidt

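Our main result is the following.

Proposition 3.1. The marginal $\lambda$-chain of Hobert and Geyer's (1998) Gibbs chain is not Hilbert-Schmidt, i.e.,

\[
\int_{\lambda_\theta > 0} \int_{\lambda_e > 0} \int_{\lambda_\theta' > 0} \int_{\lambda_e' > 0} \left[ \frac{k(\lambda' \mid \lambda)}{\pi(\lambda')} \right]^2 \pi(\lambda)\, \pi(\lambda')\, d\lambda'\, d\lambda = \infty.
\]

We denote by “constant” some known positive constant. We need the following lemmas for the proof.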


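Lemma 3.2. Let $K$ be a positive integer, let $c_i$, $1 \le i \le K$, be positive real numbers, and let $c$ and $\lambda$ be real numbers such that $c - \lambda^2 \sum_{i=1}^{K} \frac{1}{c_i} > 0$. Let

\[
M_i = \begin{pmatrix}
c_i & 0 & 0 & \cdots & 0 & -\lambda \\
0 & c_{i+1} & 0 & \cdots & 0 & -\lambda \\
0 & 0 & c_{i+2} & \cdots & 0 & -\lambda \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & c_K & -\lambda \\
-\lambda & -\lambda & -\lambda & \cdots & -\lambda & c
\end{pmatrix}, \quad i = 1, \ldots, K.
\]

Then

\[
|M_1| = \left( \prod_{i=1}^{K} c_i \right) \left( c - \lambda^2 \sum_{i=1}^{K} \frac{1}{c_i} \right),
\]

where $|M_1|$ is the determinant of $M_1$. Moreover, the unique solution $x$ of $M_1 x = r$, where $r = (r_1, r_2, \ldots, r_{K+1})^T$ and $x = (x_1, x_2, \ldots, x_{K+1})^T$ are $(K + 1) \times 1$ column vectors, is

\[
x_{K+1} = \frac{r_{K+1} + \lambda \sum_{i=1}^{K} \frac{r_i}{c_i}}{c - \lambda^2 \sum_{i=1}^{K} \frac{1}{c_i}} \quad \text{and} \quad x_i = (r_i + \lambda x_{K+1})/c_i, \quad i = 1, 2, \ldots, K.
\]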

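Proof. Expanding along the first row, we have

\[
\begin{aligned}
|M_1| &= c_1 |M_2| + (-1)^{K+2} (-\lambda)
\begin{vmatrix}
0 & c_2 & 0 & \cdots & 0 \\
0 & 0 & c_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & c_K \\
-\lambda & -\lambda & -\lambda & \cdots & -\lambda
\end{vmatrix} \\
&= c_1 |M_2| + (-1)^{K+2} (-\lambda) \left[ c_2 c_3 \cdots c_K (-\lambda)(-1)^{K-1} \right] \\
&= c_1 |M_2| - c_2 c_3 \cdots c_K \lambda^2 .
\end{aligned}
\]

In the same way, we have

\[
\begin{aligned}
|M_i| &= c_i |M_{i+1}| - c_{i+1} c_{i+2} \cdots c_K \lambda^2, \quad i = 1, 2, \ldots, K - 1, \\
|M_K| &= c_K c - \lambda^2 .
\end{aligned}
\]

Thus,

\[
\begin{aligned}
|M_1| - c_1 |M_2| &= -\lambda^2 \frac{\prod_{j=1}^{K} c_j}{c_1}, \\
c_1 c_2 \cdots c_{i-1} |M_i| - c_1 c_2 \cdots c_i |M_{i+1}| &= -\lambda^2 \frac{\prod_{j=1}^{K} c_j}{c_i}, \quad i = 2, 3, \ldots, K - 1, \\
c_1 c_2 \cdots c_{K-1} |M_K| &= c_1 c_2 \cdots c_K c - \lambda^2 \frac{\prod_{j=1}^{K} c_j}{c_K} .
\end{aligned}
\]

Summing these equations, we get

\[
|M_1| = -\left[ \prod_{j=1}^{K} c_j \right] \lambda^2 \sum_{i=1}^{K} \frac{1}{c_i} + \left[ \prod_{i=1}^{K} c_i \right] c = \left[ \prod_{i=1}^{K} c_i \right] \left[ c - \lambda^2 \sum_{i=1}^{K} \frac{1}{c_i} \right].
\]

Now we solve the equation $M_1 x = r$ to find $x$. We have

\[
\begin{aligned}
c_1 x_1 - \lambda x_{K+1} &= r_1, \\
&\ \ \vdots \\
c_K x_K - \lambda x_{K+1} &= r_K, \\
-\lambda \sum_{i=1}^{K} x_i + c\, x_{K+1} &= r_{K+1}.
\end{aligned}
\]

Substituting $x_i = (r_i + \lambda x_{K+1})/c_i$ into the last equation, we have

\[
\begin{aligned}
-\lambda \sum_{i=1}^{K} \frac{r_i + \lambda x_{K+1}}{c_i} + c\, x_{K+1} = r_{K+1}
&\iff x_{K+1} \left( c - \lambda^2 \sum_{i=1}^{K} \frac{1}{c_i} \right) = r_{K+1} + \lambda \sum_{i=1}^{K} \frac{r_i}{c_i} \\
&\iff x_{K+1} = \frac{r_{K+1} + \lambda \sum_{i=1}^{K} \frac{r_i}{c_i}}{c - \lambda^2 \sum_{i=1}^{K} \frac{1}{c_i}} .
\end{aligned}
\]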
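Both formulas of Lemma 3.2 can be checked exactly on a small instance. The sketch below uses rational arithmetic; the matrix entries, the right-hand side `r` and all function names are hypothetical choices made only for this check, and the determinant is computed by ordinary Gaussian elimination rather than by the recursion used in the proof.

```python
from fractions import Fraction

def arrowhead(cs, c, lam):
    """Build M_1 from Lemma 3.2: diag(c_1..c_K) bordered by -lam, with corner entry c."""
    K = len(cs)
    M = [[Fraction(0)] * (K + 1) for _ in range(K + 1)]
    for i in range(K):
        M[i][i] = Fraction(cs[i])
        M[i][K] = Fraction(-lam)
        M[K][i] = Fraction(-lam)
    M[K][K] = Fraction(c)
    return M

def det(M):
    """Exact determinant via Gaussian elimination with row swaps."""
    A = [row[:] for row in M]
    n, d = len(A), Fraction(1)
    for k in range(n):
        piv = next((r for r in range(k, n) if A[r][k] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            d = -d
        d *= A[k][k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for j in range(k, n):
                A[r][j] -= f * A[k][j]
    return d

def lemma_det(cs, c, lam):
    """Closed form: (prod c_i) * (c - lam^2 * sum 1/c_i)."""
    prod = Fraction(1)
    for ci in cs:
        prod *= ci
    return prod * (c - lam ** 2 * sum(Fraction(1) / ci for ci in cs))

def lemma_solve(cs, c, lam, r):
    """Closed-form solution of M_1 x = r from the lemma."""
    K = len(cs)
    denom = c - lam ** 2 * sum(Fraction(1) / ci for ci in cs)
    x_last = (r[K] + lam * sum(Fraction(r[i]) / cs[i] for i in range(K))) / denom
    return [(r[i] + lam * x_last) / cs[i] for i in range(K)] + [x_last]

# Hypothetical instance with K = 3; here c - lam^2 * sum(1/c_i) = 89/30 > 0.
cs, c, lam = [2, 3, 5], 4, 1
r = [1, -2, 3, 7]
M1 = arrowhead(cs, c, lam)
x = lemma_solve(cs, c, lam, r)
residual = [sum(M1[i][j] * x[j] for j in range(4)) - r[i] for i in range(4)]
```

With exact fractions, `det(M1)` and `lemma_det(cs, c, lam)` agree and every entry of `residual` is zero for this instance.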


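Lemma 3.3. Let

\[
(X, Y)^T \sim N\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix} \right),
\]

where $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$, and $\rho$ are functions of a vector $\tau$ in $\mathbb{R}^n$. Suppose that $\sigma_X \to \infty$ when $\tau \to 0$, but $\mu_X$, $\mu_Y$, $\sigma_Y$, and $\text{Cov}(X, Y)$ are uniformly bounded by a constant for all $\tau$. Given any constants $a$ and $b > 0$, there exists $\varepsilon > 0$ such that

\[
E\left[ (X - Y)^2 |X - a|^b \right] > (\text{constant})\, \sigma_X^{b+2} \quad \text{for } 0 < \tau < (\varepsilon, \varepsilon, \cdots, \varepsilon)^T,
\]

where $(\varepsilon, \varepsilon, \cdots, \varepsilon)^T \in \mathbb{R}^n$ and “$<$” is the partial order in $\mathbb{R}^n$.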

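Proof. Denote $h(X) = |X - a|^b$. Denote conditional expectation and conditional variance given $X$ by $E_X$ and $\text{Var}_X$ respectively. Note that

\[
E\left[ (X - Y)^2 |X - a|^b \right] = E\left[ (X - Y)^2 h(X) \right] = E\left[ h(X)\, E_X (Y - X)^2 \right].
\]

Since

\[
E_X (Y - X)^2 = \text{Var}_X Y + (E_X Y - X)^2 \ge (E_X Y - X)^2,
\]

we have

\[
E\left[ h(X)\, E_X (Y - X)^2 \right] \ge E\left[ h(X) (E_X Y - X)^2 \right].
\]

$(X, Y)$ has a normal distribution, so

\[
E_X Y = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X} (X - \mu_X).
\]

Denote $U = X - a$; then $h(X) = |U|^b$. We get

\[
\begin{aligned}
X - E_X Y &= X - \mu_Y - \rho \frac{\sigma_Y}{\sigma_X} (X - \mu_X) \\
&= \left( 1 - \rho \frac{\sigma_Y}{\sigma_X} \right) X - \left( \mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X \right) \\
&= \left( 1 - \rho \frac{\sigma_Y}{\sigma_X} \right) (X - a) + \left[ a \left( 1 - \rho \frac{\sigma_Y}{\sigma_X} \right) - \left( \mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X \right) \right] \\
&= c' U + d',
\end{aligned}
\]

where

\[
c' = 1 - \rho \frac{\sigma_Y}{\sigma_X} \quad \text{and} \quad d' = a \left( 1 - \rho \frac{\sigma_Y}{\sigma_X} \right) - \left( \mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X \right).
\]

Denote $c = |c'|$ and $d = |d'|$. Combining the above formulas, we get

\[
\begin{aligned}
E\left[ h(X)\, E_X (Y - X)^2 \right] &\ge E\left[ |U|^b (c' U + d')^2 \right] \\
&= E\left[ c^2 |U|^{b+2} + 2 c' d' |U|^b U + d^2 |U|^b \right] \\
&\ge E\left[ c^2 |U|^{b+2} - 2 c d |U|^{b+1} \right] \\
&= c \left[ c\, E|U|^{b+2} - 2 d\, E|U|^{b+1} \right],
\end{aligned}
\]

and hence

\[
c^{-1} E\left[ h(X)\, E_X (Y - X)^2 \right] \ge \frac{c}{2} E|U|^{b+2} + \left[ \frac{c}{2} E|U|^{b+2} - 2 d\, E|U|^{b+1} \right].
\]

It suffices to show that $(c/2) E|U|^{b+2} > (\text{constant})\, \sigma_X^{b+2}$ and $(c/2) E|U|^{b+2} - 2 d\, E|U|^{b+1} > 0$ for small $\tau$. To do that, we need to find bounds for $c$ and $d$ and a representation of $E|U|^r$ in terms of $\sigma_U^r$ for all $r > 0$.

We have

\[
\rho = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \implies \rho \frac{\sigma_Y}{\sigma_X} = \frac{\text{Cov}(X, Y)}{\sigma_X^2} .
\]

Because $\text{Cov}(X, Y)$ is uniformly bounded by a constant for all $\tau$ and $\sigma_X \to \infty$ when $\tau \to 0$, there exists $\varepsilon_1 > 0$ such that $|\rho \sigma_Y / \sigma_X| < \frac{1}{2}$ for $\tau < \varepsilon_1$. Thus,

\[
\frac{3}{2} > c > \frac{1}{2} \quad \text{for } \tau < \varepsilon_1 .
\]

$\mu_X$ and $\mu_Y$ are uniformly bounded by a constant for all $\tau$, and $|\rho \sigma_Y / \sigma_X| < \frac{1}{2}$ for $\tau < \varepsilon_1$, and therefore $d'$ is uniformly bounded by a constant for $\tau < \varepsilon_1$. So there exists a constant $M > 0$ such that

\[
d < M \quad \text{for } \tau < \varepsilon_1 .
\]

We have

\[
U \sim N(\mu_U, \sigma_U^2),
\]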


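where $\mu_U = \mu_X - a$ and $\sigma_U = \sigma_X$. Denote the rising factorial by

\[
z^{\overline{n}} = z(z + 1) \cdots (z + n - 1), \quad n \in \mathbb{N},
\]

and $z^{\overline{0}} = 1$. From Winkelbauer (2012), we have

\[
E|U|^r = \sigma_U^r\, 2^{r/2}\, \frac{\Gamma\left( \frac{r+1}{2} \right)}{\sqrt{\pi}}\, \Phi\left( -\frac{r}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2} \right), \quad r > -1,
\]

where

\[
\Phi(\alpha, \beta, x) = \sum_{n=0}^{\infty} \frac{\alpha^{\overline{n}}}{\beta^{\overline{n}}} \frac{x^n}{n!} = 1 + \sum_{n=1}^{\infty} \frac{\alpha^{\overline{n}}}{\beta^{\overline{n}}} \frac{x^n}{n!}
\]

is Kummer's confluent hypergeometric function. When $\beta$ is not a nonpositive integer, this function is analytic and hence it is continuous with respect to $x$.

Now we show that $\frac{c}{2} E|U|^{b+2} - 2 d\, E|U|^{b+1}$ is positive when $\tau$ is small enough. For $\tau < \varepsilon_1$ we have

\[
\begin{aligned}
\frac{c}{2} E|U|^{b+2} - 2 d\, E|U|^{b+1} &\ge \frac{1}{4} E|U|^{b+2} - 2 M E|U|^{b+1} \\
&= \frac{1}{4} \sigma_U^{b+2}\, 2^{(b+2)/2}\, \frac{\Gamma\left( \frac{b+3}{2} \right)}{\sqrt{\pi}}\, \Phi\left( -\frac{b+2}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2} \right) \\
&\quad - 2 M \sigma_U^{b+1}\, 2^{(b+1)/2}\, \frac{\Gamma\left( \frac{b+2}{2} \right)}{\sqrt{\pi}}\, \Phi\left( -\frac{b+1}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2} \right) \\
&= \sigma_U^{b+1}\, 2^{(b-2)/2}\, \pi^{-1/2} \left[ \sigma_U \Gamma\left( \frac{b+3}{2} \right) \Phi\left( -\frac{b+2}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2} \right) \right. \\
&\quad \left. - 4\sqrt{2}\, M\, \Gamma\left( \frac{b+2}{2} \right) \Phi\left( -\frac{b+1}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2} \right) \right].
\end{aligned}
\]

$\Phi\left( -\frac{b+2}{2}, \frac{1}{2}, x \right)$ is continuous with respect to $x$ and $\Phi\left( -\frac{b+2}{2}, \frac{1}{2}, 0 \right) = 1$, so there exists $\varepsilon_2 > 0$ such that $\Phi\left( -\frac{b+2}{2}, \frac{1}{2}, x \right) > \frac{1}{2}$ for $|x| < \varepsilon_2$. Since $\mu_X$ is uniformly bounded by a constant for all $\tau$, $\mu_U$ is also uniformly bounded by a constant for all $\tau$. Combining that with $\sigma_U = \sigma_X \to \infty$ when $\tau \to 0$, we get $-\frac{\mu_U^2}{2\sigma_U^2} \to 0$ when $\tau \to 0$. Then there exists $\varepsilon_3 > 0$ such that $\left| \frac{\mu_U^2}{2\sigma_U^2} \right| < \varepsilon_2$ for $\tau < \varepsilon_3$ and hence $\Phi\left( -\frac{b+2}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2} \right) > \frac{1}{2}$ for $\tau < \varepsilon_3$. $\Phi\left( -\frac{b+1}{2}, \frac{1}{2}, x \right)$ is continuous, so there exists $M_1$ such that $\Phi\left( -\frac{b+1}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2} \right) < M_1$ for $\tau < \varepsilon_3$. Thus, for $\tau < \min\{\varepsilon_1, \varepsilon_3\}$ we have

\[
\frac{c}{2} E|U|^{b+2} - 2 d\, E|U|^{b+1} \ge \sigma_U^{b+1}\, 2^{(b-2)/2}\, \pi^{-1/2} \left[ \sigma_U \Gamma\left( \frac{b+3}{2} \right) \frac{1}{2} - 4\sqrt{2}\, M M_1 \Gamma\left( \frac{b+2}{2} \right) \right].
\]

$\sigma_U \to \infty$ when $\tau \to 0$, so there exists $\varepsilon_4 > 0$ such that the last bracket in the above formula is positive for $\tau < \varepsilon_4$. Now if we select $\varepsilon = \min\{\varepsilon_1, \varepsilon_3, \varepsilon_4\}$, then $\frac{c}{2} E|U|^{b+2} - 2 d\, E|U|^{b+1} > 0$ for $\tau < \varepsilon$.

Finally, for $\tau < \varepsilon$, we have

\[
\begin{aligned}
\frac{c}{2} E|U|^{b+2} &\ge 2^{-1}\, 2^{-1}\, \sigma_U^{b+2}\, 2^{(b+2)/2}\, \frac{\Gamma\left( \frac{b+3}{2} \right)}{\sqrt{\pi}}\, \Phi\left( -\frac{b+2}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2} \right) \\
&\ge 2^{-2}\, \sigma_U^{b+2}\, 2^{(b+2)/2}\, \frac{\Gamma\left( \frac{b+3}{2} \right)}{\sqrt{\pi}}\, 2^{-1} \\
&= (\text{constant})\, \sigma_X^{b+2} .
\end{aligned}
\]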
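The absolute-moment formula cited from Winkelbauer (2012) can be sanity-checked against moments that are known in closed form. The sketch below evaluates the formula with the third argument of $\Phi$ taken as $-\mu_U^2/(2\sigma_U^2)$, and compares it with $E U^2 = \sigma^2 + \mu^2$ and $E U^4 = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$. The truncation of the Kummer series at a fixed number of terms, the test values, and the function names are assumptions for illustration.

```python
import math

def kummer(alpha, beta, x, terms=200):
    """Truncated series for Kummer's function: sum_n (alpha)_n / (beta)_n * x^n / n!."""
    total, term = 1.0, 1.0
    for n in range(terms):
        # Ratio of consecutive terms: (alpha + n) / (beta + n) * x / (n + 1).
        term *= (alpha + n) / (beta + n) * x / (n + 1)
        total += term
    return total

def abs_moment(r, mu, sigma):
    """E|U|^r for U ~ N(mu, sigma^2), via the hypergeometric representation."""
    x = -mu ** 2 / (2.0 * sigma ** 2)
    const = sigma ** r * 2.0 ** (r / 2.0) * math.gamma((r + 1) / 2.0) / math.sqrt(math.pi)
    return const * kummer(-r / 2.0, 0.5, x)

mu, sigma = 1.5, 2.0
second = abs_moment(2, mu, sigma)   # should match sigma^2 + mu^2
fourth = abs_moment(4, mu, sigma)   # should match mu^4 + 6 mu^2 sigma^2 + 3 sigma^4
first_centered = abs_moment(1, 0.0, sigma)  # should match sigma * sqrt(2 / pi)
```

For even $r$ the series terminates, so the comparison is exact up to rounding. The check also shows the scaling used in the lemma: with $\mu_U$ bounded, $E|U|^r$ grows like a constant times $\sigma_U^r$ as $\sigma_U \to \infty$.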

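Proof of Proposition 3.1. We will calculate $k(\lambda' \mid \lambda)$, and then calculate $\pi(\lambda)$. From (3–1), we have

\[
\begin{aligned}
\pi(\lambda' \mid \zeta)\, \pi(\zeta \mid \lambda) &\propto \left[ b_1 + \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \right]^{A_1} \lambda_\theta'^{\,A_1 - 1} \exp\left[ -b_1 \lambda_\theta' - \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \lambda_\theta' \right] \\
&\quad \times \left[ b_2 + \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right]^{A_2} \lambda_e'^{\,A_2 - 1} \exp\left[ -b_2 \lambda_e' - \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \lambda_e' \right] \\
&\quad \times |V_\lambda^{-1}|^{1/2} \exp\left[ -\frac{1}{2} (\zeta - \zeta_\lambda)^T V_\lambda^{-1} (\zeta - \zeta_\lambda) \right] \\
&= \lambda_\theta'^{\,A_1 - 1} \lambda_e'^{\,A_2 - 1} \exp(-b_1 \lambda_\theta' - b_2 \lambda_e')\, |V_\lambda^{-1}|^{1/2}\, g(\zeta) \exp\left( -\frac{1}{2} g_1 \right), \quad \lambda > 0, \ \zeta \in \mathbb{R}^{K+1},
\end{aligned}
\]

where

\[
g(\zeta) = \left[ b_1 + \frac{1}{2} \sum_{i=1}^{K} (\theta_i - \mu)^2 \right]^{A_1} \left[ b_2 + \frac{1}{2} \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \right]^{A_2}
\]

and

\[
g_1 = \sum_{i=1}^{K} (\theta_i - \mu)^2 \lambda_\theta' + \sum_{i=1}^{K} \sum_{j=1}^{m_i} (y_{ij} - \theta_i)^2 \lambda_e' + (\zeta - \zeta_\lambda)^T V_\lambda^{-1} (\zeta - \zeta_\lambda).
\]

Consider some $(K + 1)$-variate normal distribution $N(\varepsilon, \Sigma)$. Note that

\[
\int_{\mathbb{R}^{K+1}} g(\zeta) \sqrt{|\Sigma^{-1}|} \exp\left[ -\frac{1}{2} (\zeta - \varepsilon)^T \Sigma^{-1} (\zeta - \varepsilon) \right] d\zeta \propto E_{(\varepsilon, \Sigma)}\, g,
\]

where $E_{(\varepsilon, \Sigma)}$ is the expectation with respect to the $N(\varepsilon, \Sigma)$ distribution, so

\[
\int_{\mathbb{R}^{K+1}} g(\zeta) \exp\left\{ -\frac{1}{2} \left[ \zeta^T \Sigma^{-1} \zeta - 2 \varepsilon^T \Sigma^{-1} \zeta \right] \right\} d\zeta \propto \frac{1}{|\Sigma^{-1}|^{1/2}} \exp\left[ \frac{1}{2} \varepsilon^T \Sigma^{-1} \varepsilon \right] E_{(\varepsilon, \Sigma)}\, g. \tag{3–10}
\]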

We will separate a kernel of multivariate normal distribution from exp(−12g1). We have

g1 =

[K∑i=1

(θi − µ)2λ′θ +K∑i=1

mi∑j=1

(θ2i − 2yijθi)λ

′e + ζTV −1

λ ζ − 2ζTλ V−1λ ζ

]

+

[λ′e

K∑i=1

mi∑j=1

y2ij + ζTλ V

−1λ ζλ

]

= g2 + g3,

where g2 and g3 are the first and the second brackets respectively. Note that g3 does not

depend on ζ. We will re-write g2 as ζTΣ−1ζ−2εTΣ−1ζ. Denote Λθ = λθ +λ′θ, Λe = λe +λ′e,


and Λ = (Λθ,Λe). Using (3–2), we have

g2 =K∑i=1

(θ2i − 2θiµ+ µ2)λ′θ +

K∑i=1

(miθ2i − 2miyiθi)λ

′e + [ζTV −1

λ ζ − 2ζTλ V−1λ ζ] (3–11)

=K∑i=1

(λ′θ +miλ′e)θ

2i +Kλ′θµ

2 − 2λ′θ

K∑i=1

µθi − 2K∑i=1

λ′emiyiθi + [ζTV −1λ ζ − 2ζTλ V

−1λ ζ]

= ζT

D2λ′ −λ′θ1

−λ′θ1T Kλ′θ

ζ − 2

λ′em1y1

...

λ′emK yK

0

T

ζ + [ζTV −1λ ζ − 2ζTλ V

−1λ ζ]

= ζT

D2λ′ −λ′θ1

−λ′θ1T Kλ′θ

ζ − 2

λ′em1y1

...

λ′emK yK

0

T

ζ + ζTV −1λ ζ − 2

λem1y1

...

λemK yK

λ0µ0

T

ζ

= ζTV −1Λ ζ − 2

Λem1y1

...

ΛemK yK

λ0µ0

T

ζ

= ζTV −1Λ ζ − 2ζTΛV

−1Λ ζ.

Thus,

π(λ′|ζ)π(ζ|λ)

∝ λ′θA1−1

λ′eA2−1


′e)|V −1

λ |1/2 exp

(−1

2g3

)g(ζ) exp

[−1

2

(ζTV −1

Λ ζ − 2ζTΛV−1

Λ ζ)], λ > 0, ζ ∈ RK+1.


Denote E(ζλ,Vλ) by Eλ. Using (3–10), we have

k(λ′|λ) =

∫RK+1

π(λ′|ζ)π(ζ|λ)dζ

∝ λ′θA1−1

λ′eA2−1


′e)|V −1

λ |1/2 exp

(−1

2g3

)EΛg

1

|V −1Λ |1/2

exp

[1

2ζTΛV

−1Λ ζΛ

]= λ′θ

A1−1λ′e

A2−1 |V −1λ |1/2

|V −1Λ |1/2

EΛg exp(−b1λ′θ − b2λ

′e)

exp

[−1

2

(λ′e

K∑i=1

mi∑j=1

y2ij + ζTλ V

−1λ ζλ − ζTΛV −1

Λ ζΛ

)], λ, λ′ > 0. (3–12)

Now we calculate π(λ). We have

π(λ, ζ) ∝K∏i=1

mi∏j=1

f(yij|θ, λe)f(θ|µ, λθ)f(µ)f(λθ)f(λe)

=K∏i=1

mi∏j=1

{λ1/2e exp

[−1

2λe(yij − θi)2

]} K∏i=1

{λ

1/2θ exp

[−1

2λθ(θi − µ)2

]}exp[−λ0(µ− µ0)2/2][λa1−1

θ e−b1λθ ][λa2−1e e−b2λe ]

= λK/2+a1−1θ λN/2+a2−1

e exp(−b1λθ − b2λe)

exp

{−1

2

[K∑i=1

mi∑j=1

λe(yij − θi)2 +K∑i=1

λθ(θi − µ)2 + λ0(µ− µ0)2

]}

∝ λA1−1θ λA2−1

e exp(−b1λθ − b2λe)

exp

(−1

2λe

K∑i=1

mi∑j=1

y2ij

)exp

(−1

2g4

), λ > 0, ζ ∈ RK+1,

where

g4 =K∑i=1

mi∑j=1

λe(θ2i − 2yijθi) +

K∑i=1

λθ(θi − µ)2 + λ0(µ2 − 2µµ0).


Doing the same as in (3–11), we get

g4 =

[K∑i=1

miλeθ2i − 2

K∑i=1

λemiyiθi

]+

[K∑i=1

λθθ2i − 2

K∑i=1

λθθiµ+Kλθµ2

]+ [λ0µ

2 − 2λ0µ0µ]

=

[K∑i=1

(λθ +miλe)θ2i + (λ0 +Kλθ)µ

2 − 2λθ

K∑i=1

θiµ

]− 2

[K∑i=1

λemiyiθi + λ0µ0µ

]

= ζTV −1λ ζ − 2ζTλ V

−1λ ζ.

Thus,

π(λ, ζ) ∝ λA1−1θ λA2−1

e exp(−b1λθ − b2λe) exp

[−1

2

(λe

K∑i=1

mi∑j=1

y2ij + ζTV −1


)],

λ > 0, ζ ∈ RK+1.

Since π(ζ|λ) ∼ N(ζλ, Vλ), we have

π(λ) =π(λ, ζ)

π(ζ|λ)

∝ λA1−1θ λA2−1

e exp(−b1λθ − b2λe) exp

[−1

2

(λe

K∑i=1

mi∑j=1

y2ij + ζTV −1


)]{|V −1λ |

1/2 exp

[−1

2(ζ − ζλ)TV −1

λ (ζ − ζλ)]}−1

= λA1−1θ λA2−1

e exp(−b1λθ − b2λe)1

|V −1λ |1/2

exp

[−1

2

(λe

K∑i=1

mi∑j=1

y2ij − ζTλ V −1

λ ζλ

)], λ > 0. (3–13)


From (3–12) and (3–13), we have

k(λ′|λ)

π(λ′)

∝ λ′θA1−1

λ′eA2−1 |V −1

λ |1/2

|V −1Λ |1/2

EΛg exp(−b1λ′θ − b2λ

′e)

exp

[−1

2

(λ′e

K∑i=1

mi∑j=1

y2ij + ζTλ V

−1λ ζλ − ζTΛV −1

Λ ζΛ

)]{λ′θ

A1−1λ′e

A2−1exp(−b1λ

′θ − b2λ

′e)

1

|V −1λ′ |1/2

exp

[−1

2

(λ′e

K∑i=1

mi∑j=1

y2ij − ζTλ′V −1

λ′ ζλ′

)]}−1

=

(|V −1λ ||V

−1λ′ |

|V −1Λ |

)1/2

EΛg exp

[−1

2

(ζTλ V

−1λ ζλ + ζTλ′V

−1λ′ ζλ′ − ζ

TΛV−1

Λ ζΛ

)], λ, λ′ > 0.

Hence,

G(λ, λ′) :=

[k(λ′|λ)

π(λ′)

]2

π(λ)π(λ′)

∝ |V−1λ ||V

−1λ′ |

|V −1Λ |

(EΛg)2 exp

[−1

2

(2ζTλ V

−1λ ζλ + 2ζTλ′V

−1λ′ ζλ′ − 2ζTΛV

−1Λ ζΛ

)]λA1−1θ λA2−1

e exp(−b1λθ − b2λe)1

|V −1λ |1/2

exp

[−1

2

(λe

K∑i=1

mi∑j=1

y2ij − ζTλ V −1

λ ζλ

)]

λ′θA1−1

λ′eA2−1


′e)

1

|V −1λ′ |1/2

exp

[−1

2

(λ′e

K∑i=1

mi∑j=1

y2ij − ζTλ′V −1

λ′ ζλ′

)]

=

[|V −1λ ||V

−1λ′ |

|V −1Λ |2

]1/2

(EΛg)2λA1−1θ λA2−1

e λ′θA1−1

λ′eA2−1

exp(−b1λθ − b2λe − b1λ′θ − b2λ

′e)

exp

(−1

2g5

), λ, λ′ > 0, (3–14)

where

g5 = Λe

K∑i=1

mi∑j=1

y2ij + ζTλ V

−1λ ζλ + ζTλ′V

−1λ′ ζλ′ − 2ζTΛV

−1Λ ζΛ.

We will show that the integral of $G(\lambda, \lambda')$ over a domain, which is a subset of a neighborhood of 0 and will be defined later, is infinite. Denote $m = \min_i m_i$ and $M = \max_i m_i$. Consider some constant $\delta > 0$ such that

\[
2N\delta,\ 8(M + 1)\delta < \lambda_0 . \tag{3–15}
\]


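Below we only consider $(\lambda, \lambda') \in (0, \delta)^4$. We need to analyze $\exp(-\frac{1}{2} g_5)$, calculate $|V_\lambda^{-1}|$, and finally analyze $E_\Lambda g$.

We now analyze $\exp(-\frac{1}{2} g_5)$. Hobert and Geyer (1998) gave an exact formula for $\zeta_\lambda$. We will re-derive this in a different way. By Lemma 3.2, we have

\[
\zeta_{\lambda, K+1} = \frac{\lambda_0 \mu_0 + \lambda_\theta \sum_{i=1}^{K} \frac{\lambda_e m_i \bar{y}_i}{\lambda_\theta + m_i \lambda_e}}{\lambda_0 + K\lambda_\theta - \lambda_\theta^2 \sum_{i=1}^{K} \frac{1}{\lambda_\theta + m_i \lambda_e}} = \frac{\lambda_0 \mu_0 + s_\lambda}{\lambda_0 + t_\lambda},
\]

where $s_\lambda = \sum_{i=1}^{K} \frac{\lambda_\theta \lambda_e m_i}{\lambda_\theta + m_i \lambda_e} \bar{y}_i$,

\[
t_\lambda = K\lambda_\theta - \lambda_\theta^2 \sum_{i=1}^{K} \frac{1}{\lambda_\theta + m_i \lambda_e} \tag{3–16}
\]

(see Hobert and Geyer, 1998, p.47), and

\[
\zeta_{\lambda, i} = \frac{\lambda_e m_i \bar{y}_i + \lambda_\theta \zeta_{\lambda, K+1}}{\lambda_\theta + m_i \lambda_e}, \quad i = 1, \ldots, K.
\]

$\zeta_{\lambda, K+1}$ is a convex combination of the $\bar{y}_i$s and $\mu_0$, so it is bounded by a constant (see Hobert and Geyer, 1998, p.418). $\zeta_{\lambda, i}$ is a convex combination of $\bar{y}_i$ and $\zeta_{\lambda, K+1}$, so it is also bounded by a constant. For $(\lambda, \lambda') \in (0, \delta)^4$, the elements of $V_\lambda^{-1}$ are bounded by a constant. So $\zeta_\lambda^T V_\lambda^{-1} \zeta_\lambda$ is bounded by a constant. In the same way, $\zeta_{\lambda'}^T V_{\lambda'}^{-1} \zeta_{\lambda'}$ and $\zeta_\Lambda^T V_\Lambda^{-1} \zeta_\Lambda$ are also bounded by a constant. $\Lambda_e \sum_{i=1}^{K} \sum_{j=1}^{m_i} y_{ij}^2$ is clearly bounded by a constant. Combining all of these bounds, $g_5$ is bounded by a constant and hence

\[
\exp\left( -\frac{1}{2} g_5 \right) > \text{constant} > 0. \tag{3–17}
\]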

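We now calculate $|V_\lambda^{-1}|$. By Lemma 3.2, we get

\[
|V_\lambda^{-1}| = \prod_{i=1}^{K} (\lambda_\theta + m_i \lambda_e) \left[ \lambda_0 + K\lambda_\theta - \lambda_\theta^2 \sum_{i=1}^{K} \frac{1}{\lambda_\theta + m_i \lambda_e} \right] = (\lambda_0 + t_\lambda) \prod_{i=1}^{K} (\lambda_\theta + m_i \lambda_e). \tag{3–18}
\]

Denote

\[
p_{\lambda, i} = \frac{\lambda_\theta}{\lambda_\theta + m_i \lambda_e}, \quad i = 1, \ldots, K,
\]

and

\[
\alpha_{\lambda, i} = m_i \lambda_e\, p_{\lambda, i}, \quad i = 1, \ldots, K.
\]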


Note that tλ =∑K

i=1 αλ,i. Since 0 ≤ pΛ,i ≤ 1,

αΛ,i = miΛepΛ,i ≤ miΛe ≤M2δ, i = 1, . . . , K.

Thus,

tΛ =K∑i=1

αΛ,i ≤K∑i=1

miΛe = NΛe < 2Nδ < λ0.

Doing the same we have

tλ, tλ′ < Nδ < λ0.

Using (3–18), we have

|V −1λ ||V

−1λ′ |

|V −1Λ |2

=(λ0 + tλ)

[∏Ki=1(λθ +miλe)

](λ0 + tλ′)

[∏Ki=1(λ′θ +miλ

′e)]

(λ0 + tΛ)2∏K

i=1(Λθ +miΛe)2.

Because λ0 + tλ > λ0, λ0 + tλ′ > λ0, and λ0 + tΛ < 2λ0,

(λ0 + tλ)(λ0 + tλ′)

(λ0 + tΛ)2>

λ20

(2λ0)2=

1

4.

So

|V −1λ ||V

−1λ′ |

|V −1Λ |2

>1

4

K∏i=1

(λθ +miλe)(λ′θ +miλ

′e)

(Λθ +miΛe)2

≥ (constant)(

(λθ +mλe)(λ′θ +mλ′e)

(Λθ +MΛe)2

)K. (3–19)

We have

exp(−b1λθ − b2λe) exp(−b1λ′θ − b2λ

′e) ≥ exp[−2δ(b1 + b2)] = constant > 0.

From (3–14), (3–17), (3–19), and the above inequality, for (λ, λ′) ∈ (0, δ)4, we have

G(λ, λ′) ≥ (constant)(EΛg)2

λA1−1θ λA2−1

e λ′θA1−1

λ′eA2−1

((λθ +mλe)(λ

′θ +mλ′e)

(Λθ +MΛe)2

)K/2. (3–20)


Now we analyze EΛg. Because a1, a2 > 0 and K,mi ≥ 2, we have A1 > K/2 ≥ 1 and

A2 > (m1 +m2)/2 ≥ 2. A1, A2 > 1 so

g(ζ) >

[1

2(θ1 − µ)2

]A1[

1

2(y11 − θ1)2

]A2

= (constant)[(θ1 − µ)2

]A1[(θ1 − y11)2

]A2

= (constant)[(θ1 − µ)2(θ1 − y11)2A2/A1

]A1.

By Jensen’s inequality,

EΛg > (constant){EΛ

[(θ1 − µ)2(θ1 − y11)2A2/A1

]}A1.

Hobert and Geyer (1998, p. 418) show that the elements of $\zeta_\Lambda$ are uniformly bounded by the constant $\max\{\mu_0, |\bar{y}_1|, \ldots, |\bar{y}_K|\}$, the elements of $V_\Lambda$ other than the $V_\Lambda(i, i)$, $1 \le i \le K$, are uniformly bounded by $\lambda_0^{-1}$, and $V_\Lambda(i, i) > (\Lambda_\theta + M\Lambda_e)^{-1}$ for $1 \le i \le K$. Denote $\tau = (\lambda, \lambda')$. Then $(\theta_1, \mu)^T$ has a bivariate normal distribution that satisfies the conditions of Lemma 3.3. So there exists a real number $\delta_1 > 0$ such that

EΛ

[(θ1 − µ)2(θ1 − y11)2A2/A1

]> (constant)

√VarΛθ1

2+2A2/A1

= (constant)[VarΛθ1]1+A2/A1

for τ < (δ1, δ1, δ1, δ1). Finally,

EΛg > (constant)[VarΛθ1](1+A2/A1)A1 > (constant)(Λθ +MΛe)−A1−A2 , τ < (δ1, δ1, δ1, δ1).

Combining this with condition (3–15), we have final conditions for δ

δ < δ1 and 2Nδ, 8(M + 1)δ < λ0. (3–21)


From (3–20), when δ satisfies condition (3–21) and (λ, λ′) ∈ (0, δ)4, we have

G(λ, λ′) ≥ (constant)1

(Λθ +MΛe)2A1+2A2λA1−1θ λA2−1

e λ′θA1−1

λ′eA2−1

((λθ +mλe)(λ

′θ +mλ′e)

(Λθ +MΛe)2

)K/2= (constant)

1

(Λθ +MΛe)4

(λθ

Λθ +MΛe

)A1−1(λe

Λθ +MΛe

)A2−1

(λ′θ

Λθ +MΛe

)A1−1(λ′e

Λθ +MΛe

)A2−1(λθ +mλeΛθ +MΛe

)K/2(λ′θ +mλ′eΛθ +MΛe

)K/2.

Consider the set

D ={

(λ, λ′) ∈ (0, δ)4∣∣∣12<λθλ′e,λeλ′e,λ′θλ′e,λθλ′θ,λeλ′θ,λθλe

< 2}.

This set has a positive Lebesgue measure in R4. Because 12< x

y< 2 and 1

2< y

x< 2 are

equivalent, in D we have1

2<x

y< 2

where x and y could be any of λθ, λe, λ′θ, or λ′e. Some properties which help us figure out

the set D: D contains the segment {(λ, λ′) = (x, x, x, x) : 0 < x < δ}; if (λ, λ′) ∈ D then

(pλ, pλ′) ∈ D for 0 < p < 1. Note that

λθΛθ +MΛe

=

[1 +

λ′θλθ

+Mλeλθ

+Mλ′eλθ

]−1

≥ (1 + 2(1 + 2M))−1 = constant.

We can do the same to get positive constant lower bounds for

x

Λθ +MΛe

where x could be any λe, λ′θ, and λ′e. Also note that

λθ +mλeΛθ +MΛe

>λθ

Λθ +MΛe

andλ′θ +mλ′eΛθ +MΛe

>λ′θ

Λθ +MΛe

.


Thus, on D, [k(λ′|λ)

π(λ′)

]2

π(λ)π(λ′) ≥ (constant)1

(Λθ +MΛe)4.

We only need to show that ∫D

(Λθ +MΛe)−4dλdλ′ =∞. (3–22)

We apply spherical coordinates in R4 (see Leoni, 2009, p.253) with

λθ = r cos(φ1),

λe = r sin(φ1) cos(φ2),

λ′θ = r sin(φ1) sin(φ2) cos(φ3),

λ′e = r sin(φ1) sin(φ2) sin(φ3),

dλdλ′ = r3 sin2(φ1) sin(φ2)drdφ1dφ2dφ3.

Integrating over the domain D is equivalent to integrating over some domain S in the

spherical coordinate system. We will find some subset Sb of S which is easy to describe

(see A.3 for graphs of D and Sb in R2 and R3). Denote Φ = (φ1, φ2, φ3) and

h(Φ) =[

cos(φ1) + sin(φ1) cos(φ2) + sin(φ1) sin(φ2) cos(φ3)

+ sin(φ1) sin(φ2) sin(φ3)]−4

sin2(φ1) sin(φ2).


M(Λθ + Λe) > Λθ +MΛe so∫D

(Λθ +MΛe)−4dλdλ′

≥M−4

∫D

(Λθ + Λe)−4dλdλ′

= M−4

∫S

r−4[

cos(φ1) + sin(φ1) cos(φ2) + sin(φ1) sin(φ2) cos(φ3)

+ sin(φ1) sin(φ2) sin(φ3)]−4

r3 sin2(φ1) sin(φ2)drdΦ.

= M−4

∫S

r−1h(Φ)drdΦ

≥M−4

∫Sb

r−1h(Φ)drdΦ.

We only need to show that the integral over Sb is infinity. Because D contains the

segment {(λ, λ′) = (x, x, x, x) : 0 < x < δ}, we find the image of that segment in

the spherical coordinate system. Or we find Φ0 = (φ0,1, φ0,2, φ0,3) such that (λ, λ′) =

(x, x, x, x). λ′θ = λ′e so sin(φ3) = cos(φ3) and hence φ3 = π/4. λ′θ = λe implies cos(φ2) =

sin(φ2)/√

(2). By sin2(φ2) + cos2(φ2) = 1, cos(φ2) = 1/√

3. λθ = λe implies cos(φ1) =

sin(φ1)/√

3, and cos(φ1) = 1/2 follows. Finally, Φ0 = (arccos(1/2), arccos(1/√

3), π/4) and

cos(φ0,1) = sin(φ0,1) cos(φ0,2)

= sin(φ0,1) sin(φ0,2) cos(φ0,3) = sin(φ0,1) sin(φ0,2) sin(φ0,3) = 1/2.

Denote

Sn =

{(r,Φ)

∣∣∣0 < r < δ, |φi − φ0,i| <1

n, i = 1, 2, 3

}.

sin(φi) and cos(φi) are continuous so cos(φ1), sin(φ1) cos(φ2), sin(φ1) sin(φ2) cos(φ3), and

sin(φ1) sin(φ2) sin(φ3) are close to 1/2 when Φ is close to Φ0. So given any 0 < ε < 1/2,

there exists b large enough such that

1

2− ε < cos(φ1), sin(φ1) cos(φ2),

sin(φ1) sin(φ2) cos(φ3), sin(φ1) sin(φ2) sin(φ3) <1

2+ ε


for (r,Φ) ∈ Sb. Thus,

r(1/2 + ε) > λθ, λe, λ′θ, λ′e > r(1/2− ε), (r,Φ) ∈ Sb

and hence

1/2− ε1/2 + ε

=r(1/2− ε)r(1/2 + ε)

<x

y<r(1/2 + ε)

r(1/2− ε)=

1/2 + ε

1/2− ε, (r,Φ) ∈ Sb,

where x and y could be any of λθ, λe, λ′θ, or λ′e. Note that

1

2<

1/2− ε1/2 + ε

is equivalent to ε < 1/6. If we select ε = 1/7 then there exists b such that if (r,Φ) ∈ Sb

then1

2<x

y< 2

where x and y could be any of λθ, λe, λ′θ, or λ′e. We also have 0 < x < δ where x could

be any of λθ, λe, λ′θ, or λ′e. Hence (λ, λ′) ∈ D or Sb is a subset of S. We have∫Sb

r−1h(Φ)drdΦ =

∫ δ

0

r−1dr

∫ φ0,1+ 1b

φ0,1− 1b

∫ φ0,2+ 1b

φ0,2− 1b

∫ φ0,3+ 1b

φ0,3− 1b

h(Φ)dΦ.

Because∫ δ

0r−1dr =∞ and h(Φ) is positive, the last integral is infinity.

3.4 The Gibbs Sampler with Improper Priors (and Alternative Blocking) Is Not Trace Class

Recall that a necessary condition for a proper posterior is $A = a + K/2 > 1/2$.

Proposition 3.4. Suppose that there exists $i$ such that the $y_{ij}$ are not the same for all $j$, and that $A > 3/2$. Then the GS with improper priors is not trace class, i.e.
\[
\int_{\lambda>0}\int_{\mu=-\infty}^{\infty}\int_{\theta\in\mathbb{R}^K} \pi(\mu,\lambda|\theta)\,\pi(\theta|\mu,\lambda)\,d\theta\,d\mu\,d\lambda = \infty.
\]

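Proof. From (3–4), (3–7), and (3–8) we have
\[
\begin{aligned}
\pi(\mu,\lambda|\theta)\,\pi(\theta|\mu,\lambda)
&= \pi(\mu,\lambda_\theta|\lambda_e,\theta)\,\pi(\lambda_e|\theta)\,\pi(\theta|\mu,\lambda) \\
&\propto \lambda_\theta^{A-1}\Big[\sum_{i=1}^K(\theta_i-\bar\theta)^2\Big]^{A-\frac12}\exp\Big[-\frac{\lambda_\theta}{2}\sum_{i=1}^K(\theta_i-\mu)^2\Big] \\
&\quad\times\Big[\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij}-\theta_i)^2\Big]^{B}\lambda_e^{B-1}\exp\Big[-\frac12\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij}-\theta_i)^2\lambda_e\Big] \\
&\quad\times|V_\lambda^{-1}|^{1/2}\exp\Big[-\frac12(\theta-\theta_{\mu,\lambda})^TV_\lambda^{-1}(\theta-\theta_{\mu,\lambda})\Big] \\
&= \lambda_\theta^{A-1}\lambda_e^{B-1}|V_\lambda^{-1}|^{1/2}\,g(\theta)\exp\Big(-\frac12h_1\Big), \quad \lambda>0,\ \mu\in\mathbb{R},\ \theta\in\mathbb{R}^K,
\end{aligned}
\]
where
\[
g(\theta) = \Big[\sum_{i=1}^K(\theta_i-\bar\theta)^2\Big]^{A-\frac12}\Big[\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij}-\theta_i)^2\Big]^{B} \tag{3–23}
\]
and
\[
h_1 = \sum_{i=1}^K(\theta_i-\mu)^2\lambda_\theta + \sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij}-\theta_i)^2\lambda_e + (\theta-\theta_{\mu,\lambda})^TV_\lambda^{-1}(\theta-\theta_{\mu,\lambda}).
\]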

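We will separate the kernel of a multivariate normal distribution from $\exp(-\frac12h_1)$. We have
\[
\begin{aligned}
h_1 &= \sum_{i=1}^K(\theta_i^2-2\theta_i\mu+\mu^2)\lambda_\theta + \sum_{i=1}^K\sum_{j=1}^{m_i}(\theta_i^2-2y_{ij}\theta_i+y_{ij}^2)\lambda_e
 + \big[\theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta + \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda}\big] \\
&= \Big[\sum_{i=1}^K(\theta_i^2-2\theta_i\mu)\lambda_\theta + \sum_{i=1}^K\sum_{j=1}^{m_i}(\theta_i^2-2y_{ij}\theta_i)\lambda_e + \theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta\Big] \\
&\quad + \Big[K\mu^2\lambda_\theta + \lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda}\Big] \\
&= h_2 + h_3,
\end{aligned}
\]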

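where $h_2$ and $h_3$ are the first and the second brackets respectively. Note that $h_3$ does not depend on $\theta$. We will rewrite $h_2$ as $\theta^T\Sigma^{-1}\theta - 2\varepsilon^T\Sigma^{-1}\theta$. Using (3–6) and $V_{2\lambda}^{-1} = 2V_\lambda^{-1}$, we have
\[
\begin{aligned}
h_2 &= \sum_{i=1}^K(\theta_i^2-2\theta_i\mu)\lambda_\theta + \sum_{i=1}^K(m_i\theta_i^2-2m_i\bar y_i\theta_i)\lambda_e + \big[\theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta\big] \\
&= \sum_{i=1}^K(\lambda_\theta+m_i\lambda_e)\theta_i^2 - 2\sum_{i=1}^K(\lambda_\theta\mu+\lambda_em_i\bar y_i)\theta_i + \big[\theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta\big] \\
&= \theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta + \big[\theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta\big] \\
&= \theta^TV_{2\lambda}^{-1}\theta - 2\theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta.
\end{aligned}
\]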

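Thus,
\[
\pi(\mu,\lambda|\theta)\,\pi(\theta|\mu,\lambda) \propto \lambda_\theta^{A-1}\lambda_e^{B-1}|V_\lambda^{-1}|^{1/2}\exp\Big(-\frac12h_3\Big)\,g(\theta)\exp\Big\{-\frac12\big[\theta^TV_{2\lambda}^{-1}\theta - 2\theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta\big]\Big\}, \quad \lambda>0,\ \mu\in\mathbb{R},\ \theta\in\mathbb{R}^K.
\]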

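As in (3–10), and denoting $E_{(\theta_{\mu,\lambda},V_\lambda)}$ by $E_{\mu,\lambda}$, we have
\[
\begin{aligned}
G(\mu,\lambda) &:= \int_{\mathbb{R}^K}\pi(\mu,\lambda|\theta)\,\pi(\theta|\mu,\lambda)\,d\theta \\
&\propto \lambda_\theta^{A-1}\lambda_e^{B-1}|V_\lambda^{-1}|^{1/2}\exp\Big(-\frac12h_3\Big)(E_{\mu,\lambda}g)\,\frac{1}{|V_{2\lambda}^{-1}|^{1/2}}\exp\Big[\frac12\theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta_{\mu,\lambda}\Big] \\
&= \lambda_\theta^{A-1}\lambda_e^{B-1}\frac{|V_\lambda^{-1}|^{1/2}}{|V_{2\lambda}^{-1}|^{1/2}}(E_{\mu,\lambda}g)\exp\Big(-\frac12h_4\Big), \quad \lambda>0,\ \mu\in\mathbb{R},
\end{aligned}
\]
where
\[
h_4 = h_3 - \theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta_{\mu,\lambda} = K\mu^2\lambda_\theta + \lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda} - \theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta_{\mu,\lambda}.
\]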

Since
\[
|V_\lambda^{-1}| = \prod_{i=1}^K(\lambda_\theta+m_i\lambda_e),
\]
we get
\[
|V_{2\lambda}^{-1}| = \prod_{i=1}^K(2\lambda_\theta+m_i2\lambda_e) = 2^K|V_\lambda^{-1}|.
\]

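Denote
\[
c = \sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 - \sum_{i=1}^Km_i\bar y_i^2.
\]
There exists $i$ such that $y_{ij} \ne y_{ij'}$ for some $j \ne j'$, so $c > 0$. Since $V_{2\lambda}^{-1} = 2V_\lambda^{-1}$,
\[
\begin{aligned}
h_4 &= K\mu^2\lambda_\theta + \lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda} - \theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta_{\mu,\lambda} \\
&= c\lambda_e + \Big[\sum_{i=1}^K\lambda_em_i\bar y_i^2 + K\mu^2\lambda_\theta - \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda}\Big] \\
&= c\lambda_e + h_5,
\end{aligned}
\]
where $h_5$ is the above bracket. Denote
\[
p_{\lambda,i} = \frac{\lambda_\theta}{\lambda_\theta+m_i\lambda_e}, \quad i = 1,\ldots,K,
\]
and
\[
\alpha_{\lambda,i} = m_i\lambda_ep_{\lambda,i} = \frac{m_i\lambda_e\lambda_\theta}{\lambda_\theta+m_i\lambda_e} = \lambda_\theta(1-p_{\lambda,i}), \quad i = 1,\ldots,K.
\]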

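By (3–5),
\[
\theta_{\mu,\lambda,i} = p_{\lambda,i}\mu + (1-p_{\lambda,i})\bar y_i. \tag{3–24}
\]
By (3–5) and (3–24),
\[
\begin{aligned}
(\lambda_\theta+m_i\lambda_e)\theta_{\mu,\lambda,i}^2 &= (\lambda_\theta\mu+\lambda_em_i\bar y_i)\theta_{\mu,\lambda,i} \\
&= (\lambda_\theta\mu+\lambda_em_i\bar y_i)[p_{\lambda,i}\mu + (1-p_{\lambda,i})\bar y_i] \\
&= \lambda_em_i(1-p_{\lambda,i})\bar y_i^2 + \lambda_\theta p_{\lambda,i}\mu^2 + [\lambda_\theta(1-p_{\lambda,i}) + \lambda_em_ip_{\lambda,i}]\mu\bar y_i \\
&= (\lambda_em_i-\alpha_{\lambda,i})\bar y_i^2 + \lambda_\theta p_{\lambda,i}\mu^2 + 2\alpha_{\lambda,i}\mu\bar y_i.
\end{aligned}
\]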

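Thus,
\[
\begin{aligned}
\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda} &= \sum_{i=1}^K(\lambda_\theta+m_i\lambda_e)\theta_{\mu,\lambda,i}^2 \\
&= \sum_{i=1}^K\lambda_em_i\bar y_i^2 + \mu^2\sum_{i=1}^K\lambda_\theta p_{\lambda,i} - \sum_{i=1}^K\alpha_{\lambda,i}(\bar y_i^2-2\mu\bar y_i) \\
&= \sum_{i=1}^K\lambda_em_i\bar y_i^2 + \mu^2\sum_{i=1}^K(\lambda_\theta p_{\lambda,i}+\alpha_{\lambda,i}) - \sum_{i=1}^K\alpha_{\lambda,i}(\mu-\bar y_i)^2.
\end{aligned}
\]
Because
\[
\lambda_\theta p_{\lambda,i}+\alpha_{\lambda,i} = \lambda_\theta p_{\lambda,i} + (1-p_{\lambda,i})\lambda_\theta = \lambda_\theta,
\]
we have
\[
\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda} = \sum_{i=1}^K\lambda_em_i\bar y_i^2 + K\mu^2\lambda_\theta - \sum_{i=1}^K\alpha_{\lambda,i}(\mu-\bar y_i)^2
\]
and hence
\[
h_5 = \sum_{i=1}^K\alpha_{\lambda,i}(\mu-\bar y_i)^2.
\]
Finally,
\[
G(\mu,\lambda) \propto \lambda_\theta^{A-1}\lambda_e^{B-1}(E_{\mu,\lambda}g)\exp\Big(-\frac12c\lambda_e - \frac12\sum_{i=1}^K\alpha_{\lambda,i}(\mu-\bar y_i)^2\Big), \quad \lambda>0,\ \mu\in\mathbb{R}.
\]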

We will show that
\[
\int_D G(\mu,\lambda)\,d\mu\,d\lambda = \infty,
\]
where we define the domain $D$ later. Again, we denote by "constant" some known positive constant. Denote
\[
m = \min_i m_i \quad\text{and}\quad M = \max_i m_i.
\]

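Since
\[
\sum_{j=1}^{m_i}(y_{ij}-\theta_i)^2 = m_i\theta_i^2 - 2m_i\bar y_i\theta_i + \sum_{j=1}^{m_i}y_{ij}^2 = m_i(\theta_i-\bar y_i)^2 + \Big(\sum_{j=1}^{m_i}y_{ij}^2 - m_i\bar y_i^2\Big),
\]
we get
\[
\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij}-\theta_i)^2 \ge \sum_{i=1}^K\Big(\sum_{j=1}^{m_i}y_{ij}^2 - m_i\bar y_i^2\Big) = c > 0
\]
and hence
\[
g(\theta) \ge c^B\Big[\sum_{i=1}^K(\theta_i-\bar\theta)^2\Big]^{A-\frac12}.
\]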

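By Jensen's inequality and $A - 1/2 > 1$,
\[
E_{\mu,\lambda}g \ge c^B\Big[E_{\mu,\lambda}\sum_{i=1}^K(\theta_i-\bar\theta)^2\Big]^{A-\frac12}.
\]
Given a random vector $Z$ with $EZ = \varepsilon$ and $VZ = \Sigma$ and a matrix $C$, we know that
\[
E(Z^TCZ) = \mathrm{tr}(C\Sigma) + \varepsilon^TC\varepsilon.
\]
We also know that
\[
\sum_{i=1}^K(\theta_i-\bar\theta)^2 = \theta^T(I - 11^T/K)\theta.
\]
Thus,
\[
E_{\mu,\lambda}\sum_{i=1}^K(\theta_i-\bar\theta)^2 \ge \mathrm{tr}\big((I - 11^T/K)V_\lambda\big) = \mathrm{tr}(V_\lambda) - \frac1K\mathrm{tr}(V_\lambda) = \frac{K-1}{K}\sum_{i=1}^K\frac{1}{\lambda_\theta+m_i\lambda_e} \ge (K-1)\frac{1}{\lambda_\theta+M\lambda_e}
\]
and hence
\[
E_{\mu,\lambda}g \ge c^B(K-1)^{A-\frac12}(\lambda_\theta+M\lambda_e)^{-(A-\frac12)}.
\]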

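Note that
\[
\alpha_{\lambda,i} = \frac{\lambda_\theta m_i\lambda_e}{\lambda_\theta+m_i\lambda_e} \le \frac{M\lambda_\theta\lambda_e}{\lambda_\theta+m\lambda_e}.
\]
Combining the above inequalities, we get
\[
G(\mu,\lambda) \ge (\text{constant})\,\lambda_\theta^{A-1}\lambda_e^{B-1}\frac{1}{(\lambda_\theta+M\lambda_e)^{A-\frac12}}\exp\Big(-\frac12c\lambda_e - \frac12\frac{M\lambda_\theta\lambda_e}{\lambda_\theta+m\lambda_e}\,t\Big), \quad \lambda>0,\ \mu\in\mathbb{R},
\]
where
\[
t = \sum_{i=1}^K(\mu-\bar y_i)^2.
\]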

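We will prove that
\[
\int_D\lambda_\theta^{A-1}\lambda_e^{B-1}\frac{1}{(\lambda_\theta+M\lambda_e)^{A-\frac12}}\exp\Big(-\frac12c\lambda_e - \frac12\frac{M\lambda_\theta\lambda_e}{\lambda_\theta+m\lambda_e}\,t\Big)\,d\lambda\,d\mu = \infty.
\]
Denote $x := m\lambda_e$, $y := \lambda_\theta$, $c' = c/(2m)$, and $t' = Mt/(2m)$. Then $\lambda > 0$ is equivalent to $x, y > 0$, and $d\lambda = m^{-1}\,dx\,dy$. We need to prove that
\[
\int_D x^{B-1}y^{A-1}\frac{1}{(y+Mx/m)^{A-\frac12}}\exp\Big(-c'x - \frac{xy}{x+y}\,t'\Big)\,dx\,dy\,d\mu = \infty.
\]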

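Fix any constant $\varepsilon > 0$. Define a domain $D$ such that $x, |\mu| < \varepsilon$ and $y > \varepsilon$. We have
\[
\frac{xy}{x+y} = x\,\frac{y}{x+y} \le x < \varepsilon.
\]
Because $|\mu| < \varepsilon$, $t' < \delta$ for some constant $\delta$ and hence
\[
c'x + \frac{xy}{x+y}\,t' < c'\varepsilon + \varepsilon\delta = \text{constant}.
\]
We also have
\[
\frac{1}{y+Mx/m} > \frac{1}{y+M\varepsilon/m} := \frac{1}{y+c_1}.
\]
We only need to show that
\[
\int_D x^{B-1}\frac{y^{A-1}}{(y+c_1)^{A-\frac12}}\,dx\,dy\,d\mu = \infty.
\]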

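Note that
\[
\int_D x^{B-1}\frac{y^{A-1}}{(y+c_1)^{A-\frac12}}\,dx\,dy\,d\mu = \int_{x<\varepsilon}x^{B-1}\,dx\int_{|\mu|<\varepsilon}d\mu\int_{y>\varepsilon}\frac{y^{A-1}}{(y+c_1)^{A-\frac12}}\,dy.
\]
The first two integrals are positive. It suffices to show that the last one is infinite. $c_1 = M\varepsilon/m \ge \varepsilon$ implies
\[
\int_{y>\varepsilon}\frac{y^{A-1}}{(y+c_1)^{A-\frac12}}\,dy \ge \int_{y>c_1}\frac{y^{A-1}}{(y+c_1)^{A-\frac12}}\,dy = \int_{y>c_1}\Big(\frac{y}{y+c_1}\Big)^{A-\frac12}y^{-\frac12}\,dy.
\]
For $y > c_1$, $2y > y + c_1$ and hence
\[
\int_{y>c_1}\Big(\frac{y}{y+c_1}\Big)^{A-\frac12}y^{-\frac12}\,dy \ge \int_{y>c_1}\frac{1}{2^{A-\frac12}}\,y^{-\frac12}\,dy = \infty.
\]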

CHAPTER 4
CHARACTERIZATION OF GEOMETRIC ERGODICITY FOR BIRTH-DEATH MARKOV CHAINS

4.1 Summary

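Recall from Section 2.2 that $X = \{X_n\}_{n=0}^\infty$ is a birth-death chain with state space $\mathbb{N}$ and Markov transition matrix given by
\[
M = \begin{pmatrix}
r_1 & p_1 & 0 & 0 & 0 & \cdots \\
q_2 & r_2 & p_2 & 0 & 0 & \cdots \\
0 & q_3 & r_3 & p_3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
\]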

X is irreducible, aperiodic, and positive recurrent if and only if the following three

conditions hold:

pi > 0 for all i ∈ N, (4–1)

ri > 0 for some i ∈ N, (4–2)

and

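\[
c = \sum_{i=1}^\infty c_i < \infty, \tag{4–3}
\]
where
\[
c_1 = 1, \quad c_i = \frac{p_1p_2\cdots p_{i-1}}{q_2q_3\cdots q_i}, \quad i = 2,3,\ldots \tag{4–4}
\]
The stationary distribution is $\pi = \{\pi_i\}_{i=1}^\infty$ with $\pi_i = c_i/c$ for $i = 1,2,\ldots$ For convenience, denote $c_0 = p_0 = 1$. We have
\[
\pi_ip_i = \pi_{i+1}q_{i+1}, \quad \forall i \in \mathbb{N}, \tag{4–5}
\]
and $X$ is always reversible.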
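
As a quick numerical illustration of (4–4) and (4–5), the following Python sketch computes the weights $c_i$ for a truncated chain and checks detailed balance in unnormalized form. The helper name `weights` and the constant rates $p_i = 1/3$, $q_i = 2/3$ are assumptions chosen for illustration; they do not come from the text.

```python
from fractions import Fraction

def weights(p, q, n):
    """Return [c_1, ..., c_n] from (4-4); p[i], q[i] hold p_i, q_i (index 0 unused)."""
    c = [Fraction(1)]                      # c_1 = 1
    for i in range(2, n + 1):
        c.append(c[-1] * p[i - 1] / q[i])  # c_i = c_{i-1} * p_{i-1} / q_i
    return c

n = 8
p = [None] + [Fraction(1, 3)] * n                      # hypothetical p_1, ..., p_n
q = [None, Fraction(0)] + [Fraction(2, 3)] * (n - 1)   # q_1 = 0, q_i = 2/3 for i >= 2
c = weights(p, q, n)

# Detailed balance (4-5), unnormalized: c_i p_i = c_{i+1} q_{i+1}.
balanced = all(c[i - 1] * p[i] == c[i] * q[i + 1] for i in range(1, n))
print(c[:4], balanced)
```

Exact rational arithmetic is used so that the balance check is an equality test rather than a floating-point comparison.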

Let $M^n = (m_{ij}^{(n)})$ denote the $n$-step transition matrix of $X$. From Section 1.1.1, $X$ is geometrically ergodic if there exist a function $R : \mathbb{N} \to [0,\infty)$ and a constant $0 < \rho < 1$ such that
\[
\sum_{j=1}^\infty|m_{ij}^{(n)} - \pi_j| \le R(i)\rho^n, \quad \forall i \in \mathbb{N},\ \forall n \in \mathbb{N}.
\]

We suppose throughout this chapter, except Corollary 4.5 and Corollary 4.15, that

M satisfies conditions (4–1)-(4–3). Here is what we will do in the remaining sections

of this chapter. Section 4.2 reviews some known results on the geometric ergodicity

of birth-death chains. In Section 4.3.1, we develop a simple necessary and sufficient

condition for the geometric ergodicity of birth-death chains. We apply this result to the

toy GS from Tan et al. (2013) in Section 4.3.2. We apply the method in Section 4.3.1

to study a random walk on Z in Section 4.3.3. Finally, Section 4.3.4 gives some results

which relate to birth-death chains.

4.2 Some Known Results on the Geometric Ergodicity of Birth-Death Chains

4.2.1 Orthogonal Polynomial Method

In this section, we review some results in van Doorn and Schrijner (1995). Note

that Section 3.1 in van Doorn and Schrijner (1995) defines geometric ergodicity in a less

standard way. It only requires that Markov chains converge at a geometric rate, even

when they converge to 0. Hence we have to add conditions (4–1)-(4–3) to their results.

van Doorn and Schrijner (1995) used Karlin and McGregor’s (1959) representation of

n-step transition matrix of birth-death chain via an orthogonal polynomial system.

Proposition 4.1. (Karlin and McGregor, 1959, Theorem 1) There exist polynomials $Q_n(x)$ of degree $n-1$ such that
\[
\begin{aligned}
Q_1(x) &= 1, \\
r_1Q_1(x) + p_1Q_2(x) &= xQ_1(x), \\
q_iQ_{i-1}(x) + r_iQ_i(x) + p_iQ_{i+1}(x) &= xQ_i(x), \quad i \ge 2.
\end{aligned} \tag{4–6}
\]
Moreover, there exists a unique measure $\psi$ of total mass 1 on the interval $[-1,1]$ such that $\{Q_i(x)\}_{i=1}^\infty$ are orthogonal with respect to $\psi$, i.e.
\[
\pi_j\int_{-1}^1Q_i(x)Q_j(x)\,d\psi(x) = \delta_{ij}, \quad i,j \ge 1, \tag{4–7}
\]
where $\delta_{ii} = 1$ and $\delta_{ij} = 0$ for $i \ne j$. Furthermore,
\[
m_{ij}^{(n)} = \pi_j\int_{-1}^1x^nQ_i(x)Q_j(x)\,d\psi(x), \quad i,j \ge 1. \tag{4–8}
\]

Denote Q(x) = (Q1(x), Q2(x), · · · )T then we can rewrite (4–6) as MQ(x) = xQ(x)

and Q1(x) = 1. It is very difficult to find a representation which is similar to (4–8)

for other Markov chains. Karlin and McGregor (1959) proved Proposition 4.1, but we

present their proof here in a clearer way by adding some extra steps as follows.

Proof of Proposition 4.1. For any real number x, consider the equation

Mφ = xφ

where φ = {φi}∞i=1 is a real sequence. That means

r1φ1 + p1φ2 = xφ1,

q2φ1 + r2φ2 + p2φ3 = xφ2,

q3φ2 + r3φ3 + p3φ4 = xφ3,

. . .

If we know φ1 then

φ2 = (x− r1)φ1/p1,

φ3 = [(x− r2)φ2 − q2φ1]/p2,

φ4 = [(x− r3)φ3 − q3φ2]/p3,

. . .

(4–9)


So for any real number x, Mφ = xφ has a unique solution φ up to a constant
factor. Because φ may not be in the space L2(π), x may not be an eigenvalue.

For each real number x, let Q(x) = (Q1(x), Q2(x), · · · )T be the unique solution of

MQ(x) = xQ(x) such that Q1(x) = 1. Q1 is a polynomial of degree 0. As in (4–9), we

have p1Q2(x) = (x− r1)Q1(x) = x− r1. So Q2 is a polynomial of degree 1. As in (4–9),

piQi+1(x) = (x− ri)Qi(x)− qiQi−1(x), i ≥ 2.

By induction we can show that Qi is a polynomial of degree i− 1 for i ≥ 1.
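
The recurrence (4–9) is easy to evaluate numerically. The sketch below computes $Q_1(x),\ldots,Q_n(x)$ at a single point $x$ and verifies the defining relation (4–6) row by row. The function name `kmg_values` and the constant rates are hypothetical choices for illustration only.

```python
from fractions import Fraction

def kmg_values(x, p, q, r, n):
    """Evaluate Q_1(x), ..., Q_n(x) via the recurrence (4-9); inputs are 1-indexed lists."""
    Q = [None, Fraction(1)]                       # Q_1(x) = 1
    if n >= 2:
        Q.append((x - r[1]) * Q[1] / p[1])        # p_1 Q_2 = (x - r_1) Q_1
    for i in range(2, n):
        # p_i Q_{i+1} = (x - r_i) Q_i - q_i Q_{i-1}
        Q.append(((x - r[i]) * Q[i] - q[i] * Q[i - 1]) / p[i])
    return Q[1:]

n = 6
p = [None] + [Fraction(1, 4)] * n                      # hypothetical p_i = 1/4
q = [None, Fraction(0)] + [Fraction(1, 2)] * (n - 1)   # q_1 = 0, q_i = 1/2 for i >= 2
r = [None] + [1 - p[i] - q[i] for i in range(1, n + 1)]

ones = kmg_values(Fraction(1), p, q, r, n)   # rows of M sum to 1, so Q_i(1) = 1
x = Fraction(1, 2)
Qv = kmg_values(x, p, q, r, n)               # Qv[k] holds Q_{k+1}(x)
rows_ok = all(
    q[i] * Qv[i - 2] + r[i] * Qv[i - 1] + p[i] * Qv[i] == x * Qv[i - 1]
    for i in range(2, n)
)
print(ones, rows_ok)
```

Because each row of $M$ sums to one, the constant sequence solves $M\varphi = \varphi$, which is why every $Q_i(1)$ equals 1 in this sketch.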

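Recall the Hilbert spaces $L^2(\pi)$ and $L_0^2(\pi)$ from Section 2.2. For $i \in \mathbb{N}$, let $e^{(i)} \in L^2(\pi)$ denote the vector whose $i$th coordinate equals $1/\pi_i$ (note that it is not $1/\sqrt{\pi_i}$ as in Section 2.2) and whose other coordinates all equal 0. By (4–5), $p_i\pi_i = q_{i+1}\pi_{i+1}$. Using this equality for $n \ge 2$, we get
\[
(Me^{(n)})_i = \sum_{j=1}^\infty m_{ij}e_j^{(n)} = m_{in}e_n^{(n)} = m_{in}/\pi_n =
\begin{cases}
p_{n-1}/\pi_n = q_n/\pi_{n-1} & \text{if } i = n-1, \\
r_n/\pi_n & \text{if } i = n, \\
q_{n+1}/\pi_n = p_n/\pi_{n+1} & \text{if } i = n+1, \\
0 & \text{otherwise,}
\end{cases}
\]
and hence
\[
Me^{(n)} = q_ne^{(n-1)} + r_ne^{(n)} + p_ne^{(n+1)}, \quad n \ge 2. \tag{4–10}
\]
For $n = 1$,
\[
Me^{(1)} = r_1e^{(1)} + p_1e^{(2)}.
\]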

We will prove that $Q_n(M)e^{(1)} = e^{(n)}$ for all $n \ge 1$. It is obviously true for $n = 1$ because $Q_1(M) = I$. Suppose that $Q_i(M)e^{(1)} = e^{(i)}$ for $1 \le i \le n$; it suffices to show that $Q_{n+1}(M)e^{(1)} = e^{(n+1)}$. From (4–6), we get
\[
\begin{aligned}
& MQ_n(M) = q_nQ_{n-1}(M) + r_nQ_n(M) + p_nQ_{n+1}(M) \\
\Rightarrow\ & MQ_n(M)e^{(1)} = q_nQ_{n-1}(M)e^{(1)} + r_nQ_n(M)e^{(1)} + p_nQ_{n+1}(M)e^{(1)} \\
\Rightarrow\ & Me^{(n)} = q_ne^{(n-1)} + r_ne^{(n)} + p_nQ_{n+1}(M)e^{(1)}.
\end{aligned}
\]
Combining this with (4–10), we get $Q_{n+1}(M)e^{(1)} = e^{(n+1)}$.

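For $1 \le i, j \le n$,
\[
\big\langle M^ne^{(j)}, e^{(i)}\big\rangle = \sum_{k=1}^\infty(M^ne^{(j)})_ke_k^{(i)}\pi_k = \sum_{k=1}^\infty\sum_{l=1}^\infty e_l^{(j)}m_{kl}^{(n)}e_k^{(i)}\pi_k = e_j^{(j)}m_{ij}^{(n)}e_i^{(i)}\pi_i = m_{ij}^{(n)}/\pi_j,
\]
so
\[
\begin{aligned}
m_{ij}^{(n)} &= \pi_j\big\langle M^ne^{(j)}, e^{(i)}\big\rangle \\
&= \pi_j\big\langle M^nQ_j(M)e^{(1)}, Q_i(M)e^{(1)}\big\rangle && \text{(because } Q_n(M)e^{(1)} = e^{(n)}\text{)} \\
&= \pi_j\big\langle Q_i(M)M^nQ_j(M)e^{(1)}, e^{(1)}\big\rangle && \text{(because } Q_i(M) \text{ is self-adjoint)} \\
&= \pi_j\big\langle M^nQ_i(M)Q_j(M)e^{(1)}, e^{(1)}\big\rangle && \text{(because } Q_i(M) \text{ commutes with } M\text{)}.
\end{aligned}
\]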

Denote $\alpha(x) = x^nQ_i(x)Q_j(x)$; then
\[
m_{ij}^{(n)} = \pi_j\big\langle\alpha(M)e^{(1)}, e^{(1)}\big\rangle.
\]
From formula (2.4) in Conway (1990, p.264),
\[
m_{ij}^{(n)} = \pi_j\int_{-1}^1\alpha(x)\,d\psi(x),
\]
where $\psi(x) = \big\langle E_xe^{(1)}, e^{(1)}\big\rangle$ (see, e.g., Conway, 1990, p.257, Lemma 1.9) and $E_x$ is the spectral measure of $M$.

Since $\alpha(x) = x^nQ_i(x)Q_j(x)$, we get
\[
m_{ij}^{(n)} = \pi_j\int_{-1}^1x^nQ_i(x)Q_j(x)\,d\psi(x).
\]
Plugging in $n = 0$ we get
\[
\pi_j\int_{-1}^1Q_i(x)Q_j(x)\,d\psi(x) = \delta_{ij},
\]
where $\delta_{ii} = m_{ii}^{(0)} = 1$ and $\delta_{ij} = m_{ij}^{(0)} = 0$ for $i \ne j$ (note that $M^0 = I$).

van Doorn and Schrijner (1995) showed that $Q_{j+1}(x)$ has $j$ distinct real zeros
\[
x_{j1} < x_{j2} < \cdots < x_{jj}, \quad j \ge 0,
\]
and the following limits exist
\[
\eta_j = \lim_{k\to\infty}x_{k,k-j+1}, \quad j \ge 0
\]
and
\[
\tau = \lim_{j\to\infty}\eta_j.
\]

Using those limits, we have a condition for geometric ergodicity.

Theorem 4.2. (van Doorn and Schrijner, 1995, Theorem 3.4) X is geometrically ergodic

if and only if τ < 1.

The value of τ depends only on the limiting behavior of the parameters in (4–6)

when j → ∞. It is not easy to calculate τ in practice. We have upper and lower bounds

for τ as follows.

Theorem 4.3. (van Doorn and Schrijner, 1995, Theorem 3.5)
\[
\tau \le \limsup_{j\to\infty}\big[r_j + \sqrt{p_{j-1}q_j} + \sqrt{p_jq_{j+1}}\big]
\]
and
\[
\tau \ge \limsup_{n\to\infty}\Big\{\frac1n\sum_{j=1}^n\big(r_j + 2\sqrt{p_{j-1}q_j}\big)\Big\}.
\]
When the limits $p := \lim_{i\to\infty}p_i$ and $q := \lim_{i\to\infty}q_i$ exist, we have a simple necessary and sufficient condition for geometric ergodicity as follows.

Proposition 4.4. (van Doorn and Schrijner, 1995, Corollary 3.6) $X$ is geometrically ergodic if and only if $p \ne q$.

Note that (4–1) and (4–2) are easy to check but (4–3) is not. If p and q exist, the

next corollary (which is equivalent to Proposition 4.4) shows that we do not need to

check condition (4–3). We do not suppose that (4–3) holds in the next corollary. Note

that X is irreducible and aperiodic if and only if pi > 0 for i ≥ 1, qi > 0 for i ≥ 2, and

ri > 0 for some i ≥ 1.

Corollary 4.5. Suppose that pi > 0 for i ≥ 1, qi > 0 for i ≥ 2, and ri > 0 for some i ≥ 1.

X is geometrically ergodic if and only if p < q.

Proof. Geometric ergodicity implies positive recurrence, so we restate Proposition 4.4 as follows. Suppose that $p_i > 0$ for $i \ge 1$, $q_i > 0$ for $i \ge 2$, and $r_i > 0$ for some $i \ge 1$. Then $X$ is geometrically ergodic if and only if $p \ne q$ and (4–3) holds. It suffices to prove that (4–3) and $p \ne q$ together are equivalent to $p < q$. Suppose that (4–3) and $p \ne q$ hold. First we show that (4–3) implies $p \le q$ by contradiction. Suppose that $p > q$. Because $p$ and $q$ are finite and $p \ne q$,
\[
\lim_{i\to\infty}\frac{c_{i+1}}{c_i} = \lim_{i\to\infty}\frac{p_i}{q_{i+1}} = \frac pq > 1.
\]
(Note that the above limit is not true when $p = q = 0$.) By D'Alembert's ratio test (see, e.g., Knopp, 1951, p.117), the series $\sum_{i=1}^\infty c_i$ diverges. This contradicts (4–3), so $p \le q$. Because $p \ne q$, we have $p < q$.

Conversely, suppose that $p < q$. It is obvious that $p \ne q$. We have
\[
\lim_{i\to\infty}\frac{c_{i+1}}{c_i} = \frac pq < 1.
\]
By D'Alembert's ratio test, $\sum_{i=1}^\infty c_i$ converges.

Proposition 4.4 also means that X is irreducible, aperiodic, positive recurrent, and
sub-geometrically ergodic if and only if (4–1)-(4–3) hold and p = q. Tan et al. (2013)
give some examples of sub-geometrically ergodic chains, but those are marginal chains
of Gibbs chains and do not cover all values 0 < p ≤ 1/2.

Proposition 4.6. For all 0 < p ≤ 1/2, there exists an irreducible, aperiodic, and positive

recurrent birth-death chain which is sub-geometrically ergodic.

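Proof. When $p = q$, D'Alembert's ratio test cannot be used. So we will use a sharper test, Raabe-Duhamel's test (see, e.g., Knopp, 1951, p.285), to find an example of a sub-geometrically ergodic chain. By Raabe-Duhamel's test, we only need to find $p_i$ and $q_i$ such that $p = q$ and
\[
\lim_{i\to\infty}i\Big(\frac{c_i}{c_{i+1}} - 1\Big) = \lim_{i\to\infty}i\Big(\frac{q_{i+1}}{p_i} - 1\Big) > 1.
\]
Let $q_{i+1} = p + \varepsilon_i$ and $p_i = p + \delta_i$ for $i \ge 1$. Then
\[
i\Big(\frac{q_{i+1}}{p_i} - 1\Big) > 1 \Leftrightarrow \frac{p+\varepsilon_i}{p+\delta_i} - 1 > \frac1i \Leftrightarrow \frac{\varepsilon_i-\delta_i}{p+\delta_i} > \frac1i \Leftrightarrow \varepsilon_i > \delta_i + \frac{p+\delta_i}{i} \Leftrightarrow \varepsilon_i > \frac{p+(i+1)\delta_i}{i}.
\]
First consider the case $1/2 \ge p > 0$. We can select $\delta_i = -(i+1)^{-1}$ and $\varepsilon_i = p/i$ when $i$ is large enough (so that $0 < p_i, q_i$ and $p_i + q_i < 1$). Clearly $q_i > 0$, and $p_i > 0$ if and only if $i > \frac1p - 1$. Because $p \le 1/2$, we have
\[
p_i + q_i = 2p + \delta_i + \varepsilon_{i-1} = 2p + \frac{p}{i-1} - \frac{1}{i+1} < 1
\ \Leftarrow\ \frac{p}{i-1} < \frac{1}{i+1}
\ \Leftrightarrow\ p < \frac{i-1}{i+1}
\ \Leftarrow\ i > 4.
\]
We can select $\delta_i = \varepsilon_i = 0$ for $i \le 4$ or $i \le \frac1p - 1$.

Finally, for the case $p = 0$, we can select $\delta_i = (i+1)^{-1}$ and $\varepsilon_i = (p+2)/i$ for $i > 2$.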
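
The Raabe-Duhamel quantity for the construction with $p > 0$ can be checked numerically. In the sketch below, the function name `raabe_term` is hypothetical, and $p = 1/2$ is just one admissible value. With $\delta_i = -(i+1)^{-1}$ and $\varepsilon_i = p/i$, the expression $i(q_{i+1}/p_i - 1)$ simplifies to $(p + i/(i+1))/(p - 1/(i+1))$, which tends to $(p+1)/p > 1$.

```python
from fractions import Fraction

def raabe_term(p, i):
    """Return i * (q_{i+1} / p_i - 1) with p_i = p - 1/(i+1) and q_{i+1} = p + p/i."""
    p_i = p - Fraction(1, i + 1)      # p_i = p + delta_i
    q_next = p + p / i                # q_{i+1} = p + eps_i
    return i * (q_next / p_i - 1)

p = Fraction(1, 2)                    # illustrative value in (0, 1/2]
terms = [raabe_term(p, i) for i in range(2, 2000)]
print(float(terms[-1]))               # close to (p + 1) / p = 3 for this p
```

The loop starts at $i = 2$ because $p_1 = p - 1/2$ vanishes when $p = 1/2$; the proof handles small indices separately by setting $\delta_i = \varepsilon_i = 0$ there.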

4.2.2 Spectral Method

Mao (2010) used both spectral gap and drift condition to find a practical necessary

and sufficient condition for the geometric ergodicity of birth-death chains.

Theorem 4.7. (Mao, 2010, Theorem 4.3) The birth-death chain $X$ is geometrically ergodic if and only if
\[
\sup_{i\ge1}\Big[\sum_{j=i}^\infty c_j\Big]\Big[\sum_{k=0}^{i-1}\frac{1}{c_kp_k}\Big] < \infty. \tag{4–11}
\]

Chapter 4 in Chen (1992) reviews similar results from Mao and Zhang (2004) for

birth-death process.

4.3 Drift Condition Method

4.3.1 Geometric Ergodicity of Birth-Death Chains

We develop a necessary and sufficient condition for the geometric ergodicity of birth-death chains via a drift condition. Our condition consists of two inequalities: $\liminf q_i > 0$ and $\sup_{i\ge1}\big[c_ip_i\sum_{k=0}^{i-1}\frac{1}{c_kp_k}\big] < \infty$. So, our condition is equivalent to the inequality (4–11) in Mao (2010). The inequality $\liminf q_i > 0$ also appears in Lemma 3 in Tan et al. (2013) and is easy to check. Moreover, $\sup_{i\ge1}\big[\sum_{j=i}^\infty c_j\big]\big[\sum_{k=0}^{i-1}\frac{1}{c_kp_k}\big] < \infty$ in Mao (2010) implies our inequality $\sup_{i\ge1}\big[c_ip_i\sum_{k=0}^{i-1}\frac{1}{c_kp_k}\big] < \infty$ because $c_ip_i < c_i < \sum_{j=i}^\infty c_j$ for all $i \ge 1$.

Given a sequence x in N, we use both notations xi and x(i) for the value of x at i.

We apply Theorem C.8 to the birth-death chain X and find properties of any possible

drift function V . The next lemma is a special case of Lemma 2.2 in Jarner and Hansen

(2000). To avoid other concepts such as random-walk-type Markov chain in Jarner

and Hansen (2000), we will provide a direct proof. This lemma describes small sets of

birth-death chain. We will use this lemma to show that V (i)→∞.

Lemma 4.8. Given a birth-death chain X, if a set is small then it is finite.

Proof. Suppose that $C$ is a small set. By the definition of small set (see Meyn and Tweedie, 2009, p.109), there exist a constant $m$ and a non-trivial measure $\nu$ such that for all $i \in C$ and $B \subset \mathbb{N}$
\[
M^m(i,B) \ge \nu(B).
\]
Since $\nu$ is non-trivial, there exists $j$ such that $\nu(j) \ne 0$. Hence $M^m(i,j) > 0$ for all $i \in C$. When $i > j$, we need at least $i - j$ steps to move from $i$ to $j$, and therefore $M^m(i,j) = 0$ when $i - j > m$. Because $M^m(i,j) > 0$ for all $i \in C$, we have $i - j \le m$ for $i \in C$. Thus, $C$ must be bounded.

Because V is finite, we can define

δi = Vi − Vi−1, i ≥ 2. (4–12)

Using Theorem C.8 and Lemma 4.8, we derive a drift condition for birth-death chains. It

implies that drift function has a monotonic tail.

Proposition 4.9. Fix N ∈ N. X is geometrically ergodic if and only if there exist a

constant ε > 0 and a finite function V ≥ 1 on N such that

qiδi ≥ piδi+1 + εVi, i > N, (4–13)

Vi →∞, and Vi strictly increases for i ≥ N .

Proof. We start with necessity. Suppose that X is geometrically ergodic. Let C =

{1, 2, · · · , N}. By Theorem C.8, there exist some constants b < ∞, ε > 0, and a finite

function V ≥ 1 such that

MV ≤ (1− ε)V + b1C .

By Lemma 15.2.2(ii) in Meyn and Tweedie (2009), {V < c} is petite for all c. By Theorem

5.5.7 in Meyn and Tweedie (2009), {V < c} is small. By Lemma 4.8, {V < c} is finite. If

lim inf Vi < ∞ then {V < 1 + lim inf Vi} is infinite. It is a contradiction, so lim inf Vi = ∞.

Thus, limVi =∞.

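For $i \ge 2$, we have
\[
\begin{aligned}
MV(i) &= \sum_{j=1}^\infty m_{ij}V_j \\
&= p_iV_{i+1} + q_iV_{i-1} + (1-p_i-q_i)V_i \\
&= p_i[V_{i+1}-V_i] - q_i[V_i-V_{i-1}] + V_i \\
&= p_i\delta_{i+1} - q_i\delta_i + V_i.
\end{aligned} \tag{4–14}
\]
So for $i > N$,
\[
MV(i) \le (1-\varepsilon)V(i) + b1_C(i) \Leftrightarrow MV(i) \le (1-\varepsilon)V(i) \Leftrightarrow p_i\delta_{i+1} - q_i\delta_i \le -\varepsilon V_i \Leftrightarrow q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V_i. \tag{4–15}
\]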

Suppose that Vi+1 ≥ Vi for some i > N , or δi+1 ≥ 0, then

qiδi ≥ piδi+1 + εVi ⇒ qiδi ≥ εVi ⇒ δi > 0⇔ Vi > Vi−1. (4–16)

Given any j > N . V is finite and limV (i) = ∞, so there exists m > j such that

Vm+1 > Vm. By induction and applying (4–16), we have Vj > Vj−1. So Vi is strictly

increasing for i ≥ N .

We now prove sufficiency. Suppose that V has the stated properties. As in (4–15)

we have

qiδi ≥ piδi+1 + εVi ⇔MV (i) ≤ (1− ε)V (i), i > N.

For convenience, in this proof only, we set $V_0 = 0$ and $q_1 = 1$. Because $V$ is finite,
\[
b := \sup_{i\le N}MV(i) = \sup_{i\le N}\big(q_iV_{i-1} + r_iV_i + p_iV_{i+1}\big) < \sup_{i\le N}[V_{i-1} + V_i + V_{i+1}] < \infty.
\]

Let C = {1, 2, · · · , N} then MV ≤ (1 − ε)V + b1C . By Theorem C.8, X is geometrically

ergodic.
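
To make the drift inequality (4–13) concrete, here is a small sketch for a chain with constant rates beyond the first state. All specifics are assumptions for illustration: the rates $p = 1/5$, $q = 4/5$, the candidate drift function $V_i = \beta^i$ with $\beta = \sqrt{q/p} = 2$, and the helper names. For $V_i = \beta^i$, inequality (4–13) reduces to $\varepsilon \le (\beta-1)(q/\beta - p)$, so the code uses that largest admissible $\varepsilon$.

```python
from fractions import Fraction

p, q = Fraction(1, 5), Fraction(4, 5)   # hypothetical constant rates with p < q
beta = 2                                 # beta = sqrt(q / p), rational for this choice
eps = (beta - 1) * (q / beta - p)        # largest eps for which (4-13) holds with V_i = beta**i

def V(i):
    """Candidate drift function V_i = beta**i (V >= 1, strictly increasing, unbounded)."""
    return Fraction(beta) ** i

def drift_ok(i):
    """Check q * delta_i >= p * delta_{i+1} + eps * V_i, i.e. inequality (4-13)."""
    delta_i = V(i) - V(i - 1)
    delta_next = V(i + 1) - V(i)
    return q * delta_i >= p * delta_next + eps * V(i)

print(eps, all(drift_ok(i) for i in range(2, 50)))
```

With these values the inequality holds with equality at every state, which shows that the chosen $\varepsilon$ cannot be enlarged for this particular $V$.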

Proposition 4.10. If X is geometrically ergodic then infi≥2 qi > 0.

Proof. By Proposition 4.9 with $N = 1$, there exist a constant $\varepsilon > 0$ and a finite, strictly increasing function $V \ge 1$ such that $q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V_i$ for $i > 1$. Because $\delta_{i+1} > 0$ for $i \ge 1$,
\[
q_i\delta_i = q_i(V_i - V_{i-1}) \ge \varepsilon V_i \Rightarrow q_i > \varepsilon, \quad i > 1.
\]
Thus, $\liminf q_i \ge \varepsilon > 0$. By Lemma A.4, we have $\inf_{i\ge2}q_i > 0$.

For $0 < V_i < \infty$, the ratio $\delta_{i+1}/V_i$ is well defined for all $V_{i+1}$. Thus, we consider
\[
\alpha_i := \frac{\delta_{i+1}}{V_i}, \quad 0 < V_i, V_{i+1} < \infty. \tag{4–17}
\]

(We add an extra condition 0 < Vi+1 < ∞ so αi is finite.) The next proposition is another

version of Proposition 4.9 by reparameterization. Note that formula (4–13) in Proposition

4.9 involves three values of V which are Vi−1, Vi, and Vi+1. The next proposition is

simpler than Proposition 4.9 because it only involves two values of α, which are αi and

αi−1, in each step.

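Proposition 4.11. Fix $N \in \mathbb{N}$. $X$ is geometrically ergodic if and only if there exist a constant $\varepsilon > 0$ and a finite function $\alpha > -1$ on $\mathbb{N}$ such that
\[
\alpha_i \le \frac{q_i}{p_i}\,\frac{\alpha_{i-1}}{1+\alpha_{i-1}} - \frac{\varepsilon}{p_i}, \quad i > N, \tag{4–18}
\]
$\sum_{i=1}^\infty\alpha_i = \infty$, and $\alpha_i > 0$ for $i \ge N$.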

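Proof. First we prove some equivalences which will be used in the proofs of both sufficiency and necessity. We have
\[
\alpha_i = \frac{\delta_{i+1}}{V_i} \Leftrightarrow V_{i+1} - V_i = \alpha_iV_i \Leftrightarrow V_{i+1} = (1+\alpha_i)V_i.
\]
Because the function $t \mapsto t/(t+1)$ is one-to-one for $t > -1$, we get
\[
\alpha_i = \frac{\delta_{i+1}}{V_i} \Leftrightarrow \frac{\alpha_i}{1+\alpha_i} = \frac{\delta_{i+1}}{V_i}\Big[\frac{\delta_{i+1}}{V_i} + 1\Big]^{-1} = \frac{\delta_{i+1}}{V_i}\Big[\frac{V_{i+1}}{V_i}\Big]^{-1} = \frac{\delta_{i+1}}{V_{i+1}}.
\]
Combining the two above formulas we conclude that, for each $i$, if (4–17) holds then
\[
\alpha_i = \frac{\delta_{i+1}}{V_i} \Leftrightarrow V_{i+1} = (1+\alpha_i)V_i \Leftrightarrow \frac{\delta_{i+1}}{V_{i+1}} = \frac{\alpha_i}{1+\alpha_i}. \tag{4–19}
\]
Now fix $N \in \mathbb{N}$. Suppose that (4–17) holds for $i \ge N$; then we will prove that
\[
q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V_i \Leftrightarrow \alpha_i \le \frac{q_i}{p_i}\,\frac{\alpha_{i-1}}{1+\alpha_{i-1}} - \frac{\varepsilon}{p_i}, \quad i \ge N+1, \tag{4–20}
\]
\[
V_i \text{ strictly increases for } i \ge N \Leftrightarrow \alpha_i > 0 \text{ for } i \ge N, \tag{4–21}
\]
and
\[
V_i \to \infty \Leftrightarrow \sum_{i=1}^\infty\alpha_i = \infty. \tag{4–22}
\]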

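We prove (4–20) first. (4–17) holds for $i \ge N$, so $0 < V_i < \infty$ for $i \ge N$. Note that $p_i > 0$ for all $i \ge 1$, thus
\[
q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V_i \Leftrightarrow \frac{\delta_{i+1}}{V_i} \le \frac{q_i}{p_i}\,\frac{\delta_i}{V_i} - \frac{\varepsilon}{p_i}, \quad i \ge N. \tag{4–23}
\]
Replacing $i$ by $i-1$ in the last equality in (4–19), we get
\[
\frac{\delta_i}{V_i} = \frac{\alpha_{i-1}}{1+\alpha_{i-1}}, \quad i \ge N+1. \tag{4–24}
\]
From (4–23), (4–17), and (4–24), we have (4–20). From the middle equality in (4–19), $V_{i+1} > V_i$ if and only if $\alpha_i > 0$, so we have (4–21). Because $\alpha_i > 0$ for $i \ge N$, by the theory of infinite products (see Knopp, 1951, p.219)
\[
\prod_{i=N}^\infty(1+\alpha_i) < \infty \Leftrightarrow \sum_{i=N}^\infty\alpha_i < \infty. \tag{4–25}
\]
From the middle equality in (4–19), we have $V_i = V_N\prod_{j=N}^{i-1}(1+\alpha_j)$. Combining it with (4–25), we get (4–22).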

We now start with necessity. Suppose that $X$ is geometrically ergodic. From Proposition 4.9, there exist $\varepsilon > 0$ and $\infty > V \ge 1$ such that $q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V_i$ for $i > N$, $V_i \to \infty$, and $V_i$ strictly increases for $i \ge N$. Define $\alpha$ by (4–17) for $i > 0$. Note that $\infty > \alpha > -1$. Since (4–17) holds for all $i \ge N$, we have (4–20)-(4–22). That means we have (4–18), $\alpha_i > 0$ for $i \ge N$, and $\sum_{i=1}^\infty\alpha_i = \infty$.

We now prove sufficiency. Suppose that α has the stated properties. Define V

by Vi = 1 for i ≤ N and Vi+1 = (1 + αi)Vi for i ≥ N . ∞ > αi > 0 for i ≥ N so

∞ > V ≥ 1. Since (4–17) holds for i ≥ N , we have (4–20)-(4–22). By Proposition 4.9, X

is geometrically ergodic.


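Denote
\[
\lambda_1 = \frac{1}{p_1}, \quad \lambda_i = \frac{q_i}{p_i}, \quad i \ge 2
\]
and
\[
f_i(t) = \lambda_i\frac{t}{1+t}, \quad t > -1,\ i \ge 2;
\]
then we can rewrite (4–18) as
\[
\alpha_i \le f_i(\alpha_{i-1}) - \frac{\varepsilon}{p_i}, \quad i > N. \tag{4–26}
\]
By Proposition 4.11, we need $\sum_{i=1}^\infty\alpha_i = \infty$, so the larger the $\alpha_i$, the better. Given $\alpha_{i-1} > 0$, we should select the largest possible $\alpha_i$, namely
\[
\alpha_i = f_i(\alpha_{i-1}) - \frac{\varepsilon}{p_i}.
\]
After we get $\alpha_i$, we also want $\alpha_{i+1}$ to be as large as possible. Thus, we need to check whether a larger $\alpha_i$ gives a larger $\alpha_{i+1}$. As above, we select $\alpha_{i+1}$ by
\[
\alpha_{i+1} = f_{i+1}(\alpha_i) - \frac{\varepsilon}{p_{i+1}}.
\]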

Because fi+1 is increasing, a larger αi does give a larger αi+1. We now consider ε.

Given αi−1 > 0, the smaller ε, the larger αi > 0 could be selected. This leads us to the

study of the upper bounds for αis. Given N ∈ N, define a sequence x = {xi}∞i=1 by

xN > 0 and xi = fi(xi−1), i > N. (4–27)

(The values of $x_1, x_2, \ldots, x_{N-1}$ are not important.) Let us consider the special case $x_N = \alpha_N$. Because the $f_i$ are increasing, $x_i$ is an upper bound of $\alpha_i$ for each $i \ge N$. By Proposition 4.11, we need $\sum_{i=1}^\infty\alpha_i = \infty$, so $\sum_{i=N}^\infty x_i \ge \sum_{i=N}^\infty\alpha_i = \infty$ and hence $\sum_{i=N}^\infty x_i = \infty$ is a necessary condition for geometric ergodicity. Is this condition strong enough? Actually $\sum_{i=N}^\infty x_i = \infty$ for all positive recurrent chains (we do not provide a proof here), so this necessary condition for geometric ergodicity is too weak. It turns out that we can find a much stronger necessary condition for geometric ergodicity than $\sum_{i=N}^\infty x_i = \infty$, namely $\liminf x_i > 0$. We can then always select some $\alpha$ satisfying a much stronger condition than $\sum_{i=1}^\infty\alpha_i = \infty$, namely $\liminf\alpha_i > 0$. From Proposition 4.10, $\inf_{i\ge2}q_i > 0$ is a necessary condition for geometric ergodicity. Under that condition, $\liminf x_i > 0$ is strong enough to be a sufficient condition as well. First, we provide some properties of $x$.

Lemma 4.12. The following four conditions are equivalent

∀x1 > 0, xi = fi(xi−1), i > 1, lim inf xi > 0. (4–28a)

∃x1 > 0, xi = fi(xi−1), i > 1, lim inf xi > 0. (4–28b)

∃N > 0,∀xN > 0, xi = fi(xi−1), i > N, lim inf xi > 0. (4–28c)

∃N > 0,∃xN > 0, xi = fi(xi−1), i > N, lim inf xi > 0. (4–28d)

Proof. Because (4–28a) is strongest and (4–28d) is weakest, it suffices to show that (4–28d) implies (4–28a). Replacing $x$ by $z$ in (4–28d), we have
\[
\exists N > 0,\ \exists z_N > 0,\ z_i = f_i(z_{i-1}),\ i > N,\ \liminf z_i > 0.
\]
We first prove (4–28c). Suppose that $0 < x_N < z_N$; then $x_N = cz_N$ for some $0 < c < 1$. For $0 < c < 1$ and $t > 0$, we have
\[
f_i(ct) = \lambda_i\frac{ct}{1+ct} \ge \lambda_i\frac{ct}{1+t} = cf_i(t).
\]
If $x_{i-1} \ge cz_{i-1} > 0$ then
\[
x_i = f_i(x_{i-1}) \ge f_i(cz_{i-1}) \ge cf_i(z_{i-1}) = cz_i.
\]
By induction, $x_i \ge cz_i$ for all $i \ge N$. So $\liminf x_i \ge c\liminf z_i > 0$, therefore
\[
\forall\, 0 < x_N \le z_N,\ x_i = f_i(x_{i-1}),\ i > N,\ \liminf x_i > 0.
\]


That means (4–28c) holds for xN ≤ zN . Now consider xN ≥ zN . Suppose that

xi−1 ≥ zi−1 > 0 then

xi = fi(xi−1) ≥ fi(zi−1) = zi.

By induction we get xi ≥ zi for all i ≥ N , so lim inf xi ≥ lim inf zi > 0. Thus, we have

proved (4–28c) for all xN .

We now prove (4–28a). Given any x1 > 0, let xi = fi(xi−1) for i > 1. Because

fi(t) > 0 for t > 0, for all i, and x1 > 0, we have xN = (fN ◦ · · · ◦ f2)(x1) > 0. Combine

this with xi = fi(xi−1) for i > N , we have lim inf xi > 0 by (4–28c). So we finally have

(4–28a).

By Lemma A.4, we can replace $\liminf x_i > 0$ by $\inf_{i\ge N}x_i > 0$ in (4–28c) and (4–28d). (We cannot use $\inf x_i$ because $x$ may not be positive.) The next proposition gives a condition for geometric ergodicity which is based on the $x_i$.

Proposition 4.13. X is geometrically ergodic if and only if infi≥2 qi > 0 and one of four

conditions (4–28a)-(4–28d) holds.

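Proof. We start with necessity. Suppose that $X$ is geometrically ergodic. From Proposition 4.10, we have $\inf_{i\ge2}q_i > 0$. Given any $N \in \mathbb{N}$ (actually we can select $N = 1$), by Proposition 4.11, there exist a constant $\varepsilon > 0$ and $\alpha > -1$ such that
\[
\alpha_i \le \frac{q_i}{p_i}\,\frac{\alpha_{i-1}}{1+\alpha_{i-1}} - \frac{\varepsilon}{p_i} = f_i(\alpha_{i-1}) - \frac{\varepsilon}{p_i}, \quad i > N,
\]
$\sum_{i=1}^\infty\alpha_i = \infty$, and $\alpha_i > 0$ for $i \ge N$. We set
\[
x_N = \alpha_N, \quad x_i = f_i(x_{i-1}), \quad i > N.
\]
Since $\alpha_N > 0$, we have $x_N > 0$.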

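Suppose that $x_{i-1} \ge \alpha_{i-1}$ for some $i > N$. Since $f_i(t)$ is strictly increasing for $t > -1$, $f_i(x_{i-1}) - f_i(\alpha_{i-1}) \ge 0$. Thus,
\[
x_i - \alpha_i = f_i(x_{i-1}) - \alpha_i \ge f_i(x_{i-1}) - f_i(\alpha_{i-1}) + \frac{\varepsilon}{p_i} \ge \frac{\varepsilon}{p_i}, \quad i > N.
\]
Because $x_N \ge \alpha_N$, using induction and the above argument we have
\[
x_i - \alpha_i \ge \frac{\varepsilon}{p_i}, \quad i > N,
\]
and hence $x_ip_i \ge \varepsilon$ for $i > N$. Because $0 < p_i < 1$, $\liminf x_i \ge \liminf x_ip_i \ge \varepsilon > 0$, i.e. (4–28d) holds. By Lemma 4.12, the necessary condition holds.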

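We now prove sufficiency. By Lemma 4.12, we can suppose that $\inf_{i\ge2}q_i > 0$ and (4–28d) holds. Let $\alpha_i = 0$ for $i < N$ and $\alpha_i = x_i/2$ for $i \ge N$. (4–28d) holds, so $x_i > 0$ for $i \ge N$. Therefore, $\alpha \ge 0 > -1$ and $\alpha_i > 0$ for $i \ge N$. Because $\liminf x_i > 0$, we get $\sum x_i = \infty$ and hence $\sum\alpha_i = \infty$. We now only need to find a constant $\varepsilon > 0$ such that
\[
\alpha_i \le f_i(\alpha_{i-1}) - \frac{\varepsilon}{p_i}, \quad i > N. \tag{4–29}
\]
Note that
\[
\begin{aligned}
\alpha_i \le f_i(\alpha_{i-1}) - \frac{\varepsilon}{p_i}
&\Leftrightarrow f_i(\alpha_{i-1}) - \frac{x_i}{2} \ge \frac{\varepsilon}{p_i} \\
&\Leftrightarrow 2f_i\Big(\frac{x_{i-1}}{2}\Big) - f_i(x_{i-1}) \ge \frac{2\varepsilon}{p_i} \\
&\Leftrightarrow 2\lambda_i\frac{x_{i-1}/2}{1+x_{i-1}/2} - \lambda_i\frac{x_{i-1}}{1+x_{i-1}} \ge \frac{2\varepsilon}{p_i} \\
&\Leftrightarrow p_i\lambda_ix_{i-1}\Big[\frac{1}{1+x_{i-1}/2} - \frac{1}{1+x_{i-1}}\Big] \ge 2\varepsilon \\
&\Leftrightarrow q_i\frac{x_{i-1}^2/2}{(1+x_{i-1})(1+x_{i-1}/2)} \ge 2\varepsilon.
\end{aligned}
\]
Because the functions
\[
\frac{t}{1+t} \quad\text{and}\quad \frac{t/2}{1+t/2}
\]
increase for $t > 0$, their product
\[
h(t) := \frac{t^2/2}{(1+t)(1+t/2)}
\]
increases for $t > 0$. Since $h(t)$ increases, $h(t) > 0$ for $t > 0$, and $\liminf x_i > 0$, we have $\liminf h(x_i) > 0$. Finally,
\[
\liminf q_i\frac{x_{i-1}^2/2}{(1+x_{i-1})(1+x_{i-1}/2)} = \liminf q_ih(x_{i-1}) \ge (\liminf q_i)[\liminf h(x_{i-1})] > 0.
\]
(4–28d) implies $x_i > 0$ for $i \ge N$, so $q_ih(x_{i-1}) > 0$ for $i > N$. By Lemma A.4, $\inf_{i>N}q_ih(x_{i-1}) > 0$. Then we can find $\varepsilon > 0$ satisfying (4–29). By Proposition 4.11, $X$ is geometrically ergodic.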

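We are now ready to find a condition for geometric ergodicity which is based only on the $p_i$ and $q_i$, by selecting a special value for $x_N$. Fix $N \in \mathbb{N}$. Set $x_N = \lambda_N$ and denote
\[
y_{N,i} = \frac{1}{x_i}, \quad \gamma_i = \frac{1}{\lambda_i}, \quad i \ge N. \tag{4–30}
\]
From $x_N = \lambda_N$ and
\[
x_i = \lambda_i\frac{x_{i-1}}{1+x_{i-1}} \Leftrightarrow \frac{1}{x_i} = \frac{1}{\lambda_i}\Big(1 + \frac{1}{x_{i-1}}\Big),
\]
we have
\[
y_{N,N} = \gamma_N, \quad y_{N,i} = \gamma_i(1+y_{N,i-1}), \quad i > N.
\]
By induction,
\[
y_{N,i} = \gamma_i + \gamma_i\gamma_{i-1} + \cdots + \gamma_i\gamma_{i-1}\cdots\gamma_N = \sum_{k=N}^i\prod_{j=k}^i\gamma_j, \quad i \ge N.
\]
Because $x_i > 0$ for $i \ge N$, $\liminf x_i > 0$ if and only if $\inf_{i\ge N}x_i > 0$ by Lemma A.4. Since $\inf_{i\ge N}x_i = 1/\sup_{i\ge N}y_{N,i}$, $\inf_{i\ge N}x_i > 0$ if and only if $\sup_{i\ge N}y_{N,i} < \infty$.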

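We can also rewrite $y_{N,i}$ in another form. Recall that we define $c_i$ in (4–4) by $c_1 = 1$ and
\[
c_i = \frac{p_1p_2\cdots p_{i-1}}{q_2q_3\cdots q_i}, \quad i > 1,
\]
and $c_0p_0 = 1$. We have $c_ip_i = \gamma_1\cdots\gamma_i = (\lambda_1\cdots\lambda_i)^{-1}$ for $i \ge 1$. Thus,
\[
\begin{aligned}
y_{N,i} &= \gamma_i + \gamma_i\gamma_{i-1} + \cdots + \gamma_i\gamma_{i-1}\cdots\gamma_N \\
&= c_ip_i(\lambda_1\cdots\lambda_i)(\gamma_i + \gamma_{i-1}\gamma_i + \cdots + \gamma_N\gamma_{N+1}\cdots\gamma_i) \\
&= c_ip_i(\lambda_1\lambda_2\cdots\lambda_{i-1} + \cdots + \lambda_1\lambda_2\cdots\lambda_{N-1}) \\
&= c_ip_i\sum_{k=N-1}^{i-1}\frac{1}{c_kp_k}, \quad i \ge N \ge 1.
\end{aligned} \tag{4–31}
\]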
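
The two expressions for $y_{N,i}$, the recursion $y_{N,i} = \gamma_i(1 + y_{N,i-1})$ and the closed form (4–31), can be compared numerically. The sketch below does so with exact fractions; the function names and the rates ($p_1 = 1/2$, and $p_i = 1/3$, $q_i = 2/3$ for $i \ge 2$) are assumptions made for illustration.

```python
from fractions import Fraction

def y_recursive(gamma, N, i):
    """y_{N,N} = gamma_N and y_{N,k} = gamma_k * (1 + y_{N,k-1}) for k > N."""
    y = gamma[N]
    for k in range(N + 1, i + 1):
        y = gamma[k] * (1 + y)
    return y

def y_closed(gamma, N, i):
    """Right-hand side of (4-31): c_i p_i * sum_{k=N-1}^{i-1} 1 / (c_k p_k)."""
    cp = [Fraction(1)]                    # cp[k] = c_k p_k, with c_0 p_0 = 1
    for k in range(1, i + 1):
        cp.append(cp[-1] * gamma[k])      # c_k p_k = gamma_1 * ... * gamma_k
    return cp[i] * sum(1 / cp[k] for k in range(N - 1, i))

# gamma_1 = p_1 = 1/2 and gamma_i = p_i / q_i = 1/2 for i >= 2 (hypothetical rates).
gamma = [None] + [Fraction(1, 2)] * 21

agree = all(
    y_recursive(gamma, N, i) == y_closed(gamma, N, i)
    for N in range(1, 4)
    for i in range(N, 11)
)
print(y_recursive(gamma, 1, 3), agree)
```

The recursion needs only one multiplication per step, so it is the more convenient form when scanning $\sup_{i\ge N}y_{N,i}$ numerically.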

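Theorem 4.14. Fix $N \in \mathbb{N}$. $X$ is geometrically ergodic if and only if
\[
\inf_{i\ge2}q_i > 0
\]
and
\[
\sup_{i\ge N}y_{N,i} = \sup_{i\ge N}(\gamma_i + \gamma_i\gamma_{i-1} + \cdots + \gamma_i\gamma_{i-1}\cdots\gamma_N) < \infty. \tag{4–32}
\]
By (4–31), (4–32) is equivalent to
\[
\sum_{k=N-1}^{i-1}\frac{1}{c_kp_k} < C\,\frac{1}{c_ip_i}, \quad \forall i \ge N,
\]
for some finite constant $C$.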

Proof of Theorem 4.14. We start with necessity. Suppose that X is geometrically

ergodic. Using Proposition 4.13 with (4–28c) and xN = λN , we have infi≥2 qi > 0 and

supi≥N yN,i <∞.

We now prove sufficiency. Suppose that two inequalities in this theorem hold. Using

Proposition 4.13 with (4–28d) and xN = λN , we see that X is geometrically ergodic.

Note that (4–3) is not easy to check. We will show that we can ignore it. We do not

suppose positive recurrence in the next corollary.

Corollary 4.15. Suppose that $X$ is irreducible and aperiodic. Fix $N \in \mathbb{N}$. $X$ is geometrically ergodic if and only if $\inf_{i>N}q_i > 0$ and $\sup_{i\ge N}y_{N,i} < \infty$.

Proof. We start with necessity. Suppose that $X$ is geometrically ergodic, so $X$ is positive recurrent. By Theorem 4.14, $\inf_{i>N}q_i > 0$ and $\sup_{i\ge N}y_{N,i} < \infty$.

We now prove sufficiency. Suppose that the stated properties hold. We first show that $X$ is recurrent. From Proposition 4.9, any drift function $V$ has a monotonic tail, so $\{V < n\}$ is finite for all $n \ge 0$. By Lemma C.10 (i), $\{V < n\}$ is small. By Theorem 8.0.2 (ii) in Meyn and Tweedie (2009), $X$ is recurrent. By Theorem C.8, note that we only need recurrence instead of positive recurrence in Theorem 4.14. Thus, $X$ is geometrically ergodic.

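In the above corollary, we use $\inf_{i>N}q_i > 0$ instead of $\inf_{i\ge N}q_i > 0$ because $q_1 = 0$. To show that the inequality (4–11) in Mao (2010) implies our inequality (4–32), we select $N = 1$. By (4–31), $y_{1,i} = c_ip_i\sum_{k=0}^{i-1}\frac{1}{c_kp_k}$. The condition $\sup_{i\ge1}\big[\sum_{j=i}^\infty c_j\big]\big[\sum_{k=0}^{i-1}\frac{1}{c_kp_k}\big] < \infty$ in Mao (2010) implies $\sup_{i\ge1}\big[c_ip_i\sum_{k=0}^{i-1}\frac{1}{c_kp_k}\big] < \infty$ because $c_ip_i < c_i < \sum_{j=i}^\infty c_j$ for $i \ge 1$.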

By Lemma A.5, given any N ∈ N, under the condition lim inf qi > 0, condition (4–3)

is equivalent to

limi→∞

(γN + γNγN+1 + · · ·+ γNγN+1 . . . γi) <∞.

The above sum is quite similar to yN,i in some cases. (See Example 4.16 for a case.) In

those cases, if a birth-death chain is irreducible, aperiodic, and positive recurrent then

it is also geometrically ergodic. We now apply our result to a chain to show that it is

geometrically ergodic, but Theorem 4.3 could not be used here.


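Example 4.16. Consider a chain whose $p_i$ and $q_i$ repeat their values every four rows as follows:
\[
M = \begin{pmatrix}
r_1 & p_1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
1/9 & 0 & 8/9 & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & 1/2 & 0 & 1/2 & 0 & 0 & 0 & 0 & \cdots \\
0 & 0 & 8/9 & 0 & 1/9 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & 8/9 & 0 & 1/9 & 0 & 0 & \cdots \\
0 & 0 & 0 & 0 & 1/9 & 0 & 8/9 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\]
for some $0 < r_1 < 1$, i.e.
\[
\begin{aligned}
p_2 &= p_{2+4k} = 8/9, \\
p_3 &= p_{3+4k} = 1/2, \\
p_4 &= p_5 = p_{4+4k} = p_{5+4k} = 1/9, \quad k > 0, \\
q_i + p_i &= 1, \quad i > 1.
\end{aligned}
\]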

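We could not apply Theorem 4.3 because
\[
\limsup\big[\sqrt{p_{i-1}q_i} + \sqrt{p_iq_{i+1}}\big] \ge \sqrt{p_{2+4k}q_{3+4k}} + \sqrt{p_{3+4k}q_{4+4k}} = \sqrt{\frac89\cdot\frac12} + \sqrt{\frac12\cdot\frac89} = 2\cdot\frac23 = \frac43 > 1,
\]
so the upper bound for $\tau$ in Theorem 4.3 exceeds 1 and cannot establish $\tau < 1$.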

It is clear that pi > 0 for i ≥ 1, qi > 0 for i ≥ 2, r1 > 0, and infi>1 qi > 0. We apply
Corollary 4.15 with N = 2 to show that this chain is geometrically ergodic. We have

γ2 = γ2+4k = p2/q2 = 8,

γ3 = γ3+4k = p3/q3 = 1,

γ4 = γ5 = γ4k = γ1+4k = p4/q4 = 1/8, k > 0.

82

So (γi, γi+1, γi+2, γi+3) always has one coordinate equal to 8, one coordinate equal to 1,

and two coordinates equal to 1/8 for i ≥ 2, therefore γi+3γi+2γi+1γi = (8)(1)(1/8)(1/8) =

1/8 for i ≥ 2. To prove that supi≥2 y2,i < ∞, it suffices to show that supk≥3 y2,j+4k < ∞ for

each integer number 0 ≤ j ≤ 3. We only consider j = 0 because other cases are similar.

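For $i = 4k$ and $k \ge 3$, we partition $y_{2,i}$ into four sequences as follows:
\[
\begin{aligned}
y_{2,i} &= \gamma_i + \gamma_i\gamma_{i-1} + \cdots + \gamma_i\gamma_{i-1}\cdots\gamma_2 \\
&= \gamma_i \big[ 1 + (\gamma_{i-1}\gamma_{i-2}\gamma_{i-3}\gamma_{i-4}) + \cdots + (\gamma_{i-1}\gamma_{i-2}\gamma_{i-3}\gamma_{i-4}) \cdots (\gamma_7\gamma_6\gamma_5\gamma_4) \big] \\
&\quad + \gamma_i\gamma_{i-1} \big[ 1 + (\gamma_{i-2}\gamma_{i-3}\gamma_{i-4}\gamma_{i-5}) + \cdots + (\gamma_{i-2}\gamma_{i-3}\gamma_{i-4}\gamma_{i-5}) \cdots (\gamma_6\gamma_5\gamma_4\gamma_3) \big] \\
&\quad + \gamma_i\gamma_{i-1}\gamma_{i-2} \big[ 1 + (\gamma_{i-3}\gamma_{i-4}\gamma_{i-5}\gamma_{i-6}) + \cdots + (\gamma_{i-3}\gamma_{i-4}\gamma_{i-5}\gamma_{i-6}) \cdots (\gamma_5\gamma_4\gamma_3\gamma_2) \big] \\
&\quad + \big[ (\gamma_i\gamma_{i-1}\gamma_{i-2}\gamma_{i-3}) + \cdots + (\gamma_i\gamma_{i-1}\gamma_{i-2}\gamma_{i-3}) \cdots (\gamma_8\gamma_7\gamma_6\gamma_5) \big] \\
&= \gamma_i \big[ 1 + 1/8 + \cdots + (1/8)^{k-1} \big] \\
&\quad + \gamma_i\gamma_{i-1} \big[ 1 + 1/8 + \cdots + (1/8)^{k-1} \big] \\
&\quad + \gamma_i\gamma_{i-1}\gamma_{i-2} \big[ 1 + 1/8 + \cdots + (1/8)^{k-1} \big] \\
&\quad + \big[ 1/8 + \cdots + (1/8)^{k-1} \big] \\
&\le \big[ \gamma_i + \gamma_i\gamma_{i-1} + \gamma_i\gamma_{i-1}\gamma_{i-2} + 1 \big] \frac{1 - (1/8)^{k}}{1 - 1/8} \\
&\le \big[ \gamma_i + \gamma_i\gamma_{i-1} + \gamma_i\gamma_{i-1}\gamma_{i-2} + 1 \big] \frac{1}{7/8}.
\end{aligned}
\]
Each bracket in the first three rows contains k terms, because the blocks of four consecutive $\gamma$'s run down to the block ending at index 4, 3, or 2, respectively. Since $i = 4k$, we have $\gamma_i = 1/8$, $\gamma_{i-1} = 1$, and $\gamma_{i-2} = 8$, so
\[ \gamma_i + \gamma_i\gamma_{i-1} + \gamma_i\gamma_{i-1}\gamma_{i-2} + 1 = 1/8 + (1/8)(1) + (1/8)(1)(8) + 1 = \frac{9}{4}. \]
Thus, $y_{2,4k} \le \big(\frac{9}{4}\big)\frac{8}{7} = \frac{18}{7}$ for $k \ge 3$ and hence $\sup_{k \ge 3} y_{2,4k} < \infty$.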
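The computation in Example 4.16 can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the period-four values of $p_i$ read off the matrix M, uses the recursion $y_{2,i} = \gamma_i (1 + y_{2,i-1})$ (which follows from the definition of $y_{2,i}$ as a sum of products), and evaluates the quantity from Theorem 4.3 at an index of the form 3 + 4k. All function names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Birth probabilities p_i for i >= 2, indexed by i mod 4 (read off the matrix M).
P_BY_RESIDUE = {2: Fraction(8, 9), 3: Fraction(1, 2),
                0: Fraction(1, 9), 1: Fraction(1, 9)}


def p(i):
    """Birth probability p_i, valid for i >= 2."""
    return P_BY_RESIDUE[i % 4]


def q(i):
    """Death probability q_i = 1 - p_i, valid for i >= 2 (r_i = 0 there)."""
    return 1 - p(i)


def gamma(i):
    return p(i) / q(i)


def y2(i_max):
    """Return {i: y_{2,i}} for 2 <= i <= i_max via y_{2,i} = gamma_i * (1 + y_{2,i-1})."""
    values, prev = {}, Fraction(0)
    for i in range(2, i_max + 1):
        prev = gamma(i) * (1 + prev)
        values[i] = prev
    return values


def theorem_4_3_quantity(i):
    """sqrt(p_{i-1} q_i) + sqrt(p_i q_{i+1}), valid for i >= 3."""
    return sqrt(p(i - 1) * q(i)) + sqrt(p(i) * q(i + 1))


ys = y2(200)
print("largest y_{2,i} for i <= 200:", float(max(ys.values())))
print("Theorem 4.3 quantity at i = 7:", theorem_4_3_quantity(7))
```

Exact rational arithmetic is used for $y_{2,i}$ so that boundedness is not an artifact of rounding; only the square roots are evaluated in floating point.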

4.3.2 Application to a Family of Gibbs Samplers

Recall from Section 1.2.1 that $b_0 = 0$ and $a_0 = 1$, that $\{a_i\}_{i=1}^{\infty}$ and $\{b_i\}_{i=1}^{\infty}$ are two sequences of strictly positive real numbers such that $\sum_{i=1}^{\infty} a_i + \sum_{i=1}^{\infty} b_i = 1$, and that the x-chain $\{X_n\}$ is a birth-death chain with
\[ p_i = \frac{a_i b_i}{(a_i + b_{i-1})(a_i + b_i)} \quad \text{and} \quad q_i = \frac{a_{i-1} b_{i-1}}{(a_i + b_{i-1})(a_{i-1} + b_{i-1})}, \]
and $r_i = 1 - p_i - q_i$. We use Theorem 4.14 to establish a necessary and sufficient condition for the geometric ergodicity of $\{X_n\}$.

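Proposition 4.17. The x-chain $\{X_n\}$ is geometrically ergodic if and only if
\[ \sup_{i>0} \frac{b_i}{a_i} < \infty, \quad \sup_{i>0} \frac{a_{i+1}}{b_i} < \infty, \tag{4--33} \]
and there exists a constant $0 < C < 1$ such that
\[ \sum_{j=1}^{i-1} \Big( \frac{1}{a_j} + \frac{1}{b_j} \Big) < C \Big( \frac{1}{a_i} + \frac{1}{b_i} \Big), \quad \forall i \ge 2. \tag{4--34} \]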

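Proof. $\inf_{i \ge 2} q_i > 0$ if and only if $\sup_{i \ge 2} q_i^{-1} < \infty$. We have
\[ \frac{1}{q_{i+1}} = \frac{a_{i+1} + b_i}{b_i} \cdot \frac{a_i + b_i}{a_i} = \Big( 1 + \frac{a_{i+1}}{b_i} \Big) \Big( 1 + \frac{b_i}{a_i} \Big), \quad i \ge 1. \]
Because $1 < 1 + a_{i+1}/b_i$ and $1 + b_i/a_i < q_{i+1}^{-1}$ (and symmetrically with the two factors exchanged), $\sup_{i \ge 2} q_i^{-1} < \infty$ is equivalent to (4–33).

Denote
\[ t_i = \frac{a_i b_i}{a_i + b_i}, \quad i \ge 1. \]
For $i \ge 2$, we have
\[ \gamma_i = \frac{p_i}{q_i} = \frac{a_i b_i}{(a_i + b_{i-1})(a_i + b_i)} \cdot \frac{(a_i + b_{i-1})(a_{i-1} + b_{i-1})}{a_{i-1} b_{i-1}} = \frac{t_i}{t_{i-1}}. \]
Thus, $\gamma_i \gamma_{i-1} \cdots \gamma_j = t_i / t_{j-1}$ for $i \ge j \ge 2$ and hence
\[
\begin{aligned}
y_{2,i} &= \gamma_i + \gamma_i\gamma_{i-1} + \cdots + \gamma_i\gamma_{i-1}\cdots\gamma_2 \\
&= \frac{t_i}{t_{i-1}} + \frac{t_i}{t_{i-2}} + \cdots + \frac{t_i}{t_1} = t_i \sum_{j=1}^{i-1} t_j^{-1} = \frac{a_i b_i}{a_i + b_i} \sum_{j=1}^{i-1} \Big( \frac{1}{a_j} + \frac{1}{b_j} \Big), \quad i \ge 2.
\end{aligned}
\]
By Theorem 4.14, we obtain the result.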
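The two identities used in this proof, $\gamma_i = t_i / t_{i-1}$ and the closed form of $y_{2,i}$, can be checked exactly on a concrete family. The sketch below assumes the hypothetical choice $a_i = b_i = 3^{-i}$ for $i \ge 1$ (so that the two series sum to 1), together with $a_0 = 1$ and $b_0 = 0$; this family is an illustration only and is not taken from the text.

```python
from fractions import Fraction


def a(i):
    # Hypothetical example: a_0 = 1 and a_i = 3^{-i} for i >= 1.
    return Fraction(1) if i == 0 else Fraction(1, 3 ** i)


def b(i):
    # Hypothetical example: b_0 = 0 and b_i = 3^{-i} for i >= 1.
    return Fraction(0) if i == 0 else Fraction(1, 3 ** i)


def p(i):
    return a(i) * b(i) / ((a(i) + b(i - 1)) * (a(i) + b(i)))


def q(i):
    return a(i - 1) * b(i - 1) / ((a(i) + b(i - 1)) * (a(i - 1) + b(i - 1)))


def t(i):
    return a(i) * b(i) / (a(i) + b(i))


def y2_direct(i):
    """gamma_i + gamma_i gamma_{i-1} + ... + gamma_i ... gamma_2, from p and q."""
    total, prod = Fraction(0), Fraction(1)
    for j in range(i, 1, -1):
        prod *= p(j) / q(j)
        total += prod
    return total


def y2_closed(i):
    """t_i * sum_{j=1}^{i-1} (1/a_j + 1/b_j)."""
    return t(i) * sum(1 / a(j) + 1 / b(j) for j in range(1, i))


for i in range(2, 8):
    print(i, y2_direct(i), y2_closed(i))
```

For this particular family the values of $y_{2,i}$ increase towards 1/2, so the supremum is finite.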

4.3.3 Geometric Ergodicity for a Family of Random Walks on Z

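In this section, we use the same method as in Section 4.3.1 to find an analogous necessary and sufficient condition for the geometric ergodicity of a family of random walks on $\mathbb{Z}$. Let $Z = \{Z_n\}_{n=0}^{\infty}$ denote a random walk with state space $\mathbb{Z}$ and Markov transition matrix
\[
M = \begin{pmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & 0 & q_{-1} & r_{-1} & p_{-1} & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & q_0 & r_0 & p_0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & q_1 & r_1 & p_1 & 0 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
\]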

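Suppose that $\pi$ is the stationary distribution of Z. Then
\[
\begin{aligned}
\pi M = \pi &\Leftrightarrow p_{i-1}\pi_{i-1} + r_i\pi_i + q_{i+1}\pi_{i+1} = \pi_i, \quad i \in \mathbb{Z} \\
&\Leftrightarrow p_{i-1}\pi_{i-1} + q_{i+1}\pi_{i+1} = (p_i + q_i)\pi_i, \quad i \in \mathbb{Z} \\
&\Leftrightarrow q_{i+1}\pi_{i+1} - p_i\pi_i = q_i\pi_i - p_{i-1}\pi_{i-1}, \quad i \in \mathbb{Z} \\
&\Leftrightarrow q_{i+1}\pi_{i+1} - p_i\pi_i = q_1\pi_1 - p_0\pi_0, \quad i \in \mathbb{Z}.
\end{aligned}
\]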

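First, we consider the case $q_1\pi_1 - p_0\pi_0 = t > 0$. Then $q_i\pi_i - p_{i-1}\pi_{i-1} = t$ for all i and hence $\pi_i \ge q_i\pi_i \ge t$ for all i. Thus, $\sum \pi_i = \infty$. This is a contradiction because $\pi$ is a stationary distribution. A similar proof shows that the case $q_1\pi_1 - p_0\pi_0 < 0$ also implies $\sum \pi_i = \infty$. So $q_{i+1}\pi_{i+1} - p_i\pi_i = 0$ for all i. Thus,
\[ \pi_i = \frac{p_0 p_1 \cdots p_{i-1}}{q_1 q_2 \cdots q_i}\, \pi_0, \quad i \ge 1, \]
\[ \pi_{-i} = \frac{q_0 q_{-1} \cdots q_{-i+1}}{p_{-1} p_{-2} \cdots p_{-i}}\, \pi_0, \quad i \ge 1. \]
From $\sum_{i=-\infty}^{\infty} \pi_i = 1$, Z is irreducible, aperiodic, and positive recurrent if and only if the following three conditions hold: (i) $p_i, q_i > 0$ for all $i \in \mathbb{Z}$, (ii) $r_i > 0$ for some $i \in \mathbb{Z}$, and (iii)
\[ \sum_{i=1}^{\infty} \frac{p_0 p_1 \cdots p_{i-1}}{q_1 q_2 \cdots q_i} + \sum_{i=1}^{\infty} \frac{q_0 q_{-1} \cdots q_{-i+1}}{p_{-1} p_{-2} \cdots p_{-i}} < \infty. \]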
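The product formulas for $\pi_i$ and $\pi_{-i}$ can be checked against the stationarity equation $p_{i-1}\pi_{i-1} + r_i\pi_i + q_{i+1}\pi_{i+1} = \pi_i$. The following sketch assumes a hypothetical walk that drifts towards 0 from both sides; the specific values of $p_i$ and $q_i$ are illustrative and do not come from the text. Weights are left unnormalised (that is, $\pi_0 = 1$).

```python
from fractions import Fraction


def p(i):
    # Hypothetical example: step right with prob 1/4 on i >= 0, 1/2 on i < 0.
    return Fraction(1, 4) if i >= 0 else Fraction(1, 2)


def q(i):
    # Hypothetical example: step left with prob 1/2 on i > 0, 1/4 on i <= 0.
    return Fraction(1, 2) if i > 0 else Fraction(1, 4)


def r(i):
    return 1 - p(i) - q(i)


def weight(i):
    """Unnormalised stationary weight pi_i / pi_0 from the product formulas."""
    w = Fraction(1)
    if i > 0:
        for k in range(1, i + 1):
            w *= p(k - 1) / q(k)
    else:
        for k in range(1, -i + 1):
            w *= q(-k + 1) / p(-k)
    return w


def is_stationary_at(i):
    lhs = p(i - 1) * weight(i - 1) + r(i) * weight(i) + q(i + 1) * weight(i + 1)
    return lhs == weight(i)


print([str(weight(i)) for i in range(-3, 4)])
print(all(is_stationary_at(i) for i in range(-10, 11)))
```

In this example the weights are $2^{-|i|}$, so both series in condition (iii) converge.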

The next lemma is analogous to Lemma 4.8.


Lemma 4.18. For the random walk Z, if a set is small then it is finite.

Proof. The proof is similar to that of Lemma 4.8. When $i \ne j$, we need at least $|i - j|$ steps to move from i to j, so we just replace $i - j$ by $|i - j|$ in the proof of Lemma 4.8.

Denote

δi = Vi − Vi−1, i ∈ Z. (4–35)

The next lemma is analogous to Proposition 4.9.

Lemma 4.19. Fix N ∈ N. Z is geometrically ergodic if and only if there exist a constant

ε > 0 and a finite function V ≥ 1 on Z such that

qiδi ≥ piδi+1 + εV (i), |i| > N, (4–36)

limi→∞ V (i) = limi→−∞ V (i) = ∞, V (i) strictly increases for i > N , and V (i) strictly

decreases for i < −N .

Proof. The proof is similar to that in birth-death chain case. We start with necessity.

Suppose that Z is geometrically ergodic. Let C = {−N,−N + 1, . . . , N}. By Theorem

C.8, there exist constants b <∞, ε > 0, and a finite function V ≥ 1 satisfying

MV ≤ (1− ε)V + b1C .

As in the proof of Proposition 4.9, we have limi→∞ V (i) =∞ and limi→−∞ V (i) =∞.

We have (similar to equation (4–14))

MV (i) = piδi+1 − qiδi + V (i), i ∈ Z.

So for |i| > N , (similar to (4–15))

MV (i) ≤ (1− ε)Vi ⇔ qiδi ≥ piδi+1 + εVi.

As in the proof of Proposition 4.9, we can show that V strictly increases for $i \ge N$. We now prove that V strictly decreases for all $i \le -N$. For $i < 0$, suppose that

V (i− 1) ≥ V (i), or −δi ≥ 0, we have

qiδi ≥ piδi+1 + εV (i)⇔ pi(−δi+1) ≥ qi(−δi) + εV (i)⇒ −δi+1 > 0⇔ V (i) > V (i+ 1).

Given any j < −N , we will show that

V (j) > V (j + 1).

Because lim infi→−∞ V (i) = ∞, there exists m < j such that V (m) > V (m + 1).

By induction and applying the above result, we get V (j) > V (j + 1). So V (i) strictly

decreases for all i ≤ −N .

The proof for the sufficient condition is similar to the proof of Proposition 4.9.

Denote q′i = p−i, p′i = q−i, V ′i = V−i and

δ′i := V ′i − V ′i−1 = V−i − V−i+1 = −(V−i+1 − V−i) = −δ−i+1.

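Replacing i by −i in $q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i)$, we have
\[
\begin{aligned}
& q_{-i}\delta_{-i} \ge p_{-i}\delta_{-i+1} + \varepsilon V_{-i} \\
\Leftrightarrow\ & -p'_i\delta'_{i+1} \ge -q'_i\delta'_i + \varepsilon V'_i \\
\Leftrightarrow\ & q'_i\delta'_i \ge p'_i\delta'_{i+1} + \varepsilon V'_i.
\end{aligned}
\]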

So we can rewrite the above lemma as follows.

Lemma 4.20. Fix N ∈ N. Z is geometrically ergodic if and only if there exist functions

V, V ′ : N→ [1,∞) such that

\[ q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i), \quad q'_i\delta'_i \ge p'_i\delta'_{i+1} + \varepsilon V'_i, \quad i > N, \tag{4--37} \]

limi→∞ V (i) = limi→∞ V′(i) =∞, and V (i) and V ′(i) strictly increase for i > N .


With the above notation, we can partition a random walk chain into two birth-death

chains. Finally, we have a similar result to Theorem 4.14.

Theorem 4.21. Suppose that the random walk Z is irreducible and aperiodic. Fix N ∈ N.

Z is geometrically ergodic if and only if

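\[ \inf_{i>0} q_i > 0, \quad \inf_{i>0} p_{-i} > 0, \]
\[ \sup_{i \ge N} \sum_{k=N}^{i} \frac{p_k p_{k+1} \cdots p_i}{q_k q_{k+1} \cdots q_i} < \infty, \quad \sup_{i \ge N} \sum_{k=N}^{i} \frac{q_{-k} q_{-k-1} \cdots q_{-i}}{p_{-k} p_{-k-1} \cdots p_{-i}} < \infty. \]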
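As an illustration of how the four conditions of Theorem 4.21 can be evaluated, the sketch below treats a hypothetical walk with constant step probabilities on each half-line. It relies on the recursion $S_i = (p_i/q_i)(1 + S_{i-1})$ for the right-hand sums $S_i = \sum_{k=N}^{i} (p_k \cdots p_i)/(q_k \cdots q_i)$, and the mirrored recursion on the negative side. The chain and the function names are assumptions made for the example.

```python
from fractions import Fraction


def p(i):
    # Hypothetical example: p_i = 1/4 for i >= 0 and 1/2 for i < 0.
    return Fraction(1, 4) if i >= 0 else Fraction(1, 2)


def q(i):
    # Hypothetical example: q_i = 1/2 for i > 0 and 1/4 for i <= 0.
    return Fraction(1, 2) if i > 0 else Fraction(1, 4)


def right_sums(N, i_max):
    """S_i = sum_{k=N}^{i} (p_k ... p_i)/(q_k ... q_i) for N <= i <= i_max."""
    out, s = {}, Fraction(0)
    for i in range(N, i_max + 1):
        s = p(i) / q(i) * (1 + s)
        out[i] = s
    return out


def left_sums(N, i_max):
    """S'_i = sum_{k=N}^{i} (q_{-k} ... q_{-i})/(p_{-k} ... p_{-i}) for N <= i <= i_max."""
    out, s = {}, Fraction(0)
    for i in range(N, i_max + 1):
        s = q(-i) / p(-i) * (1 + s)
        out[i] = s
    return out


N, I_MAX = 1, 60
right, left = right_sums(N, I_MAX), left_sums(N, I_MAX)
print(float(max(right.values())), float(max(left.values())))
print(min(q(i) for i in range(1, I_MAX + 1)), min(p(-i) for i in range(1, I_MAX + 1)))
```

A finite range of indices cannot prove that a supremum is finite; here the sums are geometric with ratio 1/2, so the bound by 1 holds for every i.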

We suggest that our method can be applied to random walks on Zd which partition

into a finite number of separated birth-death branches outside a bounded set.

4.3.4 Some Other Results

Recall from Section 4.1 that X is an irreducible, aperiodic, and positive recurrent birth-death chain with Markov transition matrix M defined by the $p_i$'s and $q_i$'s. Using Proposition 4.9, we obtain a sufficient condition for the geometric ergodicity of a class of Markov chains, which are called single-birth chains by some authors (see, e.g., Chen, 2004).

Corollary 4.22. Given an irreducible, aperiodic, and positive recurrent Markov chain X′ with Markov transition matrix
\[
M' = \begin{pmatrix}
r'_1 & p'_1 & 0 & 0 & 0 & \cdots \\
q'_{21} & r'_2 & p'_2 & 0 & 0 & \cdots \\
q'_{31} & q'_{32} & r'_3 & p'_3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
\]
Suppose that there exists $N_0 \ge 1$ such that
\[ p'_i \le p_i \quad \text{and} \quad q'_{i1} + q'_{i2} + \cdots + q'_{i,i-1} \ge q_i, \quad i > N_0. \]
If X is geometrically ergodic then X′ is geometrically ergodic.

Proof. Since X is geometrically ergodic, by Proposition 4.9, there exists a strictly

increasing drift function V on N such that limVi = ∞ (We set N = 1 in Proposition 4.9).

For all i ≥ 2, we have

M ′V (i) ≤ (1− ε)Vi

⇔ p′iVi+1 + r′iVi + q′i,i−1Vi−1 + · · ·+ q′i1V1 ≤ (1− ε)Vi

⇔ p′iVi+1 + (1− p′i − q′i1 − · · · − q′i,i−1)Vi + q′i,i−1Vi−1 + · · ·+ q′i1V1 ≤ (1− ε)Vi

⇔ p′iδi+1 − q′i,i−1(Vi − Vi−1)− q′i,i−2(Vi − Vi−2)− · · · − q′i1(Vi − V1) ≤ −εVi

⇔ q′i,i−1δi + q′i,i−2(Vi − Vi−2) + · · ·+ q′i,1(Vi − V1) ≥ p′iδi+1 + εV (i).

V strictly increases so 0 < δi < Vi − Vj for i > j ≥ 1 and hence for i > N0,

q′i,i−1δi + q′i,i−2(Vi − Vi−2) + · · ·+ q′i,1(Vi − V1)

≥ q′i,i−1δi + q′i,i−2δi + · · ·+ q′i,1δi

≥ qiδi ≥ piδi+1 + εV (i) ≥ p′iδi+1 + εV (i)

⇒M ′V (i) ≤ (1− ε)Vi.

For i ≤ N0,

M ′V (i) = p′iVi+1 + r′iVi + q′i,i−1Vi−1 + · · ·+ q′i1V1 <∞.

Let $b = \max_{i \le N_0} M'V(i) < \infty$ and $C = \{1, 2, \ldots, N_0\}$. By Theorem C.8, X′ is geometrically ergodic.

By Proposition 4.10, lim inf qi > 0 is a necessary condition for the geometric

ergodicity of X. Note that lim inf qi > 0 implies lim sup ri < 1. Under this condition,

we will show that we can suppose ri = 0 for all i except some i0 if we only care about

geometric ergodicity. Because of aperiodicity, we need some ri0 > 0. Without loss of

generality, we suppose r1 > 0 in the next corollary. Recall that λi = qi/pi.


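Corollary 4.23. Let
\[ p'_i = p_i + r_i \frac{1}{\lambda_i + 1}, \quad q'_i = q_i + r_i \frac{\lambda_i}{\lambda_i + 1}, \quad r'_i = 0, \quad i > 1, \]
then
\[ p'_i + q'_i = 1 \quad \text{and} \quad \frac{q'_i}{p'_i} = \lambda_i, \quad i > 1. \]
Denote by X′ a birth-death chain with transition matrix
\[
M' = \begin{pmatrix}
r_1 & p_1 & 0 & 0 & 0 & \cdots \\
q'_2 & 0 & p'_2 & 0 & 0 & \cdots \\
0 & q'_3 & 0 & p'_3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
\]
Under the conditions $r_1 > 0$ and $\limsup r_i < 1$, X is geometrically ergodic if and only if X′ is geometrically ergodic.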


Proof. We can show that X ′ is also an irreducible, aperiodic and positive recurrent

birth-death chain.

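Suppose that X is geometrically ergodic. Fix any $N \in \mathbb{N}$. By Proposition 4.9, there exist some $\varepsilon$ and a drift function V such that $q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i)$ for $i > N$. Because
\[ \frac{q_i}{p_i} = \frac{q'_i}{p'_i}, \quad \frac{1}{p_i} \ge \frac{1}{p'_i}, \quad \delta_i > 0, \quad i > N, \]
and V is finite, we have
\[
\begin{aligned}
q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i) &\Leftrightarrow \frac{\delta_{i+1}}{V_i} \le \frac{q_i}{p_i} \frac{\delta_i}{V_i} - \frac{\varepsilon}{p_i} \\
&\Rightarrow \frac{\delta_{i+1}}{V_i} \le \frac{q'_i}{p'_i} \frac{\delta_i}{V_i} - \frac{\varepsilon}{p'_i} \\
&\Leftrightarrow q'_i\delta_i \ge p'_i\delta_{i+1} + \varepsilon V(i), \quad i > N.
\end{aligned}
\]
That means V is also a drift function for X′, so X′ is geometrically ergodic by Proposition 4.9.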

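Conversely, suppose that X′ is geometrically ergodic. Fix any $N \in \mathbb{N}$. By Proposition 4.9, there exist some $\varepsilon'$ and a drift function V such that $q'_i\delta_i \ge p'_i\delta_{i+1} + \varepsilon' V(i)$ for $i > N$. Proceeding as above, we have
\[ \frac{\delta_{i+1}}{V_i} \le \frac{q'_i}{p'_i} \frac{\delta_i}{V_i} - \frac{\varepsilon'}{p'_i}. \]
We need to find some $\varepsilon > 0$ such that
\[ \frac{q'_i}{p'_i} \frac{\delta_i}{V_i} - \frac{\varepsilon'}{p'_i} \le \frac{q_i}{p_i} \frac{\delta_i}{V_i} - \frac{\varepsilon}{p_i} \Leftrightarrow \frac{\varepsilon'}{p'_i} \ge \frac{\varepsilon}{p_i} \Leftrightarrow \varepsilon \le \frac{p_i}{p'_i} \varepsilon'. \]
If we can show that $\liminf p_i/p'_i > 0$, then we can find a positive constant $\varepsilon$ satisfying the above inequality, and hence we can apply Proposition 4.9 with N, V, and $\varepsilon$ for X to show that X is geometrically ergodic. Note that
\[ \frac{p'_i}{p_i} - 1 = \frac{r_i}{p_i(\lambda_i + 1)} = \frac{r_i}{p_i(1 + q_i/p_i)} = \frac{r_i}{p_i + q_i}, \]
so
\[ \limsup r_i < 1 \Leftrightarrow \limsup \frac{r_i}{p_i + q_i} < \infty \Leftrightarrow \limsup \frac{p'_i}{p_i} < \infty \Leftrightarrow \liminf \frac{p_i}{p'_i} > 0. \]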
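The algebraic identities behind Corollary 4.23 ($p'_i + q'_i = 1$, $q'_i/p'_i = \lambda_i$, and $p'_i/p_i - 1 = r_i/(p_i + q_i)$) are easy to confirm with exact arithmetic. The sketch below uses a hypothetical triple $(p_i, q_i, r_i)$; the helper name `remove_holding` is introduced here for illustration only.

```python
from fractions import Fraction


def remove_holding(p, q):
    """Redistribute r = 1 - p - q onto p and q while keeping lambda = q/p fixed."""
    r = 1 - p - q
    lam = q / p
    p_new = p + r / (lam + 1)
    q_new = q + r * lam / (lam + 1)
    return p_new, q_new


# Hypothetical transition probabilities at a single state i > 1.
p_i, q_i = Fraction(1, 5), Fraction(3, 10)
r_i = 1 - p_i - q_i
p_new, q_new = remove_holding(p_i, q_i)

print(p_new, q_new)                      # new birth and death probabilities
print(q_new / p_new == q_i / p_i)        # the ratio lambda_i is preserved
print(p_new / p_i - 1 == r_i / (p_i + q_i))
```

The last identity is the one used at the end of the proof: $p'_i/p_i$ stays bounded exactly when $r_i/(p_i + q_i)$ does.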

Birth-death chains are special cases of random-walk-type Markov chains (see, e.g., Jarner and Tweedie, 2003). V is also unbounded for those chains.

Lemma 4.24. Consider a random-walk-type Markov chain on $\mathbb{R}^n$ and a sequence $x_n \in \mathbb{R}^n$ such that $x_n \to \infty$. If V is any drift function for this chain then $\liminf V(x_n) = \infty$.

Proof. From Jarner and Tweedie (2003), a small set of a random-walk-type Markov

chain on Rn is bounded. {V < c} is petite so it is bounded. If lim inf V (xn) < ∞ then

there exists a subsequence {xnk} of {xn} such that c = limV (xnk) < ∞ and hence

xnk ∈ {V < c+ 1}. This is a contradiction since {xnk} is unbounded.


Finally, we mention some studies related to the Markov chains in this chapter. Jarner and Hansen (2000) and Bramson (2008) studied the relationship between petite sets and bounded sets. Theorem 2.2 in Jarner and Tweedie (2003) uses a drift condition to show that a geometric stationary tail is a necessary condition for the geometric ergodicity of a random-walk-type Markov chain. Mao et al. (2012) studied GI/G/1-type Markov chains, which are random-walk-type Markov chains when the phase process is stochastic. Theorem 3.1 in Mao et al. (2012) shows that a geometric stationary tail is also a sufficient condition for geometric ergodicity. A birth-death chain is a GI/G/1-type Markov chain when the $p_i$'s and $q_i$'s do not depend on i. Kovchegov (2010) calculates an upper bound for the total variation distance of geometrically ergodic birth-death chains when the $p_i$'s and $q_i$'s do not depend on i.


APPENDIX A
SOME LEMMAS AND EXAMPLES

Recall the Hilbert spaces $L^2(\pi)$ and $L^2_0(\pi)$ from Chapter 1. Note that the dimension of $L^2_0(\pi)$ is 1 less than that of $L^2(\pi)$.

Lemma A.1. Let A be a linear operator from $L^2(\pi)$ to itself. Suppose that A restricted to $L^2_0(\pi)$ is a linear operator from $L^2_0(\pi)$ to itself. Then A is compact in $L^2(\pi)$ if and only if A is compact in $L^2_0(\pi)$.

Proof. Suppose that A is compact in $L^2(\pi)$. Let $\{h^{(n)}\}$ be a bounded sequence in $L^2_0(\pi)$; then $\{h^{(n)}\}$ is bounded in $L^2(\pi)$. Because A is compact in $L^2(\pi)$, there exists a subsequence $\{h^{(n')}\}$ of $\{h^{(n)}\}$ such that $\{Ah^{(n')}\}$ converges to a function g in $L^2(\pi)$. Because $Ah^{(n')} \in L^2_0(\pi)$ and $L^2_0(\pi)$ is closed, we have $g \in L^2_0(\pi)$. We have found a subsequence $\{h^{(n')}\}$ of $\{h^{(n)}\}$ such that $\{Ah^{(n')}\}$ converges to a function in $L^2_0(\pi)$, so A is compact in $L^2_0(\pi)$.

Now suppose that A is compact in $L^2_0(\pi)$. Let $\{h^{(n)}\}$ be a bounded sequence in $L^2(\pi)$. Suppose that $\|h^{(n)}\| < M$. By Jensen's inequality we have $|\pi h^{(n)}|^2 \le \pi (h^{(n)})^2 = \|h^{(n)}\|^2 < M^2$, so the sequence $\{\pi h^{(n)}\}$ is bounded in $\mathbb{R}$. Since every bounded sequence in $\mathbb{R}$ has a convergent subsequence, there exists a subsequence $\{h^{(n')}\}$ of $\{h^{(n)}\}$ such that $\{\pi h^{(n')}\}$ converges in $\mathbb{R}$. Given a constant c, we also denote by c the function $c(x) = c$. Since $A(\pi h^{(n')}) = \pi h^{(n')} A(1)$, the sequence $\{A(\pi h^{(n')})\}$ converges in $L^2(\pi)$. Because $\mathrm{Var}\, h^{(n')} \le E(h^{(n')})^2$, we have $\|h^{(n')} - \pi h^{(n')}\| \le \|h^{(n')}\| < M$. Note that $\pi(h^{(n')} - \pi h^{(n')}) = 0$, so $\{h^{(n')} - \pi h^{(n')}\}$ is a bounded sequence in $L^2_0(\pi)$. A is compact in $L^2_0(\pi)$, so there is a subsequence $\{h^{(n'')}\}$ of $\{h^{(n')}\}$ such that $\{A(h^{(n'')} - \pi h^{(n'')})\}$ converges in $L^2_0(\pi)$. Since $Ah^{(n'')} = A(h^{(n'')} - \pi h^{(n'')}) + A(\pi h^{(n'')})$, the sequence $\{Ah^{(n'')}\}$ converges in $L^2(\pi)$. We have found a subsequence $\{h^{(n'')}\}$ of $\{h^{(n)}\}$ such that $\{Ah^{(n'')}\}$ converges in $L^2(\pi)$, so A is compact in $L^2(\pi)$.

Lemma A.2. In Section 3.2.2, $\pi(\theta, \mu, \lambda | y)$ is proper if and only if
\[ a < 0, \quad a + \frac{K}{2} > \frac{1}{2}, \quad \text{and} \quad a + b > \frac{1 - N}{2}. \]

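Proof. Tan and Hobert (2009) gave conditions under which the posterior is proper for a similar model with the reparameterization $\lambda_\theta^{-1} = \sigma_\theta^2$ and $\lambda_e^{-1} = \sigma_e^2$. We have
\[ y_{ij} \mid \theta, \mu, \sigma_\theta^2, \sigma_e^2 \sim N(\theta_i, \sigma_e^2), \quad i = 1, 2, \ldots, K, \quad j = 1, 2, \ldots, m_i, \]
\[ \theta \mid \mu, \sigma_\theta^2, \sigma_e^2 \sim N(1\mu, I\sigma_\theta^2), \]
\[ f(\sigma_\theta^2, \sigma_e^2, \mu) \propto (\sigma_\theta^2)^{-(a-1)} (\sigma_e^2)^{-(b-1)} \left| \frac{d\lambda_\theta}{d\sigma_\theta^2} \right| \left| \frac{d\lambda_e}{d\sigma_e^2} \right| = (\sigma_\theta^2)^{-(a+1)} (\sigma_e^2)^{-(b+1)}, \quad \sigma_\theta, \sigma_e > 0. \]
To simplify the argument, we consider $X | Y \sim f_{X|Y}(x|y)$, $Y \sim f_Y(y)$, and the reparameterization $Z = g(Y)$ for some $f_{X|Y}$, $f_Y$, and g. Suppose that g is one-to-one and that $dg^{-1}/dz$ exists. Consider $y_0$ and $z_0 = g(y_0)$. We have
\[ f_{X|Z}(x|z_0) = f_{X|Y}(x|g^{-1}(z_0)) = f_{X|Y}(x|y_0), \]
and
\[ f_Z(z_0) = f_Y(g^{-1}(z_0)) \frac{dg^{-1}(z_0)}{dz_0} = f_Y(y_0) \frac{dg^{-1}(z_0)}{dz_0}. \]
So
\[
\begin{aligned}
\int f_{X,Z}(x, z)\, dz &= \int f_{X|Z}(x|z) f_Z(z)\, dz \\
&= \int f_{X|Y}(x|y) f_Y(y) \frac{dg^{-1}(z)}{dz}\, dz \\
&= \int f_{X|Y}(x|y) f_Y(y)\, dg^{-1}(z) \\
&= \int f_{X|Y}(x|y) f_Y(y)\, dy \\
&= \int f_{X,Y}(x, y)\, dy.
\end{aligned}
\]
That means the two integrals are either both finite or both infinite. By Hobert and Casella (1996) and Tan and Hobert (2009), we obtain the result.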

Example A.3. We draw sets which are similar to D and $S_b$ in $\mathbb{R}^2$ and $\mathbb{R}^3$ in Section 3.3. In $\mathbb{R}^2$, denote by (x, y) a point in a regular coordinate system and by $(r, \theta)$ a point in the corresponding polar coordinate system. If x = y then $\theta = \pi/4$. Figure A-1 is the graph of the line y = x for $0 < x < \delta$, the region
\[ D = \Big\{ (x, y) \in (0, \delta)^2 : \frac{1}{2} < \frac{x}{y} < 2 \Big\}, \]
and the curve $\{ (r, \theta) : r = \delta, |\theta - \frac{\pi}{4}| < \frac{1}{4} \}$.

Figure A-1. Graph in $\mathbb{R}^2$

Note that for any point A in the set
\[ S_4 = \Big\{ (r, \theta) : r < \delta, \Big| \theta - \frac{\pi}{4} \Big| < \frac{1}{4} \Big\}, \]
the line which connects (0, 0) and A cuts the above curve.

In $\mathbb{R}^3$, denote by (x, y, z) a point in a regular coordinate system and by $(r, \theta, \varphi)$ a point in the corresponding polar coordinate system. If x = y = z then $\theta = \arctan(\sqrt{2})$ and $\varphi = \pi/4$. Figure A-2 shows the graphs of the line x = y = z for $0 < x < \delta$, the region
\[ D = \Big\{ (x, y, z) \in (0, \delta)^3 : \frac{1}{2} < \frac{x}{z}, \frac{y}{z}, \frac{x}{y} < 2 \Big\}, \]
and the surface $\{ (r, \theta, \varphi) : r = \delta, |\theta - \arctan(\sqrt{2})| < \frac{1}{5}, |\varphi - \frac{\pi}{4}| < \frac{1}{5} \}$.

Figure A-2. Graph in $\mathbb{R}^3$

Lemma A.4. For a positive sequence {xi}∞i=1, inf xi > 0 if and only if lim inf xi > 0.

Proof. Suppose that inf xi > 0. There exists ε > 0 such that xi > ε for all i. So

lim inf xi ≥ ε > 0.

Conversely, suppose that lim inf xi > 0. There exists N ∈ N and ε > 0 such

that xi ≥ ε for i > N , therefore infi>N xi ≥ ε. Because x is a positive sequence,

mini≤N xi > 0. Thus, inf xi > 0.

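Given positive sequences $\{p_i\}_{i=1}^{\infty}$ and $\{q_i\}_{i=1}^{\infty}$, recall that we define the $c_i$'s as in (4–4):
\[ c_1 = 1, \quad c_i = \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_i}, \quad i = 2, 3, \ldots \]
The next lemma shows that we can change the range of the indices of the $q_i$'s when $\liminf q_i > 0$.

Lemma A.5. If $\liminf q_i > 0$ then
\[
\begin{aligned}
\sum_{i=1}^{\infty} c_i < \infty &\Leftrightarrow \sum_{i-k \ge 2} \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_{i-k}} < \infty, \quad k \ge 0 \\
&\Leftrightarrow \sum_{i+l \ge 2} \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_{i+l}} < \infty, \quad l > 0.
\end{aligned}
\]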

Proof. We only prove the statement for k because the proof for l is quite similar. Denote
\[ d_{ki} = \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_{i-k}}, \quad i - 2 \ge k, \]
then $d_{ki} = c_i (q_{i-k+1} \cdots q_i) \le c_i$. Since $\liminf q_i > 0$, there exist $N \ge 2$ and $\varepsilon > 0$ such that $q_i > \varepsilon$ for $i > N$. For $i - k + 1 > N$ (which implies $i - 2 \ge k$), we have $c_i \ge d_{ki} = c_i (q_{i-k+1} \cdots q_i) \ge c_i \varepsilon^k$. So $\sum_{i-k+1 > N} c_i$ converges if and only if $\sum_{i-k+1 > N} d_{ki}$ converges. Thus, $\sum_{i=1}^{\infty} c_i$ converges if and only if $\sum_{i-k \ge 2} d_{ki}$ converges.

APPENDIX B
CHI-SQUARE DISTANCE

Definition B.1. Let $\Lambda$ and $\Pi$ be two σ-finite measures on a measure space (X, F). The chi-square distance between $\Lambda$ and $\Pi$ (also called the chi-square divergence of $\Pi$ from $\Lambda$) is defined by (see, e.g., Roberts and Rosenthal, 1997, p.16)
\[
\chi^2(\Lambda, \Pi) = \begin{cases}
\int_X \big( \frac{d\Lambda}{d\Pi} - 1 \big)^2 d\Pi, & \Lambda \ll \Pi, \\
\infty, & \text{otherwise},
\end{cases}
\]
where $\ll$ denotes absolute continuity.

Note that this definition only depends on $\Lambda$ and $\Pi$. Suppose that $\Lambda \ll \mu$ and $\Pi \ll \mu$ for some σ-finite measure $\mu$ on (X, F). (For example, those conditions hold when $\mu = \Lambda + \Pi$.) Denote $d\Pi = \pi\, d\mu$ and $d\Lambda = \lambda\, d\mu$. The next lemma gives another form for the chi-square distance.

Lemma B.2.
\[ \chi^2(\Lambda, \Pi) = \int_X \frac{(\lambda - \pi)^2}{\pi}\, d\mu. \]
Therefore, the right hand side of the above formula does not depend on $\mu$.

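Proof. We start with the case $\Lambda \ll \Pi$. Because $\Lambda \ll \Pi$ and $\Pi \ll \mu$, we have (see Halmos, 1950, p.133)
\[ \lambda = \frac{d\Lambda}{d\mu} = \frac{d\Lambda}{d\Pi} \frac{d\Pi}{d\mu} = \frac{d\Lambda}{d\Pi}\, \pi \quad \mu\text{-a.e.} \]
So
\[ \frac{\lambda}{\pi} = \frac{d\Lambda}{d\Pi} \quad \mu\text{-a.e.} \]
We can suppose that $\frac{d\Lambda}{d\Pi}$ is finite (see Halmos, 1950, p.128). If $\pi(x) = 0$ then $\lambda(x)/\pi(x)$ is infinite when $\lambda(x) \ne 0$ and has no meaning when $\lambda(x) = 0$. From measure theory, those cases are not important because they happen on a set with measure $\mu$ equal to 0. So
\[ \int_X \Big( \frac{d\Lambda}{d\Pi} - 1 \Big)^2 d\Pi = \int_X \Big( \frac{\lambda}{\pi} - 1 \Big)^2 \pi\, d\mu = \int_X \frac{(\lambda - \pi)^2}{\pi}\, d\mu. \]

Now suppose that $\Lambda$ is not absolutely continuous with respect to $\Pi$. We need to prove that
\[ \int_X \frac{(\lambda - \pi)^2}{\pi}\, d\mu = \infty. \]
By the Lebesgue decomposition (see Halmos, 1950, p.134), $\Lambda = \nu + \sigma$ where $\nu \perp \Pi$ and $\sigma \ll \Pi$. Since $\nu \perp \Pi$, there exists a set A such that $\Pi(A) = 0$ and $\nu(X \setminus A) = 0$. Because $\Lambda$ is not absolutely continuous with respect to $\Pi$, $\nu$ is not trivial and hence $\nu(A) > 0$. We have
\[ \int_X \frac{(\lambda - \pi)^2}{\pi}\, d\mu \ge \int_A \frac{(\lambda - \pi)^2}{\pi}\, d\mu. \]
$\Pi(A) = 0$, so we can select $\pi$ such that $\pi(x) = 0$ for $x \in A$. $\sigma \ll \Pi$ and $\Pi \ll \mu$ imply $\sigma \ll \mu$. Since $\Lambda \ll \mu$, $\nu = \Lambda - \sigma \ll \mu$. Combining that with $\Lambda \ge \nu$, we have $\lambda \ge \frac{d\nu}{d\mu}$. Because $\pi(x) = 0$ on A, we get
\[ \int_A \frac{(\lambda - \pi)^2}{\pi}\, d\mu = \int_A \frac{\lambda^2}{\pi}\, d\mu \ge \int_A \frac{(d\nu/d\mu)^2}{\pi}\, d\mu. \]
Because $\nu(A) > 0$, $\frac{d\nu}{d\mu} > 0$ on some set $B \subset A$ such that $\mu(B) > 0$. Since $\pi(x) = 0$ on A,
\[ \int_A \frac{(d\nu/d\mu)^2}{\pi}\, d\mu = \infty. \]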

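Chi-square distance is not symmetric, because $\chi^2(\Lambda, \Pi) \ne \chi^2(\Pi, \Lambda)$ in most cases. If $\Lambda$ and $\Pi$ are probability measures, then
\[
\begin{aligned}
\chi^2(\Lambda, \Pi) &= \int_X \Big[ \Big( \frac{d\Lambda}{d\Pi} \Big)^2 - 2 \frac{d\Lambda}{d\Pi} + 1 \Big] d\Pi \\
&= \int_X \Big( \frac{d\Lambda}{d\Pi} \Big)^2 d\Pi - 2 + 1 = \int_X \Big( \frac{d\Lambda}{d\Pi} \Big)^2 d\Pi - 1.
\end{aligned}
\tag{B--1}
\]
We also denote by $L^2(\Pi)$ all σ-finite signed measures $\Lambda$ on (X, F) such that $d\Lambda/d\Pi \in L^2(\Pi)$ (see, e.g., Roberts and Rosenthal, 1997, p.16). From (B–1), when $\Lambda$ and $\Pi$ are probability measures, $\chi^2(\Lambda, \Pi) < \infty$ if and only if $\Lambda \in L^2(\Pi)$.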

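Lemma B.3. (see, e.g., Diaconis et al., 2008, p.155)
\[ 4 \|\Lambda - \Pi\|^2 \le \chi^2(\Lambda, \Pi), \]
where the norm on the left hand side is the total variation norm.

Proof. Note that if $d\Lambda = \lambda\, d\mu$ and $\mu$ is positive then $\|\Lambda\| = \int |\lambda|\, d\mu$. Writing the total variation distance as an $L^1$ distance, we have
\[ \|\Lambda - \Pi\| = \frac{1}{2} \int_X |\lambda - \pi|\, d\mu = \frac{1}{2} \int_X \frac{|\lambda - \pi|}{\pi}\, d\Pi. \tag{B--2} \]
By Jensen's inequality,
\[ 4 \|\Lambda - \Pi\|^2 = \Big( \int_X \Big| \frac{\lambda - \pi}{\pi} \Big|\, d\Pi \Big)^2 \le \chi^2(\Lambda, \Pi). \]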
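On a finite state space with counting measure as $\mu$, the densities $\lambda$ and $\pi$ are probability mass functions, and Lemma B.2 and Lemma B.3 can be checked directly. The sketch below is an illustration under that assumption; the two distributions are made up for the example.

```python
def chi_square(lam, pi):
    """Chi-square distance via the density form sum (lam - pi)^2 / pi.

    Returns infinity when lam puts mass where pi has none, i.e. when lam is
    not absolutely continuous with respect to pi.
    """
    if any(l > 0 and w == 0 for l, w in zip(lam, pi)):
        return float("inf")
    return sum((l - w) ** 2 / w for l, w in zip(lam, pi) if w > 0)


def total_variation(lam, pi):
    """Total variation distance written as half the L1 distance, as in (B-2)."""
    return 0.5 * sum(abs(l - w) for l, w in zip(lam, pi))


# Hypothetical probability mass functions on three points.
lam = [0.7, 0.2, 0.1]
pi = [0.2, 0.3, 0.5]

tv = total_variation(lam, pi)
print("4 * TV^2      =", 4 * tv ** 2)
print("chi2(lam, pi) =", chi_square(lam, pi))
print("chi2(pi, lam) =", chi_square(pi, lam))
```

The last two printed values differ, which illustrates the lack of symmetry noted before the lemma.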

Let $\Phi = \{X_i\}_{i=0}^{\infty}$ be an irreducible Markov chain on a Borel space (X, F) with some Markov transition kernel P, stationary distribution $\Pi$, and initial distribution $P_0(\cdot)$. Denote by $P_n$ the distribution of $X_n$ for all $n \ge 0$. Denote by $P^*$ the kernel of the backward chain. The chain is called $L^2(\Pi)$-geometrically ergodic (see, e.g., Roberts and Tweedie, 2001, p.39) if there exist a function $M(P_0) < \infty$ and a constant $0 < r < 1$ such that
\[ \chi^2(P_n, \Pi) \le M(P_0)\, r^n, \quad \forall P_0 \in L^2(\Pi), \ \forall n \in \mathbb{N}. \]

Lemma B.4. (Liu et al., 1995, p.162) Suppose that $P(x, \cdot)$ is absolutely continuous with respect to some measure $\mu$ for all x and $\chi(P_k, \Pi) < \infty$ for some $k \ge 0$. Then
\[ \chi(P_n, \Pi) \le \chi(P_k, \Pi)\, \|P\|^{n-k}, \quad \forall n \ge k. \]
Furthermore, if $\|P\| < 1$ then the chain is $L^2(\Pi)$-geometrically ergodic.

Proof. We can suppose that k = 0 without loss of generality. Given a measurable set A, suppose that $\mu(A) = 0$. This implies $P(x, A) = 0$ for all x. Because $\Pi = \Pi P$, we have
\[ \Pi(A) = \int_X \Pi(dx) P(x, A) = 0 \]
and hence $\Pi \ll \mu$. Denote $\pi(x) = d\Pi/d\mu$. Note that $P^n = P P^{n-1}$ and $P_n = P_0 P^n$, so we can use the same technique to show that $P^n(x, \cdot) \ll \mu$ and $P_n \ll \mu$ for all $n \ge 1$. $\chi(P_0, \Pi) < \infty$ implies $P_0 \ll \Pi$. Combining this with $\Pi \ll \mu$, we get $P_0 \ll \mu$. Finally, we can denote $dP^n(x, \cdot)/d\mu = k_n(x, \cdot)$ for all $n \ge 1$ and $dP_n/d\mu = p_n$ for all $n \ge 0$.

We have
\[
\begin{aligned}
p_n(y) - \pi(y) &= \int_X k_n(x, y) p_0(x)\, \mu(dx) - \int_X k_n(x, y) \pi(x)\, \mu(dx) \\
&= \int_X k_n(x, y) [p_0(x) - \pi(x)]\, \mu(dx)
\end{aligned}
\]
because all of the above integrals are finite. Denote $g = [p_0 - \pi]/\pi$ and $h = [p_n - \pi]/\pi$. $\chi(P_0, \Pi) < \infty$ implies $g \in L^2(\Pi)$. Denote $dP^{*n}(y, \cdot)/d\mu = k^*_n(y, x)$, then
\[ \pi(x) k_n(x, y) = \pi(y) k^*_n(y, x). \]
We have
\[
\begin{aligned}
h(y) &= \frac{1}{\pi(y)} [p_n(y) - \pi(y)] = \int_X \frac{k_n(x, y)}{\pi(y)} [p_0(x) - \pi(x)]\, \mu(dx) \\
&= \int_X k^*_n(y, x) \frac{p_0(x) - \pi(x)}{\pi(x)}\, \mu(dx) = \int_X k^*_n(y, x) g(x)\, \mu(dx) = P^{*n} g(y),
\end{aligned}
\]
or $h = P^{*n} g$. Since $P^*$ is an operator from $L^2(\Pi)$ to $L^2(\Pi)$, $P^{*n}$ is also an operator from $L^2(\Pi)$ to $L^2(\Pi)$. $g \in L^2(\Pi)$ implies $h \in L^2(\Pi)$. We have
\[ \chi^2(P_n, \Pi) = \|h\|^2 = \|P^{*n} g\|^2. \]
Because $\|P^*\| = \|P\|$,
\[ \chi(P_n, \Pi) = \|P^{*n} g\| \le \|P^{*n}\| \|g\| \le \|P^*\|^n \|g\| = \chi(P_0, \Pi) \|P\|^n. \]
From (B–1), $\chi^2(P_0, \Pi) < \infty$ if and only if $P_0 \in L^2(\Pi)$. If $\|P\| < 1$, the chain is $L^2(\Pi)$-geometrically ergodic.
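Remark B.5. We need $g, h \in L^2(\Pi)$ to have $\chi^2(P_n, \Pi) = \langle P^n h, g \rangle$ and
\[ \|h\|^2 = \langle P^n h, g \rangle \Rightarrow \|h\|^2 \le \|P^n\| \|h\| \|g\| \le \|P\|^n \|h\| \|g\| \Rightarrow \|h\| \le \|P\|^n \|g\|. \]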
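The bound of Lemma B.4 can be observed on a two-state chain, where everything is explicit. The sketch below is an illustration under stated assumptions: the transition probabilities a and b are made up, $\mu$ is counting measure, and $\|P\|$ is taken to be the norm of P on $L^2_0(\Pi)$, which for this reversible two-state chain equals $|1 - a - b|$.

```python
from math import sqrt

# Hypothetical two-state chain: 0 -> 1 with prob a, 1 -> 0 with prob b.
a, b = 0.3, 0.2
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]   # stationary distribution
rho = abs(1 - a - b)              # assumed value of ||P|| on L^2_0(Pi)


def step(dist):
    """One step of the chain: the row vector dist multiplied by P."""
    return [sum(dist[x] * P[x][y] for x in range(2)) for y in range(2)]


def chi(dist):
    """Square root of the chi-square distance between dist and pi."""
    return sqrt(sum((d - w) ** 2 / w for d, w in zip(dist, pi)))


dist = [1.0, 0.0]                 # start in state 0
chi0 = chi(dist)
history = []
for n in range(11):
    history.append(chi(dist))
    dist = step(dist)

for n, value in enumerate(history):
    print(n, value, chi0 * rho ** n)
```

For a two-state chain the bound is attained, so the two printed columns agree up to rounding. Starting from a point mass is harmless here because $\Pi$ charges every state; Example B.6 describes why this fails on a continuous state space.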

Example B.6. Suppose that $\Phi$ starts at some point $x_0$, so $P_0 = \delta_{x_0}$ (Dirac measure), and that $P(x, \cdot) \ll \lambda$ where $\lambda$ is the Lebesgue measure on $\mathbb{R}$. Since $\lambda(\{x_0\}) = 0$ and $\delta_{x_0}(\{x_0\}) = 1 \ne 0$, $\delta_{x_0}$ is not absolutely continuous with respect to $\lambda$. By the definition of chi-square distance, we have $\chi(P_0, \Pi) = \infty$, so we cannot use Lemma B.4 here. $P_1 = P(x_0, \cdot)$ implies $P_1 \ll \lambda$. If we have $\chi^2(P_1, \Pi) < \infty$, we can use Lemma B.4 with k = 1.

Without the condition χ(Pk,Π) < ∞ for some constant k ≥ 1, we still can show that

the chain is geometrically ergodic for almost all starting points.

Proposition B.7. Suppose that P (x, ·) is absolutely continuous with respect to some

measure µ for all x, and Φ is ϕ-irreducible. If ‖P‖ < 1 then Φ is Π-a.e. geometrically

ergodic.

Proof. By Lemma B.4, Φ is L2(Π)-geometrically ergodic. By Theorem 1 in Roberts and

Tweedie (2001), Φ is Π-a.e. geometrically ergodic.

Suppose that $P_0$ and $P(x, \cdot)$ are absolutely continuous with respect to some measure $\mu$ for all x. We now derive a formula for the chi-square distance in the reversible case which is similar to (2.7) in Diaconis et al. (2008, p.156). $P = P^*$ implies
\[ \chi^2(P_n, \Pi) = \|P^n g\|^2 = \langle P^n g, P^n g \rangle = \langle P^{2n} g, g \rangle. \]
Denote by $\sigma(P)$ the spectrum of P. By the spectral theorem (see Conway, 1990, p.263), there exists a spectral measure E on $\sigma(P)$ such that
\[ P = \int_{\sigma(P)} \lambda\, dE(\lambda). \]
Let $\varphi(\lambda) = \lambda^{2n}$. Using results in Conway (1990, p.264), we have the following lemma.

Lemma B.8. Suppose that $P_0$ and $P(x, \cdot)$ are absolutely continuous with respect to some measure $\mu$ for all x. If $\Phi$ is reversible then
\[ \chi^2(P_n, \Pi) = \langle P^{2n} g, g \rangle = \int_{\sigma(P)} \lambda^{2n}\, dE_{g,g}, \tag{B--3} \]
where $E_{g,g}(\cdot) = \langle E(\cdot) g, g \rangle$ is a nonnegative measure on $\sigma(P)$ (Conway, 1990, p.257).

When P is compact, $\sigma(P)$ is discrete and hence the integral in (B–3) is equal to a sum which is similar to (2.7) in Diaconis et al. (2008, p.156).

The next lemma is analogous to Proposition 13.3.2 in Meyn and Tweedie (2009).

Lemma B.9. Suppose that $P_0$ and $P(x, \cdot)$ are absolutely continuous with respect to some measure $\mu$ for all x, and the chain is reversible. Then the chi-square distance $\chi^2(P_n, \Pi)$ is non-increasing in n.

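Proof. If $m \ge n$ then $\lambda^{2m} \le \lambda^{2n}$ for $|\lambda| \le 1$. Because $E_{g,g}$ is a nonnegative measure, we have
\[ \chi^2(P_n, \Pi) = \int_{\sigma(P)} \lambda^{2n}\, dE_{g,g} \ge \int_{\sigma(P)} \lambda^{2m}\, dE_{g,g} = \chi^2(P_m, \Pi). \]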
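The monotonicity in Lemma B.9 can be seen on a small reversible chain. The sketch below assumes a hypothetical three-state birth-death chain, chosen so that detailed balance holds with respect to $\pi = (1/4, 1/2, 1/4)$, and computes $\chi^2(P_n, \Pi)$ exactly for the first few n, with counting measure as $\mu$.

```python
from fractions import Fraction as F

# Hypothetical reversible chain on {0, 1, 2}.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]


def step(dist):
    """One step of the chain: the row vector dist multiplied by P."""
    return [sum(dist[x] * P[x][y] for x in range(3)) for y in range(3)]


def chi_square(dist):
    return sum((d - w) ** 2 / w for d, w in zip(dist, pi))


dist = [F(1), F(0), F(0)]         # start in state 0
values = []
for n in range(8):
    values.append(chi_square(dist))
    dist = step(dist)

print([str(v) for v in values])
```

The printed sequence is non-increasing, as the lemma predicts for a reversible chain.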


APPENDIX C
F-GEOMETRIC ERGODICITY

We review the drift condition method for general state space and discrete state space Markov chains. We state the results more completely than the existing references do. Let $\Phi = \{X_k\}_{k=0}^{\infty}$ be a ψ-irreducible Markov chain on a countably generated

measure space (X,F) with a Markov transition kernel P (x, ·), and let F+ = {A ∈ F :

ψ(A) > 0} (see Meyn and Tweedie, 2009, p.91). We denote Meyn and Tweedie (2009)

by MT throughout this section. Given A ∈ F , the first return time on A is defined by

τA = min{k ≥ 1 : Xk ∈ A} (see MT, p.71). A ∈ F is full if ψ(Ac) = 0 where Ac is the

complement of A, and A is absorbing if P (x,A) = 1 for all x ∈ A (see MT, p.91).

Consider an F-measurable function $f : X \to [1, \infty]$. For any constant $r > 1$, denote (see MT, p.368 and p.372)
\[ R_A^{(r)}(x, f) = E_x \Big[ \sum_{k=0}^{\tau_A - 1} f(X_k) r^k \Big], \quad \forall x \in X, \ \forall A \in F. \]
A set $A \in F$ is called f-geometrically regular if for each $B \in F^+$ there exists $r = r(f, B) > 1$ such that (see MT, p.368)
\[ \sup_{x \in A} R_B^{(r)}(x, f) = \sup_{x \in A} E_x \Big[ \sum_{k=0}^{\tau_B - 1} f(X_k) r^k \Big] < \infty. \]
Given any signed measure $\mu$ on (X, F), the f-norm of $\mu$ is defined by $\|\mu\|_f = \sup_{g : |g| \le f} |\mu g|$ where $\mu g = \int_X g\, d\mu$ (see MT, p.334). Suppose that $\Phi$ has a stationary distribution $\pi$. $\Phi$ is called f-geometrically ergodic if there exists a constant $r > 1$ such that (see MT, p.359)
\[ \sum_{n=1}^{\infty} r^n \|P^n(x, \cdot) - \pi\|_f < \infty, \quad \forall x \in X. \]
$\Phi$ is called π-f-geometrically ergodic if there exists a constant $r > 1$ such that
\[ \sum_{n=1}^{\infty} r^n \|P^n(x, \cdot) - \pi\|_f < \infty, \quad \pi\text{-a.s.} \]
For f = 1, we use the terms π-geometric ergodicity (geometric ergodicity, resp.) instead of π-1-geometric ergodicity (1-geometric ergodicity, resp.). Denote by $1_A$ the function which is equal to 1 on A and equal to 0 otherwise.

Theorem 15.0.1 in MT gives sufficient conditions for π-geometric ergodicity. Theorem 6.14 in Nummelin (1984) gives necessary and sufficient conditions for π-geometric ergodicity and involves no drift condition. The next theorem extends both of those theorems from π-geometric ergodicity to π-f-geometric ergodicity, and π-f-geometric ergodicity is stated as a necessary and sufficient condition. Note that many theorems in Chapter 15 in MT are about π-f-geometric ergodicity, so many proofs in MT can still be used to prove parts of the next theorem. To make this clearer, we note that the next theorem is analogous to Theorem 14.0.1 in MT. In Theorem 14.0.1 in MT, we first fix a function f, and then we want to know whether or not the Markov chain is π-f-ergodic. In the next theorem, we also fix a function f first, and find necessary and sufficient conditions for π-f-geometric ergodicity.

Theorem C.1. (MT, Theorem 15.0.1) Suppose that $\Phi$ is ψ-irreducible, aperiodic, and recurrent. Given an F-measurable function $f : X \to [1, \infty)$, the following five conditions are equivalent:

(i) $\Phi$ is positive with a stationary distribution $\pi$ such that $\pi f < \infty$, and there exist some petite set $A \in F^+$ and constants $M < \infty$, $r > 1$, $c_1$, and $c_2$ such that for all $x \in A$ and $n \ge 0$,
\[ r^n |P^n f(x) - c_1| < M \tag{C--1} \]
and
\[ r^n |P^n(x, A) - c_2| < M. \tag{C--2} \]

(ii) There exist some petite set $C \in F$ and a constant $\kappa > 1$ such that
\[ \sup_{x \in C} R_C^{(\kappa)}(x, f) = \sup_{x \in C} E_x \Big[ \sum_{k=0}^{\tau_C - 1} f(X_k) \kappa^k \Big] < \infty. \tag{C--3} \]

(iii) There exist some petite set B, constants $r > 1$, $b < \infty$, and a measurable function $V : X \to [1, \infty]$, which is finite at some $x_0 \in X$ and satisfies $V \ge f$, such that
\[ PV \le r^{-1} V + b 1_B. \tag{C--4} \]

(iv) There exists an absorbing and full subset of X which is a subset of a union of a countable number of f-geometrically regular sets.

(v) $\Phi$ is π-f-geometrically ergodic, i.e. there exist a stationary distribution $\pi$ and some constant $s > 1$ such that
\[ \sum_{n=1}^{\infty} s^n \|P^n(x, \cdot) - \pi\|_f < \infty, \quad \pi\text{-a.s.} \tag{C--5} \]

Furthermore, any of these five conditions implies that $S_V = \{x : V(x) < \infty\}$ is full and absorbing, where V is any function such that condition (iii) holds, that $\pi V < \infty$, that there exists a constant $a < \infty$ such that
\[ \sum_{n=1}^{\infty} s^n \|P^n(x, \cdot) - \pi\|_f \le a V(x), \quad x \in S_V, \]
where s is any constant such that condition (v) holds, and that there also exist constants s and $a < \infty$ such that
\[ \sum_{n=1}^{\infty} s^n \|P^n(x, \cdot) - \pi\|_V \le a V(x), \quad x \in S_V. \]
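To make the drift condition (C–4) concrete, the sketch below checks it for a hypothetical birth-death chain on {1, 2, ...} with constant $p_i = 1/3$ and $q_i = 2/3$ for $i \ge 2$, and holding probability $r_1 = 2/3$ at state 1. The candidate drift function $V(i) = \beta^i$ with $\beta = \sqrt{q/p}$, the set B = {1}, and the constant b = 1 are all choices made for this example. The check covers a finite range of states and uses floating point, so it is an illustration and not a proof.

```python
from math import sqrt

# Hypothetical chain: p_i = 1/3 for all i, q_i = 2/3 for i >= 2, r_1 = 2/3.
p, q = 1 / 3, 2 / 3
beta = sqrt(q / p)            # candidate V(i) = beta^i, so V >= 1
r_inv = 2 * sqrt(p * q)       # equals p*beta + q/beta, which is below 1
b = 1.0                       # constant b in (C-4), with B = {1}


def V(i):
    return beta ** i


def PV(i):
    """(PV)(i) = sum_j P(i, j) V(j) for the chain described above."""
    if i == 1:
        return q * V(1) + p * V(2)      # stay at 1 w.p. 2/3, move to 2 w.p. 1/3
    return q * V(i - 1) + p * V(i + 1)


def drift_holds(i):
    indicator = 1.0 if i == 1 else 0.0
    tolerance = 1e-9 * V(i)             # allow for floating-point rounding
    return PV(i) <= r_inv * V(i) + b * indicator + tolerance


print("r^{-1} =", r_inv)
print(all(drift_holds(i) for i in range(1, 61)))
```

For $i \ge 2$ the inequality holds with equality up to rounding, since $(PV)(i) = (p\beta + q/\beta) V(i)$; the constant b is only needed at the boundary state.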

Remark C.2. We compare some differences between Theorem C.1 and Theorem 15.0.1 in MT.

• In Theorem C.1, π-f-geometric ergodicity (Theorem C.1(v)) is equivalent to each of the four conditions Theorem C.1(i)-Theorem C.1(iv). But in Theorem 15.0.1 in MT, π-geometric ergodicity (formula (15.4)) is only stated as a necessary condition for each of the three conditions Theorem 15.0.1(i)-Theorem 15.0.1(iii). Theorem C.1 is analogous to Theorem 6.14 in Nummelin (1984). In Theorem 6.14 in Nummelin (1984), π-geometric ergodicity (Theorem 6.14(iii)) is equivalent to each of the two conditions Theorem 6.14(i) and Theorem 6.14(ii).

• The difference between Theorem 15.0.1 in MT and Theorem C.1 is analogous to that between Chapter 13 and Chapter 14 in MT. Theorem 15.0.1 in MT actually only mentions conditions for π-geometric ergodicity, but Theorem C.1 mentions conditions for π-f-geometric ergodicity. Note that each of the five equivalent conditions in Theorem C.1 involves f. To make this clearer, we now explain why Theorem 15.0.1 in MT is not enough for π-f-geometric ergodicity. Given an F-measurable function $f : X \to [1, \infty)$, we want to know whether or not $\Phi$ is π-f-geometrically ergodic. First, we try to apply Theorem 15.0.1 in MT. Suppose that (15.2) in MT, which is
\[ \sup_{x \in C} E_x [\kappa^{\tau_C}] < \infty, \]
holds. We can show that it is equivalent to a special case of (C–3) (set f = 1 in (C–3)). (15.2) in MT does not involve f anywhere so it should not work. Theorem 15.0.1 in MT only tells us that the chain is π-V-geometrically ergodic where V is defined by Theorem 15.2.4 in MT (we set f = 1 in Theorem 15.2.4)
\[ V(x) = E_x \Big[ \sum_{k=0}^{\sigma_C} 1_X(X_k) r^k \Big], \]
where $\sigma_C = \min\{k \ge 0 : X_k \in C\}$. There is no relationship between V and f, so Theorem 15.0.1 in MT does not tell us whether or not $\Phi$ is π-f-geometrically ergodic. We now apply Theorem C.1. Suppose that Theorem C.1(ii) holds. Theorem C.1 shows that we can find a drift function V such that $V \ge f$, and $\Phi$ is both π-V-geometrically ergodic and π-f-geometrically ergodic.

• Theorem C.1(i) with f = 1 is exactly Theorem 15.0.1(i) in MT. For f = 1, we have $P^n f(x) - \pi f = 1 - 1 = 0$, so $0 = r^n |P^n f(x) - c_1| < M$ (set $c_1 = \pi f = 1$) obviously holds.

• We do not use $\Delta V = PV - V$ to define the drift condition (C–4) because $PV - V$ may be $\infty - \infty$. For example, $\infty - \infty$ may happen on a nontrivial set in Lemma 15.2.4 in MT. We can use $\Delta V$ if we state that the drift condition (C–4) holds almost surely.

• We use different notations for the petite sets (A, B, and C) in Theorem C.1. For example, if Theorem C.1(iii) holds with petite set B, then we cannot in general select the petite set C = B in Theorem C.1(ii). The reason for selecting different notations for the rates r, κ, and s in Theorem C.1 is similar.

To prove Theorem C.1, we need some lemmas. Denote (see MT, (15.23))
\[ U_C^{(r)}(x, f) = E_x \Big[ \sum_{k=1}^{\tau_C} f(X_k) r^k \Big] \]
and (see MT, (15.29))
\[ G_C^{(r)}(x, f) = E_x \Big[ \sum_{k=0}^{\sigma_C} f(X_k) r^k \Big]. \]
If (C–3) holds for a set C then C is called an f-Kendall set of rate κ.

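Lemma C.3.

(i) Let $r > 1$. Then $E_x[r^{\tau_C}] < \infty \Leftrightarrow E_x[\sum_{n=1}^{\tau_C} r^n] < \infty \Leftrightarrow E_x[\sum_{n=0}^{\tau_C - 1} r^n] < \infty$.

(ii) If $\sup_C f(x) < \infty$ and $\sup_B R_C^{(r)}(x, f) < \infty$ then $\sup_B f(x) < \infty$, $\sup_B E_x(r^{\tau_C}) < \infty$, $\sup_B U_C^{(r)}(x, f) < \infty$, and $\sup_B G_C^{(r)}(x, f) < \infty$.

(iii) If C is f-Kendall of rate r, then $\sup_C f(x) < \infty$, $\sup_C E_x(r^{\tau_C}) < \infty$, $\sup_C U_C^{(r)}(x, f) < \infty$, and $\sup_C G_C^{(r)}(x, f) < \infty$.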

Proof.

(i) Because $E_x\big[\sum_{n=1}^{\tau_C} r^n\big] = \frac{r}{r-1} E_x(r^{\tau_C} - 1)$ and $E_x\big[\sum_{n=0}^{\tau_C - 1} r^n\big] = \frac{1}{r-1} E_x(r^{\tau_C} - 1)$, we have (i).

(ii) We have $\sup_C f(x),\ \sup_B R_C^{(r)}(x, f) < M < \infty$ for some constant M. $E_x[f(\Phi_0) r^0] = f(x)$, so $M > \sup_B f(x)$. Then $E_x[f(\Phi_{\tau_C}) r^{\tau_C}] \le M E_x(r^{\tau_C})$. Because f ≥ 1, we have $\infty > \sup_B R_C^{(r)}(x, f) \ge \sup_B E_x\big[\sum_{n=0}^{\tau_C - 1} r^n\big]$. By part (i), $\sup_B E_x(r^{\tau_C}) < \infty$, so $\sup_B E_x[f(\Phi_{\tau_C}) r^{\tau_C}] < \infty$. And then $\sup_B U_C^{(r)}(x, f) < \infty$. Finally, $G_C^{(r)}(x, f) \le f(x) + U_C^{(r)}(x, f)$, so $\sup_B G_C^{(r)}(x, f) < \infty$.

(iii) We apply part (ii) with B = C.
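The two geometric-sum identities used in part (i) can be checked pathwise, that is, for each fixed value of the hitting time. The following is a minimal sketch in exact rational arithmetic; the helper names and the choice r = 3/2 are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def sum_from_one(r, tau):
    """sum_{n=1}^{tau} r^n, computed term by term."""
    return sum(r ** n for n in range(1, tau + 1))

def sum_from_zero(r, tau):
    """sum_{n=0}^{tau-1} r^n, computed term by term."""
    return sum(r ** n for n in range(tau))

def identities_hold(r, tau):
    """Check both closed forms from the proof of Lemma C.3(i) for a fixed tau."""
    closed_one = r / (r - 1) * (r ** tau - 1)
    closed_zero = 1 / (r - 1) * (r ** tau - 1)
    return sum_from_one(r, tau) == closed_one and sum_from_zero(r, tau) == closed_zero

# Illustrative rate r > 1 (an assumption); tau ranges over possible hitting times.
r = Fraction(3, 2)
all_ok = all(identities_hold(r, tau) for tau in range(1, 30))
print(all_ok)
```

Since both sums are positive affine functions of $r^{\tau_C}$ for every value of $\tau_C$, taking expectations shows that the three quantities in part (i) are finite together.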

Lemma C.4. (MT, Lemma 15.2.3 and Theorem 15.2.4)

(i) If 0 ≤ f(x) < ∞ then at x

$$P G_C^{(r)} = r^{-1} G_C^{(r)} - r^{-1} I + r^{-1} I_C U_C^{(r)}.$$

(ii) Given 1 ≤ f < ∞, then $V_C(x) = G_C^{(r)}(x, f)$ satisfies

$$P V_C(x) = \begin{cases} r^{-1} V_C(x) - r^{-1} f(x) + r^{-1} U_C^{(r)}(x, f), & x \in C, \\ r^{-1} V_C(x) - r^{-1} f(x), & x \notin C. \end{cases}$$

$V_C(x) = f(x)$ for x ∈ C and $V_C \ge f$.

If C is an f-Kendall set of rate r, then $V_C$ is a solution to (C–4) and bounded on C. $V_C$ is also bounded on B for any f-geometrically regular set B.


Proof.

(i): $G_C^{(r)} = I + I_{C^c} U_C^{(r)}$, so $U_C^{(r)} = G_C^{(r)} - I + I_C U_C^{(r)}$ when f(x) < ∞. And $U_C^{(r)} = \sum_{n=0}^{\infty} (P I_{C^c})^n P r^{n+1} = r P G_C^{(r)}$. We can finish the proof by combining the two formulas above.

(ii): Fix some 1 ≤ f < ∞ and let $V_C(x) = G_C^{(r)}(x, f)$, then

$$P V_C = r^{-1} V_C - r^{-1} I + r^{-1} I_C U_C^{(r)}.$$

We can prove that $[I_C U_C^{(r)}](x, f) = I_C(x)\, U_C^{(r)}(x, f)$, so

$$P V_C(x) = \begin{cases} r^{-1} V_C(x) - r^{-1} f(x) + r^{-1} U_C^{(r)}(x, f), & x \in C, \\ r^{-1} V_C(x) - r^{-1} f(x), & x \notin C. \end{cases}$$

Note that $V_C(x) = f(x)$ for x ∈ C and $V_C(x) \ge E_x[f(x)] = f(x)$.

Now we suppose that C is an f-Kendall set of rate r. By Lemma C.3(iii), we have $\sup_C f(x) < \infty$ and $\sup_C U_C^{(r)}(x, f) < \infty$. So $V_C$ is a solution to (C–4) and bounded on C. Because C is f-Kendall, C is regular, and then $C \in \mathcal{F}^+$. For any f-geometrically regular set B, $\sup_B R_C^{(r)}(x, f) < \infty$. By Lemma C.3(ii), $V_C$ is bounded on B.
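The identity in Lemma C.4(ii) can be checked numerically on a small finite chain. The sketch below is only an illustration under stated assumptions: the 4-state birth-death kernel, the set C = {0}, the function f(x) = 1 + x, and the rate r = 11/10 are all hypothetical choices, and r is assumed small enough that the expectations are finite (here r times the spectral radius of P restricted to the complement of C is below 1). The functions $G_C^{(r)}$ and $U_C^{(r)}$ are obtained from two separate first-step linear systems and then compared.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]

# Hypothetical birth-death chain on {0, 1, 2, 3} (an assumption for illustration).
P = [[F(3, 4), F(1, 4), F(0), F(0)],
     [F(1, 2), F(1, 4), F(1, 4), F(0)],
     [F(0), F(1, 2), F(1, 4), F(1, 4)],
     [F(0), F(0), F(1, 2), F(1, 2)]]
n = len(P)
C = {0}
f = [F(1 + x) for x in range(n)]
r = F(11, 10)

def delta(i, j):
    return F(int(i == j))

def apply_kernel(v):
    """(P v)(x) = sum_y P(x, y) v(y)."""
    return [sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]

# G(x) = f(x) on C; G(x) = f(x) + r (P G)(x) off C.
A_G = [[delta(i, j) - (F(0) if i in C else r * P[i][j]) for j in range(n)]
       for i in range(n)]
G = solve(A_G, f)

# U(x) = r sum_y P(x, y) [f(y) + 1{y not in C} U(y)] for every x.
A_U = [[delta(i, j) - (F(0) if j in C else r * P[i][j]) for j in range(n)]
       for i in range(n)]
U = solve(A_U, [r * v for v in apply_kernel(f)])

# Lemma C.4(ii): P V_C = r^{-1} V_C - r^{-1} f + r^{-1} 1_C U with V_C = G.
PG = apply_kernel(G)
rhs = [(G[x] - f[x] + (U[x] if x in C else F(0))) / r for x in range(n)]
identity_ok = PG == rhs

# Dropping the non-positive term -f/r gives a drift inequality with b = sup_C U / r.
b = max(U[x] for x in C) / r
drift_ok = all(PG[x] <= G[x] / r + (b if x in C else 0) for x in range(n))
print(identity_ok, drift_ok)
```

The last check mirrors the final claim of the lemma in this toy setting: once $U_C^{(r)}$ is bounded on C, the function $V_C$ satisfies an inequality of the form $PV \le r^{-1}V + b1_C$.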

Proof of Theorem C.1. We will prove that (ii)⇒(iii)⇒(iv)⇒(ii) and (ii)⇒(v)⇒(i)⇒(ii).

(ii)⇒(iii): It is exactly Lemma C.4(ii).

(iii)⇒(iv): The proof is from Theorem 15.2.6 in MT.

(iv)⇒(ii): There exists a full and absorbing set S which is covered by f-geometrically regular sets $\{S_n\}$. There exists an f-geometrically regular set $C = S_{n_0} \in \mathcal{F}^+$, so it is petite and f-Kendall.

(ii)⇒(v): The proof is from Theorem 15.4.1 in MT.

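(v)⇒(i): By Theorem 5.2.2 in MT, there exists a small set $C \in \mathcal{F}^+$. Φ is π-f-geometrically ergodic of rate r, so $\sum_{n=0}^{\infty} r^n \|P^n(x, \cdot) - \pi\|_f < \infty$ π-a.s., and hence

$$C \cap \Big\{x : \sum_{n=0}^{\infty} r^n \|P^n(x, \cdot) - \pi\|_f < \infty\Big\} \in \mathcal{F}^+.$$

Since

$$C \cap \Big\{x : \sum_{n=0}^{\infty} r^n \|P^n(x, \cdot) - \pi\|_f < M\Big\} \uparrow C \cap \Big\{x : \sum_{n=0}^{\infty} r^n \|P^n(x, \cdot) - \pi\|_f < \infty\Big\}$$

when M → ∞, there exists a finite constant M such that $A := C \cap \{x : \sum_{n=0}^{\infty} r^n \|P^n(x, \cdot) - \pi\|_f < M\} \in \mathcal{F}^+$. C is small, so A is small. On A, we have (C–1)-(C–2) with $c_1 = \pi f$ and $c_2 = \pi(f 1_A)$.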

(i)⇒(ii): The proof is quite similar to the proof of Theorem 15.4.2 in MT. Given a constant c ∈ R, we also denote by c the constant sequence (c, c, ...). For a sequence $a = \{a_i\}_{i=0}^{\infty}$, we denote $a - c = \{a_i - c\}_{i=0}^{\infty}$, so $(a - c)(i) = a_i - c$. From Chapter 13 and Chapter 14 in MT, we can show that $c_1 = \pi f$ and $c_2 = \pi(A)$.

First we consider the case where A = α is an atom. (i) is equivalent to

$$r^n |P^n(\alpha, f) - \pi f| < M \quad \text{and} \quad r^n |P^n(\alpha, \alpha) - \pi(\alpha)| < M. \qquad \text{(C–6)}$$

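From formula (13.45) in MT, $P^n(\alpha, f) = [u * t_f](n)$ where $u(n) = P^n(\alpha, \alpha)$ and $t_f(n) = E_\alpha[f(\Phi_n) 1_{\{\tau_\alpha \ge n\}}]$. From formula (13.50) in MT,

$$\pi f = \pi(\alpha) \sum_{i=1}^{\infty} t_f(i) = \pi(\alpha) \sum_{i=1}^{n} t_f(i) + \pi(\alpha) \sum_{i=n+1}^{\infty} t_f(i) = [\pi(\alpha) * t_f](n) + \pi(\alpha) \sum_{i=n+1}^{\infty} t_f(i)$$

(because π(α) is constant).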

Note that $t_f$ is a non-negative sequence and $\sum_{i=1}^{n} t_f(i) < \infty$. We have

$$P^n(\alpha, f) - \pi f = [(u - \pi(\alpha)) * t_f](n) - \pi(\alpha) \sum_{i=n+1}^{\infty} t_f(i).$$
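The renewal-type decomposition $P^n(\alpha, f) = [u * t_f](n)$ can be illustrated on a finite chain in which the atom α is a single state. The sketch below is only an illustration: the 3-state kernel and the function f are hypothetical, and the convention $t_f(0) = 0$ (so that the convolution runs over the time of the last visit to α before time n) is an assumption made here for the check.

```python
from fractions import Fraction as F

# Hypothetical 3-state chain and function f (assumptions for illustration).
P = [[F(3, 4), F(1, 4), F(0)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
f = [F(1), F(2), F(3)]
alpha = 0
n_states = len(P)

def step(row, kernel):
    """Push a row vector one step through a kernel: (row @ kernel)."""
    return [sum(row[z] * kernel[z][y] for z in range(n_states))
            for y in range(n_states)]

def point_mass():
    return [F(int(y == alpha)) for y in range(n_states)]

def u_sequence(n_max):
    """u(n) = P^n(alpha, alpha) for n = 0..n_max."""
    row, out = point_mass(), []
    for _ in range(n_max + 1):
        out.append(row[alpha])
        row = step(row, P)
    return out

def t_sequence(n_max):
    """t_f(n) = E_alpha[f(Phi_n) 1{tau_alpha >= n}] for n >= 1, with t_f(0) = 0."""
    # Taboo kernel: mass sitting at alpha is not propagated any further.
    taboo = [[F(0)] * n_states if z == alpha else P[z] for z in range(n_states)]
    row = list(P[alpha])  # law of Phi_1 started from alpha
    out = [F(0)]
    for _ in range(n_max):
        out.append(sum(row[y] * f[y] for y in range(n_states)))
        row = step(row, taboo)
    return out

def pn_f(n):
    """P^n(alpha, f) computed directly."""
    row = point_mass()
    for _ in range(n):
        row = step(row, P)
    return sum(row[y] * f[y] for y in range(n_states))

def convolution_at(u, t, n):
    return sum(u[k] * t[n - k] for k in range(n + 1))

N = 12
u, t = u_sequence(N), t_sequence(N)
decomposition_ok = all(pn_f(n) == convolution_at(u, t, n) for n in range(1, N + 1))
print(decomposition_ok)
```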

Fix any 1 < s < r,

$$\sum_{n=0}^{N} s^n [P^n(\alpha, f) - \pi f] = \sum_{n=0}^{N} \big\{[(u - \pi(\alpha)) * t_f](n)\, s^n\big\} - \pi(\alpha) \sum_{n=0}^{N} \Big[s^n \sum_{i=n+1}^{\infty} t_f(i)\Big].$$


Using |x − y| ≥ ||x| − |y||, we get

$$\Big|\sum_{n=0}^{N} s^n [P^n(\alpha, f) - \pi f]\Big| \ge \Bigg|\, \Big|\pi(\alpha) \sum_{n=0}^{N} \Big[s^n \sum_{i=n+1}^{\infty} t_f(i)\Big]\Big| - \Big|\sum_{n=0}^{N} \big\{[(u - \pi(\alpha)) * t_f](n)\, s^n\big\}\Big| \,\Bigg| = |S_{N,1}(s) - S_{N,2}(s)|,$$

where $S_{N,1}(s)$ and $S_{N,2}(s)$ are the first and second terms, respectively. Given non-negative sequences $\{a_n\}$ and $\{b_n\}$, denote by a ∗ b the convolution of the two sequences. We have

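$$\sum_{n=0}^{N} [(a * b)_n s^n] = \sum_{n=0}^{N} \Big[\sum_{k=0}^{n} a_k s^k\, b_{n-k} s^{n-k}\Big] = \sum_{k=0}^{N} \Big[(a_k s^k) \sum_{n=k}^{N} (b_{n-k} s^{n-k})\Big] \le \sum_{k=0}^{N} (a_k s^k) \sum_{n=0}^{N} (b_n s^n).$$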
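The convolution bound above can be sanity-checked on concrete non-negative sequences. This is a minimal sketch; the sequences a, b and the value s = 3/2 are arbitrary illustrative choices.

```python
from fractions import Fraction as F

def convolution(a, b, n):
    """(a * b)_n = sum_{k=0}^{n} a_k b_{n-k}."""
    return sum(a[k] * b[n - k] for k in range(n + 1))

def weighted_convolution_sum(a, b, s, N):
    """Left-hand side: sum_{n=0}^{N} (a * b)_n s^n."""
    return sum(convolution(a, b, n) * s ** n for n in range(N + 1))

def product_of_sums(a, b, s, N):
    """Right-hand side: (sum_k a_k s^k) (sum_n b_n s^n), both sums up to N."""
    return (sum(a[k] * s ** k for k in range(N + 1))
            * sum(b[n] * s ** n for n in range(N + 1)))

# Arbitrary non-negative sequences (assumptions for illustration).
a = [F(1), F(2), F(0), F(3), F(1), F(5)]
b = [F(2), F(0), F(1), F(1), F(4), F(1, 2)]
s = F(3, 2)

bound_ok = all(weighted_convolution_sum(a, b, s, N) <= product_of_sums(a, b, s, N)
               for N in range(len(a)))
print(bound_ok)
```

The inequality holds because the product on the right contains every term $a_k s^k b_j s^j$ with k, j ≤ N, while the left side keeps only those with k + j ≤ N, and all terms are non-negative.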

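Applying the convolution inequality above, we have

$$S_{N,2}(s) = \Big|\sum_{n=0}^{N} \big\{[(u - \pi(\alpha)) * t_f](n)\, s^n\big\}\Big| \le \sum_{n=0}^{N} \big\{[|u - \pi(\alpha)| * t_f](n)\, s^n\big\} \le \sum_{n=0}^{N} t_f(n) s^n \sum_{k=0}^{N} |u - \pi(\alpha)|(k)\, s^k = c_N(s)\, d_N(s),$$

where $c_N(s) = \sum_{n=0}^{N} t_f(n) s^n$ and $d_N(s) = \sum_{k=0}^{N} |u - \pi(\alpha)|(k)\, s^k$. By (C–6) and s < r,

$$d_N(s) \le d(s) := \sum_{k=0}^{\infty} |u - \pi(\alpha)|(k)\, s^k = \sum_{k=0}^{\infty} |P^k(\alpha, \alpha) - \pi(\alpha)|\, s^k < \infty.$$

And then $S_{N,2}(s) \le c_N(s)\, d(s)$.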

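Given any non-negative sequence $\{a_n\}$,

$$\sum_{n=0}^{N} \Big[s^n \sum_{i=n+1}^{\infty} a_i\Big] \ge \sum_{n=0}^{N} \Big[s^n \sum_{i=n+1}^{N} a_i\Big] = \sum_{i=1}^{N} a_i \sum_{n=0}^{i-1} s^n = (s - 1)^{-1} \sum_{i=1}^{N} a_i (s^i - 1)$$

$$= (s - 1)^{-1} \Big[\sum_{i=1}^{N} a_i s^i - \sum_{i=1}^{N} a_i\Big] \ge (s - 1)^{-1} \Big[\sum_{i=1}^{N} a_i s^i - \sum_{i=1}^{\infty} a_i\Big].$$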
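The tail-sum lower bound above can be checked in the same way. The sketch below assumes a finitely supported non-negative sequence (so that the infinite sums are finite sums); the particular sequence and s = 3/2 are illustrative choices only.

```python
from fractions import Fraction as F

def tail_weighted_sum(a, s, N):
    """sum_{n=0}^{N} s^n * sum_{i>n} a_i, for a finitely supported sequence a."""
    return sum(s ** n * sum(a[n + 1:]) for n in range(N + 1))

def lower_bound(a, s, N):
    """(s - 1)^{-1} [ sum_{i=1}^{N} a_i s^i - sum_{i>=1} a_i ]."""
    top = min(N, len(a) - 1)  # a_i = 0 beyond the stored support
    weighted = sum(a[i] * s ** i for i in range(1, top + 1))
    return (weighted - sum(a[1:])) / (s - 1)

# a[0] is unused by both formulas; the values are assumptions for illustration.
a = [F(0), F(1, 2), F(1, 4), F(3), F(0), F(2), F(1, 8)]
s = F(3, 2)

tail_bound_ok = all(tail_weighted_sum(a, s, N) >= lower_bound(a, s, N)
                    for N in range(12))
print(tail_bound_ok)
```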


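Setting $a_i = t_f(i)$ and noting that $\pi f = \pi(\alpha) \sum_{i=1}^{\infty} t_f(i)$, we get

$$S_{N,1}(s) = \pi(\alpha) \sum_{n=0}^{N} \Big[s^n \sum_{i=n+1}^{\infty} t_f(i)\Big] \ge (s - 1)^{-1} \pi(\alpha) [c_N(s) - \pi f / \pi(\alpha)].$$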

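Suppose that $c_N(s) \to \infty$ when N → ∞ for all fixed s such that 1 < s < r, so $c_N(s)/2 > \pi f / \pi(\alpha)$ for N > N(s). And then $S_{N,1}(s) \ge (s - 1)^{-1} \pi(\alpha)\, c_N(s)\, 2^{-1}$ for N > N(s). We have

$$S_{N,1}(s) - S_{N,2}(s) \ge 2^{-1} (s - 1)^{-1} \pi(\alpha)\, c_N(s) - c_N(s)\, d(s) = c_N(s) [2^{-1} \pi(\alpha) (s - 1)^{-1} - d(s)], \quad N > N(s).$$

π(α) > 0, so $(s - 1)^{-1} \pi(\alpha) \to +\infty$ as s ↓ 1. Since d(s) < ∞, we can select $1 < s_0 < r$ close to 1 such that $2^{-1} \pi(\alpha) (s_0 - 1)^{-1} > 2 d(s_0)$. So $S_{N,1}(s_0) - S_{N,2}(s_0) \ge c_N(s_0)\, d(s_0)$ for $N > N(s_0)$. Since $c_N(s_0) \to \infty$, we get $S_{N,1}(s_0) - S_{N,2}(s_0) \to \infty$. And then

$$\Big|\sum_{n=0}^{N} s_0^n [P^n(\alpha, f) - \pi f]\Big| \to \infty.$$

This is a contradiction, because (C–6) gives $\big|\sum_{n=0}^{N} s_0^n [P^n(\alpha, f) - \pi f]\big| \le M \sum_{n=0}^{\infty} (s_0/r)^n < \infty$ for all N. Hence $c_N(s_1)$ must converge to a finite number for some $1 < s_1 < r$. By formula (13.43) in MT, we have

$$c_N(s_1) \to \sum_{n=0}^{\infty} t_f(n)\, s_1^n = E_\alpha\Big[\sum_{k=0}^{\tau_\alpha} f(\Phi_k)\, s_1^k\Big],$$

so α is f-Kendall. α is an atom, so α is petite.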

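Consider the case where Φ is strongly aperiodic with probability measure ν. The split chain $\check{\Phi}$ has the accessible atom $\alpha := A_1$, and $r^n |\check{P}^n(\alpha, f) - \pi f| < M$ for all n, where $\check{P}$ is the kernel of $\check{\Phi}$ (see, e.g., MT, p.106-108). We have

$$r^n |\check{P}^n(\alpha, f) - \pi^* f| = r^n |\nu P^{n-1} f - \pi f| \quad (\check{P}^n(\alpha, f) = \nu P^{n-1} f \text{ and } \pi^* = \pi)$$

$$= r^n \Big|\int_C \nu(dx) [P^{n-1} f(x) - \pi f]\Big| \quad (\nu \text{ is a probability measure})$$

$$\le r^n \int_C \nu(dx)\, |P^{n-1} f(x) - \pi f|$$

$$\le r \int_C \nu(dx)\, M = M r.$$

Doing the same as above (replace f by $1_\alpha$), we have

$$r^n |\check{P}^n(\alpha, \alpha) - \pi^*(\alpha)| \le M r.$$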

From the atom case, α is f-Kendall for the split chain. Since α is an atom, α is petite. So X is almost everywhere covered by f-geometrically regular sets $S_n$. $A_0 \in \mathcal{F}^+$ implies that there exists $n_0$ such that $C_0 = A_0 \cap S_{n_0}$ is in $\mathcal{F}^+$. $C_0 \subset S_{n_0}$, $C_1 \subset \alpha$, and $S_{n_0}$ and α are f-Kendall, so $\check{C}$ is f-Kendall. We can show that

$${}_C P^n(x, f) = \delta^*_x(x_0)\, {}_{\check{C}} \check{P}^n(x_0, f) + \delta^*_x(x_1)\, {}_{\check{C}} \check{P}^n(x_1, f),$$

so

$$U_C^{(r)}(x, f) = \delta^*_x(x_0)\, U_{\check{C}}^{(r)}(x_0, f) + \delta^*_x(x_1)\, U_{\check{C}}^{(r)}(x_1, f).$$

From the last formula, C is f-Kendall and $C \in \mathcal{F}^+$. C ⊂ A, so C is a small set.

Finally, consider the case where Φ is aperiodic, so $\Phi^m$ is strongly aperiodic for some m. From the strongly aperiodic case, we can find an f-Kendall small set $C \in \mathcal{F}^+$ for $\Phi^m$. Using the proof of Theorem 15.3.6 in MT, C is f-Kendall for Φ.

We have proved that conditions (i)-(v) are equivalent to one another. By Theorem 14.3.7 in MT, we have πV < ∞. It is easy to check the other results.

While Theorem C.1 is about π-f-geometric ergodicity, the next theorem is about f-geometric ergodicity. It adds more results to Theorem 15.3.3 in MT.


Theorem C.5. (MT, Theorem 15.3.3) Suppose that Φ is ψ-irreducible. Given an F-measurable function f : X → [1,∞), the following three conditions are equivalent:

(i) There exist some petite set C ∈ F and constant κ > 1 such that

$$\sup_{x \in C} R_C^{(\kappa)}(x, f) = \sup_{x \in C} E_x\Big[\sum_{k=0}^{\tau_C - 1} f(X_k)\, \kappa^k\Big] < \infty$$

and $R_C^{(\kappa)}(x, f) < \infty$ for all x ∈ X.

(ii) There exist some petite set B, constants r > 1, b < ∞, and a finite measurable function V : X → [1,∞), with V ≥ f, such that

$$PV \le r^{-1} V + b 1_B.$$

(iii) X is covered by a union of a countable number of f-geometrically regular sets.

If Φ is also aperiodic and recurrent, any of these three conditions implies that Φ is Harris recurrent and f-geometrically ergodic.

Remark C.6. We note some differences between Theorem C.1 and Theorem C.5.

• The difference between Theorem C.1 and Theorem C.5 is analogous to that between Theorem 11.0.1 and Theorem 11.3.15 in MT. One result is true almost surely, the other result is true on the whole state space.

• f-geometric ergodicity is weaker than each condition in Theorem C.5. An analogous one is Theorem 14.3.3(iii) in MT. Also note that regularity is a stronger condition than Harris positivity (we can prove that), and Harris positivity is a stronger condition than ergodicity (MT, Theorem 13.3.1).

Proof of Theorem C.5. We will prove that (i)⇒(ii)⇒(iii)⇒(i).

(i)⇒(ii): As in the proof of Theorem C.1, let $V = G_C^{(r)}(\cdot, f)$ with r = κ. We only need to prove that V is finite. V = f on C, so V(x) < ∞ for x ∈ C. For x ∉ C, we have $\tau_C = \sigma_C$, so $V = R_C^{(r)}(\cdot, f) + E_x[f(\Phi_{\tau_C}) r^{\tau_C}]$. We now prove $E_x[f(\Phi_{\tau_C}) r^{\tau_C}] < \infty$. From Lemma C.3(iii), C is f-Kendall so $\sup_C f(x) < \infty$. Then $E_x[f(\Phi_{\tau_C}) r^{\tau_C}] \le [\sup_C f(x)]\, E_x[r^{\tau_C}]$. We have $\infty > R_C^{(r)}(x, f) \ge E_x\big[\sum_{k=0}^{\tau_C - 1} r^k\big]$. From Lemma C.3(i), we get $E_x[r^{\tau_C}] < \infty$.

(ii)⇒(iii): By Theorem 15.2.6 in MT, {V < n} is f-geometrically regular for n > 0. Because V is finite, the sets {V < n} for n ∈ N cover X.

(iii)⇒(i): We can repeat the proof in Theorem C.1 here.

If Φ is also aperiodic and recurrent, by Theorem 15.4.1 in MT, (ii) implies that Φ is f-geometrically ergodic. By Theorem 11.3.4 in MT, Φ is Harris.

For a discrete state space, if something holds π-a.s. then it holds on the whole state space. For example, all recurrent chains are Harris recurrent on a discrete state space. For the rest of this section, we only focus on the equivalence between f-geometric ergodicity and the drift condition. Although f-geometric ergodicity is weaker than the drift condition in Theorem C.5 in general, the two are equivalent on a discrete state space.

Theorem C.7. Let Φ be an irreducible, aperiodic, and recurrent chain with state space X and transition matrix P. Φ is f-geometrically ergodic if and only if there exist some petite set B, constants r > 1, b < ∞, and a finite function V ≥ f on X such that

$$PV \le r^{-1} V + b 1_B.$$

Proof. By Theorem C.1, Φ is π-f-geometrically ergodic and V is finite almost everywhere. Because the state space is discrete, Φ is f-geometrically ergodic and V is finite.

Using Theorem C.7, we can prove the next theorem (we replace $r^{-1}$ by 1 − ε).

Theorem C.8. (Popov, 1977) Let Φ be an irreducible, aperiodic, and recurrent chain with state space X and transition matrix P. Fix any finite subset C ≠ ∅ of the state space. Φ is geometrically ergodic if and only if there exist constants b < ∞, ε > 0, and a finite function V ≥ 1 on X such that

$$PV \le (1 - \varepsilon) V + b 1_C.$$
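As an illustration of the drift condition in Theorem C.8, consider a reflecting random walk on {0, 1, 2, ...} that moves up with probability p and down with probability q = 1 − p, holding at 0 with probability q. This chain, the values p = 1/5 and q = 4/5, and the test function $V(x) = z^x$ with z = 2 are assumptions chosen for illustration and are not taken from the text. For x ≥ 1 one has $PV(x) = (pz + q/z) V(x)$, so the sketch below checks the inequality with C = {0} in exact arithmetic on an initial segment of the state space.

```python
from fractions import Fraction as F

# Hypothetical reflecting random walk: up w.p. p, down w.p. q, hold at 0 w.p. q.
p, q = F(1, 5), F(4, 5)
z = F(2)  # V(x) = z**x; z = sqrt(q/p) minimises p*z + q/z, and equals 2 here

def V(x):
    return z ** x

def PV(x):
    """(PV)(x) = sum_y P(x, y) V(y) for the reflecting walk."""
    if x == 0:
        return q * V(0) + p * V(1)
    return q * V(x - 1) + p * V(x + 1)

contraction = p * z + q / z        # factor (1 - eps) valid off C = {0}
eps = 1 - contraction
b = PV(0) - contraction * V(0)     # slack needed at the single state in C

def drift_holds(x):
    """PV(x) <= (1 - eps) V(x) + b 1_C(x) with C = {0}."""
    return PV(x) <= (1 - eps) * V(x) + (b if x == 0 else 0)

# Only finitely many states can be checked; for x >= 1 the relation is an
# exact equality by the algebra above, so the pattern is the same for all x.
drift_ok = all(drift_holds(x) for x in range(200))
print(eps, b, drift_ok)
```

Since p < q, the walk is irreducible, aperiodic (because of the holding probability at 0), and recurrent, so under these assumptions Theorem C.8 would give geometric ergodicity with the finite set C = {0}.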

Remark C.9. We note some differences between Theorem C.7 and Theorem C.8.

• Theorem C.7 is for f-geometric ergodicity. Theorem C.8 is for geometric ergodicity only.

• In Theorem C.7, B could be a petite set. In Theorem C.8, we need a finite set. We can prove that every finite set is petite in Lemma C.10(i). But a petite set may not be finite. For example, if $P_{ij} = \pi_j$ for all i, then all sets are small.

To prove Theorem C.8 by using Theorem C.7, we need the following lemma.

Lemma C.10. Let Φ be a discrete state-space, irreducible and aperiodic chain with

transition matrix P . Then

(i) Every finite set is small;

(ii) If Φ is also recurrent and geometrically ergodic, every finite set is geometrically

regular.

Proof.

(i) Suppose that C is finite. Fix any state j. The chain is irreducible and aperiodic, so for each i ∈ C, there exists $N_i$ such that $P^n(i, j) > 0$ for $n \ge N_i$. Let $N = \max_{i \in C} N_i$. Define ν by $\nu(j) = \min_{i \in C} P^N(i, j) > 0$ and ν(i) = 0 when i ≠ j. Let B be any subset of the state space. If j ∉ B, $\nu(B) = 0 \le P^N(i, B)$. If j ∈ B and i ∈ C, $P^N(i, B) \ge P^N(i, j) \ge \nu(j) = \nu(B)$. So C is small.

(ii) Suppose that C is finite. Since Φ is geometrically ergodic, by Theorem C.7 we can cover X by a countable number of geometrically regular sets $\{C_i\}$. For each state $i_0$ in C, there exists a geometrically regular set $C_{i_0}$ such that $i_0 \in C_{i_0}$. From the definition of a geometrically regular set, a finite union of geometrically regular sets is geometrically regular, and a subset of a geometrically regular set is geometrically regular. So $\cup_{i_0 \in C} C_{i_0}$ is geometrically regular, and then the subset C of $\cup_{i_0 \in C} C_{i_0}$ is geometrically regular.
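The construction in part (i) can be carried out explicitly on a finite chain. The sketch below uses a hypothetical 4-state birth-death kernel with C = {0, 3} and j = 1 (all assumptions for illustration). As a simplification, it searches for the first N with $P^N(i, j) > 0$ for every i ∈ C, rather than the $N = \max_i N_i$ of the proof; any such N suffices for the minorization $P^N(i, B) \ge \nu(B)$, which is then verified over all subsets B.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical irreducible, aperiodic chain on {0, 1, 2, 3}.
P = [[F(3, 4), F(1, 4), F(0), F(0)],
     [F(1, 2), F(1, 4), F(1, 4), F(0)],
     [F(0), F(1, 2), F(1, 4), F(1, 4)],
     [F(0), F(0), F(1, 2), F(1, 2)]]
n = len(P)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def find_minorization(C, j, max_power=50):
    """Return (N, mass, P^N) with mass = min_{i in C} P^N(i, j) > 0, else None."""
    power = [row[:] for row in P]  # P^1
    for N in range(1, max_power + 1):
        mass = min(power[i][j] for i in C)
        if mass > 0:
            return N, mass, power
        power = matmul(power, P)
    return None

C, j = [0, 3], 1
N, mass, PN = find_minorization(C, j)

def nu(B):
    """The measure from the proof: all of its mass sits at state j."""
    return mass if j in B else F(0)

# Check P^N(i, B) >= nu(B) for every i in C and every subset B of the state space.
minorization_ok = all(
    sum(PN[i][y] for y in B) >= nu(B)
    for size in range(n + 1)
    for B in combinations(range(n), size)
    for i in C
)
print(N, mass, minorization_ok)
```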

Proof of Theorem C.8. If Φ is geometrically ergodic, then C is geometrically regular by Lemma C.10(ii). By Theorem 15.2.1 in MT, C is a petite Kendall set. By Theorem C.7, there exists a drift condition for C.

Suppose that there is a drift condition for C and some V. C is small by Lemma C.10(i), so Φ is geometrically ergodic by Theorem C.7.


REFERENCES

AKHIEZER, N. I. and GLAZMAN, I. M. (1993). Theory of linear operators in Hilbert space. Dover Publications Inc.

BILLINGSLEY, P. (1995). Probability and measure. 3rd ed. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York.

BRAMSON, M. (2008). Stability of queueing networks, vol. 1950 of Lecture Notes in Mathematics. Springer, Berlin.

BROOKS, S., GELMAN, A., JONES, G. and MENG, X.-L. (eds.) (2011). Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC Press.

CHAN, K. S. and GEYER, C. J. (1994). Discussion: Markov chains for exploring posterior distributions. The Annals of Statistics, 22 1747–1758.

CHEN, M. F. (1992). From Markov chains to nonequilibrium particle systems. World Scientific Publishing Co., Inc., River Edge, NJ.

CHEN, M.-F. (2004). From Markov chains to non-equilibrium particle systems. 2nd ed. World Scientific Publishing Co., Inc., River Edge, NJ.

CHOI, H. M. and HOBERT, J. P. (2013). Analysis of MCMC algorithms for Bayesian linear regression with Laplace errors. Journal of Multivariate Analysis, 117 32–40.

CONWAY, J. B. (1990). A course in functional analysis, vol. 96 of Graduate Texts in Mathematics. 2nd ed. Springer-Verlag, New York.

DIACONIS, P., KHARE, K. and SALOFF-COSTE, L. (2008). Gibbs sampling, exponential families and orthogonal polynomials. Statistical Science, 23 151–178.

DIACONIS, P. and STROOCK, D. (1991). Geometric bounds for eigenvalues of Markov chains. The Annals of Applied Probability, 1 36–61.

DIEBOLT, J. and ROBERT, C. P. (1994). Estimation of finite mixture distributions by Bayesian sampling. Journal of the Royal Statistical Society, Series B, 56 363–375.

FLEGAL, J. M., HARAN, M. and JONES, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23 250–260.

HALMOS, P. R. (1950). Measure Theory. D. Van Nostrand Company, Inc., New York, N.Y.

HOBERT, J. P. and CASELLA, G. (1996). The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association, 91 1461–1473.


HOBERT, J. P. and GEYER, C. J. (1998). Geometric ergodicity of Gibbs and block Gibbs samplers for a hierarchical random effects model. Journal of Multivariate Analysis, 67 414–430.

HOBERT, J. P. and KHARE, K. (2014). Computable upper bounds on the distance to stationarity for Jovanovski and Madras’s Gibbs sampler. Tech. rep., U. of Florida.

HOBERT, J. P., ROY, V. and ROBERT, C. P. (2011). Improving the convergence properties of the data augmentation algorithm with an application to Bayesian mixture modeling. Statistical Science, 26 332–351.

JARNER, S. F. and HANSEN, E. (2000). Geometric ergodicity of Metropolis algorithms. Stochastic Processes and their Applications, 85 341–361.

JARNER, S. F. and TWEEDIE, R. L. (2003). Necessary conditions for geometric and polynomial ergodicity of random-walk-type Markov chains. Bernoulli, 9 559–578.

JONES, G. L. and HOBERT, J. P. (2001). Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statistical Science, 16 312–334.

JOVANOVSKI, O. and MADRAS, N. (2014). Convergence rates for hierarchical Gibbs samplers. Tech. rep., York University. ArXiv:1402.4733.

JUNG, Y. J. and HOBERT, J. P. (2014). Spectral properties of MCMC algorithms for Bayesian linear regression with generalized hyperbolic errors. Statistics & Probability Letters, 95 92–100.

KADISON, R. V. and RINGROSE, J. R. (1997). Fundamentals of the theory of operator algebras. Vol. I, vol. 15 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI.

KARLIN, S. and MCGREGOR, J. (1959). Random walks. Illinois Journal of Mathematics, 3 66–81.

KARLIN, S. and TAYLOR, H. M. (1975). A First Course in Stochastic Processes. Academic Press.

KHARE, K. and HOBERT, J. P. (2011). A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. The Annals of Statistics, 39 2585–2606.

KNOPP, K. (1951). Theory and Application of Infinite Series. Courier Corporation.

KOVCHEGOV, Y. (2010). Orthogonality and probability: mixing times. Electronic Communications in Probability, 15 59–67.

ŁATUSZYNSKI, K., ROBERTS, G. and ROSENTHAL, J. (2013). Adaptive Gibbs samplers and related MCMC methods. The Annals of Applied Probability 66–98.

LEONI, G. (2009). A First Course in Sobolev Spaces. American Mathematical Soc.


LI, B. (2003). Real operator algebras. World Scientific Publishing Co. Inc., River Edge, NJ.

LIU, J. S., WONG, W. H. and KONG, A. (1994). Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika, 81 27–40.

LIU, J. S., WONG, W. H. and KONG, A. (1995). Covariance structure and convergence rate of the Gibbs sampler with various scans. Journal of the Royal Statistical Society. Series B. Methodological, 57 157–169.

MAO, Y., TAI, Y., ZHAO, Y. Q. and ZOU, J. (2012). Ergodicity for the $GI/G/1$-type Markov Chain. ArXiv e-prints 16. 1208.5225.

MAO, Y.-H. (2010). Convergence rates for reversible Markov chains without the assumption of nonnegative definite matrices. Science China. Mathematics, 53 1979–1988.

MAO, Y.-H. and ZHANG, Y.-H. (2004). Exponential ergodicity for single-birth processes. Journal of Applied Probability, 41 1022–1032.

MEYN, S. and TWEEDIE, R. L. (2009). Markov chains and stochastic stability. 2nd ed. Cambridge University Press, Cambridge.

NUMMELIN, E. (1984). General irreducible Markov chains and nonnegative operators, vol. 83 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge.

PAL, S., KHARE, K. and HOBERT, J. P. (2015). Trace class Markov chains for Bayesian inference with generalized double Pareto shrinkage prior. Tech. rep., U. of Florida.

POPOV, N. N. (1977). Geometric ergodicity conditions for countable Markov chains. Doklady Akademii Nauk SSSR, 234 316–319.

RINGROSE, J. R. (1971). Compact non-self-adjoint operators. Van Nostrand Reinhold Co, London, New York.

ROBERTS, G. O. and ROSENTHAL, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability, 2 no. 2, 13–25 (electronic).

ROBERTS, G. O. and ROSENTHAL, J. S. (1998). Markov chain Monte Carlo: Some practical implications of theoretical results (with discussion). Canadian Journal of Statistics, 26 5–31.

ROBERTS, G. O. and ROSENTHAL, J. S. (2004). General state space Markov chains and MCMC algorithms. Probability Surveys, 1 20–71 (electronic).

ROBERTS, G. O. and TWEEDIE, R. L. (2001). Geometric L2 and L1 convergence are equivalent for reversible Markov chains. Journal of Applied Probability, 38A 37–41.


SEARLE, S. R., CASELLA, G. and MCCULLOCH, C. E. (1992). Variance components. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, Inc., New York.

TAN, A. (2009). Convergence Rates and Regeneration of the Block Gibbs Sampler for Bayesian Random Effects Models. Ph.D. thesis, Department of Statistics, University of Florida.

TAN, A. and HOBERT, J. P. (2009). Block Gibbs sampling for Bayesian random effects models with improper priors: convergence and regeneration. Journal of Computational and Graphical Statistics, 18 861–878.

TAN, A., JONES, G. L. and HOBERT, J. P. (2013). On the geometric ergodicity of two-variable Gibbs samplers. In Advances in Modern Statistical Theory and Applications: A Festschrift in Honor of Morris L. Eaton (G. L. Jones and X. Shen, eds.), vol. 10 of IMS Collections Ser. IMS, Beachwood, OH, 25–42.

VAN DOORN, E. A. and SCHRIJNER, P. (1995). Geometric ergodicity and quasi-stationarity in discrete-time birth-death processes. Australian Mathematical Society. Journal. Series B. Applied Mathematics, 37 121–144.

WINKELBAUER, A. (2012). Moments and Absolute Moments of the Normal Distribution. ArXiv e-prints. 1209.4340.


BIOGRAPHICAL SKETCH

Trung Ha was born in 1981 in Hanoi, Vietnam. He was recruited to the gifted mathematics program of Be Van Dan elementary and middle school in 1990 and to the mathematics program of HUS High School for Gifted Students, Vietnam National University, Hanoi in 1996. In 1999, he was admitted to the mathematics program of the Center of Training of Talented Engineers, Hanoi University of Technology. In 2004, he earned his bachelor’s degree in applied mathematics and informatics. In 2005, he worked as a researcher at the Department of Probability and Statistics, Hanoi Institute of Mathematics. He received a VEF fellowship to attend the Department of Statistics, University of Florida in 2007. He received his master’s degree in 2013 and his doctoral degree in 2016.
