CONVERGENCE ANALYSIS OF BIRTH-DEATH MARKOV CHAINS AND GIBBS SAMPLERS
By
TRUNG HA
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2016
© 2016 Trung Ha
To my family
ACKNOWLEDGMENTS
Most of all, I owe my gratitude to my advisor, Dr. James Hobert. He has been a
great mentor and supporter. He advised me to take several courses that proved very
helpful for my research. He suggested many research projects to me and gave me the
freedom to select my favorite ones. He was patient with my research progress when I
needed time to read textbooks relevant to my work, and he strongly supported me when
I had problems with my visa, so that I could keep my mind on my research.
I would like to thank my dissertation committee of Dr. Brett Presnell, Dr. Kshitij
Khare, and Dr. Taylor Stein. Dr. Presnell helped me improve my notation, Dr. Khare
shared some important notes with me, and Dr. Stein quickly helped me set up my
exam schedule.
I would like to thank the Vietnam Education Foundation for giving me the chance to
attend a PhD program in the United States. I would also like to thank the Hanoi Institute
of Mathematics, especially my advisor Dr. Nguyen Dinh Cong, for supporting my studies
in the United States.
I would like to thank the faculty of the Department of Statistics at the University of
Florida for teaching me a great deal about statistics.
I would like to thank Dr. Dinh Quang Luu (1947-2005) and my undergraduate
advisor, Dr. Bui Khoi Dam, for inspiring me in my field of study, probability and statistics.
Last but not least, this dissertation is dedicated to the memory of my late father and
my family, especially my wife and my children, for their endless love and support.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
  1.1 Background on General State Space Markov Chains
    1.1.1 Basic Definitions and Convergence Concepts
    1.1.2 Spectral Theory
    1.1.3 Cp Class
    1.1.4 Gibbs Sampler
  1.2 Overview of the Remaining Chapters
    1.2.1 Chapter 2
    1.2.2 Chapter 3
    1.2.3 Chapter 4
2 CHARACTERIZATION OF COMPACTNESS FOR BIRTH-DEATH MARKOV OPERATORS WITH APPLICATIONS TO GIBBS SAMPLING
  2.1 Summary
  2.2 Compactness of Birth-Death Markov Operators
  2.3 Application to a Family of Gibbs Samplers
  2.4 Birth-Death Chains Are Not Uniformly Ergodic
3 SPECTRAL ANALYSIS OF GIBBS SAMPLERS FOR BAYESIAN LINEAR MIXED MODELS
  3.1 Summary
  3.2 The Models and the Gibbs Samplers
    3.2.1 Proper Prior
    3.2.2 Improper Prior
  3.3 Hobert & Geyer's Gibbs Sampler Is Not Hilbert-Schmidt
  3.4 The Gibbs Sampler with Improper Priors (and Alternative Blocking) Is Not Trace Class
4 CHARACTERIZATION OF GEOMETRIC ERGODICITY FOR BIRTH-DEATH MARKOV CHAINS
  4.1 Summary
  4.2 Some Known Results on the Geometric Ergodicity of Birth-Death Chains
    4.2.1 Orthogonal Polynomial Method
    4.2.2 Spectral Method
  4.3 Drift Condition Method
    4.3.1 Geometric Ergodicity of Birth-Death Chains
    4.3.2 Application to a Family of Gibbs Samplers
    4.3.3 Geometric Ergodicity for a Family of Random Walks on Z
    4.3.4 Some Other Results

APPENDIX
A SOME LEMMAS AND EXAMPLES
B CHI-SQUARE DISTANCE
C F-GEOMETRIC ERGODICITY

REFERENCES
BIOGRAPHICAL SKETCH
LIST OF FIGURES

1-1 Pr(U = i, V = j)
A-1 Graph in R^2
A-2 Graph in R^3
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
CONVERGENCE ANALYSIS OF BIRTH-DEATH MARKOV CHAINS AND GIBBS SAMPLERS
By
Trung Ha
August 2016
Chair: James P. Hobert
Major: Statistics
Markov chain Monte Carlo (MCMC) is one of the most powerful computational tools
in statistics, and the Gibbs sampler (GS) is an important special case of MCMC. The
GS is often used to approximate Bayesian statistical estimators (posterior expectations).
While it is usually simple to apply the GS in practice, the Markov chain convergence
analysis that is required to ensure that the results are reasonable can be very difficult.
In particular, one must demonstrate that the underlying Markov chain converges at a
geometric rate. The two most popular methods of establishing this are the drift and
minorization method and the spectral method. We perform a spectral analysis of the
toy GS from Tan et al. (2013) by exploiting the fact that one of the marginal chains
associated with this GS is a birth-death Markov chain. In particular, we develop a
general necessary and sufficient condition for the compactness of birth-death Markov
operators, and use this to get a necessary and sufficient condition for compactness of
the Markov operator associated with the marginal Gibbs Markov chain. We note that
(under standard regularity conditions) compactness of a Markov operator implies a
geometric convergence rate for the corresponding Markov chain. We also use spectral
theory to study two different practically relevant GSs for Bayesian linear mixed models,
one with proper priors and a standard parametrization, and the other with improper
priors and an alternative parametrization. The GS for the model with proper priors is
known to be geometrically ergodic (Hobert and Geyer, 1998). We prove that neither of
the corresponding Markov operators is trace class. This is contrary to recent results
showing that several common GSs have trace class Markov operators. Finally, we
use drift methodology to find a necessary and sufficient condition for the geometric
ergodicity of a general birth-death Markov chain.
CHAPTER 1
INTRODUCTION
1.1 Background on General State Space Markov Chains
1.1.1 Basic Definitions and Convergence Concepts
Let $\Phi = \{X_i\}_{i=0}^{\infty}$ be an irreducible, aperiodic, and Harris positive recurrent Markov
chain on a countably generated measure space $(\mathsf{X}, \mathcal{F})$ with a Markov transition kernel
$P(x, dy)$ and a stationary distribution $\pi$, i.e.,
$$\pi(A) = \int_{\mathsf{X}} P(x, A)\, \pi(dx), \qquad \forall A \in \mathcal{F}.$$
Those conditions guarantee that Φ is ergodic (see Meyn and Tweedie, 2009, chap.5 and
chap.13), i.e.
$$\|P^n(x, \cdot) - \pi(\cdot)\|_{TV} \to 0, \qquad \forall x \in \mathsf{X},$$
where $\|\cdot\|_{TV}$ denotes the total variation norm and $P^n$ denotes the $n$-step Markov transition kernel of $\Phi$. Therefore, we can approximate $\pi$ by $P^n(x, \cdot)$ for large $n$ for any $x \in \mathsf{X}$.
Given a $\pi$-integrable function $f$ on the probability space $(\mathsf{X}, \mathcal{F}, \pi)$, we denote
$$\pi f = \int_{\mathsf{X}} f \, d\pi.$$
When Φ is ergodic with stationary distribution π, the strong law of large numbers (SLLN)
holds for any π-integrable function f and initial distribution ν (see Meyn and Tweedie,
2009, chap.17), i.e.
$$\bar{f}_n := \frac{1}{n} \sum_{i=0}^{n-1} f(X_i) \to \pi f \quad P_\nu\text{-a.s.},$$
where $P_\nu$ denotes the probability of events when the chain has initial distribution $\nu$. That
means we can approximate $\pi f$ by $\bar{f}_n$ for large $n$. We say that a $\pi$-integrable function $f$
satisfies a central limit theorem (CLT) if there exists some constant $\sigma^2 < \infty$ such that, for
any initial distribution $\nu$,
$$\frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} \left[ f(X_i) - \pi f \right] \xrightarrow{d} N(0, \sigma^2).$$
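To make the SLLN estimate concrete, here is a minimal Python sketch (purely illustrative and not part of this dissertation; the two-state chain and the function $f$ are arbitrary choices). The chain below has stationary distribution $(2/3, 1/3)$, and with $f(x) = x$ the ergodic average should approach $\pi f = 1/3$.

```python
import random

def simulate_chain(P, x0, n, rng):
    """Simulate n steps of a finite-state Markov chain with transition matrix P."""
    x, path = x0, []
    for _ in range(n):
        u, cum, nxt = rng.random(), 0.0, len(P[x]) - 1  # default to last state
        for j, pj in enumerate(P[x]):
            cum += pj
            if u < cum:
                nxt = j
                break
        x = nxt
        path.append(x)
    return path

def ergodic_average(path, f):
    """SLLN estimate of pi(f): the time average of f along the chain."""
    return sum(f(x) for x in path) / len(path)

# Two-state chain with stationary distribution (2/3, 1/3); pi(f) = 1/3 for f(x) = x.
P = [[0.9, 0.1], [0.2, 0.8]]
rng = random.Random(42)
est = ergodic_average(simulate_chain(P, 0, 100_000, rng), lambda x: x)
```

Because the chain is ergodic, the estimate is close to $1/3$ regardless of the initial state.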
Ergodicity does not tell us about the rate of convergence of $\|P^n(x,\cdot) - \pi(\cdot)\|_{TV}$ and is
not sufficient for the CLT. One of the most popular ways to obtain both of these is to establish
an exponential convergence rate for $\|P^n(x,\cdot) - \pi(\cdot)\|_{TV}$. $P$ is geometrically
ergodic if there exist a function $M : \mathsf{X} \to [0, \infty)$ and a constant $r \in [0, 1)$ such that
$$\|P^n(x, \cdot) - \pi(\cdot)\|_{TV} \le M(x) r^n, \qquad \forall x \in \mathsf{X},\ \forall n \in \mathbb{N}.$$
If M(·) is bounded, then the chain is called uniformly ergodic. Geometric ergodicity
for Markov chains on uncountable state spaces is usually established by building drift
and/or minorization conditions (Jones and Hobert, 2001; Meyn and Tweedie, 2009;
Roberts and Rosenthal, 2004). In practice, it is very difficult to construct drift conditions.
Consequently, very few practical Monte Carlo Markov chains have been shown to
be geometrically ergodic. And even when drift and minorization conditions can be
constructed, they typically lead to very poor bounds on M(·) and r (Diaconis et al.,
2008). Another way to obtain geometric ergodicity is via spectral theory, as we explain in
the next section.
1.1.2 Spectral Theory
Define the Hilbert spaces
$$L^2(\pi) = \left\{ f \text{ is } \mathcal{F}\text{-measurable} : \pi f^2 < \infty \right\}$$
and
$$L_0^2(\pi) = \left\{ f \text{ is } \mathcal{F}\text{-measurable} : \pi f = 0,\ \pi f^2 < \infty \right\}$$
with the inner product $\langle f, g \rangle = \pi(fg)$ and the corresponding norm $\|f\| = \sqrt{\pi f^2}$. $L_0^2(\pi)$ is
a closed subspace of $L^2(\pi)$. $P$ defines a Markov operator from $L_0^2(\pi)$ (or $L^2(\pi)$) to itself,
which is also denoted by $P$, as follows:
$$(Pf)(x) = \int_{\mathsf{X}} f(y) P(x, dy), \qquad x \in \mathsf{X}.$$
The norm of the operator $P$ on the space $L_0^2(\pi)$ is defined by
$$\|P\| = \sup_{f \in L_0^2(\pi),\ \|f\| = 1} \|Pf\|.$$
By Jensen’s inequality, we can show that ‖Pf‖ ≤ 1 for ‖f‖ ≤ 1 and f ∈ L2(π). So
‖P‖ ≤ 1. One of the reasons for studying P on L20(π) instead of on L2(π) is that, on
L2(π), ‖P‖ is always 1, because 1 is always an eigenvalue of P (see, e.g., Hobert
et al., 2011). On the other hand, on L20(π), neither 1 nor -1 is an eigenvalue of P
(see Chan and Geyer, 1994, p.1753), and, moreover, ‖P‖ is a measure of the speed
of convergence of the Markov chain, with smaller values corresponding to faster
convergence.
It is well known that if $\|P\| < 1$ then $\Phi$ is geometrically ergodic, and if $P$ is self-adjoint
then $\|P\| < 1$ if and only if $\Phi$ is geometrically ergodic (Liu et al., 1995; Roberts
and Rosenthal, 1997; Roberts and Tweedie, 2001). Proposition B.7 gives some detail.
However, it is not easy to show that $\|P\| < 1$ in practice. A sufficient condition for
$\|P\| < 1$ is the compactness of $P$, so the compactness of $P$ implies geometric ergodicity
of the corresponding Markov chain. (An operator is called compact if it maps any
bounded set to a relatively compact set.) Other than this implication, not much is
known about the relationship between compactness and geometric/uniform ergodicity.
Indeed, very few MCMC operators are compact: Chan and Geyer (1994, p. 1755) show
that a Metropolis-Hastings algorithm with non-zero rejection probabilities cannot be
compact. However, if $P$ is Hilbert-Schmidt (see Section 1.1.3 for the definition), then we
have a closed-form expression for the chi-square distance between $P^n(x, \cdot)$ and $\pi$
(Diaconis et al., 2008). Unfortunately, in general, checking compactness is also difficult.
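For a small reversible chain, $\|P\|$ on $L_0^2(\pi)$ can be computed exactly by symmetrizing the transition matrix, since conjugating by $\mathrm{diag}(\pi)^{1/2}$ preserves the spectrum and the eigenvalue 1 belongs to the constant functions. The following Python sketch is our own illustration (numpy assumed available); the three-state chain is an arbitrary reversible example.

```python
import numpy as np

def l20_norm(P, pi):
    """Norm of a reversible Markov operator restricted to L^2_0(pi):
    the second-largest absolute eigenvalue of the symmetrized matrix
    D^{1/2} P D^{-1/2}, where D = diag(pi)."""
    d = np.sqrt(np.asarray(pi))
    m_star = d[:, None] * np.asarray(P) / d[None, :]
    eigs = np.sort(np.abs(np.linalg.eigvalsh(m_star)))
    return eigs[-2]

# A reversible birth-death chain on three states with stationary
# distribution (1/4, 1/2, 1/4); its eigenvalues are 1, 1/2, and 0.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
norm = l20_norm(P, pi)  # = 0.5 < 1, so this chain is geometrically ergodic
```

Since the chain is reversible and its norm on $L_0^2(\pi)$ is strictly less than 1, geometric ergodicity follows from the equivalence cited above.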
1.1.3 Cp Class
The Hilbert-Schmidt Markov operators are a subset of the compact Markov operators,
and the trace class Markov operators are a subset of the Hilbert-Schmidt Markov
operators. (Definitions are provided later in this section.) Fortunately, there are simple
techniques for checking whether or not a given Markov operator is Hilbert-Schmidt or
trace class. This provides an avenue for establishing compactness and hence geometric
ergodicity. Moreover, in practice, it is sometimes easier to show that the trace class (or
Hilbert-Schmidt) condition is satisfied than it is to construct a geometric drift condition.
Some recent studies have shown that several practically relevant Markov operators are
trace class (Choi and Hobert, 2013; Jung and Hobert, 2014; Khare and Hobert, 2011;
Pal et al., 2015).
The following facts hold for both $L^2(\pi)$ and $L_0^2(\pi)$, so we discuss only $L^2(\pi)$.
Denote the dimension of the Hilbert space $L^2(\pi)$ by $\dim L^2(\pi)$ and the set of all bounded
linear operators from $L^2(\pi)$ to itself by $B(L^2(\pi))$. For definitions see Ringrose (1971). For
$1 \le p < \infty$, denote by $\mathcal{C}_p$ the set of all operators $T$ in $B(L^2(\pi))$ such that
$$\sum_{j \in J} |\langle T\varphi_j, \varphi_j \rangle|^p < \infty,$$
where $\{\varphi_j : j \in J\}$ is any orthonormal system of $L^2(\pi)$ such that the cardinality of $J$ is
less than or equal to $\dim L^2(\pi)$, and the above sum is an unordered sum.
From Theorem 1.8.7 in Ringrose (1971), each operator in $\mathcal{C}_p$ is compact. If $T$ is a
compact self-adjoint operator, then $T$ has countably many non-zero real eigenvalues $\{\lambda_n\}$,
counting multiplicity, and $T \in \mathcal{C}_p$ if and only if $\sum_n |\lambda_n|^p < \infty$ (see Ringrose, 1971, p. 86).
Below we suppose that $P(x, dx') = k(x, x') \mu(dx')$ and $\pi(dx) = \pi(x) \mu(dx)$ for some
positive measure $\mu$ (we use $\pi$ to denote both the measure and its density function), and
that $\pi(x)$ is positive $\mu$-almost everywhere.
$\mathcal{C}_2$ is called the Hilbert-Schmidt class of operators on $L^2(\pi)$. Given any orthonormal
bases $\{\varphi_j : j \in J\}$ and $\{\psi_j : j \in J\}$ in $L^2(\pi)$, $T \in \mathcal{C}_2$ if and only if
$\sum_{i \in J} \sum_{j \in J} \langle T\varphi_i, \psi_j \rangle^2 = \sum_{j \in J} \|T\varphi_j\|^2 < \infty$ (see Ringrose, 1971, p. 102). So these sums do not depend on the
specific orthonormal bases $\{\varphi_j : j \in J\}$ and $\{\psi_j : j \in J\}$. $P$ is Hilbert-Schmidt if and only
if (see Ringrose, 1971, p. 104)
$$\int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{k(x, x')}{\pi(x')} \right]^2 \pi(x) \pi(x') \, \mu(dx) \, \mu(dx') < \infty.$$
When $P \in \mathcal{C}_2$ and $P$ is self-adjoint, $P$ has countably many non-zero real eigenvalues $\{\beta_n\}$,
counting multiplicity, and hence
$$\sum_{i \in J} \sum_{j \in J} \langle P\varphi_i, \psi_j \rangle^2 = \sum_{j \in J} \|P\varphi_j\|^2 = \sum_n \beta_n^2 = \int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{k(x, x')}{\pi(x')} \right]^2 \pi(x) \pi(x') \, \mu(dx) \, \mu(dx')$$
for any orthonormal bases $\{\varphi_j : j \in J\}$ and $\{\psi_j : j \in J\}$ (see Ringrose, 1971, p. 107).
$\mathcal{C}_1$ is called the trace class of operators on $L^2(\pi)$. Given any orthonormal basis
$\{\varphi_j : j \in J\}$ in $L^2(\pi)$, if $T \in \mathcal{C}_1$ then $\sum_{j \in J} \langle T\varphi_j, \varphi_j \rangle$ exists and does not depend on the
specific orthonormal basis $\{\varphi_j : j \in J\}$ (see Ringrose, 1971, p. 82). If $P$ is a positive
self-adjoint operator, then $P$ is trace class if and only if (personal communication with K.
Khare)
$$\sum_n \beta_n = \int_{\mathsf{X}} k(x, x) \, \mu(dx) < \infty.$$
Because $\mathcal{C}_p \subset \mathcal{C}_q$ if $1 \le p \le q \le \infty$ (see Ringrose, 1971, p. 76), the trace class is a subset
of the Hilbert-Schmidt class. Note that $0 < \beta_n \le 1$ when $P$ is positive. Therefore, if $P$ is a
positive self-adjoint operator, then
$$H := \int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{k(x, x')}{\pi(x')} \right]^2 \pi(x) \pi(x') \, \mu(dx) \, \mu(dx') \le \int_{\mathsf{X}} k(x, x) \, \mu(dx).$$
We now provide a direct proof of this fact. Since $P$ is a positive self-adjoint operator, there
exists another Markov transition kernel $s(x, x')$, also self-adjoint with respect to $\pi$,
such that (see Kadison and Ringrose, 1997, p. 247 or Li, 2003, p. 37)
$$k(x, x') = \int_{\mathsf{X}} s(x'', x') s(x, x'') \, \mu(dx'').$$
By Jensen's inequality,
$$[k(x, x')]^2 = \left[ \int_{\mathsf{X}} s(x'', x') s(x, x'') \, \mu(dx'') \right]^2 \le \int_{\mathsf{X}} s^2(x'', x') s(x, x'') \, \mu(dx'').$$
Now,
$$H = \int_{\mathsf{X}} \int_{\mathsf{X}} \frac{\pi(x)}{\pi(x')} k^2(x, x') \, \mu(dx) \, \mu(dx') \le \int_{\mathsf{X}} \int_{\mathsf{X}} \int_{\mathsf{X}} \frac{\pi(x)}{\pi(x')} s^2(x'', x') s(x, x'') \, \mu(dx'') \, \mu(dx) \, \mu(dx').$$
But $s(x, x'') \pi(x) = s(x'', x) \pi(x'')$, so
$$H \le \int_{\mathsf{X}} \int_{\mathsf{X}} \int_{\mathsf{X}} \frac{\pi(x'')}{\pi(x')} s^2(x'', x') s(x'', x) \, \mu(dx'') \, \mu(dx) \, \mu(dx') = \int_{\mathsf{X}} \int_{\mathsf{X}} \frac{\pi(x'')}{\pi(x')} s^2(x'', x') \, \mu(dx'') \, \mu(dx').$$
And using the self-adjointness again, we have $s(x'', x') \pi(x'') = s(x', x'') \pi(x')$, so
$$H \le \int_{\mathsf{X}} \int_{\mathsf{X}} s(x'', x') s(x', x'') \, \mu(dx'') \, \mu(dx') = \int_{\mathsf{X}} k(x', x') \, \mu(dx').$$
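In the finite-state case (counting measure $\mu$, kernel $k(x, x') = m_{xx'}$), both criteria reduce to elementary matrix computations, which makes the identities above easy to verify. A small Python sketch (our own illustration; numpy assumed available) on a toy reversible chain:

```python
import numpy as np

def hs_sum(P, pi):
    """Finite-state analogue of the Hilbert-Schmidt integral: the sum over
    x, x' of [k(x, x') / pi(x')]^2 pi(x) pi(x'), which equals the sum of
    squared eigenvalues for a reversible chain."""
    P, pi = np.asarray(P), np.asarray(pi)
    return float(np.sum(P ** 2 * pi[:, None] / pi[None, :]))

def trace_sum(P):
    """Finite-state analogue of the trace-class integral of k(x, x):
    the trace, i.e. the sum of the eigenvalues."""
    return float(np.trace(np.asarray(P)))

# Reversible chain with eigenvalues 1, 1/2, 0 and stationary pmf (1/4, 1/2, 1/4):
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
# hs_sum(P, pi) = 1 + 1/4 + 0 = 1.25 and trace_sum(P) = 1 + 1/2 + 0 = 1.5,
# consistent with the inequality H <= integral of k(x, x).
```

Here the Hilbert-Schmidt sum (1.25) is indeed bounded by the trace (1.5), as the inequality above requires for a positive self-adjoint operator.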
1.1.4 Gibbs Sampler
We now describe the relationship between the Markov chains associated with
a Gibbs sampler (GS). Suppose that $(X, Y)$ has a joint distribution $\pi(dx, dy) =
\pi(x, y) \mu(dx) \nu(dy)$ on a countably generated measure space $(\mathsf{X} \times \mathsf{Y}, \mathcal{F} \otimes \mathcal{G})$. Also,
suppose that it is easy to simulate from the conditional kernels $\pi_{X|Y}(y, dx) = \pi_{X|Y}(x|y) \mu(dx)$
and $\pi_{Y|X}(x, dy) = \pi_{Y|X}(y|x) \nu(dy)$. Denote the marginal distribution of $X$ by $\pi_X$. The GS
for the above joint distribution is based on a Markov chain $\{(X_n, Y_n)\}_{n=0}^\infty$, which is called
the $(x, y)$-chain or the Gibbs Markov chain. If the current state is $(X_n, Y_n) = (x, y)$, we
simulate the next state $(X_{n+1}, Y_{n+1})$ by:

1. Draw $Y_{n+1} \sim \pi_{Y|X}(\cdot|x)$. Call the observed value $y'$.
2. Draw $X_{n+1} \sim \pi_{X|Y}(\cdot|y')$.
The $(x, y)$-chain has the Markov transition kernel
$$P((x, y), d(x', y')) := k((x, y), (x', y')) \mu(dx') \nu(dy') = \pi_{X|Y}(x'|y') \pi_{Y|X}(y'|x) \mu(dx') \nu(dy')$$
and the stationary distribution $\pi$. $\{X_n\}_{n=0}^\infty$, which is called the $x$-chain or the marginal
chain, is also a Markov chain, with Markov transition kernel
$$P_X(x, dx') := k_X(x, x') \mu(dx') = \left( \int_{\mathsf{Y}} \pi_{X|Y}(x'|y) \pi_{Y|X}(y|x) \, \nu(dy) \right) \mu(dx')$$
and the stationary distribution $\pi_X$. $P_X$ is a positive self-adjoint operator on $L^2(\pi_X)$. The
Gibbs Markov chain and its marginal chain are closely related: if one of them is
geometrically ergodic then so is the other (Diaconis et al., 2008).
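The two-step update is easy to code. As a concrete illustration (a classical toy example, not one of the samplers analyzed in this dissertation), here is a minimal Python sketch of the GS for a standard bivariate normal with correlation $\rho$, where both conditionals are normal: $X \mid Y = y \sim N(\rho y, 1 - \rho^2)$ and $Y \mid X = x \sim N(\rho x, 1 - \rho^2)$.

```python
import random

def gibbs_step(x, rho, rng):
    """One sweep of the two-step GS for a standard bivariate normal
    with correlation rho: draw Y' | X = x, then X' | Y = y'."""
    sd = (1.0 - rho * rho) ** 0.5
    y_new = rng.gauss(rho * x, sd)      # step 1: Y_{n+1} ~ pi_{Y|X}(. | x)
    x_new = rng.gauss(rho * y_new, sd)  # step 2: X_{n+1} ~ pi_{X|Y}(. | y')
    return x_new, y_new

rng = random.Random(7)
x, draws = 0.0, []
for _ in range(50_000):
    x, _y = gibbs_step(x, 0.5, rng)
    draws.append(x)
# The recorded x-values form the marginal x-chain; its draws should
# look marginally N(0, 1) once the chain is near stationarity.
```

Recording only the $x$-values, as above, gives a realization of the marginal $x$-chain discussed in the text.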
The $x$-chain is trace class if and only if the $(x, y)$-chain is Hilbert-Schmidt, because
$$\begin{aligned}
&\int_{\mathsf{Y}} \int_{\mathsf{X}} \int_{\mathsf{Y}} \int_{\mathsf{X}} \left[ \frac{k((x, y), (x', y'))}{\pi(x', y')} \right]^2 \pi(x, y) \pi(x', y') \, \mu(dx) \, \nu(dy) \, \mu(dx') \, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \int_{\mathsf{Y}} \int_{\mathsf{X}} \left[ \frac{\pi_{X|Y}(x'|y') \pi_{Y|X}(y'|x)}{\pi(x', y')} \right]^2 \pi(x, y) \pi(x', y') \, \mu(dx) \, \nu(dy) \, \mu(dx') \, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{\pi_{X|Y}(x'|y') \pi_{Y|X}(y'|x)}{\pi(x', y')} \right]^2 \pi_X(x) \pi(x', y') \, \mu(dx) \, \mu(dx') \, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \int_{\mathsf{X}} \left[ \frac{\pi(x, y')}{\pi_Y(y') \pi_X(x)} \right]^2 \pi_X(x) \pi(x', y') \, \mu(dx) \, \mu(dx') \, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \left[ \frac{\pi(x, y')}{\pi_Y(y') \pi_X(x)} \right]^2 \pi_X(x) \pi_Y(y') \, \mu(dx) \, \nu(dy') \\
&\quad = \int_{\mathsf{Y}} \int_{\mathsf{X}} \pi_{X|Y}(x|y') \pi_{Y|X}(y'|x) \, \mu(dx) \, \nu(dy') \\
&\quad = \int_{\mathsf{X}} k_X(x, x) \, \mu(dx).
\end{aligned}$$
1.2 Overview of the Remaining Chapters
1.2.1 Chapter 2
The GS is an indispensable tool for exploring intractable probability distributions
(see, e.g., Brooks et al., 2011), but there is still very little known about its convergence
properties. In particular, with the exception of a few scattered examples, the qualitative
convergence rates of most GSs that are used in practice are unknown. That is, it is
unknown whether these Gibbs Markov chains are uniformly ergodic, geometrically
ergodic, or sub-geometrically ergodic. This is a serious practical problem because the
techniques that are typically used to process GS simulations are predicated on the
existence of a CLT, which may fail to hold if the Markov chain in question is sub-geometric
(see, e.g., Roberts and Rosenthal, 1998; Flegal et al., 2008).
The main reason why so little is known about the GSs that are used in practice
is that the underlying Markov chains are complex, high dimensional, and difficult to
analyze. This has led to the study of families of GSs that are not practically relevant, in
the sense that their invariant distributions are not intractable, but are still challenging
from a convergence rate analysis standpoint. For example, Diaconis et al. (2008)
performed a thorough spectral analysis of the GSs that result when a density in the
one-parameter exponential family is combined with the natural conjugate prior. Other
examples include Tan et al. (2013), Jovanovski and Madras (2014) and Hobert and
Khare (2014). Note that all finite state space, irreducible, and aperiodic Markov chains
are uniformly ergodic (see Billingsley, 1995, p. 131). Here we consider the family
introduced and studied in Tan (2009) and Tan et al. (2013). These GSs are, in a sense,
the simplest ones whose state space is not finite. We now describe this family.
Let $\{a_i\}_{i=1}^\infty$ and $\{b_i\}_{i=1}^\infty$ be two sequences of strictly positive real numbers such that
$\sum_{i=1}^\infty a_i + \sum_{i=1}^\infty b_i = 1$. Denote by $\mathbb{N}$ the set of natural numbers. Let $(U, V)$ be a discrete
bivariate random vector supported on $\mathbb{N} \times \mathbb{N}$ whose joint probability mass function (pmf)
is given by (see Figure 1-1)
$$\Pr(U = i, V = j) = \begin{cases} a_i & \text{if } i = j \text{ and } j = 1, 2, \dots \\ b_j & \text{if } i = j + 1 \text{ and } j = 1, 2, \dots \\ 0 & \text{otherwise.} \end{cases} \tag{1-1}$$
For convenience, we define $b_0 = 0$ and $a_0 = 1$. Let $\{(X_n, Y_n)\}_{n=0}^\infty$ be a Markov chain
on $\mathbb{N} \times \mathbb{N}$ such that
$$\Pr(X_{n+1} = i', Y_{n+1} = j' \mid X_n = i, Y_n = j) = \Pr(V = j' \mid U = i) \Pr(U = i' \mid V = j'), \tag{1-2}$$
Figure 1-1. Pr(U = i, V = j). [Figure: the support of the joint pmf, with mass $a_i$ at $(i, i)$ on the diagonal and mass $b_j$ at $(j + 1, j)$ on the subdiagonal.]
where, for any $j \in \mathbb{N}$,
$$\Pr(U = i \mid V = j) = \frac{a_j}{a_j + b_j} I(i = j) + \frac{b_j}{a_j + b_j} I(i = j + 1),$$
and, likewise, for any $i \in \mathbb{N}$,
$$\Pr(V = j \mid U = i) = \frac{a_i}{a_i + b_{i-1}} I(j = i) + \frac{b_{i-1}}{a_i + b_{i-1}} I(j = i - 1).$$
It is easy to see that this Markov chain is irreducible, aperiodic and Harris positive
recurrent, with stationary mass function (1–1).
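For illustration, this chain is straightforward to simulate once the $a_i$s and $b_i$s are specified. In the Python sketch below, the choice $a_i = b_i = 2^{-(i+1)}$ (with $b_0 = 0$) is our own illustrative example satisfying $\sum a_i + \sum b_i = 1$; the marginal pmf of $U$ under (1-1) is $\Pr(U = i) = a_i + b_{i-1}$, which the simulation can be checked against.

```python
import random

def a(i):
    return 2.0 ** -(i + 1)                      # a_i = 2^{-(i+1)}, i >= 1

def b(i):
    return 0.0 if i == 0 else 2.0 ** -(i + 1)   # b_0 = 0, b_i = 2^{-(i+1)}

def gibbs_step(i, rng):
    """One sweep of the toy GS: draw V | U = i, then U | V = j."""
    j = i if rng.random() < a(i) / (a(i) + b(i - 1)) else i - 1
    return j if rng.random() < a(j) / (a(j) + b(j)) else j + 1

rng = random.Random(1)
u, counts = 1, {}
for _ in range(100_000):
    u = gibbs_step(u, rng)
    counts[u] = counts.get(u, 0) + 1
# Marginally, Pr(U = 1) = a_1 = 1/4 and Pr(U = 2) = a_2 + b_1 = 3/8.
```

The empirical frequencies of the simulated $U$-values match the stationary marginal pmf, as ergodicity of the chain guarantees.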
It would be a trivial matter to simulate independent and identically distributed (iid)
random vectors from the joint pmf (1–1), so the above GS would never be used to
explore (1–1). However, it is still worthy of analysis since, as we now explain, these
GSs are the simplest ones that are not automatically uniformly ergodic. First, it is clear
that one cannot construct a GS using a single random variable, so a bivariate random
vector is the simplest place to start. (In the trivial case where the two components
are independent, the GS converges in one iteration.) Second, if either (or both) of the
components of a bivariate random vector have finite support, then the corresponding GS
is automatically uniformly ergodic (see, e.g., Diebolt and Robert, 1994). Furthermore,
if ai or bi is 0 for some i, then the Gibbs chain is not irreducible. So, given that both
variables must have infinite support in order for the corresponding GS to be interesting,
the support of (U, V ) seems quite minimal. We note that a member of this family of
Gibbs samplers was recently used as a counterexample in Łatuszynski et al. (2013).
Tan et al. (2013) developed conditions on the $a_i$s and $b_i$s that guarantee geometric
ergodicity. The marginal $x$-chain $\{X_n\}_{n=0}^\infty$ is a special case of a birth-death chain. This
leads us to the study of general birth-death chains. We use the same notation $X =
\{X_n\}_{n=0}^\infty$ to denote a birth-death Markov chain with state space $\mathbb{N}$ and Markov transition
matrix (Mtm) given by
$$M = \begin{pmatrix} r_1 & p_1 & 0 & 0 & 0 & \cdots \\ q_2 & r_2 & p_2 & 0 & 0 & \cdots \\ 0 & q_3 & r_3 & p_3 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
Of course, the $(i, j)$th entry of $M$ represents $\Pr(X_{n+1} = j \mid X_n = i)$, so $q_i + r_i + p_i = 1$
for all $i \in \mathbb{N}$, where $q_1 \equiv 0$. In Section 2.2, we characterize compactness of the
Markov operator $M$ of the birth-death chain $X$. Using the result of Section 2.2, we can
also characterize compactness of the Markov operator of the marginal $x$-chain of
$\{(X_n, Y_n)\}_{n=0}^\infty$ in Section 2.3. Comparing Tan's conditions for geometric ergodicity with
our conditions for compactness shows that compactness is a much stronger property.
Finally, we show that birth-death chains cannot be uniformly ergodic in Section 2.4.
1.2.2 Chapter 3
Consider the frequentist unbalanced one-way random effects model (Searle et al.,
1992)
$$y_{ij} = \theta_i + \varepsilon_{ij}, \qquad i = 1, 2, \dots, K, \quad j = 1, 2, \dots, m_i, \tag{1-3}$$
where the random effects $\theta_i$s are iid $N(\mu, \lambda_\theta^{-1})$, the white noise terms $\varepsilon_{ij}$s are iid $N(0, \lambda_e^{-1})$,
the $\theta_i$s and $\varepsilon_{ij}$s are independent for all $i$ and $j$, $\mu$, $\lambda_\theta$, and $\lambda_e$ are unknown parameters, and
$K \ge 2$ and $m_i \ge 2$ for all $i = 1, 2, \dots, K$ are known constants.
Given positive numbers $\alpha$ and $\beta$, we denote by $X \sim \mathrm{Gamma}(\alpha, \beta)$ the random
variable with density function $f(x) = (\beta^\alpha / \Gamma(\alpha)) x^{\alpha - 1} e^{-x\beta} I\{x > 0\}$. Here $\mathbf{1}$ is a $K \times 1$ column
vector of ones, $I$ is the $K \times K$ identity matrix, $\theta = (\theta_1, \dots, \theta_K)^T$ is a $K \times 1$ column vector,
$\zeta = (\theta_1, \dots, \theta_K, \mu)^T$ is a $(K + 1) \times 1$ column vector, $\lambda = (\lambda_\theta, \lambda_e)^T$ is a $2 \times 1$ column vector,
and $y$ denotes all of the $y_{ij}$s.
A Bayesian version of the frequentist model (1-3) with proper priors is
$$y_{ij} \mid \theta, \lambda_e, \lambda_\theta, \mu \sim N(\theta_i, \lambda_e^{-1}), \qquad i = 1, 2, \dots, K, \quad j = 1, 2, \dots, m_i,$$
$$\theta \mid \mu, \lambda_\theta, \lambda_e \sim N(\mathbf{1}\mu, I\lambda_\theta^{-1}),$$
where $\lambda_e \sim \mathrm{Gamma}(a_2, b_2)$, $\mu \sim N(\mu_0, \lambda_0^{-1})$, $\lambda_\theta \sim \mathrm{Gamma}(a_1, b_1)$, the $y_{ij}$s are independent
given $\theta$, $\lambda_e$, $\lambda_\theta$, and $\mu$, $\mu_0$ is a known constant, $a_1$, $a_2$, $b_1$, $b_2$, and $\lambda_0$ are strictly positive
known constants, and $\mu$, $\lambda_\theta$, and $\lambda_e$ are mutually independent. The posterior distribution
is $\pi(\zeta, \lambda \mid y)$. To simplify notation, we suppress the dependence on $y$, e.g.,
$\pi(\zeta, \lambda) := \pi(\zeta, \lambda \mid y)$. Hobert and Geyer (1998) studied a Block GS for the above model,
which is based on $\pi(\zeta|\lambda)$ and $\pi(\lambda|\zeta)$, and established that it is geometrically ergodic
under some simple conditions. In Section 3.3, we show that the marginal $\lambda$-chain of
the Block Gibbs chain in Hobert and Geyer (1998) is never Hilbert-Schmidt. This is a
negative result, but it tells us something about the spectrum of the Markov operator:
the Markov operator of that marginal $\lambda$-chain is either not compact, or
compact with an infinite sum of the squares of its eigenvalues.
A Bayesian version of the frequentist model (1-3) with improper priors is (Tan and
Hobert, 2009)
$$y_{ij} \mid \theta, \mu, \lambda_\theta, \lambda_e \sim N(\theta_i, \lambda_e^{-1}), \qquad i = 1, 2, \dots, K, \quad j = 1, 2, \dots, m_i,$$
$$\theta \mid \mu, \lambda_\theta, \lambda_e \sim N(\mathbf{1}\mu, I\lambda_\theta^{-1}),$$
$$f(\lambda_\theta, \lambda_e, \mu) \propto \lambda_\theta^{a-1} \lambda_e^{b-1}, \qquad \lambda_\theta, \lambda_e > 0,$$
where the $y_{ij}$s are independent given $\theta$, $\lambda_e$, $\lambda_\theta$, and $\mu$, and $a$ and $b$ are known constants.
Tan and Hobert (2009) studied a Block GS for the above model, based on $\pi(\zeta|\lambda)$ and
$\pi(\lambda|\zeta)$, and established that it is geometrically ergodic under some simple conditions.
Since improper priors are unlikely to improve matters, we expected that Tan and
Hobert's (2009) chains would behave no better than Hobert and Geyer's (1998) chains,
so we do not study the spectral properties of Tan and Hobert's (2009) Markov chains.
However, a different parametrization might change things. Therefore, we consider an
alternative Block GS that is based on $\pi(\theta|\mu, \lambda)$ and $\pi(\mu, \lambda|\theta)$ (personal communication
with A. Tan). It is currently an open problem whether this alternative Block Gibbs chain
is geometrically ergodic. In Section 3.4, we show that this alternative Block Gibbs chain
is not trace class in most cases.
1.2.3 Chapter 4
In this chapter, we return to the birth-death chains studied in Chapter 2. van Doorn
and Schrijner (1995), Mao (2010), and Tan et al. (2013) have studied the geometric
ergodicity of birth-death chains. van Doorn and Schrijner (1995) used orthogonal
polynomials. Theorem 3.4 in van Doorn and Schrijner (1995) gives a necessary and
sufficient condition for the geometric ergodicity of birth-death chains, but it is very
difficult to verify in practice, so their Theorem 3.5 develops a more practical necessary
condition. Tan et al. (2013) studied a subset of the set of all general birth-death chains;
Lemma 3 in Tan et al. (2013) suggested the idea behind part of our conditions for the
geometric ergodicity of birth-death chains. Theorem 4.3 in Mao (2010) used both the
spectral gap and a drift condition to obtain what is, among known results, the most
practical necessary and sufficient condition for the geometric ergodicity of birth-death
chains. That result is quite similar to our main result in this chapter. We use the drift
condition method, which differs from Mao's (2010) method, to find a necessary and
sufficient condition for the geometric ergodicity of birth-death chains that is simpler than
the condition in Mao (2010). We study the properties of all possible drift functions and
find a necessary condition for the geometric ergodicity of birth-death chains; that
condition is strong enough to also be sufficient. Our method can be applied to other
models, such as a family of random walks on $\mathbb{Z}$, the set of integers. We do not know
whether Mao's (2010) method can be applied to those random walks on $\mathbb{Z}$.
Here is what we do in this chapter. Section 4.2 reviews the main results of van
Doorn and Schrijner (1995) and Mao (2010) on the geometric ergodicity of birth-death
chains. Section 4.3.1 develops a simple necessary and sufficient condition for the
geometric ergodicity of birth-death chains. Section 4.3.2 uses the results of Section 4.3.1
to study the toy GS from Tan et al. (2013). Section 4.3.3 uses the same method as
Section 4.3.1 to study a family of random walks on $\mathbb{Z}$. Section 4.3.4 gives some
related results on birth-death chains.
CHAPTER 2
CHARACTERIZATION OF COMPACTNESS FOR BIRTH-DEATH MARKOV OPERATORS WITH APPLICATIONS TO GIBBS SAMPLING
2.1 Summary
In this chapter, we characterize compactness of the Markov operator of the birth-death
chain. We then apply this result to study Tan's (2009) GS Markov chains, which were
introduced in Section 1.2.1. In particular, we look at the relationship between
compactness and geometric ergodicity. Finally, we show that birth-death chains cannot be
uniformly ergodic.
2.2 Compactness of Birth-Death Markov Operators
Recall from Section 1.2.1 that $X = \{X_n\}_{n=0}^\infty$ is a birth-death Markov chain with state
space $\mathbb{N}$ and Markov transition matrix (Mtm) given by
$$M = \begin{pmatrix} r_1 & p_1 & 0 & 0 & 0 & \cdots \\ q_2 & r_2 & p_2 & 0 & 0 & \cdots \\ 0 & q_3 & r_3 & p_3 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
It is well known (see, e.g., Karlin and Taylor, 1975, p. 108) that the birth-death chain $X$ is
irreducible, aperiodic, and positive recurrent if and only if the following three conditions
hold: (i) $p_i > 0$ for all $i \in \mathbb{N}$, (ii) $r_i > 0$ for some $i \in \mathbb{N}$, and (iii) $c = \sum_{i=1}^\infty c_i < \infty$, where
$$c_1 = 1, \qquad c_i = \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_i}, \quad i = 2, 3, \dots$$
We note that Karlin and Taylor (1975) actually refer to $X$ as a random walk chain rather
than a birth-death chain, but the latter appears to be more standard. When the three
regularity conditions hold, there exists a set of (strictly positive) stationary probabilities
$\pi = \{\pi_i\}_{i=1}^\infty$ with $\pi_i = c_i / c$ for $i = 1, 2, \dots$ Note that $\pi_{i+1} / \pi_i = p_i / q_{i+1}$ for all $i \in \mathbb{N}$. Also, it
is easy to see that $X$ is reversible; i.e., $\pi_i m_{ij} = \pi_j m_{ji}$ for all $i, j \in \mathbb{N}$.
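As a small numerical illustration (our own; the constant rates below are an arbitrary choice), the $c_i$s can be accumulated recursively, and the resulting probabilities satisfy $\pi_{i+1}/\pi_i = p_i/q_{i+1}$. A Python sketch, truncating the state space for computation:

```python
def bd_stationary(p, q, n):
    """Stationary probabilities pi_i = c_i / c of a birth-death chain,
    truncated to states 1..n (a good approximation when c converges fast).
    Here c_1 = 1 and c_i = c_{i-1} * p_{i-1} / q_i for i >= 2;
    p[0], p[1], ... stand for p_1, p_2, ... and q[0], q[1], ... for q_2, q_3, ...
    """
    c = [1.0]
    for i in range(2, n + 1):
        c.append(c[-1] * p[i - 2] / q[i - 2])
    total = sum(c)
    return [ci / total for ci in c]

# Constant rates p_i = 1/4 and q_{i+1} = 1/2 give c_i = (1/2)^{i-1} and c = 2,
# so pi_i is essentially 2^{-i} (up to a tiny truncation error), and
# pi_{i+1} / pi_i = p_i / q_{i+1} = 1/2 exactly.
n = 30
pi = bd_stationary([0.25] * n, [0.5] * n, n)
```

The ratio $\pi_{i+1}/\pi_i = p_i/q_{i+1}$ holds exactly by construction, since each $c_i$ is obtained from $c_{i-1}$ by multiplying by precisely that ratio.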
Denote by $\mathbb{R}$ the set of real numbers. Let $L_0^2(\pi)$ denote the set of functions $f : \mathbb{N} \to \mathbb{R}$ such
that
$$\sum_{i=1}^\infty f_i^2 \pi_i < \infty \qquad \text{and} \qquad \sum_{i=1}^\infty f_i \pi_i = 0.$$
$L_0^2(\pi)$ is a Hilbert space with the inner product
$$\langle f, g \rangle = \sum_{i=1}^\infty f_i g_i \pi_i.$$
The Mtm $M$ defines an operator, which we also call $M$, that maps $f = \{f_i\}_{i=1}^\infty \in L_0^2(\pi)$ to
$Mf = \{(Mf)_i\}_{i=1}^\infty \in L_0^2(\pi)$, where
$$(Mf)_i = \sum_{j=1}^\infty m_{ij} f_j.$$
Our main result for the birth-death chain is a simple characterization of compactness.

Proposition 2.1. The operator $M$ is compact in $L_0^2(\pi)$ if and only if $r_i \to 0$ and $p_i \to 0$.

Proof. Lemma A.1 in the Appendix shows that compactness in $L^2(\pi)$ is equivalent to
compactness in $L_0^2(\pi)$. Hence, it suffices to prove that $M$ is a compact operator in $L^2(\pi)$
if and only if $r_i \to 0$ and $p_i \to 0$. For $i \in \mathbb{N}$, let $e^{(i)} \in L^2(\pi)$ denote the vector that has
$i$th coordinate equal to $1/\sqrt{\pi_i}$, and has every other coordinate equal to 0. Note that
$\langle e^{(i)}, e^{(j)} \rangle$ equals 1 if $i = j$, and equals 0 otherwise. Hence, the $e^{(i)}$s form an orthonormal
basis of $L^2(\pi)$. Now let $M^*$ denote the matrix representation of the linear operator $M$ in
$L^2(\pi)$ with respect to this basis (see, e.g., Akhiezer and Glazman, 1993, p. 48). That is,
$$m^*_{ij} = \sqrt{\frac{\pi_i}{\pi_j}}\, m_{ij}.$$
Thus, we have
$$M^* = \begin{pmatrix} r_1 & \sqrt{\frac{\pi_1}{\pi_2}}\, p_1 & 0 & 0 & 0 & \cdots \\ \sqrt{\frac{\pi_2}{\pi_1}}\, q_2 & r_2 & \sqrt{\frac{\pi_2}{\pi_3}}\, p_2 & 0 & 0 & \cdots \\ 0 & \sqrt{\frac{\pi_3}{\pi_2}}\, q_3 & r_3 & \sqrt{\frac{\pi_3}{\pi_4}}\, p_3 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
(Note the correspondence between $M^*$ and the finite-dimensional matrix $D^{-1}PD$ on
page 42 of Diaconis and Stroock (1991).) Now, since $m^*_{ij} = 0$ whenever $|i - j| > 2$,
the results in Section 28 of Akhiezer and Glazman (1993) imply that $M$ is a compact
operator if and only if
$$\lim_{i, j \to \infty} m^*_{ij} = 0. \tag{2-1}$$
Now, using the reversibility of the birth-death chain, we have
$$m^*_{ji} = \sqrt{\frac{\pi_j}{\pi_i}}\, m_{ji} = \sqrt{\frac{\pi_j}{\pi_i}} \cdot \frac{\pi_i m_{ij}}{\pi_j} = \sqrt{\frac{\pi_i}{\pi_j}}\, m_{ij} = m^*_{ij},$$
so $M^*$ is a symmetric matrix. Moreover, since $\pi_{i+1} / \pi_i = p_i / q_{i+1}$, we have
$$\sqrt{\frac{\pi_i}{\pi_{i+1}}}\, p_i = \sqrt{\frac{\pi_{i+1}}{\pi_i}}\, q_{i+1} = \sqrt{p_i q_{i+1}}.$$
Thus, (2–1) holds if and only if $r_i \to 0$ and $\sqrt{p_i q_{i+1}} \to 0$. Clearly, if $r_i \to 0$ and $p_i \to 0$, then $r_i \to 0$ and $\sqrt{p_i q_{i+1}} \to 0$. To prove the other direction, assume that $r_i \to 0$ and $\sqrt{p_i q_{i+1}} \to 0$. Assume also that $\limsup p_i > 0$. Then there exists an $\varepsilon \in (0,1)$ such that $\limsup p_i > \varepsilon$. Fix $\delta > 0$ and choose $N$ such that $r_i$ and $p_i q_{i+1}$ are both less than $\delta$ for all $i > N$. Since $\limsup p_i > \varepsilon$, there exists $j > N$ such that $p_j > \varepsilon$. Then because $j > N$, we have $q_{j+1} < \delta/\varepsilon$ and $r_{j+1} < \delta$. Thus, $p_{j+1} = 1 - r_{j+1} - q_{j+1} > 1 - \delta - \delta/\varepsilon$. Assuming $\varepsilon$ is sufficiently small, and taking $\delta = \varepsilon^2$, we have $p_{j+1} > 1 - \varepsilon^2 - \varepsilon > \varepsilon$. By repeating the argument, we get that $p_i > \varepsilon$ for all $i \geq j$. But $p_i q_{i+1} \to 0$, so we must have $q_i \to 0$. Consequently, $\pi_i/\pi_{i+1} = q_{i+1}/p_i \to 0$. But this implies that $\pi_{i+1} > \pi_i$ for all large $i$, which contradicts the fact that $\pi_i \to 0$. We conclude that $p_i \to 0$.
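The symmetrization identity $\sqrt{\pi_i/\pi_{i+1}}\,p_i = \sqrt{p_i q_{i+1}}$ used in the proof is easy to check numerically. The sketch below does so for a hypothetical birth-death chain ($p_i = 1/(2(i+1))$, $q_i = 1/2$ for $i \ge 2$), chosen only for illustration; note that for this chain $r_i \to 1/2 \ne 0$, so Proposition 2.1 says the corresponding operator is not compact.

```python
import numpy as np

# Hypothetical birth-death chain: p_i -> 0 but r_i -> 1/2, so M is NOT compact.
n = 30
i = np.arange(1, n + 1)
p = 1.0 / (2.0 * (i + 1))                          # birth probabilities p_i
q = np.concatenate(([0.0], np.full(n - 1, 0.5)))   # q_1 = 0, q_i = 1/2 for i >= 2
r = 1.0 - p - q                                    # holding probabilities r_i

# Unnormalized stationary probabilities via pi_{i+1}/pi_i = p_i/q_{i+1}
c = np.ones(n)
for k in range(n - 1):
    c[k + 1] = c[k] * p[k] / q[k + 1]
pi = c / c.sum()

# Check the off-diagonal identity of the symmetrized matrix M*:
# sqrt(pi_i/pi_{i+1}) * p_i == sqrt(p_i * q_{i+1})
lhs = np.sqrt(pi[:-1] / pi[1:]) * p[:-1]
rhs = np.sqrt(p[:-1] * q[1:])
assert np.allclose(lhs, rhs)
```

The identity holds for any choice of strictly positive $p_i$, $q_{i+1}$, since it is just a rewriting of the detailed-balance relation.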
2.3 Application to a Family of Gibbs Samplers
Recall from Section 1.2.1 that $\{a_i\}_{i=1}^\infty$ and $\{b_i\}_{i=1}^\infty$ are two sequences of strictly positive real numbers such that $\sum_{i=1}^\infty a_i + \sum_{i=1}^\infty b_i = 1$, and $\{(X_n, Y_n)\}_{n=0}^\infty$ is the Gibbs Markov chain, whose transition probabilities are given by
$$\Pr(X_{n+1} = i', Y_{n+1} = j' \mid X_n = i, Y_n = j) = \Pr(V = j' \mid U = i)\,\Pr(U = i' \mid V = j'), \tag{2–2}$$
where, for any $j \in \mathbb{N}$,
$$\Pr(U = i \mid V = j) = \frac{a_j}{a_j + b_j}\,I(i = j) + \frac{b_j}{a_j + b_j}\,I(i = j+1),$$
and, likewise, for any $i \in \mathbb{N}$,
$$\Pr(V = j \mid U = i) = \frac{a_i}{a_i + b_{i-1}}\,I(j = i) + \frac{b_{i-1}}{a_i + b_{i-1}}\,I(j = i-1).$$
It is well known that the two marginal sequences, $\{X_n\}_{n=0}^\infty$ and $\{Y_n\}_{n=0}^\infty$, are themselves reversible Markov chains, and that geometric ergodicity is a solidarity property for the three chains. That is, either $\{(X_n, Y_n)\}_{n=0}^\infty$, $\{X_n\}_{n=0}^\infty$, and $\{Y_n\}_{n=0}^\infty$ are all geometrically ergodic, or none of them is (Diaconis et al., 2008; Roberts and Rosenthal, 1997; Liu et al., 1994). We analyze the marginal $x$-chain $\{X_n\}_{n=0}^\infty$. A simple calculation shows that the $x$-chain is a special case of the birth-death chain, with transition probabilities given by
$$p_i = \frac{a_i b_i}{(a_i + b_{i-1})(a_i + b_i)} \quad \text{and} \quad q_i = \frac{a_{i-1} b_{i-1}}{(a_i + b_{i-1})(a_{i-1} + b_{i-1})},$$
and $r_i = 1 - p_i - q_i$. (Note that $q_1 = 0$ due to the definitions of $a_0$ and $b_0$.) Let $M_X$
denote the Markov operator defined by the x-chain. We use Proposition 2.1 to establish
necessary and sufficient conditions for its compactness.
Proposition 2.2. The operator $M_X$ of the marginal $x$-chain of $\{(X_n, Y_n)\}_{n=0}^\infty$ is compact if and only if
$$b_i/a_i \to 0 \quad \text{and} \quad a_{i+1}/b_i \to 0. \tag{2–3}$$
Proof. We start with necessity. Assume $M_X$ is compact. Then Proposition 2.1 implies that $r_i \to 0$. In this case,
$$r_i = 1 - p_i - q_i = \frac{a_i^2}{(a_i + b_{i-1})(a_i + b_i)} + \frac{b_{i-1}^2}{(a_i + b_{i-1})(a_{i-1} + b_{i-1})}.$$
Thus, $r_i \to 0$ if and only if
$$\frac{a_i^2}{(a_i + b_{i-1})(a_i + b_i)} = \frac{1}{(1 + b_{i-1}/a_i)(1 + b_i/a_i)} \to 0, \tag{2–4}$$
and
$$\frac{b_{i-1}^2}{(a_i + b_{i-1})(a_{i-1} + b_{i-1})} = \frac{1}{(1 + a_i/b_{i-1})(1 + a_{i-1}/b_{i-1})} \to 0. \tag{2–5}$$
Now consider (2–4). Since $1/(1 + b_{i-1}/a_i)$ and $1/(1 + b_i/a_i)$ are both in $(0,1)$, their product converges to zero if and only if
$$\min\left\{\frac{1}{1 + b_{i-1}/a_i},\, \frac{1}{1 + b_i/a_i}\right\} \to 0.$$
It follows that $r_i \to 0$ if and only if
$$\max\left\{\frac{b_{i-1}}{a_i},\, \frac{b_i}{a_i}\right\} \to \infty, \tag{2–6}$$
and
$$\max\left\{\frac{a_i}{b_{i-1}},\, \frac{a_{i-1}}{b_{i-1}}\right\} \to \infty. \tag{2–7}$$
Now let $N$ be such that $(b_i \vee b_{i-1})/a_i > 1$ and $(a_i \vee a_{i-1})/b_{i-1} > 1$ for all $i > N$, where $\vee$ denotes maximum. Assume that $i > N$. If $a_{i-1} \leq b_{i-1}$, then $a_i > b_{i-1}$ and $a_i < b_i$. Consequently,
$$a_{i-1} \leq b_{i-1} < a_i < b_i < a_{i+1} < b_{i+1} < \cdots.$$
But this contradicts the fact that $a_i \to 0$. We conclude that $a_{i-1} > b_{i-1}$ for all $i > N$. Then, since we also know that $(b_i \vee b_{i-1})/a_i > 1$ for all $i > N$, it follows that
$$a_N > b_N > a_{N+1} > b_{N+1} > a_{N+2} > b_{N+2} > \cdots.$$
Thus, for $i > N$, $b_i \vee b_{i-1} = b_{i-1}$ and $a_i \vee a_{i-1} = a_{i-1}$. It follows from (2–6) and (2–7) that $b_{i-1}/a_i \to \infty$ and $a_i/b_i \to \infty$. Thus, (2–3) holds.
We now establish sufficiency. Assume that (2–3) holds. Then there exists an $N$ such that $b_i/a_i$ and $a_{i+1}/b_i$ are both smaller than 1 for all $i \geq N$. Therefore,
$$a_N > b_N > a_{N+1} > b_{N+1} > a_{N+2} > b_{N+2} > \cdots.$$
Thus, the arguments in the necessity part of the proof imply that $r_i \to 0$. Finally, it is easy to see that
$$p_i = \frac{1}{(1 + b_{i-1}/a_i)(1 + a_i/b_i)} \to 0.$$
The following result gives a sufficient condition for geometric ergodicity.

Proposition 2.3. (Tan, 2009) The Markov chains $\{(X_n, Y_n)\}_{n=0}^\infty$, $\{X_n\}_{n=0}^\infty$, and $\{Y_n\}_{n=0}^\infty$ are all geometrically ergodic if
$$\limsup_{i\to\infty}\frac{p_i}{q_i} < 1 \quad \text{and} \quad \liminf_{i\to\infty} q_i > 0. \tag{2–8}$$
Note that the conditions of Proposition 2.3 are much weaker than the conditions of Proposition 2.1. We now apply our results to two examples of the joint pmf (1–1). We begin with a chain that converges at a geometric rate but is not compact.
Example 2.4. (Tan, 2009) For $i \in \mathbb{N}$, let $a_i = ce^{-i}$ and $b_i = e^{-i}$, where $c = e - 2$. Then
$$p_i = \frac{c}{(c+e)(c+1)} \quad \text{and} \quad q_i = \frac{ce}{(c+e)(c+1)},$$
so $p_i/q_i = 1/e$. Thus, by Proposition 2.3, the Gibbs Markov chain is geometrically ergodic. However, since $b_i/a_i = 1/c \neq 0$, Proposition 2.2 implies that the operator is not compact.
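The closed forms in Example 2.4 can be confirmed numerically; the sketch below evaluates $p_i$ and $q_i$ directly from the definitions of the $x$-chain transition probabilities.

```python
import math

# Numerical check of Example 2.4 (Tan, 2009): a_i = c e^{-i}, b_i = e^{-i}, c = e - 2.
c = math.e - 2.0
a = lambda k: c * math.exp(-k)
b = lambda k: math.exp(-k)

def p(i):  # birth probability of the marginal x-chain
    return a(i) * b(i) / ((a(i) + b(i - 1)) * (a(i) + b(i)))

def q(i):  # death probability of the marginal x-chain (i >= 2)
    return a(i - 1) * b(i - 1) / ((a(i) + b(i - 1)) * (a(i - 1) + b(i - 1)))

for i in (2, 5, 10):
    # p_i is constant in i, and p_i/q_i = 1/e
    assert abs(p(i) - c / ((c + math.e) * (c + 1))) < 1e-12
    assert abs(p(i) / q(i) - 1 / math.e) < 1e-12

# The sequences are properly normalized: sum a_i + sum b_i = (c+1)/(e-1) = 1
assert abs((c + 1.0) / (math.e - 1.0) - 1.0) < 1e-12
```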
We end with an example of a compact Gibbs Markov chain.
Example 2.5. For $i \in \mathbb{N}$, let
$$a'_i = \frac{1}{(2i-1)^{2i-1}} \quad \text{and} \quad b'_i = \frac{1}{(2i)^{2i}}.$$
Note that $M := \sum_{i=1}^\infty (a'_i + b'_i) = \sum_{i=1}^\infty i^{-i} < \infty$, and let $a_i = a'_i/M$ and $b_i = b'_i/M$. Now
$$\frac{b_i}{a_i} = \frac{(2i-1)^{2i-1}}{(2i)^{2i}} = \left[\frac{2i-1}{2i}\right]^{2i-1}\frac{1}{2i} \leq \frac{1}{2i} \to 0.$$
In addition,
$$\frac{a_{i+1}}{b_i} = \frac{(2i)^{2i}}{(2i+1)^{2i+1}} = \left[\frac{2i}{2i+1}\right]^{2i}\frac{1}{2i+1} \leq \frac{1}{2i+1} \to 0.$$
Thus, by Proposition 2.2, the operator is compact. Of course, this implies that the chain is geometrically ergodic.
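The two bounds in Example 2.5 are also easy to verify by direct computation:

```python
# Numerical check of Example 2.5: the two ratios in condition (2-3) vanish,
# at the rates bounded in the text.
def ratio_b_over_a(i):       # b_i/a_i = (2i-1)^{2i-1} / (2i)^{2i}
    return (2 * i - 1) ** (2 * i - 1) / (2 * i) ** (2 * i)

def ratio_a_next_over_b(i):  # a_{i+1}/b_i = (2i)^{2i} / (2i+1)^{2i+1}
    return (2 * i) ** (2 * i) / (2 * i + 1) ** (2 * i + 1)

for i in (1, 2, 5, 10):
    assert 0.0 < ratio_b_over_a(i) <= 1.0 / (2 * i)           # bound from the text
    assert 0.0 < ratio_a_next_over_b(i) <= 1.0 / (2 * i + 1)  # bound from the text
```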
2.4 Birth-Death Chains Are Not Uniformly Ergodic

In this section, we prove the following.

Proposition 2.6. The birth-death chain $X$ is not uniformly ergodic.

This proposition can be proved using Theorem 16.0.2(vi) in Meyn and Tweedie (2009); however, we provide a simple direct proof here.
Proof. Recall that $\pi$ is the stationary distribution. There exists $N$ such that $\pi(N) > 0$. Suppose that $X$ is uniformly ergodic. Then there exist a constant $M < \infty$ and a constant $0 < t < 1$ such that, for all $i$ and $n$,
$$\sum_{j=1}^\infty |P^n(i,j) - \pi(j)| \leq M t^n.$$
In particular,
$$|P^n(N+n+1, N) - \pi(N)| \leq M t^n, \quad n \geq 1.$$
If the chain starts at the state $N+n+1$, it takes at least $n+1$ steps to reach the state $N$. Thus, $P^n(N+n+1, N) = 0$ for all $n$, and hence
$$|P^n(N+n+1, N) - \pi(N)| = \pi(N) \leq M t^n, \quad n \geq 1.$$
This cannot happen because $t^n \to 0$ and $\pi(N) > 0$. Thus, $X$ is not uniformly ergodic.

The same argument shows that a more general chain is not uniformly ergodic: a Mtm $P$ with a stationary distribution is not uniformly ergodic if there exists $\rho > 0$ such that
$$P(i,j) = 0 \quad \text{for } |i-j| > \rho, \quad i, j = 1, 2, \ldots.$$
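The key fact in the proof, that a banded chain cannot travel more than $n$ states in $n$ steps, is easy to confirm numerically. The sketch below uses a hypothetical truncated birth-death chain (transition probabilities chosen only for illustration) and checks that $P^n(N+n+1, N) = 0$.

```python
import numpy as np

# Truncated birth-death chain on {0, ..., 49}, hypothetical probabilities.
S = 50
P = np.zeros((S, S))
for i in range(S):
    p, q = 0.3, 0.3
    if i == 0:
        P[0, 0], P[0, 1] = 0.7, 0.3
    elif i == S - 1:
        P[i, i - 1], P[i, i] = q, 1 - q
    else:
        P[i, i - 1], P[i, i], P[i, i + 1] = q, 1 - p - q, p

N = 5
for n in range(1, 20):
    Pn = np.linalg.matrix_power(P, n)
    # Starting from N + n + 1, the chain cannot hit N in only n steps,
    # so this n-step transition probability is exactly zero.
    assert Pn[N + n + 1, N] == 0.0
```

Since $\pi(N) > 0$ is fixed while $Mt^n \to 0$, no geometric bound can hold uniformly over starting states, which is exactly the contradiction in the proof.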
CHAPTER 3
SPECTRAL ANALYSIS OF GIBBS SAMPLERS FOR BAYESIAN LINEAR MIXED MODELS

3.1 Summary

We prove in Section 3.3 that the marginal chain of Hobert and Geyer's (1998) Block Gibbs chain is never Hilbert-Schmidt. For the improper prior, we consider an alternative Block GS instead of Tan and Hobert's (2009) chain. In Section 3.4, we show that this alternative Block Gibbs chain is not trace class in most cases.
3.2 The Models and the Gibbs Samplers
3.2.1 Proper Prior
Recall the model with proper prior from Section 1.2.2:
$$y_{ij} \mid \theta, \lambda_e, \lambda_\theta, \mu \sim N(\theta_i, \lambda_e^{-1}), \quad i = 1, \ldots, K, \; j = 1, \ldots, m_i,$$
$$\theta \mid \mu, \lambda_\theta, \lambda_e \sim N(\mathbf{1}\mu, I\lambda_\theta^{-1}), \quad \lambda_e \sim \text{Gamma}(a_2, b_2),$$
$$\mu \sim N(\mu_0, \lambda_0^{-1}), \quad \lambda_\theta \sim \text{Gamma}(a_1, b_1).$$
We introduce some notation beyond that of Section 1.2.2. Denote $\bar y_i = m_i^{-1}\sum_{j=1}^{m_i} y_{ij}$, $N = \sum_{i=1}^K m_i$, $A_1 = a_1 + K/2$, and $A_2 = a_2 + N/2$. We abbreviate the condition $\lambda_\theta, \lambda_e > 0$ by $\lambda > 0$. Denote by $f(\cdot\mid\cdot)$ a generic conditional density and by $f$ a generic density. For any matrix $M$, let $M(i,j)$ denote the entry in the $i$th row and $j$th column of $M$. Recall that $\zeta = (\theta_1, \ldots, \theta_K, \mu)^T$ and $\lambda = (\lambda_\theta, \lambda_e)^T$. The posterior density is
$$\pi(\zeta, \lambda \mid y) \propto \left[\prod_{i=1}^K\prod_{j=1}^{m_i} f(y_{ij} \mid \theta, \lambda_e)\right] f(\theta \mid \mu, \lambda_\theta)\, f(\mu)\, f(\lambda_\theta)\, f(\lambda_e).$$
Recall that we simplify notation by suppressing the dependence on $y$, e.g., $\pi(\zeta, \lambda) := \pi(\zeta, \lambda \mid y)$.
Hobert and Geyer (1998) studied the following Block GS for the above model:
$$\lambda_\theta \mid \lambda_e, \mu, \theta \sim \text{Gamma}\!\left(A_1,\; b_1 + \frac{1}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right),$$
$$\lambda_e \mid \lambda_\theta, \mu, \theta \sim \text{Gamma}\!\left(A_2,\; b_2 + \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right), \tag{3–1}$$
$$\zeta \mid \lambda_\theta, \lambda_e \sim N(\zeta_\lambda, V_\lambda),$$
where
$$V_\lambda^{-1} = \begin{pmatrix} D_\lambda^2 & -\lambda_\theta\mathbf{1} \\ -\lambda_\theta\mathbf{1}^T & \lambda_0 + K\lambda_\theta \end{pmatrix},$$
$D_\lambda$ is a $K \times K$ diagonal matrix whose $i$th diagonal element is $d_{\lambda,ii} = \sqrt{\lambda_\theta + m_i\lambda_e}$, and
$$V_\lambda^{-1}\zeta_\lambda = \left(\lambda_e m_1\bar y_1,\; \lambda_e m_2\bar y_2,\; \ldots,\; \lambda_e m_K\bar y_K,\; \lambda_0\mu_0\right)^T. \tag{3–2}$$
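The update (3–1)–(3–2) can be sketched directly in code. The data, hyperparameters, and seed below are hypothetical choices for illustration; the sweep itself follows the stated conditionals (solving $V_\lambda^{-1}\zeta_\lambda = \text{rhs}$ for $\zeta_\lambda$ rather than inverting twice).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and hyperparameters for the proper-prior model.
K = 4
m = np.array([3, 5, 4, 6])                                   # m_i
y = [rng.normal(mu_i, 1.0, size=mi) for mu_i, mi in zip([0., 1., 2., 3.], m)]
a1 = b1 = a2 = b2 = 1.0
lam0, mu0 = 1.0, 0.0
A1, A2 = a1 + K / 2, a2 + m.sum() / 2

def gibbs_sweep(theta, mu, lam_theta, lam_e):
    # lambda_theta | rest ~ Gamma(A1, b1 + (1/2) sum_i (theta_i - mu)^2)
    lam_theta = rng.gamma(A1, 1.0 / (b1 + 0.5 * np.sum((theta - mu) ** 2)))
    # lambda_e | rest ~ Gamma(A2, b2 + (1/2) sum_ij (y_ij - theta_i)^2)
    sse = sum(np.sum((yi - ti) ** 2) for yi, ti in zip(y, theta))
    lam_e = rng.gamma(A2, 1.0 / (b2 + 0.5 * sse))
    # zeta = (theta, mu) | lambdas ~ N(zeta_lam, V_lam), using (3-2)
    Vinv = np.zeros((K + 1, K + 1))
    Vinv[:K, :K] = np.diag(lam_theta + m * lam_e)            # D_lambda^2
    Vinv[:K, K] = Vinv[K, :K] = -lam_theta
    Vinv[K, K] = lam0 + K * lam_theta
    rhs = np.append(lam_e * m * np.array([yi.mean() for yi in y]), lam0 * mu0)
    zeta_lam = np.linalg.solve(Vinv, rhs)
    zeta = rng.multivariate_normal(zeta_lam, np.linalg.inv(Vinv))
    return zeta[:K], zeta[K], lam_theta, lam_e

theta, mu, lt, le = gibbs_sweep(np.zeros(K), 0.0, 1.0, 1.0)
```

Note that numpy's `gamma` is parameterized by shape and scale, so the rate parameters in (3–1) appear as reciprocals.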
As we introduced in Chapter 1, there are three Markov chains associated with this GS. The Block Gibbs chain has Markov transition density (Mtd)
$$k(\lambda', \zeta' \mid \lambda, \zeta) = \pi(\lambda' \mid \zeta')\,\pi(\zeta' \mid \lambda)$$
$$\propto \left[b_1 + \frac{1}{2}\sum_{i=1}^K(\theta'_i - \mu')^2\right]^{A_1}\lambda_\theta'^{\,A_1-1}\exp\!\left[-b_1\lambda'_\theta - \frac{1}{2}\sum_{i=1}^K(\theta'_i - \mu')^2\lambda'_\theta\right]$$
$$\times\left[b_2 + \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta'_i)^2\right]^{A_2}\lambda_e'^{\,A_2-1}\exp\!\left[-b_2\lambda'_e - \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta'_i)^2\lambda'_e\right]$$
$$\times|V_\lambda^{-1}|^{1/2}\exp\!\left[-\frac{1}{2}(\zeta' - \zeta_\lambda)^TV_\lambda^{-1}(\zeta' - \zeta_\lambda)\right], \quad \lambda, \lambda' > 0 \text{ and } \zeta, \zeta' \in \mathbb{R}^{K+1}.$$
The marginal $\lambda$-chain has Mtd
$$k(\lambda' \mid \lambda) = \int_{\mathbb{R}^{K+1}}\pi(\lambda' \mid \zeta)\,\pi(\zeta \mid \lambda)\,d\zeta$$
$$\propto \int_{\mathbb{R}^{K+1}}\left[b_1 + \frac{1}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right]^{A_1}\lambda_\theta'^{\,A_1-1}\exp\!\left[-b_1\lambda'_\theta - \frac{1}{2}\sum_{i=1}^K(\theta_i - \mu)^2\lambda'_\theta\right]$$
$$\times\left[b_2 + \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]^{A_2}\lambda_e'^{\,A_2-1}\exp\!\left[-b_2\lambda'_e - \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\lambda'_e\right]$$
$$\times|V_\lambda^{-1}|^{1/2}\exp\!\left[-\frac{1}{2}(\zeta - \zeta_\lambda)^TV_\lambda^{-1}(\zeta - \zeta_\lambda)\right]d\zeta, \quad \lambda, \lambda' > 0.$$
The marginal $\zeta$-chain has Mtd
$$k(\zeta' \mid \zeta) = \int_{\lambda>0}\pi(\zeta' \mid \lambda)\,\pi(\lambda \mid \zeta)\,d\lambda$$
$$\propto \int_{\lambda>0}\left[b_1 + \frac{1}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right]^{A_1}\lambda_\theta^{A_1-1}\exp\!\left[-b_1\lambda_\theta - \frac{1}{2}\sum_{i=1}^K(\theta_i - \mu)^2\lambda_\theta\right]$$
$$\times\left[b_2 + \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]^{A_2}\lambda_e^{A_2-1}\exp\!\left[-b_2\lambda_e - \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\lambda_e\right]$$
$$\times|V_\lambda^{-1}|^{1/2}\exp\!\left[-\frac{1}{2}(\zeta' - \zeta_\lambda)^TV_\lambda^{-1}(\zeta' - \zeta_\lambda)\right]d\lambda, \quad \zeta, \zeta' \in \mathbb{R}^{K+1}.$$
Either all three of the above chains are geometrically ergodic, or none of them is (Diaconis et al., 2008).
3.2.2 Improper Prior

Recall the model with improper prior from Section 1.2.2:
$$y_{ij} \mid \theta, \mu, \lambda_\theta, \lambda_e \sim N(\theta_i, \lambda_e^{-1}), \quad i = 1, \ldots, K, \; j = 1, \ldots, m_i,$$
$$\theta \mid \mu, \lambda_\theta, \lambda_e \sim N(\mathbf{1}\mu, I\lambda_\theta^{-1}),$$
$$f(\lambda_\theta, \lambda_e, \mu) \propto \lambda_\theta^{a-1}\lambda_e^{b-1}, \quad \lambda_\theta, \lambda_e > 0.$$
We also use some extra notation here. Denote $\bar y_i = m_i^{-1}\sum_{j=1}^{m_i} y_{ij}$, $N = \sum_{i=1}^K m_i$, $A = a + K/2$, and $B = b + N/2$. We abbreviate $\lambda_\theta, \lambda_e > 0$ by $\lambda > 0$. Denote by $f(\cdot\mid\cdot)$ a generic conditional density and by $f$ a generic density. The posterior density is
$$\pi(\theta, \mu, \lambda \mid y) \propto \left[\prod_{i=1}^K\prod_{j=1}^{m_i} f(y_{ij} \mid \theta_i, \lambda_e)\right] f(\theta \mid \mu, \lambda_\theta)\, f(\lambda_\theta, \lambda_e, \mu).$$
This posterior is proper if and only if (see Lemma A.2)
$$a < 0, \quad a + \frac{K}{2} > \frac{1}{2}, \quad \text{and} \quad a + b > \frac{1-N}{2}.$$
Again, we suppress the dependence on $y$. We consider a Block GS based on $\pi(\theta \mid \mu, \lambda)$ and $\pi(\mu, \lambda \mid \theta)$, so we need to calculate $\pi(\theta, \mu, \lambda)$, $\pi(\theta \mid \mu, \lambda)$, and $\pi(\mu, \lambda \mid \theta)$. Since
$$\pi(\theta, \mu, \lambda) \propto \left\{\prod_{i=1}^K\prod_{j=1}^{m_i}\lambda_e^{1/2}\exp\!\left[-\frac{\lambda_e}{2}(y_{ij} - \theta_i)^2\right]\right\}\left\{\prod_{i=1}^K\lambda_\theta^{1/2}\exp\!\left[-\frac{\lambda_\theta}{2}(\theta_i - \mu)^2\right]\right\}\lambda_\theta^{a-1}\lambda_e^{b-1}$$
$$= \lambda_\theta^{A-1}\lambda_e^{B-1}\exp\!\left[-\frac{\lambda_\theta}{2}\sum_{i=1}^K(\theta_i - \mu)^2 - \frac{\lambda_e}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right], \quad \lambda > 0, \tag{3–3}$$
we have
$$\pi(\theta \mid \mu, \lambda) \propto \pi(\theta, \mu, \lambda) \propto \exp\!\left[-\frac{\lambda_\theta}{2}\sum_{i=1}^K(\theta_i - \mu)^2 - \frac{\lambda_e}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]$$
$$\propto \exp\!\left[-\frac{\lambda_\theta}{2}\sum_{i=1}^K(\theta_i^2 - 2\mu\theta_i) - \frac{\lambda_e}{2}\sum_{i=1}^K(m_i\theta_i^2 - 2m_i\bar y_i\theta_i)\right]$$
$$= \exp\!\left\{-\frac{1}{2}\sum_{i=1}^K\left[(\lambda_\theta + m_i\lambda_e)\theta_i^2 - 2(\lambda_\theta\mu + \lambda_e m_i\bar y_i)\theta_i\right]\right\}.$$
So
$$\theta \mid \mu, \lambda \sim N(\theta_{\mu,\lambda}, V_\lambda), \tag{3–4}$$
where $V_\lambda$ is a $K \times K$ diagonal matrix whose $i$th diagonal element is $(\lambda_\theta + m_i\lambda_e)^{-1}$, and $\theta_{\mu,\lambda}$ is a $K \times 1$ column vector whose $i$th element is
$$\theta_{\mu,\lambda,i} = \frac{\lambda_\theta\mu + \lambda_e m_i\bar y_i}{\lambda_\theta + m_i\lambda_e}, \quad 1 \leq i \leq K. \tag{3–5}$$
Note that
$$\theta_{\mu,\lambda}^TV_\lambda^{-1} = \left(\lambda_\theta\mu + \lambda_e m_1\bar y_1,\; \ldots,\; \lambda_\theta\mu + \lambda_e m_K\bar y_K\right). \tag{3–6}$$
From $\pi(\mu, \lambda \mid \theta) \propto \pi(\theta, \mu, \lambda)$, $\lambda_e$ and $(\lambda_\theta, \mu)$ are conditionally independent given $\theta$, with
$$\lambda_e \mid \lambda_\theta, \mu, \theta \sim \text{Gamma}\!\left(B,\; \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right), \tag{3–7}$$
$$\pi(\lambda_\theta, \mu \mid \lambda_e, \theta) \propto \lambda_\theta^{A-1}\exp\!\left[-\frac{\lambda_\theta}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right].$$
We can show that
$$\mu \mid \lambda_\theta, \theta, \lambda_e \sim N\!\left(\bar\theta, (K\lambda_\theta)^{-1}\right),$$
$$\lambda_\theta \mid \theta, \lambda_e \sim \text{Gamma}\!\left(A - \frac{1}{2},\; \frac{1}{2}\sum_{i=1}^K(\theta_i - \bar\theta)^2\right). \tag{3–8}$$
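The conditionals (3–4), (3–7), and (3–8) give one sweep of the alternative Block GS. The sketch below implements it; the data and the prior exponents $a, b$ are hypothetical choices (selected so that the propriety conditions $a < 0$, $a + K/2 > 1/2$, $a + b > (1-N)/2$ hold).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data for the improper-prior model.
K = 4
m = np.array([3, 5, 4, 6])
y = [rng.normal(i * 1.0, 1.0, size=mi) for i, mi in enumerate(m)]
ybar = np.array([yi.mean() for yi in y])
a, b = -0.5, 1.0                       # improper-prior exponents (a < 0)
A, B = a + K / 2, b + m.sum() / 2

def sweep(mu, lam_theta, lam_e):
    # theta | mu, lambda ~ N(theta_{mu,lambda}, V_lambda), by (3-4)-(3-5)
    prec = lam_theta + m * lam_e
    theta = rng.normal((lam_theta * mu + lam_e * m * ybar) / prec,
                       np.sqrt(1.0 / prec))
    # lambda_e | theta ~ Gamma(B, (1/2) sum_ij (y_ij - theta_i)^2), by (3-7)
    sse = sum(np.sum((yi - ti) ** 2) for yi, ti in zip(y, theta))
    lam_e = rng.gamma(B, 2.0 / sse)
    # lambda_theta | theta ~ Gamma(A - 1/2, (1/2) sum_i (theta_i - theta_bar)^2)
    s2 = np.sum((theta - theta.mean()) ** 2)
    lam_theta = rng.gamma(A - 0.5, 2.0 / s2)
    # mu | lambda_theta, theta ~ N(theta_bar, 1/(K lambda_theta)), by (3-8)
    mu = rng.normal(theta.mean(), np.sqrt(1.0 / (K * lam_theta)))
    return theta, mu, lam_theta, lam_e

theta, mu, lt, le = sweep(0.0, 1.0, 1.0)
```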
By (3–8), we have
$$\pi(\mu, \lambda_\theta \mid \lambda_e, \theta) = \pi(\lambda_\theta \mid \theta)\,\pi(\mu \mid \lambda_\theta, \theta)$$
$$\propto \left[\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]^{A-\frac{1}{2}}\lambda_\theta^{A-\frac{3}{2}}\exp\!\left[-\frac{1}{2}\lambda_\theta\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]\lambda_\theta^{\frac{1}{2}}\exp\!\left[-\frac{1}{2}K\lambda_\theta(\mu - \bar\theta)^2\right]$$
$$= \lambda_\theta^{A-1}\left[\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]^{A-\frac{1}{2}}\exp\!\left\{-\frac{\lambda_\theta}{2}\left[K(\mu - \bar\theta)^2 + \sum_{i=1}^K(\theta_i - \bar\theta)^2\right]\right\}$$
$$= \lambda_\theta^{A-1}\left[\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]^{A-\frac{1}{2}}\exp\!\left\{-\frac{\lambda_\theta}{2}\left[K\mu^2 - 2K\mu\bar\theta + K\bar\theta^2 + \sum_{i=1}^K\theta_i^2 - K\bar\theta^2\right]\right\}$$
$$= \lambda_\theta^{A-1}\left[\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]^{A-\frac{1}{2}}\exp\!\left[-\frac{\lambda_\theta}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right], \quad \lambda_\theta > 0,\; \mu \in \mathbb{R}. \tag{3–9}$$
We have three Markov chains. The Block GS chain has Mtd
$$k(\mu', \lambda', \theta' \mid \mu, \lambda, \theta) = \pi(\mu', \lambda' \mid \theta')\,\pi(\theta' \mid \mu, \lambda) = \pi(\mu', \lambda'_\theta \mid \lambda'_e, \theta')\,\pi(\lambda'_e \mid \theta')\,\pi(\theta' \mid \mu, \lambda)$$
$$\propto \lambda_\theta'^{\,A-1}\left[\sum_{i=1}^K(\theta'_i - \bar\theta')^2\right]^{A-\frac{1}{2}}\exp\!\left[-\frac{\lambda'_\theta}{2}\sum_{i=1}^K(\theta'_i - \mu')^2\right]$$
$$\times\left[\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta'_i)^2\right]^{B}\lambda_e'^{\,B-1}\exp\!\left[-\frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta'_i)^2\lambda'_e\right]$$
$$\times|V_\lambda^{-1}|^{1/2}\exp\!\left[-\frac{1}{2}(\theta' - \theta_{\mu,\lambda})^TV_\lambda^{-1}(\theta' - \theta_{\mu,\lambda})\right], \quad \lambda, \lambda' > 0,\; \mu, \mu' \in \mathbb{R},\; \theta, \theta' \in \mathbb{R}^K.$$
The marginal $(\mu, \lambda)$-chain has Mtd
$$k(\mu', \lambda' \mid \mu, \lambda) = \int_{\mathbb{R}^K}\pi(\mu', \lambda' \mid \theta)\,\pi(\theta \mid \mu, \lambda)\,d\theta = \int_{\mathbb{R}^K}\pi(\mu', \lambda'_\theta \mid \lambda'_e, \theta)\,\pi(\lambda'_e \mid \theta)\,\pi(\theta \mid \mu, \lambda)\,d\theta$$
$$\propto \int_{\mathbb{R}^K}\lambda_\theta'^{\,A-1}\left[\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]^{A-\frac{1}{2}}\exp\!\left[-\frac{\lambda'_\theta}{2}\sum_{i=1}^K(\theta_i - \mu')^2\right]$$
$$\times\left[\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]^{B}\lambda_e'^{\,B-1}\exp\!\left[-\frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\lambda'_e\right]$$
$$\times|V_\lambda^{-1}|^{1/2}\exp\!\left[-\frac{1}{2}(\theta - \theta_{\mu,\lambda})^TV_\lambda^{-1}(\theta - \theta_{\mu,\lambda})\right]d\theta, \quad \lambda, \lambda' > 0,\; \mu, \mu' \in \mathbb{R}.$$
The marginal $\theta$-chain has Mtd
$$k(\theta' \mid \theta) = \int_{\lambda>0}\int_{-\infty}^{\infty}\pi(\theta' \mid \mu, \lambda)\,\pi(\mu, \lambda \mid \theta)\,d\mu\,d\lambda$$
$$\propto \int_{\lambda>0}\int_{-\infty}^{\infty}\lambda_\theta^{A-1}\left[\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]^{A-\frac{1}{2}}\exp\!\left[-\frac{\lambda_\theta}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right]$$
$$\times\left[\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]^{B}\lambda_e^{B-1}\exp\!\left[-\frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\lambda_e\right]$$
$$\times|V_\lambda^{-1}|^{1/2}\exp\!\left[-\frac{1}{2}(\theta' - \theta_{\mu,\lambda})^TV_\lambda^{-1}(\theta' - \theta_{\mu,\lambda})\right]d\mu\,d\lambda, \quad \theta, \theta' \in \mathbb{R}^K.$$
Either all three of the above chains are geometrically ergodic, or none of them is (Diaconis et al., 2008).
3.3 Hobert & Geyer's Gibbs Sampler Is Not Hilbert-Schmidt

Our main result is the following.

Proposition 3.1. The marginal $\lambda$-chain of Hobert and Geyer's (1998) Gibbs chain is not Hilbert-Schmidt; i.e.,
$$\int_{\lambda_\theta>0}\int_{\lambda_e>0}\int_{\lambda'_\theta>0}\int_{\lambda'_e>0}\left[\frac{k(\lambda' \mid \lambda)}{\pi(\lambda')}\right]^2\pi(\lambda)\,\pi(\lambda')\,d\lambda'\,d\lambda = \infty.$$
We denote by “constant” some known positive constant. We need the following
lemmas for the proof.
Lemma 3.2. Let $K$ be a positive integer, let $c_i$, $1 \leq i \leq K$, be positive real numbers, and let $c$ and $\lambda$ be real numbers such that $c - \lambda^2\sum_{i=1}^K\frac{1}{c_i} > 0$. For $i = 1, \ldots, K$, let
$$M_i = \begin{pmatrix}
c_i & 0 & 0 & \cdots & 0 & -\lambda \\
0 & c_{i+1} & 0 & \cdots & 0 & -\lambda \\
0 & 0 & c_{i+2} & \cdots & 0 & -\lambda \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & c_K & -\lambda \\
-\lambda & -\lambda & -\lambda & \cdots & -\lambda & c
\end{pmatrix}.$$
Then
$$|M_1| = \left(\prod_{i=1}^K c_i\right)\left(c - \lambda^2\sum_{i=1}^K\frac{1}{c_i}\right),$$
where $|M_1|$ is the determinant of $M_1$. Moreover, the unique solution $x$ of $M_1x = r$, where $r = (r_1, r_2, \ldots, r_{K+1})^T$ and $x = (x_1, x_2, \ldots, x_{K+1})^T$ are $(K+1) \times 1$ column vectors, is given by
$$x_{K+1} = \frac{r_{K+1} + \lambda\sum_{i=1}^K r_i/c_i}{c - \lambda^2\sum_{i=1}^K 1/c_i} \quad \text{and} \quad x_i = \frac{r_i + \lambda x_{K+1}}{c_i}, \quad i = 1, 2, \ldots, K.$$
Proof. Expanding the determinant, we have
$$|M_1| = c_1|M_2| + (-1)^{K+2}(-\lambda)\begin{vmatrix}
0 & c_2 & 0 & \cdots & 0 \\
0 & 0 & c_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & c_K \\
-\lambda & -\lambda & -\lambda & \cdots & -\lambda
\end{vmatrix} = c_1|M_2| + (-1)^{K+2}(-\lambda)\left[c_2c_3\cdots c_K(-\lambda)(-1)^{K-1}\right] = c_1|M_2| - c_2c_3\cdots c_K\lambda^2.$$
In the same way, we obtain
$$|M_i| = c_i|M_{i+1}| - c_{i+1}c_{i+2}\cdots c_K\lambda^2, \quad i = 1, 2, \ldots, K-1, \qquad |M_K| = c_Kc - \lambda^2.$$
Thus,
$$|M_1| - c_1|M_2| = -\frac{\prod_{j=1}^K c_j}{c_1}\lambda^2,$$
$$c_1c_2\cdots c_{i-1}|M_i| - c_1c_2\cdots c_i|M_{i+1}| = -\lambda^2\frac{\prod_{j=1}^K c_j}{c_i}, \quad i = 2, 3, \ldots, K-1,$$
$$c_1c_2\cdots c_{K-1}|M_K| = c_1c_2\cdots c_Kc - \lambda^2\frac{\prod_{j=1}^K c_j}{c_K}.$$
Summing these equations, we get
$$|M_1| = -\left[\prod_{j=1}^K c_j\right]\lambda^2\sum_{i=1}^K\frac{1}{c_i} + \left[\prod_{i=1}^K c_i\right]c = \left[\prod_{i=1}^K c_i\right]\left[c - \lambda^2\sum_{i=1}^K\frac{1}{c_i}\right].$$
Now we solve the equation $M_1x = r$ to find $x$. We have
$$c_1x_1 - \lambda x_{K+1} = r_1, \quad \ldots, \quad c_Kx_K - \lambda x_{K+1} = r_K, \quad -\lambda\sum_{i=1}^K x_i + cx_{K+1} = r_{K+1}.$$
Plugging $x_i = (r_i + \lambda x_{K+1})/c_i$ into the last equation, we have
$$-\lambda\sum_{i=1}^K\frac{r_i + \lambda x_{K+1}}{c_i} + cx_{K+1} = r_{K+1} \iff x_{K+1}\left(c - \lambda^2\sum_{i=1}^K\frac{1}{c_i}\right) = r_{K+1} + \lambda\sum_{i=1}^K\frac{r_i}{c_i} \iff x_{K+1} = \frac{r_{K+1} + \lambda\sum_{i=1}^K r_i/c_i}{c - \lambda^2\sum_{i=1}^K 1/c_i}.$$
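Both formulas in Lemma 3.2 are easy to verify numerically; the sketch below builds an arrowhead matrix $M_1$ with arbitrary (hypothetical) inputs and compares against numpy's linear algebra routines.

```python
import numpy as np

# Numerical check of Lemma 3.2 with hypothetical inputs.
rng = np.random.default_rng(2)
K = 6
c_diag = rng.uniform(1.0, 3.0, size=K)           # c_1, ..., c_K > 0
lam = 0.4
c = lam**2 * np.sum(1.0 / c_diag) + 1.0          # ensures c - lam^2 sum(1/c_i) > 0

M1 = np.zeros((K + 1, K + 1))
M1[:K, :K] = np.diag(c_diag)
M1[:K, K] = M1[K, :K] = -lam
M1[K, K] = c

# Determinant formula: |M1| = (prod c_i)(c - lam^2 sum 1/c_i)
det_formula = np.prod(c_diag) * (c - lam**2 * np.sum(1.0 / c_diag))
assert np.isclose(np.linalg.det(M1), det_formula)

# Solution formula for M1 x = r
r = rng.normal(size=K + 1)
x = np.linalg.solve(M1, r)
xK1 = (r[K] + lam * np.sum(r[:K] / c_diag)) / (c - lam**2 * np.sum(1.0 / c_diag))
assert np.isclose(x[K], xK1)
assert np.allclose(x[:K], (r[:K] + lam * xK1) / c_diag)
```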
Lemma 3.3. Let
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\!\left(\begin{pmatrix}\mu_X \\ \mu_Y\end{pmatrix},\; \begin{pmatrix}\sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2\end{pmatrix}\right),$$
where $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$, and $\rho$ are functions of a vector $\tau$ in $\mathbb{R}^n$. Suppose that $\sigma_X \to \infty$ as $\tau \to 0$, but $\mu_X$, $\mu_Y$, $\sigma_Y$, and $\mathrm{Cov}(X, Y)$ are uniformly bounded by a constant for all $\tau$. Given any constants $a$ and $b > 0$, there exists $\varepsilon > 0$ such that
$$E\!\left[(X - Y)^2|X - a|^b\right] > (\text{constant})\,\sigma_X^{b+2} \quad \text{for } 0 < \tau < (\varepsilon, \varepsilon, \ldots, \varepsilon)^T,$$
where $(\varepsilon, \varepsilon, \ldots, \varepsilon)^T \in \mathbb{R}^n$ and "$<$" is the coordinatewise partial order on $\mathbb{R}^n$.
Proof. Denote $h(X) = |X - a|^b$. Denote conditional expectation and conditional variance given $X$ by $E_X$ and $\mathrm{Var}_X$, respectively. Note that
$$E\!\left[(X - Y)^2|X - a|^b\right] = E\!\left[(X - Y)^2h(X)\right] = E\!\left[h(X)\,E_X(Y - X)^2\right].$$
Since
$$E_X(Y - X)^2 = \mathrm{Var}_XY + (E_XY - X)^2 \geq (E_XY - X)^2,$$
we have
$$E\!\left[h(X)\,E_X(Y - X)^2\right] \geq E\!\left[h(X)(E_XY - X)^2\right].$$
$(X, Y)$ has a normal distribution, so
$$E_XY = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X).$$
Denote $U = X - a$, so that $h(X) = |U|^b$. We get
$$X - E_XY = X - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X) = \left(1 - \rho\frac{\sigma_Y}{\sigma_X}\right)X - \left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)$$
$$= \left(1 - \rho\frac{\sigma_Y}{\sigma_X}\right)(X - a) + \left[a\left(1 - \rho\frac{\sigma_Y}{\sigma_X}\right) - \left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)\right] = c'U + d',$$
where
$$c' = 1 - \rho\frac{\sigma_Y}{\sigma_X} \quad \text{and} \quad d' = a\left(1 - \rho\frac{\sigma_Y}{\sigma_X}\right) - \left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right).$$
Denote $c = |c'|$ and $d = |d'|$. Combining the formulas above,
$$E\!\left[h(X)E_X(Y - X)^2\right] \geq E\!\left[|U|^b(c'U + d')^2\right] = E\!\left[c^2|U|^{b+2} + 2c'd'|U|^bU + d^2|U|^b\right]$$
$$\geq E\!\left[c^2|U|^{b+2} - 2cd|U|^{b+1}\right] = c\left[cE|U|^{b+2} - 2dE|U|^{b+1}\right],$$
and hence
$$c^{-1}E\!\left[h(X)E_X(Y - X)^2\right] \geq \frac{c}{2}E|U|^{b+2} + \left[\frac{c}{2}E|U|^{b+2} - 2dE|U|^{b+1}\right].$$
It suffices to show that $(c/2)E|U|^{b+2} > (\text{constant})\,\sigma_X^{b+2}$ and $(c/2)E|U|^{b+2} - 2dE|U|^{b+1} > 0$ for small $\tau$. To do that, we need to find bounds for $c$ and $d$, and a representation of $E|U|^r$ via $\sigma_U^r$ for all $r > 0$.

We have
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\sigma_Y} \;\Rightarrow\; \rho\frac{\sigma_Y}{\sigma_X} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2}.$$
Because $\mathrm{Cov}(X, Y)$ is uniformly bounded by a constant for all $\tau$ and $\sigma_X \to \infty$ as $\tau \to 0$, there exists $\varepsilon_1 > 0$ such that $|\rho\sigma_Y/\sigma_X| < \frac{1}{2}$ for $\tau < \varepsilon_1$. Thus,
$$\frac{3}{2} > c > \frac{1}{2} \quad \text{for } \tau < \varepsilon_1.$$
Since $\mu_X$ and $\mu_Y$ are uniformly bounded by a constant for all $\tau$, and $|\rho\sigma_Y/\sigma_X| < \frac{1}{2}$ for $\tau < \varepsilon_1$, $d'$ is uniformly bounded by a constant for $\tau < \varepsilon_1$. So there exists a constant $M > 0$ such that
$$d < M \quad \text{for } \tau < \varepsilon_1.$$
We have
$$U \sim N(\mu_U, \sigma_U^2),$$
where $\mu_U = \mu_X - a$ and $\sigma_U = \sigma_X$. Denote the rising factorial by
$$z^{(n)} = z(z+1)\cdots(z+n-1), \quad n \in \mathbb{N},$$
and $z^{(0)} = 1$. From Winkelbauer (2012), we have
$$E|U|^r = \sigma_U^r\,2^{r/2}\,\frac{\Gamma\!\left(\frac{r+1}{2}\right)}{\sqrt{\pi}}\,\Phi\!\left(-\frac{r}{2},\, \frac{1}{2},\, -\frac{\mu_U^2}{2\sigma_U^2}\right), \quad r > -1,$$
where
$$\Phi(\alpha, \beta, x) = \sum_{n=0}^\infty\frac{\alpha^{(n)}}{\beta^{(n)}}\frac{x^n}{n!} = 1 + \sum_{n=1}^\infty\frac{\alpha^{(n)}}{\beta^{(n)}}\frac{x^n}{n!}$$
is Kummer's confluent hypergeometric function. When $\beta$ is not a nonpositive integer, this function is analytic and hence continuous in $x$.
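Winkelbauer's absolute-moment formula, $E|U|^r = \sigma^r 2^{r/2}\,\Gamma((r+1)/2)/\sqrt{\pi}\cdot\Phi(-r/2, 1/2, -\mu^2/(2\sigma^2))$, can be checked by Monte Carlo. The sketch below implements $\Phi$ via its power series (parameter values are hypothetical choices for illustration).

```python
import math
import numpy as np

def kummer_phi(alpha, beta, x, terms=40):
    """Kummer's confluent hypergeometric function Phi(alpha, beta, x),
    computed from its power series with rising factorials."""
    total, term = 1.0, 1.0
    for n in range(terms):
        term *= (alpha + n) / (beta + n) * x / (n + 1)
        total += term
    return total

# Monte Carlo check of E|U|^r for U ~ N(mu_U, sigma_U^2), hypothetical values.
rng = np.random.default_rng(3)
mu_U, sigma_U, r = 0.7, 2.0, 1.5
U = rng.normal(mu_U, sigma_U, size=1_000_000)
mc = np.mean(np.abs(U) ** r)
formula = (sigma_U**r * 2**(r / 2) * math.gamma((r + 1) / 2) / math.sqrt(math.pi)
           * kummer_phi(-r / 2, 0.5, -mu_U**2 / (2 * sigma_U**2)))
assert abs(mc - formula) / formula < 0.02
```

As a quick sanity check, for $r = 2$ the formula reduces to $EU^2 = \mu^2 + \sigma^2$, since $\Phi(-1, 1/2, x) = 1 - 2x$.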
Now we show that $\frac{c}{2}E|U|^{b+2} - 2dE|U|^{b+1}$ is positive when $\tau$ is small enough. For $\tau < \varepsilon_1$ we have
$$\frac{c}{2}E|U|^{b+2} - 2dE|U|^{b+1} \geq \frac{1}{4}E|U|^{b+2} - 2ME|U|^{b+1}$$
$$= \frac{1}{4}\sigma_U^{b+2}2^{(b+2)/2}\frac{\Gamma\!\left(\frac{b+3}{2}\right)}{\sqrt{\pi}}\Phi\!\left(-\frac{b+2}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2}\right) - 2M\sigma_U^{b+1}2^{(b+1)/2}\frac{\Gamma\!\left(\frac{b+2}{2}\right)}{\sqrt{\pi}}\Phi\!\left(-\frac{b+1}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2}\right)$$
$$= \sigma_U^{b+1}2^{(b-2)/2}\pi^{-1/2}\left[\sigma_U\Gamma\!\left(\frac{b+3}{2}\right)\Phi\!\left(-\frac{b+2}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2}\right) - 4\sqrt{2}M\Gamma\!\left(\frac{b+2}{2}\right)\Phi\!\left(-\frac{b+1}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2}\right)\right].$$
$\Phi\!\left(-\frac{b+2}{2}, \frac{1}{2}, x\right)$ is continuous in $x$ and $\Phi\!\left(-\frac{b+2}{2}, \frac{1}{2}, 0\right) = 1$, so there exists $\varepsilon_2 > 0$ such that $\Phi\!\left(-\frac{b+2}{2}, \frac{1}{2}, x\right) > \frac{1}{2}$ for $|x| < \varepsilon_2$. Since $\mu_X$ is uniformly bounded by a constant for all $\tau$, so is $\mu_U$. Combining that with $\sigma_U = \sigma_X \to \infty$ as $\tau \to 0$, we get $-\frac{\mu_U^2}{2\sigma_U^2} \to 0$ as $\tau \to 0$. Then there exists $\varepsilon_3 > 0$ such that $\left|\frac{\mu_U^2}{2\sigma_U^2}\right| < \varepsilon_2$ for $\tau < \varepsilon_3$, and hence $\Phi\!\left(-\frac{b+2}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2}\right) > \frac{1}{2}$ for $\tau < \varepsilon_3$. $\Phi\!\left(-\frac{b+1}{2}, \frac{1}{2}, x\right)$ is continuous, so there exists $M_1$ such that $\Phi\!\left(-\frac{b+1}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2}\right) < M_1$ for $\tau < \varepsilon_3$. Thus, for $\tau < \min\{\varepsilon_1, \varepsilon_3\}$ we have
$$\frac{c}{2}E|U|^{b+2} - 2dE|U|^{b+1} \geq \sigma_U^{b+1}2^{(b-3)/2}\pi^{-1/2}\left[\sigma_U\Gamma\!\left(\frac{b+3}{2}\right)\frac{1}{2} - 4\sqrt{2}MM_1\Gamma\!\left(\frac{b+2}{2}\right)\right].$$
Since $\sigma_U \to \infty$ as $\tau \to 0$, there exists $\varepsilon_4 > 0$ such that the bracket in the last display is positive for $\tau < \varepsilon_4$. Now if we select $\varepsilon = \min\{\varepsilon_1, \varepsilon_3, \varepsilon_4\}$, then $\frac{c}{2}E|U|^{b+2} - 2dE|U|^{b+1} > 0$ for $\tau < \varepsilon$.
Finally, for $\tau < \varepsilon$, we have
$$\frac{c}{2}E|U|^{b+2} \geq \frac{1}{2}\cdot\frac{1}{2}\,\sigma_U^{b+2}2^{(b+2)/2}\frac{\Gamma\!\left(\frac{b+3}{2}\right)}{\sqrt{\pi}}\Phi\!\left(-\frac{b+2}{2}, \frac{1}{2}, -\frac{\mu_U^2}{2\sigma_U^2}\right) \geq 2^{-2}\sigma_U^{b+2}2^{(b+2)/2}\frac{\Gamma\!\left(\frac{b+3}{2}\right)}{\sqrt{\pi}}\cdot\frac{1}{2} = (\text{constant})\,\sigma_X^{b+2}.$$
Proof of Proposition 3.1. We first calculate $k(\lambda' \mid \lambda)$, and then calculate $\pi(\lambda)$. From (3–1), we have
$$\pi(\lambda' \mid \zeta)\,\pi(\zeta \mid \lambda) \propto \left[b_1 + \frac{1}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right]^{A_1}\lambda_\theta'^{\,A_1-1}\exp\!\left[-b_1\lambda'_\theta - \frac{1}{2}\sum_{i=1}^K(\theta_i - \mu)^2\lambda'_\theta\right]$$
$$\times\left[b_2 + \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]^{A_2}\lambda_e'^{\,A_2-1}\exp\!\left[-b_2\lambda'_e - \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\lambda'_e\right]$$
$$\times|V_\lambda^{-1}|^{1/2}\exp\!\left[-\frac{1}{2}(\zeta - \zeta_\lambda)^TV_\lambda^{-1}(\zeta - \zeta_\lambda)\right]$$
$$= \lambda_\theta'^{\,A_1-1}\lambda_e'^{\,A_2-1}\exp(-b_1\lambda'_\theta - b_2\lambda'_e)\,|V_\lambda^{-1}|^{1/2}\,g(\zeta)\exp\!\left(-\frac{1}{2}g_1\right), \quad \lambda > 0,\; \zeta \in \mathbb{R}^{K+1},$$
where
$$g(\zeta) = \left[b_1 + \frac{1}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right]^{A_1}\left[b_2 + \frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]^{A_2}$$
and
$$g_1 = \sum_{i=1}^K(\theta_i - \mu)^2\lambda'_\theta + \sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\lambda'_e + (\zeta - \zeta_\lambda)^TV_\lambda^{-1}(\zeta - \zeta_\lambda).$$
Consider some $(K+1)$-variate normal distribution $N(\varepsilon, \Sigma)$. Note that
$$\int_{\mathbb{R}^{K+1}}g(\zeta)\sqrt{|\Sigma^{-1}|}\exp\!\left[-\frac{1}{2}(\zeta - \varepsilon)^T\Sigma^{-1}(\zeta - \varepsilon)\right]d\zeta \propto E_{(\varepsilon,\Sigma)}\,g,$$
where $E_{(\varepsilon,\Sigma)}$ is the expectation with respect to the $N(\varepsilon, \Sigma)$ distribution, so
$$\int_{\mathbb{R}^{K+1}}g(\zeta)\exp\!\left\{-\frac{1}{2}\left[\zeta^T\Sigma^{-1}\zeta - 2\varepsilon^T\Sigma^{-1}\zeta\right]\right\}d\zeta \propto \frac{1}{|\Sigma^{-1}|^{1/2}}\exp\!\left[\frac{1}{2}\varepsilon^T\Sigma^{-1}\varepsilon\right]E_{(\varepsilon,\Sigma)}\,g. \tag{3–10}$$
We will separate a kernel of a multivariate normal distribution from $\exp(-\frac{1}{2}g_1)$. We have
$$g_1 = \left[\sum_{i=1}^K(\theta_i - \mu)^2\lambda'_\theta + \sum_{i=1}^K\sum_{j=1}^{m_i}(\theta_i^2 - 2y_{ij}\theta_i)\lambda'_e + \zeta^TV_\lambda^{-1}\zeta - 2\zeta_\lambda^TV_\lambda^{-1}\zeta\right] + \left[\lambda'_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \zeta_\lambda^TV_\lambda^{-1}\zeta_\lambda\right] = g_2 + g_3,$$
where $g_2$ and $g_3$ are the first and second brackets, respectively. Note that $g_3$ does not depend on $\zeta$. We will rewrite $g_2$ as $\zeta^T\Sigma^{-1}\zeta - 2\varepsilon^T\Sigma^{-1}\zeta$. Denote $\Lambda_\theta = \lambda_\theta + \lambda'_\theta$, $\Lambda_e = \lambda_e + \lambda'_e$,
and $\Lambda = (\Lambda_\theta, \Lambda_e)$. Using (3–2), we have
$$g_2 = \sum_{i=1}^K(\theta_i^2 - 2\theta_i\mu + \mu^2)\lambda'_\theta + \sum_{i=1}^K(m_i\theta_i^2 - 2m_i\bar y_i\theta_i)\lambda'_e + \left[\zeta^TV_\lambda^{-1}\zeta - 2\zeta_\lambda^TV_\lambda^{-1}\zeta\right] \tag{3–11}$$
$$= \sum_{i=1}^K(\lambda'_\theta + m_i\lambda'_e)\theta_i^2 + K\lambda'_\theta\mu^2 - 2\lambda'_\theta\sum_{i=1}^K\mu\theta_i - 2\sum_{i=1}^K\lambda'_em_i\bar y_i\theta_i + \left[\zeta^TV_\lambda^{-1}\zeta - 2\zeta_\lambda^TV_\lambda^{-1}\zeta\right]$$
$$= \zeta^T\begin{pmatrix}D_{\lambda'}^2 & -\lambda'_\theta\mathbf{1} \\ -\lambda'_\theta\mathbf{1}^T & K\lambda'_\theta\end{pmatrix}\zeta - 2\left(\lambda'_em_1\bar y_1, \ldots, \lambda'_em_K\bar y_K, 0\right)\zeta + \zeta^TV_\lambda^{-1}\zeta - 2\left(\lambda_em_1\bar y_1, \ldots, \lambda_em_K\bar y_K, \lambda_0\mu_0\right)\zeta$$
$$= \zeta^TV_\Lambda^{-1}\zeta - 2\left(\Lambda_em_1\bar y_1, \ldots, \Lambda_em_K\bar y_K, \lambda_0\mu_0\right)\zeta = \zeta^TV_\Lambda^{-1}\zeta - 2\zeta_\Lambda^TV_\Lambda^{-1}\zeta.$$
Thus,
$$\pi(\lambda' \mid \zeta)\,\pi(\zeta \mid \lambda) \propto \lambda_\theta'^{\,A_1-1}\lambda_e'^{\,A_2-1}\exp(-b_1\lambda'_\theta - b_2\lambda'_e)\,|V_\lambda^{-1}|^{1/2}\exp\!\left(-\frac{1}{2}g_3\right)g(\zeta)\exp\!\left[-\frac{1}{2}\left(\zeta^TV_\Lambda^{-1}\zeta - 2\zeta_\Lambda^TV_\Lambda^{-1}\zeta\right)\right], \quad \lambda > 0,\; \zeta \in \mathbb{R}^{K+1}.$$
Denote $E_{(\zeta_\Lambda, V_\Lambda)}$ by $E_\Lambda$. Using (3–10), we have
$$k(\lambda' \mid \lambda) = \int_{\mathbb{R}^{K+1}}\pi(\lambda' \mid \zeta)\,\pi(\zeta \mid \lambda)\,d\zeta$$
$$\propto \lambda_\theta'^{\,A_1-1}\lambda_e'^{\,A_2-1}\exp(-b_1\lambda'_\theta - b_2\lambda'_e)\,|V_\lambda^{-1}|^{1/2}\exp\!\left(-\frac{1}{2}g_3\right)(E_\Lambda g)\,\frac{1}{|V_\Lambda^{-1}|^{1/2}}\exp\!\left[\frac{1}{2}\zeta_\Lambda^TV_\Lambda^{-1}\zeta_\Lambda\right]$$
$$= \lambda_\theta'^{\,A_1-1}\lambda_e'^{\,A_2-1}\frac{|V_\lambda^{-1}|^{1/2}}{|V_\Lambda^{-1}|^{1/2}}\,(E_\Lambda g)\exp(-b_1\lambda'_\theta - b_2\lambda'_e)\exp\!\left[-\frac{1}{2}\left(\lambda'_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \zeta_\lambda^TV_\lambda^{-1}\zeta_\lambda - \zeta_\Lambda^TV_\Lambda^{-1}\zeta_\Lambda\right)\right], \quad \lambda, \lambda' > 0. \tag{3–12}$$
Now we calculate $\pi(\lambda)$. We have
$$\pi(\lambda, \zeta) \propto \prod_{i=1}^K\prod_{j=1}^{m_i}f(y_{ij} \mid \theta, \lambda_e)\,f(\theta \mid \mu, \lambda_\theta)\,f(\mu)\,f(\lambda_\theta)\,f(\lambda_e)$$
$$= \prod_{i=1}^K\prod_{j=1}^{m_i}\left\{\lambda_e^{1/2}\exp\!\left[-\frac{1}{2}\lambda_e(y_{ij} - \theta_i)^2\right]\right\}\prod_{i=1}^K\left\{\lambda_\theta^{1/2}\exp\!\left[-\frac{1}{2}\lambda_\theta(\theta_i - \mu)^2\right]\right\}\exp\!\left[-\lambda_0(\mu - \mu_0)^2/2\right]\left[\lambda_\theta^{a_1-1}e^{-b_1\lambda_\theta}\right]\left[\lambda_e^{a_2-1}e^{-b_2\lambda_e}\right]$$
$$= \lambda_\theta^{K/2+a_1-1}\lambda_e^{N/2+a_2-1}\exp(-b_1\lambda_\theta - b_2\lambda_e)\exp\!\left\{-\frac{1}{2}\left[\sum_{i=1}^K\sum_{j=1}^{m_i}\lambda_e(y_{ij} - \theta_i)^2 + \sum_{i=1}^K\lambda_\theta(\theta_i - \mu)^2 + \lambda_0(\mu - \mu_0)^2\right]\right\}$$
$$\propto \lambda_\theta^{A_1-1}\lambda_e^{A_2-1}\exp(-b_1\lambda_\theta - b_2\lambda_e)\exp\!\left(-\frac{1}{2}\lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2\right)\exp\!\left(-\frac{1}{2}g_4\right), \quad \lambda > 0,\; \zeta \in \mathbb{R}^{K+1},$$
where
$$g_4 = \sum_{i=1}^K\sum_{j=1}^{m_i}\lambda_e(\theta_i^2 - 2y_{ij}\theta_i) + \sum_{i=1}^K\lambda_\theta(\theta_i - \mu)^2 + \lambda_0(\mu^2 - 2\mu\mu_0).$$
Doing the same as in (3–11), we get
g4 =
[K∑i=1
miλeθ2i − 2
K∑i=1
λemiyiθi
]+
[K∑i=1
λθθ2i − 2
K∑i=1
λθθiµ+Kλθµ2
]+ [λ0µ
2 − 2λ0µ0µ]
=
[K∑i=1
(λθ +miλe)θ2i + (λ0 +Kλθ)µ
2 − 2λθ
K∑i=1
θiµ
]− 2
[K∑i=1
λemiyiθi + λ0µ0µ
]
= ζTV −1λ ζ − 2ζTλ V
−1λ ζ.
Thus,
$$\pi(\lambda, \zeta) \propto \lambda_\theta^{A_1-1}\lambda_e^{A_2-1}\exp(-b_1\lambda_\theta - b_2\lambda_e)\exp\!\left[-\frac{1}{2}\left(\lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \zeta^TV_\lambda^{-1}\zeta - 2\zeta_\lambda^TV_\lambda^{-1}\zeta\right)\right], \quad \lambda > 0,\; \zeta \in \mathbb{R}^{K+1}.$$
Since $\zeta \mid \lambda \sim N(\zeta_\lambda, V_\lambda)$, we have
$$\pi(\lambda) = \frac{\pi(\lambda, \zeta)}{\pi(\zeta \mid \lambda)} \propto \lambda_\theta^{A_1-1}\lambda_e^{A_2-1}\exp(-b_1\lambda_\theta - b_2\lambda_e)\,\frac{1}{|V_\lambda^{-1}|^{1/2}}\exp\!\left[-\frac{1}{2}\left(\lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 - \zeta_\lambda^TV_\lambda^{-1}\zeta_\lambda\right)\right], \quad \lambda > 0. \tag{3–13}$$
From (3–12) and (3–13), we have
$$\frac{k(\lambda' \mid \lambda)}{\pi(\lambda')} \propto \left(\frac{|V_\lambda^{-1}||V_{\lambda'}^{-1}|}{|V_\Lambda^{-1}|}\right)^{1/2}(E_\Lambda g)\exp\!\left[-\frac{1}{2}\left(\zeta_\lambda^TV_\lambda^{-1}\zeta_\lambda + \zeta_{\lambda'}^TV_{\lambda'}^{-1}\zeta_{\lambda'} - \zeta_\Lambda^TV_\Lambda^{-1}\zeta_\Lambda\right)\right], \quad \lambda, \lambda' > 0.$$
Hence,
$$G(\lambda, \lambda') := \left[\frac{k(\lambda' \mid \lambda)}{\pi(\lambda')}\right]^2\pi(\lambda)\,\pi(\lambda')$$
$$\propto \left[\frac{|V_\lambda^{-1}||V_{\lambda'}^{-1}|}{|V_\Lambda^{-1}|^2}\right]^{1/2}(E_\Lambda g)^2\,\lambda_\theta^{A_1-1}\lambda_e^{A_2-1}\lambda_\theta'^{\,A_1-1}\lambda_e'^{\,A_2-1}\exp(-b_1\lambda_\theta - b_2\lambda_e - b_1\lambda'_\theta - b_2\lambda'_e)\exp\!\left(-\frac{1}{2}g_5\right), \quad \lambda, \lambda' > 0, \tag{3–14}$$
where
$$g_5 = \Lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \zeta_\lambda^TV_\lambda^{-1}\zeta_\lambda + \zeta_{\lambda'}^TV_{\lambda'}^{-1}\zeta_{\lambda'} - 2\zeta_\Lambda^TV_\Lambda^{-1}\zeta_\Lambda.$$
We will show that the integral of $G(\lambda, \lambda')$ over a domain, which is a subset of a neighborhood of 0 and will be defined later, is infinite. Denote $m = \min_im_i$ and $M = \max_im_i$. Consider some constant $\delta > 0$ such that
$$2N\delta,\; 8(M+1)\delta < \lambda_0. \tag{3–15}$$
Below we only consider $(\lambda, \lambda') \in (0, \delta)^4$. We need to analyze $\exp(-\frac{1}{2}g_5)$, calculate $|V_\lambda^{-1}|$, and finally analyze $E_\Lambda g$.

We first analyze $\exp(-\frac{1}{2}g_5)$. Hobert and Geyer (1998) gave an exact formula for $\zeta_\lambda$; we re-derive it here in a different way. By Lemma 3.2, we have
$$\zeta_{\lambda,K+1} = \frac{\lambda_0\mu_0 + \lambda_\theta\sum_{i=1}^K\frac{\lambda_em_i\bar y_i}{\lambda_\theta + m_i\lambda_e}}{\lambda_0 + K\lambda_\theta - \lambda_\theta^2\sum_{i=1}^K\frac{1}{\lambda_\theta + m_i\lambda_e}} = \frac{\lambda_0\mu_0 + s_\lambda}{\lambda_0 + t_\lambda},$$
where $s_\lambda = \sum_i\frac{\lambda_\theta\lambda_em_i}{\lambda_\theta + m_i\lambda_e}\bar y_i$,
$$t_\lambda = K\lambda_\theta - \lambda_\theta^2\sum_{i=1}^K\frac{1}{\lambda_\theta + m_i\lambda_e} \tag{3–16}$$
(see Hobert and Geyer, 1998, p. 47), and
$$\zeta_{\lambda,i} = \frac{\lambda_em_i\bar y_i + \lambda_\theta\zeta_{\lambda,K+1}}{\lambda_\theta + m_i\lambda_e}, \quad i = 1, \ldots, K.$$
$\zeta_{\lambda,K+1}$ is a convex combination of the $\bar y_i$s and $\mu_0$, so it is bounded by a constant (see Hobert and Geyer, 1998, p. 418). $\zeta_{\lambda,i}$ is a convex combination of $\bar y_i$ and $\zeta_{\lambda,K+1}$, so it is also bounded by a constant. For $(\lambda, \lambda') \in (0, \delta)^4$, the elements of $V_\lambda^{-1}$ are bounded by a constant, so $\zeta_\lambda^TV_\lambda^{-1}\zeta_\lambda$ is bounded by a constant. Similarly, $\zeta_{\lambda'}^TV_{\lambda'}^{-1}\zeta_{\lambda'}$ and $\zeta_\Lambda^TV_\Lambda^{-1}\zeta_\Lambda$ are also bounded by a constant, and $\Lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2$ is clearly bounded by a constant. Combining all of these bounds, $g_5$ is bounded by a constant, and hence
$$\exp\!\left(-\frac{1}{2}g_5\right) > \text{constant} > 0. \tag{3–17}$$
We now calculate $|V_\lambda^{-1}|$. By Lemma 3.2, we get
$$|V_\lambda^{-1}| = \prod_{i=1}^K(\lambda_\theta + m_i\lambda_e)\left[\lambda_0 + K\lambda_\theta - \lambda_\theta^2\sum_{i=1}^K\frac{1}{\lambda_\theta + m_i\lambda_e}\right] = (\lambda_0 + t_\lambda)\prod_{i=1}^K(\lambda_\theta + m_i\lambda_e). \tag{3–18}$$
Denote
$$p_{\lambda,i} = \frac{\lambda_\theta}{\lambda_\theta + m_i\lambda_e} \quad \text{and} \quad \alpha_{\lambda,i} = m_i\lambda_ep_{\lambda,i}, \quad i = 1, \ldots, K.$$
Note that $t_\lambda = \sum_{i=1}^K\alpha_{\lambda,i}$. Since $0 \leq p_{\Lambda,i} \leq 1$,
$$\alpha_{\Lambda,i} = m_i\Lambda_ep_{\Lambda,i} \leq m_i\Lambda_e \leq 2M\delta, \quad i = 1, \ldots, K.$$
Thus,
$$t_\Lambda = \sum_{i=1}^K\alpha_{\Lambda,i} \leq \sum_{i=1}^Km_i\Lambda_e = N\Lambda_e < 2N\delta < \lambda_0.$$
Similarly, we have
$$t_\lambda,\, t_{\lambda'} < N\delta < \lambda_0.$$
Using (3–18), we have
$$\frac{|V_\lambda^{-1}||V_{\lambda'}^{-1}|}{|V_\Lambda^{-1}|^2} = \frac{(\lambda_0 + t_\lambda)\left[\prod_{i=1}^K(\lambda_\theta + m_i\lambda_e)\right](\lambda_0 + t_{\lambda'})\left[\prod_{i=1}^K(\lambda'_\theta + m_i\lambda'_e)\right]}{(\lambda_0 + t_\Lambda)^2\prod_{i=1}^K(\Lambda_\theta + m_i\Lambda_e)^2}.$$
Because $\lambda_0 + t_\lambda > \lambda_0$, $\lambda_0 + t_{\lambda'} > \lambda_0$, and $\lambda_0 + t_\Lambda < 2\lambda_0$,
$$\frac{(\lambda_0 + t_\lambda)(\lambda_0 + t_{\lambda'})}{(\lambda_0 + t_\Lambda)^2} > \frac{\lambda_0^2}{(2\lambda_0)^2} = \frac{1}{4}.$$
So
$$\frac{|V_\lambda^{-1}||V_{\lambda'}^{-1}|}{|V_\Lambda^{-1}|^2} > \frac{1}{4}\prod_{i=1}^K\frac{(\lambda_\theta + m_i\lambda_e)(\lambda'_\theta + m_i\lambda'_e)}{(\Lambda_\theta + m_i\Lambda_e)^2} \geq (\text{constant})\left(\frac{(\lambda_\theta + m\lambda_e)(\lambda'_\theta + m\lambda'_e)}{(\Lambda_\theta + M\Lambda_e)^2}\right)^K. \tag{3–19}$$
We also have
$$\exp(-b_1\lambda_\theta - b_2\lambda_e)\exp(-b_1\lambda'_\theta - b_2\lambda'_e) \geq \exp[-2\delta(b_1 + b_2)] = \text{constant} > 0.$$
From (3–14), (3–17), (3–19), and the above inequality, for $(\lambda, \lambda') \in (0, \delta)^4$, we have
$$G(\lambda, \lambda') \geq (\text{constant})\,(E_\Lambda g)^2\,\lambda_\theta^{A_1-1}\lambda_e^{A_2-1}\lambda_\theta'^{\,A_1-1}\lambda_e'^{\,A_2-1}\left(\frac{(\lambda_\theta + m\lambda_e)(\lambda'_\theta + m\lambda'_e)}{(\Lambda_\theta + M\Lambda_e)^2}\right)^{K/2}. \tag{3–20}$$
We now analyze $E_\Lambda g$. Because $a_1, a_2 > 0$ and $K, m_i \geq 2$, we have $A_1 > K/2 \geq 1$ and $A_2 > (m_1 + m_2)/2 \geq 2$. Since $A_1, A_2 > 1$,
$$g(\zeta) > \left[\frac{1}{2}(\theta_1 - \mu)^2\right]^{A_1}\left[\frac{1}{2}(y_{11} - \theta_1)^2\right]^{A_2} = (\text{constant})\left[(\theta_1 - \mu)^2\right]^{A_1}\left[(\theta_1 - y_{11})^2\right]^{A_2} = (\text{constant})\left[(\theta_1 - \mu)^2(\theta_1 - y_{11})^{2A_2/A_1}\right]^{A_1}.$$
By Jensen's inequality,
$$E_\Lambda g > (\text{constant})\left\{E_\Lambda\left[(\theta_1 - \mu)^2(\theta_1 - y_{11})^{2A_2/A_1}\right]\right\}^{A_1}.$$
Hobert and Geyer (1998, p. 418) show that the elements of $\zeta_\Lambda$ are uniformly bounded by the constant $\max\{\mu_0, |\bar y_1|, \ldots, |\bar y_K|\}$, the elements of $V_\Lambda$ except the $V_\Lambda(i,i)$s, $1 \leq i \leq K$, are uniformly bounded by $\lambda_0^{-1}$, and $V_\Lambda(i,i) > (\Lambda_\theta + M\Lambda_e)^{-1}$ for $1 \leq i \leq K$. Denote $\tau = (\lambda, \lambda')$. Then $(\theta_1, \mu)^T$ has a bivariate normal distribution satisfying the conditions of Lemma 3.3. So there exists a real number $\delta_1 > 0$ such that
$$E_\Lambda\left[(\theta_1 - \mu)^2|\theta_1 - y_{11}|^{2A_2/A_1}\right] > (\text{constant})\left[\sqrt{\mathrm{Var}_\Lambda\theta_1}\right]^{2+2A_2/A_1} = (\text{constant})\left[\mathrm{Var}_\Lambda\theta_1\right]^{1+A_2/A_1}$$
for $\tau < (\delta_1, \delta_1, \delta_1, \delta_1)$. Finally,
$$E_\Lambda g > (\text{constant})\left[\mathrm{Var}_\Lambda\theta_1\right]^{(1+A_2/A_1)A_1} > (\text{constant})(\Lambda_\theta + M\Lambda_e)^{-A_1-A_2}, \quad \tau < (\delta_1, \delta_1, \delta_1, \delta_1).$$
Combining this with condition (3–15), we arrive at the final conditions on $\delta$:
$$\delta < \delta_1 \quad \text{and} \quad 2N\delta,\; 8(M+1)\delta < \lambda_0. \tag{3–21}$$
From (3–20), when $\delta$ satisfies condition (3–21) and $(\lambda, \lambda') \in (0, \delta)^4$, we have
$$G(\lambda, \lambda') \geq (\text{constant})\,\frac{1}{(\Lambda_\theta + M\Lambda_e)^{2A_1+2A_2}}\,\lambda_\theta^{A_1-1}\lambda_e^{A_2-1}\lambda_\theta'^{\,A_1-1}\lambda_e'^{\,A_2-1}\left(\frac{(\lambda_\theta + m\lambda_e)(\lambda'_\theta + m\lambda'_e)}{(\Lambda_\theta + M\Lambda_e)^2}\right)^{K/2}$$
$$= (\text{constant})\,\frac{1}{(\Lambda_\theta + M\Lambda_e)^4}\left(\frac{\lambda_\theta}{\Lambda_\theta + M\Lambda_e}\right)^{A_1-1}\left(\frac{\lambda_e}{\Lambda_\theta + M\Lambda_e}\right)^{A_2-1}\left(\frac{\lambda'_\theta}{\Lambda_\theta + M\Lambda_e}\right)^{A_1-1}\left(\frac{\lambda'_e}{\Lambda_\theta + M\Lambda_e}\right)^{A_2-1}\left(\frac{\lambda_\theta + m\lambda_e}{\Lambda_\theta + M\Lambda_e}\right)^{K/2}\left(\frac{\lambda'_\theta + m\lambda'_e}{\Lambda_\theta + M\Lambda_e}\right)^{K/2}.$$
Consider the set
$$D = \left\{(\lambda, \lambda') \in (0, \delta)^4 \;\Big|\; \frac{1}{2} < \frac{\lambda_\theta}{\lambda'_e},\, \frac{\lambda_e}{\lambda'_e},\, \frac{\lambda'_\theta}{\lambda'_e},\, \frac{\lambda_\theta}{\lambda'_\theta},\, \frac{\lambda_e}{\lambda'_\theta},\, \frac{\lambda_\theta}{\lambda_e} < 2\right\}.$$
This set has positive Lebesgue measure in $\mathbb{R}^4$. Because $\frac{1}{2} < \frac{x}{y} < 2$ and $\frac{1}{2} < \frac{y}{x} < 2$ are equivalent, on $D$ we have
$$\frac{1}{2} < \frac{x}{y} < 2,$$
where $x$ and $y$ can be any of $\lambda_\theta$, $\lambda_e$, $\lambda'_\theta$, or $\lambda'_e$. Two properties help us picture the set $D$: $D$ contains the segment $\{(\lambda, \lambda') = (x, x, x, x) : 0 < x < \delta\}$, and if $(\lambda, \lambda') \in D$ then $(p\lambda, p\lambda') \in D$ for $0 < p < 1$. Note that
$$\frac{\lambda_\theta}{\Lambda_\theta + M\Lambda_e} = \left[1 + \frac{\lambda'_\theta}{\lambda_\theta} + M\frac{\lambda_e}{\lambda_\theta} + M\frac{\lambda'_e}{\lambda_\theta}\right]^{-1} \geq \left(1 + 2(1 + 2M)\right)^{-1} = \text{constant}.$$
We can argue the same way to get positive constant lower bounds for
$$\frac{x}{\Lambda_\theta + M\Lambda_e},$$
where $x$ can be any of $\lambda_e$, $\lambda'_\theta$, and $\lambda'_e$. Also note that
$$\frac{\lambda_\theta + m\lambda_e}{\Lambda_\theta + M\Lambda_e} > \frac{\lambda_\theta}{\Lambda_\theta + M\Lambda_e} \quad \text{and} \quad \frac{\lambda'_\theta + m\lambda'_e}{\Lambda_\theta + M\Lambda_e} > \frac{\lambda'_\theta}{\Lambda_\theta + M\Lambda_e}.$$
Thus, on $D$,
$$\left[\frac{k(\lambda' \mid \lambda)}{\pi(\lambda')}\right]^2\pi(\lambda)\,\pi(\lambda') \geq (\text{constant})\,\frac{1}{(\Lambda_\theta + M\Lambda_e)^4}.$$
We only need to show that
$$\int_D(\Lambda_\theta + M\Lambda_e)^{-4}\,d\lambda\,d\lambda' = \infty. \tag{3–22}$$
We apply spherical coordinates in $\mathbb{R}^4$ (see Leoni, 2009, p. 253) with
$$\lambda_\theta = r\cos(\phi_1), \quad \lambda_e = r\sin(\phi_1)\cos(\phi_2), \quad \lambda'_\theta = r\sin(\phi_1)\sin(\phi_2)\cos(\phi_3), \quad \lambda'_e = r\sin(\phi_1)\sin(\phi_2)\sin(\phi_3),$$
$$d\lambda\,d\lambda' = r^3\sin^2(\phi_1)\sin(\phi_2)\,dr\,d\phi_1\,d\phi_2\,d\phi_3.$$
Integrating over the domain $D$ is equivalent to integrating over some domain $S$ in the spherical coordinate system. We will find a subset $S_b$ of $S$ that is easy to describe (see A.3 for graphs of $D$ and $S_b$ in $\mathbb{R}^2$ and $\mathbb{R}^3$). Denote $\Phi = (\phi_1, \phi_2, \phi_3)$ and
$$h(\Phi) = \left[\cos(\phi_1) + \sin(\phi_1)\cos(\phi_2) + \sin(\phi_1)\sin(\phi_2)\cos(\phi_3) + \sin(\phi_1)\sin(\phi_2)\sin(\phi_3)\right]^{-4}\sin^2(\phi_1)\sin(\phi_2).$$
Since $M(\Lambda_\theta + \Lambda_e) > \Lambda_\theta + M\Lambda_e$ and $\Lambda_\theta + \Lambda_e = r\left[\cos(\phi_1) + \sin(\phi_1)\cos(\phi_2) + \sin(\phi_1)\sin(\phi_2)\cos(\phi_3) + \sin(\phi_1)\sin(\phi_2)\sin(\phi_3)\right]$,
$$\int_D(\Lambda_\theta + M\Lambda_e)^{-4}\,d\lambda\,d\lambda' \geq M^{-4}\int_D(\Lambda_\theta + \Lambda_e)^{-4}\,d\lambda\,d\lambda' = M^{-4}\int_S r^{-1}h(\Phi)\,dr\,d\Phi \geq M^{-4}\int_{S_b}r^{-1}h(\Phi)\,dr\,d\Phi.$$
We only need to show that the integral over $S_b$ is infinite. Because $D$ contains the segment $\{(\lambda, \lambda') = (x, x, x, x) : 0 < x < \delta\}$, we find the image of that segment in the spherical coordinate system; that is, we find $\Phi_0 = (\phi_{0,1}, \phi_{0,2}, \phi_{0,3})$ such that $(\lambda, \lambda') = (x, x, x, x)$. From $\lambda'_\theta = \lambda'_e$ we get $\sin(\phi_3) = \cos(\phi_3)$, and hence $\phi_3 = \pi/4$. $\lambda'_\theta = \lambda_e$ implies $\cos(\phi_2) = \sin(\phi_2)/\sqrt{2}$; by $\sin^2(\phi_2) + \cos^2(\phi_2) = 1$, $\cos(\phi_2) = 1/\sqrt{3}$. $\lambda_\theta = \lambda_e$ implies $\cos(\phi_1) = \sin(\phi_1)/\sqrt{3}$, and $\cos(\phi_1) = 1/2$ follows. Finally, $\Phi_0 = (\arccos(1/2), \arccos(1/\sqrt{3}), \pi/4)$ and
$$\cos(\phi_{0,1}) = \sin(\phi_{0,1})\cos(\phi_{0,2}) = \sin(\phi_{0,1})\sin(\phi_{0,2})\cos(\phi_{0,3}) = \sin(\phi_{0,1})\sin(\phi_{0,2})\sin(\phi_{0,3}) = \frac{1}{2}.$$
Denote
$$S_n = \left\{(r, \Phi) \;\Big|\; 0 < r < \delta,\; |\phi_i - \phi_{0,i}| < \frac{1}{n},\; i = 1, 2, 3\right\}.$$
Since $\sin$ and $\cos$ are continuous, $\cos(\phi_1)$, $\sin(\phi_1)\cos(\phi_2)$, $\sin(\phi_1)\sin(\phi_2)\cos(\phi_3)$, and $\sin(\phi_1)\sin(\phi_2)\sin(\phi_3)$ are all close to $1/2$ when $\Phi$ is close to $\Phi_0$. So, given any $0 < \varepsilon < 1/2$, there exists $b$ large enough that
$$\frac{1}{2} - \varepsilon < \cos(\phi_1),\; \sin(\phi_1)\cos(\phi_2),\; \sin(\phi_1)\sin(\phi_2)\cos(\phi_3),\; \sin(\phi_1)\sin(\phi_2)\sin(\phi_3) < \frac{1}{2} + \varepsilon$$
for $(r, \Phi) \in S_b$. Thus,
$$r\left(\tfrac{1}{2} + \varepsilon\right) > \lambda_\theta, \lambda_e, \lambda'_\theta, \lambda'_e > r\left(\tfrac{1}{2} - \varepsilon\right), \quad (r, \Phi) \in S_b,$$
and hence
$$\frac{1/2 - \varepsilon}{1/2 + \varepsilon} = \frac{r(1/2 - \varepsilon)}{r(1/2 + \varepsilon)} < \frac{x}{y} < \frac{r(1/2 + \varepsilon)}{r(1/2 - \varepsilon)} = \frac{1/2 + \varepsilon}{1/2 - \varepsilon}, \quad (r, \Phi) \in S_b,$$
where $x$ and $y$ can be any of $\lambda_\theta$, $\lambda_e$, $\lambda'_\theta$, or $\lambda'_e$. Note that
$$\frac{1}{2} < \frac{1/2 - \varepsilon}{1/2 + \varepsilon}$$
is equivalent to $\varepsilon < 1/6$. If we select $\varepsilon = 1/7$, then there exists $b$ such that $(r, \Phi) \in S_b$ implies
$$\frac{1}{2} < \frac{x}{y} < 2,$$
where $x$ and $y$ can be any of $\lambda_\theta$, $\lambda_e$, $\lambda'_\theta$, or $\lambda'_e$. We also have $0 < x < \delta$, where $x$ can be any of $\lambda_\theta$, $\lambda_e$, $\lambda'_\theta$, or $\lambda'_e$. Hence $(\lambda, \lambda') \in D$; i.e., $S_b$ is a subset of $S$. We have
$$\int_{S_b}r^{-1}h(\Phi)\,dr\,d\Phi = \int_0^\delta r^{-1}\,dr\int_{\phi_{0,1}-\frac{1}{b}}^{\phi_{0,1}+\frac{1}{b}}\int_{\phi_{0,2}-\frac{1}{b}}^{\phi_{0,2}+\frac{1}{b}}\int_{\phi_{0,3}-\frac{1}{b}}^{\phi_{0,3}+\frac{1}{b}}h(\Phi)\,d\Phi.$$
Because $\int_0^\delta r^{-1}\,dr = \infty$ and $h(\Phi)$ is positive, the last integral is infinite.
3.4 The Gibbs Sampler with Improper Priors (and Alternative Blocking) Is Not Trace Class

Recall that a necessary condition for a proper posterior is $A = a + K/2 > \frac{1}{2}$.

Proposition 3.4. Suppose that there exists $i$ such that the $y_{ij}$s are not all equal for $j = 1, \ldots, m_i$, and that $A > 3/2$. Then the GS with improper priors is not trace class; i.e.,
$$\int_{\lambda>0}\int_{-\infty}^{\infty}\int_{\theta\in\mathbb{R}^K}\pi(\mu, \lambda \mid \theta)\,\pi(\theta \mid \mu, \lambda)\,d\theta\,d\mu\,d\lambda = \infty.$$
Proof. From (3–4), (3–7), and (3–8), we have
$$\pi(\mu, \lambda \mid \theta)\,\pi(\theta \mid \mu, \lambda) = \pi(\mu, \lambda_\theta \mid \lambda_e, \theta)\,\pi(\lambda_e \mid \theta)\,\pi(\theta \mid \mu, \lambda)$$
$$\propto \lambda_\theta^{A-1}\left[\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]^{A-\frac{1}{2}}\exp\!\left[-\frac{\lambda_\theta}{2}\sum_{i=1}^K(\theta_i - \mu)^2\right]\left[\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]^{B}\lambda_e^{B-1}\exp\!\left[-\frac{1}{2}\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\lambda_e\right]$$
$$\times|V_\lambda^{-1}|^{1/2}\exp\!\left[-\frac{1}{2}(\theta - \theta_{\mu,\lambda})^TV_\lambda^{-1}(\theta - \theta_{\mu,\lambda})\right]$$
$$= \lambda_\theta^{A-1}\lambda_e^{B-1}|V_\lambda^{-1}|^{1/2}\,g(\theta)\exp\!\left(-\frac{1}{2}h_1\right), \quad \lambda > 0,\; \mu \in \mathbb{R},\; \theta \in \mathbb{R}^K,$$
where
$$g(\theta) = \left[\sum_{i=1}^K(\theta_i - \bar\theta)^2\right]^{A-\frac{1}{2}}\left[\sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\right]^{B} \tag{3–23}$$
and
$$h_1 = \sum_{i=1}^K(\theta_i - \mu)^2\lambda_\theta + \sum_{i=1}^K\sum_{j=1}^{m_i}(y_{ij} - \theta_i)^2\lambda_e + (\theta - \theta_{\mu,\lambda})^TV_\lambda^{-1}(\theta - \theta_{\mu,\lambda}).$$
We will separate a kernel of a multivariate normal distribution from $\exp(-\frac{1}{2}h_1)$. We have
$$h_1 = \sum_{i=1}^K(\theta_i^2 - 2\theta_i\mu + \mu^2)\lambda_\theta + \sum_{i=1}^K\sum_{j=1}^{m_i}(\theta_i^2 - 2y_{ij}\theta_i + y_{ij}^2)\lambda_e + \left[\theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta + \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda}\right]$$
$$= \left[\sum_{i=1}^K(\theta_i^2 - 2\theta_i\mu)\lambda_\theta + \sum_{i=1}^K\sum_{j=1}^{m_i}(\theta_i^2 - 2y_{ij}\theta_i)\lambda_e + \theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta\right] + \left[K\mu^2\lambda_\theta + \lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda}\right]$$
$$= h_2 + h_3,$$
where $h_2$ and $h_3$ are the first and second brackets, respectively. Note that $h_3$ does not depend on $\theta$. We will rewrite $h_2$ as $\theta^T\Sigma^{-1}\theta - 2\varepsilon^T\Sigma^{-1}\theta$. Using (3–6) and $V_{2\lambda}^{-1} = 2V_\lambda^{-1}$, we have
$$h_2 = \sum_{i=1}^K(\theta_i^2 - 2\theta_i\mu)\lambda_\theta + \sum_{i=1}^K(m_i\theta_i^2 - 2m_i\bar y_i\theta_i)\lambda_e + \left[\theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta\right]$$
$$= \sum_{i=1}^K(\lambda_\theta + m_i\lambda_e)\theta_i^2 - 2\sum_{i=1}^K(\lambda_\theta\mu + \lambda_em_i\bar y_i)\theta_i + \left[\theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta\right]$$
$$= \theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta + \left[\theta^TV_\lambda^{-1}\theta - 2\theta_{\mu,\lambda}^TV_\lambda^{-1}\theta\right] = \theta^TV_{2\lambda}^{-1}\theta - 2\theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta.$$
Thus,
$$\pi(\mu, \lambda \mid \theta)\,\pi(\theta \mid \mu, \lambda) \propto \lambda_\theta^{A-1}\lambda_e^{B-1}|V_\lambda^{-1}|^{1/2}\exp\!\left(-\frac{1}{2}h_3\right)g(\theta)\exp\!\left\{-\frac{1}{2}\left[\theta^TV_{2\lambda}^{-1}\theta - 2\theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta\right]\right\}, \quad \lambda > 0,\; \mu \in \mathbb{R},\; \theta \in \mathbb{R}^K.$$
As in (3–10), and denoting $E_{(\theta_{\mu,\lambda}, V_\lambda)}$ by $E_{\mu,\lambda}$, we have
$$G(\mu, \lambda) := \int_{\mathbb{R}^K}\pi(\mu, \lambda \mid \theta)\,\pi(\theta \mid \mu, \lambda)\,d\theta$$
$$\propto \lambda_\theta^{A-1}\lambda_e^{B-1}|V_\lambda^{-1}|^{1/2}\exp\!\left(-\frac{1}{2}h_3\right)(E_{\mu,\lambda}g)\,\frac{1}{|V_{2\lambda}^{-1}|^{1/2}}\exp\!\left[\frac{1}{2}\theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta_{\mu,\lambda}\right]$$
$$= \lambda_\theta^{A-1}\lambda_e^{B-1}\frac{|V_\lambda^{-1}|^{1/2}}{|V_{2\lambda}^{-1}|^{1/2}}\,(E_{\mu,\lambda}g)\exp\!\left(-\frac{1}{2}h_4\right), \quad \lambda > 0,\; \mu \in \mathbb{R},$$
where
$$h_4 = h_3 - \theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta_{\mu,\lambda} = K\mu^2\lambda_\theta + \lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda} - \theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta_{\mu,\lambda}.$$
Since
$$|V_\lambda^{-1}| = \prod_{i=1}^K(\lambda_\theta + m_i\lambda_e),$$
we get
$$|V_{2\lambda}^{-1}| = \prod_{i=1}^K(2\lambda_\theta + 2m_i\lambda_e) = 2^K|V_\lambda^{-1}|.$$
Denote
$$c = \sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 - \sum_{i=1}^Km_i\bar y_i^2.$$
There exists $i$ such that $y_{ij} \neq y_{ij'}$ for some $j \neq j'$, so $c > 0$. Since $V_{2\lambda}^{-1} = 2V_\lambda^{-1}$,
$$h_4 = K\mu^2\lambda_\theta + \lambda_e\sum_{i=1}^K\sum_{j=1}^{m_i}y_{ij}^2 + \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda} - \theta_{\mu,\lambda}^TV_{2\lambda}^{-1}\theta_{\mu,\lambda} = c\lambda_e + \left[\sum_{i=1}^K\lambda_em_i\bar y_i^2 + K\mu^2\lambda_\theta - \theta_{\mu,\lambda}^TV_\lambda^{-1}\theta_{\mu,\lambda}\right] = c\lambda_e + h_5,$$
where $h_5$ denotes the bracket above. Denote
$$p_{\lambda,i} = \frac{\lambda_\theta}{\lambda_\theta + m_i\lambda_e} \quad \text{and} \quad \alpha_{\lambda,i} = m_i\lambda_ep_{\lambda,i} = \frac{m_i\lambda_e\lambda_\theta}{\lambda_\theta + m_i\lambda_e} = \lambda_\theta(1 - p_{\lambda,i}), \quad i = 1, \ldots, K.$$
By (3–5),
$$\theta_{\mu,\lambda,i} = p_{\lambda,i}\mu + (1 - p_{\lambda,i})\bar y_i. \tag{3–24}$$
By (3–5) and (3–24),
$$(\lambda_\theta + m_i\lambda_e)\theta_{\mu,\lambda,i}^2 = (\lambda_\theta\mu + \lambda_em_i\bar y_i)\theta_{\mu,\lambda,i} = (\lambda_\theta\mu + \lambda_em_i\bar y_i)\left[p_{\lambda,i}\mu + (1 - p_{\lambda,i})\bar y_i\right]$$
$$= \lambda_em_i(1 - p_{\lambda,i})\bar y_i^2 + \lambda_\theta p_{\lambda,i}\mu^2 + \left[\lambda_\theta(1 - p_{\lambda,i}) + \lambda_em_ip_{\lambda,i}\right]\mu\bar y_i$$
$$= (\lambda_em_i - \alpha_{\lambda,i})\bar y_i^2 + \lambda_\theta p_{\lambda,i}\mu^2 + 2\alpha_{\lambda,i}\mu\bar y_i.$$
Thus,

θ_{µ,λ}^T V_λ^{−1}θ_{µ,λ} = ∑_{i=1}^K (λ_θ + m_iλ_e)θ_{µ,λ,i}²

= ∑_{i=1}^K λ_em_iȳ_i² + µ² ∑_{i=1}^K λ_θp_{λ,i} − ∑_{i=1}^K α_{λ,i}(ȳ_i² − 2µȳ_i)

= ∑_{i=1}^K λ_em_iȳ_i² + µ² ∑_{i=1}^K (λ_θp_{λ,i} + α_{λ,i}) − ∑_{i=1}^K α_{λ,i}(µ − ȳ_i)².

Because

λ_θp_{λ,i} + α_{λ,i} = λ_θp_{λ,i} + (1 − p_{λ,i})λ_θ = λ_θ,

we have

θ_{µ,λ}^T V_λ^{−1}θ_{µ,λ} = ∑_{i=1}^K λ_em_iȳ_i² + Kµ²λ_θ − ∑_{i=1}^K α_{λ,i}(µ − ȳ_i)²

and hence

h_5 = ∑_{i=1}^K α_{λ,i}(µ − ȳ_i)².

Finally,

G(µ, λ) ∝ λ_θ^{A−1}λ_e^{B−1} (E_{µ,λ}g) exp(−(1/2)cλ_e − (1/2) ∑_{i=1}^K α_{λ,i}(µ − ȳ_i)²),  λ > 0, µ ∈ ℝ.

We will show that

∫_D G(µ, λ) dµ dλ = ∞,

where the domain D is defined later. Again, we denote by "constant" some known positive constant. Denote

m = min_i m_i and M = max_i m_i.
Since

∑_{j=1}^{m_i} (y_{ij} − θ_i)² = m_iθ_i² − 2m_iȳ_iθ_i + ∑_{j=1}^{m_i} y_{ij}² = m_i(θ_i − ȳ_i)² + (∑_{j=1}^{m_i} y_{ij}² − m_iȳ_i²),

we get

∑_{i=1}^K ∑_{j=1}^{m_i} (y_{ij} − θ_i)² ≥ ∑_{i=1}^K (∑_{j=1}^{m_i} y_{ij}² − m_iȳ_i²) = c > 0

and hence

g(θ) ≥ c^B [∑_{i=1}^K (θ_i − θ̄)²]^{A−1/2}.

By Jensen's inequality and A − 1/2 > 1,

E_{µ,λ}g ≥ c^B [E_{µ,λ} ∑_{i=1}^K (θ_i − θ̄)²]^{A−1/2}.

Given a random vector Z with EZ = ε and VZ = Σ and a matrix C, we know that

E(Z^T C Z) = tr(CΣ) + ε^T C ε.

We also know that

∑_{i=1}^K (θ_i − θ̄)² = θ^T (I − 11^T/K)θ.

Thus,

E_{µ,λ} ∑_{i=1}^K (θ_i − θ̄)² ≥ tr((I − 11^T/K)V_λ) = tr(V_λ) − (1/K)tr(V_λ) = ((K − 1)/K) ∑_{i=1}^K 1/(λ_θ + m_iλ_e) ≥ (K − 1)/(λ_θ + Mλ_e)

and hence

E_{µ,λ}g ≥ c^B (K − 1)^{A−1/2} (λ_θ + Mλ_e)^{−(A−1/2)}.

Note that

α_{λ,i} = λ_θm_iλ_e/(λ_θ + m_iλ_e) ≤ Mλ_θλ_e/(λ_θ + mλ_e).
Combining the above inequalities, we get

G(µ, λ) ≥ (constant) λ_θ^{A−1}λ_e^{B−1} (λ_θ + Mλ_e)^{−(A−1/2)} exp(−(1/2)cλ_e − (1/2)[Mλ_θλ_e/(λ_θ + mλ_e)] t),  λ > 0, µ ∈ ℝ,

where

t = ∑_{i=1}^K (µ − ȳ_i)².

We will prove that

∫_D λ_θ^{A−1}λ_e^{B−1} (λ_θ + Mλ_e)^{−(A−1/2)} exp(−(1/2)cλ_e − (1/2)[Mλ_θλ_e/(λ_θ + mλ_e)] t) dλ dµ = ∞.

Denote x := mλ_e, y := λ_θ, c′ = c/(2m), and t′ = Mt/(2m). The condition λ > 0 is equivalent to x, y > 0, and dλ = m^{−1} dx dy. We need to prove that

∫_D x^{B−1}y^{A−1} (y + Mx/m)^{−(A−1/2)} exp(−c′x − [xy/(x + y)]t′) dx dy dµ = ∞.

Fix any constant ε > 0. Define the domain D by x, |µ| < ε and y > ε. We have

xy/(x + y) = x · y/(x + y) ≤ x < ε.

Because |µ| < ε, t′ < δ for some constant δ and hence

c′x + [xy/(x + y)]t′ < c′ε + εδ = constant.

We also have

1/(y + Mx/m) > 1/(y + Mε/m) =: 1/(y + c_1).

We only need to show that

∫_D x^{B−1} y^{A−1}/(y + c_1)^{A−1/2} dx dy dµ = ∞.
Note that

∫_D x^{B−1} y^{A−1}/(y + c_1)^{A−1/2} dx dy dµ = ∫_{x<ε} x^{B−1} dx ∫_{|µ|<ε} dµ ∫_{y>ε} y^{A−1}/(y + c_1)^{A−1/2} dy.

The first two integrals are positive, so it suffices to show that the last one is infinite. Since c_1 = Mε/m ≥ ε,

∫_{y>ε} y^{A−1}/(y + c_1)^{A−1/2} dy ≥ ∫_{y>c_1} y^{A−1}/(y + c_1)^{A−1/2} dy = ∫_{y>c_1} [y/(y + c_1)]^{A−1/2} y^{−1/2} dy.

For y > c_1, 2y > y + c_1 and hence

∫_{y>c_1} [y/(y + c_1)]^{A−1/2} y^{−1/2} dy ≥ ∫_{y>c_1} 2^{−(A−1/2)} y^{−1/2} dy = ∞.
CHAPTER 4
CHARACTERIZATION OF GEOMETRIC ERGODICITY FOR BIRTH-DEATH MARKOV CHAINS
4.1 Summary
Recall from Section 2.2 that X = {X_n}_{n=0}^∞ is a birth-death chain with state space ℕ and Markov transition matrix given by

M =
⎡ r_1  p_1  0    0    0   ··· ⎤
⎢ q_2  r_2  p_2  0    0   ··· ⎥
⎢ 0    q_3  r_3  p_3  0   ··· ⎥
⎣ ⋮    ⋮    ⋮    ⋮    ⋱      ⎦.

X is irreducible, aperiodic, and positive recurrent if and only if the following three conditions hold:

p_i > 0 for all i ∈ ℕ,   (4–1)

r_i > 0 for some i ∈ ℕ,   (4–2)

and

c = ∑_{i=1}^∞ c_i < ∞,   (4–3)

where

c_1 = 1,  c_i = (p_1 p_2 ··· p_{i−1})/(q_2 q_3 ··· q_i),  i = 2, 3, ....   (4–4)

The stationary distribution is π = {π_i}_{i=1}^∞ with π_i = c_i/c for i = 1, 2, .... For convenience, denote c_0 = p_0 = 1. We have

π_i p_i = π_{i+1} q_{i+1},  ∀i ∈ ℕ,   (4–5)

and X is always reversible.

Let M^n = (m_{ij}^{(n)}) denote the n-step transition matrix of X. From Section 1.1.1, X is geometrically ergodic if there exist a function R : ℕ → [0,∞) and a constant 0 < ρ < 1 such that

∑_{j=1}^∞ |m_{ij}^{(n)} − π_j| ≤ R(i)ρ^n,  ∀i ∈ ℕ, ∀n ∈ ℕ.
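The quantities in (4–4) and (4–5) are easy to check numerically. The following sketch computes a truncation of the c_i, normalizes it, and verifies the reversibility identity (4–5); the constant choice p_i = 1/3, q_i = 1/2 is an illustrative assumption, not a chain from the text (it has p < q, so the series in (4–3) converges).

```python
# Numerical check of (4-4) and (4-5) for an illustrative birth-death chain.
# The constants p_i = 1/3 (i >= 1), q_i = 1/2 (i >= 2) are assumptions.

def c_seq(p, q, n):
    """c_1 = 1, c_i = p_1...p_{i-1}/(q_2...q_i) as in (4-4)."""
    c = [1.0]
    for i in range(2, n + 1):
        c.append(c[-1] * p(i - 1) / q(i))
    return c

p = lambda i: 1/3
q = lambda i: 1/2                 # only used for i >= 2

n = 200
c = c_seq(p, q, n)
total = sum(c)                    # truncation of c = sum_i c_i
pi = [ci / total for ci in c]     # pi_i = c_i / c

# Detailed balance (4-5): pi_i p_i = pi_{i+1} q_{i+1}
for i in range(1, n):
    assert abs(pi[i - 1] * p(i) - pi[i] * q(i + 1)) < 1e-12
```

The detailed-balance check holds exactly by construction of the c_i; it is the finite-state analogue of the reversibility noted above.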
We suppose throughout this chapter, except in Corollary 4.5 and Corollary 4.15, that M satisfies conditions (4–1)-(4–3). The remaining sections of this chapter are organized as follows. Section 4.2 reviews some known results on the geometric ergodicity of birth-death chains. In Section 4.3.1, we develop a simple necessary and sufficient condition for the geometric ergodicity of birth-death chains. We apply this result to the toy GS from Tan et al. (2013) in Section 4.3.2. We apply the method of Section 4.3.1 to study a random walk on ℤ in Section 4.3.3. Finally, Section 4.3.4 gives some results related to birth-death chains.
4.2 Some Known Results on the Geometric Ergodicity of Birth-Death Chains
4.2.1 Orthogonal Polynomial Method
In this section, we review some results from van Doorn and Schrijner (1995). Note that Section 3.1 of van Doorn and Schrijner (1995) defines geometric ergodicity in a less standard way: it only requires that the Markov chain converge at a geometric rate, even when it converges to 0. Hence we have to add conditions (4–1)-(4–3) to their results. van Doorn and Schrijner (1995) used Karlin and McGregor's (1959) representation of the n-step transition matrix of a birth-death chain via an orthogonal polynomial system.
Proposition 4.1. (Karlin and McGregor, 1959, Theorem 1) There exist polynomials Q_n(x) of degree n − 1 such that

Q_1(x) = 1,
r_1Q_1(x) + p_1Q_2(x) = xQ_1(x),
q_iQ_{i−1}(x) + r_iQ_i(x) + p_iQ_{i+1}(x) = xQ_i(x),  i ≥ 2.   (4–6)

Then there exists a unique measure ψ of total mass 1 on the interval [−1, 1] such that {Q_i(x)}_{i=1}^∞ are orthogonal with respect to ψ, i.e.,

π_j ∫_{−1}^{1} Q_i(x)Q_j(x) dψ(x) = δ_{ij},  i, j ≥ 1,   (4–7)

where δ_{ii} = 1 and δ_{ij} = 0 for i ≠ j. Furthermore,

m_{ij}^{(n)} = π_j ∫_{−1}^{1} x^n Q_i(x)Q_j(x) dψ(x),  i, j ≥ 1.   (4–8)
Denote Q(x) = (Q_1(x), Q_2(x), ···)^T; then we can rewrite (4–6) as MQ(x) = xQ(x) with Q_1(x) = 1. It is very difficult to find a representation similar to (4–8) for other Markov chains. Karlin and McGregor (1959) proved Proposition 4.1, but we present their proof here in a clearer way by adding some extra steps, as follows.
Proof of Proposition 4.1. For any real number x, consider the equation

Mφ = xφ,

where φ = {φ_i}_{i=1}^∞ is a real sequence. That means

r_1φ_1 + p_1φ_2 = xφ_1,
q_2φ_1 + r_2φ_2 + p_2φ_3 = xφ_2,
q_3φ_2 + r_3φ_3 + p_3φ_4 = xφ_3,
...

If we know φ_1, then

φ_2 = (x − r_1)φ_1/p_1,
φ_3 = [(x − r_2)φ_2 − q_2φ_1]/p_2,
φ_4 = [(x − r_3)φ_3 − q_3φ_2]/p_3,
...   (4–9)

So for any real number x, Mφ = xφ has a unique solution φ up to a constant factor. Because φ may not be in the space L²(π), x may not be an eigenvalue. For each real number x, let Q(x) = (Q_1(x), Q_2(x), ···)^T be the unique solution of MQ(x) = xQ(x) such that Q_1(x) = 1. Q_1 is a polynomial of degree 0. As in (4–9), we have p_1Q_2(x) = (x − r_1)Q_1(x) = x − r_1, so Q_2 is a polynomial of degree 1. As in (4–9),

p_iQ_{i+1}(x) = (x − r_i)Q_i(x) − q_iQ_{i−1}(x),  i ≥ 2.

By induction, Q_i is a polynomial of degree i − 1 for i ≥ 1.
Recall the Hilbert spaces L²(π) and L²_0(π) from Section 2.2. For i ∈ ℕ, let e^{(i)} ∈ L²(π) denote the vector whose ith coordinate equals 1/π_i (note that it is not 1/√π_i as in Section 2.2) and whose other coordinates equal 0. By (4–5), p_iπ_i = q_{i+1}π_{i+1}. Using this equality for n ≥ 2, we get

(Me^{(n)})_i = ∑_{j=1}^∞ m_{ij}e_j^{(n)} = m_{in}e_n^{(n)} = m_{in}/π_n =
  p_{n−1}/π_n = q_n/π_{n−1}   if i = n − 1,
  r_n/π_n                     if i = n,
  q_{n+1}/π_n = p_n/π_{n+1}   if i = n + 1,
  0                           otherwise,

and hence

Me^{(n)} = q_ne^{(n−1)} + r_ne^{(n)} + p_ne^{(n+1)},  n ≥ 2.   (4–10)

For n = 1,

Me^{(1)} = r_1e^{(1)} + p_1e^{(2)}.
We will prove that Q_n(M)e^{(1)} = e^{(n)} for all n ≥ 1. It is obviously true for n = 1 because Q_1(M) = I. Suppose that Q_i(M)e^{(1)} = e^{(i)} for 1 ≤ i ≤ n; it suffices to show that Q_{n+1}(M)e^{(1)} = e^{(n+1)}. From (4–6), we get

MQ_n(M) = q_nQ_{n−1}(M) + r_nQ_n(M) + p_nQ_{n+1}(M)
⇒ MQ_n(M)e^{(1)} = q_nQ_{n−1}(M)e^{(1)} + r_nQ_n(M)e^{(1)} + p_nQ_{n+1}(M)e^{(1)}
⇒ Me^{(n)} = q_ne^{(n−1)} + r_ne^{(n)} + p_nQ_{n+1}(M)e^{(1)}.

Combining this with (4–10), we get Q_{n+1}(M)e^{(1)} = e^{(n+1)}.
For 1 ≤ i, j ≤ n,

⟨M^n e^{(j)}, e^{(i)}⟩ = ∑_{k=1}^∞ (M^n e^{(j)})_k e_k^{(i)} π_k = ∑_{k=1}^∞ ∑_{l=1}^∞ e_l^{(j)} m_{kl}^{(n)} e_k^{(i)} π_k = e_j^{(j)} m_{ij}^{(n)} e_i^{(i)} π_i = m_{ij}^{(n)}/π_j,

so

m_{ij}^{(n)} = π_j ⟨M^n e^{(j)}, e^{(i)}⟩
= π_j ⟨M^n Q_j(M)e^{(1)}, Q_i(M)e^{(1)}⟩   (because Q_n(M)e^{(1)} = e^{(n)})
= π_j ⟨Q_i(M)M^n Q_j(M)e^{(1)}, e^{(1)}⟩   (because Q_i(M) is self-adjoint)
= π_j ⟨M^n Q_i(M)Q_j(M)e^{(1)}, e^{(1)}⟩   (because Q_i(M) commutes with M).

Denote α(x) = x^n Q_i(x)Q_j(x); then

m_{ij}^{(n)} = π_j ⟨α(M)e^{(1)}, e^{(1)}⟩.

From formula (2.4) in Conway (1990, p.264),

m_{ij}^{(n)} = π_j ∫_{−1}^{1} α(x) dψ(x),

where ψ(x) = ⟨E_x e^{(1)}, e^{(1)}⟩ (see, e.g., Conway, 1990, p.257, Lemma 1.9) and E_x is the spectral measure of M. By α(x) = x^n Q_i(x)Q_j(x), we get

m_{ij}^{(n)} = π_j ∫_{−1}^{1} x^n Q_i(x)Q_j(x) dψ(x).
Plugging in n = 0, we get

π_j ∫_{−1}^{1} Q_i(x)Q_j(x) dψ(x) = δ_{ij},

where δ_{ii} = m_{ii}^{(0)} = 1 and δ_{ij} = m_{ij}^{(0)} = 0 for i ≠ j (note that M^0 = I).
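The recurrence (4–6) can be unrolled exactly as in (4–9). A minimal sketch follows; the transition probabilities p_i = 0.3, q_i = 0.5, r_i = 0.2 (with r_1 = 0.7 so that each row of M sums to 1) are illustrative assumptions. It computes Q_1(x), ..., Q_n(x) and confirms that MQ(x) = xQ(x) holds row by row.

```python
# Compute Q_1(x), ..., Q_n(x) from the three-term recurrence (4-6):
#   Q_1 = 1,  p_1 Q_2 = (x - r_1) Q_1,
#   p_i Q_{i+1} = (x - r_i) Q_i - q_i Q_{i-1},  i >= 2.
# The chain parameters below are illustrative assumptions only.

p = lambda i: 0.3
q = lambda i: 0.5
r = lambda i: 0.7 if i == 1 else 0.2   # each row of M sums to 1

def Q_values(x, n):
    """Return [Q_1(x), ..., Q_n(x)] as a 0-indexed list."""
    Q = [1.0, (x - r(1)) / p(1)]
    for i in range(2, n):
        Q.append(((x - r(i)) * Q[i - 1] - q(i) * Q[i - 2]) / p(i))
    return Q

x = 0.7
Q = Q_values(x, 6)
# Rows i >= 2 of M Q(x) = x Q(x): q_i Q_{i-1} + r_i Q_i + p_i Q_{i+1} = x Q_i
for i in range(2, 5):
    lhs = q(i) * Q[i - 2] + r(i) * Q[i - 1] + p(i) * Q[i]
    assert abs(lhs - x * Q[i - 1]) < 1e-12
```

Each Q_i is a polynomial of degree i − 1 in x, as the proof shows by induction.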
van Doorn and Schrijner (1995) showed that Q_{j+1}(x) has j distinct real zeros

x_{j1} < x_{j2} < ··· < x_{jj},  j ≥ 0,

and that the following limits exist:

η_j = lim_{k→∞} x_{k,k−j+1},  j ≥ 0,

and

τ = lim_{j→∞} η_j.

Using those limits, we have a condition for geometric ergodicity.

Theorem 4.2. (van Doorn and Schrijner, 1995, Theorem 3.4) X is geometrically ergodic if and only if τ < 1.

The value of τ depends only on the limiting behavior of the parameters in (4–6) as j → ∞. It is not easy to calculate τ in practice. We have upper and lower bounds for τ as follows.

Theorem 4.3. (van Doorn and Schrijner, 1995, Theorem 3.5)

τ ≤ lim sup_{j→∞} [r_j + √(p_{j−1}q_j) + √(p_jq_{j+1})]

and

τ ≥ lim sup_{n→∞} {(1/n) ∑_{j=1}^n (r_j + 2√(p_{j−1}q_j))}.
When the limits p := lim_{i→∞} p_i and q := lim_{i→∞} q_i exist, we have the following simple necessary and sufficient condition for geometric ergodicity.

Proposition 4.4. (van Doorn and Schrijner, 1995, Corollary 3.6) X is geometrically ergodic if and only if p ≠ q.
Note that (4–1) and (4–2) are easy to check, but (4–3) is not. If p and q exist, the next corollary (which is equivalent to Proposition 4.4) shows that we do not need to check condition (4–3); we do not suppose that (4–3) holds there. Note that X is irreducible and aperiodic if and only if p_i > 0 for i ≥ 1, q_i > 0 for i ≥ 2, and r_i > 0 for some i ≥ 1.

Corollary 4.5. Suppose that p_i > 0 for i ≥ 1, q_i > 0 for i ≥ 2, and r_i > 0 for some i ≥ 1. Then X is geometrically ergodic if and only if p < q.
Proof. Geometric ergodicity implies positive recurrence, so we can restate Proposition 4.4 as follows: suppose that p_i > 0 for i ≥ 1, q_i > 0 for i ≥ 2, and r_i > 0 for some i ≥ 1; then X is geometrically ergodic if and only if p ≠ q and (4–3) holds. It suffices to prove that (4–3) together with p ≠ q is equivalent to p < q. Suppose that (4–3) and p ≠ q hold. First we show that (4–3) implies p ≤ q by contradiction. Suppose that p > q. Because p and q are finite and p ≠ q,

lim_{i→∞} c_{i+1}/c_i = lim_{i→∞} p_i/q_{i+1} = p/q > 1.

(Note that the above limit does not hold when p = q = 0.) By D'Alembert's ratio test (see, e.g., Knopp, 1951, p.117), the series ∑_{i=1}^∞ c_i diverges. This contradicts (4–3), so p ≤ q. Because p ≠ q, we have p < q.

Conversely, suppose that p < q. It is obvious that p ≠ q. We have

lim_{i→∞} c_{i+1}/c_i = p/q < 1.

By D'Alembert's ratio test, ∑_{i=1}^∞ c_i converges.
Proposition 4.4 also means that X is irreducible, aperiodic, positive recurrent, and sub-geometrically ergodic if and only if (4–1)-(4–3) hold and p = q. Tan et al. (2013) give some examples of sub-geometrically ergodic chains, but those are marginal chains of Gibbs chains and do not cover all values 0 < p ≤ 1/2.
Proposition 4.6. For all 0 < p ≤ 1/2, there exists an irreducible, aperiodic, and positive recurrent birth-death chain which is sub-geometrically ergodic.

Proof. When p = q, D'Alembert's ratio test cannot be used, so we use a finer test, the Raabe-Duhamel test (see, e.g., Knopp, 1951, p.285), to find an example of a sub-geometrically ergodic chain. By the Raabe-Duhamel test, we only need to find p_i's and q_i's such that p = q and

lim_{i→∞} i(c_i/c_{i+1} − 1) = lim_{i→∞} i(q_{i+1}/p_i − 1) > 1.

Let q_{i+1} = p + ε_i and p_i = p + δ_i for i ≥ 1. Then

i(q_{i+1}/p_i − 1) > 1 ⇔ (p + ε_i)/(p + δ_i) − 1 > 1/i ⇔ (ε_i − δ_i)/(p + δ_i) > 1/i ⇔ ε_i > δ_i + (p + δ_i)/i ⇔ ε_i > (p + (i + 1)δ_i)/i.

First consider the case 1/2 ≥ p > 0. We can select δ_i = −(i + 1)^{−1} and ε_i = p/i when i is large enough (so that 0 < p_i, q_i and p_i + q_i < 1). It is clear that q_i > 0, and p_i > 0 if and only if i > 1/p − 1. Because p ≤ 1/2, we have

p_i + q_i = 2p + δ_i + ε_{i−1} = 2p + p/(i − 1) − 1/(i + 1) < 1 ⇐ p/(i − 1) < 1/(i + 1) ⇔ p < (i − 1)/(i + 1) ⇐ i > 4.

We can select δ_i = ε_i = 0 for i ≤ 4 or i ≤ 1/p − 1.

Finally, for the case p = 0, we can select δ_i = (i + 1)^{−1} and ε_i = (p + 2)/i for i > 2.
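The construction above can be checked numerically. The sketch below uses the illustrative value p = 1/3 and the proof's choices δ_i = −1/(i + 1), ε_i = p/i for large i (and 0 otherwise); it verifies that the resulting p_i, q_i are valid transition probabilities and that the Raabe-Duhamel quantity exceeds 1.

```python
# Numerical check of the construction in Proposition 4.6 for p = 1/3
# (an illustrative value; the thresholds follow the proof's choices).

p = 1/3

def p_i(i):                       # p_i = p + delta_i
    return p - 1.0 / (i + 1) if i > 4 else p

def q_i(i):                       # q_i = p + eps_{i-1}
    return p + p / (i - 1) if i > 5 else p

for i in range(6, 5000):
    assert 0 < p_i(i) and 0 < q_i(i) and p_i(i) + q_i(i) < 1
    # Raabe-Duhamel quantity i (q_{i+1}/p_i - 1) stays above 1, so
    # sum_i c_i converges even though p_i/q_i -> 1 (the case p = q)
    assert i * (q_i(i + 1) / p_i(i) - 1) > 1
```

As i → ∞ the Raabe-Duhamel quantity tends to (p + 1)/p = 4 here, comfortably above the threshold 1.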
4.2.2 Spectral Method

Mao (2010) used both the spectral gap and a drift condition to find a practical necessary and sufficient condition for the geometric ergodicity of birth-death chains.

Theorem 4.7. (Mao, 2010, Theorem 4.3) The birth-death chain X is geometrically ergodic if and only if

sup_{i≥1} [∑_{j=i}^∞ c_j][∑_{k=0}^{i−1} 1/(c_kp_k)] < ∞.   (4–11)

Chapter 4 in Chen (1992) reviews similar results from Mao and Zhang (2004) for birth-death processes.
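Criterion (4–11) can be evaluated on a finite truncation. The sketch below does so for the illustrative chain p_i = 1/3, q_i = 1/2 (an assumption; since p < q the chain is geometrically ergodic, so the products in (4–11) should stay bounded).

```python
# Finite-truncation evaluation of Mao's criterion (4-11) for the
# illustrative chain p_i = 1/3, q_i = 1/2 (assumptions; p < q).

N = 80
p, q = 1/3, 1/2
c = [1.0, 1.0]                    # c_0 = c_1 = 1 (convention c_0 p_0 = 1)
for i in range(2, N + 1):
    c.append(c[-1] * p / q)       # c_i = c_{i-1} p_{i-1} / q_i, as in (4-4)

def cp(k):                        # c_k p_k, with p_0 = 1
    return c[k] * (1.0 if k == 0 else p)

vals = []
for i in range(1, N - 20):
    tail = sum(c[j] for j in range(i, N + 1))    # truncates sum_{j>=i} c_j
    head = sum(1.0 / cp(k) for k in range(i))    # sum_{k=0}^{i-1} 1/(c_k p_k)
    vals.append(tail * head)

assert max(vals) < 20             # the supremum in (4-11) is finite here
```

For this chain the products tend to a finite limit (18 in exact arithmetic), so the supremum in (4–11) is finite, in agreement with Corollary 4.5.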
4.3 Drift Condition Method

4.3.1 Geometric Ergodicity of Birth-Death Chains

We develop a necessary and sufficient condition for the geometric ergodicity of birth-death chains via a drift condition. Our condition consists of two inequalities: lim inf q_i > 0 and

sup_{i≥1} [c_ip_i ∑_{k=0}^{i−1} 1/(c_kp_k)] < ∞.

So our condition is equivalent to inequality (4–11) in Mao (2010). The condition lim inf q_i > 0 also appears in Lemma 3 of Tan et al. (2013) and is easy to check. And sup_{i≥2} [∑_{j=i}^∞ c_j][∑_{k=0}^{i−1} 1/(c_kp_k)] < ∞ in Mao (2010) implies our inequality sup_{i≥1} [c_ip_i ∑_{k=0}^{i−1} 1/(c_kp_k)] < ∞ because c_ip_i < c_i < ∑_{j=i}^∞ c_j for all i ≥ 1.
Given a sequence x on ℕ, we use both notations x_i and x(i) for the value of x at i. We apply Theorem C.8 to the birth-death chain X and derive properties of any possible drift function V. The next lemma is a special case of Lemma 2.2 in Jarner and Hansen (2000). To avoid other concepts, such as the random-walk-type Markov chains of Jarner and Hansen (2000), we provide a direct proof. This lemma describes the small sets of a birth-death chain. We will use it to show that V(i) → ∞.

Lemma 4.8. Given a birth-death chain X, if a set is small then it is finite.

Proof. Suppose that C is a small set. By the definition of a small set (see Meyn and Tweedie, 2009, p.109), there exist a constant m and a non-trivial measure ν such that for all i ∈ C and B ⊂ ℕ,

M^m(i, B) ≥ ν(B).

Since ν is non-trivial, there exists j such that ν(j) ≠ 0. Hence M^m(i, j) > 0 for all i ∈ C. When i > j, we need at least i − j steps to move from i to j, and therefore M^m(i, j) = 0 when i − j > m. Because M^m(i, j) > 0 for all i ∈ C, we have i − j ≤ m for i ∈ C. Thus, C must be bounded.
Because V is finite, we can define

δ_i = V_i − V_{i−1},  i ≥ 2.   (4–12)

Using Theorem C.8 and Lemma 4.8, we derive a drift condition for birth-death chains. It implies that the drift function has a monotonic tail.

Proposition 4.9. Fix N ∈ ℕ. X is geometrically ergodic if and only if there exist a constant ε > 0 and a finite function V ≥ 1 on ℕ such that

q_iδ_i ≥ p_iδ_{i+1} + εV_i,  i > N,   (4–13)

V_i → ∞, and V_i is strictly increasing for i ≥ N.
Proof. We start with necessity. Suppose that X is geometrically ergodic. Let C = {1, 2, ···, N}. By Theorem C.8, there exist constants b < ∞ and ε > 0 and a finite function V ≥ 1 such that

MV ≤ (1 − ε)V + b1_C.

By Lemma 15.2.2(ii) in Meyn and Tweedie (2009), {V < c} is petite for all c. By Theorem 5.5.7 in Meyn and Tweedie (2009), {V < c} is small. By Lemma 4.8, {V < c} is finite. If lim inf V_i < ∞, then {V < 1 + lim inf V_i} is infinite, a contradiction; so lim inf V_i = ∞ and thus lim V_i = ∞.

For i ≥ 2, we have

MV(i) = ∑_{j=1}^∞ m_{ij}V_j = p_iV_{i+1} + q_iV_{i−1} + (1 − p_i − q_i)V_i = p_i[V_{i+1} − V_i] − q_i[V_i − V_{i−1}] + V_i = p_iδ_{i+1} − q_iδ_i + V_i.   (4–14)

So for i > N,

MV(i) ≤ (1 − ε)V(i) + b1_C(i) ⇔ MV(i) ≤ (1 − ε)V(i) ⇔ p_iδ_{i+1} − q_iδ_i ≤ −εV_i ⇔ q_iδ_i ≥ p_iδ_{i+1} + εV_i.   (4–15)

Suppose that V_{i+1} ≥ V_i for some i > N, i.e., δ_{i+1} ≥ 0; then

q_iδ_i ≥ p_iδ_{i+1} + εV_i ⇒ q_iδ_i ≥ εV_i ⇒ δ_i > 0 ⇔ V_i > V_{i−1}.   (4–16)

Given any j > N, since V is finite and lim V(i) = ∞, there exists m > j such that V_{m+1} > V_m. By induction, applying (4–16), we have V_j > V_{j−1}. So V_i is strictly increasing for i ≥ N.

We now prove sufficiency. Suppose that V has the stated properties. As in (4–15), we have

q_iδ_i ≥ p_iδ_{i+1} + εV_i ⇔ MV(i) ≤ (1 − ε)V(i),  i > N.

For convenience, in this proof only, denote V_0 = 0 and q_1 = 1. Because V is finite,

b := sup_{i≤N} MV(i) = sup_{i≤N} [q_iV_{i−1} + r_iV_i + p_iV_{i+1}] ≤ sup_{i≤N} [V_{i−1} + V_i + V_{i+1}] < ∞.

Let C = {1, 2, ···, N}; then MV ≤ (1 − ε)V + b1_C. By Theorem C.8, X is geometrically ergodic.
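The drift inequality (4–13) is easy to verify for a concrete drift function. The sketch below uses the standard exponential choice V_i = s^i on the illustrative chain p_i = 1/3, q_i = 1/2 (assumptions; any chain with constant p < q admits such an s and ε).

```python
# Verify the drift inequality (4-13) with V_i = s^i for the illustrative
# chain p_i = 1/3, q_i = 1/2 (assumptions; p < q).

p, q, s, eps = 1/3, 1/2, 1.2, 0.01

def V(i):
    return s ** i

for i in range(2, 400):
    delta_i = V(i) - V(i - 1)          # delta_i as in (4-12)
    delta_ip1 = V(i + 1) - V(i)
    # (4-13): q_i delta_i >= p_i delta_{i+1} + eps V_i
    assert q * delta_i >= p * delta_ip1 + eps * V(i)
```

Dividing (4–13) through by s^{i−1} shows the inequality reduces to q(s − 1) ≥ ps(s − 1) + εs, which these constants satisfy (0.1 ≥ 0.092).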
Proposition 4.10. If X is geometrically ergodic, then inf_{i≥2} q_i > 0.

Proof. By Proposition 4.9 with N = 1, there exist a constant ε > 0 and a finite, strictly increasing function V ≥ 1 such that q_iδ_i ≥ p_iδ_{i+1} + εV_i for i > 1. Because δ_{i+1} > 0 for i ≥ 1,

q_iδ_i = q_i(V_i − V_{i−1}) ≥ εV_i ⇒ q_i > ε,  i > 1.

Thus, lim inf q_i ≥ ε > 0. By Lemma A.4, we have inf_{i≥2} q_i > 0.
Since 0 < V_i < ∞, the ratio δ_{i+1}/V_i is well defined for all V_{i+1}. Thus, we consider

α_i := δ_{i+1}/V_i,  0 < V_i, V_{i+1} < ∞.   (4–17)

(We add the extra condition 0 < V_{i+1} < ∞ so that α_i is finite.) The next proposition is another version of Proposition 4.9, obtained by reparameterization. Note that formula (4–13) in Proposition 4.9 involves three values of V, namely V_{i−1}, V_i, and V_{i+1}. The next proposition is simpler than Proposition 4.9 because each step involves only two values of α, namely α_i and α_{i−1}.

Proposition 4.11. Fix N ∈ ℕ. X is geometrically ergodic if and only if there exist a constant ε > 0 and a finite function α > −1 on ℕ such that

α_i ≤ (q_i/p_i) · α_{i−1}/(1 + α_{i−1}) − ε/p_i,  i > N,   (4–18)

∑_{i=1}^∞ α_i = ∞, and α_i > 0 for i ≥ N.
Proof. First we prove some equivalences which will be used in the proofs of both necessity and sufficiency. We have

α_i = δ_{i+1}/V_i ⇔ V_{i+1} − V_i = α_iV_i ⇔ V_{i+1} = (1 + α_i)V_i.

Because the function t ↦ t/(1 + t) is one-to-one for t > −1, we get

α_i = δ_{i+1}/V_i ⇔ α_i/(1 + α_i) = (δ_{i+1}/V_i)[δ_{i+1}/V_i + 1]^{−1} = (δ_{i+1}/V_i)[V_{i+1}/V_i]^{−1} = δ_{i+1}/V_{i+1}.

Combining the two formulas above, we conclude that, for each i, if (4–17) holds then

α_i = δ_{i+1}/V_i ⇔ V_{i+1} = (1 + α_i)V_i ⇔ δ_{i+1}/V_{i+1} = α_i/(1 + α_i).   (4–19)

Now fix N ∈ ℕ. Suppose that (4–17) holds for i ≥ N; then we will prove that

q_iδ_i ≥ p_iδ_{i+1} + εV_i ⇔ α_i ≤ (q_i/p_i) · α_{i−1}/(1 + α_{i−1}) − ε/p_i,  i ≥ N + 1,   (4–20)

V_i is strictly increasing for i ≥ N ⇔ α_i > 0 for i ≥ N,   (4–21)

and

V_i → ∞ ⇔ ∑_{i=1}^∞ α_i = ∞.   (4–22)

We prove (4–20) first. Since (4–17) holds for i ≥ N, we have 0 < V_i < ∞ for i ≥ N. Note that p_i > 0 for all i ≥ 1; thus

q_iδ_i ≥ p_iδ_{i+1} + εV_i ⇔ δ_{i+1}/V_i ≤ (q_i/p_i)(δ_i/V_i) − ε/p_i,  i ≥ N.   (4–23)

Replacing i by i − 1 in the last equivalence in (4–19), we get

δ_i/V_i = α_{i−1}/(1 + α_{i−1}),  i ≥ N + 1.   (4–24)

From (4–23), (4–17), and (4–24), we have (4–20). From the middle equivalence in (4–19), V_{i+1} > V_i if and only if α_i > 0, so we have (4–21). Because α_i > 0 for i ≥ N, by the theory of infinite products (see Knopp, 1951, p.219),

∏_{i=N}^∞ (1 + α_i) < ∞ ⇔ ∑_{i=N}^∞ α_i < ∞.   (4–25)

From the middle equivalence in (4–19), we have V_i = V_N ∏_{j=N}^{i−1} (1 + α_j). Combining this with (4–25), we get (4–22).

We now start with necessity. Suppose that X is geometrically ergodic. From Proposition 4.9, there exist ε > 0 and ∞ > V ≥ 1 such that q_iδ_i ≥ p_iδ_{i+1} + εV_i for i > N, V_i → ∞, and V_i is strictly increasing for i ≥ N. Define α by (4–17) for i ≥ 1. Note that ∞ > α > −1. Since (4–17) holds for all i ≥ N, we have (4–20)-(4–22). That means we have (4–18), α_i > 0 for i ≥ N, and ∑_{i=1}^∞ α_i = ∞.

We now prove sufficiency. Suppose that α has the stated properties. Define V by V_i = 1 for i ≤ N and V_{i+1} = (1 + α_i)V_i for i ≥ N. Since ∞ > α_i > 0 for i ≥ N, we have ∞ > V ≥ 1. Since (4–17) holds for i ≥ N, we have (4–20)-(4–22). By Proposition 4.9, X is geometrically ergodic.
Denote

λ_1 = 1/p_1,  λ_i = q_i/p_i,  i ≥ 2,

and

f_i(t) = λ_i t/(1 + t),  t > −1, i ≥ 2;

then we can rewrite (4–18) as

α_i ≤ f_i(α_{i−1}) − ε/p_i,  i > N.   (4–26)

By Proposition 4.11, we need ∑_{i=1}^∞ α_i = ∞, so the larger the α_i's, the better. Given α_{i−1} > 0, we should select the largest α_i, namely

α_i = f_i(α_{i−1}) − ε/p_i.

After we get α_i, we also want α_{i+1} to be as large as possible. Thus, we need to check whether a larger α_i yields a larger α_{i+1}. As above, we select

α_{i+1} = f_{i+1}(α_i) − ε/p_{i+1}.

Because f_{i+1} is increasing, a larger α_i does give a larger α_{i+1}. We now consider ε. Given α_{i−1} > 0, the smaller ε is, the larger the α_i > 0 that can be selected. This leads us to the study of upper bounds for the α_i's. Given N ∈ ℕ, define a sequence x = {x_i}_{i=1}^∞ by

x_N > 0 and x_i = f_i(x_{i−1}),  i > N.   (4–27)

(The values of x_1, x_2, ..., x_{N−1} are not important.) Consider the special case x_N = α_N. Because the f_i's are increasing, we can see that x_i is an upper bound of α_i for each i ≥ N. By Proposition 4.11, we need ∑_{i=1}^∞ α_i = ∞, so ∑_{i=N}^∞ x_i ≥ ∑_{i=N}^∞ α_i = ∞, and hence ∑_{i=N}^∞ x_i = ∞ is a necessary condition for geometric ergodicity. Is this condition strong enough? Actually, ∑_{i=N}^∞ x_i = ∞ for all positive recurrent chains (we do not provide a proof here), so this necessary condition for geometric ergodicity is too weak. It turns out that we can find a much stronger necessary condition for geometric ergodicity than ∑_{i=N}^∞ x_i = ∞, namely lim inf x_i > 0. And then we can always select some α satisfying a much stronger condition than ∑_{i=1}^∞ α_i = ∞, namely lim inf α_i > 0. From Proposition 4.10, inf_{i≥2} q_i > 0 is a necessary condition for geometric ergodicity. Under that condition, lim inf x_i > 0 is strong enough that it is also a sufficient condition. First, we provide some properties of x.
Lemma 4.12. The following four conditions are equivalent:

∀x_1 > 0: if x_i = f_i(x_{i−1}) for i > 1, then lim inf x_i > 0.   (4–28a)
∃x_1 > 0: x_i = f_i(x_{i−1}) for i > 1 and lim inf x_i > 0.   (4–28b)
∃N > 0, ∀x_N > 0: if x_i = f_i(x_{i−1}) for i > N, then lim inf x_i > 0.   (4–28c)
∃N > 0, ∃x_N > 0: x_i = f_i(x_{i−1}) for i > N and lim inf x_i > 0.   (4–28d)

Proof. Because (4–28a) is the strongest condition and (4–28d) is the weakest, it suffices to show that (4–28d) implies (4–28a). Replacing x by z in (4–28d), we have

∃N > 0, ∃z_N > 0: z_i = f_i(z_{i−1}) for i > N and lim inf z_i > 0.

We first prove (4–28c). Suppose that 0 < x_N < z_N; then x_N = cz_N for some 0 < c < 1. For 0 < c < 1 and t > 0, we have

f_i(ct) = λ_i ct/(1 + ct) ≥ λ_i ct/(1 + t) = cf_i(t).

If x_{i−1} ≥ cz_{i−1} > 0, then

x_i = f_i(x_{i−1}) ≥ f_i(cz_{i−1}) ≥ cf_i(z_{i−1}) = cz_i.

By induction, x_i ≥ cz_i for all i ≥ N. So lim inf x_i ≥ c lim inf z_i > 0, and therefore

∀0 < x_N ≤ z_N: if x_i = f_i(x_{i−1}) for i > N, then lim inf x_i > 0.

That means (4–28c) holds for x_N ≤ z_N. Now consider x_N ≥ z_N. Suppose that x_{i−1} ≥ z_{i−1} > 0; then

x_i = f_i(x_{i−1}) ≥ f_i(z_{i−1}) = z_i.

By induction, x_i ≥ z_i for all i ≥ N, so lim inf x_i ≥ lim inf z_i > 0. Thus, we have proved (4–28c) for all x_N > 0.

We now prove (4–28a). Given any x_1 > 0, let x_i = f_i(x_{i−1}) for i > 1. Because f_i(t) > 0 for t > 0 and for all i, and x_1 > 0, we have x_N = (f_N ∘ ··· ∘ f_2)(x_1) > 0. Combining this with x_i = f_i(x_{i−1}) for i > N, we have lim inf x_i > 0 by (4–28c). So we finally have (4–28a).
By Lemma A.4, we can replace lim inf x_i > 0 by inf_{i≥N} x_i > 0 in (4–28c) and (4–28d). (We cannot use inf_i x_i because x may not be positive everywhere.) The next proposition gives a condition for geometric ergodicity which is based on the x_i's.
Proof. We start with necessity. Suppose that X is geometrically ergodic. From Propo-
sition 4.10, we have infi≥2 qi > 0. Given any N ∈ N (actually we can select N = 1), by
Proposition 4.11, there exist constant ε > 0 and α > −1 such that
αi ≤qipi
αi−1
1 + αi−1
− ε
pi= fi(αi−1)− ε
pi, i > N,
∑∞i=1 αi =∞, and αi > 0 for i ≥ N . We set
xN = αN , xi = fi(xi−1), i > N.
αN > 0 so xN > 0.
77
Suppose that xi−1 ≥ αi−1 for some i > N . fi(t) is strictly increasing for t > −1, so
fi(xi−1)− fi(αi−1) ≥ 0. Thus,
xi − αi = fi(xi−1)− αi ≥ fi(xi−1)− fi(αi−1) +ε
pi≥ ε
pi, i > N.
Because xN ≥ αN , using induction and the above argument we have
xi − αi ≥ε
pi, i > N
and hence xipi ≥ ε for i > N . Because 0 < pi < 1, lim inf xi ≥ lim inf xipi ≥ ε > 0, i.e.
(4–28d) holds. By Lemma 4.12, the necessary condition holds.
We now prove sufficiency. By Lemma 4.12, we may suppose that inf_{i≥2} q_i > 0 and (4–28d) holds. Let α_i = 0 for i < N and α_i = x_i/2 for i ≥ N. Since (4–28d) holds, x_i > 0 for i ≥ N. Therefore, α ≥ 0 > −1 and α_i > 0 for i ≥ N. Because lim inf x_i > 0, we get ∑ x_i = ∞ and hence ∑ α_i = ∞. We now only need to find a constant ε > 0 such that

α_i ≤ f_i(α_{i−1}) − ε/p_i,  i > N.   (4–29)

Note that

α_i ≤ f_i(α_{i−1}) − ε/p_i
⇔ f_i(x_{i−1}/2) − x_i/2 ≥ ε/p_i
⇔ 2f_i(x_{i−1}/2) − f_i(x_{i−1}) ≥ 2ε/p_i
⇔ 2λ_i (x_{i−1}/2)/(1 + x_{i−1}/2) − λ_i x_{i−1}/(1 + x_{i−1}) ≥ 2ε/p_i
⇔ p_iλ_i x_{i−1} [1/(1 + x_{i−1}/2) − 1/(1 + x_{i−1})] ≥ 2ε
⇔ q_i (x_{i−1}²/2)/[(1 + x_{i−1})(1 + x_{i−1}/2)] ≥ 2ε.

Because the functions

t/(1 + t) and (t/2)/(1 + t/2)

are increasing for t > 0, their product

h(t) := (t²/2)/[(1 + t)(1 + t/2)]

is increasing for t > 0. Since h is increasing, h(t) > 0 for t > 0, and lim inf x_i > 0, we have lim inf h(x_i) > 0. Finally,

lim inf q_i (x_{i−1}²/2)/[(1 + x_{i−1})(1 + x_{i−1}/2)] = lim inf q_ih(x_{i−1}) ≥ (lim inf q_i)[lim inf h(x_{i−1})] > 0.

Since (4–28d) implies x_i > 0 for i ≥ N, we have q_ih(x_{i−1}) > 0 for i > N. By Lemma A.4, inf_{i>N} q_ih(x_{i−1}) > 0. Then we can find ε > 0 satisfying (4–29). By Proposition 4.11, X is geometrically ergodic.
We are now ready to find a condition for geometric ergodicity based only on the p_i's and q_i's by selecting a special value for x_N. Fix N ∈ ℕ. Set x_N = λ_N and denote

y_{N,i} = 1/x_i,  γ_i = 1/λ_i,  i ≥ N.   (4–30)

From x_N = λ_N and

x_i = λ_i x_{i−1}/(1 + x_{i−1}) ⇔ 1/x_i = (1/λ_i)(1 + 1/x_{i−1}),

we have

y_{N,N} = γ_N,  y_{N,i} = γ_i(1 + y_{N,i−1}),  i > N.

By induction,

y_{N,i} = γ_i + γ_iγ_{i−1} + ··· + γ_iγ_{i−1}···γ_N = ∑_{k=N}^i ∏_{j=k}^i γ_j,  i ≥ N.

Because x_i > 0 for i ≥ N, lim inf x_i > 0 if and only if inf_{i≥N} x_i > 0 by Lemma A.4. Since inf_{i≥N} x_i = 1/sup_{i≥N} y_{N,i}, we have inf_{i≥N} x_i > 0 if and only if sup_{i≥N} y_{N,i} < ∞.

We can also rewrite y_{N,i} in another form. Recall that we defined c_i in (4–4) by c_1 = 1 and

c_i = (p_1p_2···p_{i−1})/(q_2q_3···q_i),  i > 1,

and c_0p_0 = 1. We have c_ip_i = γ_1···γ_i = (λ_1···λ_i)^{−1} for i ≥ 1. Thus,

y_{N,i} = γ_i + γ_iγ_{i−1} + ··· + γ_iγ_{i−1}···γ_N
= c_ip_i(λ_1···λ_i)(γ_i + γ_{i−1}γ_i + ··· + γ_Nγ_{N+1}···γ_i)
= c_ip_i(λ_1λ_2···λ_{i−1} + ··· + λ_1λ_2···λ_{N−1})
= c_ip_i ∑_{k=N−1}^{i−1} 1/(c_kp_k),  i ≥ N ≥ 1.   (4–31)
Theorem 4.14. Fix N ∈ ℕ. X is geometrically ergodic if and only if

inf_{i≥2} q_i > 0

and

sup_{i≥N} y_{N,i} = sup_{i≥N} (γ_i + γ_iγ_{i−1} + ··· + γ_iγ_{i−1}···γ_N) < ∞.   (4–32)

By (4–31), (4–32) is equivalent to

∑_{k=N−1}^{i−1} 1/(c_kp_k) < C · 1/(c_ip_i),  ∀i ≥ N,

for some finite constant C.

Proof of Theorem 4.14. We start with necessity. Suppose that X is geometrically ergodic. Using Proposition 4.13 with (4–28c) and x_N = λ_N, we have inf_{i≥2} q_i > 0 and sup_{i≥N} y_{N,i} < ∞.

We now prove sufficiency. Suppose that the two inequalities in this theorem hold. Using Proposition 4.13 with (4–28d) and x_N = λ_N, we see that X is geometrically ergodic.
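The recursion y_{N,N} = γ_N, y_{N,i} = γ_i(1 + y_{N,i−1}) from (4–30) makes the criterion easy to evaluate numerically. The sketch below compares two illustrative chains with N = 1 (both chains are assumptions, not examples from the text): one with p < q, where y stays bounded, and one with p = q, where y grows without bound.

```python
# y_{N,i} from (4-30): y_{N,N} = gamma_N, y_{N,i} = gamma_i (1 + y_{N,i-1}).
# Two illustrative chains, taking N = 1 (gamma_1 = 1/lambda_1 = p_1):
#   chain A: p_i = 1/3, q_i = 1/2 -> gamma_i = 2/3 for i >= 2 (bounded y)
#   chain B: p_i = q_i = 0.4      -> gamma_i = 1 for i >= 2 (y grows)

def y_seq(gamma, N, last):
    y = {N: gamma(N)}
    for i in range(N + 1, last + 1):
        y[i] = gamma(i) * (1 + y[i - 1])
    return y

gamma_A = lambda i: 1/3 if i == 1 else 2/3
gamma_B = lambda i: 0.4 if i == 1 else 1.0

yA = y_seq(gamma_A, 1, 200)
yB = y_seq(gamma_B, 1, 200)

assert max(yA.values()) < 3      # sup y_{1,i} < inf: geometrically ergodic
assert yB[200] > 100             # y_{1,i} -> inf: not geometrically ergodic
```

For chain A the y values converge to the fixed point of y = (2/3)(1 + y), i.e. 2; for chain B they increase by 1 at each step, so the supremum in (4–32) is infinite.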
Note that (4–3) is not easy to check. We will show that we can ignore it; we do not assume positive recurrence in the next corollary.

Corollary 4.15. Suppose that X is irreducible and aperiodic. Fix N ∈ ℕ. X is geometrically ergodic if and only if inf_{i>N} q_i > 0 and sup_{i≥N} y_{N,i} < ∞.

Proof. We start with necessity. Suppose that X is geometrically ergodic; then X is positive recurrent. By Theorem 4.14, inf_{i>N} q_i > 0 and sup_{i≥N} y_{N,i} < ∞.

We now prove sufficiency. Suppose that the stated properties hold. We first show that X is recurrent. From Proposition 4.9, any drift function V has a monotonic tail, so {V < n} is finite for all n ≥ 0. By Lemma C.10 (i), {V < n} is small. By Theorem 8.0.2 (ii) in Meyn and Tweedie (2009), X is recurrent. By Theorem C.8, note that we only need recurrence instead of positive recurrence in Theorem 4.14. Thus, X is geometrically ergodic.
In the above corollary, we use inf_{i>N} q_i > 0 instead of inf_{i≥N} q_i > 0 because q_1 = 0. To show that inequality (4–11) in Mao (2010) implies our inequality (4–32), we select N = 1. By (4–31),

y_{1,i} = c_ip_i ∑_{k=0}^{i−1} 1/(c_kp_k).

The condition sup_{i≥1} [∑_{j=i}^∞ c_j][∑_{k=0}^{i−1} 1/(c_kp_k)] < ∞ in Mao (2010) implies sup_{i≥1} [c_ip_i ∑_{k=0}^{i−1} 1/(c_kp_k)] < ∞ because c_ip_i < c_i < ∑_{j=i}^∞ c_j for i ≥ 1.

By Lemma A.5, given any N ∈ ℕ, under the condition lim inf q_i > 0, condition (4–3) is equivalent to

lim_{i→∞} (γ_N + γ_Nγ_{N+1} + ··· + γ_Nγ_{N+1}···γ_i) < ∞.

The above sum is quite similar to y_{N,i} in some cases (see Example 4.16 for such a case). In those cases, if a birth-death chain is irreducible, aperiodic, and positive recurrent, then it is also geometrically ergodic. We now apply our result to a chain to show that it is geometrically ergodic, while Theorem 4.3 cannot be used here.
Example 4.16. Consider a chain whose p_i's and q_i's repeat every four rows, as follows:

M =
⎡ r_1  p_1  0    0    0    0    0    0   ··· ⎤
⎢ 1/9  0    8/9  0    0    0    0    0   ··· ⎥
⎢ 0    1/2  0    1/2  0    0    0    0   ··· ⎥
⎢ 0    0    8/9  0    1/9  0    0    0   ··· ⎥
⎢ 0    0    0    8/9  0    1/9  0    0   ··· ⎥
⎢ 0    0    0    0    1/9  0    8/9  0   ··· ⎥
⎣ ⋮    ⋮    ⋮    ⋮    ⋮    ⋮    ⋮    ⋮   ⋱  ⎦

for some 0 < r_1 < 1, i.e.,

p_2 = p_{2+4k} = 8/9,
p_3 = p_{3+4k} = 1/2,
p_4 = p_5 = p_{4+4k} = p_{5+4k} = 1/9,  k > 0,
q_i + p_i = 1,  i > 1.

We cannot apply Theorem 4.3 because

lim sup_j [√(p_{j−1}q_j) + √(p_jq_{j+1})] ≥ √(p_{2+4k}q_{3+4k}) + √(p_{3+4k}q_{4+4k}) = √((8/9)(1/2)) + √((1/2)(8/9)) = 2 · (2/3) = 4/3 > 1.

Clearly p_i > 0 for i ≥ 1, q_i > 0 for i ≥ 2, r_1 > 0, and inf_{i>1} q_i > 0. We apply Corollary 4.15 with N = 2 to show that this chain is geometrically ergodic. We have

γ_2 = γ_{2+4k} = p_2/q_2 = 8,
γ_3 = γ_{3+4k} = p_3/q_3 = 1,
γ_4 = γ_5 = γ_{4k} = γ_{1+4k} = p_4/q_4 = 1/8,  k > 0.
So for i ≥ 2, (γ_i, γ_{i+1}, γ_{i+2}, γ_{i+3}) always has one coordinate equal to 8, one coordinate equal to 1, and two coordinates equal to 1/8; therefore γ_{i+3}γ_{i+2}γ_{i+1}γ_i = (8)(1)(1/8)(1/8) = 1/8 for i ≥ 2. To prove that sup_{i≥2} y_{2,i} < ∞, it suffices to show that sup_{k≥3} y_{2,j+4k} < ∞ for each integer 0 ≤ j ≤ 3. We only consider j = 0 because the other cases are similar. For i = 4k and k ≥ 3, we partition y_{2,i} into four sums as follows:

y_{2,i} = γ_i + γ_iγ_{i−1} + ··· + γ_iγ_{i−1}···γ_2

= γ_i[1 + (γ_{i−1}γ_{i−2}γ_{i−3}γ_{i−4}) + ··· + (γ_{i−1}γ_{i−2}γ_{i−3}γ_{i−4})···(γ_7γ_6γ_5γ_4)]
+ γ_iγ_{i−1}[1 + (γ_{i−2}γ_{i−3}γ_{i−4}γ_{i−5}) + ··· + (γ_{i−2}γ_{i−3}γ_{i−4}γ_{i−5})···(γ_6γ_5γ_4γ_3)]
+ γ_iγ_{i−1}γ_{i−2}[1 + (γ_{i−3}γ_{i−4}γ_{i−5}γ_{i−6}) + ··· + (γ_{i−3}γ_{i−4}γ_{i−5}γ_{i−6})···(γ_5γ_4γ_3γ_2)]
+ [(γ_iγ_{i−1}γ_{i−2}γ_{i−3}) + ··· + (γ_iγ_{i−1}γ_{i−2}γ_{i−3})···(γ_8γ_7γ_6γ_5)]

= γ_i[1 + 1/8 + ··· + (1/8)^{k−2}]
+ γ_iγ_{i−1}[1 + 1/8 + ··· + (1/8)^{k−2}]
+ γ_iγ_{i−1}γ_{i−2}[1 + 1/8 + ··· + (1/8)^{k−2}]
+ [1/8 + ··· + (1/8)^{k−2}]

≤ [γ_i + γ_iγ_{i−1} + γ_iγ_{i−1}γ_{i−2} + 1] · (1 − (1/8)^{k−1})/(1 − 1/8)

≤ [γ_i + γ_iγ_{i−1} + γ_iγ_{i−1}γ_{i−2} + 1] · 8/7.

Since i = 4k, we have γ_i = 1/8, γ_{i−1} = 1, and γ_{i−2} = 8, so γ_i + γ_iγ_{i−1} + γ_iγ_{i−1}γ_{i−2} + 1 = 1/8 + (1/8)(1) + (1/8)(1)(8) + 1 = 9/4. Thus, y_{2,4k} ≤ (9/4)(8/7) = 18/7 for k ≥ 3, and hence sup_{k≥3} y_{2,4k} < ∞.
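Example 4.16 can also be verified directly with the recursion y_{2,i} = γ_i(1 + y_{2,i−1}); the sketch below iterates it over many periods and confirms that y_{2,i} stays bounded, as Corollary 4.15 requires.

```python
# Numerical check of Example 4.16: gamma_i cycles through (8, 1, 1/8, 1/8)
# starting at i = 2, and y_{2,i} stays bounded.

def gamma(i):                     # defined for i >= 2
    return {2: 8.0, 3: 1.0, 0: 0.125, 1: 0.125}[i % 4]

y = gamma(2)                      # y_{2,2} = gamma_2
sup = y
for i in range(3, 2001):
    y = gamma(i) * (1 + y)        # y_{2,i} = gamma_i (1 + y_{2,i-1})
    sup = max(sup, y)

assert sup < 12                   # sup_i y_{2,i} < infinity
```

The iterates settle into a limit cycle whose largest value is 81/7 ≈ 11.57, consistent with (and somewhat sharper than) the bound derived above.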
4.3.2 Application to a Family of Gibbs Samplers

Recall from Section 1.2.1 that b_0 = 0 and a_0 = 1, that {a_i}_{i=1}^∞ and {b_i}_{i=1}^∞ are two sequences of strictly positive real numbers such that ∑_{i=1}^∞ a_i + ∑_{i=1}^∞ b_i = 1, and that the x-chain {X_n} is a birth-death chain with

p_i = a_ib_i/[(a_i + b_{i−1})(a_i + b_i)] and q_i = a_{i−1}b_{i−1}/[(a_i + b_{i−1})(a_{i−1} + b_{i−1})],

and r_i = 1 − p_i − q_i. We use Theorem 4.14 to establish a necessary and sufficient condition for the geometric ergodicity of {X_n}.

Proposition 4.17. The x-chain {X_n} is geometrically ergodic if and only if

sup_{i>0} b_i/a_i < ∞,  sup_{i>0} a_{i+1}/b_i < ∞,   (4–33)

and there exists a finite constant C > 0 such that

∑_{j=1}^{i−1} (1/a_j + 1/b_j) < C(1/a_i + 1/b_i),  ∀i ≥ 2.   (4–34)
Proof. Note that inf_{i≥2} q_i > 0 if and only if sup_{i≥2} q_i^{−1} < ∞. We have

1/q_{i+1} = [(a_{i+1} + b_i)/b_i] · [(a_i + b_i)/a_i] = (1 + a_{i+1}/b_i)(1 + b_i/a_i),  i ≥ 1.

Because 1 < 1 + a_{i+1}/b_i, 1 + b_i/a_i < q_{i+1}^{−1}, the condition sup_{i≥2} q_i^{−1} < ∞ is equivalent to (4–33).

Denote

t_i = a_ib_i/(a_i + b_i),  i ≥ 1.

For i ≥ 2, we have

γ_i = p_i/q_i = [a_ib_i/((a_i + b_{i−1})(a_i + b_i))] · [(a_i + b_{i−1})(a_{i−1} + b_{i−1})/(a_{i−1}b_{i−1})] = t_i/t_{i−1}.

Thus, γ_iγ_{i−1}···γ_j = t_i/t_{j−1} for i ≥ j ≥ 2, and hence

y_{2,i} = γ_i + γ_iγ_{i−1} + ··· + γ_iγ_{i−1}···γ_2 = t_i/t_{i−1} + t_i/t_{i−2} + ··· + t_i/t_1 = t_i ∑_{j=1}^{i−1} t_j^{−1} = [a_ib_i/(a_i + b_i)] ∑_{j=1}^{i−1} (1/a_j + 1/b_j),  i ≥ 2.

By Theorem 4.14, we obtain the result.
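The identity y_{2,i} = t_i ∑_{j=1}^{i−1} 1/t_j from the proof can be checked against the recursion from (4–30). The geometric sequences a_i = 2^{−i}, b_i = 2^{−i−1} below are an illustrative assumption (the normalization ∑a_i + ∑b_i = 1 is omitted, since p_i/q_i is scale-invariant in (a, b)).

```python
# Check the identity y_{2,i} = t_i * sum_{j=1}^{i-1} 1/t_j, with
# t_i = a_i b_i / (a_i + b_i) and gamma_i = t_i / t_{i-1}.
# The sequences a_i = 2^{-i}, b_i = 2^{-i-1} are illustrative assumptions.

a = [None] + [2.0 ** (-i) for i in range(1, 40)]       # 1-indexed
b = [None] + [2.0 ** (-i - 1) for i in range(1, 40)]

def t(i):
    return a[i] * b[i] / (a[i] + b[i])

def gamma(i):                  # gamma_i = p_i / q_i = t_i / t_{i-1}, i >= 2
    return t(i) / t(i - 1)

y = gamma(2)                   # y_{2,2}
for i in range(3, 30):
    y = gamma(i) * (1 + y)     # recursion y_{2,i} = gamma_i (1 + y_{2,i-1})
    closed = t(i) * sum(1.0 / t(j) for j in range(1, i))
    assert abs(y - closed) < 1e-9 * closed

# Here y_{2,i} = 1 - 2^{1-i} stays bounded, so by Theorem 4.14 the chain
# would be geometrically ergodic (note (4-33) also holds for this choice).
assert y < 1.0
```

For this choice t_i = 2^{−i}/3, so y_{2,i} = 1 − 2^{1−i}, which is bounded; both sides of the identity agree term by term.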
4.3.3 Geometric Ergodicity for a Family of Random Walks on Z
In this section, we use the same method in Section 4.3.1 to find an analogous
necessary and sufficient condition for the geometric ergodicity for a family of random
84
walks on Z. Let Z = {Zn}∞n=0 denote a random walk with state space Z and Markov
transition matrix
M =
. . ....
......
......
...... . .
.
· · · 0 q−1 r−1 p−1 0 0 0 · · ·
· · · 0 0 q0 r0 p0 0 0 · · ·
· · · 0 0 0 q1 r1 p1 0 · · ·
. .. ...
......
......
......
. . .
.
Suppose that π is the stationary distribution of Z then
πM = M
⇔ pi−1πi−1 + riπi + qi+1πi+1 = πi, i ∈ Z
⇔ pi−1πi−1 + qi+1πi+1 = (pi + qi)πi, i ∈ Z
⇔ qi+1πi+1 − piπi = qiπi − pi−1πi−1, i ∈ Z
⇔ qi+1πi+1 − piπi = q1π1 − p0π0, i ∈ Z.
First, we consider the case q_1\pi_1 - p_0\pi_0 = t > 0. Then q_i\pi_i - p_{i-1}\pi_{i-1} = t for all i and hence \pi_i \ge q_i\pi_i \ge t for all i. Thus, \sum \pi_i = \infty. This is a contradiction because π is a stationary distribution. A similar argument shows that the case q_1\pi_1 - p_0\pi_0 < 0 also implies \sum \pi_i = \infty. So q_{i+1}\pi_{i+1} - p_i\pi_i = 0 for all i. Thus,
$$\pi_i = \frac{p_0 p_1 \cdots p_{i-1}}{q_1 q_2 \cdots q_i}\,\pi_0, \qquad \pi_{-i} = \frac{q_0 q_{-1} \cdots q_{-i+1}}{p_{-1} p_{-2} \cdots p_{-i}}\,\pi_0, \quad i \ge 1.$$
From \sum_{i=-\infty}^{\infty} \pi_i = 1, Z is irreducible, aperiodic, and positive recurrent if and only if the following three conditions hold: (i) p_i, q_i > 0 for all i \in \mathbb{Z}, (ii) r_i > 0 for some i \in \mathbb{Z}, and (iii)
$$\sum_{i=1}^{\infty} \frac{p_0 p_1 \cdots p_{i-1}}{q_1 q_2 \cdots q_i} + \sum_{i=1}^{\infty} \frac{q_0 q_{-1} \cdots q_{-i+1}}{p_{-1} p_{-2} \cdots p_{-i}} < \infty.$$
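The product formulas can be checked against the balance equations q_{i+1}π_{i+1} = p_iπ_i numerically. The sketch below (Python) uses randomly drawn, purely hypothetical p_i and q_i on a finite window of Z.

```python
import numpy as np

rng = np.random.default_rng(2)
lo, hi = -15, 15                                   # finite window of Z (illustration only)
p = {i: rng.uniform(0.1, 0.4) for i in range(lo, hi + 1)}
q = {i: rng.uniform(0.2, 0.5) for i in range(lo, hi + 1)}

def pi_over_pi0(i):
    """pi_i / pi_0 from the two product formulas."""
    if i == 0:
        return 1.0
    if i > 0:
        num = np.prod([p[j] for j in range(0, i)])       # p_0 p_1 ... p_{i-1}
        den = np.prod([q[j] for j in range(1, i + 1)])   # q_1 q_2 ... q_i
    else:
        num = np.prod([q[j] for j in range(i + 1, 1)])   # q_0 q_{-1} ... q_{i+1}
        den = np.prod([p[j] for j in range(i, 0)])       # p_{-1} p_{-2} ... p_i
    return num / den

# The formulas solve the balance equations q_{i+1} pi_{i+1} = p_i pi_i for every i.
for i in range(lo + 1, hi - 1):
    assert np.isclose(q[i + 1] * pi_over_pi0(i + 1), p[i] * pi_over_pi0(i))
```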
The next lemma is analogous to Lemma 4.8.
Lemma 4.18. For the random walk Z, if a set is small then it is finite.

Proof. The proof is similar to that of Lemma 4.8. When i ≠ j, we need at least |i − j| steps to move from i to j, so we just replace i − j by |i − j| in the proof of Lemma 4.8.
Denote
$$\delta_i = V_i - V_{i-1}, \quad i \in \mathbb{Z}. \tag{4–35}$$
The next lemma is analogous to Proposition 4.9.
Lemma 4.19. Fix N \in \mathbb{N}. Z is geometrically ergodic if and only if there exist a constant \varepsilon > 0 and a finite function V \ge 1 on Z such that
$$q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i), \quad |i| > N, \tag{4–36}$$
\lim_{i\to\infty} V(i) = \lim_{i\to-\infty} V(i) = \infty, V(i) is strictly increasing for i > N, and V(i) is strictly decreasing for i < -N.
Proof. The proof is similar to that in the birth-death chain case. We start with necessity. Suppose that Z is geometrically ergodic. Let C = \{-N, -N+1, \ldots, N\}. By Theorem C.8, there exist constants b < \infty, \varepsilon > 0, and a finite function V \ge 1 satisfying
$$MV \le (1-\varepsilon)V + b\mathbf{1}_C.$$
As in the proof of Proposition 4.9, we have \lim_{i\to\infty} V(i) = \infty and \lim_{i\to-\infty} V(i) = \infty. Similar to equation (4–14), we have
$$MV(i) = p_i\delta_{i+1} - q_i\delta_i + V(i), \quad i \in \mathbb{Z}.$$
So for |i| > N (similar to (4–15)),
$$MV(i) \le (1-\varepsilon)V_i \Leftrightarrow q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V_i.$$
Similar to the proof of Proposition 4.9, we can show that V is strictly increasing for i \ge N. We now prove that V is strictly decreasing for all i \le -N. For i < 0, suppose that V(i-1) \ge V(i), i.e., -\delta_i \ge 0. Then
$$q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i) \Leftrightarrow p_i(-\delta_{i+1}) \ge q_i(-\delta_i) + \varepsilon V(i) \Rightarrow -\delta_{i+1} > 0 \Leftrightarrow V(i) > V(i+1).$$
Given any j < -N, we will show that V(j) > V(j+1). Because \lim_{i\to-\infty} V(i) = \infty, there exists m < j such that V(m) > V(m+1). By induction, applying the above result, we get V(j) > V(j+1). So V(i) is strictly decreasing for all i \le -N.
The proof of the sufficient condition is similar to the proof of Proposition 4.9.

Denote q'_i = p_{-i}, p'_i = q_{-i}, V'_i = V_{-i}, and
$$\delta'_i := V'_i - V'_{i-1} = V_{-i} - V_{-i+1} = -(V_{-i+1} - V_{-i}) = -\delta_{-i+1}.$$
Replacing i by -i in q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i), we have
$$q_{-i}\delta_{-i} \ge p_{-i}\delta_{-i+1} + \varepsilon V_{-i}
\;\Leftrightarrow\; -p'_i\delta'_{i+1} \ge -q'_i\delta'_i + \varepsilon V'_i
\;\Leftrightarrow\; q'_i\delta'_i \ge p'_i\delta'_{i+1} + \varepsilon V'_i.$$
So we can rewrite the above lemma as follows.

Lemma 4.20. Fix N \in \mathbb{N}. Z is geometrically ergodic if and only if there exist a constant \varepsilon > 0 and functions V, V' : \mathbb{N} \to [1, \infty) such that
$$q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i), \qquad q'_i\delta'_i \ge p'_i\delta'_{i+1} + \varepsilon V'_i, \quad i > N, \tag{4–37}$$
\lim_{i\to\infty} V(i) = \lim_{i\to\infty} V'(i) = \infty, and V(i) and V'(i) are strictly increasing for i > N.
With the above notation, we can partition a random walk chain into two birth-death
chains. Finally, we have a similar result to Theorem 4.14.
Theorem 4.21. Suppose that the random walk Z is irreducible and aperiodic. Fix N \in \mathbb{N}. Z is geometrically ergodic if and only if
$$\inf_{i>0} q_i > 0, \qquad \inf_{i>0} p_{-i} > 0,$$
$$\sup_{i\ge N} \sum_{k=N}^{i} \frac{p_k p_{k+1} \cdots p_i}{q_k q_{k+1} \cdots q_i} < \infty, \qquad \sup_{i\ge N} \sum_{k=N}^{i} \frac{q_{-k} q_{-k-1} \cdots q_{-i}}{p_{-k} p_{-k-1} \cdots p_{-i}} < \infty.$$
We expect that our method can also be applied to random walks on Z^d that decompose into a finite number of separate birth-death branches outside a bounded set.
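The summation conditions of Theorem 4.21 can be evaluated numerically on a truncated range. The sketch below (Python) uses a hypothetical homogeneous walk with inward drift on the positive half line (p_i = 0.3, q_i = 0.6), for which each inner sum is a geometric series bounded by 1, so the supremum is finite.

```python
def half_line_sup(p, q, N, i_max):
    """sup over N <= i <= i_max of sum_{k=N}^{i} (p_k ... p_i)/(q_k ... q_i)."""
    best = 0.0
    for i in range(N, i_max + 1):
        total, prod = 0.0, 1.0
        for k in range(i, N - 1, -1):      # build the products from k = i down to N
            prod *= p(k) / q(k)
            total += prod
        best = max(best, total)
    return best

# Inward drift: p_i/q_i = 1/2, so the inner sums approach (1/2)/(1 - 1/2) = 1.
print(round(half_line_sup(lambda i: 0.3, lambda i: 0.6, N=1, i_max=40), 6))
```

The negative half-line condition is checked symmetrically with p_{-i} and q_{-i} swapped in the roles of q and p.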
4.3.4 Some Other Results

Recall from Section 4.1 that X is an irreducible, aperiodic, and positive recurrent birth-death chain with Markov transition matrix M defined by the p_i's and q_i's. Using Proposition 4.9, we obtain a sufficient condition for the geometric ergodicity of a class of Markov chains, which are called single-birth chains by some authors (see, e.g., Chen, 2004).
Corollary 4.22. Given an irreducible, aperiodic, and positive recurrent Markov chain X' with Markov transition matrix
$$M' = \begin{pmatrix}
r'_1 & p'_1 & 0 & 0 & 0 & \cdots \\
q'_{21} & r'_2 & p'_2 & 0 & 0 & \cdots \\
q'_{31} & q'_{32} & r'_3 & p'_3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix},$$
suppose that there exists N_0 \ge 1 such that
$$p'_i \le p_i \quad \text{and} \quad q'_{i1} + q'_{i2} + \cdots + q'_{i,i-1} \ge q_i, \quad i > N_0.$$
If X is geometrically ergodic then X' is geometrically ergodic.
Proof. Since X is geometrically ergodic, by Proposition 4.9 there exist \varepsilon > 0 and a strictly increasing drift function V on \mathbb{N} such that \lim V_i = \infty (we set N = 1 in Proposition 4.9). For all i \ge 2, we have
$$\begin{aligned}
M'V(i) \le (1-\varepsilon)V_i
&\Leftrightarrow p'_iV_{i+1} + r'_iV_i + q'_{i,i-1}V_{i-1} + \cdots + q'_{i1}V_1 \le (1-\varepsilon)V_i \\
&\Leftrightarrow p'_iV_{i+1} + (1 - p'_i - q'_{i1} - \cdots - q'_{i,i-1})V_i + q'_{i,i-1}V_{i-1} + \cdots + q'_{i1}V_1 \le (1-\varepsilon)V_i \\
&\Leftrightarrow p'_i\delta_{i+1} - q'_{i,i-1}(V_i - V_{i-1}) - q'_{i,i-2}(V_i - V_{i-2}) - \cdots - q'_{i1}(V_i - V_1) \le -\varepsilon V_i \\
&\Leftrightarrow q'_{i,i-1}\delta_i + q'_{i,i-2}(V_i - V_{i-2}) + \cdots + q'_{i1}(V_i - V_1) \ge p'_i\delta_{i+1} + \varepsilon V(i).
\end{aligned}$$
V is strictly increasing, so 0 < \delta_i \le V_i - V_j for i > j \ge 1, and hence for i > N_0,
$$\begin{aligned}
q'_{i,i-1}\delta_i + q'_{i,i-2}(V_i - V_{i-2}) + \cdots + q'_{i1}(V_i - V_1)
&\ge q'_{i,i-1}\delta_i + q'_{i,i-2}\delta_i + \cdots + q'_{i1}\delta_i \\
&\ge q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i) \ge p'_i\delta_{i+1} + \varepsilon V(i),
\end{aligned}$$
which implies M'V(i) \le (1-\varepsilon)V_i. For i \le N_0,
$$M'V(i) = p'_iV_{i+1} + r'_iV_i + q'_{i,i-1}V_{i-1} + \cdots + q'_{i1}V_1 < \infty.$$
Let b = \max_{i\le N_0} M'V(i) < \infty and C = \{1, 2, \ldots, N_0\}. By Theorem C.8, X' is geometrically ergodic.
By Proposition 4.10, lim inf qi > 0 is a necessary condition for the geometric
ergodicity of X. Note that lim inf qi > 0 implies lim sup ri < 1. Under this condition,
we will show that we can suppose ri = 0 for all i except some i0 if we only care about
geometric ergodicity. Because of aperiodicity, we need some ri0 > 0. Without loss of
generality, we suppose r1 > 0 in the next corollary. Recall that λi = qi/pi.
Corollary 4.23. Let
$$p'_i = p_i + r_i\frac{1}{\lambda_i + 1}, \qquad q'_i = q_i + r_i\frac{\lambda_i}{\lambda_i + 1}, \qquad r'_i = 0, \quad i > 1;$$
then
$$p'_i + q'_i = 1 \quad \text{and} \quad \frac{q'_i}{p'_i} = \lambda_i, \quad i > 1.$$
Denote by X' a birth-death chain with transition matrix
$$M' = \begin{pmatrix}
r_1 & p_1 & 0 & 0 & 0 & \cdots \\
q'_2 & 0 & p'_2 & 0 & 0 & \cdots \\
0 & q'_3 & 0 & p'_3 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.$$
Under the conditions r_1 > 0 and \limsup r_i < 1, X is geometrically ergodic if and only if X' is geometrically ergodic.
Proof. One can check that X' is also an irreducible, aperiodic, and positive recurrent birth-death chain.

Suppose that X is geometrically ergodic. Fix any N \in \mathbb{N}. By Proposition 4.9, there exist some \varepsilon and drift function V such that q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i) for i > N. Because
$$\frac{q_i}{p_i} = \frac{q'_i}{p'_i}, \qquad \frac{1}{p_i} \ge \frac{1}{p'_i}, \qquad \delta_i > 0, \quad i > N,$$
and V is finite, we have
$$q_i\delta_i \ge p_i\delta_{i+1} + \varepsilon V(i) \;\Leftrightarrow\; \frac{\delta_{i+1}}{V_i} \le \frac{q_i}{p_i}\frac{\delta_i}{V_i} - \frac{\varepsilon}{p_i}
\;\Rightarrow\; \frac{\delta_{i+1}}{V_i} \le \frac{q'_i}{p'_i}\frac{\delta_i}{V_i} - \frac{\varepsilon}{p'_i}
\;\Leftrightarrow\; q'_i\delta_i \ge p'_i\delta_{i+1} + \varepsilon V(i), \quad i > N.$$
That means V is also a drift function for X', so X' is geometrically ergodic by Proposition 4.9.
Conversely, suppose that X' is geometrically ergodic. Fix any N \in \mathbb{N}. By Proposition 4.9, there exist some \varepsilon' and drift function V such that q'_i\delta_i \ge p'_i\delta_{i+1} + \varepsilon' V(i) for i > N. Arguing as above, we have
$$\frac{\delta_{i+1}}{V_i} \le \frac{q'_i}{p'_i}\frac{\delta_i}{V_i} - \frac{\varepsilon'}{p'_i}.$$
We need to find some \varepsilon > 0 such that
$$\frac{q'_i}{p'_i}\frac{\delta_i}{V_i} - \frac{\varepsilon'}{p'_i} \le \frac{q_i}{p_i}\frac{\delta_i}{V_i} - \frac{\varepsilon}{p_i}
\;\Leftrightarrow\; \frac{\varepsilon'}{p'_i} \ge \frac{\varepsilon}{p_i}
\;\Leftrightarrow\; \varepsilon \le \frac{p_i}{p'_i}\varepsilon'.$$
If we can show that \liminf p_i/p'_i > 0, then we can find a positive constant \varepsilon satisfying the above inequality and hence apply Proposition 4.9 with N, V, and \varepsilon to conclude that X is geometrically ergodic. Note that
$$\frac{p'_i}{p_i} - 1 = \frac{r_i}{p_i(\lambda_i + 1)} = \frac{r_i}{p_i(1 + q_i/p_i)} = \frac{r_i}{p_i + q_i},$$
so
$$\limsup r_i < 1 \;\Leftrightarrow\; \limsup \frac{r_i}{p_i + q_i} < \infty \;\Leftrightarrow\; \limsup \frac{p'_i}{p_i} < \infty \;\Leftrightarrow\; \liminf \frac{p_i}{p'_i} > 0.$$
Birth-death chains are special cases of random-walk-type Markov chains (see, e.g., Jarner and Tweedie, 2003). The drift function V is also unbounded for those chains.

Lemma 4.24. Given a random-walk-type Markov chain on \mathbb{R}^n and a sequence \{x_k\} \subset \mathbb{R}^n such that \|x_k\| \to \infty, if V is any drift function for this chain then \liminf V(x_k) = \infty.

Proof. From Jarner and Tweedie (2003), a small set of a random-walk-type Markov chain on \mathbb{R}^n is bounded. The set \{V < c\} is petite, so it is bounded. If \liminf V(x_k) < \infty, then there exists a subsequence \{x_{k_j}\} of \{x_k\} such that c = \lim V(x_{k_j}) < \infty, and hence x_{k_j} \in \{V < c + 1\} for large j. This is a contradiction since \{x_{k_j}\} is unbounded.
Finally, we mention some studies related to the Markov chains in this chapter. Jarner and Hansen (2000) and Bramson (2008) studied the relationship between petite sets and bounded sets. Theorem 2.2 in Jarner and Tweedie (2003) uses a drift condition to show that a geometric stationary tail is a necessary condition for the geometric ergodicity of a random-walk-type Markov chain. Mao et al. (2012) studied the GI/G/1-type Markov chain, which is a random-walk-type Markov chain when its phase process is stochastic. Theorem 3.1 in Mao et al. (2012) shows that a geometric stationary tail is also a sufficient condition for geometric ergodicity. A birth-death chain is a GI/G/1-type Markov chain when the p_i's and q_i's do not depend on i. Kovchegov (2010) calculates an upper bound on the total variation distance for geometrically ergodic birth-death chains when the p_i's and q_i's do not depend on i.
APPENDIX A
SOME LEMMAS AND EXAMPLES
Recall the Hilbert spaces L^2(π) and L^2_0(π) from Chapter 1. Note that the dimension of L^2_0(π) is 1 less than that of L^2(π).

Lemma A.1. Let A be a linear operator from L^2(π) to itself. Suppose that A restricted to L^2_0(π) is a linear operator from L^2_0(π) to itself. Then A is compact on L^2(π) if and only if A is compact on L^2_0(π).

Proof. Suppose that A is compact on L^2(π). Let \{h^{(n)}\} be a bounded sequence in L^2_0(π); then \{h^{(n)}\} is bounded in L^2(π). Because A is compact on L^2(π), there exists a subsequence \{h^{(n')}\} of \{h^{(n)}\} such that \{Ah^{(n')}\} converges to a function g in L^2(π). Because Ah^{(n')} \in L^2_0(π) and L^2_0(π) is closed, we have g \in L^2_0(π). We have found a subsequence \{h^{(n')}\} of \{h^{(n)}\} such that \{Ah^{(n')}\} converges to a function in L^2_0(π), so A is compact on L^2_0(π).
Now suppose that A is compact on L^2_0(π). Let \{h^{(n)}\} be a bounded sequence in L^2(π), say \|h^{(n)}\| < M. By Jensen's inequality we have |\pi h^{(n)}|^2 \le \pi (h^{(n)})^2 = \|h^{(n)}\|^2 < M^2, so the sequence \{\pi h^{(n)}\} is bounded in \mathbb{R}. Since every bounded sequence in \mathbb{R} has a convergent subsequence, there exists a subsequence \{h^{(n')}\} of \{h^{(n)}\} such that \{\pi h^{(n')}\} converges in \mathbb{R}. Given a constant c, we also denote by c the constant function c(x) = c. Since A(\pi h^{(n')}) = \pi h^{(n')}A(1), the sequence \{A(\pi h^{(n')})\} converges in L^2(π). Because \mathrm{Var}\,h^{(n')} \le E(h^{(n')})^2, we have \|h^{(n')} - \pi h^{(n')}\| \le \|h^{(n')}\| < M. Note that \pi(h^{(n')} - \pi h^{(n')}) = 0, so \{h^{(n')} - \pi h^{(n')}\} is a bounded sequence in L^2_0(π). A is compact on L^2_0(π), so there is a subsequence \{h^{(n'')}\} of \{h^{(n')}\} such that \{A(h^{(n'')} - \pi h^{(n'')})\} converges in L^2_0(π). Since Ah^{(n'')} = A(h^{(n'')} - \pi h^{(n'')}) + A(\pi h^{(n'')}), the sequence \{Ah^{(n'')}\} converges in L^2(π). We have found a subsequence \{h^{(n'')}\} of \{h^{(n)}\} such that \{Ah^{(n'')}\} converges in L^2(π), so A is compact on L^2(π).
Lemma A.2. In Section 3.2.2, \pi(\theta, \mu, \lambda | y) is improper if and only if
$$a < 0, \qquad a + \frac{K}{2} > \frac{1}{2}, \qquad \text{and} \qquad a + b > \frac{1-N}{2}.$$

Proof. Tan and Hobert (2009) gave conditions under which the posterior is proper for a similar model with the reparameterization \lambda_\theta^{-1} = \sigma^2_\theta and \lambda_e^{-1} = \sigma^2_e. We have
$$y_{ij} \mid \theta, \mu, \sigma^2_\theta, \sigma^2_e \sim N(\theta_i, \sigma^2_e), \quad i = 1, 2, \ldots, K, \; j = 1, 2, \ldots, m_i,$$
$$\theta \mid \mu, \sigma^2_\theta, \sigma^2_e \sim N(\mathbf{1}\mu, I\sigma^2_\theta),$$
$$f(\sigma^2_\theta, \sigma^2_e, \mu) \propto (\sigma^2_\theta)^{-(a-1)}(\sigma^2_e)^{-(b-1)}\left|\frac{d\lambda_\theta}{d\sigma^2_\theta}\right|\left|\frac{d\lambda_e}{d\sigma^2_e}\right| = (\sigma^2_\theta)^{-(a+1)}(\sigma^2_e)^{-(b+1)}, \quad \sigma_\theta, \sigma_e > 0.$$
To simplify the argument, we consider X \mid Y \sim f_{X|Y}(x|y), Y \sim f_Y(y), and the reparameterization Z = g(Y) for some f_{X|Y}, f_Y, and g. Suppose that g is one-to-one and that dg^{-1}/dz exists. Consider y_0 and z_0 = g(y_0). We have
$$f_{X|Z}(x|z_0) = f_{X|Y}(x|g^{-1}(z_0)) = f_{X|Y}(x|y_0)$$
and
$$f_Z(z_0) = f_Y(g^{-1}(z_0))\,\frac{dg^{-1}(z_0)}{dz_0} = f_Y(y_0)\,\frac{dg^{-1}(z_0)}{dz_0}.$$
So
$$\int f_{X,Z}(x,z)\,dz = \int f_{X|Z}(x|z)f_Z(z)\,dz = \int f_{X|Y}(x|y)f_Y(y)\,\frac{dg^{-1}(z)}{dz}\,dz = \int f_{X|Y}(x|y)f_Y(y)\,dg^{-1}(z) = \int f_{X|Y}(x|y)f_Y(y)\,dy = \int f_{X,Y}(x,y)\,dy.$$
That means both integrals are either finite together or infinite together. By Hobert and Casella (1996) and Tan and Hobert (2009), we obtain the result.
Example A.3. We draw sets which are similar to D and S_b in \mathbb{R}^2 and \mathbb{R}^3 in Section 3.3. In \mathbb{R}^2, denote by (x, y) a point in a rectangular coordinate system and by (r, \theta) a point in the corresponding polar coordinate system. If x = y then \theta = \pi/4.

Figure A-1. Graph in \mathbb{R}^2

Figure A-1 shows the line y = x for 0 < x < \delta, the region
$$D = \left\{(x, y) \in (0, \delta)^2 : \frac{1}{2} < \frac{x}{y} < 2\right\},$$
and the curve
$$\left\{(r, \theta) : r = \delta, \; \left|\theta - \frac{\pi}{4}\right| < \frac{1}{4}\right\}.$$
Note that for any point A in the set
$$S_4 = \left\{(r, \theta) : r < \delta, \; \left|\theta - \frac{\pi}{4}\right| < \frac{1}{4}\right\},$$
the line which connects (0, 0) and A cuts the above curve.

In \mathbb{R}^3, denote by (x, y, z) a point in a rectangular coordinate system and by (r, \theta, \phi) a point in the corresponding spherical coordinate system. If x = y = z then \theta = \arctan(\sqrt{2}) and \phi = \pi/4.

Figure A-2. Graph in \mathbb{R}^3

Figure A-2 shows the line x = y = z for 0 < x < \delta, the region
$$D = \left\{(x, y, z) \in (0, \delta)^3 : \frac{1}{2} < \frac{x}{z}, \frac{y}{z}, \frac{x}{y} < 2\right\},$$
and the surface
$$\left\{(r, \theta, \phi) : r = \delta, \; |\theta - \arctan(\sqrt{2})| < \frac{1}{5}, \; \left|\phi - \frac{\pi}{4}\right| < \frac{1}{5}\right\}.$$
Lemma A.4. For a positive sequence \{x_i\}_{i=1}^\infty, \inf x_i > 0 if and only if \liminf x_i > 0.

Proof. Suppose that \inf x_i > 0. There exists \varepsilon > 0 such that x_i > \varepsilon for all i, so \liminf x_i \ge \varepsilon > 0.

Conversely, suppose that \liminf x_i > 0. There exist N \in \mathbb{N} and \varepsilon > 0 such that x_i \ge \varepsilon for i > N; therefore \inf_{i>N} x_i \ge \varepsilon. Because \{x_i\} is a positive sequence, \min_{i\le N} x_i > 0. Thus, \inf x_i > 0.
Given positive sequences \{p_i\}_{i=1}^\infty and \{q_i\}_{i=1}^\infty, recall that we define the c_i's as in (4–4):
$$c_1 = 1, \qquad c_i = \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_i}, \quad i = 2, 3, \ldots$$
The next lemma shows that we can shift the range of the indices of the q_i's when \liminf q_i > 0.
Lemma A.5. If \liminf q_i > 0 then
$$\sum_{i=1}^\infty c_i < \infty \;\Leftrightarrow\; \sum_{i-k\ge 2} \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_{i-k}} < \infty, \; k \ge 0
\;\Leftrightarrow\; \sum_{i+l\ge 2} \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_{i+l}} < \infty, \; l > 0.$$

Proof. We only give the proof for k because the proof for l is quite similar. Denote
$$d_{ki} = \frac{p_1 p_2 \cdots p_{i-1}}{q_2 q_3 \cdots q_{i-k}}, \quad i - 2 \ge k;$$
then d_{ki} = c_i(q_{i-k+1} \cdots q_i) \le c_i. Since \liminf q_i > 0, there exist N \ge 2 and \varepsilon > 0 such that q_i > \varepsilon for i > N. For i - k + 1 > N (which implies i - 2 \ge k), we have
$$c_i \ge d_{ki} = c_i(q_{i-k+1} \cdots q_i) \ge c_i\varepsilon^k.$$
So \sum_{i-k+1>N} c_i converges if and only if \sum_{i-k+1>N} d_{ki} converges. Thus, \sum_{i=1}^\infty c_i converges if and only if \sum_{i-k\ge 2} d_{ki} converges.
APPENDIX B
CHI-SQUARE DISTANCE
Definition B.1. Given two \sigma-finite measures \Lambda and \Pi on a measure space (X, \mathcal{F}), the chi-square distance between \Lambda and \Pi (also called the chi-square divergence of \Pi from \Lambda) is defined by (see, e.g., Roberts and Rosenthal, 1997, p.16)
$$\chi^2(\Lambda, \Pi) = \begin{cases} \displaystyle\int_X \left(\frac{d\Lambda}{d\Pi} - 1\right)^2 d\Pi, & \Lambda \ll \Pi, \\ \infty, & \text{otherwise}, \end{cases}$$
where \ll denotes absolute continuity.

Note that this definition depends only on \Lambda and \Pi. Suppose that \Lambda \ll \mu and \Pi \ll \mu for some \sigma-finite measure \mu on (X, \mathcal{F}). (For example, those conditions hold when \mu = \Lambda + \Pi.) Denote d\Pi = \pi\,d\mu and d\Lambda = \lambda\,d\mu. The next lemma gives another form for the chi-square distance.
Lemma B.2.
$$\chi^2(\Lambda, \Pi) = \int_X \frac{(\lambda - \pi)^2}{\pi}\,d\mu.$$
In particular, the right-hand side of the above formula does not depend on \mu.

Proof. We start with the case \Lambda \ll \Pi. Because \Lambda \ll \Pi and \Pi \ll \mu, we have (see Halmos, 1950, p.133)
$$\lambda = \frac{d\Lambda}{d\mu} = \frac{d\Lambda}{d\Pi}\frac{d\Pi}{d\mu} = \frac{d\Lambda}{d\Pi}\,\pi \quad \mu\text{-a.e.}$$
So
$$\frac{\lambda}{\pi} = \frac{d\Lambda}{d\Pi} \quad \mu\text{-a.e.}$$
We can suppose that d\Lambda/d\Pi is finite (see Halmos, 1950, p.128). If \pi(x) = 0 then \lambda(x)/\pi(x) is infinite when \lambda(x) \ne 0 and has no meaning when \lambda(x) = 0. From measure theory, those cases are not important because they occur on a set of \mu-measure 0. So
$$\int_X \left(\frac{d\Lambda}{d\Pi} - 1\right)^2 d\Pi = \int_X \left(\frac{\lambda}{\pi} - 1\right)^2 \pi\,d\mu = \int_X \frac{(\lambda - \pi)^2}{\pi}\,d\mu.$$
Now suppose that \Lambda is not absolutely continuous with respect to \Pi. We need to prove that
$$\int_X \frac{(\lambda - \pi)^2}{\pi}\,d\mu = \infty.$$
By the Lebesgue decomposition (see Halmos, 1950, p.134), \Lambda = \nu + \sigma, where \nu \perp \Pi and \sigma \ll \Pi. Since \nu \perp \Pi, there exists a set A such that \Pi(A) = 0 and \nu(X \setminus A) = 0. Because \Lambda is not absolutely continuous with respect to \Pi, \nu is not trivial and hence \nu(A) > 0. We have
$$\int_X \frac{(\lambda - \pi)^2}{\pi}\,d\mu \ge \int_A \frac{(\lambda - \pi)^2}{\pi}\,d\mu.$$
Since \Pi(A) = 0, we can select \pi such that \pi(x) = 0 for x \in A. \sigma \ll \Pi and \Pi \ll \mu imply \sigma \ll \mu. Since \Lambda \ll \mu, \nu = \Lambda - \sigma \ll \mu. Combining this with \Lambda \ge \nu, we have \lambda \ge d\nu/d\mu. Because \pi(x) = 0 on A, we get
$$\int_A \frac{(\lambda - \pi)^2}{\pi}\,d\mu = \int_A \frac{\lambda^2}{\pi}\,d\mu \ge \int_A \frac{(d\nu/d\mu)^2}{\pi}\,d\mu.$$
Because \nu(A) > 0, d\nu/d\mu > 0 on some set B \subset A with \mu(B) > 0. Since \pi(x) = 0 on A,
$$\int_A \frac{(d\nu/d\mu)^2}{\pi}\,d\mu = \infty.$$
The chi-square distance is not symmetric: \chi^2(\Lambda, \Pi) \ne \chi^2(\Pi, \Lambda) in most cases. If \Lambda and \Pi are probability measures, then
$$\chi^2(\Lambda, \Pi) = \int_X \left[\left(\frac{d\Lambda}{d\Pi}\right)^2 - 2\frac{d\Lambda}{d\Pi} + 1\right] d\Pi = \int_X \left(\frac{d\Lambda}{d\Pi}\right)^2 d\Pi - 2 + 1 = \int_X \left(\frac{d\Lambda}{d\Pi}\right)^2 d\Pi - 1. \tag{B–1}$$
We also denote by L^2(\Pi) the set of all \sigma-finite signed measures \Lambda on (X, \mathcal{F}) such that d\Lambda/d\Pi \in L^2(\Pi) (see, e.g., Roberts and Rosenthal, 1997, p.16). From (B–1), when \Lambda and \Pi are probability measures, \chi^2(\Lambda, \Pi) < \infty if and only if \Lambda \in L^2(\Pi).
Lemma B.3. (see, e.g., Diaconis et al., 2008, p.155)
$$4\|\Lambda - \Pi\|^2 \le \chi^2(\Lambda, \Pi),$$
where the left-hand side is the total variation norm.

Proof. Note that if d\Lambda = \lambda\,d\mu and \mu is a positive measure, then \|\Lambda\| = \int |\lambda|\,d\mu. Writing the total variation distance as an L^1 distance, we have
$$\|\Lambda - \Pi\| = \frac{1}{2}\int_X |\lambda - \pi|\,d\mu = \frac{1}{2}\int_X \frac{|\lambda - \pi|}{\pi}\,d\Pi. \tag{B–2}$$
By Jensen's inequality,
$$4\|\Lambda - \Pi\|^2 = \left(\int_X \left|\frac{\lambda - \pi}{\pi}\right| d\Pi\right)^2 \le \chi^2(\Lambda, \Pi).$$
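Lemma B.3 is easy to illustrate numerically. The sketch below (Python) takes \mu to be counting measure on a hypothetical 10-point space and two random probability vectors for \Lambda and \Pi.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = rng.dirichlet(np.ones(10))     # a random probability vector (Lambda)
pi = rng.dirichlet(np.ones(10))      # a random probability vector (Pi)

tv = 0.5 * np.abs(lam - pi).sum()        # total variation distance, as in (B-2)
chi2 = ((lam - pi) ** 2 / pi).sum()      # chi-square distance, as in Lemma B.2

assert 4 * tv**2 <= chi2                 # Lemma B.3
```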
Let \Phi = \{X_i\}_{i=0}^\infty be an irreducible Markov chain on a Borel space (X, \mathcal{F}) with Markov transition kernel P, stationary distribution \Pi, and initial distribution P_0(\cdot). Denote by P_n the distribution of X_n for all n \ge 0, and by P^* the kernel of the backward chain. The chain is called L^2(\Pi)-geometrically ergodic (see, e.g., Roberts and Tweedie, 2001, p.39) if there exist a function M(P_0) < \infty and a constant 0 < r < 1 such that
$$\chi^2(P_n, \Pi) \le M(P_0)r^n, \quad \forall P_0 \in L^2(\Pi), \forall n \in \mathbb{N}.$$

Lemma B.4. (Liu et al., 1995, p.162) Suppose that P(x, \cdot) is absolutely continuous with respect to some measure \mu for all x and \chi(P_k, \Pi) < \infty for some k \ge 0. Then
$$\chi(P_n, \Pi) \le \chi(P_k, \Pi)\|P\|^{n-k}, \quad \forall n \ge k.$$
Furthermore, if \|P\| < 1 then the chain is L^2(\Pi)-geometrically ergodic.
Proof. We can suppose that k = 0 without loss of generality. Given a measurable set A with \mu(A) = 0, we have P(x, A) = 0 for all x. Because \Pi = \Pi P,
$$\Pi(A) = \int_X \Pi(dx)P(x, A) = 0,$$
and hence \Pi \ll \mu. Denote \pi(x) = d\Pi/d\mu. Note that P^n = PP^{n-1} and P_n = P_0P^n, so the same technique shows that P^n(x, \cdot) \ll \mu and P_n \ll \mu for all n \ge 1. \chi(P_0, \Pi) < \infty implies P_0 \ll \Pi; combining this with \Pi \ll \mu, we get P_0 \ll \mu. Finally, we can denote dP^n(x, \cdot)/d\mu = k_n(x, \cdot) for all n \ge 1 and dP_n/d\mu = p_n for all n \ge 0.

We have
$$p_n(y) - \pi(y) = \int_X k_n(x, y)p_0(x)\mu(dx) - \int_X k_n(x, y)\pi(x)\mu(dx) = \int_X k_n(x, y)[p_0(x) - \pi(x)]\mu(dx)$$
because all the above integrals are finite. Denote g = (p_0 - \pi)/\pi and h = (p_n - \pi)/\pi. \chi(P_0, \Pi) < \infty implies g \in L^2(\Pi). Denote dP^{*n}(y, \cdot)/d\mu = k^*_n(y, x); then
$$\pi(x)k_n(x, y) = \pi(y)k^*_n(y, x).$$
We have
$$h(y) = \frac{1}{\pi(y)}[p_n(y) - \pi(y)] = \int_X \frac{k_n(x, y)}{\pi(y)}[p_0(x) - \pi(x)]\mu(dx) = \int_X k^*_n(y, x)\frac{p_0(x) - \pi(x)}{\pi(x)}\mu(dx) = \int_X k^*_n(y, x)g(x)\mu(dx) = P^{*n}g(y),$$
or h = P^{*n}g. Since P^* is an operator from L^2(\Pi) to L^2(\Pi), so is P^{*n}. g \in L^2(\Pi) implies h \in L^2(\Pi). We have
$$\chi^2(P_n, \Pi) = \|h\|^2 = \|P^{*n}g\|^2.$$
Because \|P^*\| = \|P\|,
$$\chi(P_n, \Pi) = \|P^{*n}g\| \le \|P^{*n}\|\|g\| \le \|P^*\|^n\|g\| = \chi(P_0, \Pi)\|P\|^n.$$
From (B–1), \chi^2(P_0, \Pi) < \infty if and only if P_0 \in L^2(\Pi). If \|P\| < 1, the chain is therefore L^2(\Pi)-geometrically ergodic.

Remark B.5. We need g, h \in L^2(\Pi) to have \chi^2(P_n, \Pi) = \langle P^n h, g\rangle and
$$\|h\|^2 = \langle P^n h, g\rangle \;\Rightarrow\; \|h\|^2 \le \|P^n\|\|h\|\|g\| \le \|P\|^n\|h\|\|g\| \;\Rightarrow\; \|h\| \le \|P\|^n\|g\|.$$
Example B.6. Suppose \Phi starts at some point x_0, so P_0 = \delta_{x_0} (a Dirac measure), and P(x, \cdot) \ll \lambda, where \lambda is Lebesgue measure on \mathbb{R}. Since \lambda(\{x_0\}) = 0 and \delta_{x_0}(\{x_0\}) = 1 \ne 0, \delta_{x_0} is not absolutely continuous with respect to \lambda. By the definition of the chi-square distance, we have \chi(P_0, \Pi) = \infty, so we cannot use Lemma B.4 with k = 0 here. P_1 = P(x_0, \cdot) implies P_1 \ll \lambda. If \chi^2(P_1, \Pi) < \infty, we can use Lemma B.4 with k = 1.

Without the condition \chi(P_k, \Pi) < \infty for some constant k \ge 1, we can still show that the chain is geometrically ergodic for almost all starting points.
Proposition B.7. Suppose that P (x, ·) is absolutely continuous with respect to some
measure µ for all x, and Φ is ϕ-irreducible. If ‖P‖ < 1 then Φ is Π-a.e. geometrically
ergodic.
Proof. By Lemma B.4, Φ is L2(Π)-geometrically ergodic. By Theorem 1 in Roberts and
Tweedie (2001), Φ is Π-a.e. geometrically ergodic.
Suppose that P_0 and P(x, \cdot) are absolutely continuous with respect to some measure \mu for all x. We now derive a formula for the chi-square distance in the reversible case which is similar to (2.7) in Diaconis et al. (2008, p.156). P = P^* implies
$$\chi^2(P_n, \Pi) = \|P^n g\|^2 = \langle P^n g, P^n g\rangle = \langle P^{2n}g, g\rangle.$$
Denote by \sigma(P) the spectrum of P. By the spectral theorem (see Conway, 1990, p.263), there exists a spectral measure E on \sigma(P) such that
$$P = \int_{\sigma(P)} \lambda\,dE(\lambda).$$
Let \phi(\lambda) = \lambda^{2n}. Using results in Conway (1990, p.264), we have the following lemma.

Lemma B.8. Suppose that P_0 and P(x, \cdot) are absolutely continuous with respect to some measure \mu for all x. If \Phi is reversible then
$$\chi^2(P_n, \Pi) = \langle P^{2n}g, g\rangle = \int_{\sigma(P)} \lambda^{2n}\,dE_{g,g}, \tag{B–3}$$
where E_{g,g}(\cdot) = \langle E(\cdot)g, g\rangle is a nonnegative measure on \sigma(P) (Conway, 1990, p.257).

When P is compact, \sigma(P) is discrete and hence the integral in (B–3) reduces to a sum similar to (2.7) in Diaconis et al. (2008, p.156).
The next lemma is analogous to Proposition 13.3.2 in Meyn and Tweedie (2009).

Lemma B.9. Suppose that P_0 and P(x, \cdot) are absolutely continuous with respect to some measure \mu for all x, and the chain is reversible. Then the chi-square distance \chi^2(P_n, \Pi) is non-increasing in n.

Proof. If m \ge n then \lambda^{2m} \le \lambda^{2n} for |\lambda| \le 1. Because E_{g,g} is a nonnegative measure, we have
$$\chi^2(P_n, \Pi) = \int_{\sigma(P)} \lambda^{2n}\,dE_{g,g} \ge \int_{\sigma(P)} \lambda^{2m}\,dE_{g,g} = \chi^2(P_m, \Pi).$$
APPENDIX C
F-GEOMETRIC ERGODICITY
We review the drift condition method for general state space and discrete state space Markov chains. We state the results in a more complete way than in the existing literature. Let \Phi = \{X_k\}_{k=0}^\infty be a \psi-irreducible Markov chain on a countably generated measure space (X, \mathcal{F}) with Markov transition kernel P(x, \cdot), and let \mathcal{F}^+ = \{A \in \mathcal{F} : \psi(A) > 0\} (see Meyn and Tweedie, 2009, p.91). We denote Meyn and Tweedie (2009) by MT throughout this section. Given A \in \mathcal{F}, the first return time to A is defined by \tau_A = \min\{k \ge 1 : X_k \in A\} (see MT, p.71). A \in \mathcal{F} is full if \psi(A^c) = 0, where A^c is the complement of A, and A is absorbing if P(x, A) = 1 for all x \in A (see MT, p.91).

Consider an \mathcal{F}-measurable function f : X \to [1, \infty]. For any constant r > 1, denote (see MT, p.368 and p.372)
$$R^{(r)}_A(x, f) = E_x\left[\sum_{k=0}^{\tau_A - 1} f(X_k)r^k\right], \quad \forall x \in X, \forall A \in \mathcal{F}.$$
A set A \in \mathcal{F} is called f-geometrically regular if for each B \in \mathcal{F}^+ there exists r = r(f, B) > 1 such that (see MT, p.368)
$$\sup_{x \in A} R^{(r)}_B(x, f) = \sup_{x \in A} E_x\left[\sum_{k=0}^{\tau_B - 1} f(X_k)r^k\right] < \infty.$$
Given any signed measure \mu on (X, \mathcal{F}), the f-norm of \mu is defined by \|\mu\|_f = \sup_{g : |g| \le f} |\mu g|, where \mu g = \int_X g\,d\mu (see MT, p.334). Suppose that \Phi has a stationary distribution \pi. \Phi is called f-geometrically ergodic if there exists a constant r > 1 such that (see MT, p.359)
$$\sum_{n=1}^\infty r^n\|P^n(x, \cdot) - \pi\|_f < \infty, \quad \forall x \in X.$$
\Phi is called \pi-f-geometrically ergodic if there exists a constant r > 1 such that
$$\sum_{n=1}^\infty r^n\|P^n(x, \cdot) - \pi\|_f < \infty, \quad \pi\text{-a.s.}$$
For f = 1, we use the terms \pi-geometric ergodicity (geometric ergodicity, resp.) instead of \pi-1-geometric ergodicity (1-geometric ergodicity, resp.). Denote by \mathbf{1}_A the function which equals 1 on A and 0 otherwise.
Theorem 15.0.1 in MT gives sufficient conditions for \pi-geometric ergodicity. Theorem 6.14 in Nummelin (1984) gives necessary and sufficient conditions for \pi-geometric ergodicity, with no drift condition involved. The next theorem improves both of those theorems from \pi-geometric ergodicity to \pi-f-geometric ergodicity, and \pi-f-geometric ergodicity is stated as a necessary and sufficient condition. Note, however, that many theorems in Chapter 15 of MT concern \pi-f-geometric ergodicity, so many proofs in MT can still be used to prove many parts of the next theorem. To make it clearer, we note that the next theorem is analogous to Theorem 14.0.1 in MT: there, we first fix a function f and then ask whether the Markov chain is \pi-f-ergodic or not. In the next theorem, we also fix a function f first, and find necessary and sufficient conditions for \pi-f-geometric ergodicity.
Theorem C.1. (MT, Theorem 15.0.1) Suppose that \Phi is \psi-irreducible, aperiodic, and recurrent. Given an \mathcal{F}-measurable function f : X \to [1, \infty), the following five conditions are equivalent:

(i) \Phi is positive with a stationary distribution \pi such that \pi f < \infty, and there exist some petite set A \in \mathcal{F}^+ and constants M < \infty, r > 1, c_1, and c_2 such that for all x \in A and n \ge 0,
$$r^n|P^nf(x) - c_1| < M \tag{C–1}$$
and
$$r^n|P^n(x, A) - c_2| < M. \tag{C–2}$$

(ii) There exist some petite set C \in \mathcal{F} and a constant \kappa > 1 such that
$$\sup_{x \in C} R^{(\kappa)}_C(x, f) = \sup_{x \in C} E_x\left[\sum_{k=0}^{\tau_C - 1} f(X_k)\kappa^k\right] < \infty. \tag{C–3}$$

(iii) There exist some petite set B, constants r > 1 and b < \infty, and a measurable function V : X \to [1, \infty], which is finite at some x_0 \in X and satisfies V \ge f, such that
$$PV \le r^{-1}V + b\mathbf{1}_B. \tag{C–4}$$

(iv) There exists an absorbing and full subset of X which is contained in a union of countably many f-geometrically regular sets.

(v) \Phi is \pi-f-geometrically ergodic, i.e., there exist a stationary distribution \pi and a constant s > 1 such that
$$\sum_{n=1}^\infty s^n\|P^n(x, \cdot) - \pi\|_f < \infty, \quad \pi\text{-a.s.} \tag{C–5}$$

Furthermore, any of these five conditions implies that S_V = \{x : V(x) < \infty\} is full and absorbing, where V is any function such that condition (iii) holds; that \pi V < \infty; that there exists a constant a < \infty such that
$$\sum_{n=1}^\infty s^n\|P^n(x, \cdot) - \pi\|_f \le aV(x), \quad x \in S_V,$$
where s is any constant such that condition (v) holds; and that there also exist constants s and a < \infty such that
$$\sum_{n=1}^\infty s^n\|P^n(x, \cdot) - \pi\|_V \le aV(x), \quad x \in S_V.$$
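Condition (iii) can be verified directly for a toy chain. The sketch below (Python) takes f = 1 and the assumed geometric drift function V(x) = z^x for a hypothetical reflecting random walk on {0, 1, 2, ...} with up-probability 0.3, down-probability 0.6, and holding probability 0.1 (holding with probability 0.7 at 0); the constant lam plays the role of r^{-1} and the petite set B is {0}.

```python
import numpy as np

z = np.sqrt(2.0)
lam = 0.3 * z + 0.1 + 0.6 / z          # PV(x) = lam * V(x) for x >= 1; lam < 1

def PV(x):
    """One-step expectation of V(x) = z**x under the toy kernel."""
    if x == 0:
        return 0.7 + 0.3 * z           # reflecting boundary (assumed behaviour)
    return 0.3 * z**(x + 1) + 0.1 * z**x + 0.6 * z**(x - 1)

b = PV(0) - lam                        # extra mass allowed on the petite set B = {0}
for x in range(50):                    # drift inequality PV <= lam * V + b * 1_B
    assert PV(x) <= lam * z**x * (1 + 1e-9) + (b if x == 0 else 0.0)
```

The choice z = \sqrt{2} minimizes 0.3z + 0.1 + 0.6/z, giving lam ≈ 0.949 < 1, so (C–4) holds for this example.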
Remark C.2. We point out some differences between Theorem C.1 and Theorem 15.0.1 in MT.

• In Theorem C.1, \pi-f-geometric ergodicity (Theorem C.1(v)) is equivalent to each of the four conditions Theorem C.1(i)–(iv). But in Theorem 15.0.1 in MT, \pi-geometric ergodicity (formula (15.4)) is only stated as a necessary condition for each of the three conditions Theorem 15.0.1(i)–(iii). Theorem C.1 is analogous to Theorem 6.14 in Nummelin (1984), where \pi-geometric ergodicity (Theorem 6.14(iii)) is equivalent to each of the two conditions Theorem 6.14(i) and Theorem 6.14(ii).

• The difference between Theorem 15.0.1 in MT and Theorem C.1 is analogous to that between Chapter 13 and Chapter 14 in MT. Theorem 15.0.1 in MT actually only gives conditions for \pi-geometric ergodicity, while Theorem C.1 gives conditions for \pi-f-geometric ergodicity. Note that each of the five equivalent conditions in Theorem C.1 involves f. To make this clearer, we explain why Theorem 15.0.1 in MT is not enough for \pi-f-geometric ergodicity. Given an \mathcal{F}-measurable function f : X \to [1, \infty), we want to know whether \Phi is \pi-f-geometrically ergodic or not. First, we try to apply Theorem 15.0.1 in MT. Suppose that (15.2) in MT, which is
$$\sup_{x \in C} E_x[\kappa^{\tau_C}] < \infty,$$
holds. We can show that it is equivalent to a special case of (C–3) (set f = 1 in (C–3)). Since (15.2) in MT does not involve f anywhere, it should not suffice. Theorem 15.0.1 in MT only tells us that the chain is \pi-V-geometrically ergodic, where V is defined by Theorem 15.2.4 in MT (we set f = 1 in Theorem 15.2.4):
$$V(x) = E_x\left[\sum_{k=0}^{\sigma_C} \mathbf{1}_X(X_k)r^k\right],$$
where \sigma_C = \min\{k \ge 0 : X_k \in C\}. There is no relationship between V and f, so Theorem 15.0.1 in MT does not tell us whether \Phi is \pi-f-geometrically ergodic or not. We now apply Theorem C.1. Suppose that Theorem C.1(ii) holds. Theorem C.1 shows that we can find a drift function V such that V \ge f, and \Phi is both \pi-V-geometrically ergodic and \pi-f-geometrically ergodic.

• Theorem C.1(i) with f = 1 is exactly Theorem 15.0.1(i) in MT. For f = 1, we have P^nf(x) - \pi f = 1 - 1 = 0, so 0 = r^n|P^nf(x) - c_1| < M (set c_1 = \pi f = 1) obviously holds.

• We do not use \Delta V = PV - V to define the drift condition (C–4) because PV - V may be \infty - \infty. For example, \infty - \infty may happen on a non-trivial set in Lemma 15.2.4 in MT. We can use \Delta V if we state that the drift condition (C–4) holds almost surely.

• We use different notations for the petite sets (A, B, and C) in Theorem C.1. For example, if Theorem C.1(iii) holds with petite set B, we cannot necessarily select petite set C = B in Theorem C.1(ii). The reason for choosing different notations for the rates r, \kappa, and s in Theorem C.1 is similar.
To prove Theorem C.1, we need some lemmas. Denote (see MT, (15.23))
$$U^{(r)}_C(x, f) = E_x\left[\sum_{k=1}^{\tau_C} f(X_k)r^k\right]$$
and (see MT, (15.29))
$$G^{(r)}_C(x, f) = E_x\left[\sum_{k=0}^{\sigma_C} f(X_k)r^k\right].$$
If (C–3) holds for a set C, then C is called an f-Kendall set of rate \kappa.
Lemma C.3.

(i) Let r > 1. Then E_x[r^{\tau_C}] < \infty \Leftrightarrow E_x[\sum_{n=1}^{\tau_C} r^n] < \infty \Leftrightarrow E_x[\sum_{n=0}^{\tau_C - 1} r^n] < \infty.

(ii) If \sup_C f(x) < \infty and \sup_B R^{(r)}_C(x, f) < \infty, then \sup_B f(x) < \infty, \sup_B E_x(r^{\tau_C}) < \infty, \sup_B U^{(r)}_C(x, f) < \infty, and \sup_B G^{(r)}_C(x, f) < \infty.

(iii) If C is f-Kendall of rate r, then \sup_C f(x) < \infty, \sup_C E_x(r^{\tau_C}) < \infty, \sup_C U^{(r)}_C(x, f) < \infty, and \sup_C G^{(r)}_C(x, f) < \infty.

Proof.

(i) Because E_x[\sum_{n=1}^{\tau_C} r^n] = \frac{r}{r-1}E_x(r^{\tau_C} - 1) and E_x[\sum_{n=0}^{\tau_C-1} r^n] = \frac{1}{r-1}E_x(r^{\tau_C} - 1), we have (i).

(ii) We have \sup_C f(x), \sup_B R^{(r)}_C(x, f) < M < \infty for some constant M. Since E_x[f(\Phi_0)r^0] = f(x), we get M > \sup_B f(x). Then E_x[f(\Phi_{\tau_C})r^{\tau_C}] \le M E_x(r^{\tau_C}). Because f \ge 1, we have \infty > \sup_B R^{(r)}_C(x, f) \ge \sup_B E_x[\sum_{n=0}^{\tau_C-1} r^n]. By part (i), \sup_B E_x(r^{\tau_C}) < \infty, so \sup_B E_x[f(\Phi_{\tau_C})r^{\tau_C}] < \infty, and then \sup_B U^{(r)}_C(x, f) < \infty. Finally, G^{(r)}_C(x, f) \le f(x) + U^{(r)}_C(x, f), so \sup_B G^{(r)}_C(x, f) < \infty.

(iii) We apply part (ii) with B = C.
Lemma C.4. (MT, Lemma 15.2.3 and Theorem 15.2.4)

(i) If 0 \le f(x) < \infty, then at x,
$$PG^{(r)}_C = r^{-1}G^{(r)}_C - r^{-1}I + r^{-1}I_C U^{(r)}_C.$$

(ii) Given 1 \le f < \infty, V_C(x) = G^{(r)}_C(x, f) satisfies
$$PV_C(x) = \begin{cases} r^{-1}V_C(x) - r^{-1}f(x) + r^{-1}U^{(r)}_C(x, f), & x \in C, \\ r^{-1}V_C(x) - r^{-1}f(x), & x \notin C, \end{cases}$$
V_C(x) = f(x) for x \in C, and V_C \ge f.

If C is an f-Kendall set of rate r, then V_C is a solution to (C–4) and is bounded on C. V_C is also bounded on B for any f-geometrically regular set B.

Proof.

(i): G^{(r)}_C = I + I_{C^c}U^{(r)}_C, so U^{(r)}_C = G^{(r)}_C - I + I_C U^{(r)}_C when f(x) < \infty. Also, U^{(r)}_C = \sum_{n=0}^\infty (PI_{C^c})^n P r^{n+1} = rPG^{(r)}_C. We finish the proof by combining the two formulas above.

(ii): Fix some 1 \le f < \infty and let V_C(x) = G^{(r)}_C(x, f); then
$$PV_C = r^{-1}V_C - r^{-1}I + r^{-1}I_C U^{(r)}_C.$$
We can prove that I_C U^{(r)}_C(x, f) = I_C(x)U^{(r)}_C(x, f), so
$$PV_C(x) = \begin{cases} r^{-1}V_C(x) - r^{-1}f(x) + r^{-1}U^{(r)}_C(x, f), & x \in C, \\ r^{-1}V_C(x) - r^{-1}f(x), & x \notin C. \end{cases}$$
Note that V_C(x) = f(x) for x \in C and V_C(x) \ge E_x[f(X_0)] = f(x).

Now suppose that C is an f-Kendall set of rate r. By Lemma C.3(iii), we have \sup_C f(x) < \infty and \sup_C U^{(r)}_C(x, f) < \infty. So V_C is a solution to (C–4) and is bounded on C. Because C is f-Kendall, C is regular, and then C \in \mathcal{F}^+. For any f-geometrically regular set B, \sup_B R^{(r)}_C(x, f) < \infty. By Lemma C.3(ii), V_C is bounded on B.
Proof of Theorem C.1. We will prove that (ii)⇒(iii)⇒(iv)⇒(ii) and (ii)⇒(v)⇒(i)⇒(ii).

(ii)⇒(iii): This is exactly Lemma C.4(ii).

(iii)⇒(iv): The proof is from Theorem 15.2.6 in MT.

(iv)⇒(ii): There exists a full and absorbing set S which is covered by f-geometrically regular sets \{S_n\}. There exists an f-geometrically regular set C = S_{n_0} \in \mathcal{F}^+, so it is a petite f-Kendall set.

(ii)⇒(v): The proof is from Theorem 15.4.1 in MT.

(v)⇒(i): By Theorem 5.2.2 in MT, there exists a small set C \in \mathcal{F}^+. \Phi is \pi-f-geometrically ergodic of rate r, so \sum_{n=0}^\infty r^n\|P^n(x, \cdot) - \pi\|_f < \infty \pi-a.s., and hence C \cap \{x : \sum_{n=0}^\infty r^n\|P^n(x, \cdot) - \pi\|_f < \infty\} \in \mathcal{F}^+. Since
$$C \cap \left\{x : \sum_{n=0}^\infty r^n\|P^n(x, \cdot) - \pi\|_f < M\right\} \uparrow C \cap \left\{x : \sum_{n=0}^\infty r^n\|P^n(x, \cdot) - \pi\|_f < \infty\right\}$$
as M \to \infty, there exists a finite constant M such that A := C \cap \{x : \sum_{n=0}^\infty r^n\|P^n(x, \cdot) - \pi\|_f < M\} \in \mathcal{F}^+. C is small, so A is small. On A, we have (C–1)–(C–2) with c_1 = \pi f and c_2 = \pi(A).
(i)⇒(ii): The proof is quite similar to the proof of Theorem 15.4.2 in MT. Given a constant c \in \mathbb{R}, we also denote by c the constant sequence (c, c, \ldots). For a sequence a = \{a_i\}_{i=0}^\infty, we denote a - c = \{a_i - c\}_{i=0}^\infty, so (a - c)(i) = a_i - c. From Chapters 13 and 14 in MT, we can show that c_1 = \pi f and c_2 = \pi(A).

First we consider the case where A = \alpha is an atom. Condition (i) is equivalent to
$$r^n|P^n(\alpha, f) - \pi f| < M \quad \text{and} \quad r^n|P^n(\alpha, \alpha) - \pi(\alpha)| < M. \tag{C–6}$$
From formula (13.45) in MT, P^n(\alpha, f) = [u * t_f](n), where u(n) = P^n(\alpha, \alpha) and t_f(n) = E_\alpha[f(\Phi_n)\mathbf{1}_{\tau_\alpha \ge n}]. From formula (13.50) in MT,
$$\pi f = \pi(\alpha)\sum_{i=1}^\infty t_f(i) = \pi(\alpha)\sum_{i=1}^n t_f(i) + \pi(\alpha)\sum_{i=n+1}^\infty t_f(i) = [\pi(\alpha) * t_f](n) + \pi(\alpha)\sum_{i=n+1}^\infty t_f(i),$$
because \pi(\alpha) is a constant sequence. Note that t_f is a non-negative sequence and \sum_{i=1}^n t_f(i) < \infty. We have
$$P^n(\alpha, f) - \pi f = [(u - \pi(\alpha)) * t_f](n) - \pi(\alpha)\sum_{i=n+1}^\infty t_f(i).$$
Fix any 1 < s < r. Then
$$\sum_{n=0}^N s^n[P^n(\alpha, f) - \pi f] = \sum_{n=0}^N \left\{[(u - \pi(\alpha)) * t_f](n)s^n\right\} - \pi(\alpha)\sum_{n=0}^N \left[s^n\sum_{i=n+1}^\infty t_f(i)\right].$$
Using the inequality |x − y| ≥ |x| − |y|, we get

|∑_{n=0}^N s^n [P^n(α, f) − πf]| ≥ |π(α) ∑_{n=0}^N [s^n ∑_{i=n+1}^∞ t_f(i)]| − |∑_{n=0}^N [(u − π(α)) ∗ t_f](n) s^n| = S_{N,1}(s) − S_{N,2}(s),

where S_{N,1}(s) and S_{N,2}(s) denote the first and second absolute values, respectively. Given non-negative sequences {a_n} and {b_n}, denote by a ∗ b the convolution of the two sequences. We have

∑_{n=0}^N (a ∗ b)(n) s^n = ∑_{n=0}^N ∑_{k=0}^n (a_k s^k)(b_{n−k} s^{n−k})
   = ∑_{k=0}^N [(a_k s^k) ∑_{n=k}^N b_{n−k} s^{n−k}]
   ≤ ∑_{k=0}^N (a_k s^k) ∑_{n=0}^N (b_n s^n).
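The convolution bound above is elementary, and it is easy to sanity-check numerically. The following sketch (not part of the original argument; the sequences a, b and the weight s are arbitrary non-negative examples) verifies that the weighted partial sum of a convolution is dominated by the product of the weighted partial sums:

```python
# Numerical sanity check of
#   sum_{n=0}^N (a*b)(n) s^n  <=  (sum_{k=0}^N a_k s^k)(sum_{n=0}^N b_n s^n)
# for non-negative sequences a, b.

def weighted_conv_sum(a, b, s, N):
    """Left-hand side, with (a*b)(n) = sum_{k=0}^n a_k b_{n-k}."""
    return sum(
        sum(a[k] * b[n - k] for k in range(n + 1)) * s**n
        for n in range(N + 1)
    )

def weighted_sum(c, s, N):
    """sum_{n=0}^N c_n s^n."""
    return sum(c[n] * s**n for n in range(N + 1))

N = 20
s = 1.3
a = [1.0 / (n + 1) for n in range(N + 1)]   # arbitrary non-negative sequence
b = [0.5**n for n in range(N + 1)]          # arbitrary non-negative sequence

lhs = weighted_conv_sum(a, b, s, N)
rhs = weighted_sum(a, s, N) * weighted_sum(b, s, N)
assert lhs <= rhs
```

The inequality holds because extending the inner sum from n = k, …, N to n = 0, …, N only adds non-negative terms.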
Applying the above formula, we have

S_{N,2}(s) = |∑_{n=0}^N [(u − π(α)) ∗ t_f](n) s^n| ≤ ∑_{n=0}^N [|u − π(α)| ∗ t_f](n) s^n
   ≤ ∑_{n=0}^N t_f(n) s^n ∑_{k=0}^N |u − π(α)|(k) s^k = c_N(s) d_N(s),

where c_N(s) = ∑_{n=0}^N t_f(n) s^n and d_N(s) = ∑_{k=0}^N |u − π(α)|(k) s^k. By (C–6),

d_N(s) ≤ d(s) := ∑_{k=0}^∞ |u − π(α)|(k) s^k = ∑_{k=0}^∞ |P^k(α, α) − π(α)| s^k < ∞,

and hence S_{N,2}(s) ≤ c_N(s) d(s).
Given any non-negative sequence {a_n},

∑_{n=0}^N [s^n ∑_{i=n+1}^∞ a_i] ≥ ∑_{n=0}^N [s^n ∑_{i=n+1}^N a_i] = ∑_{i=1}^N a_i ∑_{n=0}^{i−1} s^n = (s − 1)^{−1} ∑_{i=1}^N a_i (s^i − 1)
   = (s − 1)^{−1} [∑_{i=1}^N a_i s^i − ∑_{i=1}^N a_i] ≥ (s − 1)^{−1} [∑_{i=1}^N a_i s^i − ∑_{i=1}^∞ a_i].
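This tail-sum inequality can also be checked numerically. The sketch below (an illustration not in the original text) uses the geometric sequence a_i = q^i, for which the infinite tails ∑_{i=n+1}^∞ a_i = q^{n+1}/(1 − q) are available in closed form:

```python
# Check: sum_{n=0}^N s^n sum_{i=n+1}^inf a_i
#        >= (s-1)^{-1} [ sum_{i=1}^N a_i s^i - sum_{i=1}^inf a_i ]
# for the geometric sequence a_i = q^i (0 < q < 1), whose tails are exact.

s, q, N = 1.2, 0.6, 30

# Left-hand side: tail sum_{i=n+1}^inf q^i = q^{n+1} / (1 - q).
lhs = sum(s**n * q**(n + 1) / (1.0 - q) for n in range(N + 1))

# Right-hand side: sum_{i=1}^inf q^i = q / (1 - q).
rhs = (sum(q**i * s**i for i in range(1, N + 1)) - q / (1.0 - q)) / (s - 1.0)

assert lhs >= rhs
```

Two non-negative quantities are dropped in the derivation (the tails beyond N, and the extra infinite tail subtracted at the end), which is why the left side dominates.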
Setting a_i = t_f(i) and noting that πf = π(α) ∑_{i=1}^∞ t_f(i), we get

S_{N,1}(s) = π(α) ∑_{n=0}^N [s^n ∑_{i=n+1}^∞ t_f(i)] ≥ (s − 1)^{−1} π(α) [c_N(s) − πf/π(α)].

Suppose that c_N(s) → ∞ as N → ∞ for every fixed s with 1 < s < r. Then c_N(s)/2 > πf/π(α) for N > N(s), and hence S_{N,1}(s) ≥ 2^{−1}(s − 1)^{−1} π(α) c_N(s) for N > N(s). It follows that

S_{N,1}(s) − S_{N,2}(s) ≥ 2^{−1}(s − 1)^{−1} π(α) c_N(s) − c_N(s) d(s) = c_N(s) [2^{−1} π(α)(s − 1)^{−1} − d(s)],  N > N(s).

Since π(α) > 0, we have (s − 1)^{−1} π(α) → +∞ as s ↓ 1. Since d(s) < ∞, we can select 1 < s_0 < r close enough to 1 that 2^{−1} π(α)(s_0 − 1)^{−1} > 2 d(s_0). Then S_{N,1}(s_0) − S_{N,2}(s_0) ≥ c_N(s_0) d(s_0) for N > N(s_0). Since c_N(s_0) → ∞, we get S_{N,1}(s_0) − S_{N,2}(s_0) → ∞, and then

|∑_{n=0}^N s_0^n [P^n(α, f) − πf]| → ∞.

This contradicts (C–6), since r^n |P^n(α, f) − πf| < M implies ∑_{n=0}^∞ s_0^n |P^n(α, f) − πf| ≤ M ∑_{n=0}^∞ (s_0/r)^n < ∞. Hence c_N(s) must converge to a finite limit for some 1 < s_1 < r. By formula (13.43) in MT,

c_N(s_1) → ∑_{n=0}^∞ t_f(n) s_1^n = E_α[∑_{k=0}^{τ_α} f(Φ_k) s_1^k],

so α is f-Kendall. Since α is an atom, α is petite.
Now consider the case where Φ is strongly aperiodic with minorization measure ν. The split chain Φ̌ has the accessible atom α̌ := A_1, and r^n |P̌^n(α̌, f) − π*f| < M for all n, where P̌ is the kernel of Φ̌ (see, e.g., MT, pp. 106–108). We have

r^n |P̌^n(α̌, f) − π*f| = r^n |νP^{n−1}f − πf|,  (P̌^n(α̌, f) = νP^{n−1}f and π* = π)
   = r^n |∫_C ν(dx) [P^{n−1}f(x) − πf]|,  (ν is a probability measure)
   ≤ r^n ∫_C ν(dx) |P^{n−1}f(x) − πf|
   ≤ r ∫_C ν(dx) M = Mr.

Doing the same with f replaced by 1_{α̌}, we have

r^n |P̌^n(α̌, α̌) − π*(α̌)| ≤ Mr.
From the atom case, α̌ is f-Kendall for the split chain. Since α̌ is an atom, α̌ is petite. So X̌ is almost everywhere covered by f-geometrically regular sets {S_n}. Since A_0 ∈ F+, there exists n_0 such that C_0 = A_0 ∩ S_{n_0} is in F+. Now C_0 ⊂ S_{n_0}, C_1 ⊂ α̌, and S_{n_0} and α̌ are f-Kendall, so C is f-Kendall. We can show that

_C P^n(x, f) = δ*_x(x_0) _C P^n(x_0, f) + δ*_x(x_1) _C P^n(x_1, f),

so

U_C^{(r)}(x, f) = δ*_x(x_0) U_C^{(r)}(x_0, f) + δ*_x(x_1) U_C^{(r)}(x_1, f).

From the last formula, C is f-Kendall and C ∈ F+. Since C ⊂ A, C is a small set.
Finally, consider the case where Φ is aperiodic, so that Φ^m is strongly aperiodic for some m. From the strongly aperiodic case, we can find an f-Kendall small set C ∈ F+ for Φ^m. Using the proof of Theorem 15.3.6 in MT, C is f-Kendall for Φ.

We have proved that conditions (i)–(v) are mutually equivalent. By Theorem 14.3.7 in MT, we have πV < ∞. The remaining claims are easy to check.

While Theorem C.1 concerns π-f-geometric ergodicity, the next theorem concerns f-geometric ergodicity. It adds further conclusions to Theorem 15.3.3 in MT.
Theorem C.5. (MT, Theorem 15.3.3) Suppose that Φ is ψ-irreducible. Given an F-measurable function f : X → [1, ∞), the following three conditions are equivalent:

(i) There exist a petite set C ∈ F and a constant κ > 1 such that

sup_{x∈C} R_C^{(κ)}(x, f) = sup_{x∈C} E_x[∑_{k=0}^{τ_C−1} f(Φ_k) κ^k] < ∞

and R_C^{(κ)}(x, f) < ∞ for all x ∈ X.

(ii) There exist a petite set B, constants r > 1 and b < ∞, and a finite measurable function V : X → [1, ∞) with V ≥ f, such that

PV ≤ r^{−1}V + b1_B.

(iii) X is covered by a union of a countable number of f-geometrically regular sets.

If Φ is also aperiodic and recurrent, any of these three conditions implies that Φ is Harris recurrent and f-geometrically ergodic.
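The quantity R_C^{(κ)}(x, f) in condition (i) can be computed exactly on a small discrete chain, since it satisfies the linear system R(x) = f(x) + κ ∑_{y∉C} P(x, y) R(y). The following sketch (the four-state birth-death chain, the choice C = {0}, f ≡ 1, and κ = 1.05 are all hypothetical illustrations, not from the original text) solves this system by fixed-point iteration, which converges here because the chain drifts strongly toward C:

```python
# Toy 4-state chain on {0,1,2,3} with downward drift; C = {0}, f ≡ 1, κ = 1.05.
# R_C^{(κ)}(x, f) = E_x[ sum_{k=0}^{τ_C - 1} f(Φ_k) κ^k ] solves
#   R(x) = f(x) + κ * sum_{y ∉ C} P(x, y) R(y).
P = [
    [0.7, 0.3, 0.0, 0.0],
    [0.7, 0.0, 0.3, 0.0],
    [0.0, 0.7, 0.0, 0.3],
    [0.0, 0.0, 0.7, 0.3],
]
C = {0}
f = [1.0, 1.0, 1.0, 1.0]
kappa = 1.05

R = f[:]
for _ in range(5000):
    R_new = [
        f[x] + kappa * sum(P[x][y] * R[y] for y in range(4) if y not in C)
        for x in range(4)
    ]
    delta = max(abs(a - b) for a, b in zip(R_new, R))
    R = R_new
    if delta < 1e-12:
        break

# R is finite everywhere and bounded on C, illustrating condition (i).
assert delta < 1e-12
assert all(r >= 1.0 for r in R)
```

States farther from C accumulate longer weighted excursions, so R increases away from C.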
Remark C.6. We note some differences between Theorem C.1 and Theorem C.5.

• The difference between Theorem C.1 and Theorem C.5 is analogous to that between Theorem 11.0.1 and Theorem 11.3.15 in MT: one result is true almost surely, while the other is true on the whole state space.

• f-geometric ergodicity is weaker than each condition in Theorem C.5; an analogous result is Theorem 14.3.3(iii) in MT. Also note that regularity is a stronger condition than Harris positivity (this can be proved), and Harris positivity is a stronger condition than ergodicity (MT, Theorem 13.3.1).
Proof of Theorem C.5. We will prove that (i)⇒(ii)⇒(iii)⇒(i).

(i)⇒(ii): As in the proof of Theorem C.1, let V = G_C^{(r)}(·, f) with r = κ. We only need to prove that V is finite. Since V = f on C, V(x) < ∞ for x ∈ C. For x ∉ C, we have τ_C = σ_C, so V = R_C^{(r)}(·, f) + E_x[f(Φ_{τ_C}) r^{τ_C}]. We now prove that E_x[f(Φ_{τ_C}) r^{τ_C}] < ∞. From Lemma C.3(iii), C is f-Kendall, so sup_{x∈C} f(x) < ∞. Then E_x[f(Φ_{τ_C}) r^{τ_C}] ≤ [sup_{x∈C} f(x)] E_x[r^{τ_C}]. We have ∞ > R_C^{(r)}(x, f) ≥ E_x[∑_{k=0}^{τ_C−1} r^k], so from Lemma C.3(i) we get E_x[r^{τ_C}] < ∞.

(ii)⇒(iii): By Theorem 15.2.6 in MT, {V < n} is f-geometrically regular for n > 0. Because V is finite, the sets {V < n}, n ∈ N, cover X.

(iii)⇒(i): We can repeat the corresponding proof in Theorem C.1 here.

If Φ is also aperiodic and recurrent, then by Theorem 15.4.1 in MT, (ii) implies that Φ is f-geometrically ergodic, and by Theorem 11.3.4 in MT, Φ is Harris recurrent.
On a discrete state space, a property that holds π-a.s. holds on the whole state space; for example, every recurrent chain on a discrete state space is Harris recurrent. For the remainder of this section, we focus on the equivalence between f-geometric ergodicity and the drift condition. While f-geometric ergodicity is weaker than the drift condition of Theorem C.5 in general, the two are equivalent on a discrete state space.
Theorem C.7. Let Φ be an irreducible, aperiodic, and recurrent chain with state space X and transition matrix P. Then Φ is f-geometrically ergodic if and only if there exist a petite set B, constants r > 1 and b < ∞, and a finite function V ≥ f on X such that

PV ≤ r^{−1}V + b1_B.
Proof. By Theorem C.1, the drift condition holds if and only if Φ is π-f-geometrically ergodic, with V finite almost everywhere. Because the state space is discrete, Φ is then f-geometrically ergodic and V is finite everywhere.
Using Theorem C.7, we can prove the next theorem (replacing r^{−1} by 1 − ε).

Theorem C.8. (Popov, 1977) Let Φ be an irreducible, aperiodic, and recurrent chain with state space X and transition matrix P. Fix any finite non-empty subset C of the state space. Then Φ is geometrically ergodic if and only if there exist constants b < ∞ and ε > 0 and a finite function V ≥ 1 on X such that

PV ≤ (1 − ε)V + b1_C.
Remark C.9. We note some differences between Theorem C.7 and Theorem C.8.

• Theorem C.7 covers f-geometric ergodicity; Theorem C.8 covers geometric ergodicity only.

• In Theorem C.7, B may be any petite set; in Theorem C.8, we need a finite set. Every finite set is petite by Lemma C.10(i), but a petite set need not be finite. For example, if P_{ij} = π_j for all i, then every set is small.
To prove Theorem C.8 using Theorem C.7, we need the following lemma.

Lemma C.10. Let Φ be an irreducible and aperiodic chain with discrete state space and transition matrix P. Then

(i) every finite set is small;

(ii) if Φ is also recurrent and geometrically ergodic, every finite set is geometrically regular.

Proof.

(i) Suppose that C is finite, and fix any state j. The chain is irreducible and aperiodic, so for each i ∈ C there exists N_i such that P^n(i, j) > 0 for all n ≥ N_i. Let N = max_{i∈C} N_i. Define ν by ν(j) = min_{i∈C} P^N(i, j) > 0 and ν(i') = 0 for i' ≠ j. Let B be any subset of the state space. If j ∉ B, then ν(B) = 0 ≤ P^N(i, B). If j ∈ B and i ∈ C, then P^N(i, B) ≥ P^N(i, j) ≥ ν(j) = ν(B). So C is small.
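The construction in the proof of (i) can be carried out explicitly on a small example. The sketch below (the 3-state transition matrix and the reference state j = 0 are hypothetical choices, not from the original text) finds N with P^N(i, j) > 0 for every i, builds ν, and checks the minorization P^N(i, B) ≥ ν(B) via the sufficient pointwise condition P^N(i, k) ≥ ν(k):

```python
# Construct the minorization measure ν from the proof of Lemma C.10(i)
# for a toy 3-state irreducible aperiodic chain; C is the whole space.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]
C = [0, 1, 2]   # the finite set
j = 0           # reference state

# Find N such that P^N(i, j) > 0 for every i in C.
PN = P
N = 1
while not all(PN[i][j] > 0 for i in C):
    PN = mat_mul(PN, P)
    N += 1

nu = [0.0, 0.0, 0.0]
nu[j] = min(PN[i][j] for i in C)   # ν(j) = min_{i∈C} P^N(i, j)

# P^N(i, B) >= ν(B) for every B follows from the pointwise bound:
assert nu[j] > 0
assert all(PN[i][k] >= nu[k] for i in C for k in range(3))
```

For this matrix, P(1, 0) = 0 forces N = 2, and ν(0) = min_i P²(i, 0) = 0.25.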
(ii) Suppose that C is finite. Since Φ is geometrically ergodic, by Theorem C.7 we can cover X by countably many geometrically regular sets {C_i}. For each state i_0 ∈ C, there exists a geometrically regular set C_{i_0} with i_0 ∈ C_{i_0}. From the definition of geometric regularity, a finite union of geometrically regular sets is geometrically regular, and a subset of a geometrically regular set is geometrically regular. Hence ∪_{i_0∈C} C_{i_0} is geometrically regular, and its subset C is geometrically regular.
Proof of Theorem C.8. If Φ is geometrically ergodic, then C is geometrically regular by Lemma C.10(ii). By Theorem 15.2.1 in MT, C is a petite Kendall set, so by Theorem C.7 there exists a drift condition for C.

Conversely, suppose that the drift condition holds for C and some V. Then C is small by Lemma C.10(i), so Φ is geometrically ergodic by Theorem C.7.
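As an illustration of Popov's drift condition (this example is not from the original text), consider the reflecting random walk on the non-negative integers that moves up with probability p and down with probability q = 1 − p from x ≥ 1, and from 0 moves to 1 with probability p or stays with probability q. With p < 1/2 the chain is geometrically ergodic, and the standard geometric Lyapunov function V(x) = z^x with z = √(q/p) exhibits PV ≤ (1 − ε)V + b1_C with C = {0}:

```python
import math

p = 0.3                 # upward probability; p < 1/2 gives downward drift
q = 1.0 - p
z = math.sqrt(q / p)    # V(x) = z**x

def PV_over_V(x):
    """(PV)(x) / V(x) for the reflecting walk."""
    if x == 0:
        return p * z + q     # from 0: up to 1 w.p. p, stay w.p. q; V(0) = 1
    return p * z + q / z     # from x >= 1: up w.p. p, down w.p. q

# Off C = {0}, the drift ratio is the constant 2*sqrt(p*q) < 1:
rho = p * z + q / z
assert abs(rho - 2.0 * math.sqrt(p * q)) < 1e-12
eps = 1.0 - rho
assert eps > 0

# On C = {0}, the excess is absorbed by the constant b:
b = PV_over_V(0) - (1.0 - eps)   # smallest b with PV(0) <= (1-eps)V(0) + b
assert b > 0
assert all(PV_over_V(x) <= (1.0 - eps) + 1e-12 for x in range(1, 50))
```

The choice z = √(q/p) minimizes the off-C drift ratio pz + q/z, giving the best geometric rate 2√(pq) available from this family of test functions.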
REFERENCES
AKHIEZER, N. I. and GLAZMAN, I. M. (1993). Theory of linear operators in Hilbert space. Dover Publications Inc.

BILLINGSLEY, P. (1995). Probability and measure. 3rd ed. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York.

BRAMSON, M. (2008). Stability of queueing networks, vol. 1950 of Lecture Notes in Mathematics. Springer, Berlin.

BROOKS, S., GELMAN, A., JONES, G. and MENG, X.-L. (eds.) (2011). Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC Press.

CHAN, K. S. and GEYER, C. J. (1994). Discussion: Markov chains for exploring posterior distributions. The Annals of Statistics, 22 1747–1758.

CHEN, M. F. (1992). From Markov chains to nonequilibrium particle systems. World Scientific Publishing Co., Inc., River Edge, NJ.

CHEN, M.-F. (2004). From Markov chains to non-equilibrium particle systems. 2nd ed. World Scientific Publishing Co., Inc., River Edge, NJ.

CHOI, H. M. and HOBERT, J. P. (2013). Analysis of MCMC algorithms for Bayesian linear regression with Laplace errors. Journal of Multivariate Analysis, 117 32–40.

CONWAY, J. B. (1990). A course in functional analysis, vol. 96 of Graduate Texts in Mathematics. 2nd ed. Springer-Verlag, New York.

DIACONIS, P., KHARE, K. and SALOFF-COSTE, L. (2008). Gibbs sampling, exponential families and orthogonal polynomials. Statistical Science, 23 151–178.

DIACONIS, P. and STROOCK, D. (1991). Geometric bounds for eigenvalues of Markov chains. The Annals of Applied Probability, 1 36–61.

DIEBOLT, J. and ROBERT, C. P. (1994). Estimation of finite mixture distributions by Bayesian sampling. Journal of the Royal Statistical Society, Series B, 56 363–375.

FLEGAL, J. M., HARAN, M. and JONES, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23 250–260.

HALMOS, P. R. (1950). Measure Theory. D. Van Nostrand Company, Inc., New York, N.Y.

HOBERT, J. P. and CASELLA, G. (1996). The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. Journal of the American Statistical Association, 91 1461–1473.

HOBERT, J. P. and GEYER, C. J. (1998). Geometric ergodicity of Gibbs and block Gibbs samplers for a hierarchical random effects model. Journal of Multivariate Analysis, 67 414–430.

HOBERT, J. P. and KHARE, K. (2014). Computable upper bounds on the distance to stationarity for Jovanovski and Madras's Gibbs sampler. Tech. rep., University of Florida.

HOBERT, J. P., ROY, V. and ROBERT, C. P. (2011). Improving the convergence properties of the data augmentation algorithm with an application to Bayesian mixture modeling. Statistical Science, 26 332–351.

JARNER, S. F. and HANSEN, E. (2000). Geometric ergodicity of Metropolis algorithms. Stochastic Processes and their Applications, 85 341–361.

JARNER, S. F. and TWEEDIE, R. L. (2003). Necessary conditions for geometric and polynomial ergodicity of random-walk-type Markov chains. Bernoulli, 9 559–578.

JONES, G. L. and HOBERT, J. P. (2001). Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statistical Science, 16 312–334.

JOVANOVSKI, O. and MADRAS, N. (2014). Convergence rates for hierarchical Gibbs samplers. Tech. rep., York University. arXiv:1402.4733.

JUNG, Y. J. and HOBERT, J. P. (2014). Spectral properties of MCMC algorithms for Bayesian linear regression with generalized hyperbolic errors. Statistics & Probability Letters, 95 92–100.

KADISON, R. V. and RINGROSE, J. R. (1997). Fundamentals of the theory of operator algebras. Vol. I, vol. 15 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI.

KARLIN, S. and MCGREGOR, J. (1959). Random walks. Illinois Journal of Mathematics, 3 66–81.

KARLIN, S. and TAYLOR, H. M. (1975). A First Course in Stochastic Processes. Academic Press.

KHARE, K. and HOBERT, J. P. (2011). A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. The Annals of Statistics, 39 2585–2606.

KNOPP, K. (1951). Theory and Application of Infinite Series. Courier Corporation.

KOVCHEGOV, Y. (2010). Orthogonality and probability: mixing times. Electronic Communications in Probability, 15 59–67.

ŁATUSZYŃSKI, K., ROBERTS, G. and ROSENTHAL, J. (2013). Adaptive Gibbs samplers and related MCMC methods. The Annals of Applied Probability 66–98.

LEONI, G. (2009). A First Course in Sobolev Spaces. American Mathematical Soc.

LI, B. (2003). Real operator algebras. World Scientific Publishing Co. Inc., River Edge, NJ.

LIU, J. S., WONG, W. H. and KONG, A. (1994). Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika, 81 27–40.

LIU, J. S., WONG, W. H. and KONG, A. (1995). Covariance structure and convergence rate of the Gibbs sampler with various scans. Journal of the Royal Statistical Society, Series B, 57 157–169.

MAO, Y., TAI, Y., ZHAO, Y. Q. and ZOU, J. (2012). Ergodicity for the GI/G/1-type Markov chain. ArXiv e-prints. arXiv:1208.5225.

MAO, Y.-H. (2010). Convergence rates for reversible Markov chains without the assumption of nonnegative definite matrices. Science China Mathematics, 53 1979–1988.

MAO, Y.-H. and ZHANG, Y.-H. (2004). Exponential ergodicity for single-birth processes. Journal of Applied Probability, 41 1022–1032.

MEYN, S. and TWEEDIE, R. L. (2009). Markov chains and stochastic stability. 2nd ed. Cambridge University Press, Cambridge.

NUMMELIN, E. (1984). General irreducible Markov chains and nonnegative operators, vol. 83 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge.

PAL, S., KHARE, K. and HOBERT, J. P. (2015). Trace class Markov chains for Bayesian inference with generalized double Pareto shrinkage prior. Tech. rep., University of Florida.

POPOV, N. N. (1977). Geometric ergodicity conditions for countable Markov chains. Doklady Akademii Nauk SSSR, 234 316–319.

RINGROSE, J. R. (1971). Compact non-self-adjoint operators. Van Nostrand Reinhold Co., London, New York.

ROBERTS, G. O. and ROSENTHAL, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability, 2, no. 2, 13–25 (electronic).

ROBERTS, G. O. and ROSENTHAL, J. S. (1998). Markov chain Monte Carlo: Some practical implications of theoretical results (with discussion). Canadian Journal of Statistics, 26 5–31.

ROBERTS, G. O. and ROSENTHAL, J. S. (2004). General state space Markov chains and MCMC algorithms. Probability Surveys, 1 20–71 (electronic).

ROBERTS, G. O. and TWEEDIE, R. L. (2001). Geometric L2 and L1 convergence are equivalent for reversible Markov chains. Journal of Applied Probability, 38A 37–41.

SEARLE, S. R., CASELLA, G. and MCCULLOCH, C. E. (1992). Variance components. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, Inc., New York.

TAN, A. (2009). Convergence Rates and Regeneration of the Block Gibbs Sampler for Bayesian Random Effects Models. Ph.D. thesis, Department of Statistics, University of Florida.

TAN, A. and HOBERT, J. P. (2009). Block Gibbs sampling for Bayesian random effects models with improper priors: convergence and regeneration. Journal of Computational and Graphical Statistics, 18 861–878.

TAN, A., JONES, G. L. and HOBERT, J. P. (2013). On the geometric ergodicity of two-variable Gibbs samplers. In Advances in Modern Statistical Theory and Applications: A Festschrift in Honor of Morris L. Eaton (G. L. Jones and X. Shen, eds.), vol. 10 of IMS Collections Ser. IMS, Beachwood, OH, 25–42.

VAN DOORN, E. A. and SCHRIJNER, P. (1995). Geometric ergodicity and quasi-stationarity in discrete-time birth-death processes. Journal of the Australian Mathematical Society, Series B: Applied Mathematics, 37 121–144.

WINKELBAUER, A. (2012). Moments and absolute moments of the normal distribution. ArXiv e-prints. arXiv:1209.4340.
BIOGRAPHICAL SKETCH
Trung Ha was born in 1981 in Hanoi, Vietnam. He was recruited to the gifted mathematics program of Be Van Dan elementary and middle school in 1990 and to the mathematics program of HUS High School for Gifted Students, Vietnam National University, Hanoi in 1996. In 1999, he was admitted to the mathematics program of the Center of Training of Talented Engineers, Hanoi University of Technology. In 2004, he earned his bachelor's degree in applied mathematics and informatics. In 2005, he began working as a researcher at the Department of Probability and Statistics, Hanoi Institute of Mathematics. He received a VEF fellowship to attend the Department of Statistics, University of Florida in 2007. He received his master's degree in 2013 and his doctoral degree in 2016.