RANDOM WALKS
ARIEL YADIN
Course: 201.1.8031 Spring 2016
Lecture notes updated: May 2, 2016
Contents
Lecture 1. Introduction
Lecture 2. Markov Chains
Lecture 3. Recurrence and Transience
Lecture 4. Stationary Distributions
Lecture 5. Positive Recurrent Chains
Lecture 6. Convergence to Equilibrium
Lecture 7. Conditional Expectation
Lecture 8. Martingales
Lecture 9. Reversible Chains
Lecture 10. Discrete Analysis
Lecture 11. Networks
Lecture 12. Network Reduction
Lecture 13. Thomson's Principle
Lecture 14. Nash-Williams
Lecture 15. Flows
Lecture 16. Resistance in Euclidean Lattices
Lecture 17. Spectral Analysis
Lecture 18. Kesten's Amenability Criterion
Lecture 19.
Lecture 20.
Lecture 21.
Lecture 22.
Number of exercises in lecture: 0
Total number of exercises until here: 0
Lecture 1: Introduction
1.1. Overview
In this course we will study the behavior of random processes; that is, processes that evolve
in time with some randomness, or probability measure, governing the evolution.
Let us give some examples:
• A gambler playing roulette.
• A drunk man walking in some city.
• A drunk bird flying in the sky.
• The evolution of a certain family name.
Some questions which we will be able to (hopefully) answer by the end of the course:
• Suppose a gambler starts with N Shekel. What is the probability that the gambler will
earn another N Shekel before losing all of the money?
• How long will it take for a drunk man walking to reach either his house or the city limits?
• Suppose a chess knight moves randomly on a chess board. Will the knight eventually
return to the starting point? What is the expected number of steps until the knight
returns?
• Suppose that men of the Rothschild family have three children on average. What is the
probability that the Rothschild name will still be alive in another 100 years? Is there
positive probability for the Rothschild name to survive forever?
1.2. Random Walks on Z
We will start with a "soft" example, and then go into the deeper and more precise theory.
What is a random walk? A (simple) random walk on a graph is a process, or a sequence of
vertices, such that at every step the next vertex is chosen uniformly among the neighbors of the
current vertex, each step of the walk independently.
X Story about Polya meeting a couple in the woods.
George Polya (1887-1985)
Figure 1. Path of a drunk man walking in the streets.
Figure 2. Path of a drunk bird flying around.
Now, suppose we want to perform a random walk on $\mathbb{Z}$. If the "walker" is at a vertex $z$, then a uniformly chosen neighbor is $z+1$ or $z-1$, each with probability $1/2$.
That is, we can model a random walk on $\mathbb{Z}$ by considering an i.i.d. sequence $(X_k)_{k=1}^{\infty}$, where $X_k$ is uniform on $\{-1,+1\}$, and the walk will be $S_t = \sum_{k=1}^{t} X_k$. So $X_k$ is the $k$-th step of the walk, and $S_t$ is the position after $t$ steps.
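This construction is easy to simulate. The following minimal Python sketch (the function name is ours, not from the notes) draws the i.i.d. $\pm 1$ steps and accumulates the partial sums:

```python
import random

def simple_random_walk(t, seed=None):
    """Return (S_0, S_1, ..., S_t): partial sums of i.i.d. uniform {-1,+1} steps."""
    rng = random.Random(seed)
    s, path = 0, [0]
    for _ in range(t):
        s += rng.choice((-1, 1))  # the step X_k
        path.append(s)            # the position S_k
    return path

path = simple_random_walk(1000, seed=0)
```

Note that consecutive positions differ by exactly $1$, and $S_k$ has the same parity as $k$, which is the parity observation used below.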
Let us consider a few properties of the random walk on Z:
First let us calculate the expected number of visits to 0 by time t:
• Proposition 1.1. Let $(S_t)_t$ be a random walk on $\mathbb{Z}$. Denote by $V_t$ the number of visits to $0$ up to time $t$; that is,
$$V_t = \#\{1 \le k \le t : S_k = 0\}.$$
Then, there exists a constant $c > 0$ such that for all $t$,
$$\mathbb{E}[V_t] \ge c\sqrt{t}.$$
Proof. An inequality we will use is Stirling's approximation of $n!$:
$$\sqrt{2\pi n}\,(n/e)^n\, e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\,(n/e)^n\, e^{\frac{1}{12n}}.$$
This leads by a bit of careful computation to:
$$\frac{1}{\sqrt{\pi n}}\cdot 2^{2n}\exp\Big(-\frac{1}{12n+1}\Big) < \binom{2n}{n} < \frac{1}{\sqrt{\pi n}}\cdot 2^{2n}\exp\Big(\frac{1}{12n}\Big).$$
Specifically,
$$\frac{1}{2} < \sqrt{\pi n}\cdot 2^{-2n}\binom{2n}{n} < 2.$$
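This two-sided bound on the central binomial coefficient is easy to sanity-check numerically. A quick sketch (ours, not part of the notes), using exact integer binomials:

```python
from math import comb, pi, sqrt

# Verify 1/2 < sqrt(pi*n) * 2^(-2n) * binom(2n, n) < 2 for a range of n.
for n in range(1, 300):
    ratio = sqrt(pi * n) * comb(2 * n, n) / 4 ** n
    assert 0.5 < ratio < 2, (n, ratio)
print("bound holds for n = 1, ..., 299")
```

In fact the ratio increases toward $1$ as $n$ grows, consistent with the sharper exponential bounds above.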
James Stirling (1692-1770)
Now, what is the probability $\mathbb{P}[S_k = 0]$? Note that there are $k$ steps, so for $S_k = 0$ we need that the number of $+1$ steps equals the number of $-1$ steps. Rigorously, if
$$R_t = \#\{1 \le k \le t : X_k = 1\} \quad\text{and}\quad L_t = \#\{1 \le k \le t : X_k = -1\},$$
then $R_t + L_t = t$. Moreover, the distribution of $R_t$ is $\mathrm{Bin}(t, 1/2)$. Also, $S_t = R_t - L_t$, so for $S_t = 0$ we need that $R_t = L_t = t/2$. This is only possible for even $t$, and we get
$$\mathbb{P}[S_{2k} = 0] = \mathbb{P}[R_{2k} = k] = \binom{2k}{k} 2^{-2k} \quad\text{and}\quad \mathbb{P}[S_{2k+1} = 0] = 0.$$
Now, note that $V_t = \sum_{k=1}^{t} \mathbf{1}_{\{S_k = 0\}}$. So
$$\mathbb{E}[V_t] = \sum_{k=1}^{t} \mathbb{P}[S_k = 0] = \sum_{k=1}^{\lfloor t/2 \rfloor} \binom{2k}{k} 2^{-2k} \ge \sum_{k=1}^{\lfloor t/2 \rfloor} \frac{1}{2\sqrt{\pi k}}.$$
Since
$$\sum_{k=1}^{m} \frac{1}{\sqrt{\pi k}} \ge \int_{1}^{m+1} \frac{1}{\sqrt{\pi x}}\, dx = 2\pi^{-1/2}\cdot\big(\sqrt{m+1} - 1\big),$$
we get that $\mathbb{E}[V_t] \ge c\sqrt{t}$ for some $c > 0$. $\square$
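The exact sum $\sum_{k=1}^{\lfloor t/2\rfloor}\binom{2k}{k}2^{-2k}$ can be compared against a Monte Carlo estimate of $\mathbb{E}[V_t]$. A sketch (ours; the horizon $t=100$ and trial count are arbitrary choices):

```python
import random
from math import comb

def expected_visits_mc(t, trials, seed=0):
    """Monte Carlo estimate of E[V_t], the expected number of visits to 0 by time t."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = 0
        for _ in range(t):
            s += rng.choice((-1, 1))
            if s == 0:
                total += 1
    return total / trials

estimate = expected_visits_mc(t=100, trials=8000)
exact = sum(comb(2 * k, k) / 4 ** k for k in range(1, 51))  # E[V_100] exactly
```

For $t = 100$ the exact value is about $7.0$, already of the order $c\sqrt{t}$ promised by the proposition.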
Let us now consider the probability that the random walker will return to the origin.
• Proposition 1.2. $\mathbb{P}[\exists\, t \ge 1 : S_t = 0] = 1$.
Proof. Let $p = \mathbb{P}[\exists\, t \ge 1 : S_t = 0]$. Assume for a contradiction that $p < 1$. (Note that $p > 0$, since $p \ge \mathbb{P}[S_2 = 0] = \frac{1}{2}$.) Suppose that $S_t = 0$ for some $t > 0$. Then, since $S_{t+k} = S_t + \sum_{j=1}^{k} X_{t+j}$, the process $(S_{t+k} - S_t)_k$ has the same distribution as a random walk on $\mathbb{Z}$, and is independent of $(X_1, \ldots, X_t)$. So $\mathbb{P}[\exists\, k \ge 1 : S_{t+k} = 0 \mid S_t = 0] = p$. Thus, every time we are at $0$ there is probability $0 < 1-p < 1$ to never return.
Now we consider the different "excursions". That is, let $T_0 = 0$ and define inductively
$$T_k = \inf\{t \ge T_{k-1} + 1 : S_t = 0\},$$
where $\inf \emptyset = \infty$. Now let $K$ be the first $k$ such that $T_k = \infty$. The analysis above gives that for $k \ge 1$,
$$\mathbb{P}[K = k] = \mathbb{P}[T_1 < \infty, \ldots, T_{k-1} < \infty, T_k = \infty] = \mathbb{P}[T_1 - T_0 < \infty, \ldots, T_{k-1} - T_{k-2} < \infty, T_k - T_{k-1} = \infty].$$
The main observation now is that the different $T_k - T_{k-1}$ are independent, so $\mathbb{P}[K = k] = p^{k-1}(1-p)$. That is, $K \sim \mathrm{Geo}(1-p)$. Thus, $\mathbb{E}[K] = \frac{1}{1-p} < \infty$. But note that $K$ is exactly the number of visits to $0$ of the infinite-time walk, so $V_t \le K$ for all $t$, and hence $\mathbb{E}[V_t] \le \frac{1}{1-p}$. However, in the previous proposition we have shown that $\mathbb{E}[V_t] \ge c\sqrt{t} \to \infty$, a contradiction!
So it must be that $p = 1$. $\square$
It is not a coincidence that the expected number of visits to 0 is infinite, and that the
probability to return to 0 is 1. This will also be the case in 2 dimensions, but not in 3 dimensions.
In the upcoming classes we will rigorously prove the following theorem by Polya.
••• Theorem 1.3. Fix $d \ge 1$. Let $(X_k)_k$ be i.i.d. $d$-dimensional random variables uniformly distributed on $\{\pm e_1, \ldots, \pm e_d\}$ (where $e_1, \ldots, e_d$ is the standard basis for $\mathbb{R}^d$). Let $S_t = \sum_{k=1}^{t} X_k$. Let $p(d) = \mathbb{P}[\exists\, t \ge 1 : S_t = 0]$. Then, $p(d) = 1$ for $d \le 2$ and $p(d) < 1$ for $d \ge 3$.
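The dichotomy in the theorem is already visible in simulation. A sketch (ours, not from the notes; the horizon and trial counts are arbitrary) estimates the probability of revisiting the origin within a finite horizon:

```python
import random

def return_frequency(d, horizon, trials, seed=0):
    """Estimate P[walk on Z^d revisits the origin within `horizon` steps]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = [0] * d
        for _ in range(horizon):
            i = rng.randrange(d)           # pick a coordinate direction
            pos[i] += rng.choice((-1, 1))  # step +- e_i
            if not any(pos):               # back at the origin
                hits += 1
                break
    return hits / trials

f1 = return_frequency(1, 1000, 400)
f3 = return_frequency(3, 1000, 400)
```

As the horizon grows, the frequency for $d = 1$ (and $d = 2$) creeps toward $1$, while for $d = 3$ it stays bounded away from $1$ (the true value $p(3)$ is roughly $0.34$).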
Remark 1.4. The proof for $d \ge 3$ is mainly that $\mathbb{P}[S_t = 0] \le C t^{-d/2}$. Thus, for $d \ge 3$,
$$\sum_{t=1}^{\infty} \mathbb{P}[S_t = 0] < \infty.$$
So by the Borel-Cantelli Lemma, $\mathbb{P}[S_t = 0 \text{ i.o.}] = 0$. In other words,
$$\mathbb{P}[\exists\, T : \forall\, t > T,\ S_t \neq 0] = \mathbb{P}[\liminf\, \{S_t \neq 0\}] = 1.$$
Thus, a.s. the number of visits to $0$ is finite. If the probability to return to $0$ were $1$, then the number of visits to $0$ would be infinite a.s. All this will be done rigorously in the upcoming classes.
Number of exercises in lecture: 0
Total number of exercises until here: 0
Lecture 2: Markov Chains
2.1. Preliminaries
2.1.1. Graphs. We will make use of the structure known as a graph:
X Notation: For a set $S$ we use $\binom{S}{k}$ to denote the set of all subsets of size $k$ in $S$; e.g. $\binom{S}{2} = \big\{\{x,y\} : x, y \in S,\ x \neq y\big\}$.
• Definition 2.1. A graph $G$ is a pair $G = (V(G), E(G))$, where $V(G)$ is a countable set, and $E(G) \subset \binom{V(G)}{2}$.
The elements of $V(G)$ are called vertices. The elements of $E(G)$ are called edges. The notation $x \overset{G}{\sim} y$ (sometimes just $x \sim y$ when $G$ is clear from the context) is used for $\{x,y\} \in E(G)$.
If $x \sim y$, we say that $x$ is a neighbor of $y$, or that $x$ is adjacent to $y$. If $x \in e \in E(G)$ then the edge $e$ is said to be incident to $x$, and $x$ is incident to $e$.
The degree of a vertex $x$, denoted $\deg(x) = \deg_G(x)$, is the number of edges incident to $x$ in $G$.
X Notation: Many times we will use x ∈ G instead of x ∈ V (G).
Example 2.2. • The complete graph.
• Empty graph on n vertices.
• Cycles.
• Z,Z2,Zd.
• Regular trees.
• Cayley graphs of finitely generated groups: Let $G = \langle S \rangle$ be a finitely generated group, with a finite generating set $S$ such that $S$ is symmetric ($S = S^{-1}$). Then, we can equip $G$ with a graph structure $C = C_{G,S}$ by letting $V(C) = G$ and $\{g,h\} \in E(C)$ iff $g^{-1}h \in S$. $S$ being symmetric implies that this is a graph.
CG,S is called the Cayley graph of G with respect to S.
Examples: Zd, regular trees, cycles, complete graphs.
△
• Definition 2.3. Let $G$ be a graph. A path in $G$ is a sequence $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_n)$ (with the possibility of $n = \infty$) such that for all $j$, $\gamma_j \sim \gamma_{j+1}$. $\gamma_0$ is the start vertex and $\gamma_n$ is the end vertex (when $n < \infty$).
The length of $\gamma$ is $|\gamma| = n$.
If $\gamma$ is a path in $G$ such that $\gamma$ starts at $x$ and ends at $y$ we write $\gamma : x \to y$.
The notion of a path on a graph gives rise to two important notions: connectivity and graph
distance.
• Definition 2.4. Let $G$ be a graph. For two vertices $x, y \in G$ define
$$\mathrm{dist}(x,y) = \mathrm{dist}_G(x,y) := \inf\{|\gamma| : \gamma : x \to y\},$$
where $\inf \emptyset = \infty$.
Exercise 2.1. Show that distG defines a metric on G.
(Recall that a metric is a function that satisfies:
• ρ(x, y) ≥ 0 and ρ(x, y) = 0 iff x = y.
• ρ(x, y) = ρ(y, x).
• ρ(x, y) ≤ ρ(x, z) + ρ(z, y). )
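On a concrete graph, $\mathrm{dist}_G$ can be computed by breadth-first search; since all edges have length $1$, the first time BFS discovers $y$ gives the shortest path length. A sketch (ours, not part of the notes), with a 6-cycle as a test case:

```python
from collections import deque

def graph_distance(adj, x, y):
    """Breadth-first search computes dist_G(x, y); float('inf') if x and y
    are not connected (the inf of the empty set of paths)."""
    if x == y:
        return 0
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == y:
                    return dist[v]
                queue.append(v)
    return float("inf")

# the cycle Z/6Z as an adjacency dict
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

One can check the metric axioms of Exercise 2.1 numerically on such small examples.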
• Definition 2.5. Let G be a graph. We say that vertices x and y are connected if there exists
a path γ : x → y of finite length. That is, if distG(x, y) < ∞. We denote x connected to y by
x↔ y.
The relation ↔ is an equivalence relation, so we can speak of equivalence classes. The equiv-
alence class of a vertex x under this relation is called the connected component of x.
If a graph $G$ has only one connected component it is called connected. That is, $G$ is connected if for every $x, y \in G$ we have that $x \leftrightarrow y$.
Exercise 2.2. Prove that ↔ is an equivalence relation in any graph.
X In this course we will focus on connected graphs.
X Notation: For a path in a graph $G$, or more generally, a sequence of elements from a set $S$, we use the following "time" notation: If $s = (s_0, s_1, \ldots, s_n, \ldots)$ is a sequence in $S$ (finite or infinite), then $s[t_1, t_2] = (s_{t_1}, s_{t_1+1}, \ldots, s_{t_2})$ for all integers $0 \le t_1 \le t_2$.
2.1.2. S-valued random variables. Given a countable set $S$, we can define a discrete topology on $S$. Thus, the Borel $\sigma$-algebra on $S$ is just the complete $\sigma$-algebra $2^S$. This gives rise to the notion of $S$-valued random variables, which is just a fancy name for functions $X$ from a probability space into $S$ such that for every $s \in S$ the pull-back $X^{-1}(s)$ is an event.
That is,
• Definition 2.6. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $S$ be a countable set. An $S$-valued random variable is a function $X : \Omega \to S$ such that for any $s \in S$, $X^{-1}(s) \in \mathcal{F}$.
2.1.3. Sequences - infinite dimensional vectors. At some point, we will want to consider
sequences of random variables. If X = (Xn)n is a sequence of S-valued random variables, we
can think of X as an infinite dimensional vector.
What is the appropriate measurable space for such vectors?
Well, we can consider $\Omega = S^{\mathbb{N}}$, the space of all sequences in $S$. Next, we have a $\pi$-system of cylinder sets: Given a finite sequence $s_0, s_1, \ldots, s_m$ in $S$, the cylinder induced by these is $C = C(s_0, s_1, \ldots, s_m) = \{\omega \in S^{\mathbb{N}} : \omega_0 = s_0, \ldots, \omega_m = s_m\}$. The collection of all cylinder sets forms a $\pi$-system. We let $\mathcal{F}$ be the $\sigma$-algebra generated by this $\pi$-system.
2.1.4. Caratheodory and Kolmogorov extension. Now suppose we have a probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$ as above. For every $n$, we can consider the restriction of $\mathbb{P}$ to the first $n$ coordinates; that is, we can consider $\Omega_n = S^n$ and the full $\sigma$-algebra on $\Omega_n$, and then
$$\mathbb{P}_n[s_0, s_1, \ldots, s_{n-1}] := \mathbb{P}[C(s_0, s_1, \ldots, s_{n-1})]$$
defines a probability measure on $\Omega_n$. Note that these measures are consistent, in the sense that for any $n > m$,
$$\mathbb{P}_m[s_0, \ldots, s_m] = \mathbb{P}_n[\{\omega \in S^n : \omega_0 = s_0, \ldots, \omega_m = s_m\}].$$
Theorems by Caratheodory and Kolmogorov tell us that if we started with a consistent family of probability measures on $S^n$, $n = 1, 2, \ldots$, we could find a unique extension of these whose restriction would give these measures.
Constantin Caratheodory (1873-1950)
Andrey Kolmogorov (1903-1987)
In other words, the finite-dimensional marginals determine the probability measure of the
sequence.
2.1.5. Matrices. Recall that if $A, B$ are $n \times n$ matrices and $v$ is an $n$-dimensional vector, then $Av, vA$ are vectors defined by
$$(Av)_k = \sum_{j=1}^{n} A_{k,j} v_j \quad\text{and}\quad (vA)_k = \sum_{j=1}^{n} v_j A_{j,k}.$$
Also, $AB$ is the matrix defined by
$$(AB)_{m,k} = \sum_{j=1}^{n} A_{m,j} B_{j,k}.$$
These definitions can be generalized to infinite dimensions.
Also, we will view vectors as functions, and matrices as operators. For example, let $C_0(\mathbb{N}) = \mathbb{R}^{\mathbb{N}} = \{f : \mathbb{N} \to \mathbb{R}\}$. Then, any infinite matrix $A$ acts as an operator on $C_0(\mathbb{N})$ by defining
$$(Af)(k) := \sum_{n} A(k,n) f(n) \quad\text{and}\quad (fA)(k) := \sum_{n} f(n) A(n,k).$$
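For finitely supported $f$, the sums above are finite and can be computed directly. A small sketch of this "matrix as operator" point of view (ours, not from the notes), using the simple random walk kernel on $\mathbb{Z}$ as the example matrix:

```python
def apply_operator(A, f):
    """(Af)(k) = sum_n A(k, n) f(n), for A given as a kernel function A(k, n)
    and f a finitely supported function represented as a dict {n: f(n)}."""
    def Af(k):
        return sum(A(k, n) * v for n, v in f.items())
    return Af

# Example kernel: the transition matrix of the simple random walk on Z.
P = lambda k, n: 0.5 if abs(k - n) == 1 else 0.0
g = apply_operator(P, {0: 1.0})  # g(k) = P(k, 0)
```

Here $g = P\mathbf{1}_{\{0\}}$, so $g(k)$ is the probability of stepping from $k$ to $0$.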
2.2. Markov Chains
A stochastic process is just a sequence of random variables. If $(X_n)_n$ is a stochastic process, then we can think of the sequence $(X_n)_n$ as an infinite-dimensional random variable: consider the function $f : \mathbb{N} \to \mathbb{R}$ defined by $f(n) = X_n$. This is a different function for each $\omega \in \Omega$, so we can view it as a random function.
Up till now we have not restricted our processes - so anything can be a stochastic process.
However, in the discussion regarding random walks, we wanted the current step to be dependent
only on the position, regardless of the history and time. This gives rise to the following definition:
• Definition 2.7. Let S be a countable set. A Markov chain on S is a sequence (Xn)n≥0 of
S-valued random variables (i.e. measurable functions Xn : Ω → S), that satisfies the following
Markovian property:
• For any $n \ge 0$, and any $s_0, s_1, \ldots, s_n, s_{n+1} \in S$,
$$\mathbb{P}[X_{n+1} = s_{n+1} \mid X_0 = s_0, \ldots, X_n = s_n] = \mathbb{P}[X_{n+1} = s_{n+1} \mid X_n = s_n] = \mathbb{P}[X_1 = s_{n+1} \mid X_0 = s_n].$$
Andrey Markov (1856-1922)
That is, the probability to go from $s$ to $s'$ does not depend on $n$ or on the history, but only on the current position being at $s$ and on $s'$. This property is known as the Markov property.
X A set S as above is called the state space.
Remark 2.8. Any Markov chain is characterized by its transition matrix.
Let $(X_n)_n$ be a Markov chain on $S$. For $x, y \in S$ define $P(x,y) = \mathbb{P}[X_{n+1} = y \mid X_n = x]$ (which is independent of $n$). Then, $P$ is a $|S| \times |S|$ matrix indexed by the elements of $S$. One immediately notices that for all $x$,
$$\sum_{y \in S} P(x,y) = 1,$$
and that all the entries of $P$ are in $[0,1]$. Such a matrix is called stochastic. [Each row of the matrix is a probability measure on $S$.]
On the other hand, suppose that $P$ is a stochastic matrix indexed by a countable set $S$. Then, one can define a sequence of $S$-valued random variables as follows. Let $X_0 = x$ for some fixed starting point $x \in S$. For all $n \ge 0$, conditioned on $X_0 = s_0, \ldots, X_n = s_n$, define $X_{n+1}$ as the random variable with distribution $\mathbb{P}[X_{n+1} = y \mid X_n = s_n, \ldots, X_0 = s_0] = P(s_n, y)$. One can verify that this defines a Markov chain.
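This construction is exactly how one samples a Markov chain in practice: at each step, draw the next state from the row $P(X_n, \cdot)$. A minimal sketch (ours; the 3-state matrix below is a hypothetical example, not from the notes):

```python
import random

def sample_chain(P, x0, steps, seed=None):
    """Sample X_0, ..., X_steps, where row P[x] is the distribution of the
    next state given the current state x, i.e. P[x][y] = P(x, y)."""
    rng = random.Random(seed)
    x, traj = x0, [x0]
    for _ in range(steps):
        nxt = list(P[x])
        x = rng.choices(nxt, weights=[P[x][y] for y in nxt])[0]
        traj.append(x)
    return traj

# a hypothetical stochastic matrix on S = {0, 1, 2}; each row sums to 1
P = {0: {0: 0.5, 1: 0.5},
     1: {0: 0.25, 1: 0.25, 2: 0.5},
     2: {1: 1.0}}
traj = sample_chain(P, 0, 500, seed=7)
```

Every transition in the sampled trajectory has positive probability under $P$, as it must.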
We will identify a stochastic matrix P with the Markov chain it defines.
X Notation: We say that (Xt)t is Markov-(µ, P ) if (Xt)t is a Markov chain with transition
matrix P and starting distribution X0 ∼ µ. If we wish to stress the state space, we say that
(Xt)t is Markov-(µ, P, S). Sometimes we omit the starting distribution; i.e. (Xt)t is Markov-P
means that (Xt)t is a Markov chain with transition matrix P .
Example 2.9. Consider the following state space and matrix: $S = \mathbb{Z}$, $P(x,y) = 0$ if $|x-y| \neq 1$ and $P(x,y) = 1/2$ if $|x-y| = 1$.
What if we change this to $P(x,y) = 1/4$ for $|x-y| = 1$ and $P(x,x) = 1/2$?
What about $P(x, x+1) = 3/4$ and $P(x, x-1) = 1/4$? △
Example 2.10. Consider the set $\mathbb{Z}_n := \mathbb{Z}/n\mathbb{Z} = \{0, 1, \ldots, n-1\}$. Let $P(x,y) = 1/2$ for $x - y \equiv \pm 1 \pmod{n}$. △
Example 2.11. Let $G$ be a graph. For $x, y \in G$ define $P(x,y) = \frac{1}{\deg(x)}$ if $x \sim y$ and $P(x,y) = 0$ if $x \not\sim y$.
This Markov chain is called the simple random walk on $G$.
If we take $0 < \alpha < 1$ and set $Q(x,x) = \alpha$ and $Q(x,y) = (1-\alpha)\cdot\frac{1}{\deg(x)}$ for $x \sim y$, and $Q(x,y) = 0$ for $x \not\sim y$, then $Q$ is also a stochastic matrix, and defines what is sometimes called the lazy random walk on $G$ (with holding probability $\alpha$). Note that $Q = \alpha I + (1-\alpha)P$. △
X Notation: We will usually use $(X_n)_n$ to denote the realization of Markov chains. We will also use $\mathbb{P}_x$ to denote the probability measure $\mathbb{P}_x = \mathbb{P}[\cdot \mid X_0 = x]$. Note that the Markov property is just the statement that
$$\mathbb{P}[X_{n+1} = x \mid X_0 = s_0, \ldots, X_n = s_n] = \mathbb{P}[X_{n+1} = x \mid X_n = s_n] = \mathbb{P}_{s_n}[X_1 = x].$$
More generally, if $\mu$ is a probability measure on $S$, we write
$$\mathbb{P}_\mu = \mathbb{P}[\cdot \mid X_0 \sim \mu] = \sum_{s} \mu(s)\, \mathbb{P}_s.$$
Exercise 2.3. Let $(X_n)_n$ be a Markov chain on state space $S$, with transition matrix $P$. Show that for any event $A \in \sigma(X_0, \ldots, X_k)$,
$$\mathbb{P}_\mu[X_{n+k} = y \mid A, X_k = x] = P^n(x,y)$$
(provided $\mathbb{P}_\mu[A, X_k = x] > 0$).
Remark 2.12. For those uncomfortable with $\sigma$-algebras, it suffices to consider events of the form $A = \{X_0 = s_0, \ldots, X_k = s_k\}$.
Example 2.13. Consider a bored programmer. She has a (possibly biased) coin, and two chairs,
say a and b. Every minute, out of boredom, she tosses the coin. If it comes out heads, she moves
to the other chair. Otherwise, she does nothing.
This can be modeled by a Markov chain on the state space $\{a, b\}$. At each time, with some probability $1-p$ the programmer does not move, and with probability $p$ she jumps to the other state. The corresponding transition matrix would be
$$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$
What is the probability $\mathbb{P}_a[X_n = b]$? For this we need to calculate $P^n$.
A complicated way would be to analyze the eigenvalues of $P$...
An easier way: Let $\mu_n = P^n(a, \cdot)$. So $\mu_{n+1} = \mu_n P$. Consider the vector $\pi = (1/2, 1/2)$. Then $\pi P = \pi$. Now, consider $a_n = (\mu_n - \pi)(a)$. Since $\mu_n$ is a probability measure, we get that $\mu_n(b) = 1 - \mu_n(a)$, so
$$a_n = \big((\mu_{n-1} - \pi)P\big)(a) = (1-p)\mu_{n-1}(a) + p\,\mu_{n-1}(b) - \tfrac{1}{2} = (1-2p)(\mu_{n-1} - \pi)(a) = (1-2p)\,a_{n-1}.$$
So $a_n = (1-2p)^n a_0 = (1-2p)^n \cdot \frac{1}{2}$ and $P^n(a,a) = \mu_n(a) = \frac{1 + (1-2p)^n}{2}$. (This also implies that $P^n(a,b) = 1 - P^n(a,a) = \frac{1 - (1-2p)^n}{2}$.)
We see that for $0 < p < 1$,
$$P^n \to \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \pi \\ \pi \end{pmatrix}. \quad\triangle$$
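The closed form $P^n(a,a) = \frac{1+(1-2p)^n}{2}$ is easy to check numerically by repeated matrix multiplication. A quick sketch (ours, with an arbitrary choice $p = 0.3$):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

p = 0.3
P = [[1 - p, p], [p, 1 - p]]
Pn = [[1.0, 0.0], [0.0, 1.0]]  # P^0 = I
for n in range(1, 30):
    Pn = matmul(Pn, P)
    # compare with the closed form for P^n(a, a) derived above
    assert abs(Pn[0][0] - (1 + (1 - 2 * p) ** n) / 2) < 1e-12
```

After a few dozen steps the powers are numerically indistinguishable from the limit matrix with both rows equal to $\pi = (1/2, 1/2)$.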
The following proposition relates starting distributions, and steps of the Markov chain, to
matrix and vector multiplication.
• Proposition 2.14. Let $(X_n)_n$ be a Markov chain with transition matrix $P$ on some state space $S$. Let $\mu$ be some distribution on $S$; i.e. $\mu$ is an $S$-indexed vector with $\sum_s \mu(s) = 1$. Then, $\mathbb{P}_\mu[X_n = y] = (\mu P^n)(y)$. Specifically, taking $\mu = \delta_x$ we get that $\mathbb{P}_x[X_n = y] = P^n(x,y)$.
Moreover, if $f : S \to \mathbb{R}$ is any function, which can be viewed as an $S$-indexed vector, then $\mu P^n f = \mathbb{E}_\mu[f(X_n)]$ and $(P^n f)(x) = \mathbb{E}_x[f(X_n)]$.
Proof. This is shown by induction: It is the definition for $n = 0$ ($P^0 = I$, the identity matrix). The Markov property gives for $n > 0$, using induction,
$$\mathbb{P}_\mu[X_n = y] = \sum_{s \in S} \mathbb{P}_\mu[X_n = y \mid X_{n-1} = s]\,\mathbb{P}_\mu[X_{n-1} = s] = \sum_{s} P(s,y)(\mu P^{n-1})(s) = \big((\mu P^{n-1})P\big)(y) = (\mu P^n)(y).$$
The second assertion also follows by conditional expectation:
$$\mathbb{E}_\mu[f(X_n)] = \sum_{s} \mu(s)\,\mathbb{E}[f(X_n) \mid X_0 = s] = \sum_{s} \mu(s) \sum_{x} \mathbb{P}[X_n = x \mid X_0 = s] f(x) = \sum_{s,x} \mu(s) P^n(s,x) f(x) = \mu P^n f.$$
$(P^n f)(x) = \mathbb{E}_x[f(X_n)]$ is just the case $\mu = \delta_x$. $\square$
2.3. Classification of Markov chains
When we spoke about graphs, we had the notion of connectivity. We are now interested in generalizing this notion to Markov chains. We want to say that a state $x$ is connected to a state $y$ if there is a way to get from $x$ to $y$; note that for general Markov chains this does not necessarily imply that one can get from $y$ to $x$.
• Definition 2.15. Let $P$ be the transition matrix of a Markov chain on $S$. $P$ is called irreducible if for every pair of states $x, y \in S$ there exists $t > 0$ such that $P^t(x,y) > 0$.
This means that for every pair, there is a large enough time such that with positive probability
the chain can go from one of the pair to the other in that time.
Example 2.16. Consider the cycle $\mathbb{Z}/n\mathbb{Z}$, for $n$ even. This is an irreducible chain, since for any $x, y$, taking $t = \mathrm{dist}(x,y)$ and $\gamma$ a path of length $t$ from $x$ to $y$,
$$P^t(x,y) \ge \mathbb{P}_x[(X_0, \ldots, X_t) = \gamma] = 2^{-t} > 0.$$
Note that at each step, the Markov chain moves from the current position by $+1$ or $-1 \pmod{n}$. Thus, since $n$ is even, at even times the chain must be at even vertices, and at odd times the chain must be at odd vertices.
Thus, it is not true that there exists $t > 0$ such that for all $x, y$, $P^t(x,y) > 0$.
The main reason for this is that the chain has a period: at even times it is on some set, and at odd times on a different set. Similarly, the chain cannot be back at its starting point at odd times, only at even times. △
• Definition 2.17. Let $P$ be a Markov chain on $S$.
• A state $x$ is called periodic if $\gcd\{t \ge 1 : P^t(x,x) > 0\} > 1$, and this gcd is called the period of $x$.
• If $\gcd\{t \ge 1 : P^t(x,x) > 0\} = 1$ then $x$ is called aperiodic.
• $P$ is called aperiodic if all $x \in S$ are aperiodic. Otherwise $P$ is called periodic.
X Note that in the even-length cycle example, $\gcd\{t \ge 1 : P^t(x,x) > 0\} = \gcd\{2, 4, 6, \ldots\} = 2$.
Remark 2.18. If P is periodic, then there is an easy way to “fix” P to become aperiodic: namely,
let Q = αI+(1−α)P be a lazy version of P . Then, Q(x, x) ≥ α for all x, and thus Q is aperiodic.
• Proposition 2.19. Let $P$ be a Markov chain on state space $S$.
• $x$ is aperiodic if and only if there exists $t(x)$ such that for all $t > t(x)$, $P^t(x,x) > 0$.
• If $P$ is irreducible, then $P$ is aperiodic if and only if there exists an aperiodic state $x$.
• Consequently, if $P$ is irreducible and aperiodic, and if $S$ is finite, then there exists $t_0$ such that for all $t > t_0$ all $x, y$ admit $P^t(x,y) > 0$.
Proof. We start with the first assertion. Assume that $x$ is aperiodic. Let $R = \{t \ge 1 : P^t(x,x) > 0\}$. Since $P^{t+s}(x,x) \ge P^t(x,x) P^s(x,x)$, we get that $t, s \in R$ implies $t + s \in R$; i.e. $R$ is closed under addition. A number-theoretic result tells us that since $\gcd R = 1$ it must be that $R^c$ is finite.
The other direction is simpler. If $R^c$ is finite, then $R$ contains primes $p \neq q$, so $\gcd R = \gcd(p,q) = 1$.
For the second assertion, if $P$ is irreducible and $x$ is aperiodic, then let $t(x)$ be such that for all $t > t(x)$, $P^t(x,x) > 0$. For any $z, y$ let $t(z,y)$ be such that $P^{t(z,y)}(z,y) > 0$ (which exists by irreducibility). Then, for any $t > t(y,x) + t(x) + t(x,y)$ we get that
$$P^t(y,y) \ge P^{t(y,x)}(y,x)\, P^{t - t(y,x) - t(x,y)}(x,x)\, P^{t(x,y)}(x,y) > 0.$$
So for all large enough $t$, $P^t(y,y) > 0$, which implies that $y$ is aperiodic. This holds for all $y$, so $P$ is aperiodic.
The other direction is trivial from the definition.
For the third assertion, for any $z, y$ let $t(z,y)$ be such that $P^{t(z,y)}(z,y) > 0$. Let $T = \max_{z,y} t(z,y)$. Let $x$ be an aperiodic state and let $t(x)$ be such that for all $t > t(x)$, $P^t(x,x) > 0$. We get that for any $t > 2T + t(x)$ we have that $t - t(z,x) - t(x,y) \ge t - 2T > t(x)$, so
$$P^t(z,y) \ge P^{t(z,x)}(z,x)\, P^{t - t(z,x) - t(x,y)}(x,x)\, P^{t(x,y)}(x,y) > 0. \qquad\square$$
Exercise 2.4. Let $G$ be a finite connected graph, and let $Q$ be the lazy random walk on $G$ with holding probability $\alpha$; i.e. $Q = \alpha I + (1-\alpha)P$ where $P(x,y) = \frac{1}{\deg(x)}$ if $x \sim y$ and $P(x,y) = 0$ if $x \not\sim y$.
Show that $Q$ is aperiodic. Show that for $\mathrm{diam}(G) = \max\{\mathrm{dist}(x,y) : x, y \in G\}$ we have that for all $t > \mathrm{diam}(G)$, all $x, y \in G$ admit $Q^t(x,y) > 0$.
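This claim can be sanity-checked numerically on one small example (not a proof; the path graph on 4 vertices and $\alpha = 1/2$ are our choices). Exact rational arithmetic avoids any floating-point ambiguity about strict positivity:

```python
from fractions import Fraction

def all_positive_power(Q, t):
    """Return True if every entry of Q^t is strictly positive."""
    n, M = len(Q), Q
    for _ in range(t - 1):
        M = [[sum(M[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return all(M[i][j] > 0 for i in range(n) for j in range(n))

half = Fraction(1, 2)
# simple random walk P on the path graph 0-1-2-3 (diameter 3),
# lazified with holding probability alpha = 1/2
P = [[0, 1, 0, 0],
     [half, 0, half, 0],
     [0, half, 0, half],
     [0, 0, 1, 0]]
Q = [[half * P[i][j] + (half if i == j else 0) for j in range(4)] for i in range(4)]
```

Here $Q^4$ has all entries positive ($4 > \mathrm{diam}(G) = 3$), while $Q^2$ does not, since no walk of length $2$ joins the endpoints of the path even with self-loops.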
Number of exercises in lecture: 4
Total number of exercises until here: 4
Lecture 3: Recurrence and Transience
3.1. Recurrence and Transience
X Notation: If $(X_t)_t$ is Markov-$P$ on state space $S$, we can define the following: For $A \subset S$,
$$T_A = \inf\{t \ge 0 : X_t \in A\} \quad\text{and}\quad T_A^+ = \inf\{t \ge 1 : X_t \in A\}.$$
These are the hitting time of $A$ and return time to $A$. (We use the convention that $\inf \emptyset = \infty$.) If $A = \{x\}$ we write $T_x = T_{\{x\}}$ and similarly $T_x^+ = T_{\{x\}}^+$.
Recall that we saw that the simple random walk on $\mathbb{Z}$ a.s. returns to the origin. We also stated that on $\mathbb{Z}^3$ this is not true: with positive probability the simple random walk never returns to the origin.
Let us classify Markov chains according to these properties.
• Definition 3.1. Let $P$ be a Markov chain on $S$. Consider a state $x \in S$.
• If $\mathbb{P}_x[T_x^+ = \infty] > 0$, we say that $x$ is a transient state.
• If $\mathbb{P}_x[T_x^+ < \infty] = 1$, we say that $x$ is recurrent.
• For a recurrent state $x$, there are two options:
– If $\mathbb{E}_x[T_x^+] < \infty$ we say that $x$ is positive recurrent.
– If $\mathbb{E}_x[T_x^+] = \infty$ we say that $x$ is null recurrent.
Our first goal will be to prove the following theorem.
••• Theorem 3.2. Let (Xt)t be a Markov chain on S with transition matrix P . If P is irre-
ducible, then for any x, y ∈ S, x is (positive, null) recurrent if and only if y is (positive, null)
recurrent.
That is, for irreducible chains, all the states have the same classification.
3.2. Stopping Times
A word about σ-algebras:
Recall that the canonical $\sigma$-algebra we take on the space $S^{\mathbb{N}}$ is the $\sigma$-algebra generated by the cylinder sets. A cylinder set is a set of the form $\{\omega \in S^{\mathbb{N}} : \omega_0 = x_0, \ldots, \omega_t = x_t\}$ for some $t \ge 0$. $A \subset S^{\mathbb{N}}$ is called a $t$-cylinder set if there exist $x_0, \ldots, x_t \in S$ such that for every $\omega \in A$ we have $\omega_j = x_j$ for all $j = 0, \ldots, t$.
Recall the $\sigma$-algebra
$$\sigma(X_0, \ldots, X_t) = \sigma\big(X_j^{-1}(x) : x \in S,\ j = 0, \ldots, t\big) = \sigma\big(A : A \text{ is a } j\text{-cylinder set for some } j \le t\big).$$
Exercise 3.1. Define an equivalence relation on $S^{\mathbb{N}}$ by $\omega \sim_t \omega'$ if $\omega_j = \omega'_j$ for all $j = 0, 1, \ldots, t$.
Show that this is indeed an equivalence relation.
We say that an event $A$ respects $\sim_t$ if for any equivalent $\omega \sim_t \omega'$ we have that $\omega \in A$ if and only if $\omega' \in A$.
Show that $\sigma(X_0, X_1, \ldots, X_t) = \{A : A \text{ respects } \sim_t\}$.
The hitting and return times above have the property that their value can be determined by the history of the chain; that is, the event $\{T_A \le t\}$ is determined by $(X_0, X_1, \ldots, X_t)$.
• Definition 3.3 (Stopping Time). Consider a Markov chain on $S$. Recall that the probability space is $(S^{\mathbb{N}}, \mathcal{F}, \mathbb{P})$ where $\mathcal{F}$ is the $\sigma$-algebra generated by the cylinder sets.
A random variable $T : S^{\mathbb{N}} \to \mathbb{N} \cup \{\infty\}$ is called a stopping time if for all $t \ge 0$, the event $\{T \le t\} \in \sigma(X_0, \ldots, X_t)$.
Example 3.4. Any hitting time and return time is a stopping time. Indeed,
$$\{T_A \le t\} = \bigcup_{j=0}^{t} \{X_j \in A\}.$$
Similarly for $T_A^+$. △
Example 3.5. Consider the simple random walk on $\mathbb{Z}^3$. Let $T = \sup\{t : X_t = 0\}$. This is the last time the walk is at $0$. One can show that $T$ is a.s. finite. However, $T$ is not a stopping time, since for example
$$\{T = 0\} = \{\forall\, t > 0,\ X_t \neq 0\} = \bigcap_{t=1}^{\infty} \{X_t \neq 0\} \notin \sigma(X_0). \quad\triangle$$
Example 3.6. Let $(X_t)_t$ be a Markov chain and let $T = \inf\{t \ge T_A : X_t \in A'\}$, where $A, A' \subset S$. Then $T$ is a stopping time, since
$$\{T \le t\} = \bigcup_{k=0}^{t} \bigcup_{m=0}^{k} \{X_m \in A,\ X_k \in A'\}. \quad\triangle$$
• Proposition 3.7. Let $T, T'$ be stopping times. The following hold:
• Any constant $t \in \mathbb{N}$ is a stopping time.
• $T \wedge T'$ and $T \vee T'$ are stopping times.
• $T + T'$ is a stopping time.
Proof. Since $\{t \le k\} \in \{\emptyset, \Omega\}$, the trivial $\sigma$-algebra, we get that $\{t \le k\} \in \sigma(X_0, \ldots, X_k)$ for any $k$. So constants are stopping times.
For the minimum:
$$\{T \wedge T' \le t\} = \{T \le t\} \cup \{T' \le t\} \in \sigma(X_0, \ldots, X_t).$$
The maximum is similar:
$$\{T \vee T' \le t\} = \{T \le t\} \cap \{T' \le t\} \in \sigma(X_0, \ldots, X_t).$$
For the addition,
$$\{T + T' \le t\} = \bigcup_{k=0}^{t} \{T = k,\ T' \le t - k\}.$$
Since $\{T = k\} = \{T \le k\} \setminus \{T \le k-1\} \in \sigma(X_0, \ldots, X_k)$, we get that $T + T'$ is a stopping time. $\square$
3.2.1. Conditioning on a stopping time. Stopping times are extremely important in the
theory of martingales, a subject we will come back to in the future.
For the moment, the important property we want is the Strong Markov Property.
For a fixed time t, we saw that the process (Xt+n)n is a Markov chain with starting distribution
Xt, independent of σ(X0, . . . , Xt). We want to do the same thing for stopping times.
Let $T$ be a stopping time. The information captured by $X_0, \ldots, X_T$ is the $\sigma$-algebra $\sigma(X_0, \ldots, X_T)$. This is defined to be the collection of all events $A$ such that for all $t$, $A \cap \{T \le t\} \in \sigma(X_0, \ldots, X_t)$. That is,
$$\sigma(X_0, \ldots, X_T) = \{A : A \cap \{T \le t\} \in \sigma(X_0, \ldots, X_t) \text{ for all } t\}.$$
One can check that this is indeed a $\sigma$-algebra.
Exercise 3.2. Show that $\sigma(X_0, \ldots, X_T)$ is a $\sigma$-algebra.
Important examples are:
• For any $t$, $\{T \le t\} \in \sigma(X_0, \ldots, X_T)$.
• Thus, $T$ is measurable with respect to $\sigma(X_0, \ldots, X_T)$.
• $X_T$ is measurable with respect to $\sigma(X_0, \ldots, X_T)$ (indeed $\{X_T = x,\ T \le t\} \in \sigma(X_0, \ldots, X_t)$ for all $t$ and $x$).
• Proposition 3.8 (Strong Markov Property). Let $(X_t)_t$ be Markov-$P$ on $S$, and let $T$ be a stopping time. For all $t \ge 0$, define $Y_t = X_{T+t}$. Then, conditioned on $T < \infty$ and $X_T$, the sequence $(Y_t)_t$ is independent of $\sigma(X_0, \ldots, X_T)$ and is Markov-$(\delta_{X_T}, P)$.
Proof. The (regular) Markov property tells us that for any $m > k$, and any event $A \in \sigma(X_0, \ldots, X_k)$,
$$\mathbb{P}[X_m = y, A, X_k = x] = P^{m-k}(x,y)\,\mathbb{P}[A, X_k = x].$$
We need to show that for all $t$, and any $A \in \sigma(X_0, \ldots, X_T)$,
$$\mathbb{P}[X_{T+t+1} = y \mid X_{T+t} = x, A, T < \infty] = P(x,y)$$
(provided of course that $\mathbb{P}[X_{T+t} = x, A, T < \infty] > 0$). Indeed this follows from the fact that $A \cap \{T = k\} \in \sigma(X_0, \ldots, X_k) \subset \sigma(X_0, \ldots, X_{k+t})$ for all $k$, so
$$\mathbb{P}[X_{T+t+1} = y, A, X_{T+t} = x, T < \infty] = \sum_{k=0}^{\infty} \mathbb{P}[X_{k+t+1} = y, X_{k+t} = x, A, T = k] = \sum_{k=0}^{\infty} P(x,y)\,\mathbb{P}[X_{k+t} = x, A, T = k] = P(x,y)\,\mathbb{P}[X_{T+t} = x, A, T < \infty]. \qquad\square$$
Another way to state the above proposition is that for a stopping time T , conditional on
T <∞ we can restart the Markov chain from XT .
3.3. Excursion Decomposition
We now use the strong Markov property to prove the following:
Example 3.9. Let $P$ be an irreducible Markov chain on $S$. Fix $x \in S$.
Define inductively the following stopping times: $T_x^{(0)} = 0$, and
$$T_x^{(k)} = \inf\big\{t \ge T_x^{(k-1)} + 1 : X_t = x\big\}.$$
So $T_x^{(k)}$ is the time of the $k$-th return to $x$.
Let $V_t(x)$ be the number of visits to $x$ up to time $t$; i.e. $V_t(x) = \sum_{k=1}^{t} \mathbf{1}_{\{X_k = x\}}$.
It is immediate that $V_t(x) \ge k$ if and only if $T_x^{(k)} \le t$.
Now let us look at the excursions to $x$: The $k$-th excursion is
$$X[T_x^{(k-1)}, T_x^{(k)}] = \big(X_{T_x^{(k-1)}}, X_{T_x^{(k-1)}+1}, \ldots, X_{T_x^{(k)}}\big).$$
These excursions are paths of the Markov chain starting at $x$ and ending at $x$ (except, possibly, the first excursion, which starts at $X_0$).
For $k > 0$ define
$$\tau_x^{(k)} = T_x^{(k)} - T_x^{(k-1)}$$
if $T_x^{(k)} < \infty$, and $0$ otherwise. For $T_x^{(k)} < \infty$, this is the length of the $k$-th excursion.
We claim that conditioned on $T_x^{(k-1)} < \infty$, the excursion $X[T_x^{(k-1)}, T_x^{(k)}]$ is independent of $\sigma(X_0, \ldots, X_{T_x^{(k-1)}})$, and has the distribution of the first excursion $X[0, T_x^+]$ conditioned on $X_0 = x$.
Indeed, let $Y_t = X_{T_x^{(k-1)}+t}$. For any $A \in \sigma(X_0, \ldots, X_{T_x^{(k-1)}})$, and for any path $\gamma : x \to x$, since $X_{T_x^{(k-1)}} = x$,
$$\mathbb{P}\big[Y[0, \tau_x^{(k)}] = \gamma \mid A,\ T_x^{(k-1)} < \infty\big] = \mathbb{P}\big[X[T_x^{(k-1)}, T_x^{(k)}] = \gamma \mid A,\ T_x^{(k-1)} < \infty\big] = \mathbb{P}_x\big[X[0, T_x^+] = \gamma\big],$$
where we have used the strong Markov property. △
This gives rise to the following relation:
• Lemma 3.10. Let $P$ be an irreducible Markov chain on $S$. Then,
$$\big(\mathbb{P}_x[T_x^+ < \infty]\big)^k = \mathbb{P}_x[V_\infty(x) \ge k] = \mathbb{P}_x[T_x^{(k)} < \infty].$$
Consequently,
$$1 + \mathbb{E}_x[V_\infty(x)] = \frac{1}{\mathbb{P}_x[T_x^+ = \infty]},$$
where $1/0 = \infty$.
Proof. The event $\{V_\infty(x) \ge k\}$ is the event that $x$ is visited at least $k$ times, which is exactly the event that the $k$-th excursion ends at some finite time. From the example above we have that for any $m$,
$$\mathbb{P}_x[T_x^{(m)} < \infty \mid T_x^{(m-1)} < \infty] = \mathbb{P}_x[\exists\, t \ge 1 : X_{T_x^{(m-1)}+t} = x \mid T_x^{(m-1)} < \infty] = \mathbb{P}_x[T_x^+ < \infty].$$
Since $\{T_x^{(m)} < \infty\} = \{T_x^{(m)} < \infty,\ T_x^{(m-1)} < \infty\}$, we can inductively conclude that
$$\mathbb{P}_x[T_x^{(k)} < \infty] = \mathbb{P}_x[T_x^{(k)} < \infty \mid T_x^{(k-1)} < \infty] \cdot \mathbb{P}_x[T_x^{(k-1)} < \infty] = \cdots = \big(\mathbb{P}_x[T_x^+ < \infty]\big)^k.$$
The second assertion follows from the fact that
$$1 + \mathbb{E}_x[V_\infty(x)] = \sum_{k=0}^{\infty} \mathbb{P}_x[V_\infty(x) \ge k] = \frac{1}{1 - \mathbb{P}_x[T_x^+ < \infty]},$$
where this holds even if $\mathbb{P}_x[T_x^+ < \infty] = 1$. $\square$
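The identity $1 + \mathbb{E}_x[V_\infty(x)] = 1/\mathbb{P}_x[T_x^+ = \infty]$ concerns a single state, so it can be sanity-checked on a toy chain of our own (not from the notes, and not irreducible, but the excursion argument at the state $0$ is the same): from state $0$ the chain returns to $0$ with probability $1/2$ and otherwise moves to an absorbing state, so $\mathbb{P}_0[T_0^+ = \infty] = 1/2$ and the right-hand side is $2$.

```python
import random

rng = random.Random(0)
trials, total_visits = 20000, 0
for _ in range(trials):
    # Each excursion from 0 returns with probability 1/2, independently,
    # so V_infinity(0) is the number of successes before the first failure.
    while rng.random() < 0.5:
        total_visits += 1
lhs = 1 + total_visits / trials   # estimate of 1 + E_0[V_inf(0)]
assert abs(lhs - 2.0) < 0.1       # matches 1 / P_0[T_0^+ = infinity] = 2
```

This is also the geometric distribution of visits promised by Exercise 3.3 below.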
Similarly, one can prove:
Exercise 3.3. Let $(X_t)_t$ be Markov-$(S, P)$ for some irreducible $P$. Let $Z \subset S$. Show that under $\mathbb{P}_x$, the number of visits to $x$ until hitting $Z$ (i.e. the random variable $V = V_{T_Z}(x) + \mathbf{1}_{\{X_0 = x\}}$) is distributed geometric-$p$, for $p = \mathbb{P}_x[T_Z < T_x^+]$.
We now get the following important characterization of recurrence in Markov chains:
• Corollary 3.11. Let $P$ be an irreducible Markov chain on $S$. Then the following are equivalent:
(1) $x$ is recurrent.
(2) $\mathbb{P}_x[V_\infty(x) = \infty] = 1$.
(3) For any state $y$, $\mathbb{P}_x[T_y^+ < \infty] = 1$.
(4) $\mathbb{E}_x[V_\infty(x)] = \infty$.
Proof. If $x$ is recurrent, then $\mathbb{P}_x[T_x^+ < \infty] = 1$. So for any $k$, $\mathbb{P}_x[V_\infty(x) \ge k] = 1$. Taking $k$ to infinity, we get that $\mathbb{P}_x[V_\infty(x) = \infty] = 1$. This is the first implication.
For the second implication: Let $y \in S$. Let $E_k = X[T_x^{(k-1)}, T_x^{(k)}]$ be the $k$-th excursion from $x$. We assumed that $\mathbb{P}_x[\forall\, k,\ T_x^{(k)} < \infty] = 1$. So under $\mathbb{P}_x$, all $(E_k)$ are independent and identically distributed.
Since $P$ is irreducible, there exists $t > 0$ such that $\mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$ (this is an exercise). Thus, we have that $p := \mathbb{P}_x[T_y < T_x^+] \ge \mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$. This implies by the strong Markov property that
$$\mathbb{P}_x[T_y < T_x^{(k+1)} \mid T_y > T_x^{(k)},\ T_x^{(k)} < \infty] \ge p > 0.$$
So, using the fact that $\mathbb{P}_x[\forall\, k,\ T_x^{(k)} < \infty] = 1$,
$$\mathbb{P}_x[T_y \ge T_x^{(k)}] = \mathbb{P}_x[T_y \ge T_x^{(k)} \mid T_y > T_x^{(k-1)},\ T_x^{(k-1)} < \infty] \cdot \mathbb{P}_x[T_y > T_x^{(k-1)}] \le (1-p)\cdot\mathbb{P}_x[T_y \ge T_x^{(k-1)}] \le \cdots \le (1-p)^k.$$
Thus,
$$\mathbb{P}_x[T_y^+ = \infty] \le \mathbb{P}_x[\forall\, k,\ T_y \ge T_x^{(k-1)}] = \lim_{k\to\infty} (1-p)^k = 0.$$
This proves the second implication.
Finally, if for any $y$ we have $\mathbb{P}_x[T_y^+ < \infty] = 1$, then taking $y = x$ shows that $x$ is recurrent. This shows that (1), (2), (3) are equivalent.
It is obvious that (2) implies (4). Since $\mathbb{P}_x[T_x^+ = \infty] = \frac{1}{\mathbb{E}_x[V_\infty(x)] + 1}$, we get that (4) implies (1). $\square$
Exercise 3.4. Show that if $P$ is irreducible, there exists $t > 0$ such that $\mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$.
♣ Solution to ex:3.4. :(
There exists $n$ such that $P^n(x,y) > 0$ (because $P$ is irreducible). Thus, there is a sequence $x = x_0, x_1, \ldots, x_n = y$ such that $P(x_j, x_{j+1}) > 0$ for all $0 \le j < n$. Let $m = \max\{0 \le j < n : x_j = x\}$, and let $t = n - m$ and $y_j := x_{m+j}$ for $0 \le j \le t$. Then, we have the sequence $x = y_0, \ldots, y_t = y$ so that $y_j \neq x$ for all $0 < j \le t$, and we know that $P(y_j, y_{j+1}) > 0$ for all $0 \le j < t$. Thus,
$$\mathbb{P}_x[X_t = y,\ t < T_x^+] \ge \mathbb{P}_x[\forall\, 0 \le j \le t,\ X_j = y_j] = P(y_0, y_1) \cdots P(y_{t-1}, y_t) > 0.$$
:) X
Example 3.12. A gambler plays a fair game. Each round she wins a dollar with probability 1/2 and loses a dollar with probability 1/2, all rounds independent. What is the probability that she never goes bankrupt, if she starts with $N$ dollars?
We have already seen that this defines a simple random walk on $\mathbb{Z}$, and that $E_0[V_t(0)] \ge c\sqrt{t}$. Thus, taking $t \to \infty$ we get that $E_0[V_\infty(0)] = \infty$, and so $0$ is recurrent.
Note that $0$ here was not special, since all vertices look the same. This symmetry implies that $P_x[T_x^+ < \infty] = 1$ for all $x \in \mathbb{Z}$. Thus, for any $N$, $P_N[T_0^+ = \infty] = 0$. That is, no matter how much money the gambler starts with, she will eventually go bankrupt.
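The $\sqrt{t}$ growth of $E_0[V_t(0)]$ can be checked by direct computation, since only even times contribute and $P_0[X_{2m} = 0] = \binom{2m}{m} 4^{-m}$. A small numerical sketch (an illustration, not part of the notes):

```python
from math import comb

def expected_visits_to_zero(t):
    # E_0[V_t(0)] = sum_{k=1}^t P_0[X_k = 0]; only even times contribute,
    # and P_0[X_{2m} = 0] = C(2m, m) / 4^m for simple random walk on Z.
    return sum(comb(2 * m, m) / 4 ** m for m in range(1, t // 2 + 1))

# Quadrupling t roughly doubles the expected number of visits: sqrt(t) growth.
assert expected_visits_to_zero(400) > 10
assert 1.9 < expected_visits_to_zero(1600) / expected_visits_to_zero(400) < 2.2
```

Letting $t \to \infty$, the sum diverges, which is exactly the statement $E_0[V_\infty(0)] = \infty$.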
We now have part of Theorem 3.2.
• Corollary 3.13. Let $P$ be an irreducible Markov chain on $S$. Then, for any $x, y \in S$, $x$ is transient if and only if $y$ is transient.
Proof. As usual, by irreducibility, for any pair of states $z, w$ we can find $t(z,w) > 0$ such that $P^{t(z,w)}(z,w) > 0$.
Fix $x, y \in S$ and suppose that $x$ is transient. For any $t > 0$,
$$P^{t + t(x,y) + t(y,x)}(x,x) \ge P^{t(x,y)}(x,y)\, P^t(y,y)\, P^{t(y,x)}(y,x).$$
Thus,
$$E_y[V_\infty(y)] = \sum_{t=1}^{\infty} P^t(y,y) \le \frac{1}{P^{t(x,y)}(x,y)\, P^{t(y,x)}(y,x)} \sum_{t=1}^{\infty} P^{t + t(x,y) + t(y,x)}(x,x) < \infty.$$
So $y$ is transient as well. □
Number of exercises in lecture: 4
Total number of exercises until here: 8
Lecture 4: Stationary Distributions
4.1. Stationary Distributions
Suppose that $P$ is a Markov chain on state space $S$ such that for some starting state $y$ we have $P^n(y,x) \to \pi(x)$ for every $x$, where $\pi$ is some limiting distribution. One immediately checks that in this case we must have
$$\pi P(x) = \sum_s \lim_{n \to \infty} P^n(y,s) P(s,x) = \lim_{n \to \infty} P^{n+1}(y,x) = \pi(x)$$
(at least when $S$ is finite, so that the limit may be exchanged with the sum), or $\pi P = \pi$. (That is, $\pi$ is a left eigenvector for $P$ with eigenvalue 1.)
• Definition 4.1. Let P be a Markov chain. If π is a distribution satisfying πP = π then π is
called a stationary distribution.
Example 4.2. Recall the two-state chain $P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$. We saw that $P^n \to \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
Indeed, it is simple to check that $\pi = (1/2, 1/2)$ is a stationary distribution in this case.
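A quick numerical sanity check (an illustration, not part of the notes) verifies both $\pi P = \pi$ and the convergence $\mu P^t \to \pi$ for an arbitrary choice $p = 0.3$:

```python
# Two-state chain: verify pi = (1/2, 1/2) is stationary, and that iterating
# mu -> mu P from any start converges to pi.
p = 0.3
P = [[1 - p, p], [p, 1 - p]]
pi = [0.5, 0.5]

def left_mult(v, M):
    # (vM)(x) = sum_y v(y) M(y, x): multiply a row vector by the matrix M.
    return [sum(v[y] * M[y][x] for y in range(len(v))) for x in range(len(M[0]))]

piP = left_mult(pi, P)
assert all(abs(piP[x] - pi[x]) < 1e-12 for x in range(2))  # pi P = pi

mu = [1.0, 0.0]  # start deterministically at state 0
for _ in range(50):
    mu = left_mult(mu, P)
assert all(abs(mu[x] - 0.5) < 1e-6 for x in range(2))      # mu P^t -> pi
```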
Example 4.3. Consider a finite graph $G$. Let $P$ be the transition matrix of a simple random walk on $G$, so $P(x,y) = \frac{1}{\deg(x)} 1_{x \sim y}$; equivalently, $\deg(x) P(x,y) = 1_{x \sim y}$. Thus,
$$\sum_x \deg(x) P(x,y) = \deg(y).$$
So $\deg$ is a left eigenvector for $P$ with eigenvalue 1. Since
$$\sum_x \deg(x) = \sum_x \sum_y 1_{x \sim y} = 2|E(G)|,$$
we normalize $\pi(x) = \frac{\deg(x)}{2|E(G)|}$ to get a stationary distribution for $P$.
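A short sketch verifying this on a small example graph (the graph below is a hypothetical choice; any finite graph works). It checks $\pi P = \pi$, and also the detailed balance property discussed next:

```python
# A triangle with a pendant vertex: edges of a small example graph.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4
adj = [[False] * n for _ in range(n)]
for u, v in edges:
    adj[u][v] = adj[v][u] = True
deg = [sum(row) for row in adj]

# P(x, y) = 1/deg(x) for x ~ y, and pi(x) = deg(x) / (2|E|).
P = [[(1 / deg[x]) if adj[x][y] else 0.0 for y in range(n)] for x in range(n)]
pi = [deg[x] / (2 * len(edges)) for x in range(n)]

piP = [sum(pi[y] * P[y][x] for y in range(n)) for x in range(n)]
assert all(abs(piP[x] - pi[x]) < 1e-12 for x in range(n))   # pi P = pi

# Detailed balance: pi(x) P(x,y) = pi(y) P(y,x) (= 1/(2|E|) on every edge).
assert all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12
           for x in range(n) for y in range(n))
```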
The above stationary distribution has a special property, known as the detailed balance equations: a distribution $\pi$ is said to satisfy the detailed balance equations with respect to a transition matrix $P$ if for all states $x, y$,
$$\pi(x) P(x,y) = \pi(y) P(y,x).$$
Exercise 4.1. If $\pi$ satisfies the detailed balance equations, then $\pi$ is a stationary distribution.
We will come back to such distributions in the future.
4.2. Stationary Distributions and Hitting Times
There is a deep connection between stationary distributions and return times. The main
result here is:
••• Theorem 4.4. Let $P$ be an irreducible Markov chain on state space $S$. Then the following are equivalent:
• $P$ has a stationary distribution $\pi$.
• Every $x$ is positive recurrent.
• Some $x$ is positive recurrent.
• $P$ has a unique stationary distribution, $\pi(x) = \frac{1}{E_x[T_x^+]}$.
The proof of this theorem goes through a few lemmas.
X In the next lemma we will consider a function (vector) $v : S \to [0,\infty]$. Although it may take the value $\infty$, since we are only dealing with non-negative numbers we can write $vP(x) = \sum_y v(y) P(y,x)$ without confusion (with the convention $0 \cdot \infty = 0$).
• Lemma 4.5. Let $P$ be an irreducible Markov chain on state space $S$. Let $v : S \to [0,\infty]$ be such that $vP = v$. Then:
• If there exists a state $x$ such that $v(x) < \infty$, then $v(y) < \infty$ for all states $y$.
• If $v$ is not the zero vector, then $v(y) > 0$ for all states $y$.
X Note that this implies that if $\pi$ is a stationary distribution then all the entries of $\pi$ are strictly positive.
Proof. For any $t$, using the fact that $v \ge 0$,
$$v(x) = \sum_z v(z) P^t(z,x) \ge v(y) P^t(y,x).$$
Thus, for a suitable choice of $t$: since $P$ is irreducible, we know that $P^t(y,x) > 0$, and so $v(y) \le \frac{v(x)}{P^t(y,x)} < \infty$.
For the second assertion, if $v$ is not the zero vector, since it is non-negative there exists a state $x$ such that $v(x) > 0$. Thus, for any state $y$ and for $t$ such that $P^t(x,y) > 0$ we get
$$v(y) = \sum_z v(z) P^t(z,y) \ge v(x) P^t(x,y) > 0. \qquad □$$
X Notation: Recall that for a Markov chain $(X_t)_t$ we denote by $V_t(x) = \sum_{k=1}^t 1_{X_k = x}$ the number of visits to $x$.
• Lemma 4.6. Let $(X_t)_t$ be Markov-$(P,\mu)$ for irreducible $P$. Assume $T$ is a stopping time such that
$$P_\mu[X_T = x] = \mu(x) \quad \text{for all } x.$$
Assume further that $1 \le T < \infty$ $P_\mu$-a.s. Let $v(x) = E_\mu[V_T(x)]$.
Then, $vP = v$. Moreover, if $E_\mu[T] < \infty$ then $P$ has a stationary distribution $\pi(x) = \frac{v(x)}{E_\mu[T]}$.
Proof. The assumption on $T$ gives that
$$\mu(x) = P_\mu[X_T = x] = \sum_{j=1}^{\infty} P_\mu[X_j = x, T = j].$$
Hence, since $P_\mu[X_0 = y] = \mu(y)$,
$$\sum_{j=0}^{\infty} P_\mu[X_j = y, T > j] = P_\mu[X_0 = y] + \sum_{j=1}^{\infty} P_\mu[X_j = y, T > j] = \sum_{j=1}^{\infty} \big( P_\mu[X_j = y, T = j] + P_\mu[X_j = y, T > j] \big) = \sum_{j=1}^{\infty} P_\mu[X_j = y, T \ge j] = v(y).$$
Thus, using the Markov property (note that $\{T > j\} \in \sigma(X_0, \ldots, X_j)$),
$$v(x) = \sum_{j=1}^{\infty} P_\mu[X_j = x, T \ge j] = \sum_{j=0}^{\infty} P_\mu[X_{j+1} = x, T > j] = \sum_{j=0}^{\infty} \sum_y P_\mu[X_{j+1} = x, X_j = y, T > j] = \sum_y \sum_{j=0}^{\infty} P_\mu[X_j = y, T > j] P(y,x) = (vP)(x).$$
That is, $vP = v$.
Since
$$\sum_x v(x) = E_\mu\Big[ \sum_x V_T(x) \Big] = E_\mu[T],$$
if $E_\mu[T] < \infty$, then $\pi(x) = \frac{v(x)}{E_\mu[T]}$ defines a stationary distribution. □
Example 4.7. Consider $(X_t)_t$ that is Markov-$P$ for an irreducible $P$, and let $v(y) = E_x[V_{T_x^+}(y)]$. If $x$ is recurrent, then $P_x$-a.s. we have $1 \le T_x^+ < \infty$, and $P_x[X_{T_x^+} = y] = 1_{y=x} = P_x[X_0 = y]$. So we conclude that $vP = v$. Since $P_x$-a.s. $V_{T_x^+}(x) = 1$, we have that $0 < v(x) = 1 < \infty$, so $0 < v(y) < \infty$ for all $y$.
Note that although it may be that $E_x[T_x^+] = \infty$, i.e. $x$ is null recurrent, we still have that for any $y$, $E_x[V_{T_x^+}(y)] < \infty$; i.e. the expected number of visits to $y$ until returning to $x$ is finite.
If $x$ is positive recurrent, then $\pi(y) = \frac{E_x[V_{T_x^+}(y)]}{E_x[T_x^+]}$ is a stationary distribution for $P$.
This vector plays a special role, as in the next lemma.
• Lemma 4.8. Let $P$ be an irreducible Markov chain. Let $u(y) = E_x[V_{T_x^+}(y)]$. Let $v \ge 0$ be a non-negative vector such that $vP = v$ and $v(x) = 1$. Then, $v \ge u$. Moreover, if $x$ is recurrent, then $v = u$.
Proof. If $y = x$ then $v(x) = 1 \ge u(x)$, so we can assume that $y \ne x$.
We will prove by induction that for all $t$, for any $y \ne x$,
$$(4.1) \qquad \sum_{k=1}^{t} P_x[X_k = y, T_x^+ \ge k] \le v(y).$$
Indeed, for $t = 1$ this is just
$$P_x[X_1 = y, T_x^+ \ge 1] = P(x,y) \le \sum_z v(z) P(z,y) = v(y),$$
since $v \ge 0$, $v(x) = 1$ and $y \ne x$.
For general $t > 0$, we rely on the fact that by the Markov property, for any $y \ne x$,
$$P_x[X_{k+1} = y, T_x^+ \ge k+1] = \sum_{z \ne x} P_x[X_{k+1} = y, X_k = z, T_x^+ \ge k] = \sum_{z \ne x} P_x[X_k = z, T_x^+ \ge k] P(z,y).$$
So by induction,
$$\sum_{k=1}^{t+1} P_x[X_k = y, T_x^+ \ge k] = P(x,y) + \sum_{k=1}^{t} P_x[X_{k+1} = y, T_x^+ \ge k+1] = P(x,y) + \sum_{z \ne x} P(z,y) \sum_{k=1}^{t} P_x[X_k = z, T_x^+ \ge k] \le P(x,y) + \sum_{z \ne x} P(z,y) v(z) = \sum_z v(z) P(z,y) = v(y).$$
This completes the proof of (4.1) by induction.
Now, one notes that the left-hand side of (4.1) is just the expected number of visits to $y$ started at $x$, up to time $T_x^+ \wedge t$. Taking $t \to \infty$, using monotone convergence,
$$v(y) \ge \sum_{k=1}^{t} P_x[X_k = y, T_x^+ \ge k] = E_x[V_{T_x^+ \wedge t}(y)] \nearrow u(y).$$
This proves that $v \ge u$.
Now assume $x$ is recurrent. Then $uP = u$, and $u(x) = 1 = v(x)$. We have seen that $v - u \ge 0$, and of course $(v-u)P = v - u$. Until now we have not actually used irreducibility; we will use it to show that $v - u = 0$. Indeed, let $y$ be any state. If $v(y) > u(y)$ then $v - u$ is a non-zero non-negative left eigenvector for $P$, so by Lemma 4.5 it must be positive everywhere. This contradicts $v(x) - u(x) = 0$. So it must be that $v - u \equiv 0$. □
We are now in good shape to prove Theorem 4.4.
Proof of Theorem 4.4. Assume that $\pi$ is a stationary distribution for $P$. Fix any state $x$. Recall that $\pi(x) > 0$. Define the vector $v(z) = \frac{\pi(z)}{\pi(x)}$. We have that $v \ge 0$, $vP = v$ and $v(x) = 1$. Hence, $v(z) \ge E_x[V_{T_x^+}(z)]$ for all $z$. That is,
$$E_x[T_x^+] = \sum_y E_x[V_{T_x^+}(y)] \le \sum_y v(y) = \sum_y \frac{\pi(y)}{\pi(x)} = \frac{1}{\pi(x)} < \infty.$$
So $x$ is positive recurrent. This holds for a generic $x$.
The second bullet of course implies the third.
Now assume some state $x$ is positive recurrent. Let $v(y) = E_x[V_{T_x^+}(y)]$. Since $x$ is recurrent, we know that $vP = v$, and $\sum_y v(y) = E_x[T_x^+] < \infty$. So $\pi = \frac{v}{E_x[T_x^+]}$ is a stationary distribution for $P$.
Since $P$ has a stationary distribution, by the first implication all states are positive recurrent. Thus, for any state $z$, if $v = \frac{\pi}{\pi(z)}$ then $vP = v$ and $v(z) = 1$. So, $z$ being recurrent, we get that $v(y) = E_z[V_{T_z^+}(y)]$ for all $y$. Specifically,
$$E_z[T_z^+] = \sum_y v(y) = \frac{1}{\pi(z)},$$
which holds for all states $z$.
For the final implication, if $P$ has a unique stationary distribution, then of course it has a stationary distribution. □
• Corollary 4.9 (Stationary distributions are unique). If an irreducible Markov chain $P$ has two stationary distributions $\pi$ and $\pi'$, then $\pi = \pi'$.
Exercise 4.2. Let $P$ be an irreducible Markov chain. Show that for positive recurrent states $x, y$,
$$E_x[V_{T_x^+}(y)] \cdot E_y[V_{T_y^+}(x)] = 1.$$
4.3. Transience, positive or null recurrence are properties of the chain
We have also now shown:
Theorem* (3.2). [restatement] Let $P$ be an irreducible Markov chain. For any two states $x, y$: $x$ is transient / null recurrent / positive recurrent if and only if $y$ is transient / null recurrent / positive recurrent.
Proof. We have seen that
$$P_x[T_x^+ = \infty] = \frac{1}{1 + E_x[V_\infty(x)]}$$
implies that $x$ is transient if and only if $y$ is transient.
Now, if $x$ is positive recurrent, then $P$ has a stationary distribution, so all states, including $y$, are positive recurrent. □
In light of this:
• Definition 4.10. Let P be an irreducible Markov chain. We say that
• P is transient, if there exists a transient state.
• P is null recurrent if there exists a null recurrent state.
• P is positive recurrent if there exists a positive recurrent state.
Number of exercises in lecture: 2
Total number of exercises until here: 10
Lecture 5: Positive Recurrent Chains
5.1. Simple Random Walks
Last lecture we proved that an irreducible Markov chain $P$ has a stationary distribution if and only if $P$ is positive recurrent, and that the stationary distribution at each state is the reciprocal of the expected return time to that state.
Let’s investigate what this means in the setting of a simple random walk on a graph.
Example 5.1. Let $G$ be a graph, and let $P$ be the simple random walk on $G$; that is, $P(x,y) = \frac{1}{\deg(x)} 1_{x \sim y}$.
First, it is immediate that $P$ is irreducible (as long as $G$ is connected). This was shown in the exercises.
Consider the vector $v(x) = \deg(x)$. We have that
$$\sum_x \deg(x) P(x,y) = \sum_x \deg(x) \frac{1}{\deg(x)} 1_{x \sim y} = \deg(y).$$
That is, $vP = v$.
If we take $u(y) = v(y)/v(x)$ for some $x$, then $uP = u$ and $u(x) = 1$. Thus, if $P$ is recurrent, then $E_x[V_{T_x^+}(y)] = u(y) = \frac{\deg(y)}{\deg(x)}$ for all $x, y$. This does not depend on $\mathrm{dist}(x,y)$!
Another observation is that $\sum_x v(x) = 2|E(G)|$. That is, $P$ is positive recurrent if and only if $G$ is finite. Moreover, in this case, the stationary distribution for $P$ is $\pi(x) = \frac{\deg(x)}{2|E(G)|}$.
Note that if $G$ is a finite regular graph then the stationary distribution on $G$ is the uniform distribution.
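The identity $E_x[V_{T_x^+}(y)] = \deg(y)/\deg(x)$ can be tested by simulation. A minimal sketch on the path graph $0 - 1 - 2$ (a hypothetical example with degrees $1, 2, 1$): starting from $x = 0$, the expected number of visits to $y = 1$ before returning to $0$ should be $\deg(1)/\deg(0) = 2$.

```python
import random

random.seed(0)

# Simple random walk on the path 0 - 1 - 2.
nbrs = {0: [1], 1: [0, 2], 2: [1]}

def visits_to_one_before_return(start=0):
    # Count visits to vertex 1 during one excursion from `start`.
    pos, visits = start, 0
    while True:
        pos = random.choice(nbrs[pos])
        if pos == 1:
            visits += 1
        if pos == start:
            return visits

trials = 100_000
mean = sum(visits_to_one_before_return() for _ in range(trials)) / trials
assert abs(mean - 2.0) < 0.05  # deg(1)/deg(0) = 2
```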
Example 5.2. Recall the simple random walk on $\mathbb{Z}$. We have already seen that this is a recurrent Markov chain. Thus, if $vP = v$ for non-negative $v$, then $v(y) = E_x[V_{T_x^+}(y)] \cdot v(x)$ for all $x, y$. Since the constant vector $\vec{1}$ satisfies $\vec{1} P = \vec{1}$, we get that $E_x[V_{T_x^+}(y)] = 1$ for all $x, y$. Thus, any non-negative $v$ such that $vP = v$ must satisfy $v \equiv c$ for some constant $c$.
So there is no stationary distribution on $\mathbb{Z}$; that is, the walk is null recurrent. (We could have also deduced this from the previous example.)
Example 5.3. Consider a different Markov chain on $\mathbb{Z}$: let $P(x, x+1) = p$ and $P(x, x-1) = 1-p$ for all $x$.
Suppose $vP = v$. Then $v(x) = v(x-1)\, p + v(x+1)(1-p)$, or $v(x+1) = \frac{1}{1-p} \big( v(x) - p\, v(x-1) \big)$.
Solving such recursions is simple: set $u_x = \begin{pmatrix} v(x+1) & v(x) \end{pmatrix}^T$. So $u_{x+1} = \frac{1}{1-p} A u_x$, where
$$A = \begin{pmatrix} 1 & -p \\ 1-p & 0 \end{pmatrix}.$$
Since the characteristic polynomial of $A$ is $\lambda^2 - \lambda + p(1-p) = (\lambda - p)(\lambda - (1-p))$, the eigenvalues of $A$ are $p$ and $1-p$. One can easily check that $A$ is diagonalizable (for $p \ne 1/2$, since then the eigenvalues are distinct), and so
$$v(x) = u_x(2) = (1-p)^{-x} (A^x u_0)(2) = (1-p)^{-x} \cdot [0\ 1]\, M D^x M^{-1} u_0 = a \left( \frac{p}{1-p} \right)^x + b,$$
where $D$ is diagonal with $p, 1-p$ on the diagonal, and $a, b$ are constants that depend on the matrix $M$ and on $u_0$ (but not on $x$).
Thus, $\sum_x v(x)$ will only converge for $a = 0, b = 0$, which gives $v = 0$. That is, there is no stationary distribution, and $P$ is not positive recurrent.
In the future we will in fact see that $P$ is transient for $p \ne 1/2$, and for $p = 1/2$ we have already seen that $P$ is null recurrent.
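One can check numerically that $v(x) = a \left( \frac{p}{1-p} \right)^x + b$ indeed solves the recursion $v(x) = p\, v(x-1) + (1-p)\, v(x+1)$; the values of $a, b, p$ below are arbitrary:

```python
# Verify that v(x) = a * (p/(1-p))**x + b satisfies the stationarity recursion
# v(x) = p * v(x-1) + (1-p) * v(x+1) for the biased walk, for arbitrary a, b, p.
p, a, b = 0.7, 2.0, -1.5

def v(x):
    return a * (p / (1 - p)) ** x + b

for x in range(-5, 6):
    lhs = v(x)
    rhs = p * v(x - 1) + (1 - p) * v(x + 1)
    assert abs(lhs - rhs) < 1e-9
```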
Example 5.4. A chess knight moves on a chess board; at each step it chooses uniformly among the possible legal moves. Suppose the knight starts at a corner. What is the expected time it takes the knight to return to its starting point?
At first, this looks difficult...
However, let $G$ be the graph whose vertices are the squares of the chess board, $V(G) = \{1, 2, \ldots, 8\}^2$. Let $x = (1,1)$ be the starting point of the knight. For edges, we will connect two vertices if the knight can jump from one to the other in a legal move.
Thus, for example, a vertex in the "center" of the board has 8 adjacent vertices. A corner, on the other hand, has 2 adjacent vertices. In fact, we can determine the degree of every vertex:
[Figure: the legal moves of a knight.]
$$\begin{matrix}
2 & 3 & 4 & 4 & 4 & 4 & 3 & 2 \\
3 & 4 & 6 & 6 & 6 & 6 & 4 & 3 \\
4 & 6 & 8 & 8 & 8 & 8 & 6 & 4 \\
4 & 6 & 8 & 8 & 8 & 8 & 6 & 4 \\
4 & 6 & 8 & 8 & 8 & 8 & 6 & 4 \\
4 & 6 & 8 & 8 & 8 & 8 & 6 & 4 \\
3 & 4 & 6 & 6 & 6 & 6 & 4 & 3 \\
2 & 3 & 4 & 4 & 4 & 4 & 3 & 2
\end{matrix}$$
Summing all the degrees, one sees that $2|E(G)| = 4 \cdot (4 \cdot 8 + 4 \cdot 6 + 5 \cdot 4 + 2 \cdot 3 + 2) = 4 \cdot 84 = 336$. Thus, the stationary distribution is $\pi(i,j) = \deg(i,j)/336$. Specifically, $\pi(x) = 2/336$, and so $E_x[T_x^+] = 336/2 = 168$.
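The degree table and the resulting expected return time are easy to verify by brute force (a sketch, not part of the notes):

```python
# Compute the knight's-move degrees on the 8x8 board and recover E_x[T_x^+] = 168.
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def degree(i, j):
    # Number of legal knight moves from square (i, j), 1-indexed.
    return sum(1 for di, dj in moves
               if 1 <= i + di <= 8 and 1 <= j + dj <= 8)

total = sum(degree(i, j) for i in range(1, 9) for j in range(1, 9))
assert total == 336                  # 2|E(G)|
assert degree(1, 1) == 2             # the corner
assert total // degree(1, 1) == 168  # E_x[T_x^+] = 1/pi(x) = 336/2
```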
5.2. Summary so far
Let us sum up what we know so far about irreducible chains. If P is an irreducible Markov
chain, then:
• $E_x[V_\infty(x)] + 1 = \frac{1}{P_x[T_x^+ = \infty]}$.
• For all states $x, y$: $x$ is transient if and only if $y$ is transient.
• If $P$ is recurrent, the vector $v(z) = E_x[V_{T_x^+}(z)]$ is a positive left eigenvector for $P$, and any non-negative left eigenvector for $P$ is proportional to $v$.
• $P$ has a stationary distribution if and only if $P$ is positive recurrent.
• If $P$ is positive recurrent, then $\pi(x) E_x[T_x^+] = 1$.
5.3. Positive Recurrent Chains
Recall that Lemma 4.6 connects the expected number of visits to x up to an appropriate
stopping time, to the stationary distribution and the expected value of the stopping time:
Lemma* (4.6). [restatement] Let $(X_t)_t$ be Markov-$(P,\mu)$ for irreducible $P$. Assume $T$ is a stopping time such that
$$P_\mu[X_T = x] = \mu(x) \quad \text{for all } x.$$
Assume further that $1 \le T < \infty$ $P_\mu$-a.s. Let $v(x) = E_\mu[V_T(x)]$.
Then, $vP = v$. Moreover, if $E_\mu[T] < \infty$ then $P$ has a stationary distribution $\pi(x) = \frac{v(x)}{E_\mu[T]}$.
Good choices of the stopping time T for positive recurrent chains will give some nice identities.
• Proposition 5.5. Let $P$ be a positive recurrent chain with stationary distribution $\pi$. Then:
• $E_x[T_x^+] = \frac{1}{\pi(x)}$.
• $E_x[V_{T_x^+}(y)] = \frac{\pi(y)}{\pi(x)}$.
• For $x \ne y$,
$$1 + E_x[V_{T_y^+}(x)] = \pi(x) \cdot \big( E_y[T_x^+] + E_x[T_y^+] \big).$$
• For $x \ne y$,
$$\pi(x) P_x[T_y^+ < T_x^+] \cdot \big( E_y[T_x^+] + E_x[T_y^+] \big) = 1.$$
• (This is sometimes called "the edge commute inequality". It will be important in the future.) For $x \sim y$,
$$E_x[T_y] + E_y[T_x] \le \frac{1}{\pi(x) P(x,y)}.$$
Proof. We have:
• This follows by choosing $T = T_x^+$ in Lemma 4.6.
• We have already seen this. It also follows by choosing $T = T_x^+$ in Lemma 4.6.
• Let $T = \inf\{t \ge T_x + 1 : X_t = y\}$. So $E_y[T] = E_y[T_x^+] + E_x[T_y^+]$. Since $P_y[X_T = z] = 1_{z=y}$, we can apply Lemma 4.6. The strong Markov property at time $T_x$ gives that
$$E_y[V_T(x)] = E_y\Big[ \sum_{T_x \le k \le T} 1_{X_k = x} \Big] = E_x[V_{T_y^+}(x)] + 1.$$
So by Lemma 4.6,
$$E_x[V_{T_y^+}(x)] = E_y[V_T(x)] - 1 = \pi(x) E_y[T] - 1 = \pi(x) \cdot \big( E_y[T_x^+] + E_x[T_y^+] \big) - 1.$$
• This follows from the previous bullet, since $P_x$-a.s. $V_{T_y^+}(x) + 1 \sim \mathrm{Geo}(p)$ for $p = P_x[T_y^+ < T_x^+]$ (Exercise 3.3), so that $1 + E_x[V_{T_y^+}(x)] = 1/p$.
• Since for $x \sim y$ we have $P_x[T_y^+ < T_x^+] \ge P_x[X_1 = y] = P(x,y)$, we get the assertion from the previous bullet. □
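The edge commute inequality can be tested by simulation. On the triangle graph (a hypothetical example), $\pi(x) = 1/3$ and $P(x,y) = 1/2$ for neighbors, so the bound reads $E_x[T_y] + E_y[T_x] \le 6$, while the true value is $4$:

```python
import random

random.seed(1)

# Triangle graph on vertices {0, 1, 2}: every pair of vertices is adjacent.
def hitting_time(start, target):
    pos, t = start, 0
    while pos != target:
        pos = random.choice([v for v in (0, 1, 2) if v != pos])
        t += 1
    return t

trials = 50_000
est = sum(hitting_time(0, 1) + hitting_time(1, 0) for _ in range(trials)) / trials
assert abs(est - 4.0) < 0.1   # E_0[T_1] = E_1[T_0] = 2 on the triangle
assert est <= 6.0             # the edge commute bound 1/(pi(0) P(0,1)) = 6
```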
Number of exercises in lecture: 0
Total number of exercises until here: 10
Lecture 6: Convergence to Equilibrium
6.1. Convergence to Equilibrium
Recall that we saw that if $P^t(y,x) \to \pi(x)$ for all $x$, then $\pi$ must be a stationary distribution. We will now work our way toward the converse, at least for irreducible and aperiodic chains. Our goal:
Theorem* (6.5). [restatement] Let $(X_t)_t$ be an irreducible and aperiodic Markov chain. Suppose that $\pi$ is a stationary distribution for this chain. Then, for any starting distribution $\mu$ and any state $x$,
$$P_\mu[X_t = x] \to \pi(x).$$
6.2. Couplings
Example 6.1. Two gamblers walk into a casino in Las Vegas.
The first one plays a fair game - every round she wins a dollar with probability 1/2, and loses
a dollar with probability 1/2. All rounds are independent.
The second gambler plays an unfair game - every round he wins a dollar with probability
p < 1/2, and loses a dollar with probability 1− p, again all rounds independent.
It is extremely intuitive that the second gambler is worse off than the first. It should be the case that the probability that the second gambler goes bankrupt is at least that of the first. Also, it seems that any reasonable measure of success should be larger for the first gambler than for the second.
How can we mathematically prove this?
For example, we would like to show that for all starting positions $N$ and any $M > N$, we have that $P^1_N[T_0 < T_M] \le P^2_N[T_0 < T_M]$, where $P^i$ denotes the law of the $i$-th gambler's fortune. How can we show this?
The idea is to use couplings.
• Definition 6.2. A coupling of Markov chains $P, Q$ on a state space $S$ is a stochastic process $(X_t, Y_t)_t$ such that $(X_t)_t$ is Markov-$P$ and $(Y_t)_t$ is Markov-$Q$.
Note that $(X_t, Y_t)_t$ need not be a Markov chain on $S^2$. If a coupling $(X_t, Y_t)_t$ is in addition a Markov chain on $S^2$, then we say that $(X_t, Y_t)_t$ is a Markovian coupling. If $R$ is the transition matrix for the Markovian coupling $(X_t, Y_t)_t$, we say that $R$ is a coupling of $P, Q$.
Example 6.3. Let us use a Markovian coupling to show that lowering the winning probability for a gambler lowers their chances of winning.
Let $p < q$, and let $P$ be the transition matrix on $\mathbb{N}$ for the gambler that wins with probability $p$, and let $Q$ be the transition matrix for the gambler that wins with probability $q$. That is, $P(n, n+1) = p$ and $P(n, n-1) = 1-p$ for all $n > 0$, and $P(0,0) = 1$; similarly for $Q$.
The corresponding Markov chains are $(X_t)_t$ for $P$ and $(Y_t)_t$ for $Q$. We can couple the chains as follows: given $(X_t, Y_t)$, since $Y$ moves up with higher probability than $X$, we can organize the coupling so that $Y_{t+1} \ge X_{t+1}$ in any case. That is, given $(X_t, Y_t)$, if $X_t > 0$ let
$$(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + \begin{cases} (1,1) & \text{with probability } p \\ (-1,1) & \text{with probability } q-p \\ (-1,-1) & \text{with probability } 1-q. \end{cases}$$
If $X_t = 0, Y_t > 0$ let
$$(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + \begin{cases} (0,1) & \text{with probability } q \\ (0,-1) & \text{with probability } 1-q. \end{cases}$$
If $X_t = Y_t = 0$ let $X_{t+1} = Y_{t+1} = 0$.
It is immediate to check that this is indeed a coupling of $P$ and $Q$, and that $Y_t \ge X_t$ for all $t$, provided that $Y_0 \ge X_0$.
One can check that the resulting transition matrix is
$$R((n,m), (n+i, m+j)) = \begin{cases} p & i = 1, j = 1, n, m > 0 \\ q-p & i = -1, j = 1, n, m > 0 \\ 1-q & i = -1, j = -1, n, m > 0 \\ q & i = 0, j = 1, n = 0, m > 0 \\ 1-q & i = 0, j = -1, n = 0, m > 0 \\ 1 & i = 0, j = 0, n = m = 0. \end{cases}$$
So this is a Markovian coupling.
Thus, for any M > N ,
PQN [T0 < TM ] = PR(N,N)[∃ t : Yt = 0 and ∀ n < t Yn < M ]
≤ PR(N,N)[∃ t : Xt = 0 and ∀ n < t Xn < M ] = PPN [T0 < TM ],
where PP ,PQ,PR denote the probability measures for P , Q, and R respectively, and we have
used the fact that under PR(N,N), a.s. Xt ≤ Yt for all t. 454
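A simulation sketch of this coupling (with arbitrary values $p = 0.3 < q = 0.6$) confirms that the ordering $Y_t \ge X_t$ is preserved along the whole trajectory:

```python
import random

random.seed(2)

p, q = 0.3, 0.6  # p < q: the X-gambler is worse off than the Y-gambler

def step(x, y):
    # One step of the Markovian coupling R described above.
    u = random.random()
    if x > 0:
        if u < p:
            return x + 1, y + 1
        elif u < q:
            return x - 1, y + 1   # probability q - p
        else:
            return x - 1, y - 1   # probability 1 - q
    if y > 0:                     # x == 0 < y: only Y still moves
        return (0, y + 1) if u < q else (0, y - 1)
    return 0, 0                   # both gamblers absorbed at 0

x = y = 10
for _ in range(10_000):
    x, y = step(x, y)
    assert y >= x                 # the coupling preserves the ordering
```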
6.2.1. Coupling Time.
• Lemma 6.4. Let $(X_t, Y_t)_t$ be a Markovian coupling of two Markov chains on the same state space $S$ with the same transition matrix $P$. Define the coupling time as
$$\tau = \inf\{t \ge 0 : X_t = Y_t\}.$$
This is a stopping time for the Markov chain $(X_t, Y_t)_t$.
Define
$$Z_t = \begin{cases} X_t & t \le \tau \\ Y_t & t \ge \tau. \end{cases}$$
Then, $(Z_t)_t$ is a Markov chain with transition matrix $P$, started from $Z_0 = X_0$.
Specifically, $(Z_t, Y_t)_t$ is a coupling of Markov chains such that for all $t \ge \tau$, $Z_t = Y_t$.
Proof. Since $\{\tau \ge t+1\} = \{\tau < t+1\}^c \in \sigma((X_0, Y_0), \ldots, (X_t, Y_t))$, the Markov property at time $t$ gives
$$P[Z_{t+1} = y \mid Z_t = x, \tau \ge t+1, Z_{t-1}, \ldots, Z_0] = P[X_{t+1} = y \mid X_t = x, \tau \ge t+1, X_{t-1}, \ldots, X_0] = P(x,y).$$
Since $\tau$ is a stopping time, we can use the strong Markov property to deduce that for any $t$,
$$P[Z_{t+1} = y \mid Z_t = x, \tau \le t, Z_{t-1}, \ldots, Z_0] = P[Y_{t+1} = y \mid Y_t = x, \ldots, Y_\tau] = P(x,y).$$
Thus, for any $t$,
$$P[Z_{t+1} = y \mid Z_t = x, Z_{t-1}, \ldots, Z_0] = P[Z_{t+1} = y, \tau \ge t+1 \mid Z_t = x, Z_{t-1}, \ldots, Z_0] + P[Z_{t+1} = y, \tau \le t \mid Z_t = x, Z_{t-1}, \ldots, Z_0] = P(x,y) \cdot \big( P[\tau \ge t+1 \mid Z_t = x, Z_{t-1}, \ldots, Z_0] + P[\tau \le t \mid Z_t = x, Z_{t-1}, \ldots, Z_0] \big) = P(x,y). \qquad □$$
6.3. The Convergence Theorem
In this section we will prove a fundamental result in the theory of Markov chains.
••• Theorem 6.5. Let $P$ be an irreducible and aperiodic Markov chain. If $P$ has a stationary distribution $\pi$, then for any starting distribution $\mu$ and any state $x$,
$$P_\mu[X_t = x] \to \pi(x).$$
Proof. Let $(Y_t)_t$ be Markov-$(\pi, P)$, independent of $(X_t)_t$. Since $\pi P^t = \pi$, we have that $\pi(x) = P[Y_t = x]$. Let $\tau$ be the coupling time of $(X_t, Y_t)_t$.
First we show that $P[\tau < \infty] = 1$, so $P[\tau > t] \to 0$. Indeed, $(X_t, Y_t)_t$ is a Markov chain on $S^2$, with transition matrix $Q((x,y), (x',y')) = P(x,x') P(y,y')$. Moreover, for $\chi(x,y) = \pi(x)\pi(y)$, we get that $\chi$ is a stationary distribution for $Q$.
We claim that since $P$ is irreducible and aperiodic, $Q$ is also irreducible (and aperiodic). Indeed, let $(x,y), (x',y') \in S^2$. We already saw that there exist $t(x,x'), t(y,y')$ such that for all $t > t(x,x')$, $P^t(x,x') > 0$, and for all $t > t(y,y')$, $P^t(y,y') > 0$. Thus, for all $t > \max\{t(x,x'), t(y,y')\}$ we have that $Q^t((x,y), (x',y')) > 0$. Thus, $Q$ is irreducible.
Since $Q$ has a stationary distribution and $Q$ is irreducible, we get that $Q$ is positive recurrent. Specifically, $P[T_{(x,x)} < \infty] = 1$ for any $x \in S$. Since $\tau \le T_{(x,x)}$, we get that $P[\tau < \infty] = 1$.
Now define
$$Z_t = \begin{cases} Y_t & t \le \tau \\ X_t & t \ge \tau. \end{cases}$$
So $(X_t, Z_t)_t$ is a coupling of Markov chains such that for all $t \ge \tau$, $X_t = Z_t$. Also, since $Z_0 = Y_0 \sim \pi$,
$$P[Z_t = x] = P[Z_t = x, t < \tau] + P[Z_t = x, t \ge \tau] = P[Z_t = x, t < \tau] + P[X_t = x, t \ge \tau].$$
Adding this to
$$P[X_t = x] = P[X_t = x, t < \tau] + P[X_t = x, t \ge \tau],$$
we get that
$$|P[X_t = x] - P[Z_t = x]| = |P[X_t = x, t < \tau] - P[Z_t = x, t < \tau]| \le P[\tau > t] \to 0.$$
Finally, the previous lemma tells us that $(Z_t)_t$ is a Markov chain with transition matrix $P$ and, most importantly, starting distribution $\pi$. So $P[Z_t = x] = \pi(x)$. □
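A numerical illustration of the theorem on a small hypothetical chain: the birth-death chain below is irreducible and aperiodic, and iterating $\mu \mapsto \mu P$ drives the total variation distance to $\pi$ to zero.

```python
# An irreducible, aperiodic example chain on {0, 1, 2}.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

def left_mult(v, M):
    return [sum(v[i] * M[i][j] for i in range(3)) for j in range(3)]

mu = [1.0, 0.0, 0.0]       # start deterministically at state 0
for _ in range(200):
    mu = left_mult(mu, P)  # mu P^t

pi = [0.25, 0.5, 0.25]     # stationary: check via detailed balance by hand
tv = 0.5 * sum(abs(mu[j] - pi[j]) for j in range(3))
assert tv < 1e-10          # P_mu[X_t = .] -> pi in total variation
```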
Number of exercises in lecture: 0
Total number of exercises until here: 10
Lecture 7: Conditional Expectation
7.1. Conditional Probability
Recall that we want to define a random walk. A (simple) random walk is a process that, given the current location, chooses among the available neighbors uniformly. So we need a way of conditioning on the current position.
That is, we want the notions of conditional probability and conditional expectation.
The notion of conditional expectation is central to probability. It is developed using the Radon-Nikodym derivative from measure theory:
Johann Radon (1887-1956)
Otto Nikodym (1887-1974)
••• Theorem 7.1. Let $\mu, \nu$ be two probability measures on $(\Omega, \mathcal{F})$. Suppose that $\mu$ is absolutely continuous with respect to $\nu$; that is, $\nu(A) = 0$ implies that $\mu(A) = 0$ for all $A \in \mathcal{F}$.
Then, there exists a ($\nu$-a.s. unique) random variable $\frac{d\mu}{d\nu}$ on $(\Omega, \mathcal{F}, \nu)$ such that for any event $A \in \mathcal{F}$,
$$E_\mu[1_A] = E_\nu\Big[ 1_A \frac{d\mu}{d\nu} \Big].$$
X In the notation of Lebesgue integrals this takes the form
$$\int_A d\mu = \int_A \frac{d\mu}{d\nu}\, d\nu,$$
which can be informally stated as $\frac{d\mu}{d\nu}\, d\nu = d\mu$.
This theorem is used to prove the following theorem.
••• Theorem 7.2. Let $X$ be a random variable on a probability space $(\Omega, \mathcal{F}, P)$ such that $E[|X|] < \infty$. Let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Then, there exists a ($P$-a.s. unique) $\mathcal{G}$-measurable random variable $Y$ such that for all $A \in \mathcal{G}$, $E[Y 1_A] = E[X 1_A]$.
X Notation: An X such as above is called integrable.
X Notation: If Y is G-measurable then we write Y ∈ G.
• Definition 7.3. Let $X$ be an integrable ($E[|X|] < \infty$) random variable on a probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{F}$.
The random variable from the above theorem is denoted $E[X|\mathcal{G}]$.
If $Y$ is a random variable on $(\Omega, \mathcal{F}, P)$ then we denote $E[X|Y] := E[X|\sigma(Y)]$.
If $A \in \mathcal{F}$ is any event then we write
$$P[A|\mathcal{G}] := E[1_A|\mathcal{G}].$$
Proof of Theorem 7.2. Note that uniqueness is immediate from the fact that if $Y, Y'$ are two such random variables, then for $A_n = \{Y - Y' \ge n^{-1}\}$ we have that $A_n \in \mathcal{G}$ (as a measurable function of $(Y, Y')$) and
$$n^{-1} P[A_n] \le E[(Y - Y') 1_{A_n}] = E[X 1_{A_n}] - E[X 1_{A_n}] = 0.$$
So by continuity of probability,
$$P[Y > Y'] = P\Big[ \bigcup_n A_n \Big] = \lim_n P[A_n] = 0.$$
Exchanging the roles of $Y$ and $Y'$ we get that $P[Y \ne Y'] = 0$.
For existence we use the Radon-Nikodym derivative. First assume that $X \ge 0$. If $E[X] = 0$ then $X = 0$ a.s. and $Y = 0$ works, so assume $E[X] > 0$ and define a probability measure on $(\Omega, \mathcal{G})$ by
$$\forall\, A \in \mathcal{G}, \qquad Q(A) = \frac{E[X 1_A]}{E[X]}.$$
If $P[A] = 0$ then $X 1_A = 0$ a.s., so $Q(A) = 0$; that is, $Q \ll P$. So the Radon-Nikodym derivative exists, and for all $A \in \mathcal{G}$,
$$E[X 1_A] = E\Big[ \frac{dQ}{dP} 1_A \Big] E[X].$$
Taking $Y = \frac{dQ}{dP} E[X]$ completes the case of $X \ge 0$.
For the general case, recall that $X = X^+ - X^-$, where $X^+, X^-$ are non-negative. Let $Y_1 = E[X^+|\mathcal{G}]$ and $Y_2 = E[X^-|\mathcal{G}]$. Then, $Y_1 - Y_2 \in \mathcal{G}$ and for any $A \in \mathcal{G}$,
$$E[X 1_A] = E[X^+ 1_A] - E[X^- 1_A] = E[(Y_1 - Y_2) 1_A].$$
Thus, $Y = Y_1 - Y_2$ completes the proof. □
X Note that to prove that $Y = E[X|\mathcal{G}]$ one needs to show two things: $Y \in \mathcal{G}$, and $E[Y 1_A] = E[X 1_A]$ for all $A \in \mathcal{G}$.
X Important: Conditional expectation $E[X|\mathcal{G}]$ is the average value of $X$ given the information in $\mathcal{G}$; this is a random variable, not a number as is the usual expectation. One needs to be careful with this. Whenever we write $E[X|\mathcal{G}] = Z$ we actually mean that $E[X|\mathcal{G}] = Z$ a.s.
Exercise 7.1. Let $X$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$. Let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-algebra. Then:
• If $X \in \mathcal{G}$ then $E[X|\mathcal{G}] = X$. [The average value of $X$ given $X$ is $X$ itself.]
• If $\mathcal{G} = \{\emptyset, \Omega\}$ then $E[X|\mathcal{G}] = E[X]$. [Given no information, the average value of $X$ is $E[X]$.]
• If $X = c$ for $c$ a constant, then $X$ is measurable with respect to the trivial $\sigma$-algebra $\{\emptyset, \Omega\} \subset \mathcal{G}$, so $E[c|\mathcal{G}] = c$.
• If $X$ is independent of $\mathcal{G}$ then $E[X|\mathcal{G}] = E[X]$. [Given no information about $X$, the average value of $X$ is $E[X]$.]
• $E[E[X|\mathcal{G}]] = E[X]$.
Solution.
• It is trivial that $E[X 1_A] = E[X 1_A]$, so if $X \in \mathcal{G}$ then $X$ satisfies both properties required of the conditional expectation.
• Again, constants are measurable with respect to any $\sigma$-algebra. For the second property, $E[X 1_\emptyset] = 0 = E[E[X] 1_\emptyset]$ and $E[X 1_\Omega] = E[X] = E[E[X] 1_\Omega]$.
• Easy. Follows from the previous bullets.
• If $X$ is independent of $\mathcal{G}$, then for any $A \in \mathcal{G}$, $E[X 1_A] = E[X] P[A] = E[E[X] 1_A]$. Also, $E[X] \in \mathcal{G}$ since constants are measurable with respect to any $\sigma$-algebra.
• Consider the event $\Omega \in \mathcal{G}$. Since $1 = 1_\Omega$ we get that $E[X] = E[X 1_\Omega] = E[E[X|\mathcal{G}] 1_\Omega] = E[E[X|\mathcal{G}]]$. □
Exercise 7.2. If $Y = Y'$ a.s. then $E[X|Y] = E[X|Y']$. [Changing by measure 0 does not change the conditioning.]
Hint: Consider $E[X|\sigma(Y) \cap \sigma(Y')]$.
Solution. It suffices to prove that if $\mathcal{G}$ and $\mathcal{G}'$ are $\sigma$-algebras such that every $A \in \mathcal{G} \triangle \mathcal{G}'$ has $P[A] = 0$ (that is, $\mathcal{G}$ and $\mathcal{G}'$ only differ on measure-0 events), then $E[X|\mathcal{G}] = E[X|\mathcal{G}']$ a.s.
$\mathcal{G} \cap \mathcal{G}'$ is a $\sigma$-algebra, as an intersection of $\sigma$-algebras. Let $Z = E[X|\mathcal{G} \cap \mathcal{G}']$. Since $\mathcal{G} \cap \mathcal{G}' \subset \mathcal{G}$ and $\mathcal{G} \cap \mathcal{G}' \subset \mathcal{G}'$, we have that $Z$ is both $\mathcal{G}$- and $\mathcal{G}'$-measurable. Moreover, for any $A \in \mathcal{G}$: if $A \notin \mathcal{G}'$ then $P[A] = 0$, so $E[X 1_A] = 0 = E[Z 1_A]$. If $A \in \mathcal{G}'$ then $A \in \mathcal{G} \cap \mathcal{G}'$, so $E[X 1_A] = E[Z 1_A]$ by definition. Thus, $Z = E[X|\mathcal{G}]$. Similarly, exchanging the roles of $\mathcal{G}$ and $\mathcal{G}'$, we get $Z = E[X|\mathcal{G}']$, so $E[X|\mathcal{G}] = E[X|\mathcal{G}']$ a.s. □
Exercise 7.3. $E[aX + Y|\mathcal{G}] = a E[X|\mathcal{G}] + E[Y|\mathcal{G}]$ a.s.
Solution. The right-hand side is of course $\mathcal{G}$-measurable. For any $A \in \mathcal{G}$,
$$E[(aX + Y) 1_A] = a E[X 1_A] + E[Y 1_A] = a E[E[X|\mathcal{G}] 1_A] + E[E[Y|\mathcal{G}] 1_A] = E[(a E[X|\mathcal{G}] + E[Y|\mathcal{G}]) 1_A]. \qquad □$$
Exercise 7.4. If $X \le Y$ then $E[X|\mathcal{G}] \le E[Y|\mathcal{G}]$.
Solution. Since $Y - X \ge 0$, it suffices to show that if $X \ge 0$ then $E[X|\mathcal{G}] \ge 0$ a.s.
Let $A_n = \{E[X|\mathcal{G}] \le -n^{-1}\}$. So $A_n \in \mathcal{G}$ and
$$n^{-1} P[A_n] \le -E[E[X|\mathcal{G}] 1_{A_n}] = -E[X 1_{A_n}] \le 0.$$
So $P[A_n] = 0$ for all $n$, and thus $P[E[X|\mathcal{G}] < 0] = P[\exists\, n : A_n] = 0$. □
Exercise 7.5. Let $G \in \mathcal{G}$. Show that for any event $A$ with $P[A] > 0$,
$$P[G|A] = \frac{E[P[A|\mathcal{G}] 1_G]}{P[A]}.$$
Thomas Bayes (1701-1761)
Solution. Note that since $G \in \mathcal{G}$, by definition of conditional expectation,
$$E[P[A|\mathcal{G}] 1_G] = E[1_A 1_G] = P[A \cap G],$$
and dividing by $P[A]$ gives $P[G|A]$. □
7.2. More Properties
• Proposition 7.4 (Monotone Convergence). If $(X_n)_n$ is a monotone non-decreasing sequence of non-negative integrable random variables such that $X_n \nearrow X$ for some integrable $X$, then $E[X_n|\mathcal{G}] \nearrow E[X|\mathcal{G}]$ a.s.
Proof. Let $Y_n = X - X_n$. Since $X_n \nearrow X$, we get that $Y_n \ge 0$ for all $n$, and $(E[Y_n|\mathcal{G}])_n$ is a monotone non-increasing sequence of non-negative random variables. Let $Z(\omega) = \inf_n E[Y_n|\mathcal{G}](\omega) = \lim_n E[Y_n|\mathcal{G}](\omega) = \liminf_n E[Y_n|\mathcal{G}](\omega)$. So $Z \in \mathcal{G}$ and $Z \ge 0$. Fatou's Lemma gives that
$$E[Z] \le \liminf_n E[E[Y_n|\mathcal{G}]] = \liminf_n E[X - X_n] = 0,$$
since $E[X_n] \nearrow E[X]$ by monotone convergence. Thus, $Z = 0$ a.s. This implies that
$$E[X|\mathcal{G}] - E[X_n|\mathcal{G}] \xrightarrow{\text{a.s.}} 0. \qquad □$$
• Proposition 7.5. If $Z \in \mathcal{G}$ and $XZ$ is integrable, then $E[XZ|\mathcal{G}] = E[X|\mathcal{G}] Z$ a.s.
Proof. Note that $E[X|\mathcal{G}] Z \in \mathcal{G}$, so we only need to prove the second property.
We use the usual four-step proof: from indicators to simple random variables, to non-negative, to general.
If $Z = 1_B$ for some $B \in \mathcal{G}$, then for any $A \in \mathcal{G}$,
$$E[XZ 1_A] = E[X 1_{B \cap A}] = E[E[X|\mathcal{G}] 1_{B \cap A}] = E[E[X|\mathcal{G}] Z 1_A].$$
If $Z$ is simple, then $Z = \sum_k a_k 1_{A_k}$, and by linearity and the previous case,
$$E[XZ|\mathcal{G}] = \sum_k a_k E[X 1_{A_k}|\mathcal{G}] = \sum_k a_k E[X|\mathcal{G}] 1_{A_k} = E[X|\mathcal{G}] Z.$$
For general non-negative $Z$, in the case that $X$ is also non-negative, we approximate $Z$ by a non-decreasing sequence of simple random variables $Z_n \nearrow Z$, so that $X Z_n \nearrow XZ$, and by monotone convergence and the previous case,
$$E[XZ|\mathcal{G}] = \lim_n E[X Z_n|\mathcal{G}] = \lim_n E[X|\mathcal{G}] Z_n = E[X|\mathcal{G}] Z.$$
For a general $Z \in \mathcal{G}$ and general $X$, write $Z = Z^+ - Z^-$ and $X = X^+ - X^-$, with $0 \le Z^+, Z^- \in \mathcal{G}$ and $X^+, X^- \ge 0$. By the previous case and linearity,
$$E[X^\pm Z|\mathcal{G}] = E[X^\pm (Z^+ - Z^-)|\mathcal{G}] = E[X^\pm|\mathcal{G}](Z^+ - Z^-) = E[X^\pm|\mathcal{G}] Z,$$
which immediately leads to the assertion. □
The following properties all have their “usual” proof adapted to the conditional setting.
• Proposition 7.6 (Jensen's Inequality). If $g : \mathbb{R} \to \mathbb{R}$ is convex, and $X, g(X)$ are integrable, then
$$g(E[X|\mathcal{G}]) \le E[g(X)|\mathcal{G}].$$
Johan Jensen (1859-1925)
Proof. If $g$ is convex then for any $m$ there exist $a_m, b_m$ such that $g(s) \ge a_m s + b_m$ for all $s$, and $g(m) = a_m m + b_m$. Thus, for any $\omega \in \Omega$ there exist $A(\omega), B(\omega)$ such that $g(s) \ge A(\omega) s + B(\omega)$ for all $s$, and $g(E[X|\mathcal{G}](\omega)) = A(\omega) E[X|\mathcal{G}](\omega) + B(\omega)$. It is not difficult to see that $A, B$ are measurable and determined by $E[X|\mathcal{G}]$ and $g$, so $A, B$ are $\mathcal{G}$-measurable random variables. Thus,
$$g(E[X|\mathcal{G}]) = A E[X|\mathcal{G}] + B = E[AX + B|\mathcal{G}] \le E[g(X)|\mathcal{G}]. \qquad □$$
• Proposition 7.7 (Cauchy-Schwarz). If $X, Y$ are in $L^2(\Omega, \mathcal{F}, P)$, then
$$(E[XY|\mathcal{G}])^2 \le E[X^2|\mathcal{G}] \cdot E[Y^2|\mathcal{G}].$$
Augustin-Louis Cauchy
(1789-1857)
Proof. By Jensen's inequality, $E|E[XY|\mathcal{G}]| \le E[|XY|] \le \sqrt{E[X^2] E[Y^2]} < \infty$, so $E[XY|\mathcal{G}]$ is a.s. finite. On the event $\{E[Y^2|\mathcal{G}] = 0\}$ we have $Y = 0$ a.s., so both sides of the inequality vanish there; thus we may assume that $E[Y^2|\mathcal{G}] > 0$.
Set $\lambda = \frac{E[XY|\mathcal{G}]}{E[Y^2|\mathcal{G}]}$, which is a $\mathcal{G}$-measurable random variable. By linearity,
$$0 \le E[(X - \lambda Y)^2|\mathcal{G}] = E[X^2|\mathcal{G}] + \lambda^2 E[Y^2|\mathcal{G}] - 2\lambda E[XY|\mathcal{G}] = E[X^2|\mathcal{G}] - \frac{(E[XY|\mathcal{G}])^2}{E[Y^2|\mathcal{G}]}. \qquad □$$
Hermann Schwarz
(1843-1921)
• Proposition 7.8 (Markov / Chebyshev). If $X \ge 0$ is integrable, then for any $\mathcal{G}$-measurable $Z$ such that $Z > 0$,
$$P[X \ge Z|\mathcal{G}] \le \frac{E[X|\mathcal{G}]}{Z}.$$
Pafnuty Chebyshev
(1821-1894)
Proof. Let $Y = Z 1_{X \ge Z}$. So $Y \le X$. Thus,
$$Z\, P[X \ge Z|\mathcal{G}] = E[Y|\mathcal{G}] \le E[X|\mathcal{G}]. \qquad □$$
Remark 7.9. Suppose that $(\Omega, \mathcal{F}, P)$ is a probability space, and $\mathcal{G} \subset \mathcal{F}$ is some sub-$\sigma$-algebra. We have two associated vector spaces: $L^2(\Omega, \mathcal{G}, P) \subset L^2(\Omega, \mathcal{F}, P)$, the spaces of square-integrable $\mathcal{G}$-measurable random variables and square-integrable $\mathcal{F}$-measurable random variables. These spaces come equipped with an inner-product structure given by $\langle X, Y \rangle = E[XY]$. The theory of inner-product (or Hilbert) spaces tells us that $L^2(\Omega, \mathcal{F}, P) = L^2(\Omega, \mathcal{G}, P) \oplus V$, where $V$ is the orthogonal complement of $L^2(\Omega, \mathcal{G}, P)$ in $L^2(\Omega, \mathcal{F}, P)$. So we can project any $\mathcal{F}$-measurable square-integrable $X$ onto $L^2(\Omega, \mathcal{G}, P)$. This projection turns out to be exactly $X \mapsto E[X|\mathcal{G}]$.
In fact, it is immediate that $E[X|\mathcal{G}]$ is a square-integrable $\mathcal{G}$-measurable random variable. Moreover, for $Y \in L^2(\Omega, \mathcal{G}, P)$,
$$\langle X - E[X|\mathcal{G}], Y \rangle = E[XY - E[X|\mathcal{G}] Y] = E[XY] - E[E[XY|\mathcal{G}]] = 0.$$
Thus, to minimize $E[(X - Y)^2]$ over all $Y \in L^2(\Omega, \mathcal{G}, P)$, we can take $Y = E[X|\mathcal{G}]$.
7.2.1. The smaller σ-algebra always wins. Perhaps the most important property that has
no “unconditional” counterpart is
• Proposition 7.10. Let X be an integrable random variable on a probability space (Ω,F ,P).
Let H ⊂ G ⊂ F be sub-σ-algebras. Then,
• E[E[X|H]|G] = E[X|H].
• E[E[X|G]|H] = E[X|H].
Proof. The first assertion comes from the fact that E[X|H] ∈ H ⊂ G, so conditioning on G has
no effect.
For the second assertion we have that E[X|H] ∈ H of course, and for any A ∈ H, using that
A ∈ G as well,
E[E[X|G]1A] = E[E[X1A|G]] = E[X1A] = E[E[X|H]1A].
□
7.3. Partitioned Spaces
During this course, we will almost always use conditional probabilities conditioned on some
discrete random variable. Note that if Y is discrete with range R (perhaps d-dimensional), then
Σ_{r∈R} 1_{Y=r} = 1 a.s. This simplifies the discussion regarding conditional probabilities.
The main observation is the following
Exercise 7.6. Suppose that (Ω,F ,P) is a probability space with Ω = ⊎_{k∈I} A_k, where
A_k ∈ F for all k ∈ I, with I some countable (possibly finite) index set. Show that
σ((A_k)_{k∈I}) = { ⊎_{k∈J} A_k : J ⊂ I }.
Hint: Show that any set in the right-hand side must be in σ((A_k)_{k∈I}). Show that the right-hand
side is a σ-algebra.
• Lemma 7.11. Let X be an integrable random variable on (Ω,F ,P). Let I be some countable
index set (possibly finite). Suppose that P[⊎_{k∈I} A_k] = 1 where A_k ∈ F for all k, and P[A_k] > 0
for all k. Let G = σ((A_k)_{k∈I}). Then,
E[X|G] = Σ_k 1_{A_k} · E[X 1_{A_k}] / P[A_k].
Proof. Let Y = Σ_k 1_{A_k} · E[X 1_{A_k}] / P[A_k]. Then of course Y ∈ G. For any A ∈ G we have that
1_A = Σ_{k∈J} 1_{A_k} (P-a.s.) for some J ⊂ I. Thus,
E[Y 1_A] = Σ_{k∈J} E[1_{A_k}] · E[X 1_{A_k}] / P[A_k] = Σ_{k∈J} E[X 1_{A_k}] = E[X 1_A].
□
• Corollary 7.12. Let Y be a discrete random variable with range R on (Ω,F ,P). Let X be an
integrable random variable on the same space. Then,
E[X|Y ] = Σ_{r∈R} 1_{Y=r} · E[X 1_{Y=r}] / P[Y = r] = Σ_{r∈R} 1_{Y=r} E[X|Y = r],
where we take the convention that E[X|Y = r] = E[X 1_{Y=r}] / P[Y = r] = 0 when P[Y = r] = 0.
Proof. Ω = ⊎_{r∈R} {Y = r}. □
X Note that E[X|Y ] is a discrete random variable as well, regardless of the original distribution
of X.
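On a finite probability space the formula of Corollary 7.12 can be checked by hand. The following is a minimal sketch (the space, the variables X, Y and the helper cond_exp are invented for illustration), computing E[X|Y ] exactly over the rationals and verifying the tower property E[E[X|Y ]] = E[X].

```python
from fractions import Fraction

# A finite probability space: a fair six-sided "die" with faces 0..5.
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
X = {w: w ** 2 for w in omega}        # the integrable random variable
Y = {w: w % 2 for w in omega}         # discrete Y partitions Omega into evens/odds

def cond_exp(X, Y, P):
    """E[X|Y] as a random variable, via Corollary 7.12:
    on the event {Y = r}, E[X|Y] equals E[X 1_{Y=r}] / P[Y = r]."""
    values = {}
    for r in set(Y.values()):
        pr = sum(P[w] for w in P if Y[w] == r)           # P[Y = r]
        ex = sum(X[w] * P[w] for w in P if Y[w] == r)    # E[X 1_{Y=r}]
        values[r] = ex / pr
    return {w: values[Y[w]] for w in P}

E_X_given_Y = cond_exp(X, Y, P)
# On the evens {0,2,4}: (0 + 4 + 16)/3 = 20/3; on the odds {1,3,5}: (1 + 9 + 25)/3 = 35/3.
```

Note that, as remarked above, E[X|Y ] takes only finitely many values here, regardless of the values of X.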
Number of exercises in lecture: 6
Total number of exercises until here: 16
Random Walks
Ariel Yadin
Lecture 8: Martingales
8.1. Martingales
X Do conditional expectation
• Definition 8.1. Let (Ω,F ,P) be a probability space. A filtration is a monotone sequence of
sub-σ-algebras F0 ⊂ F1 ⊂ · · · ⊂ F .
A sequence (Xn)n of random variables is said to be adapted to a filtration (Fn)n if for all
n, Xn ∈ Fn.
• Definition 8.2. Let (Ω,F ,P) be a probability space, and let (Fn)n be a filtration. A sequence
(Xn)n is said to be a martingale with respect to the filtration (Fn)n, or sometimes a (Fn)n-
martingale, if for all n,
• E[|Xn|] <∞ (i.e. Xn is integrable).
• E[Xn+1|Fn] = Xn.
• (Xn)n is adapted to (Fn)n.
If the filtration is not specified then we say that (Xn)n is a martingale if it is a martingale
with respect to the natural filtration Fn := σ(X0, . . . , Xn); that is, a sequence of integrable
random variables such that for all n,
E[Xn+1|Xn, . . . , X0] = Xn.
Exercise 8.1. Show that if (Xn)n is an Fn-martingale then (Xn)n is also a
martingale with respect to the natural filtration (σ(X0, . . . , Xn))n. (Hint: Show that for all
n, σ(X0, . . . , Xn) ⊂ Fn.)
Example 8.3. Let (Xn)n be a simple random walk on Z started at X0 = 0. The Markov
property gives that
E[Xn+1|Xn, . . . , X0] = (1/2)(Xn + 1) + (1/2)(Xn − 1) = Xn.
So (Xn)n is a martingale. ⋄
Example 8.4. More generally, if (Xn)n is a sequence of independent random variables with
E[Xn] = 0 for all n, and Sn = Σ_{k=0}^n X_k, then
E[Sn+1|Sn, . . . , S0] = Sn + E[Xn+1|Sn, . . . , S0].
Since Sn, . . . , S0 ∈ σ(X0, . . . , Xn) and since Xn+1 is independent of σ(X0, . . . , Xn), we have that
E[Xn+1|Sn, . . . , S0] = E[Xn+1] = 0.
So, in conclusion, (Sn)n is a martingale. ⋄
• Proposition 8.5. Let (Xn)n be a (Fn)n-martingale. For any k ≤ n we have E[Xn|Fk] = Xk.
Proof. For k = n this is obvious. Assume that k < n. By properties of conditional expectation,
because Fk ⊂ Fn−1,
E[Xn|Fk] = E[E[Xn|Fn−1]|Fk] = E[Xn−1|Fk].
Continuing inductively, we get the proposition. □
Exercise 8.2. Let (Xn)n be a (Fn)n-martingale. Let T be a stopping time (with respect
to the filtration (Fn)n). Prove that (Yn := XT∧n)n is a (Fn)n-martingale.
••• Theorem 8.6 (Optional Stopping). Let (Xn)n be an (Fn)n-martingale and T a stopping
time. We have that E[XT |X0] = X0 in the following cases:
• If T is bounded; that is if T ≤ t a.s. for some 0 < t <∞.
• If T is a.s. finite and there exists M > 0 such that |Xn| ≤ M for all n a.s. ((Xn)n is
bounded).
• If E[T ] < ∞ and there exists M > 0 such that |Xn+1 −Xn| ≤ M for all n a.s. ((Xn)n
has bounded increments).
Proof. We start with the first case: Let Yn = XT∧n. Since T ≤ t a.s. we get that Yt = XT .
Since Y0 = X0 we conclude
E[XT |X0] = E[Yt|Y0] = Y0 = X0.
For the second case: Let Yn = XT∧n as above. We have
|E[Yn|X0]− E[XT |X0]| = |E[(XT∧n −XT ) · 1_{T>n}|X0]| ≤ 2M · P[T > n|X0] → 0,
because T < ∞ a.s. Thus, since T ∧ n is a bounded stopping time,
E[XT |X0] = lim_{n→∞} E[Yn|Y0] = Y0 = X0.
Finally, for the third case: Note that
|XT∧n −XT | · 1_{T>n} ≤ Σ_{k=n}^{T−1} |Xk+1 −Xk| · 1_{T>n} ≤ M T 1_{T>n}.
Thus, similarly to the above,
|E[XT∧n|X0]− E[XT |X0]| ≤ M E[T 1_{T>n}|X0].
Since T 1_{T>n} → 0, and since E[T ] < ∞, we get by dominated convergence that E[T 1_{T>n}] → 0,
and so
X0 = E[XT∧n|X0] → E[XT |X0].
□
Let us use martingales to calculate some probabilities.
Example 8.7 (Gambler’s Ruin). Let (Xt)t be a simple random walk on Z. Let T = T_{{0,n}} be
the first time the walk is at 0 or n.
We can think of Xt as the amount of money a gambler playing a fair game has after the
t-th game. What is the probability that a gambler that starts with x reaches n before going
bankrupt?
Let
pn(x) = Px[Tn < T0].
Since (Xt)t is a martingale, we get that (Xt∧T )t is a bounded martingale under the measure Px.
Since T is a.s. finite, we can apply the optional stopping theorem to get
x = Ex[X0] = Ex[XT ]
= Ex[XT |Tn < T0] · pn(x) + Ex[XT |T0 < Tn] · (1− pn(x)) = pn(x) · n.
So pn(x) = x/n. ⋄
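The identity pn(x) = x/n can also be checked deterministically: pn is the unique solution of the linear system pn(x) = (pn(x−1) + pn(x+1))/2 for 0 < x < n, with pn(0) = 0 and pn(n) = 1. Below is a sketch (the helper hitting_prob and its interface are mine, not from the notes) solving this system exactly over the rationals by Gaussian elimination.

```python
from fractions import Fraction

def hitting_prob(P, target, absorbed):
    """Solve h(x) = sum_y P[x][y] h(y) for non-absorbed x, with h = 1 on
    `target` and h = 0 on the other absorbing states; a minimal sketch
    using Gauss-Jordan elimination over the rationals."""
    free = [x for x in sorted(P) if x not in absorbed]
    idx = {x: i for i, x in enumerate(free)}
    n = len(free)
    # (I - Q) h = b, where Q is P restricted to the free states and
    # b(x) is the one-step probability of jumping straight into `target`.
    A = [[Fraction(int(i == j)) - Fraction(P[x].get(y, 0)) for j, y in enumerate(free)]
         for i, x in enumerate(free)]
    b = [Fraction(sum(P[x].get(z, 0) for z in target)) for x in free]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return {x: b[idx[x]] / A[idx[x]][idx[x]] for x in free}

# Simple random walk on {0, ..., n}, absorbed at 0 and n.
n = 10
half = Fraction(1, 2)
P = {x: ({x - 1: half, x + 1: half} if 0 < x < n else {x: 1}) for x in range(n + 1)}
p = hitting_prob(P, target={n}, absorbed={0, n})
# p[x] == x/n, matching the optional-stopping computation above.
```

(For the simple random walk one can also read the answer off directly: the system forces pn to be linear in x.)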
Remark 8.8. This is another proof that Z is recurrent:
Let An = {Tn < T0^+}. So (An)n is a decreasing sequence of events. Thus,
P1[∩n An] = limn P1[An] = limn 1/n = 0.
By symmetry,
P−1[∩n A−n] = 0.
Now, the event that the walk never returns to 0 is the event that the walk takes a step to either
1 or −1 and then never returns to 0; i.e.
{T0^+ = ∞} = {X1 = 1, ∩n An} ⊎ {X1 = −1, ∩n A−n}.
The Markov property gives
P0[T0^+ = ∞] = (1/2) · P1[∩n An] + (1/2) · P−1[∩n A−n] = 0.
Example 8.9. What about the amount of time it takes to reach 0 or n?
Consider Yt = Xt^2 − t. Then,
E[Yt+1|X0, . . . , Xt] = (1/2) · ((Xt + 1)^2 − (t+ 1) + (Xt − 1)^2 − (t+ 1)) = Yt.
So (Yt)t is a martingale, and thus (YT∧t)t is a martingale under the measure Px. Thus, for all t,
x^2 = Ex[Y0] = Ex[YT∧t] = Ex[X_{T∧t}^2]− Ex[T ∧ t].
Since X_{T∧t}^2 ≤ n^2 and T ∧ t ↑ T , taking t → ∞ (bounded convergence for the first term,
monotone convergence for the second) gives
x^2 = Ex[X_T^2]− Ex[T ] = pn(x) · n^2 − Ex[T ].
So by the previous example, for any 0 ≤ x ≤ n,
Ex[T ] = xn− x^2 = x(n− x). ⋄
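The formula Ex[T ] = x(n − x) can likewise be checked by solving the linear system t(x) = 1 + (t(x−1) + t(x+1))/2 with t(0) = t(n) = 0. A sketch via a "shooting method" (the helper exit_times is mine): since the recursion t(x+1) = 2t(x) − t(x−1) − 2 is affine in the unknown t(1), we carry each t(x) as a pair of coefficients and fix t(1) at the end by imposing t(n) = 0.

```python
from fractions import Fraction

def exit_times(n):
    # Represent t(x) = alpha + beta * t(1), with t(1) unknown.
    t = [(Fraction(0), Fraction(0)), (Fraction(0), Fraction(1))]  # t(0) = 0, t(1)
    for x in range(1, n):
        a, b = t[x]
        a0, b0 = t[x - 1]
        t.append((2 * a - a0 - 2, 2 * b - b0))  # t(x+1) = 2 t(x) - t(x-1) - 2
    a, b = t[n]
    s = -a / b                                  # impose the boundary condition t(n) = 0
    return [alpha + beta * s for alpha, beta in t]

n = 10
t = exit_times(n)
# t[x] == x * (n - x), as derived from the martingale X_t^2 - t.
```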
Remark 8.10. This is another proof that Z is null-recurrent:
Under P0, the event {Tn < T0^+} implies that T0^+ ≥ 2n (the walk needs at least n steps to
reach n and at least n more to return). So,
P0[T0^+ ≥ 2n] ≥ P0[X1 = 1, Tn < T0^+] = (1/2) · P1[Tn < T0] = 1/(2n).
Since P0[T0^+ ≥ 2n− 1] ≥ P0[T0^+ ≥ 2n], we get that
E0[T0^+] = Σ_{m=0}^∞ P0[T0^+ > m] = Σ_{m=1}^∞ P0[T0^+ ≥ m]
= Σ_{n=1}^∞ (P0[T0^+ ≥ 2n− 1] + P0[T0^+ ≥ 2n]) ≥ Σ_{n=1}^∞ 1/n = ∞.
Example 8.11. Consider the martingale Xt^2 − t. Using the optional stopping theorem at time
T = T0^+ under P1, we would get
1 = E1[X0^2 − 0] = E1[X_T^2 − T ] = −E1[T ],
i.e. E1[T ] = −1. Similarly, E−1[T ] = −1. Since
E0[T0^+] = (1/2) · (E0[T0^+|X1 = 1] + E0[T0^+|X1 = −1]) = (1/2) · (E1[T0 + 1] + E−1[T0 + 1]),
we would get that E0[T0^+] = (1/2) · (0 + 0) = 0 < ∞!
Where did we go wrong?
We could not use the optional stopping theorem, because the martingale Xt^2 − t is not bounded!
⋄
Example 8.12. Actually, this last bit gives a third proof that E0[T0^+] = ∞. Suppose that
Ex[T0] < ∞ for some x ≠ 0. Since (Xt)t is a martingale with bounded differences, by the optional
stopping theorem x = Ex[XT0 ]. But XT0 = 0 a.s., a contradiction. So Ex[T0] = ∞ for all x ≠ 0.
Using the Markov property,
E0[T0^+] = (1/2) · (E1[T0 + 1] + E−1[T0 + 1]) = ∞.
⋄
Number of exercises in lecture: 2
Total number of exercises until here: 18
Random Walks
Ariel Yadin
Lecture 9: Reversible Chains
9.1. Time Reversal
Let (Xt)t be Markov-P . Then, conditioned on Xt, we have that X[0, t] and X[t,∞) are
independent. This suggests looking at the chain run backwards in time - since determining the
past given the future will only depend on the current state.
However, in accordance with the second law of thermodynamics (entropy always increases),
we know that nice enough chains converge to a stationary distribution, even if the chain is started
from a very ordered distribution - namely a δ-measure. This suggests that there is a specific
direction we are looking at, and that the chain is moving from order to disorder represented by
the stationary measure.
However, if we start the chain from the stationary distribution, perhaps we can view the chain
both forwards and backwards in time. This is the content of the following.
• Definition 9.1. Let P be an irreducible Markov chain with stationary distribution π. Define
P̂ (x, y) = π(y)P (y, x)/π(x). P̂ is called the time reversal of P .
The next theorem justifies the name time reversal.
••• Theorem 9.2. Let π be the stationary distribution for an irreducible Markov chain P .
Then, P̂ is an irreducible Markov chain, and π is a stationary distribution for P̂ .
Moreover: Let (Xt)t be Markov-(π, P ). Fix any T > 0 and define Yt = XT−t, t = 0, . . . , T .
Then, (Yt)_{t=0}^T is Markov-(π, P̂ ).
Proof. The fact that P̂ is a Markov chain follows from
Σ_y P̂ (x, y) = Σ_y π(y)P (y, x) · (1/π(x)) = π(x)/π(x) = 1.
Also,
(πP̂ )(x) = Σ_y π(y)P̂ (y, x) = Σ_y π(y) · (π(x)/π(y)) · P (x, y) = π(x) · Σ_y P (x, y) = π(x),
so π is stationary for P̂ .
Finally, note that π(x)P̂ (x, y) = π(y)P (y, x). So,
P[Y0 = x0, . . . , YT = xT ] = Pπ[X0 = xT , X1 = xT−1, . . . , XT = x0]
= π(xT )P (xT , xT−1) · · ·P (x1, x0) = P̂ (xT−1, xT ) · P̂ (xT−2, xT−1) · · · P̂ (x0, x1) · π(x0)
= π(x0) · P̂ (x0, x1) · P̂ (x1, x2) · · · P̂ (xT−1, xT ).
□
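The construction of P̂ is easy to check numerically. Below is a sketch on an invented 3-state chain that circulates 0 → 1 → 2 → 0 with probability 3/4 (doubly stochastic, so the uniform π is stationary, but the chain is not reversible): P̂ is again stochastic, π is stationary for it, and P̂ runs the cycle backwards.

```python
from fractions import Fraction

F = Fraction
# A circulating 3-state chain; each column also sums to 1, so the uniform
# distribution pi is stationary.
P = [[F(0), F(3, 4), F(1, 4)],
     [F(1, 4), F(0), F(3, 4)],
     [F(3, 4), F(1, 4), F(0)]]
pi = [F(1, 3)] * 3

# Time reversal as in Definition 9.1: Phat(x, y) = pi(y) P(y, x) / pi(x).
Phat = [[pi[y] * P[y][x] / pi[x] for y in range(3)] for x in range(3)]
```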
9.2. Reversible Chains
Recall the following definition:
• Definition 9.3. Let P be a Markov chain on S. A probability measure on S, π, is said to
satisfy the detailed balance equations if for all x, y ∈ S,
π(x)P (x, y) = π(y)P (y, x).
We also say that P and π are in detailed balance.
We also proved in the exercises that if P and π are in detailed balance, then π must be a
stationary distribution for P . (The opposite is not necessarily true, as is shown in the exercises.)
Immediately we see a connection between detailed balance and time reversals:
• Proposition 9.4. Let P be a Markov chain with stationary distribution π. The following are
equivalent:
• P and π are in detailed balance.
• P̂ = P .
• For any T > 0, (Xt)_{t=0}^T is Markov-(π, P ) if and only if (XT−t)_{t=0}^T is Markov-(π, P ).
[ The time reversal is the same as the forward-time chain. ]
Proof. We show that each bullet implies the one after it.
If P and π are in detailed balance, then for any states x, y,
P̂ (x, y) = π(y)P (y, x)/π(x) = π(x)P (x, y) · (1/π(x)) = P (x, y).
So P̂ = P .
If P̂ = P then for any T > 0, if (Xt)_{t=0}^T is Markov-(π, P ) then (XT−t)_{t=0}^T is Markov-(π, P̂ ).
Since P̂ = P we get that (XT−t)_{t=0}^T is Markov-(π, P ). Reversing the roles of Xt and XT−t we
get that for all T > 0, (Xt)_{t=0}^T is Markov-(π, P ) if and only if (XT−t)_{t=0}^T is Markov-(π, P ).
Now for the third implication, assume that for all T > 0, (Xt)_{t=0}^T is Markov-(π, P ) if and
only if (XT−t)_{t=0}^T is Markov-(π, P ). Take T = 1. Then (X0, X1) is Markov-(π, P ) if and only if
(X1, X0) is Markov-(π, P ). That is,
π(x)P (x, y) = Pπ[X0 = x,X1 = y] = Pπ[X1 = y,X0 = x] = π(y)P (y, x).
So P and π are in detailed balance. □
9.3. Reversible chains as weighted graphs
• Definition 9.5. Let G be a graph. A conductance on G is a function c : V (G)2 → [0,∞)
satisfying
• c(x, y) = c(y, x) for all x, y.
• c(x, y) > 0 if and only if x ∼ y.
The pair (G, c) is called a weighted graph, or sometimes a network or electric network.
Remark 9.6. Let (G, c) be a weighted graph, with C = Σ_{x,y} c(x, y) < ∞. Define cx = Σ_y c(x, y)
and P (x, y) = c(x, y)/cx. P is a stochastic matrix, and so defines a Markov chain. For π(x) = cx/C
we have that π is a distribution, and π(x)P (x, y) = c(x, y)/C = c(y, x)/C = π(y)P (y, x). Thus, P is
reversible.
We will refer to such a P as the random walk on G induced by c.
On the other hand, if P is a reversible Markov chain on S, we can define a weighted graph as
follows: Let V (G) = S and c(x, y) = π(x)P (x, y). Let x ∼ y if c(x, y) > 0. Note that
Σ_{x,y} c(x, y) = Σ_{x,y} π(x)P (x, y) = 1.
Also, we see that P is the random walk induced by (G, c).
X Connection to multiple edges and self-loops.
• Definition 9.7. If (G, c) is a weighted graph with Σ_{x,y} c(x, y) < ∞, then the Markov chain
P (x, y) = c(x, y) / Σ_z c(x, z)
is called the weighted random walk on G with weights c.
Example 9.8. Let (G, c) be the graph V (G) = {0, 1, 2}, with edges E(G) = {{0, 1} , {1, 2} , {0, 2}}
and c(0, 1) = 1, c(1, 2) = 2 and c(2, 0) = 3.
The weighted random walk is then
P =
[ 0    1/4  3/4 ]
[ 1/3  0    2/3 ]
[ 3/5  2/5  0   ].
The stationary measure is, of course, π(x) = Σ_y c(x, y) / Σ_{z,w} c(z, w), so π = [ 1/3 1/4 5/12 ] is the
stationary distribution.
We can compute that P̂ = P (which is expected since P is reversible). ⋄
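The numbers in this example are easy to recompute; a sketch (the dictionary encoding of the conductances is just one possible choice) that builds P and π from c and checks detailed balance:

```python
from fractions import Fraction

# Conductances of Example 9.8: c(0,1) = 1, c(1,2) = 2, c(2,0) = 3.
c = {(0, 1): 1, (1, 2): 2, (2, 0): 3}
c.update({(y, x): w for (x, y), w in c.items()})   # conductances are symmetric

cx = {x: sum(w for (a, _), w in c.items() if a == x) for x in range(3)}
C = sum(c.values())                                # sum over ordered pairs = 12

P = {(x, y): Fraction(w, cx[x]) for (x, y), w in c.items()}
pi = {x: Fraction(cx[x], C) for x in range(3)}
# pi(x) P(x,y) = c(x,y)/C = pi(y) P(y,x): detailed balance, so Phat = P.
```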
Example 9.9 (One dimensional Markov chains are almost reversible). Let P be a Markov
chain on Z such that P (x, y) > 0 if and only if |x− y| = 1. For x ∈ Z let px = P (x, x+ 1) (so
1− px = P (x, x− 1)).
Consider the following conductances on Z: Let c(0, 1) = 1. For x > 0 set
c(x, x+ 1) = Π_{y=1}^{x} py/(1− py).
Let c(0,−1) = (1− p0)/p0, and for x < 0 set
c(x, x− 1) = Π_{y=x}^{0} (1− py)/py.
Note that for any x ∈ Z we have that
c(x, x+ 1) = c(x− 1, x) · px/(1− px),
so
c(x, x+ 1) / (c(x, x+ 1) + c(x, x− 1)) = (px/(1− px)) / (px/(1− px) + 1) = px.
So P is the weighted random walk with weights given by c.
Moreover, note that
(c(x, x− 1) + c(x, x+ 1)) · P (x, x+ 1) = c(x, x− 1) · (px/(1− px)) = c(x, x+ 1)
and
(c(x+ 1, x) + c(x+ 1, x+ 2)) · P (x+ 1, x) = c(x, x+ 1) · (1/(1− p_{x+1})) · (1− p_{x+1}) = c(x, x+ 1).
So for m(x) = c(x, x− 1) + c(x, x+ 1) we have that m(x)P (x, y) = m(y)P (y, x) for all x, y. That
is, if m was a distribution, P would be reversible.
To normalize m to be a distribution we would need that
Σ_x m(x) = Σ_x (c(x, x− 1) + c(x, x+ 1)) = 2 Σ_x c(x, x+ 1) < ∞.
For example, if px = 1/3 for x > 0, px = 2/3 for x < 0, and p0 = 1/2, we would have that
c(x, x+ 1) = 2^{−x} for x ≥ 0 and c(x, x− 1) = 2^{x} for x ≤ 0. Thus
Σ_x m(x) = 2 Σ_x c(x, x+ 1) = 2 · (Σ_{x=0}^∞ 2^{−x} + Σ_{x<0} 2^{x+1}) = 2 · (2 + 2) = 8 < ∞.
So π(x) = (c(x, x− 1) + c(x, x+ 1))/8 is a stationary distribution.
In general, we see that a drift towards 0 would give a reversible chain. ⋄
Number of exercises in lecture: 0
Total number of exercises until here: 18
Random Walks
Ariel Yadin
Lecture 10: Discrete Analysis
10.1. Laplacian
In order to study electric networks and conductances, we will first introduce the concept of
harmonic functions.
Let G = (V (G), c) be a network; recall that by this we mean: c : V (G)× V (G)→ [0,∞) with
c(x, y) = c(y, x) for all x, y ∈ G and cx := Σ_y c(x, y) < ∞ for all x. We denote by E(G) the set
of oriented edges of G; that is,
E(G) = {(x, y) : c(x, y) > 0} .
(We write x ∼ y when c(x, y) > 0.) For e ∈ E(G) we write e = (e+, e−). c is known as the
conductance of the network.
Let C0(V ) = {f : V (G)→ R} and C0(E) = {f : E(G)→ R} be the sets of all functions of
vertices and (oriented) edges of G respectively.
We can define an operator ∇ : C0(V )→ C0(E) by: for any edge x ∼ y,
(∇f)(x, y) = c(x, y)(f(x)− f(y)).
We can also define an operator div : C0(E)→ C0(V ) by
(divF )(x) = Σ_{y∼x} (1/cx) · (F (x, y)− F (y, x)).
We can consider the spaces C0(V ), C0(E) with the inner products
〈f, f ′〉 = Σ_x cx f(x)f ′(x) and 〈F, F ′〉 = Σ_e (1/c(e)) · F (e)F ′(e).
Consider the subspaces L2(V ) = {f ∈ C0(V ) : 〈f, f〉 < ∞} and L2(E) = {F ∈ C0(E) : 〈F, F 〉 < ∞}.
The operator ∇ is a linear operator from L2(V ) to L2(E). Also div : L2(E)→ L2(V ) is a linear
operator, and
〈∇f, F 〉 = Σ_{(x,y)} (f(x)− f(y))F (x, y) = Σ_{x∼y} f(x)(F (x, y)− F (y, x))
= Σ_x cx f(x) Σ_{y∼x} (1/cx)(F (x, y)− F (y, x)) = 〈f, divF 〉.
So ∇∗ = div and div∗ = ∇: the operators are dual to each other.
Recall that the weighted random walk on the network G is just the Markov process with
transition matrix given by P (x, y) = c(x, y)/cx.
Define the operator ∆ : C0(V )→ C0(V ) by ∆ = (1/2) div∇. That is,
∆f(x) = (1/2) div∇f(x) = (1/2) Σ_{y∼x} (1/cx)(∇f(x, y)−∇f(y, x)) = Σ_y P (x, y)(f(x)− f(y)).
Exercise 10.1. Show that (in matrix form) ∆ = I − P where I is the identity
operator.
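The identity ∆ = I − P of the exercise can be sanity-checked numerically; below is a sketch on the triangle network of Example 9.8 (the encoding and the test function f are mine), implementing ∇, div and ∆ = (1/2) div∇ literally and comparing with f − Pf.

```python
from fractions import Fraction

# The triangle network again: c(0,1) = 1, c(1,2) = 2, c(2,0) = 3.
c = {(0, 1): Fraction(1), (1, 2): Fraction(2), (2, 0): Fraction(3)}
c.update({(y, x): w for (x, y), w in c.items()})
V = range(3)
cx = {x: sum(c[x, y] for y in V if (x, y) in c) for x in V}

def grad(f):
    # (grad f)(x, y) = c(x, y)(f(x) - f(y))
    return {(x, y): c[x, y] * (f[x] - f[y]) for (x, y) in c}

def div(F):
    # (div F)(x) = sum_{y ~ x} (1/c_x)(F(x, y) - F(y, x))
    return {x: sum((F[x, y] - F[y, x]) / cx[x] for y in V if (x, y) in c) for x in V}

def laplacian(f):
    # Delta = (1/2) div grad
    return {x: d / 2 for x, d in div(grad(f)).items()}

f = {0: Fraction(5), 1: Fraction(-1), 2: Fraction(2)}
Pf = {x: sum(c[x, y] / cx[x] * f[y] for y in V if (x, y) in c) for x in V}
# laplacian(f)[x] == f[x] - Pf[x] for every x, i.e. Delta = I - P.
```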
10.2. Harmonic functions
• Definition 10.1. A function f : V (G) → R is called harmonic at x if ∆f(x) = 0. f is
said to be harmonic on A if for all x ∈ A, f is harmonic at x. f is said to be harmonic, if f is
harmonic at all x.
Harmonic functions and martingales are intimately related.
• Proposition 10.2. Let G = (V (G), c) be a network. Let f : G→ R be a function. Let S ⊂ G
and let T = T_{S^c} be the first exit time of S, for (Xt)t the weighted random walk on G.
Then, f is harmonic in S if and only if the sequence (Mt = f(Xt∧T ))t is a martingale under
Px for all x.
Proof. First assume that f is harmonic in S. Note that if x ∉ S then Xt∧T = X0 = x a.s. under
Px. So as a constant sequence, Mt = f(x) is a martingale. So we only need to deal with x ∈ S.
The main observation here is that the Markov property is just the fact that
Ex[f(Xt+1)|Ft] = Σ_y P (Xt, y)f(y) = (Pf)(Xt).
For any t, since 1_{T≥t+1} = 1_{T>t} ∈ Ft, and f(XT )1_{T≤t} ∈ Ft,
Ex[Mt+1|Ft] = Ex[f(Xt+1)|Ft] · 1_{T>t} + f(XT )1_{T≤t} = (Pf)(Xt)1_{T>t} + f(XT )1_{T≤t}.
If f is harmonic at x, then Pf(x) = f(x). Thus, since on the event {T > t}, f is harmonic at Xt,
we get that (Pf)(Xt)1_{T>t} = f(Xt)1_{T>t}. In conclusion,
Ex[Mt+1|Ft] = (Pf)(Xt)1_{T>t} + f(XT )1_{T≤t} = f(Xt)1_{T>t} + f(XT )1_{T≤t} = f(Xt∧T ) = Mt.
So Mt is a martingale.
For the other direction, assume that (Mt)t is a martingale. Then, for any x ∈ S,
f(x) = M0 = Ex[M1] = Ex[f(X1)] = (Pf)(x),
where we have used that under Px, T ≥ 1 a.s. So we have that for any x ∈ S, ∆f(x) =
(I − P )f(x) = 0. So f is harmonic in S. □
Harmonic functions exhibit properties analogous to those in the continuous case.
• Proposition 10.3 (Solution to Dirichlet Problem). Let G = (V (G), c) be a network. Let
B ⊂ G (we think of B as the boundary). Let
D = {x ∈ G : Px[TB <∞] = 1} .
(So B ⊂ D.) Let u : B → R be some bounded function (boundary values).
Then, there exists a unique function f : D → R that is bounded, harmonic in D \B and
admits f(b) = u(b) for all b ∈ B.
Proof. Define f(x) = Ex[u(XTB )]. This is well defined, since under Px, TB < ∞ a.s. and since
u is bounded.
It is immediate to check that for any b ∈ B, f(b) = u(b). Also, for x ∈ D \ B, since TB ≥ 1
Px-a.s., by the Markov property,
f(x) = Ex[u(XTB )] = Σ_y P (x, y)Ey[u(XTB )] = Pf(x).
So f is harmonic at x.
For uniqueness, assume that g : D → R is bounded, harmonic in D \ B, and g(b) = u(b) for
all b ∈ B. We want to show that
for all x ∈ D, g(x) = Ex[u(XTB )].(10.1)
g is bounded, so (g(XTB∧t))t is a bounded martingale, so (10.1) holds by the optional stopping
theorem, because TB <∞ Px-a.s. for all x ∈ D. □
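On a finite path the harmonic extension f(x) = Ex[u(XTB )] can be computed by simply iterating the averaging property at the interior vertices (Jacobi iteration); a sketch, with the path, boundary values and iteration count chosen for illustration:

```python
# Dirichlet problem for the simple random walk on the path {0, ..., n}:
# boundary B = {0, n}, boundary values u(0) = 0, u(n) = 1.
# Repeatedly replacing f(x) by the average of its neighbours converges to
# the unique bounded harmonic extension, which here is
# f(x) = E_x[u(X_{T_B})] = P_x[T_n < T_0] = x/n.
n = 8
f = [0.0] * (n + 1)
f[n] = 1.0
for _ in range(5000):
    f = [f[0]] + [(f[x - 1] + f[x + 1]) / 2 for x in range(1, n)] + [f[n]]
# f[x] is now (numerically) x/n.
```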
If we remove the condition that TB < ∞ then we can only guarantee existence, but not
uniqueness, of the solution to the Dirichlet problem.
• Proposition 10.4. Let G = (V (G), c) be a network. Let B ⊂ G and let u : B → R be some
function.
Then, there exists a function f : G → R that is harmonic in G \ B and admits f(b) = u(b)
for all b ∈ B.
Proof. We define
f(x) = Ex[u(XTB ) 1_{TB<∞}].
Obviously, f(b) = u(b) for all b ∈ B. Also, for x 6∈ B, since TB ≥ 1 Px-a.s. we have that f is
harmonic at x by the Markov property. □
X Comparison to Poisson formula?
The maximum principle for harmonic functions in Rd states that if a non-constant function is har-
monic in a connected open subset of Rd then it will have all its maximal values on the boundary.
• Proposition 10.5 (Maximum Principle). Let G = (V (G), c) be a network. Let B ⊂ G and
D = {x ∈ G : Px[TB <∞] = 1}. Let f : D → R be a bounded function, harmonic in D \B.
Then,
sup_{x∈D} f(x) = sup_{x∈B} f(x) and inf_{x∈D} f(x) = inf_{x∈B} f(x).
That is, the supremum and infimum are attained on the boundary.
Moreover, if D \B is connected, and f is not constant, any x such that f(x) attains the
supremum or infimum must admit x ∈ B.
Proof. For any x ∈ D we know that
f(x) = Ex[f(XTB )] ≤ sup_{b∈B} f(b),
because XTB ∈ B a.s.
Now, assume that f(x) ≥ sup_{y∈D} f(y) for some x ∈ D \B. Let z ∈ D. Since D \B is
connected, there exists a path from x to z that does not intersect B. Thus, there exists t > 0
such that Px[TB ≥ t,Xt = z] > 0. Since f is harmonic in D \B, we get that (f(XTB∧s))s is a
martingale. Thus, stopping at time s = t,
(f(x)− f(z)) · Px[TB ≥ t,Xt = z] = Ex[(f(x)− f(Xt∧TB )) · 1_{TB≥t, Xt=z}] ≤ Ex[f(x)− f(Xt∧TB )] = 0.
So f(z) ≤ f(x) ≤ f(z) for any z ∈ D, and f is constant.
This completes the proof for the supremum. For the infimum, consider the function g = −f .
So g is bounded, harmonic in D \B. Since sup_{x∈S} g(x) = − inf_{x∈S} f(x) for any set S, we can
apply the proposition to g to get the assertions for the infimum. □
Example 10.6. Consider the following network: V (G) = Z and c(x, x+ 1) = (p/(1− p))^x. Suppose
that p > 1/2 (if p = 1/2 this is just the simple random walk on Z, and if p < 1/2 then we can
exchange x 7→ −x to get the same thing).
The weighted random walk here is given by
P (x, x+ 1) = c(x, x+ 1) / (c(x, x+ 1) + c(x− 1, x)) = p and P (x, x− 1) = 1− p.
First let’s prove that the weighted random walk here is transient. For example, recall that it
suffices to show that
Σ_{t=0}^∞ P0[Xt = 0] < ∞.
Well, since at each step the walk moves right with probability p and left with probability 1− p
independently, we can model this walk by
Xt = Σ_{k=1}^t ξk,
where (ξk)k are independent and all have distribution P[ξk = 1] = p = 1− P[ξk = −1].
The usual trick here is to note that (ξk + 1)/2 ∼ Ber(p), so
P0[X2t = 0] = P[Bin(2t, p) = t] = (2t choose t) · p^t (1− p)^t.
(This is symmetric in p, as expected.) Of course P0[X2t+1 = 0] = 0 because of parity issues.
Now, since (2t choose t) is the number of size-t subsets out of 2t elements, it is at most the total
number of subsets, which is 2^{2t}. Since for p ≠ 1/2 we have 4p(1− p) < 1, we get that
Σ_{t=0}^∞ P0[Xt = 0] ≤ Σ_{t=0}^∞ (4p(1− p))^t = 1/(1− 4p(1− p)) < ∞.
This is one proof that for p ≠ 1/2 the weighted walk is transient.
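This estimate is easy to check numerically; a minimal sketch (the choice p = 2/3 and the truncation at t = 200 are mine):

```python
from math import comb

p = 2 / 3                 # any p != 1/2 gives a transient walk
# P_0[X_{2t} = 0] = (2t choose t) p^t (1 - p)^t; odd times contribute 0.
ret = [comb(2 * t, t) * (p * (1 - p)) ** t for t in range(200)]

total = sum(ret)                      # expected number of visits to 0
bound = 1 / (1 - 4 * p * (1 - p))     # geometric bound from (2t choose t) <= 4^t
```

For p = 2/3 the bound is 9, while the series itself converges to 1/√(1 − 4p(1 − p)) = 1/|2p − 1| = 3, by the generating function Σ_t (2t choose t) x^t = 1/√(1 − 4x); either way it is finite, so the walk is transient.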
Now, let us consider B = {0} and boundary values u(0) = 1. What is a bounded harmonic
function f : G→ R such that f is harmonic in G \B? Well, we can take f ≡ 1, which is one
option. Another option is to take f(x) = Px[T0 < ∞]. But since the walk is transient, we know
that f ≢ 1!
Since Px[T0 < ∞] = Ex[u(0)1_{T0<∞}] we see that this is the second solution from above
(Proposition 10.4). However, the uniqueness in Proposition 10.3 is only for functions defined on
{x : Px[T0 < ∞] = 1}, so a-priori there is freedom to choose more than one option for those x’s
such that Px[T0 < ∞] < 1. ⋄
X add discussion on finite networks?
10.3. Green Function
Let G = (V (G), c) be a network. Let u : G→ R be a function. Suppose we want to solve the
equation ∆f = u.
If we had a function g : G×G→ R that satisfied
∆g(·, x) = 1_{x=·}
for every x, we could write
f(y) = Σ_x g(y, x)u(x).
Then,
∆f(z) = Σ_x u(x)1_{x=z} = u(z),
which is a solution. So finding the solution to ∆g = 1_{x=·} is the basic step.
It turns out that such a g exists, and g is called the Green Function. It is the counterpart of
the classical Green Function.
• Proposition 10.7. Let G = (V (G), c) be a network. Let Z ⊂ G be a set (possibly empty).
Define
gZ(x, y) = Ex[ Σ_{k=0}^{TZ−1} 1_{Xk=y} ].
Assume that at least one of the following conditions holds:
• The weighted random walk on G is transient.
• Z ≠ ∅.
Then,
∆gZ(·, x) = 1_{x=·}
for all x ∉ Z. Moreover, for all x, y,
cx gZ(x, y) = cy gZ(y, x).
Proof. The conditions of the proposition are there to ensure that
gZ(x, y) = Σ_{k=0}^∞ Px[Xk = y, TZ > k] < ∞.
First, the Markov property gives that, for a fixed y, using h(x) = gZ(x, y): for x ∉ Z,
h(x) = 1_{x=y} + Σ_{k=1}^∞ Px[Xk = y, TZ > k] = 1_{x=y} + Σ_{k=1}^∞ Σ_w P (x,w) Pw[Xk−1 = y, TZ > k − 1]
= 1_{x=y} + Σ_w P (x,w)h(w),
so ∆h(x) = 1_{x=y}.
The symmetry of gZ is shown as follows: By the definition of the weighted random walk, we
have that cx P (x, y) = cy P (y, x) = c(x, y) for all x ∼ y. Thus, for any path (x0, . . . , xn) in G,
c_{x0} P_{x0}[X0 = x0, . . . , Xn = xn] = c_{xn} P_{xn}[X0 = xn, . . . , Xn = x0].
Thus, for any x, y,
cx Px[Xk = y, TZ > k] = Σ_{γ:x→y, |γ|=k, γ∩Z=∅} cx Px[X[0, k] = γ]
= Σ_{γ:x→y, |γ|=k, γ∩Z=∅} cy Py[X[0, k] = (γk, γk−1, . . . , γ0)]
= cy Py[Xk = x, TZ > k].
Summing over k completes the proof. □
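For the simple random walk on {0, . . . , n} with Z = {0, n}, gZ can be computed by summing powers of the transition matrix restricted to the interior (a float-based sketch; the encoding and the iteration count are mine). Since all interior vertices have equal cx here, the symmetry statement reduces to gZ(x, y) = gZ(y, x).

```python
# Green function g_Z(x, y) = expected visits to y before hitting Z = {0, n},
# for simple random walk on {0, ..., n}. Numerically, g = sum_{k>=0} Q^k,
# where Q is the walk restricted to the interior states 1, ..., n-1.
n = 6
m = n - 1
Q = [[0.5 if abs(i - j) == 1 else 0.0 for j in range(m)] for i in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)] for i in range(m)]

g = [[float(i == j) for j in range(m)] for i in range(m)]   # the k = 0 term
power = [row[:] for row in g]
for _ in range(500):                   # the tail decays geometrically
    power = matmul(Q, power)
    g = [[g[i][j] + power[i][j] for j in range(m)] for i in range(m)]
```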
Number of exercises in lecture: 1
Total number of exercises until here: 19
Random Walks
Ariel Yadin
Lecture 11: Networks
11.1. Some discrete analysis
Let G = (V, c) be a network.
Recall that for x ∼ y,
(∇f)(x, y) = c(x, y)(f(x)− f(y)).
Also,
(divF )(x) = Σ_{y∼x} (1/cx) · (F (x, y)− F (y, x)).
We have the duality formula
〈∇f, F 〉 = Σ_{(x,y)} (f(x)− f(y))F (x, y) = Σ_{x∼y} f(x)(F (x, y)− F (y, x))
= Σ_x cx f(x) Σ_{y∼x} (1/cx)(F (x, y)− F (y, x)) = 〈f, divF 〉,
where 〈f, f ′〉 = Σ_x cx f(x)f ′(x) and 〈F, F ′〉 = Σ_e (1/c(e)) · F (e)F ′(e). Also, ∆ = I − P = (1/2) div∇.
We want to think of ∇ as differentiation. So the opposite operation should be some kind of
integral.
Let γ : x→ y be a path in G. For a function F ∈ C0(E) on the oriented edges of G, define
∮_γ F = Σ_{j=0}^{|γ|−1} F (γj , γj+1) · (1/c(γj , γj+1)).
For a path γ define its reversal by γ̄ = (γ_{|γ|}, γ_{|γ|−1}, . . . , γ0). Also, define F̄ ∈ C0(E) by
F̄ (x, y) = F (y, x).
We make a few observations:
• Proposition 11.1. Let F ∈ C0(E).
• ∮_{γ̄} F̄ = ∮_γ F . Thus, if F is anti-symmetric, i.e. F (x, y) = −F (y, x) for all x ∼ y, then
for any path, ∮_{γ̄} F = −∮_γ F .
• If F = ∇f for some f ∈ C0(V ), then for any path γ : x→ y we have that ∮_γ F =
f(x)− f(y).
• If ∇f = ∇g then there exists a constant η such that f = g + η.
Proof. The first bullet is immediate, just reversing the order of the edges in F .
For the second bullet, expanding the sum, we find that for γ : x→ y,
∮_γ F = Σ_{j=0}^{|γ|−1} (f(γj)− f(γj+1)) = f(x)− f(y).
For the third bullet, note that for any γ : x→ y we have that
f(x)− f(y) = ∮_γ ∇f = ∮_γ ∇g = g(x)− g(y).
So f(x)− g(x) = f(y)− g(y) for all x, y, and the difference f − g is constant. □
• Definition 11.2. A function F ∈ C0(E) is said to respect Kirchhoff’s cycle law if for any
cycle γ : x→ x, ∮_γ F = 0.
Gustav Kirchhoff
(1824-1887)
Any gradient respects Kirchhoff’s cycle law, as shown above. But the converse also holds:
• Proposition 11.3. F ∈ C0(E) respects Kirchhoff’s cycle law if and only if there exists
f ∈ C0(V ) such that F = ∇f .
In other words, if F respects Kirchhoff’s cycle law, then we can define∫F := f for any f
such that ∇f = F , and then all representations of∫F differ by some constant.
Proof. We only need to prove the “only if” direction.
Assume that F respects Kirchhoff’s cycle law. First, note that F must be anti-symmetric.
Indeed, for x ∼ y, the path (x, y, x) is a cycle, and
F (x, y) + F (y, x) = c(x, y) · ∮_{(x,y,x)} F = 0.
Now, fix x, y ∈ G and let γ : x→ y and β : x→ y. Then, the path α = γβ̄ = (γ0, . . . , γ_{|γ|}, β_{|β|−1}, . . . , β0)
is a cycle α : x→ x. So
∮_γ F − ∮_β F = ∮_γ F + ∮_{β̄} F = ∮_α F = 0.
So ∮_γ F does not depend on the choice of γ : x→ y, but only on the endpoints x and y.
Fix some a ∈ G and for any x ∈ G define f(x) = ∮_γ F for some γ : x→ a, with the convention
that f(a) = 0. It is clear that for any x ∼ y,
F (x, y) · (1/c(x, y)) = ∮_{(x,y)} F = f(x)− f(y).
So F = ∇f . □
11.2. Electrical Networks
Let G = (V, c) be a network. For each edge x ∼ y, define the resistance of the edge to be
r(x, y) = 1/c(x, y). Let A,Z ⊂ G be two disjoint subsets.
If we were physicists, we could enforce voltage 1 on A, voltage 0 on Z, and look at the voltage
and current flowing through the graph G, where each edge is an r(x, y)-Ohm resistor. According
to Ohm’s law, the current equals the potential difference divided by the resistance: I = ∇V/R.
Kirchhoff would reformulate this, telling us that the total current out of each node should be 0,
except for those nodes in A ∪ Z.
Let us turn this into a mathematical definition. The physics will only serve as intuition (albeit
usually good intuition).
• Definition 11.4. Let G = (V, c) be a network. Let A,Z be disjoint subsets of G.
A voltage imposed on A and Z is a function v : G→ R that is harmonic in G \ (A ∪ Z).
A unit voltage is a voltage v with v(a) = 1 for all a ∈ A and v(z) = 0 for all z ∈ Z.
Given a voltage v, the current induced by v is defined I(x, y) = ∇v(x, y) = c(x, y)(v(x)−v(y))
for all oriented edges x ∼ y.
X Note that this has the form I(x, y) = (v(x)− v(y))/r(x, y), which is the form of Ohm’s law.
Georg Ohm (1789-1854)
• Definition 11.5. Let G = (V, c) be a network, and let A,Z be disjoint subsets of G. A flow
from A to Z is a function F on oriented edges of G satisfying:
• F is anti-symmetric: For every edge x ∼ y, F (x, y) = −F (y, x).
• F is divergence free: For every x ∈ G \ (A∪Z), divF (x) = Σ_{y∼x} (1/cx)(F (x, y)−F (y, x)) = 0.
(A function being divergence free is sometimes said to respect Kirchhoff’s node law.)
X For simplicity, we will sometimes extend a flow F to all pairs (x, y) by defining F (x, y) = 0
for x 6∼ y.
Example 11.6. If v is a voltage, then the current induced by v is a flow; indeed,
I(x, y) = c(x, y)(v(x)− v(y)) = −c(y, x)(v(y)− v(x)) = −I(y, x),
and for x ∉ A ∪ Z,
divI(x) = Σ_{y∼x} (1/cx)(I(x, y)− I(y, x)) = 2 Σ_{y∼x} (c(x, y)/cx)(v(x)− v(y)) = 2∆v(x) = 0.
This fact is Kirchhoff’s node law. ⋄
Example 11.7. If v is a voltage, and I is the current induced by v, then we have Kirchhoff’s
cycle law: for any cycle γ : x→ x, γ = (x = γ0, γ1, . . . , γn = x),
Σ_{j=0}^{n−1} I(γj , γj+1) r(γj , γj+1) = Σ_{j=0}^{n−1} (v(γj)− v(γj+1)) = v(x)− v(x) = 0.
This of course is due to the fact that any derivative ∇v respects Kirchhoff’s cycle law. ⋄
Exercise 11.1. Let G = (V, c) be a finite network and A,Z disjoint subsets of
G. Let I be a flow from A to Z that satisfies Kirchhoff’s cycle law: for any cycle γ : x→ x,
γ = (x = γ0, γ1, . . . , γn = x),
Σ_{j=0}^{n−1} I(γj , γj+1) r(γj , γj+1) = 0.
Show that there exists a voltage v such that I is induced by v. Moreover, if u, v are two such
voltages, then v − u = η for some constant η.
11.3. Probability and Electric Networks
Since voltages are harmonic functions, it is not surprising that there is a connection between
probability and electric networks. Let us elaborate on this.
• Definition 11.8. Let G = (V, c) be a network. Let a ∈ G and Z ⊂ G. Let v be a voltage
such that v(z) = 0 for all z ∈ Z and v(a) = 1. Define the effective conductance from a to Z
by
Ceff(a, Z) := Σ_x I(a, x) = (ca/2) · divI(a) = Σ_x c(a, x)(v(a)− v(x)) = ca∆v(a),
where I is the current induced by v.
The effective resistance is defined as the reciprocal of the effective conductance:
Reff(a, Z) := (Ceff(a, Z))^{−1}.
• Proposition 11.9. Let G = (V, c) be a network. Let {a}, Z be disjoint subsets. Let v be a
voltage such that v(z) = 0 for all z ∈ Z, and v(a) ≠ 0 arbitrary. Let I be the current induced by
v. Then,
• Ceff(a, Z) = Σ_x I(a, x)/v(a) = ca∆v(a)/v(a).
• If the component of a in G \ Z is finite, then Ceff(a, Z) = ca Pa[TZ < Ta^+]. Specifically,
in this case Ceff(a, Z) does not depend on the choice of the voltage.
Proof. The first bullet follows from the fact that u = v/v(a) is a voltage with u(z) = 0 for all
z ∈ Z and u(a) = 1, and v(a)∆u = ∆v.
For the second bullet, let D be the component of a in G \ Z. We have two functions on D,
harmonic off {a} ∪ Z: u = v/v(a) and x 7→ Px[Ta < TZ ], which are 0 on Z and 1 at a. Thus,
these functions are equal, because D is finite. Now,
ca Pa[TZ < Ta^+] = ca Σ_x P (a, x)(1− u(x)) = (1/v(a)) Σ_x c(a, x)(v(a)− v(x))
= ca∆v(a)/v(a) = Ceff(a, Z).
□
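Proposition 11.9 can be illustrated on the simplest network, a path of unit conductances (the example is mine): the effective resistance of n unit resistors in series is n, matching the gambler's-ruin probability.

```python
from fractions import Fraction

# Path network 0 - 1 - ... - n with unit conductances; a = 0, Z = {n}.
# The voltage with v(0) = 1, v(n) = 0 must be harmonic at 1, ..., n-1,
# hence linear: v(x) = 1 - x/n.
n = 5
v = [1 - Fraction(x, n) for x in range(n + 1)]

# C_eff(a, Z) = sum_x c(a, x)(v(a) - v(x)); vertex 0 has the single
# neighbour 1, with c(0, 1) = 1:
Ceff = v[0] - v[1]
Reff = 1 / Ceff          # n unit resistors in series

# Probabilistic side: c_0 = 1 and P_0[T_Z < T_0^+] = P_1[T_n < T_0] = 1/n
# (gambler's ruin), matching Ceff.
```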
11.4. Resistance to Infinity
Example 11.10. Let G = (V, c) be an infinite network, and let a ∈ G. Let (Gn)n be an
increasing sequence of finite connected subgraphs of G, that contain a, such that G = ∪n Gn
(in this case we say that (Gn)n exhausts G).
For every n, let Zn = G \Gn. Note that the connected component of a in G \ Zn is Gn, which
is finite. Thus, we can consider the effective conductance from a to Zn, Ceff(a, Zn). This is a
sequence of numbers, which converges to a limit; indeed, if Ta^+ < ∞, since X[0, Ta^+] is a finite
path, there exists n0 such that for all n > n0, X[0, Ta^+] ⊂ Gn. The events {TZn < Ta^+} form a
decreasing sequence, so
lim_{n→∞} Ceff(a, Zn) = ca lim_{n→∞} Pa[TZn < Ta^+] = ca Pa[Ta^+ = ∞].
Thus, we see that lim_{n→∞} Ceff(a, Zn) does not depend on the choice of the exhausting subgraphs
(Gn)n, and
(G, c) is recurrent ⇐⇒ lim_{n→∞} Ceff(a, Zn) = 0 ⇐⇒ lim_{n→∞} Reff(a, Zn) = ∞.
⋄
In light of the above:
• Definition 11.11. Let G = (V, c) be an infinite network, and let a ∈ G. Let (Gn)n be an
increasing sequence of finite connected subgraphs of G that contain a, such that G = ⋃_n Gn.
Let Zn = G \ Gn.

Define the conductance from a to infinity and the resistance from a to infinity by

Ceff(a, ∞) = lim_{n→∞} Ceff(a, Zn) and Reff(a, ∞) = Ceff(a, ∞)^{−1}.

Thus, the theorem is:

••• Theorem 11.12. The weighted random walk on a network G is recurrent if and only if the
resistance from some vertex a to infinity is infinite.
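To illustrate Theorem 11.12, one can compute Reff(a, Zn) along an exhaustion explicitly for G = Z. The sketch below is ours (not from the notes; assumes NumPy): it obtains Pa[TZ < T+_a] from the standard gambler's-ruin linear system and applies Proposition 11.9; since Reff(0, Zn) = n/2 → ∞, the walk on Z is recurrent.

```python
import numpy as np

def reff_z(n):
    """R_eff(0, {-n, n}) in Z with unit conductances, via
    R_eff = (c_0 * P_0[T_Z < T_0^+])^{-1} (Proposition 11.9); here c_0 = 2."""
    # h(k) = P_k[hit n before 0] solves the gambler's-ruin linear system
    # h(k) = (h(k-1) + h(k+1)) / 2, with h(0) = 0, h(n) = 1.
    A = np.zeros((n - 1, n - 1))
    b = np.zeros(n - 1)
    for i in range(n - 1):
        A[i, i] = 1.0
        if i > 0:
            A[i, i - 1] = -0.5
        if i < n - 2:
            A[i, i + 1] = -0.5
        else:
            b[i] = 0.5              # the neighbor at level n has h = 1
    h1 = np.linalg.solve(A, b)[0]   # h(1) = 1/n
    escape = h1                     # P_0[T_{-n,n} < T_0^+]: after the first step,
                                    # the walk must reach the far side before 0
    return 1.0 / (2.0 * escape)

resistances = [reff_z(n) for n in (2, 4, 8, 16)]   # n/2: 1, 2, 4, 8 -> diverges
```

The sequence grows linearly, so Reff(0, ∞) = ∞, exactly as the theorem requires for a recurrent network.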
Number of exercises in lecture: 1
Total number of exercises until here: 20
Lecture 12: Network Reduction
12.1. Network Reduction
Recall that

Ceff(a, ∞) = ca Pa[T+_a = ∞].

So the effective resistance or conductance to infinity will not help us decide whether (G, c) is
recurrent unless we have a way of simplifying the sequence of finite networks Gn.

We will now describe a few operations that help us reduce networks to simpler ones without
changing the effective conductance between a and Z. This will also give us the ability to
compute probabilities on some networks.

When we wish to differentiate between effective conductances (or resistances) in two networks,
we will write Ceff(a, Z;G) and Ceff(a, Z;G′).
12.1.1. Parallel Law.
Exercise 12.1. Suppose (G, c) is a network with multiple edges. Let (G′, c′) be the
network without multiple edges where the weight c′(x, y) is the sum of all weights between x
and y in (G, c). That is,

c′(x, y) = ∑_{e∈E(G), e+=x, e−=y} c(e).

Then, (G′, c′) is a network without multiple edges, and the weighted random walk on (G′, c′)
has the same distribution as the weighted random walk on (G, c).

Specifically, for all a, Z the effective conductance between a and Z does not change.

Solution. This is just the fact that the transition probabilities for (G, c) and (G′, c′) coincide:
since cx = c′x,

cx P(x, y) = ∑_{e : e+=x, e−=y} c(e) = c′(x, y) = c′x P′(x, y). □
[Figure 3. Parallel Law: two parallel edges between x and y with conductances c1 and c2 are replaced by a single edge with conductance c1 + c2.]
12.1.2. Series Law.
• Proposition 12.1 (Series Law). Let (G, c) be a network. Suppose there exists w that has
exactly two adjacent vertices u1, u2.

Let (G′, c′) be the network given by V (G′) = V (G) \ {w}, and

c′(x, y) = c(x, y) if {x, y} ≠ {u1, u2}, and c′(u1, u2) = 1/(r(u1, w) + r(u2, w)) + c(u1, u2).

That is, we remove the edges u1 ∼ w and u2 ∼ w and add weight 1/(c(u1, w)^{−1} + c(u2, w)^{−1}) to
the edge u1 ∼ u2 (which may have originally had weight 0).

Then, for any a, Z such that w ∉ {a} ∪ Z, and such that the component of a in G \ Z is finite,
we have that Ceff(a, Z;G) = Ceff(a, Z;G′).

Proof. Let (G′, c′) be a network identical to (G, c) except that c′(u1, w) = c′(u2, w) = 0 and
c′(u1, u2) = c(u1, u2) + C. We want to calculate C so that any function that is harmonic at u1, w
on G will be harmonic at u1 on G′ as well.

Let f : G → R be harmonic at u1, w on G. If f(u1) = f(w), then harmonicity at w, together
with the fact that w is adjacent only to u1, u2, gives that f(u1) = f(w) = f(u2). So the weights
of the edges between u1, u2, w do not affect the harmonicity of the function, and can be changed.
Hence, we may assume that f(u1) ≠ f(w). Let h = (f − f(w)) / (f(u1) − f(w)). So h is harmonic
at u1, w and h(w) = 0 and h(u1) = 1. Harmonicity at u1 gives that

∑_{y≠w} c(u1, y)(h(u1) − h(y)) = −c(u1, w)(h(u1) − h(w)) = −c(u1, w).

Harmonicity at w gives

c(u1, w) + c(u2, w)h(u2) = 0.

Thus, in order for h to be harmonic at u1 on G′, we require that

0 = ∑_{y≠w} c(u1, y)(h(u1) − h(y)) + C(h(u1) − h(u2)) = −c(u1, w) + C · (1 + c(u1, w)/c(u2, w)).

This leads to

C = c(u1, w) · c(u2, w) / (c(u1, w) + c(u2, w)) = 1 / (r(u1, w) + r(u2, w)).

Thus, we have shown that choosing the weight 1/(r(u1, w) + r(u2, w)) as above, if f is harmonic
at u1, w on G, then f is also harmonic at u1 on G′. Taking u2 to play the role of u1, the same
holds if f is harmonic at u2 and w on G.

Let a, Z be as in the proposition. Let D be the component of a in G \ Z. Let v be a unit
voltage imposed on a and Z in D. Since we chose the weight on u1 ∼ u2 in G′ correctly, we get
that v is also a unit voltage imposed on a and Z in G′.

Because Ceff(a, Z;G) = ∑_y ∇v(a, y), and similarly in G′, and since G \ Z and G′ \ Z only
differ at edges adjacent to u1, u2 and w, we have that Ceff(a, Z;G) − Ceff(a, Z;G′) = 0 for all
a ∉ {u1, u2}. Now, if a = u1 then we have, by harmonicity of v at w,

(c(u1, w) + c(u2, w))v(w) = c(u1, w)v(a) + c(u2, w)v(u2).

Since the only difference is on edges adjacent to u1, u2 and w,

Ceff(a, Z;G) − Ceff(a, Z;G′) = c(a, w)(v(a) − v(w)) − (1/(r(u1, w) + r(u2, w))) · (v(a) − v(u2))
= (c(u1, w)/(c(u1, w) + c(u2, w))) · ((c(u1, w) + c(u2, w))(v(a) − v(w)) − c(u2, w)(v(a) − v(u2)))
= 0. □
Remark 12.2. Note that if w has exactly 2 neighbors in a network (G, c) as above, with resistances
r1, r2 on these edges, then the network with these two resistors replaced by a single resistor of
resistance r1 + r2 is an equivalent network, in the sense that effective resistances and conductances
do not change, as above.
[Figure 4. Series Law: edges u1 ∼ w and w ∼ u2 with conductances c1, c2 are replaced by a single edge u1 ∼ u2 with conductance 1/(1/c1 + 1/c2).]
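Both reduction laws are easy to check numerically. The sketch below is ours (plain Python; the helper names are hypothetical): reduce the network a −c1− w −c2− z together with a parallel edge a −c3− z, and compare against the conductance computed directly from the unit voltage.

```python
def series(c1, c2):
    """Series law: conductances c1, c2 through a degree-2 vertex
    combine to 1 / (1/c1 + 1/c2) (resistances add)."""
    return 1.0 / (1.0 / c1 + 1.0 / c2)

def parallel(c1, c2):
    """Parallel law: conductances of parallel edges add."""
    return c1 + c2

# Network: a --c1-- w --c2-- z, plus a parallel edge a --c3-- z.
c1, c2, c3 = 2.0, 3.0, 0.5
# Unit voltage v(a) = 1, v(z) = 0; harmonicity at w gives v(w):
vw = c1 / (c1 + c2)
ceff_direct = c1 * (1 - vw) + c3            # = c_a * Delta v(a)
ceff_reduced = parallel(series(c1, c2), c3)  # reduce, then read off
```

Both routes give the same effective conductance (here 1.7), as Proposition 12.1 and Exercise 12.1 guarantee.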
Example 12.3. What is the effective conductance between a and z in the following network?

[Figure: a network between a and z with edges of conductance 1/2, reduced step by step using the parallel and series laws; the intermediate networks have conductances 3/2, then 3/5 and 1/2, then 3/8 and 1/3, and the final single edge between a and z has conductance 3/8 + 1/3 = 17/24.]
12.1.3. Contracting Equal Voltages.
Exercise 12.2. Let (G, c) be a network, and let v be a unit voltage imposed on a and
Z. Suppose x, y ∉ {a} ∪ Z are such that v(x) = v(y). Define (G′, c′) by contracting x, y to the
same vertex; that is: V (G′) is V (G) with the vertices x, y removed and a new vertex xy instead.
All edges and weights stay the same, except for those adjacent to x or y, for which we have
c′(xy, w) = c(x, w) + c(y, w) for all w.

Then, v is a unit voltage imposed on a and Z in G′ (where v(xy) := v(x) = v(y)), and the
effective conductance between a and Z does not change: Ceff(a, Z;G) = Ceff(a, Z;G′).

Solution. Since the only change is at edges adjacent to x and y, we only need to check that for
w = xy or w ∼ xy such that w ∉ {a} ∪ Z, v is harmonic at w in G′.

For w ∼ xy,

∑_u c′(w, u)(v(w) − v(u)) = ∑_{u≠x,y} c(w, u)(v(w) − v(u)) + (c(w, x) + c(w, y))(v(w) − v(xy)) = ∑_u c(w, u)(v(w) − v(u)),

where we have used that v(xy) = v(x) = v(y). So if v is harmonic at w in G then v is harmonic
at w in G′.

Similarly, for w = xy,

∑_u c′(xy, u)(v(xy) − v(u)) = ∑_u c(x, u)(v(x) − v(u)) + ∑_u c(y, u)(v(y) − v(u)),

so v is harmonic at xy in G′. □
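A concrete instance of Exercise 12.2 (our illustration, not from the notes): in the "diamond" a − x − z, a − y − z with unit conductances, symmetry forces v(x) = v(y) = 1/2, so x and y may be contracted, and the effective conductance is unchanged.

```python
def series(c1, c2):
    # Series law: resistances add.
    return 1.0 / (1.0 / c1 + 1.0 / c2)

# Diamond network, unit conductances: a - x - z and a - y - z.
# By the symmetry swapping x and y, the unit voltage has v(x) = v(y) = 1/2.
vx = vy = 0.5
ceff_before = 1.0 * (1 - vx) + 1.0 * (1 - vy)   # C_eff(a, {z}) = 1
# Contract x, y into one vertex xy: c'(a, xy) = 2 and c'(xy, z) = 2.
ceff_after = series(2.0, 2.0)                    # still 1
```

Contracting vertices of equal voltage left Ceff(a, Z) untouched, as the exercise asserts.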
Example 12.4. What is the effective conductance between a and z in the following network?

[Figure: a symmetric network between a and z, reduced by contracting vertices of equal voltage; the intermediate networks have conductances 2, 1/2, 1/2, then 2, and the effective conductance between a and z is 2/3.]
Exercise 12.3. Let (G, c) be a network, and let v be a unit voltage imposed on a and Z.
Suppose x, y ∉ {a} ∪ Z are such that v(x) = v(y). Let c′ be a new weight function on G that is
identical to c except for the edge x ∼ y: for x ∼ y let c′(x, y) = C ≥ 0, some arbitrary number,
possibly 0. Let ∆′ be the Laplacian on (G, c′).

Then, v is harmonic in G \ ({a} ∪ Z) also with respect to c′. Conclude that the effective
conductance between a and Z is the same in both (G, c) and (G, c′).

Solution. Since the difference is only at the edge x ∼ y, we only need to check that harmonicity
is preserved at x and y. Because v(x) − v(y) = 0, for z ∈ {x, y},

c′_z ∆′v(z) = ∑_w c′(z, w)(v(z) − v(w)) = ∑_{w : {z,w}≠{x,y}} c(z, w)(v(z) − v(w)) + c′(x, y)(v(x) − v(y)) = c_z ∆v(z).

Thus v is a unit voltage imposed on a and Z with respect to c′ as well. Also,

Ceff(a, Z; (G, c′)) = c′_a ∆′v(a) = c_a ∆v(a) = Ceff(a, Z; (G, c)). □
Example 12.5. The network from the previous example can be reduced by removing the vertical
edge.
Exercise 12.4. Let G = (V, c) be a network such that V = Z and x ∼ y if and only if
|x − y| = 1. For the weighted random walk (Xt)t on G define

Vt(x) = ∑_{n=0}^{t} 1{Xn = x},

the number of visits to x up to time t. Let T+_0 = inf{t ≥ 1 : Xt = 0}.

Calculate E0[V_{T+_0}(x)] as a function of c only.
Number of exercises in lecture: 4
Total number of exercises until here: 24
Lecture 13: Thompson’s Principle
Suppose G is a network. We think of weights as conductances, so it seems intuitive that
increasing the conductance of edges would result in making the graph more transient. This is
what we prove in this lecture.
13.1. Thomson’s Principle
• Definition 13.1. For F ∈ L2(E) and for v such that ∇v ∈ L2(E), define the energy of F
and of v by

E(F ) := 〈F, F 〉 = ∑_e r(e)F (e)² and E(v) := 〈∇v, ∇v〉 = ∑_{x∼y} c(x, y)(v(x) − v(y))².

Note that if v ∈ L2(V ) then E(v) = 2〈∆v, v〉 by the duality formula.
• Lemma 13.2 (Thomson's Principle / Dirichlet Principle). Let G = (V, c) be a finite network,
and let A, Z be disjoint subsets.

The unit voltage v is the function that minimizes the energy E(f) over all functions f with
f(a) = 1 for all a ∈ A and f(z) = 0 for all z ∈ Z.

Proof. By the duality formula we have that for any f, f′ ∈ C0(V ),

〈∆f, f′〉 = ½〈∇f, ∇f′〉 = 〈f, ∆f′〉.

(That is, the Laplacian is self-dual.) Since f − v = 0 on A ∪ Z, and since v is harmonic off A ∪ Z,
we get that (f − v)∆v ≡ 0. So,

〈∆(f − v), v〉 = 〈f − v, ∆v〉 = ∑_x c_x(f(x) − v(x))∆v(x) = 0.

This implies that the cross terms vanish when expanding

E(f ) = 2〈∆(f − v + v), f − v + v〉 = E(f − v) + E(v) ≥ E(v),

where we have used that the energy is always non-negative. □
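Thomson's principle can be seen numerically on the smallest possible example (a sketch of ours; assumes NumPy): on a − w − z the energy is a quadratic function of the one free value f(w), minimized exactly at the harmonic value. Note that the code counts each edge once, so its energy equals Ceff rather than the notes' 2Ceff, which sums over ordered pairs.

```python
import numpy as np

c1, c2 = 2.0, 3.0

def energy(t):
    # sum over edges (each edge once) of c(x, y)(f(x) - f(y))^2 on
    # a --c1-- w --c2-- z, with f(a) = 1, f(z) = 0 and f(w) = t.
    return c1 * (1 - t) ** 2 + c2 * t ** 2

t_harm = c1 / (c1 + c2)          # harmonic value of v(w)
ts = np.linspace(0, 1, 1001)
best = ts[np.argmin([energy(t) for t in ts])]
# Conservation of energy (edges counted once): E(v) = C_eff(a, {z}).
ceff = 1.0 / (1.0 / c1 + 1.0 / c2)
```

Scanning the interpolations confirms that the minimizer is the harmonic voltage, and that its energy equals the effective conductance.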
• Lemma 13.3 (Thomson's Principle - Dual Form). Let G be a finite network, and let {a}, Z be
disjoint subsets. Let v(x) = ½ g_Z(x, a), where g_Z(x, a) is the Green function (the expected number
of visits to a started at x from time 0 until before hitting Z).

Then, over all flows F from a to Z with divF (a) = 1, the energy E(F ) is minimized at
I = ∇v.

Proof. First, we know that v is a voltage on a and Z with v(z) = 0 for all z ∈ Z. Also,
divI(a) = 2∆v(a) = 1.

Let F be a flow from a to Z with divF (a) = 1. Then, F − I is a flow from a to Z with
div(F − I)(a) = 0. Since div(F − I) is 0 off Z, and v is 0 on Z, we get that div(F − I) · v ≡ 0.
Thus,

〈F − I, I〉 = 〈div(F − I), v〉 = 0.

So,

E(F ) = E(F − I + I) = E(F − I) + E(I) ≥ E(I). □
• Corollary 13.4 (Rayleigh's Monotonicity Principle). Let G be a finite network, and let a be
a point not in a subset Z. Suppose c′ is a weight function on G such that c ≤ c′. Then,

Ceff(a, Z; c) ≤ Ceff(a, Z; c′).

Proof. Let v be the unit voltage imposed on a and Z with respect to c, and let u be the unit
voltage imposed on a and Z with respect to c′.

Note that

E(v) = 2〈∆v, v〉 = 2 ∑_x c_x ∆v(x)v(x) = 2c_a ∆v(a) = 2Ceff(a, Z; c),

because ∆v(x) = 0 for x ∉ {a} ∪ Z, v(z) = 0 for z ∈ Z, and v(a) = 1. Similarly, E(u) =
2Ceff(a, Z; c′). (This fact is called conservation of energy.)

Since c ≤ c′, using Thomson's principle,

Ceff(a, Z; c) = ½ ∑_{x,y} c(x, y)(v(x) − v(y))² ≤ ½ ∑_{x,y} c(x, y)(u(x) − u(y))² ≤ ½ ∑_{x,y} c′(x, y)(u(x) − u(y))² = Ceff(a, Z; c′). □
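Rayleigh monotonicity is easy to observe numerically (our sketch, assuming NumPy; the helper `ceff` is ours): increase a single conductance and the effective conductance can only go up.

```python
import numpy as np

def ceff(c, a, z):
    """C_eff(a, {z}): solve the Dirichlet problem v(a) = 1, v(z) = 0."""
    n = len(c)
    free = [x for x in range(n) if x not in (a, z)]
    A = np.diag([c[x].sum() for x in free]) - c[np.ix_(free, free)]
    v = np.zeros(n)
    v[a] = 1.0
    v[free] = np.linalg.solve(A, c[free, a])
    return float(sum(c[a, x] * (v[a] - v[x]) for x in range(n)))

# A small network on vertices {0, 1, 2, 3}, a = 0, Z = {3}.
c = np.zeros((4, 4))
for x, y, w in [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 1.0), (2, 3, 1.0), (1, 2, 0.5)]:
    c[x, y] = c[y, x] = w
c_big = c.copy()
c_big[1, 3] = c_big[3, 1] = 2.0      # increase one conductance: c <= c'
low, high = ceff(c, 0, 3), ceff(c_big, 0, 3)
```

In the symmetric original network the cross edge carries no current, so `low` equals 1 exactly; after doubling c(1, 3), `high` is strictly larger, as Corollary 13.4 predicts.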
• Corollary 13.5. Let G be an infinite network. Let c′ be a weight function on G such that
c′ ≥ c. If (G, c) is transient, then (G, c′) is also transient.

Proof. Fix a vertex a ∈ G. For every n, let Gn be the ball of radius n around a; that is,

Gn = {x ∈ G : dist(x, a) ≤ n}.

So (Gn)n form an increasing sequence of subgraphs that exhaust G. Let Zn = Gn+1 \ Gn, which
is the outer boundary of the ball of radius n. We know that (G, c) is transient, which is equivalent
to

lim_{n→∞} Reff(a, Zn; c) < ∞

(because imposing a unit voltage on a and G \ Gn is the same as imposing a unit voltage on a
and Zn). Now, for each fixed n, since c′ ≥ c, considering the finite networks (Gn+1, c) and (Gn+1, c′),
we have that

Ceff(a, Zn; c) ≤ Ceff(a, Zn; c′).

Thus,

lim_{n→∞} Reff(a, Zn; c′) ≤ lim_{n→∞} Reff(a, Zn; c) < ∞,

so (G, c′) is transient. □
Exercise 13.1. Let H be a subgraph of a graph G (not necessarily spanning all the vertices
of G). Show that if the simple random walk on H is transient, then so is the simple random walk
on G.
13.2. Shorting
Another intuitive network operation is to short two vertices together. This can be
thought of as imposing a conductance of ∞ between them. Since this increases the conductance,
it is intuitive that this will increase the effective conductance.

• Proposition 13.6. Let (G, c) be a finite network. Let b, d ∈ G and define (G′, c′) by shorting
b and d: let V (G′) = V (G) \ {b, d} ∪ {bd} and c′(z, w) = c(z, w) for z, w ∉ {b, d}, and c′(bd, w) =
c(b, w) + c(d, w).

Then, for any disjoint sets {a}, Z, we have that Ceff(a, Z;G) ≤ Ceff(a, Z;G′).
Proof. Let v be the unit voltage imposed on a and Z with respect to c, and let u be the unit
voltage imposed on a and Z with respect to c′.

Conservation of energy tells us that

2Ceff(a, Z; c) = ∑_{x,y} c(x, y)(v(x) − v(y))² and 2Ceff(a, Z; c′) = ∑_{x,y} c′(x, y)(u(x) − u(y))².

Note that u can be viewed as a function on V (G) by setting u(b) = u(d) = u(bd). Using
Thomson's principle in (G, c),

2Ceff(a, Z; c′) = ∑_{x,y∈G′\{bd}} c′(x, y)(u(x) − u(y))² + 2 ∑_w c′(bd, w)(u(bd) − u(w))²
= ∑_{x,y∈G\{b,d}} c(x, y)(u(x) − u(y))² + 2 ∑_{k∈{b,d}} ∑_w c(k, w)(u(k) − u(w))²
= ∑_{x,y∈G} c(x, y)(u(x) − u(y))²
≥ ∑_{x,y∈G} c(x, y)(v(x) − v(y))² = 2Ceff(a, Z; c),

where in the third line the edge between b and d contributes 0, since u(b) = u(d). □
Number of exercises in lecture: 1
Total number of exercises until here: 25
Lecture 14: Nash-Williams
14.1. A Probabilistic Interpretation of Current
• Proposition 14.1. Let (G, c) be a network. Let a ∈ G and Z ⊂ G be such that the component
of a in G \ Z is finite.

For the weighted random walk (Xt)t on G, and for any edge x ∼ y, let Vx,y be the number of
times the walk goes from x to y until hitting Z; that is,

Vx,y := ∑_{k=1}^{T_Z} 1{X_{k−1} = x, X_k = y}.

Then,

Ea[Vx,y − Vy,x] = ∇v(x, y) · Reff(a, Z),

where v is a unit voltage imposed on a, Z.

Proof. Let

g(x) = (ca/cx) g_Z(a, x) = (ca/cx) Ea ∑_{k=0}^{T_Z−1} 1{X_k = x} = g_Z(x, a).

We have already seen that g is harmonic in G \ ({a} ∪ Z). Also, g(z) = 0 for all z ∈ Z, and

g(a) = 1 / Pa[T_Z < T+_a] = ca · Reff(a, Z).

(g is a voltage imposed on a, Z with g(z) = 0 for all z ∈ Z.)

Now,

Ea[Vx,y] = ∑_{k=1}^∞ Pa[X_{k−1} = x, X_k = y, T_Z > k − 1] = ∑_{k=0}^∞ Pa[X_k = x, T_Z > k] P (x, y) = (1/ca) · cx g(x) P (x, y) = (1/ca) g(x) c(x, y).

Thus,

Ea[Vx,y − Vy,x] = (1/ca) · c(x, y)(g(x) − g(y)).

That is, since v = g/g(a) is a unit voltage imposed on a, Z, and since c_a^{−1} g = Reff(a, Z) · v,

Ea[Vx,y − Vy,x] = Reff(a, Z) · c(x, y)(v(x) − v(y)). □
14.2. The Nash-Williams Criterion
• Definition 14.2. Let G be a graph. Let A, Z be disjoint subsets.

A subset of edges Π is a cut between A and Z if any path γ : a → z with a ∈ A and z ∈ Z
must pass through an edge in Π.

A subset of edges Π is a cutset (sometimes, a cut between A and ∞) if any infinite simple
path that starts at a ∈ A must pass through an edge in Π.

One intuitive statement is that if e is a cut edge between a and Z, then Reff(a, Z) ≥ r(e),
because there is at least that much resistance between a and Z.

• Proposition 14.3. Let (G, c) be a finite network. Let {a}, Z be disjoint subsets and let e be
a cut edge between a and Z. Then Reff(a, Z) ≥ r(e).
Proof. Suppose that e = (x, y). Let Vx,y be the number of times the random walk crosses the edge
(x, y) until hitting Z, and let Vy,x be the number of times the walk crosses (y, x) before hitting Z.
We have seen that

Ea[Vx,y − Vy,x] = ∇v(x, y) · Reff(a, Z),

where v is a unit voltage imposed on a and Z.

Because G is finite, we know by uniqueness of harmonic functions that v(x) = Px[Ta < TZ ].
Because (x, y) is a cut edge between a and Z, to get from y to a the walk must pass through x;
that is,

v(y) = Py[Ta < TZ ] = Py[Tx < TZ ] · Px[Ta < TZ ] ≤ v(x).

So 0 ≤ v(x) − v(y) ≤ 1.

Now, since (x, y) is a cut edge between a and Z, we must have that Vx,y − Vy,x ≥ 1, because
the walk must cross the edge (x, y), and every time it crosses back over (y, x) it must return to
cross (x, y). Thus,

1 ≤ c(x, y)(v(x) − v(y)) · Reff(a, Z) ≤ c(x, y) · Reff(a, Z). □
If Π is a cut between a and Z, then shorting all edges in Π would result in a cut edge of
conductance at most ∑_{e∈Π} c(e). A natural generalization of the above is the following.

• Lemma 14.4 (Nash-Williams Inequality). Let (G, c) be a finite network, and {a}, Z disjoint
sets. Suppose that Π1, . . . , Πk are k pairwise disjoint cuts between a and Z. Then,

Reff(a, Z) ≥ ∑_{j=1}^{k} ( ∑_{e∈Πj} c(e) )^{−1}.
Proof. Note that since passing to a subset of a cut that is still a cut only increases the right
hand side, we can prove the lemma under the assumption that the cuts are minimal. Specifically,
they do not contain both (x, y) and (y, x) for an edge x ∼ y.

Let v be a unit voltage imposed on a and Z. We know (conservation of energy) that ½ E(v) =
Ceff(a, Z).

For an edge (x, y), let Vx,y be the number of crossings from x to y until hitting Z; that is,

Vx,y = ∑_{k=1}^{T_Z} 1{X_{k−1} = x, X_k = y}.

Then, for any minimal cut Π between a and Z, we have, Pa-a.s.,

∑_{(x,y)∈Π} Vx,y − Vy,x ≥ 1.

Also, we have that for any edge (x, y),

Ea[Vx,y − Vy,x] = ∇v(x, y) · Reff(a, Z).

Thus, applying Cauchy-Schwarz, for any cut Π between a and Z,

1 ≤ Reff(a, Z)² · ( ∑_{(x,y)∈Π} ∇v(x, y) )² ≤ Reff(a, Z)² · ∑_{(x,y)∈Π} c(x, y) · ∑_{(x,y)∈Π} c(x, y)(v(x) − v(y))².

That is, for any one of the cuts Πj,

( ∑_{e∈Πj} c(e) )^{−1} ≤ Reff(a, Z)² · ∑_{(x,y)∈Πj} c(x, y)(v(x) − v(y))².

Since the cuts Πj are disjoint, and since we assumed that a cut does not contain both (x, y)
and (y, x) (because the cuts are minimal), we have that

∑_{j=1}^{k} ( ∑_{e∈Πj} c(e) )^{−1} ≤ Reff(a, Z)² · ½ ∑_{x,y} c(x, y)(v(x) − v(y))² = Reff(a, Z). □
• Corollary 14.5 (Nash-Williams Criterion). Let (G, c) be an infinite network. If (Πn)n is a
sequence of pairwise disjoint finite cutsets between a and ∞ such that

∑_{n=1}^∞ ( ∑_{e∈Πn} c(e) )^{−1} = ∞,

then (G, c) is recurrent.

Proof. Fix n. Let Gn be the subnetwork induced by (G, c) on the smallest ball (in the graph
metric) that contains ⋃_{j=1}^{n} Πj. Let Zn = G \ Gn.

So (Gn)n exhaust G, and for each fixed n,

Reff(a, Zn) ≥ ∑_{j=1}^{n} ( ∑_{e∈Πj} c(e) )^{−1}.

Letting n → ∞, the left hand side tends to Reff(a, ∞) and the right hand side tends to the
infinite sum. Since Reff(a, ∞) = ∞, (G, c) is recurrent. □
Example 14.6. We now give a proof that Z and Z² are recurrent.

Recall that we could prove this by showing that

P0[X_{2t} = 0] ≥ const · t^{−1/2} for d = 1, and P0[X_{2t} = 0] ≥ const · t^{−1} for d = 2.

However, it will be easier to do this without these calculations (especially in the more complicated
Z² case).

For Z this is easy, because Z is just composed of edges in series, so for any n > 0,

Reff(0, {−n, n}) = n/2 → ∞.

Now for Z²: By the Nash-Williams criterion, it suffices to find disjoint cutsets (Πn)n such
that

∑_n 1/|Πn| = ∞.

Indeed, taking

Πn = {(x, y) : ‖x‖∞ = n, ‖y‖∞ = n + 1},

we have that |Πn| = 4(2n + 1).
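The cutset count for Z² is easy to verify by enumeration (our sketch, plain Python): the number of edges from the ∞-ball of radius n to radius n + 1 is exactly 4(2n + 1), so the Nash-Williams sum is a constant times the harmonic series, which diverges.

```python
def cutset_size(n):
    """Number of edges (x, y) in Z^2 with ||x||_inf = n and ||y||_inf = n + 1,
    counted by direct enumeration."""
    count = 0
    for x0 in range(-n, n + 1):
        for x1 in range(-n, n + 1):
            if max(abs(x0), abs(x1)) != n:
                continue
            for y0, y1 in [(x0+1, x1), (x0-1, x1), (x0, x1+1), (x0, x1-1)]:
                if max(abs(y0), abs(y1)) == n + 1:
                    count += 1
    return count

sizes = [cutset_size(n) for n in range(6)]       # 4(2n+1): 4, 12, 20, ...
partial = sum(1.0 / s for s in sizes)            # ~ harmonic series / 8
```

The partial sums of 1/|Πn| grow like (log n)/8, confirming the divergence used in the example.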
TODO: add an example showing that the Nash-Williams criterion is not necessary.
Number of exercises in lecture: 0
Total number of exercises until here: 25
Lecture 15: Flows
15.1. Finite Energy Flows
The Nash-Williams criterion was a sufficient condition for recurrence. We now turn to a
stronger condition which is necessary and sufficient.
Let (G, c) be an infinite weighted graph. Recall that a flow from A to Z is an anti-symmetric
function with vanishing divergence off A ∪ Z.
In this spirit, we say that F is a flow from o ∈ G to ∞ if
• F is anti-symmetric.
• divF (o) ≠ 0 and divF (x) = 0 for all x ≠ o.
If in addition divF (o) = 1, we say that F is a unit flow from o to infinity.
••• Theorem 15.1 (T. Lyons, 1983). A weighted graph (G, c) is transient if and only if there
exists a finite energy flow on (G, c) from some vertex o ∈ G to ∞.

Proof. The proof is an adaptation of a method of H. Royden in the continuous world.

Assume that F is a flow from o to ∞. By replacing F with F/divF (o) we can assume without
loss of generality that F is a unit flow.

For each n let Gn be the finite subnetwork of G induced on the ball of radius n around o (in
the graph metric). Let Zn = Gn \ Gn−1. Transience of G is equivalent to

lim_{n→∞} Reff(o, Zn) < ∞.

Let vn(x) = ½ g_{Zn}(x, o), where g_{Zn} is the Green function on the finite network Gn. Since
div∇vn(o) = 2∆vn(o) = 1, the dual version of Thomson's principle tells us that E(F ) ≥
E_{Gn}(F ) ≥ E_{Gn}(vn).

Also, since

vn(o) = ½ g_{Zn}(o, o) = 1/(2Po[T_{Zn} < T+_o]) = (co/2) · Reff(o, Zn)

and ∆vn(o) = ½, we get that

E_{Gn}(vn) = 2 ∑_x cx ∆vn(x)vn(x) = 2co ∆vn(o)vn(o) = (c²_o/2) · Reff(o, Zn).

Thus, if F has finite energy on G then

lim_{n→∞} Reff(o, Zn) = (2/c²_o) lim_{n→∞} E_{Gn}(vn) ≤ (2/c²_o) E(F ) < ∞,

and (G, c) is transient.
For the other direction, assume that (G, c) is transient and consider the functions vn(x) =
Px[To < T_{Zn}] and v(x) = Px[To < ∞]. vn is a unit voltage imposed on o and Zn in Gn, and
vn(x) ↗ v(x) for every x by monotone convergence. Note that v(o) = vn(o) = 1, and v, vn are
non-constant because (G, c) is transient.

Let In = ∇vn and I = ∇v. Note that for every edge e, In(e)²r(e) → I(e)²r(e). Also, for
every n, since Gn is finite,

E(In) = 2〈∆vn, vn〉 = 2co ∆vn(o)vn(o) = 2Ceff(o, Zn) ≤ 2co < ∞.

Thus, Fatou's lemma (for sums) tells us that

E(I) = ∑_e lim_{n→∞} In(e)²r(e) ≤ lim inf_{n→∞} ∑_e In(e)²r(e) ≤ 2co < ∞.

Since I = ∇v, we have that

divI(o) = 2∆v(o) = 2 ∑_y P (o, y)(1 − Py[To < ∞]) = 2Po[T+_o = ∞] > 0

by transience, and divI(x) = 2∆v(x) = 0 for all x ≠ o. That is, I is a flow from o to ∞ with
finite energy. □
15.2. Flows on Zd
We now want to give some more details about random walks on Z^d.

We start with a proof that Z^d is transient for d ≥ 3. By Rayleigh's monotonicity principle it
suffices to prove that Z³ is transient. By Lyons's theorem it suffices to provide a finite energy
flow on Z³.

Let µ be a probability measure on paths in some graph G. Let Γ denote the random path,
and suppose that µ-a.s. every vertex of G is visited finitely many times. Then, we can define
V (x, y) to be the number of times Γ crosses the edge (x, y), and Eµ(x, y) to be the expectation
of V (x, y) under µ.
Claim 15.2. Suppose that Γ is infinite and Γ0 = o, µ-a.s. Suppose also that Eµ(x, y) < ∞ for
every edge (x, y).

Then, F (x, y) := Eµ(x, y) − Eµ(y, x) is a flow from o to ∞.

Proof. Anti-symmetry is clear. Also, for any x ≠ o, since Γ is infinite, it cannot terminate at x.
Thus, every time Γ crosses an edge (y, x) it must then cross an edge (x, z) immediately after.
Thus, µ-a.s. ∑_{y∼x} V (x, y) − V (y, x) = 0, and so divF (x) = 0.

Also, since Γ0 = o, we get one extra passage out of o, but the rest must cancel: co divF (o) =
2 ∑_y F (o, y) = 2. □
That is, to show that a graph is transient, we need to construct a measure on infinite paths,
starting at some vertex, such that the expected number of visits to any vertex is finite. If the
energy is finite for such a measure, we have transience.
15.2.1. Wedges. Let us prove something a bit more general than Z³ being transient and Z²
being recurrent.

Let ϕ : N → N be an increasing function. Consider the subgraph of Z³ induced on

Wϕ = {(x, y, z) ∈ Z³ : |z| ≤ ϕ(|x|)}.

(This is the ϕ-wedge.)

••• Theorem 15.3 (T. Lyons 1983). If

∑_{n=1}^∞ 1/(n(ϕ(n) + 1)) = ∞,

then Wϕ is recurrent.

If ϕ(n + 1) − ϕ(n) ≤ 1 and

∑_{n=1}^∞ 1/(n(ϕ(n) + 1)) < ∞,

then Wϕ is transient.

Proof. The first direction is simpler. Let Wϕ be a wedge, and let Bn denote the ball of radius n
around 0 in the graph metric (which is the L¹ distance in R³). Let ∂Bn be the set of edges
connecting Bn to B^c_n. Thus, the ∂Bn form disjoint cutsets between 0 and ∞.
What is the size of ∂Bn? There are at most 2n + 1 choices for x, and then, given x, there are
at most 2(ϕ(|x|) + 1) ≤ 2(ϕ(n) + 1) choices for z, which then determines y up to sign. Thus, the
size is bounded by |∂Bn| ≤ O(n(ϕ(n) + 1)). So Nash-Williams tells us that if

∑_{n=1}^∞ 1/(n(ϕ(n) + 1)) = ∞

the walk is recurrent.
Now for the other direction. We define a measure on paths in Wϕ. Let U, U′ be chosen
uniformly on [0, 1], independently. Let L be the set {(n, Un, U′ϕ(n)) : n ∈ N}. Choose a monotone
path Γ in Wϕ that is always at distance at most 1 from L. (A monotone path γ is a path in Z³
such that dist(γ_{t+1}, 0) ≥ dist(γ_t, 0).)

Fix an edge e in Wϕ and suppose that (x, y, z) is an endpoint of e. Let R = |x| + |y| + |z|.
The event that e ∈ Γ implies that (x, y, z) is at distance at most 1 from L; that is, there exists
n with

3n ≥ n + Un + U′ϕ(n) ≥ R − 1 ⟹ n ≥ (R − 1)/3

(where we have used that ϕ(n) ≤ n). Also,

|x − n| + |y − Un| + |z − U′ϕ(n)| ≤ 1,

so nU ∈ [y − 1, y + 1] and ϕ(n)U′ ∈ [z − 1, z + 1]. Thus,

µ[e ∈ Γ] ≤ 4/(n(ϕ(n) + 1)).

Because Γ visits any edge at most once, this is also a bound on Eµ(e). Since there are at most
O(R(ϕ(R) + 1)) ≤ O(n(ϕ(n) + 1)) such possibilities for (x, y, z) ∈ Wϕ, we have that the energy
of the flow is at most

2 ∑_R ∑_{|x|+|y|+|z|=R} ∑_{e : e+=(x,y,z)} Eµ(e)² ≤ const · ∑_n n(ϕ(n) + 1) · 1/(n²(ϕ(n) + 1)²) = const · ∑_n 1/(n(ϕ(n) + 1)).

Since this is finite, the flow has finite energy and the wedge is transient. □
Example 15.4. For example, if we choose ϕ(n) = n^ε, we get a transient wedge. This is also
true if we take ϕ(n) = (log n)².

If we choose ϕ(n) = 1, we get essentially Z² and recurrence, of course. Also, ϕ(n) = log n
gives a divergent sum, so this wedge is recurrent.
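The criterion of Theorem 15.3 for these examples can be illustrated numerically (a sketch of ours; partial sums of course only suggest convergence or divergence, they do not prove it):

```python
import math

def partial_sum(phi, N):
    """Partial sums of sum_n 1 / (n * (phi(n) + 1)) up to N."""
    return sum(1.0 / (n * (phi(n) + 1.0)) for n in range(1, N + 1))

N = 10 ** 5
s_flat  = partial_sum(lambda n: 1.0, N)               # ~ (log N)/2: diverges, recurrent
s_log   = partial_sum(lambda n: math.log(n), N)       # ~ log log N: diverges, recurrent
s_logsq = partial_sum(lambda n: math.log(n) ** 2, N)  # bounded: transient wedge
s_power = partial_sum(lambda n: n ** 0.5, N)          # bounded: transient wedge
```

The two divergent cases keep growing (slowly, like log N and log log N), while the two convergent cases have essentially saturated already at N = 10^5.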
Number of exercises in lecture: 0
Total number of exercises until here: 25
Lecture 16: Resistance in Euclidean Lattices
Let’s wrap up our discussions with some examples of random walks on graphs.
16.1. Euclidean lattices
We have already seen that Z^d is transient for d ≥ 3 and recurrent for d ≤ 2. We saw two
different methods to prove this.

The first was a brute force computation of P0[St = 0], using Stirling's formula, and then
approximating E0[Vt(0)] and E0[V∞(0)].

The second method was more robust, and less computational. It involved estimating the
energy of certain flows, mainly taking a uniform direction and following that direction with a
path in the lattice.

Energy estimates and the Nash-Williams inequality can give us better control of the effective
resistance and the Green function.
16.1.1. Resistance Estimates. Since Z^d, d ≥ 3, is transient, we know that Reff(0, ∂Bn) is
bounded above and below by constants, where ∂Bn is the boundary of the ball of radius n around 0.

However, for d = 2 we know that Reff(0, ∂Bn) → ∞. We now investigate the growth rate of
this function.
• Proposition 16.1. Let Zn = {z ∈ Z² : dist(0, z) ≥ n}. Then, there exist constants 0 <
c, C < ∞, independent of n, such that

c log n ≤ Reff(0, Zn) ≤ C log n.

Proof. The lower bound follows by noting that for the sets

Πn = {(z, z′) ∈ Z² × Z² : dist(z, 0) = n − 1, dist(z′, 0) = n},

all of Π1, Π2, . . . , Πn are cuts between 0 and Zn, with size |Πn| = O(n). So the Nash-Williams
inequality gives

Reff(0, Zn) ≥ ∑_{k=1}^{n} 1/|Πk| ≥ const · ∑_{k=1}^{n} 1/k ≥ const · log n.
For the other direction, let vn(x) = ¼ g_{Zn}(x, 0). So vn is a voltage imposed on 0 and Zn, with
∆vn(0) = ¼ and vn(0) = (4P0[T+_0 > T_{Zn}])^{−1} = Reff(0, Zn). Also,

E(vn) = 8∆vn(0)vn(0) = 2Reff(0, Zn).

Let U be a uniform random variable in [0, 1], and let L = {(n, Un) : n ∈ N} ⊂ R². Let Γ be some
random monotone path from 0 that is always at distance at most 1 from L. For any edge
e = (x, y) in Z², the event e ∈ Γ implies that |x − n| ≤ 1 and nU ∈ [y − 1, y + 1]. Thus, the
expected number of times Γ crosses e is at most 2/n ≤ 2/(|x| − 1). Let Fn be the flow given by this
random path, restricted to G \ Zn. Since the number of edges with an endpoint at distance n from
0 is O(n),

E(Fn) ≤ ∑_{k=1}^{n} O(k · k^{−2}) = O(log n).

Recall that divFn(0) = 1/2, so Thomson's principle tells us that for I = ∇vn, since I is a
current with divI(0) = 2∆vn(0) = ½,

E(Fn) ≥ E(I) = E(vn) = 2Reff(0, Zn). □
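Proposition 16.1 can be checked against exact finite computations (our sketch, assuming NumPy): solving the Dirichlet problem on L¹ balls of radii 4, 8, 16 shows Reff(0, Zn) growing by a roughly constant amount each time n doubles, as c log n growth predicts.

```python
import numpy as np

def reff_z2(n):
    """R_eff(0, Z_n) in Z^2 with unit conductances, Z_n = {z : |z|_1 >= n},
    computed by solving the Dirichlet problem v(0) = 1, v = 0 on Z_n."""
    interior = [(x, y) for x in range(-n, n) for y in range(-n, n)
                if 0 < abs(x) + abs(y) < n]
    fi = {v: i for i, v in enumerate(interior)}
    A = np.zeros((len(interior), len(interior)))
    b = np.zeros(len(interior))
    for v, i in fi.items():
        x, y = v
        A[i, i] = 4.0                       # every vertex of Z^2 has degree 4
        for u in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if u in fi:
                A[i, fi[u]] -= 1.0
            elif u == (0, 0):
                b[i] += 1.0                 # boundary value v(0) = 1
    sol = np.linalg.solve(A, b)
    volt = lambda u: sol[fi[u]] if u in fi else 0.0
    ceff = sum(1.0 - volt(u) for u in [(1, 0), (-1, 0), (0, 1), (0, -1)])
    return 1.0 / ceff

r4, r8, r16 = reff_z2(4), reff_z2(8), reff_z2(16)
inc1, inc2 = r8 - r4, r16 - r8    # both should be roughly C * log 2
```

The two increments agree to within a modest factor, consistent with logarithmic growth of the effective resistance.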
Remark 16.2. If we tried to adapt the argument above to Z^d, we would see that the probability
that an edge e at distance n from 0 is in Γ is at most O(n^{−(d−1)}) (because we would be looking
at the direction (n, U1n, U2n, . . . , U_{d−1}n) for U1, . . . , U_{d−1} i.i.d.). Thus,

Reff(0, Zn) ≤ 2^{−1} E(Fn) ≤ ∑_{k=1}^{n} O(k^{d−1} · k^{−2(d−1)}) = ∑_{k=1}^{n} O(k^{1−d}) = O(1),

with the tail of the sum beyond n of order O(n^{2−d}). Similarly, the lower bound would follow
from the Nash-Williams inequality.
16.2. Regular Trees
Let Td denote the d-regular tree. Fix some vertex ρ ∈ Td as the root. For n ≥ 0 let
Tn = {x ∈ Td : dist(x, ρ) = n}. It is easy to check that |T0| = 1 and |Tn| = d(d − 1)^{n−1} for
n ≥ 1.

For any x, y ∈ Tn there exists a graph automorphism ϕ : Td → Td that maps ϕ(x) = y and
fixes each level Tk; i.e. ϕ(Tk) = Tk. Thus, if vn is a unit voltage imposed on ρ and Tn, we
have that vn is constant on Tk for k ≤ n. Thus, all vertices in each level Tk can be shorted into
one vertex, without changing the effective resistance Reff(ρ, Tn). This gives us a network whose
vertices are {0, 1, . . . , n}, with resistances r(k, k + 1) = |Tk+1|^{−1}. Thus, the effective resistance is

Reff(ρ, Tn) = (1/d) ∑_{k=1}^{n} 1/(d − 1)^{k−1} = (d − 1)/(d(d − 2)) · (1 − (d − 1)^{−n}).

Thus, Reff(ρ, ∞) = (d − 1)/(d(d − 2)) < ∞, so Td is transient for d > 2.
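The closed formula for Reff(ρ, Tn) can be confirmed by solving the Dirichlet problem on the actual tree, without the shorting argument (our sketch; assumes NumPy; the tree indexing is ours).

```python
import numpy as np

d, depth = 3, 4
# Build T_d down to the given depth: the root has d children, every other
# internal vertex has d - 1 children; unit conductances on all edges.
parent = {0: None}
levels = [[0]]
next_id = 1
for ell in range(depth):
    new = []
    for v in levels[-1]:
        for _ in range(d if ell == 0 else d - 1):
            parent[next_id] = v
            new.append(next_id)
            next_id += 1
    levels.append(new)

adj = [[] for _ in range(next_id)]
for v, p in parent.items():
    if p is not None:
        adj[v].append(p)
        adj[p].append(v)

# Dirichlet problem: v(root) = 1, v = 0 on the deepest level T_depth.
boundary = set(levels[-1])
free = [v for v in range(1, next_id) if v not in boundary]
fi = {v: i for i, v in enumerate(free)}
A = np.zeros((len(free), len(free)))
b = np.zeros(len(free))
for v in free:
    i = fi[v]
    for u in adj[v]:
        A[i, i] += 1.0
        if u in fi:
            A[i, fi[u]] -= 1.0
        elif u == 0:
            b[i] += 1.0
sol = np.linalg.solve(A, b)
volt = lambda u: 1.0 if u == 0 else (sol[fi[u]] if u in fi else 0.0)
reff_solved = 1.0 / sum(1.0 - volt(u) for u in adj[0])
reff_formula = (d - 1) / (d * (d - 2)) * (1 - (d - 1) ** (-depth))
```

For d = 3 and depth 4 both computations give 5/8 = 0.625, agreeing with the shorted series network.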
16.2.1. A computational proof. We now give a computational proof that the random walk
on Td is transient for d > 2.

Let (Xt)t be the random walk on Td, and consider the following sequences: Dt := dist(Xt, ρ)
and Mt := (d − 1)^{−Dt}. Let Tj be the first time that Xt ∈ Tj.

First, note that

E[M_{t+1} | Ft] = 1{Dt = 0}(d − 1)^{−1} + 1{Dt > 0} · ( (1/d)(d − 1)Mt + (1 − 1/d)(d − 1)^{−1}Mt ) = 1{Dt = 0}(d − 1)^{−1} + 1{Dt > 0}Mt.

So under Px for x ≠ ρ, we have that (M_{t∧T0})_t is a bounded martingale. If Px[T0 < ∞] = 1 we
would have by the optional stopping theorem that

(d − 1)^{−dist(x,ρ)} = Ex[M_{T0}] = 1,

which is a contradiction. Since

Pρ[T+_ρ < ∞] = (1/d) ∑_{x∈T1} Px[T0 < ∞] < 1,

we get that Td is transient.

In fact, the above lets us calculate exactly the probability to escape from ρ: if T = T0 ∧ Tn,
then by the optional stopping theorem, for x ∈ T1,

(d − 1)^{−1} = Ex[MT ] = Px[Tρ > Tn] · (d − 1)^{−n} + 1 − Px[Tρ > Tn],

so

Px[Tρ > Tn] = (d − 2)/(d − 1 − (d − 1)^{−n+1}) = 1 − ((d − 1)^{n−1} − 1)/((d − 1)^n − 1).

Also, v(x) = Px[Tρ < Tn] is a unit voltage on ρ and Tn. Thus,

∆v(ρ) = (1/d) ∑_{x∈T1} Px[Tρ > Tn] = (d − 2)/(d − 1 − (d − 1)^{−n+1}).

So

Ceff(ρ, Tn) = d∆v(ρ) = d(d − 2)/(d − 1 − (d − 1)^{−n+1}) = (d(d − 2)/(d − 1)) · (1 − (d − 1)^{−n})^{−1},

which coincides with our calculation above.
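The escape probability just derived can be cross-checked without martingales (our sketch, assuming NumPy): the distance process dist(Xt, ρ) is a biased birth-and-death chain, and Px[Tρ > Tn] for x ∈ T1 solves its gambler's-ruin linear system.

```python
import numpy as np

d, n = 3, 5
p_up, p_down = (d - 1) / d, 1 / d
# h(k) = P_k[hit level n before level 0] for the distance chain of T_d:
# h(k) = p_down h(k-1) + p_up h(k+1), with h(0) = 0, h(n) = 1.
A = np.zeros((n - 1, n - 1))
b = np.zeros(n - 1)
for i in range(n - 1):
    k = i + 1
    A[i, i] = 1.0
    if i > 0:
        A[i, i - 1] = -p_down
    if i < n - 2:
        A[i, i + 1] = -p_up
    if k == n - 1:
        b[i] = p_up                 # the neighbor at level n has h = 1
h1 = np.linalg.solve(A, b)[0]       # = P_x[T_rho > T_n] for x in T_1
formula = (d - 2) / (d - 1 - (d - 1) ** (-n + 1))
```

For d = 3 and n = 5 both give 1/1.9375 ≈ 0.5161, matching the optional stopping computation above.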
16.3. Flows from random paths
In this section, we generalize the previous constructions on Z^d.

Let µ be a probability measure on infinite paths in G started from o ∈ G. By mapping each
path in the support of µ to its loop-erasure, we may assume without loss of generality that µ is
supported on simple paths (paths that do not cross any vertex more than once).

Now, define F ∈ C0(E) by F (x, y) = µ((x, y) ∈ α) − µ((y, x) ∈ α), where α is a random path
of law µ (by e ∈ α we mean that there exists n such that e = (αn, α_{n+1})).

We claim that F is a flow. Indeed, for x ≠ o the number of edges going into x in α equals the
number of edges exiting x in α. Thus, for x ≠ o,

divF (x) = ∑_{y∼x} (1/cx) (F (x, y) − F (y, x)) = (2/cx) · ∑_{y∼x} E[1{(x,y)∈α} − 1{(y,x)∈α}] = 0.

Similarly, for x = o, there is one more edge exiting o than edges entering o, so divF (o) = 2/co.

Let us calculate the energy of F . First, note that for x ∼ y,

F (x, y)² = (µ((x, y) ∈ α) − µ((y, x) ∈ α))² ≤ µ((x, y) ∈ α)² + µ((y, x) ∈ α)².

Thus, for α, β independent paths of law µ,

E(F ) = ∑_e r(e)F (e)² ≤ 2 · ∑_e r(e) µ(e ∈ α) · µ(e ∈ β) = 2 E ∑_e r(e) 1{e∈α} 1{e∈β}.

We conclude:

• Proposition 16.3. Let G be a graph. Suppose that G admits a probability measure µ on
infinite paths in G started from some fixed o ∈ G, such that for two independent paths α, β we
have E |α ∩ β| < ∞. Then G is transient.
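For the "uniform direction" measure of Section 15.2 one can estimate E|α ∩ β| by simulation. The sketch below is ours and simplifies the construction: a path is represented by the staircase points (s, ⌊U1 s⌋, ⌊U2 s⌋) along the random ray (1, U1, U2), which suffices for counting common vertices.

```python
import numpy as np

rng = np.random.default_rng(0)

def staircase(n_steps):
    """Simplified 'uniform direction' monotone path in Z^3: the points
    (s, floor(U1*s), floor(U2*s)) along the random ray (1, U1, U2)."""
    u1, u2 = rng.random(), rng.random()
    return {(s, int(u1 * s), int(u2 * s)) for s in range(n_steps)}

trials, n_steps = 2000, 200
total = 0
for _ in range(trials):
    total += len(staircase(n_steps) & staircase(n_steps))
mean_intersections = total / trials
```

Every pair of paths shares (0, 0, 0) and (1, 0, 0), and the probability of meeting at distance s decays like 1/s², so the mean stays bounded (around 2.6 here) even as the paths grow, in line with the finite-intersection hypothesis of Proposition 16.3.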
The following is an open question.
Conjecture 16.4. Let G be a transitive graph. If the simple random walk on G is transient,
then there exists a measure µ on infinite paths started from some fixed o ∈ G such that for two
independent paths α, β of law µ, there exists ε > 0 with

E[e^{ε|α∩β|}] < ∞.
Number of exercises in lecture: 0
Total number of exercises until here: 25
Lecture 17: Spectral Analysis
17.1. Spectral Radius
Let (G, c) be a network. Recall that the transition matrix P is an operator on C0(V ) that
operates by Pf(x) = ∑_y P (x, y)f(y). Also, recall that the space L2(V ) is the space of functions
f ∈ C0(V ) that satisfy

〈f, f〉 = ∑_x cx f(x)² < ∞.

One can easily check that P : L2(V ) → L2(V ). Also, P is a self-adjoint operator, and its norm
satisfies ‖P‖ ≤ 1 (that is, P is a contraction).
• Proposition 17.1. Let (G, c) be a weighted graph with transition matrix P . The limit

ρ(P ) = lim sup_{n→∞} (P^n(x, y))^{1/n}

does not depend on x, y.

Proof. Fix z, w ∈ V . We will show that lim sup_n (P^n(z, w))^{1/n} ≥ lim sup_n (P^n(x, y))^{1/n}.

Because P is irreducible, we have that for some t, t′ > 0, P^t(z, x) > 0 and P^{t′}(y, w) > 0. Thus,

P^n(z, w) ≥ P^t(z, x) P^{n−t−t′}(x, y) P^{t′}(y, w).

Since (P^t(z, x))^{1/n} → 1 and (P^{t′}(y, w))^{1/n} → 1,

lim sup_{n→∞} (P^n(z, w))^{1/n} ≥ lim sup_{n→∞} (P^n(x, y))^{1/n}.

Exchanging the roles of x, y and z, w, we get that the lim sup does not depend on the choice of
x, y. □
• Definition 17.2 (Spectral Radius). Let (G, c) be a weighted graph with transition matrix P. Define the spectral radius of (G, c) to be
ρ(G, c) = ρ(P) := lim sup_{n→∞} (P^n(x,x))^{1/n}.
One of the reasons for the name spectral radius is that by the Cauchy-Hadamard criterion, the generating function of the Green function has radius of convergence ρ^{−1}. That is, the function
g(x, y | z) = ∑_{n=0}^∞ P^n(x,y) z^n
converges when |z| < ρ^{−1}.
Jacques Hadamard (1865-1963)
Note that ρ ≤ 1, and that g(x, y|z = 1) is exactly the Green function. Since the Green
function converges if and only if G is transient, we have that for recurrent graphs ρ = 1. The
natural question arises, what are precisely the cases for which ρ = 1? This has been answered
by Kesten in his PhD thesis in 1959, see Theorem 18.1 below.
The above is a good reason for the radius part of the name spectral radius. The next proposition explains the spectral part of the name.
• Proposition 17.3. Let (G, c) be a weighted graph with transition matrix P. Then ‖P‖ = ρ(P). Moreover, for any x, y,
P^n(x,y) ≤ √(c_y/c_x) · ρ(P)^n.
Proof. First, note that for any x, y,
P^n(x,y) = c_x^{−1} ⟨P^n δ_y, δ_x⟩ ≤ c_x^{−1} ‖P‖^n ‖δ_y‖ · ‖δ_x‖ = √(c_y/c_x) ‖P‖^n.
Thus, ρ(P) ≤ ‖P‖.
The other direction is a bit more complicated. Let f ∈ L²(V) have finite support S ⊂ V.
Now, because S is finite, for every ε > 0 there exists N = N(ε, S) such that for all n > N and all x, y ∈ S we have P^{2n}(x,y) ≤ (ρ(P) + ε)^{2n}. Thus, for all n > N(ε, S),
‖P^n f‖² = ⟨P^{2n} f, f⟩ = ∑_{x,y} c_x P^{2n}(x,y) f(x) f(y)
≤ ∑_{x,y} c_x P^{2n}(x,y) |f(x)| |f(y)|
≤ (ρ(P) + ε)^{2n} · ∑_{x,y∈S} c_x |f(x)| |f(y)| = C_f (ρ(P) + ε)^{2n}.
Thus, lim sup_n ‖P^n f‖^{1/n} ≤ ρ(P) + ε for every ε, and so lim sup_n ‖P^n f‖^{1/n} ≤ ρ(P).
Now, consider the sequence a_n = ‖P^n f‖. We have that
a_{n+1}² = ⟨P^{n+1} f, P^{n+1} f⟩ = ⟨P^n f, P^{n+2} f⟩ ≤ ‖P^n f‖ · ‖P^{n+2} f‖ = a_n · a_{n+2}.
That is, b_n := a_{n+1}/a_n is a non-decreasing sequence. Thus, the following limits exist and satisfy
sup_n b_n = lim_{n→∞} b_n = lim_{n→∞} a_n^{1/n} ≤ ρ(P).
So
‖Pf‖ / ‖f‖ = b_0 ≤ sup_n b_n ≤ ρ(P).
This holds for all finitely supported f.
We want this to hold for all f ∈ L²(V). We now use the fact that the finitely supported functions are dense in L²(V). Indeed, let f ∈ L²(V) and fix ε > 0. Since ∑_x c_x f(x)² < ∞, there exists a finite set S_ε ⊂ V such that
∑_{x∉S_ε} c_x f(x)² < ε².
Thus, setting g = f 1_{S_ε}, we have that ‖f − g‖² < ε². Now, since g is finitely supported, and since ‖g‖ ≤ ‖f‖,
‖Pf‖ = ‖P(f − g) + Pg‖ ≤ ‖P‖ · ‖f − g‖ + ‖Pg‖ ≤ ‖P‖ ε + ρ(P) ‖g‖ ≤ ‖P‖ ε + ρ(P) ‖f‖.
Taking ε → 0, ‖Pf‖ ≤ ρ(P) ‖f‖. Since this holds for all f, we get that ‖P‖ ≤ ρ(P). □
Exercise 17.1. Let (G, c) be a weighted graph with transition matrix P . Let ρ(P ) be
the spectral radius. Show that if G is recurrent then ρ(P ) = 1.
17.1.1. Energy minimization. Let (G, c) be a weighted graph. Consider the functions on G
with finite support; i.e. L0(V ). These all have finite energy. We want to find the function that
minimizes the energy, when normalized to have length 1.
• Proposition 17.4. Let (G, c) be a weighted graph. Then
1 − ρ(G) = inf_{0≠f∈L_0(V)} E(f) / (2 ⟨f, f⟩).
(Sometimes 1 − ρ is called the spectral gap. This is the minimal possible energy of unit length functions.)
Proof. Note that for f ∈ L_0(V) we can use duality so that
(1/2) E(f) = ⟨∆f, f⟩ = ⟨f, f⟩ − ⟨Pf, f⟩.
Thus, it suffices to show that
ρ(P) = ρ̄ := sup_{0≠f∈L_0(V)} ⟨Pf, f⟩ / ⟨f, f⟩.
Now, for any f ≠ 0 we have by Cauchy-Schwarz
|⟨Pf, f⟩| ≤ ‖Pf‖ · ‖f‖ ≤ ‖P‖ · ⟨f, f⟩,
so ρ̄ ≤ ‖P‖ = ρ(P). On the other hand, since P is self-adjoint, for any f, g ∈ L_0(V),
⟨Pf, g⟩ = (1/4) (⟨P(f+g), f+g⟩ − ⟨P(f−g), f−g⟩).
So
⟨Pf, g⟩ ≤ (ρ̄/4) · (⟨f+g, f+g⟩ + ⟨f−g, f−g⟩) = (ρ̄/2) · (⟨f, f⟩ + ⟨g, g⟩).
Now take g = (‖f‖/‖Pf‖) Pf. Plugging this in above gives
‖f‖ · ‖Pf‖ = (‖f‖/‖Pf‖) · ⟨Pf, Pf⟩ ≤ (ρ̄/2) ( ⟨f, f⟩ + (‖f‖²/‖Pf‖²) ⟨Pf, Pf⟩ ) = ρ̄ ‖f‖².
So ‖Pf‖ ≤ ρ̄ ‖f‖ for all f ∈ L_0(V).
Using the fact that L_0(V) is dense in L²(V) completes the proof: for any f ∈ L²(V) and any ε > 0 find g ∈ L_0(V) such that ‖f − g‖ < ε and ‖g‖ ≤ ‖f‖. Then,
‖Pf‖ ≤ ‖P(f − g)‖ + ‖Pg‖ ≤ ‖P‖ ε + ρ̄ ‖g‖ ≤ ‖P‖ ε + ρ̄ ‖f‖.
Taking ε → 0 gives that ρ(P) = ‖P‖ ≤ ρ̄. □
17.2. Isoperimetric Constant
For a graph G, we are interested in how small a boundary of a set can be, compared to the
volume of that set. These serve as bottlenecks in the graph, so a random walk can get “stuck”
inside for a while. Thus, it makes sense to define the following.
• Definition 17.5. Let (G, c) be a weighted graph. Let S ⊂ G be a finite subset. Define the (edge) boundary of S to be
∂S = {(x,y) ∈ E(G) : x ∈ S, y ∉ S}.
Define the isoperimetric constant of G to be
Φ = Φ(G, c) := inf { c(∂S)/c(S) : S is a finite connected subset of G },
where c(∂S) = ∑_{e∈∂S} c(e) and c(S) = ∑_{x∈S} c_x.
Of course 1 ≥ Φ(G) ≥ 0 for any graph. When Φ(G) > 0, we have that sets “expand”: the
edges coming out of a set carry a constant proportion of the weight of the set.
• Definition 17.6. Let (G, c) be a weighted graph. If Φ(G, c) = 0 we say that (G, c) is amenable.
Otherwise we call (G, c) non-amenable.
A sequence of finite connected sets (S_n)_n such that c(∂S_n)/c(S_n) → 0 is called a Følner sequence, and the sets are called Følner sets.
Erling Følner (1919-1991)
The concept of amenability was introduced by von Neumann in the context of groups and the Banach-Tarski paradox. Følner's criterion using boundaries of sets provided the ability to carry over the concept of amenability to other geometric objects such as graphs.
The isoperimetric constant is a geometrical object. It turns out that positivity of the isoperi-
metric constant is equivalent to the spectral radius being strictly less than 1.
John von Neumann (1903-1957)
Exercise 17.2. Let S ⊂ T_d be a finite connected subset, with |S| ≥ 2. Show that
|∂S| = |S|(d − 2) + 2.
Deduce that Φ(T_d) = (d−2)/d.
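The counting behind Exercise 17.2 can be checked mechanically. The helper below is my own illustration (not from the notes): it counts the vertices of a ball S of radius r in T_d level by level, together with the edges leaving S, and compares against |S|(d−2)+2.

```python
def tree_ball_counts(d, r):
    """Vertex count |S| and boundary edge count |dS| for the ball of
    radius r around the root of the d-regular tree T_d."""
    # the root has d children; every other vertex has d-1 children
    sizes = [1]                      # vertices at each distance from the root
    for k in range(1, r + 1):
        sizes.append(d if k == 1 else sizes[-1] * (d - 1))
    S = sum(sizes)
    # each vertex at distance r has d-1 edges leaving the ball
    # (the root alone has d such edges when r = 0)
    boundary = d if r == 0 else sizes[r] * (d - 1)
    return S, boundary
```

Since |∂S| = |S|(d−2)+2 and every vertex has degree d, the ratio |∂S|/(d|S|) tends to (d−2)/d as the ball grows, matching Φ(T_d).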
Number of exercises in lecture: 2
Total number of exercises until here: 27
Random Walks
Ariel Yadin
Lecture 18: Kesten’s Amenability Criterion
18.1. Kesten’s Thesis
Kesten, in his PhD thesis in 1959 proved the connection between amenability and spectral
radius strictly less than 1. This was subsequently generalized to more general settings by others
(including Cheeger, Dodziuk, Mohar).
••• Theorem 18.1. A weighted graph (G, c) is amenable if and only if ρ(G, c) = 1. In fact,
Φ²/2 ≤ 1 − √(1 − Φ²) ≤ 1 − ρ ≤ Φ.
Harry Kesten (1931–)
First we require the following lemma.
• Lemma 18.2. Let (G, c) be a weighted graph. For any f ∈ L_0(V) (that is, with finite support),
2Φ(G, c) · ∑_x c_x f(x) ≤ ∑_{x,y} |∇f(x,y)|.
Note that if f = 1_S for a finite set S, this is exactly the definition of Φ.
Proof. Since f has finite support we can write
∫_0^∞ c({x : f(x) > t}) dt = ∫_0^∞ ∑_x c_x 1_{f(x)>t} dt = ∑_x c_x f(x) 1_{f(x)≥0} ≥ ∑_x c_x f(x).
Also,
∫_0^∞ 1_{f(x)>t≥f(y)} dt = |f(x) − f(y)| 1_{f(x)≥f(y)}.
Using the set S_t = {x : f(x) > t} we see that
∂S_t = {(x,y) ∈ E : f(x) > t ≥ f(y)}.
Since for any t, Φ · c(S_t) ≤ c(∂S_t), we can integrate over t to get
Φ · ∑_x c_x f(x) ≤ ∫_0^∞ Φ · c(S_t) dt ≤ ∫_0^∞ ∑_{x,y} c(x,y) 1_{f(x)>t≥f(y)} dt
= ∑_{x,y} c(x,y) |f(x) − f(y)| 1_{f(x)≥f(y)} ≤ (1/2) ∑_{x,y} |∇f(x,y)|,
where we have used the fact that all sums are finite because f has finite support. □
Proof of Theorem 18.1. The leftmost inequality is just ξ²/2 ≤ 1 − √(1 − ξ²), valid for any ξ ∈ [0, 1].
The rightmost inequality follows by taking a sequence of finite connected sets (S_n)_n such that Φ = lim_{n→∞} c(∂S_n)/c(S_n). Since
(∇1_{S_n}(x,y))² = c(x,y)² (1_{(x,y)∈∂S_n} + 1_{(y,x)∈∂S_n}),
we get
(1/2) E(1_{S_n}) = (1/2) ∑_{x,y} r(x,y) (∇1_{S_n}(x,y))² = ∑_{x,y} c(x,y) 1_{(x,y)∈∂S_n} = c(∂S_n).
Also, ⟨1_{S_n}, 1_{S_n}⟩ = ∑_x c_x 1_{x∈S_n} = c(S_n). Thus,
1 − ρ = inf_{0≠f∈L_0(V)} E(f)/(2⟨f,f⟩) ≤ lim_{n→∞} c(∂S_n)/c(S_n) = Φ.
The central inequality is Φ² ≤ 1 − ρ². We use that
1 − ρ = inf_{0≠f∈L_0(V)} E(f)/(2⟨f,f⟩)   and   ρ ≥ sup_{0≠f∈L_0(V)} ⟨Pf,f⟩/⟨f,f⟩.
First, for f ∈ L_0(V),
2⟨f,f⟩ + 2⟨Pf,f⟩ = ∑_{x,y} c(x,y) f(x)² + ∑_{x,y} c(x,y) f(y)² + 2 ∑_{x,y} c(x,y) f(x) f(y) = ∑_{x,y} c(x,y) (f(x) + f(y))².
For g = f², by Lemma 18.2,
⟨f,f⟩ = ∑_x c_x g(x) ≤ (2Φ)^{−1} ∑_{x,y} c(x,y) |g(x) − g(y)| = (2Φ)^{−1} ∑_{x,y} c(x,y) |f(x) − f(y)| · |f(x) + f(y)|.
Applying Cauchy-Schwarz,
4Φ² · ⟨f,f⟩² ≤ ∑_{x,y} c(x,y) (f(x) − f(y))² · ∑_{x,y} c(x,y) (f(x) + f(y))² = E(f) · (2⟨f,f⟩ + 2⟨Pf,f⟩) ≤ 2 E(f) · ⟨f,f⟩ · (1 + ρ).
Rearranging, we get that for any f ∈ L_0(V),
4Φ² ≤ (2E(f)/⟨f,f⟩) · (1 + ρ).
Taking the infimum over all f ∈ L_0(V), we get 4Φ² ≤ 4(1 − ρ)(1 + ρ), i.e. Φ² ≤ 1 − ρ², as required. □
Example 18.3. Let us calculate ρ(T_d), the spectral radius of the d-regular tree.
Let r be the root of T_d, and let T_n = {x : dist(x, r) = n}. For one direction, consider the function
f_n(x) = ∑_{k=1}^n (d−1)^{−k/2} · 1_{x∈T_k} = 1_{1≤dist(x,r)≤n} · (d−1)^{−dist(x,r)/2}.
If x ∼ y then c(x,y) f_n(x) f_n(y) = (d−1)^{−(dist(x,r)+dist(y,r))/2} if 1 ≤ dist(x,r), dist(y,r) ≤ n, and 0 otherwise. Thus, since |T_k| = d(d−1)^{k−1} and c_x = d,
‖f_n‖² = ∑_{x : 1≤dist(x,r)≤n} c_x (d−1)^{−dist(x,r)} = ∑_{k=1}^n d(d−1)^{k−1} · d · (d−1)^{−k} = d² (d−1)^{−1} n.
Similarly,
Pf_n(x) = 2(d−1)^{1/2} d^{−1} · (d−1)^{−dist(x,r)/2}   if 2 ≤ dist(x,r) ≤ n−1,
Pf_n(x) = (d−1)^{1/2} d^{−1} · (d−1)^{−dist(x,r)/2}    if dist(x,r) ∈ {1, n},
Pf_n(r) = (d−1)^{−1/2},
Pf_n(x) = d^{−1} (d−1)^{−n/2}                          if dist(x,r) = n+1,
and Pf_n(x) = 0 otherwise. So,
‖Pf_n‖² = ∑_x c_x (Pf_n(x))² = ∑_{k=2}^{n−1} d(d−1)^{k−1} · d · 4(d−1) d^{−2} (d−1)^{−k} + d(d−1)^{−1} + 1 + 1 + 1
= 4(n−2) + d(d−1)^{−1} + 3 = 4n − 5 + d(d−1)^{−1},
where the three 1's are the contributions of the spheres at distances 1, n and n+1, and d(d−1)^{−1} is the contribution of the root. This implies that
ρ(T_d) ≥ ‖Pf_n‖ / ‖f_n‖ → 2√(d−1)/d.
For the other direction, since Φ(T_d) = (d−2)/d, Theorem 18.1 gives
ρ(T_d) ≤ √(1 − Φ(T_d)²) = 2√(d−1)/d.
Number of exercises in lecture: 0
Total number of exercises until here: 27
Random Walks
Ariel Yadin
Lecture 19:
19.1. Speed of Random Walks
Let (G, c) be a weighted graph and let (X_t)_t be the corresponding weighted random walk. In the exercises one shows that the limit
lim_{t→∞} E_x[dist(X_t, X_0)] / t
exists for transitive graphs, and is independent of the choice of starting vertex x. We call this limit the speed of the random walk. For a general graph this limit may not exist, so we consider the lim inf and lim sup of the sequence. Of course, these limits lie between 0 and 1.
• Definition 19.1. Let (G, c) be a weighted graph and let (X_t)_t be the corresponding weighted random walk. Fix some o ∈ G. The lower speed and upper speed are defined to be
lim inf_{t→∞} E_o[dist(X_t, X_0)] / t   and   lim sup_{t→∞} E_o[dist(X_t, X_0)] / t.
If these limits coincide, we call the corresponding limit the speed.
Example 19.2. Let us calculate the speed of the random walk on T_d.
Fix o ∈ T_d. Let (X_t)_t be the random walk and define D_t = dist(X_t, o). Let L_t = L_t(o) = ∑_{k=0}^t 1_{X_k=o} and L_{−1} = 0.
Consider the sequence M_t = dist(X_t, o) − ((d−2)/d) t − (2/d) L_{t−1}. Note that
E_o[dist(X_{t+1}, o) | F_t] = 1_{X_t=o} + 1_{X_t≠o} ( (dist(X_t,o)+1)(d−1)/d + (dist(X_t,o)−1)(1/d) )
= dist(X_t,o) + (d−2)/d + 1_{X_t=o} · (1 − (d−2)/d − dist(X_t,o))
= dist(X_t,o) + (d−2)/d + (2/d) 1_{X_t=o},
where we have used that dist(X_t,o) 1_{X_t=o} = 0. Thus,
E_o[M_{t+1} | F_t] = dist(X_t,o) + (d−2)/d + (2/d) 1_{X_t=o} − ((d−2)/d)(t+1) − (2/d) L_t
= dist(X_t,o) − ((d−2)/d) t − (2/d) L_{t−1} = M_t.
So (M_t)_t is a martingale. This implies that
0 = E_o[M_t] = E_o[dist(X_t,o)] − ((d−2)/d) t − (2/d) E_o[L_{t−1}].
Since T_d is transient, we know by monotone convergence that
lim_{t→∞} E_o[L_{t−1}] = E_o[V_∞(o) + 1] = 1 / P_o[T_o⁺ = ∞] < ∞.
Thus,
lim_{t→∞} (1/t) E_o[dist(X_t,o)] = (d−2)/d.
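The martingale computation above can be verified exactly by dynamic programming on the distance chain of T_d (distance increases with probability (d−1)/d away from the root and always at the root; this reduction is my assumption for the illustration). The sketch checks the identity E_o[D_t] = ((d−2)/d) t + (2/d) E_o[L_{t−1}] to floating-point accuracy, and that E_o[D_t]/t approaches (d−2)/d.

```python
def td_distance_stats(d, t):
    """DP for the distance-from-root chain on T_d.  Returns E[dist(X_t, o)]
    and E[L_{t-1}], the expected number of visits to o at times 0..t-1."""
    p = [0.0] * (t + 2)
    p[0] = 1.0
    visits = 0.0
    for _ in range(t):
        visits += p[0]                 # a visit to the root at the current time
        q = [0.0] * (t + 2)
        for k, mass in enumerate(p):
            if mass == 0.0:
                continue
            if k == 0:
                q[1] += mass
            else:
                q[k + 1] += mass * (d - 1) / d
                q[k - 1] += mass / d
        p = q
    mean_dist = sum(k * mass for k, mass in enumerate(p))
    return mean_dist, visits
```

For d = 4 the speed is (d−2)/d = 1/2, and the finite-t correction (2/d) E_o[L_{t−1}] stays bounded by the transience of T_4.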
It is not a coincidence that T_d has positive speed. In fact, this has to do with the fact that ρ(T_d) < 1, i.e. that T_d is non-amenable.
••• Theorem 19.3. Let (G, c) be a weighted graph, and let (X_t)_t be the corresponding random walk started at some o ∈ G. Assume the following:
• ρ(G, c) < 1.
• There exists M > 0 such that c_x ≤ M for all x (i.e. c_x is uniformly bounded).
• The limit
b := lim sup_{r→∞} |B(o,r)|^{1/r} < ∞
is finite, where B(o,r) is the ball of radius r around o.
Then, the lower speed is positive. In fact, a.s.
lim inf_{t→∞} (1/t) dist(X_t, o) ≥ − log ρ(G) / log b > 0.
Proof. Let 0 < α < −log ρ / log b, so that ρ b^α < 1. We can choose λ > b such that ρ λ^α < 1. Because λ > b, there exists some universal constant K > 0 such that |B(o,r)| ≤ K λ^r for all r. Because c_x is uniformly bounded, K can be chosen large enough so that K ≥ √(M/c_o), and then, by Proposition 17.3, P^t(o,x) ≤ K ρ^t for all x and all t. Combining these two bounds we get that
P[dist(X_t, o) ≤ αt] ≤ ∑_{x∈B(o,⌊αt⌋)} P^t(o,x) ≤ K² ρ^t λ^{αt}.
Since ρ λ^α < 1, these probabilities are summable. By Borel-Cantelli, we have that
P[lim inf (1/t) dist(X_t, o) ≤ α] = 0.
Taking α → −log ρ / log b completes the proof. □
X Recall that by Fatou's Lemma,
−log ρ / log b ≤ E_o[lim inf t^{−1} dist(X_t, o)] ≤ lim inf t^{−1} E_o[dist(X_t, o)].
So, non-amenable graphs have positive (lower) speed.
Example 19.4. For all d, the random walk on Z^d has zero speed. In fact, we show that for a random walk (X_t)_t on Z^d, E_0[dist(X_t, 0)] ≍ t^{1/2}.
Consider the j-th coordinate X_t(j). We have
E_0[X_{t+1}(j)² | F_t] = (1/(2d)) · ((X_t(j)+1)² + (X_t(j)−1)²) + (1 − 1/d) · X_t(j)² = X_t(j)² + 1/d.
Thus, M_t = X_t(j)² − t/d is a martingale, and 0 = E_0[M_t] = E_0[X_t(j)²] − t/d. So E_0[|X_t(j)|] ≤ √(t/d), and
E_0[dist(X_t, 0)] ≤ ∑_{j=1}^d E_0[|X_t(j)|] ≤ √(dt).
Also, note that we can write
X_t(j) = ∑_{k=1}^t ξ_k,
where (ξ_k)_k are i.i.d. random variables with P[ξ_k = 1] = P[ξ_k = −1] = 1/(2d) and P[ξ_k = 0] = 1 − 1/d. Since E[ξ_k] = 0 and Var[ξ_k] = E[ξ_k²] = 1/d, we get by the central limit theorem that √d · t^{−1/2} X_t(j) converges in distribution to a standard normal random variable N(0,1). So
lim_{t→∞} P_0[ √d |X_t(j)| ≥ (1/2)√t ] = P[|N(0,1)| ≥ 1/2] =: c > 0.
Thus,
lim inf_{t→∞} (1/√t) E_0[|X_t(j)|] ≥ lim inf_{t→∞} (1/√t) · P_0[|X_t(j)| ≥ (1/2)√t d^{−1/2}] · √t/(2√d) = c/(2√d),
and so
lim inf_{t→∞} (1/√t) E_0[dist(X_t, 0)] ≥ (c/2) √d.
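The coordinate martingale gives the exact identity E_0[X_t(j)²] = t/d, which a direct dynamic program over one (lazy) coordinate reproduces to floating-point accuracy. This is a quick sanity check of my own, with the ±1/hold step distribution taken from the display above.

```python
from collections import defaultdict

def lazy_coord_second_moment(d, t):
    """One coordinate of the walk on Z^d: +-1 with probability 1/(2d) each,
    hold with probability 1 - 1/d.  Returns E[X_t(j)^2], which equals t/d."""
    p = {0: 1.0}
    for _ in range(t):
        q = defaultdict(float)
        for x, m in p.items():
            q[x + 1] += m / (2 * d)
            q[x - 1] += m / (2 * d)
            q[x] += m * (1 - 1 / d)
        p = dict(q)
    return sum(x * x * m for x, m in p.items())
```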
Since many interesting graphs have zero speed, we are sometimes interested in a bit more precision.
• Definition 19.5. Let (G, c) be a weighted graph and let (X_t)_t be the corresponding weighted random walk. Fix some o ∈ G. The lower escape exponent and upper escape exponent are defined to be
lim inf_{t→∞} log E_o[dist(X_t, X_0)] / log t   and   lim sup_{t→∞} log E_o[dist(X_t, X_0)] / log t.
If these limits coincide, we call the corresponding limit the escape exponent (also called the speed exponent).
Example 19.6. T_d has escape exponent 1. In fact any graph with positive speed has escape exponent 1. (This is immediate from log E_o[dist(X_t, X_0)] = log( (1/t) E_o[dist(X_t, X_0)] ) + log t.)
Example 19.7. Z^d has escape exponent 1/2, as shown above.
The escape exponent 1/2 plays an important role in the theory. Walks with escape exponent 1/2 are called diffusive. Walks with escape exponent < 1/2 (resp. > 1/2) are called sub-diffusive (resp. super-diffusive). Walks with escape exponent 1 (in particular, walks with positive speed) are called ballistic.
19.2. Graph Powers
If G is a graph, there is a natural graph structure on V (G)d: Define the graph Gd to be as
follows. The vertex set of Gd is V (Gd) = V (G)d. The edges are define by the relations:
(x1, . . . , xd) ∼ (y1, . . . , yd) ⇐⇒ ∃ k : ∀ j 6= k , xj = yj and xk ∼ yk.
• Lemma 19.8. Let G be a graph with speed exponent α. Then any lazy random walk on G has speed exponent α. Moreover, for any d ≥ 1, the graph G^d has speed exponent α as well.
Proof. First,
Exercise 19.1. Show that
dist_{G^d}((x_1, …, x_d), (y_1, …, y_d)) = ∑_{j=1}^d dist_G(x_j, y_j).
Now, let (X_t)_t be a random walk on G^d and let X_t(j) be the j-th coordinate of X_t. Note that (X_t(j))_t is a lazy random walk on G with holding probability 1 − 1/d. Then,
dist(X_t, X_0) = ∑_j dist(X_t(j), X_0(j)),
so it suffices to prove that any lazy walk on G has speed exponent α.
Let (Y_t)_t be a lazy walk on G with holding probability p. Let (X_t)_t be a simple random walk on G. Suppose that P is the transition matrix for the simple random walk (X_t)_t on G, so that the transition matrix for (Y_t)_t is Q = pI + (1−p)P. Let f(x) = dist(x, o). We have that
Q^t = ∑_{k=0}^t C(t,k) (1−p)^k p^{t−k} · P^k,
so
E_o[dist(Y_t, o)] = ∑_x Q^t(o,x) dist(x,o) = (Q^t f)(o) = ∑_{k=0}^t C(t,k) (1−p)^k p^{t−k} · E_o[dist(X_k, o)].
Now, for any ε > 0 there exists K_ε such that for all k > K_ε,
k^{α−ε} ≤ E_o[dist(X_k, o)] ≤ k^{α+ε}.
Let B_t ∼ Bin(t, 1−p), and let q_k = C(t,k) (1−p)^k p^{t−k} = P[B_t = k]. By Chebyshev's inequality,
P[ |B_t − (1−p)t| > (1/2)(1−p)t ] ≤ 4 Var[B_t] / ((1−p)² t²) = 4p / ((1−p)t),
so
P[B_t ≥ (1/2)(1−p)t] ≥ 1 − 4p/((1−p)t) → 1.
Hence, for ε > 0, for all large enough t (so that (1−p)t > 2K_ε),
∑_{k=0}^t q_k E_o[dist(X_k, o)] ≥ P[B_t ≥ (1/2)(1−p)t] · ( (1/2)(1−p)t )^{α−ε},
which implies that
lim inf_{t→∞} log E_o[dist(Y_t, o)] / log t ≥ α − ε + lim_{t→∞} log( P[B_t ≥ (1/2)(1−p)t] · ((1−p)/2)^{α−ε} ) / log t = α − ε.
On the other hand,
∑_{k=0}^t q_k E_o[dist(X_k, o)] ≤ K_ε + t^{α+ε},
so
lim sup_{t→∞} log E_o[dist(Y_t, o)] / log t ≤ α + ε + lim_{t→∞} log(1 + K_ε/t^{α+ε}) / log t = α + ε. □
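The identity E_o[dist(Y_t, o)] = ∑_k C(t,k)(1−p)^k p^{t−k} E_o[dist(X_k, o)] used in the proof says that a lazy walk is a simple walk run for a Binomial(t, 1−p) number of steps. On G = Z (my choice of a small test graph) both sides can be computed exactly:

```python
from math import comb

def simple_walk_mean_abs(tmax):
    """E_0[|X_k|] for the simple random walk on Z, for k = 0..tmax, by DP."""
    dist = {0: 1.0}
    means = [0.0]
    for _ in range(tmax):
        nxt = {}
        for x, m in dist.items():
            nxt[x + 1] = nxt.get(x + 1, 0.0) + m / 2
            nxt[x - 1] = nxt.get(x - 1, 0.0) + m / 2
        dist = nxt
        means.append(sum(abs(x) * m for x, m in dist.items()))
    return means

def lazy_walk_mean_abs(t, p):
    """E_0[|Y_t|] for the lazy walk on Z with holding probability p, by DP."""
    dist = {0: 1.0}
    for _ in range(t):
        nxt = {}
        for x, m in dist.items():
            nxt[x] = nxt.get(x, 0.0) + m * p
            nxt[x + 1] = nxt.get(x + 1, 0.0) + m * (1 - p) / 2
            nxt[x - 1] = nxt.get(x - 1, 0.0) + m * (1 - p) / 2
        dist = nxt
    return sum(abs(x) * m for x, m in dist.items())

t, p = 60, 0.3
means = simple_walk_mean_abs(t)
mixture = sum(comb(t, k) * (1 - p) ** k * p ** (t - k) * means[k] for k in range(t + 1))
lazy = lazy_walk_mean_abs(t, p)      # the two quantities agree up to rounding
```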
Number of exercises in lecture: 1
Total number of exercises until here: 28
Random Walks
Ariel Yadin
Lecture 20:
20.1. Lamp-Lighter Graphs
We have already seen that non-amenable graphs must have positive speed and so escape
exponent 1. Non-amenable graphs are also transient, because their spectral radius is strictly less
than 1. The converses of these statements do not hold.
Figure 5 sums up the situation (for graphs) in terms of speed, amenability and transience.
[Figure 5. Possibilities for speed, amenability and transience (diagram omitted in this text version). The examples placed in the diagram include Z, Z³, LL(Z), LL(Z³), T_d, τ³ for τ = τ((β log₂ k)_k), and τ(Ackermann), arranged according to zero vs. positive speed, transience vs. recurrence, non-amenability, sub-diffusivity, and the exponential growth line.]
We will now construct a special class of graphs called lamp-lighter graphs. These are used
to give many examples in geometric group theory. They will provide us with examples of
(exponential volume growth) amenable graphs with positive speed.
Let us describe the construction in words, before the formal definition. We start with any
graph G (finite or infinite). This is the base graph. Suppose some lamp-lighter walks around
on the graph G. At every site of G there is some lamp, whose state is either on or off. The
lamp-lighter walks around and can also change the state of the lamp at her current position -
changing it either to on or to off.
What is a position in this new space? A position consists of the configuration of all lamps on
G, that is, a function from G to 0, 1 and the position of the lamp-lighter, i.e. a vertex in G.
• Definition 20.1 (Lamp-Lighter Chain). Let P be a Markov chain on state space S. We define the Markov chain LL(P), called the lamp-lighter on P, as follows.
The state space for LL(P) is LL(S) := S × ({0,1}^S)_c, where ({0,1}^S)_c is the set of ω : S → {0,1} with finite support (i.e. ω^{−1}(1) is finite). For a state (x, ω) ∈ LL(S), we call x the position of the lamp-lighter. If ω(y) = 1 we say the lamp at y is on, and if ω(y) = 0 we say it is off.
For a lamp configuration ω ∈ ({0,1}^S)_c and a position x ∈ S we define ω^x ∈ {0,1}^S by
ω^x(y) = ω(y) for all y ≠ x,   and   ω^x(x) = ω(x) + 1 (mod 2).
Define the transition matrix LL(P) by setting
LL(P)((x, ω), (y, η)) = (1/4) P(x, y)   for η ∈ {ω, ω^x, ω^y, (ω^x)^y},
and 0 otherwise.
If (G, c) is a weighted graph, then LL(G) = LL(P) for P the transition matrix of the weighted random walk on (G, c).
X Note that the chain LL(P ) evolves as follows: At each step, the lamp-lighter chooses a
neighbor of her current position with distribution P (x, ·) and moves there, then she refreshes
the state of the lamps at the old position and at the new position to on or off with probability
1/2 each, independently.
Remark 20.2. If G is a graph, then LL(G) defines a graph structure as well: LL(P)((x,ω),(y,η)) > 0 if and only if P(x,y) > 0 and η ∈ {ω, ω^x, ω^y, (ω^x)^y}. So the graph structure on LL(G) is given by the relations (x, ω) ∼ (y, η) for x ∼ y and η ∈ {ω, ω^x, ω^y, (ω^x)^y}.
In fact:
Exercise 20.1. Suppose that (G, c) is a weighted graph, and P is the transition matrix of the weighted random walk on G. Show that LL(P) is given by a weighted random walk on a weighted graph whose vertices are (x, ω), x ∈ G, ω ∈ ({0,1}^G)_c. What is the weight function on this graph?
Exercise 20.2. Let P be an irreducible Markov chain. Let (X_t, ω_t)_t be Markov-LL(P). Show that (X_t)_t is Markov-P.
Exercise 20.3. Let G be a graph, and let L = LL(G). Let o ∈ G and let 0 ∈ {0,1}^G denote the all-zero function (configuration). Then, for any (x, ω) ∈ L,
dist_L((x, ω), (o, 0)) ≥ |ω^{−1}(1)|.
The next example is an (exponential volume growth) amenable graph, but with positive speed.
Example 20.3. Consider L = LL(Z³).
First we show that L is amenable; we only need to exhibit a Følner sequence. Let (B_r)_r be a Følner sequence in Z³ (e.g. the L^∞ balls of radius r). Let
A_r = { (x, ω) ∈ L : x ∈ B_r, ω^{−1}(1) ⊂ B_r }.
Note that |A_r| = |B_r| 2^{|B_r|}. Also, ((x,ω),(y,η)) ∈ ∂A_r if and only if (x,y) ∈ ∂B_r and η ∈ {ω, ω^x, ω^y, (ω^x)^y}. Thus, |∂A_r| = 4 |∂B_r| 2^{|B_r|}. Since the degree in L is 24 (six choices of neighbor in Z³ times four lamp updates),
Φ(L) ≤ inf_r |∂A_r| / (24 |A_r|) = inf_r |∂B_r| / (6 |B_r|) = 0,
and so L is amenable.
Next we show that L has positive speed. Let 0 denote the all-zero lamp configuration, and let o = (0, 0) ∈ L. Let (X_t, ω_t) be a random walk on L. We claim that for any z ≠ 0,
(20.1)   P_o[ω_t(z) = 1] = (1/2) · P_o[T_z⁺ ≤ t],
where T_z⁺ = inf{t ≥ 1 : X_t = z}.
Given this, by Exercise 20.3 we have that
E_o[dist_L((X_t, ω_t), o)] ≥ E_o[|ω_t^{−1}(1)|] = ∑_z P[ω_t(z) = 1] = (1/2) · E_o[|R_t|],
where R_t = {X_1, …, X_t}. Since (X_t)_t is a random walk on Z³, we are left with showing that lim_t t^{−1} E^{Z³}_0[|R_t|] > 0. In fact, using Exercise 20.4 below,
E^{Z³}_0[|R_t|] / t → P^{Z³}_0[T_0⁺ = ∞] > 0.
We turn to proving (20.1). Let (y_0, η_0), …, (y_n, η_n) be a path in L. Let T = inf{t : y_t = z} (where inf ∅ = ∞). Define a new path
(y_0, η_0), …, (y_{T−1}, η_{T−1}), (y_T, η_T^z), (y_{T+1}, η_{T+1}^z), …, (y_n, η_n^z).
Since L is a regular graph, both paths have the same probability. Summing over all possible paths, we get that for any k ≤ t,
P_o[ω_t(z) = 1, T_z⁺ = k] = P_o[ω_t^z(z) = 1, T_z⁺ = k] = P_o[ω_t(z) = 0, T_z⁺ = k].
So
P_o[ω_t(z) = 1 | T_z⁺ = k] = P_o[ω_t(z) = 0 | T_z⁺ = k] = 1/2.
Thus,
P_o[ω_t(z) = 1] = ∑_{k=1}^t P_o[ω_t(z) = 1, T_z⁺ = k] = (1/2) ∑_{k=1}^t P_o[T_z⁺ = k] = (1/2) P_o[T_z⁺ ≤ t],
proving (20.1).
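The identity (20.1) holds by the same path-switching argument on any regular base graph, so it can be tested by Monte Carlo on the easier chain LL(Z). The sketch below is my own check (fixed seed, finite horizon): it uses the step description from the remark above, where refreshing a lamp to a uniform state has the same law as the four-way update in Definition 20.1.

```python
import random

def ll_z_check(t=20, z=3, trials=40000, seed=1):
    """Monte Carlo check of (20.1) on LL(Z): estimate P_o[w_t(z) = 1] and
    P_o[T_z^+ <= t].  One step: move the lamp-lighter by +-1, then refresh
    the lamps at the old and the new position to uniform states, independently."""
    rng = random.Random(seed)
    lamp_on = hit = 0
    for _ in range(trials):
        x, lamps, hit_z = 0, set(), False
        for _ in range(t):
            y = x + rng.choice((-1, 1))
            for pos in (x, y):           # refresh both endpoint lamps
                if rng.random() < 0.5:
                    lamps.add(pos)
                else:
                    lamps.discard(pos)
            x = y
            if x == z:
                hit_z = True
        lamp_on += (z in lamps)
        hit += hit_z
    return lamp_on / trials, hit / trials

p_lamp, p_hit = ll_z_check()
```

The lamp at z is refreshed exactly when the walker occupies z, and the last refresh is a fair coin, which is the content of (20.1); the two estimates should differ only by sampling noise.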
Exercise 20.4. Show that for d ≥ 3, if (X_t)_t is a random walk on Z^d, and R_t = {X_1, …, X_t} is the range, then
E_0[|R_t|] / t → P_0[T_0⁺ = ∞].
Example 20.4. We have already seen examples of amenable zero-speed graphs: Z^d. We in fact know that these are diffusive. Let us show that this can even be done with a graph of exponential volume growth: we will show that LL(Z) is (at most) diffusive.
Let (X_t, ω_t) be a random walk on L = LL(Z). Let o = (0, 0) ∈ L. Define M_t = max_{k≤t} |X_k|. Since the lamp-lighter up to time t never leaves [−M_t, M_t], we have P_o-a.s. that ω_t^{−1}(1) ⊂ [−M_t, M_t].
Note that at time t, the lamp-lighter can walk to one of the ends of [−M_t, M_t] in at most M_t steps, then start turning off all the lamps in [−M_t, M_t] in at most 2M_t steps, finally returning to 0 in at most another M_t steps. Thus, dist_L((X_t, ω_t), o) ≤ 4M_t for all t, P_o-a.s.
Thus it suffices to show that E[M_t] ≤ 2√t for all t. For this we use a trick called the reflection principle. For x ≥ 1, by the strong Markov property at time T_x,
P_0[X_t ≥ x, T_x ≤ t] = ∑_{k=0}^t P_0[T_x = k] · P_x[X_{t−k} ≥ x] ≥ (1/2) ∑_{k=0}^t P_0[T_x = k] = (1/2) P_0[T_x ≤ t],
where we have used transitivity, and symmetry by reflecting around 0:
P_x[X_s ≥ x] = P_0[X_s ≥ 0] = P_0[X_s ≤ 0] ≥ 1/2,
since P_0[X_s ≤ 0] + P_0[X_s ≥ 0] = 1 + P_0[X_s = 0] ≥ 1. We now have
P_0[max_{k≤t} X_k ≥ x] = P_0[T_x ≤ t] ≤ 2 P_0[X_t ≥ x].
Reflecting around 0,
P_0[min_{k≤t} X_k ≤ −x] = P_0[T_{−x} ≤ t] ≤ 2 P_0[X_t ≤ −x].
So
P_0[M_t ≥ x] ≤ P_0[max_{k≤t} X_k ≥ x] + P_0[min_{k≤t} X_k ≤ −x] ≤ 2 P_0[X_t ≥ x] + 2 P_0[X_t ≤ −x].
We conclude with
2 E[|X_t|] = 2 ∑_{x=0}^∞ P_0[|X_t| ≥ x+1] = ∑_{x=0}^∞ ( 2 P_0[X_t ≥ x+1] + 2 P_0[X_t ≤ −(x+1)] ) ≥ ∑_{x=0}^∞ P_0[M_t ≥ x+1] = E_0[M_t].
So E_0[M_t] ≤ 2 E[|X_t|] ≤ 2 √(E[X_t²]) = 2√t. Thus,
E_o[dist_L((X_t, ω_t), o)] ≤ 8√t.
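The chain of inequalities E_0[M_t] ≤ 2 E[|X_t|] ≤ 2√t can be confirmed exactly for moderate t by a joint dynamic program over (X_k, max_{j≤k} |X_j|); this brute-force check is my own addition for illustration.

```python
from collections import defaultdict

def max_abs_means(t):
    """Joint DP of (X_k, M_k) with M_k = max_{j<=k} |X_j| for the simple
    random walk on Z; returns (E[M_t], E[|X_t|])."""
    p = {(0, 0): 1.0}
    for _ in range(t):
        q = defaultdict(float)
        for (x, m), mass in p.items():
            for y in (x - 1, x + 1):
                q[(y, max(m, abs(y)))] += mass / 2
        p = dict(q)
    em = sum(m * mass for (_, m), mass in p.items())
    ea = sum(abs(x) * mass for (x, _), mass in p.items())
    return em, ea

EM, EA = max_abs_means(100)
```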
Number of exercises in lecture: 4
Total number of exercises until here: 32
Random Walks
Ariel Yadin
Lecture 21:
Our next goal is to complete the picture in Figure 5; that is, to give examples of graphs that are transient but have very slow speed (sub-diffusive), and examples of graphs that are recurrent but have positive upper speed.
21.1. Concentration of Martingales: Azuma's inequality
Let (X_t)_t be a random walk on Z. We know (using the martingale |X_t|² − t) that E_0[T_{−r,r}] = r². That is, it takes the random walk r² steps to reach distance r. We have already seen that this implies diffusive behavior of the walk.
Let us prove a short concentration result, showing that T_{−r,r} is very unlikely to be much smaller than r².
••• Theorem 21.1 (Azuma's Inequality). Let (M_t)_t be an (F_t)_t-martingale with bounded increments (i.e. |M_{t+1} − M_t| ≤ 1 a.s.). Then for any λ > 0,
P[M_t − M_0 ≥ λ] ≤ exp(−λ²/(2t)).
Proof. There are two main ideas.
The first idea is that for a random variable X with E[X] = 0 and |X| ≤ 1 a.s. one has E[e^{αX}] ≤ e^{α²/2}. Indeed, f(x) = e^{αx} is a convex function, so for |x| ≤ 1 we can write x = β · 1 + (1−β) · (−1), where β = (x+1)/2, so
e^{αx} ≤ β e^α + (1−β) e^{−α} = cosh(α) + x sinh(α).
(Here 2 cosh(α) = e^α + e^{−α} and 2 sinh(α) = e^α − e^{−α}.) Thus, because E[X] = 0, and using (2k)! ≥ 2^k k!,
E[e^{αX}] ≤ cosh(α) + E[X] sinh(α) = cosh(α) = ∑_{k=0}^∞ α^{2k}/(2k)! ≤ ∑_{k=0}^∞ α^{2k}/(2^k k!) = e^{α²/2}.
For the second idea, due to Sergei Bernstein, one applies the Chebyshev / Markov inequality to the non-negative random variable e^{αX}, and then optimizes over α.
Sergei Bernstein (1880-1968)
In our case: for every t, since E[M_t − M_{t−1} | F_{t−1}] = 0 and |M_t − M_{t−1}| ≤ 1, exactly as above we can show that
E[e^{α(M_t − M_{t−1})} | F_{t−1}] ≤ e^{α²/2} a.s.
Thus,
E[e^{α(M_t − M_0)}] = E[ e^{α(M_{t−1} − M_0)} · E[e^{α(M_t − M_{t−1})} | F_{t−1}] ] ≤ e^{α²/2} · E[e^{α(M_{t−1} − M_0)}] ≤ ··· ≤ e^{tα²/2}.
Now apply Markov's inequality to the non-negative random variable e^{α(M_t − M_0)} to get
P[M_t − M_0 ≥ λ] = P[e^{α(M_t − M_0)} ≥ e^{αλ}] ≤ exp( (1/2) t α² − αλ ).
Optimizing over α, i.e. taking α = λ/t, we get
P[M_t − M_0 ≥ λ] ≤ exp(−λ²/(2t)). □
Example 21.2. Let us apply Azuma's inequality to random walks on Z.
Let (X_t)_t be a random walk on Z. Recall that (X_t)_t is a martingale. Consider the stopping time T = T_{−r,r}; this is the first time |X_t| ≥ r. Recall the reflection principle:
P_0[T ≤ t] = P_0[max_{k≤t} |X_k| ≥ r] ≤ 2 P_0[X_t ≥ r] + 2 P_0[X_t ≤ −r] = 4 P_0[X_t ≥ r].
Using Azuma's inequality on this last term,
P_0[T_{−r,r} ≤ t] ≤ 4 exp(−r²/(2t)).
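The bound P_0[T_{−r,r} ≤ t] ≤ 4 exp(−r²/(2t)) can be compared with the exact hitting probability, computed by a dynamic program with absorbing barriers at ±r. This is an illustration of my own, not part of the notes.

```python
import math

def hit_prob(r, t):
    """P_0[T_{-r,r} <= t] for the simple random walk on Z: DP with
    absorbing barriers at +-r."""
    p = {0: 1.0}
    absorbed = 0.0
    for _ in range(t):
        q = {}
        for x, m in p.items():
            for y in (x - 1, x + 1):
                if abs(y) >= r:
                    absorbed += m / 2
                else:
                    q[y] = q.get(y, 0.0) + m / 2
        p = q
    return absorbed

# exact probability vs. the Azuma bound 4 exp(-r^2 / (2t)), with r = 20
ok = all(hit_prob(20, t) <= 4 * math.exp(-20 ** 2 / (2 * t))
         for t in (50, 100, 200, 400))
```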
21.2. Recurrent Trees - The Grove
Let (d_k)_{k∈N} be a sequence of positive numbers. For each k, let τ_k be a binary tree of depth d_k.
Define the graph τ((d_k)_k) to be the graph N with the tree τ_k glued at the vertex k ∈ N (letting the root of τ_k be k ∈ N); that is, the vertex set of τ((d_k)_k) is ⋃_{k=0}^∞ V(τ_k). The edges are those in each τ_k, with the edges k ∼ k+1 for all k added. We call this the (d_k)_k-grove.
• Proposition 21.3. The graph τ((d_k)_k) is recurrent.
[Figure 6. The graph τ((d_k)_k): the ray 0, 1, 2, 3, 4, … with binary trees of depths d_0, d_1, d_2, d_3, d_4, … attached at the corresponding vertices.]
Proof. Fix k, and let v be the unit voltage between 0 and τ_k. Then for any n ≤ k, and any vertex x ∈ τ_n, we have that v(x) = v(n). Indeed, if (X_t)_t is a random walk on this graph, then because τ_n is finite, P_x-a.s. the hitting time of n is finite; also, v(X_t) is a bounded martingale. Thus, by the optional stopping theorem, v(x) = E_x[v(X_{T_n})] = v(n).
Thus, we can short together all vertices in each tree τ_n, n ≤ k. This results in the network which is just the graph N. Thus, R_eff(0, τ_k) = k → ∞, and the grove is recurrent. □
Recall that if τ is a finite binary tree of depth d, then |E(τ)| = |V(τ)| − 1 = ∑_{k=0}^d 2^k − 1 = 2^{d+1} − 2.
• Lemma 21.4. Let r ∈ N ⊂ τ((d_k)_k). The hitting time T_r of r has expectation given by
E_0[T_r] = 4 ∑_{k=0}^{r−1} (r−k) 2^{d_k} − r(r+2).
Proof. Every time the walk is at a vertex k ∈ N, with probability 1/2 it starts an excursion into the finite subtree τ_k. The expected time to return to the root of a finite tree is the reciprocal of the stationary probability of the root; since the root of τ_k has degree 2 inside τ_k, this is 2|E(τ_k)|/2 = |E(τ_k)|. Thus, we have
λ_k := E_k[T_k⁺ | X_1 ∉ N] = |E(τ_k)| = 2(2^{d_k} − 1).
Now, by first-step analysis (each branch accounting for the first step taken), for k > 0,
E_k[T_{k+1}] = (1/4) · 1 + (1/4) · (1 + E_{k−1}[T_k] + E_k[T_{k+1}]) + (1/2) · (λ_k + E_k[T_{k+1}]).
Rearranging and iterating,
E_k[T_{k+1}] = 2λ_k + 2 + E_{k−1}[T_k] = ··· = 2 ∑_{j=1}^k λ_j + 2k + E_0[T_1].
Similarly, E_0[T_1] = (2/3) · (λ_0 + E_0[T_1]) + (1/3), so E_0[T_1] = 2λ_0 + 1. Thus,
E_k[T_{k+1}] = 2 ∑_{j=0}^k λ_j + 2k + 1 = ∑_{j=0}^k 2^{d_j+2} − (2k + 3),
and
E_0[T_r] = ∑_{k=0}^{r−1} E_k[T_{k+1}] = ∑_{k=0}^{r−1} (r−k) 2^{d_k+2} − ∑_{k=0}^{r−1} (2k+3) = 4 ∑_{k=0}^{r−1} (r−k) 2^{d_k} − r(r+2). □
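The lemma can be checked against an exact computation on small groves: build the finite graph up to the target vertex r and solve the hitting-time equations h(x) = 1 + (1/deg x) ∑_{y∼x} h(y), h(r) = 0, by value iteration. The solver below is my own sketch; its output agrees with the closed form 4 ∑_{k=0}^{r−1} (r−k) 2^{d_k} − r(r+2), whose constant term comes from the +2 per ray-step in the first-step analysis.

```python
def grove_hitting_time(ds):
    """Exact E_0[T_r] for the grove with tree depths ds = (d_0, ..., d_{r-1}):
    build the ray 0..r with a binary tree of depth d_k glued at each k < r,
    and solve the hitting-time equations by value iteration."""
    r = len(ds)
    adj = {}
    def add(u, v):
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    for k in range(r):
        add(('N', k), ('N', k + 1))
    for k, d in enumerate(ds):
        for lvl in range(d):            # nodes at level lvl get two children
            for i in range(2 ** lvl):
                s = format(i, 'b').zfill(lvl) if lvl else ''
                parent = ('N', k) if lvl == 0 else ('T', k, s)
                add(parent, ('T', k, s + '0'))
                add(parent, ('T', k, s + '1'))
    target = ('N', r)
    h = {u: 0.0 for u in adj}
    for _ in range(10000):              # h_n(u) = E_u[min(T, n)] -> E_u[T]
        h = {u: 0.0 if u == target else
             1.0 + sum(h[v] for v in adj[u]) / len(adj[u]) for u in adj}
    return h[('N', 0)]

def grove_formula(ds):
    """The closed form 4 * sum (r-k) 2^{d_k} - r(r+2)."""
    r = len(ds)
    return 4 * sum((r - k) * 2 ** d for k, d in enumerate(ds)) - r * (r + 2)
```

For instance, depths (1,) give E_0[T_1] = 5 and depths (1, 1) give E_0[T_2] = 16, matching the recursion E_k[T_{k+1}] = 2λ_k + 2 + E_{k−1}[T_k].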
Exercise 21.1. Let τ be a finite binary tree of depth d with root o. Then,
P_o[T_o⁺ > 2^d] ≥ (2e)^{−1}.
The next theorem gives an example of a tree with speed exponent α for any α ≤ 1/2.
••• Theorem 21.5. Let d_k = ⌊β log₂(k+1)⌋. The tree τ((d_k)_k) has speed exponent (β+2)^{−1}.
Proof. Let (X_t)_t be a random walk on τ = τ((d_k)_k). For x ∈ τ denote by |x| the number that is the root of the unique finite subtree τ_k such that x ∈ τ_k. So |x| ≤ dist(x, 0) ≤ |x| + d_{|x|}, and it suffices to prove that
lim_{t→∞} log E_0[|X_t|] / log t = (β+2)^{−1}.
The lower bound is simpler. Note that
P_0[|X_t| < r] ≤ P_0[|X_t| < r, T_r > t] + P_0[|X_t| < r, T_r ≤ t] ≤ P_0[T_r > t] + P_r[|X_t| < r] ≤ t^{−1} E_0[T_r] + 1/2.
If we take t ≥ 4 E_0[T_r], we get that P_0[|X_t| < r] ≤ 3/4. Since 2^{d_k} ≍ (k+1)^β,
E_0[T_r] = 4 ∑_{k=0}^{r−1} (r−k) 2^{d_k} − r(r+2) ≍ ∑_{k=0}^{r−1} (r−k)(k+1)^β ≍ r^{β+2},
so for t = ⌈4 E_0[T_r]⌉ we have r ≍ t^{1/(β+2)}. So
E_0[|X_t|] ≥ P_0[|X_t| ≥ r] · r ≥ r/4 ≍ t^{1/(β+2)}.
We now turn to the upper bound on E_0[|X_t|]. Define inductively the times θ_0 = 0 and
θ_{n+1} = inf{ t ≥ θ_n : |X_t| ≠ |X_{θ_n}| }.
That is, (θ_n)_n are the subsequent times the random walk moves from a vertex in N to a new vertex in N. For every 0 < k ∈ N,
P_k[X_1 = k+1 | X_1 ∈ N] = P_k[X_1 = k−1 | X_1 ∈ N] = 1/2.
So the sequence (Z_n = X_{θ_n})_n is a random walk on N.
Now, if the walk is at a vertex k ∈ N, then with probability 1/2 it performs an excursion into the finite subtree τ_k, and with the remaining probability 1/2 it moves in N. Thus, by the exercise above,
P_0[θ_{n+1} > θ_n + 2^{d_k} | Z_n = k, F_{θ_n}] ≥ (4e)^{−1}   a.s.
Let x = r, y = 2r, z = 3r. Let N < M be such that θ_N = T_y and θ_M = inf{m > N : Z_m ∈ {x, z}}. For n ≥ N let J_n = 1_{θ_{n+1} > θ_n + 2^{d_x}}, and let S = {N ≤ n < M : J_n = 1}. Since d_k ≥ d_x for all k ∈ [x, z], we have from the above that for any set A ⊂ {0, 1, …, k−1},
P_0[S ⊂ A + N | M − N ≥ k] ≤ (1 − (4e)^{−1})^{k−|A|}.
Thus, for any λ < K, the event {|S| < λ} can be bounded by
P_0[|S| < λ] ≤ P_0[M − N < K] + P_0[|S| < λ | M − N ≥ K] ≤ P_0[M − N < K] + C(K, λ) (1 − (4e)^{−1})^{K−λ}.
Now, M − N is the number of steps a random walk on Z started at y = 2r takes to reach {x, z} = {r, 3r}. Translating r ↦ 0 we get that P_0[M − N < K] is bounded by the probability that a random walk on Z started at 0 leaves [−r, r] before time K. Azuma's inequality (and Example 21.2 following it) tells us that
P_0[M − N < K] ≤ 4 exp(−r²/(2K)).
Taking K = ⌊r² (8 log r)^{−2}⌋ and λ = ⌊εK⌋ for ε small enough (so that −log(1 − (4e)^{−1}) · (ε^{−1} − 1) > −2 log ε) we have
P_0[|S| < λ] < exp(−Ω((log r)²)) + exp(−Ω((r/log r)²)),
which decays faster than any polynomial in r.
The event {|S| ≥ λ} implies that
T_{3r} ≥ θ_M > |S| · 2^{d_x} + θ_N > λ · 2^{d_x}.
We thus conclude that for t = ⌊λ · 2^{d_x}⌋,
P_0[|X_t| > 3r] ≤ P_0[T_{3r} < t] ≤ P_0[|S| < λ] ≤ exp(−Ω((log r)²)).
Since λ ≍ r² (log r)^{−2} and 2^{d_x} ≍ r^β, we get that t ≍ r^{2+β} (log r)^{−2}, and
E_0[|X_t|] ≤ 3r + t · P_0[|X_t| > 3r] ≤ 3r + t · exp(−Ω((log t)²)) ≤ t^{1/(β+2)+o(1)}.
So
lim sup_{t→∞} log E_0[|X_t|] / log t ≤ 1/(β+2),
which coincides with our lower bound. □
21.3. Transient and Sub-Diffusive
Example 21.6. We now have an example of a transient sub-diffusive graph. Let τ = τ((d_k)_k) be the grove for d_k = ⌊β log₂(k+1)⌋, and let G = τ³.
We know that as a graph power, G has speed exponent (β+2)^{−1} < 1/2 (for β > 0). However, since N is a subgraph of τ, also N³ is a subgraph of G. We know that N³ is transient, so by Rayleigh monotonicity G must be transient as well.
21.4. Recurrent Positive Speed
Exercise 21.2 (Paley-Zygmund Inequality). Let X be a non-negative random variable. Let α ∈ [0, 1]. Then,
P[X > α E[X]] ≥ (1 − α)² · (E[X])² / E[X²].
Raymond Paley (1907-1933)
Antoni Zygmund (1900-1992)
• Lemma 21.7. There exists a universal constant p > 0 such that the following holds. Let τ be
a finite binary tree of depth d with root o. For any t ≤ d,
Po[dist(Xt, o) ≥ 16 t] ≥ p,
where (Xt)t is a random walk on τ .
124
Proof. Let D_t = dist(X_t, o). We have already seen that for L_t := Σ_{k=0}^t 1_{X_k=o} (with L_{−1} = 0) and M_t = D_t − (1/3)t − L_{t−1}, the process (M_t)_{t=0}^d is a martingale (the restriction to t ≤ d is so that the walk does not reach the leaves). Thus, for t ≤ d,
E_o[D_t] = (1/3)t + E_o[L_{t−1}].
Also, for t ≤ d,
E_o[D_t² | F_{t−1}] = 1_{X_{t−1}=o} + 1_{X_{t−1}≠o} · ((1/3)(D_{t−1} − 1)² + (2/3)(D_{t−1} + 1)²)
= D²_{t−1} + 1 + (2/3)D_{t−1}.
So
E_o[D_t²] = E_o[D²_{t−1}] + 1 + (2/3)E_o[D_{t−1}] = ··· = t + (2/3) Σ_{k=0}^{t−1} ((1/3)k + E_o[L_{k−1}]).
Note that for t ≤ d, L_t is the number of visits to the root up to time t. Let q be the probability that a random walk on an infinite rooted binary tree does not return to the root. Then, if A is the set of leaves in τ, we have P_o[T_A < T_o⁺] ≥ q. However, since P_o-a.s. t ≤ d ≤ T_A, we get that for any t ≤ d,
1 ≤ E_o[L_t] ≤ E_o[L_{T_A}] = 1 / P_o[T_A < T_o⁺] ≤ 1/q < ∞.
Thus we conclude that
E_o[D_t²] ≤ t + (1/9)t(t − 1) + (1/q)t.
We now use the Paley–Zygmund inequality to conclude that for any t ≤ d,
P_o[D_t ≥ (1/2)E_o[D_t]] ≥ (E_o[D_t])² / (4E_o[D_t²]) ≥ (1/4) · ((1/9)t²) / ((1/9)t² + (8/9 + 1/q)t) → 1/4.
Since E_o[D_t] ≥ (1/3)t, on this event D_t ≥ (1/6)t, and the proof is complete. □
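The lemma can be illustrated by simulation. On a binary tree the distance from the root is itself a Markov chain: from the root the walk must step to a child, and from an internal vertex it steps to the parent with probability 1/3 and to a child with probability 2/3. A minimal sketch (parameters chosen arbitrarily, with t well below the depth so leaves play no role):

```python
import random

rng = random.Random(0)

def dist_after(t):
    # Distance-from-root chain of the walk on a deep binary tree:
    # at the root the walk must move to a child (0 -> 1); at an internal
    # vertex it moves to the parent w.p. 1/3 and to a child w.p. 2/3.
    D = 0
    for _ in range(t):
        D = 1 if D == 0 else (D + 1 if rng.random() < 2 / 3 else D - 1)
    return D

t, n = 60, 20000
p_hat = sum(dist_after(t) >= t / 6 for _ in range(n)) / n
# p_hat estimates P_o[dist(X_t, o) >= t/6]; it should be bounded away from 0.
```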
Example 21.8. We complete the picture in Figure 5 by giving an example of a recurrent graph with positive speed.
Recall that for the (d_k)_k grove, the expected time to reach the vertex r ∈ N is
E_0[T_r] = 4 Σ_{k=0}^{r−1} (r − k)2^{d_k} − (3/2)r(r + 1).
Let (d_k)_k be an increasing sequence that satisfies
d_r > 1 + 4E_0[T_r] = 1 + 4 · (4 Σ_{k=0}^{r−1} (r − k)2^{d_k} − (3/2)r(r + 1)).
(This sequence must grow super-fast, at least like the Ackermann tower function.) Note that this ensures that d_r > ⌈4E_0[T_r]⌉. Consider the (d_k)_k-grove τ = τ((d_k)_k).
τ is of course recurrent.
Recall that for a random walk (X_t)_t on τ and for t ≥ 4E_0[T_r], we have that P_0[|X_t| < r] ≤ 3/4 (where |X_t| is the root of the finite subtree containing X_t).
Given that X_0 = r, we have by Lemma 21.7, for any t ≤ d_r,
P_r[dist(X_t, r) ≥ (1/6)t] ≥ c > 0,
for some universal constant c > 0. So if we take t = 2t′ for t′ = ⌈4E_0[T_r]⌉, then t′ < d_r, so
P_0[dist(X_t, 0) ≥ (1/6)t′] ≥ Σ_{k=r}^{t′} P_0[|X_{t′}| = k] · P_k[dist(X_t, k) ≥ (1/6)t′] ≥ (1/4) · c.
So
E_0[dist(X_t, 0)] ≥ (1/4)c · (1/6)t′ ≍ t.
And so τ has positive speed. ⋄
Number of exercises in lecture: 2
Total number of exercises until here: 34
Random Walks
Ariel Yadin
Lecture 22:
22.1. Galton-Watson Processes
The final topic for this course is a special Markov chain on trees, known as the Galton-Watson
process.
Francis Galton (1822–1911) and Henry Watson (1827–1903) were interested in the question of the survival of aristocratic surnames in the Victorian era. They proposed a model to study the dynamics of such a family name.
In words, the model can be stated as follows. We start with one individual. This individual has a certain random number of offspring. Thus passes one generation. In the next generation, each one of the offspring has its own offspring independently. The process continues, building a random tree of descent.
The formal definition is a bit complicated. For the moment let us focus only on the population
size at a given generation.
• Definition 22.1. Let µ be a distribution on N; i.e. µ : N → [0, 1] such that Σ_n µ(n) = 1. The Galton-Watson Process with offspring distribution µ (also denoted GW_µ) is the following Markov chain (Z_n)_n on N:
Let (X_{j,k})_{j,k∈N} be a sequence of i.i.d. random variables with distribution µ.
• At generation n = 0 we set Z_0 = 1. [ Start with one individual. ]
• Given Z_n, let
Z_{n+1} := Σ_{k=1}^{Z_n} X_{n+1,k}.
[ X_{n+1,k} represents the number of offspring of the k-th individual in generation n. ]
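The definition translates directly into a simulation. A minimal sketch (the offspring distribution µ below is an arbitrary example, not one from the text):

```python
import random

rng = random.Random(2016)

def gw_trajectory(mu, generations):
    # mu is a dict n -> mu(n); returns the list (Z_0, ..., Z_generations).
    outcomes = list(mu)
    weights = [mu[n] for n in outcomes]
    Z = [1]  # Z_0 = 1: start with one individual
    for _ in range(generations):
        # Each of the Z_n individuals has an i.i.d. mu-distributed
        # number of offspring; Z_{n+1} is their total.
        Z.append(sum(rng.choices(outcomes, weights)[0] for _ in range(Z[-1])))
    return Z

mu = {0: 0.25, 1: 0.25, 2: 0.5}
traj = gw_trajectory(mu, 10)
```

Note that once Z_n = 0 the sum is empty, so the process stays at 0, matching the definition.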
Example 22.2. If µ(0) = 1 then the GW_µ process is just the sequence Z_0 = 1, Z_n = 0 for all n > 0.
If µ(1) = 1 then GW_µ is Z_n = 1 for all n.
How about µ(0) = p = 1 − µ(1)? In this case, Z_0 = 1, and given that Z_n = 1, we have that Z_{n+1} = 0 with probability p, and Z_{n+1} = 1 with probability 1 − p, independently of all (Z_k : k ≤ n). If Z_n = 0 then Z_{n+1} = 0 as well.
What is the distribution of T = inf{n : Z_n = 0}? One can easily check that T ∼ Geo(p). So GW_µ is essentially a geometric random variable.
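This is easy to confirm by simulation; a quick sketch (p = 0.3 is an arbitrary choice):

```python
import random

rng = random.Random(0)
p = 0.3  # mu(0) = p, mu(1) = 1 - p

def extinction_time():
    # The single line of descent survives each generation w.p. 1 - p,
    # so T = inf{n : Z_n = 0} should be Geo(p).
    n = 0
    while True:
        n += 1
        if rng.random() < p:
            return n

samples = [extinction_time() for _ in range(50000)]
mean_T = sum(samples) / len(samples)
# mean_T should be close to E[Geo(p)] = 1/p
```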
We will in general assume that µ(0) + µ(1) < 1, otherwise the process is not interesting.
⋄
22.2. Generating Functions
X Notation: For a function f : R → R we write f^{(n)} = f ∘ ··· ∘ f for the composition of f with itself n times.
Let X be a random variable with values in N. The probability generating function, or PGF, is defined as
G_X(z) := E[z^X] = Σ_n P[X = n] z^n.
This function can be thought of as a function from [0, 1] to [0, 1]. If µ(n) = P[X = n] is the density of X, then we write G_µ = G_X.
Some immediate properties:
Exercise 22.1. Let G_X be the probability generating function of a random variable X with values in N. Show that
• If z ∈ [0, 1] then 0 ≤ G_X(z) ≤ 1.
• G_X(1) = 1.
• G_X(0) = P[X = 0].
• G′_X(1−) = E[X].
• E[X²] = G″_X(1−) + G′_X(1−).
• (1/n!) · (∂ⁿ/∂zⁿ) G_X(0+) = P[X = n].
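For a distribution with finite support these identities can be checked directly; a small sketch with an arbitrary example distribution:

```python
# Check the basic PGF identities for a finite-support X.
mu = {0: 0.2, 1: 0.3, 2: 0.5}  # arbitrary example distribution of X

def G(z):
    # G_X(z) = sum_n P[X = n] z^n
    return sum(p * z ** n for n, p in mu.items())

def G1(z):
    # first derivative of G
    return sum(n * p * z ** (n - 1) for n, p in mu.items() if n >= 1)

def G2(z):
    # second derivative of G
    return sum(n * (n - 1) * p * z ** (n - 2) for n, p in mu.items() if n >= 2)

EX = sum(n * p for n, p in mu.items())       # E[X]
EX2 = sum(n * n * p for n, p in mu.items())  # E[X^2]

assert abs(G(1) - 1) < 1e-12            # G_X(1) = 1
assert abs(G(0) - mu[0]) < 1e-12        # G_X(0) = P[X = 0]
assert abs(G1(1) - EX) < 1e-12          # G'_X(1-) = E[X]
assert abs(G2(1) + G1(1) - EX2) < 1e-12 # E[X^2] = G''_X(1-) + G'_X(1-)
assert abs(G2(0) / 2 - mu[2]) < 1e-12   # (1/2!) G''_X(0+) = P[X = 2]
```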
• Proposition 22.3. A PGF G_X is convex on [0, 1].
Proof. G_X is twice differentiable, with
G″_X(z) = E[X(X − 1)z^{X−2}] ≥ 0. □
The PGF is an important tool in the study of Galton-Watson processes.
128
• Proposition 22.4. Let (Z_n)_n be a GW_µ process. For z ∈ [0, 1],
E[z^{Z_{n+1}} | Z_0, . . . , Z_n] = G_µ(z)^{Z_n}.
Thus,
G_{Z_n} = G_µ^{(n)} = G_µ ∘ ··· ∘ G_µ.
Proof. Conditioned on Z_0, . . . , Z_n, we have that
Z_{n+1} = Σ_{k=1}^{Z_n} X_k,
where X_1, X_2, . . . are i.i.d. distributed according to µ. Thus,
E[z^{Z_{n+1}} | Z_0, . . . , Z_n] = E[ Π_{k=1}^{Z_n} z^{X_k} | Z_0, . . . , Z_n ] = Π_{k=1}^{Z_n} E[z^{X_k}] = G_µ(z)^{Z_n}.
Taking expectations of both sides we have that
G_{Z_{n+1}}(z) = E[z^{Z_{n+1}}] = E[G_µ(z)^{Z_n}] = G_{Z_n}(G_µ(z)) = G_{Z_n} ∘ G_µ(z).
An inductive procedure gives
G_{Z_n} = G_{Z_{n−1}} ∘ G_µ = G_{Z_{n−2}} ∘ G_µ ∘ G_µ = ··· = G_µ^{(n)},
since G_{Z_1} = G_µ. □
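Proposition 22.4 can be tested numerically by comparing a Monte Carlo estimate of E[z^{Z_n}] with the n-fold composition G_µ^{(n)}(z). A sketch, with an arbitrarily chosen offspring distribution and evaluation point:

```python
import random

rng = random.Random(7)
mu = {0: 0.25, 1: 0.25, 2: 0.5}  # arbitrary example offspring distribution

def G(z):
    # G_mu(z) = sum_n mu(n) z^n
    return sum(p * z ** n for n, p in mu.items())

def G_iter(z, n):
    # n-fold composition G^{(n)}(z)
    for _ in range(n):
        z = G(z)
    return z

def sample_Zn(n):
    # Simulate n generations of the GW_mu process, starting from Z_0 = 1.
    outcomes, weights = list(mu), list(mu.values())
    Z = 1
    for _ in range(n):
        Z = sum(rng.choices(outcomes, weights)[0] for _ in range(Z))
    return Z

n, z0, N = 4, 0.5, 40000
mc = sum(z0 ** sample_Zn(n) for _ in range(N)) / N  # Monte Carlo E[z0^{Z_n}]
exact = G_iter(z0, n)                               # G^{(n)}(z0)
# mc and exact should agree up to Monte Carlo error.
```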
22.3. Extinction
Recall that the first question we would like to answer is the extinction probability for a GW process.
Let (Z_n)_n be a GW_µ process. Extinction is the event {∃ n : Z_n = 0}. The extinction probability is defined to be q = q(GW_µ) = P[∃ n : Z_n = 0]. Note that the events {Z_n = 0} form an increasing sequence, so
q(GW_µ) = lim_{n→∞} P[Z_n = 0].
• Proposition 22.5. Consider a GW_µ process. (Assume that µ(0) + µ(1) < 1.) Let q = q(GW_µ) be the extinction probability and G = G_µ. Then,
• q is the smallest solution to the equation G(z) = z. If only one solution exists, q = 1. Otherwise, q < 1 and the only other solution is G(1) = 1.
• q = 1 if and only if G′(1−) = E[X] ≤ 1.
X Positivity of the extinction probability depends only on the mean number of offspring!
Proof. If P[X = 0] = G(0) = 0 then Z_n ≥ Z_{n−1} for all n, so q = 0, because there is never extinction. Also, the only solutions to G(z) = z in this case are 0 and 1: since G″(z) > 0 for z > 0, G is strictly convex, and thus G(z) < z for all z ∈ (0, 1). So we can assume that G(0) > 0.
Let f(z) = G(z) − z. So f″(z) > 0 for z > 0, and thus f′ is a strictly increasing function.
• Case 1: G′(1−) ≤ 1. Then f′(1−) ≤ 0. Since f′(0+) = −(1 − µ(1)) < 0 (because µ(1) < 1), and since f′ is strictly increasing, for all z < 1 we have that f′(z) < 0. Thus f is strictly decreasing on [0, 1], so f(z) > 0 for all z < 1 and there is only one solution to f(z) = 0, at z = 1.
• Case 2: G′(1−) > 1. Then f′(1−) > 0. Since f′(0+) < 0 there must be some 0 < x < 1 such that f′(x) = 0. Since f′ is strictly increasing, this x is the unique minimum of f in [0, 1]. Since f′(z) > 0 for z > x, we have that f(x) < f((1 + x)/2) ≤ f(1) = 0. Also, f(0) = µ(0) > 0, and because f is continuous, there exists a 0 < p < x such that f(p) = 0.
We claim that p and 1 are the only solutions to f(z) = 0. Indeed, if a < b are any two solutions, then because f is strictly convex, for any a < z < b, writing z = αa + (1 − α)b with α ∈ (0, 1), we have f(z) < αf(a) + (1 − α)f(b) = 0. So f is strictly negative between any two solutions, and there cannot be a third.
In conclusion, in the case G′(1−) > 1 there are exactly two solutions to G(z) = z, namely p and 1.
Moreover, p < x for x the unique minimum of f, so because f′ is strictly increasing,
−1 ≤ −(1 − µ(1)) = f′(0+) ≤ f′(z) ≤ f′(p) < f′(x) = 0
for any z ≤ p. Thus, for any z ≤ p we have that
f(z) = f(z) − f(p) = −∫_z^p f′(t) dt ≤ p − z,
which implies that G(z) ≤ p for any z ≤ p.
Now, recall that the extinction probability admits
q = lim_{n→∞} P[Z_n = 0] = lim_{n→∞} G_{Z_n}(0) = lim_{n→∞} G^{(n)}(0).
Since G is a continuous function, we get that G(q) = q, so q is a solution to G(z) = z.
If two solutions exist (equivalently, G′(1−) > 1), say p and 1, then G^{(n)}(0) ≤ p for all n, so q ≤ p and thus q = p < 1.
If only one solution exists then q = 1. □
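The proof also suggests an algorithm: iterate G from 0 until the fixed point. As an illustration (the offspring distribution is an arbitrary example, not one from the text), take µ(0) = 1/4, µ(1) = 1/4, µ(2) = 1/2, so G(z) = 1/4 + z/4 + z²/2. Its fixed points are z = 1/2 and z = 1, and since the mean 5/4 exceeds 1 the proposition gives q = 1/2:

```python
def G(z):
    # PGF of the example offspring distribution
    # mu(0) = 1/4, mu(1) = 1/4, mu(2) = 1/2.
    return 0.25 + 0.25 * z + 0.5 * z ** 2

# Extinction probability as the limit of G^{(n)}(0):
q = 0.0
for _ in range(200):
    q = G(q)
# q converges to the smallest fixed point of G, here 1/2.
```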
Figure 7. The two possibilities for G′(1−). The blue dotted line and crosses show how the iterates G^{(n)}(0) advance toward the minimal solution of G(z) = z.