RANDOM WALKS
ARIEL YADIN
Course: 201.1.8031 Spring 2016
Lecture notes updated: May 2, 2016
Contents
Lecture 1. Introduction
Lecture 2. Markov Chains
Lecture 3. Recurrence and Transience
Lecture 4. Stationary Distributions
Lecture 5. Positive Recurrent Chains
Lecture 6. Convergence to Equilibrium
Lecture 7. Conditional Expectation
Lecture 8. Martingales
Lecture 9. Reversible Chains
Lecture 10. Discrete Analysis
Lecture 11. Networks
Lecture 12. Network Reduction
Lecture 13. Thomson's Principle
Lecture 14. Nash-Williams
Lecture 15. Flows
Lecture 16. Resistance in Euclidean Lattices
Lecture 17. Spectral Analysis
Lecture 18. Kesten's Amenability Criterion
Lecture 19.
Lecture 20.
Lecture 21.
Lecture 22.
Number of exercises in lecture: 0
Total number of exercises until here: 0
Lecture 1: Introduction
1.1. Overview
In this course we will study the behavior of random processes; that is, processes that evolve
in time with some randomness, or probability measure, governing the evolution.
Let us give some examples:
• A gambler playing roulette.
• A drunk man walking in some city.
• A drunk bird flying in the sky.
• The evolution of a certain family name.
Some questions which we will be able to (hopefully) answer by the end of the course:
• Suppose a gambler starts with N Shekel. What is the probability that the gambler will
earn another N Shekel before losing all of the money?
• How long will it take for a drunk man walking to reach either his house or the city limits?
• Suppose a chess knight moves randomly on a chess board. Will the knight eventually
return to the starting point? What is the expected number of steps until the knight
returns?
• Suppose that men of the Rothschild family have three children on average. What is the
probability that the Rothschild name will still be alive in another 100 years? Is there
positive probability for the Rothschild name to survive forever?
1.2. Random Walks on Z
We will start with a "soft" example, and then go into the deeper and more precise theory.
What is a random walk? A (simple) random walk on a graph is a process, or a sequence of
vertices, such that at every step the next vertex is chosen uniformly among the neighbors of the
current vertex, each step of the walk independently.
X Story about Polya meeting a couple in the woods.
George Polya (1887-1985)
Figure 1. Path of a drunk man walking in the streets.
Figure 2. Path of a drunk bird flying around.
Now, suppose we want to perform a random walk on $\mathbb{Z}$. If the "walker" is at a vertex $z$, then a uniformly chosen neighbor is $z+1$ or $z-1$, each with probability $1/2$.
That is, we can model a random walk on $\mathbb{Z}$ by considering an i.i.d. sequence $(X_k)_{k=1}^{\infty}$, where $X_k$ is uniform on $\{-1,+1\}$, and the walk will be $S_t = \sum_{k=1}^{t} X_k$. So $X_k$ is the $k$-th step of the walk, and $S_t$ is the position after $t$ steps.
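This construction is easy to simulate. The following minimal Python sketch (the function name is ours, not from the notes) draws the i.i.d. $\pm 1$ steps and accumulates the partial sums:

```python
import random

def simple_random_walk(t, seed=None):
    """Return (S_0, S_1, ..., S_t): partial sums of i.i.d. uniform {-1,+1} steps."""
    rng = random.Random(seed)
    s, path = 0, [0]
    for _ in range(t):
        s += rng.choice((-1, 1))  # the step X_k
        path.append(s)            # the position S_k
    return path

path = simple_random_walk(1000, seed=0)
```

Note that consecutive positions differ by exactly $1$, and $S_k$ has the same parity as $k$, which is the parity observation used below.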
Let us consider a few properties of the random walk on Z:
First let us calculate the expected number of visits to 0 by time t:
• Proposition 1.1. Let $(S_t)_t$ be a random walk on $\mathbb{Z}$. Denote by $V_t$ the number of visits to $0$ up to time $t$; that is,
$$V_t = \#\{1 \le k \le t : S_k = 0\}.$$
Then, there exists a constant $c > 0$ such that for all $t$,
$$\mathbb{E}[V_t] \ge c\sqrt{t}.$$
Proof. An inequality we will use is Stirling's approximation of $n!$:
$$\sqrt{2\pi n}\,(n/e)^n\, e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\,(n/e)^n\, e^{\frac{1}{12n}}.$$
This leads by a bit of careful computation to:
$$\frac{1}{\sqrt{\pi n}}\cdot 2^{2n}\exp\Big(-\frac{1}{12n+1}\Big) < \binom{2n}{n} < \frac{1}{\sqrt{\pi n}}\cdot 2^{2n}\exp\Big(\frac{1}{12n}\Big).$$
Specifically,
$$\frac{1}{2} < \sqrt{\pi n}\cdot 2^{-2n}\binom{2n}{n} < 2.$$
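This two-sided bound on the central binomial coefficient is easy to sanity-check numerically. A quick sketch (ours, not part of the notes), using exact integer binomials:

```python
from math import comb, pi, sqrt

# Verify 1/2 < sqrt(pi*n) * 2^(-2n) * binom(2n, n) < 2 for a range of n.
for n in range(1, 300):
    ratio = sqrt(pi * n) * comb(2 * n, n) / 4 ** n
    assert 0.5 < ratio < 2, (n, ratio)
print("bound holds for n = 1, ..., 299")
```

In fact the ratio increases toward $1$ as $n$ grows, consistent with the sharper exponential bounds above.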
James Stirling (1692-1770)
Now, what is the probability $\mathbb{P}[S_k = 0]$? Note that there are $k$ steps, so for $S_k = 0$ we need that the number of $+1$ steps equals the number of $-1$ steps. Rigorously, if
$$R_t = \#\{1 \le k \le t : X_k = 1\} \quad\text{and}\quad L_t = \#\{1 \le k \le t : X_k = -1\},$$
then $R_t + L_t = t$. Moreover, the distribution of $R_t$ is $\mathrm{Bin}(t, 1/2)$. Also, $S_t = R_t - L_t$, so for $S_t = 0$ we need that $R_t = L_t = t/2$. This is only possible for even $t$, and we get
$$\mathbb{P}[S_{2k} = 0] = \mathbb{P}[R_{2k} = k] = \binom{2k}{k} 2^{-2k} \quad\text{and}\quad \mathbb{P}[S_{2k+1} = 0] = 0.$$
Now, note that $V_t = \sum_{k=1}^{t} \mathbf{1}_{\{S_k = 0\}}$. So
$$\mathbb{E}[V_t] = \sum_{k=1}^{t} \mathbb{P}[S_k = 0] = \sum_{k=1}^{\lfloor t/2 \rfloor} \binom{2k}{k} 2^{-2k} \ge \sum_{k=1}^{\lfloor t/2 \rfloor} \frac{1}{2\sqrt{\pi k}}.$$
Since
$$\sum_{k=1}^{m} \frac{1}{\sqrt{\pi k}} \ge \int_{1}^{m+1} \frac{1}{\sqrt{\pi x}}\, dx = 2\pi^{-1/2}\cdot\big(\sqrt{m+1} - 1\big),$$
we get that $\mathbb{E}[V_t] \ge c\sqrt{t}$ for some $c > 0$. $\square$
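The exact sum $\sum_{k=1}^{\lfloor t/2\rfloor}\binom{2k}{k}2^{-2k}$ can be compared against a Monte Carlo estimate of $\mathbb{E}[V_t]$. A sketch (ours; the horizon $t=100$ and trial count are arbitrary choices):

```python
import random
from math import comb

def expected_visits_mc(t, trials, seed=0):
    """Monte Carlo estimate of E[V_t], the expected number of visits to 0 by time t."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = 0
        for _ in range(t):
            s += rng.choice((-1, 1))
            if s == 0:
                total += 1
    return total / trials

estimate = expected_visits_mc(t=100, trials=8000)
exact = sum(comb(2 * k, k) / 4 ** k for k in range(1, 51))  # E[V_100] exactly
```

For $t = 100$ the exact value is about $7.0$, already of the order $c\sqrt{t}$ promised by the proposition.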
Let us now consider the probability that the random walker will return to the origin.
• Proposition 1.2. $\mathbb{P}[\exists\, t \ge 1 : S_t = 0] = 1$.
Proof. Let $p = \mathbb{P}[\exists\, t \ge 1 : S_t = 0]$. Assume for a contradiction that $p < 1$. (Note that $p > 0$, since $p \ge \mathbb{P}[S_2 = 0] = \frac{1}{2}$.) Suppose that $S_t = 0$ for some $t > 0$. Then, since $S_{t+k} = S_t + \sum_{j=1}^{k} X_{t+j}$, the process $(S_{t+k} - S_t)_k$ has the same distribution as a random walk on $\mathbb{Z}$, and is independent of $(X_1, \ldots, X_t)$. So $\mathbb{P}[\exists\, k \ge 1 : S_{t+k} = 0 \mid S_t = 0] = p$. Thus, every time we are at $0$ there is probability $0 < 1-p < 1$ to never return.
Now we consider the different "excursions". That is, let $T_0 = 0$ and define inductively
$$T_k = \inf\{t \ge T_{k-1} + 1 : S_t = 0\},$$
where $\inf \emptyset = \infty$. Now let $K$ be the first $k$ such that $T_k = \infty$. The analysis above gives that for $k \ge 1$,
$$\mathbb{P}[K = k] = \mathbb{P}[T_1 < \infty, \ldots, T_{k-1} < \infty, T_k = \infty] = \mathbb{P}[T_1 - T_0 < \infty, \ldots, T_{k-1} - T_{k-2} < \infty, T_k - T_{k-1} = \infty].$$
The main observation now is that the different $T_k - T_{k-1}$ are independent, so $\mathbb{P}[K = k] = p^{k-1}(1-p)$. That is, $K \sim \mathrm{Geo}(1-p)$. Thus, $\mathbb{E}[K] = \frac{1}{1-p} < \infty$. But note that $K$ is exactly the number of visits to $0$ of the infinite-time walk, so $V_t \le K$ for all $t$, and hence $\mathbb{E}[V_t] \le \frac{1}{1-p}$. However, in the previous proposition we have shown that $\mathbb{E}[V_t] \ge c\sqrt{t} \to \infty$, a contradiction!
So it must be that $p = 1$. $\square$
It is not a coincidence that the expected number of visits to 0 is infinite, and that the
probability to return to 0 is 1. This will also be the case in 2 dimensions, but not in 3 dimensions.
In the upcoming classes we will rigorously prove the following theorem by Polya.
••• Theorem 1.3. Fix $d \ge 1$. Let $(X_k)_k$ be i.i.d. $d$-dimensional random variables uniformly distributed on $\{\pm e_1, \ldots, \pm e_d\}$ (where $e_1, \ldots, e_d$ is the standard basis for $\mathbb{R}^d$). Let $S_t = \sum_{k=1}^{t} X_k$. Let $p(d) = \mathbb{P}[\exists\, t \ge 1 : S_t = 0]$. Then, $p(d) = 1$ for $d \le 2$ and $p(d) < 1$ for $d \ge 3$.
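The dichotomy in the theorem is already visible in simulation. A sketch (ours, not from the notes; the horizon and trial counts are arbitrary) estimates the probability of revisiting the origin within a finite horizon:

```python
import random

def return_frequency(d, horizon, trials, seed=0):
    """Estimate P[walk on Z^d revisits the origin within `horizon` steps]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = [0] * d
        for _ in range(horizon):
            i = rng.randrange(d)           # pick a coordinate direction
            pos[i] += rng.choice((-1, 1))  # step +- e_i
            if not any(pos):               # back at the origin
                hits += 1
                break
    return hits / trials

f1 = return_frequency(1, 1000, 400)
f3 = return_frequency(3, 1000, 400)
```

As the horizon grows, the frequency for $d = 1$ (and $d = 2$) creeps toward $1$, while for $d = 3$ it stays bounded away from $1$ (the true value $p(3)$ is roughly $0.34$).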
Remark 1.4. The proof for $d \ge 3$ is mainly that $\mathbb{P}[S_t = 0] \le C t^{-d/2}$. Thus, for $d \ge 3$,
$$\sum_{t=1}^{\infty} \mathbb{P}[S_t = 0] < \infty.$$
So by the Borel-Cantelli Lemma, $\mathbb{P}[S_t = 0 \text{ i.o.}] = 0$. In other words,
$$\mathbb{P}[\exists\, T : \forall\, t > T,\ S_t \neq 0] = \mathbb{P}[\liminf\, \{S_t \neq 0\}] = 1.$$
Thus, a.s. the number of visits to $0$ is finite. If the probability to return to $0$ were $1$, then the number of visits to $0$ would be infinite a.s. All this will be done rigorously in the upcoming classes.
Number of exercises in lecture: 0
Total number of exercises until here: 0
Lecture 2: Markov Chains
2.1. Preliminaries
2.1.1. Graphs. We will make use of the structure known as a graph:
X Notation: For a set $S$ we use $\binom{S}{k}$ to denote the set of all subsets of size $k$ in $S$; e.g. $\binom{S}{2} = \big\{\{x,y\} : x, y \in S,\ x \neq y\big\}$.
• Definition 2.1. A graph $G$ is a pair $G = (V(G), E(G))$, where $V(G)$ is a countable set, and $E(G) \subset \binom{V(G)}{2}$.
The elements of $V(G)$ are called vertices. The elements of $E(G)$ are called edges. The notation $x \overset{G}{\sim} y$ (sometimes just $x \sim y$ when $G$ is clear from the context) is used for $\{x,y\} \in E(G)$.
If $x \sim y$, we say that $x$ is a neighbor of $y$, or that $x$ is adjacent to $y$. If $x \in e \in E(G)$ then the edge $e$ is said to be incident to $x$, and $x$ is incident to $e$.
The degree of a vertex $x$, denoted $\deg(x) = \deg_G(x)$, is the number of edges incident to $x$ in $G$.
X Notation: Many times we will use x ∈ G instead of x ∈ V (G).
Example 2.2. • The complete graph.
• Empty graph on n vertices.
• Cycles.
• Z,Z2,Zd.
• Regular trees.
• Cayley graphs of finitely generated groups: Let $G = \langle S \rangle$ be a finitely generated group, with a finite generating set $S$ such that $S$ is symmetric ($S = S^{-1}$). Then, we can equip $G$ with a graph structure $C = C_{G,S}$ by letting $V(C) = G$ and $\{g,h\} \in E(C)$ iff $g^{-1}h \in S$. $S$ being symmetric implies that this is a graph.
CG,S is called the Cayley graph of G with respect to S.
Examples: Zd, regular trees, cycles, complete graphs.
△
• Definition 2.3. Let $G$ be a graph. A path in $G$ is a sequence $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_n)$ (with the possibility of $n = \infty$) such that for all $j$, $\gamma_j \sim \gamma_{j+1}$. $\gamma_0$ is the start vertex and $\gamma_n$ is the end vertex (when $n < \infty$).
The length of $\gamma$ is $|\gamma| = n$.
If $\gamma$ is a path in $G$ such that $\gamma$ starts at $x$ and ends at $y$ we write $\gamma : x \to y$.
The notion of a path on a graph gives rise to two important notions: connectivity and graph
distance.
• Definition 2.4. Let $G$ be a graph. For two vertices $x, y \in G$ define
$$\mathrm{dist}(x,y) = \mathrm{dist}_G(x,y) := \inf\{|\gamma| : \gamma : x \to y\},$$
where $\inf \emptyset = \infty$.
Exercise 2.1. Show that distG defines a metric on G.
(Recall that a metric is a function that satisfies:
• ρ(x, y) ≥ 0 and ρ(x, y) = 0 iff x = y.
• ρ(x, y) = ρ(y, x).
• ρ(x, y) ≤ ρ(x, z) + ρ(z, y). )
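On a concrete graph, $\mathrm{dist}_G$ can be computed by breadth-first search; since all edges have length $1$, the first time BFS discovers $y$ gives the shortest path length. A sketch (ours, not part of the notes), with a 6-cycle as a test case:

```python
from collections import deque

def graph_distance(adj, x, y):
    """Breadth-first search computes dist_G(x, y); float('inf') if x and y
    are not connected (the inf of the empty set of paths)."""
    if x == y:
        return 0
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == y:
                    return dist[v]
                queue.append(v)
    return float("inf")

# the cycle Z/6Z as an adjacency dict
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

One can check the metric axioms of Exercise 2.1 numerically on such small examples.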
• Definition 2.5. Let G be a graph. We say that vertices x and y are connected if there exists
a path γ : x → y of finite length. That is, if distG(x, y) < ∞. We denote x connected to y by
x↔ y.
The relation ↔ is an equivalence relation, so we can speak of equivalence classes. The equiv-
alence class of a vertex x under this relation is called the connected component of x.
If a graph $G$ has only one connected component it is called connected. That is, $G$ is connected if for every $x, y \in G$ we have that $x \leftrightarrow y$.
Exercise 2.2. Prove that ↔ is an equivalence relation in any graph.
X In this course we will focus on connected graphs.
X Notation: For a path in a graph $G$, or more generally, a sequence of elements from a set $S$, we use the following "time" notation: If $s = (s_0, s_1, \ldots, s_n, \ldots)$ is a sequence in $S$ (finite or infinite), then $s[t_1, t_2] = (s_{t_1}, s_{t_1+1}, \ldots, s_{t_2})$ for all integers $0 \le t_1 \le t_2$.
2.1.2. S-valued random variables. Given a countable set $S$, we can define a discrete topology on $S$. Thus, the Borel $\sigma$-algebra on $S$ is just the complete $\sigma$-algebra $2^S$. This gives rise to the notion of $S$-valued random variables, which is just a fancy name for functions $X$ from a probability space into $S$ such that for every $s \in S$ the pull-back $X^{-1}(s)$ is an event.
That is,
• Definition 2.6. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $S$ be a countable set. An $S$-valued random variable is a function $X : \Omega \to S$ such that for any $s \in S$, $X^{-1}(s) \in \mathcal{F}$.
2.1.3. Sequences - infinite dimensional vectors. At some point, we will want to consider
sequences of random variables. If X = (Xn)n is a sequence of S-valued random variables, we
can think of X as an infinite dimensional vector.
What is the appropriate measurable space for such vectors?
Well, we can consider $\Omega = S^{\mathbb{N}}$, the space of all sequences in $S$. Next, we have a $\pi$-system of cylinder sets: Given a finite sequence $s_0, s_1, \ldots, s_m$ in $S$, the cylinder induced by these is $C = C(s_0, s_1, \ldots, s_m) = \{\omega \in S^{\mathbb{N}} : \omega_0 = s_0, \ldots, \omega_m = s_m\}$. The collection of all cylinder sets forms a $\pi$-system. We let $\mathcal{F}$ be the $\sigma$-algebra generated by this $\pi$-system.
2.1.4. Caratheodory and Kolmogorov extension. Now suppose we have a probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$ as above. For every $n$, we can consider the restriction of $\mathbb{P}$ to the first $n$ coordinates; that is, we can consider $\Omega_n = S^n$ and the full $\sigma$-algebra on $\Omega_n$, and then
$$\mathbb{P}_n[s_0, s_1, \ldots, s_{n-1}] := \mathbb{P}[C(s_0, s_1, \ldots, s_{n-1})]$$
defines a probability measure on $\Omega_n$. Note that these measures are consistent, in the sense that for any $n > m$,
$$\mathbb{P}_m[s_0, \ldots, s_m] = \mathbb{P}_n[\{\omega \in S^n : \omega_0 = s_0, \ldots, \omega_m = s_m\}].$$
Theorems by Caratheodory and Kolmogorov tell us that if we started with a consistent family of probability measures on $S^n$, $n = 1, 2, \ldots$, we could find a unique extension of these whose restriction would give these measures.
Constantin Caratheodory (1873-1950)
Andrey Kolmogorov (1903-1987)
In other words, the finite-dimensional marginals determine the probability measure of the
sequence.
2.1.5. Matrices. Recall that if $A, B$ are $n \times n$ matrices and $v$ is an $n$-dimensional vector, then $Av, vA$ are vectors defined by
$$(Av)_k = \sum_{j=1}^{n} A_{k,j} v_j \quad\text{and}\quad (vA)_k = \sum_{j=1}^{n} v_j A_{j,k}.$$
Also, $AB$ is the matrix defined by
$$(AB)_{m,k} = \sum_{j=1}^{n} A_{m,j} B_{j,k}.$$
These definitions can be generalized to infinite dimensions.
Also, we will view vectors as functions, and matrices as operators. For example, let $C_0(\mathbb{N}) = \mathbb{R}^{\mathbb{N}} = \{f : \mathbb{N} \to \mathbb{R}\}$. Then, any infinite matrix $A$ acts as an operator on $C_0(\mathbb{N})$ by defining
$$(Af)(k) := \sum_{n} A(k,n) f(n) \quad\text{and}\quad (fA)(k) := \sum_{n} f(n) A(n,k).$$
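For finitely supported $f$, the sums above are finite and can be computed directly. A small sketch of this "matrix as operator" point of view (ours, not from the notes), using the simple random walk kernel on $\mathbb{Z}$ as the example matrix:

```python
def apply_operator(A, f):
    """(Af)(k) = sum_n A(k, n) f(n), for A given as a kernel function A(k, n)
    and f a finitely supported function represented as a dict {n: f(n)}."""
    def Af(k):
        return sum(A(k, n) * v for n, v in f.items())
    return Af

# Example kernel: the transition matrix of the simple random walk on Z.
P = lambda k, n: 0.5 if abs(k - n) == 1 else 0.0
g = apply_operator(P, {0: 1.0})  # g(k) = P(k, 0)
```

Here $g = P\mathbf{1}_{\{0\}}$, so $g(k)$ is the probability of stepping from $k$ to $0$.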
2.2. Markov Chains
A stochastic process is just a sequence of random variables. If $(X_n)_n$ is a stochastic process, then we can think of the sequence $(X_n)_n$ as an infinite-dimensional random variable: consider the function $f : \mathbb{N} \to \mathbb{R}$ defined by $f(n) = X_n$. This is a different function for each $\omega \in \Omega$, so we can view it as a random function.
Up till now we have not restricted our processes - so anything can be a stochastic process.
However, in the discussion regarding random walks, we wanted the current step to be dependent
only on the position, regardless of the history and time. This gives rise to the following definition:
• Definition 2.7. Let S be a countable set. A Markov chain on S is a sequence (Xn)n≥0 of
S-valued random variables (i.e. measurable functions Xn : Ω → S), that satisfies the following
Markovian property:
• For any $n \ge 0$, and any $s_0, s_1, \ldots, s_n, s_{n+1} \in S$,
$$\mathbb{P}[X_{n+1} = s_{n+1} \mid X_0 = s_0, \ldots, X_n = s_n] = \mathbb{P}[X_{n+1} = s_{n+1} \mid X_n = s_n] = \mathbb{P}[X_1 = s_{n+1} \mid X_0 = s_n].$$
Andrey Markov (1856-1922)
That is, the probability to go from $s$ to $s'$ does not depend on $n$ or on the history, but only on the current position being at $s$ and on $s'$. This property is known as the Markov property.
X A set S as above is called the state space.
Remark 2.8. Any Markov chain is characterized by its transition matrix.
Let $(X_n)_n$ be a Markov chain on $S$. For $x, y \in S$ define $P(x,y) = \mathbb{P}[X_{n+1} = y \mid X_n = x]$ (which is independent of $n$). Then, $P$ is a $|S| \times |S|$ matrix indexed by the elements of $S$. One immediately notices that for all $x$,
$$\sum_{y \in S} P(x,y) = 1,$$
and that all the entries of $P$ are in $[0,1]$. Such a matrix is called stochastic. [Each row of the matrix is a probability measure on $S$.]
On the other hand, suppose that $P$ is a stochastic matrix indexed by a countable set $S$. Then, one can define a sequence of $S$-valued random variables as follows. Let $X_0 = x$ for some fixed starting point $x \in S$. For all $n \ge 0$, conditioned on $X_0 = s_0, \ldots, X_n = s_n$, define $X_{n+1}$ as the random variable with distribution $\mathbb{P}[X_{n+1} = y \mid X_n = s_n, \ldots, X_0 = s_0] = P(s_n, y)$. One can verify that this defines a Markov chain.
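This construction is exactly how one samples a Markov chain in practice: at each step, draw the next state from the row $P(X_n, \cdot)$. A minimal sketch (ours; the 3-state matrix below is a hypothetical example, not from the notes):

```python
import random

def sample_chain(P, x0, steps, seed=None):
    """Sample X_0, ..., X_steps, where row P[x] is the distribution of the
    next state given the current state x, i.e. P[x][y] = P(x, y)."""
    rng = random.Random(seed)
    x, traj = x0, [x0]
    for _ in range(steps):
        nxt = list(P[x])
        x = rng.choices(nxt, weights=[P[x][y] for y in nxt])[0]
        traj.append(x)
    return traj

# a hypothetical stochastic matrix on S = {0, 1, 2}; each row sums to 1
P = {0: {0: 0.5, 1: 0.5},
     1: {0: 0.25, 1: 0.25, 2: 0.5},
     2: {1: 1.0}}
traj = sample_chain(P, 0, 500, seed=7)
```

Every transition in the sampled trajectory has positive probability under $P$, as it must.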
We will identify a stochastic matrix P with the Markov chain it defines.
X Notation: We say that (Xt)t is Markov-(µ, P ) if (Xt)t is a Markov chain with transition
matrix P and starting distribution X0 ∼ µ. If we wish to stress the state space, we say that
(Xt)t is Markov-(µ, P, S). Sometimes we omit the starting distribution; i.e. (Xt)t is Markov-P
means that (Xt)t is a Markov chain with transition matrix P .
Example 2.9. Consider the following state space and matrix: $S = \mathbb{Z}$, $P(x,y) = 0$ if $|x-y| \neq 1$ and $P(x,y) = 1/2$ if $|x-y| = 1$.
What if we change this to $P(x,y) = 1/4$ for $|x-y| = 1$ and $P(x,x) = 1/2$?
What about $P(x, x+1) = 3/4$ and $P(x, x-1) = 1/4$? △
Example 2.10. Consider the set $\mathbb{Z}_n := \mathbb{Z}/n\mathbb{Z} = \{0, 1, \ldots, n-1\}$. Let $P(x,y) = 1/2$ for $x - y \equiv \pm 1 \pmod{n}$. △
Example 2.11. Let $G$ be a graph. For $x, y \in G$ define $P(x,y) = \frac{1}{\deg(x)}$ if $x \sim y$ and $P(x,y) = 0$ if $x \not\sim y$.
This Markov chain is called the simple random walk on $G$.
If we take $0 < \alpha < 1$ and set $Q(x,x) = \alpha$ and $Q(x,y) = (1-\alpha)\cdot\frac{1}{\deg(x)}$ for $x \sim y$, and $Q(x,y) = 0$ for $x \not\sim y$, then $Q$ is also a stochastic matrix, and defines what is sometimes called the lazy random walk on $G$ (with holding probability $\alpha$). Note that $Q = \alpha I + (1-\alpha)P$. △
X Notation: We will usually use $(X_n)_n$ to denote the realization of Markov chains. We will also use $\mathbb{P}_x$ to denote the probability measure $\mathbb{P}_x = \mathbb{P}[\cdot \mid X_0 = x]$. Note that the Markov property is just the statement that
$$\mathbb{P}[X_{n+1} = x \mid X_0 = s_0, \ldots, X_n = s_n] = \mathbb{P}[X_{n+1} = x \mid X_n = s_n] = \mathbb{P}_{s_n}[X_1 = x].$$
More generally, if $\mu$ is a probability measure on $S$, we write
$$\mathbb{P}_\mu = \mathbb{P}[\cdot \mid X_0 \sim \mu] = \sum_{s} \mu(s)\, \mathbb{P}_s.$$
Exercise 2.3. Let $(X_n)_n$ be a Markov chain on state space $S$, with transition matrix $P$. Show that for any event $A \in \sigma(X_0, \ldots, X_k)$,
$$\mathbb{P}_\mu[X_{n+k} = y \mid A, X_k = x] = P^n(x,y)$$
(provided $\mathbb{P}_\mu[A, X_k = x] > 0$).
Remark 2.12. For those uncomfortable with $\sigma$-algebras, it suffices to consider events of the form $A = \{X_0 = s_0, \ldots, X_k = s_k\}$.
Example 2.13. Consider a bored programmer. She has a (possibly biased) coin, and two chairs,
say a and b. Every minute, out of boredom, she tosses the coin. If it comes out heads, she moves
to the other chair. Otherwise, she does nothing.
This can be modeled by a Markov chain on the state space $\{a, b\}$. At each time, with some probability $1-p$ the programmer does not move, and with probability $p$ she jumps to the other state. The corresponding transition matrix would be
$$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$
What is the probability $\mathbb{P}_a[X_n = b]$? For this we need to calculate $P^n$.
A complicated way would be to analyze the eigenvalues of $P$...
An easier way: Let $\mu_n = P^n(a, \cdot)$. So $\mu_{n+1} = \mu_n P$. Consider the vector $\pi = (1/2, 1/2)$. Then $\pi P = \pi$. Now, consider $a_n = (\mu_n - \pi)(a)$. Since $\mu_n$ is a probability measure, we get that $\mu_n(b) = 1 - \mu_n(a)$, so
$$a_n = \big((\mu_{n-1} - \pi)P\big)(a) = (1-p)\mu_{n-1}(a) + p\,\mu_{n-1}(b) - \tfrac{1}{2} = (1-2p)(\mu_{n-1} - \pi)(a) = (1-2p)\,a_{n-1}.$$
So $a_n = (1-2p)^n a_0 = (1-2p)^n \cdot \frac{1}{2}$ and $P^n(a,a) = \mu_n(a) = \frac{1 + (1-2p)^n}{2}$. (This also implies that $P^n(a,b) = 1 - P^n(a,a) = \frac{1 - (1-2p)^n}{2}$.)
We see that for $0 < p < 1$,
$$P^n \to \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \pi \\ \pi \end{pmatrix}. \quad\triangle$$
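The closed form $P^n(a,a) = \frac{1+(1-2p)^n}{2}$ is easy to check numerically by repeated matrix multiplication. A quick sketch (ours, with an arbitrary choice $p = 0.3$):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

p = 0.3
P = [[1 - p, p], [p, 1 - p]]
Pn = [[1.0, 0.0], [0.0, 1.0]]  # P^0 = I
for n in range(1, 30):
    Pn = matmul(Pn, P)
    # compare with the closed form for P^n(a, a) derived above
    assert abs(Pn[0][0] - (1 + (1 - 2 * p) ** n) / 2) < 1e-12
```

After a few dozen steps the powers are numerically indistinguishable from the limit matrix with both rows equal to $\pi = (1/2, 1/2)$.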
The following proposition relates starting distributions, and steps of the Markov chain, to
matrix and vector multiplication.
• Proposition 2.14. Let $(X_n)_n$ be a Markov chain with transition matrix $P$ on some state space $S$. Let $\mu$ be some distribution on $S$; i.e. $\mu$ is an $S$-indexed vector with $\sum_s \mu(s) = 1$. Then, $\mathbb{P}_\mu[X_n = y] = (\mu P^n)(y)$. Specifically, taking $\mu = \delta_x$ we get that $\mathbb{P}_x[X_n = y] = P^n(x,y)$.
Moreover, if $f : S \to \mathbb{R}$ is any function, which can be viewed as an $S$-indexed vector, then $\mu P^n f = \mathbb{E}_\mu[f(X_n)]$ and $(P^n f)(x) = \mathbb{E}_x[f(X_n)]$.
Proof. This is shown by induction: It is the definition for $n = 0$ ($P^0 = I$, the identity matrix). The Markov property gives for $n > 0$, using induction,
$$\mathbb{P}_\mu[X_n = y] = \sum_{s \in S} \mathbb{P}_\mu[X_n = y \mid X_{n-1} = s]\,\mathbb{P}_\mu[X_{n-1} = s] = \sum_{s} P(s,y)(\mu P^{n-1})(s) = \big((\mu P^{n-1})P\big)(y) = (\mu P^n)(y).$$
The second assertion also follows by conditional expectation:
$$\mathbb{E}_\mu[f(X_n)] = \sum_{s} \mu(s)\,\mathbb{E}[f(X_n) \mid X_0 = s] = \sum_{s} \mu(s) \sum_{x} \mathbb{P}[X_n = x \mid X_0 = s] f(x) = \sum_{s,x} \mu(s) P^n(s,x) f(x) = \mu P^n f.$$
$(P^n f)(x) = \mathbb{E}_x[f(X_n)]$ is just the case $\mu = \delta_x$. $\square$
2.3. Classification of Markov chains
When we spoke about graphs, we had the notion of connectivity. We are now interested in generalizing this notion to Markov chains. We want to say that a state $x$ is connected to a state $y$ if there is a way to get from $x$ to $y$; note that for general Markov chains this does not necessarily imply that one can get from $y$ to $x$.
• Definition 2.15. Let $P$ be the transition matrix of a Markov chain on $S$. $P$ is called irreducible if for every pair of states $x, y \in S$ there exists $t > 0$ such that $P^t(x,y) > 0$.
This means that for every pair, there is a large enough time such that with positive probability
the chain can go from one of the pair to the other in that time.
Example 2.16. Consider the cycle $\mathbb{Z}/n\mathbb{Z}$, for $n$ even. This is an irreducible chain, since for any $x, y$, taking $t = \mathrm{dist}(x,y)$ and $\gamma$ a path of length $t$ from $x$ to $y$,
$$P^t(x,y) \ge \mathbb{P}_x[(X_0, \ldots, X_t) = \gamma] = 2^{-t} > 0.$$
Note that at each step, the Markov chain moves from the current position by $+1$ or $-1 \pmod{n}$. Thus, since $n$ is even, at even times the chain must be at even vertices, and at odd times the chain must be at odd vertices.
Thus, it is not true that there exists $t > 0$ such that for all $x, y$, $P^t(x,y) > 0$.
The main reason for this is that the chain has a period: at even times it is on some set, and at odd times on a different set. Similarly, the chain cannot be back at its starting point at odd times, only at even times. △
• Definition 2.17. Let $P$ be a Markov chain on $S$.
• A state $x$ is called periodic if $\gcd\{t \ge 1 : P^t(x,x) > 0\} > 1$, and this gcd is called the period of $x$.
• If $\gcd\{t \ge 1 : P^t(x,x) > 0\} = 1$ then $x$ is called aperiodic.
• $P$ is called aperiodic if all $x \in S$ are aperiodic. Otherwise $P$ is called periodic.
X Note that in the even-length cycle example, $\gcd\{t \ge 1 : P^t(x,x) > 0\} = \gcd\{2, 4, 6, \ldots\} = 2$.
Remark 2.18. If P is periodic, then there is an easy way to “fix” P to become aperiodic: namely,
let Q = αI+(1−α)P be a lazy version of P . Then, Q(x, x) ≥ α for all x, and thus Q is aperiodic.
• Proposition 2.19. Let $P$ be a Markov chain on state space $S$.
• $x$ is aperiodic if and only if there exists $t(x)$ such that for all $t > t(x)$, $P^t(x,x) > 0$.
• If $P$ is irreducible, then $P$ is aperiodic if and only if there exists an aperiodic state $x$.
• Consequently, if $P$ is irreducible and aperiodic, and if $S$ is finite, then there exists $t_0$ such that for all $t > t_0$ all $x, y$ admit $P^t(x,y) > 0$.
Proof. We start with the first assertion. Assume that $x$ is aperiodic. Let $R = \{t \ge 1 : P^t(x,x) > 0\}$. Since $P^{t+s}(x,x) \ge P^t(x,x) P^s(x,x)$, we get that $t, s \in R$ implies $t + s \in R$; i.e. $R$ is closed under addition. A number-theoretic result tells us that since $\gcd R = 1$ it must be that $R^c$ is finite.
The other direction is simpler. If $R^c$ is finite, then $R$ contains primes $p \neq q$, so $\gcd R = \gcd(p,q) = 1$.
For the second assertion, if $P$ is irreducible and $x$ is aperiodic, then let $t(x)$ be such that for all $t > t(x)$, $P^t(x,x) > 0$. For any $z, y$ let $t(z,y)$ be such that $P^{t(z,y)}(z,y) > 0$ (which exists by irreducibility). Then, for any $t > t(y,x) + t(x) + t(x,y)$ we get that
$$P^t(y,y) \ge P^{t(y,x)}(y,x)\, P^{t - t(y,x) - t(x,y)}(x,x)\, P^{t(x,y)}(x,y) > 0.$$
So for all large enough $t$, $P^t(y,y) > 0$, which implies that $y$ is aperiodic. This holds for all $y$, so $P$ is aperiodic.
The other direction is trivial from the definition.
For the third assertion, for any $z, y$ let $t(z,y)$ be such that $P^{t(z,y)}(z,y) > 0$. Let $T = \max_{z,y} t(z,y)$. Let $x$ be an aperiodic state and let $t(x)$ be such that for all $t > t(x)$, $P^t(x,x) > 0$. We get that for any $t > 2T + t(x)$ we have that $t - t(z,x) - t(x,y) \ge t - 2T > t(x)$, so
$$P^t(z,y) \ge P^{t(z,x)}(z,x)\, P^{t - t(z,x) - t(x,y)}(x,x)\, P^{t(x,y)}(x,y) > 0. \qquad\square$$
Exercise 2.4. Let $G$ be a finite connected graph, and let $Q$ be the lazy random walk on $G$ with holding probability $\alpha$; i.e. $Q = \alpha I + (1-\alpha)P$ where $P(x,y) = \frac{1}{\deg(x)}$ if $x \sim y$ and $P(x,y) = 0$ if $x \not\sim y$.
Show that $Q$ is aperiodic. Show that for $\mathrm{diam}(G) = \max\{\mathrm{dist}(x,y) : x, y \in G\}$ we have that for all $t > \mathrm{diam}(G)$, all $x, y \in G$ admit $Q^t(x,y) > 0$.
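This claim can be sanity-checked numerically on one small example (not a proof; the path graph on 4 vertices and $\alpha = 1/2$ are our choices). Exact rational arithmetic avoids any floating-point ambiguity about strict positivity:

```python
from fractions import Fraction

def all_positive_power(Q, t):
    """Return True if every entry of Q^t is strictly positive."""
    n, M = len(Q), Q
    for _ in range(t - 1):
        M = [[sum(M[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return all(M[i][j] > 0 for i in range(n) for j in range(n))

half = Fraction(1, 2)
# simple random walk P on the path graph 0-1-2-3 (diameter 3),
# lazified with holding probability alpha = 1/2
P = [[0, 1, 0, 0],
     [half, 0, half, 0],
     [0, half, 0, half],
     [0, 0, 1, 0]]
Q = [[half * P[i][j] + (half if i == j else 0) for j in range(4)] for i in range(4)]
```

Here $Q^4$ has all entries positive ($4 > \mathrm{diam}(G) = 3$), while $Q^2$ does not, since no walk of length $2$ joins the endpoints of the path even with self-loops.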
Number of exercises in lecture: 4
Total number of exercises until here: 4
Lecture 3: Recurrence and Transience
3.1. Recurrence and Transience
X Notation: If $(X_t)_t$ is Markov-$P$ on state space $S$, we can define the following: For $A \subset S$,
$$T_A = \inf\{t \ge 0 : X_t \in A\} \quad\text{and}\quad T_A^+ = \inf\{t \ge 1 : X_t \in A\}.$$
These are the hitting time of $A$ and return time to $A$. (We use the convention that $\inf \emptyset = \infty$.) If $A = \{x\}$ we write $T_x = T_{\{x\}}$ and similarly $T_x^+ = T_{\{x\}}^+$.
Recall that we saw that the simple random walk on $\mathbb{Z}$ a.s. returns to the origin. We also stated that on $\mathbb{Z}^3$ this is not true: with positive probability the simple random walk never returns to the origin.
Let us classify Markov chains according to these properties.
• Definition 3.1. Let $P$ be a Markov chain on $S$. Consider a state $x \in S$.
• If $\mathbb{P}_x[T_x^+ = \infty] > 0$, we say that $x$ is a transient state.
• If $\mathbb{P}_x[T_x^+ < \infty] = 1$, we say that $x$ is recurrent.
• For a recurrent state $x$, there are two options:
– If $\mathbb{E}_x[T_x^+] < \infty$ we say that $x$ is positive recurrent.
– If $\mathbb{E}_x[T_x^+] = \infty$ we say that $x$ is null recurrent.
Our first goal will be to prove the following theorem.
••• Theorem 3.2. Let (Xt)t be a Markov chain on S with transition matrix P . If P is irre-
ducible, then for any x, y ∈ S, x is (positive, null) recurrent if and only if y is (positive, null)
recurrent.
That is, for irreducible chains, all the states have the same classification.
3.2. Stopping Times
A word about σ-algebras:
Recall that the canonical $\sigma$-algebra we take on the space $S^{\mathbb{N}}$ is the $\sigma$-algebra generated by the cylinder sets. A cylinder set is a set of the form $\{\omega \in S^{\mathbb{N}} : \omega_0 = x_0, \ldots, \omega_t = x_t\}$ for some $t \ge 0$. $A \subset S^{\mathbb{N}}$ is called a $t$-cylinder set if there exist $x_0, \ldots, x_t \in S$ such that for every $\omega \in A$ we have $\omega_j = x_j$ for all $j = 0, \ldots, t$.
Recall the $\sigma$-algebra
$$\sigma(X_0, \ldots, X_t) = \sigma\big(X_j^{-1}(x) : x \in S,\ j = 0, \ldots, t\big) = \sigma\big(A : A \text{ is a } j\text{-cylinder set for some } j \le t\big).$$
Exercise 3.1. Define an equivalence relation on $S^{\mathbb{N}}$ by $\omega \sim_t \omega'$ if $\omega_j = \omega'_j$ for all $j = 0, 1, \ldots, t$.
Show that this is indeed an equivalence relation.
We say that an event $A$ respects $\sim_t$ if for any equivalent $\omega \sim_t \omega'$ we have that $\omega \in A$ if and only if $\omega' \in A$.
Show that $\sigma(X_0, X_1, \ldots, X_t) = \{A : A \text{ respects } \sim_t\}$.
The hitting and return times above have the property that their value can be determined by the history of the chain; that is, the event $\{T_A \le t\}$ is determined by $(X_0, X_1, \ldots, X_t)$.
• Definition 3.3 (Stopping Time). Consider a Markov chain on $S$. Recall that the probability space is $(S^{\mathbb{N}}, \mathcal{F}, \mathbb{P})$ where $\mathcal{F}$ is the $\sigma$-algebra generated by the cylinder sets.
A random variable $T : S^{\mathbb{N}} \to \mathbb{N} \cup \{\infty\}$ is called a stopping time if for all $t \ge 0$, the event $\{T \le t\} \in \sigma(X_0, \ldots, X_t)$.
Example 3.4. Any hitting time and return time is a stopping time. Indeed,
$$\{T_A \le t\} = \bigcup_{j=0}^{t} \{X_j \in A\}.$$
Similarly for $T_A^+$. △
Example 3.5. Consider the simple random walk on $\mathbb{Z}^3$. Let $T = \sup\{t : X_t = 0\}$. This is the last time the walk is at $0$. One can show that $T$ is a.s. finite. However, $T$ is not a stopping time, since for example
$$\{T = 0\} = \{\forall\, t > 0,\ X_t \neq 0\} = \bigcap_{t=1}^{\infty} \{X_t \neq 0\} \notin \sigma(X_0). \quad\triangle$$
Example 3.6. Let $(X_t)_t$ be a Markov chain and let $T = \inf\{t \ge T_A : X_t \in A'\}$, where $A, A' \subset S$. Then $T$ is a stopping time, since
$$\{T \le t\} = \bigcup_{k=0}^{t} \bigcup_{m=0}^{k} \{X_m \in A,\ X_k \in A'\}. \quad\triangle$$
• Proposition 3.7. Let $T, T'$ be stopping times. The following hold:
• Any constant $t \in \mathbb{N}$ is a stopping time.
• $T \wedge T'$ and $T \vee T'$ are stopping times.
• $T + T'$ is a stopping time.
Proof. Since $\{t \le k\} \in \{\emptyset, \Omega\}$, the trivial $\sigma$-algebra, we get that $\{t \le k\} \in \sigma(X_0, \ldots, X_k)$ for any $k$. So constants are stopping times.
For the minimum:
$$\{T \wedge T' \le t\} = \{T \le t\} \cup \{T' \le t\} \in \sigma(X_0, \ldots, X_t).$$
The maximum is similar:
$$\{T \vee T' \le t\} = \{T \le t\} \cap \{T' \le t\} \in \sigma(X_0, \ldots, X_t).$$
For the addition,
$$\{T + T' \le t\} = \bigcup_{k=0}^{t} \{T = k,\ T' \le t - k\}.$$
Since $\{T = k\} = \{T \le k\} \setminus \{T \le k-1\} \in \sigma(X_0, \ldots, X_k)$, we get that $T + T'$ is a stopping time. $\square$
3.2.1. Conditioning on a stopping time. Stopping times are extremely important in the
theory of martingales, a subject we will come back to in the future.
For the moment, the important property we want is the Strong Markov Property.
For a fixed time t, we saw that the process (Xt+n)n is a Markov chain with starting distribution
Xt, independent of σ(X0, . . . , Xt). We want to do the same thing for stopping times.
Let $T$ be a stopping time. The information captured by $X_0, \ldots, X_T$ is the $\sigma$-algebra $\sigma(X_0, \ldots, X_T)$. This is defined to be the collection of all events $A$ such that for all $t$, $A \cap \{T \le t\} \in \sigma(X_0, \ldots, X_t)$. That is,
$$\sigma(X_0, \ldots, X_T) = \{A : A \cap \{T \le t\} \in \sigma(X_0, \ldots, X_t) \text{ for all } t\}.$$
One can check that this is indeed a $\sigma$-algebra.
Exercise 3.2. Show that $\sigma(X_0, \ldots, X_T)$ is a $\sigma$-algebra.
Important examples are:
• For any $t$, $\{T \le t\} \in \sigma(X_0, \ldots, X_T)$.
• Thus, $T$ is measurable with respect to $\sigma(X_0, \ldots, X_T)$.
• $X_T$ is measurable with respect to $\sigma(X_0, \ldots, X_T)$ (indeed $\{X_T = x,\ T \le t\} \in \sigma(X_0, \ldots, X_t)$ for all $t$ and $x$).
• Proposition 3.8 (Strong Markov Property). Let $(X_t)_t$ be Markov-$P$ on $S$, and let $T$ be a stopping time. For all $t \ge 0$, define $Y_t = X_{T+t}$. Then, conditioned on $T < \infty$ and $X_T$, the sequence $(Y_t)_t$ is independent of $\sigma(X_0, \ldots, X_T)$ and is Markov-$(\delta_{X_T}, P)$.
Proof. The (regular) Markov property tells us that for any $m > k$, and any event $A \in \sigma(X_0, \ldots, X_k)$,
$$\mathbb{P}[X_m = y, A, X_k = x] = P^{m-k}(x,y)\,\mathbb{P}[A, X_k = x].$$
We need to show that for all $t$, and any $A \in \sigma(X_0, \ldots, X_T)$,
$$\mathbb{P}[X_{T+t+1} = y \mid X_{T+t} = x, A, T < \infty] = P(x,y)$$
(provided of course that $\mathbb{P}[X_{T+t} = x, A, T < \infty] > 0$). Indeed this follows from the fact that $A \cap \{T = k\} \in \sigma(X_0, \ldots, X_k) \subset \sigma(X_0, \ldots, X_{k+t})$ for all $k$, so
$$\mathbb{P}[X_{T+t+1} = y, A, X_{T+t} = x, T < \infty] = \sum_{k=0}^{\infty} \mathbb{P}[X_{k+t+1} = y, X_{k+t} = x, A, T = k] = \sum_{k=0}^{\infty} P(x,y)\,\mathbb{P}[X_{k+t} = x, A, T = k] = P(x,y)\,\mathbb{P}[X_{T+t} = x, A, T < \infty]. \qquad\square$$
Another way to state the above proposition is that for a stopping time T , conditional on
T <∞ we can restart the Markov chain from XT .
3.3. Excursion Decomposition
We now use the strong Markov property to prove the following:
Example 3.9. Let $P$ be an irreducible Markov chain on $S$. Fix $x \in S$.
Define inductively the following stopping times: $T_x^{(0)} = 0$, and
$$T_x^{(k)} = \inf\big\{t \ge T_x^{(k-1)} + 1 : X_t = x\big\}.$$
So $T_x^{(k)}$ is the time of the $k$-th return to $x$.
Let $V_t(x)$ be the number of visits to $x$ up to time $t$; i.e. $V_t(x) = \sum_{k=1}^{t} \mathbf{1}_{\{X_k = x\}}$.
It is immediate that $V_t(x) \ge k$ if and only if $T_x^{(k)} \le t$.
Now let us look at the excursions to $x$: The $k$-th excursion is
$$X[T_x^{(k-1)}, T_x^{(k)}] = \big(X_{T_x^{(k-1)}}, X_{T_x^{(k-1)}+1}, \ldots, X_{T_x^{(k)}}\big).$$
These excursions are paths of the Markov chain starting at $x$ and ending at $x$ (except, possibly, the first excursion, which starts at $X_0$).
For $k > 0$ define
$$\tau_x^{(k)} = T_x^{(k)} - T_x^{(k-1)}$$
if $T_x^{(k)} < \infty$, and $0$ otherwise. For $T_x^{(k)} < \infty$, this is the length of the $k$-th excursion.
We claim that conditioned on $T_x^{(k-1)} < \infty$, the excursion $X[T_x^{(k-1)}, T_x^{(k)}]$ is independent of $\sigma(X_0, \ldots, X_{T_x^{(k-1)}})$, and has the distribution of the first excursion $X[0, T_x^+]$ conditioned on $X_0 = x$.
Indeed, let $Y_t = X_{T_x^{(k-1)}+t}$. For any $A \in \sigma(X_0, \ldots, X_{T_x^{(k-1)}})$, and for any path $\gamma : x \to x$, since $X_{T_x^{(k-1)}} = x$,
$$\mathbb{P}\big[Y[0, \tau_x^{(k)}] = \gamma \mid A,\ T_x^{(k-1)} < \infty\big] = \mathbb{P}\big[X[T_x^{(k-1)}, T_x^{(k)}] = \gamma \mid A,\ T_x^{(k-1)} < \infty\big] = \mathbb{P}_x\big[X[0, T_x^+] = \gamma\big],$$
where we have used the strong Markov property. △
This gives rise to the following relation:
• Lemma 3.10. Let $P$ be an irreducible Markov chain on $S$. Then,
$$\big(\mathbb{P}_x[T_x^+ < \infty]\big)^k = \mathbb{P}_x[V_\infty(x) \ge k] = \mathbb{P}_x[T_x^{(k)} < \infty].$$
Consequently,
$$1 + \mathbb{E}_x[V_\infty(x)] = \frac{1}{\mathbb{P}_x[T_x^+ = \infty]},$$
where $1/0 = \infty$.
Proof. The event $\{V_\infty(x) \ge k\}$ is the event that $x$ is visited at least $k$ times, which is exactly the event that the $k$-th excursion ends at some finite time. From the example above we have that for any $m$,
$$\mathbb{P}_x[T_x^{(m)} < \infty \mid T_x^{(m-1)} < \infty] = \mathbb{P}_x[\exists\, t \ge 1 : X_{T_x^{(m-1)}+t} = x \mid T_x^{(m-1)} < \infty] = \mathbb{P}_x[T_x^+ < \infty].$$
Since $\{T_x^{(m)} < \infty\} = \{T_x^{(m)} < \infty,\ T_x^{(m-1)} < \infty\}$, we can inductively conclude that
$$\mathbb{P}_x[T_x^{(k)} < \infty] = \mathbb{P}_x[T_x^{(k)} < \infty \mid T_x^{(k-1)} < \infty] \cdot \mathbb{P}_x[T_x^{(k-1)} < \infty] = \cdots = \big(\mathbb{P}_x[T_x^+ < \infty]\big)^k.$$
The second assertion follows from the fact that
$$1 + \mathbb{E}_x[V_\infty(x)] = \sum_{k=0}^{\infty} \mathbb{P}_x[V_\infty(x) \ge k] = \frac{1}{1 - \mathbb{P}_x[T_x^+ < \infty]},$$
where this holds even if $\mathbb{P}_x[T_x^+ < \infty] = 1$. $\square$
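The identity $1 + \mathbb{E}_x[V_\infty(x)] = 1/\mathbb{P}_x[T_x^+ = \infty]$ concerns a single state, so it can be sanity-checked on a toy chain of our own (not from the notes, and not irreducible, but the excursion argument at the state $0$ is the same): from state $0$ the chain returns to $0$ with probability $1/2$ and otherwise moves to an absorbing state, so $\mathbb{P}_0[T_0^+ = \infty] = 1/2$ and the right-hand side is $2$.

```python
import random

rng = random.Random(0)
trials, total_visits = 20000, 0
for _ in range(trials):
    # Each excursion from 0 returns with probability 1/2, independently,
    # so V_infinity(0) is the number of successes before the first failure.
    while rng.random() < 0.5:
        total_visits += 1
lhs = 1 + total_visits / trials   # estimate of 1 + E_0[V_inf(0)]
assert abs(lhs - 2.0) < 0.1       # matches 1 / P_0[T_0^+ = infinity] = 2
```

This is also the geometric distribution of visits promised by Exercise 3.3 below.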
Similarly, one can prove:
Exercise 3.3. Let $(X_t)_t$ be Markov-$(S, P)$ for some irreducible $P$. Let $Z \subset S$. Show that under $\mathbb{P}_x$, the number of visits to $x$ until hitting $Z$ (i.e. the random variable $V = V_{T_Z}(x) + \mathbf{1}_{\{X_0 = x\}}$) is distributed geometric-$p$, for $p = \mathbb{P}_x[T_Z < T_x^+]$.
We now get the following important characterization of recurrence in Markov chains:
• Corollary 3.11. Let $P$ be an irreducible Markov chain on $S$. Then the following are equivalent:
(1) $x$ is recurrent.
(2) $\mathbb{P}_x[V_\infty(x) = \infty] = 1$.
(3) For any state $y$, $\mathbb{P}_x[T_y^+ < \infty] = 1$.
(4) $\mathbb{E}_x[V_\infty(x)] = \infty$.
Proof. If $x$ is recurrent, then $\mathbb{P}_x[T_x^+ < \infty] = 1$. So for any $k$, $\mathbb{P}_x[V_\infty(x) \ge k] = 1$. Taking $k$ to infinity, we get that $\mathbb{P}_x[V_\infty(x) = \infty] = 1$. This is the first implication.
For the second implication: Let $y \in S$. Let $E_k = X[T_x^{(k-1)}, T_x^{(k)}]$ be the $k$-th excursion from $x$. We assumed that $\mathbb{P}_x[\forall\, k,\ T_x^{(k)} < \infty] = 1$. So under $\mathbb{P}_x$, all $(E_k)$ are independent and identically distributed.
Since $P$ is irreducible, there exists $t > 0$ such that $\mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$ (this is an exercise). Thus, we have that $p := \mathbb{P}_x[T_y < T_x^+] \ge \mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$. This implies by the strong Markov property that
$$\mathbb{P}_x[T_y < T_x^{(k+1)} \mid T_y > T_x^{(k)},\ T_x^{(k)} < \infty] \ge p > 0.$$
So, using the fact that $\mathbb{P}_x[\forall\, k,\ T_x^{(k)} < \infty] = 1$,
$$\mathbb{P}_x[T_y \ge T_x^{(k)}] = \mathbb{P}_x[T_y \ge T_x^{(k)} \mid T_y > T_x^{(k-1)},\ T_x^{(k-1)} < \infty] \cdot \mathbb{P}_x[T_y > T_x^{(k-1)}] \le (1-p)\cdot\mathbb{P}_x[T_y \ge T_x^{(k-1)}] \le \cdots \le (1-p)^k.$$
Thus,
$$\mathbb{P}_x[T_y^+ = \infty] \le \mathbb{P}_x[\forall\, k,\ T_y \ge T_x^{(k-1)}] = \lim_{k\to\infty} (1-p)^k = 0.$$
This proves the second implication.
Finally, if for any $y$ we have $\mathbb{P}_x[T_y^+ < \infty] = 1$, then taking $y = x$ shows that $x$ is recurrent. This shows that (1), (2), (3) are equivalent.
It is obvious that (2) implies (4). Since $\mathbb{P}_x[T_x^+ = \infty] = \frac{1}{\mathbb{E}_x[V_\infty(x)] + 1}$, we get that (4) implies (1). $\square$
Exercise 3.4. Show that if $P$ is irreducible, there exists $t > 0$ such that $\mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$.
♣ Solution to ex:3.4. :(
There exists $n$ such that $P^n(x,y) > 0$ (because $P$ is irreducible). Thus, there is a sequence $x = x_0, x_1, \ldots, x_n = y$ such that $P(x_j, x_{j+1}) > 0$ for all $0 \le j < n$. Let $m = \max\{0 \le j < n : x_j = x\}$, and let $t = n - m$ and $y_j := x_{m+j}$ for $0 \le j \le t$. Then, we have the sequence $x = y_0, \ldots, y_t = y$ so that $y_j \neq x$ for all $0 < j \le t$, and we know that $P(y_j, y_{j+1}) > 0$ for all $0 \le j < t$. Thus,
$$\mathbb{P}_x[X_t = y,\ t < T_x^+] \ge \mathbb{P}_x[\forall\, 0 \le j \le t,\ X_j = y_j] = P(y_0, y_1) \cdots P(y_{t-1}, y_t) > 0.$$
:) X
Example 3.12. A gambler plays a fair game. Each round she wins a dollar with probability 1/2 and loses a dollar with probability 1/2, all rounds independent. What is the probability that she never goes bankrupt, if she starts with $N$ dollars?
We have already seen that this defines a simple random walk on $\mathbb{Z}$, and that $E_0[V_t(0)] \ge c\sqrt{t}$. Thus, taking $t \to \infty$ we get that $E_0[V_\infty(0)] = \infty$, and so $0$ is recurrent.
Note that $0$ here was not special, since all vertices look the same. This symmetry implies that $P_x[T_x^+ < \infty] = 1$ for all $x \in \mathbb{Z}$. Thus, for any $N$, $P_N[T_0^+ = \infty] = 0$. That is, no matter how much money the gambler starts with, she will eventually go bankrupt.
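The $\sqrt{t}$ growth of $E_0[V_t(0)]$ can be checked by direct computation, since only even times contribute and $P_0[X_{2m} = 0] = \binom{2m}{m} 4^{-m}$. A small numerical sketch (an illustration, not part of the notes):

```python
from math import comb

def expected_visits_to_zero(t):
    # E_0[V_t(0)] = sum_{k=1}^t P_0[X_k = 0]; only even times contribute,
    # and P_0[X_{2m} = 0] = C(2m, m) / 4^m for simple random walk on Z.
    return sum(comb(2 * m, m) / 4 ** m for m in range(1, t // 2 + 1))

# Quadrupling t roughly doubles the expected number of visits: sqrt(t) growth.
assert expected_visits_to_zero(400) > 10
assert 1.9 < expected_visits_to_zero(1600) / expected_visits_to_zero(400) < 2.2
```

Letting $t \to \infty$, the sum diverges, which is exactly the statement $E_0[V_\infty(0)] = \infty$.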
We now have part of Theorem 3.2.
• Corollary 3.13. Let $P$ be an irreducible Markov chain on $S$. Then, for any $x, y \in S$, $x$ is transient if and only if $y$ is transient.
Proof. As usual, by irreducibility, for any pair of states $z, w$ we can find $t(z,w) > 0$ such that $P^{t(z,w)}(z,w) > 0$.
Fix $x, y \in S$ and suppose that $x$ is transient. For any $t > 0$,
$$P^{t + t(x,y) + t(y,x)}(x,x) \ge P^{t(x,y)}(x,y)\, P^t(y,y)\, P^{t(y,x)}(y,x).$$
Thus,
$$E_y[V_\infty(y)] = \sum_{t=1}^{\infty} P^t(y,y) \le \frac{1}{P^{t(x,y)}(x,y)\, P^{t(y,x)}(y,x)} \sum_{t=1}^{\infty} P^{t + t(x,y) + t(y,x)}(x,x) < \infty.$$
So $y$ is transient as well. □
Number of exercises in lecture: 4
Total number of exercises until here: 8
Lecture 4: Stationary Distributions
4.1. Stationary Distributions
Suppose that $P$ is a Markov chain on state space $S$ such that for some starting state $y$ we have $P^n(y,x) \to \pi(x)$ for every $x$, where $\pi$ is some limiting distribution. One immediately checks that in this case we must have
$$\pi P(x) = \sum_s \lim_{n \to \infty} P^n(y,s) P(s,x) = \lim_{n \to \infty} P^{n+1}(y,x) = \pi(x)$$
(at least when $S$ is finite, so that the limit may be exchanged with the sum), or $\pi P = \pi$. (That is, $\pi$ is a left eigenvector for $P$ with eigenvalue 1.)
• Definition 4.1. Let P be a Markov chain. If π is a distribution satisfying πP = π then π is
called a stationary distribution.
Example 4.2. Recall the two-state chain $P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$. We saw that $P^n \to \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
Indeed, it is simple to check that $\pi = (1/2, 1/2)$ is a stationary distribution in this case.
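A quick numerical sanity check (an illustration, not part of the notes) verifies both $\pi P = \pi$ and the convergence $\mu P^t \to \pi$ for an arbitrary choice $p = 0.3$:

```python
# Two-state chain: verify pi = (1/2, 1/2) is stationary, and that iterating
# mu -> mu P from any start converges to pi.
p = 0.3
P = [[1 - p, p], [p, 1 - p]]
pi = [0.5, 0.5]

def left_mult(v, M):
    # (vM)(x) = sum_y v(y) M(y, x): multiply a row vector by the matrix M.
    return [sum(v[y] * M[y][x] for y in range(len(v))) for x in range(len(M[0]))]

piP = left_mult(pi, P)
assert all(abs(piP[x] - pi[x]) < 1e-12 for x in range(2))  # pi P = pi

mu = [1.0, 0.0]  # start deterministically at state 0
for _ in range(50):
    mu = left_mult(mu, P)
assert all(abs(mu[x] - 0.5) < 1e-6 for x in range(2))      # mu P^t -> pi
```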
Example 4.3. Consider a finite graph $G$. Let $P$ be the transition matrix of a simple random walk on $G$, so $P(x,y) = \frac{1}{\deg(x)} 1_{x \sim y}$; equivalently, $\deg(x) P(x,y) = 1_{x \sim y}$. Thus,
$$\sum_x \deg(x) P(x,y) = \deg(y).$$
So $\deg$ is a left eigenvector for $P$ with eigenvalue 1. Since
$$\sum_x \deg(x) = \sum_x \sum_y 1_{x \sim y} = 2|E(G)|,$$
we normalize $\pi(x) = \frac{\deg(x)}{2|E(G)|}$ to get a stationary distribution for $P$.
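A short sketch verifying this on a small example graph (the graph below is a hypothetical choice; any finite graph works). It checks $\pi P = \pi$, and also the detailed balance property discussed next:

```python
# A triangle with a pendant vertex: edges of a small example graph.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4
adj = [[False] * n for _ in range(n)]
for u, v in edges:
    adj[u][v] = adj[v][u] = True
deg = [sum(row) for row in adj]

# P(x, y) = 1/deg(x) for x ~ y, and pi(x) = deg(x) / (2|E|).
P = [[(1 / deg[x]) if adj[x][y] else 0.0 for y in range(n)] for x in range(n)]
pi = [deg[x] / (2 * len(edges)) for x in range(n)]

piP = [sum(pi[y] * P[y][x] for y in range(n)) for x in range(n)]
assert all(abs(piP[x] - pi[x]) < 1e-12 for x in range(n))   # pi P = pi

# Detailed balance: pi(x) P(x,y) = pi(y) P(y,x) (= 1/(2|E|) on every edge).
assert all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12
           for x in range(n) for y in range(n))
```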
The above stationary distribution has a special property, known as the detailed balance equations: a distribution $\pi$ is said to satisfy the detailed balance equations with respect to a transition matrix $P$ if for all states $x, y$,
$$\pi(x) P(x,y) = \pi(y) P(y,x).$$
Exercise 4.1. If $\pi$ satisfies the detailed balance equations, then $\pi$ is a stationary distribution.
We will come back to such distributions in the future.
4.2. Stationary Distributions and Hitting Times
There is a deep connection between stationary distributions and return times. The main
result here is:
••• Theorem 4.4. Let $P$ be an irreducible Markov chain on state space $S$. Then the following are equivalent:
• $P$ has a stationary distribution $\pi$.
• Every $x$ is positive recurrent.
• Some $x$ is positive recurrent.
• $P$ has a unique stationary distribution, $\pi(x) = \frac{1}{E_x[T_x^+]}$.
The proof of this theorem goes through a few lemmas.
X In the next lemma we will consider a function (vector) $v : S \to [0,\infty]$. Although it may take the value $\infty$, since we are only dealing with non-negative numbers we can write $vP(x) = \sum_y v(y) P(y,x)$ without confusion (with the convention $0 \cdot \infty = 0$).
• Lemma 4.5. Let $P$ be an irreducible Markov chain on state space $S$. Let $v : S \to [0,\infty]$ be such that $vP = v$. Then:
• If there exists a state $x$ such that $v(x) < \infty$, then $v(y) < \infty$ for all states $y$.
• If $v$ is not the zero vector, then $v(y) > 0$ for all states $y$.
X Note that this implies that if $\pi$ is a stationary distribution then all the entries of $\pi$ are strictly positive.
Proof. For any $t$, using the fact that $v \ge 0$,
$$v(x) = \sum_z v(z) P^t(z,x) \ge v(y) P^t(y,x).$$
Thus, for a suitable choice of $t$: since $P$ is irreducible, we know that $P^t(y,x) > 0$, and so $v(y) \le \frac{v(x)}{P^t(y,x)} < \infty$.
For the second assertion, if $v$ is not the zero vector, since it is non-negative there exists a state $x$ such that $v(x) > 0$. Thus, for any state $y$ and for $t$ such that $P^t(x,y) > 0$ we get
$$v(y) = \sum_z v(z) P^t(z,y) \ge v(x) P^t(x,y) > 0. \qquad □$$
X Notation: Recall that for a Markov chain $(X_t)_t$ we denote by $V_t(x) = \sum_{k=1}^t 1_{X_k = x}$ the number of visits to $x$.
• Lemma 4.6. Let $(X_t)_t$ be Markov-$(P,\mu)$ for irreducible $P$. Assume $T$ is a stopping time such that
$$P_\mu[X_T = x] = \mu(x) \quad \text{for all } x.$$
Assume further that $1 \le T < \infty$ $P_\mu$-a.s. Let $v(x) = E_\mu[V_T(x)]$.
Then, $vP = v$. Moreover, if $E_\mu[T] < \infty$ then $P$ has a stationary distribution $\pi(x) = \frac{v(x)}{E_\mu[T]}$.
Proof. The assumption on $T$ gives that
$$\mu(x) = P_\mu[X_T = x] = \sum_{j=1}^{\infty} P_\mu[X_j = x, T = j].$$
Hence, since $P_\mu[X_0 = y] = \mu(y)$,
$$\sum_{j=0}^{\infty} P_\mu[X_j = y, T > j] = P_\mu[X_0 = y] + \sum_{j=1}^{\infty} P_\mu[X_j = y, T > j] = \sum_{j=1}^{\infty} \big( P_\mu[X_j = y, T = j] + P_\mu[X_j = y, T > j] \big) = \sum_{j=1}^{\infty} P_\mu[X_j = y, T \ge j] = v(y).$$
Thus, using the Markov property (note that $\{T > j\} \in \sigma(X_0, \ldots, X_j)$),
$$v(x) = \sum_{j=1}^{\infty} P_\mu[X_j = x, T \ge j] = \sum_{j=0}^{\infty} P_\mu[X_{j+1} = x, T > j] = \sum_{j=0}^{\infty} \sum_y P_\mu[X_{j+1} = x, X_j = y, T > j] = \sum_y \sum_{j=0}^{\infty} P_\mu[X_j = y, T > j] P(y,x) = (vP)(x).$$
That is, $vP = v$.
Since
$$\sum_x v(x) = E_\mu\Big[ \sum_x V_T(x) \Big] = E_\mu[T],$$
if $E_\mu[T] < \infty$, then $\pi(x) = \frac{v(x)}{E_\mu[T]}$ defines a stationary distribution. □
Example 4.7. Consider $(X_t)_t$ that is Markov-$P$ for an irreducible $P$, and let $v(y) = E_x[V_{T_x^+}(y)]$. If $x$ is recurrent, then $P_x$-a.s. we have $1 \le T_x^+ < \infty$, and $P_x[X_{T_x^+} = y] = 1_{y=x} = P_x[X_0 = y]$. So we conclude that $vP = v$. Since $P_x$-a.s. $V_{T_x^+}(x) = 1$, we have that $0 < v(x) = 1 < \infty$, so $0 < v(y) < \infty$ for all $y$.
Note that although it may be that $E_x[T_x^+] = \infty$, i.e. $x$ is null recurrent, we still have that for any $y$, $E_x[V_{T_x^+}(y)] < \infty$; i.e. the expected number of visits to $y$ until returning to $x$ is finite.
If $x$ is positive recurrent, then $\pi(y) = \frac{E_x[V_{T_x^+}(y)]}{E_x[T_x^+]}$ is a stationary distribution for $P$.
This vector plays a special role, as in the next lemma.
• Lemma 4.8. Let $P$ be an irreducible Markov chain. Let $u(y) = E_x[V_{T_x^+}(y)]$. Let $v \ge 0$ be a non-negative vector such that $vP = v$ and $v(x) = 1$. Then, $v \ge u$. Moreover, if $x$ is recurrent, then $v = u$.
Proof. If $y = x$ then $v(x) = 1 \ge u(x)$, so we can assume that $y \ne x$.
We will prove by induction that for all $t$, for any $y \ne x$,
$$(4.1) \qquad \sum_{k=1}^{t} P_x[X_k = y, T_x^+ \ge k] \le v(y).$$
Indeed, for $t = 1$ this is just
$$P_x[X_1 = y, T_x^+ \ge 1] = P(x,y) \le \sum_z v(z) P(z,y) = v(y),$$
since $v \ge 0$, $v(x) = 1$ and $y \ne x$.
For general $t > 0$, we rely on the fact that by the Markov property, for any $y \ne x$,
$$P_x[X_{k+1} = y, T_x^+ \ge k+1] = \sum_{z \ne x} P_x[X_{k+1} = y, X_k = z, T_x^+ \ge k] = \sum_{z \ne x} P_x[X_k = z, T_x^+ \ge k] P(z,y).$$
So by induction,
$$\sum_{k=1}^{t+1} P_x[X_k = y, T_x^+ \ge k] = P(x,y) + \sum_{k=1}^{t} P_x[X_{k+1} = y, T_x^+ \ge k+1] = P(x,y) + \sum_{z \ne x} P(z,y) \sum_{k=1}^{t} P_x[X_k = z, T_x^+ \ge k] \le P(x,y) + \sum_{z \ne x} P(z,y) v(z) = \sum_z v(z) P(z,y) = v(y).$$
This completes the proof of (4.1) by induction.
Now, one notes that the left-hand side of (4.1) is just the expected number of visits to $y$ started at $x$, up to time $T_x^+ \wedge t$. Taking $t \to \infty$, using monotone convergence,
$$v(y) \ge \sum_{k=1}^{t} P_x[X_k = y, T_x^+ \ge k] = E_x[V_{T_x^+ \wedge t}(y)] \nearrow u(y).$$
This proves that $v \ge u$.
Now assume $x$ is recurrent. Then $uP = u$, and $u(x) = 1 = v(x)$. We have seen that $v - u \ge 0$, and of course $(v-u)P = v - u$. Until now we have not actually used irreducibility; we will use it to show that $v - u = 0$. Indeed, let $y$ be any state. If $v(y) > u(y)$ then $v - u$ is a non-zero non-negative left eigenvector for $P$, so by Lemma 4.5 it must be positive everywhere. This contradicts $v(x) - u(x) = 0$. So it must be that $v - u \equiv 0$. □
We are now in good shape to prove Theorem 4.4.
Proof of Theorem 4.4. Assume that $\pi$ is a stationary distribution for $P$. Fix any state $x$. Recall that $\pi(x) > 0$. Define the vector $v(z) = \frac{\pi(z)}{\pi(x)}$. We have that $v \ge 0$, $vP = v$ and $v(x) = 1$. Hence, $v(z) \ge E_x[V_{T_x^+}(z)]$ for all $z$. That is,
$$E_x[T_x^+] = \sum_y E_x[V_{T_x^+}(y)] \le \sum_y v(y) = \sum_y \frac{\pi(y)}{\pi(x)} = \frac{1}{\pi(x)} < \infty.$$
So $x$ is positive recurrent. This holds for a generic $x$.
The second bullet of course implies the third.
Now assume some state $x$ is positive recurrent. Let $v(y) = E_x[V_{T_x^+}(y)]$. Since $x$ is recurrent, we know that $vP = v$, and $\sum_y v(y) = E_x[T_x^+] < \infty$. So $\pi = \frac{v}{E_x[T_x^+]}$ is a stationary distribution for $P$.
Since $P$ has a stationary distribution, by the first implication all states are positive recurrent. Thus, for any state $z$, if $v = \frac{\pi}{\pi(z)}$ then $vP = v$ and $v(z) = 1$. So, $z$ being recurrent, we get that $v(y) = E_z[V_{T_z^+}(y)]$ for all $y$. Specifically,
$$E_z[T_z^+] = \sum_y v(y) = \frac{1}{\pi(z)},$$
which holds for all states $z$.
For the final implication, if $P$ has a unique stationary distribution, then of course it has a stationary distribution. □
• Corollary 4.9 (Stationary distributions are unique). If an irreducible Markov chain $P$ has two stationary distributions $\pi$ and $\pi'$, then $\pi = \pi'$.
Exercise 4.2. Let $P$ be an irreducible Markov chain. Show that for positive recurrent states $x, y$,
$$E_x[V_{T_x^+}(y)] \cdot E_y[V_{T_y^+}(x)] = 1.$$
4.3. Transience, positive or null recurrence are properties of the chain
We have also now shown:
Theorem* (3.2). [restatement] Let $P$ be an irreducible Markov chain. For any two states $x, y$: $x$ is transient / null recurrent / positive recurrent if and only if $y$ is transient / null recurrent / positive recurrent.
Proof. We have seen that
$$P_x[T_x^+ = \infty] = \frac{1}{1 + E_x[V_\infty(x)]}$$
implies that $x$ is transient if and only if $y$ is transient.
Now, if $x$ is positive recurrent, then $P$ has a stationary distribution, so all states, including $y$, are positive recurrent. □
In light of this:
• Definition 4.10. Let P be an irreducible Markov chain. We say that
• P is transient, if there exists a transient state.
• P is null recurrent if there exists a null recurrent state.
• P is positive recurrent if there exists a positive recurrent state.
Number of exercises in lecture: 2
Total number of exercises until here: 10
Lecture 5: Positive Recurrent Chains
5.1. Simple Random Walks
Last lecture we proved that an irreducible Markov chain $P$ has a stationary distribution if and only if $P$ is positive recurrent, and that the stationary distribution at each state is the reciprocal of the expected return time to that state.
Let’s investigate what this means in the setting of a simple random walk on a graph.
Example 5.1. Let $G$ be a graph, and let $P$ be the simple random walk on $G$; that is, $P(x,y) = \frac{1}{\deg(x)} 1_{x \sim y}$.
First, it is immediate that $P$ is irreducible (as long as $G$ is connected). This was shown in the exercises.
Consider the vector $v(x) = \deg(x)$. We have that
$$\sum_x \deg(x) P(x,y) = \sum_x \deg(x) \frac{1}{\deg(x)} 1_{x \sim y} = \deg(y).$$
That is, $vP = v$.
If we take $u(y) = v(y)/v(x)$ for some $x$, then $uP = u$ and $u(x) = 1$. Thus, if $P$ is recurrent, then $E_x[V_{T_x^+}(y)] = u(y) = \frac{\deg(y)}{\deg(x)}$ for all $x, y$. This does not depend on $\mathrm{dist}(x,y)$!
Another observation is that $\sum_x v(x) = 2|E(G)|$. That is, $P$ is positive recurrent if and only if $G$ is finite. Moreover, in this case, the stationary distribution for $P$ is $\pi(x) = \frac{\deg(x)}{2|E(G)|}$.
Note that if $G$ is a finite regular graph then the stationary distribution on $G$ is the uniform distribution.
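The identity $E_x[V_{T_x^+}(y)] = \deg(y)/\deg(x)$ can be tested by simulation. A minimal sketch on the path graph $0 - 1 - 2$ (a hypothetical example with degrees $1, 2, 1$): starting from $x = 0$, the expected number of visits to $y = 1$ before returning to $0$ should be $\deg(1)/\deg(0) = 2$.

```python
import random

random.seed(0)

# Simple random walk on the path 0 - 1 - 2.
nbrs = {0: [1], 1: [0, 2], 2: [1]}

def visits_to_one_before_return(start=0):
    # Count visits to vertex 1 during one excursion from `start`.
    pos, visits = start, 0
    while True:
        pos = random.choice(nbrs[pos])
        if pos == 1:
            visits += 1
        if pos == start:
            return visits

trials = 100_000
mean = sum(visits_to_one_before_return() for _ in range(trials)) / trials
assert abs(mean - 2.0) < 0.05  # deg(1)/deg(0) = 2
```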
Example 5.2. Recall the simple random walk on $\mathbb{Z}$. We have already seen that this is a recurrent Markov chain. Thus, if $vP = v$ for non-negative $v$, then $v(y) = E_x[V_{T_x^+}(y)] \cdot v(x)$ for all $x, y$. Since the constant vector $\vec{1}$ satisfies $\vec{1} P = \vec{1}$, we get that $E_x[V_{T_x^+}(y)] = 1$ for all $x, y$. Thus, any non-negative $v$ such that $vP = v$ must satisfy $v \equiv c$ for some constant $c$.
So there is no stationary distribution on $\mathbb{Z}$; that is, the walk is null recurrent. (We could have also deduced this from the previous example.)
Example 5.3. Consider a different Markov chain on $\mathbb{Z}$: let $P(x, x+1) = p$ and $P(x, x-1) = 1-p$ for all $x$.
Suppose $vP = v$. Then $v(x) = v(x-1)\, p + v(x+1)(1-p)$, or $v(x+1) = \frac{1}{1-p} \big( v(x) - p\, v(x-1) \big)$.
Solving such recursions is simple: set $u_x = \begin{pmatrix} v(x+1) & v(x) \end{pmatrix}^T$. So $u_{x+1} = \frac{1}{1-p} A u_x$, where
$$A = \begin{pmatrix} 1 & -p \\ 1-p & 0 \end{pmatrix}.$$
Since the characteristic polynomial of $A$ is $\lambda^2 - \lambda + p(1-p) = (\lambda - p)(\lambda - (1-p))$, the eigenvalues of $A$ are $p$ and $1-p$. One can easily check that $A$ is diagonalizable (for $p \ne 1/2$, since then the eigenvalues are distinct), and so
$$v(x) = u_x(2) = (1-p)^{-x} (A^x u_0)(2) = (1-p)^{-x} \cdot [0\ 1]\, M D^x M^{-1} u_0 = a \left( \frac{p}{1-p} \right)^x + b,$$
where $D$ is diagonal with $p, 1-p$ on the diagonal, and $a, b$ are constants that depend on the matrix $M$ and on $u_0$ (but not on $x$).
Thus, $\sum_x v(x)$ will only converge for $a = 0, b = 0$, which gives $v = 0$. That is, there is no stationary distribution, and $P$ is not positive recurrent.
In the future we will in fact see that $P$ is transient for $p \ne 1/2$, and for $p = 1/2$ we have already seen that $P$ is null recurrent.
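One can check numerically that $v(x) = a \left( \frac{p}{1-p} \right)^x + b$ indeed solves the recursion $v(x) = p\, v(x-1) + (1-p)\, v(x+1)$; the values of $a, b, p$ below are arbitrary:

```python
# Verify that v(x) = a * (p/(1-p))**x + b satisfies the stationarity recursion
# v(x) = p * v(x-1) + (1-p) * v(x+1) for the biased walk, for arbitrary a, b, p.
p, a, b = 0.7, 2.0, -1.5

def v(x):
    return a * (p / (1 - p)) ** x + b

for x in range(-5, 6):
    lhs = v(x)
    rhs = p * v(x - 1) + (1 - p) * v(x + 1)
    assert abs(lhs - rhs) < 1e-9
```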
Example 5.4. A chess knight moves on a chess board; at each step it chooses uniformly among the possible legal moves. Suppose the knight starts at a corner. What is the expected time it takes the knight to return to its starting point?
At first, this looks difficult...
However, let $G$ be the graph whose vertices are the squares of the chess board, $V(G) = \{1, 2, \ldots, 8\}^2$. Let $x = (1,1)$ be the starting point of the knight. For edges, we will connect two vertices if the knight can jump from one to the other in a legal move.
Thus, for example, a vertex in the "center" of the board has 8 adjacent vertices. A corner, on the other hand, has 2 adjacent vertices. In fact, we can determine the degree of every vertex:
[Figure: the legal moves of a knight.]
$$\begin{matrix}
2 & 3 & 4 & 4 & 4 & 4 & 3 & 2 \\
3 & 4 & 6 & 6 & 6 & 6 & 4 & 3 \\
4 & 6 & 8 & 8 & 8 & 8 & 6 & 4 \\
4 & 6 & 8 & 8 & 8 & 8 & 6 & 4 \\
4 & 6 & 8 & 8 & 8 & 8 & 6 & 4 \\
4 & 6 & 8 & 8 & 8 & 8 & 6 & 4 \\
3 & 4 & 6 & 6 & 6 & 6 & 4 & 3 \\
2 & 3 & 4 & 4 & 4 & 4 & 3 & 2
\end{matrix}$$
Summing all the degrees, one sees that $2|E(G)| = 4 \cdot (4 \cdot 8 + 4 \cdot 6 + 5 \cdot 4 + 2 \cdot 3 + 2) = 4 \cdot 84 = 336$. Thus, the stationary distribution is $\pi(i,j) = \deg(i,j)/336$. Specifically, $\pi(x) = 2/336$, and so $E_x[T_x^+] = 336/2 = 168$.
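The degree table and the resulting expected return time are easy to verify by brute force (a sketch, not part of the notes):

```python
# Compute the knight's-move degrees on the 8x8 board and recover E_x[T_x^+] = 168.
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def degree(i, j):
    # Number of legal knight moves from square (i, j), 1-indexed.
    return sum(1 for di, dj in moves
               if 1 <= i + di <= 8 and 1 <= j + dj <= 8)

total = sum(degree(i, j) for i in range(1, 9) for j in range(1, 9))
assert total == 336                  # 2|E(G)|
assert degree(1, 1) == 2             # the corner
assert total // degree(1, 1) == 168  # E_x[T_x^+] = 1/pi(x) = 336/2
```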
5.2. Summary so far
Let us sum up what we know so far about irreducible chains. If P is an irreducible Markov
chain, then:
• $E_x[V_\infty(x)] + 1 = \frac{1}{P_x[T_x^+ = \infty]}$.
• For all states $x, y$: $x$ is transient if and only if $y$ is transient.
• If $P$ is recurrent, the vector $v(z) = E_x[V_{T_x^+}(z)]$ is a positive left eigenvector for $P$, and any non-negative left eigenvector for $P$ is proportional to $v$.
• $P$ has a stationary distribution if and only if $P$ is positive recurrent.
• If $P$ is positive recurrent, then $\pi(x) E_x[T_x^+] = 1$.
5.3. Positive Recurrent Chains
Recall that Lemma 4.6 connects the expected number of visits to x up to an appropriate
stopping time, to the stationary distribution and the expected value of the stopping time:
Lemma* (4.6). [restatement] Let $(X_t)_t$ be Markov-$(P,\mu)$ for irreducible $P$. Assume $T$ is a stopping time such that
$$P_\mu[X_T = x] = \mu(x) \quad \text{for all } x.$$
Assume further that $1 \le T < \infty$ $P_\mu$-a.s. Let $v(x) = E_\mu[V_T(x)]$.
Then, $vP = v$. Moreover, if $E_\mu[T] < \infty$ then $P$ has a stationary distribution $\pi(x) = \frac{v(x)}{E_\mu[T]}$.
Good choices of the stopping time T for positive recurrent chains will give some nice identities.
• Proposition 5.5. Let $P$ be a positive recurrent chain with stationary distribution $\pi$. Then:
• $E_x[T_x^+] = \frac{1}{\pi(x)}$.
• $E_x[V_{T_x^+}(y)] = \frac{\pi(y)}{\pi(x)}$.
• For $x \ne y$,
$$1 + E_x[V_{T_y^+}(x)] = \pi(x) \cdot \big( E_y[T_x^+] + E_x[T_y^+] \big).$$
• For $x \ne y$,
$$\pi(x) P_x[T_y^+ < T_x^+] \cdot \big( E_y[T_x^+] + E_x[T_y^+] \big) = 1.$$
• (This is sometimes called "the edge commute inequality". It will be important in the future.) For $x \sim y$,
$$E_x[T_y] + E_y[T_x] \le \frac{1}{\pi(x) P(x,y)}.$$
Proof. We have:
• This follows by choosing $T = T_x^+$ in Lemma 4.6.
• We have already seen this. It also follows by choosing $T = T_x^+$ in Lemma 4.6.
• Let $T = \inf\{t \ge T_x + 1 : X_t = y\}$. So $E_y[T] = E_y[T_x^+] + E_x[T_y^+]$. Since $P_y[X_T = z] = 1_{z=y}$, we can apply Lemma 4.6. The strong Markov property at time $T_x$ gives that
$$E_y[V_T(x)] = E_y\Big[ \sum_{T_x \le k \le T} 1_{X_k = x} \Big] = E_x[V_{T_y^+}(x)] + 1.$$
So by Lemma 4.6,
$$E_x[V_{T_y^+}(x)] = E_y[V_T(x)] - 1 = \pi(x) E_y[T] - 1 = \pi(x) \cdot \big( E_y[T_x^+] + E_x[T_y^+] \big) - 1.$$
• This follows from the previous bullet, since $P_x$-a.s. $V_{T_y^+}(x) + 1 \sim \mathrm{Geo}(p)$ for $p = P_x[T_y^+ < T_x^+]$ (Exercise 3.3), so that $1 + E_x[V_{T_y^+}(x)] = 1/p$.
• Since for $x \sim y$ we have $P_x[T_y^+ < T_x^+] \ge P_x[X_1 = y] = P(x,y)$, we get the assertion from the previous bullet. □
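The edge commute inequality can be tested by simulation. On the triangle graph (a hypothetical example), $\pi(x) = 1/3$ and $P(x,y) = 1/2$ for neighbors, so the bound reads $E_x[T_y] + E_y[T_x] \le 6$, while the true value is $4$:

```python
import random

random.seed(1)

# Triangle graph on vertices {0, 1, 2}: every pair of vertices is adjacent.
def hitting_time(start, target):
    pos, t = start, 0
    while pos != target:
        pos = random.choice([v for v in (0, 1, 2) if v != pos])
        t += 1
    return t

trials = 50_000
est = sum(hitting_time(0, 1) + hitting_time(1, 0) for _ in range(trials)) / trials
assert abs(est - 4.0) < 0.1   # E_0[T_1] = E_1[T_0] = 2 on the triangle
assert est <= 6.0             # the edge commute bound 1/(pi(0) P(0,1)) = 6
```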
Number of exercises in lecture: 0
Total number of exercises until here: 10
Lecture 6: Convergence to Equilibrium
6.1. Convergence to Equilibrium
Recall that we saw that if $P^t(y,x) \to \pi(x)$ for all $x$, then $\pi$ must be a stationary distribution. We will now work our way toward the converse, at least for irreducible and aperiodic chains. Our goal:
Theorem* (6.5). [restatement] Let $(X_t)_t$ be an irreducible and aperiodic Markov chain. Suppose that $\pi$ is a stationary distribution for this chain. Then, for any starting distribution $\mu$ and any state $x$,
$$P_\mu[X_t = x] \to \pi(x).$$
6.2. Couplings
Example 6.1. Two gamblers walk into a casino in Las Vegas.
The first one plays a fair game - every round she wins a dollar with probability 1/2, and loses
a dollar with probability 1/2. All rounds are independent.
The second gambler plays an unfair game - every round he wins a dollar with probability
p < 1/2, and loses a dollar with probability 1− p, again all rounds independent.
It is extremely intuitive that the second gambler is worse off than the first. It should be the case that the probability that the second gambler goes bankrupt is at least that of the first. Also, it seems that any reasonable measure of success should be larger for the first gambler than for the second.
How can we mathematically prove this?
For example, we would like to show that for all starting positions $N$ and any $M > N$, we have that $P^1_N[T_0 < T_M] \le P^2_N[T_0 < T_M]$, where $P^i$ denotes the law of the $i$-th gambler's fortune. How can we show this?
The idea is to use couplings.
• Definition 6.2. A coupling of Markov chains $P, Q$ on a state space $S$ is a stochastic process $(X_t, Y_t)_t$ such that $(X_t)_t$ is Markov-$P$ and $(Y_t)_t$ is Markov-$Q$.
Note that $(X_t, Y_t)_t$ need not be a Markov chain on $S^2$. If a coupling $(X_t, Y_t)_t$ is in addition a Markov chain on $S^2$, then we say that $(X_t, Y_t)_t$ is a Markovian coupling. If $R$ is the transition matrix for the Markovian coupling $(X_t, Y_t)_t$, we say that $R$ is a coupling of $P, Q$.
Example 6.3. Let us use a Markovian coupling to show that lowering the winning probability for a gambler lowers their chances of winning.
Let $p < q$, and let $P$ be the transition matrix on $\mathbb{N}$ for the gambler that wins with probability $p$, and let $Q$ be the transition matrix for the gambler that wins with probability $q$. That is, $P(n, n+1) = p$ and $P(n, n-1) = 1-p$ for all $n > 0$, and $P(0,0) = 1$; similarly for $Q$.
The corresponding Markov chains are $(X_t)_t$ for $P$ and $(Y_t)_t$ for $Q$. We can couple the chains as follows: given $(X_t, Y_t)$, since $Y$ moves up with higher probability than $X$, we can organize the coupling so that $Y_{t+1} \ge X_{t+1}$ in any case. That is, given $(X_t, Y_t)$, if $X_t > 0$ let
$$(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + \begin{cases} (1,1) & \text{with probability } p \\ (-1,1) & \text{with probability } q-p \\ (-1,-1) & \text{with probability } 1-q. \end{cases}$$
If $X_t = 0, Y_t > 0$ let
$$(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + \begin{cases} (0,1) & \text{with probability } q \\ (0,-1) & \text{with probability } 1-q. \end{cases}$$
If $X_t = Y_t = 0$ let $X_{t+1} = Y_{t+1} = 0$.
It is immediate to check that this is indeed a coupling of $P$ and $Q$, and that $Y_t \ge X_t$ for all $t$, provided that $Y_0 \ge X_0$.
One can check that the resulting transition matrix is
$$R((n,m), (n+i, m+j)) = \begin{cases} p & i = 1, j = 1, n, m > 0 \\ q-p & i = -1, j = 1, n, m > 0 \\ 1-q & i = -1, j = -1, n, m > 0 \\ q & i = 0, j = 1, n = 0, m > 0 \\ 1-q & i = 0, j = -1, n = 0, m > 0 \\ 1 & i = 0, j = 0, n = m = 0. \end{cases}$$
So this is a Markovian coupling.
Thus, for any M > N ,
PQN [T0 < TM ] = PR(N,N)[∃ t : Yt = 0 and ∀ n < t Yn < M ]
≤ PR(N,N)[∃ t : Xt = 0 and ∀ n < t Xn < M ] = PPN [T0 < TM ],
where PP ,PQ,PR denote the probability measures for P , Q, and R respectively, and we have
used the fact that under PR(N,N), a.s. Xt ≤ Yt for all t. 454
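A simulation sketch of this coupling (with arbitrary values $p = 0.3 < q = 0.6$) confirms that the ordering $Y_t \ge X_t$ is preserved along the whole trajectory:

```python
import random

random.seed(2)

p, q = 0.3, 0.6  # p < q: the X-gambler is worse off than the Y-gambler

def step(x, y):
    # One step of the Markovian coupling R described above.
    u = random.random()
    if x > 0:
        if u < p:
            return x + 1, y + 1
        elif u < q:
            return x - 1, y + 1   # probability q - p
        else:
            return x - 1, y - 1   # probability 1 - q
    if y > 0:                     # x == 0 < y: only Y still moves
        return (0, y + 1) if u < q else (0, y - 1)
    return 0, 0                   # both gamblers absorbed at 0

x = y = 10
for _ in range(10_000):
    x, y = step(x, y)
    assert y >= x                 # the coupling preserves the ordering
```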
6.2.1. Coupling Time.
• Lemma 6.4. Let $(X_t, Y_t)_t$ be a Markovian coupling of two Markov chains on the same state space $S$ with the same transition matrix $P$. Define the coupling time as
$$\tau = \inf\{t \ge 0 : X_t = Y_t\}.$$
This is a stopping time for the Markov chain $(X_t, Y_t)_t$.
Define
$$Z_t = \begin{cases} X_t & t \le \tau \\ Y_t & t \ge \tau. \end{cases}$$
Then, $(Z_t)_t$ is a Markov chain with transition matrix $P$, started from $Z_0 = X_0$.
Specifically, $(Z_t, Y_t)_t$ is a coupling of Markov chains such that for all $t \ge \tau$, $Z_t = Y_t$.
Proof. Since $\{\tau \ge t+1\} = \{\tau < t+1\}^c \in \sigma((X_0, Y_0), \ldots, (X_t, Y_t))$, the Markov property at time $t$ gives
$$P[Z_{t+1} = y \mid Z_t = x, \tau \ge t+1, Z_{t-1}, \ldots, Z_0] = P[X_{t+1} = y \mid X_t = x, \tau \ge t+1, X_{t-1}, \ldots, X_0] = P(x,y).$$
Since $\tau$ is a stopping time, we can use the strong Markov property to deduce that for any $t$,
$$P[Z_{t+1} = y \mid Z_t = x, \tau \le t, Z_{t-1}, \ldots, Z_0] = P[Y_{t+1} = y \mid Y_t = x, \ldots, Y_\tau] = P(x,y).$$
Thus, for any $t$,
$$P[Z_{t+1} = y \mid Z_t = x, Z_{t-1}, \ldots, Z_0] = P[Z_{t+1} = y, \tau \ge t+1 \mid Z_t = x, Z_{t-1}, \ldots, Z_0] + P[Z_{t+1} = y, \tau \le t \mid Z_t = x, Z_{t-1}, \ldots, Z_0] = P(x,y) \cdot \big( P[\tau \ge t+1 \mid Z_t = x, Z_{t-1}, \ldots, Z_0] + P[\tau \le t \mid Z_t = x, Z_{t-1}, \ldots, Z_0] \big) = P(x,y). \qquad □$$
6.3. The Convergence Theorem
In this section we will prove a fundamental result in the theory of Markov chains.
••• Theorem 6.5. Let $P$ be an irreducible and aperiodic Markov chain. If $P$ has a stationary distribution $\pi$, then for any starting distribution $\mu$ and any state $x$,
$$P_\mu[X_t = x] \to \pi(x).$$
Proof. Let $(Y_t)_t$ be Markov-$(\pi, P)$, independent of $(X_t)_t$. Since $\pi P^t = \pi$, we have that $\pi(x) = P[Y_t = x]$. Let $\tau$ be the coupling time of $(X_t, Y_t)_t$.
First we show that $P[\tau < \infty] = 1$, so $P[\tau > t] \to 0$. Indeed, $(X_t, Y_t)_t$ is a Markov chain on $S^2$, with transition matrix $Q((x,y), (x',y')) = P(x,x') P(y,y')$. Moreover, for $\chi(x,y) = \pi(x)\pi(y)$, we get that $\chi$ is a stationary distribution for $Q$.
We claim that since $P$ is irreducible and aperiodic, $Q$ is also irreducible (and aperiodic). Indeed, let $(x,y), (x',y') \in S^2$. We already saw that there exist $t(x,x'), t(y,y')$ such that for all $t > t(x,x')$, $P^t(x,x') > 0$, and for all $t > t(y,y')$, $P^t(y,y') > 0$. Thus, for all $t > \max\{t(x,x'), t(y,y')\}$ we have that $Q^t((x,y), (x',y')) > 0$. Thus, $Q$ is irreducible.
Since $Q$ has a stationary distribution and $Q$ is irreducible, we get that $Q$ is positive recurrent. Specifically, $P[T_{(x,x)} < \infty] = 1$ for any $x \in S$. Since $\tau \le T_{(x,x)}$, we get that $P[\tau < \infty] = 1$.
Now define
$$Z_t = \begin{cases} Y_t & t \le \tau \\ X_t & t \ge \tau. \end{cases}$$
So $(X_t, Z_t)_t$ is a coupling of Markov chains such that for all $t \ge \tau$, $X_t = Z_t$. Also, since $Z_0 = Y_0 \sim \pi$,
$$P[Z_t = x] = P[Z_t = x, t < \tau] + P[Z_t = x, t \ge \tau] = P[Z_t = x, t < \tau] + P[X_t = x, t \ge \tau].$$
Adding this to
$$P[X_t = x] = P[X_t = x, t < \tau] + P[X_t = x, t \ge \tau],$$
we get that
$$|P[X_t = x] - P[Z_t = x]| = |P[X_t = x, t < \tau] - P[Z_t = x, t < \tau]| \le P[\tau > t] \to 0.$$
Finally, the previous lemma tells us that $(Z_t)_t$ is a Markov chain with transition matrix $P$ and, most importantly, starting distribution $\pi$. So $P[Z_t = x] = \pi(x)$. □
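A numerical illustration of the theorem on a small hypothetical chain: the birth-death chain below is irreducible and aperiodic, and iterating $\mu \mapsto \mu P$ drives the total variation distance to $\pi$ to zero.

```python
# An irreducible, aperiodic example chain on {0, 1, 2}.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

def left_mult(v, M):
    return [sum(v[i] * M[i][j] for i in range(3)) for j in range(3)]

mu = [1.0, 0.0, 0.0]       # start deterministically at state 0
for _ in range(200):
    mu = left_mult(mu, P)  # mu P^t

pi = [0.25, 0.5, 0.25]     # stationary: check via detailed balance by hand
tv = 0.5 * sum(abs(mu[j] - pi[j]) for j in range(3))
assert tv < 1e-10          # P_mu[X_t = .] -> pi in total variation
```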
Number of exercises in lecture: 0
Total number of exercises until here: 10
Lecture 7: Conditional Expectation
7.1. Conditional Probability
Recall that we want to define a random walk. A (simple) random walk is a process that, given the current location, chooses among the available neighbors uniformly. So we need a way of conditioning on the current position.
That is, we want the notions of conditional probability and conditional expectation.
The notion of conditional expectation is central to probability. It is developed using the Radon-Nikodym derivative from measure theory:
Johann Radon (1887-1956)
Otto Nikodym (1887-1974)
••• Theorem 7.1. Let $\mu, \nu$ be two probability measures on $(\Omega, \mathcal{F})$. Suppose that $\mu$ is absolutely continuous with respect to $\nu$; that is, $\nu(A) = 0$ implies that $\mu(A) = 0$ for all $A \in \mathcal{F}$.
Then, there exists a ($\nu$-a.s. unique) random variable $\frac{d\mu}{d\nu}$ on $(\Omega, \mathcal{F}, \nu)$ such that for any event $A \in \mathcal{F}$,
$$E_\mu[1_A] = E_\nu\Big[ 1_A \frac{d\mu}{d\nu} \Big].$$
X In the notation of Lebesgue integrals this takes the form
$$\int_A d\mu = \int_A \frac{d\mu}{d\nu}\, d\nu,$$
which can be informally stated as $\frac{d\mu}{d\nu}\, d\nu = d\mu$.
This theorem is used to prove the following theorem.
••• Theorem 7.2. Let $X$ be a random variable on a probability space $(\Omega, \mathcal{F}, P)$ such that $E[|X|] < \infty$. Let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Then, there exists a ($P$-a.s. unique) $\mathcal{G}$-measurable random variable $Y$ such that for all $A \in \mathcal{G}$, $E[Y 1_A] = E[X 1_A]$.
X Notation: An X such as above is called integrable.
X Notation: If Y is G-measurable then we write Y ∈ G.
• Definition 7.3. Let $X$ be an integrable ($E[|X|] < \infty$) random variable on a probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{F}$.
The random variable from the above theorem is denoted $E[X|\mathcal{G}]$.
If $Y$ is a random variable on $(\Omega, \mathcal{F}, P)$ then we denote $E[X|Y] := E[X|\sigma(Y)]$.
If $A \in \mathcal{F}$ is any event then we write
$$P[A|\mathcal{G}] := E[1_A|\mathcal{G}].$$
Proof of Theorem 7.2. Note that uniqueness is immediate from the fact that if $Y, Y'$ are two such random variables, then for $A_n = \{Y - Y' \ge n^{-1}\}$ we have that $A_n \in \mathcal{G}$ (as a measurable function of $(Y, Y')$) and
$$n^{-1} P[A_n] \le E[(Y - Y') 1_{A_n}] = E[X 1_{A_n}] - E[X 1_{A_n}] = 0.$$
So by continuity of probability,
$$P[Y > Y'] = P\Big[ \bigcup_n A_n \Big] = \lim_n P[A_n] = 0.$$
Exchanging the roles of $Y$ and $Y'$ we get that $P[Y \ne Y'] = 0$.
For existence we use the Radon-Nikodym derivative. First assume that $X \ge 0$. If $E[X] = 0$ then $X = 0$ a.s. and $Y = 0$ works, so assume $E[X] > 0$ and define a probability measure on $(\Omega, \mathcal{G})$ by
$$\forall\, A \in \mathcal{G}, \qquad Q(A) = \frac{E[X 1_A]}{E[X]}.$$
If $P[A] = 0$ then $X 1_A = 0$ a.s., so $Q(A) = 0$; that is, $Q \ll P$. So the Radon-Nikodym derivative exists, and for all $A \in \mathcal{G}$,
$$E[X 1_A] = E\Big[ \frac{dQ}{dP} 1_A \Big] E[X].$$
Taking $Y = \frac{dQ}{dP} E[X]$ completes the case of $X \ge 0$.
For the general case, recall that $X = X^+ - X^-$, where $X^+, X^-$ are non-negative. Let $Y_1 = E[X^+|\mathcal{G}]$ and $Y_2 = E[X^-|\mathcal{G}]$. Then, $Y_1 - Y_2 \in \mathcal{G}$ and for any $A \in \mathcal{G}$,
$$E[X 1_A] = E[X^+ 1_A] - E[X^- 1_A] = E[(Y_1 - Y_2) 1_A].$$
Thus, $Y = Y_1 - Y_2$ completes the proof. □
X Note that to prove that $Y = E[X|\mathcal{G}]$ one needs to show two things: $Y \in \mathcal{G}$, and $E[Y 1_A] = E[X 1_A]$ for all $A \in \mathcal{G}$.
X Important: Conditional expectation $E[X|\mathcal{G}]$ is the average value of $X$ given the information in $\mathcal{G}$; this is a random variable, not a number as is the usual expectation. One needs to be careful with this. Whenever we write $E[X|\mathcal{G}] = Z$ we actually mean that $E[X|\mathcal{G}] = Z$ a.s.
Exercise 7.1. Let $X$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$. Let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-algebra. Then:
• If $X \in \mathcal{G}$ then $E[X|\mathcal{G}] = X$. [The average value of $X$ given $X$ is $X$ itself.]
• If $\mathcal{G} = \{\emptyset, \Omega\}$ then $E[X|\mathcal{G}] = E[X]$. [Given no information, the average value of $X$ is $E[X]$.]
• If $X = c$ for $c$ a constant, then $X$ is measurable with respect to the trivial $\sigma$-algebra $\{\emptyset, \Omega\} \subset \mathcal{G}$, so $E[c|\mathcal{G}] = c$.
• If $X$ is independent of $\mathcal{G}$ then $E[X|\mathcal{G}] = E[X]$. [Given no information about $X$, the average value of $X$ is $E[X]$.]
• $E[E[X|\mathcal{G}]] = E[X]$.
Solution.
• It is trivial that $E[X 1_A] = E[X 1_A]$, so if $X \in \mathcal{G}$ then $X$ satisfies both properties required of the conditional expectation.
• Again, constants are measurable with respect to any $\sigma$-algebra. For the second property, $E[X 1_\emptyset] = 0 = E[E[X] 1_\emptyset]$ and $E[X 1_\Omega] = E[X] = E[E[X] 1_\Omega]$.
• Easy. Follows from the previous bullets.
• If $X$ is independent of $\mathcal{G}$, then for any $A \in \mathcal{G}$, $E[X 1_A] = E[X] P[A] = E[E[X] 1_A]$. Also, $E[X] \in \mathcal{G}$ since constants are measurable with respect to any $\sigma$-algebra.
• Consider the event $\Omega \in \mathcal{G}$. Since $1 = 1_\Omega$ we get that $E[X] = E[X 1_\Omega] = E[E[X|\mathcal{G}] 1_\Omega] = E[E[X|\mathcal{G}]]$. □
Exercise 7.2. If $Y = Y'$ a.s. then $E[X|Y] = E[X|Y']$. [Changing by measure 0 does not change the conditioning.]
Hint: Consider $E[X|\sigma(Y) \cap \sigma(Y')]$.
Solution. It suffices to prove that if $\mathcal{G}$ and $\mathcal{G}'$ are $\sigma$-algebras such that every $A \in \mathcal{G} \triangle \mathcal{G}'$ has $P[A] = 0$ (that is, $\mathcal{G}$ and $\mathcal{G}'$ only differ on measure-0 events), then $E[X|\mathcal{G}] = E[X|\mathcal{G}']$ a.s.
$\mathcal{G} \cap \mathcal{G}'$ is a $\sigma$-algebra, as an intersection of $\sigma$-algebras. Let $Z = E[X|\mathcal{G} \cap \mathcal{G}']$. Since $\mathcal{G} \cap \mathcal{G}' \subset \mathcal{G}$ and $\mathcal{G} \cap \mathcal{G}' \subset \mathcal{G}'$, we have that $Z$ is both $\mathcal{G}$- and $\mathcal{G}'$-measurable. Moreover, for any $A \in \mathcal{G}$: if $A \notin \mathcal{G}'$ then $P[A] = 0$, so $E[X 1_A] = 0 = E[Z 1_A]$. If $A \in \mathcal{G}'$ then $A \in \mathcal{G} \cap \mathcal{G}'$, so $E[X 1_A] = E[Z 1_A]$ by definition. Thus, $Z = E[X|\mathcal{G}]$. Similarly, exchanging the roles of $\mathcal{G}$ and $\mathcal{G}'$, we get $Z = E[X|\mathcal{G}']$, so $E[X|\mathcal{G}] = E[X|\mathcal{G}']$ a.s. □
Exercise 7.3. $E[aX + Y|\mathcal{G}] = a E[X|\mathcal{G}] + E[Y|\mathcal{G}]$ a.s.
Solution. The right-hand side is of course $\mathcal{G}$-measurable. For any $A \in \mathcal{G}$,
$$E[(aX + Y) 1_A] = a E[X 1_A] + E[Y 1_A] = a E[E[X|\mathcal{G}] 1_A] + E[E[Y|\mathcal{G}] 1_A] = E[(a E[X|\mathcal{G}] + E[Y|\mathcal{G}]) 1_A]. \qquad □$$
Exercise 7.4. If $X \le Y$ then $E[X|\mathcal{G}] \le E[Y|\mathcal{G}]$.
Solution. Since $Y - X \ge 0$, it suffices to show that if $X \ge 0$ then $E[X|\mathcal{G}] \ge 0$ a.s.
Let $A_n = \{E[X|\mathcal{G}] \le -n^{-1}\}$. So $A_n \in \mathcal{G}$ and
$$n^{-1} P[A_n] \le -E[E[X|\mathcal{G}] 1_{A_n}] = -E[X 1_{A_n}] \le 0.$$
So $P[A_n] = 0$ for all $n$, and thus $P[E[X|\mathcal{G}] < 0] = P[\exists\, n : A_n] = 0$. □
Exercise 7.5. Let $G \in \mathcal{G}$. Show that for any event $A$ with $P[A] > 0$,
$$P[G|A] = \frac{E[P[A|\mathcal{G}] 1_G]}{P[A]}.$$
Thomas Bayes (1701-1761)
Solution. Note that since $G \in \mathcal{G}$, by definition of conditional expectation,
$$E[P[A|\mathcal{G}] 1_G] = E[1_A 1_G] = P[A \cap G],$$
and dividing by $P[A]$ gives $P[G|A]$. □
7.2. More Properties
• Proposition 7.4 (Monotone Convergence). If $(X_n)_n$ is a monotone non-decreasing sequence of non-negative integrable random variables such that $X_n \nearrow X$ for some integrable $X$, then $E[X_n|\mathcal{G}] \nearrow E[X|\mathcal{G}]$ a.s.
Proof. Let $Y_n = X - X_n$. Since $X_n \nearrow X$, we get that $Y_n \ge 0$ for all $n$, and $(E[Y_n|\mathcal{G}])_n$ is a monotone non-increasing sequence of non-negative random variables. Let $Z(\omega) = \inf_n E[Y_n|\mathcal{G}](\omega) = \lim_n E[Y_n|\mathcal{G}](\omega) = \liminf_n E[Y_n|\mathcal{G}](\omega)$. So $Z \in \mathcal{G}$ and $Z \ge 0$. Fatou's Lemma gives that
$$E[Z] \le \liminf_n E[E[Y_n|\mathcal{G}]] = \liminf_n E[X - X_n] = 0,$$
since $E[X_n] \nearrow E[X]$ by monotone convergence. Thus, $Z = 0$ a.s. This implies that
$$E[X|\mathcal{G}] - E[X_n|\mathcal{G}] \xrightarrow{\text{a.s.}} 0. \qquad □$$
• Proposition 7.5. If $Z \in \mathcal{G}$ and $XZ$ is integrable, then $E[XZ|\mathcal{G}] = E[X|\mathcal{G}] Z$ a.s.
Proof. Note that $E[X|\mathcal{G}] Z \in \mathcal{G}$, so we only need to prove the second property.
We use the usual four-step proof: from indicators to simple random variables, to non-negative, to general.
If $Z = 1_B$ for some $B \in \mathcal{G}$, then for any $A \in \mathcal{G}$,
$$E[XZ 1_A] = E[X 1_{B \cap A}] = E[E[X|\mathcal{G}] 1_{B \cap A}] = E[E[X|\mathcal{G}] Z 1_A].$$
If $Z$ is simple, then $Z = \sum_k a_k 1_{A_k}$, and by linearity and the previous case,
$$E[XZ|\mathcal{G}] = \sum_k a_k E[X 1_{A_k}|\mathcal{G}] = \sum_k a_k E[X|\mathcal{G}] 1_{A_k} = E[X|\mathcal{G}] Z.$$
For general non-negative $Z$, in the case that $X$ is also non-negative, we approximate $Z$ by a non-decreasing sequence of simple random variables $Z_n \nearrow Z$, so that $X Z_n \nearrow XZ$, and by monotone convergence and the previous case,
$$E[XZ|\mathcal{G}] = \lim_n E[X Z_n|\mathcal{G}] = \lim_n E[X|\mathcal{G}] Z_n = E[X|\mathcal{G}] Z.$$
For a general $Z \in \mathcal{G}$ and general $X$, write $Z = Z^+ - Z^-$ and $X = X^+ - X^-$, with $0 \le Z^+, Z^- \in \mathcal{G}$ and $X^+, X^- \ge 0$. By the previous case and linearity,
$$E[X^\pm Z|\mathcal{G}] = E[X^\pm (Z^+ - Z^-)|\mathcal{G}] = E[X^\pm|\mathcal{G}](Z^+ - Z^-) = E[X^\pm|\mathcal{G}] Z,$$
which immediately leads to the assertion. □
The following properties all have their “usual” proof adapted to the conditional setting.
• Proposition 7.6 (Jensen's Inequality). If $g : \mathbb{R} \to \mathbb{R}$ is convex, and $X, g(X)$ are integrable, then
$$g(E[X|\mathcal{G}]) \le E[g(X)|\mathcal{G}].$$
Johan Jensen (1859-1925)
Proof. If $g$ is convex then for any $m$ there exist $a_m, b_m$ such that $g(s) \ge a_m s + b_m$ for all $s$, and $g(m) = a_m m + b_m$. Thus, for any $\omega \in \Omega$ there exist $A(\omega), B(\omega)$ such that $g(s) \ge A(\omega) s + B(\omega)$ for all $s$, and $g(E[X|\mathcal{G}](\omega)) = A(\omega) E[X|\mathcal{G}](\omega) + B(\omega)$. It is not difficult to see that $A, B$ are measurable and determined by $E[X|\mathcal{G}]$ and $g$, so $A, B$ are $\mathcal{G}$-measurable random variables. Thus,
$$g(E[X|\mathcal{G}]) = A E[X|\mathcal{G}] + B = E[AX + B|\mathcal{G}] \le E[g(X)|\mathcal{G}]. \qquad □$$
• Proposition 7.7 (Cauchy-Schwarz). If $X, Y$ are in $L^2(\Omega, \mathcal{F}, P)$, then
$$(E[XY|\mathcal{G}])^2 \le E[X^2|\mathcal{G}] \cdot E[Y^2|\mathcal{G}].$$
Augustin-Louis Cauchy
(1789-1857)
Proof. By Jensen's inequality, $E|E[XY|\mathcal{G}]| \le E[|XY|] \le \sqrt{E[X^2] E[Y^2]} < \infty$, so $E[XY|\mathcal{G}]$ is a.s. finite. On the event $\{E[Y^2|\mathcal{G}] = 0\}$ we have $Y = 0$ a.s., so both sides of the inequality vanish there; thus we may assume that $E[Y^2|\mathcal{G}] > 0$.
Set $\lambda = \frac{E[XY|\mathcal{G}]}{E[Y^2|\mathcal{G}]}$, which is a $\mathcal{G}$-measurable random variable. By linearity,
$$0 \le E[(X - \lambda Y)^2|\mathcal{G}] = E[X^2|\mathcal{G}] + \lambda^2 E[Y^2|\mathcal{G}] - 2\lambda E[XY|\mathcal{G}] = E[X^2|\mathcal{G}] - \frac{(E[XY|\mathcal{G}])^2}{E[Y^2|\mathcal{G}]}. \qquad □$$
Hermann Schwarz
(1843-1921)
• Proposition 7.8 (Markov / Chebyshev). If $X \ge 0$ is integrable, then for any $\mathcal{G}$-measurable $Z$ such that $Z > 0$,
$$P[X \ge Z|\mathcal{G}] \le \frac{E[X|\mathcal{G}]}{Z}.$$
Pafnuty Chebyshev
(1821-1894)
Proof. Let $Y = Z 1_{X \ge Z}$. So $Y \le X$. Thus,
$$Z\, P[X \ge Z|\mathcal{G}] = E[Y|\mathcal{G}] \le E[X|\mathcal{G}]. \qquad □$$
Remark 7.9. Suppose that $(\Omega, \mathcal{F}, P)$ is a probability space, and $\mathcal{G} \subset \mathcal{F}$ is some sub-$\sigma$-algebra. We have two associated vector spaces: $L^2(\Omega, \mathcal{G}, P) \subset L^2(\Omega, \mathcal{F}, P)$, the spaces of square-integrable $\mathcal{G}$-measurable random variables and square-integrable $\mathcal{F}$-measurable random variables. These spaces come equipped with an inner-product structure given by $\langle X, Y \rangle = E[XY]$. The theory of inner-product (or Hilbert) spaces tells us that $L^2(\Omega, \mathcal{F}, P) = L^2(\Omega, \mathcal{G}, P) \oplus V$, where $V$ is the orthogonal complement of $L^2(\Omega, \mathcal{G}, P)$ in $L^2(\Omega, \mathcal{F}, P)$. So we can project any $\mathcal{F}$-measurable square-integrable $X$ onto $L^2(\Omega, \mathcal{G}, P)$. This projection turns out to be exactly $X \mapsto E[X|\mathcal{G}]$.
In fact, it is immediate that $E[X|\mathcal{G}]$ is a square-integrable $\mathcal{G}$-measurable random variable. Moreover, for $Y \in L^2(\Omega, \mathcal{G}, P)$,
$$\langle X - E[X|\mathcal{G}], Y \rangle = E[XY - E[X|\mathcal{G}] Y] = E[XY] - E[E[XY|\mathcal{G}]] = 0.$$
Thus, to minimize $E[(X - Y)^2]$ over all $Y \in L^2(\Omega, \mathcal{G}, P)$, we can take $Y = E[X|\mathcal{G}]$.
7.2.1. The smaller σ-algebra always wins. Perhaps the most important property that has
no “unconditional” counterpart is
• Proposition 7.10. Let X be an integrable random variable on a probability space (Ω,F ,P).
Let H ⊂ G ⊂ F be sub-σ-algebras. Then,
• E[E[X|H]|G] = E[X|H].
• E[E[X|G]|H] = E[X|H].
Proof. The first assertion comes from the fact that E[X|H] ∈ H ⊂ G, so conditioning on G has
no effect.
For the second assertion we have that E[X|H] ∈ H of course, and for any A ∈ H, using that
A ∈ G as well,
E[E[X|G]1A] = E[E[X1A|G]] = E[X1A] = E[E[X|H]1A].
□
7.3. Partitioned Spaces
During this course, we will almost always use conditional probabilities conditioned on some
discrete random variable. Note that if Y is discrete with range R (perhaps d-dimensional), then
Σ_{r∈R} 1_{Y=r} = 1 a.s. This simplifies the discussion regarding conditional probabilities.
The main observation is the following
Exercise 7.6. Suppose that (Ω,F ,P) is a probability space with Ω = ⊎_{k∈I} A_k, where
A_k ∈ F for all k ∈ I, with I some countable (possibly finite) index set. Show that
σ((A_k)_{k∈I}) = { ⊎_{k∈J} A_k : J ⊂ I }.
Hint: Show that any set in the right-hand side must be in σ((A_k)_{k∈I}). Show that the right-hand
side is a σ-algebra.
• Lemma 7.11. Let X be an integrable random variable on (Ω,F ,P). Let I be some countable
index set (possibly finite). Suppose that P[⊎_{k∈I} A_k] = 1 where A_k ∈ F for all k, and P[A_k] > 0
for all k. Let G = σ((A_k)_{k∈I}). Then,
E[X|G] = Σ_k 1_{A_k} · E[X 1_{A_k}] / P[A_k].
Proof. Let Y = Σ_k 1_{A_k} · E[X 1_{A_k}] / P[A_k]. Then of course Y ∈ G. For any A ∈ G we have that
1_A = Σ_{k∈J} 1_{A_k} (P-a.s.) for some J ⊂ I. Thus,
E[Y 1_A] = Σ_{k∈J} E[1_{A_k}] · E[X 1_{A_k}] / P[A_k] = Σ_{k∈J} E[X 1_{A_k}] = E[X 1_A].
□
• Corollary 7.12. Let Y be a discrete random variable with range R on (Ω,F ,P). Let X be an
integrable random variable on the same space. Then,
E[X|Y ] = Σ_{r∈R} 1_{Y=r} · E[X 1_{Y=r}] / P[Y = r] = Σ_{r∈R} 1_{Y=r} E[X|Y = r],
where we take the convention that E[X|Y = r] = E[X 1_{Y=r}] / P[Y = r] = 0 when P[Y = r] = 0.
Proof. Ω = ⊎_{r∈R} {Y = r}. □
X Note that E[X|Y ] is a discrete random variable as well, regardless of the original distribution
of X.
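On a finite probability space the formula of Corollary 7.12 can be checked by hand. The following is a minimal sketch (the space, the variables X, Y and the helper cond_exp are invented for illustration), computing E[X|Y ] exactly over the rationals and verifying the tower property E[E[X|Y ]] = E[X].

```python
from fractions import Fraction

# A finite probability space: a fair six-sided "die" with faces 0..5.
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
X = {w: w ** 2 for w in omega}        # the integrable random variable
Y = {w: w % 2 for w in omega}         # discrete Y partitions Omega into evens/odds

def cond_exp(X, Y, P):
    """E[X|Y] as a random variable, via Corollary 7.12:
    on the event {Y = r}, E[X|Y] equals E[X 1_{Y=r}] / P[Y = r]."""
    values = {}
    for r in set(Y.values()):
        pr = sum(P[w] for w in P if Y[w] == r)           # P[Y = r]
        ex = sum(X[w] * P[w] for w in P if Y[w] == r)    # E[X 1_{Y=r}]
        values[r] = ex / pr
    return {w: values[Y[w]] for w in P}

E_X_given_Y = cond_exp(X, Y, P)
# On the evens {0,2,4}: (0 + 4 + 16)/3 = 20/3; on the odds {1,3,5}: (1 + 9 + 25)/3 = 35/3.
```

Note that, as remarked above, E[X|Y ] takes only finitely many values here, regardless of the values of X.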
Number of exercises in lecture: 6
Total number of exercises until here: 16
Random Walks
Ariel Yadin
Lecture 8: Martingales
8.1. Martingales
X Do conditional expectation
• Definition 8.1. Let (Ω,F ,P) be a probability space. A filtration is a monotone sequence of
sub-σ-algebras F0 ⊂ F1 ⊂ · · · ⊂ F .
A sequence (Xn)n of random variables is said to be adapted to a filtration (Fn)n if for all
n, Xn ∈ Fn.
• Definition 8.2. Let (Ω,F ,P) be a probability space, and let (Fn)n be a filtration. A sequence
(Xn)n is said to be a martingale with respect to the filtration (Fn)n, or sometimes a (Fn)n-
martingale, if for all n,
• E[|Xn|] <∞ (i.e. Xn is integrable).
• E[Xn+1|Fn] = Xn.
• (Xn)n is adapted to (Fn)n.
If the filtration is not specified then we say that (Xn)n is a martingale if it is a martingale
with respect to the natural filtration Fn := σ(X0, . . . , Xn); that is, a sequence of integrable
random variables such that for all n,
E[Xn+1|Xn, . . . , X0] = Xn.
Exercise 8.1. Show that if (Xn)n is an Fn-martingale then (Xn)n is also a
martingale with respect to the natural filtration (σ(X0, . . . , Xn))n. (Hint: Show that for all
n, σ(X0, . . . , Xn) ⊂ Fn.)
Example 8.3. Let (Xn)n be a simple random walk on Z started at X0 = 0. The Markov
property gives that
E[Xn+1|Xn, . . . , X0] = (1/2)(Xn + 1) + (1/2)(Xn − 1) = Xn.
So (Xn)n is a martingale. ⋄
Example 8.4. More generally, if (Xn)n is a sequence of independent random variables with
E[Xn] = 0 for all n, and Sn = Σ_{k=0}^n X_k, then
E[Sn+1|Sn, . . . , S0] = Sn + E[Xn+1|Sn, . . . , S0].
Since Sn, . . . , S0 ∈ σ(X0, . . . , Xn) and since Xn+1 is independent of σ(X0, . . . , Xn), we have that
E[Xn+1|Sn, . . . , S0] = E[Xn+1] = 0.
So, in conclusion, (Sn)n is a martingale. ⋄
• Proposition 8.5. Let (Xn)n be a (Fn)n-martingale. For any k ≤ n we have E[Xn|Fk] = Xk.
Proof. For k = n this is obvious. Assume that k < n. By properties of conditional expectation,
because Fk ⊂ Fn−1,
E[Xn|Fk] = E[E[Xn|Fn−1]|Fk] = E[Xn−1|Fk].
Continuing inductively, we get the proposition. □
Exercise 8.2. Let (Xn)n be a (Fn)n-martingale. Let T be a stopping time (with respect
to the filtration (Fn)n). Prove that (Yn := XT∧n)n is a (Fn)n-martingale.
••• Theorem 8.6 (Optional Stopping). Let (Xn)n be an (Fn)n-martingale and T a stopping
time. We have that E[XT |X0] = X0 in the following cases:
• If T is bounded; that is if T ≤ t a.s. for some 0 < t <∞.
• If T is a.s. finite and there exists M > 0 such that |Xn| ≤ M for all n a.s. ((Xn)n is
bounded).
• If E[T ] < ∞ and there exists M > 0 such that |Xn+1 −Xn| ≤ M for all n a.s. ((Xn)n
has bounded increments).
Proof. We start with the first case: Let Yn = XT∧n. Since T ≤ t a.s. we get that Yt = XT .
Since Y0 = X0 we conclude
E[XT |X0] = E[Yt|Y0] = Y0 = X0.
For the second case: Let Yn = XT∧n as above. We have
|E[Yn|X0]− E[XT |X0]| = |E[(XT∧n −XT ) · 1_{T>n}|X0]| ≤ 2M · P[T > n|X0] → 0,
because T < ∞ a.s. Thus, since T ∧ n is a bounded stopping time,
E[XT |X0] = lim_{n→∞} E[Yn|Y0] = Y0 = X0.
Finally, for the third case: Note that
|XT∧n −XT | · 1_{T>n} ≤ Σ_{k=n}^{T−1} |Xk+1 −Xk| · 1_{T>n} ≤ M T 1_{T>n}.
Thus, similarly to the above,
|E[XT∧n|X0]− E[XT |X0]| ≤ M E[T 1_{T>n}|X0].
Since T 1_{T>n} → 0, and since E[T ] < ∞, we get by dominated convergence that E[T 1_{T>n}] → 0,
and so
X0 = E[XT∧n|X0] → E[XT |X0].
□
Let us use martingales to calculate some probabilities.
Example 8.7 (Gambler’s Ruin). Let (Xt)t be a simple random walk on Z. Let T = T_{{0,n}} be
the first time the walk is at 0 or n.
We can think of Xt as the amount of money a gambler playing a fair game has after the
t-th game. What is the probability that a gambler that starts with x reaches n before going
bankrupt?
Let
pn(x) = Px[Tn < T0].
Since (Xt)t is a martingale, we get that (Xt∧T )t is a bounded martingale under the measure Px.
Since T is a.s. finite, we can apply the optional stopping theorem to get
x = Ex[X0] = Ex[XT ]
= Ex[XT |Tn < T0] · pn(x) + Ex[XT |T0 < Tn] · (1− pn(x)) = pn(x) · n.
So pn(x) = x/n. ⋄
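The identity pn(x) = x/n can also be checked deterministically: pn is the unique solution of the linear system pn(x) = (pn(x−1) + pn(x+1))/2 for 0 < x < n, with pn(0) = 0 and pn(n) = 1. Below is a sketch (the helper hitting_prob and its interface are mine, not from the notes) solving this system exactly over the rationals by Gaussian elimination.

```python
from fractions import Fraction

def hitting_prob(P, target, absorbed):
    """Solve h(x) = sum_y P[x][y] h(y) for non-absorbed x, with h = 1 on
    `target` and h = 0 on the other absorbing states; a minimal sketch
    using Gauss-Jordan elimination over the rationals."""
    free = [x for x in sorted(P) if x not in absorbed]
    idx = {x: i for i, x in enumerate(free)}
    n = len(free)
    # (I - Q) h = b, where Q is P restricted to the free states and
    # b(x) is the one-step probability of jumping straight into `target`.
    A = [[Fraction(int(i == j)) - Fraction(P[x].get(y, 0)) for j, y in enumerate(free)]
         for i, x in enumerate(free)]
    b = [Fraction(sum(P[x].get(z, 0) for z in target)) for x in free]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return {x: b[idx[x]] / A[idx[x]][idx[x]] for x in free}

# Simple random walk on {0, ..., n}, absorbed at 0 and n.
n = 10
half = Fraction(1, 2)
P = {x: ({x - 1: half, x + 1: half} if 0 < x < n else {x: 1}) for x in range(n + 1)}
p = hitting_prob(P, target={n}, absorbed={0, n})
# p[x] == x/n, matching the optional-stopping computation above.
```

(For the simple random walk one can also read the answer off directly: the system forces pn to be linear in x.)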
Remark 8.8. This is another proof that Z is recurrent:
Let An = {Tn < T0^+}. So (An)n is a decreasing sequence of events. Thus,
P1[∩n An] = limn P1[An] = limn 1/n = 0.
By symmetry,
P−1[∩n A−n] = 0.
Now, the event that the walk never returns to 0 is the event that the walk takes a step to either
1 or −1 and then never returns to 0; i.e.
{T0^+ = ∞} = {X1 = 1, ∩n An} ⊎ {X1 = −1, ∩n A−n}.
The Markov property gives
P0[T0^+ = ∞] = (1/2) · P1[∩n An] + (1/2) · P−1[∩n A−n] = 0.
Example 8.9. What about the amount of time it takes to reach 0 or n?
Consider Yt = Xt^2 − t. Then,
E[Yt+1|X0, . . . , Xt] = (1/2) · ((Xt + 1)^2 − (t+ 1) + (Xt − 1)^2 − (t+ 1)) = Yt.
So (Yt)t is a martingale, and thus (YT∧t)t is a martingale under the measure Px. Thus, for all t,
x^2 = Ex[Y0] = Ex[YT∧t] = Ex[X_{T∧t}^2]− Ex[T ∧ t].
Since X_{T∧t}^2 ≤ n^2 and T ∧ t ↑ T , taking t → ∞ (bounded convergence for the first term,
monotone convergence for the second) gives
x^2 = Ex[X_T^2]− Ex[T ] = pn(x) · n^2 − Ex[T ].
So by the previous example, for any 0 ≤ x ≤ n,
Ex[T ] = xn− x^2 = x(n− x). ⋄
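The formula Ex[T ] = x(n − x) can likewise be checked by solving the linear system t(x) = 1 + (t(x−1) + t(x+1))/2 with t(0) = t(n) = 0. A sketch via a "shooting method" (the helper exit_times is mine): since the recursion t(x+1) = 2t(x) − t(x−1) − 2 is affine in the unknown t(1), we carry each t(x) as a pair of coefficients and fix t(1) at the end by imposing t(n) = 0.

```python
from fractions import Fraction

def exit_times(n):
    # Represent t(x) = alpha + beta * t(1), with t(1) unknown.
    t = [(Fraction(0), Fraction(0)), (Fraction(0), Fraction(1))]  # t(0) = 0, t(1)
    for x in range(1, n):
        a, b = t[x]
        a0, b0 = t[x - 1]
        t.append((2 * a - a0 - 2, 2 * b - b0))  # t(x+1) = 2 t(x) - t(x-1) - 2
    a, b = t[n]
    s = -a / b                                  # impose the boundary condition t(n) = 0
    return [alpha + beta * s for alpha, beta in t]

n = 10
t = exit_times(n)
# t[x] == x * (n - x), as derived from the martingale X_t^2 - t.
```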
Remark 8.10. This is another proof that Z is null-recurrent:
Under P0, the event {Tn < T0^+} implies that T0^+ ≥ 2n (the walk needs at least n steps to
reach n and at least n more to return). So,
P0[T0^+ ≥ 2n] ≥ P0[X1 = 1, Tn < T0^+] = (1/2) · P1[Tn < T0] = 1/(2n).
Since P0[T0^+ ≥ 2n− 1] ≥ P0[T0^+ ≥ 2n], we get that
E0[T0^+] = Σ_{m=0}^∞ P0[T0^+ > m] = Σ_{m=1}^∞ P0[T0^+ ≥ m]
= Σ_{n=1}^∞ (P0[T0^+ ≥ 2n− 1] + P0[T0^+ ≥ 2n]) ≥ Σ_{n=1}^∞ 1/n = ∞.
Example 8.11. Consider the martingale Xt^2 − t. Using the optional stopping theorem at time
T = T0^+ under P1, we would get
1 = E1[X0^2 − 0] = E1[X_T^2 − T ] = −E1[T ],
i.e. E1[T ] = −1. Similarly, E−1[T ] = −1. Since
E0[T0^+] = (1/2) · (E0[T0^+|X1 = 1] + E0[T0^+|X1 = −1]) = (1/2) · (E1[T0 + 1] + E−1[T0 + 1]),
we would get that E0[T0^+] = (1/2) · (0 + 0) = 0 < ∞!
Where did we go wrong?
We could not use the optional stopping theorem, because the martingale Xt^2 − t is not bounded!
⋄
Example 8.12. Actually, this last bit gives a third proof that E0[T0^+] = ∞. Suppose that
Ex[T0] < ∞ for some x ≠ 0. Since (Xt)t is a martingale with bounded differences, by the optional
stopping theorem x = Ex[XT0 ]. But XT0 = 0 a.s., a contradiction. So Ex[T0] = ∞ for all x ≠ 0.
Using the Markov property,
E0[T0^+] = (1/2) · (E1[T0 + 1] + E−1[T0 + 1]) = ∞.
⋄
Number of exercises in lecture: 2
Total number of exercises until here: 18
Random Walks
Ariel Yadin
Lecture 9: Reversible Chains
9.1. Time Reversal
Let (Xt)t be Markov-P . Then, conditioned on Xt, we have that X[0, t] and X[t,∞) are
independent. This suggests looking at the chain run backwards in time - since determining the
past given the future will only depend on the current state.
However, in accordance with the second law of thermodynamics (entropy always increases),
we know that nice enough chains converge to a stationary distribution, even if the chain is started
from a very ordered distribution - namely a δ-measure. This suggests that there is a specific
direction we are looking at, and that the chain is moving from order to disorder represented by
the stationary measure.
However, if we start the chain from the stationary distribution, perhaps we can view the chain
both forwards and backwards in time. This is the content of the following.
• Definition 9.1. Let P be an irreducible Markov chain with stationary distribution π. Define
P̂ (x, y) = π(y)P (y, x)/π(x). P̂ is called the time reversal of P .
The next theorem justifies the name time reversal.
••• Theorem 9.2. Let π be the stationary distribution for an irreducible Markov chain P .
Then, P̂ is an irreducible Markov chain, and π is a stationary distribution for P̂ .
Moreover: Let (Xt)t be Markov-(π, P ). Fix any T > 0 and define Yt = XT−t, t = 0, . . . , T .
Then, (Yt)_{t=0}^T is Markov-(π, P̂ ).
Proof. The fact that P̂ is a Markov chain follows from
Σ_y P̂ (x, y) = Σ_y π(y)P (y, x) · (1/π(x)) = π(x)/π(x) = 1.
Also,
(πP̂ )(x) = Σ_y π(y)P̂ (y, x) = Σ_y π(y) · (π(x)/π(y)) · P (x, y) = π(x) · Σ_y P (x, y) = π(x),
so π is stationary for P̂ .
Finally, note that π(x)P̂ (x, y) = π(y)P (y, x). So,
P[Y0 = x0, . . . , YT = xT ] = Pπ[X0 = xT , X1 = xT−1, . . . , XT = x0]
= π(xT )P (xT , xT−1) · · ·P (x1, x0) = P̂ (xT−1, xT ) · P̂ (xT−2, xT−1) · · · P̂ (x0, x1) · π(x0)
= π(x0) · P̂ (x0, x1) · P̂ (x1, x2) · · · P̂ (xT−1, xT ).
□
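The construction of P̂ is easy to check numerically. Below is a sketch on an invented 3-state chain that circulates 0 → 1 → 2 → 0 with probability 3/4 (doubly stochastic, so the uniform π is stationary, but the chain is not reversible): P̂ is again stochastic, π is stationary for it, and P̂ runs the cycle backwards.

```python
from fractions import Fraction

F = Fraction
# A circulating 3-state chain; each column also sums to 1, so the uniform
# distribution pi is stationary.
P = [[F(0), F(3, 4), F(1, 4)],
     [F(1, 4), F(0), F(3, 4)],
     [F(3, 4), F(1, 4), F(0)]]
pi = [F(1, 3)] * 3

# Time reversal as in Definition 9.1: Phat(x, y) = pi(y) P(y, x) / pi(x).
Phat = [[pi[y] * P[y][x] / pi[x] for y in range(3)] for x in range(3)]
```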
9.2. Reversible Chains
Recall the following definition:
• Definition 9.3. Let P be a Markov chain on S. A probability measure on S, π, is said to
satisfy the detailed balance equations if for all x, y ∈ S,
π(x)P (x, y) = π(y)P (y, x).
We also say that P and π are in detailed balance.
We also proved in the exercises that if P and π are in detailed balance, then π must be a
stationary distribution for P . (The opposite is not necessarily true, as is shown in the exercises.)
Immediately we see a connection between detailed balance and time reversals:
• Proposition 9.4. Let P be a Markov chain with stationary distribution π. The following are
equivalent:
• P and π are in detailed balance.
• P̂ = P .
• For any T > 0, (Xt)_{t=0}^T is Markov-(π, P ) if and only if (XT−t)_{t=0}^T is Markov-(π, P ).
[ The time reversal is the same as the forward-time chain. ]
Proof. We show that each bullet implies the one after it.
If P and π are in detailed balance, then for any states x, y,
P̂ (x, y) = π(y)P (y, x)/π(x) = π(x)P (x, y) · (1/π(x)) = P (x, y).
So P̂ = P .
If P̂ = P then for any T > 0, if (Xt)_{t=0}^T is Markov-(π, P ) then (XT−t)_{t=0}^T is Markov-(π, P̂ ).
Since P̂ = P we get that (XT−t)_{t=0}^T is Markov-(π, P ). Reversing the roles of Xt and XT−t we
get that for all T > 0, (Xt)_{t=0}^T is Markov-(π, P ) if and only if (XT−t)_{t=0}^T is Markov-(π, P ).
Now for the third implication, assume that for all T > 0, (Xt)_{t=0}^T is Markov-(π, P ) if and
only if (XT−t)_{t=0}^T is Markov-(π, P ). Take T = 1. Then (X0, X1) is Markov-(π, P ) if and only if
(X1, X0) is Markov-(π, P ). That is,
π(x)P (x, y) = Pπ[X0 = x,X1 = y] = Pπ[X1 = y,X0 = x] = π(y)P (y, x).
So P and π are in detailed balance. □
9.3. Reversible chains as weighted graphs
• Definition 9.5. Let G be a graph. A conductance on G is a function c : V (G)2 → [0,∞)
satisfying
• c(x, y) = c(y, x) for all x, y.
• c(x, y) > 0 if and only if x ∼ y.
The pair (G, c) is called a weighted graph, or sometimes a network or electric network.
Remark 9.6. Let (G, c) be a weighted graph, with C = Σ_{x,y} c(x, y) < ∞. Define cx = Σ_y c(x, y)
and P (x, y) = c(x, y)/cx. P is a stochastic matrix, and so defines a Markov chain. For π(x) = cx/C
we have that π is a distribution, and π(x)P (x, y) = c(x, y)/C = c(y, x)/C = π(y)P (y, x). Thus, P is
reversible.
We will refer to such a P as the random walk on G induced by c.
On the other hand, if P is a reversible Markov chain on S, we can define a weighted graph as
follows: Let V (G) = S and c(x, y) = π(x)P (x, y). Let x ∼ y if c(x, y) > 0. Note that
Σ_{x,y} c(x, y) = Σ_{x,y} π(x)P (x, y) = 1.
Also, we see that P is the random walk induced by (G, c).
X Connection to multiple edges and self-loops.
• Definition 9.7. If (G, c) is a weighted graph with Σ_{x,y} c(x, y) < ∞, then the Markov chain
P (x, y) = c(x, y) / Σ_z c(x, z)
is called the weighted random walk on G with weights c.
Example 9.8. Let (G, c) be the graph V (G) = {0, 1, 2}, with edges E(G) = {{0, 1} , {1, 2} , {0, 2}}
and c(0, 1) = 1, c(1, 2) = 2 and c(2, 0) = 3.
The weighted random walk is then
P =
[ 0    1/4  3/4 ]
[ 1/3  0    2/3 ]
[ 3/5  2/5  0   ].
The stationary measure is, of course, π(x) = Σ_y c(x, y) / Σ_{z,w} c(z, w), so π = [ 1/3 1/4 5/12 ] is the
stationary distribution.
We can compute that P̂ = P (which is expected since P is reversible). ⋄
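The numbers in this example are easy to recompute; a sketch (the dictionary encoding of the conductances is just one possible choice) that builds P and π from c and checks detailed balance:

```python
from fractions import Fraction

# Conductances of Example 9.8: c(0,1) = 1, c(1,2) = 2, c(2,0) = 3.
c = {(0, 1): 1, (1, 2): 2, (2, 0): 3}
c.update({(y, x): w for (x, y), w in c.items()})   # conductances are symmetric

cx = {x: sum(w for (a, _), w in c.items() if a == x) for x in range(3)}
C = sum(c.values())                                # sum over ordered pairs = 12

P = {(x, y): Fraction(w, cx[x]) for (x, y), w in c.items()}
pi = {x: Fraction(cx[x], C) for x in range(3)}
# pi(x) P(x,y) = c(x,y)/C = pi(y) P(y,x): detailed balance, so Phat = P.
```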
Example 9.9 (One dimensional Markov chains are almost reversible). Let P be a Markov
chain on Z such that P (x, y) > 0 if and only if |x− y| = 1. For x ∈ Z let px = P (x, x+ 1) (so
1− px = P (x, x− 1)).
Consider the following conductances on Z: Let c(0, 1) = 1. For x > 0 set
c(x, x+ 1) = Π_{y=1}^{x} py/(1− py).
Let c(0,−1) = (1− p0)/p0, and for x < 0 set
c(x, x− 1) = Π_{y=x}^{0} (1− py)/py.
Note that for any x ∈ Z we have that
c(x, x+ 1) = c(x− 1, x) · px/(1− px),
so
c(x, x+ 1) / (c(x, x+ 1) + c(x, x− 1)) = (px/(1− px)) / (px/(1− px) + 1) = px.
So P is the weighted random walk with weights given by c.
Moreover, note that
(c(x, x− 1) + c(x, x+ 1)) · P (x, x+ 1) = c(x, x− 1) · (px/(1− px)) = c(x, x+ 1)
and
(c(x+ 1, x) + c(x+ 1, x+ 2)) · P (x+ 1, x) = c(x, x+ 1) · (1/(1− p_{x+1})) · (1− p_{x+1}) = c(x, x+ 1).
So for m(x) = c(x, x− 1) + c(x, x+ 1) we have that m(x)P (x, y) = m(y)P (y, x) for all x, y. That
is, if m was a distribution, P would be reversible.
To normalize m to be a distribution we would need that
Σ_x m(x) = Σ_x (c(x, x− 1) + c(x, x+ 1)) = 2 Σ_x c(x, x+ 1) < ∞.
For example, if px = 1/3 for x > 0, px = 2/3 for x < 0, and p0 = 1/2, we would have that
c(x, x+ 1) = 2^{−x} for x ≥ 0 and c(x, x− 1) = 2^{x} for x ≤ 0. Thus
Σ_x m(x) = 2 Σ_x c(x, x+ 1) = 2 · (Σ_{x=0}^∞ 2^{−x} + Σ_{x<0} 2^{x+1}) = 2 · (2 + 2) = 8 < ∞.
So π(x) = (c(x, x− 1) + c(x, x+ 1))/8 is a stationary distribution.
In general, we see that a drift towards 0 would give a reversible chain. ⋄
Number of exercises in lecture: 0
Total number of exercises until here: 18
Random Walks
Ariel Yadin
Lecture 10: Discrete Analysis
10.1. Laplacian
In order to study electric networks and conductances, we will first introduce the concept of
harmonic functions.
Let G = (V (G), c) be a network; recall that by this we mean: c : V (G)× V (G)→ [0,∞) with
c(x, y) = c(y, x) for all x, y ∈ G and cx := Σ_y c(x, y) < ∞ for all x. We denote by E(G) the set
of oriented edges of G; that is,
E(G) = {(x, y) : c(x, y) > 0} .
(We write x ∼ y when c(x, y) > 0.) For e ∈ E(G) we write e = (e+, e−). c is known as the
conductance of the network.
Let C0(V ) = {f : V (G)→ R} and C0(E) = {f : E(G)→ R} be the sets of all functions of
vertices and (oriented) edges of G respectively.
We can define an operator ∇ : C0(V )→ C0(E) by: for any edge x ∼ y,
(∇f)(x, y) = c(x, y)(f(x)− f(y)).
We can also define an operator div : C0(E)→ C0(V ) by
(divF )(x) = Σ_{y∼x} (1/cx) · (F (x, y)− F (y, x)).
We can consider the spaces C0(V ), C0(E) with the inner products
〈f, f ′〉 = Σ_x cx f(x)f ′(x) and 〈F, F ′〉 = Σ_e (1/c(e)) · F (e)F ′(e).
Consider the subspaces L2(V ) = {f ∈ C0(V ) : 〈f, f〉 < ∞} and L2(E) = {F ∈ C0(E) : 〈F, F 〉 < ∞}.
The operator ∇ is a linear operator from L2(V ) to L2(E). Also div : L2(E)→ L2(V ) is a linear
operator, and
〈∇f, F 〉 = Σ_{(x,y)} (f(x)− f(y))F (x, y) = Σ_{x∼y} f(x)(F (x, y)− F (y, x))
= Σ_x cx f(x) Σ_{y∼x} (1/cx)(F (x, y)− F (y, x)) = 〈f, divF 〉.
So ∇∗ = div and div∗ = ∇: the operators are dual to each other.
Recall that the weighted random walk on the network G is just the Markov process with
transition matrix given by P (x, y) = c(x, y)/cx.
Define the operator ∆ : C0(V )→ C0(V ) by ∆ = (1/2) div∇. That is,
∆f(x) = (1/2) div∇f(x) = (1/2) Σ_{y∼x} (1/cx)(∇f(x, y)−∇f(y, x)) = Σ_y P (x, y)(f(x)− f(y)).
Exercise 10.1. Show that (in matrix form) ∆ = I − P where I is the identity
operator.
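The identity ∆ = I − P of the exercise can be sanity-checked numerically; below is a sketch on the triangle network of Example 9.8 (the encoding and the test function f are mine), implementing ∇, div and ∆ = (1/2) div∇ literally and comparing with f − Pf.

```python
from fractions import Fraction

# The triangle network again: c(0,1) = 1, c(1,2) = 2, c(2,0) = 3.
c = {(0, 1): Fraction(1), (1, 2): Fraction(2), (2, 0): Fraction(3)}
c.update({(y, x): w for (x, y), w in c.items()})
V = range(3)
cx = {x: sum(c[x, y] for y in V if (x, y) in c) for x in V}

def grad(f):
    # (grad f)(x, y) = c(x, y)(f(x) - f(y))
    return {(x, y): c[x, y] * (f[x] - f[y]) for (x, y) in c}

def div(F):
    # (div F)(x) = sum_{y ~ x} (1/c_x)(F(x, y) - F(y, x))
    return {x: sum((F[x, y] - F[y, x]) / cx[x] for y in V if (x, y) in c) for x in V}

def laplacian(f):
    # Delta = (1/2) div grad
    return {x: d / 2 for x, d in div(grad(f)).items()}

f = {0: Fraction(5), 1: Fraction(-1), 2: Fraction(2)}
Pf = {x: sum(c[x, y] / cx[x] * f[y] for y in V if (x, y) in c) for x in V}
# laplacian(f)[x] == f[x] - Pf[x] for every x, i.e. Delta = I - P.
```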
10.2. Harmonic functions
• Definition 10.1. A function f : V (G) → R is called harmonic at x if ∆f(x) = 0. f is
said to be harmonic on A if for all x ∈ A, f is harmonic at x. f is said to be harmonic, if f is
harmonic at all x.
Harmonic functions and martingales are intimately related.
• Proposition 10.2. Let G = (V (G), c) be a network. Let f : G→ R be a function. Let S ⊂ G
and let T = T_{S^c} be the first exit time of S, for (Xt)t the weighted random walk on G.
Then, f is harmonic in S if and only if the sequence (Mt = f(Xt∧T ))t is a martingale under
Px for all x.
Proof. First assume that f is harmonic in S. Note that if x ∉ S then Xt∧T = X0 = x a.s. under
Px. So as a constant sequence, Mt = f(x) is a martingale. So we only need to deal with x ∈ S.
The main observation here is that the Markov property is just the fact that
Ex[f(Xt+1)|Ft] = Σ_y P (Xt, y)f(y) = (Pf)(Xt).
For any t, since 1_{T≥t+1} = 1_{T>t} ∈ Ft, and f(XT )1_{T≤t} ∈ Ft,
Ex[Mt+1|Ft] = Ex[f(Xt+1)|Ft] · 1_{T>t} + f(XT )1_{T≤t} = (Pf)(Xt)1_{T>t} + f(XT )1_{T≤t}.
If f is harmonic at x, then Pf(x) = f(x). Thus, since on the event {T > t}, f is harmonic at Xt,
we get that (Pf)(Xt)1_{T>t} = f(Xt)1_{T>t}. In conclusion,
Ex[Mt+1|Ft] = (Pf)(Xt)1_{T>t} + f(XT )1_{T≤t} = f(Xt)1_{T>t} + f(XT )1_{T≤t} = f(Xt∧T ) = Mt.
So Mt is a martingale.
For the other direction, assume that (Mt)t is a martingale. Then, for any x ∈ S,
f(x) = M0 = Ex[M1] = Ex[f(X1)] = (Pf)(x),
where we have used that under Px, T ≥ 1 a.s. So we have that for any x ∈ S, ∆f(x) =
(I − P )f(x) = 0. So f is harmonic in S. □
Harmonic functions exhibit properties analogous to those in the continuous case.
• Proposition 10.3 (Solution to Dirichlet Problem). Let G = (V (G), c) be a network. Let
B ⊂ G (we think of B as the boundary). Let
D = {x ∈ G : Px[TB <∞] = 1} .
(So B ⊂ D.) Let u : B → R be some bounded function (boundary values).
Then, there exists a unique function f : D → R that is bounded, harmonic in D \B and
admits f(b) = u(b) for all b ∈ B.
Proof. Define f(x) = Ex[u(XTB )]. This is well defined, since under Px, TB < ∞ a.s. and since
u is bounded.
It is immediate to check that for any b ∈ B, f(b) = u(b). Also, for x ∈ D \ B, since TB ≥ 1
Px-a.s., by the Markov property,
f(x) = Ex[u(XTB )] = Σ_y P (x, y)Ey[u(XTB )] = Pf(x).
So f is harmonic at x.
For uniqueness, assume that g : D → R is bounded, harmonic in D \ B, and g(b) = u(b) for
all b ∈ B. We want to show that
for all x ∈ D, g(x) = Ex[u(XTB )].(10.1)
g is bounded, so (g(XTB∧t))t is a bounded martingale, so (10.1) holds by the optional stopping
theorem, because TB <∞ Px-a.s. for all x ∈ D. □
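On a finite path the harmonic extension f(x) = Ex[u(XTB )] can be computed by simply iterating the averaging property at the interior vertices (Jacobi iteration); a sketch, with the path, boundary values and iteration count chosen for illustration:

```python
# Dirichlet problem for the simple random walk on the path {0, ..., n}:
# boundary B = {0, n}, boundary values u(0) = 0, u(n) = 1.
# Repeatedly replacing f(x) by the average of its neighbours converges to
# the unique bounded harmonic extension, which here is
# f(x) = E_x[u(X_{T_B})] = P_x[T_n < T_0] = x/n.
n = 8
f = [0.0] * (n + 1)
f[n] = 1.0
for _ in range(5000):
    f = [f[0]] + [(f[x - 1] + f[x + 1]) / 2 for x in range(1, n)] + [f[n]]
# f[x] is now (numerically) x/n.
```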
If we remove the condition that TB < ∞ then we can only guarantee existence, but not
uniqueness, of the solution to the Dirichlet problem.
• Proposition 10.4. Let G = (V (G), c) be a network. Let B ⊂ G and let u : B → R be some
function.
Then, there exists a function f : G → R that is harmonic in G \ B and admits f(b) = u(b)
for all b ∈ B.
Proof. We define
f(x) = Ex[u(XTB ) 1_{TB<∞}].
Obviously, f(b) = u(b) for all b ∈ B. Also, for x 6∈ B, since TB ≥ 1 Px-a.s. we have that f is
harmonic at x by the Markov property. □
X Comparison to Poisson formula?
The maximum principle for harmonic functions in Rd states that if a non-constant function is har-
monic in a connected open subset of Rd then it will have all its maximal values on the boundary.
• Proposition 10.5 (Maximum Principle). Let G = (V (G), c) be a network. Let B ⊂ G and
D = {x ∈ G : Px[TB <∞] = 1}. Let f : D → R be a bounded function, harmonic in D \B.
Then,
sup_{x∈D} f(x) = sup_{x∈B} f(x) and inf_{x∈D} f(x) = inf_{x∈B} f(x).
That is, the supremum and infimum are attained on the boundary.
Moreover, if D \B is connected, and f is not constant, any x such that f(x) attains the
supremum or infimum must admit x ∈ B.
Proof. For any x ∈ D we know that
f(x) = Ex[f(XTB )] ≤ sup_{b∈B} f(b),
because XTB ∈ B a.s.
Now, assume that f(x) ≥ sup_{y∈D} f(y) for some x ∈ D \B. Let z ∈ D. Since D \B is
connected, there exists a path from x to z that does not intersect B. Thus, there exists t > 0
such that Px[TB ≥ t,Xt = z] > 0. Since f is harmonic in D \B, we get that (f(XTB∧s))s is a
martingale. Thus, stopping at time s = t,
(f(x)− f(z)) · Px[TB ≥ t,Xt = z] = Ex[(f(x)− f(Xt∧TB )) · 1_{TB≥t, Xt=z}] ≤ Ex[f(x)− f(Xt∧TB )] = 0.
So f(z) ≤ f(x) ≤ f(z) for any z ∈ D, and f is constant.
This completes the proof for the supremum. For the infimum, consider the function g = −f .
So g is bounded, harmonic in D \B. Since sup_{x∈S} g(x) = − inf_{x∈S} f(x) for any set S, we can
apply the proposition to g to get the assertions for the infimum. □
Example 10.6. Consider the following network: V (G) = Z and c(x, x+ 1) = (p/(1− p))^x. Suppose
that p > 1/2 (if p = 1/2 this is just the simple random walk on Z, and if p < 1/2 then we can
exchange x 7→ −x to get the same thing).
The weighted random walk here is given by
P (x, x+ 1) = c(x, x+ 1) / (c(x, x+ 1) + c(x− 1, x)) = p and P (x, x− 1) = 1− p.
First let’s prove that the weighted random walk here is transient. For example, recall that it
suffices to show that
Σ_{t=0}^∞ P0[Xt = 0] < ∞.
Well, since at each step the walk moves right with probability p and left with probability 1− p
independently, we can model this walk by
Xt = Σ_{k=1}^t ξk,
where (ξk)k are independent and all have distribution P[ξk = 1] = p = 1− P[ξk = −1].
The usual trick here is to note that (ξk + 1)/2 ∼ Ber(p), so
P0[X2t = 0] = P[Bin(2t, p) = t] = (2t choose t) · p^t (1− p)^t.
(This is symmetric in p, as expected.) Of course P0[X2t+1 = 0] = 0 because of parity issues.
Now, since (2t choose t) is the number of size-t subsets out of 2t elements, it is at most the total
number of subsets, which is 2^{2t}. Since for p ≠ 1/2 we have 4p(1− p) < 1, we get that
Σ_{t=0}^∞ P0[Xt = 0] ≤ Σ_{t=0}^∞ (4p(1− p))^t = 1/(1− 4p(1− p)) < ∞.
This is one proof that for p ≠ 1/2 the weighted walk is transient.
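This estimate is easy to check numerically; a minimal sketch (the choice p = 2/3 and the truncation at t = 200 are mine):

```python
from math import comb

p = 2 / 3                 # any p != 1/2 gives a transient walk
# P_0[X_{2t} = 0] = (2t choose t) p^t (1 - p)^t; odd times contribute 0.
ret = [comb(2 * t, t) * (p * (1 - p)) ** t for t in range(200)]

total = sum(ret)                      # expected number of visits to 0
bound = 1 / (1 - 4 * p * (1 - p))     # geometric bound from (2t choose t) <= 4^t
```

For p = 2/3 the bound is 9, while the series itself converges to 1/√(1 − 4p(1 − p)) = 1/|2p − 1| = 3, by the generating function Σ_t (2t choose t) x^t = 1/√(1 − 4x); either way it is finite, so the walk is transient.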
Now, let us consider B = {0} and boundary values u(0) = 1. What is a bounded harmonic
function f : G→ R such that f is harmonic in G \B? Well, we can take f ≡ 1, which is one
option. Another option is to take f(x) = Px[T0 < ∞]. But since the walk is transient, we know
that f ≢ 1!
Since Px[T0 < ∞] = Ex[u(0)1_{T0<∞}] we see that this is the second solution from above
(Proposition 10.4). However, the uniqueness in Proposition 10.3 is only for functions defined on
{x : Px[T0 < ∞] = 1}, so a-priori there is freedom to choose more than one option for those x’s
such that Px[T0 < ∞] < 1. ⋄
X add discussion on finite networks?
10.3. Green Function
Let G = (V (G), c) be a network. Let u : G→ R be a function. Suppose we want to solve the
equation ∆f = u.
If we had a function g : G×G→ R that satisfied
∆g(·, x) = 1_{x=·}
for every x, we could write
f(y) = Σ_x g(y, x)u(x).
Then,
∆f(z) = Σ_x u(x)1_{x=z} = u(z),
which is a solution. So finding the solution to ∆g = 1_{x=·} is the basic step.
It turns out that such a g exists, and g is called the Green Function. It is the counterpart of
the classical Green Function.
• Proposition 10.7. Let G = (V (G), c) be a network. Let Z ⊂ G be a set (possibly empty).
Define
gZ(x, y) = Ex[ Σ_{k=0}^{TZ−1} 1_{Xk=y} ].
Assume that at least one of the following conditions holds:
• The weighted random walk on G is transient.
• Z ≠ ∅.
Then,
∆gZ(·, x) = 1_{x=·}
for all x ∉ Z. Moreover, for all x, y,
cx gZ(x, y) = cy gZ(y, x).
Proof. The conditions of the proposition are there to ensure that
gZ(x, y) = Σ_{k=0}^∞ Px[Xk = y, TZ > k] < ∞.
First, the Markov property gives that, for a fixed y, using h(x) = gZ(x, y): for x ∉ Z,
h(x) = 1_{x=y} + Σ_{k=1}^∞ Px[Xk = y, TZ > k] = 1_{x=y} + Σ_{k=1}^∞ Σ_w P (x,w) Pw[Xk−1 = y, TZ > k − 1]
= 1_{x=y} + Σ_w P (x,w)h(w),
so ∆h(x) = 1_{x=y}.
The symmetry of gZ is shown as follows: By the definition of the weighted random walk, we
have that cx P (x, y) = cy P (y, x) = c(x, y) for all x ∼ y. Thus, for any path (x0, . . . , xn) in G,
c_{x0} P_{x0}[X0 = x0, . . . , Xn = xn] = c_{xn} P_{xn}[X0 = xn, . . . , Xn = x0].
Thus, for any x, y,
cx Px[Xk = y, TZ > k] = Σ_{γ:x→y, |γ|=k, γ∩Z=∅} cx Px[X[0, k] = γ]
= Σ_{γ:x→y, |γ|=k, γ∩Z=∅} cy Py[X[0, k] = (γk, γk−1, . . . , γ0)]
= cy Py[Xk = x, TZ > k].
Summing over k completes the proof. □
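For the simple random walk on {0, . . . , n} with Z = {0, n}, gZ can be computed by summing powers of the transition matrix restricted to the interior (a float-based sketch; the encoding and the iteration count are mine). Since all interior vertices have equal cx here, the symmetry statement reduces to gZ(x, y) = gZ(y, x).

```python
# Green function g_Z(x, y) = expected visits to y before hitting Z = {0, n},
# for simple random walk on {0, ..., n}. Numerically, g = sum_{k>=0} Q^k,
# where Q is the walk restricted to the interior states 1, ..., n-1.
n = 6
m = n - 1
Q = [[0.5 if abs(i - j) == 1 else 0.0 for j in range(m)] for i in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)] for i in range(m)]

g = [[float(i == j) for j in range(m)] for i in range(m)]   # the k = 0 term
power = [row[:] for row in g]
for _ in range(500):                   # the tail decays geometrically
    power = matmul(Q, power)
    g = [[g[i][j] + power[i][j] for j in range(m)] for i in range(m)]
```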
Number of exercises in lecture: 1
Total number of exercises until here: 19
Random Walks
Ariel Yadin
Lecture 11: Networks
11.1. Some discrete analysis
Let G = (V, c) be a network.
Recall that for x ∼ y,
(∇f)(x, y) = c(x, y)(f(x)− f(y)).
Also,
(divF )(x) = Σ_{y∼x} (1/cx) · (F (x, y)− F (y, x)).
We have the duality formula
〈∇f, F 〉 = Σ_{(x,y)} (f(x)− f(y))F (x, y) = Σ_{x∼y} f(x)(F (x, y)− F (y, x))
= Σ_x cx f(x) Σ_{y∼x} (1/cx)(F (x, y)− F (y, x)) = 〈f, divF 〉,
where 〈f, f ′〉 = Σ_x cx f(x)f ′(x) and 〈F, F ′〉 = Σ_e (1/c(e)) · F (e)F ′(e). Also, ∆ = I − P = (1/2) div∇.
We want to think of ∇ as differentiation. So the opposite operation should be some kind of
integral.
Let γ : x→ y be a path in G. For a function F ∈ C0(E) on the oriented edges of G, define
∮_γ F = Σ_{j=0}^{|γ|−1} F (γj , γj+1) · (1/c(γj , γj+1)).
For a path γ define its reversal by γ̄ = (γ_{|γ|}, γ_{|γ|−1}, . . . , γ0). Also, define F̄ ∈ C0(E) by
F̄ (x, y) = F (y, x).
We make a few observations:
• Proposition 11.1. Let F ∈ C0(E).
• ∮_{γ̄} F̄ = ∮_γ F . Thus, if F is anti-symmetric, i.e. F (x, y) = −F (y, x) for all x ∼ y, then
for any path, ∮_{γ̄} F = −∮_γ F .
• If F = ∇f for some f ∈ C0(V ), then for any path γ : x→ y we have that ∮_γ F =
f(x)− f(y).
• If ∇f = ∇g then there exists a constant η such that f = g + η.
Proof. The first bullet is immediate, just reversing the order of the edges in F .
For the second bullet, expanding the sum, we find that for γ : x→ y,
∮_γ F = Σ_{j=0}^{|γ|−1} (f(γj)− f(γj+1)) = f(x)− f(y).
For the third bullet, note that for any γ : x→ y we have that
f(x)− f(y) = ∮_γ ∇f = ∮_γ ∇g = g(x)− g(y).
So f(x)− g(x) = f(y)− g(y) for all x, y, and the difference f − g is constant. □
• Definition 11.2. A function F ∈ C0(E) is said to respect Kirchhoff’s cycle law if for any
cycle γ : x→ x, ∮_γ F = 0.
Gustav Kirchhoff
(1824-1887)
Any gradient respects Kirchhoff’s cycle law, as shown above. But the converse also holds:
• Proposition 11.3. F ∈ C0(E) respects Kirchhoff’s cycle law if and only if there exists
f ∈ C0(V ) such that F = ∇f .
In other words, if F respects Kirchhoff’s cycle law, then we can define∫F := f for any f
such that ∇f = F , and then all representations of∫F differ by some constant.
Proof. We only need to prove the “only if” direction.
Assume that F respects Kirchhoff’s cycle law. First, note that F must be anti-symmetric.
Indeed, for x ∼ y, the path (x, y, x) is a cycle, and
F (x, y) + F (y, x) = c(x, y) · ∮_{(x,y,x)} F = 0.
Now, fix x, y ∈ G and let γ : x→ y and β : x→ y. Then, the path α = γβ̄ = (γ0, . . . , γ_{|γ|}, β_{|β|−1}, . . . , β0)
is a cycle α : x→ x. So
∮_γ F − ∮_β F = ∮_γ F + ∮_{β̄} F = ∮_α F = 0.
So ∮_γ F does not depend on the choice of γ : x→ y, but only on the endpoints x and y.
Fix some a ∈ G and for any x ∈ G define f(x) = ∮_γ F for some γ : x→ a, with the convention
that f(a) = 0. It is clear that for any x ∼ y,
F (x, y) · (1/c(x, y)) = ∮_{(x,y)} F = f(x)− f(y).
So F = ∇f . □
11.2. Electrical Networks
Let G = (V, c) be a network. For each edge x ∼ y, define the resistance of the edge to be
r(x, y) = 1/c(x, y). Let A,Z ⊂ G be two disjoint subsets.
If we were physicists, we could enforce voltage 1 on A, voltage 0 on Z, and look at the voltage
and current flowing through the graph G, where each edge is an r(x, y)-Ohm resistor. According
to Ohm’s law, the current equals the potential difference divided by the resistance: I = ∇V/R.
Kirchhoff would reformulate this, telling us that the total current out of each node should be 0,
except for those nodes in A ∪ Z.
Let us turn this into a mathematical definition. The physics will only serve as intuition (albeit
usually good intuition).
• Definition 11.4. Let G = (V, c) be a network. Let A,Z be disjoint subsets of G.
A voltage imposed on A and Z is a function v : G→ R that is harmonic in G \ (A ∪ Z).
A unit voltage is a voltage v with v(a) = 1 for all a ∈ A and v(z) = 0 for all z ∈ Z.
Given a voltage v, the current induced by v is defined I(x, y) = ∇v(x, y) = c(x, y)(v(x)−v(y))
for all oriented edges x ∼ y.
X Note that this has the form I(x, y) = (v(x)− v(y))/r(x, y), which is the form of Ohm’s law.
Georg Ohm (1789-1854)
• Definition 11.5. Let G = (V, c) be a network, and let A,Z be disjoint subsets of G. A flow
from A to Z is a function F on oriented edges of G satisfying:
• F is anti-symmetric: For every edge x ∼ y, F (x, y) = −F (y, x).
• F is divergence free: For every x ∈ G \ (A∪Z), divF (x) = Σ_{y∼x} (1/cx)(F (x, y)−F (y, x)) = 0.
(A function being divergence free is sometimes said to respect Kirchhoff’s node law.)
X For simplicity, we will sometimes extend a flow F to all pairs (x, y) by defining F (x, y) = 0
for x 6∼ y.
Example 11.6. If v is a voltage, then the current induced by v is a flow; indeed,
I(x, y) = c(x, y)(v(x)− v(y)) = −c(y, x)(v(y)− v(x)) = −I(y, x),
and for x ∉ A ∪ Z,
divI(x) = Σ_{y∼x} (1/cx)(I(x, y)− I(y, x)) = 2 Σ_{y∼x} (c(x, y)/cx)(v(x)− v(y)) = 2∆v(x) = 0.
This fact is Kirchhoff’s node law. ⋄
Example 11.7. If v is a voltage, and I is the current induced by v, then we have Kirchhoff’s
cycle law: for any cycle γ : x→ x, γ = (x = γ0, γ1, . . . , γn = x),
Σ_{j=0}^{n−1} I(γj , γj+1) r(γj , γj+1) = Σ_{j=0}^{n−1} (v(γj)− v(γj+1)) = v(x)− v(x) = 0.
This of course is due to the fact that any derivative ∇v respects Kirchhoff’s cycle law. ⋄
Exercise 11.1. Let G = (V, c) be a finite network and A,Z disjoint subsets of
G. Let I be a flow from A to Z that satisfies Kirchhoff’s cycle law: for any cycle γ : x→ x,
γ = (x = γ0, γ1, . . . , γn = x),
Σ_{j=0}^{n−1} I(γj , γj+1) r(γj , γj+1) = 0.
Show that there exists a voltage v such that I is induced by v. Moreover, if u, v are two such
voltages, then v − u = η for some constant η.
11.3. Probability and Electric Networks
Since voltages are harmonic functions, it is not surprising that there is a connection between
probability and electric networks. Let us elaborate on this.
• Definition 11.8. Let G = (V, c) be a network. Let a ∈ G and Z ⊂ G. Let v be a voltage
such that v(z) = 0 for all z ∈ Z and v(a) = 1. Define the effective conductance from a to Z
by
Ceff(a, Z) := Σ_x I(a, x) = (ca/2) · divI(a) = Σ_x c(a, x)(v(a)− v(x)) = ca∆v(a),
where I is the current induced by v.
The effective resistance is defined as the reciprocal of the effective conductance:
Reff(a, Z) := (Ceff(a, Z))^{−1}.
• Proposition 11.9. Let G = (V, c) be a network. Let {a}, Z be disjoint subsets. Let v be a
voltage such that v(z) = 0 for all z ∈ Z, and v(a) ≠ 0 arbitrary. Let I be the current induced by
v. Then,
• Ceff(a, Z) = Σ_x I(a, x)/v(a) = ca∆v(a)/v(a).
• If the component of a in G \ Z is finite, then Ceff(a, Z) = ca Pa[TZ < Ta^+]. Specifically,
in this case Ceff(a, Z) does not depend on the choice of the voltage.
Proof. The first bullet follows from the fact that u = v/v(a) is a voltage with u(z) = 0 for all
z ∈ Z and u(a) = 1, and v(a)∆u = ∆v.
For the second bullet, let D be the component of a in G \ Z. We have two functions on D,
harmonic off {a} ∪ Z: u = v/v(a) and x 7→ Px[Ta < TZ ], which are 0 on Z and 1 at a. Thus,
these functions are equal, because D is finite. Now,
ca Pa[TZ < Ta^+] = ca Σ_x P (a, x)(1− u(x)) = (1/v(a)) Σ_x c(a, x)(v(a)− v(x))
= ca∆v(a)/v(a) = Ceff(a, Z).
□
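Proposition 11.9 can be illustrated on the simplest network, a path of unit conductances (the example is mine): the effective resistance of n unit resistors in series is n, matching the gambler's-ruin probability.

```python
from fractions import Fraction

# Path network 0 - 1 - ... - n with unit conductances; a = 0, Z = {n}.
# The voltage with v(0) = 1, v(n) = 0 must be harmonic at 1, ..., n-1,
# hence linear: v(x) = 1 - x/n.
n = 5
v = [1 - Fraction(x, n) for x in range(n + 1)]

# C_eff(a, Z) = sum_x c(a, x)(v(a) - v(x)); vertex 0 has the single
# neighbour 1, with c(0, 1) = 1:
Ceff = v[0] - v[1]
Reff = 1 / Ceff          # n unit resistors in series

# Probabilistic side: c_0 = 1 and P_0[T_Z < T_0^+] = P_1[T_n < T_0] = 1/n
# (gambler's ruin), matching Ceff.
```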
11.4. Resistance to Infinity
Example 11.10. Let G = (V, c) be an infinite network, and let a ∈ G. Let (Gn)n be an
increasing sequence of finite connected subgraphs of G, that contain a, such that G = ∪n Gn
(in this case we say that (Gn)n exhausts G).
For every n, let Zn = G \Gn. Note that the connected component of a in G \ Zn is Gn, which
is finite. Thus, we can consider the effective conductance from a to Zn, Ceff(a, Zn). This is a
sequence of numbers, which converges to a limit; indeed, if Ta^+ < ∞, since X[0, Ta^+] is a finite
path, there exists n0 such that for all n > n0, X[0, Ta^+] ⊂ Gn. The events {TZn < Ta^+} form a
decreasing sequence, so
lim_{n→∞} Ceff(a, Zn) = ca lim_{n→∞} Pa[TZn < Ta^+] = ca Pa[Ta^+ = ∞].
Thus, we see that lim_{n→∞} Ceff(a, Zn) does not depend on the choice of the exhausting subgraphs
(Gn)n, and
(G, c) is recurrent ⇐⇒ lim_{n→∞} Ceff(a, Zn) = 0 ⇐⇒ lim_{n→∞} Reff(a, Zn) = ∞.
⋄
In light of the above:
• Definition 11.11. Let G = (V, c) be an infinite network, and let a ∈ G. Let (Gn)n be an
increasing sequence of finite connected subgraphs of G that contain a, such that G = ⋃_n Gn.
Let Zn = G \ Gn.

Define the conductance from a to infinity and the resistance from a to infinity by

Ceff(a, ∞) = lim_{n→∞} Ceff(a, Zn) and Reff(a, ∞) = Ceff(a, ∞)^{−1}.

Thus, the theorem is:

••• Theorem 11.12. The weighted random walk on a network G is recurrent if and only if the
resistance from some vertex a to infinity is infinite.
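To illustrate Theorem 11.12, one can compute Reff(a, Zn) along an exhaustion explicitly for G = Z. The sketch below is ours (not from the notes; assumes NumPy): it obtains Pa[TZ < T+_a] from the standard gambler's-ruin linear system and applies Proposition 11.9; since Reff(0, Zn) = n/2 → ∞, the walk on Z is recurrent.

```python
import numpy as np

def reff_z(n):
    """R_eff(0, {-n, n}) in Z with unit conductances, via
    R_eff = (c_0 * P_0[T_Z < T_0^+])^{-1} (Proposition 11.9); here c_0 = 2."""
    # h(k) = P_k[hit n before 0] solves the gambler's-ruin linear system
    # h(k) = (h(k-1) + h(k+1)) / 2, with h(0) = 0, h(n) = 1.
    A = np.zeros((n - 1, n - 1))
    b = np.zeros(n - 1)
    for i in range(n - 1):
        A[i, i] = 1.0
        if i > 0:
            A[i, i - 1] = -0.5
        if i < n - 2:
            A[i, i + 1] = -0.5
        else:
            b[i] = 0.5              # the neighbor at level n has h = 1
    h1 = np.linalg.solve(A, b)[0]   # h(1) = 1/n
    escape = h1                     # P_0[T_{-n,n} < T_0^+]: after the first step,
                                    # the walk must reach the far side before 0
    return 1.0 / (2.0 * escape)

resistances = [reff_z(n) for n in (2, 4, 8, 16)]   # n/2: 1, 2, 4, 8 -> diverges
```

The sequence grows linearly, so Reff(0, ∞) = ∞, exactly as the theorem requires for a recurrent network.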
Number of exercises in lecture: 1
Total number of exercises until here: 20
Lecture 12: Network Reduction
12.1. Network Reduction
Recall that

Ceff(a, ∞) = ca Pa[T+_a = ∞].

So the effective resistance or conductance to infinity will not help us decide whether (G, c) is
recurrent unless we have a way of simplifying the sequence of finite networks Gn.

We will now describe a few operations that help us reduce networks to simpler ones without
changing the effective conductance between a and Z. This will also give us the ability to
compute probabilities on some networks.

When we wish to differentiate between effective conductances (or resistances) in two networks,
we will write Ceff(a, Z;G) and Ceff(a, Z;G′).
12.1.1. Parallel Law.
Exercise 12.1. Suppose (G, c) is a network with multiple edges. Let (G′, c′) be the
network without multiple edges where the weight c′(x, y) is the sum of all weights between x
and y in (G, c). That is,

c′(x, y) = ∑_{e∈E(G), e+=x, e−=y} c(e).

Then, (G′, c′) is a network without multiple edges, and the weighted random walk on (G′, c′)
has the same distribution as the weighted random walk on (G, c).

Specifically, for all a, Z the effective conductance between a and Z does not change.

Solution. This is just the fact that the transition probabilities for (G, c) and (G′, c′) coincide:
since cx = c′x,

cx P(x, y) = ∑_{e : e+=x, e−=y} c(e) = c′(x, y) = c′x P′(x, y). □
[Figure 3. Parallel Law: two parallel edges between x and y with conductances c1 and c2 are replaced by a single edge with conductance c1 + c2.]
12.1.2. Series Law.
• Proposition 12.1 (Series Law). Let (G, c) be a network. Suppose there exists w that has
exactly two adjacent vertices u1, u2.

Let (G′, c′) be the network given by V (G′) = V (G) \ {w}, and

c′(x, y) = c(x, y) if {x, y} ≠ {u1, u2}, and c′(u1, u2) = 1/(r(u1, w) + r(u2, w)) + c(u1, u2).

That is, we remove the edges u1 ∼ w and u2 ∼ w and add weight 1/(c(u1, w)^{−1} + c(u2, w)^{−1}) to
the edge u1 ∼ u2 (which may have originally had weight 0).

Then, for any a, Z such that w ∉ {a} ∪ Z, and such that the component of a in G \ Z is finite,
we have that Ceff(a, Z;G) = Ceff(a, Z;G′).

Proof. Let (G′, c′) be a network identical to (G, c) except that c′(u1, w) = c′(u2, w) = 0 and
c′(u1, u2) = c(u1, u2) + C. We want to calculate C so that any function that is harmonic at u1, w
on G will be harmonic at u1 on G′ as well.

Let f : G → R be harmonic at u1, w on G. If f(u1) = f(w), then harmonicity at w, together
with the fact that w is adjacent only to u1, u2, gives that f(u1) = f(w) = f(u2). So the weights
of the edges between u1, u2, w do not affect the harmonicity of the function, and can be changed.
Hence, we may assume that f(u1) ≠ f(w). Let h = (f − f(w)) / (f(u1) − f(w)). So h is harmonic
at u1, w and h(w) = 0 and h(u1) = 1. Harmonicity at u1 gives that

∑_{y≠w} c(u1, y)(h(u1) − h(y)) = −c(u1, w)(h(u1) − h(w)) = −c(u1, w).

Harmonicity at w gives

c(u1, w) + c(u2, w)h(u2) = 0.

Thus, in order for h to be harmonic at u1 on G′, we require that

0 = ∑_{y≠w} c(u1, y)(h(u1) − h(y)) + C(h(u1) − h(u2)) = −c(u1, w) + C · (1 + c(u1, w)/c(u2, w)).

This leads to

C = c(u1, w) · c(u2, w) / (c(u1, w) + c(u2, w)) = 1 / (r(u1, w) + r(u2, w)).

Thus, we have shown that choosing the weight 1/(r(u1, w) + r(u2, w)) as above, if f is harmonic
at u1, w on G, then f is also harmonic at u1 on G′. Taking u2 to play the role of u1, the same
holds if f is harmonic at u2 and w on G.

Let a, Z be as in the proposition. Let D be the component of a in G \ Z. Let v be a unit
voltage imposed on a and Z in D. Since we chose the weight on u1 ∼ u2 in G′ correctly, we get
that v is also a unit voltage imposed on a and Z in G′.

Because Ceff(a, Z;G) = ∑_y ∇v(a, y), and similarly in G′, and since G \ Z and G′ \ Z only
differ at edges adjacent to u1, u2 and w, we have that Ceff(a, Z;G) − Ceff(a, Z;G′) = 0 for all
a ∉ {u1, u2}. Now, if a = u1 then we have, by harmonicity of v at w,

(c(u1, w) + c(u2, w))v(w) = c(u1, w)v(a) + c(u2, w)v(u2).

Since the only difference is on edges adjacent to u1, u2 and w,

Ceff(a, Z;G) − Ceff(a, Z;G′) = c(a, w)(v(a) − v(w)) − (1/(r(u1, w) + r(u2, w))) · (v(a) − v(u2))
= (c(u1, w)/(c(u1, w) + c(u2, w))) · ((c(u1, w) + c(u2, w))(v(a) − v(w)) − c(u2, w)(v(a) − v(u2)))
= 0. □
Remark 12.2. Note that if w has exactly 2 neighbors in a network (G, c) as above, with resistances
r1, r2 on these edges, then the network with these two resistors replaced by a single resistor of
resistance r1 + r2 is an equivalent network, in the sense that effective resistances and conductances
do not change, as above.
[Figure 4. Series Law: edges u1 ∼ w and w ∼ u2 with conductances c1, c2 are replaced by a single edge u1 ∼ u2 with conductance 1/(1/c1 + 1/c2).]
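Both reduction laws are easy to check numerically. The sketch below is ours (plain Python; the helper names are hypothetical): reduce the network a −c1− w −c2− z together with a parallel edge a −c3− z, and compare against the conductance computed directly from the unit voltage.

```python
def series(c1, c2):
    """Series law: conductances c1, c2 through a degree-2 vertex
    combine to 1 / (1/c1 + 1/c2) (resistances add)."""
    return 1.0 / (1.0 / c1 + 1.0 / c2)

def parallel(c1, c2):
    """Parallel law: conductances of parallel edges add."""
    return c1 + c2

# Network: a --c1-- w --c2-- z, plus a parallel edge a --c3-- z.
c1, c2, c3 = 2.0, 3.0, 0.5
# Unit voltage v(a) = 1, v(z) = 0; harmonicity at w gives v(w):
vw = c1 / (c1 + c2)
ceff_direct = c1 * (1 - vw) + c3            # = c_a * Delta v(a)
ceff_reduced = parallel(series(c1, c2), c3)  # reduce, then read off
```

Both routes give the same effective conductance (here 1.7), as Proposition 12.1 and Exercise 12.1 guarantee.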
Example 12.3. What is the effective conductance between a and z in the following network?

[Figure: a network between a and z with edges of conductance 1/2, reduced step by step using the parallel and series laws; the intermediate networks have conductances 3/2, then 3/5 and 1/2, then 3/8 and 1/3, and the final single edge between a and z has conductance 3/8 + 1/3 = 17/24.]
12.1.3. Contracting Equal Voltages.
Exercise 12.2. Let (G, c) be a network, and let v be a unit voltage imposed on a and
Z. Suppose x, y ∉ {a} ∪ Z are such that v(x) = v(y). Define (G′, c′) by contracting x, y to the
same vertex; that is: V (G′) is V (G) with the vertices x, y removed and a new vertex xy instead.
All edges and weights stay the same, except for those adjacent to x or y, for which we have
c′(xy, w) = c(x, w) + c(y, w) for all w.

Then, v is a unit voltage imposed on a and Z in G′ (where v(xy) := v(x) = v(y)), and the
effective conductance between a and Z does not change: Ceff(a, Z;G) = Ceff(a, Z;G′).

Solution. Since the only change is at edges adjacent to x and y, we only need to check that for
w = xy or w ∼ xy such that w ∉ {a} ∪ Z, v is harmonic at w in G′.

For w ∼ xy,

∑_u c′(w, u)(v(w) − v(u)) = ∑_{u≠x,y} c(w, u)(v(w) − v(u)) + (c(w, x) + c(w, y))(v(w) − v(xy)) = ∑_u c(w, u)(v(w) − v(u)),

where we have used that v(xy) = v(x) = v(y). So if v is harmonic at w in G then v is harmonic
at w in G′.

Similarly, for w = xy,

∑_u c′(xy, u)(v(xy) − v(u)) = ∑_u c(x, u)(v(x) − v(u)) + ∑_u c(y, u)(v(y) − v(u)),

so v is harmonic at xy in G′. □
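A concrete instance of Exercise 12.2 (our illustration, not from the notes): in the "diamond" a − x − z, a − y − z with unit conductances, symmetry forces v(x) = v(y) = 1/2, so x and y may be contracted, and the effective conductance is unchanged.

```python
def series(c1, c2):
    # Series law: resistances add.
    return 1.0 / (1.0 / c1 + 1.0 / c2)

# Diamond network, unit conductances: a - x - z and a - y - z.
# By the symmetry swapping x and y, the unit voltage has v(x) = v(y) = 1/2.
vx = vy = 0.5
ceff_before = 1.0 * (1 - vx) + 1.0 * (1 - vy)   # C_eff(a, {z}) = 1
# Contract x, y into one vertex xy: c'(a, xy) = 2 and c'(xy, z) = 2.
ceff_after = series(2.0, 2.0)                    # still 1
```

Contracting vertices of equal voltage left Ceff(a, Z) untouched, as the exercise asserts.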
Example 12.4. What is the effective conductance between a and z in the following network?

[Figure: a symmetric network between a and z, reduced by contracting vertices of equal voltage; the intermediate networks have conductances 2, 1/2, 1/2, then 2, and the effective conductance between a and z is 2/3.]
Exercise 12.3. Let (G, c) be a network, and let v be a unit voltage imposed on a and Z.
Suppose x, y ∉ {a} ∪ Z are such that v(x) = v(y). Let c′ be a new weight function on G that is
identical to c except for the edge x ∼ y: for x ∼ y let c′(x, y) = C ≥ 0, some arbitrary number,
possibly 0. Let ∆′ be the Laplacian on (G, c′).

Then, v is harmonic in G \ ({a} ∪ Z) also with respect to c′. Conclude that the effective
conductance between a and Z is the same in both (G, c) and (G, c′).

Solution. Since the difference is only at the edge x ∼ y, we only need to check that harmonicity
is preserved at x and y. Because v(x) − v(y) = 0, for z ∈ {x, y},

c′_z ∆′v(z) = ∑_w c′(z, w)(v(z) − v(w)) = ∑_{w : {z,w}≠{x,y}} c(z, w)(v(z) − v(w)) + c′(x, y)(v(x) − v(y)) = c_z ∆v(z).

Thus v is a unit voltage imposed on a and Z with respect to c′ as well. Also,

Ceff(a, Z; (G, c′)) = c′_a ∆′v(a) = c_a ∆v(a) = Ceff(a, Z; (G, c)). □
Example 12.5. The network from the previous example can be reduced by removing the vertical
edge.
Exercise 12.4. Let G = (V, c) be a network such that V = Z and x ∼ y if and only if
|x − y| = 1. For the weighted random walk (Xt)t on G define

Vt(x) = ∑_{n=0}^{t} 1{Xn = x},

the number of visits to x up to time t. Let T+_0 = inf{t ≥ 1 : Xt = 0}.

Calculate E0[V_{T+_0}(x)] as a function of c only.
Number of exercises in lecture: 4
Total number of exercises until here: 24
Lecture 13: Thompson’s Principle
Suppose G is a network. We think of weights as conductances, so it seems intuitive that
increasing the conductance of edges would result in making the graph more transient. This is
what we prove in this lecture.
13.1. Thomson’s Principle
• Definition 13.1. For F ∈ L2(E) and for v such that ∇v ∈ L2(E), define the energy of F
and of v by

E(F ) := 〈F, F 〉 = ∑_e r(e)F (e)² and E(v) := 〈∇v, ∇v〉 = ∑_{x∼y} c(x, y)(v(x) − v(y))².

Note that if v ∈ L2(V ) then E(v) = 2〈∆v, v〉 by the duality formula.
• Lemma 13.2 (Thomson's Principle / Dirichlet Principle). Let G = (V, c) be a finite network,
and let A, Z be disjoint subsets.

The unit voltage v is the function that minimizes the energy E(f) over all functions f with
f(a) = 1 for all a ∈ A and f(z) = 0 for all z ∈ Z.

Proof. By the duality formula we have that for any f, f′ ∈ C0(V ),

〈∆f, f′〉 = ½〈∇f, ∇f′〉 = 〈f, ∆f′〉.

(That is, the Laplacian is self-dual.) Since f − v = 0 on A ∪ Z, and since v is harmonic off A ∪ Z,
we get that (f − v)∆v ≡ 0. So,

〈∆(f − v), v〉 = 〈f − v, ∆v〉 = ∑_x c_x(f(x) − v(x))∆v(x) = 0.

This implies that the cross terms vanish when expanding

E(f ) = 2〈∆(f − v + v), f − v + v〉 = E(f − v) + E(v) ≥ E(v),

where we have used that the energy is always non-negative. □
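Thomson's principle can be seen numerically on the smallest possible example (a sketch of ours; assumes NumPy): on a − w − z the energy is a quadratic function of the one free value f(w), minimized exactly at the harmonic value. Note that the code counts each edge once, so its energy equals Ceff rather than the notes' 2Ceff, which sums over ordered pairs.

```python
import numpy as np

c1, c2 = 2.0, 3.0

def energy(t):
    # sum over edges (each edge once) of c(x, y)(f(x) - f(y))^2 on
    # a --c1-- w --c2-- z, with f(a) = 1, f(z) = 0 and f(w) = t.
    return c1 * (1 - t) ** 2 + c2 * t ** 2

t_harm = c1 / (c1 + c2)          # harmonic value of v(w)
ts = np.linspace(0, 1, 1001)
best = ts[np.argmin([energy(t) for t in ts])]
# Conservation of energy (edges counted once): E(v) = C_eff(a, {z}).
ceff = 1.0 / (1.0 / c1 + 1.0 / c2)
```

Scanning the interpolations confirms that the minimizer is the harmonic voltage, and that its energy equals the effective conductance.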
• Lemma 13.3 (Thomson's Principle - Dual Form). Let G be a finite network, and let {a}, Z be
disjoint subsets. Let v(x) = ½ g_Z(x, a), where g_Z(x, a) is the Green function (the expected number
of visits to a started at x from time 0 until before hitting Z).

Then, over all flows F from a to Z with divF (a) = 1, the energy E(F ) is minimized at
I = ∇v.

Proof. First, we know that v is a voltage on a and Z with v(z) = 0 for all z ∈ Z. Also,
divI(a) = 2∆v(a) = 1.

Let F be a flow from a to Z with divF (a) = 1. Then, F − I is a flow from a to Z with
div(F − I)(a) = 0. Since div(F − I) is 0 off Z, and v is 0 on Z, we get that div(F − I) · v ≡ 0.
Thus,

〈F − I, I〉 = 〈div(F − I), v〉 = 0.

So,

E(F ) = E(F − I + I) = E(F − I) + E(I) ≥ E(I). □
• Corollary 13.4 (Rayleigh's Monotonicity Principle). Let G be a finite network, and let a be
a point not in a subset Z. Suppose c′ is a weight function on G such that c ≤ c′. Then,

Ceff(a, Z; c) ≤ Ceff(a, Z; c′).

Proof. Let v be the unit voltage imposed on a and Z with respect to c, and let u be the unit
voltage imposed on a and Z with respect to c′.

Note that

E(v) = 2〈∆v, v〉 = 2 ∑_x c_x ∆v(x)v(x) = 2c_a ∆v(a) = 2Ceff(a, Z; c),

because ∆v(x) = 0 for x ∉ {a} ∪ Z, v(z) = 0 for z ∈ Z, and v(a) = 1. Similarly, E(u) =
2Ceff(a, Z; c′). (This fact is called conservation of energy.)

Since c ≤ c′, using Thomson's principle,

Ceff(a, Z; c) = ½ ∑_{x,y} c(x, y)(v(x) − v(y))² ≤ ½ ∑_{x,y} c(x, y)(u(x) − u(y))² ≤ ½ ∑_{x,y} c′(x, y)(u(x) − u(y))² = Ceff(a, Z; c′). □
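Rayleigh monotonicity is easy to observe numerically (our sketch, assuming NumPy; the helper `ceff` is ours): increase a single conductance and the effective conductance can only go up.

```python
import numpy as np

def ceff(c, a, z):
    """C_eff(a, {z}): solve the Dirichlet problem v(a) = 1, v(z) = 0."""
    n = len(c)
    free = [x for x in range(n) if x not in (a, z)]
    A = np.diag([c[x].sum() for x in free]) - c[np.ix_(free, free)]
    v = np.zeros(n)
    v[a] = 1.0
    v[free] = np.linalg.solve(A, c[free, a])
    return float(sum(c[a, x] * (v[a] - v[x]) for x in range(n)))

# A small network on vertices {0, 1, 2, 3}, a = 0, Z = {3}.
c = np.zeros((4, 4))
for x, y, w in [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 1.0), (2, 3, 1.0), (1, 2, 0.5)]:
    c[x, y] = c[y, x] = w
c_big = c.copy()
c_big[1, 3] = c_big[3, 1] = 2.0      # increase one conductance: c <= c'
low, high = ceff(c, 0, 3), ceff(c_big, 0, 3)
```

In the symmetric original network the cross edge carries no current, so `low` equals 1 exactly; after doubling c(1, 3), `high` is strictly larger, as Corollary 13.4 predicts.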
• Corollary 13.5. Let G be an infinite network. Let c′ be a weight function on G such that
c′ ≥ c. If (G, c) is transient, then (G, c′) is also transient.

Proof. Fix a vertex a ∈ G. For every n, let Gn be the ball of radius n around a; that is,

Gn = {x ∈ G : dist(x, a) ≤ n}.

So (Gn)n form an increasing sequence of subgraphs that exhaust G. Let Zn = Gn+1 \ Gn, which
is the outer boundary of the ball of radius n. We know that (G, c) is transient, which is equivalent
to

lim_{n→∞} Reff(a, Zn; c) < ∞

(because imposing a unit voltage on a and G \ Gn is the same as imposing a unit voltage on a
and Zn). Now, for each fixed n, since c′ ≥ c, considering the finite networks (Gn+1, c) and (Gn+1, c′),
we have that

Ceff(a, Zn; c) ≤ Ceff(a, Zn; c′).

Thus,

lim_{n→∞} Reff(a, Zn; c′) ≤ lim_{n→∞} Reff(a, Zn; c) < ∞,

so (G, c′) is transient. □
Exercise 13.1. Let H be a subgraph of a graph G (not necessarily spanning all the vertices
of G). Show that if the simple random walk on H is transient, then so is the simple random walk
on G.
13.2. Shorting
Another intuitive network operation is to short two vertices together. This can be
thought of as imposing a conductance of ∞ between them. Since this increases the conductance,
it is intuitive that this will increase the effective conductance.

• Proposition 13.6. Let (G, c) be a finite network. Let b, d ∈ G and define (G′, c′) by shorting
b and d: let V (G′) = V (G) \ {b, d} ∪ {bd} and c′(z, w) = c(z, w) for z, w ∉ {b, d}, and c′(bd, w) =
c(b, w) + c(d, w).

Then, for any disjoint sets {a}, Z, we have that Ceff(a, Z;G) ≤ Ceff(a, Z;G′).
Proof. Let v be the unit voltage imposed on a and Z with respect to c, and let u be the unit
voltage imposed on a and Z with respect to c′.

Conservation of energy tells us that

2Ceff(a, Z; c) = ∑_{x,y} c(x, y)(v(x) − v(y))² and 2Ceff(a, Z; c′) = ∑_{x,y} c′(x, y)(u(x) − u(y))².

Note that u can be viewed as a function on V (G) by setting u(b) = u(d) = u(bd). Using
Thomson's principle in (G, c),

2Ceff(a, Z; c′) = ∑_{x,y∈G′\{bd}} c′(x, y)(u(x) − u(y))² + 2 ∑_w c′(bd, w)(u(bd) − u(w))²
= ∑_{x,y∈G\{b,d}} c(x, y)(u(x) − u(y))² + 2 ∑_{k∈{b,d}} ∑_w c(k, w)(u(k) − u(w))²
= ∑_{x,y∈G} c(x, y)(u(x) − u(y))²
≥ ∑_{x,y∈G} c(x, y)(v(x) − v(y))² = 2Ceff(a, Z; c),

where in the third line the edge between b and d contributes 0, since u(b) = u(d). □
Number of exercises in lecture: 1
Total number of exercises until here: 25
Lecture 14: Nash-Williams
14.1. A Probabilistic Interpretation of Current
• Proposition 14.1. Let (G, c) be a network. Let a ∈ G and Z ⊂ G be such that the component
of a in G \ Z is finite.

For the weighted random walk (Xt)t on G, and for any edge x ∼ y, let Vx,y be the number of
times the walk goes from x to y until hitting Z; that is,

Vx,y := ∑_{k=1}^{T_Z} 1{X_{k−1} = x, X_k = y}.

Then,

Ea[Vx,y − Vy,x] = ∇v(x, y) · Reff(a, Z),

where v is a unit voltage imposed on a, Z.

Proof. Let

g(x) = (ca/cx) g_Z(a, x) = (ca/cx) Ea ∑_{k=0}^{T_Z−1} 1{X_k = x} = g_Z(x, a).

We have already seen that g is harmonic in G \ ({a} ∪ Z). Also, g(z) = 0 for all z ∈ Z, and

g(a) = 1 / Pa[T_Z < T+_a] = ca · Reff(a, Z).

(g is a voltage imposed on a, Z with g(z) = 0 for all z ∈ Z.)

Now,

Ea[Vx,y] = ∑_{k=1}^∞ Pa[X_{k−1} = x, X_k = y, T_Z > k − 1] = ∑_{k=0}^∞ Pa[X_k = x, T_Z > k] P (x, y) = (1/ca) · cx g(x) P (x, y) = (1/ca) g(x) c(x, y).

Thus,

Ea[Vx,y − Vy,x] = (1/ca) · c(x, y)(g(x) − g(y)).

That is, since v = g/g(a) is a unit voltage imposed on a, Z, and since c_a^{−1} g = Reff(a, Z) · v,

Ea[Vx,y − Vy,x] = Reff(a, Z) · c(x, y)(v(x) − v(y)). □
14.2. The Nash-Williams Criterion
• Definition 14.2. Let G be a graph. Let A, Z be disjoint subsets.

A subset of edges Π is a cut between A and Z if any path γ : a → z with a ∈ A and z ∈ Z
must pass through an edge in Π.

A subset of edges Π is a cutset (sometimes, a cut between A and ∞) if any infinite simple
path that starts at a ∈ A must pass through an edge in Π.

One intuitive statement is that if e is a cut edge between a and Z, then Reff(a, Z) ≥ r(e),
because there is at least that much resistance between a and Z.

• Proposition 14.3. Let (G, c) be a finite network. Let {a}, Z be disjoint subsets and let e be
a cut edge between a and Z. Then Reff(a, Z) ≥ r(e).
Proof. Suppose that e = (x, y). Let Vx,y be the number of times the random walk crosses the edge
(x, y) until hitting Z, and let Vy,x be the number of times the walk crosses (y, x) before hitting Z.
We have seen that

Ea[Vx,y − Vy,x] = ∇v(x, y) · Reff(a, Z),

where v is a unit voltage imposed on a and Z.

Because G is finite, we know by uniqueness of harmonic functions that v(x) = Px[Ta < TZ ].
Because (x, y) is a cut edge between a and Z, to get from y to a the walk must pass through x;
that is,

v(y) = Py[Ta < TZ ] = Py[Tx < TZ ] · Px[Ta < TZ ] ≤ v(x).

So 0 ≤ v(x) − v(y) ≤ 1.

Now, since (x, y) is a cut edge between a and Z, we must have that Vx,y − Vy,x ≥ 1, because
the walk must cross the edge (x, y), and every time it crosses back over (y, x) it must return to
cross (x, y). Thus,

1 ≤ c(x, y)(v(x) − v(y)) · Reff(a, Z) ≤ c(x, y) · Reff(a, Z). □
If Π is a cut between a and Z, then shorting all edges in Π would result in a cut edge of
conductance at most ∑_{e∈Π} c(e). A natural generalization of the above is the following.

• Lemma 14.4 (Nash-Williams Inequality). Let (G, c) be a finite network, and {a}, Z disjoint
sets. Suppose that Π1, . . . , Πk are k pairwise disjoint cuts between a and Z. Then,

Reff(a, Z) ≥ ∑_{j=1}^{k} ( ∑_{e∈Πj} c(e) )^{−1}.
Proof. Note that since passing to a subset of a cut that is still a cut only increases the right
hand side, we can prove the lemma under the assumption that the cuts are minimal. Specifically,
they do not contain both (x, y) and (y, x) for an edge x ∼ y.

Let v be a unit voltage imposed on a and Z. We know (conservation of energy) that ½ E(v) =
Ceff(a, Z).

For an edge (x, y), let Vx,y be the number of crossings from x to y until hitting Z; that is,

Vx,y = ∑_{k=1}^{T_Z} 1{X_{k−1} = x, X_k = y}.

Then, for any minimal cut Π between a and Z, we have, Pa-a.s.,

∑_{(x,y)∈Π} Vx,y − Vy,x ≥ 1.

Also, we have that for any edge (x, y),

Ea[Vx,y − Vy,x] = ∇v(x, y) · Reff(a, Z).

Thus, applying Cauchy-Schwarz, for any cut Π between a and Z,

1 ≤ Reff(a, Z)² · ( ∑_{(x,y)∈Π} ∇v(x, y) )² ≤ Reff(a, Z)² · ∑_{(x,y)∈Π} c(x, y) · ∑_{(x,y)∈Π} c(x, y)(v(x) − v(y))².

That is, for any one of the cuts Πj,

( ∑_{e∈Πj} c(e) )^{−1} ≤ Reff(a, Z)² · ∑_{(x,y)∈Πj} c(x, y)(v(x) − v(y))².

Since the cuts Πj are disjoint, and since we assumed that a cut does not contain both (x, y)
and (y, x) (because the cuts are minimal), we have that

∑_{j=1}^{k} ( ∑_{e∈Πj} c(e) )^{−1} ≤ Reff(a, Z)² · ½ ∑_{x,y} c(x, y)(v(x) − v(y))² = Reff(a, Z). □
• Corollary 14.5 (Nash-Williams Criterion). Let (G, c) be an infinite network. If (Πn)n is a
sequence of pairwise disjoint finite cutsets between a and ∞ such that

∑_{n=1}^∞ ( ∑_{e∈Πn} c(e) )^{−1} = ∞,

then (G, c) is recurrent.

Proof. Fix n. Let Gn be the subnetwork induced by (G, c) on the smallest ball (in the graph
metric) that contains ⋃_{j=1}^{n} Πj. Let Zn = G \ Gn.

So (Gn)n exhaust G, and for each fixed n,

Reff(a, Zn) ≥ ∑_{j=1}^{n} ( ∑_{e∈Πj} c(e) )^{−1}.

Letting n → ∞, the left hand side tends to Reff(a, ∞) and the right hand side tends to the
infinite sum. Since Reff(a, ∞) = ∞, (G, c) is recurrent. □
Example 14.6. We now give a proof that Z and Z² are recurrent.

Recall that we could prove this by showing that

P0[X_{2t} = 0] ≥ const · t^{−1/2} for d = 1, and P0[X_{2t} = 0] ≥ const · t^{−1} for d = 2.

However, it will be easier to do this without these calculations (especially in the more complicated
Z² case).

For Z this is easy, because Z is just composed of edges in series, so for any n > 0,

Reff(0, {−n, n}) = n/2 → ∞.

Now for Z²: By the Nash-Williams criterion, it suffices to find disjoint cutsets (Πn)n such
that

∑_n 1/|Πn| = ∞.

Indeed, taking

Πn = {(x, y) : ‖x‖∞ = n, ‖y‖∞ = n + 1},

we have that |Πn| = 4(2n + 1).
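The cutset count for Z² is easy to verify by enumeration (our sketch, plain Python): the number of edges from the ∞-ball of radius n to radius n + 1 is exactly 4(2n + 1), so the Nash-Williams sum is a constant times the harmonic series, which diverges.

```python
def cutset_size(n):
    """Number of edges (x, y) in Z^2 with ||x||_inf = n and ||y||_inf = n + 1,
    counted by direct enumeration."""
    count = 0
    for x0 in range(-n, n + 1):
        for x1 in range(-n, n + 1):
            if max(abs(x0), abs(x1)) != n:
                continue
            for y0, y1 in [(x0+1, x1), (x0-1, x1), (x0, x1+1), (x0, x1-1)]:
                if max(abs(y0), abs(y1)) == n + 1:
                    count += 1
    return count

sizes = [cutset_size(n) for n in range(6)]       # 4(2n+1): 4, 12, 20, ...
partial = sum(1.0 / s for s in sizes)            # ~ harmonic series / 8
```

The partial sums of 1/|Πn| grow like (log n)/8, confirming the divergence used in the example.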
TODO: add an example showing that the Nash-Williams criterion is not necessary.
Number of exercises in lecture: 0
Total number of exercises until here: 25
Lecture 15: Flows
15.1. Finite Energy Flows
The Nash-Williams criterion was a sufficient condition for recurrence. We now turn to a
stronger condition which is necessary and sufficient.
Let (G, c) be an infinite weighted graph. Recall that a flow from A to Z is an anti-symmetric
function with vanishing divergence off A ∪ Z.
In this spirit, we say that F is a flow from o ∈ G to ∞ if
• F is anti-symmetric.
• divF (o) ≠ 0 and divF (x) = 0 for all x ≠ o.
If in addition divF (o) = 1, we say that F is a unit flow from o to infinity.
••• Theorem 15.1 (T. Lyons, 1983). A weighted graph (G, c) is transient if and only if there
exists a finite energy flow on (G, c) from some vertex o ∈ G to ∞.

Proof. The proof is an adaptation of a method of H. Royden in the continuous world.

Assume that F is a flow from o to ∞. By replacing F with F/divF (o) we can assume without
loss of generality that F is a unit flow.

For each n let Gn be the finite subnetwork of G induced on the ball of radius n around o (in
the graph metric). Let Zn = Gn \ Gn−1. Transience of G is equivalent to

lim_{n→∞} Reff(o, Zn) < ∞.

Let vn(x) = ½ g_{Zn}(x, o), where g_{Zn} is the Green function on the finite network Gn. Since
div∇vn(o) = 2∆vn(o) = 1, the dual version of Thomson's principle tells us that E(F ) ≥
E_{Gn}(F ) ≥ E_{Gn}(vn).

Also, since

vn(o) = ½ g_{Zn}(o, o) = 1/(2Po[T_{Zn} < T+_o]) = (co/2) · Reff(o, Zn)

and ∆vn(o) = ½, we get that

E_{Gn}(vn) = 2 ∑_x cx ∆vn(x)vn(x) = 2co ∆vn(o)vn(o) = (c²_o/2) · Reff(o, Zn).

Thus, if F has finite energy on G then

lim_{n→∞} Reff(o, Zn) = (2/c²_o) lim_{n→∞} E_{Gn}(vn) ≤ (2/c²_o) E(F ) < ∞,

and (G, c) is transient.
For the other direction, assume that (G, c) is transient and consider the functions vn(x) =
Px[To < T_{Zn}] and v(x) = Px[To < ∞]. vn is a unit voltage imposed on o and Zn in Gn, and
vn(x) ↗ v(x) for every x by monotone convergence. Note that v(o) = vn(o) = 1, and v, vn are
non-constant because (G, c) is transient.

Let In = ∇vn and I = ∇v. Note that for every edge e, In(e)²r(e) → I(e)²r(e). Also, for
every n, since Gn is finite,

E(In) = 2〈∆vn, vn〉 = 2co ∆vn(o)vn(o) = 2Ceff(o, Zn) ≤ 2co < ∞.

Thus, Fatou's lemma (for sums) tells us that

E(I) = ∑_e lim_{n→∞} In(e)²r(e) ≤ lim inf_{n→∞} ∑_e In(e)²r(e) ≤ 2co < ∞.

Since I = ∇v, we have that

divI(o) = 2∆v(o) = 2 ∑_y P (o, y)(1 − Py[To < ∞]) = 2Po[T+_o = ∞] > 0

by transience, and divI(x) = 2∆v(x) = 0 for all x ≠ o. That is, I is a flow from o to ∞ with
finite energy. □
15.2. Flows on Zd
We now want to give some more details about random walks on Z^d.

We start with a proof that Z^d is transient for d ≥ 3. By Rayleigh's monotonicity principle it
suffices to prove that Z³ is transient. By Lyons's theorem it suffices to provide a finite energy
flow on Z³.

Let µ be a probability measure on paths in some graph G. Let Γ denote the random path,
and suppose that µ-a.s. every vertex of G is visited finitely many times. Then, we can define
V (x, y) to be the number of times Γ crosses the edge (x, y), and Eµ(x, y) to be the expectation
of V (x, y) under µ.
Claim 15.2. Suppose that Γ is infinite and Γ0 = o, µ-a.s. Suppose also that Eµ(x, y) < ∞ for
every edge (x, y).

Then, F (x, y) := Eµ(x, y) − Eµ(y, x) is a flow from o to ∞.

Proof. Anti-symmetry is clear. Also, for any x ≠ o, since Γ is infinite, it cannot terminate at x.
Thus, every time Γ crosses an edge (y, x) it must then cross an edge (x, z) immediately after.
Thus, µ-a.s. ∑_{y∼x} V (x, y) − V (y, x) = 0, and so divF (x) = 0.

Also, since Γ0 = o, we get one extra passage out of o, but the rest must cancel: co divF (o) =
2 ∑_y F (o, y) = 2. □
That is, to show that a graph is transient, we need to construct a measure on infinite paths,
starting at some vertex, such that the expected number of visits to any vertex is finite. If the
energy is finite for such a measure, we have transience.
15.2.1. Wedges. Let us prove something a bit more general than Z³ being transient and Z²
being recurrent.

Let ϕ : N → N be an increasing function. Consider the subgraph of Z³ induced on

Wϕ = {(x, y, z) ∈ Z³ : |z| ≤ ϕ(|x|)}.

(This is the ϕ-wedge.)

••• Theorem 15.3 (T. Lyons 1983). If

∑_{n=1}^∞ 1/(n(ϕ(n) + 1)) = ∞,

then Wϕ is recurrent.

If ϕ(n + 1) − ϕ(n) ≤ 1 and

∑_{n=1}^∞ 1/(n(ϕ(n) + 1)) < ∞,

then Wϕ is transient.

Proof. The first direction is simpler. Let Wϕ be a wedge, and let Bn denote the ball of radius n
around 0 in the graph metric (which is the L¹ distance in R³). Let ∂Bn be the set of edges
connecting Bn to B^c_n. Thus, the ∂Bn form disjoint cutsets between 0 and ∞.
What is the size of ∂Bn? There are at most 2n + 1 choices for x, and then, given x, there are
at most 2(ϕ(|x|) + 1) ≤ 2(ϕ(n) + 1) choices for z, which then determines y up to sign. Thus, the
size is bounded by |∂Bn| ≤ O(n(ϕ(n) + 1)). So Nash-Williams tells us that if

∑_{n=1}^∞ 1/(n(ϕ(n) + 1)) = ∞

the walk is recurrent.
Now for the other direction. We define a measure on paths in Wϕ. Let U, U′ be chosen
uniformly on [0, 1], independently. Let L be the set {(n, Un, U′ϕ(n)) : n ∈ N}. Choose a monotone
path Γ in Wϕ that is always at distance at most 1 from L. (A monotone path γ is a path in Z³
such that dist(γ_{t+1}, 0) ≥ dist(γ_t, 0).)

Fix an edge e in Wϕ and suppose that (x, y, z) is an endpoint of e. Let R = |x| + |y| + |z|.
The event that e ∈ Γ implies that (x, y, z) is at distance at most 1 from L; that is, there exists
n with

3n ≥ n + Un + U′ϕ(n) ≥ R − 1 ⟹ n ≥ (R − 1)/3

(where we have used that ϕ(n) ≤ n). Also,

|x − n| + |y − Un| + |z − U′ϕ(n)| ≤ 1,

so nU ∈ [y − 1, y + 1] and ϕ(n)U′ ∈ [z − 1, z + 1]. Thus,

µ[e ∈ Γ] ≤ 4/(n(ϕ(n) + 1)).

Because Γ visits any edge at most once, this is also a bound on Eµ(e). Since there are at most
O(R(ϕ(R) + 1)) ≤ O(n(ϕ(n) + 1)) such possibilities for (x, y, z) ∈ Wϕ, we have that the energy
of the flow is at most

2 ∑_R ∑_{|x|+|y|+|z|=R} ∑_{e : e+=(x,y,z)} Eµ(e)² ≤ const · ∑_n n(ϕ(n) + 1) · 1/(n²(ϕ(n) + 1)²) = const · ∑_n 1/(n(ϕ(n) + 1)).

Since this is finite, the flow has finite energy and the wedge is transient. □
Example 15.4. For example, if we choose ϕ(n) = n^ε, we get a transient wedge. This is also
true if we take ϕ(n) = (log n)².

If we choose ϕ(n) = 1, we get essentially Z² and recurrence, of course. Also, ϕ(n) = log n
gives a divergent sum, so this wedge is recurrent.
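The criterion of Theorem 15.3 for these examples can be illustrated numerically (a sketch of ours; partial sums of course only suggest convergence or divergence, they do not prove it):

```python
import math

def partial_sum(phi, N):
    """Partial sums of sum_n 1 / (n * (phi(n) + 1)) up to N."""
    return sum(1.0 / (n * (phi(n) + 1.0)) for n in range(1, N + 1))

N = 10 ** 5
s_flat  = partial_sum(lambda n: 1.0, N)               # ~ (log N)/2: diverges, recurrent
s_log   = partial_sum(lambda n: math.log(n), N)       # ~ log log N: diverges, recurrent
s_logsq = partial_sum(lambda n: math.log(n) ** 2, N)  # bounded: transient wedge
s_power = partial_sum(lambda n: n ** 0.5, N)          # bounded: transient wedge
```

The two divergent cases keep growing (slowly, like log N and log log N), while the two convergent cases have essentially saturated already at N = 10^5.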
Number of exercises in lecture: 0
Total number of exercises until here: 25
Lecture 16: Resistance in Euclidean Lattices
Let’s wrap up our discussions with some examples of random walks on graphs.
16.1. Euclidean lattices
We have already seen that Z^d is transient for d ≥ 3 and recurrent for d ≤ 2. We saw two
different methods to prove this.

The first was a brute force computation of P0[St = 0], using Stirling's formula, and then
approximating E0[Vt(0)] and E0[V∞(0)].

The second method was more robust, and less computational. It involved estimating the
energy of certain flows, mainly taking a uniform direction and following that direction with a
path in the lattice.

Energy estimates and the Nash-Williams inequality can give us better control of the effective
resistance and the Green function.
16.1.1. Resistance Estimates. Since Z^d, d ≥ 3, is transient, we know that Reff(0, ∂Bn) is
bounded above and below by constants, where ∂Bn is the boundary of the ball of radius n around 0.

However, for d = 2 we know that Reff(0, ∂Bn) → ∞. We now investigate the growth rate of
this function.
• Proposition 16.1. Let Zn = {z ∈ Z² : dist(0, z) ≥ n}. Then, there exist constants 0 <
c, C < ∞, independent of n, such that

c log n ≤ Reff(0, Zn) ≤ C log n.

Proof. The lower bound follows by noting that for the sets

Πn = {(z, z′) ∈ Z² × Z² : dist(z, 0) = n − 1, dist(z′, 0) = n},

all of Π1, Π2, . . . , Πn are cuts between 0 and Zn, with size |Πn| = O(n). So the Nash-Williams
inequality gives

Reff(0, Zn) ≥ ∑_{k=1}^{n} 1/|Πk| ≥ const · ∑_{k=1}^{n} 1/k ≥ const · log n.
For the other direction, let vn(x) = ¼ g_{Zn}(x, 0). So vn is a voltage imposed on 0 and Zn, with
∆vn(0) = ¼ and vn(0) = (4P0[T+_0 > T_{Zn}])^{−1} = Reff(0, Zn). Also,

E(vn) = 8∆vn(0)vn(0) = 2Reff(0, Zn).

Let U be a uniform random variable in [0, 1], and let L = {(n, Un) : n ∈ N} ⊂ R². Let Γ be some
random monotone path from 0 that is always at distance at most 1 from L. For any edge
e = (x, y) in Z², the event e ∈ Γ implies that |x − n| ≤ 1 and nU ∈ [y − 1, y + 1]. Thus, the
expected number of times Γ crosses e is at most 2/n ≤ 2/(|x| − 1). Let Fn be the flow given by this
random path, restricted to G \ Zn. Since the number of edges with an endpoint at distance n from
0 is O(n),

E(Fn) ≤ ∑_{k=1}^{n} O(k · k^{−2}) = O(log n).

Recall that divFn(0) = 1/2, so Thomson's principle tells us that for I = ∇vn, since I is a
current with divI(0) = 2∆vn(0) = ½,

E(Fn) ≥ E(I) = E(vn) = 2Reff(0, Zn). □
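Proposition 16.1 can be checked against exact finite computations (our sketch, assuming NumPy): solving the Dirichlet problem on L¹ balls of radii 4, 8, 16 shows Reff(0, Zn) growing by a roughly constant amount each time n doubles, as c log n growth predicts.

```python
import numpy as np

def reff_z2(n):
    """R_eff(0, Z_n) in Z^2 with unit conductances, Z_n = {z : |z|_1 >= n},
    computed by solving the Dirichlet problem v(0) = 1, v = 0 on Z_n."""
    interior = [(x, y) for x in range(-n, n) for y in range(-n, n)
                if 0 < abs(x) + abs(y) < n]
    fi = {v: i for i, v in enumerate(interior)}
    A = np.zeros((len(interior), len(interior)))
    b = np.zeros(len(interior))
    for v, i in fi.items():
        x, y = v
        A[i, i] = 4.0                       # every vertex of Z^2 has degree 4
        for u in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if u in fi:
                A[i, fi[u]] -= 1.0
            elif u == (0, 0):
                b[i] += 1.0                 # boundary value v(0) = 1
    sol = np.linalg.solve(A, b)
    volt = lambda u: sol[fi[u]] if u in fi else 0.0
    ceff = sum(1.0 - volt(u) for u in [(1, 0), (-1, 0), (0, 1), (0, -1)])
    return 1.0 / ceff

r4, r8, r16 = reff_z2(4), reff_z2(8), reff_z2(16)
inc1, inc2 = r8 - r4, r16 - r8    # both should be roughly C * log 2
```

The two increments agree to within a modest factor, consistent with logarithmic growth of the effective resistance.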
Remark 16.2. If we tried to adapt the argument above to Z^d, we would see that the probability
that an edge e at distance n from 0 is in Γ is at most O(n^{−(d−1)}) (because we would be looking
at the direction (n, U1n, U2n, . . . , U_{d−1}n) for U1, . . . , U_{d−1} i.i.d.). Thus,

Reff(0, Zn) ≤ 2^{−1} E(Fn) ≤ ∑_{k=1}^{n} O(k^{d−1} · k^{−2(d−1)}) = ∑_{k=1}^{n} O(k^{1−d}) = O(1),

with the tail of the sum beyond n of order O(n^{2−d}). Similarly, the lower bound would follow
from the Nash-Williams inequality.
16.2. Regular Trees
Let Td denote the d-regular tree. Fix some vertex ρ ∈ Td as the root. For n ≥ 0 let
Tn = {x ∈ Td : dist(x, ρ) = n}. It is easy to check that |T0| = 1 and |Tn| = d(d − 1)^{n−1} for
n ≥ 1.

For any x, y ∈ Tn there exists a graph automorphism ϕ : Td → Td that maps ϕ(x) = y and
fixes each level Tk; i.e. ϕ(Tk) = Tk. Thus, if vn is a unit voltage imposed on ρ and Tn, we
have that vn is constant on Tk for k ≤ n. Thus, all vertices in each level Tk can be shorted into
one vertex, without changing the effective resistance Reff(ρ, Tn). This gives us a network whose
vertices are {0, 1, . . . , n}, with resistances r(k, k + 1) = |Tk+1|^{−1}. Thus, the effective resistance is

Reff(ρ, Tn) = (1/d) ∑_{k=1}^{n} 1/(d − 1)^{k−1} = (d − 1)/(d(d − 2)) · (1 − (d − 1)^{−n}).

Thus, Reff(ρ, ∞) = (d − 1)/(d(d − 2)) < ∞, so Td is transient for d > 2.
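The closed formula for Reff(ρ, Tn) can be confirmed by solving the Dirichlet problem on the actual tree, without the shorting argument (our sketch; assumes NumPy; the tree indexing is ours).

```python
import numpy as np

d, depth = 3, 4
# Build T_d down to the given depth: the root has d children, every other
# internal vertex has d - 1 children; unit conductances on all edges.
parent = {0: None}
levels = [[0]]
next_id = 1
for ell in range(depth):
    new = []
    for v in levels[-1]:
        for _ in range(d if ell == 0 else d - 1):
            parent[next_id] = v
            new.append(next_id)
            next_id += 1
    levels.append(new)

adj = [[] for _ in range(next_id)]
for v, p in parent.items():
    if p is not None:
        adj[v].append(p)
        adj[p].append(v)

# Dirichlet problem: v(root) = 1, v = 0 on the deepest level T_depth.
boundary = set(levels[-1])
free = [v for v in range(1, next_id) if v not in boundary]
fi = {v: i for i, v in enumerate(free)}
A = np.zeros((len(free), len(free)))
b = np.zeros(len(free))
for v in free:
    i = fi[v]
    for u in adj[v]:
        A[i, i] += 1.0
        if u in fi:
            A[i, fi[u]] -= 1.0
        elif u == 0:
            b[i] += 1.0
sol = np.linalg.solve(A, b)
volt = lambda u: 1.0 if u == 0 else (sol[fi[u]] if u in fi else 0.0)
reff_solved = 1.0 / sum(1.0 - volt(u) for u in adj[0])
reff_formula = (d - 1) / (d * (d - 2)) * (1 - (d - 1) ** (-depth))
```

For d = 3 and depth 4 both computations give 5/8 = 0.625, agreeing with the shorted series network.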
16.2.1. A computational proof. We now give a computational proof that the random walk
on Td is transient for d > 2.

Let (Xt)t be the random walk on Td, and consider the following sequences: Dt := dist(Xt, ρ)
and Mt := (d − 1)^{−Dt}. Let Tj be the first time that Xt ∈ Tj.

First, note that

E[M_{t+1} | Ft] = 1{Dt = 0}(d − 1)^{−1} + 1{Dt > 0} · ( (1/d)(d − 1)Mt + (1 − 1/d)(d − 1)^{−1}Mt ) = 1{Dt = 0}(d − 1)^{−1} + 1{Dt > 0}Mt.

So under Px for x ≠ ρ, we have that (M_{t∧T0})_t is a bounded martingale. If Px[T0 < ∞] = 1 we
would have by the optional stopping theorem that

(d − 1)^{−dist(x,ρ)} = Ex[M_{T0}] = 1,

which is a contradiction. Since

Pρ[T+_ρ < ∞] = (1/d) ∑_{x∈T1} Px[T0 < ∞] < 1,

we get that Td is transient.

In fact, the above lets us calculate exactly the probability to escape from ρ: if T = T0 ∧ Tn,
then by the optional stopping theorem, for x ∈ T1,

(d − 1)^{−1} = Ex[MT ] = Px[Tρ > Tn] · (d − 1)^{−n} + 1 − Px[Tρ > Tn],

so

Px[Tρ > Tn] = (d − 2)/(d − 1 − (d − 1)^{−n+1}) = 1 − ((d − 1)^{n−1} − 1)/((d − 1)^n − 1).

Also, v(x) = Px[Tρ < Tn] is a unit voltage on ρ and Tn. Thus,

∆v(ρ) = (1/d) ∑_{x∈T1} Px[Tρ > Tn] = (d − 2)/(d − 1 − (d − 1)^{−n+1}).

So

Ceff(ρ, Tn) = d∆v(ρ) = d(d − 2)/(d − 1 − (d − 1)^{−n+1}) = (d(d − 2)/(d − 1)) · (1 − (d − 1)^{−n})^{−1},

which coincides with our calculation above.
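The escape probability just derived can be cross-checked without martingales (our sketch, assuming NumPy): the distance process dist(Xt, ρ) is a biased birth-and-death chain, and Px[Tρ > Tn] for x ∈ T1 solves its gambler's-ruin linear system.

```python
import numpy as np

d, n = 3, 5
p_up, p_down = (d - 1) / d, 1 / d
# h(k) = P_k[hit level n before level 0] for the distance chain of T_d:
# h(k) = p_down h(k-1) + p_up h(k+1), with h(0) = 0, h(n) = 1.
A = np.zeros((n - 1, n - 1))
b = np.zeros(n - 1)
for i in range(n - 1):
    k = i + 1
    A[i, i] = 1.0
    if i > 0:
        A[i, i - 1] = -p_down
    if i < n - 2:
        A[i, i + 1] = -p_up
    if k == n - 1:
        b[i] = p_up                 # the neighbor at level n has h = 1
h1 = np.linalg.solve(A, b)[0]       # = P_x[T_rho > T_n] for x in T_1
formula = (d - 2) / (d - 1 - (d - 1) ** (-n + 1))
```

For d = 3 and n = 5 both give 1/1.9375 ≈ 0.5161, matching the optional stopping computation above.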
16.3. Flows from random paths
In this section, we generalize the previous constructions on Z^d.

Let µ be a probability measure on infinite paths in G started from o ∈ G. By mapping each
path in the support of µ to its loop-erasure, we may assume without loss of generality that µ is
supported on simple paths (paths that do not cross any vertex more than once).

Now, define F ∈ C0(E) by F (x, y) = µ((x, y) ∈ α) − µ((y, x) ∈ α), where α is a random path
of law µ (by e ∈ α we mean that there exists n such that e = (αn, α_{n+1})).

We claim that F is a flow. Indeed, for x ≠ o the number of edges going into x in α equals the
number of edges exiting x in α. Thus, for x ≠ o,

divF (x) = ∑_{y∼x} (1/cx) (F (x, y) − F (y, x)) = (2/cx) · ∑_{y∼x} E[1{(x,y)∈α} − 1{(y,x)∈α}] = 0.

Similarly, for x = o, there is one more edge exiting o than edges entering o, so divF (o) = 2/co.

Let us calculate the energy of F . First, note that for x ∼ y,

F (x, y)² = (µ((x, y) ∈ α) − µ((y, x) ∈ α))² ≤ µ((x, y) ∈ α)² + µ((y, x) ∈ α)².

Thus, for α, β independent paths of law µ,

E(F ) = ∑_e r(e)F (e)² ≤ 2 · ∑_e r(e) µ(e ∈ α) · µ(e ∈ β) = 2 E ∑_e r(e) 1{e∈α} 1{e∈β}.

We conclude:

• Proposition 16.3. Let G be a graph. Suppose that G admits a probability measure µ on
infinite paths in G started from some fixed o ∈ G, such that for two independent paths α, β we
have E |α ∩ β| < ∞. Then G is transient.
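For the "uniform direction" measure of Section 15.2 one can estimate E|α ∩ β| by simulation. The sketch below is ours and simplifies the construction: a path is represented by the staircase points (s, ⌊U1 s⌋, ⌊U2 s⌋) along the random ray (1, U1, U2), which suffices for counting common vertices.

```python
import numpy as np

rng = np.random.default_rng(0)

def staircase(n_steps):
    """Simplified 'uniform direction' monotone path in Z^3: the points
    (s, floor(U1*s), floor(U2*s)) along the random ray (1, U1, U2)."""
    u1, u2 = rng.random(), rng.random()
    return {(s, int(u1 * s), int(u2 * s)) for s in range(n_steps)}

trials, n_steps = 2000, 200
total = 0
for _ in range(trials):
    total += len(staircase(n_steps) & staircase(n_steps))
mean_intersections = total / trials
```

Every pair of paths shares (0, 0, 0) and (1, 0, 0), and the probability of meeting at distance s decays like 1/s², so the mean stays bounded (around 2.6 here) even as the paths grow, in line with the finite-intersection hypothesis of Proposition 16.3.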
The following is an open question.
Conjecture 16.4. Let G be a transitive graph. If the simple random walk on G is transient,
then there exists a measure µ on infinite paths started from some fixed o ∈ G such that for two
independent paths α, β of law µ, there exists ε > 0 with

E[e^{ε|α∩β|}] < ∞.
Number of exercises in lecture: 0
Total number of exercises until here: 25
Lecture 17: Spectral Analysis
17.1. Spectral Radius
Let (G, c) be a network. Recall that the transition matrix P is an operator on C0(V ) that
operates by Pf(x) = ∑_y P (x, y)f(y). Also, recall that the space L2(V ) is the space of functions
f ∈ C0(V ) that satisfy

〈f, f〉 = ∑_x cx f(x)² < ∞.

One can easily check that P : L2(V ) → L2(V ). Also, P is a self-adjoint operator, and its norm
satisfies ‖P‖ ≤ 1 (that is, P is a contraction).
• Proposition 17.1. Let (G, c) be a weighted graph with transition matrix P . The limit

ρ(P ) = lim sup_{n→∞} (P^n(x, y))^{1/n}

does not depend on x, y.

Proof. Fix z, w ∈ V . We will show that lim sup_n (P^n(z, w))^{1/n} ≥ lim sup_n (P^n(x, y))^{1/n}.

Because P is irreducible, we have that for some t, t′ > 0, P^t(z, x) > 0 and P^{t′}(y, w) > 0. Thus,

P^n(z, w) ≥ P^t(z, x) P^{n−t−t′}(x, y) P^{t′}(y, w).

Since (P^t(z, x))^{1/n} → 1 and (P^{t′}(y, w))^{1/n} → 1,

lim sup_{n→∞} (P^n(z, w))^{1/n} ≥ lim sup_{n→∞} (P^n(x, y))^{1/n}.

Exchanging the roles of x, y and z, w, we get that the lim sup does not depend on the choice of
x, y. □
• Definition 17.2 (Spectral Radius). Let (G, c) be a weighted graph with transition matrix P. Define the spectral radius of (G, c) to be
ρ(G, c) = ρ(P) := lim sup_{n→∞} (P^n(x,x))^{1/n}.
One of the reasons for the name spectral radius is that by the Cauchy-Hadamard criterion, the generating function of the Green function has radius of convergence ρ^{−1}. That is, the function
g(x, y | z) = ∑_{n=0}^∞ P^n(x,y) z^n
converges when |z| < ρ^{−1}.
Jacques Hadamard (1865-1963)
Note that ρ ≤ 1, and that g(x, y|z = 1) is exactly the Green function. Since the Green
function converges if and only if G is transient, we have that for recurrent graphs ρ = 1. The
natural question arises, what are precisely the cases for which ρ = 1? This has been answered
by Kesten in his PhD thesis in 1959, see Theorem 18.1 below.
The above is a good reason for the radius part of the name spectral radius. The next proposition explains the spectral part of the name.
• Proposition 17.3. Let (G, c) be a weighted graph with transition matrix P. Then ‖P‖ = ρ(P). Moreover, for any x, y,
P^n(x,y) ≤ √(c_y/c_x) · ρ(P)^n.
Proof. First, note that for any x, y,
P^n(x,y) = c_x^{−1} ⟨P^n δ_y, δ_x⟩ ≤ c_x^{−1} ‖P‖^n ‖δ_y‖ · ‖δ_x‖ = √(c_y/c_x) ‖P‖^n.
Thus, ρ(P) ≤ ‖P‖.
The other direction is a bit more complicated. Let f ∈ L²(V) have finite support S ⊂ V.
Now, because S is finite, for every ε > 0 there exists N = N(ε, S) such that for all n > N and all x, y ∈ S we have P^{2n}(x,y) ≤ (ρ(P) + ε)^{2n}. Thus, for all n > N(ε, S),
‖P^n f‖² = ⟨P^{2n} f, f⟩ = ∑_{x,y} c_x P^{2n}(x,y) f(x) f(y)
≤ ∑_{x,y} c_x P^{2n}(x,y) |f(x)| |f(y)|
≤ (ρ(P) + ε)^{2n} · ∑_{x,y∈S} c_x |f(x)| |f(y)| = C_f (ρ(P) + ε)^{2n}.
Thus, lim sup_n ‖P^n f‖^{1/n} ≤ ρ(P) + ε for every ε, and so lim sup_n ‖P^n f‖^{1/n} ≤ ρ(P).
Now, consider the sequence a_n = ‖P^n f‖. We have that
a_{n+1}² = ⟨P^{n+1} f, P^{n+1} f⟩ = ⟨P^n f, P^{n+2} f⟩ ≤ ‖P^n f‖ · ‖P^{n+2} f‖ = a_n · a_{n+2}.
That is, b_n := a_{n+1}/a_n is a non-decreasing sequence. Thus, the following limits exist and satisfy
sup_n b_n = lim_{n→∞} b_n = lim_{n→∞} a_n^{1/n} ≤ ρ(P).
So
‖Pf‖ / ‖f‖ = b_0 ≤ sup_n b_n ≤ ρ(P).
This holds for all finitely supported f.
We want this to hold for all f ∈ L²(V). We now use the fact that the finitely supported functions are dense in L²(V). Indeed, let f ∈ L²(V) and fix ε > 0. Since ∑_x c_x f(x)² < ∞, there exists a finite set S_ε ⊂ V such that
∑_{x∉S_ε} c_x f(x)² < ε².
Thus, setting g = f 1_{S_ε}, we have that ‖f − g‖² < ε². Now, since g is finitely supported, and since ‖g‖ ≤ ‖f‖,
‖Pf‖ = ‖P(f − g) + Pg‖ ≤ ‖P‖ · ‖f − g‖ + ‖Pg‖ ≤ ‖P‖ ε + ρ(P) ‖g‖ ≤ ‖P‖ ε + ρ(P) ‖f‖.
Taking ε → 0, ‖Pf‖ ≤ ρ(P) ‖f‖. Since this holds for all f, we get that ‖P‖ ≤ ρ(P). □
Exercise 17.1. Let (G, c) be a weighted graph with transition matrix P . Let ρ(P ) be
the spectral radius. Show that if G is recurrent then ρ(P ) = 1.
17.1.1. Energy minimization. Let (G, c) be a weighted graph. Consider the functions on G
with finite support; i.e. L0(V ). These all have finite energy. We want to find the function that
minimizes the energy, when normalized to have length 1.
• Proposition 17.4. Let (G, c) be a weighted graph. Then
1 − ρ(G) = inf_{0≠f∈L_0(V)} E(f) / (2 ⟨f, f⟩).
(Sometimes 1 − ρ is called the spectral gap. This is the minimal possible energy of unit length functions.)
Proof. Note that for f ∈ L_0(V) we can use duality so that
(1/2) E(f) = ⟨∆f, f⟩ = ⟨f, f⟩ − ⟨Pf, f⟩.
Thus, it suffices to show that
ρ(P) = ρ̄ := sup_{0≠f∈L_0(V)} ⟨Pf, f⟩ / ⟨f, f⟩.
Now, for any f ≠ 0 we have by Cauchy-Schwarz
|⟨Pf, f⟩| ≤ ‖Pf‖ · ‖f‖ ≤ ‖P‖ · ⟨f, f⟩,
so ρ̄ ≤ ‖P‖ = ρ(P). On the other hand, since P is self-adjoint, for any f, g ∈ L_0(V),
⟨Pf, g⟩ = (1/4) (⟨P(f+g), f+g⟩ − ⟨P(f−g), f−g⟩).
So
⟨Pf, g⟩ ≤ (ρ̄/4) · (⟨f+g, f+g⟩ + ⟨f−g, f−g⟩) = (ρ̄/2) · (⟨f, f⟩ + ⟨g, g⟩).
Now take g = (‖f‖/‖Pf‖) Pf. Plugging this in above gives
‖f‖ · ‖Pf‖ = (‖f‖/‖Pf‖) · ⟨Pf, Pf⟩ ≤ (ρ̄/2) ( ⟨f, f⟩ + (‖f‖²/‖Pf‖²) ⟨Pf, Pf⟩ ) = ρ̄ ‖f‖².
So ‖Pf‖ ≤ ρ̄ ‖f‖ for all f ∈ L_0(V).
Using the fact that L_0(V) is dense in L²(V) completes the proof: for any f ∈ L²(V) and any ε > 0 find g ∈ L_0(V) such that ‖f − g‖ < ε and ‖g‖ ≤ ‖f‖. Then,
‖Pf‖ ≤ ‖P(f − g)‖ + ‖Pg‖ ≤ ‖P‖ ε + ρ̄ ‖g‖ ≤ ‖P‖ ε + ρ̄ ‖f‖.
Taking ε → 0 gives that ρ(P) = ‖P‖ ≤ ρ̄. □
17.2. Isoperimetric Constant
For a graph G, we are interested in how small a boundary of a set can be, compared to the
volume of that set. These serve as bottlenecks in the graph, so a random walk can get “stuck”
inside for a while. Thus, it makes sense to define the following.
• Definition 17.5. Let (G, c) be a weighted graph. Let S ⊂ G be a finite subset. Define the (edge) boundary of S to be
∂S = {(x,y) ∈ E(G) : x ∈ S, y ∉ S}.
Define the isoperimetric constant of G to be
Φ = Φ(G, c) := inf { c(∂S)/c(S) : S is a finite connected subset of G },
where c(∂S) = ∑_{e∈∂S} c(e) and c(S) = ∑_{x∈S} c_x.
Of course 1 ≥ Φ(G) ≥ 0 for any graph. When Φ(G) > 0, we have that sets “expand”: the
edges coming out of a set carry a constant proportion of the weight of the set.
• Definition 17.6. Let (G, c) be a weighted graph. If Φ(G, c) = 0 we say that (G, c) is amenable.
Otherwise we call (G, c) non-amenable.
A sequence of finite connected sets (S_n)_n such that c(∂S_n)/c(S_n) → 0 is called a Følner sequence, and the sets are called Følner sets.
Erling Følner (1919-1991)
The concept of amenability was introduced by von Neumann in the context of groups and the Banach-Tarski paradox. Følner's criterion using boundaries of sets provided the ability to carry over the concept of amenability to other geometric objects such as graphs.
The isoperimetric constant is a geometrical object. It turns out that positivity of the isoperi-
metric constant is equivalent to the spectral radius being strictly less than 1.
John von Neumann (1903-1957)
Exercise 17.2. Let S ⊂ T_d be a finite connected subset, with |S| ≥ 2. Show that
|∂S| = |S|(d − 2) + 2.
Deduce that Φ(T_d) = (d−2)/d.
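The counting behind Exercise 17.2 can be checked mechanically. The helper below is my own illustration (not from the notes): it counts the vertices of a ball S of radius r in T_d level by level, together with the edges leaving S, and compares against |S|(d−2)+2.

```python
def tree_ball_counts(d, r):
    """Vertex count |S| and boundary edge count |dS| for the ball of
    radius r around the root of the d-regular tree T_d."""
    # the root has d children; every other vertex has d-1 children
    sizes = [1]                      # vertices at each distance from the root
    for k in range(1, r + 1):
        sizes.append(d if k == 1 else sizes[-1] * (d - 1))
    S = sum(sizes)
    # each vertex at distance r has d-1 edges leaving the ball
    # (the root alone has d such edges when r = 0)
    boundary = d if r == 0 else sizes[r] * (d - 1)
    return S, boundary
```

Since |∂S| = |S|(d−2)+2 and every vertex has degree d, the ratio |∂S|/(d|S|) tends to (d−2)/d as the ball grows, matching Φ(T_d).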
Number of exercises in lecture: 2
Total number of exercises until here: 27
Random Walks
Ariel Yadin
Lecture 18: Kesten’s Amenability Criterion
18.1. Kesten’s Thesis
Kesten, in his PhD thesis in 1959 proved the connection between amenability and spectral
radius strictly less than 1. This was subsequently generalized to more general settings by others
(including Cheeger, Dodziuk, Mohar).
••• Theorem 18.1. A weighted graph (G, c) is amenable if and only if ρ(G, c) = 1. In fact,
Φ²/2 ≤ 1 − √(1 − Φ²) ≤ 1 − ρ ≤ Φ.
Harry Kesten (1931–)
First we require the following lemma.
• Lemma 18.2. Let (G, c) be a weighted graph. For any f ∈ L_0(V) (that is, with finite support),
2Φ(G, c) · ∑_x c_x f(x) ≤ ∑_{x,y} |∇f(x,y)|.
Note that if f = 1_S for a finite set S, this is exactly the definition of Φ.
Proof. Since f has finite support we can write
∫_0^∞ c({x : f(x) > t}) dt = ∫_0^∞ ∑_x c_x 1_{f(x)>t} dt = ∑_x c_x f(x) 1_{f(x)≥0} ≥ ∑_x c_x f(x).
Also,
∫_0^∞ 1_{f(x)>t≥f(y)} dt = |f(x) − f(y)| 1_{f(x)≥f(y)}.
Using the set S_t = {x : f(x) > t} we see that
∂S_t = {(x,y) ∈ E : f(x) > t ≥ f(y)}.
Since for any t, Φ · c(S_t) ≤ c(∂S_t), we can integrate over t to get
Φ · ∑_x c_x f(x) ≤ ∫_0^∞ Φ · c(S_t) dt ≤ ∫_0^∞ ∑_{x,y} c(x,y) 1_{f(x)>t≥f(y)} dt
= ∑_{x,y} c(x,y) |f(x) − f(y)| 1_{f(x)≥f(y)} ≤ (1/2) ∑_{x,y} |∇f(x,y)|,
where we have used the fact that all sums are finite because f has finite support. □
Proof of Theorem 18.1. The leftmost inequality is just ξ²/2 ≤ 1 − √(1 − ξ²), valid for any ξ ∈ [0, 1].
The rightmost inequality follows by taking a sequence of finite connected sets (S_n)_n such that Φ = lim_{n→∞} c(∂S_n)/c(S_n). Since
(∇1_{S_n}(x,y))² = c(x,y)² (1_{(x,y)∈∂S_n} + 1_{(y,x)∈∂S_n}),
we get
(1/2) E(1_{S_n}) = (1/2) ∑_{x,y} r(x,y) (∇1_{S_n}(x,y))² = ∑_{x,y} c(x,y) 1_{(x,y)∈∂S_n} = c(∂S_n).
Also, ⟨1_{S_n}, 1_{S_n}⟩ = ∑_x c_x 1_{x∈S_n} = c(S_n). Thus,
1 − ρ = inf_{0≠f∈L_0(V)} E(f)/(2⟨f,f⟩) ≤ lim_{n→∞} c(∂S_n)/c(S_n) = Φ.
The central inequality is Φ² ≤ 1 − ρ². We use that
1 − ρ = inf_{0≠f∈L_0(V)} E(f)/(2⟨f,f⟩)   and   ρ ≥ sup_{0≠f∈L_0(V)} ⟨Pf,f⟩/⟨f,f⟩.
First, for f ∈ L_0(V),
2⟨f,f⟩ + 2⟨Pf,f⟩ = ∑_{x,y} c(x,y) f(x)² + ∑_{x,y} c(x,y) f(y)² + 2 ∑_{x,y} c(x,y) f(x) f(y) = ∑_{x,y} c(x,y) (f(x) + f(y))².
For g = f², by Lemma 18.2,
⟨f,f⟩ = ∑_x c_x g(x) ≤ (2Φ)^{−1} ∑_{x,y} c(x,y) |g(x) − g(y)| = (2Φ)^{−1} ∑_{x,y} c(x,y) |f(x) − f(y)| · |f(x) + f(y)|.
Applying Cauchy-Schwarz,
4Φ² · ⟨f,f⟩² ≤ ∑_{x,y} c(x,y) (f(x) − f(y))² · ∑_{x,y} c(x,y) (f(x) + f(y))² = E(f) · (2⟨f,f⟩ + 2⟨Pf,f⟩) ≤ 2 E(f) · ⟨f,f⟩ · (1 + ρ).
Rearranging, we get that for any f ∈ L_0(V),
4Φ² ≤ (2E(f)/⟨f,f⟩) · (1 + ρ).
Taking the infimum over all f ∈ L_0(V), we get 4Φ² ≤ 4(1 − ρ)(1 + ρ), i.e. Φ² ≤ 1 − ρ², as required. □
Example 18.3. Let us calculate ρ(T_d), the spectral radius of the d-regular tree.
Let r be the root of T_d, and let T_n = {x : dist(x, r) = n}. For one direction, consider the function
f_n(x) = ∑_{k=1}^n (d−1)^{−k/2} · 1_{x∈T_k} = 1_{1≤dist(x,r)≤n} · (d−1)^{−dist(x,r)/2}.
If x ∼ y then c(x,y) f_n(x) f_n(y) = (d−1)^{−(dist(x,r)+dist(y,r))/2} if 1 ≤ dist(x,r), dist(y,r) ≤ n, and 0 otherwise. Thus, since |T_k| = d(d−1)^{k−1} and c_x = d,
‖f_n‖² = ∑_{x : 1≤dist(x,r)≤n} c_x (d−1)^{−dist(x,r)} = ∑_{k=1}^n d(d−1)^{k−1} · d · (d−1)^{−k} = d² (d−1)^{−1} n.
Similarly,
Pf_n(x) = 2(d−1)^{1/2} d^{−1} · (d−1)^{−dist(x,r)/2}   if 2 ≤ dist(x,r) ≤ n−1,
Pf_n(x) = (d−1)^{1/2} d^{−1} · (d−1)^{−dist(x,r)/2}    if dist(x,r) ∈ {1, n},
Pf_n(r) = (d−1)^{−1/2},
Pf_n(x) = d^{−1} (d−1)^{−n/2}                          if dist(x,r) = n+1,
and Pf_n(x) = 0 otherwise. So,
‖Pf_n‖² = ∑_x c_x (Pf_n(x))² = ∑_{k=2}^{n−1} d(d−1)^{k−1} · d · 4(d−1) d^{−2} (d−1)^{−k} + d(d−1)^{−1} + 1 + 1 + 1
= 4(n−2) + d(d−1)^{−1} + 3 = 4n − 5 + d(d−1)^{−1},
where the three 1's are the contributions of the spheres at distances 1, n and n+1, and d(d−1)^{−1} is the contribution of the root. This implies that
ρ(T_d) ≥ ‖Pf_n‖ / ‖f_n‖ → 2√(d−1)/d.
For the other direction, since Φ(T_d) = (d−2)/d, Theorem 18.1 gives
ρ(T_d) ≤ √(1 − Φ(T_d)²) = 2√(d−1)/d.
Number of exercises in lecture: 0
Total number of exercises until here: 27
Random Walks
Ariel Yadin
Lecture 19:
19.1. Speed of Random Walks
Let (G, c) be a weighted graph and let (X_t)_t be the corresponding weighted random walk. In the exercises one shows that the limit
lim_{t→∞} E_x[dist(X_t, X_0)] / t
exists for transitive graphs, and is independent of the choice of starting vertex x. We call this limit the speed of the random walk. For a general graph this limit may not exist, so we consider the lim inf and lim sup of the sequence. Of course, these limits lie between 0 and 1.
• Definition 19.1. Let (G, c) be a weighted graph and let (X_t)_t be the corresponding weighted random walk. Fix some o ∈ G. The lower speed and upper speed are defined to be
lim inf_{t→∞} E_o[dist(X_t, X_0)] / t   and   lim sup_{t→∞} E_o[dist(X_t, X_0)] / t.
If these limits coincide, we call the corresponding limit the speed.
Example 19.2. Let us calculate the speed of the random walk on T_d.
Fix o ∈ T_d. Let (X_t)_t be the random walk and define D_t = dist(X_t, o). Let L_t = L_t(o) = ∑_{k=0}^t 1_{X_k=o} and L_{−1} = 0.
Consider the sequence M_t = dist(X_t, o) − ((d−2)/d) t − (2/d) L_{t−1}. Note that
E_o[dist(X_{t+1}, o) | F_t] = 1_{X_t=o} + 1_{X_t≠o} ( (dist(X_t,o)+1)(d−1)/d + (dist(X_t,o)−1)(1/d) )
= dist(X_t,o) + (d−2)/d + 1_{X_t=o} · (1 − (d−2)/d − dist(X_t,o))
= dist(X_t,o) + (d−2)/d + (2/d) 1_{X_t=o},
where we have used that dist(X_t,o) 1_{X_t=o} = 0. Thus,
E_o[M_{t+1} | F_t] = dist(X_t,o) + (d−2)/d + (2/d) 1_{X_t=o} − ((d−2)/d)(t+1) − (2/d) L_t
= dist(X_t,o) − ((d−2)/d) t − (2/d) L_{t−1} = M_t.
So (M_t)_t is a martingale. This implies that
0 = E_o[M_t] = E_o[dist(X_t,o)] − ((d−2)/d) t − (2/d) E_o[L_{t−1}].
Since T_d is transient, we know by monotone convergence that
lim_{t→∞} E_o[L_{t−1}] = E_o[V_∞(o) + 1] = 1 / P_o[T_o⁺ = ∞] < ∞.
Thus,
lim_{t→∞} (1/t) E_o[dist(X_t,o)] = (d−2)/d.
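The martingale computation above can be verified exactly by dynamic programming on the distance chain of T_d (distance increases with probability (d−1)/d away from the root and always at the root; this reduction is my assumption for the illustration). The sketch checks the identity E_o[D_t] = ((d−2)/d) t + (2/d) E_o[L_{t−1}] to floating-point accuracy, and that E_o[D_t]/t approaches (d−2)/d.

```python
def td_distance_stats(d, t):
    """DP for the distance-from-root chain on T_d.  Returns E[dist(X_t, o)]
    and E[L_{t-1}], the expected number of visits to o at times 0..t-1."""
    p = [0.0] * (t + 2)
    p[0] = 1.0
    visits = 0.0
    for _ in range(t):
        visits += p[0]                 # a visit to the root at the current time
        q = [0.0] * (t + 2)
        for k, mass in enumerate(p):
            if mass == 0.0:
                continue
            if k == 0:
                q[1] += mass
            else:
                q[k + 1] += mass * (d - 1) / d
                q[k - 1] += mass / d
        p = q
    mean_dist = sum(k * mass for k, mass in enumerate(p))
    return mean_dist, visits
```

For d = 4 the speed is (d−2)/d = 1/2, and the finite-t correction (2/d) E_o[L_{t−1}] stays bounded by the transience of T_4.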
It is not a coincidence that T_d has positive speed. In fact, this has to do with the fact that ρ(T_d) < 1, i.e. that T_d is non-amenable.
••• Theorem 19.3. Let (G, c) be a weighted graph, and let (X_t)_t be the corresponding random walk started at some o ∈ G. Assume the following:
• ρ(G, c) < 1.
• There exists M > 0 such that c_x ≤ M for all x (i.e. c_x is uniformly bounded).
• The limit
b := lim sup_{r→∞} |B(o,r)|^{1/r} < ∞
is finite, where B(o,r) is the ball of radius r around o.
Then, the lower speed is positive. In fact, a.s.
lim inf_{t→∞} (1/t) dist(X_t, o) ≥ − log ρ(G) / log b > 0.
Proof. Let 0 < α < −log ρ / log b, so that ρ b^α < 1. We can choose λ > b such that ρ λ^α < 1. Because λ > b, there exists some universal constant K > 0 such that |B(o,r)| ≤ K λ^r for all r. Because c_x is uniformly bounded, K can be chosen large enough so that K ≥ √(M/c_o), and then, by Proposition 17.3, P^t(o,x) ≤ K ρ^t for all x and all t. Combining these two bounds we get that
P[dist(X_t, o) ≤ αt] ≤ ∑_{x∈B(o,⌊αt⌋)} P^t(o,x) ≤ K² ρ^t λ^{αt}.
Since ρ λ^α < 1, these probabilities are summable. By Borel-Cantelli, we have that
P[lim inf (1/t) dist(X_t, o) ≤ α] = 0.
Taking α → −log ρ / log b completes the proof. □
X Recall that by Fatou's Lemma,
−log ρ / log b ≤ E_o[lim inf t^{−1} dist(X_t, o)] ≤ lim inf t^{−1} E_o[dist(X_t, o)].
So, non-amenable graphs have positive (lower) speed.
Example 19.4. For all d, the random walk on Z^d has zero speed. In fact, we show that for a random walk (X_t)_t on Z^d, E_0[dist(X_t, 0)] ≍ t^{1/2}.
Consider the j-th coordinate X_t(j). We have
E_0[X_{t+1}(j)² | F_t] = (1/(2d)) · ((X_t(j)+1)² + (X_t(j)−1)²) + (1 − 1/d) · X_t(j)² = X_t(j)² + 1/d.
Thus, M_t = X_t(j)² − t/d is a martingale, and 0 = E_0[M_t] = E_0[X_t(j)²] − t/d. So E_0[|X_t(j)|] ≤ √(t/d), and
E_0[dist(X_t, 0)] ≤ ∑_{j=1}^d E_0[|X_t(j)|] ≤ √(dt).
Also, note that we can write
X_t(j) = ∑_{k=1}^t ξ_k,
where (ξ_k)_k are i.i.d. random variables with P[ξ_k = 1] = P[ξ_k = −1] = 1/(2d) and P[ξ_k = 0] = 1 − 1/d. Since E[ξ_k] = 0 and Var[ξ_k] = E[ξ_k²] = 1/d, we get by the central limit theorem that √d · t^{−1/2} X_t(j) converges in distribution to a standard normal random variable N(0,1). So
lim_{t→∞} P_0[ √d |X_t(j)| ≥ (1/2)√t ] = P[|N(0,1)| ≥ 1/2] =: c > 0.
Thus,
lim inf_{t→∞} (1/√t) E_0[|X_t(j)|] ≥ lim inf_{t→∞} (1/√t) · P_0[|X_t(j)| ≥ (1/2)√t d^{−1/2}] · √t/(2√d) = c/(2√d),
and so
lim inf_{t→∞} (1/√t) E_0[dist(X_t, 0)] ≥ (c/2) √d.
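The coordinate martingale gives the exact identity E_0[X_t(j)²] = t/d, which a direct dynamic program over one (lazy) coordinate reproduces to floating-point accuracy. This is a quick sanity check of my own, with the ±1/hold step distribution taken from the display above.

```python
from collections import defaultdict

def lazy_coord_second_moment(d, t):
    """One coordinate of the walk on Z^d: +-1 with probability 1/(2d) each,
    hold with probability 1 - 1/d.  Returns E[X_t(j)^2], which equals t/d."""
    p = {0: 1.0}
    for _ in range(t):
        q = defaultdict(float)
        for x, m in p.items():
            q[x + 1] += m / (2 * d)
            q[x - 1] += m / (2 * d)
            q[x] += m * (1 - 1 / d)
        p = dict(q)
    return sum(x * x * m for x, m in p.items())
```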
Since many interesting graphs have zero speed, we are sometimes interested in a bit more precision.
• Definition 19.5. Let (G, c) be a weighted graph and let (X_t)_t be the corresponding weighted random walk. Fix some o ∈ G. The lower escape exponent and upper escape exponent are defined to be
lim inf_{t→∞} log E_o[dist(X_t, X_0)] / log t   and   lim sup_{t→∞} log E_o[dist(X_t, X_0)] / log t.
If these limits coincide, we call the corresponding limit the escape exponent (also called the speed exponent).
Example 19.6. T_d has escape exponent 1. In fact any graph with positive speed has escape exponent 1. (This is immediate from log E_o[dist(X_t, X_0)] = log( (1/t) E_o[dist(X_t, X_0)] ) + log t.)
Example 19.7. Z^d has escape exponent 1/2, as shown above.
The escape exponent 1/2 plays an important role in the theory. Walks with escape exponent 1/2 are called diffusive. Walks with escape exponent < 1/2 (resp. > 1/2) are called sub-diffusive (resp. super-diffusive). Walks with escape exponent 1 (in particular, walks with positive speed) are called ballistic.
19.2. Graph Powers
If G is a graph, there is a natural graph structure on V (G)d: Define the graph Gd to be as
follows. The vertex set of Gd is V (Gd) = V (G)d. The edges are define by the relations:
(x1, . . . , xd) ∼ (y1, . . . , yd) ⇐⇒ ∃ k : ∀ j 6= k , xj = yj and xk ∼ yk.
• Lemma 19.8. Let G be a graph with speed exponent α. Then any lazy random walk on G has speed exponent α. Moreover, for any d ≥ 1, the graph G^d has speed exponent α as well.
Proof. First,
Exercise 19.1. Show that
dist_{G^d}((x_1, …, x_d), (y_1, …, y_d)) = ∑_{j=1}^d dist_G(x_j, y_j).
Now, let (X_t)_t be a random walk on G^d and let X_t(j) be the j-th coordinate of X_t. Note that (X_t(j))_t is a lazy random walk on G with holding probability 1 − 1/d. Then,
dist(X_t, X_0) = ∑_j dist(X_t(j), X_0(j)),
so it suffices to prove that any lazy walk on G has speed exponent α.
Let (Y_t)_t be a lazy walk on G with holding probability p. Let (X_t)_t be a simple random walk on G. Suppose that P is the transition matrix for the simple random walk (X_t)_t on G, so that the transition matrix for (Y_t)_t is Q = pI + (1−p)P. Let f(x) = dist(x, o). We have that
Q^t = ∑_{k=0}^t C(t,k) (1−p)^k p^{t−k} · P^k,
so
E_o[dist(Y_t, o)] = ∑_x Q^t(o,x) dist(x,o) = (Q^t f)(o) = ∑_{k=0}^t C(t,k) (1−p)^k p^{t−k} · E_o[dist(X_k, o)].
Now, for any ε > 0 there exists K_ε such that for all k > K_ε,
k^{α−ε} ≤ E_o[dist(X_k, o)] ≤ k^{α+ε}.
Let B_t ∼ Bin(t, 1−p), and let q_k = C(t,k) (1−p)^k p^{t−k} = P[B_t = k]. By Chebyshev's inequality,
P[ |B_t − (1−p)t| > (1/2)(1−p)t ] ≤ 4 Var[B_t] / ((1−p)² t²) = 4p / ((1−p)t),
so
P[B_t ≥ (1/2)(1−p)t] ≥ 1 − 4p/((1−p)t) → 1.
Hence, for ε > 0, for all large enough t (so that (1−p)t > 2K_ε),
∑_{k=0}^t q_k E_o[dist(X_k, o)] ≥ P[B_t ≥ (1/2)(1−p)t] · ( (1/2)(1−p)t )^{α−ε},
which implies that
lim inf_{t→∞} log E_o[dist(Y_t, o)] / log t ≥ α − ε + lim_{t→∞} log( P[B_t ≥ (1/2)(1−p)t] · ((1−p)/2)^{α−ε} ) / log t = α − ε.
On the other hand,
∑_{k=0}^t q_k E_o[dist(X_k, o)] ≤ K_ε + t^{α+ε},
so
lim sup_{t→∞} log E_o[dist(Y_t, o)] / log t ≤ α + ε + lim_{t→∞} log(1 + K_ε/t^{α+ε}) / log t = α + ε. □
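The identity E_o[dist(Y_t, o)] = ∑_k C(t,k)(1−p)^k p^{t−k} E_o[dist(X_k, o)] used in the proof says that a lazy walk is a simple walk run for a Binomial(t, 1−p) number of steps. On G = Z (my choice of a small test graph) both sides can be computed exactly:

```python
from math import comb

def simple_walk_mean_abs(tmax):
    """E_0[|X_k|] for the simple random walk on Z, for k = 0..tmax, by DP."""
    dist = {0: 1.0}
    means = [0.0]
    for _ in range(tmax):
        nxt = {}
        for x, m in dist.items():
            nxt[x + 1] = nxt.get(x + 1, 0.0) + m / 2
            nxt[x - 1] = nxt.get(x - 1, 0.0) + m / 2
        dist = nxt
        means.append(sum(abs(x) * m for x, m in dist.items()))
    return means

def lazy_walk_mean_abs(t, p):
    """E_0[|Y_t|] for the lazy walk on Z with holding probability p, by DP."""
    dist = {0: 1.0}
    for _ in range(t):
        nxt = {}
        for x, m in dist.items():
            nxt[x] = nxt.get(x, 0.0) + m * p
            nxt[x + 1] = nxt.get(x + 1, 0.0) + m * (1 - p) / 2
            nxt[x - 1] = nxt.get(x - 1, 0.0) + m * (1 - p) / 2
        dist = nxt
    return sum(abs(x) * m for x, m in dist.items())

t, p = 60, 0.3
means = simple_walk_mean_abs(t)
mixture = sum(comb(t, k) * (1 - p) ** k * p ** (t - k) * means[k] for k in range(t + 1))
lazy = lazy_walk_mean_abs(t, p)      # the two quantities agree up to rounding
```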
Number of exercises in lecture: 1
Total number of exercises until here: 28
Random Walks
Ariel Yadin
Lecture 20:
20.1. Lamp-Lighter Graphs
We have already seen that non-amenable graphs must have positive speed and so escape
exponent 1. Non-amenable graphs are also transient, because their spectral radius is strictly less
than 1. The converses of these statements do not hold.
Figure 5 sums up the situation (for graphs) in terms of speed, amenability and transience.
[Figure 5. Possibilities for speed, amenability and transience (diagram omitted in this text version). The examples placed in the diagram include Z, Z³, LL(Z), LL(Z³), T_d, τ³ for τ = τ((β log₂ k)_k), and τ(Ackermann), arranged according to zero vs. positive speed, transience vs. recurrence, non-amenability, sub-diffusivity, and the exponential growth line.]
We will now construct a special class of graphs called lamp-lighter graphs. These are used
to give many examples in geometric group theory. They will provide us with examples of
(exponential volume growth) amenable graphs with positive speed.
Let us describe the construction in words, before the formal definition. We start with any
graph G (finite or infinite). This is the base graph. Suppose some lamp-lighter walks around
on the graph G. At every site of G there is some lamp, whose state is either on or off. The
lamp-lighter walks around and can also change the state of the lamp at her current position -
changing it either to on or to off.
What is a position in this new space? A position consists of the configuration of all lamps on
G, that is, a function from G to 0, 1 and the position of the lamp-lighter, i.e. a vertex in G.
• Definition 20.1 (Lamp-Lighter Chain). Let P be a Markov chain on state space S. We define the Markov chain LL(P), called the lamp-lighter on P, as follows.
The state space for LL(P) is LL(S) := S × ({0,1}^S)_c, where ({0,1}^S)_c is the set of ω : S → {0,1} with finite support (i.e. ω^{−1}(1) is finite). For a state (x, ω) ∈ LL(S), we call x the position of the lamp-lighter. If ω(y) = 1 we say the lamp at y is on, and if ω(y) = 0 we say it is off.
For a lamp configuration ω ∈ ({0,1}^S)_c and a position x ∈ S we define ω^x ∈ {0,1}^S by
ω^x(y) = ω(y) for all y ≠ x,   and   ω^x(x) = ω(x) + 1 (mod 2).
Define the transition matrix LL(P) by setting
LL(P)((x, ω), (y, η)) = (1/4) P(x, y)   for η ∈ {ω, ω^x, ω^y, (ω^x)^y},
and 0 otherwise.
If (G, c) is a weighted graph, then LL(G) = LL(P) for P the transition matrix of the weighted random walk on (G, c).
X Note that the chain LL(P ) evolves as follows: At each step, the lamp-lighter chooses a
neighbor of her current position with distribution P (x, ·) and moves there, then she refreshes
the state of the lamps at the old position and at the new position to on or off with probability
1/2 each, independently.
Remark 20.2. If G is a graph, then LL(G) defines a graph structure as well: LL(P)((x,ω),(y,η)) > 0 if and only if P(x,y) > 0 and η ∈ {ω, ω^x, ω^y, (ω^x)^y}. So the graph structure on LL(G) is given by the relations (x, ω) ∼ (y, η) for x ∼ y and η ∈ {ω, ω^x, ω^y, (ω^x)^y}.
In fact:
Exercise 20.1. Suppose that (G, c) is a weighted graph, and P is the transition matrix of the weighted random walk on G. Show that LL(P) is given by a weighted random walk on a weighted graph whose vertices are (x, ω), x ∈ G, ω ∈ ({0,1}^G)_c. What is the weight function on this graph?
Exercise 20.2. Let P be an irreducible Markov chain. Let (X_t, ω_t)_t be Markov-LL(P). Show that (X_t)_t is Markov-P.
Exercise 20.3. Let G be a graph, and let L = LL(G). Let o ∈ G and let 0 ∈ {0,1}^G denote the all-zero function (configuration). Then, for any (x, ω) ∈ L,
dist_L((x, ω), (o, 0)) ≥ |ω^{−1}(1)|.
The next example is an (exponential volume growth) amenable graph, but with positive speed.
Example 20.3. Consider L = LL(Z³).
First we show that L is amenable; we only need to exhibit a Følner sequence. Let (B_r)_r be a Følner sequence in Z³ (e.g. the L^∞ balls of radius r). Let
A_r = { (x, ω) ∈ L : x ∈ B_r, ω^{−1}(1) ⊂ B_r }.
Note that |A_r| = |B_r| 2^{|B_r|}. Also, ((x,ω),(y,η)) ∈ ∂A_r if and only if (x,y) ∈ ∂B_r and η ∈ {ω, ω^x, ω^y, (ω^x)^y}. Thus, |∂A_r| = 4 |∂B_r| 2^{|B_r|}. Since the degree in L is 24 (six choices of neighbor in Z³ times four lamp updates),
Φ(L) ≤ inf_r |∂A_r| / (24 |A_r|) = inf_r |∂B_r| / (6 |B_r|) = 0,
and so L is amenable.
Next we show that L has positive speed. Let 0 denote the all-zero lamp configuration, and let o = (0, 0) ∈ L. Let (X_t, ω_t) be a random walk on L. We claim that for any z ≠ 0,
(20.1)   P_o[ω_t(z) = 1] = (1/2) · P_o[T_z⁺ ≤ t],
where T_z⁺ = inf{t ≥ 1 : X_t = z}.
Given this, by Exercise 20.3 we have that
E_o[dist_L((X_t, ω_t), o)] ≥ E_o[|ω_t^{−1}(1)|] = ∑_z P[ω_t(z) = 1] = (1/2) · E_o[|R_t|],
where R_t = {X_1, …, X_t}. Since (X_t)_t is a random walk on Z³, we are left with showing that lim_t t^{−1} E^{Z³}_0[|R_t|] > 0. In fact, using Exercise 20.4 below,
E^{Z³}_0[|R_t|] / t → P^{Z³}_0[T_0⁺ = ∞] > 0.
We turn to proving (20.1). Let (y_0, η_0), …, (y_n, η_n) be a path in L. Let T = inf{t : y_t = z} (where inf ∅ = ∞). Define a new path
(y_0, η_0), …, (y_{T−1}, η_{T−1}), (y_T, η_T^z), (y_{T+1}, η_{T+1}^z), …, (y_n, η_n^z).
Since L is a regular graph, both paths have the same probability. Summing over all possible paths, we get that for any k ≤ t,
P_o[ω_t(z) = 1, T_z⁺ = k] = P_o[ω_t^z(z) = 1, T_z⁺ = k] = P_o[ω_t(z) = 0, T_z⁺ = k].
So
P_o[ω_t(z) = 1 | T_z⁺ = k] = P_o[ω_t(z) = 0 | T_z⁺ = k] = 1/2.
Thus,
P_o[ω_t(z) = 1] = ∑_{k=1}^t P_o[ω_t(z) = 1, T_z⁺ = k] = (1/2) ∑_{k=1}^t P_o[T_z⁺ = k] = (1/2) P_o[T_z⁺ ≤ t],
proving (20.1).
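The identity (20.1) holds by the same path-switching argument on any regular base graph, so it can be tested by Monte Carlo on the easier chain LL(Z). The sketch below is my own check (fixed seed, finite horizon): it uses the step description from the remark above, where refreshing a lamp to a uniform state has the same law as the four-way update in Definition 20.1.

```python
import random

def ll_z_check(t=20, z=3, trials=40000, seed=1):
    """Monte Carlo check of (20.1) on LL(Z): estimate P_o[w_t(z) = 1] and
    P_o[T_z^+ <= t].  One step: move the lamp-lighter by +-1, then refresh
    the lamps at the old and the new position to uniform states, independently."""
    rng = random.Random(seed)
    lamp_on = hit = 0
    for _ in range(trials):
        x, lamps, hit_z = 0, set(), False
        for _ in range(t):
            y = x + rng.choice((-1, 1))
            for pos in (x, y):           # refresh both endpoint lamps
                if rng.random() < 0.5:
                    lamps.add(pos)
                else:
                    lamps.discard(pos)
            x = y
            if x == z:
                hit_z = True
        lamp_on += (z in lamps)
        hit += hit_z
    return lamp_on / trials, hit / trials

p_lamp, p_hit = ll_z_check()
```

The lamp at z is refreshed exactly when the walker occupies z, and the last refresh is a fair coin, which is the content of (20.1); the two estimates should differ only by sampling noise.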
Exercise 20.4. Show that for d ≥ 3, if (X_t)_t is a random walk on Z^d, and R_t = {X_1, …, X_t} is the range, then
E_0[|R_t|] / t → P_0[T_0⁺ = ∞].
Example 20.4. We have already seen examples of amenable zero-speed graphs: Z^d. We in fact know that these are diffusive. Let us show that this can even be done with a graph of exponential volume growth: we will show that LL(Z) is (at most) diffusive.
Let (X_t, ω_t) be a random walk on L = LL(Z). Let o = (0, 0) ∈ L. Define M_t = max_{k≤t} |X_k|. Since the lamp-lighter up to time t never leaves [−M_t, M_t], we have P_o-a.s. that ω_t^{−1}(1) ⊂ [−M_t, M_t].
Note that at time t, the lamp-lighter can walk to one of the ends of [−M_t, M_t] in at most M_t steps, then start turning off all the lamps in [−M_t, M_t] in at most 2M_t steps, finally returning to 0 in at most another M_t steps. Thus, dist_L((X_t, ω_t), o) ≤ 4M_t for all t, P_o-a.s.
Thus it suffices to show that E[M_t] ≤ 2√t for all t. For this we use a trick called the reflection principle. For x ≥ 1, by the strong Markov property at time T_x,
P_0[X_t ≥ x, T_x ≤ t] = ∑_{k=0}^t P_0[T_x = k] · P_x[X_{t−k} ≥ x] ≥ (1/2) ∑_{k=0}^t P_0[T_x = k] = (1/2) P_0[T_x ≤ t],
where we have used transitivity, and symmetry by reflecting around 0:
P_x[X_s ≥ x] = P_0[X_s ≥ 0] = P_0[X_s ≤ 0] ≥ 1/2,
since P_0[X_s ≤ 0] + P_0[X_s ≥ 0] = 1 + P_0[X_s = 0] ≥ 1. We now have
P_0[max_{k≤t} X_k ≥ x] = P_0[T_x ≤ t] ≤ 2 P_0[X_t ≥ x].
Reflecting around 0,
P_0[min_{k≤t} X_k ≤ −x] = P_0[T_{−x} ≤ t] ≤ 2 P_0[X_t ≤ −x].
So
P_0[M_t ≥ x] ≤ P_0[max_{k≤t} X_k ≥ x] + P_0[min_{k≤t} X_k ≤ −x] ≤ 2 P_0[X_t ≥ x] + 2 P_0[X_t ≤ −x].
We conclude with
2 E[|X_t|] = 2 ∑_{x=0}^∞ P_0[|X_t| ≥ x+1] = ∑_{x=0}^∞ ( 2 P_0[X_t ≥ x+1] + 2 P_0[X_t ≤ −(x+1)] ) ≥ ∑_{x=0}^∞ P_0[M_t ≥ x+1] = E_0[M_t].
So E_0[M_t] ≤ 2 E[|X_t|] ≤ 2 √(E[X_t²]) = 2√t. Thus,
E_o[dist_L((X_t, ω_t), o)] ≤ 8√t.
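The chain of inequalities E_0[M_t] ≤ 2 E[|X_t|] ≤ 2√t can be confirmed exactly for moderate t by a joint dynamic program over (X_k, max_{j≤k} |X_j|); this brute-force check is my own addition for illustration.

```python
from collections import defaultdict

def max_abs_means(t):
    """Joint DP of (X_k, M_k) with M_k = max_{j<=k} |X_j| for the simple
    random walk on Z; returns (E[M_t], E[|X_t|])."""
    p = {(0, 0): 1.0}
    for _ in range(t):
        q = defaultdict(float)
        for (x, m), mass in p.items():
            for y in (x - 1, x + 1):
                q[(y, max(m, abs(y)))] += mass / 2
        p = dict(q)
    em = sum(m * mass for (_, m), mass in p.items())
    ea = sum(abs(x) * mass for (x, _), mass in p.items())
    return em, ea

EM, EA = max_abs_means(100)
```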
Number of exercises in lecture: 4
Total number of exercises until here: 32
Random Walks
Ariel Yadin
Lecture 21:
Our next goal is to complete the picture in Figure 5; that is, to give examples of graphs that are transient but have very slow speed (sub-diffusive), and examples of graphs that are recurrent but have positive upper speed.
21.1. Concentration of Martingales: Azuma's inequality
Let (X_t)_t be a random walk on Z. We know (using the martingale |X_t|² − t) that E_0[T_{−r,r}] = r². That is, it takes the random walk r² steps to reach distance r. We have already seen that this implies diffusive behavior of the walk.
Let us prove a short concentration result, showing that T_{−r,r} is very unlikely to be much smaller than r².
••• Theorem 21.1 (Azuma's Inequality). Let (M_t)_t be an (F_t)_t-martingale with bounded increments (i.e. |M_{t+1} − M_t| ≤ 1 a.s.). Then for any λ > 0,
P[M_t − M_0 ≥ λ] ≤ exp(−λ²/(2t)).
Proof. There are two main ideas.
The first idea is that for a random variable X with E[X] = 0 and |X| ≤ 1 a.s. one has E[e^{αX}] ≤ e^{α²/2}. Indeed, f(x) = e^{αx} is a convex function, so for |x| ≤ 1 we can write x = β · 1 + (1−β) · (−1), where β = (x+1)/2, so
e^{αx} ≤ β e^α + (1−β) e^{−α} = cosh(α) + x sinh(α).
(Here 2 cosh(α) = e^α + e^{−α} and 2 sinh(α) = e^α − e^{−α}.) Thus, because E[X] = 0, and using (2k)! ≥ 2^k k!,
E[e^{αX}] ≤ cosh(α) + E[X] sinh(α) = cosh(α) = ∑_{k=0}^∞ α^{2k}/(2k)! ≤ ∑_{k=0}^∞ α^{2k}/(2^k k!) = e^{α²/2}.
For the second idea, due to Sergei Bernstein, one applies the Chebyshev / Markov inequality to the non-negative random variable e^{αX}, and then optimizes over α.
Sergei Bernstein (1880-1968)
In our case: for every t, since E[M_t − M_{t−1} | F_{t−1}] = 0 and |M_t − M_{t−1}| ≤ 1, exactly as above we can show that
E[e^{α(M_t − M_{t−1})} | F_{t−1}] ≤ e^{α²/2} a.s.
Thus,
E[e^{α(M_t − M_0)}] = E[ e^{α(M_{t−1} − M_0)} · E[e^{α(M_t − M_{t−1})} | F_{t−1}] ] ≤ e^{α²/2} · E[e^{α(M_{t−1} − M_0)}] ≤ ··· ≤ e^{tα²/2}.
Now apply Markov's inequality to the non-negative random variable e^{α(M_t − M_0)} to get
P[M_t − M_0 ≥ λ] = P[e^{α(M_t − M_0)} ≥ e^{αλ}] ≤ exp( (1/2) t α² − αλ ).
Optimizing over α, i.e. taking α = λ/t, we get
P[M_t − M_0 ≥ λ] ≤ exp(−λ²/(2t)). □
Example 21.2. Let us apply Azuma's inequality to random walks on Z.
Let (X_t)_t be a random walk on Z. Recall that (X_t)_t is a martingale. Consider the stopping time T = T_{−r,r}; this is the first time |X_t| ≥ r. Recall the reflection principle:
P_0[T ≤ t] = P_0[max_{k≤t} |X_k| ≥ r] ≤ 2 P_0[X_t ≥ r] + 2 P_0[X_t ≤ −r] = 4 P_0[X_t ≥ r].
Using Azuma's inequality on this last term,
P_0[T_{−r,r} ≤ t] ≤ 4 exp(−r²/(2t)).
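The bound P_0[T_{−r,r} ≤ t] ≤ 4 exp(−r²/(2t)) can be compared with the exact hitting probability, computed by a dynamic program with absorbing barriers at ±r. This is an illustration of my own, not part of the notes.

```python
import math

def hit_prob(r, t):
    """P_0[T_{-r,r} <= t] for the simple random walk on Z: DP with
    absorbing barriers at +-r."""
    p = {0: 1.0}
    absorbed = 0.0
    for _ in range(t):
        q = {}
        for x, m in p.items():
            for y in (x - 1, x + 1):
                if abs(y) >= r:
                    absorbed += m / 2
                else:
                    q[y] = q.get(y, 0.0) + m / 2
        p = q
    return absorbed

# exact probability vs. the Azuma bound 4 exp(-r^2 / (2t)), with r = 20
ok = all(hit_prob(20, t) <= 4 * math.exp(-20 ** 2 / (2 * t))
         for t in (50, 100, 200, 400))
```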
21.2. Recurrent Trees - The Grove
Let (d_k)_{k∈N} be a sequence of positive numbers. For each k, let τ_k be a binary tree of depth d_k.
Define the graph τ((d_k)_k) to be the graph N with the tree τ_k glued at the vertex k ∈ N (letting the root of τ_k be k ∈ N); that is, the vertex set of τ((d_k)_k) is ⋃_{k=0}^∞ V(τ_k). The edges are those in each τ_k, with the edges k ∼ k+1 for all k added. We call this the (d_k)_k-grove.
• Proposition 21.3. The graph τ((d_k)_k) is recurrent.
[Figure 6. The graph τ((d_k)_k): the ray 0, 1, 2, 3, 4, … with binary trees of depths d_0, d_1, d_2, d_3, d_4, … attached at the corresponding vertices.]
Proof. Fix k, and let v be the unit voltage between 0 and τ_k. Then for any n ≤ k, and any vertex x ∈ τ_n, we have that v(x) = v(n). Indeed, if (X_t)_t is a random walk on this graph, then because τ_n is finite, P_x-a.s. the hitting time of n is finite; also, v(X_t) is a bounded martingale. Thus, by the optional stopping theorem, v(x) = E_x[v(X_{T_n})] = v(n).
Thus, we can short together all vertices in each tree τ_n, n ≤ k. This results in the network which is just the graph N. Thus, R_eff(0, τ_k) = k → ∞, and the grove is recurrent. □
Recall that if τ is a finite binary tree of depth d, then |E(τ)| = |V(τ)| − 1 = ∑_{k=0}^d 2^k − 1 = 2^{d+1} − 2.
• Lemma 21.4. Let r ∈ N ⊂ τ((d_k)_k). The hitting time T_r of r has expectation given by
E_0[T_r] = 4 ∑_{k=0}^{r−1} (r−k) 2^{d_k} − r(r+2).
Proof. Every time the walk is at a vertex k ∈ N, with probability 1/2 it starts an excursion into the finite subtree τ_k. The expected time to return to the root of a finite tree is the reciprocal of the stationary probability of the root; since the root of τ_k has degree 2 inside τ_k, this is 2|E(τ_k)|/2 = |E(τ_k)|. Thus, we have
λ_k := E_k[T_k⁺ | X_1 ∉ N] = |E(τ_k)| = 2(2^{d_k} − 1).
Now, by first-step analysis (each branch accounting for the first step taken), for k > 0,
E_k[T_{k+1}] = (1/4) · 1 + (1/4) · (1 + E_{k−1}[T_k] + E_k[T_{k+1}]) + (1/2) · (λ_k + E_k[T_{k+1}]).
Rearranging and iterating,
E_k[T_{k+1}] = 2λ_k + 2 + E_{k−1}[T_k] = ··· = 2 ∑_{j=1}^k λ_j + 2k + E_0[T_1].
Similarly, E_0[T_1] = (2/3) · (λ_0 + E_0[T_1]) + (1/3), so E_0[T_1] = 2λ_0 + 1. Thus,
E_k[T_{k+1}] = 2 ∑_{j=0}^k λ_j + 2k + 1 = ∑_{j=0}^k 2^{d_j+2} − (2k + 3),
and
E_0[T_r] = ∑_{k=0}^{r−1} E_k[T_{k+1}] = ∑_{k=0}^{r−1} (r−k) 2^{d_k+2} − ∑_{k=0}^{r−1} (2k+3) = 4 ∑_{k=0}^{r−1} (r−k) 2^{d_k} − r(r+2). □
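The lemma can be checked against an exact computation on small groves: build the finite graph up to the target vertex r and solve the hitting-time equations h(x) = 1 + (1/deg x) ∑_{y∼x} h(y), h(r) = 0, by value iteration. The solver below is my own sketch; its output agrees with the closed form 4 ∑_{k=0}^{r−1} (r−k) 2^{d_k} − r(r+2), whose constant term comes from the +2 per ray-step in the first-step analysis.

```python
def grove_hitting_time(ds):
    """Exact E_0[T_r] for the grove with tree depths ds = (d_0, ..., d_{r-1}):
    build the ray 0..r with a binary tree of depth d_k glued at each k < r,
    and solve the hitting-time equations by value iteration."""
    r = len(ds)
    adj = {}
    def add(u, v):
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    for k in range(r):
        add(('N', k), ('N', k + 1))
    for k, d in enumerate(ds):
        for lvl in range(d):            # nodes at level lvl get two children
            for i in range(2 ** lvl):
                s = format(i, 'b').zfill(lvl) if lvl else ''
                parent = ('N', k) if lvl == 0 else ('T', k, s)
                add(parent, ('T', k, s + '0'))
                add(parent, ('T', k, s + '1'))
    target = ('N', r)
    h = {u: 0.0 for u in adj}
    for _ in range(10000):              # h_n(u) = E_u[min(T, n)] -> E_u[T]
        h = {u: 0.0 if u == target else
             1.0 + sum(h[v] for v in adj[u]) / len(adj[u]) for u in adj}
    return h[('N', 0)]

def grove_formula(ds):
    """The closed form 4 * sum (r-k) 2^{d_k} - r(r+2)."""
    r = len(ds)
    return 4 * sum((r - k) * 2 ** d for k, d in enumerate(ds)) - r * (r + 2)
```

For instance, depths (1,) give E_0[T_1] = 5 and depths (1, 1) give E_0[T_2] = 16, matching the recursion E_k[T_{k+1}] = 2λ_k + 2 + E_{k−1}[T_k].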
Exercise 21.1. Let τ be a finite binary tree of depth d with root o. Then,
P_o[T_o⁺ > 2^d] ≥ (2e)^{−1}.
The next theorem gives an example of a tree with speed exponent α for any α ≤ 1/2.
••• Theorem 21.5. Let d_k = ⌊β log₂(k+1)⌋. The tree τ((d_k)_k) has speed exponent (β+2)^{−1}.
Proof. Let (X_t)_t be a random walk on τ = τ((d_k)_k). For x ∈ τ denote by |x| the number that is the root of the unique finite subtree τ_k such that x ∈ τ_k. So |x| ≤ dist(x, 0) ≤ |x| + d_{|x|}, and it suffices to prove that
lim_{t→∞} log E_0[|X_t|] / log t = (β+2)^{−1}.
The lower bound is simpler. Note that
P_0[|X_t| < r] ≤ P_0[|X_t| < r, T_r > t] + P_0[|X_t| < r, T_r ≤ t] ≤ P_0[T_r > t] + P_r[|X_t| < r] ≤ t^{−1} E_0[T_r] + 1/2.
If we take t ≥ 4 E_0[T_r], we get that P_0[|X_t| < r] ≤ 3/4. Since 2^{d_k} ≍ (k+1)^β,
E_0[T_r] = 4 ∑_{k=0}^{r−1} (r−k) 2^{d_k} − r(r+2) ≍ ∑_{k=0}^{r−1} (r−k)(k+1)^β ≍ r^{β+2},
so for t = ⌈4 E_0[T_r]⌉ we have r ≍ t^{1/(β+2)}. So
E_0[|X_t|] ≥ P_0[|X_t| ≥ r] · r ≥ r/4 ≍ t^{1/(β+2)}.
We now turn to the upper bound on E_0[|X_t|]. Define inductively the times θ_0 = 0 and
θ_{n+1} = inf{ t ≥ θ_n : |X_t| ≠ |X_{θ_n}| }.
That is, (θ_n)_n are the subsequent times the random walk moves from a vertex in N to a new vertex in N. For every 0 < k ∈ N,
P_k[X_1 = k+1 | X_1 ∈ N] = P_k[X_1 = k−1 | X_1 ∈ N] = 1/2.
So the sequence (Z_n = X_{θ_n})_n is a random walk on N.
Now, if the walk is at a vertex k ∈ N, then with probability 1/2 it performs an excursion into the finite subtree τ_k, and with the remaining probability 1/2 it moves in N. Thus, by the exercise above,
P_0[θ_{n+1} > θ_n + 2^{d_k} | Z_n = k, F_{θ_n}] ≥ (4e)^{−1}   a.s.
Let x = r, y = 2r, z = 3r. Let N < M be such that θ_N = T_y and θ_M = inf{m > N : Z_m ∈ {x, z}}. For n ≥ N let J_n = 1_{θ_{n+1} > θ_n + 2^{d_x}}, and let S = {N ≤ n < M : J_n = 1}. Since d_k ≥ d_x for all k ∈ [x, z], we have from the above that for any set A ⊂ {0, 1, …, k−1},
P_0[S ⊂ A + N | M − N ≥ k] ≤ (1 − (4e)^{−1})^{k−|A|}.
Thus, for any λ < K, the event {|S| < λ} can be bounded by
P_0[|S| < λ] ≤ P_0[M − N < K] + P_0[|S| < λ | M − N ≥ K] ≤ P_0[M − N < K] + C(K, λ) (1 − (4e)^{−1})^{K−λ}.
Now, M − N is the number of steps a random walk on Z started at y = 2r takes to reach {x, z} = {r, 3r}. Translating r ↦ 0 we get that P_0[M − N < K] is bounded by the probability that a random walk on Z started at 0 leaves [−r, r] before time K. Azuma's inequality (and Example 21.2 following it) tells us that
P_0[M − N < K] ≤ 4 exp(−r²/(2K)).
Taking K = ⌊r² (8 log r)^{−2}⌋ and λ = ⌊εK⌋ for ε small enough (so that −log(1 − (4e)^{−1}) · (ε^{−1} − 1) > −2 log ε) we have
P_0[|S| < λ] < exp(−Ω((log r)²)) + exp(−Ω((r/log r)²)),
which decays faster than any polynomial in r.
The event {|S| ≥ λ} implies that
T_{3r} ≥ θ_M > |S| · 2^{d_x} + θ_N > λ · 2^{d_x}.
We thus conclude that for t = ⌊λ · 2^{d_x}⌋,
P_0[|X_t| > 3r] ≤ P_0[T_{3r} < t] ≤ P_0[|S| < λ] ≤ exp(−Ω((log r)²)).
Since λ ≍ r² (log r)^{−2} and 2^{d_x} ≍ r^β, we get that t ≍ r^{2+β} (log r)^{−2}, and
E_0[|X_t|] ≤ 3r + t · P_0[|X_t| > 3r] ≤ 3r + t · exp(−Ω((log t)²)) ≤ t^{1/(β+2)+o(1)}.
So
lim sup_{t→∞} log E_0[|X_t|] / log t ≤ 1/(β+2),
which coincides with our lower bound. □
21.3. Transient and Sub-Diffusive
Example 21.6. We now have an example of a transient sub-diffusive graph. Let τ = τ((d_k)_k) be the grove for d_k = ⌊β log₂(k+1)⌋, and let G = τ³.
We know that as a graph power, G has speed exponent (β+2)^{−1} < 1/2 (for β > 0). However, since N is a subgraph of τ, also N³ is a subgraph of G. We know that N³ is transient, so by Rayleigh monotonicity G must be transient as well.
21.4. Recurrent Positive Speed
Exercise 21.2 (Paley-Zygmund Inequality). Let X be a non-negative random variable. Let α ∈ [0, 1]. Then,
P[X > α E[X]] ≥ (1 − α)² · (E[X])² / E[X²].
Raymond Paley (1907-1933)
Antoni Zygmund (1900-1992)
• Lemma 21.7. There exists a universal constant p > 0 such that the following holds. Let τ be
a finite binary tree of depth d with root o. For any t ≤ d,
Po[dist(Xt, o) ≥ 16 t] ≥ p,
where (Xt)t is a random walk on τ .
124
Proof. Let D_t = dist(X_t, o). We have already seen that for L_t := Σ_{k=0}^t 1_{X_k=o} (with L_{−1} = 0) and M_t = D_t − (1/3)t − L_{t−1}, the process (M_t)_{t=0}^d is a martingale (the restriction to t ≤ d is so that the walk does not reach the leaves). Thus, for t ≤ d,
E_o[D_t] = (1/3)t + E_o[L_{t−1}].
Also, for t ≤ d,
E_o[D_t² | F_{t−1}] = 1_{X_{t−1}=o} + 1_{X_{t−1}≠o} · ((1/3)(D_{t−1} − 1)² + (2/3)(D_{t−1} + 1)²)
= D²_{t−1} + 1 + (2/3)D_{t−1}.
So
E_o[D_t²] = E_o[D²_{t−1}] + 1 + (2/3)E_o[D_{t−1}] = ··· = t + (2/3) Σ_{k=0}^{t−1} ((1/3)k + E_o[L_{k−1}]).
Note that for t ≤ d, L_t is the number of visits to the root up to time t. Let q be the probability that a random walk on an infinite rooted binary tree does not return to the root. Then, if A is the set of leaves in τ, we have P_o[T_A < T_o⁺] ≥ q. However, since P_o-a.s. t ≤ d ≤ T_A, we get that for any t ≤ d,
1 ≤ E_o[L_t] ≤ E_o[L_{T_A}] = 1 / P_o[T_A < T_o⁺] ≤ 1/q < ∞.
Thus we conclude that
E_o[D_t²] ≤ t + (1/9)t(t − 1) + (1/q)t.
We now use the Paley–Zygmund inequality to conclude that for any t ≤ d,
P_o[D_t ≥ (1/2)E_o[D_t]] ≥ (E_o[D_t])² / (4E_o[D_t²]) ≥ (1/4) · ((1/9)t²) / ((1/9)t² + (8/9 + 1/q)t) → 1/4.
Since E_o[D_t] ≥ (1/3)t, on this event D_t ≥ (1/6)t, and the proof is complete. □
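The lemma can be illustrated by simulation. On a binary tree the distance from the root is itself a Markov chain: from the root the walk must step to a child, and from an internal vertex it steps to the parent with probability 1/3 and to a child with probability 2/3. A minimal sketch (parameters chosen arbitrarily, with t well below the depth so leaves play no role):

```python
import random

rng = random.Random(0)

def dist_after(t):
    # Distance-from-root chain of the walk on a deep binary tree:
    # at the root the walk must move to a child (0 -> 1); at an internal
    # vertex it moves to the parent w.p. 1/3 and to a child w.p. 2/3.
    D = 0
    for _ in range(t):
        D = 1 if D == 0 else (D + 1 if rng.random() < 2 / 3 else D - 1)
    return D

t, n = 60, 20000
p_hat = sum(dist_after(t) >= t / 6 for _ in range(n)) / n
# p_hat estimates P_o[dist(X_t, o) >= t/6]; it should be bounded away from 0.
```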
Example 21.8. We complete the picture in Figure 5 by giving an example of a recurrent graph with positive speed.
Recall that for the (d_k)_k grove, the expected time to reach the vertex r ∈ N is
E_0[T_r] = 4 Σ_{k=0}^{r−1} (r − k)2^{d_k} − (3/2)r(r + 1).
Let (d_k)_k be an increasing sequence that satisfies
d_r > 1 + 4E_0[T_r] = 1 + 4 · (4 Σ_{k=0}^{r−1} (r − k)2^{d_k} − (3/2)r(r + 1)).
(This sequence must grow super-fast, at least like the Ackermann tower function.) Note that this ensures that d_r > ⌈4E_0[T_r]⌉. Consider the (d_k)_k-grove τ = τ((d_k)_k).
τ is of course recurrent.
Recall that for a random walk (X_t)_t on τ and for t ≥ 4E_0[T_r], we have that P_0[|X_t| < r] ≤ 3/4 (where |X_t| is the root of the finite subtree containing X_t).
Given that X_0 = r, we have by Lemma 21.7, for any t ≤ d_r,
P_r[dist(X_t, r) ≥ (1/6)t] ≥ c > 0,
for some universal constant c > 0. So if we take t = 2t′ for t′ = ⌈4E_0[T_r]⌉, then t′ < d_r, so
P_0[dist(X_t, 0) ≥ (1/6)t′] ≥ Σ_{k=r}^{t′} P_0[|X_{t′}| = k] · P_k[dist(X_t, k) ≥ (1/6)t′] ≥ (1/4) · c.
So
E_0[dist(X_t, 0)] ≥ (1/4)c · (1/6)t′ ≍ t.
And so τ has positive speed. ⋄
Number of exercises in lecture: 2
Total number of exercises until here: 34
Random Walks
Ariel Yadin
Lecture 22:
22.1. Galton-Watson Processes
The final topic for this course is a special Markov chain on trees, known as the Galton-Watson
process.
Francis Galton (1822–1911) and Henry Watson (1827–1903) were interested in the question of the survival of aristocratic surnames in the Victorian era. They proposed a model to study the dynamics of such a family name.
In words, the model can be stated as follows. We start with one individual. This individual has a certain random number of offspring. Thus passes one generation. In the next generation, each one of the offspring has its own offspring independently. The process continues, building a random tree of descent.
The formal definition is a bit complicated. For the moment let us focus only on the population
size at a given generation.
• Definition 22.1. Let µ be a distribution on N; i.e. µ : N → [0, 1] such that Σ_n µ(n) = 1. The Galton-Watson Process with offspring distribution µ (also denoted GW_µ) is the following Markov chain (Z_n)_n on N:
Let (X_{j,k})_{j,k∈N} be a sequence of i.i.d. random variables with distribution µ.
• At generation n = 0 we set Z_0 = 1. [ Start with one individual. ]
• Given Z_n, let
Z_{n+1} := Σ_{k=1}^{Z_n} X_{n+1,k}.
[ X_{n+1,k} represents the number of offspring of the k-th individual in generation n. ]
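The definition translates directly into a simulation. A minimal sketch (the offspring distribution µ below is an arbitrary example, not one from the text):

```python
import random

rng = random.Random(2016)

def gw_trajectory(mu, generations):
    # mu is a dict n -> mu(n); returns the list (Z_0, ..., Z_generations).
    outcomes = list(mu)
    weights = [mu[n] for n in outcomes]
    Z = [1]  # Z_0 = 1: start with one individual
    for _ in range(generations):
        # Each of the Z_n individuals has an i.i.d. mu-distributed
        # number of offspring; Z_{n+1} is their total.
        Z.append(sum(rng.choices(outcomes, weights)[0] for _ in range(Z[-1])))
    return Z

mu = {0: 0.25, 1: 0.25, 2: 0.5}
traj = gw_trajectory(mu, 10)
```

Note that once Z_n = 0 the sum is empty, so the process stays at 0, matching the definition.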
Example 22.2. If µ(0) = 1 then the GW_µ process is just the sequence Z_0 = 1, Z_n = 0 for all n > 0.
If µ(1) = 1 then GW_µ is Z_n = 1 for all n.
How about µ(0) = p = 1 − µ(1)? In this case, Z_0 = 1, and given that Z_n = 1, we have that Z_{n+1} = 0 with probability p, and Z_{n+1} = 1 with probability 1 − p, independently of all (Z_k : k ≤ n). If Z_n = 0 then Z_{n+1} = 0 as well.
What is the distribution of T = inf{n : Z_n = 0}? One can easily check that T ∼ Geo(p). So GW_µ is essentially a geometric random variable.
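This is easy to confirm by simulation; a quick sketch (p = 0.3 is an arbitrary choice):

```python
import random

rng = random.Random(0)
p = 0.3  # mu(0) = p, mu(1) = 1 - p

def extinction_time():
    # The single line of descent survives each generation w.p. 1 - p,
    # so T = inf{n : Z_n = 0} should be Geo(p).
    n = 0
    while True:
        n += 1
        if rng.random() < p:
            return n

samples = [extinction_time() for _ in range(50000)]
mean_T = sum(samples) / len(samples)
# mean_T should be close to E[Geo(p)] = 1/p
```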
We will in general assume that µ(0) + µ(1) < 1, otherwise the process is not interesting.
⋄
22.2. Generating Functions
X Notation: For a function f : R → R we write f^{(n)} = f ∘ ··· ∘ f for the composition of f with itself n times.
Let X be a random variable with values in N. The probability generating function, or PGF, is defined as
G_X(z) := E[z^X] = Σ_n P[X = n] z^n.
This function can be thought of as a function from [0, 1] to [0, 1]. If µ(n) = P[X = n] is the density of X, then we write G_µ = G_X.
Some immediate properties:
Exercise 22.1. Let G_X be the probability generating function of a random variable X with values in N. Show that
• If z ∈ [0, 1] then 0 ≤ G_X(z) ≤ 1.
• G_X(1) = 1.
• G_X(0) = P[X = 0].
• G′_X(1−) = E[X].
• E[X²] = G″_X(1−) + G′_X(1−).
• (1/n!) · (∂ⁿ/∂zⁿ) G_X(0+) = P[X = n].
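For a distribution with finite support these identities can be checked directly; a small sketch with an arbitrary example distribution:

```python
# Check the basic PGF identities for a finite-support X.
mu = {0: 0.2, 1: 0.3, 2: 0.5}  # arbitrary example distribution of X

def G(z):
    # G_X(z) = sum_n P[X = n] z^n
    return sum(p * z ** n for n, p in mu.items())

def G1(z):
    # first derivative of G
    return sum(n * p * z ** (n - 1) for n, p in mu.items() if n >= 1)

def G2(z):
    # second derivative of G
    return sum(n * (n - 1) * p * z ** (n - 2) for n, p in mu.items() if n >= 2)

EX = sum(n * p for n, p in mu.items())       # E[X]
EX2 = sum(n * n * p for n, p in mu.items())  # E[X^2]

assert abs(G(1) - 1) < 1e-12            # G_X(1) = 1
assert abs(G(0) - mu[0]) < 1e-12        # G_X(0) = P[X = 0]
assert abs(G1(1) - EX) < 1e-12          # G'_X(1-) = E[X]
assert abs(G2(1) + G1(1) - EX2) < 1e-12 # E[X^2] = G''_X(1-) + G'_X(1-)
assert abs(G2(0) / 2 - mu[2]) < 1e-12   # (1/2!) G''_X(0+) = P[X = 2]
```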
• Proposition 22.3. A PGF G_X is convex on [0, 1].
Proof. G_X is twice differentiable, with
G″_X(z) = E[X(X − 1)z^{X−2}] ≥ 0. □
The PGF is an important tool in the study of Galton-Watson processes.
128
• Proposition 22.4. Let (Z_n)_n be a GW_µ process. For z ∈ [0, 1],
E[z^{Z_{n+1}} | Z_0, . . . , Z_n] = G_µ(z)^{Z_n}.
Thus,
G_{Z_n} = G_µ^{(n)} = G_µ ∘ ··· ∘ G_µ.
Proof. Conditioned on Z_0, . . . , Z_n, we have that
Z_{n+1} = Σ_{k=1}^{Z_n} X_k,
where X_1, X_2, . . . are i.i.d. distributed according to µ. Thus,
E[z^{Z_{n+1}} | Z_0, . . . , Z_n] = E[ Π_{k=1}^{Z_n} z^{X_k} | Z_0, . . . , Z_n ] = Π_{k=1}^{Z_n} E[z^{X_k}] = G_µ(z)^{Z_n}.
Taking expectations of both sides we have that
G_{Z_{n+1}}(z) = E[z^{Z_{n+1}}] = E[G_µ(z)^{Z_n}] = G_{Z_n}(G_µ(z)) = G_{Z_n} ∘ G_µ(z).
An inductive procedure gives
G_{Z_n} = G_{Z_{n−1}} ∘ G_µ = G_{Z_{n−2}} ∘ G_µ ∘ G_µ = ··· = G_µ^{(n)},
since G_{Z_1} = G_µ. □
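Proposition 22.4 can be tested numerically by comparing a Monte Carlo estimate of E[z^{Z_n}] with the n-fold composition G_µ^{(n)}(z). A sketch, with an arbitrarily chosen offspring distribution and evaluation point:

```python
import random

rng = random.Random(7)
mu = {0: 0.25, 1: 0.25, 2: 0.5}  # arbitrary example offspring distribution

def G(z):
    # G_mu(z) = sum_n mu(n) z^n
    return sum(p * z ** n for n, p in mu.items())

def G_iter(z, n):
    # n-fold composition G^{(n)}(z)
    for _ in range(n):
        z = G(z)
    return z

def sample_Zn(n):
    # Simulate n generations of the GW_mu process, starting from Z_0 = 1.
    outcomes, weights = list(mu), list(mu.values())
    Z = 1
    for _ in range(n):
        Z = sum(rng.choices(outcomes, weights)[0] for _ in range(Z))
    return Z

n, z0, N = 4, 0.5, 40000
mc = sum(z0 ** sample_Zn(n) for _ in range(N)) / N  # Monte Carlo E[z0^{Z_n}]
exact = G_iter(z0, n)                               # G^{(n)}(z0)
# mc and exact should agree up to Monte Carlo error.
```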
22.3. Extinction
Recall that the first question we would like to answer is the extinction probability for a GW process.
Let (Z_n)_n be a GW_µ process. Extinction is the event {∃ n : Z_n = 0}. The extinction probability is defined to be q = q(GW_µ) = P[∃ n : Z_n = 0]. Note that the events {Z_n = 0} form an increasing sequence, so
q(GW_µ) = lim_{n→∞} P[Z_n = 0].
• Proposition 22.5. Consider a GW_µ process. (Assume that µ(0) + µ(1) < 1.) Let q = q(GW_µ) be the extinction probability and G = G_µ. Then,
• q is the smallest solution to the equation G(z) = z. If only one solution exists, q = 1. Otherwise, q < 1 and the only other solution is G(1) = 1.
• q = 1 if and only if G′(1−) = E[X] ≤ 1.
X Positivity of the extinction probability depends only on the mean number of offspring!
Proof. If P[X = 0] = G(0) = 0 then Z_n ≥ Z_{n−1} for all n, so q = 0, because there is never extinction. Also, the only solutions to G(z) = z in this case are 0 and 1: since G″(z) > 0 for z > 0, G is strictly convex, and thus G(z) < z for all z ∈ (0, 1). So we can assume that G(0) > 0.
Let f(z) = G(z) − z. So f″(z) > 0 for z > 0, and thus f′ is a strictly increasing function.
• Case 1: G′(1−) ≤ 1. Then f′(1−) ≤ 0. Since f′(0+) = −(1 − µ(1)) < 0 (because µ(1) < 1), and since f′ is strictly increasing, for all z < 1 we have that f′(z) < 0. Thus f is strictly decreasing on [0, 1], so f(z) > 0 for all z < 1 and there is only one solution to f(z) = 0, at z = 1.
• Case 2: G′(1−) > 1. Then f′(1−) > 0. Since f′(0+) < 0 there must be some 0 < x < 1 such that f′(x) = 0. Since f′ is strictly increasing, this x is the unique minimum of f in [0, 1]. Since f′(z) > 0 for z > x, we have that f(x) < f((1 + x)/2) ≤ f(1) = 0. Also, f(0) = µ(0) > 0, and because f is continuous, there exists a 0 < p < x such that f(p) = 0.
We claim that p and 1 are the only solutions to f(z) = 0. Indeed, if a < b are any two solutions, then because f is strictly convex, for any a < z < b, writing z = αa + (1 − α)b with α ∈ (0, 1), we have f(z) < αf(a) + (1 − α)f(b) = 0. So f is strictly negative between any two solutions, and there cannot be a third.
In conclusion, in the case G′(1−) > 1 there are exactly two solutions to G(z) = z, namely p and 1.
Moreover, p < x for x the unique minimum of f, so because f′ is strictly increasing,
−1 ≤ −(1 − µ(1)) = f′(0+) ≤ f′(z) ≤ f′(p) < f′(x) = 0
for any z ≤ p. Thus, for any z ≤ p we have that
f(z) = f(z) − f(p) = −∫_z^p f′(t) dt ≤ p − z,
which implies that G(z) ≤ p for any z ≤ p.
Now, recall that the extinction probability admits
q = lim_{n→∞} P[Z_n = 0] = lim_{n→∞} G_{Z_n}(0) = lim_{n→∞} G^{(n)}(0).
Since G is a continuous function, we get that G(q) = q, so q is a solution to G(z) = z.
If two solutions exist (equivalently, G′(1−) > 1), say p and 1, then G^{(n)}(0) ≤ p for all n, so q ≤ p and thus q = p < 1.
If only one solution exists then q = 1. □
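The proof also suggests an algorithm: iterate G from 0 until the fixed point. As an illustration (the offspring distribution is an arbitrary example, not one from the text), take µ(0) = 1/4, µ(1) = 1/4, µ(2) = 1/2, so G(z) = 1/4 + z/4 + z²/2. Its fixed points are z = 1/2 and z = 1, and since the mean 5/4 exceeds 1 the proposition gives q = 1/2:

```python
def G(z):
    # PGF of the example offspring distribution
    # mu(0) = 1/4, mu(1) = 1/4, mu(2) = 1/2.
    return 0.25 + 0.25 * z + 0.5 * z ** 2

# Extinction probability as the limit of G^{(n)}(0):
q = 0.0
for _ in range(200):
    q = G(q)
# q converges to the smallest fixed point of G, here 1/2.
```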
Figure 7. The two possibilities for G′(1−). The blue dotted line and crosses show how the iterates G^{(n)}(0) advance toward the minimal solution of G(z) = z.