Entropy-driven cutoff phenomena

Entropy-driven cutoff phenomena for finite Markov chains

Carlo Lancia, Benedetto Scoppola

University of Rome Tor Vergata

Eindhoven – April 5, 2011


Introduction

Introducing the cutoff phenomenon

What is the cutoff phenomenon?
An abrupt convergence of a MC to its stationary state.
The distance between the evolved measure $\mu^t$ and the invariant one $\pi$ stays at 1 for a certain time and then suddenly drops to 0.

When is it important?
queueing systems
sampling distributions
optimization problems
counting and integrating




Definition

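Given a family of MC $(\Omega_n, X_n^t, P_n, \mu_n^t, \mu_n^0, \pi_n)$ and two sequences $a_n$, $b_n$ such that $\frac{b_n}{a_n} \to 0$ as $n \to \infty$, that family is said to exhibit cutoff, with cutoff-time $a_n$ and cutoff-window $O(b_n)$, iff

$$\lim_{\theta\to\infty} \liminf_{n\to\infty} \left\| \mu_n^{a_n - \theta b_n} - \pi_n \right\|_{TV} = 1$$

$$\lim_{\theta\to\infty} \limsup_{n\to\infty} \left\| \mu_n^{a_n + \theta b_n} - \pi_n \right\|_{TV} = 0$$

where

$$\left\| \mu_n^t - \pi_n \right\|_{TV} = \frac{1}{2} \sum_{i \in \Omega_n} \left| \mu_n^t(i) - \pi_n(i) \right|$$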




Example: Biased random walk (figure)

Understanding the total variation distance

Properties of the TV distance

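It takes values in $[0, 1]$
$\left\| \mu_n^t - \pi_n \right\|_{TV}$ is monotonically non-increasing in $t$

Suppose we have two distributions $\lambda$ and $\mu$

$$\|\mu - \lambda\|_{TV} = \max_{A \subseteq \Omega_n} \left[ \mu(A) - \lambda(A) \right] = \max_{A \subseteq \Omega_n} \left[ \lambda(A) - \mu(A) \right]$$

$$1 - \|\mu - \lambda\|_{TV} = 1 - \max_{A \subseteq \Omega_n} \left[ \mu(A) - \lambda(A) \right]$$

The total variation distance

Fundamental remark
No overlap between $\lambda$ and $\mu$ $\Longrightarrow$ $\|\lambda - \mu\|_{TV} = 1$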
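The characterizations of the total variation distance can be checked numerically on a small example. The following is a minimal Python sketch, assuming distributions are stored as plain lists over a common finite state space; the function names `tv_distance` and `tv_by_max_event` and the three example distributions are illustrative choices, not part of the talk.

```python
def tv_distance(mu, lam):
    """Total variation distance as half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, lam))


def tv_by_max_event(mu, lam):
    """max over events A of mu(A) - lam(A).

    The maximum is attained at A = {i : mu(i) > lam(i)}, so it is enough
    to sum the positive parts of mu(i) - lam(i).
    """
    return sum(max(a - b, 0.0) for a, b in zip(mu, lam))


# Illustrative distributions on {0, 1, 2, 3}
mu = [0.5, 0.5, 0.0, 0.0]
lam = [0.0, 0.5, 0.5, 0.0]    # overlaps with mu on state 1
nu = [0.0, 0.0, 0.25, 0.75]   # support disjoint from that of mu

print(tv_distance(mu, lam), tv_by_max_event(mu, lam), tv_by_max_event(lam, mu))
print(tv_distance(mu, nu))
```

The first line prints the same value three times, and the second shows that disjoint supports give distance 1.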



Understanding the cutoff phenomenon

The Ehrenfest's Urn
2 boxes, n balls

$q_i = \frac{1}{2}\frac{i}{n}$ probability to remove a ball from Urn 1

$p_i = \frac{1}{2}\frac{n-i}{n}$ probability to add a ball in Urn 1

The equilibrium distribution is binomial $B(n, \frac{1}{2})$
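To see the abrupt drop in this model, one can evolve the law of the chain exactly and record its distance from $B(n, \frac{1}{2})$. The sketch below is a hedged illustration: it assumes that $i$ counts the balls in Urn 1, that the chain starts with Urn 1 empty, and it picks $n = 200$ arbitrarily; the helper names are hypothetical.

```python
from math import comb


def tv_distance(mu, pi):
    """Half the L1 distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, pi))


def ehrenfest_step(mu, n):
    """One step of the chain with q_i = i/(2n) and p_i = (n-i)/(2n)."""
    new = [0.0] * (n + 1)
    for i, mass in enumerate(mu):
        q = i / (2 * n)          # remove a ball from Urn 1
        p = (n - i) / (2 * n)    # add a ball to Urn 1
        new[i] += mass * (1 - p - q)
        if i > 0:
            new[i - 1] += mass * q
        if i < n:
            new[i + 1] += mass * p
    return new


def tv_profile(n, t_max):
    """TV distance to B(n, 1/2) at times 0..t_max, starting from state 0."""
    pi = [comb(n, k) / 2 ** n for k in range(n + 1)]
    mu = [1.0] + [0.0] * n       # assumption: Urn 1 is empty at time 0
    profile = [tv_distance(mu, pi)]
    for _ in range(t_max):
        mu = ehrenfest_step(mu, n)
        profile.append(tv_distance(mu, pi))
    return profile


profile = tv_profile(n=200, t_max=2000)
for t in (0, 100, 300, 500, 700, 1000, 2000):
    print(t, round(profile[t], 4))
```

The printed profile stays close to 1 at early times and is close to 0 at late times, with the transition concentrated in between.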




A small recap
The convergence is triggered by the instant when the supports of $\mu^t$ and $\pi$ start intersecting
The convergence after this moment is exponentially fast, as in the diffusion case
Then the evolution of the chain can be divided into two phases
1. Approaching the support of $\pi$
2. Diffusion inside it

Question
Can we replace the deterministic instant when the supports of $\mu^t$ and $\pi$ start intersecting by the random instant when the process $X^t$ enters the support of $\pi$?


Intuitively it should be possible, for the supports of $\mu^t$ and $\pi$ are disjoint until the time $\tau$ when $X^t$ hits the support of $\pi$.
Unfortunately $\tau$ is a stochastic time and we need a deterministic quantity $a_n$. What about $E(\tau)$?


Cutoff for birth-and-death chains

Birth-and-death chains

Queueing models
B&D chains take values in $\Omega_n = \{0, 1, \ldots, n\}$ and move only to nearest neighbors
Arrival rates $p_i = P_{i,i+1}$ and service rates $q_i = P_{i,i-1}$

If $\forall\, i$ $p_i, q_i > 0$ and $\exists\, j$ such that $1 - p_j - q_j > 0$, then the chain has a unique stationary distribution

$$\pi_n(k) = \pi_n(0) \prod_{i=1}^{k} \frac{p_{i-1}}{q_i}$$
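The product formula follows from detailed balance, $\pi_n(i-1)\, p_{i-1} = \pi_n(i)\, q_i$, and is easy to verify numerically. Below is a minimal Python sketch; the constant rates 0.2 and 0.3 and the size $n = 10$ are made-up values for illustration, and `bd_stationary` and `bd_step` are hypothetical helper names.

```python
def bd_stationary(p, q):
    """Stationary distribution of a birth-and-death chain on {0, ..., n}.

    p[i] = P(i, i+1) and q[i] = P(i, i-1); the weights follow
    pi(k) = pi(0) * prod_{i=1..k} p[i-1] / q[i].
    """
    weights = [1.0]
    for k in range(1, len(p)):
        weights.append(weights[-1] * p[k - 1] / q[k])
    total = sum(weights)
    return [w / total for w in weights]


def bd_step(mu, p, q):
    """Push a distribution mu through one step of the chain."""
    n = len(p) - 1
    new = [0.0] * (n + 1)
    for i, mass in enumerate(mu):
        new[i] += mass * (1 - p[i] - q[i])
        if i > 0:
            new[i - 1] += mass * q[i]
        if i < n:
            new[i + 1] += mass * p[i]
    return new


# Illustrative queue: arrival rate 0.2, service rate 0.3, capacity n = 10
n = 10
p = [0.2] * n + [0.0]      # no arrivals when the queue is full
q = [0.0] + [0.3] * n      # no service when the queue is empty
pi = bd_stationary(p, q)

# pi should be left unchanged by one step of the chain
print(max(abs(a - b) for a, b in zip(bd_step(pi, p, q), pi)))
```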

The Strong drift condition

A sufficient condition for cutoff in birth-and-death chains can be found in Barrera, Bertoncini, Fernandez. JSP, 2009



Cutoff for pure-death chains

The coupon collector

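$\Omega_n = \{0, 1, \ldots, n\}$, $p_i = 0$, $q_i = \frac{i}{n}$, $\pi_n = \delta_{i,0}$

$\tau_n = \min\{t \ge 0 : X_n^t = 0\}$ is the hitting time of 0

$$E\left( \left\| \mu_n^{\tau_n - \theta n} - \pi_n \right\|_{TV} \right) \sim 1$$

$$E\left( \left\| \mu_n^{\tau_n + \theta} - \pi_n \right\|_{TV} \right) \sim 0$$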




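Lemma
If there exist a sequence of hitting times $\tau_n$ and a sequence of positive reals $\delta_n$ such that:
1. $\frac{\sigma(\tau_n) + \delta_n}{E(\tau_n)} \to 0$ as $n \to \infty$ ($\Rightarrow$ $\tau_n$ is quasi-deterministic)
2. for all sufficiently large $n$, $\left\| \mu_n^{\tau_n - \theta\delta_n} - \pi_n \right\|_{TV} \ge 1 - f(\theta)$
3. for all sufficiently large $n$, $\left\| \mu_n^{\tau_n + \theta\delta_n} - \pi_n \right\|_{TV} \le g(\theta)$
where $f$ and $g$ are any two functions with $f(\theta), g(\theta) \to 0$ as $\theta \to \infty$

Then we have cutoff with $a_n = E(\tau_n)$, $b_n = O(\sigma(\tau_n) + \delta_n)$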

The coupon collector

The hypotheses are trivially checked with $\delta_n = O(1)$
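The quasi-determinism of $\tau_n$ for the coupon collector can be made concrete, since the time spent in state $i$ is geometric with success probability $q_i = \frac{i}{n}$ and the holding times are independent. The sketch below is an illustration under the assumption that the chain starts at $n$ (the slides do not state the starting point); `coupon_collector_moments` is a hypothetical helper name.

```python
from math import sqrt


def coupon_collector_moments(n):
    """Mean and standard deviation of the hitting time of 0.

    Assumes the pure-death chain with q_i = i/n is started at n.
    A geometric time with success probability s has mean 1/s and
    variance (1 - s) / s**2; independent holding times add up.
    """
    mean = sum(n / i for i in range(1, n + 1))
    var = sum((1 - i / n) / (i / n) ** 2 for i in range(1, n + 1))
    return mean, sqrt(var)


for n in (10, 100, 1000, 10000):
    mean, sd = coupon_collector_moments(n)
    # The last column is sigma(tau_n) / E(tau_n)
    print(n, round(mean, 1), round(sd, 1), round(sd / mean, 4))
```

The ratio in the last column decreases as $n$ grows, consistent with a mean of order $n \log n$ and a standard deviation of order $n$.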




Top-in-at-random shuffle

deck of n cards
top card is inserted at a uniform random position

Initial deck permutation is given
We follow the position of the original bottom card
If during the shuffling a card is inserted under the original bottom card, the latter steps up one position; otherwise it stays at the same height
$\Rightarrow$ pure-death chain




Until the initial bottom card reaches the topmost position, the distance from uniformity is greater than $1 - \frac{1}{n}$
When the initial bottom card is inserted into the deck from the top, the distance is 0
The previous lemma holds with $\delta_n = 1$



Deeper into the Ehrenfest’s Urn

$\pi_n$ is concentrated in a region of size $O(\sqrt{n})$

No overlap between $\mu_n^t$ and $\pi_n$ $\Longrightarrow$ $\left\| \mu_n^t - \pi_n \right\|_{TV} = 1$

Similar to a diffusion process $\Longrightarrow$ mixes in time $n$




Fundamental remarks
Both in the Coupon collector and Ehrenfest's Urn models, the key to understanding cutoff was the drift of the chain towards a small region $A_n \subset \Omega_n$, where the stationary distribution $\pi_n$ is mostly concentrated.
The parameter $\delta_n$ seems to be the time the chain needs to mix inside $A_n$.
In the Ehrenfest model the cutoff depends on the starting position of the chain: if it starts too near the region $A_n$, then the time to reach it is comparable with the diffusion time.
We will see that in general the region $A_n$ can be found using entropy or free-energy considerations.


Entropy-driven cutoff

Entropy-driven cutoff phenomena

Projection of chains
When the state space $\Omega_n$ is highly symmetrical the equilibrium distribution $\pi_n$ is uniform
Can we still study cutoff as a hitting process?
Yes, if we suitably project our Markov chain on a birth-and-death process
We need an equivalence relation $\sim$ on $\Omega_n$ such that the resulting process on $\Omega_n^\sharp = \Omega_n / \sim$ is still a Markov chain
Then the projected chain $X_n^{\sharp,t}$ has equilibrium distribution $\pi_n^\sharp(i) = \sum_{\Omega_n \ni x \sim i} \pi_n(x)$, which is closely related to the entropy of the $i$-th class












Lazy random walk on the hypercube

The model
The state space is a hypercube of dimension $n$

$\Omega_n = \{0, 1\}^n$, $\Omega_n \ni x = (x_1, x_2, \ldots, x_n)$

At each time one of the $n$ directions is chosen u.a.r.
The chain moves along that direction with probability $\frac{1}{2}$
Then the probability to flip the $j$-th component, that is moving from the vertex $x = (x_1, \ldots, x_j, \ldots, x_n)$ to $x' = (x_1, \ldots, 1 - x_j, \ldots, x_n)$, is $\frac{1}{2n}$
The chain stands still with probability $\frac{1}{2}$ $\to$ ergodicity

A lazy random walk that starts on a vertex (e.g. the origin) exhibits cutoff with $a_n = \frac{1}{2} n \log n$ and $b_n = O(n)$.




Projection of the random walk on the hypercube

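Define the Hamming weight $W(x) = \|x\|_{\ell_1} = \sum_{i=1}^{n} x_i$

Declare two states equivalent, $x \sim y$, iff $W(x) = W(y)$

$[i] = \{x \in \Omega_n : W(x) = i\}$ $\Rightarrow$ $\mu_n^{\sharp,0}([i]) = \delta_{[i],[0]}$

The transition probabilities are

$$P([i], [i+1]) = \frac{n-i}{2n} \qquad P([i], [i-1]) = \frac{i}{2n}$$

$\Rightarrow$ $X_n^{\sharp,t}$ is the lazy Ehrenfest's chain!

Moreover, $\left\| \mu_n^t - \pi_n \right\|_{TV} = \left\| \mu_n^{\sharp,t} - \pi_n^\sharp \right\|_{TV}$ $\Rightarrow$ we can prove cutoff looking at the projection only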
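The equality of the two total variation distances can be checked by brute force for a small hypercube. The sketch below is only an illustration: it assumes the walk starts at the origin, takes $n = 6$ so that the full chain has 64 states, and encodes vertices as $n$-bit integers; the helper names are hypothetical.

```python
from math import comb


def tv_distance(mu, pi):
    """Half the L1 distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, pi))


def hypercube_step(mu, n):
    """Lazy walk on {0,1}^n, vertices encoded as n-bit integers."""
    new = [0.5 * mass for mass in mu]              # stand still w.p. 1/2
    for x, mass in enumerate(mu):
        for j in range(n):
            new[x ^ (1 << j)] += mass / (2 * n)    # flip bit j w.p. 1/(2n)
    return new


def ehrenfest_step(mu, n):
    """Projected chain: P([i],[i+1]) = (n-i)/(2n), P([i],[i-1]) = i/(2n)."""
    new = [0.5 * mass for mass in mu]
    for i, mass in enumerate(mu):
        if i < n:
            new[i + 1] += mass * (n - i) / (2 * n)
        if i > 0:
            new[i - 1] += mass * i / (2 * n)
    return new


n = 6
pi_full = [1 / 2 ** n] * 2 ** n                    # uniform on the hypercube
pi_proj = [comb(n, i) / 2 ** n for i in range(n + 1)]
mu_full = [1.0] + [0.0] * (2 ** n - 1)             # start at the origin
mu_proj = [1.0] + [0.0] * n                        # start in the class [0]

gaps = []
for t in range(30):
    gaps.append(abs(tv_distance(mu_full, pi_full) - tv_distance(mu_proj, pi_proj)))
    mu_full = hypercube_step(mu_full, n)
    mu_proj = ehrenfest_step(mu_proj, n)

# Largest discrepancy between the two TV distances over the first 30 steps
print(max(gaps))
```

The discrepancy is at the level of floating-point rounding, because the law of the walk started at the origin is constant on each class $[i]$.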







A sufficient condition for having cutoff


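Given a family of MC $(\Omega_n, X_n^t, P_n, \mu_n^t, \mu_n^0, \pi_n)$ and its projection $(\Omega_n^\sharp, X_n^{\sharp,t}, P_n^\sharp, \mu_n^{\sharp,t}, \mu_n^{\sharp,0}, \pi_n^\sharp)$ suppose the following

$\exists\, \{A_{n,\theta}\}_\theta$, $A_{n,\theta} \subset \Omega_n^\sharp$ such that $A_{n,\theta} \subseteq A_{n,\theta'}$ if $\theta \le \theta'$ and $\pi_n^\sharp(A_{n,\theta}^c) < f(\theta) \to 0$ as $\theta \to \infty$
The hitting time $\tau_{n,1}$ of $A_{n,1}$ is quasi-deterministic
The time necessary to travel from $A_{n,\theta}$ to $A_{n,1}$ is controlled by $\theta\delta_n$ and is sufficient for $X_n^{\sharp,t}$ to diffuse inside $A_{n,1}$

Then the family exhibits cutoff with

$a_n = E(\tau_{n,1})$ and $b_n = O(\delta_n + \sigma(\tau_{n,1}))$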




The cutoff window
There are two contributions to the cutoff window, $\sigma(\tau_{n,1})$ and $\delta_n$:

The standard deviation of $\tau_{n,1}$ is the relevant one for coupon collector, top-in-at-random and many B&D chains.
$\theta\delta_n$ is a suitable upper bound to both the expected travelling time from $A_{n,\theta}$ to $A_{n,1}$ and the time necessary to mix inside $A_{n,1}$ with a tolerance up to $g(\theta)$

$$\left\| \mu_n^{\tau_{n,1} + \theta\delta_n} - \pi_n \right\|_{TV} \le g(\theta) \to 0 \text{ as } \theta \to \infty$$

$\delta_n$ is the relevant contribution for the Ehrenfest's Urn and the random walk on the hypercube.





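$A_{n,\theta} = \left[ \frac{n - \theta\sqrt{n}}{2}, \frac{n + \theta\sqrt{n}}{2} \right]$, $\pi_n^\sharp(A_{n,\theta}^c) < \frac{1}{\theta^2}$ (from Chebyshev, could be done much better)

$E(\tau_{n,\theta}) = \frac{1}{2} n \log n - n \log\theta$

$\sigma(\tau_{n,\theta}) = O_\theta\left( n^{3/4} \right)$

$E\left( \tau_{A_{n,\theta} \to A_{n,1}} \right) = n \log\theta$

The Ehrenfest's urn started at $\frac{n - \theta\sqrt{n}}{2}$ has a mixing time of order $n$, as well as the lazy random walk started with uniform measure over the set $\left\{ x : W(x) = \frac{n - \theta\sqrt{n}}{2} \right\}$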
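The remark that the Chebyshev bound "could be done much better" can be illustrated by computing the exact binomial mass that $\pi_n^\sharp$ places outside $A_{n,\theta}$ and comparing it with $\frac{1}{\theta^2}$. This is a small sketch with an arbitrarily chosen $n = 400$; `mass_outside_window` is a hypothetical helper name.

```python
from math import comb, sqrt


def mass_outside_window(n, theta):
    """Binomial(n, 1/2) mass outside [(n - theta*sqrt(n))/2, (n + theta*sqrt(n))/2]."""
    half_width = theta * sqrt(n) / 2
    outside = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) > half_width)
    return outside / 2 ** n


n = 400
for theta in (1, 2, 3, 4):
    # Columns: theta, exact mass outside the window, Chebyshev bound
    print(theta, mass_outside_window(n, theta), 1 / theta ** 2)
```

The exact mass decays far faster in $\theta$ than the quadratic bound, in line with Gaussian-type tails for the binomial distribution.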


Free-energy-driven cutoff

Mean-field Ising model

The model
We have $n$ binary spins $\sigma_i$ that can be up (+1) or down (-1)
The spins interact with each other at temperature $T = \beta^{-1} > 1$

The Hamiltonian of a spin configuration $\sigma = (\sigma_1, \ldots, \sigma_n)$ is

$$H(\sigma) = -\frac{J}{n} \sum_{i<j} \sigma_i \sigma_j$$

The model is called mean-field Ising model because only the proportion of + spins to - spins matters
We name local field the quantity $J(i) = \frac{J}{n} \sum_{j \ne i} \sigma_j$ and magnetization the quantity $m(\sigma) = \frac{1}{n} \sum_i \sigma_i$




Glauber dynamics for the mean-field Ising model

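Consider the following MC with $\Omega_n = \{-1, +1\}^n$

At each time a spin $\sigma_i$ is selected u.a.r., then it is updated to +1 or -1 with probability, respectively,

$$p_+ = \frac{e^{\beta J(i)}}{e^{\beta J(i)} + e^{-\beta J(i)}} \qquad p_- = \frac{e^{-\beta J(i)}}{e^{\beta J(i)} + e^{-\beta J(i)}}$$

The chain has a unique stationary measure

$$\pi_n(\sigma) = \frac{e^{-\beta H(\sigma)}}{Z_{\beta,n}}$$

where $Z_{\beta,n} = \sum_{\sigma' \in \Omega_n} e^{-\beta H(\sigma')}$ is a normalizing factor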
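For a handful of spins the stationarity of the Gibbs measure under this dynamics can be verified by enumerating all configurations. The sketch below is a hedged illustration with assumed values $n = 4$, $\beta = 0.5$ and $J = 1$; the function names `hamiltonian` and `glauber_row` are hypothetical.

```python
from itertools import product
from math import exp


def hamiltonian(sigma, J=1.0):
    """H(sigma) = -(J/n) * sum over pairs i<j of sigma_i * sigma_j."""
    n = len(sigma)
    pair_sum = sum(sigma[i] * sigma[j] for i in range(n) for j in range(i + 1, n))
    return -J / n * pair_sum


def glauber_row(sigma, beta, J=1.0):
    """Transition probabilities out of sigma as {configuration: probability}."""
    n = len(sigma)
    row = {}
    for i in range(n):                      # spin i is selected w.p. 1/n
        field = J / n * (sum(sigma) - sigma[i])
        p_plus = exp(beta * field) / (exp(beta * field) + exp(-beta * field))
        for new_spin, prob in ((+1, p_plus), (-1, 1 - p_plus)):
            target = sigma[:i] + (new_spin,) + sigma[i + 1:]
            row[target] = row.get(target, 0.0) + prob / n
    return row


n, beta = 4, 0.5                            # assumed values, T = 2 > 1
states = list(product((-1, +1), repeat=n))
weights = {s: exp(-beta * hamiltonian(s)) for s in states}
Z = sum(weights.values())
pi = {s: w / Z for s, w in weights.items()}

# Push pi through one step of the chain and compare with pi itself
pushed = dict.fromkeys(states, 0.0)
for s in states:
    for target, prob in glauber_row(s, beta).items():
        pushed[target] += pi[s] * prob

print(max(abs(pushed[s] - pi[s]) for s in states))
```

The check works because each update resamples the selected spin from its conditional Gibbs law given the others, so detailed balance holds.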




The magnetization

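It is possible to rewrite all the quantities defined above in terms of the magnetization $m(\sigma) \in \left\{ -1, \frac{-n+2}{n}, \ldots, \frac{n-2}{n}, 1 \right\}$ (taking $J = 1$)

$$p_+ = \frac{1 + \tanh\left( \beta \left( m(\sigma) - \frac{\sigma_i}{n} \right) \right)}{2} \qquad p_- = \frac{1 - \tanh\left( \beta \left( m(\sigma) - \frac{\sigma_i}{n} \right) \right)}{2}$$

$$\pi_n(\sigma) = \frac{e^{\frac{\beta n\, m(\sigma)^2}{2}}}{Z_{\beta,n}}$$

Thus it seems natural to declare two configurations equivalent if they have the same magnetization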








Projection of the Glauber chain

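$\sigma \sim \sigma'$ iff $m(\sigma) = m(\sigma')$

The resulting lumped process is a birth-and-death Markov chain over $\Omega_n^\sharp = \left\{ -1, \frac{-n+2}{n}, \ldots, \frac{n-2}{n}, 1 \right\}$

$$p_m = \frac{1-m}{2} \cdot \frac{1 + \tanh\left( \beta \left( m + \frac{1}{n} \right) \right)}{2} \qquad q_m = \frac{1+m}{2} \cdot \frac{1 - \tanh\left( \beta \left( m - \frac{1}{n} \right) \right)}{2}$$

The stationary distribution is

$$\pi_n^\sharp(m) = \frac{\exp\left[ \frac{\beta n m^2}{2} \right]}{Z_{\beta,n}} \binom{n}{\frac{1}{2} n (1+m)}$$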





$E(m) = -\frac{n m^2}{2}$ is the energy of the magnetization
$S(m) = \log \binom{n}{\frac{1}{2} n (1+m)}$ is the entropy of the magnetization
$A(m) = E(m) - T S(m)$ is the Helmholtz free-energy

For $m \ll 1$ we have that $S(m) \approx \text{const} - \frac{n m^2}{2}$, then

$$\pi_n^\sharp(m) = c\, e^{-\frac{1-\beta}{2} n m^2} = c\, e^{-\beta A(m)}$$

which is for large $n$ a normal distribution $N\left( 0, \frac{1}{\sqrt{(1-\beta) n}} \right)$
Therefore it's possible to prove cutoff with the same technique used for the Ehrenfest's Urn
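The Gaussian approximation can be compared with the exact law of the magnetization, which is proportional to $\exp\left[ \frac{\beta n m^2}{2} \right] \binom{n}{\frac{1}{2} n (1+m)}$. The sketch below assumes $n = 2000$ and $\beta = 0.5$ for illustration, reads the second parameter of the normal law as a standard deviation, and works with logarithms to avoid overflow; `magnetization_law` is a hypothetical helper name.

```python
from math import exp, lgamma, sqrt


def log_binom(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def magnetization_law(n, beta):
    """Exact law of m = (2k - n)/n, where k is the number of + spins."""
    ms = [(2 * k - n) / n for k in range(n + 1)]
    logw = [beta * n * m * m / 2 + log_binom(n, k) for k, m in enumerate(ms)]
    top = max(logw)                     # subtract the max to avoid overflow
    w = [exp(v - top) for v in logw]
    total = sum(w)
    return ms, [x / total for x in w]


n, beta = 2000, 0.5                     # assumed values, T = 2 > 1
ms, probs = magnetization_law(n, beta)
mean = sum(m * p for m, p in zip(ms, probs))
sd = sqrt(sum((m - mean) ** 2 * p for m, p in zip(ms, probs)))

# Exact standard deviation next to the Gaussian prediction
print(sd, 1 / sqrt((1 - beta) * n))
```

The two printed values agree closely for these parameters, and the agreement is expected to improve as $n$ grows with $\beta < 1$ fixed.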

