


Theory of Stochastic Processes
8. Markov chain Monte Carlo

Tomonari Sei
[email protected]

Department of Mathematical Informatics, University of Tokyo

June 8, 2017

http://www.stat.t.u-tokyo.ac.jp/~sei/lec.html

1 / 28


Handouts & Announcements

Handouts:

Slides (this one)

Solutions of the midterm exam

Copy of §6.5 and §6.14 of PRP

Note:

Your answer sheets of the midterm exam will be returned today unless you took an additional exam (for conference attendees or sick leave).

2 / 28


Schedule

Apr 6 Overview

Apr 13 Simple random walk

Apr 20 Generating functions

Apr 27 Markov chain

May 11 Continuous-time Markov chain

May 18 Review

May 25 (midterm exam)

June 8 Markov chain Monte Carlo

June 15 Stationary processes

June 22 Martingales

June 29 Queues

July 6 Diffusion processes

July 13 Review

July 20 (final exam) ← the date has been decided.

3 / 28


Outline today

1. Remarks about the midterm exam
2. Markov chain Monte Carlo
   - Detailed balance equation
   - Gibbs sampler
   - Metropolis-Hastings algorithm
3. Recommended problems

4 / 28


How to score

I sincerely apologize that the problems Q1-(c) and Q4-(b) were very difficult.

So I decided that full marks is 70.

Therefore the final score will be based on

40% × (10/7) × (midterm exam) + 60% × (final exam).

The score of Q5 of the midterm exam is either α, β, or 0, where α and β are functions of the total score of Q1 to Q4, and satisfy 0 ≤ β ≤ α ≤ 10. I have not decided the specific function yet...

The final exam will consist of easier questions...

5 / 28


Q5 of the midterm exam

Research interest

statistics, numerical algorithm, optimization, intelligent systems, music data, computational linguistics, causal discovery, mathematical finance, computer graphics, random graph, economics, data structure, hybrid system, brain consciousness.

Comments on the lecture (selected)

loud

explanation could be more detailed

FAQ

easier exercises

more proofs

6 / 28


Review: Stationary distribution

Let p_{ij} denote the transition probability (from i to j) of a Markov chain.

Definition (reminder)

A mass function π = (π_i)_{i∈S} is called the stationary distribution of the Markov chain if

∑_{i∈S} π_i p_{ij} = π_j   for all j ∈ S.

7 / 28


Detailed balance equation

Definition

A Markov chain is called reversible if it has a stationary distribution π = (π_i)_{i∈S} satisfying

π_i p_{ij} = π_j p_{ji}   for all i, j ∈ S.

This equation is called the detailed balance equation.

8 / 28


Detailed balance implies stationarity

Lemma

If a mass function π = (π_i)_{i∈S} satisfies the detailed balance equation, then π is a stationary distribution.

Proof:

∑_i π_i p_{ij} = ∑_i π_j p_{ji}   (detailed balance)
             = π_j ∑_i p_{ji}
             = π_j   (property of transition matrix: each row sums to 1)

9 / 28


Exercises

Exercise 1

Let

P = ( 0    1    0
      1/6  1/2  1/3
      0    2/3  1/3 )

and π = (1/10, 6/10, 3/10). Show that the detailed balance equation is satisfied.

Exercise 2

Let

P = ( a  b  c
      c  a  b
      b  c  a )

and π = (1/3, 1/3, 1/3), where a, b, c ≥ 0 and a + b + c = 1.

1. Show that π is a stationary distribution.
2. Determine the condition under which the detailed balance equation holds.

Note: a matrix P is called doubly stochastic if all the row sums and column sums are 1.
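As a numerical sanity check (my own addition, not part of the slides), the detailed balance equation of Exercise 1 can be verified in a few lines of Python: detailed balance says exactly that the matrix with entries π_i p_{ij} is symmetric.

```python
import numpy as np

# Transition matrix and candidate stationary distribution from Exercise 1.
P = np.array([[0, 1, 0],
              [1/6, 1/2, 1/3],
              [0, 2/3, 1/3]])
pi = np.array([1/10, 6/10, 3/10])

# Detailed balance pi_i * p_ij = pi_j * p_ji  <=>  diag(pi) @ P is symmetric.
F = pi[:, None] * P
print(np.allclose(F, F.T))      # True: detailed balance holds
print(np.allclose(pi @ P, pi))  # True: stationarity follows (Lemma above)
```

The same check can be applied to Exercise 2 for candidate values of a, b, c.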

10 / 28


Markov Chain Monte Carlo (MCMC)

We begin with an example.

A toy example

Let Θ be the set of all strings of length 20 over the four letters 'A', 'G', 'C', 'T'. For example,

"AGATACACATTTAACGGCAT" ∈ Θ.

Define a probability mass function on Θ by

π(θ) = 2^{c(θ)} / Z,   Z = ∑_{θ∈Θ} 2^{c(θ)},

where c(θ) is the number of occurrences of "CAT" in θ. Find the expected number ∑_{θ∈Θ} c(θ)π(θ).

11 / 28


MCMC

Our target is ∑_{θ∈Θ} c(θ)π(θ).

Since |Θ| = 4^20 ≈ 10^12, computing the sum directly is not so easy.

Instead, construct a Markov chain in the following manner.

Gibbs sampler for the above problem

1. Determine an initial state θ^(0). For example, θ^(0) = "AAAAAAAAAAAAAAAAAAAA".
2. Choose one of the 20 sites at random, and replace its letter in such a way that the detailed balance equation is satisfied. Repeat this procedure N times. Let θ^(n), n = 1, ..., N, be the obtained sequence.
3. An estimate is (1/N) ∑_{n=1}^{N} c(θ^(n)).

12 / 28


MCMC

For example, if

θ = "AGATACACATTTAACGGCAT"

and the 2nd letter is chosen (with probability 1/20), then the chain moves from θ to φ with the following probabilities:

φ                        2^{c(φ)}   probability
"AAATACACATTTAACGGCAT"       4         1/5
"AGATACACATTTAACGGCAT"       4         1/5
"ACATACACATTTAACGGCAT"       8         2/5
"ATATACACATTTAACGGCAT"       4         1/5

The detailed balance equation is satisfied. This will be checked later in a more general form.

13 / 28


A result

Sorry, I had no time to implement it.
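Since the slide leaves the implementation open, here is a minimal Python sketch of the Gibbs sampler described above (my own illustration under the stated setup, not code from the lecture; the chain length N is an arbitrary choice):

```python
import random

LETTERS = "AGCT"
L, N = 20, 100_000

def c(theta):
    """Number of occurrences of 'CAT' in the string theta."""
    return sum(theta[k:k + 3] == "CAT" for k in range(len(theta) - 2))

theta = list("A" * L)            # step 1: initial state
total = 0
for n in range(N):               # step 2: repeat the local update N times
    v = random.randrange(L)      # choose one of the 20 sites at random
    # Full conditional: P(letter a at site v) proportional to 2^{c(theta)}.
    weights = []
    for a in LETTERS:
        theta[v] = a
        weights.append(2.0 ** c("".join(theta)))
    theta[v] = random.choices(LETTERS, weights=weights)[0]
    total += c("".join(theta))

print(total / N)                 # step 3: estimate of the expected number of "CAT"
```

The update is exactly the move shown in the table on the previous slide: the new letter is drawn with probability proportional to 2^{c(φ)} among the four candidate strings φ.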

14 / 28


Ising model

Here is another example.

Example (Ising model)

Let Θ = {−1, 1}^{n×n}. The Ising model is defined by

π(θ) = π*(θ) / Z,   π*(θ) = exp( β ∑_{(v,w): v∼w} θ_v θ_w ),   θ ∈ Θ.

Here v ∼ w means that v and w are "adjacent" in the n × n lattice, Z is defined by Z = ∑_{θ∈Θ} π*(θ), and β ∈ R is a given parameter.

How to generate a random sample from π(θ)?

It is hopeless to calculate Z since |Θ| = 2^{n²}.

→ MCMC avoids such computation.

15 / 28


Output of MCMC

Random samples from the Ising model are shown. The size of the lattice is 100 × 100.

[Figure: three samples, for β = 0.4, β = 0 (uniform distribution), and β = −0.4.]

16 / 28


Idea of MCMC

Purpose: generate a sample from π_i = π*_i / Z, where Z = ∑_{i∈S} π*_i.

Assumption: π*_i is easily calculated, but Z is not.

Solution: construct a Markov chain with the stationary distribution π.

Example (cont.)

For the Ising model, we used the following Markov chain:

- Choose v ∈ V = {1, ..., n}² randomly.
- Choose θ̃_v ∈ {−1, 1} with probability

  P(θ̃_v = ±1) = e^{±β ∑_{w∼v} θ_w} / ( e^{β ∑_{w∼v} θ_w} + e^{−β ∑_{w∼v} θ_w} ),

  and set θ̃_w = θ_w for all w ≠ v.
- Move from θ to θ̃.

This is an example of the Gibbs sampler. → next slide

17 / 28


Gibbs sampler

The state space is Θ = S^V, where S is a "local state space" and V is a finite set.

Let Θ_{i,v} = {j ∈ Θ | j_w = i_w for w ≠ v} for i ∈ Θ and v ∈ V.

Assumption: the quantity

h_{ij} = π_j / ∑_{k∈Θ_{i,v}} π_k,   j ∈ Θ_{i,v},

is easy to compute.

Markov chain: choose v ∈ V randomly, and choose j ∈ Θ_{i,v} randomly according to {h_{ij}}. Then move from i to j.

Example (cont.)

For the Ising model, S = {−1, 1}, V = {1, ..., n}², π_i ∝ e^{β ∑_{(v,w): v∼w} i_v i_w}, and

h_{ij} = e^{β ∑_{w∼v} j_v i_w} / ( e^{β ∑_{w∼v} i_w} + e^{−β ∑_{w∼v} i_w} ),   j ∈ Θ_{i,v}.
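To make the update concrete, here is a compact Python sketch of this Gibbs sampler for the Ising model (my own illustration; the lattice size, β, number of updates, and the free boundary condition are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, n_updates = 100, 0.4, 2_000_000
theta = rng.choice([-1, 1], size=(n, n))   # initial state in {-1,1}^(n x n)

for _ in range(n_updates):
    # Choose a site v uniformly at random.
    x, y = rng.integers(n), rng.integers(n)
    # Sum of neighbouring spins (free boundary: edge sites have fewer neighbours).
    s = sum(theta[a, b] for a, b in ((x-1, y), (x+1, y), (x, y-1), (x, y+1))
            if 0 <= a < n and 0 <= b < n)
    # Full conditional h_ij: P(theta_v = +1) = e^{beta*s} / (e^{beta*s} + e^{-beta*s}).
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
    theta[x, y] = 1 if rng.random() < p_plus else -1
```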

18 / 28


Validity of the Gibbs sampler

Lemma

The Markov chain constructed above is reversible.

Proof: The transition probability is

p_{ij} = (1/|V|) ∑_{v∈V} 1{j ∈ Θ_{i,v}} h_{ij}
       = (1/|V|) ∑_{v∈V} 1{j ∈ Θ_{i,v}} π_j / ∑_{k∈Θ_{i,v}} π_k.

Then

π_i p_{ij} = (1/|V|) ∑_{v∈V} 1{j ∈ Θ_{i,v}} π_i π_j / ∑_{k∈Θ_{i,v}} π_k.

This is symmetric in i and j, because j ∈ Θ_{i,v} ⇔ i ∈ Θ_{j,v}, and in that case Θ_{i,v} = Θ_{j,v}, so the denominator is also unchanged.

19 / 28


Exercise

Exercise

Let Θ = {1, 2}² = {(1, 1), (1, 2), (2, 1), (2, 2)} and

π_i = 1 / ((i_1 + i_2) Z),   i ∈ Θ,

where Z = ∑_{i∈Θ} 1/(i_1 + i_2).

1. Write down the transition matrix of the Gibbs sampler (see the sketch below).
2. Confirm that the obtained chain is reversible.
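If you want to check your answers numerically, the following Python sketch (my addition, not part of the exercise) builds the Gibbs transition matrix by brute force and tests reversibility:

```python
import numpy as np
from itertools import product

states = list(product([1, 2], repeat=2))      # Theta = {1,2}^2, lexicographic order
pi = np.array([1.0 / (i1 + i2) for i1, i2 in states])
pi /= pi.sum()                                # divide by Z

P = np.zeros((4, 4))
for a, i in enumerate(states):
    for v in range(2):                        # choose a coordinate v, probability 1/2
        # Theta_{i,v}: states agreeing with i off coordinate v (including i itself).
        nbrs = [b for b, j in enumerate(states)
                if all(j[w] == i[w] for w in range(2) if w != v)]
        for b in nbrs:
            P[a, b] += 0.5 * pi[b] / pi[nbrs].sum()   # (1/|V|) * h_ij

print(P)                                      # the Gibbs transition matrix
F = pi[:, None] * P
print(np.allclose(F, F.T))                    # reversibility: True
```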

20 / 28


A drawback

In general, the Gibbs sampler is quite powerful. However, if the "correlation" of the target distribution is high, the convergence to the stationary distribution may be slow.

Example

Let Θ = {1, ..., 100}² and π_i ∝ exp(−|i_1 − i_2|²/2) (← highly correlated).

[Figure: left, a sample path of i_1 over 10000 steps; right, histogram of the states (blue curve shows exact values).]

21 / 28


Metropolis-Hastings algorithm

Given X_n = i, generate X_{n+1} as follows.

Pick Y ∈ Θ according to

P(Y = j | X_n = i) = h_{ij},

where H = (h_{ij}) is an irreducible transition matrix called the proposal.

Given that Y = j, set

X_{n+1} = j with probability a_{ij}, and X_{n+1} = X_n with probability 1 − a_{ij},

where the acceptance probability a_{ij} ∈ [0, 1] is determined to satisfy the detailed balance equation

π_i h_{ij} a_{ij} = π_j h_{ji} a_{ji}.

For example,

a_{ij} = min( 1, π_j h_{ji} / (π_i h_{ij}) ),   a_{ij} = π_j h_{ji} / (π_i h_{ij} + π_j h_{ji}),   etc.
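As an illustration (my own sketch, not from the slides), one Metropolis-Hastings run for the correlated target π_i ∝ exp(−|i_1 − i_2|²/2) of the previous slide, with a uniform proposal so that h_{ij} is constant and cancels in a_{ij}:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi_star(i):
    # Unnormalized target on {1,...,100}^2: pi_i proportional to exp(-|i1-i2|^2/2).
    return -0.5 * (i[0] - i[1]) ** 2

x = (1, 100)                                     # arbitrary initial state
samples = []
for _ in range(10_000):
    # Proposal: Y uniform on Theta, so h_ij = h_ji and a_ij = min(1, pi_j / pi_i).
    y = (int(rng.integers(1, 101)), int(rng.integers(1, 101)))
    if np.log(rng.random()) < log_pi_star(y) - log_pi_star(x):
        x = y                                    # accept the proposal
    samples.append(x)                            # on rejection, X_{n+1} = X_n
```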

22 / 28


Example (cont.)

Let Θ = {1, ..., 100}² and π_i ∝ exp(−|i_1 − i_2|²/2). For example, determine a proposal matrix as follows:

h_{ij} = (1/100) ( (1/2) · 1{j_1 = j_2} + (1/198) · 1{j_1 ≠ j_2} ),   i, j ∈ Θ.

Note that h_{ij} does not depend on i. We use a_{ij} = min(1, π_j h_{ji} / (π_i h_{ij})).

[Figure: left, trajectory of i_1 over 10000 steps; right, histogram of i_1.]

23 / 28


Application: random sampling of mazes

[Figure: a maze and the corresponding rooted tree (the root is indicated by ○).]

24 / 28


Application: random sampling of mazes

Proposal: Select a neighbour point of the root with probability 1/4. Then modify the tree. For example,

[Figure: snapshots of the tree at t = 0 (initial), t = 1, t = 2, t = 3.]

This chain is not reversible, but has the uniform stationary distribution.

25 / 28


Application: Bayesian inference

θ is a parameter (unobservable)

x is the data (observable)

π(θ): prior

f(x|θ): likelihood

After the data x is observed, the posterior is given by

π(θ|x) = f(x|θ)π(θ) / ∑_θ f(x|θ)π(θ).
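The denominator plays the role of Z, so MCMC applies with π*(θ) = f(x|θ)π(θ). For intuition, here is a tiny hypothetical example where the posterior can still be computed by direct enumeration (the coin model and the data are my own illustration, not from the slides):

```python
from math import comb
import numpy as np

# Hypothetical model: theta in {0,...,10} indexes a head probability theta/10;
# x = number of heads in 5 tosses.
thetas = np.arange(11)
prior = np.full(11, 1 / 11)                      # uniform prior pi(theta)

def likelihood(x, t):                            # f(x | theta): Binomial(5, t/10)
    p = t / 10
    return comb(5, x) * p ** x * (1 - p) ** (5 - x)

x = 4                                            # observed data
unnorm = np.array([likelihood(x, t) for t in thetas]) * prior
posterior = unnorm / unnorm.sum()                # the sum is the normalizing constant
print(posterior)
```

When θ ranges over a huge space, the sum in the denominator is intractable, and the Metropolis-Hastings algorithm samples from π(θ|x) using only the unnormalized values f(x|θ)π(θ).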

26 / 28


Image restoration

[Figure: original image, perturbed image, and restored image; trace plot of the log posterior over 1000 iterations (beta = 1, gamma = 2).]

Image source: http://www.city.yokohama.lg.jp/koutuu/kids/profile.html

27 / 28


Recommended problems

Recommended problems:

§6.5, Problem 1, 8*, 9.

§6.14, Problem 1.

§6.15, Problem 2.

The asterisk (*) indicates a difficult problem.

28 / 28