Hastings paper discussion

Transcript
Page 1: Hastings paper discussion


Monte Carlo Sampling Methods Using Markov Chains and Their Applications

Hastings, University of Toronto

Reading seminar on classics: C. P. Robert; presented by: Donia Skanji

December 3, 2012


Page 2: Hastings paper discussion


Outline

1 Introduction

2 Monte Carlo Principle

3 Markov Chain Theory

4 MCMC

5 Conclusion


Page 3: Hastings paper discussion


Introduction to MCMC Methods


Page 4: Hastings paper discussion


Introduction:

Several numerical problems, such as integral computation and maximum evaluation in large-dimensional spaces, arise in practice.

Monte Carlo methods are often applied to solve integration and optimisation problems.

Markov chain Monte Carlo (MCMC) is one of the best-known Monte Carlo methods.

MCMC methods comprise a large class of sampling algorithms that have had a great influence on the development of science.


Page 5: Hastings paper discussion


To present some relevant theory and techniques of application related to MCMC methods.

To present a generalization of the Metropolis sampling method.

Study objective


Page 6: Hastings paper discussion

Next Steps

Monte Carlo Principle

Markov Chain

To introduce:

- MCMC Methods

- MCMC Algorithms

Page 11: Hastings paper discussion


Monte Carlo Methods


Page 12: Hastings paper discussion


Overview

The idea of Monte Carlo simulation is to draw an i.i.d. set of samples $\{x^{(i)}\}_{i=1}^{N}$ from a target density $\pi$.

These N samples can be used to approximate the target density with the following empirical point-mass function:

$\pi_N(x) = \frac{1}{N}\sum_{i=1}^{N} \delta_{x^{(i)}}(x)$

For independent samples, by the Law of Large Numbers, one can approximate the integrals $I(f)$ with tractable sums $I_N(f)$ that converge as follows:

$I_N(f) = \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)}) \rightarrow I(f) = \int f(x)\,\pi(x)\,dx \quad \text{a.s.}$

see example
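As a concrete illustration of this principle, here is a minimal R sketch; the target π is taken to be the standard normal and f(x) = x², so the exact value of I(f) is 1 (both choices are illustrative, not from the slides):

set.seed(1)
N <- 10000
x <- rnorm(N)        # i.i.d. samples from pi = N(0, 1)
I_N <- mean(x^2)     # I_N(f) = (1/N) * sum of f(x_i)
I_N                  # approaches I(f) = 1 as N grows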


Page 13: Hastings paper discussion


[Figure: N samples x1, x2, ..., xN drawn from the target π]

But independent sampling from π may be difficult, especially in a high-dimensional space.


Page 14: Hastings paper discussion


It turns out that $\frac{1}{N}\sum_{i=1}^{N} f(x^{(i)}) \rightarrow \int f(x)\,\pi(x)\,dx$ as $N \to \infty$ still applies if we generate the samples using a Markov chain (dependent samples).

The idea of MCMC is to use Markov chain convergence properties to overcome the dimensionality problems met by regular Monte Carlo methods.

But first, a brief review of Markov chains on a discrete state space χ.


Page 15: Hastings paper discussion


Markov Chain Theory


Page 16: Hastings paper discussion


Definition

Finite Markov Chain

A Markov chain is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process usually characterized as memoryless:

$P(X^{(t+1)} \mid X^{(0)}, X^{(1)}, \ldots, X^{(t)}) = P(X^{(t+1)} \mid X^{(t)})$


Page 17: Hastings paper discussion


Transition Matrix

Let $P = \{P_{ij}\}$ be the transition matrix of a Markov chain with states $0, 1, 2, \ldots, S$. Then, if $X^{(t)}$ denotes the state occupied by the process at time t, we have:

$\Pr(X^{(t+1)} = j \mid X^{(t)} = i) = P_{ij}$

$X^{(t+1)} = X^{(t)} P$ (reading $X^{(t)}$ here as the row vector of state probabilities at time t)
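For instance, iterating this relation in R with a small illustrative 3-state chain (the matrix P below is a made-up example, not from the paper):

# Evolve the state distribution by repeated right-multiplication with P.
P <- matrix(c(0.50, 0.50, 0.00,
              0.25, 0.50, 0.25,
              0.00, 0.50, 0.50),
            nrow = 3, byrow = TRUE)   # each row sums to 1
x <- c(1, 0, 0)                       # start in the first state
for (t in 1:100) x <- x %*% P         # X(t+1) = X(t) . P
x                                     # converges to the stationary distribution (0.25, 0.50, 0.25)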


Page 18: Hastings paper discussion


Properties

Stationarity

As $t \to \infty$, the Markov chain converges to its stationary (invariant) distribution: $\pi = \pi P$.

Irreducibility

Irreducible means any state can be reached from any other state in a finite number of moves ($p(i, j) > 0$ for every i and j).



Page 22: Hastings paper discussion


MCMC

The idea of the Markov chain Monte Carlo method is to choose the transition matrix P so that π (the target density, which is very difficult to sample from directly) is its unique stationary distribution.

Assume the Markov chain:

- has a stationary distribution π(x)

- is irreducible and aperiodic

Then we have an Ergodic Theorem:

Theorem (Ergodic Theorem)

If the Markov chain $x_t$ is irreducible, aperiodic and stationary, then for any function $h$ with $E|h| < \infty$:

$\frac{1}{N}\sum_{i} h(x_i) \rightarrow \int h(x)\, d\pi(x) \quad \text{as } N \to \infty$


Page 23: Hastings paper discussion


Recall that our goal is to build a Markov chain $(X_t)$ using a transition matrix $P$ so that the limiting distribution of $(X_t)$ is the target density $\pi$, and integrals can be approximated using the ergodic theorem.

Summary


Page 24: Hastings paper discussion


Question

How do we construct a Markov chain whose stationary distribution is the target distribution π?

Metropolis et al. (1953) showed how.

The method was generalized by Hastings (1970).


Page 25: Hastings paper discussion


Construction of the transition matrix

In order to construct a Markov chain with π as its stationary distribution, we have to consider a transition matrix P that satisfies the reversibility condition: for all i and j,

$\pi_i\, p(i \to j) = \pi_j\, p(j \to i)$, i.e. $\pi_i p_{ij} = \pi_j p_{ji}$

This property ensures that $\sum_i \pi_i p_{ij} = \pi_j$ (the definition of a stationary distribution) and hence that π is a stationary distribution of P.


Page 26: Hastings paper discussion


Construction of the transition matrix

How do we choose the transition matrix P so that the reversibility condition $\pi_i P_{ij} = \pi_j P_{ji}$ is verified?


Page 27: Hastings paper discussion


Overview

Suppose that we have a proposal matrix, denoted Q, where $\sum_j q_{ij} = 1$.

If it happens that Q itself satisfies the reversibility condition $\pi_i q_{ij} = \pi_j q_{ji}$ for all i and j, then our search is over; but most likely it will not.

We might find, for example, that for some i and j: $\pi_i q_{ij} > \pi_j q_{ji}$.

A convenient way to correct this is to reduce the number of moves from i to j by introducing a probability $\alpha_{ij}$ that the move is made.


Page 28: Hastings paper discussion


The choice of the transition matrix

We assume that the transition matrix P has this form:

$P_{ij} = q_{ij}\alpha_{ij}$ if $i \neq j$, and $P_{ii} = 1 - \sum_{j \neq i} P_{ij}$

where:

- $Q = (q_{ij})$ is the proposal matrix (or jumping matrix) of an arbitrary Markov chain on the states $0, 1, \ldots, S$, which suggests a new sample value j given a sample value i;

- $\alpha_{ij}$ is the acceptance probability to move from state i to state j.


Page 29: Hastings paper discussion


In order to obtain the reversibility condition, we have to verify:

$\pi_i p_{ij} = \pi_j p_{ji}$, i.e. $\pi_i \alpha_{ij} q_{ij} = \pi_j \alpha_{ji} q_{ji}$ (*)

The probabilities $\alpha_{ij}$ and $\alpha_{ji}$ are introduced to ensure that the two sides of (*) are in balance.

In his paper, Hastings defined a generic form of the acceptance probability:

$\alpha_{ij} = \dfrac{s_{ij}}{1 + \dfrac{\pi_i q_{ij}}{\pi_j q_{ji}}}$

where $s_{ij}$ is a symmetric function of i and j ($s_{ij} = s_{ji}$), chosen so that $0 \leq \alpha_{ij} \leq 1$ for all i and j.

With this form of $P_{ij}$ and $\alpha_{ij}$ suggested by Hastings, the reversibility condition is readily verified.
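To spell out the verification: substituting the generic form of $\alpha_{ij}$ into the left-hand side of (*) gives

$\pi_i q_{ij} \alpha_{ij} = \dfrac{s_{ij}\,\pi_i q_{ij}\,\pi_j q_{ji}}{\pi_j q_{ji} + \pi_i q_{ij}}$

which is symmetric in i and j (since $s_{ij} = s_{ji}$), and hence equals $\pi_j q_{ji} \alpha_{ji}$.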


Page 30: Hastings paper discussion


The acceptance probability α

Recall that in this paper, Hastings defined the acceptance probability $\alpha_{ij}$ as follows:

$\alpha_{ij} = \dfrac{s_{ij}}{1 + \dfrac{\pi_i q_{ij}}{\pi_j q_{ji}}}$

For specific choices of $s_{ij}$, we recognize the acceptance probabilities suggested by both:

- Metropolis et al. (1953)

- Barker (1965)

The choice of α


Page 31: Hastings paper discussion


The acceptance probability α

Two choices of $s_{ij}$ are given for all i and j by:

$s^{(M)}_{ij} = \begin{cases} 1 + \dfrac{\pi_i q_{ij}}{\pi_j q_{ji}} & \text{if } \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}} \geq 1 \\ 1 + \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}} & \text{if } \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}} \leq 1 \end{cases}$

When $q_{ij} = q_{ji}$ and $s_{ij} = s^{(M)}_{ij}$, we have the method devised by Metropolis et al., with $\alpha^{(M)}_{ij} = \min\left(1, \dfrac{\pi_j}{\pi_i}\right)$.

When $q_{ij} = q_{ji}$ and $s_{ij} = s^{(B)}_{ij} = 1$, we have the method devised by Barker, with $\alpha^{(B)}_{ij} = \dfrac{\pi_j}{\pi_i + \pi_j}$.

The choice of $s_{ij}$
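For a symmetric proposal the two rules are easy to compare side by side; a small R sketch (the ratio values r are arbitrary illustrative inputs):

# Acceptance probabilities for a symmetric proposal (q_ij = q_ji),
# as functions of the target ratio r = pi_j / pi_i.
alpha_metropolis <- function(r) pmin(1, r)    # Metropolis et al. (1953)
alpha_barker     <- function(r) r / (1 + r)   # Barker (1965)
r <- c(0.1, 0.5, 1, 2, 10)
rbind(metropolis = alpha_metropolis(r), barker = alpha_barker(r))

Note that min(1, r) ≥ r/(1 + r) for all r > 0, so the Metropolis rule always accepts at least as often.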


Page 32: Hastings paper discussion


In this paper, Hastings mentioned that little is known about the merits of these two choices $s^{(M)}_{ij}$ and $s^{(B)}_{ij}$.

Remark


Page 33: Hastings paper discussion


The Proposal Matrix Q

It has been recognised that the choice of the proposal matrix/density is crucial to the success (rapid convergence) of an MCMC algorithm.

The proposal matrix can be chosen almost arbitrarily; in practice it should let the chain reach all states frequently and keep the acceptance rate high.

The choice of Q


Page 34: Hastings paper discussion


Algorithm

1 First, pick a proposal matrix Q(i, j) of an arbitrary Markov chain on the states 0, 1, ..., S, which suggests a new sample value j given a sample value i.

2 Also, start with some arbitrary point i0 as the first sample.

3 Then, to return a new sample j given the most recent sample i, we proceed as follows:

4 Generate a proposed new sample value j from the jumping distribution Q(i → j).

5 Accept the proposal with probability α(i → j):

- if the proposal is accepted, move to j; otherwise stay at i, and return to step 4;

- repeat until a sample of the desired size is obtained.
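A compact R sketch of these steps for a finite state space (states are indexed 1..S here because R indexes from 1; pi need only be known up to a normalizing constant, and the min(1, ·) acceptance corresponds to the Metropolis choice of s_ij):

# Generic Metropolis-Hastings sampler on a finite state space.
mh_sample <- function(pi, Q, i0, n) {
  S <- length(pi)
  chain <- integer(n)
  i <- i0
  for (t in 1:n) {
    j <- sample.int(S, 1, prob = Q[i, ])                    # step 4: propose j ~ Q(i, .)
    alpha <- min(1, (pi[j] * Q[j, i]) / (pi[i] * Q[i, j]))  # step 5: acceptance probability
    if (runif(1) < alpha) i <- j                            # accept, else stay at i
    chain[t] <- i
  }
  chain
}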



Page 41: Hastings paper discussion


An empirical way of checking convergence is to let two or more different chains run in parallel and see whether they concentrate on the same place.

The calculation of α does not require knowledge of the normalizing constant of π, because it appears in both the numerator and the denominator and therefore cancels.

Although the Markov chain eventually converges to the desired distribution, the initial samples may follow a very different distribution, especially if the starting point is in a region of low density. As a result, a burn-in period is typically necessary.

Remarks


Page 42: Hastings paper discussion


Example: Poisson Distribution as the Target Distribution

Consider π as the Poisson distribution with intensity λ > 0:

$\pi_i = e^{-\lambda} \dfrac{\lambda^i}{i!}$ where $i = 0, 1, 2, \cdots$

Hastings (1970) suggests the following proposal transition matrix:

$q_{ij} = \begin{cases} \tfrac{1}{2} & \text{if } j = i - 1 \\ \tfrac{1}{2} & \text{if } j = i + 1 \\ 0 & \text{otherwise} \end{cases}$ with $q_{00} = q_{01} = \tfrac{1}{2}$ for the boundary state $i = 0$

$Q = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 & \cdots \\ \frac{1}{2} & 0 & \frac{1}{2} & 0 & \cdots \\ 0 & \frac{1}{2} & 0 & \frac{1}{2} & \cdots \\ 0 & 0 & \frac{1}{2} & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$

Q is in fact symmetric, and the algorithm reduces to that of Metropolis.


Page 43: Hastings paper discussion


$p_{ij} = q_{ij}\,\alpha^{(M)}_{ij} = \begin{cases} \tfrac{1}{2}\min\left(1, \tfrac{i}{\lambda}\right) & \text{if } j = i - 1 \\ \tfrac{1}{2}\min\left(1, \tfrac{\lambda}{i+1}\right) & \text{if } j = i + 1 \\ 1 - p_{i,i-1} - p_{i,i+1} & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$

For i = 0:

$p_{0j} = \begin{cases} \tfrac{1}{2}\min(1, \lambda) & \text{if } j = 1 \\ 1 - \tfrac{1}{2}\min(1, \lambda) & \text{if } j = 0 \\ 0 & \text{otherwise} \end{cases}$

This transition probability is aperiodic and irreducible. In practice, if λ is small, this choice of Q seems to work fairly well and quickly to approximate π.
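Detailed balance for this chain can be checked numerically; a short R sketch (λ and the range of states below are illustrative choices):

# Verify pi_i * p(i, i+1) = pi_{i+1} * p(i+1, i) for the Poisson chain.
lambda <- 3
pi_pois <- function(i) dpois(i, lambda)
p_up    <- function(i) 0.5 * min(1, lambda / (i + 1))   # p(i, i+1)
p_down  <- function(i) 0.5 * min(1, i / lambda)         # p(i, i-1)
i <- 0:10
max(abs(pi_pois(i) * sapply(i, p_up) - pi_pois(i + 1) * sapply(i + 1, p_down)))
# prints ~0 (up to floating-point error)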


Page 44: Hastings paper discussion


Algorithm

Given a starting point i, we take j = i + 1 with probability 1/2, or j = i - 1 with probability 1/2:

$q_{ij} = \tfrac{1}{2}\delta_{i-1}(j) + \tfrac{1}{2}\delta_{i+1}(j)$

We calculate the Metropolis-Hastings ratio:

$\alpha_{ij} = \min\left\{1, \dfrac{\pi(j)}{\pi(i)}\right\} = \min\left\{1, \lambda^{(j-i)} \times \dfrac{i!}{j!}\right\}$

Let $u \sim U[0, 1]$:

if $u \leq \alpha_{ij}$ then $X_{k+1} = j$, else $X_{k+1} = X_k = i$.



Page 53: Hastings paper discussion


R implementation

> library(mcsm)
> fact = function(n) { gamma(n + 1) }   # n! via the gamma function
> poissonf = function(n, lambda, x0) {
    x = x0
    xn = x0
    for (i in 1:n) {
      if (xn != 0)
        y = xn + (2 * rbinom(1, 1, 0.5) - 1)   # propose xn - 1 or xn + 1
      else
        y = rbinom(1, 1, 0.5)                  # at 0, propose 0 or 1
      alpha = min(1, lambda^(y - xn) * fact(xn) / fact(y))
      if (runif(1) < alpha) { xn = y }         # accept, else stay at xn
      x = c(x, xn)
    }
    x
  }
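One way to check the sampler against its target (a usage sketch; the run length and burn-in below are illustrative):

> out <- poissonf(10000, lambda = 3, x0 = 0)
> out <- out[-(1:1000)]                        # discard a burn-in period
> round(table(out) / length(out), 3)           # compare with dpois(0:10, 3)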



Page 55: Hastings paper discussion


Multivariate Target

If the distribution π is d-dimensional and the simulated process is $X(t) = \{X_1(t), \cdots, X_d(t)\}$, we may use the following techniques to construct the transition matrix P:

1 In the transition from t to t + 1, all co-ordinates of X(t) may be changed.

2 In the transition from t to t + 1, only one co-ordinate of X(t) may be changed, the selection being made at random among the d co-ordinates.

3 In the transition from time t to t + 1, only one co-ordinate may change in each transition, the co-ordinate being selected in a fixed rather than a random sequence.


Page 56: Hastings paper discussion


Hastings transformed the d-dimensional problem into one-dimensional problems.

The approach is based on updating one component at a time.

The transition matrix is defined as follows: $P = P_1 P_2 \cdots P_d$.

For each $k = 1, \ldots, d$, $P_k$ is constructed so that $\pi P_k = \pi$.

π will be a stationary distribution of P, since $\pi P = \pi P_1 \cdots P_d = \pi P_2 \cdots P_d = \cdots = \pi$.

Hastings' justification

Orthogonal Matrices
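A sketch of the one-component-at-a-time idea in R, for a continuous d-dimensional target (the random-walk proposal, its scale, and the example target are illustrative assumptions, not Hastings' exact construction):

# One-component-at-a-time Metropolis-Hastings, cycling k = 1..d in a fixed order.
# log_pi: log of the target density; x0: starting point; n: number of sweeps.
componentwise_mh <- function(log_pi, x0, n, sd = 1) {
  d <- length(x0)
  x <- x0
  out <- matrix(NA_real_, n, d)
  for (t in 1:n) {
    for (k in 1:d) {                          # each P_k leaves pi invariant
      y <- x
      y[k] <- x[k] + rnorm(1, 0, sd)          # propose a change in co-ordinate k only
      if (log(runif(1)) < log_pi(y) - log_pi(x)) x <- y
    }
    out[t, ] <- x
  }
  out
}
# Example: samples <- componentwise_mh(function(x) -sum(x^2) / 2, c(0, 0), 5000)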


Page 57: Hastings paper discussion


Conclusion

+ In this paper, Hastings gives a generalization of the Metropolis et al. (1953) approach.

+ He also introduced the Gibbs sampling strategy when he presented the multivariate target.

+ Hastings treated the continuous case using a discretization analogy.

- Little information is given about the merits of the Metropolis and Barker acceptance forms.


Page 58: Hastings paper discussion


Thank You For Your Attention


Page 59: Hastings paper discussion


Bibliography

[1] W. K. Hastings (1970). Monte Carlo Sampling Methods Using Markov Chains and Their Applications.
[2] Christian P. Robert (2010). Introducing Monte Carlo Methods with R.
[3] Kenneth Lange (2010). Numerical Analysis for Statisticians.
[4] Siddhartha Chib (1995). Understanding the Metropolis-Hastings Algorithm.
[5] Robert Gray (2001). Advanced Statistical Computing.


Page 60: Hastings paper discussion


Random orthogonal Matrices

Hastings suggests an interesting chain on the space of n × n orthogonal matrices ($H'H = I$, $\det(H) = 1$).

The proposal stage of Hastings' algorithm consists of choosing at random two indices i and j and an angle $\theta \in [0, 2\pi]$.

The proposed replacement for the current rotation matrix H is then $H^\ast = E_{ij}(\theta)\, H$.

$E_{ij}(\theta)$ coincides with the identity matrix except for some entries.

Since $E_{ij}(\theta)^{-1} = E_{ij}(-\theta)$, the transition density is symmetric and the induced Markov chain is reversible.
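In code, one proposal step might look like this (a sketch; the sign convention chosen for $E_{ij}(\theta)$ is one standard form of a Givens rotation):

# E_ij(theta): identity matrix except in rows/columns i and j.
givens <- function(n, i, j, theta) {
  E <- diag(n)
  E[i, i] <- cos(theta); E[j, j] <- cos(theta)
  E[i, j] <- -sin(theta); E[j, i] <- sin(theta)
  E
}
# Propose a replacement for the current rotation matrix H.
propose <- function(H) {
  ij <- sample(nrow(H), 2)           # two random indices i != j
  theta <- runif(1, 0, 2 * pi)       # random angle in [0, 2*pi]
  givens(nrow(H), ij[1], ij[2], theta) %*% H
}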


Page 61: Hastings paper discussion


Estimating π using Monte Carlo methods (SAS output)

Problem: Estimate π using Monte Carlo integration.

Strategy: The equation of a circle with radius 1 is $x^2 + y^2 = 1$, which can be written $y = \sqrt{1 - x^2}$.

Area of this circle = π.

Area of this circle in the first quadrant = π/4.

Generate $U_x \sim \text{Uniform}(0, 1)$ and $U_y \sim \text{Uniform}(0, 1)$.

Check to see if $U_y \leq \sqrt{1 - U_x^2}$.

The proportion of generated points for which this condition is true is an estimate of π/4.

Based on 10,000 simulated points using SAS: $\hat{\pi}$ (SE) = 3.1056 (0.016).
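The slide reports SAS output; an equivalent sketch in R (the sample size N and the seed are arbitrary):

set.seed(1)
N <- 10000
ux <- runif(N); uy <- runif(N)                 # uniform points in the unit square
inside <- as.numeric(uy <= sqrt(1 - ux^2))     # 1 if under the quarter circle
pi_hat <- 4 * mean(inside)                     # estimate of pi
se <- 4 * sd(inside) / sqrt(N)                 # its standard error
c(pi_hat, se)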
