PMR: Sampling II - The University of Edinburgh


PMR: Sampling II
Probabilistic Modelling and Reasoning

Amos Storkey

School of Informatics, University of Edinburgh



Outline

1 Metropolis Hastings

2 Gibbs Sampling

3 Hamiltonian MCMC






MCMC - Metropolis-Hastings Sampler

Markov chain: Propose θ′ from Q(θ′|θt). Accept with probability

\[
P(\text{Accept}) = \min\left(1, \frac{P(\theta')\,Q(\theta_t|\theta')}{P(\theta_t)\,Q(\theta'|\theta_t)}\right)
\]

If accept, set θt+1 = θ′, else set θt+1 = θt.
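The propose/accept loop above can be sketched in a few lines. This is a minimal illustration, not the lecture's Matlab demo: the Gaussian random-walk proposal, the standard normal target, and the names `metropolis_hastings`, `log_standard_normal` and `width` are all assumptions made for the example. Because the assumed proposal is symmetric, the Q terms cancel in the acceptance ratio.

```python
import math
import random


def metropolis_hastings(log_p, theta0, n_steps, width=1.0, seed=0):
    """Random-walk Metropolis-Hastings on a 1-D target.

    log_p : log of the (possibly unnormalised) target density P
    width : standard deviation of the assumed Gaussian proposal Q
    Returns the list of states and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    n_accept = 0
    for _ in range(n_steps):
        # Propose theta' ~ Q(theta' | theta_t). The Gaussian is symmetric,
        # so Q(theta_t | theta') / Q(theta' | theta_t) = 1 and drops out.
        proposal = theta + rng.gauss(0.0, width)
        log_ratio = log_p(proposal) - log_p(theta)
        # Accept with probability min(1, ratio), computed in log space.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            theta = proposal
            n_accept += 1
        # On rejection the chain repeats the current state.
        samples.append(theta)
    return samples, n_accept / n_steps


def log_standard_normal(x):
    # Illustrative target: N(0, 1) up to an additive constant.
    return -0.5 * x * x


samples, acc_rate = metropolis_hastings(log_standard_normal, 0.0, 20000, width=2.4)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate {acc_rate:.2f}, mean {mean:.2f}, variance {var:.2f}")
```

Only differences of log P are used, so the target needs to be known only up to a normalising constant.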









Metropolis-Hastings Transition

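Write out the full transition probability

\[
P(\phi|\theta) = \min\left(1, \frac{P(\phi)\,Q(\theta|\phi)}{P(\theta)\,Q(\phi|\theta)}\right) Q(\phi|\theta) + \delta(\phi - \theta) \int d\phi' \left(1 - \min\left(1, \frac{P(\phi')\,Q(\theta|\phi')}{P(\theta)\,Q(\phi'|\theta)}\right)\right) Q(\phi'|\theta)
\]

For φ ≠ θ only the first term contributes. It equals

\[
\frac{P(\phi)\,Q(\theta|\phi)}{P(\theta)\,Q(\phi|\theta)}\, Q(\phi|\theta)
\]

if P(θ)Q(φ|θ) > P(φ)Q(θ|φ), and

\[
Q(\phi|\theta)
\]

otherwise. The second term is the total probability of rejecting a proposal, which leaves the chain at θ.
Exercise: Prove that this satisfies detailed balance.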
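Before attempting the proof, it can help to check the claim numerically on a small discrete state space, where the integral becomes a sum and the delta function becomes the diagonal of a transition matrix. The sketch below is an illustration under assumed values: the three-state target `p` and the asymmetric proposal matrix `Q` are made up for the example.

```python
def mh_transition_matrix(p, Q):
    """Build T[theta][phi] = P(phi | theta) for a discrete MH chain.

    p : target probabilities
    Q : Q[theta][phi] = Q(phi | theta), with all entries > 0
    """
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    for theta in range(n):
        for phi in range(n):
            if phi != theta:
                ratio = (p[phi] * Q[phi][theta]) / (p[theta] * Q[theta][phi])
                T[theta][phi] = Q[theta][phi] * min(1.0, ratio)
        # Rejected proposals (and proposals of theta itself) stay at theta:
        # the discrete analogue of the delta-function term.
        T[theta][theta] = 1.0 - sum(T[theta])
    return T


# Assumed example values: a 3-state target and an asymmetric proposal.
p = [0.5, 0.3, 0.2]
Q = [[0.2, 0.5, 0.3],
     [0.6, 0.1, 0.3],
     [0.3, 0.3, 0.4]]
T = mh_transition_matrix(p, Q)

# Detailed balance: P(theta) T(phi | theta) = P(phi) T(theta | phi) for all pairs.
max_violation = max(
    abs(p[a] * T[a][b] - p[b] * T[b][a]) for a in range(3) for b in range(3)
)
# Consequence: one step of the chain leaves the target distribution unchanged.
stationary_step = [sum(p[a] * T[a][b] for a in range(3)) for b in range(3)]
print(max_violation, stationary_step)
```

The check works because P(θ)T(φ|θ) reduces to min(P(θ)Q(φ|θ), P(φ)Q(θ|φ)), which is symmetric in θ and φ. This is the key step of the proof.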










Metropolis-Hastings Demo

Matlab Demo






Metropolis-Hastings Problems

Setting the right proposal to provide good mixing, but reasonable acceptance probability.
Try to get the acceptance rate to be 0.234 (various arguments provide conditions for this to be optimal). Can vary the proposal width to tune it.
Random walk behaviour in high dimensions: the proposal is a diffusion process. Can take a long time to get anywhere.
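The trade-off between proposal width and acceptance rate can be seen in a small experiment. The sketch below is an illustration only: the 10-dimensional standard Gaussian target, the isotropic Gaussian proposal, the three widths, and the function name `acceptance_rate` are all assumptions chosen for the example.

```python
import math
import random


def acceptance_rate(width, dim=10, n_steps=5000, seed=0):
    """Acceptance rate of random-walk Metropolis on N(0, I) in `dim` dimensions."""
    rng = random.Random(seed)
    # Start from a typical draw of the target so that burn-in is negligible.
    theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    log_p = -0.5 * sum(x * x for x in theta)
    accepted = 0
    for _ in range(n_steps):
        proposal = [x + rng.gauss(0.0, width) for x in theta]
        log_p_new = -0.5 * sum(x * x for x in proposal)
        delta = log_p_new - log_p
        if delta >= 0 or rng.random() < math.exp(delta):
            theta, log_p = proposal, log_p_new
            accepted += 1
    return accepted / n_steps


# Narrow, intermediate and wide proposals (assumed values for illustration).
rates = {w: acceptance_rate(w) for w in (0.1, 0.75, 2.0)}
for w, r in rates.items():
    print(f"width={w}: acceptance rate {r:.3f}")
```

A narrow proposal is accepted almost every time but moves very little per step. A wide proposal is almost always rejected. Varying the width between these extremes is how the acceptance rate is steered towards the target value.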















MCMC - Gibbs Sampler

Markov chain: Adapt θi keeping all θj, j ≠ i, fixed, i.e.
Choose i uniformly from i = 1, 2, . . . , D. Set θt+1 = θt. Then sample θt+1,i from the conditional probability P(θt+1,i | θt+1,\i), where θt+1,\i denotes the set {θt+1,j | j ≠ i}.
Repeat.
Can alternatively cycle through i in order (this is not reversible, but can be shown to have a unique equilibrium distribution).
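As a sketch of the random-scan update described above, consider a zero-mean bivariate Gaussian with unit variances and correlation ρ. For that target each conditional is itself Gaussian, with mean ρ times the other component and variance 1 − ρ². The target, the value ρ = 0.8, and the function name `gibbs_bivariate_gaussian` are assumptions made for this illustration; this is not the lecture's Matlab demo.

```python
import math
import random


def gibbs_bivariate_gaussian(rho, n_steps, seed=0):
    """Random-scan Gibbs sampler for a bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    cond_std = math.sqrt(1.0 - rho * rho)
    theta = [0.0, 0.0]
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(2)    # choose i uniformly
        other = theta[1 - i]    # every other component stays fixed
        # Sample theta_i from its conditional given the other component.
        theta[i] = rng.gauss(rho * other, cond_std)
        samples.append(tuple(theta))
    return samples


samples = gibbs_bivariate_gaussian(rho=0.8, n_steps=40000)
n = len(samples)
mean0 = sum(s[0] for s in samples) / n
mean1 = sum(s[1] for s in samples) / n
cov01 = sum((s[0] - mean0) * (s[1] - mean1) for s in samples) / n
print(f"means ({mean0:.2f}, {mean1:.2f}), covariance {cov01:.2f}")
```

Every draw is accepted, since sampling exactly from the conditional needs no accept/reject step. The price is that strongly correlated components move only a little at each update.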






Gibbs Demo

Matlab Demo: Gaussian
Matlab Demo: Lattice



MCMC - Block Sampler

Gibbs sampler suffers from a self-reinforcement problem: frustrated systems.
Instead of updating one variable at a time it may be possible to update a whole block of variables in one go.
Can help a bit. But need to have the joint distribution for the block.











MCMC - Mixed Sampler

Can mix ergodic sampling steps from different samplers: still satisfies detailed balance.
Can help to overcome the disadvantages of one method by incorporating another.
Often helpful to add in specific steps to help with mixing: if a sampler gets stuck in one potential well (a region surrounded by low probability regions), need a means of getting it to transition to another.















Augmentation Methods

Suppose we have a big sampling problem that is hard. Solution?
Turn it into an even bigger problem by adding additional variables. Solve that.
Can get samples from the original problem by just throwing the unneeded variables away.
If samples (ψi, θi) are from the joint P(ψ, θ) then they are also samples from P(ψ|θ)P(θ), as this is the same distribution. Hence the samples θi must be from P(θ), as ∫dψ P(ψ|θ) = 1 whatever θ is.
Examples: Hamiltonian Monte-Carlo. Swendsen Wang.
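Written out as one chain of equalities, the marginalisation argument is the following (a restatement of the step above, with no new assumptions):

```latex
\[
\int d\psi \, P(\psi, \theta)
  = \int d\psi \, P(\psi|\theta)\, P(\theta)
  = P(\theta) \int d\psi \, P(\psi|\theta)
  = P(\theta)
\]
```

So discarding ψi from each joint sample (ψi, θi) leaves a sample θi from the marginal P(θ), however ψ was introduced.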







































MCMC - Hamiltonian (or Hybrid) Monte-Carlo

Problem of Metropolis Hastings is random walk behaviour: slow diffusion to cover the space.
Hamiltonian Monte-Carlo reduces this by augmenting each variable in the original space with another random variable.
Now can do contour walks for each of these variables, in addition to Gibbs sampling steps in the joint distribution of the augmented variables.
Related to Hamiltonian systems in physics: maintain constant energy by swapping kinetic energy for potential energy. Augmented variables are momentum variables.
See MacKay Chapter 30.



















Hamiltonian Monte-Carlo Procedure

Original problem P(θ). Add augmented Gaussian P(v).
Step 1: Sample from Gaussian P(v).
Step 2: Choose a direction (b = ±1) (to maintain reversibility).
Walk along Hamiltonian path.

\[
\dot{\theta}_i = -b \frac{\partial}{\partial v_i} \log P(v), \qquad \dot{v}_i = b \frac{\partial}{\partial \theta_i} \log P(\theta)
\]

Repeat.
Actually have to put in a few fixes to deal with the fact that we can only run the differential system using finite steps. Need to use leapfrog steps, run bidirectionally, and use a proposal/acceptance approach to ensure detailed balance.
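The procedure, including the leapfrog and accept/reject fixes, can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: P(v) is taken to be a standard Gaussian (so the first equation becomes θ̇ = b v), the target is a 2-D Gaussian with correlation 0.8, and the step size, trajectory length, and names `hmc`, `log_p`, `grad_log_p` are chosen for the example.

```python
import math
import random

RHO = 0.8  # correlation of the assumed 2-D Gaussian target


def log_p(theta):
    x, y = theta
    return -0.5 * (x * x - 2 * RHO * x * y + y * y) / (1 - RHO * RHO)


def grad_log_p(theta):
    x, y = theta
    s = 1.0 / (1 - RHO * RHO)
    return [-(x - RHO * y) * s, -(y - RHO * x) * s]


def hmc(log_p, grad_log_p, theta0, n_samples, step=0.2, n_leapfrog=10, seed=0):
    rng = random.Random(seed)
    theta = list(theta0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Step 1: sample the momentum from P(v) = N(0, I).
        v0 = [rng.gauss(0.0, 1.0) for _ in theta]
        # Step 2: choose a direction b = +1 or -1.
        eps = rng.choice((-1.0, 1.0)) * step
        # Leapfrog integration of theta_dot = b v, v_dot = b grad log P(theta).
        new_theta, v = list(theta), list(v0)
        g = grad_log_p(new_theta)
        for _ in range(n_leapfrog):
            v = [vi + 0.5 * eps * gi for vi, gi in zip(v, g)]
            new_theta = [ti + eps * vi for ti, vi in zip(new_theta, v)]
            g = grad_log_p(new_theta)
            v = [vi + 0.5 * eps * gi for vi, gi in zip(v, g)]
        # Exact dynamics would conserve log P(theta) + log P(v). Finite steps
        # do not, so accept or reject on the change in that quantity.
        old = log_p(theta) - 0.5 * sum(x * x for x in v0)
        new = log_p(new_theta) - 0.5 * sum(x * x for x in v)
        delta = new - old
        if delta >= 0 or rng.random() < math.exp(delta):
            theta = new_theta
            accepted += 1
        samples.append(list(theta))
    return samples, accepted / n_samples


samples, acc_rate = hmc(log_p, grad_log_p, [0.0, 0.0], 5000)
n = len(samples)
mean0 = sum(s[0] for s in samples) / n
mean1 = sum(s[1] for s in samples) / n
cov01 = sum((s[0] - mean0) * (s[1] - mean1) for s in samples) / n
print(f"acceptance {acc_rate:.2f}, covariance {cov01:.2f}")
```

Each proposal is the end point of a whole trajectory rather than a single random step, which is how the method avoids the slow diffusion of random-walk proposals.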










Hamiltonian Monte-Carlo Comments

Most commonly used in large continuous systems.
Hamiltonian Monte-Carlo is recommended for many typical unsupervised settings.











Swendsen Wang

Problem: Gibbs sampling mixes poorly due to self-reinforcement.
Add in bond variables between highly aligned variables. Bonds can be in ‘connected’ or ‘disconnected’ states.
Ensure marginal distribution is the original problem.
Conditioned on the states, bonds are independent. Can randomly cut strong bonds.
Better mixing comes from the fact that there are now fewer reinforcing influences.
See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf








Convergence?

No guaranteed way to test convergence in general.
Main tests involve tests of coalescence: the chain has converged when it has forgotten its past.
Multiple chains: do they end up in the same place?
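One common way to make the multiple-chains question quantitative is the Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variance. It is not named on the slide and is brought in here only as an illustration. The sketch below uses the basic form of the statistic, and the "chains" are synthetic independent Gaussian draws standing in for real sampler output.

```python
import math
import random


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length scalar chains."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    var_estimate = (n - 1) / n * within + between / n
    return math.sqrt(var_estimate / within)


rng = random.Random(0)
# Stand-in for four chains that all explore the same distribution.
agreeing = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Stand-in for four chains stuck in two different potential wells.
stuck = [[rng.gauss(c, 1.0) for _ in range(1000)] for c in (-3.0, -3.0, 3.0, 3.0)]

r_agreeing = gelman_rubin(agreeing)
r_stuck = gelman_rubin(stuck)
print(f"agreeing chains: {r_agreeing:.3f}, stuck chains: {r_stuck:.3f}")
```

A value close to 1 says only that the chains agree with each other. In keeping with the first point above, it does not guarantee convergence, since all chains may have missed the same region.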







Real Machine Learning?

But how are these used in a real probabilistic modelling context?
For sampling from the posterior distribution of machine learning methods.











To Do

Examinable Reading
MacKay Chapter 29, 30

Preparatory Reading
MacKay Chapter 45

Extra Reading
Any papers of Radford Neal that take your fancy. Iain Murray’s tutorial slides.

