65
Metropolis Hastings Gibbs Sampling Hamiltonian MCMC PMR: Sampling II Probabilistic Modelling and Reasoning Amos Storkey School of Informatics, University of Edinburgh Amos Storkey — PMR: Sampling II 1/20

PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Embed Size (px)

Citation preview

Page 1: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

PMR: Sampling IIProbabilistic Modelling and Reasoning

Amos Storkey

School of Informatics, University of Edinburgh

Amos Storkey — PMR: Sampling II 1/20

Page 2: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Outline

1 Metropolis Hastings

2 Gibbs Sampling

3 Hamiltonian MCMC

Amos Storkey — PMR: Sampling II 2/20

Page 3: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Outline

1 Metropolis Hastings

2 Gibbs Sampling

3 Hamiltonian MCMC

Amos Storkey — PMR: Sampling II 3/20

Page 4: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Outline

Amos Storkey — PMR: Sampling II 4/20

Page 5: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Metropolis-Hastings Sampler

Markov chain: Propose Q(θ′|θt).Accept with probability

P(Accept) = min(1,

P(θ′)Q(θt|θ′)P(θt)Q(θ′|θt)

)

If accept, set θt+1 = θ′, else set θt+1 = θt.

Amos Storkey — PMR: Sampling II 5/20

Page 6: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Metropolis-Hastings Sampler

Markov chain: Propose Q(θ′|θt).Accept with probability

P(Accept) = min(1,

P(θ′)Q(θt|θ′)P(θt)Q(θ′|θt)

)

If accept, set θt+1 = θ′, else set θt+1 = θt.

Amos Storkey — PMR: Sampling II 5/20

Page 7: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Metropolis-Hastings Sampler

Markov chain: Propose Q(θ′|θt).Accept with probability

P(Accept) = min(1,

P(θ′)Q(θt|θ′)P(θt)Q(θ′|θt)

)

If accept, set θt+1 = θ′, else set θt+1 = θt.

Amos Storkey — PMR: Sampling II 5/20

Page 8: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Metropolis-Hastings Sampler

Markov chain: Propose Q(θ′|θt).Accept with probability

P(Accept) = min(1,

P(θ′)Q(θt|θ′)P(θt)Q(θ′|θt)

)

If accept, set θt+1 = θ′, else set θt+1 = θt.

Amos Storkey — PMR: Sampling II 5/20

Page 9: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Transition

Write out the full transition probability

P(φ|θ) =P(φ)Q(θ|φ)P(θ)Q(φ|θ)

Q(φ|θ)+∫dφ′

(1 −min

(1,

P(φ′)Q(θ|φ′)P(θ)Q(φ′|θ)

))Q(φ′|θ)δ(φ − θ)

if P(θ)Q(φ|θ) > P(φ)Q(θ|φ) and

P(φ|θ) = Q(φ|θ)

otherwise.Exercise: Prove satisfies detailed balance.

Amos Storkey — PMR: Sampling II 6/20

Page 10: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Transition

Write out the full transition probability

P(φ|θ) =P(φ)Q(θ|φ)P(θ)Q(φ|θ)

Q(φ|θ)+∫dφ′

(1 −min

(1,

P(φ′)Q(θ|φ′)P(θ)Q(φ′|θ)

))Q(φ′|θ)δ(φ − θ)

if P(θ)Q(φ|θ) > P(φ)Q(θ|φ) and

P(φ|θ) = Q(φ|θ)

otherwise.Exercise: Prove satisfies detailed balance.

Amos Storkey — PMR: Sampling II 6/20

Page 11: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Transition

Write out the full transition probability

P(φ|θ) =P(φ)Q(θ|φ)P(θ)Q(φ|θ)

Q(φ|θ)+∫dφ′

(1 −min

(1,

P(φ′)Q(θ|φ′)P(θ)Q(φ′|θ)

))Q(φ′|θ)δ(φ − θ)

if P(θ)Q(φ|θ) > P(φ)Q(θ|φ) and

P(φ|θ) = Q(φ|θ)

otherwise.Exercise: Prove satisfies detailed balance.

Amos Storkey — PMR: Sampling II 6/20

Page 12: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Demo

Matlab Demo

Amos Storkey — PMR: Sampling II 7/20

Page 13: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Demo

Matlab Demo

Amos Storkey — PMR: Sampling II 7/20

Page 14: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Problems

Setting the right proposal to provide good mixing, butreasonable acceptance probability.Try to get acceptance rate to be 0.234 (various argumentsprovide conditions for this to be optimal). Can vary width.Random walk behaviour in high dimensions: proposal is adiffusion process. Can take a long time to get anywhere.

Amos Storkey — PMR: Sampling II 8/20

Page 15: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Problems

Setting the right proposal to provide good mixing, butreasonable acceptance probability.Try to get acceptance rate to be 0.234 (various argumentsprovide conditions for this to be optimal). Can vary width.Random walk behaviour in high dimensions: proposal is adiffusion process. Can take a long time to get anywhere.

Amos Storkey — PMR: Sampling II 8/20

Page 16: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Problems

Setting the right proposal to provide good mixing, butreasonable acceptance probability.Try to get acceptance rate to be 0.234 (various argumentsprovide conditions for this to be optimal). Can vary width.Random walk behaviour in high dimensions: proposal is adiffusion process. Can take a long time to get anywhere.

Amos Storkey — PMR: Sampling II 8/20

Page 17: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Metropolis-Hastings Problems

Setting the right proposal to provide good mixing, butreasonable acceptance probability.Try to get acceptance rate to be 0.234 (various argumentsprovide conditions for this to be optimal). Can vary width.Random walk behaviour in high dimensions: proposal is adiffusion process. Can take a long time to get anywhere.

Amos Storkey — PMR: Sampling II 8/20

Page 18: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Gibbs Sampler

Markov chain: Adapt θi keeping all θ j,i fixed. i.e.Choose i uniformly from i = 1, 2, . . . ,D. Set θt+1 = θt. Thensample θt+1,i from the conditional probability P(θt+1,i|θt+1,,i)where θt+1,,i denotes the set {θt+1, j| j , i}.Repeat.Can cycle through i either (this is not reversible, but can beshown to have a unique equilibrium distribution)

Amos Storkey — PMR: Sampling II 9/20

Page 19: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Gibbs Sampler

Markov chain: Adapt θi keeping all θ j,i fixed. i.e.Choose i uniformly from i = 1, 2, . . . ,D. Set θt+1 = θt. Thensample θt+1,i from the conditional probability P(θt+1,i|θt+1,,i)where θt+1,,i denotes the set {θt+1, j| j , i}.Repeat.Can cycle through i either (this is not reversible, but can beshown to have a unique equilibrium distribution)

Amos Storkey — PMR: Sampling II 9/20

Page 20: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Gibbs Demo

Matlab Demo: GaussianMatlab Demo: Lattice

Amos Storkey — PMR: Sampling II 10/20

Page 21: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Block Sampler

Gibbs sampler suffers from self reinforcement problem:frustrated systems.Instead of updating one variable at a time it may bepossible to update a whole block of variables in one go.Can help a bit. But need to have joint distribution for block.

Amos Storkey — PMR: Sampling II 11/20

Page 22: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Block Sampler

Gibbs sampler suffers from self reinforcement problem:frustrated systems.Instead of updating one variable at a time it may bepossible to update a whole block of variables in one go.Can help a bit. But need to have joint distribution for block.

Amos Storkey — PMR: Sampling II 11/20

Page 23: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Block Sampler

Gibbs sampler suffers from self reinforcement problem:frustrated systems.Instead of updating one variable at a time it may bepossible to update a whole block of variables in one go.Can help a bit. But need to have joint distribution for block.

Amos Storkey — PMR: Sampling II 11/20

Page 24: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Mixed Sampler

Can mix ergodic sampling steps from different samplers:still satisfies detailed balance.Can helps to overcome the disadvantages of one methodby incorporating another.Often helpful to add in specific steps to help with mixing: ifa sampler gets stuck in one potential well (a regionsurrounded by low probability regions), need a means ofgetting it to transition to another.

Amos Storkey — PMR: Sampling II 12/20

Page 25: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Mixed Sampler

Can mix ergodic sampling steps from different samplers:still satisfies detailed balance.Can helps to overcome the disadvantages of one methodby incorporating another.Often helpful to add in specific steps to help with mixing: ifa sampler gets stuck in one potential well (a regionsurrounded by low probability regions), need a means ofgetting it to transition to another.

Amos Storkey — PMR: Sampling II 12/20

Page 26: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Mixed Sampler

Can mix ergodic sampling steps from different samplers:still satisfies detailed balance.Can helps to overcome the disadvantages of one methodby incorporating another.Often helpful to add in specific steps to help with mixing: ifa sampler gets stuck in one potential well (a regionsurrounded by low probability regions), need a means ofgetting it to transition to another.

Amos Storkey — PMR: Sampling II 12/20

Page 27: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Mixed Sampler

Can mix ergodic sampling steps from different samplers:still satisfies detailed balance.Can helps to overcome the disadvantages of one methodby incorporating another.Often helpful to add in specific steps to help with mixing: ifa sampler gets stuck in one potential well (a regionsurrounded by low probability regions), need a means ofgetting it to transition to another.

Amos Storkey — PMR: Sampling II 12/20

Page 28: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Augmentation Methods

Suppose we have a big sampling problem, that is hard.Solution?Turn it into an even bigger problem by adding additionalvariables. Solve that.Can get sample from the original problem by just throwingunneeded variables away.If samples (ψi,θi) are from joint P(ψ,θ) then samples arealso samples of P(ψ|θ)P(θ) as this is the same. Hencesamples θi must be from P(θ) as

∫dψP(ψ|θ) = 1 whatever

θ is.Examples: Hamiltonian Monte-Carlo. Swendsen Wang.

Amos Storkey — PMR: Sampling II 13/20

Page 29: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Augmentation Methods

Suppose we have a big sampling problem, that is hard.Solution?Turn it into an even bigger problem by adding additionalvariables. Solve that.Can get sample from the original problem by just throwingunneeded variables away.If samples (ψi,θi) are from joint P(ψ,θ) then samples arealso samples of P(ψ|θ)P(θ) as this is the same. Hencesamples θi must be from P(θ) as

∫dψP(ψ|θ) = 1 whatever

θ is.Examples: Hamiltonian Monte-Carlo. Swendsen Wang.

Amos Storkey — PMR: Sampling II 13/20

Page 30: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Augmentation Methods

Suppose we have a big sampling problem, that is hard.Solution?Turn it into an even bigger problem by adding additionalvariables. Solve that.Can get sample from the original problem by just throwingunneeded variables away.If samples (ψi,θi) are from joint P(ψ,θ) then samples arealso samples of P(ψ|θ)P(θ) as this is the same. Hencesamples θi must be from P(θ) as

∫dψP(ψ|θ) = 1 whatever

θ is.Examples: Hamiltonian Monte-Carlo. Swendsen Wang.

Amos Storkey — PMR: Sampling II 13/20

Page 31: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Augmentation Methods

Suppose we have a big sampling problem, that is hard.Solution?Turn it into an even bigger problem by adding additionalvariables. Solve that.Can get sample from the original problem by just throwingunneeded variables away.If samples (ψi,θi) are from joint P(ψ,θ) then samples arealso samples of P(ψ|θ)P(θ) as this is the same. Hencesamples θi must be from P(θ) as

∫dψP(ψ|θ) = 1 whatever

θ is.Examples: Hamiltonian Monte-Carlo. Swendsen Wang.

Amos Storkey — PMR: Sampling II 13/20

Page 32: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Augmentation Methods

Suppose we have a big sampling problem, that is hard.Solution?Turn it into an even bigger problem by adding additionalvariables. Solve that.Can get sample from the original problem by just throwingunneeded variables away.If samples (ψi,θi) are from joint P(ψ,θ) then samples arealso samples of P(ψ|θ)P(θ) as this is the same. Hencesamples θi must be from P(θ) as

∫dψP(ψ|θ) = 1 whatever

θ is.Examples: Hamiltonian Monte-Carlo. Swendsen Wang.

Amos Storkey — PMR: Sampling II 13/20

Page 33: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Augmentation Methods

Suppose we have a big sampling problem, that is hard.Solution?Turn it into an even bigger problem by adding additionalvariables. Solve that.Can get sample from the original problem by just throwingunneeded variables away.If samples (ψi,θi) are from joint P(ψ,θ) then samples arealso samples of P(ψ|θ)P(θ) as this is the same. Hencesamples θi must be from P(θ) as

∫dψP(ψ|θ) = 1 whatever

θ is.Examples: Hamiltonian Monte-Carlo. Swendsen Wang.

Amos Storkey — PMR: Sampling II 13/20

Page 34: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Augmentation Methods

Suppose we have a big sampling problem, that is hard.Solution?Turn it into an even bigger problem by adding additionalvariables. Solve that.Can get sample from the original problem by just throwingunneeded variables away.If samples (ψi,θi) are from joint P(ψ,θ) then samples arealso samples of P(ψ|θ)P(θ) as this is the same. Hencesamples θi must be from P(θ) as

∫dψP(ψ|θ) = 1 whatever

θ is.Examples: Hamiltonian Monte-Carlo. Swendsen Wang.

Amos Storkey — PMR: Sampling II 13/20

Page 35: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Hamiltonian (or Hybrid) Monte-Carlo

Problem of Metropolis Hastings is random walk behaviour:slow diffusion to cover the space.Hamiltonion Monte-Carlo reduces this by augmenting eachvariable in the original space with another random variable.Now can do contour walks for each of these variables, inaddition to Gibbs sampling steps in the joint distribution ofthe augmented variables.Related to Hamiltonian systems in physics: maintainconstant energy by swapping kinetic energy for potentialenergy. Augmented variables are momentum variables.See Mackay Chapter 30.

Amos Storkey — PMR: Sampling II 14/20

Page 36: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Hamiltonian (or Hybrid) Monte-Carlo

Problem of Metropolis Hastings is random walk behaviour:slow diffusion to cover the space.Hamiltonion Monte-Carlo reduces this by augmenting eachvariable in the original space with another random variable.Now can do contour walks for each of these variables, inaddition to Gibbs sampling steps in the joint distribution ofthe augmented variables.Related to Hamiltonian systems in physics: maintainconstant energy by swapping kinetic energy for potentialenergy. Augmented variables are momentum variables.See Mackay Chapter 30.

Amos Storkey — PMR: Sampling II 14/20

Page 37: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Hamiltonian (or Hybrid) Monte-Carlo

Problem of Metropolis Hastings is random walk behaviour:slow diffusion to cover the space.Hamiltonion Monte-Carlo reduces this by augmenting eachvariable in the original space with another random variable.Now can do contour walks for each of these variables, inaddition to Gibbs sampling steps in the joint distribution ofthe augmented variables.Related to Hamiltonian systems in physics: maintainconstant energy by swapping kinetic energy for potentialenergy. Augmented variables are momentum variables.See Mackay Chapter 30.

Amos Storkey — PMR: Sampling II 14/20

Page 38: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Hamiltonian (or Hybrid) Monte-Carlo

Problem of Metropolis Hastings is random walk behaviour:slow diffusion to cover the space.Hamiltonion Monte-Carlo reduces this by augmenting eachvariable in the original space with another random variable.Now can do contour walks for each of these variables, inaddition to Gibbs sampling steps in the joint distribution ofthe augmented variables.Related to Hamiltonian systems in physics: maintainconstant energy by swapping kinetic energy for potentialenergy. Augmented variables are momentum variables.See Mackay Chapter 30.

Amos Storkey — PMR: Sampling II 14/20

Page 39: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

MCMC - Hamiltonian (or Hybrid) Monte-Carlo

Problem of Metropolis Hastings is random walk behaviour:slow diffusion to cover the space.Hamiltonion Monte-Carlo reduces this by augmenting eachvariable in the original space with another random variable.Now can do contour walks for each of these variables, inaddition to Gibbs sampling steps in the joint distribution ofthe augmented variables.Related to Hamiltonian systems in physics: maintainconstant energy by swapping kinetic energy for potentialenergy. Augmented variables are momentum variables.See Mackay Chapter 30.

Amos Storkey — PMR: Sampling II 14/20

Page 40: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Procedure

Original problem P(θ). Add augmented Gaussian P(v).Step 1: Sample from Gaussian P(v).Step 2: Choose a direction (b = ±1) (to maintainreversibility).Walk along Hamiltonian path.

θ̇i = −b∂∂vi

log P(v)

v̇i = b∂∂θi

log P(θ)

Repeat.Actually have to put in a few fixes to deal with fact that canonly run differential system using finite steps. Need to useleapfrog steps and run bidirectionally use aproposal/acceptance approach to ensure detailed balance.

Amos Storkey — PMR: Sampling II 15/20

Page 41: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Procedure

Original problem P(θ). Add augmented Gaussian P(v).Step 1: Sample from Gaussian P(v).Step 2: Choose a direction (b = ±1) (to maintainreversibility).Walk along Hamiltonian path.

θ̇i = −b∂∂vi

log P(v)

v̇i = b∂∂θi

log P(θ)

Repeat.Actually have to put in a few fixes to deal with fact that canonly run differential system using finite steps. Need to useleapfrog steps and run bidirectionally use aproposal/acceptance approach to ensure detailed balance.

Amos Storkey — PMR: Sampling II 15/20

Page 42: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Procedure

Original problem P(θ). Add augmented Gaussian P(v).Step 1: Sample from Gaussian P(v).Step 2: Choose a direction (b = ±1) (to maintainreversibility).Walk along Hamiltonian path.

θ̇i = −b∂∂vi

log P(v)

v̇i = b∂∂θi

log P(θ)

Repeat.Actually have to put in a few fixes to deal with fact that canonly run differential system using finite steps. Need to useleapfrog steps and run bidirectionally use aproposal/acceptance approach to ensure detailed balance.

Amos Storkey — PMR: Sampling II 15/20

Page 43: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Procedure

Original problem P(θ). Add augmented Gaussian P(v).Step 1: Sample from Gaussian P(v).Step 2: Choose a direction (b = ±1) (to maintainreversibility).Walk along Hamiltonian path.

θ̇i = −b∂∂vi

log P(v)

v̇i = b∂∂θi

log P(θ)

Repeat.Actually have to put in a few fixes to deal with fact that canonly run differential system using finite steps. Need to useleapfrog steps and run bidirectionally use aproposal/acceptance approach to ensure detailed balance.

Amos Storkey — PMR: Sampling II 15/20

Page 44: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Procedure

Original problem P(θ). Add augmented Gaussian P(v).Step 1: Sample from Gaussian P(v).Step 2: Choose a direction (b = ±1) (to maintainreversibility).Walk along Hamiltonian path.

θ̇i = −b∂∂vi

log P(v)

v̇i = b∂∂θi

log P(θ)

Repeat.Actually have to put in a few fixes to deal with fact that canonly run differential system using finite steps. Need to useleapfrog steps and run bidirectionally use aproposal/acceptance approach to ensure detailed balance.

Amos Storkey — PMR: Sampling II 15/20

Page 45: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Procedure

Original problem P(θ). Add augmented Gaussian P(v).Step 1: Sample from Gaussian P(v).Step 2: Choose a direction (b = ±1) (to maintainreversibility).Walk along Hamiltonian path.

θ̇i = −b∂∂vi

log P(v)

v̇i = b∂∂θi

log P(θ)

Repeat.Actually have to put in a few fixes to deal with fact that canonly run differential system using finite steps. Need to useleapfrog steps and run bidirectionally use aproposal/acceptance approach to ensure detailed balance.

Amos Storkey — PMR: Sampling II 15/20

Page 46: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Procedure

Original problem P(θ). Add augmented Gaussian P(v).Step 1: Sample from Gaussian P(v).Step 2: Choose a direction (b = ±1) (to maintainreversibility).Walk along Hamiltonian path.

θ̇i = −b∂∂vi

log P(v)

v̇i = b∂∂θi

log P(θ)

Repeat.Actually have to put in a few fixes to deal with fact that canonly run differential system using finite steps. Need to useleapfrog steps and run bidirectionally use aproposal/acceptance approach to ensure detailed balance.

Amos Storkey — PMR: Sampling II 15/20

Page 47: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Comments

Most commonly used in large continuous systems.Hamiltonian Monte-Carlo is recommended for many typicalunsupervised settings.

Amos Storkey — PMR: Sampling II 16/20

Page 48: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Comments

Most commonly used in large continuous systems.Hamiltonian Monte-Carlo is recommended for many typicalunsupervised settings.

Amos Storkey — PMR: Sampling II 16/20

Page 49: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Hamiltonian Monte-Carlo Comments

Most commonly used in large continuous systems.Hamiltonian Monte-Carlo is recommended for many typicalunsupervised settings.

Amos Storkey — PMR: Sampling II 16/20

Page 50: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Swendsen Wang

Problem: Gibbs sampling mixes poorly due toself-reinforcement.Add in bond variables between highly aligned variables.Bonds can be in ‘connected’ or ‘disconnected’ states.Ensure marginal distribution is original problem.Conditioned on the states, bonds are independent. Canrandomly cut strong bonds.Better mixing comes from fact that there are now fewerreinforcing influences.See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf

Amos Storkey — PMR: Sampling II 17/20

Page 51: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Swendsen Wang

Problem: Gibbs sampling mixes poorly due toself-reinforcement.Add in bond variables between highly aligned variables.Bonds can be in ‘connected’ or ‘disconnected’ states.Ensure marginal distribution is original problem.Conditioned on the states, bonds are independent. Canrandomly cut strong bonds.Better mixing comes from fact that there are now fewerreinforcing influences.See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf

Amos Storkey — PMR: Sampling II 17/20

Page 52: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Swendsen Wang

Problem: Gibbs sampling mixes poorly due toself-reinforcement.Add in bond variables between highly aligned variables.Bonds can be in ‘connected’ or ‘disconnected’ states.Ensure marginal distribution is original problem.Conditioned on the states, bonds are independent. Canrandomly cut strong bonds.Better mixing comes from fact that there are now fewerreinforcing influences.See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf

Amos Storkey — PMR: Sampling II 17/20

Page 53: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Swendsen Wang

Problem: Gibbs sampling mixes poorly due toself-reinforcement.Add in bond variables between highly aligned variables.Bonds can be in ‘connected’ or ‘disconnected’ states.Ensure marginal distribution is original problem.Conditioned on the states, bonds are independent. Canrandomly cut strong bonds.Better mixing comes from fact that there are now fewerreinforcing influences.See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf

Amos Storkey — PMR: Sampling II 17/20

Page 54: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Swendsen Wang

Problem: Gibbs sampling mixes poorly due toself-reinforcement.Add in bond variables between highly aligned variables.Bonds can be in ‘connected’ or ‘disconnected’ states.Ensure marginal distribution is original problem.Conditioned on the states, bonds are independent. Canrandomly cut strong bonds.Better mixing comes from fact that there are now fewerreinforcing influences.See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf

Amos Storkey — PMR: Sampling II 17/20

Page 55: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Swendsen Wang

Problem: Gibbs sampling mixes poorly due toself-reinforcement.Add in bond variables between highly aligned variables.Bonds can be in ‘connected’ or ‘disconnected’ states.Ensure marginal distribution is original problem.Conditioned on the states, bonds are independent. Canrandomly cut strong bonds.Better mixing comes from fact that there are now fewerreinforcing influences.See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf

Amos Storkey — PMR: Sampling II 17/20

Page 56: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Swendsen Wang

Problem: Gibbs sampling mixes poorly due toself-reinforcement.Add in bond variables between highly aligned variables.Bonds can be in ‘connected’ or ‘disconnected’ states.Ensure marginal distribution is original problem.Conditioned on the states, bonds are independent. Canrandomly cut strong bonds.Better mixing comes from fact that there are now fewerreinforcing influences.See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf

Amos Storkey — PMR: Sampling II 17/20

Page 57: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Swendsen Wang

Problem: Gibbs sampling mixes poorly due toself-reinforcement.Add in bond variables between highly aligned variables.Bonds can be in ‘connected’ or ‘disconnected’ states.Ensure marginal distribution is original problem.Conditioned on the states, bonds are independent. Canrandomly cut strong bonds.Better mixing comes from fact that there are now fewerreinforcing influences.See http://www.inference.phy.cam.ac.uk/mackay/itila/swendsen.pdf

Amos Storkey — PMR: Sampling II 17/20

Page 58: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Convergence?

No guaranteed way to test convergence in general.Main tests involve tests of coalescence: converged when ithas forgotten past.Multiple chains: do they end up in the same place?

Amos Storkey — PMR: Sampling II 18/20

Page 59: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Convergence?

No guaranteed way to test convergence in general.Main tests involve tests of coalescence: converged when ithas forgotten past.Multiple chains: do they end up in the same place?

Amos Storkey — PMR: Sampling II 18/20

Page 60: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Convergence?

No guaranteed way to test convergence in general.Main tests involve tests of coalescence: converged when ithas forgotten past.Multiple chains: do they end up in the same place?

Amos Storkey — PMR: Sampling II 18/20

Page 61: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Convergence?

No guaranteed way to test convergence in general.Main tests involve tests of coalescence: converged when ithas forgotten past.Multiple chains: do they end up in the same place?

Amos Storkey — PMR: Sampling II 18/20

Page 62: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Real Machine Learning?

But how are these used in a real probabilistic modellingcontext?For sampling from the posterior distribution of machinelearning methods.

Amos Storkey — PMR: Sampling II 19/20

Page 63: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Real Machine Learning?

But how are these used in a real probabilistic modellingcontext?For sampling from the posterior distribution of machinelearning methods.

Amos Storkey — PMR: Sampling II 19/20

Page 64: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

Real Machine Learning?

But how are these used in a real probabilistic modellingcontext?For sampling from the posterior distribution of machinelearning methods.

Amos Storkey — PMR: Sampling II 19/20

Page 65: PMR: Sampling II - The University of Edinburgh · Setting the right proposal to provide good mixing, but reasonable acceptance probability. Try to get acceptance rate to be 0:234

Metropolis Hastings Gibbs Sampling Hamiltonian MCMC

To Do

Examinable ReadingMackay Chapter 29, 30

Preparatory ReadingMackay Chapter 45

Extra ReadingAny papers of Radford Neal that take your fancy. Iain Murray’stutorial slides.

Amos Storkey — PMR: Sampling II 20/20