
Markov chain Monte Carlo I

Models for Socio-Environmental Data

Chris Che-Castaldo, Mary B. Collins, N. Thompson Hobbs

December 12, 2020

Funding: DBI-1052875, DBI-1639145, DEB 1145200


What is this course about?


You can understand it.

- Rules of probability
- Conditioning and independence
- Law of total probability
- Factoring joint probabilities
- Distribution theory
- Markov chain Monte Carlo


The Bayesian method

[Workflow diagram]

A. Design: draw on existing theory, scientific objectives, and intuition; write a deterministic model of the process; design or choose observations.

B. Model specification: diagram the relationship between observed and unobserved quantities; write out the posterior and joint distributions using general probability notation; choose appropriate probability distributions.

C. Model implementation: write the full-conditional distributions; write an MCMC sampling algorithm, or write code for MCMC software; implement the MCMC on simulated data.

D. Model evaluation and inference: implement the MCMC on real data; posterior predictive checks; probabilistic inference from a single model; model selection and model averaging.


The MCMC algorithm

- Why MCMC?
- Some intuition about how it works for a single-parameter model
- MCMC for multiple-parameter models
  - Full-conditional distributions (today)
  - Gibbs sampling (today)
  - Metropolis-Hastings algorithm (optional lecture notes and reading)
  - MCMC software (JAGS, today and tomorrow)


MCMC learning outcomes

1. Develop a big-picture understanding of how MCMC allows us to approximate the marginal posterior distributions of parameters and latent quantities.
2. Understand and be able to code a simple MCMC algorithm.
3. Appreciate the different methods that can be used within MCMC algorithms to make draws from the posterior distribution:
   3.1 Metropolis
   3.2 Metropolis-Hastings
   3.3 Gibbs
4. Understand the concepts of burn-in and convergence.
5. Understand and be able to write full-conditional distributions.


Remember the marginal distribution of the data


We have simple solutions for the posterior for simple models:

$$[\phi \mid y] = \text{beta}\Big(\underbrace{\alpha + y}_{\text{the new } \alpha},\ \underbrace{\beta + n - y}_{\text{the new } \beta}\Big),$$

where $\alpha$ and $\beta$ are the parameters of the prior.
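A few lines of R illustrate this conjugate update (a sketch; the values y = 3 and n = 12 are borrowed from the power-plant example later in this lecture, and the credible-interval call is our addition):

alpha_prior <- 1; beta_prior <- 1    # beta(1, 1) flat prior
y <- 3; n <- 12                      # y successes in n trials
alpha_post <- alpha_prior + y        # the new alpha
beta_post  <- beta_prior + n - y     # the new beta
alpha_post / (alpha_post + beta_post)           # posterior mean
qbeta(c(0.025, 0.975), alpha_post, beta_post)   # 95% equal-tailed credible interval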


Problems of high dimension do not have simple solutions:

$$[\theta_1, \theta_2, \theta_3, \theta_4, \mathbf{z} \mid \mathbf{y}, \mathbf{u}] =
\frac{\prod_{i=1}^{n}[y_i \mid \theta_1, z_i]\,[u_i \mid \theta_2, z_i]\,[z_i \mid \theta_3, \theta_4]\,[\theta_1][\theta_2][\theta_3][\theta_4]}
{\int \!\cdots\! \int \prod_{i=1}^{n}[y_i \mid \theta_1, z_i]\,[u_i \mid \theta_2, z_i]\,[z_i \mid \theta_3, \theta_4]\,[\theta_1][\theta_2][\theta_3][\theta_4]\, d\mathbf{z}\, d\theta_1\, d\theta_2\, d\theta_3\, d\theta_4}$$


What are we doing in MCMC?

Recall that the posterior distribution is proportional to the joint distribution, because the marginal distribution of the data, $\int [y \mid \theta][\theta]\,d\theta$, is a constant after the data have been observed:

$$\overbrace{[\theta \mid y]}^{\text{posterior}} \propto \overbrace{[y, \theta]}^{\text{joint}} \tag{1}$$

$$\overbrace{[\theta \mid y]}^{\text{posterior}} \propto \overbrace{[y \mid \theta]}^{\text{likelihood}}\ \overbrace{[\theta]}^{\text{prior}} \tag{2}$$

Factoring the joint distribution into a product of probability distributions using the chain rule of probability is where we start all Bayesian modeling. The factored joint distribution provides the basis for MCMC.
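A small grid calculation makes the point concrete: the unnormalized product of likelihood and prior already has the shape of the posterior, and dividing by the marginal distribution of the data only rescales it. This is a sketch with made-up data and an assumed normal-mean model, not the values behind the figures that follow:

# Unnormalized vs. normalized posterior on a grid (illustrative sketch)
set.seed(1)
y     <- rnorm(10, mean = 10, sd = 2)        # fake data
theta <- seq(6, 14, length.out = 500)        # grid of candidate means
like  <- sapply(theta, function(m) prod(dnorm(y, mean = m, sd = 2)))
prior <- dnorm(theta, mean = 10, sd = 5)
unnorm <- like * prior                       # likelihood x prior on the grid
const  <- sum(unnorm) * diff(theta)[1]       # approximate integral: the constant [y]
post   <- unnorm / const                     # same shape, now integrates to 1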


What are we doing in MCMC?

[Figure: two panels over $\theta$ from 6 to 14 — the likelihood profile $[y \mid \theta]$ (vertical scale on the order of $10^{-9}$) and the posterior distribution $[\theta \mid y]$ (vertical scale roughly 0 to 0.3).]


What are we doing in MCMC?

[Figure: histograms of 100 samples — "n = 100, not normalized" with frequency on the y-axis, and "n = 100, normalized" with density on the y-axis. The shape is identical; only the vertical scale differs.]


What are we doing in MCMC?

- The posterior distribution is unknown, but the likelihood is known as a likelihood profile, and we know the priors.
- We want to accumulate many, many values that represent random samples proportionate to their density in the marginal posterior distribution.
- MCMC generates these samples using the likelihood and the priors to decide which samples to keep and which to throw away.
- We can then use these samples to calculate statistics describing the distribution: means, medians, variances, credible intervals, etc.


What are we doing in MCMC?

The marginal posterior distribution of each unobserved quantity is approximated by the samples accumulated in the chain.

[Figure: histograms of 100,000 samples of $\theta$ — "n = 100000, not normalized" with frequency on the y-axis, and "n = 100000, normalized" with density on the y-axis; normalizing changes only the vertical scale.]


What are we doing in MCMC?

[Figure: four panels (A–D) illustrating MCMC for a single parameter $\theta$. Proposals $\theta^*$ are drawn at iterations $k = 1, 2, 3, \ldots, K$; the joint distribution $[y \mid \theta][\theta]$ determines the frequency with which proposals are kept in the chain, so the accumulated samples along the $\theta$ axis trace out the posterior density.]


Metropolis updates

We keep the more probable members of the posterior distribution by comparing a proposal with the current value in the chain.

k = 1: chain value $\theta_1$.
k = 2: proposal $\theta^*_2$; test $P(\theta^*_2) > P(\theta_1)$; chain value $\theta_2 = \theta^*_2$ (the proposal is kept).


Metropolis updates

We keep the more probable members of the posterior distribution by comparing a proposal with the current value in the chain.

k = 1: chain value $\theta_1$.
k = 2: proposal $\theta^*_2$; test $P(\theta^*_2) > P(\theta_1)$; chain value $\theta_2 = \theta^*_2$ (the proposal is kept).
k = 3: proposal $\theta^*_3$; test $P(\theta_2) > P(\theta^*_3)$; chain value $\theta_3 = \theta_2$ (the proposal is rejected).


Metropolis updates

We keep the more probable members of the posterior distribution by comparing a proposal with the current value in the chain.

k = 1: chain value $\theta_1$.
k = 2: proposal $\theta^*_2$; test $P(\theta^*_2) > P(\theta_1)$; chain value $\theta_2 = \theta^*_2$ (the proposal is kept).
k = 3: proposal $\theta^*_3$; test $P(\theta_2) > P(\theta^*_3)$; chain value $\theta_3 = \theta_2$ (the proposal is rejected).
k = 4: proposal $\theta^*_4$; test $P(\theta_3) > P(\theta^*_4)$; chain value $\theta_4 = \theta_3$ (the proposal is rejected).
..., continuing in this way through k = K.


Metropolis updates

$$[\theta^*_{k+1} \mid y] = \frac{\overbrace{[y \mid \theta^*_{k+1}]}^{\text{likelihood}}\ \overbrace{[\theta^*_{k+1}]}^{\text{prior}}}{\int [y \mid \theta][\theta]\, d\theta}$$

$$[\theta_k \mid y] = \frac{\overbrace{[y \mid \theta_k]}^{\text{likelihood}}\ \overbrace{[\theta_k]}^{\text{prior}}}{\int [y \mid \theta][\theta]\, d\theta}$$

$$R = \frac{[\theta^*_{k+1} \mid y]}{[\theta_k \mid y]}$$


The cunning idea behind Metropolis updates

$$[\theta^*_{k+1} \mid y] = \frac{\overbrace{[y \mid \theta^*_{k+1}]}^{\text{likelihood}}\ \overbrace{[\theta^*_{k+1}]}^{\text{prior}}}{\cancel{\int [y \mid \theta][\theta]\, d\theta}}
\qquad
[\theta_k \mid y] = \frac{\overbrace{[y \mid \theta_k]}^{\text{likelihood}}\ \overbrace{[\theta_k]}^{\text{prior}}}{\cancel{\int [y \mid \theta][\theta]\, d\theta}}$$

$$R = \frac{[\theta^*_{k+1} \mid y]}{[\theta_k \mid y]} = \frac{[y \mid \theta^*_{k+1}]\,[\theta^*_{k+1}]}{[y \mid \theta_k]\,[\theta_k]}$$

The marginal distribution of the data is identical in the numerator and denominator of $R$, so it cancels: $R$ can be computed from the likelihoods and priors alone, without ever evaluating the intractable integral.


When do we keep the proposal?

$$P_R = \min(1, R)$$

Keep $\theta^*_{k+1}$ as the next value in the chain with probability $P_R$, and keep $\theta_k$ with probability $1 - P_R$.


When do we keep the proposal?

1. Calculate $R$ based on the likelihoods and priors.
2. Draw a random number $U$ from a uniform distribution on (0, 1). If $R > U$, we keep the proposal $\theta^*_{k+1}$ as the next value in the chain.
3. Otherwise, we retain $\theta_k$ as the next value.
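A minimal R sketch of a single update implementing these three steps (the symmetric normal proposal, its standard deviation, and working on the log scale to avoid numerical underflow are our choices, not part of the slide):

# One Metropolis update: propose, compute R on the log scale, accept or reject
metropolis_step <- function(theta_k, log_post, proposal_sd = 0.1) {
  theta_star <- rnorm(1, mean = theta_k, sd = proposal_sd)  # symmetric proposal
  logR <- log_post(theta_star) - log_post(theta_k)          # log of the ratio R
  if (log(runif(1)) < logR) theta_star else theta_k         # keep with probability min(1, R)
}

Calling this function repeatedly, feeding each returned value back in as theta_k, accumulates the chain.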


A simple example for one parameter

- Mary is interested in estimating the proportion of coal-fired power plants that fail to meet regulations for emissions of lead.
- She is not very ambitious, so she only checks 12 plants, 3 of which are non-compliant. She assumes there is no prior knowledge of this proportion.
- How could she calculate the parameters of the posterior distribution of the proportion of non-compliant plants on the back of a cocktail napkin?


The model

$$[\phi \mid y] \propto \text{binomial}(y \mid n, \phi)\,\text{beta}(\phi \mid 1, 1)$$
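A back-of-the-napkin Metropolis sampler for this model takes only a few lines of R (a sketch: the proposal standard deviation, chain length, and burn-in are arbitrary choices, and the comparison curve is the analytic beta(1 + y, 1 + n - y) posterior from the conjugate result above):

set.seed(2)
y <- 3; n <- 12
log_post <- function(phi) {
  if (phi <= 0 || phi >= 1) return(-Inf)               # proposals outside (0, 1) are impossible
  dbinom(y, n, phi, log = TRUE) + dbeta(phi, 1, 1, log = TRUE)
}
n_iter <- 10000
phi <- numeric(n_iter)
phi[1] <- 0.5                                          # initial value
for (k in 2:n_iter) {
  phi_star <- rnorm(1, phi[k - 1], 0.1)                # symmetric proposal
  logR <- log_post(phi_star) - log_post(phi[k - 1])
  phi[k] <- if (log(runif(1)) < logR) phi_star else phi[k - 1]
}
hist(phi[-(1:1000)], freq = FALSE, breaks = 40)        # simulated posterior (burn-in discarded)
curve(dbeta(x, 1 + y, 1 + n - y), add = TRUE)          # calculated (analytic) posterior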


Sampling from the posterior

[Figure: "Simulated and Calculated Distribution, iterations = 100" — the MCMC (simulated) approximation of $P(\phi \mid \text{data})$ plotted with the calculated (analytic) posterior.]


Sampling from the posterior

[Figure: "Simulated and Calculated Distribution, iterations = 1000" — the MCMC (simulated) approximation of $P(\phi \mid \text{data})$ plotted with the calculated (analytic) posterior.]


Sampling from the posterior

[Figure: "Simulated and Calculated Distribution, iterations = 10000" — the MCMC (simulated) approximation of $P(\phi \mid \text{data})$ plotted with the calculated (analytic) posterior.]


Sampling from the posterior

[Figure: "Simulated and Calculated Distribution, iterations = 100000" — the MCMC (simulated) approximation of $P(\phi \mid \text{data})$ plotted with the calculated (analytic) posterior.]


Sampling from the posterior


Sampling from the posterior

The chain has converged when adding more samples does not change the shape of the posterior distribution. We throw away samples that are accumulated before convergence (burn-in).


Intuition for MCMC for multi-parameter models

[DAG: $y_i$ depends on $g(\beta_0, \beta_1, x_i)$ and $\sigma^2$; $x_i$ is observed.]

$$g(\beta_0, \beta_1, x_i) = \beta_0 + \beta_1 x_i$$

$$[\beta_0, \beta_1, \sigma^2 \mid y_i] \propto [\beta_0, \beta_1, \sigma^2, y_i]$$

Factoring the right-hand side using the DAG:

$$[\beta_0, \beta_1, \sigma^2 \mid y_i] \propto [y_i \mid g(\beta_0, \beta_1, x_i), \sigma^2]\,[\beta_0]\,[\beta_1]\,[\sigma^2]$$

Joint for all the data:

$$[\beta_0, \beta_1, \sigma^2 \mid \mathbf{y}] \propto \prod_{i=1}^{n} [y_i \mid g(\beta_0, \beta_1, x_i), \sigma^2]\,[\beta_0]\,[\beta_1]\,[\sigma^2]$$

Choosing specific distributions:

$$[\beta_0, \beta_1, \sigma^2 \mid \mathbf{y}] \propto \prod_{i=1}^{n} \text{normal}(y_i \mid g(\beta_0, \beta_1, x_i), \sigma^2) \times \text{normal}(\beta_0 \mid 0, 10000)\,\text{normal}(\beta_1 \mid 0, 10000) \times \text{uniform}(\sigma^2 \mid 0, 500)$$


Intuition for MCMC for multi-parameter models

$$[\beta_0, \beta_1, \sigma^2 \mid \mathbf{y}] \propto \prod_{i=1}^{n} \text{normal}(y_i \mid g(\beta_0, \beta_1, x_i), \sigma^2) \times \text{normal}(\beta_0 \mid 0, 10000)\,\text{normal}(\beta_1 \mid 0, 10000) \times \text{uniform}(\sigma^2 \mid 0, 100)$$

1. Set initial values for $\beta_0$, $\beta_1$, and $\sigma^2$.
2. Assume $\beta_1$ and $\sigma^2$ are known and constant. Make a draw for $\beta_0$. Store the draw.
3. Assume $\beta_0$ and $\sigma^2$ are known and constant. Make a draw for $\beta_1$. Store the draw.
4. Assume $\beta_0$ and $\beta_1$ are known and constant. Make a draw for $\sigma^2$. Store the draw.
5. Do this many times. The stored values for each parameter approximate its marginal posterior distribution after convergence.
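This cycle can be sketched in R. For simplicity, the sketch below draws each parameter with a simple Metropolis update rather than a true Gibbs draw, but the structure is exactly the cycle above: update one parameter at a time while holding the others at their current values, then store everything. The simulated data, proposal standard deviations, and reading of 10000 as a prior variance are all assumptions:

set.seed(3)
x <- runif(50, 0, 10)
y <- rnorm(50, 2 + 0.5 * x, sd = 1)                  # simulated data

log_joint <- function(b0, b1, sigma2) {              # log of likelihood x priors
  if (sigma2 <= 0 || sigma2 > 100) return(-Inf)
  sum(dnorm(y, b0 + b1 * x, sqrt(sigma2), log = TRUE)) +
    dnorm(b0, 0, 100, log = TRUE) + dnorm(b1, 0, 100, log = TRUE) +
    dunif(sigma2, 0, 100, log = TRUE)
}

n_iter <- 5000
draws <- matrix(NA, n_iter, 3, dimnames = list(NULL, c("b0", "b1", "sigma2")))
draws[1, ] <- c(0, 0, 1)                             # step 1: initial values
for (k in 2:n_iter) {
  cur <- draws[k - 1, ]
  for (j in 1:3) {                                   # steps 2-4: one parameter at a time
    prop <- cur
    prop[j] <- rnorm(1, cur[j], c(0.3, 0.05, 0.3)[j])
    logR <- log_joint(prop[1], prop[2], prop[3]) - log_joint(cur[1], cur[2], cur[3])
    if (log(runif(1)) < logR) cur <- prop            # keep or reject the proposal
  }
  draws[k, ] <- cur                                  # step 5: store the draws
}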


Implementing MCMC for multiple parameters

- Write an expression for the posterior and joint distribution, using a DAG as a guide. Always.
- If you are using MCMC software (e.g., JAGS), use the expression for the posterior and joint distribution as a template for writing code. You are done.
- If you are writing your own MCMC sampler, or you simply want to understand what JAGS is doing for you:
  - Decompose the expression for the multivariate joint distribution into a series of univariate distributions called full-conditional distributions.
  - Choose a sampling method for each full-conditional distribution.
  - Cycle through each unobserved quantity, sampling from its full-conditional distribution, treating the others as if they were known and constant.
  - The accumulated samples approximate the marginal posterior distribution of each unobserved quantity.
  - Note that this takes a complex, multivariate problem and turns it into a series of simple, univariate problems that we solve, as in the example above, one at a time.


Definition of full-conditional distribution

Let $\boldsymbol{\theta}$ be a vector of length $k$ containing all of the unobserved quantities we seek to understand. Let $\boldsymbol{\theta}_{-j}$ be a vector of length $k - 1$ that contains all of the unobserved quantities except $\theta_j$. The full-conditional distribution of $\theta_j$ is

$$[\theta_j \mid \mathbf{y}, \boldsymbol{\theta}_{-j}],$$

which we notate as $[\theta_j \mid \cdot]$. It is the posterior distribution of $\theta_j$ conditional on all of the other parameters and the data, which we assume are known.


Writing full-conditional distributions

- You will have one full-conditional distribution for each unobserved quantity in the posterior.
- For each unobserved quantity, write the distributions where it appears.
- Ignore the other distributions.
- Simple.


Example

- Clark (2003) considered the problem of modeling fecundity of spotted owls and the implication of individual variation in fecundity for population growth rate.
- Data were the number of offspring produced per pair of owls, with sample size n = 197.

Clark, J. S. 2003. Uncertainty and variability in demography and population growth: A hierarchical approach. Ecology 84:1370-1381.


Example

[DAG: $y_i$ depends on $\lambda_i$, which depends on $\alpha$ and $\beta$.]

$$[\boldsymbol{\lambda}, \alpha, \beta \mid \mathbf{y}] \propto \prod_{i=1}^{n} \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\alpha \mid .001, .001)\,\text{gamma}(\beta \mid .001, .001)$$


Full-conditionals

[DAG: $y_i$ depends on $\lambda_i$, which depends on $\alpha$ and $\beta$.]

We use the multivariate joint distribution to find univariate full-conditional distributions for all unobserved quantities.

How many full-conditionals are there?

$$[\boldsymbol{\lambda}, \alpha, \beta \mid \mathbf{y}] \propto \prod_{i=1}^{n} \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\beta \mid .001, .001)\,\text{gamma}(\alpha \mid .001, .001)$$


Writing full-conditional distributions

- You will have one full-conditional distribution for each unobserved quantity in the posterior.
- For each unobserved quantity, write the distributions (including products) where it appears.
- Ignore the other distributions.
- Simple.


Full-conditional for each λi

Writing the full-conditional distribution for $\lambda_i$, start from the joint:

$$[\boldsymbol{\lambda}, \alpha, \beta \mid \mathbf{y}] \propto \prod_{i=1}^{n} \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\beta \mid .001, .001)\,\text{gamma}(\alpha \mid .001, .001)$$

and keep only the distributions in which $\lambda_i$ appears:

$$[\lambda_i \mid \cdot] \propto \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta)$$


Full-conditional for β

Writing the full-conditional distribution for $\beta$, start from the joint:

$$[\boldsymbol{\lambda}, \alpha, \beta \mid \mathbf{y}] \propto \prod_{i=1}^{n} \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\beta \mid .001, .001)\,\text{gamma}(\alpha \mid .001, .001)$$

and keep only the distributions in which $\beta$ appears:

$$[\beta \mid \cdot] \propto \prod_{i=1}^{n} \text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\beta \mid .001, .001)$$


Full-conditional for α

Writing the full-conditional distribution for $\alpha$, start from the joint:

$$[\boldsymbol{\lambda}, \alpha, \beta \mid \mathbf{y}] \propto \prod_{i=1}^{n} \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\beta \mid .001, .001)\,\text{gamma}(\alpha \mid .001, .001)$$

and keep only the distributions in which $\alpha$ appears:

$$[\alpha \mid \cdot] \propto \prod_{i=1}^{n} \text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\alpha \mid .001, .001)$$


Full-conditionals for the model

Posterior and joint:

$$[\boldsymbol{\lambda}, \alpha, \beta \mid \mathbf{y}] \propto \prod_{i=1}^{n} \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\alpha \mid .001, .001)\,\text{gamma}(\beta \mid .001, .001)$$

Full-conditionals:

$$[\lambda_i \mid \cdot] \propto \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta)$$

$$[\beta \mid \cdot] \propto \prod_{i=1}^{n} \text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\beta \mid .001, .001)$$

$$[\alpha \mid \cdot] \propto \prod_{i=1}^{n} \text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\alpha \mid .001, .001)$$


Implementing MCMC for multiple parameters and latent quantities

- Write an expression for the posterior and joint distribution, using a DAG as a guide. Always.
- If you are using MCMC software (e.g., JAGS), use the expression for the posterior and joint distribution as a template for writing code.
- If you are writing your own MCMC sampler:
  - Decompose the expression for the multivariate joint distribution into a series of univariate distributions called full-conditional distributions.
  - Choose a sampling method for each full-conditional distribution.
  - Cycle through each unobserved quantity, sampling from its full-conditional distribution, treating the others as if they were known and constant.
  - Note that this takes a complex, multivariate problem and turns it into a series of simple, univariate problems that we solve, as in the example above, one at a time.


Choosing a sampling method

1. Accept-reject:
   1.1 Metropolis: requires a symmetric proposal distribution (e.g., normal, uniform). This is what we used above in the example for one parameter.
   1.2 Metropolis-Hastings: allows asymmetric proposal distributions (e.g., beta, gamma, lognormal). A minor modification of Metropolis. See the optional notes.
2. Gibbs: accepts all proposals because they are especially well chosen. Requires conjugates. In lab today.


Why do you need to understand conjugate priors?

- An easy way to find the parameters of posterior distributions for simple problems.
- Critical to understanding Gibbs updates in Markov chain Monte Carlo, as you are about to learn.


What are conjugate priors?

Assume we have a likelihood and a prior:

$$\overbrace{[\theta \mid y]}^{\text{posterior}} = \frac{\overbrace{[y \mid \theta]}^{\text{likelihood}}\ \overbrace{[\theta]}^{\text{prior}}}{[y]}.$$

If the form of the distribution of the posterior $[\theta \mid y]$ is the same as the form of the distribution of the prior $[\theta]$, then the likelihood and the prior are said to be conjugate, and the prior is called a conjugate prior for $\theta$.


Gibbs updates

When priors and likelihoods are conjugate, we know all but one of the parameters of the full-conditional because they are assumed to be known at each iteration. We make a draw of the single unknown directly from its posterior distribution, as if the other parameters were fixed.

Wickedly clever.


Gamma-Poisson conjugate relationship for λ

The conjugate prior distribution for the mean of a Poisson likelihood is gamma$(\lambda \mid \alpha_0, \beta_0)$. Given $n$ observations $y_i$ of new data, the posterior distribution of $\lambda$ is

$$[\lambda \mid \mathbf{y}] = \text{gamma}\Big(\underbrace{\alpha_0 + \sum_{i=1}^{n} y_i}_{\text{the new } \alpha},\ \underbrace{\beta_0 + n}_{\text{the new } \beta}\Big), \tag{3}$$

where $\alpha_0$ and $\beta_0$ are the parameters of the prior.
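A quick numerical check in R (illustrative values, not from the slides) confirms the update by comparing the conjugate posterior to a brute-force grid calculation:

set.seed(5)
y <- rpois(25, lambda = 4)                        # fake data
a0 <- 0.001; b0 <- 0.001                          # vague gamma prior
a_new <- a0 + sum(y); b_new <- b0 + length(y)     # conjugate update, equation (3)
lam <- seq(0.01, 10, by = 0.01)                   # grid of candidate lambdas
unnorm <- sapply(lam, function(l) prod(dpois(y, l))) * dgamma(lam, a0, b0)
post_grid <- unnorm / (sum(unnorm) * 0.01)        # normalize on the grid
max(abs(post_grid - dgamma(lam, a_new, b_new)))   # should be near zero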


Sampling from the Poisson-gamma conjugate:

Full-conditional:

$$[\lambda_i \mid \cdot] \propto \text{Poisson}(y_i \mid \lambda_i)\,\text{gamma}(\lambda_i \mid \alpha, \beta) \tag{4}$$

Gibbs sample:

$$[\lambda_i^{k} \mid y_i] = \text{gamma}\Big(\overbrace{\alpha^{k-1} + y_i}^{\text{the current } \alpha},\ \overbrace{\beta^{k-1} + 1}^{\text{the current } \beta}\Big) \tag{5}$$

In R, this would be:

shape[k] = alpha[k - 1] + y[i]
rate[k] = beta[k - 1] + 1
lambda[k, i] = rgamma(1, shape[k], rate[k])


Gamma-gamma conjugate relationship

The conjugate prior distribution for the $\beta$ parameter (rate) in a gamma likelihood gamma$(y_i \mid \alpha, \beta)$ is a gamma distribution, gamma$(\beta \mid \alpha_0, \beta_0)$. Given $n$ observations $y_i$ of new data, the posterior distribution of $\beta$ (assuming that $\alpha$, the shape, is known) is given by

$$[\beta \mid \mathbf{y}] = \text{gamma}\Big(\underbrace{\alpha_0 + n\alpha}_{\text{the new } \alpha},\ \underbrace{\beta_0 + \sum_{i=1}^{n} y_i}_{\text{the new } \beta}\Big). \tag{6}$$

We can substitute any "known" quantity for $\mathbf{y}$, e.g., $\boldsymbol{\lambda}$ in MCMC.


Sampling from the gamma-gamma conjugate:

The full-conditional:

$$[\beta \mid \cdot] \propto \prod_{i=1}^{n} \text{gamma}(\lambda_i \mid \alpha, \beta) \times \text{gamma}(\beta \mid .001, .001)$$

Gibbs sample:

$$\beta^{k} \sim \text{gamma}\Big(.001 + n\alpha^{k-1},\ .001 + \sum_{i=1}^{n} \lambda_i^{k}\Big)$$

In R, this would be:

shape[k] = .001 + length(y) * alpha[k - 1]
rate[k] = .001 + sum(lambda[k, ])
beta[k] = rgamma(1, shape[k], rate[k])


MCMC algorithm

1. Iterate over $i = 1, \ldots, 197$.
2. At each $i$, make a draw from

$$\lambda_i^{k} \sim \underbrace{\text{gamma}\big(\alpha^{k-1} + y_i,\ \beta^{k-1} + 1\big)}_{\text{Gibbs update using the gamma-Poisson conjugate for each } \lambda_i} \tag{7}$$

Then, once per iteration $k$, update $\beta$ and $\alpha$:

$$\beta^{k} \sim \underbrace{\text{gamma}\Big(.001 + \alpha^{k-1} n,\ .001 + \sum_{i=1}^{n} \lambda_i^{k}\Big)}_{\text{Gibbs update using the gamma-gamma conjugate for } \beta} \tag{8}$$

$$\alpha^{k}: \underbrace{\prod_{i=1}^{n} \text{gamma}\big(\lambda_i^{k} \mid \alpha^{k-1}, \beta^{k}\big)\,\text{gamma}\big(\alpha^{k-1} \mid .001, .001\big)}_{\text{no conjugate for } \alpha\text{; use a Metropolis-Hastings update on this full-conditional}} \tag{9}$$

3. Repeat for $k = 1, \ldots, K$ iterations, storing $\lambda_i^{k}$, $\alpha^{k}$, and $\beta^{k}$. Store the value of each parameter at each iteration in a vector.
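To make the algorithm concrete, here is a hedged R sketch of the full sampler. The data are simulated stand-ins for the 197 owl observations, and the lognormal Metropolis-Hastings proposal for alpha (with its Hastings correction) and the chain length are choices of ours, not part of the slides:

set.seed(4)
n <- 197
lambda_true <- rgamma(n, shape = 2, rate = 1.5)
y <- rpois(n, lambda_true)                            # simulated fecundity data

n_iter <- 5000
lambda <- matrix(NA, n_iter, n)
alpha <- beta <- numeric(n_iter)
lambda[1, ] <- y + 0.5; alpha[1] <- 1; beta[1] <- 1   # initial values

for (k in 2:n_iter) {
  # Gibbs update for each lambda_i (gamma-Poisson conjugate, equation 7)
  lambda[k, ] <- rgamma(n, shape = alpha[k - 1] + y, rate = beta[k - 1] + 1)

  # Gibbs update for beta (gamma-gamma conjugate, equation 8)
  beta[k] <- rgamma(1, shape = 0.001 + n * alpha[k - 1],
                       rate  = 0.001 + sum(lambda[k, ]))

  # Metropolis-Hastings update for alpha (no conjugate); lognormal proposal
  a_cur  <- alpha[k - 1]
  a_star <- rlnorm(1, log(a_cur), 0.1)
  log_full <- function(a) sum(dgamma(lambda[k, ], a, beta[k], log = TRUE)) +
                          dgamma(a, 0.001, 0.001, log = TRUE)
  logR <- log_full(a_star) - log_full(a_cur) +
          dlnorm(a_cur, log(a_star), 0.1, log = TRUE) -
          dlnorm(a_star, log(a_cur), 0.1, log = TRUE)  # Hastings correction
  alpha[k] <- if (log(runif(1)) < logR) a_star else a_cur
}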


Inference from MCMC

Make inference on each unobserved quantity using the elements of its vector stored after convergence. These post-convergence vectors (i.e., the "rug" described above) approximate the marginal posterior distributions of the unobserved quantities.
