An Introduction to Bayesian Inference and MCMC Methods for Capture-Recapture

Trinity River Restoration Program Workshop on Outmigration: Population Estimation
October 6–8, 2009
An Introduction to Bayesian Inference

1 The Binomial Model
    Maximum Likelihood Estimation
    Bayesian Inference and the Posterior Density
    Posterior Summaries
2 MCMC Methods
    An Introduction to MCMC
    An Introduction to WinBUGS
3 Two-Stage Capture-Recapture Models
    The Simple-Petersen Model
    The Stratified-Petersen Model
    The Hierarchical-Petersen Model
4 Further Issues
    Monitoring Convergence
    Model Selection and the DIC
    Goodness-of-Fit and Bayesian p-values
5 Bayesian Penalized Splines
Maximum Likelihood Estimation: The Binomial Distribution
Setup
a population contains a fixed and known number of marked individuals (n)
Assumptions
every individual has the same probability of being captured (p)
individuals are captured independently
Probability Mass Function
The probability that m of n individuals are captured is:
P(m \mid p) = \binom{n}{m} p^m (1 - p)^{n - m}
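As a quick numerical check (an aside, not part of the original slides), this PMF can be evaluated directly in R:

## P(m = 24) when n = 30 and p = 0.8
dbinom(24, size = 30, prob = 0.8)   # approximately 0.179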
Maximum Likelihood Estimation: The Binomial Distribution
If n = 30 and p = .8:

[Figure: the probability mass function of the Binomial(30, .8) distribution, plotted against m.]
Maximum Likelihood Estimation: The Likelihood Function
Definition: The likelihood function is equal to the probability mass function of the observed data, treated as a function of the parameters with the data held fixed.
The likelihood function for the binomial experiment is:

L(p \mid m) = P(m \mid p) = \binom{n}{m} p^m (1 - p)^{n - m}
Maximum Likelihood Estimation: The Likelihood Function
If n = 30 and m = 24:

[Figure: the likelihood function L(p | m) plotted against p.]
Maximum Likelihood Estimation: Maximum Likelihood Estimates
Definition: The maximum likelihood estimator is the value of the parameter which maximizes the likelihood function for the observed data.
The maximum likelihood estimator of p for the binomial experiment is:

p̂ = m / n
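To confirm this numerically (an illustrative aside), maximizing the log-likelihood in R recovers m/n:

## Log-likelihood for n = 30, m = 24
loglik <- function(p) dbinom(24, size = 30, prob = p, log = TRUE)
optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum   # ~0.8 = 24/30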
Maximum Likelihood Estimation: Maximum Likelihood Estimates
If n = 30 and m = 24 then p̂ = 24/30 = .8:

[Figure: the likelihood function with its maximum marked at p̂ = .8.]
Maximum Likelihood Estimation: Measures of Uncertainty
Imagine that the same experiment could be repeated many times without changing the value of the parameter.
Definition 1: The standard error of the estimator is the standard deviation of the estimates computed from each of the resulting data sets.
Definition 2: A 95% confidence interval is a pair of values which, computed in the same manner for each data set, would bound the true value for at least 95% of the repetitions.
The standard error for the capture probability is:

SE_p = \sqrt{\hat{p}(1 - \hat{p}) / n}

A 95% confidence interval has bounds:

\hat{p} - 1.96 \, SE_p  and  \hat{p} + 1.96 \, SE_p
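In R, with n = 30 and m = 24 (a short sketch reproducing the numbers on the next slide):

p_hat <- 24 / 30
se <- sqrt(p_hat * (1 - p_hat) / 30)      # standard error ~ 0.073
c(p_hat - 1.96 * se, p_hat + 1.96 * se)   # 95% CI ~ (0.66, 0.94)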
Maximum Likelihood Estimation: Measures of Uncertainty
If n = 30 and m = 24 then:

the standard error of p̂ is SE_p = .07
a 95% confidence interval for p is (.66, .94)

[Figure: the likelihood function with the point estimate and 95% confidence interval marked on the p axis.]
Bayesian Inference and the Posterior Density: Combining Data from Multiple Experiments
Pilot Study
Data: n = 20, m = 10
Likelihood: \binom{20}{10} p^{10} (1 - p)^{10}

Full Experiment
Data: n = 30, m = 24
Likelihood: \binom{30}{24} p^{24} (1 - p)^{6}

Combined Analysis
Likelihood: \binom{20}{10} p^{10} (1 - p)^{10} \cdot \binom{30}{24} p^{24} (1 - p)^{6} = \binom{20}{10} \binom{30}{24} p^{34} (1 - p)^{16}
Estimate: p̂ = 34/50 = .68
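Again this can be checked numerically (an aside, not from the slides): the product likelihood peaks at 34/50.

## Combined log-likelihood of the pilot study and the full experiment
loglik <- function(p) dbinom(10, 20, p, log = TRUE) + dbinom(24, 30, p, log = TRUE)
optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum   # ~0.68 = 34/50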
Bayesian Inference and the Posterior Density: Combining Data with Prior Beliefs I
Prior Beliefs
Hypothetical Data: n = 20, m = 10
Prior Density: \binom{20}{10} p^{10} (1 - p)^{10}

Full Experiment
Data: n = 30, m = 24
Likelihood: \binom{30}{24} p^{24} (1 - p)^{6}

Posterior Beliefs
Posterior Density (up to a normalizing constant): \binom{20}{10} \binom{30}{24} p^{34} (1 - p)^{16}
Estimate: p̂ = 34/50 = .68
Bayesian Inference and the Posterior Density: Combining Data with Prior Beliefs I

Prior density: \binom{20}{10} p^{10} (1 - p)^{10}

[Figure: the prior density, likelihood, and posterior density plotted against p.]
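In conjugate Beta-binomial terms (a reading the slides use implicitly), this prior is a Beta(11, 11) density and the posterior is Beta(35, 17); the figure can be reproduced in R along these lines:

p <- seq(0, 1, length.out = 200)
plot(p, dbeta(p, 35, 17), type = "l", ylab = "Density")   # posterior: Beta(35, 17)
lines(p, dbeta(p, 11, 11), lty = 2)                       # prior: Beta(11, 11)
lines(p, dbeta(p, 25, 7), lty = 3)                        # likelihood, rescaled: Beta(25, 7)
legend("topleft", c("Posterior", "Prior", "Likelihood"), lty = 1:3)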
Bayesian Inference and the Posterior Density: Combining Data with Prior Beliefs II
Prior Beliefs
Hypothetical Data: n = 2, m = 1
Prior Density: \binom{2}{1} p (1 - p)

Full Experiment
Data: n = 30, m = 24
Likelihood: \binom{30}{24} p^{24} (1 - p)^{6}

Posterior Beliefs
Posterior Density (up to a normalizing constant): \binom{2}{1} \binom{30}{24} p^{25} (1 - p)^{7}
Estimate: p̂ = 25/32 = .78
Bayesian Inference and the Posterior Density: Combining Data with Prior Beliefs II

Prior density: \binom{2}{1} p (1 - p)

[Figure: the prior density, likelihood, and posterior density plotted against p.]
Posterior Summaries: Basis of Bayesian Inference
Fact 1: A Bayesian posterior density is a true probability density which can be used to make direct probability statements about a parameter.
Fact 2: Any summary used to describe a random quantity can also be used to summarize the posterior density. Examples:

mean
median
standard deviation
variance
quantiles
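For the Beta(35, 17) posterior from the earlier example, such summaries are available in closed form or via R (a sketch):

a <- 35; b <- 17                          # posterior from the earlier example
a / (a + b)                               # posterior mean ~ 0.673
qbeta(0.5, a, b)                          # posterior median
sqrt(a * b / ((a + b)^2 * (a + b + 1)))   # posterior standard deviation ~ 0.064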
Posterior Summaries: Bayesian Measures of Centrality
Classical Point Estimates
Maximum Likelihood Estimate
Bayesian Point Estimates
Posterior Mode
Posterior Mean
Posterior Median
Posterior Summaries: Bayesian Measures of Uncertainty
Classical Measures of Uncertainty
Standard Error
95% Confidence Interval
Bayesian Measures of Uncertainty
Posterior Standard Deviation: the standard deviation of the posterior density.
95% Credible Interval: any interval which contains 95% of the posterior density.
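Continuing the Beta(35, 17) example, a central 95% credible interval is just a pair of posterior quantiles:

qbeta(c(0.025, 0.975), 35, 17)   # central 95% credible interval for p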
Exercises
1 Bayesian inference for the binomial experiment
File: Intro to Bayes\Exercises\binomial 1.R
This file contains code for plotting the prior density, likelihood function, and posterior density for the binomial model. Vary the values of n, m, and alpha to see how the shapes of these functions and the corresponding posterior summaries are affected.
An Introduction to MCMC: Sampling from the Posterior
Concept
If the posterior density is too complicated, then we can estimate posterior quantities by generating a sample from the posterior density and computing sample statistics.
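As a concrete illustration (not from the slides), here is a minimal random-walk Metropolis sampler in R targeting the Beta(35, 17) posterior used earlier; the posterior summaries are then just sample statistics:

set.seed(1)
n_iter <- 10000
p <- numeric(n_iter)
p[1] <- 0.5
log_post <- function(p) dbeta(p, 35, 17, log = TRUE)   # target density
for (i in 2:n_iter) {
  prop <- p[i - 1] + rnorm(1, 0, 0.05)                 # symmetric random-walk proposal
  if (prop > 0 && prop < 1 &&
      log(runif(1)) < log_post(prop) - log_post(p[i - 1])) {
    p[i] <- prop                                       # accept the proposal
  } else {
    p[i] <- p[i - 1]                                   # reject: keep the current value
  }
}
mean(p)                                                # ~ posterior mean
quantile(p, c(0.025, 0.975))                           # ~ 95% credible interval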
An Introduction to MCMC: The Very Basics of Markov chain Monte Carlo
Definition: A Markov chain is a sequence of events such that the probabilities for one event depend only on the outcome of the previous event in the sequence.
Key Property
If we construct the Markov chain properly, then the probability density of the events can be made to match any probability density – including the posterior density.

Implication
We can use a carefully constructed chain to generate a sample from any complicated posterior density.
An Introduction to WinBUGS: WinBUGS for the Binomial Experiment
Intro to Bayes\Exercises\binomial model winbugs.txt
## 1) Model definition
model binomial {
    ## Likelihood function
    m ~ dbin(p, n)

    ## Prior distribution
    p ~ dbeta(1, 1)
}

## 2) Data list
list(n = 30, m = 24)

## 3) Initial values
list(p = .8)
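The same model can also be driven from R with the R2WinBUGS package (a sketch; the file name here is an assumption, and the model block may need to be saved as a plain model { ... } definition):

library(R2WinBUGS)
## Assumes the model definition above is saved as 'binomial_model.txt';
## bugs.directory may also need to point at your WinBUGS installation.
fit <- bugs(data = list(n = 30, m = 24),
            inits = list(list(p = 0.8)),
            parameters.to.save = "p",
            model.file = "binomial_model.txt",
            n.chains = 1, n.iter = 10000)
print(fit)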
Exercises
1 WinBUGS for the Binomial Experiment
Intro to Bayes\Exercises\binomial model winbugs.txt
Use the provided code to implement the binomial model in WinBUGS. Change the parameters of the prior distribution for p, a and b, so that they are both equal to 1, and recompute the posterior summaries.
The Simple-Petersen Model: Model Structure
Notation
n / m = # of marked individuals alive / captured
U / u = # of unmarked individuals alive / captured

Model
Marked sample: m ∼ Binomial(n, p)
Unmarked sample: u ∼ Binomial(U, p)

Prior Densities
p: p ∼ Beta(a, b)
U: log(U) ∝ 1 (Jeffreys prior)
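For intuition (an aside; the count u = 150 is made up), equating observed and expected proportions gives the familiar Petersen-style moment estimates that this Bayesian model formalizes:

m <- 24; n <- 30; u <- 150   # hypothetical data
p_hat <- m / n               # capture probability estimate
U_hat <- u / p_hat           # moment estimate of the unmarked population size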
The Simple-Petersen Model: WinBUGS Implementation
Intro to Bayes\Exercises\cr winbugs.txt
The Stratified-Petersen Model: Model Structure
Notation
n_i / m_i = # of marked individuals alive / captured on day i
U_i / u_i = # of unmarked individuals alive / captured on day i

Model
Marked sample: m_i ∼ Binomial(n_i, p_i), i = 1, ..., s
Unmarked sample: u_i ∼ Binomial(U_i, p_i), i = 1, ..., s

Prior Densities
p_i: p_i ∼ Beta(a, b), i = 1, ..., s
U_i: log(U_i) ∝ 1, i = 1, ..., s
The Stratified-Petersen Model: WinBUGS Implementation
Intro to Bayes\Exercises\cr stratified winbugs.txt
Exercises
1 The Stratified-Petersen Model
Intro to Bayes\Exercises\cr stratified winbugs.txt
Use the provided code to implement the stratified-Petersen model for the simulated data set and produce a boxplot for the values of p (if you didn't specify p in the sample monitor then you will need to do so and re-run the chain). Notice that the 95% credible intervals are much wider for some values of p_i than for others. Why is this?
The Hierarchical-Petersen Model: Model Structure

Notation
n_i / m_i = # of marked individuals alive / captured on day i
U_i / u_i = # of unmarked individuals alive / captured on day i

Model
Marked sample: m_i ∼ Binomial(n_i, p_i), i = 1, ..., s
Unmarked sample: u_i ∼ Binomial(U_i, p_i), i = 1, ..., s
Capture probabilities: log(p_i / (1 - p_i)) = η_i^p

Prior Densities
η_i^p: η_i^p ∼ N(µ, τ²), i = 1, ..., s
µ, τ: µ ∼ N(0, 1000²), τ ∼ Γ⁻¹(.01, .01)
U_i: log(U_i) ∝ 1, i = 1, ..., s
The Hierarchical-Petersen Model: WinBUGS Implementation
Intro to Bayes\Exercises\cr hierarchical winbugs.txt
Exercises
1 Bayesian inference for the hierarchical Petersen model
Intro to Bayes\Exercises\cr hierarchical 2 winbugs.txt
The hierarchical model can be used even in the more extreme case in which no marked fish are released in one stratum or the number of recoveries is missing, so that there is no direct information about the capture probability. This file contains the code for fitting the hierarchical model to the simulated data, except that some of the values of n_i have been replaced by the value NA, WinBUGS notation for missing data. Run the model and produce boxplots for U and p. Note that you will have to use the gen inits button in the Specification Tool window to generate initial values for the missing data after loading the initial values for p and U.
Monitoring Convergence: Traceplots
Definition: The traceplot for a Markov chain displays the generated values versus the iteration number.
Traceplot for U_1 from the hierarchical-Petersen model: [figure omitted]
Monitoring Convergence: Traceplots and Mixing
Poor Mixing: [traceplot omitted]
Good Mixing: [traceplot omitted]
Monitoring Convergence: MC Error
Definition: The MC error is the amount of uncertainty in the posterior summaries due to approximation by a finite sample.
Posterior summary for U_1 after 10,000 iterations: [table omitted]
Posterior summary for U_1 after 100,000 iterations: [table omitted]
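For independent draws, the MC error of a posterior-mean estimate shrinks like 1/sqrt(N); a quick R illustration (correlated MCMC draws behave similarly, but with a smaller effective sample size):

draws <- rbeta(100000, 35, 17)    # stand-in for posterior draws
sd(draws) / sqrt(length(draws))   # naive MC error of the posterior mean
sd(draws[1:10000]) / sqrt(10000)  # roughly sqrt(10) times larger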
Monitoring Convergence: Thinning
Definition: A chain is thinned if only a subset of the generated values are stored and used to compute summary statistics.
Summary statistics for U[1] – 100,000 iterations: [table omitted]
Summary statistics for U[1] – 100,000 iterations thinned by 10: [table omitted]
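Thinning is a one-liner on a stored chain; for example, on the vector p of Metropolis draws generated earlier:

p_thinned <- p[seq(1, length(p), by = 10)]   # keep every 10th draw
mean(p_thinned)                              # summaries from the thinned chain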
Monitoring Convergence: Burn-in Period
Definition: The burn-in period is the number of iterations necessary for the chain to converge to the posterior distribution.
Multiple Chains
The burn-in period can be assessed by running several chains with different starting values: [figure omitted]
Monitoring Convergence: The Brooks-Gelman-Rubin Diagnostic
Definition: The Brooks-Gelman-Rubin convergence diagnostic compares the posterior summaries computed separately from each chain with the posterior summaries computed from the pooled sample across all chains. These should be equal at convergence.
Brooks-Gelman-Rubin diagnostic plot for µ after 100,000 iterations: [figure omitted]
Exercises
1 Bayesian inference for the hierarchical Petersen model: convergence diagnostics
Intro to Bayes\Exercises\cr hierarchical bgr winbugs.txt
This file contains code to run three parallel chains for the hierarchical-Petersen model. Implement the model and then produce traceplots and compute the Brooks-Gelman-Rubin diagnostics. To initialize the model you will need to enter 3 in the num of chains dialogue and then load the three sets of initial values one at a time.
Model Selection: The Principle of Parsimony
Concept
The most parsimonious model is the one that best explains the data with the fewest parameters.
Model Selection: The Deviance Information Criterion
Definition 1: The p_D value for a model is an estimate of the effective number of parameters in the model – the number of unique and estimable parameters.

Definition 2: The Deviance Information Criterion (DIC) is a penalized form of the likelihood that accounts for the number of parameters in a model, as measured by p_D (DIC = D̄ + p_D, the posterior mean deviance plus the effective number of parameters). Smaller values are better.
Goodness-of-Fit: Posterior Prediction
Concept
If the model fits well, then new data simulated from the model, using parameter values generated from the posterior, should be similar to the observed data.
Goodness-of-Fit: Bayesian p-value
Definition 1: A discrepancy measure is a function of both the data and the parameters that assesses the fit of some part of the model.
Example
Freeman-Tukey comparison of observed and expected counts of unmarked fish:

D(u, U, p) = \sum_{i=1}^{s} \left( \sqrt{u_i} - \sqrt{U_i p_i} \right)^2
Definition 2: The Bayesian p-value is the proportion of times the discrepancy of the observed data is less than the discrepancy of the simulated data. Bayesian p-values near 0 indicate lack of fit.
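A sketch of the computation in R, using the Freeman-Tukey discrepancy above; here U_draws and p_draws are made-up stand-ins for posterior draws that would in practice come out of WinBUGS:

D <- function(u, U, p) sum((sqrt(u) - sqrt(U * p))^2)   # Freeman-Tukey discrepancy

set.seed(2)
s <- 10                                      # number of strata (made up)
u_obs <- rbinom(s, size = 500, prob = 0.3)   # stand-in observed unmarked counts
n_draws <- 1000
U_draws <- matrix(500, n_draws, s)           # stand-in posterior draws of U
p_draws <- matrix(rbeta(n_draws * s, 30, 70), n_draws, s)   # stand-in draws of p

p_value <- mean(sapply(1:n_draws, function(j) {
  u_rep <- rbinom(s, U_draws[j, ], p_draws[j, ])   # posterior predictive data
  D(u_obs, U_draws[j, ], p_draws[j, ]) < D(u_rep, U_draws[j, ], p_draws[j, ])
}))
p_value   # values near 0 would indicate lack of fit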
Bayesian Penalized Splines
Concept
We can control the smoothness of a B-spline by assigning a prior density to the differences in the coefficients.

Specifically, we would like our prior to favour smoothness but allow for sharp changes if the data warrants.
Bayesian Penalized Splines
Model Structure
B-spline: y_i = \sum_{k=1}^{K+D+1} b_k B_k(x_i) + ε_i
Error: ε_i ∼ N(0, σ²)

Hierarchical Prior Density for Spline Coefficients
Level 1: (b_k − b_{k−1}) ∼ N(b_{k−1} − b_{k−2}, (1/λ)²)
Level 2: λ ∼ Γ(.05, .05)

The parameter λ plays the same role as the smoothing parameter:
if λ is big then b_k ≈ b_{k−1} and the spline is smooth,
if λ is small then b_k and b_{k−1} can be very different.
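To see the role of λ (an illustrative aside, not from the slides), one can simulate coefficients from the Level 1 prior, which is a second-order random walk with innovation standard deviation 1/λ:

set.seed(3)
## Simulate b_1, ..., b_K from the second-order random-walk prior:
## b_k = 2 b_{k-1} - b_{k-2} + N(0, (1/lambda)^2)
rw2 <- function(K, lambda) {
  b <- numeric(K)
  for (k in 3:K) b[k] <- 2 * b[k - 1] - b[k - 2] + rnorm(1, 0, 1 / lambda)
  b
}
plot(rw2(30, 1), type = "l", lty = 2, ylab = "b")   # small lambda: coefficients wander
lines(rw2(30, 100))                                 # big lambda: nearly linear, smooth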