An Introduction to Bayesian Inference and MCMC Methods for Capture-Recapture

Trinity River Restoration Program Workshop on Outmigration: Population Estimation
October 6–8, 2009
An Introduction to Bayesian Inference

1 The Binomial Model
    Maximum Likelihood Estimation
    Bayesian Inference and the Posterior Density
    Posterior Summaries
2 MCMC Methods
    An Introduction to MCMC
    An Introduction to WinBUGS
3 Two-Stage Capture-Recapture Models
    The Simple-Petersen Model
    The Stratified-Petersen Model
    The Hierarchical-Petersen Model
4 Further Issues
    Monitoring Convergence
    Model Selection and the DIC
    Goodness-of-Fit and Bayesian p-values
5 Bayesian Penalized Splines
Maximum Likelihood Estimation: The Binomial Distribution
Setup
a population contains a fixed and known number of marked individuals (n)
Assumptions
every individual has the same probability of being captured (p)
individuals are captured independently
Probability Mass Function
The probability that m of n individuals are captured is:
P(m \mid p) = \binom{n}{m} p^m (1 - p)^{n - m}
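As a quick numerical check (an aside, not part of the original slides), this PMF can be evaluated directly in R:

## P(m = 24) when n = 30 and p = 0.8
dbinom(24, size = 30, prob = 0.8)   # approximately 0.179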
Maximum Likelihood Estimation: The Binomial Distribution
If n = 30 and p = .8:

[Figure: the probability mass function of the Binomial(30, .8) distribution, plotted against m.]
Maximum Likelihood Estimation: The Likelihood Function
Definition: The likelihood function is equal to the probability mass function of the observed data, treated as a function of the parameters with the data held fixed.
The likelihood function for the binomial experiment is:

L(p \mid m) = P(m \mid p) = \binom{n}{m} p^m (1 - p)^{n - m}
Maximum Likelihood Estimation: The Likelihood Function
If n = 30 and m = 24:

[Figure: the likelihood function L(p | m) plotted against p.]
Maximum Likelihood Estimation: Maximum Likelihood Estimates
Definition: The maximum likelihood estimator is the value of the parameter which maximizes the likelihood function for the observed data.
The maximum likelihood estimator of p for the binomial experiment is:

p̂ = m / n
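To confirm this numerically (an illustrative aside), maximizing the log-likelihood in R recovers m/n:

## Log-likelihood for n = 30, m = 24
loglik <- function(p) dbinom(24, size = 30, prob = p, log = TRUE)
optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum   # ~0.8 = 24/30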
Maximum Likelihood Estimation: Maximum Likelihood Estimates
If n = 30 and m = 24 then p̂ = 24/30 = .8:

[Figure: the likelihood function with its maximum marked at p̂ = .8.]
Maximum Likelihood Estimation: Measures of Uncertainty
Imagine that the same experiment could be repeated many times without changing the value of the parameter.
Definition 1: The standard error of the estimator is the standard deviation of the estimates computed from each of the resulting data sets.
Definition 2: A 95% confidence interval is a pair of values which, computed in the same manner for each data set, would bound the true value for at least 95% of the repetitions.
The standard error for the capture probability is:

SE_p = \sqrt{\hat{p}(1 - \hat{p}) / n}

A 95% confidence interval has bounds:

\hat{p} - 1.96 \, SE_p  and  \hat{p} + 1.96 \, SE_p
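In R, with n = 30 and m = 24 (a short sketch reproducing the numbers on the next slide):

p_hat <- 24 / 30
se <- sqrt(p_hat * (1 - p_hat) / 30)      # standard error ~ 0.073
c(p_hat - 1.96 * se, p_hat + 1.96 * se)   # 95% CI ~ (0.66, 0.94)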
Maximum Likelihood Estimation: Measures of Uncertainty
If n = 30 and m = 24 then:

the standard error of p̂ is SE_p = .07
a 95% confidence interval for p is (.66, .94)

[Figure: the likelihood function with the point estimate and 95% confidence interval marked on the p axis.]
Bayesian Inference and the Posterior Density: Combining Data from Multiple Experiments
Pilot Study
Data: n = 20, m = 10
Likelihood: \binom{20}{10} p^{10} (1 - p)^{10}

Full Experiment
Data: n = 30, m = 24
Likelihood: \binom{30}{24} p^{24} (1 - p)^{6}

Combined Analysis
Likelihood: \binom{20}{10} p^{10} (1 - p)^{10} \cdot \binom{30}{24} p^{24} (1 - p)^{6} = \binom{20}{10} \binom{30}{24} p^{34} (1 - p)^{16}
Estimate: p̂ = 34/50 = .68
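Again this can be checked numerically (an aside, not from the slides): the product likelihood peaks at 34/50.

## Combined log-likelihood of the pilot study and the full experiment
loglik <- function(p) dbinom(10, 20, p, log = TRUE) + dbinom(24, 30, p, log = TRUE)
optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum   # ~0.68 = 34/50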
Bayesian Inference and the Posterior Density: Combining Data with Prior Beliefs I
Prior Beliefs
Hypothetical Data: n = 20, m = 10
Prior Density: \binom{20}{10} p^{10} (1 - p)^{10}

Full Experiment
Data: n = 30, m = 24
Likelihood: \binom{30}{24} p^{24} (1 - p)^{6}

Posterior Beliefs
Posterior Density (up to a normalizing constant): \binom{20}{10} \binom{30}{24} p^{34} (1 - p)^{16}
Estimate: p̂ = 34/50 = .68
Bayesian Inference and the Posterior Density: Combining Data with Prior Beliefs I

Prior density: \binom{20}{10} p^{10} (1 - p)^{10}

[Figure: the prior density, likelihood, and posterior density plotted against p.]
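In conjugate Beta-binomial terms (a reading the slides use implicitly), this prior is a Beta(11, 11) density and the posterior is Beta(35, 17); the figure can be reproduced in R along these lines:

p <- seq(0, 1, length.out = 200)
plot(p, dbeta(p, 35, 17), type = "l", ylab = "Density")   # posterior: Beta(35, 17)
lines(p, dbeta(p, 11, 11), lty = 2)                       # prior: Beta(11, 11)
lines(p, dbeta(p, 25, 7), lty = 3)                        # likelihood, rescaled: Beta(25, 7)
legend("topleft", c("Posterior", "Prior", "Likelihood"), lty = 1:3)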
Bayesian Inference and the Posterior Density: Combining Data with Prior Beliefs II
Prior Beliefs
Hypothetical Data: n = 2, m = 1
Prior Density: \binom{2}{1} p (1 - p)

Full Experiment
Data: n = 30, m = 24
Likelihood: \binom{30}{24} p^{24} (1 - p)^{6}

Posterior Beliefs
Posterior Density (up to a normalizing constant): \binom{2}{1} \binom{30}{24} p^{25} (1 - p)^{7}
Estimate: p̂ = 25/32 = .78
Bayesian Inference and the Posterior Density: Combining Data with Prior Beliefs II

Prior density: \binom{2}{1} p (1 - p)

[Figure: the prior density, likelihood, and posterior density plotted against p.]
Posterior Summaries: Basis of Bayesian Inference
Fact 1: A Bayesian posterior density is a true probability density which can be used to make direct probability statements about a parameter.
Fact 2: Any summary used to describe a random quantity can also be used to summarize the posterior density. Examples:

mean
median
standard deviation
variance
quantiles
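For the Beta(35, 17) posterior from the earlier example, such summaries are available in closed form or via R (a sketch):

a <- 35; b <- 17                          # posterior from the earlier example
a / (a + b)                               # posterior mean ~ 0.673
qbeta(0.5, a, b)                          # posterior median
sqrt(a * b / ((a + b)^2 * (a + b + 1)))   # posterior standard deviation ~ 0.064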
Posterior Summaries: Bayesian Measures of Centrality
Classical Point Estimates
Maximum Likelihood Estimate
Bayesian Point Estimates
Posterior Mode
Posterior Mean
Posterior Median
Posterior Summaries: Bayesian Measures of Uncertainty
Classical Measures of Uncertainty
Standard Error
95% Confidence Interval
Bayesian Measures of Uncertainty
Posterior Standard Deviation: the standard deviation of the posterior density.
95% Credible Interval: any interval which contains 95% of the posterior density.
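Continuing the Beta(35, 17) example, a central 95% credible interval is just a pair of posterior quantiles:

qbeta(c(0.025, 0.975), 35, 17)   # central 95% credible interval for p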
Exercises
1 Bayesian inference for the binomial experiment
File: Intro to Bayes\Exercises\binomial 1.R
This file contains code for plotting the prior density, likelihood function, and posterior density for the binomial model. Vary the values of n, m, and alpha to see how the shapes of these functions and the corresponding posterior summaries are affected.
An Introduction to MCMC: Sampling from the Posterior
Concept
If the posterior density is too complicated, then we can estimate posterior quantities by generating a sample from the posterior density and computing sample statistics.
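As a concrete illustration (not from the slides), here is a minimal random-walk Metropolis sampler in R targeting the Beta(35, 17) posterior used earlier; the posterior summaries are then just sample statistics:

set.seed(1)
n_iter <- 10000
p <- numeric(n_iter)
p[1] <- 0.5
log_post <- function(p) dbeta(p, 35, 17, log = TRUE)   # target density
for (i in 2:n_iter) {
  prop <- p[i - 1] + rnorm(1, 0, 0.05)                 # symmetric random-walk proposal
  if (prop > 0 && prop < 1 &&
      log(runif(1)) < log_post(prop) - log_post(p[i - 1])) {
    p[i] <- prop                                       # accept the proposal
  } else {
    p[i] <- p[i - 1]                                   # reject: keep the current value
  }
}
mean(p)                                                # ~ posterior mean
quantile(p, c(0.025, 0.975))                           # ~ 95% credible interval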
An Introduction to MCMC: The Very Basics of Markov chain Monte Carlo
Definition: A Markov chain is a sequence of events such that the probabilities for one event depend only on the outcome of the previous event in the sequence.
Key Property
If we construct the Markov chain properly, then the probability density of the events can be made to match any probability density – including the posterior density.

Implication
We can use a carefully constructed chain to generate a sample from any complicated posterior density.
An Introduction to WinBUGS: WinBUGS for the Binomial Experiment
Intro to Bayes\Exercises\binomial model winbugs.txt
## 1) Model definition
model binomial {
    ## Likelihood function
    m ~ dbin(p, n)

    ## Prior distribution
    p ~ dbeta(1, 1)
}

## 2) Data list
list(n = 30, m = 24)

## 3) Initial values
list(p = .8)
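The same model can also be driven from R with the R2WinBUGS package (a sketch; the file name here is an assumption, and the model block may need to be saved as a plain model { ... } definition):

library(R2WinBUGS)
## Assumes the model definition above is saved as 'binomial_model.txt';
## bugs.directory may also need to point at your WinBUGS installation.
fit <- bugs(data = list(n = 30, m = 24),
            inits = list(list(p = 0.8)),
            parameters.to.save = "p",
            model.file = "binomial_model.txt",
            n.chains = 1, n.iter = 10000)
print(fit)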
Exercises
1 WinBUGS for the Binomial Experiment
Intro to Bayes\Exercises\binomial model winbugs.txt
Use the provided code to implement the binomial model in WinBUGS. Change the parameters of the prior distribution for p, a and b, so that they are both equal to 1, and recompute the posterior summaries.
The Simple-Petersen Model: Model Structure
Notation
n / m = # of marked individuals alive / captured
U / u = # of unmarked individuals alive / captured

Model
Marked sample: m ∼ Binomial(n, p)
Unmarked sample: u ∼ Binomial(U, p)

Prior Densities
p: p ∼ Beta(a, b)
U: log(U) ∝ 1 (Jeffreys prior)
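For intuition (an aside; the count u = 150 is made up), equating observed and expected proportions gives the familiar Petersen-style moment estimates that this Bayesian model formalizes:

m <- 24; n <- 30; u <- 150   # hypothetical data
p_hat <- m / n               # capture probability estimate
U_hat <- u / p_hat           # moment estimate of the unmarked population size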
The Simple-Petersen Model: WinBUGS Implementation
Intro to Bayes\Exercises\cr winbugs.txt
The Stratified-Petersen Model: Model Structure
Notation
n_i / m_i = # of marked individuals alive / captured on day i
U_i / u_i = # of unmarked individuals alive / captured on day i

Model
Marked sample: m_i ∼ Binomial(n_i, p_i), i = 1, ..., s
Unmarked sample: u_i ∼ Binomial(U_i, p_i), i = 1, ..., s

Prior Densities
p_i: p_i ∼ Beta(a, b), i = 1, ..., s
U_i: log(U_i) ∝ 1, i = 1, ..., s
The Stratified-Petersen Model: WinBUGS Implementation
Intro to Bayes\Exercises\cr stratified winbugs.txt
Exercises
1 The Stratified-Petersen Model
Intro to Bayes\Exercises\cr stratified winbugs.txt
Use the provided code to implement the stratified-Petersen model for the simulated data set and produce a boxplot for the values of p (if you didn't specify p in the sample monitor then you will need to do so and re-run the chain). Notice that the 95% credible intervals are much wider for some values of p_i than for others. Why is this?
The Hierarchical-Petersen Model: Model Structure

Notation
n_i / m_i = # of marked individuals alive / captured on day i
U_i / u_i = # of unmarked individuals alive / captured on day i

Model
Marked sample: m_i ∼ Binomial(n_i, p_i), i = 1, ..., s
Unmarked sample: u_i ∼ Binomial(U_i, p_i), i = 1, ..., s
Capture probabilities: log(p_i / (1 - p_i)) = η_i^p

Prior Densities
η_i^p: η_i^p ∼ N(µ, τ²), i = 1, ..., s
µ, τ: µ ∼ N(0, 1000²), τ ∼ Γ⁻¹(.01, .01)
U_i: log(U_i) ∝ 1, i = 1, ..., s
The Hierarchical-Petersen Model: WinBUGS Implementation
Intro to Bayes\Exercises\cr hierarchical winbugs.txt
Exercises
1 Bayesian inference for the hierarchical Petersen model
Intro to Bayes\Exercises\cr hierarchical 2 winbugs.txt
The hierarchical model can be used even in the more extreme case in which no marked fish are released in one stratum or the number of recoveries is missing, so that there is no direct information about the capture probability. This file contains the code for fitting the hierarchical model to the simulated data, except that some of the values of n_i have been replaced by the value NA, WinBUGS notation for missing data. Run the model and produce boxplots for U and p. Note that you will have to use the gen inits button in the Specification Tool window to generate initial values for the missing data after loading the initial values for p and U.
Monitoring Convergence: Traceplots
Definition: The traceplot for a Markov chain displays the generated values versus the iteration number.
Traceplot for U_1 from the hierarchical-Petersen model: [figure omitted]
Monitoring Convergence: Traceplots and Mixing
Poor Mixing: [traceplot omitted]
Good Mixing: [traceplot omitted]
Monitoring Convergence: MC Error
Definition: The MC error is the amount of uncertainty in the posterior summaries due to approximation by a finite sample.
Posterior summary for U_1 after 10,000 iterations: [table omitted]
Posterior summary for U_1 after 100,000 iterations: [table omitted]
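For independent draws, the MC error of a posterior-mean estimate shrinks like 1/sqrt(N); a quick R illustration (correlated MCMC draws behave similarly, but with a smaller effective sample size):

draws <- rbeta(100000, 35, 17)    # stand-in for posterior draws
sd(draws) / sqrt(length(draws))   # naive MC error of the posterior mean
sd(draws[1:10000]) / sqrt(10000)  # roughly sqrt(10) times larger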
Monitoring Convergence: Thinning
Definition: A chain is thinned if only a subset of the generated values are stored and used to compute summary statistics.
Summary statistics for U[1] – 100,000 iterations: [table omitted]
Summary statistics for U[1] – 100,000 iterations thinned by 10: [table omitted]
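Thinning is a one-liner on a stored chain; for example, on the vector p of Metropolis draws generated earlier:

p_thinned <- p[seq(1, length(p), by = 10)]   # keep every 10th draw
mean(p_thinned)                              # summaries from the thinned chain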
Monitoring Convergence: Burn-in Period
Definition: The burn-in period is the number of iterations necessary for the chain to converge to the posterior distribution.
Multiple Chains
The burn-in period can be assessed by running several chains with different starting values: [figure omitted]
Monitoring Convergence: The Brooks-Gelman-Rubin Diagnostic
Definition: The Brooks-Gelman-Rubin convergence diagnostic compares the posterior summaries computed separately from each chain with the posterior summaries computed from the pooled sample across all chains. These should be equal at convergence.
Brooks-Gelman-Rubin diagnostic plot for µ after 100,000 iterations: [figure omitted]
Exercises
1 Bayesian inference for the hierarchical Petersen model: convergence diagnostics
Intro to Bayes\Exercises\cr hierarchical bgr winbugs.txt
This file contains code to run three parallel chains for the hierarchical-Petersen model. Implement the model and then produce traceplots and compute the Brooks-Gelman-Rubin diagnostics. To initialize the model you will need to enter 3 in the num of chains dialogue and then load the three sets of initial values one at a time.
Model Selection: The Principle of Parsimony
Concept
The most parsimonious model is the one that best explains the data with the fewest parameters.
Model Selection: The Deviance Information Criterion
Definition 1: The p_D value for a model is an estimate of the effective number of parameters in the model – the number of unique and estimable parameters.

Definition 2: The Deviance Information Criterion (DIC) is a penalized form of the likelihood that accounts for the number of parameters in a model, as measured by p_D (DIC = D̄ + p_D, the posterior mean deviance plus the effective number of parameters). Smaller values are better.
Goodness-of-Fit: Posterior Prediction
Concept
If the model fits well, then new data simulated from the model, using parameter values generated from the posterior, should be similar to the observed data.
Goodness-of-Fit: Bayesian p-value
Definition 1: A discrepancy measure is a function of both the data and the parameters that assesses the fit of some part of the model.
Example
Freeman-Tukey comparison of observed and expected counts of unmarked fish:

D(u, U, p) = \sum_{i=1}^{s} \left( \sqrt{u_i} - \sqrt{U_i p_i} \right)^2
Definition 2: The Bayesian p-value is the proportion of times the discrepancy of the observed data is less than the discrepancy of the simulated data. Bayesian p-values near 0 indicate lack of fit.
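A sketch of the computation in R, using the Freeman-Tukey discrepancy above; here U_draws and p_draws are made-up stand-ins for posterior draws that would in practice come out of WinBUGS:

D <- function(u, U, p) sum((sqrt(u) - sqrt(U * p))^2)   # Freeman-Tukey discrepancy

set.seed(2)
s <- 10                                      # number of strata (made up)
u_obs <- rbinom(s, size = 500, prob = 0.3)   # stand-in observed unmarked counts
n_draws <- 1000
U_draws <- matrix(500, n_draws, s)           # stand-in posterior draws of U
p_draws <- matrix(rbeta(n_draws * s, 30, 70), n_draws, s)   # stand-in draws of p

p_value <- mean(sapply(1:n_draws, function(j) {
  u_rep <- rbinom(s, U_draws[j, ], p_draws[j, ])   # posterior predictive data
  D(u_obs, U_draws[j, ], p_draws[j, ]) < D(u_rep, U_draws[j, ], p_draws[j, ])
}))
p_value   # values near 0 would indicate lack of fit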
Bayesian Penalized Splines
Concept
We can control the smoothness of a B-spline by assigning a prior density to the differences in the coefficients.

Specifically, we would like our prior to favour smoothness but allow for sharp changes if the data warrants.
Bayesian Penalized Splines
Model Structure
B-spline: y_i = \sum_{k=1}^{K+D+1} b_k B_k(x_i) + ε_i
Error: ε_i ∼ N(0, σ²)

Hierarchical Prior Density for Spline Coefficients
Level 1: (b_k − b_{k−1}) ∼ N(b_{k−1} − b_{k−2}, (1/λ)²)
Level 2: λ ∼ Γ(.05, .05)

The parameter λ plays the same role as the smoothing parameter:
if λ is big then b_k ≈ b_{k−1} and the spline is smooth,
if λ is small then b_k and b_{k−1} can be very different.
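To see the role of λ (an illustrative aside, not from the slides), one can simulate coefficients from the Level 1 prior, which is a second-order random walk with innovation standard deviation 1/λ:

set.seed(3)
## Simulate b_1, ..., b_K from the second-order random-walk prior:
## b_k = 2 b_{k-1} - b_{k-2} + N(0, (1/lambda)^2)
rw2 <- function(K, lambda) {
  b <- numeric(K)
  for (k in 3:K) b[k] <- 2 * b[k - 1] - b[k - 2] + rnorm(1, 0, 1 / lambda)
  b
}
plot(rw2(30, 1), type = "l", lty = 2, ylab = "b")   # small lambda: coefficients wander
lines(rw2(30, 100))                                 # big lambda: nearly linear, smooth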