Page 1

Elements of Statistical Inference

Theme of the workshop (and book): Analyzing HMs using both classical and Bayesian methods.

“Dual inference paradigm”

Topics covered here:
• Classical inference (likelihood, frequentist)
• Bayesian inference (posterior distribution)
• Implementation in R (both MLE and MCMC)
• Case study: logistic regression (not a HM)
• Case study: occupancy model (a HM)

Page 2

Inference for statistical models

• Parametric inference: explicit probability assumptions about the data. Inference proceeds as if the model were the truth (not an approximation to the truth, but the actual truth)

• Two flavors:
  – Classical inference
  – Bayesian inference

Page 3

Bayesian vs. Classical/Frequentist

• Classical inference
  – Likelihood estimation ('method of maximum likelihood')
  – Frequentists use a relative frequency interpretation in which procedures are evaluated w.r.t. repeated realizations of the data. Probability is used to characterize how well procedures perform, but not uncertainty about model parameters.

• Bayesian inference
  – Posterior inference: requires specification of a prior distribution
  – Bayesians make probability statements directly about model parameters, conditional on the single data set that you have

Page 4

Notation

• Random variables: y, z, N, etc.
  – These always have probability distributions whether you're a Bayesian or not
• Parameters: θ, ψ, p, β, etc. have probability distributions if you're a Bayesian, but not otherwise
• Distributions: f(y | θ), g(θ), etc.
• Bracket notation: [y | θ], [θ], etc.
• Note: "|" (the vertical bar) means "conditional on"

Page 5

Classical inference

• You probably know this, but we review the basic ideas. And we show some technical elements in R to demystify what is being done in unmarked

Page 6

What is the likelihood?
• Observations: random variables that you might observe, y_1, ..., y_n
• The joint distribution of these random variables: f(y_1, ..., y_n | θ) = f(y_1 | θ) × ... × f(y_n | θ)  [Independence!]

• The likelihood is the joint distribution regarded as a function of θ; new notation: L(θ; y_1, ..., y_n) = f(y_1, ..., y_n | θ)

• The value of θ that produces the highest value of the likelihood is the maximum likelihood estimator, the MLE, θ̂

Footnote: the joint distribution by itself is a function of y, and θ is an index that changes its form or shape somehow. But we think of the likelihood the other way around: for the fixed value of y, what is the value of the joint distribution function for different values of θ?

Page 7

Example 1: Two independent binomial counts

# 2 binomial observations
y <- rbinom(2, size=10, p=0.5)

# The joint distribution function. As a function of y it gives
# the probability of any two values of y1 and y2
jointdis <- function(data, K, p){
  prod(dbinom(data, size=K, p=p))
}
(jointdis(y, K=10, p=0.5))
# This also is the likelihood of p = 0.5 for the
# given data, but it is NOT a probability for p.

# Evaluate the likelihood for a grid of values of "p"
p.grid <- seq(0.01, 0.99, , 200)
likelihood <- rep(NA, 200)
for(i in 1:200){
  likelihood[i] <- jointdis(y, K=10, p=p.grid[i])
}
# Plot the likelihood
plot(p.grid, likelihood, xlab="p", ylab="likelihood")

Page 8

• It is not a probability distribution for p even though it is called a 'likelihood', which sounds vaguely like 'probability'. That was a marketing gimmick.
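A quick way to convince yourself of this, reusing y and jointdis() from Example 1 (a minimal sketch): a genuine probability density for p would integrate to 1 over [0, 1], but the likelihood generally does not.

# Integrate the likelihood over p; a probability density for p would give 1
# (reuses jointdis() and y from Example 1)
lik.area <- integrate(function(p) sapply(p, function(pp) jointdis(y, K=10, p=pp)),
                      lower=0, upper=1)$value
lik.area   # generally not equal to 1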

Page 9

Numerical maximization of the likelihood

• Numerical maximization of the likelihood was a HUGE change in applied statistics.

• Importance cannot be over-stated.
• Don't need formulas (explicit estimators)
• Variances: evaluated numerically
• Can do "marginal likelihood" by integrating random effects out
• Don't need a statistician to do things for you

Page 10

Properties of MLEs
• MLEs are asymptotically normally distributed
• The Hessian matrix = matrix of 2nd derivatives of the negative log-likelihood w.r.t. θ (the observed Fisher information matrix). The inverse of the Hessian is the "asymptotic variance-covariance matrix".
  – Asymptotic standard error (ASE)
  – Based on normal approx. to the sampling distribution of the MLE
  – Numerical evaluation: revolutionized statistics in the 1970s
• Asymptotic unbiasedness: as n → ∞, the bias of the MLE → 0.
• Minimum variance: as n → ∞, the variance of the MLE is the minimum among all unbiased estimators.
• Invariance to transformation: the MLE of a function of a parameter is just that function of the MLE of the parameter [note: the variance is not invariant to transformation]
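A minimal illustration of the invariance property, using the two binomial counts y from Example 1 (the odds transformation is just an illustrative choice):

# MLE of p for two independent Binomial(10, p) counts is the pooled proportion
p.hat <- sum(y) / 20
# By invariance, the MLE of the odds p/(1-p) is that same function of p.hat
odds.hat <- p.hat / (1 - p.hat)
c(p.hat = p.hat, odds.hat = odds.hat)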

Page 11

Other elements of classical inference

Page 12

Parametric bootstrapping

• Obtain MLEs for a model
• Simulate data using those MLEs
• Obtain MLEs for the simulated data
• Repeat many times
• Use the distribution of the MLEs from the simulated data as an empirical estimate of the sampling distribution

Page 13

Example 2: logistic regression

• Modeling species occurrence (usually presence or absence):
  – Binomial/Bernoulli observation model: z_i ~ Bernoulli(ψ_i), with logit(ψ_i) = β_0 + β_1 x_i
  – And the z_i are independent
• Likelihood = joint distribution regarded as a function of (β_0, β_1):
  L(β_0, β_1; z_1, ..., z_n) = ∏_i ψ_i^{z_i} (1 − ψ_i)^{1 − z_i}

Page 14

Ordinary logistic regression in R

# -------------------------- Simulate data ----------------------------
# Create a covariate called vegHt
nSites <- 100
set.seed(2014)  # so that we all get the same values of vegHt
vegHt <- runif(nSites, 1, 3)  # uniform from 1 to 3

# Suppose that occupancy probability increases with vegHt
# The relationship is described by an intercept of -3 and
# a slope parameter of 2 on the logit scale
# plogis is the inverse-logit (constrains us back to the [0,1] scale)
psi <- plogis(-3 + 2*vegHt)

# Now we go to 100 sites and observe presence or absence
# Actually, let's just simulate the data
z <- rbinom(nSites, 1, psi)

Page 15

General strategy for likelihood estimation: Express the negative log-likelihood as an R function and then use the standard function optim (or nlm) to minimize it:

# Definition of negative log-likelihood.
negLogLike <- function(beta, y, x) {
  beta0 <- beta[1]
  beta1 <- beta[2]
  psi <- plogis(beta0 + beta1*x)  # inverse-logit
  likelihood <- psi^y * (1-psi)^(1-y)  # same as:
  # likelihood <- dbinom(y, 1, psi)
  return(-sum(log(likelihood)))
}

# Look at (negative) log-likelihood for 2 parameter sets
negLogLike(c(0,0), y=z, x=vegHt)
negLogLike(c(-3,2), y=z, x=vegHt)  # Lower is better!

Page 16

# Let's minimize it formally by function minimisation
starting.values <- c(beta0=0, beta1=0)
opt.out <- optim(starting.values, negLogLike, y=z, x=vegHt, hessian=TRUE)
(mles <- opt.out$par)  # MLEs are pretty close to truth
     beta0      beta1
 -2.539793   1.617025

# Alternative 1: Brute-force grid search for MLEs
mat <- as.matrix(expand.grid(seq(-10,10,0.1), seq(-10,10,0.1)))  # can vary resolution
nll <- array(NA, dim = nrow(mat))
for (i in 1:nrow(mat)){
  nll[i] <- negLogLike(mat[i,], y = z, x = vegHt)
}
which(nll == min(nll))
mat[which(nll == min(nll)),]

# Produce a likelihood surface, shown in Fig. 2-2.
library(raster)
r <- rasterFromXYZ(data.frame(x = mat[,1], y = mat[,2], z = nll))
mapPalette <- colorRampPalette(rev(c("grey", "yellow", "red")))
plot(r, col = mapPalette(100), main = "Negative log-likelihood",
     xlab = "Intercept (beta0)", ylab = "Slope (beta1)")
contour(r, add = TRUE, levels = seq(50, 2000, 100))

Page 17

# Alternative 2: Use canned R function glm as a shortcut
(fm <- glm(z ~ vegHt, family = binomial)$coef)

# Add 3 sets of MLEs into plot
# 1. Add MLE from function minimisation
points(mles[1], mles[2], pch = 1, lwd = 2)
abline(mles[2], 0)  # Put a line through the Slope value
lines(c(mles[1], mles[1]), c(-10, 10))
# 2. Add MLE from grid search
points(mat[which(nll == min(nll)),1], mat[which(nll == min(nll)),2],
       pch = 1, lwd = 2)
# 3. Add MLE from glm function
points(fm[1], fm[2], pch = 1, lwd = 2)
# Note they are essentially all the same

[Figure: negative log-likelihood surface from the grid search; x-axis Intercept (beta0), y-axis Slope (beta1), contour levels 500 to 2000]

Page 18

Asymptotic variance/SE

The hessian=TRUE option in the call to optim produces the Hessian matrix in the returned list opt.out, and so we can obtain the asymptotic standard errors (ASE) for the two parameters by doing this:

Vc <- solve(opt.out$hessian)   # Get variance-covariance matrix
ASE <- sqrt(diag(Vc))          # Extract asymptotic SEs
print(ASE)
    beta0     beta1
0.8687444 0.4436064

 

Page 19

Summary

# Make a table with estimates, SEs, and 95% CI
mle.table <- data.frame(Est=mles,
                        ASE = sqrt(diag(solve(opt.out$hessian))))
mle.table$lower <- mle.table$Est - 1.96*mle.table$ASE
mle.table$upper <- mle.table$Est + 1.96*mle.table$ASE
mle.table
            Est       ASE      lower      upper
beta0 -2.539793 0.8687444 -4.2425320 -0.8370538
beta1  1.617025 0.4436064  0.7475564  2.4864933

# Plot the actual and estimated response curves
plot(vegHt, z, xlab="Vegetation height", ylab="Occurrence probability")
plot(function(x) plogis(-3 + 2*x), 1.1, 3, add=TRUE, lwd=2)   # true curve (beta0 = -3, beta1 = 2)
plot(function(x) plogis(mles[1] + mles[2]*x), 1.1, 3, add=TRUE, lwd=2, col="blue")
legend(1.1, 0.9, c("Actual", "Estimate"), col=c("black", "blue"), lty=1, lwd=2)

Page 20

[Figure: occurrence probability vs. vegetation height, showing the actual and estimated response curves ("Actual", "Estimate")]

Page 21

Work session

• Different ways of obtaining MLEs: grid search, optim(), glm()

• Get the asymptotic SE (ASE)
• Plot a fitted response curve
• Bootstrap

Page 22

Bootstrapping

nboot <- 1000  # Obtain 1000 bootstrap samples
boot.out <- matrix(NA, nrow=nboot, ncol=3)
dimnames(boot.out) <- list(NULL, c("beta0", "beta1", "psi.bar"))

for(i in 1:nboot){
  # Simulate data from the fitted model (parametric bootstrap)
  psi <- plogis(mles[1] + mles[2] * vegHt)
  z <- rbinom(nSites, 1, psi)

  # Fit model to the simulated data
  tmp <- optim(mles, negLogLike, y=z, x=vegHt, hessian=TRUE)$par
  psi.mean <- plogis(tmp[1] + tmp[2] * mean(vegHt))
  boot.out[i,] <- c(tmp, psi.mean)
}

Page 23

Bootstrapping

SE.boot <- sqrt(apply(boot.out, 2, var))  # Get bootstrap SE
names(SE.boot) <- c("beta0", "beta1", "psi.bar")

# 95% bootstrapped confidence intervals
apply(boot.out, 2, quantile, c(0.025, 0.975))
          beta0     beta1   psi.bar
2.5%  -4.490565 0.8379983 0.5728077
97.5% -0.978901 2.5974839 0.7828377

# Bootstrap SEs
SE.boot
     beta0      beta1    psi.bar
0.89156946 0.45943765 0.05428008

# Compare these with the ASEs for regression parameters
mle.table
            Est       ASE      lower      upper
beta0 -2.539793 0.8687444 -4.2425320 -0.8370538
beta1  1.617025 0.4436064  0.7475564  2.4864933

Page 24

Part II: Hierarchical Models
• HMs have 1 or more "intermediate" models/levels/stages involving a latent variable (random effect).

  Observable variable y: the "observation model" [y | z]

  Latent variable z: the "process model" [z]

• Two canonical examples:
  1. Modeling species occurrence – "occupancy models"
  2. Modeling species abundance – "N-mixture models" (and related)

Page 25

Modeling species occurrence: Occupancy models

Observations: y_ij = observation of presence/absence at site i, sample j, for j = 1, ..., J samples
State variable: z_i = binary state variable, true presence (z_i = 1) or absence (z_i = 0)

Observation model:
  y_ij | z_i ~ Bernoulli(p · z_i)
  p = probability of detecting the species given that it is present

Process model:
  z_i ~ Bernoulli(ψ_i)

AKA: a Bernoulli/Bernoulli HM. Also a compound GLM (two GLMs linked together)

Page 26

Modeling species abundance from counts: The N-mixture model

Observations: y_ij = count of birds at point i, sample j
State variable: N_i = integer state variable, the population size at point i

Observation model:
  y_ij | N_i ~ Binomial(N_i, p)
  N_i = population size, p = probability of encountering an individual

Process model:
  N_i ~ Poisson(λ_i)

AKA: a Binomial/Poisson HM. Also a compound GLM.

Page 27

Likelihood inference for hierarchical models

Remove the random effect z from the conditional likelihood by integrating the distribution of y conditional on the random effect over the possible states of the random effect:

  L(θ; y) = ∫ [y | z, θ] [z | θ] dz

• Integrated (or marginal) likelihood
• Not a function of z anymore
• Maximize to obtain MLEs of θ
• For a discrete latent variable, replace ∫ by Σ

Page 28

Example: Occupancy model

• Observation model:
  y_i ~ Binomial(J, p · z_i)

• State model:
  z_i ~ Bernoulli(ψ_i)
  logit(ψ_i) = β_0 + β_1 x_i

• What is the marginal likelihood for y?

Page 29

Example: Occupancy model
Computing the marginal likelihood
• z_i is a discrete random variable having only 2 states (0 and 1), so let's use the law of total probability:

  [y_i] = [y_i | z_i = 1] Pr(z_i = 1) + [y_i | z_i = 0] Pr(z_i = 0)

• Marginal likelihood for the detection frequency y_i:

  [y_i] = Binomial(y_i | J, p) ψ_i + I(y_i = 0) (1 − ψ_i)

• AKA a 'zero-inflated binomial'. Can be maximized easily to obtain MLEs.

• Implemented in PRESENCE or the unmarked function occu

Page 30

Doing it in R

nSites <- 100
vegHt <- runif(nSites, 1, 3)  # uniform from 1 to 3
psi <- plogis(-3 + 2*vegHt)

# Now we simulate true presence/absence for 100 sites
z <- rbinom(nSites, 1, psi)

## Now generate observations
p <- 0.6
J <- 3  # sample each site 3 times
y <- rbinom(nSites, J, p*z)

# This is the negative log-likelihood.
negLogLikeocc <- function(beta, y, x, J) {
  beta0 <- beta[1]
  beta1 <- beta[2]
  p <- plogis(beta[3])
  psi <- plogis(beta0 + beta1*x)
  marg.likelihood <- dbinom(y, J, p)*psi + ifelse(y==0, 1, 0)*(1-psi)
  return(-sum(log(marg.likelihood)))
}

starting.values <- c(beta0=0, beta1=0, logitp=0)
opt.out <- optim(starting.values, negLogLikeocc, y=y, x=vegHt, J=J, hessian=TRUE)
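The MLEs and their ASEs can be pulled out of opt.out exactly as before. As a cross-check, the same zero-inflated binomial model can be fit with the unmarked function occu mentioned earlier; this is a minimal sketch only, and it assumes visit-level detection data (a site-by-visit matrix Y, simulated here for illustration) rather than the summed counts y.

# MLEs and asymptotic SEs from the optim() fit
(mles.occ <- opt.out$par)
sqrt(diag(solve(opt.out$hessian)))

# Cross-check with unmarked::occu (needs a site-by-visit detection matrix)
library(unmarked)
Y <- matrix(rbinom(nSites*J, 1, p*z), nrow=nSites, ncol=J)   # visit-level detections
umf <- unmarkedFrameOccu(y = Y, siteCovs = data.frame(vegHt = vegHt))
(fm.occ <- occu(~1 ~ vegHt, data = umf))   # detection ~ 1, occupancy ~ vegHt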

Page 31

N-mixture model
The strategy is the same: N_i is a discrete variable, so the marginal likelihood is just a sum over the possible values of N_i.

Observation model:
  y_ij | N_i ~ Binomial(N_i, p)

Process model:
  N_i ~ Poisson(λ)

Marginal likelihood:
  [y_i1, ..., y_iJ] = Σ_{N=0}^{∞} { ∏_j Binomial(y_ij | N, p) } Poisson(N | λ)
  (in practice the infinite sum is truncated at some large value K)
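A minimal R sketch of this marginal likelihood, written in the same style as negLogLikeocc (the simulated data, the function name negLogLikeNmix, and the truncation point K = 50 are illustrative assumptions, not part of the slides):

# Simulate simple N-mixture data (illustrative values)
set.seed(1)
nSites <- 100; J <- 3; lambda <- 4; p <- 0.5
N <- rpois(nSites, lambda)
ymat <- matrix(rbinom(nSites*J, N, p), nrow=nSites, ncol=J)

# Negative log of the marginal likelihood: sum N out up to a large K
negLogLikeNmix <- function(parms, ymat, K = 50) {
  lambda <- exp(parms[1])      # log link for abundance
  p <- plogis(parms[2])        # logit link for detection
  nll <- 0
  for (i in 1:nrow(ymat)) {
    lik.i <- 0
    for (N in 0:K) {           # law of total probability over N
      lik.i <- lik.i + prod(dbinom(ymat[i,], N, p)) * dpois(N, lambda)
    }
    nll <- nll - log(lik.i)
  }
  nll
}

opt.nmix <- optim(c(log.lambda=0, logit.p=0), negLogLikeNmix,
                  ymat=ymat, K=50, hessian=TRUE)
exp(opt.nmix$par[1])      # MLE of lambda
plogis(opt.nmix$par[2])   # MLE of p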

Page 32

Continuous case: numerical integration

Page 33

Continuous case: numerical integration

Snowshoe hare data, see Royle and Dorazio (2008, chapter 6)

# FREQUENCIES of individuals captured 0, 1, ..., J = 14 times:
nx <- c(14, 34, 16, 10, 4, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0)
nind <- sum(nx)
J <- 14

Mhlik <- function(parms){
  mu <- parms[1]
  sigma <- exp(parms[2])
  il <- rep(NA, J+1)
  for(k in 0:J){
    il[k+1] <- integrate(
      function(x){ dbinom(k, J, plogis(x)) * dnorm(x, mu, sigma) },
      lower=-Inf, upper=Inf)$value
  }
  -1*(sum(nx*log(il)))
}
tmp <- nlm(Mhlik, c(-1, -1), hessian=TRUE)
sqrt(diag(solve(tmp$hessian)))

Page 34

Part III: Bayesian inference

Page 35

Bayes’ rule

Bayes' Rule: a probability law that relates the conditional distributions [z | y] and [y | z] for any random variables y and z:

  [z | y] = [y | z] [z] / [y]

• [y | z] = distribution of y conditional on z (think detection given occupancy)
• [z | y] = distribution of z conditional on y
• [z] = marginal distribution of z
• [y] = marginal distribution of y

Page 36

Bayes’ rule, continued

Example: if z = occupancy state and y = number of detections in J visits, then [z = 1 | y = 0] is the probability of occupancy given non-detection in the J visits:

  Pr(z = 1 | y = 0) = Pr(y = 0 | z = 1) Pr(z = 1) / { Pr(y = 0 | z = 1) Pr(z = 1) + Pr(y = 0 | z = 0) Pr(z = 0) }
                    = (1 − p)^J ψ / { (1 − p)^J ψ + (1 − ψ) }

Plug in known quantities (values of ψ, p, and J):
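As a quick numerical sketch (the values ψ = 0.8, p = 0.5 and J = 3 are illustrative, not from the slide):

psi <- 0.8; p <- 0.5; J <- 3              # illustrative values
(1-p)^J * psi / ((1-p)^J * psi + (1-psi))
# about 0.33: even after 3 non-detections there is a 1-in-3 chance the site is occupied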

Page 37

Bayes’ rule, continued

Bayes' rule is not a controversial thing, it is just a basic law of probability. However, advocates of Bayesian inference assert its general use for inference about model parameters which are not traditionally considered to be random variables

e.g., for data y and some parameter θ:

  [θ | y] = [y | θ] [θ] / [y]

• [θ] = the "prior distribution"

Page 38

Bayesian inference
Joint distribution of the observations conditional on parameters:
  [y_1, ..., y_n | θ]

Distribution for parameters -- the prior distribution:
  [θ]

We can invoke Bayes' rule, i.e., compute the conditional distribution:
  [θ | y_1, ..., y_n] = [y_1, ..., y_n | θ] [θ] / [y_1, ..., y_n]

• numerator = joint distribution of data and parameters
• denominator = marginal distribution of the data

Page 39

The Posterior distribution

• Arises from application of basic rules of probability, because everything is a random variable
• Is a probability distribution for the parameters!
• Characterizes uncertainty in the parameter values using explicit probability statements
• "Bayesian confidence interval" (usually called a "credible interval")
• In general, report summaries of the posterior distribution: mean, mode, variance, etc.

Page 40

Computing the posterior distribution

1. Do the math. Recognize the mathematical form of the posterior as a standard named distribution that we can compute moments of.

2. Monte Carlo approximation -- draw samples from the posterior distribution and quantify posterior features by summarizing the samples. Markov chain Monte Carlo (MCMC).

Page 41

Computing the posterior distribution
In limited cases we can identify the posterior distribution analytically. E.g., if y = the number of times we detected a species at some site in J visits, we assume y ~ Binomial(J, p) and p ~ Uniform(0, 1), and then

  p | y ~ Beta(y + 1, J − y + 1)

The MLE is y/J, the posterior mean is (y + 1)/(J + 2), and the posterior mode is y/J (also this is the MLE).

If we use a uniform prior for p (a Beta(1, 1) distribution), then the posterior mode is equal to the MLE. This is a general result: if you use a uniform prior then there is a correspondence between posterior modes and MLEs.
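A small numerical sketch of this conjugate result (y = 2 detections in J = 5 visits is an illustrative choice):

y.obs <- 2; J <- 5                      # illustrative data
p.grid <- seq(0, 1, length.out = 200)
plot(p.grid, dbeta(p.grid, y.obs + 1, J - y.obs + 1), type = "l",
     xlab = "p", ylab = "Posterior density [p|y]")   # Beta(y+1, J-y+1) posterior
abline(v = y.obs / J, lty = 2)          # MLE = posterior mode = y/J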

Page 42

Computing the posterior distribution: MCMC

The posterior:

  [θ | y] = [y | θ] [θ] / [y]

Computing the denominator is computationally expensive, and sometimes not even possible; e.g., for two parameters θ_1 and θ_2:

  [y] = ∫∫ f(y_1, y_2, ..., y_n | θ_1, θ_2) g(θ_1, θ_2) dθ_1 dθ_2

(one integral for each parameter that has a prior distribution).
• Usually not recognizable as a known distribution.
• Can only be done analytically in very special cases.

Page 43

How to do Bayesian analysis: MCMC

• MCMC: simulation methods for sampling from the posterior distribution that do not require that we know the denominator, or ever have to evaluate it.

• We estimate (calculate) features of the posterior distribution from the posterior samples using Monte Carlo averages.

• E.g., if we obtain samples θ^(1), ..., θ^(K) from the posterior distribution, then
  E(θ | y) ≈ (1/K) Σ_k θ^(k)

• The topic of MCMC is too vast to cover here; we cover only a couple of basic ideas such as Metropolis-Hastings (otherwise we use BUGS/JAGS!).

Page 44

The Metropolis Algorithm
Target distribution for a random variable θ: π(θ). Note: in practice this is always the posterior distribution:

  π(θ) = [θ | y] ∝ [y | θ] [θ]

But it can be any distribution at all.

Step 0. Initialize θ^(t=0); θ^(t−1) is the current value of the parameter, i.e., at step t of the algorithm.

Step 1. Draw a candidate value of the parameter, θ*, from some symmetric proposal distribution h(θ* | θ^(t−1)). Symmetric means h(θ* | θ^(t−1)) = h(θ^(t−1) | θ*)
  – E.g., θ* ~ Normal(θ^(t−1), δ)
  – E.g., θ* ~ Uniform(0, 1)   # need not depend on θ^(t−1)

Page 45

The Metropolis Algorithm

Step 2. Accept the candidate value with probability related to the ratio of the target distribution evaluated at the candidate to that evaluated at the current value:

  r = [y | θ*] [θ*] / ( [y | θ^(t−1)] [θ^(t−1)] )
  # Note: the denominator [y] cancels
  # Note: r also involves the candidate generator h if it is not symmetric

  Acceptance probability = min(1, r)

Step 3. Repeat a few thousand times

Page 46

The Metropolis Algorithm

Practical relevance:
• The marginal distribution of y (i.e., the denominator of the posterior or of the conditional posterior) cancels, so we don't need to know what it is.

• To use the Metropolis algorithm we only have to evaluate the known distributions that make up the posterior (likelihood and prior), but not the posterior itself

Page 47

Illustration

• Suppose you collect two observations, y_1 and y_2, which are independent Binomial random variables with K = 10 trials and p unknown. Obtain the posterior distribution of p using the Metropolis algorithm.

Page 48

Illustration of MCMC using Metropolis Algorithm

# 2 binomial observations
y <- rbinom(2, size=10, p=0.5)

# The joint distribution function. As a function of data it gives
# the probability of any two values of data = c(y1, y2)
jointdis <- function(data, K, p){
  prod(dbinom(data, size=K, p=p))
}

(jointdis(y, K=10, p=0.5))
# also happens to be the likelihood of the value p = 0.5 for the
# given data, but it is NOT a probability for p.

# Evaluate the likelihood for a grid of values of "p"
p.grid <- seq(0.1, 0.9, , 200)
likelihood <- rep(NA, 200)
for(i in 1:200){
  likelihood[i] <- jointdis(y, K=10, p=p.grid[i])
}
# Plot the likelihood
plot(p.grid, likelihood, xlab="p", ylab="likelihood")

Page 49

Illustration of MCMC using Metropolis Algorithm

Page 50

Illustration of MCMC using Metropolis Algorithm

Need a prior distribution for p: p ~ Beta(a, b), where we assume a and b are fixed subjectively, i.e., they are not parameters to estimate.

The target distribution is the posterior distribution:

  [p | y_1, y_2] = [y_1, y_2 | p] [p] / [y_1, y_2]

• [y_1, y_2] = ????
• Possibly we could figure this out, but why bother?

Page 51

Illustration of MCMC using Metropolis Algorithm

The target distribution is the posterior distribution:

  [p | y_1, y_2] ∝ [y_1, y_2 | p] [p]

This is proportional to the joint distribution of the data (which was the likelihood) times the beta prior distribution, so we'll make an R function out of it:

# Define the joint distribution of the data
jointdis <- function(data, K, p){
  prod(dbinom(data, size=K, p=p))
}
# Posterior is proportional to likelihood times prior
posterior <- function(data, K, p, a, b){
  prod(dbinom(data, size=K, p=p)) * dbeta(p, a, b)
}

Page 52

Illustration of MCMC using Metropolis Algorithm

## Do 100,000 MCMC iterations using the Metropolis algorithm
## Assume a uniform prior, which is beta(1,1)
mcmc.iters <- 100000
out <- rep(NA, mcmc.iters)
# starting value
p <- 0.2

for(i in 1:mcmc.iters){
  # use a uniform candidate generator. This is not efficient
  p.cand <- runif(1, 0, 1)
  r <- posterior(y, K=10, p=p.cand, a=1, b=1) / posterior(y, K=10, p=p, a=1, b=1)
  # generate a uniform r.v. and compare with "r"; this imposes the
  # correct probability of acceptance
  if(runif(1) < r)   # This is how you "do something" with probability r
    p <- p.cand
  out[i] <- p
}
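Once the sampler has run, posterior summaries are just Monte Carlo averages of the saved draws. A minimal sketch (the burn-in of 1,000 iterations is an illustrative choice):

out.post <- out[-(1:1000)]              # discard a short burn-in (illustrative)
mean(out.post)                          # posterior mean of p
quantile(out.post, c(0.025, 0.975))     # 95% credible interval for p
hist(out.post, xlab = "p", main = "Posterior of p")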

Page 53

Likelihood vs. posterior

Page 54

The posterior of a function of a model parameter

• To estimate a function of a parameter (and its variance) you only have to apply that function to the posterior samples of the parameter.

• If θ^(k) is a posterior sample of some parameter θ, then g(θ^(k)) is a posterior sample of the parameter g(θ).

• As an exercise, estimate the posterior distribution of a function of p from the binomial example
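For instance, reusing the saved Metropolis samples in out (the odds p/(1−p) is just an illustrative choice of function):

# Posterior samples of a derived quantity: apply the function to each draw
odds <- out / (1 - out)                 # g(p) = p/(1-p), illustrative
mean(odds)                              # posterior mean of the odds
quantile(odds, c(0.025, 0.975))         # 95% credible interval for the odds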

Page 55

Remarks on the Metropolis Algorithm

Heuristic: This algorithm has us simulate candidate values somehow, even arbitrarily, and then accept values that have higher posterior probability

• The long-run frequency of "accepted" values is that of the target posterior density!

• Note: If the prior is constant, this MCMC calculation is based on repeated evaluations of the likelihood only. So, if you can write a function to do MLE, you can also do MCMC.

Page 56

Summary thoughts on Bayesian/classical inference

Both inference paradigms are useful for analysis of hierarchical models

Bayesian:
• Completely general methods for implementation (MCMC) which always work. Sometimes BUGS implementations don't work, so it's good to know how to do it yourself.
• Bayes is great for complex models with lots of latent structure
• Inferences are not asymptotic; they apply to an arbitrary sample size n. In particular, the n that you have.
• Prediction/transformation is more coherent -- comes "for free"
• Takes more math/programming know-how????
• Sometimes slower due to more calculations

Classical:
• Integrated likelihood sometimes not feasible (e.g., the community model)
• But very accurate (not simulation based, no MC error)
• Automatic model selection (AIC)

Page 57

Idealized Structure of Workshop/Book

• Introduction to a class of models
• Likelihood analysis of models in unmarked
  – Stressing consistent work flow and ease of doing standard things like prediction and model selection
• Bayesian analysis in BUGS
• Illustration of a type of model that can't be done (easily, or in unmarked) using likelihood methods