Foundations in Statistics for Ecology and Evolution 8. Bayesian Statistics

What is a Probability?

• Physical Probability
  – Frequentist: long-run outcome
  – Propensity: property of the system

• Evidential Probability (Bayesian)
  – Measure of statement (un)certainty

Statistical Falsificationism

• Data is a consequence of the “true” model

• That consequence is probabilistic

Likelihood = P (Data | Model)

• Evidence against the model if the data at hand would be very unlikely under that model

R.A. Fisher

P (Evidence | Hypothesis)

Comparing Multiple Hypotheses

• Falsificationism just rejects hypotheses
  – Cannot provide support for a hypothesis
  – Relies on the inability to reject it with time
  – Only one hypothesis at a time

• In reality, there are often multiple candidate hypotheses

• We can simultaneously calculate the likelihood of our evidence given each of them

• Measure the relative support for hypotheses

Thomas C. Chamberlin, Imre Lakatos

Likelihood Function

(Figure: likelihood function of φ. MAXIMUM LIKELIHOOD ESTIMATE: φ = 0.833)

8. Bayesian Analysis

Frequentist vs. Bayesian

Frequentist | Bayesian
Probability is a long-run average | Probability is a degree of belief
There is a true Model, the Data is a random realization | The Data is true/fixed, Models have probabilities
Probability of the data given a hypothesis (Likelihood) | Probability of a hypothesis given the data
Each repeated experiment/observation starts from ignorance | Can incorporate prior knowledge: probabilities can be updated

Harold Jeffreys, Jerzy Neyman

Bayes Theorem

Thomas Bayes

Bayes Theorem

Posterior Distribution = (Prior Knowledge × Likelihood) / Constant

P(Model | Data) = P(Model) × P(Data | Model) / P(Data)

The Constant, P(Data), is really hard to calculate most times

Estimating Survival

• What is the survival probability?
  – Check caterpillars every week
  – Experiment: 5 caterpillars. Ends after 4 weeks
  – Data: survived 2, 4, 4, 2 and 3 weeks

(Photo: clay model caterpillar)

Encounter histories (rows: individuals; columns: time in weeks, from "Caterpillars set" to "Experiment ends"):

1 1 1 0 0
1 1 1 1 1
1 1 1 1 1
1 1 1 0 0
1 1 1 1 0

1 = alive (intact), 0 = dead (attacked)

Estimating Survival

• Define a model (assumptions):
  – Constant probability of survival through time
  – All individuals are equal
  – Just one parameter: probability of survival, φ

• Probability of the data given the model: L (D | M)

Likelihood Approach

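• Likelihood of the data for any value of φ: L(data | φ)

Encounter histories (rows: individuals; columns: time in weeks, from "Caterpillars set" to "Experiment ends") and the likelihood of each:

1 1 1 0 0 | φ^2 × (1-φ)
1 1 1 1 1 | φ^4
1 1 1 1 1 | φ^4
1 1 1 0 0 | φ^2 × (1-φ)
1 1 1 1 0 | φ^3 × (1-φ)

Probability of all the data occurring together, assuming the individuals are independent:

L(data | φ) = φ^15 × (1-φ)^3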

Likelihood Function

(Figure: likelihood function of φ. MAXIMUM LIKELIHOOD ESTIMATE: φ = 0.833)
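The maximum likelihood estimate can be checked directly from the encounter histories. The following is a minimal sketch rather than the course's own code: the names `histories`, `count_events` and `likelihood` are made up for illustration, and it assumes the first column is the moment the caterpillars were set out, so only the later columns count as survived weeks.

```python
# Encounter histories from the slides: 1 = alive (intact), 0 = dead (attacked).
# Column 0 is when the caterpillars were set out; columns 1-4 are weekly checks.
histories = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
]

def count_events(history):
    """Return (weeks survived, deaths observed) for one individual."""
    survived = sum(history[1:])       # each 1 after the start is a survived week
    died = 1 if 0 in history else 0   # the first 0 marks the single death event
    return survived, died

total_survived = sum(count_events(h)[0] for h in histories)
total_died = sum(count_events(h)[1] for h in histories)

def likelihood(phi):
    """L(data | phi), assuming independent individuals and constant survival."""
    return phi ** total_survived * (1 - phi) ** total_died

# For a likelihood of this form the maximum sits at survived / (survived + died).
mle = total_survived / (total_survived + total_died)

# Brute-force check: evaluate the likelihood on a grid of candidate values.
grid = [i / 1000 for i in range(1001)]
grid_best = max(grid, key=likelihood)

print(total_survived, total_died, round(mle, 3), grid_best)
```

Both the closed form and the grid search should land close to the φ = 0.833 shown in the figure.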

The Bayesian Way

• We need a Likelihood function: we have it
• We need a Prior distribution:
  e.g. all parameter values equally likely a priori
  Prior(φ) ~ Uniform(0,1)

Posterior(φ) = P(φ | data) = Prior(φ) × L(data | φ) / P(data)
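With a single parameter, the posterior can be approximated on a grid without MCMC, which makes the role of the constant P(data) visible. This is a sketch under the slide's assumptions (uniform prior, likelihood φ^15 × (1-φ)^3); the grid size and the names `prior`, `likelihood` and `p_data` are illustrative choices.

```python
def likelihood(phi):
    """L(data | phi) for the caterpillar data: 15 survived weeks, 3 deaths."""
    return phi ** 15 * (1 - phi) ** 3

def prior(phi):
    """Uniform(0, 1) prior: every value of phi is equally likely a priori."""
    return 1.0

STEPS = 1000
WIDTH = 1.0 / STEPS
grid = [i * WIDTH for i in range(STEPS + 1)]

# Numerator of Bayes' theorem at each grid point.
unnormalised = [prior(phi) * likelihood(phi) for phi in grid]

# P(data): integrate the numerator over phi (trapezoid rule).
p_data = sum((unnormalised[i] + unnormalised[i + 1]) / 2 * WIDTH
             for i in range(STEPS))

# Posterior density at each grid point.
posterior = [u / p_data for u in unnormalised]

mode = grid[posterior.index(max(posterior))]
print(p_data, mode)
```

This only works because there is one parameter to integrate over; with several parameters the integral quickly becomes impractical, which is where MCMC comes in.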

Markov Chain Monte Carlo (MCMC)

1. Initial guess for parameter was A
2. Guess a random number (B)
3. Calculate Prior x Likelihood
4. Accept B with probability (capped at 1):
   [Prior(B) x Likelihood(B)] / [Prior(A) x Likelihood(A)]
5. If accept, add it to list of guesses
6. If reject, add previous guess to list of guesses

If all guesses have the same probability, this is 1.
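The steps above can be sketched in a few lines for the caterpillar example. This is one possible reading of the recipe, not a definitive implementation: it assumes each new guess B is drawn uniformly from (0, 1) regardless of the previous guess, and the function names and the number of iterations are illustrative.

```python
import random

def likelihood(phi):
    """L(data | phi) for the caterpillar data."""
    return phi ** 15 * (1 - phi) ** 3

def prior(phi):
    """Uniform(0, 1) prior."""
    return 1.0 if 0.0 <= phi <= 1.0 else 0.0

def metropolis(n_iter=20000, start=0.2, seed=42):
    rng = random.Random(seed)
    current = start                        # step 1: initial guess A
    guesses = []
    for _ in range(n_iter):
        proposal = rng.random()            # step 2: guess a random number B
        # step 3: Prior x Likelihood for both values
        new = prior(proposal) * likelihood(proposal)
        old = prior(current) * likelihood(current)
        # step 4: accept B with probability min(1, new / old)
        if rng.random() < min(1.0, new / old):
            current = proposal             # step 5: accepted, B joins the list
        guesses.append(current)            # step 6: otherwise A is repeated
    return guesses

guesses = metropolis()
kept = guesses[1000:]                      # discard early guesses
print(sum(kept) / len(kept))
```

Rejected proposals repeat the previous value in the list, which is why runs of identical guesses appear in the output of a sampler like this.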

MCMC

Propose: φ = 0.2

Prior ~ Unif(0,1)
Likelihood = φ^15 × (1-φ)^3

MCMC

Previous: φ = 0.2
Propose: φ = 0.1

Acceptance probability = (Proposed Prior × Proposed Likelihood) / (Previous Prior × Previous Likelihood)
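For this particular step the acceptance probability can be worked out by hand or with a few lines of code. The sketch below assumes the uniform prior and the likelihood φ^15 × (1-φ)^3 from the earlier slides; the variable names are illustrative.

```python
def likelihood(phi):
    """L(data | phi) for the caterpillar data."""
    return phi ** 15 * (1 - phi) ** 3

previous, proposed = 0.2, 0.1

# Under a Uniform(0, 1) prior both prior densities are 1 and cancel out.
previous_prior = proposed_prior = 1.0

ratio = (proposed_prior * likelihood(proposed)) / (previous_prior * likelihood(previous))
accept_probability = min(1.0, ratio)

print(accept_probability)
```

The result is on the order of 4 in 100,000, so a move from φ = 0.2 down to φ = 0.1 is almost always rejected: the proposed value explains the data far worse than the previous one.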

MCMC


List of guesses: φ = 0.55, 0.62, 0.44, 0.44, 0.44, 0.57, 0.82, 0.71

Sampling the MCMC

BURN-IN: ignore the initial numbers, which are still far from the optimum

The remaining numbers should be a good sample of the Posterior (φ | data)

Posterior Distribution

(Figure: posterior distribution of the Survival Estimate (φ), with the 95% area marked)

φ = 0.81 (0.60-0.94), 95% CREDIBLE INTERVAL
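A point estimate and a 95% credible interval are simply quantiles of the posterior sample. In the sketch below the sample is not an MCMC run: as a stand-in it draws directly from Beta(16, 4), which is the exact posterior for a uniform prior combined with the likelihood φ^15 × (1-φ)^3. The helper `quantile` and the sample size are illustrative choices.

```python
import random

rng = random.Random(0)

# Stand-in for post-burn-in MCMC output: exact draws from the posterior.
samples = sorted(rng.betavariate(16, 4) for _ in range(50000))

def quantile(sorted_values, q):
    """Value below which a fraction q of the sorted sample falls."""
    return sorted_values[int(q * (len(sorted_values) - 1))]

median = quantile(samples, 0.5)
lower = quantile(samples, 0.025)   # 2.5% of the sample lies below this value
upper = quantile(samples, 0.975)   # 2.5% of the sample lies above this value

print(f"phi = {median:.2f} ({lower:.2f}-{upper:.2f})")
```

The printed values should come out close to the φ = 0.81 (0.60-0.94) reported on the slide, up to sampling noise.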

Uninformative Prior

• We assumed no prior knowledge of φ


Prior Distribution


Informative Priors

(Figures: two panels, each showing a Prior and the resulting Posterior)

• What if I had prior information?
  – Previous experiment
  – Literature or expert knowledge
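For the caterpillar example the effect of an informative prior can be sketched without MCMC, because a Beta(a, b) prior combined with the likelihood φ^15 × (1-φ)^3 gives a Beta(a + 15, b + 3) posterior. The priors below are hypothetical and chosen only for illustration; they are not the ones behind the figures.

```python
SURVIVED, DIED = 15, 3   # exponents of the caterpillar likelihood

def posterior_params(prior_a, prior_b):
    """Beta prior + this likelihood -> Beta posterior (conjugate update)."""
    return prior_a + SURVIVED, prior_b + DIED

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

priors = {
    "uniform, Beta(1, 1)": (1, 1),
    "hypothetical: survival near 0.5, Beta(5, 5)": (5, 5),
    "hypothetical: survival is low, Beta(2, 8)": (2, 8),
}

for label, (a, b) in priors.items():
    post_a, post_b = posterior_params(a, b)
    print(f"{label}: prior mean {beta_mean(a, b):.2f}, "
          f"posterior mean {beta_mean(post_a, post_b):.2f}")
```

The stronger the prior (the larger a + b relative to the 18 observations), the further the posterior is pulled away from the data towards the prior.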

More than one Parameter

e.g. Logistic population growth
Problem: Best value for r depends on value of K

Marginalizing

• e.g. Probability of dying given that I am male if…?
  – P(death | infected) = 0.3
  – P(death | attacked) = 0.7
  – P(death | lightning) = 0.99
  – P(death | bored) = 0.01
  – P(infected | male) = 0.2
  – P(attacked | male) = 0.1
  – P(lightning | male) = 0.00001
  – P(bored | male) = 0.99

P(death | male) = 0.3×0.2 + 0.7×0.1 + 0.99×0.00001 + 0.01×0.99 = 0.14
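The sum above is a weighted average: each conditional probability of death is weighted by how likely that cause is for a male. A small sketch of the same arithmetic follows, with dictionary names made up for illustration. The four P(cause | male) values on the slide are illustrative and do not add up to 1, so the result demonstrates the weighting rather than a calibrated probability.

```python
# P(death | cause), as given on the slide.
p_death_given_cause = {
    "infected": 0.3,
    "attacked": 0.7,
    "lightning": 0.99,
    "bored": 0.01,
}

# P(cause | male), as given on the slide.
p_cause_given_male = {
    "infected": 0.2,
    "attacked": 0.1,
    "lightning": 0.00001,
    "bored": 0.99,
}

# Marginalize over the cause: sum of P(death | cause) x P(cause | male).
p_death_given_male = sum(
    p_death_given_cause[cause] * p_cause_given_male[cause]
    for cause in p_death_given_cause
)

print(round(p_death_given_male, 2))
```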

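Marginal Likelihood

P(data | θ2) = ∫ P(data | θ1, θ2) × P(θ1) dθ1

– P(data | θ2): what the likelihood would be for a given value of θ2
– The integral: integrate (sum) over all possible values of θ1
– P(data | θ1, θ2) × P(θ1): likelihood of θ2 at a given value of θ1, weighted by the probability of that value of θ1

e.g. ∫ Posterior(r) × P(data | K=200, r) dr = P(data | K=200)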

Gibbs Sampling

1. Propose value for parameter θ1 and keep all other parameters at latest value

2. Accept or reject proposed value θ1

3. Move on to updating value of θ2 while keeping latest values of other parameters

4. Keep on until all parameters are updated
5. Update θ1 keeping all other parameters at latest updated value

Etc…
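The one-parameter-at-a-time scheme above can be sketched for logistic growth. The propose/accept variant described in these steps is often called Metropolis-within-Gibbs. Everything specific here is an assumption made for illustration: the data are simulated (true r = 1.0, K = 200), N0 and the observation error are treated as known, the priors are flat on a bounded range, and the proposal step sizes are arbitrary.

```python
import math
import random

N0 = 100.0       # starting population size, treated as known (assumption)
NOISE_SD = 5.0   # observation error, treated as known (assumption)

def logistic(t, r, K):
    """Population size at time t under logistic growth."""
    return K / (1.0 + (K - N0) / N0 * math.exp(-r * t))

def log_posterior(r, K, times, counts):
    """Log of prior x likelihood, with flat priors on a bounded range."""
    if not (0.0 < r < 5.0 and 100.0 < K < 500.0):
        return -math.inf
    return sum(-0.5 * ((y - logistic(t, r, K)) / NOISE_SD) ** 2
               for t, y in zip(times, counts))

def accept(rng, log_new, log_old):
    """Metropolis acceptance rule, evaluated on the log scale."""
    return rng.random() < math.exp(min(0.0, log_new - log_old))

def sample(times, counts, n_iter=20000, seed=1):
    rng = random.Random(seed)
    r, K = 0.5, 150.0                     # arbitrary starting guesses
    current = log_posterior(r, K, times, counts)
    chain = []
    for _ in range(n_iter):
        # Propose a new r while K stays at its latest value.
        r_new = r + rng.gauss(0.0, 0.1)
        proposed = log_posterior(r_new, K, times, counts)
        if accept(rng, proposed, current):
            r, current = r_new, proposed
        # Propose a new K while r stays at its latest value.
        K_new = K + rng.gauss(0.0, 2.0)
        proposed = log_posterior(r, K_new, times, counts)
        if accept(rng, proposed, current):
            K, current = K_new, proposed
        chain.append((r, K))
    return chain

# Simulated data, NOT the data behind the slides.
data_rng = random.Random(0)
times = list(range(11))
counts = [logistic(t, 1.0, 200.0) + data_rng.gauss(0.0, NOISE_SD) for t in times]

kept = sample(times, counts)[5000:]       # drop the burn-in
r_mean = sum(r for r, _ in kept) / len(kept)
K_mean = sum(K for _, K in kept) / len(kept)
print(round(r_mean, 2), round(K_mean, 1))
```

Because each parameter is updated while the other is held at its latest value, the list of kept (r, K) pairs reflects the joint posterior, and summarising one column on its own marginalizes over the other.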

Gibbs Sampling

r = 1.01 (0.61-1.61)

K = 202.1 (200.6-206.5)

N0 = 103.9 (98.0-123.2)

More than one Parameter

e.g. Logistic population growth

The Power of Marginalizing

• Complex random effect structures
  – Condition a random effect on another
• Missing data
  – Condition parameters on missing data values
  – Missing values updated in Gibbs Sampling: we can get estimates!
• Latent Variables
  – Models with unmeasured variables that influence the data

Latent Variables

e.g. Dispersal strategies in frogs
  – Two types of individuals: dispersive and sedentary
  – Data on number of dispersal movements
  – Cannot measure ‘type’ of individual

Latent Variables

N.moves ~ Poisson(λ)
λ = λ1 if type = sedentary
λ = λ2 if type = dispersive
P(dispersive) ~ Binom(p)

Parameter | Estimate | Credible Interval
λ1 | 2.34 | 1.84-2.90
λ2 | 11.13 | 9.90-12.36
p | 0.56 | 0.45-0.67

Latent Variables

• Gibbs Sampling marginalized over the probability of each data point being each type

• Can get estimates of the ‘type’ of each data point
• Maybe I want to test if type correlates with mating success?
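To see how a ‘type’ estimate for a single frog can arise, the sketch below applies Bayes' theorem to one data point, plugging in the point estimates from the table (λ1 = 2.34, λ2 = 11.13, p = 0.56). This is a simplification: a full Gibbs sampler would average over the uncertainty in those parameters instead of fixing them. The function names are illustrative.

```python
import math

LAMBDA_SEDENTARY = 2.34    # lambda1, point estimate from the table
LAMBDA_DISPERSIVE = 11.13  # lambda2, point estimate from the table
P_DISPERSIVE = 0.56        # p, point estimate from the table

def poisson_pmf(n, lam):
    """Probability of observing n events under Poisson(lam)."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def prob_dispersive(n_moves):
    """P(type = dispersive | number of moves), by Bayes' theorem."""
    dispersive = P_DISPERSIVE * poisson_pmf(n_moves, LAMBDA_DISPERSIVE)
    sedentary = (1 - P_DISPERSIVE) * poisson_pmf(n_moves, LAMBDA_SEDENTARY)
    return dispersive / (dispersive + sedentary)

for n in (0, 2, 5, 8, 12):
    print(n, round(prob_dispersive(n), 3))
```

Individuals with few movements are classed as sedentary with high probability and those with many movements as dispersive; only intermediate counts are genuinely ambiguous.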

Bayesian Model Comparison

• Deviance Information Criterion
  – DIC = mean(-2logLik) + var(-2logLik)/2, computed over the MCMC samples
  – The more parameters, the more variance the likelihood has
• Reversible Jump MCMC
  – Create a ‘supermodel’ where each model is weighted by the probability of that model being true
  – Gibbs Sampling on all the model estimates and model probabilities at the same time.
  – Marginalizes over model probabilities
  – Straightforward to do model averaging of parameters

Reversible Jump MCMC

N.moves ~ Poisson(λ)

Model 1
λ = λ1 if type = sedentary
λ = λ2 if type = dispersive
P(dispersive) ~ Binom(p)

Model 2
λ = λ0 same for all

Model / Parameter | Estimate | Credible Interval
M1 (model probability) | 0.78 |
λ1 | 2.34 | 1.84-2.90
λ2 | 11.13 | 9.90-12.36
p | 0.56 | 0.45-0.67
M2 (model probability) | 0.22 |
λ0 | 6.17 | 5.67-6.63
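Model averaging weights each model's answer by the probability of that model. As a rough illustration, the sketch below averages the expected number of moves per individual across the two models, using the point estimates from the table. Plugging in point estimates is a shortcut: in a real analysis the average would be taken over the MCMC samples. The variable names are illustrative.

```python
# Posterior model probabilities from the table.
P_MODEL_1, P_MODEL_2 = 0.78, 0.22

# Model 1: two types of individuals.
LAMBDA_SEDENTARY, LAMBDA_DISPERSIVE, P_DISPERSIVE = 2.34, 11.13, 0.56
# Model 2: one rate shared by all individuals.
LAMBDA_ALL = 6.17

# Expected moves per individual under each model.
expected_m1 = (P_DISPERSIVE * LAMBDA_DISPERSIVE
               + (1 - P_DISPERSIVE) * LAMBDA_SEDENTARY)
expected_m2 = LAMBDA_ALL

# Model-averaged expectation: weight each model by its probability.
expected_avg = P_MODEL_1 * expected_m1 + P_MODEL_2 * expected_m2

print(round(expected_m1, 2), expected_m2, round(expected_avg, 2))
```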

Take Home

• Bayesians calculate probabilities of parameters and hypotheses

• Priors can be informative or uninformative
• Parameters are probabilistic: hierarchical models
• Pros:
  – Bayesian methods marginalize over unknowns
  – Complex hierarchical models and latent variables
• Cons:
  – Computation time is long
  – Choosing priors