What is a Probability?
• Physical Probability
  – Frequentist: long-run outcome
  – Propensity: property of the system
• Evidential Probability (Bayesian)
  – Measure of statement (un)certainty
Statistical Falsificationism
• Data is a consequence of the “true” model
• That consequence is probabilistic
Likelihood = P (Data | Model)
• Evidence against the model if the data at hand would be very unlikely under that model
R.A. Fisher
P (Evidence | Hypothesis)
Comparing Multiple Hypotheses
• Falsificationism just rejects hypotheses
  – Cannot provide support for a hypothesis
  – Relies on the inability to reject it with time
  – Only one hypothesis at a time
• In reality, there are often multiple candidate hypotheses
• We can simultaneously calculate the likelihood of our evidence given each of them
• Measure the relative support for hypotheses
Thomas C. Chamberlin, Imre Lakatos
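As a rough illustration of measuring relative support, the sketch below computes the likelihood of the same evidence under several candidate hypotheses and scales each by the best one. The evidence (15 weekly survivals and 3 deaths) is borrowed from the caterpillar example in these slides; the three candidate survival probabilities and their names are assumptions chosen only for illustration.

```python
# Evidence: 15 weekly survivals and 3 deaths (counts taken from the
# caterpillar example in these slides).
SURVIVED, DIED = 15, 3


def likelihood(phi: float) -> float:
    """P(evidence | hypothesis), where the hypothesis is a survival probability phi."""
    return phi**SURVIVED * (1 - phi) ** DIED


# Three candidate hypotheses, chosen purely for illustration (assumption).
hypotheses = {"low survival": 0.5, "medium survival": 0.7, "high survival": 0.9}
likelihoods = {name: likelihood(phi) for name, phi in hypotheses.items()}

# Relative support: each likelihood divided by the best-supported one.
best = max(likelihoods.values())
relative_support = {name: lik / best for name, lik in likelihoods.items()}

for name, support in relative_support.items():
    print(f"{name}: relative support = {support:.3f}")
```

No hypothesis is rejected here; each one simply receives more or less support relative to the others.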
Likelihood Function
[Figure: likelihood function, with the MAXIMUM LIKELIHOOD ESTIMATE marked at φ = 0.833]
8. Bayesian Analysis
Frequentist vs. Bayesian
Frequentist | Bayesian
Probability is a long-run average | Probability is a degree of belief
There is a true Model, the Data is a random realization | The Data is true/fixed, Models have probabilities
Probability of the data given a hypothesis (Likelihood) | Probability of a hypothesis given the data
Each repeated experiment/observation starts from ignorance | Can incorporate prior knowledge: probabilities can be updated
Harold Jeffreys, Jerzy Neyman
Bayes Theorem
Thomas Bayes
Posterior Distribution = (Prior Knowledge × Likelihood) / Constant
P(Model | Data) = P(Model) × P(Data | Model) / P(Data)
• The constant, P(Data), is really hard to calculate most of the time
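A minimal sketch of Bayes' theorem for a discrete set of models may help fix the roles of prior, likelihood and constant. The two models, their prior probabilities and their likelihood values below are made-up numbers, not taken from the slides.

```python
# Hypothetical example: two competing models with prior probabilities,
# and the likelihood of the same data under each of them (made-up values).
prior = {"model A": 0.5, "model B": 0.5}
likelihood = {"model A": 0.02, "model B": 0.08}  # P(data | model)

# The constant P(data) is the sum of prior x likelihood over all models.
# With a handful of discrete models this sum is trivial; with continuous
# parameters it becomes an integral, which is where the difficulty lies.
constant = sum(prior[m] * likelihood[m] for m in prior)

# Posterior = prior x likelihood / constant
posterior = {m: prior[m] * likelihood[m] / constant for m in prior}

for model, prob in posterior.items():
    print(f"P({model} | data) = {prob:.2f}")
```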
Estimating Survival
• What is the survival probability?
  – Check caterpillars every week
  – Experiment: 5 caterpillars. Ends after 4 weeks
  – Data: survived 2, 4, 4, 2 and 3 weeks
[Photo: Clay model caterpillar]
Encounter histories (rows: individuals; columns: time in weeks, from "Caterpillars set" to "Experiment ends"):
1 1 1 0 0
1 1 1 1 1
1 1 1 1 1
1 1 1 0 0
1 1 1 1 0
1 = alive (intact), 0 = dead (attacked)
Estimating Survival
• Define a model (assumptions):
  – Constant probability of survival through time
  – All individuals are equal
  – Just one parameter: probability of survival, φ
• Probability of the data given the model: L (D | M)
Likelihood Approach
• Likelihood of the data for any value of φ: L(data | φ)
Encounter history (individual; weeks from "Caterpillars set" to "Experiment ends") | Likelihood
1 1 1 0 0 | φ² × (1-φ)
1 1 1 1 1 | φ⁴
1 1 1 1 1 | φ⁴
1 1 1 0 0 | φ² × (1-φ)
1 1 1 1 0 | φ³ × (1-φ)
Probability of all the data occurring together, assuming independence: φ¹⁵ × (1-φ)³
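The product above can be checked numerically. The sketch below walks through each encounter history week by week, multiplying by φ for a survived week and by (1-φ) for the week of death, and compares the result with φ¹⁵ × (1-φ)³. The function names and the test value φ = 0.7 are illustrative choices, not part of the slides.

```python
import numpy as np

# Encounter histories from the slides: rows are individuals, columns are
# weekly checks (1 = alive/intact, 0 = dead/attacked).
histories = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
])


def individual_likelihood(history, phi):
    """Likelihood of one encounter history under constant weekly survival phi."""
    lik = 1.0
    for before, after in zip(history[:-1], history[1:]):
        if before == 1 and after == 1:
            lik *= phi        # survived the week
        elif before == 1 and after == 0:
            lik *= 1 - phi    # died during the week
        # once dead, later weeks contribute nothing
    return lik


def data_likelihood(phi):
    """Product over individuals, assuming they are independent."""
    return float(np.prod([individual_likelihood(h, phi) for h in histories]))


phi = 0.7  # arbitrary value used only for the comparison
print(data_likelihood(phi), phi**15 * (1 - phi) ** 3)
```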
Likelihood Function
[Figure: likelihood function, with the MAXIMUM LIKELIHOOD ESTIMATE marked at φ = 0.833]
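One way to locate the maximum is to evaluate the (log-)likelihood φ¹⁵ × (1-φ)³ on a fine grid and pick the largest value. For this particular likelihood the maximum also has a closed form, survivals / (survivals + deaths) = 15/18. The grid resolution below is an arbitrary choice.

```python
import numpy as np


def log_likelihood(phi):
    """Log of phi^15 * (1 - phi)^3, the likelihood from the slides."""
    return 15 * np.log(phi) + 3 * np.log(1 - phi)


# Evaluate on a fine grid of candidate values, excluding 0 and 1
# (where the log-likelihood is minus infinity).
grid = np.linspace(0.001, 0.999, 9981)
phi_hat_grid = grid[np.argmax(log_likelihood(grid))]

# Closed form for this likelihood: survivals / (survivals + deaths)
phi_hat_exact = 15 / (15 + 3)

print(f"grid search: {phi_hat_grid:.3f}, closed form: {phi_hat_exact:.3f}")
```

Both values should agree with the φ = 0.833 shown on the slide.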
The Bayesian Way
• We need a Likelihood function: we have it
• We need a Prior distribution:
  – e.g. all parameter values equally likely a priori: Prior(φ) ~ Uniform(0,1)
Posterior(φ) = P(φ | data) = Prior(φ) × L(data | φ) / P(data)
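With a single parameter, the posterior can be approximated directly on a grid, which makes the role of P(data) concrete: it is the integral of Prior × Likelihood over all values of φ. This is a minimal sketch using a midpoint rule; the number of bins is an arbitrary choice, and the approach does not scale to models with many parameters.

```python
import numpy as np

# Grid of candidate survival probabilities (midpoints of equal-width bins on 0-1)
n_bins = 10_000
width = 1.0 / n_bins
phi = (np.arange(n_bins) + 0.5) * width

prior = np.ones_like(phi)               # Uniform(0, 1) density
likelihood = phi**15 * (1 - phi) ** 3   # L(data | phi) from the slides

# P(data): integral of prior x likelihood over phi (midpoint rule)
p_data = np.sum(prior * likelihood) * width

# Posterior density evaluated on the grid
posterior = prior * likelihood / p_data

posterior_area = np.sum(posterior) * width
posterior_mean = np.sum(phi * posterior) * width

print("P(data) is approximately", p_data)
print("posterior integrates to", posterior_area)
print("posterior mean is approximately", posterior_mean)
```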
Markov Chain Monte Carlo (MCMC)
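1. Start from an initial guess for the parameter (A)
2. Propose a random new value (B)
3. Calculate Prior × Likelihood for A and for B
4. Accept B with probability: min(1, [Prior(B) × Likelihood(B)] / [Prior(A) × Likelihood(A)])
5. If B is accepted, add it to the list of guesses
6. If B is rejected, add the previous guess (A) to the list of guesses again
Note: if all guesses have the same prior probability, the prior ratio Prior(B) / Prior(A) is 1.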
MCMC
Prior ~ Unif(0,1)
Likelihood = φ¹⁵ × (1-φ)³
• Propose: φ = 0.2
• Previous: φ = 0.2. Propose: φ = 0.1
  – Ratio: (Proposed Prior × Proposed Likelihood) / (Previous Prior × Previous Likelihood)
• Previous: φ = 0.2. Propose: φ = 0.66
• Previous: φ = 0.66. Propose: φ = 0.57
• List of guesses: φ = 0.55, 0.62, 0.44, 0.44, 0.44, 0.57, 0.82, 0.71
Sampling the MCMC
• BURN-IN: ignore the initial numbers, which are still far from the optimum
• The remaining numbers should be a good sample of the Posterior (φ | data)
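The sampling scheme can be sketched in a few lines. This is a minimal Metropolis sampler for the caterpillar example, assuming proposals are drawn uniformly on (0, 1) so that every proposal is equally probable; the starting value, chain length, burn-in length and random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seed fixed only for reproducibility


def prior(phi):
    """Uniform(0, 1) prior density."""
    return 1.0 if 0.0 <= phi <= 1.0 else 0.0


def likelihood(phi):
    """Likelihood from the slides: phi^15 * (1 - phi)^3."""
    return phi**15 * (1 - phi) ** 3


def run_chain(n_steps, start=0.2):
    chain = [start]
    for _ in range(n_steps):
        previous = chain[-1]
        proposed = rng.uniform(0.0, 1.0)  # guess a random number
        numerator = prior(proposed) * likelihood(proposed)
        denominator = prior(previous) * likelihood(previous)
        accept_prob = min(1.0, numerator / denominator)
        if rng.uniform() < accept_prob:
            chain.append(proposed)   # accept: add the new guess
        else:
            chain.append(previous)   # reject: repeat the previous guess
    return np.array(chain)


chain = run_chain(20_000)
burn_in = 1_000                      # discard the initial guesses
samples = chain[burn_in:]

median = np.median(samples)
lower, upper = np.percentile(samples, [2.5, 97.5])
print(f"phi = {median:.2f} ({lower:.2f}-{upper:.2f})")
```

Rejected proposals repeat the previous value in the chain, which is why the same number can appear several times in a row in the list of guesses.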
Posterior Distribution
[Figure: posterior distribution of the Survival Estimate (φ), with the 95% area marked]
φ = 0.81 (0.60-0.94), 95% CREDIBLE INTERVAL
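A credible interval is read off the posterior sample as a pair of percentiles. In this sketch the posterior draws come straight from a Beta(16, 4) distribution, which is what a Uniform(0,1) prior combined with the likelihood φ¹⁵ × (1-φ)³ yields; in practice the draws would be the post-burn-in MCMC samples. The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed fixed only for reproducibility

# Uniform prior x phi^15 (1 - phi)^3 is proportional to a Beta(16, 4) density,
# so posterior draws can be generated directly for this simple case.
samples = rng.beta(16, 4, size=200_000)

# Point estimate (median) and the limits enclosing the central 95% area
lower, median, upper = np.percentile(samples, [2.5, 50, 97.5])
print(f"phi = {median:.2f} ({lower:.2f}-{upper:.2f})")
```

The result should land close to the estimate and interval reported on the slide, up to sampling noise in the last digit.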
Uninformative Prior
• We assumed no prior knowledge of φ
[Figure: Prior Distribution and Posterior Distribution of the Survival Estimate (φ)]
Informative Priors
[Figure: two panels, each comparing a Prior with the resulting Posterior]
• What if I had prior information?
  – Previous experiment
  – Literature or expert knowledge
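To see how an informative prior shifts the answer, the sketch below uses Beta priors, for which the posterior of the caterpillar model has a closed form: a Beta(a, b) prior combined with 15 survivals and 3 deaths gives a Beta(a + 15, b + 3) posterior. The informative prior Beta(5, 5), standing in for a hypothetical previous experiment that suggested survival near 0.5, is an assumption made only for this example.

```python
def posterior_beta(prior_a, prior_b, survived=15, died=3):
    """Beta(a, b) prior + survival data -> Beta(a + survived, b + died) posterior."""
    return prior_a + survived, prior_b + died


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Uninformative prior: Beta(1, 1) is the same as Uniform(0, 1)
flat_posterior = posterior_beta(1, 1)

# Informative prior (hypothetical): earlier evidence pointing to phi near 0.5
informative_posterior = posterior_beta(5, 5)

print("uniform prior     -> posterior mean", round(beta_mean(*flat_posterior), 3))
print("informative prior -> posterior mean", round(beta_mean(*informative_posterior), 3))
```

The informative prior pulls the posterior towards 0.5; with more data, the influence of the prior shrinks.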
More than one Parameter
e.g. Logistic population growth
Problem: Best value for r depends on value of K
Marginalizing
• e.g. Probability of dying given that I am male if…?
  – P(death | infected) = 0.3
  – P(death | attacked) = 0.7
  – P(death | lightning) = 0.99
  – P(death | bored) = 0.01
  – P(infected | male) = 0.2
  – P(attacked | male) = 0.1
  – P(lightning | male) = 0.00001
  – P(bored | male) = 0.99
P(death | male) = 0.3 × 0.2 + 0.7 × 0.1 + 0.99 × 0.00001 + 0.01 × 0.99 = 0.14
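The sum above is a weighted average over the unknown cause, and can be written as a short calculation. Note that the P(cause | male) values on the slide add up to more than 1, so the numbers are best read as an illustration of the mechanics rather than a strict partition of causes.

```python
# Conditional probabilities from the slide
p_death_given_cause = {
    "infected": 0.3,
    "attacked": 0.7,
    "lightning": 0.99,
    "bored": 0.01,
}
p_cause_given_male = {
    "infected": 0.2,
    "attacked": 0.1,
    "lightning": 0.00001,
    "bored": 0.99,
}

# Marginalize over the cause: sum of P(death | cause) x P(cause | male)
p_death_given_male = sum(
    p_death_given_cause[cause] * p_cause_given_male[cause]
    for cause in p_death_given_cause
)

print(round(p_death_given_male, 2))  # 0.14
```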
Marginal Likelihood
• Integrate (sum) over all possible values of θ1
• Likelihood for a given value of θ2
• Probability of θ2 for any given value of θ1
• e.g. Posterior(r) × P(data | K=200, r), summed over all values of r, = P(data | K=200)
Gibbs Sampling
1. Propose value for parameter θ1 and keep all other parameters at latest value
2. Accept or reject proposed value θ1
3. Move on to updating value of θ2 while keeping latest values of other parameters
4. Keep on until all parameters are updated
5. Update θ1 keeping all other parameters at latest updated value
Etc…
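The one-parameter-at-a-time scheme in these steps can be sketched for logistic growth. Because each update is a propose-then-accept/reject step, this is strictly a component-wise Metropolis sampler (often called Metropolis-within-Gibbs) rather than a textbook Gibbs sampler. Everything numeric below is an assumption for illustration: the data are simulated from hypothetical true values, the observation error is normal with a known standard deviation, priors are flat on positive values, and the proposal step sizes are hand-picked.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # seed fixed only for reproducibility


def logistic(t, r, K, N0):
    """Logistic growth curve: population size at time t."""
    return K / (1 + (K / N0 - 1) * np.exp(-r * t))


# Simulated data from hypothetical "true" values (not the slides' data)
t = np.arange(0, 11)
SIGMA = 2.0  # observation standard deviation, assumed known
obs = logistic(t, r=1.0, K=200.0, N0=100.0) + rng.normal(0, SIGMA, size=t.size)


def log_posterior(theta):
    """Log of prior x likelihood, with flat priors on positive values."""
    if min(theta.values()) <= 0:
        return -np.inf
    resid = obs - logistic(t, **theta)
    return -0.5 * np.sum((resid / SIGMA) ** 2)


step = {"r": 0.1, "K": 1.0, "N0": 2.0}      # proposal standard deviations
theta = {"r": 0.5, "K": 150.0, "N0": 80.0}  # initial guesses
current = log_posterior(theta)
draws = []

for _ in range(10_000):
    for name in ("r", "K", "N0"):
        # Propose a new value for one parameter, keeping the others
        # at their latest values
        proposal = dict(theta)
        proposal[name] = theta[name] + rng.normal(0, step[name])
        proposed = log_posterior(proposal)
        # Accept or reject the proposed value (log scale)
        if np.log(rng.uniform()) < proposed - current:
            theta, current = proposal, proposed
    draws.append(dict(theta))

kept = draws[2_000:]  # discard burn-in
summary = {name: float(np.mean([d[name] for d in kept])) for name in ("r", "K", "N0")}
print(summary)
```

Each pass through the inner loop updates every parameter once, so a draw of r is always conditional on the latest K and N0, which is how the dependence between r and K is handled.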
Gibbs Sampling
r = 1.01 (0.61-1.61)
K = 202.1 (200.6-206.5)
N0 = 103.9 (98.0-123.2)
More than one Parameter
e.g. Logistic population growth
The Power of Marginalizing
• Complex random effect structures
  – Condition a random effect on another
• Missing data
  – Condition parameters on missing data values
  – Missing values updated in Gibbs Sampling: we can get estimates!
• Latent Variables
  – Models with unmeasured variables that influence the data
Latent Variables
e.g. Dispersal strategies in frogs
  – Two types of individuals: dispersive and sedentary
  – Data on number of dispersal movements
  – Cannot measure ‘type’ of individual
Latent Variables
N.moves ~ Poisson(λ)
λ = λ1 if type = sedentary
λ = λ2 if type = dispersive
P(dispersive) ~ Binom(p)
Parameter | Estimate | Credible Interval
λ1 | 2.34 | 1.84-2.90
λ2 | 11.13 | 9.90-12.36
p | 0.56 | 0.45-0.67
Latent Variables
• Gibbs Sampling marginalized over the probability of each data point being each type
• Can get estimates of the ‘type’ of each data point
• Maybe I want to test if type correlates with mating success?
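How the ‘type’ of a single data point can be estimated is easiest to see with the parameters held fixed. The sketch below plugs in the point estimates from the table (λ1 = 2.34, λ2 = 11.13, p = 0.56) and applies Bayes' theorem to each count. This is a simplification: a full analysis would average over the posterior draws of the parameters rather than use point estimates, and the example counts are arbitrary.

```python
from math import exp, factorial

# Point estimates taken from the slides' table
LAMBDA_SEDENTARY = 2.34
LAMBDA_DISPERSIVE = 11.13
P_DISPERSIVE = 0.56


def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson(lam) distribution."""
    return lam**k * exp(-lam) / factorial(k)


def prob_dispersive(n_moves):
    """P(type = dispersive | number of moves), by Bayes' theorem."""
    weight_disp = P_DISPERSIVE * poisson_pmf(n_moves, LAMBDA_DISPERSIVE)
    weight_sed = (1 - P_DISPERSIVE) * poisson_pmf(n_moves, LAMBDA_SEDENTARY)
    return weight_disp / (weight_disp + weight_sed)


# Arbitrary example counts of dispersal movements
for n in (0, 3, 6, 12):
    print(f"{n} moves -> P(dispersive) = {prob_dispersive(n):.3f}")
```

Individuals with few moves are classified as sedentary with near certainty, those with many moves as dispersive, and intermediate counts remain genuinely uncertain.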
Bayesian Model Comparison
• Deviance Information Criterion
  – DIC = -2 logLik + var(logLik)
  – The more parameters, the more variance the likelihood has
• Reversible Jump MCMC
  – Create a ‘supermodel’ where each model is weighted by the probability of that model being true
  – Gibbs Sampling on all the model estimates and model probabilities at the same time
  – Marginalizes over model probabilities
  – Straightforward to do model averaging of parameters
Reversible Jump MCMC
N.moves ~ Poisson(λ)
Model 1:
λ = λ1 if type = sedentary
λ = λ2 if type = dispersive
P(dispersive) ~ Binom(p)
Model 2:
λ = λ0 same for all
Model / Parameter | Estimate | Credible Interval
M1 (model probability) | 0.78 |
λ1 | 2.34 | 1.84-2.90
λ2 | 11.13 | 9.90-12.36
p | 0.56 | 0.45-0.67
M2 (model probability) | 0.22 |
λ0 | 6.17 | 5.67-6.63
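The model probabilities in the table can be used directly for comparison and averaging. The sketch below computes the posterior odds of Model 1 against Model 2 and a model-averaged expected number of moves, plugging in the point estimates from the table. Treating the expected number of moves as the quantity to average is a choice made for this example, and using point estimates instead of full posterior draws is a simplification.

```python
# Posterior model probabilities and point estimates from the slides' table
model_prob = {"M1": 0.78, "M2": 0.22}
lambda_1, lambda_2, p = 2.34, 11.13, 0.56   # Model 1 (two types)
lambda_0 = 6.17                             # Model 2 (same for all)

# Posterior odds in favour of Model 1
posterior_odds = model_prob["M1"] / model_prob["M2"]

# Expected number of moves for a random individual under each model
expected_moves = {
    "M1": (1 - p) * lambda_1 + p * lambda_2,  # mixture of the two types
    "M2": lambda_0,
}

# Model averaging: weight each model's prediction by its probability
averaged_moves = sum(model_prob[m] * expected_moves[m] for m in model_prob)

print(f"posterior odds M1:M2 = {posterior_odds:.2f}")
print(f"model-averaged expected moves = {averaged_moves:.2f}")
```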
Take Home
• Bayesians calculate probabilities of parameters and hypotheses
• Priors can be informative or uninformative
• Parameters are probabilistic: hierarchical models
• Pros:
  – Bayesian methods marginalize over unknowns
  – Complex hierarchical models and latent variables
• Cons:
  – Computation time is long
  – Choosing priors