COMP STAT WEEK 6 DAY 2: More Bayes and start of Metropolis Hastings
Dave Campbell, www.stat.sfu.ca/~dac5
Basics of Computational Bayesian Methods
MCMC, how to and what you need to know
Thomas Bayes
A Markov Chain is a sequence of random variables $\{X_t, t \ge 0\}$ where
The probability of moving from state $A_{t-1}$ to state $A_t$ is constant over time
Conditional on one previous time step, the chain is independent of all events before that.
$$P(X_t \in A_t \mid X_{t-1} \in A_{t-1}, X_{t-2} \in A_{t-2}, \ldots, X_0 \in A_0) = P(X_t \in A_t \mid X_{t-1} \in A_{t-1}) = P(X_s \in A_t \mid X_{s-1} \in A_{t-1}) = P_{A_{t-1} A_t}$$
Let $\Omega_t$ be a random variable (stochastic process)
We want to evaluate
$$\theta = E(h(\Omega)) = \sum_{j=1}^{\infty} h(\omega_j) P(\Omega = \omega_j)$$
We use dependent realizations from a Markov chain to approximate $\theta$ by $\hat{\theta}$
We just set up a Markov chain with the desired state space and let it step ahead for a long time
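The idea of averaging dependent draws can be tried on a toy chain. The sketch below is only an illustration: the 3-state transition matrix `P`, the function `h`, and the chain length are made-up assumptions, not anything from the course material. It compares the chain average with the exact expectation under the chain's stationary distribution.

```r
set.seed(1)

# Hypothetical 3-state transition matrix (rows sum to 1); chosen only for illustration.
P <- matrix(c(0.5, 0.4, 0.1,
              0.2, 0.5, 0.3,
              0.1, 0.3, 0.6), nrow = 3, byrow = TRUE)
states <- c(1, 2, 3)
h <- function(w) w^2   # hypothetical function of interest

# Simulate the chain: each step depends only on the current state.
n_steps <- 50000
omega <- numeric(n_steps)
omega[1] <- 1
for (t in 2:n_steps) {
  omega[t] <- sample(states, size = 1, prob = P[omega[t - 1], ])
}

# Estimate of theta from the dependent realizations.
theta_hat <- mean(h(omega))

# Exact value: the stationary distribution is the left eigenvector of P
# for eigenvalue 1 (the eigenvalue of largest modulus, listed first by eigen()).
ev <- eigen(t(P))
stat <- Re(ev$vectors[, 1])
stat <- stat / sum(stat)
theta <- sum(h(states) * stat)

c(theta_hat = theta_hat, theta = theta)
```

The two numbers should be close even though successive draws are correlated, which is the property MCMC relies on.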
In practice we typically use the Metropolis Hastings (MH) algorithm, which uses a sample from one nice and well behaved Markov chain to give us a sample from our target distribution P(β|Y=y)
We have a good way of getting
$$P(Y = y \mid \beta) P(\beta)$$
but we don’t have the scaling factor P(Y=y)
So what we have is
$$P(\beta = b \mid Y = y) = C \, P(Y = y \mid \beta = b) P(\beta = b)$$
where the unknown constant is
$$C = \frac{1}{\sum_{j=1}^{\infty} P(Y = y \mid \beta = b_j) P(\beta = b_j)}$$
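One way to see why the missing constant need not stop us: for any two candidate values $b$ and $b'$ (names used here only for illustration), the ratio of posterior probabilities does not involve $C$ at all.

```latex
\frac{P(\beta = b' \mid Y = y)}{P(\beta = b \mid Y = y)}
  = \frac{C \, P(Y = y \mid \beta = b') P(\beta = b')}{C \, P(Y = y \mid \beta = b) P(\beta = b)}
  = \frac{P(Y = y \mid \beta = b') P(\beta = b')}{P(Y = y \mid \beta = b) P(\beta = b)}
```

So any method that only compares the posterior at two values can work with the unnormalized product of likelihood and prior.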
Given $\beta_t = i$ we propose a value X=j as a candidate for $\beta_{t+1}$ from the proposal distribution (transition distribution) $Q_{ij}$
For example propose X from Uniform($\beta_t - \delta$, $\beta_t + \delta$)
Make a probabilistic decision between setting $\beta_{t+1} = X$ and keeping $\beta_{t+1} = \beta_t$
We make the decision such that $\{\beta_t \mid t \ge 0\}$ has the correct limiting distribution: $P(\beta \mid Y = y)$
Let’s be clear about notation:
$P(\beta_{t+1} = j \mid \beta_t = i) = P_{ij}$
So $P_{ij}$ is the probability that the random walk leading to the target (posterior) distribution moves from state i to state j
$P(X = j \mid \beta_t = i) = Q_{ij}$
$Q_{ij}$ is the probability that the random walk from an easy to sample yet arbitrary distribution proposes a move from state i to state j
If we could sample $\beta_t$ directly, it would have the transition distribution $P_{ij}$
The probability of accepting the value X is $\alpha_{ij}$
To get the right target distribution we need, when i≠j,
$$P_{ij} = Q_{ij} \alpha_{ij}$$
And we must fulfill the detailed balance
$$P(\beta = i) P_{ij} = P_{ji} P(\beta = j)$$
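A short derivation, added here as a reasoning step, shows why detailed balance is enough: summing the detailed balance equation over all starting states $i$ shows that the target distribution is left unchanged by one step of the chain.

```latex
\sum_{i} P(\beta = i) P_{ij}
  = \sum_{i} P(\beta = j) P_{ji}
  = P(\beta = j) \sum_{i} P_{ji}
  = P(\beta = j)
```

The last equality uses the fact that the transition probabilities out of state $j$ sum to 1. In words: if the chain is currently distributed according to the target, it still is after one more step.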
1. Start with $\beta_{t-1} = i$
2. Propose a value X=j given $\beta_{t-1} = i$ from the transition probability matrix $Q_{ij}$ as a candidate for $\beta_t$
3. Compute
$$\alpha_{ij} = \min\left( \frac{P(Y = y \mid \beta = j) P(\beta = j) P(X = i \mid \beta = j)}{P(Y = y \mid \beta = i) P(\beta = i) P(X = j \mid \beta = i)}, \; 1 \right)$$
4. Sample u ~ Unif(0,1)
5. If $u < \alpha_{ij}$ then accept the proposal and set $\beta_t = X$, and if not then set $\beta_t = \beta_{t-1}$
6. Repeat (N times) until you obtain a sufficient sample from the distribution of β|Y=y
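The steps above can be sketched on a small discrete state space, which matches the $Q_{ij}$ matrix notation. Everything specific here is a made-up assumption for illustration: the unnormalized target `target_unnorm`, the non-symmetric proposal matrix `Q`, and the chain length. The acceptance probability is computed as the target ratio times `Q[j, i] / Q[i, j]`, the correction needed when the proposal is not symmetric.

```r
set.seed(2)

# Hypothetical unnormalized target on 4 states; the sampler never uses its sum.
target_unnorm <- c(1, 4, 2, 3)
K <- length(target_unnorm)

# Hypothetical non-symmetric proposal matrix; row i holds Q[i, j], rows sum to 1.
Q <- matrix(c(0.10, 0.50, 0.20, 0.20,
              0.30, 0.10, 0.40, 0.20,
              0.25, 0.25, 0.25, 0.25,
              0.40, 0.20, 0.30, 0.10), nrow = K, byrow = TRUE)

n_iter <- 50000
chain <- numeric(n_iter)
chain[1] <- 1                                  # starting state
for (t in 2:n_iter) {
  i <- chain[t - 1]
  j <- sample(1:K, size = 1, prob = Q[i, ])    # propose a candidate from row i of Q
  # acceptance probability: target ratio times the proposal correction Q[j, i] / Q[i, j]
  alpha <- min(1, (target_unnorm[j] * Q[j, i]) / (target_unnorm[i] * Q[i, j]))
  u <- runif(1)
  chain[t] <- if (u < alpha) j else i          # accept the proposal or stay put
}

# Compare visit frequencies with the normalized target.
freq <- as.numeric(table(factor(chain, levels = 1:K))) / n_iter
target <- target_unnorm / sum(target_unnorm)
round(rbind(freq, target), 3)
```

The visit frequencies should approach the normalized target even though the sampler only ever sees ratios of unnormalized values.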
http://www.probability.ca/jeff/java/ A Markov chain applet, "rwm", illustrates a random walk metropolis hastings algorithm
Check out my awesome applets!
Jeffrey S. Rosenthal University of Toronto
author of: Struck by Lightning: The
Curious World of Probabilities (book for
the general public). HarperCollins Canada,
272 pages, 2005.
And heaps of MCMC theory papers
Simple Example
We will use the example from the cervical cancer vaccination data.
The parameter 'β' is the probability of getting cervical cancer when someone is not vaccinated.
Before seeing any data: I don't think I know anyone with cervical cancer, but I admit I know very little about its prevalence
But 0<β<1, and I will assume β has a prior density that is highest at low values and decreases linearly to 0 density at β=1.
Data: The study showed that Y=36 women out of N=5766 got cancer.
We will use a Binomial Statistical model for Y
The likelihood P(Y|β) = Binomial(N,β) is our statistical model.
We are interested in updating our belief about the value of the real parameter with the data, which suggests that Bayesian methods are appropriate.
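Given the data our belief about β is
$$\begin{aligned}
P(\beta \mid Y = y) &\propto P(Y = y \mid \beta) P(\beta) \\
&= P(Y = 36 \mid \beta) P(\beta) \\
&= \binom{5766}{36} \beta^{36} (1 - \beta)^{5766 - 36} (2 - 2\beta) \\
&\propto \beta^{36} (1 - \beta)^{5730} (2 - 2\beta)
\end{aligned}$$
Let’s get a point and interval estimate for P(β|Y=y) using MCMC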
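As a cross-check on any MCMC estimate: since $(2 - 2\beta) = 2(1 - \beta)$, the unnormalized posterior equals $2\beta^{36}(1 - \beta)^{5731}$, which is proportional to a Beta(37, 5732) density. This observation is not made in the slides; it only holds because this particular prior is so simple. The sketch below checks the proportionality on the log scale at a few arbitrary grid values and computes the exact posterior mean and a 95% interval.

```r
# Log of the unnormalized posterior from the slide.
log_post <- function(b) 36 * log(b) + 5730 * log(1 - b) + log(2 - 2 * b)

# If the posterior is Beta(37, 5732), this difference is the same constant at every b.
b_grid <- c(0.003, 0.006, 0.009, 0.012)   # arbitrary check points
log_ratio <- log_post(b_grid) - dbeta(b_grid, 37, 5732, log = TRUE)
diff(range(log_ratio))                    # should be zero up to rounding error

# Exact posterior summaries to compare MCMC output against.
post_mean <- 37 / (37 + 5732)
post_int <- qbeta(c(0.025, 0.975), 37, 5732)
post_mean
post_int
```

MCMC is not needed for this example, which makes it a convenient test case: the sampled values can be compared with known answers.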
Week6_Day2_Basic_MCMC.R

#######################
# This file runs basic Metropolis Hastings for the Merck Vaccination data
# Parameter Beta is the probability of getting cervical cancer when someone is not vaccinated
#######################

# The prior is the simple triangle function.
# This function is numerically quite stable within the (0,1) interval for Beta
logprior = function(beta){
  if(beta>0 && beta<1){
    return(log(2-2*beta))
  }else{
    return(-Inf)
  }
}

# Set up the MCMC with niter iterations
# Note: stepvar is used below as the standard deviation of the proposal
niter = 100000
stepvar = .002
beta = rep(0,niter)

# The data
y = 36
N = 5766

# keep track of the acceptance rate
accepts = 0

# Initialize and run the MCMC
beta[1] = y/N

for(iter in 2:niter){
  # propose a value from an easy distribution
  Betaprop = rnorm(n = 1, mean = beta[iter-1], sd = stepvar);

  # the log of the ratio of un-normalized posteriors. Note that my proposal
  # distribution is symmetric so Q_{ij}=Q_{ji}
  alpha = dbinom(y,N,Betaprop,log=TRUE) + logprior(Betaprop) -
          dbinom(y,N,beta[iter-1],log=TRUE) - logprior(beta[iter-1]);

  # make a decision
  # (alpha is NaN when Betaprop falls outside [0,1]; such proposals are rejected)
  if(!is.na(alpha) && runif(1) < exp(alpha)){
    accepts = accepts+1;
    beta[iter] = Betaprop;
  }else{
    beta[iter] = beta[iter-1];
  }
}
The Markov Chain for β
hist(beta,100)
The distribution of β, the probability of getting cancer without getting vaccinated.
Use the sampled values of β to compute
$$E[h(\beta)] = \sum_{j=1}^{\infty} h(b_j) P(\beta = b_j) \approx \sum_{j=1}^{N} \frac{h(b_j)}{N}$$
(in the sum on the right, $b_1, \ldots, b_N$ are the N sampled values)
We often use the sampled values to get an approximation for the mean, median, modes, variance, interval estimates, quantiles...
Bayesian statistics uses MCMC to give an approximation to the full posterior distribution.
> summary(beta)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.002896 0.005749 0.006404 0.006450 0.007034 0.011720
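The summary gives point estimates; an interval estimate and the acceptance rate can be read off the chain in the same way. The sketch below is self-contained, so it reruns a shorter chain using the same data, prior, and proposal standard deviation. The shorter length (20000), the seed, and the helper name `log_post` are assumptions made for illustration, so the numbers will differ slightly from the summary above.

```r
set.seed(3)
y <- 36
N <- 5766

# Log of the un-normalized posterior: binomial likelihood times the triangle prior.
log_post <- function(b) {
  if (b <= 0 || b >= 1) return(-Inf)
  dbinom(y, N, b, log = TRUE) + log(2 - 2 * b)
}

niter <- 20000            # shorter than the original run, for speed
beta <- numeric(niter)
beta[1] <- y / N
accepts <- 0
for (iter in 2:niter) {
  prop <- rnorm(1, mean = beta[iter - 1], sd = 0.002)
  log_alpha <- log_post(prop) - log_post(beta[iter - 1])
  if (log(runif(1)) < log_alpha) {
    beta[iter] <- prop
    accepts <- accepts + 1
  } else {
    beta[iter] <- beta[iter - 1]
  }
}

accept_rate <- accepts / (niter - 1)                              # fraction of proposals accepted
post_mean <- mean(beta)                                           # point estimate
post_median <- median(beta)                                       # alternative point estimate
cred_int <- unname(quantile(beta, probs = c(0.025, 0.975)))       # 95% interval estimate
```

Taking the 2.5% and 97.5% sample quantiles is one common choice of interval; other choices (for example a highest density interval) are possible and would need different code.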
Taking it a step further:
The cancer is rare.
Statisticians are skeptical of everything
What should we use as a prior for the probability of getting cancer given that we have been vaccinated?
Let’s see if the vaccine actually works
Switch to RStudio
Interpret my prior for the second example.
Interpret my prior for the first example
What is the frequentist analog to the second analysis?
The second analysis used the posterior from the “no vaccine” group as a prior. This says that we start by assuming that the vaccine doesn’t work and that the probability of getting cancer is the same with or without vaccination.