COMP STAT WEEK 6 DAY 2people.stat.sfu.ca/~dac5/Stat853-2016/Stat853_2016/Course_Notes/... · A Markov Chain is a sequence of random variables {Xt,t≥0} where The probability of moving

COMP STAT WEEK 6 DAY 2: More Bayes and start of Metropolis Hastings

Dave Campbell, www.stat.sfu.ca/~dac5


Basics of Computational Bayesian Methods

MCMC, how to and what you need to know

Thomas Bayes

A Markov chain is a sequence of random variables $\{X_t, t \ge 0\}$ where:

The probability of moving from state $A_{t-1}$ to state $A_t$ is constant over time (it does not depend on $t$).

Conditional on the state at the previous time step, the chain is independent of all events before that.

$$P(X_t \in A_t \mid X_{t-1} \in A_{t-1}, X_{t-2} \in A_{t-2}, \ldots, X_0 \in A_0) = P(X_t \in A_t \mid X_{t-1} \in A_{t-1}) = P(X_s \in A_t \mid X_{s-1} \in A_{t-1}) = P_{A_{t-1}A_t}$$
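A minimal R sketch of these two properties, using a hypothetical 3-state transition matrix `P` that is not from the lecture. Each new state is drawn using only the row of `P` for the current state, and the same `P` is used at every step, so the empirical one-step transition frequencies pooled over all times should come out close to `P`.

```r
# Hypothetical 3-state transition matrix: row i holds P(X_t = j | X_{t-1} = i)
P <- matrix(c(0.5, 0.4, 0.1,
              0.2, 0.6, 0.2,
              0.1, 0.3, 0.6), nrow = 3, byrow = TRUE)
stopifnot(all(abs(rowSums(P) - 1) < 1e-12))  # every row is a probability distribution

set.seed(42)
n <- 50000
x <- integer(n)
x[1] <- 1L                                   # start in state 1
for (t in 2:n) {
  # Markov property: the next state depends only on the current state x[t - 1],
  # and the same matrix P is used at every t (time-homogeneous)
  x[t] <- sample.int(3, size = 1, prob = P[x[t - 1], ])
}

# Empirical one-step transition frequencies, pooled over all t
counts <- table(from = factor(x[-n], levels = 1:3), to = factor(x[-1], levels = 1:3))
P_hat <- unclass(counts) / rowSums(counts)
round(P_hat, 2)   # should be close to P
```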

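Let $\Omega_t$ be a stochastic process (a sequence of random variables) whose limiting distribution is that of the random variable $\Omega$ we care about.

We want to evaluate

$$\theta = E[h(\Omega)] = \sum_{j=1}^{\infty} h(\omega_j)\,P(\Omega = \omega_j).$$

We use dependent realizations $\Omega_1, \ldots, \Omega_N$ from a Markov chain to approximate $\theta$ by

$$\hat{\theta} = \frac{1}{N}\sum_{t=1}^{N} h(\Omega_t).$$

We just set up a Markov chain with the desired state space and limiting distribution and let it step ahead for a long time.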
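A small R sketch of this idea, assuming a hypothetical two-state chain with states $\omega_1 = 0$, $\omega_2 = 1$ and $h(\omega) = \omega^2 + 1$ (none of these come from the lecture). For a two-state chain the limiting distribution has $P(\Omega = \omega_2) = P_{12} / (P_{12} + P_{21})$, so the exact $\theta$ can be compared with the average of $h$ over the dependent draws.

```r
# Hypothetical two-state chain: P[i, j] = P(Omega_t = omega_j | Omega_{t-1} = omega_i)
P <- matrix(c(0.9, 0.1,
              0.3, 0.7), nrow = 2, byrow = TRUE)
omega <- c(0, 1)                 # state values omega_1, omega_2
h <- function(w) w^2 + 1         # hypothetical function of interest

# Run the chain for a long time and average h over the dependent draws
set.seed(1)
N <- 100000
s <- integer(N)
s[1] <- 1L
for (t in 2:N) {
  s[t] <- sample.int(2, size = 1, prob = P[s[t - 1], ])
}
theta_hat <- mean(h(omega[s]))

# Exact theta from the limiting distribution of a two-state chain
p2 <- P[1, 2] / (P[1, 2] + P[2, 1])
theta <- sum(h(omega) * c(1 - p2, p2))
c(theta_hat = theta_hat, theta = theta)
```

Here the limiting distribution is (0.75, 0.25), so $\theta = 1.25$; the draws are correlated, which makes $\hat\theta$ noisier than an average of the same number of independent draws, but it still converges.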

In practice we typically use the Metropolis-Hastings (MH) algorithm, which uses draws from a nice, well-behaved proposal Markov chain to produce a sample from our target distribution $P(\beta \mid Y = y)$.

We have a good way of computing $P(Y = y \mid \beta)P(\beta)$, but we don't have the scaling factor $P(Y = y)$.

So what we have is

$$P(\beta = b \mid Y = y) = C\,P(Y = y \mid \beta = b)\,P(\beta = b),$$

where the unknown constant is

$$C = \frac{1}{\sum_{j=1}^{\infty} P(Y = y \mid \beta = b_j)\,P(\beta = b_j)}.$$
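To see why the unknown $C$ is not a problem, here is a sketch on a hypothetical discrete grid of $\beta$ values with a uniform prior and made-up data ($y = 3$ successes out of $N = 10$). On such a small grid $C$ can actually be computed, and the ratio of two posterior probabilities equals the ratio of their un-normalized versions, so $C$ drops out. Ratios of this kind are all that Metropolis-Hastings needs.

```r
# Hypothetical discrete grid for beta with a uniform prior, and made-up data
b <- seq(0.05, 0.95, by = 0.05)
prior <- rep(1 / length(b), length(b))
y <- 3
N <- 10

unnorm <- dbinom(y, N, b) * prior   # P(Y = y | beta = b_j) P(beta = b_j)
C <- 1 / sum(unnorm)                # computable here only because the grid is finite and small
post <- C * unnorm                  # P(beta = b_j | Y = y)

# The ratio of two posterior probabilities does not involve C
i <- 4
j <- 7
ratio_post <- post[j] / post[i]
ratio_unnorm <- unnorm[j] / unnorm[i]
c(total = sum(post), ratio_post = ratio_post, ratio_unnorm = ratio_unnorm)
```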

Given $\beta_t = i$, we propose a value $X = j$ as a candidate for $\beta_{t+1}$ from the proposal distribution (transition distribution) $Q_{ij}$.

For example, propose $X$ from Uniform$(\beta_t - \delta, \beta_t + \delta)$.

Make a probabilistic decision about whether to set $\beta_{t+1} = X$ or keep $\beta_{t+1} = \beta_t$.

We make the decision such that $\{\beta_t, t \ge 0\}$ has the correct limiting distribution: $P(\beta \mid Y = y)$.
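A sketch of the uniform proposal in R; `propose_unif`, `q_unif`, and $\delta = 0.01$ are illustrative choices, not part of the lecture code. The point it checks is that this proposal is symmetric: the density of proposing $j$ from $i$ equals the density of proposing $i$ from $j$ (both are $1/(2\delta)$ whenever $|i - j| < \delta$), so $Q_{ij} = Q_{ji}$.

```r
# Hypothetical helpers for the Uniform(beta_t - delta, beta_t + delta) proposal
propose_unif <- function(beta_t, delta) runif(1, min = beta_t - delta, max = beta_t + delta)
# Proposal density of moving from `from` to `to`
q_unif <- function(to, from, delta) dunif(to, min = from - delta, max = from + delta)

set.seed(7)
delta <- 0.01                    # illustrative step size
i <- 0.006                       # current value beta_t
j <- propose_unif(i, delta)      # candidate X
c(q_ij = q_unif(j, i, delta), q_ji = q_unif(i, j, delta))   # both equal 1 / (2 * delta)
```

Near 0 such a proposal can land below 0; those candidates have zero posterior density and are simply rejected.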

Let's be clear about notation:

$P(\beta_{t+1} = j \mid \beta_t = i) = P_{ij}$

So $P_{ij}$ is the probability that the random walk leading to the target (posterior) distribution moves from state $i$ to state $j$.

$P(X = j \mid \beta_t = i) = Q_{ij}$

$Q_{ij}$ is the probability that the random walk from an easy-to-sample yet arbitrary distribution proposes a move from state $i$ to state $j$.

If we could sample from the chain $\beta_t$ directly, it would have the transition distribution $P_{ij}$; in practice we can only sample proposals from $Q_{ij}$.

The probability of accepting the value $X$ is $\alpha_{ij}$.

To get the right target distribution we need, when $i \ne j$,

$$P_{ij} = Q_{ij}\,\alpha_{ij},$$

and we must fulfill the detailed balance condition

$$P(\beta = i \mid Y = y)\,P_{ij} = P(\beta = j \mid Y = y)\,P_{ji}.$$
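These conditions can be checked numerically. This R sketch uses a hypothetical 4-state target known only up to a constant and a hypothetical non-symmetric proposal matrix `Q`. It sets $\alpha_{ij} = \min(\text{target}_j Q_{ji} / (\text{target}_i Q_{ij}), 1)$, the standard Metropolis-Hastings choice, builds $P_{ij} = Q_{ij}\alpha_{ij}$ off the diagonal (rejected proposals leave the chain where it is, so that probability goes on the diagonal), and then confirms detailed balance and that the normalized target is stationary.

```r
# Hypothetical un-normalized target on states 1..4 (think P(Y = y | beta = i) P(beta = i))
target <- c(1, 4, 2, 3)
K <- length(target)

# Hypothetical non-symmetric proposal: Q[i, j] = P(X = j | beta_t = i)
Q <- matrix(c(0.10, 0.50, 0.20, 0.20,
              0.30, 0.10, 0.30, 0.30,
              0.25, 0.25, 0.25, 0.25,
              0.40, 0.20, 0.30, 0.10), nrow = K, byrow = TRUE)

# alpha[i, j] = min(target_j * Q[j, i] / (target_i * Q[i, j]), 1)
alpha <- pmin(outer(target, target, function(ti, tj) tj / ti) * t(Q) / Q, 1)

# MH transition matrix: P_ij = Q_ij * alpha_ij for i != j;
# rejected proposals keep the chain in place, so that mass goes on the diagonal
Pmh <- Q * alpha
diag(Pmh) <- 0
diag(Pmh) <- 1 - rowSums(Pmh)

pi_t <- target / sum(target)           # normalized target, used only for checking
flow <- diag(pi_t) %*% Pmh             # flow[i, j] = pi_i * P_ij
max(abs(flow - t(flow)))               # detailed balance: zero up to rounding
max(abs(pi_t %*% Pmh - pi_t))          # stationarity pi P = pi: zero up to rounding
```

Detailed balance holds because $\pi_i Q_{ij}\alpha_{ij} = \min(\pi_j Q_{ji}, \pi_i Q_{ij})$, which is symmetric in $i$ and $j$; summing it over $i$ gives stationarity.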

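1. Start with $\beta_{t-1} = i$.
2. Propose a value $X = j$ from the transition probability matrix $Q_{ij}$, i.e. $P(X = j \mid \beta_{t-1} = i) = Q_{ij}$, as a candidate for $\beta_t$.
3. Compute

$$\alpha_{ij} = \min\left(\frac{P(Y = y \mid \beta = j)\,P(\beta = j)\,P(X = i \mid \beta = j)}{P(Y = y \mid \beta = i)\,P(\beta = i)\,P(X = j \mid \beta = i)},\; 1\right).$$

Because only a ratio of posterior probabilities is needed, the unknown constant $C$ cancels.
4. Sample $u \sim \text{Unif}(0, 1)$.
5. If $u < \alpha_{ij}$, accept the proposal and set $\beta_t = X$; if not, set $\beta_t = \beta_{t-1}$.
6. Repeat ($N$ times) until you obtain a sufficient sample from the distribution of $\beta \mid Y = y$.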
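A compact R version of the steps above for a discrete state space, written as a hypothetical helper `mh_discrete` (not part of the lecture code). It works on the log scale and includes the Hastings correction $Q_{ji}/Q_{ij}$, so the proposal need not be symmetric. The usage example uses made-up target weights and an independence proposal (every row of `Q` is the same), then compares the visit frequencies with the normalized target.

```r
# Hypothetical helper: Metropolis-Hastings on states 1..K.
#   log_target(k): log of the un-normalized target, e.g. log P(Y = y | beta = k) + log P(beta = k)
#   Q: proposal matrix with Q[i, j] = P(X = j | beta_{t-1} = i)
mh_discrete <- function(log_target, Q, init, niter) {
  K <- nrow(Q)
  chain <- integer(niter)
  chain[1] <- init                                   # start with beta = i
  for (t in 2:niter) {
    i <- chain[t - 1]
    j <- sample.int(K, size = 1, prob = Q[i, ])      # propose X = j from row i of Q
    # log of the ratio inside alpha_ij, including the Hastings correction Q_ji / Q_ij
    log_ratio <- log_target(j) - log_target(i) + log(Q[j, i]) - log(Q[i, j])
    u <- runif(1)                                    # u ~ Unif(0, 1)
    # accept (move to j) if u < alpha_ij = min(ratio, 1); otherwise stay at i
    chain[t] <- if (log(u) < min(log_ratio, 0)) j else i
  }
  chain
}

# Usage with made-up target weights and an independence proposal (all rows of Q equal)
K <- 10
w <- (1:K) * (K + 1 - (1:K))                            # un-normalized target
Q <- matrix(rep((1:K) / sum(1:K), each = K), nrow = K)  # Q[i, j] = j / 55 for every i
set.seed(853)
chain <- mh_discrete(function(k) log(w[k]), Q, init = 1L, niter = 50000)
freq <- tabulate(chain, nbins = K) / length(chain)
round(rbind(mh = freq, target = w / sum(w)), 3)
```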

Jeffrey S. Rosenthal, University of Toronto: "Check out my awesome applets!"

Author of: Struck by Lightning: The Curious World of Probabilities (book for the general public). HarperCollins Canada, 272 pages, 2005. http://www.probability.ca/sbl/

And heaps of MCMC theory papers.

Applets: http://www.probability.ca/jeff/java/

A Markov chain applet, "rwm" (http://www.probability.ca/jeff/java/rwm.html), illustrates a random-walk Metropolis-Hastings algorithm.

Simple Example

We will use the example from the cervical cancer vaccination data.

The parameter β is the probability of getting cervical cancer when someone is not vaccinated.

Before seeing any data: I don't think I know anyone with cervical cancer, but I admit I know very little about its prevalence.

But 0 < β < 1, and I will assume β has a density that is higher at low values and decreases linearly to 0 at β = 1, i.e. P(β) = 2 − 2β.

Data: The study showed that Y = 36 of N = 5766 women got cancer.

We will use a binomial statistical model for Y:

the likelihood P(Y | β) = Binomial(N, β) is our statistical model.

We are interested in updating our belief about the value of the real parameter using the data, which suggests Bayesian methods are appropriate.

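Given the data, our belief about β is

$$\begin{aligned}
P(\beta \mid Y = y) &\propto P(Y = y \mid \beta)\,P(\beta) \\
&= P(Y = 36 \mid \beta)\,P(\beta) \\
&= \binom{5766}{36}\,\beta^{36}(1-\beta)^{5766-36}(2 - 2\beta) \\
&\propto \beta^{36}(1-\beta)^{5730}(2 - 2\beta)
\end{aligned}$$

Let's get a point and interval estimate for β from P(β | Y = y) using MCMC.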
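A side note that is useful for checking the MCMC output: the triangle prior $2 - 2\beta$ is exactly the Beta(1, 2) density, so for this particular prior and likelihood the posterior has a closed form.

```latex
\begin{aligned}
P(\beta) &= 2 - 2\beta = \frac{\Gamma(3)}{\Gamma(1)\,\Gamma(2)}\,\beta^{1-1}(1-\beta)^{2-1}
  \quad\Rightarrow\quad \beta \sim \text{Beta}(1, 2) \\
P(\beta \mid Y = 36) &\propto \beta^{36}(1-\beta)^{5730}\cdot 2(1-\beta)
  \propto \beta^{37-1}(1-\beta)^{5732-1}
  \quad\Rightarrow\quad \beta \mid Y = 36 \sim \text{Beta}(37, 5732) \\
E(\beta \mid Y = 36) &= \frac{37}{37 + 5732} = \frac{37}{5769} \approx 0.00641
\end{aligned}
```

MCMC is what we need when no such closed form exists; here the closed form just gives a benchmark.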

Week6_Day2_Basic_MCMC.R

########################
# This file runs basic Metropolis Hastings for the Merck Vaccination data
# Parameter Beta is the probability of getting cervical cancer when someone is not vaccinated
########################

# The prior is the simple triangle function.
# This function is numerically quite stable within the (0,1) interval for Beta
logprior = function(beta){
  if(beta > 0 && beta < 1){
    return(log(2 - 2*beta))
  }else{
    return(-Inf)
  }
}

# Set up the MCMC with niter iterations
niter = 100000
stepvar = .002          # standard deviation (not variance) of the normal proposal
beta = rep(0, niter)

# The data
y = 36
N = 5766

# keep track of the acceptance rate
accepts = 0

# Initialize and run the MCMC
beta[1] = y/N

for(iter in 2:niter){
  # propose a value from an easy distribution
  Betaprop = rnorm(n = 1, mean = beta[iter-1], sd = stepvar)

  # the log of the ratio of un-normalized posteriors. Note that my proposal
  # distribution is symmetric so Q_{ij}=Q_{ji}.
  # dbinom() returns NaN (with a warning) if Betaprop is outside [0,1]; such proposals are rejected below
  alpha = dbinom(y, N, Betaprop, log=TRUE) + logprior(Betaprop) -
    dbinom(y, N, beta[iter-1], log=TRUE) - logprior(beta[iter-1])

  # make a decision
  if(!is.na(alpha) && runif(1) < exp(alpha)){
    accepts = accepts + 1
    beta[iter] = Betaprop
  }else{
    beta[iter] = beta[iter-1]
  }
}

# acceptance rate
accepts/(niter - 1)
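The acceptance rate (`accepts` divided by the number of proposals) depends strongly on `stepvar`, which the script uses as the standard deviation of the normal proposal. Below is a sketch for comparing a few step sizes; the helpers `log_post` and `run_rwm` are hypothetical repackagings of the lecture's model, and the run length of 20000 and the three step sizes are arbitrary choices. Small steps are accepted often but move slowly; large steps are mostly rejected.

```r
# Log of the un-normalized posterior: binomial likelihood times triangle prior.
# Returns -Inf outside (0, 1), so such proposals are always rejected (and dbinom never sees them).
log_post <- function(b, y = 36, N = 5766) {
  if (b <= 0 || b >= 1) return(-Inf)
  dbinom(y, N, b, log = TRUE) + log(2 - 2 * b)
}

# Random-walk Metropolis with normal proposal sd `stepsd`; returns the draws and acceptance rate
run_rwm <- function(stepsd, niter = 20000, init = 36 / 5766) {
  draws <- numeric(niter)
  draws[1] <- init
  lp_cur <- log_post(init)          # cache the current log posterior
  accepts <- 0
  for (iter in 2:niter) {
    prop <- rnorm(1, mean = draws[iter - 1], sd = stepsd)
    lp_prop <- log_post(prop)
    # symmetric proposal, so the Hastings correction Q_ji / Q_ij equals 1
    if (log(runif(1)) < lp_prop - lp_cur) {
      draws[iter] <- prop
      lp_cur <- lp_prop
      accepts <- accepts + 1
    } else {
      draws[iter] <- draws[iter - 1]
    }
  }
  list(draws = draws, rate = accepts / (niter - 1))
}

set.seed(853)
rates <- sapply(c(0.0005, 0.002, 0.01), function(s) run_rwm(s)$rate)
rates
```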

The Markov chain for β

hist(beta, 100)

The distribution of β, the probability of getting cancer without getting vaccinated.

Use the sampled values of β to compute

$$E[h(\beta)] = \sum_{j=1}^{\infty} h(b_j)\,P(\beta = b_j) \approx \frac{1}{N}\sum_{j=1}^{N} h(b_j),$$

where in the approximation $b_1, \ldots, b_N$ are the sampled values of β.

We often use the sampled values to get approximations for the mean, median, modes, variance, interval estimates, quantiles, and so on.

Bayesian statistics uses MCMC to give an approximation to the full posterior distribution.

> summary(beta)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.002896 0.005749 0.006404 0.006450 0.007034 0.011720
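A sketch of turning draws into point and interval estimates in R. To keep it self-contained it does not rerun the sampler: as a stand-in for the MCMC output it uses independent draws from the exact Beta(37, 5732) posterior (this follows because the triangle prior $2 - 2\beta$ is the Beta(1, 2) density), which also allows a comparison with exact values from `qbeta`. With the real chain you would use `beta` from the script instead, typically after discarding an initial burn-in stretch.

```r
# Stand-in for MCMC output: independent draws from the exact Beta(37, 5732) posterior.
# With the real chain, use the vector `beta` from the script (after any burn-in) instead.
set.seed(2016)
draws <- rbeta(100000, shape1 = 37, shape2 = 5732)

# Monte Carlo approximations E[h(beta)] ~ (1/N) sum_j h(b_j) for a few choices of h
post_mean   <- mean(draws)                          # h(b) = b
post_median <- median(draws)
ci95        <- quantile(draws, c(0.025, 0.975))     # equal-tailed 95% interval
p_above     <- mean(draws > 0.01)                   # h(b) = 1{b > 0.01}

# Exact values from the Beta(37, 5732) posterior, for comparison
exact_mean   <- 37 / (37 + 5732)
exact_median <- qbeta(0.5, 37, 5732)
exact_ci95   <- qbeta(c(0.025, 0.975), 37, 5732)

rbind(sampled = c(mean = post_mean, median = post_median, ci95),
      exact   = c(exact_mean, exact_median, exact_ci95))
p_above
```

Comparing numbers like these with the `summary(beta)` output is one way to sanity-check a sampler when an exact answer happens to be available.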

Taking it a step further:

The cancer is rare.

Statisticians are skeptical of everything

What should we use as a prior for the probability of getting cancer given that we have been vaccinated?

Let’s see if the vaccine actually works

Switch to RStudio


Interpret my prior for the second example.

Interpret my prior for the first example.

What is the frequentist analog to the second analysis?

The second analysis used the posterior from the "no vaccine" group as its prior. This amounts to starting from the assumption that the vaccine doesn't work, i.e. that the probability of getting cancer is the same with or without vaccination.
