Approximate Bayesian Computation methods and their applications for hierarchical statistical models

University College London, 2015





Contents

1. Introduction
2. ABC methods
3. Hierarchical models
4. Application for ovarian cancer detection
5. Conclusion


Introduction

• The likelihood function plays an important role in statistical inference problems.

• For complex models, evaluating the likelihood from its analytical formula is computationally very expensive, or no closed form exists at all.

• Methods that provide statistical inference while bypassing evaluation of the likelihood function have therefore gained high popularity.


ABC methods

• ABC methods provide ways of approximating posterior distributions when the likelihood function is analytically or computationally intractable.

• These methods replace the calculation of the likelihood with a comparison between the observed and simulated data.

Let $\theta$ be a parameter vector to be estimated. Given the prior distribution $\pi(\theta)$, the goal is to approximate the posterior distribution $\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)$, where $f(x \mid \theta)$ is the likelihood.


Generic form of ABC methods

1. Sample a candidate parameter vector $\theta^*$ from some proposal distribution $\pi(\theta)$.

2. Simulate a dataset $x^*$ from the model described by a conditional probability distribution $f(x \mid \theta^*)$.

3. Compare the simulated dataset, $x^*$, with the experimental data, $x_0$, using a distance function $d$ and tolerance $\varepsilon$; if $d(x_0, x^*) \le \varepsilon$, accept $\theta^*$.

The tolerance $\varepsilon \ge 0$ is the desired level of agreement between $x_0$ and $x^*$.


Most popular ABC algorithms

• ABC rejection algorithm

• ABC MCMC algorithm (Markov chain Monte Carlo)

• ABC SMC algorithm (Sequential Monte Carlo)


ABC rejection method

1. Sample $\theta^*$ from $\pi(\theta)$.
2. Simulate a dataset $x^*$ from $f(x \mid \theta^*)$.
3. If $d(x_0, x^*) \le \varepsilon$, accept $\theta^*$; otherwise reject it.
4. Return to step 1.

Disadvantage: if the prior distribution is very different from the posterior, the acceptance rate will be low.
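Here is a minimal R sketch of the rejection steps above. It assumes a toy setting that is not part of the slides:

- observed data $x_0$: 20 draws from $N(\theta, 1)$;
- prior: uniform $U(-10, 10)$;
- distance $d$: the absolute difference of sample means;
- tolerance: $\varepsilon = 0.1$.

The function `abc_rejection` and its arguments are hypothetical names chosen for illustration.

```r
# Minimal ABC rejection sampler (illustrative sketch; toy model assumed).
# Assumed setup: x0 ~ N(theta, 1), prior theta ~ U(-10, 10),
# distance d = |mean(x0) - mean(x*)|.
abc_rejection <- function(x0, n_accept, eps,
                          prior_sample = function() runif(1, -10, 10),
                          simulate = function(theta) rnorm(length(x0), theta, 1),
                          distance = function(a, b) abs(mean(a) - mean(b))) {
  accepted <- numeric(0)
  n_tried <- 0
  while (length(accepted) < n_accept) {
    theta_star <- prior_sample()             # 1. sample theta* from the prior
    x_star <- simulate(theta_star)           # 2. simulate x* from f(x | theta*)
    n_tried <- n_tried + 1
    if (distance(x0, x_star) <= eps) {       # 3. accept if d(x0, x*) <= eps
      accepted <- c(accepted, theta_star)
    }                                        # 4. otherwise return to step 1
  }
  list(theta = accepted, acceptance_rate = n_accept / n_tried)
}

set.seed(1)
x0 <- rnorm(20, mean = 3, sd = 1)            # hypothetical "observed" data
fit <- abc_rejection(x0, n_accept = 500, eps = 0.1)
mean(fit$theta)                              # close to mean(x0)
fit$acceptance_rate                          # small: the prior is much wider than the posterior
```

The prior here is far wider than the posterior, so only a small fraction of proposals is accepted. This is the disadvantage noted above.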


Markov chain Monte Carlo (Wikipedia)

In mathematics, more specifically in statistics, Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability distribution based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a number of steps is then used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps.


ABC MCMC

1. Metropolis-Hastings Algorithm
2. Random-walk Metropolis-Hastings Algorithm
3. Gibbs Sampling Algorithm
4. Metropolis-within-Gibbs Algorithm
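MCMC samplers can be adapted to ABC by replacing the likelihood evaluation with a simulation check. As an illustration, here is a minimal sketch of the ABC MCMC scheme described in Toni et al. (2009). It is a random-walk Metropolis-Hastings chain in which a proposed $\theta^*$ can only be accepted if data simulated from it fall within $\varepsilon$ of the observed data. The following are all assumptions made for this sketch:

- the toy model $x \sim N(\theta, 1)$;
- the uniform prior $U(-10, 10)$;
- the mean-based distance;
- the step size `b`;
- the name `abc_mcmc`.

```r
# ABC MCMC sketch (illustrative; assumed toy model x ~ N(theta, 1),
# prior theta ~ U(-10, 10), distance |mean(x0) - mean(x*)|).
abc_mcmc <- function(x0, n_iter, eps, b = 0.5, theta_init = mean(x0)) {
  prior_density <- function(theta) dunif(theta, -10, 10)
  chain <- numeric(n_iter)
  theta <- theta_init
  for (i in seq_len(n_iter)) {
    theta_star <- rnorm(1, theta, b)                 # symmetric random-walk proposal
    if (prior_density(theta_star) > 0) {
      x_star <- rnorm(length(x0), theta_star, 1)     # simulate data under theta*
      if (abs(mean(x0) - mean(x_star)) <= eps) {     # ABC check replaces the likelihood
        # symmetric proposal: the acceptance ratio reduces to the prior ratio
        alpha <- min(1, prior_density(theta_star) / prior_density(theta))
        if (runif(1) < alpha) theta <- theta_star
      }
    }
    chain[i] <- theta                                # otherwise the chain stays put
  }
  chain
}

set.seed(2)
x0 <- rnorm(20, mean = 3, sd = 1)                    # hypothetical observed data
chain <- abc_mcmc(x0, n_iter = 20000, eps = 0.1)
mean(chain[-(1:2000)])                               # after burn-in, near mean(x0)
```

The proposal is symmetric and the prior is uniform, so the acceptance ratio equals 1 inside the prior support. The ABC check alone then decides whether the chain moves.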


Metropolis-Hastings Algorithm

Let $q(y \mid x)$ be an arbitrary, friendly distribution (one we know how to sample from), called the proposal, and let $f$ be the target density. Choose $X_0$ arbitrarily. Suppose we have generated $X_0, X_1, \dots, X_i$. To generate $X_{i+1}$, do the following:

(1) Generate a proposal or candidate value $Y \sim q(y \mid X_i)$.

(2) Evaluate $r \equiv r(X_i, Y)$, where
$$r(x, y) = \min\left\{ \frac{f(y)\, q(x \mid y)}{f(x)\, q(y \mid x)},\ 1 \right\}.$$

(3) Set
$$X_{i+1} = \begin{cases} Y & \text{with probability } r, \\ X_i & \text{with probability } 1 - r. \end{cases}$$


Remarks on the Metropolis-Hastings Algorithm

• A simple way to execute step (3) is to generate $U \sim \mathrm{Uniform}(0, 1)$. If $U < r$, set $X_{i+1} = Y$; otherwise set $X_{i+1} = X_i$.

• A common choice for $q(y \mid x)$ is $N(x, b^2)$ for some $b > 0$. In this case the proposal density $q$ is symmetric, $q(y \mid x) = q(x \mid y)$, and
$$r(x, y) = \min\left\{ \frac{f(y)}{f(x)},\ 1 \right\}.$$
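The three steps and the $U < r$ trick fit in a short generic function. This sketch assumes the symmetric $N(x, b^2)$ proposal, so only the target density is needed. The density is supplied on the log scale to avoid numerical underflow. `rw_metropolis` and its arguments are hypothetical names. Comparing $\log U$ with $\log r$ is equivalent to comparing $U$ with $r$.

```r
# Random-walk Metropolis-Hastings with a symmetric N(x, b^2) proposal.
# log_f: log of the (possibly unnormalized) target density f.
rw_metropolis <- function(log_f, n, x0 = 0, b = 1) {
  x <- numeric(n)
  current <- x0
  accepted <- 0
  for (i in seq_len(n)) {
    y <- rnorm(1, mean = current, sd = b)          # (1) candidate Y ~ q(y | X_i)
    log_r <- min(log_f(y) - log_f(current), 0)     # (2) log r = min(log f(y) - log f(x), 0)
    if (log(runif(1)) < log_r) {                   # (3) accept with probability r
      current <- y
      accepted <- accepted + 1
    }
    x[i] <- current
  }
  list(chain = x, acceptance_rate = accepted / n)
}

# Usage: standard normal target, known only up to a constant.
set.seed(3)
out <- rw_metropolis(function(x) -x^2 / 2, n = 20000, b = 2)
c(mean(out$chain), var(out$chain))   # roughly 0 and 1
```

The target only needs to be known up to a normalizing constant, because the constant cancels in $f(y)/f(x)$.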


Metropolis-Hastings Algorithm. Example 1

Let's simulate a Markov chain whose stationary distribution is the Cauchy distribution
$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}.$$

Let's take $N(x, b^2)$ as the proposal distribution. Then
$$r(x, y) = \min\left\{ \frac{f(y)}{f(x)},\ 1 \right\} = \min\left\{ \frac{1 + x^2}{1 + y^2},\ 1 \right\}.$$

Let's choose $b = 1$ and a chain of length $N = 10{,}000$.


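Example 1. Code in R

set.seed(1)
N <- 10000                                     # length of the chain
b <- 1                                         # proposal standard deviation
x_values <- numeric(N)
x_axis <- seq(-7, 7, by = 0.1)
x_old <- 0
for (i in 1:N) {
  y <- rnorm(1, x_old, b)                      # candidate from N(x_old, b^2)
  r <- min((1 + x_old^2) / (1 + y^2), 1)       # acceptance probability
  x_new <- if (runif(1) < r) y else x_old      # accept the candidate or stay
  x_values[i] <- x_new
  x_old <- x_new
}

x_cauchy <- dcauchy(x_axis)                    # exact Cauchy density
plot(x_axis, x_cauchy, type = "p", col = "black")
lines(density(x_values), col = "red", lwd = 3) # kernel density estimate of the chain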


Gibbs Sampling

Gibbs sampling is often the easiest MCMC algorithm to use for high-dimensional problems. Provided the conditional distributions can be sampled from, it turns a high-dimensional problem into several one-dimensional problems.

One example of a high-dimensional problem is a hierarchical model.


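Hierarchical model. Example 1

Posterior distribution on $(\theta, \sigma^2)$ associated with the joint model
$$X_i \sim N(\theta, \sigma^2),\quad i = 1, \dots, n,$$
$$\theta \sim N(\theta_0, \tau^2),\quad \sigma^2 \sim IG(a, b),$$
with $\theta_0, \tau^2, a, b$ specified.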


Gibbs Sampling algorithm

Suppose that $(X, Y)$ has density $f_{X,Y}(x, y)$, and that it is possible to simulate from the conditional distributions $f_{X \mid Y}(x \mid y)$ and $f_{Y \mid X}(y \mid x)$. Let $(X_0, Y_0)$ be starting values, and assume we have drawn $(X_0, Y_0), \dots, (X_n, Y_n)$. Then the Gibbs sampling algorithm for getting $(X_{n+1}, Y_{n+1})$ is:
$$X_{n+1} \sim f_{X \mid Y}(x \mid Y_n),$$
$$Y_{n+1} \sim f_{Y \mid X}(y \mid X_{n+1}),$$
and repeat.
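A case where both conditionals are available in closed form makes the two-step update concrete. Assume, for illustration, that $(X, Y)$ is standard bivariate normal with correlation $\rho$. Then $X \mid Y = y \sim N(\rho y, 1 - \rho^2)$ and $Y \mid X = x \sim N(\rho x, 1 - \rho^2)$. The function name `gibbs_bivariate_normal` is hypothetical.

```r
# Gibbs sampler for a standard bivariate normal with correlation rho.
gibbs_bivariate_normal <- function(n, rho, x0 = 0, y0 = 0) {
  s <- sqrt(1 - rho^2)                  # conditional standard deviation
  x <- y <- numeric(n)
  x_cur <- x0
  y_cur <- y0
  for (i in seq_len(n)) {
    x_cur <- rnorm(1, rho * y_cur, s)   # X_{n+1} ~ f_{X|Y}(x | Y_n)
    y_cur <- rnorm(1, rho * x_cur, s)   # Y_{n+1} ~ f_{Y|X}(y | X_{n+1})
    x[i] <- x_cur
    y[i] <- y_cur
  }
  cbind(x = x, y = y)
}

set.seed(4)
draws <- gibbs_bivariate_normal(20000, rho = 0.8)
cor(draws[, "x"], draws[, "y"])         # close to 0.8
```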


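Posteriors for Example 1

For the model $X_i \sim N(\theta, \sigma^2)$, $i = 1, \dots, n$, $\theta \sim N(\theta_0, \tau^2)$, $\sigma^2 \sim IG(a, b)$, the full conditional distributions are
$$\theta \mid x, \sigma^2 \sim N\left( \frac{\sigma^2}{\sigma^2 + n\tau^2}\,\theta_0 + \frac{n\tau^2}{\sigma^2 + n\tau^2}\,\bar{x},\ \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2} \right),$$
$$\sigma^2 \mid x, \theta \sim IG\left( \frac{n}{2} + a,\ \frac{1}{2}\sum_{i} (x_i - \theta)^2 + b \right),$$
where $\bar{x}$ is the sample mean.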


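Example 1. Code in R

x <- rnorm(1000, 10, 2)                            # simulated data
n <- length(x)
a <- 3; b <- 3                                     # IG(a, b) prior on sigma^2
tau2 <- 10                                         # prior variance of theta
theta0 <- 5                                        # prior mean of theta
Nsim <- 5000
xbar <- mean(x)
sh1 <- (n / 2) + a                                 # shape of the conditional of sigma^2
sigma2 <- theta <- rep(0, Nsim)                    # init arrays
sigma2[1] <- 1 / rgamma(1, shape = a, rate = b)    # init chains
B <- sigma2[1] / (sigma2[1] + n * tau2)
theta[1] <- rnorm(1, mean = B * theta0 + (1 - B) * xbar, sd = sqrt(tau2 * B))
for (i in 2:Nsim) {
  B <- sigma2[i - 1] / (sigma2[i - 1] + n * tau2)
  theta[i] <- rnorm(1, mean = B * theta0 + (1 - B) * xbar, sd = sqrt(tau2 * B))
  ra1 <- (1 / 2) * sum((x - theta[i])^2) + b       # rate of the conditional of sigma^2
  sigma2[i] <- 1 / rgamma(1, shape = sh1, rate = ra1)
}

mean(theta[3000:5000])                             # posterior mean of theta
mean(sigma2[3000:5000])                            # posterior mean of sigma^2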


Conjugate priors

In Bayesian probability theory, if the posterior distributions are in the same family as the prior distributions, then the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior.

$$P(\theta \mid D) = \frac{P(\theta)\, P(D \mid \theta)}{\int P(\theta)\, P(D \mid \theta)\, d\theta}$$


Conjugate priors. Example

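Let's consider the normal distribution $x \sim N(\mu, \sigma^2)$. For normally distributed $x$ with fixed variance $\sigma^2$, the conjugate prior for $\mu$ is also normal. For the prior $\mu \sim N(\mu_0, \sigma_0^2)$, the posterior has the form
$$\mu \mid x, \sigma^2 \sim N(\mu_1, \sigma_1^2),$$
$$\mu_1 = \frac{\sigma_0^2}{\sigma^2 + \sigma_0^2}\, x + \frac{\sigma^2}{\sigma^2 + \sigma_0^2}\, \mu_0, \qquad \sigma_1^2 = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + \sigma_0^2}.$$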
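The update is easy to check numerically. The sketch below uses a hypothetical helper, `normal_mean_posterior`. It evaluates the closed-form posterior mean and variance for one observation. It then compares them with a brute-force grid evaluation of Bayes' rule, $P(\mu \mid x) = P(\mu)P(x \mid \mu) / \int P(\mu)P(x \mid \mu)\,d\mu$. The values $x = 2$, $\sigma^2 = \sigma_0^2 = 1$ and $\mu_0 = 0$ are arbitrary.

```r
# Posterior of mu for one observation x ~ N(mu, sigma2), prior mu ~ N(mu0, sigma02).
normal_mean_posterior <- function(x, sigma2, mu0, sigma02) {
  w <- sigma02 / (sigma2 + sigma02)           # weight on the data
  c(mean = w * x + (1 - w) * mu0,             # sigma02/(.) * x + sigma2/(.) * mu0
    var  = sigma2 * sigma02 / (sigma2 + sigma02))
}

post <- normal_mean_posterior(x = 2, sigma2 = 1, mu0 = 0, sigma02 = 1)
post                                          # mean = 1, var = 0.5

# Grid approximation of P(mu | x) = P(mu) P(x | mu) / integral
step <- 0.001
mu_grid <- seq(-10, 10, by = step)
unnorm <- dnorm(mu_grid, 0, 1) * dnorm(2, mu_grid, 1)   # prior times likelihood
dens <- unnorm / sum(unnorm * step)                     # normalize numerically
grid_mean <- sum(mu_grid * dens * step)
grid_var <- sum((mu_grid - grid_mean)^2 * dens * step)
c(grid_mean, grid_var)                        # approximately 1 and 0.5
```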


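Conjugate priors. Example

Let's consider the normal distribution $x \sim N(\mu, \sigma^2)$. For normally distributed $x$ with fixed mean $\mu$, the conjugate prior for $\sigma^2$ is the inverse-gamma distribution. For the prior $\sigma^2 \sim IG(\alpha, \beta)$:
$$P(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \propto (\sigma^2)^{-1/2} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),$$
$$P(\sigma^2) = IG(\alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, (\sigma^2)^{-\alpha - 1} \exp\left( -\frac{\beta}{\sigma^2} \right),$$
$$P(\sigma^2 \mid x, \mu) \propto (\sigma^2)^{-(\alpha + 1/2) - 1} \exp\left( -\frac{\beta + \frac{1}{2}(x - \mu)^2}{\sigma^2} \right).$$
Hence the posterior is $IG(\alpha', \beta')$ with
$$\alpha' = \alpha + \frac{1}{2}, \qquad \beta' = \beta + \frac{1}{2}(x - \mu)^2.$$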
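With $n$ independent observations and known $\mu$, the same argument adds $n/2$ to $\alpha$ and $\frac{1}{2}\sum_i (x_i - \mu)^2$ to $\beta$. This is the update used for $\sigma^2$ in the Gibbs code of Example 1. Below is a minimal sketch with a hypothetical helper, `ig_variance_posterior`. Base R has no inverse-gamma sampler, so draws are taken as reciprocals of gamma draws: if $G \sim \mathrm{Gamma}(\alpha', \text{rate} = \beta')$, then $1/G \sim IG(\alpha', \beta')$.

```r
# Conjugate update for sigma^2 with known mean mu:
# prior sigma^2 ~ IG(alpha, beta), data x_1..x_n ~ N(mu, sigma^2)
# posterior sigma^2 | x, mu ~ IG(alpha + n/2, beta + sum((x - mu)^2) / 2)
ig_variance_posterior <- function(x, mu, alpha, beta) {
  c(alpha = alpha + length(x) / 2,
    beta  = beta + sum((x - mu)^2) / 2)
}

# Single observation, as on the slide: alpha' = alpha + 1/2, beta' = beta + (x - mu)^2 / 2
ig_variance_posterior(x = 3, mu = 1, alpha = 2, beta = 1)   # alpha = 2.5, beta = 3

# Posterior draws via 1 / Gamma(shape = alpha', rate = beta')
set.seed(5)
x <- rnorm(500, mean = 0, sd = 2)                           # true variance 4
p <- ig_variance_posterior(x, mu = 0, alpha = 3, beta = 3)
sigma2_draws <- 1 / rgamma(10000, shape = p[["alpha"]], rate = p[["beta"]])
mean(sigma2_draws)                                          # close to 4
```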


ABC SMC

A number of parameter values (particles) $\theta^{(1)}, \dots, \theta^{(n)}$, sampled from the prior distribution $\pi(\theta)$, are propagated through a sequence of intermediate distributions $\pi(\theta \mid d(x_0, x^*) \le \varepsilon_i)$, $i = 1, \dots, T - 1$, until they represent a sample from the target distribution $\pi(\theta \mid d(x_0, x^*) \le \varepsilon_T)$. The tolerances $\varepsilon_1 > \dots > \varepsilon_T \ge 0$ decrease so that the intermediate distributions evolve gradually towards the target posterior. For a sufficiently large number of particles, this approach avoids getting stuck in areas of low probability, which can happen with ABC MCMC.


ABC SMC Algorithm

S1. Initialize $\varepsilon_1, \dots, \varepsilon_T$. Set the population indicator $t = 0$.

S2.0 Set the particle indicator $i = 1$.

S2.1 If $t = 0$, sample $\theta^{**}$ independently from $\pi(\theta)$. Else, sample $\theta^*$ from the previous population $\{\theta_{t-1}^{(i)}\}$ with weights $w_{t-1}$, and perturb the particle to obtain $\theta^{**} \sim K_t(\theta \mid \theta^*)$, where $K_t$ is a perturbation kernel. If $\pi(\theta^{**}) = 0$, return to S2.1. Simulate a candidate dataset $x^* \sim f(x \mid \theta^{**})$. If $d(x^*, x_0) \ge \varepsilon_t$, return to S2.1.


ABC SMC Algorithm

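S2.2 Set $\theta_t^{(i)} = \theta^{**}$ and calculate the weight for particle $\theta_t^{(i)}$:
$$w_t^{(i)} = \begin{cases} 1, & \text{if } t = 0, \\ \dfrac{\pi(\theta_t^{(i)})}{\sum_{j=1}^{N} w_{t-1}^{(j)}\, K_t(\theta_{t-1}^{(j)}, \theta_t^{(i)})}, & \text{if } t > 0. \end{cases}$$
If $i < N$, set $i = i + 1$ and go to S2.1.

S3 Normalize the weights. If $t < T$, set $t = t + 1$ and go to S2.0.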
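Here is a compact R sketch of the algorithm for a scalar parameter. It rests on assumptions that are not from the slides:

- the toy model $x \sim N(\theta, 1)$ with 20 observations;
- a uniform prior $U(-10, 10)$;
- the distance $|\bar{x}_0 - \bar{x}^*|$;
- a Gaussian perturbation kernel $K_t(\theta \mid \theta^*) = N(\theta^*, 0.5^2)$ for every $t$;
- a hand-picked tolerance schedule.

In the code, populations are indexed from 1 rather than from $t = 0$. A candidate is accepted when $d \le \varepsilon_t$, the complement of the rejection test in S2.1. The name `abc_smc` is hypothetical.

```r
# ABC SMC sketch for a scalar parameter (illustrative; toy model assumed).
abc_smc <- function(x0, N, eps_seq, sigma_k = 0.5) {
  prior_density <- function(theta) dunif(theta, -10, 10)
  distance <- function(x_star) abs(mean(x0) - mean(x_star))
  theta_prev <- NULL
  w_prev <- NULL
  for (t in seq_along(eps_seq)) {                      # population with tolerance eps_t
    theta <- numeric(N)
    w <- numeric(N)
    for (i in seq_len(N)) {
      repeat {                                         # S2.1
        if (t == 1) {
          theta_ss <- runif(1, -10, 10)                # theta** from the prior
        } else {
          j <- sample.int(N, 1, prob = w_prev)         # theta* from previous population
          theta_ss <- rnorm(1, theta_prev[j], sigma_k) # perturb: theta** ~ K_t(. | theta*)
        }
        if (prior_density(theta_ss) == 0) next         # outside prior support: retry
        x_star <- rnorm(length(x0), theta_ss, 1)       # simulate candidate dataset
        if (distance(x_star) <= eps_seq[t]) break      # close enough: accept
      }
      theta[i] <- theta_ss                             # S2.2: store particle and weight
      w[i] <- if (t == 1) 1 else
        prior_density(theta_ss) / sum(w_prev * dnorm(theta_ss, theta_prev, sigma_k))
    }
    w_prev <- w / sum(w)                               # S3: normalize the weights
    theta_prev <- theta
  }
  list(theta = theta_prev, weights = w_prev)
}

set.seed(6)
x0 <- rnorm(20, mean = 3, sd = 1)                      # hypothetical observed data
fit <- abc_smc(x0, N = 500, eps_seq = c(2, 1, 0.5, 0.2, 0.1))
sum(fit$weights * fit$theta)                           # weighted posterior mean, near mean(x0)
```

The kernel is symmetric, so `dnorm(theta_ss, theta_prev, sigma_k)` evaluates $K_t(\theta_{t-1}^{(j)}, \theta_t^{(i)})$ for all $j$ at once. This gives the denominator of the S2.2 weight in one vectorized call.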


Ovarian Cancer case study. CA125


Risk calculation


Change-point hierarchical model for CA125

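Controls:
$$Y_{ij} \mid t_{ij} \sim N(\theta_i, \sigma^2)$$

Cases:
$$Y_{ij} \mid t_{ij}, I_i = 0 \sim N(\theta_i, \sigma^2)$$
$$Y_{ij} \mid t_{ij}, I_i = 1 \sim N\big(\theta_i + \gamma_i (t_{ij} - \tau_i)^+,\ \sigma^2\big)$$

where $(u)^+ = \max(u, 0)$.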
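To make the model concrete, this sketch simulates one control trajectory and one case trajectory with a change-point. Every number in it is hypothetical, chosen only for illustration and not estimated from CA125 data. This covers the baseline $\theta_i$, slope $\gamma_i$, change-point $\tau_i$, $\sigma$, and the yearly visit times. `simulate_subject` is likewise a hypothetical name. The positive part $(u)^+$ is computed with `pmax(u, 0)`.

```r
# Simulate marker values from the change-point model:
# control (or case with I_i = 0): Y_ij ~ N(theta_i, sigma^2)
# case with I_i = 1:              Y_ij ~ N(theta_i + gamma_i * (t_ij - tau_i)^+, sigma^2)
simulate_subject <- function(times, theta, sigma, gamma = 0, tau = Inf) {
  mean_curve <- theta + gamma * pmax(times - tau, 0)   # (u)^+ = max(u, 0)
  rnorm(length(times), mean = mean_curve, sd = sigma)
}

set.seed(7)
times <- 0:10                                          # hypothetical yearly visits
control <- simulate_subject(times, theta = 2.5, sigma = 0.3)
case    <- simulate_subject(times, theta = 2.5, sigma = 0.3, gamma = 0.8, tau = 6)
round(rbind(times, control, case), 2)                  # case rises after t = 6
```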


Conditional distributions


Conclusion

1. ABC methods have had a great impact on parameter estimation for models with intractable likelihoods.

2. Many applied problems can be reduced to hierarchical models.

3. The Gibbs sampling algorithm is most useful when dealing with hierarchical models.


Literature

1. Steven J. Skates, Donna K. Pauler, Ian J. Jacobs. Screening Based on the Risk of Cancer Calculation from Bayesian Hierarchical Changepoint and Mixture Models of Longitudinal Markers. Journal of the American Statistical Association, vol. 96 (2001).

2. Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.

3. Tina Toni, David Welch, Natalja Strelkowa, Andreas Ipsen, Michael P. H. Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, vol. 6, 187–202 (2009).

4. Christian P. Robert, George Casella. Introducing Monte Carlo Methods with R. Springer, 2009.


Data processing with the “caret” package in R

• Data preprocessing

• Data splitting

• Data processing

• Model comparison


Data preprocessing

preProcess

• Standardizing

• Transformation

• Imputing


Data preprocessing. Example

library(caret)
data(BloodBrain)                 # loads the data frame bbbDescr
bbbDescr <- bbbDescr[, -3]
preProc <- preProcess(bbbDescr, method = c("center", "scale"))
scaled <- predict(preProc, bbbDescr)
mean(bbbDescr[, 1]); mean(scaled[, 1]); var(scaled[, 1])
mean(bbbDescr[, 2]); mean(scaled[, 2]); var(scaled[, 2])


Data splitting

• createDataPartition # training/test partition

• createResample # bootstrap samples

• createFolds # split the data into k groups

• createTimeSlices # is used for time series data
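The base-R sketch below gives some intuition for the kind of split `createFolds` produces. It assigns each row at random to one of $k$ folds of nearly equal size. It is a hypothetical helper (`make_folds`), not caret's implementation. `createFolds` additionally samples within the levels (or percentile groups) of the outcome, so that the folds are balanced with respect to `y`.

```r
# Randomly assign n rows to k folds of nearly equal size; returns row indices per fold.
make_folds <- function(n, k) {
  fold_id <- sample(rep(seq_len(k), length.out = n))   # balanced labels, shuffled
  split(seq_len(n), fold_id)
}

set.seed(8)
folds <- make_folds(n = 23, k = 5)
lengths(folds)                              # fold sizes: 5 5 5 4 4
# Every row is held out exactly once across the k folds:
sort(unlist(folds, use.names = FALSE))
```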


Data splitting. Example

library(caret)
data(BloodBrain)                 # loads the data frame bbbDescr
bbbDescr <- bbbDescr[, -3]
train_part <- createDataPartition(y = bbbDescr[, 1], p = 0.75, list = FALSE)
training <- bbbDescr[train_part, ]
testing <- bbbDescr[-train_part, ]
dim(bbbDescr); dim(training); dim(testing)


Data processing. Resampling

• trainControl(method = ...), passed to train through its trControl argument:
– boot # bootstrapping
– boot632 # .632 bootstrap (bootstrapping with bias adjustment)
– cv # cross-validation
– repeatedcv # repeated cross-validation
– LOOCV # leave-one-out cross-validation


Data processing. Example

library(caret)
library(mlbench)
data(Sonar)
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75, list = FALSE)
training <- Sonar[inTrain, ]
testing <- Sonar[-inTrain, ]

knnFit <- train(Class ~ ., data = training, method = "knn",
                preProcess = c("center", "scale"))
knnClasses <- predict(knnFit, newdata = testing)
knnClasses


Model comparison. Metric options

confusionMatrix

Continuous outcomes:
• RMSE # root mean squared error
• Rsquared # R^2 from regression models

Categorical outcomes:
• Accuracy # fraction of correctly classified cases
• Kappa # Cohen's kappa: agreement between predicted and observed classes, corrected for chance
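These metrics are simple to compute by hand, which helps when reading `confusionMatrix` output. This base-R sketch uses hypothetical helper names. Accuracy is the observed agreement $p_o$, the diagonal share of the confusion matrix. Kappa is Cohen's $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_e$ is the agreement expected by chance from the row and column totals. The confusion-matrix counts are made up.

```r
# Hand-computed versions of common caret metrics (illustrative helpers).
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

accuracy_kappa <- function(cm) {                    # cm: square confusion matrix of counts
  n <- sum(cm)
  p_o <- sum(diag(cm)) / n                          # observed agreement
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2       # agreement expected by chance
  c(accuracy = p_o, kappa = (p_o - p_e) / (1 - p_e))
}

cm <- matrix(c(20, 5, 10, 15), nrow = 2,
             dimnames = list(predicted = c("M", "R"), reference = c("M", "R")))
accuracy_kappa(cm)             # accuracy = 0.7, kappa = 0.4
rmse(c(1, 2, 3), c(1, 2, 5))   # sqrt(4/3), about 1.155
```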


Model comparison. Example

names(getModelInfo())

knnFit <- train(Class ~ ., data = training, method = "knn", preProcess = c("center", "scale"))
knnClasses <- predict(knnFit, newdata = testing)
confusionMatrix(data = knnClasses, reference = testing$Class)

plsFit <- train(Class ~ ., data = training, method = "pls", preProcess = c("center", "scale"))
plsClasses <- predict(plsFit, newdata = testing)
confusionMatrix(data = plsClasses, reference = testing$Class)

cforestFit <- train(Class ~ ., data = training, method = "cforest", preProcess = c("center", "scale"))
cforestClasses <- predict(cforestFit, newdata = testing)
confusionMatrix(data = cforestClasses, reference = testing$Class)


Literature

1. Max Kuhn. A Short Introduction to the caret Package (2014).

2. Model training and tuning: http://topepo.github.io/caret/training.html


Questions