Approximate Bayesian Computation methods and their applications for
hierarchical statistical models
University College London, 2015
Contents
1. Introduction
2. ABC methods
3. Hierarchical models
4. Application for ovarian cancer detection
5. Conclusion
Introduction
• The likelihood function plays a central role in statistical inference problems
• For complex models, the computational cost of evaluating the analytical formula is very high
• Methods that perform statistical inference without evaluating the likelihood function have become popular
ABC methods
• ABC methods provide ways of evaluating posterior distributions when the likelihood function is analytically or computationally intractable
• These methods are based on replacing the calculation of the likelihood with a comparison between the observed and simulated data
Let $\theta$ be a parameter vector to be estimated. Given the prior distribution $\pi(\theta)$, the goal is to approximate the posterior distribution $\pi(\theta \mid x) \propto f(x \mid \theta)\,\pi(\theta)$, where $f(x \mid \theta)$ is the likelihood.
Generic form of ABC methods
1. Sample a candidate parameter vector $\theta^*$ from some proposal distribution $\pi(\theta)$.
2. Simulate a dataset $x^*$ from the model described by a conditional probability distribution $f(x \mid \theta^*)$.
3. Compare the simulated dataset, $x^*$, with the experimental data, $x_0$, using a distance function, $d$, and tolerance $\varepsilon$; if $d(x_0, x^*) \le \varepsilon$, accept $\theta^*$.
The tolerance $\varepsilon \ge 0$ is the desired level of agreement between $x_0$ and $x^*$.
Most popular ABC algorithms
• ABC rejection algorithm
• ABC MCMC algorithm (Markov Chain Monte Carlo)
• ABC SMC algorithm (Sequential Monte Carlo)
ABC rejection method
1. Sample $\theta^*$ from $\pi(\theta)$.
2. Simulate a dataset $x^*$ from $f(x \mid \theta^*)$.
3. If $d(x_0, x^*) \le \varepsilon$, accept $\theta^*$, otherwise reject.
4. Return to step 1.
Disadvantage: if the prior distribution is very different from the posterior, the acceptance rate is low.
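The four steps above can be sketched in R on a toy problem. Everything specific here is an assumption made for illustration: the data are normal with unknown mean and known unit variance, the prior is Uniform(-5, 5), the distance is the absolute difference of sample means, and `abc_rejection` is a hypothetical helper name.

```r
set.seed(1)

# Observed data: assumed to come from N(theta = 2, sd = 1)
x0 <- rnorm(50, mean = 2, sd = 1)

# Hypothetical helper: ABC rejection with a Uniform(-5, 5) prior on theta
abc_rejection <- function(x0, n_accept, eps) {
  accepted <- numeric(n_accept)
  n_tries <- 0
  k <- 0
  while (k < n_accept) {
    n_tries <- n_tries + 1
    theta_star <- runif(1, -5, 5)                            # step 1: sample from the prior
    x_star <- rnorm(length(x0), mean = theta_star, sd = 1)   # step 2: simulate a dataset
    if (abs(mean(x_star) - mean(x0)) <= eps) {               # step 3: compare with x0
      k <- k + 1
      accepted[k] <- theta_star
    }
  }                                                          # step 4: repeat
  list(theta = accepted, acceptance_rate = n_accept / n_tries)
}

res <- abc_rejection(x0, n_accept = 500, eps = 0.1)
mean(res$theta)         # approximate posterior mean of theta
res$acceptance_rate     # small, because the prior is much wider than the posterior
```

The low acceptance rate in this sketch illustrates the disadvantage noted above: most prior draws fall far from the region supported by the data.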
Markov chain Monte Carlo (WIKIPEDIA)
In mathematics, more specifically in statistics, Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability distribution based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a number of steps is then used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps.
ABC MCMC
1. Metropolis-Hastings Algorithm
2. Random-walk Metropolis-Hastings Algorithm
3. Gibbs Sampling Algorithm
4. Metropolis within Gibbs Algorithm
Metropolis-Hastings Algorithm
Let $q(y \mid x)$ be an arbitrary, convenient distribution that we know how to sample from, called the proposal. Choose $X_0$ arbitrarily. Suppose we have generated $X_0, X_1, \ldots, X_i$. To generate $X_{i+1}$ do the following:
(1) Generate a proposal or candidate value $Y \sim q(y \mid X_i)$.
(2) Evaluate $r \equiv r(X_i, Y)$, where
$$r(x, y) = \min\left\{ \frac{f(y)\, q(x \mid y)}{f(x)\, q(y \mid x)},\ 1 \right\}$$
(3) Set
$$X_{i+1} = \begin{cases} Y & \text{with probability } r, \\ X_i & \text{with probability } 1 - r. \end{cases}$$
Remarks to Metropolis-Hastings Algorithm
• A simple way to execute step (3) is to generate $U \sim \mathrm{Uniform}(0, 1)$. If $U < r$ set $X_{i+1} = Y$, otherwise set $X_{i+1} = X_i$.
• A common choice for $q(y \mid x)$ is $N(x, b^2)$ for some $b > 0$. In this case, the proposal density $q$ is symmetric, $q(y \mid x) = q(x \mid y)$, and
$$r(x, y) = \min\left\{ \frac{f(y)}{f(x)},\ 1 \right\}$$
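The symmetric-proposal case can be written as a small reusable function. This is a sketch rather than a reference implementation: `rw_metropolis` is a hypothetical name, the target is passed as a log-density (an assumption made to avoid numerical underflow), and the standard normal target in the usage lines is chosen only for illustration.

```r
# Hypothetical helper: random-walk Metropolis sampler with N(x, b^2) proposals.
# log_f is the log of the target density, known possibly only up to a constant.
rw_metropolis <- function(log_f, x0, n_iter, b = 1) {
  chain <- numeric(n_iter)
  x <- x0
  for (i in seq_len(n_iter)) {
    y <- rnorm(1, mean = x, sd = b)         # (1) propose Y ~ q(y | X_i)
    log_r <- min(log_f(y) - log_f(x), 0)    # (2) log of r = min{f(y) / f(x), 1}
    if (log(runif(1)) < log_r) x <- y       # (3) accept Y with probability r
    chain[i] <- x
  }
  chain
}

set.seed(42)
# Illustrative target: standard normal, specified up to a constant
chain <- rw_metropolis(function(x) -x^2 / 2, x0 = 0, n_iter = 20000, b = 2)
mean(chain)   # should be close to 0
var(chain)    # should be close to 1
```

Because only the ratio $f(y)/f(x)$ enters the acceptance probability, the normalising constant of the target never has to be computed.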
Metropolis-Hastings Algorithm. Example 1
Let’s simulate a Markov chain whose stationary distribution is the Cauchy distribution,
$$f(x) = \frac{1}{\pi}\, \frac{1}{1 + x^2}$$
Let’s take $N(x, b^2)$ as the proposal distribution. Then
$$r(x, y) = \min\left\{ \frac{f(y)}{f(x)},\ 1 \right\} = \min\left\{ \frac{1 + x^2}{1 + y^2},\ 1 \right\}$$
Let’s choose $b = 1$ and chain length $N = 10{,}000$.
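Example 1. Code in R
N <- 10000                        # chain length
b <- 1                            # proposal standard deviation
x_values <- numeric(N)            # storage for the chain
x_axis <- seq(-7, 7, by = 0.1)    # grid for plotting the true density
x_old <- 0                        # starting value X_0
for (i in 1:N) {
  y <- rnorm(1, x_old, b)                   # propose Y ~ N(x_old, b^2)
  r <- min((1 + x_old^2) / (1 + y^2), 1)    # acceptance probability
  if (runif(1) < r) x_old <- y              # accept with probability r
  x_values[i] <- x_old
}
x_cauchy <- dcauchy(x_axis)
plot(x_axis, x_cauchy, type = "p", col = "black")    # true Cauchy density
lines(density(x_values), col = "red", lwd = 3)       # density estimate from the chain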
Gibbs Sampling
Gibbs Sampling is the easiest MCMC algorithm to use for high-dimensional problems, because it turns a high-dimensional problem into several one-dimensional problems.
A hierarchical model is one example of a high-dimensional problem.
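Hierarchical model. Example 1
Posterior distribution on $(\theta, \sigma^2)$ associated with the joint model
$$X_i \sim N(\theta, \sigma^2),\ i = 1, \ldots, n, \qquad \theta \sim N(\theta_0, \tau^2), \qquad \sigma^2 \sim IG(a, b),$$
with $\theta_0, \tau^2, a, b$ specified.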
Gibbs Sampling algorithm
Suppose that $(X, Y)$ has density $f_{X,Y}(x, y)$. Suppose that it is possible to simulate from the conditional distributions $f_{X \mid Y}(x \mid y)$ and $f_{Y \mid X}(y \mid x)$. Let $(X_0, Y_0)$ be starting values. Assume we have drawn $(X_0, Y_0), \ldots, (X_n, Y_n)$. Then the Gibbs sampling algorithm for getting $(X_{n+1}, Y_{n+1})$ is:
$$X_{n+1} \sim f_{X \mid Y}(x \mid Y_n)$$
$$Y_{n+1} \sim f_{Y \mid X}(y \mid X_{n+1})$$
repeat
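The two-step update can be illustrated on a target where both conditionals are known in closed form. The sketch below assumes a standard bivariate normal with correlation `rho`, for which $X \mid Y = y \sim N(\rho y, 1 - \rho^2)$ and symmetrically for $Y \mid X$; this target is chosen for illustration and does not appear in the slides.

```r
set.seed(7)
rho <- 0.8               # assumed correlation of the illustrative target
n_iter <- 10000
X <- Y <- numeric(n_iter)
x <- 0; y <- 0           # starting values (X_0, Y_0)
s <- sqrt(1 - rho^2)     # conditional standard deviation

for (n in seq_len(n_iter)) {
  x <- rnorm(1, mean = rho * y, sd = s)   # X_{n+1} ~ f_{X|Y}(x | Y_n)
  y <- rnorm(1, mean = rho * x, sd = s)   # Y_{n+1} ~ f_{Y|X}(y | X_{n+1})
  X[n] <- x
  Y[n] <- y
}

cor(X, Y)   # should be close to rho
```

Each update is a one-dimensional draw, which is the property that makes Gibbs sampling convenient for hierarchical models.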
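Posteriors for Example 1
$$\theta \mid x, \sigma^2 \sim N\left( \frac{\sigma^2}{\sigma^2 + n\tau^2}\,\theta_0 + \frac{n\tau^2}{\sigma^2 + n\tau^2}\,\bar{x},\ \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2} \right)$$
$$\sigma^2 \mid x, \theta \sim IG\left( \frac{n}{2} + a,\ \frac{1}{2} \sum_i (x_i - \theta)^2 + b \right)$$
for the model $X_i \sim N(\theta, \sigma^2),\ i = 1, \ldots, n$, $\theta \sim N(\theta_0, \tau^2)$, $\sigma^2 \sim IG(a, b)$, with $\theta_0, \tau^2, a, b$ specified.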
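Example 1. Code in R
x <- rnorm(1000, 10, 2)            # simulated data
n <- length(x)
a <- 3; b <- 3                     # inverse-gamma hyperparameters
tau2 <- 10                         # prior variance of theta
theta0 <- 5                        # prior mean of theta
Nsim <- 5000
xbar <- mean(x)
sh1 <- (n / 2) + a                 # shape of the conditional for sigma2
sigma2 <- theta <- rep(0, Nsim)    # init arrays
sigma2[1] <- 1 / rgamma(1, shape = a, rate = b)    # init chains
B <- sigma2[1] / (sigma2[1] + n * tau2)
theta[1] <- rnorm(1, mean = B * theta0 + (1 - B) * xbar, sd = sqrt(tau2 * B))
for (i in 2:Nsim) {
  B <- sigma2[i - 1] / (sigma2[i - 1] + n * tau2)
  theta[i] <- rnorm(1, mean = B * theta0 + (1 - B) * xbar, sd = sqrt(tau2 * B))
  ra1 <- (1 / 2) * sum((x - theta[i])^2) + b
  sigma2[i] <- 1 / rgamma(1, shape = sh1, rate = ra1)
}
mean(theta[3000:5000])     # posterior mean of theta after burn-in
mean(sigma2[3000:5000])    # posterior mean of sigma2 after burn-in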
Conjugate priors
In Bayesian probability theory, if the posterior distributions are in the same family as the prior distributions, then the prior and posterior are called conjugate distributions and the prior is called a conjugate prior.
$$P(\theta \mid D) = \frac{P(\theta)\, P(D \mid \theta)}{\int P(\theta)\, P(D \mid \theta)\, d\theta}$$
Conjugate priors. Example
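Let’s consider the normal distribution $x \sim N(\mu, \sigma^2)$. For normally distributed $x$ with fixed variance $\sigma^2$, the conjugate prior is also normally distributed. For the prior $\mu \sim N(\mu_0, \sigma_0^2)$ the posterior has the form
$$\mu \mid x, \sigma^2 \sim N(\tilde{\mu}_0, \tilde{\sigma}_0^2),$$
$$\tilde{\mu}_0 = \frac{\sigma_0^2}{\sigma^2 + \sigma_0^2}\, x + \frac{\sigma^2}{\sigma^2 + \sigma_0^2}\, \mu_0, \qquad \tilde{\sigma}_0^2 = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + \sigma_0^2}$$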
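The normal-normal update can be checked numerically. The sketch below uses assumed illustrative values for the known variance, the prior and a single observation, and compares the closed-form posterior mean and variance with a brute-force evaluation of Bayes' formula on a grid.

```r
# Assumed illustrative values, not taken from the slides
sigma2  <- 4    # known data variance
mu0     <- 0    # prior mean
sigma02 <- 1    # prior variance
x       <- 3    # single observation

# Closed-form posterior parameters
w <- sigma02 / (sigma2 + sigma02)     # weight given to the observation
mu_post  <- w * x + (1 - w) * mu0
var_post <- sigma2 * sigma02 / (sigma2 + sigma02)

# Grid approximation of prior times likelihood, normalised to sum to one
mu_grid <- seq(-10, 10, by = 0.001)
post <- dnorm(x, mu_grid, sqrt(sigma2)) * dnorm(mu_grid, mu0, sqrt(sigma02))
post <- post / sum(post)
grid_mean <- sum(mu_grid * post)
grid_var  <- sum((mu_grid - grid_mean)^2 * post)

c(mu_post = mu_post, grid_mean = grid_mean)
c(var_post = var_post, grid_var = grid_var)
```

With these values the posterior mean is 0.6 and the posterior variance is 0.8, and the grid approximation agrees to several decimal places.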
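Conjugate priors. Example
Let’s consider the normal distribution $x \sim N(\mu, \sigma^2)$. For normally distributed $x$ with fixed mean $\mu$, the conjugate prior for the variance is the inverse-gamma distribution. For the prior $\sigma^2 \sim IG(\alpha, \beta)$:
$$P(x \mid \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \propto (\sigma^2)^{-1/2} \exp\left( -\frac{\tfrac{1}{2}(x - \mu)^2}{\sigma^2} \right)$$
$$P(\sigma^2) = IG(\alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, (\sigma^2)^{-\alpha - 1} \exp\left( -\frac{\beta}{\sigma^2} \right)$$
$$P(\sigma^2 \mid x, \mu) \propto (\sigma^2)^{-(\alpha + 1/2) - 1} \exp\left( -\frac{\beta + \tfrac{1}{2}(x - \mu)^2}{\sigma^2} \right)$$
so the posterior is $IG(\tilde{\alpha}, \tilde{\beta})$ with
$$\tilde{\alpha} = \alpha + \tfrac{1}{2}, \qquad \tilde{\beta} = \beta + \tfrac{1}{2}(x - \mu)^2$$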
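The inverse-gamma update is easy to use in practice because an inverse-gamma draw is the reciprocal of a gamma draw. The sketch below uses assumed illustrative hyperparameters and a single observation, and compares the Monte Carlo mean of the posterior draws with the analytic inverse-gamma mean, rate / (shape - 1), which exists when the shape exceeds 1.

```r
set.seed(11)
# Assumed illustrative values, not taken from the slides
alpha <- 3; beta <- 2    # prior hyperparameters
mu <- 0                  # known mean
x <- 1.5                 # single observation

# Updated hyperparameters
alpha_post <- alpha + 1 / 2
beta_post  <- beta + 0.5 * (x - mu)^2

# If G ~ Gamma(shape, rate) then 1 / G ~ IG(shape, rate)
draws <- 1 / rgamma(1e5, shape = alpha_post, rate = beta_post)

c(analytic_mean = beta_post / (alpha_post - 1), sample_mean = mean(draws))
```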
ABC SMC
A number of sampled parameter values (particles) $\theta^{(1)}, \ldots, \theta^{(n)}$, sampled from the prior distribution $\pi(\theta)$, are propagated through a sequence of intermediate distributions, $\pi(\theta \mid d(x_0, x^*) \le \varepsilon_i)$, $i = 1, \ldots, T - 1$, until they represent a sample from the target distribution $\pi(\theta \mid d(x_0, x^*) \le \varepsilon_T)$. The tolerances are chosen such that $\varepsilon_1 > \ldots > \varepsilon_T \ge 0$, which means the distributions gradually evolve towards the target posterior. For sufficiently large numbers of particles, this approach avoids the problem of getting stuck in areas of low probability (as can happen in ABC MCMC).
ABC SMC Algorithm
S1. Initialize $\varepsilon_1, \ldots, \varepsilon_T$. Set the population indicator $t = 0$.
S2.0 Set the particle indicator $i = 1$.
S2.1 If $t = 0$, sample $\theta^{**}$ independently from $\pi(\theta)$.
Else, sample $\theta^*$ from the previous population $\{\theta_{t-1}^{(i)}\}$ with weights $w_{t-1}$ and perturb the particle to obtain $\theta^{**} \sim K_t(\theta \mid \theta^*)$, where $K_t$ is a perturbation kernel.
If $\pi(\theta^{**}) = 0$, return to S2.1.
Simulate a candidate dataset $x^* \sim f(x \mid \theta^{**})$. If $d(x^*, x_0) \ge \varepsilon_t$, return to S2.1.
ABC SMC Algorithm
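S2.2 Set $\theta_t^{(i)} = \theta^{**}$ and calculate the weight for particle $\theta_t^{(i)}$:
$$w_t^{(i)} = \begin{cases} 1, & \text{if } t = 0, \\[4pt] \dfrac{\pi(\theta_t^{(i)})}{\sum_{j=1}^{N} w_{t-1}^{(j)} K_t(\theta_{t-1}^{(j)}, \theta_t^{(i)})}, & \text{if } t > 0. \end{cases}$$
If $i < N$, set $i = i + 1$, go to S2.1.
S3 Normalize the weights. If $t < T$, set $t = t + 1$, go to S2.0.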
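A compact R sketch of steps S1 to S3 on a toy problem follows. The specifics are assumptions made for illustration: normal data with unknown mean, a Uniform(-5, 5) prior, a Gaussian perturbation kernel with fixed standard deviation `kernel_sd`, the absolute difference of sample means as distance, and the hypothetical function name `abc_smc`. Populations are indexed from 1 here because R vectors are 1-based, whereas the algorithm above starts at $t = 0$.

```r
set.seed(123)
x0 <- rnorm(50, mean = 2, sd = 1)                  # assumed observed data
prior_d <- function(th) dunif(th, -5, 5)           # prior density pi(theta)
dist_fn <- function(x) abs(mean(x) - mean(x0))     # distance d(x*, x0)

# Hypothetical helper implementing S1-S3 for this toy model
abc_smc <- function(eps, N = 200, kernel_sd = 0.5) {
  n_pop <- length(eps)                             # S1: tolerance schedule
  theta <- matrix(NA_real_, nrow = n_pop, ncol = N)
  w     <- matrix(NA_real_, nrow = n_pop, ncol = N)
  for (t in seq_len(n_pop)) {
    for (i in seq_len(N)) {                        # S2.0: loop over particles
      repeat {                                     # S2.1: propose until accepted
        if (t == 1) {
          th <- runif(1, -5, 5)                    # first population: sample the prior
        } else {
          th_star <- sample(theta[t - 1, ], 1, prob = w[t - 1, ])
          th <- rnorm(1, th_star, kernel_sd)       # perturb with kernel K_t
        }
        if (prior_d(th) == 0) next                 # outside prior support: retry
        x_star <- rnorm(length(x0), th, 1)         # simulate candidate dataset
        if (dist_fn(x_star) < eps[t]) break        # accept when d < eps_t
      }
      theta[t, i] <- th                            # S2.2: store particle and weight
      if (t == 1) {
        w[t, i] <- 1
      } else {
        w[t, i] <- prior_d(th) /
          sum(w[t - 1, ] * dnorm(th, theta[t - 1, ], kernel_sd))
      }
    }
    w[t, ] <- w[t, ] / sum(w[t, ])                 # S3: normalise the weights
  }
  list(theta = theta[n_pop, ], weights = w[n_pop, ])
}

res <- abc_smc(eps = c(1, 0.5, 0.2, 0.1))
sum(res$theta * res$weights)   # weighted posterior mean estimate of theta
```

A fixed kernel width keeps the sketch short; in practice the kernel is often adapted to the spread of the previous population.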
Ovarian Cancer case study. CA125
Risk calculation
Change-point hierarchical model for CA125
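Controls:
$$Y_{ij} \mid t_{ij} \sim N(\theta_i, \sigma^2)$$
Cases:
$$Y_{ij} \mid t_{ij}, I_i = 0 \sim N(\theta_i, \sigma^2)$$
$$Y_{ij} \mid t_{ij}, I_i = 1 \sim N(\theta_i + \gamma_i (t_{ij} - \tau_i)^+, \sigma^2)$$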
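To see what the change-point model implies, one subject's marker trajectory can be simulated. All numeric values below are assumptions made for illustration, not estimates from the study, and `mean_marker` is a hypothetical helper; $(t - \tau)^+$ denotes the positive part, $\max(t - \tau, 0)$.

```r
set.seed(2015)
# Illustrative assumptions only
t_ij    <- seq(0, 5, by = 0.5)   # screening times
theta_i <- 3.0                   # subject-specific baseline level
gamma_i <- 0.8                   # slope after the change-point
tau_i   <- 3.0                   # change-point time
sigma   <- 0.2                   # measurement standard deviation

# Hypothetical helper: mean marker level; I = 1 switches the change-point on
mean_marker <- function(t, theta, gamma, tau, I) {
  theta + I * gamma * pmax(t - tau, 0)   # pmax(., 0) is the positive part
}

# Flat trajectory (controls, or cases with I_i = 0)
y_flat <- rnorm(length(t_ij), mean_marker(t_ij, theta_i, gamma_i, tau_i, I = 0), sigma)
# Rising trajectory (cases with I_i = 1)
y_rise <- rnorm(length(t_ij), mean_marker(t_ij, theta_i, gamma_i, tau_i, I = 1), sigma)

plot(t_ij, y_rise, type = "b", col = "red", xlab = "time", ylab = "marker level")
lines(t_ij, y_flat, type = "b", col = "black")
```

Before the change-point the two trajectories share the same mean; afterwards the rising trajectory increases linearly with slope `gamma_i`.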
Conditional distributions
Conclusion
1. ABC methods have a great impact on parameter estimation.
2. Many applied problems can be reduced to a hierarchical model.
3. The Gibbs Sampling Algorithm is the most useful for dealing with hierarchical models.
Literature
1. Steven J. Skates, Donna K. Pauler, Ian J. Jacobs. Screening Based on the Risk of Cancer Calculation from Bayesian Hierarchical Changepoint and Mixture Models of Longitudinal Markers. Journal of the American Statistical Association, vol. 96 (2001).
2. Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.
3. Tina Toni, David Welch, Natalja Strelkowa, Andreas Ipsen, Michael P.H. Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6, 187-202 (2009).
4. Christian P. Robert, George Casella. Introducing Monte Carlo Methods with R. Springer, 2009.
Data processing with “caret” package in R
• Data preprocessing
• Data splitting
• Data processing
• Model comparison
Data preprocessing
preProcess
• Standardizing
• Transformation
• Imputing
Data preprocessing. Example
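library(caret)
data(BloodBrain)                    # contains the data frame bbbDescr
bbbDescr <- bbbDescr[, -3]          # drop the third column
preProc <- preProcess(bbbDescr, method = c("center", "scale"))
data <- predict(preProc, bbbDescr)  # apply centering and scaling
mean(bbbDescr[, 1])                 # original mean of column 1
mean(data[, 1])                     # approximately 0 after centering
var(data[, 1])                      # 1 after scaling
mean(bbbDescr[, 2])
mean(data[, 2])
var(data[, 2])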
Data splitting
• createDataPartition # training/test partition
• createResample # bootstrap samples
• createFolds # split the data into k groups
• createTimeSlices # is used for time series data
Data splitting. Example
library(caret)
data(BloodBrain)                    # contains the data frame bbbDescr
bbbDescr <- bbbDescr[, -3]
train_part <- createDataPartition(y = bbbDescr[, 1], p = 0.75, list = FALSE)
training <- bbbDescr[train_part, ]  # 75% of the rows
testing <- bbbDescr[-train_part, ]  # remaining rows
dim(bbbDescr)
dim(training)
dim(testing)
Data processing. Resampling
train (resampling scheme set through trainControl)
• method =
– boot # bootstrapping
– boot632 # bootstrapping with adjustment
– cv # cross validation
– repeatedcv # repeated cross validation
– LOOCV # leave one out cross validation
Data processing. Example
library(caret)
library(mlbench)
data(Sonar)
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75, list = FALSE)
training <- Sonar[inTrain, ]
testing <- Sonar[-inTrain, ]
# Note: the object is named plsFit, but the model fitted here is k-nearest neighbours
plsFit <- train(Class ~ ., data = training, method = "knn", preProc = c("center", "scale"))
plsClasses <- predict(plsFit, newdata = testing)
plsClasses
Model comparison. Metric options
confusionMatrix
Continuous outcomes:
• RMSE # root mean squared error
• Rsquared # R^2 from regression models
Categorical outcomes:
• Accuracy # fraction of correct classes
• Kappa # measure of concordance
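To make the four metrics concrete, they can be computed by hand in base R. The labels and values below are hypothetical toy data, and R^2 is taken here as the squared correlation between observed and predicted values, which is one common definition and an assumption of this sketch.

```r
# Hypothetical toy classification results (not from the Sonar data)
obs  <- factor(c("M", "M", "R", "R", "M", "R", "M", "R"), levels = c("M", "R"))
pred <- factor(c("M", "R", "R", "R", "M", "M", "M", "R"), levels = c("M", "R"))

tab <- table(pred, obs)                                  # confusion matrix
accuracy <- sum(diag(tab)) / sum(tab)                    # fraction of correct classes
p_exp <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2   # agreement expected by chance
kappa_hat <- (accuracy - p_exp) / (1 - p_exp)            # Cohen's kappa

# Hypothetical toy regression results
y     <- c(1, 2, 3, 4)
y_hat <- c(1.1, 1.9, 3.2, 3.8)
rmse <- sqrt(mean((y - y_hat)^2))                        # root mean squared error
r_squared <- cor(y, y_hat)^2                             # squared correlation

c(accuracy = accuracy, kappa = kappa_hat, rmse = rmse, r_squared = r_squared)
```

Kappa corrects accuracy for the agreement expected by chance, so it is more informative than accuracy when the classes are unbalanced.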
Model comparison. Example
names(getModelInfo())
# k-nearest neighbours
knnFit <- train(Class ~ ., data = training, method = "knn", preProc = c("center", "scale"))
knnClasses <- predict(knnFit, newdata = testing)
confusionMatrix(data = knnClasses, testing$Class)
# partial least squares
plsFit <- train(Class ~ ., data = training, method = "pls", preProc = c("center", "scale"))
plsClasses <- predict(plsFit, newdata = testing)
confusionMatrix(data = plsClasses, testing$Class)
# conditional inference random forest (requires the party package)
cforestFit <- train(Class ~ ., data = training, method = "cforest", preProc = c("center", "scale"))
cforestClasses <- predict(cforestFit, newdata = testing)
confusionMatrix(data = cforestClasses, testing$Class)
Literature
1. Max Kuhn. A Short Introduction to the caret Package (2014).
2. Model training and tuning: http://topepo.github.io/caret/training.html
Questions