Efficient Bayesian estimation for GARCH-type models via Sequential Monte Carlo

Dan Li
Bachelor of Actuarial Studies
Bachelor of Finance
Master of Actuarial Studies

School of Mathematical Sciences
Science and Engineering Faculty
Queensland University of Technology
2020

Submitted in fulfillment of the requirements for the degree of
Master of Philosophy
Statement of Original Authorship
The work contained in this thesis has not been previously submit-
ted to meet requirements for an award at this or any other higher
education institution. To the best of my knowledge and belief,
the thesis contains no material previously published or written by
another person except where due reference is made.
Signature:
Date: 03 Jan 2020
Abstract
Bayesian statistics is increasing in popularity due to its ability to quantify uncer-
tainty in parameter estimates and model predictions in a principled manner. Se-
quential Monte Carlo (SMC) is an efficient method for sampling from the posterior
distribution in Bayesian statistics. SMC may be preferred over the popular Markov
chain Monte Carlo (MCMC) method for certain classes of problems, since SMC is
easily parallelisable, adaptable and more efficient in representing complex posterior
distributions. However, the useful properties of Bayesian statistics and SMC are
yet to be fully realised in certain areas of Econometrics. This research exploits
the advantages of SMC to develop parameter estimation and model selection meth-
ods for GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) style
models. This approach provides an alternative method for quantifying estimation
uncertainty relative to classical inference. We demonstrate that even with long time
series, the posterior distributions of model parameters are non-normal, highlighting
the need for a Bayesian approach and an efficient posterior sampling method. Ef-
ficient approaches for both constructing the sequence of distributions in SMC, and
leave-one-out cross-validation for long time series data, are also proposed. Finally,
we develop an unbiased estimator of the likelihood for the Bad Environment-Good
Environment model, a complex GARCH-type model, which permits exact Bayesian
inference not previously available in the literature.
Keywords
Markov chain Monte Carlo; Time series analysis; Volatility distribution; Prediction;
Cross-validation; Evidence; Data annealing
Acknowledgements
First and foremost, I would like to express my great gratitude to my supervision
team; without their help it would have been difficult to complete this thesis. I
would like to thank my principal supervisor, Associate Professor Chris Drovandi,
for his constant support and guidance throughout this research journey. I am very
grateful to Chris for his patience, encouragement, enthusiasm, and immense
knowledge. I would like to thank my associate supervisor, Professor Adam Clements,
who has given his time to advise me on this research project and offered me insightful
suggestions and expert guidance.
I would like to acknowledge the financial support of the Australian Research Council
Australian Centre for Mathematical and Statistical Frontiers (ACEMS) scholarship
throughout my research project.
I would like to thank Leah South who has spent much time helping me with my
coding and has encouraged me a lot at the early stage of my research journey. I
would also like to express my appreciation to my other friends at QUT, for their
help, support, caring, and laughter. Finally, I would like to thank my family for their
unconditional love.
Contents

Declaration
Abstract
Keywords
Acknowledgements

Chapter 1  Introduction
  1.1 Motivation and research problem
  1.2 Aims and objectives
  1.3 Research contributions and significance
  1.4 Thesis outline

Chapter 2  Background
  2.1 Bayesian Statistics
  2.2 Markov chain Monte Carlo (MCMC)
  2.3 Sequential Monte Carlo (SMC)
  2.4 GARCH models
    2.4.1 Introduction to GARCH family models
    2.4.2 Classical GARCH(1,1) model
    2.4.3 GJR-GARCH(1,1) model
    2.4.4 Bad Environment-Good Environment (BEGE) model
    2.4.5 Parameter estimation methods
  2.5 Model Selection

Chapter 3  A fully Bayesian approach for GARCH-type models
  3.1 Introduction
  3.2 Unbiased estimation for the BEGE likelihood
  3.3 Efficient sequence construction method
  3.4 Bayesian posterior predictive distribution
    3.4.1 Bayesian posterior predictive distribution
    3.4.2 Forecast through maximum likelihood estimation (MLE) approach
    3.4.3 Forecast evaluation
  3.5 Results
    3.5.1 Posterior Distributions
    3.5.2 Posterior Predictive Distribution
    3.5.3 Model Choice
  3.6 Discussion

Chapter 4  Conclusion
  4.1 Summary
  4.2 Future research directions

Appendix  Importance sampling estimator of the likelihood of the BEGE model
References
Chapter 1
Introduction
1.1 Motivation and research problem
The popularity of Bayesian statistics has increased in recent years because it provides
principled quantification of uncertainty in model parameter estimates and model
predictions, and it sidesteps analytically intractable integrals via Monte Carlo
methods. However, it can be challenging to devise an efficient algorithm for sampling
from the posterior distribution. To date, the most popular and extensively used
sampling method in Bayesian statistics has been the Markov chain Monte Carlo (MCMC) method
(Metropolis et al., 1953; Hastings, 1970). An alternative method called sequential
Monte Carlo (SMC) for static Bayesian models (Chopin, 2002; Del Moral et al.,
2006) has been developed in recent times, which has been regarded as a comple-
mentary method to MCMC. SMC may be preferred over MCMC for certain types
of problems, since SMC is easily parallelisable, adaptable, more effective in repre-
senting irregular posterior distributions (South et al., 2019), and the outcomes from
the SMC with a sequence of distributions introduced by data annealing (Chopin,
2002) can be easily utilised for making posterior predictions if needed. Despite the
advantages of SMC being widely known to statisticians, the methods have not been
adopted and studied in some applied fields. The aim of this research is to realise
the potential of SMC in the field of Econometrics.
An important area in Econometrics involves the modelling of financial time series
data in order to undertake empirical analysis of financial market data (Mills and
Markellos, 2008). There are various econometric models developed and one of the
best-known and widely used models is the Generalized AutoRegressive Conditional
Heteroscedasticity (GARCH, Bollerslev 1986) process for modelling the dynamics
of volatility of financial asset returns. Frequentist inference based on maximum
likelihood estimation (MLE) is the most common approach to statistical inference
for the GARCH class of models. This standard approach is limited in that it only
provides point estimates of parameters, and the uncertainty in these estimates is
typically based on asymptotic theory where the MLE is assumed to be normally
distributed (Hamilton, 1994). In addition, the frequentist approach produces only
point forecasts of volatility.
To address these limitations, we develop a Bayesian approach to statistical inference
for the GARCH class of models. In Bayesian statistics, the unknown parameters
are regarded as random variables, and the corresponding parameter posterior distri-
butions that quantify the uncertainty of parameters are constructed. The MCMC
method has been adopted for Bayesian analysis of the GARCH class of models (see,
for example, Kim et al. (1998), Vrontos et al. (2000), Takaishi (2009)). However,
as MCMC has drawbacks, such as the difficulty of tuning its associated parameters,
the conventional MCMC algorithm can become less effective. Therefore, we exploit
the advantages of SMC to develop a novel and efficient fully Bayesian approach
for the widely used GARCH class of models. This
Bayesian approach is used for parameter estimation, prediction, and model selec-
tion for GARCH-type models. Finally, an unbiased likelihood estimator for the
Bad Environment-Good Environment (BEGE, Bekaert et al. (2015)) model is de-
veloped, which allows exact Bayesian inference for the first time for this popular
model. Specifically, this research addresses the following research questions:
1. How can a fully Bayesian approach be developed for GARCH-type time series
models using SMC?
2. How does this Bayesian approach differ from the existing frequentist approach,
and what advantages does it offer?
3. How does the SMC approach outperform conventional MCMC methods
for GARCH-type models?
1.2 Aims and objectives
The overall aim of this research is to apply and investigate the performance of the
SMC methods for Econometric models of interest, so that awareness of the SMC
methods can be increased in the Econometrics field. Specifically, in this research
a fully Bayesian approach for the GARCH class of models will be established by
adopting the SMC methods. To achieve this overall aim, the following research
objectives will be pursued.
− To develop efficient parameter estimation and model selection methods for
GARCH style models using SMC, which can quantify estimation uncertainty.
− To develop efficient approaches for making one-step-ahead predictions, and
for leave-one-out cross-validation to evaluate predictive performance, for long
time series data.
− To obtain an unbiased estimator for the likelihood of the BEGE model in order
to perform exact Bayesian inference.
1.3 Research contributions and significance
This thesis contributes by establishing an efficient fully Bayesian approach for the
widely used GARCH-type models using SMC methods, which has clear advantages
over the standard approaches previously applied to such econometric models.
This approach provides an alternative method for quantifying estimation uncertainty
relative to standard asymptotic assumptions typically made with classical statistical
inference. We demonstrate that even with long time series, the posterior distribu-
tion for our models of interest can be far from normally distributed, underlining the
need for a Bayesian approach and an efficient method for sampling from the posterior
distributions. In addition to delivering more information about the parameters, this
approach also provides full predictive distributions of the unobservable volatility,
which offers more information that cannot be provided by the classical approach.
Moreover, an unbiased estimator of the likelihood for the BEGE model is proposed,
which facilitates exact Bayesian inference. We also contribute to the SMC literature
by exploring the most efficient way of constructing the sequence of distributions in
SMC for long time series data and showing how SMC can be used for efficient leave-
one-out cross-validation for time series data. Much of the content of this thesis has
been written up as a paper (Li et al., 2019) and is currently being considered for
publication.
1.4 Thesis outline
The thesis is organized as follows.
Chapter 2 provides background on Bayesian statistics, existing sampling methods,
model selection methods, and the econometrics models considered in the thesis.
Chapter 3 describes how to efficiently estimate parameters, make forecasts, and
select the best models from candidate GARCH-type models through a newly de-
veloped Bayesian approach based on SMC. To make exact Bayesian inference for
the BEGE model, the methods for developing an unbiased estimator of the BEGE
likelihood are provided here. An efficient strategy to choose the sequence of interme-
diate probability distributions in SMC for long time series data is also described.
A comparison of one-step-ahead volatility forecasts produced using a fully Bayesian
approach, a classical approach, and a data-analytic approach involving a leave-one-out
cross-validation method for time series data, used to evaluate predictive performance,
is also provided. Finally, the results from using this fully Bayesian
approach are analysed. Supplementary material for Chapter 3 is provided in the
Appendix at the end of the thesis.
Chapter 4 summarises the methods and results, and also discusses limitations and
avenues for future research.
Chapter 2
Background
This chapter provides background on Bayesian statistics, MCMC and SMC sampling
methods, and model selection methods that will be used throughout this thesis.
Three specific GARCH-type models of interest are discussed in more detail in this
chapter.
2.1 Bayesian Statistics
Bayes’ theorem captures learning from observed data and experience. Here, y rep-
resents the observed data and θ represents the set of parameters of a model, which
is believed to have generated y. The likelihood function based on the chosen model
is f(y|θ), which is the probability distribution of the observed data y given the
unknown parameters θ. Before seeing the observed data, the information regarding
the parameters is reflected in a prior distribution π(θ). The posterior distribution
of θ is given by:
π(θ|y) = f(y|θ)π(θ) / Z,

where

Z = f(y) = ∫ f(y|θ)π(θ) dθ,
which is the normalising constant of the posterior that is independent of θ. The
normalising constant is difficult to compute in practice, but is often not required
when sampling from the posterior.
There are many advantages of Bayesian statistics over frequentist statistics that
have been discussed in the literature. The Bayesian approach treats the unknown
parameters as random variates and offers quantification of uncertainty through the
posterior distribution, which is a full distribution over the parameters; it is
therefore possible to make any statistical inference of interest, rather than being
limited to point estimates and intervals. According to Berger (2013), Bayesian
analysis incorporates prior information with the observed data and uses it
effectively, especially when substantial prior information is available. Under the
Bayesian approach, inference is conditional on the data actually observed, whereas
frequentist inference is based on all possible random samples that could have
occurred, not just the actual one (Bolstad and Curran, 2016).
When our interest is in generating samples of θ from the posterior, the normalising
constant Z can be ignored and then the posterior function of θ can be expressed as:
π(θ|y) ∝ f(y|θ)π(θ).
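As a minimal sketch of these quantities, the unnormalised posterior f(y|θ)π(θ) can be evaluated on a grid for a toy model, with Z approximated by numerical integration. The model (a normal mean with known unit variance and a N(0, 10²) prior) and all values below are illustrative assumptions, not from the thesis:

```python
import numpy as np

# Toy data: y_i ~ N(theta, 1) with a N(0, 10^2) prior on theta (illustrative).
rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)

theta_grid = np.linspace(-5.0, 10.0, 2001)

def log_unnorm_posterior(theta):
    # log f(y|theta) + log pi(theta), dropping theta-free constants
    loglik = -0.5 * np.sum((y[:, None] - theta[None, :]) ** 2, axis=0)
    logprior = -0.5 * theta ** 2 / 10.0 ** 2
    return loglik + logprior

logpost = log_unnorm_posterior(theta_grid)
post = np.exp(logpost - logpost.max())    # stabilise before exponentiating
post /= np.trapz(post, theta_grid)        # numerical normalising constant Z

posterior_mean = np.trapz(theta_grid * post, theta_grid)
```

Because this toy model is conjugate, the grid-based posterior mean essentially coincides with the sample mean (the prior is very diffuse), which makes the sketch easy to check.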
Given that directly sampling from the posterior distribution π(θ|y) is generally
infeasible, the Markov chain Monte Carlo (MCMC) method (Metropolis et al., 1953;
Hastings, 1970) described below is the most widely used sampling method.
2.2 Markov chain Monte Carlo (MCMC)
MCMC produces an ergodic Markov chain whose stationary distribution is the pos-
terior distribution itself. If the chain is run for a sufficiently long time, then the
samples produced by the chain can be regarded as correlated samples from the
posterior distribution. One standard MCMC algorithm is the Metropolis-Hastings
(MH) algorithm, which is straightforward to implement. The MH algorithm is pre-
sented in Algorithm 2.1. Under the MH algorithm, a proposal distribution q(θ∗|θ)
should be carefully chosen, where the distribution of the new candidate value θ∗
depends on the current state θ. The decision of accepting or rejecting θ∗ depends
on the following acceptance probability:
α(θ → θ∗) = min( 1, [f(y|θ∗)π(θ∗)q(θ|θ∗)] / [f(y|θ)π(θ)q(θ∗|θ)] ).
The reader is referred to Chib and Greenberg (1995) for the theoretical details of
MH algorithms.
Algorithm 2.1 Metropolis-Hastings
1: Initialise θ_0
2: for all iterations t ∈ 1, ..., T do
3:   Given the current parameter value θ = θ_{t−1}, draw a proposal θ∗ ∼ q(θ∗|θ)
4:   Calculate the acceptance probability:
       α(θ → θ∗) = min( 1, [f(y|θ∗)π(θ∗)q(θ|θ∗)] / [f(y|θ)π(θ)q(θ∗|θ)] )
5:   Set θ_t ← θ∗ with probability α(θ → θ∗); otherwise set θ_t ← θ
6: end for
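A hedged sketch of Algorithm 2.1 for a toy target, assuming a normal-mean model with a flat prior and a symmetric random-walk proposal (so the q terms cancel in the acceptance ratio); the step size and chain length are illustrative choices, not from the thesis:

```python
import numpy as np

# Toy posterior: y_1..y_n ~ N(theta, 1) with a flat prior, so the exact
# posterior is N(ybar, 1/n). Illustrative tuning values throughout.
rng = np.random.default_rng(1)
y = rng.normal(1.5, 1.0, size=100)

def log_post(theta):
    return -0.5 * np.sum((y - theta) ** 2)   # flat prior: log-likelihood only

T, step = 20000, 0.3
theta = 0.0
chain = np.empty(T)
for t in range(T):
    prop = theta + step * rng.normal()       # symmetric proposal
    log_alpha = log_post(prop) - log_post(theta)
    if np.log(rng.uniform()) < log_alpha:    # accept with prob min(1, ratio)
        theta = prop
    chain[t] = theta

burned = chain[5000:]                        # discard burn-in
```

After burn-in, the sample mean and standard deviation of the chain approximate the exact posterior mean (the sample mean of y) and standard deviation (1/√n).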
When the likelihood f(y|θ) is intractable or it is computationally costly to directly
evaluate the likelihood, a pseudo-marginal MH approach can be used by constructing
an unbiased estimator of the likelihood f(y|θ) (Andrieu and Roberts, 2009). The
unbiased estimator f̂(y|θ) can be defined as

f̂(y|θ) := f(y|θ,u),   E[f(y|θ,u)] = ∫ f(y|θ,u) f(u) du = f(y|θ),
where u represents the auxiliary random variables with density f(u), which are
required to produce the unbiased likelihood estimator. In most implementations of
pseudo-marginal MH these auxiliary random variables u are simply random numbers,
drawn independently of θ. In addition, by assumption,
f(θ,u) = f(u)π(θ). Then the joint posterior distribution of θ and the random
numbers can be written as

π(θ,u|y) = f(y|θ,u) f(u) π(θ) / f(y).
Now, the exact posterior distribution can be obtained by marginalising u out of
π(θ,u|y):

∫ π(θ,u|y) du = ∫ [f(y|θ,u) f(u) π(θ) / f(y)] du
              = [π(θ) / f(y)] ∫ f(y|θ,u) f(u) du
              = [π(θ) / f(y)] f(y|θ)
              = π(θ|y).
Accordingly, we can apply MH MCMC to sample from the joint target distribution
π(θ,u|y) and marginalise simply by discarding the generated auxiliary variates u.
Under the pseudo-marginal MH approach, the stationary distribution of the Markov
chain is the same as it would be if the true likelihood were available. Therefore,
the exact likelihood f(y|θ) can be replaced by its unbiased estimator f̂(y|θ), with
the corresponding MCMC algorithm still converging to the posterior distribution
π(θ|y), as demonstrated above.
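To illustrate the unbiasedness property, consider a toy model in which the likelihood is actually tractable, so the estimator can be checked against the exact value: y = θ + u + ε with u ~ N(0,1) and ε ~ N(0,1), giving f(y|θ) = N(y; θ, 2). All names and sizes below are illustrative. Averaging f(y|θ, u) over simulated auxiliary draws u gives an unbiased estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, y = 0.5, 1.2

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def lik_hat(M=200):
    u = rng.normal(size=M)                       # auxiliary variables u ~ f(u)
    return normal_pdf(y, theta + u, 1.0).mean()  # (1/M) sum_m f(y|theta, u_m)

# The marginal likelihood is known here, so unbiasedness can be verified:
exact = normal_pdf(y, theta, np.sqrt(2.0))
estimates = np.array([lik_hat() for _ in range(2000)])
```

The estimates are noisy individually, but their average converges to the exact likelihood value, which is the property the pseudo-marginal argument relies on.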
Although MCMC is an undoubtedly useful method, it does have some limitations.
The proposal distribution q(θ∗|θ) can be difficult to tune for computationally ef-
ficient performance in the presence of challenging posterior distributions, such as
those that are strongly non-normally distributed and possibly multimodal. Adap-
tive MCMC methods (e.g. Haario et al., 2001, 2006; Roberts and Rosenthal, 2009)
can help to improve the efficiency of MCMC. However, they are delicate from a theo-
retical perspective and optimal acceptance rates targeted by some adaptive methods
are based on strong posterior distribution assumptions, which may not hold in chal-
lenging applications. An alternative method, sequential Monte Carlo (SMC), has
been developed, which can be viewed as complementary to MCMC. SMC
has several advantages over MCMC that make it more appealing for some appli-
cations. Further, running multiple algorithms targeting the same posterior can be
used to verify the results of each algorithm. A more detailed description of this
method is given in the next section.
2.3 Sequential Monte Carlo (SMC)
Here, we focus on the use of SMC for static Bayesian models (Chopin, 2002).
The SMC samplers discussed in this thesis can be viewed as an extension of the
importance sampling (IS; see, e.g., Neal 2001) method, so we first explain the
method of IS here. Suppose that directly sampling from our target posterior distribution
π(θ|y) is difficult, but we can easily sample from an importance distribution g(θ)
constructed to approximate the target distribution. Then we can draw a collection
of independent samples θ^i ∼ g(·), for i = 1, ..., N, and compute the importance
weights of these samples so that they reflect the target posterior distribution:

w^i ∝ f(y|θ^i)π(θ^i) / g(θ^i).

We then normalise the weights so that they sum to one, W^i = w^i / ∑_{j=1}^{N} w^j,
and obtain a weighted sample {W^i, θ^i}_{i=1}^{N} from the target distribution. This weighted
sample can be used to estimate the density of our target distribution π(θ|y) and
any expectations with respect to it. The efficiency of this weighted sample can be
measured by the effective sample size (ESS; Liu, 2008), which represents the
equivalent number of independent and identically distributed samples among the N generated
samples. The ESS is computed as:

ESS = N / (1 + CV[w(θ)]),

where CV[w(θ)] is the coefficient of variation of the unnormalised weights. The ESS
can be approximated using the normalised weights as 1 / ∑_{i=1}^{N} (W^i)^2, and its
value ranges from 1 to N. The ESS is maximised when all the weights are equal,
which occurs when the ratio of the target over the importance distribution equals
one (i.e. the importance distribution
is exactly the target density). For IS to work well, the importance distribution
should be a good approximation to our target distribution and provide sufficient tail
coverage of the target. However, it is not always easy to find a good importance
distribution g(·), especially when the target is complex. SMC attempts to overcome
the drawbacks of IS by exploring a sequence of importance distributions.
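A short sketch of IS and the ESS approximation 1/∑(W^i)², assuming a toy unnormalised target N(3, 1) and a deliberately wide importance distribution N(0, 4²) chosen for tail coverage; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50000
theta = rng.normal(0.0, 4.0, size=N)            # theta^i ~ g

log_target = -0.5 * (theta - 3.0) ** 2          # unnormalised log target
log_g = -0.5 * (theta / 4.0) ** 2 - np.log(4.0)
logw = log_target - log_g                       # log importance weights
W = np.exp(logw - logw.max())
W /= W.sum()                                    # normalised weights W^i

post_mean = np.sum(W * theta)                   # weighted estimate of E[theta]
ess = 1.0 / np.sum(W ** 2)                      # ESS approximation
```

Because the importance distribution covers the tails of the target, the ESS remains a large fraction of N; a mismatched g would drive it towards 1.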
In SMC, a set of N weighted particles, {W_t^i, θ_t^i}_{i=1}^{N}, is traversed through a sequence
of posterior distributions. Therefore, we need to build a sequence of distributions
πt(θ|y) for t = 0, . . . , T , which explore the ultimate target distribution gradually
and where πT (θ|y) equals π(θ|y). One method to construct the sequence of dis-
tributions is the data annealing strategy (Chopin, 2002). Under this method the
target sequence is created as πt(θ|y1:t) ∝ f(y1:t|θ)π(θ), where y1:t represents the
observation data up to time t. Another method called likelihood annealing (Neal,
2001) can also be used to generate the sequence when the whole observed dataset is
already available. The sequence formed by the likelihood annealing method is given
by:
π_t(θ|y) = f(y|θ)^{γ_t} π(θ) / Z_t ∝ f(y|θ)^{γ_t} π(θ),

where γ_t is referred to as the temperature and 0 = γ_0 ≤ · · · ≤ γ_t ≤ · · · ≤ γ_T = 1.
We use both of the methods in this thesis, however, the data annealing approach
may be preferred when data are observed sequentially. A more detailed analysis and
comparison between these two sequence constructions is demonstrated in Chapter
3. As with MCMC, unbiased likelihood estimators can be used within SMC, and
exact Bayesian inference (up to Monte Carlo error) can still be performed (Chopin
et al., 2013; Duan and Fulop, 2015).
In order to generate a properly weighted sample for target t, we need to reweight
the particle set at target t−1, {W_{t−1}^i, θ_{t−1}^i}_{i=1}^{N}. The reweighting step
can be achieved by IS using:

w_t^i = W_{t−1}^i · η_t(θ_{t−1}^i|y) / η_{t−1}(θ_{t−1}^i|y),

(Chopin, 2002), for i = 1, ..., N, where w_t^i is an unnormalised weight and η_t(·)
is the unnormalised version of π_t(·). The so-called incremental weight is calculated
by dividing the unnormalised target density at t, η_t(θ_{t−1}^i|y), by the unnormalised
target density at t−1, η_{t−1}(θ_{t−1}^i|y). Unfortunately, the reweighting step generally
leads to reductions in the ESS, so we need to resample the particles periodically by
duplicating particles with high weights and eliminating particles with low weights.
This resets the approximate ESS back to N .
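One reweight/normalise/resample cycle can be sketched as follows; the incremental weight here is a stand-in toy quantity rather than a particular model's η_t/η_{t−1}, and the α = 0.5 threshold is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000
theta = rng.normal(size=N)              # particles targeting pi_{t-1}
W = np.full(N, 1.0 / N)                 # normalised weights W_{t-1}^i

log_incr = -0.05 * theta ** 2           # toy log incremental weight
logw = np.log(W) + log_incr             # unnormalised log-weights w_t^i
W = np.exp(logw - logw.max())
W /= W.sum()                            # normalise

ess = 1.0 / np.sum(W ** 2)              # ESS approximation
if ess < N / 2:                         # resample if too degenerate
    idx = rng.choice(N, size=N, p=W)    # multinomial resampling
    theta = theta[idx]
    W = np.full(N, 1.0 / N)             # reset weights; ESS back to N
```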
However, the resampling strategy will result in duplication, especially for particle
values with relatively large weight. A moving step is required to increase the number
of unique particles. The moving step is usually the most computationally expensive
part of the SMC process, and the MCMC kernels of invariant distribution πt are
commonly employed in the literature. Generally, in static Bayesian models, R_t
iterations of a multivariate normal random-walk MCMC kernel (Chopin, 2002) are
taken; the approach is designed to diversify the particles. We propose to use
the MH MCMC algorithm in Algorithm 2.1 for the moving step. It is important to
note that the population of particles can be used to automatically tune the param-
eters of the MH proposal density. For example, if a multivariate normal random
walk proposal is used, the covariance matrix may be estimated from the population
of particles. As the proposals of particles can be rejected by the MCMC kernel, it
is then necessary to repeat the MCMC kernel Rt times to maintain a target level of
diversity.
According to Drovandi and Pettitt (2011), R_t can be determined adaptively so that
there is at least one accepted proposal per particle with probability 1 − c:
R_t = ⌈ln(c) / ln(1 − p_{t−1})⌉, where p_{t−1} is the MCMC acceptance probability
at the previous move step. This method is straightforward to implement, but the
previous MCMC acceptance probability may not be a good proxy
ment, but the previous MCMC acceptance probability may not be a good proxy
for the current MCMC acceptance probability. Furthermore, it focuses on accep-
tance rate rather than the distance the particles are moved. Salomone et al. (2018)
propose a more robust method that considers the distance the particles are moved,
using trial MCMC iterations at the current target in SMC to obtain a more
suitable value of R_t.
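The Drovandi and Pettitt rule can be written directly; the function name and the default c = 0.01 are illustrative choices:

```python
import math

# Choose R_t so that each particle has at least one accepted MCMC proposal
# with probability 1 - c, given the acceptance rate p from the previous move.
def num_mcmc_repeats(p_prev, c=0.01):
    return math.ceil(math.log(c) / math.log(1.0 - p_prev))
```

For example, with a previous acceptance rate of 0.3 and c = 0.01, the rule gives R_t = ⌈ln(0.01)/ln(0.7)⌉ = 13.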
Salomone et al. (2018) propose that Rt can be determined adaptively by choosing an
optimal tuning scale for the covariance matrix of the MCMC proposal distribution.
In this thesis the proposal distribution used in the move step is a multivariate
normal distribution, q(θ_t^{i,∗}|θ_t^i) = N(θ_t^i, h^2 Σ_t), where Σ_t is the covariance
matrix estimated from the population of particles and h is an optimal scale that
needs to be determined
appropriately. The optimal scale is selected based on the expected square jumping
distance (ESJD; Sherlock and Roberts, 2009). We randomly assign each particle one
of several tuning scales and we estimate the ESJD for each particle using a single
MCMC iteration. The tuning scale which resulted in the highest median ESJD value
is selected as optimal. The ESJD is given by:
ESJD = α(θ_t^i → θ∗) Λ_t^i = α(θ_t^i → θ∗) (θ_t^i − θ∗)^T Σ_t^{−1} (θ_t^i − θ∗),
which is the product of the MH acceptance ratio and the Mahalanobis square jump-
ing distance (Salomone et al., 2018), i = 1, . . . , N , and t = 0, . . . , T . After obtaining
the optimal scale, we keep applying an MCMC kernel to all the particles until a
target number of particles’ ESJD values are higher than a threshold. We set this
threshold at the median value of the Mahalanobis distances between each pair of
particles before the move step. As can be seen, the total number of MCMC itera-
tions can be adaptively attained after meeting the threshold. The SMC sampling
method is summarized in Algorithm 2.2.
Algorithm 2.2 SMC Sampler
1: Sample θ_0^i ∼ π(·) and set W_0^i = 1/N for i = 1, ..., N
2: for t = 1, ..., T do
3:   Reweight: w_t^i = W_{t−1}^i · η_t(θ_{t−1}^i|y) / η_{t−1}(θ_{t−1}^i|y)
4:   Set θ_t^i = θ_{t−1}^i for i = 1, ..., N
5:   Normalise the weights: W_t^i = w_t^i / ∑_{k=1}^{N} w_t^k for i = 1, ..., N
6:   Compute the ESS: ESS = 1 / ∑_{i=1}^{N} (W_t^i)^2
7:   if ESS < αN, with α ∈ (0, 1] then
8:     Resample the particles and set W_t^i = 1/N, producing a new weighted
       particle set {W_t^i, θ_t^i}_{i=1}^{N}
9:     Compute Σ_t from the particle set {θ_t^i}_{i=1}^{N}
10:    Compute the median value D_desired of the Mahalanobis distances between
       each pair of particles
11:    Move each particle with an MCMC kernel with randomly selected scales h
12:    Estimate the ESJD for each particle, and set the optimal tuning scale h
       as the one that produces the highest median ESJD
13:    while the number of particles satisfying ESJD > D_desired is below a
       target number do
14:      Move each particle according to an MCMC kernel with the proposal
         distribution q(θ_t^{i,∗}|θ_t^i)
15:    end while
16:  end if
17: end for
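Algorithm 2.2 can be sketched compactly for a toy model (y_i ~ N(θ, 1) with a N(0, 10²) prior). To keep the sketch short, several of the adaptive choices described above are replaced by fixed illustrative ones: a cubic likelihood-annealing temperature schedule, a resampling threshold of α = 0.5, a single random-walk scale taken from the particle spread rather than ESJD-based scale selection, and a fixed R_t = 10 move steps:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(2.0, 1.0, size=200)

def loglik(theta):
    # log f(y|theta), vectorised over a 1-D array of particles
    return -0.5 * np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)

def logprior(theta):
    return -0.5 * theta ** 2 / 100.0         # N(0, 10^2), up to a constant

N = 2000
theta = rng.normal(0.0, 10.0, size=N)        # theta_0^i ~ pi(.)
logW = np.full(N, -np.log(N))
gammas = np.linspace(0.0, 1.0, 21) ** 3      # fixed cubic temperature schedule

for g_prev, g in zip(gammas[:-1], gammas[1:]):
    logW += (g - g_prev) * loglik(theta)     # reweight step
    W = np.exp(logW - logW.max())
    W /= W.sum()
    if 1.0 / np.sum(W ** 2) < 0.5 * N:       # ESS < alpha * N
        theta = theta[rng.choice(N, size=N, p=W)]   # multinomial resampling
        logW = np.full(N, -np.log(N))
        step = 2.38 * theta.std() + 1e-8     # scale tuned from the particles
        for _ in range(10):                  # fixed R_t = 10 move steps
            prop = theta + step * rng.normal(size=N)
            log_a = (g * loglik(prop) + logprior(prop)
                     - g * loglik(theta) - logprior(theta))
            accept = np.log(rng.uniform(size=N)) < log_a
            theta = np.where(accept, prop, theta)

W = np.exp(logW - logW.max())
W /= W.sum()
post_mean = np.sum(W * theta)
post_sd = np.sqrt(np.sum(W * (theta - post_mean) ** 2))
```

The weighted particle mean and standard deviation approximate the exact posterior, which for this conjugate toy model is close to N(ȳ, 1/n); the simplifications above are stand-ins for the adaptive schedule, scale selection, and R_t rules described in the text.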
2.4 GARCH models
2.4.1 Introduction to GARCH family models
Research into GARCH-type models remains a highly active area due to their success
in modelling the volatility of financial time series data (see, for example, Giraitis
et al. (2007), Alberg et al. (2008), Kristjanpoller and Minutolo (2015, 2016), and
Dyhrberg (2016)). The standard GARCH model introduced by Bollerslev (1986)
successfully established a dynamic model of time-varying volatility which captures
some of the important characteristics of financial asset returns. The features of
financial asset returns generally consist of unpredictability, numerous extreme values
indicating distributions with heavy tails, and volatility clustering (Engle, 1982). The
standard GARCH model has been shown to capture these features (Engle, 2004).
This thesis focuses on the application of the GARCH model to stock returns.
Despite the long history of, and extensive literature on the classical GARCH model,
the model is not free of limitations. The classical GARCH model does not account
for the stylised feature of the leverage effect (Black, 1976), an asymmetric impact
of past negative and positive returns on future volatility. In many empirical studies
on stocks (Koutmos, 1998; Bouchaud et al., 2001; Khedhiri and Muhammad, 2008),
volatility increases by a larger amount following a decrease in the stock return
than following an increase of the same size. However, the classical GARCH model
considers only the magnitude, not the sign, of returns when modelling the conditional variance.
The standard GARCH model also imposes a strict non-negativity constraint on
coefficients to ensure a positive variance.
Numerous extensions have been proposed to address the problems and limitations of
the standard GARCH model. The exponential GARCH, or EGARCH (Nelson, 1991),
successfully captures the asymmetric response of volatility to decreases and increases
in stock returns. Since the conditional variance under the EGARCH model is specified
in logarithmic form, it can take values on the whole real line and positivity constraints
on the parameters can be avoided. In this thesis the EGARCH model will not be
investigated in further detail. Instead, the GJR-GARCH (Glosten et al., 1993)
model, which is one of the most popular GARCH-type models and an improvement
over the classical GARCH model, is considered here. The GJR-GARCH model has
a simpler expression compared with the EGARCH model, and also accounts for
leverage effect. Engle and Ng (1993) suggest that the GJR-GARCH model captures
the leverage effect in stock returns best among other popular nonlinear models such
as the EGARCH model.
The GJR-GARCH model, however, originally assumed a Gaussian distribution for
the return innovations. Such a specification usually fails to generate a
non-Gaussian distribution for returns, which is one of the most important properties
of financial time series (Francq and Zakoïan, 2011). Thus, we also consider an
alternative model based on the GJR-GARCH model that accommodates conditional
non-Gaussianity, called the bad environment-good environment (BEGE) model
(Bekaert et al., 2015). The three GARCH-type models investigated
in this thesis are specifically presented below.
2.4.2 Classical GARCH(1,1) model
GARCH models are observation driven, where the conditional variance is a function
of not only its past values, but also past innovations to returns. To begin, define
the innovation in returns, u_t, as a shock to returns:

r_t = µ + u_t,   (2.1)

u_t = σ_t ε_t,   ε_t ~ i.i.d. N(0, 1),   (2.2)

where r_t is the time series of asset returns with fixed mean µ, and u_t has a
time-varying conditional variance σ_t^2, with different GARCH models defining
different processes for σ_t^2. In what follows we present the three models within
the GARCH family considered in this thesis.
Under the classical GARCH(p, q) model, the conditional variance is given by:

\[ \sigma_t^2 = \alpha_0 + \sum_{j=1}^{q} \alpha_j u_{t-j}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2, \quad t \in \mathbb{Z}, \tag{2.3} \]

where \alpha_0 > 0, \alpha_j > 0, \beta_i > 0, p > 0, q > 0, for i = 1, \ldots, p and j = 1, \ldots, q, and the
stationarity condition requires that \sum_{j=1}^{q} \alpha_j + \sum_{i=1}^{p} \beta_i < 1 (Bollerslev, 1986). The wide
success of the parsimonious GARCH(1, 1) model in extensive empirical applications
makes this model the workhorse of financial applications (see, for example, Engle
(2001); Alexander and Lazar (2006); Francq and Zakoïan (2011)). The GARCH(1, 1)
model considered here is given by:
\[ \sigma_t^2 = \alpha_0 + \beta_1 \sigma_{t-1}^2 + \alpha_1 u_{t-1}^2, \tag{2.4} \]
where α0, α1, and β1 are subject to the constraints discussed above in the general
GARCH(p, q) model. The log-likelihood function of the GARCH(1, 1) model is given
by:
\[ \mathcal{L}(r \mid \theta_{\text{Garch}}) = \sum_{t=1}^{T} \frac{1}{2} \left[ -\ln \sigma_t^2 - \frac{u_t^2}{\sigma_t^2} - \ln 2\pi \right], \tag{2.5} \]
where r is the observed time series of asset returns, and \theta_{\text{Garch}} is the parameter vector of
the GARCH(1, 1) model.
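To make the variance recursion in (2.4) and the likelihood in (2.5) concrete, the sketch below evaluates the GARCH(1,1) log-likelihood (a Python illustration with NumPy assumed; initialising \sigma_1^2 at the sample variance is one common convention, and the example parameter values are arbitrary rather than estimates from this thesis):

```python
import numpy as np

def garch11_loglik(params, r):
    """Gaussian GARCH(1,1) log-likelihood, Equations (2.4)-(2.5).

    params = (mu, alpha0, alpha1, beta1); r is a 1-D array of returns.
    Returns -inf outside the positivity/stationarity region.
    """
    mu, alpha0, alpha1, beta1 = params
    if min(alpha0, alpha1, beta1) <= 0 or alpha1 + beta1 >= 1:
        return -np.inf
    u = r - mu                        # innovations, Equation (2.1)
    sigma2 = np.empty_like(u)
    sigma2[0] = np.var(u)             # one common initialisation choice
    for t in range(1, len(u)):        # variance recursion, Equation (2.4)
        sigma2[t] = alpha0 + beta1 * sigma2[t - 1] + alpha1 * u[t - 1] ** 2
    # Gaussian log-likelihood, Equation (2.5)
    return 0.5 * np.sum(-np.log(sigma2) - u**2 / sigma2 - np.log(2 * np.pi))

# Example on simulated returns (illustrative parameter values)
rng = np.random.default_rng(0)
r = rng.normal(0.0, 1.0, size=500)
print(garch11_loglik((0.0, 0.1, 0.1, 0.8), r))
```

A routine of this form is all that a posterior sampler, whether MCMC or SMC, needs to call when evaluating or re-weighting a parameter value.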
2.4.3 GJR-GARCH(1,1) model
The asymmetric GJR-GARCH(1, 1) model is a GARCH-type model with a threshold
indicator function; its conditional variance \sigma_t^2 differs from that of the
classical GARCH model and is defined as:

\[ \sigma_t^2 = \alpha_0 + \beta_\sigma \sigma_{t-1}^2 + \phi u_{t-1}^2 + \phi^- u_{t-1}^2 I_{u_{t-1} < 0}, \tag{2.6} \]
where I_{u_{t-1} < 0} is an indicator variable that takes the value 1 if the innovation is negative,
and 0 otherwise; \alpha_0 > 0, \beta_\sigma > 0, \phi^- > 0 and \phi > 0 ensure that the conditional
variance is positive, and \beta_\sigma + \phi + 0.5\phi^- < 1 ensures that the conditional variance is
stationary. In this model, negative shocks increase the conditional variance more
than positive shocks of the same magnitude, reflecting the leverage effect in stock returns.
The log-likelihood function of the GJR-GARCH(1, 1) model has the same expression as that of
the GARCH(1, 1) model provided in Equation (2.5).
2.4.4 Bad Environment-Good Environment (BEGE) model
The BEGE model is based on non-Gaussian innovations and is defined as:

\[ u_t = \sigma_p \omega_{p,t} - \sigma_n \omega_{n,t}, \quad \text{where } \omega_{p,t} \sim \Gamma(p_t, 1) \text{ and } \omega_{n,t} \sim \Gamma(n_t, 1), \tag{2.7} \]
where \Gamma(k, \theta)^1 denotes the centered gamma distribution (Bekaert et al., 2015),
which has zero mean. The innovation u_t is a linear combination of a bad environment
component \omega_{n,t}, with shape parameter n_t, and a good environment component
\omega_{p,t}, with shape parameter p_t. The conditional distribution of the shock u_t is therefore
generated from two gamma-distributed shocks. The time-varying shape parameters p_t
and n_t are given by:
\[ p_t = p_0 + \rho_p p_{t-1} + \frac{\phi_p^+}{2\sigma_p^2} u_{t-1}^2 I_{u_{t-1} \ge 0} + \frac{\phi_p^-}{2\sigma_p^2} u_{t-1}^2 \left(1 - I_{u_{t-1} \ge 0}\right), \quad \text{and} \tag{2.8} \]
\[ n_t = n_0 + \rho_n n_{t-1} + \frac{\phi_n^+}{2\sigma_n^2} u_{t-1}^2 I_{u_{t-1} \ge 0} + \frac{\phi_n^-}{2\sigma_n^2} u_{t-1}^2 \left(1 - I_{u_{t-1} \ge 0}\right). \tag{2.9} \]
The overall conditional variance \sigma_t^2 of u_t in the BEGE model is:

\[ \sigma_t^2 \equiv \operatorname{var}_t(r_t) = \sigma_p^2 p_t + \sigma_n^2 n_t, \tag{2.10} \]
which is derived from the moment generating function of the centered gamma distri-
bution (Bekaert et al., 2015). As with the conditional variance, moments of higher
order such as conditional skewness and kurtosis can be easily derived with tractable
expressions. These higher-order conditional moments are time-varying compared
with traditional asymmetric volatility GARCH models.
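To illustrate how the recursions (2.8)-(2.10) operate, the sketch below (Python with NumPy assumed; the parameter values are arbitrary illustrations, not estimates from this thesis, and initialising p_1 and n_1 at the intercepts is a simplifying assumption) propagates the shape parameters along an innovation series and returns the implied conditional variance path:

```python
import numpy as np

def bege_variance_path(u, p0, n0, rho_p, rho_n,
                       phi_p_pos, phi_p_neg, phi_n_pos, phi_n_neg,
                       sigma_p, sigma_n):
    """Shape-parameter recursions (2.8)-(2.9) and conditional variance (2.10).

    u is the innovation series; returns arrays (p, n, sigma2).
    """
    T = len(u)
    p, n = np.empty(T), np.empty(T)
    p[0], n[0] = p0, n0                       # simple initialisation choice
    for t in range(1, T):
        pos = 1.0 if u[t - 1] >= 0 else 0.0   # indicator I_{u_{t-1} >= 0}
        u2 = u[t - 1] ** 2
        p[t] = (p0 + rho_p * p[t - 1]
                + phi_p_pos / (2 * sigma_p ** 2) * u2 * pos
                + phi_p_neg / (2 * sigma_p ** 2) * u2 * (1 - pos))
        n[t] = (n0 + rho_n * n[t - 1]
                + phi_n_pos / (2 * sigma_n ** 2) * u2 * pos
                + phi_n_neg / (2 * sigma_n ** 2) * u2 * (1 - pos))
    sigma2 = sigma_p ** 2 * p + sigma_n ** 2 * n   # Equation (2.10)
    return p, n, sigma2

rng = np.random.default_rng(1)
u = rng.normal(0.0, 0.05, size=200)
p, n, sigma2 = bege_variance_path(u, 0.2, 0.3, 0.7, 0.8,
                                  0.1, 0.2, 0.05, 0.3, 0.02, 0.03)
print(sigma2[:3])
```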
The log-likelihood function of the BEGE model is written as:

\[ \mathcal{L}(r \mid \theta_{\text{BEGE}}) = \sum_{t=1}^{T} \ln f_{\text{BEGE}}(r_t \mid r_{t-1}, \ldots, r_1; \theta_{\text{BEGE}}), \tag{2.11} \]

^1 The probability density function of \Gamma(k, \theta) is f(x) = \frac{1}{\Gamma(k)\theta^k} (x + k\theta)^{k-1} \exp\left( -\frac{1}{\theta}(x + k\theta) \right) for x > -k\theta.
where \theta_{\text{BEGE}} is the parameter vector of the BEGE model, and the conditional likelihood
of observation r_t, f_{\text{BEGE}}(r_t \mid r_{t-1}, \ldots, r_1; \theta_{\text{BEGE}}), is approximated numerically
following Bekaert et al. (2015). The cumulative distribution function (CDF) corresponding to
f_{\text{BEGE}}(r_t \mid r_{t-1}, \ldots, r_1; \theta_{\text{BEGE}}) is numerically evaluated at two points, one
just above and one just below the observation r_t. A finite difference approximation is then used to
obtain the derivative of the CDF, which approximates the probability density function
of r_t. Owing to the nature of this deterministic numerical approximation, the approximated
likelihood of r_t is biased. In Chapter 3, we propose an
unbiased estimator of the likelihood of r_t under the BEGE model based on importance
sampling.
2.4.5 Parameter estimation methods
As researchers explore increasingly complex extensions to the standard GARCH
model, the corresponding parameter estimation problem becomes more challeng-
ing. The generally favored approach for the estimation of GARCH-type models
is maximum likelihood estimation (MLE). The straightforward implementation of
MLE technique makes it popular. Bollerslev and Wooldridge (1992) established the
asymptotic distribution of the quasi-maximum likelihood estimator (QMLE) for the
standard GARCH model under high level assumptions. Traditionally, a Gaussian
innovation is assumed when applying the QMLE technique to estimate the GARCH
model, whilst the true innovation may be far from Gaussian.
However, for many financial time series the empirical evidence indicates that the
distribution of the innovation is asymmetric and heavy-tailed, which leads to the
investigation of QMLE for heavy tailed errors, see e.g. Berkes et al. (2003) and
Hall and Yao (2003). Due to the simplicity principle, the QMLE approach has
been widely adopted in the literature, see e.g. Caporale et al. (2015), Bekaert
et al. (2015), Hansen and Huang (2016) and Guesmi et al. (2019). However, some
difficulties may be encountered in practice when dealing with the MLE of GARCH-
type models (Ardia et al., 2008), such as the cumbersome inequality constraints
are normally considered in the numerical optimization procedure. The Bayesian
approach provides some advantages compared to classical estimation techniques and
offers an attractive alternative.
Importance sampling has been used to sample from the posterior distribution of the
parameters of GARCH-type models (e.g. Geweke (1989)); however, the choice of
the importance distribution can be the main obstacle to its applicability. Alternatively,
MCMC methods have been popular for the Bayesian estimation of GARCH-type
models. Ardia and Hoogerheide (2010) note that a simple Gibbs sampling procedure
cannot be used because of the recursive nature of the conditional variance
in GARCH-type models. The Griddy-Gibbs sampler (Ritter and Tanner, 1992) has
been adopted by Bauwens and Lubrano (1998, 2002); this approach is
intuitive and easy to implement, but the procedure is extremely time consuming.
Metropolis-Hastings (MH) MCMC is another algorithm used for estimating GARCH-type model
parameters. Since the proposal distribution in MH MCMC needs to be tuned to produce
reasonable acceptance rates and hence a better Markov chain, the algorithm is not
automatic, which makes it undesirable. MH MCMC has been used by
Muller and Pole (1998), Vrontos et al. (2000), Chen et al. (2006) and Gerlach and
Chen (2008), among others. Nakatsuma (1998, 2000) proposes an automatic MH
algorithm that helps to construct better proposal densities. The reader is referred
to Virbickaite et al. (2015) for a comprehensive review of Bayesian approaches to
GARCH-type models. To the best of our knowledge, SMC methods have not been
established in the literature for Bayesian inference on GARCH-type models with
fixed parameters, which motivates the work of this thesis.
2.5 Model Selection
Practitioners are not only interested in single model analysis, but also the relative
performance of competing models. Under a fully Bayesian framework, the relative
performance of one model against another candidate model is assessed using the
Bayes factor (Jeffreys, 1935; Kass and Raftery, 1995). Suppose there are M candidate
models (m = 1, \ldots, M); the posterior model probability P(m \mid y) is given by
Bayes' theorem:

\[ P(m \mid y) = \frac{P(y \mid m)\, P(m)}{P(y)}, \]
the posterior odds of two different models, m_1 and m_2, is then given by:

\[ \frac{P(m = m_1 \mid y)}{P(m = m_2 \mid y)} = \frac{P(y \mid m = m_1)}{P(y \mid m = m_2)} \cdot \frac{P(m = m_1)}{P(m = m_2)}. \]
When the prior probabilities of the two models are equal, the posterior odds
is just the Bayes factor. The Bayes factor comparing two different models with
parameter vectors \theta_{m_1} and \theta_{m_2} can be expressed as:
\[ BF_{m_1, m_2} = \frac{P(y \mid m = m_1)}{P(y \mid m = m_2)} = \frac{\int P(y \mid \theta_{m_1}, m = m_1)\, P(\theta_{m_1} \mid m = m_1)\, d\theta_{m_1}}{\int P(y \mid \theta_{m_2}, m = m_2)\, P(\theta_{m_2} \mid m = m_2)\, d\theta_{m_2}}, \]
which is just the ratio of normalising constants under each model. As the Bayes
factor transforms the prior odds to posterior odds by observing data, the Bayes factor
indicates how strongly one model is supported by the data compared with another
model. The normalising constant is also referred to as the marginal likelihood or
evidence. The larger the evidence in favour of model m1 relative to m2 is, the greater
the Bayes factor BFm1,m2 . One challenge of implementing Bayes factor is that the
evidence is generally not easy to calculate quickly and accurately. A review of some
commonly used methods of estimating the model evidence is given in Friel and Wyse
(2012).
Fortunately, an estimate of the evidence can be obtained as a by-product of
SMC methods. The evidence Z can be computed as Z = Z_T / Z_0 = \prod_{t=1}^{T} Z_t / Z_{t-1}
with Z_0 = 1 (Del Moral et al., 2006). Under likelihood annealing
SMC, the ratio of normalising constants Z_t / Z_{t-1} is given by:

\[ \frac{Z_t}{Z_{t-1}} = \int_{\theta} f(y \mid \theta)^{\gamma_t - \gamma_{t-1}}\, \pi_{t-1}(\theta \mid y)\, d\theta. \]

The estimate of the log evidence \ln Z is then given by:

\[ \ln Z = \sum_{t=1}^{T} \ln \mathbb{E}_{\pi_{t-1}}\left[ f(y \mid \theta)^{\gamma_t - \gamma_{t-1}} \right] \approx \sum_{t=1}^{T} \ln\left( \sum_{i=1}^{N} w_t^i \right), \]
where w_t^i is the unnormalised weight from Algorithm 2.2. A similar estimator under
data annealing can easily be derived. In Chapter 3 we use this estimator to compare
the three GARCH models on various datasets.
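In practice the running sum over temperatures is accumulated on the log scale to avoid numerical underflow of small weights. A minimal sketch (Python; the weight arrays here are toy placeholders standing in for the unnormalised weights produced by an SMC re-weighting step such as Algorithm 2.2):

```python
import numpy as np

def log_evidence_from_logweights(log_w_per_step):
    """Accumulate ln Z ~ sum_t ln( sum_i w_t^i ) from log unnormalised weights.

    log_w_per_step: list of 1-D arrays of log w_t^i, one per annealing step.
    Working on the log scale with a log-sum-exp guards against underflow.
    """
    log_Z = 0.0
    for lw in log_w_per_step:
        lw = np.asarray(lw, dtype=float)
        m = lw.max()                                # log-sum-exp for stability
        log_Z += m + np.log(np.exp(lw - m).sum())
    return log_Z

# Toy check: N equal weights of 1/N at each of T steps sum to 1 per step,
# so the accumulated log evidence is exactly 0 (up to floating-point error)
T, N = 5, 1000
toy = [np.full(N, np.log(1.0 / N)) for _ in range(T)]
print(log_evidence_from_logweights(toy))
```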
Other widely used model selection tools, such as the Bayesian information criterion
(BIC) (Schwarz et al., 1978), which provides a crude approximation to the logarithm
of the Bayes factor (Kass and Raftery, 1995), and the deviance information criterion (DIC)
(Spiegelhalter et al., 2002), have restrictive assumptions and limitations. In the BIC
approach, one chooses the model that minimizes

\[ \text{BIC} = -2 \ln P(y \mid \hat{\theta}) + p \ln n, \]

where n is the sample size, or equivalently the number of observations, p is the number
of model parameters, and \hat{\theta} is the maximum likelihood estimate of \theta. The BIC is an
asymptotic criterion: the probability that it selects the correct
model approaches one as the sample size goes to infinity (Trevor et al., 2009). When
the implicit prior assumed by the BIC is not a multivariate normal distribution, the BIC
will behave poorly (Kuha, 2004).
Spiegelhalter et al. (2002) introduced the DIC as a somewhat Bayesian measure of
model fit and complexity. The DIC is defined as

\[ \text{DIC} = -2 \ln P(y \mid \bar{\theta}_{\text{Bayes}}) + 2 p_D, \]
\[ p_D = 2\left( \ln P(y \mid \bar{\theta}_{\text{Bayes}}) - \mathbb{E}_{\theta \mid y}\left[ \ln p(y \mid \theta) \right] \right), \]

where \bar{\theta}_{\text{Bayes}} = \mathbb{E}(\theta \mid y) is the posterior mean, p_D is the effective number of parameters,
and the second term of p_D involves the posterior expectation of the log-likelihood.
With the rise of simulation methods such as MCMC for obtaining posterior distributions,
the DIC has become more popular because it can easily be computed from
simulation output. Nevertheless, the DIC is only valid when the posterior can be
approximated well by a multivariate normal distribution (Gelman et al., 2014).
In contrast, the fully Bayesian approach based on the Bayes factor, which is adopted
in this thesis, does not rely on such restrictive approximations.
Chapter 3
A fully Bayesian approach for GARCH-type models
3.1 Introduction
In many financial applications, volatility, defined as the conditional standard devi-
ation of asset returns, is an important quantity. Many financial applications are
based on volatility forecasts, such as risk management, asset pricing and portfolio
optimization (Engle, 2001). Therefore modelling of volatility has attracted a lot of
research attention. Among the most influential classes of models used to describe the
dynamics of volatility are the AutoRegressive Conditional Heteroscedasticity (ARCH)
model of Engle (1982) and the Generalized ARCH (GARCH) model of Bollerslev
(1986), a more parsimonious extension of ARCH. The standard GARCH
model of Bollerslev (1986) has successfully captured many characteristics of financial
asset returns, and forms the basis for a wide range of expressions used to capture var-
ious empirical features of returns data. For instance, in order to capture the stylised
fact of the leverage effect (Black, 1976), a range of models such as the exponen-
tial GARCH (EGARCH) model (Nelson, 1991), the Glosten-Jagannathan-Runkle
GARCH model (GJR-GARCH) of Glosten et al. (1993), the absolute value GARCH
(AGARCH) model of Hentschel et al. (1995), and the Bad Environment-Good En-
vironment (BEGE) model (Bekaert et al., 2015) have been developed. Given the
wide range of GARCH type models, it is crucial to develop an efficient statistical
inference framework to select between these models.
Frequentist inference based on maximum likelihood estimation (MLE) is commonly
employed with the GARCH class of models. A limitation of MLE is that it only
produces point estimates of parameters, with the uncertainty of these estimates
quantified via asymptotic theory in which the estimates are assumed to
be normally distributed (Hamilton, 1994). With the development of new financial
instruments based on volatility, such as volatility options, modelling the full predic-
tive density of volatility is attracting attention (e.g. Corradi et al. (2009, 2011); and
Griffin et al. (2016)). To produce such distributions, here we develop a Bayesian
approach to statistical inference for the GARCH class of models. Under this ap-
proach, the unknown parameters are regarded as random variables, whose posterior
distributions may be far from normal. In contrast to the confidence interval of
classical statistics, the Bayesian credible interval is easily interpreted
and gives a more direct and intuitive probability statement about uncertainty in
the parameter estimates (O'Hagan, 2004). Furthermore, our Bayesian approach
can also provide full predictive distributions of the conditional volatility required for
pricing volatility instruments.
An issue with Bayesian estimation is developing an efficient method for sampling
from the posterior distribution based on Monte Carlo methods (Robert and Casella,
2004). The most commonly used approach is MCMC, which has been utilised for
Bayesian analysis of the GARCH class of models (see, for example, Kim et al.
(1998), Vrontos et al. (2000), Takaishi (2009)). Despite its popularity, MCMC
can be costly, its associated parameters can be difficult to tune (Roberts and
Rosenthal, 2009), and conventional MCMC algorithms become less
effective when the target distribution is far from normally distributed. In more
recent times, the sequential Monte Carlo (SMC) method for static Bayesian mod-
els (Chopin, 2002; Del Moral et al., 2006) has been developed as a complementary
method to MCMC. SMC traverses a set of particles through a sequence of probabil-
ity distributions, which starts with one that is easy to sample from and ends with
the ultimate target posterior distribution. The sequence of probability distributions
can be attained either using a data annealing strategy (Chopin, 2002) or the method
of likelihood annealing (Neal, 2001). SMC iteratively applies three main steps to
traverse the population of particles, which include re-weighting, re-sampling, and
moving (or mutation). SMC is parallelisable and easier to adapt than MCMC, which
can make it more efficient. Parallelisation can reduce
computation costs, especially when the likelihood function is computationally intensive,
and we demonstrate that this is easily achievable for the GARCH class of models. In
addition, SMC is efficient at exploring irregular posterior distributions, often found
with GARCH-type models. The intermediate samples created by SMC with the
data annealing strategy can be reused to perform posterior predictions. Also,
the estimate of evidence, which is a by-product from SMC methods, can be used in
Bayesian model selection.
In this chapter, we utilise SMC methods to develop a novel fully Bayesian approach
for the widely used GARCH class of models. This Bayesian framework can be
used for estimation, prediction and model selection, and generating a full predictive
density for conditional volatility. Our Bayesian SMC approach provides an attractive
alternative to the MCMC or frequentist approaches. This chapter also contributes to
the SMC literature, as we demonstrate that SMC data annealing is useful for
leave-future-out cross-validation when dealing with time series data. Finally, we develop an
unbiased likelihood estimator for the BEGE model, which permits exact Bayesian
inference for this model. Bekaert et al. (2015) use an approximation to the likelihood,
which as we show produces biased posterior distributions. South et al. (2019) use
the BEGE model with biased likelihood as an example to illustrate a new SMC
approach for parameter estimation. Our work is a comprehensive demonstration of
the advantages that SMC can produce for GARCH-type models.
In Section 3.2, we provide the methods for deriving an unbiased estimator for the
likelihood of the BEGE model. An efficient strategy to choose the sequence of
intermediate probability distributions of SMC is discussed in Section 3.3. Section 3.4
provides a comparison of one-step forward predictions of volatility that are derived
from the fully Bayesian approach and the classical approach, respectively. A leave-
one-out cross-validation method for time series data is also described for evaluating
the predictive performance of the forecasted volatility. Finally, a summary of the
results is provided in Section 3.5, and concluding discussion is given in Section 3.6.
3.2 Unbiased estimation for the BEGE likelihood
According to Bekaert et al. (2015), the BEGE-distributed variable u_t can be rewritten
as u_t = \omega_{p,t} - \omega_{n,t}, where \omega_{p,t} \sim \Gamma(p_t, \sigma_p), \omega_{n,t} \sim \Gamma(n_t, \sigma_n), and u_t = r_t - \mu.
The BEGE density can then be written as

\[ f_{\text{BEGE}}(u_t) = \int_{\omega_{p,t}} f_{u_t}(u_t \mid \omega_{p,t})\, f(\omega_{p,t})\, d\omega_{p,t} = \int_{\omega_{p,t}} f_{\omega_{n,t}}(\omega_{p,t} - u_t)\, f(\omega_{p,t})\, d\omega_{p,t}. \tag{3.1} \]
An unbiased estimator of fBEGE(ut) can be constructed using Monte Carlo methods.
Firstly, we draw M independent samples \omega_{p,t}^1, \ldots, \omega_{p,t}^M from f(\omega_{p,t}); then f_{\text{BEGE}}(u_t)
can be estimated as

\[ \hat{f}_{\text{BEGE}}(u_t) = \frac{1}{M} \sum_{i=1}^{M} f_{\omega_{n,t}}(\omega_{p,t}^i - u_t), \quad \text{where } \omega_{p,t}^i \overset{\text{i.i.d.}}{\sim} f(\omega_{p,t}),\; i = 1, \ldots, M. \tag{3.2} \]
A main issue with the Monte Carlo integration method is that a large number of
Monte Carlo draws, M, may be required to achieve accurate estimates, which makes
the method computationally intensive. Therefore, we adopt
importance sampling to reduce the variance of the Monte Carlo estimates and improve
computational efficiency. When it is hard to sample from our target distribution h(\cdot)
directly, we can construct an importance distribution, g(\cdot), that is easy to sample
from. The following identity is the basis for importance sampling:

\[ \mathbb{E}_g\!\left[ \frac{h(x)}{g(x)} \right] = \int \frac{h(x)}{g(x)}\, g(x)\, dx = \int h(x)\, dx \approx \frac{1}{M} \sum_{i=1}^{M} \frac{h(x_i)}{g(x_i)}, \quad x_i \overset{\text{i.i.d.}}{\sim} g(\cdot). \tag{3.3} \]
In the case of the BEGE model, the target is h(\omega_{p,t}) = f_{\omega_{n,t}}(\omega_{p,t} - u_t)\, f(\omega_{p,t}),
and the importance sampling estimator of f_{\text{BEGE}}(u_t) is given by

\[ \hat{f}_{\text{IS}}(u_t) = \frac{1}{M} \sum_{i=1}^{M} \frac{h(\omega_{p,t}^i)}{g(\omega_{p,t}^i)}, \quad \omega_{p,t}^i \overset{\text{i.i.d.}}{\sim} g(\omega_{p,t}), \tag{3.4} \]

where a sound importance distribution g(\omega_{p,t}) needs to be selected. To derive the
importance distribution, we first determine the mode and curvature of the target
h(\omega_{p,t}), and then develop a parametric importance distribution that shares
the same mode and curvature as the target. The process of constructing an efficient
importance distribution for estimating f_{\text{BEGE}}(u_t) is provided in the
Appendix. The general importance sampling estimator in (3.4) is unbiased.
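As a simplified stand-in for (3.2), the sketch below uses plain Monte Carlo to estimate the density of a difference of two independent gamma variables (Python with NumPy and SciPy assumed; ordinary rather than centered gammas are used for simplicity, which shifts the location but not the idea, and the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import gamma

def mc_density_diff_of_gammas(u0, p, sigma_p, n, sigma_n, M=20_000, seed=0):
    """Monte Carlo estimate (analogue of Equation (3.2)) of the density of
    u = X - Y at u0, with independent X ~ Gamma(p, scale=sigma_p) and
    Y ~ Gamma(n, scale=sigma_n), using f(u0) = E_X[ f_Y(X - u0) ].
    """
    rng = np.random.default_rng(seed)     # fixed seed: shared draws across u0
    x = gamma.rvs(p, scale=sigma_p, size=M, random_state=rng)
    return gamma.pdf(x - u0, n, scale=sigma_n).mean()  # pdf is 0 below 0

# Sanity check: the pointwise estimates trace out a proper density
grid = np.linspace(-8, 8, 161)
vals = [mc_density_diff_of_gammas(u, 2.0, 1.0, 2.0, 1.0) for u in grid]
integral = float(np.sum(vals) * (grid[1] - grid[0]))
print(integral)  # close to 1, as a valid density should integrate to one
```

The importance sampling estimator (3.4) replaces the draws from f(\omega_{p,t}) with draws from a tailored g(\cdot), reducing the number of samples M needed for a given variance.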
As discussed in Section 2.2, Andrieu and Roberts (2009) show that using an unbiased
likelihood estimator within an MCMC algorithm produces a Markov chain that converges
to the idealised posterior obtained when the likelihood can be computed exactly.
This has been demonstrated by marginalizing the auxiliary random variables u out
from the joint posterior π(θ,u|y) in Section 2.2. Chopin et al. (2013) and Duan
and Fulop (2015) show that using an unbiased likelihood estimator in an SMC
algorithm for data annealing and likelihood annealing, respectively, produces an
algorithm that also targets the exact posterior. We adopt a Bayesian statistical
framework for parameter estimation and model selection for GARCH-type models
that are described in Chapter 2. In the following two sections, we develop a Bayesian
statistical framework for model prediction, together with an efficient strategy for
constructing the sequence of distributions of SMC, for GARCH-type models.
3.3 Efficient sequence construction method
Under the SMC approach, the sequence of target distributions can be defined through
both likelihood annealing and data annealing. These two annealing methods provide
similar estimates of posterior distributions for the parameters of the three GARCH-
type models, as shown in Section 3.5.1. Under the data annealing strategy,
the sequence of target distributions is obtained smoothly as the observations
arrive one at a time. The sequence of posterior distributions constructed by the
data annealing method is given by:

\[ \pi_t(\theta \mid y_{1:t}) \propto f(y_{1:t} \mid \theta)\, \pi(\theta), \]

where t = 0, \ldots, T, and y_{1:t} represents the data available up to the current time t. The
corresponding unnormalised weight of particle i at time t is thus:

\[ w_t^i = W_{t-1}^i\, \frac{f(y_{1:t} \mid \theta_{t-1}^i)}{f(y_{1:t-1} \mid \theta_{t-1}^i)} = W_{t-1}^i\, f(y_t \mid y_{1:t-1}, \theta_{t-1}^i), \]
which needs to be normalised. It is important to note that to update the weights
only the conditional likelihood of the introduced data point needs to be computed.
As more returns become available, the re-weighting step in data annealing does not
need to re-process previously collected data. This provides extensive computational
savings, especially when the time series is long.
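A single data-annealing re-weighting step can be written in a few lines. In the sketch below (Python; the vector of conditional log-likelihoods would come from the relevant GARCH-type model and is just a placeholder input here), only the newest observation's likelihood enters the update:

```python
import numpy as np

def reweight_data_anneal(W_prev, log_cond_lik):
    """One data-annealing re-weighting step.

    W_prev       : normalised weights W_{t-1}^i, shape (N,)
    log_cond_lik : log f(y_t | y_{1:t-1}, theta_{t-1}^i) per particle
    Returns the normalised weights W_t^i and ln(sum_i w_t^i), the increment
    used in the evidence estimate.
    """
    log_w = np.log(W_prev) + log_cond_lik      # unnormalised log-weights
    m = log_w.max()                            # log-sum-exp for stability
    log_sum = m + np.log(np.exp(log_w - m).sum())
    return np.exp(log_w - log_sum), log_sum

# Toy usage: equal previous weights; particle 0 explains y_t best
W_prev = np.full(4, 0.25)
log_lik = np.array([-1.0, -2.0, -2.0, -3.0])
W_new, log_sum = reweight_data_anneal(W_prev, log_lik)
print(W_new, W_new.sum())
```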
The data annealing strategy is also useful for performing posterior prediction. One
way to consider if a GARCH-type model is appropriate would be to focus on the
model’s ability to forecast. Our Bayesian approach provides a clear method to
deduce the posterior predictive distribution conditional on currently available data.
The intermediate posterior distribution \pi_{t-1}(\theta \mid y_{1:t-1}) from data annealing is utilised
when forming the one-step ahead posterior prediction for the data at time t. Meanwhile,
the corresponding posterior predictive density, which is used to measure prediction
accuracy, can be estimated.
posterior predictive distribution.
3.4 Bayesian posterior predictive distribution
In this section, the fully Bayesian and traditional classical approaches are compared
in the context of generating one-step ahead forecasts of conditional variances. The
fully Bayesian approach provides the distribution of the one-step ahead conditional
variances together with 95% prediction intervals, which the classical approach cannot.
We also provide a fully Bayesian approach for assessing the predictive performance of
time series models, which can be achieved easily with SMC data annealing.
3.4.1 Bayesian posterior predictive distribution
Under the fully Bayesian approach, the posterior predictive distribution for the
unknown observations can be produced. A new observation y_{t+1} can be drawn from
the posterior predictive distribution given the original data y_{1:t} up to time t:

\[ P(y_{t+1} \mid y_{1:t}) = \int_{\theta} P(y_{t+1} \mid \theta, y_{1:t})\, \pi_t(\theta \mid y_{1:t})\, d\theta. \]
The posterior predictive distribution accounts for the uncertainty associated with
the parameters without making asymptotic assumptions. Here, we consider one-step
ahead forecasts for the posterior predictions of the conditional variance \sigma_{t+1}^2 given
its past values \sigma_{1:t}^2 and the observed data y_{1:t} up to time t.
The posterior predictive distribution is particularly easy to estimate via SMC with
data annealing. Denote the SMC weighted sample from the posterior \pi_t(\theta \mid y_{1:t}) as
\{W_t^i, \theta_t^i\}_{i=1}^{N}. The posterior predictive distribution of the conditional variance can then
be approximated by \{W_t^i, (\sigma_{t+1}^2)^i\}_{i=1}^{N}, where (\sigma_{t+1}^2)^i \sim p(\sigma_{t+1}^2 \mid y_{1:t}, \sigma_{1:t}^2, \theta_t^i)
for i = 1, \ldots, N. From this weighted sample, a point estimate (e.g. the mean or median)
and a predictive interval can easily be obtained. The computation associated with
constructing the approximation to the posterior predictive distribution can be easily
vectorised and parallelised.
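Extracting a point estimate and a 95% predictive interval from the weighted sample reduces to computing weighted quantiles. A minimal sketch (Python with NumPy assumed; the gamma draws below are a stand-in for the particle draws of \sigma_{t+1}^2, not output from the models in this thesis):

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Quantile(s) q of a weighted sample; weights need not be normalised."""
    values, weights = np.asarray(values), np.asarray(weights)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)        # empirical weighted CDF
    return np.interp(q, cdf, v)

# Toy weighted sample standing in for {W_t^i, (sigma^2_{t+1})^i}
rng = np.random.default_rng(2)
draws = rng.gamma(shape=3.0, scale=0.5, size=10_000)
weights = np.ones_like(draws)             # equal weights for the demo
lo, hi = weighted_quantile(draws, weights, [0.025, 0.975])
print(lo, hi)                             # a 95% predictive interval
```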
3.4.2 Forecast through maximum likelihood estimation (MLE) approach
We compare the Bayesian SMC approach for forecasting with MLE. To form
one-step ahead forecasts with MLE, at time t we first find the maximum likelihood
estimate \hat{\theta}_{\text{MLE}}^{t} based on the observations up to time t. MLE parameter estimates
are obtained by maximising each model's log-likelihood provided in Section 2.4 via
numerical optimisation. After obtaining the MLE parameter estimates at time t, a
point forecast of the conditional variance at time t + 1 from each of the three GARCH-type
models can be obtained using the relevant one of Equations (2.4), (2.6)
or (2.10) in Section 2.4.
3.4.3 Forecast evaluation
In order to assess which model provides more accurate forecasts, we adopt the leave-
future-out cross-validation (LFO-CV) method (Burkner et al., 2019). Burkner et al.
(2019) state that the common leave-one-out cross-validation (LOO-CV) is not suit-
able for assessing predictive performance when the data is sequentially ordered in
time since future observations may provide some information that affects in-sample
model fit, and thus the predictions may be affected if only leaving one observation
out at a time. Therefore, when making one-step ahead forecasts at time t+1 we use
LFO-CV that leaves out all the future values after time t to assess predictive per-
formance for time series models. To measure the predictive accuracy, the expected
log predictive density (elpd; Vehtari et al., 2017) is approximated by:

\[ \text{elpd}_{\text{LFO}} = \sum_{t=\tau-1}^{T-1} \ln p(y_{t+1} \mid y_{1:t}) \]

(Burkner et al., 2019), where \tau is the time point at which we decide to start assessing the
predictions, T is the number of observations in the dataset, and the density p(y_{t+1} \mid y_{1:t})
can be approximated by evaluating, at the true observation y_{t+1}, a kernel density
estimate constructed from the weighted sample \{W_t^i, y_{t+1}^i\}_{i=1}^{N}.
The standard approach to LFO-CV would be to re-compute
the posterior distribution (referred to here as a refit) as each new data point is
introduced; however, this would be computationally intensive. Burkner et al. (2019)
propose an algorithm that approximates LFO-CV and reduces the number of refits,
and thus the computation time. Our data annealing SMC, in contrast, naturally re-estimates
the posterior distribution in a computationally efficient manner as more data are
introduced, and hence we demonstrate that it is valuable for performing LFO-CV. The
weighted sample from the posterior \{W_t^i, \theta_t^i\}_{i=1}^{N} based on each subset of observations
y_{1:t} is a by-product of posterior simulation via data annealing SMC, which
means there is no need to refit the time series model to perform and assess
predictions.
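Each term ln p(y_{t+1} | y_{1:t}) in elpd_LFO can be approximated by evaluating a weighted kernel density estimate of the predictive draws at the realised observation. A sketch (Python with NumPy and SciPy assumed; standard normal draws stand in for the model's predictive sample):

```python
import numpy as np
from scipy.stats import gaussian_kde

def log_pred_density(pred_draws, weights, y_true):
    """ln p(y_{t+1} | y_{1:t}) via a weighted Gaussian KDE of the posterior
    predictive draws, evaluated at the realised observation y_true."""
    kde = gaussian_kde(pred_draws, weights=weights)
    return float(np.log(kde(y_true)[0]))

# Toy check: predictive draws from N(0,1); the log density at 0 should be
# near ln(1/sqrt(2*pi)) ~ -0.92 (up to kernel smoothing and MC error)
rng = np.random.default_rng(3)
draws = rng.normal(size=20_000)
weights = np.full(draws.shape, 1.0 / draws.size)
print(log_pred_density(draws, weights, 0.0))
```

Summing such terms over t = \tau - 1, \ldots, T - 1, with the weighted samples taken from successive data-annealing targets, yields the elpd_LFO estimate without any refits.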
A simulation study is performed in Section 3.5.2 to evaluate the predictive perfor-
mance for conditional variance and explore the relative merits of the fully Bayesian
approach relative to the MLE approach. To assess predictive accuracy we provide
an estimated elpdLFO and a 95% one-step forward prediction interval for each candi-
date model with different simulated datasets. We also illustrate that the predictive
distribution of the conditional volatility can provide us with different levels of con-
fidence about the magnitude of the volatility at the next time point. We count and
record the true conditional variances from the simulated data that fall between the
2.5% and 97.5% quantiles. We show in Section 3.5.2 that the posterior predictive
interval from each true model, which accounts for all sources of variability in the
model, provides a satisfactory coverage of the true values.
3.5 Results
The plan of this section is as follows. In Section 3.5.1 we present results for the
posterior distributions obtained by SMC algorithms for the GARCH-type models
presented in Section 2.4. The one-step ahead forecasts of conditional variances
by using both classical MLE and Bayesian approaches are illustrated in Section
3.5.2. The model selection results obtained from the fully Bayesian approach are
presented in Section 3.5.3. Computer code for implementing our methods is available
at https://github.com/DanLAct/GARCH-SMC.
In this study we use N = 10,000 particles in SMC to generate posterior distributions.
The time series data we use are monthly log stock returns on the S&P 500 Composite
Index from July 1926 to January 2018, obtained from the Center for Research
in Security Prices (CRSP). The parameters we aim to estimate in the three GARCH-type
models are, respectively, \theta_{\text{Garch}(1,1)} = (\mu_G, \alpha_0^G, \alpha_1, \beta_1),
\theta_{\text{GJR}} = (\mu_{\text{GJR}}, \alpha_0^{\text{GJR}}, \beta_\sigma, \phi, \phi^-), and
\theta_{\text{BEGE}} = (\mu_B, p_0, n_0, \rho_p, \rho_n, \phi_p^+, \phi_n^+, \phi_p^-, \phi_n^-, \sigma_p, \sigma_n).
Some constraints on the parameters are imposed initially in addition to the stationarity
requirements. Such initial
constraints are chosen based on classical estimation results (e.g. Bekaert et al.,
2015). Thus, we set the priors on the parameters of the GARCH(1, 1) model as:

\alpha_0^G \sim U(0, 0.3), \alpha_1 \sim U(0, 0.5), \beta_1 \sim U(0, 0.99), \mu_G \sim U(-0.9, 0.9),

where U(\cdot, \cdot) denotes a uniform probability density function. To satisfy stationarity
it is necessary to enforce \alpha_1 + \beta_1 \le 0.9999. The priors on the parameters of the
GJR-GARCH(1, 1) model are given by:
\alpha_0^{\text{GJR}}, \phi, \phi^- \sim U(0, 0.3), \beta_\sigma \sim U(0, 0.99), \mu_{\text{GJR}} \sim U(-0.9, 0.9).

To satisfy stationarity we need \phi + \frac{1}{2}\phi^- + \beta_\sigma \le 0.9999. The priors on the parameters
of the BEGE model are given by:
p_0, \phi_p^+, \phi_p^- \sim U(0, 0.5), \sigma_p, \sigma_n \sim U(0, 0.3), \rho_p, \rho_n \sim U(0, 0.99), n_0 \sim U(0, 1), \phi_n^+ \sim U(-0.2, 0.1), \phi_n^- \sim U(0, 0.75), \mu_B \sim U(-0.9, 0.9).
Additionally, to satisfy stationarity we require that \rho_p + \frac{1}{2}\phi_p^+ + \frac{1}{2}\phi_p^- \le 0.995 and
\rho_n + \frac{1}{2}\phi_n^+ + \frac{1}{2}\phi_n^- \le 0.995. All parameters are assumed to be independent in the
prior.
3.5.1 Posterior Distributions
The univariate posterior distributions obtained by our SMC algorithms for the pa-
rameters of the GARCH(1, 1), GJR-GARCH(1, 1) and BEGE models are shown in
Figures 3.1, 3.3 and 3.5, respectively. The scatterplots of bivariate posterior distri-
butions of the parameters of the three GARCH-type models are shown in Figures
3.2, 3.4, and 3.6, respectively.
[Figure 3.1: Marginal posterior distributions of the parameters of the GARCH(1,1) model by using SMC. Curves compare SMC with data annealing and SMC with likelihood annealing.]
The marginal posterior estimates are constructed using a nonparametric kernel
density estimation method. As can be seen in the figures, each parameter's posterior
distribution derived by data annealing SMC is very close to that obtained
under likelihood annealing SMC. Except for the fixed mean parameters \mu_G and \mu_{\text{GJR}}
of the GARCH(1, 1) and GJR-GARCH(1, 1) models, the posterior distributions are
asymmetric and, in the BEGE case, far from normally distributed, which contradicts
the assumption of normally distributed parameter estimates under standard asymptotic
inference. In addition, the non-normality of the joint posterior distributions is evident
in the scatterplots of the bivariate distributions.
Figure 3.2: Scatterplots of bivariate distributions based on samples drawn from the poste-rior distribution of the parameters of the GARCH(1,1) model by using SMC.
Figure 3.3: Marginal posterior distributions of the parameters of the GJR-GARCH(1,1) model by using SMC.
Figure 3.5 demonstrates that the approximate likelihood of fBEGE(ut) can lead to
biased posteriors, especially for σp and µB. Even though the unbiased estimators of
Figure 3.4: Scatterplots of bivariate distributions based on samples drawn from the posterior distribution of the parameters of the GJR-GARCH(1,1) model by using SMC.
Figure 3.5: Marginal posterior distributions of the parameters of the BEGE model by using SMC.
the likelihood produced by Monte Carlo integration and importance sampling produce similar posterior distributions, the importance sampling method is more efficient. To compare the efficiency of these two estimation methods, we choose three sets of parameters and compute 100 independent likelihood estimates for the two estimators with different Monte Carlo sample sizes M. As shown in Table 3.1, Monte Carlo integration requires at least 5,000 draws to guarantee that the standard deviation is lower than 1, while importance sampling only needs 1,000 draws to achieve a similar level of variance. Therefore, we adopt the importance sampling estimator when using SMC to sample from the posterior for the BEGE model below.
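The variance-comparison protocol used here (100 independent estimates per sample size M, comparing standard deviations) can be sketched on a toy integrand. This is not the BEGE likelihood; it only illustrates why a well-matched importance distribution needs fewer draws. We estimate the Gaussian tail probability P(X > 3) by plain Monte Carlo and by importance sampling from a N(3, 1) proposal:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_estimate(M, rng):
    # Plain Monte Carlo: average of the indicator {X > 3}, X ~ N(0, 1).
    x = rng.standard_normal(M)
    return np.mean(x > 3.0)

def is_estimate(M, rng):
    # Importance sampling from N(3, 1); the weight is the density ratio
    # phi(x)/q(x) = exp(4.5 - 3x), the normalising constants cancelling.
    x = rng.normal(3.0, 1.0, M)
    return np.mean((x > 3.0) * np.exp(4.5 - 3.0 * x))

results = {}
for M in (100, 1000, 5000):
    mc_sd = np.std([mc_estimate(M, rng) for _ in range(100)])
    is_sd = np.std([is_estimate(M, rng) for _ in range(100)])
    results[M] = (mc_sd, is_sd)
```

As in Table 3.1, the importance sampling estimator reaches a given standard deviation with far fewer draws than plain Monte Carlo.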
Figure 3.6: Scatterplots of bivariate distributions based on samples drawn from the posterior distribution of the parameters of the BEGE model by using SMC.
The likelihood function for the BEGE model is computationally expensive. We significantly accelerate the likelihood calculations for the proposed parameters of all particles in one iteration of the MCMC step of the SMC algorithm by taking advantage of parallel computing. More specifically, we allocate a set of these proposed parameters to each core on a multi-core machine. Using the importance sampling estimator of the likelihood and 12 cores, the SMC algorithm with data annealing applied to the BEGE model runs 6.7 times faster than a completely serial SMC
Table 3.1: Comparison of the unbiased Monte Carlo integration estimator and the importance sampling estimator

                     Number of Monte        Standard Deviation
Parameters           Carlo Iterations   MC Integration   Importance Sampling
Parameter Set 1             100              4.86               1.63
                           1000              3.20               0.74
                           5000              0.96               0.50
                          10000              0.64               0.39
Parameter Set 2             100              6.09               1.63
                           1000              1.30               0.97
                           5000              0.59               0.77
                          10000              0.42               0.69
Parameter Set 3             100              7.40               1.45
                           1000              1.47               0.84
                           5000              0.59               0.62
                          10000              0.41               0.55

Parameter Set 1: p0 = 0.081, σp = 0.004, ρp = 0.857, φp+ = 0.084, φp− = 0.151, n0 = 0.817, σn = 0.023, ρn = 0.466, φn+ = −0.131, φn− = 0.323, µB = 0.007;
Parameter Set 2: p0 = 0.049, σp = 0.006, ρp = 0.845, φp+ = 0.120, φp− = 0.155, n0 = 0.700, σn = 0.022, ρn = 0.571, φn+ = −0.180, φn− = 0.475, µB = 0.009;
Parameter Set 3: p0 = 0.122, σp = 0.006, ρp = 0.837, φp+ = 0.108, φp− = 0.155, n0 = 0.792, σn = 0.023, ρn = 0.557, φn+ = −0.183, φn− = 0.433, µB = 0.008.
algorithm. Furthermore, the likelihood annealing SMC applied to the BEGE model
runs 6.5 times faster than a completely serial SMC algorithm.
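The particle-level parallelisation described above can be sketched as follows. Here `log_likelihood` is a hypothetical stand-in for the expensive BEGE likelihood estimator, and a thread pool is used only to keep the sketch portable; for the CPU-bound BEGE likelihood a process pool, with one chunk of proposals per core, is what is actually called for:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def log_likelihood(theta):
    # Stand-in for the expensive BEGE log-likelihood of one proposed
    # parameter vector (hypothetical; replace with the real estimator).
    return -0.5 * float(np.sum(theta ** 2))

def parallel_logliks(proposals, workers=4):
    """Evaluate every particle's proposed log-likelihood concurrently,
    allocating one block of proposals to each worker."""
    chunks = np.array_split(proposals, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda c: [log_likelihood(t) for t in c], chunks)
    return [ll for block in blocks for ll in block]

proposals = np.random.default_rng(2).normal(size=(100, 11))
lls = parallel_logliks(proposals)
```

The parallel result must match a serial evaluation exactly; only wall-clock time differs.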
Figure 3.7 shows the posterior distributions for the BEGE model explored using the MH MCMC and SMC methods. MH MCMC requires tuning of the proposal distribution. To make implementing MH MCMC easier, the final population of SMC particles is used to form the proposal distribution. More specifically, we use a multivariate normal random walk with a covariance estimated from the SMC particles. However, even though we give MH MCMC an advantage by avoiding the tuning step, the posteriors obtained by MCMC still perform poorly compared with the ones from the SMC methods in Figure 3.7. For instance, the posterior distribution for the parameter σp obtained by MCMC has an obvious unexpected bump, and the trace plot in Figure 3.8 shows that MCMC can get stuck and the Markov chain explores the parameter space slowly. The MH MCMC algorithm with 1 million iterations is more than 2.2 times slower than the data annealing SMC that
exploits parallel computing. The inefficiency factor, also called the integrated autocorrelation time, is an MCMC diagnostic that measures the computational inefficiency of an MCMC sampler. The sampler with the lowest inefficiency factors is the most efficient one, and a very efficient sampling algorithm has inefficiency factors close to 1. The estimated inefficiency factors are reported in Table 3.2. The table shows that all the parameters' inefficiency factors are above 1000, which reinforces the inefficiency of the MCMC algorithm.
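A minimal estimator of the inefficiency factor can be sketched as below; the sum of autocorrelations is truncated at the first negative estimate, which is one of several common truncation rules (the thesis does not specify which rule was used):

```python
import numpy as np

def inefficiency_factor(chain):
    """Integrated autocorrelation time 1 + 2*sum_k rho_k, truncating the
    sum at the first negative autocorrelation estimate."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    acov = np.correlate(x, x, mode="full")[n - 1:] / n  # empirical autocovariances
    rho = acov / acov[0]
    iact = 1.0
    for k in range(1, n // 2):
        if rho[k] < 0.0:
            break
        iact += 2.0 * rho[k]
    return iact

rng = np.random.default_rng(3)
iid_chain = rng.standard_normal(5000)   # well-mixing chain: IF near 1
ar_chain = np.empty(5000)               # sticky AR(1) chain: IF far above 1
ar_chain[0] = 0.0
for t in range(1, 5000):
    ar_chain[t] = 0.95 * ar_chain[t - 1] + rng.standard_normal()
```

For the AR(1) chain with coefficient 0.95 the theoretical integrated autocorrelation time is (1 + 0.95)/(1 − 0.95) = 39, so a slowly mixing sampler is flagged by a large estimated factor.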
Figure 3.7: Marginal posterior distributions of the parameters of the BEGE model by using SMC and MCMC methods for real data.
Table 3.2: Estimated inefficiency factors (IF)

       p0       σp       ρp      φp+      φp−       n0       σn       ρn      φn+      φn−       µB
IF  1405.71  1677.05  1272.53  1742.26  1926.63  2748.03  1990.45  2317.56  1587.57  1445.31  1388.45
Both the MH MCMC and SMC methods are used to sample the posterior distributions for the BEGE model with simulated data, and the results are shown in Figure 3.9. We choose the SMC particle with the highest likelihood to provide the initial values for MCMC, shown as red dashed lines in Figure 3.9. As can be seen, the estimates of the posterior distribution produced by MH MCMC are less smooth than those from SMC, and the point estimates used to generate the simulated data are also
Figure 3.8: Trace plots and marginal posterior distributions of the parameters of the BEGE model by using the MCMC method for real data.
contained by the posterior distributions. This simulation study can be viewed as a
validation of the effectiveness of the proposed SMC methods.
Figure 3.9: Marginal posterior distributions of the parameters of the BEGE model by using SMC and MCMC methods for simulated data.
3.5.2 Posterior Predictive Distribution
In this section we perform a simulation study which shows the relative merits of the Bayesian approach over the classical approach for forecasting the conditional variance. Firstly, we simulate the conditional variances of stock returns σ²_{1:1099} from the GARCH(1, 1) model, the GJR-GARCH(1, 1) model and the BEGE model, respectively. The parameters used to generate the three simulated data sets are randomly selected from the final population of SMC particles obtained when modelling the real time series data with the three GARCH-type models. The initial values of the conditional variance at time t = 1 for the three models are (σ²₁)GARCH = 0.0026, (σ²₁)GJR-GARCH = 0.0022 and (σ²₁)BEGE = σ²_p p₁ + σ²_n n₁ = 0.0013. The total number of simulated data points in our study is 1099. We consider the performance of one-step ahead prediction based on simulated conditional variances from t = 201 to t = 1099. Besides using the true model to fit the simulated data, we also perform a study of model misspecification by using misspecified models to fit the simulated data and make predictions.
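A GARCH(1, 1) simulation of the kind used here can be sketched as follows; the parameter values are illustrative, of the same order as the estimates in this chapter, not the actual draws taken from the SMC particle population:

```python
import numpy as np

def simulate_garch(T, mu, omega, alpha, beta, sigma2_init, rng):
    """Simulate returns y_t = mu + e_t and conditional variances from a
    GARCH(1,1): sigma2_t = omega + alpha*e_{t-1}^2 + beta*sigma2_{t-1}."""
    sigma2 = np.empty(T)
    y = np.empty(T)
    sigma2[0] = sigma2_init
    e_prev = 0.0
    for t in range(T):
        if t > 0:
            sigma2[t] = omega + alpha * e_prev ** 2 + beta * sigma2[t - 1]
        e_prev = np.sqrt(sigma2[t]) * rng.standard_normal()
        y[t] = mu + e_prev
    return y, sigma2

rng = np.random.default_rng(4)
y, sigma2 = simulate_garch(1099, mu=0.005, omega=1e-4, alpha=0.08,
                           beta=0.85, sigma2_init=0.0026, rng=rng)
```

With alpha + beta = 0.93 < 1 the simulated variance process is stationary, with unconditional variance omega/(1 − alpha − beta).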
Here, we present the results for the one-step ahead forecasts of the conditional variances of the three GARCH-type models obtained via the data annealing SMC and MLE approaches. To provide a clearer visual interpretation, in Figure 3.10 we illustrate the results for the forecasted conditional volatilities, which are the square roots of the forecasted conditional variances. Figure 3.10 shows the 95% prediction interval bounds for the last 100 predicted volatilities σ_{1000:1099} simulated from the three GARCH-type models by using our Bayesian approach. The 95% prediction interval is built simply by computing the 0.025 and 0.975 quantiles of the square roots of the posterior predictive sample of the conditional variances. The classical approach, by contrast, cannot produce such prediction intervals for the conditional volatility. These prediction intervals clearly capture the simulated data. We take posterior medians as point estimates for the Bayesian approach. Plotting these posterior medians together with the classical point estimates of the three GARCH-type models in Figure 3.10 shows that the differences between the Bayesian and classical point predictions are small.
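The interval construction is just pointwise quantiles of the square roots of the predictive variance draws; a sketch, with synthetic draws standing in for the posterior predictive sample:

```python
import numpy as np

def volatility_interval(pred_var_samples, level=0.95):
    """Pointwise prediction interval for conditional volatility: quantiles
    of the square roots of posterior predictive draws of the conditional
    variance. `pred_var_samples` has shape (n_draws, n_horizons)."""
    vol = np.sqrt(pred_var_samples)
    lo = np.quantile(vol, (1 - level) / 2, axis=0)
    hi = np.quantile(vol, 1 - (1 - level) / 2, axis=0)
    median = np.quantile(vol, 0.5, axis=0)
    return lo, median, hi

rng = np.random.default_rng(5)
# Synthetic predictive draws of the conditional variance (not real output).
draws = rng.gamma(shape=5.0, scale=4e-4, size=(2000, 100))
lo, med, hi = volatility_interval(draws)
```

The posterior median `med` plays the role of the Bayesian point forecast in Figure 3.10.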
Figure 3.10: One-step ahead forecasts for the conditional standard deviation of the GARCH(1, 1) model, the GJR-GARCH(1, 1) model and the BEGE model. (a) One-step ahead 95% prediction interval of the conditional standard deviation of the GARCH(1, 1) model with simulated data generated from the GARCH(1, 1) model; (b) One-step ahead 95% prediction interval of the conditional standard deviation of the GJR-GARCH(1, 1) model with simulated data generated from the GJR-GARCH(1, 1) model; (c) One-step ahead 95% prediction interval of the conditional standard deviation of the BEGE model with simulated data generated from the BEGE model.
Table 3.3: Frequency of simulated conditional variances falling outside the 95% prediction interval

                     Simulated Data from   Simulated Data from     Simulated Data
Model                    GARCH(1, 1)        GJR-GARCH(1, 1)         from BEGE
GARCH(1, 1)                 0.9%                 34.9%                 47.8%
GJR-GARCH(1, 1)             8.8%                  0.8%                 28.4%
BEGE                        2.8%                  0.1%                  1.2%
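The quantity reported in Table 3.3 is a simple empirical frequency; a sketch with synthetic inputs:

```python
import numpy as np

def outside_frequency(true_var, lo, hi):
    """Fraction of true conditional variances falling outside the
    pointwise prediction interval [lo, hi]."""
    true_var = np.asarray(true_var)
    outside = (true_var < lo) | (true_var > hi)
    return float(outside.mean())

rng = np.random.default_rng(6)
truth = rng.uniform(0.001, 0.003, size=899)   # t = 201, ..., 1099
lo, hi = truth * 0.8, truth * 1.25            # synthetic intervals that always cover
freq = outside_frequency(truth, lo, hi)
```

For a well-calibrated 95% interval this frequency should be close to 5%.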
The 95% Bayesian prediction intervals for the one-step ahead forecasted conditional variance of the three GARCH-type models contain the true conditional variances with a satisfactory coverage level. Besides checking the accuracy of the Bayesian predictive distributions for the conditional volatility under the true model, we also use misspecified models to fit the simulated data and compare their performance with the true model. Table 3.3 summarises the frequency of the true simulated conditional variances from t = 201 to t = 1099 falling outside each model's 95% prediction interval. For the data simulated from the BEGE model, 1.2% of the true conditional variances fall outside the prediction interval of the true BEGE model, but the other two, misspecified, models provide poor predictive performance. For the data simulated from the GARCH(1, 1) model and the GJR-GARCH(1, 1) model, less than 1% of the true conditional variances fall outside the associated 95% prediction interval for these true models. Overall, the Bayesian prediction intervals demonstrate reasonable coverage rates of the true conditional variances for the three GARCH-type models in the simulation study, and this conclusion is reinforced by considering each model's estimated elpd_LFO in Table 3.4. The model with the highest value of elpd_LFO is the one with the best predictive performance. There is agreement between the elpd_LFO results and the coverage rates of the 95% prediction intervals used to assess the candidate models' predictive performance. It is worth noting that the BEGE model performs well for forecasting conditional variances under the dataset simulated from the GJR-GARCH(1, 1) model, which indicates that the performance of the BEGE model is robust to model misspecification in this respect. Therefore, even though the true volatility of real stock returns is unobservable, this simulation study provides some confidence that the true volatility can be adequately captured and covered by the prediction interval with the adoption of the Bayesian approach.
Table 3.4: Approximated elpd_LFO for one-step ahead forecasts of simulated conditional variances from t = 201 to t = 1099

                     Simulated Data from   Simulated Data from     Simulated Data
Model                    GARCH(1, 1)        GJR-GARCH(1, 1)         from BEGE
GARCH(1, 1)                6531.4                3895.3                -847.9
GJR-GARCH(1, 1)            6278.8                6364.0                5492.9
BEGE                       6364.4                6393.9                6188.1
Figure 3.11 shows the time series of real stock returns and the one-step ahead forecasts for the conditional variances obtained via the Bayesian approach. The Bayesian posterior median of the conditional variance is taken as the point estimate of the one-step ahead prediction. It is immediately apparent that the Bayesian approach provides efficient predictions for the three GARCH-type models, which successfully capture the features of the data such as the evident growth of volatility following the negative shock of the 1987 crash.
Figure 3.11: One-step ahead Bayesian forecasts for the conditional variance of the GARCH(1, 1), GJR-GARCH(1, 1) and BEGE models with real time series data. The prediction period is from March 1943 to January 2018. The top panel shows the monthly log-return series, and the bottom panel shows the one-step ahead forecasted conditional variances from the three GARCH-type models.
Another promising aspect of the Bayesian approach is the provision of the predictive distribution of the conditional volatility, which quantifies the uncertainty in the one-step ahead predicted volatility and cannot be derived analytically. From the predictive distribution, more information about the one-step ahead unobserved volatility can be extracted. For instance, the probability of the conditional volatility at the next time point lying within any particular range can be computed. Figure 3.12 (a) shows the distribution of the predicted conditional volatility of the GJR-GARCH(1, 1) model at time t = 586 conditional on real data from t = 1 to t = 585. From this distribution, we can see that conditional volatilities around 0.076 have a high probability of being observed. In contrast, the predictive distribution in Figure 3.12 (b) indicates that there is a very low chance of the conditional volatility being greater than 0.076, but a high chance of a relatively low conditional volatility being observed at time t = 743 from the BEGE model. Figure 3.12 (c) shows that there is a lot of uncertainty about the relatively high volatility at time t = 580.
Figure 3.12: Predictive distributions for real data. (a) One-step ahead predicted conditional volatility at time t = 586 of the GJR-GARCH(1, 1) model; (b) One-step ahead predicted conditional volatility at time t = 743 of the BEGE model; (c) One-step ahead predicted conditional volatility at time t = 580 of the BEGE model.
3.5.3 Model Choice
In this section we perform a simulation study to assess the performance of the SMC estimator of the Bayesian model evidence for model selection among the GARCH-type models. We first simulate five sets of data from each competing model of interest, producing 15 datasets in total. The parameter values used to generate the simulated data are randomly selected from the SMC posterior samples of the three GARCH-type models in Section 3.5.1. After obtaining the simulated data, we use SMC to estimate the log evidence of each candidate model for each simulated dataset.
Table 3.5: Estimation of Log Evidence

Simulated Data from GARCH(1, 1)
Model                Dataset 1   Dataset 2   Dataset 3   Dataset 4   Dataset 5
GARCH(1, 1)           1811.4      1578.5      1843.6      1936.2      1728.8
GJR-GARCH(1, 1)       1810.2      1576.6      1842.1      1934.2      1727.9
BEGE                  1805.1      1572.5      1837.8      1930.1      1719.9

Simulated Data from GJR-GARCH(1, 1)
Model                Dataset 1   Dataset 2   Dataset 3   Dataset 4   Dataset 5
GARCH(1, 1)           1716.0      1804.1      1941.5      2017.7      1778.5
GJR-GARCH(1, 1)       1730.1      1811.1      1952.6      2021.2      1784.0
BEGE                  1723.8      1805.4      1943.7      2014.3      1777.0

Simulated Data from BEGE
Model                Dataset 1   Dataset 2   Dataset 3   Dataset 4   Dataset 5
GARCH(1, 1)           1931.0      1856.1      1819.2      1876.5      1786.4
GJR-GARCH(1, 1)       1945.2      1865.2      1831.0      1886.9      1803.9
BEGE                  1978.8      1881.4      1873.1      1915.7      1862.6

Real Time Series Data
Model                Data Annealing SMC   Likelihood Annealing SMC
GARCH(1, 1)                1811.5                 1811.4
GJR-GARCH(1, 1)            1817.5                 1818.0
BEGE                       1845.4                 1844.9
The estimates of the log evidence for each model given the simulated data are presented in Table 3.5. As the data generating process (DGP) is known, the performance of the evidence can be assessed by checking whether the true model is correctly selected according to the estimated value of the log evidence. As can be seen
from Table 3.5, for each simulated dataset the model with the highest estimated log evidence corresponds to the DGP. This demonstrates the effectiveness of the fully Bayesian approach to model choice and of the SMC evidence estimator for the GARCH-type models and the length of time series investigated here. It is worth noting that for data simulated from the GARCH(1, 1) model, the GJR-GARCH(1, 1) model has only a slightly smaller log evidence than the GARCH(1, 1) model. However, when data is generated from the GJR-GARCH(1, 1) model, the log evidence of the GARCH(1, 1) model is much smaller than that of the GJR-GARCH(1, 1) model. When the GARCH(1, 1) model is true, the GJR-GARCH(1, 1) model can mimic its behaviour, since the GARCH(1, 1) model is nested within the GJR-GARCH(1, 1) model.
Moreover, each model's estimated log evidence for the real time series data is also provided in Table 3.5. There is strong evidence that the BEGE model is preferred over the other two GARCH-type models for this dataset. This is consistent with the result of Bekaert et al. (2015) and reflects the fact that the BEGE model, endowed with non-Gaussian innovations, outperforms the traditional GARCH model and the standard asymmetric GJR-GARCH model when applied to equity market returns.
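With the log evidences in hand, model comparison reduces to differences of log evidence, i.e. log Bayes factors. A sketch using the data annealing SMC column of Table 3.5 for the real data:

```python
# Log evidences for the real data (data annealing SMC column of Table 3.5).
log_evidence = {"GARCH": 1811.5, "GJR-GARCH": 1817.5, "BEGE": 1845.4}

best = max(log_evidence, key=log_evidence.get)
# Log Bayes factors of the best model against each competitor; under equal
# prior model probabilities these are also the log posterior odds.
log_bf = {m: log_evidence[best] - v for m, v in log_evidence.items() if m != best}
```

Differences of roughly 28 and 34 on the log scale correspond to overwhelming support for the BEGE model.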
3.6 Discussion
In this chapter, a novel Bayesian framework for the analysis of GARCH-type models has been developed that allows for parameter estimation, model selection and prediction. To this end, an efficient fully Bayesian approach has been developed using SMC. To obtain exact Bayesian inference for the BEGE model, we proposed using an unbiased likelihood estimator in place of the biased approximation adopted by Bekaert et al. (2015). This approach offers a number of important advantages over traditional estimation, inference and forecasting techniques. Overall, in contrast to traditional methods, the Bayesian framework offers a direct method for quantifying the uncertainty surrounding both model parameter estimates and volatility forecasts. It has been shown that SMC offers a number of advantages over standard Monte Carlo methods. For applications to long time series data, the SMC approach proposed here offers an efficient method for generating the full predictive distribution of volatility, computing the evidence for model comparison
and undertaking leave-one-out cross-validation. Although this study has focused on GARCH-type models, the benefits of the proposed approach would translate to other time series models for alternative economic or financial problems. We have demonstrated here that SMC data annealing can be an efficient approach for comparing the predictive performance of time series models. An interesting avenue for future research would be to explore this idea more comprehensively across different time series models and to draw comparisons with the approximate approach of Burkner et al. (2019).
Chapter 4
Conclusion
4.1 Summary
The aim of this thesis was to develop and demonstrate the benefits of a computational Bayesian method, sequential Monte Carlo, for estimation, prediction and model selection in GARCH-type models for financial time series data. By completing this research, the objectives discussed in Chapter 1 have been met, and in doing so a contribution to the Bayesian literature has been made.
In Chapter 3, we established a fully Bayesian approach to statistical inference for GARCH-type models via SMC, with the underlying framework discussed comprehensively in Chapter 2. Compared with the classical approach and the MCMC Bayesian approach, the potential advantages of using SMC have been demonstrated through a novel application of the SMC method to three specific GARCH-type models used to forecast asset return volatility. These advantages include handling non-normal posterior distributions, easy exploitation of parallel computing, and the ability to perform Bayesian model selection by approximating the model evidence. For the BEGE model, which has an intractable likelihood function, we developed an unbiased likelihood estimator that permits exact Bayesian inference for the first time. After deriving an effective importance distribution, we demonstrated that importance sampling is a more efficient way to generate an unbiased estimate of the BEGE likelihood than the standard Monte Carlo method. We also demonstrated the efficiency and advantages of SMC data annealing for producing posterior predictions and comparing the predictive performance of time series models.
However, this research also has its limitations. The Bayes factor, which was used for Bayesian model selection in this thesis, can be very sensitive to the prior specification for the parameters of interest (Liu and Aitkin, 2008). In particular, candidate models with informative priors are typically favoured over ones with non-informative prior distributions. This drawback can be addressed by conducting a sensitivity analysis using a wide range of potential priors, which has not been done in this thesis as it would be very computationally intensive.
4.2 Future research directions
For future work, we expect that the benefits of our fully Bayesian approach can be realised in alternative econometric applications and other time series models. We showed the potential of using SMC data annealing for leave-one-out cross-validation for time series data. It would be interesting to investigate this idea more comprehensively across a wider set of time series models and to compare it with the approximate approach of Burkner et al. (2019) and other predictive comparison tools.
One of the main foci of the thesis is the Bayesian approach for one-step ahead forecasts, where parameter uncertainty has been taken into account. Blasques et al. (2016) have considered multiple-step ahead predictions incorporating future innovation uncertainty, which relates to the unknown true underlying process of the volatility. According to Blasques et al. (2016), it is crucial to consider the uncertainty of future innovations when making multiple-step ahead forecasts. Therefore, it would be of interest to derive multiple-step ahead predictions through a Bayesian approach that takes into account future innovation uncertainty, and to compare this with the methods proposed in the literature.
For model selection tools, the widely applicable (or Watanabe-Akaike) information criterion (WAIC, Watanabe 2010) might be preferable as it is less sensitive to the prior distribution and easier to compute. It also integrates over the posterior distribution and does not rely on point estimates like BIC and DIC. However, the WAIC is not currently available for time series data, and thus this could be an avenue for future research.
Appendix
Importance sampling estimator of the likelihood of the BEGE model
As described in Section 3.2, we need an importance distribution that will produce a low variance estimator of the likelihood with a small Monte Carlo sample size. This appendix shows that the likelihood needs to be estimated only when both $p_t > 1$ and $n_t > 1$; in all other cases an exact likelihood can be computed. For simplicity, we only consider here the likelihood for a single observation. Four cases are considered to compute the likelihood $f_{BEGE}(u)$, which has the expression
\[
f_{BEGE}(u) = \int_{\delta}^{\infty} f_{\omega_n}(\omega_p - u)\, f_{\omega_p}(\omega_p)\, d\omega_p,
\]
where
\[
f_{\omega_p}(\omega_p) = \frac{1}{\Gamma(p)\sigma_p^{p}} (\omega_p - \underline{\omega}_p)^{p-1} \exp\left(-\frac{1}{\sigma_p}(\omega_p - \underline{\omega}_p)\right), \quad \text{for } \omega_p > \underline{\omega}_p,
\]
\[
f_{\omega_n}(\omega_p - u) = \frac{1}{\Gamma(n)\sigma_n^{n}} (\omega_p - u - \underline{\omega}_n)^{n-1} \exp\left(-\frac{1}{\sigma_n}(\omega_p - u - \underline{\omega}_n)\right), \quad \text{for } \omega_p - u > \underline{\omega}_n,
\]
with $p \geq 1$, $n \geq 1$, $\underline{\omega}_p = -p\sigma_p$ and $\underline{\omega}_n = -n\sigma_n$. The lower limit of integration in the expression for $f_{BEGE}(u)$ must be greater than both $\underline{\omega}_p$ and $\underline{\omega}_n + u$, so we define it as $\delta = \max(\underline{\omega}_p, \underline{\omega}_n + u)$.
Firstly, define $\sigma = \frac{1}{\sigma_p} + \frac{1}{\sigma_n}$. When $p = n = 1$, an exact expression for $f_{BEGE}(u)$ can be found:
\[
f_{BEGE}(u) = \int_{\delta}^{\infty} \frac{1}{\sigma_p} \exp\left(-\frac{1}{\sigma_p}(\omega_p - \underline{\omega}_p)\right) \frac{1}{\sigma_n} \exp\left(-\frac{1}{\sigma_n}\left(\omega_p - (u + \underline{\omega}_n)\right)\right) d\omega_p
= k_1 \int_{\delta}^{\infty} \exp(-\sigma\omega_p)\, d\omega_p
= k_1 \frac{1}{\sigma} \exp(-\sigma\delta),
\]
where $k_1 = \frac{1}{\sigma_p}\frac{1}{\sigma_n} \exp\left(\frac{\underline{\omega}_p}{\sigma_p}\right) \exp\left(\frac{u + \underline{\omega}_n}{\sigma_n}\right)$.
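This closed form is easy to verify numerically; a sketch with illustrative σ values, using trapezoidal quadrature as a stand-in for the exact integral:

```python
import numpy as np

def f_bege_exact_p1n1(u, sigma_p, sigma_n):
    """Exact f_BEGE(u) for p = n = 1 via the closed form above."""
    wp_lo, wn_lo = -sigma_p, -sigma_n            # underlined omega_p, omega_n
    delta = max(wp_lo, wn_lo + u)
    sigma = 1.0 / sigma_p + 1.0 / sigma_n
    k1 = np.exp(wp_lo / sigma_p) * np.exp((u + wn_lo) / sigma_n) / (sigma_p * sigma_n)
    return k1 * np.exp(-sigma * delta) / sigma

def f_bege_quadrature_p1n1(u, sigma_p, sigma_n):
    """Brute-force trapezoidal integration of the product of densities."""
    wp_lo, wn_lo = -sigma_p, -sigma_n
    delta = max(wp_lo, wn_lo + u)
    w = np.linspace(delta, delta + 0.5, 400001)  # grid wide enough for these sigmas
    f = (np.exp(-(w - wp_lo) / sigma_p) / sigma_p
         * np.exp(-(w - u - wn_lo) / sigma_n) / sigma_n)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(w)))

exact = f_bege_exact_p1n1(0.01, 0.004, 0.023)
approx = f_bege_quadrature_p1n1(0.01, 0.004, 0.023)
```

The two values agree to high precision, confirming the algebra in this case.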
Then, the log-likelihood of the BEGE model in this case is $\ln[f_{BEGE}(u)] = \ln k_1 - \ln\sigma - \sigma\delta$.

When $p = 1$ and $n > 1$, an exact expression for the likelihood can be found as
\[
f_{BEGE}(u) = \int_{\delta}^{\infty} \frac{1}{\sigma_p} \exp\left(-\frac{1}{\sigma_p}(\omega_p - \underline{\omega}_p)\right) \frac{1}{\Gamma(n)\sigma_n^{n}} (\omega_p - u - \underline{\omega}_n)^{n-1} \exp\left(-\frac{1}{\sigma_n}(\omega_p - u - \underline{\omega}_n)\right) d\omega_p.
\]
Define $k_2 = \frac{1}{\sigma_p}\frac{1}{\Gamma(n)\sigma_n^{n}} \exp\left(\frac{\underline{\omega}_p}{\sigma_p}\right)$ and $c_1 = k_2 \exp\left(-\frac{1}{\sigma_p}(u + \underline{\omega}_n)\right)$; then the likelihood can be written as
\begin{align*}
f_{BEGE}(u)
&= \int_{\delta}^{\infty} k_2 \exp\left(-\frac{1}{\sigma_n}(\omega_p - u - \underline{\omega}_n)\right) \exp\left(-\frac{\omega_p}{\sigma_p} + \frac{u + \underline{\omega}_n}{\sigma_p} - \frac{u + \underline{\omega}_n}{\sigma_p}\right) (\omega_p - u - \underline{\omega}_n)^{n-1}\, d\omega_p \\
&= \int_{\delta}^{\infty} c_1 \exp\left(-\sigma(\omega_p - u - \underline{\omega}_n)\right) (\omega_p - u - \underline{\omega}_n)^{n-1}\, d\omega_p \\
&= \frac{c_1 \Gamma(n)}{\sigma^{n}} \int_{\delta}^{\infty} \frac{\sigma^{n}}{\Gamma(n)} (\omega_p - u - \underline{\omega}_n)^{n-1} \exp\left(-\sigma(\omega_p - u - \underline{\omega}_n)\right) d\omega_p.
\end{align*}
We set $x_p = \omega_p - (u + \underline{\omega}_n)$, which gives $\omega_p = x_p + u + \underline{\omega}_n$ and $d\omega_p = dx_p$. The likelihood can then be written as
\[
f_{BEGE}(u) = \frac{c_1 \Gamma(n)}{\sigma^{n}} \int_{\delta - (u + \underline{\omega}_n)}^{\infty} \frac{\sigma^{n}}{\Gamma(n)} x_p^{n-1} \exp(-\sigma x_p)\, dx_p
= \begin{cases} \dfrac{c_1 \Gamma(n)}{\sigma^{n}}, & \text{if } \delta = u + \underline{\omega}_n, \\[2ex] \dfrac{c_1 \Gamma(n)}{\sigma^{n}} \left[1 - F\left(\underline{\omega}_p - (u + \underline{\omega}_n), n, \sigma\right)\right], & \text{if } \delta = \underline{\omega}_p, \end{cases}
\]
where $F$ is the CDF of a gamma distribution with shape parameter $n$ and rate $\sigma$ over the
interval $(0,\, \underline{\omega}_p - (u + \underline{\omega}_n))$.

Similarly, in the case of $n = 1$ and $p > 1$ an exact expression for the likelihood can be found as
\[
f_{BEGE}(u) = \int_{\delta}^{\infty} \frac{1}{\sigma_n} \exp\left(-\frac{1}{\sigma_n}\left(\omega_p - (u + \underline{\omega}_n)\right)\right) \frac{1}{\Gamma(p)\sigma_p^{p}} (\omega_p - \underline{\omega}_p)^{p-1} \exp\left(-\frac{1}{\sigma_p}(\omega_p - \underline{\omega}_p)\right) d\omega_p.
\]
Define $k_3 = \frac{1}{\sigma_n}\frac{1}{\Gamma(p)\sigma_p^{p}} \exp\left(\frac{u + \underline{\omega}_n}{\sigma_n}\right)$ and $c_2 = k_3 \exp\left(-\frac{\underline{\omega}_p}{\sigma_n}\right)$; then the likelihood can be written as
\begin{align*}
f_{BEGE}(u)
&= \int_{\delta}^{\infty} k_3 \exp\left(-\frac{1}{\sigma_p}(\omega_p - \underline{\omega}_p)\right) \exp\left(-\frac{\omega_p}{\sigma_n} + \frac{\underline{\omega}_p}{\sigma_n} - \frac{\underline{\omega}_p}{\sigma_n}\right) (\omega_p - \underline{\omega}_p)^{p-1}\, d\omega_p \\
&= \int_{\delta}^{\infty} c_2 (\omega_p - \underline{\omega}_p)^{p-1} \exp\left(-\sigma(\omega_p - \underline{\omega}_p)\right) d\omega_p \\
&= \frac{c_2 \Gamma(p)}{\sigma^{p}} \int_{\delta}^{\infty} \frac{\sigma^{p}}{\Gamma(p)} (\omega_p - \underline{\omega}_p)^{p-1} \exp\left(-\sigma(\omega_p - \underline{\omega}_p)\right) d\omega_p.
\end{align*}
We set $x_p = \omega_p - \underline{\omega}_p$, which gives $\omega_p = x_p + \underline{\omega}_p$ and $d\omega_p = dx_p$. The likelihood can then be written as
\[
f_{BEGE}(u) = \frac{c_2 \Gamma(p)}{\sigma^{p}} \int_{\delta - \underline{\omega}_p}^{\infty} \frac{\sigma^{p}}{\Gamma(p)} x_p^{p-1} \exp(-\sigma x_p)\, dx_p
= \begin{cases} \dfrac{c_2 \Gamma(p)}{\sigma^{p}}, & \text{if } \delta = \underline{\omega}_p, \\[2ex] \dfrac{c_2 \Gamma(p)}{\sigma^{p}} \left[1 - F\left(u + \underline{\omega}_n - \underline{\omega}_p, p, \sigma\right)\right], & \text{if } \delta = u + \underline{\omega}_n, \end{cases}
\]
where $F$ is the CDF of a gamma distribution with shape parameter $p$ and rate $\sigma$ over the interval $(0,\, u + \underline{\omega}_n - \underline{\omega}_p)$.
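The exact expressions above can be checked against direct numerical integration. The sketch below does this for the p = 1, n > 1 case, restricting to integer n so that the gamma CDF has the closed Erlang form and the check stays self-contained; the σ values are illustrative:

```python
import numpy as np
from math import factorial, gamma as gamma_fn

def gamma_cdf_int_shape(x, n, rate):
    # Gamma(n, rate) CDF for integer shape n (Erlang closed form),
    # used here so the sketch needs no external CDF routine.
    return 1.0 - np.exp(-rate * x) * sum((rate * x) ** k / factorial(k) for k in range(n))

def f_bege_exact_p1(u, sigma_p, sigma_n, n):
    """Exact f_BEGE(u) for p = 1, n > 1 via the piecewise formula above."""
    wp_lo, wn_lo = -sigma_p, -n * sigma_n
    delta = max(wp_lo, wn_lo + u)
    sigma = 1.0 / sigma_p + 1.0 / sigma_n
    k2 = np.exp(wp_lo / sigma_p) / (sigma_p * gamma_fn(n) * sigma_n ** n)
    c1 = k2 * np.exp(-(u + wn_lo) / sigma_p)
    base = c1 * gamma_fn(n) / sigma ** n
    if delta == u + wn_lo:
        return base
    return base * (1.0 - gamma_cdf_int_shape(wp_lo - (u + wn_lo), n, sigma))

def f_bege_quadrature(u, sigma_p, sigma_n, p, n):
    # Direct trapezoidal integration of the product of the two densities.
    wp_lo, wn_lo = -p * sigma_p, -n * sigma_n
    delta = max(wp_lo, wn_lo + u)
    w = np.linspace(delta, delta + 0.5, 400001)
    f = ((w - wp_lo) ** (p - 1) * np.exp(-(w - wp_lo) / sigma_p) / (gamma_fn(p) * sigma_p ** p)
         * (w - u - wn_lo) ** (n - 1) * np.exp(-(w - u - wn_lo) / sigma_n) / (gamma_fn(n) * sigma_n ** n))
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(w)))

exact = f_bege_exact_p1(0.01, 0.004, 0.023, n=3)
approx = f_bege_quadrature(0.01, 0.004, 0.023, p=1, n=3)
```

The n = 1, p > 1 case can be checked in exactly the same way with the roles of the two gamma components swapped.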
In the case of $p > 1$ and $n > 1$, we use an importance sampling method to estimate the likelihood of the BEGE model. Here we attempt to find a parametric importance distribution $g(\omega_p)$ that closely matches the target function $h(\omega_p)$. Firstly, we determine the mode and curvature of $h(\omega_p)$. The expression for $h(\omega_p)$ is given by
\[
h(\omega_p) = \frac{1}{\Gamma(p)\sigma_p^{p}} (\omega_p - \underline{\omega}_p)^{p-1} \exp\left(-\frac{1}{\sigma_p}(\omega_p - \underline{\omega}_p)\right) \frac{1}{\Gamma(n)\sigma_n^{n}} (\omega_p - u - \underline{\omega}_n)^{n-1} \exp\left(-\frac{1}{\sigma_n}(\omega_p - u - \underline{\omega}_n)\right).
\]
We set $m(\omega_p) = \ln[h(\omega_p)]$, which is given by
\[
m(\omega_p) = -\ln\Gamma(p) - p\ln\sigma_p + (p-1)\ln(\omega_p - \underline{\omega}_p) - \frac{1}{\sigma_p}(\omega_p - \underline{\omega}_p) - \ln\Gamma(n) - n\ln\sigma_n + (n-1)\ln(\omega_p - u - \underline{\omega}_n) - \frac{1}{\sigma_n}(\omega_p - u - \underline{\omega}_n).
\]
The first and second derivatives of $m(\omega_p)$ are given by
\[
m'(\omega_p) = \frac{p-1}{\omega_p - \underline{\omega}_p} + \frac{n-1}{\omega_p - u - \underline{\omega}_n} - \sigma, \qquad
m''(\omega_p) = \frac{1-p}{(\omega_p - \underline{\omega}_p)^2} + \frac{1-n}{(\omega_p - u - \underline{\omega}_n)^2}.
\]
The mode $\hat{\omega}_p$ of $m(\omega_p)$ can be found by setting $m'(\omega_p) = 0$, which gives
\[
(p-1)(\omega_p - u - \underline{\omega}_n) + (n-1)(\omega_p - \underline{\omega}_p) - \sigma(\omega_p - \underline{\omega}_p)(\omega_p - u - \underline{\omega}_n) = 0,
\]
\[
\sigma\omega_p^2 + \left[-\sigma(u + \underline{\omega}_n + \underline{\omega}_p) + (2 - n - p)\right]\omega_p + \left[\sigma\underline{\omega}_p(u + \underline{\omega}_n) + (p-1)(u + \underline{\omega}_n) + (n-1)\underline{\omega}_p\right] = 0.
\]
By setting $a = \sigma$, $b = -\sigma(u + \underline{\omega}_n + \underline{\omega}_p) + (2 - n - p)$ and $c = \sigma\underline{\omega}_p(u + \underline{\omega}_n) + (p-1)(u + \underline{\omega}_n) + (n-1)\underline{\omega}_p$, we can write $a\omega_p^2 + b\omega_p + c = 0$, and the mode $\hat{\omega}_p = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$ is easily derived. The variance is estimated using the second derivative of $m(\omega_p)$ at the mode $\hat{\omega}_p$, which gives
\[
V = -\left[\frac{1-p}{(\hat{\omega}_p - \underline{\omega}_p)^2} + \frac{1-n}{(\hat{\omega}_p - u - \underline{\omega}_n)^2}\right]^{-1}.
\]
We propose a gamma distribution, $X \sim \Gamma(i, j)$ for $x > 0$, with mode $\hat{\omega}_p$ and variance $V$, as the basis of the importance distribution. The mode of this gamma distribution is $(i-1)j$ and the variance is $ij^2$. As $\omega_p > \delta$, the support of the importance distribution must lie above $\delta$, so a shifted gamma distribution, $Y = X + \delta$ for $y > \delta$, is selected. As $x = y - \delta$, we have $\left|\frac{dx}{dy}\right| = 1$, and the density of this importance distribution $g(y)$ is given by
\[
g(y) = f_X(x(y)) \left|\frac{dx}{dy}\right| = \frac{1}{j^{i}\Gamma(i)} (y - \delta)^{i-1} \exp\left(-\frac{y - \delta}{j}\right),
\]
where the mode of this distribution is $(i-1)j + \delta$ and the variance is $ij^2$. The values of the parameters $i$ and $j$ are derived by solving $(i-1)j + \delta = \hat{\omega}_p$ and $ij^2 = V$, which gives
\[
i = \frac{-b_i + \sqrt{b_i^2 - 4V^2}}{2V}, \quad \text{where } b_i = -2V - (\hat{\omega}_p - \delta)^2, \qquad
j = \frac{\hat{\omega}_p - \delta}{i - 1}.
\]
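Putting the pieces together, the shifted-gamma importance sampling estimator of f_BEGE(u) for p > 1, n > 1 can be sketched as follows (the parameter values are illustrative, and the quadrature reference below is only there to check the estimate):

```python
import numpy as np
from math import gamma as gamma_fn

def is_estimate_f_bege(u, sigma_p, sigma_n, p, n, M, rng):
    """Importance sampling estimate of f_BEGE(u) for p > 1, n > 1 using the
    shifted-gamma importance distribution derived above."""
    wp_lo, wn_lo = -p * sigma_p, -n * sigma_n
    delta = max(wp_lo, wn_lo + u)
    sigma = 1.0 / sigma_p + 1.0 / sigma_n
    # Mode of the integrand: positive root of a*w^2 + b*w + c = 0.
    a = sigma
    b = -sigma * (u + wn_lo + wp_lo) + (2.0 - n - p)
    c = sigma * wp_lo * (u + wn_lo) + (p - 1) * (u + wn_lo) + (n - 1) * wp_lo
    mode = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    # Curvature-based variance around the mode.
    V = -1.0 / ((1 - p) / (mode - wp_lo) ** 2 + (1 - n) / (mode - u - wn_lo) ** 2)
    # Shifted Gamma(i, j) matching that mode and variance.
    b_i = -2.0 * V - (mode - delta) ** 2
    i = (-b_i + np.sqrt(b_i * b_i - 4.0 * V * V)) / (2.0 * V)
    j = (mode - delta) / (i - 1.0)
    y = delta + rng.gamma(i, j, size=M)
    # Target h(y): product of the two demeaned-gamma densities.
    h = ((y - wp_lo) ** (p - 1) * np.exp(-(y - wp_lo) / sigma_p) / (gamma_fn(p) * sigma_p ** p)
         * (y - u - wn_lo) ** (n - 1) * np.exp(-(y - u - wn_lo) / sigma_n) / (gamma_fn(n) * sigma_n ** n))
    g = (y - delta) ** (i - 1) * np.exp(-(y - delta) / j) / (gamma_fn(i) * j ** i)
    return float(np.mean(h / g))

def f_bege_quadrature(u, sigma_p, sigma_n, p, n):
    # Deterministic trapezoidal reference (grid adequate for these sigmas).
    wp_lo, wn_lo = -p * sigma_p, -n * sigma_n
    delta = max(wp_lo, wn_lo + u)
    w = np.linspace(delta, delta + 0.3, 600001)
    h = ((w - wp_lo) ** (p - 1) * np.exp(-(w - wp_lo) / sigma_p) / (gamma_fn(p) * sigma_p ** p)
         * (w - u - wn_lo) ** (n - 1) * np.exp(-(w - u - wn_lo) / sigma_n) / (gamma_fn(n) * sigma_n ** n))
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(w)))

rng = np.random.default_rng(7)
est = is_estimate_f_bege(0.01, 0.004, 0.023, p=2.5, n=3.2, M=20000, rng=rng)
ref = f_bege_quadrature(0.01, 0.004, 0.023, p=2.5, n=3.2)
```

Because the proposal matches the mode and curvature of the integrand, the importance weights are well behaved and the estimate is close to the quadrature reference even with a modest Monte Carlo sample size.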
After obtaining the importance distribution $g(\cdot)$, we can unbiasedly estimate $f_{BEGE}(u)$ by following Equation 3.4. It is worth noting that this importance distribution can be used only when the mode $\hat{\omega}_p$ is greater than $\delta$, the lower limit of integration in the expression for $f_{BEGE}(u)$. For this case, it can be proved that $\hat{\omega}_p$ is always greater than $\delta$; the details of the proof are shown below.

(1) When $\delta = \underline{\omega}_p > u + \underline{\omega}_n$:
\begin{align*}
(2a\delta + b)^2 &= (2\sigma\underline{\omega}_p + b)^2 \\
&= b^2 + 4\sigma^2\underline{\omega}_p^2 + 4\sigma\underline{\omega}_p\left[-\sigma(u + \underline{\omega}_n + \underline{\omega}_p) + (2 - n - p)\right] \\
&= b^2 + 4\sigma^2\underline{\omega}_p^2 - 4\sigma^2\underline{\omega}_p^2 - 4\sigma^2\underline{\omega}_p(u + \underline{\omega}_n) + 4\sigma\underline{\omega}_p(2 - n - p) \\
&= b^2 - \left[4\sigma^2(u + \underline{\omega}_n)\underline{\omega}_p + 4\sigma(n-1)\underline{\omega}_p + 4\sigma(p-1)\underline{\omega}_p\right] \\
&= b^2 - 4ad, \quad \text{where } d = \sigma(u + \underline{\omega}_n)\underline{\omega}_p + (n-1)\underline{\omega}_p + (p-1)\underline{\omega}_p.
\end{align*}
Since $c = \sigma(u + \underline{\omega}_n)\underline{\omega}_p + (n-1)\underline{\omega}_p + (p-1)(u + \underline{\omega}_n)$ and $\underline{\omega}_p > u + \underline{\omega}_n$, we have $d > c$, so $4ad > 4ac$ and $(2a\delta + b)^2 = b^2 - 4ad < b^2 - 4ac$. Hence $2a\delta + b < \sqrt{b^2 - 4ac}$ and
\[
\delta < \frac{-b + \sqrt{b^2 - 4ac}}{2a} = \hat{\omega}_p.
\]

(2) When $\delta = u + \underline{\omega}_n > \underline{\omega}_p$:
\begin{align*}
(2a\delta + b)^2 &= \left(2\sigma(u + \underline{\omega}_n) + b\right)^2 \\
&= b^2 + 4\sigma^2(u + \underline{\omega}_n)^2 + 4\sigma(u + \underline{\omega}_n)\left[-\sigma(u + \underline{\omega}_n + \underline{\omega}_p) + (2 - n - p)\right] \\
&= b^2 + 4\sigma^2(u + \underline{\omega}_n)^2 - 4\sigma^2(u + \underline{\omega}_n)^2 - 4\sigma^2\underline{\omega}_p(u + \underline{\omega}_n) - 4\sigma(u + \underline{\omega}_n)(n + p - 2) \\
&= b^2 - 4\sigma\left[\sigma(u + \underline{\omega}_n)\underline{\omega}_p + (n-1)(u + \underline{\omega}_n) + (p-1)(u + \underline{\omega}_n)\right] \\
&= b^2 - 4ae, \quad \text{where } e = \sigma(u + \underline{\omega}_n)\underline{\omega}_p + (n-1)(u + \underline{\omega}_n) + (p-1)(u + \underline{\omega}_n).
\end{align*}
Since $c = \sigma(u + \underline{\omega}_n)\underline{\omega}_p + (n-1)\underline{\omega}_p + (p-1)(u + \underline{\omega}_n)$ and $u + \underline{\omega}_n > \underline{\omega}_p$, we have $e > c$, so $4ae > 4ac$ and $(2a\delta + b)^2 < b^2 - 4ac$. Hence $2a\delta + b < \sqrt{b^2 - 4ac}$ and
\[
\delta < \frac{-b + \sqrt{b^2 - 4ac}}{2a} = \hat{\omega}_p.
\]
References
Alberg, D., Shalit, H., and Yosef, R. (2008). Estimating stock market volatility using asymmetric GARCH models. Applied Financial Economics, 18(15):1201–1208.

Alexander, C. and Lazar, E. (2006). Normal mixture GARCH(1, 1): Applications to exchange rate modelling. Journal of Applied Econometrics, 21(3):307–336.

Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, 37(2):697–725.

Ardia, D. et al. (2008). Financial risk management with Bayesian estimation of GARCH models. Springer.

Ardia, D. and Hoogerheide, L. F. (2010). Efficient Bayesian estimation and combination of GARCH-type models. Rethinking Risk Measurement and Reporting: Examples and Applications from Finance, 2.

Bauwens, L. and Lubrano, M. (1998). Bayesian inference on GARCH models using the Gibbs sampler. The Econometrics Journal, 1(1):C23–C46.

Bauwens, L. and Lubrano, M. (2002). Bayesian option pricing using asymmetric GARCH models. Journal of Empirical Finance, 9(3):321–342.

Bekaert, G., Engstrom, E., and Ermolov, A. (2015). Bad environments, good environments: A non-Gaussian asymmetric volatility model. Journal of Econometrics, 186(1):258–275.

Berger, J. O. (2013). Statistical decision theory and Bayesian analysis. Springer Science & Business Media.

Berkes, I., Horvath, L., and Kokoszka, P. (2003). GARCH processes: Structure and estimation. Bernoulli, pages 201–227.

Black, F. (1976). Studies of stock market volatility changes. Proceedings of the American Statistical Association, the Business and Economics Section.

Blasques, F., Koopman, S. J., Lasak, K., and Lucas, A. (2016). In-sample confidence bands and out-of-sample forecast bands for time-varying parameters in observation-driven models. International Journal of Forecasting, 32(3):875–887.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.

Bollerslev, T. and Wooldridge, J. M. (1992). Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews, 11(2):143–
172. 18
Bolstad, W. M. and Curran, J. M. (2016). Introduction to Bayesian statistics. John Wiley
& Sons. 6
Bouchaud, J.-P., Matacz, A., and Potters, M. (2001). Leverage effect in financial markets:
The retarded volatility model. Physical review letters, 87(22):228701. 14
Burkner, P.-C., Gabry, J., and Vehtari, A. (2019). Approximate leave-future-out cross-
validation for time series models. arXiv preprint arXiv:1902.06281. 29, 30, 46, 48
Caporale, G. M., Ali, F. M., and Spagnolo, N. (2015). Oil price uncertainty and sectoral
stock returns in China: A time-varying approach. China Economic Review, 34:311–321.
18
56
Chen, C. W., Gerlach, R., and So, M. K. (2006). Comparison of nonnested asymmetric
heteroskedastic models. Computational Statistics & Data Analysis, 51(4):2164–2178. 19
Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The
American Statistician, 49(4):327–335. 6
Chopin, N. (2002). A sequential particle filter method for static models. Biometrika,
89(3):539–552. 1, 8, 10, 11, 24
Chopin, N., Jacob, P. E., and Papaspiliopoulos, O. (2013). SMC2: an efficient algorithm for
sequential analysis of state space models. Journal of the Royal Statistical Society: Series
B (Statistical Methodology), 75(3):397–426. 10, 27
Corradi, V., Distaso, W., and Swanson, N. R. (2009). Predictive density estimators for daily
volatility based on the use of realized measures. Journal of Econometrics, 150(2):119–138.
24
Corradi, V., Distaso, W., and Swanson, N. R. (2011). Predictive inference for integrated
volatility. Journal of the American Statistical Association, 106(496):1496–1512. 24
Del Moral, P., Doucet, A., and Jasra, A. (2006). Sequential Monte Carlo samplers. Journal
of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436. 1, 20,
24
Drovandi, C. C. and Pettitt, A. N. (2011). Likelihood-free Bayesian estimation of multivari-
ate quantile distributions. Computational Statistics and Data Analysis, 55(9):2541–2556.
11
Duan, J.-C. and Fulop, A. (2015). Density-tempered marginalized sequential Monte Carlo
samplers. Journal of Business & Economic Statistics, 33(2):192–202. 10, 27
Dyhrberg, A. H. (2016). Bitcoin, gold and the dollar–A GARCH volatility analysis. Finance
Research Letters, 16:85–92. 14
57
Engle, R. (2001). GARCH 101: The use of ARCH/GARCH models in applied econometrics.
Journal of Economic Perspectives, 15(4):157–168. 16, 23
Engle, R. (2004). Risk and volatility: Econometric models and financial practice. American
Economic Review, 94(3):405–420. 14
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the
variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society,
pages 987–1007. 23
Engle, R. F. and Ng, V. K. (1993). Measuring and Testing the Impact of News on Volatility.
The Journal of Finance, 48(5):1749–1778. 15
Francq, C. and Zakoıan, J.-M. (2011). GARCH Models: Structure, Statistical Inference and
Financial Applications. John Wiley & Sons. 15, 16
Friel, N. and Wyse, J. (2012). Estimating the evidence–a review. Statistica Neerlandica,
66(3):288–308. 20
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information
criteria for Bayesian models. Statistics and Computing, 24(6):997–1016. 21
Gerlach, R. and Chen, C. W. (2008). Bayesian inference and model comparison for asym-
metric smooth transition heteroskedastic models. Statistics and Computing, 18(4):391.
19
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration.
Econometrica: Journal of the Econometric Society, pages 1317–1339. 19
Giraitis, L., Leipus, R., and Surgailis, D. (2007). Recent advances in ARCH modelling. In
Long Memory in Economics, pages 3–38. Springer. 14
58
Glosten, L. R., Jagannathan, R., and Runkle, D. E. (1993). On the relation between the
expected value and the volatility of the nominal excess return on stocks. The Journal of
Finance, 48(5):1779–1801. 14, 23
Griffin, J., Liu, J., and Maheu, J. (2016). Bayesian nonparametric estimation of ex-post
variance. Technical report, University Library of Munich, Germany. 24
Guesmi, K., Saadi, S., Abid, I., and Ftiti, Z. (2019). Portfolio diversification with virtual
currency: Evidence from bitcoin. International Review of Financial Analysis, 63:431–437.
18
Haario, H., Laine, M., Mira, A., and Saksman, E. (2006). DRAM: Efficient adaptive MCMC.
Statistics and Computing, 16(4):339–354. 8
Haario, H., Saksman, E., Tamminen, J., et al. (2001). An adaptive Metropolis algorithm.
Bernoulli, 7(2):223–242. 8
Hall, P. and Yao, Q. (2003). Inference in ARCH and GARCH models with heavy–tailed
errors. Econometrica, 71(1):285–317. 18
Hamilton, J. D. (1994). Time Series Analysis, volume 2. Princeton university press Prince-
ton, NJ. 2, 24
Hansen, P. R. and Huang, Z. (2016). Exponential GARCH modeling with realized measures
of volatility. Journal of Business & Economic Statistics, 34(2):269–287. 18
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their
applications. Biometrika, 57(1):97–109. 1, 6
Hentschel, L. et al. (1995). All in the family: Nesting symmetric and asymmetric GARCH
models. Journal of Financial Economics, 39(1):71–104. 23
59
Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. In
Mathematical Proceedings of the Cambridge Philosophical Society, volume 31, pages 203–
222. Cambridge University Press. 19
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical
Association, 90(430):773–795. 19, 21
Khedhiri, S. and Muhammad, N. (2008). Empirical analysis of the UAE stock market
volatility. 14
Kim, S., Shephard, N., and Chib, S. (1998). Stochastic volatility: likelihood inference and
comparison with ARCH models. The Review of Economic Studies, 65(3):361–393. 2, 24
Koutmos, G. (1998). Asymmetries in the conditional mean and the conditional variance:
Evidence from nine stock markets. Journal of Economics and Business, 50(3):277–290.
14
Kristjanpoller, W. and Minutolo, M. C. (2015). Gold price volatility: A forecasting approach
using the artificial neural network–GARCH model. Expert Systems with Applications,
42(20):7245–7251. 14
Kristjanpoller, W. and Minutolo, M. C. (2016). Forecasting volatility of oil price using an
artificial neural network-GARCH model. Expert Systems with Applications, 65:233–241.
14
Kuha, J. (2004). AIC and BIC: Comparisons of assumptions and performance. Sociological
Methods and Research, 33(2):188–229. 21
Li, D., Clements, A., and Drovandi, C. (2019). Efficient Bayesian estimation for GARCH-
type models via sequential Monte Carlo. arXiv preprint arXiv:1906.03828. 4
Liu, C. C. and Aitkin, M. (2008). Bayes factors: Prior sensitivity and model generalizability.
Journal of Mathematical Psychology, 52(6):362–375. 48
60
Liu, J. S. (2008). Monte Carlo strategies in scientific computing. Springer Science & Business
Media. 9
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953).
Equation of state calculations by fast computing machines. The Journal of Chemical
Physics, 21(6):1087–1092. 1, 6
Mills, T. C. and Markellos, R. N. (2008). The econometric modelling of financial time series.
Cambridge University Press. 1
Muller, P. and Pole, A. (1998). Monte Carlo posterior integration in GARCH models.
Sankhya: The Indian Journal of Statistics, Series B, pages 127–144. 19
Nakatsuma, T. (1998). A Markov-chain sampling algorithm for GARCH models. Studies in
Nonlinear Dynamics & Econometrics, 3(2). 19
Nakatsuma, T. (2000). Bayesian analysis of ARMA–GARCH models: A Markov chain
sampling approach. Journal of Econometrics, 95(1):57–69. 19
Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11(2):125–
139. 9, 10, 24
Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach.
Econometrica: Journal of the Econometric Society, pages 347–370. 14, 23
O’Hagan, A. (2004). Bayesian statistics: principles and benefits. Frontis, pages 31–45. 24
Ritter, C. and Tanner, M. A. (1992). Facilitating the Gibbs sampler: the Gibbs stopper and
the griddy-Gibbs sampler. Journal of the American Statistical Association, 87(419):861–
868. 19
Robert, C. and Casella, G. (2004). Monte Carlo statistical Methods. Springer New York. 24
Roberts, G. O. and Rosenthal, J. S. (2009). Examples of adaptive MCMC. Journal of
Computational and Graphical Statistics, 18(2):349–367. 8, 24
61
Salomone, R., South, L. F., Drovandi, C. C., and Kroese, D. P. (2018). Unbiased and
consistent nested sampling via sequential Monte Carlo. arXiv preprint arXiv:1805.03924.
11, 12
Schwarz, G. et al. (1978). Estimating the Dimension of a Model. The Annals of Statistics,
6(2):461–464. 21
Sherlock, C. and Roberts, G. (2009). Optimal scaling of the random walk Metropolis on
elliptically symmetric unimodal targets. Bernoulli, pages 774–798. 11
South, L. F., Pettitt, A. N., and Drovandi, C. C. (2019). Sequential Monte Carlo samplers
with independent Markov chain Monte Carlo proposals. To appear in Bayesian Analysis.
1, 25
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian
measures of model complexity and fit. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 64(4):583–639. 21
Takaishi, T. (2009). An adaptive Markov chain Monte Carlo method for GARCH model.
In International Conference on Complex Sciences, pages 1424–1434. Springer. 2, 24
Trevor, H., Robert, T., and JH, F. (2009). The elements of statistical learning: data mining,
inference, and prediction. 21
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using
leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5):1413–1432. 29
Virbickaite, A., Ausın, M. C., and Galeano, P. (2015). Bayesian inference methods for
univariate and multivariate GARCH models: A survey. Journal of Economic Surveys,
29(1):76–96. 19
62
Vrontos, I. D., Dellaportas, P., and Politis, D. N. (2000). Full Bayesian inference for GARCH
and EGARCH models. Journal of Business & Economic Statistics, 18(2):187–198. 2, 19,
24
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable
information criterion in singular learning theory. Journal of Machine Learning Research,
11(Dec):3571–3594. 48
63