University of Amsterdam
Faculty of Economics & Business
MSc Thesis: Financial Econometrics
Modeling the Term Structure of Interest
Rates: using the Generalized
Autoregressive Score Framework
Student: Guido Jonker (10457615)
Supervisor: dr. N.P.A van Giersbergen
Second Reader: prof. dr. H.P. Boswijk
May 2014
Abstract
In this thesis the term structure of interest rates is modeled with the purpose of fitting
and forecasting. For this the Dynamic Nelson-Siegel (DNS) model is used, which is
estimated using the Generalized Autoregressive Score (GAS) framework. Within the GAS
framework, some new extensions of the DNS are proposed and some existing extensions
are evaluated. We propose a new time-varying volatility specification. Also, an
extension with Student-t disturbances is proposed, but found unfit for modeling. Further,
extensions with nonlinearities and an additional fourth factor are investigated. We find
that more flexible models lead to a better in-sample fit of the data. Moreover, the GAS
estimated models lead to a better in-sample fit than comparable standard models
estimated by the Kalman filter. However, out-of-sample predictability of the term structure
is not proven for the new estimation method and model extensions. Sub-sample analysis
indicates that a naive random walk is difficult to beat using both the GAS and Kalman
modeling framework.
Contents
Abstract
1 Introduction
2 Theory
2.1 The Yield Curve
2.2 Zero-Coupon Yields
2.3 Why Model?
2.4 Nelson-Siegel Model
2.4.1 Dynamic Nelson Siegel
2.5 Generalized Autoregressive Score
2.5.1 The Modeling Framework
3 Model Specifications
3.1 General Model
3.1.1 Gaussian
3.1.2 Student-t
3.1.3 Variable Lambda
3.1.4 Time-Varying Volatility
3.1.5 Common Disturbance with Time-Varying Volatility
3.1.6 Common Volatility
4 Data
5 Estimation
5.1 Initial Estimates
5.1.1 Lambda
5.1.2 Two-Step Estimation
5.1.3 One-Step State-Space Estimation
5.2 GAS Estimation
5.2.1 Maximum Likelihood Estimation
5.2.2 Initial Factors
5.2.3 Smoothing
5.2.4 Inference
5.3 Scores and Scaling
5.3.1 Gaussian
5.3.2 Student-t
5.3.3 Variable Lambda
5.3.4 Common Disturbance with Time-Varying Volatility
5.3.5 Common Time-Varying Volatility
6 Results
6.1 In-Sample Performance
6.1.1 Two-Step
6.1.2 Kalman
6.1.3 Gaussian
6.1.4 Student-t
6.1.5 Variable Lambda
6.1.6 Common Disturbance with Time-Varying Volatility
6.1.7 Common Time-Varying Volatility
6.1.8 Bjork and Christensen Four-Factor Model
6.1.9 Estimation Robustness
6.1.10 In-Sample Conclusion
6.2 Out-of-Sample Performance
6.2.1 Forecast Procedure
6.2.2 Forecast Measures
6.2.3 Forecast Results
6.2.4 Out-of-Sample Conclusion
7 Conclusion
8 Further Research
A Tables
B Kalman Filter
Bibliography
Chapter 1
Introduction
The term structure of interest rates gives the relation between interest rates or bond
yields at different terms (maturities). In general yields increase in line with maturity,
giving rise to an upward sloping yield curve. One basic explanation for this compound-
ing phenomenon is that lenders demand higher interest rates for longer-term loans as
compensation for the greater risk associated with them, in comparison to short-term
loans. The yield curve plays a central role in an economy. Modeling and forecasting the
term structure of interest rates is therefore of great importance in many ways: pricing
derivatives, portfolio management, valuation of assets, risk management and monetary
policy.
To address this issue researchers have produced a vast literature. Many use
theoretically rigorous approaches, but these give empirically unsatisfying results and
especially poor out-of-sample forecasts. Because bonds trade in well-organized, deep
and liquid markets, it is logical and appealing to impose the absence of arbitrage.
The US Treasury market, for example, is very large and liquid, with a total outstanding
debt of almost 12 trillion, a yearly issuance of over 2 trillion and approximately
500 billion traded every day¹. Because of this liquidity it is unlikely that arbitrage
opportunities exist: the risk-adjusted returns of bonds of different maturities should
be the same. Stated differently: the yields are internally consistent. Therefore, a large
part of the literature takes this as its foundation. The associated arbitrage-free (AF)
models started with Vasicek (1977) and Cox et al. (1985), who introduced the so-called
'affine' models, which are functions of the instantaneous short rate (r). These
models became even more popular when Duffie and Kan (1996) generalized them.
Unfortunately these models fit poorly and are difficult to estimate, many having
multiple likelihood maxima (Kim and Orphanides, 2005). Besides these estimation
difficulties, the forecasting performance of these affine models is also poor (Duffee, 2002).
¹ http://www.sifma.org/research/statistics.aspx
As a result of these difficulties, many researchers have focused on empirically
attractive models, the most important being the Nelson-Siegel (NS) curve (1987). The
NS curve is widely used among central banks and others in the financial industry. The
reasons are its relative simplicity, its ease of estimation and its empirically tractable
forecasting results. It tackles the problem of modeling the yield curve by summarising the
information at any point in time for a large number of bonds. It does so by expressing
the large set of yields of various maturities as a function of a small set of unobserved
factors. The economic interpretation of the three factors it uses is: the
level (long-term), slope (short-term) and curvature (medium-term) of the yield curve.
The Nelson-Siegel curve is reasonably flexible, allowing for various shapes (monotonic,
inverted, humped, S-shaped), and therefore generally fits the data well.
As an evolution, the Dynamic Nelson-Siegel (DNS) model was developed by Diebold and
Li (2006). This model imposes the structural restrictions of the NS curve with
time-varying factors (level, slope, curvature), which are modeled using (vector)
autoregressive specifications. Their paper shows that the resulting forecasts outperform
standard time series models, which brought research back to the Nelson-Siegel class.
Several researchers have extended and investigated the DNS. Diebold et al. (2006) put
the DNS in state-space form and include macro-economic factors. Pooter (2007)
examines various extensions of the DNS with the purpose of fitting and forecasting. He
concludes that an extension with a fourth factor (Svensson, 1995) forecasts very well,
especially with a one-step state-space estimation approach using a Kalman filter.
Koopman et al. (2010) extend the DNS in two directions. First, they let the factor
loadings in the DNS depend on an additional loading parameter, which they treat as the
fourth latent variable (λt). Second, they introduce time-varying volatility to the DNS
using a standard GARCH specification.
Besides the DNS, the Arbitrage-Free Nelson-Siegel (AFNS) model was
developed by Christensen et al. (2011), which takes the DNS and imposes the absence of
arbitrage. This partly closes the gap between theoretically rigorous and statistical term
structure models. It adds a time-invariant "yield-adjustment term", which
accounts for the differences between the AFNS and the DNS. Because the DNS generally fits
well, and market yields are assumed to be arbitrage-free, it is arguable that the DNS is
arbitrage-free up to an accurate approximation: the no-arbitrage constraint should be
largely non-binding. Coroneo et al. (2011) support this claim by finding that the normal
Nelson-Siegel model is compatible with the no-arbitrage constraints in the US interest rate
market. Duffee and Stanton (2012) come to similar conclusions. Joslin et al. (2011)
conclude that for their JSZ model forecasts are invariant to the imposition of no-arbitrage
restrictions. Furthermore, they state that the AFNS model is a constrained special case
of the JSZ normalization. Despite this consideration a lot of recent research has focused
on the AFNS. For example, Christensen et al. (2009a) extend their AFNS model to
form the Arbitrage-Free Generalized Nelson-Siegel (AFGNS) model, a Svensson extension (a
fourth factor). Also, Christensen et al. (2013) incorporate stochastic volatility in the
AFNS, but conclude that much observed stochastic volatility cannot be associated with
the spanned term structure factors.
Because of these arbitrage-free considerations, this thesis will focus on the DNS. The
novel feature of this thesis is the relatively new estimation method. Time-series
models with time-varying parameters can be categorized into two classes: parameter driven
models and observation driven models (Cox et al., 1981).
In parameter driven models, the time-varying parameters are stochastic processes sub-
ject to their own source of error. Therefore, the parameters are not perfectly predictable
given the past: the likelihood function is not known in closed form.
The alternative is an observation driven model, where the time-varying parameters are
dependent on (functions of) lagged dependent values, exogenous variables and past ob-
servations. Parameters are stochastic, but predictable given the past. This approach
simplifies likelihood evaluation, because the likelihood function is known in closed form.
Creal et al. (2008, 2012) introduce Generalized Autoregressive Score (GAS) models; a
class of observation driven time series models. The GAS model uses the scaled score
function as driving mechanism of the time-varying parameters.
The use of the GAS framework differs from most of the literature, as most papers use a
parameter driven approach, in which the NS parametrization is used for the observation
equation and combined with a transition equation for the unobserved factors, which are
modeled as (vector) autoregressive processes. Together these equations form a state-space
structure. Restrictions on this state space are imposed in order to estimate the model:
mostly, Gaussian disturbances are assumed in both equations. This
way it is possible to use a Kalman filter (1960). Some propose Bayesian estimation using
Markov Chain Monte Carlo (MCMC) in order to avoid these assumptions (Laurini
and Hotta, 2010). With the use of the GAS framework we are not restricted to such
assumptions or methods. This leads to the main question that is answered in this thesis:
How does the GAS framework perform in the term structure setting?
Which model specification performs best? Different specifications are compared on both
in-sample fit and out-of-sample forecasting, as well as on ease and robustness of
estimation.
This main question results in different sub-questions:
• Is the assumption of normality valid?
As mentioned, previous applications of the DNS/AFNS have used Gaussian errors.
Different distributions could be used, such as the Student-t distribution.
• Can heteroskedasticity be included in the model?
Is the assumption of cross-sectional and longitudinal independence of the distur-
bances valid? Can the model be extended to incorporate other possibilities?
• How do GAS estimated models perform in relation to Kalman estimated models?
The performance of the GAS estimated models is compared to traditional Kalman
estimated models, regarding in-sample fit and out-of-sample predictions.
Methodology and Techniques
In this thesis new specifications of the DNS are proposed. For these new model specifi-
cations analytical derivations of the likelihood are determined. Furthermore, all models
considered are programmed in Matlab.
The model specifications are assessed on the basis of in-sample fit and out-of-sample
forecasts. For the in-sample fit, measures such as the Root Mean Squared Error (RMSE)
are used. More decisive measures such as the Akaike and Bayesian information criteria
(AIC/BIC) are also used; these weigh the improved fit from adding parameters against
the loss of parsimony. Likelihood
ratio (LR) tests are used to test the significance of more elaborate nested models, such as
a model with Student-t errors vs. Gaussian errors and specifications with time-varying
volatility. For the non-nested models, namely the GAS estimated vs. Kalman filter
estimated models, we use the Rivers-Vuong (RV) test.
For out-of-sample forecasts, the forecast errors are compared using the
(trace) Root Mean Square Forecast Error ((t)RMSFE). Besides this
relatively subjective comparison, a more formal test is used to compare the predictive
accuracy of different models, namely the Diebold-Mariano (DM) test statistic.
Chapter 2
Theory
In this chapter the theory behind modeling the term structure is explained. First we
explain what interest rate curves are and how these curves relate. We then explain how
they are derived. Subsequently, we give reasons for modeling them and for using the
Nelson-Siegel method. Finally, we explain the Generalized Autoregressive Score (GAS)
framework and how it can be adopted for the term structure.
2.1 The Yield Curve
We start by deriving what we try to model: the yield curve or the term structure of
interest rates. The yield curve describes the relationship between the yield (i.e.
the return) of bonds and the time to maturity. This term structure is given for bonds
with the same credit risk, for example US treasury bonds. This is done because it is
assumed that these will have similar dynamics or the same factors driving their dynamics.
We want to compare different countries or companies over the course of time. We start
by defining the relationships between the different interest rate curves: the yield curve,
the discount curve and the forward curve (Hull, 1999).
Define P(τ) as the price of a τ-period discount bond (a discount bond being one for which
a discounted price is paid now for an amount to be received in the future). Hence P(τ)
denotes the present value of a risk-free contract that pays one unit at its maturity
τ periods ahead, where τ = T − t is the time to maturity. If y(τ) is the continuously
compounded yield to maturity, then by definition the discount curve is given by:
\[ P(\tau) = e^{-\tau y(\tau)}. \tag{2.1} \]
Hence the yield curve and the discount curve are fundamentally related. Knowing one
of the curves enables you to calculate the other curve immediately.
The forward rate curve gives the forward rate as a function of the maturity. It is similarly
related to the discount curve:
\[ f(\tau) = -\frac{P'(\tau)}{P(\tau)}. \tag{2.2} \]
Equations (2.1) and (2.2) imply the relationship between the yield and forward curves:
\[ y(\tau) = \frac{1}{\tau} \int_0^{\tau} f(u)\,du. \tag{2.3} \]
So the yield is the equally weighted average of the forward rates up to maturity τ. This
shows that once we have a representation of any one of the three curves we can derive
the others: all are interchangeable (Diebold and Rudebusch, 2011; Piazzesi, 2010).
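The interchangeability of the three curves can be checked numerically. The following is a minimal sketch, not part of the thesis code: the linear example curve `y_example` and all function names are hypothetical, and the derivative and integral are approximated by a finite difference and a midpoint rule.

```python
import math

def discount_from_yield(y, tau):
    """Equation (2.1): P(tau) = exp(-tau * y(tau))."""
    return math.exp(-tau * y)

def yield_from_discount(p, tau):
    """Inverse of (2.1): y(tau) = -log(P(tau)) / tau."""
    return -math.log(p) / tau

def forward_from_discount(price_fn, tau, h=1e-5):
    """Equation (2.2): f(tau) = -P'(tau) / P(tau), using a central difference."""
    dp = (price_fn(tau + h) - price_fn(tau - h)) / (2.0 * h)
    return -dp / price_fn(tau)

def yield_from_forward(forward_fn, tau, n=2000):
    """Equation (2.3): y(tau) = (1/tau) * integral of f over [0, tau], midpoint rule."""
    du = tau / n
    integral = sum(forward_fn((i + 0.5) * du) for i in range(n)) * du
    return integral / tau

# Hypothetical upward-sloping yield curve, used only for illustration.
def y_example(tau):
    return 0.02 + 0.001 * tau

def p_example(tau):
    return discount_from_yield(y_example(tau), tau)

def f_example(tau):
    return forward_from_discount(p_example, tau)

# Going yield -> discount -> forward -> yield should return the starting yield.
recovered = yield_from_forward(f_example, 5.0)
print(y_example(5.0), recovered)
```

For an upward-sloping yield curve such as this one, the forward rate lies above the yield at every positive maturity, since the yield is the average of the forward rates up to that maturity.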
2.2 Zero-Coupon Yields
Although the yield curve is most used in practice, yields are not observed directly.
Instead yields need to be estimated from a large set of bond prices with an approximation
method. In practice bonds of many different maturities exist at any time. The difficulty
is that we cannot simply apply formulas (2.1)-(2.3) to the observed market prices of the
bonds. This is not possible because most bonds bear coupon payments, mostly received
semi-annually. Because of these coupon payments the prices of these bonds bear
a so-called 'coupon effect', as analyzed by Caks (1977). Therefore bonds with the same
maturity but with different coupon rates will have different yields. Because of these
effects 'zero-coupon' yields have to be estimated from the large pool of bonds that are
traded. Different researchers have come up with methods to estimate these zero-coupon
yields.
McCulloch (1975) introduces a cubic splines method to estimate the zero-coupon yield.
The disadvantage of this method is that it has some trouble in fitting flat curves. This
is a result of a diverging discount curve at long maturities. Still the Federal Reserve
presents their yield curves using this method. Vasicek and Fong (1982) overcome the
problem of this method by using exponential splines. The problem with this method,
however, is that the forward rate is not strictly positive. The last method is introduced
by Fama and Bliss (1987). They construct the yields from estimated forward rates:
"unsmoothed Fama-Bliss" rates. These forward rates are created using the prices of the
coupon bearing bonds which are averaged using a bootstrap method, assuming a constant
forward rate between the different maturities. The forward rates are then converted to
a yield curve using formula (2.3). The created yields exactly price the bonds used to
create them. This method is regarded as the most accurate and therefore many studies use
it: for example Diebold and Li (2006), Pooter (2007) and Koopman et al. (2010). Eventually
fitting a parametric model to these yields will create smoothed yields. This will be
discussed in chapter 3.
2.3 Why Model?
Now that we have defined what we try to model, we elaborate on why we would want
to model this yield curve. Piazzesi (2010) mentions that there are at least four reasons
for modeling the yield curve.
The first reason is forecasting. Yields of long maturity bonds are the expected values
of the average short yields. This is after any risk adjustments. Therefore the current
yield curve tells something about future directions of the economy. Besides the use for
forecasting future yields, it can be used to forecast real activity and inflation (Diebold
et al., 2005; Fama, 1990). All these forecasts can be used for investment decisions, sav-
ings decisions and policy decisions.
A second reason is the assessment of monetary policy. Central banks of most
industrialized countries seem to be capable of moving the short end of the yield curve.
However, what mostly matters for long-term economic growth and developments are the
long-term yields. For example decisions made to buy or rent a house are driven by long-term
mortgage rates (so long-term yields), not the short-term central bank driven rate (e.g.
Federal Funds Rate). Modeling the yield curve can help to understand how moving the
short end of the yield curve affects long-term yields. The research comprises both the
understanding of the mechanisms in this process as well as understanding how central
banks conduct policy. A recent example is the Quantitative Easing (QE) by the Federal
Reserve (Christensen et al., 2009b). Also the European Central Bank has recently
decided to use such unconventional monetary policy to stimulate the economy.
A third reason is debt policy. Governments issue debt in the form of bonds and
need to decide about the maturities of the issued bonds. The supply of bonds
with different maturities influences the yields. Governments can actively manage the
maturity structure of their public debt. For example this can be done by selling short
maturity debt and buying long maturity bonds.
The fourth reason is the use for pricing and hedging. For example, coupon-bearing
bonds can be priced using a model of the term structure. Each payment is weighted by
the price of a zero-coupon bond that matures at the date of the coupon. Also, prices of
futures, options, swaps, caps and floors can be computed from a yield curve model.
Furthermore, some parties may need to manage risks associated with differences in received
or paid interest. Hedging strategies can be computed by determining the prices of
the derivatives, depending on different states of the economy. It can be argued that for
purposes such as pricing and hedging it is more important to use a model that imposes
the restrictions implied by the absence of arbitrage, whereas the temporal dynamics are
often of less value.
2.4 Nelson-Siegel Model
Because of the nature of the data a multivariate model is needed. As mentioned, the
focus of this thesis is on forecasting. We therefore use
an empirically attractive model and stay away from traditional financial theory that
imposes the restrictions of the absence of arbitrage. To compress the information that is
in the bond data, a model with a factor structure can be used. We want to compress the
information in the yields for statistical reasons: a more parsimonious model will result
in a worse in-sample fit, but generally in better forecasts. Fortunately financial theory
often suggests a factor structure. Successful financial models that use a factor structure
are for example the CAPM (one factor) and the Fama-French model (three factors) (Fama and
French, 1993; Jensen and Scholes, 1972). The risk premiums are often driven by a small
number of risk factors. Luckily yields also have a factor structure. The first three principal
components explain almost all variation (97%) (Litterman and Scheinkman, 1991). This
means that the high-dimensional set of yields is driven by a lower dimensional set of
factors.
The Nelson-Siegel is the most used parametric model for the interest rate curve. The
model is popular because of its robustness. It provides a smooth fit and is relatively
flexible, thereby ensuring a good fit. Above all, it provides statistically useful results
that are economically meaningful. Because of these properties it is a popular model
among its users.
The original model is in the class of exponential affine three factor term structure models.
'Affine' in this context means a constant plus a linear term: a function of a vector of
observable or unobservable (latent) factors. It was developed by Nelson and Siegel (1987)
for the static cross-section of the term structure. Their model was designed to fit the
forward rate. The model is deduced from the observation that the typical yield curve
shapes are associated with differential or difference equations. The function consists of
a product of a polynomial and an exponential decay term: a Laguerre function.
A Laguerre function is used to approximate functions on the domain [0,∞). Because
maturities lie in this domain it will give a good fit. In this thesis the re-factorization
of Diebold and Li (2006) is used. This is done because it gives a more intuitive
interpretation of the factors (level, slope and curvature). The representation of the
instantaneous forward rate is given by:
\[ f(\tau) = \beta_1 + \beta_2 e^{-\lambda\tau} + \beta_3 \lambda\tau e^{-\lambda\tau}. \tag{2.4} \]
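Integrating this forward rate from 0 to τ and dividing by τ, as in equation (2.3), gives
the Nelson-Siegel yield curve:
\[ y(\tau) = \beta_1 + \beta_2 \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} \right) + \beta_3 \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau} \right). \tag{2.5} \]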
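The step from the forward curve (2.4) to the yield curve (2.5) can be written out term by term. The following worked derivation uses only (2.3) and (2.4), with integration by parts for the third term:

```latex
\begin{align*}
\frac{1}{\tau}\int_0^{\tau} \beta_1 \, du &= \beta_1, \\
\frac{1}{\tau}\int_0^{\tau} \beta_2 e^{-\lambda u} \, du
  &= \beta_2 \, \frac{1 - e^{-\lambda\tau}}{\lambda\tau}, \\
\frac{1}{\tau}\int_0^{\tau} \beta_3 \lambda u e^{-\lambda u} \, du
  &= \frac{\beta_3}{\tau}\left( \left[ -u e^{-\lambda u} \right]_0^{\tau}
     + \int_0^{\tau} e^{-\lambda u} \, du \right) \\
  &= \beta_3 \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau} \right).
\end{align*}
```

Summing the three lines gives the Nelson-Siegel yield curve.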
The Nelson-Siegel form is not arbitrary: it has some desirable characteristics.
Namely, the price of a bond equals one unit when the time to maturity is zero, because of
the absence of risk, and the price of a bond goes to zero when the time to maturity (τ)
goes to infinity:
\[ P(0) = 1. \tag{2.6} \]
\[ \lim_{\tau \to \infty} P(\tau) = 0. \tag{2.7} \]
The interpretation of the β's can be deduced by examining the limiting properties of
the parametrisation:
\[ \lim_{\tau \to \infty} y(\tau) = \beta_1. \tag{2.8} \]
\[ \lim_{\tau \to 0} y(\tau) = \beta_1 + \beta_2 = r. \tag{2.9} \]
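These limits follow from the behaviour of the two non-constant loadings. As a short worked check, using the first-order expansion of the exponential around zero:

```latex
\begin{align*}
\lim_{\tau \to 0} \frac{1 - e^{-\lambda\tau}}{\lambda\tau}
  &= \lim_{\tau \to 0} \frac{\lambda\tau - \tfrac{1}{2}(\lambda\tau)^2 + \dots}{\lambda\tau} = 1,
& \lim_{\tau \to 0} \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau} \right) &= 1 - 1 = 0, \\
\lim_{\tau \to \infty} \frac{1 - e^{-\lambda\tau}}{\lambda\tau} &= 0,
& \lim_{\tau \to \infty} \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau} \right) &= 0.
\end{align*}
```

So at the short end only the first two factors contribute, and at the long end only the first factor remains (for λ > 0).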
Equations (2.8) and (2.9) show that β1 gives the level of the yield curve. It is the
long-run component: its loading is constant for all maturities, and it is the level to
which yields converge at long maturities. β2 is the short-term component, as its loading
starts at 1 and decays quickly to zero with maturity. Together β1 and β2 form the
instantaneous short rate (r). β3 is the medium-term component, because its loading starts
at zero, then increases and finally decays again at longer maturities. This means that β3
provides the curvature of the yield curve. It affects neither the short nor the long end
very much; it mostly affects the middle of the curve. These properties are illustrated by
figure 2.1.
λ determines the decay speed of the factor loadings. A larger λ gives faster decay and
will fit shorter maturities better. The opposite is true for a smaller λ, which gives
slower decay and will fit longer maturities better.
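The role of λ can be made concrete by evaluating the loadings of equation (2.5) on a grid of maturities. The sketch below is illustrative only: the function names and the monthly grid from 1 to 120 are assumptions, and the value 0.062 is the one used in the figures of this chapter. It locates the maturity at which the curvature loading peaks, which moves to shorter maturities as λ increases.

```python
import math

def ns_loadings(tau, lam):
    """Level, slope and curvature loadings of equation (2.5) at maturity tau > 0."""
    x = lam * tau
    slope = -math.expm1(-x) / x          # (1 - exp(-x)) / x, stable for small x
    curvature = slope - math.exp(-x)
    return 1.0, slope, curvature

def ns_yield(tau, beta, lam):
    """Nelson-Siegel yield for factors beta = (beta1, beta2, beta3)."""
    level, slope, curvature = ns_loadings(tau, lam)
    return beta[0] * level + beta[1] * slope + beta[2] * curvature

def curvature_peak(lam, maturities):
    """Maturity on the grid where the curvature loading is largest."""
    return max(maturities, key=lambda tau: ns_loadings(tau, lam)[2])

maturities = range(1, 121)               # assumed grid: 1 to 120 months
peak_base = curvature_peak(0.062, maturities)
peak_double = curvature_peak(0.124, maturities)
print(peak_base, peak_double)
```

Doubling λ roughly halves the maturity at which the curvature loading peaks, because the loadings depend on τ only through the product λτ.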
Figure 2.1: Factor loadings as a function of maturity, with λ = .062. Blue depicts the
first loading, given by 1. In red the second loading, $(1 - e^{-\lambda\tau})/(\lambda\tau)$,
and in yellow the third loading, $(1 - e^{-\lambda\tau})/(\lambda\tau) - e^{-\lambda\tau}$.
The flexibility of the NS curve is also a desirable property. It is flexible because it can
assume a variety of shapes through different values of the latent factors: it can be an
increasing or a decreasing function, at either a decreasing or an increasing rate. It can
assume an S-shaped curve, it can be a flat line, and it can also adopt a U-shape or an
inverted U-shape (humped or inverted humped). Figure 2.2 illustrates the flexibility of
the Nelson-Siegel curve. The limitation of the NS curve is that it can only have one optimum.
Luckily this constraint is mostly non-binding: the yield curve usually does not move
in a jagged way with maturity.
Although the NS curve is relatively flexible, it is still a parsimonious approximation.
The sparse use of factors results in a smooth curve. This smoothness is preferred because
it protects against over-fitting. Over-fitting is undesirable because it makes estimation
difficult and frequently leads to poor forecasts.
Figure 2.2: Shapes the NS model can assume: constructed by fixing β1 = 1, β2 + β3 = 0 and λ = .062
2.4.1 Dynamic Nelson Siegel
The DNS is an evolution of the NS model. Diebold and Li (2006) convert the factor
model for the cross-section into a dynamic factor model. This is done by extending the
model with time-varying latent factors. The latent factors determine the cross-section
of the yield curve, as shown in the previous section. The dynamics of the factors
subsequently determine the longitudinal dynamics of the yields.
The introduction of this dynamic factor structure has notable advantages. It
converts the high-dimensional situation (a cross-section of many yields over different
time periods) into an easier low-dimensional one. The yield curve at time t becomes:
\[ y_t(\tau) = \beta_{1t} + \beta_{2t} \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} \right) + \beta_{3t} \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau} \right). \tag{2.10} \]
Diebold and Li (2006) introduce two benchmark models: one with the three latent
factors modeled as univariate AR(1) models, and one with the factors modeled as a
first-order vector autoregressive model, VAR(1).
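The univariate AR(1) benchmark can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation (which is written in Matlab): the factor series is taken as given, the function names are hypothetical, and the coefficients are estimated by ordinary least squares. Forecasts are obtained by iterating the fitted recursion; a random walk forecast would simply return the last observed value.

```python
def fit_ar1(series):
    """OLS estimates (c, phi) of beta_t = c + phi * beta_{t-1} + error."""
    lagged, current = series[:-1], series[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    phi = sxy / sxx
    c = mean_cur - phi * mean_lag
    return c, phi

def forecast_ar1(last_value, c, phi, horizon):
    """Iterate the fitted AR(1) recursion `horizon` steps ahead."""
    value = last_value
    for _ in range(horizon):
        value = c + phi * value
    return value

# Hypothetical level-factor path, for illustration only.
level = [4.8, 5.1, 5.0, 5.3, 5.2, 5.6, 5.4, 5.5, 5.9, 5.7]
c_hat, phi_hat = fit_ar1(level)
six_step = forecast_ar1(level[-1], c_hat, phi_hat, horizon=6)
print(c_hat, phi_hat, six_step)
```

Applying `fit_ar1` to each of the three factor series separately corresponds to a diagonal autoregressive matrix; a VAR(1) would additionally regress each factor on the lags of the other two.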
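Subsequently Diebold et al. (2006) put the model in state-space form, adding stochastic
errors to the Nelson-Siegel observation curve. This produces a measurement equation.
It relates a set of N yields with time to maturity τ ∈ {τ1, τ2, . . . , τN} to the three
unobservable factors (β1t, β2t, β3t). This gives:
\[
\begin{pmatrix} y_t(\tau_1) \\ y_t(\tau_2) \\ \vdots \\ y_t(\tau_N) \end{pmatrix}
=
\begin{pmatrix}
1 & \frac{1 - e^{-\lambda\tau_1}}{\lambda\tau_1} & \frac{1 - e^{-\lambda\tau_1}}{\lambda\tau_1} - e^{-\lambda\tau_1} \\
1 & \frac{1 - e^{-\lambda\tau_2}}{\lambda\tau_2} & \frac{1 - e^{-\lambda\tau_2}}{\lambda\tau_2} - e^{-\lambda\tau_2} \\
\vdots & \vdots & \vdots \\
1 & \frac{1 - e^{-\lambda\tau_N}}{\lambda\tau_N} & \frac{1 - e^{-\lambda\tau_N}}{\lambda\tau_N} - e^{-\lambda\tau_N}
\end{pmatrix}
\begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_t(\tau_1) \\ \varepsilon_t(\tau_2) \\ \vdots \\ \varepsilon_t(\tau_N) \end{pmatrix}.
\tag{2.11}
\]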
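The factor dynamics are then specified in the state equation as:
\[
\begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \end{pmatrix}
=
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}
+
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} \beta_{1,t-1} \\ \beta_{2,t-1} \\ \beta_{3,t-1} \end{pmatrix}
+
\begin{pmatrix} \eta_{1t} \\ \eta_{2t} \\ \eta_{3t} \end{pmatrix}.
\tag{2.12}
\]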
The state space formed by (2.11) and (2.12) can be put in a more convenient vector
notation:
\[ y_t = X(\lambda)\beta_t + \varepsilon_t. \tag{2.13} \]
\[ \beta_t = \mu + A\beta_{t-1} + \eta_t. \tag{2.14} \]
Here yt and εt are (N × 1) vectors, X(λ) is an (N × 3) matrix, A is a (3 × 3) matrix
and βt, µ and ηt are (3 × 1) vectors.
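To see how the measurement equation (2.13) and the state equation (2.14) work together, the system can be simulated. The sketch below is illustrative only: the function names, the parameter values and the simplification to independent Gaussian disturbances with a common standard deviation per equation are assumptions, not the thesis specification.

```python
import math
import random

def loading_matrix(maturities, lam):
    """Rows of X(lambda) in (2.11): one (level, slope, curvature) row per maturity."""
    rows = []
    for tau in maturities:
        x = lam * tau
        slope = -math.expm1(-x) / x
        rows.append([1.0, slope, slope - math.exp(-x)])
    return rows

def simulate_dns(maturities, lam, mu, A, sd_eta, sd_eps, beta0, T, seed=0):
    """Simulate T periods of factors (2.14) and yields (2.13)."""
    rng = random.Random(seed)
    X = loading_matrix(maturities, lam)
    beta = list(beta0)
    factors, yields = [], []
    for _ in range(T):
        # State equation: beta_t = mu + A beta_{t-1} + eta_t
        beta = [mu[i] + sum(A[i][j] * beta[j] for j in range(3))
                + rng.gauss(0.0, sd_eta) for i in range(3)]
        # Measurement equation: y_t = X(lambda) beta_t + eps_t
        y = [sum(row[j] * beta[j] for j in range(3)) + rng.gauss(0.0, sd_eps)
             for row in X]
        factors.append(beta)
        yields.append(y)
    return factors, yields

# Hypothetical parameter values: diagonal A, i.e. the AR(1) case.
maturities = [3, 12, 36, 60, 120]
A_diag = [[0.98, 0.0, 0.0], [0.0, 0.95, 0.0], [0.0, 0.0, 0.90]]
mu = [0.10, -0.10, 0.0]
factors, yields = simulate_dns(maturities, 0.062, mu, A_diag,
                               sd_eta=0.2, sd_eps=0.05,
                               beta0=[5.0, -2.0, 0.0], T=100)
print(yields[-1])
```

Each simulated period produces N yields from only three factors, which is the dimension reduction that the dynamic factor structure provides.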
The matrix A can be estimated in full or in diagonal form: VAR(1) or AR(1). However,
it is argued by Diebold and Li (2006) that using a VAR (correlated factors) will result
in bad forecasting capabilities.
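In order to estimate the model using a Kalman filter (1960) (procedure given in appendix
B), assumptions are made on the error decomposition. Most papers assume that εt and
ηt are uncorrelated Gaussian white noise:
\[
\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix}
\sim N \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \Sigma_\varepsilon & 0 \\ 0 & \Sigma_\eta \end{pmatrix} \right].
\tag{2.15}
\]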
Furthermore it is assumed that the errors are orthogonal to the initial state vector:
\[ E[\beta_0 \varepsilon_t'] = 0, \quad E[\beta_0 \eta_t'] = 0. \tag{2.16} \]
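Extensions can also be made within this framework, such as a fourth factor (Bjork and
Christensen, 1999; Svensson, 1995). These authors argue that this improves the fit for
longer maturities, especially longer than 10 years. Pooter (2007) even concludes that this
improves forecast performance. The extension of Bjork and Christensen (1999) is found to
give similar results. This specification is easier to estimate because it only assumes one
λ instead of two and therefore reduces the estimation space. The resulting loading
matrix X(λ) then has dimensions (N × 4) and βt, µ, ηt become (4 × 1) vectors. The loading
matrix is then given by:
\[
X(\lambda) =
\begin{pmatrix}
1 & \frac{1 - e^{-\lambda\tau_1}}{\lambda\tau_1} & \frac{1 - e^{-\lambda\tau_1}}{\lambda\tau_1} - e^{-\lambda\tau_1} & \frac{1 - e^{-2\lambda\tau_1}}{2\lambda\tau_1} \\
1 & \frac{1 - e^{-\lambda\tau_2}}{\lambda\tau_2} & \frac{1 - e^{-\lambda\tau_2}}{\lambda\tau_2} - e^{-\lambda\tau_2} & \frac{1 - e^{-2\lambda\tau_2}}{2\lambda\tau_2} \\
\vdots & \vdots & \vdots & \vdots \\
1 & \frac{1 - e^{-\lambda\tau_N}}{\lambda\tau_N} & \frac{1 - e^{-\lambda\tau_N}}{\lambda\tau_N} - e^{-\lambda\tau_N} & \frac{1 - e^{-2\lambda\tau_N}}{2\lambda\tau_N}
\end{pmatrix}.
\tag{2.17}
\]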
Another direction in which the model is extended is through the inclusion of
macro-variables. Diebold et al. (2006) extend the model by adding macro-variables to the
state vector.
Furthermore, some have proposed models in which the state equation has
regime-switching properties. This can be used when yields change structurally for long
periods, for example because of a change in fiscal policy or a recession. Bernadell et al.
(2005) and Xiang and Zhu (2013), for example, make the means of the state equation
dependent on the regime. Such models can be estimated using a Kalman filter with a
Hamilton model (1990) for regime-switching.
Another way of estimating the models in state-space form is Bayesian inference
using Markov Chain Monte Carlo (MCMC). These methods allow for more flexible
disturbance specifications, though they are computationally very intensive. Hautsch and
Yang (2012) and Caldeira et al. (2010) estimate models with stochastic volatility using
these Bayesian methods.
The main disadvantage of this state-space framework is that it assumes disturbances
in both the measurement equation and the state equation, together with the assumptions
that are needed to estimate it with a Kalman filter, or the estimation disadvantages
of Bayesian methods when these assumptions are not made. A clear
advantage is the theory that has accumulated on these subjects, especially the
theory on the Kalman filter with its elegant recursive estimation routine: it is simple,
intuitive, straightforward and powerful. The clear disadvantage of the Kalman filter is
its sensitivity to deviations from the Gaussian distribution and its limited adaptive
capacities. We therefore proceed to introduce the GAS model.
2.5 Generalized Autoregressive Score
The GAS model of Creal et al. (2008, 2012) provides a different framework for estimating the DNS model introduced before. To model the dynamics of the time-varying parameters, it does not assume a state-space framework with separate disturbances in the observation equation and in the state equation. Consequently, we do not have to make assumptions on the distribution of the disturbance in the unobserved state equation.
The GAS model approaches the modeling of these time-varying parameters differently, but it also shares some similarities. Like the Kalman filter, the GAS model links past observations to future parameters, and it does so in a way that keeps likelihood evaluation straightforward.
The observation-driven GAS model is chosen because its main advantage is that it exploits the full observation density; it is not limited to a first or second moment. It is also important that it can be used with all kinds of probability distributions. Furthermore, it can be applied to linear regressions as well as non-linear regressions with time-varying coefficients (we will come to that in section 3.1.3). Consequently, the GAS model nests many econometric models, such as the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models of Bollerslev (1986), the autoregressive conditional duration and intensity (ACD and ACI, respectively) models of Engle and Russell (1998) and Russell (1999), the dynamic conditional correlation (DCC) model of Engle (2002) and the dynamic copula model (Creal et al., 2008). We will now introduce the modeling framework.
2.5.1 The Modeling Framework
Let $y_t$ denote the dependent variable of interest, an $N \times 1$ vector, $f_t$ the time-varying parameter vector, $x_t$ a vector of exogenous variables, and $\theta$ a vector of static parameters. Define $Y^t = \{y_1, \ldots, y_t\}$, $F^t = \{f_1, \ldots, f_t\}$ and $X^t = \{x_1, \ldots, x_t\}$. The available information set at time $t$ consists of $\{f_t, \mathcal{F}_t\}$, where
\[
\mathcal{F}_t = \{Y^{t-1}, F^{t-1}, X^t\}. \tag{2.18}
\]
$y_t$ is assumed to be generated by the observation density
\[
y_t \sim p(y_t \mid f_t, \mathcal{F}_t; \theta). \tag{2.19}
\]
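The updating mechanism for the time-varying parameter vector $f_t$ is given by the autoregressive updating equation
\[
f_{t+1} = \omega + \sum_{i=1}^{p} A_i s_{t-i+1} + \sum_{j=1}^{q} B_j f_{t-j+1}, \tag{2.20}
\]
where $\omega$ is a vector of constants, $A_i$ and $B_j$ are coefficient matrices, and $s_t = s_t(y_t, f_t, \mathcal{F}_t; \theta)$ is given by:
\[
s_t = S_t \cdot \nabla_t, \quad \text{with } \nabla_t = \frac{\partial \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial f_t}, \quad S_t = S(t, f_t, \mathcal{F}_t; \theta). \tag{2.21}
\]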
So the updating equation (2.20) consists of a constant ($\omega$), a part that uses the scaled score ($s_{t-i+1}$) and a part that uses the lagged factors ($f_{t-j+1}$).
The scaled score ($s_t$) consists, as said, of the score ($\nabla_t$) and a scaling matrix ($S_t$). Using the score to update the factors is intuitive, as it gives the direction (steepest ascent) in which the factors must be changed to increase the local likelihood, given the current factors ($f_t$). The choice of the scaling matrix $S_t$ gives the model additional flexibility. A natural choice is a scaling that depends on the variance of the score, that is, the inverse Fisher information:
\[
S_t = \mathcal{I}_{t|t-1}^{-1} = E_{t-1}\left[\nabla_t \nabla_t'\right]^{-1} = -E_{t-1}\left[\frac{\partial^2 \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial f_t \, \partial f_t'}\right]^{-1}. \tag{2.22}
\]
Together, equations (2.20)-(2.22) form a GAS(p,q) model: a Generalized Autoregressive Score model with orders p and q. Here q is the number of lags of the factors that are included (the autoregressive part), and p is the number of lags of the (scaled) score that are included. The updating mechanism (2.20) can be interpreted as a Gauss-Newton algorithm.
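To make the recursion concrete, the following minimal sketch runs a univariate GAS(1,1) filter. It is an illustration under simplifying assumptions, not the model of this thesis: the observation density is taken to be $y_t \sim N(f_t, \sigma^2)$, for which the score is $(y_t - f_t)/\sigma^2$ and the Fisher information is $1/\sigma^2$, so the score scaled by the inverse information equals the prediction error. The names `gas_filter` and `gaussian_mean_score` and all numerical values are hypothetical.

```python
def gas_filter(y, omega, a, b, f1, scaled_score):
    """Run the GAS(1,1) recursion f[t+1] = omega + a * s[t] + b * f[t]."""
    f = [f1]
    for obs in y:
        s = scaled_score(obs, f[-1])
        f.append(omega + a * s + b * f[-1])
    return f  # len(y) + 1 values; the last one is the one-step-ahead factor

def gaussian_mean_score(obs, f):
    # For y_t ~ N(f_t, sigma^2) the inverse-information-scaled score
    # is simply the prediction error y_t - f_t.
    return obs - f

y = [1.0, 1.2, 0.9, 1.1]
path = gas_filter(y, omega=0.1, a=0.5, b=0.9, f1=1.0,
                  scaled_score=gaussian_mean_score)
```

Passing the scaled score as a function keeps the recursion itself independent of the observation density, which mirrors the way the framework separates equation (2.20) from equations (2.21)-(2.22).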
An important feature of the model is that, under the right specification, the scaled score ($s_t$) forms a martingale difference series: $E_{t-1}[s_t] = 0$. This is a property of the score. For the variance we get $E_{t-1}[s_t s_t'] = S_t \mathcal{I}_{t|t-1} S_t'$. When scaling with $S_t = \mathcal{I}_{t|t-1}^{-1}$ this reduces to $\mathcal{I}_{t|t-1}^{-1}$. When scaling with the identity matrix, $S_t = I$, the variance is $\mathcal{I}_{t|t-1}$. As suggested above, scaling is preferably done with the inverse Fisher information matrix ($\mathcal{I}_{t|t-1}^{-1}$). Alternatively, the score can be scaled with the identity matrix, so that the unscaled score is used as updating mechanism. This makes updating similar to a steepest-ascent optimization of the likelihood, but according to Creal et al. this updating mechanism is often less stable. Koopman (2012) suggests using $S_t = \mathcal{I}_{t|t-1}^{-1/2}$. For this choice of scaling, $s_t$ has constant unit variance and is invariant under non-degenerate parameter transformations $g(f_t)$. The constant unit variance that results from this scaling choice is stated to be a useful device for detecting model misspecification in applications.
Additionally, the updating equation can be extended to include exogenous variables $x_t$. Besides this, the coefficient matrices can be functions of the static parameters: $\omega(\theta)$, $A_i(\theta)$, $B_j(\theta)$.
This chapter is concluded by the observation that the state equation of the DNS in equations (2.13)-(2.14) can be replaced by the updating mechanism (2.20) of the GAS framework, as also proposed by Creal et al. (2008). Furthermore, Creal et al. (2011a) even suggest keeping the state-space framework and the Kalman filter, but modeling parameters other than $\beta_t$ using the GAS framework. Although this possibility exists, Koopman (2012) conducts a Monte Carlo study to compare parameter-driven models with observation-driven models and concludes that observation-driven GAS models have predictive accuracy similar to that of correctly specified parameter-driven models. Therefore, we proceed to model the time-varying factors of the NS model using the GAS updating equation.
Chapter 3
Model Specifications
In this chapter different model specifications are introduced. As a result of adopting the GAS framework we are no longer constrained by the use of a Kalman filter. We can thus assume different disturbance specifications and nonlinearities. First, we consider the standard Gaussian error specification. We then extend this to a specification with Student-t distributed errors. Subsequently, a model with variable lambda is proposed. Finally, we extend the model with a disturbance specification with time-varying volatility.
3.1 General Model
As general model we have:
\[
y_t = X(\lambda)\beta_t + \varepsilon_t, \tag{3.1}
\]
where $X(\lambda)$ is as given in equation (2.11), or in equation (2.17) for the four-factor extension of Bjork and Christensen (1999). We add $\beta_t$ to the time-varying parameter vector $f_t$, which is updated using equation (2.20) as proposed in the modeling framework. So the time-varying factor vector contains at least $f_t = \beta_t = (\beta_{1t}, \beta_{2t}, \beta_{3t})'$.
3.1.1 Gaussian
First we assume Gaussian disturbances, as is done in the state-space framework with the Kalman filter. This specification is also estimated in Creal et al. (2008). For this specification the disturbances $\varepsilon_t$ are given by:
\[
\varepsilon_t \sim N(0, \Sigma_\varepsilon), \tag{3.2}
\]
where $\varepsilon_t$ is the $N \times 1$ disturbance vector and $\Sigma_\varepsilon$ an $N \times N$ positive definite covariance matrix.
3.1.2 Student-t
Instead of the Gaussian disturbances that are used by most researchers, we propose to adopt multivariate Student-t distributed disturbances. The Student-t distribution is symmetric and bell-shaped, like the Gaussian distribution. Its distinctive feature is that it has heavier tails, meaning that it is more prone to producing values that fall far from its mean. The Student-t distribution is suggested for many variables in finance. It does not capture the asymmetry we often see in financial returns, but as a start we suggest using the symmetric Student-t distribution for the DNS.
Compared to the Gaussian distribution, the Student-t adds one parameter to the probability function, namely the degrees of freedom $v$. The degrees of freedom determine how fat the tails are: the higher the degrees of freedom, the more closely the distribution resembles the Gaussian distribution. For $v \to \infty$ it converges to the Gaussian distribution, and for values of $v > 30$ it is already very close to it. So the Student-t is a family of distributions that nests the Gaussian distribution, and this specification generalizes the Gaussian model. For the disturbances we have:
\[
\varepsilon_t \sim \text{Student-t}(0, \Sigma_\varepsilon, v), \tag{3.3}
\]
where $\varepsilon_t$ is an $N \times 1$ disturbance vector, $\Sigma_\varepsilon$ an $N \times N$ positive definite covariance matrix and $v$ the degrees of freedom.
3.1.3 Variable Lambda
So far the decay parameter $\lambda$ has been assumed to be fixed over time. For example, Diebold and Li (2006) fix $\lambda$ at 0.0609 and Diebold et al. (2006) estimate $\lambda = 0.077$. The parameter $\lambda$ determines the maturity at which the curvature loading reaches its maximum. It may be too restrictive to fix this parameter, as the characteristics of the yield curve may have changed over time. So we allow for a variable $\lambda_t$, as proposed by Koopman et al. (2010) and adopted by Creal et al. (2008). Instead of the Kalman framework used by Koopman et al. (2010), which requires local linearization, we use the observation-driven GAS approach as used by Creal et al. (2008). Our time-varying factor vector is extended to $f_t = (\beta_t', \lambda_t)'$. The matrix $X(\lambda)$ in equation (3.1) is replaced by a time-varying $X_t$ that depends on $\lambda_t$:
\[
X_t =
\begin{pmatrix}
1 & \dfrac{1-e^{-\lambda_t\tau_1}}{\lambda_t\tau_1} & \dfrac{1-e^{-\lambda_t\tau_1}}{\lambda_t\tau_1} - e^{-\lambda_t\tau_1} \\
1 & \dfrac{1-e^{-\lambda_t\tau_2}}{\lambda_t\tau_2} & \dfrac{1-e^{-\lambda_t\tau_2}}{\lambda_t\tau_2} - e^{-\lambda_t\tau_2} \\
\vdots & \vdots & \vdots \\
1 & \dfrac{1-e^{-\lambda_t\tau_N}}{\lambda_t\tau_N} & \dfrac{1-e^{-\lambda_t\tau_N}}{\lambda_t\tau_N} - e^{-\lambda_t\tau_N}
\end{pmatrix}. \tag{3.4}
\]
3.1.4 Time-Varying Volatility
Interest rates are determined by trading in financial markets and are therefore sensitive to market sentiment and market movements. As a consequence, volatility may change over time. The models investigated so far have assumed constant volatility. We propose to adopt two time-varying volatility specifications. We first propose an adapted version of the specification of Koopman et al. (2010). We then propose a completely new specification.
3.1.5 Common Disturbance with Time-Varying Volatility
First, the disturbance decomposition proposed by Koopman et al. (2010) is adopted. They argue that volatilities vary across yields of different maturities, and find that shorter-maturity yields are more sensitive to a common shock than longer-maturity yields. Therefore they assume that the disturbance is composed of a common disturbance $\varepsilon_t^*$ and an individual disturbance $\varepsilon_t^+$, distributed as:
\[
\begin{pmatrix} \varepsilon_t^* \\ \varepsilon_t^+ \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} h_t & 0 \\ 0 & \Sigma_\varepsilon^+ \end{pmatrix} \right]. \tag{3.5}
\]
The combined disturbance is then defined as
\[
\varepsilon_t = \Gamma_\varepsilon \varepsilon_t^* + \varepsilon_t^+. \tag{3.6}
\]
This leads to a variance given by:
\[
\Sigma_\varepsilon(h_t) = h_t \Gamma_\varepsilon \Gamma_\varepsilon' + \Sigma_\varepsilon^+, \tag{3.7}
\]
where $h_t$ is the time-varying variance of the common disturbance ($\varepsilon_t^*$) and $\Gamma_\varepsilon$ is an $N \times 1$ loading vector that passes the effect of the common disturbance on to the yields of the different maturities.
The GAS model is used to update $h_t$, as opposed to the GARCH specification used by Koopman et al. (2010), who follow the common GARCH specification proposed by Harvey et al. (1994). In our case the variance $h_t$ is modeled as one of four latent factors, $f_t^* = (\beta_{1t}, \beta_{2t}, \beta_{3t}, h_t)'$, where the factors are again updated using equation (2.20).
In this specification restrictions are required to overcome identification problems. Koopman et al. (2010) propose the normalization $\Gamma_\varepsilon' \Gamma_\varepsilon = 1$, but choose to fix the constant of the common variance at a value close to zero. We choose to fix the first element of $\Gamma_\varepsilon$ at 1, as this also prevents identification issues.
3.1.6 Common Volatility
Another convenient approach is to use a diagonal parametrization for the covariance matrix. We therefore propose to use the following specification:
\[
\Sigma_\varepsilon(h_t) = h_t \Phi, \tag{3.8}
\]
where $\Phi$ is an $N \times N$ symmetric positive definite matrix of loadings that passes the effect of the common volatility on to the yields of the different maturities. $\Phi$ is chosen to be diagonal to save on parameters, so that it contains $N$ values on its diagonal. $h_t$ is again modeled using the GAS updating equation. A normalization is again needed: multiplying $h_t$ by an arbitrary positive number and dividing $\Phi$ by the same number yields the same variance matrix $\Sigma_\varepsilon(h_t)$, so the two are not separately identified. Again we choose to fix the first element of $\Phi$ at 1.
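The two covariance specifications (3.7) and (3.8), and the reason a normalization is needed, can be illustrated with a short sketch. The function names and all numerical values below are hypothetical and serve only as an example with $N = 3$ maturities.

```python
import numpy as np

def sigma_common_disturbance(h, gamma, sigma_plus):
    """Equation (3.7): h * Gamma Gamma' + Sigma_plus (first element of Gamma is 1)."""
    gamma = np.asarray(gamma, dtype=float)
    return h * np.outer(gamma, gamma) + sigma_plus

def sigma_common_volatility(h, phi_diag):
    """Equation (3.8): h * Phi with diagonal Phi (first diagonal element is 1)."""
    return h * np.diag(phi_diag)

gamma = np.array([1.0, 0.8, 0.5])
sigma_plus = np.diag([0.01, 0.02, 0.03])
phi_diag = np.array([1.0, 0.7, 0.4])

S = sigma_common_disturbance(2.0, gamma, sigma_plus)

# Rescaling (h, Phi) to (c * h, Phi / c) leaves the covariance unchanged,
# which is why one element of Phi has to be fixed.
c = 4.0
same = np.allclose(sigma_common_volatility(2.0, phi_diag),
                   sigma_common_volatility(2.0 * c, phi_diag / c))
```

Fixing the first element of `gamma` or `phi_diag` at 1 removes this free scale, so that $h_t$ can be read as the common variance of the shortest maturity.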
Chapter 4
Data
In this chapter the zero-coupon interest rate data are presented and the characteristic properties of the yields are discussed. We discuss how the yield curves behave cross-sectionally and temporally, that is, across different maturities and over time. Both are important, as we want to capture the dynamics of the yield curve.
In this thesis we use unsmoothed end-of-month US zero-coupon yields. The data can be downloaded from the Center for Research in Security Prices (CRSP, http://www.crsp.com). These unsmoothed yields are constructed using the Fama-Bliss method (1987). The data set consists of continuously compounded interest rates, presented on an annualized basis. The method gets rid of the coupon effects discussed by Caks (1977). For this method filtered bond prices (average bid/ask) are used, eliminating bonds with special features (such as option features). From these bond prices, forward rates are generated using the Fama-Bliss (1987) bootstrap method. The data set covers the period from January 1970 to December 2009 and has T = 480 observations. It consists of yields of N = 17 maturities: $\tau_i$ = 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120 months. Together this forms a panel of 8160 data points. Figure 4.1 presents a 3D plot of the observations.
[Figure 4.1: 3D plot of the panel of zero-coupon yields. Axes: Year, Time to Maturity (in Months), Yield (%). The figure shows the yields in the period from January 1970 till December 2009. Yield data is used of 17 maturities, between 3 months and 10 years.]
From the plot it can be seen that yields differ substantially as a result of major economic events and economic policy. There are periods of extremely high interest rates: in the early 1980s interest rates were high due to economic policy. We also see the recent extremely low interest rates after the financial crisis of 2008. Interest rates can be seen rising and declining, for example before and after the burst of the 2000 dot-com bubble and the 2006 housing bubble. Further, we observe that the long-term trend in interest rates is downward.
It is also observable that the yield curve differs in shape. Many different shapes can be seen: increasing or decreasing, both at an increasing rate and at a decreasing rate, flat, S-shaped, U-shaped (inverted humped) and inverted U-shaped (humped).
From the descriptive statistics in table 4.1 we see that the average yield curve is upward sloping. This would mean that term premia exist. A logical explanation for these premia is risk aversion or liquidity preference. Another stylised fact is that the short end of the yield curve is more volatile than the long end: volatilities of yields tend to decrease with maturity. This can be seen as a confirmation that long-term rates are the average of expected future short rates. We also see that all maturities have high autocorrelations. However, the short end of the yield curve is less persistent; it has lower autocorrelations at longer lags than the long end. Autocorrelations of longer maturities are still strong at longer lags (2 years).
It can also be seen that the yields are skewed to the right, which means that there is more mass in the right tail than in the left tail. The median yield curve with quantiles in figure 4.3 confirms that the yields are right skewed. Another fact is that yields are leptokurtic, which might suggest thick tails and hence a Student-t distribution.
Furthermore, we specify proxies for the level, slope and curvature as proposed by Diebold and Li (2006). The proxy for the level is simply the longest-maturity yield (10 years). The slope is measured as the yield of the longest maturity minus the yield of the shortest maturity (10-year yield minus 3-month yield). Finally, the curvature is defined as 2 times the 2-year yield minus the sum of the 3-month and the 10-year yield. We can deduce from these proxies that the yield curve is on average concave, because the slope and curvature are on average positive. Moreover, we see that the level is highly persistent, as opposed to the slope, whose autocorrelation goes to zero. The stylised fact that long rates are more persistent than short rates is reflected in the higher persistence of the level compared with the persistence of the slope and curvature ($\beta_2$ and $\beta_3$).
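The proxies can be computed directly from three yields. The sketch below applies the definitions to the mean 3-month, 2-year and 10-year yields reported in table 4.1; because the proxies are linear in the yields, this reproduces the mean slope and curvature in the same table. The function name `level_slope_curvature` is hypothetical.

```python
def level_slope_curvature(y3m, y24m, y120m):
    """Empirical proxies in the sense of Diebold and Li (2006)."""
    level = y120m                             # longest-maturity yield
    slope = y120m - y3m                       # long minus short
    curvature = 2.0 * y24m - (y3m + y120m)    # twice the middle minus both ends
    return level, slope, curvature

# Mean yields from table 4.1 for the 3-, 24- and 120-month maturities.
level, slope, curvature = level_slope_curvature(5.766, 6.418, 7.067)
```

Up to floating-point rounding, the results are 7.067, 1.301 and 0.003, the means listed for the level, slope and curvature rows of table 4.1.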
The sample autocorrelations indicate that yields might be integrated of order one. If that is the case, the underlying process is non-stationary and we would need to take first differences. Fortunately, economic theory dictates that yields cannot be integrated and must have a non-negative, finite expected value. So we may proceed with modeling in levels.
Finally, a Principal Component Analysis (PCA) confirms that the first three principal components indeed account for most of the yield variation. Together they capture almost 99% of the yield variation. As can be seen from figure 4.2, the loadings of the principal components show similarities with the Nelson-Siegel loadings. The loadings of the first principal component almost exactly match the inverted shape of the Nelson-Siegel slope loading. The loadings of the second principal component correspond to the shape of the curvature. Only the loadings of the third principal component are not exactly level like their Nelson-Siegel counterpart. Instead they have, to some extent, a sinusoidal shape.
[Figure 4.2: Loadings of first three Principal Components (PC1, PC2, PC3), where the inverse of the loading of the first principal component is depicted. Axes: Time to Maturity (in Months), Loadings.]
[Figure 4.3: Median yield curve with 5, 25, 75 and 95 percentiles. Axes: Time to Maturity (in Months), Yield (%). The graph indicates that yields are skewed to the right.]
Table 4.1: Descriptive statistics of the yields

Maturity (months)  Mean    Std. Dev.  Median  Minimum  Maximum  Skewness  Kurtosis  ρ1     ρ12    ρ24
3                  5.766   3.071      5.327   0.041    16.019   0.711     3.996     0.979  0.749  0.489
6                  5.969   3.098      5.515   0.150    16.481   0.665     3.821     0.980  0.763  0.517
9                  6.083   3.089      5.692   0.193    16.394   0.632     3.712     0.981  0.771  0.538
12                 6.166   3.053      5.831   0.245    16.101   0.573     3.588     0.981  0.777  0.552
15                 6.253   3.029      5.992   0.377    16.055   0.519     3.487     0.982  0.785  0.571
18                 6.324   3.009      6.070   0.438    16.219   0.519     3.463     0.983  0.792  0.585
21                 6.387   2.990      6.131   0.532    16.173   0.534     3.462     0.983  0.797  0.598
24                 6.418   2.943      6.183   0.532    15.814   0.518     3.400     0.983  0.799  0.609
30                 6.512   2.878      6.274   0.819    15.429   0.496     3.322     0.983  0.808  0.627
36                 6.600   2.832      6.347   0.978    15.538   0.531     3.350     0.984  0.814  0.642
48                 6.756   2.755      6.571   1.019    15.599   0.567     3.335     0.984  0.822  0.664
60                 6.852   2.671      6.650   1.556    15.129   0.611     3.277     0.985  0.832  0.685
72                 6.964   2.638      6.732   1.525    15.108   0.635     3.259     0.987  0.842  0.702
84                 7.026   2.573      6.843   2.179    15.024   0.709     3.302     0.987  0.841  0.709
96                 7.069   2.536      6.805   2.105    15.052   0.748     3.293     0.988  0.850  0.721
108                7.095   2.519      6.775   2.152    15.114   0.800     3.327     0.988  0.853  0.724
120 (Level)        7.067   2.465      6.683   2.679    15.194   0.863     3.409     0.988  0.843  0.717
Slope              1.301   1.362      1.338   -3.191   3.954    -0.454    3.036     0.934  0.418  0.024
Curvature          0.003   0.863      0.112   -2.174   2.905    -0.126    3.318     0.877  0.441  0.242
Chapter 5
Estimation
In this chapter the estimation methods are introduced. First we give the methods for obtaining initial parameter estimates, which are used to initialize the optimization procedures of the specific models. Because the models are highly parametrized, sensible initial parameter estimates are needed to avoid estimation difficulties. We first estimate a good initial value for lambda. We then give the procedure for estimating the model using the two-step approach introduced by Diebold and Li (2006). Subsequently we give the method for estimating the model using the Kalman filter (1960), as proposed by Diebold et al. (2006). Finally, we give the estimation framework of the GAS model.
5.1 Initial Estimates
5.1.1 Lambda
For each cross-section, at each moment in time, we can estimate a (D)NS model. The estimates of these cross-section models will play an important role later on in the estimation of the models. We are especially interested in the estimation of $\lambda$, as the optimization over $\lambda$ is nonlinear. This nonlinearity may result in difficulties.
For any cross-section we minimize the sum of squared errors. We want to solve the following optimization problem:
\[
\min_{\lambda, \beta} \; (y_t - X(\lambda)\beta)' (y_t - X(\lambda)\beta), \tag{5.1}
\]
where $y_t$ is the $17 \times 1$ vector of yields, $X(\lambda)$ a $17 \times 3$ matrix of factor loadings depending on $\lambda$, and $\beta$ a $3 \times 1$ vector of factors.
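Given some $\lambda$ this reduces to:
\[
\min_{\beta} \; (y_t - X_\lambda \beta)' (y_t - X_\lambda \beta). \tag{5.2}
\]
This is a simple OLS problem, hence we get:
\[
\hat{\beta}_\lambda = (X_\lambda' X_\lambda)^{-1} X_\lambda' y_t. \tag{5.3}
\]
We substitute this in equation (5.1) and get:
\begin{align}
& \min_{\lambda} \; \left(y_t - X_\lambda (X_\lambda' X_\lambda)^{-1} X_\lambda' y_t\right)' \left(y_t - X_\lambda (X_\lambda' X_\lambda)^{-1} X_\lambda' y_t\right) \tag{5.4} \\
&= \min_{\lambda} \; y_t' \left(I - X_\lambda (X_\lambda' X_\lambda)^{-1} X_\lambda'\right)' \left(I - X_\lambda (X_\lambda' X_\lambda)^{-1} X_\lambda'\right) y_t. \notag
\end{align}
Because $I - X_\lambda (X_\lambda' X_\lambda)^{-1} X_\lambda'$ is an orthogonal projector (symmetric and idempotent), this reduces to:
\[
\min_{\lambda} \; y_t' y_t - y_t' X_\lambda (X_\lambda' X_\lambda)^{-1} X_\lambda' y_t. \tag{5.5}
\]
Summing over all cross-sections, the minimum sum of squares problem is then given by:
\[
\min_{\lambda} \; \sum_{t=1}^{T} \left( y_t' y_t - y_t' X_\lambda (X_\lambda' X_\lambda)^{-1} X_\lambda' y_t \right). \tag{5.6}
\]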
This function is minimized using the MATLAB optimization routine fminunc, which gives $\lambda = 0.062$. Furthermore, we estimate equation (5.1) for each cross-section separately using Nonlinear Least Squares (NLS), to compare the results with our model with time-varying $\lambda_t$.
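The concentrated objective (5.6) is a function of $\lambda$ alone, which makes it easy to explore numerically. The sketch below is an illustration under stated assumptions, not the routine used in this thesis: it replaces MATLAB's fminunc by a simple grid search, uses a synthetic noise-free panel instead of the CRSP data, and the names `ns_loadings` and `concentrated_ssr` are hypothetical.

```python
import numpy as np

def ns_loadings(lam, tau):
    """Three-factor Nelson-Siegel loading matrix for decay parameter lam."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])

def concentrated_ssr(lam, yields, tau):
    """Objective (5.6): total squared residual after OLS on X(lambda).

    yields has shape (T, N), one row per cross-section."""
    X = ns_loadings(lam, tau)
    beta, *_ = np.linalg.lstsq(X, yields.T, rcond=None)   # shape (3, T)
    resid = yields.T - X @ beta
    return float(np.sum(resid ** 2))

tau = np.array([3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120])
rng = np.random.default_rng(0)
true_lam = 0.06
betas = rng.normal([6.0, -1.5, 0.5], 0.5, size=(50, 3))   # synthetic factors
yields = betas @ ns_loadings(true_lam, tau).T              # noise-free panel

grid = np.linspace(0.01, 0.20, 191)                        # step of 0.001
ssr = [concentrated_ssr(lam, yields, tau) for lam in grid]
lam_hat = grid[int(np.argmin(ssr))]
```

On real data a grid search like this can also serve as a cheap check that a gradient-based optimizer has not stopped in a local minimum of the nonlinear objective.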
5.1.2 Two-Step Estimation
Next we can turn to an estimate of the model using the two-step approach as proposed
by Diebold and Li (2006):
1. For some fixed $\lambda$ we fit a static Nelson-Siegel model to each cross-section ($t = 1, \ldots, T$) by Ordinary Least Squares using equation (5.3). This results in three time series of estimated latent factors ($\hat{\beta}_{1t}, \hat{\beta}_{2t}, \hat{\beta}_{3t}$) and the estimated residuals, which are the estimated measurement disturbances ($\hat{\varepsilon}_t$).
2. A dynamic model is fitted to the estimated factors using equation (2.12). In the paper of Diebold and Li (2006), AR(1) models are fitted to each of the factors. It is also possible to fit a VAR(1) to the factors.
The advantage of this procedure is that it is simple and numerically stable, as it only uses linear regressions. In this approach it is also possible to additionally estimate $\lambda$ in the first step. This results in four factors in the first step and a four-dimensional dynamic model in the second step. The drawback is that the parameter estimation error from the first step is ignored in the second step. This may affect the second step and create a bias or distort the results. Consequently, it is difficult to conduct statistical inference.
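The two steps can be sketched as follows. This is a minimal illustration on synthetic, noise-free data, not the code used for the results in this thesis; the function names `ns_loadings` and `two_step` and all parameter values are hypothetical.

```python
import numpy as np

def ns_loadings(lam, tau):
    """Three-factor Nelson-Siegel loading matrix for decay parameter lam."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])

def two_step(yields, tau, lam):
    """yields: (T, N) panel. Returns (T, 3) factor estimates and AR(1) fits."""
    X = ns_loadings(lam, tau)
    # Step 1: cross-sectional OLS, equation (5.3), for all t at once.
    betas = np.linalg.lstsq(X, yields.T, rcond=None)[0].T
    # Step 2: AR(1) with intercept, fitted to each factor series by OLS.
    ar_params = []
    for k in range(betas.shape[1]):
        Z = np.column_stack([np.ones(len(betas) - 1), betas[:-1, k]])
        const, phi = np.linalg.lstsq(Z, betas[1:, k], rcond=None)[0]
        ar_params.append((const, phi))
    return betas, ar_params

# Synthetic check: yields generated without noise from known AR(1) factors.
rng = np.random.default_rng(1)
tau = np.array([3, 6, 12, 24, 36, 60, 84, 120])
const_true = np.array([0.30, -0.15, 0.05])
true_betas = np.zeros((200, 3))
true_betas[0] = [6.0, -3.0, 1.0]
for t in range(1, 200):
    true_betas[t] = const_true + 0.95 * true_betas[t - 1] + rng.normal(0.0, 0.1, 3)
yields = true_betas @ ns_loadings(0.0609, tau).T
betas_hat, ar_params = two_step(yields, tau, 0.0609)
```

Because the synthetic yields contain no measurement error, step 1 recovers the factors exactly; with real data the first-step estimation error carries over into step 2, which is the drawback discussed above.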
5.1.3 One-Step State-Space Estimation
Using the initial values estimated in the two-step approach, the state-space model can be estimated in one step. The measurement (observation) equation (2.13) and the state (transition) equation (2.14) are estimated jointly with the Kalman filter (1960). The Kalman filter accounts for all the uncertainty in the framework.
The Kalman filter is a recursive estimation algorithm that consists of two steps: a prediction step and an update step. The filter gives a minimum mean squared error prediction of the latent factors. The Kalman algorithm is given in appendix B.
The likelihood of the Kalman filter is maximized using the MATLAB routine fmincon, which is a constrained optimization routine. The routine is initialized with the estimates of the two-step model, and reasonable constraints are imposed. The likelihood is optimized using the interior-point algorithm with numerical derivatives and Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessians. BFGS is the quasi-Newton method used to approximate the Hessians of the likelihood that are needed for the optimization.
5.2 GAS Estimation
Next we introduce the estimation procedure for the model in the GAS framework. First, Maximum Likelihood estimation is introduced. We then give the procedure for the initial factors. Subsequently we introduce a smoothing scheme for the Fisher information matrix. We then show how statistical inference can be conducted. Finally, we derive the model-specific scores and Fisher information matrices.
5.2.1 Maximum Likelihood Estimation
The GAS models are, like the Kalman filter models, estimated by Maximum Likelihood (MLE). Maximum likelihood requires a fully specified probability density function. For our fully specified parametric model the estimator is $\hat{\theta} = \arg\max_{\theta} L(\theta)$, with
\begin{align}
L(\theta) &= \log p(y_1, \ldots, y_T \mid \theta) \tag{5.7} \\
&= \log \prod_{t=1}^{T} p(y_t \mid \theta) \notag \\
&= \sum_{t=1}^{T} \ell_t(\theta), \quad \text{where } \ell_t(\theta) = \log p(y_t \mid \theta), \notag
\end{align}
and where $p(y_t \mid \theta)$ is shorthand for the conditional observation density (2.19).
The likelihood of the GAS model can be evaluated recursively, as can be seen from equation (5.7): the local log-likelihood ($\ell_t$) is determined for each time period ($t = 1, \ldots, T$) and added to a running total.
To determine the factors for each time period we need the scaled score $s_t$ that enters the updating equation (2.20). For this we need at least the score ($\nabla_t$) and preferably the Fisher information $\mathcal{I}_{t|t-1}$, so we need the derivatives w.r.t. the dynamic factors ($f_t$) as given in equations (2.21) and (2.22). Each model specification gives different results; the scores and Fisher information matrices are derived in section 5.3 below.
The optimization routine optimizes the model over the static parameter space: for each $\theta$ the likelihood is evaluated, and $\theta$ is then adjusted in a direction that increases the likelihood. Maximization is done using the MATLAB routine fmincon with the interior-point algorithm and again BFGS Hessians.
5.2.2 Initial Factors
To start each likelihood evaluation we need to specify initial values for the dynamic factors ($f_1$). A couple of options are considered:
• A natural choice is the unconditional expectation of the factors. For the GAS(1,1), using $E[s_t] = 0$ and stationarity, $E[f_{t+1}] = E[f_t]$, we have:
\begin{align}
f_{t+1} &= \omega + A \cdot s_t + B \cdot f_t \tag{5.8} \\
E[f_{t+1}] &= \omega + A \cdot E[s_t] + B \cdot E[f_t] \notag \\
(I - B) E[f_t] &= \omega + A \cdot 0 \notag \\
E[f_t] &= (I - B)^{-1} \omega. \notag
\end{align}
• Another possibility is to initialize the recursion at the optimal DNS fit of the cross-section at time $t = 1$. For some fixed $\lambda$ this can be estimated by OLS with equation (5.3). For fixed $\lambda$ the model is less sensitive to the initial value. For a variable $\lambda_t$, initial values are more important, as completely wrong initial values will influence the estimates.
• The final possibility, and the most correct one, is to first apply the forward GAS filter (compute $f_t$ for $t = 2, \ldots, T + 1$) with arbitrary initial values ($f_1$), then filter backward (compute $f_t$ for $t = T, \ldots, 0$), and then filter forward again.
So the GAS updating recursion can be started at different values, but in theory it should approach the optimal values after some 'learning' time, even with wrong initial values.
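The first option and the 'learning' argument can be illustrated numerically. In the sketch below the parameter values are hypothetical, and the scaled score is set to its expectation of zero so that only the autoregressive part of the recursion is active.

```python
import numpy as np

def unconditional_mean(omega, B):
    """E[f_t] = (I - B)^{-1} omega for a GAS(1,1) with stable B."""
    omega = np.asarray(omega, dtype=float)
    return np.linalg.solve(np.eye(len(omega)) - B, omega)

omega = np.array([0.35, -0.10, 0.02])
B = np.diag([0.95, 0.90, 0.80])
f_bar = unconditional_mean(omega, B)

# Start from a deliberately wrong initial value and iterate the recursion
# with the scaled score at its expected value of zero.
f = np.zeros(3)
for _ in range(500):
    f = omega + B @ f
```

With these values `f_bar` equals (7.0, -1.0, 0.1), and the iterated `f` converges to it at a speed governed by the largest eigenvalue of `B`, which is why very persistent factors need a longer burn-in.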
5.2.3 Smoothing
As proposed by Creal et al. (2008, 2012), we try to scale with the inverse Fisher information ($\mathcal{I}_{t|t-1}^{-1}$). A difficulty of scaling with (an approximation of) the inverse information matrix is that the information matrix must be inverted. This can be a problem if the information matrix is ill-behaved, i.e. if it is not of full rank or is numerically unstable for some model.
One way to reduce the chance of problems with non-invertible matrices is to use a smoothing scheme. Instead of the normal information matrix, a smoothed information matrix is used as scaling: $(\mathcal{I}^s_{t-1})^{-1}$. Creal et al. (2008, 2012) propose to use an Exponentially Weighted Moving Average (EWMA):
\[
\mathcal{I}^s_{t-1} = \alpha \mathcal{I}^s_{t-2} + (1 - \alpha) \mathcal{I}_{t-1}, \tag{5.9}
\]
for some $0 \le \alpha \le 1$. For $\alpha \to 1$ the model averages over all past observations. For $\alpha \to 0$ it reduces to scaling with the current information matrix. The parameter $\alpha$ is initially fixed at a safe value of $\alpha = 0.2$. Eventually it can be added to the unknown parameter vector $\theta$ and optimized using MATLAB's optimization routine.
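A small numerical example shows why the smoothing in equation (5.9) helps with invertibility. The matrices below are hypothetical and chosen so that the current information matrix is exactly singular; the function name `smooth_information` is an assumption of this sketch.

```python
import numpy as np

def smooth_information(info_prev_smoothed, info_new, alpha=0.2):
    """EWMA of equation (5.9): alpha * previous smoothed + (1 - alpha) * new."""
    return alpha * info_prev_smoothed + (1.0 - alpha) * info_new

singular = np.array([[1.0, 1.0],
                     [1.0, 1.0]])   # rank one, cannot be inverted
previous = np.eye(2)                # a well-behaved smoothed matrix from t - 2

smoothed = smooth_information(previous, singular, alpha=0.2)
scaling = np.linalg.inv(smoothed)   # exists, unlike the inverse of `singular`
```

The smoothed matrix has determinant 0.36 and can be inverted, whereas setting `alpha=0` returns the singular matrix unchanged.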
5.2.4 Inference
To conduct statistical inference we apply the standard limiting result. The estimated vector $\hat{\theta}$ contains all the static parameters of the model. We use the Hessian at the optimum to compute standard errors and t-values. Under standard regularity conditions the MLE is consistent and asymptotically normal:
\[
\sqrt{T}(\hat{\theta} - \theta) \xrightarrow{d} N(0, H^{-1}), \quad \text{with } H = -E\left[\frac{\partial^2 \ell}{\partial \theta \, \partial \theta'}\right]. \tag{5.10}
\]
The Hessian is calculated numerically, and the optimization is terminated when the tolerance between iterations is smaller than $10^{-6}$.
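In practice, result (5.10) is applied by inverting the negative Hessian of the total log-likelihood at the optimum and taking square roots of the diagonal. The sketch below illustrates this with a central finite-difference Hessian on a toy likelihood (i.i.d. Gaussian observations with known unit variance), for which the standard error of the mean is known to be $1/\sqrt{T}$. It is an assumption-laden illustration: the function names are hypothetical, and the numerical Hessian used in this thesis comes from MATLAB, not from this routine.

```python
import numpy as np

def numerical_hessian(fun, theta, step=1e-4):
    """Central finite-difference Hessian of a scalar function fun(theta)."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    E = np.eye(k) * step
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            H[i, j] = (fun(theta + E[i] + E[j]) - fun(theta + E[i] - E[j])
                       - fun(theta - E[i] + E[j]) + fun(theta - E[i] - E[j])
                       ) / (4.0 * step ** 2)
    return H

def standard_errors(total_loglik, theta_hat):
    """Square roots of the diagonal of the inverse negative Hessian."""
    H = numerical_hessian(total_loglik, theta_hat)
    return np.sqrt(np.diag(np.linalg.inv(-H)))

# Toy example: y_t ~ N(mu, 1) with T = 100, so se(mu_hat) = 1 / sqrt(100).
y = np.linspace(-1.0, 1.0, 100)

def loglik(theta):
    return -0.5 * np.sum((y - theta[0]) ** 2)   # constants dropped

se = standard_errors(loglik, np.array([y.mean()]))
```

Here `se[0]` is 0.1 up to numerical error; t-values then follow by dividing each estimate by its standard error.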
5.3 Scores and Scaling
We now proceed to derive the score vectors and scaling matrices of the proposed models.
In order to do this, analytical derivatives and expectation of the log-likelihood functions
are determined.
5.3.1 Gaussian
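First we assume Gaussian errors, as proposed in section 3.1.1. The errors $\varepsilon_t$ form an $N \times 1$ vector; in our example $N = 17$. Hence we have:
\[
\varepsilon_t = y_t - X_t f_t \sim N(0, \Sigma_\varepsilon). \tag{5.11}
\]
For estimation convenience we assume a diagonal $\Sigma_\varepsilon$.
With $f_t = \beta_t$, the probability density and the log-likelihood contribution of each observation are given by:
\[
p(y_t \mid \theta) = (2\pi)^{-N/2} |\Sigma_\varepsilon|^{-1/2} \exp\left( -\frac{1}{2} (y_t - X_t\beta_t)' \Sigma_\varepsilon^{-1} (y_t - X_t\beta_t) \right), \tag{5.12}
\]
\[
\ell_t(\theta) = -\frac{N}{2} \log(2\pi) - \frac{1}{2} \log(|\Sigma_\varepsilon|) - \frac{1}{2} (y_t - X_t\beta_t)' \Sigma_\varepsilon^{-1} (y_t - X_t\beta_t). \tag{5.13}
\]
Taking derivatives w.r.t. the factor $f_t = \beta_t$ leads to the gradient and to the scaling matrix, the inverse information matrix:
\begin{align}
\nabla_t(\theta) = \frac{\partial \ell_t}{\partial \beta_t} &= -\frac{1}{2} \cdot (-2) \cdot X_t' \Sigma_\varepsilon^{-1} (y_t - X_t f_t) \tag{5.14} \\
&= X_t' \Sigma_\varepsilon^{-1} (y_t - X_t f_t), \notag
\end{align}
\begin{align}
S_t &= E_{t-1}\left[ X_t' \Sigma_\varepsilon^{-1} \varepsilon_t \varepsilon_t' \Sigma_\varepsilon^{-1} X_t \right]^{-1} \tag{5.15} \\
&= (X_t' \Sigma_\varepsilon^{-1} X_t)^{-1}. \notag
\end{align}
Combined, this leads to the scaled score given by:
\[
s_t = (X_t' \Sigma_\varepsilon^{-1} X_t)^{-1} X_t' \Sigma_\varepsilon^{-1} (y_t - X_t f_t). \tag{5.16}
\]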
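The scaled score (5.16) has a simple interpretation: it is the difference between the cross-sectional GLS estimate of the factors and the current factors $f_t$. The following sketch checks this numerically for a diagonal $\Sigma_\varepsilon$. The function name `gaussian_scaled_score` and all numbers are hypothetical, and five maturities are used only to keep the example small.

```python
import numpy as np

def gaussian_scaled_score(y, X, f, sigma_diag):
    """Scaled score (5.16) for a diagonal Sigma_eps, passed as its diagonal."""
    w = 1.0 / np.asarray(sigma_diag, dtype=float)
    score = X.T @ (w * (y - X @ f))           # equation (5.14)
    information = X.T @ (w[:, None] * X)      # X' Sigma^{-1} X, inverted in (5.15)
    return np.linalg.solve(information, score)

tau = np.array([3.0, 12.0, 24.0, 60.0, 120.0])
x = 0.06 * tau
slope = (1.0 - np.exp(-x)) / x
X = np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])
sigma_diag = np.array([0.04, 0.03, 0.02, 0.02, 0.03])
y = np.array([5.1, 5.6, 6.0, 6.6, 7.0])
f = np.array([7.0, -2.0, 0.5])

s = gaussian_scaled_score(y, X, f, sigma_diag)

# Cross-sectional GLS estimate of the factors, for comparison.
w = 1.0 / sigma_diag
gls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```

Since `s` equals `gls - f`, a GAS(1,1) update with $A = I$ would move the factors all the way to the period's GLS fit, while smaller entries of $A$ give a partial adjustment.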
5.3.2 Student-t
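We now derive the score and information matrix for the Student-t distribution. The density is given by
\[
p(y_t \mid \theta) = \frac{\Gamma\left(\frac{v+N}{2}\right)}{\Gamma\left(\frac{v}{2}\right) [(v-2)\pi]^{N/2} |\Sigma_\varepsilon|^{1/2}} \left[ 1 + \frac{\varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t}{v-2} \right]^{-(v+N)/2}, \tag{5.17}
\]
with $\varepsilon_t = y_t - X_t \beta_t$.
This leads to the log-likelihood given by
\begin{align}
\ell_t ={} & \log\left[ \Gamma\left( \frac{v+N}{2} \right) \right] - \log\left[ \Gamma\left( \frac{v}{2} \right) \right] - \frac{N}{2} \log[(v-2)\pi] \tag{5.18} \\
& - \frac{1}{2} \log|\Sigma_\varepsilon| - \frac{v+N}{2} \log\left[ 1 + \frac{\varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t}{v-2} \right]. \tag{5.19}
\end{align}
Taking derivatives w.r.t. the factor $\beta_t$ gives the score:
\begin{align}
\nabla_t = \frac{\partial \ell_t}{\partial \beta_t} &= -\frac{v+N}{2} \left[ 1 + \frac{\varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t}{v-2} \right]^{-1} \cdot (-2) \cdot \frac{X_t' \Sigma_\varepsilon^{-1} \varepsilon_t}{v-2} \tag{5.20} \\
&= (v+N) \left[ 1 + \frac{\varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t}{v-2} \right]^{-1} \frac{X_t' \Sigma_\varepsilon^{-1} \varepsilon_t}{v-2}. \tag{5.21}
\end{align}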
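Writing the score (5.21) as $\nabla_t = (v+N) \left( v - 2 + \varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t \right)^{-1} X_t' \Sigma_\varepsilon^{-1} \varepsilon_t$ and taking derivatives again gives:
\begin{align}
\frac{\partial^2 \ell_t}{\partial \beta_t \, \partial \beta_t'} &= (v+N) \left( X_t' \Sigma_\varepsilon^{-1} \varepsilon_t \, \frac{\partial \left( v - 2 + \varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t \right)^{-1}}{\partial \beta_t'} + \left( v - 2 + \varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t \right)^{-1} \frac{\partial X_t' \Sigma_\varepsilon^{-1} \varepsilon_t}{\partial \beta_t'} \right) \tag{5.22} \\
&= (v+N) \left( 2 \left( v - 2 + \varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t \right)^{-2} \left( X_t' \Sigma_\varepsilon^{-1} \varepsilon_t \right) \left( \varepsilon_t' \Sigma_\varepsilon^{-1} X_t \right) - \left( v - 2 + \varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t \right)^{-1} X_t' \Sigma_\varepsilon^{-1} X_t \right), \notag
\end{align}
where we used $\partial \varepsilon_t / \partial \beta_t' = -X_t$.
The problem with this expression is that its expectation is difficult to determine. We therefore take the scaling derived for the Gaussian specification as an approximation of the scaling. Alternatively, the Hessian derived above is used as an approximation, combined with the smoothing scheme introduced in section 5.2.3. This leads to an approximation of the true information matrix.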
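The Student-t score (5.21) can be written as the Gaussian score multiplied by the weight $(v+N)/(v-2+\varepsilon_t' \Sigma_\varepsilon^{-1} \varepsilon_t)$, which shrinks when the residuals are large. The sketch below implements the log-likelihood (5.18)-(5.19) and the score for a diagonal $\Sigma_\varepsilon$, so that the score can be checked against a finite-difference gradient. Function names and numbers are hypothetical.

```python
import math
import numpy as np

def student_t_loglik(y, X, beta, sigma_diag, v):
    """Log-likelihood contribution (5.18)-(5.19) for diagonal Sigma_eps."""
    n = len(y)
    eps = y - X @ beta
    q = float(np.sum(eps ** 2 / sigma_diag))       # eps' Sigma^{-1} eps
    return (math.lgamma((v + n) / 2.0) - math.lgamma(v / 2.0)
            - 0.5 * n * math.log((v - 2.0) * math.pi)
            - 0.5 * float(np.sum(np.log(sigma_diag)))
            - 0.5 * (v + n) * math.log1p(q / (v - 2.0)))

def student_t_score(y, X, beta, sigma_diag, v):
    """Score (5.21): Gaussian score times the weight (v + N) / (v - 2 + q)."""
    n = len(y)
    eps = y - X @ beta
    q = float(np.sum(eps ** 2 / sigma_diag))
    weight = (v + n) / (v - 2.0 + q)
    return weight * (X.T @ (eps / sigma_diag))

tau = np.array([3.0, 12.0, 24.0, 60.0, 120.0])
x = 0.06 * tau
slope = (1.0 - np.exp(-x)) / x
X = np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])
beta = np.array([7.0, -2.0, 0.5])
sigma_diag = np.full(5, 0.02)
y = X @ beta + np.array([0.10, -0.05, 0.02, -0.03, 0.50])   # one large residual
v = 5.0

score_t = student_t_score(y, X, beta, sigma_diag, v)
score_gauss = X.T @ ((y - X @ beta) / sigma_diag)
```

For small $v$ the weight lies below one when an observation is far from the fitted curve, so outliers move the factors less than under the Gaussian score; for very large $v$ the two scores coincide.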
5.3.3 Variable Lambda
We now proceed to extend the factor vector ft with λt as proposed by Creal et al. (2008):
f+t = [β′t, λt]′. (5.23)
Estimation 34
Because the derivative w.r.t. βt stays unchanged. Only the derivative w.r.t λt needs to
be derived . Which is given by:
∂`t∂λt
=
(∂Xt
∂λtβt
)′Σ−1ε (yt −Xtβt). (5.24)
Together with the derivative given in (5.14) this forms:
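$$\nabla_t^+(\theta) = \frac{\partial \ell}{\partial f_t^+} = \left[\frac{\partial \ell}{\partial \beta_t}, \frac{\partial \ell}{\partial \lambda_t}\right]' \tag{5.25}$$

$$= \begin{bmatrix} X_t' \Sigma_\varepsilon^{-1} (y_t - X_t \beta_t) \\ \left(\frac{\partial X_t}{\partial \lambda_t}\beta_t\right)' \Sigma_\varepsilon^{-1} (y_t - X_t \beta_t) \end{bmatrix} = \left[X_t, \left(\frac{\partial X_t}{\partial \lambda_t}\beta_t\right)\right]' \Sigma_\varepsilon^{-1} (y_t - X_t \beta_t) = \tilde{X}_t' \Sigma_\varepsilon^{-1} (y_t - X_t \beta_t),$$

with $\tilde{X}_t = \left[X_t, \left(\frac{\partial X_t}{\partial \lambda_t}\beta_t\right)\right]$, and

$$\frac{\partial x_i(\tau_i)}{\partial \lambda_t} = \left[0,\; \frac{e^{-\lambda_t \tau_i}}{\lambda_t} - \frac{1 - e^{-\lambda_t \tau_i}}{\lambda_t^2 \tau_i},\; -\frac{1 - e^{-\lambda_t \tau_i}}{\lambda_t^2 \tau_i} + \tau_i e^{-\lambda_t \tau_i} + \frac{e^{-\lambda_t \tau_i}}{\lambda_t}\right].$$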
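The derivative of the loadings with respect to $\lambda_t$ is easy to get wrong, so a numerical check is useful. The Python sketch below assumes the standard Nelson-Siegel loadings $[1, (1-e^{-\lambda\tau})/(\lambda\tau), (1-e^{-\lambda\tau})/(\lambda\tau) - e^{-\lambda\tau}]$ and compares the analytic derivative with a central finite difference. The function names and the selected maturities are illustrative assumptions.

```python
import math


def ns_loadings(lam, tau):
    """Nelson-Siegel loadings (level, slope, curvature) for maturity tau."""
    decay = math.exp(-lam * tau)
    slope = (1.0 - decay) / (lam * tau)
    return [1.0, slope, slope - decay]


def ns_loadings_dlam(lam, tau):
    """Analytic derivative of the loadings w.r.t. lambda."""
    decay = math.exp(-lam * tau)
    d_slope = decay / lam - (1.0 - decay) / (lam ** 2 * tau)
    return [0.0, d_slope, d_slope + tau * decay]


def finite_difference(lam, tau, step=1e-6):
    """Central finite-difference approximation of the same derivative."""
    up = ns_loadings(lam + step, tau)
    down = ns_loadings(lam - step, tau)
    return [(u - d) / (2.0 * step) for u, d in zip(up, down)]


if __name__ == "__main__":
    lam = 0.062  # the fixed lambda used for the two-step estimates
    for tau in (3, 24, 120):  # maturities in months, illustrative selection
        print(tau, ns_loadings_dlam(lam, tau), finite_difference(lam, tau))
```

Multiplying the row-wise derivatives by $\beta_t$ gives the additional column of the extended loading matrix.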
Next the Information matrix is derived using the gradient:
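$$I_t = \mathrm{E}\left[\nabla_t^+ \nabla_t^{+\prime}\right] \tag{5.26}$$

$$= \mathrm{E}\left[\tilde{X}_t' \Sigma_\varepsilon^{-1} \varepsilon_t \varepsilon_t' \Sigma_\varepsilon^{-1} \tilde{X}_t\right] = \tilde{X}_t' \Sigma_\varepsilon^{-1} \mathrm{E}\left[\varepsilon_t \varepsilon_t'\right] \Sigma_\varepsilon^{-1} \tilde{X}_t = \tilde{X}_t' \Sigma_\varepsilon^{-1} \Sigma_\varepsilon \Sigma_\varepsilon^{-1} \tilde{X}_t = \tilde{X}_t' \Sigma_\varepsilon^{-1} \tilde{X}_t. \tag{5.27}$$

So the scaling matrix is given by:

$$S_t = I_t^{-1} = \left(\tilde{X}_t' \Sigma_\varepsilon^{-1} \tilde{X}_t\right)^{-1}. \tag{5.28}$$

Combined with the score this leads to the scaled score:

$$s_t = \left(\tilde{X}_t' \Sigma_\varepsilon^{-1} \tilde{X}_t\right)^{-1} \tilde{X}_t' \Sigma_\varepsilon^{-1} (y_t - X_t \beta_t). \tag{5.29}$$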
We now specify $f_t^+ = (\beta_t', \lambda_t)'$, a $4 \times 1$ vector, as proposed by Creal et al. (2008):

$$f_t^+ = \phi_0 + \Phi f_t. \tag{5.30}$$

Here $f_t = \beta_t$ is the $3 \times 1$ vector of factors. This imposes a three-factor structure on the dynamics of $(\beta_t', \lambda_t)'$. Using these restrictions the performance of the GAS model can be assessed with nonlinearity but with restrictions on the dynamics of the parameters. Imposing no restrictions at all will result in a highly nonlinear system, which will result in estimation difficulties. For identification purposes the upper $3 \times 3$ block of $\Phi$ is set equal to the identity matrix and the upper three elements of $\phi_0$ are set equal to zero. $\lambda_t$ is now a linear function of $\beta_t$:
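$$f_t^+ = \phi_0 + \Phi f_t \tag{5.31}$$

$$\begin{bmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \\ \lambda_t \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ c_0 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ c_1 & c_2 & c_3 \end{bmatrix} \begin{bmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \end{bmatrix} = \begin{bmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \\ c_0 + c_1 \beta_{1t} + c_2 \beta_{2t} + c_3 \beta_{3t} \end{bmatrix}.$$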
Creal et al. (2008) do not mention that this is a reparametrization of their model, so the scaled score (5.29) that they present is not complete. Because of the reparametrization, a new scaled score must be derived, as the scaled score is the driving mechanism of the updating equation:

$$\nabla_t = \frac{\partial \ell_t}{\partial f_t} = \frac{\partial f_t^{+\prime}}{\partial f_t} \cdot \frac{\partial \ell_t}{\partial f_t^+} = \Phi' \cdot \nabla_t^+. \tag{5.32}$$
The inverse Fisher information is then given by:
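$$\left(I_{t|t-1}\right)^{-1} = \left(\mathrm{E}_{t-1}\left[\Phi' \nabla_t^+ \nabla_t^{+\prime} \Phi\right]\right)^{-1} = \left(\Phi' \mathrm{E}_{t-1}\left[\nabla_t^+ \nabla_t^{+\prime}\right] \Phi\right)^{-1} = \left(\Phi' \tilde{X}_t' \Sigma_\varepsilon^{-1} \tilde{X}_t \Phi\right)^{-1}. \tag{5.33}$$

Hence this results in the scaled score:

$$s_t = \left(I_{t|t-1}\right)^{-1} \cdot \nabla_t = \left(\Phi' \tilde{X}_t' \Sigma_\varepsilon^{-1} \tilde{X}_t \Phi\right)^{-1} \Phi' \cdot \nabla_t^+. \tag{5.34}$$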
5.3.4 Common Disturbance with Time-Varying Volatility
We now derive the score and information matrix for the model with common disturbance
specification. For our factor vector we have: ft = (β1t, β2t, β3t, ht). For each time period
the log-likelihood is given by:
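$$\ell_t(\theta) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln\left(|\Sigma_\varepsilon(h_t)|\right) - \frac{1}{2}\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \varepsilon_t, \tag{5.35}$$

with $\varepsilon_t = y_t - X_t \beta_t$. Taking the derivative with regard to $h_t$ gives:

$$\frac{\partial \ell_t}{\partial h_t} = -\frac{1}{2}\frac{\partial \ln\left(|\Sigma_\varepsilon(h_t)|\right)}{\partial h_t} - \frac{1}{2}\frac{\partial \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \varepsilon_t\right)}{\partial h_t}. \tag{5.36}$$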
Derivations of the parts are given below:
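$$\frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t} = \Gamma_\varepsilon \Gamma_\varepsilon'. \tag{5.37}$$

$$\frac{\partial \ln\left(|\Sigma_\varepsilon(h_t)|\right)}{\partial h_t} = \mathrm{Tr}\left(\Sigma_\varepsilon(h_t)^{-1} \frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t}\right) = \mathrm{Tr}\left(\Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon \Gamma_\varepsilon'\right). \tag{5.38}$$

$$\frac{\partial \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \varepsilon_t\right)}{\partial h_t} = -\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t} \left(\Sigma_\varepsilon(h_t)\right)^{-1} \varepsilon_t \tag{5.39}$$

$$= -\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon \Gamma_\varepsilon' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \varepsilon_t = -\left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)^2.$$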
Hence, together these form the derivative w.r.t. ht; the fourth element of the score
vector:
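$$\frac{\partial \ell_t}{\partial h_t} = -\frac{1}{2}\mathrm{Tr}\left(\Sigma_\varepsilon^{-1}(h_t) \Gamma_\varepsilon \Gamma_\varepsilon'\right) + \frac{1}{2}\left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)^2. \tag{5.40}$$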
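The derivative (5.40) can be checked numerically. The Python sketch below assumes the common disturbance variance matrix $\Sigma_\varepsilon(h_t) = \Sigma_\varepsilon + h_t \Gamma_\varepsilon \Gamma_\varepsilon'$ with a diagonal $\Sigma_\varepsilon$, and uses the Sherman-Morrison formula and the matrix determinant lemma instead of a general matrix inversion. These two identities, the function name and all numerical values are illustrative additions and not part of the estimation code of this thesis.

```python
import math


def loglik_and_score(eps, d, gamma, h):
    """Gaussian log-likelihood and d(loglik)/dh for Sigma(h) = D + h*gamma*gamma'.

    eps: disturbance vector, d: diagonal of D (all positive),
    gamma: loading vector of the common disturbance, h: common variance.
    """
    n = len(eps)
    g = sum(gi * gi / di for gi, di in zip(gamma, d))            # gamma' D^-1 gamma
    b = sum(ei * gi / di for ei, gi, di in zip(eps, gamma, d))   # eps' D^-1 gamma
    q0 = sum(ei * ei / di for ei, di in zip(eps, d))             # eps' D^-1 eps
    denom = 1.0 + h * g
    logdet = sum(math.log(di) for di in d) + math.log(denom)     # determinant lemma
    quad = q0 - h * b * b / denom                                # eps' Sigma(h)^-1 eps
    loglik = -0.5 * n * math.log(2.0 * math.pi) - 0.5 * logdet - 0.5 * quad
    trace_term = g / denom            # Tr(Sigma(h)^-1 gamma gamma')
    a = b / denom                     # eps' Sigma(h)^-1 gamma
    score_h = -0.5 * trace_term + 0.5 * a * a                    # equation (5.40)
    return loglik, score_h


if __name__ == "__main__":
    eps = [0.3, -0.1, 0.2]      # hypothetical disturbances
    d = [0.05, 0.04, 0.06]      # hypothetical idiosyncratic variances
    gamma = [1.2, 1.0, 0.9]     # hypothetical loadings
    h, step = 0.1, 1e-6
    _, analytic = loglik_and_score(eps, d, gamma, h)
    up, _ = loglik_and_score(eps, d, gamma, h + step)
    down, _ = loglik_and_score(eps, d, gamma, h - step)
    print(analytic, (up - down) / (2.0 * step))
```

The two printed numbers should agree up to the accuracy of the finite difference.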
Combined with the derivative w.r.t. βt derived in (5.14) this forms:
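$$\frac{\partial \ell_t}{\partial f_t^*} = \left[\frac{\partial \ell_t}{\partial \beta_t'}, \frac{\partial \ell_t}{\partial h_t}\right]'. \tag{5.41}$$

To derive the scaling matrix, the inverse information matrix, all terms are differentiated w.r.t. the factors once more. This results in:

$$\frac{\partial^2 \ell_t}{\partial f_t^* \partial f_t^{*\prime}} = \begin{bmatrix} \frac{\partial^2 \ell_t}{\partial \beta_t \partial \beta_t'} & \frac{\partial^2 \ell_t}{\partial \beta_t \partial h_t} \\ \frac{\partial^2 \ell_t}{\partial h_t \partial \beta_t'} & \frac{\partial^2 \ell_t}{\partial h_t^2} \end{bmatrix}. \tag{5.42}$$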
Further derivations of the parts are given by:
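$$\frac{\partial\, \mathrm{Tr}\left(\Sigma_\varepsilon^{-1}(h_t) \Gamma_\varepsilon \Gamma_\varepsilon'\right)}{\partial h_t} = \mathrm{Tr}\left(\frac{\partial \Sigma_\varepsilon^{-1}(h_t)}{\partial h_t} \Gamma_\varepsilon \Gamma_\varepsilon'\right) \tag{5.43}$$

$$= \mathrm{Tr}\left(-\Sigma_\varepsilon^{-1}(h_t) \frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t} \Sigma_\varepsilon^{-1}(h_t) \Gamma_\varepsilon \Gamma_\varepsilon'\right) = -\mathrm{Tr}\left(\Sigma_\varepsilon^{-1}(h_t) \Gamma_\varepsilon \Gamma_\varepsilon' \Sigma_\varepsilon^{-1}(h_t) \Gamma_\varepsilon \Gamma_\varepsilon'\right) = -\mathrm{Tr}\left(\left(\Sigma_\varepsilon^{-1}(h_t) \Gamma_\varepsilon \Gamma_\varepsilon'\right)^2\right).$$

$$\frac{\partial \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)}{\partial h_t} = \varepsilon_t' \frac{\partial \left(\Sigma_\varepsilon(h_t)\right)^{-1}}{\partial h_t} \Gamma_\varepsilon \tag{5.44}$$

$$= -\varepsilon_t' \Sigma_\varepsilon(h_t)^{-1} \frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t} \Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon = -\varepsilon_t' \Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon \Gamma_\varepsilon' \Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon.$$

$$\frac{\partial \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)^2}{\partial h_t} = 2 \cdot \varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon \, \frac{\partial \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)}{\partial h_t} \tag{5.45}$$

$$= -2\left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)\left(\varepsilon_t' \Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon\right) \Gamma_\varepsilon' \Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon = -2\left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)^2 \Gamma_\varepsilon' \Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon.$$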
Together these parts form:
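$$\frac{\partial^2 \ell_t}{\partial h_t^2} = \frac{1}{2}\mathrm{Tr}\left(\left(\Sigma_\varepsilon^{-1}(h_t) \Gamma_\varepsilon \Gamma_\varepsilon'\right)^2\right) - \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)^2 \Gamma_\varepsilon' \Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon. \tag{5.46}$$

Taking expectations gives:

$$I_{t|t-1} = -\mathrm{E}_{t-1}\left[\frac{\partial^2 \ell_t}{\partial h_t^2}\right] = -\frac{1}{2}\mathrm{Tr}\left(\left(\Sigma_\varepsilon^{-1}(h_t) \Gamma_\varepsilon \Gamma_\varepsilon'\right)^2\right) + \left(\Gamma_\varepsilon' \Sigma_\varepsilon(h_t)^{-1} \Gamma_\varepsilon\right)^2. \tag{5.47}$$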
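For the cross-derivative we get:

$$\frac{\partial \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)}{\partial \beta_t'} = -\Gamma_\varepsilon' \Sigma_\varepsilon(h_t)^{-1} X_t. \tag{5.48}$$

$$\frac{\partial^2 \ell_t}{\partial h_t \partial \beta_t'} = \frac{1}{2}\frac{\partial \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)^2}{\partial \beta_t'} = \frac{1}{2} \cdot 2 \cdot \varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon \, \frac{\partial \left(\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon\right)}{\partial \beta_t'} \tag{5.49}$$

$$= -\varepsilon_t' \left(\Sigma_\varepsilon(h_t)\right)^{-1} \Gamma_\varepsilon \Gamma_\varepsilon' \Sigma_\varepsilon(h_t)^{-1} X_t.$$

Taking expectations gives:

$$\mathrm{E}_{t-1}\left[\frac{\partial^2 \ell_t}{\partial h_t \partial \beta_t'}\right] = 0. \tag{5.50}$$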
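Because $h_t$ is a variance it needs to be positive. We thus use $h_t = \exp(\alpha_t)$ as parametrization, with $\alpha_t$ taken to be the modeled factor. We need to derive the new score and information matrix w.r.t. our new factors $\tilde{f}_t = (\beta_{1t}, \beta_{2t}, \beta_{3t}, \alpha_t)$. Because this is an invertible mapping, the following rule can be applied for the derivation of the scaled score (Creal et al., 2008). Let:

$$\tilde{f}_t = g(f_t) \tag{5.51}$$

$$g_t = \frac{\partial g(f_t)}{\partial f_t'}. \tag{5.52}$$

And let $g_t$ be invertible. We then have:

$$\tilde{\nabla}_t = \frac{\partial \ell_t}{\partial \tilde{f}_t} = \frac{\partial f_t}{\partial \tilde{f}_t} \cdot \frac{\partial \ell_t}{\partial f_t} = \left(\frac{\partial \tilde{f}_t}{\partial f_t'}\right)^{-1} \cdot \nabla_t \tag{5.53}$$

$$\tilde{\nabla}_t = \left(g_{t-1}'\right)^{-1} \nabla_t.$$

Thus the scaled score is given by:

$$\tilde{s}_t = \left(\mathrm{E}_{t-1}\left[\left(g_{t-1}'\right)^{-1} \nabla_t \nabla_t' \left(g_{t-1}\right)^{-1}\right]\right)^{-1} \left(g_{t-1}'\right)^{-1} \nabla_t = g_{t-1} s_t. \tag{5.54}$$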
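This mapping is clearly invertible, hence the rule can be applied. In our case we have:

$$g(f_t^*) = (\beta_{1t}, \beta_{2t}, \beta_{3t}, \log(h_t)) \tag{5.55}$$

$$g_t^* = \frac{\partial g(f_t^*)}{\partial (\beta_{1t}, \beta_{2t}, \beta_{3t}, h_t)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & h_t^{-1} \end{bmatrix}. \tag{5.56}$$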
5.3.5 Common Time-Varying Volatility
We now derive score and information matrix for the diagonal variance specification as
proposed in section (5.3.5). The likelihood is given by:
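$$\ell_t(\theta) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log\left(|h_t \Phi|\right) - \frac{1}{2} h_t^{-1} \varepsilon_t' \Phi^{-1} \varepsilon_t. \tag{5.57}$$

We have the score given by:

$$\nabla_t = \frac{\partial \ell_t}{\partial h_t} = -\frac{1}{2}\left(\mathrm{Tr}\left((h_t \Phi)^{-1} \Phi\right) - h_t^{-2} \varepsilon_t' \Phi^{-1} \varepsilon_t\right) = -\frac{1}{2}\left(h_t^{-1} - h_t^{-2} \varepsilon_t' \Phi^{-1} \varepsilon_t\right). \tag{5.58}$$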
The second derivative is then given by:
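$$\frac{\partial^2 \ell_t}{\partial h_t^2} = -\frac{1}{2}\left(-h_t^{-2} + 2 h_t^{-3} \varepsilon_t' \Phi^{-1} \varepsilon_t\right). \tag{5.59}$$

As information matrix we get:

$$I_{t|t-1} = -\mathrm{E}_{t-1}\left[\frac{\partial^2 \ell_t}{\partial h_t^2}\right] = -\mathrm{E}_{t-1}\left[-\frac{1}{2}\left(-h_t^{-2} + 2 h_t^{-3} \varepsilon_t' \Phi^{-1} \varepsilon_t\right)\right] = \frac{1}{2}\left(-h_t^{-2} + 2 h_t^{-3} \cdot 17 h_t\right) = \frac{33}{2} h_t^{-2}. \tag{5.60}$$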
As scaled score we get:
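$$I_{t|t-1}^{-1} \nabla_t = \frac{2}{33} h_t^2 \cdot \left(-\frac{1}{2}\right)\left(h_t^{-1} - h_t^{-2} \varepsilon_t' \Phi^{-1} \varepsilon_t\right) = \frac{1}{33}\left(\varepsilon_t' \Phi^{-1} \varepsilon_t - h_t\right). \tag{5.61}$$

This means that the variance is adjusted with the size of the weighted squared disturbance $\varepsilon_t' \Phi^{-1} \varepsilon_t$, taking the current variance $h_t$ into account. To ensure a positive variance we take $f_t = \log(h_t)$ as factor, which gives:

$$I_{t|t-1}^{-1} \nabla_t = \frac{1}{33}\left(h_t^{-1} \varepsilon_t' \Phi^{-1} \varepsilon_t - 1\right). \tag{5.62}$$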
Chapter 6
Results
In this section we discuss the in-sample estimation results. First, the in-sample results are discussed through the RMSE and likelihood values. Also, likelihood ratio statistics and the Akaike and Bayesian information criteria are used to justify the size of the models. Further, we compare GAS estimated models to Kalman estimated models using the Rivers-Vuong (2002) test. Finally, we describe model-specific estimation results and observations. Table 6.1 gives an overview of the estimated models.
Table 6.1: Overview of Estimated Models

| Name | Model Description | Section |
|---|---|---|
| g-AR | GAS Gaussian DNS with diagonal coefficient matrices in the updating equation | 5.3.1 |
| g-VAR | GAS Gaussian DNS with full coefficient matrices in the updating equation | 5.3.1 |
| g-λ-AR | GAS Gaussian DNS with variable λt and diagonal coefficient matrices in the updating equation | 5.3.3 |
| g-TVV-AR | GAS Gaussian DNS with Common Time-Varying Volatility and diagonal coefficient matrices in the updating equation | 5.3.5 |
| g-CD-AR | GAS Gaussian DNS with Common Disturbance with Time-Varying Volatility and diagonal coefficient matrices in the updating equation | 5.3.4 |
| g-BC-AR | GAS Gaussian four-factor Bjork-Christensen with diagonal coefficient matrices in the updating equation | 5.3.2 |
| t-AR | GAS Student-t DNS with diagonal coefficient matrices in the updating equation | 5.3.2 |
| k-AR | Kalman filter DNS with diagonal coefficient matrix | App. B |
| k-VAR | Kalman filter DNS with full coefficient matrix | App. B |
6.1 In-Sample Performance
Because all the models estimated using the GAS framework are nested, we test for statistical significance of the extended models using the likelihood ratio (LR) statistic. Unfortunately, the models estimated with the Kalman filter have a different distribution due to different assumptions and thus are non-nested. We therefore make comparisons based on the residuals. We use the Rivers-Vuong test to compare the lack-of-fit of the competing non-nested models. We compare the Gaussian (un)correlated factors model to the Kalman (un)correlated factors model, i.e. g-(V)AR vs. k-(V)AR. As lack-of-fit measure we use the trace Mean Squared Error (tMSE).
In table 6.2 we present the likelihood values, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Likelihood Ratio (LR) and Rivers-Vuong (RV) statistics. Further, we present the RMSEs and tRMSE in table A.1.
From the likelihood values and RMSE we see that the specification with full A and B matrices (correlated factors/VAR) only marginally improves the fit of the model, although the LR statistic and the AIC favor it; the BIC reported in table 6.2 is slightly higher for g-VAR than for g-AR.
Further, it can be seen that the model with variable lambda, which is only estimated in the diagonal A and B specification (AR), has a marginally better fit than the model with static lambda. This can be concluded from the likelihood-based measures but also from the RMSE, and it holds in comparison with both the diagonal and the full specification.
The specifications with time-varying volatility lead to a huge increase in the log-likelihood value. The resulting likelihood ratio statistics indicate highly significant improvements in fit. The Akaike and Bayesian Information Criteria also indicate that the model extensions improve fit, despite the increased number of parameters. Yet, the RMSEs do not show such an improvement in fit. The specification with common disturbance performs worse based on RMSE. The specification with a common factor driving the volatility only has a slightly smaller RMSE for the 3 and 6 month maturities; for all other maturities it performs slightly worse.
Moreover, we see that the use of the diagonal Bjork and Christensen (1999) extension,
which uses an additional fourth factor, comparably improves fit or even gives a slightly
better fit than the variable lambda specification based on likelihood and RMSE. This is
achieved through the addition of the same number of parameters.
The specifications with student-t disturbances indicate an even better fit of the model. These specifications give a log-likelihood increase of over 40% compared to the basic Gaussian model, which would indicate the best fit of all the models, even without a time-varying volatility specification or nonlinearities. The RMSEs of these model specifications, however, indicate the opposite of a better fit: we see consistent increases in the estimated errors.
Further, we see from the RMSE that the GAS estimated models, both with diagonal and with full A and B, give a better fit than the Kalman estimated models. For all maturities they have a smaller RMSE. Moreover, the Rivers-Vuong statistics for the tMSE are significant for both the diagonal and the full specification comparisons.
| Model | ℓ(θ) | #θ | AIC | BIC | LR | RV |
|---|---|---|---|---|---|---|
| g-AR | -4997 | 27 | 10048 | 10211 | | |
| g-VAR | -4967 | 39 | 10011 | 10246 | 61* | |
| g-λ-AR | -4906 | 30 | 9871 | 10052 | 183* | |
| g-TVV-AR | -3595 | 29 | 7570 | 7745 | 2482* | |
| g-CD-AR | -3258 | 46 | 6608 | 6885 | 3478* | |
| g-BC-AR | -4890 | 30 | 9840 | 10021 | 214* | |
| t-AR | -2712 | 28 | 5480 | 5649 | 4570* | |
| k-AR | 13130 | 27 | -26205 | -26043 | | -2.43* |
| k-VAR | 13139 | 33 | -26212 | -26014 | 18.6* | -2.02* |

Table 6.2: In-sample fit statistics for the period from January 1970 until December 2009. This table reports log-likelihood values, number of parameters, Akaike Information Criterion, Bayesian Information Criterion and likelihood ratio statistic. For the GAS estimated models the likelihood ratio statistic is computed relative to the diagonal AR specification. Similarly, a comparison is made between the two Kalman estimated models. The Rivers-Vuong statistic is calculated for g-(V)AR vs. k-(V)AR. * indicates 99% confidence.
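As an illustration of how the likelihood-based columns relate to each other, the following Python sketch recomputes the AIC and the LR statistic for two rows of table 6.2. It assumes the standard definitions AIC $= 2k - 2\ell$ and LR $= 2(\ell_1 - \ell_0)$, which are not restated in this chapter, and it uses the rounded log-likelihood values from the table. The function names are illustrative.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2*loglik."""
    return 2 * n_params - 2 * loglik


def lr_statistic(loglik_restricted, loglik_extended):
    """Likelihood ratio statistic: twice the log-likelihood gain."""
    return 2 * (loglik_extended - loglik_restricted)


if __name__ == "__main__":
    # Rounded values for g-AR and g-BC-AR taken from table 6.2.
    print(aic(-4997, 27))              # g-AR
    print(aic(-4890, 30))              # g-BC-AR
    print(lr_statistic(-4997, -4890))  # g-BC-AR against g-AR
```

For these two rows the computed values coincide with the AIC and LR entries reported in the table.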
Next the results and observations from the individual estimated models are presented.
6.1.1 Two-Step
By fitting the DNS in two steps we obtain initial estimates of the autoregressive part of
the model. This is done for the AR and VAR specification. Given λ = 0.062 we have
an optimal βt, which results in an estimated curve with corresponding errors. Fitting
an autoregressive model to these estimates results in initial values for further model
estimation. The errors still show high autocorrelation (ρ1 = 0.73). This indicates that there might be a common source of error or an additional factor driving the dynamics. Also, the kurtosis of the errors (a measure of the peakedness and tail thickness of the distribution) points towards student-t disturbances.
6.1.2 Kalman
The model estimated with the Kalman filter using the initial values of the two-step
approach results in estimates that should be even closer to the optimal values of the
GAS estimated models. These results will also give a benchmark to which we compare
the GAS estimated models.
This specification is estimated both in a diagonal (k-AR) and in a full (k-VAR) specification. The coefficient estimates of the VAR specification show that the factors do not affect each other much, although the estimates of the off-diagonal entries of the coefficient matrix are significant.
The log-likelihood values of the Kalman filter estimated models look far more favorable compared to the values of the GAS estimated models (+8000 points). This results from the way uncertainty is absorbed in the disturbances of the state equation. The small variance matrix that therefore enters the log-likelihood function results in far higher (positive) log-likelihood values. But the different model assumptions mean that the Kalman estimated models are not nested in the GAS estimated models and are therefore not comparable. The validity of these assumptions is not tested (and is difficult to test), but they are likely violated, especially the assumption that all noise entering the system is white noise. We see that the Kalman filter estimated errors still show some autocorrelation (ρ1 = 0.26). This also suggests the time-varying volatility specification or an additional factor. The disturbances again have very high correlations, which would indicate an additional factor or a common disturbance. Also, the excess kurtosis is high, with values ranging from 19 for the estimated disturbances of the shortest maturity to 4 for the 10 year maturity. This would again suggest the use of student-t distributed errors.
6.1.3 Gaussian
Optimization of the GAS estimated models is started at the optimal estimated values from the Kalman filter. This specification is estimated both with diagonal and with full A and B coefficient matrices. In tables A.4 and A.5 we present the estimated coefficients. From these we see the high persistence in the yields through the high estimated autoregressive coefficients. The full specification gives a significant increase in fit and the estimates of the off-diagonal elements are also significant. Still, the off-diagonal entries of B, the coefficients for the lagged values of the factors, are close to zero. This is expected as the factors closely resemble the first three principal components, which should mean that they are orthogonal to each other and do not affect each other. Only the estimates of the off-diagonal elements of A, the coefficient matrix for the scaled score, take on very different values, in particular the entries that affect the third factor: the curvature component. Also, the constant ω coefficient takes on a very different (negative) value for the curvature factor; this may compensate for the effects of the score coefficients. But through these dynamics the model has a different economic interpretation, with a decreasing medium-term component.
The estimated error statistics show that the model errors have low autocorrelation (ρ1 = 0.05), which indicates that the GAS updating equation is good at absorbing shocks in the factors. Still, the errors within each time period are highly correlated. This can be explained by a common disturbance or some other source driving these correlations. Further, the estimated disturbances have a low skewness, though again the residuals have a high kurtosis. The kurtosis is again a decreasing function of the maturity: from 24 for the shortest maturities to 4 for the longest maturity. This again points towards the use of student-t disturbances. Moreover, Jarque-Bera tests and Kolmogorov-Smirnov tests on the residual vectors of each maturity reject normality (99.9% confidence).
6.1.4 Student-t
In table A.9 we present the estimates of the coefficients of the student-t specification. The simple DNS specification with student-t distributed disturbances shows a huge increase in likelihood value. This increase can be attributed to the heavy-tailed nature of this distribution. Due to these fatter tails, outliers or big shocks to the yields are regarded as more probable. The assumption of a heavy-tailed distribution is not an odd one. Indeed, yields are highly persistent and deviations are small, but when shocks appear they are sizeable. This would suggest a heavy-tailed distribution. The estimated disturbances from the two-step and Kalman estimation also pointed towards such an assumption.
However, as a result of this heavy-tailed assumption, the score of the multivariate-t distribution causes the factor dynamics not to react too fiercely to large values of |εt|. This makes sense because such large values might easily be due to the fat-tailed nature of the data and should not be fully attributed to increases in the factors. In reality, though, these shocks or disturbances affect the yields for longer periods of time. Therefore these shocks should be incorporated in the factors that are assumed to drive the dynamics of the yields; recall that the first three principal components explain 99% of the yield variation. With the assumption of student-t disturbances the reaction to shocks is less fierce. This reasoning might explain why the estimated errors of this student-t model are in fact larger than the estimated errors in the Gaussian case, despite the higher likelihood value.
Another explanation for the higher estimation errors might be the scaling used for the score in this model specification. This scaling is not fully derived because the expectation of the Hessian is difficult to determine. Instead we use an approximation: the scaling of the Gaussian model or the smoothed Hessian of the log-likelihood function (section 5.3.2). It is unlikely that this is the main reason for the larger estimated errors in this model. The scaling of the score is time-invariant in the 'simple' specification that is evaluated, so small deviations should be compensated through the optimization of the coefficient matrix A of the updating equation (2.20).
As a result of this inability to quickly incorporate structural shocks into the factors we see high autocorrelations (ρ1 = 0.45) in the residuals. From the plots of the residuals we see that these autocorrelations are indeed mainly the result of larger shocks. After such shocks it takes the updating mechanism some periods to change the factors such that they give a correct representation of the yield curve. Furthermore, correlations are still high in the estimated errors.
6.1.5 Variable Lambda
Fitting the Nelson-Siegel model to the cross-sections using NLS indicates that the optimal λ varies considerably over time. We have used a constrained optimization which limits lambda to values between .01 and .5. In figure 6.1 we see the estimated values of lambda plotted against the optimal lambda from the NLS. It can be seen that the GAS estimate λt roughly tracks the NLS estimate. The maximum likelihood estimation of this nonlinear model specification was considerably more difficult. The MATLAB routine has more difficulties with maximization because of these nonlinearities, but also because the Fisher information in this specification depends on the time-varying parameters βt. Because of this dynamic dependence, the information matrix is often (numerically) ill-conditioned. Therefore we use the information smoothing scheme of section 5.2.3. This reduces problems with the information matrix, but the extra smoothing parameter makes the system even more difficult to estimate. We therefore initially fix the smoothing parameter at α = 0.2. We also try to overcome these singularity issues by adding a small multiple of the identity matrix to the information matrix.
Figure 6.1: Plot of estimated λt vs. optimal λ from NLS on the cross-section. (Horizontal axis: Year, 1970 to 2010; vertical axis: Lambda, 0 to 0.5; series: NLS and Lambda GAS.)
Because of the dynamic lambda this specification has some estimation issues. Pooter (2007) recognizes similar issues in a four-factor Svensson specification, though estimated using the Kalman filter. He reports numerical problems and estimation issues for the nonlinear model structure, which has two lambdas that need to be estimated. Gilli et al. (2010) even argue that in certain ranges of the parameters the simple three-factor Nelson-Siegel model is badly conditioned, so that estimated parameters are unstable. For many values of λ the factor loadings are correlated; in particular, the correlation between the second and third loading is high. Therefore it is difficult to attribute the yield curve shapes to specific factors. In their paper about the calibration of the NS model they show the correlations between the factor loadings and conclude that λ should be in a range from .1 to 4 for maturities up to 10 years (from .021 to .83 for τ in months in our case). They also say that correlated regressors are not necessarily a problem in forecasting. It will be an issue, though, especially in our diagonal model specification. We do not want these regressors to be correlated, as this will affect the factor estimates and result in extreme values and nonlinearities in these estimates. From our results we see that λt indeed roughly stays within this range. But some estimation difficulties can be explained by the optimization algorithm entering the problematic lambda regions, resulting in even more nonlinear and unpredictable behaviour.
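The dependence of the loading correlation on λ can be made concrete with a small Python sketch. It assumes the standard Nelson-Siegel slope and curvature loadings and a hypothetical grid of 17 maturities in months; the maturities of the data set used in this thesis may differ, and the function name is illustrative.

```python
import math


def loading_correlation(lam, maturities):
    """Pearson correlation between the slope and curvature loadings."""
    slope, curv = [], []
    for tau in maturities:
        decay = math.exp(-lam * tau)
        s = (1.0 - decay) / (lam * tau)
        slope.append(s)
        curv.append(s - decay)
    n = len(maturities)
    mean_s, mean_c = sum(slope) / n, sum(curv) / n
    cov = sum((s - mean_s) * (c - mean_c) for s, c in zip(slope, curv))
    var_s = sum((s - mean_s) ** 2 for s in slope)
    var_c = sum((c - mean_c) ** 2 for c in curv)
    return cov / math.sqrt(var_s * var_c)


if __name__ == "__main__":
    # Hypothetical maturity grid in months; the thesis data may differ.
    maturities = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]
    for lam in (0.005, 0.01, 0.062, 0.2, 0.5):
        corr = loading_correlation(lam, maturities)
        print(f"lambda = {lam:5.3f}  corr = {corr:+.3f}")
```

For very small λ the two loadings are almost perfectly negatively correlated over such a grid, and for very large λ almost perfectly positively correlated, so the second and third factor are only well separated in an intermediate range.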
6.1.6 Common Disturbance with Time-Varying Volatility
Estimation points out the sensitivity of this specification. With the use of the variance matrix Σε(ht) = Σε + htΓΓ′ our model is extremely sensitive to shocks, and extremes follow from these shocks. Due to these extremes the optimization routine experiences estimation difficulties. The determinant of ΓΓ′ is 0. This implies that when the elements of the diagonal matrix Σε go to zero, the variance matrix Σε(ht) becomes singular or numerically close to singular. Also, when ht goes to infinity the effect of the very well conditioned diagonal matrix Σε diminishes and again Σε(ht) becomes close to singular. Because Σε(ht) needs to be inverted, such ill-conditioning is problematic. We thus need to ensure that Σε stays well-conditioned, i.e. all the diagonal elements must stay positive. This is not necessarily a problem, because we can simply bound these from below at a particular value. But the common volatility must also be bounded from extremes, even though it is theoretically unbounded. We therefore use a (generalized) logistic transformation from the factor ft to ht:

$$h_t = lb + \frac{ub - lb}{1 + \exp(-f_t)}, \tag{6.1}$$

where lb and ub are respectively the lower bound and upper bound of the volatility.
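A minimal Python sketch of the transformation (6.1) and its inverse is given below. The function names and the bounds used in the example are hypothetical; the branch on the sign of the factor is only there to avoid numerical overflow in the exponential for large negative factor values.

```python
import math


def bounded_volatility(f, lb, ub):
    """Generalized logistic map from an unbounded factor f to (lb, ub)."""
    if f >= 0:
        frac = 1.0 / (1.0 + math.exp(-f))
    else:
        e = math.exp(f)  # safe: f is negative, so no overflow
        frac = e / (1.0 + e)
    return lb + (ub - lb) * frac


def bounded_volatility_inverse(h, lb, ub):
    """Inverse map, e.g. to initialize f from a starting volatility."""
    if not lb < h < ub:
        raise ValueError("h must lie strictly between lb and ub")
    return math.log((h - lb) / (ub - h))


if __name__ == "__main__":
    lb, ub = 0.01, 0.35  # hypothetical bounds
    for f in (-50.0, -1.0, 0.0, 1.0, 50.0):
        print(f, bounded_volatility(f, lb, ub))
```

Whatever value the updating equation produces for the factor, the implied volatility stays between the two bounds.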
This specification for Σε(ht) is clearly derived in a parameter driven approach. It is assumed that there exist two sources of error: one common disturbance and one disturbance for each of the individual maturity yields. Together these two errors form the observed error. The problem with this combined error is that our observation driven framework cannot distinguish the two errors; our likelihood only depends on the combined variance matrix and the combined disturbances. Therefore the optimization algorithm reduces the constant variance matrix (Σε) in the direction in which it increases the likelihood the most. Specifically, the likelihood is affected by the variance at time t through two components:
1. Minus the log determinant of the combined variance matrix: − log (|Σε (ht) |). This
term goes to infinity for an ill-conditioned matrix, that is |Σε (ht) |≈ 0 leads to
− log|Σε (ht) |≈ ∞.
2. The second term penalizes the likelihood using the estimated combined disturbance
(εt) and inverted combined variance matrix (Σε (ht))−1. This means that it cannot
have a determinant equal to zero. With high values of ht this matrix becomes
numerically close to singular. So with an unbounded common variance it is difficult
to guard against a situation with such an ill-conditioned covariance matrix.
In practice the proposed transformation does not seem to work. The estimated volatili-
ties in figure 6.2a show some oscillating characteristics. Also the loadings (figure 6.2b),
that pass on the effect of the common disturbance to the different yields, first increase
for maturities till 20 months and then decrease. Theory and the data suggest that
volatilities should in general decrease with maturity. Despite the appealing likelihood
values it is safe to conclude that this specification is not working well.
Looking back, we could have stayed within a true observation driven framework for modeling the errors. We would then have to assume a different parametrization for the errors, for example the parametrization proposed by Creal et al. (2011b):

$$\Sigma_t = D_t R_t D_t, \tag{6.2}$$

where Dt is a diagonal standard deviation matrix and Rt is the (symmetric) correlation matrix. Both or just one could be time-varying. Using this specification there is a clear distinction between the correlation and the volatility component. We can thus circumvent the problems with the common disturbance specification, which basically assumes two sources of error without the possibility to distinguish them. The common disturbance in our specification leads to a correlation in the yield disturbances. Through the proposed variance decomposition we capture this correlation in the correlation component Rt. Unfortunately, experimentation with this multivariate heavy-tailed model (based on the multivariate student-t distribution) on the pre-filtered residuals was not successful. The dimensions of the data used again turn out to be a big challenge. In order to estimate the ever increasing number of time-varying parameters, matrices of 289×289 need to be determined and inverted. This is computationally intensive, but above all sensitive to computational errors: even with positive definite matrix decompositions such as the Cholesky and LDL decompositions, problems occur with the matrix inversions. Also, estimation problems may have been aggravated by the absence of very clear volatility dynamics in the residuals. As with the yields, these volatilities are highly persistent. The residuals used have reasonably low autocorrelations (ρ1 = 0.2), which possibly makes it even harder to identify the dynamics.
Figure 6.2: Common Disturbance with Time-Varying Volatility. (a) Common Disturbance Volatility: ht against Year, 1970 to 2010. (b) Loadings Common Disturbance: loading against Maturity (in Months), 0 to 120.
6.1.7 Common Time-Varying Volatility
From the results shown in table 6.2 the common volatility specification looks quite successful. The likelihood values increase hugely and the volatility and loading estimates are sensible. Periods with higher volatilities are, for example, the 1980s and periods of crisis (figure 6.3a). From the results we see that, as expected, the volatilities of the disturbances of the different maturities have different intensities: shorter maturity yields are more volatile than longer maturity yields (figure 6.3b). Further, we see from the estimated errors that shocks to the different maturities appear at the same time and in the same direction. This confirms that it is indeed reasonable to assume a common volatility driver.
Figure 6.3: Common Time-Varying Volatility. (a) Volatility: ht against Year, 1970 to 2010. (b) Loadings Volatility: loading against Maturity (in Months), 0 to 120.
6.1.8 Bjork and Christensen Four-Factor Model
Finally, we estimate a four-factor extension of the DNS, a so-called Svensson extension.
To avoid additional estimation problems reported by Pooter (2007) we use the adjusted
version in which we have only one λ decay parameter (Bjork and Christensen, 1999).
Also, the model is only estimated in diagonal specification. The results indicate that
this specification further increases the likelihood value and decreases the RMSE.
6.1.9 Estimation Robustness
Due to the large number of parameters in the models used in this thesis, the optimization problems are high-dimensional. Further, because of the high persistence of the yields the likelihood surface is very flat, and the likelihood functions can have multiple local maxima. As a result, algorithms can encounter difficulties finding the global maximum. Sensitivity to initial values can be large for some models, especially if certain dynamics are not strongly present. The nonlinear extension with variable lambda is difficult to estimate, but so are the simple models with full coefficient matrices. The optimization should be repeated with different initial values until a global optimum is found. These concerns are worrying for the empirical use of the models. The popularity of the Nelson-Siegel models partly originates in their ease of estimation. Without this convenience they partly lose their advantage over theoretically more sound arbitrage-free models.
6.1.10 In-Sample Conclusion
From the in-sample results we conclude that the more flexible model with variable lambda fits the data slightly better than the standard model. A model with an additional fourth factor (a Svensson extension) fits the data even better and is considerably easier to estimate than the nonlinear variable λt model. Moreover, estimation using the GAS framework results in a better in-sample fit compared to the Kalman filter estimated models. Also, the specification with a common disturbance is not suited for GAS estimation. We would like to be careful with conclusions for the other model specifications: the model with student-t distributed disturbances seems an extension in the right direction, but in practice has its drawback in the updating of the factors and therefore does not fit the data well. The extension with time-varying volatility with Gaussian disturbances hugely increases the log-likelihood value, but the estimated errors are not decisive on increased fit. Nevertheless, the estimated volatilities could give insight into the confidence intervals surrounding the predictions.
6.2 Out-of-Sample Performance
In the following section the forecasting performance of the models is compared. We make comparisons based on RMSFE and Diebold-Mariano tests. Creal et al. (2008) and Koopman et al. (2010) do not make comparisons based on the out-of-sample performance of their model extensions. However, forecasting is an important practical problem for which term-structure models are used, besides fitting the current curve or describing the dynamics of the past. Therefore, it is useful to evaluate the forecasting capabilities.
From the previous section we saw that GAS estimated models lead to a better in-sample
fit than comparable Kalman estimated models based on its residuals (RMSE). Moreover,
model extensions further increased the in-sample fit. Although the GAS estimation
method and extensions lead to a better fit, it is possible that we over-fit our data. This
over-fitting may lead to worse forecasting capabilities. Therefore we evaluate the models
based on out-of-sample forecasting capabilities. Our newly proposed extensions of the
GAS estimated models are compared, as well as the extensions by Creal et al. (2008)
and Koopman et al. (2010). Also, the predictive capacities of the Kalman estimated
models are compared with the GAS estimated models.
6.2.1 Forecast Procedure
Next the forecasting procedures are introduced. The model chosen to beat is the Random
Walk (RW). In finance the random walk is known to be very difficult to beat; for the
term structure setting see, for example, Ang and Piazzesi (2003), Diebold and Li (2006),
Duffee (2002) and Monch (2008). We therefore take this as the benchmark for the assessment
of the forecasts generated with the estimated models. The random walk is given by:
$$y_t(\tau) = y_{t-1}(\tau) + \varepsilon_t, \quad \text{with } \varepsilon_t \sim N(0, \sigma^2(\tau)), \tag{6.3}$$
for τ = 1, . . . , m. This leads to the h-months-ahead forecast given by:
$$\hat{y}_{t+h}(\tau) = y_t(\tau). \tag{6.4}$$
The forecasting procedure for the GAS and Kalman estimated models is as follows. First
the models are estimated for some estimation period. From this estimation period we
obtain estimated model coefficients. We then use these coefficients to obtain filtered
factors for our forecasting period. Using these filtered factors h-step ahead forecasts of
the yields are calculated at every point in time. Forecasts are generated for horizons of
1, 6 and 12 months, that is, for the short, medium and long term.
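For our GAS estimated models we have the following prediction of the factors. At time t
the updating equation (2.20) already gives the factor prediction $f_{t+1}$, which is the
prediction for h = 1. For h > 1, iterating the recursion with the expected future score
equal to zero gives:
$$f_{t+h} = \left(\sum_{i=1}^{h-1} B^{i-1}\right)\omega + B^{h-1} f_{t+1}, \tag{6.5}$$
where the sum is empty for h = 1, so that the expression reduces to $f_{t+1}$.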
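The recursion behind the multi-step factor forecast can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the thesis: it assumes that the expected future score is zero, so that each further step applies $f \leftarrow \omega + B f$ starting from the one-step prediction $f_{t+1}$. The function name `gas_factor_forecast` and the numerical values are hypothetical and only loosely in the range of the diagonal estimates.

```python
def gas_factor_forecast(omega, B, f_next, h):
    """Iterate f <- omega + B f, starting from the one-step prediction f_{t+1}.

    omega  : list of n intercepts
    B      : n x n autoregressive coefficient matrix (list of lists)
    f_next : list of n values, the one-step prediction f_{t+1}
    h      : forecast horizon in periods (h >= 1)
    """
    if h < 1:
        raise ValueError("h must be at least 1")
    f = list(f_next)
    n = len(f)
    # For h = 1 no iteration is needed: f_{t+1} is already the forecast.
    for _ in range(h - 1):
        f = [omega[i] + sum(B[i][j] * f[j] for j in range(n)) for i in range(n)]
    return f


# Illustrative (hypothetical) values for level, slope and curvature.
omega = [0.021, -0.027, -0.050]
B = [[0.997, 0.0, 0.0],
     [0.0, 0.991, 0.0],
     [0.0, 0.0, 0.866]]
f_next = [6.0, -1.5, -0.5]

f_12 = gas_factor_forecast(omega, B, f_next, 12)  # 12-months-ahead factors
```

For a stable diagonal B, long-horizon forecasts of each factor converge to ω/(1 − B), which is why very persistent factors produce forecasts that stay close to the last filtered value.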
For the Kalman filter we have similar predictions. The forecast of the Kalman state
equation is given by:
$$\beta_{t+h} = \left(I + \sum_{i=1}^{h-1} A^{i}\right)\mu + A^{h}\beta_{t|t}. \tag{6.6}$$
After we have calculated the factor forecasts we obtain the yield forecast using:
$$\hat{y}_{t+h} = X_{t+h}\beta_{t+h}. \tag{6.7}$$
For our linear specifications the loading matrix is time-invariant: $X_{t+h} = X(\lambda)$.
For the variable lambda specification we first need to calculate the corresponding
$\lambda_{t+h}$ by use of equation (5.31), and then use this value to calculate the
corresponding $X_{t+h}$ using equation (5.31). The model specification with time-varying
volatility has similar yield prediction steps: the expectation of the disturbance is zero,
so the forecasts only depend on the first three Nelson-Siegel factors.
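As a sketch of the last step, the yield forecast is the product of the loading matrix and the forecast factors. The example below assumes the standard three-factor Nelson-Siegel loadings (level, slope, curvature) with maturities measured in months; the function names `ns_loadings` and `yield_forecast` and the input values are hypothetical.

```python
import math


def ns_loadings(lam, maturities):
    """Return one row [level, slope, curvature] of loadings per maturity."""
    rows = []
    for tau in maturities:
        x = lam * tau
        slope = (1.0 - math.exp(-x)) / x      # slope loading
        curvature = slope - math.exp(-x)      # curvature loading
        rows.append([1.0, slope, curvature])  # level loading is always 1
    return rows


def yield_forecast(lam, maturities, beta):
    """Yield forecast X(lambda) * beta for given factor forecasts beta."""
    X = ns_loadings(lam, maturities)
    return [sum(x_k * b_k for x_k, b_k in zip(row, beta)) for row in X]


# Illustrative values only: lambda near the estimated range, three maturities.
forecast = yield_forecast(0.058, [3, 12, 120], [6.0, -1.5, -0.5])
```

For the variable lambda specification the same functions would be called with the forecast value of λ instead of a constant.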
6.2.2 Forecast Measures
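Forecasts are compared based on the following measures. First, the forecast error of
model i is defined by:
$$\varepsilon^{i}_{t+h}(\tau) = \hat{y}^{i}_{t+h}(\tau) - y_{t+h}(\tau). \tag{6.8}$$
The RMSFE for each maturity is then given by:
$$\mathrm{RMSFE}(\tau) = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\varepsilon_{t+h}(\tau)\right)^{2}}. \tag{6.9}$$
Further we use an average of the forecast errors over all N = 17 maturities, namely the
trace root mean squared forecast error given by:
$$\mathrm{tRMSFE} = \sqrt{\frac{1}{T}\frac{1}{N}\sum_{t=1}^{T}\sum_{j=1}^{N}\left(\varepsilon_{t+h}(\tau_j)\right)^{2}}. \tag{6.10}$$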
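The two accuracy measures, together with the no-change forecast of the random walk, can be written down directly. The sketch below is only an illustration with hypothetical function names; forecast errors are taken as forecast minus realisation, as in (6.8).

```python
import math


def rmsfe(errors):
    """Root mean squared forecast error for one maturity."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def trace_rmsfe(errors_by_maturity):
    """Trace RMSFE: pools squared errors over N maturities and T forecasts.

    errors_by_maturity is a list of N lists, each holding T forecast errors.
    """
    n = len(errors_by_maturity)
    t = len(errors_by_maturity[0])
    total = sum(e * e for errs in errors_by_maturity for e in errs)
    return math.sqrt(total / (n * t))


def random_walk_errors(yields, h):
    """Errors of the no-change forecast: y_t (forecast) minus y_{t+h} (realised)."""
    return [yields[t] - yields[t + h] for t in range(len(yields) - h)]


# A relative RMSFE below one means the model beats the random walk.
rw = rmsfe(random_walk_errors([4.0, 4.2, 4.1, 4.5, 4.4], 1))
```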
Besides these somewhat subjective measures, the Diebold and Mariano (2002) test is used.
The Diebold-Mariano statistic tests the hypothesis that the forecasting performance of two
models is the same. The loss function of the test is chosen to be the squared error, so
the test complements the analysis based on the RMSFE. Alternatively, the absolute error
$|\varepsilon^{i}_{t+h}|$ could be used as loss function. Under the null the performance
of the two models is equal:
$$H_0: E[(\varepsilon^{1}_{t+h})^{2}] = E[(\varepsilon^{2}_{t+h})^{2}], \tag{6.11}$$
$$H_1: E[(\varepsilon^{1}_{t+h})^{2}] \neq E[(\varepsilon^{2}_{t+h})^{2}]. \tag{6.12}$$
The test statistic is then given by:
$$S = \frac{\bar{d}}{\left(\frac{1}{T} V(\bar{d})\right)^{1/2}} \sim N(0, 1), \tag{6.13}$$
where the loss differential is given by:
$$d_t = (\varepsilon^{1}_{t+h})^{2} - (\varepsilon^{2}_{t+h})^{2}, \tag{6.14}$$
and the sample mean and sample variance are correspondingly given by:
$$\bar{d} = \frac{1}{T}\sum_{t=1}^{T} d_t, \tag{6.15}$$
$$V(\bar{d}) = \gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j, \quad \gamma_j = \widehat{\mathrm{cov}}(d_t, d_{t-j}), \tag{6.16}$$
where $V(\bar{d})$ gives a consistent estimate of the asymptotic (long-run) variance. The
long-run variance is used because the loss differentials are serially correlated for h > 1.
Under the null the test statistic is asymptotically standard normally distributed. For
the standard two-sided test we reject the null if $|S| > \Phi^{-1}(1 - \alpha/2)$, where Φ is the
cumulative standard normal distribution function and α the significance level.
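A minimal sketch of the test under squared-error loss is given below. Two choices in it are assumptions rather than details taken from the text: the infinite sum in the long-run variance is truncated at h − 1 autocovariances, a common choice for h-step-ahead forecasts, and the first error series is taken to be that of the random walk, so that a positive statistic indicates that the second model has the smaller loss. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def autocovariance(d, lag):
    """Sample autocovariance of the series d at the given lag."""
    n = len(d)
    mean = sum(d) / n
    return sum((d[t] - mean) * (d[t - lag] - mean) for t in range(lag, n)) / n


def diebold_mariano(errors_1, errors_2, h):
    """Return (statistic, two-sided p-value) under squared-error loss."""
    d = [e1 * e1 - e2 * e2 for e1, e2 in zip(errors_1, errors_2)]
    n = len(d)
    d_bar = sum(d) / n
    # Assumption: truncate the long-run variance at h - 1 autocovariances.
    long_run_var = autocovariance(d, 0) + 2.0 * sum(
        autocovariance(d, j) for j in range(1, h)
    )
    if long_run_var <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = 2.0 * (1.0 - normal_cdf(abs(stat)))
    return stat, p_value
```

With only 60 forecast observations and overlapping 6- and 12-month horizons, the truncated variance estimate can be noisy or even negative, which is why the sketch raises an error in that case instead of returning a statistic.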
We evaluate forecasting capabilities by dividing the data-set into an initial estimation
period and a forecasting period. We choose the end of our data-set as forecasting sample.
We estimate and calibrate our model with the data from January 1970 till December
2004 (420 observations). We then take the last part of our data-set to create forecasts.
Forecasts are generated over a period of 5 years: from January 2005 till December 2009
(60 observations). This period contains the US housing bubble and the start of the global
financial crisis, and is therefore far from stable. The yield curve goes from a
steady upward sloping curve to a flat and even inverted yield curve with high interest
rates. Then interest rates rapidly fall and the curve moves back to an upward sloping
shape. Because of these real-world dynamics this period could really illustrate the
capabilities of the DNS. We choose not to estimate the specifications with a common
disturbance and with student-t distributed errors. From our in-sample results these
specifications are found not to give a credible representation of the yield curve.
6.2.3 Forecast Results
In tables (A.10)-(A.12) we present the (t)RMSFE. Because our reference model is the
RW, the RMSFE of each model is reported relative to that of the RW, so values below
one indicate that the model outperforms the RW. Further, we present in tables
(A.13)-(A.15) the Diebold-Mariano statistics for all maturities and the three forecasting
horizons. Positive (negative) values indicate that the model outperforms (is outperformed
by) the RW.
From the results we see that forecasting errors are in general an increasing function of
the forecasting horizon, which conforms to general expectations. Also, yields of longer
maturities have smaller forecasting errors than yields of shorter maturities, which is
plausible given their lower estimated variance.
For the forecast results of the chosen sub-sample we can be brief: the random walk
beats the proposed models at nearly all maturities and horizons, for some maturities even
significantly. Only in a few instances at the shortest forecasting horizon of one month
are the DNS models able to outperform the naive RW. This is possibly the result of the
chosen forecasting sub-sample. The chosen period appears to be too volatile to forecast
with the considered models (and possibly with any model).
Further, it could be argued that a different estimation sub-sample would give better
forecasts. We chose to make use of the maximum sample period (1970-2004), instead
of some shorter, possibly arbitrary period. Nevertheless, the persistent dynamics which
are captured in the high values of the coefficient matrices will not change that much by
considering a different estimation sub-sample.
The results of the forecasting equation might indicate an important property of the GAS
framework: All estimated models have comparable coefficient estimates for the lagged
values of the factors, both for the Kalman and GAS estimated versions (see in-sample re-
sults in Appendix A), especially for the diagonal coefficient specifications. What makes
the models different is the correction for mis-specifications. Both techniques use a similar
one-step ahead prediction framework: for each period the methods assess the deviations
from expectations and adapt accordingly. From the generated forecasts, we see that the
GAS estimated models have slightly lower forecast errors than the Kalman estimated
versions, especially for the shortest forecasting horizon of one month. The short
horizon benefits the most from the correction (see forecast equation 6.5). This correction
might be beneficial at the short forecasting horizon, but could possibly also lead to
over-fitting and degrade forecasting capabilities in the long run. This possibility is
indicated by the estimates of the coefficients in the full Gaussian specification, which is
estimated using the GAS framework (g-VAR). The estimates have a different economic meaning,
with a clear downward trend in the curvature, compared to the estimates of the Kalman
estimated version (k-VAR).
6.2.4 Out-of-Sample Conclusion
Overall, none of the models is able to significantly outperform the benchmark RW model.
On the contrary, for our chosen sub-sample the DNS is easily beaten by the naive
RW. Both techniques use a similar one-step-ahead prediction framework. Despite this,
both techniques, in all specifications, are unable to correct for changes in the yields and
forecast them correctly. It should be noted that these conclusions are based on the chosen
sub-sample. In the long run or for some stable sub-sample the RW could possibly be
beaten, but from our chosen sub-sample we cannot conclude this.
Chapter 7
Conclusion
In this thesis we compare the in-sample and out-of-sample performance of a range of
different dynamic Nelson-Siegel specifications, estimated using the GAS framework. We
estimate some existing specifications and propose some new specifications. Furthermore,
we compare performance of the GAS estimated and Kalman estimated models.
Our first research question concerns the assumption of normality. From our analyses it
follows that the assumption of normality is not correct. Therefore, we propose a DNS
model specification with multivariate student-t distributed disturbances. Based on the
model log-likelihood values, this model extension gives a significant improvement in fit.
But, based on the residuals we come to the conclusion that this specification does not
result in a better model fit. This is a result of the slow reaction to large shocks, which
can be attributed to the workings of the GAS updating mechanism. As a result of the
fat-tailed nature of the student-t distribution, the updating mechanism reacts less
fiercely to large changes in the yield. Because these yield changes are mostly structural
we do not find this a meaningful option for modeling and
forecasting.
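The dampened reaction can be illustrated with a simplified univariate example. The sketch below is an assumption-laden illustration and not the multivariate specification of this thesis: it compares the unscaled score with respect to the location parameter for a Gaussian density and for a student-t density with scale parameter σ² and ν degrees of freedom. The function names and the values of ν and σ² are hypothetical.

```python
def gaussian_score(error, sigma2):
    """Score of a Gaussian density w.r.t. its mean: linear in the error."""
    return error / sigma2


def student_t_score(error, sigma2, nu):
    """Score of a student-t density w.r.t. its location: bounded in the error."""
    return (nu + 1.0) * error / (nu * sigma2 + error * error)


# A prediction error ten times larger moves the Gaussian score ten times as far,
# whereas the student-t score shrinks once the error is large.
small_t = student_t_score(1.0, 1.0, 4.0)
large_t = student_t_score(10.0, 1.0, 4.0)
large_gauss = gaussian_score(10.0, 1.0)
```

Because the factor update is driven by the score, a large but persistent shift in yields is treated by the student-t update as an outlier and is only incorporated gradually.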
The second question we answer is whether heteroskedasticity can be included in the model.
We modify a time-varying volatility specification proposed by Koopman et al. (2010) in
order to estimate it within the GAS framework. This specification turns out not to be
suited for estimation using the GAS framework. The inability to correctly estimate this
specification is the result of the assumption of two disturbance sources. In the chosen
setup the GAS framework cannot distinguish these disturbance sources, and therefore this
specification results in optimization problems. Furthermore, we propose a new time-varying
volatility specification. This specification works well within the GAS framework and
results in a better in-sample fit compared to a standard fixed variance DNS model. The
estimated volatilities of this specification could give insight into the confidence
intervals surrounding predictions.
Additionally, other model extensions are investigated taking advantage of the possibili-
ties of the GAS framework. A nonlinear model with a variable lambda is estimated as
proposed by Creal et al. (2008). Also, a four-factor Bjork-Christensen specification is
estimated. From in-sample performance we conclude that these more elaborate models
with nonlinearities or an additional fourth factor fit the data better than the standard
GAS estimated DNS. However, the estimation difficulties that result from the nonlinearities
in the variable lambda model are a concern in empirical use. Moreover, the fit
of the four-factor model is better than that of the nonlinear extension.
The final research question regards how the performance of the GAS estimated models
compares to the Kalman filter estimated models. In-sample results indicate that GAS
estimated models give a better in-sample fit to the data, compared to similar Kalman
filter estimated models. In out-of-sample forecasting, the random walk again is difficult
to beat. For the chosen sub-sample the DNS is outperformed by the naive random walk,
both in the GAS estimated specifications and in the Kalman estimated specifications.
Overall, the GAS updating mechanism incorporates shocks very well into the factors,
resulting in a better in-sample fit compared to similar Kalman filter estimated mod-
els. Unfortunately, we cannot conclude that this better in-sample fitting translates to
out-of-sample out-performance of the Kalman estimated models. Furthermore, the GAS
framework gives an increased flexibility for model extensions.
Chapter 8
Further Research
The used GAS framework is very flexible and research on this framework looks very
promising. Through the use of this framework the DNS could be extended in various
directions. One extension direction that would be interesting is the addition of
macro-variables. A lot of recent research claims to gain predictive accuracy through the
use of variables such as inflation and real economic activity (Diebold et al., 2005, 2006;
Rudebusch and Wu, 2008).
Secondly, because of the use of a long data-set it could also be interesting to estimate
a model with regime-switching properties using the GAS framework. A lot
of researchers find evidence for regime-switching properties and claim to gain predictive
power through the use of them (Bernadell et al., 2005; Xiang and Zhu, 2013).
In this thesis, forecasts are only evaluated based on the (t)RMSFE and the DM test. It would
be interesting to use the forecasts in some trading strategy such as that of Fabozzi et al.
(2005, 2007), who use the slope and curvature predictions in a trading strategy and evaluate
the returns of these strategies. Also, the time-varying volatility could be used to provide
confidence intervals surrounding the predictions. Moreover, it could possibly be
used in some term-structure option trading strategy.
Furthermore, this thesis has focused only on the NS curve for fitting and forecasting the
yield curve. Recent research suggests new estimation methods that are supposed to be
easy to apply and to reach the global maximum consistently. They realize this by
concentrating out variables and thereby reducing the optimization space of the model. It
would be interesting to see if these methods can be used in combination with the GAS
framework (Hamilton and Wu, 2012).
Appendix A
Tables
Table A.1: RMSE in the period from January 1970 till December 2009. Represented as values relative to the GAS Gaussian AR model
Maturity  g-AR      g-VAR     g-λ-AR    g-TVV-AR  g-CD-AR   g-Sven    t-AR      t-VAR     k-AR      k-VAR
3         0.575     0.977     0.974     0.989     1.017     0.977     1.269     1.192     1.102     1.109
6         0.533     0.992     0.994     0.997     1.072     0.993     1.257     1.171     1.062     1.064
9         0.521     0.999     0.999     1.002     1.112     0.998     1.292     1.156     1.053     1.054
12        0.514     1.000     0.993     1.003     1.126     0.998     1.317     1.139     1.053     1.053
15        0.507     0.999     0.986     1.004     1.118     0.998     1.293     1.087     1.044     1.044
18        0.489     1.000     0.985     1.004     1.124     0.997     1.340     1.085     1.045     1.045
21        0.480     1.000     0.987     1.007     1.127     0.996     1.382     1.096     1.047     1.046
24        0.473     1.000     0.984     1.005     1.117     0.995     1.420     1.103     1.048     1.049
30        0.456     1.001     0.978     1.006     1.095     0.994     1.422     1.084     1.040     1.042
36        0.444     0.999     0.975     1.010     1.081     0.992     1.418     1.069     1.034     1.036
48        0.424     0.998     0.972     1.013     1.069     0.983     1.387     1.072     1.032     1.032
60        0.405     1.000     0.983     1.007     1.055     0.974     1.342     1.066     1.024     1.025
72        0.398     0.999     0.983     1.003     1.040     0.955     1.267     1.077     1.013     1.011
84        0.381     0.996     0.999     1.004     1.068     0.969     1.291     1.106     1.011     1.009
96        0.362     0.995     0.996     1.007     1.087     0.985     1.242     1.095     1.013     1.012
108       0.350     0.991     0.997     1.009     1.136     0.995     1.246     1.097     1.036     1.030
120       0.360     0.990     1.009     1.000     1.163     0.982     1.315     1.086     1.064     1.053
TOTAL     0.456     0.996     0.987     1.003     1.093     0.989     1.325     1.113     1.047     1.048
Table A.2: Kalman DNS with diagonal coefficient matrix (k-AR)
µ          A                                λ
7.109***   0.989***   -          -          0.054***
(0.0056)   (0.0060)                         (0.0009)
-1.699***  -          0.960***   -
(0.0016)              (0.0062)
-0.419***  -          -          0.918***
(0.0091)                         (0.0090)
Table A.3: Kalman DNS with full coefficient matrix (k-VAR)
µ          A                                λ
6.919***   0.990***   0.020***   -0.008***  0.055***
(0.0221)   (0.0015)   (0.0026)   (0.0023)   (0.0010)
-1.770***  -0.013     0.954***   0.036***
(0.0021)   (0.0052)   (0.0072)   (0.0070)
-0.584***  0.039***   -0.003**   0.903***
(0.0091)   (0.0012)   (0.0019)   (0.0016)
Table A.4: GAS Gaussian with diagonal coefficient matrices (g-AR)
ω          A                                B                                λ
0.021***   1.163***   -          -          0.997***   -          -          0.058***
(0.0017)   (0.0009)                         (0.0002)                         (0.0002)
-0.027***  -          1.168***   -          -          0.991***   -
(0.00130)             (0.0009)                         (0.0004)
-0.050***  -          -          1.224***   -          -          0.866***
(0.0052)                         (0.0010)                         (0.0008)
Table A.5: GAS Gaussian with full coefficient matrices (g-VAR)
ω          A                                B                                λ
0.086***   0.934***   0.031***   0.021***   0.994***   0.028***   -0.011***  0.066***
(0.0006)   (0.0015)   (0.0013)   (0.0008)   (0.0000)   (0.0004)   (0.0004)   (0.0002)
0.033***   0.224***   1.097***   0.070***   -0.020***  0.926***   0.049***
(0.0042)   (0.0032)   (0.0019)   (0.0009)   (0.0005)   (0.0006)   (0.0007)
-0.562***  0.583***   0.024***   1.020***   0.072***   0.044***   0.788***
(0.0108)   (0.0080)   (0.0054)   (0.0031)   (0.0012)   (0.0025)   (0.0023)
Table A.6: GAS Gaussian with variable λt diagonal coefficient matrices (g-λ-AR)
ω          A                                B                                φ0         Φ
0.041***   1.163***   -          -          0.995***   -          -          0.033***   0.008***
(0.0015)   (0.0002)                         (0.0012)                         (0.0005)   (0.0004)
0.016***   -          1.113***   -          -          0.999***   -                     0.011***
(0.0010)              (0.0040)                         (0.0008)                         (0.0004)
-0.015***  -          -          1.175***   -          -          0.741***              -0.001***
(0.0073)                         (0.0049)                         (0.0000)              (0.0006)
Table A.7: GAS Gaussian with Time-Varying Volatility and diagonal coefficient matrices (g-TVV-AR)
ω          A                                           B                                           λ
0.105***   1.131***   -          -          -          0.983***   -          -          -          0.050***
(0.0054)   (0.0486)                                    (0.0067)                                    (0.0102)
0.021***   -          1.163***   -          -          -          0.988***   -          -
(0.008)               (0.0038)                                    (0.0101)
0.002***   -          -          1.172***   -          -          -          0.962***   -
(0.0094)                         (0.0015)                                    -0.0201
-0.153***  -          -          -          0.212***   -          -          -          0.974***
(0.0006)                                    (0.0059)                                    (0.0057)
Table A.8: GAS Four-Factor Bjork-Christensen with diagonal coefficient matrices (g-BC-AR)
ω          A                                           B                                           λ
-0.080***  1.159***   -          -          -          0.998***   -          -          -          0.025***
(0.0053)   (0.0043)                                    (0.0062)                                    (0.0001)
-0.528***  -          1.157***   -          -          -          0.973***   -          -
(0.0021)              (0.0018)                                    (0.0022)
0.555***   -          -          1.153***   -          -          -          0.997***   -
(0.0025)                         (0.0058)                                    (0.0011)
0.585***   -          -          -          1.153***   -          -          -          0.979***
(0.0026)                                    (0.0020)                                    (0.0019)
Table A.9: GAS Student-t with diagonal coefficient estimates (t-AR)
ω          A                                B                                λ          v
0.045***   1.334***   -          -          0.996***   -          -          0.062      4.176***
(0.0002)   (0.0014)                         (0.0000)                         -          (0.0012)
-0.019***  -          1.245***   -          -          0.956***   -
(0.0000)              (0.0035)                         (0.0020)
-0.073***  -          -          1.548***   -          -          0.886***
(0.0001)                         (0.0016)                         (0.0010)
Table A.10: (t)RMSFE in the period from January 2005 till December 2009 with a horizon of 1 month, represented as values relative to RW.
Maturity  RW        g-AR      g-VAR     g-λ-AR    g-TVV-AR  g-BC-AR   k-AR      k-VAR
3         0.307     1.416     1.668     1.363     1.429     1.212     1.802     1.712
6         0.282     1.059     1.009     1.115     1.058     1.043     1.318     1.212
9         0.283     0.977     0.973     1.058     0.973     1.001     1.132     1.046
12        0.291     0.939     1.043     1.045     0.936     0.962     1.034     0.970
15        0.286     1.014     1.108     1.092     1.048     1.024     1.116     1.062
18        0.287     1.045     1.125     1.116     1.093     1.034     1.126     1.083
21        0.285     1.075     1.145     1.149     1.125     1.045     1.135     1.105
24        0.295     1.096     1.117     1.150     1.152     1.039     1.133     1.115
30        0.296     1.167     1.127     1.205     1.224     1.047     1.179     1.165
36        0.291     1.200     1.160     1.224     1.246     1.042     1.188     1.185
48        0.304     1.248     1.235     1.240     1.277     1.036     1.213     1.219
60        0.288     1.173     1.189     1.226     1.229     0.987     1.121     1.143
72        0.298     1.234     1.268     1.274     1.300     1.045     1.177     1.206
84        0.277     1.051     1.103     1.206     1.142     1.000     1.010     1.041
96        0.301     1.002     1.041     1.141     1.074     0.990     0.976     0.997
108       0.281     1.048     1.077     1.208     1.123     1.085     1.031     1.048
120       0.296     0.994     1.026     1.100     1.021     1.023     1.000     1.005
Total     0.291     1.113     1.158     1.176     1.155     1.039     1.178     1.153
Table A.11: (t)RMSFE in the period from January 2005 till December 2009 with a horizon of 6 months, represented as values relative to RW.
Maturity  RW        g-AR      g-VAR     g-λ-AR    g-TVV-AR  g-BC-AR   k-AR      k-VAR
3         1.067     1.163     1.162     1.154     1.162     1.248     1.302     1.213
6         1.047     1.137     1.120     1.107     1.106     1.191     1.231     1.145
9         1.037     1.149     1.132     1.102     1.099     1.166     1.208     1.135
12        1.027     1.165     1.151     1.107     1.101     1.151     1.195     1.136
15        1.004     1.227     1.218     1.155     1.152     1.179     1.235     1.187
18        0.983     1.262     1.259     1.180     1.179     1.185     1.251     1.214
21        0.956     1.289     1.293     1.199     1.199     1.189     1.262     1.235
24        0.945     1.306     1.320     1.213     1.216     1.186     1.268     1.252
30        0.929     1.330     1.360     1.227     1.239     1.184     1.278     1.278
36        0.895     1.334     1.378     1.224     1.245     1.175     1.273     1.287
48        0.870     1.295     1.362     1.193     1.225     1.144     1.237     1.271
60        0.781     1.284     1.385     1.193     1.239     1.162     1.229     1.292
72        0.754     1.278     1.396     1.201     1.258     1.193     1.232     1.312
84        0.652     1.237     1.396     1.174     1.242     1.204     1.196     1.305
96        0.664     1.142     1.297     1.097     1.164     1.142     1.112     1.217
108       0.642     1.112     1.274     1.080     1.149     1.131     1.087     1.195
120       0.558     1.118     1.323     1.092     1.166     1.174     1.091     1.228
Total     0.886     1.230     1.266     1.162     1.179     1.180     1.233     1.221
Table A.12: (t)RMSFE in the period from January 2005 till December 2009 with a horizon of 12 months, represented as values relative to RW.
Maturity  RW        g-AR      g-VAR     g-λ-AR    g-TVV-AR  g-BC-AR   k-AR      k-VAR
3         1.936     1.086     1.094     1.121     1.103     1.192     1.141     1.074
6         1.843     1.090     1.090     1.108     1.093     1.178     1.135     1.072
9         1.755     1.122     1.124     1.129     1.119     1.192     1.160     1.103
12        1.693     1.147     1.153     1.144     1.138     1.200     1.176     1.128
15        1.635     1.195     1.208     1.184     1.182     1.233     1.218     1.178
18        1.574     1.230     1.252     1.211     1.215     1.255     1.248     1.218
21        1.508     1.259     1.292     1.234     1.243     1.275     1.274     1.253
24        1.451     1.289     1.336     1.258     1.274     1.299     1.303     1.291
30        1.377     1.324     1.398     1.285     1.313     1.330     1.338     1.344
36        1.281     1.350     1.452     1.302     1.344     1.362     1.368     1.390
48        1.162     1.357     1.510     1.302     1.365     1.405     1.387     1.440
60        1.003     1.378     1.602     1.319     1.412     1.477     1.426     1.518
72        0.942     1.372     1.646     1.315     1.430     1.515     1.435     1.559
84        0.807     1.341     1.694     1.283     1.427     1.540     1.423     1.590
96        0.767     1.291     1.674     1.234     1.393     1.519     1.384     1.571
108       0.719     1.276     1.696     1.218     1.396     1.520     1.381     1.590
120       0.636     1.259     1.761     1.204     1.403     1.512     1.376     1.628
Total     1.363     1.217     1.296     1.201     1.225     1.284     1.253     1.252
Table A.13: Diebold-Mariano Statistics of forecasts in the period from January 2005 till December 2009 with a horizon of 1 month, where positive values indicate outperformance of the RW. The last row indicates no. of outperformances of the RW.
Maturity  g-AR      g-VAR     g-λ-AR    g-TVV-AR  g-BC-AR   k-AR      k-VAR
3         -2.972    -4.414    -2.867    -3.530    -2.141    -3.555    -3.363
6         -1.434    -0.223    -1.752    -1.609    -1.212    -3.106    -2.456
9         0.486     0.439     -0.793    0.703     -0.013    -2.964    -1.484
12        1.310     -0.579    -0.558    1.610     0.765     -1.204    1.744
15        -0.242    -1.227    -1.136    -0.818    -0.425    -2.955    -1.557
18        -0.808    -1.694    -1.660    -1.854    -0.755    -2.939    -2.562
21        -1.196    -2.038    -1.943    -2.225    -0.983    -2.760    -2.582
24        -1.443    -1.860    -1.937    -2.379    -0.845    -2.329    -2.183
30        -1.809    -1.583    -2.137    -2.523    -0.748    -2.157    -1.982
36        -1.971    -1.671    -2.105    -2.659    -0.688    -2.238    -2.108
48        -2.026    -1.789    -2.077    -2.497    -0.792    -1.856    -1.831
60        -1.943    -1.883    -1.637    -2.753    0.296     -1.717    -1.809
72        -1.842    -1.982    -1.839    -2.361    -0.821    -1.546    -1.647
84        -0.920    -1.487    -1.146    -1.950    0.006     -0.343    -0.807
96        -0.166    -0.959    -0.954    -1.392    0.312     0.691     0.163
108       -0.995    -1.746    -1.330    -1.495    -1.106    -0.759    -1.121
120       0.093     -0.286    -0.827    -0.382    -0.425    0.065     -0.078
#         3         1         0         2         4         2         2
Table A.14: Diebold-Mariano Statistics of forecasts in the period from January 2005 till December 2009 with a horizon of 6 months, where positive values indicate outperformance of the RW. The last row indicates no. of outperformances of the RW.
Maturity  g-AR      g-VAR     g-λ-AR    g-TVV-AR  g-BC-AR   k-AR      k-VAR
3         -1.488    -1.085    -1.238    -1.240    -1.915    -1.452    -1.162
6         -1.928    -1.158    -1.527    -1.286    -2.222    -1.438    -1.181
9         -2.168    -1.506    -1.877    -1.491    -2.172    -1.474    -1.430
12        -2.097    -1.686    -1.909    -1.502    -2.023    -1.458    -1.533
15        -2.178    -1.954    -1.968    -1.725    -2.494    -1.617    -1.835
18        -2.150    -2.043    -1.987    -1.793    -2.554    -1.658    -1.922
21        -2.144    -2.102    -1.978    -1.825    -2.662    -1.696    -1.979
24        -2.190    -2.170    -1.988    -1.883    -3.068    -1.782    -2.046
30        -2.091    -2.167    -1.901    -1.874    -2.939    -1.761    -2.013
36        -2.125    -2.227    -1.935    -1.924    -3.258    -1.822    -2.066
48        -2.017    -2.189    -1.734    -1.820    -3.526    -1.746    -1.986
60        -1.971    -2.103    -1.651    -1.768    -3.808    -1.724    -1.883
72        -1.866    -2.023    -1.540    -1.703    -3.472    -1.648    -1.823
84        -1.831    -1.899    -1.538    -1.649    -3.781    -1.583    -1.732
96        -1.777    -1.655    -1.343    -1.487    -3.218    -1.411    -1.514
108       -1.916    -1.586    -1.414    -1.525    -3.588    -1.392    -1.485
120       -2.356    -1.616    -2.203    -1.785    -2.706    -1.767    -1.605
#         0         0         0         0         0         0         0
Table A.15: Diebold-Mariano Statistics of forecasts in the period from January 2005 till December 2009 with a horizon of 12 months, where positive values indicate outperformance of the RW. The last row indicates no. of outperformances of the RW.
Maturity  g-AR      g-VAR     g-λ-AR    g-TVV-AR  g-BC-AR   k-AR      k-VAR
3         -0.952    -0.592    -0.970    -0.790    -1.142    -0.675    -0.550
6         -1.156    -0.671    -1.123    -0.863    -1.267    -0.689    -0.628
9         -1.348    -0.906    -1.306    -1.043    -1.428    -0.789    -0.877
12        -1.335    -1.030    -1.319    -1.090    -1.497    -0.833    -0.999
15        -1.365    -1.188    -1.361    -1.192    -1.589    -0.932    -1.161
18        -1.385    -1.297    -1.392    -1.261    -1.662    -0.998    -1.265
21        -1.403    -1.378    -1.413    -1.310    -1.748    -1.051    -1.342
24        -1.419    -1.446    -1.428    -1.354    -1.816    -1.106    -1.404
30        -1.392    -1.510    -1.403    -1.370    -1.853    -1.146    -1.447
36        -1.393    -1.572    -1.403    -1.393    -1.946    -1.183    -1.494
48        -1.373    -1.634    -1.375    -1.395    -2.036    -1.225    -1.530
60        -1.373    -1.701    -1.370    -1.412    -2.136    -1.260    -1.577
72        -1.381    -1.731    -1.369    -1.425    -2.101    -1.299    -1.602
84        -1.332    -1.710    -1.304    -1.367    -2.264    -1.256    -1.589
96        -1.328    -1.695    -1.288    -1.343    -2.299    -1.248    -1.583
108       -1.415    -1.739    -1.393    -1.393    -2.333    -1.301    -1.638
120       -1.485    -1.713    -1.456    -1.394    -2.553    -1.303    -1.661
#         0         0         0         0         0         0         0
Appendix B
Kalman Filter
The Kalman filter provides a minimum mean squared error estimate of $\beta_t$. At each time
an optimal prediction of $y_t$ is generated, based on all information up
to that time. For this we need $\beta_{t|t-1}$. The prediction step is given by:
$$\beta_{t|t-1} = (I - A)\mu + A\beta_{t-1|t-1}, \tag{B.1}$$
$$P_{t|t-1} = AP_{t-1|t-1}A' + \Sigma_\eta, \tag{B.2}$$
$$\eta_{t|t-1} = y_t - y_{t|t-1} = y_t - X(\lambda)\beta_{t|t-1}, \tag{B.3}$$
$$F_{t|t-1} = X(\lambda)P_{t|t-1}X(\lambda)' + \Sigma_\varepsilon. \tag{B.4}$$
The updating step is given by:
$$\beta_{t|t} = \beta_{t|t-1} + P_{t|t-1}X(\lambda)'F^{-1}_{t|t-1}\eta_{t|t-1}, \tag{B.5}$$
$$P_{t|t} = P_{t|t-1} - P_{t|t-1}X(\lambda)'F^{-1}_{t|t-1}X(\lambda)P_{t|t-1}, \tag{B.6}$$
where $P_t$ is the variance of $\beta_t$ and the computations are carried out recursively for
t = 1, . . . , T.
As initial values we use:
$$\beta_{1|0} = E[\beta_t] = \mu,$$
$$P_{1|0} = \Sigma_\beta, \quad \text{where } \Sigma_\beta - A\Sigma_\beta A' = \Sigma_\eta.$$
The conditional distribution of the yields is then given by:
$$y_t \mid \mathcal{F}_{t-1} \sim N(y_{t|t-1}, F_{t|t-1}). \tag{B.7}$$
So the log-likelihood is given by:
$$\ell(\theta) = -\frac{NT}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log|F_{t|t-1}| - \frac{1}{2}\sum_{t=1}^{T}\eta'_{t|t-1}F^{-1}_{t|t-1}\eta_{t|t-1}. \tag{B.8}$$
Numerical optimization over the hyper-parameters θ gives the maximum likelihood
estimate.
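To make the recursions concrete, the sketch below runs the prediction and updating steps and accumulates the Gaussian log-likelihood for a scalar special case: one factor, one yield and a scalar loading x. This is a simplification chosen for illustration, not the multivariate filter used in the thesis; the function name `kalman_loglik` and all parameter values are hypothetical. The initial variance solves the scalar version of the condition on $\Sigma_\beta$, that is P = q / (1 − a²).

```python
import math


def kalman_loglik(y, mu, a, q, x, r):
    """Scalar Kalman filter for
        y_t    = x * beta_t + eps_t,                    Var(eps_t) = r
        beta_t = (1 - a) * mu + a * beta_{t-1} + eta_t, Var(eta_t) = q
    with |a| < 1. Returns (log-likelihood, list of filtered states).
    """
    beta_pred = mu                  # initial state: unconditional mean
    p_pred = q / (1.0 - a * a)      # initial variance: solves P - a*P*a = q
    loglik = 0.0
    filtered = []
    for obs in y:
        v = obs - x * beta_pred     # prediction error, as in (B.3)
        f = x * p_pred * x + r      # prediction error variance, as in (B.4)
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(f) + v * v / f)
        gain = p_pred * x / f
        beta_filt = beta_pred + gain * v     # updating step for the state
        p_filt = p_pred - gain * x * p_pred  # updating step for the variance
        filtered.append(beta_filt)
        beta_pred = (1.0 - a) * mu + a * beta_filt  # prediction step (B.1)
        p_pred = a * p_filt * a + q                 # prediction step (B.2)
    return loglik, filtered


# Illustrative call with made-up data and parameters.
ll, states = kalman_loglik([5.2, 5.0, 4.7], mu=5.0, a=0.9, q=0.1, x=1.0, r=0.05)
```

In practice the log-likelihood returned by such a routine would be passed, with its sign reversed, to a numerical minimizer over the hyper-parameters.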
Bibliography
Ang, Andrew and Piazzesi, Monika. A no-arbitrage vector autoregression of term
structure dynamics with macroeconomic and latent variables. Journal of Monetary
economics, 50(4):745–787, 2003.
Bernadell, Carlos; Coche, Joachim, and Nyholm, Ken. Yield curve prediction for the
strategic investor. Technical report, European Central Bank, 2005.
Bjork, Tomas and Christensen, Bent Jesper. Interest rate dynamics and consistent
forward rate curves. Mathematical Finance, 9(4):323–348, 1999.
Bollerslev, Tim. Generalized autoregressive conditional heteroskedasticity. Journal of
econometrics, 31(3):307–327, 1986.
Caks, John. The coupon effect on yield to maturity. The Journal of Finance, 32(1):
103–115, 1977.
Caldeira, Joao F; Laurini, Marcio P, and Portugal, Marcelo S. Bayesian inference
applied to dynamic nelson-siegel model with stochastic volatility. Brazilian Review
of Econometrics, 30(1):123–161, 2010.
Christensen, Jens H. E.; Lopez, Jose A., and Rudebusch, Glenn D. Can spanned term
structure factors drive stochastic volatility? †, 2013.
Christensen, Jens HE; Diebold, Francis X, and Rudebusch, Glenn D. An arbitrage-free
generalized nelson–siegel term structure model. The Econometrics Journal, 12(3):
C33–C64, 2009a.
Christensen, Jens HE; Lopez, Jose A, and Rudebusch, Glenn D. Do central bank
liquidity facilities affect interbank lending rates? Federal Reserve Bank of San
Francisco Working Paper, 13, 2009b.
Christensen, Jens HE; Diebold, Francis X, and Rudebusch, Glenn D. The affine
arbitrage-free class of nelson–siegel term structure models. Journal of Econometrics,
164(1):4–20, 2011.
Coroneo, Laura; Nyholm, Ken, and Vidova-Koleva, Rositsa. How arbitrage-free is the
nelson–siegel model? Journal of Empirical Finance, 18(3):393–407, 2011.
Cox, David R; Gudmundsson, Gudmundur; Lindgren, Georg; Bondesson, Lennart;
Harsaae, Erik; Laake, Petter; Juselius, Katarina, and Lauritzen, Steffen L.
Statistical analysis of time series: Some recent developments [with discussion and
reply]. Scandinavian Journal of Statistics, pages 93–115, 1981.
Cox, John C; Ingersoll Jr, Jonathan E, and Ross, Stephen A. A theory of the term
structure of interest rates. Econometrica: Journal of the Econometric Society, pages
385–407, 1985.
Creal, ; Koopman, , and Lucas, . The estimation of time-varying parameters in
multivariate linear time series models. Working Paper, 2011a.
Creal, Drew; Koopman, Siem Jan, and Lucas, Andre. A general framework for
observation driven time-varying parameter models. 2008.
Creal, Drew; Koopman, Siem Jan, and Lucas, Andre. A dynamic multivariate
heavy-tailed model for time-varying volatilities and correlations. Journal of Business
& Economic Statistics, 29(4), 2011b.
Creal, Drew; Koopman, Siem Jan, and Lucas, Andre. Generalized autoregressive score
models with applications. Journal of Applied Econometrics, 2012.
Diebold, Francis X and Li, Canlin. Forecasting the term structure of government bond
yields. Journal of econometrics, 130(2):337–364, 2006.
Diebold, Francis X and Mariano, Robert S. Comparing predictive accuracy. Journal of
Business & economic statistics, 20(1), 2002.
Diebold, Francis X and Rudebusch, Glenn D. The dynamic nelson-siegel approach to
yield curve modeling and forecasting. Technical report, mimeo, 2011.
Diebold, Francis X; Piazzesi, Monika, and Rudebusch, Glenn. Modeling bond yields in
finance and macroeconomics. 2005.
Diebold, Francis X; Rudebusch, Glenn D, and Boragan Aruoba, S. The macroeconomy
and the yield curve: a dynamic latent factor approach. Journal of econometrics, 131
(1):309–338, 2006.
Duffee, Gregory R. Term premia and interest rate forecasts in affine models. The
Journal of Finance, 57(1):405–443, 2002.
Duffee, Gregory R and Stanton, Richard H. Estimation of dynamic term structure
models. The Quarterly Journal of Finance, 2(02), 2012.
Duffie, Darrell and Kan, Rui. A yield-factor model of interest rates. Mathematical
finance, 6(4):379–406, 1996.
Engle, Robert. Dynamic conditional correlation: A simple class of multivariate
generalized autoregressive conditional heteroskedasticity models. Journal of
Business & Economic Statistics, 20(3):339–350, 2002.
Engle, Robert F and Russell, Jeffrey R. Autoregressive conditional duration: a new
model for irregularly spaced transaction data. Econometrica, pages 1127–1162, 1998.
Fabozzi, Frank J; Martellini, Lionel, and Priaulet, Philippe. Predictability in the shape
of the term structure of interest rates. The Journal of Fixed Income, 15(1):40–53,
2005.
Fabozzi, Frank J; Martellini, Lionel, and Priaulet, Philippe. Exploiting predictability
in the time-varying shape of the term structure of interest rates. EDHEC Risk and
Asset Management Research Centre, 2007.
Fama, Eugene F. Term-structure forecasts of interest rates, inflation and real returns.
Journal of Monetary Economics, 25(1):59–76, 1990.
Fama, Eugene F and Bliss, Robert R. The information in long-maturity forward rates.
The American Economic Review, pages 680–692, 1987.
Fama, Eugene F and French, Kenneth R. Common risk factors in the returns on stocks
and bonds. Journal of financial economics, 33(1):3–56, 1993.
Gilli, Manfred; Große, Stefan, and Schumann, Enrico. Calibrating the
nelson–siegel–svensson model. 2010.
Hamilton, James D. Analysis of time series subject to changes in regime. Journal of
econometrics, 45(1):39–70, 1990.
Hamilton, James D and Wu, Jing Cynthia. Identification and estimation of gaussian
affine term structure models. Journal of Econometrics, 168(2):315–331, 2012.
Harvey, Andrew; Ruiz, Esther, and Shephard, Neil. Multivariate stochastic variance
models. The Review of Economic Studies, 61(2):247–264, 1994.
Hautsch, Nikolaus and Yang, Fuyu. Bayesian inference in a stochastic volatility
nelson–siegel model. Computational Statistics & Data Analysis, 56(11):3774–3792,
2012.
Hull, John C. Options, futures, and other derivatives. Pearson Education India, 1999.
Jensen, Michael and Scholes, Myron. The capital asset pricing model: Some empirical
tests. 1972.
Joslin, Scott; Singleton, Kenneth J, and Zhu, Haoxiang. A new perspective on Gaussian
dynamic term structure models. Review of Financial Studies, 24(3):926–970, 2011.
Kalman, Rudolph Emil. A new approach to linear filtering and prediction problems.
Journal of Basic Engineering, 82(1):35–45, 1960.
Kim, Don H and Orphanides, Athanasios. Term structure estimation with survey data
on interest rate forecasts. 2005.
Koopman, Siem Jan; Lucas, André, and Scharth, Marcel. Predicting time-varying parameters with parameter-driven
and observation-driven models. Working Paper, 2012.
Koopman, Siem Jan; Mallee, Max IP, and Van der Wel, Michel. Analyzing the term
structure of interest rates using the dynamic Nelson–Siegel model with time-varying
parameters. Journal of Business & Economic Statistics, 28(3):329–343, 2010.
Laurini, Marcio Poletti and Hotta, Luiz Koodi. Bayesian extensions to Diebold-Li term
structure model. International Review of Financial Analysis, 19(5):342–350, 2010.
Litterman, Robert B and Scheinkman, Jose. Common factors affecting bond returns.
The Journal of Fixed Income, 1(1):54–61, 1991.
McCulloch, J Huston. The tax-adjusted yield curve. The Journal of Finance, 30(3):
811–830, 1975.
Mönch, Emanuel. Forecasting the yield curve in a data-rich environment: A
no-arbitrage factor-augmented VAR approach. Journal of Econometrics, 146(1):26–43,
2008.
Nelson, Charles R and Siegel, Andrew F. Parsimonious modeling of yield curves.
Journal of Business, pages 473–489, 1987.
Piazzesi, Monika. Affine term structure models. Handbook of Financial Econometrics, 1:
691–766, 2010.
Pooter, MD de. Examining the Nelson-Siegel class of term structure models. Technical
report, Tinbergen Institute, 2007.
Rivers, Douglas and Vuong, Quang. Model selection tests for nonlinear dynamic
models. The Econometrics Journal, 5(1):1–39, 2002.
Rudebusch, Glenn D and Wu, Tao. A macro-finance model of the term structure,
monetary policy and the economy. The Economic Journal, 118(530):906–926, 2008.
Russell, Jeffrey R. Econometric modeling of multivariate irregularly-spaced
high-frequency data. Manuscript, GSB, University of Chicago, 1999.
Svensson, Lars EO. Estimating forward interest rates with the extended Nelson &
Siegel method. Sveriges Riksbank Quarterly Review, 3(1):13–26, 1995.
Vasicek, Oldrich. An equilibrium characterization of the term structure. Journal of
Financial Economics, 5(2):177–188, 1977.
Vasicek, Oldrich A and Fong, H Gifford. Term structure modeling using exponential
splines. The Journal of Finance, 37(2):339–348, 1982.
Xiang, Ju and Zhu, Xiaoneng. A regime-switching Nelson–Siegel term structure model
and interest rate forecasts. Journal of Financial Econometrics, 11(3):522–555, 2013.