
University of Amsterdam

Faculty of Economics & Business

MSc Thesis: Financial Econometrics

Modeling the Term Structure of Interest Rates: using the Generalized Autoregressive Score Framework

Student: Guido Jonker (10457615)

Supervisor: dr. N.P.A. van Giersbergen

Second Reader: prof. dr. H.P. Boswijk

May 2014

Abstract

In this thesis the term structure of interest rates is modeled for the purposes of fitting and forecasting. For this the Dynamic Nelson-Siegel (DNS) model is used, estimated within the Generalized Autoregressive Score (GAS) framework. Within the GAS framework, some new extensions of the DNS are proposed and some existing extensions are evaluated. We propose a new time-varying volatility specification. An extension with Student-t disturbances is also proposed, but found unfit for modeling. Further, extensions with nonlinearities and an additional fourth factor are investigated. We find that more flexible models lead to a better in-sample fit of the data. Moreover, the GAS-estimated models achieve a better in-sample fit than comparable standard models estimated by the Kalman filter. However, improved out-of-sample predictability of the term structure could not be demonstrated for the new estimation method and model extensions. Sub-sample analysis indicates that a naive random walk is difficult to beat within both the GAS and the Kalman modeling frameworks.

Contents

Abstract

1 Introduction

2 Theory
2.1 The Yield Curve
2.2 Zero-Coupon Yields
2.3 Why Model?
2.4 Nelson-Siegel Model
2.4.1 Dynamic Nelson Siegel
2.5 Generalized Autoregressive Score
2.5.1 The Modeling Framework

3 Model Specifications
3.1 General Model
3.1.1 Gaussian
3.1.2 Student-t
3.1.3 Variable Lambda
3.1.4 Time-Varying Volatility
3.1.5 Common Disturbance with Time-Varying Volatility
3.1.6 Common Volatility

4 Data

5 Estimation
5.1 Initial Estimates
5.1.1 Lambda
5.1.2 Two-Step Estimation
5.1.3 One-Step State-Space Estimation
5.2 GAS Estimation
5.2.1 Maximum Likelihood Estimation
5.2.2 Initial Factors
5.2.3 Smoothing
5.2.4 Inference
5.3 Scores and Scaling
5.3.1 Gaussian
5.3.2 Student-t
5.3.3 Variable Lambda

5.3.5 Common Time-Varying Volatility

6 Results
6.1 In-Sample Performance
6.1.1 Two-Step
6.1.2 Kalman
6.1.3 Gaussian
6.1.4 Student-t
6.1.5 Variable Lambda

6.1.7 Common Time-Varying Volatility
6.1.8 Bjork and Christensen Four-Factor Model
6.1.9 Estimation Robustness
6.1.10 In-Sample Conclusion
6.2 Out-of-Sample Performance
6.2.1 Forecast Procedure
6.2.2 Forecast Measures
6.2.3 Forecast Results
6.2.4 Out-of-Sample Conclusion

7 Conclusion

8 Further Research

A Tables

B Kalman Filter

Bibliography

Chapter 1

Introduction

The term structure of interest rates gives the relation between interest rates or bond yields at different terms (maturities). In general, yields increase with maturity, giving rise to an upward-sloping yield curve. One basic explanation for this phenomenon is that lenders demand higher interest rates on longer-term loans as compensation for the greater risk associated with them, compared with short-term loans. The yield curve plays a central role in an economy. Modeling and forecasting the term structure of interest rates is therefore important for many purposes: pricing derivatives, portfolio management, asset valuation, risk management and monetary policy.

Researchers have produced a vast literature on this problem. Many contributions use theoretically rigorous approaches, but these lead to empirically unsatisfying results, especially poor out-of-sample forecasts. Because bonds trade in well-organized, deep and liquid markets, it is logical and appealing to impose the absence of arbitrage. The US Treasury market, for example, is very large and liquid, with total outstanding debt of almost USD 12 trillion, yearly issuance of over USD 2 trillion and approximately USD 500 billion traded every day¹. In such a liquid market, arbitrage opportunities are unlikely to exist: the risk-adjusted returns on bonds of different maturities should be the same. In other words, yields are internally consistent. A large part of the literature therefore takes this as its starting point. The associated arbitrage-free (AF) models started with Vasicek (1977) and Cox et al. (1985), who introduced the so-called 'affine' models, which are functions of the instantaneous short rate r. These models became even more popular when Duffie and Kan (1996) generalized them. Unfortunately, these models fit poorly and are difficult to estimate, many having multiple likelihood maxima (Kim and Orphanides, 2005). Besides these estimation difficulties, the forecasting performance of affine models is also poor (Duffee, 2002).

¹ http://www.sifma.org/research/statistics.aspx

As a result of these difficulties, many researchers have focused on empirically attractive models, the most important being the Nelson-Siegel (NS) curve (Nelson and Siegel, 1987). The NS curve is widely used among central banks and others in the financial industry because of its relative simplicity, its ease of estimation and its empirically tractable forecasting results. It addresses the problem of modeling the yield curve by summarizing, at any point in time, the information in a large number of bonds. It does so by expressing the large set of yields of various maturities as a function of a small set of unobserved factors. The three factors it uses are interpreted economically as the level (long-term), slope (short-term) and curvature (medium-term) of the yield curve. The Nelson-Siegel curve is reasonably flexible, allowing for various shapes (monotonic, inverted, humped, S-shaped), which gives it a good fit to the data.

As an evolution, the Dynamic Nelson-Siegel (DNS) model was developed by Diebold and Li (2006). This model imposes the structural restrictions of the NS curve but lets its factors (level, slope, curvature) vary over time. These time-varying factors are modeled using (vector) autoregressive specifications. Diebold and Li show that their forecasts outperform standard time-series models, which has brought research back to the Nelson-Siegel class.

Several researchers have extended and investigated the DNS. Diebold et al. (2006) put the DNS in state-space form and include macroeconomic factors. Pooter (2007) examines various extensions of the DNS with the purpose of fitting and forecasting. He concludes that an extension with a fourth factor (Svensson, 1995) forecasts very well, especially when estimated in one step in state-space form using a Kalman filter. Koopman et al. (2010) extend the DNS in two directions. First, they let the factor loadings depend on an additional loading parameter, which they treat as a fourth latent variable (λt). Second, they introduce time-varying volatility into the DNS using a standard GARCH specification.

Alongside the DNS, the Arbitrage-Free Nelson-Siegel (AFNS) model was developed by Christensen et al. (2011); it takes the DNS and imposes the absence of arbitrage. This partly closes the gap between theoretically rigorous and statistical term-structure models. The AFNS adds a time-invariant "yield-adjustment term", which accounts for the difference between the AFNS and the DNS. Because the DNS generally fits well and market yields are assumed to be arbitrage-free, it is arguable that the DNS is arbitrage-free to a close approximation: the no-arbitrage constraint should be largely non-binding. Coroneo et al. (2011) support this claim, finding that the standard Nelson-Siegel model is compatible with the no-arbitrage constraints in the US interest rate market. Duffee and Stanton (2012) reach similar conclusions. Joslin et al. (2011) conclude that for their JSZ model, forecasts are invariant to the imposition of no-arbitrage restrictions. Furthermore, they state that the AFNS model is a constrained special case of the JSZ normalization. Despite these findings, much recent research has focused on the AFNS. For example, Christensen et al. (2009a) extend their AFNS model to the Arbitrage-Free Generalized Nelson-Siegel (AFGNS) model, a Svensson-type extension with additional factors. Also, Christensen et al. (2013) incorporate stochastic volatility in the AFNS, but conclude that much of the observed stochastic volatility cannot be associated with the spanned term-structure factors.

Because of these arbitrage-free considerations, this thesis focuses on the DNS. The novel feature of this thesis is its relatively new estimation method. Time-series models with time-varying parameters can be categorized into two classes: parameter-driven models and observation-driven models (Cox et al., 1981).

In parameter-driven models, the time-varying parameters are stochastic processes subject to their own source of error. Therefore, the parameters are not perfectly predictable given the past, and the likelihood function is typically not known in closed form.

The alternative is an observation-driven model, in which the time-varying parameters are functions of lagged dependent variables, exogenous variables and past observations. The parameters are stochastic, but perfectly predictable given the past. This simplifies likelihood evaluation, because the likelihood function is known in closed form.

Creal et al. (2008, 2012) introduce Generalized Autoregressive Score (GAS) models, a class of observation-driven time-series models. The GAS model uses the scaled score function as the driving mechanism of the time-varying parameters.
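As a minimal illustration of this mechanism (a toy case, not one of the DNS specifications studied in this thesis), take a univariate Gaussian density $y_t \sim N(f_t, \sigma^2)$ with a time-varying mean $f_t$ and the GAS(1,1) update $f_{t+1} = \omega + A s_t + B f_t$ of Creal et al., where $s_t$ is the score scaled by the inverse Fisher information. For this density the score with respect to $f_t$ is $(y_t - f_t)/\sigma^2$ and the Fisher information is $1/\sigma^2$, so $s_t = y_t - f_t$. The function name gas_mean_filter and all parameter values are hypothetical.

```python
import math


def gas_mean_filter(y, omega, a, b, sigma2, f1):
    """GAS(1,1) filter for a time-varying mean f_t in y_t ~ N(f_t, sigma2).

    Returns the path f_1, ..., f_{T+1} and the Gaussian log-likelihood.
    """
    f = [f1]
    loglik = 0.0
    for yt in y:
        ft = f[-1]
        # Log-density of y_t given the current (one-step-ahead) mean f_t
        loglik += -0.5 * (math.log(2.0 * math.pi * sigma2) + (yt - ft) ** 2 / sigma2)
        score = (yt - ft) / sigma2        # d log p(y_t | f_t) / d f_t
        scaled_score = sigma2 * score     # multiply by inverse Fisher information (= sigma2)
        f.append(omega + a * scaled_score + b * ft)
    return f, loglik


f, ll = gas_mean_filter([1.0, 2.0, 1.5], omega=0.0, a=0.5, b=1.0, sigma2=1.0, f1=0.0)
```

With ω = 0 and B = 1 the update reduces to $f_{t+1} = (1 - A) f_t + A y_t$, i.e. exponential smoothing: each new observation pulls the parameter in the direction of the score, and the likelihood is available in closed form as a by-product of the recursion.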

Using the GAS framework departs from most of the literature, which takes a parameter-driven approach: the NS parametrization serves as the observation equation, combined with a transition equation in which the unobserved factors follow (vector) autoregressive processes. Together, the observation and transition equations form a state-space structure. To estimate the model, restrictions are imposed on this state-space structure; mostly, Gaussian disturbances are assumed in both equations. This makes it possible to use the Kalman filter (Kalman, 1960). Some propose Bayesian estimation using Markov Chain Monte Carlo (MCMC) in order to avoid these assumptions (Laurini and Hotta, 2010). With the GAS framework we are not restricted to such assumptions or methods. This leads to the main question answered in this thesis:

How does the GAS framework perform in the term structure setting?

Which model specification performs best? Different specifications are compared on both in-sample fit and out-of-sample forecasting, as well as on ease and robustness of estimation.

This main question leads to several sub-questions:

• Is the assumption of normality valid?

As mentioned, previous applications of the DNS/AFNS have used Gaussian errors.

Different distributions could be used, such as the Student-t distribution.

• Can heteroskedasticity be included in the model?

Is the assumption of cross-sectional and longitudinal independence of the disturbances valid? Can the model be extended to incorporate other possibilities?

• How do GAS estimated models perform in relation to Kalman estimated models?

The performance of the GAS estimated models is compared to traditional Kalman

estimated models, regarding in-sample fit and out-of-sample predictions.

Methodology and Techniques

In this thesis new specifications of the DNS are proposed, and their likelihood functions are derived analytically. Furthermore, all models considered are programmed in Matlab.

The model specifications are assessed on the basis of in-sample fit and out-of-sample forecasts. For the in-sample fit, measures such as the Root Mean Squared Error (RMSE) are used. More decisive measures, the Akaike and Bayesian information criteria (AIC/BIC), are also used; these weigh the improvement in fit from adding parameters against the loss of parsimony. Likelihood ratio (LR) tests are used to test the significance of more elaborate nested models, such as a model with Student-t errors versus Gaussian errors, and specifications with time-varying volatility. For non-nested models, such as GAS-estimated versus Kalman-filter-estimated models, we use the Rivers-Vuong (RV) test.
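The information criteria and the LR statistic are simple functions of maximized log-likelihoods. A minimal sketch, assuming the standard definitions $\text{AIC} = -2\ell + 2k$ and $\text{BIC} = -2\ell + k \ln n$, with $k$ the number of parameters and $n$ the number of observations entering the likelihood (for a panel of yields, whether $n$ counts periods or individual yields is a modeling choice, so it is left as an argument). The LR statistic $2(\ell_{\text{unrestricted}} - \ell_{\text{restricted}})$ is compared with a $\chi^2_q$ distribution for $q$ restrictions; only $q = 1$ is handled here, where the p-value has a closed form via the complementary error function. When the restriction lies on the boundary of the parameter space (as with $\nu \to \infty$ for the Student-t), the $\chi^2$ reference distribution is not the standard one, so this sketch covers only the regular case. Function names and numbers are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike information criterion; lower is better."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion with n observations; lower is better."""
    return -2.0 * loglik + k * math.log(n)


def lr_test_one_restriction(loglik_restricted, loglik_unrestricted):
    """LR statistic and p-value for a single (regular, interior) restriction."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    # For a chi-square with 1 degree of freedom: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical example: adding one parameter raises the log-likelihood by 10
stat, p = lr_test_one_restriction(-1000.0, -990.0)
```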

For the out-of-sample forecasts, comparisons are made between the forecast errors, using the (trace) Root Mean Squared Forecast Error ((t)RMSFE). Besides this relatively informal comparison, a formal test of predictive accuracy between models is used, namely the Diebold-Mariano (DM) test statistic.
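The DM statistic compares the loss differential $d_t = e_{1t}^2 - e_{2t}^2$ of two competing forecasts; under the null of equal predictive accuracy, $\bar d / \sqrt{\hat V / T}$ is asymptotically standard normal. A minimal sketch, assuming squared-error loss and one-step-ahead forecasts, so that the variance of $d_t$ needs no autocovariance correction (for $h$-step forecasts, $h - 1$ autocovariances should be added). Function names and data are hypothetical.

```python
import math
from statistics import mean


def rmsfe(errors):
    """Root mean squared forecast error for one series of forecast errors."""
    return math.sqrt(mean(e * e for e in errors))


def diebold_mariano(errors_1, errors_2):
    """DM statistic under squared-error loss for one-step-ahead forecasts (h = 1).

    A positive statistic means model 1 has larger squared errors than model 2.
    """
    d = [e1 * e1 - e2 * e2 for e1, e2 in zip(errors_1, errors_2)]
    t = len(d)
    d_bar = mean(d)
    var_d = sum((x - d_bar) ** 2 for x in d) / t    # no autocovariance terms for h = 1
    dm = d_bar / math.sqrt(var_d / t)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))   # two-sided, N(0, 1) approximation
    return dm, p_value


dm, p = diebold_mariano([1.0, 2.0, 1.0, 2.0], [1.0, 1.0, 1.0, 1.0])
```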

Chapter 2

Theory

In this chapter the theory behind modeling the term structure is explained. First we explain what interest rate curves are and how these curves relate, and then how they are derived. Subsequently, we explain why to model them and why to use the Nelson-Siegel method. Finally, we explain the Generalized Autoregressive Score (GAS) framework and how it can be adopted for the term structure.

2.1 The Yield Curve

We start by describing what we try to model: the yield curve, or term structure of interest rates. The yield curve describes the relationship between the yield (i.e. the return) of bonds and their time to maturity. The term structure is given for bonds with the same credit risk, for example US Treasury bonds, because these are assumed to have similar dynamics, i.e. the same factors driving their dynamics. This also allows the term structures of different countries or companies to be compared over time. We begin by defining the relationships between the different interest rate curves: the yield curve, the discount curve and the forward curve (Hull, 1999).

Let P(τ) denote the price of a τ-period discount bond (a bond for which a discounted price is paid today for an amount to be received in the future). Hence P(τ) is the present value of a risk-free contract that pays one unit at maturity, τ = T − t periods ahead, where τ is the time to maturity. If y(τ) is the continuously compounded yield to maturity, then by definition the discount curve is given by:

$$P(\tau) = e^{-\tau y(\tau)}. \qquad (2.1)$$

Hence the yield curve and the discount curve are fundamentally related: knowing one of the curves, the other can be calculated immediately.

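The forward rate curve gives the forward rate as a function of maturity. It is related to the discount curve in a similar way:

$$f(\tau) = -\frac{P'(\tau)}{P(\tau)}. \qquad (2.2)$$

Equations (2.1) and (2.2) imply the following relationship between the yield curve and the forward curve:

$$y(\tau) = \frac{1}{\tau}\int_0^{\tau} f(u)\,du. \qquad (2.3)$$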

So the yield is an equally weighted average of the forward rates. Hence, once we have a representation of any one of the three curves, the other two can be derived: they are interchangeable (Diebold and Rudebusch, 2011; Piazzesi, 2010).
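The relations (2.1)-(2.3) can be checked numerically. A minimal sketch: starting from a hypothetical smooth forward curve (chosen only for illustration, not estimated from data), the yield is obtained by averaging the forward rates as in (2.3), the discount factor follows from (2.1), and a finite-difference version of (2.2) recovers the forward rate again.

```python
import math


def forward(u):
    # Hypothetical smooth forward curve, for illustration only
    return 0.04 - 0.02 * math.exp(-0.5 * u)


def yield_from_forward(tau, n=10_000):
    # (2.3): y(tau) = (1/tau) * integral_0^tau f(u) du, via the midpoint rule
    h = tau / n
    return sum(forward((i + 0.5) * h) for i in range(n)) * h / tau


def discount(tau):
    # (2.1): P(tau) = exp(-tau * y(tau))
    return math.exp(-tau * yield_from_forward(tau))


def forward_from_discount(tau, eps=1e-4):
    # (2.2): f(tau) = -P'(tau) / P(tau), with a central finite difference for P'
    dp = (discount(tau + eps) - discount(tau - eps)) / (2.0 * eps)
    return -dp / discount(tau)
```

For this forward curve the integral in (2.3) has the closed form $y(\tau) = 0.04 - 0.04(1 - e^{-0.5\tau})/\tau$, which gives an independent check on the numerical average.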

2.2 Zero-Coupon Yields

Although the yield curve is the most used in practice, zero-coupon yields are not directly observed. Instead, they need to be estimated from a large set of bond prices with an approximation method. In practice, bonds of many different maturities trade at all times. The difficulty is that we cannot simply apply formulas (2.1)-(2.3) to the observed market prices of these bonds, because most bonds pay coupons, mostly semi-annually. Because of these coupon payments, bond prices exhibit a so-called 'coupon effect', as analyzed by Caks (1977): bonds with the same maturity but different coupon rates have different yields. Therefore, 'zero-coupon' yields have to be estimated from the large pool of bonds that are traded. Different researchers have proposed methods to estimate these zero-coupon yields.

McCulloch (1975) introduces a cubic-spline method to estimate the zero-coupon yield curve. The disadvantage of this method is that it has difficulty fitting flat curves, as a result of a diverging discount curve at long maturities. Still, the Federal Reserve presents its yield curves using this method. Vasicek and Fong (1982) overcome this problem by using exponential splines, but their method does not guarantee a strictly positive forward rate. The last method considered here is introduced by Fama and Bliss (1987), who construct the yields from estimated forward rates: the 'unsmoothed Fama-Bliss' rates. These forward rates are extracted from the prices of coupon-bearing bonds using a bootstrap method, assuming a constant forward rate between successive maturities. The forward rates are then converted to a yield curve using formula (2.3). The resulting yields exactly price the bonds used to construct them. This method is regarded as the most accurate, and many studies therefore use it, for example Diebold and Li (2006), Pooter (2007) and Koopman et al. (2010). Finally, fitting a parametric model to these yields produces smoothed yields; this is discussed in chapter 3.
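A stylized sketch of the bootstrap idea behind the unsmoothed Fama-Bliss rates, under simplifying assumptions that are ours and not Fama and Bliss's exact procedure: annual-coupon bonds with face value 100 maturing at 1, 2, ..., n years, and a forward rate that is constant within each year. Each successive bond then adds exactly one unknown discount factor, which is solved for and converted into a forward rate; (2.3) turns the piecewise-constant forwards into yields. The function names are hypothetical.

```python
import math


def bootstrap_forwards(bonds, face=100.0):
    """bonds: (maturity_in_years, annual_coupon, price) with maturities 1, 2, ..., n.

    Returns piecewise-constant, continuously compounded forward rates, one per year.
    """
    discount = {0: 1.0}
    forwards = []
    for maturity, coupon, price in sorted(bonds):
        # Value of the coupons paid before maturity, using already bootstrapped discount factors
        pv_coupons = sum(coupon * discount[t] for t in range(1, maturity))
        discount[maturity] = (price - pv_coupons) / (face + coupon)
        # Constant forward rate on the interval (maturity - 1, maturity]
        forwards.append(math.log(discount[maturity - 1] / discount[maturity]))
    return forwards


def yields_from_forwards(forwards):
    # (2.3) with piecewise-constant forwards: y(n) is the average of the first n forwards
    return [sum(forwards[:n]) / n for n in range(1, len(forwards) + 1)]
```

Because every bond is repriced exactly by the bootstrapped discount factors, the resulting yields price the input bonds without error, mirroring the property of the Fama-Bliss rates mentioned above.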

2.3 Why Model?

Now that we have defined what we try to model, we explain why we would want to model the yield curve. Piazzesi (2010) mentions at least four reasons for modeling the yield curve.

The first reason is forecasting. Yields on long-maturity bonds are, after any risk adjustments, expected values of average future short yields. Therefore, the current yield curve tells us something about the future direction of the economy. Besides forecasting future yields, the yield curve can be used to forecast real activity and inflation (Diebold et al., 2005; Fama, 1990). All these forecasts can be used for investment, savings and policy decisions.

A second reason is the assessment of monetary policy. Central banks of most industrialized countries appear able to move the short end of the yield curve. However, what matters most for long-term economic growth and developments are long-term yields. For example, decisions to buy or rent a house are driven by long-term mortgage rates (and thus long-term yields), not by the short-term rate steered by the central bank (e.g. the Federal Funds Rate). Modeling the yield curve can help to understand how moving the short end of the yield curve affects long-term yields. This research covers both the mechanisms in this process and how central banks conduct policy. A recent example is Quantitative Easing (QE) by the Federal Reserve (Christensen et al., 2009b). The European Central Bank has also recently decided to use such unconventional monetary policy to stimulate the economy.

A third reason is debt policy. Governments issue debt in the form of bonds and need to decide on the maturities of the bonds they issue. The supply of bonds with different maturities influences yields, so governments can actively manage the maturity structure of their public debt, for example by selling short-maturity debt and buying long-maturity bonds.

The fourth reason is pricing and hedging. For example, coupon-bearing bonds can be priced using a model of the term structure: each payment is weighted by the price of a zero-coupon bond that matures on the date of that payment. Prices of futures, options, swaps, caps and floors can also be computed from a yield curve model. Furthermore, some parties need to manage risks associated with differences between interest received and paid. Hedging strategies can be computed by determining the prices of the derivatives in different states of the economy. It can be argued that for purposes such as pricing and hedging it is more important to use a model that imposes the restrictions implied by the absence of arbitrage, whereas the temporal dynamics are often of less value.
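The weighting of each payment by a zero-coupon price can be sketched as follows. The assumptions are ours, for illustration: annual coupons, maturities measured in years, and discount factors obtained from a given zero-coupon yield curve via (2.1). The function name bond_price and the flat curves in the usage lines are hypothetical.

```python
import math


def bond_price(coupon, maturity, yield_curve, face=100.0):
    """Price of a bond paying an annual coupon at t = 1, ..., maturity plus face value at maturity.

    yield_curve(t) returns the continuously compounded zero-coupon yield y(t);
    each cash flow is weighted by the zero-coupon price P(t) = exp(-t * y(t)) from (2.1).
    """
    price = 0.0
    for t in range(1, maturity + 1):
        cash_flow = coupon + (face if t == maturity else 0.0)
        price += cash_flow * math.exp(-t * yield_curve(t))
    return price


# Hypothetical flat 5% curve: a two-year bond with a coupon of 5
p_coupon = bond_price(5.0, 2, lambda t: 0.05)
```

Any term-structure model that delivers y(t), such as the Nelson-Siegel curve discussed below, can be plugged in as yield_curve.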

2.4 Nelson-Siegel Model

Because of the nature of the data, a multivariate model is needed. As mentioned, the focus of this thesis is on forecasting. We therefore use an empirically attractive model and stay away from traditional financial theory that imposes the restrictions implied by the absence of arbitrage. To compress the information in the bond data, a model with a factor structure can be used. We want to compress the information in the yields for statistical reasons: a more parsimonious model results in a worse in-sample fit, but generally better forecasts. Fortunately, financial theory often suggests a factor structure. Examples of successful financial models with a factor structure are the CAPM (one factor) and the Fama-French model (three factors) (Fama and French, 1993; Jensen and Scholes, 1972). Risk premia are often driven by a small number of risk factors. Yields also have a factor structure: the first three principal components explain almost all (97%) of their variation (Litterman and Scheinkman, 1991). This means that the high-dimensional set of yields is driven by a lower-dimensional set of factors.

The Nelson-Siegel model is the most widely used parametric model for the interest rate curve. It is popular because of its robustness: it provides a smooth fit and is relatively flexible, thereby ensuring a good fit. Above all, it provides statistically useful results that are economically meaningful.

The original model belongs to the class of exponential affine three-factor term-structure models. 'Affine' in this context means a constant plus a linear function of a vector of observable or unobservable (latent) factors. It was developed by Nelson and Siegel (1987) for the static cross-section of the term structure, and was designed to fit the forward rate. The model is deduced from the observation that typical yield curve shapes are associated with solutions to differential or difference equations. The function consists of the product of a polynomial and an exponential decay term: a Laguerre function. Laguerre functions are used to approximate functions on the domain [0, ∞); because maturities lie in this domain, they give a good fit. In this thesis the re-factorization of Diebold and Li (2006) is used, because it gives a more intuitive interpretation of the factors (level, slope and curvature). The instantaneous forward rate is then given by:

$$f(\tau) = \beta_1 + \beta_2 e^{-\lambda\tau} + \beta_3 \lambda\tau e^{-\lambda\tau}. \qquad (2.4)$$

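Averaging this forward rate over [0, τ] as in equation (2.3) gives the Nelson-Siegel yield curve:

$$y(\tau) = \beta_1 + \beta_2\left(\frac{1-e^{-\lambda\tau}}{\lambda\tau}\right) + \beta_3\left(\frac{1-e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}\right). \qquad (2.5)$$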

The Nelson-Siegel curve is not arbitrary: it has some desirable characteristics. The price of a bond with zero time to maturity is one unit, because there is no remaining risk, and the price of a bond goes to zero as the time to maturity τ goes to infinity:

$$P(0) = 1, \qquad (2.6)$$

$$\lim_{\tau\to\infty} P(\tau) = 0. \qquad (2.7)$$

The interpretation of the β's can be deduced by examining the limiting properties of the parametrisation:

$$\lim_{\tau\to\infty} y(\tau) = \beta_1, \qquad (2.8)$$

$$\lim_{\tau\to 0} y(\tau) = \beta_1 + \beta_2 = r. \qquad (2.9)$$

This shows that β1 gives the level of the yield curve. It is the long-run component of the yield curve: its loading is constant across all maturities, and it equals the limiting yield at infinite maturity (the long rate). β2 is the short-term component, as its loading starts at 1 and decays quickly to zero with maturity. Together, β1 and β2 form the instantaneous short rate (r). β3 is the medium-term component, because its loading starts at zero, then increases and finally decays again at longer maturities. This means that β3 provides the curvature of the yield curve: it affects neither the short nor the long end very much, but mostly the middle of the curve. These properties are illustrated in Figure 2.1.

λ determines the decay speed of the factor loadings. A smaller λ produces slower decay and fits longer maturities better, whereas a larger λ produces faster decay and fits shorter maturities better.
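The loadings in (2.5), the limits (2.8)-(2.9) and the effect of λ can be checked numerically. A minimal sketch; the function names and factor values are hypothetical, and λ = 0.062 is taken from Figure 2.1 (maturities then presumably in months). The curvature loading peaks where $e^{-x}(x^2 + x + 1) = 1$ with $x = \lambda\tau$, i.e. at $x \approx 1.793$, so a larger λ moves the peak to shorter maturities.

```python
import math


def ns_loadings(tau, lam):
    """Level, slope and curvature loadings of the Nelson-Siegel yield curve (2.5)."""
    x = lam * tau
    slope = -math.expm1(-x) / x          # (1 - e^{-x}) / x, computed accurately for small x
    curvature = slope - math.exp(-x)     # (1 - e^{-x}) / x - e^{-x}
    return 1.0, slope, curvature


def ns_yield(tau, beta, lam):
    """Nelson-Siegel yield y(tau) for factors beta = (beta1, beta2, beta3)."""
    return sum(b * load for b, load in zip(beta, ns_loadings(tau, lam)))


def curvature_peak(lam, tau_max=360.0, step=0.01):
    """Grid maturity at which the curvature loading is largest."""
    n = int(round(tau_max / step))
    grid = [step * i for i in range(1, n + 1)]
    return max(grid, key=lambda tau: ns_loadings(tau, lam)[2])


beta = (5.0, -2.0, 1.0)   # hypothetical level, slope and curvature
```

For these values the yield approaches β1 + β2 = 3 at very short maturities and β1 = 5 at very long maturities, in line with (2.9) and (2.8).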

Figure 2.1: Factor loadings. Blue depicts the first loading, given by 1; red the second loading, $\frac{1-e^{-\lambda\tau}}{\lambda\tau}$; and yellow the third loading, $\frac{1-e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}$, where λ = 0.062. (Plot omitted.)

The flexibility of the NS curve is another desirable property. It can assume a variety of shapes through different values of the latent factors: it can be an increasing or a decreasing function, at either a decreasing or an increasing rate. It can assume an S-shape or a flat line, but it can also adopt a U-shape or an inverted U-shape (inverted humped or humped). Figure 2.2 illustrates the flexibility of the Nelson-Siegel curve. The limitation of the NS curve is that it can have only one optimum. Fortunately, this constraint is mostly non-binding, since the yield curve is usually not jagged across maturities.

Although the NS curve is relatively flexible, it remains a parsimonious approximation. The sparse use of factors results in a smooth curve, which is preferred because it protects against over-fitting. Over-fitting is undesirable because it makes estimation difficult, most likely even unmanageable. Moreover, over-fitting frequently leads to poor forecasting performance.

Figure 2.2: Shapes the NS model can assume, constructed by fixing β1 = 1, β2 + β3 = 0 and λ = 0.062. (Plot omitted.)


2.4.1 Dynamic Nelson Siegel

The DNS is an evolution of the NS model. Diebold and Li (2006) convert the factor model for the cross-section into a dynamic factor model by letting the latent factors vary over time. The latent factors determine the cross-section of the yield curve, as shown in the previous section; the dynamics of the factors subsequently determine the dynamics of the yields over time.

The introduction of this dynamic factor structure has notable advantages. It converts the high-dimensional problem (a cross-section of many yields over different time periods) into an easier low-dimensional one, in which the yield curve at time t is

$$y_t(\tau) = \beta_{1t} + \beta_{2t}\left(\frac{1-e^{-\lambda\tau}}{\lambda\tau}\right) + \beta_{3t}\left(\frac{1-e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}\right). \qquad (2.10)$$

Diebold and Li (2006) introduce two benchmark models: one in which the three latent factors are modeled as univariate AR(1) processes, and one in which they are modeled jointly as a first-order vector autoregression, VAR(1).
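Diebold and Li (2006) estimate these benchmarks in two steps: with λ fixed, the factors are estimated period by period by cross-sectional OLS of the yields on the loadings of (2.10), and AR(1) models are then fitted to the estimated factor series. A minimal sketch of both steps in plain Python; the helper names, the maturities and the data in the test are hypothetical, and the small linear system is solved by Gaussian elimination rather than a numerical library.

```python
import math


def ns_row(tau, lam):
    """Loadings (1, slope, curvature) of (2.10) for maturity tau."""
    x = lam * tau
    slope = -math.expm1(-x) / x
    return [1.0, slope, slope - math.exp(-x)]


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cross_section_ols(yields, maturities, lam):
    """Step 1: OLS estimate of (beta1, beta2, beta3) for one date via the normal equations."""
    X = [ns_row(tau, lam) for tau in maturities]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(X, yields)) for i in range(3)]
    return solve(xtx, xty)


def ar1_fit(series):
    """Step 2: OLS fit of b_t = mu + a * b_{t-1} + e_t for one estimated factor series."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - a * mx, a
```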

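Subsequently, Diebold et al. (2006) put the model in state-space form by adding stochastic errors to the Nelson-Siegel observation curve. This produces a measurement equation that relates a set of N yields with times to maturity τ ∈ {τ1, τ2, ..., τN} to the three unobservable factors (β1t, β2t, β3t):

$$
\begin{pmatrix} y_t(\tau_1) \\ y_t(\tau_2) \\ \vdots \\ y_t(\tau_N) \end{pmatrix}
=
\begin{pmatrix}
1 & \frac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} & \frac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} - e^{-\lambda\tau_1} \\
1 & \frac{1-e^{-\lambda\tau_2}}{\lambda\tau_2} & \frac{1-e^{-\lambda\tau_2}}{\lambda\tau_2} - e^{-\lambda\tau_2} \\
\vdots & \vdots & \vdots \\
1 & \frac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} & \frac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} - e^{-\lambda\tau_N}
\end{pmatrix}
\begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_t(\tau_1) \\ \varepsilon_t(\tau_2) \\ \vdots \\ \varepsilon_t(\tau_N) \end{pmatrix}.
\qquad (2.11)
$$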

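The factor dynamics are then specified in the state equation as:

$$
\begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \end{pmatrix}
=
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}
+
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\begin{pmatrix} \beta_{1,t-1} \\ \beta_{2,t-1} \\ \beta_{3,t-1} \end{pmatrix}
+
\begin{pmatrix} \eta_{1t} \\ \eta_{2t} \\ \eta_{3t} \end{pmatrix}.
\qquad (2.12)
$$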

The state space formed by (2.11) and (2.12) can be put in a more convenient vector

notation:

$$y_t = X(\lambda)\beta_t + \varepsilon_t, \qquad (2.13)$$

$$\beta_t = \mu + A\beta_{t-1} + \eta_t. \qquad (2.14)$$

Here y_t and ε_t are (N × 1) vectors, X(λ) is an (N × 3) matrix, and β_t, µ and η_t are (3 × 1) vectors.

The matrix A can be estimated in full or in diagonal form, giving a VAR(1) or an AR(1) specification. However, Diebold and Li (2006) argue that using a VAR (correlated factors) results in worse forecasts.

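In order to estimate the model using the Kalman filter (Kalman, 1960; the procedure is given in Appendix B), assumptions are made on the error decomposition. Most papers assume that ε_t and η_t are uncorrelated Gaussian white noise:

$$
\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix}
\sim N\left[
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \Sigma_\varepsilon & 0 \\ 0 & \Sigma_\eta \end{pmatrix}
\right].
\qquad (2.15)
$$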

Furthermore, it is assumed that the errors are orthogonal to the initial state vector:

$$E[\beta_0 \varepsilon_t'] = 0, \qquad E[\beta_0 \eta_t'] = 0. \qquad (2.16)$$
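To make the filtering recursion concrete, a scalar sketch under simplifying assumptions that are ours: a single factor following the AR(1) state equation $\beta_t = \mu + a\beta_{t-1} + \eta_t$, observed through a single yield $y_t = x\beta_t + \varepsilon_t$ with known loading $x$, and Gaussian disturbances as in (2.15). The filter alternates prediction and update steps and accumulates the Gaussian log-likelihood via the prediction-error decomposition. The multivariate case replaces the scalars by the vectors and matrices of (2.13)-(2.14); the procedure actually used in this thesis is the one in Appendix B. The function name and test values are hypothetical.

```python
import math


def kalman_ar1(y, x, mu, a, q, r):
    """Scalar Kalman filter for y_t = x*b_t + eps_t, b_t = mu + a*b_{t-1} + eta_t.

    q = Var(eta_t), r = Var(eps_t). Requires |a| < 1 for the stationary initialization.
    Returns the filtered states and the Gaussian log-likelihood.
    """
    b = mu / (1.0 - a)        # unconditional mean of the state
    p = q / (1.0 - a * a)     # unconditional variance of the state
    filtered, loglik = [], 0.0
    for yt in y:
        # Prediction step
        b_pred = mu + a * b
        p_pred = a * a * p + q
        # Prediction error and its variance
        v = yt - x * b_pred
        f = x * x * p_pred + r
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        # Update step with Kalman gain k
        k = p_pred * x / f
        b = b_pred + k * v
        p = (1.0 - k * x) * p_pred
        filtered.append(b)
    return filtered, loglik
```

Note the contrast with the GAS recursion: here the state has its own disturbance η_t, and the Gaussian assumptions on both ε_t and η_t are what make the prediction-error decomposition exact.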

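Extensions can also be made within this framework, such as a fourth factor (Bjork and Christensen, 1999; Svensson, 1995). These authors argue that this improves the fit at longer maturities, especially beyond 10 years. Pooter (2007) even concludes that a fourth factor improves forecast performance, and that the extension of Bjork and Christensen (1999) gives similar results. The Bjork and Christensen specification is easier to estimate because it uses only one λ instead of two, which reduces the dimension of the parameter space. The resulting loading matrix X(λ) then has dimensions (N × 4), and β_t, µ and η_t become (4 × 1) vectors. The loading matrix is given by:

$$
X(\lambda) =
\begin{pmatrix}
1 & \frac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} & \frac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} - e^{-\lambda\tau_1} & \frac{1-e^{-2\lambda\tau_1}}{2\lambda\tau_1} \\
1 & \frac{1-e^{-\lambda\tau_2}}{\lambda\tau_2} & \frac{1-e^{-\lambda\tau_2}}{\lambda\tau_2} - e^{-\lambda\tau_2} & \frac{1-e^{-2\lambda\tau_2}}{2\lambda\tau_2} \\
\vdots & \vdots & \vdots & \vdots \\
1 & \frac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} & \frac{1-e^{-\lambda\tau_N}}{\lambda\tau_N} - e^{-\lambda\tau_N} & \frac{1-e^{-2\lambda\tau_N}}{2\lambda\tau_N}
\end{pmatrix}.
\qquad (2.17)
$$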
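The rows of (2.17) can be built directly. A minimal sketch with hypothetical function names; it makes visible that the fourth loading is the slope loading evaluated at 2λ, so it decays faster and adds flexibility at the short end without introducing a second decay parameter.

```python
import math


def bc_loading_row(tau, lam):
    """Row of the four-factor loading matrix X(lambda) in (2.17) for maturity tau."""
    x = lam * tau
    slope = -math.expm1(-x) / x                  # (1 - e^{-x}) / x
    curvature = slope - math.exp(-x)             # (1 - e^{-x}) / x - e^{-x}
    fourth = -math.expm1(-2.0 * x) / (2.0 * x)   # (1 - e^{-2x}) / (2x)
    return [1.0, slope, curvature, fourth]


def loading_matrix(maturities, lam):
    """N x 4 loading matrix, one row per maturity."""
    return [bc_loading_row(tau, lam) for tau in maturities]
```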

Another direction in which the model is extended is the inclusion of macro-variables. Diebold et al. (2006) extend the model by adding macro-variables to the state vector.

Furthermore, some authors have proposed models in which the state equation has regime-switching properties. Such models are useful when yields change structurally for long periods, for example after a change in fiscal policy or during a recession. Bernadell et al. (2005) and Xiang and Zhu (2013), for example, make the means of the state equation depend on the regime. Such models can be estimated by combining the Kalman filter with the regime-switching model of Hamilton (1990).


Another way of estimating the models in state-space form is the use of Bayesian infer-

ence using Markov Chain Monte Carlo (MCMC). These methods allow for more flexible

disturbance specifications, though they are computationally very intensive. Hautsch and

Yang (2012) and Caldeira et al. (2010) estimate models with stochastic volatility using

these Bayesian methods.

The main disadvantage of the state-space framework is that it requires disturbances in both the measurement equation and the state equation. Estimation with the Kalman filter also requires distributional assumptions on these disturbances. Relaxing these assumptions through Bayesian methods makes estimation computationally much more demanding. A clear advantage is the large body of theory that has accumulated on these subjects. This applies in particular to the Kalman filter, whose elegant recursive estimation routine is simple, intuitive, straightforward and powerful. The Kalman filter's disadvantages are its sensitivity to deviations from the Gaussian distribution and its limited adaptive capacity. We therefore proceed to introduce the GAS model.

2.5 Generalized Autoregressive Score

The GAS model of Creal et al. (2008, 2012) provides a different framework for estimating the DNS introduced above. To model the dynamics of the time-varying parameters, it does not require a state-space framework with separate disturbances in both the observation equation and the state equation. Consequently, we do not have to make assumptions on the distribution of a disturbance in the unobserved state equation.

The GAS model approaches the modeling of time-varying parameters differently, but it also shares some similarities with the state-space approach. Like the Kalman filter, the GAS model links past observations to future parameter values. It does so in a way that keeps likelihood evaluation straightforward.

We choose the observation-driven GAS model mainly because it exploits the full observation density rather than only its first or second moment. It can also be used with a wide range of probability distributions. Furthermore, it applies to linear as well as non-linear regressions with time-varying coefficients (we return to the latter in section 3.1.3). Consequently, the GAS model nests many econometric models. Examples are the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model of Bollerslev (1986), the autoregressive conditional duration and intensity (ACD and ACI, respectively) models of Engle and Russell (1998) and Russell (1999), the dynamic conditional correlation (DCC) model of Engle (2002) and dynamic copula models (Creal et al., 2008). We now introduce the modeling framework.


2.5.1 The Modeling Framework

Let $y_t$ denote the $N \times 1$ vector of dependent variables of interest, $f_t$ the time-varying parameter vector, $x_t$ a vector of exogenous variables, and $\theta$ a vector of static parameters. Define $Y^t = \{y_1, \ldots, y_t\}$, $F^t = \{f_1, \ldots, f_t\}$ and $X^t = \{x_1, \ldots, x_t\}$. The information set available at time $t$ consists of $\{f_t, \mathcal{F}_t\}$, where

$$\mathcal{F}_t = \{Y^{t-1}, F^{t-1}, X^t\}. \tag{2.18}$$

$y_t$ is assumed to be generated by the observation density

$$y_t \sim p(y_t \mid f_t, \mathcal{F}_t; \theta). \tag{2.19}$$

The updating mechanism for the time-varying parameter vector $f_t$ is given by the autoregressive updating equation

$$f_{t+1} = \omega + \sum_{i=1}^{p} A_i s_{t-i+1} + \sum_{j=1}^{q} B_j f_{t-j+1}, \tag{2.20}$$

where $\omega$ is a vector of constants, $A_i$ and $B_j$ are coefficient matrices, and $s_t = s_t(y_t, f_t, \mathcal{F}_t; \theta)$ is given by

$$s_t = S_t \cdot \nabla_t, \qquad \nabla_t = \frac{\partial \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial f_t}, \qquad S_t = S(t, f_t, \mathcal{F}_t; \theta). \tag{2.21}$$

The updating equation (2.20) thus consists of three parts: a constant ($\omega$), a part that uses the scaled scores ($s_{t-i+1}$) and a part that uses the lagged factors ($f_{t-j+1}$).

As stated above, the scaled score $s_t$ consists of the score $\nabla_t$ and a scaling matrix $S_t$. Using the score to update the factors is intuitive. It gives the direction of steepest ascent, that is, the direction in which the factors must change to increase the local likelihood, given the current factors $f_t$. The choice of the scaling matrix $S_t$ gives the model additional flexibility. A natural choice is a scaling that depends on the variance of the score, namely the inverse Fisher information:

$$S_t = \mathcal{I}_{t|t-1}^{-1} = \mathbb{E}_{t-1}\left[\nabla_t\nabla_t'\right]^{-1} = \left(-\mathbb{E}_{t-1}\left[\frac{\partial^2 \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial f_t\,\partial f_t'}\right]\right)^{-1}. \tag{2.22}$$

Together, equations (2.20)-(2.22) form a GAS($p$,$q$) model: a Generalized Autoregressive Score model of orders $p$ and $q$. The order $p$ is the number of lags of the (scaled) score that enter the update. The order $q$ is the number of lags of the factors, i.e. the autoregressive part. When the inverse Fisher information is used as scaling, the updating mechanism (2.20) can be interpreted as a Gauss-Newton algorithm.
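A small numerical illustration of (2.20)-(2.22) uses the Gaussian model $y_t \sim N(0, f_t)$ with time-varying variance $f_t$. It also illustrates the earlier remark that GAS nests GARCH. The score of this model is $(y_t^2 - f_t)/(2f_t^2)$ and its Fisher information is $1/(2f_t^2)$. Inverse-information scaling therefore gives $s_t = y_t^2 - f_t$, and the GAS(1,1) recursion becomes a GARCH(1,1) with $\alpha = A$ and $\beta = B - A$. The Python sketch below checks this numerically. The function names, data and parameter values are hypothetical.

```python
def gas_variance_filter(y, omega, a, b, f1):
    """GAS(1,1) path f_1, ..., f_{T+1} for y_t ~ N(0, f_t), inverse-information scaling."""
    f = [f1]
    for yt in y:
        ft = f[-1]
        score = (yt ** 2 - ft) / (2.0 * ft ** 2)   # d log p / d f_t
        info = 1.0 / (2.0 * ft ** 2)               # Fisher information
        s = score / info                           # scaled score = y_t**2 - f_t
        f.append(omega + a * s + b * ft)           # updating equation (2.20)
    return f


def garch_filter(y, omega, alpha, beta, s1):
    """GARCH(1,1): sigma2_{t+1} = omega + alpha * y_t**2 + beta * sigma2_t."""
    s2 = [s1]
    for yt in y:
        s2.append(omega + alpha * yt ** 2 + beta * s2[-1])
    return s2


y = [0.5, -1.2, 0.3, 2.0, -0.7]
gas = gas_variance_filter(y, omega=0.1, a=0.1, b=0.95, f1=1.0)
garch = garch_filter(y, omega=0.1, alpha=0.1, beta=0.95 - 0.1, s1=1.0)
print(max(abs(g - h) for g, h in zip(gas, garch)))
```

The two paths differ only by floating-point rounding. The choice of scaling matrix is therefore what determines which familiar model a GAS recursion reproduces.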


An important feature of the model is that, under correct specification, the scaled score $s_t$ forms a martingale difference sequence: $\mathbb{E}_{t-1}[s_t] = 0$. This follows from the corresponding property of the score. The conditional variance is $\mathbb{E}_{t-1}[s_t s_t'] = S_t\mathcal{I}_{t|t-1}S_t'$. With $S_t = \mathcal{I}_{t|t-1}^{-1}$ this reduces to $\mathcal{I}_{t|t-1}^{-1}$. With the identity matrix as scaling, $S_t = I$, the variance is $\mathcal{I}_{t|t-1}$.

As suggested above, scaling is preferably done with the inverse Fisher information matrix. Scaling with the identity matrix means that the unscaled score is used in the updating mechanism, which makes updating similar to a steepest-ascent optimization of the likelihood. According to Creal et al., however, this updating mechanism is often less stable. Koopman (2012) suggests using $S_t = \mathcal{I}_{t|t-1}^{-1/2}$. For this choice of scaling, $s_t$ has constant unit variance and is invariant under non-degenerate parameter transformations $g(f_t)$. They state that this constant unit-variance property is a useful device for detecting model mis-specification in applications.

Additionally, the updating equation can be extended to include the exogenous variables $x_t$. The coefficient matrices can also be specified as functions of the static parameters: $\omega(\theta)$, $A_i(\theta)$, $B_j(\theta)$.

We conclude this chapter by observing that the state equation of the DNS in equations (2.13)-(2.14) can be replaced by the GAS updating mechanism (2.20), as also proposed by Creal et al. (2008). Creal et al. (2011a) even suggest keeping the state-space framework and the Kalman filter, and using the GAS framework to model parameters other than $\beta_t$. Although this possibility exists, Koopman (2012) conduct a Monte Carlo study comparing parameter-driven and observation-driven models. They conclude that observation-driven GAS models have predictive accuracy similar to that of correctly specified parameter-driven models. We therefore proceed to model the time-varying factors of the NS model using the GAS updating equation.

Chapter 3

Model Specifications

In this chapter different model specifications are introduced. Because we adopt the GAS framework, we are no longer constrained by the use of the Kalman filter. We can therefore consider different disturbance specifications and nonlinearities. First, we consider the standard Gaussian error specification. We then extend this to a specification with Student-t distributed errors. Subsequently, a model with a time-varying decay parameter $\lambda$ is proposed. Finally, we extend the model with disturbance specifications that allow for time-varying volatility.

3.1 General Model

The general model is

$$y_t = X(\lambda)\beta_t + \varepsilon_t, \tag{3.1}$$

where $X(\lambda)$ is as given in equation (2.11), or in (2.17) for the four-factor extension of Bjork and Christensen (1999). We include $\beta_t$ in the time-varying parameter vector $f_t$, which is updated using equation (2.20) of the modeling framework. The time-varying factor vector therefore contains at least $f_t = \beta_t = (\beta_{1t}, \beta_{2t}, \beta_{3t})'$.

3.1.1 Gaussian

We first assume Gaussian disturbances, as in the state-space framework with the Kalman filter. This specification is also estimated in Creal et al. (2008). For this specification the disturbances $\varepsilon_t$ are given by:


$$\varepsilon_t \sim N(0, \Sigma_\varepsilon), \tag{3.2}$$

where $\varepsilon_t$ is the $N \times 1$ disturbance vector and $\Sigma_\varepsilon$ is an $N \times N$ positive definite covariance matrix.

3.1.2 Student-t

Instead of the Gaussian disturbances used by most researchers, we propose to adopt multivariate Student-t distributed disturbances. Like the Gaussian distribution, the Student-t distribution is symmetric and bell-shaped. Its distinctive feature is heavier tails: it is more prone to producing values far from its mean. The Student-t distribution has been suggested for many variables in finance. It does not capture the asymmetry often observed in financial returns, but as a first step we suggest using the symmetric Student-t distribution for the DNS.

Compared to the Gaussian distribution, the Student-t adds one parameter to the density: the degrees of freedom $v$. The degrees of freedom determine how fat the tails are. The higher $v$, the closer the distribution is to the normal distribution. For $v \to \infty$ it converges to the Gaussian distribution, and for $v > 30$ it is already very close to it. The Student-t is thus a family of distributions that contains the Gaussian as a limiting case, so this specification generalizes the Gaussian model. For the disturbances we have:

$$\varepsilon_t \sim \text{Student-}t\,(0, \Sigma_\varepsilon, v), \tag{3.3}$$

where $\varepsilon_t$ is an $N \times 1$ disturbance vector, $\Sigma_\varepsilon$ is an $N \times N$ positive definite covariance matrix and $v$ gives the degrees of freedom.

3.1.3 Variable Lambda

So far the decay parameter $\lambda$ has been assumed fixed over time. For example, Diebold and Li (2006) fix $\lambda$ at 0.0609, and Diebold et al. (2006) estimate $\lambda = 0.077$. The parameter $\lambda$ determines the maturity at which the curvature loading reaches its maximum. Fixing this parameter may be too restrictive, because the characteristics of the yield curve may change over time. We therefore allow for a time-varying $\lambda_t$, as proposed by Koopman et al. (2010) and adopted by Creal et al. (2008). Koopman et al. use a Kalman framework, which requires local linearization. We instead use the observation-driven GAS approach of Creal et al. Our time-varying factor vector is extended to $f_t = (\beta_t', \lambda_t)'$, and the matrix $X(\lambda)$ in equation (3.1) is replaced by a time-varying matrix $X_t$ that depends on $\lambda_t$:

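$$X_t = \begin{pmatrix}
1 & \frac{1-e^{-\lambda_t\tau_1}}{\lambda_t\tau_1} & \frac{1-e^{-\lambda_t\tau_1}}{\lambda_t\tau_1} - e^{-\lambda_t\tau_1}\\
1 & \frac{1-e^{-\lambda_t\tau_2}}{\lambda_t\tau_2} & \frac{1-e^{-\lambda_t\tau_2}}{\lambda_t\tau_2} - e^{-\lambda_t\tau_2}\\
\vdots & \vdots & \vdots\\
1 & \frac{1-e^{-\lambda_t\tau_N}}{\lambda_t\tau_N} & \frac{1-e^{-\lambda_t\tau_N}}{\lambda_t\tau_N} - e^{-\lambda_t\tau_N}
\end{pmatrix}. \tag{3.4}$$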

3.1.4 Time-Varying Volatility

Interest rates are determined by trading in financial markets and are therefore sensitive to market sentiment and market movements. As a result, their volatility may change over time. The models considered so far assume constant volatility. We therefore propose time-varying volatility specifications: first an adapted version of the specification of Koopman et al. (2010), and then a completely new specification.

3.1.5 Common Disturbance with Time-Varying Volatility

We first adopt the disturbance decomposition proposed by Koopman et al. (2010). They argue that volatilities differ across maturities, and they find that shorter-maturity yields are more sensitive to a common shock than longer-maturity yields. They therefore assume that the disturbance is composed of a common disturbance $\varepsilon^*_t$ and an individual disturbance vector $\varepsilon^+_t$, distributed as

$$\begin{pmatrix}\varepsilon^*_t\\ \varepsilon^+_t\end{pmatrix} \sim N\left[\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}h_t & 0\\ 0 & \Sigma^+_\varepsilon\end{pmatrix}\right]. \tag{3.5}$$

The combined disturbance is then defined as

$$\varepsilon_t = \Gamma_\varepsilon\varepsilon^*_t + \varepsilon^+_t. \tag{3.6}$$

This leads to the covariance matrix

$$\Sigma_\varepsilon(h_t) = h_t\Gamma_\varepsilon\Gamma_\varepsilon' + \Sigma^+_\varepsilon, \tag{3.7}$$

where $h_t$ is the time-varying variance of the common disturbance $\varepsilon^*_t$. $\Gamma_\varepsilon$ is an $N \times 1$ loading vector that transmits the common disturbance to the yields of the different maturities.

We use the GAS model to update $h_t$. Koopman et al. (2010) instead use a GARCH specification, following the common GARCH specification proposed by Harvey et al. (1994). In our case the variance $h_t$ is modeled as one of four latent factors, $f^*_t = (\beta_{1t}, \beta_{2t}, \beta_{3t}, h_t)'$, which are again updated using equation (2.20).

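In this specification restrictions are required to overcome identification problems. The product $h_t\Gamma_\varepsilon\Gamma_\varepsilon'$ is unchanged when $h_t$ is multiplied by $c^2$ and $\Gamma_\varepsilon$ is divided by $c$, for any $c \neq 0$. Koopman et al. (2010) propose the normalization $\Gamma_\varepsilon'\Gamma_\varepsilon = 1$, but choose to fix the constant of the common variance at a value close to zero. We instead fix the first element of $\Gamma_\varepsilon$ at 1, which also resolves the identification issue.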

3.1.6 Common Volatility

Another convenient approach is to use a diagonal parametrization of the covariance matrix. We propose the following specification:

$$\Sigma_\varepsilon(h_t) = h_t\Phi, \tag{3.8}$$

where $\Phi$ is an $N \times N$ symmetric positive definite matrix of loadings that transmits the common volatility to the yields of the different maturities. We choose $\Phi$ to be diagonal to limit the number of parameters to be estimated, so its diagonal contains $N$ free values. $h_t$ is again modeled using the GAS updating equation. This specification also needs a normalization. Multiplying $h_t$ by an arbitrary positive constant $c$ and dividing $\Phi$ by the same constant yields the same covariance matrix $\Sigma_\varepsilon(h_t)$, so $h_t$ and $\Phi$ are not separately identified. Again we fix the first element of $\Phi$ at 1.
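The identification argument can be checked directly. The minimal Python sketch below stores the diagonal of $\Phi$ as a list, in line with the diagonal choice above. The function names and numbers are hypothetical.

```python
def common_volatility_cov(h, phi_diag):
    """Diagonal of Sigma_eps(h_t) = h_t * Phi for a diagonal Phi, as in (3.8)."""
    return [h * p for p in phi_diag]


def normalize(h, phi_diag):
    """Impose phi_1 = 1 by moving the scale of Phi into h_t; Sigma_eps is unchanged."""
    c = phi_diag[0]
    return h * c, [p / c for p in phi_diag]


h, phi = 0.5, [2.0, 1.5, 1.0]
c = 4.0
# (c * h, Phi / c) gives the same covariance as (h, Phi): the pair is not identified.
same = common_volatility_cov(c * h, [p / c for p in phi])
print(same == common_volatility_cov(h, phi))  # True
h_n, phi_n = normalize(h, phi)
print(h_n, phi_n)  # 1.0 [1.0, 0.75, 0.5]
```

After normalization, only one pair $(h_t, \Phi)$ with $\phi_1 = 1$ produces a given $\Sigma_\varepsilon(h_t)$.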

Chapter 4

Data

In this chapter the zero-coupon interest rate data are presented and their characteristic properties are discussed. We describe how the yield curves behave both cross-sectionally (across maturities) and over time. Both aspects are important, because we want to capture the dynamics of the yield curve.

In this thesis we use unsmoothed end-of-month US zero-coupon yields. The data can be downloaded from the Center for Research in Security Prices (CRSP)¹. These unsmoothed yields are constructed using the Fama-Bliss (1987) method. They are continuously compounded interest rates expressed on an annualized basis. The method removes the coupon effects discussed by Caks (1977). It starts from filtered bond prices (the average of bid and ask), after eliminating bonds with special features (such as option features). From these bond prices, forward rates are generated using the Fama-Bliss (1987) bootstrap method. The data set covers the period from January 1970 to December 2009 and has $T = 480$ monthly observations. It contains yields at $N = 17$ maturities: $\tau_i = 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120$ months. Together this forms a panel of $480 \times 17 = 8160$ data points. Figure 4.1 presents a 3D plot of the observations.

¹ http://www.crsp.com


Figure 4.1: 3D plot of the panel of zero-coupon yields (axes: year; time to maturity in months; yield in %). The figure shows the yields from January 1970 to December 2009 for 17 maturities between 3 months and 10 years.

The plot shows that yields vary substantially in response to major economic events and economic policy. There are periods of extremely high interest rates: in the early 1980s rates were high due to economic policy. We also see the recent extremely low interest rates after the financial crisis of 2008. Interest rates can be seen rising and declining, for example before and after the burst of the 2000 dot-com bubble and the 2006 housing bubble. Furthermore, the long-term trend in interest rates is downward.

The shape of the yield curve also varies. Many different shapes occur: increasing or decreasing (at an increasing or a decreasing rate), flat, S-shaped, U-shaped (inverted humped) and inverted U-shaped (humped).

The descriptive statistics in Table 4.1 show that the average yield curve is upward sloping: the mean yield rises from 5.766% at 3 months to 7.067% at 10 years. This suggests that term premia exist, which may be explained by risk aversion or liquidity preference. The table also shows another stylised fact: the short end of the yield curve is more volatile than the long end. The standard deviation of yields generally decreases with maturity, from 3.071 at 3 months to 2.465 at 10 years. This is consistent with long-term rates being averages of expected future short rates. All maturities have high autocorrelations. However, the short end of the yield curve is less persistent, with lower autocorrelations at longer lags than the long end. The autocorrelations of longer maturities remain strong even at a lag of 24 months (2 years).

The yields are also skewed to the right (skewness is positive at every maturity), meaning that there is more mass in the right tail than in the left tail. The median yield curve with quantiles in Figure 4.3 confirms this right skewness. Furthermore, yields are leptokurtic (kurtosis is above 3 at every maturity), which might suggest thick tails and hence a Student-t distribution.

Furthermore, we compute empirical proxies for the level, slope and curvature, as proposed by Diebold and Li (2006):

- the level proxy is simply the longest-maturity (10-year) yield;
- the slope proxy is the yield at the longest maturity minus the yield at the shortest maturity (10-year yield minus 3-month yield);
- the curvature proxy is twice the 2-year yield minus the sum of the 3-month and the 10-year yields.

Since the slope and curvature proxies are on average positive, the average yield curve is upward sloping and concave, although the mean curvature (0.003) is close to zero. Moreover, the level is highly persistent, whereas the autocorrelation of the slope decays to nearly zero at a lag of 24 months. The stylised fact that long rates are more persistent than short rates is reflected in the higher persistence of the level compared with the slope and curvature ($\beta_2$ and $\beta_3$).
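The proxies are linear in the yields. Applying them to the mean yields in Table 4.1 should therefore reproduce the mean slope (1.301) and mean curvature (0.003) reported there. The Python sketch below performs this consistency check. The function name is hypothetical, and only the three mean yields are taken from the table.

```python
def dl_proxies(curve):
    """Level, slope and curvature proxies; `curve` maps maturity (months) to yield (%)."""
    level = curve[120]
    slope = curve[120] - curve[3]
    curvature = 2.0 * curve[24] - curve[3] - curve[120]
    return level, slope, curvature


# Mean 3-month, 2-year and 10-year yields from Table 4.1.
mean_curve = {3: 5.766, 24: 6.418, 120: 7.067}
level, slope, curvature = dl_proxies(mean_curve)
print(round(level, 3), round(slope, 3), round(curvature, 3))  # 7.067 1.301 0.003
```

The same function applied month by month gives the proxy series whose autocorrelations appear in the last three rows of Table 4.1.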

The high sample autocorrelations suggest that yields might be integrated of order one. If so, the underlying process would be non-stationary and we would need to take first differences. However, economic theory suggests that yields cannot be integrated, since they must be non-negative with a finite expected value. We therefore model the yields in levels.

Finally, a Principal Component Analysis (PCA) confirms that the first three principal components explain most of the yield variation; together they capture almost 99% of it. As Figure 4.2 shows, the loadings of the principal components resemble the Nelson-Siegel loadings:

- the loadings of the first principal component almost exactly match the inverted shape of the Nelson-Siegel slope loading;
- the loadings of the second principal component correspond to the shape of the curvature loading;
- only the loadings of the third principal component differ from their Nelson-Siegel (level) counterpart: they are not flat but, to some extent, sinusoidal.


Figure 4.2: Loadings of the first three principal components (PC1, PC2, PC3) against time to maturity (in months). The inverse of the loading of the first principal component is depicted.

Figure 4.3: Median yield curve with 5, 25, 75 and 95 percentiles (yield in %, against time to maturity in months). The graph indicates that yields are skewed to the right.


Table 4.1: Descriptive statistics of the yields (maturity in months, yields in %; ρk is the sample autocorrelation at a lag of k months)

Maturity        Mean  Std.Dev.   Median  Minimum  Maximum  Skewness  Kurtosis     ρ1    ρ12    ρ24
3              5.766     3.071    5.327    0.041   16.019     0.711     3.996  0.979  0.749  0.489
6              5.969     3.098    5.515    0.150   16.481     0.665     3.821  0.980  0.763  0.517
9              6.083     3.089    5.692    0.193   16.394     0.632     3.712  0.981  0.771  0.538
12             6.166     3.053    5.831    0.245   16.101     0.573     3.588  0.981  0.777  0.552
15             6.253     3.029    5.992    0.377   16.055     0.519     3.487  0.982  0.785  0.571
18             6.324     3.009    6.070    0.438   16.219     0.519     3.463  0.983  0.792  0.585
21             6.387     2.990    6.131    0.532   16.173     0.534     3.462  0.983  0.797  0.598
24             6.418     2.943    6.183    0.532   15.814     0.518     3.400  0.983  0.799  0.609
30             6.512     2.878    6.274    0.819   15.429     0.496     3.322  0.983  0.808  0.627
36             6.600     2.832    6.347    0.978   15.538     0.531     3.350  0.984  0.814  0.642
48             6.756     2.755    6.571    1.019   15.599     0.567     3.335  0.984  0.822  0.664
60             6.852     2.671    6.650    1.556   15.129     0.611     3.277  0.985  0.832  0.685
72             6.964     2.638    6.732    1.525   15.108     0.635     3.259  0.987  0.842  0.702
84             7.026     2.573    6.843    2.179   15.024     0.709     3.302  0.987  0.841  0.709
96             7.069     2.536    6.805    2.105   15.052     0.748     3.293  0.988  0.850  0.721
108            7.095     2.519    6.775    2.152   15.114     0.800     3.327  0.988  0.853  0.724
120 (Level)    7.067     2.465    6.683    2.679   15.194     0.863     3.409  0.988  0.843  0.717
Slope          1.301     1.362    1.338   -3.191    3.954    -0.454     3.036  0.934  0.418  0.024
Curvature      0.003     0.863    0.112   -2.174    2.905    -0.126     3.318  0.877  0.441  0.242

Chapter 5

Estimation

In this chapter the estimation methods are introduced. First, we describe how initial parameter estimates are obtained for starting the optimization procedures of the specific models. The models are highly parametrized, so sensible initial estimates are needed to avoid estimation difficulties. We begin by estimating a good initial value for $\lambda$. We then give the procedure for estimating the model with the two-step approach of Diebold and Li (2006). Subsequently, we describe the estimation of the model with the Kalman filter (1960), as proposed by Diebold et al. (2006). Finally, we present the estimation framework of the GAS model.

5.1 Initial Estimates

5.1.1 Lambda

For each cross-section, i.e. at each point in time, we can estimate a (D)NS model. The estimates of these cross-sectional models play an important role later in the estimation of the models. We are especially interested in the estimate of $\lambda$, because the optimization over $\lambda$ is nonlinear and may therefore cause numerical difficulties.

For any cross-section we minimize the sum of squared errors, i.e. we solve the following optimization problem:

$$\min_{\lambda,\beta}\ (y_t - X(\lambda)\beta)'(y_t - X(\lambda)\beta), \tag{5.1}$$

where $y_t$ is the $17 \times 1$ vector of yields, $X(\lambda)$ a $17 \times 3$ matrix of factor loadings depending on $\lambda$, and $\beta$ a $3 \times 1$ vector of factors.


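For a given $\lambda$, writing $X_\lambda = X(\lambda)$, this reduces to

$$\min_{\beta}\ (y_t - X_\lambda\beta)'(y_t - X_\lambda\beta). \tag{5.2}$$

This is a simple OLS problem, hence we get:

$$\hat\beta_\lambda = (X_\lambda'X_\lambda)^{-1}X_\lambda'y_t. \tag{5.3}$$

We substitute this in equation (5.1) and get:

$$\begin{aligned}
&\min_{\lambda}\ \left(y_t - X_\lambda(X_\lambda'X_\lambda)^{-1}X_\lambda'y_t\right)'\left(y_t - X_\lambda(X_\lambda'X_\lambda)^{-1}X_\lambda'y_t\right)\\
&= \min_{\lambda}\ y_t'\left(I - X_\lambda(X_\lambda'X_\lambda)^{-1}X_\lambda'\right)'\left(I - X_\lambda(X_\lambda'X_\lambda)^{-1}X_\lambda'\right)y_t.
\end{aligned} \tag{5.4}$$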

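The matrix $M_\lambda = I - X_\lambda(X_\lambda'X_\lambda)^{-1}X_\lambda'$ is an orthogonal projection matrix. It is symmetric and idempotent, so $M_\lambda'M_\lambda = M_\lambda$, and (5.4) therefore reduces to

$$\min_{\lambda}\ y_t'y_t - y_t'X_\lambda(X_\lambda'X_\lambda)^{-1}X_\lambda'y_t. \tag{5.5}$$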

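Summing over all cross-sections gives the minimum sum of squares problem for a single $\lambda$ common to all periods:

$$\min_{\lambda}\ \sum_{t=1}^{T}\left(y_t'y_t - y_t'X_\lambda(X_\lambda'X_\lambda)^{-1}X_\lambda'y_t\right). \tag{5.6}$$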

This function is optimized with the MATLAB optimization routine fminunc, which gives $\lambda = 0.062$. Furthermore, we estimate equation (5.1) separately for each cross-section by Nonlinear Least Squares (NLS). These estimates serve as a comparison for our model with time-varying $\lambda_t$.
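The concentrated objective (5.6) can be illustrated on synthetic data. The Python sketch below replaces fminunc by a plain grid search. It uses three noise-free curves generated with $\lambda = 0.0609$ and hypothetical factor values, so the minimizer should recover that $\lambda$ up to the grid spacing. The value 0.062 reported above comes from the real data and is not reproduced here. The function names are illustrative.

```python
import math

TAUS = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]


def ns_row(lam, tau):
    """One row of X(lambda): level, slope and curvature loadings."""
    x = lam * tau
    s = (1.0 - math.exp(-x)) / x
    return [1.0, s, s - math.exp(-x)]


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            r = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= r * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def concentrated_ssr(lam, curves, taus=TAUS):
    """Objective (5.6): residual sum of squares after the OLS step (5.3), summed over t."""
    rows = [ns_row(lam, tau) for tau in taus]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    total = 0.0
    for y in curves:
        beta = solve(xtx, [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)])
        total += sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
                     for r, yi in zip(rows, y))
    return total


# Hypothetical noise-free cross-sections generated with lambda = 0.0609.
betas = [(7.0, -2.0, 1.0), (6.0, -1.0, -0.5), (8.0, -3.0, 2.0)]
curves = [[sum(b * x for b, x in zip(beta, ns_row(0.0609, tau))) for tau in TAUS]
          for beta in betas]
grid = [0.01 + 0.0001 * k for k in range(1901)]  # 0.0100 ... 0.2000
lam_hat = min(grid, key=lambda lam: concentrated_ssr(lam, curves))
print(round(lam_hat, 4))
```

The sketch computes the residuals explicitly instead of evaluating $y_t'y_t - y_t'X_\lambda(X_\lambda'X_\lambda)^{-1}X_\lambda'y_t$. Both give the same objective, but explicit residuals avoid cancellation when the fit is nearly perfect.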

5.1.2 Two-Step Estimation

Next, we estimate the model with the two-step approach proposed by Diebold and Li (2006):

1. For a fixed $\lambda$, we fit a static Nelson-Siegel model to each cross-section ($t = 1, \ldots, T$) by Ordinary Least Squares using equation (5.3). This results in three time series of estimated latent factors ($\hat\beta_{1t}, \hat\beta_{2t}, \hat\beta_{3t}$) and the estimated residuals, i.e. the measurement disturbances ($\hat\varepsilon_t$).

2. A dynamic model is fitted to the estimated factors using equation (2.12). Diebold and Li (2006) fit AR(1) models to each of the factors; it is also possible to fit a VAR(1) to the factors.
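The two steps can be made concrete with simulated data. The Python sketch below does the following:

- simulates hypothetical AR(1) factors;
- builds noise-free yield curves with a fixed $\lambda = 0.0609$;
- recovers the factors by cross-sectional OLS (step 1);
- fits an AR(1) to each recovered factor series by OLS (step 2).

The parameter values, sample size and function names are assumptions made for illustration. With real data the step-1 residuals would not be zero.

```python
import math
import random

TAUS = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]
LAM = 0.0609  # fixed lambda used in step 1


def ns_row(lam, tau):
    x = lam * tau
    s = (1.0 - math.exp(-x)) / x
    return [1.0, s, s - math.exp(-x)]


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            r = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= r * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def step1(curves, lam=LAM, taus=TAUS):
    """Step 1: cross-sectional OLS (5.3) for every period t."""
    rows = [ns_row(lam, tau) for tau in taus]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    return [solve(xtx, [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)])
            for y in curves]


def step2_ar1(series):
    """Step 2: OLS estimates (mu, a) of beta_t = mu + a * beta_{t-1} + eta_t."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - a * mx, a


# Simulate hypothetical AR(1) factors and noise-free yield curves.
random.seed(1)
MU, PHI = [0.7, -0.2, 0.0], [0.9, 0.8, 0.6]
beta = [m / (1.0 - p) for m, p in zip(MU, PHI)]
rows = [ns_row(LAM, tau) for tau in TAUS]
factors, curves = [], []
for _ in range(3000):
    beta = [m + p * b + random.gauss(0.0, 0.3) for m, p, b in zip(MU, PHI, beta)]
    factors.append(beta)
    curves.append([sum(b * x for b, x in zip(beta, r)) for r in rows])

beta_hat = step1(curves)
estimates = [step2_ar1([b[k] for b in beta_hat]) for k in range(3)]
print([round(a, 3) for _, a in estimates])
```

Because step 2 treats $\hat\beta_t$ as observed, any estimation error from step 1 is ignored, which is the drawback discussed next.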


The advantage of this procedure is that it is simple and numerically stable, as it only uses linear regressions. In this approach $\lambda$ can also be estimated in the first step. This results in four factors in the first step and a four-dimensional dynamic model in the second step. However, the parameter estimation error from the first step is ignored in the second step. This may affect the second step and bias or distort the results; consequently, statistical inference is difficult.

5.1.3 One-Step State-Space Estimation

Using the initial values from the two-step approach, the state-space model can be estimated in one step. The measurement (observation) equation (2.13) and the state (transition) equation (2.14) are estimated jointly with the Kalman filter (1960). Unlike the two-step approach, the Kalman filter accounts for all the uncertainty in the framework, including the uncertainty in the estimated factors.

The Kalman filter is a recursive estimation algorithm consisting of two steps: a prediction step and an update step. The filter gives the minimum mean squared error prediction of the latent factors. The Kalman algorithm is given in appendix B.
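The prediction and update steps are easiest to see in the simplest case: one factor observed at one maturity. The model is $y_t = \beta_t + \varepsilon_t$ and $\beta_t = \mu + a\beta_{t-1} + \eta_t$, with $\varepsilon_t \sim N(0, h)$ and $\eta_t \sim N(0, q)$. The Python sketch below is a scalar illustration under these assumptions, not the multivariate algorithm of appendix B. The stationary initialization and the function name are assumptions. The sketch also accumulates the Gaussian log-likelihood via the prediction-error decomposition, which is the quantity the optimizer maximizes.

```python
import math


def kalman_scalar(y, mu, a, q, h):
    """Kalman filter for y_t = beta_t + eps_t, beta_t = mu + a * beta_{t-1} + eta_t.

    eps_t ~ N(0, h), eta_t ~ N(0, q) and |a| < 1. Returns the filtered states
    and the Gaussian log-likelihood (prediction-error decomposition).
    """
    b, p = mu / (1.0 - a), q / (1.0 - a * a)  # stationary mean and variance as start
    filtered, loglik = [], 0.0
    for t, yt in enumerate(y):
        if t > 0:                             # prediction step
            b, p = mu + a * b, a * a * p + q
        v, f = yt - b, p + h                  # prediction error and its variance
        k = p / f                             # Kalman gain
        b, p = b + k * v, p - k * p           # update step
        filtered.append(b)
        loglik -= 0.5 * (math.log(2.0 * math.pi) + math.log(f) + v * v / f)
    return filtered, loglik


states, ll = kalman_scalar([1.0, 2.0], mu=0.0, a=0.5, q=1.0, h=1.0)
print(states)  # approximately [4/7, 1.2]
```

In the DNS the same two steps run with vectors and matrices, and the scalar gain $p/f$ becomes a matrix gain.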

The likelihood obtained from the Kalman filter is maximized with the constrained MATLAB optimization routine fmincon. The routine is initialized at the estimates of the two-step model, with reasonable constraints imposed. The likelihood is optimized using the interior-point algorithm with numerical derivatives and Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessian approximations. BFGS is the quasi-Newton method used to approximate the Hessian of the likelihood needed for the optimization.

5.2 GAS Estimation

Next, we introduce the estimation procedure for the model in the GAS framework. First, Maximum Likelihood estimation is introduced. We then give the procedure for the initial factors. Subsequently, we introduce a smoothing scheme for the Fisher information matrix. We then show how statistical inference can be conducted. Finally, we derive the model-specific scores and Fisher information matrices.

5.2.1 Maximum Likelihood Estimation

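Like the Kalman filter models, the GAS models are estimated by Maximum Likelihood (MLE), which requires a fully specified probability density function. For our fully specified parametric model, the log-likelihood factorizes into the conditional densities (2.19):

$$\hat\theta = \arg\max_\theta L(\theta), \qquad L(\theta) = \log p(y_1, \ldots, y_T \mid \theta) = \log\prod_{t=1}^{T} p(y_t \mid f_t, \mathcal{F}_t; \theta) = \sum_{t=1}^{T}\ell_t(\theta), \qquad \ell_t(\theta) = \log p(y_t \mid f_t, \mathcal{F}_t; \theta). \tag{5.7}$$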

As equation (5.7) shows, the likelihood of the GAS model can be evaluated recursively: the local log-likelihood $\ell_t$ is computed for each period ($t = 1, \ldots, T$) and added to a running total.

To determine the factors in each period we need $s_t$ in the updating equation (2.20). This requires at least the score $\nabla_t$ and preferably the Fisher information $\mathcal{I}_{t|t-1}$, i.e. the derivatives with respect to the dynamic factors $f_t$ in equations (2.21) and (2.22). These differ per model specification; the scores and Fisher information matrices are derived in section 5.3 below.
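The recursive evaluation of (5.7) can be sketched for the simplest GAS model: a scalar Gaussian observation $y_t \sim N(f_t, \sigma^2)$ with time-varying mean $f_t$. The score is $(y_t - f_t)/\sigma^2$ and the Fisher information is $1/\sigma^2$, so the inverse-information scaled score is $s_t = y_t - f_t$. The sketch starts the recursion at the unconditional mean $\omega/(1 - B)$, the first option discussed in section 5.2.2. The function name, data and parameter values are hypothetical.

```python
import math


def gas_mean_loglik(y, omega, a, b, sigma2):
    """Sum of l_t for y_t ~ N(f_t, sigma2), with f_{t+1} = omega + a * s_t + b * f_t.

    s_t = y_t - f_t is the inverse-Fisher-information scaled score.
    Requires |b| < 1 so that f_1 = omega / (1 - b) is the unconditional mean.
    Returns the log-likelihood and the filtered path f_1, ..., f_T.
    """
    f = omega / (1.0 - b)
    total, path = 0.0, []
    for yt in y:
        path.append(f)
        s = yt - f
        total += -0.5 * (math.log(2.0 * math.pi * sigma2) + s * s / sigma2)  # l_t
        f = omega + a * s + b * f  # GAS(1,1) update (2.20)
    return total, path


y = [5.1, 5.3, 5.0, 5.6, 5.8]
ll, path = gas_mean_loglik(y, omega=0.5, a=0.4, b=0.9, sigma2=0.1)
print(round(ll, 4), [round(f, 3) for f in path])
```

In estimation, the static parameters $(\omega, A, B, \sigma^2)$ would be passed to a numerical optimizer that maximizes the returned log-likelihood. The thesis uses fmincon for this.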

The optimization routine maximizes the likelihood over the static parameters $\theta$. For each candidate $\theta$ the likelihood is evaluated, and $\theta$ is adjusted in a direction that increases it. Maximization is done with the MATLAB routine fmincon, again with the interior-point algorithm and BFGS Hessian approximations.

5.2.2 Initial Factors

To start each likelihood evaluation we need initial values $f_1$ for the dynamic factors. We consider the following options:

• A natural choice is the unconditional expectation of the factors. For the GAS(1,1) model we have:

$$\begin{aligned}
f_{t+1} &= \omega + A s_t + B f_t\\
\mathbb{E}[f_{t+1}] &= \omega + A\,\mathbb{E}[s_t] + B\,\mathbb{E}[f_t]\\
(I - B)\,\mathbb{E}[f_t] &= \omega + A \cdot 0\\
\mathbb{E}[f_t] &= (I - B)^{-1}\omega,
\end{aligned} \tag{5.8}$$

where the third line uses $\mathbb{E}[s_t] = 0$ and, assuming stationarity, $\mathbb{E}[f_{t+1}] = \mathbb{E}[f_t]$.

• Another possibility is to initialize the recursion at the optimal DNS fit of the cross-section at $t = 1$. For a fixed $\lambda$ this can be estimated by OLS using equation (5.3). With a fixed $\lambda$ the model is less sensitive to the initial value. With a time-varying $\lambda_t$ the initial values matter more, as very poor initial values will affect the estimates.

• The final, and most thorough, option proceeds in three passes. First, run the GAS filter forward (computing $f_t$ for $t = 2, \ldots, T+1$) from arbitrary initial values $f_1$. Then run it backward (computing $f_t$ for $t = T, \ldots, 0$). Finally, run it forward again.

The GAS updating recursion can thus be started at different values. In theory it should approach the optimal values after some 'learning' time, even from poor initial values.

5.2.3 Smoothing

As proposed by Creal et al. (2008, 2012), we scale with the inverse Fisher information $\mathcal{I}_{t|t-1}^{-1}$. A difficulty of scaling with (an approximation of) the inverse information matrix is that the information matrix must be inverted. This can be a problem if the information matrix is ill-behaved, i.e. if it is not of full rank or is numerically ill-conditioned for some model.

One way to reduce the risk of problems with non-invertible matrices is a smoothing scheme. Instead of the information matrix itself, a smoothed information matrix $(\mathcal{I}^s_{t-1})^{-1}$ is used for scaling. Creal et al. (2008, 2012) propose an Exponentially Weighted Moving Average (EWMA):

$$\mathcal{I}^s_{t-1} = \alpha\,\mathcal{I}^s_{t-2} + (1 - \alpha)\,\mathcal{I}_{t-1}, \tag{5.9}$$

for some $0 \le \alpha \le 1$. For $\alpha \to 1$ the smoothed matrix averages over a long history of past information matrices. For $\alpha \to 0$ it reduces to scaling with the information matrix itself. The parameter $\alpha$ is initially fixed at a safe value of $\alpha = 0.2$. It can eventually be added to the unknown parameter vector $\theta$ and optimized with MATLAB's optimization routine.
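Below is a minimal Python sketch of the smoothing recursion (5.9), with matrices stored as nested lists. The text does not specify the starting matrix for the recursion, so here it is an explicit argument (an assumption). Each step is a convex combination of the previous smoothed matrix and a positive semi-definite information matrix. A positive definite starting matrix therefore keeps every smoothed matrix positive definite whenever $\alpha > 0$, which makes the inversion safer.

```python
def ewma_information(infos, alpha, initial):
    """Return the smoothed matrices I^s_1, I^s_2, ... from recursion (5.9).

    infos   : list of information matrices I_1, I_2, ... (nested lists)
    alpha   : smoothing weight, 0 <= alpha <= 1
    initial : starting smoothed matrix I^s_0 (an assumption, see text)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    current = [list(row) for row in initial]
    smoothed = []
    for info in infos:
        current = [[alpha * c + (1.0 - alpha) * i for c, i in zip(c_row, i_row)]
                   for c_row, i_row in zip(current, info)]
        smoothed.append(current)
    return smoothed


infos = [[[2.0, 0.0], [0.0, 1.0]], [[4.0, 1.0], [1.0, 3.0]]]
print(ewma_information(infos, alpha=0.2, initial=[[1.0, 0.0], [0.0, 1.0]]))
```

With $\alpha = 0$ the function returns the unsmoothed information matrices. With $\alpha = 1$ it never moves away from the starting matrix.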

5.2.4 Inference

To conduct statistical inference we apply the standard limiting result. The estimated vector $\hat\theta$ contains all static parameters of the models. We use the Hessian at the optimum to compute standard errors and t-values. Under standard regularity conditions the MLE is consistent and

$$\sqrt{T}(\hat\theta - \theta_0) \xrightarrow{d} N(0, H^{-1}), \qquad H = -\mathbb{E}\left[\frac{\partial^2\ell_t}{\partial\theta\,\partial\theta'}\right], \tag{5.10}$$

where $\theta_0$ denotes the true parameter vector. The Hessian is calculated numerically. The optimization terminates when the change between iterations is smaller than $10^{-6}$.


5.3 Scores and Scaling

We now derive the score vectors and scaling matrices of the proposed models. To do so, we determine analytical derivatives and expectations of the log-likelihood functions.

5.3.1 Gaussian

We first assume Gaussian errors, as proposed in section 3.1.1. The errors $\varepsilon_t$ form an $N \times 1$ vector; in our application $N = 17$. Hence we have:

$$\varepsilon_t = y_t - X_t f_t \sim N(0, \Sigma_\varepsilon). \tag{5.11}$$

For estimation convenience we assume a diagonal $\Sigma_\varepsilon$.

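The probability density for each observation is given by:

$$p(y_t \mid f_t, \mathcal{F}_t; \theta) = (2\pi)^{-N/2}\,|\Sigma_\varepsilon|^{-1/2}\exp\left(-\frac{1}{2}(y_t - X_t\beta_t)'\Sigma_\varepsilon^{-1}(y_t - X_t\beta_t)\right), \tag{5.12}$$

$$\ell_t(\theta) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma_\varepsilon| - \frac{1}{2}(y_t - X_t\beta_t)'\Sigma_\varepsilon^{-1}(y_t - X_t\beta_t). \tag{5.13}$$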

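Taking derivatives with respect to the factor $f_t = \beta_t$ gives the gradient and the scaling matrix, i.e. the inverse information matrix:

$$\nabla_t(\theta) = \frac{\partial\ell_t}{\partial\beta_t} = -\frac{1}{2}\cdot(-2)\cdot X_t'\Sigma_\varepsilon^{-1}(y_t - X_t f_t) = X_t'\Sigma_\varepsilon^{-1}(y_t - X_t f_t), \tag{5.14}$$

$$S_t = \mathbb{E}_{t-1}\left[X_t'\Sigma_\varepsilon^{-1}\varepsilon_t\varepsilon_t'\Sigma_\varepsilon^{-1}X_t\right]^{-1} = \left(X_t'\Sigma_\varepsilon^{-1}X_t\right)^{-1}. \tag{5.15}$$

Combined, this leads to the scaled score:

$$s_t = \left(X_t'\Sigma_\varepsilon^{-1}X_t\right)^{-1}X_t'\Sigma_\varepsilon^{-1}(y_t - X_t f_t). \tag{5.16}$$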
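Equation (5.16) is the GLS coefficient from regressing the current residual $y_t - X_t f_t$ on $X_t$. One consequence follows directly: if $y_t = X_t(f_t + d)$ exactly, the scaled score equals $d$, so the update moves the factors straight towards the cross-sectional fit. The Python sketch below checks this. The diagonal $\Sigma_\varepsilon$ matches the assumption above. The function names, the loadings at $\lambda = 0.0609$ and all numbers are illustrative.

```python
import math

TAUS = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]


def ns_row(lam, tau):
    x = lam * tau
    s = (1.0 - math.exp(-x)) / x
    return [1.0, s, s - math.exp(-x)]


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            r = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= r * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gaussian_scaled_score(y, X, f, sigma_diag):
    """Scaled score (5.16); sigma_diag holds the diagonal of Sigma_eps."""
    k = len(f)
    resid = [yi - sum(x * fj for x, fj in zip(row, f)) for yi, row in zip(y, X)]
    w = [1.0 / s for s in sigma_diag]
    # Score (5.14): X' S^-1 (y - X f)
    score = [sum(wi * row[i] * e for wi, row, e in zip(w, X, resid)) for i in range(k)]
    # Fisher information, the inverse of (5.15): X' S^-1 X
    info = [[sum(wi * row[i] * row[j] for wi, row in zip(w, X)) for j in range(k)]
            for i in range(k)]
    return solve(info, score)  # (X' S^-1 X)^-1 X' S^-1 (y - X f)


X = [ns_row(0.0609, tau) for tau in TAUS]
f = [6.0, -1.5, 0.8]    # current factors (hypothetical)
d = [0.2, -0.1, 0.3]    # shift used to generate y (hypothetical)
y = [sum(x * (fj + dj) for x, fj, dj in zip(row, f, d)) for row in X]
sigma = [0.01 + 0.001 * i for i in range(len(TAUS))]
print([round(v, 6) for v in gaussian_scaled_score(y, X, f, sigma)])
```

Solving the linear system instead of forming $(X_t'\Sigma_\varepsilon^{-1}X_t)^{-1}$ explicitly is a standard numerical choice. It gives the same $s_t$.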


5.3.2 Student-t

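We now derive the score and information matrix of the Student-t distribution. Its density is

$$p(y_t \mid f_t, \mathcal{F}_t; \theta) = \frac{\Gamma\left(\frac{v+N}{2}\right)}{\Gamma\left(\frac{v}{2}\right)[(v-2)\pi]^{N/2}|\Sigma_\varepsilon|^{1/2}}\left[1 + \frac{\varepsilon_t'\Sigma_\varepsilon^{-1}\varepsilon_t}{v-2}\right]^{-(v+N)/2}, \tag{5.17}$$

with $\varepsilon_t = y_t - X_t\beta_t$.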

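This leads to the log-likelihood

$$\ell_t = \log\Gamma\left(\frac{v+N}{2}\right) - \log\Gamma\left(\frac{v}{2}\right) - \frac{N}{2}\log[(v-2)\pi] \tag{5.18}$$

$$\qquad - \frac{1}{2}\log|\Sigma_\varepsilon| - \frac{v+N}{2}\log\left[1 + \frac{\varepsilon_t'\Sigma_\varepsilon^{-1}\varepsilon_t}{v-2}\right]. \tag{5.19}$$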

Taking derivatives w.r.t. the factor βt obtains the score given by:

∇t =∂`t∂βt

= −(v +N)

2

[1 +

ε′tΣ−1ε εt

(v − 2)

]−1· −2 · X

′tΣ−1ε εt

(v − 2)(5.20)

= (v +N)

[1 +

ε′tΣ−1ε εt

(v − 2)

]−1X ′tΣ

−1ε εt

(v − 2). (5.21)

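Writing $q_t = \varepsilon_t'\Sigma_\varepsilon^{-1}\varepsilon_t$ and taking derivatives once more gives:

$$
\begin{aligned}
\frac{\partial^2 \ell_t}{\partial\beta_t\,\partial\beta_t'} &= (v+N)\left(X_t'\Sigma_\varepsilon^{-1}\varepsilon_t\,\frac{\partial (v-2+q_t)^{-1}}{\partial\beta_t'} + (v-2+q_t)^{-1}\,\frac{\partial X_t'\Sigma_\varepsilon^{-1}\varepsilon_t}{\partial\beta_t'}\right) \\
&= (v+N)\left(2\,(v-2+q_t)^{-2}\left(X_t'\Sigma_\varepsilon^{-1}\varepsilon_t\right)\left(\varepsilon_t'\Sigma_\varepsilon^{-1}X_t\right) - (v-2+q_t)^{-1}X_t'\Sigma_\varepsilon^{-1}X_t\right).
\end{aligned} \tag{5.22}
$$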

The problem with this expression is that its expectation is difficult to determine. We therefore use the scaling derived for the Gaussian specification as an approximation. Alternatively, the Hessian derived above is used in combination with the smoothing scheme introduced in section 5.2.3. Both approaches approximate the true information matrix.
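By (5.21), the Student-t score is the Gaussian score X′tΣ−1ε εt multiplied by the weight (v + N)/(v − 2 + ε′tΣ−1ε εt). When the Gaussian scaling is used as the approximation, the Student-t scaled score is therefore this weight times the Gaussian scaled score. The Python sketch below assumes a diagonal Σε and made-up numbers. It evaluates the score and checks it against a finite-difference derivative of (5.18) and (5.19). It also shows that the weight falls as the weighted squared error grows.

```python
import math


def t_loglik(y, X, beta, sigma2, v):
    """Student-t log-likelihood (5.18)-(5.19) with diagonal Sigma = diag(sigma2)."""
    N = len(y)
    eps = [yi - sum(a * b for a, b in zip(xi, beta)) for yi, xi in zip(y, X)]
    q = sum(e * e / s for e, s in zip(eps, sigma2))
    return (math.lgamma((v + N) / 2) - math.lgamma(v / 2)
            - 0.5 * N * math.log((v - 2) * math.pi)
            - 0.5 * sum(math.log(s) for s in sigma2)
            - 0.5 * (v + N) * math.log(1 + q / (v - 2)))


def t_score(y, X, beta, sigma2, v):
    """Score (5.21): weight * X' Sigma^-1 eps, with weight (v+N)/(v-2+q)."""
    N, k = len(y), len(beta)
    eps = [yi - sum(a * b for a, b in zip(xi, beta)) for yi, xi in zip(y, X)]
    q = sum(e * e / s for e, s in zip(eps, sigma2))
    weight = (v + N) / (v - 2 + q)
    gauss_score = [sum(X[i][j] * eps[i] / sigma2[i] for i in range(N)) for j in range(k)]
    return [weight * g for g in gauss_score], weight


# Hypothetical data with two regressors and three observations.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 1.3, 2.5]
sigma2 = [0.1, 0.2, 0.3]
beta = [0.8, 0.6]
score, w = t_score(y, X, beta, sigma2, v=5.0)

# Central finite differences of the log-likelihood as a check on (5.21).
step = 1e-6
fd = []
for j in range(len(beta)):
    up, dn = list(beta), list(beta)
    up[j] += step
    dn[j] -= step
    fd.append((t_loglik(y, X, up, sigma2, 5.0) - t_loglik(y, X, dn, sigma2, 5.0)) / (2 * step))

# The weight shrinks when the same curve is hit by a larger shock.
_, w_big = t_score([yi + 1.0 for yi in y], X, beta, sigma2, v=5.0)
```

As v grows, the weight tends to one and the Gaussian score is recovered.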


We now extend the factor vector ft with λt, as proposed by Creal et al. (2008):

$$
f_t^+ = \left[\beta_t', \lambda_t\right]'. \tag{5.23}
$$


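Because the derivative with respect to βt is unchanged, only the derivative with respect to λt needs to be derived. It is given by:

$$
\frac{\partial \ell_t}{\partial \lambda_t} = \left(\frac{\partial X_t}{\partial \lambda_t}\beta_t\right)'\Sigma_\varepsilon^{-1}(y_t - X_t\beta_t). \tag{5.24}
$$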

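Together with the derivative given in (5.14) this forms:

$$
\begin{aligned}
\nabla_t^+(\theta) = \frac{\partial \ell_t}{\partial f_t^+} &= \left[\frac{\partial \ell_t}{\partial \beta_t'}, \frac{\partial \ell_t}{\partial \lambda_t}\right]' = \begin{bmatrix} X_t'\Sigma_\varepsilon^{-1}(y_t - X_t\beta_t) \\ \left(\frac{\partial X_t}{\partial \lambda_t}\beta_t\right)'\Sigma_\varepsilon^{-1}(y_t - X_t\beta_t) \end{bmatrix} \\
&= \left[X_t, \frac{\partial X_t}{\partial \lambda_t}\beta_t\right]'\Sigma_\varepsilon^{-1}(y_t - X_t\beta_t) = \bar X_t'\Sigma_\varepsilon^{-1}(y_t - X_t\beta_t),
\end{aligned} \tag{5.25}
$$

with $\bar X_t = \left[X_t, \frac{\partial X_t}{\partial \lambda_t}\beta_t\right]$. For the row $x_i(\tau_i)$ of $X_t$ that belongs to maturity τi:

$$
\frac{\partial x_i(\tau_i)}{\partial \lambda_t} = \left[0,\ \frac{e^{-\lambda_t\tau_i}}{\lambda_t} - \frac{1-e^{-\lambda_t\tau_i}}{\lambda_t^2\tau_i},\ -\frac{1-e^{-\lambda_t\tau_i}}{\lambda_t^2\tau_i} + \tau_i e^{-\lambda_t\tau_i} + \frac{e^{-\lambda_t\tau_i}}{\lambda_t}\right].
$$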
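As a check on the loading derivative above, the following minimal Python sketch assumes the three-factor Nelson-Siegel loadings with maturities in months. It evaluates ∂xi(τi)/∂λt analytically and compares it with a central finite difference. It then forms the row of $\bar X_t$, including the extra column (∂Xt/∂λt)βt. The maturities, λ and β values are illustrative.

```python
import math


def ns_loadings(tau, lam):
    """Nelson-Siegel loadings (level, slope, curvature) for maturity tau in months."""
    e = math.exp(-lam * tau)
    slope = (1.0 - e) / (lam * tau)
    return [1.0, slope, slope - e]


def ns_loadings_dlam(tau, lam):
    """Analytical derivative of the loadings with respect to lambda."""
    e = math.exp(-lam * tau)
    d_slope = e / lam - (1.0 - e) / (lam ** 2 * tau)
    return [0.0, d_slope, d_slope + tau * e]


def augmented_row(tau, lam, beta):
    """Row of X-bar_t: the loadings plus the extra column (dx/dlambda)' beta."""
    x = ns_loadings(tau, lam)
    dx = ns_loadings_dlam(tau, lam)
    return x + [sum(a * b for a, b in zip(dx, beta))]


lam, step = 0.062, 1e-7
max_err = 0.0
for tau in [3, 12, 36, 120]:
    up = ns_loadings(tau, lam + step)
    dn = ns_loadings(tau, lam - step)
    fd = [(u - d) / (2 * step) for u, d in zip(up, dn)]
    max_err = max(max_err, max(abs(a - b) for a, b in zip(ns_loadings_dlam(tau, lam), fd)))

row = augmented_row(24, lam, [6.0, -1.5, 0.8])
```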

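Next, the information matrix is derived using the gradient:

\begin{align}
\mathcal{I}_t &= E\left[\nabla_t^+\nabla_t^{+\prime}\right] \tag{5.26}\\
&= E\left[\bar X_t'\Sigma_\varepsilon^{-1}\varepsilon_t\varepsilon_t'\Sigma_\varepsilon^{-1}\bar X_t\right] = \bar X_t'\Sigma_\varepsilon^{-1}E\left[\varepsilon_t\varepsilon_t'\right]\Sigma_\varepsilon^{-1}\bar X_t = \bar X_t'\Sigma_\varepsilon^{-1}\Sigma_\varepsilon\Sigma_\varepsilon^{-1}\bar X_t = \bar X_t'\Sigma_\varepsilon^{-1}\bar X_t. \tag{5.27}
\end{align}

So the scaling matrix is given by:

$$
S_t = \mathcal{I}_t^{-1} = \left(\bar X_t'\Sigma_\varepsilon^{-1}\bar X_t\right)^{-1}. \tag{5.28}
$$

Combined with the score, this leads to the scaled score:

$$
s_t = \left(\bar X_t'\Sigma_\varepsilon^{-1}\bar X_t\right)^{-1}\bar X_t'\Sigma_\varepsilon^{-1}(y_t - X_t\beta_t). \tag{5.29}
$$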

Following Creal et al. (2008), we now specify the 4 × 1 vector f+t = (β′t, λt)′ as:

$$
f_t^+ = \varphi_0 + \Phi f_t. \tag{5.30}
$$


Here ft = βt is the 3 × 1 vector of factors. This imposes a three-factor structure on the dynamics of (β′t, λt)′. These restrictions allow us to assess the GAS model with a nonlinear loading structure while the dynamics of the parameters remain restricted. Imposing no restrictions at all would result in a highly nonlinear system and, with it, estimation difficulties. For identification, the upper 3 × 3 block of Φ is set equal to the identity matrix, and the upper three elements of φ0 are set equal to zero. λt is then a linear function of βt:

$$
\begin{bmatrix}\beta_{1t}\\ \beta_{2t}\\ \beta_{3t}\\ \lambda_t\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\\ c_0\end{bmatrix} + \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ c_1 & c_2 & c_3\end{bmatrix}\begin{bmatrix}\beta_{1t}\\ \beta_{2t}\\ \beta_{3t}\end{bmatrix} = \begin{bmatrix}\beta_{1t}\\ \beta_{2t}\\ \beta_{3t}\\ c_0 + c_1\beta_{1t} + c_2\beta_{2t} + c_3\beta_{3t}\end{bmatrix}. \tag{5.31}
$$

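Creal et al. (2008) do not mention that this is a reparametrization of their model. They also do not mention that the scaled score (5.29) they present is therefore incomplete. The scaled score is the driving mechanism of the updating equation, so a new scaled score with respect to ft must be derived. By the chain rule:

$$
\nabla_t = \frac{\partial \ell_t}{\partial f_t} = \left(\frac{\partial f_t^+}{\partial f_t'}\right)'\frac{\partial \ell_t}{\partial f_t^+} = \Phi'\nabla_t^+. \tag{5.32}
$$

The inverse Fisher information is then given by:

$$
\begin{aligned}
\left(\mathcal{I}_{t|t-1}\right)^{-1} &= \left(E_{t-1}\left[\Phi'\nabla_t^+\nabla_t^{+\prime}\Phi\right]\right)^{-1} = \left(\Phi'E_{t-1}\left[\nabla_t^+\nabla_t^{+\prime}\right]\Phi\right)^{-1} \\
&= \left(\Phi'\bar X_t'\Sigma_\varepsilon^{-1}\bar X_t\Phi\right)^{-1}.
\end{aligned} \tag{5.33}
$$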


Hence this results in the scaled score:

$$
s_t = \left(\mathcal{I}_{t|t-1}\right)^{-1}\nabla_t = \left(\Phi'\bar X_t'\Sigma_\varepsilon^{-1}\bar X_t\Phi\right)^{-1}\Phi'\nabla_t^+. \tag{5.34}
$$
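The chain rule in (5.32) can be checked numerically. The Python sketch below assumes a Gaussian log-likelihood with diagonal Σε, three-factor Nelson-Siegel loadings, and λt = c0 + c1β1t + c2β2t + c3β3t as in (5.31). It computes ∇t = Φ′∇+t from the augmented matrix $\bar X_t$ and compares the result with a finite-difference derivative of ℓt with respect to βt. All numbers are illustrative.

```python
import math


def ns_loadings(tau, lam):
    e = math.exp(-lam * tau)
    slope = (1.0 - e) / (lam * tau)
    return [1.0, slope, slope - e]


def ns_loadings_dlam(tau, lam):
    e = math.exp(-lam * tau)
    d_slope = e / lam - (1.0 - e) / (lam ** 2 * tau)
    return [0.0, d_slope, d_slope + tau * e]


def lam_of(beta, c):
    """lambda_t = c0 + c1*beta1 + c2*beta2 + c3*beta3, the last row of (5.31)."""
    return c[0] + sum(ci * bi for ci, bi in zip(c[1:], beta))


def loglik(beta, c, taus, y, sigma2):
    """Gaussian log-likelihood with lambda driven by beta through (5.31)."""
    lam = lam_of(beta, c)
    ll = 0.0
    for tau, yi, s in zip(taus, y, sigma2):
        e = yi - sum(x * b for x, b in zip(ns_loadings(tau, lam), beta))
        ll += -0.5 * math.log(2 * math.pi * s) - 0.5 * e * e / s
    return ll


def score_f(beta, c, taus, y, sigma2):
    """Score w.r.t. f_t = beta_t: Phi' grad^+, with grad^+ = Xbar' Sigma^-1 eps (5.25)."""
    lam = lam_of(beta, c)
    grad_plus = [0.0] * 4
    for tau, yi, s in zip(taus, y, sigma2):
        x = ns_loadings(tau, lam)
        dx = ns_loadings_dlam(tau, lam)
        e = yi - sum(a * b for a, b in zip(x, beta))
        xbar = x + [sum(a * b for a, b in zip(dx, beta))]
        for k in range(4):
            grad_plus[k] += xbar[k] * e / s
    # Phi' = [I_3, (c1, c2, c3)'] maps the 4 x 1 vector grad^+ to the 3 factors.
    return [grad_plus[k] + c[k + 1] * grad_plus[3] for k in range(3)]


taus = [3, 12, 36, 120]
y = [5.0, 5.3, 5.8, 6.1]
sigma2 = [0.04] * 4
beta = [6.0, -1.5, 0.8]
c = [0.06, 0.001, -0.002, 0.003]

analytic = score_f(beta, c, taus, y, sigma2)
step = 1e-6
numeric = []
for k in range(3):
    up, dn = list(beta), list(beta)
    up[k] += step
    dn[k] -= step
    numeric.append((loglik(up, c, taus, y, sigma2) - loglik(dn, c, taus, y, sigma2)) / (2 * step))
```

The agreement holds only when the λ-column of $\bar X_t$ is mapped back through Φ′. Using ∇+t alone would ignore the dependence of λt on βt.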


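We now derive the score and information matrix for the model with the common disturbance specification, in which Σε(ht) = Σε + htΓεΓ′ε. The factor vector is f∗t = (β1t, β2t, β3t, ht)′. For each time period the log-likelihood is given by:

$$
\ell_t(\theta) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln\left|\Sigma_\varepsilon(h_t)\right| - \frac{1}{2}\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\varepsilon_t, \tag{5.35}
$$

with εt = yt − Xtβt. Taking the derivative with respect to ht gives:

$$
\frac{\partial \ell_t}{\partial h_t} = -\frac{1}{2}\frac{\partial \ln\left|\Sigma_\varepsilon(h_t)\right|}{\partial h_t} - \frac{1}{2}\frac{\partial\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\varepsilon_t\right)}{\partial h_t}. \tag{5.36}
$$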

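The individual parts are derived below:

$$
\frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t} = \Gamma_\varepsilon\Gamma_\varepsilon'. \tag{5.37}
$$

$$
\frac{\partial \ln\left|\Sigma_\varepsilon(h_t)\right|}{\partial h_t} = \mathrm{Tr}\left(\Sigma_\varepsilon(h_t)^{-1}\frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t}\right) = \mathrm{Tr}\left(\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon\Gamma_\varepsilon'\right). \tag{5.38}
$$

$$
\begin{aligned}
\frac{\partial\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\varepsilon_t\right)}{\partial h_t} &= -\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t}\left(\Sigma_\varepsilon(h_t)\right)^{-1}\varepsilon_t \\
&= -\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\Gamma_\varepsilon'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\varepsilon_t = -\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)^2.
\end{aligned} \tag{5.39}
$$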

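Together, these form the derivative with respect to ht, which is the fourth element of the score vector:

$$
\frac{\partial \ell_t}{\partial h_t} = -\frac{1}{2}\mathrm{Tr}\left(\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\right) + \frac{1}{2}\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)^2. \tag{5.40}
$$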
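In the common disturbance specification, Σε(ht) = Σε + htΓεΓ′ε is a diagonal matrix plus a rank-one term. Its inverse and log-determinant are therefore cheap to compute through the Sherman-Morrison formula and the matrix determinant lemma. The Python sketch below uses these identities (an implementation choice of this sketch, not necessarily what was used in this thesis) to evaluate (5.40). It checks the result against a finite-difference derivative of (5.35). Σε is taken to be diagonal, and all numbers are hypothetical.

```python
import math


def sigma_inv_times(v, d, gamma, h):
    """Sigma(h)^-1 v for Sigma(h) = diag(d) + h * gamma gamma', via Sherman-Morrison."""
    Dv = [vi / di for vi, di in zip(v, d)]
    Dg = [gi / di for gi, di in zip(gamma, d)]
    a = sum(g * dg for g, dg in zip(gamma, Dg))       # gamma' D^-1 gamma
    proj = sum(g * dv for g, dv in zip(gamma, Dv))    # gamma' D^-1 v
    return [x - h * y * proj / (1 + h * a) for x, y in zip(Dv, Dg)]


def loglik_cd(eps, d, gamma, h):
    """Gaussian log-likelihood (5.35) with Sigma(h) = diag(d) + h gamma gamma'."""
    N = len(eps)
    a = sum(g * g / di for g, di in zip(gamma, d))
    logdet = sum(math.log(di) for di in d) + math.log(1 + h * a)   # determinant lemma
    quad = sum(e * z for e, z in zip(eps, sigma_inv_times(eps, d, gamma, h)))
    return -0.5 * N * math.log(2 * math.pi) - 0.5 * logdet - 0.5 * quad


def score_h(eps, d, gamma, h):
    """(5.40): -1/2 Tr(Sigma^-1 gamma gamma') + 1/2 (eps' Sigma^-1 gamma)^2."""
    sg = sigma_inv_times(gamma, d, gamma, h)
    trace = sum(g * s for g, s in zip(gamma, sg))     # Tr(Sigma^-1 gamma gamma') = gamma' Sigma^-1 gamma
    u = sum(e * s for e, s in zip(eps, sg))           # eps' Sigma^-1 gamma
    return -0.5 * trace + 0.5 * u * u


eps = [0.3, -0.1, 0.4, 0.2]
d = [0.05, 0.04, 0.03, 0.02]
gamma = [1.2, 1.0, 0.9, 0.8]
h = 0.15
step = 1e-6
fd = (loglik_cd(eps, d, gamma, h + step) - loglik_cd(eps, d, gamma, h - step)) / (2 * step)
analytic = score_h(eps, d, gamma, h)
```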

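Combined with the derivative with respect to βt derived in (5.14), this forms:

$$
\frac{\partial \ell_t}{\partial f_t^*} = \left[\frac{\partial \ell_t}{\partial \beta_t'}, \frac{\partial \ell_t}{\partial h_t}\right]'. \tag{5.41}
$$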


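To derive the scaling matrix, i.e. the inverse information matrix, all terms are differentiated with respect to the factors once more. This results in:

$$
\frac{\partial^2 \ell_t}{\partial f_t^*\,\partial f_t^{*\prime}} = \begin{bmatrix}\dfrac{\partial^2 \ell_t}{\partial\beta_t\,\partial\beta_t'} & \dfrac{\partial^2 \ell_t}{\partial\beta_t\,\partial h_t}\\[1ex] \dfrac{\partial^2 \ell_t}{\partial h_t\,\partial\beta_t'} & \dfrac{\partial^2 \ell_t}{\partial h_t^2}\end{bmatrix}. \tag{5.42}
$$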

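The individual parts are given by:

$$
\begin{aligned}
\frac{\partial\,\mathrm{Tr}\left(\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\right)}{\partial h_t} &= \mathrm{Tr}\left(\frac{\partial \Sigma_\varepsilon^{-1}(h_t)}{\partial h_t}\Gamma_\varepsilon\Gamma_\varepsilon'\right) = \mathrm{Tr}\left(-\Sigma_\varepsilon^{-1}(h_t)\frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t}\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\right) \\
&= -\mathrm{Tr}\left(\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\right) = -\mathrm{Tr}\left(\left(\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\right)^2\right).
\end{aligned} \tag{5.43}
$$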


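$$
\begin{aligned}
\frac{\partial\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)}{\partial h_t} &= \varepsilon_t'\frac{\partial\left(\Sigma_\varepsilon(h_t)\right)^{-1}}{\partial h_t}\Gamma_\varepsilon = -\varepsilon_t'\Sigma_\varepsilon(h_t)^{-1}\frac{\partial \Sigma_\varepsilon(h_t)}{\partial h_t}\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon \\
&= -\varepsilon_t'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon.
\end{aligned} \tag{5.44}
$$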


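$$
\begin{aligned}
\frac{\partial\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)^2}{\partial h_t} &= 2\cdot\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\,\frac{\partial\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)}{\partial h_t} \\
&= -2\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)\left(\varepsilon_t'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon\right)\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon \\
&= -2\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)^2\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon.
\end{aligned} \tag{5.45}
$$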

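Together these parts form:

$$
\frac{\partial^2 \ell_t}{\partial h_t^2} = \frac{1}{2}\mathrm{Tr}\left(\left(\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\right)^2\right) - \left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)^2\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon. \tag{5.46}
$$

We take expectations and use $E_{t-1}\left[\left(\varepsilon_t'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon\right)^2\right] = \Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon$. This gives:

$$
\mathcal{I}_{t|t-1} = -E_{t-1}\left[\frac{\partial^2 \ell_t}{\partial h_t^2}\right] = -\frac{1}{2}\mathrm{Tr}\left(\left(\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\right)^2\right) + \left(\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon\right)^2 = \frac{1}{2}\left(\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon\right)^2. \tag{5.47}
$$

The last step uses that ΓεΓ′ε has rank one, so that $\mathrm{Tr}\left(\left(\Sigma_\varepsilon^{-1}(h_t)\Gamma_\varepsilon\Gamma_\varepsilon'\right)^2\right) = \left(\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}\Gamma_\varepsilon\right)^2$.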


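For the cross-derivative we obtain:

$$
\frac{\partial\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)}{\partial \beta_t'} = -\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}X_t, \tag{5.48}
$$

$$
\begin{aligned}
\frac{\partial^2 \ell_t}{\partial h_t\,\partial \beta_t'} &= \frac{1}{2}\frac{\partial\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)^2}{\partial \beta_t'} = \frac{1}{2}\cdot 2\cdot\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\,\frac{\partial\left(\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\right)}{\partial \beta_t'} \\
&= -\varepsilon_t'\left(\Sigma_\varepsilon(h_t)\right)^{-1}\Gamma_\varepsilon\Gamma_\varepsilon'\Sigma_\varepsilon(h_t)^{-1}X_t.
\end{aligned} \tag{5.49}
$$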

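Since $E_{t-1}[\varepsilon_t] = 0$, taking expectations gives:

$$
E_{t-1}\left[\frac{\partial^2 \ell_t}{\partial h_t\,\partial \beta_t'}\right] = 0. \tag{5.50}
$$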
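The expected information in (5.47) and the zero cross term in (5.50) can be checked by simulation. The Python sketch below draws εt ~ N(0, Σε(ht)) for a small hypothetical example with diagonal Σε. It compares the closed form ½(Γ′εΣε(ht)−1Γε)² with the Monte Carlo averages of the squared score (5.40) and of minus the second derivative (5.46). It relies on the Sherman-Morrison identities Γ′εΣε(ht)−1Γε = a/(1 + hta) and ε′tΣε(ht)−1Γε = b/(1 + hta), with a = Γ′εΣε−1Γε and b = ε′tΣε−1Γε. The number of draws and the seed are arbitrary.

```python
import math
import random


def info_h_closed_form(d, gamma, h):
    """(5.47): 0.5 * (gamma' Sigma(h)^-1 gamma)^2 with Sigma(h) = diag(d) + h gamma gamma'."""
    a = sum(g * g / di for g, di in zip(gamma, d))
    c = a / (1 + h * a)
    return 0.5 * c * c, c


def info_h_monte_carlo(d, gamma, h, draws=60000, seed=1):
    """Monte Carlo averages of the squared score, minus (5.46), and eps' Sigma^-1 gamma."""
    rng = random.Random(seed)
    a = sum(g * g / di for g, di in zip(gamma, d))
    _, c = info_h_closed_form(d, gamma, h)
    sq_score = neg_hess = sum_u = 0.0
    for _ in range(draws):
        w = rng.gauss(0.0, 1.0)                        # common disturbance
        eps = [math.sqrt(di) * rng.gauss(0.0, 1.0) + math.sqrt(h) * g * w
               for di, g in zip(d, gamma)]
        b = sum(e * g / di for e, g, di in zip(eps, gamma, d))
        u = b / (1 + h * a)                            # eps' Sigma(h)^-1 gamma
        score = -0.5 * c + 0.5 * u * u                 # (5.40)
        sq_score += score * score
        neg_hess += -(0.5 * c * c - u * u * c)         # minus (5.46), with Tr((.)^2) = c^2
        sum_u += u                                     # the factor that drives (5.49)
    return sq_score / draws, neg_hess / draws, sum_u / draws


d = [0.05, 0.04, 0.03]
gamma = [1.2, 1.0, 0.9]
h = 0.15
closed, c = info_h_closed_form(d, gamma, h)
mc_sq_score, mc_neg_hess, mc_mean_u = info_h_monte_carlo(d, gamma, h)
```

Both Monte Carlo averages approximate the same quantity, which illustrates the information matrix equality. The mean of ε′tΣε(ht)−1Γε close to zero mirrors (5.50).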

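Because ht is a variance, it needs to be positive. We therefore use the parametrization ht = exp(αt) and take αt as the modeled factor. This requires the score and information matrix with respect to the new factor vector $\tilde f_t = (\beta_{1t}, \beta_{2t}, \beta_{3t}, \alpha_t)'$. Because this is an invertible mapping, the following rule for the scaled score can be applied (Creal et al., 2008). Let:

$$
\tilde f_t = g(f_t), \tag{5.51}
$$

$$
\dot g_t = \frac{\partial g(f_t)}{\partial f_t'}, \tag{5.52}
$$

and let ġt be invertible. We then have:

$$
\tilde\nabla_t = \frac{\partial \ell_t}{\partial \tilde f_t} = \left(\frac{\partial f_t}{\partial \tilde f_t'}\right)'\frac{\partial \ell_t}{\partial f_t} = \left(\dot g_t'\right)^{-1}\nabla_t. \tag{5.53}
$$

Thus the scaled score is given by:

$$
\tilde s_t = \left(E_{t-1}\left[\left(\dot g_t'\right)^{-1}\nabla_t\nabla_t'\,\dot g_t^{-1}\right]\right)^{-1}\left(\dot g_t'\right)^{-1}\nabla_t = \dot g_t\,\mathcal{I}_{t|t-1}^{-1}\dot g_t'\left(\dot g_t'\right)^{-1}\nabla_t = \dot g_t\,s_t. \tag{5.54}
$$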


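Here the mapping is clearly invertible for ht > 0, so the rule can be applied with:

$$
g(f_t^*) = \left(\beta_{1t}, \beta_{2t}, \beta_{3t}, \log(h_t)\right)', \tag{5.55}
$$

$$
\dot g_t^* = \frac{\partial g(f_t^*)}{\partial(\beta_{1t}, \beta_{2t}, \beta_{3t}, h_t)} = \begin{bmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & h_t^{-1}\end{bmatrix}. \tag{5.56}
$$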
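The rule (5.54) can be verified on a scalar example. The Python sketch below uses a hypothetical single observation e ~ N(0, h) as a stand-in for ℓt. It computes the scaled score with respect to h and rescales it by ġ = h−1, as in (5.56). It then compares the result with the scaled score obtained directly with respect to α = log h. The information for α, −E[∂²ℓ/∂α²] = ½, follows from differentiating twice. Finally, it applies the 4 × 4 matrix ġ∗t of (5.56) to a scaled score vector.

```python
import math


def loglik_h(e, h):
    """Log-density of a scalar N(0, h) observation, a stand-in for l_t."""
    return -0.5 * math.log(2 * math.pi * h) - 0.5 * e * e / h


def scaled_score_h(e, h):
    """Scaled score w.r.t. h: (dl/dh) / I_h with I_h = 1 / (2 h^2); equals e^2 - h."""
    grad = -0.5 / h + 0.5 * e * e / h ** 2
    return grad / (0.5 / h ** 2)


def scaled_score_alpha_direct(e, alpha, step=1e-6):
    """Scaled score w.r.t. alpha = log h from a numerical score and I_alpha = 1/2."""
    grad = (loglik_h(e, math.exp(alpha + step)) - loglik_h(e, math.exp(alpha - step))) / (2 * step)
    return grad / 0.5


def apply_g_dot(s_star, h):
    """Multiply a scaled score for (beta1, beta2, beta3, h) by g_dot* = diag(1, 1, 1, 1/h)."""
    return s_star[:3] + [s_star[3] / h]


e, h = 0.7, 0.4
via_rule = scaled_score_h(e, h) / h            # g_dot * s_t as in (5.54)
direct = scaled_score_alpha_direct(e, math.log(h))
s_alpha_vec = apply_g_dot([0.1, -0.2, 0.05, scaled_score_h(e, h)], h)
```

Only the last element changes, because ġ∗t differs from the identity only in its (4, 4) entry.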

5.3.5 Common Time-Varying Volatility

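We now derive the score and information matrix for the diagonal variance specification htΦ proposed in section (5.3.5), where Φ is a diagonal matrix. The log-likelihood is given by:

$$
\ell_t(\theta) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log\left|h_t\Phi\right| - \frac{1}{2}h_t^{-1}\varepsilon_t'\Phi^{-1}\varepsilon_t. \tag{5.57}
$$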

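The score is given by:

$$
\begin{aligned}
\nabla_t = \frac{\partial \ell_t}{\partial h_t} &= -\frac{1}{2}\left(\mathrm{Tr}\left((h_t\Phi)^{-1}\Phi\right) - h_t^{-2}\varepsilon_t'\Phi^{-1}\varepsilon_t\right) \\
&= -\frac{1}{2}\left(N h_t^{-1} - h_t^{-2}\varepsilon_t'\Phi^{-1}\varepsilon_t\right),
\end{aligned} \tag{5.58}
$$

since $\mathrm{Tr}\left((h_t\Phi)^{-1}\Phi\right) = \mathrm{Tr}\left(h_t^{-1}I_N\right) = N h_t^{-1}$. The second derivative is then given by:

$$
\frac{\partial^2 \ell_t}{\partial h_t^2} = -\frac{1}{2}\left(-N h_t^{-2} + 2h_t^{-3}\varepsilon_t'\Phi^{-1}\varepsilon_t\right). \tag{5.59}
$$

Using $E_{t-1}\left[\varepsilon_t'\Phi^{-1}\varepsilon_t\right] = \mathrm{Tr}\left(\Phi^{-1}h_t\Phi\right) = N h_t$, the information is:

$$
\mathcal{I}_{t|t-1} = -E_{t-1}\left[\frac{\partial^2 \ell_t}{\partial h_t^2}\right] = \frac{1}{2}\left(-N h_t^{-2} + 2h_t^{-3}\cdot N h_t\right) = \frac{N}{2}h_t^{-2}, \tag{5.60}
$$

which equals $\frac{17}{2}h_t^{-2}$ for N = 17. The scaled score is:

$$
\mathcal{I}_{t|t-1}^{-1}\nabla_t = \frac{2h_t^2}{N}\cdot\left(-\frac{1}{2}\right)\left(N h_t^{-1} - h_t^{-2}\varepsilon_t'\Phi^{-1}\varepsilon_t\right) = \frac{1}{N}\left(\varepsilon_t'\Phi^{-1}\varepsilon_t - N h_t\right). \tag{5.61}
$$

This means that the variance is adjusted according to the weighted squared disturbance ε′tΦ−1εt, averaged over the N maturities, relative to the current variance ht. To ensure a positive variance we take ft = log(ht) as the factor. By (5.54) with ġt = h−1t, this gives:

$$
\mathcal{I}_{t|t-1}^{-1}\nabla_t = \frac{1}{N}\left(h_t^{-1}\varepsilon_t'\Phi^{-1}\varepsilon_t - N\right) = \frac{h_t^{-1}\varepsilon_t'\Phi^{-1}\varepsilon_t}{N} - 1. \tag{5.62}
$$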
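The following minimal Python sketch implements (5.58) to (5.62), assuming a diagonal Φ and made-up numbers. It checks the score against a finite-difference derivative of (5.57). It also verifies by simulation that the scaled score for log ht has mean zero when εt ~ N(0, htΦ), which is what makes it a sensible driving term for the updating equation. N = 17 matches the number of maturities used here. The values of Φ and ht are hypothetical.

```python
import math
import random


def tvv_loglik(eps, phi, h):
    """(5.57) with variance matrix h * diag(phi)."""
    N = len(eps)
    q = sum(e * e / p for e, p in zip(eps, phi))
    return (-0.5 * N * math.log(2 * math.pi)
            - 0.5 * (N * math.log(h) + sum(math.log(p) for p in phi))
            - 0.5 * q / h)


def tvv_scores(eps, phi, h):
    """Score (5.58), information (5.60) and scaled scores (5.61)-(5.62)."""
    N = len(eps)
    q = sum(e * e / p for e, p in zip(eps, phi))       # eps' Phi^-1 eps
    grad = -0.5 * (N / h - q / h ** 2)
    info = N / (2 * h ** 2)
    s_h = grad / info                                   # = q / N - h
    s_logh = s_h / h                                    # = q / (N h) - 1
    return grad, info, s_h, s_logh


N = 17
phi = [1.0 - 0.02 * i for i in range(N)]                # hypothetical diagonal of Phi
h = 1.5
rng = random.Random(7)
eps = [math.sqrt(h * p) * rng.gauss(0.0, 1.0) for p in phi]
grad, info, s_h, s_logh = tvv_scores(eps, phi, h)

step = 1e-6
fd = (tvv_loglik(eps, phi, h + step) - tvv_loglik(eps, phi, h - step)) / (2 * step)

draws = 20000
mean_s_logh = sum(
    tvv_scores([math.sqrt(h * p) * rng.gauss(0.0, 1.0) for p in phi], phi, h)[3]
    for _ in range(draws)) / draws
```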

Chapter 6

Results

In this chapter we first discuss the in-sample estimation results by means of the RMSE and likelihood values. Likelihood ratio statistics and the Akaike and Bayesian information criteria are used to justify the size of the models. Further, we compare the GAS estimated models to the Kalman estimated models using the Rivers-Vuong (2002) test. Finally, we describe model-specific estimation results and observations.

Table 6.1 gives an overview of the estimated models.

Table 6.1: Overview of Estimated Models

| Name | Model | Description | Section |
|---|---|---|---|
| g-AR | GAS | Gaussian DNS with diagonal coefficient matrices in the updating equation | 5.3.1 |
| g-VAR | GAS | Gaussian DNS with full coefficient matrices in the updating equation | 5.3.1 |
| g-λ-AR | GAS | Gaussian DNS with variable λt and diagonal coefficient matrices in the updating equation | 5.3.3 |
| g-TVV-AR | GAS | Gaussian DNS with common time-varying volatility and diagonal coefficient matrices in the updating equation | 5.3.5 |
| g-CD-AR | GAS | Gaussian DNS with common disturbance with time-varying volatility and diagonal coefficient matrices in the updating equation | 5.3.4 |
| g-BC-AR | GAS | Gaussian four-factor Bjork-Christensen with diagonal coefficient matrices in the updating equation | 5.3.2 |
| t-AR | GAS | Student-t DNS with diagonal coefficient matrices in the updating equation | 5.3.2 |
| k-AR | Kalman filter | DNS with diagonal coefficient matrix | App. B |
| k-VAR | Kalman filter | DNS with full coefficient matrix | App. B |


Results 42

6.1 In-Sample Performance

Because the models estimated with the GAS framework are nested, we test the statistical significance of the extensions using the likelihood ratio (LR) statistic. The models estimated with the Kalman filter, however, rest on different distributional assumptions and are therefore non-nested. We compare them based on their residuals, using the Rivers-Vuong test to compare the lack of fit of these non-nested competing models. We compare the Gaussian GAS model with (un)correlated factors to its Kalman counterpart, i.e. g-(V)AR vs. k-(V)AR. As the lack-of-fit measure we use the trace mean squared error (tMSE).

Table 6.2 presents the log-likelihood values, the Akaike (AIC) and Bayesian (BIC) information criteria, and the likelihood ratio (LR) and Rivers-Vuong (RV) statistics. Table A.1 presents the RMSEs and the tRMSE.

The likelihood values and RMSEs show that the specification with full A and B matrices (correlated factors, VAR) only marginally improves the fit of the model. The LR statistic and the AIC favor it. The BIC, however, does not, because it is higher for g-VAR than for g-AR.

Further, the model with variable λ, which is only estimated with diagonal A and B (AR), fits marginally better than the models with static λ. This holds relative to both the diagonal and the full specification, and it follows from the likelihood-based measures as well as from the RMSE.

The specifications with time-varying volatility lead to a huge increase in the log-likelihood value, and the resulting likelihood ratio statistics indicate highly significant improvements in fit. The Akaike and Bayesian information criteria also favor these extensions, despite the larger number of parameters. The RMSEs, however, do not show such an improvement. The specification with a common disturbance performs worse based on RMSE. The specification with a common factor driving the volatility only has a slightly smaller RMSE for the 3- and 6-month maturities and performs slightly worse for all other maturities.

Moreover, the diagonal Bjork and Christensen (1999) extension, which adds a fourth factor, improves the fit about as much as the variable-λ specification, or even slightly more, based on likelihood and RMSE. It does so with the same number of additional parameters.

The specification with Student-t disturbances appears to fit even better. It increases the log-likelihood by over 40% compared to the basic Gaussian model, which would suggest the best fit of all models, even without a time-varying volatility specification or nonlinearities. The RMSEs of this specification, however, indicate the opposite: the estimated errors increase consistently.

Further, the RMSEs show that the GAS estimated models, with both diagonal and full A and B, fit better than the Kalman estimated models. They have smaller RMSEs for all maturities. Moreover, the Rivers-Vuong statistics for the tMSE are significant for both the diagonal and the full specification comparisons.

| Model | ℓ(θ) | #θ | AIC | BIC | LR | RV |
|---|---|---|---|---|---|---|
| g-AR | -4997 | 27 | 10048 | 10211 | | |
| g-VAR | -4967 | 39 | 10011 | 10246 | 61* | |
| g-λ-AR | -4906 | 30 | 9871 | 10052 | 183* | |
| g-TVV-AR | -3595 | 29 | 7570 | 7745 | 2482* | |
| g-CD-AR | -3258 | 46 | 6608 | 6885 | 3478* | |
| g-BC-AR | -4890 | 30 | 9840 | 10021 | 214* | |
| t-AR | -2712 | 28 | 5480 | 5649 | 4570* | |
| k-AR | 13130 | 27 | -26205 | -26043 | | -2.43* |
| k-VAR | 13139 | 33 | -26212 | -26014 | 18.6* | -2.02* |

Table 6.2: In-sample fit statistics for the period January 1970 to December 2009. The table reports log-likelihood values, the number of parameters, the Akaike Information Criterion, the Bayesian Information Criterion, and the Likelihood-Ratio and Rivers-Vuong statistics. For the GAS estimated models, the Likelihood-Ratio statistic compares each model with the diagonal AR specification (g-AR). Similarly, the two Kalman estimated models are compared with each other. The Rivers-Vuong statistic is calculated for g-(V)AR vs. k-(V)AR. * indicates significance at 99% confidence.
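The AIC and LR columns of table 6.2 follow directly from the log-likelihoods and parameter counts, with AIC = 2k − 2ℓ(θ) and LR = 2(ℓextended − ℓbase). The short Python sketch below reproduces, for example, the g-AR, g-CD-AR, g-BC-AR and t-AR entries. The BIC replaces the penalty 2k by k ln n, with n the number of observations entering the likelihood, and is not recomputed here.

```python
def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik (lower is better)."""
    return 2 * k - 2 * loglik


def lr_stat(loglik_extended, loglik_base):
    """Likelihood ratio statistic 2 * (l_extended - l_base) for nested models."""
    return 2 * (loglik_extended - loglik_base)


# (log-likelihood, number of parameters) as reported in table 6.2
models = {
    "g-AR": (-4997, 27),
    "g-CD-AR": (-3258, 46),
    "g-BC-AR": (-4890, 30),
    "t-AR": (-2712, 28),
}
base_ll = models["g-AR"][0]
summary = {name: (aic(ll, k), lr_stat(ll, base_ll)) for name, (ll, k) in models.items()}
```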

Next, we present the results and observations for the individual estimated models.

6.1.1 Two-Step

By fitting the DNS in two steps we obtain initial estimates of the autoregressive part of the model, for both the AR and the VAR specification. Given λ = 0.062, we obtain the optimal βt for each period, which yields an estimated curve with corresponding errors. Fitting an autoregressive model to these βt estimates gives initial values for further model estimation. The errors still show high autocorrelation (ρ1 = 0.73), which indicates that a common source of error or an additional factor might be driving the dynamics. The high kurtosis (the tail heaviness of the distribution) also points towards Student-t disturbances.
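The two-step procedure can be sketched as follows. The sketch assumes ordinary least squares in both steps (the usual choice in two-step DNS estimation) and an AR(1) with intercept per factor, as in the diagonal specification. The noise-free simulated panel, the maturities and the AR coefficients in the example are hypothetical. With exact data both steps recover the true values, which makes the sketch easy to check.

```python
import math


def ns_loadings(tau, lam):
    e = math.exp(-lam * tau)
    slope = (1.0 - e) / (lam * tau)
    return [1.0, slope, slope - e]


def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            M[r] = [a - fac * v for a, v in zip(M[r], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    """OLS coefficients via the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def two_step(yields, taus, lam=0.062):
    """Step 1: cross-sectional OLS of each curve on the loadings.
    Step 2: AR(1) with intercept, fitted by OLS to each estimated factor series."""
    X = [ns_loadings(t, lam) for t in taus]
    betas = [ols(X, curve) for curve in yields]
    ar = []
    for k in range(3):
        z = [b[k] for b in betas]
        ar.append(ols([[1.0, prev] for prev in z[:-1]], z[1:]))
    return betas, ar


# Hypothetical noise-free panel: the factors follow exact AR(1) recursions.
taus = [3, 6, 12, 24, 36, 60, 84, 120]
true_ar = [(0.5, 0.9), (-0.1, 0.8), (0.05, 0.7)]
X_true = [ns_loadings(t, 0.062) for t in taus]
beta_t = [6.0, -2.0, 1.0]
panel = []
for _ in range(30):
    panel.append([sum(x * b for x, b in zip(row, beta_t)) for row in X_true])
    beta_t = [c + p * b for (c, p), b in zip(true_ar, beta_t)]
betas, ar = two_step(panel, taus)
```

The residuals of the first step are the errors whose autocorrelation and kurtosis are discussed above.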

6.1.2 Kalman

Estimating the model with the Kalman filter, starting from the initial values of the two-step approach, results in estimates that should be even closer to the optimal values of the GAS estimated models. These results also serve as a benchmark against which we compare the GAS estimated models.

This specification is estimated with both a diagonal (k-AR) and a full (k-VAR) coefficient matrix. The estimated coefficients of the VAR specification show that the factors do not affect each other much, although the estimates of the off-diagonal entries of the coefficient matrix are significant.

The log-likelihood values of the Kalman filter estimated models look far more favorable than those of the GAS estimated models: they are about 18,000 points higher than g-AR and g-VAR in table 6.2. This results from the way uncertainty is absorbed by the disturbances of the state equation. The small variance matrix that consequently enters the log-likelihood function yields far higher (positive) log-likelihood values. However, because of the different model assumptions, the Kalman estimated models are not nested in the GAS estimated models, and their likelihood values cannot be compared directly. The validity of these assumptions has not been tested (and is difficult to test), but they are likely violated, especially the assumption that all noise entering the system is white noise. The errors estimated with the Kalman filter still show some autocorrelation (ρ1 = 0.26), which again suggests a time-varying volatility specification or an additional factor. The disturbances again have very high correlations, which would indicate an additional factor or a common disturbance. The excess kurtosis is also high, ranging from 19 for the shortest maturity to 4 for the 10-year maturity, which again suggests the use of Student-t distributed errors.

6.1.3 Gaussian

Optimization of the GAS estimated models starts at the optimal values estimated with the Kalman filter. This specification is estimated with both a diagonal and a full A and B coefficient matrix. Tables A.4 and A.5 present the estimated coefficients. The high estimated autoregressive coefficients reflect the high persistence of the yields. The full specification gives a significant increase in fit, and the estimates of the off-diagonal elements are also significant. Still, the off-diagonal entries of B, the coefficients for the lagged values of the factors, are close to zero. This is expected, as the factors closely resemble the first three principal components, which are orthogonal to each other and should therefore hardly affect each other. Only the estimates of the off-diagonal elements of A, the coefficient matrix for the scaled score, take on very different values, in particular the entries that affect the third factor, the curvature component. The constant ω also takes on a very different (negative) value for the curvature factor, which may compensate for the effects of the score coefficients. Through these dynamics, however, the model acquires a different economic interpretation, with a decreasing medium-term component.

The estimated error statistics show that the model errors have low autocorrelation (ρ1 = 0.05), which indicates that the GAS updating equation absorbs shocks in the factors well. Still, the errors within each time period are highly correlated. This can be explained by a common disturbance or by some other source driving these correlations.

Further, the estimated disturbances have low skewness, but the residuals again have high kurtosis. As before, the kurtosis decreases with maturity, from 24 for the shortest maturities to 4 for the longest maturity. This again points towards the use of Student-t disturbances. Moreover, Jarque-Bera and Kolmogorov-Smirnov tests on the residual vectors of each maturity reject normality (99.9% confidence).

6.1.4 Student-t

Table A.9 presents the estimated coefficients of the Student-t specification. The simple DNS specification with Student-t distributed disturbances shows a huge increase in likelihood value. This increase can be attributed to the heavy-tailed nature of this distribution: due to the fatter tails, outliers or big shocks to the yields are regarded as more probable. A heavy-tailed distribution is not an odd assumption. Yields are highly persistent and deviations are usually small, but when shocks appear they are sizeable, which indeed suggests a heavy-tailed distribution. The disturbances estimated with the two-step and Kalman approaches also pointed to such an assumption.

However, as a result of this heavy-tailed assumption, the score of the multivariate t distribution prevents the factor dynamics from reacting too fiercely to large values of |εt|. This makes sense, because such large values might easily be due to the fat-tailed nature of the data and should not be fully attributed to changes in the factors. In reality, though, these shocks affect the yields for longer periods of time. They should therefore be incorporated in the factors that are assumed to drive the yield dynamics, since the first three principal components explain 99% of the yield variation. Under Student-t disturbances the reaction to such shocks is less fierce. This reasoning might explain why the estimated errors of the Student-t model are in fact larger than those of the Gaussian model, despite the higher likelihood value.

Another explanation for the larger estimation errors might be the scaling of the score in this model specification. This scaling is not fully derived, because the expectation of the Hessian is difficult to determine. Instead we use an approximation: the scaling of the Gaussian model or the smoothed Hessian of the log-likelihood function (section 5.3.2). It is unlikely that this is the main reason for the larger estimated errors in this model. In the 'simple' specification that is evaluated, the scaling of the score is time-invariant, so small deviations should be compensated through the optimization of the coefficient matrix A of the updating equation (2.20).

Because of this inability to quickly incorporate structural shocks into the factors, we see high autocorrelation (ρ1 = 0.45) in the residuals. The residual plots show that this autocorrelation is indeed mainly the result of larger shocks. After such a shock, the updating mechanism takes several periods to adjust the factors so that they correctly represent the yield curve. Furthermore, the correlations in the estimated errors are still high.


Fitting the Nelson-Siegel model to the cross-sections using NLS indicates that the optimal λ varies considerably over time. We use a constrained optimization that limits λ to values between 0.01 and 0.5. Figure 6.1 plots the estimated values of λt against the optimal λ from the NLS. The GAS estimate λt roughly tracks the NLS estimate. The maximum likelihood estimation of this nonlinear model specification was considerably more difficult. The MATLAB routine has more difficulty with the maximization because of the nonlinearities. A further reason is that the Fisher information in this specification depends on the time-varying parameters βt, and because of this dynamic dependence the information matrix is often (numerically) ill-conditioned. We therefore use the information smoothing scheme of section 5.2.3. This reduces the problems with the information matrix, but the extra parameter makes the system even more difficult to estimate. We therefore initially fix the smoothing parameter at α = 0.2. We also try to overcome these singularity issues by adding a small multiple of the identity matrix to the information matrix.
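The cross-sectional NLS fit of λ can be sketched by profiling out the factors. For a given λ the loadings are fixed and β follows from OLS, so only a one-dimensional search over λ ∈ [0.01, 0.5] remains. The Python sketch below is an illustration and not the constrained MATLAB routine used here. It combines a coarse grid with a golden-section refinement. The grid size, tolerance, maturities and the noise-free test curve are assumptions.

```python
import math


def ns_loadings(tau, lam):
    e = math.exp(-lam * tau)
    slope = (1.0 - e) / (lam * tau)
    return [1.0, slope, slope - e]


def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            M[r] = [a - fac * v for a, v in zip(M[r], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def profile_ssr(lam, taus, y):
    """Sum of squared residuals after OLS on the loadings for a fixed lambda."""
    X = [ns_loadings(t, lam) for t in taus]
    beta = ols(X, y)
    ssr = sum((yi - sum(a * b for a, b in zip(row, beta))) ** 2 for row, yi in zip(X, y))
    return ssr, beta


def fit_lambda(taus, y, lo=0.01, hi=0.5, grid=50, tol=1e-9):
    """Grid search over [lo, hi], then golden-section refinement around the best grid point."""
    lams = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    best = min(range(grid), key=lambda i: profile_ssr(lams[i], taus, y)[0])
    a, b = lams[max(best - 1, 0)], lams[min(best + 1, grid - 1)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if profile_ssr(c, taus, y)[0] < profile_ssr(d, taus, y)[0]:
            b = d
        else:
            a = c
    lam = 0.5 * (a + b)
    return lam, profile_ssr(lam, taus, y)[1]


# Hypothetical noise-free cross-section generated with lambda = 0.0731.
taus = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]
true_beta = [7.0, -2.0, 1.5]
y = [sum(x * b for x, b in zip(ns_loadings(t, 0.0731), true_beta)) for t in taus]
lam_hat, beta_hat = fit_lambda(taus, y)
```

The golden-section step assumes that the profile sum of squares is unimodal between the neighbouring grid points. The flat or badly conditioned regions of λ discussed below are exactly where this assumption can fail.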

Figure 6.1: Plot of estimated λt vs. optimal λ from NLS on the cross-section (x-axis: Year; y-axis: Lambda; series: Lambda NLS and Lambda GAS)

Because of the dynamic λ, this specification has some estimation issues. Pooter (2007) observes similar issues in a four-factor Svensson specification, although estimated with the Kalman filter. He reports numerical problems and estimation issues for the nonlinear model structure, in which two λ's need to be estimated. Gilli et al. (2010) even argue that in certain parameter ranges the simple three-factor Nelson-Siegel model is badly conditioned, so that the estimated parameters are unstable. For many values of λ the factor loadings are correlated; in particular, the correlation between the second and third loading is high. It is therefore difficult to attribute yield curve shapes to specific factors. In their paper on the calibration of the NS model, they show the correlations between the factor loadings. They conclude that λ should lie in a range from 0.1 to 4 for maturities up to 10 years. In our case, with τ in months, this corresponds to λ from 0.021 to 0.83, i.e. from 1/(12 · 4) to 1/(12 · 0.1), as their decay parameter is the reciprocal of ours expressed in years. They also note that correlated regressors are not necessarily a problem in forecasting. In our setting, however, they are an issue, especially in the diagonal model specification. We do not want these regressors to be correlated, as this affects the factor estimates and results in extreme values and nonlinearities in these estimates. Our results show that λt indeed roughly stays within this range. Some estimation difficulties, however, can be explained by the optimization algorithm entering the problematic λ regions, which results in even more nonlinear and unpredictable behaviour.


Estimation reveals the sensitivity of this specification. With the variance matrix Σε(ht) = Σε + htΓΓ′, our model is extremely sensitive to shocks, and extreme values follow from these shocks. These extremes cause difficulties for the optimization routine. Since ΓΓ′ has rank one, its determinant is 0. This implies that when the diagonal elements of Σε go to zero, the variance matrix Σε(ht) becomes singular or numerically close to singular. Likewise, when ht goes to infinity, the effect of the well-conditioned diagonal matrix Σε diminishes, and Σε(ht) again becomes close to singular. Because Σε(ht) needs to be inverted, this ill-conditioning is a problem. We thus need to ensure that Σε stays well-conditioned, i.e. all its diagonal elements must stay bounded away from zero. This is not necessarily a problem, because we can simply bound them at a particular value. The common volatility, however, must also be kept away from extremes, even though it is theoretically unbounded. We therefore use a (generalized) logistic transformation from the factor ft to ht:

$$
h_t = lb + \frac{ub - lb}{1 + \exp(-f_t)}, \tag{6.1}
$$

where lb and ub are the lower and upper bound of the volatility, respectively.
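The following minimal Python sketch implements the transformation (6.1) and its derivative. Under the rule (5.54), with the unrestricted ft as the modeled factor, ġt = ∂ft/∂ht is the reciprocal of dht/dft. The scaled score with respect to ht is therefore divided by dht/dft. The numerically stable form of the logistic function and the bounds in the example are choices made for this sketch.

```python
import math


def sigmoid(f):
    """Numerically stable logistic function 1 / (1 + exp(-f))."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)
    return z / (1.0 + z)


def bounded_volatility(f, lb, ub):
    """(6.1): h = lb + (ub - lb) / (1 + exp(-f)); mathematically in (lb, ub),
    in floating point it saturates at the bounds for very large |f|."""
    return lb + (ub - lb) * sigmoid(f)


def dh_df(f, lb, ub):
    """Derivative of (6.1) with respect to f."""
    s = sigmoid(f)
    return (ub - lb) * s * (1.0 - s)


def scaled_score_f(s_h, f, lb, ub):
    """Scaled score for f from the scaled score for h, using g_dot = df/dh = 1 / (dh/df)."""
    return s_h / dh_df(f, lb, ub)


lb, ub = 0.01, 3.0
values = [bounded_volatility(f, lb, ub) for f in (-5.0, -1.0, 0.0, 1.0, 5.0)]
step = 1e-6
fd = (bounded_volatility(0.3 + step, lb, ub) - bounded_volatility(0.3 - step, lb, ub)) / (2 * step)
```

Because dht/dft vanishes near the bounds, the scaled score for ft becomes large there. This is one way in which the transformation can interact with the estimation difficulties described in this section.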

This specification for Σε(ht) clearly stems from a parameter-driven approach. It assumes two sources of error: one common disturbance, and one disturbance for each of the individual maturity yields. Together these two errors form the observed error. The problem with this combined error is that our observation-driven framework cannot distinguish the two errors, because the likelihood only depends on the combined variance matrix and the combined disturbances. The optimization algorithm therefore reduces the constant variance matrix (Σε) in the direction in which this increases the likelihood the most. Specifically, the likelihood is affected by the variance at time t through two components:

1. Minus the log determinant of the combined variance matrix, − log|Σε(ht)|. This term goes to infinity for an ill-conditioned matrix: |Σε(ht)| ≈ 0 leads to − log|Σε(ht)| ≈ ∞.

2. The second term penalizes the likelihood using the estimated combined disturbance εt and the inverted combined variance matrix (Σε(ht))−1, so the combined variance matrix cannot have a determinant equal to zero. For high values of ht this matrix becomes numerically close to singular. With an unbounded common variance it is therefore difficult to guard against such an ill-conditioned covariance matrix.

In practice the proposed transformation does not seem to work. The estimated volatilities in figure 6.2a show oscillating behaviour. The loadings (figure 6.2b), which pass on the effect of the common disturbance to the different yields, also first increase for maturities up to 20 months and then decrease. Theory and the data, however, suggest that volatilities should in general decrease with maturity. Despite the appealing likelihood values, it is safe to conclude that this specification does not work well.

Looking back, we could have stayed within a true observation-driven framework for modeling the errors. We would then have to assume a different parametrization for the errors, for example the one proposed by Creal et al. (2011b):

$$
\Sigma_t = D_t R_t D_t, \tag{6.2}
$$

where Dt is a diagonal standard deviation matrix and Rt is the (symmetric) correlation matrix. Either or both could be time-varying. This specification makes a clear distinction between the correlation and the volatility component. We could thus circumvent the problems of the common disturbance specification, which essentially assumes two sources of error without the possibility to distinguish them. The common disturbance in our specification leads to correlation in the yield disturbances. Through the proposed variance decomposition, this correlation is captured in the correlation component Rt. Unfortunately, experiments with this multivariate heavy-tailed model (based on the multivariate Student-t distribution) on the pre-filtered residuals were not successful. The dimensions of the data again turn out to be a big challenge. To estimate the ever-increasing number of time-varying parameters, matrices of size 289 × 289 need to be computed and inverted. This is computationally intensive, but above all sensitive to numerical errors. Even with positive definite matrix decompositions such as the Cholesky and LDL decompositions, problems occur in the matrix inversions. Estimation problems may also have been aggravated by the absence of very clear volatility dynamics in the residuals. As with the yields, these volatilities are highly persistent. The residuals used have reasonably low autocorrelation (ρ1 = 0.2), which possibly makes it even harder to identify the dynamics.
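The decomposition (6.2) can be illustrated with a short Python sketch. It builds Σt = DtRtDt from standard deviations and a correlation matrix. It then uses a Cholesky factorization as a positive-definiteness check, which is one way to detect the numerical problems mentioned above. The dimensions and values are hypothetical and far smaller than in the 17-maturity case.

```python
import math


def cov_from_sd_corr(sd, R):
    """Sigma = D R D with D = diag(sd), as in (6.2)."""
    n = len(sd)
    return [[sd[i] * R[i][j] * sd[j] for j in range(n)] for i in range(n)]


def cholesky(A):
    """Lower-triangular L with A = L L'; raises ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


sd = [0.30, 0.20, 0.10]                  # hypothetical volatilities, short to long maturity
R = [[1.0, 0.8, 0.6],
     [0.8, 1.0, 0.7],
     [0.6, 0.7, 1.0]]
Sigma = cov_from_sd_corr(sd, R)
L = cholesky(Sigma)

# Symmetric with unit diagonal, but not a valid (positive definite) correlation matrix.
bad_R = [[1.0, 0.9, 0.9],
         [0.9, 1.0, -0.9],
         [0.9, -0.9, 1.0]]
try:
    cholesky(cov_from_sd_corr(sd, bad_R))
    bad_is_pd = True
except ValueError:
    bad_is_pd = False
```

Scaling by Dt does not change positive definiteness, so the volatilities and the correlations can be updated separately as long as Rt itself stays a valid correlation matrix.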

Figure 6.2: Common Disturbance with Time-Varying Volatility

(a) Common Disturbance Volatility (x-axis: Year; y-axis: ht)

(b) Loadings Common Disturbance (x-axis: Maturity (in Months); y-axis: Loading)

6.1.7 Common Time-Varying Volatility

The results in table 6.2 suggest that the common volatility specification is quite successful. The likelihood values increase hugely, and the volatility and loading estimates are sensible. Periods with higher volatilities include, for example, the 1980s and periods of crisis (figure 6.3a). As expected, the disturbances of the different maturities have different volatility intensities: shorter-maturity yields are more volatile than longer-maturity yields (figure 6.3b). Further, the estimated errors show that shocks to the different maturities appear at the same time and in the same direction. This supports the assumption of a common volatility driver.


Figure 6.3: Common Time-Varying Volatility

(a) Volatility (x-axis: Year; y-axis: ht)

(b) Loadings Volatility (x-axis: Maturity (in Months); y-axis: Loading)

6.1.8 Bjork and Christensen Four-Factor Model

Finally, we estimate a four-factor extension of the DNS, a so-called Svensson extension. To avoid the additional estimation problems reported by Pooter (2007), we use the adjusted version with only one λ decay parameter (Bjork and Christensen, 1999). The model is only estimated in the diagonal specification. The results indicate that this specification further increases the likelihood value and decreases the RMSE.

6.1.9 Estimation Robustness

Due to the large number of parameters in the models used in this thesis, the optimization problems are high-dimensional. Further, because of the high persistence of the yields, the likelihood surface is very flat, and the likelihood functions can have multiple local maxima. As a result, the algorithms can have difficulty finding the global maximum. Sensitivity to initial values can be large for some models, especially if certain dynamics are only weakly present. The nonlinear extension with variable λ is difficult to estimate, and so are the simple models with full coefficient matrices. The optimization should therefore be repeated with different initial values until the best optimum found no longer improves. These concerns are worrying for the empirical use of the models. The popularity of Nelson-Siegel models stems partly from their ease of estimation. Without this convenience, they partly lose their advantage over theoretically more sound arbitrage-free models.
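The advice to repeat the optimization from different initial values can be sketched as a multi-start wrapper around a local optimizer. The Python sketch below uses compass (coordinate pattern) search as a simple derivative-free stand-in for the local optimizer, together with a hypothetical two-peaked objective. A single start near the lower peak stays there, while random starts within the box find the higher peak. The objective, box and number of starts are assumptions for illustration only.

```python
import math
import random


def compass_search(f, x0, step=0.5, tol=1e-8, max_iter=100000):
    """Local maximization: try +/- step along each coordinate, halve the step when stuck."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                z = list(x)
                z[i] += sign * step
                fz = f(z)
                if fz > fx:
                    x, fx, improved = z, fz, True
        if not improved:
            step /= 2.0
    return x, fx


def multistart(f, box, n_starts=40, seed=0):
    """Run the local optimizer from uniform random starts in the box and keep the best result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in box]
        x, fx = compass_search(f, x0)
        if best is None or fx > best[1]:
            best = (x, fx)
    return best


def objective(p):
    """Hypothetical 'likelihood' with a local peak near (1, 1) and a higher peak near (-2, -1)."""
    x, y = p
    return (math.exp(-((x - 1) ** 2 + (y - 1) ** 2))
            + 2.0 * math.exp(-2.0 * ((x + 2) ** 2 + (y + 1) ** 2)))


local_x, local_f = compass_search(objective, [1.0, 1.0])
best_x, best_f = multistart(objective, box=[(-4.0, 4.0), (-4.0, 4.0)])
```

More starts raise the chance of reaching the highest peak but cannot guarantee it. In practice the best value is compared across runs until it stops improving.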


6.1.10 In-Sample Conclusion

From the in-sample results we conclude that the more flexible model with variable λ fits the data slightly better than the standard model. A model with an additional fourth factor (a Svensson extension) fits the data even better and is considerably easier to estimate than the nonlinear variable-λt model. Moreover, estimation using the GAS framework results in a better in-sample fit than the Kalman filter estimated models. The specification with a common disturbance is not suited for GAS estimation. For the other model specifications we draw conclusions more carefully. The model with Student-t distributed disturbances seems an extension in the right direction, but in practice it has a drawback in the updating of the factors, and therefore it does not fit the data well. The extension with time-varying volatility and Gaussian disturbances hugely increases the log-likelihood value, but the estimated errors do not clearly indicate an improved fit. Nevertheless, the estimated volatilities could give insight into the confidence intervals surrounding the predictions.


6.2 Out-of-Sample Performance

In this section the forecasting performance of the models is compared, based on the RMSFE and Diebold-Mariano tests. Creal et al. (2008) and Koopman et al. (2010) do not compare the out-of-sample performance of their model extensions. However, besides fitting the current curve or describing the dynamics of the past, forecasting is an important practical application of term-structure models, so it is important to evaluate their forecasting capabilities.

In the previous section we saw that GAS estimated models lead to a better in-sample fit, based on their residuals (RMSE), than comparable Kalman estimated models. Moreover, the model extensions further improved the in-sample fit. Although the GAS estimation method and the extensions lead to a better fit, we may be over-fitting the data, which can worsen forecasting performance. We therefore also evaluate the models on their out-of-sample forecasting capabilities. Our newly proposed extensions of the GAS estimated models are compared, as well as the extensions by Creal et al. (2008) and Koopman et al. (2010). We also compare the predictive capacities of the Kalman estimated models with those of the GAS estimated models.

6.2.1 Forecast Procedure

Next, the forecasting procedures are introduced. The benchmark model to beat is the random walk (RW). In finance, the random walk is known to be very difficult to beat; for the term structure see, for example, Ang and Piazzesi (2003), Diebold and Li (2006), Duffee (2002) and Monch (2008). We therefore take it as the benchmark for assessing the forecasts generated with the estimated models. The random walk is given by:

$$y_t(\tau) = y_{t-1}(\tau) + \varepsilon_t(\tau), \qquad \varepsilon_t(\tau) \sim N\big(0, \sigma^2(\tau)\big), \tag{6.3}$$

for τ = 1, . . . , m. This leads to the h-months-ahead forecast:

$$\hat y_{t+h}(\tau) = y_t(\tau). \tag{6.4}$$

The forecasting procedure for the GAS and Kalman estimated models is as follows. First, the models are estimated over an estimation period, which yields the estimated model coefficients. These coefficients are then used to obtain filtered factors over the forecasting period. From these filtered factors, h-step-ahead forecasts of the yields are calculated at every point in time. Forecasts are generated for horizons of 1, 6 and 12 months, covering the short, medium and long term.

For the GAS estimated models the factors are predicted as follows. At time t, the updating equation (2.20) gives the factor prediction $f_{t+1}$, so the prediction for h = 1 is already available. For longer horizons, recursion gives:

$$\hat f_{t+h} = \Big(\sum_{i=1}^{h-1} B^{i-1}\Big)\omega + B^{h-1} f_{t+1}, \tag{6.5}$$

where the sum is zero for h = 1.
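The recursion behind (6.5) can be checked numerically. The sketch below assumes the GAS updating equation has the form f_{t+1} = ω + A s_t + B f_t with E_t[s_{t+k}] = 0 for k ≥ 1, so that each additional step only applies f ← ω + B f; it compares this iteration with the closed form. The function names are hypothetical, ω and B are the g-AR estimates of Table A.4, and the value used for f_{t+1} is made up for illustration.

```python
import numpy as np

# g-AR estimates from Table A.4 (diagonal B).
OMEGA = np.array([0.021, -0.027, -0.050])
B = np.diag([0.997, 0.991, 0.866])


def gas_forecast_iterative(omega, B, f_next, h):
    """h-step-ahead factor forecast given f_{t+1}, by repeated f <- omega + B f."""
    f = np.asarray(f_next, dtype=float)
    for _ in range(h - 1):
        f = omega + B @ f  # the expected score is zero, so only the AR part remains
    return f


def gas_forecast_closed(omega, B, f_next, h):
    """Closed form: (sum_{i=1}^{h-1} B^{i-1}) omega + B^{h-1} f_{t+1}."""
    k = B.shape[0]
    S = np.zeros((k, k))
    for i in range(1, h):
        S += np.linalg.matrix_power(B, i - 1)
    return S @ omega + np.linalg.matrix_power(B, h - 1) @ np.asarray(f_next, dtype=float)


f_next = np.array([5.0, -1.5, 0.5])  # hypothetical f_{t+1}
for h in (1, 6, 12):
    print(h, gas_forecast_iterative(OMEGA, B, f_next, h),
          gas_forecast_closed(OMEGA, B, f_next, h))
```

As h grows, the forecast converges to the unconditional mean (I − B)^{-1}ω, which is about 7 for the level factor with these estimates; with B this close to the identity, that convergence is very slow.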

For the Kalman filter the predictions are similar. The forecast from the Kalman state equation is given by:

$$\hat\beta_{t+h} = \Big(\sum_{i=0}^{h-1} A^{i}\Big)(I - A)\mu + A^{h}\beta_{t|t} = (I - A^{h})\mu + A^{h}\beta_{t|t}. \tag{6.6}$$

After calculating the factor forecasts, we obtain the yield forecast using:

$$\hat y_{t+h} = X_{t+h}\hat\beta_{t+h}. \tag{6.7}$$

For the linear specifications the loading matrix is time-invariant: $X_{t+h} = X(\lambda)$. For the variable-λ specification we first calculate $\lambda_{t+h}$ using equation (5.31) and then use this value to calculate the corresponding $X_{t+h}$ using equation (5.31). The yield predictions for the specification with time-varying volatility follow the same steps: because the expectation of the disturbance is zero, the forecasts depend only on the first three Nelson-Siegel factors.
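Equations (6.6) and (6.7) can be combined into a short sketch for the linear Kalman specification. It assumes the standard three-factor Nelson-Siegel loadings with τ in months and uses the k-AR estimates of Table A.2 for µ, A and λ; the filtered state β_{t|t} below is hypothetical, and the function names are illustrative.

```python
import numpy as np

MATURITIES = np.array([3, 6, 9, 12, 15, 18, 21, 24, 30, 36,
                       48, 60, 72, 84, 96, 108, 120], dtype=float)

# k-AR estimates from Table A.2.
MU = np.array([7.109, -1.699, -0.419])
A = np.diag([0.989, 0.960, 0.918])
LAM = 0.054


def ns_loadings(tau, lam):
    """Three-factor Nelson-Siegel loading matrix X(lambda): level, slope, curvature."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])


def kalman_factor_forecast(mu, A, beta_tt, h):
    """Factor forecast (6.6): (I - A^h) mu + A^h beta_{t|t}."""
    Ah = np.linalg.matrix_power(A, h)
    return (np.eye(len(mu)) - Ah) @ mu + Ah @ beta_tt


def yield_forecast(mu, A, beta_tt, lam, h, tau=MATURITIES):
    """Yield forecast (6.7) with a time-invariant loading matrix X(lambda)."""
    return ns_loadings(tau, lam) @ kalman_factor_forecast(mu, A, beta_tt, h)


beta_tt = np.array([4.5, -0.5, 1.0])  # hypothetical filtered state at the forecast origin
for h in (1, 6, 12):
    # forecasts for the 3-, 24- and 120-month maturities
    print(h, np.round(yield_forecast(MU, A, beta_tt, LAM, h)[[0, 7, 16]], 3))
```

For the variable-λ model the only change would be to replace the fixed `LAM` by the forecast λ_{t+h} before building the loadings.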

6.2.2 Forecast Measures

Forecasts are compared based on the following measures. First, the forecast error of model i is defined by:

$$\varepsilon^i_{t+h}(\tau) = \hat y^i_{t+h}(\tau) - y_{t+h}(\tau). \tag{6.8}$$

The RMSFE for each maturity is then given by:

$$\mathrm{RMSFE}(\tau) = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\big(\varepsilon_{t+h}(\tau)\big)^2}. \tag{6.9}$$

Further, we summarize the forecast errors over all N = 17 maturities by the trace root mean squared forecast error:

$$\mathrm{tRMSFE} = \sqrt{\frac{1}{T}\frac{1}{N}\sum_{t=1}^{T}\sum_{i=1}^{N}\big(\varepsilon_{t+h}(\tau_i)\big)^2}. \tag{6.10}$$
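The measures (6.9) and (6.10), together with the random-walk forecast (6.4) and the presentation relative to the RW used in Tables A.10-A.12, can be sketched as follows. The sketch assumes forecasts and realizations are stored as arrays of shape (T, N), with one row per forecast origin and one column per maturity; the function names and the toy numbers are hypothetical.

```python
import numpy as np


def rmsfe(forecasts, actuals):
    """Per-maturity RMSFE, eq. (6.9); inputs of shape (T, N)."""
    err = np.asarray(forecasts) - np.asarray(actuals)
    return np.sqrt(np.mean(err ** 2, axis=0))


def trmsfe(forecasts, actuals):
    """Trace RMSFE over all T forecasts and N maturities, eq. (6.10)."""
    err = np.asarray(forecasts) - np.asarray(actuals)
    return float(np.sqrt(np.mean(err ** 2)))


def random_walk_forecasts(yields, h):
    """RW forecast (6.4): y_t predicts y_{t+h}; returns aligned (forecasts, actuals)."""
    yields = np.asarray(yields, dtype=float)
    return yields[:-h], yields[h:]


# Toy yields: 4 months (rows) by 2 maturities (columns).
yields = np.array([[1.0, 2.0], [2.0, 4.0], [4.0, 8.0], [3.0, 6.0]])
rw_f, actual = random_walk_forecasts(yields, h=1)
model_f = actual + 0.5  # hypothetical model forecasts with a constant error of 0.5
print(rmsfe(rw_f, actual), trmsfe(rw_f, actual))
print(rmsfe(model_f, actual) / rmsfe(rw_f, actual))  # values below one: model beats the RW
```

Note that the tRMSFE equals the square root of the average squared per-maturity RMSFE, so it weights all maturities equally.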


Besides these descriptive measures, the Diebold-Mariano test (Diebold and Mariano, 2002) is used. The Diebold-Mariano statistic tests the hypothesis that the predictive performance of two models is the same. The loss function of the test is chosen to be the squared error, so that the test complements the analysis based on the RMSFE. Alternatively, the absolute error $|\varepsilon^i_{t+h}|$ could be used as the loss function. Under the null the performance of the two models is equal:

$$H_0: \; E\big[(\varepsilon^1_{t+h})^2\big] = E\big[(\varepsilon^2_{t+h})^2\big], \tag{6.11}$$

$$H_1: \; E\big[(\varepsilon^1_{t+h})^2\big] \neq E\big[(\varepsilon^2_{t+h})^2\big]. \tag{6.12}$$

The test statistic is given by:

$$S = \frac{\bar d}{\big(\frac{1}{T}\hat V(d)\big)^{1/2}} \sim N(0, 1), \tag{6.13}$$

where the loss differential is given by:

$$d_t = (\varepsilon^1_{t+h})^2 - (\varepsilon^2_{t+h})^2, \tag{6.14}$$

and its sample mean and long-run variance are given by:

$$\bar d = \frac{1}{T}\sum_{t=1}^{T} d_t, \tag{6.15}$$

$$\hat V(d) = \hat\gamma_0 + 2\sum_{j=1}^{\infty}\hat\gamma_j, \qquad \hat\gamma_j = \widehat{\mathrm{cov}}(d_t, d_{t-j}), \tag{6.16}$$

where $\hat V(d)$ gives a consistent estimate of the asymptotic (long-run) variance. The long-run variance is used because the loss differentials are serially correlated for h > 1. Under the null, the test statistic is asymptotically standard normally distributed. For the standard two-sided test we reject the null if $|S| > \Phi^{-1}(1 - \alpha/2)$, where Φ is the standard normal cumulative distribution function and α is the significance level.
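A minimal sketch of the statistic (6.13)-(6.16) is given below. In practice the infinite sum in (6.16) must be truncated; this sketch assumes the common choice of keeping autocovariances up to lag h − 1 with equal weights, motivated by the fact that optimal h-step forecast errors follow at most an MA(h − 1) process. With equal weights the variance estimate can become negative for h > 1, in which case the sketch raises an error (a Newey-West weighting would avoid this). The function names are hypothetical. Passing the RW errors as the first argument makes positive values indicate that the model outperforms the RW.

```python
import numpy as np
from scipy.stats import norm


def diebold_mariano(e1, e2, h):
    """DM statistic for squared-error loss, d_t = e1_t^2 - e2_t^2 (eq. 6.14).

    The long-run variance (6.16) is truncated at lag h - 1 with equal weights.
    """
    d = np.asarray(e1, dtype=float) ** 2 - np.asarray(e2, dtype=float) ** 2
    T = d.size
    d_bar = d.mean()                                             # eq. (6.15)
    dc = d - d_bar
    gammas = [np.dot(dc[j:], dc[:T - j]) / T for j in range(h)]  # gamma_0 .. gamma_{h-1}
    lrv = gammas[0] + 2.0 * sum(gammas[1:])
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    return d_bar / np.sqrt(lrv / T)                              # eq. (6.13)


def reject_equal_accuracy(stat, alpha=0.05):
    """Two-sided test: reject H0 if |S| > Phi^{-1}(1 - alpha/2)."""
    return abs(stat) > norm.ppf(1.0 - alpha / 2.0)


rng = np.random.default_rng(1)
e_rw = rng.normal(0.0, 2.0, size=500)     # hypothetical RW forecast errors
e_model = rng.normal(0.0, 1.0, size=500)  # hypothetical model forecast errors
S = diebold_mariano(e_rw, e_model, h=1)
print(S, reject_equal_accuracy(S))
```

For h = 1 no autocovariances beyond lag zero enter, and the statistic reduces to a simple t-statistic on the mean loss differential.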

We evaluate forecasting capabilities by dividing the data set into an initial estimation period and a forecasting period, using the end of the data set as the forecasting sample. We estimate and calibrate the models with the data from January 1970 till December 2004 (420 observations), and generate forecasts over the last five years of the data set: from January 2005 till December 2009 (60 observations). This period includes the US housing bubble and the start of the global financial crisis, and is therefore far from stable. The yield curve goes from a steady upward-sloping curve to a flat and even inverted curve with high interest rates. Interest rates then fall rapidly and the curve moves back to an upward-sloping shape. Because of these real-world dynamics, this period provides a demanding test of the capabilities of the DNS. We do not forecast with the specifications with a common disturbance and student-t distributed errors, as the in-sample results showed that these specifications do not give a credible representation of the yield curve.

6.2.3 Forecast Results

Tables (A.10)-(A.12) present the (t)RMSFE. Because the reference model is the RW, the RMSFEs of the models are reported relative to the RW, so values below one indicate that a model outperforms the RW. Tables (A.13)-(A.15) present the Diebold-Mariano statistics for all maturities at the three forecasting horizons. Positive (negative) values indicate that the model outperforms (is outperformed by) the RW.

The results show that forecasting errors generally increase with the forecasting horizon, which conforms to expectations. Also, yields of longer maturities have smaller forecasting errors than yields of shorter maturities, which is plausible given their lower estimated variance.

The forecast results for the chosen sub-sample are unambiguous: the random walk easily beats all the proposed models, for some maturities even significantly. Only in a few instances, at the shortest forecasting horizon of one month, are the DNS models able to outperform the naive RW. This may be a result of the chosen forecasting sub-sample, which appears to be too turbulent to forecast with the considered models (and possibly with any model).

Further, it could be argued that a different estimation sub-sample would give better forecasts. We chose to use the maximum sample period (January 1970 to December 2004) instead of a shorter, possibly arbitrary, period. Moreover, the persistent dynamics captured by the high values in the coefficient matrices are unlikely to change much for a different estimation sub-sample.

The forecasting results may point to an important property of the GAS framework. All estimated models have comparable coefficient estimates for the lagged values of the factors, for both the Kalman and GAS estimated versions (see the in-sample results in Appendix A), especially for the diagonal coefficient specifications. What makes the models different is how they correct for mis-specification. Both techniques use a similar one-step-ahead prediction framework: in each period the methods assess the deviations from expectations and adapt accordingly. The generated forecasts show that the GAS estimated models have slightly lower forecast errors than the Kalman estimated versions, especially at the shortest forecasting horizon of one month. The short horizon benefits most from the correction (see forecast equation 6.5). This correction might be beneficial at short forecasting horizons, but it could also lead to over-fitting and degrade forecasting performance at longer horizons. This possibility is suggested by the coefficient estimates of the full Gaussian specification estimated using the GAS framework (g-VAR). These estimates have a different economic meaning, with a clear downward trend in the curvature, compared to the estimates of the Kalman estimated version (k-VAR).

6.2.4 Out-of-Sample Conclusion

Overall, none of the models is able to significantly outperform the benchmark RW model. On the contrary, for the chosen sub-sample the DNS is easily beaten by the naive RW. Although both techniques use a similar one-step-ahead prediction framework that adapts to deviations from expectations, in none of the specifications are they able to correct for the changes in the yields and forecast them correctly. It should be noted that these conclusions are based on the chosen sub-sample. Over a longer period or in a more stable sub-sample the RW could possibly be beaten, but our chosen sub-sample provides no evidence for this.

Chapter 7

Conclusion

In this thesis we compare the in-sample and out-of-sample performance of a range of

different dynamic Nelson-Siegel specifications, estimated using the GAS framework. We

estimate some existing specifications and propose some new specifications. Furthermore,

we compare performance of the GAS estimated and Kalman estimated models.

Our first research question concerns the assumption of normality. Our analyses indicate that the assumption of normality does not hold. We therefore propose a DNS model specification with multivariate student-t distributed disturbances. Based on the model log-likelihood values, this extension gives a significant improvement in fit. However, based on the residuals we conclude that this specification does not result in a better model fit. This is a consequence of its slow reaction to large shocks, which can be attributed to the workings of the GAS updating mechanism: because of the fat tails of the student-t distribution, the updating mechanism reacts less strongly to large changes in the yields. Because these yield changes are mostly structural, we do not consider this a meaningful option for modeling and forecasting.
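The slower reaction to large shocks can be illustrated with a univariate location score. The sketch below is a simplification, assuming a scalar observation with unit scale rather than the multivariate specification of this thesis: for a Gaussian density the score is linear in the prediction error e, whereas for a student-t density with ν degrees of freedom it equals (ν + 1)e/(νσ² + e²), which shrinks back toward zero for large errors. The value ν = 4.176 is the estimate in Table A.9; the function names are hypothetical.

```python
import numpy as np


def gaussian_score(e, sigma2=1.0):
    """Derivative w.r.t. the location of the Gaussian log-density at error e = y - mu."""
    return e / sigma2


def student_t_score(e, nu, sigma2=1.0):
    """Derivative w.r.t. the location of the student-t log-density (scale sigma2)."""
    return (nu + 1.0) * e / (nu * sigma2 + e ** 2)


nu = 4.176  # degrees of freedom estimated in Table A.9
for e in (0.5, 1.0, 3.0, 10.0):
    print(e, gaussian_score(e), round(student_t_score(e, nu), 3))
```

A prediction error of 10 moves the Gaussian factor update ten times as much as an error of 1, while under the student-t score it moves the update by less than half as much; a structural shift in the yields is thus absorbed only slowly.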

The second question we answer is whether heteroskedasticity can be included in the model. We modify a time-varying volatility specification proposed by Koopman et al. (2010) in order to estimate it within the GAS framework. This specification is not suited for estimation using the GAS framework. The inability to estimate it correctly results from the assumption of two disturbance sources: in the chosen setup the GAS framework cannot distinguish between these sources, which leads to optimization problems. Furthermore, we propose a new time-varying volatility specification. This specification works well within the GAS framework and results in a better in-sample fit than a standard fixed-variance DNS model. The estimated volatilities of this specification could give insight into the confidence intervals surrounding predictions.


Additionally, other model extensions that take advantage of the possibilities of the GAS framework are investigated. A nonlinear model with a variable λ is estimated as proposed by Creal et al. (2008), as well as a four-factor Bjork-Christensen specification. From the in-sample performance we conclude that these more elaborate models, with nonlinearities or an additional fourth factor, fit the data better than the standard GAS estimated DNS. However, the estimation difficulties that result from the nonlinearities in the variable-λ model are a concern for empirical use. Moreover, the four-factor model fits better than the nonlinear extension.

The final research question concerns how the performance of the GAS estimated models compares to that of the Kalman filter estimated models. In-sample results indicate that the GAS estimated models fit the data better than similar Kalman filter estimated models. In out-of-sample forecasting, the random walk again proves difficult to beat: for the chosen sub-sample, both the GAS estimated and the Kalman estimated DNS specifications are outperformed by the naive random walk.

Overall, the GAS updating mechanism incorporates shocks into the factors very well, resulting in a better in-sample fit than similar Kalman filter estimated models. Unfortunately, we cannot conclude that this better in-sample fit translates into out-of-sample outperformance over the Kalman estimated models. Furthermore, the GAS framework offers increased flexibility for model extensions.

Chapter 8

Further Research

The GAS framework used in this thesis is very flexible, and further research on it looks promising. Through this framework the DNS could be extended in various directions. One interesting direction is the addition of macroeconomic variables. Much recent research claims to gain predictive accuracy by using variables such as inflation and real economic activity (Diebold et al., 2005, 2006; Rudebusch and Wu, 2008).

Secondly, because a long data set is used, it could also be interesting to estimate a model with regime-switching properties within the GAS framework. Many researchers find evidence of regime switching and claim to gain predictive power by modeling it (Bernadell et al., 2005; Xiang and Zhu, 2013).

In this thesis, forecasts are only evaluated based on the (t)RMSFE and the DM test. It would be interesting to use the forecasts in a trading strategy, as in Fabozzi et al. (2005, 2007), who use slope and curvature predictions in trading strategies and evaluate the returns of these strategies. Also, the time-varying volatility could be used to provide confidence intervals surrounding the predictions. Moreover, it could possibly be used in a term-structure option trading strategy.

Furthermore, this thesis has focused only on the NS curve for fitting and forecasting the yield curve. Recent research proposes new estimation methods that are claimed to be easy to implement and to consistently find the global maximum. They achieve this by concentrating out parameters, thereby reducing the dimension of the optimization problem (Hamilton and Wu, 2012). It would be interesting to see whether these methods can be combined with the GAS framework.
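The idea of concentrating out parameters can be illustrated on a much simpler problem than the one studied by Hamilton and Wu (2012). In the sketch below, which is not their method, the Nelson-Siegel factors are concentrated out of a cross-sectional least-squares fit: for each λ on a grid, the factors follow in closed form from OLS, so the numerical search runs over λ alone. It assumes yields in an array of shape (T, N) and maturities in months; the function names and the simulated data are hypothetical.

```python
import numpy as np


def ns_loadings(tau, lam):
    """Three-factor Nelson-Siegel loading matrix X(lambda)."""
    x = lam * np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])


def profile_lambda(yields, tau, grid):
    """For each lambda, concentrate out the factors by OLS and record the SSR."""
    ssr = np.empty(len(grid))
    for k, lam in enumerate(grid):
        X = ns_loadings(tau, lam)
        betas, *_ = np.linalg.lstsq(X, yields.T, rcond=None)  # shape (3, T)
        ssr[k] = np.sum((yields.T - X @ betas) ** 2)
    return grid[int(np.argmin(ssr))], ssr


tau = np.array([3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120], dtype=float)
grid = np.linspace(0.01, 0.2, 191)
rng = np.random.default_rng(0)
true_betas = rng.normal(size=(3, 5))                  # simulated factors for T = 5 months
yields = (ns_loadings(tau, grid[50]) @ true_betas).T  # noise-free simulated yields
lam_hat, ssr = profile_lambda(yields, tau, grid)
print(lam_hat, grid[50])
```

The search space shrinks from 3T + 1 parameters to a single scalar, which is the general appeal of concentration; in a dynamic model the concentrated parameters would instead be, for example, intercepts or covariance matrices.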


Appendix A

Tables


Table A.1: RMSE in the period from January 1970 till December 2009. Represented as values relative to the GAS Gaussian AR model.

| Maturity | g-AR | g-VAR | g-λ-AR | g-TVV-AR | g-CD-AR | g-Sven | t-AR | t-VAR | k-AR | k-VAR |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 0.575 | 0.977 | 0.974 | 0.989 | 1.017 | 0.977 | 1.269 | 1.192 | 1.102 | 1.109 |
| 6 | 0.533 | 0.992 | 0.994 | 0.997 | 1.072 | 0.993 | 1.257 | 1.171 | 1.062 | 1.064 |
| 9 | 0.521 | 0.999 | 0.999 | 1.002 | 1.112 | 0.998 | 1.292 | 1.156 | 1.053 | 1.054 |
| 12 | 0.514 | 1.000 | 0.993 | 1.003 | 1.126 | 0.998 | 1.317 | 1.139 | 1.053 | 1.053 |
| 15 | 0.507 | 0.999 | 0.986 | 1.004 | 1.118 | 0.998 | 1.293 | 1.087 | 1.044 | 1.044 |
| 18 | 0.489 | 1.000 | 0.985 | 1.004 | 1.124 | 0.997 | 1.340 | 1.085 | 1.045 | 1.045 |
| 21 | 0.480 | 1.000 | 0.987 | 1.007 | 1.127 | 0.996 | 1.382 | 1.096 | 1.047 | 1.046 |
| 24 | 0.473 | 1.000 | 0.984 | 1.005 | 1.117 | 0.995 | 1.420 | 1.103 | 1.048 | 1.049 |
| 30 | 0.456 | 1.001 | 0.978 | 1.006 | 1.095 | 0.994 | 1.422 | 1.084 | 1.040 | 1.042 |
| 36 | 0.444 | 0.999 | 0.975 | 1.010 | 1.081 | 0.992 | 1.418 | 1.069 | 1.034 | 1.036 |
| 48 | 0.424 | 0.998 | 0.972 | 1.013 | 1.069 | 0.983 | 1.387 | 1.072 | 1.032 | 1.032 |
| 60 | 0.405 | 1.000 | 0.983 | 1.007 | 1.055 | 0.974 | 1.342 | 1.066 | 1.024 | 1.025 |
| 72 | 0.398 | 0.999 | 0.983 | 1.003 | 1.040 | 0.955 | 1.267 | 1.077 | 1.013 | 1.011 |
| 84 | 0.381 | 0.996 | 0.999 | 1.004 | 1.068 | 0.969 | 1.291 | 1.106 | 1.011 | 1.009 |
| 96 | 0.362 | 0.995 | 0.996 | 1.007 | 1.087 | 0.985 | 1.242 | 1.095 | 1.013 | 1.012 |
| 108 | 0.350 | 0.991 | 0.997 | 1.009 | 1.136 | 0.995 | 1.246 | 1.097 | 1.036 | 1.030 |
| 120 | 0.360 | 0.990 | 1.009 | 1.000 | 1.163 | 0.982 | 1.315 | 1.086 | 1.064 | 1.053 |
| Total | 0.456 | 0.996 | 0.987 | 1.003 | 1.093 | 0.989 | 1.325 | 1.113 | 1.047 | 1.048 |


Table A.2: Kalman DNS with diagonal coefficient matrix (k-AR)

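| µ | A, col. 1 | A, col. 2 | A, col. 3 | λ |
|---|---|---|---|---|
| 7.109*** | 0.989*** | - | - | 0.054*** |
| (0.0056) | (0.0060) | | | (0.0009) |
| -1.699*** | - | 0.960*** | - | |
| (0.0016) | | (0.0062) | | |
| -0.419*** | - | - | 0.918*** | |
| (0.0091) | | | (0.0090) | |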

Table A.3: Kalman DNS with full coefficient matrix (k-VAR)

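| µ | A, col. 1 | A, col. 2 | A, col. 3 | λ |
|---|---|---|---|---|
| 6.919*** | 0.990*** | 0.020*** | -0.008*** | 0.055*** |
| (0.0221) | (0.0015) | (0.0026) | (0.0023) | (0.0010) |
| -1.770*** | -0.013 | 0.954*** | 0.036*** | |
| (0.0021) | (0.0052) | (0.0072) | (0.0070) | |
| -0.584*** | 0.039*** | -0.003** | 0.903*** | |
| (0.0091) | (0.0012) | (0.0019) | (0.0016) | |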

Table A.4: GAS Gaussian with diagonal coefficient matrices (g-AR)

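| ω | A, col. 1 | A, col. 2 | A, col. 3 | B, col. 1 | B, col. 2 | B, col. 3 | λ |
|---|---|---|---|---|---|---|---|
| 0.021*** | 1.163*** | - | - | 0.997*** | - | - | 0.058*** |
| (0.0017) | (0.0009) | | | (0.0002) | | | (0.0002) |
| -0.027*** | - | 1.168*** | - | - | 0.991*** | - | |
| (0.00130) | | (0.0009) | | | (0.0004) | | |
| -0.050*** | - | - | 1.224*** | - | - | 0.866*** | |
| (0.0052) | | | (0.0010) | | | (0.0008) | |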

Table A.5: GAS Gaussian with full coefficient matrices (g-VAR)

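| ω | A, col. 1 | A, col. 2 | A, col. 3 | B, col. 1 | B, col. 2 | B, col. 3 | λ |
|---|---|---|---|---|---|---|---|
| 0.086*** | 0.934*** | 0.031*** | 0.021*** | 0.994*** | 0.028*** | -0.011*** | 0.066*** |
| (0.0006) | (0.0015) | (0.0013) | (0.0008) | (0.0000) | (0.0004) | (0.0004) | (0.0002) |
| 0.033*** | 0.224*** | 1.097*** | 0.070*** | -0.020*** | 0.926*** | 0.049*** | |
| (0.0042) | (0.0032) | (0.0019) | (0.0009) | (0.0005) | (0.0006) | (0.0007) | |
| -0.562*** | 0.583*** | 0.024*** | 1.020*** | 0.072*** | 0.044*** | 0.788*** | |
| (0.0108) | (0.0080) | (0.0054) | (0.0031) | (0.0012) | (0.0025) | (0.0023) | |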

Table A.6: GAS Gaussian with variable λt diagonal coefficient matrices (g-λ-AR)

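| ω | A, col. 1 | A, col. 2 | A, col. 3 | B, col. 1 | B, col. 2 | B, col. 3 | φ0 | Φ |
|---|---|---|---|---|---|---|---|---|
| 0.041*** | 1.163*** | - | - | 0.995*** | - | - | 0.033*** | 0.008*** |
| (0.0015) | (0.0002) | | | (0.0012) | | | (0.0005) | (0.0004) |
| 0.016*** | - | 1.113*** | - | - | 0.999*** | - | | 0.011*** |
| (0.0010) | | (0.0040) | | | (0.0008) | | | (0.0004) |
| -0.015*** | - | - | 1.175*** | - | - | 0.741*** | | -0.001*** |
| (0.0073) | | | (0.0049) | | | (0.0000) | | (0.0006) |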


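Table A.7: GAS Gaussian with Time-Varying Volatility and diagonal coefficient matrices (g-TVV-AR)

| ω | A, col. 1 | A, col. 2 | A, col. 3 | A, col. 4 | B, col. 1 | B, col. 2 | B, col. 3 | B, col. 4 | λ |
|---|---|---|---|---|---|---|---|---|---|
| 0.105*** | 1.131*** | - | - | - | 0.983*** | - | - | - | 0.050*** |
| (0.0054) | (0.0486) | | | | (0.0067) | | | | (0.0102) |
| 0.021*** | - | 1.163*** | - | - | - | 0.988*** | - | - | |
| (0.008) | | (0.0038) | | | | (0.0101) | | | |
| 0.002*** | - | - | 1.172*** | - | - | - | 0.962*** | - | |
| (0.0094) | | | (0.0015) | | | | (0.0201) | | |
| -0.153*** | - | - | - | 0.212*** | - | - | - | 0.974*** | |
| (0.0006) | | | | (0.0059) | | | | (0.0057) | |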

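Table A.8: GAS Four-Factor Bjork-Christensen with diagonal coefficient matrices (g-BC-AR)

| ω | A, col. 1 | A, col. 2 | A, col. 3 | A, col. 4 | B, col. 1 | B, col. 2 | B, col. 3 | B, col. 4 | λ |
|---|---|---|---|---|---|---|---|---|---|
| -0.080*** | 1.159*** | - | - | - | 0.998*** | - | - | - | 0.025*** |
| (0.0053) | (0.0043) | | | | (0.0062) | | | | (0.0001) |
| -0.528*** | - | 1.157*** | - | - | - | 0.973*** | - | - | |
| (0.0021) | | (0.0018) | | | | (0.0022) | | | |
| 0.555*** | - | - | 1.153*** | - | - | - | 0.997*** | - | |
| (0.0025) | | | (0.0058) | | | | (0.0011) | | |
| 0.585*** | - | - | - | 1.153*** | - | - | - | 0.979*** | |
| (0.0026) | | | | (0.0020) | | | | (0.0019) | |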

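Table A.9: GAS Student-t with diagonal coefficient estimates (t-AR)

| ω | A, col. 1 | A, col. 2 | A, col. 3 | B, col. 1 | B, col. 2 | B, col. 3 | λ | v |
|---|---|---|---|---|---|---|---|---|
| 0.045*** | 1.334*** | - | - | 0.996*** | - | - | 0.062 | 4.176*** |
| (0.0002) | (0.0014) | | | (0.0000) | | | - | (0.0012) |
| -0.019*** | - | 1.245*** | - | - | 0.956*** | - | | |
| (0.0000) | | (0.0035) | | | (0.0020) | | | |
| -0.073*** | - | - | 1.548*** | - | - | 0.886*** | | |
| (0.0001) | | | (0.0016) | | | (0.0010) | | |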


Table A.10: (t)RMSFE in the period from January 2005 till December 2009 with a horizon of 1 month, represented as values relative to RW.

| Maturity | RW | g-AR | g-VAR | g-λ-AR | g-TVV-AR | g-BC-AR | k-AR | k-VAR |
|---|---|---|---|---|---|---|---|---|
| 3 | 0.307 | 1.416 | 1.668 | 1.363 | 1.429 | 1.212 | 1.802 | 1.712 |
| 6 | 0.282 | 1.059 | 1.009 | 1.115 | 1.058 | 1.043 | 1.318 | 1.212 |
| 9 | 0.283 | 0.977 | 0.973 | 1.058 | 0.973 | 1.001 | 1.132 | 1.046 |
| 12 | 0.291 | 0.939 | 1.043 | 1.045 | 0.936 | 0.962 | 1.034 | 0.970 |
| 15 | 0.286 | 1.014 | 1.108 | 1.092 | 1.048 | 1.024 | 1.116 | 1.062 |
| 18 | 0.287 | 1.045 | 1.125 | 1.116 | 1.093 | 1.034 | 1.126 | 1.083 |
| 21 | 0.285 | 1.075 | 1.145 | 1.149 | 1.125 | 1.045 | 1.135 | 1.105 |
| 24 | 0.295 | 1.096 | 1.117 | 1.150 | 1.152 | 1.039 | 1.133 | 1.115 |
| 30 | 0.296 | 1.167 | 1.127 | 1.205 | 1.224 | 1.047 | 1.179 | 1.165 |
| 36 | 0.291 | 1.200 | 1.160 | 1.224 | 1.246 | 1.042 | 1.188 | 1.185 |
| 48 | 0.304 | 1.248 | 1.235 | 1.240 | 1.277 | 1.036 | 1.213 | 1.219 |
| 60 | 0.288 | 1.173 | 1.189 | 1.226 | 1.229 | 0.987 | 1.121 | 1.143 |
| 72 | 0.298 | 1.234 | 1.268 | 1.274 | 1.300 | 1.045 | 1.177 | 1.206 |
| 84 | 0.277 | 1.051 | 1.103 | 1.206 | 1.142 | 1.000 | 1.010 | 1.041 |
| 96 | 0.301 | 1.002 | 1.041 | 1.141 | 1.074 | 0.990 | 0.976 | 0.997 |
| 108 | 0.281 | 1.048 | 1.077 | 1.208 | 1.123 | 1.085 | 1.031 | 1.048 |
| 120 | 0.296 | 0.994 | 1.026 | 1.100 | 1.021 | 1.023 | 1.000 | 1.005 |
| Total | 0.291 | 1.113 | 1.158 | 1.176 | 1.155 | 1.039 | 1.178 | 1.153 |

Table A.11: (t)RMSFE in the period from January 2005 till December 2009 with a horizon of 6 months, represented as values relative to RW.

| Maturity | RW | g-AR | g-VAR | g-λ-AR | g-TVV-AR | g-BC-AR | k-AR | k-VAR |
|---|---|---|---|---|---|---|---|---|
| 3 | 1.067 | 1.163 | 1.162 | 1.154 | 1.162 | 1.248 | 1.302 | 1.213 |
| 6 | 1.047 | 1.137 | 1.120 | 1.107 | 1.106 | 1.191 | 1.231 | 1.145 |
| 9 | 1.037 | 1.149 | 1.132 | 1.102 | 1.099 | 1.166 | 1.208 | 1.135 |
| 12 | 1.027 | 1.165 | 1.151 | 1.107 | 1.101 | 1.151 | 1.195 | 1.136 |
| 15 | 1.004 | 1.227 | 1.218 | 1.155 | 1.152 | 1.179 | 1.235 | 1.187 |
| 18 | 0.983 | 1.262 | 1.259 | 1.180 | 1.179 | 1.185 | 1.251 | 1.214 |
| 21 | 0.956 | 1.289 | 1.293 | 1.199 | 1.199 | 1.189 | 1.262 | 1.235 |
| 24 | 0.945 | 1.306 | 1.320 | 1.213 | 1.216 | 1.186 | 1.268 | 1.252 |
| 30 | 0.929 | 1.330 | 1.360 | 1.227 | 1.239 | 1.184 | 1.278 | 1.278 |
| 36 | 0.895 | 1.334 | 1.378 | 1.224 | 1.245 | 1.175 | 1.273 | 1.287 |
| 48 | 0.870 | 1.295 | 1.362 | 1.193 | 1.225 | 1.144 | 1.237 | 1.271 |
| 60 | 0.781 | 1.284 | 1.385 | 1.193 | 1.239 | 1.162 | 1.229 | 1.292 |
| 72 | 0.754 | 1.278 | 1.396 | 1.201 | 1.258 | 1.193 | 1.232 | 1.312 |
| 84 | 0.652 | 1.237 | 1.396 | 1.174 | 1.242 | 1.204 | 1.196 | 1.305 |
| 96 | 0.664 | 1.142 | 1.297 | 1.097 | 1.164 | 1.142 | 1.112 | 1.217 |
| 108 | 0.642 | 1.112 | 1.274 | 1.080 | 1.149 | 1.131 | 1.087 | 1.195 |
| 120 | 0.558 | 1.118 | 1.323 | 1.092 | 1.166 | 1.174 | 1.091 | 1.228 |
| Total | 0.886 | 1.230 | 1.266 | 1.162 | 1.179 | 1.180 | 1.233 | 1.221 |


Table A.12: (t)RMSFE in the period from January 2005 till December 2009 with a horizon of 12 months, represented as values relative to RW.

| Maturity | RW | g-AR | g-VAR | g-λ-AR | g-TVV-AR | g-BC-AR | k-AR | k-VAR |
|---|---|---|---|---|---|---|---|---|
| 3 | 1.936 | 1.086 | 1.094 | 1.121 | 1.103 | 1.192 | 1.141 | 1.074 |
| 6 | 1.843 | 1.090 | 1.090 | 1.108 | 1.093 | 1.178 | 1.135 | 1.072 |
| 9 | 1.755 | 1.122 | 1.124 | 1.129 | 1.119 | 1.192 | 1.160 | 1.103 |
| 12 | 1.693 | 1.147 | 1.153 | 1.144 | 1.138 | 1.200 | 1.176 | 1.128 |
| 15 | 1.635 | 1.195 | 1.208 | 1.184 | 1.182 | 1.233 | 1.218 | 1.178 |
| 18 | 1.574 | 1.230 | 1.252 | 1.211 | 1.215 | 1.255 | 1.248 | 1.218 |
| 21 | 1.508 | 1.259 | 1.292 | 1.234 | 1.243 | 1.275 | 1.274 | 1.253 |
| 24 | 1.451 | 1.289 | 1.336 | 1.258 | 1.274 | 1.299 | 1.303 | 1.291 |
| 30 | 1.377 | 1.324 | 1.398 | 1.285 | 1.313 | 1.330 | 1.338 | 1.344 |
| 36 | 1.281 | 1.350 | 1.452 | 1.302 | 1.344 | 1.362 | 1.368 | 1.390 |
| 48 | 1.162 | 1.357 | 1.510 | 1.302 | 1.365 | 1.405 | 1.387 | 1.440 |
| 60 | 1.003 | 1.378 | 1.602 | 1.319 | 1.412 | 1.477 | 1.426 | 1.518 |
| 72 | 0.942 | 1.372 | 1.646 | 1.315 | 1.430 | 1.515 | 1.435 | 1.559 |
| 84 | 0.807 | 1.341 | 1.694 | 1.283 | 1.427 | 1.540 | 1.423 | 1.590 |
| 96 | 0.767 | 1.291 | 1.674 | 1.234 | 1.393 | 1.519 | 1.384 | 1.571 |
| 108 | 0.719 | 1.276 | 1.696 | 1.218 | 1.396 | 1.520 | 1.381 | 1.590 |
| 120 | 0.636 | 1.259 | 1.761 | 1.204 | 1.403 | 1.512 | 1.376 | 1.628 |
| Total | 1.363 | 1.217 | 1.296 | 1.201 | 1.225 | 1.284 | 1.253 | 1.252 |

Table A.13: Diebold-Mariano statistics of forecasts in the period from January 2005 till December 2009 with a horizon of 1 month, where positive values indicate that the model outperforms the RW. The last row gives the number of maturities with a positive statistic, i.e. the number of outperformances of the RW.

| Maturity | g-AR | g-VAR | g-λ-AR | g-TVV-AR | g-BC-AR | k-AR | k-VAR |
|---|---|---|---|---|---|---|---|
| 3 | -2.972 | -4.414 | -2.867 | -3.530 | -2.141 | -3.555 | -3.363 |
| 6 | -1.434 | -0.223 | -1.752 | -1.609 | -1.212 | -3.106 | -2.456 |
| 9 | 0.486 | 0.439 | -0.793 | 0.703 | -0.013 | -2.964 | -1.484 |
| 12 | 1.310 | -0.579 | -0.558 | 1.610 | 0.765 | -1.204 | 1.744 |
| 15 | -0.242 | -1.227 | -1.136 | -0.818 | -0.425 | -2.955 | -1.557 |
| 18 | -0.808 | -1.694 | -1.660 | -1.854 | -0.755 | -2.939 | -2.562 |
| 21 | -1.196 | -2.038 | -1.943 | -2.225 | -0.983 | -2.760 | -2.582 |
| 24 | -1.443 | -1.860 | -1.937 | -2.379 | -0.845 | -2.329 | -2.183 |
| 30 | -1.809 | -1.583 | -2.137 | -2.523 | -0.748 | -2.157 | -1.982 |
| 36 | -1.971 | -1.671 | -2.105 | -2.659 | -0.688 | -2.238 | -2.108 |
| 48 | -2.026 | -1.789 | -2.077 | -2.497 | -0.792 | -1.856 | -1.831 |
| 60 | -1.943 | -1.883 | -1.637 | -2.753 | 0.296 | -1.717 | -1.809 |
| 72 | -1.842 | -1.982 | -1.839 | -2.361 | -0.821 | -1.546 | -1.647 |
| 84 | -0.920 | -1.487 | -1.146 | -1.950 | 0.006 | -0.343 | -0.807 |
| 96 | -0.166 | -0.959 | -0.954 | -1.392 | 0.312 | 0.691 | 0.163 |
| 108 | -0.995 | -1.746 | -1.330 | -1.495 | -1.106 | -0.759 | -1.121 |
| 120 | 0.093 | -0.286 | -0.827 | -0.382 | -0.425 | 0.065 | -0.078 |
| # | 3 | 1 | 0 | 2 | 4 | 2 | 2 |


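Table A.14: Diebold-Mariano statistics of forecasts in the period from January 2005 till December 2009 with a horizon of 6 months, where positive values indicate that the model outperforms the RW. The last row gives the number of maturities with a positive statistic, i.e. the number of outperformances of the RW.

| Maturity | g-AR | g-VAR | g-λ-AR | g-TVV-AR | g-BC-AR | k-AR | k-VAR |
|---|---|---|---|---|---|---|---|
| 3 | -1.488 | -1.085 | -1.238 | -1.240 | -1.915 | -1.452 | -1.162 |
| 6 | -1.928 | -1.158 | -1.527 | -1.286 | -2.222 | -1.438 | -1.181 |
| 9 | -2.168 | -1.506 | -1.877 | -1.491 | -2.172 | -1.474 | -1.430 |
| 12 | -2.097 | -1.686 | -1.909 | -1.502 | -2.023 | -1.458 | -1.533 |
| 15 | -2.178 | -1.954 | -1.968 | -1.725 | -2.494 | -1.617 | -1.835 |
| 18 | -2.150 | -2.043 | -1.987 | -1.793 | -2.554 | -1.658 | -1.922 |
| 21 | -2.144 | -2.102 | -1.978 | -1.825 | -2.662 | -1.696 | -1.979 |
| 24 | -2.190 | -2.170 | -1.988 | -1.883 | -3.068 | -1.782 | -2.046 |
| 30 | -2.091 | -2.167 | -1.901 | -1.874 | -2.939 | -1.761 | -2.013 |
| 36 | -2.125 | -2.227 | -1.935 | -1.924 | -3.258 | -1.822 | -2.066 |
| 48 | -2.017 | -2.189 | -1.734 | -1.820 | -3.526 | -1.746 | -1.986 |
| 60 | -1.971 | -2.103 | -1.651 | -1.768 | -3.808 | -1.724 | -1.883 |
| 72 | -1.866 | -2.023 | -1.540 | -1.703 | -3.472 | -1.648 | -1.823 |
| 84 | -1.831 | -1.899 | -1.538 | -1.649 | -3.781 | -1.583 | -1.732 |
| 96 | -1.777 | -1.655 | -1.343 | -1.487 | -3.218 | -1.411 | -1.514 |
| 108 | -1.916 | -1.586 | -1.414 | -1.525 | -3.588 | -1.392 | -1.485 |
| 120 | -2.356 | -1.616 | -2.203 | -1.785 | -2.706 | -1.767 | -1.605 |
| # | 0 | 0 | 0 | 0 | 0 | 0 | 0 |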

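Table A.15: Diebold-Mariano statistics of forecasts in the period from January 2005 till December 2009 with a horizon of 12 months, where positive values indicate that the model outperforms the RW. The last row gives the number of maturities with a positive statistic, i.e. the number of outperformances of the RW.

| Maturity | g-AR | g-VAR | g-λ-AR | g-TVV-AR | g-BC-AR | k-AR | k-VAR |
|---|---|---|---|---|---|---|---|
| 3 | -0.952 | -0.592 | -0.970 | -0.790 | -1.142 | -0.675 | -0.550 |
| 6 | -1.156 | -0.671 | -1.123 | -0.863 | -1.267 | -0.689 | -0.628 |
| 9 | -1.348 | -0.906 | -1.306 | -1.043 | -1.428 | -0.789 | -0.877 |
| 12 | -1.335 | -1.030 | -1.319 | -1.090 | -1.497 | -0.833 | -0.999 |
| 15 | -1.365 | -1.188 | -1.361 | -1.192 | -1.589 | -0.932 | -1.161 |
| 18 | -1.385 | -1.297 | -1.392 | -1.261 | -1.662 | -0.998 | -1.265 |
| 21 | -1.403 | -1.378 | -1.413 | -1.310 | -1.748 | -1.051 | -1.342 |
| 24 | -1.419 | -1.446 | -1.428 | -1.354 | -1.816 | -1.106 | -1.404 |
| 30 | -1.392 | -1.510 | -1.403 | -1.370 | -1.853 | -1.146 | -1.447 |
| 36 | -1.393 | -1.572 | -1.403 | -1.393 | -1.946 | -1.183 | -1.494 |
| 48 | -1.373 | -1.634 | -1.375 | -1.395 | -2.036 | -1.225 | -1.530 |
| 60 | -1.373 | -1.701 | -1.370 | -1.412 | -2.136 | -1.260 | -1.577 |
| 72 | -1.381 | -1.731 | -1.369 | -1.425 | -2.101 | -1.299 | -1.602 |
| 84 | -1.332 | -1.710 | -1.304 | -1.367 | -2.264 | -1.256 | -1.589 |
| 96 | -1.328 | -1.695 | -1.288 | -1.343 | -2.299 | -1.248 | -1.583 |
| 108 | -1.415 | -1.739 | -1.393 | -1.393 | -2.333 | -1.301 | -1.638 |
| 120 | -1.485 | -1.713 | -1.456 | -1.394 | -2.553 | -1.303 | -1.661 |
| # | 0 | 0 | 0 | 0 | 0 | 0 | 0 |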

Appendix B

Kalman Filter

The Kalman filter provides a minimum mean squared error estimate of βt. At each time t an optimal prediction of yt is generated, based on all information up to time t − 1. The prediction step is given by:

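$$\beta_{t|t-1} = (I - A)\mu + A\beta_{t-1|t-1}, \tag{B.1}$$

$$P_{t|t-1} = AP_{t-1|t-1}A' + \Sigma_\eta, \tag{B.2}$$

$$\eta_{t|t-1} = y_t - y_{t|t-1} = y_t - X(\lambda)\beta_{t|t-1}, \tag{B.3}$$

$$F_{t|t-1} = X(\lambda)P_{t|t-1}X(\lambda)' + \Sigma_\varepsilon. \tag{B.4}$$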

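The updating step is given by:

$$\beta_{t|t} = \beta_{t|t-1} + P_{t|t-1}X(\lambda)'F_{t|t-1}^{-1}\eta_{t|t-1}, \tag{B.5}$$

$$P_{t|t} = P_{t|t-1} - P_{t|t-1}X(\lambda)'F_{t|t-1}^{-1}X(\lambda)P_{t|t-1}, \tag{B.6}$$

where $P_{t|t-1}$ and $P_{t|t}$ are the conditional covariance matrices of βt, and the computations are carried out recursively for t = 1, . . . , T.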

As initial values we use:

$$\beta_{1|0} = E[\beta_t] = \mu, \qquad P_{1|0} = \Sigma_\beta, \quad \text{where } \Sigma_\beta - A\Sigma_\beta A' = \Sigma_\eta.$$

The conditional distribution of the yields is then given by:

$$y_t \mid \mathcal{F}_{t-1} \sim N\big(y_{t|t-1}, F_{t|t-1}\big). \tag{B.7}$$


So the log-likelihood is given by:

$$\ell(\theta) = -\frac{NT}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log|F_{t|t-1}| - \frac{1}{2}\sum_{t=1}^{T}\eta_{t|t-1}'F_{t|t-1}^{-1}\eta_{t|t-1}. \tag{B.8}$$

Numerical optimization over the hyper-parameters θ gives the maximum likelihood estimate.
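The recursions (B.1)-(B.6) and the log-likelihood (B.8) can be sketched in Python as below. This is a minimal illustration, assuming a fixed loading matrix X = X(λ), given covariance matrices Σ_η and Σ_ε, and the stationary initialization P_{1|0} = Σ_β computed with SciPy's `solve_discrete_lyapunov`. The function name is hypothetical, the covariance matrices and yields in the example are made up, and in practice the returned value would be maximized numerically over θ.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def dns_kalman_loglik(y, X, mu, A, Sigma_eta, Sigma_eps):
    """Gaussian log-likelihood (B.8) of the DNS state-space model.

    y: (T, N) yields; X: (N, k) loadings; mu: (k,) factor mean; A: (k, k).
    """
    T, N = y.shape
    I = np.eye(len(mu))
    beta = np.asarray(mu, dtype=float)          # beta_{1|0} = mu
    P = solve_discrete_lyapunov(A, Sigma_eta)   # P_{1|0} solves P = A P A' + Sigma_eta
    loglik = -0.5 * N * T * np.log(2.0 * np.pi)
    for t in range(T):
        eta = y[t] - X @ beta                   # (B.3) prediction error
        F = X @ P @ X.T + Sigma_eps             # (B.4) its covariance
        F_inv = np.linalg.inv(F)
        _, logdet = np.linalg.slogdet(F)
        loglik -= 0.5 * (logdet + eta @ F_inv @ eta)
        K = P @ X.T @ F_inv
        beta_tt = beta + K @ eta                # (B.5) updated state
        P_tt = P - K @ X @ P                    # (B.6) updated covariance
        beta = (I - A) @ mu + A @ beta_tt       # (B.1) prediction for t + 1
        P = A @ P_tt @ A.T + Sigma_eta          # (B.2)
    return float(loglik)


# Hypothetical example: 4 maturities, k-AR means and persistence from Table A.2,
# made-up covariance matrices and simulated yields.
tau = np.array([3.0, 12.0, 60.0, 120.0])
x = 0.054 * tau
slope = (1.0 - np.exp(-x)) / x
X = np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])
mu = np.array([7.109, -1.699, -0.419])
A = np.diag([0.989, 0.960, 0.918])
Sigma_eta = 0.1 * np.eye(3)
Sigma_eps = 0.01 * np.eye(4)
rng = np.random.default_rng(0)
y = X @ mu + rng.normal(scale=0.3, size=(50, 4))
ll = dns_kalman_loglik(y, X, mu, A, Sigma_eta, Sigma_eps)
print(ll)
```

Using `slogdet` rather than `log(det(F))` avoids overflow or underflow when N is large.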

Bibliography

Ang, Andrew and Piazzesi, Monika. A no-arbitrage vector autoregression of term

structure dynamics with macroeconomic and latent variables. Journal of Monetary

Economics, 50(4):745–787, 2003.

Bernadell, Carlos; Coche, Joachim, and Nyholm, Ken. Yield curve prediction for the

strategic investor. Technical report, European Central Bank, 2005.

Bjork, Tomas and Christensen, Bent Jesper. Interest rate dynamics and consistent

forward rate curves. Mathematical Finance, 9(4):323–348, 1999.

Bollerslev, Tim. Generalized autoregressive conditional heteroskedasticity. Journal of

Econometrics, 31(3):307–327, 1986.

Caks, John. The coupon effect on yield to maturity. The Journal of Finance, 32(1):

103–115, 1977.

Caldeira, Joao F; Laurini, Marcio P, and Portugal, Marcelo S. Bayesian inference

applied to dynamic Nelson-Siegel model with stochastic volatility. Brazilian Review

of Econometrics, 30(1):123–161, 2010.

Christensen, Jens H. E.; Lopez, Jose A., and Rudebusch, Glenn D. Can spanned term

structure factors drive stochastic volatility? 2013.

Christensen, Jens HE; Diebold, Francis X, and Rudebusch, Glenn D. An arbitrage-free

generalized Nelson–Siegel term structure model. The Econometrics Journal, 12(3):

C33–C64, 2009a.

Christensen, Jens HE; Lopez, Jose A, and Rudebusch, Glenn D. Do central bank

liquidity facilities affect interbank lending rates? Federal Reserve Bank of San

Francisco Working Paper, 13, 2009b.

Christensen, Jens HE; Diebold, Francis X, and Rudebusch, Glenn D. The affine

arbitrage-free class of Nelson–Siegel term structure models. Journal of Econometrics,

164(1):4–20, 2011.


Coroneo, Laura; Nyholm, Ken, and Vidova-Koleva, Rositsa. How arbitrage-free is the

Nelson–Siegel model? Journal of Empirical Finance, 18(3):393–407, 2011.

Cox, David R; Gudmundsson, Gudmundur; Lindgren, Georg; Bondesson, Lennart;

Harsaae, Erik; Laake, Petter; Juselius, Katarina, and Lauritzen, Steffen L.

Statistical analysis of time series: Some recent developments [with discussion and

reply]. Scandinavian Journal of Statistics, pages 93–115, 1981.

Cox, John C; Ingersoll Jr, Jonathan E, and Ross, Stephen A. A theory of the term

structure of interest rates. Econometrica: Journal of the Econometric Society, pages

385–407, 1985.

Creal, Drew; Koopman, Siem Jan, and Lucas, Andre. The estimation of time-varying parameters in

multivariate linear time series models. Working Paper, 2011a.

Creal, Drew; Koopman, Siem Jan, and Lucas, Andre. A general framework for

observation driven time-varying parameter models. 2008.

Creal, Drew; Koopman, Siem Jan, and Lucas, Andre. A dynamic multivariate

heavy-tailed model for time-varying volatilities and correlations. Journal of Business

& Economic Statistics, 29(4), 2011b.

Creal, Drew; Koopman, Siem Jan, and Lucas, Andre. Generalized autoregressive score

models with applications. Journal of Applied Econometrics, 2012.

Diebold, Francis X and Li, Canlin. Forecasting the term structure of government bond

yields. Journal of Econometrics, 130(2):337–364, 2006.

Diebold, Francis X and Mariano, Robert S. Comparing predictive accuracy. Journal of

Business & Economic Statistics, 20(1), 2002.

Diebold, Francis X and Rudebusch, Glenn D. The dynamic Nelson-Siegel approach to

yield curve modeling and forecasting. Technical report, mimeo, 2011.

Diebold, Francis X; Piazzesi, Monika, and Rudebusch, Glenn. Modeling bond yields in

finance and macroeconomics. 2005.

Diebold, Francis X; Rudebusch, Glenn D, and Boragan Aruoba, S. The macroeconomy

and the yield curve: a dynamic latent factor approach. Journal of Econometrics, 131

(1):309–338, 2006.

Duffee, Gregory R. Term premia and interest rate forecasts in affine models. The

Journal of Finance, 57(1):405–443, 2002.

Duffee, Gregory R and Stanton, Richard H. Estimation of dynamic term structure

models. The Quarterly Journal of Finance, 2(02), 2012.


Duffie, Darrell and Kan, Rui. A yield-factor model of interest rates. Mathematical

Finance, 6(4):379–406, 1996.

Engle, Robert. Dynamic conditional correlation: A simple class of multivariate

generalized autoregressive conditional heteroskedasticity models. Journal of

Business & Economic Statistics, 20(3):339–350, 2002.

Engle, Robert F and Russell, Jeffrey R. Autoregressive conditional duration: a new

model for irregularly spaced transaction data. Econometrica, pages 1127–1162, 1998.

Fabozzi, Frank J; Martellini, Lionel, and Priaulet, Philippe. Predictability in the shape

of the term structure of interest rates. The Journal of Fixed Income, 15(1):40–53,

2005.

Fabozzi, Frank J; Martellini, Lionel, and Priaulet, Philippe. Exploiting predictability

in the time-varying shape of the term structure of interest rates. EDHEC Risk and

Asset Management Research Centre, 2007.

Fama, Eugene F. Term-structure forecasts of interest rates, inflation and real returns.

Journal of Monetary Economics, 25(1):59–76, 1990.

Fama, Eugene F and Bliss, Robert R. The information in long-maturity forward rates.

The American Economic Review, pages 680–692, 1987.

Fama, Eugene F and French, Kenneth R. Common risk factors in the returns on stocks

and bonds. Journal of Financial Economics, 33(1):3–56, 1993.

Gilli, Manfred; Große, Stefan, and Schumann, Enrico. Calibrating the

Nelson–Siegel–Svensson model. 2010.

Hamilton, James D. Analysis of time series subject to changes in regime. Journal of

Econometrics, 45(1):39–70, 1990.

Hamilton, James D and Wu, Jing Cynthia. Identification and estimation of Gaussian

affine term structure models. Journal of Econometrics, 168(2):315–331, 2012.

Harvey, Andrew; Ruiz, Esther, and Shephard, Neil. Multivariate stochastic variance

models. The Review of Economic Studies, 61(2):247–264, 1994.

Hautsch, Nikolaus and Yang, Fuyu. Bayesian inference in a stochastic volatility

Nelson–Siegel model. Computational Statistics & Data Analysis, 56(11):3774–3792,

2012.

Hull, John C. Options, futures, and other derivatives. Pearson Education India, 1999.


Jensen, Michael and Scholes, Myron. The capital asset pricing model: Some empirical

tests. 1972.

Joslin, Scott; Singleton, Kenneth J, and Zhu, Haoxiang. A new perspective on Gaussian

dynamic term structure models. Review of Financial Studies, 24(3):926–970, 2011.

Kalman, Rudolph Emil. A new approach to linear filtering and prediction problems.

Journal of Basic Engineering, 82(1):35–45, 1960.

Kim, Don H and Orphanides, Athanasios. Term structure estimation with survey data

on interest rate forecasts. 2005.

Koopman, Siem Jan; Lucas, Andre, and Scharth, Marcel. Predicting time-varying parameters with parameter-driven

and observation-driven models. Working Paper, 2012.

Koopman, Siem Jan; Mallee, Max IP, and Van der Wel, Michel. Analyzing the term

structure of interest rates using the dynamic Nelson–Siegel model with time-varying

parameters. Journal of Business & Economic Statistics, 28(3):329–343, 2010.

Laurini, Marcio Poletti and Hotta, Luiz Koodi. Bayesian extensions to Diebold-Li term

structure model. International Review of Financial Analysis, 19(5):342–350, 2010.

Litterman, Robert B and Scheinkman, Jose. Common factors affecting bond returns.

The Journal of Fixed Income, 1(1):54–61, 1991.

McCulloch, J Huston. The tax-adjusted yield curve. The Journal of Finance, 30(3):

811–830, 1975.

Mönch, Emanuel. Forecasting the yield curve in a data-rich environment: A

no-arbitrage factor-augmented VAR approach. Journal of Econometrics, 146(1):26–43,

2008.

Nelson, Charles R and Siegel, Andrew F. Parsimonious modeling of yield curves.

Journal of Business, pages 473–489, 1987.

Piazzesi, Monika. Affine term structure models. Handbook of Financial Econometrics, 1:

691–766, 2010.

Pooter, MD de. Examining the nelson-siegel class of term structure models. Technical

report, Tinbergen Institute, 2007.

Rivers, Douglas and Vuong, Quang. Model selection tests for nonlinear dynamic

models. The Econometrics Journal, 5(1):1–39, 2002.

Rudebusch, Glenn D and Wu, Tao. A macro-finance model of the term structure,

monetary policy and the economy. The Economic Journal, 118(530):906–926, 2008.


Russell, Jeffrey R. Econometric modeling of multivariate irregularly-spaced

high-frequency data. Manuscript, GSB, University of Chicago, 1999.

Svensson, Lars EO. Estimating forward interest rates with the extended Nelson &

Siegel method. Sveriges Riksbank Quarterly Review, 3(1):13–26, 1995.

Vasicek, Oldrich. An equilibrium characterization of the term structure. Journal of

Financial Economics, 5(2):177–188, 1977.

Vasicek, Oldrich A and Fong, H Gifford. Term structure modeling using exponential

splines. The Journal of Finance, 37(2):339–348, 1982.

Xiang, Ju and Zhu, Xiaoneng. A regime-switching Nelson–Siegel term structure model

and interest rate forecasts. Journal of Financial Econometrics, 11(3):522–555, 2013.
