

  • How Efficient is the Kalman Filter at

    Estimating Affine Term Structure Models?†

    Jens H. E. Christensen

    Jose A. Lopez

    and

    Glenn D. Rudebusch

    Federal Reserve Bank of San Francisco

    101 Market Street, Mailstop 1130

    San Francisco, CA 94105

    Preliminary and incomplete draft. Comments are welcome.

    Abstract

    We perform a carefully orchestrated simulation study to analyze the bias of the Kalman filter in estimating arbitrage-free Nelson-Siegel (AFNS) models with and without stochastic volatility. For Gaussian AFNS models, we document significant finite-sample bias in the estimated mean-reversion parameters. Since the Kalman filter is consistent and efficient for that model class, this exercise provides a measure of the finite-sample bias that will affect any estimator. For AFNS models with stochastic volatility, significant finite-sample upward estimation bias remains, but it is not materially larger than in the Gaussian model. Hence, we recommend estimation based on the Kalman filter for both types of AFNS models and corresponding affine term structure models in general.

    JEL Classification: C13, C58, G12, G17.

    Keywords: arbitrage-free Nelson-Siegel models, finite-sample bias, stochastic volatility

    †We thank seminar participants at the Second Humboldt Copenhagen Conference on Financial Econometrics for comments on an earlier draft of this paper. The views in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the Federal Reserve Bank of San Francisco or the Board of Governors of the Federal Reserve System.

    This version: August 24, 2015.

  • 1 Introduction

    Interest rate volatility is a topic of great research interest given its role in derivatives pricing

    and portfolio risk management. However, as compared to the empirical results presented

    in the extensive GARCH literature, the results of modeling interest rate volatility within

    the more commonly used affine, arbitrage-free models of the term structure have been less

    clear-cut, partly due to the difficulty in estimating their parameters.

    Estimation of flexible affine term structure models is complicated and time-consuming, partly due to the fairly large number of parameters, and partly due to the latent nature of the state variables in such models. The latter causes the estimation to be plagued by numerous local maxima that are distinct in the sense that they are not invariant affine transformations1 of each other and therefore may have very different economic implications; see Duffee (2011) and Kim and Orphanides (2012) for discussions of these issues.

    To overcome those problems, Christensen et al. (2011, henceforth CDR) introduce the

    affine arbitrage-free class of Nelson-Siegel term structure models (henceforth referred to as

    AFNS models). These are affine term structure models that preserve the level, slope, and

    curvature factor loading structure in the bond yield function known from the standard Nelson

    and Siegel (1987) yield curve model. These models are easy to estimate because the role

    of each factor is predetermined and does not vary for any admissible set of parameters.

    Furthermore, in that model class, the state variables are Gaussian with constant volatility.

    As a consequence, the models can be estimated with the standard Kalman filter, which

    is equivalent to exact maximum likelihood estimation and therefore is both efficient and

    consistent in the limit. However, despite its consistency and efficiency, the Kalman filter

    remains subject to any unavoidable finite-sample bias.
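    To make the link between the Kalman filter and maximum likelihood concrete, the following is a minimal univariate sketch, not the three-factor AFNS setup. It assumes a state equation x_t = c + phi x_{t-1} + eta_t and a measurement equation y_t = x_t + eps_t with Gaussian errors; the function names and all parameter values are illustrative assumptions. The filter accumulates the exact Gaussian log-likelihood through the prediction-error decomposition.

```python
import math
import random


def kalman_loglik(y, c, phi, q, r):
    """Gaussian log-likelihood of y for x_t = c + phi*x_{t-1} + eta_t, y_t = x_t + eps_t.

    q and r are the variances of eta_t and eps_t (illustrative scalar model).
    """
    # Start from the unconditional mean and variance of the stationary state.
    x = c / (1.0 - phi)
    p = q / (1.0 - phi * phi)
    loglik = 0.0
    for obs in y:
        # Prediction step.
        x_pred = c + phi * x
        p_pred = phi * phi * p + q
        # Prediction error and its variance.
        v = obs - x_pred
        f = p_pred + r
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        # Update step.
        gain = p_pred / f
        x = x_pred + gain * v
        p = (1.0 - gain) * p_pred
    return loglik


def simulate(n, c, phi, q, r, seed=0):
    """Simulate n noisy observations from the same state-space model."""
    rng = random.Random(seed)
    x = c / (1.0 - phi)
    out = []
    for _ in range(n):
        x = c + phi * x + rng.gauss(0.0, math.sqrt(q))
        out.append(x + rng.gauss(0.0, math.sqrt(r)))
    return out


y = simulate(500, c=0.25, phi=0.95, q=0.04, r=0.01)
ll_true = kalman_loglik(y, 0.25, 0.95, 0.04, 0.01)
ll_wrong = kalman_loglik(y, 0.25, 0.20, 0.04, 0.01)
print(f"log-likelihood at true phi: {ll_true:.1f}, at phi = 0.20: {ll_wrong:.1f}")
```

    In an estimation, this log-likelihood would be maximized over the parameters by a numerical optimizer; the comparison above only illustrates that a badly misspecified persistence parameter is penalized.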

    In a recent paper, Christensen et al. (2014a, henceforth CLR) generalize the AFNS model framework introduced in CDR by incorporating stochastic volatility into the state variables. These models are also easy to estimate, again due to the imposed Nelson-Siegel factor loading structure. CLR estimate their models using the standard Kalman filter and report model fit on par with the original Gaussian AFNS model. Now, though, the Kalman filter is no longer efficient and potentially inconsistent because it only approximates the true probability distribution of the state variables by matching the first and second moments, essentially treating the state variables as if they were Gaussian. Thus, in addition to any finite-sample bias, there is potential for added bias arising from the fact that the Kalman filter is only an approximation to the true likelihood function. Despite this concern, Kalman filter-based estimation of affine term structure models with stochastic volatility is relatively common in empirical term structure analysis,2 but the size of any bias in realistic three-factor settings has not been studied in detail in the existing term structure literature (to the best of our knowledge). In this paper, we focus on the AFNS model classes with and without stochastic volatility. This provides us with an ideal setting to study both the finite-sample bias and the added bias from using the Kalman filter for estimation of affine non-Gaussian models. As an alternative, Joslin et al. (2011) and Hamilton and Wu (2012) provide identification schemes that facilitate the estimation of affine Gaussian models in that they avoid the filtering of the unobserved latent factors.3 However, it is not obvious if or how those approaches extend to affine non-Gaussian models. Thus, the AFNS-based identification of affine Gaussian models provided by CDR and extended by CLR to affine non-Gaussian models remains an important contribution without which the analysis in this paper would not have been feasible.4

    1See Dai and Singleton (2000) for the definition of this concept.

    2For examples, see Duffee (1999), Driessen (2005), Feldhütter and Lando (2008), and Christensen et al. (2015).

    Because interest rates are highly persistent, empirical autoregressive models, including dynamic term structure models, suffer from substantial small-sample estimation bias. Specifically, model estimates will generally be biased toward a dynamic system that displays much less persistence than the true process (so estimates of the real-world mean-reversion matrix, $K^P$, are upward biased). Furthermore, if the degree of interest rate persistence is underestimated, future short rates would be expected to revert to their mean too quickly, causing their expected longer-term averages to be too stable. Therefore, the bias in the estimated dynamics distorts the decomposition of yields and contaminates estimates of long-maturity term premiums.
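    The mechanism can be illustrated with a small Monte Carlo sketch for a single persistent factor. This is not the paper's simulation design: it assumes a univariate AR(1) sampled monthly, estimated by OLS rather than by the Kalman filter, and the values of the true mean-reversion rate (0.1), the shock volatility, and the number of replications are hypothetical choices.

```python
import math
import random


def simulate_ar1(n_obs, phi, sigma, rng):
    """Simulate a zero-mean AR(1), started from its stationary distribution."""
    x = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi * phi))
    path = [x]
    for _ in range(n_obs):
        x = phi * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def ols_phi(path):
    """OLS slope from regressing x_t on a constant and x_{t-1}."""
    lag, cur = path[:-1], path[1:]
    m_lag = sum(lag) / len(lag)
    m_cur = sum(cur) / len(cur)
    cov = sum((a - m_lag) * (b - m_cur) for a, b in zip(lag, cur))
    var = sum((a - m_lag) ** 2 for a in lag)
    return cov / var


def mean_kappa_hat(years, kappa, n_sims=300, seed=1):
    """Average implied mean-reversion rate across simulated monthly samples."""
    dt = 1.0 / 12.0
    phi = math.exp(-kappa * dt)  # AR(1) coefficient implied by kappa
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        phi_hat = ols_phi(simulate_ar1(12 * years, phi, 0.01, rng))
        if phi_hat > 0.0:  # guard the logarithm
            estimates.append(-math.log(phi_hat) / dt)
    return sum(estimates) / len(estimates)


TRUE_KAPPA = 0.1  # hypothetical annual mean-reversion rate
bias_10y = mean_kappa_hat(10, TRUE_KAPPA) - TRUE_KAPPA
bias_40y = mean_kappa_hat(40, TRUE_KAPPA) - TRUE_KAPPA
print(f"10-year samples: mean bias in kappa {bias_10y:+.3f}")
print(f"40-year samples: mean bias in kappa {bias_40y:+.3f}")
```

    Under these assumptions the average estimate of the mean-reversion rate lies above its true value, and the gap narrows as the sample lengthens from ten to forty years.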

    To study this finite-sample problem in detail, we start out simulating and estimating

    Gaussian AFNS models for which the Kalman filter is an efficient estimator as already noted.

    We simulate short ten-year and long forty-year samples to study the finite-sample bias problem

    directly. We allow for low and high noise to assess how data quality affects our conclusions.

    Furthermore, for the benchmark Gaussian AFNS model, we also analyze samples at weekly

    frequency in addition to the monthly frequency used throughout, but since this turns out to

    matter little for our conclusions, we do not repeat this exercise for the models with stochastic

    volatility. We then proceed to simulate and estimate AFNS models with stochastic volatility

    in a similarly careful way.

    Our findings can be summarized as follows.

    In the Gaussian AFNS model, there is a significant finite-sample upward bias in the estimates of the mean-reversion rate of the Nelson-Siegel level factor due to its near unit-root property. In addition, there is a more modest, finite-sample upward estimation bias in the mean-reversion parameters for the slope and curvature factors thanks to their lower persistence. Importantly, there is no finite-sample bias in the estimated mean parameters of any of the factors. Furthermore, all parameters that relate to the model’s Q-dynamics used for pricing and fitting the cross section of yields are well determined and without any measurable bias. This property turns out to hold for non-Gaussian models as well. However, the accuracy of the estimated Q-dynamics is affected by the amount of noise in the data. Finally, the data frequency plays no role for these conclusions as both weekly and monthly simulated data produce similar results. However, in the weekly samples, the parameter standard deviations estimated from the optimized likelihood function in the Kalman filter tend to be too low. This makes the upward biased mean-reversion parameters appear even more significant than they are, which complicates model selection. Hence, we document one of the unusual situations where more data do not necessarily lead to better inference. For selecting the appropriate specification of the mean-reversion matrix, which matters for forecast performance, term premium decompositions, etc., we therefore recommend relying on monthly rather than weekly data.

    3Andreasen and Christensen (2015) offer an alternative way of estimating non-Gaussian term structure models.

    4The related literature includes Duan and Simonato (1995), Lund (1997), De Jong (2000), Duffee and Stanton (2004), and Duffee and Stanton (2008), among others.

    We then proceed to simulate and estimate AFNS models with stochastic volatility generated by the level factor in one set of exercises, and with stochastic volatility generated by the curvature factor in another set of exercises.

    First, we find that the finite-sample upward bias in the estimated mean-reversion parameters is not materially different in the models with stochastic volatility relative to the Gaussian AFNS model. The intuition behind this result is that the time series properties of the three state variables are primarily determined by the Nelson-Siegel factor loading structure, which is almost identical for all AFNS models with and without stochastic volatility. For similar reasons we also see little bias in the estimated mean parameters in these models.

    Second, we analyze in detail the ability of the Kalman filter to estimate the volatility

    sensitivity parameters that determine the degree to which the stochastic volatility factor

    affects the volatility of the unconstrained factors in each model. For U.S. Treasury yields,

    these sensitivity parameters are often estimated to be negligible (see CLR for an example) and

    we report similar results. To assess whether this is a general weakness of the Kalman filter

    when applied to models with stochastic volatility, we perform separate simulation experiments

    with large values for the sensitivity parameters. Our results show that the Kalman filter is in

    fact able to estimate them with some accuracy. Thus, when their estimated values are tiny

    and insignificant, it is most likely because the data call for them to be so.

    Third, in general, it is the case that the parameters that primarily affect the models’

    fit to the cross section of yields tend to have small or no bias, but their accuracy varies

    positively with the quality of the data. We note one exception though. In the AFNS model

    with stochastic volatility generated by the curvature factor, the mean of the curvature factor

    under the risk-neutral Q measure is not well identified. However, we show that this can be

    solved at practically no cost by fixing it at a low value that is exactly high enough that the

    curvature factor does not reach its zero lower bound.

    Another key finding is that the Kalman filter is as efficient at filtering state variables in non-Gaussian models as it is at filtering in Gaussian models, in particular under optimal conditions with high-quality data. As a consequence, the fit of the AFNS models with stochastic volatility is as good as, if not better than, the fit of the Gaussian AFNS model.

    Finally, in light of the low interest rate environment in recent years, we emphasize that our study has no bearing on how Kalman filter-based estimations perform when yields are near their lower bound and exhibit asymmetric behavior for that reason. This is a task that we leave for future research. Still, the results we report could serve as a useful benchmark even for that kind of exercise.

    The rest of the paper is structured as follows. Section 2 describes our sample of U.S. Treasury yields and motivates our focus on the Nelson-Siegel yield curve model, while Section 3 briefly details the original Gaussian AFNS model of the term structure. Section 4 goes on to describe the five classes of AFNS models with stochastic volatility dynamics introduced in CLR. Section 5 details the estimation methodology, while Section 6 describes the simulation study. Section 7 contains the results from the simulation exercises for the Gaussian AFNS model, while Sections 8 and 9 contain the results for the AFNS models with stochastic volatility generated by the level and curvature factor, respectively. Section 10 concludes the paper.

    2 Motivation for the Nelson-Siegel Model

    In this section, we motivate our focus on the Nelson-Siegel yield curve model using principal

    components analysis. Recall that principal components analysis decomposes the observed

    data into a number of factors equal to the number of time series and ranks those factors

    according to how much of the observed variation each factor explains.

    The specific Treasury yields we analyze to obtain realistic parameter sets to be used

    in our simulation exercises are zero-coupon yields constructed by the method described in

    Gürkaynak et al. (2007) and briefly detailed here.5 For each business day a zero-coupon yield

    curve of the Svensson (1995)-type

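    \[
    y_t(\tau) = \beta_0 + \frac{1 - e^{-\lambda_1\tau}}{\lambda_1\tau}\,\beta_1 + \left[\frac{1 - e^{-\lambda_1\tau}}{\lambda_1\tau} - e^{-\lambda_1\tau}\right]\beta_2 + \left[\frac{1 - e^{-\lambda_2\tau}}{\lambda_2\tau} - e^{-\lambda_2\tau}\right]\beta_3
    \]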

    is fitted to price a large pool of underlying off-the-run Treasury bonds. Thus, for each business day, we have the fitted values of the four coefficients $(\beta_0(t), \beta_1(t), \beta_2(t), \beta_3(t))$ and two parameters $(\lambda_1(t), \lambda_2(t))$. From this data set zero-coupon yields for any relevant maturity can be calculated. As demonstrated by Gürkaynak et al. (2007), this discount function prices the underlying pool of bonds extremely well. By implication, the zero-coupon yields derived from this approach constitute a very good approximation to the true underlying Treasury zero-coupon yield curve.6

    5The Board of Governors of the Federal Reserve updates the data on its website at http://www.federalreserve.gov/pubs/feds/2006/index.html.

    [Figure 1 omitted: line plot of the 10-year, 5-year, 1-year, and 3-month yields, 1988 to 2008; vertical axis: rate in percent.]

    Figure 1: Time Series of Treasury Yields. Illustration of the weekly observed Treasury zero-coupon bond yields covering the period from December 4, 1987, to January 2, 2009. The yields shown have maturities: three-month, one-year, five-year, and ten-year.

    Maturity     Mean    Std. dev.   Skewness   Kurtosis
    in months    in %    in %
    3            4.52    2.02         0.03      2.41
    6            4.61    2.05        -0.01      2.40
    12           4.77    2.04        -0.04      2.41
    24           5.03    1.95        -0.03      2.43
    36           5.24    1.86         0.02      2.39
    60           5.58    1.72         0.15      2.25
    84           5.85    1.62         0.26      2.13
    120          6.16    1.52         0.36      2.05

    Table 1: Summary Statistics of Treasury Yields. Summary statistics for the sample of weekly observed Treasury zero-coupon bond yields covering the period from December 4, 1987, to January 2, 2009.
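    For reference, the Svensson (1995) functional form used to construct the yields can be evaluated as in the sketch below. It follows the displayed formula, in which each λ multiplies the maturity τ; the function name and the coefficient values are hypothetical and are not taken from the Gürkaynak et al. (2007) data set.

```python
import math


def svensson_yield(tau, beta0, beta1, beta2, beta3, lam1, lam2):
    """Zero-coupon yield at maturity tau (in years) from the Svensson curve."""
    def slope(lam):
        x = lam * tau
        return -math.expm1(-x) / x  # (1 - exp(-x)) / x, stable for small x

    def hump(lam):
        return slope(lam) - math.exp(-lam * tau)

    return (beta0 + slope(lam1) * beta1
            + hump(lam1) * beta2 + hump(lam2) * beta3)


# Hypothetical coefficients for one business day (illustration only).
params = dict(beta0=5.0, beta1=-2.0, beta2=1.0, beta3=0.5, lam1=0.6, lam2=0.1)
maturities = [0.25, 0.5, 1, 2, 3, 5, 7, 10]  # the eight maturities, in years
curve = [svensson_yield(t, **params) for t in maturities]
for t, y in zip(maturities, curve):
    print(f"{t:>5} years: {y:.3f}%")
```

    As the maturity goes to zero the yield approaches beta0 + beta1, and as it grows large the yield approaches beta0, which is a quick sanity check on any implementation.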

    To have the most active part of the maturity spectrum represented, we construct Treasury zero-coupon bond yields with the following maturities: 3-month, 6-month, 1-year, 2-year, 3-year, 5-year, 7-year, and 10-year. We use weekly data (Fridays) and limit our sample to the period from December 4, 1987, to January 2, 2009. The summary statistics are provided in Table 1, while Figure 1 illustrates the constructed time series of the three-month, one-year, five-year, and ten-year Treasury zero-coupon yields.

    6D’Amico and King (2013) show that the Svensson functional form has had some difficulty at times in fitting the underlying bond prices since the peak of the financial crisis. This explains why we end our sample on January 2, 2009. Furthermore, we emphasize that we merely use the U.S. Treasury yields to obtain realistic parameter sets to be used in the model simulations. Hence, ultimately, the accuracy of the Svensson smoothed curve does not matter for our exercise and the conclusions we draw.

    Maturity      Loading on
    in months     First P.C.   Second P.C.   Third P.C.
    3             -0.38        -0.44          0.52
    6             -0.39        -0.38          0.19
    12            -0.40        -0.25         -0.21
    24            -0.38        -0.03         -0.47
    36            -0.36         0.12         -0.42
    60            -0.33         0.33         -0.11
    84            -0.30         0.44          0.18
    120           -0.27         0.53          0.45
    % explained   94.12         5.58          0.27

    Table 2: Eigenvectors of the First Three Principal Components of Treasury Yields. The loadings of yields of various maturities on the first three principal components are shown. The final row shows the proportion of all bond yield variability accounted for by each principal component. The data consist of weekly U.S. Treasury zero-coupon bond yields from December 4, 1987, to January 2, 2009.

    Researchers have typically found that three factors are sufficient to model the time-

    variation in the cross section of Treasury bond yields (e.g., Litterman and Scheinkman, 1991).

    Indeed, for our weekly Treasury bond data, 99.97% of the total variation is accounted for by

    three factors. Table 2 reports the eigenvectors that correspond to the first three principal

    components of our data. The first principal component accounts for 94.1% of the variation in

    the Treasury bond yields, and its loading across maturities is uniformly negative. Thus, like

    a level factor, a shock to this component changes all yields in the same direction irrespective

    of maturity. The second principal component accounts for 5.6% of the variation in these data

    and has sizable negative loadings for the shorter maturities and sizable positive loadings for

    the long maturities. Thus, like a slope factor, a shock to this component steepens or flattens

    the yield curve. Finally, the third component, which accounts for only 0.3% of the variation,

    has a U-shaped factor loading as a function of maturity, which is naturally interpreted as a

    curvature factor.

    In summary, three factors explain 99.97% of the variation in this set of

    Treasury bond yields, and they have properties consistent with an interpretation of level,

    slope, and curvature as in the Nelson-Siegel model detailed in the following.
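    The principal components decomposition described in this section can be sketched in a few lines. The example below does not use the Treasury data; it assumes a synthetic panel of eight yields built from three hypothetical level, slope, and curvature factors plus small noise, and it extracts the components from the sample covariance matrix by power iteration with deflation (a stand-in for a library eigenvalue routine).

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def leading_eigenpair(mat, iters=1000):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    k = len(mat)
    vec = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-degenerate start
    for _ in range(iters):
        new = [sum(mat[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    value = sum(vec[i] * mat[i][j] * vec[j] for i in range(k) for j in range(k))
    return value, vec


def principal_components(rows, n_comp):
    """Return (share of total variance, loading vector) for the first n_comp PCs."""
    mat = covariance_matrix(rows)
    k = len(mat)
    total = sum(mat[i][i] for i in range(k))  # trace equals total variance
    out = []
    for _ in range(n_comp):
        value, vec = leading_eigenpair(mat)
        out.append((value / total, vec))
        # Deflate: remove the component just found before the next pass.
        mat = [[mat[i][j] - value * vec[i] * vec[j] for j in range(k)]
               for i in range(k)]
    return out


LAM = 0.5  # hypothetical Nelson-Siegel decay parameter
MATURITIES = [0.25, 0.5, 1, 2, 3, 5, 7, 10]  # in years
rng = random.Random(0)
panel = []
for _ in range(400):
    level, slope, curv = rng.gauss(5, 1.5), rng.gauss(-1, 1.0), rng.gauss(0, 0.8)
    row = []
    for tau in MATURITIES:
        s_load = (1 - math.exp(-LAM * tau)) / (LAM * tau)
        c_load = s_load - math.exp(-LAM * tau)
        row.append(level + s_load * slope + c_load * curv + rng.gauss(0, 0.02))
    panel.append(row)

pcs = principal_components(panel, 3)
shares = [share for share, _ in pcs]
print("variance shares of first three PCs:", [round(s, 4) for s in shares])
```

    In this synthetic panel the first component loads on all maturities with the same sign, as a level factor would, and the first three components together account for nearly all of the variation, mirroring the pattern reported for the Treasury data.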


  • 3 The AFNS Model with Constant Volatility

    In this section, we briefly review the AFNS model with constant volatility, throughout referred to as the AFNS0 specification.7,8 We start from a standard continuous-time affine arbitrage-free structure (Duffie and Kan, 1996) that underlies all the models to be estimated in this paper. To represent an affine diffusion process, define a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), Q)$, where the filtration $(\mathcal{F}_t) = \{\mathcal{F}_t : t \geq 0\}$ satisfies the usual conditions (Williams, 1997). The state variables $X_t$ are assumed to be a Markov process defined on a set $M \subset \mathbb{R}^n$ that solves the following stochastic differential equation (SDE)9

    \[
    dX_t = K^Q(t)\,[\theta^Q(t) - X_t]\,dt + \Sigma(t)\,D(X_t, t)\,dW_t^Q, \tag{1}
    \]

    where $W^Q$ is a standard Brownian motion in $\mathbb{R}^n$, the information of which is contained in the filtration $(\mathcal{F}_t)$. The drift terms $\theta^Q : [0, T] \to \mathbb{R}^n$ and $K^Q : [0, T] \to \mathbb{R}^{n \times n}$ are bounded, continuous functions.10 Similarly, the volatility matrix $\Sigma : [0, T] \to \mathbb{R}^{n \times n}$ is assumed to be a bounded, continuous function, while $D : M \times [0, T] \to \mathbb{R}^{n \times n}$ is assumed to have the following diagonal structure

    \[
    D(X_t, t) = \begin{pmatrix} \sqrt{\gamma^1(t) + \delta^1(t) X_t} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{\gamma^n(t) + \delta^n(t) X_t} \end{pmatrix},
    \]

    where

    \[
    \gamma(t) = \begin{pmatrix} \gamma^1(t) \\ \vdots \\ \gamma^n(t) \end{pmatrix}, \qquad
    \delta(t) = \begin{pmatrix} \delta^1_1(t) & \cdots & \delta^1_n(t) \\ \vdots & \ddots & \vdots \\ \delta^n_1(t) & \cdots & \delta^n_n(t) \end{pmatrix},
    \]

    $\gamma : [0, T] \to \mathbb{R}^n$ and $\delta : [0, T] \to \mathbb{R}^{n \times n}$ are bounded, continuous functions, and $\delta^i(t)$ denotes the $i$th row of the $\delta(t)$-matrix. Finally, the instantaneous risk-free rate is assumed to be an affine function of the state variables

    \[
    r_t = \rho_0(t) + \rho_1(t)' X_t,
    \]

    where $\rho_0 : [0, T] \to \mathbb{R}$ and $\rho_1 : [0, T] \to \mathbb{R}^n$ are bounded, continuous functions.

    7Our nomenclature follows CLR and draws on Dai and Singleton (2000). Our AFNSn models are members of their $A_n(3)$ class of models, which have three state variables and n square-root processes.

    8This model has been shown to exhibit both good in-sample fit and out-of-sample forecast accuracy for various yield curves. The empirical analysis conducted in CDR is based on unsmoothed Fama-Bliss data for nominal Treasury yields. Christensen et al. (2010) examine yields for nominal and real Treasuries as per Gürkaynak et al. (2007, 2010), while Christensen et al. (2014b) examine short-term LIBOR and highly-rated banks’ and financial firms’ corporate bond rates.

    9The affine property applies to bond prices; therefore, affine models only impose structure on the factor dynamics under the pricing measure.

    10Stationarity of the state variables is ensured if all the eigenvalues of $K^Q(t)$ are positive (if complex, the real component should be positive), see Ahn et al. (2002). However, stationarity is not a necessary requirement for the process to be well defined.

    Duffie and Kan (1996) prove that zero-coupon bond prices in this framework are exponential-affine functions of the state variables

    \[
    P(t, T) = E_t^Q\left[\exp\left(-\int_t^T r_u\,du\right)\right] = \exp\left(B(t, T)' X_t + A(t, T)\right),
    \]

    where B(t, T ) and A(t, T ) are the solutions to the following system of ordinary differential

    equations (ODEs)

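    \[
    \frac{dB(t, T)}{dt} = \rho_1 + (K^Q)' B(t, T) - \frac{1}{2} \sum_{j=1}^{n} \left(\Sigma' B(t, T) B(t, T)' \Sigma\right)_{j,j} (\delta^j)', \qquad B(T, T) = 0, \tag{2}
    \]

    \[
    \frac{dA(t, T)}{dt} = \rho_0 - B(t, T)' K^Q \theta^Q - \frac{1}{2} \sum_{j=1}^{n} \left(\Sigma' B(t, T) B(t, T)' \Sigma\right)_{j,j} \gamma^j, \qquad A(T, T) = 0, \tag{3}
    \]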

    and the possible time-dependence of the parameters is suppressed in the notation. These

    pricing functions imply that the zero-coupon yields are given by

    \[
    y(t, T) = -\frac{1}{T - t} \log P(t, T) = -\frac{B(t, T)'}{T - t} X_t - \frac{A(t, T)}{T - t}.
    \]

    As per CDR, assume that the instantaneous risk-free rate is defined by

    \[ r_t = L_t + S_t. \]

    In addition, assume that the state variables Xt = (Lt, St, Ct) are described by the following

    system of SDEs under the risk-neutral Q-measure

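    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}
    \left[ \begin{pmatrix} \theta^Q_1 \\ \theta^Q_2 \\ \theta^Q_3 \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    + \Sigma \begin{pmatrix} dW_t^{L,Q} \\ dW_t^{S,Q} \\ dW_t^{C,Q} \end{pmatrix}, \qquad \lambda > 0.
    \]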

    Then, zero-coupon bond yields are given by

    \[
    y(t, T) = L_t + \left(\frac{1 - e^{-\lambda(T-t)}}{\lambda(T-t)}\right) S_t + \left(\frac{1 - e^{-\lambda(T-t)}}{\lambda(T-t)} - e^{-\lambda(T-t)}\right) C_t - \frac{A(t, T)}{T - t}.
    \]

    This result defines the class of AFNS0 models derived in CDR and the additional term in the yield function is a so-called yield-adjustment term that represents convexity effects due to Jensen’s inequality; see CDR for details. To complete the model, we need to specify the risk premium structure that generates the connection to the dynamics under the real-world P-measure. To that end, it is important to note that there are no restrictions on the dynamic drift components under the empirical P-measure. Therefore, beyond the requirement of constant volatility, we are free to choose the dynamics under the P-measure. To facilitate the empirical implementation, we follow CDR and limit our focus to the essentially affine risk premium introduced in Duffee (2002). In the Gaussian framework, this specification implies that the risk premiums $\Gamma_t$ depend linearly on the state variables; that is,

    \[
    \Gamma_t = \gamma^0 + \gamma^1 X_t,
    \]

    where $\gamma^0 \in \mathbb{R}^3$ and $\gamma^1 \in \mathbb{R}^{3 \times 3}$ contain unrestricted parameters. The relationship between real-world yield curve dynamics under the P-measure and risk-neutral dynamics under the Q-measure is given by

    \[
    dW_t^Q = dW_t^P + \Gamma_t\,dt.
    \]

    Thus, we can write the P-dynamics of the state variables as

    \[
    dX_t = K^P(\theta^P - X_t)\,dt + \Sigma\,dW_t^P,
    \]

    where both $K^P$ and $\theta^P$ are allowed to vary freely relative to their counterparts under the Q-measure. Following CDR, we identify this class of models by fixing the means under the Q-measure at zero, i.e., $\theta^Q = 0$.11 Furthermore, CDR show that $\Sigma$ cannot be more than a triangular matrix for the model to be identified. Thus, the maximally flexible specification of the original AFNS model has Q-dynamics given by

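    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -\lambda & \lambda \\ 0 & 0 & -\lambda \end{pmatrix}
    \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} dt
    + \begin{pmatrix} \sigma_{11} & 0 & 0 \\ \sigma_{21} & \sigma_{22} & 0 \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,Q} \\ dW_t^{S,Q} \\ dW_t^{C,Q} \end{pmatrix},
    \]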

    while its P -dynamics are given by

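    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    = \begin{pmatrix} \kappa^P_{11} & \kappa^P_{12} & \kappa^P_{13} \\ \kappa^P_{21} & \kappa^P_{22} & \kappa^P_{23} \\ \kappa^P_{31} & \kappa^P_{32} & \kappa^P_{33} \end{pmatrix}
    \left[ \begin{pmatrix} \theta^P_1 \\ \theta^P_2 \\ \theta^P_3 \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    + \begin{pmatrix} \sigma_{11} & 0 & 0 \\ \sigma_{21} & \sigma_{22} & 0 \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,P} \\ dW_t^{S,P} \\ dW_t^{C,P} \end{pmatrix}.
    \]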
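    The step from the Q-dynamics to the P-dynamics can be spelled out by substituting the measure change into the Q-dynamics. The following is a sketch under the Gaussian assumptions of this section (constant $\Sigma$) and the additional assumption, made here only for the last step, that $\Sigma$ is invertible:

    \[
    \begin{aligned}
    dX_t &= K^Q(\theta^Q - X_t)\,dt + \Sigma\,dW_t^Q \\
         &= K^Q(\theta^Q - X_t)\,dt + \Sigma\left(dW_t^P + (\gamma^0 + \gamma^1 X_t)\,dt\right) \\
         &= \left[K^Q\theta^Q + \Sigma\gamma^0 - (K^Q - \Sigma\gamma^1) X_t\right] dt + \Sigma\,dW_t^P.
    \end{aligned}
    \]

    Matching terms with $dX_t = K^P(\theta^P - X_t)\,dt + \Sigma\,dW_t^P$ gives $K^P = K^Q - \Sigma\gamma^1$ and $K^P\theta^P = K^Q\theta^Q + \Sigma\gamma^0$. With $\theta^Q = 0$, any desired pair $(K^P, \theta^P)$ is obtained by setting $\gamma^1 = \Sigma^{-1}(K^Q - K^P)$ and $\gamma^0 = \Sigma^{-1}K^P\theta^P$, which is why the unrestricted risk premium parameters leave the P-dynamics free.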

    The main limitation of the AFNS0 class of models is that it is characterized by a constant volatility matrix $\Sigma$. CLR modify the AFNS0 model in a straightforward fashion in order to incorporate stochastic volatility. The key assumption to preserving the desirable Nelson-Siegel factor loading structure in the zero-coupon bond yield function is to maintain the $K^Q$ mean-reversion matrix under the Q-measure. Furthermore, all model classes will be characterized by an instantaneous risk-free rate defined as the sum of the first two factors

    \[ r_t = L_t + S_t. \]

    The details of the AFNS models with stochastic volatility are briefly provided in the following section.

    11CDR demonstrate that this choice is without loss of generality.

    4 Five AFNS Specifications with Stochastic Volatility

    In this section, we present five AFNS specifications with stochastic volatility that vary depending on whether they contain one, two, or three stochastic volatility factors and on the identity of those factors. For each model class, we derive the maximally flexible specification that can be obtained using the extended affine risk premium specification introduced in Cheridito et al. (2007).

    4.1 AFNS Models with One Stochastic Volatility Factor

    There are two AFNS stochastic volatility specifications that allow just one factor to exhibit

    stochastic volatility. The first, denoted as the AFNS1-L model, allows only the level factor

    to exhibit stochastic volatility. The state variables in this specification follow this system of

    stochastic differential equations under the risk-neutral Q-measure:12

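    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    = \begin{pmatrix} \varepsilon & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}
    \left[ \begin{pmatrix} \theta^Q_1 \\ \theta^Q_2 \\ \theta^Q_3 \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    + \begin{pmatrix} \sigma_{11} & 0 & 0 \\ \sigma_{21} & \sigma_{22} & 0 \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{L_t} & 0 & 0 \\ 0 & \sqrt{1 + \beta_{21} L_t} & 0 \\ 0 & 0 & \sqrt{1 + \beta_{31} L_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,Q} \\ dW_t^{S,Q} \\ dW_t^{C,Q} \end{pmatrix},
    \]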

    where the level factor Lt is a square-root process with stochastic volatility that affects the

    instantaneous volatility of the two other factors through the volatility sensitivity parameters,

    β21 and β31.

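    12Note that we cannot set $\kappa^Q_{11}$ to zero as that would eliminate the drift of $L_t$ and cause this process to remain at zero once it hits zero, which it will P-a.s. Instead, we fix this parameter at a small, but positive, $\varepsilon = 10^{-6}$, to get close to the unit-root property imposed in the AFNS0 model.

    For the factor loadings in the zero-coupon bond prices, $B^1(t, T)$ is the solution to

    \[
    \begin{aligned}
    \frac{dB^1(t, T)}{dt} = {} & 1 + \varepsilon B^1(t, T) - \frac{1}{2}\sigma_{11}^2 B^1(t, T)^2 - \frac{1}{2}\sigma_{21}^2 B^2(t, T)^2 - \frac{1}{2}\sigma_{31}^2 B^3(t, T)^2 \\
    & - \sigma_{21}\sigma_{11} B^1(t, T) B^2(t, T) - \sigma_{31}\sigma_{11} B^1(t, T) B^3(t, T) - \sigma_{21}\sigma_{31} B^2(t, T) B^3(t, T) \\
    & - \frac{1}{2}\beta_{21}\left[\sigma_{22}^2 B^2(t, T)^2 + \sigma_{32}^2 B^3(t, T)^2 + 2\sigma_{22}\sigma_{32} B^2(t, T) B^3(t, T)\right] - \frac{1}{2}\beta_{31}\sigma_{33}^2 B^3(t, T)^2,
    \end{aligned}
    \]

    while $B^2(t, T)$ and $B^3(t, T)$ are given by

    \[
    B^2(t, T) = -\left(\frac{1 - e^{-\lambda(T-t)}}{\lambda}\right),
    \]

    \[
    B^3(t, T) = (T - t)\,e^{-\lambda(T-t)} - \left(\frac{1 - e^{-\lambda(T-t)}}{\lambda}\right).
    \]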
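    As a quick numerical check on these closed-form expressions, the sketch below converts the bond-price loadings into yield loadings by dividing by $-(T - t)$ and compares them with the Nelson-Siegel slope and curvature loadings. The function names and the value of λ are assumptions chosen for illustration.

```python
import math


def b2(lam, tau):
    """Bond-price loading on the slope factor, B^2(t, T), with tau = T - t."""
    return -(1.0 - math.exp(-lam * tau)) / lam


def b3(lam, tau):
    """Bond-price loading on the curvature factor, B^3(t, T), with tau = T - t."""
    return tau * math.exp(-lam * tau) - (1.0 - math.exp(-lam * tau)) / lam


def ns_slope(lam, tau):
    """Nelson-Siegel slope loading in the yield function."""
    return (1.0 - math.exp(-lam * tau)) / (lam * tau)


def ns_curvature(lam, tau):
    """Nelson-Siegel curvature loading in the yield function."""
    return ns_slope(lam, tau) - math.exp(-lam * tau)


LAM = 0.5  # hypothetical value of lambda, for illustration only
max_gap = 0.0
for tau in [0.25, 1.0, 5.0, 10.0]:
    # Yield loadings are -B(t, T) / (T - t).
    gap_slope = abs(-b2(LAM, tau) / tau - ns_slope(LAM, tau))
    gap_curv = abs(-b3(LAM, tau) / tau - ns_curvature(LAM, tau))
    max_gap = max(max_gap, gap_slope, gap_curv)
print(f"largest discrepancy across maturities: {max_gap:.2e}")
```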

    The last two factor loadings match exactly the factor loadings of the slope and curvature

    factors in the Nelson-Siegel zero-coupon yield function, while the ODE for B1(t, T ) contains

    quadratic elements related to the stochastic volatility of Lt. The A(t, T )-function in the

    yield-adjustment term in this class of models must solve the following ODE:

    \[
    \frac{dA(t, T)}{dt} = -B(t, T)' K^Q \theta^Q - \frac{1}{2}\sigma_{22}^2 B^2(t, T)^2 - \frac{1}{2}\left(\sigma_{32}^2 + \sigma_{33}^2\right) B^3(t, T)^2 - \sigma_{22}\sigma_{32} B^2(t, T) B^3(t, T).
    \]

    To estimate this model, we specify the dynamics under the real-world P-measure through the measure change $dW_t^Q = dW_t^P + \Gamma_t\,dt$. Note that we are limited to the essentially affine risk premium structure introduced by Duffee (2002) for this particular model class.13 Given this limitation, the maximally flexible affine P-dynamics are, in general, given by

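    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    = \begin{pmatrix} \kappa^P_{11} & 0 & 0 \\ \kappa^P_{21} & \kappa^P_{22} & \kappa^P_{23} \\ \kappa^P_{31} & \kappa^P_{32} & \kappa^P_{33} \end{pmatrix}
    \left[ \begin{pmatrix} \theta^P_1 \\ \theta^P_2 \\ \theta^P_3 \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    + \begin{pmatrix} \sigma_{11} & 0 & 0 \\ \sigma_{21} & \sigma_{22} & 0 \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{L_t} & 0 & 0 \\ 0 & \sqrt{1 + \beta_{21} L_t} & 0 \\ 0 & 0 & \sqrt{1 + \beta_{31} L_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,P} \\ dW_t^{S,P} \\ dW_t^{C,P} \end{pmatrix}.
    \]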

    For the first factor with stochastic volatility, there is a restriction on the mean parameter θP1

    that we implement as14

    \[
    \theta_1^P = \frac{\varepsilon \cdot \theta_1^Q}{\kappa_{11}^P}.
    \]

    Furthermore, for this process to be well-defined under both probability measures, we require

    that

    \[
    \kappa_{11}^P \theta_1^P > 0 \quad \text{and} \quad \varepsilon \cdot \theta_1^Q > 0.
    \]

    These two inequalities are satisfied provided κP11 > 0 and θQ1 > 0. These restrictions ensure

    13 We choose not to use the extended affine risk premium specification for this particular specification because of the restriction imposed on κQ11 to obtain a level factor structure as similar as possible to the one in the Nelson-Siegel model. If we were to do so, we would expect the Feller condition for Lt to be violated under the Q-measure as Lt would approach a unit-root process (CLR observe such violations in the AFNS3 model to be detailed later despite imposing Feller conditions on all three state variables under both probability measures), but we stress that this is a self-imposed restriction based on the above concern, and not a theoretical necessity.

    14 A similar approach is used in the other model classes with stochastic volatility generated by the level factor.


  • that the Lt-process will move into positive territory whenever it hits the zero lower bound.

    Finally, we identify this class of models by fixing θQ2 = θQ3 = 0, that is, we eliminate the Q-

    means of the unconstrained processes as in CDR. These restrictions allow the corresponding

    means under the P -measure to be determined in the estimation.

    The natural next AFNS one-factor stochastic volatility specification would allow the slope

    factor to exhibit stochastic volatility. However, examination of the matrix

    \[
    K^Q = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix},
    \]
    shows that St cannot be a square-root process with Ct as an unconstrained process, if the
    important off-diagonal element κQ23 is to remain equal to −λ, which generates the unique
    factor loading of the curvature factor in the AFNS model. Thus, there is no admissible

    AFNS1-S model. Instead, we turn to the AFNS1-C model by allowing the curvature factor

    to be a stochastic volatility factor. This approach preserves the properties of the level and

    slope factors, allows the curvature factor to continue to serve as the stochastic mean of the

    slope factor under the pricing measure, and designates the curvature factor to be the source

    of stochastic volatility in the model.

    For the AFNS1-C model, we assume that the state variables Xt are described under the

    risk-neutral Q-measure as:

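    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} 0 & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^Q \\ \theta_2^Q \\ \theta_3^Q \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ 0 & \sigma_{22} & \sigma_{23} \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{1 + \beta_{13} C_t} & 0 & 0 \\ 0 & \sqrt{1 + \beta_{23} C_t} & 0 \\ 0 & 0 & \sqrt{C_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,Q} \\ dW_t^{S,Q} \\ dW_t^{C,Q} \end{pmatrix}.
    \]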

    The curvature factor here is a square-root process that induces stochastic volatility in the

    other two factors through the volatility sensitivity parameters, β13 and β23.

    In this model class, the first two factor loadings are identical to those in the AFNS0 model,

    while B3(t, T ) is the solution to:

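    \[
    \begin{aligned}
    \frac{dB_3(t,T)}{dt} = {} & -\lambda B_2(t,T) + \lambda B_3(t,T) - \frac{1}{2}\sigma_{13}^2 B_1(t,T)^2 - \frac{1}{2}\sigma_{23}^2 B_2(t,T)^2 - \frac{1}{2}\sigma_{33}^2 B_3(t,T)^2 \\
    & - \sigma_{13}\sigma_{23} B_1(t,T) B_2(t,T) - \sigma_{13}\sigma_{33} B_1(t,T) B_3(t,T) - \sigma_{23}\sigma_{33} B_2(t,T) B_3(t,T) \\
    & - \frac{1}{2}\beta_{13}\sigma_{11}^2 B_1(t,T)^2 - \frac{1}{2}\beta_{23}\left[ \sigma_{12}^2 B_1(t,T)^2 + \sigma_{22}^2 B_2(t,T)^2 + 2\sigma_{12}\sigma_{22} B_1(t,T) B_2(t,T) \right].
    \end{aligned}
    \]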

    The A(t, T )-function in the yield-adjustment term in this class of models solves the ODE:


  • \[
    \frac{dA(t,T)}{dt} = -B(t,T)'K^Q\theta^Q - \frac{1}{2}\left(\sigma_{11}^2 + \sigma_{12}^2\right) B_1(t,T)^2 - \frac{1}{2}\sigma_{22}^2 B_2(t,T)^2 - \sigma_{12}\sigma_{22} B_1(t,T) B_2(t,T).
    \]

    We estimate this model using the extended affine risk premium specification such that

    the measure change is dWQ = dWPt + Γtdt. The maximally flexible affine P -dynamics are,

    in general, given by

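    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} \kappa_{11}^P & \kappa_{12}^P & \kappa_{13}^P \\ \kappa_{21}^P & \kappa_{22}^P & \kappa_{23}^P \\ 0 & 0 & \kappa_{33}^P \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^P \\ \theta_2^P \\ \theta_3^P \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ 0 & \sigma_{22} & \sigma_{23} \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{1 + \beta_{13} C_t} & 0 & 0 \\ 0 & \sqrt{1 + \beta_{23} C_t} & 0 \\ 0 & 0 & \sqrt{C_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,P} \\ dW_t^{S,P} \\ dW_t^{C,P} \end{pmatrix}.
    \]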

    To keep the model arbitrage-free, Ct cannot be allowed to hit the zero lower bound. This

    outcome is ensured by requiring that the parameters for the Ct-process satisfy the Feller

    condition under both probability measures; i.e.,

    \[
    \kappa_{33}^P \theta_3^P > \frac{1}{2}\sigma_{33}^2 \quad \text{and} \quad \lambda \theta_3^Q > \frac{1}{2}\sigma_{33}^2.
    \]

    Finally, we identify this class of models by fixing θQ1 = θQ2 = 0, which allows the means

    under the P -measure of the unconstrained factors to vary freely and be determined in the

    estimation.
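    A minimal sketch of how the two Feller conditions for the AFNS1-C model could be checked for a candidate parameter set is given below. The function names and the numerical values are hypothetical and serve only to illustrate the inequalities; they are not estimates from this paper.

```python
def feller_holds(drift_constant, sigma):
    """True if drift_constant > sigma^2 / 2, the Feller condition for a square-root process."""
    return drift_constant > 0.5 * sigma ** 2


def afns1c_feller(kappa_p33, theta_p3, lam, theta_q3, sigma33):
    """Check the condition for C_t under the P-measure and under the Q-measure."""
    under_p = feller_holds(kappa_p33 * theta_p3, sigma33)
    under_q = feller_holds(lam * theta_q3, sigma33)
    return under_p, under_q


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(afns1c_feller(kappa_p33=0.8, theta_p3=0.02, lam=0.5, theta_q3=0.03, sigma33=0.1))
```

    In a constrained optimization, a check of this kind would typically be enforced through the parameterization or a penalty rather than tested after the fact; that design choice is outside what the text specifies.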

    4.2 AFNS Models with Two Stochastic Volatility Factors

    Our third and fourth classes of stochastic volatility models allow for two stochastic volatility

    factors. Although there are three potential specifications, the specification with just the level

    and slope factors exhibiting stochastic volatility is not admissible because it does not permit

    the important off-diagonal element κQ23 to equal −λ, which is the unique characteristic of the
    curvature factor in the original AFNS model. Instead, stochastic volatility is associated with

    either level and curvature or slope and curvature. The first of these specifications, denoted


  • AFNS2-LC, has factor dynamics under the risk-neutral Q-measure given by15

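    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} \varepsilon & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^Q \\ \theta_2^Q \\ \theta_3^Q \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & 0 & 0 \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{L_t} & 0 & 0 \\ 0 & \sqrt{1 + \beta_{21} L_t + \beta_{23} C_t} & 0 \\ 0 & 0 & \sqrt{C_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,Q} \\ dW_t^{S,Q} \\ dW_t^{C,Q} \end{pmatrix}.
    \]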

    The level and curvature factors, Lt and Ct, exhibit stochastic volatility and induce time-

    varying volatility in the slope factor, St, via the volatility sensitivity parameters, β21 and

    β23.

    The factor loadings in the zero-coupon bond price function are the unique solutions to

    the following set of ODEs:

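    \[
    \begin{aligned}
    \frac{dB_1(t,T)}{dt} = {} & 1 + \varepsilon B_1(t,T) - \frac{1}{2}\sigma_{11}^2 B_1(t,T)^2 - \frac{1}{2}\sigma_{21}^2 B_2(t,T)^2 \\
    & - \sigma_{11}\sigma_{21} B_1(t,T) B_2(t,T) - \frac{1}{2}\beta_{21}\sigma_{22}^2 B_2(t,T)^2, \\
    \frac{dB_2(t,T)}{dt} = {} & 1 + \lambda B_2(t,T), \\
    \frac{dB_3(t,T)}{dt} = {} & -\lambda B_2(t,T) + \lambda B_3(t,T) - \frac{1}{2}\sigma_{33}^2 B_3(t,T)^2 - \frac{1}{2}\sigma_{23}^2 B_2(t,T)^2 \\
    & - \sigma_{23}\sigma_{33} B_2(t,T) B_3(t,T) - \frac{1}{2}\beta_{23}\sigma_{22}^2 B_2(t,T)^2,
    \end{aligned}
    \]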

    where we note that the solution to B2(t, T ) is simply

    \[
    B_2(t,T) = -\frac{1 - e^{-\lambda(T-t)}}{\lambda}.
    \]

    Hence, St preserves its role as a slope factor. The A(t, T )-function is the solution to:

    \[
    \frac{dA(t,T)}{dt} = -B(t,T)'K^Q\theta^Q - \frac{1}{2}\sigma_{22}^2 B_2(t,T)^2.
    \]

    Using the extended affine risk premium structure, the maximally flexible affine P -dynamics

    15 Note that, as before, we fix ε = 10^{-6} to approximate the unit-root property imposed in the standard AFNS0 model.


  • are given by

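    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} \kappa_{11}^P & 0 & 0 \\ \kappa_{21}^P & \kappa_{22}^P & \kappa_{23}^P \\ \kappa_{31}^P & 0 & \kappa_{33}^P \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^P \\ \theta_2^P \\ \theta_3^P \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & 0 & 0 \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{L_t} & 0 & 0 \\ 0 & \sqrt{1 + \beta_{21} L_t + \beta_{23} C_t} & 0 \\ 0 & 0 & \sqrt{C_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,P} \\ dW_t^{S,P} \\ dW_t^{C,P} \end{pmatrix}.
    \]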

    For the level factor, the condition ε · θQ1 = κP11θP1 must be satisfied. Furthermore, to keep this
    model class arbitrage-free, Ct cannot hit the zero boundary, which is prevented by requiring
    that the parameters for the Ct-process satisfy the Feller condition under both probability
    measures; i.e.,16
    \[
    \kappa_{31}^P \theta_1^P + \kappa_{33}^P \theta_3^P > \frac{1}{2}\sigma_{33}^2 \quad \text{and} \quad \lambda \theta_3^Q > \frac{1}{2}\sigma_{33}^2.
    \]

    Finally, to have a well-defined Ct-process, the effect of the level factor on the drift of the
    curvature factor must be positive, which we impose with the κP31 ≤ 0 constraint. This condition
    implies that the two square-root processes cannot be negatively correlated. To identify this

    model class, we fix the θQ2 mean at zero.

    The second AFNS specification with two volatility factors allows the slope and curvature

    factors to be square-root processes while the level factor remains unconstrained. The factor

    dynamics of this AFNS2-SC model under the Q-measure are:

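    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} 0 & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^Q \\ \theta_2^Q \\ \theta_3^Q \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ 0 & \sigma_{22} & 0 \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{1 + \beta_{12} S_t + \beta_{13} C_t} & 0 & 0 \\ 0 & \sqrt{S_t} & 0 \\ 0 & 0 & \sqrt{C_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,Q} \\ dW_t^{S,Q} \\ dW_t^{C,Q} \end{pmatrix}.
    \]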

    Note that the square-root processes, St and Ct, are positively correlated through the
    off-diagonal element κQ23 = −λ < 0. Beyond generating their own stochastic volatility, these two
    factors induce instantaneous volatility for Lt via the volatility sensitivities, β12 and β13.

    For the first factor loading in the zero-coupon bond price function, this structure implies

    that

    B1(t, T ) = −(T − t),

    which preserves the role of the level factor. The next two factor loadings are the unique

    16 For Lt, we just need to ensure that the process does not turn negative, which is achieved provided that ε · θQ1 > 0 and κP11θP1 > 0.


  • solutions to:

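    \[
    \begin{aligned}
    \frac{dB_2(t,T)}{dt} = {} & 1 + \lambda B_2(t,T) - \frac{1}{2}\sigma_{22}^2 B_2(t,T)^2 - \frac{1}{2}\sigma_{12}^2 B_1(t,T)^2 \\
    & - \sigma_{12}\sigma_{22} B_1(t,T) B_2(t,T) - \frac{1}{2}\beta_{12}\sigma_{11}^2 B_1(t,T)^2, \\
    \frac{dB_3(t,T)}{dt} = {} & -\lambda B_2(t,T) + \lambda B_3(t,T) - \frac{1}{2}\sigma_{33}^2 B_3(t,T)^2 - \frac{1}{2}\sigma_{13}^2 B_1(t,T)^2 \\
    & - \sigma_{13}\sigma_{33} B_1(t,T) B_3(t,T) - \frac{1}{2}\beta_{13}\sigma_{11}^2 B_1(t,T)^2.
    \end{aligned}
    \]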

    The A(t, T )-function in the yield-adjustment term is the solution to

    \[
    \frac{dA(t,T)}{dt} = -B(t,T)'K^Q\theta^Q - \frac{1}{2}\sigma_{11}^2 B_1(t,T)^2.
    \]

    Using the extended affine risk premium specification, the maximally flexible affine P -dynamics

    can be written as

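    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} \kappa_{11}^P & \kappa_{12}^P & \kappa_{13}^P \\ 0 & \kappa_{22}^P & \kappa_{23}^P \\ 0 & \kappa_{32}^P & \kappa_{33}^P \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^P \\ \theta_2^P \\ \theta_3^P \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ 0 & \sigma_{22} & 0 \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{1 + \beta_{12} S_t + \beta_{13} C_t} & 0 & 0 \\ 0 & \sqrt{S_t} & 0 \\ 0 & 0 & \sqrt{C_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,P} \\ dW_t^{S,P} \\ dW_t^{C,P} \end{pmatrix}.
    \]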

    To keep this class of models arbitrage-free, the slope and curvature factors, St and Ct, must

    avoid hitting the zero-boundary. This outcome is ensured by imposing the Feller condition

    on their parameters as follows:

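    \[
    \kappa_{22}^P \theta_2^P + \kappa_{23}^P \theta_3^P > \frac{1}{2}\sigma_{22}^2; \quad
    \lambda \theta_2^Q - \lambda \theta_3^Q > \frac{1}{2}\sigma_{22}^2; \quad
    \kappa_{33}^P \theta_3^P + \kappa_{32}^P \theta_2^P > \frac{1}{2}\sigma_{33}^2; \quad \text{and} \quad
    \lambda \theta_3^Q > \frac{1}{2}\sigma_{33}^2.
    \]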

    Furthermore, for St and Ct to be well defined, the sign of the effect they have on each other

    must be positive, which we impose using the constraints κP23 ≤ 0 and κP32 ≤ 0. This implies
    that the two square-root processes cannot be negatively correlated. Finally, we identify this

    class of models by fixing θQ1 = 0, which allows θP1 to vary freely.


  • 4.3 AFNS Models with Three Stochastic Volatility Factors

    In the fifth and final AFNS3 specification, all three factors exhibit stochastic volatility. The

    dynamics of Xt are described under the Q-measure as17

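    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} \varepsilon & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^Q \\ \theta_2^Q \\ \theta_3^Q \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & 0 & 0 \\ 0 & \sigma_{22} & 0 \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{L_t} & 0 & 0 \\ 0 & \sqrt{S_t} & 0 \\ 0 & 0 & \sqrt{C_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,Q} \\ dW_t^{S,Q} \\ dW_t^{C,Q} \end{pmatrix}.
    \]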

    In this model class, the factor loadings in the zero-coupon bond price function are given by

    the unique solution to

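    \[
    \begin{aligned}
    \frac{dB_1(t,T)}{dt} &= 1 + \varepsilon B_1(t,T) - \frac{1}{2}\sigma_{11}^2 B_1(t,T)^2, \\
    \frac{dB_2(t,T)}{dt} &= 1 + \lambda B_2(t,T) - \frac{1}{2}\sigma_{22}^2 B_2(t,T)^2, \\
    \frac{dB_3(t,T)}{dt} &= -\lambda B_2(t,T) + \lambda B_3(t,T) - \frac{1}{2}\sigma_{33}^2 B_3(t,T)^2,
    \end{aligned}
    \]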

    while the A(t, T )-function in the yield-adjustment term is given by the solution to:

    \[
    \frac{dA(t,T)}{dt} = -B(t,T)'K^Q\theta^Q.
    \]

    Applying the extended affine risk premium specification, the maximally flexible affine P -

    dynamics are given by

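    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} \kappa_{11}^P & 0 & 0 \\ \kappa_{21}^P & \kappa_{22}^P & \kappa_{23}^P \\ \kappa_{31}^P & \kappa_{32}^P & \kappa_{33}^P \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^P \\ \theta_2^P \\ \theta_3^P \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & 0 & 0 \\ 0 & \sigma_{22} & 0 \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} \sqrt{L_t} & 0 & 0 \\ 0 & \sqrt{S_t} & 0 \\ 0 & 0 & \sqrt{C_t} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,P} \\ dW_t^{S,P} \\ dW_t^{C,P} \end{pmatrix}.
    \]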

    For Lt, the constraint ε · θQ1 = κP11θP1 must be satisfied. The limited risk premium specification
    due to the near unit-root property of Lt also implies that St and Ct cannot impact the drift
    of Lt once κQ12 and κQ13 have been fixed at zero. We need these restrictions in order to match

    the Nelson-Siegel factor loading structure as closely as possible.

    To keep this model class arbitrage-free, St and Ct must not hit their zero lower bounds.

    17 Note that we again fix ε = 10^{-6} to approximate the unit-root property imposed in the AFNS0 model.


  • We ensure this by imposing the Feller condition on their parameters under both probability

    measures, i.e.,18

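    \[
    \kappa_{21}^P \theta_1^P + \kappa_{22}^P \theta_2^P + \kappa_{23}^P \theta_3^P > \frac{1}{2}\sigma_{22}^2; \quad
    \lambda \theta_2^Q - \lambda \theta_3^Q > \frac{1}{2}\sigma_{22}^2;
    \]
    \[
    \kappa_{31}^P \theta_1^P + \kappa_{32}^P \theta_2^P + \kappa_{33}^P \theta_3^P > \frac{1}{2}\sigma_{33}^2; \quad \text{and} \quad
    \lambda \theta_3^Q > \frac{1}{2}\sigma_{33}^2.
    \]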

    Furthermore, to have well-defined processes for St and Ct, the sign of the effect that the factors

    have on each of these two factors must be positive, which we impose with the restrictions

    κP21 ≤ 0, κP23 ≤ 0, κP31 ≤ 0, and κP32 ≤ 0. Note that these restrictions imply that the three
    square-root processes cannot be negatively correlated.

    5 Estimation Methodology

    The stochastic volatility models described in the previous section are estimated using the

    Kalman filter algorithm. In affine term structure models, zero-coupon yields are affine func-

    tions of the state variables such that

    \[
    y_t(\tau) = -\frac{1}{\tau} B(\tau)' X_t - \frac{1}{\tau} A(\tau) + \varepsilon_t(\tau),
    \]

    where εt(τ) represents i.i.d. Gaussian white noise measurement errors. The conditional mean

    for multi-dimensional affine continuous-time diffusion processes is given by

    \[
    E^P[X_T | X_t] = \left(I - \exp(-K^P (T-t))\right)\theta^P + \exp(-K^P (T-t)) X_t, \qquad (4)
    \]
    where exp(−KP(T − t)) is a matrix exponential. In general, the conditional covariance matrix
    for affine diffusion processes is given by
    \[
    V^P[X_T | X_t] = \int_t^T \exp(-K^P (T-s))\, \Sigma D(E^P[X_s | X_t]) D(E^P[X_s | X_t])' \Sigma' \exp(-(K^P)' (T-s))\, ds. \qquad (5)
    \]

    Stationarity of the system under the P -measure is ensured if the real components of all

    the eigenvalues of KP are positive, and this condition is imposed in all estimations. For this

    reason, we can start the Kalman filter at the unconditional mean and covariance matrix19
    \[
    \hat{X}_0 = \theta^P \quad \text{and} \quad \hat{\Sigma}_0 = \int_0^\infty e^{-K^P s}\, \Sigma D(\theta^P) D(\theta^P)' \Sigma'\, e^{-(K^P)' s}\, ds.
    \]

    However, the introduction of stochastic volatility implies that the factors are no longer

    simply Gaussian. We choose to approximate the true probability distribution of the state

    variables with the first and second moments and use the Kalman filter algorithm as if the

    18 For Lt, we just need to ensure that the process does not become negative, which is assured if ε · θQ1 > 0 and κP11θP1 > 0.

    19 In the estimation, we calculate the conditional and unconditional covariance matrices using the analytical solutions provided in Fisher and Gilles (1996).


  • state variables were Gaussian.20 Thus, the state equation is given by

    \[
    X_t = \left(I - \exp(-K^P \Delta t)\right)\theta^P + \exp(-K^P \Delta t) X_{t-1} + \eta_t, \qquad \eta_t \sim N(0, V_{t-1}),
    \]

    where ∆t is the time between observations and Vt−1 is the conditional covariance matrix given

    in equation (5). However, the discrete nature of the state equation can cause the square-root

    processes to become negative despite the fact that the parameter sets are forced to satisfy

    Feller conditions and other nonnegativity restrictions. Whenever this happens, we follow the

    literature and simply truncate those processes at zero; see Duffee (1999) for an example.

    In the Kalman filter estimations, the error structure is given by

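    \[
    \begin{pmatrix} \eta_t \\ \varepsilon_t \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V_{t-1} & 0 \\ 0 & H \end{pmatrix} \right],
    \]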

    where H is assumed to be a diagonal matrix of the measurement error standard deviations,

    σε, that are specific to each yield maturity when we perform estimations with the Treasury

    yield data described in Section 2, while σε is assumed to be uniform for all yield maturities

    in the simulated yield samples as discussed below. The linear least-squares optimality of the

    Kalman filter requires that the white noise transition and measurement errors be orthogonal

    to the initial state; i.e., E[f_0 η_t′] = 0 and E[f_0 ε_t′] = 0. Finally, the standard deviations of the

    estimated parameters are calculated as

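    \[
    \Sigma(\hat{\psi}) = \frac{1}{T} \left[ \frac{1}{T} \sum_{t=1}^{T} \frac{\partial \log l_t(\hat{\psi})}{\partial \psi} \frac{\partial \log l_t(\hat{\psi})}{\partial \psi}' \right]^{-1},
    \]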

    where ψ̂ denotes the optimal parameter set.
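    To make the recursion concrete, the following is a minimal sketch of the Kalman filter log likelihood in a deliberately simplified setting: a single Gaussian factor observed through a single yield. This is an assumption made for brevity and is not the three-factor implementation used in the paper. The names a, b, and h are hypothetical stand-ins for −A(τ)/τ, −B(τ)/τ, and the measurement error variance. In this scalar Gaussian case, equations (4) and (5) reduce to closed-form expressions, and the filter is started at the unconditional mean and variance.

```python
import math


def kalman_loglik_scalar(ys, kappa, theta, sigma, a, b, h, dt):
    """Gaussian log likelihood of yields ys for dX = kappa*(theta - X)dt + sigma dW.

    Measurement equation (scalar sketch): y_t = a + b * X_t + eps_t, Var(eps_t) = h.
    """
    phi = math.exp(-kappa * dt)                        # scalar version of exp(-K^P dt)
    q = sigma ** 2 * (1.0 - phi ** 2) / (2.0 * kappa)  # conditional variance over dt
    x = theta                                          # unconditional mean
    p = sigma ** 2 / (2.0 * kappa)                     # unconditional variance
    loglik = 0.0
    for y in ys:
        # Prediction step.
        x_pred = (1.0 - phi) * theta + phi * x
        p_pred = phi ** 2 * p + q
        # Update step.
        v = y - (a + b * x_pred)                       # prediction error
        f = b ** 2 * p_pred + h                        # prediction error variance
        gain = p_pred * b / f
        x = x_pred + gain * v
        p = (1.0 - gain * b) * p_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v ** 2 / f)
    return loglik
```

    In the models with stochastic volatility, the conditional variance q would depend on the filtered state and the filtered square-root factors would be truncated at zero, as described above; neither feature is included in this sketch.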

    6 Simulation study

    To study the efficiency of the Kalman filter in estimating affine term structure models with
    and without stochastic volatility, we undertake a carefully orchestrated simulation study, the
    details of which are provided in the following.

    First, we search for a realistic parameter set for each AFNSi model class to use in the
    simulations. From CDR it follows that neither maximally flexible models nor parsimonious
    independent-factors models appear to reflect the true dynamics of the state variables: the
    former perform poorly out of sample, and the latter are counterfactual in that the state variables
    do appear to be correlated. For that reason, we look for parsimonious specifications in between
    these two extremes. For each model class, we go through a general-to-specific model selection

    20 A few notable examples of papers that follow this approach include Duffee (1999), Driessen (2005), Feldhütter and Lando (2008), and Christensen et al. (2015).


  • procedure using the Bayesian Information Criterion defined as

    BIC(k) = −2 logL+ k log T,

    where k is the number of estimated parameters, while T is the number of observations in the

    data. As described in Section 2, our data sample contains T = 1,101 weekly observations.

    Since CDR report limited gains in terms of forecasting performance from allowing for flexible

    specifications of the volatility matrix Σ, we restrict this matrix to be diagonal throughout.
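    As a quick check of the criterion, the sketch below evaluates BIC(k) for two of the AFNS0 specifications using the log likelihood values and parameter counts reported in Table 3 and T = 1,101. The function name is an assumption made for illustration.

```python
import math


def bic(log_likelihood, k, n_obs):
    """Bayesian Information Criterion: -2 log L + k log T."""
    return -2.0 * log_likelihood + k * math.log(n_obs)


if __name__ == "__main__":
    T = 1101
    print(round(bic(51042.41, 24, T), 1))  # unrestricted K^P, specification (1)
    print(round(bic(51035.98, 19, T), 1))  # preferred specification (6)
```

    A lower value is preferred, so the more parsimonious specification (6) is selected even though its log likelihood is lower.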

    Based on the estimated parameters from the preferred specification for each model class,

    we perform two sets of simulations. In the first, we simulate N = 1,000 sample paths for the

    three state variables observed at a monthly frequency over a ten-year period. In the other,

    we repeat this, but simulate over a forty-year period.21

    In a second step, these simulated factor paths are converted into simulated zero-coupon

    yields observed at a monthly frequency with the following eight maturities, 0.25, 0.5, 1, 2, 3,

    5, 7, and 10 years. Finally, a Gaussian i.i.d. measurement error is added to each bond yield.

    To study the role, if any, of the data quality, we consider two values for the measurement

    error standard deviation, σε. In one simulated data sample, this standard deviation is fixed

    uniformly at 1 basis point, in the other data sample it is fixed uniformly at 10 basis points,

    which is at the upper end of the noise we observe in the Treasury yield data. In order to

    make the results as comparable as possible across model classes, the simulated measurement

    errors are kept the same, that is, the simulated measurement errors are the same for the

    ten- and forty-year samples, respectively, independent of the model class being simulated and

    independent of the size of the measurement error standard deviation.
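    The conversion from simulated factors to noisy yields can be sketched as follows. For brevity, the sketch uses the Nelson-Siegel yield loadings (a level loading of one plus the slope and curvature loadings) and omits the model-specific yield-adjustment term −A(τ)/τ; that omission, the function names, and the use of Python's random module are assumptions made for illustration and do not reproduce the code behind the paper's simulations.

```python
import math
import random

MATURITIES = [0.25, 0.5, 1, 2, 3, 5, 7, 10]  # years, as listed in the text


def ns_yield(level, slope, curvature, lam, tau):
    """Nelson-Siegel yield at maturity tau, without the yield-adjustment term."""
    decay = math.exp(-lam * tau)
    slope_loading = (1.0 - decay) / (lam * tau)
    curvature_loading = slope_loading - decay
    return level + slope_loading * slope + curvature_loading * curvature


def noisy_yields(state, lam, sigma_eps, rng):
    """Yields at the eight maturities plus i.i.d. Gaussian measurement errors."""
    level, slope, curvature = state
    return [ns_yield(level, slope, curvature, lam, tau) + rng.gauss(0.0, sigma_eps)
            for tau in MATURITIES]


if __name__ == "__main__":
    rng = random.Random(0)               # fixed seed so the draws can be reused
    state = (0.07, -0.03, -0.02)         # hypothetical factor values
    print(noisy_yields(state, lam=0.5, sigma_eps=0.0001, rng=rng))  # 1 basis point
```

    Reusing the same seeded generator across model classes is one way to keep the simulated measurement errors identical, in the spirit of the comparison described above.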

    We now turn to the details of the simulation of the factor paths. The continuous-time

    P -dynamics are, in general, given by

    dXt = KP (θP −Xt)dt+ΣD(Xt)dWPt .

    For both restricted square-root processes and unconstrained processes we approximate the

    continuous-time process using the Euler approximation.22 To exemplify, for a restricted

    square-root process,

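    \[
    dX_t^i = \kappa_{ii}^P (\theta_i^P - X_t^i)\,dt + \kappa_{ij}^P (\theta_j^P - X_t^j)\,dt + \sigma_{ii} \sqrt{X_t^i}\, dW_t^{P,i},
    \]
    the algorithm is
    \[
    X_t^i = X_{t-1}^i + \kappa_{ii}^P (\theta_i^P - X_{t-1}^i)\Delta t + \kappa_{ij}^P (\theta_j^P - X_{t-1}^j)\Delta t + \sigma_{ii} \sqrt{X_{t-1}^i} \sqrt{\Delta t}\, z_t^i, \qquad z_t^i \sim N(0, 1).
    \]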

    21 For the Gaussian AFNS0 model class we also take out weekly observations from the simulated paths. The results presented later show that increasing the sampling frequency does not materially alter any of the results. For that reason we do not analyze weekly samples for the non-Gaussian AFNS model classes.

    22Thompson (2008) is an example.


  • We fix ∆t at a uniform value of 0.0001, which is equivalent to approximately 27 shocks per day

    to each process through the Brownian motion. As Feller conditions and other non-negativity

    requirements are imposed in the estimations performed with the observed Treasury yields, the

    parameter sets used in the simulations satisfy all non-negativity requirements, so the “true”

    underlying continuous-time process never becomes negative P -a.s. However, for the discretely

    observed process above there is always a positive, but usually very small, probability that

    the approximation will become negative. Whenever this happens, we truncate the simulated

    square-root processes at 0 similar to what we do in the model estimations.
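    A minimal sketch of the Euler scheme with truncation at zero is shown below for a single square-root factor. The cross-factor drift term is omitted for brevity, and the function name and parameter values are hypothetical; the sketch illustrates the algorithm rather than reproducing the simulation code.

```python
import math
import random


def simulate_square_root(x0, kappa, theta, sigma, dt, n_steps, rng):
    """Euler path of dX = kappa*(theta - X)dt + sigma*sqrt(X)dW, truncated at zero."""
    x = x0
    path = [x]
    sqrt_dt = math.sqrt(dt)
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        x = x + kappa * (theta - x) * dt + sigma * math.sqrt(x) * sqrt_dt * z
        x = max(x, 0.0)  # truncate at zero if the discretized process turns negative
        path.append(x)
    return path


if __name__ == "__main__":
    dt = 0.0001
    print(round((1.0 / dt) / 365.0))  # shocks per day implied by dt
    rng = random.Random(0)
    # Hypothetical parameters that satisfy the Feller condition.
    path = simulate_square_root(0.02, 0.8, 0.02, 0.03, dt, 10000, rng)
    print(min(path) >= 0.0)
```

    Because the truncation is applied before the next step, the square root in the diffusion term is always taken of a non-negative number.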

    As for the starting point of the simulation algorithm, X0, we ideally want to draw it

    from the unconditional joint distribution of the three state variables. However, with the

    exception of the Gaussian AFNS0 model, we do not know the unconditional distribution

    of Xt = (Lt, St, Ct). To overcome this problem, we take the estimated value of the three

    state variables at the end of the observed Treasury yield sample and simulate the three state

    variables according to the algorithm above for 100 years and repeat this 1,000 times. This

    effectively gives us random draws from the joint unconditional distribution of Xt = (Lt, St, Ct).

    These starting values are identical for both the ten- and forty-year simulated samples within

    each model class, again in an attempt to make the results as comparable as possible.

    In the final step, we use the 1,000 simulated samples from each exercise as input into a

    corresponding number of Kalman filter estimations where we use the true parameters as the

    starting point for each optimization. Since we are estimating the true model in each case,

    this provides us with a clean read of the properties of the Kalman filter as an estimator, not

    impacted by any errors related to model misspecification.

    7 Results for the Gaussian AFNS0 Model

    In this section, we describe our estimation results based on the simulated data of the Gaussian

    AFNS0 model that serves as the benchmark in our analysis. For this model class, the Kalman

    filter is a consistent and efficient estimator equivalent to exact maximum likelihood estimation.

    This allows us to study whether there is any finite-sample bias in the estimated parameters.

    Due to the efficiency of the Kalman filter, such finite-sample bias will affect any estimator.

    Hence, these results provide an ideal background for understanding the bias in Kalman filter-

    based estimations of non-Gaussian AFNS models with stochastic volatility.

    To begin, the result of the model selection for the Gaussian AFNS0 model is reported in

    Table 3. The statistics in the table show that the preferred specification according to the

    Bayesian Information Criterion has P -dynamics given by

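    \[
    \begin{pmatrix} dL_t \\ dS_t \\ dC_t \end{pmatrix}
    =
    \begin{pmatrix} \kappa_{11}^P & 0 & 0 \\ 0 & \kappa_{22}^P & \kappa_{23}^P \\ 0 & 0 & \kappa_{33}^P \end{pmatrix}
    \left[ \begin{pmatrix} \theta_1^P \\ \theta_2^P \\ \theta_3^P \end{pmatrix} - \begin{pmatrix} L_t \\ S_t \\ C_t \end{pmatrix} \right] dt
    +
    \begin{pmatrix} \sigma_{11} & 0 & 0 \\ 0 & \sigma_{22} & 0 \\ 0 & 0 & \sigma_{33} \end{pmatrix}
    \begin{pmatrix} dW_t^{L,P} \\ dW_t^{S,P} \\ dW_t^{C,P} \end{pmatrix}.
    \]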


  • Alternative                         Goodness-of-fit statistics
    specifications                    logL         k    p-value     BIC
    (1) Unrestricted KP               51,042.41    24   n.a.        -101,916.7
    (2) κP12 = 0                      51,042.40    23   0.8875      -101,923.7
    (3) κP12 = κP32 = 0               51,042.40    22   0.8875      -101,930.7
    (4) κP12 = κP32 = κP31 = 0        51,042.23    21   0.5598      -101,937.4
    (5) κP31 = . . . = κP21 = 0       51,037.57    20   0.0023      -101,935.1
    (6) κP31 = . . . = κP13 = 0       51,035.98    19   0.0745      -101,938.9
    (7) κP31 = . . . = κP23 = 0       51,015.27    18   < 0.0001    -101,904.5

    Table 3: Evaluation of Alternative Specifications of the AFNS0 Model.
    There are seven alternative estimated specifications of the AFNS0 model with constant volatility. Each
    specification is listed with its maximum log likelihood (logL), number of parameters (k), the p-value
    from a likelihood ratio test of the hypothesis that it differs from the specification above with one more
    free parameter, and the Bayesian information criterion (BIC).

    KP        KP·,1        KP·,2        KP·,3        θP           Σ
    KP1,·     0.03943      0            0            0.07242      Σ1,·   0.00570
              (0.07332)                              (0.01703)           (0.00009)
    KP2,·     0            0.43102      -0.69198     -0.03173     Σ2,·   0.00888
                           (0.11962)    (0.08121)    (0.01271)           (0.00020)
    KP3,·     0            0            0.83341      -0.01873     Σ3,·   0.02728
                                        (0.22767)    (0.00676)           (0.00047)

    Table 4: Parameter Estimates for the Preferred AFNS0 Model.
    The estimated parameters of the KP-matrix, the θP-vector, and the Σ-matrix for the preferred AFNS0
    model according to the Bayesian Information Criterion are shown. The Q-related parameter is λ =
    0.53650 (0.00363). The numbers in parentheses are the estimated standard deviations of the parameter
    estimates. The maximum log likelihood value is 51,035.98.

    The estimated dynamic parameters for this specification are reported in Table 4. Relative
    to the unrestricted model, the likelihood ratio test statistic for the five restrictions imposed
    jointly in the preferred specification is
    \[
    LR_{BIC} = 2\,[51{,}042.41 - 51{,}035.98] = 12.86 \sim \chi^2(5).
    \]

    The probability of observing at least 12.86 with five degrees of freedom is 0.0247. Thus,

    the five restrictions are not jointly supported by the data at the 5% level, but they are not

    overwhelmingly rejected either.
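    The test statistic and its p-value can be reproduced with a few lines of code. The sketch below uses a closed-form expression for the chi-square upper tail that holds specifically for five degrees of freedom, so that only the standard library is needed; the function name is an assumption, and a statistics library would be the natural choice for general degrees of freedom.

```python
import math


def chi2_sf_5df(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 5 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * (1.0 + x / 3.0))


if __name__ == "__main__":
    lr = 2.0 * (51042.41 - 51035.98)  # log likelihoods of specifications (1) and (6)
    print(round(lr, 2), round(chi2_sf_5df(lr), 4))
```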

    In terms of the estimated parameters reported in Table 4 that are used in the simulations

    of the AFNS0 model, we note the usual pattern that the level factor is the most persistent

    and least volatile factor, the curvature is the most volatile and least persistent factor, and the

    slope factor has dynamic properties in between those two extremes. Finally, the estimated

    value of λ is close to 0.5, which is a typical value for this parameter.


  • Ten-year samples, σε = 1 bp
    Parameter   True       Mean       Std. dev.   5%         1st quartile   Median     3rd quartile   95%
    κP11        0.03943    0.50335    0.41496     0.07704    0.20025        0.39116    0.67401        1.2888
    κP22        0.43102    0.61752    0.28653     0.25449    0.42644        0.56905    0.74371        1.1264
    κP23        -0.69198   -0.76423   0.20026     -1.1174    -0.88497       -0.74995   -0.63269       -0.46515
    κP33        0.83341    1.2243     0.57056     0.53381    0.82011        1.1043     1.4970         2.2933
    σ11         0.00570    0.00571    0.00024     0.00533    0.00555        0.00570    0.00587        0.00611
    σ22         0.00888    0.00882    0.00058     0.00782    0.00842        0.00880    0.00924        0.00973
    σ33         0.02728    0.02747    0.00178     0.02454    0.02625        0.02752    0.02866        0.03038
    θP1         0.07242    0.07273    0.01802     0.04275    0.06020        0.07265    0.08519        0.10222
    θP2         -0.03173   -0.03199   0.01407     -0.05503   -0.04109       -0.03195   -0.02253       -0.00902
    θP3         -0.01873   -0.01903   0.00859     -0.03407   -0.02487       -0.01863   -0.01319       -0.00519
    λ           0.53650    0.53635    0.00322     0.53134    0.53415        0.53642    0.53838        0.54172
    σε          0.00010    0.00010    0.00000     0.00010    0.00010        0.00010    0.00010        0.00010

    Ten-year samples, σε = 10 bps
    Parameter   True       Mean       Std. dev.   5%         1st quartile   Median     3rd quartile   95%
    κP11        0.03943    0.53931    0.47879     0.08383    0.21206        0.39589    0.70937        1.4538
    κP22        0.43102    0.62201    0.29042     0.24693    0.41999        0.57518    0.75535        1.1663
    κP23        -0.69198   -0.77180   0.20727     -1.1374    -0.89486       -0.75441   -0.63236       -0.46117
    κP33        0.83341    1.2275     0.58183     0.51673    0.81498        1.1118     1.4986         2.3486
    σ11         0.00570    0.00583    0.00067     0.00474    0.00539        0.00585    0.00627        0.00691
    σ22         0.00888    0.00871    0.00079     0.00749    0.00814        0.00870    0.00927        0.00996
    σ33         0.02728    0.02755    0.00263     0.02331    0.02580        0.02750    0.02936        0.03191
    θP1         0.07242    0.07281    0.01805     0.04235    0.06009        0.07300    0.08522        0.10207
    θP2         -0.03173   -0.03199   0.01412     -0.05512   -0.04133       -0.03202   -0.02258       -0.00862
    θP3         -0.01873   -0.01917   0.00868     -0.03430   -0.02478       -0.01885   -0.01323       -0.00548
    λ           0.53650    0.53714    0.02259     0.50028    0.52241        0.53714    0.55097        0.57385
    σε          0.00100    0.00100    0.00003     0.00096    0.00098        0.00100    0.00102        0.00104

    Table 5: Summary Statistics of Estimated Parameters from Simulated Ten-Year Monthly Samples of the Preferred AFNS0 Model.
    The table reports the summary statistics of the estimation results from N = 1,000 simulated data
    sets of the preferred AFNS0 model, each with a length of ten years and a uniform measurement error
    standard deviation of σε = 1 basis point and σε = 10 basis points, respectively.

    7.1 Analysis of Ten-Year Monthly Samples

    The summary statistics from the 1,000 estimations based on simulated ten-year monthly

    samples of the preferred specification of the AFNS0 model are reported in Table 5. We note

    that there is an upward bias in the absolute size of all four mean-reversion parameters, that

    is, the three positive parameters in the diagonal of KP have means and medians well above

    their true values, while the negative off-diagonal element, κP23, has a mean and median that
    are below its true value. Hence, there is notable finite-sample bias in the estimates of these

    parameters. In particular, the near unit-root property of the Nelson-Siegel level factor is

    causing the estimator significant difficulty. More than 95% of the estimates of κP11 are above

    0.077 despite its true value of only 0.039. These results show that a near unit-root process

    can come across as very persistent as well as rather quickly mean-reverting in samples of short

    length such as the ten-year samples analyzed here. Figure 2 provides the visual representation

    of the estimated mean-reversion parameters across the 1,000 samples. We note that they have
    notably skewed distributions, partly as a consequence of the imposed stationarity.

  • [Figure 2 appears here with four panels: (a) κP11, (b) κP22, (c) κP23, and (d) κP33. In each panel, the horizontal axis is the estimation number (0 to 1,000) and the vertical axis is the parameter estimate.]

    Figure 2: Estimated Mean-Reversion Parameters from Simulated Ten-Year Monthly Samples of the Preferred AFNS0 Model.
    Illustration of the estimated mean-reversion parameters in the KP matrix from N = 1,000 simulated
    data sets of the preferred AFNS0 model, each with a length of ten years sampled monthly and a uniform
    measurement error standard deviation of σε = 10 basis points. The true value of each parameter
    is indicated with a horizontal solid grey line.

    Turning to the three volatility parameters in the Σ matrix, we note that they are well

    determined with almost identical means and medians, both close to the true values, and the

    standard deviations of their estimates are also small. Importantly, though, their accuracy

    is sensitive to the quality of the data as a low value of σε decreases the dispersion of their


  • estimated values. This result applies to all three factors, and it suggests that the values of the

    volatility parameters are determined to a large extent from their impact on the cross-sectional

    fit of yields rather than from the time series properties of the state variables, which are the

    same in the simulated data by construction and independent of the value of σε.

    The mean parameters under the P-measure, θP, represent the opposite case. Due to the

    flexibility of the essentially affine risk premium specification within the Gaussian models,

    these parameters play no role for the Q-dynamics and, by implication, have no effect on the

    cross-sectional fit of the model. As a consequence, their estimated values are purely derived

    from the time series properties of the state variables and their distributions are independent

    of the level of noise in the yield data. Furthermore, they are estimated without any detectable

    bias, and the standard deviation of their estimated values is also relatively modest, but larger

    the more persistent the factor in question is.

    Focusing on the estimates of λ, Table 5 shows that this parameter is well determined

    in the estimation with a small standard deviation. It has a 95% confidence interval given

    by (0.500, 0.574) for the case with a measurement error standard deviation of 10 basis points, and

    an even narrower interval given by (0.531, 0.541) when we reduce the standard deviation of

    the measurement noise to 1 basis point. Since λ only affects the risk-neutral Q-dynamics,

    it is exclusively determined from the cross section of yields and therefore sensitive to the

    quality of the data. Still, variation in the values of λ in the ranges above does not alter the

    cross-sectional fit of the model by much. Thus, its statistical uncertainty is largely without

    economic consequences.

    Finally, the estimates of the measurement error standard deviation exhibit very little

    variation across the simulated samples. However, as noted, their size affects the accuracy of

    the three volatility parameters and λ. This supports the conjecture put forward by CDR

    that the elements in the volatility matrix in the AFNS0 model are determined primarily in

    order to deliver the best possible fit to the cross section of yields rather than matching the

    actual volatility correlation structure among the three state variables. On the other hand,

    the properties of the estimates of the elements in the mean-reversion matrix KP and the

    mean vector θP are essentially unaffected by the size of σε as these parameters reflect the

    time-series dynamics of the three state variables and their values have no consequences for

    the bond yield function fitted to the cross section of observed yields.

    In addition to studying the finite-sample properties of the estimated parameters, we are

    also interested in knowing to what extent the parameter standard deviations estimated from

    the optimized likelihood function in the Kalman filter are reliable in the sense that they reflect

    the variation in the estimated parameters across the 1,000 simulated samples. In this exercise,

    we hence use the empirical standard deviation of the 1,000 estimates of each parameter as

    a proxy for the true, unobserved standard deviation of the estimated parameters.²³ Table 6 contains the summary statistics for the monthly ten-year samples.

    ²³ One potential caveat here is that the estimated parameters, the KP parameters in particular, follow asymmetric distributions that are not necessarily well summarized by the standard deviation.

    Ten-year samples, σε = 1 bp
    Parameter std. dev.  “True”    Mean      Std. dev.  5%        1st quartile  Median    3rd quartile  95%
    σ(κP11)              0.41496   0.32510   0.13282    0.15106   0.22730       0.30352   0.39944       0.57859
    σ(κP22)              0.28653   0.25756   0.09782    0.14000   0.18904       0.23983   0.30301       0.44689
    σ(κP23)              0.20026   0.20055   0.05054    0.12971   0.16252       0.19487   0.22862       0.28981
    σ(κP33)              0.57056   0.55056   0.14534    0.34405   0.44860       0.53131   0.63680       0.80056
    σ(σ11)               0.00024   0.00026   0.00003    0.00021   0.00023       0.00025   0.00028       0.00032
    σ(σ22)               0.00058   0.00065   0.00008    0.00052   0.00059       0.00065   0.00070       0.00077
    σ(σ33)               0.00178   0.00189   0.00022    0.00156   0.00174       0.00187   0.00203       0.00228
    σ(θP1)               0.01802   0.00586   0.00471    0.00140   0.00256       0.00434   0.00778       0.01526
    σ(θP2)               0.01407   0.01398   0.00860    0.00473   0.00812       0.01201   0.01742       0.03048
    σ(θP3)               0.00859   0.00878   0.00460    0.00399   0.00597       0.00780   0.01055       0.01641
    σ(λ)                 0.00322   0.00332   0.00076    0.00221   0.00277       0.00324   0.00376       0.00470
    σ(σε)                0.00000   0.00000   0.00000    0.00000   0.00000       0.00000   0.00000       0.00000

    Ten-year samples, σε = 10 bps
    Parameter std. dev.  “True”    Mean      Std. dev.  5%        1st quartile  Median    3rd quartile  95%
    σ(κP11)              0.47879   0.36140   0.18019    0.15427   0.23479       0.32578   0.44125       0.69790
    σ(κP22)              0.29042   0.26323   0.10357    0.13906   0.19128       0.24009   0.30903       0.46648
    σ(κP23)              0.20727   0.20818   0.05662    0.13038   0.16588       0.20025   0.24281       0.31129
    σ(κP33)              0.58183   0.58518   0.17083    0.35368   0.45967       0.56241   0.67907       0.89759
    σ(σ11)               0.00067   0.00073   0.00008    0.00060   0.00067       0.00072   0.00078       0.00087
    σ(σ22)               0.00079   0.00086   0.00010    0.00071   0.00079       0.00086   0.00092       0.00102
    σ(σ33)               0.00263   0.00287   0.00035    0.00232   0.00262       0.00285   0.00309       0.00348
    σ(θP1)               0.01805   0.00585   0.00463    0.00142   0.00256       0.00430   0.00768       0.01525
    σ(θP2)               0.01412   0.01413   0.00891    0.00461   0.00816       0.01213   0.01735       0.02998
    σ(θP3)               0.00868   0.00895   0.00493    0.00407   0.00589       0.00784   0.01071       0.01666
    σ(λ)                 0.02259   0.02320   0.00525    0.01558   0.01944       0.02261   0.02644       0.03220
    σ(σε)                0.00003   0.00003   0.00000    0.00003   0.00003       0.00003   0.00003       0.00003

    Table 6: Summary Statistics of Estimated Parameter Standard Deviations from Simulated Ten-Year Monthly Samples of the Preferred AFNS0 Model.

    The table reports the summary statistics of the estimated parameter standard deviations from N = 1,000 simulated data sets of the preferred AFNS0 model, each with a length of ten years and a uniform measurement error standard deviation of σε = 1 basis point and σε = 10 basis points, respectively.

    [Figure 3 (image not reproduced): three panels, (a) Correlation of Lt and St, (b) Correlation of Lt and Ct, (c) Correlation of St and Ct. Horizontal axis: Estimation No. (0 to 1000); vertical axis: Estimated correlation coefficient (−1.0 to 1.0).]

    Figure 3: Pairwise Correlations of Estimated Factor Paths from Simulated Ten-Year Monthly Samples of the Preferred AFNS0 Model.

    Illustration of the correlations between the estimated paths of the three state variables in N = 1,000 simulated data sets of the preferred AFNS0 model, each with a length of ten years and a uniform measurement error standard deviation of σε = 10 basis points. Horizontal solid grey lines indicate the factor correlations in the true unconditional distribution.

    The parameter standard deviations we calculate from the optimized likelihood function are reasonably accurate for the parameters without bias: σ11, σ22, σ33, θP2, θP3, λ, and σε. However, even for the parameters with a modest bias, κP22, κP23, and κP33, the estimated parameter standard deviations are relatively close to, but slightly below, the actual variation in the estimated parameters. Finally, for κP11 and θP1, there is a more severe downward bias in the estimated parameter standard deviations relative to the actual variation in the parameter estimates. Overall, the conclusion is that the standard deviations obtained from the Kalman filter underestimate the true variation for the parameters with bias. This will make these parameters look more significant than they actually are. This problem is particularly severe for the estimated parameters in the mean-reversion matrix KP as their point estimates are notably upward biased to begin with. This makes model selection and validation extremely treacherous when one or more of the state variables are highly persistent. Unfortunately, this is not an issue that can be neglected since it is the specification of KP that determines a model’s forecast performance and term premium decomposition as discussed in detail in Bauer et al. (2012).
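
The comparison between estimated and empirical standard deviations can be retraced with simple arithmetic on Table 6. The sketch below uses the σε = 10 basis points panel and computes, for a selection of parameters, the ratio of the mean estimated standard deviation to the empirical ("True") one. The dictionary layout and key names are illustrative; the numbers are transcribed from the table.

```python
# Values transcribed from Table 6 (ten-year samples, sigma_eps = 10 bps):
# (empirical std. dev. of the estimates, mean of the estimated std. devs.)
table6 = {
    "kappa_11": (0.47879, 0.36140),
    "kappa_22": (0.29042, 0.26323),
    "kappa_23": (0.20727, 0.20818),
    "kappa_33": (0.58183, 0.58518),
    "theta_1": (0.01805, 0.00585),
    "theta_2": (0.01412, 0.01413),
    "theta_3": (0.00868, 0.00895),
    "lambda": (0.02259, 0.02320),
}

# A ratio below one means the likelihood-based standard deviation
# understates the dispersion observed across the simulated samples.
ratios = {name: est / emp for name, (emp, est) in table6.items()}

for name, r in ratios.items():
    # 1/r is the factor by which a t-statistic built on the estimated
    # standard deviation exceeds one built on the empirical dispersion.
    print(f"{name:9s} ratio = {r:.2f}  t-stat inflation = {1.0 / r:.2f}")
```

On these numbers the ratio is close to one for most parameters, while it falls well below one for κP11 and θP1, the pair singled out in the text.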

    Figure 3 shows the correlations between the estimated factor paths across the 1,000 samples. We note that, in short ten-year samples, factor path correlations are not a reliable guide to “spotting” the appropriate dynamic relationship between the factors in multi-dimensional models of the yield curve. The lack of mean-reversion of the level factor means that almost any level of correlation can be observed even though, within the simulated model, the level factor is entirely independent of the two other factors. Furthermore, even for the slope and curvature factors, which are strongly positively correlated within the simulated model, the observed correlation can be low, and even negative, with non-trivial probability.
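
The point that independent but persistent factors can display almost any sample correlation in a ten-year sample can be checked in a stripped-down setting. The sketch below is a simplification under stated assumptions, not the AFNS0 model: it draws two independent stationary AR(1) paths, one with the mean-reversion rate of the level factor and one with that of the slope factor, and records their sample correlation. Function names and the seed are illustrative.

```python
import math
import random
import statistics

DT, N_OBS, N_SIM = 1.0 / 12.0, 120, 1000

def simulate_ar1(kappa, n_obs, dt, rng):
    """Unit-variance stationary AR(1) path with persistence exp(-kappa*dt)."""
    phi = math.exp(-kappa * dt)
    shock_sd = math.sqrt(1.0 - phi * phi)
    path = [rng.gauss(0.0, 1.0)]
    for _ in range(n_obs - 1):
        path.append(phi * path[-1] + rng.gauss(0.0, shock_sd))
    return path

def sample_correlation(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
corrs = sorted(
    sample_correlation(
        simulate_ar1(0.03943, N_OBS, DT, rng),  # level-like persistence
        simulate_ar1(0.43102, N_OBS, DT, rng),  # slope-like persistence
    )
    for _ in range(N_SIM)
)
low, high = corrs[int(0.05 * N_SIM)], corrs[int(0.95 * N_SIM)]
print(f"5th percentile = {low:.2f}, 95th percentile = {high:.2f}")
```

The true correlation is zero by construction, so the width of the interval between `low` and `high` is purely a small-sample effect of the persistence.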

    To end the analysis of the ten-year monthly samples, we analyze the accuracy of the

    filtering of the state variables. Table 7 reports the mean absolute difference between the

    simulated factor paths and the estimated factor paths from the Kalman filter. For the level

    and the slope factor, their absolute filtered error is close to the size of σε that represents

    the noise in the data. This might be due to the fact that they affect yields one-for-one at

    their maximum loading in the yield function. For the curvature factor, its absolute filtered

    error tends to be slightly more than three times larger than the size of σε since its maximum

    loading in the yield function is barely 0.3.
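
The loading argument can be verified numerically. The sketch below assumes the standard Nelson-Siegel factor loadings, with a level loading of one, a slope loading of (1 − e^(−λτ))/(λτ), and a curvature loading equal to the slope loading minus e^(−λτ), evaluated at the true λ = 0.5365. The monthly maturity grid from one month to thirty years is an illustrative assumption and is not the set of maturities used in the paper.

```python
import math

LAMBDA = 0.5365  # true value of lambda in the simulated model

def slope_loading(tau, lam=LAMBDA):
    """Nelson-Siegel slope loading at maturity tau (in years)."""
    x = lam * tau
    return (1.0 - math.exp(-x)) / x

def curvature_loading(tau, lam=LAMBDA):
    """Nelson-Siegel curvature loading at maturity tau (in years)."""
    x = lam * tau
    return (1.0 - math.exp(-x)) / x - math.exp(-x)

# Assumed grid: monthly maturities from one month to thirty years.
maturities = [m / 12.0 for m in range(1, 361)]

tau_star = max(maturities, key=curvature_loading)
max_curv = curvature_loading(tau_star)
max_slope = max(slope_loading(t) for t in maturities)

print(f"max curvature loading = {max_curv:.4f} at tau = {tau_star:.2f} years")
print(f"max slope loading     = {max_slope:.4f}")
print(f"1 / max curvature     = {1.0 / max_curv:.2f}")
```

The reciprocal of the maximum curvature loading is the scaling factor that the text links to the larger filtering error of the curvature factor.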

    Mean absolute fitted error, ten-year samples, σε = 1 bp
    State variable  Mean    Std. dev.  5 percentile  1st quartile  Median  3rd quartile  95 percentile
    Lt              2.16    0.77       1.48          1.66          1.87    2.39          3.79
    St              2.01    0.81       1.30          1.47          1.69    2.27          3.78
    Ct              4.89    0.48       4.21          4.55          4.83    5.13          5.77

    Mean absolute fitted error, ten-year samples, σε = 10 bps
    State variable  Mean    Std. dev.  5 percentile  1st quartile  Median  3rd quartile  95 percentile
    Lt              11.77   1.59       9.56          10.70         11.52   12.64         14.51
    St              11.42   1.66       9.37          10.31         11.13   12.17         14.57
    Ct              34.66   3.25       29.87         32.48         34.42   36.56         40.35

    Table 7: Summary Statistics of Mean Absolute Fitted Errors of the Filtered State Variables from Simulated Ten-Year Monthly Samples of the Preferred AFNS0 Model.

    The table reports the summary statistics of the mean absolute fitted error of the three state variables

    from N = 1,000 simulated data sets of the preferred AFNS0 model, each with a length of ten years

    and a uniform measurement error standard deviation of σε = 1 basis point and σε = 10 basis points,

    respectively. All numbers are measured in basis points.

    7.2 Analysis of Forty-Year Monthly Samples

    In this section, we analyze the results obtained for the forty-year monthly samples simulated

    from the AFNS0 model.

    For a start, Table 8 contains the summary statistics for the 1,000 estimated parameter sets we obtain from these monthly forty-year samples. For the parameters determined primarily from the cross section, i.e., λ, σ11, σ22, and σ33, we see a reduction of about 50% in their dispersion when we quadruple the length of the sample. For the other unbiased parameters, θP1, θP2, and θP3, we see a similar reduction in the dispersion for the latter two, while the variation in the estimates of θP1 is reduced by only about 20%. This is tied to the fact that, even with this sample length, κP11 is still estimated with notable upward bias although it is much less severe than in the ten-year samples. On the other hand, for the remaining mean-reversion parameters with bias, κP22, κP23, and κP33, we see a significant reduction in their bias. In addition, the uncertainty of their estimated values is reduced by a factor of about 2.5, which reflects the combined effect of increasing the sample length (which reduces the uncertainty in itself) and the reduction in the finite-sample bias.
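
The reductions in dispersion quoted above can be recomputed from the tables. The sketch below takes the empirical standard deviations for the σε = 1 basis point case, using the "True" column of Table 6 for the ten-year samples (which, by the construction described earlier, is the empirical standard deviation of the ten-year estimates) and the "Std. dev." column of Table 8 for the forty-year samples. It compares their ratio with 1/√4 = 0.5, the value expected if dispersion shrank with the square root of the sample length. Key names are illustrative.

```python
import math

# Empirical std. dev. of the parameter estimates, sigma_eps = 1 bp.
ten_year = {    # "True" column of Table 6
    "kappa_11": 0.41496, "kappa_22": 0.28653, "kappa_23": 0.20026,
    "kappa_33": 0.57056, "sigma_11": 0.00024, "sigma_22": 0.00058,
    "sigma_33": 0.00178, "theta_1": 0.01802, "theta_2": 0.01407,
    "theta_3": 0.00859, "lambda": 0.00322,
}
forty_year = {  # "Std. dev." column of Table 8
    "kappa_11": 0.11662, "kappa_22": 0.08887, "kappa_23": 0.08404,
    "kappa_33": 0.22975, "sigma_11": 0.00011, "sigma_22": 0.00028,
    "sigma_33": 0.00084, "theta_1": 0.01497, "theta_2": 0.00817,
    "theta_3": 0.00492, "lambda": 0.00141,
}

# Square-root-of-T benchmark for a fourfold increase in sample length.
benchmark = 1.0 / math.sqrt(4.0)

ratio = {name: forty_year[name] / ten_year[name] for name in ten_year}
for name, r in ratio.items():
    print(f"{name:9s} forty/ten = {r:.2f}  (benchmark {benchmark:.2f}, "
          f"reduction factor {1.0 / r:.1f})")
```

Ratios near 0.5 correspond to the roughly 50% reduction described for the cross-sectional parameters, while the ratio for θP1 stays well above that benchmark.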

    For the parameters determined from the cross section of yields, we note that a ten-year

    sample of high quality (σε = 1 basis point) tends to lead to more accurate estimates than

    forty-year samples of relatively noisy data (σε = 10 basis points). Thus, whether a long, noisier sample or a short, high-quality sample is more appropriate depends on the parameters of interest. The accuracy of the parameters in KP and θP is determined by the

    sample length and largely independently of the data quality, while the accuracy of estimates

    of λ and the parameters in the Σ volatility matrix can be more sensitive to data quality than

    to sample length.

    Forty-year samples, σε = 1 bp
    Parameter  True      Mean      Std. dev.  5%        1st quartile  Median    3rd quartile  95%
    κP11       0.03943   0.15530   0.11662    0.03282   0.07563       0.12431   0.20568       0.38359
    κP22       0.43102   0.46791   0.08887    0.34128   0.40697       0.45893   0.52220       0.63095
    κP23       -0.69198  -0.70833  0.08404    -0.84889  -0.76527      -0.70567  -0.64882      -0.57729
    κP33       0.83341   0.94312   0.22975    0.62266   0.77463       0.91457   1.0842        1.3699
    σ11        0.00570   0.00571   0.00011    0.00553   0.00563       0.00571   0.00578       0.00589
    σ22        0.00888   0.00887   0.00028    0.00840   0.00868       0.00887   0.00905       0.00932
    σ33        0.02728   0.02733   0.00084    0.02602   0.02675       0.02732   0.02792       0.02871
    θP1        0.07242   0.07299   0.01497    0.04628   0.06317       0.07379   0.08262       0.09583
    θP2        -0.03173  -0.03166  0.00817    -0.04497  -0.03728      -0.03167  -0.02590      -0.01842
    θP3        -0.01873  -0.01864  0.00492    -0.02653  -0.02184      -0.01865  -0.01540      -0.01053
    λ          0.53650   0.53641   0.00141    0.53405   0.53546       0.53636   0.53735       0.53873
    σε         0.00010   0.00010   0.00000    0.00010   0.00010       0.00010   0.00010       0.00010

    Forty-year samples, σε = 10 bps
    Parameter  True      Mean      Std. dev.  5%        1st quartile  Median    3rd quartile  95%
    κP11       0.03943   0.15875   0.12341    0.03325   0.07462       0.12392   0.20531       0.40518
    κP22       0.43102   0.46867   0.09180    0.33625   0.40536       0.45924   0.52424       0.63261
    κP23       -0.69198  -0.71050  0.08795    -0.85659  -0.76772      -0.70791  -0.65208      -0.57646
    κP33       0.83341   0.94851   0.24482    0.60052   0.77666       0.91701   1.0976        1.4252
    σ11        0.00570   0.00574   0.00033    0.00520   0.00553       0.00574   0.00594       0.00626
    σ22        0.00888   0.00885   0.00038    0.00821   0.00859       0.00885   0.00910       0.00949
    σ33        0.02728   0.02734   0.00126    0.02522   0.02646       0.02738   0.02820       0.02938
    θP1        0.07242   0.07300   0.01499    0.04632   0.06290       0.07406   0.08273       0.09618
    θP2        -0.03173  -0.03166  0.00818    -0.04470  -0.03727      -0.03167  -0.02590      -0.01831
    θP3        -0.01873  -0.01866  0.00495    -0.02662  -0.02193      -0.01864  -0.01548      -0.01042
    λ          0.53650   0.53618   0.01019    0.51956   0.52931       0.53605   0.54286       0.55275
    σε         0.00100   0.00100   0.00001    0.00098   0.00099       0.00100   0.00101       0.00102

    Table 8: Summary Statistics of Estimated Parameters from Simulated Forty-Year Monthly Samples of the Preferred AFNS0 Model.

    The table reports the summary statistics of the estimation results from N = 1,000 simulated data sets

    of the preferred AFNS0 model, each with a length of forty years and a uniform measurement error

    standard deviation of σε = 1 basis point and σε = 10 basis points, respectively.

    Figure 4 shows the distribution of the estimated parameters in the KP mean-reversion

    matrix across the 1,000 samples when the sample length is forty years and the noise has a

    standard deviation of 10 basis points. Relative to the distribution from the ten-year samples

    shown in Figure 2, we note the significant reduction in both the dispersion and skewness of

    the estimates of each of these four parameters when the sample length is quadrupled.

    Table 9 reports the summary statistics of the estimated parameter standard deviations we

    obtain from the optimized likelihood function in the Kalman filter for the forty-year samples.

    We note that the means and medians are close to each other and close to the standard deviation of the parameter estimates that we use as a proxy for the true, but unobserved parameter uncertainty. The pair (κP11, θP1) remains the exception for which the estimated standard deviations still significantly understate the actual variation in the estimated parameters.

    [Figure 4 (image not reproduced): four panels, (a) κP11, (b) κP22, (c) κP23, (d) κP33. Horizontal axis: Estimation No. (0 to 1000); vertical axis: Parameter estimate.]

    Figure 4: Estimated Mean-Reversion Parameters from Simulated Forty-Year Monthly Samples of the Preferred AFNS0 Model.

    Illustration of the estimated mean-reversion parameters in the KP matrix from N = 1,000 simulated data sets of the preferred AFNS0 model, each with a length of forty years and a uniform measurement error standard deviation of σε = 10 basis points. The true value of each parameter is indicated with a horizontal solid grey line.

    Forty-year samples, σε = 1 bp
    Parameter std. dev.  “True”    Mean      Std. dev.  5%        1st quartile  Median    3rd quartile  95%
    σ(κP11)              0.11662   0.08415   0.03025    0.04295   0.06274       0.07986   0.10214       0.13916
    σ(κP22)              0.08887   0.08785   0.01606    0.06462   0.07669       0.08584   0.09698       0.11745
    σ(κP23)              0.08404   0.08482   0.01030    0.06927   0.07720       0.08390   0.09169       0.10208
    σ(κP33)              0.22975   0.22606   0.03238    0.17799   0.20380       0.22351   0.24643       0.28205
    σ(σ11)               0.00011   0.00012   0.00001    0.00011   0.00011       0.00012   0.00012       0.00013
    σ(σ22)               0.00028   0.00030   0.00002    0.00027   0.00029       0.00030   0.00031       0.00033
    σ(σ33)               0.00084   0.00087   0.00005    0.00078   0.00083       0.00087   0.00090       0.00096
    σ(θP1)               0.01497   0.00802   0.00540    0.00228   0.00414       0.00663   0.01041       0.01921
    σ(θP2)               0.00817   0.00798   0.00214    0.00491   0.00647       0.00769   0.00926       0.01174
    σ(θP3)               0.00492   0.00476   0.00113    0.00313   0.00396       0.00461   0.00542       0.00680
    σ(λ)                 0.00141   0.00141   0.00018    0.00115   0.00129       0.00140   0.00152       0.00172
    σ(σε)                0.00000   0.00000   0.00000    0.00000   0.00000       0.00000   0.00000       0.00000

    Forty-year samples, σε = 10 bps
    Parameter std. dev.  “True”    Mean      Std. dev.  5%        1st quartile  Median    3rd quartile  95%
    σ(κP11)              0.12341   0.08677   0.03343    0.04349   0.06199       0.08040   0.10587       0.14805
    σ(κP22)              0.09180   0.08995   0.01677    0.06647   0.07849       0.08762   0.09911       0.12037
    σ(κP23)              0.08795   0.08794   0.01140    0.07112   0.07970       0.08732   0.09499       0.10791
    σ(κP33)              0.24482   0.23593   0.03729    0.18051   0.21062       0.23243   0.25698       0.30326
    σ(σ11)               0.00033   0.00032   0.00002    0.00029   0.00031       0.00032   0.00034       0.00036
    σ(σ22)               0.00038   0.00039   0.00002    0.00036   0.00038       0.00040   0.00041       0.00043
    σ(σ33)               0.00126   0.00130   0.00008    0.00116   0.00124       0.00129   0.00135       0.00143
    σ(θP1)               0.01499   0.00802   0.00542    0.00223   0.00414       0.00656   0.01042       0.01893
    σ(θP2)               0.00818   0.00800   0.00218    0.00491   0.00649       0.00773   0.00931       0.01186
    σ(θP3)               0.00495   0.00478   0.00115    0.00314   0.00394       0.00464   0.00545       0.00686
    σ(λ)                 0.01019   0.01050   0.00127    0.00864   0.00961       0.01039   0.01130       0.01283
    σ(σε)                0.00001   0.00001   0.00000    0.00001   0.00001       0.00001   0.00001       0.00001

    Table 9: Summary Statistics of Estimated Parameter Standard Deviations from Simulated Forty-Year Monthly Samples of the Preferred AFNS0 Model.

    The table reports the summary statistics of the estimated parameter standard deviations from N = 1,000 simulated data sets of the preferred AFNS0 model, each with a length of forty years and a uniform measurement error standard deviation of σε = 1 basis point and σε = 10 basis points, respectively.

    To end the analysis of the forty-year monthly samples, we analyze the accuracy of the filtering of the state variables. Table 10 reports the mean absolute difference between the simulated factor paths and the estimated factor paths from the Kalman filter in this case. Compared to the results for the ten-year monthly samples reported in Table 7, there is a modest gain in the quality of the filtering from quadrupling the sample length. However, as measured by the median of the absolute filtered errors, the difference is only about 0.5 basis points. Thus, for all practical purposes, the filtering accuracy is the same and not sensitive to the sample length.

    7.3 Analysis of Weekly Samples

    In this section, we analyze the estimation results we obtain with the exact same data analyzed thus far, but sampled at a weekly frequency. Importantly, we emphasize that the simulated factor paths are identical estimation by estimation; only the observation frequency has changed. This should make the results as comparable as possible. Apart from the sampling frequency, only the simulated measurement errors differ across the two exercises.

    As before, we start with an analysis of the ten-year samples, the results for which are reported in Table 11. In general, the mean and median of the 1,000 estimates of each parameter are nearly identical to those obtained with monthly data. Thus, in this sense, there

    are limited benefits from increasing the data frequency. Still, we do see some reduction in

    Mean absolute fitted error, forty-year samples, σε = 1 bp
    State variable  Mean    Std. dev.  5 percentile  1st quartile  Median  3rd quartile  95 percentile
    Lt              1.69    0.20       1.50          1.57          1.63    1.73          2.09
    St              1.51    0.23       1.31          1.37          1.43    1.55          1.97
    Ct              4.64    0.17       4.38          4.52          4.63    4.75          4.93

    Mean absolute fitted error, forty-year samples, σε = 10 bps
    State variable  Mean    Std. dev.  5 percentile  1st quartile  Median  3rd quartile  95 percentile
    Lt              11.13   0.57       10.23         10.71         11.09   11.49         12.12
    St              10.64   0.54       9.84          10.27         10.61   10.96         11.59
    Ct              34.07   1.46       31.64         33.06         34.02   35.04         36.52

    Table 10: Summary Statistics of Mean Absolute Fitted Errors of the Filtered State Variables from Simulated Forty-Year Monthly Samples of the Preferred AFNS0 Model.

    The table reports the summary statistics of the mean absolute fitted error of the three state variables

    from N = 1,000 simulated data sets of the preferred AFNS0 model, each with a length of forty years

    and a uniform measurement error standard deviation of σε = 1 basis point a