
Expectations and the empirical fit of DSGE models

Preliminary draft prepared for SNDE. Check christopherggibbs.weebly.com for latest draft.

    Eric Gaus∗ Christopher G. Gibbs†

    March 2017

    Abstract

This paper studies the mechanisms that contribute to the observed improvement in fit of DSGE models that assume adaptive learning. DSGE models often struggle to reproduce the persistence observed in actual macroeconomic data. To generate this persistence, many researchers have proposed relaxing the rational expectations (RE) assumption in favor of adaptive learning. Empirically, these relaxations are found to increase persistence and almost universally improve model fit compared to RE. However, we show that these improvements are not always explained by the increase in persistence implied by adaptive learning. Instead, a significant proportion of the increased fit is due to the relaxation of the cross-equation restrictions imposed by RE alone. Our findings suggest that many model comparisons between RE and adaptive learning speak more to the misspecification of RE than to the veracity of the particular learning strategy. We propose a simple check to assess the contribution that learning adds to any estimated model.

    1 Introduction

    DSGE models often struggle to reproduce the persistence observed in actual macroeco-

    nomic data. To overcome this issue, many researchers add additional frictions, preference

    assumptions, or ad hoc adjustment.1 Alternatively, some have argued that relaxing the

    rational expectations (RE) assumption in favor of adaptive learning offers a theoretically

∗Gaus: Ursinus College, 601 East Main St., Collegeville, PA 19426-1000 (e-mail: [email protected]). †Gibbs: University of New South Wales (e-mail: [email protected]).

1The lack of persistence generated by DSGE models under RE is well known. These modifications include ad hoc corrections such as adding lags of the endogenous variables to the structural equations (Galí and Gertler (1999) and Ireland (2004)), changes to preferences such as habit persistence (Fuhrer (2000)), changes to inflation setting by firms through inflation indexation (Cogley and Sbordone (2008)), or adding information problems such as rule-of-thumb behavior (Amato and Laubach (2003)) or rational inattention/sticky information (Mankiw and Reis (2002); Ball et al. (2005)), to name a few. Armed with a subset of these modifications, Del Negro et al. (2007) declare that the NK model fits the data well enough to be used for policy evaluation.


appealing way to generate persistence without employing additional modifications to the

    model structure. For example, Milani (2006, 2007), Slobodyan and Wouters (2012a,b),

and Cole and Milani (2014) show that relaxing RE in favor of an adaptive learning

strategy can generate significant improvements in in-sample fit and in some cases may

    lessen the reliance on assumptions such as habit persistence (Milani (2007)) to explain

    macroeconomic persistence, while leaving inference on all other parameters unchanged.

    In this paper, we reexamine the observed improvements in empirical fit of DSGE

    models that assume adaptive learning. A broad conclusion in much of this literature is

    that the improvement in fit observed for these models is directly tied to the persistence

    that they can potentially generate. However, we show that this conclusion is not always

    correct and may depend crucially on the specific modeling choices for expectations. In

particular, considering any relaxation of RE has implications for the cross-equation

    restrictions imposed on the time series representation of the model that is taken to the

    data. RE typically places tight restrictions on this underlying structure, which may be

    loosened in surprising ways whenever a relaxation of RE is considered. We show that

    the loosening of these restrictions alone can account for a significant proportion of the

    improvement in in-sample fit and out-of-sample forecast accuracy for estimated New

    Keynesian models that assume adaptive learning.

    We illustrate the point by comparing RE and adaptive learning with a constant gain,

    which we refer to as constant gain learning (CGL), to an intermediate case, which we call

    fixed beliefs (FB). In the FB case, we assume that beliefs are formed by a fixed forecasting

    function with the same reduced form as the RE solution of the model. The FB case,

    therefore, nests RE as a special case and is nested by adaptive learning. Importantly,

    though, the FB case does not add any mechanisms to the model that generate persistence.

    All it does is separate the estimation of the structural parameters from the estimation

    of beliefs. In fact, one way to view the FB case is as estimating the model under CGL

with the gain parameters, which govern the effect of new information on the agents’

    forecasting functions, constrained to be zero. In this case, the agents’ forecasts in all

    periods coincide with whatever is chosen or estimated for their initial beliefs.
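The equivalence between FB and a zero-gain CGL can be seen in a minimal recursion. The sketch below is our own toy illustration (a single scalar belief nudged toward the data), not the paper's estimation code: with the gain set to zero, the belief never moves from its initial value, so forecasts in every period equal the initial belief.

```python
# Toy constant-gain updating of a single scalar belief.
# gain = 0 reproduces the fixed-beliefs (FB) case: the belief stays at
# its initialization forever, whatever the data say.
def update_beliefs(data, initial_belief, gain):
    belief = initial_belief
    path = []
    for y in data:
        belief = belief + gain * (y - belief)  # constant-gain correction step
        path.append(belief)
    return path

data = [1.0, 2.0, 3.0, 4.0]
fixed = update_beliefs(data, initial_belief=0.5, gain=0.0)     # FB: never moves
learning = update_beliefs(data, initial_belief=0.5, gain=0.1)  # CGL: drifts toward data
print(fixed)
print(learning)
```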

    We explore this relaxation in a version of the New Keynesian model developed by

    Ireland (2004), which we strip of all internal propagation mechanisms that are not di-

    rectly generated as a consequence of expectations. The model offers a parsimonious and

tractable laboratory to study the role expectations play in estimation, while re-

maining sufficiently detailed to be taken to data. We focus on two of the most widely

    used adaptive learning strategies in the literature: Euler Equation constant gain learning

    (EE-CGL) as in Evans and Honkapohja (2001) and Infinite Horizon constant gain learn-


ing (IH-CGL) as proposed by Preston (2005). These two cases are interesting to compare

    because they imply significantly different ranges of persistence for identical calibrations.

In the EE-CGL case, we show that nearly all of the improvement is explained by the re-

laxation of RE in the FB case and not by added persistence. We then confirm this result

    in the medium scale DSGE model of Smets and Wouters (2007). In this model, we find

    that relaxing the RE assumption without learning generates significant improvements in

    in-sample fit that are similar in magnitude to those reported by Slobodyan and Wouters

    (2012b) for a number of different EE-CGL specifications. In the IH-CGL case, learning

and persistence do play a more prominent role. But even here, the majority of the

    improvement in in-sample fit comes from the loosening of the cross equation restrictions

    relative to RE.

    There are three main takeaways from our results:

    1. First, our results provide continued support for the exploration of bounded rational-

ity strategies in DSGE models, since even small deviations from RE offer significant

improvements in in-sample fit and real-time out-of-sample forecasting performance,

    while maintaining the standard theoretical structure of the DSGE model.

    2. Second, our results demonstrate that comparisons between bounded rationality

    strategies and RE should be done with care. Many comparisons say more about

    the misspecification of the model under RE than the benefits of the particular

    learning strategy.

3. Finally, our results suggest that a natural benchmark to assess an adaptive learning

    strategy is to compare the fit of the model with parameter updating to the model

    without parameter updating. If learning is important, then model fit should sig-

    nificantly improve when parameter updating is allowed. This is also a natural test

    for the role played by the initial beliefs chosen for the learning algorithm.2
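The check in takeaway 3 amounts to a restricted-versus-unrestricted fit comparison. The sketch below is schematic and entirely our own (the function, the numbers, and the chi-squared cutoff are illustrative assumptions, not a procedure from the paper): reestimate the model with the gains constrained to zero and ask whether freeing them raises the log likelihood by more than the extra parameters warrant.

```python
# Schematic version of the proposed check: compare the fit of the model with
# parameter updating (estimated gains) to the model without updating (gains
# fixed at zero). 3.84 is the 95% chi-squared critical value with one degree
# of freedom, scaled here by the number of gains freed -- a rough heuristic.
def updating_matters(loglik_fixed_beliefs, loglik_learning, n_gains, critical=3.84):
    lr_stat = 2.0 * (loglik_learning - loglik_fixed_beliefs)
    return lr_stat > critical * n_gains

# Hypothetical log likelihoods, for illustration only:
print(updating_matters(-910.0, -899.0, n_gains=1))  # large gap: updating matters
print(updating_matters(-910.0, -909.6, n_gains=1))  # small gap: FB explains the fit
```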

The existing literature recognizes that different assumptions for initial beliefs and

    different specifications of learning vary in their performance relative to RE. However,

the existing literature does not attempt to distinguish the shared features of these as-

sumptions that generate improvements in fit, and no study to our knowledge has assessed

    whether those improvements translate into real-time out-of-sample forecast accuracy.

    For example, both Slobodyan and Wouters (2012a,b) and Cole and Milani (2014) find

2Slobodyan and Wouters (2012a) and Berardi et al. (2012) both show that the selection of initial beliefs can have significant effects on inference and fit.


improvements in fit relative to RE whether agents are assumed to have correctly speci-

    fied time-varying beliefs (as traditionally assumed under adaptive learning), misspecified

    time-varying beliefs, or fixed but non-rational beliefs. Our results show that the same

    mechanism may drive a substantial portion of the observed improvements in each of these

    cases. In addition, previous comparisons of out-of-sample fit, such as the exercise done

    by Slobodyan and Wouters (2012a), only consider ex-post data, which has the potential

    to overstate true forecasting power as noted by Croushore and Stark (2003).

    Although the out-of-sample fit of the NK model with adaptive learning is largely

    unexplored in the literature, the RE counterpart of the model has been studied by nu-

    merous researchers.3 In particular, Adolfson et al. (2007) find that DSGE forecasts can

outperform VARs at short horizons and Bayesian VARs at long horizons in an open-

economy framework. Smets and Wouters (2007) also find comparable

    out-of-sample forecast performance of the model relative to Bayesian VARs. Del Negro

et al. (2007) find, using DSGE-VARs, that the cross-equation restrictions implied by

    a rational solution of a New Keynesian model can add forecasting power relative to an

    unrestricted VAR. In addition, Diebold et al. (2015) finds that adding stochastic volatil-

ity to the model improves forecasting. A caveat to these papers, however, is the use of

ex-post data instead of real-time data. Kolasa et al. (2012) reevaluate the DSGE-VAR

    evidence against a large scale NK DSGE model with real-time data and finds that except

for short-term forecasts, the large DSGE model can perform as well as, or better than, the

DSGE-VAR. Smets et al. (2014) also find sizable gains in real-time forecasting accuracy

    in NK models when actual survey expectations are used as observables.

Our results compare favorably with these findings, with improvements in forecast accuracy

    at long horizons for inflation and comparable accuracy at short horizons when compared

    to parsimonious time series models on US out-of-sample data spanning the Great Mod-

    eration. Likewise, forecasts for output growth are best at long horizons relative to short

horizons. However, no noticeable improvement over the time series models is observed

    for output growth forecasts in the NK model considered here.

    In the next section, we discuss the stylized facts for estimation of New Keynesian

    DSGE models with constant gain learning to highlight the key relationship this paper

explains. In Section 3, we discuss how a constant gain learning strategy is typically

    introduced into a DSGE model and how it changes the mapping from the structural

    parameters of interest to the reduced form of the model taken to the data when comparing

    RE to CGL. In Section 4, we introduce a parsimonious DSGE model and derive the

    reduced form mappings implied under different expectation assumptions. In Section 5, we

    3The exception of course being Slobodyan and Wouters (2012a).


estimate the five different models and compare the results to the stylized facts discussed

    in Section 2. In Section 6, we explain the role that the loosening of the RE restrictions

    plays in improving model fit. In Section 7, we show that the same mechanism driving fit

in our parsimonious model appears to underlie the improvements in fit of medium-scale

    DSGE models as well. In Section 8, we conclude.

2 The stylized facts of estimated NK models with CGL

    Table 1 reports the parameter estimates from two of the most widely cited studies on

    estimated NK models with adaptive learning: Milani (2007) and Slobodyan and Wouters

    (2012b). Milani investigates adaptive learning in a small scale DSGE model, while Slo-

    bodyan and Wouters investigate learning in the medium scale DSGE model of Smets and

    Wouters (2007). The table reports some key parameter estimates from the two studies

for the respective models assuming RE and assuming minimum state variable (MSV)

    CGL. The parameter estimates exhibit three stylized facts that are ubiquitous among

    CGL estimation exercises:

    1. First, the estimates of the key parameters that describe monetary policy and the

    exogenous shocks are mostly unchanged between the RE and CGL cases. In par-

    ticular, columns 3 and 6 in the table show the difference between the two cases,

where asterisks denote whether the change falls inside of the 95% HPD intervals

of the RE estimates. Nearly all of the parameter estimates under CGL remain

    within the HPD interval of the rational estimates.

    2. Second, the model under CGL exhibits a significant improvement in in-sample fit

    as measured by marginal log-likelihood relative to RE.

    3. Third, the gain parameters that govern how agents update their beliefs in the

    learning algorithm are estimated to be relatively small.

    The first stylized fact is often interpreted as evidence that persistence explains the

    improvement in fit documented by the second stylized fact. This is because small changes

    in the structural parameters appear to suggest that the structure of the model fits the

    data in the same way as under RE. However, the third fact - small estimated values for

    the gain parameters - complicates this interpretation. Small gain parameters such as

    these can imply extreme persistence in the learning process that goes beyond business


cycle frequencies that are typically of interest to an econometrician. In practice, these

    gains are so small that the beliefs may basically remain at whatever values they are

    initialized at for the duration of data used to estimate the model.
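How slow is "small"? Under constant gain learning, an observation j periods old enters beliefs with weight proportional to (1 − γ)^j, so the half-life of information in beliefs is ln(0.5)/ln(1 − γ). A quick sketch (our own arithmetic) for the gains reported in Table 1:

```python
import math

# Half-life of an observation's weight in beliefs under a constant gain:
# the weight after j periods is (1 - gamma)**j, so solve (1 - gamma)**j = 0.5.
def weight_half_life(gamma):
    return math.log(0.5) / math.log(1.0 - gamma)

for gamma in (0.017, 0.018):  # the gains estimated in the two studies in Table 1
    print(gamma, round(weight_half_life(gamma), 1))  # roughly forty periods
```

With quarterly data, forty periods is about a decade, well beyond business cycle frequencies.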

    Together, these stylized facts imply that there may be other factors that contribute

to the increase in fit beyond simply the addition of new mechanisms to propagate shocks.

    We argue that a key element to explaining these stylized facts is to understand how

    relaxing RE in an estimated model affects the mapping of the structural parameters to

    the reduced form matrices that are taken to the data.

    3 Relaxing the RE restrictions through CGL

    The most common approach to implement an adaptive learning strategy is to start with

    the linearized structural equations of a DSGE model such as

$$y_t = \Gamma + A y_{t-1} + B\,\mathbb{E}_t y_{t+1} + D v_t + K\varepsilon_t \qquad (1)$$

$$v_t = R v_{t-1} + u_t, \qquad (2)$$

where yt is a vector of endogenous variables and vt is a vector of autoregressive exogenous

    shocks, and to replace the expectation operator with a predetermined boundedly rational

    forecasting strategy. This is the standard approach taken by Evans and Honkapohja

    (2001) in their exhaustive theoretical treatment on adaptive learning and the approach

    most often taken in the behavioral heterogeneous expectation literature as in Hommes

(2013) and the references therein. The key here is that agents are only asked to form one-

    step-ahead forecasts and that they ignore any implications of their forecasts for longer

    horizons. The appeal of this approach is that if the chosen forecasting process nests RE,

    then under well-known regularity conditions (E-stability) the agents’ beliefs will stay in

    the neighborhood of the RE solution. Therefore, it allows for potentially complicated

    but bounded beliefs around the natural benchmark of RE in a wide range of models.

    A second approach follows Preston (2005) and asks agents to consider the implications

    of their beliefs on their entire decision problem. In many cases, this means solving

    out an agent’s decision rule, given beliefs today, into the infinite future. The approach

    yields linearized structural equations that nest the standard equations studied in the

    first approach and RE, but which capture the impact of long run beliefs. Again, if the

    forecasting process chosen for agents nests the RE solution, then the functional form of

    the model will nest the previous case, and under well-known regularity conditions the


Table 1: Stylized facts of estimated NK models with CGL

Panel A: Slobodyan and Wouters (JEDC, 2012)

                          RE                   CGL-MSV                 Change
Monetary policy & habits
  MP inflation            2.04 [1.75, 2.33]    1.91                    0.13*
  MP output               0.09 [0.05, 0.013]   0.13                    -0.04*
  MP output growth        0.22 [0.18, 0.27]    0.19                    0.03*
  MP smoothing            0.81 [0.77, 0.85]    0.84                    -0.03*
  Habits                  0.71 [0.64, 0.78]    0.80                    -0.09
AR parameters
  Productivity            0.96 [0.94, 0.98]    0.96                    0.00*
  Risk premium            0.22 [0.08, 0.36]    0.23                    -0.01*
  Gov. spending           0.98 [0.96, 0.99]    0.96                    0.02*
  Investment              0.71 [0.62, 0.81]    0.33                    0.38
  MP shock                0.15 [0.04, 0.24]    0.05                    0.1*
  Price mark-up           0.89 [0.81, 0.97]    0.88                    0.01*
  Wage mark-up            0.97 [0.95, 0.99]    0.95                    0.02*
St. dev. shocks
  Productivity            0.46 [0.41, 0.51]    0.47                    -0.01*
  Risk premium            0.24 [0.20, 0.28]    0.25                    -0.01*
  Gov. spending           0.53 [0.48, 0.58]    0.53                    0.00*
  Investment              0.45 [0.37, 0.53]    0.61                    -0.16
  MP shock                0.24 [0.22, 0.27]    0.24                    0.00*
  Price mark-up           0.14 [0.11, 0.17]    0.14                    0.00*
  Wage mark-up            0.24 [0.21, 0.28]    0.23                    0.01*
Gain                      -                    0.017 [0.006, 0.021]
Marginal likelihood       -922.75              -910.97

Panel B: Milani (JME, 2007)

                          RE                    CGL-MSV                 Change
Monetary policy & habits
  MP inflation            1.433 [1.06, 1.81]    1.484 [1.08, 1.90]      -0.051*
  MP output gap           0.792 [0.425, 1.165]  0.801 [0.433, 1.18]     -0.009*
  MP smoothing            0.89 [0.849, 0.93]    0.914 [0.875, 0.947]    -0.024*
  Habits                  0.911 [0.717, 0.998]  0.117 [0.006, 0.289]    0.794
AR parameters
  Demand shock            0.87 [0.8, 0.93]      0.845 [0.776, 0.908]    0.025*
  Price mark-up           0.02 [0.0005, 0.07]   0.854 [0.778, 0.93]     -0.834
St. dev. shocks
  MP shock                0.933 [0.84, 1.04]    0.860 [0.777, 0.953]    0.073*
  Demand shock            1.067 [0.89, 1.22]    1.670 [1.47, 1.91]      -0.603
  Price mark-up           1.146 [1.027, 1.27]   1.150 [1.02, 1.31]      -0.004*
Gain                      -                     0.018 [0.0133, 0.0231]
Marginal likelihood       -765.45               -759.08

Note: brackets report 95% HPD intervals; asterisks mark changes that fall inside the 95% HPD interval of the RE estimates.

agents’ beliefs will depart from those predicted under RE, but remain bounded around

    the natural benchmark of RE as before.

    Both types of learning models are taken to data using the same methods as their RE

    counterpart. The models have state space representations and inference on a vector of

    structural parameters, Θ, may be obtained by standard likelihood based techniques using

    the Kalman filter. Depending on the assumption for how agents form their forecasts, there

    are of course many new dimensions through which adaptive learning may contribute to

    the observed improvements in fit. This includes the mechanical increases that occur

    because of an increase in the degrees of freedom, or the time variation that is introduced

    into the parameters through learning. Another, and the primary focus of this paper, is a

    change in the mapping from the structural parameters to the reduced form of the model

    taken to the data.
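Whatever the expectations assumption, the likelihood evaluation runs through the same recursion. A minimal sketch of a Kalman-filter log likelihood follows (a generic illustration of our own, with made-up scalar matrices rather than the paper's model or code):

```python
import numpy as np

# Generic Kalman-filter log likelihood for a linear state-space model
#   s_t = F s_{t-1} + w_t,  w_t ~ N(0, Q)
#   y_t = H s_t + e_t,      e_t ~ N(0, R)
# This is the standard machinery used to evaluate L(Theta | y_obs) for any
# of the reduced forms discussed in the text.
def kalman_loglik(y, F, Q, H, R):
    n = F.shape[0]
    s = np.zeros(n)   # filtered state mean
    P = np.eye(n)     # filtered state covariance
    loglik = 0.0
    for obs in y:
        s = F @ s                       # predict state
        P = F @ P @ F.T + Q
        v = obs - H @ s                 # innovation
        S = H @ P @ H.T + R             # innovation variance
        loglik += -0.5 * (len(obs) * np.log(2 * np.pi)
                          + np.log(np.linalg.det(S))
                          + v @ np.linalg.solve(S, v))
        K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
        s = s + K @ v                   # update
        P = P - K @ H @ P
    return loglik

# Illustrative persistent scalar process as "data":
rng = np.random.default_rng(0)
F = np.array([[0.9]]); Q = np.array([[0.1]])
H = np.array([[1.0]]); R = np.array([[0.05]])
y = [np.array([v]) for v in 0.1 * np.cumsum(rng.normal(size=20))]
print(kalman_loglik(y, F, Q, H, R))
```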

    To illustrate how the mapping changes, consider estimating a vector of parameters

Θ from a standard linearized DSGE model as in Equations (1) and (2), where the true data

    generating process is unknown. The minimum state variable RE solution of the model

    takes the following form

$$y_t = C + F y_{t-1} + Q v_t + K\varepsilon_t. \qquad (3)$$

The RE restrictions in this case are summarized by the mapping F : Θ → {C, F, Q}, which governs how changes in the elements of Θ change C, F, or Q. The mapping in

    general is highly nonlinear and often has no closed form solution. Inference on Θ in

    this case may be obtained by standard likelihood based techniques such that ΘMLE =

argmax_Θ L(Θ | y^obs, F), where L is the likelihood function.

Any adaptive learning strategy, however, implies a different mapping, one which is

unambiguously less restrictive under the common assumptions employed in this literature.

    To see this, consider the case where agents forecast using Equation (3) but do not update

    parameters over time. The agents just pick a parameterization for Equation (3) and use

    it to forecast in each period. This is the same as assuming agents are learning the MSV

    solution but that the gain parameters that govern how beliefs are updated are constrained

    to be zero, which makes beliefs unchanging over time. Under this assumption, the data

    generating process for the economy takes the following form

$$y_t = \underbrace{(\Gamma + B\hat{C})}_{\tilde{C}} + \underbrace{(A + B\hat{F})}_{\tilde{F}}\, y_{t-1} + \underbrace{(B\hat{Q}R + D)}_{\tilde{Q}}\, v_t + K\varepsilon_t, \qquad (4)$$

    where Equation (3) is substituted in for IEtyt+1 and where hats denote fixed values for


the belief parameters. Note the following regarding Equation (4):

    1. Equation (4) retains the same reduced form as Equation (3).

    2. Conditioning on Ĉ, F̂, and Q̂, Equation (4) is described by the same structural

parameters, which we collect in the vector Ξ.

Now, let us assume that agents pick beliefs based on Θ_MLE = argmax_Θ L(Θ | y^obs, F),

such that the RE values of Ĉ|Θ_MLE, F̂|Θ_MLE, and Q̂|Θ_MLE are used in constructing Equation (4). This implies the mapping G : Ξ × Θ_MLE → {C̃, F̃, Q̃} to describe this case. By construction, if Θ = Ξ, then Equation (4) is equal to Equation (3). If adaptive

    learning, represented by allowing time variation in Ĉ, F̂, and Q̂ based on past data, is

    responsible for explaining improvements in fit, then in this case ΞMLE from Equation (4)

    should be equal to ΘMLE from Equation (3) and

$$\max_{\Theta} \mathbb{E}\,L(\Theta \mid y^{obs}, F) = \max_{\Xi} \mathbb{E}\,L^{*}(\Xi \mid y^{obs}, \hat{C}|_{\Theta_{MLE}}, \hat{F}|_{\Theta_{MLE}}, \hat{Q}|_{\Theta_{MLE}}).$$

Whether or not these equalities hold in practice depends on a number of factors. For

    example, suppose the true data generating process is given by

$$y_t = C^{T} + F^{T} y_{t-1} + Q^{T} v_t + K^{T}\varepsilon_t, \qquad (5)$$

and suppose that Θ₀ = F⁻¹({C^T, F^T, Q^T}) exists. Then, by construction, it follows

that F(Θ₀) = G(Θ₀, Θ₀) and Θ₀ = argmax_Θ 𝔼L(Θ | y^obs, F) = argmax_Ξ 𝔼L*(Ξ | y^obs, G).

However, if Θ₀ = F⁻¹({C^T, F^T, Q^T}) does not exist or is not feasible, then there is no

guarantee that Θ is equal to Ξ, and in general

$$\max_{\Theta} \mathbb{E}\,L(\Theta \mid y^{obs}, F) \le \max_{\Xi} \mathbb{E}\,L^{*}(\Xi \mid y^{obs}, G),$$

where the inequality follows from the definition of F. Specifically, the range of the mapping

F is by construction a subset of the range of G because F(Θ) := G(Θ, Θ).

    Therefore, if RE is misspecified, then the first step in any adaptive learning scheme

of substituting a belief function into the structural equations affects the cross-equation

restrictions that map the structural parameters to the reduced form. But, importantly,

    it does not fundamentally change the underlying reduced form structure of the model,

    i.e., Equation (5), Equation (4), and Equation (3) continue to nest one another. There

    is nothing explicitly done to the model that allows it to generate more persistence. Now


of course, this insight does not imply that further modifications such as assuming time-

    variation in C̃, F̃, and Q̃ as in adaptive learning will not better fit the data, but it

    highlights another mechanism through which adaptive learning models may improve in-

    sample fit. In the remainder of the paper, we specifically ask whether adaptive learning

    does improve in-sample fit beyond just the loosening of the restrictions implied by RE.
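The nesting F(Θ) := G(Θ,Θ) is easy to verify numerically: solve the RE fixed point for some calibration, plug that solution in as fixed beliefs, and check that the implied actual law of motion reproduces the RE reduced form. The matrices below are an arbitrary stable calibration of our own, not the paper's model:

```python
import numpy as np

# Structural form y_t = Gamma + B E_t y_{t+1} + D v_t (A = 0, MSV case),
# with v_t = R v_{t-1} + u_t. Arbitrary stable illustrative matrices:
Gamma = np.array([0.1, 0.2])
B = np.array([[0.4, 0.1],
              [0.0, 0.5]])
D = np.array([[1.0, 0.0],
              [0.3, 1.0]])
R = np.diag([0.9, 0.5])

# RE (MSV) solution y_t = C + Q v_t solves C = Gamma + B C and Q = B Q R + D;
# the Q equation is linear, so vec(Q) = (I - R' kron B)^{-1} vec(D).
C_re = np.linalg.solve(np.eye(2) - B, Gamma)
vecQ = np.linalg.solve(np.eye(4) - np.kron(R.T, B), D.flatten(order="F"))
Q_re = vecQ.reshape(2, 2, order="F")

# Fixed beliefs (C_hat, Q_hat): substituting the belief function into the
# structural equations gives the Equation (4)-style actual law of motion.
def actual_law_of_motion(C_hat, Q_hat):
    return Gamma + B @ C_hat, B @ Q_hat @ R + D

# Beliefs set at the RE solution reproduce the RE reduced form:
# G(Theta, Theta) = F(Theta).
C_tilde, Q_tilde = actual_law_of_motion(C_re, Q_re)
print(np.allclose(C_tilde, C_re), np.allclose(Q_tilde, Q_re))  # → True True
```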

    4 The Model, Expectations, and the Reduced Form

To study the role loosened restrictions play in model fit, we consider five different as-

    sumptions for expectations that share a common reduced form:

$$y_t = C^{RF}_t + F^{RF}_t y_{t-1} + Q^{RF}_t v_t + K^{RF}\varepsilon_t, \qquad (6)$$

that each impose different restrictions on C^RF_t, F^RF_t, and Q^RF_t. In this section, we put

    forward a tractable model that allows for analytical derivation of the mapping from the

    structural parameters to the reduced form so that changes implied by different expecta-

    tions assumptions can be studied in detail.

    4.1 The Model

    The New Keynesian model we study is a parsimonious version of the model proposed by

    Ireland (2004). The microfoundations of the model are given in Appendix A. Following

    Preston (2005), the key equations are

$$x_t = -\omega \hat{a}_t + \tilde{\mathbb{E}}_t \sum_{T=t}^{\infty} \beta^{T-t}\left[(1-\beta)(x_{T+1} + \omega\rho_a \hat{a}_T) - (i_T - \hat{\pi}_{T+1}) - (\rho_a - 1)\hat{a}_T\right] \qquad (7)$$

$$\pi_t = \lambda_1 \tilde{\mathbb{E}}_t \sum_{T=t}^{\infty} (\lambda_1\beta)^{T-t}\left(\frac{1-\lambda_1}{\lambda_1}\,\pi_T + \psi x_T - e_T\right) \qquad (8)$$

$$i_t = \bar{r} + \bar{\pi} + \theta_\pi(\pi_t - \bar{\pi}) + \theta_x x_t + \varepsilon_{i,t} \qquad (9)$$

$$g_t = \hat{y}_t - \hat{y}_{t-1} + \bar{g} + \varepsilon_{z,t} \qquad (10)$$

$$x_t = \hat{y}_t - \omega a_t \qquad (11)$$

$$a_t = \rho_a a_{t-1} + \varepsilon_{a,t} \qquad (12)$$

$$e_t = \rho_e e_{t-1} + \varepsilon_{e,t}, \qquad (13)$$

    where xt is the output gap, πt is inflation, it is the nominal interest rate, gt is the growth

    rate of output, yt is the stochastically detrended level of output, at is a preference shock,


et is a cost-push shock, εi,t is a monetary policy shock, and εz,t is the detrended TFP

    shock. The individual variables are in log terms.

    We choose this model for three reasons. First, the model has no internal propagation

    mechanisms other than expectations. In fact, the minimum state variable RE solution

    does not depend on any lagged endogenous variables. This has the added benefit of

    allowing us to estimate the model without a projection facility, which is usually required

    to keep expectations from becoming explosive due partly to numerical precision issues.

    Gaus and Ramamurthy (2012) note that the choice of projection facility can have a

    significant effect on estimation outcomes, which may complicate a comparison with RE.

    Second, the determinacy and E-stability conditions of the model perfectly coincide.4

    This allows us to impose identical restrictions on the parameter space for RE and CGL

    versions of the model. And finally, the model generates predictions for a number of

    directly observable variables that do not require any manipulation such as HP filtering

    or even de-meaning in order to estimate.
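Because determinacy and E-stability coincide, restricting the parameter space for both the RE and CGL estimations reduces to a single inequality, the Taylor principle of footnote 4. A one-line check (the calibration values are our own illustration, not estimates from the paper):

```python
# Taylor-principle check: theta_pi > ((beta - 1) * theta_x + psi) / psi,
# the determinacy and E-stability condition stated in footnote 4.
def is_determinate(theta_pi, theta_x, beta, psi):
    return theta_pi > ((beta - 1.0) * theta_x + psi) / psi

# An active policy rule satisfies the condition; a passive one violates it.
print(is_determinate(theta_pi=1.5, theta_x=0.125, beta=0.99, psi=0.1))  # True
print(is_determinate(theta_pi=0.8, theta_x=0.125, beta=0.99, psi=0.1))  # False
```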

    4.1.1 Rational Expectations

    It is straightforward to show that imposing RE on Equations (7) and (8) allows them to

    be collapsed to a more familiar form

$$x_t = \bar{r} + \tilde{\mathbb{E}}^{RE}_t x_{t+1} - (i_t - \tilde{\mathbb{E}}^{RE}_t \pi_{t+1}) + (1-\omega)(1-\rho_a)a_t \qquad (14)$$

$$\pi_t = (1-\beta)\bar{\pi} + \beta\,\tilde{\mathbb{E}}^{RE}_t \pi_{t+1} + \psi x_t - e_t. \qquad (15)$$

Substituting in Equation (9), we can map the model into the general form of Equations (1) and (2),

$$y_t = \Gamma + A y_{t-1} + B\,\tilde{\mathbb{E}}_t y_{t+1} + D v_t + K\varepsilon_t$$

$$v_t = \rho v_{t-1} + u_t,$$

where y_t = (x_t, π_t)′, v_t = (a_t, e_t)′, u_t = (ε_{a,t}, ε_{e,t})′, ε_t = (ε_{i,t}, 0)′,

$$\Gamma = m\begin{pmatrix} \bar{\pi}(\theta_\pi\beta - 1) \\ -\bar{\pi}(\theta_x(\beta - 1) - 1 + \beta + \psi - \theta_\pi\psi) \end{pmatrix}, \qquad B = m\begin{pmatrix} 1 & 1 - \theta_\pi\beta \\ \psi & \beta(1 + \theta_x) + \psi \end{pmatrix},$$

4The determinacy and E-stability condition for the model is the Taylor principle: $\theta_\pi > \frac{(\beta - 1)\theta_x + \psi}{\psi}$.


$$D = m\begin{pmatrix} (\rho_a - 1)(\omega - 1) & \theta_\pi \\ \psi(\rho_a - 1)(\omega - 1) & -1 - \theta_x \end{pmatrix}, \qquad \rho = \begin{pmatrix} \rho_a & 0 \\ 0 & \rho_e \end{pmatrix},$$

m = (1 + θ_x + θ_π ψ)⁻¹, A = 0_{2×2}, and K = −I₂. Using the reduced form of Equation

(6) and the method of undetermined coefficients, we can solve analytically for the RE

solution in terms of the structural parameters: C^RE = (0, π̄)′,

$$Q^{RE} = \begin{pmatrix} -\dfrac{(\rho_a - 1)(\beta\rho_a - 1)(\omega - 1)}{1 + \theta_x - \theta_x\beta\rho_a + \beta\rho_a^2 + \theta_\pi\psi - \rho_a(1 + \beta + \psi)} & \dfrac{\theta_\pi - \rho_e}{1 + \theta_x - \theta_x\beta\rho_e + \beta\rho_e^2 + \theta_\pi\psi - \rho_e(1 + \beta + \psi)} \\[2ex] \dfrac{(\rho_a - 1)\psi(\omega - 1)}{1 + \theta_x - \theta_x\beta\rho_a + \beta\rho_a^2 + \theta_\pi\psi - \rho_a(1 + \beta + \psi)} & \dfrac{\rho_e - 1 - \theta_x}{1 + \theta_x - \theta_x\beta\rho_e + \beta\rho_e^2 + \theta_\pi\psi - \rho_e(1 + \beta + \psi)} \end{pmatrix}, \qquad (16)$$

F^RE = 0_{2×2}, and K^RE = −I. This is of course the explicit mapping F : Θ → {C^RE, F^RE, Q^RE} discussed in Section 3 for this model.

There are two notable features of this mapping. First, RE places restrictions on

the intercept terms CRE, forcing them to be consistent with the steady state of the model.

    As we will see, this will not be the case under other expectation assumptions. Second, the

    mapping from the structural parameters to the reduced form of QRE is highly nonlinear.

The mapping to each element of QRE is an eighth-degree polynomial in β, ψ, θπ, θx, ρa, ρe,

    and ω. The nonlinearity of this mapping is important because it means that relatively

    small changes in the values of structural parameters can have large effects on the reduced

    form and fit.

    4.1.2 Euler equation learning

    The second expectations assumption we consider is Euler Equation Constant Gain Learn-

    ing (EE-CGL). To implement this strategy, we start with the same structural equation

as under RE given by Equations (1) and (2). We assume that the agents’ perceived law

of motion (PLM) takes the same functional form as Equation (6):

$$y_t = C^{RF} + Q^{RF} v_t + \varepsilon_t. \qquad (17)$$

    The agents estimate the reduced form coefficients using past data by a constant gain

    recursive least squares algorithm

    Φt = Φt−1 + γS−1t zt−1(yt − 1− z′t−1Φt−1) (18)

    St = St−1 + γ(zt−1z′t−1 − St−1), (19)


where z_t is a vector of data, \Phi_t is a vector of regression coefficients, S_t is the estimated
variance-covariance matrix, and \gamma is a matrix of gain parameters that governs the weight
placed on new information.
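The recursion in Equations (18)-(19) is straightforward to implement. Below is a minimal sketch of one constant gain recursive least squares update for a scalar gain (a diagonal gain matrix works the same way); the simulated law of motion at the bottom is made up purely for illustration.

```python
import numpy as np

def cg_rls_step(phi, S, z, y, gamma):
    """One constant gain recursive least squares update, Equations (18)-(19).

    phi   : (k, m) coefficient matrix (one column per forecasted variable)
    S     : (k, k) estimated second-moment matrix of the regressors
    z     : (k,) most recent regressor vector (constant plus shocks)
    y     : (m,) most recent observation of the forecasted variables
    gamma : scalar gain governing the weight on new information
    """
    S = S + gamma * (np.outer(z, z) - S)                               # Equation (19)
    err = y - phi.T @ z                                                # forecast error
    phi = phi + gamma * np.linalg.solve(S, z)[:, None] * err[None, :]  # Equation (18)
    return phi, S

# Illustration: learn the intercept and slope of y_t = 0.5 + 2 v_t.
rng = np.random.default_rng(0)
phi, S = np.zeros((2, 1)), np.eye(2)
v = 0.0
for _ in range(3000):
    v = 0.9 * v + rng.normal()
    y = np.array([0.5 + 2.0 * v + 0.01 * rng.normal()])
    phi, S = cg_rls_step(phi, S, np.array([1.0, v]), y, gamma=0.02)
print(phi.ravel())  # approaches (0.5, 2.0)
```

With a constant gain the estimates never settle exactly; they fluctuate around the truth, which is precisely the channel through which beliefs add persistence to the model.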

Expectations under EE-CGL at time t are given by5

IE^{EE}_t y_{t+1} = \hat{C}_{t-1} + \hat{Q}_{t-1}\rho v_t. (20)

    Substituting Equation (20) in for the beliefs in Equation (1) yields the following mapping

    to the reduced form equations:

C^{EE}_t = \Gamma + B\hat{C}_{t-1} = m\begin{pmatrix} \hat{C}^{11}_{t-1} + (\hat{C}^{21}_{t-1} - \bar{\pi})(1 - \beta\theta_\pi) \\ \hat{C}^{11}_{t-1}\psi + \hat{C}^{21}_{t-1}\left(\beta(1 + \theta_x) + \psi\right) - \bar{\pi}\left(\theta_x(\beta - 1) + \psi(1 - \theta_\pi) + \beta - 1\right) \end{pmatrix} (21)

and

Q^{EE}_t = B\hat{Q}_{t-1}\rho + D = m\begin{pmatrix} 1 - \omega + \rho_a\left(\hat{Q}^{11}_{t-1} + \hat{Q}^{21}_{t-1}(1 - \theta_\pi\beta) + \omega - 1\right) & \theta_\pi + \left(\hat{Q}^{12}_{t-1} + \hat{Q}^{22}_{t-1}(1 - \theta_\pi\beta)\right)\rho_e \\[1ex] \hat{Q}^{21}_{t-1}\rho_a\left(\beta(1 + \theta_x) + \psi\right) + \psi\left(1 - \omega + \rho_a(\hat{Q}^{11}_{t-1} + \omega - 1)\right) & \theta_x\left(\hat{Q}^{22}_{t-1}\beta\rho_e - 1\right) + \hat{Q}^{12}_{t-1}\rho_e\psi + \hat{Q}^{22}_{t-1}\rho_e(\beta + \psi) - 1 \end{pmatrix}, (22)

where \hat{C}^{ij}_{t-1} and \hat{Q}^{ij}_{t-1} denote the element in the i-th row and j-th column of \hat{C}_{t-1} and \hat{Q}_{t-1},
respectively.

Comparing C^{RE} to C^{EE}_t, it is apparent that the restrictions placed on the reduced form
under RE are loosened under EE-CGL in two ways. First, the reduced-form intercepts
are no longer explicitly tied to the steady states of the model. They depend on the other
structural parameters through beliefs, which frees them to vary over time. Second,
comparing Q^{RE} to Q^{EE}_t, the degree of nonlinearity of the mapping from structural
parameters to the reduced form is reduced. The reduction in nonlinearity allows changes
in structural parameters to have smaller effects on the reduced form, which allows
these parameters to take on new values without having as significant an impact elsewhere

in the model. For example, consider the limit of the first element of Q^{EE}_t compared to Q^{RE}
as ρ_a → 1. In the RE case, the coefficient goes to zero. Therefore, as the preference
shock, a_t, becomes more persistent, its effect on the output gap approaches zero. This of
course offsets the persistence by effectively removing the shock from the model. In the
EE case, however, no such restriction is imposed. Here, as ρ_a goes to one, Q^{EE,11}_t goes to
m(\hat{Q}^{11}_{t-1} + \hat{Q}^{21}_{t-1}(1 - θ_πβ)), which allows a more persistent preference shock to affect the
output gap, at least in the short run when the beliefs \hat{Q} are not at their RE values.
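This limiting behavior can be checked directly from the (1,1) element of Equation (22). The numbers below are illustrative stand-ins for the beliefs and parameters, not estimates from the paper.

```python
import numpy as np

# Illustrative parameter values and stand-in beliefs (assumptions)
beta, psi, theta_pi, theta_x, omega = 0.995, 0.1, 1.5, 0.2, 0.1
Q11, Q21 = 0.06, 0.08                    # stand-in elements of Q-hat_{t-1}
m = 1.0 / (1.0 + theta_x + theta_pi * psi)

def q_ee_11(rho_a):
    """(1,1) element of Q_t^EE from Equation (22)."""
    return m * (1 - omega + rho_a * (Q11 + Q21 * (1 - theta_pi * beta) + omega - 1))

# As rho_a -> 1 the EE coefficient tends to m(Q11 + Q21(1 - theta_pi*beta)),
# which is nonzero whenever beliefs are away from their RE values.
limit = m * (Q11 + Q21 * (1 - theta_pi * beta))
print(q_ee_11(1.0), limit)
```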

5 As is common in the literature, we assume agents know ρ.


4.1.3 Infinite horizon learning

The third expectations assumption we consider is the infinite horizon learning solution.
This solution uses the full forward-looking decision rules given by Equations (7) and (8)
and asks agents to make decisions with respect to their beliefs over all future periods.
This means agents must form forecasts of x_t and π_t as well as i_t for T = t, ..., ∞. In
contrast, there is no need to forecast interest rates under EE-CGL. Therefore, to preserve
our nested structure and to allow the agents in this case to use the same PLM as in EE-CGL,
we assume that agents know the coefficients of the Taylor rule, which allows them
to forecast only x_t and π_t. With this assumption, the sequence of agents' expectations

    are computed as

IE^{IH}_t \sum_{T=t}^{\infty} \beta^{T-t} y_{T+1} = \hat{C}_{t-1}(1 - \beta)^{-1} + \hat{Q}_{t-1}(I - \beta\rho)^{-1}\rho v_t (23)

and

IE^{IH}_t \sum_{T=t}^{\infty} (\lambda_1\beta)^{T-t} y_{T+1} = \hat{C}_{t-1}(1 - \lambda_1\beta)^{-1} + \hat{Q}_{t-1}(I - \lambda_1\beta\rho)^{-1}\rho v_t, (24)

where \hat{C}_{t-1} and \hat{Q}_{t-1} are the coefficients of Equation (17) estimated using the constant
gain recursive least squares algorithm discussed previously. Substituting these expectations
into Equations (7) and (8), the reduced form may be expressed as

y_t = \Gamma^{IH} + B^{IH}\hat{C}_{t-1} + \left( B_{1,IH}'\hat{Q}_{t-1}(I - \beta\rho)^{-1}\rho + B_{2,IH}'\hat{Q}_{t-1}(I - \beta\lambda_1\rho)^{-1}\rho + D \right) v_t + G\epsilon_t,

where C^{IH}_t = \Gamma^{IH} + B^{IH}\hat{C}_{t-1},

\Gamma^{IH} = m\begin{pmatrix} \dfrac{\bar{\pi}(\theta_\pi - 1)}{1 - \beta} - \dfrac{(1 - \beta)\bar{\pi}\theta_\pi}{1 - \beta\lambda_1} \\[2ex] \dfrac{(1 - \beta)\bar{\pi}(\theta_x + 1)}{1 - \beta\lambda_1} + \dfrac{\bar{\pi}(\theta_\pi - 1)\psi}{1 - \beta} \end{pmatrix}, \quad B^{IH} = m\begin{pmatrix} \dfrac{1 - \beta(\theta_x + 1)}{1 - \beta} - \dfrac{\beta\theta_\pi\lambda_1\psi}{1 - \beta\lambda_1} & \dfrac{1 - \beta\theta_\pi}{1 - \beta} - \dfrac{\beta\theta_\pi(1 - \lambda_1)}{1 - \beta\lambda_1} \\[2ex] \dfrac{\beta\lambda_1\psi(\theta_x + 1)}{1 - \beta\lambda_1} + \dfrac{\psi(1 - \beta(\theta_x + 1))}{1 - \beta} & \dfrac{\beta(1 - \lambda_1)(\theta_x + 1)}{1 - \beta\lambda_1} + \dfrac{\psi(1 - \beta\theta_\pi)}{1 - \beta} \end{pmatrix},

Q^{IH}_t = B_{1,IH}'\hat{Q}_{t-1}(I - \beta\rho)^{-1}\rho + B_{2,IH}'\hat{Q}_{t-1}(I - \beta\lambda_1\rho)^{-1}\rho + D,

B_{1,IH}' = m\begin{pmatrix} 1 - \beta(1 + \theta_x) & 1 - \beta\theta_\pi \\ \psi\left(1 - \beta(\theta_x + 1)\right) & \psi\left(1 - \beta\theta_\pi\right) \end{pmatrix}, \quad B_{2,IH}' = m\begin{pmatrix} -\beta\theta_\pi\lambda_1\psi & -\beta\theta_\pi(1 - \lambda_1) \\ \beta\lambda_1\psi(\theta_x + 1) & \beta(1 - \lambda_1)(\theta_x + 1) \end{pmatrix}.

The IH-CGL case dramatically alters the mapping from structural parameters to the
reduced form, while still nesting the RE solutions. The changes here generate an entirely

    different type of loosening of restrictions compared to the EE case. The IH-CGL case

    turns out to permit a range of values for CIHt and QIHt that are not obtainable in the

    other cases for any reasonable parameterization of the model. We illustrate this in detail

in a later section.

    4.1.4 The fixed belief cases

Finally, the fourth and fifth expectation assumptions we consider are the fixed belief (FB)
counterparts to the EE-CGL and IH-CGL specifications. In particular, we fix the values
of \hat{C} and \hat{Q} and do not consider any learning. Throughout the paper, we assume that
\hat{C} = C^{RE}, \hat{F} = F^{RE}, and \hat{Q} = Q^{RE}, such that the FB cases are defined by the explicit
mapping G : Ξ × Θ → {C^{RF}, F^{RF}, Q^{RF}} from Section 2, where Ξ is the vector of
structural parameters to be estimated and Θ is the vector of structural parameters that
gives rise to the fixed forecasting function.

    5 Taking the model to the data

    5.1 The state space

    The five expectation assumptions share a common reduced form and a common state

    space representation. The common state space representation is given by

y^{obs}_t = H X_t

X_t = J_{t-1} + F_{t-1} X_{t-1} + S_{t-1}\epsilon_t,

where J_{t-1} = \Omega_{t-1}^{-1}\mu_{t-1}, F_{t-1} = \Omega_{t-1}^{-1}\Psi, S_{t-1} = \Omega_{t-1}^{-1}\zeta,

\mu_{t-1} = \begin{pmatrix} C^{RF}_t \\ \bar{r} + (1 - \theta_\pi)\bar{\pi} \\ \bar{g} \\ 0_{3\times 1} \end{pmatrix}, \quad \Omega_{t-1} = \begin{pmatrix} I_{2\times 2} & 0_{2\times 3} & -Q^{RF}_t \\ -\theta_x \;\; -\theta_\pi & 1 \;\;\; 0 \;\;\; 0 & 0 \;\;\; 0 \\ 0 \;\;\; 0 & 0 \;\;\; 1 \; {-1} & 0 \;\;\; 0 \\ -1 \;\;\; 0 & 0 \;\;\; 0 \;\;\; 1 & -\omega \;\;\; 0 \\ 0_{2\times 5} & & I_{2\times 2} \end{pmatrix},

\Psi = \begin{pmatrix} 0_{3\times 7} \\ 0 \;\; 0 \;\; 0 \;\; 0 \; {-1} \;\; 0 \;\; 0 \\ 0_{1\times 7} \\ 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; \rho_a \;\; 0 \\ 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; \rho_e \end{pmatrix}, \quad \zeta = \begin{pmatrix} 0_{1\times 2} \; {-1} \;\; 0_{1\times 4} \\ 0_{1\times 7} \\ 0_{1\times 2} \;\; 1 \;\; 0_{1\times 4} \\ 0_{1\times 3} \;\; 1 \;\; 0_{1\times 3} \\ 0_{1\times 7} \\ 0_{2\times 5} \;\; I_{2\times 2} \end{pmatrix},

where y^{obs}_t = (x_t, \pi_t, i_t, g_t)' are the observable variables and X_t = (x_t, \pi_t, i_t, g_t, y_t, a_t, e_t)'.
In the two adaptive learning cases, C^{RF}_t and Q^{RF}_t evolve according to Equations (18) and
(19). Otherwise, C^{RF}_t = C^{RF} and Q^{RF}_t = Q^{RF} are fixed values.
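Given \Omega_{t-1}, \mu_{t-1}, \Psi, and \zeta, forming the reduced-form transition used by the Kalman filter is a matter of pre-multiplying by \Omega_{t-1}^{-1}. A generic sketch of that step; the small two-variable system is made up purely for illustration:

```python
import numpy as np

def reduce_system(Omega, mu, Psi, zeta):
    """Map the structural system Omega X_t = mu + Psi X_{t-1} + zeta eps_t
    into the reduced form X_t = J + F X_{t-1} + S eps_t."""
    Omega_inv = np.linalg.inv(Omega)
    return Omega_inv @ mu, Omega_inv @ Psi, Omega_inv @ zeta

# Made-up 2-variable system for illustration only
Omega = np.array([[1.0, -0.5], [0.0, 1.0]])
mu = np.array([0.1, 0.0])
Psi = np.array([[0.8, 0.0], [0.0, 0.9]])
zeta = np.eye(2)

J, F, S = reduce_system(Omega, mu, Psi, zeta)

# Simulate the reduced-form transition a few periods
rng = np.random.default_rng(1)
X = np.zeros(2)
for _ in range(5):
    X = J + F @ X + S @ rng.normal(size=2)
```

In the learning cases, \Omega_{t-1} changes every period with the beliefs, so this inversion is repeated inside the filtering recursion.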

    5.2 Estimation results

    In this section, we estimate the five versions of the model using maximum likelihood

and compare the results.6 We use US data from 1984q1 through 2008q3. We use the CBO
measure of the output gap, the GDP deflator measure of inflation, the three-month
Treasury bill rate, and the growth rate of real GDP as our observables.

    To avoid known issues with weak identification and other estimation issues that arise

    in these models, we calibrate some of the key parameters. We set β = 0.995, π̄ = 2%, and

    ψ = 0.1. The slope of the Phillips curve (ψ) is set following Ireland (2004), which in the

    Calvo pricing framework corresponds to the average firm adjusting its price roughly once

    a year. We also calibrate the mean growth rate ḡ to the average growth rate observed

over the estimation period. The remaining parameters {θπ, θx, ω, ρa, ρe, σa, σe, σi, σg} are estimated along with {γx, γπ}, the gain parameters, in the CGL cases.

    Table 2 shows the estimation results for the five different model specifications. Com-

    paring the results for RE to EE-CGL and IH-CGL, the three stylized facts noted in

    Section 2 are all present. In particular, the inference on the structural parameters and

6 To construct confidence intervals for the structural parameter estimates, we use a method similar to the one proposed by Stock and Watson (1998) for time-varying parameter models. We employ this method because the constrained optimization routine we use to maximize the likelihood function provides unreliable numerical estimates of the Hessian matrix. The confidence interval for each parameter is constructed by selecting a grid surrounding the ML estimate. We then re-maximize the log-likelihood function by searching over all other parameters, while holding the parameter of interest fixed at one point on the grid. The maximized log-likelihood value obtained at that point is then compared to the original maximum log-likelihood value using a likelihood ratio test. The set of points on the grid that yield maximized log-likelihood values that fail to reject the null hypothesis that the true parameter vector lies in the restricted parameter space at the 5% level constitutes the 95% confidence interval.


exogenous shock process is roughly the same across the three cases, although the CGL
cases yield significant improvements in log-likelihood relative to RE. In addition, the
estimates for the gain parameters are small. In fact, in our case the gains are very small,
which is partly because our estimation does not make use of priors.

Turning to the EE-FB and IH-FB cases, we note that the results are nearly indistinguishable
from their learning counterparts. In particular, we cannot reject the null hypothesis
that the EE-FB model fits the data as well as the EE-CGL model using a likelihood
ratio test, and we only reject for the IH case at the 10% significance level. Therefore, the
loosening of restrictions described in Section 3 appears to account for a majority of the
improvement in fit.
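The grid-based likelihood-ratio confidence intervals described in footnote 6 amount to keeping every grid point whose profile log-likelihood lies within the χ² critical value of the maximum. A minimal sketch, using an analytical Gaussian profile log-likelihood in place of the model's (the inner re-maximization over the other parameters is assumed to be supplied by `profile_loglik`):

```python
import numpy as np

def lr_confidence_interval(profile_loglik, grid, loglik_max, crit=3.84):
    """Keep grid points where the LR test fails to reject at the 5% level.

    profile_loglik(g): log likelihood re-maximized over all other parameters
    with the parameter of interest held fixed at g."""
    kept = [g for g in grid if 2.0 * (loglik_max - profile_loglik(g)) <= crit]
    return min(kept), max(kept)

# Gaussian example: ML estimate 1.0 with standard error 0.2, so the 95%
# interval should be close to 1.0 +/- 1.96 * 0.2.
profile = lambda g: -0.5 * (g - 1.0) ** 2 / 0.2**2
lo, hi = lr_confidence_interval(profile, np.linspace(0.0, 2.0, 201), 0.0)
print(lo, hi)  # roughly (0.61, 1.39)
```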

    To illustrate the point, Figure 1 shows the estimated impulse responses for a monetary

    policy shock, a preference shock, and a cost push shock for the five different models. The

    monetary policy shock provides the clearest picture for the role that learning plays in

adding persistence. There is no mechanism to propagate the monetary policy shock in this
model except through adaptive learning. Therefore, the RE and FB cases only respond in
the period the shock is realized and then immediately return to steady state. In the two
adaptive learning cases, the shock does exhibit some persistence through beliefs, leading
to small but persistent responses for the endogenous variables. However, the responses to the

    remaining shocks are for the most part just scaled versions of the RE response. Learning

    does not add any substantive additional dynamics in these cases.

    Turning to the reduced form of the five models, Table 3 reports the elements of

    QRF for each case (for learning we show the value implied by the initial belief) at their

    estimated values reported in Table 2. Because of our decision to calibrate π̄, the implied

    values of CRF are roughly equivalent across the five cases and not shown here. The

EE cases exhibit a modest and consistent departure from the RE case, while the IH

    cases are drastically different. The table illustrates that relatively modest changes for

    the structural parameters can imply fairly significant changes in the reduced form.

    5.3 Out-of-sample fit

Finally, we compare the models in a real-time out-of-sample forecasting exercise to determine
whether the improvement in fit translates into out-of-sample forecasting power. We
use the real-time dataset provided by the Federal Reserve Bank of Philadelphia for inflation,
GDP growth, and interest rates. For a real-time measure of the output gap, we use the Fed's
Green Book nowcast of the output gap. We evaluate the forecasts at four different forecast
horizons: the nowcast, one-quarter, four-quarter, and six-quarter-ahead forecasts


Table 2: ML estimates

                         RE              EE-FB           EE-CGL              IH-FB           IH-CGL
θπ                       1.577           1.123           1.125               1.578           1.653
                         [1.31, 1.97]    [0.99*, 1.61]   [0.99*, 1.73]       [1.36, 1.97]    [1.45, 1.73]
θx                       0.166           0.208           0.211               0.160           0.157
                         [0.08, 0.29]    [0.11, 0.36]    [0.21, 0.30]        [0.08, 0.26]    [0.12, 0.19]
ω                        0.000           0.000           0.000               0.066           0.000
                         [0.00*, 0.02]   [0.00*, 0.01]   [0.00*, 0.00]       [0.00*, 0.48]   [0.00*, 0.00]
ρa                       0.933           0.926           0.925               0.973           0.991
                         [0.87, 1.00*]   [0.86, 0.99]    [0.90, 0.98]        [0.95, 0.99]    [0.97, 0.99]
ρe                       0.967           0.917           0.899               0.684           0.525
                         [0.87, 1.00*]   [0.80, 1.00]    [0.89, 0.95]        [0.50, 0.86]    [0.31, 0.76]
σa × 100                 1.671           1.633           1.613               0.555           0.146
                         [0.82, 6.77]    [1.36, 2.04]    [1.47, 1.79]        [0.19, 1.38]    [0.01, 1.46]
σe × 100                 0.071           0.072           0.073               0.166           0.229
                         [0.06, 0.10]    [0.06, 0.09]    [0.07, 0.08]        [0.11, 0.22]    [0.23, 0.23]
σi × 100                 0.558           0.558           0.561               0.560           0.557
                         [0.46, 0.70]    [0.46, 0.73]    [0.54, 0.58]        [0.47, 0.70]    [0.52, 0.61]
σg × 100                 0.179           0.180           0.180               0.175           0.179
                         [0.16, 0.21]    [0.16, 0.22]    [0.17, 0.22]        [0.15, 0.21]    [0.16, 0.19]
γx                       -               -               0.00                -               0.00
                                                         [1.9E-06, 2.5E-06]                  [2.4E-08, 3.0E-08]
γπ                       -               -               0.00                -               0.001
                                                         [1.7E-09, 1.8E-09]                  [0.00, 0.02]
Log Likelihood           1974.49         1982.60         1983.72             1986.84         1988.97
LR Statistic (rel. RE)   -               16.22           18.46               24.42           28.96
p-value                                  (0.005)         (0.006)             (0.000)         (0.000)
LR Statistic (rel. FB)   -               -               2.24                -               4.54
p-value                                                  (0.163)                             (0.052)

Notes: ML estimates for the Ireland model under the five different assumptions for expectations. 95% confidence intervals for the estimates are shown in brackets below the point estimates. The confidence intervals are truncated when they hit a theoretically imposed boundary, which is indicated by *. The forecasting function in the FB cases and the initial beliefs in the CGL cases are set to the RE solution implied by the estimates in the first column.

Table 3: Reduced form estimates of Q̃^{RF}

           Q̃^{11}     Q̃^{21}     Q̃^{12}     Q̃^{22}
RE          0.060      0.083      8.899     -2.901
FB-EE       0.091      0.085      7.272     -2.921
CGL-EE      0.091      0.085      7.126     -2.881
FB-IH      -0.858      0.023      2.367     -1.355
CGL-IH     -3.251      0.081      1.840     -0.937

Notes: Reduced form values implied by the estimated parameters given in Table 2.


Figure 1: Estimated Impulse Responses

[Figure: responses of the output gap, inflation, the interest rate, and GDP growth to a cost push shock, a preference shock, and an interest rate shock under RE, EE-FB, EE-CGL, IH-FB, and IH-CGL.]

Notes: Impulse responses to one standard deviation shocks implied by the estimated parameters given in Table 2.

for inflation and GDP growth. We use the second releases of the data as the comparison
series to measure forecast accuracy. Inference on improvements in forecast accuracy is
obtained using the Diebold and Mariano (1995) test statistic (DM) with the Harvey et
al. (1997) small sample and forecast horizon correction. This test statistic is found to
work well in tests of real-time forecasts by Clark and McCracken (2009) and Clark and
McCracken (2011).
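For reference, the DM statistic with the Harvey et al. (1997) correction can be sketched as follows for squared-error loss; the forecast-error series below are simulated stand-ins, not the paper's real-time errors.

```python
import numpy as np

def dm_stat(e1, e2, h=1):
    """Diebold-Mariano statistic for equal squared-error loss, with the
    Harvey, Leybourne, and Newbold (1997) small-sample correction.

    e1, e2 : arrays of h-step forecast errors from the two competing models."""
    d = e1**2 - e2**2                      # loss differential
    n = d.size
    dbar = d.mean()
    # long-run variance with h-1 autocovariance terms
    v = np.mean((d - dbar) ** 2)
    for k in range(1, h):
        v += 2.0 * np.mean((d[k:] - dbar) * (d[:-k] - dbar))
    dm = dbar / np.sqrt(v / n)
    # Harvey et al. correction; compare the result to a t(n-1) distribution
    return dm * np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)

rng = np.random.default_rng(0)
e2 = rng.normal(size=200)          # benchmark forecast errors (simulated)
e1 = e2 + 0.5                      # competing model with biased forecasts
print(dm_stat(e1, e2, h=4))        # positive: the benchmark is more accurate
```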

Table 4 shows the out-of-sample results for the five cases plus results from a bivariate
VAR(2) using inflation and GDP growth and a random walk forecast. The top row,

    labeled RE, gives the root mean squared forecast error (RMSFE) of the RE forecasts for

    inflation and GDP growth at the four different horizons. The remaining rows give the

    relative RMSFE of the remaining forecasts compared to RE with the DM test statistic

    in parentheses below. The results here are mixed. The FB and CGL cases generate

    consistent improvements over the RE model for inflation with the largest gains observed

    at short horizons. Comparing the VAR and the random walk, CGL and FB do best at

    long horizons. For GDP Growth, both FB and CGL do about the same as the RE model

    at all horizons and improve relative to the VAR and random walk at the long horizons.

The results illustrate that the in-sample improvements in fit translate into out-of-sample
forecasting power and are not just evidence of overfitting. Furthermore, most of

    the gains observed in this case are explained by the FB cases. The additional persistence

    added by learning does not add much to out-of-sample forecasting power.

    6 Understanding how RE restrictions affect fit

To illustrate how small changes in the reduced form can lead to large changes in model
fit, we turn to calibrated versions of the models. We consider the possible EE-FB and
IH-FB reduced-form values that are implied by a range of the structural parameters
(θπ, θx, ρa, and ρe), while holding all other parameters fixed. We then use the range
of reduced-form values generated by the two cases to solve for the θπ, θx, ρa, and ρe that
are necessary to generate the same reduced-form values under RE. In particular, we
set Θ = (θπ, θx, ρa, ρe)' and calculate G(Θ, Θ̄) for 1.4 < θπ < 1.6, 0.1 < θx < 0.3,
0.75 < ρa < 0.95, and 0.65 < ρe < 0.85, where Θ̄ = (1.5, 0.2, 0.85, 0.75)' is fixed. We then
calculate Θ^{RE} = F^{-1}(G(Θ, Θ̄)) and compare Θ^{RE} to Θ.

Figure 2 shows the resulting values of Q^{EE} and Q^{IH}, where the black dot represents
the value of Q^{RE} implied by Θ̄. The remaining two columns show the original grid for
Θ in red and the implied RE values Θ^{RE} in blue. The EE case shows the effect that
a reduction in nonlinearity can potentially have on model fit. RE can cover the same


Table 4: Real-time forecast results

Inflation (annualized RMSFE)
            t (Nowcast)   t+1        t+4        t+6
RE          1.11          1.06       0.93       0.99

Relative to RE
RW          1.02          0.85**     0.80***    0.86**
            (2.21)        (-1.99)    (-2.58)    (-2.04)
Bi-VAR      0.84**        0.78***    1.36       1.18
            (-2.17)       (-2.98)    (3.67)     (1.81)
FB-EE       0.99**        0.99       0.98       0.99
            (-1.74)       (-0.61)    (-0.91)    (-0.46)
FB-IH       0.92**        0.88**     0.85**     0.86**
            (-2.01)       (-1.94)    (-2.12)    (-2.04)
CGL-EE      0.99          0.98       0.99       1.00
            (-0.98)       (-1.01)    (-0.53)    (-0.03)
CGL-IH      0.95*         0.88***    0.85**     0.84***
            (-1.39)       (0.88)     (-2.12)    (-2.76)

GDP growth (annualized RMSFE)
            t (Nowcast)   t+1        t+4        t+6
RE          2.28          2.04       2.06       1.68

Relative to RE
RW          1.00          1.02       1.12       1.20
            (-0.01)       (0.86)     (1.64)     (4.29)
Bi-VAR      0.95          0.94       1.15       1.09
            (-0.96)       (-0.83)    (2.82)     (1.11)
FB-EE       1.11          1.03       0.98       0.98
            (1.01)        (2.46)     (-0.98)    (-0.46)
FB-IH       1.09          1.02       1.00       1.01
            (0.89)        (1.87)     (0.01)     (0.95)
CGL-EE      1.10          1.03       0.98       0.98
            (0.95)        (2.39)     (-0.74)    (0.07)
CGL-IH      1.09          1.02       0.99       0.99
            (0.85)        (1.97)     (-0.72)    (0.14)

*** p < 0.01, ** p < 0.05, * p < 0.1

Notes: TBD

reduced-form values using a smaller range of parameters, but it constrains the possible
values that the AR coefficients of the shocks can take on. Under EE-FB, more persistent
shocks are permitted, while maintaining very similar values for the monetary policy
rule and identical values for the remaining parameters. This allows the EE-FB case to
generate more persistence by simply increasing the persistence of the exogenous shocks
without significantly affecting other parts of the reduced form.

In the IH-FB case, the mechanism on display is more extreme than in the EE case.
The IH-FB case spans a much larger range of the reduced-form space for the same grid
of points. The space is so large that the RE solution is incapable of covering the same
area with parameter values that satisfy the Taylor principle or stationarity of the shock
processes. Now of course, it could be the case that these ranges of the parameter space
are not empirically relevant. But this is not what we find. In fact, if no constraints are
placed on the reduced form when it is estimated, it returns values that only the IH case
can generate for plausible structural parameter values.

    The key takeaway from this exercise is that it is not necessary for the structural

    parameters to change in any substantial way to fundamentally alter the way the model

    fits the data. In fact, it is precisely because the changes to the structural parameters are

small that it is easy to attribute changes in fit to the persistence that the adaptive learning
strategy can generate.


Figure 2: The mapping from structural parameters to the reduced form

[Figure: panels for the EE-CGL (top) and IH-CGL (bottom) cases, showing the reduced-form values Q^{EE} and Q^{IH} with a black dot marking the value of Q^{RE} implied by Θ̄, alongside the original grid for Θ (red) and the implied RE values Θ^{RE} (blue).]

Notes: TBD

7 Robustness

To illustrate that the improvements observed in the FB case are robust, we replicate the
results in the model of Smets and Wouters (2007). We use the replication files provided by

    the American Economic Review for the paper. Smets and Wouters estimate the model

    using Bayesian techniques in Dynare. We replicate the benchmark case found in Table 1

    and Table 1B of their paper.7 For the FB case, we use the RE solution of the model at the

    posterior modes as the fixed belief function. We then estimate the structural parameters

    around the fixed forecasting function by Bayesian methods in Dynare.

    Table 5 provides selected parameter estimates and the marginal log-likelihood of the

    two models. The parameter estimates shown in the table are those that can be mapped

    into the main parameters of the Ireland model. The parameter labels are chosen to reflect

    these connections. The full set of parameter estimates are given in the appendix.

    Table 5 demonstrates that the key results obtained in the Ireland model hold in this

model. In particular, the parameter estimates are relatively unchanged and the marginal
likelihood is higher in the FB case. In Slobodyan and Wouters (2012b), the average
improvement in marginal likelihood was 6.3, with a maximum of 18, across 5 different
CGL specifications. Therefore, the improvements observed here are consistent with the
improvement observed in the Ireland model studied in the last section.

    8 Conclusion

    This paper demonstrates that improvements in in-sample fit of New Keynesian DSGE

    models that assume some type of bounded rationality relative to their rational coun-

    terparts are in large part explained by the loosening of the restrictions imposed on the

structural parameters by Rational Expectations. This finding is in contrast to the prevailing
explanation given in the literature that improvements are primarily due to an increased

    ability to generate persistence in key endogenous variables. While increasing a model’s

    ability to generate persistence remains important, it accounts for a much smaller portion

of the overall improvements observed. Furthermore, we demonstrate that these improvements
translate into modest gains in real-time out-of-sample forecast accuracy, and that the
loosened restrictions again explain a substantial portion of the gains observed for the same
model under adaptive learning with a constant gain.

7 Our results differ slightly from the results reported in Smets and Wouters (2007) for the RE case. However, since we use their official replication files, we have no reason to believe that the observed differences are anything more than numerical imprecision arising from running the estimation on different versions of Matlab.


Table 5: FB in Smets and Wouters 2007

            Prior Distribution               Posterior Distribution
            Distr.   Mean    St. Dev.  Exp.  Mean     5% HPD   95% HPD
θπ          norm     1.5     0.25      RE    2.054    1.765    2.354
                                       FB    1.740    1.406    2.071
θx          norm     0.125   0.05      RE    0.217    0.173    0.262
                                       FB    0.190    0.145    0.234
π̄ × 4       gamma    2.5     0.4       RE    2.616    2.140    3.080
                                       FB    2.524    1.988    3.060
ρa          beta     0.5     0.2       RE    0.204    0.070    0.337
                                       FB    0.205    0.036    0.358
ρe          beta     0.5     0.2       RE    0.895    0.817    0.977
                                       FB    0.897    0.829    0.968
ρz          beta     0.5     0.2       RE    0.955    0.936    0.974
                                       FB    0.956    0.927    0.986
σa          invg     0.1     2         RE    0.242    0.205    0.282
                                       FB    0.259    0.232    0.286
σe          invg     0.1     2         RE    0.246    0.220    0.270
                                       FB    0.235    0.212    0.258
σz          invg     0.1     2         RE    0.432    0.386    0.474
                                       FB    0.443    0.397    0.488
σi          invg     0.1     2         RE    0.138    0.108    0.166
                                       FB    0.123    0.106    0.138

RE Marginal Likelihood: -924.8      Slobodyan and Wouters (JEDC 2012): RE -922.75
FB Marginal Likelihood: -920.3      Slobodyan and Wouters (JEDC 2012): CGL -922.65 / -916.4

Notes: TBD


The results suggest that the overall structure of the New Keynesian model is not

necessarily at odds with the data and that RE may actually be a key source of model misspecification.
Bounded rationality strategies that separate the estimation of beliefs from

    the structural parameters of the model may increase model fit and out-of-sample fore-

    casting by loosening the cross equation restrictions imposed on the structural parameters

    by RE.

    References

    Adolfson, Malin, Jesper Lindé, and Mattias Villani, “Forecasting Performance of

    an Open Economy DSGE Model,” Econometric Reviews, April 2007, 26 (2-4), 289–328.

    Amato, Jeffery D and Thomas Laubach, “Rule-of-thumb behaviour and monetary

    policy,” European Economic Review, 2003, 47 (5), 791–831.

    Ball, Laurence, N Gregory Mankiw, and Ricardo Reis, “Monetary policy for

    inattentive economies,” Journal of monetary economics, 2005, 52 (4), 703–725.

    Berardi, Michele, Jaqueson K Galimberti et al., “On the initialization of adaptive

    learning algorithms: A review of methods and a new smoothing-based routine,” Centre

    for Growth and Business Cycle Research Discussion Paper Series, 2012, 175.

    Clark, Todd E and Michael W McCracken, “Tests of equal predictive ability with

    real-time data,” Journal of Business & Economic Statistics, 2009, 27 (4), 441–454.

    and , “Advances in forecast evaluation,” working paper, 2011.

    Cogley, Timothy and Argia M Sbordone, “Trend inflation, indexation, and inflation

    persistence in the New Keynesian Phillips curve,” The American Economic Review,

    2008, 98 (5), 2101–2126.

    Cole, Stephen and Fabio Milani, “The misspecification of expectations in new key-

    nesian models: A dsge-var approach,” Technical Report 2014.

    Croushore, Dean and Tom Stark, “A real-time data set for macroeconomists: Does

    the data vintage matter?,” Review of Economics and Statistics, 2003, 85 (3), 605–617.

Diebold, Francis X and Robert S Mariano, “Comparing predictive accuracy,” Journal of Business & Economic Statistics, 1995, 13 (3).


, Frank Schorfheide, and Minchul Shin, “Real-Time Forecast Evaluation of

    DSGE Models with Stochastic Volatility,” May 2015, pp. 1–24.

    Evans, George W and Seppo Honkapohja, Learning and expectations in macroeco-

    nomics, Princeton: Princeton University Press, January 2001.

    Fuhrer, Jeffrey C, “Habit formation in consumption and its implications for monetary-

    policy models,” American Economic Review, 2000, pp. 367–390.

Galí, Jordi and Mark Gertler, “Inflation dynamics: A structural econometric analysis,” Journal of Monetary Economics, 1999, 44 (2), 195–222.

    Gaus, Eric and Srikanth Ramamurthy, “Estimation of Constant Gain Learning

    Models,” mimeo, August 2012.

    Harvey, David, Stephen Leybourne, and Paul Newbold, “Testing the equality

    of prediction mean squared errors,” International Journal of Forecasting, 1997, 13,

    281–291.

    Hommes, Cars, Behavioral rationality and heterogeneous expectations in complex eco-

    nomic systems, Cambridge University Press, 2013.

    Ireland, Peter N, “Technology Shocks in the New Keynesian Model,” The Review of

    Economics and Statistics, November 2004, 86 (4), 923–936.

    Kolasa, Marcin, Michal Rubaszek, and Pawel Skrzypczynski, “Putting the New

    Keynesian DSGE model to the real-time forecasting test,” Journal of Money, Credit

    and Banking, October 2012, 44 (7), 1301–1324.

    Mankiw, N Gregory and Ricardo Reis, “Sticky Information versus Sticky Prices:

    A Proposal to Replace the New Keynesian Phillips Curve,” Quarterly Journal of Eco-

    nomics, 2002, pp. 1295–1328.

    Milani, Fabio, “A Bayesian DSGE Model with Infinite-Horizon Learning: Do” Mechan-

    ical” Sources of Persistence Become Superfluous?,” International Journal Of Central

    Banking, 2006.

    , “Expectations, learning and macroeconomic persistence,” Journal of Monetary Eco-

    nomics, 2007, 54 (7), 2065–2082.


Negro, Marco Del, Frank Schorfheide, Frank Smets, and Rafael Wouters, “On

    the Fit of New Keynesian Models,” Journal of Business and Economic Statistics, April

    2007, 25 (2), 123–143.

    Preston, Bruce, “Learning about Monetary Policy Rules when Long-Horizon Expec-

    tations Matter,” International Journal of Central Banking, 2005.

    Rotemberg, Julio J, “Monopolistic price adjustment and aggregate output,” The Re-

    view of Economic Studies, 1982, 49 (4), 517–531.

    Slobodyan, Sergey and Raf Wouters, “Learning in a Medium-Scale DSGE Model

    with Expectations Based on Small Forecasting Models,” American Economic Journal:

    Macroeconomics, April 2012, 4 (2), 65–101.

    and , “Learning in an estimated medium-scale DSGE model,” Journal of Economic

    Dynamics and control, 2012, 36 (1), 26–46.

    Smets, Frank and Rafael Wouters, “Shocks and Frictions in US Business Cycles: A

    Bayesian DSGE Approach,” The American Economic Review, 2007, pp. 586–606.

    , Anders Warne, and Rafael Wouters, “Professional forecasters and real-time

    forecasting with a DSGE model,” International Journal of Forecasting, October 2014,

    30 (4), 981–995.

    Stock, James H and Mark W Watson, “Median unbiased estimation of coefficient

    variance in a time-varying parameter model,” Journal of the American Statistical As-

    sociation, 1998, 93 (441), 349–358.

    Appendix A

    In what follows, we describe the model and derive household and firm decision rules

    following Preston (2005) that depend on expectations but not specifically on any explicit

    assumption for how expectations are formed. This gives us a general setting from which

the consequences of different expectation assumptions on the reduced form may be tracked

    systematically.

    Households seek to maximize the following expected utility function

\tilde{IE}_{i,t} \sum_{t=0}^{\infty} \beta^t \left[ a_t \ln(C_{i,t}) + \ln(M_{i,t}/P_t) - \eta^{-1} h_{i,t}^{\eta} \right] (A1)


by choosing consumption, money holdings, labor supply, and by taking into account a

    preference shock at, where ĨEt represents a general expectations operator that is yet to

    be defined.8 The representative household is faced with the budget constraint

M_{i,t-1} + B_{i,t-1} + T_t + W_t h_{i,t} + \Delta_{i,t} \geq P_t C_{i,t} + B_{i,t}/r_t + M_{i,t}, (A2)

where M is nominal money balances, B is nominal bond holdings, T is transfers, W is the nominal

    wage, and ∆ is nominal profits the household receives from ownership of firms.

    Production in the economy is separated into two sectors: a perfectly competitive

finished goods sector and a monopolistically competitive intermediate goods sector. The
finished goods sector uses a continuum of intermediate goods with prices P_j to construct

    the finished good. The production function is a CES constant returns to scale technology

\left( \int_0^1 Y_{j,t}^{(\theta_t - 1)/\theta_t} \, dj \right)^{\theta_t/(\theta_t - 1)} \geq Y_t, (A3)

    where θt is a cost push shock.9 The finished-good-producing firms maximize profits

    subject to demand for their good

    Yj,t = (Pj,t/Pt)−θt Yt. (A4)

The finished goods price is given by

P_t = \left( \int_0^1 P_{j,t}^{1 - \theta_t} \, dj \right)^{1/(1 - \theta_t)} (A5)

for all t. The intermediate-goods-producing firms hire h_{j,t} units of labor to manufacture Y_{j,t} units of output using

Z_t h_{j,t} ≥ Y_{j,t},    (A6)

where Z_t is an aggregate technology shock.^10 To introduce price stickiness, it is assumed that firms face an explicit cost of adjusting nominal prices following Rotemberg (1982), measured in terms of finished goods:

(Φ/2) ( P_{j,t}/(π̄ P_{j,t−1}) − 1 )^2 Y_t.    (A7)

^8 It is assumed that 0 < β < 1, η ≥ 1, and ln(a_t) = ρ_a ln(a_{t−1}) + ε_{a,t}.
^9 ln(θ_t) = (1 − ρ_θ) ln(θ̄) + ρ_θ ln(θ_{t−1}) + ε_{θ,t}.
^10 ln(Z_t) = ln(z) + ln(Z_{t−1}) + ε_{z,t}.
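The three shock processes in footnotes 8-10 are straightforward to simulate; the sketch below uses illustrative persistence and volatility values of our own choosing, not the paper's estimates:

```python
import math
import random

random.seed(0)

# Illustrative values (our assumption, not the paper's estimates)
rho_a, rho_theta = 0.9, 0.8          # AR(1) persistence of a_t and theta_t
theta_bar, z = 10.0, 1.005           # steady-state cost-push level and gross TFP drift
sigma = 0.01                         # common shock standard deviation

ln_a, ln_theta, ln_Z = 0.0, math.log(theta_bar), 0.0
for _ in range(200):
    ln_a = rho_a * ln_a + random.gauss(0.0, sigma)                    # ln a_t = rho_a ln a_{t-1} + eps_{a,t}
    ln_theta = ((1 - rho_theta) * math.log(theta_bar)
                + rho_theta * ln_theta + random.gauss(0.0, sigma))    # footnote 9
    ln_Z = math.log(z) + ln_Z + random.gauss(0.0, sigma)              # unit-root TFP with drift (footnote 10)
```

The preference and cost-push shocks are stationary around 0 and ln θ̄, while ln Z_t inherits a unit root with drift ln z, which is why real variables must be detrended by Z_t before log-linearizing.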


Lastly, the output gap is defined as the ratio between the actual and efficient levels of output:

x_t = (1/a_t)^{1/η} Y_t/Z_t.    (A8)
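For completeness, the efficient level of output implicit in this definition can be recovered from the planner's problem; this step is our addition and follows from the preferences and technology above:

```latex
% Planner: max a_t ln C_t - eta^{-1} h_t^eta  subject to  C_t = Z_t h_t
\max_{h_t}\; a_t \ln(Z_t h_t) - \eta^{-1} h_t^{\eta}
\;\Longrightarrow\; \frac{a_t}{h_t} = h_t^{\eta-1}
\;\Longrightarrow\; h_t^{\eta} = a_t,
\qquad Y_t^{e} = Z_t a_t^{1/\eta},
\qquad x_t = \frac{Y_t}{Y_t^{e}}
           = \left(\frac{1}{a_t}\right)^{1/\eta}\frac{Y_t}{Z_t}.
```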

Households’ decision rule

We begin by deriving the household’s optimal decision rule following Preston (2005). The first-order conditions of the household’s problem are

a_t C_{i,t}^{−1} = β r_t ĨE_{i,t} [ a_{t+1} C_{i,t+1}^{−1} π_{t+1}^{−1} ]    (A9)

h_{i,t}^{η−1} = a_t C_{i,t}^{−1} W_t    (A10)

B_{i,t−1} + W_t h_{i,t} + ∆_{i,t} = P_t C_{i,t} + B_{i,t} r_t^{−1},    (A11)

where we have eliminated the variables dealing with money and transfers from the budget constraint. Starting with the budget constraint, we put it into real terms by dividing by the price level^11 and make it stationary using substitutions that account for the unit-root TFP process Z_t:^12

b_{i,t−1} π_t^{−1} + ω_t Z_t h_{i,t} + d_{i,t} Z_t = c_{i,t} Z_t + b_{i,t} r_t^{−1}.

Then, rearranging and summing, we obtain the lifetime budget constraint

∑_{T=t}^∞ β^{T−t} c_{i,T} Z_t = ∑_{T=t}^∞ β^{T−t} ( ω_T Z_t h_{i,T} + d_{i,T} Z_t ),

which allows us to divide out Z_t. Then, noting that ω_T h_{i,T} + d_{i,T} = y_{i,T}, we have

∑_{T=t}^∞ β^{T−t} ĉ_{i,T} = ∑_{T=t}^∞ β^{T−t} ŷ_{i,T}.    (A12)

We then log-linearize the stationary Euler equation

ĉ_{i,t} = ĨE_{i,t} ĉ_{i,t+1} − (i_t − ĨE_{i,t} π̂_{t+1}) − (ρ_a − 1) â_t

and iterate it forward recursively to get

^11 We assume w_t = W_t/P_t, D_t = ∆_t/P_t, b_t = B_t/P_t.
^12 We assume ω_t = W_t/(P_t Z_t), d_t = ∆_t/(P_t Z_t).

ĨE_{i,t} ĉ_{i,T+1} = ĉ_{i,t} + ĨE_{i,t} ∑_{s=t}^{T} ( (i_s − π̂_{s+1}) + (ρ_a − 1) â_s ).

Then, summing and discounting this expectation, we get

ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} ĉ_{i,T+1} = (1/(1−β)) ĉ_{i,t} + (1/(1−β)) ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} ( (i_T − π̂_{T+1}) + (ρ_a − 1) â_T ).    (A13)
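The discounting step relies on interchanging the order of summation; for any sequence g_s, and with 0 < β < 1, the identity (stated here for completeness) is

```latex
\sum_{T=t}^{\infty} \beta^{T-t} \sum_{s=t}^{T} g_s
= \sum_{s=t}^{\infty} g_s \sum_{T=s}^{\infty} \beta^{T-t}
= \frac{1}{1-\beta} \sum_{s=t}^{\infty} \beta^{s-t} g_s .
```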

Combining Equations (A12) and (A13) yields the household’s decision rule for consumption:

ĉ_{i,t} = ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} [ (1−β) ŷ_{i,T+1} − (i_T − π̂_{T+1}) − (ρ_a − 1) â_T ].

Aggregating across households and using the log-linearized output gap, we obtain the aggregate IS curve

x_t = −ω â_t + ĨE_t ∑_{T=t}^∞ β^{T−t} [ (1−β)(x_{T+1} + ω ρ_a â_T) − (i_T − π̂_{T+1}) − (ρ_a − 1) â_T ],    (A14)

which is free of any assumption about how expectations are formed.
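To make concrete how (A14) maps forecast paths into the current output gap, the infinite sum can be truncated numerically. The sketch below is our own illustration: the parameter values are arbitrary, and the preference shock is forecast with its AR(1) law of motion, which a given learning scheme need not impose:

```python
beta, omega, rho_a = 0.99, 1.0, 0.9   # illustrative values, not the paper's estimates

def output_gap(a_hat, x_fore, i_fore, pi_fore, horizon=2000):
    """Evaluate the truncated sum in (A14).

    a_hat: current (log-deviation) preference shock
    x_fore, i_fore, pi_fore: forecast paths as functions of the horizon k = T - t
    """
    total = 0.0
    a_k = a_hat                        # forecast of a_T at horizon k via a_{T+1} = rho_a * a_T
    for k in range(horizon):
        term = ((1 - beta) * (x_fore(k + 1) + omega * rho_a * a_k)
                - (i_fore(k) - pi_fore(k + 1))
                - (rho_a - 1) * a_k)
        total += beta ** k * term
        a_k *= rho_a
    return -omega * a_hat + total
```

With all forecast paths identically zero the output gap is zero, while a uniformly higher expected inflation path raises x_t through the real-rate term.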

Firms’ decision rule

Firms seek to maximize the present discounted value of the firm

ĨE_{j,t} ∑_{T=t}^∞ Q_{t,T} P_T Π_{j,T},    (A15)

where

Π_{j,T} = ( P_{j,T}/P_T − MC_{j,T}/P_T ) Y_{j,T} − (Φ/2) ( P_{j,T}/(π̄ P_{j,T−1}) − 1 )^2 Y_T    (A16)

and Q_{t,T} = β^{T−t} a_T/c_T. The first-order condition of the firm’s problem is

Φ ( P_{j,t}/(π̄ P_{j,t−1}) − 1 ) · 1/(π̄ P_{j,t−1}) = ĨE_{j,t} [ (Q_{t,t+1} Y_{t+1})/(Q_{t,t} Y_t) · Φ ( P_{j,t+1}/(π̄ P_{j,t}) − 1 ) · P_{j,t+1}/(π̄ P_{j,t}^2) ]    (A17)
        + θ_t (P_{j,t}/P_t)^{−1−θ_t} ( (w_t θ_t − Z_t (θ_t − 1) P_{j,t}) / (P_t^2 Z_t) ).


Because we assume π̄ ≥ 1, the model does not have a steady-state price level. Therefore, to make the model stationary, we define P̂_{j,t+i} = P_{j,t+i}/P_{t+i} and π_{t+i} = P_{t+i}/P_{t+i−1} for all i. Likewise, wages are made stationary by the substitution ω_t = W_t/(P_t Z_t). Substituting in these definitions yields

Φ ( (P̂_{j,t} P_t)/(π̄ P̂_{j,t−1} P_{t−1}) − 1 ) · 1/(π̄ P̂_{j,t−1} P_{t−1}) = ĨE_{j,t} [ (Q_{t,t+1} Y_{t+1})/(Q_{t,t} Y_t) · Φ ( (P̂_{j,t+1} P_{t+1})/(π̄ P̂_{j,t} P_t) − 1 ) · (P̂_{j,t+1} P_{t+1})/(π̄ P̂_{j,t}^2 P_t^2) ] + (P̂_{j,t})^{−θ_t} ( θ_t ω_t + (1 − θ_t) P̂_{j,t} ).

Now, noting that market clearing implies C_t = Y_t, multiplying both sides by P_t, and using the definition of inflation, we get

Φ ( (P̂_{j,t} π_t)/(π̄ P̂_{j,t−1}) − 1 ) · π_t/(π̄ P̂_{j,t−1}) = ĨE_{j,t} [ β (a_{t+1}/a_t) Φ ( (P̂_{j,t+1} π_{t+1})/(π̄ P̂_{j,t}) − 1 ) · (P̂_{j,t+1} π_{t+1})/(π̄ P̂_{j,t}^2) ] + (P̂_{j,t})^{−θ_t} ( θ_t ω_t + (1 − θ_t) P̂_{j,t} ).

Log-linearizing this expression gives

−Φ p_{j,t−1} + (Φ(β+1) − 1 + θ̄) p_{j,t} − Φβ ĨE_{j,t} p_{j,t+1} = Φβ ĨE_{j,t} π_{t+1} − Φ π_t − (1 − θ̄) ω̂_t − θ̂_t,    (A18)

and, introducing lag polynomials, we can write this as

( (1/β) L^2 − ((Φ(β+1) − 1 + θ̄)/(Φβ)) L + 1 ) ĨE_{j,t} p_{j,t+1} = (1/β)(π_t − β ĨE_{j,t} π_{t+1}) + ((1 − θ̄)/(βΦ)) ω̂_t + (1/(Φβ)) θ̂_t.    (A19)

Factoring the lag polynomial and solving the unstable root forward yields

(1 − λ_1 L) p_{j,t} = − (λ_2^{−1} L^{−1})/(1 − λ_2^{−1} L^{−1}) · (L/β) ( π_t − β ĨE_{j,t} π_{t+1} + ((1 − θ̄)/Φ) ω̂_t + (1/Φ) θ̂_t )
                   = −λ_1 ĨE_{j,t} ∑_{T=t}^∞ (λ_1 β)^{T−t} π_T + λ_1 β ĨE_{j,t} ∑_{T=t}^∞ (λ_1 β)^{T−t} π_{T+1} + λ_1 ĨE_{j,t} ∑_{T=t}^∞ (λ_1 β)^{T−t} ( ψ x_T − e_T ),


where ψ x_t = (θ̄ − 1) η Φ^{−1} ω̂_t, e_t = Φ^{−1} θ̂_t, 0 < λ_1 < 1, λ_2 > 1, λ_2 = 1/(λ_1 β), and λ_1 + λ_2 = (Φβ)^{−1} (Φ(β+1) − 1 + θ̄). Combining terms, we have^13

(1 − λ_1 L) p_{j,t} = −π_t + λ_1 ĨE_{j,t} ∑_{T=t}^∞ (λ_1 β)^{T−t} ( ((1 − λ_1)/λ_1) π_T + ψ x_T − e_T ).

Finally, aggregating across firms yields the aggregate Phillips curve, free of any expectations assumptions:

π_t = λ_1 ĨE_t ∑_{T=t}^∞ (λ_1 β)^{T−t} ( ((1 − λ_1)/λ_1) π_T + ψ x_T − e_T ).    (A20)
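The roots of the factorization can be checked numerically: λ_1 and λ_2 solve x² − (Φβ)^{−1}(Φ(β+1) − 1 + θ̄)x + β^{−1} = 0, which reproduces both λ_1 + λ_2 = (Φβ)^{−1}(Φ(β+1) − 1 + θ̄) and λ_2 = 1/(λ_1β). The parameter values below are illustrative, not the paper's estimates:

```python
import math

beta, Phi, theta_bar = 0.99, 100.0, 10.0   # illustrative calibration (our assumption)

# Characteristic equation implied by the lag polynomial in (A19):
#   x^2 - [(Phi*(beta+1) - 1 + theta_bar)/(Phi*beta)] * x + 1/beta = 0
b = (Phi * (beta + 1) - 1 + theta_bar) / (Phi * beta)
disc = math.sqrt(b * b - 4.0 / beta)
lam1, lam2 = (b - disc) / 2.0, (b + disc) / 2.0

assert 0.0 < lam1 < 1.0 < lam2                   # one stable, one unstable root
assert abs(lam2 - 1.0 / (lam1 * beta)) < 1e-9    # lambda_2 = 1/(lambda_1 * beta)
```

Under this calibration the stable root λ_1 governs the effective discounting of future inflation, output gaps, and cost-push shocks in (A20).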

Monetary policy

The model is closed with a standard contemporaneous Taylor rule

i_t = r̄ + π̄ + θ_π (π_t − π̄) + θ_x x_t + ε_{i,t},    (A21)

where ε_{i,t} is an i.i.d. monetary policy shock.

^13 Noting that the first two sums expand as −λ_1 π_t − λ_1^2 β π_{t+1} − λ_1^3 β^2 π_{t+2} − ... and λ_1 β π_{t+1} + (λ_1 β)^2 π_{t+2} + (λ_1 β)^3 π_{t+3} + ..., which together give −λ_1 π_t + λ_1 β (1 − λ_1) π_{t+1} + (λ_1 β)^2 (1 − λ_1) π_{t+2} + (λ_1 β)^3 (1 − λ_1) π_{t+3} + ....

