Expectations and the empirical fit of DSGE models
Preliminary draft prepared for SNDE. Check chrisopherggibbs.weebly.com for the latest draft.
Eric Gaus∗ Christopher G. Gibbs†
March 2017
Abstract
This paper studies the mechanisms that contribute to the observed improvement in fit of DSGE models that assume adaptive learning. DSGE models often struggle to reproduce the persistence observed in actual macroeconomic data. To generate this persistence, many researchers have proposed relaxing the rational expectations (RE) assumption in favor of adaptive learning. Empirically, these relaxations are found to increase persistence and almost universally improve model fit compared to RE. However, we show that these improvements are not always explained by the increase in persistence implied by adaptive learning. Instead, a significant proportion of the increased fit is due to the relaxation of the cross equation restrictions imposed by RE alone. Our findings suggest that many model comparisons between RE and adaptive learning speak more to the misspecification of RE than to the veracity of the particular learning strategy. We propose a simple check to assess the contribution that learning adds to any estimated model.
1 Introduction
DSGE models often struggle to reproduce the persistence observed in actual macroeco-
nomic data. To overcome this issue, many researchers add additional frictions, preference
assumptions, or ad hoc adjustment.1 Alternatively, some have argued that relaxing the
rational expectations (RE) assumption in favor of adaptive learning offers a theoretically
∗Gaus: Ursinus College, 601 East Main St., Collegeville, PA 19426-1000 (e-mail: [email protected]) †Gibbs: University of New South Wales (e-mail: [email protected]) 1The lack of persistence generated by DSGE models under RE is well known. These modifications
include ad hoc corrections such as adding lags of the endogenous variables to the structural equations (Galí and Gertler (1999) and Ireland (2004)), changes to preferences such as habit persistence (Fuhrer (2000)), changes to inflation setting by firms through inflation indexation (Cogley and Sbordone (2008)), or adding information problems such as rule-of-thumb behavior (Amato and Laubach (2003)) or rational inattention/sticky information (Mankiw and Reis (2002); Ball et al. (2005)), to name a few. Armed with a subset of these modifications, Del Negro et al. (2007) declare that the NK model fits the data well enough to be used for policy evaluation.
appealing way to generate persistence without employing additional modifications to the
model structure. For example, Milani (2006, 2007), Slobodyan and Wouters (2012a,b),
and Cole and Milani (2014) show that relaxing RE in favor of adaptive learning
strategies can generate significant improvements in in-sample fit and in some cases may
lessen the reliance on assumptions such as habit persistence (Milani (2007)) to explain
macroeconomic persistence, while leaving inference on all other parameters unchanged.
In this paper, we reexamine the observed improvements in empirical fit of DSGE
models that assume adaptive learning. A broad conclusion in much of this literature is
that the improvement in fit observed for these models is directly tied to the persistence
that they can potentially generate. However, we show that this conclusion is not always
correct and may depend crucially on the specific modeling choices for expectations. In
particular, any relaxation of RE has implications for the cross-equation
restrictions imposed on the time series representation of the model that is taken to the
data. RE typically places tight restrictions on this underlying structure, which may be
loosened in surprising ways whenever a relaxation of RE is considered. We show that
the loosening of these restrictions alone can account for a significant proportion of the
improvement in in-sample fit and out-of-sample forecast accuracy for estimated New
Keynesian models that assume adaptive learning.
We illustrate the point by comparing RE and adaptive learning with a constant gain,
which we refer to as constant gain learning (CGL), to an intermediate case, which we call
fixed beliefs (FB). In the FB case, we assume that beliefs are formed by a fixed forecasting
function with the same reduced form as the RE solution of the model. The FB case,
therefore, nests RE as a special case and is nested by adaptive learning. Importantly,
though, the FB case does not add any mechanisms to the model that generate persistence.
All it does is separate the estimation of the structural parameters from the estimation
of beliefs. In fact, one way to view the FB case is as estimating the model under CGL
with the gain parameters, which govern the effect of new information on the agents’
forecasting functions, constrained to be zero. In this case, the agents’ forecasts in all
periods coincide with whatever is chosen or estimated for their initial beliefs.
We explore this relaxation in a version of the New Keynesian model developed by
Ireland (2004), which we strip of all internal propagation mechanisms that are not di-
rectly generated as a consequence of expectations. The model offers a parsimonious and
tractable laboratory to study the role expectations play in estimation, but it re-
mains sufficiently detailed to be taken to the data. We focus on two of the most widely
used adaptive learning strategies in the literature: Euler Equation constant gain learning
(EE-CGL) as in Evans and Honkapohja (2001) and Infinite Horizon constant gain learn-
ing (IH-CGL) as proposed by Preston (2005). These two cases are interesting to compare
because they imply significantly different ranges of persistence for identical calibrations.
In the EE-CGL case, we show that nearly all of the improvement is explained by the re-
laxation of RE in the FB case and not added persistence. We then confirm this result
in the medium scale DSGE model of Smets and Wouters (2007). In this model, we find
that relaxing the RE assumption without learning generates significant improvements in
in-sample fit that are similar in magnitude to those reported by Slobodyan and Wouters
(2012b) for a number of different EE-CGL specifications. In the IH-CGL case, learning
and persistence do play a more prominent role. But even here, the majority of the
improvement in in-sample fit comes from the loosening of the cross equation restrictions
relative to RE.
There are three main takeaways from our results:
1. First, our results provide continued support for the exploration of bounded rational-
ity strategies in DSGE models, since even small deviations from RE offer significant
improvements in in-sample fit and real-time out-of-sample forecasting performance,
while maintaining the standard theoretical structure of the DSGE model.
2. Second, our results demonstrate that comparisons between bounded rationality
strategies and RE should be done with care. Many comparisons say more about
the misspecification of the model under RE than the benefits of the particular
learning strategy.
3. Finally, our results suggest that a natural benchmark to assess an adaptive learning
strategy is to compare the fit of the model with parameter updating to the model
without parameter updating. If learning is important, then model fit should sig-
nificantly improve when parameter updating is allowed. This is also a natural test
for the role played by the initial beliefs chosen for the learning algorithm.2
The existing literature recognizes that different assumptions for initial beliefs and
different specifications of learning vary in their performance relative to RE. However,
the existing literature does not attempt to distinguish the shared features of these
assumptions that generate improvements in fit, and no study to our knowledge has assessed
whether those improvements translate into real-time out-of-sample forecast accuracy.
For example, both Slobodyan and Wouters (2012a,b) and Cole and Milani (2014) find
2Slobodyan and Wouters (2012a) and Berardi et al. (2012) both show that the selection of initial beliefs can have significant effects on inference and fit.
improvements in fit relative to RE whether agents are assumed to have correctly speci-
fied time-varying beliefs (as traditionally assumed under adaptive learning), misspecified
time-varying beliefs, or fixed but non-rational beliefs. Our results show that the same
mechanism may drive a substantial portion of the observed improvements in each of these
cases. In addition, previous comparisons of out-of-sample fit, such as the exercise done
by Slobodyan and Wouters (2012a), only consider ex-post data, which has the potential
to overstate true forecasting power as noted by Croushore and Stark (2003).
Although the out-of-sample fit of the NK model with adaptive learning is largely
unexplored in the literature, the RE counterpart of the model has been studied by nu-
merous researchers.3 In particular, Adolfson et al. (2007) find that DSGE forecasts can
out-perform VARs at short horizons and Bayesian VARs at long horizons
in an open-economy framework. Smets and Wouters (2007) also find comparable
out-of-sample forecast performance of the model relative to Bayesian VARs. Del Negro
et al. (2007) find, using DSGE-VARs, that the cross-equation restrictions implied by
a rational solution of a New Keynesian model can add forecasting power relative to an
unrestricted VAR. In addition, Diebold et al. (2015) find that adding stochastic volatil-
ity to the model improves forecasting. A caveat to these papers, however, is the use of
ex-post data instead of real-time data. Kolasa et al. (2012) reevaluate the DSGE-VAR
evidence against a large scale NK DSGE model with real-time data and find that, except
for short-term forecasts, the large DSGE model can perform as well as, or better than, the
DSGE-VAR. Smets et al. (2014) also find sizable gains in real-time forecasting accuracy
in NK models when actual survey expectations are used as observables.
Our results compare favorably to this literature, with improvements in forecast accuracy
at long horizons for inflation and comparable accuracy at short horizons when compared
to parsimonious time series models on US out-of-sample data spanning the Great Mod-
eration. Likewise, forecasts for output growth are best at long horizons relative to short
horizons. However, no noticeable improvement over the time series models is observed
for output growth forecasts in the NK model considered here.
In the next section, we discuss the stylized facts for estimation of New Keynesian
DSGE models with constant gain learning to highlight the key relationship this paper
explains. In Section 3, we discuss how constant gain learning strategies are typically
introduced into a DSGE model and how they change the mapping from the structural
parameters of interest to the reduced form of the model taken to the data when comparing
RE to CGL. In Section 4, we introduce a parsimonious DSGE model and derive the
reduced form mappings implied under different expectation assumptions. In Section 5, we
3The exception of course being Slobodyan and Wouters (2012a).
estimate the five different models and compare the results to the stylized facts discussed
in Section 2. In Section 6, we explain the role that the loosening of the RE restrictions
plays in improving model fit. In Section 7, we show that the same mechanism driving fit
in our parsimonious model appears to underlie the improvements in fit of medium scale
DSGE models as well. In Section 8, we conclude.
2 The stylized facts of estimated NK models with
CGL
Table 1 reports the parameter estimates from two of the most widely cited studies on
estimated NK models with adaptive learning: Milani (2007) and Slobodyan and Wouters
(2012b). Milani investigates adaptive learning in a small scale DSGE model, while Slo-
bodyan and Wouters investigate learning in the medium scale DSGE model of Smets and
Wouters (2007). The table reports some key parameter estimates from the two studies
for the respective models assuming RE and assuming minimum state variable (MSV)
CGL. The parameter estimates exhibit three stylized facts that are ubiquitous among
CGL estimation exercises:
1. First, the estimates of the key parameters that describe monetary policy and the
exogenous shocks are mostly unchanged between the RE and CGL cases. In par-
ticular, columns 3 and 6 in the table show the difference between the two cases,
where asterisks denote whether the change falls inside of the 95% HPD intervals
of the RE estimates. Nearly all of the parameter estimates under CGL remain
within the HPD interval of the rational estimates.
2. Second, the model under CGL exhibits a significant improvement in in-sample fit
as measured by marginal log-likelihood relative to RE.
3. Third, the gain parameters that govern how agents update their beliefs in the
learning algorithm are estimated to be relatively small.
The first stylized fact is often interpreted as evidence that persistence explains the
improvement in fit documented by the second stylized fact. This is because small changes
in the structural parameters appear to suggest that the structure of the model fits the
data in the same way as under RE. However, the third fact - small estimated values for
the gain parameters - complicates this interpretation. Small gain parameters such as
these can imply extreme persistence in the learning process that goes beyond business
cycle frequencies that are typically of interest to an econometrician. In practice, these
gains are so small that the beliefs may essentially remain at whatever values they are
initialized at for the duration of the data used to estimate the model.
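The slow movement implied by small gains can be seen directly from the constant-gain recursion. A minimal sketch in Python; the gain of 0.017 is from Table 1, while every other number is purely illustrative:

```python
# Constant-gain updating of a scalar belief coefficient:
#   b_t = b_{t-1} + gain * (x_t - b_{t-1}),
# so observation j periods back receives weight gain * (1 - gain)^j.
def updated_belief(initial, observations, gain):
    b = initial
    for x in observations:
        b += gain * (x - b)
    return b

# Hypothetical experiment: beliefs start at 0 and the data mean shifts to 1.
quarters = 40  # ten years of quarterly data
b_small = updated_belief(0.0, [1.0] * quarters, gain=0.017)
b_large = updated_belief(0.0, [1.0] * quarters, gain=0.20)
# With gain = 0.017 beliefs close only about half the distance after ten years
# (1 - 0.983^40 is roughly 0.5), while a gain of 0.2 closes essentially all of it.
```

The half-life of a belief revision is roughly ln(2)/γ quarters, so a gain of 0.017 implies beliefs that barely move at business cycle frequencies.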
Together, these stylized facts imply that there may be other factors that contribute
to the increase in fit beyond simply the addition of new mechanisms to propagate shocks.
We argue that a key element to explaining these stylized facts is to understand how
relaxing RE in an estimated model affects the mapping of the structural parameters to
the reduced form matrices that are taken to the data.
3 Relaxing the RE restrictions through CGL
The most common approach to implement an adaptive learning strategy is to start with
the linearized structural equations of a DSGE model such as
yt = Γ + Ayt−1 + BIEtyt+1 + Dvt + Kεt (1)
vt = Rvt−1 + ut, (2)
where yt is a vector of endogenous variables and vt is a vector of autoregressive exogenous
shocks, and to replace the expectation operator with a predetermined boundedly rational
forecasting strategy. This is the standard approach taken by Evans and Honkapohja
(2001) in their exhaustive theoretical treatment on adaptive learning and the approach
most often taken in the behavioral heterogeneous expectation literature as in Hommes
(2013) and the references therein. The key here is that agents are only asked to form one-
step-ahead forecasts and that they ignore any implications of their forecasts for longer
horizons. The appeal of this approach is that if the chosen forecasting process nests RE,
then under well-known regularity conditions (E-stability) the agents’ beliefs will stay in
the neighborhood of the RE solution. Therefore, it allows for potentially complicated
but bounded beliefs around the natural benchmark of RE in a wide range of models.
A second approach follows Preston (2005) and asks agents to consider the implications
of their beliefs on their entire decision problem. In many cases, this means solving
out an agent’s decision rule, given beliefs today, into the infinite future. The approach
yields linearized structural equations that nest the standard equations studied in the
first approach and RE, but which capture the impact of long run beliefs. Again, if the
forecasting process chosen for agents nests the RE solution, then the functional form of
the model will nest the previous case, and under well-known regularity conditions the
Table 1: Stylized facts of Estimated NK models with CGL
Slobodyan and Wouters JEDC 2012 Milani JME 2007
RE CGL-MSV Change RE CGL - MSV Change
Monetary policy & habits Monetary policy & habits
MP inflation 2.04 1.91 0.13* MP inflation 1.433 1.484 -0.051*[1.75, 2.33] [1.06, 1.81] [1.08, 1.90]
MP output 0.09 0.13 -0.04* MP output gap 0.792 0.801 -0.009*[0.05, 0.013] [0.425, 1.165] [0.433, 1.18]
MP output growth 0.22 0.19 0.03* MP smoothing 0.89 0.914 -0.024*[0.18, 0.27] [0.849,0.93] [0.875, 0.947]
MP smoothing 0.81 0.84 -0.03* Habits 0.911 0.117 0.794[0.77, 0.85] [0.717, 0.998] [0.006, 0.289]
Habits 0.71 0.80 -0.09[0.64, 0.78]
AR parameters AR Parameters
Productivity 0.96 0.96 0.00* Demand shock 0.87 0.845 0.025*[0.94, 0.98] [0.8,0.93] [0.776, 0.908]
Risk premium 0.22 0.23 -0.01* Price mark-up 0.02 0.854 -0.834[0.08, 0.36] [0.0005, 0.07] [0.778, 0.93]
Gov. spending 0.98 0.96 0.02*[0.96, 0.99]
Investment 0.71 0.33 0.38[0.62, 0.81]
MP shock 0.15 0.05 0.1*[0.04, 0.24]
Price mark-up 0.89 0.88 0.01*[0.81, 0.97]
Wage mark-up 0.97 0.95 0.02*[0.95, 0.99]
St. Dev. Shocks St. Dev Shocks
Productivity 0.46 0.47 -0.01* MP Shock 0.933 0.860 0.073*[0.41, 0.51] [0.84, 1.04] [0.777, 0.953]
Risk premium 0.24 0.25 -0.01* Demand shock 1.067 1.670 -0.603[0.20, 0.28] [0.89, 1.22] [1.47, 1.91]
Gov. spending 0.53 0.53 0.00* Price mark-up 1.146 1.150 -0.004*[0.48, 0.58] [1.027, 1.27] [1.02, 1.31]
Investment 0.45 0.61 -0.16[0.37, 0.53]
MP shock 0.24 0.24 0.00*[0.22, 0.27]
Price mark-up 0.14 0.14 0.00*[0.11, 0.17]
Wage mark-up 0.24 0.23 0.01*[0.21, 0.28]
Gains 0.017 Gains 0.018
[0.006, 0.021] [0.0133, 0.0231]
Marginal Likelihood Marginal Likelihood
-922.75 -910.97 -765.45 -759.08
agents’ beliefs will depart from those predicted under RE, but remain bounded around
the natural benchmark of RE as before.
Both types of learning models are taken to the data using the same methods as their RE
counterparts. The models have state space representations and inference on a vector of
structural parameters, Θ, may be obtained by standard likelihood based techniques using
the Kalman filter. Depending on the assumption for how agents form their forecasts, there
are of course many new dimensions through which adaptive learning may contribute to
the observed improvements in fit. This includes the mechanical increases that occur
because of an increase in the degrees of freedom, or the time variation that is introduced
into the parameters through learning. Another, and the primary focus of this paper, is a
change in the mapping from the structural parameters to the reduced form of the model
taken to the data.
To illustrate how the mapping changes, consider estimating a vector of parameters
Θ from a standard linearized DSGE as in Equations (1) and (2), where the true data
generating process is unknown. The minimum state variable RE solution of the model
takes the following form
yt = C + Fyt−1 + Qvt + Kεt. (3)
The RE restrictions in this case are summarized by the mapping F : Θ → {C, F, Q},
which governs how changes in the elements of Θ change C, F, or Q. The mapping in
general is highly nonlinear and often has no closed form solution. Inference on Θ in
this case may be obtained by standard likelihood based techniques such that ΘMLE =
argmaxΘ L(Θ|yobs, F), where L is the likelihood function.
Any adaptive learning strategy, however, implies a different mapping, one which is
unambiguously less restrictive following common assumptions employed in this literature.
To see this, consider the case where agents forecast using Equation (3) but do not update
parameters over time. The agents just pick a parameterization for Equation (3) and use
it to forecast in each period. This is the same as assuming agents are learning the MSV
solution but that the gain parameters that govern how beliefs are updated are constrained
to be zero, which makes beliefs unchanging over time. Under this assumption, the data
generating process for the economy takes the following form
yt = (Γ + BĈ) + (A + BF̂)yt−1 + (BQ̂R + D)vt + Kεt, (4)
with C̃ ≡ Γ + BĈ, F̃ ≡ A + BF̂, and Q̃ ≡ BQ̂R + D,
where Equation (3) is substituted in for IEtyt+1 and where hats denote fixed values for
the belief parameters. Note the following regarding Equation (4):
1. Equation (4) retains the same reduced form as Equation (3).
2. Conditioning on Ĉ, F̂, and Q̂, Equation (4) is described by the same structural
parameters, which we collect in the vector Ξ.
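The construction of Equation (4) from fixed beliefs is purely mechanical. A sketch with placeholder matrices (none of the values below are calibrated or estimated; they only illustrate the algebra):

```python
import numpy as np

# Reduced form under fixed beliefs (Equation (4)):
#   C_tilde = Gamma + B C_hat,  F_tilde = A + B F_hat,  Q_tilde = B Q_hat R + D.
# Structural matrices -- illustrative placeholders only:
Gamma = np.array([0.10, 0.20])
A = np.zeros((2, 2))
B = np.array([[0.50, 0.10],
              [0.05, 0.60]])
D = np.eye(2)
R = np.diag([0.9, 0.8])

# Fixed belief matrices (the hats), chosen arbitrarily here:
C_hat = np.array([0.10, 0.20])
F_hat = np.zeros((2, 2))
Q_hat = np.array([[1.00, 0.20],
                  [0.10, 1.10]])

C_tilde = Gamma + B @ C_hat       # intercept
F_tilde = A + B @ F_hat           # coefficient on y_{t-1}
Q_tilde = B @ Q_hat @ R + D       # loading on the exogenous shocks
```

Note that the reduced form keeps the shape of Equation (3) regardless of what values the hatted beliefs take; only the cross-equation restrictions tying the reduced form to Θ are relaxed.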
Now, let’s assume that agents pick beliefs based on ΘMLE = argmaxΘ L(Θ|yobs, F)
such that the RE values of Ĉ|ΘMLE, F̂|ΘMLE, and Q̂|ΘMLE are used in constructing Equa-
tion (4). This implies the mapping G : Ξ × ΘMLE → {C̃, F̃, Q̃} to describe this
case. By construction, if Θ = Ξ, then Equation (4) is equal to Equation (3). If adaptive
learning, represented by allowing time variation in Ĉ, F̂, and Q̂ based on past data, is
responsible for explaining improvements in fit, then in this case ΞMLE from Equation (4)
should be equal to ΘMLE from Equation (3) and
maxΘ IE L(Θ|yobs, F) = maxΞ IE L∗(Ξ|yobs, Ĉ|ΘMLE, F̂|ΘMLE, Q̂|ΘMLE).
Whether or not these equalities hold in practice depends on a number of factors. For
example, suppose the true data generating process is given by
yt = CT + FTyt−1 + QTvt + KTεt, (5)
and suppose that Θ0 = F−1({CT, FT, QT}) exists. Then, by construction, it follows
that F(Θ0) = G(Θ0, Θ0) and Θ0 = argmaxΘ IE L(Θ|yobs, F) = argmaxΞ IE L∗(Ξ|yobs, G).
However, if Θ0 = F−1({CT, FT, QT}) does not exist or is not feasible, then there is no
guarantee that Θ is equal to Ξ, and in general
maxΘ IE L(Θ|yobs, F) ≤ maxΞ IE L∗(Ξ|yobs, G),
where the inequality follows from the definition of F. Specifically, the range of the
mapping F is by construction a subset of the range of G because F(Θ) := G(Θ, Θ).
Therefore, if RE is misspecified, then the first step in any adaptive learning scheme
of substituting a belief function into the structural equations affects the cross-equation
restrictions that map the structural parameters to the reduced form. But, importantly,
it does not fundamentally change the underlying reduced form structure of the model,
i.e., Equation (5), Equation (4), and Equation (3) continue to nest one another. There
is nothing explicitly done to the model that allows it to generate more persistence. Now
of course, this insight does not imply that further modifications such as assuming time-
variation in C̃, F̃, and Q̃ as in adaptive learning will not better fit the data, but it
highlights another mechanism through which adaptive learning models may improve in-
sample fit. In the remainder of the paper, we specifically ask whether adaptive learning
does improve in-sample fit beyond just the loosening of the restrictions implied by RE.
4 The Model, Expectations, and the Reduced Form
To study the role loosened restrictions play in model fit, we consider five different as-
sumptions for expectations that share a common reduced form:
yt = CRFt + FRFt yt−1 + QRFt vt + KRFεt, (6)
where each assumption imposes different restrictions on CRFt, FRFt, and QRFt. In this section, we put
forward a tractable model that allows for analytical derivation of the mapping from the
structural parameters to the reduced form so that changes implied by different expecta-
tions assumptions can be studied in detail.
4.1 The Model
The New Keynesian model we study is a parsimonious version of the model proposed by
Ireland (2004). The microfoundations of the model are given in Appendix A. Following
Preston (2005), the key equations are
xt = −ωât + ĨEt ∑∞T=t βT−t [(1− β)(xT+1 + ωρaâT )− (iT − π̂T+1)− (ρa − 1)âT ] (7)
πt = λ1ĨEt ∑∞T=t (λ1β)T−t ( ((1− λ1)/λ1)πT + ψxT − eT ) (8)
it = r̄ + π̄ + θπ(πt − π̄) + θxxt + εi,t (9)
gt = ŷt − ŷt−1 + ḡ + εz,t (10)
xt = ŷt − ωat (11)
at = ρaat−1 + εa,t (12)
et = ρeet−1 + εe,t, (13)
where xt is the output gap, πt is inflation, it is the nominal interest rate, gt is the growth
rate of output, yt is the stochastically detrended level of output, at is a preference shock,
et is a cost push shock, εi,t is a monetary policy shock, and εz,t is the detrended TFP
shock. The individual variables are in log terms.
We choose this model for three reasons. First, the model has no internal propagation
mechanisms other than expectations. In fact, the minimum state variable RE solution
does not depend on any lagged endogenous variables. This has the added benefit of
allowing us to estimate the model without a projection facility, which is usually required
to keep expectations from becoming explosive due partly to numerical precision issues.
Gaus and Ramamurthy (2012) note that the choice of projection facility can have a
significant effect on estimation outcomes, which may complicate a comparison with RE.
Second, the determinacy and E-stability conditions of the model perfectly coincide.4
This allows us to impose identical restrictions on the parameter space for RE and CGL
versions of the model. And finally, the model generates predictions for a number of
directly observable variables that do not require any manipulation such as HP filtering
or even de-meaning in order to estimate.
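The determinacy/E-stability condition in footnote 4 is easy to verify numerically. A small sketch using the calibrated β = 0.995 and ψ = 0.1 from Section 5; the θπ and θx values below are illustrative, not estimates:

```python
# Determinacy and E-stability in this model reduce to the Taylor principle
# (footnote 4): theta_pi > ((beta - 1) * theta_x + psi) / psi.
def is_determinate(theta_pi, theta_x, beta=0.995, psi=0.1):
    return theta_pi > ((beta - 1.0) * theta_x + psi) / psi

active = is_determinate(theta_pi=1.5, theta_x=0.125)   # active rule satisfies it
passive = is_determinate(theta_pi=0.9, theta_x=0.0)    # passive rule violates it
```

Because β is close to one, the threshold is close to θπ > 1, the familiar statement of the Taylor principle.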
4.1.1 Rational Expectations
It is straightforward to show that imposing RE on Equations (7) and (8) allows them to
be collapsed to a more familiar form
xt = r̄ + ĨEREt xt+1 − (it − ĨEREt πt+1) + (1− ω)(1− ρa)at (14)
πt = (1− β)π̄ + βĨEREt πt+1 + ψxt − et. (15)
Substituting in Equation (9), we can map the model into the general form of Equations
(1) and (2)
yt = Γ + Ayt−1 + BĨEtyt+1 + Dvt + Kεt
vt = ρvt−1 + ut,
where yt = (xt, πt)′, vt = (at, et)′, ut = (εa,t, εe,t)′, εt = (εi,t, 0)′,
Γ = m ( π̄(θπβ − 1) ; −π̄(θx(β − 1)− 1 + β + ψ − θπψ) ), B = m ( 1, 1− θπβ ; ψ, β(1 + θx) + ψ ),
4The determinacy and E-stability condition for the model is the Taylor principle: θπ > ((β − 1)θx + ψ)/ψ.
D = m ( (ρa − 1)(ω − 1), θπ ; (ρa − 1)(ω − 1)ω, −1− θx ), ρ = ( ρa, 0 ; 0, ρe ),
m = (1 + θx + θπψ)−1, A = 02×2, and K = −I2. Using the reduced form of Equation
(6) and the method of undetermined coefficients, we can solve analytically for the RE
solution in terms of the structural parameters: CRE = (0, π̄)′,
QRE(1,1) = −(ρa − 1)(βρa − 1)(ω − 1) / (1 + θx − θxβρa + βρ²a + θπψ − ρa(1 + β + ψ)),
QRE(1,2) = (θπ − ρe) / (1 + θx − θxβρe + βρ²e + θπψ − ρe(1 + β + ψ)),
QRE(2,1) = (ρa − 1)ψ(ω − 1) / (1 + θx − θxβρa + βρ²a + θπψ − ρa(1 + β + ψ)),
QRE(2,2) = (ρe − 1− θx) / (1 + θx − θxβρe + βρ²e + θπψ − ρe(1 + β + ψ)), (16)
FRE = 02×2, and KRE = −I. This is of course the explicit mapping F : Θ →
{CRE, FRE, QRE} discussed in Section 3 for this model.
There are two notable features about this mapping. First, RE places restrictions on
the intercept terms in CRE, forcing them to be consistent with the steady state of the model.
As we will see, this will not be the case under other expectation assumptions. Second, the
mapping from the structural parameters to the reduced form of QRE is highly nonlinear.
The mapping to each element of QRE is an eighth-degree polynomial in β, ψ, θπ, θx, ρa, ρe,
and ω. The nonlinearity of this mapping is important because it means that relatively
small changes in the values of structural parameters can have large effects on the reduced
form and fit.
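Because A = 0 here, the method of undetermined coefficients reduces to two fixed-point conditions, C = Γ + BC and Q = BQρ + D, which can be solved numerically by the standard vec identity vec(BQρ) = (ρ′⊗B)vec(Q). A sketch with arbitrary placeholder matrices, not the model's calibration:

```python
import numpy as np

# Minimum state variable RE solution y_t = C + Q v_t when A = 0:
# matching coefficients in Equation (1) gives C = Gamma + B C and Q = B Q R + D.
def msv_solution(Gamma, B, D, R):
    n = Gamma.size
    C = np.linalg.solve(np.eye(n) - B, Gamma)
    # vec(B Q R) = (R' kron B) vec(Q), with columns stacked ('F' order)
    M = np.eye(n * n) - np.kron(R.T, B)
    Q = np.linalg.solve(M, D.flatten(order="F")).reshape((n, n), order="F")
    return C, Q

# Illustrative placeholder matrices:
Gamma = np.array([0.0, 0.02])
B = np.array([[0.40, 0.20],
              [0.04, 0.50]])
D = np.array([[0.3, 0.0],
              [0.0, -1.0]])
R = np.diag([0.9, 0.8])
C, Q = msv_solution(Gamma, B, D, R)
# C and Q satisfy the fixed-point conditions exactly.
```

For the actual model the same fixed point has the closed form in Equation (16); the numerical solver is useful when the polynomial mapping is too unwieldy to manipulate by hand.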
4.1.2 Euler equation learning
The second expectations assumption we consider is Euler Equation Constant Gain Learn-
ing (EE-CGL). To implement this strategy, we start with the same structural equation
as under RE given by Equations (1) and (2). We assume that the agents’ perceived law
of motion (PLM) takes the same functional form as Equation (6):
yt = CRF + QRFvt + εt. (17)
The agents estimate the reduced form coefficients using past data by a constant gain
recursive least squares algorithm
Φt = Φt−1 + γS−1t zt−1(yt−1 − z′t−1Φt−1) (18)
St = St−1 + γ(zt−1z′t−1 − St−1), (19)
where zt is a vector of data, Φt is a vector of regression coefficients, St is the estimated
variance-covariance matrix, and γ is a matrix of gain parameters that govern the weight
placed on new information.
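A minimal implementation of the updates in Equations (18) and (19), assuming the common timing in which beliefs used at t incorporate data through t − 1 (timing conventions differ across papers); the usage values are illustrative only:

```python
import numpy as np

# Constant-gain recursive least squares (Equations (18)-(19)):
#   S_t   = S_{t-1} + gain * (z z' - S_{t-1})
#   Phi_t = Phi_{t-1} + gain * S_t^{-1} z (y - z' Phi_{t-1})
# Phi maps the regressors z (a constant and v_t) into the endogenous variables y.
def cgl_update(Phi, S, z, y, gain):
    S_new = S + gain * (np.outer(z, z) - S)
    forecast_error = y - Phi.T @ z
    Phi_new = Phi + gain * np.outer(np.linalg.solve(S_new, z), forecast_error)
    return Phi_new, S_new

# Usage sketch: a scalar relationship y = 2*z learned from repeated observations.
Phi, S = np.zeros((1, 1)), np.eye(1)
for _ in range(500):
    Phi, S = cgl_update(Phi, S, z=np.array([1.0]), y=np.array([2.0]), gain=0.1)
# The belief coefficient approaches 2, closing the gap by a factor (1 - gain)
# each period -- which is why the small estimated gains imply very slow learning.
```

Setting the gain to zero in this recursion reproduces the fixed-beliefs case exactly: Φt never moves from its initialization.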
Expectations under EE-CGL at time t are given by5
IEEEt yt+1 = Ĉt−1 + Q̂t−1Rvt. (20)
Substituting Equation (20) in for the beliefs in Equation (1) yields the following mapping
to the reduced form equations:
CEEt = Γ + BĈt−1 = m ( Ĉ11t−1 + (Ĉ21t−1 − π̄)(1− βθπ) ;
Ĉ11t−1ψ + Ĉ21t−1(β(1 + θx) + ψ)− π̄(θx(β − 1) + ψ(1− θπ) + β − 1) ) (21)
and
QEEt = BQ̂t−1ρ + D = (22)
m ( 1− ω + ρa(Q̂11t−1 + Q̂21t−1(1− θπβ) + ω − 1), θπ + (Q̂12t−1 + Q̂22t−1(1− θπβ))ρe ;
Q̂21t−1ρa(β(1 + θx) + ψ) + ψ(1− ω + ρa(Q̂11t−1 + ω − 1)), θx(Q̂22t−1βρe − 1) + Q̂12t−1ρeψ + Q̂22t−1ρe(β + ψ)− 1 ),
where Ĉijt−1 and Q̂ijt−1 denote the element in the ith row and jth column of Ĉt−1 and Q̂t−1,
respectively.
Comparing CRE to CEEt, it is apparent that the restrictions placed on the reduced form
under RE are loosened under EE-CGL in two ways. First, the reduced form intercepts
are no longer explicitly tied to the steady states of the model. They depend on other
structural parameters and on beliefs, which frees them to vary over time. Second,
comparing QRE to QEEt, the degree of nonlinearity of the mapping from structural pa-
rameters to the reduced form is reduced. The reduction in nonlinearity allows changes
in structural parameters to have smaller effects on the other parameters, which allows
these parameters to take on new values without having as significant an impact elsewhere
in the model. For example, consider the limit of the first element of QEEt compared to QRE
as ρa → 1. In the RE case, the coefficient goes to zero. Therefore, as the preference
shock, at, becomes more persistent, its effect on the output gap approaches zero. This of
course offsets the persistence by effectively removing the shock from the model. In the
EE case, however, no such restriction is imposed. Here, as ρa goes to one, QEE,11t goes to
m(Q̂11t−1 + Q̂21t−1(1− θπβ)), which allows a more persistent preference shock to affect the
output gap, at least in the short run, when beliefs Q̂ are not at their RE values.
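The limiting behavior just described can be checked numerically from the (1,1) elements of Equations (16) and (22). A sketch with illustrative parameter and belief values (chosen to satisfy the Taylor principle, not estimated):

```python
# Compare the (1,1) elements of Q^RE (Equation (16)) and Q^EE (Equation (22))
# as rho_a -> 1. All parameter values are illustrative.
beta, psi, theta_pi, theta_x, omega = 0.99, 0.1, 1.5, 0.1, 0.5
m = 1.0 / (1.0 + theta_x + theta_pi * psi)

def q11_re(rho_a):
    num = -(rho_a - 1.0) * (beta * rho_a - 1.0) * (omega - 1.0)
    den = (1.0 + theta_x - theta_x * beta * rho_a + beta * rho_a**2
           + theta_pi * psi - rho_a * (1.0 + beta + psi))
    return num / den

def q11_ee(rho_a, Q11_hat, Q21_hat):
    return m * (1.0 - omega
                + rho_a * (Q11_hat + Q21_hat * (1.0 - theta_pi * beta) + omega - 1.0))

# At rho_a = 1 the RE coefficient is exactly zero, while for non-RE beliefs the
# EE coefficient stays at m * (Q11_hat + Q21_hat * (1 - theta_pi * beta)).
```

The (ρa − 1) factor in the RE numerator is the cross-equation restriction at work; the EE mapping simply lacks that factor.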
5As is common in the literature, we assume agents know R.
4.1.3 Infinite horizon learning
The third expectation assumption we consider is the infinite horizon learning solution.
This solution uses the full forward-looking decision rules given by Equations (7) and (8)
and asks agents to make decisions with respect to their beliefs over all future periods.
This means agents must form forecasts of xt and πt as well as it for T = t, . . . ,∞. In
contrast, there is no need to forecast interest rates under EE-CGL. Therefore, to preserve
CGL, we assume that agents know the coefficients of the Taylor rule, which allows them
to only forecast xt and πt. With this assumption, the sequence of agents’ expectations
are computed as
IEIHt ∑∞T=t βT−tyT+1 = Ĉt−1(1− β)−1 + Q̂t−1(I− βρ)−1ρvt (23)
and
IEIHt ∑∞T=t (λ1β)T−tyT+1 = Ĉt−1(1− λ1β)−1 + Q̂t−1(I− λ1βρ)−1ρvt, (24)
where Ĉt−1 and Q̂t−1 are the coefficients of Equation (17) estimated using the constant
gain recursive least squares algorithm discussed previously. Substituting these expecta-
tions into Equations (7) and (8), the reduced form may be expressed as
yt = ΓIH + BIHĈt−1 + ( B′1,IHQ̂t−1(I− βρ)−1ρ + B′2,IHQ̂t−1(I− βλ1ρ)−1ρ + D )vt + Gεt,
where CIHt = ΓIH + BIHĈt−1 and QIHt = B′1,IHQ̂t−1(I− βρ)−1ρ + B′2,IHQ̂t−1(I− βλ1ρ)−1ρ + D, with
ΓIH = m ( π̄(θπ − 1)/(1− β)− (1− β)π̄θπ/(1− βλ1) ; (1− β)π̄(θx + 1)/(1− βλ1) + π̄(θπ − 1)ψ/(1− β) ),
BIH = m ( (1− β(θx + 1))/(1− β)− βθπλ1ψ/(1− βλ1), (1− βθπ)/(1− β)− βθπ(1− λ1)/(1− βλ1) ;
βλ1ψ(θx + 1)/(1− βλ1) + ψ(1− β(θx + 1))/(1− β), β(1− λ1)(θx + 1)/(1− βλ1) + ψ(1− βθπ)/(1− β) ),
B′1,IH = m ( 1− β(1 + θx), 1− βθπ ; ψ(1− β(θx + 1)), ψ(1− βθπ) ), and
B′2,IH = m ( −βθπλ1ψ, −βθπ(1− λ1) ; βλ1ψ(θx + 1), β(1− λ1)(θx + 1) ).
The IH-CGL case dramatically alters the mapping from structural parameters to the
reduced form, while still nesting the RE solutions. The changes here generate an entirely
different type of loosening of restrictions compared to the EE case. The IH-CGL case
turns out to permit a range of values for CIHt and QIHt that are not obtainable in the
other cases for any reasonable parameterization of the model. We illustrate this in detail
in a later section.
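The closed-form discounted sums in Equation (23) follow from the PLM and the geometric series for (I − βρ)⁻¹, which can be verified by brute-force truncation. A sketch with illustrative belief and parameter values:

```python
import numpy as np

# Under the PLM, E_t y_{T+1} = C_hat + Q_hat rho^{T+1-t} v_t, so
#   sum_{T=t}^inf beta^{T-t} E_t y_{T+1}
#     = C_hat / (1 - beta) + Q_hat (I - beta*rho)^{-1} rho v_t   (Equation (23)).
beta = 0.99
C_hat = np.array([0.1, 0.5])
Q_hat = np.array([[0.8, 0.1],
                  [0.2, 0.9]])
rho = np.diag([0.9, 0.7])
v = np.array([1.0, -0.5])

closed = C_hat / (1 - beta) + Q_hat @ np.linalg.solve(np.eye(2) - beta * rho, rho @ v)

# Brute-force truncation of the infinite sum:
brute = np.zeros(2)
rho_pow = rho.copy()
discount = 1.0
for _ in range(5000):
    brute += discount * (C_hat + Q_hat @ (rho_pow @ v))
    rho_pow = rho_pow @ rho
    discount *= beta
```

Replacing β with λ1β gives Equation (24) by the same argument.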
4.1.4 The fixed belief cases
Finally, the fourth and fifth expectation assumptions we consider are the fixed beliefs
counterparts to the EE-CGL and IH-CGL specifications. In particular, we fix the values
for Ĉ and Q̂ and do not consider any learning. Throughout the paper, we assume that
Ĉ = C^{RE}, F̂ = F^{RE}, and Q̂ = Q^{RE} such that the FB cases are defined by the explicit
mapping G : Ξ × Θ → {C^{RF}, F^{RF}, Q^{RF}} from Section 2, where Ξ is the vector of
structural parameters to be estimated and Θ is the vector of structural parameters that
gives rise to the fixed forecasting function.
5 Taking the model to the data
5.1 The state space
The five expectation assumptions share a common reduced form and a common state
space representation. The common state space representation is given by
y^{obs}_t = H X_t

X_t = J_{t−1} + F_{t−1} X_{t−1} + S_{t−1} ε_t,

where J_{t−1} = Ω^{−1}_{t−1} μ_{t−1}, F_{t−1} = Ω^{−1}_{t−1} Ψ, S_{t−1} = Ω^{−1}_{t−1} ζ,

μ_{t−1} = [ C^{RF}_t ; r̄ + (1 − θπ)π̄ ; ḡ ; 0_{3×1} ],

Ω_{t−1} = [ I_{2×2}   0_{2×3}   −Q^{RF}_t ;
            −θx  −θπ  1   0   0   0   0 ;
             0    0   0   1  −1   0   0 ;
            −1    0   0   0   1  −ω   0 ;
            0_{2×5}   I_{2×2} ],

Ψ = [ 0_{3×7} ;
      0  0  0  0  −1  0   0 ;
      0  0  0  0   0  0   0 ;
      0  0  0  0   0  ρa  0 ;
      0  0  0  0   0  0   ρe ],

ζ = [ 0_{1×2}  −1  0_{1×4} ;
      0_{1×7} ;
      0_{1×2}   1  0_{1×4} ;
      0_{1×3}   1  0_{1×3} ;
      0_{1×7} ;
      0_{2×5}   I_{2×2} ],
where y^{obs}_t = (xt, πt, it, gt)′ are the observable variables and Xt = (xt, πt, it, gt, yt, at, et)′.
In the two adaptive learning cases, C^{RF}_t and Q^{RF}_t evolve according to Equations (18)
and (19). Otherwise, C^{RF}_t = C^{RF} and Q^{RF}_t = Q^{RF} are fixed values.
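For concreteness, one step of the constant gain recursive least squares updating referenced here (Equations (18)–(19)) can be sketched as follows; the gain, data point, and initial beliefs are illustrative placeholders rather than the paper's values:

```python
import numpy as np

# One step of constant gain recursive least squares (CGL); all numbers
# below are illustrative, not the paper's estimates.
gain = 0.02
phi = np.array([1.0, 0.5, -0.3])   # regressors: a constant and the shocks v_t
y_t = np.array([0.4, 1.1])         # realized endogenous variables (x_t, pi_t)
beliefs = np.zeros((2, 3))         # each row stacks [C_hat, Q_hat] for one equation
R = np.eye(3)                      # estimate of the regressors' second-moment matrix

# Update the moment matrix, then move beliefs toward reducing the forecast error
R = R + gain * (np.outer(phi, phi) - R)
forecast_error = y_t - beliefs @ phi
beliefs = beliefs + gain * np.outer(forecast_error, np.linalg.solve(R, phi))
```

Iterating this step through the sample produces the time-varying C^{RF}_t and Q^{RF}_t that enter the state space above.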
5.2 Estimation results
In this section, we estimate the five versions of the model using maximum likelihood
and compare the results.6 We use US data from 1984q1 through 2008q3. Our observables
are the CBO measure of the output gap, the GDP deflator measure of inflation, the
three-month Treasury bill rate, and the growth rate of real GDP.

To avoid known issues with weak identification that arise in these models, we calibrate
some of the key parameters. We set β = 0.995, π̄ = 2%, and ψ = 0.1. The slope of the
Phillips curve (ψ) is set following Ireland (2004), which in the Calvo pricing framework
corresponds to the average firm adjusting its price roughly once a year. We also calibrate
the mean growth rate ḡ to the average growth rate observed over the estimation period.
The remaining parameters {θπ, θx, ω, ρa, ρe, σa, σe, σi, σg} are estimated, along with the
gain parameters {γx, γπ} in the CGL cases.
Table 2 shows the estimation results for the five different model specifications. Com-
paring the results for RE to EE-CGL and IH-CGL, the three stylized facts noted in
Section 2 are all present. In particular, the inference on the structural parameters and
6To construct confidence intervals for the structural parameter estimates, we use a method similar
to the one proposed by Stock and Watson (1998) for time-varying parameter models. We employ this
method because the constrained optimization routine we use to maximize the likelihood function provides
unreliable numerical estimates of the Hessian matrix. The confidence intervals are constructed for each
parameter by selecting a grid surrounding the ML estimate. We then re-maximize the log-likelihood
function by searching over all other parameters, while holding the parameter of interest fixed at one
point on the grid. The maximized log-likelihood value obtained at that point is then compared to the
original maximum log-likelihood value using a likelihood ratio test. The set of grid points that fail to
reject the null hypothesis that the true parameter vector lies in the restricted parameter space at the
5% level constitutes the 95% confidence interval.
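A sketch of this grid-based interval construction, using a simple Gaussian likelihood as a stand-in for the model's likelihood (all names and values here are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

# Toy version of the footnote-6 procedure: profile the likelihood over a
# grid for one parameter and keep the points an LR test fails to reject.
rng = np.random.default_rng(1)
data = rng.normal(loc=0.5, scale=1.2, size=200)

def negloglik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + len(data) * log_sigma

mle = minimize(negloglik, x0=[0.0, 0.0])
crit = chi2.ppf(0.95, df=1) / 2.0      # LR cutoff in log-likelihood units

# Profile over a grid for mu, re-maximizing over the remaining parameter
grid = np.linspace(mle.x[0] - 0.5, mle.x[0] + 0.5, 41)
inside = []
for mu in grid:
    prof = minimize(lambda s: negloglik([mu, s[0]]), x0=[mle.x[1]])
    if prof.fun - mle.fun <= crit:     # fails to reject at the 5% level
        inside.append(mu)

print(round(min(inside), 3), round(max(inside), 3))  # interval endpoints
```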
exogenous shock processes is roughly the same across the three cases. The CGL cases,
however, yield significant improvements in log-likelihood relative to RE, and the estimates
for the gain parameters are small. In fact, in our case the gains are very small, which is
partly due to the fact that our estimation does not make use of priors.
Turning to the EE-FB and IH-FB cases, we note that the results are nearly indistinguish-
able from their learning counterparts. In particular, we cannot reject the null hypothesis
that the EE-FB model fits the data as well as the EE-CGL model using a likelihood
ratio test, and we only reject for the IH case at the 10% significance level. Therefore, the
loosening of restrictions described in Section 3 appears to account for a majority of the
improvement in fit.
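The nested FB-versus-CGL comparison can be reproduced directly from the log-likelihoods in Table 2. The halving of the χ² tail probability below reflects the gain parameters lying on the boundary of the parameter space under the null; that adjustment is our reading of the table's p-values, not something stated in the text:

```python
from scipy.stats import chi2

# Likelihood ratio test of EE-FB (restricted) against EE-CGL (unrestricted),
# which adds the two gain parameters gamma_x and gamma_pi.
ll_fb, ll_cgl = 1982.60, 1983.72        # log-likelihoods from Table 2

stat = 2.0 * (ll_cgl - ll_fb)
p = 0.5 * chi2.sf(stat, df=2)           # boundary-adjusted p-value (assumed)
print(round(stat, 2), round(p, 3))      # 2.24 0.163, as reported in Table 2
```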
To illustrate the point, Figure 1 shows the estimated impulse responses for a monetary
policy shock, a preference shock, and a cost push shock for the five different models. The
monetary policy shock provides the clearest picture of the role that learning plays in
adding persistence. There is no mechanism to propagate this shock in the model except
adaptive learning. Therefore, the RE and FB cases respond only in the period the shock
is realized and then immediately return to steady state. In the two adaptive learning
cases, the shock exhibits some persistence through beliefs, leading to small but persistent
responses for the endogenous variables. However, the responses to the
remaining shocks are for the most part just scaled versions of the RE response. Learning
does not add any substantive additional dynamics in these cases.
Turning to the reduced form of the five models, Table 3 reports the elements of
Q^{RF} for each case at the estimated values reported in Table 2 (for the learning cases
we show the value implied by the initial beliefs). Because of our decision to calibrate
π̄, the implied values of C^{RF} are roughly equivalent across the five cases and are not
shown here. The EE cases exhibit a modest and consistent departure from the RE case,
while the IH cases are drastically different. The table illustrates that relatively modest
changes in the structural parameters can imply fairly significant changes in the reduced
form.
5.3 Out-of-sample fit
Finally, we compare the models in a real-time out-of-sample forecasting exercise to deter-
mine whether the improvement in fit translates into out-of-sample forecasting power. We
use the real-time dataset provided by the Federal Reserve Bank of Philadelphia for infla-
tion, GDP growth, and interest rates. For a real-time measure of the output gap, we use
the Fed's Greenbook nowcast of the output gap. We evaluate the forecasts at four different
forecast horizons: the nowcast, one-quarter, four-quarter, and six-quarter-ahead forecasts
Table 2: ML estimates

            RE              EE-FB           EE-CGL              IH-FB           IH-CGL
θπ          1.577           1.123           1.125               1.578           1.653
            [1.31, 1.97]    [0.99*, 1.61]   [0.99*, 1.73]       [1.36, 1.97]    [1.45, 1.73]
θx          0.166           0.208           0.211               0.160           0.157
            [0.08, 0.29]    [0.11, 0.36]    [0.21, 0.30]        [0.08, 0.26]    [0.12, 0.19]
ω           0.000           0.000           0.000               0.066           0.000
            [0.00*, 0.02]   [0.00*, 0.01]   [0.00*, 0.00]       [0.00*, 0.48]   [0.00*, 0.00]
ρa          0.933           0.926           0.925               0.973           0.991
            [0.87, 1.00*]   [0.86, 0.99]    [0.90, 0.98]        [0.95, 0.99]    [0.97, 0.99]
ρe          0.967           0.917           0.899               0.684           0.525
            [0.87, 1.00*]   [0.80, 1.00]    [0.89, 0.95]        [0.50, 0.86]    [0.31, 0.76]
σa × 100    1.671           1.633           1.613               0.555           0.146
            [0.82, 6.77]    [1.36, 2.04]    [1.47, 1.79]        [0.19, 1.38]    [0.01, 1.46]
σe × 100    0.071           0.072           0.073               0.166           0.229
            [0.06, 0.10]    [0.06, 0.09]    [0.07, 0.08]        [0.11, 0.22]    [0.23, 0.23]
σi × 100    0.558           0.558           0.561               0.560           0.557
            [0.46, 0.70]    [0.46, 0.73]    [0.54, 0.58]        [0.47, 0.70]    [0.52, 0.61]
σg × 100    0.179           0.180           0.180               0.175           0.179
            [0.16, 0.21]    [0.16, 0.22]    [0.17, 0.22]        [0.15, 0.21]    [0.16, 0.19]
γx          -               -               0.00                -               0.00
                                            [1.9E-06, 2.5E-06]                  [2.4E-08, 3.0E-08]
γπ          -               -               0.00                -               0.001
                                            [1.7E-09, 1.8E-09]                  [0.00, 0.02]

Log Likelihood           1974.49    1982.60    1983.72    1986.84    1988.97
LR Statistic (rel. RE)              16.22      18.46      24.42      28.96
p-value                             (0.005)    (0.006)    (0.000)    (0.000)
LR Statistic (rel. FB)                         2.24                  4.54
p-value                                        (0.163)               (0.052)

Notes: ML estimates for the Ireland model under the five different assumptions for expectations. 95% confidence
intervals for the estimates are shown in brackets below the point estimates. The confidence intervals are truncated
when they hit a theoretically imposed boundary, which is indicated by *. The forecasting function in the FB cases
and the initial beliefs in the CGL cases are set to the RE solution implied by the estimates in the first column.
Table 3: Reduced form estimates of Q̃^{RF}

            Q̃11       Q̃21      Q̃12      Q̃22
RE          0.060     0.083    8.899    -2.901
FB-EE       0.091     0.085    7.272    -2.921
CGL-EE      0.091     0.085    7.126    -2.881
FB-IH      -0.858     0.023    2.367    -1.355
CGL-IH     -3.251     0.081    1.840    -0.937

Notes: Reduced form values implied by the estimated parameters given in Table 2.
Figure 1: Estimated Impulse Responses

[Figure: twelve panels showing the responses of the output gap, inflation, the interest rate, and GDP growth to a cost push shock, a preference shock, and an interest rate shock under the five specifications (RE, EE-FB, EE-CGL, IH-FB, IH-CGL).]

Notes: Impulse responses to one standard deviation shocks implied by the estimated parameters given in Table 2.
for inflation and GDP growth. We use the second releases of data as the comparison
series to measure forecast accuracy. Inference on improvements in forecast accuracy is
obtained using the Diebold and Mariano (1995) test statistic (DM) with the Harvey et
al. (1997) small-sample and forecast-horizon correction. This test statistic is found to
work well in tests of real-time forecasts by Clark and McCracken (2009) and Clark and
McCracken (2011).
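The mechanics of the DM comparison with the Harvey et al. correction can be sketched as follows; the forecast-error series are simulated stand-ins for the real-time forecasts compared in Table 4, so only the procedure, not the numbers, is meaningful:

```python
import numpy as np
from scipy.stats import t as t_dist

def dm_test(e1, e2, h):
    """DM statistic for equal squared-error loss of two h-step forecasts,
    with the Harvey, Leybourne, and Newbold (1997) correction."""
    d = e1**2 - e2**2                   # loss differential
    n = d.size
    dbar = d.mean()
    # long-run variance of dbar from the first h-1 autocovariances
    gamma = [np.sum((d[k:] - dbar) * (d[:n - k] - dbar)) / n for k in range(h)]
    var_dbar = max((gamma[0] + 2.0 * sum(gamma[1:])) / n, 1e-12)  # numerical floor
    dm = dbar / np.sqrt(var_dbar)
    # small-sample and horizon correction, compared to a t(n-1) distribution
    correction = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    dm_adj = correction * dm
    p = 2.0 * t_dist.sf(abs(dm_adj), df=n - 1)
    return dm_adj, p

rng = np.random.default_rng(0)
e_re = rng.normal(scale=1.1, size=200)  # hypothetical RE forecast errors
e_fb = rng.normal(scale=1.0, size=200)  # hypothetical FB forecast errors
stat, p = dm_test(e_re, e_fb, h=4)
```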
Table 4 shows the out-of-sample results for the five cases plus results from a bivariate
VAR(2) using inflation and GDP growth and a random walk forecast. The top row,
labeled RE, gives the root mean squared forecast error (RMSFE) of the RE forecasts for
inflation and GDP growth at the four different horizons. The remaining rows give the
relative RMSFE of the remaining forecasts compared to RE, with the DM test statistic
in parentheses below. The results here are mixed. The FB and CGL cases generate
consistent improvements over the RE model for inflation, with the largest gains observed
at short horizons. Compared to the VAR and the random walk, CGL and FB do best at
long horizons. For GDP growth, both FB and CGL do about the same as the RE model
at all horizons and improve relative to the VAR and random walk at the long horizons.
The results illustrate that the in-sample improvements in fit translate into out-of-
sample forecasting power and are not just evidence of overfitting. Furthermore, most of
the gains observed in this case are explained by the FB cases. The additional persistence
added by learning does not add much to out-of-sample forecasting power.
6 Understanding how RE restrictions affect fit
To illustrate how small changes in the reduced form can lead to large changes in model
fit, we turn to a calibrated version of the model. We consider the possible EE-FB and
IH-FB reduced form values that are implied by a range of the structural parameters
(θπ, θx, ρa, and ρe), while holding all other parameters fixed. We then use the range
of reduced form values generated by the two cases to solve for the θπ, θx, ρa, and ρe that
are necessary to generate the same reduced form values under RE. In particular, we
set Θ = (θπ, θx, ρa, ρe)′ and calculate G(Θ, Θ̄) for 1.4 < θπ < 1.6, 0.1 < θx < 0.3,
0.75 < ρa < 0.95, and 0.65 < ρe < 0.85, where Θ̄ = (1.5, 0.2, 0.85, 0.75)′ is fixed. We then
calculate ΘRE = F^{−1}(G(Θ, Θ̄)) and compare ΘRE to Θ.
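Numerically, the inversion ΘRE = F^{−1}(G(Θ, Θ̄)) can be carried out with a root-finder. The mapping f below is a simple invertible stand-in for the model's RE reduced-form mapping, which we do not reproduce here; only the inversion step is illustrated:

```python
import numpy as np
from scipy.optimize import least_squares

# Stand-in for the RE reduced-form mapping F (purely illustrative)
def f(theta):
    theta_pi, theta_x, rho_a, rho_e = theta
    return np.array([theta_pi * (1 - rho_a),       # placeholder reduced-form
                     theta_x + 0.5 * rho_e,        # coefficients
                     rho_a * theta_pi - theta_x,
                     rho_e / (1 + theta_x)])

target = f(np.array([1.5, 0.2, 0.85, 0.75]))       # reduced form at Theta_bar

# Invert: search for the structural vector that reproduces the target values
sol = least_squares(lambda th: f(th) - target,
                    x0=np.array([1.4, 0.1, 0.8, 0.7]),
                    bounds=([1.0, 0.0, 0.0, 0.0], [3.0, 1.0, 0.999, 0.999]))
print(np.round(sol.x, 3))   # should recover Theta_bar = (1.5, 0.2, 0.85, 0.75)
```

In the paper's exercise the same inversion is repeated over the whole grid of reduced-form values generated by the EE-FB and IH-FB cases.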
Figure 2 shows the resulting values of Q^{EE} and Q^{IH}, where the black dot represents
the value Q^{RE} implied by Θ̄. The remaining two columns show the original grid for
Θ in red and the implied RE values of ΘRE in blue. The EE case shows the effect that
a reduction in nonlinearity can potentially have on model fit. RE can cover the same
Table 4: Real-time forecast results

Inflation (Annualized RMSFE)
            t (Nowcast)   t+1        t+4        t+6
RE          1.11          1.06       0.93       0.99

Relative to RE
RW          1.02          0.85**     0.80***    0.86**
            (2.21)        (-1.99)    (-2.58)    (-2.04)
Bi-VAR      0.84**        0.78***    1.36       1.18
            (-2.17)       (-2.98)    (3.67)     (1.81)
FB-EE       0.99**        0.99       0.98       0.99
            (-1.74)       (-0.61)    (-0.91)    (-0.46)
FB-IH       0.92**        0.88**     0.85**     0.86**
            (-2.01)       (-1.94)    (-2.12)    (-2.04)
CGL-EE      0.99          0.98       0.99       1.00
            (-0.98)       (-1.01)    (-0.53)    (-0.03)
CGL-IH      0.95*         0.88***    0.85**     0.84***
            (-1.39)       (0.88)     (-2.12)    (-2.76)

GDP growth (Annualized RMSFE)
            t (Nowcast)   t+1        t+4        t+6
RE          2.28          2.04       2.06       1.68

Relative to RE
RW          1.00          1.02       1.12       1.20
            (-0.01)       (0.86)     (1.64)     (4.29)
Bi-VAR      0.95          0.94       1.15       1.09
            (-0.96)       (-0.83)    (2.82)     (1.11)
FB-EE       1.11          1.03       0.98       0.98
            (1.01)        (2.46)     (-0.98)    (-0.46)
FB-IH       1.09          1.02       1.00       1.01
            (0.89)        (1.87)     (0.01)     (0.95)
CGL-EE      1.10          1.03       0.98       0.98
            (0.95)        (2.39)     (-0.74)    (0.07)
CGL-IH      1.09          1.02       0.99       0.99
            (0.85)        (1.97)     (-0.72)    (0.14)

*** p < 0.01, ** p < 0.05, * p < 0.1
Notes: TBD
reduced form values using a smaller range of parameters, but doing so constrains the
possible values that the AR coefficients of the shocks can take on. Under EE-FB, more
persistent shocks are permitted, while maintaining very similar values for the monetary
policy rule and identical values for the remaining parameters. This allows the EE-FB
case to generate more persistence by simply increasing the persistence of the exogenous
shocks without significantly affecting other parts of the reduced form.
In the IH-FB case, the mechanism on display is more extreme than in the EE case.
The IH-FB case spans a much larger range of the reduced form space for the same grid
of points. The space is so large that the RE solution is incapable of covering the same
area with parameter values that satisfy the Taylor principle or stationarity of the shock
processes. Of course, it could be the case that these ranges of the parameter space
are not empirically relevant. But this is not what we find. In fact, if no constraints are
placed on the reduced form when it is estimated, the estimation returns values that only
the IH case can generate for plausible structural parameter values.
The key takeaway from this exercise is that it is not necessary for the structural
parameters to change in any substantial way to fundamentally alter the way the model
fits the data. In fact, it is precisely because the changes to the structural parameters are
small that it is easy to attribute changes in fit to the persistence that the adaptive
learning strategy can generate.
Figure 2: The mapping from structural parameters to the reduced form

[Figure: two rows of panels, EE-CGL and IH-CGL, showing the implied reduced form values Q^{EE} and Q^{IH} (black dot: the Q^{RE} implied by Θ̄) together with the grid for Θ in red and the implied RE values ΘRE in blue.]

Notes: TBD
7 Robustness
To illustrate that the improvements observed in the FB case are robust, we replicate the
result in the model of Smets and Wouters (2007). We use the replication files provided by
the American Economic Review for the paper. Smets and Wouters estimate the model
using Bayesian techniques in Dynare. We replicate the benchmark case found in Table 1
and Table 1B of their paper.7 For the FB case, we use the RE solution of the model at the
posterior modes as the fixed belief function. We then estimate the structural parameters
around the fixed forecasting function by Bayesian methods in Dynare.
Table 5 provides selected parameter estimates and the marginal log-likelihood of the
two models. The parameter estimates shown in the table are those that can be mapped
into the main parameters of the Ireland model. The parameter labels are chosen to reflect
these connections. The full set of parameter estimates are given in the appendix.
Table 5 demonstrates that the key results obtained in the Ireland model hold in this
model. In particular, the parameter estimates are relatively unchanged and marginal
likelihood is higher in the FB case. In Slobodyan and Wouters (2012b), the average
improvement in marginal likelihood was 6.3, with a maximum of 18, across 5 different
CGL specifications. Therefore, the improvement observed here is consistent with the
improvement observed in the Ireland model studied in the last section.
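In log marginal likelihood terms, the improvement in Table 5 translates into a posterior odds ratio under equal prior model probabilities:

```python
import math

# Log Bayes factor of FB over RE from the marginal likelihoods in Table 5
log_ml_re, log_ml_fb = -924.8, -920.3
log_bf = log_ml_fb - log_ml_re
print(round(log_bf, 1), round(math.exp(log_bf)))  # 4.5 90
```

That is, posterior odds of roughly 90 to 1 in favor of the FB specification when the two models receive equal prior probability.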
8 Conclusion
This paper demonstrates that improvements in in-sample fit of New Keynesian DSGE
models that assume some type of bounded rationality relative to their rational coun-
terparts are in large part explained by the loosening of the restrictions imposed on the
structural parameters by rational expectations. This finding is in contrast to the prevail-
ing explanation given in the literature that the improvements are primarily due to an
increased ability to generate persistence in key endogenous variables. While increasing a
model's ability to generate persistence remains important, it accounts for a much smaller
portion of the overall improvements observed. Furthermore, we demonstrate that these
increases translate into modest gains in real-time out-of-sample forecast accuracy, and
that the loosened restrictions again explain a substantial portion of the improvements
observed for the same model under adaptive learning with a constant gain.
7Our results differ slightly from the results reported in Smets and Wouters (2007) for the RE case.
However, since we use their official replication files, we have no reason to believe that the observed
differences are anything more than numerical imprecision arising from running the estimation on different
versions of Matlab.
Table 5: FB in Smets and Wouters 2007

          Prior Distribution                Posterior Distribution
          Distr.   Mean    St. Dev         Mean    5% HPD   95% HPD
θπ        norm     1.5     0.25      RE    2.054   1.765    2.354
                                     FB    1.740   1.406    2.071
θx        norm     0.125   0.05      RE    0.217   0.173    0.262
                                     FB    0.190   0.145    0.234
π̄ × 4     gamma    2.5     0.4       RE    2.616   2.140    3.080
                                     FB    2.524   1.988    3.060
ρa        beta     0.5     0.2       RE    0.204   0.070    0.337
                                     FB    0.205   0.036    0.358
ρe        beta     0.5     0.2       RE    0.895   0.817    0.977
                                     FB    0.897   0.829    0.968
ρz        beta     0.5     0.2       RE    0.955   0.936    0.974
                                     FB    0.956   0.927    0.986
σa        invg     0.1     2         RE    0.242   0.205    0.282
                                     FB    0.259   0.232    0.286
σe        invg     0.1     2         RE    0.246   0.220    0.270
                                     FB    0.235   0.212    0.258
σz        invg     0.1     2         RE    0.432   0.386    0.474
                                     FB    0.443   0.397    0.488
σi        invg     0.1     2         RE    0.138   0.108    0.166
                                     FB    0.123   0.106    0.138

RE Marginal Likelihood: -924.8       Slobodyan and Wouters (JEDC 2012): RE -922.75
FB Marginal Likelihood: -920.3                                          CGL -922.65 / -916.4

Notes: TBD
The results suggest that the overall structure of the New Keynesian model is not
necessarily at odds with the data and that RE may actually be a key source of model mis-
specification. Bounded rationality strategies that separate the estimation of beliefs from
the structural parameters of the model may increase model fit and out-of-sample fore-
casting power by loosening the cross equation restrictions imposed on the structural
parameters by RE.
References
Adolfson, Malin, Jesper Lindé, and Mattias Villani, “Forecasting Performance of
an Open Economy DSGE Model,” Econometric Reviews, April 2007, 26 (2-4), 289–328.
Amato, Jeffery D and Thomas Laubach, “Rule-of-thumb behaviour and monetary
policy,” European Economic Review, 2003, 47 (5), 791–831.
Ball, Laurence, N Gregory Mankiw, and Ricardo Reis, “Monetary policy for
inattentive economies,” Journal of Monetary Economics, 2005, 52 (4), 703–725.
Berardi, Michele, Jaqueson K Galimberti et al., “On the initialization of adaptive
learning algorithms: A review of methods and a new smoothing-based routine,” Centre
for Growth and Business Cycle Research Discussion Paper Series, 2012, 175.
Clark, Todd E and Michael W McCracken, “Tests of equal predictive ability with
real-time data,” Journal of Business & Economic Statistics, 2009, 27 (4), 441–454.
and , “Advances in forecast evaluation,” working paper, 2011.
Cogley, Timothy and Argia M Sbordone, “Trend inflation, indexation, and inflation
persistence in the New Keynesian Phillips curve,” The American Economic Review,
2008, 98 (5), 2101–2126.
Cole, Stephen and Fabio Milani, “The Misspecification of Expectations in New Key-
nesian Models: A DSGE-VAR Approach,” Technical Report, 2014.
Croushore, Dean and Tom Stark, “A real-time data set for macroeconomists: Does
the data vintage matter?,” Review of Economics and Statistics, 2003, 85 (3), 605–617.
Diebold, Francis X and Robert S Mariano, “Comparing predictive accuracy,” Jour-
nal of Business & Economic Statistics, 1995, 13 (3).
, Frank Schorfheide, and Minchul Shin, “Real-Time Forecast Evaluation of
DSGE Models with Stochastic Volatility,” May 2015, pp. 1–24.
Evans, George W and Seppo Honkapohja, Learning and expectations in macroeco-
nomics, Princeton: Princeton University Press, January 2001.
Fuhrer, Jeffrey C, “Habit formation in consumption and its implications for monetary-
policy models,” American Economic Review, 2000, pp. 367–390.
Galí, Jordi and Mark Gertler, “Inflation dynamics: A structural econometric anal-
ysis,” Journal of Monetary Economics, 1999, 44 (2), 195–222.
Gaus, Eric and Srikanth Ramamurthy, “Estimation of Constant Gain Learning
Models,” mimeo, August 2012.
Harvey, David, Stephen Leybourne, and Paul Newbold, “Testing the equality
of prediction mean squared errors,” International Journal of Forecasting, 1997, 13,
281–291.
Hommes, Cars, Behavioral rationality and heterogeneous expectations in complex eco-
nomic systems, Cambridge University Press, 2013.
Ireland, Peter N, “Technology Shocks in the New Keynesian Model,” The Review of
Economics and Statistics, November 2004, 86 (4), 923–936.
Kolasa, Marcin, Michal Rubaszek, and Pawel Skrzypczynski, “Putting the New
Keynesian DSGE model to the real-time forecasting test,” Journal of Money, Credit
and Banking, October 2012, 44 (7), 1301–1324.
Mankiw, N Gregory and Ricardo Reis, “Sticky Information versus Sticky Prices:
A Proposal to Replace the New Keynesian Phillips Curve,” Quarterly Journal of Eco-
nomics, 2002, pp. 1295–1328.
Milani, Fabio, “A Bayesian DSGE Model with Infinite-Horizon Learning: Do “Mechan-
ical” Sources of Persistence Become Superfluous?,” International Journal of Central
Banking, 2006.
, “Expectations, learning and macroeconomic persistence,” Journal of Monetary Eco-
nomics, 2007, 54 (7), 2065–2082.
Negro, Marco Del, Frank Schorfheide, Frank Smets, and Rafael Wouters, “On
the Fit of New Keynesian Models,” Journal of Business and Economic Statistics, April
2007, 25 (2), 123–143.
Preston, Bruce, “Learning about Monetary Policy Rules when Long-Horizon Expec-
tations Matter,” International Journal of Central Banking, 2005.
Rotemberg, Julio J, “Monopolistic price adjustment and aggregate output,” The Re-
view of Economic Studies, 1982, 49 (4), 517–531.
Slobodyan, Sergey and Raf Wouters, “Learning in a Medium-Scale DSGE Model
with Expectations Based on Small Forecasting Models,” American Economic Journal:
Macroeconomics, April 2012, 4 (2), 65–101.
and , “Learning in an estimated medium-scale DSGE model,” Journal of Economic
Dynamics and control, 2012, 36 (1), 26–46.
Smets, Frank and Rafael Wouters, “Shocks and Frictions in US Business Cycles: A
Bayesian DSGE Approach,” The American Economic Review, 2007, pp. 586–606.
, Anders Warne, and Rafael Wouters, “Professional forecasters and real-time
forecasting with a DSGE model,” International Journal of Forecasting, October 2014,
30 (4), 981–995.
Stock, James H and Mark W Watson, “Median unbiased estimation of coefficient
variance in a time-varying parameter model,” Journal of the American Statistical As-
sociation, 1998, 93 (441), 349–358.
Appendix A
In what follows, we describe the model and derive household and firm decision rules
following Preston (2005) that depend on expectations but not on any explicit assumption
for how expectations are formed. This gives us a general setting from which the
consequences of different expectation assumptions on the reduced form may be tracked
systematically.
Households seek to maximize the following expected utility function
ĨE_{i,t} ∑_{t=0}^∞ β^t [ a_t ln(C_{i,t}) + ln(M_{i,t}/P_t) − η^{−1} h^η_{i,t} ]    (A1)
by choosing consumption, money holdings, labor supply, and by taking into account a
preference shock at, where ĨEt represents a general expectations operator that is yet to
be defined.8 The representative household is faced with the budget constraint
M_{i,t−1} + B_{i,t−1} + T_t + W_t h_{i,t} + ∆_{i,t} ≥ P_t C_{i,t} + B_{i,t}/r_t + M_{i,t},    (A2)
where M is nominal money balances, B is nominal bond holdings, T is transfers, W is the
nominal wage, and ∆ is the nominal profit the household receives from ownership of firms.
Production in the economy is separated into two sectors: a perfectly competitive
finished goods sector and a monopolistically competitive intermediate goods sector. The
finished goods sector uses a continuum of intermediate goods with prices P_j to construct
the finished good. The production function is a CES constant returns to scale technology
( ∫_0^1 Y_{j,t}^{(θ_t−1)/θ_t} dj )^{θ_t/(θ_t−1)} ≥ Y_t,    (A3)
where θ_t is a cost push shock.9 Profit maximization by the finished-good-producing
firms yields the demand for each intermediate good

Y_{j,t} = (P_{j,t}/P_t)^{−θ_t} Y_t.    (A4)
The finished goods price is given by

P_t = ( ∫_0^1 P_{j,t}^{1−θ_t} dj )^{1/(1−θ_t)}    (A5)
for all t. The intermediate-goods-producing firms hire h_{j,t} units of labor to manufacture
Y_{j,t} units of output using

Z_t h_{j,t} ≥ Y_{j,t},    (A6)
where Z_t is an aggregate technology shock.10 To introduce price stickiness, it is assumed
that firms face an explicit cost of adjusting nominal prices following Rotemberg (1982),
measured in terms of finished goods:

(φ/2) ( P_{j,t}/(π̄ P_{j,t−1}) − 1 )² Y_t.    (A7)
8It is assumed that 0 < β < 1, η ≥ 1, and ln(a_t) = ρ_a ln(a_{t−1}) + ε_{a,t}.
9ln(θ_t) = (1 − ρ_θ) ln(θ) + ρ_θ ln(θ_{t−1}) + ε_{θ,t}.
10ln(Z_t) = ln(z) + ln(Z_{t−1}) + ε_{z,t}.
Lastly, the output gap is defined as the ratio between the actual and efficient levels of
output

x_t = (1/a_t)^{1/η} (Y_t/Z_t).    (A8)
Households’ decision rule
We begin by deriving the household’s optimal decision rule following Preston (2005). The
conditions of the household’s optimal decision are given by
atC−1i,t = βrtĨEi,tat+1C
−1i,t+1π
−t+1 (A9)
hη−1i,t = atC−1i,t Wt (A10)
Bi,t−1 +Wthi,t + ∆i,t = PtCi,t +Btr−1t , (A11)
where we have eliminate the variables dealing with money and transfers in the budget
constraint. Starting with budget constraint, we put it into real terms by dividing by the
price level11 and make it stationary using substitution to account for the unit root TFP
process Zt12
bi,t−1π−1t + ωtZthi,t + di,tZt = ci,tZt + bi,tr
−1t .
Then, rearranging and summing, we obtain the lifetime budget constraint

∑_{T=t}^∞ β^{T−t} c_{i,T} Z_t = ∑_{T=t}^∞ β^{T−t} (w_T Z_T h_{i,T} + d_{i,T} Z_t),

which allows us to divide out Z_t. Then, noting that w_T h_{i,T} + d_{i,T} = y_{i,T}, we have

∑_{T=t}^∞ β^{T−t} ĉ_{i,T} = ∑_{T=t}^∞ β^{T−t} ŷ_{i,T}.    (A12)
We then log-linearize the stationary Euler equation

ĉ_{i,t} = ĨE_{i,t} ĉ_{i,t+1} − (i_t − ĨE_{i,t} π̂_{t+1}) + (ρ_a − 1) â_t

and solve it backwards recursively to get

ĨE_{i,t} ĉ_{i,T+1} = ĉ_{i,t} + ĨE_{i,t} ∑_{s=t}^{T} ((i_s − π̂_{s+1}) − (ρ_a − 1) â_s).

11We assume w_t = W_t/P_t, D_t = ∆_t/P_t, b_t = B_t/P_t.
12We assume ω_t = W_t/(P_t Z_t), d_t = ∆_t/(P_t Z_t).
Then, summing and discounting this expectation, we get

ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} ĉ_{i,T+1} = (1/(1 − β)) ĉ_{i,t} − (1/(1 − β)) ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} ((i_T − π̂_{T+1}) − (ρ_a − 1) â_T).    (A13)
Combining Equations (A12) and (A13) yields the household's decision rule for consumption

ĉ_{i,t} = ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} [ (1 − β) ŷ_{i,T+1} − (i_T − π̂_{T+1}) − (ρ_a − 1) â_T ].
Aggregating across households and using the log-linearized output gap, we obtain the
aggregate IS curve

x_t = −ω â_t + ĨE_t ∑_{T=t}^∞ β^{T−t} [ (1 − β)(x_{T+1} + ω ρ_a â_T) − (i_T − π̂_{T+1}) − (ρ_a − 1) â_T ],    (A14)

which is absent any assumption about how expectations are formed.
Firms’ decision rule
Firms seek to maximize the present value of the firm

ĨE_{j,t} ∑_{T=t}^∞ Q_{t,T} P_T Π_{j,T},    (A15)

where

Π_{j,T} = ( P_{j,T}/P_T − MC_{j,T}/P_T ) Y_{j,T} − (Φ/2) ( P_{j,T}/(π̄ P_{j,T−1}) − 1 )² Y_T    (A16)

and Q_{t,T} = β^{T−t} a_T/c_T. The first order condition of the firm's problem is
Φ ( P_{j,t}/(π̄ P_{j,t−1}) − 1 ) (1/(π̄ P_{j,t−1})) = ĨE_{j,t} [ (Q_{t,t+1} Y_{t+1})/(Q_{t,t} Y_t) Φ ( P_{j,t+1}/(π̄ P_{j,t}) − 1 ) P_{j,t+1}/(π̄ P²_{j,t}) ]    (A17)

+ θ_t ( P_{j,t}/P_t )^{−1−θ_t} ( (w_t θ_t − Z_t(θ_t − 1) P_{j,t}) / (P²_t Z_t) ).
Because we assume π̄ ≥ 1, the model does not have a steady state price level. Therefore,
to make the model stationary, we define P̂_{j,t+i} = P_{j,t+i}/P_{t+i} and π_{t+i} = P_{t+i}/P_{t+i−1} for
all i. Likewise, wages are made stationary by the substitution ω_t = W_t/(P_t Z_t).
Substituting in these definitions yields

Φ ( (P̂_{j,t} P_t)/(π̄ P̂_{j,t−1} P_{t−1}) − 1 ) (1/(π̄ P̂_{j,t−1} P_{t−1})) = ĨE_{j,t} [ (Q_{t,t+1} Y_{t+1})/(Q_{t,t} Y_t) Φ ( (P̂_{j,t+1} P_{t+1})/(π̄ P̂_{j,t} P_t) − 1 ) (P̂_{j,t+1} P_{t+1})/(π̄ P̂²_{j,t} P²_t) ]

+ ( P̂_{j,t} )^{−θ_t} ( θ_t ω_t + (1 − θ_t) P̂_{j,t} ).
Now, noting that market clearing implies C_t = Y_t, multiplying both sides by P_t, and
using the definition of inflation, we get

Φ ( (P̂_{j,t} Π_t)/(π̄ P̂_{j,t−1}) − 1 ) Π_t/(π̄ P̂_{j,t−1}) = ĨE_{j,t} [ (a_{t+1}/a_t) Φ ( (P̂_{j,t+1} Π_{t+1})/(π̄ P̂_{j,t}) − 1 ) (P̂_{j,t+1} Π_{t+1})/(π̄ P̂²_{j,t}) ]

+ ( P̂_{j,t} )^{−θ_t} ( θ_t ω_t + (1 − θ_t) P̂_{j,t} ).
Log-linearizing this expression gives

−Φ p_{j,t−1} + (Φ(β + 1) − 1 + θ̄) p_{j,t} − Φβ ĨE_{j,t} p_{j,t+1} = Φβ ĨE_{j,t} π_{t+1} − Φ π_t − ω̂_t(1 − θ̄) − θ̂_t,    (A18)

and introducing lag polynomials, we can write this as

( (1/β) L² − ((Φ(β + 1) − 1 + θ̄)/(Φβ)) L + 1 ) ĨE_{j,t} p_{j,t+1} = (1/β)(π_t − β ĨE_{j,t} π_{t+1}) + ((1 − θ̄)/(βΦ)) ω̂_t + (1/(Φβ)) θ̂_t.    (A19)
Factoring the lag polynomial and solving forward on the unstable root yields

(1 − λ1 L) p_{j,t} = ( (−λ2^{−1} L^{−1})/(1 − λ2^{−1} L^{−1}) ) (L/β) ( π_t − β ĨE_{j,t} π_{t+1} + ((1 − θ̄)/Φ) ω̂_t + (1/Φ) e_t )

= −λ1 ĨE_{j,t} ∑_{T=t}^∞ (λ1β)^{T−t} π_T + λ1β ĨE_{j,t} ∑_{T=t}^∞ (λ1β)^{T−t} π_{T+1}

+ λ1 ĨE_{j,t} ∑_{T=t}^∞ (λ1β)^{T−t} (ψ x_T − e_T),
where (θ̄ − 1)η Φ^{−1} ω̂_t = ψ x_t, Φ^{−1} θ̂_t = e_t, 0 < λ1 < 1, λ2 > 1, λ2 = 1/(λ1β), and λ1 + λ2 = (Φβ)^{−1}(Φ(β + 1) − 1 + θ̄). Combining terms, we have13
(1 − λ1 L) p_{j,t} = −π_t + λ1 ĨE_{j,t} ∑_{T=t}^∞ (λ1β)^{T−t} ( ((1 − λ1)/λ1) π_T + ψ x_T − e_T ).
Finally, aggregating across firms yields the aggregate Phillips curve free of any expectations
assumptions

π_t = λ1 ĨE_t ∑_{T=t}^∞ (λ1β)^{T−t} ( ((1 − λ1)/λ1) π_T + ψ x_T − e_T ).    (A20)
Monetary policy
The model is closed with a standard contemporaneous Taylor rule
i_t = r̄ + π̄ + θπ(π_t − π̄) + θx x_t + ε_{i,t},    (A21)

where ε_{i,t} is an i.i.d. monetary policy shock.
13Noting that −λ1 ∑ (λ1β)^{T−t} π_T = −λ1 π_t − λ1²β π_{t+1} − λ1³β² π_{t+2} − . . . , that λ1β ∑ (λ1β)^{T−t} π_{T+1} = λ1β π_{t+1} + (λ1β)² π_{t+2} + (λ1β)³ π_{t+3} + . . . , and that their sum is −λ1 π_t + λ1β(1 − λ1) π_{t+1} + (λ1β)²(1 − λ1) π_{t+2} + (λ1β)³(1 − λ1) π_{t+3} + . . . .