Expectations and the empirical fit of DSGE models
Preliminary draft prepared for SNDE. Check chrisopherggibbs.weebly.com for the latest draft.
Eric Gaus∗ Christopher G. Gibbs†
March 2017
Abstract
This paper studies the mechanisms that contribute to the observed improvement in fit of DSGE models that assume adaptive learning. DSGE models often struggle to reproduce the persistence observed in actual macroeconomic data. To generate this persistence, many researchers have proposed relaxing the rational expectations (RE) assumption in favor of adaptive learning. Empirically, these relaxations are found to increase persistence and almost universally improve model fit compared to RE. However, we show that these improvements are not always explained by the increase in persistence implied by adaptive learning. Instead, a significant proportion of the increased fit is due to the relaxation of the cross equation restrictions imposed by RE alone. Our findings suggest that many model comparisons between RE and adaptive learning speak more to the misspecification of RE than to the veracity of the particular learning strategy. We propose a simple check to assess the contribution that learning adds to any estimated model.
1 Introduction
DSGE models often struggle to reproduce the persistence observed in actual macroeco-
nomic data. To overcome this issue, many researchers add additional frictions, preference
assumptions, or ad hoc adjustment.1 Alternatively, some have argued that relaxing the
rational expectations (RE) assumption in favor of adaptive learning offers a theoretically
∗Gaus: Ursinus College, 601 East Main St., Collegeville, PA 19426-1000 (e-mail: [email protected]) †Gibbs: University of New South Wales (e-mail: [email protected]) 1The lack of persistence generated by DSGE models under RE is well known. These modifications
include ad hoc corrections such as adding lags of the endogenous variables to the structural equations (Galí and Gertler (1999) and Ireland (2004)), changes to preferences such as habit persistence (Fuhrer (2000)), changes to inflation setting by firms through inflation indexation (Cogley and Sbordone (2008)), or adding information problems such as rule-of-thumb behavior (Amato and Laubach (2003)) or rational inattention/sticky information (Mankiw and Reis (2002); Ball et al. (2005)), to name a few. Armed with a subset of these modifications, Del Negro et al. (2007) declare that the NK model fits the data well enough to be used for policy evaluation.
appealing way to generate persistence without employing additional modifications to the
model structure. For example, Milani (2006, 2007), Slobodyan and Wouters (2012a,b),
and Cole and Milani (2014) show that relaxing RE in favor of adaptive learning
strategies can generate significant improvements in in-sample fit and in some cases may
lessen the reliance on assumptions such as habit persistence (Milani (2007)) to explain
macroeconomic persistence, while leaving inference on all other parameters unchanged.
In this paper, we reexamine the observed improvements in empirical fit of DSGE
models that assume adaptive learning. A broad conclusion in much of this literature is
that the improvement in fit observed for these models is directly tied to the persistence
that they can potentially generate. However, we show that this conclusion is not always
correct and may depend crucially on the specific modeling choices for expectations. In
particular, any relaxation of RE has implications for the cross-equation
restrictions imposed on the time series representation of the model that is taken to the
data. RE typically places tight restrictions on this underlying structure, which may be
loosened in surprising ways whenever a relaxation of RE is considered. We show that
the loosening of these restrictions alone can account for a significant proportion of the
improvement in in-sample fit and out-of-sample forecast accuracy for estimated New
Keynesian models that assume adaptive learning.
We illustrate the point by comparing RE and adaptive learning with a constant gain,
which we refer to as constant gain learning (CGL), to an intermediate case, which we call
fixed beliefs (FB). In the FB case, we assume that beliefs are formed by a fixed forecasting
function with the same reduced form as the RE solution of the model. The FB case,
therefore, nests RE as a special case and is nested by adaptive learning. Importantly,
though, the FB case does not add any mechanisms to the model that generate persistence.
All it does is separate the estimation of the structural parameters from the estimation
of beliefs. In fact, one way to view the FB case is as estimating the model under CGL
with the gain parameters, which govern the effect of new information on the agents’
forecasting functions, constrained to be zero. In this case, the agents’ forecasts in all
periods coincide with whatever is chosen or estimated for their initial beliefs.
We explore this relaxation in a version of the New Keynesian model developed by
Ireland (2004), which we strip of all internal propagation mechanisms that are not di-
rectly generated as a consequence of expectations. The model offers a parsimonious and
tractable laboratory to study the role expectations play in estimation, but it re-
mains sufficiently detailed to be taken to the data. We focus on two of the most widely
used adaptive learning strategies in the literature: Euler Equation constant gain learning
(EE-CGL) as in Evans and Honkapohja (2001) and Infinite Horizon constant gain learn-
ing (IH-CGL) as proposed by Preston (2005). These two cases are interesting to compare
because they imply significantly different ranges of persistence for identical calibrations.
In the EE-CGL case, we show that nearly all of the improvement is explained by the re-
laxation of RE in the FB case and not added persistence. We then confirm this result
in the medium scale DSGE model of Smets and Wouters (2007). In this model, we find
that relaxing the RE assumption without learning generates significant improvements in
in-sample fit that are similar in magnitude to those reported by Slobodyan and Wouters
(2012b) for a number of different EE-CGL specifications. In the IH-CGL case, learning
and persistence do play a more prominent role. But even here, the majority of the
improvement in in-sample fit comes from the loosening of the cross equation restrictions
relative to RE.
There are three main takeaways from our results:
1. First, our results provide continued support for the exploration of bounded rational-
ity strategies in DSGE models, since even small deviations from RE offer significant
improvements in in-sample fit and real-time out-of-sample forecasting performance,
while maintaining the standard theoretical structure of the DSGE model.
2. Second, our results demonstrate that comparisons between bounded rationality
strategies and RE should be done with care. Many comparisons say more about
the misspecification of the model under RE than the benefits of the particular
learning strategy.
3. Finally, our results suggest that a natural benchmark to assess an adaptive learning
strategy is to compare the fit of the model with parameter updating to the model
without parameter updating. If learning is important, then model fit should sig-
nificantly improve when parameter updating is allowed. This is also a natural test
for the role played by the initial beliefs chosen for the learning algorithm.2
The existing literature recognizes that different assumptions for initial beliefs and
different specifications of learning vary in their performance relative to RE. However,
the existing literature does not attempt to distinguish the shared features of these
assumptions that generate improvements in fit, and no study to our knowledge has assessed
whether those improvements translate into real-time out-of-sample forecast accuracy.
For example, both Slobodyan and Wouters (2012a,b) and Cole and Milani (2014) find
2Slobodyan and Wouters (2012a) and Berardi et al. (2012) both show that the selection of initial beliefs can have significant effects on inference and fit.
improvements in fit relative to RE whether agents are assumed to have correctly speci-
fied time-varying beliefs (as traditionally assumed under adaptive learning), misspecified
time-varying beliefs, or fixed but non-rational beliefs. Our results show that the same
mechanism may drive a substantial portion of the observed improvements in each of these
cases. In addition, previous comparisons of out-of-sample fit, such as the exercise done
by Slobodyan and Wouters (2012a), only consider ex-post data, which has the potential
to overstate true forecasting power as noted by Croushore and Stark (2003).
Although the out-of-sample fit of the NK model with adaptive learning is largely
unexplored in the literature, the RE counterpart of the model has been studied by nu-
merous researchers.3 In particular, Adolfson et al. (2007) find that DSGE forecasts can
out-perform VARs at short horizons and Bayesian VARs at long horizons
in an open-economy framework. Smets and Wouters (2007) also find comparable
out-of-sample forecast performance of the model relative to Bayesian VARs. Del Negro
et al. (2007) find, using DSGE-VARs, that the cross-equation restrictions implied by
a rational solution of a New Keynesian model can add forecasting power relative to an
unrestricted VAR. In addition, Diebold et al. (2015) find that adding stochastic volatil-
ity to the model improves forecasting. A caveat to these papers, however, is the use of
ex-post data instead of real-time data. Kolasa et al. (2012) reevaluate the DSGE-VAR
evidence against a large scale NK DSGE model with real-time data and find that, except
for short-term forecasts, the large DSGE model can perform as well as, or better than, the
DSGE-VAR. Smets et al. (2014) also find sizable gains in real-time forecasting accuracy
in NK models when actual survey expectations are used as observables.
Our results compare favorably to this literature, with improvements in forecast accuracy
at long horizons for inflation and comparable accuracy at short horizons when compared
to parsimonious time series models on US out-of-sample data spanning the Great Mod-
eration. Likewise, forecasts for output growth are best at long horizons relative to short
horizons. However, no noticeable improvement over the time series models is observed
for output growth forecasts in the NK model considered here.
In the next section, we discuss the stylized facts for estimation of New Keynesian
DSGE models with constant gain learning to highlight the key relationship this paper
explains. In Section 3, we discuss how constant gain learning strategies are typically
introduced into a DSGE model and how they change the mapping from the structural
parameters of interest to the reduced form of the model taken to the data when comparing
RE to CGL. In Section 4, we introduce a parsimonious DSGE model and derive the
reduced form mappings implied under different expectation assumptions. In Section 5, we
3The exception of course being Slobodyan and Wouters (2012a).
estimate the five different models and compare the results to the stylized facts discussed
in Section 2. In Section 6, we explain the role that the loosening of the RE restrictions
plays in improving model fit. In Section 7, we show that the same mechanism driving fit
in our parsimonious model appears to underlie the improvements in fit of medium scale
DSGE models as well. In Section 8, we conclude.
2 The stylized facts of estimated NK models with
CGL
Table 1 reports the parameter estimates from two of the most widely cited studies on
estimated NK models with adaptive learning: Milani (2007) and Slobodyan and Wouters
(2012b). Milani investigates adaptive learning in a small scale DSGE model, while Slo-
bodyan and Wouters investigate learning in the medium scale DSGE model of Smets and
Wouters (2007). The table reports some key parameter estimates from the two studies
for the respective models assuming RE and assuming minimum state variable (MSV)
CGL. The parameter estimates exhibit three stylized facts that are ubiquitous among
CGL estimation exercises:
1. First, the estimates of the key parameters that describe monetary policy and the
exogenous shocks are mostly unchanged between the RE and CGL cases. In par-
ticular, columns 3 and 6 in the table show the difference between the two cases,
where asterisks denote whether the change falls inside of the 95% HPD intervals
of the RE estimates. Nearly all of the parameter estimates under CGL remain
within the HPD interval of the rational estimates.
2. Second, the model under CGL exhibits a significant improvement in in-sample fit
as measured by marginal log-likelihood relative to RE.
3. Third, the gain parameters that govern how agents update their beliefs in the
learning algorithm are estimated to be relatively small.
The first stylized fact is often interpreted as evidence that persistence explains the
improvement in fit documented by the second stylized fact. This is because small changes
in the structural parameters appear to suggest that the structure of the model fits the
data in the same way as under RE. However, the third fact - small estimated values for
the gain parameters - complicates this interpretation. Small gain parameters such as
these can imply extreme persistence in the learning process that goes beyond business
cycle frequencies that are typically of interest to an econometrician. In practice, these
gains are so small that the beliefs may essentially remain at whatever values they are
initialized at for the duration of the data used to estimate the model.
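The slow movement implied by small gains can be seen directly from the constant-gain recursion. A minimal sketch in Python; the gain of 0.017 is from Table 1, while every other number is purely illustrative:

```python
# Constant-gain updating of a scalar belief coefficient:
#   b_t = b_{t-1} + gain * (x_t - b_{t-1}),
# so observation j periods back receives weight gain * (1 - gain)^j.
def updated_belief(initial, observations, gain):
    b = initial
    for x in observations:
        b += gain * (x - b)
    return b

# Hypothetical experiment: beliefs start at 0 and the data mean shifts to 1.
quarters = 40  # ten years of quarterly data
b_small = updated_belief(0.0, [1.0] * quarters, gain=0.017)
b_large = updated_belief(0.0, [1.0] * quarters, gain=0.20)
# With gain = 0.017 beliefs close only about half the distance after ten years
# (1 - 0.983^40 is roughly 0.5), while a gain of 0.2 closes essentially all of it.
```

The half-life of a belief revision is roughly ln(2)/γ quarters, so a gain of 0.017 implies beliefs that barely move at business cycle frequencies.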
Together, these stylized facts imply that there may be other factors that contribute
to the increase in fit beyond simply the addition of new mechanisms to propagate shocks.
We argue that a key element to explaining these stylized facts is to understand how
relaxing RE in an estimated model affects the mapping of the structural parameters to
the reduced form matrices that are taken to the data.
3 Relaxing the RE restrictions through CGL
The most common approach to implement an adaptive learning strategy is to start with
the linearized structural equations of a DSGE model such as
yt = Γ + Ayt−1 + BIEtyt+1 + Dvt + Kεt (1)
vt = Rvt−1 + ut, (2)
where yt is a vector of endogenous variables and vt is a vector of autoregressive exogenous
shocks, and to replace the expectation operator with a predetermined boundedly rational
forecasting strategy. This is the standard approach taken by Evans and Honkapohja
(2001) in their exhaustive theoretical treatment on adaptive learning and the approach
most often taken in the behavioral heterogeneous expectation literature as in Hommes
(2013) and the references therein. The key here is that agents are only asked to form one-
step-ahead forecasts and that they ignore any implications of their forecasts for longer
horizons. The appeal of this approach is that if the chosen forecasting process nests RE,
then under well-known regularity conditions (E-stability) the agents’ beliefs will stay in
the neighborhood of the RE solution. Therefore, it allows for potentially complicated
but bounded beliefs around the natural benchmark of RE in a wide range of models.
A second approach follows Preston (2005) and asks agents to consider the implications
of their beliefs on their entire decision problem. In many cases, this means solving
out an agent’s decision rule, given beliefs today, into the infinite future. The approach
yields linearized structural equations that nest the standard equations studied in the
first approach and RE, but which capture the impact of long run beliefs. Again, if the
forecasting process chosen for agents nests the RE solution, then the functional form of
the model will nest the previous case, and under well-known regularity conditions the
Table 1: Stylized facts of Estimated NK models with CGL
Slobodyan and Wouters JEDC 2012 Milani JME 2007
RE CGL-MSV Change RE CGL - MSV Change
Monetary policy & habits Monetary policy & habits
MP inflation 2.04 1.91 0.13* MP inflation 1.433 1.484 -0.051*[1.75, 2.33] [1.06, 1.81] [1.08, 1.90]
MP output 0.09 0.13 -0.04* MP output gap 0.792 0.801 -0.009*[0.05, 0.013] [0.425, 1.165] [0.433, 1.18]
MP output growth 0.22 0.19 0.03* MP smoothing 0.89 0.914 -0.024*[0.18, 0.27] [0.849,0.93] [0.875, 0.947]
MP smoothing 0.81 0.84 -0.03* Habits 0.911 0.117 0.794[0.77, 0.85] [0.717, 0.998] [0.006, 0.289]
Habits 0.71 0.80 -0.09[0.64, 0.78]
AR parameters AR Parameters
Productivity 0.96 0.96 0.00* Demand shock 0.87 0.845 0.025*[0.94, 0.98] [0.8,0.93] [0.776, 0.908]
Risk premium 0.22 0.23 -0.01* Price mark-up 0.02 0.854 -0.834[0.08, 0.36] [0.0005, 0.07] [0.778, 0.93]
Gov. spending 0.98 0.96 0.02*[0.96, 0.99]
Investment 0.71 0.33 0.38[0.62, 0.81]
MP shock 0.15 0.05 0.1*[0.04, 0.24]
Price mark-up 0.89 0.88 0.01*[0.81, 0.97]
Wage mark-up 0.97 0.95 0.02*[0.95, 0.99]
St. Dev. Shocks St. Dev Shocks
Productivity 0.46 0.47 -0.01* MP Shock 0.933 0.860 0.073*[0.41, 0.51] [0.84, 1.04] [0.777, 0.953]
Risk premium 0.24 0.25 -0.01* Demand shock 1.067 1.670 -0.603[0.20, 0.28] [0.89, 1.22] [1.47, 1.91]
Gov. spending 0.53 0.53 0.00* Price mark-up 1.146 1.150 -0.004*[0.48, 0.58] [1.027, 1.27] [1.02, 1.31]
Investment 0.45 0.61 -0.16[0.37, 0.53]
MP shock 0.24 0.24 0.00*[0.22, 0.27]
Price mark-up 0.14 0.14 0.00*[0.11, 0.17]
Wage mark-up 0.24 0.23 0.01*[0.21, 0.28]
Gains 0.017 Gains 0.018
[0.006, 0.021] [0.0133, 0.0231]
Marginal Likelihood Marginal Likelihood
-922.75 -910.97 -765.45 -759.08
agents’ beliefs will depart from those predicted under RE, but remain bounded around
the natural benchmark of RE as before.
Both types of learning models are taken to the data using the same methods as their RE
counterparts. The models have state space representations and inference on a vector of
structural parameters, Θ, may be obtained by standard likelihood based techniques using
the Kalman filter. Depending on the assumption for how agents form their forecasts, there
are of course many new dimensions through which adaptive learning may contribute to
the observed improvements in fit. This includes the mechanical increases that occur
because of an increase in the degrees of freedom, or the time variation that is introduced
into the parameters through learning. Another, and the primary focus of this paper, is a
change in the mapping from the structural parameters to the reduced form of the model
taken to the data.
To illustrate how the mapping changes, consider estimating a vector of parameters
Θ from a standard linearized DSGE as in Equations (1) and (2), where the true data
generating process is unknown. The minimum state variable RE solution of the model
takes the following form
yt = C + Fyt−1 + Qvt + Kεt. (3)
The RE restrictions in this case are summarized by the mapping F : Θ → {C, F, Q},
which governs how changes in the elements of Θ change C, F, or Q. The mapping in
general is highly nonlinear and often has no closed form solution. Inference on Θ in
this case may be obtained by standard likelihood based techniques such that ΘMLE =
argmaxΘ L(Θ|yobs, F), where L is the likelihood function.
Any adaptive learning strategy, however, implies a different mapping, one which is
unambiguously less restrictive following common assumptions employed in this literature.
To see this, consider the case where agents forecast using Equation (3) but do not update
parameters over time. The agents just pick a parameterization for Equation (3) and use
it to forecast in each period. This is the same as assuming agents are learning the MSV
solution but that the gain parameters that govern how beliefs are updated are constrained
to be zero, which makes beliefs unchanging over time. Under this assumption, the data
generating process for the economy takes the following form
yt = (Γ + BĈ) + (A + BF̂)yt−1 + (BQ̂R + D)vt + Kεt, (4)
with C̃ ≡ Γ + BĈ, F̃ ≡ A + BF̂, and Q̃ ≡ BQ̂R + D,
where Equation (3) is substituted in for IEtyt+1 and where hats denote fixed values for
the belief parameters. Note the following regarding Equation (4):
1. Equation (4) retains the same reduced form as Equation (3).
2. Conditioning on Ĉ, F̂, and Q̂, Equation (4) is described by the same structural
parameters, which we collect in the vector Ξ.
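The construction of Equation (4) from fixed beliefs is purely mechanical. A sketch with placeholder matrices (none of the values below are calibrated or estimated; they only illustrate the algebra):

```python
import numpy as np

# Reduced form under fixed beliefs (Equation (4)):
#   C_tilde = Gamma + B C_hat,  F_tilde = A + B F_hat,  Q_tilde = B Q_hat R + D.
# Structural matrices -- illustrative placeholders only:
Gamma = np.array([0.10, 0.20])
A = np.zeros((2, 2))
B = np.array([[0.50, 0.10],
              [0.05, 0.60]])
D = np.eye(2)
R = np.diag([0.9, 0.8])

# Fixed belief matrices (the hats), chosen arbitrarily here:
C_hat = np.array([0.10, 0.20])
F_hat = np.zeros((2, 2))
Q_hat = np.array([[1.00, 0.20],
                  [0.10, 1.10]])

C_tilde = Gamma + B @ C_hat       # intercept
F_tilde = A + B @ F_hat           # coefficient on y_{t-1}
Q_tilde = B @ Q_hat @ R + D       # loading on the exogenous shocks
```

Note that the reduced form keeps the shape of Equation (3) regardless of what values the hatted beliefs take; only the cross-equation restrictions tying the reduced form to Θ are relaxed.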
Now, let’s assume that agents pick beliefs based on ΘMLE = argmaxΘ L(Θ|yobs, F)
such that the RE values of Ĉ|ΘMLE, F̂|ΘMLE, and Q̂|ΘMLE are used in constructing Equa-
tion (4). This implies the mapping G : Ξ × ΘMLE → {C̃, F̃, Q̃} to describe this
case. By construction, if Θ = Ξ, then Equation (4) is equal to Equation (3). If adaptive
learning, represented by allowing time variation in Ĉ, F̂, and Q̂ based on past data, is
responsible for explaining improvements in fit, then in this case ΞMLE from Equation (4)
should be equal to ΘMLE from Equation (3) and
maxΘ IE L(Θ|yobs, F) = maxΞ IE L∗(Ξ|yobs, Ĉ|ΘMLE, F̂|ΘMLE, Q̂|ΘMLE).
Whether or not these equalities hold in practice depends on a number of factors. For
example, suppose the true data generating process is given by
yt = CT + FTyt−1 + QTvt + KTεt, (5)
and suppose that Θ0 = F−1({CT, FT, QT}) exists. Then, by construction, it follows
that F(Θ0) = G(Θ0, Θ0) and Θ0 = argmaxΘ IE L(Θ|yobs, F) = argmaxΞ IE L∗(Ξ|yobs, G).
However, if Θ0 = F−1({CT, FT, QT}) does not exist or is not feasible, then there is no
guarantee that Θ is equal to Ξ, and in general
maxΘ IE L(Θ|yobs, F) ≤ maxΞ IE L∗(Ξ|yobs, G),
where the inequality follows from the definition of F. Specifically, the range of the
mapping F is by construction a subset of the range of G because F(Θ) := G(Θ, Θ).
Therefore, if RE is misspecified, then the first step in any adaptive learning scheme
of substituting a belief function into the structural equations affects the cross-equation
restrictions that map the structural parameters to the reduced form. But, importantly,
it does not fundamentally change the underlying reduced form structure of the model,
i.e., Equation (5), Equation (4), and Equation (3) continue to nest one another. There
is nothing explicitly done to the model that allows it to generate more persistence. Now
of course, this insight does not imply that further modifications such as assuming time-
variation in C̃, F̃, and Q̃ as in adaptive learning will not better fit the data, but it
highlights another mechanism through which adaptive learning models may improve in-
sample fit. In the remainder of the paper, we specifically ask whether adaptive learning
does improve in-sample fit beyond just the loosening of the restrictions implied by RE.
4 The Model, Expectations, and the Reduced Form
To study the role loosened restrictions play in model fit, we consider five different as-
sumptions for expectations that share a common reduced form:
yt = CRFt + FRFt yt−1 + QRFt vt + KRFεt, (6)
where each assumption imposes different restrictions on CRFt, FRFt, and QRFt. In this section, we put
forward a tractable model that allows for analytical derivation of the mapping from the
structural parameters to the reduced form so that changes implied by different expecta-
tions assumptions can be studied in detail.
4.1 The Model
The New Keynesian model we study is a parsimonious version of the model proposed by
Ireland (2004). The microfoundations of the model are given in Appendix A. Following
Preston (2005), the key equations are
xt = −ωât + ĨEt ∑∞T=t βT−t [(1− β)(xT+1 + ωρaâT )− (iT − π̂T+1)− (ρa − 1)âT ] (7)
πt = λ1ĨEt ∑∞T=t (λ1β)T−t ( ((1− λ1)/λ1)πT + ψxT − eT ) (8)
it = r̄ + π̄ + θπ(πt − π̄) + θxxt + εi,t (9)
gt = ŷt − ŷt−1 + ḡ + εz,t (10)
xt = ŷt − ωat (11)
at = ρaat−1 + εa,t (12)
et = ρeet−1 + εe,t, (13)
where xt is the output gap, πt is inflation, it is the nominal interest rate, gt is the growth
rate of output, yt is the stochastically detrended level of output, at is a preference shock,
et is a cost push shock, εi,t is a monetary policy shock, and εz,t is the detrended TFP
shock. The individual variables are in log terms.
We choose this model for three reasons. First, the model has no internal propagation
mechanisms other than expectations. In fact, the minimum state variable RE solution
does not depend on any lagged endogenous variables. This has the added benefit of
allowing us to estimate the model without a projection facility, which is usually required
to keep expectations from becoming explosive due partly to numerical precision issues.
Gaus and Ramamurthy (2012) note that the choice of projection facility can have a
significant effect on estimation outcomes, which may complicate a comparison with RE.
Second, the determinacy and E-stability conditions of the model perfectly coincide.4
This allows us to impose identical restrictions on the parameter space for RE and CGL
versions of the model. And finally, the model generates predictions for a number of
directly observable variables that do not require any manipulation such as HP filtering
or even de-meaning in order to estimate.
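The determinacy/E-stability condition in footnote 4 is easy to verify numerically. A small sketch using the calibrated β = 0.995 and ψ = 0.1 from Section 5; the θπ and θx values below are illustrative, not estimates:

```python
# Determinacy and E-stability in this model reduce to the Taylor principle
# (footnote 4): theta_pi > ((beta - 1) * theta_x + psi) / psi.
def is_determinate(theta_pi, theta_x, beta=0.995, psi=0.1):
    return theta_pi > ((beta - 1.0) * theta_x + psi) / psi

active = is_determinate(theta_pi=1.5, theta_x=0.125)   # active rule satisfies it
passive = is_determinate(theta_pi=0.9, theta_x=0.0)    # passive rule violates it
```

Because β is close to one, the threshold is close to θπ > 1, the familiar statement of the Taylor principle.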
4.1.1 Rational Expectations
It is straightforward to show that imposing RE on Equations (7) and (8) allows them to
be collapsed to a more familiar form
xt = r̄ + ĨEREt xt+1 − (it − ĨEREt πt+1) + (1− ω)(1− ρa)at (14)
πt = (1− β)π̄ + βĨEREt πt+1 + ψxt − et. (15)
Substituting in Equation (9), we can map the model into the general form of Equations
(1) and (2)
yt = Γ + Ayt−1 + BĨEtyt+1 + Dvt + Kεt
vt = ρvt−1 + ut,
where yt = (xt, πt)′, vt = (at, et)′, ut = (εa,t, εe,t)′, εt = (εi,t, 0)′,
Γ = m ( π̄(θπβ − 1) ; −π̄(θx(β − 1)− 1 + β + ψ − θπψ) ), B = m ( 1, 1− θπβ ; ψ, β(1 + θx) + ψ ),
4The determinacy and E-stability condition for the model is the Taylor principle: θπ > ((β − 1)θx + ψ)/ψ.
D = m ( (ρa − 1)(ω − 1), θπ ; (ρa − 1)(ω − 1)ω, −1− θx ), ρ = ( ρa, 0 ; 0, ρe ),
m = (1 + θx + θπψ)−1, A = 02×2, and K = −I2. Using the reduced form of Equation
(6) and the method of undetermined coefficients, we can solve analytically for the RE
solution in terms of the structural parameters: CRE = (0, π̄)′,
QRE(1,1) = −(ρa − 1)(βρa − 1)(ω − 1) / (1 + θx − θxβρa + βρ²a + θπψ − ρa(1 + β + ψ)),
QRE(1,2) = (θπ − ρe) / (1 + θx − θxβρe + βρ²e + θπψ − ρe(1 + β + ψ)),
QRE(2,1) = (ρa − 1)ψ(ω − 1) / (1 + θx − θxβρa + βρ²a + θπψ − ρa(1 + β + ψ)),
QRE(2,2) = (ρe − 1− θx) / (1 + θx − θxβρe + βρ²e + θπψ − ρe(1 + β + ψ)), (16)
FRE = 02×2, and KRE = −I. This is of course the explicit mapping F : Θ →
{CRE, FRE, QRE} discussed in Section 3 for this model.
There are two notable features about this mapping. First, RE places restrictions on
the intercept terms in CRE, forcing them to be consistent with the steady state of the model.
As we will see, this will not be the case under other expectation assumptions. Second, the
mapping from the structural parameters to the reduced form of QRE is highly nonlinear.
The mapping to each element of QRE is an eighth-degree polynomial in β, ψ, θπ, θx, ρa, ρe,
and ω. The nonlinearity of this mapping is important because it means that relatively
small changes in the values of structural parameters can have large effects on the reduced
form and fit.
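Because A = 0 here, the method of undetermined coefficients reduces to two fixed-point conditions, C = Γ + BC and Q = BQρ + D, which can be solved numerically by the standard vec identity vec(BQρ) = (ρ′⊗B)vec(Q). A sketch with arbitrary placeholder matrices, not the model's calibration:

```python
import numpy as np

# Minimum state variable RE solution y_t = C + Q v_t when A = 0:
# matching coefficients in Equation (1) gives C = Gamma + B C and Q = B Q R + D.
def msv_solution(Gamma, B, D, R):
    n = Gamma.size
    C = np.linalg.solve(np.eye(n) - B, Gamma)
    # vec(B Q R) = (R' kron B) vec(Q), with columns stacked ('F' order)
    M = np.eye(n * n) - np.kron(R.T, B)
    Q = np.linalg.solve(M, D.flatten(order="F")).reshape((n, n), order="F")
    return C, Q

# Illustrative placeholder matrices:
Gamma = np.array([0.0, 0.02])
B = np.array([[0.40, 0.20],
              [0.04, 0.50]])
D = np.array([[0.3, 0.0],
              [0.0, -1.0]])
R = np.diag([0.9, 0.8])
C, Q = msv_solution(Gamma, B, D, R)
# C and Q satisfy the fixed-point conditions exactly.
```

For the actual model the same fixed point has the closed form in Equation (16); the numerical solver is useful when the polynomial mapping is too unwieldy to manipulate by hand.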
4.1.2 Euler equation learning
The second expectations assumption we consider is Euler Equation Constant Gain Learn-
ing (EE-CGL). To implement this strategy, we start with the same structural equation
as under RE given by Equations (1) and (2). We assume that the agents’ perceived law
of motion (PLM) takes the same functional form as Equation (6):
yt = CRF + QRFvt + εt. (17)
The agents estimate the reduced form coefficients using past data by a constant gain
recursive least squares algorithm
Φt = Φt−1 + γS−1t zt−1(yt−1 − z′t−1Φt−1) (18)
St = St−1 + γ(zt−1z′t−1 − St−1), (19)
where zt is a vector of data, Φt is a vector of regression coefficients, St is the estimated
variance-covariance matrix, and γ is a matrix of gain parameters that govern the weight
placed on new information.
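A minimal implementation of the updates in Equations (18) and (19), assuming the common timing in which beliefs used at t incorporate data through t − 1 (timing conventions differ across papers); the usage values are illustrative only:

```python
import numpy as np

# Constant-gain recursive least squares (Equations (18)-(19)):
#   S_t   = S_{t-1} + gain * (z z' - S_{t-1})
#   Phi_t = Phi_{t-1} + gain * S_t^{-1} z (y - z' Phi_{t-1})
# Phi maps the regressors z (a constant and v_t) into the endogenous variables y.
def cgl_update(Phi, S, z, y, gain):
    S_new = S + gain * (np.outer(z, z) - S)
    forecast_error = y - Phi.T @ z
    Phi_new = Phi + gain * np.outer(np.linalg.solve(S_new, z), forecast_error)
    return Phi_new, S_new

# Usage sketch: a scalar relationship y = 2*z learned from repeated observations.
Phi, S = np.zeros((1, 1)), np.eye(1)
for _ in range(500):
    Phi, S = cgl_update(Phi, S, z=np.array([1.0]), y=np.array([2.0]), gain=0.1)
# The belief coefficient approaches 2, closing the gap by a factor (1 - gain)
# each period -- which is why the small estimated gains imply very slow learning.
```

Setting the gain to zero in this recursion reproduces the fixed-beliefs case exactly: Φt never moves from its initialization.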
Expectations under EE-CGL at time t are given by5
IEEEt yt+1 = Ĉt−1 + Q̂t−1Rvt. (20)
Substituting Equation (20) in for the beliefs in Equation (1) yields the following mapping
to the reduced form equations:
CEEt = Γ + BĈt−1 = m ( Ĉ11t−1 + (Ĉ21t−1 − π̄)(1− βθπ) ;
Ĉ11t−1ψ + Ĉ21t−1(β(1 + θx) + ψ)− π̄(θx(β − 1) + ψ(1− θπ) + β − 1) ) (21)
and
QEEt = BQ̂t−1ρ + D = (22)
m ( 1− ω + ρa(Q̂11t−1 + Q̂21t−1(1− θπβ) + ω − 1), θπ + (Q̂12t−1 + Q̂22t−1(1− θπβ))ρe ;
Q̂21t−1ρa(β(1 + θx) + ψ) + ψ(1− ω + ρa(Q̂11t−1 + ω − 1)), θx(Q̂22t−1βρe − 1) + Q̂12t−1ρeψ + Q̂22t−1ρe(β + ψ)− 1 ),
where Ĉijt−1 and Q̂ijt−1 denote the element in the ith row and jth column of Ĉt−1 and Q̂t−1,
respectively.
Comparing CRE to CEEt, it is apparent that the restrictions placed on the reduced form
under RE are loosened under EE-CGL in two ways. First, the reduced form intercepts
are no longer explicitly tied to the steady states of the model. They depend on other
structural parameters and on beliefs, which frees them to vary over time. Second,
comparing QRE to QEEt, the degree of nonlinearity of the mapping from structural pa-
rameters to the reduced form is reduced. The reduction in nonlinearity allows changes
in structural parameters to have smaller effects on the other parameters, which allows
these parameters to take on new values without having as significant an impact elsewhere
in the model. For example, consider the limit of the first element of QEEt compared to QRE
as ρa → 1. In the RE case, the coefficient goes to zero. Therefore, as the preference
shock, at, becomes more persistent, its effect on the output gap approaches zero. This of
course offsets the persistence by effectively removing the shock from the model. In the
EE case, however, no such restriction is imposed. Here, as ρa goes to one, QEE,11t goes to
m(Q̂11t−1 + Q̂21t−1(1− θπβ)), which allows a more persistent preference shock to affect the
output gap, at least in the short run, when beliefs Q̂ are not at their RE values.
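The limiting behavior just described can be checked numerically from the (1,1) elements of Equations (16) and (22). A sketch with illustrative parameter and belief values (chosen to satisfy the Taylor principle, not estimated):

```python
# Compare the (1,1) elements of Q^RE (Equation (16)) and Q^EE (Equation (22))
# as rho_a -> 1. All parameter values are illustrative.
beta, psi, theta_pi, theta_x, omega = 0.99, 0.1, 1.5, 0.1, 0.5
m = 1.0 / (1.0 + theta_x + theta_pi * psi)

def q11_re(rho_a):
    num = -(rho_a - 1.0) * (beta * rho_a - 1.0) * (omega - 1.0)
    den = (1.0 + theta_x - theta_x * beta * rho_a + beta * rho_a**2
           + theta_pi * psi - rho_a * (1.0 + beta + psi))
    return num / den

def q11_ee(rho_a, Q11_hat, Q21_hat):
    return m * (1.0 - omega
                + rho_a * (Q11_hat + Q21_hat * (1.0 - theta_pi * beta) + omega - 1.0))

# At rho_a = 1 the RE coefficient is exactly zero, while for non-RE beliefs the
# EE coefficient stays at m * (Q11_hat + Q21_hat * (1 - theta_pi * beta)).
```

The (ρa − 1) factor in the RE numerator is the cross-equation restriction at work; the EE mapping simply lacks that factor.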
5As is common in the literature, we assume agents know R.
4.1.3 Infinite horizon learning
The third expectation assumption we consider is the infinite horizon learning solution.
This solution uses the full forward-looking decision rules given by Equations (7) and (8)
and asks agents to make decisions with respect to their beliefs over all future periods.
This means agents must form forecasts of xt and πt as well as it for T = t, . . . ,∞. In
contrast, there is no need to forecast interest rates under EE-CGL. Therefore, to preserve
CGL, we assume that agents know the coefficients of the Taylor rule, which allows them
to only forecast xt and πt. With this assumption, the sequence of agents’ expectations
are computed as
IEIHt ∑∞T=t βT−tyT+1 = Ĉt−1(1− β)−1 + Q̂t−1(I− βρ)−1ρvt (23)
and
IEIHt ∑∞T=t (λ1β)T−tyT+1 = Ĉt−1(1− λ1β)−1 + Q̂t−1(I− λ1βρ)−1ρvt, (24)
where Ĉt−1 and Q̂t−1 are the coefficients of Equation (17) estimated using the constant
gain recursive least squares algorithm discussed previously. Substituting these expecta-
tions into Equations (7) and (8), the reduced form may be expressed as
yt = ΓIH + BIHĈt−1 + ( B′1,IHQ̂t−1(I− βρ)−1ρ + B′2,IHQ̂t−1(I− βλ1ρ)−1ρ + D )vt + Gεt,
where CIHt = ΓIH + BIHĈt−1 and QIHt = B′1,IHQ̂t−1(I− βρ)−1ρ + B′2,IHQ̂t−1(I− βλ1ρ)−1ρ + D, with
ΓIH = m ( π̄(θπ − 1)/(1− β)− (1− β)π̄θπ/(1− βλ1) ; (1− β)π̄(θx + 1)/(1− βλ1) + π̄(θπ − 1)ψ/(1− β) ),
BIH = m ( (1− β(θx + 1))/(1− β)− βθπλ1ψ/(1− βλ1), (1− βθπ)/(1− β)− βθπ(1− λ1)/(1− βλ1) ;
βλ1ψ(θx + 1)/(1− βλ1) + ψ(1− β(θx + 1))/(1− β), β(1− λ1)(θx + 1)/(1− βλ1) + ψ(1− βθπ)/(1− β) ),
B′1,IH = m ( 1− β(1 + θx), 1− βθπ ; ψ(1− β(θx + 1)), ψ(1− βθπ) ), and
B′2,IH = m ( −βθπλ1ψ, −βθπ(1− λ1) ; βλ1ψ(θx + 1), β(1− λ1)(θx + 1) ).
The IH-CGL case dramatically alters the mapping from structural parameters to the
reduced form, while still nesting the RE solutions. The changes here generate an entirely
different type of loosening of restrictions compared to the EE case. The IH-CGL case
turns out to permit a range of values for CIHt and QIHt that are not obtainable in the
other cases for any reasonable parameterization of the model. We illustrate this in detail
in a later section.
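The closed-form discounted sums in Equation (23) follow from the PLM and the geometric series for (I − βρ)⁻¹, which can be verified by brute-force truncation. A sketch with illustrative belief and parameter values:

```python
import numpy as np

# Under the PLM, E_t y_{T+1} = C_hat + Q_hat rho^{T+1-t} v_t, so
#   sum_{T=t}^inf beta^{T-t} E_t y_{T+1}
#     = C_hat / (1 - beta) + Q_hat (I - beta*rho)^{-1} rho v_t   (Equation (23)).
beta = 0.99
C_hat = np.array([0.1, 0.5])
Q_hat = np.array([[0.8, 0.1],
                  [0.2, 0.9]])
rho = np.diag([0.9, 0.7])
v = np.array([1.0, -0.5])

closed = C_hat / (1 - beta) + Q_hat @ np.linalg.solve(np.eye(2) - beta * rho, rho @ v)

# Brute-force truncation of the infinite sum:
brute = np.zeros(2)
rho_pow = rho.copy()
discount = 1.0
for _ in range(5000):
    brute += discount * (C_hat + Q_hat @ (rho_pow @ v))
    rho_pow = rho_pow @ rho
    discount *= beta
```

Replacing β with λ1β gives Equation (24) by the same argument.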
4.1.4 The fixed belief cases
Finally, the fourth and fifth expectation assumptions we consider are the fixed beliefs
counterparts to the EE-CGL and IH-CGL specifications. In particular, we fix the values
for Ĉ and Q̂ and do not consider any learning. Throughout the paper, we assume that
Ĉ = C^{RE}, F̂ = F^{RE}, and Q̂ = Q^{RE} such that the FB cases are defined by the explicit
mapping G : Ξ × Θ → {C^{RF}, F^{RF}, Q^{RF}} from Section 2, where Ξ is the vector of
structural parameters to be estimated and Θ is the vector of structural parameters that
gives rise to the fixed forecasting function.
5 Taking the model to the data
5.1 The state space
The five expectation assumptions share a common reduced form and a common state
space representation. The common state space representation is given by
y^{obs}_t = H X_t

X_t = J_{t−1} + F_{t−1} X_{t−1} + S_{t−1} ε_t,

where J_{t−1} = Ω^{−1}_{t−1} μ_{t−1}, F_{t−1} = Ω^{−1}_{t−1} Ψ, S_{t−1} = Ω^{−1}_{t−1} ζ,

μ_{t−1} = [ C^{RF}_t ; r̄ + (1 − θπ)π̄ ; ḡ ; 0_{3×1} ],

Ω_{t−1} = [ I_{2×2}   0_{2×3}   −Q^{RF}_t ;
            −θx  −θπ  1   0   0   0   0 ;
             0    0   0   1  −1   0   0 ;
            −1    0   0   0   1  −ω   0 ;
            0_{2×5}   I_{2×2} ],

Ψ = [ 0_{3×7} ;
      0  0  0  0  −1  0   0 ;
      0  0  0  0   0  0   0 ;
      0  0  0  0   0  ρa  0 ;
      0  0  0  0   0  0   ρe ],

ζ = [ 0_{1×2}  −1  0_{1×4} ;
      0_{1×7} ;
      0_{1×2}   1  0_{1×4} ;
      0_{1×3}   1  0_{1×3} ;
      0_{1×7} ;
      0_{2×5}   I_{2×2} ],
where y^{obs}_t = (xt, πt, it, gt)′ are the observable variables and Xt = (xt, πt, it, gt, yt, at, et)′.
In the two adaptive learning cases, C^{RF}_t and Q^{RF}_t evolve according to Equations (18)
and (19). Otherwise, C^{RF}_t = C^{RF} and Q^{RF}_t = Q^{RF} are fixed values.
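For concreteness, one step of the constant gain recursive least squares updating referenced here (Equations (18)–(19)) can be sketched as follows; the gain, data point, and initial beliefs are illustrative placeholders rather than the paper's values:

```python
import numpy as np

# One step of constant gain recursive least squares (CGL); all numbers
# below are illustrative, not the paper's estimates.
gain = 0.02
phi = np.array([1.0, 0.5, -0.3])   # regressors: a constant and the shocks v_t
y_t = np.array([0.4, 1.1])         # realized endogenous variables (x_t, pi_t)
beliefs = np.zeros((2, 3))         # each row stacks [C_hat, Q_hat] for one equation
R = np.eye(3)                      # estimate of the regressors' second-moment matrix

# Update the moment matrix, then move beliefs toward reducing the forecast error
R = R + gain * (np.outer(phi, phi) - R)
forecast_error = y_t - beliefs @ phi
beliefs = beliefs + gain * np.outer(forecast_error, np.linalg.solve(R, phi))
```

Iterating this step through the sample produces the time-varying C^{RF}_t and Q^{RF}_t that enter the state space above.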
5.2 Estimation results
In this section, we estimate the five versions of the model using maximum likelihood
and compare the results.6 We use US data from 1984q1 through 2008q3. Our observables
are the CBO measure of the output gap, the GDP deflator measure of inflation, the
three-month Treasury bill rate, and the growth rate of real GDP.

To avoid known issues with weak identification that arise in these models, we calibrate
some of the key parameters. We set β = 0.995, π̄ = 2%, and ψ = 0.1. The slope of the
Phillips curve (ψ) is set following Ireland (2004), which in the Calvo pricing framework
corresponds to the average firm adjusting its price roughly once a year. We also calibrate
the mean growth rate ḡ to the average growth rate observed over the estimation period.
The remaining parameters {θπ, θx, ω, ρa, ρe, σa, σe, σi, σg} are estimated, along with the
gain parameters {γx, γπ} in the CGL cases.
Table 2 shows the estimation results for the five different model specifications. Com-
paring the results for RE to EE-CGL and IH-CGL, the three stylized facts noted in
Section 2 are all present. In particular, the inference on the structural parameters and
6To construct confidence intervals for the structural parameter estimates, we use a method similar
to the one proposed by Stock and Watson (1998) for time-varying parameter models. We employ this
method because the constrained optimization routine we use to maximize the likelihood function provides
unreliable numerical estimates of the Hessian matrix. The confidence intervals are constructed for each
parameter by selecting a grid surrounding the ML estimate. We then re-maximize the log-likelihood
function by searching over all other parameters, while holding the parameter of interest fixed at one
point on the grid. The maximized log-likelihood value obtained at that point is then compared to the
original maximum log-likelihood value using a likelihood ratio test. The set of grid points that fail to
reject the null hypothesis that the true parameter vector lies in the restricted parameter space at the
5% level constitutes the 95% confidence interval.
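A sketch of this grid-based interval construction, using a simple Gaussian likelihood as a stand-in for the model's likelihood (all names and values here are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

# Toy version of the footnote-6 procedure: profile the likelihood over a
# grid for one parameter and keep the points an LR test fails to reject.
rng = np.random.default_rng(1)
data = rng.normal(loc=0.5, scale=1.2, size=200)

def negloglik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + len(data) * log_sigma

mle = minimize(negloglik, x0=[0.0, 0.0])
crit = chi2.ppf(0.95, df=1) / 2.0      # LR cutoff in log-likelihood units

# Profile over a grid for mu, re-maximizing over the remaining parameter
grid = np.linspace(mle.x[0] - 0.5, mle.x[0] + 0.5, 41)
inside = []
for mu in grid:
    prof = minimize(lambda s: negloglik([mu, s[0]]), x0=[mle.x[1]])
    if prof.fun - mle.fun <= crit:     # fails to reject at the 5% level
        inside.append(mu)

print(round(min(inside), 3), round(max(inside), 3))  # interval endpoints
```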
exogenous shock processes is roughly the same across the three cases. The CGL cases,
however, yield significant improvements in log-likelihood relative to RE, and the estimates
for the gain parameters are small. In fact, in our case the gains are very small, which is
partly due to the fact that our estimation does not make use of priors.
Turning to the EE-FB and IH-FB cases, we note that the results are nearly indistinguish-
able from their learning counterparts. In particular, we cannot reject the null hypothesis
that the EE-FB model fits the data as well as the EE-CGL model using a likelihood
ratio test, and we only reject for the IH case at the 10% significance level. Therefore, the
loosening of restrictions described in Section 3 appears to account for a majority of the
improvement in fit.
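The nested FB-versus-CGL comparison can be reproduced directly from the log-likelihoods in Table 2. The halving of the χ² tail probability below reflects the gain parameters lying on the boundary of the parameter space under the null; that adjustment is our reading of the table's p-values, not something stated in the text:

```python
from scipy.stats import chi2

# Likelihood ratio test of EE-FB (restricted) against EE-CGL (unrestricted),
# which adds the two gain parameters gamma_x and gamma_pi.
ll_fb, ll_cgl = 1982.60, 1983.72        # log-likelihoods from Table 2

stat = 2.0 * (ll_cgl - ll_fb)
p = 0.5 * chi2.sf(stat, df=2)           # boundary-adjusted p-value (assumed)
print(round(stat, 2), round(p, 3))      # 2.24 0.163, as reported in Table 2
```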
To illustrate the point, Figure 1 shows the estimated impulse responses for a monetary
policy shock, a preference shock, and a cost push shock for the five different models. The
monetary policy shock provides the clearest picture of the role that learning plays in
adding persistence. There is no mechanism to propagate this shock in the model except
adaptive learning. Therefore, the RE and FB cases respond only in the period the shock
is realized and then immediately return to steady state. In the two adaptive learning
cases, the shock exhibits some persistence through beliefs, leading to small but persistent
responses for the endogenous variables. However, the responses to the
remaining shocks are for the most part just scaled versions of the RE response. Learning
does not add any substantive additional dynamics in these cases.
Turning to the reduced form of the five models, Table 3 reports the elements of
Q^{RF} for each case at the estimated values reported in Table 2 (for the learning cases
we show the value implied by the initial beliefs). Because of our decision to calibrate
π̄, the implied values of C^{RF} are roughly equivalent across the five cases and are not
shown here. The EE cases exhibit a modest and consistent departure from the RE case,
while the IH cases are drastically different. The table illustrates that relatively modest
changes in the structural parameters can imply fairly significant changes in the reduced
form.
5.3 Out-of-sample fit
Finally, we compare the models in a real-time out-of-sample forecasting exercise to deter-
mine whether the improvement in fit translates into out-of-sample forecasting power. We
use the real-time dataset provided by the Federal Reserve Bank of Philadelphia for infla-
tion, GDP growth, and interest rates. For a real-time measure of the output gap, we use
the Fed's Greenbook nowcast of the output gap. We evaluate the forecasts at four different
forecast horizons: the nowcast, one-quarter, four-quarter, and six-quarter-ahead forecasts
Table 2: ML estimates

            RE              EE-FB           EE-CGL              IH-FB           IH-CGL
θπ          1.577           1.123           1.125               1.578           1.653
            [1.31, 1.97]    [0.99*, 1.61]   [0.99*, 1.73]       [1.36, 1.97]    [1.45, 1.73]
θx          0.166           0.208           0.211               0.160           0.157
            [0.08, 0.29]    [0.11, 0.36]    [0.21, 0.30]        [0.08, 0.26]    [0.12, 0.19]
ω           0.000           0.000           0.000               0.066           0.000
            [0.00*, 0.02]   [0.00*, 0.01]   [0.00*, 0.00]       [0.00*, 0.48]   [0.00*, 0.00]
ρa          0.933           0.926           0.925               0.973           0.991
            [0.87, 1.00*]   [0.86, 0.99]    [0.90, 0.98]        [0.95, 0.99]    [0.97, 0.99]
ρe          0.967           0.917           0.899               0.684           0.525
            [0.87, 1.00*]   [0.80, 1.00]    [0.89, 0.95]        [0.50, 0.86]    [0.31, 0.76]
σa × 100    1.671           1.633           1.613               0.555           0.146
            [0.82, 6.77]    [1.36, 2.04]    [1.47, 1.79]        [0.19, 1.38]    [0.01, 1.46]
σe × 100    0.071           0.072           0.073               0.166           0.229
            [0.06, 0.10]    [0.06, 0.09]    [0.07, 0.08]        [0.11, 0.22]    [0.23, 0.23]
σi × 100    0.558           0.558           0.561               0.560           0.557
            [0.46, 0.70]    [0.46, 0.73]    [0.54, 0.58]        [0.47, 0.70]    [0.52, 0.61]
σg × 100    0.179           0.180           0.180               0.175           0.179
            [0.16, 0.21]    [0.16, 0.22]    [0.17, 0.22]        [0.15, 0.21]    [0.16, 0.19]
γx          -               -               0.00                -               0.00
                                            [1.9E-06, 2.5E-06]                  [2.4E-08, 3.0E-08]
γπ          -               -               0.00                -               0.001
                                            [1.7E-09, 1.8E-09]                  [0.00, 0.02]

Log Likelihood           1974.49    1982.60    1983.72    1986.84    1988.97
LR Statistic (rel. RE)              16.22      18.46      24.42      28.96
p-value                             (0.005)    (0.006)    (0.000)    (0.000)
LR Statistic (rel. FB)                         2.24                  4.54
p-value                                        (0.163)               (0.052)

Notes: ML estimates for the Ireland model under the five different assumptions for expectations. 95% confidence
intervals for the estimates are shown in brackets below the point estimates. The confidence intervals are truncated
when they hit a theoretically imposed boundary, which is indicated by *. The forecasting function in the FB cases
and the initial beliefs in the CGL cases are set to the RE solution implied by the estimates in the first column.
Table 3: Reduced form estimates of Q̃^{RF}

            Q̃11       Q̃21      Q̃12      Q̃22
RE          0.060     0.083    8.899    -2.901
FB-EE       0.091     0.085    7.272    -2.921
CGL-EE      0.091     0.085    7.126    -2.881
FB-IH      -0.858     0.023    2.367    -1.355
CGL-IH     -3.251     0.081    1.840    -0.937

Notes: Reduced form values implied by the estimated parameters given in Table 2.
Figure 1: Estimated Impulse Responses

[Figure: twelve panels showing the responses of the output gap, inflation, the interest rate, and GDP growth to a cost push shock, a preference shock, and an interest rate shock under the five specifications (RE, EE-FB, EE-CGL, IH-FB, IH-CGL).]

Notes: Impulse responses to one standard deviation shocks implied by the estimated parameters given in Table 2.
for inflation and GDP growth. We use the second releases of data as the comparison
series to measure forecast accuracy. Inference on improvements in forecast accuracy is
obtained using the Diebold and Mariano (1995) test statistic (DM) with the Harvey et
al. (1997) small-sample and forecast-horizon correction. This test statistic is found to
work well in tests of real-time forecasts by Clark and McCracken (2009) and Clark and
McCracken (2011).
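The mechanics of the DM comparison with the Harvey et al. correction can be sketched as follows; the forecast-error series are simulated stand-ins for the real-time forecasts compared in Table 4, so only the procedure, not the numbers, is meaningful:

```python
import numpy as np
from scipy.stats import t as t_dist

def dm_test(e1, e2, h):
    """DM statistic for equal squared-error loss of two h-step forecasts,
    with the Harvey, Leybourne, and Newbold (1997) correction."""
    d = e1**2 - e2**2                   # loss differential
    n = d.size
    dbar = d.mean()
    # long-run variance of dbar from the first h-1 autocovariances
    gamma = [np.sum((d[k:] - dbar) * (d[:n - k] - dbar)) / n for k in range(h)]
    var_dbar = max((gamma[0] + 2.0 * sum(gamma[1:])) / n, 1e-12)  # numerical floor
    dm = dbar / np.sqrt(var_dbar)
    # small-sample and horizon correction, compared to a t(n-1) distribution
    correction = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    dm_adj = correction * dm
    p = 2.0 * t_dist.sf(abs(dm_adj), df=n - 1)
    return dm_adj, p

rng = np.random.default_rng(0)
e_re = rng.normal(scale=1.1, size=200)  # hypothetical RE forecast errors
e_fb = rng.normal(scale=1.0, size=200)  # hypothetical FB forecast errors
stat, p = dm_test(e_re, e_fb, h=4)
```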
Table 4 shows the out-of-sample results for the five cases plus results from a bivariate
VAR(2) using inflation and GDP growth and a random walk forecast. The top row,
labeled RE, gives the root mean squared forecast error (RMSFE) of the RE forecasts for
inflation and GDP growth at the four different horizons. The remaining rows give the
relative RMSFE of the remaining forecasts compared to RE, with the DM test statistic
in parentheses below. The results here are mixed. The FB and CGL cases generate
consistent improvements over the RE model for inflation, with the largest gains observed
at short horizons. Compared to the VAR and the random walk, CGL and FB do best at
long horizons. For GDP growth, both FB and CGL do about the same as the RE model
at all horizons and improve relative to the VAR and random walk at the long horizons.
The results illustrate that the in-sample improvements in fit translate into out-of-
sample forecasting power and are not just evidence of overfitting. Furthermore, most of
the gains observed in this case are explained by the FB cases. The additional persistence
added by learning does not add much to out-of-sample forecasting power.
6 Understanding how RE restrictions affect fit
To illustrate how small changes in the reduced form can lead to large changes in model
fit, we turn to a calibrated version of the model. We consider the possible EE-FB and
IH-FB reduced form values that are implied by a range of the structural parameters
(θπ, θx, ρa, and ρe), while holding all other parameters fixed. We then use the range
of reduced form values generated by the two cases to solve for the θπ, θx, ρa, and ρe that
are necessary to generate the same reduced form values under RE. In particular, we
set Θ = (θπ, θx, ρa, ρe)′ and calculate G(Θ, Θ̄) for 1.4 < θπ < 1.6, 0.1 < θx < 0.3,
0.75 < ρa < 0.95, and 0.65 < ρe < 0.85, where Θ̄ = (1.5, 0.2, 0.85, 0.75)′ is fixed. We then
calculate ΘRE = F^{−1}(G(Θ, Θ̄)) and compare ΘRE to Θ.
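Numerically, the inversion ΘRE = F^{−1}(G(Θ, Θ̄)) can be carried out with a root-finder. The mapping f below is a simple invertible stand-in for the model's RE reduced-form mapping, which we do not reproduce here; only the inversion step is illustrated:

```python
import numpy as np
from scipy.optimize import least_squares

# Stand-in for the RE reduced-form mapping F (purely illustrative)
def f(theta):
    theta_pi, theta_x, rho_a, rho_e = theta
    return np.array([theta_pi * (1 - rho_a),       # placeholder reduced-form
                     theta_x + 0.5 * rho_e,        # coefficients
                     rho_a * theta_pi - theta_x,
                     rho_e / (1 + theta_x)])

target = f(np.array([1.5, 0.2, 0.85, 0.75]))       # reduced form at Theta_bar

# Invert: search for the structural vector that reproduces the target values
sol = least_squares(lambda th: f(th) - target,
                    x0=np.array([1.4, 0.1, 0.8, 0.7]),
                    bounds=([1.0, 0.0, 0.0, 0.0], [3.0, 1.0, 0.999, 0.999]))
print(np.round(sol.x, 3))   # should recover Theta_bar = (1.5, 0.2, 0.85, 0.75)
```

In the paper's exercise the same inversion is repeated over the whole grid of reduced-form values generated by the EE-FB and IH-FB cases.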
Figure 2 shows the resulting values of Q^{EE} and Q^{IH}, where the black dot represents
the value Q^{RE} implied by Θ̄. The remaining two columns show the original grid for
Θ in red and the implied RE values of ΘRE in blue. The EE case shows the effect that
a reduction in nonlinearity can potentially have on model fit. RE can cover the same
Table 4: Real-time forecast results

Inflation (Annualized RMSFE)
            t (Nowcast)   t+1        t+4        t+6
RE          1.11          1.06       0.93       0.99

Relative to RE
RW          1.02          0.85**     0.80***    0.86**
            (2.21)        (-1.99)    (-2.58)    (-2.04)
Bi-VAR      0.84**        0.78***    1.36       1.18
            (-2.17)       (-2.98)    (3.67)     (1.81)
FB-EE       0.99**        0.99       0.98       0.99
            (-1.74)       (-0.61)    (-0.91)    (-0.46)
FB-IH       0.92**        0.88**     0.85**     0.86**
            (-2.01)       (-1.94)    (-2.12)    (-2.04)
CGL-EE      0.99          0.98       0.99       1.00
            (-0.98)       (-1.01)    (-0.53)    (-0.03)
CGL-IH      0.95*         0.88***    0.85**     0.84***
            (-1.39)       (0.88)     (-2.12)    (-2.76)

GDP growth (Annualized RMSFE)
            t (Nowcast)   t+1        t+4        t+6
RE          2.28          2.04       2.06       1.68

Relative to RE
RW          1.00          1.02       1.12       1.20
            (-0.01)       (0.86)     (1.64)     (4.29)
Bi-VAR      0.95          0.94       1.15       1.09
            (-0.96)       (-0.83)    (2.82)     (1.11)
FB-EE       1.11          1.03       0.98       0.98
            (1.01)        (2.46)     (-0.98)    (-0.46)
FB-IH       1.09          1.02       1.00       1.01
            (0.89)        (1.87)     (0.01)     (0.95)
CGL-EE      1.10          1.03       0.98       0.98
            (0.95)        (2.39)     (-0.74)    (0.07)
CGL-IH      1.09          1.02       0.99       0.99
            (0.85)        (1.97)     (-0.72)    (0.14)

*** p < 0.01, ** p < 0.05, * p < 0.1
Notes: TBD
reduced form values using a smaller range of parameters, but doing so constrains the
possible values that the AR coefficients of the shocks can take on. Under EE-FB, more
persistent shocks are permitted, while maintaining very similar values for the monetary
policy rule and identical values for the remaining parameters. This allows the EE-FB
case to generate more persistence by simply increasing the persistence of the exogenous
shocks without significantly affecting other parts of the reduced form.
In the IH-FB case, the mechanism on display is more extreme than in the EE case.
The IH-FB case spans a much larger range of the reduced form space for the same grid
of points. The space is so large that the RE solution is incapable of covering the same
area with parameter values that satisfy the Taylor principle or stationarity of the shock
processes. Of course, it could be the case that these ranges of the parameter space
are not empirically relevant. But this is not what we find. In fact, if no constraints are
placed on the reduced form when it is estimated, the estimation returns values that only
the IH case can generate for plausible structural parameter values.
The key takeaway from this exercise is that it is not necessary for the structural
parameters to change in any substantial way to fundamentally alter the way the model
fits the data. In fact, it is precisely because the changes to the structural parameters are
small that it is easy to attribute changes in fit to the persistence that the adaptive
learning strategy can generate.
Figure 2: The mapping from structural parameters to the reduced form

[Figure: two rows of panels, EE-CGL and IH-CGL, showing the implied reduced form values Q^{EE} and Q^{IH} (black dot: the Q^{RE} implied by Θ̄) together with the grid for Θ in red and the implied RE values ΘRE in blue.]

Notes: TBD
7 Robustness
To illustrate that the improvements observed in the FB case are robust, we replicate the
result in the model of Smets and Wouters (2007). We use the replication files provided by
the American Economic Review for the paper. Smets and Wouters estimate the model
using Bayesian techniques in Dynare. We replicate the benchmark case found in Table 1
and Table 1B of their paper.7 For the FB case, we use the RE solution of the model at the
posterior modes as the fixed belief function. We then estimate the structural parameters
around the fixed forecasting function by Bayesian methods in Dynare.
Table 5 provides selected parameter estimates and the marginal log-likelihood of the
two models. The parameter estimates shown in the table are those that can be mapped
into the main parameters of the Ireland model. The parameter labels are chosen to reflect
these connections. The full set of parameter estimates are given in the appendix.
Table 5 demonstrates that the key results obtained in the Ireland model hold in this
model. In particular, the parameter estimates are relatively unchanged and marginal
likelihood is higher in the FB case. In Slobodyan and Wouters (2012b), the average
improvement in marginal likelihood was 6.3, with a maximum of 18, across 5 different
CGL specifications. Therefore, the improvement observed here is consistent with the
improvement observed in the Ireland model studied in the last section.
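In log marginal likelihood terms, the improvement in Table 5 translates into a posterior odds ratio under equal prior model probabilities:

```python
import math

# Log Bayes factor of FB over RE from the marginal likelihoods in Table 5
log_ml_re, log_ml_fb = -924.8, -920.3
log_bf = log_ml_fb - log_ml_re
print(round(log_bf, 1), round(math.exp(log_bf)))  # 4.5 90
```

That is, posterior odds of roughly 90 to 1 in favor of the FB specification when the two models receive equal prior probability.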
8 Conclusion
This paper demonstrates that improvements in in-sample fit of New Keynesian DSGE
models that assume some type of bounded rationality relative to their rational coun-
terparts are in large part explained by the loosening of the restrictions imposed on the
structural parameters by rational expectations. This finding is in contrast to the prevail-
ing explanation given in the literature that the improvements are primarily due to an
increased ability to generate persistence in key endogenous variables. While increasing a
model's ability to generate persistence remains important, it accounts for a much smaller
portion of the overall improvements observed. Furthermore, we demonstrate that these
increases translate into modest gains in real-time out-of-sample forecast accuracy, and
that the loosened restrictions again explain a substantial portion of the improvements
observed for the same model under adaptive learning with a constant gain.
7Our results differ slightly from the results reported in Smets and Wouters (2007) for the RE case.
However, since we use their official replication files, we have no reason to believe that the observed
differences are anything more than numerical imprecision arising from running the estimation on different
versions of Matlab.
Table 5: FB in Smets and Wouters 2007

          Prior Distribution                Posterior Distribution
          Distr.   Mean    St. Dev         Mean    5% HPD   95% HPD
θπ        norm     1.5     0.25      RE    2.054   1.765    2.354
                                     FB    1.740   1.406    2.071
θx        norm     0.125   0.05      RE    0.217   0.173    0.262
                                     FB    0.190   0.145    0.234
π̄ × 4     gamma    2.5     0.4       RE    2.616   2.140    3.080
                                     FB    2.524   1.988    3.060
ρa        beta     0.5     0.2       RE    0.204   0.070    0.337
                                     FB    0.205   0.036    0.358
ρe        beta     0.5     0.2       RE    0.895   0.817    0.977
                                     FB    0.897   0.829    0.968
ρz        beta     0.5     0.2       RE    0.955   0.936    0.974
                                     FB    0.956   0.927    0.986
σa        invg     0.1     2         RE    0.242   0.205    0.282
                                     FB    0.259   0.232    0.286
σe        invg     0.1     2         RE    0.246   0.220    0.270
                                     FB    0.235   0.212    0.258
σz        invg     0.1     2         RE    0.432   0.386    0.474
                                     FB    0.443   0.397    0.488
σi        invg     0.1     2         RE    0.138   0.108    0.166
                                     FB    0.123   0.106    0.138

RE Marginal Likelihood: -924.8       Slobodyan and Wouters (JEDC 2012): RE -922.75
FB Marginal Likelihood: -920.3                                          CGL -922.65 / -916.4

Notes: TBD
The results suggest that the overall structure of the New Keynesian model is not
necessarily at odds with the data and that RE may actually be a key source of model mis-
specification. Bounded rationality strategies that separate the estimation of beliefs from
the structural parameters of the model may increase model fit and out-of-sample fore-
casting power by loosening the cross equation restrictions imposed on the structural
parameters by RE.
References
Adolfson, Malin, Jesper Lindé, and Mattias Villani, “Forecasting Performance of
an Open Economy DSGE Model,” Econometric Reviews, April 2007, 26 (2-4), 289–328.
Amato, Jeffery D and Thomas Laubach, “Rule-of-thumb behaviour and monetary
policy,” European Economic Review, 2003, 47 (5), 791–831.
Ball, Laurence, N Gregory Mankiw, and Ricardo Reis, “Monetary policy for
inattentive economies,” Journal of Monetary Economics, 2005, 52 (4), 703–725.
Berardi, Michele, Jaqueson K Galimberti et al., “On the initialization of adaptive
learning algorithms: A review of methods and a new smoothing-based routine,” Centre
for Growth and Business Cycle Research Discussion Paper Series, 2012, 175.
Clark, Todd E and Michael W McCracken, “Tests of equal predictive ability with
real-time data,” Journal of Business & Economic Statistics, 2009, 27 (4), 441–454.
and , “Advances in forecast evaluation,” working paper, 2011.
Cogley, Timothy and Argia M Sbordone, “Trend inflation, indexation, and inflation
persistence in the New Keynesian Phillips curve,” The American Economic Review,
2008, 98 (5), 2101–2126.
Cole, Stephen and Fabio Milani, “The Misspecification of Expectations in New Key-
nesian Models: A DSGE-VAR Approach,” Technical Report, 2014.
Croushore, Dean and Tom Stark, “A real-time data set for macroeconomists: Does
the data vintage matter?,” Review of Economics and Statistics, 2003, 85 (3), 605–617.
Diebold, Francis X and Robert S Mariano, “Comparing predictive accuracy,” Jour-
nal of Business & Economic Statistics, 1995, 13 (3).
, Frank Schorfheide, and Minchul Shin, “Real-Time Forecast Evaluation of
DSGE Models with Stochastic Volatility,” May 2015, pp. 1–24.
Evans, George W and Seppo Honkapohja, Learning and expectations in macroeco-
nomics, Princeton: Princeton University Press, January 2001.
Fuhrer, Jeffrey C, “Habit formation in consumption and its implications for monetary-
policy models,” American Economic Review, 2000, pp. 367–390.
Galí, Jordi and Mark Gertler, “Inflation dynamics: A structural econometric anal-
ysis,” Journal of Monetary Economics, 1999, 44 (2), 195–222.
Gaus, Eric and Srikanth Ramamurthy, “Estimation of Constant Gain Learning
Models,” mimeo, August 2012.
Harvey, David, Stephen Leybourne, and Paul Newbold, “Testing the equality
of prediction mean squared errors,” International Journal of Forecasting, 1997, 13,
281–291.
Hommes, Cars, Behavioral rationality and heterogeneous expectations in complex eco-
nomic systems, Cambridge University Press, 2013.
Ireland, Peter N, “Technology Shocks in the New Keynesian Model,” The Review of
Economics and Statistics, November 2004, 86 (4), 923–936.
Kolasa, Marcin, Michal Rubaszek, and Pawel Skrzypczynski, “Putting the New
Keynesian DSGE model to the real-time forecasting test,” Journal of Money, Credit
and Banking, October 2012, 44 (7), 1301–1324.
Mankiw, N Gregory and Ricardo Reis, “Sticky Information versus Sticky Prices:
A Proposal to Replace the New Keynesian Phillips Curve,” Quarterly Journal of Eco-
nomics, 2002, pp. 1295–1328.
Milani, Fabio, “A Bayesian DSGE Model with Infinite-Horizon Learning: Do “Mechan-
ical” Sources of Persistence Become Superfluous?,” International Journal of Central
Banking, 2006.
, “Expectations, learning and macroeconomic persistence,” Journal of Monetary Eco-
nomics, 2007, 54 (7), 2065–2082.
Negro, Marco Del, Frank Schorfheide, Frank Smets, and Rafael Wouters, “On
the Fit of New Keynesian Models,” Journal of Business and Economic Statistics, April
2007, 25 (2), 123–143.
Preston, Bruce, “Learning about Monetary Policy Rules when Long-Horizon Expec-
tations Matter,” International Journal of Central Banking, 2005.
Rotemberg, Julio J, “Monopolistic price adjustment and aggregate output,” The Re-
view of Economic Studies, 1982, 49 (4), 517–531.
Slobodyan, Sergey and Raf Wouters, “Learning in a Medium-Scale DSGE Model
with Expectations Based on Small Forecasting Models,” American Economic Journal:
Macroeconomics, April 2012, 4 (2), 65–101.
and , “Learning in an estimated medium-scale DSGE model,” Journal of Economic
Dynamics and control, 2012, 36 (1), 26–46.
Smets, Frank and Rafael Wouters, “Shocks and Frictions in US Business Cycles: A
Bayesian DSGE Approach,” The American Economic Review, 2007, pp. 586–606.
, Anders Warne, and Rafael Wouters, “Professional forecasters and real-time
forecasting with a DSGE model,” International Journal of Forecasting, October 2014,
30 (4), 981–995.
Stock, James H and Mark W Watson, “Median unbiased estimation of coefficient
variance in a time-varying parameter model,” Journal of the American Statistical As-
sociation, 1998, 93 (441), 349–358.
Appendix A
In what follows, we describe the model and derive household and firm decision rules
following Preston (2005) that depend on expectations but not on any explicit assumption
for how expectations are formed. This gives us a general setting from which the
consequences of different expectation assumptions on the reduced form may be tracked
systematically.
Households seek to maximize the following expected utility function
ĨE_{i,t} ∑_{t=0}^∞ β^t [ a_t ln(C_{i,t}) + ln(M_{i,t}/P_t) − η^{−1} h^η_{i,t} ]    (A1)
by choosing consumption, money holdings, labor supply, and by taking into account a
preference shock at, where ĨEt represents a general expectations operator that is yet to
be defined.8 The representative household is faced with the budget constraint
M_{i,t−1} + B_{i,t−1} + T_t + W_t h_{i,t} + ∆_{i,t} ≥ P_t C_{i,t} + B_{i,t}/r_t + M_{i,t},    (A2)
where M is nominal money balances, B is nominal bond holdings, T is transfers, W is the
nominal wage, and ∆ is the nominal profit the household receives from ownership of firms.
Production in the economy is separated into two sectors: a perfectly competitive
finished goods sector and a monopolistically competitive intermediate goods sector. The
finished goods sector uses a continuum of intermediate goods with prices P_j to construct
the finished good. The production function is a CES constant returns to scale technology
( ∫_0^1 Y_{j,t}^{(θ_t−1)/θ_t} dj )^{θ_t/(θ_t−1)} ≥ Y_t,    (A3)
where θ_t is a cost push shock.9 Profit maximization by the finished-good-producing
firms yields the demand for each intermediate good

Y_{j,t} = (P_{j,t}/P_t)^{−θ_t} Y_t.    (A4)
The finished goods price is given by

P_t = ( ∫_0^1 P_{j,t}^{1−θ_t} dj )^{1/(1−θ_t)}    (A5)
for all t. The intermediate-goods-producing firms hire h_{j,t} units of labor to manufacture
Y_{j,t} units of output using

Z_t h_{j,t} ≥ Y_{j,t},    (A6)
where Z_t is an aggregate technology shock.10 To introduce price stickiness, it is assumed
that firms face an explicit cost of adjusting nominal prices following Rotemberg (1982),
measured in terms of finished goods:

(φ/2) ( P_{j,t}/(π̄ P_{j,t−1}) − 1 )² Y_t.    (A7)
8It is assumed that 0 < β < 1, η ≥ 1, and ln(a_t) = ρ_a ln(a_{t−1}) + ε_{a,t}.
9ln(θ_t) = (1 − ρ_θ) ln(θ) + ρ_θ ln(θ_{t−1}) + ε_{θ,t}.
10ln(Z_t) = ln(z) + ln(Z_{t−1}) + ε_{z,t}.
Lastly, the output gap is defined as the ratio between the actual and efficient levels of
output

x_t = (1/a_t)^{1/η} (Y_t/Z_t).    (A8)
Households’ decision rule
We begin by deriving the household’s optimal decision rule following Preston (2005). The
conditions of the household’s optimal decision are given by
atC−1i,t = βrtĨEi,tat+1C
−1i,t+1π
−t+1 (A9)
hη−1i,t = atC−1i,t Wt (A10)
Bi,t−1 +Wthi,t + ∆i,t = PtCi,t +Btr−1t , (A11)
where we have eliminate the variables dealing with money and transfers in the budget
constraint. Starting with budget constraint, we put it into real terms by dividing by the
price level11 and make it stationary using substitution to account for the unit root TFP
process Zt12
bi,t−1π−1t + ωtZthi,t + di,tZt = ci,tZt + bi,tr
−1t .
Then, rearranging and summing, we obtain the lifetime budget constraint

∑_{T=t}^∞ β^{T−t} c_{i,T} Z_t = ∑_{T=t}^∞ β^{T−t} (w_T Z_T h_{i,T} + d_{i,T} Z_t),

which allows us to divide out Z_t. Then, noting that w_T h_{i,T} + d_{i,T} = y_{i,T}, we have

∑_{T=t}^∞ β^{T−t} ĉ_{i,T} = ∑_{T=t}^∞ β^{T−t} ŷ_{i,T}.    (A12)
We then log-linearize the stationary Euler equation

ĉ_{i,t} = ĨE_{i,t} ĉ_{i,t+1} − (i_t − ĨE_{i,t} π̂_{t+1}) + (ρ_a − 1) â_t

and solve it backwards recursively to get

ĨE_{i,t} ĉ_{i,T+1} = ĉ_{i,t} + ĨE_{i,t} ∑_{s=t}^{T} ((i_s − π̂_{s+1}) − (ρ_a − 1) â_s).

11We assume w_t = W_t/P_t, D_t = ∆_t/P_t, b_t = B_t/P_t.
12We assume ω_t = W_t/(P_t Z_t), d_t = ∆_t/(P_t Z_t).
Then, summing and discounting this expectation, we get

ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} ĉ_{i,T+1} = (1/(1 − β)) ĉ_{i,t} − (1/(1 − β)) ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} ((i_T − π̂_{T+1}) − (ρ_a − 1) â_T).    (A13)
Combining Equations (A12) and (A13) yields the household's decision rule for consumption

ĉ_{i,t} = ĨE_{i,t} ∑_{T=t}^∞ β^{T−t} [ (1 − β) ŷ_{i,T+1} − (i_T − π̂_{T+1}) − (ρ_a − 1) â_T ].
Aggregating across households and using the log-linearized output gap, we obtain the
aggregate IS curve

x_t = −ω â_t + ĨE_t ∑_{T=t}^∞ β^{T−t} [ (1 − β)(x_{T+1} + ω ρ_a â_T) − (i_T − π̂_{T+1}) − (ρ_a − 1) â_T ],    (A14)

which is absent any assumption about how expectations are formed.
Firms’ decision rule
Firms seek to maximize the present value of the firm

ĨE_{j,t} ∑_{T=t}^∞ Q_{t,T} P_T Π_{j,T},    (A15)

where

Π_{j,T} = ( P_{j,T}/P_T − MC_{j,T}/P_T ) Y_{j,T} − (Φ/2) ( P_{j,T}/(π̄ P_{j,T−1}) − 1 )² Y_T    (A16)

and Q_{t,T} = β^{T−t} a_T/c_T. The first order condition of the firm's problem is
Φ ( P_{j,t}/(π̄ P_{j,t−1}) − 1 ) (1/(π̄ P_{j,t−1})) = ĨE_{j,t} [ (Q_{t,t+1} Y_{t+1})/(Q_{t,t} Y_t) Φ ( P_{j,t+1}/(π̄ P_{j,t}) − 1 ) P_{j,t+1}/(π̄ P²_{j,t}) ]    (A17)

+ θ_t ( P_{j,t}/P_t )^{−1−θ_t} ( (w_t θ_t − Z_t(θ_t − 1) P_{j,t}) / (P²_t Z_t) ).
Because we assume π̄ ≥ 1, the model does not have a steady state price level. Therefore,
to make the model stationary, we define P̂_{j,t+i} = P_{j,t+i}/P_{t+i} and π_{t+i} = P_{t+i}/P_{t+i−1} for
all i. Likewise, wages are made stationary by the substitution ω_t = W_t/(P_t Z_t).
Substituting in these definitions yields

Φ ( (P̂_{j,t} P_t)/(π̄ P̂_{j,t−1} P_{t−1}) − 1 ) (1/(π̄ P̂_{j,t−1} P_{t−1})) = ĨE_{j,t} [ (Q_{t,t+1} Y_{t+1})/(Q_{t,t} Y_t) Φ ( (P̂_{j,t+1} P_{t+1})/(π̄ P̂_{j,t} P_t) − 1 ) (P̂_{j,t+1} P_{t+1})/(π̄ P̂²_{j,t} P²_t) ]

+ ( P̂_{j,t} )^{−θ_t} ( θ_t ω_t + (1 − θ_t) P̂_{j,t} ).
Now, noting that market clearing implies C_t = Y_t, multiplying both sides by P_t, and
using the definition of inflation, we get

Φ ( (P̂_{j,t} Π_t)/(π̄ P̂_{j,t−1}) − 1 ) Π_t/(π̄ P̂_{j,t−1}) = ĨE_{j,t} [ (a_{t+1}/a_t) Φ ( (P̂_{j,t+1} Π_{t+1})/(π̄ P̂_{j,t}) − 1 ) (P̂_{j,t+1} Π_{t+1})/(π̄ P̂²_{j,t}) ]

+ ( P̂_{j,t} )^{−θ_t} ( θ_t ω_t + (1 − θ_t) P̂_{j,t} ).
Log-linearizing this expression gives

−Φ p_{j,t−1} + (Φ(β + 1) − 1 + θ̄) p_{j,t} − Φβ ĨE_{j,t} p_{j,t+1} = Φβ ĨE_{j,t} π_{t+1} − Φ π_t − ω̂_t(1 − θ̄) − θ̂_t,    (A18)

and introducing lag polynomials, we can write this as

( (1/β) L² − ((Φ(β + 1) − 1 + θ̄)/(Φβ)) L + 1 ) ĨE_{j,t} p_{j,t+1} = (1/β)(π_t − β ĨE_{j,t} π_{t+1}) + ((1 − θ̄)/(βΦ)) ω̂_t + (1/(Φβ)) θ̂_t.    (A19)
Factoring the lag polynomial and solving forward on the unstable root yields

(1 − λ1 L) p_{j,t} = ( (−λ2^{−1} L^{−1})/(1 − λ2^{−1} L^{−1}) ) (L/β) ( π_t − β ĨE_{j,t} π_{t+1} + ((1 − θ̄)/Φ) ω̂_t + (1/Φ) e_t )

= −λ1 ĨE_{j,t} ∑_{T=t}^∞ (λ1β)^{T−t} π_T + λ1β ĨE_{j,t} ∑_{T=t}^∞ (λ1β)^{T−t} π_{T+1}

+ λ1 ĨE_{j,t} ∑_{T=t}^∞ (λ1β)^{T−t} (ψ x_T − e_T),
where (θ̄ − 1)η Φ^{−1} ω̂_t = ψ x_t, Φ^{−1} θ̂_t = e_t, 0 < λ1 < 1, λ2 > 1, λ2 = 1/(λ1β), and λ1 + λ2 = (Φβ)^{−1}(Φ(β + 1) − 1 + θ̄). Combining terms, we have13
(1 − λ1 L) p_{j,t} = −π_t + λ1 ĨE_{j,t} ∑_{T=t}^∞ (λ1β)^{T−t} ( ((1 − λ1)/λ1) π_T + ψ x_T − e_T ).
Finally, aggregating across firms yields the aggregate Phillips curve free of any expectations
assumptions

π_t = λ1 ĨE_t ∑_{T=t}^∞ (λ1β)^{T−t} ( ((1 − λ1)/λ1) π_T + ψ x_T − e_T ).    (A20)
Monetary policy
The model is closed with a standard contemporaneous Taylor rule
i_t = r̄ + π̄ + θπ(π_t − π̄) + θx x_t + ε_{i,t},    (A21)

where ε_{i,t} is an i.i.d. monetary policy shock.
13Noting that −λ1 ∑ (λ1β)^{T−t} π_T = −λ1 π_t − λ1²β π_{t+1} − λ1³β² π_{t+2} − . . . , that λ1β ∑ (λ1β)^{T−t} π_{T+1} = λ1β π_{t+1} + (λ1β)² π_{t+2} + (λ1β)³ π_{t+3} + . . . , and that their sum is −λ1 π_t + λ1β(1 − λ1) π_{t+1} + (λ1β)²(1 − λ1) π_{t+2} + (λ1β)³(1 − λ1) π_{t+3} + . . . .