Practical DSGE models and topics
Fabio Canova
BI Norwegian Business School, CAMP, FSBF, and CEPR
August 2017
Outline

• A refresher and identification issues.
• DSGE-VARs and data selection.
• Problems with standard priors and measurement errors.
• Data-rich DSGEs (proxies, multiple data, conjunctural information, mixed frequency, indicators of future variables).
• Dealing with trends and non-balanced growth.
• Prior elicitation and prior predictive analysis.
• Time-varying coefficient DSGE models.
• Estimating DSGE models with structural breaks and occasionally binding constraints.
• Estimating pruned higher-order models.
• Measuring, assessing, and dealing with misspecification.
• Sequential MC methods.
• Estimating general nonlinear DSGE models. The particle filter.
References
Aguiar, M. and G. Gopinath (2007). Emerging market business cycles: The cycle is the trend. Journal of Political Economy, 115, 69-102.
An, S. and F. Schorfheide (2007). Bayesian analysis of DSGE models. Econometric Reviews, 26, 113-172.
Andreasen, M. (2012). On the effects of rare disasters and uncertainty shocks for risk premia in non-linear DSGE models. Review of Economic Dynamics, 15, 293-316.
Andrle, M. and J. Benes (2013). System priors: Formulating priors about DSGE models' properties. IMF working paper WP/13/257.
Ascari, G. and P. Bonomolo (2015). Does inflation walk on unstable paths? Riksbank manuscript.
Beaudry, P. and F. Portier (2006). Stock prices, news and economic fluctuations. American Economic Review, 96, 1293-1307.
Beyer, A. and R. Farmer (2007). Testing for indeterminacy: An application to U.S. monetary policy: Comment. American Economic Review, 97, 524-529.
Bi, H. and N. Traum (2014). Estimating fiscal limits: the case of Greece. Journal of Applied Econometrics, 29, 1053-1072.
Bianchi, F. and L. Melosi (2016). Modeling the evolution of expectations and uncertainty in general equilibrium. International Economic Review, 57, 717-756.
Boivin, J. and M. Giannoni (2006). DSGE estimation in data rich environments. University of Montreal working paper.
Canova, F. (1998). Detrending and business cycle facts. Journal of Monetary Economics, 41, 475-540.
Canova, F. (2007). Methods for Applied Macroeconomic Research. Princeton University Press, Princeton, NJ.
Canova, F. (2009a). How much structure in empirical models? In T. Mills and K. Patterson (eds.), Palgrave Handbook of Econometrics, volume 2, 30-65.
Canova, F. (2009b). What explains the great moderation in the U.S.? A structural analysis. Journal of the European Economic Association, 7, 697-721.
Canova, F. (2014). Bridging DSGE models and the raw data. Journal of Monetary Economics, 67, 1-15.
Canova, F. and F. Ferroni (2011). Multiple filtering devices for the estimation of DSGE models. Quantitative Economics, 2, 73-98.
Canova, F., Ferroni, F. and C. Matthes (2014). Choosing the variables to estimate singular DSGE models. Journal of Applied Econometrics, 29, 1099-1117.
Canova, F., Ferroni, F. and C. Matthes (2015). Approximating time varying structural models with time invariant structures. CEPR working paper 10803.
Canova, F. and C. Matthes (2016). A composite likelihood approach for dynamic structural models. Manuscript.
Canova, F. and E. Pappa (2007). Price dispersion in monetary unions: the role of fiscal shocks. Economic Journal, 117, 713-737.
Canova, F. and M. Paustian (2011). Business cycle measurement with some theory. Journal of Monetary Economics, 48, 345-361.
Canova, F. and L. Sala (2009). Back to square one: identification issues in DSGE models. Journal of Monetary Economics, 56, 431-449.
Chang, Y., Kim, S. and F. Schorfheide (2013). Labor market heterogeneity and the policy-(in)variance of DSGE model parameters. Journal of the European Economic Association, 11, 193-220.
Chari, V., Kehoe, P. and E. McGrattan (2007). Business cycle accounting. Econometrica, 75, 781-836.
Chari, V., Kehoe, P. and E. McGrattan (2008). Are structural VARs with long run restrictions useful in developing business cycle theory? Journal of Monetary Economics, 55, 1137-1355.
Chari, V., Kehoe, P. and E. McGrattan (2009). New Keynesian models: not yet useful for policy analysis. American Economic Journal: Macroeconomics, 1, 242-266.
Cogley, T. and T. Yagihashi (2010). Are DSGE approximating models invariant to policy shifts? The B.E. Journal of Macroeconomics: Contributions, 10, article 27.
Cogley, T., Matthes, C. and A. Sbordone (2015). Optimized Taylor rules for disinflation when agents are learning. Journal of Monetary Economics, 72, 131-147.
Cogley, T. and T. Sargent (2005). Drifts and volatilities: monetary policy and outcomes in post WWII US. Review of Economic Dynamics, 8, 262-302.
Curdia, V. and R. Reis (2009). Correlated disturbances and US business cycles. NBER working paper 15744.
Davig, T. and E. Leeper (2006). Endogenous monetary policy regime changes. NBER International Seminar in Macroeconomics, 345-391. National Bureau of Economic Research.
Del Negro, M. and F. Schorfheide (2004). Priors from general equilibrium models for VARs. International Economic Review, 45, 643-673.
Del Negro, M. and F. Schorfheide (2008). Forming priors for DSGE models (and how it affects the assessment of nominal rigidities). Journal of Monetary Economics, 55, 1191-1208.
Del Negro, M. and F. Schorfheide (2009). Monetary policy analysis with potentially misspecified models. American Economic Review, 99, 1415-1450.
Dew Becker, I. (2014). Bond pricing with a time varying price of risk in an estimated medium-scale DSGE model. Journal of Money, Credit and Banking, 46, 837-888.
Dueker, M., Fisher, A. and R. Dittman (2006). Stochastic capital depreciation and the co-movements of hours and productivity. Berkeley Journals: Topics in Macroeconomics, 6, article 6.
Eklund, J., Harrison, R., Kapetanios, G. and A. Scott (2008). Breaks in DSGE models. Manuscript.
Foroni, C. and M. Marcellino (2014). Mixed-frequency structural models: identification, estimation, and policy analysis. Journal of Applied Econometrics, 29, 1118-1144.
Faust, J. (2009). The new macro models: washing our hands and watching for icebergs. Sveriges Riksbank Economic Review, 1, 45-68.
Faust, J. and A. Gupta (2012). Posterior predictive analysis for evaluating DSGE models. NBER working paper 17906.
Ferroni, F., Grassi, S. and M. Leon Ledesma (2015). Fundamental shock selection in DSGE models. Forthcoming, Journal of Applied Econometrics.
Fernandez Villaverde, J. and J. Rubio Ramirez (2007). How structural are structural parameters? NBER Macroeconomics Annual, 22, 83-132.
Fernandez Villaverde, J., P. Guerron Quintana, and J. Rubio Ramirez (2011). Risk matters: the real effects of volatility shocks. American Economic Review, 101, 2530-2563.
Ferrante, F. (2015). Endogenous loan quality. Federal Reserve Board, manuscript.
Gertler, M. and P. Karadi (2010). A model of unconventional monetary policy. Journal of Monetary Economics, 58, 17-34.
Guerron Quintana, P. (2010). What you match does matter: the effects of data on DSGE estimation. Journal of Applied Econometrics, 25, 774-804.
Gorodnichenko, Y. and S. Ng (2010). Estimation of DSGE models when the data are persistent. Journal of Monetary Economics, 57, 325-340.
Guerrieri, L. and M. Iacoviello (2015). OccBin: a toolkit for solving models with occasionally binding constraints easily. Journal of Monetary Economics, 70, 22-38.
Hansen, L. and T. Sargent (1993). Seasonality and approximation errors in rational expectations models. Journal of Econometrics, 55, 21-55.
Hansen, L. and T. Sargent (2010). Wanting robustness in macroeconomics. In B. Friedman and M. Woodford (eds.), Handbook of Monetary Economics, 3, 1097-1157.
Huang, N. (2014). Weak inference for DSGE models with time varying parameters. Boston College, manuscript.
Hurwicz, L. (1962). On the structural form of interdependent systems. In E. Nagel, P. Suppes, A. Tarski (eds.), Logic, Methodology and Philosophy of Science: Proceedings of a 1960 International Congress. Stanford University Press.
Kadane, J., Dickey, J., Winkler, R., Smith, W. and S. Peters (1980). Interactive elicitation of opinion for a normal linear model. Journal of the American Statistical Association, 75, 845-854.
Kim, J. (2003). Functional equivalence between intertemporal and multisectoral investment adjustment costs. Journal of Economic Dynamics and Control, 27, 533-549.
Kulish, M. and A. Pagan (2016). Estimation and solution of models with expectations and structural changes. Forthcoming, Journal of Applied Econometrics.
Komunjer, I. and S. Ng (2011). Dynamic identification of DSGE models. Econometrica, 79, 1995-2032.
Koop, G., Pesaran, H. and R. Smith (2013). On the identification of Bayesian DSGE models. Journal of Business and Economic Statistics, 31, 300-314.
Inoue, A., Kuo, C. and B. Rossi (2016). Identifying the sources of model misspecification. Forthcoming, Journal of Monetary Economics.
Ireland, P. (2004). A method for taking models to the data. Journal of Economic Dynamics and Control, 28, 1205-1226.
Ireland, P. (2007). Changes in the Federal Reserve inflation target: causes and consequences. Journal of Money, Credit and Banking, 39, 1851-1882.
Iskrev, N. (2010). Local identification in DSGE models. Journal of Monetary Economics, 57, 189-202.
Justiniano, A. and G. Primiceri (2008). The time-varying volatility of macroeconomic fluctuations. American Economic Review, 98, 604-641.
Leeper, E., Traum, N. and T. Walker (2015). Clearing up the fiscal multiplier morass. Forthcoming, American Economic Review.
Liu, Z., Waggoner, D. and T. Zha (2011). Sources of macroeconomic fluctuations: a regime switching DSGE approach. Quantitative Economics, 2, 251-301.
Lombardi, M. and G. Nicoletti (2012). Bayesian prior elicitation in DSGE models: macro vs micro priors. Journal of Economic Dynamics and Control, 36, 294-313.
Lombardo, G. and A. Sutherland (2007). Computing second order accurate solutions for rational expectation models using linear solution methods. Journal of Economic Dynamics and Control, 31, 515-530.
Lubik, T. and F. Schorfheide (2004). Testing for indeterminacy: an application to U.S. monetary policy. American Economic Review, 94, 190-217.
Magnusson, L. and S. Mavroeidis (2014). Identification using stability restrictions. Econometrica, 82, 1799-1851.
Meier, S. and C. Sprenger (2015). Temporal stability of time preferences. Review of Economics and Statistics, 97, 273-286.
Mueller, U. (2012). Measuring prior sensitivity and prior informativeness in large Bayesian models. Journal of Monetary Economics, 59, 581-597.
Pagan, A. (2016). Some shocking consequences of using measurement error shocks when estimating time series models. Forthcoming, Oxford Bulletin of Economics and Statistics.
Parkin, M. (1988). A method for determining whether the parameters in aggregative models are structural. Carnegie Rochester Conference Series on Public Policy, 29, 215-252.
Primiceri, G. (2005). Time varying structural vector autoregressions and monetary policy. Review of Economic Studies, 72, 821-852.
Qu, Z. and D. Tkachenko (2012). Identification and frequency domain maximum likelihood estimation of linearized dynamic stochastic general equilibrium models. Quantitative Economics, 3, 95-112.
Rios Rull, J. V. and R. Santaeulalia-Llopis (2010). Redistributive shocks and productivity shocks. Journal of Monetary Economics, 57, 931-948.
Rubio, J., Waggoner, D. and T. Zha (2010). Structural vector autoregressions: theory of identification and algorithms for inference. Review of Economic Studies, 77, 665-696.
Schmitt Grohe, S. and M. Uribe (2003). Closing small open economy models. Journal of International Economics, 62, 161-185.
Seoane, H. (2014). Parameter drifts, misspecification and the real exchange rate in emerging countries. Forthcoming, Journal of International Economics.
Stock, J. and M. Watson (1996). Evidence on structural instability in macroeconomic relationships. Journal of Business and Economic Statistics, 14, 11-30.
Stock, J. and M. Watson (2002). Macroeconomic forecasting using diffusion indices. Journal of Business and Economic Statistics, 20, 147-162.
Smets, F. and R. Wouters (2003). An estimated dynamic stochastic general equilibrium model of the euro area. Journal of the European Economic Association, 1, 1123-1175.
Smets, F. and R. Wouters (2007). Shocks and frictions in the US business cycle: a Bayesian DSGE approach. American Economic Review, 97, 586-606.
Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Annals of Statistics, 22, 1701-1762.
Watson, M. (1993). Measures of fit for calibrated models. Journal of Political Economy, 101, 1011-1041.
Vavra, J. (2014). Time varying Phillips curves. NBER working paper 19790.
1 Bayesian analysis: A refresher

• Bayesian econometrics is based on Bayes theorem. It tells us how to modify prior beliefs about the parameters once we observe the data.

• Parameters θ ∈ A, A compact. Prior information g(θ). Sample information f(y|θ) ≡ L(θ|y). Bayes theorem:

g(θ|y) = f(y|θ)g(θ)/f(y) ∝ f(y|θ)g(θ) = L(θ|y)g(θ) ≡ ḡ(θ|y)   (1)

f(y) = ∫ f(y|θ)g(θ)dθ is the unconditional sample density (marginal likelihood); g(θ|y) is the posterior density; ḡ(θ|y) is the posterior kernel: g(θ|y) = ḡ(θ|y)/∫ ḡ(θ|y)dθ.

• f(y) is a measure of fit. It tells us how good the model is at reproducing the data, on average over the parameter values with positive prior probability.

• g(θ|y) is the conditional probability of θ given y.

• The theorem uses the rule P(A,B) = P(A|B)P(B) = P(B|A)P(A). To use Bayes theorem we need to:

a) Formulate prior beliefs, i.e. choose g(θ).
b) Formulate a model for the data (the conditional probability f(y|θ)).

• Bayes theorem with two (N) samples.

Suppose yt = [y1t, y2t] and that y1t is independent of y2t. Then

ḡ(θ|y1, y2) ∝ f(y1, y2|θ)g(θ) = f2(y2|θ)f1(y1|θ)g(θ) ∝ f2(y2|θ)g(θ|y1),  where g(θ|y1) ∝ f1(y1|θ)g(θ)   (2)

The posterior for θ is obtained by finding first the posterior given y1t and then, treating that posterior as a prior, finding the posterior given y2t.

- Sequential learning.
- y1t, y2t could be data from different regimes.
- y1t, y2t could be data from different countries.
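The sequential-learning logic in (2) can be checked numerically. A minimal sketch with a conjugate normal-normal model (all numbers illustrative, not from the slides): updating on the pooled sample gives exactly the same posterior as updating on y1 first and then using that posterior as the prior for y2.

```python
import numpy as np

def normal_update(prior_mean, prior_var, data, noise_var):
    """One conjugate normal-normal update for the mean of
    y ~ N(theta, noise_var), given a N(prior_mean, prior_var) prior."""
    post_var = 1.0 / (1.0 / prior_var + len(data) / noise_var)
    post_mean = post_var * (prior_mean / prior_var + data.sum() / noise_var)
    return post_mean, post_var

rng = np.random.default_rng(0)
y1 = rng.normal(1.0, 1.0, size=50)   # first sample (e.g. regime 1)
y2 = rng.normal(1.0, 1.0, size=50)   # second, independent sample

# one-shot update on the pooled data ...
m_all, v_all = normal_update(0.0, 10.0, np.concatenate([y1, y2]), 1.0)
# ... equals updating on y1, then treating that posterior as the prior for y2
m1, v1 = normal_update(0.0, 10.0, y1, 1.0)
m12, v12 = normal_update(m1, v1, y2, 1.0)
print(np.isclose(m_all, m12), np.isclose(v_all, v12))  # True True
```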
1.1 Likelihood Selection
• The likelihood is the theoretical (DSGE) model you write down.

• It must represent the data well. Misspecification is problematic since it spills across equations and makes estimates uninterpretable. Bayesian analysis is meaningful even if the likelihood is misspecified.
1.2 Prior Selection
• Three methods to choose priors.

1) Non-informative subjective. Choose reference priors because they are invariant to the parametrization.

- Location-invariant prior: g(θ) = constant (= 1 for convenience).
- Scale-invariant prior: g(σ) = σ⁻¹.
- Location-scale invariant prior: g(θ, σ) = σ⁻¹.

• Useful because many classical estimators (OLS, ML, etc.) are Bayesian estimators with non-informative priors.

2) Conjugate priors.

A prior is conjugate if the posterior has the same form as the prior. Because the posterior shape is analytically available, we only need to figure out the posterior moments.

• Can be used in models which are linear in the parameters.

• Important result: posterior moments = weighted average of sample and prior information. Weights = relative precision of sample and prior information.
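A minimal sketch of the precision-weighting result, for a normal likelihood with known variance and a normal prior (all numbers illustrative): the posterior mean is a precision-weighted average of the prior mean and the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n = 4.0, 25
y = rng.normal(2.0, np.sqrt(sigma2), size=n)   # data

mu0, tau2 = 0.0, 0.5          # prior mean and variance (illustrative)
prior_prec = 1.0 / tau2       # prior precision
data_prec = n / sigma2        # sample precision

# posterior mean = precision-weighted average of prior mean and sample mean
w = prior_prec / (prior_prec + data_prec)
post_mean = w * mu0 + (1 - w) * y.mean()

# the same number from the standard conjugate formulas
post_var = 1.0 / (prior_prec + data_prec)
post_mean_direct = post_var * (mu0 / tau2 + y.sum() / sigma2)
print(np.isclose(post_mean, post_mean_direct))  # True
```

A tighter prior (smaller tau2) raises prior_prec and pulls the posterior mean toward mu0, which is exactly what the tight-prior/loose-prior figure illustrates.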
[Figure: prior and posterior densities under a tight prior (left panel) and a loose prior (right panel).]
3) Objective priors and the ML-II approach. Based on the marginal likelihood:

f(y) = ∫ L(θ|y)g(θ)dθ ≡ L(y|g)   (3)

Given L(θ|y), L(y|g) reflects the plausibility of g in the data. If g1 and g2 are two priors and L(y|g1) > L(y|g2), there is better support for g1. Hence, we can estimate the "best" g using L(y|g).

In practice, set g(θ) = g(θ|γ), where γ = hyperparameters (e.g. the mean and the variance of the prior). Then L(y|g) ≡ L(y|γ).

The γ that maximizes L(y|γ) is called the ML-II estimator, and g(θ|γ_ML) is the ML-II based prior.

Important:
- y1, …, yT should not be the same sample used for inference.
- y1, …, yT could represent past time series information, cross-sectional/cross-country information.
- y1, …, yT is called the "training sample".

4) Priors for DSGE?

- Assume that g(θ) = g1(θ1)g2(θ2)…gq(θq).
- Use a conventional shape for the distributions: a Normal, Beta, Gamma, Inverted Gamma, or Uniform for individual parameters. Choose moments in a data-based fashion: mean = calibrated parameters; variance: subjective.

Problems:

• Independent priors are inconsistent with subjective prior beliefs over joint outcomes. In particular, multivariate priors are often too tight!!

• The calibrated value may differ across purposes. For example, the risk-aversion mean is 6-10 to fit the equity premium; close to 1-2 if we want to fit the reaction of consumption to changes in monetary policy; negative values to fit aggregate lottery revenues. Which one do we use? Same for habit parameters (see Faust and Gupta, 2012).

• Circularity: priors based on the same data used to estimate!! Use calibrated values from a "training sample".

Del Negro and Schorfheide (2008): dependent priors can be obtained by matching statistics in a training sample.
Summary

Inputs of the analysis: g(θ), f(y|θ).

Outputs of the analysis:
g(θ|y) ∝ f(y|θ)g(θ) (posterior),
f(y) = ∫ f(y|θ)g(θ)dθ (marginal likelihood), and
f(y_{T+τ}|y_T) (predictive density of future observations).

- In simple examples, f(y) and g(θ|y) can be computed analytically.
- In DSGEs, they can only be computed numerically via a posterior simulator.

• Monte Carlo principle: approximate integrals with sums of random draws.

Example 1.1 Suppose we want to approximate E h(θ) = ∫ h(θ)g(θ|y)dθ. How do we do it?
- Draw θˡ in iid fashion from g(θ|y). Compute h(θˡ).
- Repeat for l = 1, …, L.
- Use E(h(θ)) ≈ (1/L) Σ_l h(θˡ).

h(θ) could be any continuous function of θ (moments, impulse responses, forecasts, etc.).
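The recipe in Example 1.1 can be sketched in a few lines. The posterior and the function h below are stand-ins chosen so that the exact answer is known: for θ|y ~ N(1, 0.5²) and h(θ) = θ², E h(θ) = 1² + 0.25 = 1.25.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 100_000
draws = rng.normal(1.0, 0.5, size=L)   # iid draws from the stand-in g(theta|y)
h = draws ** 2                         # h(theta^l) for each draw
mc_estimate = h.mean()                 # (1/L) * sum_l h(theta^l)
print(mc_estimate)                     # close to the exact value 1.25
```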
• Since g(θ|y) is not analytically available, we need to find a g_AP(θ|y) ≈ g(θ|y) that is easy to draw from.

• Normal approximations.
• Basic posterior simulators (acceptance and importance sampling).
• Markov chain Monte Carlo (MCMC) methods.
1.3 Normal approximation

If T is large, g(θ|y) ≈ f(θ|y). Taking a Taylor expansion around the mode θ*:

log g(θ|y) ≈ log g(θ*|y) + 0.5(θ − θ*)′ [∂² log g(θ|y)/∂θ∂θ′ |_{θ=θ*}] (θ − θ*)   (4)

Since g(θ*|y) is constant, letting Σ_{θ*} = −[∂² log g(θ|y)/∂θ∂θ′ |_{θ=θ*}]⁻¹:

g(θ|y) ≈ N(θ*, Σ_{θ*}) = g_AP(θ|y)   (5)

- An approximate 100(1−α)% highest credible set is θ* ± Φ⁻¹(1 − α/2) I(θ*)⁻⁰·⁵, where Φ(·) is the CDF of a standard normal.
- If θˡ is an iid draw from g_AP(θ|y), E(h(θ)) ≈ (1/L) Σ_l h(θˡ) and a 16-84 range is [h(θ)₁₆, h(θ)₈₄].
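Equations (4)-(5) translate into a simple recipe: locate the mode, take the second derivative of the log kernel there, and use its negative inverse as the approximating variance. A sketch with a stand-in kernel for which the answer is known (a Beta(3,5) kernel, mode 1/3, curvature −27, so the approximation is N(1/3, 1/27)):

```python
import numpy as np

# Log posterior kernel known only up to a constant (here a Beta(3,5) kernel,
# standing in for a DSGE posterior kernel we can evaluate pointwise).
def log_kernel(t):
    return 2 * np.log(t) + 4 * np.log(1 - t)

# 1. find the mode on a grid (in practice, a numerical optimizer)
grid = np.linspace(0.001, 0.999, 100_000)
mode = grid[np.argmax(log_kernel(grid))]

# 2. numerical second derivative of the log kernel at the mode
h = 1e-5
d2 = (log_kernel(mode + h) - 2 * log_kernel(mode) + log_kernel(mode - h)) / h**2

# 3. normal approximation g_AP = N(mode, -1/d2), cf. equations (4)-(5)
approx_var = -1.0 / d2
print(mode, approx_var)   # mode near 1/3, variance near 1/27 ~ 0.037
```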
• The approximation is valid under regularity conditions when T → ∞ or when the posterior kernel is roughly normal. It is highly inappropriate when:

- The likelihood function is flat in some dimension (I(θ*) badly estimated).
- The likelihood function is unbounded (no posterior mode exists).
- The likelihood function has multiple peaks.
- The likelihood function is asymmetric.
- θ* is on the boundary of A (quadratic approximation wrong).
- g(θ) = 0 in a neighborhood of θ* (quadratic approximation wrong).

• Corrections:

- If multiple modes are present, find an approximation to each mode and set g_AP(θ|y) = Σ_i ϱ_i N(θ*_i, Σ_{θ*_i}), where 0 ≤ ϱ_i ≤ 1. If the modes are clearly separated, select ϱ_i = g(θ*_i|y)|Σ_{θ*_i}|⁻⁰·⁵.
- If the sample is small, use a t-approximation, i.e. g_AP(θ|y) = Σ_i ϱ_i g(θ̃|y)[ν + (θ − θ*_i)′ Σ_{θ*_i}⁻¹(θ − θ*_i)]^{−0.5(k+ν)} with small ν. (If ν = 1, the t-distribution = Cauchy distribution; if ν = 2, the t-distribution = Laplace distribution. Typically ν = 4, 5 is appropriate.)

• Check the accuracy of the approximation:

- Compute the importance ratio IRˡ = ḡ(θˡ|y)/g_AP(θˡ|y), where ḡ(θˡ|y) is the kernel of the posterior (which you can always compute).
- Accuracy is good if IRˡ is constant across l. If not, other techniques are needed.
Example 1.2 True: g(θ|y) is t(0, 1, 2). Approximation: N(0, c), where c = 3, 5, 10, 100.

[Figure: histograms of the importance ratio weights for c = 3, 5, 10, 100. Horizontal axis = importance ratio weights; vertical axis = frequency of the weights.]

- The posterior has fat tails relative to a normal. Thus, the approximation is poor.
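The diagnostic behind Example 1.2 can be replicated directly. A sketch for two of the four values of c (densities written out by hand so the snippet is self-contained): draws come from the normal approximation, and the importance ratios are far from constant, signalling a poor g_AP.

```python
import numpy as np

rng = np.random.default_rng(0)

def t2_pdf(x):
    """Student-t density with 2 degrees of freedom (the true posterior)."""
    return (1 + x**2 / 2) ** (-1.5) / (2 * np.sqrt(2))

def norm_pdf(x, var):
    """N(0, var) density (the approximation g_AP)."""
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

L = 5000
for c in (3.0, 100.0):
    draws = rng.normal(0.0, np.sqrt(c), size=L)     # draws from g_AP = N(0, c)
    ir = t2_pdf(draws) / norm_pdf(draws, c)         # importance ratios IR^l
    print(f"c={c}: IR spread {ir.min():.3f} .. {ir.max():.3f}")
```

A constant IR across draws would indicate an accurate approximation; here the spread is wide for every c, matching the histograms in the figure.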
1.4 Markov Chain Monte Carlo Methods

• Idea: Suppose there are n states (x1, …, xn). Let P(i, j) = Pr(x_{t+1} = x_j | x_t = x_i) and let π(t) = (π_{1t}, …, π_{nt}) be the unconditional probabilities of the states at t. Then π(t+1) = π(t)P = π(0)Pᵗ, and π is an equilibrium (ergodic, steady state, invariant) distribution if π = πP.

- Set π = g(θ|y), choose some initial π(0) and some transition P. If the conditions are right, iterating from π(0) yields g(θ|y) as the limiting distribution.

[Figure: successive simulated densities g_MC(0), g_MC(1), … converging toward the posterior g(α|y).]

• What conditions do we need? P(θ, A), where A is a set, must be:

• irreducible, i.e. it has no absorbing state.
• aperiodic, i.e. it does not cycle across a finite number of states.
• Harris recurrent, i.e. each cell is visited an infinite number of times with probability one.

[Figure: bad draws (chain stuck in one region) vs. good draws (chain travelling across regions A and B).]

Result 1 (existence): A reversible Markov chain has an ergodic distribution.

Result 2 (Tierney (1994), uniqueness): If a Markov chain is Harris recurrent and has a proper invariant distribution π(θ), then π(θ) is unique.

Result 3 (Tierney (1994), convergence): If a Markov chain with invariant π(θ) is Harris recurrent and aperiodic, then for all θ⁰ ∈ A and all sets A, as L → ∞:
- ||P^L(θ⁰, A) − π(θ)|| → 0, where ||·|| is the total variation distance.
- For all h(θ) absolutely integrable with respect to π(θ): lim_{L→∞} (1/L) Σ_{l=1}^{L} h(θˡ) → ∫ h(θ)π(θ)dθ almost surely.

If the chain has a finite number of states, it is sufficient for irreducibility, aperiodicity and Harris recurrence that P(θˡ ∈ A1 | θ^{l−1} = θ⁰, y) > 0 for all θ⁰ and all A1 ∈ A.

• Can dispense with the finite-number-of-states assumption.
• Can dispense with the first-order Markov assumption.

General simulation strategy:

• Choose starting values θ⁰ and a P with the right properties.
• Run MCMC simulations.
• Check convergence.
• Summarize the simulation results, i.e. compute h(θ) and its moments.
1.4.1 Metropolis-Hastings algorithm

- A general-purpose algorithm to be used when faster methods (e.g. the Gibbs sampler) are unusable or difficult to implement.

Start from an arbitrary transition function q(θ†, θ^{l−1}), where θ^{l−1}, θ† ∈ A, and an arbitrary θ⁰ ∈ A. For each l = 1, 2, …, L:

- Draw θ† from q(θ†, θ^{l−1}) and draw ϖ ~ U(0, 1).
- If ϖ < E(θ^{l−1}, θ†) = [ḡ(θ†|y)q(θ†, θ^{l−1})] / [ḡ(θ^{l−1}|y)q(θ^{l−1}, θ†)], set θˡ = θ†.
- Else set θˡ = θ^{l−1}.

These iterations define a mixture of continuous and discrete transitions:

P(θ^{l−1}, θˡ) = q(θ^{l−1}, θˡ)E(θ^{l−1}, θˡ)              if θˡ ≠ θ^{l−1}
              = 1 − ∫_A q(θ^{l−1}, θ)E(θ^{l−1}, θ)dθ        if θˡ = θ^{l−1}   (6)

P(θ^{l−1}, θˡ) satisfies the conditions needed for existence, uniqueness and convergence.

• Idea: we want to sample from the highest-probability region but also to visit as much of the parameter space as possible. How to do it? Choose an initial vector and a candidate, and compute the kernel of the posterior at the two vectors. If you go uphill, keep the draw; otherwise, keep the draw with some probability.

How do you choose q(θ^{l−1}, θ†)?

- Typical choice: random walk chain. q(θ†, θ^{l−1}) = q(θ† − θ^{l−1}), and θ† = θ^{l−1} + v, where v ~ N(0, σ²_v). To get "reasonable" acceptance rates, adjust σ²_v. Often σ²_v = c · Σ, with Σ = [−ḡ″(θ*|y)]⁻¹. Choose c.

• A good q must be:
- easy to sample from.
- such that each move goes a reasonable distance in the parameter space but does not reject too frequently (ideal acceptance rate 25-40%).
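The random walk chain described above can be sketched as follows. The target is a stand-in posterior kernel (a N(2,1) kernel, known only up to a constant); with a symmetric q, the acceptance ratio E reduces to the ratio of kernels, so only uphill/downhill comparisons are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior kernel: N(2, 1), known up to a constant.
def log_kernel(t):
    return -0.5 * (t - 2.0) ** 2

def rw_metropolis(n_draws, step, theta0=0.0):
    """Random-walk MH: candidate = current + N(0, step^2) innovation.
    Symmetric q cancels, so accept with prob min(1, kernel ratio)."""
    draws = np.empty(n_draws)
    theta, accepted = theta0, 0
    for l in range(n_draws):
        cand = theta + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_kernel(cand) - log_kernel(theta):
            theta, accepted = cand, accepted + 1
        draws[l] = theta
    return draws, accepted / n_draws

draws, acc_rate = rw_metropolis(20_000, step=2.0)
kept = draws[5_000:]                 # discard the burn-in portion
print(kept.mean(), acc_rate)         # mean near 2; moderate acceptance rate
```

Tuning `step` plays the role of c·Σ above: too small and the chain crawls (acceptance near 1 but tiny moves); too large and almost every candidate is rejected.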
Implementation issues

A) How to draw posterior samples?

- Produce one sample (of dimension n·L + L̄). Throw away the initial L̄ observations. Keep only elements (L, 2L, …, n·L) (to eliminate the serial correlation of the draws).
- Produce n samples of L̄ + L elements. Use the last L observations in each sample for inference.
- Dynare is set up to produce n samples. By default it keeps the last 25 percent of the draws of each chain. Careful: you need to make sure that within the first 75 percent of the draws each chain has converged. This parameter can be adjusted.

B) How large should L̄ be?

- Start from different θ⁰. Check whether the sample you keep, for a given L̄, has the same properties (Dynare approach).
- Choose two points, L̄1 < L̄2; compute distributions/moments of θ after these points. If they are visually similar, the algorithm has converged at L̄1. Can do this recursively → CUMSUM statistic for the mean, variance, etc. (checks whether it settles down; no testing required).

For simple problems, L̄ ≈ 50 and L ≈ 200.
For DSGEs, L̄ ≈ 100,000-200,000 and L ≈ 500,000. If multiple modes are present, L could be larger.

C) How do you compute interesting statistics?

- Weak law of large numbers: E(h(θ)) ≈ (1/n) Σ_{j=1}^{n} h(θ^{jL}), where θ^{jL} is the j·L-th observation drawn after the L̄ burn-in iterations.
- E(h(θ)h(θ)′) = Σ_{τ=−J(L)}^{J(L)} w(τ)ACF_h(τ), where ACF_h(τ) is the autocovariance of h(θ) for draws separated by τ periods, J(L) is a function of L, and w(τ) is a set of weights.
- Marginal density of (θ¹_k, …, θ^L_k): g(θ_k|y) = (1/L) Σ_{j=1}^{L} g(θ_k | y, θ^j_{k′}, k′ ≠ k).
- Predictive inference: f(y_{t+τ}|y_t) = ∫ f(y_{t+τ}|y_t, θ)g(θ|y_t)dθ.
1.5 Model Comparison

• f(y), the marginal likelihood (ML), is a measure of fit.

• Bayes factor (BF): f(y|M1)/f(y|M2). Posterior odds (PO): [f(y|M1)g(M1)] / [f(y|M2)g(M2)], where g(M1), g(M2) are the priors on the models.

• Rule of thumb: if BF (PO) < 3, inconclusive; 3 < BF < 10 favours M1; BF > 10 strongly favours M1.

- M1 and M2 could be two structural models, two time series models, or one structural and one time series model.
- BIC is an asymptotic expansion of the BF.
- Can compare models with different numbers of parameters, or non-nested models.
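The rule of thumb can be illustrated in a toy case where both marginal likelihoods are available in closed form (the models and numbers are illustrative, not from the slides): a single observation y ~ N(θ, 1), with M1 fixing θ = 0 and M2 placing a N(0, 1) prior on θ, so f(y|M1) = N(y; 0, 1) and f(y|M2) = N(y; 0, 2).

```python
import numpy as np

def norm_pdf(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# M1: theta = 0 exactly => f(y|M1) = N(y; 0, 1).
# M2: theta ~ N(0, 1)  => marginalizing gives f(y|M2) = N(y; 0, 2).
for y in (0.5, 3.0):
    bf = norm_pdf(y, 1.0) / norm_pdf(y, 2.0)   # = sqrt(2) * exp(-y^2/4)
    print(f"y={y}: BF(M1 vs M2) = {bf:.2f}")
# y=0.5 gives BF ~ 1.33 (inconclusive); y=3.0 gives BF ~ 0.15,
# i.e. 1/BF ~ 6.7, favouring M2.
```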
1.6 Robustness

• Typically the prior is chosen to make calculations convenient. How sensitive are the results to the prior choice?

• Typical approach: repeat the estimation for different priors (inefficient).

• Alternative:

i) Select a prior g1(θ) with support included in that of g(θ).
ii) Let w(θ) = g1(θ)/g(θ). Then any h1(θ) = ∫ h(θ)w(θ)dg(θ|y) can be computed using h1(θ) ≈ Σ_l w(θˡ)h(θˡ) / Σ_l w(θˡ), where the h(θˡ) are the statistics computed with g(θ).

• Just need the original output and a new set of weights!
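A minimal sketch of this reweighting, in a conjugate setting where the answer under the new prior is known exactly (likelihood y ~ N(θ, 1), one observation y = 1; original prior g = N(0, 4), new prior g1 = N(0, 1); all numbers illustrative). Posterior draws obtained under g are reweighted by g1/g, with no re-estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(x, mean, var):
    return np.exp(-(x - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Posterior draws under the original prior g = N(0, 4): by conjugacy,
# theta | y ~ N(y * 4/5, 4/5) for y = 1 (these stand in for MCMC output).
y = 1.0
draws = rng.normal(y * 4 / 5, np.sqrt(4 / 5), size=200_000)

# Reweight to the new prior g1 = N(0, 1) using w = g1/g:
w = norm_pdf(draws, 0.0, 1.0) / norm_pdf(draws, 0.0, 4.0)
reweighted_mean = (w * draws).sum() / w.sum()

# Exact posterior mean under g1 is y * 1/2 = 0.5.
print(reweighted_mean)   # close to 0.5
```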
1.7 Bayesian estimation of DSGE models

Why do we use Bayesian methods to estimate DSGE models?

1) It is hard to include non-sample information in classical ML (apart from the range of possible values).
2) Classical ML is justified only if the model is the DGP of the actual data. Bayesian methods can be used for misspecified models (economic inference may be problematic, but there is no problem for statistical inference).
3) Can incorporate prior uncertainty about parameters and models.

Bayesian linear DSGE algorithm

Given some initial structural parameter vector θ0:

[1.] Construct a perturbed solution of the DSGE economy.
[2.] Transform the data to make it conformable to the model.
[3.] Compute the likelihood via the Kalman filter.
[4.] Specify prior distributions g(θ).
[5.] Draw posterior sequences for θ using the MH algorithm. Check convergence.
[6.] Compute the marginal likelihood and compare it to that of alternative models using Bayes factors.
[7.] Construct statistics of interest. Use loss-based evaluation of the model/data discrepancy.
[8.] Perform robustness exercises.
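Step [3.] can be sketched for the simplest possible case: a scalar state following the solution form x_t = J x_{t-1} + K e_t with a noisy observable. The numbers and the one-dimensional setup are illustrative, not from any particular model; the point is the prediction-error decomposition of the likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal state space mimicking a first-order DSGE solution:
#   state: x_t = J x_{t-1} + K e_t,  e_t ~ N(0, 1)
#   obs:   y_t = x_t + v_t,          v_t ~ N(0, R)
J, K, R = 0.9, 1.0, 0.1

def kalman_loglik(y, J, K, R):
    """Log likelihood via the Kalman filter (prediction-error decomposition)."""
    x, P = 0.0, K**2 / (1 - J**2)      # start at the unconditional moments
    ll = 0.0
    for obs in y:
        x_pred, P_pred = J * x, J * P * J + K**2   # prediction step
        F = P_pred + R                              # forecast-error variance
        v = obs - x_pred                            # prediction error
        ll += -0.5 * (np.log(2 * np.pi * F) + v**2 / F)
        gain = P_pred / F                           # update step
        x, P = x_pred + gain * v, (1 - gain) * P_pred
    return ll

# simulate data from the model, then evaluate the likelihood at two J values
T, x = 200, 0.0
y = np.empty(T)
for t in range(T):
    x = J * x + K * rng.standard_normal()
    y[t] = x + np.sqrt(R) * rng.standard_normal()

for J_try in (0.5, 0.9):
    print(J_try, kalman_loglik(y, J_try, K, R))
# the true J = 0.9 should attain the higher likelihood
```

In step [5.], an MH sampler would call this likelihood (times the prior) at every candidate θ.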
Variations

- If a higher than first-order solution is used, compute the likelihood with other filters or by direct inversion (see later on).
- If convergence is slow or problematic, use another MCMC algorithm (see later on).
- If the model is misspecified, use step [6.] to check for the least misspecified model.
- If the transformation of the data is problematic, opt for a flexible bridge filter (see later on).
1.8 Identification issues

• Prior to estimation one should ask: can we recover the structural parameters θ from the data?

- Identifiability: the mapping from the objective function to the structural parameters needs to be well behaved.
- The objective function must have a unique minimum at θ = θ0.
- The Hessian must be positive definite and have full rank.
- The curvature of the objective function must be "sufficient".

It is difficult to verify whether these conditions hold in practice in DSGEs because:

• The mapping from structural parameters to solution coefficients is unknown (numerical solution).
• The objective function is typically a nonlinear function of the solution parameters.
• Different objective functions may have different "identification power".

DSGE model optimality conditions:

Et[A(θ)x_{t+1} + B(θ)x_t + C(θ)x_{t−1} + D(θ)z_{t+1} + F(θ)z_t] = 0   (7)
z_{t+1} − G(θ)z_t − e_t = 0   (8)

Stationary (log-)linearized RE solution:

x_t = J(θ)x_{t−1} + K(θ)e_t   (9)

Definitions

• i) Solution identification problem: can we recover the structural θ from the decision rule matrices J(θ), K(θ), G(θ)?
• ii) Objective function identification: can we recover the aggregate decision rule matrices J(θ), K(θ), G(θ) from the objective function?
• iii) Population identification (convoluting i) and ii)): can we recover the structural parameters from the objective function in population?
• iv) Sample identification: can we recover the structural parameters from the objective function, given a sample of data?

- i) and ii) can occur separately or in conjunction.
- i) is model specific; ii) may result from an improper choice of objective functions.
- iv) may occur even if i) and ii) are fine. This is typically the object of much of the econometric literature.
- The problems with DSGE models are in i)-ii).
- Problems may be local or global.
• What kind of population problems may DSGE models encounter?

1: Observational equivalence

1.1) Linear RE forward-looking models:

B(θ)x_t = A(θ)Et x_{t+1} + e_t   (10)

where e_t ~ (0, Σ_e). Assume that B is non-singular. Solution:

x_t = Σ_{j=0}^{∞} Q(θ)^j B(θ)⁻¹ Et e_{t+j}   (11)

where Q(θ) = B(θ)⁻¹A(θ).

• Since Et e_{t+j} = 0 for j ≥ 1, Et x_{t+1} = 0, and the unique RE equilibrium is x_t = B(θ)⁻¹e_t.

• (10) is observationally equivalent to a model with no dynamics, i.e. y_t = M(θ)e_t, where M(θ) = B(θ)⁻¹.

• (10) is observationally equivalent to a model where the structural shocks are linear combinations of the original structural shocks, i.e. y_t = u_t, where u_t = B(θ)⁻¹e_t.

• (10) is observationally equivalent to a model with a higher degree of forward-lookingness, i.e. B(θ)y_t = A(θ)Et y_{t+n} + e_t, n > 1, or to a model with more complicated forward-looking dynamics, e.g. B(θ)y_t = Σ_{n=1}^{p} A_n(θ)Et y_{t+n} + e_t.
1.2) Linear RE forward- and backward-looking models:

B(θ)x_t = A(θ)Et x_{t+1} + C(θ)x_{t−1} + e_t   (12)

where e_t ~ (0, Σ_e). Still maintain that B(θ) is non-singular.

• Solution: x_t = D(θ)x_{t−1} + B(θ)⁻¹e_t, where D(θ) solves A(θ)D(θ)² − B(θ)D(θ) + C(θ) = 0.

• The solution is unique and stationary if the eigenvalues of D(θ) and of (B(θ) − A(θ)D(θ))⁻¹A(θ) are all less than one in absolute value.

• (12) is observationally equivalent to a model with just backward-looking dynamics.

Note: in (12) the parameter space may not be variation free, e.g. there may be restrictions on the parameter space (A(θ) + C(θ) = 1) and restrictions due to eigenvalue constraints.
Example 1.3 Consider the three processes (ρ2 ≥ 1 ≥ ρ1 ≥ 0):

1) x_t = [1/(ρ1 + ρ2)] Et x_{t+1} + [ρ1ρ2/(ρ1 + ρ2)] x_{t−1} + v_t.
2) y_t = ρ1 y_{t−1} + w_t.
3) y_t = (1/ρ1) Et y_{t+1}, where y_{t+1} = Et y_{t+1} + w_t and w_t is iid (0, σ²_w).

Stable RE solution of 1): x_t = ρ1 x_{t−1} + [(ρ1 + ρ2)/ρ2] v_t.
Stable RE solution of 3): y_t = ρ1 y_{t−1} + w_t.

If σ_w = [(ρ1 + ρ2)/ρ2] σ_v, the three processes have the same impulse responses.

- Beyer and Farmer (2007): A x_t + D Et x_{t+1} = B1 x_{t−1} + B2 E_{t−1}x_t + C v_t.
- Kim (2003, JEDC); Lubik and Schorfheide (2004, AER); An and Schorfheide (2007, ER).
2: Underidentification

Example 1.4

R_t = φπ_t + e1_t   (13)
y_t = Et y_{t+1} − σ(R_t − Et π_{t+1}) + e2_t   (14)
π_t = βEt π_{t+1} + γy_t + e3_t   (15)

B(θ) = [1 0 −φ; σ 1 0; 0 −γ 1],  A(θ) = [0 0 0; 0 1 σ; 0 0 β],

Q(θ) = B(θ)⁻¹A(θ) = [1/(σγφ + 1)] [0 φγ φ(β + σγ); 0 1 σ(1 − βφ); 0 γ β + σγ].

The two nonzero eigenvalues of Q(θ) are λ_i = [(1 + β + σγ) ± Δ]/[2(σγφ + 1)], i = 1, 2, where Δ = (β² − 2β + σ²γ² + 2σγ + 2σγβ − 4σγβφ + 1)^0.5. If λ_i < 1 for all i, the solution is

R_t = φπ_t + e1_t   (16)
y_t = −σR_t + e2_t   (17)
π_t = γy_t + e3_t   (18)

• β is not identifiable (it only appears in A(θ) and does not enter (16)-(18), which are used to compute the likelihood function).

• The eigenvalue formula implies restrictions on (β, γ, σ, φ). Thus, to keep λ_i < 1 as γ, σ, φ vary, β needs to be correspondingly adjusted.

- Even if β is calibrated, not all parameters are separately identifiable.
- Because of the stability restrictions, the posterior for β may be updated even if the likelihood is independent of β (see later).
Example 1.5 Consider a forward-looking version of the previous model:

R_t = φEt π_{t+1} + e1_t   (19)
y_t = Et y_{t+1} − σ(R_t − Et π_{t+1}) + e2_t   (20)
π_t = βEt π_{t+1} + γy_t + e3_t   (21)

The solution is x_t ≡ [R_t; y_t; π_t] = [1 0 0; −σ 1 0; −σγ γ 1] e_t.

• φ and β are underidentified (they disappear from the solution). We need a model with both backward- and forward-looking dynamics to identify them.

• Different impulse responses have different "identification" information. Limited- and full-information objective functions have different information content. How do we maximize the identification information?

• Identification may be "local", i.e. it depends on the values of θ.
3: Weak and partial identification
Consider the RBC model
max Σ_t β^t c_t^{1−φ}/(1−φ) (22)
c_t + k_{t+1} = k_t^η z_t + (1−δ)k_t (23)
RE solution for w_{t+1} = [k_{t+1}, c_t, y_t, z_t]': w_{t+1} = A w_t + B e_t.
- Select β = 0.985, φ = 2.0, ρ = 0.95, η = 0.36, δ = 0.025, z_ss = 1.
- Simulate long data, compute the population objective function and study its shape and features.
- The objective function is the distance between the (c_t, y_t) responses to technology shocks in the model and in the data (the model is the DGP).
[Figure: Distance surface for selected parameters, RBC — objective surfaces and contours in (ρ, φ) and (δ, β) space]
What causes the problems? The law of motion of the capital stock is almost invariant to:
(a) variations in φ (weak identification);
(b) variations in β and δ, which are additive (partial under-identification).
Can we reduce the problems by:
(i) changing W(T)? (long horizons may have little information);
(ii) matching VAR coefficients?
(iii) altering the objective function?
In this specific case: NO.
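The additivity in (b) can be illustrated with the steady state: in the RBC model above, the capital-output ratio depends on β and δ only through 1/β − 1 + δ, so any objective built on it has a ridge. A sketch, assuming the standard RBC steady state k/y = η/(1/β − 1 + δ):

```python
eta = 0.36

def ky(beta, delta):
    # steady-state capital-output ratio: beta, delta enter only via 1/beta - 1 + delta
    return eta / (1.0 / beta - 1.0 + delta)

target = ky(0.985, 0.025)          # "data" moment, model is the DGP

def objective(beta, delta):
    return (ky(beta, delta) - target) ** 2

# a different (beta, delta) pair with the same 1/beta - 1 + delta
kappa = 1 / 0.985 - 1 + 0.025
beta2 = 0.99
delta2 = kappa - (1 / beta2 - 1)
# both pairs sit on a zero ridge: (beta, delta) not separately identified
assert abs(objective(0.985, 0.025) - objective(beta2, delta2)) < 1e-14
```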
[Figure: Distance surface for selected parameters, RBC — alternative objective functions, surfaces and contours in (ρ, φ) and (δ, β) space]
Can we eliminate weak identification problems?
- Change the options of your optimization routine: set the tolerance level to 10^{-15} instead of the standard 10^{-8}.
- Start the optimization routine from many initial values.
Can we eliminate partial identification problems?
Standard solution: calibrate one of the two parameters. Problematic!!
[Figure: Fixing beta, RBC — distance contours in (δ, φ) space for β = .985 and β = .995, and in (δ, η) space]
Summing up 1
- Identification problems are intrinsic to the models and their parameterization.
- Detecting them is complicated because structural parameters enter nonlinearly and the solution is not analytically available.
- These are population problems. In small samples additional problems can emerge.
Identification and objective function
What objective function should one use to estimate? The likelihood!!
- It has all the information of the model.
- Using a distance function throws away useful identification information. If you use a subset of impulse responses, problems could be compounded.
- Better to add steady states back to the solution. Many parameters may enter only the steady states.
- What does a prior do? It can help with small sample identification problems, but not if they are there in population!!
[Figure: Likelihood and posterior, RBC — surfaces and contours in (δ, β) space]
- The posterior is not usually updated if the likelihood has no information.
- With stability constraints, updating is possible.
Identification and solution methods
• An and Schorfheide (2007): the likelihood function is better behaved (in terms of identifying the parameters) if a second order approximation is used. How about the distance function?
max E₀ Σ_t β^t [log(c_t − b c̄_{t−1}) − a_t N_t]
c_t = y_t = z_t N_t
b c̄_{t−1}: external habit; a_t: stationary labor supply shock; ln(z_t/z_{t−1}) ≡ u_zt: technology shock.
Linear solution (only labor supply shocks):
N_t = (b+β)N_{t−1} − bβN_{t−2} − (1−b)u_at (24)
- Sargent (1978), Kennan (1988): b and β are not separately identified.
Second order solution (only labor supply shocks):
N_t = bN_{t−1} + (b(b−1)/2)N²_{t−1} − (1−b)a_t − (1/2)(β(1−b)² + 1−b)a²_t
a_t = ρa_{t−1} + u_at
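The non-identification in (24) is easy to verify: b and β enter the linear dynamics only through their sum and product (the shock loading is absorbed by the free shock variance), while the quadratic terms of the second order solution break the symmetry. A minimal check:

```python
def linear_ar(b, beta):
    # AR coefficients of N_t = (b+beta) N_{t-1} - b*beta N_{t-2} - (1-b) u_t
    return (b + beta, -b * beta)

def quadratic_coef(b):
    # coefficient on N_{t-1}^2 in the second order solution: b(b-1)/2
    return b * (b - 1) / 2

# swapping b and beta leaves the linear dynamics unchanged ...
assert linear_ar(0.4, 0.9) == linear_ar(0.9, 0.4)
# ... but changes the second order terms, which restores identification
assert quadratic_coef(0.4) != quadratic_coef(0.9)
```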
[Figure: Distance function, linear vs. quadratic — ratio of curvatures of the responses to a labor supply shock, as a function of (ρ, b)]
Identification and estimation
What are the consequences of identification problems for estimation? What if we disregard identification issues and use a finite sample? Model:
y_t = (h/(1+h)) y_{t−1} + (1/(1+h)) E_t y_{t+1} + (1/φ)(i_t − E_tπ_{t+1}) + v1t (25)
π_t = (ω/(1+ωβ)) π_{t−1} + (β/(1+ωβ)) E_tπ_{t+1} + ((φ+ν)(1−ζβ)(1−ζ)/((1+ωβ)ζ)) y_t + v2t (27)
i_t = λ_r i_{t−1} + (1−λ_r)(λ_π π_{t−1} + λ_y y_{t−1}) + v3t (29)
h: degree of habit persistence (.85); φ: relative risk aversion (2); β: discount factor (.985); ω: degree of price indexation (.25); ζ: degree of price stickiness (.68);
λ_r, λ_π, λ_y: policy parameters (.2, 1.55, 1.1); v1t: AR(1) with ρ1 = .65; v2t: AR(1) with ρ2 = .65; v3t: i.i.d.
[Figure: Distance function, NK model — objective slices for β, φ, ν, ζ, λ_r, λ_π, λ_y, ρ1, ρ2, ω, h under IS, cost push, monetary policy, and all shocks]
[Figure: Distance function and contour plots, NK model — (ν, ξ) and (λ_y, λ_π) spaces, monetary and cost push shocks]
[Figure: Density estimates, monetary shocks, NK model — small sample distributions of the parameter estimates]
[Figure: Impulse responses, monetary shocks, NK model — responses of the gap, inflation and the interest rate to IS, cost push and monetary shocks]
NK model. Matching monetary policy shocks, bias

Param  True   Population  T=120   T=200   T=1000  T=1000 wrong
β      0.985  0.2         0.6     0.7     0.7     0.6
φ      2.00   0.7         95.2    70.6    48.6    400
ζ      0.68   0.1         19.3    17.5    23.5    23.7
λ_r    0.2    2.9         172.0   152.6   132.7   90.5
λ_π    1.55   32.5        98.7    78.4    74.5    217.5
λ_y    1.1    34.9        201.6   176.5   126.5   78.3
ρ1     0.65   13.1        30.4    34.3    31.0    31.3
ρ2     0.65   12.8        32.9    34.8    34.7    34.7
ω      0.25   0.01        238.9   232.3   198.1   284.0
h      0.85   0.04        30.9    32.4    21.3    100
• The bias for poorly identified parameters does not disappear as T → ∞.
Summing up 2
- Population biases are present.
- The distribution of the estimates is far from normal.
- Impulse responses are close to the true ones even if the parameters are not identified.
- Surface plots / numerical analysis can help to detect potential problems.
Wrong inference
0 = −k_{t+1} + (1−δ)k_t + δx_t
0 = −ψu_t + r_t
0 = (δη/r̄)x_t + (1 − δη/r̄)c_t − ηk_t − (1−η)N_t − ηu_t − e_zt
0 = −R_t + λ_r R_{t−1} + (1−λ_r)(λ_π π_t + λ_y y_t) + e_rt
0 = −y_t + ηk_t + (1−η)N_t + ηu_t + e_zt
0 = −N_t + k_t − w_t + (1+ψ)r_t
0 = E_t[(h/(1+h))c_{t+1} − c_t + (h/(1+h))c_{t−1} − ((1−h)/((1+h)φ))(R_t − π_{t+1})]
0 = E_t[(β/(1+β))x_{t+1} − x_t + (1/(1+β))x_{t−1} + (χ^{-1}/(1+β))q_t + (β/(1+β))e_{x,t+1} − (1/(1+β))e_{x,t}]
0 = E_t[π_{t+1} − R_t − q_t + β(1−δ)q_{t+1} + βr̄ r_{t+1}]
0 = E_t[(β/(1+βγ_p))π_{t+1} − π_t + (γ_p/(1+βγ_p))π_{t−1} + T_p(ηr_t + (1−η)w_t − e_zt + e_pt)]
0 = E_t[(β/(1+β))w_{t+1} − w_t + (1/(1+β))w_{t−1} + (β/(1+β))π_{t+1} − ((1+βγ_w)/(1+β))π_t + (γ_w/(1+β))π_{t−1} − T_w(w_t − ν_l N_t − (φ/(1−h))(c_t − hc_{t−1}) − e_wt)]
δ depreciation rate (.0182); ε_w wage markup (1.2); ψ capacity utilization parameter (.564); π̄ steady state inflation (1.016); η share of capital (.209); h habit persistence (.448); φ risk aversion (3.014); ν_l inverse elasticity of labor supply (2.145); β discount factor (.991); χ^{-1} investment's elasticity to Tobin's q (.15); ξ_p price stickiness (.887); ξ_w wage stickiness (.62); γ_p price indexation (.862); γ_w wage indexation (.221); λ_y response to y (.234); λ_π response to π (1.454); λ_r interest rate smoothing (.779)
T_p ≡ (1−βξ_p)(1−ξ_p)/((1+βγ_p)ξ_p)
T_w ≡ (1−βξ_w)(1−ξ_w)/((1+β)(1+(1+ε_w)ν_l ε_w^{-1})ξ_w)
[Figure: Objective function, monetary shocks — slices at δ = 0.018, η = 0.209, β = 0.991, h = 0.448, χ = 6.3, φ = 3.014, ν = 2.145, ψ = 0.564, ξp = 0.887, γp = 0.862, ξw = 0.62, γw = 0.221, εw = 1.2, λy = 0.234, λπ = 1.454, λr = 0.779, ρz = 0.997]
[Figure: Distance surface and contour plots — (γp, ξp), (γw, ξw), (γp, γw) and (ξp, ξw) spaces]
Experiment:
- Use population responses from a model with some features (e.g. with price stickiness and no price indexation).
- Ask: is it possible for a model with different features (e.g. no price stickiness and price indexation) to have impulse responses very close to the benchmark ones?
- Do they imply different welfare properties?

                ξp      γp      ξw      γw      Obj. fun.
Case 3, true:   0.887   0       0.62    0.8
x0 = lb + 1std  0.9264  0.3701  0.637   0.4919  3.5156e-07
x0 = lb + 2std  0.9076  0.2268  0.6415  0.154   3.51e-07
x0 = ub − 1std  0.9014  0.3945  0.6477  0       6.12e-07
x0 = ub − 2std  0.9263  0.3133  0.6294  0.4252  4.13e-07
Case 4, true:   0.887   0       0       0.221
x0 = lb + 1std  0.9186  0.3536  0.0023  0       4.7877e-07
x0 = lb + 2std  0.8994  0.234   0       0       3.06e-07
x0 = ub − 1std  0.905   0.3494  0.0021  0       4.14e-07
x0 = ub − 2std  0.9343  0.5409  0.0042  0       9.64e-07
Case 5, true:   0.887   0       0       0.221
x0 = lb + 1std  0.877   0.0123  0.0229  0       2.4547e-06
x0 = lb + 2std  0.8919  0.0411  0.0003  0       4.26e-07
x0 = ub − 1std  0.907   0.2056  0.001   0.0001  6.58e-07
x0 = ub − 2std  0.8839  0.0499  0.0189  0       2.46e-06
[Figure: Impulse responses, Case 4 — true vs. estimated responses of inflation, the interest rate, the real wage, investment, consumption, hours worked, output and capacity utilisation, 20 quarters after the shock]
Welfare costs are different!
L(π², y²) = −0.0005 with the true parameters.
L(π², y²) = −0.0022 with the estimated parameters.
Detecting identification problems:
• Many approaches in the literature. The traditional one uses ex-post diagnostics:
- Erratic parameter estimates as T increases.
- Large (or non-computable) standard errors of the estimates.
- Crazy t-tests (Choi and Phillips, 1992; Stock and Wright, 2003).
1) Canova and Sala (2009) (ex-ante graphical diagnostic)
- Perform prior predictive analysis.
- Simulate the objective function (likelihood, posterior, distance function) drawing parameters from some (prior) distribution. Plot the objective function against the relevant parameters. Check whether it is flat, whether it displays ridges, or other peculiarities.
- If the objective function does not change much when a parameter is changed, that parameter cannot be identified.
- If the objective function does not change much when a subset of the parameters is changed, the parameter vector cannot be separately identified.
- Can also compute numerical derivatives of the solution/objective function at likely parameter values.
- Can also use simulation analysis: check the distribution of population estimates.
2) Iskrev (2010): test the rank of a matrix.
- The likelihood function of normal stationary data depends only on its autocovariance function (ACF).
- The Jacobian of the transformation from the structural parameters to the ACF must be of full rank at θ0 (the true parameter vector) for the model to be locally identifiable.
- Randomly draw θ0 from the prior. Calculate the (analytical) Jacobian at θ0. If it is less than full rank → identification deficiencies.
Write the linear solution as
y2t = A22(θ)y2t−1 + A23(θ)y3t (30)
y1t = A12(θ)y2t−1 + A13(θ)y3t (31)
where y2t are the (endogenous and exogenous) states, y3t are the shocks, y1t are the controls and θ is the k × 1 vector of structural parameters. Let x_t = S[y1t, y2t]', where S is a selection matrix.
- Let m_x(θ) be a vector of theoretical moments of x_t (in the case of Iskrev, m_x = vec(ACF_x(j)), j = 0, ..., J). Let m̄_x be the vector of estimated moments in the actual data. We want m̄_x = m_x(θ).
- Let M_x = ∂m_x(θ)/∂θ. The parameters are locally identifiable at θ0 if rank(M_x) = k.
Note: M_x = M_a · M_γ, where M_γ is the matrix of derivatives of the reduced form coefficients (decision rules) with respect to the structural parameters and M_a is the matrix of derivatives of the moments with respect to the reduced form coefficients. Usually, the problem is in M_γ.
- Problem 1: identification in DSGE models is not an either/or proposition, e.g. even if rank(M_x) = k, some of its eigenvalues may still be small.
- Problem 2: a lot of parameters may not enter the ACF. One cannot just use the ACF; E(x) also needs to be among the moments.
- Problem 3: parameters that enter only A13(θ) or A23(θ) may not be separately identifiable from the variance of the shocks (see Komunjer and Ng (2011)).
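Iskrev's check can be mimicked numerically even when analytical derivatives are unavailable: stack the moments, differentiate with respect to θ by finite differences, and inspect the singular values of the Jacobian. A toy sketch with y_t = a y_{t−1} + b e_t, e_t ~ (0, σ²), a model in which b and σ are (deliberately) not separately identified:

```python
import numpy as np

def moments(theta, J=3):
    # autocovariances gamma_j = a^j * b^2 sigma^2 / (1 - a^2), j = 0..J-1
    a, b, sig = theta
    g0 = b ** 2 * sig ** 2 / (1 - a ** 2)
    return np.array([a ** j * g0 for j in range(J)])

def jacobian(theta, h=1e-6):
    # central finite differences of the moment vector w.r.t. theta
    theta = np.asarray(theta, dtype=float)
    M = np.zeros((3, 3))
    for i in range(3):
        dp, dm = theta.copy(), theta.copy()
        dp[i] += h
        dm[i] -= h
        M[:, i] = (moments(dp) - moments(dm)) / (2 * h)
    return M

M = jacobian([0.9, 1.0, 0.5])
s = np.linalg.svd(M, compute_uv=False)
# b and sigma enter only through b^2 sigma^2: their columns are proportional,
# so one singular value is numerically zero -> rank 2 < k = 3
assert np.linalg.matrix_rank(M, tol=1e-4) == 2
```

The loose `tol` reflects that the Jacobian is numerical; in line with Problem 1 above, the size of the smallest retained singular value matters as much as the rank itself.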
3) Komunjer and Ng (2011): test the rank of a matrix.
- Start from (30)-(31), where y3t may also contain measurement errors, and let x_t = y1t be the vector of observables.
- The MA representation for x_t is x_t = H(L, θ)y3t, where H(z, θ) = Σ_{j=0}^∞ h_{y3}(θ, j)z^{−j} is obtained as
H(z, θ) = D(θ) + C(θ)[z·I_{Ny2} − A(θ)]^{-1}B(θ) (32)
where N_{y2} is the size of y2t and z ∈ C.
- Define the spectral density of x_t by s_x(ω, θ) = H(z, θ)Σ_{y3}(θ)H(z, θ)'.
- The properties of s_x(ω, θ) are determined by the properties of H. H, in turn, is linked to the (Rosenbrock) matrix
P(z, θ) = [ z·I_{Ny2} − A(θ)   B(θ) ;  −C(θ)   D(θ) ]
and rank(P(z, θ)) = N_{y2} + rank(H(z, θ)).
- For identification we want ∂s_x(ω, θ)/∂θ to have full column rank. To make sure that this is the case, Komunjer and Ng derive conditions on the elements of P(z, θ).
- Nice, since P contains the mapping from θ to the decision rules.
- Result 1 (case of N_{y3} < N_{y1}): two vectors θ1 and θ0 are observationally equivalent if there exist matrices T, U of dimension N_{y2} × N_{y2} and N_{y3} × N_{y3} respectively, and the following hold:
A(θ1) = T A(θ0) T^{-1} (33)
B(θ1) = T B(θ0) U (34)
C(θ1) = C(θ0) T^{-1} (35)
D(θ1) = D(θ0) U (36)
Σ_{y3}(θ1) = U^{-1} Σ_{y3}(θ0) U^{-1}' (37)
- Result 2: let Λ(T, U, θ) = [vec(TA(θ0)T^{-1}); vec(TB(θ0)U); vec(C(θ0)T^{-1}); vec(D(θ0)U); vech(U^{-1}Σ_{y3}(θ0)U^{-1}')].
The parameters of the model are locally identifiable at θ0 if Λ(T, U, θ1) = Λ(I, I, θ0) has a unique solution at (T, U, θ1) = (I, I, θ0).
• Compute ∂Λ(T, U, θ)/∂(T, U, θ) and check its column rank at (I, I, θ0).
i) Need to pick a θ0.
ii) Need to compute numerical derivatives (see the Matlab program in the online appendices in Econometrica).
- For N_{y3} ≥ N_{y1} (not very relevant in applied work), see the paper for the changes that are needed.
How do you check the rank of a matrix?
- Compute the condition number of the eigenvalues and see whether at least one is below a critical value.
- Cragg and Donald (1997): test the rank of the Hessian. Under regularity conditions: (vec(H) − vec(H̄))'(vec(H) − vec(H̄)) ~ χ²((N − L0)(N − L0)), N = dim(H), L0 = rank of H.
- Anderson (1984): size of the characteristic roots of the Hessian. Under regularity conditions: (Σ_{i=1}^{N−m} λ_i)/(Σ_{i=1}^{N} λ_i) →_D a Normal distribution.
- Concentration statistics: C_{θ0}(i) = ∫_{j≠i} [g(θ) − g(θ0)]dθ / ∫(θ − θ0)dθ, i = 1, 2, ... (Stock, Wright and Yogo, 2002): measures the global curvature of the objective function around θ0.
Example: Smets and Wouters model:
- rank of H = 6;
- the sum of 12-13 characteristic roots is smaller than 0.01 of the average root; i.e. 12-13 parameters have weak or partial identification problems.
Which parameters are causing problems? φ, h, ν_l, ψ, β, χ, γ_p, γ_w, ξ_w, λ_π, λ_y, ρ_z (consistent with the graphical analysis).
Why? Variations in these parameters hardly affect the law of motion of the states!
Almost a rule: for identification we need the states to react to changes in the structural parameters.
4) Mueller (2012): how much do the results depend on the prior? Is the posterior reflecting mostly the likelihood or the prior?
- Plotting the marginal prior against the marginal posterior is insufficient because:
- it is a univariate representation;
- the prior of one parameter affects the posteriors of other parameters as well;
- stability conditions imply that (marginal) priors and posteriors differ even though the likelihood has no information.
- Traditional approach to identification: compute the rank of the information matrix I(θ) = −E[∂²f(y; θ)/∂θ²].
- Problems: the measure is local; the approach does not satisfy the likelihood principle (inference is based on averages of hypothetical histories that never materialized).
Iskrev (2010), Komunjer and Ng (2011) approach:
- Problem: still not based on the likelihood; local measure; classical inference; no role for priors.
General idea: let θ be a scalar (for simplicity).
1) Suppose g(θ) has mean μ and variance σ²_g(θ).
2) Embed this prior in a family g_ε(θ) with mean μ + ε and scores s_ε(θ) = ∂ ln g_ε(θ)/∂ε.
3) The posterior for this class has mean μ_g(θ|y)(ε) = ∫ θL(θ|y)g_ε(θ)dθ / ∫ L(θ|y)g_ε(θ)dθ and
∂μ_g(θ|y)/∂ε |_(ε=0) = E_g(θ|y)[(θ − μ_g(θ|y)) s_(ε=0)(θ)] (38)
4) If g_ε(θ) = g_(ε=0)(θ) exp[ε(θ − μ)/σ²_g(θ) − C(ε)], where C(ε) is independent of θ:
∂μ_g(θ|y)/∂ε |_(ε=0) ≡ J = σ²_g(θ)^{-1} σ²_g(θ|y) (39)
- Prior sensitivity: PS = J · σ_g(θ) = σ²_g(θ|y)/σ_g(θ).
- Prior informativeness: PI = min(1, J).
Interpretation:
1) PS measures the (linear) approximate change in the posterior mean induced by an increase in the prior mean of one prior standard deviation.
2) If the likelihood pins down θ exactly, then changing the prior mean leaves the posterior mean unchanged and J = 0. If the likelihood is flat, J = 1. Thus, values of PI between zero and one may be thought of as a numerical measure of the relative importance of prior information for the posterior.
Advantages:
1) Global measure: compare σ²_g(θ|y) with −E[∂²f(y; θ)/∂θ² | θ].
2) Joint (rather than marginal) measure: compare the multivariate J (see Mueller) with [σ²_g(θ)(j, j)]^{-1}[σ²_g(θ|y)(j, j)].
3) Consistent with the likelihood principle; dependent on the prior.
4) Delivers a measure rather than a yes/no answer.
5) Easy to compute: just take the posterior covariance matrix and compare it with the prior covariance matrix.
Example: Smets and Wouters (2007): J(i, i) is close to one for many parameters.
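In the conjugate normal case, J can be computed in closed form, which makes the interpretation transparent: with prior N(μ, τ²) and T observations each carrying likelihood precision 1/σ², the posterior variance is (1/τ² + T/σ²)^{-1} and J is simply its ratio to τ². A sketch (the variance values are arbitrary illustrations):

```python
def mueller_J(tau2, sigma2, T):
    # posterior variance under a conjugate normal prior / normal likelihood
    post_var = 1.0 / (1.0 / tau2 + T / sigma2)
    return post_var / tau2        # J in equation (39)

assert abs(mueller_J(1.0, 1.0, 0) - 1.0) < 1e-12    # flat likelihood: J = 1
assert mueller_J(1.0, 1.0, 100) < 0.01              # informative likelihood: J near 0
assert mueller_J(1.0, 1.0, 10) > mueller_J(1.0, 1.0, 100)  # J falls with information
```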
5) Koop, Pesaran and Smith (2013) (simulation approach)
- In large samples, the variance of the likelihood must converge to zero at rate T if a parameter is identified. In large samples, the variance of the posterior must have the same property.
- In large samples, the variance of the posterior distribution of parameters with identification problems converges to zero at a rate slower than T, or may not converge at all.
- Simulate data from the model with different sample lengths. Check how the variance of the posterior of the parameters changes.
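The rate argument can be checked with the same conjugate algebra: for an identified parameter the posterior variance scales like 1/T, while for a parameter the likelihood is silent about it stays at the prior variance. A sketch with hypothetical precisions (not from any estimated model):

```python
def post_var(tau2, info_per_obs, T):
    # conjugate normal posterior variance: (prior precision + T * likelihood info)^(-1)
    return 1.0 / (1.0 / tau2 + T * info_per_obs)

# identified parameter: likelihood carries information, variance shrinks at rate T
ratio = post_var(1.0, 1.0, 1000) / post_var(1.0, 1.0, 4000)
assert 3.9 < ratio < 4.1          # quadrupling T roughly quarters the variance

# unidentified parameter: zero likelihood information, variance never shrinks
assert post_var(1.0, 0.0, 1000) == post_var(1.0, 0.0, 4000) == 1.0
```

In practice one replaces the closed form with the MCMC posterior variance at each simulated sample length and inspects the same ratios.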
What to do when identification problems exist?
- If the problems are in population, one needs to respecify/reparameterize the model. For example, the following NK system has fewer identification problems than the standard one:
y_t = (h/(1+h)) y_{t−1} + (1/(1+h)) E_t y_{t+1} + (1/φ)(i_t − E_tπ_{t+1}) + v1t (40)
π_t = aπ_{t−1} + bE_tπ_{t+1} + γy_t + v2t (41)
i_t = λ_r i_{t−1} + (1−λ_r)(λ_π π_{t−1} + λ_y y_{t−1}) + v3t (42)
- If the problems are due to a particular objective function or to the use of limited information: use the likelihood.
- If the problems are due to small samples, add information (a prior or other data).
• Don't proceed as if they did not exist. The estimates make no sense!!
• Be careful with mixed calibration-estimation. It is preferable to use full calibration or Bayesian calibration (Canova and Paustian, 2011).
• Do you really need to estimate the model, or can you do with a reasonable calibration?
2 Combining DSGE and VARs
• People often discuss whether one should estimate a structural model or a reduced form one (see Canova, 2009). This discussion is now sterile: we can estimate both jointly.
• Recall: the linearized solution of a DSGE model is
y2t = A22(θ)y2t−1 + A23(θ)y3t (43)
y1t = A12(θ)y2t−1 + A13(θ)y3t (44)
y2t = states (predetermined/exogenous), y1t = controls, y3t = shocks.
- The A_ij(θ) are the decision rule matrices; θ are the structural parameters of preferences, technologies, policies, etc.
- Standard approach: set up a prior g(θ), use (43)-(44) and the Kalman filter to build the likelihood function f(y|θ), and obtain the posterior g(θ|y) using MCMC methods.
- Now take an intermediate step. We specify g(θ), use the model to derive a prior g(α, Σ_u|θ), where α = {A_ij(θ)} are the (reduced form) VAR parameters corresponding to (43)-(44), and build the (reduced form) likelihood f(y|α, Σ_u).
- g(θ) is the prior distribution for the DSGE parameters.
- g(α, Σ_u|θ) is the prior for the (reduced form) VAR parameters, induced by the prior on θ (the hyperparameters) and the solution of the model.
- f(y|α, Σ_u) is the likelihood of the data, conditional on the reduced form VAR parameters.
Del Negro and Schorfheide (2004): the joint posterior of θ and α is
g(α, Σ_u, θ|y) = g(α, Σ_u|θ, y) g(θ|y) (45)
- g(α, Σ_u|θ, y) = g(α|θ, y, Σ_u) g(Σ_u|θ, y) is easy to compute since, under conjugation, g(α|θ, y, Σ_u) is normal and g(Σ_u|θ, y) is inverted Wishart, for a standard specification of g(α, Σ_u|θ).
- g(θ|y) ∝ ğ(θ|y) = f(y|θ)g(θ), where
f(y|θ) = ∫ f(y|α, Σ_u) g(α, Σ_u|θ) dα dΣ_u = f(y|α, Σ_u)g(α, Σ_u|θ) / g(α, Σ_u|θ, y) (46)
since g(α, Σ_u|θ, y) ∝ f(y|α, Σ_u)g(α, Σ_u|θ). Then
f(y|θ) = [|T1 x^s(θ)'x^s(θ) + X'X|^{-0.5M} |(T1+T)Σ̃_u(θ)|^{-0.5(T1+T−k)}] / [|T1 x^s(θ)'x^s(θ)|^{-0.5M} |T1 Σ̃^s_u(θ)|^{-0.5(T1−k)}] × [(2π)^{-0.5MT} 2^{0.5M(T1+T−k)} Π_{i=1}^M Γ(0.5(T1+T−k+1−i))] / [2^{0.5M(T1−k)} Π_{i=1}^M Γ(0.5(T1−k+1−i))] (47)
where T1 = number of simulated observations, Γ is the Gamma function, X includes all lags of y and the superscript s indicates simulated data.
• Since g(θ|y) is non-standard, draw θ using the MH algorithm.
• Computing the joint posterior of (α, Σ_u, θ) is no more complicated than computing the posterior of θ alone.
• The DSGE-VAR is an application of hierarchical Bayes models (see Canova, 2007, ch. 9).
• Advantage of the procedure: no need to choose between estimating a VAR or a DSGE; we can estimate them jointly.
• How do you draw the structural and reduced form parameters? Draw θ from g(θ|y); given θ, draw α from a Normal-Wishart, conditional on θ. The procedure is equivalent to adding T1 observations simulated from the model to the T sample data points.
• Dynare has an option to jointly estimate a DSGE model and the VAR. You can fix T1 or treat it as an additional parameter whose posterior needs to be computed (you need to specify a prior for it).
Estimation algorithm: set T1 = T̄1.
1) Draw a candidate θ. Use MCMC to decide whether to accept or reject it.
2) With the accepted draw, compute the model-induced prior for the VAR parameters.
3) Compute the posterior for the VAR parameters (analytically if you have a conjugate structure, or via the Gibbs sampler). Draw from this posterior.
4) Repeat steps 1)-3) many times. Check convergence, compute the marginal likelihood.
Note: we can repeat 1)-4) for different T̄1 and choose the T̄1 that maximizes the marginal likelihood.
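The mechanics of adding T1 model-simulated observations can be sketched with a scalar AR(1): stack the simulated and actual data and run one regression; as T1 grows, the estimate is pulled toward the model-implied coefficient. A toy sketch (the model coefficient 0.9 and the DGP coefficient 0.5 are arbitrary illustrations):

```python
import numpy as np

def ar1(a, T, rng):
    # simulate y_t = a y_{t-1} + e_t, e_t ~ N(0,1)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a * y[t - 1] + rng.standard_normal()
    return y

def stacked_ols(y_sim, y_act):
    # OLS on [simulated; actual] data = posterior mean under the dummy-observation prior
    x = np.concatenate([y_sim[:-1], y_act[:-1]])
    y = np.concatenate([y_sim[1:], y_act[1:]])
    return (x @ y) / (x @ x)

rng = np.random.default_rng(1)
y_act = ar1(0.5, 200, rng)       # "actual" data: DGP coefficient 0.5
y_sim = ar1(0.9, 2000, rng)      # data simulated from the model: coefficient 0.9

a_no_prior = stacked_ols(y_sim[:1], y_act)   # T1 ~ 0: essentially OLS on the data
a_tight    = stacked_ols(y_sim, y_act)       # T1 large: pulled toward the model
assert abs(a_no_prior - 0.5) < 0.2
assert a_tight > 0.75
```

In the full procedure the simulated block is regenerated at every accepted θ draw, so the pull is toward the model solution at that draw rather than toward a fixed coefficient.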
• The procedure provides a testing ground for the model: if the model is the DGP, adding simulated data to the actual data improves the fit and reduces standard errors. If not, adding data introduces biases and reduces the fit.
Example 2.1 In a basic sticky price-sticky wage economy, fix ζ = 0.66, π_ss = 1.005, N_ss = 0.33, c/gdp = 0.8, β = 0.99, ζ_p = ζ_w = 0.75, a0 = 0, a1 = 0.5, a2 = −1.0, a3 = 0.1. Run a VAR with (Y, R, π, M), quarterly data from 1973:1-1993:4, and data simulated conditional on these parameters.

Marginal likelihood, sticky price-sticky wage model
λ = 0: -1228.08 | λ = 0.1: -828.51 | λ = 0.25: -693.49 | λ = 0.5: -709.13 | λ = 1: -913.51 | λ = 2: -1424.61

Only a modest amount of simulated data (about 20 points) should be added. The model helps the fit, but it is far from being the DGP.
3 Choice of data and estimation
- DSGE models are typically singular: the number of endogenous variables is larger than the number of shocks. Does it matter which variables are used in estimation? Yes.
i) Omitting relevant variables may lead to estimation distortions. Adding variables may improve the fit, but may also increase standard errors if the added variables are irrelevant.
ii) Different variables may identify different parameters (e.g. with aggregate consumption and no data on who owns financial assets, it is very difficult to estimate the share of rule-of-thumb consumers).
Example 3.1
y_t = a1 E_t y_{t+1} + a2(i_t − E_tπ_{t+1}) + v1t (48)
π_t = a3 E_tπ_{t+1} + a4 y_t + v2t (49)
i_t = a5 E_tπ_{t+1} + v3t (50)
Solution: [y_t, π_t, i_t]' = [ 1  0  a2 ;  a4  1  a2a4 ;  0  0  1 ] [v1t, v2t, v3t]'
• a1, a3, a5 disappear from the solution.
• Different variables identify different parameters (i_t identifies none!!).
iii) The forecasting performance may change.
iv) The shape of the likelihood function may change depending on the variables used. Multimodality may be present if important variables are omitted.
- With the same model and the same econometric approach, Levin et al. (2005, NBER Macro Annual) find habit in consumption to be 0.30; Fernandez-Villaverde and Rubio-Ramirez (2008, NBER Macro Annual) find it to be 0.88. Why? They use different data series to estimate the same model!
Can we say something systematic about the choice of variables?
Guerron-Quintana (2010): uses the Smets and Wouters model and different combinations of observable variables. Finds:
- Internal persistence changes if the nominal rate, inflation and the real wage are absent.
- The duration of price spells is affected by the omission of consumption and the real wage.
- The responses of inflation, investment, hours and the real wage are sensitive to the choice of variables.
Data          Wage stickiness      Price stickiness     Slope Phillips
              Median               Median               Median
Basic         0.62 (0.54, 0.69)    0.82 (0.80, 0.85)    0.94 (0.64, 1.44)
Without C     0.80 (0.73, 0.85)    0.97 (0.96, 0.98)    2.70 (1.93, 3.78)
Without Y     0.34 (0.28, 0.53)    0.85 (0.84, 0.87)    6.22 (5.05, 7.44)
Without C,W   0.57 (0.46, 0.68)    0.71 (0.63, 0.78)    2.91 (1.73, 4.49)
Without R     0.73 (0.67, 0.78)    0.81 (0.77, 0.84)    0.74 (0.53, 1.03)
(in parentheses: 90% probability intervals)
Standard approaches to deal with singularity issues:
• Solve variables out of the FOCs before computing the solution, until the number of observables equals the number of shocks.
- A good strategy to follow if a portion of y_t is non-observable. If not, which variables do we solve out?
- The format of the solution is no longer a restricted VAR(1) (it is a VARMA with possibly infinite lags).
• Add measurement errors until the combined number of structural shocks and measurement errors equals the number of observables. Thus, if the model has two shocks and implications for four variables, add at least two and up to four measurement errors. How many should we use?
Here, (43)-(44) represent the state equations (assuming all variables are observable) and the measurement equation is
w_t = F1 y_t + u_t (51)
where y_t = [y1t', y2t']'.
- Restrict the properties of u_t. Otherwise, it is difficult to distinguish the dynamics due to the structural shocks from those due to the measurement errors. Typical assumptions:
i) u_t is iid. If the measurement error is iid, θ can be recovered from the dynamics induced by the structural shocks.
ii) Ireland (2004): VAR(1) process for the measurement error. If the model solution is also a VAR(1): how do we distinguish structural from measurement error dynamics? Identification problem!
- For forecasting the distinction is irrelevant.
- The approach can be used to verify the quality of the model's approximation to the data if θ is calibrated (see also Watson, 1993). Problematic if θ is estimated.
iii) Canova (2014): the measurement error has a complex structure (see later).
Canova, Ferroni and Matthes (2014)
• Use statistical methods to select the variables to be used in estimation:
1) Choose the vector of variables that maximizes the identifiability of the relevant parameters; i.e. choose the combination of observables so that the rank of the derivative of the spectral density of the solution with respect to the parameters (see Komunjer and Ng, 2011) is as close as possible to the ideal one.
- Compare the curvature of the convoluted likelihood in the singular and the non-singular systems to eliminate ties.
2) Choose the vector that minimizes the information loss in going from the larger scale to the smaller scale system. The information loss is measured by
p_jt(θ, e^{t−1}, u_t) = L(W_jt|θ, e^{t−1}, u_t) / L(Z_t|θ, e^{t−1}, u_t) (52)
where L(·|θ, e^{t−1}, u_t) is the likelihood of Z_t or W_jt, defined by
Z_t = y_t + u_t (53)
W_jt = S y_jt + u_t (54)
u_t is an iid convolution error, y_t the original set of variables and y_jt the j-th subset of the variables producing a non-singular system.
• Apply the procedures to the SW model driven by 4 shocks and 7 observables.

Vector       Unrestr. SW rank(Δ)  Restr. SW rank(Δ)  Sixth restriction
y, c, i, w   186                  188
y, c, i, π   185                  188
y, c, r, h   185                  188
y, i, w, r   185                  188
c, i, w, h   185                  188
c, i, π, h   185                  188
c, i, r, h   185                  188
y, c, i, r   185                  187
...
c, w, π, r   183                  187
c, w, π, h   183                  187
i, w, π, r   183                  187
w, π, r, h   183                  187
c, i, π, r   183                  186
Ideal        189                  189

Rank conditions for all combinations of variables in the unrestricted SW model (column 2) and in the restricted SW model (column 3), where δ = 0.025, ε_p = ε_w = 10, λ_w = 1.5 and c/g = 0.18. The fourth column reports the extra parameter restriction needed to achieve identification; a blank space means that there are no parameters able to guarantee identification.
[Figure: Likelihood curvature — DGP vs. optimal observables, slices at h = 0.71, ξp = 0.65, γp = 0.47, σl = 1.92, ρπ = 2.03, ρy = 0.08]
Order  Vector     Relative info (Basic)  Relative info (T=1500)  Relative info (Σ_u = 0.01·I)
1      (y,c,i,h)  1                      1                       1
2      (y,c,i,w)  0.89                   0.87                    0.86
3      (y,c,i,r)  0.52                   0.51                    0.51
4      (y,c,i,π)  0.5                    0.5                     0.5

Ranking based on the information statistic. The first columns have the results for the basic setup, the next columns those obtained altering nuisance parameters. Relative information is the ratio of the p(θ) statistic relative to the best combination.
- Best combinations always include y, c, i. Worst combinations always include r and π jointly.

• How different are good and bad combinations?
- Simulate 200 data points from the model with four shocks and estimate the structural parameters using:

(1) Model A: 4 shocks and (y, c, i, w) as observables (best in the rank analysis).
(2) Model B: 4 shocks and (y, c, i, h) as observables (best in the information analysis).
(3) Model Z: 4 shocks and (c, i, π, r) as observables (worst in the rank analysis).
(4) Model C: 4 structural shocks, three measurement errors and (y_t, c_t, i_t, w_t, π_t, r_t, h_t) as observables.
(5) Model D: 7 structural shocks (add price and wage markup and preference shocks) and (y_t, c_t, i_t, w_t, π_t, r_t, h_t) as observables.
Parameter  True   Model A            Model B            Model Z            Model C            Model D
ρ_a        0.95   (0.920, 0.975)     (0.905, 0.966)     (0.946, 0.958)     (0.951, 0.952)     (0.939, 0.943)*
ρ_g        0.97   (0.930, 0.969)     (0.930, 0.972)     (0.601, 0.856)*    (0.970, 0.971)     (0.970, 0.972)
ρ_i        0.71   (0.621, 0.743)     (0.616, 0.788)     (0.733, 0.844)*    (0.681, 0.684)*    (0.655, 0.669)*
ρ_ga       0.51   (0.303, 0.668)     (0.323, 0.684)     (0.010, 0.237)*    (0.453, 0.780)     (0.114, 0.885)*
σ_n        1.92   (1.750, 2.209)     (1.040, 2.738)     (0.942, 2.133)     (1.913, 1.934)     (1.793, 1.864)*
σ_c        1.39   (1.152, 1.546)     (1.071, 1.581)     (1.367, 1.563)     (1.468, 1.496)*    (1.417, 1.444)*
h          0.71   (0.593, 0.720)     (0.591, 0.780)     (0.716, 0.743)     (0.699, 0.701)*    (0.732, 0.746)*
ξ_w        0.73   (0.402, 0.756)     (0.242, 0.721)*    (0.211, 0.656)*    (0.806, 0.839)*
ξ_p        0.65   (0.313, 0.617)*    (0.251, 0.713)     (0.512, 0.616)*    (0.317, 0.322)*    (0.509, 0.514)*
ι_w        0.59   (0.694, 0.745)     (0.663, 0.892)*    (0.532, 0.732)     (0.728, 0.729)*    (0.683, 0.690)*
ι_p        0.47   (0.571, 0.680)*    (0.564, 0.847)*    (0.613, 0.768)*    (0.625, 0.628)*    (0.606, 0.611)*
φ_p        1.61   (1.523, 1.810)     (1.495, 1.850)     (1.371, 1.894)     (1.624, 1.631)*    (1.654, 1.661)*
φ          0.26   (0.145, 0.301)     (0.153, 0.343)     (0.255, 0.373)     (0.279, 0.295)*    (0.281, 0.306)*
—          5.48   (3.289, 7.955)     (3.253, 7.623)     (2.932, 7.530)     (11.376, 13.897)*  (4.332, 5.371)*
α          0.2    (0.189, 0.331)     (0.167, 0.314)     (0.136, 0.266)     (0.177, 0.198)*    (0.174, 0.199)*
ρ_π        2.03   (1.309, 2.547)     (1.277, 2.642)     (1.718, 2.573)     (1.868, 1.980)*    (2.119, 2.188)*
ρ_y        0.08   (0.001, 0.143)     (0.001, 0.169)     (0.012, 0.173)     (0.124, 0.162)*
ρ_R        0.87   (0.776, 0.928)     (0.813, 0.963)     (0.868, 0.916)     (0.881, 0.886)*
ρ_Δy       0.22   (0.001, 0.167)*    (0.010, 0.192)*    (0.130, 0.215)*    (0.235, 0.244)*
σ_a        0.46   (0.261, 0.575)     (0.382, 0.460)     (0.420, 0.677)     (0.357, 0.422)*    (0.386, 0.455)*
σ_g        0.61   (0.551, 0.655)     (0.551, 0.657)     (0.071, 0.113)     (0.536, 0.629)     (0.585, 0.688)*
σ_i        0.6    (0.569, 0.771)     (0.532, 0.756)     (0.503, 0.663)     (0.561, 0.660)     (0.693, 0.819)*
σ_r        0.25   (0.100, 0.259)     (0.078, 0.286)     (0.225, 0.267)     (0.226, 0.265)     (0.222, 0.261)
[Figure: Responses of y, c, i, w, h, π and r to a government spending shock: true model, with Model A and Model Z bands.]

Responses to a government spending shock.
[Figure: Responses of y, c, i, w, h, π and r to a technology shock: true model, with Model B and Model C bands.]

Responses to a technology shock.
[Figure: Responses of y, h, π and r to a price markup shock: SW model vs. estimated model.]

Responses to a price markup shock.
4 Prior specification problems

• It is standard to use an inverted Gamma (IG) prior for the variance of the structural shocks. Since an IG density has positive probability only on the positive orthant, structural shocks cannot have zero variance.

• What if the DGP has fewer shocks than the estimated model? As shown before, biases are present and inference may be whimsical.

• Prior specification for the shock variance with positive probability on the variance being zero? Use a Uniform, an exponential, etc.

• Ferroni et al. (2015): the use of non-existent shocks causes a downward bias in the internal propagation mechanism of shocks (downward bias in the autoregressive parameters and in the price and wage stickiness). Distortions are reduced with the use of exponential or normal priors.
RBC model
Smets and Wouters model
• The exponential distribution is a member of the Gamma family f(x|a,b) = x^{a-1} e^{-x/b} / (b^a Γ(a)); set a = 1 (note Γ(1) = 1).

- In Dynare, choose a Gamma density for the variances and set the variance twice as large as the mean (i.e. mean = 0.01, variance = 0.02).
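As a numerical check, a minimal Python sketch (the helper names `gamma_pdf` and `exp_pdf` and the scale 0.01 are illustrative, not part of the slides) verifying that the Gamma density with shape a = 1 collapses to the exponential and stays positive and finite as the variance approaches zero:

```python
import math

def gamma_pdf(x, a, b):
    """Gamma density f(x|a,b) = x^(a-1) exp(-x/b) / (b^a Gamma(a))."""
    return x ** (a - 1) * math.exp(-x / b) / (b ** a * math.gamma(a))

def exp_pdf(x, b):
    """Exponential density with mean b."""
    return math.exp(-x / b) / b

# With shape a = 1 the Gamma density collapses to the exponential
for x in [0.001, 0.01, 0.05]:
    assert abs(gamma_pdf(x, 1.0, 0.01) - exp_pdf(x, 0.01)) < 1e-12

# Unlike the inverted Gamma, the density stays positive and finite as the
# variance approaches zero, so zero-variance shocks are not ruled out a priori
print(gamma_pdf(1e-8, 1.0, 0.01))  # close to 1/b = 100
```

This is why a Gamma prior with a = 1 can shrink redundant shock variances toward zero while an IG prior cannot.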
5 Measurement error problems

• Problems arise when measurement error is added and estimation is performed in growth rates (e.g. using GDP growth, inflation, etc.).

• The measurement error cannot be white noise. It is an integrated process! (see Pagan, 2016). Is this what you want?

Suppose the model features a unit root in technology and let y_t and c_t be output and consumption. Assume the measurement equations:

Δc^D_t = Δc^m_t + e_{1t}   (55)
Δy^D_t = Δy^m_t + e_{2t}   (56)
In levels, (55)-(56) imply

c^D_t − c^m_t = Σ_{k=1}^{t} e_{1k}   (57)
y^D_t − y^m_t = Σ_{k=1}^{t} e_{2k}   (58)

Because in the model y_t and c_t have a common trend, we have:

c^D_t − c^m_t = (y^D_t − y^m_t) + Σ_{k=1}^{t} (e_{1k} − e_{2k})   (59)

c^D_t − y^D_t = (c^m_t − y^m_t) + Σ_{k=1}^{t} (e_{1k} − e_{2k}) = w_t + Σ_{k=1}^{t} (e_{1k} − e_{2k})   (60)

where w_t is a stationary variable.

• The stochastic trend in c^D_t − y^D_t is different from the stochastic trend in technology (and the stochastic trend in the model) unless c_t and y_t have the same measurement error cumulant (i.e. Σ_k (e_{1k} − e_{2k}) = 0)!
• c_t and y_t can NOT be cointegrated in the data!

• Why do we put a stochastic trend in the model in the first place?
• How do we fix the problem? Suppose there are n observed I(1) variables. Write the model in VECM format

Δx^m_t = Γ Δx^m_{t-1} + α(x^m_{nt} − a_t) + e^m_t   (61)

where x_t is a generic model variable, α are the common cointegrating vectors, the cointegrating relationship is normalized using x_{nt}, the n-th I(1) variable, a_t is the stochastic TFP and e^m_t is a measurement error.

Suppose the data have been generated by

Δx^D_t = Γ Δx^D_{t-1} + e^D_t   (62)

Subtracting (61) from (62) we have

Δν_t = Γ Δν_{t-1} − α(x^m_{nt} − a_t) + e^D_t − e^m_t   (63)

where ν_t = x^D_t − x^m_t. Here ν_t is stationary: the stochastic trend in the model is the same as the stochastic trend in the data.

Thus, if we let μ_t = Δν_t, the stationary measurement error that makes the model and the data cointegrated is

μ_t = Γ μ_{t-1} − α(x^m_{nt} − a_t) + e^D_t − e^m_t   (64)

• μ_t cannot be iid: it is a VARMA!

• To make sure it is white noise we need Γ μ_{t-1} = α(x^m_{nt} − a_t). Is this what you think you are doing when you add measurement error? It is quite different from (55)-(56)!
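A hedged simulation of (55)-(56) (made-up unit-variance errors, nothing model specific) illustrates why the level discrepancy is integrated: the cumulated gap wanders like a random walk rather than reverting.

```python
import random

random.seed(0)
T = 2000
# iid measurement errors on the growth rates of consumption and output
e1 = [random.gauss(0.0, 1.0) for _ in range(T)]
e2 = [random.gauss(0.0, 1.0) for _ in range(T)]

# In levels, the discrepancy between c^D - c^m and y^D - y^m is the
# cumulated error sum_{k<=t} (e1k - e2k), as in (59)-(60)
gap = []
s = 0.0
for t in range(T):
    s += e1[t] - e2[t]
    gap.append(s)

# gap is a random walk: its dispersion grows with t instead of settling down,
# so measured c_t and y_t cannot be cointegrated even if the model ones are
print(max(abs(g) for g in gap[:100]), max(abs(g) for g in gap[-100:]))
```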
6 Data rich DSGEs

How do you estimate DSGE models when:

1) variables are mismeasured relative to the model quantities;
2) there are multiple observables that correspond to model quantities;
3) you have additional information (not included in the model) you would like to use;
4) the available data is noisy (bad or short)?

• Recognize that measures of theoretical concepts are contaminated.

- GDP is revised for up to three years; model savings do not correspond to savings computed in national statistics. What is the output gap? Should we use a statistical-based or a theory-based measure? In the latter case, what is the flexible price equilibrium?

- How do you measure hours? Use establishment survey series? Household survey series? Employment?

- Do we use CPI inflation, the GDP deflator or PCE inflation?

• Idea: different measures contain (noisy) information about the true series. They are not perfectly correlated with each other.
Case 1: Measurement error is present.

Observable: x_{0t}. Model-based quantity: x^m_{0t}(θ) = S_0[y_{1t}, y_{2t}], where S_0 is a selection matrix. Then

x_{0t} = x^m_{0t}(θ) + u_{0t}   (65)

where u_{0t} is iid measurement error.
• The difference with the setup of the previous section is that the measurement error is economically motivated here.

• Cases 2-4 use ideas underlying factor models.

- Case 2 (Boivin and Giannoni, 2005): let x_{1t} be a k×1 vector of observables and let x^m_{1t}(θ) = S_1[y_{1t}, y_{2t}], where S_1 is another selection matrix of dimension N×1, dim(N) < dim(k). Then the measurement equation is:

x_{1t} = λ_1 x^m_{1t}(θ) + u_{1t}   (66)

where the first row of λ_1 is normalized to 1, and u_{1t} is iid measurement error.

• Use x_{1t} to jointly estimate θ, λ_1, σ²_u, and recover the states y_{1t} if they are of interest.

• Interpretation of λ_{1j}, j = 2,...,N: the information content of indicator j for x^m_t relative to indicator 1.
- What is the advantage of this procedure? If only one component of x_{1t} is used, estimates of θ will probably be noisy.

- Using a vector and assuming that the elements of u_{1t} are idiosyncratic:

i) can reduce the noise in the estimate of the states y_{1t} (the estimated variance of y_{1t} will asymptotically be of the order 1/k times the variance obtained when only one indicator is used; see Stock and Watson, 2002);

ii) makes the estimates of θ more precise, see Justiniano et al. (2012).
- What is the difference with factor models? Here the DSGE structure is imposed in the specification of the law of motion of the states (the states have economic content). In factor models, the states have an unrestricted time series specification, say a random walk, and are uninterpretable.

- How do we identify the dynamics induced by the structural shocks and the measurement errors? Since the measurement error is identified from the cross-sectional properties of the observables x_{1t}, it is possible to have both the structural disturbances and the measurement errors serially correlated.
Many cases fit in case 3):

- Sometimes we may have proxy measures for the unobservable states (commodity prices are often used as proxies for future inflation shocks, stock market shocks as proxies for future technology shocks; see Beaudry and Portier, 2006).

- Sometimes we have survey data to proxy for unobserved states (e.g. business cycles).

- Sometimes we have flash information (preliminary estimates).

- Assume that these indicators give you information about the states y_{1t}. Let x_{2t} be a q×1 vector of variables and let x^m_{2t}(θ) = S_2[y_{1t}, y_{2t}], where S_2 is another selection matrix. Measurement equation:

x_{2t} = λ_2 x^m_{2t}(θ) + u_{2t}   (67)

λ_2 is unrestricted, except for the first row, which is normalized to 1.
- Combining all sources of available information:

X_t = Λ S y_t(θ) + u_t   (68)

where X_t = [x_{0t}, x_{1t}, x_{2t}]′, u_t = [u_{0t}, u_{1t}, u_{2t}]′, Λ = [I, λ_1, λ_2]′, y_t = [y_{1t}, y_{2t}] and S = diag[S_0, S_1, S_2].

- The fact that we are using the DSGE structure (x^m_t(θ) depends on θ) imposes restrictions on the data.

- We interpret data information through the lenses of the DSGE model, even though the model does not feature the variables used in estimation.
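A tiny numerical sketch of the stacking in (68) (all dimensions, loadings and values below are hypothetical): one exactly-measured series, two indicators for the first state, and one proxy for the second state are combined into a single observation vector.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

y_t = [0.5, -0.2]                # model states [y1t, y2t]

# Selection rows: which state each observable loads on
S = [[1.0, 0.0],                 # S0: x0t measures state 1 exactly (case 1)
     [1.0, 0.0],                 # S1: two indicators for state 1 (case 2)
     [1.0, 0.0],
     [0.0, 1.0]]                 # S2: one proxy for state 2 (case 3)

# Loadings Lambda = [I; lambda_1; lambda_2]; first row of each block is 1
lam = [1.0, 1.0, 0.8, 1.0]

u_t = [0.01, -0.02, 0.005, 0.0]  # iid measurement errors

Sy = matvec(S, y_t)
X_t = [lam[i] * Sy[i] + u_t[i] for i in range(len(Sy))]
print(X_t)
```

In estimation, the same construction is embedded in the measurement equation of the state space, so θ, the loadings and the states are recovered jointly.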
Case 4): use transformations of the data which are, hopefully, less noisy. For example, output and hours may be poorly estimated, but labor productivity may be better estimated.

Observables: x_{3t}. Model-based quantities: x^m_{3t}(θ) = S_3[y_{1t}, y_{2t}], where S_3 is a selection matrix. Then

x_{3t} = M x^m_{3t}(θ) + u_{3t}   (69)

where u_{3t} is iid measurement error, and M is a matrix of zeros and ones.
Example 6.1 Consider a three equation New Keynesian model:

o_t = E_t(o_{t+1}) − (1/σ)(i_t − E_t π_{t+1}) + e_{1t}   (70)
π_t = β E_t π_{t+1} + κ o_t + e_{2t}   (71)
i_t = ρ_r i_{t-1} + (1 − ρ_r)(ρ_π π_t + ρ_x o_t) + e_{3t}   (72)

where β is the discount factor, σ the relative risk aversion coefficient, κ the slope of the Phillips curve, and (ρ_r, ρ_π, ρ_x) policy parameters. Here o_t is the output gap, π_t the inflation rate and i_t the nominal interest rate. Assume

e_{1t} = ρ_1 e_{1t-1} + v_{1t}   (73)
e_{2t} = ρ_2 e_{2t-1} + v_{2t}   (74)
e_{3t} = v_{3t}   (75)

where ρ_1, ρ_2 < 1 and v_{jt} ~ (0, σ²_j), j = 1, 2, 3.

- How do we link the output gap, the inflation rate and the nominal interest rate to empirical counterparts? Which nominal interest rate should we use? How do we measure the gap?
- Model solution (state equations of the state space system):

y_t = R(θ) y_{t-1} + S(θ) v_t   (76)

where y_t is an 8×1 vector including (o_t, π_t, i_t, e_{1t}, e_{2t}, e_{3t}) and the expectations of o_t and π_t, and θ = (β, σ, κ, ρ_r, ρ_y, ρ_π, ρ_1, ρ_2, σ_1, σ_2, σ_3).

Let o^j_t, j = 1,...,N_x, be indicators for o_t; π^j_t, j = 1,...,N_π, indicators for π_t; and i^j_t, j = 1,...,N_i, indicators for i_t. Let X_t = [o^1_t,...,o^{N_x}_t, π^1_t,...,π^{N_π}_t, i^1_t,...,i^{N_i}_t]′ be an (N_x + N_π + N_i)×1 vector.

- The measurement equation is

X_t = Λ y_t + u_t   (77)

where Λ is an (N_x + N_π + N_i)×3 matrix with at most one nonzero element in each row and u_t is iid.

- (76)-(77) is an extended state space. A Kalman filter routine gives us estimates of θ, Λ and y_t which are consistent with the data X_t.
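A scalar stand-in for the extended state space (76)-(77), with made-up values R = 0.9, λ = 1 and noise variances; the recursions below are the textbook Kalman predict/update steps, not code from the slides.

```python
import random

random.seed(1)
R, lam = 0.9, 1.0
q = 1.0          # variance of the state innovation, illustrative
r = 0.25         # measurement error variance, illustrative

# simulate the state y_t = R y_{t-1} + v_t and the observable X_t = lam y_t + u_t
T = 200
y = [0.0]
X = []
for t in range(T):
    y.append(R * y[-1] + random.gauss(0.0, q ** 0.5))
    X.append(lam * y[-1] + random.gauss(0.0, r ** 0.5))

# Kalman filter: predict, then update with the new observation
yhat, P = 0.0, 1.0
est = []
for t in range(T):
    yhat, P = R * yhat, R * R * P + q          # predict
    K = P * lam / (lam * lam * P + r)          # gain
    yhat = yhat + K * (X[t] - lam * yhat)      # update
    P = (1.0 - K * lam) * P
    est.append(yhat)

# the filtered states should track the truth better than the raw observations
mse_raw = sum((X[t] - y[t + 1]) ** 2 for t in range(T)) / T
mse_kf = sum((est[t] - y[t + 1]) ** 2 for t in range(T)) / T
print(mse_raw, mse_kf)
```

With θ unknown, the same recursions evaluated inside a likelihood give the objective that the estimation maximizes.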
Extension 1: Conjunctural information

- Can use conjunctural information as additional data that gives us information about the states.

- Suppose we have measures of future inflation (from surveys, from forecasting models) or data which may have some information about future inflation, for example oil prices, housing prices, etc.

- Suppose we want to predict inflation h = 1, 2, ... periods ahead. Let π^j_t, j = 1,...,N_π, be the indicators for π_t and let X_t = [o_t, i_t, π^1_t,...,π^{N_π}_t]′ be a (2 + N_π)×1 vector. The measurement equation is:

X_t = Λ y_t + u_t   (78)

where

Λ = [1 0 0; 0 1 0; 0 0 1; 0 0 λ_1; ... ; 0 0 λ_{N_π}]

is a (2 + N_π)×3 matrix.

- Estimates of y_t can be obtained with the Kalman filter. With estimates of R(θ) and S(θ) from the state equation, we can unconditionally predict y_t h-steps ahead or predict its path conditional on a path for one structural shock v_{1,t+h}.

- Forecasts incorporate information from the model, the conjunctural and the actual data, and information about the path of the shocks. The pieces of information are optimally mixed using their relative precision.
Extension 2: Mixed frequency data

- High frequency data is very useful to understand the state of the economy (e.g. the tapering of US expansionary monetary policy).

- Macro data is available at much lower frequencies. How do we combine high and low frequency information?

- Suppose we have monthly and quarterly macro data. Let x_{jt} be the quarterly version of the monthly data, obtained using only data from the j-th month of the quarter. Set X_t = [x_{1t}, x_{2t}, x_{3t}]′. The observation equation is

X_t = Λ y_t + u_t   (79)

See Foroni and Marcellino (2013).

- Alternative. Suppose you have quarterly data for interest rates and inflation and annual data for the growth rate of output. How do you estimate a New Keynesian model?

o_t = E_t(o_{t+1}) − (1/σ)(i_t − E_t π_{t+1}) + e_{1t}   (80)
π_t = β E_t π_{t+1} + κ o_t + e_{2t}   (81)
i_t = ρ_r i_{t-1} + (1 − ρ_r)(ρ_π π_t + ρ_x o_t) + e_{3t}   (82)

Use as observation equations:

o^a_t = o(t) + o(t−1) + o(t−2) + o(t−3)   (83)
π^o_t = π_t + u_{1t}   (84)
i^o_t = i_t + u_{2t}   (85)

- In your code, use quarterly data for inflation and the nominal rate and quarterly data for output obtained by filling the missing observations with 'NA' and assigning the annual data to the first quarter of each year.
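A minimal illustration of this data layout (all numbers are invented): annual output growth is assigned to the first quarter of each year and the remaining quarters are marked missing.

```python
quarters = 8
pi = [0.5, 0.4, 0.6, 0.3, 0.2, 0.4, 0.5, 0.3]    # quarterly inflation
i_ = [1.0, 1.1, 1.0, 0.9, 0.8, 0.9, 1.0, 1.1]    # quarterly nominal rate
dy_annual = [2.0, 1.6]                           # annual output growth, 2 years

dy = []
for t in range(quarters):
    if t % 4 == 0:                 # first quarter of the year gets the annual figure
        dy.append(dy_annual[t // 4])
    else:
        dy.append(None)            # 'NA' in the other quarters

data = list(zip(dy, pi, i_))
print(data[0])   # (2.0, 0.5, 1.0)
print(data[1])   # (None, 0.4, 1.1)
```

A filter that skips the update step whenever an entry is missing can use this matrix directly.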
Extension 3: Data from other countries

- Sometimes we need to estimate a structural model with data which is short and of poor quality. How do we proceed?

- Suppose there is data from other countries (for the same sample). Suppose this data is informative about the state of the local economy, e.g. because the two countries are trading partners, have economic interconnections, one country leads the other over the cycle, etc.

- Let the indicators for the different countries be X_t = [x^1_t,...,x^N_t]. The observation equation is

X_t = Λ y_{1t} + u_t   (86)

or

X_t = Λ S y_t(θ) + u_t   (87)

where y_{1t} are the states and S is a selection matrix.
7 Dealing with trends and non-balanced growth

- Most models available for policy exercises are stationary and cyclical.

- Data is close to non-stationary; it has trends; it displays breaks.

- How do we match models to the data? Many approaches:

a) Detrend the actual data. Assume that the model is a representation for the detrended data.

Problem: which detrended data is the model representing?
[Figure: US GDP detrended with linear trend (LT), HP, FOD, BK and CF filters, 1965-2005.]
b) Take ratios in the data and in the model; this gets rid of trends if the variables in the ratio are cointegrated. Problem: the data does not seem to satisfy balanced growth (the variables in the ratios are not cointegrated).
[Figure: c/y and i/y, real and nominal, 1950:1-1998:4.]

Real and nominal Great ratios in the US, 1950-2008.
c) Build a trend into the model. Detrend the data with the model-based trend. Problems:

1) The specification of the trend is arbitrary (deterministic? stochastic?).

2) Where you put the trend (TFP? preferences?) matters for estimation and inference. Nuisance parameter problem.

• General problem: the statistical and economic definitions of the cycle differ. Statistical approaches are likely to give biased results, even in large samples.
[Figure: Spectral density of the data and of the cyclical component. Panel 1: ideal situation (ideal case); panel 2: the cyclical component has power outside BC frequencies (realistic case); panel 3: the non-cyclical component has power at BC frequencies (general case).]
- In developing countries most of the cyclical fluctuations are driven by trends (permanent shocks); see Aguiar and Gopinath (2007).

Two approaches to deal with the problem:
1) Data-rich environment, see Canova and Ferroni (2011). Let x^i_t be the actual data filtered with method i = 1, 2,...,N and x^d_t = [x^1_t, x^2_t,...]. Assume:

x^d_t = λ_0 + λ′_1 x^m_t(θ) + u_t   (88)

where λ_j, j = 0, 1 are matrices of parameters measuring the bias and the correlation between the filtered data x^d_t and the model-based quantities x^m_t(θ) ≡ S[y_{1t}(θ), y_{2t}(θ)]′; u_t are measurement errors; θ are the structural parameters.

- Factor model setup à la Boivin and Giannoni (2005); the model-based quantities are non-observable.

- Jointly estimate θ and the λ's. Can obtain more precise estimates of x^m_t(θ) if the u_t are cross-sectionally uncorrelated (in general they are not).
2) Flexibly bridge the cyclical model and the data (Canova, 2014):

x^d_t = c + x^T_t + x^m_t(θ) + u_t   (89)

where x^d_t ≡ x̃^d_t − E(x̃^d_t) is the log demeaned vector of observables, c = x̄ − E(x̃^d_t), x^T_t is the non-cyclical component, x^m_t(θ) ≡ S[y_{1t}, y_{2t}]′ is the model-based cyclical component, S is a selection matrix; x^T_t, x^m_t(θ) and u_t are mutually orthogonal.

- u_t is an iid (0, Σ_u) (measurement) noise.

- The (linearized) model solution is given by (43)-(44), where y_{2t} includes endogenous and exogenous states.

- Non-cyclical component:

x^T_t = ρ_1 x^T_{t-1} + δx_{t-1} + e_t,   e_t ~ iid (0, Σ_e)   (90)
δx_t = ρ_2 δx_{t-1} + v_t,   v_t ~ iid (0, Σ_v)   (91)

Note: if Σ_v > 0 and Σ_e = 0, x^T_t is a vector of I(2) processes.
If ρ_1 = ρ_2 = I, Σ_v = 0 and Σ_e > 0, x^T_t is a vector of I(1) processes.
If ρ_1 = ρ_2 = I and Σ_v = Σ_e = 0, x^T_t is deterministic.
If ρ_1 = ρ_2 = I, Σ_v = σ²_v I > 0, Σ_e = σ²_e I > 0 and σ²_v/σ²_e is large, x^T_t is "smooth" (as in the HP filter).
If ρ_1 ≠ I, ρ_2 ≠ I, or both, x^T_t has power at particular frequencies.

- Jointly estimate θ and the non-structural parameters (c, ρ_1, ρ_2, Σ_e, Σ_v, Σ_u).
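The special cases of (90)-(91) can be checked with a scalar sketch (d1 and d2 stand for ρ_1 and ρ_2; the initial drift of 0.1 is an arbitrary illustrative value):

```python
def simulate_trend(d1, d2, sig_e, sig_v, T, shocks_e, shocks_v):
    """Scalar version of x^T_t = d1 x^T_{t-1} + dx_{t-1} + e_t, dx_t = d2 dx_{t-1} + v_t."""
    xT, dx = 0.0, 0.1              # initial level and drift (illustrative)
    path = []
    for t in range(T):
        xT = d1 * xT + dx + sig_e * shocks_e[t]
        dx = d2 * dx + sig_v * shocks_v[t]
        path.append(xT)
    return path

zeros = [0.0] * 10
# d1 = d2 = 1 with both shock variances zero: a deterministic linear trend,
# matching the third special case listed above
det = simulate_trend(1.0, 1.0, 0.0, 0.0, 10, zeros, zeros)
print(det)
```

Turning on sig_v (with sig_e = 0) makes the drift itself a random walk, i.e. the I(2) case.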
Advantages:

• No need to take a stand on the properties of the non-cyclical component or on the choice of filter to tone down its importance; specification errors and biases are reduced.

• The estimated cyclical component is not localized at particular frequencies of the spectrum.

- Cyclical, non-cyclical and measurement error fluctuations are driven by different and orthogonal shocks. The specification is observationally equivalent to one where the cyclical and non-cyclical components are correlated.
Example 7.1 The log-linearized equilibrium conditions of the basic NK model are:

λ_t = χ_t − [σ_c/(1 − h)] (y_t − h y_{t-1})   (92)
y_t = z_t + (1 − α) n_t   (93)
w_t = −λ_t + σ_n n_t   (94)
r_t = ρ_r r_{t-1} + (1 − ρ_r)(ρ_π π_t + ρ_y y_t) + v_t   (95)
λ_t = E_t(λ_{t+1} + r_t − π_{t+1})   (96)
π_t = κ_p (w_t + n_t − y_t + μ_t) + β E_t π_{t+1}   (97)
z_t = ρ_z z_{t-1} + ε_{zt}   (98)

where κ_p = [(1 − βξ_p)(1 − ξ_p)/ξ_p] · (1 − α)/(1 − α + εα), λ_t is the Lagrangian on the consumer budget constraint, z_t is a technology shock, χ_t a preference shock, v_t an iid monetary policy shock and μ_t an iid markup shock.
- Estimate this model with a number of detrending transformations.

- Do we get different estimates? Do we get different responses?

• Is the approach capable of capturing different shapes of the non-cyclical component?

- Simulate data from a model where the trend is unimportant and from one where the trend is important.

- What happens to parameter estimates obtained with standard methods?

- Does the new method recover the DGP better in both cases?

- What kind of parameters are distorted?
DGP1
Parameter  True value  LT     HP     FOD    BP     Ratio1  Flexible
σ_n        0.50        0.04   0.08   0.00   0.11   0.05    0.04
h          0.70        0.00   0.00   0.00   0.01   0.07    0.10
α          0.30        0.00   0.04   0.00   0.06   0.04    0.06
ρ_r        0.70        0.05   0.05   0.01   0.06   0.13    0.01
ρ_π        1.50        0.00   0.00   0.00   0.01   0.02    0.00
ρ_y        0.40        0.17   0.20   0.17   0.19   0.15    0.00
ξ_p        0.75        0.03   0.04   0.03   0.03   0.02    0.03
ρ_χ        0.50        0.00   0.04   0.00   0.00   0.00    0.07
ρ_z        0.80        0.03   0.05   0.00   0.05   0.00    0.05
σ_χ        1.12        1.60   0.45   3.89   0.64   8.79    1.00
σ_z        0.50        1.47   0.01   3.18   0.03   0.02    0.16
σ_r        0.10        1.37   0.03   3.75   0.03   0.00    0.00
σ_μ        1.60        13.14  18.81  17.68  38.52  38.36   1.94
Total1                 0.30   0.40   0.21   0.48   0.49    0.24
Total2                 17.91  19.79  28.71  39.75  47.66   3.45

MSE. In DGP1 there is a unit root component to the preference shock and σ_nc/σ_T ∈ [1.1, 1.9].
DGP2
Parameter  True value  LT     HP     FOD    BP     Ratio1  Flexible
σ_n        0.50        0.04   0.11   0.17   0.12   0.12    0.06
h          0.70        0.01   0.00   0.00   0.03   0.08    0.17
α          0.30        0.00   0.05   0.00   0.06   0.02    0.07
ρ_r        0.70        0.05   0.05   0.04   0.05   0.13    0.02
ρ_π        1.50        0.00   0.00   0.00   0.00   0.01    0.00
ρ_y        0.40        0.16   0.21   0.08   0.19   0.15    0.00
ξ_p        0.75        0.03   0.04   0.02   0.05   0.04    0.03
ρ_χ        0.50        0.00   0.04   0.00   0.00   0.01    0.08
ρ_z        0.80        0.04   0.05   0.03   0.03   0.00    0.06
σ_χ        1.12        10.41  0.87   2.80   0.69   9.43    0.97
σ_z        0.50        9.15   0.06   1.91   0.06   0.01    0.17
σ_r        0.10        9.35   0.00   1.05   0.03   0.00    0.00
σ_μ        1.60        10.41  20.72  20.33  57.03  40.17   1.90
Total1                 0.29   0.46   0.32   0.51   0.55    0.35
Total2                 39.65  22.20  26.44  58.34  50.17   3.54

MSE. In DGP2 all shocks are stationary but there is measurement error and σ_u/σ_T ∈ [0.09, 0.11]. The MSE is computed using 50 replications.
[Figure: Estimated responses of y_t, w_t, π_t and r_t to preference and technology shocks under LT, HP, FOD, BP, Ratio 1 and Ratio 2 detrending.]

Estimated impulse responses.
Why are estimates distorted with standard filtering?

- The posterior is proportional to the likelihood times the prior.

- Log-likelihood of the parameters (see Hansen and Sargent, 1993):

L(θ|y_t) = A_1(θ) + A_2(θ) + A_3(θ)

A_1(θ) = (1/T) Σ_{ω_j} log det G_θ(ω_j)

A_2(θ) = (1/T) Σ_{ω_j} trace [G_θ(ω_j)]^{-1} F(ω_j)

A_3(θ) = (E(y) − μ(θ)) G_θ(ω_0)^{-1} (E(y) − μ(θ))

where ω_j = 2πj/T, j = 0, 1,...,T−1, G_θ(ω_j) is the model-based spectral density matrix of y_t, μ(θ) the model-based mean of y_t, F(ω_j) the data-based spectral density of y_t and E(y) the unconditional mean of the data.
- A_1(θ) = sum of the one-step-ahead forecast errors across frequencies;

- A_2(θ): a penalty function, emphasizing deviations of the model-based from the data-based spectral density at the various frequencies.

- A_3(θ): a penalty function, weighting deviations of the model-based from the data-based means, with the spectral density of the model at frequency zero as weight.

- Suppose the actual data is filtered so that frequency zero is eliminated and the low frequencies are de-emphasized. Then

L(θ|y_t) = A_1(θ) + A_2(θ)*

A_2(θ)* = (1/T) Σ_{ω_j} trace [G_θ(ω_j)]^{-1} F(ω_j)*

where F(ω_j)* = F(ω_j) I_ω and I_ω is an indicator function.

Suppose I_ω = I_{[ω_1, ω_2]}, an indicator function for the business cycle frequencies, as in an ideal BP filter. The penalty A_2(θ)* matters only at these frequencies.

Since A_2(θ)* and A_1(θ) enter additively in the log-likelihood, two types of biases in θ arise:

- The sample version of F(ω_j)* only approximately captures the features of F(ω_j) at the required frequencies: the sample version of A_2(θ)* has a smaller value at the business cycle frequencies and a nonzero value at non-business-cycle ones.

- To reduce the contribution of the penalty function to the log-likelihood, parameters are adjusted to make G_θ(ω_j) close to F(ω_j)* at those frequencies where F(ω_j)* is not zero. This is done by allowing the fitting errors in A_1(θ) to be large at the frequencies where F(ω_j)* is zero, in particular the low frequencies.
Conclusions:

1) The volatility of the structural shocks will be overestimated; this makes G_θ(ω_j) close to F(ω_j)* at the relevant frequencies.

2) Their persistence will be underestimated; this makes G_θ(ω_j) small and the fitting error large at the low frequencies.

The estimated economy is very different from the true one: agents' decision rules are altered.

- Higher perceived volatility implies distortions in the aversion to risk and a reduction in the internal amplification features of the model.

- Lower persistence implies that the perceived substitution and income effects are distorted, with the latter typically underestimated.

- Distortions disappear if:

i) the non-cyclical component has low power at the business cycle frequencies. For this, the volatility of the non-cyclical component needs to be considerably smaller than the volatility of the cyclical one;

ii) the prior eliminates the distortions induced by the penalty functions.
Question: What if we fit the filtered version of the model to the filtered data, as suggested by Chari, Kehoe and McGrattan (2008)?

- Log-likelihood = A_1(θ)* + A_2(θ), where A_1(θ)* = (1/T) Σ_{ω_j} log det G_θ(ω_j) I_ω. Suppose that I_ω = I_{[ω_1, ω_2]}.

- A_1(θ)* matters only at the business cycle frequencies, while the penalty function is present at all frequencies.

- If the penalty is more important at the low frequencies (the typical case), parameters are adjusted to make G_θ(ω_j) close to F(ω_j) at these frequencies.

- The procedure implies that the model is fitted to the low frequency components of the data!!!

i) The volatility of the shocks will generally be underestimated.

ii) Persistence will be overestimated.

iii) Since less noise is perceived, decision rules will imply a higher degree of predictability of the simulated time series.

iv) Perceived substitution and income effects are distorted, with the latter overestimated.
How can we avoid distortions?

- Build models with non-cyclical components (difficult).

- Use filters which flexibly adapt; see Gorodnichenko and Ng (2010) and Eklund et al. (2008).

- The true and estimated log spectrum and ACF are close.

- Both the true and the estimated cyclical components have power at all frequencies.
[Figure: Model-based responses of y_t, ω_t, π_t and r_t to preference, technology, monetary policy and markup shocks, true vs. estimated.]

Model based IRF, true and estimated.
Actual data: do we get a different story?
[Figure: Posterior distributions of the policy activism parameter under LT, HP, Flexible, and Flexible with an inflation target, for the two samples.]

Posterior distributions of the policy activism parameter, samples 1964:1-1979:4 and 1984:1-2007:4. LT refers to linearly detrended data, HP to Hodrick and Prescott filtered data and Flexible to the approach the paper suggests.
                         LT                  FOD                 Flexible
                         Output  Inflation   Output  Inflation   Output  Inflation
TFP shocks               0.01    0.04        0.00    0.01        0.01    0.19
Gov. expenditure shocks  0.00    0.00        0.00    0.00        0.00    0.02
Investment shocks        0.08    0.00        0.00    0.00        0.00    0.05
Monetary policy shocks   0.01    0.00        0.00    0.00        0.00    0.01
Price markup shocks      0.75(*) 0.88(*)     0.91(*) 0.90(*)     0.00    0.21
Wage markup shocks       0.00    0.01        0.08    0.08        0.03    0.49(*)
Preference shocks        0.11    0.04        0.00    0.00        0.94(*) 0.00

Variance decomposition at the 5-year horizon, SW model. Estimates are obtained using the median of the posterior of the parameters. A (*) indicates that the 68 percent highest credible set is entirely above 0.10. The model and the data set are the same as in Smets and Wouters (2007). LT refers to linearly detrended data, FOD to growth rates and Flexible to the approach this paper suggests.
8 Eliciting priors from existing information

Prior distributions for DSGE parameters are often arbitrary.

- Independence: the joint distribution may assign non-zero probability to "unreasonable" regions of the parameter space.

- Priors are set having some statistics in mind (the prior mean is similar to the one obtained in calibration exercises).

- The same prior is used for a parameter in different models. Problem: the same prior may generate different dynamics in different models.
Example 8.1 Let y_t = α_1 y_{t-1} + α_2 + u_t, u_t ~ N(0,1). Suppose α_1 and α_2 are independent with p(α_1) ~ U(0, 1−ε), ε > 0, and p(α_2|α_1) ~ N(ᾱ, σ̄).

Since the mean of y_t is μ = α_2/(1 − α_1), the priors for α_1 and α_2 imply that μ|α_1 ~ N(ᾱ/(1 − α_1), σ̄/(1 − α_1)²).

The prior for the mean of y_t has a variance increasing in α_1! Why? Is this reasonable?

Alternative: state a prior for μ and derive the priors for α_1 and α_2 (change of variables). For example, if μ ~ N(μ̄, σ̄²) and p(α_1) = U(0, 1−ε), then p(α_2|α_1) = N(μ̄(1 − α_1), σ̄²(1 − α_1)²). Note that here the priors for α_1 and α_2 are correlated.

Suppose you want to compare the model with y_t = μ + u_t, u_t ~ N(0,1). If p(μ) = N(μ̄, σ̄²) the two models are comparable. If, instead, p(α_1) and p(α_2) are independent, the two models would not be comparable.
- In calibration exercises: find the parameters so that statistics of the actual and the simulated data are consistent.

- Del Negro and Schorfheide (2008): use the same approach to obtain a distribution for the parameters such that the statistics of the simulated data are consistent with the distribution of the statistics of the actual data. The distribution for the parameters is a data-based prior (see also Kadane et al., 1980).

Setup:

i) Let θ be a vector of DSGE parameters. Let S_T be a set of statistics obtained with the actual data and T observations.

ii) Let S_N(θ) be the same set of statistics obtained from the model, once θ is selected, using N observations. Set

S_T = S_N(θ) + η,   η ~ (0, Σ_{NT})   (99)

where η is a set of measurement errors.

• In calibration exercises Σ_{NT} = 0 and S_T are averages of the data.

• In SMM: Σ_{NT} = 0 and S_T are moments of the data.

Let L(S_N(θ)|S_T) = p(S_T|S_N(θ)), where the latter is the conditional density implied by (99). Let N be large, and let Σ_{NT} be the variance of S_T (computed using asymptotic distributions or small sample devices, such as bootstrap or MC methods).

Given any other prior information π(θ), the prior for θ is

p(θ|S_T) ∝ L(S_N(θ)|S_T) π(θ)   (100)

- Choose dim(S_T) ≥ dim(θ).

- Even if Σ_{NT} is diagonal, S_N(θ) will induce correlation across the θ_i.

- The information used to construct S_T should be different from the information used to estimate the model. It could be data from a training sample, from a different country, or from a different regime (see e.g. Canova and Pappa, 2007).

- A normal η makes life easy. Could use other distributions, e.g. uniform.

- What are the S_T? Steady states, autocorrelation functions, impulse responses, etc. Choose S_T to ensure that all θ are identifiable.
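A minimal sketch of (99)-(100) by importance weighting (the mapping θ → S_N(θ) = θ/(1−θ) is a hypothetical stand-in for the model, and the Gaussian η follows the "normal η makes life easy" remark):

```python
import math
import random

random.seed(2)
S_T = 1.0                 # data-based statistic
Sigma_NT = 0.2 ** 2       # variance of the measurement error eta

def S_N(theta):
    """Hypothetical model-implied statistic."""
    return theta / (1.0 - theta)

# initial prior pi(theta) = U(0.05, 0.95); weight each draw by the Gaussian
# likelihood of the data statistic given the model statistic, as in (100)
draws = [random.uniform(0.05, 0.95) for _ in range(5000)]
w = [math.exp(-(S_T - S_N(th)) ** 2 / (2.0 * Sigma_NT)) for th in draws]
tot = sum(w)
post_mean = sum(th * wi for th, wi in zip(draws, w)) / tot

# theta/(1-theta) = 1 at theta = 0.5, so the elicited prior concentrates there
print(post_mean)
```

The weighted draws are a sample from the elicited prior p(θ|S_T); with several statistics the weights induce the correlation across the θ_i mentioned above.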
Example 8.2

max_{ct, Kt+1, Nt}  E0 Σt β^t [ct^ϑ (1−Nt)^(1−ϑ)]^(1−φ) / (1−φ)   (101)

Gt + ct + Kt+1 = GDPt + (1−δ)Kt   (102)
ln ζt = ζ̄ + ρz ln ζt−1 + ε1t,   ε1t ∼ (0, σz²)   (103)
ln Gt = Ḡ + ρg ln Gt−1 + ε4t,   ε4t ∼ (0, σg²)   (104)
GDPt = ζt Kt^(1−η) Nt^η   (105)

K0 given, where ct is consumption, Nt is hours, Kt is the capital stock. Let Gt be financed with lump sum taxes and λt be the Lagrange multiplier on (102).

The FOCs are ((109) and (110) equate factor prices and marginal products):

λt = ϑ ct^(ϑ(1−φ)−1) (1−Nt)^((1−ϑ)(1−φ))   (106)
η λt ζt Kt^(1−η) Nt^(η−1) = (1−ϑ) ct^(ϑ(1−φ)) (1−Nt)^((1−ϑ)(1−φ)−1)   (107)
λt = Et β λt+1 [(1−η) ζt+1 Kt+1^(−η) Nt+1^η + (1−δ)]   (108)
wt = η GDPt/Nt   (109)
rt = (1−η) GDPt/Kt   (110)
Using (106)-(107) we have:

[(1−ϑ)/ϑ] · ct/(1−Nt) = η GDPt/Nt   (111)
The log-linearized equilibrium conditions are:

λ̂t − (ϑ(1−φ)−1) ĉt + (1−ϑ)(1−φ) [Nss/(1−Nss)] N̂t = 0   (112)
λ̂t+1 + [(1−η)(GDP/K)ss / ((1−η)(GDP/K)ss + (1−δ))] (ĜDPt+1 − K̂t+1) = λ̂t   (113)
[1/(1−Nss)] N̂t + ĉt − ĜDPt = 0   (114)
ŵt − ĜDPt + N̂t = 0   (115)
r̂t − ĜDPt + K̂t = 0   (116)
ĜDPt − ζ̂t − (1−η) K̂t − η N̂t = 0   (117)
(g/GDP)ss ĝt + (c/GDP)ss ĉt + (K/GDP)ss (K̂t+1 − (1−δ) K̂t) − ĜDPt = 0   (118)
(117) and (118) are the production function and resource constraint.
Four types of parameters appear in (112)-(118):
i) Technological parameters (η, δ).
ii) Preference parameters (β, φ, ϑ).
iii) Steady state parameters (Nss, (c/GDP)ss, (K/GDP)ss, (g/GDP)ss).
iv) Parameters of the driving processes (ρg, ρz, σz², σg²).
Question: How do we set a prior for these 13 parameters?
The steady states of the model are:

[(1−ϑ)/ϑ] (c/GDP)ss = η (1−Nss)/Nss   (119)
β[(1−η)(GDP/K)ss + (1−δ)] = 1   (120)
(g/GDP)ss + (c/GDP)ss + δ(K/GDP)ss = 1   (121)
wss Nss/GDPss = η   (122)
iss/Kss = δ   (123)

Five equations in 8 parameters!!
Need to choose: e.g. (119)-(123) determine (Nss, (c/GDP)ss, (K/GDP)ss, η, δ) given ((g/GDP)ss, β, ϑ).

Set θ2 = [(g/GDP)ss, β, ϑ] and θ1 = [Nss, (c/GDP)ss, (K/GDP)ss, η, δ].
If S1T are steady state relationships, we can use (119)-(123) to construct g(θ1|θ2).

How do we measure ν (the uncertainty in S1T)?

- Take a rolling window to estimate S1T and use the uncertainty of the estimate of S1T to calibrate var(ν); bootstrap S1T, etc.

- Mechanically: find θ1 such that, given θ2, SN(θ1|θ2) = S1T − ν.

How do we set a prior for θ2? Use additional statistics!
- (g/GDP)ss could be centered at the average G/Y in the data, with a standard error covering the existing range of variation.

- β = (1+r)^(−1) and typically rss ∈ [0.0075, 0.0150] per quarter. Choose a prior centered around those values and, e.g., uniformly distributed.

- ϑ is related to the Frisch elasticity of labor supply: use estimates of the labor supply elasticity to obtain histograms and to select a prior shape.

- The uncertainty could be data based or across studies (meta uncertainty).
The parameters of the driving processes, θ3 = (ρg, ρz, σz², σg²), do not enter the steady state. How do we choose a prior for them?

- ρz, σz² can be backed out from moments of the Solow residual, i.e. estimate the variance and the AR(1) coefficient of zt = ln GDPt − (1−η)Kt − ηNt, once η is chosen. The prior for η induces a distribution for zt.

- ρg, σg² can be backed out from moments of government expenditure data.
• Prior standard errors should reflect variations in these statistics.

- For θ4 = φ one has two options:

(a) Appeal to existing estimates. The coefficient of relative risk aversion (RRA) is 1 − ϑ(1−φ). Construct a prior which is consistent with the cross section of estimates (e.g. a χ²(2) would be ok).

(b) Select, say, var(ct) and use

var(ct) = var(ct(φ)|θ1, θ2, θ3) + νc   (124)

to back out a prior for φ.
For some parameters (call them θ5) we have no moments to match but some micro evidence. Then p(θ5) = π(θ5) could be estimated from the histogram of available estimates.

In sum, a data-based prior for θ is

p(θ) = p(θ1|S1T) p(θ2|S2T) p(θ3|S3T) p(θ4|S4T) π(θ5)   (125)

- If we had used a different utility function, the prior for, e.g., θ1 and θ4 would be different. Priors for different models should be different.

- To use these priors, we need a normalizing constant ((125) is not necessarily a density). Need a RW Metropolis to draw from these priors.

- Careful about multidimensional ridges: e.g. the steady states give 5 equations in 8 parameters - the solution is not unique and it is impossible to invert the relationship.

- Careful about choosing θ3 and θ4 when there are weak and partial identification problems - they are obtained conditional on θ1 and θ2.
Extension 1: Lombardi and Nicoletti (2012): impulse responses.

- ψ̄: vector of empirical IRFs; ψ(θ): model-based IRFs.

- Distance function d(θ|ψ̄) = vec(ψ(θ) − ψ̄) W vec(ψ(θ) − ψ̄)′, W a weighting matrix.

- Prior kernel: w(θ|ψ̄, K) = exp(−d(θ|ψ̄)/K) / (1 + exp(−d(θ|ψ̄)/K)). K is the variance of the logistic transformation and controls the beliefs of the analyst (K large means w(θ|ψ̄, K) close to uniform).

- Data-based prior: p(θ|ψ̄) = w(θ|ψ̄, K) / ∫ w(θ|ψ̄, K) dθ.

Let θ = [θ1, θ2]. Two special cases:

1) Prior kernel w(θ1|ψ̄, K, θ̄2) (some parameters do not enter the impulse responses and are calibrated).

2) Prior kernel w(θ1|ψ̄, K, θ2) g(θ2|μ̄θ2, Σ̄θ2) (the prior for some parameters is obtained from sources other than IRFs).
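A minimal sketch of this kernel, assuming (as above) a quadratic IRF distance and a logistic transformation with scale K; the IRF mapping ψ(θ) below is a made-up stand-in for the model-based responses:

```python
import numpy as np

def irf_distance(theta, psi_bar, psi_model, W):
    """Quadratic distance d(theta|psi_bar) between model and target IRFs."""
    g = (psi_model(theta) - psi_bar).ravel()
    return g @ W @ g

def prior_kernel(theta, psi_bar, psi_model, W, K):
    """Logistic prior kernel w(theta|psi_bar, K); K large -> nearly flat."""
    d = irf_distance(theta, psi_bar, psi_model, W)
    return np.exp(-d / K) / (1.0 + np.exp(-d / K))

# Hypothetical 3-period IRF of a scalar AR(1): psi(theta) = (theta, theta^2, theta^3)
psi_model = lambda theta: np.array([theta, theta ** 2, theta ** 3])
psi_bar = psi_model(0.8)          # target responses
W = np.eye(3)
print(prior_kernel(0.8, psi_bar, psi_model, W, K=1.0))   # d = 0 -> kernel peaks at 0.5
```

Parameters far from the target IRFs get a lower kernel value; dividing by the integral of the kernel (in practice, a Monte Carlo approximation) gives the data-based prior.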
Extension 2: Andrle and Benes (2013): system priors. The prior must produce results that are consistent with the policymakers' point of view about certain properties of the data.

- Pick any economically motivated statistic, e.g. the sacrifice ratio, the delayed output response to monetary policy, the duration of recessions. Use it to back out correlated priors for the parameters.

- Can also use restrictions on trends, properties of measurement error, etc. (frequency domain priors) to back out correlated priors.

- Not all parameter priors can be identified this way; add individual subjective priors for the rest.

• Dynare has an option in the estimation command called endogenous prior.

- It gives priors generating model predictions for the variances which are consistent with the data (as in Christiano et al., 2011).

- Problem: it does not use a training sample to get the priors. Need to write your own code to make it operate on a training sample.

• See later on how to use this technology for misspecified models.
9 Prior predictive analysis
• Evaluate the properties of a model a priori, without having used the data to estimate its parameters.

• Useful to see if a model is suitable to answer a question. This may help to narrow down the set of models to estimate. Also useful to get an idea of the range of values of certain statistics a model can produce.

• How much of what we see in posterior statistics is due to the prior? (Recall Mueller, 2012.)

• Idea: draw parameters from the prior. Plug them into the model. Construct distributions of outcomes of interest.
Prior predictive analysis entails three steps:
- Given a model Mj and associated parameters θj, j = 1, 2, ..., n, posit an independent prior density p(θj|Mj), giving the range of values and the associated probabilities. Let g̃(θj|Mj) be the product of the marginal parameter distributions and I{θj ∈ Θj} an indicator function equal to one if θj generates stable/determinate outcomes and zero otherwise. The joint prior is g(θj|Mj) = c^(−1) g̃(θj|Mj) I{θj ∈ Θj}, where c = ∫_{θj∈Θj} g̃(θj|Mj) dθj.

- The model generates (ex-ante) predictive distributions for the observables yj,1:T using

p(yj,1:T|Mj) = ∫_{θj∈Θj} L(yj,1:T|θj, Mj) g(θj|Mj) dθj   (126)

- For any vector of statistics ωj = h(yj,1:T), the (ex-ante) predictive distribution can be used to produce p(ωj|yj,1:T, Mj). In one example below ωj is inflation persistence; in another it is the fiscal multiplier. The distribution of ωj depends on Mj and yj,1:T.

• To generate prior predictive distributions for ωj: draw θj^l from g(θj|Mj); draw yj,1:T^l from L(yj,1:T|θj^l, Mj); with yj,1:T^l construct ωj^l. Repeat l = 1, 2, ..., L times and average over the θj draws.

• p(yj,1:T|Mj) is the prior distribution of the observables and p(ωj|yj,1:T, Mj) the prior distribution of the statistics of interest (on average over the parameter draws).
Example 9.1 Leeper et al. (2015): compare the prior-predictive distributions of fiscal multipliers produced by four models.

Example 9.2 New Keynesian Phillips curve. Can we match what we see in the data?

πt = β πt+1 + [(1−ζp)(1−ζp β)/ζp] mct + et   (127)

et is an expectation error. Use the output gap as an observable proxy for marginal costs. What is the prior range for inflation persistence, given uniform ranges for (ζp, β)?
[Figure: prior predictive distribution of the AR(1) coefficient of inflation as a function of β and ζp]
• Dynare does not do prior predictive analysis. You can do it with a simple loop in Matlab:

- Draw a vector of parameters from the prior, solve the model, and simulate a relatively long time series.

- Compute the statistics of interest.

- Repeat this L times; average the results or plot a histogram (density) of the results.
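The loop can be sketched as follows (here in Python rather than Matlab): a toy setup where the "solve and simulate" step is a scalar AR(1) whose coefficient is drawn from a uniform prior, and the statistic of interest is first order autocorrelation. In practice the middle step would call the model solution routine, e.g. through Dynare; everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T = 500, 200                   # number of prior draws, simulated sample length
omega = np.empty(L)               # statistic of interest: AR(1) coefficient of y

for l in range(L):
    rho = rng.uniform(0.2, 0.95)  # draw the parameter from the prior
    y = np.zeros(T)               # "solve and simulate" the toy model
    for t in range(1, T):
        y[t] = rho * y[t - 1] + rng.standard_normal()
    omega[l] = np.corrcoef(y[1:], y[:-1])[0, 1]  # first order autocorrelation

# prior predictive distribution of the statistic
print(np.percentile(omega, [5, 50, 95]))
```

The percentiles summarize the range of values of the statistic the model can produce a priori.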
10 Varying Coefficients DSGE

Dueker et al., 2007; Fernandez and Rubio, 2007; Canova, 2009; Rios and Santaeularia, 2010; Liu et al., 2011; Galvao et al., 2014; Vavra, 2014; Dew-Becker, 2014; Seoane, 2014; Meier and Sprengler, 2015: DSGE parameters are not time invariant.

• May be due to misspecification (Cogley and Yagihashi, 2010; Chang et al., 2013; Basile and Carvalho, 2015).

• May be needed to ensure the existence of a stationary equilibrium (Schmitt-Grohe and Uribe, 2003).

• May be due to robustness concerns (Hansen and Sargent, 2010) or learning (Cogley et al., 2015).
Standard approaches to modelling TVCs: exogenous.

• Continuously varying: θt = θt−1 + ut, ut ∼ (0, σu²), σu² small.

• Markov switching (Bianchi and Melosi, 2016).

• Endogenous time variations?

- State dependent (expansion vs. contraction) policy rules? State dependent trend inflation?

- Are households as risk averse when they are wealthy as when they are poor? Or as impatient when the capital stock is high as when it is low?

- Does the propagation of shocks depend on the state of private and government finances? Or on inequality?
Questions
• What are the consequences of neglecting parameter variations for the decision rules and the transmission of structural shocks?

• Does it matter whether time variations are exogenous or endogenous?

• Do TVC-DSGEs generate time varying decision rules and responses?

• Can we detect parameter variations? Can we determine if they are exogenous or endogenous?

• Can neglected time variations be the reason for identification and estimation pathologies?
10.1 The solution of a model with (continuous) TVC
Optimality conditions:
Et[f(Xt+1, Xt, Xt−1, Zt+1, Zt, Θt+1, Θt)] = 0   (128)

Xt: nx×1 vector of endogenous variables; Zt: nz×1 vector of exogenous variables; Θt: nθ×1 vector of possibly time varying (TV) structural parameters; f is continuous and differentiable up to order q.

Zt+1 = Γ(Zt, σΣ̄z εz,t+1)   (129)

Γ is continuous and differentiable up to order q; εz,t+1 ∼ iid(0, I) is an ne×1 vector, nz ≥ ne; σ ≥ 0 is an auxiliary scalar; Σ̄z is a known ne×ne matrix.

Θt+1 = Φ(θ̄, Xt, Ut+1)   (130)

Φ is continuous and differentiable up to order q, θ̄ is a vector of constants, Ut is an nu×1 vector of disturbances, nθ ≥ nu.

Ut+1 = Ψ(Ut, σΣ̄u εu,t+1)   (131)

Ψ is continuous and differentiable up to order q; εu,t ∼ iid(0, I) is an nu×1 vector, uncorrelated with εz,t+1; Σ̄u is a known nu×nu matrix.

Let εt+1 = [εz,t+1; εu,t+1], Σ = diag[Σ̄z, Σ̄u]. Decision rule:

Xt = h(Xt−1, Zt, Ut, σΣεt)   (132)
• Time variations only in the structural parameters (see Andreasen, 2012, and Ascari and Bonomolo, 2015, for TV in the auxiliary parameters σΣ).

• Endogenous TVC is modeled as in the relationship between capacity utilization and the depreciation rate (i.e. δt is a function of utilization ut plus a shock et).

• Possibility of common patterns of parameter variations.

• Zt and Ut are assumed to be stationary for convenience.

• Time variations are assumed to be continuous.
- Can accommodate once-and-for-all breaks (at known dates) with a smooth transition:

θt+1 = (1−ρ)θ̄ + ρθt + a · exp(t)/(b + exp(t))   (exogenous)   (133)
θt+1 = (1−ρ)θ̄ + ρθt + a · exp(c(Kt − K̄ + Uθ,t+1))/(b + exp(c(Kt − K̄ + Uθ,t+1)))   (endogenous)   (134)

t = −T1, ..., −1, 0, 1, ..., T2.
- Can NOT allow for Markov switching or non-smooth transitions, e.g.

θt+1 = (1 − I(Kt−1 > K̄)) θ0 + I(Kt−1 > K̄) θ1

(Davig and Leeper, 2006).
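A sketch of the exogenous smooth-transition path (133); the values of ρ, θ̄, a, b are made up for illustration:

```python
import numpy as np

# theta_{t+1} = (1 - rho) * theta_bar + rho * theta_t + a * exp(t) / (b + exp(t))
rho, theta_bar, a, b = 0.5, 0.3, 0.2, 1.0
ts = np.arange(-50, 51)                     # t = -T1, ..., T2 around the break date
theta = np.empty(len(ts) + 1)
theta[0] = theta_bar
for i, t in enumerate(ts):
    logistic = np.exp(t) / (b + np.exp(t))  # ~0 well before, ~1 well after the break
    theta[i + 1] = (1 - rho) * theta_bar + rho * theta[i] + a * logistic
print(theta[1], theta[-1])                  # pre-break level vs. post-break level
```

Before the break the path stays at θ̄; after it, it converges smoothly to the new level θ̄ + a/(1−ρ).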
[Figure: smooth-transition path of a structural parameter around a break date]
First order approximation
The linear decision rule is:

xt = Pxt−1 + Qzt + Rut   (135)

• P = ∂h/∂Xt−1 solves FP² + (G + NΦx)P + (H + OΦx) = 0.

• Given P, Q = ∂h/∂Zt solves VQ = −vec(LΓz + M), where V = Γz′ ⊗ F + Inz ⊗ (FP + G + NΦx) and vec denotes the columnwise vectorization.

• Given P, R = ∂h/∂Ut solves WR = −vec(NΦuΨu + OΦu), where W = Ψu′ ⊗ F + Inu ⊗ (FP + G + NΦx),

and Φu = ∂Φ/∂Ut+1, Φx = ∂Φ/∂Xt, Γz = ∂Γ/∂Zt, Ψu = ∂Ψ/∂Ut; Ψu is an nu×nu matrix and Γz an nz×nz matrix, both with all eigenvalues strictly less than one in absolute value.
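As a sketch, the quadratic matrix equation for P can be solved by fixed-point iteration (shown here for the Φx = 0 case, FP² + GP + H = 0). The matrices below are made up for illustration; in practice one would use a QZ/generalized Schur decomposition to select the stable solution.

```python
import numpy as np

def solve_P(F, G, H, tol=1e-12, maxit=1000):
    """Iterate P <- -(F P + G)^{-1} H until F P^2 + G P + H = 0."""
    P = np.zeros_like(G)
    for _ in range(maxit):
        P_new = -np.linalg.solve(F @ P + G, H)
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    raise RuntimeError("no convergence")

# Illustrative 2x2 system
F = 0.5 * np.eye(2)
G = np.array([[1.2, 0.1], [0.0, 1.1]])
H = np.array([[-0.3, 0.0], [0.05, -0.2]])
P = solve_P(F, G, H)
print(np.max(np.abs(F @ P @ P + G @ P + H)))   # residual of the quadratic equation
```

Given P, the equations for Q and R are linear and can be solved directly with Kronecker-product algebra (np.kron) as in the text.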
Results:

1) Decision rules are time invariant VARMA(1,1). Having time varying parameters is equivalent to having additional shocks. (If Φx = Φu = 0, then R = 0, P solves FP² + GP + H = 0 and, given P, Q solves VQ = −vec(LΓz + M), where V = Γz′ ⊗ F + Inz ⊗ (FP + G).)

2) With exogenous variations (Φx = 0), the dynamics in response to structural shocks are the same as in a time invariant model.

3) With endogenous variations (Φx ≠ 0), the dynamics in response to structural shocks may be affected (unless NΦx and OΦx are zero).

4) No time varying decision rules, no time varying responses. Time varying linear decision rules are possible with learning (see e.g. Cogley et al., 2015, Kulish and Pagan, 2016) or with time variations in the auxiliary parameters.
Higher order approximations: do the conclusions change?

• The second order approximate law of motion of the structural parameters is

θt = Φx xt−1 + Φu ut + (1/2) Φ ξt   (136)

where ξt = vec([xt−1; ut][xt−1′ ut′]), Φ = [Φ1′, ..., Φnθ′]′, Φj = vec([Φxx^j Φux^j; Φux^j Φuu^j])′, and Φxx^j = [∂²Φt^j / ∂xht ∂xit].
• The pruned second order approximate equilibrium conditions are:

0 = Et(Fxt+1 + Gxt + Hxt−1 + Lzt+1 + Mzt + NΘt+1 + OΘt + (1/2)Λ1xzθ γt+1 + (1/2)Λ0xzθ γt)   (137)

γt = vec([xt; xt−1; zt; θt][xt′ xt−1′ zt′ θt′])

and Λixzθ, i = 0, 1, contains the coefficients on the cross derivative terms. Let

γ̃t = vec([xt; xt−1; zt; ut][xt′ xt−1′ zt′ ut′])

Since ξt = (J0 ⊗ J0)γ̃t and γt = (J1 ⊗ J1)γ̃t, where

J0 = [0 I 0 0; 0 0 0 I],   J1 = [I 0 0 0; 0 I 0 0; 0 0 I 0; 0 Φx 0 Φu]

the optimality conditions can be rewritten as:

0 = Et{Fxt+1 + (G + NΦx)xt + (H + OΦx)xt−1 + [L M][zt+1; zt] + [NΦu OΦu][ut+1; ut]
+ (1/2)(NΦ(J0 ⊗ J0) + Λ1xzθ(J1 ⊗ J1))γ̃t+1 + (1/2)(OΦ(J0 ⊗ J0) + Λ0xzθ(J1 ⊗ J1))γ̃t}   (138)
Following Lombardo and Sutherland (2007), one can rewrite (138) as

0 = Et(Fxt+1 + (G + NΦx)xt + (H + OΦx)xt−1 + [L M][zt+1; zt] + [NΦu OΦu][ut+1; ut] + (1/2)A γ̃t) + B   (139)

where

A = (NΦ(J0 ⊗ J0) + Λ1xzθ(J1 ⊗ J1))(P̃ ⊗ P̃) + (OΦ(J0 ⊗ J0) + Λ0xzθ(J1 ⊗ J1))
B = (1/2)(NΦ(J0 ⊗ J0) + Λ1xzθ(J1 ⊗ J1))(Q̃ ⊗ Q̃)Σ

P̃ = [P 0 QΓz RΨu; I 0 0 0; 0 0 Γz 0; 0 0 0 Ψu],   Q̃ = [Q R; 0 0; I 0; 0 I]

The solution to (139) is

xt = Pxt−1 + Qzt + Rut + Cγ̃t + D   (140)

where, by construction, P, Q, R are the same as in the first order solution, C solves SC = −vec((1/2)A) with S = Iℓ ⊗ (FP + G + NΦx), ℓ = 2nx + nz + nu, and D is a function of B.
With constant coefficients, the optimality condition (138) is

0 = Et(Fxt+1 + Gxt + Hxt−1 + [L M][zt+1; zt] + [0 0][ut+1; ut] + (1/2)Acc γ̃t) + Bcc   (141)

where

Acc = Λ1xzθ(J1cc ⊗ J1cc)(P̃cc ⊗ P̃cc) + Λ0xzθ(J1cc ⊗ J1cc)
Bcc = (1/2)Λ1xzθ(J1cc ⊗ J1cc)(Q̃cc ⊗ Q̃cc)Σ

P̃cc = [Pcc 0 QccΓz 0; I 0 0 0; 0 0 Γz 0; 0 0 0 0],   Q̃cc = [Qcc 0; 0 0; I 0; 0 0]

J0cc = [0 I 0 0; 0 0 0 0],   J1cc = [I 0 0 0; 0 I 0 0; 0 0 I 0; 0 0 0 0]

and the terms in Λ1xzθ corresponding to the second order or cross derivatives with respect to θt are zero. The solution is

xt = Pcc xt−1 + Qcc zt + Ccc γ̃t + Dcc   (142)

where Ccc solves S Ccc = −vec((1/2)Acc), and Dcc is a function of Bcc.
� Compare (140) and (142).
1) With exogenous variations and second order solutions, the dynamics of xt in response to zt shocks are the same in the time varying and constant coefficient models, since both C and D are independent of ut.

• A constant coefficient model approximates the dynamics well.

2) Not true for higher order solutions. For example, in a third order approximation the optimality conditions feature terms requiring a correction of the linear terms to account for uncertainty. Since the shocks ut are omitted in constant coefficient models, the correction terms differ.

3) With endogenous variations and a second order solution, structural responses in TVC and constant parameter models may differ, even when the linear structural responses do not, since there are higher order feedback effects on xt via the parameter variations.
Example 10.1
Et yt+1 ≡ f(xt, θt) = θt xt^0.95   (143)
xt − x̄ = 0.8(xt−1 − x̄) + εzt   (144)
θt = 2 − 0.5[exp(−0.03(xt−1 − x̄)) + exp(0.03(xt−1 − x̄))] + εut   (145)

εzt, εut i.i.d., x̄ ≡ Ext = 1. The second order solutions are

yt − ȳ = 0.76(xt−1 − x̄) + 0.95εzt + εut − 0.01565(xt−1 − x̄)² − 0.238(εzt)² + 0.038(xt−1 − x̄)εzt + 0.76(xt − x̄)εut + 0.95εzt εut   (TVC)   (146)
yt − ȳ = 0.76(xt−1 − x̄) + 0.95εzt + εut − 0.01520(xt−1 − x̄)² − 0.238(εzt)² + 0.038(xt−1 − x̄)εzt + 0.76(xt − x̄)εut + 0.95εzt εut   (CC)   (147)

• The linear responses to εzt in (146) and (147) are the same, since NΦx = OΦx = 0. The second order responses are not, because Φxx = −0.00045.
Time varying decision rules and impulse responses?
• Linearized solutions do not have TVC decision rules. Does this change with higher order solutions? (140) can be rewritten as

xt ≈ Pxt−1 + Qzt + Rut + C22 vec(xt−1 xt−1′) + C33 vec(zt zt′) + C44 vec(ut ut′) + C23 vec(xt−1 zt′) + C24 vec(xt−1 ut′) + C34 vec(zt ut′)   (148)

• If ut is treated as any other (exogenous) variable, (148) is again a fixed coefficient representation.

• If ut is treated as a "parameter", letting Ω1t = P + C24 ut, Ω2t = Q + C34 ut and neglecting a number of square terms,

xt ≈ A + Ω1t xt−1 + Ω2t zt + Rut   (149)

a time varying VARMA(1,1) decision rule.

• Is the "parameter" interpretation justified? No. It is valid only in Markov switching models.

• Even under the "parameter" interpretation, structural responses will be time invariant since the innovations in zt and ut are uncorrelated.
A RBC example
maxE0
1Xt=1
�t(C1��t
1� ��A
N1+ t
1 + ) (150)
Yt(1� gt) = Ct +Kt � (1� �t)Kt�1 (151)
Yt = �tK�t�1N
1��t (152)
Yt is output, Ct consumption, Kt the stock of capital, Nt hours worked
and gt =GtY tthe share of government expenditure.
Exogenous disturbances:
ln �t = (1� ��) ln�� + �� ln �t�1 + ezt (153)
ln gt = (1� �g) ln �g + �g ln gt�1 + egt (154)
- (�, �, , A, ��; �g; ��; �g; ��; �g) �xed; �t and �t time varying (Meier andSprengler, 2015; Dueker, et al., 2007, Liu et al., 2011).
Optimality conditions:

A Ct^σ Nt^γ = (1−α)(1−gt) Yt/Nt   (155)
Ct^(−σ) = Et[(βt+1/βt) Ct+1^(−σ) (α(1−gt+1) Yt+1/Kt+1 + 1 − δt+1)] + Et[(∂βt+1/∂Kt) u(Ct+1, Nt+1) − (∂δt+1/∂Kt) Kt]   (156)
(1−gt)Yt = Ct + Kt − (1−δt)Kt−1   (157)
Yt = θt Kt−1^α Nt^(1−α)   (158)
Two effects of parameter variations on the optimality conditions:

- A direct effect in the Euler equation and in the resource constraint when βt and δt are time varying.

- If agents take into account that their decisions affect parameter variations, there is an additional (indirect) effect due to the derivatives of βt+1 and δt+1 with respect to the endogenous states (the capital stock).

- Are time varying parameters Chari et al. (2007) wedges? No: there are across-equation restrictions.
• Model A: Constant parameters.

• Model B: Exogenous parameter variations. Let dt = βt+1/βt. Set

Θ1,t+1 ≡ [dt+1 − d̄; δt+1 − δ̄] = [uβ,t+1; uδ,t+1] ≡ Ut+1

uβ,t+1 = ρd uβ,t + eβ,t+1   (159)
uδ,t+1 = ρδ uδ,t + eδ,t+1   (160)

• Model C: State dependent parameter variations, no internalization. Set

Θ1,t+1 = [Δu − (Δu − Δl) e^(−κ1(Kt−K̄)); Δu − (Δu − Δl) e^(κ2(Kt−K̄))] + uθ,t+1   (161)

κ1, κ2, Δu, Δl are vectors. Let uθ,t+1 be zero mean, iid shocks.
[Figure: state dependence of the discount factor (left) and the depreciation rate (right) as functions of Kt − K̄]
• Model D: State dependent parameter variations, internalization. Use the specification as in C, and use

d′t+1 ≡ ∂dt+1/∂Kt = −(du − Δ/2)[−κ11 e^(−κ11(Kt−K̄)) + κ21 e^(κ21(Kt−K̄))]   (162)
δ′t+1 ≡ ∂δt+1/∂Kt = −(δu − Δ/2)[−κ12 e^(−κ12(Kt−K̄)) + κ22 e^(κ22(Kt−K̄))]   (163)
[Figure: responses of output, consumption, capital, and hours to technology shocks, first order approximation, Models A, B, C, D]

Responses to technology shocks, first order approximation
• Income and substitution effects are different! Agents work and accumulate less, and consume more, for a given income increase.
[Figure: first and second order responses of hours and consumption to technology shocks, Models A and B]
First and second order responses to technology shocks
- Responses in models A and B are identical.

- Quadratic terms are small, so first and second order responses look alike.

- Since ut does not respond to technology shocks, the dynamics are time invariant.
10.2 Characterizing time invariant misspecification

• Difficult.

• Hard to distinguish a TVC model from a misspecified model or a model with measurement error:

xt = Pxt−1 + Qzt + Rut   (164)
xt = Pxt−1 + Q* z*t   (165)
x̃t = Pxt−1 + Qzt + vt   (166)

Q* = [Q, R], z*t = [zt′, ut′]′, vt = Rut.
Wedges
Constant coefficient model with CC data:

0 = Et[F(Xt−1, Wt, σΣz εz,t+1, Θ)]   (167)
Xt = h(Xt−1, Wt, σΣz εz,t+1, Θ)   (168)

Constant coefficient model with TVC data:

0 = Et[F*(X*t−1, Wt, σΣεt+1, Θ)]   (169)
X*t = h*(X*t−1, Wt, σΣεt+1, Θ)   (170)

Wt = [Zt, Ut]′, εt+1 = [εz,t+1; εu,t+1].

Implications:
1) Et[F(X*t−1, Wt, σΣz εz,t+1, Θ)] ≠ 0 (since σΣz εz,t+1 ≠ σΣεt+1 and h ≠ h*).
2) F(X*t−1, Wt, σΣz εz,t+1, Θ) is predictable using past X*t−1.

In the first order system the wedge is

(F(P* − P)² + G(P* − P)) xt−1 + (F(Q* − Q)Γz + G(Q* − Q) + F(P* − P)(Q* − Q)) zt + (F(P* − P)R* + GR* + FR*Ψu) ut   (171)

• If P* = P, Q* = Q, the wedge reduces to

(GR* + FR*Ψu) ut   (172)

This is different from zero if R* ≠ 0 and predictable using past xt if Ψu ≠ 0.

• If P* ≠ P, Q* ≠ Q, the wedge is non zero (even if R* = 0) and predictable even if Ψu = 0.

• H0: the model is correctly specified up to time variations. Trade-off: misspecification vs. time variations.
Forecast errors
- Linearized decision rule in the constant coefficient model: xt = Pxt−1 + Qzt.

- Linearized decision rule in the TVC model: x*t = P* x*t−1 + Q* zt + R* ut.

Let v*t be the forecast error in predicting x*t using the decision rules of the constant coefficient model and TVC data:

v*t ≡ x*t − Px*t−1 = Q*zt + R*ut + (P* − P)x*t−1   (173)

• The forecast error is predictable.

• This is true when P* ≠ P, but also when P* = P if the ut are serially correlated (they affect x*t−1).
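The diagnostic used in the table that follows regresses the forecast errors on lagged model variables and tests that the slope coefficients are jointly zero. A minimal sketch with simulated data (the DGP and all numbers are made up; in practice v*t would come from the estimated decision rules):

```python
import numpy as np

def forecast_error_ftest(v, X):
    """F statistic for H0: all slopes are zero in v_t = a + X_t b + e_t."""
    X1 = np.column_stack([np.ones(len(v)), X])
    b = np.linalg.lstsq(X1, v, rcond=None)[0]
    rss = np.sum((v - X1 @ b) ** 2)            # unrestricted RSS
    rss0 = np.sum((v - v.mean()) ** 2)         # restricted RSS (slopes = 0)
    q, df = X.shape[1], len(v) - X1.shape[1]
    return ((rss0 - rss) / q) / (rss / df)     # ~ F(q, df) under H0

rng = np.random.default_rng(1)
T = 500
x = np.zeros(T)
for t in range(1, T):                          # a persistent lagged state
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
v_pred = 0.3 * x[:-1] + rng.standard_normal(T - 1)  # predictable errors (TVC-type DGP)
v_iid = rng.standard_normal(T - 1)                  # unpredictable errors (correct model)
print(forecast_error_ftest(v_pred, x[:-1, None]))   # large F: predictability detected
print(forecast_error_ftest(v_iid, x[:-1, None]))    # small F: cannot reject H0
```

A large F statistic signals either neglected time variation or misspecification, which is exactly the trade-off noted above.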
Misspeci�cation diagnostic: RBC example, N=1000
DGP Euler wedge Forecast errors outputF-test ct�1; rt�1 = 0F-test ct�1; nt�1; yt�1 = 0T=1000 T=150 T=1000 T=150
Fixed coe�cients 0.00 0.00 0.00 0.00Exogenous TVC, no serial correlation 0.08 0.15 0.11 0.21
Exogenous TVC 0.37 0.33 0.76 0.59Endogenous TVC 0.33 0.18 1.00 0.97
Endogenous TVC (internalization) 0.20 0.16 1.00 0.91Fixed coe�cients, higher order solution 0.00 0.00 0.00 0.00
Fixed coe�cients, time to build 0.19 0.26 0.19 0.17Fixed coe�cients, capacity utilization 0.00 0.00 0.00 0.00
Exogenous vs. endogenous variations?

• Use a DSGE-VAR setup. Idea:

- If time variations are detected, add T1 data points simulated from each model with time variations.

- If the additional data come from the DGP, the precision of the estimates improves and the ML increases.

- If the additional data do not come from the DGP, noise is added, precision decreases (bias may be generated), and the ML decreases.

- Compare the ML (Bayes factors) of the various extended models.
RBC example: T=150

Bayes factor comparison, N=100, value > 3.0
                        T1=150                        T1=450
DGP                 Model B  Model C  Model D     Model B  Model C  Model D
Simulated from B     1.00     0        0           1.00     0        0
Simulated from C     0.01     0.99     0           0        1.00     0
Simulated from D     0.01     0        0.99        0        0        1.00
10.3 Inferential distortions

Parameter identification

• Canova and Sala (2009): DSGE models have POPULATION identification problems.

• Can they arise because parameter variations are neglected?

• What should we expect to happen to the likelihood function when time invariant parameters are incorrectly assumed?

• Two distortions: wrong decision rules (models C-D), misaggregation of shocks (all models).
[Figure: likelihood surfaces in (γ, η) for the RBC model: true model vs. estimated models A, B, C, D]

Likelihood surfaces, RBC model
- The curvature of the correct likelihood is fine (max at γ = 2, η = 2, α = 0.3, ρz = 0.9 for all models).

- When the decision rules of model A are used and the DGP is model B, C, or D, distortions are large and some partial identification problems exist.

- Misspecification of P and Q and shock aggregation are both important.
ML based estimates and impulse responses

• Use as DGPs RBC models B, C, D; N=150; T=100 or T=1000.

• Two DGPs: one where parameter variations explain little of output variability (less than 5 percent); one where parameter variations explain a sizable portion of output variability (20 percent).

• Estimate a time invariant model and the correct model.

• Compare parameter estimates, impulse responses to structural shocks, and variance decompositions.
Parameter estimates

                          Correct   Constant Coefficient, T=150    Constant Coefficient, T=1000
DGP  Parameter (true)     median    5th     Median   95th          5th     Median   95th
B    η = 2.0              2.01      0.87    2.08     2.16          0.87    1.38     2.00
     γ = 2.0              2.00      2.31    2.55     2.75          2.13    2.53     2.67
     ρz = 0.9             0.89      0.93    0.94     0.98          0.94    0.95     0.98
     ρg = 0.5             0.49      0.80    0.86     0.89          0.74    0.85     0.87
     δ = 0.025            0.02      0.01    0.01     0.01          0.01    0.01     0.03
     α = 0.3              0.30      0.10    0.15     0.18          0.10    0.13     0.17
     σ = 2.0              1.98      0.55    1.38     2.00          0.55    1.50     2.00
C    γ = 2.0              2.00      1.62    1.97     2.47          1.62    1.98     2.50
     ρz = 0.9             0.90      0.91    0.95     1.00          0.91    0.97     1.00
     ρg = 0.5             0.49      0.51    0.85     1.00          0.51    0.85     1.00
     δ = 0.025            0.02      0.02    0.07     0.07          0.04    0.07     0.08
     α = 0.3              0.30      0.17    0.25     0.28          0.19    0.26     0.29
D    σ = 2.0              1.56      0.39    0.61     1.84          0.04    0.37     1.30
     γ = 2.0              1.98      1.52    2.23     2.58          0.79    1.32     2.30
     ρz = 0.9             0.90      0.91    0.99     1.00          0.94    1.00     1.00
     ρg = 0.5             0.90      0.99    1.00     1.00          1.00    1.00     1.00
     δ = 0.025            0.03      0.01    0.02     0.03          0.01    0.01     0.03
     α = 0.3              0.30      0.10    0.15     0.22          0.10    0.13     0.22
[Figure: impulse responses of hours, capital, output, and consumption to technology shocks; models B, C, D (median) vs. estimated time invariant model A (16th and 84th percentile bands)]

Impulse responses to technology shocks
Variance attributed to

        True                          Estimated time invariant
        Technology   Government      Technology   Government
        shocks       shocks          shocks       shocks
DGP: Model B
Y       0.942        0.002           0.968        0.032
C       0.791        0.045           0.586        0.414
N       0.478        0.068           0.376        0.624
K       0.749        0.058           0.564        0.436
DGP: Model C
Y       0.965        0.006           0.977        0.023
C       0.780        0.055           0.878        0.122
N       0.430        0.139           0.499        0.501
K       0.592        0.108           0.949        0.051
DGP: Model D
Y       0.898        0.002           0.738        0.262
C       0.836        0.055           0.439        0.561
N       0.393        0.153           0.573        0.427
K       0.829        0.128           0.757        0.243
Punchline
- Large distortions in persistence and in the parameters controlling income and substitution effects.

- Responses have the wrong magnitude.

- Results are similar with internal or external variations. Performance does not improve when T=1000.

- If the DGP has significant time variations, the results worsen.
10.4 Estimation of Gertler and Karadi (2010) model
- Use as observables: �Yt, �ct,�Creditt,�Leveraget,Spread (10y BAA-10y T bond). Sample: 1985:2-2014:2
- Focus of h ( habit parameter), � (share of funds that can be stolen bybankers), ! (regulating leverage) and � (bankers' survival probability)
Forecast error regressions diagnostic

            t-statistics                                              F-stat
Equation    Yt−1    Ct−1    Creditt−1   Leveraget−1   Spreadt−1
Sample 1985:3-2014:3
Y            0.84    2.61    0.24        0.52         10.00           4.39
C           -0.85    1.11    0.85       -0.65          0.33           1.26
Credit       1.06    2.61    1.65       -0.58          8.49           7.11
Leverage    -1.11   -2.50   -1.63        0.63         -8.25           7.04
Spread      -1.26   -3.06   -1.10        0.81         -8.46           8.16
Sample 1985:3-2007:4
Y           -1.79    3.87   -2.23       -0.38          6.86           4.23
C           -1.37    1.19   -0.26        0.38          1.40           0.81
Credit      -1.18    3.53   -0.69       -0.08          7.02           3.60
Leverage    -1.06   -3.46    0.75        0.09         -6.80           3.72
Spread       1.16   -3.84   -1.03        0.17         -6.86           4.29
Wedge diagnostic

              Mean      Ct−1      IYt−1     F-stat
Euler wedge   0.02      -0.10     0.72      6.98
              (0.03)    (0.01)    (0.13)
- Small but predictable wedge.

- Time variations are not due to the last 8 years (conclusions are the same for the sample 1985:2-2007:4).

- DSGE-VAR diagnostic: TVC with endogenous variations is strongly preferred to all other specifications (log odds > 300).
Parameter estimates

Parameter   Time Invariant   Time Invariant,   Exogenous TVC   Endogenous TVC,
                             6 shocks                          function of net worth
h           0.43 (0.006)     0.11 (0.02)       0.19 (0.03)     0.09 (0.02)
λ           0.24 (0.01)      0.97 (0.01)       0.37 (0.03)     0.55 (0.03)
ω           0.01 (0.008)     0.02 (0.001)      0.02 (0.002)    0.11 (0.008)
θ           0.46 (0.009)     0.80 (0.01)       0.54 (0.01)     0.52 (0.02)
ρθ                                             0.99 (0.004)
σθ                                             0.02 (0.002)    0.03 (0.003)
ρu                                             0.98 (0.008)
κ1                                                             0.02 (0.007)
κ2                                                             0.15 (0.009)
Log ML      -167.97          1098.32           1546.18         1628.69
[Figure: responses of output, inflation, investment, net worth, leverage, and the spread to a capital quality shock: constant, exogenous, and endogenous TVC specifications]

Dynamics in response to a capital quality shock
11 Estimating models with structural breaks
• Cagliarini and Kulish (2013) and Kulish and Pagan (2016) study DSGEs with predictable structural breaks.

• Guerrieri and Iacoviello (2014, 2015) consider DSGEs with occasionally (stochastically) binding constraints.

• The solution and estimation technology is similar in the two cases: use a piecewise linear approach to solve the model; build the likelihood function and maximize it (or combine it with the prior to get a posterior).

• Piecewise linear methods are reasonably accurate when compared to fully non-linear solution methods, and they are faster to compute.
11.1 Structural breaks or predictable announcements
Kulish and Pagan distinguish three situations:

• Agents perceive the change when it occurs.

• Agents anticipate the change and are right about the timing.

• Agents anticipate the change but are wrong about the timing.

In the last two cases the decision rules will have a time varying coefficient structure during the transition period.
Standard (no break) setup

Linear optimality system (no distinction between states and controls):

A0 yt = A1 yt−1 + B0 Et yt+1 + D0 et   (174)

First order solution:

yt = Q yt−1 + G et   (175)

Using Et yt+1 = Q yt and (A0 − B0 Q)^(−1) = (I − A0^(−1) B0 Q)^(−1) A0^(−1) in (174) we have

yt = (I − A0^(−1) B0 Q)^(−1) (A0^(−1) A1 yt−1 + A0^(−1) D0 et)   (176)

Matching coefficients in (175) and (176), it must be that

(I − A0^(−1) B0 Q)^(−1) A0^(−1) A1 = Q   (177)
(I − A0^(−1) B0 Q)^(−1) A0^(−1) D0 = G   (178)

(177) is a quadratic equation in Q. Given a solution for Q, (178) is a linear equation in G, given A0, B0, A1, D0.
• Assume you observe zt = Hyt + vt, where vt ∼ N(0, V) is measurement error.

• The likelihood function is

log L = −0.5 NT log(2π) − 0.5 Σ_{t=1}^T [log det(HΣt|t−1H′ + V) + ut′(HΣt|t−1H′ + V)^(−1) ut]   (179)

where ut = zt − Et−1 zt and Σt|t−1 ≡ Et(yt − Et−1 yt)(yt − Et−1 yt)′ are computed via the Kalman filter, with (175) as the state equation.
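A compact sketch of the prediction-error decomposition in (179), with state equation yt = Qyt−1 + Get and observation equation zt = Hyt + vt. The matrices and simulated data below are illustrative; a production implementation would initialize Σ1|0 from the unconditional state variance.

```python
import numpy as np

def kalman_loglik(z, Q, G, H, V):
    """Log likelihood of z_{1:T} for y_t = Q y_{t-1} + G e_t, z_t = H y_t + v_t."""
    T, n = z.shape
    k = Q.shape[0]
    y = np.zeros(k)                   # E_{t-1} y_t
    S = np.eye(k)                     # Sigma_{t|t-1}, crude initialization
    GG = G @ G.T
    ll = -0.5 * n * T * np.log(2 * np.pi)
    for t in range(T):
        Fz = H @ S @ H.T + V          # forecast error variance
        u = z[t] - H @ y              # forecast error u_t
        Fi = np.linalg.inv(Fz)
        ll += -0.5 * (np.log(np.linalg.det(Fz)) + u @ Fi @ u)
        K = S @ H.T @ Fi              # Kalman gain
        y_upd = y + K @ u             # update step
        S_upd = S - K @ H @ S
        y = Q @ y_upd                 # prediction step
        S = Q @ S_upd @ Q.T + GG
    return ll

# Illustrative scalar model, simulated data
rng = np.random.default_rng(0)
Q = np.array([[0.9]]); G = np.array([[1.0]]); H = np.array([[1.0]]); V = np.array([[0.1]])
T = 200
ytrue = np.zeros(T); zobs = np.zeros((T, 1))
for t in range(1, T):
    ytrue[t] = 0.9 * ytrue[t - 1] + rng.standard_normal()
zobs[:, 0] = ytrue + np.sqrt(0.1) * rng.standard_normal(T)
print(kalman_loglik(zobs, Q, G, H, V))
```

The same routine, run with two sets of system matrices on either side of a break date, delivers the split-sample likelihood (184) below.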
Unpredictable structural breaks
The optimality conditions are

A0 yt = A1 yt−1 + B0 Et yt+1 + D0 et,   t ≤ T*   (180)
A*0 yt = A*1 yt−1 + B*0 Et yt+1 + D*0 et,   t > T*   (181)

The solution is

yt = Q yt−1 + G et,   t ≤ T*   (182)
yt = Q* yt−1 + G* et,   t > T*   (183)

• Solve the two systems separately and find (Q, G), (Q*, G*).

• Assume you observe zt = Hyt + vt, where vt ∼ N(0, V) is measurement error.

• The likelihood function is

log L = −0.5 NT log(2π) − 0.5 (Σ_{t=1}^{T*} [log det(HΣt|t−1H′ + V) + ut′(HΣt|t−1H′ + V)^(−1) ut] + Σ_{t=T*+1}^{T} [log det(HΣ*t|t−1H′ + V) + u*t′(HΣ*t|t−1H′ + V)^(−1) u*t])   (184)

where ut = zt − Et−1 zt, Σt|t−1 ≡ Et(yt − Et−1 yt)(yt − Et−1 yt)′ and u*t = zt − E*t−1 zt, Σ*t|t−1 ≡ E*t(yt − E*t−1 yt)(yt − E*t−1 yt)′.
Predictable structural breaks: right timing
• Announcement of a crawling inflation target, of a new VAT tax, etc., made at T_m to take place at T*.

• Assume that once the break occurs, it is permanent (or perceived to be permanent); agents get the timing of the change right.
Optimality conditions:
A_0 y_t = A_1 y_{t-1} + B_0 E_t y_{t+1} + D_0 e_t,   t ≤ T_m   (185)

A*_0 y_t = A*_1 y_{t-1} + B*_0 E_t y_{t+1} + D*_0 e_t,   t > T*   (186)

A_{0t} y_t = A_{1t} y_{t-1} + B_{0t} E_t y_{t+1} + D_{0t} e_t,   T_m < t ≤ T*   (187)

The solutions up to T_m and after T* are given by (182) and (183). For T_m < t ≤ T* (the transition period) the solution is

y_t = Q_t y_{t-1} + G_t e_t   (188)

Since E_t y_{t+1} = Q_{t+1} y_t, substituting into (187) we have

(I - A_{0t}^{-1} B_{0t} Q_{t+1})^{-1} A_{0t}^{-1} A_{1t} = Q_t   (189)

(I - A_{0t}^{-1} B_{0t} Q_{t+1})^{-1} A_{0t}^{-1} D_{0t} = G_t   (190)

• In the transition period, the decision rules are time varying. How do we find {Q_t, G_t}_{t=T_m+1}^{T*-1}? Use backward induction!

- For t = T*-1: set Q_{t+1} = Q* and use (189) to solve for Q_{T*-1}.

- For t < T*-1: use (189) to solve for Q_t, given Q_{t+1}.

- Once {Q_t}_{t=T_m+1}^{T*-1} is known, use (190) to solve for G_t.
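A toy backward-induction pass for (189)-(190): the scalar system matrices, the announcement dates, and the terminal Q* below are all invented for illustration (in this sketch only A_1 changes during the transition):

```python
import numpy as np

A0inv = np.array([[1.0]])                 # A_{0t}^{-1}
B0t = np.array([[0.3]])
A1t = np.array([[0.4]])                   # transition-period A_{1t} (made up)
D0t = np.array([[1.0]])
Q_star = np.array([[0.6125741]])          # posited post-break Q* from regime 2
n, Tm, Tstar = 1, 4, 16

Q_path, G_path = {}, {}
Q_next = Q_star                           # terminal condition: Q_{T*} = Q*
for t in range(Tstar - 1, Tm, -1):        # t = T*-1, ..., T_m+1
    M = np.linalg.inv(np.eye(n) - A0inv @ B0t @ Q_next)
    Q_path[t] = M @ A0inv @ A1t           # eq. (189)
    G_path[t] = M @ A0inv @ D0t           # eq. (190)
    Q_next = Q_path[t]
```

Moving backward from T*, the Q_t drift away from Q* toward the fixed point implied by the transition matrices, which is the "solution looks more like the first regime near T_m" logic described above.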
• Idea: expectations must be consistent with the fact that after T* the new regime will be in place.

As we get closer to T_m (T*), the solution looks more like the solution in the first (second) regime.
Predictable structural changes: wrong timing
• Suppose agents think that the change will occur at T** > T*.

There will still be three regimes:

- regime 1, with constant coefficients up to T_m;

- regime 2, with constant coefficients from T** on;

- a transition regime, with a time-varying solution between T_m and T**.

• Need to distinguish between the agents' perceived break and the actual break, and between perceived and true expectations.

• Agents get the right solution up to T_m and after T**.

• From T_m to T**, agents use (189) and (190) with E_t replaced by the perceived expectation Ẽ_t, and the perceived solution is y_t = Q̃_t y_{t-1} + G̃_t e_t.

• From T_m to T**, the actual solution is y_t = Q_t y_{t-1} + G_t e_t, where Q_t and G_t are constant from T* to T**. Thus

(I - A_{0t}^{-1} B_{0t} Q̃_{t+1})^{-1} A_{0t}^{-1} A_{1t} = Q_t   (191)

(I - A_{0t}^{-1} B_{0t} Q̃_{t+1})^{-1} A_{0t}^{-1} D_{0t} = G_t   (192)

• Agents' mistakes are such that even though the structure is constant after T*, the economy still evolves from T* to T** because agents' expectations are changing.

• Still assume you observe z_t = H y_t + v_t, v_t ~ N(0, V).
• Likelihood function (both for the case of right and wrong timing):

log L = -0.5 N T log(2π) - 0.5 Σ_{t=1}^T log det(H Σ_{t|t-1} H' + V) - 0.5 Σ_{t=1}^T u_t' (H Σ_{t|t-1} H' + V)^{-1} u_t   (193)

where u_t = z_t - Ē_{t-1} z_t and Σ_{t|t-1} ≡ Ē_t (y_t - Ē_{t-1} y_t)(y_t - Ē_{t-1} y_t)', with Ē_{t-1} = E_{t-1} for t ≤ T_m, Ē_{t-1} = Ẽ_{t-1} for T_m < t ≤ T**, and Ē_{t-1} = E*_{t-1} for t > T**.

• The state equation differs depending on the period you are in.

• Easy to compute. Useful to study the effects of permanent announcements, but not in all situations.
Example 11.1 (Kulish and Pagan, 2016) Standard NK model.

- Increase in the inflation target: p = 0.00125 to p = 0.005, announced at T_m = 4 to take place at T* = 16.

- Allow for the possibility that the announcement is not credible, i.e. agents believe p = 0.00125 after the announcement.

• If the announcement is credible, inflation moves up immediately; if not, inflation goes down initially and increases only when the nominal interest rate falls.
11.2 Occasionally binding constraints
• The estimation technology in a model with occasionally binding constraints is similar (see Guerrieri and Iacoviello, 2014, 2015).

• Assume two regimes: a non-binding regime (the constraint is slack) and a binding regime (the constraint holds).

• Need a guess of which regime applies, and the guess needs to be verified ex post. If the guess is not verified, change the guess and redo the computation. Iterative approach.

• Linearized system in regime M1 (slack constraint):

A X_{t+1} + B X_t + C X_{t-1} + F e_t = 0   (194)

• Linearized system in regime M2 (binding constraint):

A* X_{t+1} + B* X_t + C* X_{t-1} + D* + F* e_t = 0   (195)

• Assume that there exists a saddle path in regime M1 (the Blanchard-Kahn condition is satisfied) and that, absent shocks, the system is expected to return to, and stay permanently in, M1 in a finite amount of time.

• Given X_0 and {e_t}_{t=1}^∞, the solution is

X_1 = P_1 X_0 + R_1 + Q_1 e_1;   X_t = P_t X_{t-1} + R_t + Q_t e_t,   ∀ t ∈ [2, ∞)   (196)

(the second equation reduces to X_t = P_t X_{t-1} + R_t if there are no shocks in regime M1).
• How do we find P_t, R_t, Q_t?

• Assume that from t ≥ T onward regime M1 applies. Then the solution is

X_t = P X_{t-1} + Q e_t   (197)

Since E_{T-1} X_T = P X_{T-1}, if regime M2 applies at T-1, the system of equations that needs to be solved is

A* P X_{T-1} + B* X_{T-1} + C* X_{T-2} + D* + F* e_{T-1} = 0   (198)

Solving for X_{T-1} we have

X_{T-1} = -(A* P + B*)^{-1} (C* X_{T-2} + D* + F* e_{T-1})   (199)

and thus P_{T-1} ≡ -(A* P + B*)^{-1} C*, R_{T-1} ≡ -(A* P + B*)^{-1} D*, Q_{T-1} ≡ -(A* P + B*)^{-1} F*.

From (199) we have E_{T-2} X_{T-1} = P_{T-1} X_{T-2} + R_{T-1}.

- If regime M2 applies at T-2, use this expression in the equilibrium conditions for regime M2 to obtain

A* (P_{T-1} X_{T-2} + R_{T-1}) + B* X_{T-2} + C* X_{T-3} + D* + F* e_{T-2} = 0   (200)

- If instead regime M1 applies at T-2, use it in the equilibrium conditions of regime M1 to obtain

A (P_{T-1} X_{T-2} + R_{T-1}) + B X_{T-2} + C X_{T-3} + F e_{T-2} = 0   (201)

Solving for X_{T-2}, given X_{T-3}, gives us P_{T-2}, R_{T-2}, Q_{T-2}.

- Continue backward until t = 1. Depending on whether regime M2 or M1 is expected to apply at t = 1, we have P_1 = -(A* P_2 + B*)^{-1} C*, R_1 = -(A* P_2 + B*)^{-1} (A* R_2 + D*), Q_1 = -(A* P_2 + B*)^{-1} F*, or P_1 = -(A P_2 + B)^{-1} C, R_1 = -(A P_2 + B)^{-1} A R_2, Q_1 = -(A P_2 + B)^{-1} F.

• The solution produced in this fashion is generally highly non-linear.

• The time-varying coefficients generally depend on the length of each regime which, in turn, depends on the state vector.

• The approach needs a guess of when regime M2 ends. If at t = 1 the equations implied by (P_1, Q_1, R_1) do not match those of regime M2, a new guess is needed. Iterative approach.
Estimation
• Decision rules with occasionally binding constraints feature time-varying coefficients. The difference with the standard setup is that the parameter variations are endogenous (they depend on the state the economy is in). The Kalman filter allows only for exogenous parameter variations; it cannot be used.

• A particle filter or an unscented Kalman filter (see Terejanu, 2015) can be used, but both are computationally demanding.

• Despite the time-varying coefficients, the solution is locally linear in the shocks. Construct the likelihood directly from the solution using a change-of-variable method and a few assumptions.

• Given observables z_t = H y_t (no measurement error), the solution is:

z_t = H P(y_{t-1}, ε_t) y_{t-1} + H R(y_{t-1}, ε_t) + H Q(y_{t-1}, ε_t) ε_t   (202)
• Given y_0, recursively use (202) to find ε_t, given y_{t-1} and the observed z_t.

• If ε_t ~ N(0, Σ), the log-likelihood is

log L ∝ -0.5 T log det(Σ) - 0.5 Σ_t ε_t' Σ^{-1} ε_t + Σ_t log |det(∂ε_t/∂z_t)|   (203)

• Using (202), note that det(∂ε_t/∂z_t) = det((H Q(y_{t-1}, ε_t))^{-1}), provided the inverse exists and the determinant is non-zero (here we need the fact that the solution for z_t is locally linear in ε_t). Hence

log L ∝ -0.5 T log det(Σ) - 0.5 Σ_t ε_t' Σ^{-1} ε_t - Σ_t log |det(H Q(y_{t-1}, ε_t))|   (204)

since |det((H Q(y_{t-1}, ε_t))^{-1})| = 1 / |det(H Q(y_{t-1}, ε_t))|.

• No need to compute derivatives numerically to obtain the Jacobian: Q(y_{t-1}, ε_t) is known from the model solution and H is a selection matrix.
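A small numerical check of the change-of-variable logic behind (203)-(204), using an arbitrary 2x2 stand-in for HQ (our numbers): evaluating the normal density of the recovered ε_t and subtracting log|det(HQ)| reproduces the density of z_t computed directly from its implied covariance.

```python
import numpy as np

HQ = np.array([[2.0, 0.3],
               [0.1, 1.5]])                 # stand-in for H Q(y_{t-1}, eps_t)
Sigma = np.eye(2)                           # Var(eps_t); its log det is 0 here
z = np.array([0.4, -0.2])                   # one observation z_t

# change of variable: f_z(z) = f_eps(HQ^{-1} z) * |det(HQ)|^{-1}, as in (204)
eps = np.linalg.solve(HQ, z)
logf_cv = (-np.log(2 * np.pi) - 0.5 * eps @ np.linalg.solve(Sigma, eps)
           - np.log(abs(np.linalg.det(HQ))))

# direct evaluation: z ~ N(0, HQ Sigma HQ')
S = HQ @ Sigma @ HQ.T
logf_direct = (-np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(S))
               - 0.5 * z @ np.linalg.solve(S, z))
```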
• The approach has similarities with Fair and Taylor (FT) (1983). FT use an iterative approach that requires a guess of the whole path of the endogenous variables up to what is likely to be a distant end point. Here the approach is still iterative, but everything is done in one step and only an initial guess of whether the constraint binds is required.

• Maliar et al. (2015): a non-linear version of FT.
Piecewise linear algorithm
i) Compute the solution of the model.

ii) Use (202) to obtain {ε_t}_{t=1}^T consistent with the solution in i) and the data z_t.

iii) Construct the likelihood.

iv) Maximize it, or specify a prior and compute posterior draws using MCMC methods.

Example 11.2 Guerrieri and Iacoviello (2014) use a model with patient and impatient consumers and an occasionally binding constraint on the impatient consumers:

b_t ≤ γ b_{t-1}/π_t + (1 - γ) m q_t h_t   (205)

where γ is a persistence parameter, m the steady-state loan-to-value ratio, q_t and h_t are the price and quantity of housing, and π_t the inflation rate. The borrowing constraint adjusts slowly to the value of housing.
12 Estimating a pruned second order system
General nonlinear state space:

y_{2t+1} = h(y_{2t}, θ) + σ η e_{t+1},   e_{t+1} ~ iid(0, I)   (206)

y_{1t} = g(y_{2t}, θ)   (207)

Partition y_{2t} = [(y^f_{2t})', (y^s_{2t})']'. The second order approximation is:

y_{2t+1} = h_y (y^f_{2t} + y^s_{2t}) + 0.5 h_yy ((y^f_{2t} + y^s_{2t}) ⊗ (y^f_{2t} + y^s_{2t})) + 0.5 h_σσ σ² + σ η e_{t+1}   (208)

y_{1t} = g_y y_{2t} + 0.5 g_yy (y_{2t} ⊗ y_{2t}) + 0.5 g_σσ σ²   (209)
Pruned representation (all variables are second order polynomials in e_t), see Andreasen, Fernandez and Rubio (2014):

y^f_{2t+1} = h_y y^f_{2t} + σ η e_{t+1}   (210)

y^s_{2t+1} = h_y y^s_{2t} + 0.5 h_yy (y^f_{2t} ⊗ y^f_{2t}) + 0.5 h_σσ σ²   (211)

y^f_{1t} = g_y y^f_{2t}   (212)

y^s_{1t} = g_y (y^f_{2t} + y^s_{2t}) + 0.5 g_yy (y^f_{2t} ⊗ y^f_{2t}) + 0.5 g_σσ σ²   (213)
• A pruned second order system has a linear AR representation.

Let z_t = [(y^f_{2t})', (y^s_{2t})', (y^f_{2t} ⊗ y^f_{2t})']' and

η_t = [e_{t+1}', (e_{t+1} ⊗ e_{t+1} - vec(I_{n_e}))', (e_{t+1} ⊗ y^f_{2t})', (y^f_{2t} ⊗ e_{t+1})']'

Then:

z_t = z̄ + A z_{t-1} + B η_t   (214)

• The system is stable if all eigenvalues of A are less than one in absolute value. Easy to check, since A depends only on h_y.
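A quick sketch of the stability check on (214). The block-triangular layout mirrors the ordering of z_t above, but the scalar h_y and the cross-coupling entry are invented numbers, not model output:

```python
import numpy as np

hy = np.array([[0.9]])                      # hypothetical first-order coefficient
z0 = np.zeros((1, 1))
A = np.block([
    [hy, z0, z0],                           # y^f block
    [z0, hy, np.array([[0.1]])],            # y^s block; 0.5*h_yy coupling (made up)
    [z0, z0, np.kron(hy, hy)],              # y^f (x) y^f block
])
eigs = np.abs(np.linalg.eigvals(A))
stable = eigs.max() < 1                     # |h_y| < 1 implies stability of (214)
```

Because A is block upper triangular with diagonal blocks h_y, h_y and h_y ⊗ h_y, its eigenvalues are those of h_y and their pairwise products, which is why only h_y matters.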
• From (214) one can compute first and second moments analytically! First moments:

E(y_{2t}) = E(y^f_{2t}) + E(y^s_{2t}) ≠ 0   (215)

E(y^s_{2t}) = (I - h_y)^{-1} (0.5 h_yy (I - h_y ⊗ h_y)^{-1} (σ η ⊗ σ η) vec(I_{n_e}) + 0.5 h_σσ σ²)   (216)

E(y^s_{1t}) = C E(y^s_{2t}) + D   (217)

Second moments (the solution to (218) is found via standard Lyapunov methods):

V(z_t) = A V(z_t) A' + B V(η_t) B'   (218)

Cov(z_t, z_{t-1}) = A V(z_t)   (219)

V(y_{2t}) = V(y^f_{2t}) + V(y^s_{2t}) + Cov(y^f_{2t}, y^s_{2t}) + Cov(y^s_{2t}, y^f_{2t})   (220)

V(y^s_{1t}) = C V(y_{2t}) C'   (221)

Cov(y^s_{1t}, y^s_{1t-1}) = C Cov(z_t, z_{t-1}) C'   (222)
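The Lyapunov step in (218)-(219) can be done by vectorization, vec V = (I − A ⊗ A)^{-1} vec(B V(η) B'); a scalar sketch with made-up numbers:

```python
import numpy as np

A = np.array([[0.9]])                        # hypothetical A from (214)
B = np.array([[1.0]])
V_eta = np.array([[0.5]])                    # Var of the stacked innovation eta_t
n = A.shape[0]

# solve V = A V A' + B V_eta B' via vec(V) = (I - A kron A)^{-1} vec(B V_eta B')
vecV = np.linalg.solve(np.eye(n * n) - np.kron(A, A),
                       (B @ V_eta @ B.T).reshape(-1))
Vz = vecV.reshape(n, n)
autocov = A @ Vz                             # Cov(z_t, z_{t-1}) as in (219)
```

For large systems a doubling algorithm or a dedicated discrete Lyapunov solver is preferable to forming the n² × n² Kronecker matrix.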
• Can use GMM to estimate the parameters using (215)-(222).

• Since the η_t in (214) are non-normal, a particle filter is needed to compute the likelihood: complicated and time consuming.

• Kollmann (2015): use a deterministic quasi-likelihood approach.

- Use the Kalman filter to construct the likelihood, assuming that the errors are normal.

- If T is large, the likelihood has roughly the correct shape, location and spread (see Hamilton, 1994, chapter 13).

• The Kalman filter is much more numerically accurate than the particle filter, and much faster. With large shocks, the increase in accuracy is substantial (see Kollmann, 2015).

• Posterior analysis is possible with the quasi-likelihood. A proper prior is needed.
13 Dealing with misspecification
Inference with DSGEs is difficult because of:

• Population and sample identification problems.

• Singularity problems (the number of shocks is smaller than the number of endogenous variables).

• Latent variable problems (the likelihood function may be difficult to construct).

• Informational deficiencies (models are constructed to explain only a portion of the data).

• Numerical issues (acute if the model is of large scale, or the data is short or of poor quality).

• Typically, full information likelihood-based estimation methods are used (see Andreasen et al., 2014, for an exception).

• Likelihood-based estimation and inference are conditional on the model being correctly specified. Are we willing to make such an assumption?

• Nowadays it is typical to pump up the model prior to estimation with nuisance frictions, additional shocks or reduced form devices to try to reduce misspecification.

• If you refrain from this, how do you deal with misspecification?

- Add measurement errors to the decision rules: Sargent and Hansen (1989), Ireland (2004).

- Add margins to the economic problem: Inoue et al. (2016); wedges to the FOCs: Chari et al. (2008); or wedges to the decision rules: Den Haan and Drechsel (2016).

- Generalize the shock process: Del Negro and Schorfheide (2009); use correlated shocks: Curdia and Reis (2009).

• In all cases, check the relevance of the added features.

- Use a composite likelihood approach: Canova and Matthes (2016).
Example 13.1 Consider a standard permanent income problem:

max E_0 Σ_t β^t (b_0 c_t - 0.5 b_1 c_t²)   (223)

a_{t+1} = (1 + r)(a_t + y_t - c_t)   (224)

y_t = y_{t-1} + e_t   (225)

with b_0 > 0, b_1 > 0, β(1 + r) = 1, and e_t iid (0, σ²). The solution is

c_t = (r/(1 + r)) a_t + y_t   (226)

a_{t+1} = (1 + r)(a_t + y_t - c_t)   (227)

y_t = y_{t-1} + e_t   (228)

What kind of misspecification could be present in this simple model?

1) y_t has permanent and transitory components.
2) r is not constant over time.
3) Assets a_t are mismeasured (because of misreporting, the home sector, etc.).
4) Labor supply decisions may be important, other arguments may enter utility, etc.
• How does misspecification affect inference?

Decision rule with permanent and transitory income:

c_t = (r/(1 + r)) a_t + y^P_t + (r/(1 + r - ρ)) y^T_t   (229)

a_{t+1} = (1 + r)(a_t + y_t - c_t)   (230)

y^P_t = y^P_{t-1} + e_t   (231)

y^T_t = ρ y^T_{t-1} + ε_{1t}   (232)

y_t = y^P_t + y^T_t   (233)

where y^T_t is transitory income and y^P_t is permanent income.
Decision rules with a random interest rate:

c_t = (1 - 1/k) a_t + y_t   (234)

a_{t+1} = (1 + r_{t+1})(a_t + y_t - c_t)   (235)

y_t = y_{t-1} + e_t   (236)

where k = β E[(1 + r_{t+1})²].

Decision rules when assets are mismeasured:

c_t = (r/(1 + r)) ã_t + y_t   (237)

ã_{t+1} = (1 + r)(ã_t + y_t - c_t)   (238)

y_t = y_{t-1} + e_t   (239)

ã_t = a_t + ε_{2t}   (240)

ε_{2t} = ρ_e ε_{2t-1} + ν_{2t}   (241)
• Misspecification generally affects the consumption function. In the last two cases the resource constraint is affected. In the first case the process for the exogenous variables is misspecified.

• Standard approach 1: add shocks, e.g. to preferences. This makes the real rate random. Helps with case 2, but not with cases 1 or 3.

• Standard approach 2: add a friction, e.g. habit. The decision rules are

c_t = (h/(1 + h)) c_{t-1} + (1 - h/(1 + h)) y^p_t   (242)

y^p_t = (r/(1 + r)) ((1 + r) a_{t-1} + Σ_{τ=t}^∞ (1 + r)^{t-τ} E_t y_τ)   (243)

y_t = y_{t-1} + e_t   (244)
Serial correlation in consumption could help with cases 1 and 3.

• Chari et al. (2009): these are worrisome approaches. Neither method leads to solid economic inference about, say, the marginal propensity to consume.

• Inoue et al. (2016): introduce "margins" (non-structural shocks) into the agent's problem (preferences, technologies, etc.).
The problem for the agent becomes:

max E_0 Σ_t β^t ((a + u_t) c_t - 0.5 b c_t²)   (245)

a_{t+1} = (1 + r)(1 + v_{t+1})(a_t + y_t - c_t + w_t)   (246)

y_t = y_{t-1} + e_t   (247)

u_t = ρ_u u_{t-1} + z_{1t}   (248)

v_t = z_{2t}   (249)

w_t = ρ_w w_{t-1} + z_{3t}   (250)

a > 0, b > 0, β(1 + r) = 1, and z_{jt} iid (0, σ_j²).

The solution is (where κ = (1 + r) E_t[(1 + v_{t+1})²]):

c_t = (1 - 1/κ) a_t + y_t + ((1 - ρ_u)/(b(κ - ρ_u))) u_t + ((κ - 1)/(κ - ρ_w)) w_t   (251)

a_{t+1} = (1 + r)(1 + v_{t+1})(a_t + y_t - c_t + w_t)   (252)

y_t = y_{t-1} + e_t   (253)

u_t = ρ_u u_{t-1} + z_{1t}   (254)

v_t = z_{2t}   (255)

w_t = ρ_w w_{t-1} + z_{3t}   (256)
• Captures misspecification in the consumption function and the budget constraint.

• How do you distinguish the various cases? Compare the marginal likelihood of (251)-(256) with and without u_t, with and without v_{t+1}, and with and without w_t. If the difference is not significant, the margin is not necessary.

• Kocherlakota (2007): dangerous to use "fit" to select misspecified models.

• Add wedges (Chari et al., 2008) to each individual FOC to account for business cycle fluctuations. Wedges could be serially and cross correlated.

• Add measurement errors (Hansen and Sargent, Altug, Ireland):
c_t = (r/(1 + r)) a_t + y_t + u_{1t}   (257)

a_{t+1} = (1 + r)(a_t + y_t - c_t) + u_{2t}   (258)

y_t = y_{t-1} + e_t   (259)

u_{1t} = ρ_1 u_{1t-1} + ω_{1t}   (260)

u_{2t} = ρ_2 u_{2t-1} + ω_{2t}   (261)

with ω_{it} iid (0, σ_i²).

• Captures misspecification in the consumption function and the budget constraint.

• Hard to distinguish the various cases, i.e. hard to give a structural interpretation to the measurement errors. For example, it is hard to distinguish case 3 from the others, since u_{1t} and u_{2t} will both be nonzero.

• Add wedges to the decision rules: Den Haan and Drechsel (2016). Similar to the measurement error idea, but the wedges are allowed to be cross-sectionally correlated.
• Generalize the shock process: Del Negro and Schorfheide (2009). The solution is

c_t = (r/(1 + r)) a_t + (r(1 + r)/((1 + r - ρ_1)(1 + r) - ρ_2)) y_t + (r ρ_2/((1 + r - ρ_1)(1 + r) - ρ_2)) y_{t-1}   (262)

a_{t+1} = (1 + r)(a_t + y_t - c_t)   (263)

y_t = ρ_1 y_{t-1} + ρ_2 y_{t-2} + e_t   (264)

with ρ_1 + ρ_2 = 1.

• ρ_2 will be significant in all three cases, as the consumption equation is misspecified. No allowance for misspecification in the resource constraint. Can't distinguish the various forms of misspecification.
• Allow for correlated shocks: Curdia and Reis (2009) (here transitory income is correlated with permanent income). The solution is

c_t = (r/(1 + r)) a_t + y^P_t + (r/(1 + r - ρ)) y^T_t   (265)

a_{t+1} = (1 + r)(a_t + y_t - c_t)   (266)

y^P_t = y^P_{t-1} + e_t   (267)

y^T_t = ρ y^P_{t-1} + ε_t   (268)

y_t = y^P_t + y^T_t   (269)

z_t = [e_t, ε_t]' iid (0, Σ)

• Because the consumption function is misspecified, the variance of transitory income will always be different from zero. No misspecification is allowed in the budget constraint. Can detect the presence of transitory income; hard to detect a random interest rate or mismeasured assets.
• Composite likelihood approach: Canova and Matthes (2016)

- Take all potential model specifications seriously. Construct the likelihood for each model.

- Form a geometric combination and estimate the relevant common parameters. The geometric combination is likely to be less misspecified than each single model, even if all models are wrongly specified.

- Use the posterior of the model weights to decide the extent of misspecification in each candidate (and for model selection, if needed).

- Perform inference using the geometric combination (robustification).

- Different from nesting the models into a general one, or from doing an ex-post combination for inference.
Example 13.2 Logic: when a model is misspecified, the information in additional (misspecified) models restricts the range that parameter values can take.

• DGP (AR(2)): y_t = ρ_1 y_{t-1} + ρ_2 y_{t-2} + e_t, e_t ~ (0, σ²).

• Estimated models:

- AR(1): y_t = α_1 y_{t-1} + u_t, u_t ~ (0, σ_u²)
- MA(1): y_t = u_t + θ_1 u_{t-1}, u_t ~ (0, σ_u²).

• Focus on the relationship between σ_u² and σ².

• Simulate 150 data points from the DGP. Use T = [101, 150] for estimation. Weights:

1) ω = 1 - ω = 0.5.
2) Based on the MSE of the two models in the training sample T = [2, 100].
3) Based on the ML of the two models in the training sample T = [2, 100].
4) Based on the composite ML in the training sample T = [2, 100].
Estimates of σ_u. DGP: y_t = ρ_1 y_{t-1} + ρ_2 y_{t-2} + e_t, e_t ~ N(0, σ).

Parameters                | AR(1)      | MA(1)      | CL, equal w. | CL, ML w.  | CL, MSE w. | CL, composite w.
σ=0.5, ρ1=0.7, ρ2=-0.1    | 0.36(0.03) | 0.36(0.03) | 0.38(0.03)   | 0.37(0.03) | 0.36(0.03) | 0.47(0.04)
σ=0.5, ρ1=0.5, ρ2=0.2     | 0.35(0.03) | 0.36(0.03) | 0.37(0.03)   | 0.36(0.03) | 0.35(0.03) | 0.47(0.04)
σ=0.5, ρ1=0.6, ρ2=0.35    | 0.36(0.03) | 0.40(0.03) | 0.40(0.03)   | 0.41(0.03) | 0.37(0.03) | 0.48(0.04)
σ=1.0, ρ1=0.7, ρ2=-0.1    | 0.61(0.04) | 0.35(0.05) | 0.62(0.04)   | 0.62(0.04) | 0.60(0.04) | 0.78(0.05)
σ=1.0, ρ1=0.5, ρ2=0.2     | 0.60(0.04) | 0.61(0.04) | 0.61(0.04)   | 0.62(0.04) | 0.60(0.04) | 0.78(0.05)
σ=1.0, ρ1=0.6, ρ2=0.35    | 0.62(0.04) | 0.38(0.05) | 0.67(0.04)   | 0.67(0.04) | 0.61(0.04) | 0.76(0.05)
σ=2.0, ρ1=0.7, ρ2=-0.1    | 0.95(0.04) | 0.45(0.04) | 0.96(0.06)   | 0.96(0.04) | 0.93(0.04) | 1.14(0.05)
σ=2.0, ρ1=0.5, ρ2=0.2     | 0.93(0.04) | 0.43(0.04) | 0.95(0.04)   | 0.95(0.04) | 0.94(0.04) | 1.14(0.05)
σ=2.0, ρ1=0.6, ρ2=0.35    | 0.98(0.01) | 0.51(0.01) | 1.02(0.01)   | 1.02(0.01) | 0.99(0.01) | 1.15(0.05)
• Gains increase with the persistence and volatility of the DGP.

• Gains of MSE- or ML-based weights over equal weights are small.

• Larger gains for CL-based estimates of ω, independent of the parameterization.
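A runnable sketch of the logic of this exercise, with simplifications of ours: a white-noise submodel stands in for the MA(1), weights are fixed at 0.5, and the grids, seed and sample size are invented. The composite objective pools two misspecified conditional likelihoods of AR(2) data:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 150
y = np.zeros(T)
for t in range(2, T):                        # AR(2) DGP, as in the table above
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

def ll_ar1(rho, s2, y):                      # conditional AR(1) log-likelihood
    e = y[1:] - rho * y[:-1]
    return -0.5 * e.size * np.log(2 * np.pi * s2) - 0.5 * np.sum(e ** 2) / s2

def ll_wn(s2, y):                            # white-noise log-likelihood
    return -0.5 * y.size * np.log(2 * np.pi * s2) - 0.5 * np.sum(y ** 2) / s2

def cl(rho, s2, y, w=0.5):                   # composite objective, equal weights
    return w * ll_ar1(rho, s2, y) + (1 - w) * ll_wn(s2, y)

# crude grid maximization of the composite likelihood
best = max((cl(r, s, y), r, s)
           for r in np.linspace(-0.95, 0.95, 191)
           for s in np.linspace(0.2, 4.0, 191))
cl_val, rho_hat, s2_hat = best
```

The pooled estimate of the innovation variance is pulled between the two submodels' implied values, which is the shrinkage effect described in the intuition section below.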
Fixed vs. random weights?
• Assume a prior ω ~ N(0.5, 0.1).

• DGP: y_t = 0.6 y_{t-1} + 0.35 y_{t-2} + e_t, e_t ~ N(0, 0.5²).

• Plot CL (equal weights), CL (optimal weights), CL (random weights).
[Figure: posterior of σ_u — CL with fixed weights, random weights, and optimal weights.]

[Figure: posterior of ω.]
How does the posterior of ω look when the DGP is among the candidates?

• DGP: AR(1): y_t = ρ_1 y_{t-1} + e_t, e_t ~ (0, σ²), or MA(1): y_t = e_t + θ_1 e_{t-1}, e_t ~ (0, σ²).

• Estimated models:

- AR(1): y_t = α_1 y_{t-1} + u_t, u_t ~ (0, σ_u²)
- MA(1): y_t = u_t + θ_1 u_{t-1}, u_t ~ (0, σ_u²).

• T = 100, σ = 1.

• Jointly estimate (θ, ω).
Some intuition about CL in misspecified models

• Two models, A and B; both misspecified, each with implications for some vector, y_A and y_B respectively.

• Assume the decision rules are:

y_{At} = α_A y_{At-1} + σ_A e_t   (270)

y_{Bt} = α_B y_{Bt-1} + σ_B u_t   (271)

where e_t and u_t are iid N(0, I).

• Suppose α_B = λ α_A and σ_B = κ σ_A; y_{At} and y_{Bt} are scalars; the sample sizes are T_A and T_B, with T_B ≥ T_A.

• Suppose we care about θ = (α_A, σ_A).

• Suppose the weights are (ω, 1 - ω).
• Maximization of the composite likelihood leads to:

α_A = (Σ_{t=1}^{T_A} y²_{At-1} + κ_2 Σ_{t=1}^{T_B} y²_{Bt-1})^{-1} (Σ_{t=1}^{T_A} y_{At} y_{At-1} + κ_1 Σ_{t=1}^{T_B} y_{Bt} y_{Bt-1})   (272)

σ²_A = (1/τ) (Σ_{t=1}^{T_A} (y_{At} - α_A y_{At-1})² + ((1 - ω)/(ω κ²)) Σ_{t=1}^{T_B} (y_{Bt} - λ α_A y_{Bt-1})²)   (273)

where κ_1 = ((1 - ω)/ω)(λ/κ²), κ_2 = κ_1 λ, and τ = T_A + ((1 - ω)/ω) T_B is the "effective" sample size.
• Shrinkage estimators for θ: the formulas are the same as in i) a least squares problem with uncertain linear restrictions (Canova, 2007, Ch. 10), ii) a prior-likelihood approach, iii) a DSGE-VAR.

• For θ, model B plays the role of a prior for model A; model B's own parameters are estimated using only model B information.

• The informational content of model B's data for θ is measured by (κ, λ, 1 - ω). The larger is κ and the smaller is λ, the lower is the importance of model B information.

• More weight is given to data assumed to be generated by a model with higher persistence and lower standard deviation.

• ω is the (a priori) trust an investigator has in model A information.
• For multiple models, equation (272) becomes

α = (Σ_{t=1}^{T_1} y²_{1t-1} + Σ_{i=2}^{K} κ_{i2} Σ_{t=1}^{T_i} y²_{it-1})^{-1} (Σ_{t=1}^{T_1} y_{1t} y_{1t-1} + Σ_{i=2}^{K} κ_{i1} Σ_{t=1}^{T_i} y_{it} y_{it-1})   (274)

where κ_{i1} = (ω_i/ω_1)(λ_i/κ_i²) and κ_{i2} = κ_{i1} λ_i.

• Robustification: estimates of (α, σ²) are forced to be consistent with the data from all models.
Notice:
• y_{At} and y_{Bt} may be different variables. Can use models with different observables.

• y_{At} and y_{Bt} may be the same variable at different levels of aggregation (say, aggregate vs. individual consumption).

• T_A and T_B may be of different length. Can combine models set up at different frequencies (e.g. a quarterly and an annual model).

• T_A, T_B could be two samples for the same variables coming from different cross-sectional units, or from different time periods.
14 Composite Likelihood
• Apart from dealing with misspecification, the composite likelihood (CL) approach helps to jointly address:

- Identification and singularity problems;
- Short data sets;
- Large scale models;
- Panels of unit-specific data.
The standard CL setup
• Known but intractable/numerically complicated DGP: for t = 1, ..., T

F(y_t, ψ)   (275)

where y_t is an m × 1 vector of observables and ψ a q × 1 vector of parameters.

• Let ψ = (θ, η): θ are the parameters of interest, η the other parameters.

• Suppose that for some (A_1, ..., A_K) we construct (marginal or conditional) subdensities

f(y_{it} ∈ A_i, θ, η_i)   (276)

where y_{it} is T_i × 1, i = 1, ..., K.

• Submodel-specific parameter vector: ψ_i = [θ, η_i]'; the η_i are (nuisance) parameters.

• Information produced by each submodel A_i: (y_{it}, T_i, ψ_i).
• Given a set of weights 0 < ω_i < 1, the composite log-likelihood is

CL(θ, η_1, ..., η_K | y_{1t}, ..., y_{Kt}) = Σ_{i=1}^K ω_i log f(y_{it} ∈ A_i, θ, η_i)   (277)

• CL(θ, η_1, ..., η_K | y_{1t}, ..., y_{Kt}) is not a log-likelihood function. Under regularity conditions, if the f(y_{it} ∈ A_i, θ, η_i) are marginal (conditional) subdensities:

θ_CL →^P θ_0   (278)

T^{0.5} (θ_CL - θ_0) →^D N(0, G(θ)^{-1})   (279)

for T → ∞ and K fixed (see e.g. Varin et al., 2011).

• Since G(θ) ≠ -H(θ), estimation is inefficient.

• A careful choice of the ω_i may improve efficiency.

• Optimal weights: min_ω ||G(θ) - I(θ)||, where I(θ) is the Fisher information matrix.

• Otherwise, how do you pick ω = (ω_1, ..., ω_K)?

- A priori, e.g. ω_i = 1/K, ∀i.

- Data-driven approach: e.g. ω_i = exp(κ_i)/Σ_{i=1}^K exp(κ_i), ∀i, with κ_i = h_i(y_{1,[1:τ]}, ..., y_{K,[1:τ]}) and [1:τ] a training sample (one could make the ω_i time varying); or use the marginal CL, where (ω_1, ..., ω_K) are treated as hyperparameters.
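A tiny sketch of the two weighting rules (the number of submodels and the training-sample criteria κ_i below are hypothetical): equal weights, and a data-driven softmax over criteria such as log marginal likelihoods:

```python
import numpy as np

K = 3
w_equal = np.full(K, 1.0 / K)                 # a-priori rule: omega_i = 1/K

kappa = np.array([-101.2, -99.8, -104.5])     # hypothetical training-sample log-MLs
kappa = kappa - kappa.max()                   # stabilize the exponentials
w_data = np.exp(kappa) / np.exp(kappa).sum()  # omega_i = exp(kappa_i)/sum exp(kappa_j)
```

Subtracting the maximum before exponentiating leaves the weights unchanged but avoids overflow when the κ_i are large in magnitude.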
Our (DSGE) framework
• F(y_t, ψ) is unknown. Why?

- There is not enough information to construct F(y_t, ψ).
- One can write the VAR representation of y_t, but not the DGP.
- F is highly non-linear and only a linear representation for y_t can be derived.
- F may be of large scale and we only know pieces of it; etc.

• The f(y_{it} ∈ A_i, θ, η_i) are neither marginals nor conditionals.

- They are misspecified approximations (simplifications) of the DGP.
- They are incompletely specified statistical descriptions of the DGP.
Examples of F(y_t, ψ) and f(y_{it} ∈ A_i, θ, η_i)

1) K different economic models (e.g. an RBC model, an RBC model with financial frictions, a New Keynesian model with sticky prices, a New Keynesian model with sticky wages, etc.). All are misspecified because they disregard aspects of the DGP or take modelling shortcuts (e.g. investment adjustment costs).

2) K subcomponents of a large scale model (e.g. country-specific or bilateral blocks of a multi-country model).

3) K approximate solutions obtained with different orders of perturbation.

4) K linear solutions, where one parameter (η_i) is allowed to be time varying.

5) K different statistical models, derived from one theoretical model using:

- different observables, e.g. a standard three-equation NK model with (Y, π_cpi, R); (C, π_cpi, R); (Y, π_gdp, R), etc.;
- different samples (e.g. pre-WWI, interwar, post-WWII, etc.);
- data from different cross-sectional units;
- data at different aggregation levels (e.g. firm, industry, regional, etc.).
In all cases
• f(y_{it} ∈ A_i, θ, η_i) is a limited information density (it ignores the potential dependence across the A_i).

• The f(y_{it} ∈ A_i, θ, η_i) need not be mutually compatible across the A_i.

• The y_{it} need not be mutually exclusive across the A_i.

• One is generally free to choose what goes in θ and what in η_i (e.g. the CRRA coefficient could be in one or the other; θ could contain the slope of the Phillips curve, and η_i capital adjustment costs or wage stickiness parameters).
Asymptotics of (misspecified) CL
• Because of misspecification, standard asymptotic results do not apply. Following White (1982) and White and Domowitz (1982), under regularity conditions, if CL is a density, the ω_i are fixed and T → ∞:

θ_CL → θ_{0,CL}, the minimizer of the Kullback-Leibler distance between F(y_t, ψ) and Π_{i=1}^K f(y_{it} ∈ A_i, θ, η_i)^{ω_i}.

• T^{0.5} (θ_CL - θ_{0,CL}) → N(0, G(θ)).
Small sample properties
• Given (y_{it}, T_i, θ, η_i, ω_i), construct

CL = L(θ, η_1 | Y_{1,T_1})^{ω_1} · · · L(θ, η_K | Y_{K,T_K})^{ω_K}   (280)

• Let the prior for (θ, η_i, ω_i) be

p(θ, η_i, ω_i) = p(η_i | θ) p(θ) p(ω_i)   (281)

• Composite posterior:

g(ψ_1, ..., ψ_K, ω_1, ..., ω_K | Y_{1,T_1}, ..., Y_{K,T_K}) ∝ Π_i L(ψ_i | Y_{i,T_i})^{ω_i} p(η_i | θ)^{ω_i} p(θ) p(ω_i)   (282)

• Employ MCMC to estimate (ψ_i, ω_i) (see Kim (2002); Chernozhukov and Hong (2003)).

• Use a 2K+1 block Metropolis-within-Gibbs algorithm (see Chib and Ramamurthy (2010); Herbst and Schorfheide (2015)).
Estimation Algorithm
1. Start with [η_1^0, ..., η_K^0, θ^0, ω_1^0, ..., ω_K^0].

For iter = 1 : draws, do steps 2-4.

2. For i = 1 : K, draw η_i* from a symmetric proposal P_{η_i}. Set η_i^{iter} = η_i* with probability

min(1, [L(θ^{iter-1}, η_i* | Y_{i,T_i})^{ω_i^{iter-1}} p(η_i* | θ^{iter-1})^{ω_i^{iter-1}}] / [L(θ^{iter-1}, η_i^{iter-1} | Y_{i,T_i})^{ω_i^{iter-1}} p(η_i^{iter-1} | θ^{iter-1})^{ω_i^{iter-1}}])

3. Draw θ* from a symmetric proposal P_θ. Set θ^{iter} = θ* with probability

min(1, [L(θ*, η_1^{iter} | Y_{1,T_1})^{ω_1^{iter-1}} · · · L(θ*, η_K^{iter} | Y_{K,T_K})^{ω_K^{iter-1}} p(θ*)] / [L(θ^{iter-1}, η_1^{iter} | Y_{1,T_1})^{ω_1^{iter-1}} · · · L(θ^{iter-1}, η_K^{iter} | Y_{K,T_K})^{ω_K^{iter-1}} p(θ^{iter-1})])

4. For i = 1 : K, draw ω_i* from a symmetric proposal P_ω. Set ω^{iter} = ω* = (ω_1*, ..., ω_K*) with probability

min(1, [L(θ^{iter}, η_1^{iter} | Y_{1,T_1})^{ω_1*} · · · L(θ^{iter}, η_K^{iter} | Y_{K,T_K})^{ω_K*} p(ω*)] / [L(θ^{iter}, η_1^{iter} | Y_{1,T_1})^{ω_1^{iter-1}} · · · L(θ^{iter}, η_K^{iter} | Y_{K,T_K})^{ω_K^{iter-1}} p(ω^{iter-1})])

• If the subdensities have no nuisance parameters, combine steps 2-3. If the ω_i are fixed, drop step 4.

• Use random walk proposals for P_θ and P_{η_i}, and a multivariate logistic (independent Dirichlet) proposal for P_ω.
Non-standard estimation problem
• Since the y_{it} are not mutually exclusive across the A_i, MCMC standard errors may be too small (CL treats the y_{it} as independent).

• Adjust the Metropolis-within-Gibbs sampler to obtain credible intervals with frequentist coverage (see Mueller, 2013):

i) Maximize the composite likelihood to get H(θ), J(θ).
ii) For each draw θ^j, use

θ̃^j = θ̄ + V^{-1} (θ^j - θ̄)   (283)

where θ̄ is the posterior mode and V is computed from the singular value decomposition of H(θ) J(θ)^{-1} H(θ).
iii) Proceed as in the previous algorithm.

• (283) requires the parameters to be well identified (and the mode to be unique).
Composite Predictions
- ỹ_{t+l}: future values of the variables appearing in all A_i, l = 1, 2, ....

- f(ỹ_{t+l} | y_{it}, θ, η_i) = prediction of ỹ_{t+l} made by submodel A_i. Let

f_cl(ỹ_{t+l} | y_{1t}, ..., y_{Kt}, θ, η_1, ..., η_K, ω_1, ..., ω_K) = Π_i f(ỹ_{t+l} | y_{it}, θ, η_i)^{ω_i}   (284)

f_cl is a geometric pool (with weights ω_i) of K misspecified predictions, given (y_{1t}, ..., y_{Kt}) and the parameters. The composite predictive distribution of ỹ_{t+l}, given the data and the weights, is

p(ỹ_{t+l} | y_{1t}, ..., y_{Kt}, ω_1, ..., ω_K) ∝ ∫ f_cl(ỹ_{t+l} | y_{1t}, ..., y_{Kt}, θ, η_1, ..., η_K, ω_1, ..., ω_K) p(θ, η_1, ..., η_K | ω_1, ..., ω_K, y_{1t}, ..., y_{Kt}) dθ dη_1 ... dη_K   (285)

• The composite predictive density is obtained by taking the joint density of future observations, of the parameters, and of the available data for each A_i, geometrically weighting them, and integrating the expression with respect to the posterior of the parameters.

• Differences with the true predictive distribution: i) the prediction function uses the composite prediction pool rather than the true prediction density; ii) the composite prediction pool is integrated with respect to the composite posterior density rather than the true posterior.

• To compute (285), use the mean (mode) of the posterior of the ω_i (one could also integrate with respect to the posterior of ω, if it is of interest).
Comparison with other linear pooling devices
Typically two approaches to pooling:

• Linear pooling (finite mixtures of predictive densities, BMA, static pools).

• Logarithmic pooling (CL).

- With logarithmic pooling, predictive densities are generally unimodal and less dispersed than with linear pooling, and they are invariant to the arrival of new information (updating the components of the composite likelihood commutes with the pooling operator).
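A numerical illustration of the contrast (the component densities are our toy choices): pooling N(-1, 1) and N(1, 1) with equal weights, the logarithmic pool is again normal — here N(0, 1) — while the linear pool is a mixture with roughly twice the variance:

```python
import numpy as np

w, x = 0.5, np.linspace(-8, 8, 8001)
dx = x[1] - x[0]
phi = lambda z, m, s: np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

log_pool = phi(x, -1, 1) ** w * phi(x, 1, 1) ** (1 - w)
log_pool /= log_pool.sum() * dx              # geometric pools need renormalizing
lin_pool = w * phi(x, -1, 1) + (1 - w) * phi(x, 1, 1)

var_log = np.sum(x ** 2 * log_pool) * dx     # close to 1: the pool is N(0, 1)
var_lin = np.sum(x ** 2 * lin_pool) * dx     # close to 2: mixture variance
```

The geometric pool of normals has precision equal to the weighted sum of the component precisions, which is why it is tighter and unimodal; the linear pool keeps both components' mass separate.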
Composite IRFs and counterfactuals
• The (kernel of the) density of the composite posterior impulse responses is computed using:

p(ỹ_{t+l} | y_{1t}, ..., y_{Kt}, ω_1, ..., ω_K) ∝ ∫ Π_i {f([ỹ_{t+l} | y_{it}, ε^j_{it} = ε̄^j_{it}, θ, η_1, ..., η_K, ω_1, ..., ω_K] - [ỹ_{t+l} | y_{it}, ε^j_{it} = 0, θ, η_1, ..., η_K, ω_1, ..., ω_K])} × p(θ, η_1, ..., η_K | ω_1, ..., ω_K, y_{1t}, ..., y_{Kt}) dθ dη_1 ... dη_K   (286)

The expression defines a logarithmic pool of impulse responses, given (ω_1, ..., ω_K).

• Counterfactuals: let ȳ_{kt+l} be a future path for the k-th element of ỹ_{t+l}. Using f(ȳ_{kt+l} | y_{it}, ε^j_{it+l}, θ, η_i) for submodel A_i, find the path ε̄^j_{it+l} which is consistent with the assumed ȳ_{kt+l}. Compute f(ȳ_{k't+l} | y_{it}, ε̄^j_{it+l}, θ, η_i) for k' ≠ k. The composite counterfactual path is the logarithmic pool, as in (285).
Composite Likelihood in practice
• Still assume two models, A and B, both misspecified.

A: Solving small sample identification problems

• If T_A is short, L_A may be flat. Common parameters may not be well identified using y_{At}.

• If (y_{At}, y_{Bt}) are jointly used, the effective sample size is τ = h(T_A, T_B, ω, κ).

• If ω or κ are small, τ >> T_A and CL may be more curved than L_A.
B: Ameliorating population identification problems
• Canonical New Keynesian model (model A):

R_{At} = φ E_t π_{At+1} + e_{1t}   (287)

y_{At} = E_t y_{At+1} - σ(R_{At} - E_t π_{At+1}) + e_{2t}   (288)

π_{At} = β E_t π_{At+1} + γ y_{At} + e_{3t}   (289)

R_{At} = nominal rate, y_{At} = output gap and π_{At} = inflation rate; (e_{1t}, e_{2t}, e_{3t}) are mutually uncorrelated, (φ, σ, β, γ) are structural parameters, and E_t is the conditional expectation. The solution is

[R_{At}, y_{At}, π_{At}]' = A e_t,   A = [1, 0, 0; -σ, 1, 0; -σγ, γ, 1],   e_t = [e_{1t}, e_{2t}, e_{3t}]'   (290)

• β is underidentified (it does not appear in (290), since expected inflation is zero in equilibrium).

• The slope of the Phillips curve γ may not be well identified if σ is small.
• Consider a model B, known to be more misspecified than model A, e.g. a single-equation Phillips curve with exogenous marginal costs:

π_{Bt} = β E_t π_{Bt+1} + γ y_{Bt} + u_{2t}   (291)

y_{Bt} = ρ y_{Bt-1} + u_{1t}   (292)

ρ < 1, γ > 0. The solution is

[(1 - ρℓ) y_{Bt}; (1 - ρℓ) π_{Bt}] = [1, 0; γ/(1 - βρ), 1 - ρℓ] [u_{1t}; u_{2t}]   (293)

• Assume ω >> 1 - ω > 0 (so that B, the more misspecified model, receives little weight).

• Because the log-likelihood of B has information about β, β can be identified (and estimated) from CL.

• In model B, the log-likelihood curvature in γ depends on 1/(1 - βρ) > 1 if ρ ≠ 0. CL has more information about γ than the log-likelihood of A, even if 1 - ω is small.

• The argument is independent of the effective sample size τ: these identification problems are in population.
C: Solving singularity problems
• If a model features more endogenous variables than shocks:

- Select a subvector of the observables matching the dimension of the shock vector, informally (see Guerron-Quintana, 2010) or formally (see Canova et al., 2014).

- Add measurement errors.

- Invent new structural shocks.

- Alternative: use CL on the singular model; see also Qu (2015).
• Suppose d_t = e_t − φ e_t−1, e_t ∼ iid N(0, σ²), φ < 1, and let p_t = E_t Σ_i λ^i d_t+i, λ < 1. Then

p_t = (1 − λφ) e_t − φ e_t−1   (294)

• The covariance matrix of the joint (d_t, p_t) process is singular: one shock drives two observables. Construct the likelihood using either d_t or p_t to estimate (φ, σ²).

• Alternatives: measurement error? A random λ? No.

• With CL, both (d_t, p_t) can be used to estimate (φ, σ², λ).
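A quick way to see the singularity is to stack (d_1, p_1, ..., d_T, p_T): with one shock per period, the 2T × 2T covariance matrix has rank at most T + 1 < 2T, so the joint Gaussian likelihood is degenerate. A sketch with illustrative parameter values:

```python
import numpy as np

phi, lam, sigma, T = 0.6, 0.9, 1.0, 5

# Rows: stacked observables (d_1, p_1, ..., d_T, p_T); columns: shocks e_0..e_T.
M = np.zeros((2 * T, T + 1))
for t in range(1, T + 1):
    M[2*(t-1), t], M[2*(t-1), t-1] = 1.0, -phi                # d_t = e_t - phi e_{t-1}
    M[2*(t-1)+1, t], M[2*(t-1)+1, t-1] = 1.0 - lam*phi, -phi  # p_t = (1-lam*phi) e_t - phi e_{t-1}

Sigma = sigma**2 * (M @ M.T)          # covariance of the stacked vector
rank = np.linalg.matrix_rank(Sigma)   # T+1 = 6 < 2T = 10: singular
```

Because rank(Σ) < 2T, the joint density of (d_t, p_t) does not exist and a standard likelihood cannot be formed; the composite likelihood below sidesteps this by combining the two well-defined marginals.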
logL(φ, σ²|d̃_t) = −0.5 T log(2π) − 0.5 Σ_{t=1}^T log ς_t − 0.5 Σ_{t=1}^T d̃²_t/ς_t   (295)

d̃_t = d_t − φ [1 + φ² + φ⁴ + ... + φ^{2(t−2)}] / [1 + φ² + φ⁴ + ... + φ^{2(t−1)}] d̃_t−1   (296)

ς_t = σ² [1 + φ² + φ⁴ + ... + φ^{2t}] / [1 + φ² + φ⁴ + ... + φ^{2(t−1)}]   (297)

logL(λ, φ, σ²|p̃_t) = −0.5 T log(2π) − 0.5 Σ_{t=1}^T log ϱ_t − 0.5 Σ_{t=1}^T p̃²_t/ϱ_t   (298)

p̃_t = p_t − ψ [1 + ψ² + ψ⁴ + ... + ψ^{2(t−2)}] / [1 + ψ² + ψ⁴ + ... + ψ^{2(t−1)}] p̃_t−1   (299)

ϱ_t = σ²(1 − λφ)² [1 + ψ² + ψ⁴ + ... + ψ^{2t}] / [1 + ψ² + ψ⁴ + ... + ψ^{2(t−1)}]   (300)

where ψ² = φ²/(1 − λφ)².

• CL = ω logL(φ, σ²|d̃_t) + (1 − ω) logL(λ, φ, σ²|p̃_t).
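A sketch of this CL. Instead of the prediction-error recursions (296)-(300), each marginal Gaussian log-likelihood is built directly from the MA(1) covariance matrix, which is numerically equivalent; parameter values and ω are illustrative:

```python
import numpy as np

def ma1_loglik(x, theta0, theta1, sigma2):
    # Exact Gaussian likelihood of x_t = theta0*e_t + theta1*e_{t-1}
    T = len(x)
    gamma0 = sigma2 * (theta0**2 + theta1**2)   # variance
    gamma1 = sigma2 * theta0 * theta1           # first autocovariance
    S = np.zeros((T, T))
    np.fill_diagonal(S, gamma0)
    idx = np.arange(T - 1)
    S[idx, idx + 1] = S[idx + 1, idx] = gamma1
    sign, logdet = np.linalg.slogdet(S)
    return -0.5 * (T * np.log(2*np.pi) + logdet + x @ np.linalg.solve(S, x))

def composite_loglik(d, p, phi, lam, sigma2, omega):
    ll_d = ma1_loglik(d, 1.0, -phi, sigma2)              # d_t = e_t - phi e_{t-1}
    ll_p = ma1_loglik(p, 1.0 - lam*phi, -phi, sigma2)    # p_t from (294)
    return omega * ll_d + (1.0 - omega) * ll_p

rng = np.random.default_rng(0)
phi, lam, sigma2 = 0.5, 0.9, 1.0
e = rng.normal(0.0, np.sqrt(sigma2), 201)
d = e[1:] - phi * e[:-1]
p = (1.0 - lam*phi) * e[1:] - phi * e[:-1]
cl = composite_loglik(d, p, phi, lam, sigma2, omega=0.5)
```

Maximizing `composite_loglik` over (φ, λ, σ²) uses the information in both observables, even though their joint likelihood does not exist.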
• There are no closed-form expressions for either the ML or the CL estimator of φ. Still, it is possible to see what the composite likelihood does.

• φ will be identified and estimated:

- more from the correlation properties of p̃_t, if logL(λ, φ, σ²|p̃_t) is used;

- more from the variance properties of d̃_t, if logL(φ, σ²|d̃_t) is used.

• Estimates from (298) will differ from those from (295)!

• Depending on ω, CL emphasizes the serial correlation properties of (d_t, p_t), their variance properties, or both.
D. Dealing with a large scale structural model
• Let y_t = A(θ)y_t−1 + e_t, e_t ∼ iid N(0, Σ(θ)), be the decision rule; θ is a vector of structural parameters; y_t is of large dimension and dim(y_t) > dim(e_t).

• Let ỹ_t ⊂ y_t be such that dim(ỹ_t) = dim(e_t); let Ã(θ) be the square submatrix of A(θ) corresponding to ỹ_t.

• The likelihood function is

L(θ|ỹ) = (2π)^{−T/2} |Σ(θ)|^{−T/2} exp{−0.5 Σ_t (ỹ_t − Ã(θ)ỹ_t−1)′ Σ(θ)^{−1} (ỹ_t − Ã(θ)ỹ_t−1)}   (301)

• Potential problems:

- Computation of Σ(θ)^{−1} may be demanding.

- Numerical difficulties arise if elements of ỹ_t are collinear or if there are near singularities in the observables (e.g., if both a long and a short term rate are used in estimation).

- If ỹ_t = (ỹ_1t, ỹ_2t) and ỹ_2t is non-observable,

L(θ|ỹ_1) = ∫ L(θ|ỹ_1, ỹ_2) g(ỹ_2|ỹ_1, θ) dỹ_2   (302)

may be intractable.
• Let y_t be the observable variables. Partition y_t = (y_1t, y_2t, ..., y_Kt), so that dim(y_1t) = dim(y_2t) = ... = dim(e_t). Two possible CLs are:

CL1(θ|y_t) = Σ_{i=1}^K ω_i logL(θ|y_it)   (303)

CL2(θ|y_t) = Σ_{i=1}^K ω_i logL(θ|y_it, y_−it)   (304)

• CL1 neglects the correlation structure between the y_it: blocks are treated as independent. In a multi-country model, y_it = observables of country i.

• CL2 is obtained by conditionally blocking groups of variables: in a multi-country model, use country i's variables y_it, given the other countries' variables y_−it.

• The choice of CL1 or CL2 is application dependent.
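A minimal sketch of CL1 in (303) for a "multi-country" setting. For illustration only, each country block is assumed to follow the same scalar AR(1) decision rule; cross-block correlation is deliberately ignored, exactly as the text describes:

```python
import numpy as np

def ar1_loglik(y, rho, sigma2):
    # Gaussian AR(1) block likelihood, conditioning on the first observation
    resid = y[1:] - rho * y[:-1]
    T = len(resid)
    return -0.5 * (T * np.log(2*np.pi*sigma2) + resid @ resid / sigma2)

def cl1(blocks, rho, sigma2, weights):
    # (303): weighted sum of per-block likelihoods, blocks treated as independent
    return sum(w * ar1_loglik(y, rho, sigma2) for w, y in zip(weights, blocks))

rng = np.random.default_rng(1)
rho, sigma2 = 0.8, 1.0
blocks = []
for _ in range(3):                       # K = 3 "countries"
    y = np.zeros(120)
    for t in range(1, 120):
        y[t] = rho * y[t-1] + rng.normal(0, np.sqrt(sigma2))
    blocks.append(y)
value = cl1(blocks, rho, sigma2, weights=[1/3, 1/3, 1/3])
```

Each block's likelihood is cheap to evaluate on its own, which is the computational point of CL1 for large models: the full Σ(θ)^{−1} in (301) is never formed.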
E. Dealing with short data when cross sectional data is available

• Single structural model (e.g. an asset pricing or a consumption function equation), but we have data from different units (portfolios, consumers) or data with different levels of aggregation (firm, industry, sector, country).

• y_1t, y_2t, ..., y_Kt are the same observables for each i = 1, 2, ..., K.

CL(θ|y_1t, y_2t, ..., y_Kt) = Σ_{i=1}^K ω_i logL(θ|y_it)   (305)

• CL neglects the correlation structure across units but stochastically pools the cross sectional information.

• For AR decision rules, CL produces Zellner and Hong (1989)-type estimators: use individual and (weighted) averages of cross sectional information.
• Recall that for multiple models equation (272) is

α = (Σ_{t=1}^{T_1} y²_1t−1 + Σ_{i=2}^K κ_i2 Σ_{t=1}^{T_i} y²_it−1)^{−1} (Σ_{t=1}^{T_1} y_1t y_1t−1 + Σ_{i=2}^K κ_i1 Σ_{t=1}^{T_i} y_it y_it−1)   (306)

where κ_i1 = (ω_i/ω_1) λ_i ψ²_i and κ_i2 = κ_i1 λ_i.

• Σ_{i=2}^K κ_i2 Σ_{t=1}^{T_i} y²_it−1 and Σ_{i=2}^K κ_i1 Σ_{t=1}^{T_i} y_it y_it−1 are weighted averages of the information in units other than 1.

• Stochastic pooling: useful when each T_i is short; the effective sample size is τ = T_1 + Σ_{i≠1} T_i ((1 − ω_i)/ω_i) ψ²_i.
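A sketch of the pooled estimator (306) for AR(1) decision rules: unit 1's own cross-products are combined with weighted cross-products from the other units. Since κ_i1 and κ_i2 are defined back in equation (272), the weights used here, (ω_i/ω_1)(σ²_1/σ²_i), are an assumed natural (GLS-style) choice for illustration, not necessarily the exact ones:

```python
import numpy as np

def pooled_ar1(ys, omegas, sigma2s):
    # ys[0] is the unit of interest; the other units contribute weighted sums.
    # Assumed weights: relative CL weight times relative shock precision.
    num = den = 0.0
    for i, y in enumerate(ys):
        k = 1.0 if i == 0 else (omegas[i]/omegas[0]) * (sigma2s[0]/sigma2s[i])
        num += k * np.dot(y[1:], y[:-1])     # sum of y_t * y_{t-1}
        den += k * np.dot(y[:-1], y[:-1])    # sum of y_{t-1}^2
    return num / den

rng = np.random.default_rng(2)
alpha = 0.7
ys = []
for sig2 in (1.0, 1.5, 0.5):                 # K = 3 units, short samples
    y = np.zeros(40)
    for t in range(1, 40):
        y[t] = alpha * y[t-1] + rng.normal(0, np.sqrt(sig2))
    ys.append(y)
alpha_hat = pooled_ar1(ys, omegas=[0.5, 0.25, 0.25], sigma2s=[1.0, 1.5, 0.5])
```

With T_1 = 40 the unit-1 OLS estimate alone would be noisy; the weighted cross-sectional terms raise the effective sample size, which is the point of stochastic pooling.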
14.1 Combining information for estimation purposes
• Often we have information on the observable variables that is not used in estimation; e.g. data before a structural break or from another regime, data on the same variables from other countries, etc.

• Can we use these data to sharpen inference, even if they may have less bearing than the available data?

• Suppose y_t = [y_1t, y_2t]. The posterior kernel is:

g̃(y_1, y_2|θ) ∝ f(y_1, y_2|θ)g(θ) = f(y_2|y_1, θ)f(y_1|θ)g(θ) ∝ f(y_2|y_1, θ)g(θ|y_1)   (307)

where

g(θ|y_1) = f(y_1|θ)g(θ) / ∫ f(y_1|θ)g(θ) dθ   (308)
• The posterior for θ is obtained by first finding the posterior conditional on y_1t and then, treating this posterior as the prior for the next stage, finding the posterior using y_2t (sequential learning).

• Here y_1t and y_2t have the same importance. How do we scale down the importance of y_1t?

• Let 0 ≤ δ ≤ 1. Then, rather than (308), the prior for stage 2 is

g̃(θ|y_1, δ) = f(y_1|θ)^δ g(θ) / ∫ f(y_1|θ)^δ g(θ) dθ   (309)

• The same approach can be used if some (earlier) data is expected to be of lower quality than other (later) data.
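The effect of the discounting in (309) is easiest to see in a conjugate Gaussian setting: raising f(y_1|θ) to a power δ ∈ (0, 1] downweights the historical sample y_1 (δ = 1: full weight; δ → 0: ignored). All numbers (prior, variances, data) below are illustrative:

```python
import numpy as np

def power_prior_update(mu0, tau20, y1, sigma2, delta):
    # N(mu0, tau20) prior times f(y1|theta)^delta, theta the unknown mean,
    # sigma2 the known data variance: still Gaussian, with n1 scaled by delta.
    n1 = len(y1)
    prec = 1.0/tau20 + delta * n1 / sigma2
    mean = (mu0/tau20 + delta * np.sum(y1)/sigma2) / prec
    return mean, 1.0/prec

rng = np.random.default_rng(3)
y1 = rng.normal(2.0, 1.0, 50)            # earlier-regime data
m_full, v_full = power_prior_update(0.0, 10.0, y1, 1.0, delta=1.0)
m_half, v_half = power_prior_update(0.0, 10.0, y1, 1.0, delta=0.5)
# The discounted (delta = 0.5) stage-2 prior stays closer to the original
# prior mean and is more diffuse than the full-weight update.
```

The resulting g̃(θ|y_1, δ) is then used as the prior when the current-regime data y_2 arrive.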
• Could also modify (307), so that y_1t and y_2t are weighted differently:

g̃(y_1, y_2|θ) ∝ f(y_2|y_1, θ)^{1−δ} f(y_1|θ)^δ g(θ)   (310)

• f(y_2|y_1, θ)^{1−δ} f(y_1|θ)^δ is a weighted likelihood: it looks like a composite likelihood.

• If we use f(y_2|θ)^{1−δ} rather than f(y_2|y_1, θ)^{1−δ}, it is exactly a composite likelihood (constructed using marginals).
14.2 Quasi-Bayesian methods
• Central to Bayesian analysis is the likelihood of the model. If a model is misspecified, the likelihood is misspecified, and so is inference.

• Can we construct posteriors without the likelihood of the model? Yes. CL does not use the likelihood of the model. More generally, quasi-Bayesian methods do not: Kim (2002) and Christiano et al. (2011) use the likelihood of the moments of a model.

• The idea is similar to the endogenous prior setup of Del Negro and Schorfheide (2008). It can be used to estimate linear and nonlinear models.

• The approach is likely to produce inference that is more robust to misspecification. Problem: if the sample is small or the moments are non-regular, the use of an asymptotic (normal) approximation to the moment distance is problematic.

• Small sample alternative: Approximate Bayesian Computation (ABC), see Creel and Christensen (2015), Scalone (2016).
• Bayesian limited information (BLI): Kim (2002)

- The posterior distribution g(θ|y) = L(θ|y)g(θ)/g(y) is approximated by

g(θ|m, V) = L(m|V, θ)g(θ) / L(m|V)   (311)

where g(θ) is a prior, V is the variance of the data moments, and

L(m|V, θ) ∝ |V|^{−1/2} exp{−0.5 T (m − m(θ))′ V^{−1} (m − m(θ))}   (312)

where T is the sample size, m are the sample moments, m(θ) are the model moments, and θ are the structural parameters. Run a standard MCMC algorithm using (311)-(312) to get the posterior of θ.

- Advantages: (312) is easy to compute, also for non-linear models.

- Problems: (312) is an asymptotic approximation to the distribution of the moments. When is the likelihood of the moments "close" to the likelihood of the model?
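A sketch of the BLI quasi-posterior (311)-(312): the moment-distance "likelihood" replaces the model likelihood inside a random-walk Metropolis sampler. The toy model (θ = mean of iid data, moment = sample mean), the flat prior bounds and the tuning constants are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(1.0, 1.0, 200)
T, m_bar, V = len(y), y.mean(), y.var()  # data moment and its (scalar) variance

def log_quasi_post(theta):
    if abs(theta) > 10.0:                 # assumed flat prior on [-10, 10]
        return -np.inf
    # (312) with model moment m(theta) = theta
    return -0.5 * T * (m_bar - theta)**2 / V

draws, theta = [], 0.0
lp = log_quasi_post(theta)
for _ in range(5000):                     # standard RW Metropolis
    prop = theta + 0.2 * rng.normal()
    lp_prop = log_quasi_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    draws.append(theta)
post_mean = np.mean(draws[1000:])         # discard burn-in
```

Only the moments enter the sampler, so the same loop works unchanged when m(θ) must be simulated from a nonlinear model.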
• Approximate Bayesian Computation (ABC): Pritchard et al. (2000). Use

g(θ|m, V) ∝ ϖ(||m − m(θ)|| < ε) g(θ)   (313)

where ϖ is a set of weights, ε is a tolerance level, and ||.|| = (Σ_{j=1}^J (m_j − m_j(θ))²)^{0.5} is the Euclidean distance.

- If ϖ = 1 (simple rejection): simulate data of the same length as the actual data (no large sample approximation) with a draw from the prior g(θ), compute m(θ), and keep the draw if its distance from the data moments is less than ε; otherwise reject it.

- Advantages: very easy to run. If m are sufficient statistics, then as ε → 0 and the number of replications N → ∞, the distribution of the accepted parameters converges to the posterior distribution of θ.

- Disadvantages: if g(θ) is far away from the posterior, the algorithm is computationally demanding. The problem gets compounded when the model dimensionality is large (and the dimension of m is large).
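The rejection scheme above can be sketched in a few lines. The toy problem (θ = mean of Gaussian data, moment = sample mean), the prior range, ε and the number of replications are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(1.0, 1.0, 100)
m_data = y.mean()

accepted = []
for _ in range(20000):
    theta = rng.uniform(-5.0, 5.0)             # draw from the prior
    sim = rng.normal(theta, 1.0, len(y))       # simulate data of the same length
    if abs(sim.mean() - m_data) < 0.1:         # keep if distance < epsilon
        accepted.append(theta)
abc_mean = np.mean(accepted)
acc_rate = len(accepted) / 20000
```

The low acceptance rate illustrates the stated disadvantage: with a diffuse prior most draws are wasted, and the waste grows quickly with the dimension of θ and of m.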
• Refinements

- Weight the accepted draws with a kernel (say, with weights proportional to the inverse of ||m − m(θ)||). This is the Bayesian indirect likelihood estimator of Creel and Christensen (2015).

- Add a post-sampling correction step. Regress the accepted draws against ||m − m(θ)|| and update the draws according to the regression coefficient (see Scalone, 2016).

- A series of SMC steps can be added, where a decreasing sequence ε_i, i = 1, 2, ..., M is used to compute the posteriors (see below for SMC methods).
Example 14.1 BLI and ABC estimators of a standard RBC model. Compute:

• RMSE for each parameter: RMSE_i = (1/N) Σ_{n=1}^N (θ^ap_{i,n} − θ^f_{i,n})²/θ_i, where θ^ap = approximate estimator, θ^f = posterior estimator computed with the likelihood of the model, θ_i = true parameter value, and N = number of replications.

• Overlap ratio for each parameter: OR_i = (CI^ap_{i,90} ∩ CI^f_{i,90}) / (CI^ap_{i,90} ∪ CI^f_{i,90}), the ratio of the intersection to the union of the 90% credible intervals. Note −1 ≤ OR_i ≤ 1; OR_i = 1 if there is perfect overlap and OR_i = −1 if there is no overlap.

- Basic ABC with rejection does pretty well!
15 Sequential MC (SMC) methods
• Herbst and Schorfheide (2014): posteriors for DSGE parameters may have multiple modes. They are hard to find with RW Metropolis-Hastings, easy with SMC methods.

• Advantages of SMC relative to MCMC:

- SMC initialization is fast and easy. MCMC initialization typically requires the mode (time consuming); risk of dependence on the starting value.

- SMC computations are parallelizable; MCMC computations are generally serial.

- SMC generates estimates of the marginal data density as a byproduct: computing Bayes factors for competing models is straightforward.

- SMC is more efficient than a general-purpose MCMC algorithm, and preferable when efficient algorithms require priors with undesirable features.
Basic idea
1) Start with a set of parameter values randomly drawn from the prior. These parameter values are associated with weights. The set of parameter values and weights (the "particles") defines a discrete distribution approximating the prior.

2) Reweight these particles to iteratively approximate a sequence of distributions, each of which combines the prior with partial information from the likelihood. Each distribution in the sequence uses more likelihood information than its predecessor.

3) The algorithm concludes once the information from the full likelihood has been incorporated. In the end, we have a set of particles providing a discrete approximation to the model's posterior.
Let π(θ) = f(θ)/Z(y) ≡ g(θ|y), where f(θ) ≡ L(θ|y)p(θ) and Z(y) ≡ ∫ L(θ|y)p(θ) dθ. We want to approximate π(θ) and Z(y).
- A precursor of SMC is importance sampling (IS).

- IS approximates a target density, say f(θ), by an easy-to-sample density g(θ) (the "source density"). For any continuous function h of the parameters θ, it uses the identity

E_π[h(θ)] = ∫ h(θ)π(θ) dθ = Z(y)^{−1} ∫ h(θ)g(θ)w(θ) dθ   (314)

where w(θ) = f(θ)/g(θ). If θ^i ∼ iid from g(θ), i = 1, ..., N, then, under regularity conditions (see Geweke, 1989), the Monte Carlo estimate

h̄ = N^{−1} Σ_{i=1}^N h(θ^i) W^i   (315)

where W^i = w(θ^i)/(N^{−1} Σ_{j=1}^N w(θ^j)), converges almost surely to E_π[h(θ)] as N → ∞.

Each W^i, the (normalized) importance weight, is assigned to θ^i. We call the pair (θ^i, W^i) a "particle". (θ^i, W^i)_{i=1}^N provides a discrete distribution approximating π(θ).

- The distance between g(θ) and f(θ) measures the accuracy of the (per-particle) approximation, and the uniformity (or lack thereof) of the distribution of weights reflects the size of this distance.

- If the distribution of weights is very uneven, only a few particles contribute meaningfully to the Monte Carlo approximation h̄, and the approximation is likely to be inaccurate. Uniform weights (the ideal situation) arise if g(θ^i)/f(θ^i) is constant for each θ^i, in which case we are sampling directly from π(θ).
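The estimator (315) can be sketched directly. The target (a N(1, 0.5²) kernel, known only up to a constant) and the source (N(0, 1)) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 50000
theta = rng.normal(0.0, 1.0, N)                 # draws from the source g
log_f = -0.5 * ((theta - 1.0) / 0.5)**2         # unnormalized target kernel f
log_g = -0.5 * theta**2                         # source kernel g
w = np.exp(log_f - log_g)                       # w(theta) = f/g
W = w / w.mean()                                # normalized weights W^i in (315)
h_bar = np.mean(theta * W)                      # estimate of E_pi[theta]
ess = N * w.mean()**2 / np.mean(w**2)           # effective sample size
```

The `ess` line quantifies the uniformity point above: when the weights are very uneven, `ess` collapses toward 1 and `h_bar` becomes unreliable even for large N.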
- Constructing "good" importance distributions is difficult. An SMC algorithm avoids the problem by recursively building a particle approximation to a sequence of distributions that starts from a known distribution (the prior) and ends with the distribution of interest (the posterior).
Let n index a sequence of distributions of the form

π_n(θ) = f_n(θ)/Z_n(y) = L(θ|y)^{φ_n} p(θ) / ∫ L(θ|y)^{φ_n} p(θ) dθ,  n = 1, ..., N_φ

and choose an increasing sequence of values for the scaling parameter φ_n, such that φ_1 = 0 (so π_1(θ) is the prior) and φ_{N_φ} = 1 (so π_{N_φ}(θ) is the posterior).

The method of bridging distributions by raising the likelihood to a power less than 1 is known as "likelihood tempering", and φ_n is the sequence of tempering parameters.
The algorithm works as follows:
- The algorithm describes the steps needed to construct particle approximation n starting from particle approximation n − 1.

- It starts with particles sampled from p(θ) and assigned uniform weights. At any stage n of the recursion, in the correction step the particles are reweighted according to π_n (this is the equivalent of importance sampling of π_n using π_{n−1} as the proposal). In the selection step, the particles are rejuvenated using multinomial resampling (if only a few particles have meaningful weight) to avoid the problem of particle impoverishment. Resampling itself introduces noise into the simulation, so it should be done only if necessary. In the mutation step, particles are moved around the parameter space using M iterations of an MCMC algorithm (with invariant distribution π_n) on each individual particle.

- If dim(θ) is large, it is better to use a block MCMC algorithm at the mutation stage; see Chib and Ramamurthy (2010).

- As in standard MCMC, the mutation step is crucial to move towards areas of higher density of π_n, and it ensures diversity across replicated particles when resampling occurred during the selection step. Without the mutation step, repeated resampling of the corrected particles would leave only a few values surviving until the final stage, resulting in a poor approximation to the posterior.

- Unlike MCMC, the particle distribution approximates π_n even before mutation is performed. This means the procedure is valid even for short chains (say, M = 1).
- How do you choose {φ_n}_{n=1}^{N_φ}? Herbst and Schorfheide (2014):

φ_n = ((n − 1)/(N_φ − 1))^λ   (316)

where λ > 0 controls the rate at which likelihood information is added. If λ = 1 the schedule is linear. Typically one chooses λ > 1 (small increments initially and larger ones later on).
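The correction-selection-mutation recursion with the schedule (316) can be sketched on a toy posterior (θ = mean of Gaussian data, N(0, 3²) prior). The tuning constants (N, N_φ, λ = 2, ESS threshold, RW step) are illustrative, not the Herbst-Schorfheide adaptive settings:

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(0.5, 1.0, 100)

def loglik(th):
    # log L(theta|y) for each particle (vectorized over the particle array)
    return -0.5 * np.sum((y[None, :] - th[:, None])**2, axis=1)

N, Nphi, lam = 2000, 50, 2.0
phis = (np.arange(Nphi) / (Nphi - 1))**lam          # tempering schedule (316)
theta = rng.normal(0.0, 3.0, N)                     # particles from the prior
logw = np.zeros(N)
for n in range(1, Nphi):
    logw += (phis[n] - phis[n-1]) * loglik(theta)   # correction step
    W = np.exp(logw - logw.max()); W /= W.sum()
    if 1.0 / np.sum(W**2) < N / 2:                  # selection: resample if ESS low
        theta = theta[rng.choice(N, N, p=W)]
        logw = np.zeros(N)
    # mutation: one RW-MH step (M = 1) with invariant distribution pi_n
    def logpost(th, phi=phis[n]):
        return phi * loglik(th) - th**2 / 18.0      # tempered lik x N(0,9) prior
    prop = theta + 0.3 * rng.normal(size=N)
    acc = np.log(rng.uniform(size=N)) < logpost(prop) - logpost(theta)
    theta = np.where(acc, prop, theta)
W = np.exp(logw - logw.max()); W /= W.sum()
smc_mean = np.sum(W * theta)                        # approximate posterior mean
```

With λ = 2 the early increments φ_n − φ_{n−1} are tiny, so the first bridge distributions stay close to the prior and the weights remain well behaved, which is exactly why λ > 1 is the typical choice.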
• Chopin (2002): a recursive characterization of the SLLN and CLT that applies to each of the three steps. Herbst and Schorfheide (2014): sufficient conditions when the mutation stage is adaptive.

• How do you check convergence?

- Durham and Geweke (2014) examine the variation of estimates across runs. The variance of the SMC approximations across runs is an estimate of the asymptotic variance associated with the CLT.

- This variance is linked to the behavior of the particle weights. When only a few particles have weight, estimates are based on only a few unique particles. Thus, across SMC runs, the estimator will have high variance.

- Alternative (Bognanni and Herbst, 2015): focus on the variation of the log marginal data density approximation.

• When is SMC bad? When the likelihood peaks in the tail of the prior, the SMC sampler requires many bridge distributions and/or mutation steps to properly characterize the posterior distribution.
16 Non linear DSGE models
y_2t+1 = h_1(y_2t, ε_1t, θ)   (317)

y_1t = h_2(y_2t, ε_2t, θ)   (318)

ε_2t = measurement errors, ε_1t = structural shocks, θ = vector of structural parameters, y_2t = vector of states, y_1t = vector of controls. Let y_t = (y_1t, y_2t), ε_t = (ε_1t, ε_2t), y^{t−1} = (y_0, ..., y_t−1) and ε^t = (ε_1, ..., ε_t).

• The likelihood is L(y^T; θ|y_20) = Π_{t=1}^T f(y_t|y^{t−1}, θ) f(y_20; θ). Integrating the initial conditions y_20 and the shocks out, we have:

L(y^T; θ) = ∫ [Π_{t=1}^T ∫ f(y_t|ε^t, y^{t−1}, y_20, θ) f(ε_t|y^{t−1}, y_20, θ) dε_t] f(y_20; θ) dy_20   (319)

(319) is intractable.
• If we have L draws for y_20 from f(y_20; θ) and L draws ε^{t|t−1,l}, l = 1, ..., L, t = 1, ..., T, from f(ε_t|y^{t−1}, y_20, θ), we can approximate (319) with

L(y^T; θ) = (1/L) [Π_{t=1}^T (1/L) Σ_l f(y_t|ε^{t|t−1,l}, y^{t−1}, y^l_20, θ)]   (320)

Drawing from f(y_20; θ) is simple; drawing from f(ε_t|y^{t−1}, y_20, θ) is complicated. Fernandez-Villaverde and Rubio-Ramirez (2004): use f(ε^{t−1}|y^{t−1}, y_20, θ) as an importance sampler for f(ε^t|y^{t−1}, y_20, θ):

1) Draw y^l_20 from f(y_20; θ). Draw ε^{t|t−1,l} L times from f(ε^t|y^{t−1}, y^l_20, θ) = f(ε^{t−1}|y^{t−1}, y^l_20, θ) f(ε_t|θ).

2) Construct IR^l_t = f(y_t|ε^{t|t−1,l}, y^{t−1}, y^l_20, θ) / Σ_{l=1}^L f(y_t|ε^{t|t−1,l}, y^{t−1}, y^l_20, θ) and assign it to each draw ε^{t|t−1,l}.

3) Resample from {ε^{t|t−1,l}}_{l=1}^L with probabilities equal to IR^l_t.

4) Repeat the above steps for every t = 1, 2, ..., T.

Step 3) is crucial: if it is omitted, only one particle will asymptotically remain, and the integral in (319) diverges as T → ∞.
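The propagate-weight-resample steps above can be sketched with a bootstrap particle filter, which uses the transition density as the proposal (a simpler choice than the Fernandez-Villaverde and Rubio-Ramirez importance sampler). The linear Gaussian state space and the tuning values are illustrative, chosen so the answer is easy to sanity-check:

```python
import numpy as np

rng = np.random.default_rng(8)
rho, q, r, T, L = 0.9, 0.5, 0.5, 200, 1000   # model and particle settings

# simulate "true" states and observations
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t-1] + rng.normal(0, np.sqrt(q))
yobs = x + rng.normal(0, np.sqrt(r), T)

part = rng.normal(0, np.sqrt(q / (1 - rho**2)), L)   # draws for initial state
loglik, xfilt = 0.0, np.zeros(T)
for t in range(T):
    part = rho * part + rng.normal(0, np.sqrt(q), L)          # propagate shocks
    logw = -0.5 * np.log(2*np.pi*r) - 0.5 * (yobs[t] - part)**2 / r
    m = logw.max()
    w = np.exp(logw - m)
    loglik += m + np.log(w.mean())           # period-t contribution, as in (320)
    w /= w.sum()
    xfilt[t] = np.sum(w * part)              # filtered state estimate
    part = part[rng.choice(L, L, p=w)]       # resampling step (the crucial one)
corr = np.corrcoef(xfilt, x)[0, 1]
```

Dropping the resampling line reproduces the degeneracy described above: after a few periods one particle carries essentially all the weight and the likelihood approximation collapses.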
• The algorithm is computationally demanding: you need a MC within a MC. Fernandez-Villaverde and Rubio-Ramirez (2004) report some improvements over linear specifications.
17 Pruning large scale models
• Models used in policy institutions are very large (often too large!). Estimation and interpretation are difficult.

• "A model should be able to answer all possible policy questions and get the same answers as single equation (reduced form) specifications". Is this possible? Do we really need all the bells and whistles for policy and forecasting purposes?

• Most models feature ad-hoc mechanisms to capture endogenous features of the data, e.g. habit in consumption or investment adjustment costs, or modelling short cuts to generate cross-sector transmission, e.g. financial frictions.

• It is preferable (given the current state of macro and computational technology) to have a core and satellites which are activated when needed (e.g. a fiscal sector, a financial sector with frictions, etc.).

• How do we do the pruning systematically and retain the information present in the data for estimation and inference?

• Can use ideas from Del Negro and Schorfheide (2013):

• Compare the estimation and inferential results obtained using a basic model (say, a model with financial frictions) to those from a smaller model (without financial frictions).

• The smaller model uses the same data in estimation as the basic model (added as in the data-rich DSGE section).

• For example, compare the effects of monetary (fiscal) shocks in a model with and without financial frictions. If the transmission differs, the economics of the two models is different. If not, the way financial frictions are introduced does not add much to our understanding of the problem, and it is cleaner to use the smaller model with additional data.