
A New Approach To Time Varying Parameters in Vector Autoregressive Models*

Florian Huber¹, Gregor Kastner†¹, and Martin Feldkircher‡²

¹WU Vienna University of Economics and Business
²Oesterreichische Nationalbank (OeNB)

January 15, 2018

Abstract

We propose a flexible means of estimating vector autoregressions with time-varying parameters (TVP-VARs) by introducing a latent threshold process that is driven by the absolute size of parameter changes. This enables us to dynamically detect whether a given regression coefficient is constant or time-varying. When applied to a medium-scale macroeconomic US dataset, our model yields precise density and turning point predictions, especially during economic downturns, and provides new insights into the changing effects of increases in short-term interest rates over time.

Keywords: Change point model, Threshold mixture innovations, Structural breaks, Shrinkage, Bayesian statistics, Monetary policy.

JEL Codes: C11, C32, C52, E42.

*This paper is a substantially revised version of a paper circulated under the title "Should I stay or should I go? Bayesian inference in the threshold time-varying parameter model", Department of Economics Working Paper Series 235, WU Vienna University of Economics and Business, Vienna.
†Corresponding author. Address: Welthandelsplatz 1, Building D4, 4th Floor, 1020 Wien, Austria. Phone: +43 1 31336-5593. Fax: +43 1 31336-905593. Email: [email protected].
‡The opinions expressed in this paper are those of the authors and do not necessarily reflect the official viewpoint of the Oesterreichische Nationalbank or the Eurosystem.


1 Introduction

In the last few years, economists in policy institutions and central banks were criticized

for their failure to foresee the recent financial crisis that engulfed the world economy

and led to a sharp drop in economic activity. Critics argued that economists failed

to predict the crisis because models commonly utilized at policy institutions back then

were too simplistic. For instance, the majority of forecasting models adopted were (and

possibly still are) linear and low dimensional. The former implies that the underlying

structural mechanisms and the volatility of economic shocks are assumed to remain

constant over time – a rather restrictive assumption. The latter implies that only limited
information is exploited, which may be detrimental for obtaining reliable predictions.

In light of this criticism, practitioners started to develop more complex models that

are capable of capturing salient features of time series commonly observed in macroe-

conomics and finance. Recent research (Stock and Watson, 1996; Cogley and Sargent,

2002; 2005; Primiceri, 2005; Sims and Zha, 2006) suggests that, at least for US data,

there is considerable evidence that the influence of certain variables appears to be time-

varying. This raises additional issues related to model specification and estimation. For

instance, do all regression parameters vary over time? Or is time variation just limited

to a specific subset of the parameter space? Moreover, as is the case with virtually any

modeling problem, the question whether a given variable should be included in the

model in the first place naturally arises. Apart from deciding whether parameters are

changing over time, the nature of the process that drives the dynamics of the coeffi-

cients also proves to be an important modeling decision.

In a recent contribution, Frühwirth-Schnatter and Wagner (2010) focus on model

specification issues within the general framework of state space models. Exploiting a

non-centered parametrization of the model allows them to rewrite the model in terms

of a constant parameter specification, effectively capturing the steady state of the pro-

cess along with deviations thereof. The non-centered parameterization is subsequently

used to search for appropriate model specifications, imposing shrinkage on the steady

state part and the corresponding deviations. Recent research aims to discriminate be-

tween inclusion/exclusion of elements of different variables and whether the associated

regression coefficient is constant or time-varying (Belmonte, Koop, and Korobilis, 2014;

Eisenstat, Chan, and Strachan, 2016; Koop and Korobilis, 2012; 2013; Kalli and Griffin,

2014). Another strand of the literature asks whether coefficients are constant or time-

varying by assuming that the innovation variance in the state equation is characterized

by a change point process (McCulloch and Tsay, 1993; Gerlach, Carter, and Kohn, 2000;

Koop, Leon-Gonzalez, and Strachan, 2009; Giordani and Kohn, 2012). However, the

main drawback of this modeling approach is the severe computational burden originat-


ing from the need to simulate additional latent states for each parameter. This renders

estimation of large-dimensional models like vector autoregressions (VARs) infeasible.

To circumvent such problems, Koop, Leon-Gonzalez, and Strachan (2009) estimate a

single Bernoulli random variable to discriminate between time constancy and parame-

ter variation for the autoregressive coefficients, the covariances, and the log-volatilities,

respectively. This assumption, however, implies that either all autoregressive parame-

ters change over a given time frame, or none of them. Along these lines, Maheu and

Song (forthcoming) allow for independent breaks in regression coefficients and the

volatility parameters. However, they show that their multivariate approach is inferior

to univariate change point models when out-of-sample forecasts are considered and

conclude that allowing for independent breaks in each series is important.

In the present paper, we introduce a method that can be applied to a highly param-

eterized VAR model by combining ideas from the literature of latent threshold models

(Neelon and Dunson, 2004; Nakajima and West, 2013a;b; Zhou, Nakajima, and West,

2014; Kimura and Nakajima, 2016) and mixture innovation models. Specifically, we in-

troduce a set of latent thresholds that controls the degree of time-variation separately

for each parameter and for each point in time. This is achieved by estimating variable-

specific thresholds that allow for movements in the autoregressive parameters if the

proposed change of the parameter is large enough. We show that this can be achieved

by assuming that the innovations of the state equation follow a threshold model that

discriminates between a situation where the innovation variance is large and a case

with an innovation variance set (very close) to zero. The proposed framework nests a

wide variety of competing models, most notably the standard time-varying parameter

model, a change-point model with an unknown number of regimes, mixtures between

different models, and finally the simple constant parameter model. To assess systemat-

ically, in a data-driven fashion, which predictors should be included in the model, we

impose a set of Normal-Gamma priors (Griffin and Brown, 2010) in the spirit of Bitto

and Frühwirth-Schnatter (2016) on the initial state of the system.

We illustrate the empirical merits of our approach by carrying out an extensive

forecasting exercise based on a medium-scale US dataset. Our proposed framework

is benchmarked against two constant parameter Bayesian VAR models with stochastic

volatility and hierarchical shrinkage priors. The findings indicate that the threshold

time-varying parameter VAR excels in crisis periods while being only slightly inferior

during “normal” periods in terms of one-step-ahead log predictive likelihoods. Consid-

ering turning point predictions for GDP growth, our model outperforms the constant

parameter benchmarks when upward turning points are considered while yielding sim-

ilar forecasts for downward turning points.


In the second part of the application, we provide evidence on the degree of time-

variation of the underlying causal mechanisms for the USA. Considering the determi-

nant of the time-varying variance-covariance matrix of the state innovations as a global

measure for the strength of parameter movements, we find that those movements reach

a maximum in the beginning of the 1980s while displaying only relatively modest move-

ments before and after that period. Consequently, we investigate the effects of a mon-

etary policy shock for the pre- and post-1980 periods separately. This exercise reveals

a considerable price puzzle in the 1960s, which starts disappearing in the early 1980s.

Moreover, considering the most recent part of our sample period, we find evidence for

increased effectiveness of monetary policy. This is especially pronounced during the

aftermath of the global financial crisis in 2008/09.

The paper is structured as follows. Section 2 introduces the modeling approach, the

prior setup and the corresponding MCMC algorithm for posterior simulation. Section 3

illustrates the behavior of the model by showcasing scenarios with few, moderately

many, and many jumps in the state equation. In Section 4, we apply the model to a

medium-scale US macroeconomic dataset and assess its predictive capabilities against

a range of competing models in terms of density and turning point predictions. More-

over, we investigate during which periods VAR coefficients display the largest amount

of time-variation and consider the associated implications on dynamic responses with

respect to a monetary policy shock. Finally, Section 5 concludes.

2 Econometric framework

We begin by specifying a flexible model that is capable of discriminating between con-

stant and time-varying parameters at each point in time.

2.1 A threshold mixture innovation model

Consider the following dynamic regression model,

\[
y_t = x_t'\beta_t + u_t, \qquad u_t \sim \mathcal{N}(0, \sigma_t^2), \tag{2.1}
\]

where xt is a K-dimensional vector of explanatory variables and βt = (β1t, . . . , βKt)′

a vector of regression coefficients. The error term ut is assumed to be independently

normally distributed with (potentially) time-varying variance. This model assumes that

the relationship between elements of xt and yt is not necessarily constant over time,

but changes subject to some law of motion for βt. Typically, researchers assume that


the jth element of βt (j = 1, . . . , K) follows a random walk process,

\[
\beta_{jt} = \beta_{j,t-1} + e_{jt}, \qquad e_{jt} \sim \mathcal{N}(0, \vartheta_j), \tag{2.2}
\]

with ϑj denoting the innovation variance of the latent states. Eq. (2.2) implies that

parameters evolve gradually over time, ruling out abrupt changes. While being con-

ceptually flexible, in the presence of only a few breaks in the parameters, this model

generates spurious movements in the coefficients that could be detrimental for the em-

pirical performance of the model (D’Agostino, Gambetti, and Giannone, 2013).

Thus, we deviate from Eq. (2.2) by specifying the innovations of the state equation

ejt to be a mixture distribution. More concretely, let

\[
e_{jt} \sim \mathcal{N}(0, \theta_{jt}), \tag{2.3}
\]
\[
\theta_{jt} = s_{jt}\vartheta_{j1} + (1 - s_{jt})\vartheta_{j0}, \tag{2.4}
\]

where sjt is an indicator variable with an unconditional Bernoulli distribution. This

mechanism is closely related to an absolutely continuous spike-and-slab prior where

the slab has variance ϑj1 and the spike has variance ϑj0 with ϑj1 ≫ ϑj0 (see e.g.,

Malsiner-Walli and Wagner, 2011, for an excellent survey on this class of priors).

In the present framework we assume that the conditional distribution p(sjt | ∆βt) follows a threshold process,

\[
s_{jt} = \begin{cases} 1 & \text{if } |\Delta\beta_{jt}| > d_j, \\ 0 & \text{if } |\Delta\beta_{jt}| \le d_j, \end{cases} \tag{2.5}
\]

where dj is a coefficient-specific threshold to be estimated and ∆βjt := βjt − βj,t−1.

Equations (2.4) and (2.5) state that if the absolute period-on-period change of βjt ex-

ceeds a threshold dj, we assume that the change in βjt is normally distributed with zero

mean and variance ϑj1. On the contrary, if the change in the parameter is too small, the

innovation variance is set close to zero, effectively implying that βjt ≈ βj,t−1, i.e., almost

no change from period (t− 1) to t.
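For illustration, the following minimal R sketch gives one simple generative reading of the mechanism in Eqs. (2.2) to (2.5) for a single coefficient: a change is proposed from the slab and kept only if it exceeds the threshold. The threshold, slab variance, and spike variance used here are hypothetical values chosen purely for demonstration and are not taken from the paper.

```r
# Simulate one coefficient path under the threshold mixture innovation process
# of Eqs. (2.2)-(2.5); d, v1 (slab) and v0 (spike) are illustrative values only.
set.seed(1)
Tn <- 200
d  <- 0.05       # threshold d_j
v1 <- 0.10^2     # slab variance  (vartheta_j1)
v0 <- 1e-8       # spike variance (vartheta_j0), very close to zero

beta <- numeric(Tn)
s    <- integer(Tn)
for (t in 2:Tn) {
  prop <- rnorm(1, 0, sqrt(v1))                    # proposed parameter change
  if (abs(prop) > d) {                             # Eq. (2.5): change exceeds threshold
    s[t]    <- 1L
    beta[t] <- beta[t - 1] + prop                  # upper regime: keep the change
  } else {
    beta[t] <- beta[t - 1] + rnorm(1, 0, sqrt(v0)) # lower regime: essentially no change
  }
}
plot(beta, type = "l", xlab = "t", ylab = expression(beta[jt]))
```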

This modeling approach provides a great deal of flexibility, nesting a plethora of sim-

pler model specifications. The interesting cases are characterized by situations where

sjt equals unity only for some t. For instance, it could be the case that parameters tend

to exhibit strong movements at given points in time but stay constant for the major-

ity of the time. An unrestricted time-varying parameter model would imply that the

parameters are gradually changing over time, depending on the innovation variance in

Eq. (2.2). Another prominent case would be a structural break model with an unknown

number of breaks (for a recent Bayesian exposition, see Koop and Potter, 2007).


The mixture innovation component in Eq. (2.4) implies that we discriminate be-

tween two regimes. The first regime assumes that changes in the parameters tend

to be large and important to predict yt whereas in the second regime, these changes

can be safely regarded as zero, thus effectively leading to a constant parameter model

over a given period of time. Compared to a standard mixture innovation model that

postulates sjt as a sequence of independent Bernoulli variables, our approach assumes

that regime shifts are governed by a (conditionally) deterministic law of motion. The

main advantage of our approach relative to mixture innovation models is that instead

of having to estimate a full sequence of sjt for all j, the threshold mixture innovation

model only relies on a single additional parameter per coefficient. This renders estima-

tion of high dimensional models such as vector autoregressions (VARs) feasible. The

additional computational burden turns out to be negligible relative to an unrestricted

TVP-VAR, see Section 2.4 for more information.

Our model is also closely related to the latent thresholding approach put forward in

Nakajima and West (2013a) within the time series context. While in their model latent

thresholding discriminates between the inclusion or exclusion of a given covariate at

time t, our model detects whether the associated regression coefficient is constant or

time-varying.

2.2 The threshold mixture innovation TTVP-VAR model

The model proposed in the previous subsection can be straightforwardly generalized to

the VAR case with stochastic volatility (SV) by assuming that yt is an m-dimensional

response vector. In this case, Eq. (2.1) becomes,

\[
y_t = x_t'\beta_t + u_t, \tag{2.6}
\]

with x_t' = I_M ⊗ z_t', where z_t = (y_{t−1}', . . . , y_{t−p}')' includes the p lags of the endogenous variables.1 The vector βt now contains the dynamic autoregressive coefficients with dimension K = M²p, where each element follows the state equation given by Eqs. (2.2) to (2.5). The vector of white noise shocks ut is distributed as

\[
u_t \sim \mathcal{N}(0_m, \Sigma_t). \tag{2.7}
\]

Hereby, 0_m denotes an m-variate zero vector and Σt = V_t H_t V_t' is a time-varying variance-covariance matrix. The matrix V_t is a lower triangular matrix with unit diagonal and H_t = diag(e^{h_{1t}}, . . . , e^{h_{mt}}). We assume that the logarithm of the variances

1In the empirical application, we also include an intercept term which we omit here for simplicity.


evolves according to

\[
h_{it} = \mu_i + \rho_i (h_{i,t-1} - \mu_i) + \nu_{it}, \qquad \text{for } i = 1, \dots, m, \tag{2.8}
\]

with µi and ρi being equation-specific mean and persistence parameters, and νit ∼ N(0, ζi) an equation-specific white noise error with variance ζi. For the covariances in

V t we impose the random walk state equation with error variances given by Eq. (2.4).

Conditional on the ordering of the variables it is straightforward to estimate the

TTVP model on an equation-by-equation basis, augmenting the ith equation with the

contemporaneous values of the preceding (i − 1) equations (for i > 1), leading to a

Cholesky-type decomposition of the variance-covariance matrix. Thus, the ith equation

(for i = 2, . . . ,m) is given by

\[
y_{it} = \tilde{z}_{it}'\tilde{\beta}_{it} + u_{it}. \tag{2.9}
\]

We let z̃it = (z′t, y1t, . . . , yi−1,t)′, and β̃it = (β′it, vi1,t, . . . , vi,i−1,t)′ is a vector of latent states with dimension Ki = Mp + i − 1. Here, vij,t denotes the dynamic regression coefficient on the jth (for j < i) contemporaneous value showing up in the ith equation. Note that for the first equation we have z̃1t = zt and β̃1t = β1t.

The law of motion of βij,t, the jth element of β̃it, reads

\[
\beta_{ij,t} = \beta_{ij,t-1} + \sqrt{\theta_{ij,t}}\, r_t, \qquad r_t \sim \mathcal{N}(0, 1). \tag{2.10}
\]

Hereby, θij,t is defined similarly to Eq. (2.4). In what follows, it proves convenient to stack the states of all equations in a K-dimensional vector βt = (β̃′1t, . . . , β̃′Mt)′ and let βjt denote the jth element of βt.

While being clearly not order-invariant, this specific way of stating the model yields

two significant computational gains. First, the matrix operations involved in estimat-

ing the latent state vector become computationally less cumbersome. Second, we can

exploit parallel computing and estimate each equation simultaneously on a grid.
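To illustrate the equation-by-equation setup, the sketch below builds the augmented regressors for equation i from a T × M data matrix Y, i.e., the p lags of all variables plus the contemporaneous values of the preceding i − 1 variables. The helper function and the simulated data are purely illustrative and not the authors' implementation; an intercept, which is included in the empirical application, is omitted here as in the text.

```r
# Build the regressors for equation i: p lags of all M variables plus the
# contemporaneous values of variables 1, ..., i-1 (illustrative helper).
build_regressors <- function(Y, i, p = 2) {
  Tn <- nrow(Y); M <- ncol(Y)
  # p lags of all endogenous variables, aligned with observations p+1, ..., Tn
  lags <- do.call(cbind, lapply(1:p, function(l) Y[(p + 1 - l):(Tn - l), , drop = FALSE]))
  Z <- if (i > 1) cbind(lags, Y[(p + 1):Tn, 1:(i - 1), drop = FALSE]) else lags
  list(y = Y[(p + 1):Tn, i], Z = Z)   # ncol(Z) = M * p + i - 1
}

# Example with simulated data for a three-variable system:
Y <- matrix(rnorm(300), ncol = 3)
str(build_regressors(Y, i = 2, p = 2))
```

Because each equation only needs its own regressor block, the M equations can be handed to separate workers, which is what makes the parallel, equation-wise estimation mentioned above straightforward.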

2.3 Prior specification

Since our approach to estimation and inference is Bayesian, we have to specify suitable

prior distributions for all parameters of the model.

We impose a Normal-Gamma prior (Griffin and Brown, 2010) on each element of

βi0, the initial state of the ith equation,

\[
\beta_{0i,j} \mid \tau_{ij}^2 \sim \mathcal{N}\!\left(0, \frac{2}{\lambda_i^2}\,\tau_{ij}^2\right), \qquad \tau_{ij}^2 \sim \mathcal{G}(a_i, a_i), \tag{2.11}
\]


for i = 1, . . . ,m; j = 1, . . . , Ki. Hereby, λ2i and ai are hyperparameters and τ 2ij denotes

an idiosyncratic scaling parameter that applies an individual degree of shrinkage on

each element of βi0. The hyperparameter λ2i serves as an equation-specific shrinkage

parameter that shrinks all elements of βi0 that belong to the ith equation towards zero

while the local shrinkage parameters τij provide enough flexibility to also allow for

non-zero values of β0i,j in the presence of a tight global prior specification.

For the equation-specific scaling parameter λ2i we impose a Gamma prior, λ2i ∼G(b0, b1), with b0 and b1 being hyperparameters chosen by the researcher. In typical

applications we specify b0 and b1 to render this prior effectively non-influential.

If the innovation variances of the observation equation are assumed to be con-

stant over time, we impose a Gamma prior on σ−2i with hyperparameters c0 and c1,

i.e., σ−2i ∼ G(c0, c1). By contrast, if stochastic volatility is introduced we follow Kast-

ner and Frühwirth-Schnatter (2014) and impose a normally distributed prior on µi with

mean zero and variance 100, a Beta prior on ρi with (ρi+1)/2 ∼ B(aρ, bρ), and a Gamma

distributed prior on ζi ∼ G(1/2, 1/(2Bζ)).

In principle, the spike variance ϑij,0 could be estimated from the data and a suitable

shrinkage prior could be employed to push ϑij,0 towards zero. However, we follow a

simpler approach and estimate the slab variance ϑij,1 only, while setting ϑij,0 = ξ × ϑij. Here, ϑij denotes the variance of the OLS estimate, used for automatic scaling, which we treat as a constant specified a priori. The multiplier ξ is set to a fixed constant close to zero, effectively turning off any time-variation in the parameters in the lower regime. As long as ϑij,0 is not chosen too large, the specific value of the spike variance proves to be rather non-influential in the empirical applications that follow.
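One plausible implementation of this automatic scaling is sketched below in R: the OLS variances are taken from a constant-parameter regression of the ith equation, and the spike variances are ξ times those values. The function name and the value of ξ are illustrative only.

```r
# Automatic scaling of the spike variance: vartheta_{ij,0} = xi * var(OLS estimate).
# y: T-vector and Z: T x K_i regressor matrix of equation i; xi is illustrative.
spike_variances <- function(y, Z, xi = 1e-5) {
  fit   <- lm(y ~ Z - 1)          # constant-parameter OLS fit of equation i
  v_ols <- diag(vcov(fit))        # variances of the OLS coefficients
  list(ols_var = v_ols,           # used only for scaling purposes
       spike   = xi * v_ols)      # vartheta_{ij,0}, j = 1, ..., K_i
}
```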

We use a Gamma distributed prior on the inverse of the innovation variances in the

state specification in Eq. (2.2), i.e., ϑ−1ij,1 ∼ G(rij,0, rij,1) for i = 1, . . . ,m; j = 1, . . . , Ki.2

Again, rij,0 and rij,1 denote scalar hyperparameters. This choice implies that we ar-

tificially bound ϑij,1 away from zero, implying that in the upper regime we do not

exert strong shrinkage. This is in contrast to a standard time-varying parameter model,

where this prior is usually set rather tight to control the degree of time variation in

the parameters (see, e.g., Primiceri, 2005). Note that in our model the degree of time

variation is governed by the thresholding mechanism instead.

2Of course, it would also be possible to use a (restricted) Gamma prior on ϑij,1 in the spirit of Frühwirth-Schnatter and Wagner (2010). However, we have encountered some issues with such a prior if the number of observations in the regime associated with sij,t = 1 is small. This stems from the fact that the corresponding conditional posterior distribution is generalized inverse Gaussian, a distribution that is heavy tailed and under certain conditions leads to excessively large draws of ϑij,1.


Finally, the prior specification of the baseline model is completed by imposing a uniformly distributed prior on the thresholds,

\[
d_{ij} \sim \mathcal{U}(\pi_{ij,0}, \pi_{ij,1}) \quad \text{for } j = 1, \dots, K_i. \tag{2.12}
\]

Here, πij,0 and πij,1 denote the boundaries of the prior that have to be specified carefully. In our examples, we use πij,0 = 0.1 × √ϑij,1 and πij,1 = 1.5 × √ϑij,1. This prior

bounds the thresholds away from zero, implying that a certain amount of shrinkage is

always imposed on the autoregressive coefficients. Setting πij,0 = 0 for all i, j would

also be a feasible option but we found in simulations that being slightly informative on

the presence of a threshold improves the empirical performance of the proposed model

markedly. It is worth noting that even under the assumption that π0j > 0, our frame-

work performs well in simulations where the data is obtained from a non-thresholded

version of our model. This stems from the fact that in a situation where parameters are

expected to evolve smoothly over time, the average period-on-period change of βij,t is

small, implying that 0.1×√ϑij,1 is close to zero and the model effectively shrinks small

parameter movements to zero.

2.4 Posterior simulation

We sample from the joint posterior distribution of the model parameters by utilizing a

Markov chain Monte Carlo (MCMC) algorithm. Conditional on the thresholds dij, the

remaining parameters can be simulated in a straightforward fashion. After initializing

the parameters using suitable starting values we iterate between the following six steps.

1. We start with equation-by-equation simulation of the full history {β̃it}t=0,1,...,T

by means of a standard forward filtering backward sampling algorithm (Carter

and Kohn, 1994; Frühwirth-Schnatter, 1994) while conditioning on the remaining

parameters of the model.

2. The inverse of the innovation variances of Eq. (2.2), ϑ−1ij , i = 1, . . . ,m; j =

1, . . . , Ki have conditional density

\[
p(\vartheta_{ij}^{-1} \mid \bullet) = p(\vartheta_{ij}^{-1} \mid d_{ij}, \boldsymbol{\beta}) \propto p(\boldsymbol{\beta} \mid \vartheta_{ij}^{-1}, d_{ij})\, p(d_{ij} \mid \vartheta_{ij}^{-1})\, p(\vartheta_{ij}^{-1}),
\]

which turns out to be a Gamma distribution, i.e.,

\[
\vartheta_{ij}^{-1} \mid \bullet \sim \mathcal{G}\!\left(r_{ij,0} + \frac{T_{ij,1}}{2} + \frac{1}{2},\; r_{ij,1} + \frac{\sum_{t=1}^{T} s_{ij,t}\,(\beta_{ij,t} - \beta_{ij,t-1})^2}{2}\right), \tag{2.13}
\]


with Tij,1 = ∑_{t=1}^{T} sij,t denoting the number of time periods that feature time variation in the jth parameter of the ith equation (see the sketch after this list for a stylized implementation of Steps 2, 3, and 5).

3. Combining the Gamma prior on τ²ij with the Gaussian likelihood yields a Generalized Inverse Gaussian (GIG) distribution

\[
\tau_{ij}^2 \mid \bullet \sim \mathcal{GIG}\!\left(a_i - \frac{1}{2},\; \beta_{0i,j}^2,\; a_i\lambda_i^2\right), \tag{2.14}
\]

where the density of the GIG(κ, χ, ψ) distribution is proportional to

\[
z^{\kappa-1} \exp\!\left\{-\frac{1}{2}\left(\frac{\chi}{z} + \psi z\right)\right\}. \tag{2.15}
\]

To sample from this distribution, we use the R package GIGrvg (Leydold and Hör-

mann, 2015) implementing the efficient rejection sampler proposed by Hörmann

and Leydold (2013).

4. The global shrinkage parameter λ²i is sampled from a Gamma distribution given by

\[
\lambda_i^2 \mid \bullet \sim \mathcal{G}\!\left(b_0 + a_i K_i,\; b_1 + \frac{a_i}{2}\sum_{j=1}^{K_i} \tau_{ij}^2\right). \tag{2.16}
\]

5. We update the thresholds by applying Ki Griddy Gibbs steps (Ritter and Tanner,

1992) per equation. Due to the structure of the model, the conditional distribu-

tion of βij,1:T = (βij,1, . . . , βij,T )′ is

\[
p(\beta_{ij,1:T} \mid d_{ij}, \vartheta_{ij}) \propto \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\,\theta_{ij,t}}} \exp\!\left\{-\frac{(\beta_{ij,t} - \beta_{ij,t-1})^2}{2\,\theta_{ij,t}}\right\}. \tag{2.17}
\]

This expression can be straightforwardly combined with the prior in Eq. (2.12)

to evaluate the conditional posterior of dij at a given candidate point. The pro-

cedure is repeated over a fine grid of values that is determined by the prior and

an approximation to the inverse cumulative distribution function of the posterior

is constructed. Finally, this approximation is used to perform inverse transform

sampling.

6. The coefficients of the log-volatility equation and the corresponding history of

the log-volatilities are sampled by means of the algorithm brought forward by

Kastner and Frühwirth-Schnatter (2014) which is efficiently implemented in the

R package stochvol (Kastner, 2016). Under homoscedasticity, σ−2i is simulated

from
\[
\sigma_i^{-2} \mid \bullet \sim \mathcal{G}\!\left(c_0 + T/2,\; c_1 + \textstyle\sum_{t=1}^{T}\big(y_{it} - \tilde{z}_{it}'\tilde{\beta}_{it}\big)^2/2\right).
\]
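To make Steps 2, 3, and 5 concrete, the following stylized R fragment sketches the corresponding conditional draws for a single coefficient (equation and coefficient indices suppressed). All inputs and the grid size are illustrative; the only package function used is rgig() from the GIGrvg package mentioned in Step 3.

```r
library(GIGrvg)   # provides rgig() for the GIG draw in Step 3

## Illustrative inputs for a single coefficient (indices i, j suppressed)
set.seed(1)
beta    <- cumsum(c(0.2, rnorm(100, 0, 0.05)))  # current draw of beta_{0:T}
d       <- 0.05                                 # current threshold
r0 <- 3; r1 <- 0.03                             # prior on the inverse slab variance
a  <- 0.1; lambda2 <- 1                         # Normal-Gamma hyperparameters
xi <- 1e-5; v_ols <- 0.1                        # spike variance = xi * v_ols
p_lo <- 0.01; p_hi <- 0.15                      # prior support of the threshold

d_beta <- diff(beta)                            # period-on-period changes
s      <- as.integer(abs(d_beta) > d)           # indicators of Eq. (2.5)

## Step 2: Gamma draw for the inverse slab variance, Eq. (2.13)
theta1 <- 1 / rgamma(1, shape = r0 + sum(s) / 2 + 1 / 2,
                        rate  = r1 + sum(s * d_beta^2) / 2)

## Step 3: GIG draw for the local scaling of the initial state, Eq. (2.14)
tau2 <- rgig(1, lambda = a - 1 / 2, chi = beta[1]^2, psi = a * lambda2)

## Step 5: Griddy Gibbs update of the threshold, Eqs. (2.12) and (2.17)
grid    <- seq(p_lo, p_hi, length.out = 50)     # grid on the prior support
logpost <- sapply(grid, function(dd) {
  th <- ifelse(abs(d_beta) > dd, theta1, xi * v_ols)  # mixture variances
  sum(dnorm(d_beta, 0, sqrt(th), log = TRUE))         # Eq. (2.17) in logs
})
w <- exp(logpost - max(logpost))
d <- sample(grid, 1, prob = w / sum(w))         # draw from the gridded posterior
```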


After obtaining an appropriate number of draws, we discard the first N as burn-in and

base our inference on the remaining draws from the joint posterior.

In comparison with standard TVP-VARs, Step (5) is the only additional MCMC step

needed to estimate the proposed TTVP model. Moreover, note that this update is com-

putationally cheap, increasing the amount of time needed to carry out the analysis

conducted in Section 4 by around five percent. For larger models (i.e., with m being

around 15) this step becomes slightly more intensive but, relative to the additional com-

putational burden introduced by applying the FFBS algorithm in Step (1), its costs are

still comparably small relative to the overall computation time needed. We found that

mixing and convergence properties of our proposed algorithm are similar to standard

Bayesian TVP-VAR estimators. In other words, the sampling of the thresholds does not

seem to substantially increase the autocorrelation of the MCMC draws. The TTVP al-

gorithm is bundled into the R package threshtvp which is available from the authors

upon request.

3 An illustrative example

In this section we illustrate our approach by means of a rather stylized example that

emphasizes how well the mixture innovation component for the state innovations per-

forms when used to approximate different data generating processes (DGPs).

For demonstration purposes it proves to be convenient to start with the following

simple DGP with K = 1:

\[
y_t = x_{1t}'\beta_{1t} + u_t, \qquad u_t \sim \mathcal{N}(0, 0.01^2),
\]
\[
\beta_{1t} = \beta_{1,t-1} + e_{1t}, \qquad e_{1t} \sim \mathcal{N}(0, s_{1t} \times 0.15^2),
\]

where s1t ∈ {0, 1} is chosen at random to yield paths which are characterized by many, moderately many, and few breaks. Finally, independently for all t, we generate x1t ∼ U(−1, 1) and set β1,0 = 0.
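The DGP can be reproduced in a few lines of R; how the break dates are "chosen at random" below is just one illustrative possibility.

```r
# Simulate the illustrative DGP: y_t = x_t * beta_t + u_t, with random-walk
# breaks in beta_t occurring only when s_t = 1 (break placement is illustrative).
set.seed(123)
Tn      <- 500
n_break <- 10                                 # few / moderately many / many breaks
s       <- integer(Tn)
s[sample(2:Tn, n_break)] <- 1                 # one way to place the break dates
beta    <- cumsum(rnorm(Tn, 0, 0.15) * s)     # beta_{1,0} = 0 by construction
x       <- runif(Tn, -1, 1)
y       <- x * beta + rnorm(Tn, 0, 0.01)
plot(beta, type = "s", xlab = "t", ylab = expression(beta[1*t]))
```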

Fig. 1 shows three possible realizations of β1t and the corresponding estimates ob-

tained from a standard TVP model and our TTVP model. To ease comparison between

the models we impose a similar prior setup for both models. Specifically, for σ−2 we

set c0 = 0.01 and c1 = 0.01, implying a rather vague prior. For the shrinkage part on

β1,0 we set λ2 ∼ G(0.01, 0.01) and a1 = 0.1, effectively applying heavy shrinkage on

the initial state of the system. The prior on ϑ1 is specified as in Nakajima and West

(2013a), i.e., ϑ−11 ∼ G(3, 0.03). To complete the prior setup for the TTVP model we set

π1,0 = 0.1 × √ϑ1 and π1,1 = 1.5 × √ϑ1.


Fig. 1: Results for the three simulated realizations of β1t (one per row). Left: Evolution of the actual state vector (dotted black) along with the posterior medians of the TVP model (dashed blue) and the TTVP model (solid red). The TTVP posterior moving probability is indicated by areas shaded in gray. Right: Demeaned posterior distribution of the TVP model (90% credible intervals in shaded blue) and the TTVP model (90% credible intervals in red).


The left panel of Fig. 1 displays the evolution of the posterior median of a standard

TVP model (in dotted blue) and of the TTVP model (in solid red) along with the actual

evolution of the state vector (in dotted black). In addition, the areas shaded in gray

depict the probability that a given coefficient moves over a certain time frame (hence-

forth labeled as posterior moving probability, PMP). The right panel shows de-meaned

90% credible intervals of the coefficients from the TVP model (blue shaded area) and

the TTVP model (solid red lines).

At least two interesting findings emerge. First, note that in all three cases, our ap-

proach detects parameter movements rather well, with the PMP reaching unity in virtu-

ally all time points that feature a structural break of the corresponding parameter. The

TVP model also tracks the actual movement of the states well but does so with much

more high frequency variation. This is a direct consequence of the inverted Gamma

prior on the state innovation variances that bounds ϑ1 artificially away from zero, irre-

spective of the information contained in the likelihood (see Frühwirth-Schnatter and

Wagner, 2010, for a general discussion of this issue).

Second, looking at the uncertainty surrounding the median estimate (right panel of

Fig. 1) reveals that our approach succeeds in shrinking the posterior variance. This is

due to the fact that in periods where the true value of β1t is constant, our model suc-

cessfully assumes that the estimate of the coefficient at time t is also constant, whereas

the TVP model imposes a certain amount of time variation. This generates additional

uncertainty that inflates the posterior variance, possibly leading to imprecise inference.

Thus, the TTVP model detects change points in the parameters in situations where

the actual number of breaks is small, moderate and large. In situations where the

DGP suggests that the actual threshold equals zero, our approach still captures most of the medium- to low-frequency variation but shrinks small movements that might, in any case,

be less relevant for econometric inference.

4 Empirical application: Macroeconomic forecasting and structural change

4.1 Model specification and data

We use an extended version of the US macroeconomic data set employed in Smets and

Wouters (2007), Geweke and Amisano (2012) and Amisano and Geweke (2017). Data

are on a quarterly basis, span the period from 1947Q2 to 2014Q4, and comprise the

log differences of consumption, investment, real GDP, hours worked, consumer prices

and real wages. Last, and as a policy variable, we include the Federal Funds Rate (FFR)

in levels. In the next subsections we investigate structural breaks in macroeconomic

relationships by means of forecasting and impulse response analysis.


Following Primiceri (2005), we include p = 2 lags of the endogenous variables. The

prior setup is similar to the one adopted in the previous sections, except that now all

hyperparameters are equation-specific and feature an additional index i = 1, . . . ,m.

More specifically, for all applicable i and j, we use the following values for the hy-

perparameters. For the shrinkage part on the initial state of the system, we again

set λ2i ∼ G(0.01, 0.01) and ai = 0.1, and the prior on ϑij is specified to be informa-

tive with ϑ−1ij ∼ G(3, 0.03). For the parameters of the log-volatility equation we use

µi ∼ N(0, 10²), (ρi + 1)/2 ∼ B(25, 5), and ζi ∼ G(1/2, 1/2). The last ingredient missing is the prior on the thresholds, where we set πij,0 = 0.1 × √ϑij,1 and πij,1 = 1.5 × √ϑij,1.

For the seven-variable VAR we draw 500 000 samples from the joint posterior and

discard the first 400 000 draws as burn-in. Finally, we use thinning such that inference

is based on 5000 draws out of 100 000 retained draws.

4.2 Forecasting evidence

We start with a simple forecasting exercise of one-step-ahead predictions. For that

purpose we use an expanding window and a hold-out sample of 100 quarters. Forecasts

are evaluated using log-predictive Bayes factors, which are defined as the difference of

log predictive scores (LPS) of a specification of interest and a benchmark model. The

log-predictive score is a widely used metric to measure density forecast accuracy (see

e.g., Geweke and Amisano, 2010).
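Given per-period one-step-ahead log predictive likelihoods of a model and of the benchmark, the (cumulative) log predictive Bayes factor is simply the (running) sum of their differences; a small hypothetical R example with simulated numbers:

```r
# lpl_model, lpl_bench: one-step-ahead log predictive likelihoods over the
# hold-out sample (simulated here for illustration only).
set.seed(1)
lpl_model <- rnorm(100, mean = -1.0, sd = 0.3)
lpl_bench <- rnorm(100, mean = -1.2, sd = 0.3)

lps_model <- sum(lpl_model)                 # log predictive score of the model
lps_bench <- sum(lpl_bench)                 # log predictive score of the benchmark
log_bf    <- lps_model - lps_bench          # log predictive Bayes factor (as in Table 1)
log_bf_t  <- cumsum(lpl_model - lpl_bench)  # cumulative version (as plotted in Fig. 2)
plot(log_bf_t, type = "l", xlab = "hold-out quarter", ylab = "log predictive BF")
```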

As the benchmark model, we use a TVP-VAR with relatively little shrinkage. This

amounts to setting the thresholds equal to zero and specifying the prior ϑ−1ij ∼ G(3, 0.03).

We, moreover, include two additional constant parameter competitors, namely a Minne-

sota-type VAR (Doan, Litterman, and Sims, 1984) and a Normal-Gamma (NG) VAR

(Huber and Feldkircher, 2017). All models feature stochastic volatility. In order to

assess the impact of different prior hyperparameters on ϑij and the impact of ξ, we

estimate the TTVP model over a grid of meaningful values.

Table 1 depicts the LPS differences between a given model and the benchmark

model. First, we see that all models outperform the no-shrinkage time-varying pa-

rameter VAR as indicated by positive values of the log-predictive Bayes factors. Second,

constant parameter VARs with shrinkage turn out to be hard to beat. Especially the

hierarchical Minnesota prior does a good job with respect to one-quarter-ahead fore-

casts. For the TTVP model we see that forecast performance also varies with the prior

specification. More specifically, the results show that increasing ξ, which implies more

time variation in the lower regime a-priori, deteriorates the forecasting performance.

This is especially true if a large value for ξ is coupled with small values of rij,0 and rij,1


                      rij,0 = 3        rij,0 = 1.5      rij,0 = 0.001
                      rij,1 = 0.03     rij,1 = 1        rij,1 = 0.001
ξ = ξ1 = 10^-6        169.06           168.75           169.16
ξ = ξ2 = 10^-5        170.80           170.60           173.87
ξ = ξ3 = 10^-4        170.95           172.31           158.44
ξ = ξ4 = 10^-3        130.45           163.78           137.53

                      NG               Minnesota
BVAR                  173.77           177.20

Table 1: Log predictive Bayes factors relative to a time-varying parameter VAR without shrinkage for different key parameters of the model. The final row refers to the log predictive Bayes factor of a BVAR equipped with a Normal-Gamma (NG) shrinkage prior and a hierarchical Minnesota prior. All models estimated with stochastic volatility. Numbers greater than zero indicate that a given model outperforms the benchmark.

– the latter referring to the a priori belief of large swings of coefficients in the upper

regime of the model.

To investigate the predictive performance of the different model specifications fur-

ther, Fig. 2(a) shows the log predictive Bayes factors relative to the benchmark model

over time. The plot shows that the specifications with ξ1 and ξ2 excel during most

of the sample period, irrespective of the prior on ϑij. The constant parameter mod-

els, by contrast, dominate only during two very distinct periods of our sample, namely

at the beginning and at the end of the time span covered. In both periods, no se-

vere up or downswings in economic activity occur and the constant parameter models

with SV display excellent predictive capabilities. By contrast, during volatile periods –

such as the global financial crisis – our modeling approach seems to pay off in terms

of predictive accuracy. To investigate this in more detail, we focus on the forecasting

performance of the different model specifications during the period from 2006Q1 to

2010Q1 in Fig. 2(b). Here we see that TTVP specifications with ξj for j < 4 outperform

all remaining competitors. This additional, and more detailed, look at the forecasting

performance during turbulent times thus reveals that the TTVP model is a valuable

alternative to simpler models. Put differently, we observe that during more volatile

periods the TTVP model can severely outperform constant parameter models, while in

tranquil times its forecasts are never far off.

Next, we examine turning point forecasts, since the detection of structural breaks

might be a further useful application of the TTVP framework. We focus on turning

points in real GDP growth and follow Canova and Ciccarelli (2004) to label time point

(t + 1) a downward turning point – conditional on the information up to time t – if

St+1, the growth rate of real GDP at time (t + 1), satisfies that St−2 < St, St−1 < St,


Fig. 2: Log predictive Bayes factors relative to a TVP-VAR-SV model. Panel (a): full evaluation period (1995Q1 to 2014Q4). Panel (b): crisis period only (2006Q1 to 2010Q1).


                      Downturns                                          Upturns
                      rij,0 = 3      rij,0 = 1.5    rij,0 = 0.001        rij,0 = 3      rij,0 = 1.5    rij,0 = 0.001
                      rij,1 = 0.03   rij,1 = 1      rij,1 = 0.001        rij,1 = 0.03   rij,1 = 1      rij,1 = 0.001
ξ = ξ1 = 10^-6        0.66           0.67           0.67                 0.84           0.83           0.83
ξ = ξ2 = 10^-5        0.66           0.66           0.67                 0.83           0.83           0.81
ξ = ξ3 = 10^-4        0.64           0.65           0.68                 0.83           0.84           0.80
ξ = ξ4 = 10^-3        0.87           0.67           0.78                 0.81           0.83           0.78

                      NG             Minnesota                           NG             Minnesota
BVAR                  0.62           0.62                                0.84           0.93

Table 2: QPS scores relative to a time-varying parameter VAR without shrinkage for different key parameters of the model. The final row refers to the QPS score of a BVAR equipped with a Normal-Gamma (NG) shrinkage prior and a hierarchical Minnesota prior. All models estimated with stochastic volatility. Numbers below unity indicate that a given model outperforms the benchmark.

and St > St+1. Analogously, the time point (t + 1) is labeled an upward turning point if St−2 > St, St−1 > St, and St < St+1. Equipped with these definitions, we can then split the total number of turning points into upturns and downturns and compute

the quadratic probability (QPS) scores as an accuracy measure of upturn and downturn

probability forecasts. The results are provided in Table 2.
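For concreteness, a short R sketch of these turning point definitions together with a quadratic probability score follows. The QPS variant below (mean squared distance between predicted probabilities and realized 0/1 outcomes) is an assumption made for illustration and need not coincide with the exact variant underlying Table 2.

```r
# Downward turning point at t+1: S_{t-2} < S_t, S_{t-1} < S_t and S_t > S_{t+1};
# upward turning point defined analogously with reversed inequalities.
turning_points <- function(S) {
  Tn   <- length(S)
  t    <- 3:(Tn - 1)                       # need S_{t-2} and S_{t+1}
  down <- S[t - 2] < S[t] & S[t - 1] < S[t] & S[t] > S[t + 1]
  up   <- S[t - 2] > S[t] & S[t - 1] > S[t] & S[t] < S[t + 1]
  list(down = down, up = up)               # indicators for periods t+1 = 4, ..., Tn
}

# Quadratic probability score: smaller is better (illustrative variant).
qps <- function(prob, outcome) mean((prob - outcome)^2)

# Example with simulated GDP growth and a naive constant probability forecast:
set.seed(1)
S  <- rnorm(100)
tp <- turning_points(S)
qps(prob = rep(mean(tp$down), length(tp$down)), outcome = as.numeric(tp$down))
```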

The picture that arises is similar to that of the density forecasting exercise: all vari-

ants of the TTVP model beat the no-shrinkage time-varying parameter VAR model.

Turning point forecasts deteriorate for larger values of ξ and especially so if they are

coupled with small choices for rij, yielding a relatively uninformative prior on ϑij and

consequently little shrinkage. Forecast gains relative to the benchmark model are more

sizable for downward than for upward turning points. In comparison to the two con-

stant parameter competitors, the TTVP model excels in predicting upward turning

points (for which there are more observations in the sample), while forecast perfor-

mance is slightly inferior for downward forecasts. Also note that for downward predic-

tions, penalizing time-variation seems to be essential and consequently the strongest

performance among TTVP specifications is achieved for small values of ξ. The opposite

is the case for upward turning points where reasonable predictions can be also achieved

with a rather loose prior.

4.3 Detecting structural breaks in US data

In this section we aim to have a closer and more systematic look at changes in the joint

dynamics of our seven variable TTVP-VAR model for the US economy. To that end, we

examine the posterior mean of the determinant of the time-varying variance-covariance

matrix of the innovations in the state equation (Cogley and Sargent, 2005). For each

draw of Ωit = diag(θi1,t, . . . , θiKi,t) we compute its log-determinant and subtract the


mean across time. Large values of this measure point towards a pronounced degree of

time-variation in the autoregressive coefficients of the corresponding equations. The

results are provided in Fig. 3 for each equation and the full system.
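As an illustration, the following hypothetical R lines compute this measure for a single posterior draw of one equation; since Ωit is diagonal, its log-determinant is simply the row sum of the logged innovation variances.

```r
# theta: T x K_i matrix holding theta_{ij,t} for one posterior draw of equation i.
# Because Omega_it = diag(theta_{i1,t}, ..., theta_{iK_i,t}), we have
# log det(Omega_it) = sum_j log(theta_{ij,t}).
set.seed(1)
theta   <- matrix(rexp(200 * 10, rate = 100), nrow = 200)  # simulated for illustration
logdet  <- rowSums(log(theta))              # log-determinant per time period
measure <- exp(logdet - mean(logdet))       # demeaned in logs, then exponentiated
plot(measure, type = "l", xlab = "t", ylab = "determinant measure")
```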

For all variables we see at least one prominent spike during the sample period in-

dicating a structural break. Most spikes in the determinant occur around 1980, when

then Fed chairman Paul Volcker sharply increased short-term interest rates to fight infla-

tion. Other breaks relate to the dot-com bubble in the early 2000s (consumption), the

oil price crisis and stock market crash in the early 1970s (hours worked) and another

oil price related crisis in the early 1990s. Also, the transition from positive interest rates

to the zero lower bound in the midst of the global financial crisis is indicated by a spike

in the determinant. That we can relate spikes to historical episodes of financial and

economic distress lends further credibility to the modeling approach. Among these

periods, the early 1980s seem to have constituted by far the most severe rupture for

the US economy.

4.4 Impulse responses to a monetary policy shock

In this section we examine the dynamic responses of a set of macroeconomic variables

to a contractionary monetary policy shock. The monetary policy shock is calibrated as a

100 basis point (bp) increase in the FFR and identified using a Cholesky ordering with

the variables appearing in exactly the same order as mentioned above. This ordering is

in the spirit of Christiano, Eichenbaum, and Evans (2005) and has been subsequently

used in the literature (see Coibion, 2012, for an excellent survey). Drawing on the

results of the previous section, we focus on two sub-sets of the sample, namely the pre-

Volcker period from 1947Q4 to 1979Q1 and the rest of the sample.3 The time-varying

impulse responses – as functions of horizons – are displayed in Fig. 4. Additionally, we

also include impulse responses for different horizons – as functions of time – over the

full sample period in Fig. 5 and Fig. 6.
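For a given point in time, such responses can be computed from the VAR coefficients and a Cholesky factor of Σt in the usual way. The generic R sketch below (not the authors' code) uses the companion-form recursion and assumes the coefficient matrices of one posterior draw are available; the shock index and size are set to mimic a 100 bp shock to the FFR, ordered last.

```r
# Generic Cholesky-identified impulse responses for a VAR(p) with coefficient
# matrix A (M x Mp, ordered [A_1, ..., A_p]) and error covariance Sigma,
# both taken from one posterior draw at a fixed point in time.
irf_chol <- function(A, Sigma, shock = 7, size = 1, horizon = 15) {
  M <- nrow(A); p <- ncol(A) / M
  comp  <- rbind(A, cbind(diag(M * (p - 1)), matrix(0, M * (p - 1), M)))  # companion matrix
  P     <- t(chol(Sigma))                       # lower-triangular Cholesky factor
  e     <- P[, shock] / P[shock, shock] * size  # impact vector, normalized shock size
  irf   <- matrix(0, horizon + 1, M)
  state <- c(e, rep(0, M * (p - 1)))
  irf[1, ] <- state[1:M]
  for (h in 1:horizon) {
    state <- comp %*% state                     # iterate the companion form
    irf[h + 1, ] <- state[1:M]
  }
  irf
}

# Example with a stand-in coefficient draw for a seven-variable VAR(2):
set.seed(1)
A     <- matrix(rnorm(7 * 14, sd = 0.05), 7, 14)
Sigma <- diag(7)
irf   <- irf_chol(A, Sigma, shock = 7, size = 1, horizon = 15)
```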

In Fig. 4 we investigate whether the size and the shape of responses varies between

and within the two sub-samples. For that purpose we show median responses over the

first sample split in the top row and for the second part of the sample in the bottom row

of Fig. 4. Impulse responses that belong to the beginning of a sample split are depicted

in light yellow, those that belong to the end of the sample period in dark red. To fix

ideas, if the size of a response increases continuously over time we should see a smooth

darkening of the corresponding impulse from light yellow to dark red. Considering, for

instance, hours worked, this phenomenon can clearly be seen in the second sample

period from 1979Q2 to 2014Q4, where the median response changes gradually from

3The split into two sub-sets is conducted for interpretation purposes only. For estimation, the entire sample has been used.


Fig. 3: Posterior mean of the determinant of the time-varying variance-covariance matrix of the innovations to the state equation from 1947Q2 to 2014Q4, shown for (a) consumption, (b) investment, (c) output, (d) hours worked, (e) inflation, (f) real wages, (g) the FFR, and (h) the overall system. Values are obtained by taking the exponential of the demeaned log-determinant across equations. Gray shaded areas refer to US recessions dated by the NBER business cycle dating committee.


Fig. 4: Posterior median impulse response functions of consumption, investment, output, hours worked, inflation, real wages, and the FFR over two sample splits, namely the pre-Volcker period (1947Q4 to 1979Q1, top row) and the rest of the sample period (1979Q2 to 2014Q4, bottom row). The coloring of the impulse responses refers to their timing: light yellow stands for the beginning of the sample split, dark red for the end of the sample split. For reference, 68% credible intervals over the average of the sample period are provided (dotted black lines).


Fig. 5: Posterior median responses of consumption, investment, output, and inflation to a +100 bp monetary policy shock, after 4 (top panels), 8 (middle panels), and 12 (bottom panels) quarters. Shaded areas correspond to 90% (dark red) and 68% (light red) credible sets.


Fig. 6: Posterior median responses of hours worked, real wages, and the federal funds rate to a +100 bp monetary policy shock, after 4 (top panels), 8 (middle panels), and 12 (bottom panels) quarters. Shaded areas correspond to 90% (dark red) and 68% (light red) credible sets.


slightly negative to substantially negative. On the other hand, abrupt changes are also

clearly visible, see e.g., the drastic change of the inflation response from 1979Q1 (the

last quarter in the first sample) to 1979Q2 (the first quarter in the second sample),

dropping from substantially positive to just above zero within one quarter (see also

Fig. 5).

Considering the dynamic responses across different angles, we find three regular-

ities which are worth emphasizing. The first concerns the overall effects of the mon-

etary policy shock. Note that an unexpected rate increase dampens investment growth, hours worked, and consequently overall output growth for both sample splits. These

results are reasonable from an economic perspective. Also, estimated effects on output

growth and inflation are comparable to those of Baumeister and Benati (2013) who

use a TVP-VAR framework and US data. Responses of consumption growth tend to be

accompanied by wide credible sets. The same applies to inflation and real wages.

Second, we examine changes in responses over time for the first sub-period. One of

the variables that shows a great deal of variation in magnitudes is the response of infla-

tion. Here, effects become increasingly positive the further one moves from 1947Q4 to

1979Q1 and the shades of the responses turn continuously darker. These results imply

a severe “price puzzle”. While overall credible sets for the sub-sample are wide, positive

responses for inflation and thus the price puzzle are estimated over the period from the

mid-1960s to the beginning of the 1980s (see also Fig. 5). A similar picture arises when

looking at consumption growth. During the first sample split, effects become increasingly negative, but responses are only precisely estimated for the period from the

mid-1960s to the beginning of the 1980s. This might be explained by the fact that the

monetary policy driven increase in inflation spurs consumption since saving becomes

less attractive.

Third, we focus on the results over the more recent second sample split from 1979Q2

to 2014Q4. Paul Volcker's fight against inflation had some bearing on overall macroe-

conomic dynamics in the USA. With the onset of the 1980s, the aforementioned price

puzzle starts to disappear (in the sense that effects are surrounded by wide credible

sets and median responses become increasingly negative). There is also a great deal of time

variation evident in other responses, mostly becoming increasingly negative. Put differ-

ently, the effectiveness of monetary policy seems to be higher in the more recent sample

period than before. This can be seen from the effects on hours worked, investment growth, and output growth. That the effects of a hypothetical monetary policy shock on output growth are particularly strong after the crisis corroborates findings of Baumeister and

Benati (2013) and Feldkircher and Huber (2016). The latter argue that this is related

to the zero lower bound period: after a prolonged period of unaltered interest rates, a


deviation from the (long-run) interest rate mean can exert considerable effects on the

macroeconomy.

5 Closing remarks

This paper puts forth a novel approach to estimating time-varying parameter models in a Bayesian framework. We assume that the state innovations follow a threshold model in which the threshold variable is the absolute period-on-period change of the corresponding states. This implies that if the (proposed) change is sufficiently large, the corresponding variance is set to a value greater than zero. Otherwise, it is set close to zero, which implies that the states remain virtually constant from t − 1 to t. Our

framework is capable of discriminating between a plethora of competing specifications,

most notably models that feature many, moderately many, and few structural breaks in

the regression parameters.
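To fix ideas, the mechanism for a single coefficient can be summarized schematically as follows. This is only a sketch in generic notation; the symbols β_{jt}, θ_{j0}, θ_{j1}, and c_j are illustrative placeholders rather than the exact notation used in the main text:

β_{jt} = β_{j,t−1} + e_{jt},   e_{jt} ∼ N(0, θ_{jt}),
θ_{jt} = s_{jt} θ_{j1} + (1 − s_{jt}) θ_{j0},   with θ_{j1} > 0 and θ_{j0} ≈ 0,
s_{jt} = I(|β_{jt} − β_{j,t−1}| > c_j),

so that the indicator s_{jt} switches the innovation variance between an effectively constant regime (θ_{j0}) and a genuinely time-varying regime (θ_{j1}) whenever the absolute period-on-period change of the coefficient exceeds the threshold c_j.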

We also propose a generalization of our model to the VAR framework with stochastic

volatility. In an application to the US macroeconomy, we examine the usefulness of

the TTVP-VAR in terms of forecasting, turning point prediction, and structural impulse

response analysis. Our results show that the model yields precise forecasts, especially during more volatile times such as those witnessed in 2008. For that period, the forecast gain over simpler constant-parameter models is particularly high. We then proceed by investigating turning point predictions and observe excellent performance of the TTVP model, in particular for upturn predictions. Finally, we examine impulse responses to a +100 basis point contractionary monetary policy shock, focusing on two sub-periods of our sample span: the pre-Volcker period and the rest of the sample. Our results reveal

significant evidence for a severe price puzzle during episodes of the pre-Volcker period.

The positive effect of the rate increase on inflation disappears in the second half of our sample. Modeling changes in responses over the two sub-periods only, such as in a regime-switching model, would, however, be too simplistic, as we also find a great deal of

time variation within each sub-period. For example, we find increasing effectiveness of

monetary policy in terms of output, investment growth, and hours worked in the more

recent sub-period. This is especially true for the period after the global financial crisis, in which the Federal Funds Rate was tied to zero. For that period, a hypothetical

deviation from the zero lower bound would create pronounced effects on the wider

macroeconomy.


6 Acknowledgments

We sincerely thank the participants of the WU Brown Bag Seminar of the Institute of

Statistics and Mathematics, the 3rd Vienna Workshop on High-Dimensional Time Series

in Macroeconomics and Finance 2017, the NBP Workshop on Forecasting 2017, and in

particular Sylvia Frühwirth-Schnatter, for many helpful comments and suggestions that

improved the paper significantly.

References

AMISANO, G., AND J. GEWEKE (2017): “Prediction using several macroeconomic models,” Review of Economics and Statistics, 99(5), 912–925.

BAUMEISTER, C., AND L. BENATI (2013): “Unconventional Monetary Policy and the Great Recession: Estimating the Macroeconomic Effects of a Spread Compression at the Zero Lower Bound,” International Journal of Central Banking, 9(2), 165–212.

BELMONTE, M. A., G. KOOP, AND D. KOROBILIS (2014): “Hierarchical Shrinkage in Time-Varying Parameter Models,” Journal of Forecasting, 33(1), 80–94.

BITTO, A., AND S. FRÜHWIRTH-SCHNATTER (2016): “Achieving shrinkage in a time-varying parameter model framework,” arXiv preprint 1611.01310.

CANOVA, F., AND M. CICCARELLI (2004): “Forecasting and turning point predictions in a Bayesian panel VAR model,” Journal of Econometrics, 120(2), 327–359.

CARTER, C. K., AND R. KOHN (1994): “On Gibbs sampling for state space models,” Biometrika, 81(3), 541–553.

CHRISTIANO, L. J., M. EICHENBAUM, AND C. L. EVANS (2005): “Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy, 113(1), 1–45.

COGLEY, T., AND T. J. SARGENT (2002): “Evolving post-World War II US inflation dynamics,” in NBER Macroeconomics Annual 2001, Volume 16, pp. 331–388. MIT Press.

(2005): “Drifts and volatilities: monetary policies and outcomes in the post-WWII US,” Review of Economic Dynamics, 8(2), 262–302.

COIBION, O. (2012): “Are the Effects of Monetary Policy Shocks Big or Small?,” American Economic Journal: Macroeconomics, 4(2), 1–32.

D’AGOSTINO, A., L. GAMBETTI, AND D. GIANNONE (2013): “Macroeconomic forecasting and structural change,” Journal of Applied Econometrics, 28(1), 82–101.

DOAN, T. R., B. R. LITTERMAN, AND C. A. SIMS (1984): “Forecasting and Conditional Projection Using Realistic Prior Distributions,” Econometric Reviews, 3, 1–100.

EISENSTAT, E., J. C. CHAN, AND R. W. STRACHAN (2016): “Stochastic model specification search for time-varying parameter VARs,” Econometric Reviews, pp. 1–28.

FELDKIRCHER, M., AND F. HUBER (2016): “Unconventional US Monetary Policy: New Tools, Same Channels?,” Working Papers 208, Oesterreichische Nationalbank (Austrian Central Bank).

FRÜHWIRTH-SCHNATTER, S. (1994): “Data augmentation and dynamic linear models,” Journal of Time Series Analysis, 15(2), 183–202.

FRÜHWIRTH-SCHNATTER, S., AND H. WAGNER (2010): “Stochastic model specification search for Gaussian and partial non-Gaussian state space models,” Journal of Econometrics, 154(1), 85–100.


GERLACH, R., C. CARTER, AND R. KOHN (2000): “Efficient Bayesian inference for dynamic mixture models,” Journal of the American Statistical Association, 95(451), 819–828.

GEWEKE, J., AND G. AMISANO (2010): “Comparing and evaluating Bayesian predictive distributions of asset returns,” International Journal of Forecasting, 26(2), 216–230.

(2012): “Prediction with misspecified models,” The American Economic Review, 102(3), 482–486.

GIORDANI, P., AND R. KOHN (2012): “Efficient Bayesian inference for multiple change-point and mixture innovation models,” Journal of Business & Economic Statistics.

GRIFFIN, J. E., AND P. J. BROWN (2010): “Inference with normal-gamma prior distributions in regression problems,” Bayesian Analysis, 5(1), 171–188.

HÖRMANN, W., AND J. LEYDOLD (2013): “Generating generalized inverse Gaussian random variates,” Statistics and Computing, 24(4), 1–11.

HUBER, F., AND M. FELDKIRCHER (2017): “Adaptive shrinkage in Bayesian vector autoregressive models,” Journal of Business & Economic Statistics, forthcoming.

KALLI, M., AND J. E. GRIFFIN (2014): “Time-varying sparsity in dynamic regression models,” Journal of Econometrics, 178(2), 779–793.

KASTNER, G. (2016): “Dealing with stochastic volatility in time series using the R package stochvol,” Journal of Statistical Software, 69(5), 1–30.

KASTNER, G., AND S. FRÜHWIRTH-SCHNATTER (2014): “Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models,” Computational Statistics & Data Analysis, 76, 408–423.

KIMURA, T., AND J. NAKAJIMA (2016): “Identifying conventional and unconventional monetary policy shocks: a latent threshold approach,” The B.E. Journal of Macroeconomics, 16(1), 277–300.

KOOP, G., AND D. KOROBILIS (2012): “Forecasting inflation using dynamic model averaging,” International Economic Review, 53(3), 867–886.

(2013): “Large time-varying parameter VARs,” Journal of Econometrics, 177(2), 185–198.

KOOP, G., R. LEON-GONZALEZ, AND R. W. STRACHAN (2009): “On the evolution of the monetary policy transmission mechanism,” Journal of Economic Dynamics and Control, 33(4), 997–1017.

KOOP, G., AND S. M. POTTER (2007): “Estimation and forecasting in models with multiple breaks,” The Review of Economic Studies, 74(3), 763–789.

LEYDOLD, J., AND W. HÖRMANN (2015): GIGrvg: Random Variate Generator for the GIG Distribution. R package version 0.4.

MAHEU, J. M., AND Y. SONG (forthcoming): “An efficient Bayesian approach to multiple structural change in multivariate time series,” Journal of Applied Econometrics.

MALSINER-WALLI, G., AND H. WAGNER (2011): “Comparing Spike and Slab Priors for Bayesian Variable Selection,” Austrian Journal of Statistics, 40(4), 241–264.

MCCULLOCH, R. E., AND R. S. TSAY (1993): “Bayesian inference and prediction for mean and variance shifts in autoregressive time series,” Journal of the American Statistical Association, 88(423), 968–978.

NAKAJIMA, J., AND M. WEST (2013a): “Bayesian analysis of latent threshold dynamic models,” Journal of Business & Economic Statistics, 31(2), 151–164.

(2013b): “Dynamic factor volatility modeling: A Bayesian latent threshold approach,” Journal of Financial Econometrics, 11(1), 116–153.


NEELON, B., AND D. B. DUNSON (2004): “Bayesian Isotonic Regression and Trend Analysis,” Biometrics, 60(2), 398–406.

PRIMICERI, G. E. (2005): “Time varying structural vector autoregressions and monetary policy,” The Review of Economic Studies, 72(3), 821–852.

RITTER, C., AND M. A. TANNER (1992): “Facilitating the Gibbs sampler: the Gibbs stopper and the griddy-Gibbs sampler,” Journal of the American Statistical Association, 87(419), 861–868.

SIMS, C. A., AND T. ZHA (2006): “Were there regime switches in US monetary policy?,” The American Economic Review, 96(1), 54–81.

SMETS, F., AND R. WOUTERS (2007): “Shocks and frictions in US business cycles: A Bayesian DSGE approach,” The American Economic Review, 97(3), 586–606.

STOCK, J. H., AND M. W. WATSON (1996): “Evidence on structural instability in macroeconomic time series relations,” Journal of Business & Economic Statistics, 14(1), 11–30.

ZHOU, X., J. NAKAJIMA, AND M. WEST (2014): “Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models,” International Journal of Forecasting, 30(4), 963–980.
