6 Time-Varying Coefficients VAR
Motivation
• Economies and economic dynamics evolve over time.
• Parameter constancy is therefore probably not a good assumption.
• A better idea: allow the model dynamics to vary over time as well.
• Several ways to do it:
– more or less smooth regime switches;
– continuously varying parameters.
• Here we focus on the second.
The Model
Time-varying coefficients VAR (TVC-VAR) models are a generalization of VAR models in which the coefficients are allowed to change over time. Let $Y_t$ be a vector of time series. We assume that $Y_t$ satisfies
$$Y_t = A_{0,t} + A_{1,t}Y_{t-1} + \dots + A_{p,t}Y_{t-p} + \varepsilon_t \qquad (65)$$
where $\varepsilon_t$ is a Gaussian white noise with zero mean and time-varying covariance matrix $\Sigma_t$. Let $A_t = [A_{0,t}, A_{1,t}, \dots, A_{p,t}]$ and $\theta_t = \mathrm{vec}(A_t')$, where $\mathrm{vec}(\cdot)$ is the column-stacking operator. We postulate
$$\theta_t = \theta_{t-1} + \omega_t \qquad (66)$$
where $\omega_t$ is a Gaussian white noise with zero mean and covariance $\Omega$.
Let $\Sigma_t = F_t D_t F_t'$, where $F_t$ is lower triangular with ones on the main diagonal and $D_t$ is a diagonal matrix. Let $\sigma_t$ be the vector of the diagonal elements of $D_t^{1/2}$ and $\phi_{i,t}$, $i = 1, \dots, n-1$, the column vector formed by the non-zero and non-one elements of the $(i+1)$-th row of $F_t^{-1}$. We assume that
$$\log\sigma_t = \log\sigma_{t-1} + \xi_t \qquad (67)$$
$$\phi_{i,t} = \phi_{i,t-1} + \psi_{i,t} \qquad (68)$$
where $\xi_t$ and $\psi_{i,t}$ are Gaussian white noises with zero mean and covariance matrices $\Xi$ and $\Psi_i$, respectively.
Let $\phi_t = [\phi_{1,t}', \dots, \phi_{n-1,t}']'$, $\psi_t = [\psi_{1,t}', \dots, \psi_{n-1,t}']'$, and let $\Psi$ be the covariance matrix of $\psi_t$. We assume that $\psi_{i,t}$ is independent of $\psi_{j,t}$ for $j \neq i$, and that $\xi_t$, $\psi_t$, $\omega_t$, $\varepsilon_t$ are mutually uncorrelated at all leads and lags.
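To fix ideas, here is a minimal simulation sketch of the law of motion (66)-(68) for a bivariate TVC-VAR(1) without intercept. The starting values and the innovation standard deviations (0.005) are illustrative assumptions, not calibrated values.

```python
import numpy as np

rng = np.random.default_rng(7)
n, T = 2, 100
theta = np.array([0.5, 0.0, 0.1, 0.4])   # stacked coefficients of A_t at t = 0
log_sig = np.zeros(n)                    # log sigma_t: log-sd of the D_t diagonal
phi = np.zeros(1)                        # free element of F_t^{-1} when n = 2

Y = [np.zeros(n)]
for t in range(1, T):
    theta += 0.005 * rng.standard_normal(theta.size)   # random walk (66)
    log_sig += 0.005 * rng.standard_normal(n)          # random walk (67)
    phi += 0.005 * rng.standard_normal(1)              # random walk (68)
    F_inv = np.array([[1.0, 0.0], [phi[0], 1.0]])      # unit lower triangular
    F = np.linalg.inv(F_inv)
    Sigma = F @ np.diag(np.exp(2 * log_sig)) @ F.T     # Sigma_t = F_t D_t F_t'
    A = theta.reshape(n, n, order="F")                 # unstack the coefficients
    eps = rng.multivariate_normal(np.zeros(n), Sigma)  # eps_t ~ N(0, Sigma_t)
    Y.append(A @ Y[-1] + eps)
print(np.array(Y)[-3:])
```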
Time-varying dynamics
A characteristic of the TVC-VAR is that impulse response functions and second moments (variances and covariances) are time varying, meaning that the effects and the contributions of a shock may change over time.
Example: TVC-VAR(1). Consider the model
$$Y_t = A_t Y_{t-1} + \varepsilon_t \qquad (69)$$
We ask: (i) what are the impulse response functions (the effects of a shock)? (ii) what is the variance of the process?
IRFs measure the effects of a shock occurring today (time $t$) on the future ($t+j$) values of the time series. Substituting forward we obtain
$$Y_{t+1} = A_{t+1}A_t Y_{t-1} + A_{t+1}\varepsilon_t + \varepsilon_{t+1}$$
$$Y_{t+2} = A_{t+2}A_{t+1}A_t Y_{t-1} + A_{t+2}A_{t+1}\varepsilon_t + A_{t+2}\varepsilon_{t+1} + \varepsilon_{t+2}$$
$$Y_{t+k} = A_{t+k}\cdots A_{t+1}A_t Y_{t-1} + A_{t+k}\cdots A_{t+1}\varepsilon_t + \dots + \varepsilon_{t+k}$$
so the collection $(A_{t+k}\cdots A_{t+2}A_{t+1}), (A_{t+k-1}\cdots A_{t+2}A_{t+1}), \dots, (A_{t+2}A_{t+1}), A_{t+1}, I$ represents the impulse response functions to $\varepsilon_t$. Clearly these will be different from the responses to $\varepsilon_{t-k}$.
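As a quick numerical sketch, the responses at each horizon are just cumulative products of the coefficient matrices along a (here hypothetical, randomly perturbed) path $A_t, \dots, A_{t+K}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 2, 6
# Hypothetical path of coefficient matrices A_t, A_{t+1}, ..., A_{t+K}
A_path = [np.array([[0.5, 0.1], [0.0, 0.4]]) + 0.01 * rng.standard_normal((n, n))
          for _ in range(K + 1)]

irf = [np.eye(n)]                      # horizon 0: effect of eps_t on Y_t
for k in range(1, K + 1):
    irf.append(A_path[k] @ irf[-1])    # horizon k: A_{t+k} ... A_{t+1}

print(irf[3])                          # effect of eps_t on Y_{t+3}
```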
Note that if we ignore variations in future coefficients, the impulse response functions reduce to $A_t^k, A_t^{k-1}, \dots, A_t, I$. That means that if we ignore future uncertainty in the parameters and the stationarity conditions are satisfied at every $t$, the vector of time series can be inverted at every $t$:
$$Y_t = C_t(L)\varepsilon_t, \quad \text{where } C_t(L) = I + C_{1t}L + C_{2t}L^2 + \dots \ \text{ and } \ C_{kt} = A_t^k.$$
Identification is similar to the fixed-coefficients case. The difference is that both the Cholesky factor and the rotation matrix will be time varying, $S_t$ and $H_t$ respectively. The structural model is
$$Y_t = C_t(L) S_t H_t w_t$$
The variance and the other second moments will also be time varying. For the TVC-VAR(1) above, the variance of the process at time $t$ (again ignoring future parameter variation) is
$$\mathrm{Var}_t(Y_t) = \sum_{j=0}^{\infty} A_t^j \Sigma_t A_t^{j\prime}$$
Given that the impulse response functions and the variance are time varying, the contribution of each shock may change over time.
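A short numerical sketch, assuming a stable $A_t$ and an arbitrary $\Sigma_t$, approximates the infinite sum above by truncation:

```python
import numpy as np

def local_variance(A_t, Sigma_t, J=200):
    # Truncated sum_{j=0}^{J} A_t^j Sigma_t (A_t^j)'
    V = np.zeros_like(Sigma_t)
    P = np.eye(A_t.shape[0])            # A_t^0
    for _ in range(J + 1):
        V += P @ Sigma_t @ P.T
        P = P @ A_t                     # next power of A_t
    return V

A_t = np.array([[0.5, 0.1], [0.0, 0.4]])
Sigma_t = np.array([[1.0, 0.2], [0.2, 1.0]])
print(local_variance(A_t, Sigma_t))
```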
Estimation
• The easiest way to estimate the model is by using Bayesian MCMC methods, specifically the Gibbs sampler.
• Objective: we want to draw the coefficients from the posterior distribution.
• The posterior distribution is unknown. What is known are the conditional posteriors, so we can draw from
1. $p(\sigma^T \mid x^T, \theta^T, \phi^T, \Omega, \Xi, \Psi)$
2. $p(\phi^T \mid x^T, \theta^T, \sigma^T, \Omega, \Xi, \Psi)$
3. $p(\theta^T \mid x^T, \sigma^T, \phi^T, \Omega, \Xi, \Psi)$
4. $p(\Omega \mid x^T, \theta^T, \sigma^T, \phi^T, \Xi, \Psi)$
5. $p(\Xi \mid x^T, \theta^T, \sigma^T, \phi^T, \Omega, \Psi)$
6. $p(\Psi \mid x^T, \theta^T, \sigma^T, \phi^T, \Omega, \Xi)$
• After a large number of iterations $N$, the draws obtained are draws from the joint posterior.
• The objects of interest (IRFs, variance decompositions, etc.) can be computed from the draws obtained; a toy illustration of the Gibbs logic follows below.
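The sketch below illustrates the Gibbs logic on a deliberately simple model, not on the TVC-VAR itself: a univariate normal with unknown mean and variance, with conjugate priors chosen here purely for illustration. Each step draws from a conditional posterior, and after a burn-in the draws approximate the joint posterior.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(1.0, 2.0, size=200)        # simulated data
n, ybar = y.size, y.mean()
m0, v0, a0, b0 = 0.0, 10.0, 2.0, 2.0      # illustrative prior hyperparameters

mu, s2 = 0.0, 1.0                         # initial values
draws = []
for it in range(5000):
    # 1. mu | s2, y ~ Normal (conjugate update)
    v1 = 1.0 / (1.0 / v0 + n / s2)
    m1 = v1 * (m0 / v0 + n * ybar / s2)
    mu = rng.normal(m1, np.sqrt(v1))
    # 2. s2 | mu, y ~ Inverse-Gamma (conjugate update)
    a1 = a0 + n / 2.0
    b1 = b0 + 0.5 * np.sum((y - mu) ** 2)
    s2 = 1.0 / rng.gamma(a1, 1.0 / b1)    # IG draw via reciprocal of a Gamma
    if it >= 1000:                        # discard burn-in
        draws.append((mu, s2))

print(np.mean(draws, axis=0))             # posterior means of (mu, s2)
```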
Application 1: Cogley and Sargent (2002) on unemployment-inflation dynamics
• A very important paper: the first to use a version of the model seen above.
• Aim: to provide evidence on the evolution of measures of the persistence of inflation, prospective long-horizon forecasts (means) of inflation and unemployment, and statistics for a version of a Taylor rule.
• VAR for inflation, unemployment and the real interest rate.
• Main results: the long-run mean, persistence and variance of inflation have changed. The Taylor principle was violated before Volcker (pre-1980): monetary policy was too loose.
T. Cogley and T.J. Sargent (2002), “Evolving Post-World War II U.S. Inflation Dynamics,” NBER Macroeconomics Annual 2001, Volume 16, pp. 331-388.
Application 2: Primiceri (2005, ReStud) on monetary policy
• A very important paper: the first to add stochastic volatility.
• The paper studies changes in US monetary policy over the postwar period.
• VAR for inflation, unemployment and the real interest rate.
• Main results:
– the systematic responses of the interest rate to inflation and unemployment exhibit a trend toward more aggressive behavior;
– this has had a negligible effect on the rest of the economy.
Source: G. Primiceri (2005), “Time Varying Structural Vector Autoregressions and Monetary Policy,” The Review of Economic Studies, 72, July 2005, pp. 821-852.
Application 3: Gali and Gambetti (2009, AEJ-Macro) on the Great Moderation
Sharp reduction in the volatility of US output growth starting from the mid-1980s. See Kim and Nelson (REStat, 1999); McConnell and Perez-Quiros (AER, 2000); Blanchard and Simon (BPEA, 2001); Stock and Watson (NBER MA 2002, JEEA 2005).
The literature has provided three different explanations:
1. Strong good luck hypothesis ⇒ the same reduction in the variance of all shocks (Ahmed, Levin and Wilson, 2002).
2. Weak good luck hypothesis ⇒ a reduction in the variance of some shocks (Arias, Hansen and Ohanian, 2006; Justiniano and Primiceri, 2005).
3. Structural change hypothesis ⇒ policy or non-policy changes (monetary policy, inventory management).
Gali and Gambetti (2009, AEJM) use this class of models to try to assess the causes of this reduction in volatility.
The idea of the paper is simple: exploit the different implications of the competing explanations for conditional and unconditional second moments.
1. Strong good luck hypothesis ⇒ scaling down of all shock variances; no change in conditional (on a specific shock) or unconditional correlations.
2. Weak good luck hypothesis ⇒ change in the pattern of unconditional correlations; no change in conditional correlations.
3. Structural change hypothesis ⇒ changes in both unconditional and conditional correlations.
They estimate a TVC-VAR for labor productivity growth and hours worked, identify a technology and a non-technology shock, and study the model-implied second moments.
[Figures, not reproduced: standard deviation of output growth; unconditional correlation of hours and labor productivity growth; technology and non-technology components of output growth volatility; and, for the non-technology shock, the variance decomposition of output growth and the labor productivity and hours responses.]
Application 4: D’Agostino, Gambetti and Giannone (forthcoming, JAE) on forecasting
• D’Agostino, Gambetti and Giannone (JAE, forthcoming) consider whether time variation can improve upon forecasts made with standard VAR models.
• The result is not trivial: time variation is helpful, but the rather large number of parameters could worsen the forecasts.
• The sample spans 1948:I-2007:IV, with forecasts up to 12 quarters ahead.
• Estimation is in real time.
7. Bayesian VAR
Preliminaries
• $y$: data; $\theta$: parameters; $p(\cdot)$: density function; $p(\cdot|\cdot)$: conditional density function.
• Bayesian analysis aims at drawing conclusions about the parameters $\theta$ in terms of probability statements which are conditional on the observed data $y$. The building block of Bayesian statistics is Bayes’ rule.
• Bayes’ rule. The joint density can be written as the product of two densities: the prior distribution, $p(\theta)$, and the sampling distribution (or data distribution, or likelihood function), $p(y|\theta)$:
$$p(\theta, y) = p(y|\theta)p(\theta)$$
Moreover,
$$p(\theta, y) = p(\theta|y)p(y)$$
The two combined yield Bayes’ rule:
$$p(\theta|y) = \frac{p(\theta, y)}{p(y)} = \frac{p(y|\theta)p(\theta)}{p(y)}$$
where $p(y) = \int p(y|\theta)p(\theta)\,d\theta$ does not depend on $\theta$. Therefore
$$p(\theta|y) \propto p(y|\theta)p(\theta)$$
That is, the posterior density is proportional to the product of the sampling distribution and the prior distribution.
• Likelihood. Notice that the data affect the posterior density only through the sampling distribution $p(y|\theta)$, i.e. the likelihood function. Bayesian inference obeys the likelihood principle, which states that, for a given sample of data, any two probability models with the same likelihood function yield the same inference for $\theta$.
• Prior. The density function $p(\theta)$ contains any non-data information about the coefficients: it is where the econometrician’s prior beliefs are contained.
• Posterior odds. Model comparison in a Bayesian framework is based on model probabilities. Let $M_i$ denote model $i = 1, \dots, m$. The posterior model probability is $p(M_i|y)$. By Bayes’ rule
$$p(M_i|y) = \frac{p(y|M_i)p(M_i)}{p(y)}$$
where $p(M_i)$ is the prior model probability and $p(y|M_i)$ is the marginal likelihood. The posterior odds ratio compares two models:
$$PO_{ij} = \frac{p(M_i|y)}{p(M_j|y)} = \frac{p(y|M_i)p(M_i)}{p(y|M_j)p(M_j)}$$
where $\frac{p(y|M_i)}{p(y|M_j)}$ is the Bayes factor.
The simplest example
• Consider a single scalar observation $y$ from a normal distribution with mean $\theta$ and variance $\sigma^2$, which is assumed to be known. The sampling distribution is
$$p(y|\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2\sigma^2}(y-\theta)^2\right]$$
• Now assume that the prior for $\theta$ is also normal, in particular
$$p(\theta) \propto \exp\left[-\frac{1}{2\tau_0^2}(\theta-\mu_0)^2\right]$$
• Multiplying the likelihood and the prior we obtain the posterior
$$p(\theta|y) \propto \exp\left[-\frac{1}{2}\left(\frac{(y-\theta)^2}{\sigma^2} + \frac{(\theta-\mu_0)^2}{\tau_0^2}\right)\right]$$
which after some manipulation reduces to
$$p(\theta|y) \propto \exp\left[-\frac{1}{2\tau_1^2}(\theta-\mu_1)^2\right]$$
where the posterior mean is
$$\mu_1 = \frac{\mu_0/\tau_0^2 + y/\sigma^2}{1/\tau_0^2 + 1/\sigma^2}$$
and the posterior precision (the reciprocal of the variance) is
$$\frac{1}{\tau_1^2} = \frac{1}{\tau_0^2} + \frac{1}{\sigma^2}$$
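A quick numerical check of the update formulas, with illustrative values:

```python
# Prior N(mu0, tau0^2), known sigma^2, one observation y (values illustrative)
mu0, tau0_2, sigma2, y = 0.0, 4.0, 1.0, 2.5
prec1 = 1.0 / tau0_2 + 1.0 / sigma2          # posterior precision
mu1 = (mu0 / tau0_2 + y / sigma2) / prec1    # posterior mean
print(mu1, 1.0 / prec1)                      # -> 2.0, 0.8
```

Note how the posterior mean is a precision-weighted average of the prior mean and the observation.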
Multiple observations
• Now suppose a sample of iid observations $y = (y_1, \dots, y_n)$ is available. Again assume the same prior as before. The posterior density is
$$p(\theta|y) \propto \exp\left[-\frac{1}{2\tau_0^2}(\theta-\mu_0)^2\right]\prod_{i=1}^{n}\exp\left[-\frac{1}{2\sigma^2}(y_i-\theta)^2\right]$$
$$\propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2}(\theta-\mu_0)^2 + \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i-\theta)^2\right)\right]$$
$$\propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2}(\theta-\mu_0)^2 + \frac{n}{\sigma^2}(\bar{y}-\theta)^2\right)\right]$$
Therefore the posterior density is
$$p(\theta|y_1, \dots, y_n) = N(\mu_n, \tau_n^2)$$
with posterior mean
$$\mu_n = \frac{\mu_0/\tau_0^2 + n\bar{y}/\sigma^2}{1/\tau_0^2 + n/\sigma^2}$$
and posterior precision
$$\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}$$
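The same check with $n$ observations (values again illustrative) shows how the data precision scales with $n$, so the posterior mean shrinks toward $\bar{y}$ as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(2)
mu0, tau0_2, sigma2 = 0.0, 4.0, 1.0
y = rng.normal(2.0, np.sqrt(sigma2), size=50)
n, ybar = y.size, y.mean()
prec_n = 1.0 / tau0_2 + n / sigma2                 # posterior precision
mu_n = (mu0 / tau0_2 + n * ybar / sigma2) / prec_n # posterior mean
print(mu_n, 1.0 / prec_n)                          # close to ybar for large n
```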
Bayesian VARs (BVARs)
The likelihood function
Consider the VAR(p)
$$Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p} + \varepsilon_t$$
with $\varepsilon_t \sim N(0, \Omega)$, and its two alternative representations
$$\mathbf{Y} = \mathbf{X}\Gamma + u$$
where $\mathbf{X} = [X_1, \dots, X_T]'$, $X_t = [Y_{t-1}', Y_{t-2}', \dots, Y_{t-p}']'$, $\mathbf{Y} = [Y_1, \dots, Y_T]'$, $u = [\varepsilon_1, \dots, \varepsilon_T]'$ and $\Gamma = [A_1, \dots, A_p]'$, and
$$Y_t = (I_n \otimes X_t')\gamma + \varepsilon_t$$
with $\gamma = \mathrm{vec}(\Gamma)$.
The likelihood function can be written as
$$L(\gamma, \Omega) \propto |\Omega|^{-T/2}\exp\left[-\frac{1}{2}(\gamma-\hat{\gamma})'(\Omega^{-1}\otimes \mathbf{X}'\mathbf{X})(\gamma-\hat{\gamma}) - \frac{1}{2}\mathrm{tr}\left(\Omega^{-1}(\mathbf{Y}-\mathbf{X}\hat{\Gamma})'(\mathbf{Y}-\mathbf{X}\hat{\Gamma})\right)\right]$$
$$\propto N\!\left(\gamma \mid \hat{\gamma},\ \Omega\otimes(\mathbf{X}'\mathbf{X})^{-1}\right)\times IW\!\left(\Omega \mid (\mathbf{Y}-\mathbf{X}\hat{\Gamma})'(\mathbf{Y}-\mathbf{X}\hat{\Gamma}),\ T-n-1\right) \qquad (70)$$
where $\hat{\Gamma} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ is the OLS estimate and $\hat{\gamma} = \mathrm{vec}(\hat{\Gamma})$. That is, the likelihood function can be decomposed into the product of a normal density for $\gamma$ conditional on $\Omega$ and an inverse-Wishart density for $\Omega$.
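A sketch of how the objects entering (70) are built, on simulated placeholder data with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, T = 2, 2, 200
data = rng.standard_normal((T + p, n))           # placeholder series
Y = data[p:]                                     # T x n, rows are Y_t'
# Row t of X is [Y_{t-1}', Y_{t-2}', ..., Y_{t-p}']
X = np.hstack([data[p - j:T + p - j] for j in range(1, p + 1)])
Gamma_hat = np.linalg.solve(X.T @ X, X.T @ Y)    # OLS estimate of Gamma
gamma_hat = Gamma_hat.flatten(order="F")         # vec(Gamma_hat)
S = (Y - X @ Gamma_hat).T @ (Y - X @ Gamma_hat)  # inverse-Wishart scale matrix
print(Gamma_hat.shape, S.shape)
```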
Minnesota Prior
This is the prior distribution used in Litterman (1980, 1986). The residual variance-covariance matrix is taken to be fixed. The prior for $\gamma$ is
$$p(\gamma) \sim N(\bar{\gamma}, \bar{\Sigma})$$
In each equation, $\bar{\gamma}_i$ is set equal to 1 for the first lag of the variable appearing on the left-hand side (0 if the variables are expressed in growth rates) and to zero otherwise. That is, the prior reflects the idea that
$$Y_{it} = Y_{it-1} + \varepsilon_{it}$$
The prior variances of the coefficients are
$$\begin{cases} \pi_1/\ell & \text{for parameters on own lags (lag } \ell\text{)} \\ \pi_2\sigma_i^2/(\ell\sigma_j^2) & \text{for parameters on lags of variable } j \neq i \\ \pi_3\sigma_i^2 & \text{for parameters on deterministic or exogenous variables}\end{cases}$$
where $\sigma_i$ is a scale factor accounting for the different variability of the variables. Common values for the hyperparameters are $\pi_1 = 0.05$, $\pi_2 = 0.005$, $\pi_3 = 10^5$.
The resulting posterior distribution $\gamma|\mathbf{Y} \sim N(\tilde{\gamma}, \tilde{\Sigma})$ has posterior mean and variance
$$\tilde{\Sigma} = \left[\bar{\Sigma}^{-1} + \left(\hat{\Omega}^{-1}\otimes \mathbf{X}'\mathbf{X}\right)\right]^{-1}$$
$$\tilde{\gamma} = \tilde{\Sigma}\left[\bar{\Sigma}^{-1}\bar{\gamma} + \left(\hat{\Omega}^{-1}\otimes \mathbf{X}\right)'\mathrm{vec}(\mathbf{Y})\right]$$
where $\hat{\Omega}$ is the MLE estimator of $\Omega$.
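A minimal sketch of the posterior formulas for a bivariate VAR(1) on simulated data. The diagonal prior variance below is a crude stand-in for the $\pi_1, \pi_2, \pi_3$ rules above, and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 2, 200
data = rng.standard_normal((T + 1, n))
Y, X = data[1:], data[:-1]                       # regress Y_t on Y_{t-1}
k = n                                            # regressors per equation
G_ols = np.linalg.solve(X.T @ X, X.T @ Y)
Omega_hat = (Y - X @ G_ols).T @ (Y - X @ G_ols) / T   # fixed, as in the text

gamma_bar = np.eye(n).flatten(order="F")         # random-walk prior mean
Sigma_bar_inv = np.eye(n * k) / 0.05             # stand-in for the pi rules
Kxx = np.kron(np.linalg.inv(Omega_hat), X.T @ X)
Sig_post = np.linalg.inv(Sigma_bar_inv + Kxx)    # posterior variance
gam_post = Sig_post @ (Sigma_bar_inv @ gamma_bar
                       + np.kron(np.linalg.inv(Omega_hat), X.T) @ Y.flatten(order="F"))
print(gam_post.reshape(k, n, order="F"))         # posterior mean of Gamma
```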
Diffuse Prior
With the diffuse (or Jeffreys’) prior
$$p(\gamma, \Omega) \propto |\Omega|^{-(n+1)/2}$$
the posterior distribution is obtained as the product of the conditional density of $\gamma$ and the marginal density of $\Omega$:
$$\gamma|\Omega, \mathbf{Y} \sim N\!\left(\hat{\gamma},\ \Omega\otimes(\mathbf{X}'\mathbf{X})^{-1}\right)$$
$$\Omega|\mathbf{Y} \sim IW\!\left((\mathbf{Y}-\mathbf{X}\hat{\Gamma})'(\mathbf{Y}-\mathbf{X}\hat{\Gamma}),\ T-k\right)$$
where $k$ is the number of rows of $\Gamma$.
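A sketch of how one draws from this posterior (first $\Omega$ from the inverse-Wishart, then $\gamma$ given $\Omega$), again on simulated data with illustrative dimensions:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(5)
n, T = 2, 200
data = rng.standard_normal((T + 1, n))
Y, X = data[1:], data[:-1]                       # bivariate VAR(1)
G_ols = np.linalg.solve(X.T @ X, X.T @ Y)
S = (Y - X @ G_ols).T @ (Y - X @ G_ols)          # IW scale matrix
k = X.shape[1]

Omega_draw = invwishart.rvs(df=T - k, scale=S, random_state=rng)
cov = np.kron(Omega_draw, np.linalg.inv(X.T @ X))          # Omega ⊗ (X'X)^{-1}
gamma_draw = rng.multivariate_normal(G_ols.flatten(order="F"), cov)
print(gamma_draw.reshape(k, n, order="F"))       # one posterior draw of Gamma
```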
Conjugate Prior
The natural conjugate prior for normal data (recall that the likelihood function is a normal times an inverse-Wishart) is the normal-Wishart:
$$\gamma|\Omega \sim N(\bar{\gamma}, \Omega\otimes\bar{\Sigma}), \qquad \Omega \sim IW(\bar{\Omega}, \alpha), \quad \alpha > n$$
The posterior is given by
$$\gamma|\Omega, \mathbf{Y} \sim N(\tilde{\gamma}, \Omega\otimes\tilde{\Sigma}), \qquad \Omega|\mathbf{Y} \sim IW(\tilde{\Omega}, T+\alpha)$$
where
$$\tilde{\Sigma} = (\bar{\Sigma}^{-1} + \mathbf{X}'\mathbf{X})^{-1}$$
$$\tilde{\Gamma} = \tilde{\Sigma}(\bar{\Sigma}^{-1}\bar{\Gamma} + \mathbf{X}'\mathbf{X}\hat{\Gamma})$$
$$\tilde{\Omega} = \hat{\Gamma}'\mathbf{X}'\mathbf{X}\hat{\Gamma} + \bar{\Gamma}'\bar{\Sigma}^{-1}\bar{\Gamma} + \bar{\Omega} + (\mathbf{Y}-\mathbf{X}\hat{\Gamma})'(\mathbf{Y}-\mathbf{X}\hat{\Gamma}) - \tilde{\Gamma}'(\bar{\Sigma}^{-1}+\mathbf{X}'\mathbf{X})\tilde{\Gamma}$$
with $\hat{\Gamma}$ the OLS estimate, $\bar{\Gamma}$ the prior mean, and $\tilde{\gamma} = \mathrm{vec}(\tilde{\Gamma})$.
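A sketch of the conjugate updates above, with illustrative prior parameters (Gamma_bar, Sigma_bar, Omega_bar, alpha are all assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(6)
n, T = 2, 200
data = rng.standard_normal((T + 1, n))
Y, X = data[1:], data[:-1]                       # bivariate VAR(1)
G_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # OLS estimate

Gamma_bar = np.eye(n)                            # random-walk prior mean
Sigma_bar = np.eye(n) * 0.05                     # illustrative prior scale
Omega_bar, alpha = np.eye(n), n + 2              # illustrative IW prior

Sig_t = np.linalg.inv(np.linalg.inv(Sigma_bar) + X.T @ X)
Gam_t = Sig_t @ (np.linalg.inv(Sigma_bar) @ Gamma_bar + X.T @ X @ G_hat)
Om_t = (G_hat.T @ X.T @ X @ G_hat
        + Gamma_bar.T @ np.linalg.inv(Sigma_bar) @ Gamma_bar
        + Omega_bar + (Y - X @ G_hat).T @ (Y - X @ G_hat)
        - Gam_t.T @ (np.linalg.inv(Sigma_bar) + X.T @ X) @ Gam_t)
print(Gam_t, "\n", Om_t)                         # posterior N-IW parameters
```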