6 Time-Varying Coefficients VAR
Motivation
• Economies and economic dynamics evolve over time.
• Parameter constancy is therefore probably not a good assumption.
• A better idea: allow the model dynamics to vary over time as well.
• Several ways to do it:
– more or less smooth regime switches;
– continuously varying parameters.
• Here we focus on the second.
The Model
Time-varying coefficients VAR (TVC-VAR) models are a generalization of VAR models in which the coefficients are allowed to change over time. Let $Y_t$ be a vector of time series. We assume that $Y_t$ satisfies
$$Y_t = A_{0,t} + A_{1,t}Y_{t-1} + \dots + A_{p,t}Y_{t-p} + \varepsilon_t \qquad (65)$$
where $\varepsilon_t$ is a Gaussian white noise with zero mean and time-varying covariance matrix $\Sigma_t$. Let $A_t = [A_{0,t}, A_{1,t}, \dots, A_{p,t}]$ and $\theta_t = \mathrm{vec}(A_t')$, where $\mathrm{vec}(\cdot)$ is the column-stacking operator. We postulate
$$\theta_t = \theta_{t-1} + \omega_t \qquad (66)$$
where $\omega_t$ is a Gaussian white noise with zero mean and covariance $\Omega$.
Let $\Sigma_t = F_t D_t F_t'$, where $F_t$ is lower triangular with ones on the main diagonal and $D_t$ is a diagonal matrix. Let $\sigma_t$ be the vector of the diagonal elements of $D_t^{1/2}$ and $\phi_{i,t}$, $i = 1, \dots, n-1$, the column vector formed by the non-zero and non-one elements of the $(i+1)$-th row of $F_t^{-1}$. We assume that
$$\log\sigma_t = \log\sigma_{t-1} + \xi_t \qquad (67)$$
$$\phi_{i,t} = \phi_{i,t-1} + \psi_{i,t} \qquad (68)$$
where $\xi_t$ and $\psi_{i,t}$ are Gaussian white noises with zero mean and covariance matrices $\Xi$ and $\Psi_i$, respectively.
Let $\phi_t = [\phi_{1,t}', \dots, \phi_{n-1,t}']'$, $\psi_t = [\psi_{1,t}', \dots, \psi_{n-1,t}']'$, and let $\Psi$ be the covariance matrix of $\psi_t$. We assume that $\psi_{i,t}$ is independent of $\psi_{j,t}$ for $j \neq i$, and that $\xi_t$, $\psi_t$, $\omega_t$, $\varepsilon_t$ are mutually uncorrelated at all leads and lags.
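To fix ideas, here is a minimal simulation sketch of the law of motion (66)-(68) for a bivariate TVC-VAR(1) without intercept. The starting values and the innovation standard deviations (0.005) are illustrative assumptions, not calibrated values.

```python
import numpy as np

rng = np.random.default_rng(7)
n, T = 2, 100
theta = np.array([0.5, 0.0, 0.1, 0.4])   # stacked coefficients of A_t at t = 0
log_sig = np.zeros(n)                    # log sigma_t: log-sd of the D_t diagonal
phi = np.zeros(1)                        # free element of F_t^{-1} when n = 2

Y = [np.zeros(n)]
for t in range(1, T):
    theta += 0.005 * rng.standard_normal(theta.size)   # random walk (66)
    log_sig += 0.005 * rng.standard_normal(n)          # random walk (67)
    phi += 0.005 * rng.standard_normal(1)              # random walk (68)
    F_inv = np.array([[1.0, 0.0], [phi[0], 1.0]])      # unit lower triangular
    F = np.linalg.inv(F_inv)
    Sigma = F @ np.diag(np.exp(2 * log_sig)) @ F.T     # Sigma_t = F_t D_t F_t'
    A = theta.reshape(n, n, order="F")                 # unstack the coefficients
    eps = rng.multivariate_normal(np.zeros(n), Sigma)  # eps_t ~ N(0, Sigma_t)
    Y.append(A @ Y[-1] + eps)
print(np.array(Y)[-3:])
```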
Time-varying dynamics
A characteristic of the TVC-VAR is that impulse response functions and second moments (variances and covariances) are time varying, meaning that the effects and the contributions of a shock may change over time.
Example: TVC-VAR(1). Consider the model
$$Y_t = A_t Y_{t-1} + \varepsilon_t \qquad (69)$$
We ask: (i) what are the impulse response functions (the effects of a shock)? (ii) what is the variance of the process?
IRFs measure the effects of a shock occurring today (time $t$) on the future ($t+j$) values of the time series. Substituting forward we obtain
$$Y_{t+1} = A_{t+1}A_t Y_{t-1} + A_{t+1}\varepsilon_t + \varepsilon_{t+1}$$
$$Y_{t+2} = A_{t+2}A_{t+1}A_t Y_{t-1} + A_{t+2}A_{t+1}\varepsilon_t + A_{t+2}\varepsilon_{t+1} + \varepsilon_{t+2}$$
$$Y_{t+k} = A_{t+k}\cdots A_{t+1}A_t Y_{t-1} + A_{t+k}\cdots A_{t+1}\varepsilon_t + \dots + \varepsilon_{t+k}$$
so the collection $(A_{t+k}\cdots A_{t+2}A_{t+1}), (A_{t+k-1}\cdots A_{t+2}A_{t+1}), \dots, (A_{t+2}A_{t+1}), A_{t+1}, I$ represents the impulse response functions to $\varepsilon_t$. Clearly these will be different from the responses to $\varepsilon_{t-k}$.
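As a quick numerical sketch, the responses at each horizon are just cumulative products of the coefficient matrices along a (here hypothetical, randomly perturbed) path $A_t, \dots, A_{t+K}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 2, 6
# Hypothetical path of coefficient matrices A_t, A_{t+1}, ..., A_{t+K}
A_path = [np.array([[0.5, 0.1], [0.0, 0.4]]) + 0.01 * rng.standard_normal((n, n))
          for _ in range(K + 1)]

irf = [np.eye(n)]                      # horizon 0: effect of eps_t on Y_t
for k in range(1, K + 1):
    irf.append(A_path[k] @ irf[-1])    # horizon k: A_{t+k} ... A_{t+1}

print(irf[3])                          # effect of eps_t on Y_{t+3}
```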
Note that if we ignore variations in future coefficients, the impulse response functions reduce to $A_t^k, A_t^{k-1}, \dots, A_t, I$. That means that if we ignore future uncertainty in the parameters and the stationarity conditions are satisfied at every $t$, the vector of time series can be inverted at every $t$:
$$Y_t = C_t(L)\varepsilon_t, \quad \text{where } C_t(L) = I + C_{1t}L + C_{2t}L^2 + \dots \ \text{ and } \ C_{kt} = A_t^k.$$
Identification is similar to the fixed-coefficients case. The difference is that both the Cholesky factor and the rotation matrix will be time varying, $S_t$ and $H_t$ respectively. The structural model is
$$Y_t = C_t(L) S_t H_t w_t$$
The variance and the other second moments will also be time varying. For the TVC-VAR(1) above, the variance of the process at time $t$ (again ignoring future parameter variation) is
$$\mathrm{Var}_t(Y_t) = \sum_{j=0}^{\infty} A_t^j \Sigma_t A_t^{j\prime}$$
Given that the impulse response functions and the variance are time varying, the contribution of each shock may change over time.
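A short numerical sketch, assuming a stable $A_t$ and an arbitrary $\Sigma_t$, approximates the infinite sum above by truncation:

```python
import numpy as np

def local_variance(A_t, Sigma_t, J=200):
    # Truncated sum_{j=0}^{J} A_t^j Sigma_t (A_t^j)'
    V = np.zeros_like(Sigma_t)
    P = np.eye(A_t.shape[0])            # A_t^0
    for _ in range(J + 1):
        V += P @ Sigma_t @ P.T
        P = P @ A_t                     # next power of A_t
    return V

A_t = np.array([[0.5, 0.1], [0.0, 0.4]])
Sigma_t = np.array([[1.0, 0.2], [0.2, 1.0]])
print(local_variance(A_t, Sigma_t))
```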
Estimation
• The easiest way to estimate the model is by using Bayesian MCMC methods, specifically the Gibbs sampler.
• Objective: we want to draw the coefficients from the posterior distribution.
• The posterior distribution is unknown. What is known are the conditional posteriors, so we can draw from
1. $p(\sigma^T \mid x^T, \theta^T, \phi^T, \Omega, \Xi, \Psi)$
2. $p(\phi^T \mid x^T, \theta^T, \sigma^T, \Omega, \Xi, \Psi)$
3. $p(\theta^T \mid x^T, \sigma^T, \phi^T, \Omega, \Xi, \Psi)$
4. $p(\Omega \mid x^T, \theta^T, \sigma^T, \phi^T, \Xi, \Psi)$
5. $p(\Xi \mid x^T, \theta^T, \sigma^T, \phi^T, \Omega, \Psi)$
6. $p(\Psi \mid x^T, \theta^T, \sigma^T, \phi^T, \Omega, \Xi)$
• After a large number of iterations $N$, the draws obtained are draws from the joint posterior.
• The objects of interest (IRFs, variance decompositions, etc.) can be computed from the draws obtained; a toy illustration of the Gibbs logic follows below.
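The sketch below illustrates the Gibbs logic on a deliberately simple model, not on the TVC-VAR itself: a univariate normal with unknown mean and variance, with conjugate priors chosen here purely for illustration. Each step draws from a conditional posterior, and after a burn-in the draws approximate the joint posterior.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(1.0, 2.0, size=200)        # simulated data
n, ybar = y.size, y.mean()
m0, v0, a0, b0 = 0.0, 10.0, 2.0, 2.0      # illustrative prior hyperparameters

mu, s2 = 0.0, 1.0                         # initial values
draws = []
for it in range(5000):
    # 1. mu | s2, y ~ Normal (conjugate update)
    v1 = 1.0 / (1.0 / v0 + n / s2)
    m1 = v1 * (m0 / v0 + n * ybar / s2)
    mu = rng.normal(m1, np.sqrt(v1))
    # 2. s2 | mu, y ~ Inverse-Gamma (conjugate update)
    a1 = a0 + n / 2.0
    b1 = b0 + 0.5 * np.sum((y - mu) ** 2)
    s2 = 1.0 / rng.gamma(a1, 1.0 / b1)    # IG draw via reciprocal of a Gamma
    if it >= 1000:                        # discard burn-in
        draws.append((mu, s2))

print(np.mean(draws, axis=0))             # posterior means of (mu, s2)
```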
Application 1: Cogley and Sargent (2002) on unemployment-inflation dynamics
• A very important paper: the first to use a version of the model seen above.
• Aim: to provide evidence on the evolution of measures of the persistence of inflation, prospective long-horizon forecasts (means) of inflation and unemployment, and statistics for a version of a Taylor rule.
• VAR for inflation, unemployment and the real interest rate.
• Main results: the long-run mean, persistence and variance of inflation have changed. The Taylor principle was violated before Volcker (pre-1980): monetary policy was too loose.
T. Cogley and T.J. Sargent (2002), “Evolving Post-World War II U.S. Inflation Dynamics,” NBER Macroeconomics Annual 2001, Volume 16, pp. 331-388.
Application 2: Primiceri (2005, ReStud) on monetary policy
• A very important paper: the first to add stochastic volatility.
• The paper studies changes in US monetary policy over the postwar period.
• VAR for inflation, unemployment and the real interest rate.
• Main results:
– the systematic responses of the interest rate to inflation and unemployment exhibit a trend toward more aggressive behavior;
– this has had a negligible effect on the rest of the economy.
Source: G. Primiceri (2005), “Time Varying Structural Vector Autoregressions and Monetary Policy,” The Review of Economic Studies, 72, July 2005, pp. 821-852.
Application 3: Gali and Gambetti (2009, AEJ-Macro) on the Great Moderation
Sharp reduction in the volatility of US output growth starting from the mid-1980s. See Kim and Nelson (REStat, 1999); McConnell and Perez-Quiros (AER, 2000); Blanchard and Simon (BPEA, 2001); Stock and Watson (NBER MA 2002, JEEA 2005).
The literature has provided three different explanations:
1. Strong good luck hypothesis ⇒ the same reduction in the variance of all shocks (Ahmed, Levin and Wilson, 2002).
2. Weak good luck hypothesis ⇒ a reduction in the variance of some shocks (Arias, Hansen and Ohanian, 2006; Justiniano and Primiceri, 2005).
3. Structural change hypothesis ⇒ policy or non-policy changes (monetary policy, inventory management).
Gali and Gambetti (2009, AEJM) use this class of models to try to assess the causes of this reduction in volatility.
The idea of the paper is simple: exploit the different implications of the competing explanations for conditional and unconditional second moments.
1. Strong good luck hypothesis ⇒ scaling down of all shock variances; no change in conditional (on a specific shock) or unconditional correlations.
2. Weak good luck hypothesis ⇒ change in the pattern of unconditional correlations; no change in conditional correlations.
3. Structural change hypothesis ⇒ changes in both unconditional and conditional correlations.
They estimate a TVC-VAR for labor productivity growth and hours worked, identify a technology and a non-technology shock, and study the model-implied second moments.
[Figures, not reproduced: standard deviation of output growth; unconditional correlation of hours and labor productivity growth; technology and non-technology components of output growth volatility; and, for the non-technology shock, the variance decomposition of output growth and the labor productivity and hours responses.]
Application 4: D’Agostino, Gambetti and Giannone (forthcoming, JAE) on forecasting
• D’Agostino, Gambetti and Giannone (JAE, forthcoming) consider whether time variation can improve upon forecasts made with standard VAR models.
• The result is not trivial: time variation is helpful, but the rather large number of parameters could worsen the forecasts.
• The sample spans 1948:I-2007:IV, with forecasts up to 12 quarters ahead.
• Estimation is in real time.
7. Bayesian VAR
Preliminaries
• $y$: data; $\theta$: parameters; $p(\cdot)$: density function; $p(\cdot|\cdot)$: conditional density function.
• Bayesian analysis aims at drawing conclusions about the parameters $\theta$ in terms of probability statements which are conditional on the observed data $y$. The building block of Bayesian statistics is Bayes’ rule.
• Bayes’ rule. The joint density can be written as the product of two densities: the prior distribution, $p(\theta)$, and the sampling distribution (or data distribution, or likelihood function), $p(y|\theta)$:
$$p(\theta, y) = p(y|\theta)p(\theta)$$
Moreover,
$$p(\theta, y) = p(\theta|y)p(y)$$
The two combined yield Bayes’ rule:
$$p(\theta|y) = \frac{p(\theta, y)}{p(y)} = \frac{p(y|\theta)p(\theta)}{p(y)}$$
where $p(y) = \int p(y|\theta)p(\theta)\,d\theta$ does not depend on $\theta$. Therefore
$$p(\theta|y) \propto p(y|\theta)p(\theta)$$
That is, the posterior density is proportional to the product of the sampling distribution and the prior distribution.
• Likelihood. Notice that the data affect the posterior density only through the sampling distribution $p(y|\theta)$, i.e. the likelihood function. Bayesian inference obeys the likelihood principle, which states that, for a given sample of data, any two probability models with the same likelihood function yield the same inference for $\theta$.
• Prior. The density function $p(\theta)$ contains any non-data information about the coefficients: it is where the econometrician’s prior beliefs are contained.
• Posterior odds. Model comparison in a Bayesian framework is based on model probabilities. Let $M_i$ denote model $i = 1, \dots, m$. The posterior model probability is $p(M_i|y)$. By Bayes’ rule
$$p(M_i|y) = \frac{p(y|M_i)p(M_i)}{p(y)}$$
where $p(M_i)$ is the prior model probability and $p(y|M_i)$ is the marginal likelihood. The posterior odds ratio compares two models:
$$PO_{ij} = \frac{p(M_i|y)}{p(M_j|y)} = \frac{p(y|M_i)p(M_i)}{p(y|M_j)p(M_j)}$$
where $\frac{p(y|M_i)}{p(y|M_j)}$ is the Bayes factor.
The simplest example
• Consider a single scalar observation $y$ from a normal distribution with mean $\theta$ and variance $\sigma^2$, which is assumed to be known. The sampling distribution is
$$p(y|\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2\sigma^2}(y-\theta)^2\right]$$
• Now assume that the prior for $\theta$ is also normal, in particular
$$p(\theta) \propto \exp\left[-\frac{1}{2\tau_0^2}(\theta-\mu_0)^2\right]$$
• Multiplying the likelihood and the prior we obtain the posterior
$$p(\theta|y) \propto \exp\left[-\frac{1}{2}\left(\frac{(y-\theta)^2}{\sigma^2} + \frac{(\theta-\mu_0)^2}{\tau_0^2}\right)\right]$$
which after some manipulation reduces to
$$p(\theta|y) \propto \exp\left[-\frac{1}{2\tau_1^2}(\theta-\mu_1)^2\right]$$
where the posterior mean is
$$\mu_1 = \frac{\mu_0/\tau_0^2 + y/\sigma^2}{1/\tau_0^2 + 1/\sigma^2}$$
and the posterior precision (the reciprocal of the variance) is
$$\frac{1}{\tau_1^2} = \frac{1}{\tau_0^2} + \frac{1}{\sigma^2}$$
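A quick numerical check of the update formulas, with illustrative values:

```python
# Prior N(mu0, tau0^2), known sigma^2, one observation y (values illustrative)
mu0, tau0_2, sigma2, y = 0.0, 4.0, 1.0, 2.5
prec1 = 1.0 / tau0_2 + 1.0 / sigma2          # posterior precision
mu1 = (mu0 / tau0_2 + y / sigma2) / prec1    # posterior mean
print(mu1, 1.0 / prec1)                      # -> 2.0, 0.8
```

Note how the posterior mean is a precision-weighted average of the prior mean and the observation.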
Multiple observations
• Now suppose a sample of iid observations $y = (y_1, \dots, y_n)$ is available. Again assume the same prior as before. The posterior density is
$$p(\theta|y) \propto \exp\left[-\frac{1}{2\tau_0^2}(\theta-\mu_0)^2\right]\prod_{i=1}^{n}\exp\left[-\frac{1}{2\sigma^2}(y_i-\theta)^2\right]$$
$$\propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2}(\theta-\mu_0)^2 + \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i-\theta)^2\right)\right]$$
$$\propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2}(\theta-\mu_0)^2 + \frac{n}{\sigma^2}(\bar{y}-\theta)^2\right)\right]$$
Therefore the posterior density is
$$p(\theta|y_1, \dots, y_n) = N(\mu_n, \tau_n^2)$$
with posterior mean
$$\mu_n = \frac{\mu_0/\tau_0^2 + n\bar{y}/\sigma^2}{1/\tau_0^2 + n/\sigma^2}$$
and posterior precision
$$\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}$$
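The same check with $n$ observations (values again illustrative) shows how the data precision scales with $n$, so the posterior mean shrinks toward $\bar{y}$ as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(2)
mu0, tau0_2, sigma2 = 0.0, 4.0, 1.0
y = rng.normal(2.0, np.sqrt(sigma2), size=50)
n, ybar = y.size, y.mean()
prec_n = 1.0 / tau0_2 + n / sigma2                 # posterior precision
mu_n = (mu0 / tau0_2 + n * ybar / sigma2) / prec_n # posterior mean
print(mu_n, 1.0 / prec_n)                          # close to ybar for large n
```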
Bayesian VARs (BVARs)
The likelihood function
Consider the VAR(p)
$$Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p} + \varepsilon_t$$
with $\varepsilon_t \sim N(0, \Omega)$, and its two alternative representations
$$\mathbf{Y} = \mathbf{X}\Gamma + u$$
where $\mathbf{X} = [X_1, \dots, X_T]'$, $X_t = [Y_{t-1}', Y_{t-2}', \dots, Y_{t-p}']'$, $\mathbf{Y} = [Y_1, \dots, Y_T]'$, $u = [\varepsilon_1, \dots, \varepsilon_T]'$ and $\Gamma = [A_1, \dots, A_p]'$, and
$$Y_t = (I_n \otimes X_t')\gamma + \varepsilon_t$$
with $\gamma = \mathrm{vec}(\Gamma)$.
The likelihood function can be written as
$$L(\gamma, \Omega) \propto |\Omega|^{-T/2}\exp\left[-\frac{1}{2}(\gamma-\hat{\gamma})'(\Omega^{-1}\otimes \mathbf{X}'\mathbf{X})(\gamma-\hat{\gamma}) - \frac{1}{2}\mathrm{tr}\left(\Omega^{-1}(\mathbf{Y}-\mathbf{X}\hat{\Gamma})'(\mathbf{Y}-\mathbf{X}\hat{\Gamma})\right)\right]$$
$$\propto N\!\left(\gamma \mid \hat{\gamma},\ \Omega\otimes(\mathbf{X}'\mathbf{X})^{-1}\right)\times IW\!\left(\Omega \mid (\mathbf{Y}-\mathbf{X}\hat{\Gamma})'(\mathbf{Y}-\mathbf{X}\hat{\Gamma}),\ T-n-1\right) \qquad (70)$$
where $\hat{\Gamma} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ is the OLS estimate and $\hat{\gamma} = \mathrm{vec}(\hat{\Gamma})$. That is, the likelihood function can be decomposed into the product of a normal density for $\gamma$ conditional on $\Omega$ and an inverse-Wishart density for $\Omega$.
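A sketch of how the objects entering (70) are built, on simulated placeholder data with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, T = 2, 2, 200
data = rng.standard_normal((T + p, n))           # placeholder series
Y = data[p:]                                     # T x n, rows are Y_t'
# Row t of X is [Y_{t-1}', Y_{t-2}', ..., Y_{t-p}']
X = np.hstack([data[p - j:T + p - j] for j in range(1, p + 1)])
Gamma_hat = np.linalg.solve(X.T @ X, X.T @ Y)    # OLS estimate of Gamma
gamma_hat = Gamma_hat.flatten(order="F")         # vec(Gamma_hat)
S = (Y - X @ Gamma_hat).T @ (Y - X @ Gamma_hat)  # inverse-Wishart scale matrix
print(Gamma_hat.shape, S.shape)
```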
Minnesota Prior
This is the prior distribution used in Litterman (1980, 1986). The residual variance-covariance matrix is taken to be fixed. The prior for $\gamma$ is
$$p(\gamma) \sim N(\bar{\gamma}, \bar{\Sigma})$$
In each equation, $\bar{\gamma}_i$ is set equal to 1 for the first lag of the variable appearing on the left-hand side (0 if the variables are expressed in growth rates) and to zero otherwise. That is, the prior reflects the idea that
$$Y_{it} = Y_{it-1} + \varepsilon_{it}$$
The prior variances of the coefficients are
$$\begin{cases} \pi_1/\ell & \text{for parameters on own lags (lag } \ell\text{)} \\ \pi_2\sigma_i^2/(\ell\sigma_j^2) & \text{for parameters on lags of variable } j \neq i \\ \pi_3\sigma_i^2 & \text{for parameters on deterministic or exogenous variables}\end{cases}$$
where $\sigma_i$ is a scale factor accounting for the different variability of the variables. Common values for the hyperparameters are $\pi_1 = 0.05$, $\pi_2 = 0.005$, $\pi_3 = 10^5$.
The resulting posterior distribution $\gamma|\mathbf{Y} \sim N(\tilde{\gamma}, \tilde{\Sigma})$ has posterior mean and variance
$$\tilde{\Sigma} = \left[\bar{\Sigma}^{-1} + \left(\hat{\Omega}^{-1}\otimes \mathbf{X}'\mathbf{X}\right)\right]^{-1}$$
$$\tilde{\gamma} = \tilde{\Sigma}\left[\bar{\Sigma}^{-1}\bar{\gamma} + \left(\hat{\Omega}^{-1}\otimes \mathbf{X}\right)'\mathrm{vec}(\mathbf{Y})\right]$$
where $\hat{\Omega}$ is the MLE estimator of $\Omega$.
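A minimal sketch of the posterior formulas for a bivariate VAR(1) on simulated data. The diagonal prior variance below is a crude stand-in for the $\pi_1, \pi_2, \pi_3$ rules above, and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 2, 200
data = rng.standard_normal((T + 1, n))
Y, X = data[1:], data[:-1]                       # regress Y_t on Y_{t-1}
k = n                                            # regressors per equation
G_ols = np.linalg.solve(X.T @ X, X.T @ Y)
Omega_hat = (Y - X @ G_ols).T @ (Y - X @ G_ols) / T   # fixed, as in the text

gamma_bar = np.eye(n).flatten(order="F")         # random-walk prior mean
Sigma_bar_inv = np.eye(n * k) / 0.05             # stand-in for the pi rules
Kxx = np.kron(np.linalg.inv(Omega_hat), X.T @ X)
Sig_post = np.linalg.inv(Sigma_bar_inv + Kxx)    # posterior variance
gam_post = Sig_post @ (Sigma_bar_inv @ gamma_bar
                       + np.kron(np.linalg.inv(Omega_hat), X.T) @ Y.flatten(order="F"))
print(gam_post.reshape(k, n, order="F"))         # posterior mean of Gamma
```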
Diffuse Prior
With the diffuse (or Jeffreys’) prior
$$p(\gamma, \Omega) \propto |\Omega|^{-(n+1)/2}$$
the posterior distribution is obtained as the product of the conditional density of $\gamma$ and the marginal density of $\Omega$:
$$\gamma|\Omega, \mathbf{Y} \sim N\!\left(\hat{\gamma},\ \Omega\otimes(\mathbf{X}'\mathbf{X})^{-1}\right)$$
$$\Omega|\mathbf{Y} \sim IW\!\left((\mathbf{Y}-\mathbf{X}\hat{\Gamma})'(\mathbf{Y}-\mathbf{X}\hat{\Gamma}),\ T-k\right)$$
where $k$ is the number of rows of $\Gamma$.
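A sketch of how one draws from this posterior (first $\Omega$ from the inverse-Wishart, then $\gamma$ given $\Omega$), again on simulated data with illustrative dimensions:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(5)
n, T = 2, 200
data = rng.standard_normal((T + 1, n))
Y, X = data[1:], data[:-1]                       # bivariate VAR(1)
G_ols = np.linalg.solve(X.T @ X, X.T @ Y)
S = (Y - X @ G_ols).T @ (Y - X @ G_ols)          # IW scale matrix
k = X.shape[1]

Omega_draw = invwishart.rvs(df=T - k, scale=S, random_state=rng)
cov = np.kron(Omega_draw, np.linalg.inv(X.T @ X))          # Omega ⊗ (X'X)^{-1}
gamma_draw = rng.multivariate_normal(G_ols.flatten(order="F"), cov)
print(gamma_draw.reshape(k, n, order="F"))       # one posterior draw of Gamma
```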
Conjugate Prior
The natural conjugate prior for normal data (recall that the likelihood function is a normal times an inverse-Wishart) is the normal-Wishart:
$$\gamma|\Omega \sim N(\bar{\gamma}, \Omega\otimes\bar{\Sigma}), \qquad \Omega \sim IW(\bar{\Omega}, \alpha), \quad \alpha > n$$
The posterior is given by
$$\gamma|\Omega, \mathbf{Y} \sim N(\tilde{\gamma}, \Omega\otimes\tilde{\Sigma}), \qquad \Omega|\mathbf{Y} \sim IW(\tilde{\Omega}, T+\alpha)$$
where
$$\tilde{\Sigma} = (\bar{\Sigma}^{-1} + \mathbf{X}'\mathbf{X})^{-1}$$
$$\tilde{\Gamma} = \tilde{\Sigma}(\bar{\Sigma}^{-1}\bar{\Gamma} + \mathbf{X}'\mathbf{X}\hat{\Gamma})$$
$$\tilde{\Omega} = \hat{\Gamma}'\mathbf{X}'\mathbf{X}\hat{\Gamma} + \bar{\Gamma}'\bar{\Sigma}^{-1}\bar{\Gamma} + \bar{\Omega} + (\mathbf{Y}-\mathbf{X}\hat{\Gamma})'(\mathbf{Y}-\mathbf{X}\hat{\Gamma}) - \tilde{\Gamma}'(\bar{\Sigma}^{-1}+\mathbf{X}'\mathbf{X})\tilde{\Gamma}$$
with $\hat{\Gamma}$ the OLS estimate, $\bar{\Gamma}$ the prior mean, and $\tilde{\gamma} = \mathrm{vec}(\tilde{\Gamma})$.
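A sketch of the conjugate updates above, with illustrative prior parameters (Gamma_bar, Sigma_bar, Omega_bar, alpha are all assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(6)
n, T = 2, 200
data = rng.standard_normal((T + 1, n))
Y, X = data[1:], data[:-1]                       # bivariate VAR(1)
G_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # OLS estimate

Gamma_bar = np.eye(n)                            # random-walk prior mean
Sigma_bar = np.eye(n) * 0.05                     # illustrative prior scale
Omega_bar, alpha = np.eye(n), n + 2              # illustrative IW prior

Sig_t = np.linalg.inv(np.linalg.inv(Sigma_bar) + X.T @ X)
Gam_t = Sig_t @ (np.linalg.inv(Sigma_bar) @ Gamma_bar + X.T @ X @ G_hat)
Om_t = (G_hat.T @ X.T @ X @ G_hat
        + Gamma_bar.T @ np.linalg.inv(Sigma_bar) @ Gamma_bar
        + Omega_bar + (Y - X @ G_hat).T @ (Y - X @ G_hat)
        - Gam_t.T @ (np.linalg.inv(Sigma_bar) + X.T @ X) @ Gam_t)
print(Gam_t, "\n", Om_t)                         # posterior N-IW parameters
```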