Multivariate Time Series and Vector
Autoregressions
Eric Zivot
May 29, 2013
Multivariate Time Series
Consider $n$ time series variables $\{y_{1t}\}, \ldots, \{y_{nt}\}$. A multivariate time series is the $(n \times 1)$ vector time series $\{\mathbf{Y}_t\}$ where the $i$th row of $\{\mathbf{Y}_t\}$ is $\{y_{it}\}$. That is, for any time $t$, $\mathbf{Y}_t = (y_{1t}, \ldots, y_{nt})'$.
Multivariate time series analysis is used when one wants to model and explain the interactions and co-movements among a group of time series variables:
• Consumption and income
• Stock prices and dividends; forward and spot exchange rates
• Interest rates, money growth, income, inflation
Stock and Watson state that macroeconometricians do four things with multivariate time series:
1. Describe and summarize macroeconomic data
2. Make macroeconomic forecasts
3. Quantify what we do or do not know about the true structure of the macroeconomy
4. Advise macroeconomic policymakers
Stationary and Ergodic Multivariate Time Series
A multivariate time series $\mathbf{Y}_t$ is covariance stationary and ergodic if all of its component time series are stationary and ergodic. In this case,

$$E[\mathbf{Y}_t] = \boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)'$$

$$\mathrm{var}(\mathbf{Y}_t) = \Gamma_0 = E[(\mathbf{Y}_t - \boldsymbol{\mu})(\mathbf{Y}_t - \boldsymbol{\mu})'] = \begin{pmatrix} \mathrm{var}(y_{1t}) & \mathrm{cov}(y_{1t}, y_{2t}) & \cdots & \mathrm{cov}(y_{1t}, y_{nt}) \\ \mathrm{cov}(y_{2t}, y_{1t}) & \mathrm{var}(y_{2t}) & \cdots & \mathrm{cov}(y_{2t}, y_{nt}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(y_{nt}, y_{1t}) & \mathrm{cov}(y_{nt}, y_{2t}) & \cdots & \mathrm{var}(y_{nt}) \end{pmatrix}$$
The correlation matrix of $\mathbf{Y}_t$ is the $(n \times n)$ matrix

$$\mathrm{corr}(\mathbf{Y}_t) = R_0 = D^{-1}\Gamma_0 D^{-1}$$

where $D$ is an $(n \times n)$ diagonal matrix with $i$th diagonal element $\mathrm{var}(y_{it})^{1/2}$.
The parameters $\boldsymbol{\mu}$, $\Gamma_0$, and $R_0$ are estimated from data $(\mathbf{Y}_1, \ldots, \mathbf{Y}_T)$ using the sample moments

$$\bar{\mathbf{Y}} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{Y}_t \xrightarrow{p} E[\mathbf{Y}_t] = \boldsymbol{\mu}$$

$$\hat{\Gamma}_0 = \frac{1}{T}\sum_{t=1}^{T} (\mathbf{Y}_t - \bar{\mathbf{Y}})(\mathbf{Y}_t - \bar{\mathbf{Y}})' \xrightarrow{p} \mathrm{var}(\mathbf{Y}_t) = \Gamma_0$$

$$\hat{R}_0 = \hat{D}^{-1}\hat{\Gamma}_0\hat{D}^{-1} \xrightarrow{p} \mathrm{corr}(\mathbf{Y}_t) = R_0$$

where $\hat{D}$ is the $(n \times n)$ diagonal matrix with the sample standard deviations of $y_{it}$ along the diagonal. The Ergodic Theorem justifies the convergence of the sample moments to their population counterparts.
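As a quick numerical illustration, the sample moments above can be computed directly with NumPy. The data below are simulated white noise, chosen only so the snippet is self-contained; the variable names are illustrative, not part of the notes.

```python
import numpy as np

# Hypothetical data: T draws of an n-dimensional series (white noise for illustration)
rng = np.random.default_rng(0)
T, n = 500, 2
Y = rng.standard_normal((T, n))

Y_bar = Y.mean(axis=0)                       # sample mean, estimates mu
U = Y - Y_bar                                # demeaned observations
Gamma0_hat = U.T @ U / T                     # sample covariance matrix
D_hat_inv = np.diag(1.0 / np.sqrt(np.diag(Gamma0_hat)))
R0_hat = D_hat_inv @ Gamma0_hat @ D_hat_inv  # sample correlation matrix
```

By construction $\hat{R}_0$ has a unit diagonal and entries bounded by one in absolute value.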
Cross Covariance and Correlation Matrices
With a multivariate time series $\mathbf{Y}_t$ each component has autocovariances and autocorrelations, but there are also cross lead-lag covariances and correlations between all possible pairs of components. The autocovariances and autocorrelations of $y_{jt}$ for $j = 1, \ldots, n$ are defined as

$$\gamma_{jj}^k = \mathrm{cov}(y_{jt}, y_{j,t-k}), \qquad \rho_{jj}^k = \mathrm{corr}(y_{jt}, y_{j,t-k}) = \frac{\gamma_{jj}^k}{\gamma_{jj}^0}$$

and these are symmetric in $k$: $\gamma_{jj}^k = \gamma_{jj}^{-k}$, $\rho_{jj}^k = \rho_{jj}^{-k}$.
The cross lag covariances and cross lag correlations between $y_{it}$ and $y_{jt}$ are defined as

$$\gamma_{ij}^k = \mathrm{cov}(y_{it}, y_{j,t-k}), \qquad \rho_{ij}^k = \mathrm{corr}(y_{it}, y_{j,t-k}) = \frac{\gamma_{ij}^k}{\sqrt{\gamma_{ii}^0\gamma_{jj}^0}}$$

and they are not necessarily symmetric in $k$. In general,

$$\gamma_{ij}^k = \mathrm{cov}(y_{it}, y_{j,t-k}) \neq \mathrm{cov}(y_{it}, y_{j,t+k}) = \mathrm{cov}(y_{jt}, y_{i,t-k}) = \gamma_{ji}^k$$
Defn:
• If $\gamma_{ij}^k \neq 0$ for some $k > 0$, then $y_{jt}$ is said to lead $y_{it}$.

• If $\gamma_{ij}^{-k} \neq 0$ for some $k > 0$, then $y_{it}$ is said to lead $y_{jt}$.

• It is possible that $y_{it}$ leads $y_{jt}$ and vice-versa. In this case, there is said to be feedback between the two series.
All of the lag $k$ cross covariances and correlations are summarized in the $(n \times n)$ lag $k$ cross covariance and lag $k$ cross correlation matrices

$$\Gamma_k = E[(\mathbf{Y}_t - \boldsymbol{\mu})(\mathbf{Y}_{t-k} - \boldsymbol{\mu})'] = \begin{pmatrix} \mathrm{cov}(y_{1t}, y_{1,t-k}) & \mathrm{cov}(y_{1t}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{1t}, y_{n,t-k}) \\ \mathrm{cov}(y_{2t}, y_{1,t-k}) & \mathrm{cov}(y_{2t}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{2t}, y_{n,t-k}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(y_{nt}, y_{1,t-k}) & \mathrm{cov}(y_{nt}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{nt}, y_{n,t-k}) \end{pmatrix}$$

$$R_k = D^{-1}\Gamma_k D^{-1}$$

The matrices $\Gamma_k$ and $R_k$ are not symmetric in $k$, but it is easy to show that $\Gamma_{-k} = \Gamma_k'$ and $R_{-k} = R_k'$.
The matrices $\Gamma_k$ and $R_k$ are estimated from data $(\mathbf{Y}_1, \ldots, \mathbf{Y}_T)$ using

$$\hat{\Gamma}_k = \frac{1}{T}\sum_{t=k+1}^{T} (\mathbf{Y}_t - \bar{\mathbf{Y}})(\mathbf{Y}_{t-k} - \bar{\mathbf{Y}})'$$

$$\hat{R}_k = \hat{D}^{-1}\hat{\Gamma}_k\hat{D}^{-1}$$
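The sample cross covariance and cross correlation matrices can be sketched as below; `cross_cov` and `cross_corr` are hypothetical helper names, and the divisor $T$ follows the formula above. The transposition identity $\hat{\Gamma}_{-k} = \hat{\Gamma}_k'$ then holds exactly in sample.

```python
import numpy as np

def cross_cov(Y, k):
    """Sample lag-k cross covariance: averages (Y_t - Ybar)(Y_{t-k} - Ybar)' over valid t."""
    T = Y.shape[0]
    U = Y - Y.mean(axis=0)
    ts = range(k, T) if k >= 0 else range(0, T + k)
    return sum(np.outer(U[t], U[t - k]) for t in ts) / T

def cross_corr(Y, k):
    """Sample lag-k cross correlation matrix R_k-hat = D-hat^-1 Gamma_k-hat D-hat^-1."""
    D_inv = np.diag(1.0 / Y.std(axis=0))
    return D_inv @ cross_cov(Y, k) @ D_inv

rng = np.random.default_rng(1)
Y = rng.standard_normal((300, 2))        # hypothetical data
G1, R1 = cross_cov(Y, 1), cross_corr(Y, 1)
```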
Multivariate Wold Representation
Any $(n \times 1)$ covariance stationary multivariate time series $\mathbf{Y}_t$ has a Wold or linear process representation of the form

$$\mathbf{Y}_t = \boldsymbol{\mu} + \boldsymbol{\varepsilon}_t + \Psi_1\boldsymbol{\varepsilon}_{t-1} + \Psi_2\boldsymbol{\varepsilon}_{t-2} + \cdots = \boldsymbol{\mu} + \sum_{k=0}^{\infty}\Psi_k\boldsymbol{\varepsilon}_{t-k}$$

$$\Psi_0 = I_n, \qquad \boldsymbol{\varepsilon}_t \sim WN(\mathbf{0}, \Sigma)$$

where $\Psi_k$ is an $(n \times n)$ matrix with $(i,j)$th element $\psi_{ij}^k$.

In lag operator notation, the Wold form is

$$\mathbf{Y}_t = \boldsymbol{\mu} + \Psi(L)\boldsymbol{\varepsilon}_t, \qquad \Psi(L) = \sum_{k=0}^{\infty}\Psi_k L^k$$

where the elements of $\Psi(L)$ are 1-summable.

The moments of $\mathbf{Y}_t$ are given by

$$E[\mathbf{Y}_t] = \boldsymbol{\mu}, \qquad \mathrm{var}(\mathbf{Y}_t) = \sum_{k=0}^{\infty}\Psi_k\Sigma\Psi_k'$$
Vector Autoregression Models
The vector autoregression (VAR) model is one of the most successful, flexible, and easy to use models for the analysis of multivariate time series.

• Made famous in Chris Sims's paper "Macroeconomics and Reality," ECTA 1980.

• It is a natural extension of the univariate autoregressive model to dynamic multivariate time series.

• Has proven to be especially useful for describing the dynamic behavior of economic and financial time series and for forecasting.

• It often provides superior forecasts to those from univariate time series models and elaborate theory-based simultaneous equations models.

• Used for structural inference and policy analysis. In structural analysis, certain assumptions about the causal structure of the data under investigation are imposed, and the resulting causal impacts of unexpected shocks or innovations to specified variables on the variables in the model are summarized. These causal impacts are usually summarized with impulse response functions and forecast error variance decompositions.
The Stationary Vector Autoregression Model
Let $\mathbf{Y}_t = (y_{1t}, y_{2t}, \ldots, y_{nt})'$ denote an $(n \times 1)$ vector of time series variables. The basic $p$-lag vector autoregressive (VAR($p$)) model has the form

$$\mathbf{Y}_t = \mathbf{c} + \Pi_1\mathbf{Y}_{t-1} + \Pi_2\mathbf{Y}_{t-2} + \cdots + \Pi_p\mathbf{Y}_{t-p} + \boldsymbol{\varepsilon}_t, \qquad \boldsymbol{\varepsilon}_t \sim WN(\mathbf{0}, \Sigma)$$
Example: Bivariate VAR(2)

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \pi_{11}^1 & \pi_{12}^1 \\ \pi_{21}^1 & \pi_{22}^1 \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \pi_{11}^2 & \pi_{12}^2 \\ \pi_{21}^2 & \pi_{22}^2 \end{pmatrix}\begin{pmatrix} y_{1,t-2} \\ y_{2,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

Equation by equation we have

$$y_{1t} = c_1 + \pi_{11}^1 y_{1,t-1} + \pi_{12}^1 y_{2,t-1} + \pi_{11}^2 y_{1,t-2} + \pi_{12}^2 y_{2,t-2} + \varepsilon_{1t}$$

$$y_{2t} = c_2 + \pi_{21}^1 y_{1,t-1} + \pi_{22}^1 y_{2,t-1} + \pi_{21}^2 y_{1,t-2} + \pi_{22}^2 y_{2,t-2} + \varepsilon_{2t}$$

where $\mathrm{cov}(\varepsilon_{1t}, \varepsilon_{2s}) = \sigma_{12}$ for $t = s$; $0$ otherwise.
Note: The equation for $y_{1t}$ has lagged values of both $y_{1t}$ and $y_{2t}$, and the equation for $y_{2t}$ has lagged values of both $y_{1t}$ and $y_{2t}$. Hence, it is not assumed that $y_{2t}$ is strictly exogenous for $y_{1t}$, or vice-versa. For example, strict exogeneity of $y_{2t}$ implies

$$y_{2t} = \pi_{22}^1 y_{2,t-1} + \pi_{22}^2 y_{2,t-2} + \varepsilon_{2t}$$

$$E(\varepsilon_{2t}\varepsilon_{1s}) = 0 \text{ for all } t, s$$
Remarks:
• Each equation has the same regressors — lagged values of $y_{1t}$ and $y_{2t}$.

• Endogeneity is avoided by using lagged values of $y_{1t}$ and $y_{2t}$.

• The VAR($p$) model is just a seemingly unrelated regression (SUR) model with lagged variables and deterministic terms as common regressors.
In lag operator notation, the VAR($p$) is written as

$$\Pi(L)\mathbf{Y}_t = \mathbf{c} + \boldsymbol{\varepsilon}_t, \qquad \Pi(L) = I_n - \Pi_1 L - \cdots - \Pi_p L^p$$

The VAR($p$) is stable if the roots of

$$\det(I_n - \Pi_1 z - \cdots - \Pi_p z^p) = 0$$

lie outside the complex unit circle (have modulus greater than one), or, equivalently, if the eigenvalues of the companion matrix

$$F = \begin{pmatrix} \Pi_1 & \Pi_2 & \cdots & \Pi_{p-1} & \Pi_p \\ I_n & 0 & \cdots & 0 & 0 \\ 0 & I_n & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_n & 0 \end{pmatrix}$$

have modulus less than one. A stable VAR($p$) process is stationary and ergodic with time invariant means, variances, and autocovariances.
Example: Stability of bivariate VAR(1) model

$$\mathbf{Y}_t = \Pi\mathbf{Y}_{t-1} + \boldsymbol{\varepsilon}_t$$

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

Then $\det(I_2 - \Pi z) = 0$ becomes

$$(1 - \pi_{11}z)(1 - \pi_{22}z) - \pi_{12}\pi_{21}z^2 = 0$$

• The stability condition involves the cross terms $\pi_{12}$ and $\pi_{21}$.

• If $\pi_{12} = \pi_{21} = 0$ (diagonal VAR), then the bivariate stability condition reduces to the univariate stability conditions for each equation.
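The eigenvalue version of the stability check translates directly into code. The sketch below builds the companion matrix $F$ and tests whether all eigenvalues lie inside the unit circle; the coefficient matrices are hypothetical examples, not taken from the notes.

```python
import numpy as np

def companion(Pis):
    """Companion matrix F for a VAR(p) with coefficient matrices Pis = [Pi_1, ..., Pi_p]."""
    n, p = Pis[0].shape[0], len(Pis)
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(Pis)           # first block row: Pi_1, ..., Pi_p
    if p > 1:
        F[n:, :-n] = np.eye(n * (p - 1))  # identity blocks shift the state down
    return F

def is_stable(Pis):
    """True if all companion eigenvalues have modulus less than one."""
    return bool(np.max(np.abs(np.linalg.eigvals(companion(Pis)))) < 1.0)

# Hypothetical coefficient matrices for a bivariate VAR(1)
Pi_stable = np.array([[0.5, 0.1],
                      [0.2, 0.4]])      # eigenvalues 0.6 and 0.3
Pi_unstable = np.array([[1.1, 0.0],
                        [0.0, 0.5]])    # one eigenvalue outside the unit circle
```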
If $\mathbf{Y}_t$ is covariance stationary, then the unconditional mean is given by

$$\boldsymbol{\mu} = (I_n - \Pi_1 - \cdots - \Pi_p)^{-1}\mathbf{c}$$

The mean-adjusted form of the VAR($p$) is then

$$\mathbf{Y}_t - \boldsymbol{\mu} = \Pi_1(\mathbf{Y}_{t-1} - \boldsymbol{\mu}) + \Pi_2(\mathbf{Y}_{t-2} - \boldsymbol{\mu}) + \cdots + \Pi_p(\mathbf{Y}_{t-p} - \boldsymbol{\mu}) + \boldsymbol{\varepsilon}_t$$
The basic VAR($p$) model may be too restrictive to represent the main characteristics of the data sufficiently. The general form of the VAR($p$) model with deterministic terms and strictly exogenous variables $\mathbf{X}_t$ is given by

$$\mathbf{Y}_t = \Pi_1\mathbf{Y}_{t-1} + \Pi_2\mathbf{Y}_{t-2} + \cdots + \Pi_p\mathbf{Y}_{t-p} + \Phi\mathbf{D}_t + G\mathbf{X}_t + \boldsymbol{\varepsilon}_t$$

where $\mathbf{D}_t$ denotes deterministic terms and $\mathbf{X}_t$ denotes strictly exogenous variables ($E[\mathbf{X}_s\boldsymbol{\varepsilon}_t'] = 0$ for all $s, t$).
Wold Representation
Consider the stationary VAR($p$) model

$$\Pi(L)\mathbf{Y}_t = \mathbf{c} + \boldsymbol{\varepsilon}_t, \qquad \Pi(L) = I_n - \Pi_1 L - \cdots - \Pi_p L^p$$

Since $\mathbf{Y}_t$ is stationary, $\Pi(L)^{-1}$ exists, so that

$$\mathbf{Y}_t = \Pi(L)^{-1}\mathbf{c} + \Pi(L)^{-1}\boldsymbol{\varepsilon}_t = \boldsymbol{\mu} + \sum_{s=0}^{\infty}\Psi_s\boldsymbol{\varepsilon}_{t-s}$$

$$\Psi_0 = I_n, \qquad \lim_{s\to\infty}\Psi_s = 0, \qquad \Pi(L)^{-1} = \Psi(L) = \sum_{s=0}^{\infty}\Psi_s L^s$$

The Wold coefficients $\Psi_s$ may be determined from the VAR coefficients $\Pi_j$ by solving

$$\Pi(L)\Psi(L) = I_n$$

$$(I_n - \Pi_1 L - \cdots - \Pi_p L^p)(I_n + \Psi_1 L + \Psi_2 L^2 + \cdots) = I_n$$

which implies

$$\Psi_1 = \Pi_1$$
$$\Psi_2 = \Pi_1\Psi_1 + \Pi_2$$
$$\vdots$$
$$\Psi_s = \Pi_1\Psi_{s-1} + \cdots + \Pi_p\Psi_{s-p}$$

with $\Psi_j = 0$ for $j < 0$.
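The recursion above translates directly into code. This is a minimal sketch with hypothetical VAR(2) coefficient matrices; `wold_psis` simply applies $\Psi_s = \Pi_1\Psi_{s-1} + \cdots + \Pi_p\Psi_{s-p}$ starting from $\Psi_0 = I_n$.

```python
import numpy as np

def wold_psis(Pis, smax):
    """Wold matrices Psi_0, ..., Psi_smax from VAR coefficients Pis = [Pi_1, ..., Pi_p]."""
    n, p = Pis[0].shape[0], len(Pis)
    Psis = [np.eye(n)]                  # Psi_0 = I_n
    for s in range(1, smax + 1):
        # Psi_s = Pi_1 Psi_{s-1} + ... ; terms with negative index are dropped (zero)
        Psis.append(sum(Pis[j] @ Psis[s - 1 - j] for j in range(min(s, p))))
    return Psis

# Hypothetical VAR(2) coefficients
Pi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
Pi2 = np.array([[0.1, 0.0], [0.0, 0.1]])
Psis = wold_psis([Pi1, Pi2], 20)
```

For a stable VAR the $\Psi_s$ matrices die out geometrically, matching $\lim_{s\to\infty}\Psi_s = 0$ above.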
Estimation
Assume that the VAR($p$) model is covariance stationary and that there are no restrictions on the parameters of the model. In SUR notation, each equation in the VAR($p$) may be written as

$$\mathbf{y}_i = Z\boldsymbol{\beta}_i + \mathbf{e}_i, \qquad i = 1, \ldots, n$$

• $\mathbf{y}_i$ is a $(T \times 1)$ vector of observations on the $i$th equation

• $Z$ is a $(T \times k)$ matrix with $t$th row given by $\mathbf{Z}_t' = (1, \mathbf{Y}_{t-1}', \ldots, \mathbf{Y}_{t-p}')$, where $k = np + 1$

• $\boldsymbol{\beta}_i$ is a $(k \times 1)$ vector of VAR parameters for the $i$th equation, and $\mathbf{e}_i$ is a $(T \times 1)$ error with covariance matrix $\sigma_i^2 I_T$
Result: Since the VAR($p$) is in the form of a SUR model where each equation has the same explanatory variables, each equation may be estimated separately by ordinary least squares without losing efficiency relative to generalized least squares. Let

$$\hat{B} = [\hat{\boldsymbol{\beta}}_1, \ldots, \hat{\boldsymbol{\beta}}_n]$$

denote the $(k \times n)$ matrix of least squares coefficients for the $n$ equations, and let

$$\mathrm{vec}(\hat{B}) = \begin{pmatrix} \hat{\boldsymbol{\beta}}_1 \\ \vdots \\ \hat{\boldsymbol{\beta}}_n \end{pmatrix}$$
Result: Under standard assumptions regarding the behavior of stationary and ergodic VAR models (e.g., see Hamilton, 1994), $\mathrm{vec}(\hat{B})$ is consistent and asymptotically normally distributed with estimated asymptotic covariance matrix

$$\widehat{\mathrm{avar}}(\mathrm{vec}(\hat{B})) = \hat{\Sigma} \otimes (Z'Z)^{-1}$$

where the $(n \times n)$ matrix $\hat{\Sigma}$ and the $(n \times 1)$ residual vector $\hat{\boldsymbol{\varepsilon}}_t$ are

$$\hat{\Sigma} = \frac{1}{T-k}\sum_{t=1}^{T}\hat{\boldsymbol{\varepsilon}}_t\hat{\boldsymbol{\varepsilon}}_t', \qquad \hat{\boldsymbol{\varepsilon}}_t = \mathbf{Y}_t - \hat{B}'\mathbf{Z}_t = (\hat{\varepsilon}_{1t}, \ldots, \hat{\varepsilon}_{nt})'$$
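A minimal sketch of equation-by-equation OLS on a simulated VAR(1); the data-generating process and all variable names are hypothetical. Because every equation shares the same regressor matrix $Z$, a single least squares fit of the stacked data recovers all $n$ equations' coefficients at once.

```python
import numpy as np

# Simulate a hypothetical stable bivariate VAR(1)
rng = np.random.default_rng(42)
T, n = 2000, 2
c = np.array([1.0, 0.5])
Pi1 = np.array([[0.5, 0.1],
                [0.2, 0.4]])
Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = c + Pi1 @ Y[t - 1] + 0.5 * rng.standard_normal(n)

# Design matrix Z: t-th row is (1, Y_{t-1}'); common to every equation
Z = np.column_stack([np.ones(T - 1), Y[:-1]])
B_hat, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)  # (k x n); column i is beta_i-hat
c_hat, Pi1_hat = B_hat[0], B_hat[1:].T

# Residuals and degrees-of-freedom adjusted error covariance
E = Y[1:] - Z @ B_hat
Sigma_hat = E.T @ E / (T - 1 - Z.shape[1])
```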
Lag Length Selection
The lag length for the VAR($p$) model may be determined using model selection criteria. The general approach is to fit VAR($p$) models with orders $p = 0, \ldots, p_{\max}$ and choose the value of $p$ which minimizes some model selection criterion of the form

$$MSC(p) = \ln|\tilde{\Sigma}(p)| + c_T \cdot \varphi(n, p)$$

$$\tilde{\Sigma}(p) = T^{-1}\sum_{t=1}^{T}\hat{\boldsymbol{\varepsilon}}_t\hat{\boldsymbol{\varepsilon}}_t'$$

where $c_T$ is a function of the sample size $T$ and $\varphi(n, p)$ is a penalty function.
Information Criteria
The three most common information criteria are the Akaike (AIC), Schwarz-Bayesian (BIC), and Hannan-Quinn (HQ):

$$AIC(p) = \ln|\tilde{\Sigma}(p)| + \frac{2}{T}pn^2$$

$$BIC(p) = \ln|\tilde{\Sigma}(p)| + \frac{\ln T}{T}pn^2$$

$$HQ(p) = \ln|\tilde{\Sigma}(p)| + \frac{2\ln\ln T}{T}pn^2$$
Remarks:
• The AIC criterion asymptotically overestimates the order with positive probability.

• The BIC and HQ criteria estimate the order consistently under fairly general conditions if the true order $p$ is less than or equal to $p_{\max}$.
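A sketch of lag selection by BIC using the formulas above; the helper names and the data-generating process are hypothetical, and, as is common in practice, $|\tilde{\Sigma}(p)|$ is computed on each fit's own effective sample.

```python
import numpy as np

def fit_resid_cov(Y, p):
    """Equation-by-equation OLS for a VAR(p); returns the residual covariance matrix."""
    T, n = Y.shape
    Z = np.column_stack([np.ones(T - p)] + [Y[p - j:T - j] for j in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)
    E = Y[p:] - Z @ B
    return E.T @ E / (T - p)

def select_bic(Y, pmax):
    """Return the lag order minimizing BIC(p) = ln|Sigma(p)| + (ln T / T) p n^2."""
    T, n = Y.shape
    scores = [np.log(np.linalg.det(fit_resid_cov(Y, p))) + np.log(T) / T * p * n**2
              for p in range(1, pmax + 1)]
    return int(np.argmin(scores)) + 1

# Hypothetical VAR(1) data: BIC should typically recover p = 1
rng = np.random.default_rng(3)
Pi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
Y = np.zeros((1000, 2))
for t in range(1, 1000):
    Y[t] = Pi1 @ Y[t - 1] + rng.standard_normal(2)
```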
Granger Causality
One of the main uses of VAR models is forecasting. The following intuitive notion of a variable's forecasting ability is due to Granger (1969).

• If a variable, or group of variables, $y_1$ is found to be helpful for predicting another variable, or group of variables, $y_2$, then $y_1$ is said to Granger-cause $y_2$; otherwise it is said to fail to Granger-cause $y_2$.

• Formally, $y_1$ fails to Granger-cause $y_2$ if for all $s > 0$ the MSE of a forecast of $y_{2,t+s}$ based on $(y_{2t}, y_{2,t-1}, \ldots)$ is the same as the MSE of a forecast of $y_{2,t+s}$ based on $(y_{2t}, y_{2,t-1}, \ldots)$ and $(y_{1t}, y_{1,t-1}, \ldots)$.
• The notion of Granger causality does not imply true causality. It onlyimplies forecasting ability.
Example: Bivariate VAR Model
In a bivariate VAR($p$), $y_2$ fails to Granger-cause $y_1$ if all of the VAR coefficient matrices $\Pi_1, \ldots, \Pi_p$ are lower triangular:

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \pi_{11}^1 & 0 \\ \pi_{21}^1 & \pi_{22}^1 \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \cdots + \begin{pmatrix} \pi_{11}^p & 0 \\ \pi_{21}^p & \pi_{22}^p \end{pmatrix}\begin{pmatrix} y_{1,t-p} \\ y_{2,t-p} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

Similarly, $y_1$ fails to Granger-cause $y_2$ if all of the coefficients on lagged values of $y_1$ are zero in the equation for $y_2$. Notice that if $y_2$ fails to Granger-cause $y_1$ and $y_1$ fails to Granger-cause $y_2$, then the VAR coefficient matrices $\Pi_1, \ldots, \Pi_p$ are diagonal.
Granger-Causality Test Statistic
The linear coefficient restrictions implied by Granger non-causality may be tested using the Wald statistic

$$Wald = (R\,\mathrm{vec}(\hat{B}) - \mathbf{r})'\left\{R\left[\widehat{\mathrm{avar}}(\mathrm{vec}(\hat{B}))\right]R'\right\}^{-1}(R\,\mathrm{vec}(\hat{B}) - \mathbf{r})$$

In the bivariate model, this reduces to a simple F-statistic.
Example: VAR(2)

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \pi_{11}^1 & \pi_{12}^1 \\ \pi_{21}^1 & \pi_{22}^1 \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \pi_{11}^2 & \pi_{12}^2 \\ \pi_{21}^2 & \pi_{22}^2 \end{pmatrix}\begin{pmatrix} y_{1,t-2} \\ y_{2,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

$$H_0: y_2 \text{ does not Granger-cause } y_1 \;\Rightarrow\; \pi_{12}^1 = \pi_{12}^2 = 0$$

Note:

$$\boldsymbol{\beta}_1 = (c_1, \pi_{11}^1, \pi_{12}^1, \pi_{11}^2, \pi_{12}^2)'$$

$$\boldsymbol{\beta}_2 = (c_2, \pi_{21}^1, \pi_{22}^1, \pi_{21}^2, \pi_{22}^2)'$$

Under $H_0$, $\boldsymbol{\beta}_1 = (c_1, \pi_{11}^1, 0, \pi_{11}^2, 0)'$.

Here

$$\mathrm{vec}(B) = \begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix} = (c_1, \pi_{11}^1, \pi_{12}^1, \pi_{11}^2, \pi_{12}^2, c_2, \pi_{21}^1, \pi_{22}^1, \pi_{21}^2, \pi_{22}^2)'$$

$$R = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \qquad \mathbf{r} = (0, 0)'$$
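The Wald test in this example can be sketched as follows, with a hypothetical data-generating process in which the null is true (the upper-right VAR coefficients are zero). The $R$ matrix picks out $\pi_{12}^1$ and $\pi_{12}^2$ from $\mathrm{vec}(\hat{B})$ exactly as above.

```python
import numpy as np

# Hypothetical bivariate VAR(2) in which y2 truly fails to Granger-cause y1
rng = np.random.default_rng(7)
T, p = 1000, 2
Pi1 = np.array([[0.5, 0.0], [0.3, 0.4]])
Pi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Y = np.zeros((T, 2))
for t in range(2, T):
    Y[t] = Pi1 @ Y[t - 1] + Pi2 @ Y[t - 2] + rng.standard_normal(2)

# OLS: rows of Z are (1, Y_{t-1}', Y_{t-2}'); columns of B are beta_1-hat, beta_2-hat
Z = np.column_stack([np.ones(T - p), Y[1:-1], Y[:-2]])
B, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)
E = Y[p:] - Z @ B
k = Z.shape[1]                                      # k = n p + 1 = 5
Sigma_hat = E.T @ E / (T - p - k)
avar = np.kron(Sigma_hat, np.linalg.inv(Z.T @ Z))   # avar-hat of vec(B-hat)

b = B.reshape(-1, order="F")                        # vec(B-hat) = (beta_1', beta_2')'
R = np.zeros((2, 2 * k))
R[0, 2] = 1.0                                       # pi^1_12: coef on y_{2,t-1} in eq. 1
R[1, 4] = 1.0                                       # pi^2_12: coef on y_{2,t-2} in eq. 1
diff = R @ b                                        # r = (0, 0)'
wald = diff @ np.linalg.solve(R @ avar @ R.T, diff)
```

Under $H_0$ the statistic is asymptotically $\chi^2(2)$, so values below the 5% critical value 5.99 fail to reject.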
Forecasting Algorithms
Forecasting from a VAR($p$) is a straightforward extension of forecasting from an AR($p$). The multivariate Wold form is

$$\mathbf{Y}_t = \boldsymbol{\mu} + \boldsymbol{\varepsilon}_t + \Psi_1\boldsymbol{\varepsilon}_{t-1} + \Psi_2\boldsymbol{\varepsilon}_{t-2} + \cdots$$

$$\mathbf{Y}_{t+h} = \boldsymbol{\mu} + \boldsymbol{\varepsilon}_{t+h} + \Psi_1\boldsymbol{\varepsilon}_{t+h-1} + \cdots + \Psi_{h-1}\boldsymbol{\varepsilon}_{t+1} + \Psi_h\boldsymbol{\varepsilon}_t + \cdots$$

$$\boldsymbol{\varepsilon}_t \sim WN(\mathbf{0}, \Sigma)$$

Note that

$$E[\mathbf{Y}_t] = \boldsymbol{\mu}$$

$$\mathrm{var}(\mathbf{Y}_t) = E[(\mathbf{Y}_t - \boldsymbol{\mu})(\mathbf{Y}_t - \boldsymbol{\mu})'] = E\left[\left(\sum_{k=0}^{\infty}\Psi_k\boldsymbol{\varepsilon}_{t-k}\right)\left(\sum_{k=0}^{\infty}\Psi_k\boldsymbol{\varepsilon}_{t-k}\right)'\right] = \sum_{k=0}^{\infty}\Psi_k\Sigma\Psi_k'$$

The minimum MSE linear forecast of $\mathbf{Y}_{t+h}$ based on information at time $t$ is

$$\mathbf{Y}_{t+h|t} = \boldsymbol{\mu} + \Psi_h\boldsymbol{\varepsilon}_t + \Psi_{h+1}\boldsymbol{\varepsilon}_{t-1} + \cdots$$

The forecast error is

$$\boldsymbol{\varepsilon}_{t+h|t} = \mathbf{Y}_{t+h} - \mathbf{Y}_{t+h|t} = \boldsymbol{\varepsilon}_{t+h} + \Psi_1\boldsymbol{\varepsilon}_{t+h-1} + \cdots + \Psi_{h-1}\boldsymbol{\varepsilon}_{t+1}$$

The forecast error MSE is

$$MSE(\boldsymbol{\varepsilon}_{t+h|t}) = E[\boldsymbol{\varepsilon}_{t+h|t}\boldsymbol{\varepsilon}_{t+h|t}'] = \Sigma + \Psi_1\Sigma\Psi_1' + \cdots + \Psi_{h-1}\Sigma\Psi_{h-1}' = \sum_{s=0}^{h-1}\Psi_s\Sigma\Psi_s'$$
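For a VAR(1) the Wold matrices are simply $\Psi_s = \Pi^s$, which makes the MSE formula above easy to evaluate numerically; the matrices $\Pi$ and $\Sigma$ below are hypothetical.

```python
import numpy as np

# Hypothetical VAR(1) coefficients and innovation covariance
Pi = np.array([[0.5, 0.1],
               [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

def forecast_mse(h):
    """MSE(h) = sum_{s=0}^{h-1} Psi_s Sigma Psi_s', with Psi_s = Pi^s for a VAR(1)."""
    return sum(np.linalg.matrix_power(Pi, s) @ Sigma @ np.linalg.matrix_power(Pi, s).T
               for s in range(h))
```

The 1-step MSE equals $\Sigma$, each extra horizon adds a positive semidefinite term, and as $h \to \infty$ the MSE converges to the unconditional variance of the process.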
Chain-rule of Forecasting
The best linear predictor, in terms of minimum mean squared error (MSE), of $\mathbf{Y}_{t+1}$, or 1-step forecast based on information available at time $t$, is

$$\mathbf{Y}_{t+1|t} = \mathbf{c} + \Pi_1\mathbf{Y}_t + \cdots + \Pi_p\mathbf{Y}_{t-p+1}$$

Forecasts for longer horizons ($h$-step forecasts) may be obtained using the chain-rule of forecasting as

$$\mathbf{Y}_{t+h|t} = \mathbf{c} + \Pi_1\mathbf{Y}_{t+h-1|t} + \cdots + \Pi_p\mathbf{Y}_{t+h-p|t}, \qquad \mathbf{Y}_{t+j|t} = \mathbf{Y}_{t+j} \text{ for } j \leq 0$$

Note: The chain-rule may be derived from the state-space framework

$$\begin{pmatrix} \mathbf{Y}_t \\ \mathbf{Y}_{t-1} \\ \vdots \\ \mathbf{Y}_{t-p+1} \end{pmatrix} = \begin{pmatrix} \Pi_1 & \Pi_2 & \cdots & \Pi_p \\ I_n & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & I_n & 0 \end{pmatrix}\begin{pmatrix} \mathbf{Y}_{t-1} \\ \mathbf{Y}_{t-2} \\ \vdots \\ \mathbf{Y}_{t-p} \end{pmatrix} + \begin{pmatrix} \boldsymbol{\varepsilon}_t \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{pmatrix}$$

$$\boldsymbol{\xi}_t = F\boldsymbol{\xi}_{t-1} + \mathbf{v}_t$$

Then

$$\boldsymbol{\xi}_{t+1|t} = F\boldsymbol{\xi}_t$$
$$\boldsymbol{\xi}_{t+2|t} = F\boldsymbol{\xi}_{t+1|t} = F^2\boldsymbol{\xi}_t$$
$$\vdots$$
$$\boldsymbol{\xi}_{t+h|t} = F\boldsymbol{\xi}_{t+h-1|t} = F^h\boldsymbol{\xi}_t$$
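The chain rule can be sketched as a short loop in which forecasts are fed back in place of future data; the coefficients and terminal observations below are hypothetical. With a zero intercept, the $h$-step forecast equals the first block of $F^h\boldsymbol{\xi}_t$, which gives a direct numerical check.

```python
import numpy as np

def chain_forecast(Y, c, Pis, h):
    """Return Y_{t+1|t}, ..., Y_{t+h|t} via the chain rule of forecasting."""
    p = len(Pis)
    hist = [Y[-j] for j in range(1, p + 1)]  # most recent observation first
    out = []
    for _ in range(h):
        f = c + sum(Pis[j] @ hist[j] for j in range(p))
        out.append(f)
        hist = [f] + hist[:-1]               # forecasts feed back in as "data"
    return np.array(out)

# Hypothetical VAR(2) with zero intercept (so it matches the state-space form above)
Pi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
Pi2 = np.array([[0.1, 0.0], [0.0, 0.1]])
Y = np.array([[1.0, 2.0], [0.5, -1.0]])      # last two observations; last row is Y_t
fc = chain_forecast(Y, np.zeros(2), [Pi1, Pi2], 5)
```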