Multivariate Time Series and Vector
Autoregressions
Eric Zivot
May 29, 2013
Multivariate Time Series
Consider $n$ time series variables $\{y_{1t}\}, \ldots, \{y_{nt}\}$. A multivariate time series is the $(n \times 1)$ vector time series $\{\mathbf{Y}_t\}$ where the $i$th row of $\{\mathbf{Y}_t\}$ is $\{y_{it}\}$. That is, for any time $t$, $\mathbf{Y}_t = (y_{1t}, \ldots, y_{nt})'$.
Multivariate time series analysis is used when one wants to model and explain the interactions and co-movements among a group of time series variables:
• Consumption and income
• Stock prices and dividends; forward and spot exchange rates
• Interest rates, money growth, income, inflation
Stock and Watson state that macroeconometricians do four things with multivariate time series:
1. Describe and summarize macroeconomic data
2. Make macroeconomic forecasts
3. Quantify what we do or do not know about the true structure of the macroeconomy
4. Advise macroeconomic policymakers
Stationary and Ergodic Multivariate Time Series
A multivariate time series $\mathbf{Y}_t$ is covariance stationary and ergodic if all of its component time series are stationary and ergodic. In this case,

$$E[\mathbf{Y}_t] = \boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)'$$

$$\mathrm{var}(\mathbf{Y}_t) = \Gamma_0 = E[(\mathbf{Y}_t - \boldsymbol{\mu})(\mathbf{Y}_t - \boldsymbol{\mu})'] = \begin{pmatrix} \mathrm{var}(y_{1t}) & \mathrm{cov}(y_{1t}, y_{2t}) & \cdots & \mathrm{cov}(y_{1t}, y_{nt}) \\ \mathrm{cov}(y_{2t}, y_{1t}) & \mathrm{var}(y_{2t}) & \cdots & \mathrm{cov}(y_{2t}, y_{nt}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(y_{nt}, y_{1t}) & \mathrm{cov}(y_{nt}, y_{2t}) & \cdots & \mathrm{var}(y_{nt}) \end{pmatrix}$$
The correlation matrix of $\mathbf{Y}_t$ is the $(n \times n)$ matrix

$$\mathrm{corr}(\mathbf{Y}_t) = R_0 = D^{-1}\Gamma_0 D^{-1}$$

where $D$ is an $(n \times n)$ diagonal matrix with $i$th diagonal element $\mathrm{var}(y_{it})^{1/2}$.
The parameters $\boldsymbol{\mu}$, $\Gamma_0$, and $R_0$ are estimated from data $(\mathbf{Y}_1, \ldots, \mathbf{Y}_T)$ using the sample moments

$$\bar{\mathbf{Y}} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{Y}_t \xrightarrow{p} E[\mathbf{Y}_t] = \boldsymbol{\mu}$$

$$\hat{\Gamma}_0 = \frac{1}{T}\sum_{t=1}^{T} (\mathbf{Y}_t - \bar{\mathbf{Y}})(\mathbf{Y}_t - \bar{\mathbf{Y}})' \xrightarrow{p} \mathrm{var}(\mathbf{Y}_t) = \Gamma_0$$

$$\hat{R}_0 = \hat{D}^{-1}\hat{\Gamma}_0\hat{D}^{-1} \xrightarrow{p} \mathrm{corr}(\mathbf{Y}_t) = R_0$$

where $\hat{D}$ is the $(n \times n)$ diagonal matrix with the sample standard deviations of $y_{it}$ along the diagonal. The Ergodic Theorem justifies the convergence of the sample moments to their population counterparts.
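As a quick numerical illustration, the sample moments above can be computed directly with NumPy. The data below are simulated white noise, chosen only so the snippet is self-contained; the variable names are illustrative, not part of the notes.

```python
import numpy as np

# Hypothetical data: T draws of an n-dimensional series (white noise for illustration)
rng = np.random.default_rng(0)
T, n = 500, 2
Y = rng.standard_normal((T, n))

Y_bar = Y.mean(axis=0)                       # sample mean, estimates mu
U = Y - Y_bar                                # demeaned observations
Gamma0_hat = U.T @ U / T                     # sample covariance matrix
D_hat_inv = np.diag(1.0 / np.sqrt(np.diag(Gamma0_hat)))
R0_hat = D_hat_inv @ Gamma0_hat @ D_hat_inv  # sample correlation matrix
```

By construction $\hat{R}_0$ has a unit diagonal and entries bounded by one in absolute value.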
Cross Covariance and Correlation Matrices
With a multivariate time series $\mathbf{Y}_t$ each component has autocovariances and autocorrelations, but there are also cross lead-lag covariances and correlations between all possible pairs of components. The autocovariances and autocorrelations of $y_{jt}$ for $j = 1, \ldots, n$ are defined as

$$\gamma_{jj}^k = \mathrm{cov}(y_{jt}, y_{j,t-k}), \qquad \rho_{jj}^k = \mathrm{corr}(y_{jt}, y_{j,t-k}) = \frac{\gamma_{jj}^k}{\gamma_{jj}^0}$$

and these are symmetric in $k$: $\gamma_{jj}^k = \gamma_{jj}^{-k}$, $\rho_{jj}^k = \rho_{jj}^{-k}$.
The cross lag covariances and cross lag correlations between $y_{it}$ and $y_{jt}$ are defined as

$$\gamma_{ij}^k = \mathrm{cov}(y_{it}, y_{j,t-k}), \qquad \rho_{ij}^k = \mathrm{corr}(y_{it}, y_{j,t-k}) = \frac{\gamma_{ij}^k}{\sqrt{\gamma_{ii}^0\gamma_{jj}^0}}$$

and they are not necessarily symmetric in $k$. In general,

$$\gamma_{ij}^k = \mathrm{cov}(y_{it}, y_{j,t-k}) \neq \mathrm{cov}(y_{it}, y_{j,t+k}) = \mathrm{cov}(y_{jt}, y_{i,t-k}) = \gamma_{ji}^k$$
Defn:
• If $\gamma_{ij}^k \neq 0$ for some $k > 0$, then $y_{jt}$ is said to lead $y_{it}$.

• If $\gamma_{ij}^{-k} \neq 0$ for some $k > 0$, then $y_{it}$ is said to lead $y_{jt}$.

• It is possible that $y_{it}$ leads $y_{jt}$ and vice-versa. In this case, there is said to be feedback between the two series.
All of the lag $k$ cross covariances and correlations are summarized in the $(n \times n)$ lag $k$ cross covariance and lag $k$ cross correlation matrices

$$\Gamma_k = E[(\mathbf{Y}_t - \boldsymbol{\mu})(\mathbf{Y}_{t-k} - \boldsymbol{\mu})'] = \begin{pmatrix} \mathrm{cov}(y_{1t}, y_{1,t-k}) & \mathrm{cov}(y_{1t}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{1t}, y_{n,t-k}) \\ \mathrm{cov}(y_{2t}, y_{1,t-k}) & \mathrm{cov}(y_{2t}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{2t}, y_{n,t-k}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(y_{nt}, y_{1,t-k}) & \mathrm{cov}(y_{nt}, y_{2,t-k}) & \cdots & \mathrm{cov}(y_{nt}, y_{n,t-k}) \end{pmatrix}$$

$$R_k = D^{-1}\Gamma_k D^{-1}$$

The matrices $\Gamma_k$ and $R_k$ are not symmetric in $k$, but it is easy to show that $\Gamma_{-k} = \Gamma_k'$ and $R_{-k} = R_k'$.
The matrices $\Gamma_k$ and $R_k$ are estimated from data $(\mathbf{Y}_1, \ldots, \mathbf{Y}_T)$ using

$$\hat{\Gamma}_k = \frac{1}{T}\sum_{t=k+1}^{T} (\mathbf{Y}_t - \bar{\mathbf{Y}})(\mathbf{Y}_{t-k} - \bar{\mathbf{Y}})'$$

$$\hat{R}_k = \hat{D}^{-1}\hat{\Gamma}_k\hat{D}^{-1}$$
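The sample cross covariance and cross correlation matrices can be sketched as below; `cross_cov` and `cross_corr` are hypothetical helper names, and the divisor $T$ follows the formula above. The transposition identity $\hat{\Gamma}_{-k} = \hat{\Gamma}_k'$ then holds exactly in sample.

```python
import numpy as np

def cross_cov(Y, k):
    """Sample lag-k cross covariance: averages (Y_t - Ybar)(Y_{t-k} - Ybar)' over valid t."""
    T = Y.shape[0]
    U = Y - Y.mean(axis=0)
    ts = range(k, T) if k >= 0 else range(0, T + k)
    return sum(np.outer(U[t], U[t - k]) for t in ts) / T

def cross_corr(Y, k):
    """Sample lag-k cross correlation matrix R_k-hat = D-hat^-1 Gamma_k-hat D-hat^-1."""
    D_inv = np.diag(1.0 / Y.std(axis=0))
    return D_inv @ cross_cov(Y, k) @ D_inv

rng = np.random.default_rng(1)
Y = rng.standard_normal((300, 2))        # hypothetical data
G1, R1 = cross_cov(Y, 1), cross_corr(Y, 1)
```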
Multivariate Wold Representation
Any $(n \times 1)$ covariance stationary multivariate time series $\mathbf{Y}_t$ has a Wold or linear process representation of the form

$$\mathbf{Y}_t = \boldsymbol{\mu} + \boldsymbol{\varepsilon}_t + \Psi_1\boldsymbol{\varepsilon}_{t-1} + \Psi_2\boldsymbol{\varepsilon}_{t-2} + \cdots = \boldsymbol{\mu} + \sum_{k=0}^{\infty}\Psi_k\boldsymbol{\varepsilon}_{t-k}$$

$$\Psi_0 = I_n, \qquad \boldsymbol{\varepsilon}_t \sim WN(\mathbf{0}, \Sigma)$$

where $\Psi_k$ is an $(n \times n)$ matrix with $(i,j)$th element $\psi_{ij}^k$.

In lag operator notation, the Wold form is

$$\mathbf{Y}_t = \boldsymbol{\mu} + \Psi(L)\boldsymbol{\varepsilon}_t, \qquad \Psi(L) = \sum_{k=0}^{\infty}\Psi_k L^k$$

where the elements of $\Psi(L)$ are 1-summable.

The moments of $\mathbf{Y}_t$ are given by

$$E[\mathbf{Y}_t] = \boldsymbol{\mu}, \qquad \mathrm{var}(\mathbf{Y}_t) = \sum_{k=0}^{\infty}\Psi_k\Sigma\Psi_k'$$
Vector Autoregression Models
The vector autoregression (VAR) model is one of the most successful, flexible, and easy to use models for the analysis of multivariate time series.

• Made famous in Chris Sims's paper "Macroeconomics and Reality," ECTA 1980.

• It is a natural extension of the univariate autoregressive model to dynamic multivariate time series.

• Has proven to be especially useful for describing the dynamic behavior of economic and financial time series and for forecasting.

• It often provides superior forecasts to those from univariate time series models and elaborate theory-based simultaneous equations models.

• Used for structural inference and policy analysis. In structural analysis, certain assumptions about the causal structure of the data under investigation are imposed, and the resulting causal impacts of unexpected shocks or innovations to specified variables on the variables in the model are summarized. These causal impacts are usually summarized with impulse response functions and forecast error variance decompositions.
The Stationary Vector Autoregression Model
Let $\mathbf{Y}_t = (y_{1t}, y_{2t}, \ldots, y_{nt})'$ denote an $(n \times 1)$ vector of time series variables. The basic $p$-lag vector autoregressive (VAR($p$)) model has the form

$$\mathbf{Y}_t = \mathbf{c} + \Pi_1\mathbf{Y}_{t-1} + \Pi_2\mathbf{Y}_{t-2} + \cdots + \Pi_p\mathbf{Y}_{t-p} + \boldsymbol{\varepsilon}_t, \qquad \boldsymbol{\varepsilon}_t \sim WN(\mathbf{0}, \Sigma)$$
Example: Bivariate VAR(2)

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \pi_{11}^1 & \pi_{12}^1 \\ \pi_{21}^1 & \pi_{22}^1 \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \pi_{11}^2 & \pi_{12}^2 \\ \pi_{21}^2 & \pi_{22}^2 \end{pmatrix}\begin{pmatrix} y_{1,t-2} \\ y_{2,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

Equation by equation we have

$$y_{1t} = c_1 + \pi_{11}^1 y_{1,t-1} + \pi_{12}^1 y_{2,t-1} + \pi_{11}^2 y_{1,t-2} + \pi_{12}^2 y_{2,t-2} + \varepsilon_{1t}$$

$$y_{2t} = c_2 + \pi_{21}^1 y_{1,t-1} + \pi_{22}^1 y_{2,t-1} + \pi_{21}^2 y_{1,t-2} + \pi_{22}^2 y_{2,t-2} + \varepsilon_{2t}$$

where $\mathrm{cov}(\varepsilon_{1t}, \varepsilon_{2s}) = \sigma_{12}$ for $t = s$; $0$ otherwise.
Note: The equation for $y_{1t}$ has lagged values of both $y_{1t}$ and $y_{2t}$, and the equation for $y_{2t}$ has lagged values of both $y_{1t}$ and $y_{2t}$. Hence, it is not assumed that $y_{2t}$ is strictly exogenous for $y_{1t}$, or vice-versa. For example, strict exogeneity of $y_{2t}$ implies

$$y_{2t} = \pi_{22}^1 y_{2,t-1} + \pi_{22}^2 y_{2,t-2} + \varepsilon_{2t}$$

$$E(\varepsilon_{2t}\varepsilon_{1s}) = 0 \text{ for all } t, s$$
Remarks:
• Each equation has the same regressors — lagged values of $y_{1t}$ and $y_{2t}$.

• Endogeneity is avoided by using lagged values of $y_{1t}$ and $y_{2t}$.

• The VAR($p$) model is just a seemingly unrelated regression (SUR) model with lagged variables and deterministic terms as common regressors.
In lag operator notation, the VAR($p$) is written as

$$\Pi(L)\mathbf{Y}_t = \mathbf{c} + \boldsymbol{\varepsilon}_t, \qquad \Pi(L) = I_n - \Pi_1 L - \cdots - \Pi_p L^p$$

The VAR($p$) is stable if the roots of

$$\det(I_n - \Pi_1 z - \cdots - \Pi_p z^p) = 0$$

lie outside the complex unit circle (have modulus greater than one), or, equivalently, if the eigenvalues of the companion matrix

$$F = \begin{pmatrix} \Pi_1 & \Pi_2 & \cdots & \Pi_{p-1} & \Pi_p \\ I_n & 0 & \cdots & 0 & 0 \\ 0 & I_n & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_n & 0 \end{pmatrix}$$

have modulus less than one. A stable VAR($p$) process is stationary and ergodic with time invariant means, variances, and autocovariances.
Example: Stability of bivariate VAR(1) model

$$\mathbf{Y}_t = \Pi\mathbf{Y}_{t-1} + \boldsymbol{\varepsilon}_t$$

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

Then $\det(I_2 - \Pi z) = 0$ becomes

$$(1 - \pi_{11}z)(1 - \pi_{22}z) - \pi_{12}\pi_{21}z^2 = 0$$

• The stability condition involves the cross terms $\pi_{12}$ and $\pi_{21}$.

• If $\pi_{12} = \pi_{21} = 0$ (diagonal VAR), then the bivariate stability condition reduces to the univariate stability conditions for each equation.
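The eigenvalue version of the stability check translates directly into code. The sketch below builds the companion matrix $F$ and tests whether all eigenvalues lie inside the unit circle; the coefficient matrices are hypothetical examples, not taken from the notes.

```python
import numpy as np

def companion(Pis):
    """Companion matrix F for a VAR(p) with coefficient matrices Pis = [Pi_1, ..., Pi_p]."""
    n, p = Pis[0].shape[0], len(Pis)
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(Pis)           # first block row: Pi_1, ..., Pi_p
    if p > 1:
        F[n:, :-n] = np.eye(n * (p - 1))  # identity blocks shift the state down
    return F

def is_stable(Pis):
    """True if all companion eigenvalues have modulus less than one."""
    return bool(np.max(np.abs(np.linalg.eigvals(companion(Pis)))) < 1.0)

# Hypothetical coefficient matrices for a bivariate VAR(1)
Pi_stable = np.array([[0.5, 0.1],
                      [0.2, 0.4]])      # eigenvalues 0.6 and 0.3
Pi_unstable = np.array([[1.1, 0.0],
                        [0.0, 0.5]])    # one eigenvalue outside the unit circle
```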
If $\mathbf{Y}_t$ is covariance stationary, then the unconditional mean is given by

$$\boldsymbol{\mu} = (I_n - \Pi_1 - \cdots - \Pi_p)^{-1}\mathbf{c}$$

The mean-adjusted form of the VAR($p$) is then

$$\mathbf{Y}_t - \boldsymbol{\mu} = \Pi_1(\mathbf{Y}_{t-1} - \boldsymbol{\mu}) + \Pi_2(\mathbf{Y}_{t-2} - \boldsymbol{\mu}) + \cdots + \Pi_p(\mathbf{Y}_{t-p} - \boldsymbol{\mu}) + \boldsymbol{\varepsilon}_t$$
The basic VAR($p$) model may be too restrictive to represent the main characteristics of the data sufficiently. The general form of the VAR($p$) model with deterministic terms and strictly exogenous variables $\mathbf{X}_t$ is given by

$$\mathbf{Y}_t = \Pi_1\mathbf{Y}_{t-1} + \Pi_2\mathbf{Y}_{t-2} + \cdots + \Pi_p\mathbf{Y}_{t-p} + \Phi\mathbf{D}_t + G\mathbf{X}_t + \boldsymbol{\varepsilon}_t$$

where $\mathbf{D}_t$ denotes deterministic terms and $\mathbf{X}_t$ denotes strictly exogenous variables ($E[\mathbf{X}_s\boldsymbol{\varepsilon}_t'] = 0$ for all $s, t$).
Wold Representation
Consider the stationary VAR($p$) model

$$\Pi(L)\mathbf{Y}_t = \mathbf{c} + \boldsymbol{\varepsilon}_t, \qquad \Pi(L) = I_n - \Pi_1 L - \cdots - \Pi_p L^p$$

Since $\mathbf{Y}_t$ is stationary, $\Pi(L)^{-1}$ exists, so that

$$\mathbf{Y}_t = \Pi(L)^{-1}\mathbf{c} + \Pi(L)^{-1}\boldsymbol{\varepsilon}_t = \boldsymbol{\mu} + \sum_{s=0}^{\infty}\Psi_s\boldsymbol{\varepsilon}_{t-s}$$

$$\Psi_0 = I_n, \qquad \lim_{s\to\infty}\Psi_s = 0, \qquad \Pi(L)^{-1} = \Psi(L) = \sum_{s=0}^{\infty}\Psi_s L^s$$

The Wold coefficients $\Psi_s$ may be determined from the VAR coefficients $\Pi_j$ by solving

$$\Pi(L)\Psi(L) = I_n$$

$$(I_n - \Pi_1 L - \cdots - \Pi_p L^p)(I_n + \Psi_1 L + \Psi_2 L^2 + \cdots) = I_n$$

which implies

$$\Psi_1 = \Pi_1$$
$$\Psi_2 = \Pi_1\Psi_1 + \Pi_2$$
$$\vdots$$
$$\Psi_s = \Pi_1\Psi_{s-1} + \cdots + \Pi_p\Psi_{s-p}$$

with $\Psi_j = 0$ for $j < 0$.
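The recursion above translates directly into code. This is a minimal sketch with hypothetical VAR(2) coefficient matrices; `wold_psis` simply applies $\Psi_s = \Pi_1\Psi_{s-1} + \cdots + \Pi_p\Psi_{s-p}$ starting from $\Psi_0 = I_n$.

```python
import numpy as np

def wold_psis(Pis, smax):
    """Wold matrices Psi_0, ..., Psi_smax from VAR coefficients Pis = [Pi_1, ..., Pi_p]."""
    n, p = Pis[0].shape[0], len(Pis)
    Psis = [np.eye(n)]                  # Psi_0 = I_n
    for s in range(1, smax + 1):
        # Psi_s = Pi_1 Psi_{s-1} + ... ; terms with negative index are dropped (zero)
        Psis.append(sum(Pis[j] @ Psis[s - 1 - j] for j in range(min(s, p))))
    return Psis

# Hypothetical VAR(2) coefficients
Pi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
Pi2 = np.array([[0.1, 0.0], [0.0, 0.1]])
Psis = wold_psis([Pi1, Pi2], 20)
```

For a stable VAR the $\Psi_s$ matrices die out geometrically, matching $\lim_{s\to\infty}\Psi_s = 0$ above.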
Estimation
Assume that the VAR($p$) model is covariance stationary and that there are no restrictions on the parameters of the model. In SUR notation, each equation in the VAR($p$) may be written as

$$\mathbf{y}_i = Z\boldsymbol{\beta}_i + \mathbf{e}_i, \qquad i = 1, \ldots, n$$

• $\mathbf{y}_i$ is a $(T \times 1)$ vector of observations on the $i$th equation

• $Z$ is a $(T \times k)$ matrix with $t$th row given by $\mathbf{Z}_t' = (1, \mathbf{Y}_{t-1}', \ldots, \mathbf{Y}_{t-p}')$, where $k = np + 1$

• $\boldsymbol{\beta}_i$ is a $(k \times 1)$ vector of VAR parameters for the $i$th equation, and $\mathbf{e}_i$ is a $(T \times 1)$ error with covariance matrix $\sigma_i^2 I_T$
Result: Since the VAR($p$) is in the form of a SUR model where each equation has the same explanatory variables, each equation may be estimated separately by ordinary least squares without losing efficiency relative to generalized least squares. Let

$$\hat{B} = [\hat{\boldsymbol{\beta}}_1, \ldots, \hat{\boldsymbol{\beta}}_n]$$

denote the $(k \times n)$ matrix of least squares coefficients for the $n$ equations, and let

$$\mathrm{vec}(\hat{B}) = \begin{pmatrix} \hat{\boldsymbol{\beta}}_1 \\ \vdots \\ \hat{\boldsymbol{\beta}}_n \end{pmatrix}$$
Result: Under standard assumptions regarding the behavior of stationary and ergodic VAR models (e.g., see Hamilton, 1994), $\mathrm{vec}(\hat{B})$ is consistent and asymptotically normally distributed with estimated asymptotic covariance matrix

$$\widehat{\mathrm{avar}}(\mathrm{vec}(\hat{B})) = \hat{\Sigma} \otimes (Z'Z)^{-1}$$

where the $(n \times n)$ matrix $\hat{\Sigma}$ and the $(n \times 1)$ residual vector $\hat{\boldsymbol{\varepsilon}}_t$ are

$$\hat{\Sigma} = \frac{1}{T-k}\sum_{t=1}^{T}\hat{\boldsymbol{\varepsilon}}_t\hat{\boldsymbol{\varepsilon}}_t', \qquad \hat{\boldsymbol{\varepsilon}}_t = \mathbf{Y}_t - \hat{B}'\mathbf{Z}_t = (\hat{\varepsilon}_{1t}, \ldots, \hat{\varepsilon}_{nt})'$$
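A minimal sketch of equation-by-equation OLS on a simulated VAR(1); the data-generating process and all variable names are hypothetical. Because every equation shares the same regressor matrix $Z$, a single least squares fit of the stacked data recovers all $n$ equations' coefficients at once.

```python
import numpy as np

# Simulate a hypothetical stable bivariate VAR(1)
rng = np.random.default_rng(42)
T, n = 2000, 2
c = np.array([1.0, 0.5])
Pi1 = np.array([[0.5, 0.1],
                [0.2, 0.4]])
Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = c + Pi1 @ Y[t - 1] + 0.5 * rng.standard_normal(n)

# Design matrix Z: t-th row is (1, Y_{t-1}'); common to every equation
Z = np.column_stack([np.ones(T - 1), Y[:-1]])
B_hat, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)  # (k x n); column i is beta_i-hat
c_hat, Pi1_hat = B_hat[0], B_hat[1:].T

# Residuals and degrees-of-freedom adjusted error covariance
E = Y[1:] - Z @ B_hat
Sigma_hat = E.T @ E / (T - 1 - Z.shape[1])
```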
Lag Length Selection
The lag length for the VAR($p$) model may be determined using model selection criteria. The general approach is to fit VAR($p$) models with orders $p = 0, \ldots, p_{\max}$ and choose the value of $p$ which minimizes some model selection criterion of the form

$$MSC(p) = \ln|\tilde{\Sigma}(p)| + c_T \cdot \varphi(n, p)$$

$$\tilde{\Sigma}(p) = T^{-1}\sum_{t=1}^{T}\hat{\boldsymbol{\varepsilon}}_t\hat{\boldsymbol{\varepsilon}}_t'$$

where $c_T$ is a function of the sample size $T$ and $\varphi(n, p)$ is a penalty function.
Information Criteria
The three most common information criteria are the Akaike (AIC), Schwarz-Bayesian (BIC), and Hannan-Quinn (HQ):

$$AIC(p) = \ln|\tilde{\Sigma}(p)| + \frac{2}{T}pn^2$$

$$BIC(p) = \ln|\tilde{\Sigma}(p)| + \frac{\ln T}{T}pn^2$$

$$HQ(p) = \ln|\tilde{\Sigma}(p)| + \frac{2\ln\ln T}{T}pn^2$$
Remarks:
• The AIC criterion asymptotically overestimates the order with positive probability.

• The BIC and HQ criteria estimate the order consistently under fairly general conditions if the true order $p$ is less than or equal to $p_{\max}$.
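A sketch of lag selection by BIC using the formulas above; the helper names and the data-generating process are hypothetical, and, as is common in practice, $|\tilde{\Sigma}(p)|$ is computed on each fit's own effective sample.

```python
import numpy as np

def fit_resid_cov(Y, p):
    """Equation-by-equation OLS for a VAR(p); returns the residual covariance matrix."""
    T, n = Y.shape
    Z = np.column_stack([np.ones(T - p)] + [Y[p - j:T - j] for j in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)
    E = Y[p:] - Z @ B
    return E.T @ E / (T - p)

def select_bic(Y, pmax):
    """Return the lag order minimizing BIC(p) = ln|Sigma(p)| + (ln T / T) p n^2."""
    T, n = Y.shape
    scores = [np.log(np.linalg.det(fit_resid_cov(Y, p))) + np.log(T) / T * p * n**2
              for p in range(1, pmax + 1)]
    return int(np.argmin(scores)) + 1

# Hypothetical VAR(1) data: BIC should typically recover p = 1
rng = np.random.default_rng(3)
Pi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
Y = np.zeros((1000, 2))
for t in range(1, 1000):
    Y[t] = Pi1 @ Y[t - 1] + rng.standard_normal(2)
```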
Granger Causality
One of the main uses of VAR models is forecasting. The following intuitive notion of a variable's forecasting ability is due to Granger (1969).

• If a variable, or group of variables, $y_1$ is found to be helpful for predicting another variable, or group of variables, $y_2$, then $y_1$ is said to Granger-cause $y_2$; otherwise it is said to fail to Granger-cause $y_2$.

• Formally, $y_1$ fails to Granger-cause $y_2$ if for all $s > 0$ the MSE of a forecast of $y_{2,t+s}$ based on $(y_{2t}, y_{2,t-1}, \ldots)$ is the same as the MSE of a forecast of $y_{2,t+s}$ based on $(y_{2t}, y_{2,t-1}, \ldots)$ and $(y_{1t}, y_{1,t-1}, \ldots)$.
• The notion of Granger causality does not imply true causality. It onlyimplies forecasting ability.
Example: Bivariate VAR Model
In a bivariate VAR($p$), $y_2$ fails to Granger-cause $y_1$ if all of the VAR coefficient matrices $\Pi_1, \ldots, \Pi_p$ are lower triangular:

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \pi_{11}^1 & 0 \\ \pi_{21}^1 & \pi_{22}^1 \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \cdots + \begin{pmatrix} \pi_{11}^p & 0 \\ \pi_{21}^p & \pi_{22}^p \end{pmatrix}\begin{pmatrix} y_{1,t-p} \\ y_{2,t-p} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

Similarly, $y_1$ fails to Granger-cause $y_2$ if all of the coefficients on lagged values of $y_1$ are zero in the equation for $y_2$. Notice that if $y_2$ fails to Granger-cause $y_1$ and $y_1$ fails to Granger-cause $y_2$, then the VAR coefficient matrices $\Pi_1, \ldots, \Pi_p$ are diagonal.
Granger-Causality Test Statistic
The linear coefficient restrictions implied by Granger non-causality may be tested using the Wald statistic

$$Wald = (R\,\mathrm{vec}(\hat{B}) - \mathbf{r})'\left\{R\left[\widehat{\mathrm{avar}}(\mathrm{vec}(\hat{B}))\right]R'\right\}^{-1}(R\,\mathrm{vec}(\hat{B}) - \mathbf{r})$$

In the bivariate model, this reduces to a simple F-statistic.
Example: VAR(2)

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \pi_{11}^1 & \pi_{12}^1 \\ \pi_{21}^1 & \pi_{22}^1 \end{pmatrix}\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \pi_{11}^2 & \pi_{12}^2 \\ \pi_{21}^2 & \pi_{22}^2 \end{pmatrix}\begin{pmatrix} y_{1,t-2} \\ y_{2,t-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

$$H_0: y_2 \text{ does not Granger-cause } y_1 \;\Rightarrow\; \pi_{12}^1 = \pi_{12}^2 = 0$$

Note:

$$\boldsymbol{\beta}_1 = (c_1, \pi_{11}^1, \pi_{12}^1, \pi_{11}^2, \pi_{12}^2)'$$

$$\boldsymbol{\beta}_2 = (c_2, \pi_{21}^1, \pi_{22}^1, \pi_{21}^2, \pi_{22}^2)'$$

Under $H_0$, $\boldsymbol{\beta}_1 = (c_1, \pi_{11}^1, 0, \pi_{11}^2, 0)'$.

Here

$$\mathrm{vec}(B) = \begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix} = (c_1, \pi_{11}^1, \pi_{12}^1, \pi_{11}^2, \pi_{12}^2, c_2, \pi_{21}^1, \pi_{22}^1, \pi_{21}^2, \pi_{22}^2)'$$

$$R = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \qquad \mathbf{r} = (0, 0)'$$
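The Wald test in this example can be sketched as follows, with a hypothetical data-generating process in which the null is true (the upper-right VAR coefficients are zero). The $R$ matrix picks out $\pi_{12}^1$ and $\pi_{12}^2$ from $\mathrm{vec}(\hat{B})$ exactly as above.

```python
import numpy as np

# Hypothetical bivariate VAR(2) in which y2 truly fails to Granger-cause y1
rng = np.random.default_rng(7)
T, p = 1000, 2
Pi1 = np.array([[0.5, 0.0], [0.3, 0.4]])
Pi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Y = np.zeros((T, 2))
for t in range(2, T):
    Y[t] = Pi1 @ Y[t - 1] + Pi2 @ Y[t - 2] + rng.standard_normal(2)

# OLS: rows of Z are (1, Y_{t-1}', Y_{t-2}'); columns of B are beta_1-hat, beta_2-hat
Z = np.column_stack([np.ones(T - p), Y[1:-1], Y[:-2]])
B, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)
E = Y[p:] - Z @ B
k = Z.shape[1]                                      # k = n p + 1 = 5
Sigma_hat = E.T @ E / (T - p - k)
avar = np.kron(Sigma_hat, np.linalg.inv(Z.T @ Z))   # avar-hat of vec(B-hat)

b = B.reshape(-1, order="F")                        # vec(B-hat) = (beta_1', beta_2')'
R = np.zeros((2, 2 * k))
R[0, 2] = 1.0                                       # pi^1_12: coef on y_{2,t-1} in eq. 1
R[1, 4] = 1.0                                       # pi^2_12: coef on y_{2,t-2} in eq. 1
diff = R @ b                                        # r = (0, 0)'
wald = diff @ np.linalg.solve(R @ avar @ R.T, diff)
```

Under $H_0$ the statistic is asymptotically $\chi^2(2)$, so values below the 5% critical value 5.99 fail to reject.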
Forecasting Algorithms
Forecasting from a VAR($p$) is a straightforward extension of forecasting from an AR($p$). The multivariate Wold form is

$$\mathbf{Y}_t = \boldsymbol{\mu} + \boldsymbol{\varepsilon}_t + \Psi_1\boldsymbol{\varepsilon}_{t-1} + \Psi_2\boldsymbol{\varepsilon}_{t-2} + \cdots$$

$$\mathbf{Y}_{t+h} = \boldsymbol{\mu} + \boldsymbol{\varepsilon}_{t+h} + \Psi_1\boldsymbol{\varepsilon}_{t+h-1} + \cdots + \Psi_{h-1}\boldsymbol{\varepsilon}_{t+1} + \Psi_h\boldsymbol{\varepsilon}_t + \cdots$$

$$\boldsymbol{\varepsilon}_t \sim WN(\mathbf{0}, \Sigma)$$

Note that

$$E[\mathbf{Y}_t] = \boldsymbol{\mu}$$

$$\mathrm{var}(\mathbf{Y}_t) = E[(\mathbf{Y}_t - \boldsymbol{\mu})(\mathbf{Y}_t - \boldsymbol{\mu})'] = E\left[\left(\sum_{k=0}^{\infty}\Psi_k\boldsymbol{\varepsilon}_{t-k}\right)\left(\sum_{k=0}^{\infty}\Psi_k\boldsymbol{\varepsilon}_{t-k}\right)'\right] = \sum_{k=0}^{\infty}\Psi_k\Sigma\Psi_k'$$

The minimum MSE linear forecast of $\mathbf{Y}_{t+h}$ based on information at time $t$ is

$$\mathbf{Y}_{t+h|t} = \boldsymbol{\mu} + \Psi_h\boldsymbol{\varepsilon}_t + \Psi_{h+1}\boldsymbol{\varepsilon}_{t-1} + \cdots$$

The forecast error is

$$\boldsymbol{\varepsilon}_{t+h|t} = \mathbf{Y}_{t+h} - \mathbf{Y}_{t+h|t} = \boldsymbol{\varepsilon}_{t+h} + \Psi_1\boldsymbol{\varepsilon}_{t+h-1} + \cdots + \Psi_{h-1}\boldsymbol{\varepsilon}_{t+1}$$

The forecast error MSE is

$$MSE(\boldsymbol{\varepsilon}_{t+h|t}) = E[\boldsymbol{\varepsilon}_{t+h|t}\boldsymbol{\varepsilon}_{t+h|t}'] = \Sigma + \Psi_1\Sigma\Psi_1' + \cdots + \Psi_{h-1}\Sigma\Psi_{h-1}' = \sum_{s=0}^{h-1}\Psi_s\Sigma\Psi_s'$$
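For a VAR(1) the Wold matrices are simply $\Psi_s = \Pi^s$, which makes the MSE formula above easy to evaluate numerically; the matrices $\Pi$ and $\Sigma$ below are hypothetical.

```python
import numpy as np

# Hypothetical VAR(1) coefficients and innovation covariance
Pi = np.array([[0.5, 0.1],
               [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

def forecast_mse(h):
    """MSE(h) = sum_{s=0}^{h-1} Psi_s Sigma Psi_s', with Psi_s = Pi^s for a VAR(1)."""
    return sum(np.linalg.matrix_power(Pi, s) @ Sigma @ np.linalg.matrix_power(Pi, s).T
               for s in range(h))
```

The 1-step MSE equals $\Sigma$, each extra horizon adds a positive semidefinite term, and as $h \to \infty$ the MSE converges to the unconditional variance of the process.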
Chain-rule of Forecasting
The best linear predictor, in terms of minimum mean squared error (MSE), of $\mathbf{Y}_{t+1}$, or 1-step forecast based on information available at time $t$, is

$$\mathbf{Y}_{t+1|t} = \mathbf{c} + \Pi_1\mathbf{Y}_t + \cdots + \Pi_p\mathbf{Y}_{t-p+1}$$

Forecasts for longer horizons ($h$-step forecasts) may be obtained using the chain-rule of forecasting as

$$\mathbf{Y}_{t+h|t} = \mathbf{c} + \Pi_1\mathbf{Y}_{t+h-1|t} + \cdots + \Pi_p\mathbf{Y}_{t+h-p|t}, \qquad \mathbf{Y}_{t+j|t} = \mathbf{Y}_{t+j} \text{ for } j \leq 0$$

Note: The chain-rule may be derived from the state-space framework

$$\begin{pmatrix} \mathbf{Y}_t \\ \mathbf{Y}_{t-1} \\ \vdots \\ \mathbf{Y}_{t-p+1} \end{pmatrix} = \begin{pmatrix} \Pi_1 & \Pi_2 & \cdots & \Pi_p \\ I_n & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & I_n & 0 \end{pmatrix}\begin{pmatrix} \mathbf{Y}_{t-1} \\ \mathbf{Y}_{t-2} \\ \vdots \\ \mathbf{Y}_{t-p} \end{pmatrix} + \begin{pmatrix} \boldsymbol{\varepsilon}_t \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{pmatrix}$$

$$\boldsymbol{\xi}_t = F\boldsymbol{\xi}_{t-1} + \mathbf{v}_t$$

Then

$$\boldsymbol{\xi}_{t+1|t} = F\boldsymbol{\xi}_t$$
$$\boldsymbol{\xi}_{t+2|t} = F\boldsymbol{\xi}_{t+1|t} = F^2\boldsymbol{\xi}_t$$
$$\vdots$$
$$\boldsymbol{\xi}_{t+h|t} = F\boldsymbol{\xi}_{t+h-1|t} = F^h\boldsymbol{\xi}_t$$
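The chain rule can be sketched as a short loop in which forecasts are fed back in place of future data; the coefficients and terminal observations below are hypothetical. With a zero intercept, the $h$-step forecast equals the first block of $F^h\boldsymbol{\xi}_t$, which gives a direct numerical check.

```python
import numpy as np

def chain_forecast(Y, c, Pis, h):
    """Return Y_{t+1|t}, ..., Y_{t+h|t} via the chain rule of forecasting."""
    p = len(Pis)
    hist = [Y[-j] for j in range(1, p + 1)]  # most recent observation first
    out = []
    for _ in range(h):
        f = c + sum(Pis[j] @ hist[j] for j in range(p))
        out.append(f)
        hist = [f] + hist[:-1]               # forecasts feed back in as "data"
    return np.array(out)

# Hypothetical VAR(2) with zero intercept (so it matches the state-space form above)
Pi1 = np.array([[0.5, 0.1], [0.2, 0.4]])
Pi2 = np.array([[0.1, 0.0], [0.0, 0.1]])
Y = np.array([[1.0, 2.0], [0.5, -1.0]])      # last two observations; last row is Y_t
fc = chain_forecast(Y, np.zeros(2), [Pi1, Pi2], 5)
```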