HG ECON 5101
Fourth lecture – 5 Feb. 2014
1. The ARIMA command in Stata
The ARIMA command includes the possibility of covariates. It estimates a model involving both $y_t$ and a set of covariates $x_t$, assuming $y_t - x_t'\beta \sim \mathrm{ARMA}(p,q)$, where $\beta$ is a vector of parameters, or

(1)  $y_t = x_t'\beta + \sum_{j=1}^{p} \rho_j \,(y_{t-j} - x_{t-j}'\beta) + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$

This includes the possibility that $y_t$ and $x_t$ are cointegrated in the sense that (i) both $y_t$ and $x_t$ are non-stationary [I(1)], and (ii) the linear combination $y_t - x_t'\beta$ is stationary [I(0)].
A potential application could be a situation where some other method (e.g., Søren Johansen’s method) has established that $y_t$ and $x_t$ are cointegrated, and an ARIMA analysis could be used to explore/confirm the nature of the stationarity of $y_t - x_t'\beta$.
(1) is a special case of an ARMAX model with covariates. However, Stata has not implemented a general ARMAX routine. General ARMAX models can be handled by Kalman filtering (the sspace command).
We will look at Example 4 in the arima documentation (pdf) in the Stata manual. The data concern two US series:
  personal consumption – consump
  money supply – m2
The analysis is an update inspired by an earlier analysis by Friedman and Meiselman (1963), who postulated the simple relationship

$\text{consump}_t = \beta_0 + \beta_1\, \text{m2}_t + u_t$
Load data with
use http://www.stata-press.com/data/r13/friedman2, clear
Stata cuts the data at 1982q1 because of an intervention by the Federal Reserve in 1982 to control inflation, i.e., a structural change in 1982.
Apparent non-stationary but cointegrated series in period 1959 – 1981.
[Figure: US personal consumption and money supply M2, quarterly 1950q1–2000q1. Series: personal consumption (current $, NIPA) and M2 money supply (St. Louis Fed web). M2 is missing at the start of the sample.]
Stata’s solution – postulate ARMA(1,1) for $y_t - x_t'\beta$:
. arima consump m2 if tin(, 1981q4), ar(1) ma(1)
ARIMA regression
Sample: 1959q1 - 1981q4 Number of obs = 92
Wald chi2(3) = 4394.80
Log likelihood = -340.5077 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| OPG
consump | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
consump |
m2 | 1.122029 .0363563 30.86 0.000 1.050772 1.193286
_cons | -36.09872 56.56703 -0.64 0.523 -146.9681 74.77062
-------------+----------------------------------------------------------------
ARMA |
ar |
L1. | .9348486 .0411323 22.73 0.000 .8542308 1.015467
|
ma |
L1. | .3090592 .0885883 3.49 0.000 .1354293 .4826891
-------------+----------------------------------------------------------------
/sigma | 9.655308 .5635157 17.13 0.000 8.550837 10.75978
------------------------------------------------------------------------------
Now, look at the residuals.
predict res1,resid
tsline res1 if tin(1959q2,1981q4)
[Figure: one-step residuals from model 1 (original data), 1960q1–1980q1. Caption: Residuals model 1 – orig data.]
Clearly not white noise (increasing variance). So we try to stabilize the variance
by taking the logs:
gen lm2=ln(m2)
gen lcons=ln( consump)
arima lcons lm2 if tin(, 1981q4), ar(1) ma(1)
ARIMA regression
Sample: 1959q1 - 1981q4 Number of obs = 92
Wald chi2(3) = 2600.44
Log likelihood = 299.9357 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| OPG
lcons | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lcons |
lm2 | .9822586 .0607853 16.16 0.000 .8631217 1.101396
_cons | .1832399 .4434234 0.41 0.679 -.6858541 1.052334
-------------+----------------------------------------------------------------
ARMA |
ar |
L1. | .9731574 .0247959 39.25 0.000 .9245583 1.021757
|
ma |
L1. | .2496818 .1356231 1.84 0.066 -.0161347 .5154983
-------------+----------------------------------------------------------------
/sigma | .0091153 .0007558 12.06 0.000 .007634 .0105967
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.
Residuals:
[Figure: one-step residuals from the ln-transformed model, 1960q1–1980q1. Caption: Residuals from ln-transformed model.]
Normality test (after generating the residuals with predict resln, resid):
swilk resln if tin(1959q2,1981q4)
Shapiro-Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+--------------------------------------------------
resln | 91 0.99384 0.470 -1.665 0.95207
2. Forecasting (prediction). (A good reference is Hamilton chap. 4)
Suppose we wish to predict a random variable $Y$ from a set of predictor rv’s, $X_1, X_2, \ldots, X_k$. A predictor is a function, $\hat Y = h(X_1, X_2, \ldots, X_k)$.
The best predictor in the minimum mean squared error (MSE) sense, i.e., minimizing $E(Y - \hat Y)^2$, is

(2)  $\hat Y = E(Y \mid X_1, X_2, \ldots, X_k)$
(see Hamilton chap. 4 or a lecture note on prediction on the web-page for Econ
4130 (stat 2) 2013)
In time series the function in (2) often turns out to be complicated, so it is common to use a next-best solution, the best linear approximation to (2) (also called the “linear projection predictor” from Hilbert space terminology; it can be shown always to exist),

(3)  $\hat Y = a_0 + a_1 X_1 + \cdots + a_k X_k$

where the constants $a_0, a_1, \ldots, a_k$ are determined to minimize the MSE $E(Y - \hat Y)^2$.
Note: If $(Y, X_1, X_2, \ldots, X_k)$ are jointly normal (Gaussian), then (2) and (3) are equivalent!
Solution:
Let $Y$ have expectation and variance, $\mu_Y$ and $\sigma_Y^2$.
Let $X' = (X_1, X_2, \ldots, X_k)$ have expectation $\mu' = (\mu_1, \mu_2, \ldots, \mu_k)$ and covariance matrix $\Sigma = \{\operatorname{cov}(X_i, X_j)\}_{i,j=1}^{k}$.
Write $a' = (a_1, a_2, \ldots, a_k)$, and the covariance between $Y$ and $X$ as the vector

$\Sigma_{XY} = \big(\operatorname{cov}(Y, X_1), \ldots, \operatorname{cov}(Y, X_k)\big)'$
Theorem 3. The MSE-minimizing constants are given as any solution to i. and ii. below. The solution is obtained by solving the equations

$E(Y - \hat Y) = 0$
$E[(Y - \hat Y) X_j] = 0$ for $j = 1, 2, \ldots, k$

leading to

i. $a_0 = \mu_Y - a_1\mu_1 - \cdots - a_k\mu_k$
ii. $\Sigma a = \Sigma_{XY}$ (the vector of $\operatorname{cov}(Y, X_j)$)
iii. If $\Sigma$ is non-singular, $a = \Sigma^{-1}\Sigma_{XY}$.
iv. The MSE of the solution becomes $\mathrm{MSE} = E(Y - \hat Y)^2 = \sigma_Y^2 - \Sigma_{XY}'\,\Sigma^{-1}\Sigma_{XY}$
Proof: Exactly the same proof as for the OLS estimators in a multiple
regression problem. Only replace all sample quantities, means and sample
covariances, by corresponding population quantities. Or see Hamilton chap. 4.
(End of proof.)
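As a numerical illustration of Theorem 3 (not part of the original notes), the sketch below uses made-up moments for a hypothetical joint distribution of $(Y, X_1, X_2, X_3)$, solves the normal equations $\Sigma a = \Sigma_{XY}$, and checks against a large simulated Gaussian sample that the residuals have mean zero and that the sample MSE matches iv.:

```python
# Sketch of Theorem 3 (best linear predictor) with arbitrary, made-up moments.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical joint distribution of (Y, X1, X2, X3): mean and covariance.
mu = np.array([1.0, 0.5, -0.3, 2.0])           # (mu_Y, mu_1, mu_2, mu_3)
L = rng.standard_normal((4, 4))
C = L @ L.T + np.eye(4)                        # a valid (positive definite) covariance
sigma2_Y = C[0, 0]
Sigma = C[1:, 1:]                              # covariance matrix of X
Sigma_XY = C[1:, 0]                            # cov(Y, X_j), j = 1..k

a = np.linalg.solve(Sigma, Sigma_XY)           # iii. a = Sigma^{-1} Sigma_XY
a0 = mu[0] - a @ mu[1:]                        # i.  a0 = mu_Y - a' mu_X
mse = sigma2_Y - Sigma_XY @ a                  # iv. sigma_Y^2 - Sigma_XY' Sigma^{-1} Sigma_XY

# Check on a large Gaussian sample: residuals should average ~0 and their
# mean square should be close to the theoretical MSE.
Z = rng.multivariate_normal(mu, C, size=200_000)
Y, X = Z[:, 0], Z[:, 1:]
resid = Y - (a0 + X @ a)
print(abs(np.mean(resid)) < 0.05)              # True
print(abs(np.mean(resid**2) - mse) < 0.5)      # True
```

In the Gaussian case this linear predictor coincides with the conditional expectation (2), which is why the sample checks come out so close.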
In a time series context we have, e.g., observed $Y_1, Y_2, \ldots, Y_t$ of a causal stationary series with expectation $\mu = E(Y_t)$ and autocovariance $\gamma(h)$, $h = 0, \pm 1, \pm 2, \ldots$.
Let $D_t = \{Y_t, Y_{t-1}, \ldots, Y_2, Y_1\}$ be all that we know at time $t$.
We want to forecast $Y_{t+j}$ at a future time $t + j$.
Notation: We write the best linear predictor as $\hat Y_{t+j|t} = E(Y_{t+j} \mid D_t)$ (if feasible), or the best linear approximation given by Theorem 3.
With regard to Theorem 3, we have in general $Y = Y_{t+j}$, $X' = (Y_t, Y_{t-1}, \ldots, Y_1)$, $a' = (a_1, a_2, \ldots, a_t)$, and $\Sigma = \{\gamma(i - j)\}_{i,j=1}^{t}$ (a $t \times t$ matrix), with solution

(4)  $a = \Sigma^{-1}\gamma_{t,t+j}$, where $\gamma_{t,t+j} = \big(\gamma(j), \gamma(j+1), \ldots, \gamma(j+t-1)\big)'$.

For large $t$, (4) becomes impractical and is replaced by various recursion formulas that do not involve matrix inversions (see Hamilton chap. 4 for some approaches).
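Formula (4) can be checked directly in a case where the answer is known in advance. For an AR(1) process with made-up coefficient $\phi$, where $\gamma(h) = \phi^{|h|}\gamma(0)$, the exact one-step linear predictor should put weight $\phi$ on $Y_t$ and zero on all earlier observations (a sketch, not Stata output):

```python
# Check of (4), a = Sigma^{-1} gamma_{t,t+j}, for an AR(1) with j = 1.
import numpy as np

phi, t = 0.7, 8                                  # assumed AR(1) coefficient and sample size
gamma = lambda h: phi ** abs(h) / (1 - phi**2)   # autocovariance (sigma_eps^2 = 1)

# Sigma_{ij} = gamma(i - j), with X = (Y_t, Y_{t-1}, ..., Y_1)
Sigma = np.array([[gamma(i - j) for j in range(t)] for i in range(t)])
rhs = np.array([gamma(1 + i) for i in range(t)])  # (gamma(1), ..., gamma(t))'

a = np.linalg.solve(Sigma, rhs)
print(np.round(a, 8))    # weight phi = 0.7 on Y_t, zero on the rest
```

This reproduces the AR(p) point made just below: for a pure AR process, only the last $p$ observations carry forecast weight.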
The forecast problem for AR(p) is simple. We then only need the $p$ values $Y_t, Y_{t-1}, \ldots, Y_{t-p+1}$ to predict $Y_{t+j}$.
The forecast problem for ARMA(p,q) with $q \ge 1$ is more complicated. We then need the whole history $Y_t, Y_{t-1}, \ldots, Y_1$ to predict $Y_{t+j}$.
To simplify the argument below I make a slightly stronger assumption on the white noise series $\varepsilon_t$, namely that

(5)  $E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots) = 0$ for all $t$ (in the causal case we look at here).

Note that (5) implies the weaker assumption that the $\varepsilon_t$’s are uncorrelated, but uncorrelated $\varepsilon_t$’s do not imply (5), except in the Gaussian case (!).
The (causal) AR(p) case:

(6)  $Y_{t+j} = \phi_0 + \phi_1 Y_{t+j-1} + \cdots + \phi_p Y_{t+j-p} + \varepsilon_{t+j}$

Let $D_t = \{Y_t, Y_{t-1}, \ldots\}$. From (6) we have

(7)  $\hat Y_{t+j|t} = E(Y_{t+j} \mid D_t) = E(\phi_0 + \phi_1 Y_{t+j-1} + \cdots + \phi_p Y_{t+j-p} + \varepsilon_{t+j} \mid D_t)$
   $= \phi_0 + \phi_1 E(Y_{t+j-1} \mid D_t) + \cdots + \phi_p E(Y_{t+j-p} \mid D_t) + E(\varepsilon_{t+j} \mid D_t)$
The solution of (6) (for all $j$) is (where $\mu = E(Y_s)$)

(8)  $Y_{t+j} = \mu + \varepsilon_{t+j} + \psi_1 \varepsilon_{t+j-1} + \psi_2 \varepsilon_{t+j-2} + \cdots$

showing that $D_t$ only depends on $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$. Hence (using (5)) the last term in (7) must be 0, i.e., $E(\varepsilon_{t+j} \mid D_t) = 0$ for $j \ge 1$. Then, from (7),

$\hat Y_{t+j|t} = \phi_0 + \phi_1 \hat Y_{t+j-1|t} + \cdots + \phi_p \hat Y_{t+j-p|t}$

In particular we get (noting that $E(Y_s \mid D_t) = Y_s$ for $s \le t$):

$\hat Y_{t+1|t} = \phi_0 + \phi_1 Y_t + \cdots + \phi_p Y_{t-p+1}$
$\hat Y_{t+2|t} = \phi_0 + \phi_1 \hat Y_{t+1|t} + \phi_2 Y_t + \cdots + \phi_p Y_{t-p+2}$
------
$\hat Y_{t+j|t} = \phi_0 + \phi_1 \hat Y_{t+j-1|t} + \cdots + \phi_p \hat Y_{t+j-p|t}$ for $j > p$
The (causal) ARMA(p,q) case:
Consider the simplest ARMA(0,1) = MA(1) case (see Hamilton chap. 4 for the general case):

(9)  $Y_{t+j} = \theta_0 + \varepsilon_{t+j} + \theta_1 \varepsilon_{t+j-1}$

where we assume that the MA error term is invertible (i.e., $|\theta_1| < 1$).
Here the problem arises for the first forecast,

$\hat Y_{t+1|t} = E(Y_{t+1} \mid D_t) = \theta_0 + E(\varepsilon_{t+1} \mid D_t) + \theta_1 E(\varepsilon_t \mid D_t)$

Since in the causal case (as for the AR(p) case) $D_t$ only depends on $\varepsilon_t, \varepsilon_{t-1}, \ldots$, we must have $E(\varepsilon_{t+1} \mid D_t) = 0$, so we get

(10)  $\hat Y_{t+1|t} = E(Y_{t+1} \mid D_t) = \theta_0 + \theta_1 E(\varepsilon_t \mid D_t)$
Using the invertibility of $\varepsilon_t$, we have from (9)

$\varepsilon_t = -\dfrac{\theta_0}{1+\theta_1} + Y_t - \theta_1 Y_{t-1} + \theta_1^2 Y_{t-2} - \cdots + (-\theta_1)^r Y_{t-r} + \cdots$

showing that

$\hat Y_{t+1|t} = E(Y_{t+1} \mid D_t) = \theta_0 + \theta_1 E(\varepsilon_t \mid D_t)$
  $= \theta_0 + \theta_1\left[-\dfrac{\theta_0}{1+\theta_1} + Y_t - \theta_1 Y_{t-1} + \theta_1^2 Y_{t-2} - \cdots + (-\theta_1)^r Y_{t-r} + \cdots\right]$

An approximation to this is then obtained by truncating the $Y$-series after $r = t - 1$, giving

$\hat Y_{t+1|t} \approx \theta_0 + \theta_1\left[-\dfrac{\theta_0}{1+\theta_1} + Y_t - \theta_1 Y_{t-1} + \theta_1^2 Y_{t-2} - \cdots + (-\theta_1)^{t-1} Y_1\right]$

This approximation will obviously improve as $t$ increases, since $\theta_1^t \xrightarrow[t\to\infty]{} 0$.

For $j \ge 2$, however, the prediction gets easy since then $E(\varepsilon_{t+j} \mid D_t) = E(\varepsilon_{t+j-1} \mid D_t) = 0$, implying simply

$\hat Y_{t+j|t} = \theta_0$
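The truncated MA(1) forecast can be verified numerically. In the sketch below the $\theta$'s are made up; we simulate the MA(1), compute the truncated one-step forecast from the observed $Y$'s only, and compare it with the (in practice unobservable) exact forecast $\theta_0 + \theta_1 \varepsilon_t$:

```python
# Truncated MA(1) one-step forecast vs the exact forecast built from eps_t.
import numpy as np

rng = np.random.default_rng(1)
theta0, theta1 = 1.0, 0.6              # made-up values; invertible since |theta1| < 1
T = 400
eps = rng.standard_normal(T)
Y = theta0 + eps + theta1 * np.concatenate(([0.0], eps[:-1]))

# eps_hat = -theta0/(1+theta1) + Y_t - theta1 Y_{t-1} + theta1^2 Y_{t-2} - ...
weights = (-theta1) ** np.arange(T)            # applied to Y_t, Y_{t-1}, ..., Y_1
eps_hat = -theta0 / (1 + theta1) + weights @ Y[::-1]
yhat_trunc = theta0 + theta1 * eps_hat
yhat_exact = theta0 + theta1 * eps[-1]         # uses the true eps_t

print(abs(yhat_trunc - yhat_exact) < 1e-9)     # True: theta1^T is negligible
```

The truncation error is of order $\theta_1^t$, which with $t = 400$ is far below machine precision, in line with the remark above that the approximation improves as $t$ grows.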
In the general ARMA(p,q) case we get similarly, for $1 \le j \le q$,

$\hat Y_{t+j|t} = \phi_0 + \phi_1 \hat Y_{t+j-1|t} + \cdots + \phi_p \hat Y_{t+j-p|t} + \theta_j \hat\varepsilon_t + \theta_{j+1} \hat\varepsilon_{t-1} + \cdots + \theta_q \hat\varepsilon_{t+j-q}$

where the $\hat\varepsilon_s$’s must be predicted from $Y_s, Y_{s-1}, \ldots$,
but for $j > q$ we get as before

$\hat Y_{t+j|t} = \phi_0 + \phi_1 \hat Y_{t+j-1|t} + \cdots + \phi_p \hat Y_{t+j-p|t}$

Exact predictors can be derived as well (see Hamilton chap. 4), but the formulas are slightly complicated.
3. The property of “mean reversion” for causal stationary processes.

The mean $\mu = E(Y_t)$ in a stationary time series $Y_t$ seems to play the role of an attractor in the series, in the sense that, if an observation of $Y_t$ is far away from $\mu$, the next observation has a tendency to be closer to $\mu$. Some economists call this tendency “mean reversion”.
In prediction the tendency becomes evident:
The general solution of a causal ARMA time series has the form

$Y_t = \mu + \varepsilon_t + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \cdots$

and at time point $t + j$ in the future

(11)  $Y_{t+j} = \mu + \varepsilon_{t+j} + \psi_1 \varepsilon_{t+j-1} + \cdots + \psi_j \varepsilon_t + \psi_{j+1} \varepsilon_{t-1} + \cdots$

Using (5), all $E(\varepsilon_{t+s} \mid D_t) = 0$ for $s > 0$. Hence the forecast becomes

(12)  $\hat Y_{t+j|t} = E(Y_{t+j} \mid D_t) = \mu + \psi_j \varepsilon_t + \psi_{j+1} \varepsilon_{t-1} + \cdots$

Since $\psi_j \to 0$ as $j \to \infty$, we see that the error term will approach 0 as $j$ increases (it is not hard to prove this formally using the Hilbert-space concept of “mean-square” convergence).
Hence $\hat Y_{t+j|t} \to \mu$ as $j$ increases, showing the attractor property of $\mu$.
As a contrast, it is easy to see that random walks do not share the mean
reversion property. Consider the simple RW,
$Y_t = \varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_t$

We get

$Y_{t+j} = \varepsilon_1 + \cdots + \varepsilon_t + \varepsilon_{t+1} + \cdots + \varepsilon_{t+j} = Y_t + \varepsilon_{t+1} + \cdots + \varepsilon_{t+j}$

implying

$\hat Y_{t+j|t} = E(Y_{t+j} \mid D_t) = Y_t$ for all $j > 0$.
Hence, no mean reversion in random walks.
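The contrast can be sketched numerically. The numbers below are made up; for an AR(1), $Y_t = \nu + \phi Y_{t-1} + \varepsilon_t$, the forecast formula $\hat Y_{t+j|t} = \mu + \phi^j (Y_t - \mu)$ is the special case of (12) with $\psi_j = \phi^j$, while the random-walk forecast stays flat at $Y_t$:

```python
# Mean reversion of AR(1) forecasts vs a flat random-walk forecast.
import numpy as np

phi, nu = 0.8, 2.0                       # made-up AR(1) parameters
mu = nu / (1 - phi)                      # stationary mean = 10
y_t = 16.0                               # current value, well above mu

j = np.arange(1, 51)
ar1_fc = mu + (phi ** j) * (y_t - mu)    # Yhat_{t+j|t} = mu + phi^j (Y_t - mu)
rw_fc = np.full(j.shape, y_t)            # random walk: Yhat_{t+j|t} = Y_t

print(round(ar1_fc[0], 2))               # 14.8: one step closer to mu
print(round(ar1_fc[-1], 2))              # 10.0: essentially at mu after 50 steps
print(rw_fc[-1])                         # 16.0: no mean reversion
```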
Final notes on forecasting.
See the example of forecasting the Norwegian lnGDP in lecture notes 3 (LN3). There we estimated the model on a reduced data set, leaving the last observations to be predicted. To achieve this we could use the Stata arima post-estimation command predict. The Stata manual did the same thing for the example we started with (see the pdf manual).
However, suppose we have a covariate time series $x_t$ as in the example, have used the whole series in estimation, and want to forecast future values beyond the data. Then the predict command cannot be used. We have to use the forecast command instead (as illustrated in the exercises for seminar 1). In order to utilize the covariate $x_t$ in the forecasting, we need to forecast $x_{t+j}$ first (using forecast) and use the predicted values of $x_{t+j}$ in the equation as a basis for forecasting $y_{t+j}$, again using forecast.
Suppose we have estimated an ARIMA(p,1,q) for $y_t$, which is the same as estimating an ARMA(p,q) for $\Delta y_t$. Having predicted values $\widehat{\Delta y}_s$, we can calculate predicted values for the original series $y_t$ by cumulative sums (called “integration” in the cointegration literature):

$\hat y_{t+1} = y_t + \widehat{\Delta y}_{t+1},\quad \hat y_{t+2} = y_t + \widehat{\Delta y}_{t+1} + \widehat{\Delta y}_{t+2},\quad \ldots$

This is done automatically in Stata by the option y for predict.
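The "integration" step is just a cumulative sum added to the last observed level. A minimal sketch with made-up numbers:

```python
# Integrating forecasts of the differenced series back to levels.
import numpy as np

y_last = 100.0                              # last observed level y_t (made up)
dy_hat = np.array([1.5, 1.2, 1.0, 0.8])     # forecasts of Delta y_{t+1..t+4} (made up)
y_hat = y_last + np.cumsum(dy_hat)          # forecast levels
print(y_hat)                                # [101.5 102.7 103.7 104.5]
```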
4. Introduction to VAR(p)
(An excellent reference for multivariate time series is H. Lütkepohl, “New Introduction to Multiple Time Series Analysis”, Springer Verlag, 2005.)
I use Lütkepohl’s notation in the following.
Let

$y_t = \begin{pmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{Kt} \end{pmatrix}$

be a $K$-dimensional vector of time series.
DEF. We say that $y_t$ is a vector autoregressive time series of order $p$ ($y_t \sim \mathrm{VAR}(p)$) if

(13)  $y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t$

where $\nu = (\nu_1, \nu_2, \ldots, \nu_K)'$ is a vector of constants, $A_1, A_2, \ldots, A_p$ are square $K \times K$ coefficient matrices, and $u_t = (u_{1t}, u_{2t}, \ldots, u_{Kt})'$ is a $K$-dimensional white noise vector satisfying

$E(u_t) = 0, \quad E(u_t u_t') = \Sigma_u, \quad E(u_t u_s') = 0 \ \text{for } t \ne s$

(i.e., $u_t$ is a special case of a covariance stationary vector process).
DEF. We say that $y_t$ is a structural vector autoregressive time series of order $p$ ($y_t \sim \mathrm{SVAR}(p)$) if there is a non-singular $K \times K$ matrix $B$ such that

(14)  $B y_t = \nu^* + A_1^* y_{t-1} + \cdots + A_p^* y_{t-p} + \varepsilon_t$

where $\varepsilon_t \sim \mathrm{WN}(0, \Sigma_\varepsilon)$.
Multiplying (14) by $B^{-1}$ we get the reduced form (13) with $\nu = B^{-1}\nu^*$, $A_j = B^{-1}A_j^*$ for $j = 1, \ldots, p$, and $u_t = B^{-1}\varepsilon_t \sim \mathrm{WN}(0, \Sigma_u)$, where $\Sigma_u = B^{-1}\Sigma_\varepsilon (B^{-1})'$. Although important in dynamic modeling, we will not discuss SVAR models further in this part of the course.
The VAR(1) case: We will first look at the VAR(1) model

(15)  $y_t = \nu + A_1 y_{t-1} + u_t$

Note. It turns out (see below) that the more general VAR(p) can be considered a special case of VAR(1)!
Successive substitution in (15) gives (where $I_K$ is the $K$-dimensional identity matrix)

(16)  $y_t = (I_K + A_1 + \cdots + A_1^{t-1})\nu + A_1^t y_0 + u_t + A_1 u_{t-1} + \cdots + A_1^{t-1} u_1$

For this to stabilize to something stationary, we must have that $A_1^t$ converges to a zero matrix (written $A_1^t \to 0$, i.e., a $K \times K$ matrix of zeroes). This happens if and only if all eigenvalues of $A_1$ have modulus strictly less than 1:
[Review of eigenvalues and eigenvectors. Let $A$ be a square $K \times K$ matrix. If $b$ is a vector $\ne 0$ and $\lambda$ a scalar such that $Ab = \lambda b$ (which is the same as $(A - \lambda I_K) b = 0$), we say that $b$ is an eigenvector and $\lambda$ a corresponding eigenvalue.
For the equation $(A - \lambda I_K) b = 0$ to be possible for a $b \ne 0$, the matrix $A - \lambda I_K$ must be singular with determinant = 0, i.e.,

$\det(A - \lambda I_K) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2K} \\ \vdots & \vdots & & \vdots \\ a_{K1} & a_{K2} & \cdots & a_{KK} - \lambda \end{vmatrix} = p(\lambda)$,

i.e., a polynomial of order $K$. $p(\lambda)$ must have $K$ roots, $\lambda_1, \lambda_2, \ldots, \lambda_K$, making $p(\lambda_i) = 0$, some of which may be complex.
Assume for simplicity that all $\lambda_1, \lambda_2, \ldots, \lambda_K$ are different and let $b_1, b_2, \ldots, b_K$ be the corresponding eigenvectors. Then

(17)  $A b_i = \lambda_i b_i$ for $i = 1, 2, \ldots, K$

Collecting all $b_i$ in a matrix, $B = (b_1, b_2, \ldots, b_K) \sim K \times K$, it can be shown that $B$ must be non-singular, and (17) becomes

(18)  $AB = (Ab_1, Ab_2, \ldots, Ab_K) = (\lambda_1 b_1, \lambda_2 b_2, \ldots, \lambda_K b_K) = B\Lambda$,

where

$\Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_K \end{pmatrix}$

is a diagonal matrix.
Multiplying (18) by $B^{-1}$ from the right gives

(19)  $A = B \Lambda B^{-1}$

Using (19), we get $A^2 = B\Lambda B^{-1}\, B\Lambda B^{-1} = B\Lambda I_K \Lambda B^{-1} = B\Lambda^2 B^{-1}$, $A^3 = B\Lambda^3 B^{-1}$, etc., and in general

(20)  $A^j = B \Lambda^j B^{-1}$
On the other hand, regular matrix multiplication gives

$\Lambda^2 = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_K^2 \end{pmatrix}$, and in general, $\Lambda^j = \begin{pmatrix} \lambda_1^j & 0 & \cdots & 0 \\ 0 & \lambda_2^j & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_K^j \end{pmatrix}$

So, if and only if all $|\lambda_i| < 1$, $\Lambda^j \xrightarrow[j\to\infty]{} 0$ (i.e., the zero matrix).
From (20) we then have

$\lim_{j\to\infty} A^j = \lim_{j\to\infty} B \Lambda^j B^{-1} = B \big(\lim_{j\to\infty} \Lambda^j\big) B^{-1} = 0$

(21) Basic stability condition for VAR(1):
In other words, the condition that $A^j \xrightarrow[j\to\infty]{} 0$ is that all eigenvalues of $A$ have modulus strictly less than 1.

Note. If some of the $\lambda_i$’s are equal, (20) is still valid, but with a slightly more complicated $\Lambda$ (“Jordan form”). $\Lambda^j$ can still be calculated, and the conclusion for stability is the same.
End of review. ]
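Condition (21) is easy to verify numerically for a given matrix. The sketch below uses an arbitrary made-up $2 \times 2$ matrix: we check that all eigenvalue moduli are below 1 and that a high power of the matrix is essentially zero:

```python
# Numerical check of the stability condition (21): A^j -> 0 iff all
# eigenvalues of A have modulus < 1.
import numpy as np

A = np.array([[0.5, 0.3],
              [0.2, 0.4]])                 # made-up example; eigenvalues 0.7 and 0.2
moduli = np.abs(np.linalg.eigvals(A))
print(np.all(moduli < 1))                  # True: stable

Aj = np.linalg.matrix_power(A, 200)
print(np.allclose(Aj, 0))                  # True: A^200 is numerically zero
```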
We need another matrix formula.

Lemma 1. Let $A$ be $K \times K$ satisfying the stability condition (21). Then
i. $I_K + A + A^2 + \cdots + A^{t-1} = (I_K - A)^{-1}(I_K - A^t)$
ii. The infinite matrix series converges as $t \to \infty$ and is equal to

$\sum_{j=0}^{\infty} A^j = I_K + A + A^2 + \cdots = (I_K - A)^{-1}$
[Proof. Usual matrix multiplication gives

$(I_K - A)(I_K + A + A^2 + \cdots + A^{t-1}) = I_K + A + \cdots + A^{t-1} - A - A^2 - \cdots - A^t = I_K - A^t$

We must have that $I_K - A$ is non-singular (i.e., $\det(I_K - A) \ne 0$), since 1 is not an eigenvalue of $A$. Then, multiplying both sides of the equality by $(I_K - A)^{-1}$ gives i.
ii.: By definition we have

$\sum_{j=0}^{\infty} A^j \overset{\text{Def}}{=} \lim_{t\to\infty}\big(I_K + A + A^2 + \cdots + A^{t-1}\big) = \lim_{t\to\infty}(I_K - A)^{-1}(I_K - A^t) = (I_K - A)^{-1}$

End of proof. ]
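Lemma 1 ii. is the matrix analogue of the scalar geometric series, and can be checked numerically for an arbitrary stable matrix (the truncation at $j = 200$ is harmless since $A^j \to 0$ geometrically):

```python
# Check of Lemma 1 ii: sum_j A^j = (I - A)^{-1} for a stable A.
import numpy as np

A = np.array([[0.5, 0.3],
              [0.2, 0.4]])                 # made-up stable matrix (eigenvalues < 1)
partial = sum(np.linalg.matrix_power(A, j) for j in range(200))
inv = np.linalg.inv(np.eye(2) - A)
print(np.allclose(partial, inv))           # True
```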
Under stability (all eigenvalues of $A_1$ have modulus less than 1) the solution (16) (reproduced)

(16)  $y_t = (I_K + A_1 + \cdots + A_1^{t-1})\nu + A_1^t y_0 + u_t + A_1 u_{t-1} + \cdots + A_1^{t-1} u_1$

will approach a stationary process as $t \to \infty$:

The first term: $(I_K + A_1 + \cdots + A_1^{t-1})\nu \xrightarrow[t\to\infty]{} (I_K - A_1)^{-1}\nu$
The second term $\to 0$.
The third term converges to a stationary process (when we imagine that the white noise process $u_t$ has been going on since $-\infty$):
Theorem 4. If all eigenvalues of $A_1$ have modulus less than 1, then
i. $z_t = u_t + A_1 u_{t-1} + A_1^2 u_{t-2} + \cdots = \sum_{j=0}^{\infty} A_1^j u_{t-j}$ is a well-defined random variable and the time series $z_t$ is stationary with $E(z_t) = 0$ and autocovariance matrices

$\Gamma_z(h) = E(z_{t+h} z_t') = \sum_{j=0}^{\infty} A_1^{j+h} \Sigma_u (A_1^j)'$

ii. The VAR(1) equation, $y_t = \nu + A_1 y_{t-1} + u_t$, has a causal stationary solution

$y_t = \mu + u_t + A_1 u_{t-1} + A_1^2 u_{t-2} + \cdots = \mu + \sum_{j=0}^{\infty} A_1^j u_{t-j}$

where $\mu = E(y_t) = (I_K - A_1)^{-1}\nu$, and autocovariance matrices

$\Gamma_y(h) = E[(y_{t+h} - \mu)(y_t - \mu)'] = \sum_{j=0}^{\infty} A_1^{j+h} \Sigma_u (A_1^j)'$

(The proof of this follows from elementary Hilbert space theory combined with the concept of “mean square convergence” of random variables – a convergence concept that is slightly stronger than convergence in probability and/or distribution.)

Note that the autocovariance matrices $\Gamma_y(h)$ do not depend on $t$, implying covariance stationarity of the $y_t$ series.
So, Theorem 4 shows that the stability condition implies the existence of a causal stationary solution for the VAR(1) model.
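The mean formula in Theorem 4 ii. can be checked by simulation. The $A_1$ and $\nu$ below are arbitrary made-up values (with $u_t \sim \mathrm{WN}(0, I_2)$); the sample mean of a long simulated path should be close to $\mu = (I_K - A_1)^{-1}\nu$:

```python
# Simulation check of Theorem 4 ii: stationary mean mu = (I - A1)^{-1} nu.
import numpy as np

rng = np.random.default_rng(2)
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])                 # made-up stable coefficient matrix
nu = np.array([1.0, 2.0])                   # made-up constant vector
mu = np.linalg.solve(np.eye(2) - A1, nu)    # theoretical mean

T, burn = 100_000, 500
u = rng.standard_normal((T + burn, 2))      # u_t ~ WN(0, I_2)
y = np.zeros(2)
acc = np.zeros(2)
for t in range(T + burn):
    y = nu + A1 @ y + u[t]                  # VAR(1) recursion (15)
    if t >= burn:                           # discard burn-in before averaging
        acc += y
print(np.max(np.abs(acc / T - mu)) < 0.1)   # True: sample mean close to mu
```

The burn-in discards the influence of the arbitrary starting value $y_0 = 0$, mirroring the $A_1^t y_0 \to 0$ term in (16).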
5. The stability condition for general VAR(p) models.

We first need the following useful lemma.
Notation. For any square matrix $B$, write the determinant $\det(B) = |B|$.

Lemma 2. The stability condition for VAR(1) (i.e., that all eigenvalues of $A_1$ have modulus less than 1) is equivalent to the following condition

(22)  $|I_K - A_1 z| \ne 0$ for all $z$ such that $|z| \le 1$

where $z$ is a scalar (possibly complex) variable.
[Proof. Write (22) (remember that a constant taken out of a determinant must be raised to the power of $K$):

$|I_K - A_1 z| = \big|(-z)\big(A_1 - \tfrac{1}{z} I_K\big)\big| = (-z)^K \,\big|A_1 - \tfrac{1}{z} I_K\big|$

(We can exclude the possibility $z = 0$ since $|I_K| = 1 \ne 0$.) Hence (22) is equivalent to

(23)  $\big|A_1 - \tfrac{1}{z} I_K\big| \ne 0$ for all $|z| \le 1$

Assume that all eigenvalues of $A_1$ have modulus strictly less than 1. Then we must have $|A_1 - \lambda I_K| \ne 0$ for any $\lambda$ with $|\lambda| \ge 1$, i.e., for any $z$ with $\big|\tfrac{1}{z}\big| \ge 1$, i.e., for any $z$ with $|z| \le 1$, so (23), and therefore (22), must be true.
Assume now that (22) is true. Then (23) is true, implying that $\big|A_1 - \tfrac{1}{z} I_K\big| = 0$ can have no solution with $\big|\tfrac{1}{z}\big| \ge 1$. In other words, the equation $|A_1 - \lambda I_K| = 0$ can have no solution with $|\lambda| \ge 1$. So all solutions (i.e., eigenvalues) must fulfill $|\lambda| < 1$, which is the stability condition.
End of proof. ]
Any VAR(p) can be formulated as a VAR(1):
If $y_t$ is a $K$-dimensional VAR(p):

(24)  $y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t$

we can define a $Kp$-dimensional VAR(1), $Y_t = \boldsymbol{\nu} + \mathbf{A} Y_{t-1} + U_t$, by putting

$Y_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix} \sim Kp \times 1$, $\quad\boldsymbol{\nu} = \begin{pmatrix} \nu \\ 0 \\ \vdots \\ 0 \end{pmatrix} \sim Kp \times 1$,

$\mathbf{A} = \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I_K & 0 & \cdots & 0 & 0 \\ 0 & I_K & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_K & 0 \end{pmatrix} \sim Kp \times Kp$, $\quad U_t = \begin{pmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{pmatrix} \sim Kp \times 1$

or

$\begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix} = \begin{pmatrix} \nu \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I_K & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_K & 0 \end{pmatrix} \begin{pmatrix} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{pmatrix} + \begin{pmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}$
The matrix $\mathbf{A}$ is sometimes called the companion matrix of the VAR(p).
Some manipulation of determinants (as done in Hamilton for the univariate AR(p) case) shows

(25)  $\det(I_{Kp} - \mathbf{A} z) = \det(I_K - A_1 z - \cdots - A_p z^p)$
From Lemma 2 we get our stability condition for the VAR(p) in (24):

(26)  The VAR(p), $y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t$, is stable if and only if
$\det(I_K - A_1 z - \cdots - A_p z^p) \ne 0$ for all $|z| \le 1$

The polynomial $\det(I_K - A_1 z - \cdots - A_p z^p)$ we may call the companion polynomial of the VAR(p) process.
The criterion (26) then says that the VAR(p) process is stable if and only if all roots of the companion polynomial $\det(I_K - A_1 z - \cdots - A_p z^p)$ are outside the unit circle (as in the univariate case).
Note also that, because of Theorem 4, the stability of a VAR(p) implies (via its VAR(1) representation) that it has a causal stationary solution which has an MA($\infty$) form.
Knowing that $y_t$ is stationary, the expected value, $\mu = E(y_t)$, is easily found by taking expectations of $y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t$ directly, giving

$\mu = \nu + A_1\mu + \cdots + A_p\mu$, or $(I_K - A_1 - \cdots - A_p)\mu = \nu$,

giving

(27)  $\mu = E(y_t) = (I_K - A_1 - \cdots - A_p)^{-1}\nu$

To find the MA($\infty$) solution for $y_t$, we may use the VAR(1) representation and Theorem 4. Introduce the $K \times Kp$ matrix

$J = (I_K : 0 : \cdots : 0)$

Then $y_t = J Y_t$, and from Theorem 4 (putting $\boldsymbol{\mu} = E(Y_t)$)

$y_t = J Y_t = J\boldsymbol{\mu} + J \sum_{j=0}^{\infty} \mathbf{A}^j U_{t-j} = \mu + \sum_{j=0}^{\infty} J \mathbf{A}^j U_{t-j}$
Example 1 (taken from Lütkepohl):
Consider the bivariate VAR(2) model

$y_t = \begin{pmatrix} .5 & .1 \\ .4 & .5 \end{pmatrix} y_{t-1} + \begin{pmatrix} 0 & 0 \\ .25 & 0 \end{pmatrix} y_{t-2} + u_t$

Is this a stable (and a causal stationary) process?
The companion polynomial becomes

$\det\left[\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} .5 & .1 \\ .4 & .5 \end{pmatrix} z - \begin{pmatrix} 0 & 0 \\ .25 & 0 \end{pmatrix} z^2\right] = \det\begin{pmatrix} 1 - .5z & -.1z \\ -.4z - .25z^2 & 1 - .5z \end{pmatrix} = 1 - z + 0.21 z^2 - 0.025 z^3$

with roots $z_1 = 1.3$, $z_2 = 3.55 + 4.26i$, $z_3 = 3.55 - 4.26i$.
Since $|z_2| = |z_3| = \sqrt{3.55^2 + 4.26^2} \approx 5.545$,
we see that all roots are outside the unit circle, so the process is stable and a causal stationary solution exists. (End of example.)
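Example 1 can be verified numerically in two equivalent ways (a sketch, not from the original notes): via the eigenvalues of the $4 \times 4$ companion matrix (which are the reciprocals of the roots of the companion polynomial, by (25)), and via the polynomial roots directly:

```python
# Stability check of Example 1 via the companion matrix and the polynomial.
import numpy as np

A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
A2 = np.array([[0.0, 0.0],
               [0.25, 0.0]])
A = np.zeros((4, 4))
A[:2, :2], A[:2, 2:] = A1, A2
A[2:, :2] = np.eye(2)                       # companion matrix [[A1, A2], [I, 0]]

moduli = np.abs(np.linalg.eigvals(A))
print(np.all(moduli < 1))                   # True: stable
print(round(moduli.max(), 3))               # 0.769, i.e. approx 1/1.3

# Same conclusion from 1 - z + 0.21 z^2 - 0.025 z^3 (coefficients, highest power first):
roots = np.roots([-0.025, 0.21, -1.0, 1.0])
print(round(np.abs(roots).min(), 2))        # 1.3: all roots outside the unit circle
```

Both checks agree with the hand calculation: the smallest root modulus is 1.3 > 1, equivalently the largest companion eigenvalue modulus is about 0.769 < 1.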
(A little bit on forecasting and estimation comes next lecture.)