Further Non-Stationarity Notes


    University of Oxford

    Time Series Analysis

    Section III

    Michaelmas Term, 2010

    Department of Statistics, 1 South Parks Road,

    Oxford OX1 3TG


Contents

1 Non-Stationary Time Series
  1.1 Phenomenology of Non-Stationarity
  1.2 Trend Stationary vs Difference Stationary

2 Unit Root Tests
  2.1 General Issues in Unit Root Testing
  2.2 Dickey-Fuller Tests
    2.2.1 Model 1: No Drift or Trend
    2.2.2 Model 2: Drift but no Trend
    2.2.3 Model 3: Drift and Trend
    2.2.4 Perron Sequential Testing Procedure for Unit Roots
  2.3 Augmented Dickey-Fuller Regression
    2.3.1 Non-IID Errors
    2.3.2 MA(q) and AR(k) Terms
    2.3.3 The ADF Test
    2.3.4 ADF Test Procedure: Phase 1
    2.3.5 ADF Test Procedure: Phase 2

3 Spurious Regressions

4 Multivariate Time Series
  4.1 Vector Time Series Models
    4.1.1 Covariance and Correlation Matrix Functions
    4.1.2 Moving Average and Autoregressive Vector Models

5 Cointegration
  5.1 The Error Correction Model
  5.2 Vector Error Correction Models
  5.3 Cointegration
  5.4 Engle-Granger Estimation


    1

    Non-Stationary Time Series

In Section I we examined several different ARIMA models. We determined conditions under which these models were stationary. Having imposed these stationarity conditions we then computed the ACF for each model. In this chapter we will examine some models for non-stationary series.

    1.1 Phenomenology of Non-Stationarity

In this section we will consider some of the implications of non-stationarity. A stationary series has a well-defined mean around which it can fluctuate with constant finite variance. This is not necessarily true for a non-stationary series. The issues involved can best be illustrated by example. Consider the AR(1) model

y_t = (1/r) y_{t-1} + ε_t,   t = 1, 2, 3, . . . .   (1.1)

This is equivalent to

(1 - (1/r)B) y_t = ε_t,   t = 1, 2, 3, . . . ,   (1.2)


and the characteristic polynomial (1 - (1/r)z) has root z = r. The series behaves differently according to whether r > 1, r = 1 or r < 1. Equation (1.2) has solution:

y_t = ε_t + (1/r) ε_{t-1} + . . . + (1/r)^{t-1} ε_1 + (1/r)^t y_0   (1.3)

where y_0 is the value of y_t at t = 0.

It is clear that when r > 1 the influence of the initial term (1/r)^t y_0 and the impulses (1/r)^i ε_{t-i} dies out as they move further into the past. For r > 1 therefore we see that the present is more important than the past. For these values of r the series is stationary and its behaviour will consist of oscillation around the mean value 0.

When r = 1 past shocks and the initial value have the same weight, the past being as important as the present.

And for r < 1 the weights on past terms increase with t: the past is more important than the present. Here the series rapidly diverges towards +∞ or -∞. This behaviour is termed explosive and is of course counter-intuitive in almost all situations. For that reason we can safely assume that in time series models of real-life data all roots are either on or outside the unit circle.
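To make the three regimes concrete, here is a minimal simulation sketch in Python (plain numpy; the values r = 2, 1 and 0.9 and the helper name are illustrative choices, not from the notes):

```python
import numpy as np

def ar1_path(r, n=50, y0=0.0, seed=0):
    """Simulate y_t = (1/r) * y_{t-1} + eps_t of model (1.1), IID N(0,1) shocks."""
    rng = np.random.default_rng(seed)
    y, prev = np.empty(n), y0
    for t, eps in enumerate(rng.standard_normal(n)):
        prev = prev / r + eps
        y[t] = prev
    return y

# r = 2.0: root outside the unit circle -- stationary oscillation around 0.
# r = 1.0: root on the unit circle -- a random walk; shocks never die out.
# r = 0.9: root inside the unit circle -- explosive divergence.
for r in (2.0, 1.0, 0.9):
    print(f"r = {r}: max |y_t| over 50 steps = {np.abs(ar1_path(r)).max():.1f}")
```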


    1.2 Trend Stationary vs Difference Stationary

    We have seen in the last chapter that differencing certain non-stationary time

    series can produce stationary series.

For example if y_t is an ARIMA(p, d, q) process where the p roots of the AR characteristic equation are all outside the unit circle, then the series y_t will be non-stationary because of the presence of the d unit roots in the model. If however we consider the differenced series z_t = ∇^d y_t, this series will be stationary, as all p of the roots of its AR characteristic equation are greater than one in magnitude.

Instead of differencing a series to achieve stationarity one might think of removing a polynomial trend from y_t to leave a stationary series. In fact this technique works for some series but not for others.

Consider the series

y_t = α + βt + ε_t,   ε_t IID white noise.   (1.4)

If we remove the linear term α + βt from this series y_t then the series z_t = y_t - α - βt with which we are left is clearly a stationary series:

z_t = ε_t.   (1.5)

Of course differencing this series would also leave a stationary series. This can be seen from the following:

y_t = α + βt + ε_t
y_{t-1} = α + β(t - 1) + ε_{t-1}   (1.6)
∇y_t = β + ∇ε_t   (1.7)

Patently this differenced series ∇y_t is stationary.

Nomenclature

Before proceeding we note the following:

Drift: Constant terms, such as α in (1.4), are referred to as drift terms.

Trend: Constant multiples of time t, such as βt in (1.4), are referred to as trend terms.

Next consider the process

y_t = α + φ y_{t-1} + e_t,   e_t IID white noise.   (1.8)

If φ = 1 this series is non-stationary and if the initial value of y_t at t = 0 is y_0 then, by iteration, we have:

y_t = y_0 + αt + Σ_{j=1}^{t} e_j.   (1.9)


If we remove a linear trend y_0 + αt from this series we are still left with a non-stationary series Σ_{j=1}^{t} e_j. If however we were to difference the series y_t then we would find ∇y_t = α + e_t, which is stationary with mean α.

Models such as (1.8), which require differencing to achieve stationarity (and cannot be made stationary by just removing a linear trend), are called Difference-Stationary series, whereas models which are stationary upon removal of a linear trend, e.g. (1.4), are called Trend-Stationary.
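The distinction is easy to see in simulation. The following rough sketch (the parameter values are arbitrary) detrends and differences one series of each type; detrending removes the persistence of (1.4) but not of the random walk with drift, i.e. (1.8) with φ = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
t = np.arange(n)
alpha, beta = 0.5, 0.1

series = {
    # Trend-stationary (1.4): y_t = alpha + beta*t + eps_t.
    "trend-stationary": alpha + beta * t + rng.standard_normal(n),
    # Difference-stationary (1.8) with phi = 1: a random walk with drift alpha.
    "difference-stationary": np.cumsum(alpha + rng.standard_normal(n)),
}

def lag1_autocorr(z):
    return np.corrcoef(z[:-1], z[1:])[0, 1]

for name, y in series.items():
    detrended = y - np.polyval(np.polyfit(t, y, 1), t)   # remove fitted line
    differenced = np.diff(y)
    # The detrended random walk keeps a lag-1 autocorrelation near 1;
    # differencing removes the persistence in both cases.
    print(f"{name}: detrended acf(1) = {lag1_autocorr(detrended):.2f}, "
          f"differenced acf(1) = {lag1_autocorr(differenced):.2f}")
```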

Note that the general ARIMA model (from Section I):

φ(B) ∇^d y_t = θ(B) ε_t,

could in fact be written as

φ(B) ∇^d [y_t - (β_0 + β_1 t + . . . + β_{d-1} t^{d-1})] = θ(B) ε_t   (1.10)

because ∇^d (β_0 + β_1 t + . . . + β_{d-1} t^{d-1}) = 0. Thus (1.10) automatically includes polynomial trends of degree d - 1. Including a polynomial of degree d + k_0 would give

φ(B) ∇^d [y_t - (β_0 + β_1 t + . . . + β_{d+k_0} t^{d+k_0})] = θ(B) ε_t   (1.11)

which is equivalent to

φ(B) ∇^d y_t = c(t) + θ(B) ε_t,   (1.12)

where c(t) is a polynomial of degree k_0.

Both the Trend-Stationary and Difference-Stationary models allow for the inclusion of a polynomial trend, but in the Difference-Stationary case the deviations from the polynomial trend still require differencing to achieve stationarity.

Choosing not to difference a series when in fact differencing is required can lead to serious consequences such as spurious regression (c.f. Section 3), which is one consequence of non-stationarity. As was seen from equation (1.9), removing a linear trend does not solve the non-stationarity problem if the series is actually difference stationary.

Unnecessary differencing, which has the benefit of at least ensuring stationarity, has far less serious consequences: it can lead to inefficient parameter estimates and over-conservative forecast intervals. These parameter estimates are, however, unbiased and consistent.

In the next section we will examine how to determine whether a series is Difference-Stationary.


    2

    Unit Root Tests

    2.1 General Issues in Unit Root Testing

We have seen the importance of the presence of unit roots in a time series. In practice how should one decide whether a series contains a unit root or not? One could examine plots of the time series, looking for wandering behaviour that would indicate non-stationarity. Alternatively one could look at the sample auto-correlation function (ACF) of the original series and of the differenced series; if the auto-correlations don't die out quickly then this would indicate non-stationarity.

There are, however, problems with relying on graphical methods; the human eye can deceive. Formal tests for unit roots have been developed and we will now look at two of these tests in detail: the Dickey-Fuller and Augmented Dickey-Fuller tests. Formulating a set of hypotheses to test is our first consideration.

A general unit root process can be written as

Φ(L) ∇y_t = Θ(L) e_t,   Θ(1) ≠ 0.   (2.1)


This could be tested against the alternative hypothesis

Φ(L)(1 - φL) y_t = Θ(L) e_t,   -1 < φ < 1.   (2.2)

The way this test has been formulated indicates that we are choosing a null hypothesis of a unit root with stationary alternatives. So we accept a unit root unless there is significant evidence that the process is stationary. We could have decided to have stationarity as the null.

The reason we choose the hypotheses to have a unit-root null is because of the relative importance of the two errors in this testing procedure. If we decide the series is stationary when in fact it contains a unit root, then any forecast intervals we derive will be too narrow and we will be over-confident of our forecasts. If however we conclude the series possesses a unit root when in fact it is stationary, then we would difference a stationary series. The consequences of that are not so serious: we would produce over-conservative forecast intervals.

    2.2 Dickey-Fuller Tests

    In this Section we examine the Dickey-Fuller (DF) approach to testing for a unit

    root.

2.2.1 Model 1: No Drift or Trend

The simplest example of the procedure is in the AR(1) model with no drift or time trend term:

y_t = φ y_{t-1} + e_t.   (2.3)

We assume here that the e_t terms are IID white noise and we are interested in testing the hypotheses:

H_0 : φ = 1   vs   H_A : φ < 1.


In practice it is easier to use a re-parameterisation of (2.3):

∇y_t = δ y_{t-1} + e_t,   (2.4)

where δ = φ - 1, so that we are now testing

H_0 : δ = 0   vs   H_A : δ < 0.

Considering (2.4), we see that we can test this hypothesis by regressing ∇y_t on y_{t-1} and computing the standard least squares t-statistic for testing that the coefficient δ equals 0.

This test statistic, which we will call τ, is produced automatically in the computer output obtained from most statistical packages by running a regression for equation (2.4).

There is one important thing to note, however. If the true process is (2.3) with φ = 1 then, because of non-stationarity, this t-test statistic does not, in fact, follow the standard t-distribution. The asymptotic theory of this model has been developed using Brownian motion techniques. Dickey and Fuller have used Monte-Carlo simulation to compute a set of critical values for this test and for other variations on this model. We present some of these critical values in Table 2.4.
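As a minimal sketch (assuming the statsmodels package is available; its regression='n' option is the no-drift, no-trend case, spelled 'nc' in older releases), the τ statistic is simply the OLS t-ratio from (2.4):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(250))     # a random walk, so H_0 is true

# Model 1 regression by hand: dy_t = delta * y_{t-1} + e_t, no constant.
dy, ylag = np.diff(y), y[:-1]
tau = sm.OLS(dy, ylag).fit().tvalues[0]
print(f"tau = {tau:.3f}  (compare with Table 2.4: -1.95 at the 5% level)")

# Packaged equivalent; maxlag=0 gives the plain (non-augmented) DF test.
stat, pval, *_ = adfuller(y, maxlag=0, regression="n", autolag=None)
print(f"adfuller: tau = {stat:.3f}, p-value = {pval:.3f}")
```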

2.2.2 Model 2: Drift but no Trend

Consider now a model with drift:

y_t = φ y_{t-1} + α + e_t   (2.5)


or the reparameterisation

∇y_t = δ y_{t-1} + α + e_t.   (2.6)

Again we are testing the hypotheses

H_0 : φ = 1   vs   H_A : φ < 1

or equivalently

H_0 : δ = 0   vs   H_A : δ < 0.

The test statistic in this test, τ_μ, is again the standard least squares t-statistic obtained by running a regression for equation (2.6).

If the true data generating process has α = 0, so that the real process is actually (2.3), and if φ = 1, then this test statistic, τ_μ, follows a nonstandard distribution. Critical values for this distribution, which is different from the one for the τ statistic, have also been produced by Dickey and Fuller (c.f. Table 2.4). If however the true process contains a unit root but a non-zero drift term ((2.5) with φ = 1), then τ_μ follows the standard normal distribution.

2.2.3 Model 3: Drift and Trend

Finally consider a model with drift and a trend:

y_t = φ y_{t-1} + α + βt + e_t   (2.7)

∇y_t = δ y_{t-1} + α + βt + e_t.   (2.8)

Here the Dickey-Fuller test statistic, τ_τ, is again the standard least squares t-statistic obtained by running a regression for equation (2.8).

Again, if the true data process is actually (2.3) with φ = 1 then this test statistic, τ_τ, also follows a nonstandard distribution. Critical values have again been produced by Dickey and Fuller (c.f. Table 2.4). If the true process contains a unit root but a non-zero drift term ((2.5) with φ = 1), then τ_τ follows a nonstandard distribution. Lastly, if the true process contains a unit root, a non-zero drift term and a non-zero trend term ((2.7) with φ = 1), then τ_τ follows a standard normal distribution.

We summarise these results in Table 2.1.

Estimating Equation                Statistic   True Model             Critical Values
∇y_t = δy_{t-1} + e_t              τ           ∇y_t = e_t             Table 2.4
∇y_t = δy_{t-1} + α + e_t          τ_μ         ∇y_t = e_t             Table 2.4
∇y_t = δy_{t-1} + α + e_t          τ_μ         ∇y_t = α + e_t         Standard Normal
∇y_t = δy_{t-1} + α + e_t          τ_μ         ∇y_t = α + βt + e_t    Standard Normal
∇y_t = δy_{t-1} + α + βt + e_t     τ_τ         ∇y_t = e_t             Table 2.4
∇y_t = δy_{t-1} + α + βt + e_t     τ_τ         ∇y_t = α + e_t         Table 2.4
∇y_t = δy_{t-1} + α + βt + e_t     τ_τ         ∇y_t = α + βt + e_t    Standard Normal

Table 2.1: Dickey-Fuller τ-Test Statistics

As well as using these three τ-statistics to test the hypothesis of a unit root, it is also possible to test some joint hypotheses for the presence of an intercept, a time trend and a unit root. These joint tests use test statistics which are calculated as standard F-statistics comparing restricted and unrestricted residual sums of squares. However, again due to non-stationarity, the distributions are non-standard. Dickey and Fuller present tables of critical values (c.f. Table 2.5) for the three Φ-statistics, which are defined in Table 2.2.

We have now seen various tests of the unit-root hypotheses. Which test statistic we should use depends not only on the estimating equation we will use but also on what the true data generating process is. Of course we do not know in advance which is the correct data generating process, so we need a systematic testing procedure.


Estimating Equation                Statistic   Hypotheses                         Critical Values
y_t = α + φy_{t-1} + e_t           Φ_1         H_0 : (α, φ) = (0, 1)              Table 2.5
                                               H_A : (α, φ) ≠ (0, 1)
y_t = α + βt + φy_{t-1} + e_t      Φ_2         H_0 : (α, β, φ) = (0, 0, 1)        Table 2.5
                                               H_A : (α, β, φ) ≠ (0, 0, 1)
y_t = α + βt + φy_{t-1} + e_t      Φ_3         H_0 : (α, β, φ) = (α, 0, 1)        Table 2.5
                                               H_A : (α, β, φ) ≠ (α, 0, 1)

Table 2.2: Dickey-Fuller Φ-Test Statistics


    2.2.4 Perron Sequential Testing Procedure for Unit Roots

Perron described such a sequential testing procedure and we have outlined it in Table 2.3. We begin using the most general model; if we fail to reject the null hypothesis of a unit root we continue through the steps, stopping as soon as we can reject a null of a unit root.

NOTE: Steps 2a and 4a are only performed if we reject the nulls in steps 2 and 4 respectively.

The reasoning behind this procedure is as follows. In step 1 we use non-standard critical values. Suppose we perform step 1 and fail to reject the null hypothesis of a unit root; we are then left to decide whether this is because there is a unit root in the process or because we have assumed the wrong underlying data generating process. So in step 2 we try to establish if indeed the underlying process we have assumed should be different.


Step   Estimating Equation: ∇y_t =      Statistic   Hypotheses                       Critical Values
1      α + βt + δy_{t-1} + e_t          τ_τ         H_0 : δ = 0 vs H_A : δ < 0       Table 2.4
2      α + βt + δy_{t-1} + e_t          Φ_3         H_0 : (α, β, δ) = (α, 0, 0)      Table 2.5
                                                    H_A : (α, β, δ) ≠ (α, 0, 0)
2a     α + βt + δy_{t-1} + e_t          t(δ)        H_0 : δ = 0 vs H_A : δ < 0       St. Normal
3      α + δy_{t-1} + e_t               τ_μ         H_0 : δ = 0 vs H_A : δ < 0       Table 2.4
4      α + δy_{t-1} + e_t               Φ_1         H_0 : (α, δ) = (0, 0)            Table 2.5
                                                    H_A : (α, δ) ≠ (0, 0)
4a     α + δy_{t-1} + e_t               t(δ)        H_0 : δ = 0 vs H_A : δ < 0       St. Normal
5      δy_{t-1} + e_t                   τ           H_0 : δ = 0 vs H_A : δ < 0       Table 2.4

Table 2.3: Perron Sequential Procedure for the Dickey-Fuller Unit Root Test

Now in step 2 we will either reject the null H_0 : (α, β, δ) = (α, 0, 0) or not. If we do reject the null in step 2 then either β ≠ 0 or δ ≠ 0; but since we did not reject the null of a unit root in step 1, we must have β ≠ 0. So we now conclude that there is a significant trend in the process and go to step 2a.

    If, however, we do not reject the null in step 2 we conclude that there is no

    evidence of a trend in the model and so we go to step 3 where we use an estimating

    equation that does not include a trend.


Referring to Table 2.1 we see that in the presence of a deterministic time trend the τ-statistic is asymptotically standard normal. So in step 2a, instead of using the DF critical values, we should use a standard t-statistic (with ∞ degrees of freedom) to test for a unit root.

In step 3 we test for a unit root with a drift term in the model. If we fail to reject the null in this step we proceed to step 4, where we test jointly for the presence of a unit root and a drift term. If we fail to reject this null we conclude that the true process does not contain a drift term and we move to step 5.

If, however, we do reject the null in step 4 then this can only be because there is a drift term present in the true model, which would imply that the τ_μ statistic should follow the standard normal distribution. So we move to step 4a.

Having gone through all these steps, if we cannot reject the null of a unit root we conclude that a unit root is present in the model.

It should be noted that this test procedure is influenced by the fact that including additional deterministic terms in the estimating model, beyond what is present in the true process, increases the chance of a type II error (accepting the null of a unit root when in fact the true process is stationary). That is, the power of the test decreases against alternatives of stationarity.

This can be seen by looking at the ordering of the DF critical values: τ_τ < τ_μ < τ. Suppose that the true process is given by ∇y_t = e_t; for the lower-tailed test H_0 : δ = 0 vs H_A : δ < 0, the ordering of the DF critical values means that it will be harder to reject the null of a unit root when estimation uses a model with drift and a trend than when it uses only a drift, and harder with a drift than when it uses neither.

The sequential procedure of Perron seeks to minimize the possibility of making this kind of error. Having said that, we must of course be aware that the usual issues associated with multiple testing remain.


             Critical values for τ       Critical values for τ_μ     Critical values for τ_τ
Sample size  0.01    0.05    0.10        0.01    0.05    0.10        0.01    0.05    0.10
25           -2.66   -1.95   -1.60       -3.75   -3.00   -2.63       -4.38   -3.60   -3.24
50           -2.62   -1.95   -1.61       -3.58   -2.93   -2.60       -4.15   -3.50   -3.18
100          -2.60   -1.95   -1.61       -3.51   -2.89   -2.58       -4.04   -3.45   -3.15
t-dist. (∞ d.f.)  -2.33  -1.65  -1.28    -2.33   -1.65   -1.28       -2.33   -1.65   -1.28

Table 2.4: Dickey-Fuller Critical Values

             Critical values for Φ_1     Critical values for Φ_2     Critical values for Φ_3
Sample size  0.01    0.05    0.10        0.01    0.05    0.10        0.01    0.05    0.10
25           7.88    5.18    4.12        8.21    5.68    4.67        10.61   7.24    5.91
50           7.06    4.86    3.94        7.02    5.13    4.31        9.31    6.73    5.61
100          6.70    4.71    3.86        6.50    4.88    4.16        8.73    6.49    5.47
250          6.52    4.63    3.81        6.22    4.75    4.07        8.43    6.34    5.39
500          6.47    4.61    3.79        6.15    4.71    4.05        8.34    6.30    5.36
∞            6.43    4.59    3.78        6.09    4.68    4.03        8.27    6.25    5.34

Table 2.5: Dickey-Fuller Φ Critical Values
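Returning to the sequential procedure, the sketch below compresses it to steps 1, 3 and 5 of Table 2.3 (a simplification under stated assumptions: the joint Φ tests of steps 2 and 4 are omitted, and statsmodels' reported p-values stand in for the table look-ups; its regression options 'ct', 'c' and 'n' correspond to trend-and-drift, drift-only and neither):

```python
from statsmodels.tsa.stattools import adfuller

def sequential_unit_root_test(y, alpha=0.05):
    """Simplified Perron sequence: start from the most general deterministic
    specification and stop as soon as the unit-root null is rejected.
    Steps 2/2a and 4/4a (the joint Phi tests) are omitted in this sketch."""
    for step, reg in [(1, "ct"), (3, "c"), (5, "n")]:
        stat, pval, *_ = adfuller(y, regression=reg, autolag="AIC")
        print(f"step {step} (regression='{reg}'): tau = {stat:.3f}, p = {pval:.3f}")
        if pval < alpha:
            return f"unit root rejected at step {step}"
    return "unit root not rejected at any step: treat the series as I(1)"
```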

2.3 Augmented Dickey-Fuller Regression

In practice we cannot always use the Dickey-Fuller tests which were described in the previous section because the assumptions required are too strong. Recall that in the basic Dickey-Fuller tests we were dealing with AR(1) processes with errors e_t that were IID white noise. In reality there are complications which would prevent the use of these DF tests.

The first such complication is: what should we do if the e_t are not IID white noise? In that instance the Dickey-Fuller critical values may not be valid. The second problem arises when the process follows a more general model than an AR(1) model: AR(k) models or mixed models with MA terms. In the next two sections we will examine both of these problems.

2.3.1 Non-IID Errors

Suppose the true model is:

y_t = φ y_{t-1} + α + βt + e_t   (with α = 0, β = 0, φ = 1)   (2.9)

where now e_t is not IID but instead is a stationary AR(k):

e_t = ρ_1 e_{t-1} + ρ_2 e_{t-2} + . . . + ρ_k e_{t-k} + ε_t,   ε_t IID white noise.

Equation (2.9) can be re-parameterised as:

∇y_t = δ y_{t-1} + α + βt + e_t   (2.10)
     = δ y_{t-1} + α + βt + ρ_1 e_{t-1} + . . . + ρ_k e_{t-k} + ε_t   (2.11)

We now make use of the fact that in the true model ∇y_t = e_t (c.f. (2.9)) to rewrite Equation (2.11) as the AR(k) (c.f. Section I) process:

∇y_t = δ y_{t-1} + α + βt + ρ_1 ∇y_{t-1} + . . . + ρ_k ∇y_{t-k} + ε_t.   (2.12)


So an AR(1) process with autocorrelated errors can be transformed into an AR(k) process with IID white noise errors.

2.3.2 MA(q) and AR(k) Terms

What about models with MA terms? More generally, how should we test the hypotheses (2.1) vs (2.2) for the general unit root model? That is:

H_0 : Φ(L) ∇y_t = Θ(L) e_t,   Θ(1) ≠ 0

vs

H_A : Φ(L)(1 - φL) y_t = Θ(L) e_t,   -1 < φ < 1.

A possible approach might be suggested by the fact that a general ARMA process can be approximated by an AR model of sufficiently high order to ensure white noise residuals. The usefulness of this approach was confirmed by Said and Dickey. They showed that an asymptotically valid unit root test for mixed models with AR and MA components is obtained if the data are analysed as if the process were an autoregressive model, where the order of the AR model is related to n, the sample size.

So both of the problems with the Dickey-Fuller test are solved if we can test for unit roots in AR(k) processes. Dickey and Fuller have developed such a test; it is called the Augmented Dickey-Fuller (ADF) test.


2.3.3 The ADF Test

We recall that a general AR(k) process,

y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + . . . + φ_k y_{t-k} + e_t,

can be written as:

∇y_t = δ y_{t-1} + ψ_1 ∇y_{t-1} + . . . + ψ_{k-1} ∇y_{t-k+1} + e_t.

This version of the process is often called an Error Correction Mechanism (ECM) and Section 5.1 contains a detailed discussion of such models. We saw earlier that this process contains a unit root if δ = 0 and is stationary if δ < 0. Dickey and Fuller showed that in large samples the t-statistic δ̂/se(δ̂) follows the same distribution as the τ-statistic in the Dickey-Fuller test.

We can generalize to include drift and trend terms:

∇y_t = α + βt + δ y_{t-1} + ψ_1 ∇y_{t-1} + . . . + ψ_{k-1} ∇y_{t-k+1} + e_t.   (2.13)

Dickey and Fuller have also shown that in large samples the ADF versions of not just τ but of all the statistics τ, τ_μ, τ_τ, Φ_1, Φ_2, Φ_3 follow the same distributions as in the Dickey-Fuller case (c.f. Table 2.1).

As mentioned before, an ARMA model with unknown orders for the AR and MA components can be approximated by an AR(k) process, so long as k is sufficiently large to ensure white noise residuals. The order k will increase as the sample size increases; Schwert suggests using

k = int[12 (T/100)^{1/4}],   (2.14)

where int[·] represents the integer part.
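As a one-line helper, (2.14) transcribes directly:

```python
def schwert_lag(T):
    """Schwert's rule of thumb (2.14): k = int(12 * (T / 100) ** (1/4))."""
    return int(12 * (T / 100) ** 0.25)

print(schwert_lag(100), schwert_lag(25))   # 12 8
```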


Choosing the correct lag length is important. Including too few lags will mean that the errors e_t will still be non-stationary, and this will increase the probability of a type I error. Including too many lags may reduce the power of the test, as the model will include too many unnecessary additional parameters. However, it is better to include too many lags than too few: if we include too many, the regression can set the unnecessary ones to zero, while perhaps losing some efficiency.

2.3.4 ADF Test Procedure: Phase 1

As discussed, the order k of the levels autoregression, or k - 1 in the ECM (2.16), is unknown, so our first task is to decide on this using the data, including as many lags as is appropriate to ensure that the residuals are IID white noise. One could begin here by examining PACF and ACF plots of the differenced series ∇y_t to try and determine how many lags should be included. A significant PACF at lag j would indicate one should fit k - 1 = j lags in the Error Correction Model.

Alternatively one could use (2.14) initially, then fit ARIMA models with the order of the AR part equal to k - 1, the order of integration d equal to 1 and no MA terms to the original data. That is, fit (2.15) to the data:

∇y_t = α + βt + ψ_1 ∇y_{t-1} + . . . + ψ_{k-1} ∇y_{t-k+1} + e_t.   (2.15)

It should be noted that (2.15) is in fact (2.13) without the y_{t-1} term.

We then try to fit (2.15) with one less lag and use Lagrange Multiplier tests to check for white noise residuals. The Ljung-Box-Pierce statistic is appropriate here; it looks at the residuals as a group, testing for white noise. We compare a model with k lags with one with k - 1 lags to see if the chosen k is correct. We continue reducing the number of lags in the ARIMA model and stop when the Ljung-Box-Pierce statistic rejects white-noise residuals. A sketch of this search is given below.
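This downward search can be sketched as follows (assuming statsmodels, whose acorr_ljungbox implements the Ljung-Box-Pierce statistic; the choice of 10 test lags and the helper name are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

def phase1_lag(y, k_max, lb_lags=10, alpha=0.05):
    """Fit (2.15), dy_t regressed on drift, trend and k-1 lagged differences,
    for decreasing k, keeping the smallest k whose residuals still pass the
    Ljung-Box-Pierce white-noise test."""
    dy = np.diff(y)
    best = k_max
    for k in range(k_max, 0, -1):
        lags = k - 1
        rows = len(dy) - lags
        X = [np.ones(rows), np.arange(rows)]                 # drift and trend
        X += [dy[lags - j: lags - j + rows] for j in range(1, lags + 1)]
        resid = sm.OLS(dy[lags:], np.column_stack(X)).fit().resid
        pval = acorr_ljungbox(resid, lags=[lb_lags], return_df=True)["lb_pvalue"].iloc[0]
        if pval < alpha:     # residuals rejected as white noise: stop reducing
            break
        best = k
    return best
```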


2.3.5 ADF Test Procedure: Phase 2

By analogy with equation (2.8) in the Dickey-Fuller test procedure, we see that the ADF procedure appropriately begins by estimating the following ECM:

∇y_t = α + βt + δ y_{t-1} + Σ_{i=1}^{k-1} ψ_i ∇y_{t-i} + e_t.   (2.16)

That is, we regress ∇y_t on a constant, t, y_{t-1}, and the lagged differences ∇y_{t-1}, ∇y_{t-2}, . . . , ∇y_{t-k+1}.

Having decided on an appropriate order for the autoregression, the rest of the Augmented Dickey-Fuller test procedure follows the same steps as in the basic Dickey-Fuller case. Refer to Table 2.3 for details.

Of course if we decide that a series contains a unit root, there is the question as to whether it also contains a second unit root, i.e. is the order of integration I(1) or I(2)? To test this we should go through the ADF testing procedure on the differenced series ∇y_t. So instead of (2.16) we would begin here with the regression:

∇²y_t = α + βt + δ ∇y_{t-1} + Σ_{i=1}^{k-1} ψ_i ∇²y_{t-i} + e_t,   (2.17)

and proceed as usual.
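That check can be sketched as follows (assuming statsmodels' adfuller; the 'ct' specification mirrors the drift and trend of (2.16)):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def order_of_integration(y, max_d=2, alpha=0.05):
    """ADF-test the series, then its differences (cf. (2.16) and (2.17)),
    until the unit-root null is rejected; the number of differences taken
    is the estimated order of integration."""
    for d in range(max_d + 1):
        pval = adfuller(np.diff(y, n=d), regression="ct", autolag="AIC")[1]
        if pval < alpha:
            return d
    return max_d   # still non-stationary after max_d differences

rng = np.random.default_rng(3)
print(order_of_integration(np.cumsum(rng.standard_normal(400))))   # expect 1
```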

3

Spurious Regressions

    When analysing several time series and trying to establish relationships between

    them, it is important to be aware of the possibility of spurious regression. It is

    possible that two independent time series can appear to be related when in fact

    all that is happening is that there are correlated time trends. In Trend-Stationary

    series one should include a deterministic time trend in the regression in order to

    remove the trend effect. This will leave residuals which are stationary and allow

    valid statistical inferences using t or F tests.

But suppose we are dealing with Difference-Stationary series; in this case including a time trend in the model is not sufficient. Using standard regression techniques with non-stationary data will lead to spurious regressions giving invalid inferences using t or F tests. An example will illustrate this. Consider the following two independent time-series:

y_t = φ y_{t-1} + u_t,   u_t IID white noise   (3.1)

x_t = φ x_{t-1} + v_t,   v_t IID white noise.   (3.2)

The two series x_t and y_t are unrelated and estimation of the model

y_t = β_0 + β_1 x_t + ε_t   (3.3)

should give the conclusion β_1 = 0. In reaching that conclusion we use the fact that β̂_1/se(β̂_1) should be distributed as a Student-t distribution with N - 2 degrees of freedom, where N is the number of pairs of observations (x_t, y_t). However non-stationarity in the models (3.1), (3.2) can lead to a non-stationary ε_t, and the fact that both series are changing with t will show up in the modelling as a correlation between the two series and as a non-zero estimate for β_1. So estimation of model (3.3) will imply a causal relationship between the series when in fact none is present. To illustrate this spurious regression problem we have simulated the series x_t and y_t with φ = 0.1 to give a pair of stationary series, and then with φ = 1 to give a non-stationary pair.


[Figure 3.2: Scatter plot of y_t vs x_t when φ = 0.1]

Examining Figure 3.2 we can see that the series x_t and y_t do not display any correlation, as expected.

We now consider the series simulated with φ = 1. Time series plots of x_t and y_t with φ = 1 are shown in Figure 3.3 and clearly indicate that the series are non-stationary.

[Figure 3.3: Time series plots of x_t and y_t when φ = 1]

The spurious regression phenomenon can be clearly seen when we examine a scatter plot of x_t vs y_t with φ = 1 (Figure 3.4). In this plot there is a clear positive correlation between the series x_t and y_t, despite the fact that these are generated from entirely independent processes.


To further examine the nature of the spurious regressions we estimated (3.3) for the stationary pair of series and separately for the non-stationary pair, computing β̂_1/se(β̂_1) in each case. We repeated these simulations 10,000 times and Table 3.1 compares the percentiles of β̂_1/se(β̂_1) from the stationary and non-stationary regressions. The spurious regression problem can be seen quite clearly from these simulations. When φ = 0.1 and we are dealing with stationary series, β̂_1/se(β̂_1) is distributed as a t-distribution with N - 2 degrees of freedom. However, when φ = 1, β̂_1/se(β̂_1) is clearly no longer distributed as a t-distribution. In fact, it is clear that the distribution of β̂_1/se(β̂_1) in this case is much more spread out, leading to much higher rejection of the null hypothesis β_1 = 0 in favour of β_1 ≠ 0.

Source           90th Percentile   95th Percentile   99th Percentile
t-distribution   1.312527          1.701131          2.46714
φ = 0.1          1.292041          1.673797          2.438064
φ = 1            8.128050          10.973310         17.09559

Table 3.1: β̂_1/se(β̂_1): Spurious vs Non-Spurious Regressions, 10,000 Simulations
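A small Monte-Carlo sketch along the same lines (illustrative assumptions: 2,000 replications rather than 10,000, and a sample size of 50, so the percentiles will only roughly match Table 3.1):

```python
import numpy as np
import statsmodels.api as sm

def slope_tstats(phi, n=50, reps=2000, seed=4):
    """Simulate the independent pair (3.1)-(3.2), regress y_t on x_t
    as in (3.3), and collect the t-statistics for beta_1."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for i in range(reps):
        u, v = rng.standard_normal((2, n))
        y, x = np.empty(n), np.empty(n)
        y[0], x[0] = u[0], v[0]
        for t in range(1, n):
            y[t] = phi * y[t - 1] + u[t]
            x[t] = phi * x[t - 1] + v[t]
        out[i] = sm.OLS(y, sm.add_constant(x)).fit().tvalues[1]
    return out

for phi in (0.1, 1.0):
    pct = np.percentile(slope_tstats(phi), [90, 95, 99])
    print(f"phi = {phi}: 90/95/99th percentiles of t = {np.round(pct, 2)}")
```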



    4

    Multivariate Time Series

    4.1 Vector Time Series Models

Multivariate time series data are often modelled using Vector Autoregressive Moving Average (VARMA) models. These are a more general class of time series model and can be used to describe relationships between a number of time series variables (rather than focusing on the relationship between a single dependent variable and several independent variables, as we have discussed up to now).

    4.1.1 Covariance and Correlation Matrix Functions

We denote the variables being studied as:

Z_t = [Z_{1t}, Z_{2t}, . . . , Z_{Mt}]′   (4.1)

where M is the number of time series being studied and t = 0, ±1, ±2, . . .. Z_t is an M-dimensional real-valued vector process. We also assume that Z_t is jointly stationary.


Definition: Joint stationarity implies that each univariate component process is stationary. The converse is not necessarily true, however: a vector of stationary univariate time series is not necessarily a jointly stationary process.

We also assume that the expected value of Z_t is given by:

E(Z_{rt}) = μ_r   (4.2)

E(Z_t) = μ = (μ_1, μ_2, . . . , μ_M)′   (4.3)

where the mean, μ_r, is constant for each r = 1, 2, . . . , M. The covariances between Z_{rt} and Z_{su} (for all r, s = 1, 2, . . . , M) are functions of the lag, or time difference, (u - t). The covariance matrix for lag k is given by:

Γ(k) = Cov(Z_t, Z_{t+k}) = E[(Z_t - μ)(Z_{t+k} - μ)′]

     = [ γ_{11}(k)  γ_{12}(k)  . . .  γ_{1M}(k)
         γ_{21}(k)  γ_{22}(k)  . . .  γ_{2M}(k)
           ⋮           ⋮                ⋮
         γ_{M1}(k)  γ_{M2}(k)  . . .  γ_{MM}(k) ]

     = Cov(Z_{t-k}, Z_t)   (4.4)

where

γ_{rs}(k) = E[(Z_{rt} - μ_r)(Z_{s(t+k)} - μ_s)] = E[(Z_{r(t-k)} - μ_r)(Z_{st} - μ_s)]   (4.5)

for k = 0, ±1, ±2, . . ., and r, s = 1, 2, . . . , M.


Γ(k) is referred to as the covariance matrix function for Z_t. γ_{rr}(k) is the autocovariance function for Z_{rt}, and γ_{rs}(k) denotes the cross-covariance function between Z_{rt} and Z_{st}. Finally, Γ(0) is the variance-covariance matrix at a given time.

The correlation matrix function for Z_t is calculated using the matrix D, where D is the diagonal matrix of the M variances:

D = diag[γ_{11}(0), γ_{22}(0), . . . , γ_{MM}(0)].   (4.6)

The correlation matrix is given by:

ρ(k) = D^{-1/2} Γ(k) D^{-1/2} = [ρ_{rs}(k)]   (4.7)

for r, s = 1, 2, . . . , M. The rth diagonal element of the correlation matrix, ρ_{rr}(k), represents the autocorrelation function for the rth series in Z_t, i.e. the ACF for Z_{rt}. The off-diagonal terms of the correlation matrix ρ(k) are the cross-correlation functions between the corresponding series, e.g. ρ_{rs}(k) is the cross-correlation function between Z_{rt} and Z_{st}. Each element can also be calculated using the following formula:

ρ_{rs}(k) = γ_{rs}(k) / [γ_{rr}(0) γ_{ss}(0)]^{1/2}.   (4.8)

Assumptions

It is important to note that the covariance and correlation matrices for a vector time series are positive semi-definite, in the sense that:

Σ_{r=1}^{T} Σ_{s=1}^{T} α_r′ Γ(t_r - t_s) α_s ≥ 0

and

Σ_{r=1}^{T} Σ_{s=1}^{T} α_r′ ρ(t_r - t_s) α_s ≥ 0   (4.9)


for any set of time points t_1, t_2, . . . , t_T and any set of real vectors α_1, α_2, . . . , α_T.

It should also be noted that in general

γ_{rs}(k) ≠ γ_{rs}(-k)   and   ρ_{rs}(k) ≠ ρ_{rs}(-k).   (4.10)

Instead,

Γ(k) = Γ′(-k)   and   ρ(k) = ρ′(-k)   (4.11)

since

γ_{rs}(k) = E[(Z_{rt} - μ_r)(Z_{s(t+k)} - μ_s)]
          = E[(Z_{s(t+k)} - μ_s)(Z_{rt} - μ_r)]
          = γ_{sr}(-k).   (4.12)


4.1.2 Moving Average and Autoregressive Vector Models

Moving Average Vector Models

The stationary vector time series Z_t is called a linear process (or purely nondeterministic process) if it can be written as a linear combination of white noise random vectors:

Z_t = μ + a_t + Ψ_1 a_{t-1} + Ψ_2 a_{t-2} + . . . = μ + Σ_{u=0}^{∞} Ψ_u a_{t-u}   (4.13)

where the a_t are M-dimensional white noise random vectors with mean zero and covariance matrix given by:

E(a_t a_{t+k}′) = Σ if k = 0, and 0 if k ≠ 0,   (4.14)

where Σ is an M × M symmetric positive definite matrix. The elements of the vector a_t at different times are uncorrelated; however, they may be contemporaneously correlated.

Note also that the coefficients of the linear combination, Ψ_u, are M × M coefficient matrices with Ψ_0 = I_M, the identity matrix.

This process is known as the multivariate moving average process.

Autoregressive Vector Models

The vector process can also be expressed as an autoregressive process. In an autoregressive model, the value of the series Z at a given time t is regressed on its own past values and a random vector (of errors or shocks):

Z_t = Φ_1 Z_{t-1} + Φ_2 Z_{t-2} + . . . + a_t = Σ_{u=1}^{∞} Φ_u Z_{t-u} + a_t   (4.15)


This can also be expressed in terms of the backshift operator, B:

Φ(B) Z_t = a_t   (4.16)

where

Φ(B) = I - Σ_{u=1}^{∞} Φ_u B^u   (4.17)

and the Φ_u are M × M matrices of the autoregressive coefficients; in particular, Φ_0 = I_M. In order for the process to be invertible, the autoregressive coefficient matrices must be absolutely summable, i.e.

Σ_{u=0}^{∞} |φ_{rs,u}| < ∞   (4.18)

for all r and s, where Φ_u = [φ_{rs,u}].

We have mentioned the conditions for stationarity in a moving average process and invertibility in an autoregressive process. One does not imply the other.

A stationary process is not necessarily invertible. For a vector process with a stationary moving average representation to be invertible, no zeros of the determinant of the moving average matrix polynomial, |Ψ(B)|, should lie inside or on the unit circle, i.e.:

|Ψ(B)| ≠ 0 for |B| ≤ 1.   (4.19)

Similarly, an invertible process is not necessarily stationary. If a vector process has an invertible autoregressive representation, it is stationary only if the determinant of the autoregressive matrix polynomial, |Φ(B)|, has no zeros on or inside the unit circle, i.e.:

|Φ(B)| ≠ 0 for |B| ≤ 1.   (4.20)

The general Vector ARMA(p, q) model combines both components:

Φ_p(B) Z_t = Θ_q(B) a_t.

Such a process is invertible if the zeros of the determinantal polynomial |Θ_q(B)| are outside the unit circle.

In such a case, the model can be re-written in the form:

Π(B) Z_t = a_t   (4.25)

where

Π(B) = [Θ_q(B)]^{-1} Φ_p(B) = I - Σ_{u=1}^{∞} Π_u B^u   (4.26)

such that the sequence Π_u is absolutely summable.

The Vector ARMA(p, q) process is said to be stationary if the zeros of the determinantal polynomial |Φ_p(B)| are outside the unit circle, i.e.:

|Φ_p(B)| ≠ 0 for |B| ≤ 1.   (4.27)

A stationary process can then be written as:

Z_t = Ψ(B) a_t   (4.28)

where

Ψ(B) = [Φ_p(B)]^{-1} Θ_q(B) = Σ_{u=0}^{∞} Ψ_u B^u   (4.29)

such that the sequence Ψ_u is square summable.

Model Identification

The identification process for a Vector ARMA(p, q) model is similar to the identification process for a univariate time series. In the univariate case, the following steps are taken:

1. The time series plot is examined for evidence of non-stationarity.

2. If necessary, transformations (such as differencing or de-trending) of the data are applied to ensure stationarity.

3. The sample autocorrelation function and sample partial autocorrelation function are calculated and plotted. These graphs are used to estimate the order of the autoregressive and the order of the moving average components of the model (p and q respectively).

In a similar way, given a vector time series Z_1, Z_2, . . . , Z_n, the underlying model is identified using the sample correlation and partial autocorrelation function matrices (once any necessary transformations have been applied to ensure stationarity).

The Sample Correlation Matrix Function

For the observed vector time series Z_1, Z_2, . . . , Z_n, the sample correlation matrix function is denoted by:

ρ̂(k) = [ρ̂_{rs}(k)].   (4.30)

The ρ̂_{rs}(k) are calculated using the following formula (Equation (4.31)) and represent the sample cross-correlations between Z_r and Z_s:

ρ̂_{rs}(k) = Σ_{t=1}^{n-k} (Z_{rt} - Z̄_r)(Z_{s(t+k)} - Z̄_s) / [ Σ_{t=1}^{n} (Z_{rt} - Z̄_r)² Σ_{t=1}^{n} (Z_{st} - Z̄_s)² ]^{1/2}   (4.31)

where Z̄_r and Z̄_s are the sample means of Z_r and Z_s respectively. It has been shown (Hamann REF) that the sample correlation function estimator ρ̂(k) is consistent and asymptotically Normally distributed, assuming that the vector process is stationary.

The sample correlation matrix function is used to identify the order of the (finite-order) moving average component of the ARMA model. This is due to the property that, for a vector MA(q) process, the correlation matrices beyond lag q are zero.


With high-dimensional vectors, however, identification using the sample correlation matrices can be difficult simply due to the number of elements, which can make it extremely difficult to determine the patterns present in the matrices. There is a convenient method (introduced by Tiao and Box REF) which can ease the complexity of pattern recognition: the sample correlations are summarized by converting the entries to one of three symbols:

+ denotes a value greater than 2 times the estimated standard error,

- denotes a value less than -2 times the estimated standard error, and

· denotes a value within ±2 estimated standard errors.

A sketch of this summary appears below.
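The sketch covers both the sample cross-correlation matrices (4.31) and the symbol coding (one hedge: 2/√n is used below as the approximate standard error of a sample correlation, a common large-sample simplification that the notes do not spell out):

```python
import numpy as np

def sample_corr_matrix(Z, k):
    """Sample correlation matrix rho_hat(k) of (4.31) for an (n x M) array Z."""
    n, _ = Z.shape
    Zc = Z - Z.mean(axis=0)
    scale = np.sqrt((Zc ** 2).sum(axis=0))    # per-series sum-of-squares scale
    return (Zc[: n - k].T @ Zc[k:]) / np.outer(scale, scale)

def tiao_box_symbols(R, n):
    """Tiao-Box coding: '+', '-' or '.' according to whether each entry lies
    above, below or within roughly +/- 2 standard errors (approximated here
    by 2/sqrt(n))."""
    band = 2.0 / np.sqrt(n)
    return np.where(R > band, "+", np.where(R < -band, "-", "."))

rng = np.random.default_rng(5)
Z = rng.standard_normal((200, 3))             # three independent series
print(tiao_box_symbols(sample_corr_matrix(Z, 1), len(Z)))
```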

The Partial Autoregression Matrices

The order of the autoregressive component of the Vector ARMA(p, q) can be identified in a similar way using the partial autocorrelation function (PACF). The PACF between Z_t and Z_{t+k} is defined as the correlation between the two after the linear dependency on the intervening variables (Z_{t+1}, Z_{t+2}, . . . , Z_{t+k-1}) has been removed:

φ_{kk} = Cov[(Z_t - Ẑ_t), (Z_{t+k} - Ẑ_{t+k})] / [ Var(Z_t - Ẑ_t) Var(Z_{t+k} - Ẑ_{t+k}) ]^{1/2}   (4.32)

where Ẑ_t and Ẑ_{t+k} are the linear estimators of Z_t and Z_{t+k} calculated by minimum mean squared error linear regression on Z_{t+1}, Z_{t+2}, . . . , Z_{t+k-1}. This function φ_{kk} is zero for |k| > p, where p is the number of autoregressive terms required by the underlying model.


    5

    Cointegration

    5.1 The Error Correction Model

Let us introduce the Error Correction Model, which is favoured by economists as a means of modelling time series with both long- and short-run behaviours. Suppose we have two series I (Income) and C (Consumption) which each have unit roots, so that ∇I and ∇C are stationary. Now suppose that we believe that there is a relationship between I and C. Because these two series are non-stationary, we have seen that trying to model a relationship between them is subject to the problem of spurious regression.

But since the differenced series are stationary we may decide that these differenced series are related by a regression model:

∇C_t = β_0 ∇I_t + e_t.   (5.1)

In this model increasing I by one unit per period will increase C by β_0 units per period. Statistically this model is sound, but in economics it may not be so reasonable.


In particular, one might think that the relationship between the increase in Consumption given an increase in Income should also depend on the current level of Income. One reason for this might be that if one earns a lot then any increase in income need not be saved for necessities but could instead be spent freely, whereas if one is not in a high income bracket then extra income may not be so liberally consumed.

Economists are also generally interested in systems reaching equilibrium, and the model (5.1) does not include an equilibrium solution. In equilibrium we would have C_t = C_{t-1} = . . . and I_t = I_{t-1} = . . . .

One way to try and fix these problems is to include a term which is the deviation between the actual value of C in the previous period t - 1 and the equilibrium value of C.

Suppose the equilibrium relationship between C and I is linear:

C_t^{equil} = γ I_t.   (5.2)

Then the deviation from this equilibrium at period t - 1 is C_{t-1} - γ I_{t-1}. We can incorporate this as a correction to model (5.1), and so the new model is:

∇C_t = β_0 ∇I_t - λ(C_{t-1} - γ I_{t-1}) + e_t.   (5.3)

The parameter λ is usually rewritten as (1 - α_1), where α_1 < 1, giving:

∇C_t = β_0 ∇I_t - (1 - α_1)(C_{t-1} - γ I_{t-1}) + e_t.   (5.4)

This type of model is called an Error Correction Model (ECM) as it has the ability to correct disequilibria.


Let us consider how it implements this correction. Firstly, during periods of equilibrium the term (C_{t-1} - γ I_{t-1}) will be zero and the model (5.4) will revert to the form (5.1). In a period of disequilibrium, C_t increases faster or slower than expected by the equilibrium relationship (5.2).

If C_t increases slower than expected then we will find (C_{t-1} - γ I_{t-1}) < 0, but -(1 - α_1) < 0 also. So the net effect is to add a positive term to the equilibrium value β_0 ∇I_t, thus boosting ∇C_t and forcing C_t back towards equilibrium. If C_t increases faster than expected then we are instead adding a negative term, which again forces C_t back towards its equilibrium value.
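A short simulation sketch of (5.4) shows the gap being corrected away (the values β_0 = 0.5, α_1 = 0.5, γ = 1 and the income path are invented for illustration):

```python
import numpy as np

# ECM (5.4): dC_t = b0*dI_t - (1 - a1)*(C_{t-1} - g*I_{t-1}) + e_t.
b0, a1, g = 0.5, 0.5, 1.0            # illustrative values, not estimates
rng = np.random.default_rng(6)

n = 40
I = np.cumsum(0.2 + 0.05 * rng.standard_normal(n))   # slowly growing income
C = np.empty(n)
C[0] = g * I[0] + 2.0                # start two units above equilibrium
for t in range(1, n):
    gap = C[t - 1] - g * I[t - 1]                    # the disequilibrium term
    C[t] = (C[t - 1] + b0 * (I[t] - I[t - 1])
            - (1 - a1) * gap + 0.01 * rng.standard_normal())

# The gap shrinks roughly geometrically at rate a1 per period:
print(np.round(C[:8] - g * I[:8], 3))
```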

The original model (5.1) did have problems as far as economics was concerned; however, (5.1) was a relationship between stationary variables and so was sound statistically. The new model (5.4) may make more economic sense, but it now has a statistical problem: it only makes sense if the new variable C_{t-1} - γ I_{t-1} is stationary. But this variable is a linear combination of two non-stationary I(1) variables, C and I at time t - 1, and such a combination will also in general be non-stationary I(1).

We note that we can also generalize the ECM model (5.4), including more lag lengths, to the following relationship linking the variables C_t and I_t:

A(L) ∇C_t = B(L) ∇I_t - (1 - α)(C_{t-1} - γ I_{t-1}) + e_t,   (5.5)

where

A(L) = 1 - α_1 L - α_2 L² - . . . - α_k L^k,
B(L) = β_0 + β_1 L + β_2 L² + . . . + β_q L^q.

5.2 Vector Error Correction Models

The error correction idea extends to vector time series. Suppose the vector series y_t follows a VAR(k) model:

y_t = Φ_1 y_{t-1} + Φ_2 y_{t-2} + . . . + Φ_k y_{t-k} + ε_t.   (5.6)

Alternatively (5.6) can be written as

∇y_t = Π y_{t-k} + Γ_1 ∇y_{t-1} + . . . + Γ_{k-1} ∇y_{t-k+1} + ε_t   (5.8)

where

Γ_i = -(I - Φ_1 - Φ_2 - . . . - Φ_i),   i = 1, 2, . . . , k - 1,

and

Π = Φ_1 + . . . + Φ_k - I.

In this section we introduced the ECM, but we have seen that this model appears not to make sense statistically, as it involves both I(1) and I(0) variables together in the same regression. The solution to this problem was presented by Engle and Granger when they introduced the concept of cointegration, which we will examine in the next section of this chapter.

    5.3 Cointegration

Consider two series y_{1t} and y_{2t} which are both integrated of order d, i.e. I(d). In general any linear combination of these series will also be integrated of order d. In particular, if a regression is performed of y_{1t} on y_{2t}, then the residuals from this regression will be I(d), i.e. the regression will suffer from spurious correlation. Engle and Granger noticed that in some situations it might be possible to perform a regression containing non-stationary variables and still avoid spurious regression. They introduced the concept of cointegration:

Definition: The components of a vector series y_t are said to be cointegrated of order (d, b), written CI(d, b), if all components of y_t are I(d) and there exists a non-zero vector β such that the linear combination β′y_t is integrated of order d - b, for some b > 0.

In fact if y_t is a p-dimensional vector time series then there may be h < p linearly independent p × 1 vectors (β_1, β_2, . . . , β_h) such that βy_t is a stationary vector time series, where β is the following h × p matrix:

β = [ β_1′ ]   [ β_11  β_12  . . .  β_1p ]
    [ β_2′ ] = [ β_21  β_22  . . .  β_2p ]
    [  ⋮   ]   [  ⋮                  ⋮   ]
    [ β_h′ ]   [ β_h1  β_h2  . . .  β_hp ]

As mentioned, the vectors (β_1, β_2, . . . , β_h) are not unique, since for any non-zero 1 × h vector a the linear combination aβy_t is also stationary. If (β_1, β_2, . . . , β_h) span the co-integrating space then they form a basis for the co-integrating space.

Having seen the formal definition of cointegration, let us consider what it means in practice. Cointegration means that although there may be many apparently independent changes in the individual elements of y_t, there are actually some long-run equilibrium relations tying the individual components together. These relations are represented by the linear combinations βy_t. So cointegration provides a model for the idea in economics of a long-run equilibrium to which the system will converge over time.

We see now that if two variables are fully co-integrated, CI(d, d), it is possible to perform a meaningful regression between them: the regression would pick up the stationary linear combination and the residuals would no longer be non-stationary, thus eliminating the problem of spurious regression. In practice we mainly deal with I(1) variables and seek to find co-integrating linear combinations which will be stationary.


    5.4 Engle-Granger Estimation

When the concept of cointegration was introduced by Engle and Granger, they suggested a procedure to test for cointegration between two variables. If two variables y_t and x_t are co-integrated then there is a stationary linear combination of the variables. This means that the model:

y_t = β x_t + e_t   (5.9)

describes a stationary relationship, does not suffer from spurious regression and can be consistently estimated by ordinary least squares.

Now, if the two variables y_t and x_t are not co-integrated then there will not be a stationary linear combination of the two variables, and hence equation (5.9) would once again suffer from spurious regression, as the residuals will be non-stationary.

Engle and Granger make use of this fact to construct a test for cointegration. They suggest using an ADF test on the residuals ê_t of the regression (5.9) to see if they satisfy the null of being I(1) or the alternative of stationarity, I(0). So, as described in Section 2.3, we should estimate:

∇ê_t = δ ê_{t-1} + Σ_{i=1}^{k-1} ψ_i ∇ê_{t-i} + α + βt + ε_t,   ε_t IID white noise.   (5.10)

    The trend and drift terms can be added in the regression (5.9) or in (5.10) but

    not in both.
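A sketch of the two-step procedure (assuming statsmodels; note that because the residuals are estimated, adfuller's tabulated critical values are not strictly appropriate in step 2, and the packaged coint() applies the corrected Engle-Granger values):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(7)
n = 300
x = np.cumsum(rng.standard_normal(n))          # x_t is I(1)
y = 2.0 + 0.5 * x + rng.standard_normal(n)     # cointegrated by construction

# Step 1: estimate the cointegrating regression (5.9) by OLS.
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# Step 2: ADF regression (5.10) on the residuals.
print(f"ADF tau on residuals: {adfuller(resid, regression='n')[0]:.3f}")
print(f"Engle-Granger statistic: {coint(y, x)[0]:.3f}")
```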


This Engle-Granger procedure for testing for cointegration suffers from several problems:

1. The test has low power.

2. In finite samples the cointegration estimates may be biased.

3. Inferences about parameters in (5.9) cannot be performed using standard t-statistics.

In addition to these problems, this approach, which uses a single equation in the model, is only really suitable if there is just one cointegrating relationship. In general, the multivariate Vector Auto Regression (VAR) approach of Johansen is to be preferred.