Further Non-Stationarity Notes


    University of Oxford

    Time Series Analysis

    Section III

    Michaelmas Term, 2010

    Department of Statistics, 1 South Parks Road,

    Oxford OX1 3TG


Contents

1 Non-Stationary Time Series
  1.1 Phenomenology of Non-Stationarity
  1.2 Trend Stationary vs Difference Stationary

2 Unit Root Tests
  2.1 General Issues in Unit Root Testing
  2.2 Dickey-Fuller Tests
    2.2.1 Model 1: No Drift or Trend
    2.2.2 Model 2: Drift but no Trend
    2.2.3 Model 3: Drift and Trend
    2.2.4 Perron Sequential Testing Procedure for Unit Roots
  2.3 Augmented Dickey-Fuller Regression
    2.3.1 Non-IID Errors
    2.3.2 MA(q) and AR(k) Terms
    2.3.3 The ADF Test
    2.3.4 ADF Test Procedure: Phase 1
    2.3.5 ADF Test Procedure: Phase 2

3 Spurious Regressions

4 Multivariate Time Series
  4.1 Vector Time Series Models
    4.1.1 Covariance and Correlation Matrix Functions
    4.1.2 Moving Average and Autoregressive Vector Models

5 Cointegration
  5.1 The Error Correction Model
  5.2 Vector Error Correction Models
  5.3 Cointegration
  5.4 Engle-Granger Estimation


    1

    Non-Stationary Time Series

In Section I we examined several different ARIMA models. We determined conditions under which these models were stationary. Having imposed these stationarity conditions we then computed the ACF for each model. In this chapter we will examine some models for non-stationary series.

    1.1 Phenomenology of Non-Stationarity

In this section we will consider some of the implications of non-stationarity. A stationary series has a well-defined mean around which it can fluctuate with constant finite variance. This is not necessarily true for a non-stationary series. The issues involved can best be illustrated by example. Consider the AR(1) model

y_t = (1/r) y_{t-1} + ε_t,   t = 1, 2, 3, . . . .   (1.1)

This is equivalent to

(1 - (1/r)B) y_t = ε_t,   t = 1, 2, 3, . . . ,   (1.2)


and the characteristic polynomial (1 - (1/r)z) has root z = r. The series behaves differently according to whether r > 1, r = 1 or r < 1. Equation (1.2) has solution:

y_t = ε_t + (1/r) ε_{t-1} + . . . + (1/r)^{t-1} ε_1 + (1/r)^t y_0   (1.3)

where y_0 is the value of y_t at t = 0.

It is clear that when r > 1 the influence of the initial term (1/r)^t y_0 and the impulses (1/r)^i ε_{t-i} dies out as they move further into the past. For r > 1 therefore we see that the present is more important than the past. For these values of r the series is stationary and its behaviour will consist of oscillation around the mean value 0.

When r = 1 past shocks and the initial value have the same weight, the past being as important as the present.

And for r < 1 the weights on past terms increase with t: the past is more important than the present. Here the series rapidly diverges towards +∞ or -∞. This behaviour is termed explosive and is of course counter-intuitive in almost all situations. For that reason we can safely assume that in time series models of real-life data all roots are either on or outside the unit circle.
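To make the three regimes concrete, here is a minimal simulation sketch in Python (plain numpy; the values r = 2, 1 and 0.9 and the helper name are illustrative choices, not from the notes):

```python
import numpy as np

def ar1_path(r, n=50, y0=0.0, seed=0):
    """Simulate y_t = (1/r) * y_{t-1} + eps_t of model (1.1), IID N(0,1) shocks."""
    rng = np.random.default_rng(seed)
    y, prev = np.empty(n), y0
    for t, eps in enumerate(rng.standard_normal(n)):
        prev = prev / r + eps
        y[t] = prev
    return y

# r = 2.0: root outside the unit circle -- stationary oscillation around 0.
# r = 1.0: root on the unit circle -- a random walk; shocks never die out.
# r = 0.9: root inside the unit circle -- explosive divergence.
for r in (2.0, 1.0, 0.9):
    print(f"r = {r}: max |y_t| over 50 steps = {np.abs(ar1_path(r)).max():.1f}")
```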


    1.2 Trend Stationary vs Difference Stationary

    We have seen in the last chapter that differencing certain non-stationary time

    series can produce stationary series.

For example if y_t is an ARIMA(p, d, q) process where the p roots of the AR characteristic equation are all outside the unit circle, then the series y_t will be non-stationary because of the presence of the d unit roots in the model. If however we consider the differenced series z_t = ∇^d y_t, this series will be stationary, as all p of the roots of its AR characteristic equation are greater than one in magnitude.

Instead of differencing a series to achieve stationarity one might think of removing a polynomial trend from y_t to leave a stationary series. In fact this technique works for some series but not for others.

Consider the series

y_t = α + βt + ε_t,   ε_t IID white noise.   (1.4)

If we remove the linear term α + βt from this series y_t then the series z_t = y_t - α - βt with which we are left is clearly a stationary series:

z_t = ε_t.   (1.5)

Of course differencing this series would also leave a stationary series. This can be seen from the following:

y_t = α + βt + ε_t
y_{t-1} = α + β(t - 1) + ε_{t-1}   (1.6)
∇y_t = β + ∇ε_t   (1.7)

Patently this differenced series ∇y_t is stationary.

Nomenclature

Before proceeding we note the following:

Drift: Constant terms, such as α in (1.4), are referred to as drift terms.

Trend: Constant multiples of time t, such as βt in (1.4), are referred to as trend terms.

Next consider the process

y_t = α + φ y_{t-1} + e_t,   e_t IID white noise.   (1.8)

If φ = 1 this series is non-stationary and if the initial value of y_t at t = 0 is y_0 then, by iteration, we have:

y_t = y_0 + αt + Σ_{j=1}^{t} e_j.   (1.9)


If we remove a linear trend y_0 + αt from this series we are still left with a non-stationary series Σ_{j=1}^{t} e_j. If however we were to difference the series y_t then we would find ∇y_t = α + e_t, which is stationary with mean α.

Models such as (1.8), which require differencing to achieve stationarity (and cannot be made stationary by just removing a linear trend), are called Difference-Stationary series, whereas models which are stationary upon removal of a linear trend, e.g. (1.4), are called Trend-Stationary.
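The distinction is easy to see in simulation. The following rough sketch (the parameter values are arbitrary) detrends and differences one series of each type; detrending removes the persistence of (1.4) but not of the random walk with drift, i.e. (1.8) with φ = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
t = np.arange(n)
alpha, beta = 0.5, 0.1

series = {
    # Trend-stationary (1.4): y_t = alpha + beta*t + eps_t.
    "trend-stationary": alpha + beta * t + rng.standard_normal(n),
    # Difference-stationary (1.8) with phi = 1: a random walk with drift alpha.
    "difference-stationary": np.cumsum(alpha + rng.standard_normal(n)),
}

def lag1_autocorr(z):
    return np.corrcoef(z[:-1], z[1:])[0, 1]

for name, y in series.items():
    detrended = y - np.polyval(np.polyfit(t, y, 1), t)   # remove fitted line
    differenced = np.diff(y)
    # The detrended random walk keeps a lag-1 autocorrelation near 1;
    # differencing removes the persistence in both cases.
    print(f"{name}: detrended acf(1) = {lag1_autocorr(detrended):.2f}, "
          f"differenced acf(1) = {lag1_autocorr(differenced):.2f}")
```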

Note that the general ARIMA model (from Section I):

φ(B) ∇^d y_t = θ(B) ε_t,

could in fact be written as

φ(B) ∇^d [y_t - (β_0 + β_1 t + . . . + β_{d-1} t^{d-1})] = θ(B) ε_t   (1.10)

because ∇^d (β_0 + β_1 t + . . . + β_{d-1} t^{d-1}) = 0. Thus (1.10) automatically includes polynomial trends of degree d - 1. Including a polynomial of degree d + k_0 would give

φ(B) ∇^d [y_t - (β_0 + β_1 t + . . . + β_{d+k_0} t^{d+k_0})] = θ(B) ε_t   (1.11)

which is equivalent to

φ(B) ∇^d y_t = c(t) + θ(B) ε_t,   (1.12)

where c(t) is a polynomial of degree k_0.

Both the Trend-Stationary and Difference-Stationary models allow for the inclusion of a polynomial trend, but in the Difference-Stationary case the deviations from the polynomial trend still require differencing to achieve stationarity.

Choosing not to difference a series when in fact differencing is required can lead to serious consequences such as spurious regression (c.f. Section 3), which is one consequence of non-stationarity. As was seen from equation (1.9), removing a linear trend does not solve the non-stationarity problem if the series is actually difference stationary.

Unnecessary differencing, which has the benefit of at least ensuring stationarity, has far less serious consequences: it can lead to inefficient parameter estimates and over-conservative forecast intervals. These parameter estimates are, however, unbiased and consistent.

In the next section we will examine how to determine whether a series is Difference-Stationary.


    2

    Unit Root Tests

    2.1 General Issues in Unit Root Testing

We have seen the importance of the presence of unit roots in a time series. In practice how should one decide whether a series contains a unit root or not? One could examine plots of the time series, looking for wandering behaviour that would indicate non-stationarity. Alternatively one could look at the sample auto-correlation function (ACF) of the original series and of the differenced series; if the auto-correlations don't die out quickly then this would indicate non-stationarity.

There are, however, problems with relying on graphical methods; the human eye can deceive. Formal tests for unit roots have been developed and we will now look at two of these tests in detail: the Dickey-Fuller and Augmented Dickey-Fuller tests. Formulating a set of hypotheses to test is our first consideration.

A general unit root process can be written as

Φ(L) ∇y_t = Θ(L) e_t,   Θ(1) ≠ 0.   (2.1)


This could be tested against the alternative hypothesis

Φ(L)(1 - φL) y_t = Θ(L) e_t,   -1 < φ < 1.   (2.2)

The way this test has been formulated indicates that we are choosing a null hypothesis of a unit root with stationary alternatives. So we accept a unit root unless there is significant evidence that the process is stationary. We could have decided to have stationarity as the null.

The reason we choose the hypotheses to have a unit-root null is because of the relative importance of the two errors in this testing procedure. If we decide the series is stationary when in fact it contains a unit root, then any forecast intervals we derive will be too narrow and we will be over-confident of our forecasts. If however we conclude the series possesses a unit root when in fact it is stationary, then we would difference a stationary series. The consequences of that are not so serious: we would produce over-conservative forecast intervals.

    2.2 Dickey-Fuller Tests

    In this Section we examine the Dickey-Fuller (DF) approach to testing for a unit

    root.

2.2.1 Model 1: No Drift or Trend

The simplest example of the procedure is in the AR(1) model with no drift or time trend term:

y_t = φ y_{t-1} + e_t.   (2.3)

We assume here that the e_t terms are IID white noise and we are interested in testing the hypotheses:

H_0 : φ = 1   vs   H_A : φ < 1.


In practice it is easier to use a re-parameterisation of (2.3):

∇y_t = δ y_{t-1} + e_t,   (2.4)

where δ = φ - 1, so that we are now testing

H_0 : δ = 0   vs   H_A : δ < 0.

Considering (2.4), we see that we can test this hypothesis by regressing ∇y_t on y_{t-1} and computing the standard least squares t-statistic for testing that the coefficient δ equals 0.

This test statistic, which we will call τ, is produced automatically in the computer output obtained from most statistical packages by running a regression for equation (2.4).

There is one important thing to note, however. If the true process is (2.3) with φ = 1 then, because of non-stationarity, this t-test statistic does not, in fact, follow the standard t-distribution. The asymptotic theory of this model has been developed using Brownian motion techniques. Dickey and Fuller have used Monte-Carlo simulation to compute a set of critical values for this test and for other variations on this model. We present some of these critical values in Table 2.4.
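As a minimal sketch (assuming the statsmodels package is available; its regression='n' option is the no-drift, no-trend case, spelled 'nc' in older releases), the τ statistic is simply the OLS t-ratio from (2.4):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(250))     # a random walk, so H_0 is true

# Model 1 regression by hand: dy_t = delta * y_{t-1} + e_t, no constant.
dy, ylag = np.diff(y), y[:-1]
tau = sm.OLS(dy, ylag).fit().tvalues[0]
print(f"tau = {tau:.3f}  (compare with Table 2.4: -1.95 at the 5% level)")

# Packaged equivalent; maxlag=0 gives the plain (non-augmented) DF test.
stat, pval, *_ = adfuller(y, maxlag=0, regression="n", autolag=None)
print(f"adfuller: tau = {stat:.3f}, p-value = {pval:.3f}")
```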

2.2.2 Model 2: Drift but no Trend

Consider now a model with drift:

y_t = φ y_{t-1} + α + e_t   (2.5)


or the reparameterisation

∇y_t = δ y_{t-1} + α + e_t.   (2.6)

Again we are testing the hypotheses

H_0 : φ = 1   vs   H_A : φ < 1

or equivalently

H_0 : δ = 0   vs   H_A : δ < 0.

The test statistic in this test, τ_μ, is again the standard least squares t-statistic obtained by running a regression for equation (2.6).

If the true data generating process has α = 0, so that the real process is actually (2.3), and if φ = 1, then this test statistic, τ_μ, follows a nonstandard distribution. Critical values for this distribution, which is different from the one for the τ statistic, have also been produced by Dickey and Fuller (c.f. Table 2.4). If however the true process contains a unit root but a non-zero drift term ((2.5) with φ = 1), then τ_μ follows the standard normal distribution.

2.2.3 Model 3: Drift and Trend

Finally consider a model with drift and a trend:

y_t = φ y_{t-1} + α + βt + e_t   (2.7)

∇y_t = δ y_{t-1} + α + βt + e_t.   (2.8)

Here the Dickey-Fuller test statistic, τ_τ, is again the standard least squares t-statistic obtained by running a regression for equation (2.8).

Again, if the true data process is actually (2.3) with φ = 1 then this test statistic, τ_τ, also follows a nonstandard distribution. Critical values have again been produced by Dickey and Fuller (c.f. Table 2.4). If the true process contains a unit root but a non-zero drift term ((2.5) with φ = 1), then τ_τ follows a nonstandard distribution. Lastly, if the true process contains a unit root, a non-zero drift term and a non-zero trend term ((2.7) with φ = 1), then τ_τ follows a standard normal distribution.

We summarise these results in Table 2.1.

Estimating Equation                Statistic   True Model             Critical Values
∇y_t = δy_{t-1} + e_t              τ           ∇y_t = e_t             Table 2.4
∇y_t = δy_{t-1} + α + e_t          τ_μ         ∇y_t = e_t             Table 2.4
∇y_t = δy_{t-1} + α + e_t          τ_μ         ∇y_t = α + e_t         Standard Normal
∇y_t = δy_{t-1} + α + e_t          τ_μ         ∇y_t = α + βt + e_t    Standard Normal
∇y_t = δy_{t-1} + α + βt + e_t     τ_τ         ∇y_t = e_t             Table 2.4
∇y_t = δy_{t-1} + α + βt + e_t     τ_τ         ∇y_t = α + e_t         Table 2.4
∇y_t = δy_{t-1} + α + βt + e_t     τ_τ         ∇y_t = α + βt + e_t    Standard Normal

Table 2.1: Dickey-Fuller τ-Test Statistics

As well as using these three τ-statistics to test the hypothesis of a unit root, it is also possible to test some joint hypotheses for the presence of an intercept, a time trend and a unit root. These joint tests use test statistics which are calculated as standard F-statistics comparing restricted and unrestricted residual sums of squares. However, again due to non-stationarity, the distributions are non-standard. Dickey and Fuller present tables of critical values (c.f. Table 2.5) for the three Φ-statistics, which are defined in Table 2.2.

We have now seen various tests of the unit-root hypotheses. Which test statistic we should use depends not only on the estimating equation we will use but also on what the true data generating process is. Of course we do not know in advance which is the correct data generating process, so we need a systematic testing procedure.


Estimating Equation                Statistic   Hypotheses                         Critical Values
y_t = α + φy_{t-1} + e_t           Φ_1         H_0 : (α, φ) = (0, 1)              Table 2.5
                                               H_A : (α, φ) ≠ (0, 1)
y_t = α + βt + φy_{t-1} + e_t      Φ_2         H_0 : (α, β, φ) = (0, 0, 1)        Table 2.5
                                               H_A : (α, β, φ) ≠ (0, 0, 1)
y_t = α + βt + φy_{t-1} + e_t      Φ_3         H_0 : (α, β, φ) = (α, 0, 1)        Table 2.5
                                               H_A : (α, β, φ) ≠ (α, 0, 1)

Table 2.2: Dickey-Fuller Φ-Test Statistics


    2.2.4 Perron Sequential Testing Procedure for Unit Roots

Perron described such a sequential testing procedure and we have outlined it in Table 2.3. We begin using the most general model; if we fail to reject the null hypothesis of a unit root we continue through the steps, stopping as soon as we can reject a null of a unit root.

NOTE: Steps 2a and 4a are only performed if we reject the nulls in steps 2 and 4 respectively.

The reasoning behind this procedure is as follows. In step 1 we use non-standard critical values. Suppose we perform step 1 and fail to reject the null hypothesis of a unit root; we are then left to decide whether this is because there is a unit root in the process or because we have assumed the wrong underlying data generating process. So in step 2 we try to establish if indeed the underlying process we have assumed should be different.


Step   Estimating Equation: ∇y_t =      Statistic   Hypotheses                       Critical Values
1      α + βt + δy_{t-1} + e_t          τ_τ         H_0 : δ = 0 vs H_A : δ < 0       Table 2.4
2      α + βt + δy_{t-1} + e_t          Φ_3         H_0 : (α, β, δ) = (α, 0, 0)      Table 2.5
                                                    H_A : (α, β, δ) ≠ (α, 0, 0)
2a     α + βt + δy_{t-1} + e_t          t(δ)        H_0 : δ = 0 vs H_A : δ < 0       St. Normal
3      α + δy_{t-1} + e_t               τ_μ         H_0 : δ = 0 vs H_A : δ < 0       Table 2.4
4      α + δy_{t-1} + e_t               Φ_1         H_0 : (α, δ) = (0, 0)            Table 2.5
                                                    H_A : (α, δ) ≠ (0, 0)
4a     α + δy_{t-1} + e_t               t(δ)        H_0 : δ = 0 vs H_A : δ < 0       St. Normal
5      δy_{t-1} + e_t                   τ           H_0 : δ = 0 vs H_A : δ < 0       Table 2.4

Table 2.3: Perron Sequential Procedure for the Dickey-Fuller Unit Root Test

Now in step 2 we will either reject the null H_0 : (α, β, δ) = (α, 0, 0) or not. If we do reject the null in step 2 then either β ≠ 0 or δ ≠ 0; but since we did not reject the null of a unit root in step 1, we must have β ≠ 0. So we now conclude that there is a significant trend in the process and go to step 2a.

    If, however, we do not reject the null in step 2 we conclude that there is no

    evidence of a trend in the model and so we go to step 3 where we use an estimating

    equation that does not include a trend.


Referring to Table 2.1 we see that in the presence of a deterministic time trend the τ-statistic is asymptotically standard normal. So in step 2a, instead of using the DF critical values, we should use a standard t-statistic (with ∞ degrees of freedom) to test for a unit root.

In step 3 we test for a unit root with a drift term in the model. If we fail to reject the null in this step we proceed to step 4, where we test jointly for the presence of a unit root and a drift term. If we fail to reject this null we conclude that the true process does not contain a drift term and we move to step 5.

If, however, we do reject the null in step 4 then this can only be because there is a drift term present in the true model, which would imply that the τ_μ statistic should follow the standard normal distribution. So we move to step 4a.

Having gone through all these steps, if we cannot reject the null of a unit root we conclude that a unit root is present in the model.

It should be noted that this test procedure is influenced by the fact that including additional deterministic terms in the estimating model, beyond what is present in the true process, increases the chance of a type II error (accepting the null of a unit root when in fact the true process is stationary). That is, the power of the test decreases against alternatives of stationarity.

This can be seen by looking at the ordering of the DF critical values: τ_τ < τ_μ < τ. Suppose that the true process is given by ∇y_t = e_t; for the lower-tailed test H_0 : δ = 0 vs H_A : δ < 0, the ordering of the DF critical values means that it will be harder to reject the null of a unit root when estimation uses a model with drift and a trend than when it uses only a drift, and harder with a drift than when it uses neither.

The sequential procedure of Perron seeks to minimize the possibility of making this kind of error. Having said that, we must of course be aware that the usual issues associated with multiple testing remain.


             Critical values for τ       Critical values for τ_μ     Critical values for τ_τ
Sample size  0.01    0.05    0.10        0.01    0.05    0.10        0.01    0.05    0.10
25           -2.66   -1.95   -1.60       -3.75   -3.00   -2.63       -4.38   -3.60   -3.24
50           -2.62   -1.95   -1.61       -3.58   -2.93   -2.60       -4.15   -3.50   -3.18
100          -2.60   -1.95   -1.61       -3.51   -2.89   -2.58       -4.04   -3.45   -3.15
t-dist. (∞ d.f.)  -2.33  -1.65  -1.28    -2.33   -1.65   -1.28       -2.33   -1.65   -1.28

Table 2.4: Dickey-Fuller Critical Values

             Critical values for Φ_1     Critical values for Φ_2     Critical values for Φ_3
Sample size  0.01    0.05    0.10        0.01    0.05    0.10        0.01    0.05    0.10
25           7.88    5.18    4.12        8.21    5.68    4.67        10.61   7.24    5.91
50           7.06    4.86    3.94        7.02    5.13    4.31        9.31    6.73    5.61
100          6.70    4.71    3.86        6.50    4.88    4.16        8.73    6.49    5.47
250          6.52    4.63    3.81        6.22    4.75    4.07        8.43    6.34    5.39
500          6.47    4.61    3.79        6.15    4.71    4.05        8.34    6.30    5.36
∞            6.43    4.59    3.78        6.09    4.68    4.03        8.27    6.25    5.34

Table 2.5: Dickey-Fuller Φ Critical Values
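Returning to the sequential procedure, the sketch below compresses it to steps 1, 3 and 5 of Table 2.3 (a simplification under stated assumptions: the joint Φ tests of steps 2 and 4 are omitted, and statsmodels' reported p-values stand in for the table look-ups; its regression options 'ct', 'c' and 'n' correspond to trend-and-drift, drift-only and neither):

```python
from statsmodels.tsa.stattools import adfuller

def sequential_unit_root_test(y, alpha=0.05):
    """Simplified Perron sequence: start from the most general deterministic
    specification and stop as soon as the unit-root null is rejected.
    Steps 2/2a and 4/4a (the joint Phi tests) are omitted in this sketch."""
    for step, reg in [(1, "ct"), (3, "c"), (5, "n")]:
        stat, pval, *_ = adfuller(y, regression=reg, autolag="AIC")
        print(f"step {step} (regression='{reg}'): tau = {stat:.3f}, p = {pval:.3f}")
        if pval < alpha:
            return f"unit root rejected at step {step}"
    return "unit root not rejected at any step: treat the series as I(1)"
```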

2.3 Augmented Dickey-Fuller Regression

In practice we cannot always use the Dickey-Fuller tests which were described in the previous section because the assumptions required are too strong. Recall that in the basic Dickey-Fuller tests we were dealing with AR(1) processes with errors e_t that were IID white noise. In reality there are complications which would prevent the use of these DF tests.

The first such complication is: what should we do if the e_t are not IID white noise? In that instance the Dickey-Fuller critical values may not be valid. The second problem arises when the process follows a more general model than an AR(1) model: AR(k) models or mixed models with MA terms. In the next two sections we will examine both of these problems.

2.3.1 Non-IID Errors

Suppose the true model is:

y_t = φ y_{t-1} + α + βt + e_t   (with α = 0, β = 0, φ = 1)   (2.9)

where now e_t is not IID but instead is a stationary AR(k):

e_t = ρ_1 e_{t-1} + ρ_2 e_{t-2} + . . . + ρ_k e_{t-k} + ε_t,   ε_t IID white noise.

Equation (2.9) can be re-parameterised as:

∇y_t = δ y_{t-1} + α + βt + e_t   (2.10)
     = δ y_{t-1} + α + βt + ρ_1 e_{t-1} + . . . + ρ_k e_{t-k} + ε_t   (2.11)

We now make use of the fact that in the true model ∇y_t = e_t (c.f. (2.9)) to rewrite Equation (2.11) as the AR(k) (c.f. Section I) process:

∇y_t = δ y_{t-1} + α + βt + ρ_1 ∇y_{t-1} + . . . + ρ_k ∇y_{t-k} + ε_t.   (2.12)


So an AR(1) process with autocorrelated errors can be transformed into an AR(k) process with IID white noise errors.

2.3.2 MA(q) and AR(k) Terms

What about models with MA terms? More generally, how should we test the hypotheses (2.1) vs (2.2) for the general unit root model? That is:

H_0 : Φ(L) ∇y_t = Θ(L) e_t,   Θ(1) ≠ 0

vs

H_A : Φ(L)(1 - φL) y_t = Θ(L) e_t,   -1 < φ < 1.

A possible approach might be suggested by the fact that a general ARMA process can be approximated by an AR model of sufficiently high order to ensure white noise residuals. The usefulness of this approach was confirmed by Said and Dickey. They showed that an asymptotically valid unit root test for mixed models with AR and MA components is obtained if the data are analysed as if the process were an autoregressive model, where the order of the AR model is related to n, the sample size.

So both of the problems with the Dickey-Fuller test are solved if we can test for unit roots in AR(k) processes. Dickey and Fuller have developed such a test; it is called the Augmented Dickey-Fuller (ADF) test.


2.3.3 The ADF Test

We recall that a general AR(k) process,

y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + . . . + φ_k y_{t-k} + e_t,

can be written as:

∇y_t = δ y_{t-1} + ψ_1 ∇y_{t-1} + . . . + ψ_{k-1} ∇y_{t-k+1} + e_t.

This version of the process is often called an Error Correction Mechanism (ECM) and Section 5.1 contains a detailed discussion of such models. We saw earlier that this process contains a unit root if δ = 0 and is stationary if δ < 0. Dickey and Fuller showed that in large samples the t-statistic δ̂/se(δ̂) follows the same distribution as the τ-statistic in the Dickey-Fuller test.

We can generalize to include drift and trend terms:

∇y_t = α + βt + δ y_{t-1} + ψ_1 ∇y_{t-1} + . . . + ψ_{k-1} ∇y_{t-k+1} + e_t.   (2.13)

Dickey and Fuller have also shown that in large samples the ADF versions of not just τ but of all the statistics τ, τ_μ, τ_τ, Φ_1, Φ_2, Φ_3 follow the same distributions as in the Dickey-Fuller case (c.f. Table 2.1).

As mentioned before, an ARMA model with unknown orders for the AR and MA components can be approximated by an AR(k) process, so long as k is sufficiently large to ensure white noise residuals. The order k will increase as the sample size increases; Schwert suggests using

k = int[12 (T/100)^{1/4}],   (2.14)

where int[·] represents the integer part.
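As a one-line helper, (2.14) transcribes directly:

```python
def schwert_lag(T):
    """Schwert's rule of thumb (2.14): k = int(12 * (T / 100) ** (1/4))."""
    return int(12 * (T / 100) ** 0.25)

print(schwert_lag(100), schwert_lag(25))   # 12 8
```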


Choosing the correct lag length is important. Including too few lags will mean that the errors e_t will still be non-stationary, and this will increase the probability of a type I error. Including too many lags may reduce the power of the test, as the model will include too many unnecessary additional parameters. However, it is better to include too many lags than too few: if we include too many, the regression can set the unnecessary ones to zero, while perhaps losing some efficiency.

2.3.4 ADF Test Procedure: Phase 1

As discussed, the order k of the levels autoregression, or k - 1 in the ECM (2.16), is unknown, so our first task is to decide on this using the data, including as many lags as is appropriate to ensure that the residuals are IID white noise. One could begin here by examining PACF and ACF plots of the differenced series ∇y_t to try and determine how many lags should be included. A significant PACF at lag j would indicate one should fit k - 1 = j lags in the Error Correction Model.

Alternatively one could use (2.14) initially, then fit ARIMA models with the order of the AR part equal to k - 1, the order of integration d equal to 1 and no MA terms to the original data. That is, fit (2.15) to the data:

∇y_t = α + βt + ψ_1 ∇y_{t-1} + . . . + ψ_{k-1} ∇y_{t-k+1} + e_t.   (2.15)

It should be noted that (2.15) is in fact (2.13) without the y_{t-1} term.

We then try to fit (2.15) with one less lag and use Lagrange Multiplier tests to check for white noise residuals. The Ljung-Box-Pierce statistic is appropriate here; it looks at the residuals as a group, testing for white noise. We compare a model with k lags with one with k - 1 lags to see if the chosen k is correct. We continue reducing the number of lags in the ARIMA model and stop when the Ljung-Box-Pierce statistic rejects white-noise residuals. A sketch of this search is given below.
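This downward search can be sketched as follows (assuming statsmodels, whose acorr_ljungbox implements the Ljung-Box-Pierce statistic; the choice of 10 test lags and the helper name are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

def phase1_lag(y, k_max, lb_lags=10, alpha=0.05):
    """Fit (2.15), dy_t regressed on drift, trend and k-1 lagged differences,
    for decreasing k, keeping the smallest k whose residuals still pass the
    Ljung-Box-Pierce white-noise test."""
    dy = np.diff(y)
    best = k_max
    for k in range(k_max, 0, -1):
        lags = k - 1
        rows = len(dy) - lags
        X = [np.ones(rows), np.arange(rows)]                 # drift and trend
        X += [dy[lags - j: lags - j + rows] for j in range(1, lags + 1)]
        resid = sm.OLS(dy[lags:], np.column_stack(X)).fit().resid
        pval = acorr_ljungbox(resid, lags=[lb_lags], return_df=True)["lb_pvalue"].iloc[0]
        if pval < alpha:     # residuals rejected as white noise: stop reducing
            break
        best = k
    return best
```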


2.3.5 ADF Test Procedure: Phase 2

By analogy with equation (2.8) in the Dickey-Fuller test procedure, we see that the ADF procedure appropriately begins by estimating the following ECM:

∇y_t = α + βt + δ y_{t-1} + Σ_{i=1}^{k-1} ψ_i ∇y_{t-i} + e_t.   (2.16)

That is, we regress ∇y_t on a constant, t, y_{t-1}, and the lagged differences ∇y_{t-1}, ∇y_{t-2}, . . . , ∇y_{t-k+1}.

Having decided on an appropriate order for the autoregression, the rest of the Augmented Dickey-Fuller test procedure follows the same steps as in the basic Dickey-Fuller case. Refer to Table 2.3 for details.

Of course if we decide that a series contains a unit root, there is the question as to whether it also contains a second unit root, i.e. is the order of integration I(1) or I(2)? To test this we should go through the ADF testing procedure on the differenced series ∇y_t. So instead of (2.16) we would begin here with the regression:

∇²y_t = α + βt + δ ∇y_{t-1} + Σ_{i=1}^{k-1} ψ_i ∇²y_{t-i} + e_t,   (2.17)

and proceed as usual.
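That check can be sketched as follows (assuming statsmodels' adfuller; the 'ct' specification mirrors the drift and trend of (2.16)):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def order_of_integration(y, max_d=2, alpha=0.05):
    """ADF-test the series, then its differences (cf. (2.16) and (2.17)),
    until the unit-root null is rejected; the number of differences taken
    is the estimated order of integration."""
    for d in range(max_d + 1):
        pval = adfuller(np.diff(y, n=d), regression="ct", autolag="AIC")[1]
        if pval < alpha:
            return d
    return max_d   # still non-stationary after max_d differences

rng = np.random.default_rng(3)
print(order_of_integration(np.cumsum(rng.standard_normal(400))))   # expect 1
```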

3

Spurious Regressions

    When analysing several time series and trying to establish relationships between

    them, it is important to be aware of the possibility of spurious regression. It is

    possible that two independent time series can appear to be related when in fact

    all that is happening is that there are correlated time trends. In Trend-Stationary

    series one should include a deterministic time trend in the regression in order to

    remove the trend effect. This will leave residuals which are stationary and allow

    valid statistical inferences using t or F tests.

But suppose we are dealing with Difference-Stationary series; in this case including a time trend in the model is not sufficient. Using standard regression techniques with non-stationary data will lead to spurious regressions giving invalid inferences using t or F tests. An example will illustrate this. Consider the following two independent time-series:

y_t = φ y_{t-1} + u_t,   u_t IID white noise   (3.1)

x_t = φ x_{t-1} + v_t,   v_t IID white noise.   (3.2)

The two series x_t and y_t are unrelated and estimation of the model

y_t = β_0 + β_1 x_t + ε_t   (3.3)

should give the conclusion β_1 = 0. In reaching that conclusion we use the fact that β̂_1/se(β̂_1) should be distributed as a Student-t distribution with N - 2 degrees of freedom, where N is the number of pairs of observations (x_t, y_t). However non-stationarity in the models (3.1), (3.2) can lead to a non-stationary ε_t, and the fact that both series are changing with t will show up in the modelling as a correlation between the two series and as a non-zero estimate for β_1. So estimation of model (3.3) will imply a causal relationship between the series when in fact none is present. To illustrate this spurious regression problem we have simulated the series x_t and y_t with φ = 0.1 to give a pair of stationary series, and then with φ = 1 to give a non-stationary pair.


[Figure 3.2: Scatter plot of y_t vs x_t when φ = 0.1]

Examining Figure 3.2 we can see that the series x_t and y_t do not display any correlation, as expected.

We now consider the series simulated with φ = 1. Time series plots of x_t and y_t with φ = 1 are shown in Figure 3.3 and clearly indicate that the series are non-stationary.

[Figure 3.3: Time series plots of x_t and y_t when φ = 1]

The spurious regression phenomenon can be clearly seen when we examine a scatter plot of x_t vs y_t with φ = 1 (Figure 3.4). In this plot there is a clear positive correlation between the series x_t and y_t, despite the fact that these are generated from entirely independent processes.


To further examine the nature of the spurious regressions we estimated (3.3) for the stationary pair of series and separately for the non-stationary pair, computing β̂_1/se(β̂_1) in each case. We repeated these simulations 10,000 times and Table 3.1 compares the percentiles of β̂_1/se(β̂_1) from the stationary and non-stationary regressions. The spurious regression problem can be seen quite clearly from these simulations. When φ = 0.1 and we are dealing with stationary series, β̂_1/se(β̂_1) is distributed as a t-distribution with N - 2 degrees of freedom. However, when φ = 1, β̂_1/se(β̂_1) is clearly no longer distributed as a t-distribution. In fact, it is clear that the distribution of β̂_1/se(β̂_1) in this case is much more spread out, leading to much higher rejection of the null hypothesis β_1 = 0 in favour of β_1 ≠ 0.

Source           90th Percentile   95th Percentile   99th Percentile
t-distribution   1.312527          1.701131          2.46714
φ = 0.1          1.292041          1.673797          2.438064
φ = 1            8.128050          10.973310         17.09559

Table 3.1: β̂_1/se(β̂_1): Spurious vs Non-Spurious Regressions, 10,000 Simulations
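A small Monte-Carlo sketch along the same lines (illustrative assumptions: 2,000 replications rather than 10,000, and a sample size of 50, so the percentiles will only roughly match Table 3.1):

```python
import numpy as np
import statsmodels.api as sm

def slope_tstats(phi, n=50, reps=2000, seed=4):
    """Simulate the independent pair (3.1)-(3.2), regress y_t on x_t
    as in (3.3), and collect the t-statistics for beta_1."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for i in range(reps):
        u, v = rng.standard_normal((2, n))
        y, x = np.empty(n), np.empty(n)
        y[0], x[0] = u[0], v[0]
        for t in range(1, n):
            y[t] = phi * y[t - 1] + u[t]
            x[t] = phi * x[t - 1] + v[t]
        out[i] = sm.OLS(y, sm.add_constant(x)).fit().tvalues[1]
    return out

for phi in (0.1, 1.0):
    pct = np.percentile(slope_tstats(phi), [90, 95, 99])
    print(f"phi = {phi}: 90/95/99th percentiles of t = {np.round(pct, 2)}")
```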



    4

    Multivariate Time Series

    4.1 Vector Time Series Models

Multivariate time series data are often modelled using Vector Autoregressive Moving Average (VARMA) models. These are a more general class of time series model and can be used to describe relationships between a number of time series variables (rather than focusing on the relationship between a single dependent variable and several independent variables, as we have discussed up to now).

    4.1.1 Covariance and Correlation Matrix Functions

We denote the variables being studied as:

Z_t = [Z_{1t}, Z_{2t}, . . . , Z_{Mt}]′   (4.1)

where M is the number of time series being studied and t = 0, ±1, ±2, . . .. Z_t is an M-dimensional real-valued vector process. We also assume that Z_t is jointly stationary.


Definition: Joint stationarity implies that each univariate component process is stationary. The converse is not necessarily true, however: a vector of stationary univariate time series is not necessarily a jointly stationary process.

We also assume that the expected value of Z_t is given by:

E(Z_{rt}) = μ_r   (4.2)

E(Z_t) = μ = (μ_1, μ_2, . . . , μ_M)′   (4.3)

where the mean, μ_r, is constant for each r = 1, 2, . . . , M. The covariances between Z_{rt} and Z_{su} (for all r, s = 1, 2, . . . , M) are functions of the lag, or time difference, (u - t). The covariance matrix for lag k is given by:

Γ(k) = Cov(Z_t, Z_{t+k}) = E[(Z_t - μ)(Z_{t+k} - μ)′]

     = [ γ_{11}(k)  γ_{12}(k)  . . .  γ_{1M}(k)
         γ_{21}(k)  γ_{22}(k)  . . .  γ_{2M}(k)
           ⋮           ⋮                ⋮
         γ_{M1}(k)  γ_{M2}(k)  . . .  γ_{MM}(k) ]

     = Cov(Z_{t-k}, Z_t)   (4.4)

where

γ_{rs}(k) = E[(Z_{rt} - μ_r)(Z_{s(t+k)} - μ_s)] = E[(Z_{r(t-k)} - μ_r)(Z_{st} - μ_s)]   (4.5)

for k = 0, ±1, ±2, . . ., and r, s = 1, 2, . . . , M.


Γ(k) is referred to as the covariance matrix function for Z_t. γ_{rr}(k) is the autocovariance function for Z_{rt}, and γ_{rs}(k) denotes the cross-covariance function between Z_{rt} and Z_{st}. Finally, Γ(0) is the variance-covariance matrix at a given time.

The correlation matrix function for Z_t is calculated using the matrix D, where D is the diagonal matrix of the M variances:

D = diag[γ_{11}(0), γ_{22}(0), . . . , γ_{MM}(0)].   (4.6)

The correlation matrix is given by:

ρ(k) = D^{-1/2} Γ(k) D^{-1/2} = [ρ_{rs}(k)]   (4.7)

for r, s = 1, 2, . . . , M. The rth diagonal element of the correlation matrix, ρ_{rr}(k), represents the autocorrelation function for the rth series in Z_t, i.e. the ACF for Z_{rt}. The off-diagonal terms of the correlation matrix ρ(k) are the cross-correlation functions between the corresponding series, e.g. ρ_{rs}(k) is the cross-correlation function between Z_{rt} and Z_{st}. Each element can also be calculated using the following formula:

ρ_{rs}(k) = γ_{rs}(k) / [γ_{rr}(0) γ_{ss}(0)]^{1/2}.   (4.8)

Assumptions

It is important to note that the covariance and correlation matrices for a vector time series are positive semi-definite, in the sense that:

Σ_{r=1}^{T} Σ_{s=1}^{T} α_r′ Γ(t_r - t_s) α_s ≥ 0

and

Σ_{r=1}^{T} Σ_{s=1}^{T} α_r′ ρ(t_r - t_s) α_s ≥ 0   (4.9)


for any set of time points t_1, t_2, . . . , t_T and any set of real vectors α_1, α_2, . . . , α_T.

It should also be noted that in general

γ_{rs}(k) ≠ γ_{rs}(-k)   and   ρ_{rs}(k) ≠ ρ_{rs}(-k).   (4.10)

Instead,

Γ(k) = Γ′(-k)   and   ρ(k) = ρ′(-k)   (4.11)

since

γ_{rs}(k) = E[(Z_{rt} - μ_r)(Z_{s(t+k)} - μ_s)]
          = E[(Z_{s(t+k)} - μ_s)(Z_{rt} - μ_r)]
          = γ_{sr}(-k).   (4.12)


4.1.2 Moving Average and Autoregressive Vector Models

Moving Average Vector Models

The stationary vector time series Z_t is called a linear process (or purely nondeterministic process) if it can be written as a linear combination of white noise random vectors:

Z_t = μ + a_t + Ψ_1 a_{t-1} + Ψ_2 a_{t-2} + . . . = μ + Σ_{u=0}^{∞} Ψ_u a_{t-u}   (4.13)

where the a_t are M-dimensional white noise random vectors with mean zero and covariance matrix given by:

E(a_t a_{t+k}′) = Σ if k = 0, and 0 if k ≠ 0,   (4.14)

where Σ is an M × M symmetric positive definite matrix. The elements of the vector a_t at different times are uncorrelated; however, they may be contemporaneously correlated.

Note also that the coefficients of the linear combination, Ψ_u, are M × M coefficient matrices with Ψ_0 = I_M, the identity matrix.

This process is known as the multivariate moving average process.

Autoregressive Vector Models

The vector process can also be expressed as an autoregressive process. In an autoregressive model, the value of the series Z at a given time t is regressed on its own past values and a random vector (of errors or shocks):

Z_t = Φ_1 Z_{t-1} + Φ_2 Z_{t-2} + . . . + a_t = Σ_{u=1}^{∞} Φ_u Z_{t-u} + a_t   (4.15)


This can also be expressed in terms of the backshift operator, B:

Φ(B) Z_t = a_t   (4.16)

where

Φ(B) = I - Σ_{u=1}^{∞} Φ_u B^u   (4.17)

and the Φ_u are M × M matrices of the autoregressive coefficients; in particular, Φ_0 = I_M. In order for the process to be invertible, the autoregressive coefficient matrices must be absolutely summable, i.e.

Σ_{u=0}^{∞} |φ_{rs,u}| < ∞   (4.18)

for all r and s, where Φ_u = [φ_{rs,u}].

We have mentioned the conditions for stationarity in a moving average process and invertibility in an autoregressive process. One does not imply the other.

A stationary process is not necessarily invertible. For a vector process with a stationary moving average representation to be invertible, no zeros of the determinant of the moving average matrix polynomial, |Ψ(B)|, should lie inside or on the unit circle, i.e.:

|Ψ(B)| ≠ 0 for |B| ≤ 1.   (4.19)

Similarly, an invertible process is not necessarily stationary. If a vector process has an invertible autoregressive representation, it is stationary only if the determinant of the autoregressive matrix polynomial, |Φ(B)|, has no zeros on or inside the unit circle, i.e.:

|Φ(B)| ≠ 0 for |B| ≤ 1.   (4.20)

The general Vector ARMA(p, q) model combines both components:

Φ_p(B) Z_t = Θ_q(B) a_t.

Such a process is invertible if the zeros of the determinantal polynomial |Θ_q(B)| are outside the unit circle.

In such a case, the model can be re-written in the form:

Π(B) Z_t = a_t   (4.25)

where

Π(B) = [Θ_q(B)]^{-1} Φ_p(B) = I - Σ_{u=1}^{∞} Π_u B^u   (4.26)

such that the sequence Π_u is absolutely summable.

The Vector ARMA(p, q) process is said to be stationary if the zeros of the determinantal polynomial |Φ_p(B)| are outside the unit circle, i.e.:

|Φ_p(B)| ≠ 0 for |B| ≤ 1.   (4.27)

A stationary process can then be written as:

Z_t = Ψ(B) a_t   (4.28)

where

Ψ(B) = [Φ_p(B)]^{-1} Θ_q(B) = Σ_{u=0}^{∞} Ψ_u B^u   (4.29)

such that the sequence Ψ_u is square summable.

Model Identification

The identification process for a Vector ARMA(p, q) model is similar to the identification process for a univariate time series. In the univariate case, the following steps are taken:

1. The time series plot is examined for evidence of non-stationarity.

2. If necessary, transformations (such as differencing or de-trending) of the data are applied to ensure stationarity.

3. The sample autocorrelation function and sample partial autocorrelation function are calculated and plotted. These graphs are used to estimate the order of the autoregressive and the order of the moving average components of the model (p and q respectively).

In a similar way, given a vector time series Z_1, Z_2, . . . , Z_n, the underlying model is identified using the sample correlation and partial autocorrelation function matrices (once any necessary transformations have been applied to ensure stationarity).

The Sample Correlation Matrix Function

For the observed vector time series Z_1, Z_2, . . . , Z_n, the sample correlation matrix function is denoted by:

ρ̂(k) = [ρ̂_{rs}(k)].   (4.30)

The ρ̂_{rs}(k) are calculated using the following formula (Equation (4.31)) and represent the sample cross-correlations between Z_r and Z_s:

ρ̂_{rs}(k) = Σ_{t=1}^{n-k} (Z_{rt} - Z̄_r)(Z_{s(t+k)} - Z̄_s) / [ Σ_{t=1}^{n} (Z_{rt} - Z̄_r)² Σ_{t=1}^{n} (Z_{st} - Z̄_s)² ]^{1/2}   (4.31)

where Z̄_r and Z̄_s are the sample means of Z_r and Z_s respectively. It has been shown (Hamann REF) that the sample correlation function estimator ρ̂(k) is consistent and asymptotically Normally distributed, assuming that the vector process is stationary.

The sample correlation matrix function is used to identify the order of the (finite-order) moving average component of the ARMA model. This is due to the property that, for a vector MA(q) process, the correlation matrices beyond lag q are zero.


With high-dimensional vectors, however, identification using the sample correlation matrices can be difficult simply due to the number of elements, which can make it extremely difficult to determine the patterns present in the matrices. There is a convenient method (introduced by Tiao and Box REF) which can ease the complexity of pattern recognition: the sample correlations are summarized by converting the entries to one of three symbols:

+ denotes a value greater than 2 times the estimated standard error,

- denotes a value less than -2 times the estimated standard error, and

· denotes a value within ±2 estimated standard errors.

A sketch of this summary appears below.
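The sketch covers both the sample cross-correlation matrices (4.31) and the symbol coding (one hedge: 2/√n is used below as the approximate standard error of a sample correlation, a common large-sample simplification that the notes do not spell out):

```python
import numpy as np

def sample_corr_matrix(Z, k):
    """Sample correlation matrix rho_hat(k) of (4.31) for an (n x M) array Z."""
    n, _ = Z.shape
    Zc = Z - Z.mean(axis=0)
    scale = np.sqrt((Zc ** 2).sum(axis=0))    # per-series sum-of-squares scale
    return (Zc[: n - k].T @ Zc[k:]) / np.outer(scale, scale)

def tiao_box_symbols(R, n):
    """Tiao-Box coding: '+', '-' or '.' according to whether each entry lies
    above, below or within roughly +/- 2 standard errors (approximated here
    by 2/sqrt(n))."""
    band = 2.0 / np.sqrt(n)
    return np.where(R > band, "+", np.where(R < -band, "-", "."))

rng = np.random.default_rng(5)
Z = rng.standard_normal((200, 3))             # three independent series
print(tiao_box_symbols(sample_corr_matrix(Z, 1), len(Z)))
```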

The Partial Autoregression Matrices

The order of the autoregressive component of the Vector ARMA(p, q) can be identified in a similar way using the partial autocorrelation function (PACF). The PACF between Z_t and Z_{t+k} is defined as the correlation between the two after the linear dependency on the intervening variables (Z_{t+1}, Z_{t+2}, . . . , Z_{t+k-1}) has been removed:

φ_{kk} = Cov[(Z_t - Ẑ_t), (Z_{t+k} - Ẑ_{t+k})] / [ Var(Z_t - Ẑ_t) Var(Z_{t+k} - Ẑ_{t+k}) ]^{1/2}   (4.32)

where Ẑ_t and Ẑ_{t+k} are the linear estimators of Z_t and Z_{t+k} calculated by minimum mean squared error linear regression on Z_{t+1}, Z_{t+2}, . . . , Z_{t+k-1}. This function φ_{kk} is zero for |k| > p, where p is the number of autoregressive terms required by the underlying model.


    5

    Cointegration

    5.1 The Error Correction Model

Let us introduce the Error Correction Model, which is favoured by economists as a means of modelling time series with both long- and short-run behaviours. Suppose we have two series I (Income) and C (Consumption) which each have unit roots, so that ∇I and ∇C are stationary. Now suppose that we believe that there is a relationship between I and C. Because these two series are non-stationary, we have seen that trying to model a relationship between them is subject to the problem of spurious regression.

But since the differenced series are stationary we may decide that these differenced series are related by a regression model:

∇C_t = β_0 ∇I_t + e_t.   (5.1)

In this model increasing I by one unit per period will increase C by β_0 units per period. Statistically this model is sound, but in economics it may not be so reasonable.


In particular, one might think that the relationship between the increase in Consumption given an increase in Income should also depend on the current level of Income. One reason for this might be that if one earns a lot then any increase in income need not be saved for necessities but could instead be spent freely, whereas if one is not in a high income bracket then extra income may not be so liberally consumed.

Economists are also generally interested in systems reaching equilibrium, and the model (5.1) does not include an equilibrium solution. In equilibrium we would have C_t = C_{t-1} = . . . and I_t = I_{t-1} = . . . .

One way to try and fix these problems is to include a term which is the deviation between the actual value of C in the previous period t - 1 and the equilibrium value of C.

Suppose the equilibrium relationship between C and I is linear:

C_t^{equil} = γ I_t.   (5.2)

Then the deviation from this equilibrium at period t - 1 is C_{t-1} - γ I_{t-1}. We can incorporate this as a correction to model (5.1), and so the new model is:

∇C_t = β_0 ∇I_t - λ(C_{t-1} - γ I_{t-1}) + e_t.   (5.3)

The parameter λ is usually rewritten as (1 - α_1), where α_1 < 1, giving:

∇C_t = β_0 ∇I_t - (1 - α_1)(C_{t-1} - γ I_{t-1}) + e_t.   (5.4)

This type of model is called an Error Correction Model (ECM) as it has the ability to correct disequilibria.


Let us consider how it implements this correction. Firstly, during periods of equilibrium the term (C_{t-1} - γ I_{t-1}) will be zero and the model (5.4) will revert to the form (5.1). In a period of disequilibrium, C_t increases faster or slower than expected by the equilibrium relationship (5.2).

If C_t increases slower than expected then we will find (C_{t-1} - γ I_{t-1}) < 0, but -(1 - α_1) < 0 also. So the net effect is to add a positive term to the equilibrium value β_0 ∇I_t, thus boosting ∇C_t and forcing C_t back towards equilibrium. If C_t increases faster than expected then we are instead adding a negative term, which again forces C_t back towards its equilibrium value.
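A short simulation sketch of (5.4) shows the gap being corrected away (the values β_0 = 0.5, α_1 = 0.5, γ = 1 and the income path are invented for illustration):

```python
import numpy as np

# ECM (5.4): dC_t = b0*dI_t - (1 - a1)*(C_{t-1} - g*I_{t-1}) + e_t.
b0, a1, g = 0.5, 0.5, 1.0            # illustrative values, not estimates
rng = np.random.default_rng(6)

n = 40
I = np.cumsum(0.2 + 0.05 * rng.standard_normal(n))   # slowly growing income
C = np.empty(n)
C[0] = g * I[0] + 2.0                # start two units above equilibrium
for t in range(1, n):
    gap = C[t - 1] - g * I[t - 1]                    # the disequilibrium term
    C[t] = (C[t - 1] + b0 * (I[t] - I[t - 1])
            - (1 - a1) * gap + 0.01 * rng.standard_normal())

# The gap shrinks roughly geometrically at rate a1 per period:
print(np.round(C[:8] - g * I[:8], 3))
```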

The original model (5.1) did have problems as far as economics was concerned; however, (5.1) was a relationship between stationary variables and so was sound statistically. The new model (5.4) may make more economic sense, but it now has a statistical problem: it only makes sense if the new variable C_{t-1} - γ I_{t-1} is stationary. But this variable is a linear combination of two non-stationary I(1) variables, C and I at time t - 1, and such a combination will also in general be non-stationary I(1).

We note that we can also generalize the ECM model (5.4), including more lag lengths, to the following relationship linking the variables C_t and I_t:

A(L) ∇C_t = B(L) ∇I_t - (1 - α)(C_{t-1} - γ I_{t-1}) + e_t,   (5.5)

where

A(L) = 1 - α_1 L - α_2 L² - . . . - α_k L^k,
B(L) = β_0 + β_1 L + β_2 L² + . . . + β_q L^q.

5.2 Vector Error Correction Models

The error correction idea extends to vector time series. Suppose the vector series y_t follows a VAR(k) model:

y_t = Φ_1 y_{t-1} + Φ_2 y_{t-2} + . . . + Φ_k y_{t-k} + ε_t.   (5.6)

Alternatively (5.6) can be written as

∇y_t = Π y_{t-k} + Γ_1 ∇y_{t-1} + . . . + Γ_{k-1} ∇y_{t-k+1} + ε_t   (5.8)

where

Γ_i = -(I - Φ_1 - Φ_2 - . . . - Φ_i),   i = 1, 2, . . . , k - 1,

and

Π = Φ_1 + . . . + Φ_k - I.

In this section we introduced the ECM, but we have seen that this model appears not to make sense statistically, as it involves both I(1) and I(0) variables together in the same regression. The solution to this problem was presented by Engle and Granger when they introduced the concept of cointegration, which we will examine in the next section of this chapter.

    5.3 Cointegration

Consider two series y_{1t} and y_{2t} which are both integrated of order d, i.e. I(d). In general any linear combination of these series will also be integrated of order d. In particular, if a regression is performed of y_{1t} on y_{2t}, then the residuals from this regression will be I(d), i.e. the regression will suffer from spurious correlation. Engle and Granger noticed that in some situations it might be possible to perform a regression containing non-stationary variables and still avoid spurious regression. They introduced the concept of cointegration:

Definition: The components of a vector series y_t are said to be cointegrated of order (d, b), written CI(d, b), if all components of y_t are I(d) and there exists a non-zero vector β such that the linear combination β′y_t is integrated of order d - b, for some b > 0.

In fact if y_t is a p-dimensional vector time series then there may be h < p linearly independent p × 1 vectors (β_1, β_2, . . . , β_h) such that βy_t is a stationary vector time series, where β is the following h × p matrix:

β = [ β_1′ ]   [ β_11  β_12  . . .  β_1p ]
    [ β_2′ ] = [ β_21  β_22  . . .  β_2p ]
    [  ⋮   ]   [  ⋮                  ⋮   ]
    [ β_h′ ]   [ β_h1  β_h2  . . .  β_hp ]

As mentioned, the vectors (β_1, β_2, . . . , β_h) are not unique, since for any non-zero 1 × h vector a the linear combination aβy_t is also stationary. If (β_1, β_2, . . . , β_h) span the co-integrating space then they form a basis for the co-integrating space.

Having seen the formal definition of cointegration, let us consider what it means in practice. Cointegration means that although there may be many apparently independent changes in the individual elements of y_t, there are actually some long-run equilibrium relations tying the individual components together. These relations are represented by the linear combinations βy_t. So cointegration provides a model for the idea in economics of a long-run equilibrium to which the system will converge over time.

We see now that if two variables are fully co-integrated, CI(d, d), it is possible to perform a meaningful regression between them: the regression would pick up the stationary linear combination and the residuals would no longer be non-stationary, thus eliminating the problem of spurious regression. In practice we mainly deal with I(1) variables and seek to find co-integrating linear combinations which will be stationary.


    5.4 Engle-Granger Estimation

When the concept of cointegration was introduced by Engle and Granger, they suggested a procedure to test for cointegration between two variables. If two variables y_t and x_t are co-integrated then there is a stationary linear combination of the variables. This means that the model:

y_t = β x_t + e_t   (5.9)

describes a stationary relationship, does not suffer from spurious regression and can be consistently estimated by ordinary least squares.

Now, if the two variables y_t and x_t are not co-integrated then there will not be a stationary linear combination of the two variables, and hence equation (5.9) would once again suffer from spurious regression, as the residuals will be non-stationary.

Engle and Granger make use of this fact to construct a test for cointegration. They suggest using an ADF test on the residuals ê_t of the regression (5.9) to see if they satisfy the null of being I(1) or the alternative of stationarity, I(0). So, as described in Section 2.3, we should estimate:

∇ê_t = δ ê_{t-1} + Σ_{i=1}^{k-1} ψ_i ∇ê_{t-i} + α + βt + ε_t,   ε_t IID white noise.   (5.10)

    The trend and drift terms can be added in the regression (5.9) or in (5.10) but

    not in both.
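A sketch of the two-step procedure (assuming statsmodels; note that because the residuals are estimated, adfuller's tabulated critical values are not strictly appropriate in step 2, and the packaged coint() applies the corrected Engle-Granger values):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(7)
n = 300
x = np.cumsum(rng.standard_normal(n))          # x_t is I(1)
y = 2.0 + 0.5 * x + rng.standard_normal(n)     # cointegrated by construction

# Step 1: estimate the cointegrating regression (5.9) by OLS.
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# Step 2: ADF regression (5.10) on the residuals.
print(f"ADF tau on residuals: {adfuller(resid, regression='n')[0]:.3f}")
print(f"Engle-Granger statistic: {coint(y, x)[0]:.3f}")
```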


This Engle-Granger procedure for testing for cointegration suffers from several problems:

1. The test has low power.

2. In finite samples the cointegration estimates may be biased.

3. Inferences about parameters in (5.9) cannot be performed using standard t-statistics.

In addition to these problems, this approach, which uses a single equation in the model, is only really suitable if there is just one cointegrating relationship. In general, the multivariate Vector Auto Regression (VAR) approach of Johansen is to be preferred.