
Introduction to Stata Handout 5: Time Series

    Hayley Fisher

    2 December 2010

    Key references: Wooldridge (2009) part II, Greene (2008) chapter 21.

    1 Time series

To illustrate some basic features of Stata with time series, I am using a dataset of the general fertility rate and personal tax exemptions for the US from 1913 to 1984. This is based on an example in chapter 10 of Wooldridge (2009), and ultimately on an article by Whittington, Alm and Peters from 1990. The dataset is available from my website.

    We start off by summarizing the small dataset.

    . summarize

    Variable | Obs Mean Std. Dev. Min Max

    -------------+--------------------------------------------------------

    gfr | 72 95.63194 19.80464 65.4 126.6

    pe | 72 100.4015 65.87563 0 243.83

    year | 72 1948.5 20.92845 1913 1984

    pill | 72 .3055556 .4638749 0 1

    ww2 | 72 .0694444 .2559923 0 1

gfr is the general fertility rate, and we are looking to explain it using the personal tax exemption (pe), whether World War II was occurring (ww2), and whether the contraceptive pill was available (pill). Stata makes it easy to create variables such as lags, leads and first differences. To take advantage of these we first need to declare the data to be time series using the tsset command:

    . tsset year

    time variable: year, 1913 to 1984

    delta: 1 unit

Once this has been done, we can create variables using the lag (L.), lead (F.) and first difference (D.) operators:

    . generate Lgfr=L.gfr

    (1 missing value generated)

    . generate Fgfr=F.gfr

    (1 missing value generated)

    . generate Dgfr=D.gfr

    (1 missing value generated)

A missing value is created at the beginning or end of the dataset, as appropriate. We can check that these are correct by listing the first five values:


. list year gfr Lgfr Fgfr Dgfr in 1/5

    +------------------------------------------+

    | year gfr Lgfr Fgfr Dgfr |

    |------------------------------------------|

    1. | 1913 124.7 . 126.6 . |

    2. | 1914 126.6 124.7 125 1.900002 |

    3. | 1915 125 126.6 123.4 -1.599998 |

    4. | 1916 123.4 125 121 -1.599998 |

    5. | 1917 121 123.4 119.8 -2.400002 |

    +------------------------------------------+
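Outside Stata, these operators are just shifts of the series. A minimal pure-Python sketch (using the first five gfr values from the listing above; None stands in for Stata's missing value):

```python
# Pure-Python sketch of Stata's L. (lag), F. (lead) and D. (difference)
# operators; None stands in for Stata's missing value.

def lag(series, k=1):
    """L.x (or Lk.x): shift the series forward, padding the start."""
    return [None] * k + series[:-k]

def lead(series, k=1):
    """F.x (or Fk.x): shift the series backward, padding the end."""
    return series[k:] + [None] * k

def diff(series):
    """D.x: first difference x_t - x_{t-1}; missing for the first value."""
    return [None] + [b - a for a, b in zip(series, series[1:])]

gfr = [124.7, 126.6, 125.0, 123.4, 121.0]  # first five gfr values
print(lag(gfr))    # [None, 124.7, 126.6, 125.0, 123.4]
print(lead(gfr))   # [126.6, 125.0, 123.4, 121.0, None]
print([d if d is None else round(d, 1) for d in diff(gfr)])
# [None, 1.9, -1.6, -1.6, -2.4], matching the Stata listing
```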

We can naively estimate the relationship between the general fertility rate and these variables using regress.

    . regress gfr pe ww2 pill, vce(robust)

    Linear regression Number of obs = 72

    F( 3, 68) = 51.57

    Prob > F = 0.0000

    R-squared = 0.4734

    Root MSE = 14.685

    ------------------------------------------------------------------------------

    | Robust

    gfr | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    pe | .08254 .0269359 3.06 0.003 .0287902 .1362898

    ww2 | -24.2384 3.351355 -7.23 0.000 -30.92592 -17.55087

    pill | -31.59403 3.131765 -10.09 0.000 -37.84337 -25.34469

    _cons | 98.68176 4.222996 23.37 0.000 90.2549 107.1086

    ------------------------------------------------------------------------------

As would be expected, fertility appears to be lower during World War II and when the contraceptive pill is available. For the reported standard errors to be valid we require no serial correlation in the error terms. We can test for this using the Durbin-Watson statistic, implemented by typing estat dwatson after running the regression.

    . estat dwatson

    Durbin-Watson d-statistic( 4, 72) = .1768727

The statistic is far below 2 and so indicates positive serial correlation. Stata can also implement two Lagrange multiplier tests for serial correlation, using estat durbinalt and estat bgodfrey.
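As an aside, the Durbin-Watson statistic reported above is simple to compute by hand: d = Σ(e_t - e_{t-1})² / Σ e_t² over the regression residuals, which is near 2 when there is no first-order serial correlation, near 0 under strong positive correlation and near 4 under strong negative correlation. A pure-Python sketch on made-up residuals (not the residuals from the regression above):

```python
def durbin_watson(resid):
    """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Made-up residuals: a slowly drifting (positively autocorrelated) series
# gives d far below 2; a sign-alternating series gives d near 4.
drifting = [1.0, 1.1, 1.2, 1.1, 1.0, 0.9, 0.8, 0.9, 1.0, 1.1]
alternating = [1.0, -1.0] * 4
print(round(durbin_watson(drifting), 3))     # 0.009
print(round(durbin_watson(alternating), 3))  # 3.5
```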

    . estat durbinalt

Durbin's alternative test for autocorrelation

    ---------------------------------------------------------------------------

    lags(p) | chi2 df Prob > chi2

    -------------+-------------------------------------------------------------

    1 | 255.261 1 0.0000

    ---------------------------------------------------------------------------

    H0: no serial correlation


. estat bgodfrey

    Breusch-Godfrey LM test for autocorrelation

    ---------------------------------------------------------------------------

    lags(p) | chi2 df Prob > chi2

    -------------+-------------------------------------------------------------

    1 | 57.031 1 0.0000

    ---------------------------------------------------------------------------

    H0: no serial correlation

Again, serial correlation is detected. There are several ways of dealing with serial correlation of the error terms. One method is to add lags of the dependent variable. Here we add two lags; it is not necessary to create the lag variables before running the regression, as the operators can be used directly:

    . regress gfr L.gfr L2.gfr pe ww2 pill

    Source | SS df MS Number of obs = 70

    -------------+------------------------------ F( 5, 64) = 341.66

    Model | 25053.8199 5 5010.76397 Prob > F = 0.0000

    Residual | 938.613043 64 14.6658288 R-squared = 0.9639

    -------------+------------------------------ Adj R-squared = 0.9611

    Total | 25992.4329 69 376.701926 Root MSE = 3.8296

    ------------------------------------------------------------------------------

    gfr | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    gfr |

    L1. | 1.076351 .1214261 8.86 0.000 .8337748 1.318928

    L2. | -.175926 .1156576 -1.52 0.133 -.4069785 .0551266

    pe | .0227773 .0086381 2.64 0.010 .0055207 .040034

    ww2 | -2.545189 2.096259 -1.21 0.229 -6.732947 1.642569

    pill | -4.824531 1.439447 -3.35 0.001 -7.700157 -1.948904

    _cons | 8.143783 3.182315 2.56 0.013 1.786377 14.50119

    ------------------------------------------------------------------------------

    . estat durbinalt

Durbin's alternative test for autocorrelation

    ---------------------------------------------------------------------------

    lags(p) | chi2 df Prob > chi2

    -------------+-------------------------------------------------------------

    1 | 0.498 1 0.4805

    ---------------------------------------------------------------------------

    H0: no serial correlation

    . estat bgodfrey

    Breusch-Godfrey LM test for autocorrelation

    ---------------------------------------------------------------------------

    lags(p) | chi2 df Prob > chi2

    -------------+-------------------------------------------------------------

    1 | 0.549 1 0.4589

    ---------------------------------------------------------------------------


H0: no serial correlation

Adding these lags removes the serial correlation and reduces the point estimates of the coefficients. Note that with a lagged dependent variable the standard Durbin-Watson statistic is not appropriate.

Instead of removing the serial correlation from the error terms, we could attempt to correct the standard errors using newey. Here we must specify the order of serial correlation: the number of lags over which to calculate the serial correlation in the errors. I include the heteroscedasticity-robust results, which are not adjusted for serial correlation, for comparison:

    . regress gfr pe ww2 pill, vce(robust)

    Linear regression Number of obs = 72

    F( 3, 68) = 51.57

    Prob > F = 0.0000

    R-squared = 0.4734

    Root MSE = 14.685

    ------------------------------------------------------------------------------

    | Robust

    gfr | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    pe | .08254 .0269359 3.06 0.003 .0287902 .1362898

    ww2 | -24.2384 3.351355 -7.23 0.000 -30.92592 -17.55087

    pill | -31.59403 3.131765 -10.09 0.000 -37.84337 -25.34469

    _cons | 98.68176 4.222996 23.37 0.000 90.2549 107.1086

    ------------------------------------------------------------------------------

    . newey gfr pe ww2 pill, lag(2)

    Regression with Newey-West standard errors Number of obs = 72

    maximum lag: 2 F( 3, 68) = 25.84

    Prob > F = 0.0000

    ------------------------------------------------------------------------------

    | Newey-West

    gfr | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    pe | .08254 .0436866 1.89 0.063 -.0046352 .1697153

    ww2 | -24.2384 3.615538 -6.70 0.000 -31.45309 -17.0237

    pill | -31.59403 5.016629 -6.30 0.000 -41.60456 -21.58351

    _cons | 98.68176 7.003884 14.09 0.000 84.70572 112.6578

    ------------------------------------------------------------------------------

As the order of serial correlation allowed for in the error term increases, the standard errors also increase. Only the standard errors are adjusted for the presence of serial correlation; the coefficient estimates are identical to those from regress.
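What newey does, in essence, is replace the usual variance with a heteroscedasticity- and autocorrelation-consistent one that adds autocovariances up to the chosen lag, downweighted by the Bartlett kernel w_j = 1 - j/(L+1). The kernel idea can be sketched for the long-run variance of a single series (a simplification: the full regression formula applies the same weighting to products of residuals and regressors):

```python
def long_run_variance(x, lags):
    """Bartlett-kernel (Newey-West style) long-run variance:
    gamma_0 + 2 * sum_{j=1}^{lags} (1 - j/(lags+1)) * gamma_j."""
    n = len(x)
    mean = sum(x) / n

    def gamma(j):  # sample autocovariance at lag j
        return sum((x[t] - mean) * (x[t - j] - mean) for t in range(j, n)) / n

    return gamma(0) + 2 * sum((1 - j / (lags + 1)) * gamma(j)
                              for j in range(1, lags + 1))

# Made-up persistent series: positive autocovariances make the long-run
# variance larger than the plain (lags = 0) variance.
x = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(round(long_run_variance(x, 0), 4))  # 0.25
print(round(long_run_variance(x, 2), 4))  # 0.5
```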

Alternatively, it is possible to model the serial correlation in the error terms directly using the arima command. For example, we may believe that the error terms are best modelled by a first-order autoregressive process, so that (3) below should be estimated:

gfr_t = X_t β + u_t   (1)

u_t = ρ u_{t-1} + ε_t   (2)

gfr_t = X_t β + ρ (gfr_{t-1} - X_{t-1} β) + ε_t   (3)
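Equation (3) follows from writing (1) at t and t-1, multiplying the lagged equation by ρ, subtracting, and using (2), which leaves the serially uncorrelated error ε_t. A quick numerical check of the identity with made-up values (none of these numbers come from the data):

```python
# Made-up values for the coefficients and data.
beta, rho = 0.5, 0.8           # hypothetical beta and rho
x_prev, x_t = 2.0, 3.0         # hypothetical X_{t-1} and X_t
u_prev, eps_t = 0.7, 0.1       # lagged error and current innovation

u_t = rho * u_prev + eps_t                 # equation (2)
gfr_prev = x_prev * beta + u_prev          # equation (1) at t-1
gfr_t = x_t * beta + u_t                   # equation (1) at t

# Equation (3): the quasi-differenced form, whose error is just eps_t.
rhs = x_t * beta + rho * (gfr_prev - x_prev * beta) + eps_t
print(abs(gfr_t - rhs) < 1e-12)  # True
```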


Using arima with the option ar(1) achieves this. Extra autoregressive lags can be added, as can moving average components using the option ma(i), where i is the order of the moving average term.
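For intuition, a process with one autoregressive and one moving-average component is easy to simulate; a pure-Python sketch with made-up parameters (ρ = 0.9, θ = 0.3), not values estimated from the data:

```python
import random

def simulate_arma11(n, rho, theta, seed=0):
    """Simulate y_t = rho * y_{t-1} + eps_t + theta * eps_{t-1}:
    one autoregressive (ar) and one moving-average (ma) component."""
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        y_t = rho * y_prev + eps + theta * eps_prev
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y

def ac1(y):
    """Sample lag-1 autocorrelation."""
    m = sum(y) / len(y)
    num = sum((y[t] - m) * (y[t - 1] - m) for t in range(1, len(y)))
    return num / sum((v - m) ** 2 for v in y)

y = simulate_arma11(200, rho=0.9, theta=0.3)
print(round(ac1(y), 2))  # high positive, reflecting the strong ar component
```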

    . arima gfr pe ww2 pill, ar(1)

    (setting optimization to BHHH)

    Iteration 0: log likelihood = -231.52653

    Iteration 1: log likelihood = -221.50334

    Iteration 2: log likelihood = -217.64887

    Iteration 3: log likelihood = -217.50191

    Iteration 4: log likelihood = -215.36991

    (switching optimization to BFGS)

    Iteration 5: log likelihood = -214.43403

    Iteration 6: log likelihood = -211.39705

    Iteration 7: log likelihood = -209.80199

    Iteration 8: log likelihood = -207.01082

    Iteration 9: log likelihood = -206.41696

    Iteration 10: log likelihood = -206.12935

    Iteration 11: log likelihood = -205.96173

    Iteration 12: log likelihood = -205.95487

    Iteration 13: log likelihood = -205.95375

    Iteration 14: log likelihood = -205.9535

    (switching optimization to BHHH)

    Iteration 15: log likelihood = -205.95347

    Iteration 16: log likelihood = -205.95347

    ARIMA regression

    Sample: 1913 - 1984 Number of obs = 72

    Wald chi2(4) = 791.75

    Log likelihood = -205.9535 Prob > chi2 = 0.0000

    ------------------------------------------------------------------------------

    | OPG

    gfr | Coef. Std. Err. z P>|z| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    gfr |

    pe | -.025783 .0313082 -0.82 0.410 -.087146 .0355799

    ww2 | -5.033364 1.873287 -2.69 0.007 -8.704939 -1.36179

    pill | -4.253375 19.34541 -0.22 0.826 -42.16968 33.66292

    _cons | 98.88224 18.5853 5.32 0.000 62.45571 135.3088

    -------------+----------------------------------------------------------------

    ARMA |

    ar |

    L1. | .982849 .040655 24.18 0.000 .9031666 1.062531

    -------------+----------------------------------------------------------------

    /sigma | 4.129038 .3477984 11.87 0.000 3.447366 4.81071

    ------------------------------------------------------------------------------

Having explicitly modelled the autocorrelation, we see that the personal exemption and pill variables are no longer statistically significant.

However, it is also hypothesised that there may be a lag in the response to the personal tax exemption. We should therefore include lagged values of the personal exemption.

    . regress gfr pe L.pe L2.pe pill ww2, vce(robust)


Linear regression Number of obs = 70

    F( 5, 64) = 31.21

    Prob > F = 0.0000

    R-squared = 0.4986

    Root MSE = 14.27

    ------------------------------------------------------------------------------

    | Robust

    gfr | Coef. Std. Err. t P>|t| [95% Conf. Interval]

    -------------+----------------------------------------------------------------

    pe |

    --. | .0726718 .0979877 0.74 0.461 -.1230812 .2684248

    L1. | -.0057796 .1132301 -0.05 0.959 -.2319826 .2204235

    L2. | .0338268 .0894437 0.38 0.707 -.1448575 .2125111

    pill | -31.30499 3.123293 -10.02 0.000 -37.54448 -25.06549

    ww2 | -22.1265 6.950919 -3.18 0.002 -36.01256 -8.24044

    _cons | 95.8705 4.284189 22.38 0.000 87.31185 104.4291

    ------------------------------------------------------------------------------

The coefficients on the lags of pe are imprecisely estimated, but we can test their joint significance using the test command introduced previously:

    . test pe L.pe L2.pe

    ( 1) pe = 0

    ( 2) L.pe = 0

    ( 3) L2.pe = 0

    F( 3, 64) = 4.67

    Prob > F = 0.0051

The three coefficients are jointly significantly different from zero at the 1% level.

We may also want to include a time trend. A trend (plus a quadratic) can be generated as follows if the data are sorted by date:

    . sort year

    . generate t=_n

    . generate t2=t^2

These are easily included in a regression model. It remains important to test for the presence of serial correlation in the error term.

However, we must also consider the possibility of a unit root in the series we are trying to explain. Time series can be displayed simply using the tsline command once the data have been declared to be time series. Graphing the fertility rate suggests that a unit root is very likely.


. tsline gfr

[Figure: time series plot of gfr (births per 1000 women aged 15-44), 1913 to 1984]

To investigate further we can look at the autocorrelations of both the fertility rate and its first difference. These too strongly suggest a unit root, which is removed by first differencing the series.
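The sample autocorrelations that ac plots are r_j = γ̂_j / γ̂_0, the lag-j autocovariance over the variance. A pure-Python sketch using a made-up random walk rather than the fertility data: the walk has a unit root and its autocorrelations die out very slowly, while those of its first difference (white noise) are near zero.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t - j] - mean) for t in range(j, n)) / den
            for j in range(1, max_lag + 1)]

rng = random.Random(1)
steps = [rng.gauss(0.0, 1.0) for _ in range(200)]  # white noise
walk = []                                          # random walk: cumulative sum
total = 0.0
for s in steps:
    total += s
    walk.append(total)

print([round(r, 2) for r in acf(walk, 3)])   # all close to 1
print([round(r, 2) for r in acf(steps, 3)])  # all close to 0
```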

    . ac gfr, lags(10)

[Figure: autocorrelations of gfr up to lag 10, with Bartlett's formula 95% confidence bands]


. ac Dgfr, lags(10)

[Figure: autocorrelations of Dgfr up to lag 10, with Bartlett's formula 95% confidence bands]

    A formal test for a unit root can be conducted using a Dickey-Fuller test and the dfuller command.

    . dfuller gfr

    Dickey-Fuller test for unit root Number of obs = 71

    ---------- Interpolated Dickey-Fuller ---------

    Test 1% Critical 5% Critical 10% Critical

    Statistic Value Value Value

    ------------------------------------------------------------------------------

    Z(t) -0.857 -3.551 -2.913 -2.592

    ------------------------------------------------------------------------------

    MacKinnon approximate p-value for Z(t) = 0.8019

    . dfuller Dgfr

    Dickey-Fuller test for unit root Number of obs = 70

    ---------- Interpolated Dickey-Fuller ---------

    Test 1% Critical 5% Critical 10% Critical

    Statistic Value Value Value

    ------------------------------------------------------------------------------

    Z(t) -6.161 -3.552 -2.914 -2.592

    ------------------------------------------------------------------------------

    MacKinnon approximate p-value for Z(t) = 0.0000

Here the null hypothesis is that there is a unit root. We cannot reject a unit root in the gfr series, but the null hypothesis is rejected for the first difference of the series, as the autocorrelations suggested.
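Without options, dfuller regresses the first difference Δy_t on y_{t-1} and a constant, and the test statistic is the t ratio on y_{t-1}; what makes it a Dickey-Fuller test is that this ratio is compared against the nonstandard (MacKinnon) critical values rather than the usual t distribution. A pure-Python sketch of the statistic on made-up data (critical values omitted):

```python
import random

def dickey_fuller_stat(y):
    """t ratio on y_{t-1} in: dy_t = a + b * y_{t-1} + e_t (no extra lags)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    x = y[:-1]
    n = len(dy)
    mx, my = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((x[i] - mx) * (dy[i] - my) for i in range(n)) / sxx
    a = my - b * mx
    ssr = sum((dy[i] - a - b * x[i]) ** 2 for i in range(n))
    se_b = (ssr / (n - 2) / sxx) ** 0.5
    return b / se_b

rng = random.Random(2)
steps = [rng.gauss(0.0, 1.0) for _ in range(200)]  # stationary white noise
walk = []                                          # random walk (unit root)
total = 0.0
for s in steps:
    total += s
    walk.append(total)

# The stationary series gives a large negative statistic; the walk's is
# typically much closer to zero, so a unit root cannot be rejected.
print(round(dickey_fuller_stat(walk), 2))
print(round(dickey_fuller_stat(steps), 2))
```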

    References

    Greene, William H., Econometric Analysis, 6th ed., Pearson/Prentice Hall, 2008.


Wooldridge, Jeffrey M., Introductory Econometrics: A Modern Approach, 4th ed., South-Western / Cengage Learning, 2009.
