
#1

EC 485: Time Series Analysis in a Nut Shell


#2 Data Preparation:
1) Plot data and examine for stationarity
2) Examine ACF for stationarity
3) If not stationary, take first differences
4) If variance appears non-constant, take logarithm before first differencing
5) Examine the ACF after these transformations to determine if the series is now stationary

Model Identification and Estimation:
1) Examine the ACF and PACF of your (now) stationary series to get some ideas about what ARIMA(p,d,q) models to estimate.
2) Estimate these models.
3) Examine the parameter estimates, the AIC statistic, and the test of white noise for the residuals.

Forecasting:
1) Use the best model to construct forecasts
2) Graph your forecasts against actual values
3) Calculate the Mean Squared Error for the forecasts
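Taken together, these three stages map onto a single SAS program. Here is a minimal skeleton assembled from the statements shown later in these slides (the dataset name mydata and series name y are placeholders, not names from the slides):

PROC ARIMA DATA=mydata;                               /* mydata: assumed dataset name          */
   IDENTIFY VAR=y(1);                                 /* ACF/PACF of the first-differenced y   */
   ESTIMATE p=1 q=1;                                  /* one candidate model: an ARIMA(1,1,1)  */
   FORECAST LEAD=6 ID=date INTERVAL=month OUT=fore1;  /* 6 out-of-sample forecasts             */
RUN;

Each of these statements is discussed on the slides that follow.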


#3 Data Preparation:
1) Plot data and examine. Do a visual inspection to determine if your series is non-stationary. (A minimal SAS plotting sketch appears after this list.)

2) Examine the Autocorrelation Function (ACF) for stationarity. The ACF of a non-stationary series will show large autocorrelations that diminish only very slowly at large lags. (At this stage you can ignore the partial autocorrelations, and you can always ignore what SAS calls the inverse autocorrelations.)

3) If not stationary, take first differences. SAS will do this automatically in the IDENTIFY VAR=y(1) statement where the variable to be “identified” is y and the 1 refers to first-differencing.

4) If variance appears non-constant, take the logarithm before first differencing. You would take the log before the IDENTIFY statement:

   ly = log(y);    /* created in the DATA step that builds the series */
   PROC ARIMA;
      IDENTIFY VAR=ly(1);

5) Examine the ACF after these transformations to determine if the series is now stationary
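As mentioned in step 1, the visual inspection starts with a simple plot of the raw series. A minimal sketch, with PROC SGPLOT as one option (the dataset name capacity is assumed; cu and date are the variables used in the example that follows):

PROC SGPLOT DATA=capacity;      /* capacity: assumed dataset name        */
   SERIES X=date Y=cu;          /* line plot of the raw (level) series   */
RUN;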


#4 In this presentation, a variable measuring the capacity utilization for the U.S. economy is modeled. The data are monthly from 1967:1 – 2004:03.

It will be used as an example of how to carry out the three steps outlined on the previous slide.

We will remove the last 6 observations 2003:10 – 2004:03 so that we can construct out-of-sample forecasts and compare our models’ ability to forecast.
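A minimal sketch of this hold-out step (the dataset names cu_all and cu_est are assumptions, not names used in the slides):

DATA cu_est;                        /* estimation sample: 1967:1 - 2003:9      */
   SET cu_all;                      /* cu_all: full series through 2004:3      */
   IF date <= '01SEP2003'd;         /* drop the last 6 months for forecasting  */
RUN;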


#5

This plot of the raw data indicates non-stationarity, although there does not appear to be a strong trend.

Capacity Utilization 1967:1 – 2004:03 (in levels)


#6

This plot of the ACF clearly indicates a non-stationary series. The autocorrelations diminish only very slowly.

This ACF plot is produced by SAS using the code:

   PROC ARIMA;
      IDENTIFY VAR=cu;

It will also produce an inverse autocorrelation plot that you can ignore and a partial autocorrelation plot that we will use in the modeling stage.

The ARIMA Procedure

   Name of Variable          cu
   Mean of Working Series    81.61519
   Standard Deviation        3.764998
   Number of Observations    441

   Autocorrelations (lag, covariance, correlation; ASCII bar chart omitted)
    0   14.175211   1.00000
    1   13.884523   0.97949
    2   13.485201   0.95132
    3   13.007277   0.91761
    4   12.434837   0.87722
    5   11.820231   0.83387
    6   11.191805   0.78953
    7   10.561770   0.74509
    8    9.900866   0.69846
    9    9.215675   0.65013
   10    8.479804   0.59821
   11    7.713914   0.54418
   12    6.928244   0.48876
   13    6.160440   0.43459
   14    5.422593   0.38254
   15    4.717018   0.33277
   16    4.051825   0.28584
   17    3.390746   0.23920
   18    2.751886   0.19413


#7

This plot of the first differences suggests that the differenced series is stationary.

First differences of Capacity Utilization 1967:1 – 2004:03


#8

This ACF was produced in SAS using the code:

   PROC ARIMA;
      IDENTIFY VAR=cu(1);
   RUN;

where the (1) tells SAS to use first differences.

This ACF shows the autocorrelations diminishing fairly quickly. So we decide that the first difference of the capacity utilization rate is stationary.

   Name of Variable                          cu
   Period(s) of Differencing                 1
   Mean of Working Series                    -0.03295
   Standard Deviation                        0.584287
   Number of Observations                    440
   Observation(s) eliminated by differencing 1

   Autocorrelations (lag, covariance, correlation; ASCII bar chart omitted)
    0    0.341391    1.00000
    1    0.126532    0.37064
    2    0.093756    0.27463
    3    0.079004    0.23142
    4    0.062319    0.18254
    5    0.021558    0.06315
    6    0.020578    0.06028
    7    0.018008    0.05275
    8    0.029300    0.08583
    9    0.040026    0.11724
   10    0.020880    0.06116
   11    0.010021    0.02935
   12   -0.0071559  -0.02096
   13   -0.026090   -0.07642
   14   -0.031699   -0.09285
   15   -0.032960   -0.09654
   16   -0.023544   -0.06897
   17   -0.021426   -0.06276
   18   -0.0084132  -0.02464


#9 In addition to the autocorrelation function (ACF) and partial autocorrelation function (PACF), SAS will print out an autocorrelation check for white noise. Specifically, it prints out the Ljung-Box statistics, called Chi-Square below, and their p-values. If a p-value is very small, as they are below, then we can reject the null hypothesis that all of the autocorrelations up to the stated lag are jointly zero. For example, for our capacity utilization data (first differences):

H0: $\rho_1 = \rho_2 = \rho_3 = \rho_4 = \rho_5 = \rho_6 = 0$ (the data series is white noise)
H1: at least one $\rho_k$ is non-zero

$\chi^2 = 136.45$ with a p-value of less than 0.0001, so we easily reject H0.
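For reference, the Ljung-Box Q statistic that SAS labels Chi-Square is

$$ Q = n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^{\,2}}{n-k}, $$

where $n$ is the number of observations of the (differenced) series, $\hat{\rho}_k$ is the sample autocorrelation at lag $k$, and $m$ is the number of lags in the check (6, 12, 18, ... in the table below). Under H0 it is distributed approximately $\chi^2$ with $m$ degrees of freedom; when the same check is applied to the residuals of an estimated model, the degrees of freedom are reduced by the number of estimated ARMA parameters.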

A check for white noise on your stationary series is important, because if your series is white noise there is nothing to model and thus no point in carrying out any estimation or forecasting. We see here that the first difference of capacity utilization is not white noise, so we proceed to the modeling and estimation stage. Note: we can ignore the autocorrelation check for the data before differencing because it is non-stationary.

Autocorrelation Check for White Noise

   To     Chi-            Pr >
   Lag    Square    DF    ChiSq    Autocorrelations
    6     136.45     6   <.0001    0.371  0.275  0.231  0.183  0.063  0.060
   12     149.50    12   <.0001    0.053  0.086  0.117  0.061  0.029 -0.021
   18     164.64    18   <.0001   -0.076 -0.093 -0.097 -0.069 -0.063 -0.025
   24     221.29    24   <.0001   -0.059 -0.064 -0.118 -0.114 -0.145 -0.257


#10 Model Identification and Estimation:
1) Examine the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of your (now) stationary series to get some ideas about what ARIMA(p,d,q) models to estimate. The “d” in ARIMA stands for the number of times the data have been differenced to render them stationary. This was already determined in the previous section.

The “p” in ARIMA(p,d,q) measures the order of the autoregressive component. To get an idea of what orders to consider, examine the partial autocorrelation function. If the time series has an autoregressive order of 1, called AR(1), then we should see only the first partial autocorrelation coefficient as significant. If it has an AR(2), then we should see only the first and second partial autocorrelation coefficients as significant. (Note that they could be positive and/or negative; what matters is the statistical significance.) Generally, the partial autocorrelation function (PACF) will have significant correlations up to lag p and will quickly drop to near-zero values after lag p.
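As a concrete illustration (the notation here is mine, not from the slides), an AR(2) fitted to the stationary series is

$$ y_t = \phi_0 + \phi_1\, y_{t-1} + \phi_2\, y_{t-2} + u_t, $$

where $y_t$ denotes the stationary (here, first-differenced) series; its PACF is significant at lags 1 and 2 and approximately zero after lag 2, exactly the cutoff pattern described above.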


#11 Here is the partial autocorrelation function (PACF) for the first-differenced capacity utilization series. Notice that the first two (maybe three) partial autocorrelations are statistically significant. This suggests an AR(2) or AR(3) model. There is a statistically significant partial autocorrelation at lag 24 (not printed here), but this can be ignored. Remember that 5% of the time we can get a correlation that is more than two standard errors from zero when in fact the true one is zero.

Partial Autocorrelations (lag, correlation; ASCII bar chart omitted)
    1    0.37064
    2    0.15912
    3    0.10330
    4    0.04939
    5   -0.07279
    6    0.00433
    7    0.01435
    8    0.06815
    9    0.08346
   10   -0.02903
   11   -0.03996
   12   -0.07539
   13   -0.08379
   14   -0.03419
   15   -0.02101
   16    0.01950
   17   -0.00768
   18    0.01681


#12 Model Identification and Estimation (cont’d):

The “q” in ARIMA(p,d,q) measures the order of the moving average component. To get an idea of what orders to consider, we examine the autocorrelation function. If the time series is a moving average of order 1, called an MA(1), we should see only one significant autocorrelation coefficient, at lag 1. This is because an MA(1) process has a memory of only one period. If the time series is an MA(2), we should see only two significant autocorrelation coefficients, at lags 1 and 2, because an MA(2) process has a memory of only two periods. Generally, for a time series that is an MA(q), the autocorrelation function will have significant correlations up to lag q and will quickly drop to near-zero values after lag q.
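For example (again with my own notation), an MA(1) process $y_t = \mu + u_t + \theta_1 u_{t-1}$ has autocorrelations

$$ \rho_1 = \frac{\theta_1}{1+\theta_1^{2}}, \qquad \rho_k = 0 \ \text{for } k > 1, $$

so its ACF cuts off after lag 1; an MA(q) cuts off after lag q in the same way.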

For the capacity utilization series, we see that the ACF decays, but only over the first 4 lags; it then appears to drop off to zero abruptly.

Therefore, an MA(4) might be considered.

Our initial guess is an ARIMA(2,1,4), where the 1 tells us that the data have been first-differenced to render the series stationary.


#13 2) Estimate the Models:

Estimating the models in SAS is fairly straightforward: go back to PROC ARIMA and add ESTIMATE statements. Here we will estimate four models: ARIMA(1,1,0), ARIMA(1,1,1), ARIMA(2,1,0), and ARIMA(2,1,4). Although we believe the last of these will be the best, it is instructive to also estimate a simple AR(1) on our differenced series (the ARIMA(1,1,0)), a model with an AR(1) and an MA(1) on the differenced series (the ARIMA(1,1,1)), and a model with only AR(2) terms (the ARIMA(2,1,0)).

PROC ARIMA;
   IDENTIFY VAR=cu(1);       /* the (1) tells SAS that d = 1 for all models */
   ESTIMATE p = 1;           /* estimates the ARIMA(1,1,0) */
   ESTIMATE p = 1 q = 1;     /* estimates the ARIMA(1,1,1) */
   ESTIMATE p = 2;           /* estimates the ARIMA(2,1,0) */
   ESTIMATE p = 2 q = 4;     /* estimates the ARIMA(2,1,4) */
RUN;


#14 3) Examine the parameter estimates, the AIC statistic, and the test of white noise for the residuals.

On the next few slides you will see the results of estimating the 4 models discussed in the previous section. We are looking at the statistical significance of the parameter estimates. We also want to compare measures of overall fit; we will use the AIC statistic. It is based on the sum of squared residuals from estimating the model, and it balances the reduction in degrees of freedom against the reduction in the sum of squared residuals from adding more variables (lags of the time series). The model with the lower AIC is preferred. According to Stock and Watson (p. 455), the AIC is

$$ AIC = \ln\!\left(\frac{ESS}{n}\right) + k\,\frac{2}{n} $$

where k = p + q + 1 is the number of parameters estimated and n is the sample size. If k increases, the AIC measure will increase; if ESS decreases, then AIC will decrease.


#15 This is the ARIMA(1,1,0) model: $y_t = \phi_0 + \phi_1 y_{t-1} + u_t$, where $y_t$ is the first-differenced series.

Things to notice: the parameter estimate on the AR(1) term $\phi_1$ is statistically significant, which is good. However, the autocorrelation check of the residuals tells us that the residuals from this ARIMA(1,1,0) are not white noise, with a p-value of 0.003. We have left important information in the residuals that could be used. We need a better model.

These are the estimates of $\phi_0$ and $\phi_1$:

Conditional Least Squares Estimation

                                 Standard                 Approx
   Parameter      Estimate          Error    t Value    Pr > |t|    Lag
   MU             -0.03528        0.04115      -0.86      0.3918      0
   AR1,1           0.37113        0.04440       8.36      <.0001      1

   Constant Estimate       -0.02219
   Variance Estimate       0.295778
   Std Error Estimate      0.543854
   AIC                     714.6766
   SBC                     722.8502
   Number of Residuals          440
   * AIC and SBC do not include log determinant.

Autocorrelation Check of Residuals

   To     Chi-           Pr >
   Lag    Square   DF    ChiSq    Autocorrelations
    6      17.95    5   0.0030   -0.059  0.103  0.109  0.114 -0.021  0.029
   12      22.89   11   0.0183    0.006  0.040  0.092  0.017  0.022 -0.008
   18      27.95   17   0.0455   -0.052 -0.048 -0.058 -0.022 -0.043  0.020
   24      50.98   23   0.0007   -0.039 -0.008 -0.079 -0.037 -0.032 -0.198
   30      62.85   29   0.0003   -0.071 -0.045 -0.087 -0.026 -0.056  0.082
   36      68.07   35   0.0007   -0.046  0.056 -0.042 -0.027 -0.041 -0.040


#16 This is the ARIMA(1,1,1) model: $y_t = \phi_0 + \phi_1 y_{t-1} + u_t + \theta_1 u_{t-1}$

Things to notice: the parameter estimates of the AR(1) term $\phi_1$ and of the MA(1) term $\theta_1$ are statistically significant. Also, the autocorrelation check of the residuals tells us that the residuals from this ARIMA(1,1,1) are white noise, since the Chi-Square statistics up to a lag of 18 have p-values greater than 10%, meaning we cannot reject the null hypothesis that the residual autocorrelations up to lag 18 are jointly zero (p-value = 0.4021). Also, the AIC statistic is smaller. So we might be done …

These are the estimates of $\phi_0$, $\phi_1$, and $\theta_1$:

Conditional Least Squares Estimation

                                 Standard                 Approx
   Parameter      Estimate          Error    t Value    Pr > |t|    Lag
   MU             -0.04037        0.05586      -0.72      0.4703      0
   MA1,1           0.46161        0.09410       4.91      <.0001      1
   AR1,1           0.75599        0.06951      10.88      <.0001      1

   Constant Estimate       -0.00985
   Variance Estimate       0.286071
   Std Error Estimate      0.534856
   AIC                     700.9892
   SBC                     713.2496
   Number of Residuals          440
   * AIC and SBC do not include log determinant.

Autocorrelation Check of Residuals

   To     Chi-           Pr >
   Lag    Square   DF    ChiSq    Autocorrelations
    6       4.71    4   0.3187   -0.001 -0.012  0.031  0.045 -0.079 -0.034
   12      10.53   10   0.3953   -0.029  0.032  0.097  0.031  0.023 -0.012
   18      16.75   16   0.4021   -0.062 -0.061 -0.059 -0.016 -0.017  0.045
   24      35.15   22   0.0374   -0.002  0.014 -0.048 -0.008 -0.024 -0.190
   30      45.51   28   0.0196   -0.072 -0.028 -0.066 -0.017 -0.022  0.104
   36      49.89   34   0.0386   -0.003  0.070 -0.023 -0.025 -0.038 -0.040


#17 This is the ARIMA(2,1,0) model: $y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + u_t$

This model has statistically significant coefficient estimates, but the autocorrelation check of the residuals up to lag 6 rejects the null hypothesis of white noise at the 10% level, casting some doubt on this model. We won’t place much meaning in the Chi-Square statistics for lags beyond 18. The AIC statistic is larger, which is not good.

Conditional Least Squares Estimation

                                 Standard                 Approx
   Parameter      Estimate          Error    t Value    Pr > |t|    Lag
   MU             -0.03783        0.04829      -0.78      0.4338      0
   AR1,1           0.31208        0.04726       6.60      <.0001      1
   AR1,2           0.15929        0.04726       3.37      0.0008      2

   Constant Estimate          -0.02
   Variance Estimate       0.288946
   Std Error Estimate      0.537537
   AIC                     705.3888
   SBC                     717.6491
   Number of Residuals          440
   * AIC and SBC do not include log determinant.

Autocorrelation Check of Residuals

   To     Chi-           Pr >
   Lag    Square   DF    ChiSq    Autocorrelations
    6       8.67    4   0.0700   -0.017 -0.045  0.085  0.089 -0.045 -0.007
   12      13.96   10   0.1747   -0.010  0.038  0.096  0.023  0.019 -0.007
   18      18.73   16   0.2832   -0.054 -0.053 -0.052 -0.020 -0.025  0.030
   24      38.35   22   0.0167   -0.016 -0.004 -0.063 -0.009 -0.022 -0.193
   30      47.43   28   0.0123   -0.067 -0.021 -0.070 -0.031 -0.034  0.085
   36      51.02   34   0.0305   -0.019  0.053 -0.029 -0.030 -0.033 -0.037


#18 This is the ARIMA(2,1,4) model: $y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \theta_3 u_{t-3} + \theta_4 u_{t-4}$

Two of the parameter estimates are not statistically significant, telling us the model is not “parsimonious”, and the AIC statistic is larger than the AIC for the ARIMA(1,1,1) model. Ignore the first Chi-Square statistic, since it has 0 d.o.f. due to estimating a model with 7 parameters. The Chi-Square statistic at 18 lags is statistically insignificant, indicating white noise.

Conditional Least Squares Estimation

                                 Standard                 Approx
   Parameter      Estimate          Error    t Value    Pr > |t|    Lag
   MU             -0.03613        0.04697      -0.77      0.4423      0
   MA1,1           0.48913        0.29916       1.64      0.1028      1
   MA1,2          -0.43438        0.13474      -3.22      0.0014      2
   MA1,3          -0.17179        0.05634      -3.05      0.0024      3
   MA1,4          -0.11146        0.08044      -1.39      0.1666      4
   AR1,1           0.78020        0.29788       2.62      0.0091      1
   AR1,2          -0.44336        0.19274      -2.30      0.0219      2

   Constant Estimate       -0.02396
   Variance Estimate       0.284717
   Std Error Estimate      0.533589
   AIC                     702.8553
   SBC                     731.4627
   Number of Residuals          440
   * AIC and SBC do not include log determinant.

Autocorrelation Check of Residuals

   To     Chi-           Pr >
   Lag    Square   DF    ChiSq    Autocorrelations
    6       0.00    0   <.0001   -0.000  0.003  0.005  0.020 -0.009  0.068
   12       5.66    6   0.4624    0.028  0.032  0.072  0.008  0.022 -0.002
   18       9.94   12   0.6212   -0.049 -0.050 -0.054 -0.016 -0.024  0.026
   24      27.26   18   0.0743   -0.029 -0.003 -0.063 -0.022 -0.022 -0.177
   30      35.68   24   0.0590   -0.058 -0.030 -0.070 -0.025 -0.048  0.076
   36      40.12   30   0.1025   -0.027  0.056 -0.034 -0.033 -0.040 -0.040


#19 Forecasts:

   proc arima;
      identify var=cu(1);
      estimate p=1;                                      /* any model goes here */
      forecast lead=6 id=date interval=month out=fore1;
   run;

We calculate the Mean Squared Error for the 6 out-of-sample forecasts. Graphs appear on the next four slides. We find that the fourth model produces forecasts with the smallest MSE.

SAS automatically adjusts the data from first differences back into levels.

$$ MSE = \frac{1}{6}\sum_{t=1}^{6}\bigl(cu^{f}_{t} - cu^{a}_{t}\bigr)^{2} $$

where the superscript f denotes the forecast and a denotes the actual value.

   forecast with arima(1,1,0):   USS = 7.7992476   N = 6
   forecast with arima(1,1,1):   USS = 5.7563282   N = 6
   forecast with arima(2,1,0):   USS = 6.7735246   N = 6
   forecast with arima(2,1,4):   USS = 4.7313712   N = 6
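These USS values are the sums of squared forecast errors, so MSE = USS/6: approximately 1.30, 0.96, 1.13, and 0.79 for the four models, which is why the ARIMA(2,1,4) is judged best. A minimal sketch of how the errors could be computed (the dataset names fore1 and cu_all and the merge logic are assumptions; FORECAST is the variable name PROC ARIMA writes to the OUT= data set):

DATA err;
   MERGE fore1 cu_all;            /* fore1: OUT= data set from FORECAST; cu_all: full series */
   BY date;
   IF date >= '01OCT2003'd;       /* keep only the 6 hold-out months            */
   e = cu - forecast;             /* forecast error: actual minus forecast      */
RUN;

PROC MEANS DATA=err USS N;        /* USS of e = sum of squared forecast errors  */
   VAR e;
RUN;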


#20

These are the forecasts for the 4 models. Notice that the ARIMA(2,1,4) model does the best job, as was confirmed on the previous slide.

(Columns: observation, month, actual cu, forecasted cu)

forecast with arima(1,1,0)
   86   OCT03   75.0   75.0263
   87   NOV03   75.7   75.0509
   88   DEC03   75.8   75.0379
   89   JAN04   76.2   75.0109
   90   FEB04   76.7   74.9787
   91   MAR04   76.5   74.9445

forecast with arima(1,1,1)
   86   OCT03   75.0   75.0215
   87   NOV03   75.7   75.1034
   88   DEC03   75.8   75.1555
   89   JAN04   76.2   75.1851
   90   FEB04   76.7   75.1976
   91   MAR04   76.5   75.1972

forecast with arima(2,1,0)
   86   OCT03   75.0   75.0048
   87   NOV03   75.7   75.0813
   88   DEC03   75.8   75.1018
   89   JAN04   76.2   75.1004
   90   FEB04   76.7   75.0833
   91   MAR04   76.5   75.0577

forecast with arima(2,1,4)
   86   OCT03   75.0   75.1540
   87   NOV03   75.7   75.3396
   88   DEC03   75.8   75.3883
   89   JAN04   76.2   75.3511
   90   FEB04   76.7   75.2766
   91   MAR04   76.5   75.2110

#21 – #24
[Slides 21–24: graphs of each model's out-of-sample forecasts plotted against the actual capacity utilization values, one model per slide; figures not reproduced.]