BABS 502 ARIMA Forecasting March 18, 2014


(c) Martin L. Puterman

General Overview

• An ARIMA model is a mathematical model for time series data.

• Statisticians George Box and Gwilym Jenkins developed a systematic approach for fitting these models to data so these models are often called Box-Jenkins models.

• We always use statistical or forecasting programs to fit these models.
– The programs fit models and produce forecasts.
– Some choose the best model automatically.

• But it is beneficial to understand the basic model, so that we can check that what the software is doing makes sense.
– This matters especially if we use an automatic forecasting program.

ARIMA Models

• ARIMA stands for AutoRegressive Integrated Moving Average.

• We also speak of AR models, MA models, ARMA models and IMA models, which are special cases of this general class.

• These models generalize regression, but the “independent” variables are past values of the series itself and unobservable random disturbances.

• Estimation is based on maximum likelihood, not least squares.

• We distinguish between seasonal and non-seasonal models.

Notation

• $Y_1, Y_2, \dots, Y_t$ denotes a series of values for a time series.
– These are observable.

• $e_1, e_2, \dots, e_t$ denotes a series of random disturbances.
– These are not observable.
– They may be thought of as a series of random shocks.
– Usually they are assumed to be generated from a Normal distribution with mean 0 and standard deviation $\sigma$, and to be uncorrelated with each other.
– They are often called “white noise”.

An Autoregressive (AR(1)) Model

• AR(1) Model: $Y_t = A_1 Y_{t-1} + e_t$
– For a stationary model, $A_1$ is an unknown parameter with values strictly between -1 and +1, which is to be estimated from data.
– As a first approximation we can estimate $A_1$ by linear regression (with intercept set equal to 0). (How?)

• When $A_1 = 1$ (the boundary case), the model is called a random walk.
– In this case, $Y_t = Y_{t-1} + e_t$
– or alternatively $Y_t - Y_{t-1} = e_t$
– We can show (by back substitution and assuming $Y_0 = 0$) that for a random walk
• $E(Y_t) = 0$ and $\mathrm{Var}(Y_t) = t\sigma^2$.
• Hence the values get more variable as you move out in the series.
• This means that when data follow a random walk, the best prediction of the future is the present (a naïve forecast), and the prediction gets less accurate the further into the future we forecast.
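One possible answer to the "(How?)" question: regress $Y_t$ on $Y_{t-1}$ with no intercept, so the estimate is the sum of $Y_t Y_{t-1}$ divided by the sum of $Y_{t-1}^2$. The following is a minimal Python sketch on simulated data; the function names, seed and sample size are illustrative assumptions and not part of the course material.

```python
import random

def simulate_ar1(a1, n, sigma=1.0, seed=0):
    """Simulate Y_t = a1 * Y_{t-1} + e_t with Y_0 = 0 and Normal(0, sigma) shocks."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = a1 * prev + rng.gauss(0.0, sigma)
        y.append(prev)
    return y

def estimate_a1(y):
    """Least-squares slope of Y_t on Y_{t-1} with the intercept fixed at 0."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

series = simulate_ar1(a1=0.8, n=5000)
a1_hat = estimate_a1(series)
print(f"estimated A1 = {a1_hat:.3f}")  # expected to land near the true value 0.8
```

This is only a first approximation, as the slide says; forecasting software would normally refine it by maximum likelihood.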

The ACF for a random walk

• If $Y_t$ is a random walk (with $Y_0 = 0$), it can be represented by $Y_t = e_t + e_{t-1} + \dots + e_1$

• Consequently
– $\mathrm{cov}(Y_t, Y_{t-1}) = (t-1)\sigma^2$
– $\mathrm{Var}(Y_t) = t\sigma^2$

• So that, dividing the covariance by the product of the two standard deviations,
– $\mathrm{Corr}(Y_t, Y_{t-1}) = \sqrt{(t-1)/t}$
– $\mathrm{Corr}(Y_t, Y_{t-k}) = \sqrt{(t-k)/t}$

• For large t these correlations are close to 1 and fall off only slowly as the lag k grows, which gives the ACF its slowly decaying shape.

[Figure: ACF of a random walk. Model: ArmaRoutine(1;0;0;0). Axes: Autocorrelations vs. Lag.]
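The variance and covariance results for a random walk can be checked by simulation. The sketch below generates many independent random walks and compares the simulated $\mathrm{Var}(Y_t)$ with $t\sigma^2$ and the simulated $\mathrm{cov}(Y_t, Y_{t-k})$ with $(t-k)\sigma^2$ (the lag-k version of the covariance result, since the two values share the first t-k shocks). The values of t, k, the number of paths and the seed are arbitrary assumptions chosen for illustration.

```python
import random

def random_walk_paths(n_paths, t_max, sigma=1.0, seed=1):
    """Return n_paths random walks Y_1..Y_t_max, each started from Y_0 = 0."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        level, path = 0.0, []
        for _ in range(t_max):
            level += rng.gauss(0.0, sigma)
            path.append(level)
        paths.append(path)
    return paths

def cov(xs, ys):
    """Sample covariance of two equal-length lists (divisor n)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

t, k = 50, 10
paths = random_walk_paths(n_paths=4000, t_max=t)
y_t = [p[t - 1] for p in paths]        # Y_t across the simulated paths
y_tk = [p[t - k - 1] for p in paths]   # Y_{t-k} across the simulated paths

var_t = cov(y_t, y_t)
cov_tk = cov(y_t, y_tk)
corr_tk = cov_tk / (var_t * cov(y_tk, y_tk)) ** 0.5

print(f"Var(Y_t): simulated {var_t:.1f}, theory {t}")
print(f"cov(Y_t, Y_t-k): simulated {cov_tk:.1f}, theory {t - k}")
print(f"Corr(Y_t, Y_t-k): simulated {corr_tk:.3f}")
```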


Random Walk

[Figure: three panels for Model: ArmaRoutine(1;0;0;0). Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]

Other AR(p) models

• The AR(2) Model
– $Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + e_t$
– Here, $A_1$ and $A_2$ are unknown parameters.

• The AR(p) Model
– $Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p} + e_t$
– Here, $A_1, \dots, A_p$ are unknown parameters.

• To apply these in practice, we estimate the parameters and then use the model for forecasting by substituting past observed values.

• These models are called ARIMA(p,0,0) models.


Models with constants

• The models above assume that the series has mean 0.

• An AR model with a constant c has the form $Y_t = c + A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p} + e_t$
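A short derivation shows how the constant relates to the mean of the series. It assumes the series is stationary with mean $\mu$ (a symbol introduced here for illustration only), and takes expectations of both sides of the model:

```latex
E(Y_t) = c + A_1 E(Y_{t-1}) + \dots + A_p E(Y_{t-p}) + E(e_t)
\;\Longrightarrow\;
\mu = c + (A_1 + \dots + A_p)\,\mu
\;\Longrightarrow\;
\mu = \frac{c}{1 - A_1 - \dots - A_p}
```

So c = 0 corresponds to a zero-mean series. As an illustrative case, an AR(1) model with c = 2 and $A_1 = .8$ has mean 2/(1 - .8) = 10.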


Which Model to Fit?

• The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) give some insight into what model to fit to data.
– We work backwards here.

• Given a theoretical model, we can determine theoretically what its ACF and PACF should be.

• So if the ACF and PACF from the data have a recognizable pattern, then we try fitting a model that could generate that pattern to the data.

• What is a PACF?
– The pth partial autocorrelation is the coefficient of $Y_{t-p}$ in a regression of $Y_t$ on $Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}$.
– Thus, if the data were generated by an AR(2) model, in theory the first two partial autocorrelations would be non-zero and all partial autocorrelations at lags higher than two would be zero.
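The AR(2) claim can be illustrated on simulated data. The sketch below does not run the regressions directly; it uses the Durbin-Levinson recursion on the sample autocorrelations, a standard shortcut that gives closely matching values for long series. The AR(2) coefficients (0.8, -0.5) are the ones used in the deck's simulated AR(2) example; the function names, seed and series length are assumptions made for illustration.

```python
import random

def simulate_ar2(a1, a2, n, seed=2):
    """Simulate Y_t = a1*Y_{t-1} + a2*Y_{t-2} + e_t with standard Normal shocks."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n):
        y.append(a1 * y[-1] + a2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[2:]

def sample_acf(y, max_lag):
    """Sample autocorrelations; index k holds lag k, and index 0 is 1."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    c0 = sum(v * v for v in d)
    return [1.0] + [sum(d[t] * d[t - k] for t in range(k, n)) / c0
                    for k in range(1, max_lag + 1)]

def sample_pacf(y, max_lag):
    """Partial autocorrelations at lags 1..max_lag via Durbin-Levinson."""
    r = sample_acf(y, max_lag)
    pacf, phi_prev = [], []   # phi_prev holds the order-(k-1) coefficients
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-lag coefficients, then append the new last one.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

ar2 = simulate_ar2(a1=0.8, a2=-0.5, n=20000)
pacf = sample_pacf(ar2, max_lag=5)
print([round(p, 2) for p in pacf])  # lags 1 and 2 clearly non-zero, lags 3-5 near 0
```

For this model the theoretical lag 2 partial autocorrelation equals $A_2$ = -0.5, and the lag 1 value equals the lag 1 autocorrelation, $A_1/(1 - A_2)$.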


Some further comments on ACFs and PACFs

• Computing autocorrelations (ACs) is similar to performing a series of simple regressions of $Y_t$ on $Y_{t-1}$, then on $Y_{t-2}$, then on $Y_{t-3}$, ….
– The AC coefficients reflect only the relationship between the two quantities included in the regression.

• Computing partial autocorrelations (PACs) is more in the spirit of multiple regression. The PACs remove the effects of all lower order lags before computing the autocorrelation.
– For example, the 2nd order PAC is the effect of observations two periods ago on the current observation, given that the effect of the observation one period ago has been removed.
– This can be viewed as multiple regression.
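The contrast between the two kinds of regression can be seen on a simulated AR(1) series with $A_1 = .8$. In the sketch below, the simple regression of $Y_t$ on $Y_{t-2}$ picks up the indirect link through $Y_{t-1}$, while the multiple regression on both lags assigns almost nothing to $Y_{t-2}$. The seed, series length and helper names are assumptions for illustration; both regressions omit the intercept because the simulated series has mean 0.

```python
import random

rng = random.Random(3)
y, prev = [], 0.0
for _ in range(20000):
    prev = 0.8 * prev + rng.gauss(0.0, 1.0)   # AR(1) with A1 = .8
    y.append(prev)

# Align each current value with its lag-1 and lag-2 values.
cur, lag1, lag2 = y[2:], y[1:-1], y[:-2]

def dot(a, b):
    """Sum of elementwise products of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))

# Simple regression of Y_t on Y_{t-2}: reflects only that pair of quantities.
simple_slope = dot(cur, lag2) / dot(lag2, lag2)

# Multiple regression of Y_t on Y_{t-1} and Y_{t-2}:
# solve the 2x2 normal equations with Cramer's rule.
s11, s12, s22 = dot(lag1, lag1), dot(lag1, lag2), dot(lag2, lag2)
b1, b2 = dot(lag1, cur), dot(lag2, cur)
det = s11 * s22 - s12 * s12
coef_lag1 = (b1 * s22 - s12 * b2) / det
coef_lag2 = (s11 * b2 - s12 * b1) / det

print(f"simple slope on lag 2:     {simple_slope:.2f}")  # roughly .8 squared
print(f"multiple-regression lag 1: {coef_lag1:.2f}")     # roughly .8
print(f"multiple-regression lag 2: {coef_lag2:.2f}")     # roughly 0
```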


Example: AR(1) Model; $A_1 = .8$

[Figure: three panels for Model: ArmaRoutine(0.8;0;0;0). Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time); Partial Autocorrelations vs. Lag.]

Example: AR(1) Model; $A_1 = -.7$

[Figure: three panels for Model: ArmaRoutine(-0.7;0;0;0). Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time); Partial Autocorrelations vs. Lag.]

Example: AR(2) Model

[Figure: three panels for Model: ArmaRoutine(0.8,-0.5;0;0;0). Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]

Monthly Pulp Price Data

[Figure: three panels for the pulp series. Autocorrelations of pulp (0,0,12,1,0); Partial Autocorrelations of pulp (0,0,12,1,0); Plot of pulp (pulp vs. Time).]

Annual Births Data

[Figure: three panels for the Births series. Autocorrelations of Births (0,0,12,1,0); Partial Autocorrelations of Births (0,0,12,1,0); Plot of Births (Births vs. Time).]

Stationarity

• A time series is stationary if:
– Its mean is the same at every time
– Its variance is the same at every time
– Its autocorrelations are the same at every time

• A series of outcomes from independent identical trials is stationary.
• A series with a trend is not stationary.
• A random walk is not stationary. (Why?)
• If a time series is non-stationary, its ACF dies off slowly and the first partial autocorrelation is near 1.
– In such cases we can sometimes create a stationary series by differencing the original series.
– If $Y_t$ is a random walk, then its differences are white noise, which is stationary.

• A unit root test (Section 8.1) is a formal test for non-stationarity.
– One such test is the Dickey-Fuller test (adf.test in R).
– See also the KPSS test.
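The effect of differencing can be seen on a simulated random walk: the lag 1 autocorrelation of the levels sits near 1, while that of the differences sits near 0, as expected for white noise. This is a minimal Python sketch; the seed, the series length and the helper name are assumptions made for illustration, and it is an informal check rather than a unit root test.

```python
import random

def lag1_autocorr(y):
    """Sample lag 1 autocorrelation of a list of values."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    return sum(d[t] * d[t - 1] for t in range(1, n)) / sum(v * v for v in d)

rng = random.Random(4)
walk, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 1.0)   # Y_t = Y_{t-1} + e_t
    walk.append(level)

# First differences Z_t = Y_t - Y_{t-1} recover the shocks e_t.
diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

r1_levels = lag1_autocorr(walk)
r1_diffs = lag1_autocorr(diffs)
print(f"lag 1 autocorrelation of levels:      {r1_levels:.3f}")  # near 1
print(f"lag 1 autocorrelation of differences: {r1_diffs:.3f}")   # near 0
```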


Differenced Births Data

[Figure: two panels for the differenced Births series. Autocorrelations of Births (1,0,12,1,0); Partial Autocorrelations of Births (1,0,12,1,0).]

The PACF suggests that the differences of the birth data may follow an AR(1) or AR(2) or AR(5) model.


Differenced Pulp Price Data

[Figure: two panels for the differenced pulp series. Autocorrelations of pulp (1,0,12,0,0); Partial Autocorrelations of pulp (1,0,12,0,0).]

The story is less clear here. Perhaps the differences follow an AR(1) model: the lag 1 PAC is .346 and the lag 2 PAC is .184.

Differenced Models

• We let $Z_t = Y_t - Y_{t-1}$.

• When the differenced series is stationary, we can write a model in terms of $Z_t$.

• If $Z_t$ follows an AR(p) model, then $Y_t$ follows an ARIMA(p,1,0) model.

• In practice ARIMA(1,1,0) and ARIMA(2,1,0) are quite common.

Pulp Data

• The fit from an ARIMA(1,1,0) model is
– $A_1 = .346$ (t-value 5.46)
– So the fitted model is $Z_t = .346 Z_{t-1} + e_t$
– The residuals appear to have no remaining autocorrelation.
– Forecasts seem pretty flat: 561.7, 562.3, 562.6, 562.6, 562.6
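Why the forecasts flatten can be seen by running the fitted recursion forward: each forecast difference is .346 times the previous one, so the increments shrink quickly and the level settles. The sketch below uses the fitted coefficient from the slide, but the last two observations are hypothetical values chosen only for illustration (they are not the actual pulp data), and the function name is an assumption.

```python
def forecast_arima110(y_prev, y_last, a1, horizon):
    """Forecast Y when Z_t = Y_t - Y_{t-1} follows Z_t = a1 * Z_{t-1} + e_t."""
    forecasts = []
    z, level = y_last - y_prev, y_last
    for _ in range(horizon):
        z = a1 * z          # forecast of the next difference (future shocks set to 0)
        level = level + z   # add it back to get the forecast of the level
        forecasts.append(level)
    return forecasts

# Hypothetical last two observations, for illustration only.
fc = forecast_arima110(y_prev=558.0, y_last=560.0, a1=0.346, horizon=5)
print([round(f, 1) for f in fc])

# The forecasts approach y_last + (y_last - y_prev) * a1 / (1 - a1).
limit = 560.0 + 2.0 * 0.346 / (1 - 0.346)
print(round(limit, 2))
```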

[Figure: two panels for the fitted pulp model. Autocorrelations of Residuals (Autocorrelations vs. Lag); pulp Chart (pulp vs. Time).]

MA(q) Models

• These are less plausible but fit many series well.
• MA(1) model:
– $Y_t = e_t + W_1 e_{t-1}$
• MA(2) model:
– $Y_t = e_t + W_1 e_{t-1} + W_2 e_{t-2}$
• MA(q) model:
– $Y_t = e_t + W_1 e_{t-1} + W_2 e_{t-2} + \dots + W_q e_{t-q}$
– This is referred to as an ARIMA(0,0,q) model.

• The rationale for MA models is that the effects of disturbances are short lived (q periods), as opposed to an AR model where they persist forever.

• Note that the disturbances are not observable.
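The "short lived" effect of a disturbance shows up in the ACF: for an MA(1) model only the lag 1 autocorrelation is non-zero, with theoretical value $W_1/(1 + W_1^2)$. The sketch below simulates an MA(1) series with $W_1 = .7$ and computes the first few sample autocorrelations; the seed, series length and function names are assumptions made for illustration.

```python
import random

def simulate_ma1(w1, n, seed=5):
    """Simulate Y_t = e_t + w1 * e_{t-1} with standard Normal shocks."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0.0, 1.0)
    y = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y.append(e + w1 * e_prev)
        e_prev = e
    return y

def autocorr(y, k):
    """Sample autocorrelation of y at lag k."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    return sum(d[t] * d[t - k] for t in range(k, n)) / sum(v * v for v in d)

ma1 = simulate_ma1(w1=0.7, n=20000)
acf = [autocorr(ma1, k) for k in (1, 2, 3)]
theory_lag1 = 0.7 / (1 + 0.7 ** 2)

print([round(r, 2) for r in acf])   # lag 1 clearly non-zero, lags 2 and 3 near 0
print(round(theory_lag1, 2))        # theoretical lag 1 autocorrelation
```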


Differenced models with constants

If $Z_t = c + e_t$, then $Y_t = c + Y_{t-1} + e_t$, which is a random walk with “drift”.
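Back substitution makes the role of the drift explicit. The derivation below assumes a known starting value $Y_0$ and uses h for the number of periods ahead (a symbol introduced here for illustration):

```latex
Y_t = c + Y_{t-1} + e_t
    = 2c + Y_{t-2} + e_t + e_{t-1}
    = \dots
    = Y_0 + ct + \sum_{i=1}^{t} e_i
\;\Longrightarrow\;
E(Y_t) = Y_0 + ct, \qquad \mathrm{Var}(Y_t) = t\sigma^2
```

So the series trends by c per period on average, and the forecast made at time t for h periods ahead is $Y_t + hc$, a straight line starting from the last observation.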


An MA(1) Model: $W_1 = .7$

[Figure: three panels for Model: ArmaRoutine(0;0;.7;0). Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]

An MA(1) Model: $W_1 = -.7$

[Figure: three panels for Model: ArmaRoutine(0;0;-.7;0). Autocorrelations vs. Lag; Partial Autocorrelations vs. Lag; Plot of Simulated Data (Simulated Data vs. Time).]

Births Data

• Clearly differencing is required.

• Consider fitting an MA(1) model to the differenced data.

• The estimated coefficient is -.42 with a t-value of -3.87.

• But the autocorrelations of the residuals still contain information.
– Note that the lag 2 AC = .349.

[Figure: Autocorrelations of Residuals (Autocorrelations vs. Lag).]

Births Data

• Try an ARIMA(0,1,2) model

• Parameters are -.37 (t = -3.47) and -.59 (t = -5.76).

• Residuals appear to be white noise.

• Forecasts are 338311, 340936, 340936, …

[Figure: two panels for the fitted Births model. Autocorrelations of Residuals (Autocorrelations vs. Lag); Births Chart (Births vs. Time).]

The ARIMA(0,1,1) Model Revisited

• This model can be written as (letting $w = -W_1$)

$Y_t - Y_{t-1} = e_t - w e_{t-1}$

• The forecast from this model, using $e_{t-1} = Y_{t-1} - F_{t-1}$, is

$F_t = Y_{t-1} - w(Y_{t-1} - F_{t-1}) = (1-w) Y_{t-1} + w F_{t-1}$

• This is simple exponential smoothing with smoothing constant $1 - w$.

• The new concept here is that the ARIMA(0,1,1) model is a formal statistical model, while simple exponential smoothing is an ad hoc approach to forecasting. This means that there is an error term, and hence forecast errors and hypothesis tests are part of the model.
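The equivalence can be checked numerically: running the ARIMA(0,1,1) forecast recursion and simple exponential smoothing with smoothing constant 1 - w on the same data gives the same one-step forecasts. In the sketch below the data values, the value of w, the starting forecast and the function names are all made up for illustration.

```python
def ses_forecasts(y, alpha, f_init):
    """One-step forecasts F_t = alpha * Y_{t-1} + (1 - alpha) * F_{t-1}."""
    f = [f_init]
    for obs in y:
        f.append(alpha * obs + (1 - alpha) * f[-1])
    return f

def arima011_forecasts(y, w, f_init):
    """One-step forecasts F_t = Y_{t-1} - w * e_{t-1}, with e_{t-1} = Y_{t-1} - F_{t-1}."""
    f = [f_init]
    for obs in y:
        error = obs - f[-1]          # last one-step forecast error
        f.append(obs - w * error)
    return f

data = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]   # made-up values for illustration
w = 0.6
ses = ses_forecasts(data, alpha=1 - w, f_init=data[0])
arima = arima011_forecasts(data, w=w, f_init=data[0])

print([round(v, 3) for v in ses])
print([round(v, 3) for v in arima])   # same values as the line above
```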


Relationship between MA and AR Models

• Any finite AR model can be written as an infinite MA model.

• Any finite MA model can be written as an infinite AR model.
– These results can be shown by backward substitution (as we did previously for the AR models).

• Two consequences of these observations:
– Model Selection
• If your best fit is an AR model with several terms (i.e., 4 or more), try an MA model with a few terms, and conversely.
– Identification
• AR models have long ACFs (many non-zero terms) and short PACFs.
• MA models have short ACFs and long PACFs.
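The backward substitution can be written out for the simplest cases. This sketch covers only the AR(1) and MA(1) models and assumes $|A_1| < 1$ and $|W_1| < 1$, so that the leftover terms die out:

```latex
\text{AR(1):}\quad
Y_t = A_1 Y_{t-1} + e_t
    = e_t + A_1 e_{t-1} + A_1^2 Y_{t-2}
    = \dots
    = e_t + A_1 e_{t-1} + A_1^2 e_{t-2} + A_1^3 e_{t-3} + \dots

\text{MA(1):}\quad
e_t = Y_t - W_1 e_{t-1}
    = Y_t - W_1 Y_{t-1} + W_1^2 e_{t-2}
    = \dots
\;\Longrightarrow\;
Y_t = W_1 Y_{t-1} - W_1^2 Y_{t-2} + W_1^3 Y_{t-3} - \dots + e_t
```

The weights shrink geometrically in both cases, which is why a long AR fit can often be replaced by a short MA model and conversely.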
