
CH4. Auto Regression Type Model

Stationary Processes

• often a time series has same type of random behavior from one time period to the next

– outside temperature: each summer is similar to the past summers

– interest rates and returns on equities

• stationary stochastic processes are probability models for such series

• process stationary if behavior unchanged by shifts in time

• a process is weakly stationary if its mean, variance, and covariance are unchanged by time

shifts

• thus X_1, X_2, … is a weakly stationary process if

E(X_i) = μ (a constant) for all i

Var(X_i) = σ² (a constant) for all i

Corr(X_i, X_j) = ρ(|i − j|) for all i and j, for some function ρ(·)

• the correlation between two observations depends only on the time distance between them

(called the lag)

• example: correlation between X_2 and X_5 = correlation between X_7 and X_10

• ρ(·) is the correlation function

Note that ρ(h) = ρ(−h)

• the covariance between X_t and X_{t+h} is denoted by γ(h)

• γ(·) is called the autocovariance function

• Note that γ(h) = σ² ρ(h) and that

γ(0) = σ² since ρ(0) = 1

γ(h) = γ(−h)

• many financial time series not stationary

• but the changes in these time series may be stationary: z_t = y_t − y_{t−1} or z_t = (1 − B) y_t

• B is the lag (backshift) operator

White Noise

• simplest example of stationary process:

No Correlation Case

• X_1, X_2, … is white noise, written WN(μ, σ²), if

E(X_i) = μ for all i

Var(X_i) = σ² (a constant) for all i

Corr(X_i, X_j) = 0 for all i ≠ j

Note: Distribution specification is not required.

It does not have to be a normal distribution

• If X_1, X_2, … are also IID normal, then the process is a Gaussian white noise process

• white noise process is weakly stationary with

ρ(0) = 1 and ρ(t) = 0 if t ≠ 0,

so that γ(0) = σ² and γ(t) = 0 if t ≠ 0

• WN is uninteresting in itself

– but is the building block of important models usually used as error terms.


• It is interesting to know if a financial time series, e.g., of net returns, is WN.
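A minimal R sketch (simulated data; object names are arbitrary): white noise is trivial to generate, and its sample ACF shows the defining zero correlations.

R-code:

# simulate Gaussian white noise WN(0, 1) and inspect it
set.seed(1)                # for reproducibility
x = rnorm(500)             # 500 iid N(0, 1) draws
mean(x); var(x)            # close to mu = 0 and sigma^2 = 1
acf(x)                     # sample autocorrelations near zero at all lags h > 0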

Estimating parameters of a stationary process

• observe 1,..., ny y

• estimate and 2 with Y and

2s

• estimate the autocovariance γ(h) with

γ̂(h) = n⁻¹ ∑_{j=1}^{n−h} (y_{j+h} − ȳ)(y_j − ȳ)

• estimate ρ(h) with

ρ̂(h) = γ̂(h) / γ̂(0), h = 1, 2, …
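These estimators are easy to verify in R (a minimal sketch with simulated data; acf() uses the same divisor n, so the hand computation matches its output):

R-code:

# estimate gamma(h) and rho(h) by hand for h = 1 and compare with acf()
set.seed(1)
y = rnorm(200); n = length(y); ybar = mean(y)
gamma1 = sum((y[2:n] - ybar) * (y[1:(n-1)] - ybar)) / n   # gamma-hat(1)
gamma0 = sum((y - ybar)^2) / n                            # gamma-hat(0)
gamma1 / gamma0                                           # rho-hat(1)
acf(y, lag.max = 1, plot = FALSE)                         # same value at lag 1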

Auto Regressive (AR) processes

AR(1) processes

• time series models with correlation built from WN

• in AR processes y_t is modeled as a weighted average of past observations plus a white

noise “error”

• AR(1) is simplest AR process

• ε_1, ε_2, … are WN(0, σ_ε²)

• y_1, y_2, … is an AR(1) process if

y_t − μ = φ(y_{t−1} − μ) + ε_t for all t

three parameters:

μ: the mean

σ_ε²: the variance of the one-step-ahead prediction errors

φ: a correlation parameter

• y_t = (1 − φ)μ + φ y_{t−1} + ε_t

• compare with linear regression model,

y_t = β_0 + β_1 x_t + ε_t

• β_0 = (1 − φ)μ is called the “constant” in computer output

• μ is called the “mean” in the output

• When |φ| < 1, then

y_t = μ + ε_t + φ ε_{t−1} + φ² ε_{t−2} + … = μ + ∑_{h=0}^{∞} φ^h ε_{t−h}

• infinite moving average [MA(∞)] representation

• If |φ| < 1, then y_1, … is a weakly stationary process,

since |φ|^h → 0 as the lag h → ∞

• Note: if {ε_t} is a sequence of strictly independent zero-mean random variables, then the stationary time

series {y_t} is linear. Otherwise the series is nonlinear.
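The defining recursion can be simulated directly in R with filter() (a sketch assuming σ_ε = 1; a burn-in is dropped so the arbitrary starting value is forgotten):

R-code:

# simulate y_t - mu = phi*(y_{t-1} - mu) + eps_t via the recursive filter
set.seed(1)
phi = 0.6; mu = 10
eps = rnorm(400)                                           # WN(0, 1)
y = mu + filter(eps, filter = phi, method = "recursive")   # recursion started at 0
y = y[101:400]                                             # drop 100 burn-in values
ts.plot(y)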

Properties of a stationary AR(1) process

• When |φ| < 1 (stationarity), then

(1) E(y_t) = μ for all t

(2) γ(0) = Var(y_t) = σ_ε² / (1 − φ²) for all t

(3) γ(h) = Cov(y_t, y_{t+h}) = σ_ε² φ^{|h|} / (1 − φ²) for all t

(4) ρ(h) = Corr(y_t, y_{t+h}) = φ^{|h|} for all t

Only if |φ| < 1, and only for AR(1) processes

• if |φ| ≥ 1, then the AR(1) process is nonstationary, and the mean, variance, and correlation

are not constant

• Formulas (1)–(4) can be proved using

y_t = μ + ∑_{h=0}^{∞} φ^h ε_{t−h}

• For example,

Var(y_t) = Var(∑_{h=0}^{∞} φ^h ε_{t−h}) = ∑_{h=0}^{∞} φ^{2h} σ_ε² = σ_ε² / (1 − φ²)

• Also, for h > 0,

γ(h) = Cov(∑_{i=0}^{∞} φ^i ε_{t−i}, ∑_{j=0}^{∞} φ^j ε_{t+h−j}) = φ^h σ_ε² / (1 − φ²)

• distinguish between σ_ε² = variance of ε_1, ε_2, … and γ(0) = variance of y_1, y_2, …
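These properties can be checked by simulation in R (a sketch; agreement is only approximate in finite samples):

R-code:

# check gamma(0) = sigma_eps^2/(1 - phi^2) and rho(h) = phi^h by simulation
set.seed(1)
phi = 0.6
y = arima.sim(list(ar = phi), n = 10000)   # AR(1) with sigma_eps = 1 and mu = 0
var(y)                                     # approx 1/(1 - 0.6^2) = 1.5625
acf(y, lag.max = 3, plot = FALSE)          # approx 0.6, 0.36, 0.216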

Non-stationary AR(1) processes

Random Walk

• if φ = 1 (unit root case), then y_t = y_{t−1} + ε_t

• not stationary

• random walk process

• y_t = y_{t−1} + ε_t = y_{t−2} + ε_{t−1} + ε_t = … = y_0 + ε_1 + … + ε_t = y_0 + ∑_{i=1}^{t} ε_i

• start the process at an arbitrary point y_0; then E(y_t | y_0) = y_0 for all t

• and Var(y_t | y_0) = σ_ε² t

• A shock ε_{t_0} at time t_0 is transient in a stationary process, whereas it is permanent

in a unit root series.

• Stationary processes tend to exhibit mean reversion, whereas unit root processes tend to move

irregularly away from the mean (there is not even a stationary mean level to revert to).

• Unit root processes contain an unpredictable stochastic trend.
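The linear growth of Var(y_t) is easy to see by simulating many random-walk paths in R (a minimal sketch with σ_ε² = 1 and y_0 = 0):

R-code:

# 1000 random-walk paths of length 100; the variance across paths at time t is approx t
set.seed(1)
paths = apply(matrix(rnorm(100 * 1000), nrow = 100), 2, cumsum)
apply(paths, 1, var)[c(10, 50, 100)]   # roughly 10, 50, 100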

When |φ| > 1, an AR(1) process has explosive behavior

• Suppose an explosive AR(1) process starts at y_0 and has μ = 0. Then

y_t = φ y_{t−1} + ε_t

= φ(φ y_{t−2} + ε_{t−1}) + ε_t = φ² y_{t−2} + φ ε_{t−1} + ε_t

= …

= ε_t + φ ε_{t−1} + φ² ε_{t−2} + … + φ^{t−1} ε_1 + φ^t y_0

• Therefore, E(y_t) = φ^t y_0 and

Var(y_t) = σ_ε² (1 + φ² + φ⁴ + … + φ^{2(t−1)}) = σ_ε² (φ^{2t} − 1) / (φ² − 1)

Since |φ| > 1, the variance increases geometrically fast in t

• the increasing variance makes the process “wander”

• Explosive AR processes are not widely used in econometrics, since economic growth usually is

not explosive.

[Figure: simulated AR(1) processes when |φ| > 1 (n = 200)]
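A short R sketch of this explosive behavior (the value of φ, the start value, and the length are arbitrary choices):

R-code:

# explosive AR(1): |phi| > 1, so the path grows geometrically
set.seed(1)
phi = 1.02; n = 200
y = numeric(n); y[1] = 1
for (t in 2:n) y[t] = phi * y[t - 1] + rnorm(1)
ts.plot(y)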

Unit Root Tests

• Test H_0: φ = 1 vs. H_1: |φ| < 1

H_0: φ = 1 (nonstationary) vs. H_1: |φ| < 1 (stationary)

• Dickey-Fuller Test

y_t = φ y_{t−1} + ε_t , where ε_t ~ WN(0, σ²)

• If φ = 1, the y_t are referred to as integrated of order one, or I(1).

Δy_t = y_t − y_{t−1} = ε_t

Note: ε_t is not integrated, i.e., it is I(0).

• Order of integration: I(d).

It reports the minimum number of differences required to obtain a stationary series.

A time series is integrated of order d if (1 − B)^d X_t yields a stationary process.


• Augmented Dickey-Fuller Test

y_t = φ y_{t−1} + ∑_{i=1}^{p} ψ_i Δy_{t−i} + ε_t , where ε_t ~ WN(0, σ²)

• Three types

– Zero mean: y_t = φ y_{t−1} + ε_t

– Single mean: y_t = μ + φ y_{t−1} + ε_t

– Trend: y_t = μ + βt + φ y_{t−1} + ε_t

• The following SAS code performs augmented Dickey-Fuller tests with autoregressive orders

2 and 5.

proc arima data=test;

identify var=x stationarity=(adf=(2,5));

run;

• The Phillips-Perron test uses nonparametric estimation techniques

• Unlike the null hypothesis of the Dickey-Fuller and Phillips-Perron tests, the null hypothesis

of the KPSS states that the time series is stationary.

• R-code for unit root tests

library(tseries)

x = rnorm(1000) # no unit-root

y = diffinv(x) # contains a unit-root

adf.test(x)

pp.test(x)

kpss.test(x)
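For the stationary series x, adf.test() and pp.test() should report small p-values (rejecting the unit root), while kpss.test(x) should not reject stationarity; running the same three tests on the integrated series y should reverse all three conclusions.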

Estimation

• AR(1) model is a linear regression model

• one creates a lagged variable in y and uses this as the “x-variable”

• The least squares estimation: minimize

∑_{t=2}^{n} {(y_t − μ) − φ(y_{t−1} − μ)}²

(an R sketch of this regression appears at the end of this subsection)

• If the errors are Gaussian white noise, then OLS = MLE

• In SAS, use the “AUTOREG” or the “ARIMA” procedure

• SAS provides maximum likelihood estimates (ML), unconditional least squares estimates

(ULS), Yule-Walker estimates (YW: default), iterative Yule-Walker estimates (ITYW)

proc autoreg data=b;

model y = time / nlag=2 method=ml backstep;

output out=p p=yhat pm=ytrend

lcl=lcl ucl=ucl;

run;

Page 6: CH4. Auto Regression Type Modelocw.sogang.ac.kr/rfile/2015/TSDAF/CH4_AR_20160111141208.pdf · – The output data set includes all the variables in the input data set, the forecast

6

– The output data set includes all the variables in the input data set, the forecast values

(YHAT), the predicted trend (YTREND), and the upper (UCL) and lower (LCL) 95%

confidence limits.

– Backstep: stepwise variable selection method

– For more about Yule-Walker estimates, see Gallant and Goebel (1976)
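As referenced above, a minimal R sketch of least squares AR(1) estimation (simulated data; names illustrative): create the lagged variable and regress y_t on y_{t−1}.

R-code:

# least squares fit of AR(1): regress y_t on y_{t-1}
set.seed(1)
y = as.numeric(arima.sim(list(ar = 0.5), n = 500) + 10)
fit = lm(y[-1] ~ y[-length(y)])   # response y_2..y_n, regressor y_1..y_{n-1}
coef(fit)                         # intercept approx (1 - phi)*mu = 5, slope approx phi = 0.5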

Residuals

ε̂_t = (y_t − μ̂) − φ̂(y_{t−1} − μ̂)

• ε̂_2, …, ε̂_n estimate ε_2, …, ε_n, since ε_t = (y_t − μ) − φ(y_{t−1} − μ)

• used to check that y_1, y_2, …, y_n is an AR(1) process

• autocorrelation in residuals evidence against AR(1) assumption

• to test for residual autocorrelation use SAS’s autocorrelation plots

• can also use the Ljung-Box test

null hypothesis is that autocorrelations up to a specified lag are zero

SAS code for the ESACF

• computes the extended sample autocorrelation function

• uses these estimates to tentatively identify the autoregressive and moving average orders of

mixed models.

• The following code generates an ESACF table with dimensions of p=(0:7) and q=(0:8).

proc arima data=call;

identify var=vol esacf p=(0:7) q=(0:8);

run;

Autocorrelation Tests

• Durbin-Watson Test

Suppose e_t = ρ e_{t−1} + ν_t , where |ρ| < 1 and the ν_t are iid normally distributed.

Testing H_0: ρ = 0 is testing for no first-order autocorrelation in e_t.

R-code:

– Autocorrelation test for OLS residual

library(lmtest)

dwtest(Revenue~Assets, data=bankdat)

– The number of lags can be specified using the max.lag argument

library(car)

results = lm(Y ~ x1 + x2)

durbin.watson(results,max.lag=2)

SAS code:

– Autocorrelation test for OLS residual


proc autoreg data=a;

model y = time / dw=4 dwprob;

run;

• Box-Pierce and Ljung-Box Test

# arima.sim() and arima() are in the base 'stats' package (the old 'ts' package was merged into it)

y = arima.sim(list(order = c(2,1,1), ar = c(0.7, 0.2), ma = c(0.3)), n = 200)

# y: simulated time series from an ARIMA(2,1,1) model

ts.plot(y)

a = arima(y, order = c(1,1,0))

Box.test(a$resid)

Box.test(a$resid, type = "Ljung-Box")

Example: GE daily returns

• The SAS output comes from running the following program (proc autoreg)

Here is the SAS output

• SAS uses the model


y_t = μ + φ y_{t−1} + ε_t

• φ̂ = −0.225, and the standard deviation of φ̂ is 0.0616

• the t-value for testing H_0: φ = 0 versus H_1: φ ≠ 0 is −3.60

• null hypothesis: log returns are white noise, and alternative is that they are correlated

• small p-value is evidence against the geometric random walk hypothesis

• ρ(h) = φ^h, so φ̂ estimates the correlation between successive log returns

• In summary, AR(1) process fits GE log returns better than white noise

• not proof that the AR(1) fits these data

• to check that the AR(1) fits well, look at the sample autocorrelation function (SACF) of the

residuals

• plot of the residual SACF is available from SAS

• SACF of the residuals from the GE daily log returns shows high negative autocorrelation at

lag 6

ρ̂(6) is outside the test limits, so it is “significant” at the .05 level

SACF of residuals from an AR(1) fit to GE daily log returns

• the more conservative Ljung-Box “simultaneous” test that ρ(1) = … = ρ(12) = 0 has

p-value = .011 (in MINITAB)

• since the AR(1) model does not fit well, one might consider more complex models

• these will be discussed in following sections

• can also estimate μ and test whether μ is zero

– the t-value for testing that μ is zero is very small

– the p-value is near one

– large p-values are insignificant

Example: Cree Daily Log Returns

• The SAS output comes from running the following program (proc arima)


Here is the SAS output

AR(p) models

• y_t is an AR(p) process if

y_t − μ = φ_1(y_{t−1} − μ) + φ_2(y_{t−2} − μ) + … + φ_p(y_{t−p} − μ) + ε_t

• here ε_1, …, ε_n is WN(0, σ_ε²)

• multiple linear regression model with lagged values of the time series as the “x-variables”

• the model can be re-expressed as

y_t = β_0 + φ_1 y_{t−1} + … + φ_p y_{t−p} + ε_t

where β_0 = μ{1 − (φ_1 + … + φ_p)}

• the least-squares estimator minimizes

∑_{t=p+1}^{n} {y_t − (β_0 + φ_1 y_{t−1} + … + φ_p y_{t−p})}²


• least-squares estimator can be calculated using a multiple linear regression program

• one must create “x-variables” by lagging the time series with lags 1 through p

• it is easier to use SAS’s ARIMA or AUTOREG procedure

• these do the lagging automatically
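R offers the same convenience (a sketch with simulated data): arima() and ar() create the lags internally.

R-code:

# fit an AR(2) without constructing lagged x-variables by hand
set.seed(1)
y = arima.sim(list(ar = c(0.5, 0.2)), n = 500)
arima(y, order = c(2, 0, 0))                        # ML estimates of phi_1, phi_2, and the mean
ar(y, order.max = 2, aic = FALSE, method = "ols")   # least squares, lags created internally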

Stepwise regression applied to AR processes

• stepwise regression: looks at many regression models

to see which ones fit the data well (will be discussed later)

• backwards regression:

starts with all possible x-variables

eliminates them one at a time

stops when all remaining variables are “significant”

• can be applied to AR models

• SAS’s AUTOREG procedure allows backstepping as an option

• Example

proc autoreg data=a;

model y = time / method=ml nlag=5 backstep;

run;

The following SAS program starts with an AR(6) model and backsteps

Here is the SAS output


Forecasting

• AR models can forecast future values

• consider forecasting using an AR(1) process

• have data y_1, …, y_n

• and estimates μ̂ and φ̂

• remember y_{n+1} = μ + φ(y_n − μ) + ε_{n+1} and E(ε_{n+1} | y_1, …, y_n) = 0,

so we estimate y_{n+1} by ŷ_{n+1} := μ̂ + φ̂(y_n − μ̂)

and y_{n+2} by ŷ_{n+2} := μ̂ + φ̂(ŷ_{n+1} − μ̂) = μ̂ + φ̂²(y_n − μ̂), etc.

• in general, ŷ_{n+k} = μ̂ + φ̂^k (y_n − μ̂)

• if |φ̂| < 1, then as k increases the forecasts decay exponentially fast to μ̂ (see the R sketch at the end of this section)

• forecasting general AR(p) processes is similar

• Example: for an AR(2) process

– y_{n+1} − μ = φ_1(y_n − μ) + φ_2(y_{n−1} − μ) + ε_{n+1}

– therefore

ŷ_{n+1} := μ̂ + φ̂_1(y_n − μ̂) + φ̂_2(y_{n−1} − μ̂)

ŷ_{n+2} := μ̂ + φ̂_1(ŷ_{n+1} − μ̂) + φ̂_2(y_n − μ̂), etc.

• the forecasts can be generated automatically by statistical software such as SAS

• SAS program: AR(2) using proc Autoreg

proc autoreg data=b;

model y = / nlag=2 method=ml backstep;

output out=p p=yhat pm=ytrend

lcl=lcl ucl=ucl;

run;

– The output data set includes all the variables in the input data set, the forecast values

(YHAT), the predicted trend (YTREND), and the upper (UCL) and lower (LCL) 95%

confidence limits.
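As referenced above, a minimal R sketch of the AR(1) forecast recursion (simulated data); note that the coefficient labeled “intercept” in arima() output is actually the mean μ̂, so the hand-rolled forecasts match predict():

R-code:

# k-step AR(1) forecasts: yhat_{n+k} = mu-hat + phi-hat^k * (y_n - mu-hat)
set.seed(1)
y = arima.sim(list(ar = 0.8), n = 300) + 5
fit = arima(y, order = c(1, 0, 0))
phi.hat = coef(fit)["ar1"]; mu.hat = coef(fit)["intercept"]
mu.hat + phi.hat^(1:10) * (y[300] - mu.hat)   # decays toward mu-hat as k grows
predict(fit, n.ahead = 10)$pred               # the same forecasts from predict()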


Appendices

Co-integration Tests

• If we find that the response and predictor variables are integrated (non-stationary), then we might

suspect the spurious regression problem (see Granger and Newbold, 1974) in such models.

• A regression model involving non-stationary series can spuriously lead to a significant

relationship between unrelated series: the spurious regression problem

• However, Engle and Granger (1987) showed that if the error term is stationary, the

non-stationary time series are said to be co-integrated, and the relationship between the

variables is then interpreted as a long-run equilibrium.

• Technically, Hamilton has shown that the OLS estimates of the coefficients of the regression

model are consistent under the existence of co-integration (Hamilton, 1994, pp. 590–591).

• Test for the non-stationarity of each response and predictor variable

test for a unit root of each variable

• Test for the co-integrating relationship between variables

test for a unit root in the residuals of the co-integration regression

• Let y_t and x_t be integrated (non-stationary), and

y_t = β x_t + e_t

If y_t and x_t are co-integrated, then the estimates of e_t will be I(0).

If not, the estimates of e_t will be non-stationary for every β.

• Evidence of co-integration implies that one variable captures the dominant source of persistent

innovations in the other variable over the period; the interest is in the long-term relationship

• Phillips-Ouliaris Co-integration Test

– Computes the Phillips-Ouliaris test for the null hypothesis that x is not co-integrated.

– The unit root is estimated from a regression of the first variable (column) of x on the

remaining variables of x without a constant and a linear trend.

– R-code

x = diffinv(rnorm(1000))

y = 2.0-3.0*x+rnorm(x,sd=5)

z = ts(cbind(x,y)) # co-integrated

po.test(z)
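A sketch of the Engle-Granger two-step idea in R, reusing the simulated pair above; a plain ADF test on residuals uses somewhat incorrect critical values (which the Phillips-Ouliaris test corrects), so this only illustrates the mechanics:

R-code:

# step 1: co-integrating regression; step 2: unit root test on the residuals
library(tseries)
set.seed(1)
x = diffinv(rnorm(500))      # x is I(1)
y = 2 - 3 * x + rnorm(501)   # y shares x's stochastic trend
e = residuals(lm(y ~ x))     # estimated e_t
adf.test(e)                  # a small p-value suggests e_t is I(0), i.e., co-integration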

• If the co-integration relationship is detected, the error correction model (ECM) is usually

applied to model the dynamic relationship among the co-integrated variables.

• ECM employs the differenced variables to transform the original series into a stationary

process.

• Example

ΔY_t = ∑_{i=1}^{k} α_i ΔY_{t−i} + ∑_{i=1}^{k} β_i ΔRE_{t−i} + ∑_{i=1}^{k} γ_i ΔI_{t−i} + u_t

Granger Causality

• For some k > 0, if

E[(y_{t+k} − E(y_{t+k} | F_t))²] < E[(y_{t+k} − E(y_{t+k} | Ω_t))²],

then we say that x Granger-causes y

Note: F_t denotes the information set of x and y available at time t

Ω_t denotes the information set of y alone available at time t

• Package ‘lmtest’ in R: Currently, the methods for the generic function ‘grangertest’ only

perform tests for Granger causality in bivariate series. The test is simply a Wald test

comparing the unrestricted model, in which y is explained by the lags (up to the specified order) of

y and x, and the restricted model, in which y is explained only by the lags of y.

• Null hypothesis: x does not Granger-cause y

• R-code

## Which came first: the chicken or the egg?

library(lmtest)

data(ChickEgg)

grangertest(egg ~ chicken, order = 3, data = ChickEgg)

grangertest(chicken ~ egg, order = 3, data = ChickEgg)

## alternative ways of specifying the same test

grangertest(ChickEgg, order = 3)

grangertest(ChickEgg[, 1], ChickEgg[, 2], order = 3)

R-code

# simulate a time series y from an AR(2) model, then estimate & forecast

# (arima.sim, arima, and predict are in the base 'stats' package)

y = arima.sim(list(order = c(2,0,0), ar = c(0.7, 0.2)), n = 200) # simulation

ts.plot(y)

a = arima(y, order = c(2,0,0)) # estimation (fitting)

predict(a, n.ahead = 12) # forecasting up to 12 steps ahead

# load random walk data from the R package TSA, and plot it

library(TSA)

data(rwalk)

plot(rwalk, type='o', ylab='Random Walk')

[Figure: time series plot of the rwalk data (x-axis: Time, 0–60; y-axis: Random Walk)]

# multivariate case example

library(vars)



data(Canada)

var.2c <- VAR(Canada, p = 2, type = "const")

plot(var.2c)

## Diagnostic Testing

## ARCH test

archtest <- arch.test(var.2c)

plot(archtest)

## Normality test

normalitytest <- normality.test(var.2c)

plot(normalitytest)

## serial correlation test

serialtest <- serial.test(var.2c)

plot(serialtest)

## Prediction

var.2c.prd <- predict(var.2c, n.ahead = 8, ci = 0.95)

plot(var.2c.prd)

## Stability

var.2c.stabil <- stability(var.2c, type = "Rec-CUSUM")

plot(var.2c.stabil)


Reading list

[1] Cryer, J.D. and Chan, K. (2008). Time Series Analysis with Applications in R. Springer, New York, USA.

[2] Engle, R.F. and Granger, C.W.J. (1987). Co-integration and error correction: representation, estimation, and testing. Econometrica 55: 251–276.

[3] Gallant, A.R. and Goebel, J.J. (1976). Nonlinear regression with autoregressive errors. Journal of the American Statistical Association 71: 961–967.

[4] Granger, C.W.J. and Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics 2: 111–120.

[5] Hamilton, J. (1994). Time Series Analysis. Princeton University Press, Princeton, New Jersey.

[6] Tsay, R.S. (2005). Analysis of Financial Time Series. Wiley, New Jersey, USA.

[7] Park, J., Jang, Y., and Han, S. (2002). Economic Time Series Analysis (경제시계열분석). Kyungmoonsa.