CH4. Autoregressive-Type Models
Stationary Processes
• often a time series shows the same type of random behavior from one time period to the next
– outside temperature: each summer is similar to past summers
– interest rates and returns on equities
• stationary stochastic processes are probability models for such series
• a process is stationary if its behavior is unchanged by shifts in time
• a process is weakly stationary if its mean, variance, and covariance are unchanged by time shifts
• thus $X_1, X_2, \ldots$ is a weakly stationary process if
  – $E(X_i) = \mu$ (a constant) for all $i$
  – $\mathrm{Var}(X_i) = \sigma^2$ (a constant) for all $i$
  – $\mathrm{Corr}(X_i, X_j) = \rho(|i-j|)$ for all $i$ and $j$, for some function $\rho$
• the correlation between two observations depends only on the time distance between them (called the lag)
• example: correlation between $X_2$ and $X_5$ = correlation between $X_7$ and $X_{10}$
• $\rho$ is the correlation function; note that $\rho(h) = \rho(-h)$
• the covariance between $X_t$ and $X_{t+h}$ is denoted by $\gamma(h)$
• $\gamma(\cdot)$ is called the autocovariance function
• note that $\gamma(h) = \sigma^2 \rho(h)$, that $\gamma(0) = \sigma^2$ since $\rho(0) = 1$, and that $\gamma(h) = \gamma(-h)$
• many financial time series are not stationary
• but the changes in these time series may be stationary: $z_t = y_t - y_{t-1}$, or $z_t = (1 - B)y_t$, where $B$ is the lag (backshift) operator
White Noise
• simplest example of a stationary process: the no-correlation case
• $X_1, X_2, \ldots$ is white noise, written $\mathrm{WN}(\mu, \sigma^2)$, if
  – $E(X_i) = \mu$ for all $i$
  – $\mathrm{Var}(X_i) = \sigma^2$ (a constant) for all $i$
  – $\mathrm{Corr}(X_i, X_j) = 0$ for all $i \neq j$
• note: no distributional assumption is required; the distribution does not have to be normal
• if $X_1, X_2, \ldots$ are IID normal, then the process is a Gaussian white noise process
• a white noise process is weakly stationary with $\rho(0) = 1$ and $\rho(t) = 0$ if $t \neq 0$,
so that $\gamma(0) = \sigma^2$ and $\gamma(t) = 0$ if $t \neq 0$
• WN is uninteresting in itself
– but it is the building block of important models, usually serving as the error terms
• It is interesting to know if a financial time series, e.g., of net returns, is WN.
Estimating parameters of a stationary process
• observe $y_1, \ldots, y_n$
• estimate $\mu$ and $\sigma^2$ with $\bar{y}$ and $s^2$
• estimate the autocovariance with
$$\hat{\gamma}(h) = n^{-1} \sum_{j=1}^{n-h} (y_{j+h} - \bar{y})(y_j - \bar{y})$$
• estimate $\rho(h)$ with
$$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \quad h = 1, 2, \ldots$$
(see the R sketch below)
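• these estimates can be computed directly in R; a minimal sketch (the series y is simulated white noise, an arbitrary choice), with the built-in acf() giving the same estimates:
# sample autocovariance: gamma.hat(h) = n^{-1} sum_{j=1}^{n-h} (y_{j+h}-ybar)(y_j-ybar)
gamma.hat <- function(y, h) {
  n <- length(y); ybar <- mean(y)
  sum((y[(1 + h):n] - ybar) * (y[1:(n - h)] - ybar)) / n
}
rho.hat <- function(y, h) gamma.hat(y, h) / gamma.hat(y, 0)  # sample autocorrelation

y <- rnorm(500)        # white noise, so rho.hat(h) should be near 0 for h >= 1
rho.hat(y, 1)
acf(y, lag.max = 10)   # built-in SACF: same estimates, plotted with test limits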
Auto Regressive (AR) processes
AR(1) processes
• time series models with correlation built from WN
• in AR processes, $y_t$ is modeled as a weighted average of past observations plus a white noise "error"
• AR(1) is the simplest AR process
• $\epsilon_1, \epsilon_2, \ldots$ are $\mathrm{WN}(0, \sigma_\epsilon^2)$
• $y_1, y_2, \ldots$ is an AR(1) process if
$$y_t - \mu = \phi (y_{t-1} - \mu) + \epsilon_t \quad \text{for all } t$$
• three parameters:
  – $\mu$: the mean
  – $\sigma_\epsilon^2$: the variance of the one-step-ahead prediction errors
  – $\phi$: a correlation parameter
• equivalently, $y_t = \mu(1 - \phi) + \phi y_{t-1} + \epsilon_t$
• compare with the linear regression model, $y_t = \beta_0 + \beta_1 x_t + \epsilon_t$
• $\beta_0 = \mu(1 - \phi)$ is called the "constant" in computer output
• $\mu$ is called the "mean" in the output
• is called the “mean” in the output
• when $|\phi| < 1$,
$$y_t = \mu + \epsilon_t + \phi \epsilon_{t-1} + \phi^2 \epsilon_{t-2} + \cdots = \mu + \sum_{h=0}^{\infty} \phi^h \epsilon_{t-h}$$
• this is the infinite moving average [MA($\infty$)] representation
• if $|\phi| < 1$, then $y_1, \ldots$ is a weakly stationary process,
since $\phi^h \to 0$ as the lag $h \to \infty$
• note: if $\{\epsilon_t\}$ is a sequence of strictly independent zero-mean random variables, then the stationary time
series $\{y_t\}$ is linear; otherwise the series is nonlinear
Properties of a stationary AR(1) process
• when $|\phi| < 1$ (stationarity), then
  1. $E(y_t) = \mu$ for all $t$
  2. $\gamma(0) = \mathrm{Var}(y_t) = \dfrac{\sigma_\epsilon^2}{1 - \phi^2}$ for all $t$
  3. $\gamma(h) = \mathrm{Cov}(y_t, y_{t+h}) = \dfrac{\sigma_\epsilon^2 \, \phi^{|h|}}{1 - \phi^2}$ for all $t$
  4. $\rho(h) = \mathrm{Corr}(y_t, y_{t+h}) = \phi^{|h|}$ for all $t$
  (these hold only if $|\phi| < 1$, and only for AR(1) processes)
• if $|\phi| \geq 1$, then the AR(1) process is nonstationary, and the mean, variance, and correlation
are not constant
• Formulas 1–4 can be proved using the MA($\infty$) representation
$$y_t = \mu + \sum_{h=0}^{\infty} \phi^h \epsilon_{t-h}$$
For example,
$$\mathrm{Var}(y_t) = \sum_{h=0}^{\infty} \phi^{2h} \, \mathrm{Var}(\epsilon_{t-h}) = \frac{\sigma_\epsilon^2}{1 - \phi^2}$$
• also, for $h > 0$,
$$\gamma(h) = \mathrm{Cov}(y_t, y_{t+h}) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \phi^i \phi^j \, \mathrm{Cov}(\epsilon_{t-i}, \epsilon_{t+h-j}) = \frac{\sigma_\epsilon^2 \, \phi^h}{1 - \phi^2}$$
• distinguish between $\sigma_\epsilon^2$ = variance of $\epsilon_1, \epsilon_2, \ldots$ and $\gamma(0)$ = variance of $y_1, y_2, \ldots$ (a simulation check follows)
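• a minimal simulation sketch of these properties in R ($\phi = 0.6$ and $\sigma_\epsilon = 1$ are arbitrary choices); the sample variance and SACF should be close to the theoretical values:
set.seed(1)
phi <- 0.6; sigma <- 1
y <- arima.sim(list(ar = phi), n = 10000, sd = sigma)  # stationary AR(1), mu = 0
var(y); sigma^2 / (1 - phi^2)          # gamma(0): sample vs. theoretical
acf(y, lag.max = 5, plot = FALSE)      # rho(h) should be close to phi^h
phi^(1:5)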
Non-stationary AR(1) processes
Random Walk
• if $\phi = 1$ (unit root case), then $y_t = y_{t-1} + \epsilon_t$
• not stationary
• this is the random walk process
• taking $\mu = 0$:
$$y_t = y_{t-1} + \epsilon_t = y_{t-2} + \epsilon_{t-1} + \epsilon_t = \cdots = y_0 + \sum_{i=1}^{t} \epsilon_i$$
• if the process starts at an arbitrary point $y_0$, then $E(y_t \mid y_0) = y_0$ for all $t$
• and $\mathrm{Var}(y_t \mid y_0) = t \, \sigma_\epsilon^2$
• a shock $\epsilon_t$ at time $t_0$ is transient in the stationary process, whereas it is permanent
in the unit root series
• stationary processes tend to show mean reversion, whereas unit root processes tend to wander
irregularly away from the mean (there is not even a stationary mean level to revert to); the simulation sketch below illustrates the contrast
• unit root processes include an unpredictable stochastic trend
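• the contrast is easy to see by driving a stationary AR(1) and a random walk with the same shocks; a minimal sketch ($\phi = 0.5$ is an arbitrary stationary value):
set.seed(2)
e <- rnorm(250)                                        # common shocks
y.ar <- filter(e, filter = 0.5, method = "recursive")  # stationary AR(1): mean-reverting
y.rw <- cumsum(e)                                      # random walk (phi = 1): wanders
ts.plot(cbind(y.ar, y.rw), lty = 1:2)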
When $\phi > 1$, an AR(1) process has explosive behavior
• suppose an explosive AR(1) process starts at $y_0 \neq 0$ and has $\mu = 0$. Then
$$y_t = \phi y_{t-1} + \epsilon_t = \epsilon_t + \phi(\phi y_{t-2} + \epsilon_{t-1}) = \epsilon_t + \phi \epsilon_{t-1} + \phi^2 y_{t-2}
= \cdots = \epsilon_t + \phi \epsilon_{t-1} + \cdots + \phi^{t-1} \epsilon_1 + \phi^t y_0$$
• therefore, $E(y_t) = \phi^t y_0$ and
$$\mathrm{Var}(y_t) = \sigma_\epsilon^2 \left( 1 + \phi^2 + \phi^4 + \cdots + \phi^{2(t-1)} \right) = \sigma_\epsilon^2 \, \frac{\phi^{2t} - 1}{\phi^2 - 1}$$
Since $\phi > 1$, the variance increases geometrically fast in $t$
• the increasing variance makes these AR(1) processes "wander" even more than a random walk
• explosive AR processes are not widely used in econometrics, since economic growth usually is
not explosive
Unit Root Tests
• test $H_0: \phi = 1$ vs. $H_1: |\phi| < 1$
  ($H_0: \phi = 1$, nonstationary; $H_1: |\phi| < 1$, stationary)
• Dickey-Fuller Test
$$y_t = \phi y_{t-1} + \epsilon_t, \quad \text{where } \epsilon_t \sim \mathrm{WN}(0, \sigma^2)$$
• if $\phi = 1$, the $y_t$ are referred to as integrated of order one, or I(1)
$$\Delta y_t = y_t - y_{t-1} = \epsilon_t$$
Note: $\epsilon_t$ is not integrated, i.e., it is I(0).
• order of integration: I($d$)
It reports the minimum number of differences required to obtain a stationary series.
A time series is integrated of order $d$ if $(1 - B)^d X_t$ yields a stationary process.
• Augmented Dickey-Fuller Test
$$y_t = \phi y_{t-1} + \sum_{i=1}^{p} \psi_i \Delta y_{t-i} + \epsilon_t, \quad \text{where } \epsilon_t \sim \mathrm{WN}(0, \sigma^2)$$
• Three types
  – Zero mean: $y_t = \phi y_{t-1} + \epsilon_t$
  – Single mean: $y_t = \alpha + \phi y_{t-1} + \epsilon_t$
  – Trend: $y_t = \alpha + \delta t + \phi y_{t-1} + \epsilon_t$
• The following SAS code performs augmented Dickey-Fuller tests with autoregressive orders
2 and 5.
proc arima data=test;
identify var=x stationarity=(adf=(2,5));
run;
• the Phillips-Perron test uses a nonparametric correction to account for serial correlation in the errors
• unlike the null hypothesis of the Dickey-Fuller and Phillips-Perron tests, the null hypothesis
of the KPSS test states that the time series is stationary
• R-code for unit root tests
library(tseries)
x = rnorm(1000) # no unit-root
y = diffinv(x) # contains a unit-root
adf.test(x)
pp.test(x)
kpss.test(x)
Estimation
• the AR(1) model is a linear regression model
• one creates a lagged variable in $y$ and uses it as the "x-variable"
• least squares estimation: minimize
$$\sum_{t=2}^{n} \left\{ y_t - \mu - \phi (y_{t-1} - \mu) \right\}^2$$
• if the errors are Gaussian white noise, then OLS = MLE (a short R sketch follows)
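• a minimal sketch of the regression view in R (simulated data; the values $\mu = 10$, $\phi = 0.4$ are arbitrary): lag the series by hand, fit by OLS with lm, then compare with the ML fit from arima:
set.seed(3)
y <- as.numeric(arima.sim(list(ar = 0.4), n = 500) + 10)  # AR(1) with mean 10
fit.ols <- lm(y[-1] ~ y[-length(y)])   # regress y_t on the lagged variable y_{t-1}
coef(fit.ols)                          # intercept ~ mu(1 - phi), slope ~ phi
arima(y, order = c(1, 0, 0))           # ML fit: reports phi and mu (labeled "intercept")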
• In SAS, use the “AUTOREG” or the “ARIMA” procedure
• SAS provides maximum likelihood estimates (ML), unconditional least squares estimates
(ULS), Yule-Walker estimates (YW: default), iterative Yule-Walker estimates (ITYW)
proc autoreg data=b;
model y = time / nlag=2 method=ml backstep;
output out=p p=yhat pm=ytrend
lcl=lcl ucl=ucl;
run;
– The output data set includes all the variables in the input data set, the forecast values
(YHAT), the predicted trend (YTREND), and the upper (UCL) and lower (LCL) 95%
confidence limits.
– Backstep: stepwise variable selection method
– For more about Yule-Walker estimates, see Gallant and Goebel (1976)
Residuals
$$\hat{\epsilon}_t = y_t - \left\{ \hat{\mu} + \hat{\phi} (y_{t-1} - \hat{\mu}) \right\}$$
• these estimate $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$, since $\epsilon_t = y_t - \{\mu + \phi(y_{t-1} - \mu)\}$
• used to check that $y_1, y_2, \ldots, y_n$ is an AR(1) process
• autocorrelation in the residuals is evidence against the AR(1) assumption
• to test for residual autocorrelation, use SAS's autocorrelation plots
• can also use the Ljung-Box test (sketched in R below)
the null hypothesis is that the autocorrelations up to a specified lag are all zero
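• a minimal self-contained R sketch of these residual checks (the simulated AR(1) series is an arbitrary example):
set.seed(4)
y <- arima.sim(list(ar = 0.4), n = 500)
fit <- arima(y, order = c(1, 0, 0))    # fit an AR(1)
acf(residuals(fit))                    # residual SACF with test limits
# fitdf = 1 adjusts the degrees of freedom for the estimated AR(1) coefficient
Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = 1)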
SAS code for the ESACF
• computes the extended sample autocorrelation function (ESACF)
• uses these estimates to tentatively identify the autoregressive and moving average orders of
mixed models
• the following code generates an ESACF table with dimensions of p=(0:7) and q=(0:8)
proc arima data=call;
identify var=vol esacf p=(0:7) q=(0:8);
run;
Autocorrelation Tests
• Durbin-Watson Test
Suppose $e_t = \rho e_{t-1} + \nu_t$, where $|\rho| < 1$ and the $\nu_t$ are iid normal.
Testing $H_0: \rho = 0$ is testing for no first-order autocorrelation in $e_t$.
R-code:
– Autocorrelation test for OLS residual
library(lmtest)
dwtest(Revenue~Assets, data=bankdat)
– The number of lags can be specified using the max.lag argument
library(car)
results = lm(Y ~ x1 + x2)
durbin.watson(results, max.lag = 2)
SAS code:
– Autocorrelation test for OLS residual
7
proc autoreg data=a;
model y = time / dw=4 dwprob;
run;
• Box-Pierce and Ljung-Box Test
# arima.sim, arima, and Box.test are in the base stats package (loaded by default;
# older code loads them via library(ts), which has since been merged into stats)
y = arima.sim(list(order = c(2,1,1), ar = c(0.7, 0.2), ma = c(0.3)), n = 200)
# y: simulated time series from an ARIMA(2,1,1) model
ts.plot(y)
a = arima(y, order = c(1,1,0))
Box.test(a$resid)                       # Box-Pierce test (default)
Box.test(a$resid, type = "Ljung-Box")   # Ljung-Box test
Example: GE daily returns
• the SAS output discussed below comes from running a proc autoreg program [program and output shown as slide images, not reproduced here]
• SAS uses the model
$$y_t = \mu + \phi (y_{t-1} - \mu) + \epsilon_t$$
• $\hat{\phi} = -0.225$, and the standard error of $\hat{\phi}$ is 0.0616
• the t-value for testing $H_0: \phi = 0$ versus $H_1: \phi \neq 0$ is $-3.60$
• the null hypothesis is that the log returns are white noise; the alternative is that they are correlated
• the small p-value is evidence against the geometric random walk hypothesis
• $\rho(h) = \phi^{|h|}$ gives the correlation between log returns $h$ periods apart
• in summary, the AR(1) process fits the GE log returns better than white noise does
• this is not proof that the AR(1) fits these data
• to check that the AR(1) fits well, look at the sample autocorrelation function (SACF) of the
residuals
• a plot of the residual SACF is available from SAS
• the SACF of the residuals from the GE daily log returns shows high negative autocorrelation at
lag 6
• $\hat{\rho}(6)$ is outside the test limits, so it is "significant" at the 0.05 level
[Figure: SACF of residuals from an AR(1) fit to GE daily log returns]
• the more conservative Ljung-Box "simultaneous" test of $\rho(1) = \cdots = \rho(12) = 0$ has
p-value = .011 (in MINITAB)
• since the AR(1) model does not fit well, one might consider more complex models
• these will be discussed in the following sections
• one can also estimate $\mu$ and test whether $\mu$ is zero
– the t-value for testing that $\mu$ is zero is very small
– the p-value is near one
– large p-values indicate insignificance
Example: Cree Daily Log Returns
• the SAS output comes from running a proc arima program [program and output shown as slide images, not reproduced here]
AR(p) models
• $y_t$ is an AR($p$) process if
$$y_t - \mu = \phi_1 (y_{t-1} - \mu) + \phi_2 (y_{t-2} - \mu) + \cdots + \phi_p (y_{t-p} - \mu) + \epsilon_t$$
• here $\epsilon_1, \ldots, \epsilon_n$ is $\mathrm{WN}(0, \sigma_\epsilon^2)$
• this is a multiple linear regression model with lagged values of the time series as the "x-variables"
• the model can be re-expressed as
$$y_t = \beta_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \epsilon_t,$$
where $\beta_0 = \mu \{ 1 - (\phi_1 + \cdots + \phi_p) \}$
• the least-squares estimator minimizes
$$\sum_{t=p+1}^{n} \left\{ y_t - (\beta_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p}) \right\}^2$$
• the least-squares estimator can be calculated using a multiple linear regression program
• one must create the "x-variables" by lagging the time series with lags 1 through $p$ (see the R sketch below)
• it is easier to use SAS's ARIMA or AUTOREG procedures
• these do the lagging automatically
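• a minimal R sketch of the by-hand approach, using embed to build the lag matrix ($p = 3$ and the AR coefficients are arbitrary choices):
set.seed(5)
y <- as.numeric(arima.sim(list(ar = c(0.5, -0.3, 0.1)), n = 500))
p <- 3
X <- embed(y, p + 1)          # row t holds (y_t, y_{t-1}, ..., y_{t-p})
fit <- lm(X[, 1] ~ X[, -1])   # multiple regression of y_t on its p lags
coef(fit)                     # estimates of beta0 and phi_1, ..., phi_p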
Stepwise regression applied to AR processes
• stepwise regression looks at many regression models
to see which ones fit the data well (discussed later)
• backwards regression:
starts with all possible x-variables
eliminates them one at a time
stops when all remaining variables are "significant"
• can be applied to AR models
• SAS's AUTOREG procedure allows backstepping as an option
• Example
proc autoreg data=a;
model y = time / method=ml nlag=5 backstep;
run;
[A SAS program that starts with an AR(6) model and backsteps, together with its output, is shown as slide images, not reproduced here]
Forecasting
• AR models can forecast future values
• consider forecasting using an AR(1) process
• we have data $y_1, \ldots, y_n$
• and estimates $\hat{\mu}$ and $\hat{\phi}$
• recall that $y_{n+1} = \mu + \phi (y_n - \mu) + \epsilon_{n+1}$ and $E(\epsilon_{n+1} \mid y_1, \ldots, y_n) = 0$
so we estimate $y_{n+1}$ by $\hat{y}_{n+1} := \hat{\mu} + \hat{\phi} (y_n - \hat{\mu})$
and $y_{n+2}$ by $\hat{y}_{n+2} := \hat{\mu} + \hat{\phi} (\hat{y}_{n+1} - \hat{\mu}) = \hat{\mu} + \hat{\phi} \{ \hat{\phi} (y_n - \hat{\mu}) \}$, etc.
• in general, $\hat{y}_{n+k} = \hat{\mu} + \hat{\phi}^k (y_n - \hat{\mu})$
• if $|\hat{\phi}| < 1$, then as $k$ increases the forecasts decay exponentially fast to $\hat{\mu}$
• forecasting general AR($p$) processes is similar
• example: for an AR(2) process
– $y_{n+1} = \mu + \phi_1 (y_n - \mu) + \phi_2 (y_{n-1} - \mu) + \epsilon_{n+1}$
– therefore
$\hat{y}_{n+1} := \hat{\mu} + \hat{\phi}_1 (y_n - \hat{\mu}) + \hat{\phi}_2 (y_{n-1} - \hat{\mu})$
$\hat{y}_{n+2} := \hat{\mu} + \hat{\phi}_1 (\hat{y}_{n+1} - \hat{\mu}) + \hat{\phi}_2 (y_n - \hat{\mu})$, etc.
• the forecasts can be generated automatically by statistical software such as SAS (an R sketch of the recursion follows)
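• a minimal R sketch of the AR(1) recursion $\hat{y}_{n+k} = \hat{\mu} + \hat{\phi}^k (y_n - \hat{\mu})$, checked against predict (simulated data; the parameter values are arbitrary):
set.seed(6)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 300) + 5)
fit <- arima(y, order = c(1, 0, 0))
phi.hat <- coef(fit)["ar1"]
mu.hat  <- coef(fit)["intercept"]              # arima labels the mean "intercept"
k <- 1:10
mu.hat + phi.hat^k * (y[length(y)] - mu.hat)   # forecasts by the recursion
predict(fit, n.ahead = 10)$pred                # the same forecasts from predict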
• SAS program: AR(2) using proc Autoreg
proc autoreg data=b;
model y = / nlag=2 method=ml backstep;
output out=p p=yhat pm=ytrend
lcl=lcl ucl=ucl;
run;
– The output data set includes all the variables in the input data set, the forecast values
(YHAT), the predicted trend (YTREND), and the upper (UCL) and lower (LCL) 95%
confidence limits.
Appendices
Co-integration Tests
• if we find that the response and predictor variables are integrated (non-stationary), then we might
suspect a spurious regression problem in the models (see Granger and Newbold, 1974)
• a regression model involving non-stationary series can spuriously indicate a significant
relationship between unrelated series: the spurious regression problem
• however, Engle and Granger (1987) showed that if the error term is stationary, the
non-stationary time series are said to be co-integrated, and the relationship between the
variables can be interpreted as a long-run equilibrium
• technically, Hamilton has shown that the OLS estimates of the regression coefficients are
consistent when co-integration holds (Hamilton, 1994, pp. 590-591)
• test for the non-stationarity of each response and predictor variable:
test for a unit root in each variable
• test for the co-integrating relationship between the variables:
test for a unit root in the residuals of the co-integrating regression
• let $y_t$ and $x_t$ be integrated (non-stationary), and
$$y_t = \beta x_t + e_t$$
If $y_t$ and $x_t$ are co-integrated, then the estimates of $e_t$ will be I(0).
If not, the estimates of $e_t$ will be non-stationary for every choice of $\beta$.
• evidence of co-integration implies that one variable captures the dominant source of persistent
innovations in the other over the sample period; the interest is in the long-term relationship (a sketch of the residual-based test follows)
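• a minimal R sketch of this residual-based (Engle-Granger style) check on a simulated co-integrated pair; strictly, residual-based tests require Engle-Granger critical values rather than the standard ADF ones:
library(tseries)
set.seed(7)
x <- cumsum(rnorm(500))          # I(1) regressor
y <- 2 - 3 * x + rnorm(500)      # co-integrated with x: the error term is I(0)
e.hat <- residuals(lm(y ~ x))    # step 1: co-integrating regression
adf.test(e.hat)                  # step 2: unit root test on the residuals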
• Phillips-Ouliaris Co-integration Test
– Computes the Phillips-Ouliaris test for the null hypothesis that x is not co-integrated.
– The unit root is estimated from a regression of the first variable (column) of x on the
remaining variables of x without a constant and a linear trend.
– R-code
library(tseries)   # po.test is in tseries
x = diffinv(rnorm(1000))
y = 2.0 - 3.0*x + rnorm(x, sd = 5)
z = ts(cbind(x, y))   # co-integrated
po.test(z)
• If the co-integration relationship is detected, the error correction model (ECM) is usually
applied to model the dynamic relationship among the co-integrated variables.
• ECM employs the differenced variables to transform the original series into a stationary
process.
• example: with co-integrated variables $Y$, $RE$, and $I$, an ECM with the lagged equilibrium error $\hat{e}_{t-1}$ as the error-correction term takes the form
$$\Delta Y_t = \alpha + \lambda \hat{e}_{t-1} + \sum_{i=1}^{k} \beta_i \Delta Y_{t-i} + \sum_{i=1}^{k} \gamma_i \Delta RE_{t-i} + \sum_{i=1}^{k} \delta_i \Delta I_{t-i} + u_t$$
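• a minimal bivariate R sketch of fitting an ECM with one lag by OLS (simulated co-integrated pair; all variable names are illustrative):
set.seed(8)
n <- 300
x <- cumsum(rnorm(n))                 # I(1)
y <- 1 + 0.5 * x + rnorm(n)           # co-integrated with x
e.hat <- residuals(lm(y ~ x))         # step 1: equilibrium errors
dy <- diff(y); dx <- diff(x)          # dy[i] = y[i+1] - y[i]
# step 2: ECM regression of dy_t on e_{t-1}, dy_{t-1}, and dx_{t-1}
fit <- lm(dy[-1] ~ e.hat[2:(n - 1)] + dy[-(n - 1)] + dx[-(n - 1)])
coef(summary(fit))   # the error-correction coefficient should be negative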
Granger Causality
• for some $k > 0$, if
$$E\left[ \left( y_{t+k} - E(y_{t+k} \mid F_t) \right)^2 \right] < E\left[ \left( y_{t+k} - E(y_{t+k} \mid \Omega_t) \right)^2 \right],$$
then we say that $x$ Granger-causes $y$
Note: $F_t$ denotes the information set of both $x$ and $y$ available at time $t$;
$\Omega_t$ denotes the information set of $y$ alone available at time $t$
• package 'lmtest' in R: currently, the methods for the generic function 'grangertest' only
perform tests for Granger causality in bivariate series. The test is simply a Wald test
comparing the unrestricted model, in which y is explained by the lags (up to order 'order') of
y and x, and the restricted model, in which y is explained only by the lags of y.
• Null hypothesis: x does not Granger-cause y
• R-code
## Which came first: the chicken or the egg?
library(lmtest)
data(ChickEgg)
grangertest(egg ~ chicken, order = 3, data = ChickEgg)
grangertest(chicken ~ egg, order = 3, data = ChickEgg)
## alternative ways of specifying the same test
grangertest(ChickEgg, order = 3)
grangertest(ChickEgg[, 1], ChickEgg[, 2], order = 3)
R-code
# simulate a time series y from an AR(2) model, then estimate and forecast
# (arima.sim, arima, and predict are in the base stats package, loaded by default)
y = arima.sim(list(order = c(2,0,0), ar = c(0.7, 0.2)), n = 200)  # simulation
ts.plot(y)
a = arima(y, order = c(2,0,0))   # estimation (fitting)
predict(a, n.ahead = 12)         # forecasting up to 12 steps ahead
# load random walk data from the R package TSA, and plot it
library(TSA)
data(rwalk)
plot(rwalk, type='o', ylab='Random Walk')
# multivariate case example
library(vars)
data(Canada)
var.2c <- VAR(Canada, p = 2, type = "const")
plot(var.2c)
## Diagnostic Testing
## ARCH test
archtest <- arch.test(var.2c)
plot(archtest)
## Normality test
normalitytest <- normality.test(var.2c)
plot(normalitytest)
## serial correlation test
serialtest <- serial.test(var.2c)
plot(serialtest)
## Prediction
var.2c.prd <- predict(var.2c, n.ahead = 8, ci = 0.95)
plot(var.2c.prd)
## Stability
var.2c.stabil <- stability(var.2c, type = "Rec-CUSUM")
plot(var.2c.stabil)
Reading lists
[1] Cryer, J.D. and Chan, K. (2008). Time Series Analysis with Applications in R. Springer, New York.
[2] Engle, R.F. and Granger, C.W.J. (1987). Co-integration and error correction: representation, estimation, and testing. Econometrica 55: 251-276.
[3] Gallant, A.R. and Goebel, J.J. (1976). Nonlinear regression with autoregressive errors. Journal of the American Statistical Association 71: 961-967.
[4] Granger, C.W.J. and Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics 2: 111-120.
[5] Hamilton, J. (1994). Time Series Analysis. Princeton University Press, Princeton, New Jersey.
[6] Tsay, R.S. (2005). Analysis of Financial Time Series. Wiley, New Jersey.
[7] Park, J., Jang, Y., and Han, S. (2002). Economic Time Series Analysis (경제시계열분석). Kyungmoonsa (in Korean).