HG ECON 5101
Fourth lecture – 5 Feb. 2014
1. The ARIMA command in Stata
The ARIMA command includes the possibility of covariates. It estimates a model involving both $y_t$ and a set of covariates $x_t$, assuming $y_t - x_t'\beta \sim \mathrm{ARMA}(p,q)$, where $\beta$ is a vector of parameters, or

(1)  $y_t = x_t'\beta + \sum_{j=1}^{p} \rho_j \,(y_{t-j} - x_{t-j}'\beta) + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$

This includes the possibility that $y_t$ and $x_t$ are cointegrated in the sense that (i) both $y_t$ and $x_t$ are non-stationary [I(1)], and (ii) the linear combination $y_t - x_t'\beta$ is stationary [I(0)].
A potential application could be a situation where some other method (e.g., Søren Johansen’s method) has established that $y_t$ and $x_t$ are cointegrated, and an ARIMA analysis could be used to explore/confirm the nature of the stationarity of $y_t - x_t'\beta$.
(1) is a special case of an ARMAX model with covariates. However, Stata has not implemented a general ARMAX routine. General ARMAX models can be handled by Kalman filtering (the sspace command).
We will look at Example 4 in the arima documentation (pdf) in the Stata manual. The data concern two US series:
  personal consumption – consump
  money supply – m2
The analysis is an update inspired by an earlier analysis by Friedman and Meiselman (1963), who postulated the simple relationship

$\text{consump}_t = \beta_0 + \beta_1\, \text{m2}_t + u_t$
Load data with
use http://www.stata-press.com/data/r13/friedman2, clear
Stata cuts the data at 1982q1 because of an intervention by the Federal Reserve in 1982 to control inflation, i.e., a structural change in 1982.
Apparent non-stationary but cointegrated series in period 1959 – 1981.
[Figure: US personal consumption and money supply M2, quarterly 1950q1–2000q1. Series: personal consumption (current $, NIPA) and M2 money supply (St. Louis Fed web). M2 is missing at the start of the sample.]
Stata’s solution – postulate ARMA(1,1) for $y_t - x_t'\beta$:
. arima consump m2 if tin(, 1981q4), ar(1) ma(1)
ARIMA regression
Sample: 1959q1 - 1981q4 Number of obs = 92
Wald chi2(3) = 4394.80
Log likelihood = -340.5077 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| OPG
consump | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
consump |
m2 | 1.122029 .0363563 30.86 0.000 1.050772 1.193286
_cons | -36.09872 56.56703 -0.64 0.523 -146.9681 74.77062
-------------+----------------------------------------------------------------
ARMA |
ar |
L1. | .9348486 .0411323 22.73 0.000 .8542308 1.015467
|
ma |
L1. | .3090592 .0885883 3.49 0.000 .1354293 .4826891
-------------+----------------------------------------------------------------
/sigma | 9.655308 .5635157 17.13 0.000 8.550837 10.75978
------------------------------------------------------------------------------
Now, look at the residuals.
predict res1,resid
tsline res1 if tin(1959q2,1981q4)
[Figure: one-step residuals from model 1 (original data), 1960q1–1980q1. Caption: Residuals model 1 – orig data.]
Clearly not white noise (increasing variance). So we try to stabilize the variance
by taking the logs:
gen lm2=ln(m2)
gen lcons=ln( consump)
arima lcons lm2 if tin(, 1981q4), ar(1) ma(1)
ARIMA regression
Sample: 1959q1 - 1981q4 Number of obs = 92
Wald chi2(3) = 2600.44
Log likelihood = 299.9357 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| OPG
lcons | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lcons |
lm2 | .9822586 .0607853 16.16 0.000 .8631217 1.101396
_cons | .1832399 .4434234 0.41 0.679 -.6858541 1.052334
-------------+----------------------------------------------------------------
ARMA |
ar |
L1. | .9731574 .0247959 39.25 0.000 .9245583 1.021757
|
ma |
L1. | .2496818 .1356231 1.84 0.066 -.0161347 .5154983
-------------+----------------------------------------------------------------
/sigma | .0091153 .0007558 12.06 0.000 .007634 .0105967
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
confidence interval is truncated at zero.
Residuals:
[Figure: one-step residuals from the ln-transformed model, 1960q1–1980q1. Caption: Residuals from ln-transformed model.]
Normality test (after generating the residuals with predict resln, resid):
swilk resln if tin(1959q2,1981q4)
Shapiro-Wilk W test for normal data
Variable | Obs W V z Prob>z
-------------+--------------------------------------------------
resln | 91 0.99384 0.470 -1.665 0.95207
2. Forecasting (prediction). (A good reference is Hamilton chap. 4)
Suppose we wish to predict a random variable $Y$ from a set of predictor rv’s, $X_1, X_2, \ldots, X_k$. A predictor is a function, $\hat Y = h(X_1, X_2, \ldots, X_k)$.
The best predictor in the minimum mean squared error (MSE) sense, i.e., minimizing $E(Y - \hat Y)^2$, is

(2)  $\hat Y = E(Y \mid X_1, X_2, \ldots, X_k)$
(see Hamilton chap. 4 or a lecture note on prediction on the web-page for Econ
4130 (stat 2) 2013)
In time series the function in (2) often turns out to be complicated, so it is common to use a next-best solution, the best linear approximation to (2) (also called the “linear projection predictor” from Hilbert space terminology; it can be shown always to exist),

(3)  $\hat Y = a_0 + a_1 X_1 + \cdots + a_k X_k$

where the constants $a_0, a_1, \ldots, a_k$ are determined to minimize the MSE $E(Y - \hat Y)^2$.
Note: If $(Y, X_1, X_2, \ldots, X_k)$ are jointly normal (Gaussian), then (2) and (3) are equivalent!
Solution:
Let $Y$ have expectation and variance, $\mu_Y$ and $\sigma_Y^2$.
Let $X' = (X_1, X_2, \ldots, X_k)$ have expectation $\mu' = (\mu_1, \mu_2, \ldots, \mu_k)$ and covariance matrix $\Sigma = \{\operatorname{cov}(X_i, X_j)\}_{i,j=1}^{k}$.
Write $a' = (a_1, a_2, \ldots, a_k)$, and the covariance between $Y$ and $X$ as the vector

$\Sigma_{XY} = \big(\operatorname{cov}(Y, X_1), \ldots, \operatorname{cov}(Y, X_k)\big)'$
Theorem 3. The MSE-minimizing constants are given as any solution to i. and ii. below. The solution is obtained by solving the equations

$E(Y - \hat Y) = 0$
$E[(Y - \hat Y) X_j] = 0$ for $j = 1, 2, \ldots, k$

leading to

i. $a_0 = \mu_Y - a_1\mu_1 - \cdots - a_k\mu_k$
ii. $\Sigma a = \Sigma_{XY}$ (the vector of $\operatorname{cov}(Y, X_j)$)
iii. If $\Sigma$ is non-singular, $a = \Sigma^{-1}\Sigma_{XY}$.
iv. The MSE of the solution becomes $\mathrm{MSE} = E(Y - \hat Y)^2 = \sigma_Y^2 - \Sigma_{XY}'\,\Sigma^{-1}\Sigma_{XY}$
Proof: Exactly the same proof as for the OLS estimators in a multiple
regression problem. Only replace all sample quantities, means and sample
covariances, by corresponding population quantities. Or see Hamilton chap. 4.
(End of proof.)
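As a numerical illustration of Theorem 3 (not part of the original notes), the sketch below uses made-up moments for a hypothetical joint distribution of $(Y, X_1, X_2, X_3)$, solves the normal equations $\Sigma a = \Sigma_{XY}$, and checks against a large simulated Gaussian sample that the residuals have mean zero and that the sample MSE matches iv.:

```python
# Sketch of Theorem 3 (best linear predictor) with arbitrary, made-up moments.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical joint distribution of (Y, X1, X2, X3): mean and covariance.
mu = np.array([1.0, 0.5, -0.3, 2.0])           # (mu_Y, mu_1, mu_2, mu_3)
L = rng.standard_normal((4, 4))
C = L @ L.T + np.eye(4)                        # a valid (positive definite) covariance
sigma2_Y = C[0, 0]
Sigma = C[1:, 1:]                              # covariance matrix of X
Sigma_XY = C[1:, 0]                            # cov(Y, X_j), j = 1..k

a = np.linalg.solve(Sigma, Sigma_XY)           # iii. a = Sigma^{-1} Sigma_XY
a0 = mu[0] - a @ mu[1:]                        # i.  a0 = mu_Y - a' mu_X
mse = sigma2_Y - Sigma_XY @ a                  # iv. sigma_Y^2 - Sigma_XY' Sigma^{-1} Sigma_XY

# Check on a large Gaussian sample: residuals should average ~0 and their
# mean square should be close to the theoretical MSE.
Z = rng.multivariate_normal(mu, C, size=200_000)
Y, X = Z[:, 0], Z[:, 1:]
resid = Y - (a0 + X @ a)
print(abs(np.mean(resid)) < 0.05)              # True
print(abs(np.mean(resid**2) - mse) < 0.5)      # True
```

In the Gaussian case this linear predictor coincides with the conditional expectation (2), which is why the sample checks come out so close.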
In a time series context we have, e.g., observed $Y_1, Y_2, \ldots, Y_t$ of a causal stationary series with expectation $\mu = E(Y_t)$ and autocovariance $\gamma(h)$, $h = 0, \pm 1, \pm 2, \ldots$.
Let $D_t = \{Y_t, Y_{t-1}, \ldots, Y_2, Y_1\}$ be all that we know at time $t$.
We want to forecast $Y_{t+j}$ at a future time $t + j$.
Notation: We write the best linear predictor as $\hat Y_{t+j|t} = E(Y_{t+j} \mid D_t)$ (if feasible), or the best linear approximation given by Theorem 3.
With regard to Theorem 3, we have in general $Y = Y_{t+j}$, $X' = (Y_t, Y_{t-1}, \ldots, Y_1)$, $a' = (a_1, a_2, \ldots, a_t)$, and $\Sigma = \{\gamma(i - j)\}_{i,j=1}^{t}$ (a $t \times t$ matrix), with solution

(4)  $a = \Sigma^{-1}\gamma_{t,t+j}$, where $\gamma_{t,t+j} = \big(\gamma(j), \gamma(j+1), \ldots, \gamma(j+t-1)\big)'$.

For large $t$, (4) becomes impractical and is replaced by various recursion formulas that do not involve matrix inversions (see Hamilton chap. 4 for some approaches).
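Formula (4) can be checked directly in a case where the answer is known in advance. For an AR(1) process with made-up coefficient $\phi$, where $\gamma(h) = \phi^{|h|}\gamma(0)$, the exact one-step linear predictor should put weight $\phi$ on $Y_t$ and zero on all earlier observations (a sketch, not Stata output):

```python
# Check of (4), a = Sigma^{-1} gamma_{t,t+j}, for an AR(1) with j = 1.
import numpy as np

phi, t = 0.7, 8                                  # assumed AR(1) coefficient and sample size
gamma = lambda h: phi ** abs(h) / (1 - phi**2)   # autocovariance (sigma_eps^2 = 1)

# Sigma_{ij} = gamma(i - j), with X = (Y_t, Y_{t-1}, ..., Y_1)
Sigma = np.array([[gamma(i - j) for j in range(t)] for i in range(t)])
rhs = np.array([gamma(1 + i) for i in range(t)])  # (gamma(1), ..., gamma(t))'

a = np.linalg.solve(Sigma, rhs)
print(np.round(a, 8))    # weight phi = 0.7 on Y_t, zero on the rest
```

This reproduces the AR(p) point made just below: for a pure AR process, only the last $p$ observations carry forecast weight.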
The forecast problem for AR(p) is simple. We then only need the $p$ values $Y_t, Y_{t-1}, \ldots, Y_{t-p+1}$ to predict $Y_{t+j}$.
The forecast problem for ARMA(p,q) with $q \ge 1$ is more complicated. We then need the whole history $Y_t, Y_{t-1}, \ldots, Y_1$ to predict $Y_{t+j}$.
To simplify the argument below I make a slightly stronger assumption on the white noise series $\varepsilon_t$, namely that

(5)  $E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots) = 0$ for all $t$ (in the causal case we look at here).

Note that (5) implies the weaker assumption that the $\varepsilon_t$’s are uncorrelated, but uncorrelated $\varepsilon_t$’s do not imply (5), except in the Gaussian case (!).
The (causal) AR(p) case:

(6)  $Y_{t+j} = \phi_0 + \phi_1 Y_{t+j-1} + \cdots + \phi_p Y_{t+j-p} + \varepsilon_{t+j}$

Let $D_t = \{Y_t, Y_{t-1}, \ldots\}$. From (6) we have

(7)  $\hat Y_{t+j|t} = E(Y_{t+j} \mid D_t) = E(\phi_0 + \phi_1 Y_{t+j-1} + \cdots + \phi_p Y_{t+j-p} + \varepsilon_{t+j} \mid D_t)$
   $= \phi_0 + \phi_1 E(Y_{t+j-1} \mid D_t) + \cdots + \phi_p E(Y_{t+j-p} \mid D_t) + E(\varepsilon_{t+j} \mid D_t)$
The solution of (6) (for all $j$) is (where $\mu = E(Y_s)$)

(8)  $Y_{t+j} = \mu + \varepsilon_{t+j} + \psi_1 \varepsilon_{t+j-1} + \psi_2 \varepsilon_{t+j-2} + \cdots$

showing that $D_t$ only depends on $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$. Hence (using (5)) the last term in (7) must be 0, i.e., $E(\varepsilon_{t+j} \mid D_t) = 0$ for $j \ge 1$. Then, from (7),

$\hat Y_{t+j|t} = \phi_0 + \phi_1 \hat Y_{t+j-1|t} + \cdots + \phi_p \hat Y_{t+j-p|t}$

In particular we get (noting that $E(Y_s \mid D_t) = Y_s$ for $s \le t$):

$\hat Y_{t+1|t} = \phi_0 + \phi_1 Y_t + \cdots + \phi_p Y_{t-p+1}$
$\hat Y_{t+2|t} = \phi_0 + \phi_1 \hat Y_{t+1|t} + \phi_2 Y_t + \cdots + \phi_p Y_{t-p+2}$
------
$\hat Y_{t+j|t} = \phi_0 + \phi_1 \hat Y_{t+j-1|t} + \cdots + \phi_p \hat Y_{t+j-p|t}$ for $j > p$
The (causal) ARMA(p,q) case:
Consider the simplest ARMA(0,1) = MA(1) case (see Hamilton chap. 4 for the general case):

(9)  $Y_{t+j} = \theta_0 + \varepsilon_{t+j} + \theta_1 \varepsilon_{t+j-1}$

where we assume that the MA error term is invertible (i.e., $|\theta_1| < 1$).
Here the problem arises for the first forecast,

$\hat Y_{t+1|t} = E(Y_{t+1} \mid D_t) = \theta_0 + E(\varepsilon_{t+1} \mid D_t) + \theta_1 E(\varepsilon_t \mid D_t)$

Since in the causal case (as for the AR(p) case) $D_t$ only depends on $\varepsilon_t, \varepsilon_{t-1}, \ldots$, we must have $E(\varepsilon_{t+1} \mid D_t) = 0$, so we get

(10)  $\hat Y_{t+1|t} = E(Y_{t+1} \mid D_t) = \theta_0 + \theta_1 E(\varepsilon_t \mid D_t)$
Using the invertibility of $\varepsilon_t$, we have from (9)

$\varepsilon_t = -\dfrac{\theta_0}{1+\theta_1} + Y_t - \theta_1 Y_{t-1} + \theta_1^2 Y_{t-2} - \cdots + (-\theta_1)^r Y_{t-r} + \cdots$

showing that

$\hat Y_{t+1|t} = E(Y_{t+1} \mid D_t) = \theta_0 + \theta_1 E(\varepsilon_t \mid D_t)$
  $= \theta_0 + \theta_1\left[-\dfrac{\theta_0}{1+\theta_1} + Y_t - \theta_1 Y_{t-1} + \theta_1^2 Y_{t-2} - \cdots + (-\theta_1)^r Y_{t-r} + \cdots\right]$

An approximation to this is then obtained by truncating the $Y$-series after $r = t - 1$, giving

$\hat Y_{t+1|t} \approx \theta_0 + \theta_1\left[-\dfrac{\theta_0}{1+\theta_1} + Y_t - \theta_1 Y_{t-1} + \theta_1^2 Y_{t-2} - \cdots + (-\theta_1)^{t-1} Y_1\right]$

This approximation will obviously improve as $t$ increases, since $\theta_1^t \xrightarrow[t\to\infty]{} 0$.

For $j \ge 2$, however, the prediction gets easy since then $E(\varepsilon_{t+j} \mid D_t) = E(\varepsilon_{t+j-1} \mid D_t) = 0$, implying simply

$\hat Y_{t+j|t} = \theta_0$
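The truncated MA(1) forecast can be verified numerically. In the sketch below the $\theta$'s are made up; we simulate the MA(1), compute the truncated one-step forecast from the observed $Y$'s only, and compare it with the (in practice unobservable) exact forecast $\theta_0 + \theta_1 \varepsilon_t$:

```python
# Truncated MA(1) one-step forecast vs the exact forecast built from eps_t.
import numpy as np

rng = np.random.default_rng(1)
theta0, theta1 = 1.0, 0.6              # made-up values; invertible since |theta1| < 1
T = 400
eps = rng.standard_normal(T)
Y = theta0 + eps + theta1 * np.concatenate(([0.0], eps[:-1]))

# eps_hat = -theta0/(1+theta1) + Y_t - theta1 Y_{t-1} + theta1^2 Y_{t-2} - ...
weights = (-theta1) ** np.arange(T)            # applied to Y_t, Y_{t-1}, ..., Y_1
eps_hat = -theta0 / (1 + theta1) + weights @ Y[::-1]
yhat_trunc = theta0 + theta1 * eps_hat
yhat_exact = theta0 + theta1 * eps[-1]         # uses the true eps_t

print(abs(yhat_trunc - yhat_exact) < 1e-9)     # True: theta1^T is negligible
```

The truncation error is of order $\theta_1^t$, which with $t = 400$ is far below machine precision, in line with the remark above that the approximation improves as $t$ grows.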
In the general ARMA(p,q) case we get similarly, for $1 \le j \le q$,

$\hat Y_{t+j|t} = \phi_0 + \phi_1 \hat Y_{t+j-1|t} + \cdots + \phi_p \hat Y_{t+j-p|t} + \theta_j \hat\varepsilon_t + \theta_{j+1} \hat\varepsilon_{t-1} + \cdots + \theta_q \hat\varepsilon_{t+j-q}$

where the $\hat\varepsilon_s$’s must be predicted from $Y_s, Y_{s-1}, \ldots$,
but for $j > q$ we get as before

$\hat Y_{t+j|t} = \phi_0 + \phi_1 \hat Y_{t+j-1|t} + \cdots + \phi_p \hat Y_{t+j-p|t}$

Exact predictors can be derived as well (see Hamilton chap. 4), but the formulas are slightly complicated.
3. The property of “mean reversion” for causal stationary processes.

The mean $\mu = E(Y_t)$ in a stationary time series $Y_t$ seems to play the role of an attractor in the series, in the sense that, if an observation of $Y_t$ is far away from $\mu$, the next observation has a tendency to be closer to $\mu$. Some economists call this tendency “mean reversion”.
In prediction the tendency becomes evident:
The general solution of a causal ARMA time series has the form

$Y_t = \mu + \varepsilon_t + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \cdots$

and at time point $t + j$ in the future

(11)  $Y_{t+j} = \mu + \varepsilon_{t+j} + \psi_1 \varepsilon_{t+j-1} + \cdots + \psi_j \varepsilon_t + \psi_{j+1} \varepsilon_{t-1} + \cdots$

Using (5), all $E(\varepsilon_{t+s} \mid D_t) = 0$ for $s > 0$. Hence the forecast becomes

(12)  $\hat Y_{t+j|t} = E(Y_{t+j} \mid D_t) = \mu + \psi_j \varepsilon_t + \psi_{j+1} \varepsilon_{t-1} + \cdots$

Since $\psi_j \to 0$ as $j \to \infty$, we see that the error term will approach 0 as $j$ increases (it is not hard to prove this formally using the Hilbert-space concept of “mean-square” convergence).
Hence $\hat Y_{t+j|t} \to \mu$ as $j$ increases, showing the attractor property of $\mu$.
As a contrast, it is easy to see that random walks do not share the mean
reversion property. Consider the simple RW,
$Y_t = \varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_t$

We get

$Y_{t+j} = \varepsilon_1 + \cdots + \varepsilon_t + \varepsilon_{t+1} + \cdots + \varepsilon_{t+j} = Y_t + \varepsilon_{t+1} + \cdots + \varepsilon_{t+j}$

implying

$\hat Y_{t+j|t} = E(Y_{t+j} \mid D_t) = Y_t$ for all $j > 0$.
Hence, no mean reversion in random walks.
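The contrast can be sketched numerically. The numbers below are made up; for an AR(1), $Y_t = \nu + \phi Y_{t-1} + \varepsilon_t$, the forecast formula $\hat Y_{t+j|t} = \mu + \phi^j (Y_t - \mu)$ is the special case of (12) with $\psi_j = \phi^j$, while the random-walk forecast stays flat at $Y_t$:

```python
# Mean reversion of AR(1) forecasts vs a flat random-walk forecast.
import numpy as np

phi, nu = 0.8, 2.0                       # made-up AR(1) parameters
mu = nu / (1 - phi)                      # stationary mean = 10
y_t = 16.0                               # current value, well above mu

j = np.arange(1, 51)
ar1_fc = mu + (phi ** j) * (y_t - mu)    # Yhat_{t+j|t} = mu + phi^j (Y_t - mu)
rw_fc = np.full(j.shape, y_t)            # random walk: Yhat_{t+j|t} = Y_t

print(round(ar1_fc[0], 2))               # 14.8: one step closer to mu
print(round(ar1_fc[-1], 2))              # 10.0: essentially at mu after 50 steps
print(rw_fc[-1])                         # 16.0: no mean reversion
```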
Final notes on forecasting.
See the example of forecasting the Norwegian lnGDP in lecture notes 3 (LN3). There we estimated the model on a reduced data set, leaving the last observations to be predicted. To achieve this we could use the Stata arima post-estimation command predict. The Stata manual did the same thing for the example we started with (see the pdf manual).
However, suppose we have a covariate time series $x_t$ as in the example, have used the whole series in estimation, and want to forecast future values beyond the data. Then the predict command cannot be used. We have to use the forecast command instead (as illustrated in the exercises for seminar 1). In order to utilize the covariate $x_t$ in the forecasting, we need to forecast $x_{t+j}$ first (using forecast) and use the predicted values of $x_{t+j}$ in the equation as a basis for forecasting $y_{t+j}$, again using forecast.
Suppose we have estimated an ARIMA(p,1,q) for $y_t$, which is the same as estimating an ARMA(p,q) for $\Delta y_t$. Having predicted values $\widehat{\Delta y}_s$, we can calculate predicted values for the original series $y_t$ by cumulative sums (called “integration” in the cointegration literature):

$\hat y_{t+1} = y_t + \widehat{\Delta y}_{t+1},\quad \hat y_{t+2} = y_t + \widehat{\Delta y}_{t+1} + \widehat{\Delta y}_{t+2},\quad \ldots$

This is done automatically in Stata by the option y for predict.
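The "integration" step is just a cumulative sum added to the last observed level. A minimal sketch with made-up numbers:

```python
# Integrating forecasts of the differenced series back to levels.
import numpy as np

y_last = 100.0                              # last observed level y_t (made up)
dy_hat = np.array([1.5, 1.2, 1.0, 0.8])     # forecasts of Delta y_{t+1..t+4} (made up)
y_hat = y_last + np.cumsum(dy_hat)          # forecast levels
print(y_hat)                                # [101.5 102.7 103.7 104.5]
```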
4. Introduction to VAR(p)
(An excellent reference for multivariate time series is H. Lütkepohl, “New Introduction to Multiple Time Series Analysis”, Springer Verlag, 2005.)
I use Lütkepohl’s notation in the following.
Let

$y_t = \begin{pmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{Kt} \end{pmatrix}$

be a $K$-dimensional vector of time series.
DEF. We say that $y_t$ is a vector autoregressive time series of order $p$ ($y_t \sim \mathrm{VAR}(p)$) if

(13)  $y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t$

where $\nu = (\nu_1, \nu_2, \ldots, \nu_K)'$ is a vector of constants, $A_1, A_2, \ldots, A_p$ are square $K \times K$ coefficient matrices, and $u_t = (u_{1t}, u_{2t}, \ldots, u_{Kt})'$ is a $K$-dimensional white noise vector satisfying

$E(u_t) = 0, \quad E(u_t u_t') = \Sigma_u, \quad E(u_t u_s') = 0 \ \text{for } t \ne s$

(i.e., $u_t$ is a special case of a covariance stationary vector process).
DEF. We say that $y_t$ is a structural vector autoregressive time series of order $p$ ($y_t \sim \mathrm{SVAR}(p)$) if there is a non-singular $K \times K$ matrix $B$ such that

(14)  $B y_t = \nu^* + A_1^* y_{t-1} + \cdots + A_p^* y_{t-p} + \varepsilon_t$

where $\varepsilon_t \sim \mathrm{WN}(0, \Sigma_\varepsilon)$.
Multiplying (14) by $B^{-1}$ we get the reduced form (13) with $\nu = B^{-1}\nu^*$, $A_j = B^{-1}A_j^*$ for $j = 1, \ldots, p$, and $u_t = B^{-1}\varepsilon_t \sim \mathrm{WN}(0, \Sigma_u)$, where $\Sigma_u = B^{-1}\Sigma_\varepsilon (B^{-1})'$. Although important in dynamic modeling, we will not discuss SVAR models further in this part of the course.
The VAR(1) case: We will first look at the VAR(1) model

(15)  $y_t = \nu + A_1 y_{t-1} + u_t$

Note. It turns out (see below) that the more general VAR(p) can be considered a special case of VAR(1)!
Successive substitution in (15) gives (where $I_K$ is the $K$-dimensional identity matrix)

(16)  $y_t = (I_K + A_1 + \cdots + A_1^{t-1})\nu + A_1^t y_0 + u_t + A_1 u_{t-1} + \cdots + A_1^{t-1} u_1$

For this to stabilize to something stationary, we must have that $A_1^t$ converges to a zero matrix (written $A_1^t \to 0$, i.e., a $K \times K$ matrix of zeroes). This happens if and only if all eigenvalues of $A_1$ have modulus strictly less than 1:
[Review of eigenvalues and eigenvectors. Let $A$ be a square $K \times K$ matrix. If $b$ is a vector $\ne 0$ and $\lambda$ a scalar such that $Ab = \lambda b$ (which is the same as $(A - \lambda I_K) b = 0$), we say that $b$ is an eigenvector and $\lambda$ a corresponding eigenvalue.
For the equation $(A - \lambda I_K) b = 0$ to be possible for a $b \ne 0$, the matrix $A - \lambda I_K$ must be singular with determinant = 0, i.e.,

$\det(A - \lambda I_K) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2K} \\ \vdots & \vdots & & \vdots \\ a_{K1} & a_{K2} & \cdots & a_{KK} - \lambda \end{vmatrix} = p(\lambda)$,

i.e., a polynomial of order $K$. $p(\lambda)$ must have $K$ roots, $\lambda_1, \lambda_2, \ldots, \lambda_K$, making $p(\lambda_i) = 0$, some of which may be complex.
Assume for simplicity that all $\lambda_1, \lambda_2, \ldots, \lambda_K$ are different and let $b_1, b_2, \ldots, b_K$ be the corresponding eigenvectors. Then

(17)  $A b_i = \lambda_i b_i$ for $i = 1, 2, \ldots, K$

Collecting all $b_i$ in a matrix, $B = (b_1, b_2, \ldots, b_K) \sim K \times K$, it can be shown that $B$ must be non-singular, and (17) becomes

(18)  $AB = (Ab_1, Ab_2, \ldots, Ab_K) = (\lambda_1 b_1, \lambda_2 b_2, \ldots, \lambda_K b_K) = B\Lambda$,

where

$\Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_K \end{pmatrix}$

is a diagonal matrix.
Multiplying (18) by $B^{-1}$ from the right gives

(19)  $A = B \Lambda B^{-1}$

Using (19), we get $A^2 = B\Lambda B^{-1}\, B\Lambda B^{-1} = B\Lambda I_K \Lambda B^{-1} = B\Lambda^2 B^{-1}$, $A^3 = B\Lambda^3 B^{-1}$, etc., and in general

(20)  $A^j = B \Lambda^j B^{-1}$
On the other hand, regular matrix multiplication gives

$\Lambda^2 = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_K^2 \end{pmatrix}$, and in general, $\Lambda^j = \begin{pmatrix} \lambda_1^j & 0 & \cdots & 0 \\ 0 & \lambda_2^j & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_K^j \end{pmatrix}$

So, if and only if all $|\lambda_i| < 1$, $\Lambda^j \xrightarrow[j\to\infty]{} 0$ (i.e., the zero matrix).
From (20) we then have

$\lim_{j\to\infty} A^j = \lim_{j\to\infty} B \Lambda^j B^{-1} = B \big(\lim_{j\to\infty} \Lambda^j\big) B^{-1} = 0$

(21) Basic stability condition for VAR(1):
In other words, the condition that $A^j \xrightarrow[j\to\infty]{} 0$ is that all eigenvalues of $A$ have modulus strictly less than 1.

Note. If some of the $\lambda_i$’s are equal, (20) is still valid, but with a slightly more complicated $\Lambda$ (“Jordan form”). $\Lambda^j$ can still be calculated, and the conclusion for stability is the same.
End of review. ]
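Condition (21) is easy to verify numerically for a given matrix. The sketch below uses an arbitrary made-up $2 \times 2$ matrix: we check that all eigenvalue moduli are below 1 and that a high power of the matrix is essentially zero:

```python
# Numerical check of the stability condition (21): A^j -> 0 iff all
# eigenvalues of A have modulus < 1.
import numpy as np

A = np.array([[0.5, 0.3],
              [0.2, 0.4]])                 # made-up example; eigenvalues 0.7 and 0.2
moduli = np.abs(np.linalg.eigvals(A))
print(np.all(moduli < 1))                  # True: stable

Aj = np.linalg.matrix_power(A, 200)
print(np.allclose(Aj, 0))                  # True: A^200 is numerically zero
```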
We need another matrix formula.

Lemma 1. Let $A$ be $K \times K$ satisfying the stability condition (21). Then
i. $I_K + A + A^2 + \cdots + A^{t-1} = (I_K - A)^{-1}(I_K - A^t)$
ii. The infinite matrix series converges as $t \to \infty$ and is equal to

$\sum_{j=0}^{\infty} A^j = I_K + A + A^2 + \cdots = (I_K - A)^{-1}$
[Proof. Usual matrix multiplication gives

$(I_K - A)(I_K + A + A^2 + \cdots + A^{t-1}) = I_K + A + \cdots + A^{t-1} - A - A^2 - \cdots - A^t = I_K - A^t$

We must have that $I_K - A$ is non-singular (i.e., $\det(I_K - A) \ne 0$), since 1 is not an eigenvalue of $A$. Then, multiplying both sides of the equality by $(I_K - A)^{-1}$ gives i.
ii.: By definition we have

$\sum_{j=0}^{\infty} A^j \overset{\text{Def}}{=} \lim_{t\to\infty}\big(I_K + A + A^2 + \cdots + A^{t-1}\big) = \lim_{t\to\infty}(I_K - A)^{-1}(I_K - A^t) = (I_K - A)^{-1}$

End of proof. ]
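Lemma 1 ii. is the matrix analogue of the scalar geometric series, and can be checked numerically for an arbitrary stable matrix (the truncation at $j = 200$ is harmless since $A^j \to 0$ geometrically):

```python
# Check of Lemma 1 ii: sum_j A^j = (I - A)^{-1} for a stable A.
import numpy as np

A = np.array([[0.5, 0.3],
              [0.2, 0.4]])                 # made-up stable matrix (eigenvalues < 1)
partial = sum(np.linalg.matrix_power(A, j) for j in range(200))
inv = np.linalg.inv(np.eye(2) - A)
print(np.allclose(partial, inv))           # True
```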
Under stability (all eigenvalues of $A_1$ have modulus less than 1) the solution (16) (reproduced)

(16)  $y_t = (I_K + A_1 + \cdots + A_1^{t-1})\nu + A_1^t y_0 + u_t + A_1 u_{t-1} + \cdots + A_1^{t-1} u_1$

will approach a stationary process as $t \to \infty$:

The first term: $(I_K + A_1 + \cdots + A_1^{t-1})\nu \xrightarrow[t\to\infty]{} (I_K - A_1)^{-1}\nu$
The second term $\to 0$.
The third term converges to a stationary process (when we imagine that the white noise process $u_t$ has been going on since $-\infty$):
Theorem 4. If all eigenvalues of $A_1$ have modulus less than 1, then
i. $z_t = u_t + A_1 u_{t-1} + A_1^2 u_{t-2} + \cdots = \sum_{j=0}^{\infty} A_1^j u_{t-j}$ is a well-defined random variable and the time series $z_t$ is stationary with $E(z_t) = 0$ and autocovariance matrices

$\Gamma_z(h) = E(z_{t+h} z_t') = \sum_{j=0}^{\infty} A_1^{j+h} \Sigma_u (A_1^j)'$

ii. The VAR(1) equation, $y_t = \nu + A_1 y_{t-1} + u_t$, has a causal stationary solution

$y_t = \mu + u_t + A_1 u_{t-1} + A_1^2 u_{t-2} + \cdots = \mu + \sum_{j=0}^{\infty} A_1^j u_{t-j}$

where $\mu = E(y_t) = (I_K - A_1)^{-1}\nu$, and autocovariance matrices

$\Gamma_y(h) = E[(y_{t+h} - \mu)(y_t - \mu)'] = \sum_{j=0}^{\infty} A_1^{j+h} \Sigma_u (A_1^j)'$

(The proof of this follows from elementary Hilbert space theory combined with the concept of “mean square convergence” of random variables – a convergence concept that is slightly stronger than convergence in probability and/or distribution.)

Note that the autocovariance matrices $\Gamma_y(h)$ do not depend on $t$, implying covariance stationarity of the $y_t$ series.
So, Theorem 4 shows that the stability condition implies the existence of a causal stationary solution for the VAR(1) model.
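The mean formula in Theorem 4 ii. can be checked by simulation. The $A_1$ and $\nu$ below are arbitrary made-up values (with $u_t \sim \mathrm{WN}(0, I_2)$); the sample mean of a long simulated path should be close to $\mu = (I_K - A_1)^{-1}\nu$:

```python
# Simulation check of Theorem 4 ii: stationary mean mu = (I - A1)^{-1} nu.
import numpy as np

rng = np.random.default_rng(2)
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])                 # made-up stable coefficient matrix
nu = np.array([1.0, 2.0])                   # made-up constant vector
mu = np.linalg.solve(np.eye(2) - A1, nu)    # theoretical mean

T, burn = 100_000, 500
u = rng.standard_normal((T + burn, 2))      # u_t ~ WN(0, I_2)
y = np.zeros(2)
acc = np.zeros(2)
for t in range(T + burn):
    y = nu + A1 @ y + u[t]                  # VAR(1) recursion (15)
    if t >= burn:                           # discard burn-in before averaging
        acc += y
print(np.max(np.abs(acc / T - mu)) < 0.1)   # True: sample mean close to mu
```

The burn-in discards the influence of the arbitrary starting value $y_0 = 0$, mirroring the $A_1^t y_0 \to 0$ term in (16).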
5. The stability condition for general VAR(p) models.

We first need the following useful lemma.
Notation. For any square matrix $B$, write the determinant $\det(B) = |B|$.

Lemma 2. The stability condition for VAR(1) (i.e., that all eigenvalues of $A_1$ have modulus less than 1) is equivalent to the following condition

(22)  $|I_K - A_1 z| \ne 0$ for all $z$ such that $|z| \le 1$

where $z$ is a scalar (possibly complex) variable.
[Proof. Write (22) (remember that a constant taken out of a determinant must be raised to the power of $K$):

$|I_K - A_1 z| = \big|(-z)\big(A_1 - \tfrac{1}{z} I_K\big)\big| = (-z)^K \,\big|A_1 - \tfrac{1}{z} I_K\big|$

(We can exclude the possibility $z = 0$ since $|I_K| = 1 \ne 0$.) Hence (22) is equivalent to

(23)  $\big|A_1 - \tfrac{1}{z} I_K\big| \ne 0$ for all $|z| \le 1$

Assume that all eigenvalues of $A_1$ have modulus strictly less than 1. Then we must have $|A_1 - \lambda I_K| \ne 0$ for any $\lambda$ with $|\lambda| \ge 1$, i.e., for any $z$ with $\big|\tfrac{1}{z}\big| \ge 1$, i.e., for any $z$ with $|z| \le 1$, so (23), and therefore (22), must be true.
Assume now that (22) is true. Then (23) is true, implying that $\big|A_1 - \tfrac{1}{z} I_K\big| = 0$ can have no solution with $\big|\tfrac{1}{z}\big| \ge 1$. In other words, the equation $|A_1 - \lambda I_K| = 0$ can have no solution with $|\lambda| \ge 1$. So all solutions (i.e., eigenvalues) must fulfill $|\lambda| < 1$, which is the stability condition.
End of proof. ]
Any VAR(p) can be formulated as a VAR(1):
If $y_t$ is a $K$-dimensional VAR(p):

(24)  $y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t$

we can define a $Kp$-dimensional VAR(1), $Y_t = \boldsymbol{\nu} + \mathbf{A} Y_{t-1} + U_t$, by putting

$Y_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix} \sim Kp \times 1$, $\quad\boldsymbol{\nu} = \begin{pmatrix} \nu \\ 0 \\ \vdots \\ 0 \end{pmatrix} \sim Kp \times 1$,

$\mathbf{A} = \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I_K & 0 & \cdots & 0 & 0 \\ 0 & I_K & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_K & 0 \end{pmatrix} \sim Kp \times Kp$, $\quad U_t = \begin{pmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{pmatrix} \sim Kp \times 1$

or

$\begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix} = \begin{pmatrix} \nu \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I_K & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_K & 0 \end{pmatrix} \begin{pmatrix} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{pmatrix} + \begin{pmatrix} u_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}$
The matrix $\mathbf{A}$ is sometimes called the companion matrix of the VAR(p).
Some manipulation of determinants (as done in Hamilton for the univariate AR(p) case) shows

(25)  $\det(I_{Kp} - \mathbf{A} z) = \det(I_K - A_1 z - \cdots - A_p z^p)$
From Lemma 2 we get our stability condition for the VAR(p) in (24):

(26)  The VAR(p), $y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t$, is stable if and only if
$\det(I_K - A_1 z - \cdots - A_p z^p) \ne 0$ for all $|z| \le 1$

The polynomial $\det(I_K - A_1 z - \cdots - A_p z^p)$ we may call the companion polynomial of the VAR(p) process.
The criterion (26) then says that the VAR(p) process is stable if and only if all roots of the companion polynomial $\det(I_K - A_1 z - \cdots - A_p z^p)$ are outside the unit circle (as in the univariate case).
Note also that, because of Theorem 4, the stability of a VAR(p) implies (via its VAR(1) representation) that it has a causal stationary solution which has an MA($\infty$) form.
Knowing that $y_t$ is stationary, the expected value, $\mu = E(y_t)$, is easily found by taking expectations of $y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t$ directly, giving

$\mu = \nu + A_1\mu + \cdots + A_p\mu$, or $(I_K - A_1 - \cdots - A_p)\mu = \nu$,

giving

(27)  $\mu = E(y_t) = (I_K - A_1 - \cdots - A_p)^{-1}\nu$

To find the MA($\infty$) solution for $y_t$, we may use the VAR(1) representation and Theorem 4. Introduce the $K \times Kp$ matrix

$J = (I_K : 0 : \cdots : 0)$

Then $y_t = J Y_t$, and from Theorem 4 (putting $\boldsymbol{\mu} = E(Y_t)$)

$y_t = J Y_t = J\boldsymbol{\mu} + J \sum_{j=0}^{\infty} \mathbf{A}^j U_{t-j} = \mu + \sum_{j=0}^{\infty} J \mathbf{A}^j U_{t-j}$
Example 1 (taken from Lütkepohl):
Consider the bivariate VAR(2) model

$y_t = \begin{pmatrix} .5 & .1 \\ .4 & .5 \end{pmatrix} y_{t-1} + \begin{pmatrix} 0 & 0 \\ .25 & 0 \end{pmatrix} y_{t-2} + u_t$

Is this a stable (and a causal stationary) process?
The companion polynomial becomes

$\det\left[\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} .5 & .1 \\ .4 & .5 \end{pmatrix} z - \begin{pmatrix} 0 & 0 \\ .25 & 0 \end{pmatrix} z^2\right] = \det\begin{pmatrix} 1 - .5z & -.1z \\ -.4z - .25z^2 & 1 - .5z \end{pmatrix} = 1 - z + 0.21 z^2 - 0.025 z^3$

with roots $z_1 = 1.3$, $z_2 = 3.55 + 4.26i$, $z_3 = 3.55 - 4.26i$.
Since $|z_2| = |z_3| = \sqrt{3.55^2 + 4.26^2} \approx 5.545$,
we see that all roots are outside the unit circle, so the process is stable and a causal stationary solution exists. (End of example.)
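Example 1 can be verified numerically in two equivalent ways (a sketch, not from the original notes): via the eigenvalues of the $4 \times 4$ companion matrix (which are the reciprocals of the roots of the companion polynomial, by (25)), and via the polynomial roots directly:

```python
# Stability check of Example 1 via the companion matrix and the polynomial.
import numpy as np

A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
A2 = np.array([[0.0, 0.0],
               [0.25, 0.0]])
A = np.zeros((4, 4))
A[:2, :2], A[:2, 2:] = A1, A2
A[2:, :2] = np.eye(2)                       # companion matrix [[A1, A2], [I, 0]]

moduli = np.abs(np.linalg.eigvals(A))
print(np.all(moduli < 1))                   # True: stable
print(round(moduli.max(), 3))               # 0.769, i.e. approx 1/1.3

# Same conclusion from 1 - z + 0.21 z^2 - 0.025 z^3 (coefficients, highest power first):
roots = np.roots([-0.025, 0.21, -1.0, 1.0])
print(round(np.abs(roots).min(), 2))        # 1.3: all roots outside the unit circle
```

Both checks agree with the hand calculation: the smallest root modulus is 1.3 > 1, equivalently the largest companion eigenvalue modulus is about 0.769 < 1.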
(A little bit on forecasting and estimation comes next lecture.)