Master in Business Administration and Quantitative Methods
FINANCIAL ECONOMETRICS
2009-2010
ESTHER RUIZ
CHAPTER 1. BASIC CONCEPTS IN TIME SERIES ANALYSIS
1.1 Characteristics of time series: Stationary stochastic processes
A time series is a succession of observations ordered in time, usually (though not necessarily) equally spaced:
y_1, y_2, ..., y_T
At each point in time, the observation can correspond to a unique variable or to several variables. In the
former case, we have what is known as a univariate series and in the latter a multivariate series. For the
moment, our focus is on univariate series.
Examples of real time series:
[Figure: Daily Euro-Dollar exchange rate, 4 January 1999 to 25 May 2005]
[Figure: Quarterly GDP, Europe, 1991Q1 to 2004Q3]
[Figure: Monthly CPI, Europe, January 1991 to December 2004]
The main objectives when analysing a univariate time series are:
i) Description of the dynamic properties of the series: trend, seasonal components,
dependencies with the past
ii) Prediction of future values of the series
To achieve these objectives, we have to take into account that the observations of a variable obtained at
different moments of time have two main properties:
i) They are dependent.
ii) They are obtained in a changing context.
As a consequence, we cannot assume, as in the classical framework, that we have T independent observations of a given random variable. Consider, for example, the observations of the CPI above. The observations clearly depend on each other. Moreover, we want to analyse this dependency in order to extrapolate it into the future:
y_t = f(y_{t-1}, ..., y_1) + a_t
where f(y_{t-1}, ..., y_1) is the predictable part and a_t the unexpected part.
On the other hand, the CPI in January 1991 is not observed in the same circumstances as the CPI in December 2004. Therefore, we cannot assume that we have 168 observations of a random variable with mean µ and variance σ².
Alternatively, we assume that the observed time series is the realization of a stochastic process. A
stochastic process is a succession of random variables ordered in time (they can be ordered with
alternative indexes)
Y(1), Y(2), ..., Y(T)
Example:
The stochastic process can generate in theory an infinite number of realizations over the period t=1,…,T.
Y(1)      Y(2)      ...   Y(T)
 ↓         ↓               ↓
y_1^(1)   y_2^(1)   ...   y_T^(1)
y_1^(2)   y_2^(2)   ...   y_T^(2)
y_1^(3)   y_2^(3)   ...   y_T^(3)
 ...       ...      ...    ...
When several realizations are available, the mean of each variable of the process, µ_t = E(Y(t)), can be estimated by the ensemble average
µ̂_t = (1/m) Σ_{j=1}^m y_t^(j)
where y_t^(j) denotes the jth observation on y_t and m is the number of realizations. However, in most empirical problems, only a single realization is available. Each observation in our time series is a realization of one of the random variables of the process.
Y(1)   Y(2)   ...   Y(T)
 ↓      ↓            ↓
y_1    y_2    ...   y_T
Consequently, we have to restrict the properties of the process to carry out inference.
To allow estimation, we need to restrict the process to be stationary. There are two main concepts of
stationarity: strict and weak stationarity.
Strict stationarity: The process {Y(1), Y(2), ..., Y(T)} is said to be strictly stationary if the joint distribution of {Y(t_1), Y(t_2), ..., Y(t_k)} is identical to that of {Y(t_1+h), Y(t_2+h), ..., Y(t_k+h)} for all h, where k is an arbitrary positive integer and (t_1, t_2, ..., t_k) is a collection of k positive integers. Therefore, strict stationarity requires that the distribution of {Y(t_1), Y(t_2), ..., Y(t_k)} is invariant under time shifts. For example,
f(y_1) = f(y_2) = ... = f(y_T)
f(y_1, y_2) = f(y_2, y_3) = ... = f(y_7, y_8) = ...
f(y_1, y_6) = f(y_2, y_7) = ... = f(y_6, y_11) = ...
Weak stationarity: The process {Y(1), Y(2), ..., Y(T)} is said to be weakly stationary if:
i) E(Y(t)) = µ_y, for all t
ii) Var(Y(t)) = σ²_y, for all t
iii) Cov(Y(t), Y(t+h)) = γ_y(h), for all t
The first condition implies that the mean of each random variable in the process is the same regardless of
the particular random variable chosen. Even if we have a collection of random variables, as all of them
have the same mean, we can use the sample mean to estimate their common mean as follows:
µ̂ = ȳ = (1/T) Σ_{t=1}^T y_t
The other two conditions have similar interpretations. Given that all the random variables of the process
have the same variance, we can use the information contained in all of them to estimate it. In particular,
we estimate σ² by
σ̂² = s²_y = (1/T) Σ_{t=1}^T (y_t - ȳ)²
Finally, the third condition tells us that the dependency between any two random variables of the process
depends on the distance between them but not on the moment of time in which we observe them. The
relationship between two observations separated by, for example, one period of time is the same at the
beginning and at the end of the sample:
Cov(Y(1), Y(2)) = Cov(Y(300), Y(301))
This condition allows us to use the sample covariances to estimate the linear relationship between
observations separated by h periods of time as follows:
γ̂(h) = c(h) = (1/T) Σ_{t=h+1}^T (y_t - ȳ)(y_{t-h} - ȳ)
Therefore, weak stationarity requires that the first two moments of the process are time invariant. It is easy to see that weak stationarity does not imply strict stationarity, as the latter requires that the whole distribution is invariant. However, it is important to note that strict stationarity implies weak stationarity if the variance of the process is finite. In the context of non-linear, non-Gaussian models, there are examples of strictly stationary processes without finite variance that are not weakly stationary.
When {Y(1), Y(2), ..., Y(T)} has a joint multivariate Normal distribution, the process is said to be Gaussian. In this case, the distribution of every subset {Y(t_1), Y(t_2), ..., Y(t_k)} is also multivariate Normal. In particular, the marginal distribution of each variable of the process is Normal. Given that the Normal distribution is characterized by its first two moments, weak stationarity implies strict stationarity. In this case, weak and strict stationarity coincide.
1.2 Transformations to stationarity
When a series of observations is generated by a stationary process, the observations fluctuate around a constant level and there is no tendency for their spread to increase or decrease over time. Consider, for example, the following series generated by a stationary process known as white noise. A white noise process is a particular case of a stationary process with µ = 0 and γ(h) = 0 for all h ≠ 0.
[Figure: Gaussian white noise process with variance 1]
However, in practice, most real economic time series do not fluctuate around a constant level, but instead
show some kind of systematic upward or downward movement known as trend. Therefore, their marginal
means are not constant over time and, consequently, the series is not stationary.
Example:
[Figure: Quarterly GDP, Europe, 1991Q1 to 2004Q3]
Furthermore, in many macroeconomic time series, we observe not only trends but also that the spread of the series around its mean increases with the level of the series. In this case, neither the marginal means nor the marginal variances are constant.
[Figure: Monthly Spanish exports, January 1960 to July 2004]
Although real time series are usually non-stationary, appropriate transformations often render them stationary. Next, we describe some of these transformations. One of the most popular, used to stabilize the variance when it increases with the level of the series, is the logarithmic transformation. Consider the previous example of monthly exports in Spain. After taking logs, the series is
[Figure: Monthly Spanish exports in logs, January 1960 to July 2004]
We focus now on transformations to stabilize the mean. Consider first that the series of interest is stationary with marginal mean µ. One possible representation for this series is
y_t = µ + ε_t
where ε_t is a stationary process with zero mean. If the mean of y_t is not constant but evolves over time, then we can represent y_t by the following model:
y_t = µ_t + ε_t
µ_t = µ_{t-1} + η_t
where η_t is a NID(0, σ²_η) process independent of ε_t. The process µ_t is known as a random walk. Note that when σ²_η = 0, we go back to stationarity. In general, if σ²_η ≠ 0, the level of the series evolves over time without a trend.
[Figure: Random walk plus noise with σ_ε = 1 and σ_η = 0]
[Figure: Random walk plus noise with σ_ε = 1 and σ_η = 1]
[Figure: Random walk plus noise with σ_ε = 1 and σ_η = 0.1]
[Figure: Random walk plus noise with σ_ε = 1 and σ_η = 0.5]
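Series like the ones in the figures above can be simulated along the following lines; this is a minimal sketch (the function name and default values are illustrative, not the code used to produce the figures):

```python
import numpy as np

def simulate_rw_plus_noise(T=200, sig_eps=1.0, sig_eta=1.0, seed=0):
    """Simulate y_t = mu_t + eps_t with mu_t = mu_{t-1} + eta_t (random walk plus noise)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sig_eta, T)   # level disturbances eta_t
    eps = rng.normal(0.0, sig_eps, T)   # noise eps_t
    mu = np.cumsum(eta)                 # random walk level mu_t
    return mu + eps

y = simulate_rw_plus_noise(sig_eta=0.1)   # compare sig_eta=0 (stationary) with sig_eta=1
```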
In this case, if we take first differences,
∆y_t = y_t - y_{t-1} = µ_t - µ_{t-1} + ε_t - ε_{t-1} = η_t + ε_t - ε_{t-1}
E{∆y_t} = E{η_t + ∆ε_t} = 0
which is a stationary process. When a non-stationary process becomes stationary after taking first differences, the process is said to be integrated of order 1, I(1).
Example:
[Figure: Daily Euro-Dollar exchange rate, 4 January 1999 to 25 May 2005]
[Figure: First differences of logs of the Euro-Dollar exchange rate: returns]
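In practice, the returns plotted above are simply the first differences of the logged series; a minimal sketch, assuming the observations are stored in a NumPy array (the values below are illustrative, not the actual exchange-rate data):

```python
import numpy as np

prices = np.array([1.18, 1.19, 1.21, 1.20, 1.22])   # illustrative values only
returns = np.diff(np.log(prices))                    # log-returns: first differences of logs
```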
If we have a series with a trend with constant slope, the following representation is appropriate:
y_t = µ_t + ε_t
µ_t = µ_{t-1} + β + η_t
where β is the rate of growth of the trend.
[Figure: Random walk with drift plus noise with σ_ε = 1 and σ_η = 1]
In this case, taking first differences,
E{∆y_t} = E{η_t + β + ∆ε_t} = β
so the differenced series is also stationary. Therefore, we again have an I(1) series. Note that introducing a constant in a time series model can completely change the long-run behaviour of the series.
[Figure: First differences of random walk with drift plus noise]
Finally, we can also observe series in which the rate of growth (slope) of the trend changes over time. Following the same arguments as before, we can model this behaviour as follows:
y_t = µ_t + ε_t
µ_t = µ_{t-1} + β_{t-1} + η_t
β_t = β_{t-1} + ξ_t
where ξ_t is a white noise process with variance σ²_ξ independent of ε_t and η_t. In this case, if we take first differences, the series is still non-stationary:
E{∆y_t} = E{η_t + β_{t-1} + ∆ε_t} = β_{t-1}
[Figure: Stochastic trend model with σ_ε = 1, σ_η = 1 and σ_ξ = 0.5]
[Figure: First differences of stochastic trend model]
However, taking a second difference,
∆²y_t = ∆(∆y_t) = ∆(η_t + β_{t-1} + ∆ε_t) = η_t - η_{t-1} + β_{t-1} - β_{t-2} + ε_t - 2ε_{t-1} + ε_{t-2} = η_t - η_{t-1} + ξ_{t-1} + ε_t - 2ε_{t-1} + ε_{t-2}
E(∆²y_t) = E(η_t - η_{t-1} + ξ_{t-1} + ε_t - 2ε_{t-1} + ε_{t-2}) = 0
Therefore, by taking two differences, we obtain a stationary series with zero mean.
[Figure: Second differences of stochastic trend model]
Example:
[Figure: Monthly European M3]
[Figure: First differences of logarithmic M3]
[Figure: Seasonal and regular differences of logarithmic M3]
Finally, when both parameters σ²_η = σ²_ξ = 0, we obtain a deterministic trend. In this case, it is possible to fit a regression model, and the stationary transformation is the series of residuals from this model.
[Figure: Deterministic trend plus noise]
After fitting a deterministic trend, we obtain the following results:
[Estimation results of the deterministic trend regression not reproduced]
The residuals from this model are the corresponding stationary transformation:
[Figure: Residuals from the deterministic trend regression]
1.3 Wold theorem: linear ARMA models
The objective of time series analysis is to decompose the observed values of a time series into a component that depends on past observations (and can be predicted) and an unexpected component:
y_t = f(y_{t-1}, ..., y_1) + a_t
The process {a_t} is known as the innovation. By definition, it should not be related to the past.
Wold Theorem: If a process is stationary and has no deterministic components, then
y_t = Σ_{i=0}^∞ Ψ_i a_{t-i}
where Ψ_0 = 1, Σ_{i=0}^∞ Ψ_i² < ∞ and {a_t} is a white noise process, i.e. an uncorrelated sequence. When the variables of the process {a_t} are independent, it is called strict white noise.
The usefulness of the Wold Theorem is that it allows us to approximate the dynamic evolution of a variable y_t by a linear model. If the innovations {a_t} are independent, then the linear model is the only possible representation. However, when {a_t} is merely an uncorrelated but not independent sequence, the linear model exists but is not the unique representation of the dynamic dependence of the series. In the latter case, it is possible that the linear model is not very useful, and there may be a nonlinear model relating the observed value of y_t with its past evolution.
The Wold representation depends on an infinite number of parameters and, consequently, is not useful in practice. To solve this problem, it is approximated by models with a finite number of parameters:
y_t = Ψ_∞(L) a_t
where Ψ_∞(L) = 1 + Ψ_1 L + Ψ_2 L² + ... and L is the lag operator such that L x_t = x_{t-1}. The infinite order polynomial Ψ_∞(L) is approximated by the following ratio:
Ψ_∞(L) = Θ_q(L) / Φ_p(L)
where Θ_q(L) = 1 - θ_1 L - ... - θ_q L^q and Φ_p(L) = 1 - φ_1 L - ... - φ_p L^p. The stationarity condition is that the roots of the autoregressive polynomial Φ_p(L) are outside the unit circle. Furthermore, for identification, we assume that the two polynomials have no common roots. The resulting model is known as the ARMA(p,q) model and is given by
y_t = φ_1 y_{t-1} + ... + φ_p y_{t-p} + a_t - θ_1 a_{t-1} - ... - θ_q a_{t-q}
This approximation is valid in a large proportion of cases. However, there are stationary models that
cannot be approximated by using ARMA models. This is the case of long-memory series.
Once we have the ARMA model, we need to decide how to choose p and q suitable for the series of
interest.
1.4 The autocorrelation function and the correlogram
As we mentioned above, the main objective in the analysis of univariate time series is to decompose the observed series into the part that depends on the past (and can consequently be predicted) and the unexpected part:
y_t = f(y_{t-1}, ..., y_1) + a_t
where f(y_{t-1}, ..., y_1) is the predictable part and a_t the unexpected part.
Because of the Wold theorem, we can focus, as a first approximation, on linear dynamic dependencies (f is linear). In this case, the autocovariances are the main instrument available to analyze these dependencies. The autocovariance of order h is given by
γ(h) = E{(y_t - µ)(y_{t-h} - µ)} = E{y_t y_{t-h}} - µ²
and the corresponding autocorrelation is given by
ρ(h) = γ(h) / γ(0).
Note that we are assuming that the autocorrelations are constant over time. This assumption is needed to estimate them using the sample autocorrelations computed in the usual way. As a consequence of stationarity, ρ(h) = ρ(-h), so it is unnecessary to consider negative values of h. To identify the model appropriate for a given time series, we match the shape of its estimated acf (the correlogram) with the shapes of the theoretical acf's implied by the alternative ARMA models. However, we should take into account the sampling properties of the estimators of the autocorrelations. To estimate the autocorrelation of order h, we use its sample analogue
r(h) = Σ_{t=h+1}^T (y_t - ȳ)(y_{t-h} - ȳ) / Σ_{t=1}^T (y_t - ȳ)²
If {y_t} is an iid sequence with finite variance, then r(h) is asymptotically normal with mean zero and variance 1/T. This result can be used to build the Bartlett confidence bounds for the estimated autocorrelations.
Finally, if we want to test jointly that the first m autocorrelations are zero, i.e.
H_0: ρ(1) = ρ(2) = ... = ρ(m) = 0,
then we can use the Ljung-Box-Pierce statistic given by
Q(m) = T(T+2) Σ_{h=1}^m r(h)² / (T - h)
If {y_t} is an iid sequence with finite fourth order moment, then Q(m) has asymptotically a chi-squared distribution with m degrees of freedom. In practice, the selection of m affects the finite sample performance of the statistic. Simulation studies suggest that the choice m ≈ log(T) provides the best power.
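A minimal sketch computing the correlogram, the Bartlett bounds and the Ljung-Box-Pierce statistic directly from the formulas above (the array y below is a placeholder for the series of interest):

```python
import numpy as np
from scipy.stats import chi2

def sample_acf(y, max_lag):
    """r(h) = sum_{t=h+1..T} (y_t - ybar)(y_{t-h} - ybar) / sum_t (y_t - ybar)^2."""
    dev = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[h:] * dev[:-h]) / denom for h in range(1, max_lag + 1)])

def ljung_box(y, m):
    """Q(m) = T(T+2) sum_{h=1..m} r(h)^2/(T-h); chi-squared(m) under the iid null."""
    T = len(y)
    r = sample_acf(y, m)
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, m + 1)))
    return Q, 1 - chi2.cdf(Q, df=m)

y = np.random.default_rng(0).normal(size=200)    # placeholder: Gaussian white noise
bartlett_bound = 1.96 / np.sqrt(len(y))           # approximate 95% bands for r(h)
print(sample_acf(y, 10), ljung_box(y, m=int(np.log(len(y)))))
```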
Examples:
i) Correlogram of stationary transformation of monthly European M3
ii) Correlogram of daily returns of Euro/Dollar exchange rate
When, as in the case of the Euro-Dollar exchange rate, the correlogram of a series has no significant autocorrelations, there is no linear model that predicts the future of the series using the information contained in the past. However, it is possible that nonlinear transformations of the series are correlated and, consequently, can be predicted.
Example:
[Figure: Daily CAC index, 7 December 1998 to 9 September 2005]
[Figure: Daily returns of the CAC index, 7 December 1998 to 9 September 2005]
[Figure: Correlogram of daily returns of the CAC index]
[Figure: Correlogram of squared daily returns of the CAC index]
1.5 Linear ARMA models
In this section, we describe the statistical properties of different ARMA models. The objective is to
analyse the dynamic dependence (autocorrelations) between observations generated by each of the
models considered. We can then compare the shape of the correlogram of a real time series with the
theoretical autocorrelations to decide which model seems to be more likely to have generated the series.
Autoregressive models
In autoregressive models of order p, the parameters θ are zero and the value of the series at time t is a
linear combination of the last p observations of the series. In the simplest case, the value of the series at
time t only depends on the previous observation. In this case, we have the AR(1) model given by:
y_t = c + φ_1 y_{t-1} + a_t
The stationarity condition is |φ_1| < 1. In this case, the marginal mean is given by
E(y_t) = c + φ_1 E(y_{t-1})  ⇒  µ = c / (1 - φ_1)
The model can be reparametrized as
y_t - µ = φ_1 (y_{t-1} - µ) + a_t
If |φ_1| < 1, the observations fluctuate around µ, which is the mean of the process.
[Figure: AR(1) series y(t) = 0.2 y(t-1) + a(t)]
[Figure: AR(1) series y(t) = 5 + 0.2 y(t-1) + a(t)]
In the previous example, it is possible to observe that the mean is not 5 but 5/(1-0.2)=6.25.
The parameter φ_1 is related to the memory of the series. The closer it is to zero, the shorter the memory. As φ_1 increases, the memory becomes longer and, consequently, the dependence on the past becomes stronger.
[Figure: AR(1) series y(t) = 0.5 y(t-1) + a(t)]
[Figure: AR(1) series y(t) = 0.8 y(t-1) + a(t)]
[Figure: AR(1) series y(t) = 0.95 y(t-1) + a(t)]
[Figure: AR(1) series y(t) = -0.8 y(t-1) + a(t)]
In a stationary model, the effects of the innovations are transitory while in non-stationary models, their
effects are permanent (the series is not mean reverting). To illustrate this point, consider the following
AR(1) model:
y_t = φ_1 y_{t-1} + a_t
Substituting recursively backwards, it is possible to obtain the following representation of y_t in terms of past innovations:
y_t = Σ_{i=0}^∞ φ_1^i a_{t-i}
If |φ_1| < 1, then the effects of the innovations become weaker as time passes. However, if |φ_1| = 1, then the effects are permanent. In this case, the model is known as a random walk.
Next, we derive the autocorrelation function of series generated by AR(1) models. We derive first the marginal variance:
Var(y_t) = E(y_t - µ)² = E(φ_1(y_{t-1} - µ) + a_t)² = φ_1² E(y_{t-1} - µ)² + E(a_t²) + 2φ_1 E((y_{t-1} - µ) a_t)  ⇒  σ²_y = σ²_a / (1 - φ_1²)
The autocovariances are given by
γ(1) = E{(y_t - µ)(y_{t-1} - µ)} = E{(φ_1(y_{t-1} - µ) + a_t)(y_{t-1} - µ)} = φ_1 σ²_y
γ(2) = E{(y_t - µ)(y_{t-2} - µ)} = E{(φ_1(y_{t-1} - µ) + a_t)(y_{t-2} - µ)} = φ_1 γ(1) = φ_1² σ²_y
γ(h) = E{(y_t - µ)(y_{t-h} - µ)} = φ_1 γ(h-1) = φ_1^h σ²_y
Therefore, it is straightforward to see that the autocorrelation function (acf) of an AR(1) model is given by
ρ(h) = φ_1^h,  h = 0, 1, 2, ...
Consider, for example, the correlograms of the AR(1) series plotted before:
a) y_t = 0.2 y_{t-1} + a_t
b) y_t = 5 + 0.2 y_{t-1} + a_t
c) y_t = 0.5 y_{t-1} + a_t
d) y_t = 0.8 y_{t-1} + a_t
e) y_t = 0.95 y_{t-1} + a_t
f) y_t = -0.8 y_{t-1} + a_t
[Figure: correlograms of the series a) to f)]
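A minimal sketch for reproducing this kind of comparison: simulate an AR(1) series and set its sample correlogram against the theoretical acf ρ(h) = φ_1^h (function names and parameter values are illustrative):

```python
import numpy as np

def simulate_ar1(phi, T=200, c=0.0, seed=0):
    """Simulate y_t = c + phi*y_{t-1} + a_t with Gaussian innovations."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=T)
    y = np.empty(T)
    y[0] = c / (1 - phi) + a[0]              # start near the marginal mean
    for t in range(1, T):
        y[t] = c + phi * y[t - 1] + a[t]
    return y

y = simulate_ar1(phi=0.8)
theoretical_acf = [0.8 ** h for h in range(1, 11)]   # rho(h) = phi^h
# the sample correlogram can be computed with the sample_acf sketch given earlier
```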
We have seen before that the marginal mean and variance of the process are given by
µ_y = c / (1 - φ_1)   and   σ²_y = σ²_a / (1 - φ_1²),
respectively. Given that the process is stationary, these moments are constant over time. Consider now the corresponding conditional moments, which are given by
E(y_t | y_{t-1}, ..., y_1) = φ_1 y_{t-1}
Var(y_t | y_{t-1}, ..., y_1) = E{(y_t - E(y_t | y_{t-1}, ..., y_1))² | y_{t-1}, ..., y_1} = E{a_t² | y_{t-1}, ..., y_1} = σ²_a < σ²_a / (1 - φ_1²)
Consider now an AR(2) model, which is given by
y_t = c + φ_1 y_{t-1} + φ_2 y_{t-2} + a_t
The stationarity condition is that the roots of 1 - φ_1 x - φ_2 x² have modulus larger than one. In this case, it is possible to derive the following acf of the AR(2) model:
ρ(h) = φ_1 / (1 - φ_2),              h = 1
ρ(h) = φ_1 ρ(h-1) + φ_2 ρ(h-2),      h > 1
When the roots of 1 - φ_1 x - φ_2 x² are complex, the AR(2) model is able to generate cyclical behaviour.
Examples:
a) y_t = 1.6 y_{t-1} - 0.8 y_{t-2} + a_t
[Figure: y(t) = 1.6 y(t-1) - 0.8 y(t-2) + a(t)]
b) y_t = -1.5 y_{t-1} - 0.7 y_{t-2} + a_t
[Figure: y(t) = -1.5 y(t-1) - 0.7 y(t-2) + a(t)]
The autocorrelations of AR(p) models decay exponentially towards zero. However, looking at them it is
difficult to know which order of the model could be more appropriate. The partial autocorrelations help us
to decide the order p of the AR(p) models. The partial autocorrelation of order h is defined as:
φ_hh = Corr(y_t, y_{t-h} | y_{t-1}, ..., y_{t-h+1})
To compute the partial autocorrelations, we can construct the following models
y_t = φ_11 y_{t-1} + a_1t
y_t = φ_21 y_{t-1} + φ_22 y_{t-2} + a_2t
y_t = φ_31 y_{t-1} + φ_32 y_{t-2} + φ_33 y_{t-3} + a_3t
In an AR(p) model, the partial autocorrelations are zero for orders larger than p.
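A minimal sketch of this regression approach: for each order h, regress y_t on its first h lags by least squares and keep the coefficient of y_{t-h} (plain NumPy; the helper name is illustrative):

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """phi_hh: coefficient of y_{t-h} in the OLS regression of y_t on y_{t-1},...,y_{t-h}."""
    y = np.asarray(y, dtype=float)
    pacf = []
    for h in range(1, max_lag + 1):
        Y = y[h:]                                                      # dependent variable y_t
        X = np.column_stack([y[h - j:-j] for j in range(1, h + 1)])    # lags y_{t-1},...,y_{t-h}
        X = np.column_stack([np.ones(len(Y)), X])                      # add an intercept
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        pacf.append(beta[-1])                                          # coefficient of the longest lag
    return np.array(pacf)
```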
Moving average models
When the parameters φ of the Wold representation are zero, we obtain Moving Average models of order q, MA(q), in which the observed series at time t is a linear combination of the last q innovations. In the simplest case, we have the MA(1) model given by
y_t = c + a_t - θ_1 a_{t-1}
MA models are always stationary. Therefore, we do not need to restrict the parameters for the marginal
mean, variance and covariances to be constant over time. The constant c is once more related to the mean
of the process. In this case,
E(y_t) = c
Example:
a) y_t = a_t - 0.5 a_{t-1}
[Figure: y(t) = a(t) - 0.5 a(t-1)]
b) y_t = 5 + a_t - 0.5 a_{t-1}
[Figure: y(t) = 5 + a(t) - 0.5 a(t-1)]
When the coefficient of a_{t-1} is positive, successive values of y_t are positively correlated and the process tends to be smoother than the white noise a_t. A negative coefficient yields a series that is more irregular than a random series, in the sense that positive values of y_t tend to be followed by negative values and vice versa.
[Figure: y(t) = a(t) + 0.5 a(t-1)]
We now derive the autocorrelation function of an MA(1) model:
Var(y_t) = E(y_t - µ)² = E(a_t - θ_1 a_{t-1})² = E(a_t²) + θ_1² E(a_{t-1}²) - 2θ_1 E(a_t a_{t-1})  ⇒  σ²_y = σ²_a (1 + θ_1²)
γ(1) = E{(y_t - µ)(y_{t-1} - µ)} = E{(a_t - θ_1 a_{t-1})(a_{t-1} - θ_1 a_{t-2})} = -θ_1 σ²_a
γ(2) = E{(y_t - µ)(y_{t-2} - µ)} = E{(a_t - θ_1 a_{t-1})(a_{t-2} - θ_1 a_{t-3})} = 0
γ(h) = E{(y_t - µ)(y_{t-h} - µ)} = E{(a_t - θ_1 a_{t-1})(a_{t-h} - θ_1 a_{t-h-1})} = 0,  h > 1
Therefore, the acf of the MA(1) model is given by
ρ(h) = -θ_1 / (1 + θ_1²),  h = 1
ρ(h) = 0,                  h > 1
Examples:
a) y_t = a_t - 0.5 a_{t-1}
b) y_t = a_t + 0.5 a_{t-1}
c) y_t = a_t - 0.8 a_{t-1}
[Figure: correlograms of the series a) to c)]
It is important to note that the maximum absolute value of the autocorrelation of order one in an MA(1) model is 0.5.
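To see why, note that |ρ(1)| = |θ_1| / (1 + θ_1²); the derivative of θ_1 / (1 + θ_1²) with respect to θ_1 is (1 - θ_1²) / (1 + θ_1²)², which vanishes at θ_1 = ±1, so |ρ(1)| attains its maximum value 1/2 at θ_1 = ±1.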
Finally, we look at the conditional mean and variance of MA(1) models:
E(y_t | y_{t-1}, ..., y_1) = -θ_1 a_{t-1}
Var(y_t | y_{t-1}, ..., y_1) = E{(y_t - E(y_t | y_{t-1}, ..., y_1))² | y_{t-1}, ..., y_1} = E{a_t² | y_{t-1}, ..., y_1} = σ²_a < σ²_a (1 + θ_1²)
Once more, using the information contained in the past history of the variable, we reduce the uncertainty
about the future.
Consider now an MA(2) model:
y_t = c + a_t - θ_1 a_{t-1} - θ_2 a_{t-2}
In this case, it is easy to show that the acf is given by
ρ(h) = (-θ_1 + θ_1 θ_2) / (1 + θ_1² + θ_2²),  h = 1
ρ(h) = -θ_2 / (1 + θ_1² + θ_2²),              h = 2
ρ(h) = 0,                                     h > 2
In general, the acf of a MA(q) model has autocorrelations of orders larger than q equal to zero.
Example:
y_t = a_t - 0.8 a_{t-1} - 0.5 a_{t-2}
ARMA models
Finally, we may have mixed models with both autoregressive and moving average components. An ARMA(p,q) model is stationary if the autoregressive part of the model is stationary. In the simplest case, the ARMA(1,1) model is given by
y_t = c + φ_1 y_{t-1} + a_t - θ_1 a_{t-1}
The condition for stationarity is |φ_1| < 1. In this case, the marginal mean is given by µ = c / (1 - φ_1) and the acf is given by
ρ(h) = (1 - φ_1 θ_1)(φ_1 - θ_1) / (1 + θ_1² - 2 φ_1 θ_1),  h = 1
ρ(h) = φ_1 ρ(h-1),                                          h > 1
Examples:
a) y_t = 0.8 y_{t-1} + a_t - 0.5 a_{t-1}
[Figure: y(t) = 0.8 y(t-1) + a(t) - 0.5 a(t-1)]
b) y_t = 0.8 y_{t-1} + a_t + 0.5 a_{t-1}
[Figure: y(t) = 0.8 y(t-1) + a(t) + 0.5 a(t-1)]
c) y_t = 0.5 y_{t-1} + a_t + 0.5 a_{t-1}
[Figure: y(t) = 0.5 y(t-1) + a(t) + 0.5 a(t-1)]
Summary
Note that even if the process is stationary the conditional moments can evolve over time. In stationary
ARMA models, the marginal mean and variance are constant over time but the conditional mean evolves
over time while the conditional variance is constant.
Model     | Marginal mean   | Marginal variance                        | Conditional mean              | Conditional variance | Acf               | Partial acf
AR(1)     | c/(1-φ_1)       | σ²_a/(1-φ_1²)                            | φ_1 y_{t-1}                   | σ²_a                 | Exponential decay | 0 for h>1
AR(2)     | c/(1-φ_1-φ_2)   | σ²_a(1-φ_2)/[(1+φ_2)((1-φ_2)²-φ_1²)]     | φ_1 y_{t-1} + φ_2 y_{t-2}     | σ²_a                 | Exponential decay | 0 for h>2
MA(1)     | c               | σ²_a(1+θ_1²)                             | -θ_1 a_{t-1}                  | σ²_a                 | 0 for h>1         | Exponential decay
MA(2)     | c               | σ²_a(1+θ_1²+θ_2²)                        | -θ_1 a_{t-1} - θ_2 a_{t-2}    | σ²_a                 | 0 for h>2         | Exponential decay
ARMA(1,1) | c/(1-φ_1)       | σ²_a(1+θ_1²-2φ_1θ_1)/(1-φ_1²)            | φ_1 y_{t-1} - θ_1 a_{t-1}     | σ²_a                 | Exponential decay | Exponential decay
[Figure: series generated by (1 - 0.6L)(1 - 0.8L¹²) y(t) = (1 - 0.3L¹²) a(t)]
ARIMA models
Remember that in previous sections, we have seen that many economic time series are not stationary in
the sense that the marginal mean is not constant over time. However, in many cases, these series are
stationary after being differenced. We have denoted by I(d) a series that is stationary after being
differenced d times. On the other hand, ARMA models are designed for stationary time series.
Consequently, in practice, we will start the analysis of a time series by finding the stationary
transformation and then fitting the corresponding ARMA model to this transformation. Denote by
w_t = ∆^d y_t the stationary transformation. Then, we fit an ARMA(p,q) model to w_t as follows:
w_t = φ_1 w_{t-1} + ... + φ_p w_{t-p} + a_t - θ_1 a_{t-1} - ... - θ_q a_{t-q}
which is equivalent to
∆^d y_t = φ_1 ∆^d y_{t-1} + ... + φ_p ∆^d y_{t-p} + a_t - θ_1 a_{t-1} - ... - θ_q a_{t-q}
The model above is known as the ARIMA(p,d,q) model. Consider, for example, the ARIMA(1,1,0) model given by
∆y_t = φ_1 ∆y_{t-1} + a_t
The model for y_t can be rewritten as follows:
y_t - y_{t-1} = φ_1 (y_{t-1} - y_{t-2}) + a_t
y_t = (1 + φ_1) y_{t-1} - φ_1 y_{t-2} + a_t = φ*_1 y_{t-1} + φ*_2 y_{t-2} + a_t
Therefore, an ARIMA(1,1,0) model is an AR(2) model with a unit root.
We can also extend this notation to multiplicative seasonal ARIMA models as follows:
Φ_p(L) Φ_P(L^s) ∆^d ∆_s^D y_t = Θ_q(L) Θ_Q(L^s) a_t
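As an illustration of this notation, a multiplicative model such as ARIMA(0,1,1)×(0,1,1)_12 could be fitted with statsmodels' SARIMAX; this is a minimal sketch with placeholder data, not an analysis of any of the series above:

```python
import numpy as np
import statsmodels.api as sm

# Placeholder monthly series (a simulated random walk); replace with the data of interest.
y = np.cumsum(np.random.default_rng(0).normal(size=240))

# ARIMA(0,1,1)x(0,1,1)_12: regular and seasonal differences with regular and seasonal MA terms
model = sm.tsa.statespace.SARIMAX(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
result = model.fit(disp=False)
print(result.summary())
```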
1.6 Estimation and testing
Testing for non-stationarity
As we have seen above, the first step when analysing a given time series is to decide which is the appropriate stationary transformation. For this reason, it is very important to develop tests of stationarity. One of the most popular tests is the Dickey-Fuller (DF) test. Consider, for example, that the series is generated by an AR(1) process. In this case,
y_t = c + φ_1 y_{t-1} + a_t
If the series is stationary then |φ_1| < 1, while if it is not, then |φ_1| = 1. Therefore, the DF test is designed to test the following hypotheses:
H_0: φ_1 = 1
H_1: φ_1 < 1
A couple of comments are due before we describe the test. First of all, note that under the null we are interested only in the positive unit root. The case φ_1 = -1 is not of interest for economic time series. Also note that when testing for unit roots, we need to introduce a constant in the model because, under the alternative hypothesis, we are testing stationarity and not whether the mean of the series is zero. Finally, note that the test is a one-sided test.
Instead of estimating the equation y_t = c + φ_1 y_{t-1} + a_t, the test is based on the following equivalent equation:
∆y_t = c + (φ_1 - 1) y_{t-1} + a_t = c + φ* y_{t-1} + a_t
H_0: φ* = 0
H_1: φ* < 0
The previous model is estimated by Ordinary Least Squares (OLS) and the t-statistic of φ* is constructed in the usual way. However, because under the null hypothesis y_t is not stationary, the asymptotic distribution of this t-statistic is not the usual Normal distribution. Dickey and Fuller tabulated the corresponding distribution by simulation.
Example:
a) DF test for the Euro-Dollar exchange rate
b) DF test for returns of Euro-Dollar exchange rate
The test can be extended (Augmented DF) to allow for a dynamic structure richer than the AR(1) model considered above. In this case, the distribution of the test statistic does not change. However, the distribution does depend on the deterministic components included in the model (constants, trends, dummy variables, etc.).
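A minimal sketch of the (augmented) Dickey-Fuller test using statsmodels; regression="c" includes the constant discussed above, and the series y is a placeholder (here a simulated random walk, so the unit-root null should not be rejected):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

y = np.cumsum(np.random.default_rng(0).normal(size=500))   # placeholder: a random walk

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(f"ADF statistic: {stat:.3f}, p-value: {pvalue:.3f}, critical values: {crit}")
```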
Maximum Likelihood estimation of the parameters
The parameters of ARMA models can be estimated by Maximum Likelihood (ML) assuming a particular conditional distribution of the series of interest. Note that even though the observations are not mutually independent, the likelihood can always be decomposed as follows:
L = f(Y_T) = f(y_T | Y_{T-1}) f(Y_{T-1}) = f(y_T | Y_{T-1}) f(y_{T-1} | Y_{T-2}) f(Y_{T-2}) = ... = [ Π_{t=2}^T f(y_t | Y_{t-1}) ] f(y_1)
where Y_t = (y_1, ..., y_t). Taking logs,
log L = Σ_{t=2}^T log f(y_t | Y_{t-1}) + log f(y_1)
− +=
If ty is conditionally Normal then its conditional distribution is given by
( )( )
−
−=−
−
−
−)|(2
)|(exp
)|(2
1)|(
1
2
1
2/1
1
1
tt
ttt
tt
ttYyVar
YyEy
YyVarYyf
π
If we further assume that the process is stationary and Gaussian, so that the marginal distribution of the initial observations is also Gaussian, then the marginal density of y_1 is given by
f(y_1) = (2π σ²_y)^{-1/2} exp{ -(y_1 - µ)² / (2 σ²_y) }
Finally, the Gaussian log-likelihood is given by
log L = Σ_{t=2}^T log f(y_t | Y_{t-1}) + log f(y_1)
      = -(T/2) log(2π) - (1/2) Σ_{t=2}^T log Var(y_t | Y_{t-1}) - (1/2) Σ_{t=2}^T (y_t - E(y_t | Y_{t-1}))² / Var(y_t | Y_{t-1}) - (1/2) log σ²_y - (y_1 - µ)² / (2 σ²_y)
As we have seen before, in ARMA models the conditional variance is constant and equal to the variance of the innovations. Therefore, Var(y_t | Y_{t-1}) = σ²_a and the Gaussian log-likelihood is given by
log L = -(T/2) log(2π) - ((T-1)/2) log σ²_a - (1/(2σ²_a)) Σ_{t=2}^T (y_t - E(y_t | Y_{t-1}))² - (1/2) log σ²_y - (y_1 - µ)² / (2 σ²_y)
On the other hand, the conditional mean, as well as the marginal distribution of y_1, depends on the particular model fitted to the data. If, for example, we fit an AR(1) model, we have seen before that
µ = c / (1 - φ_1)
σ²_y = σ²_a / (1 - φ_1²)
E(y_t | Y_{t-1}) = c + φ_1 y_{t-1}
Therefore, in this case, the Gaussian log-likelihood function is given by
log L = -(T/2) log(2π) - (T/2) log σ²_a + (1/2) log(1 - φ_1²) - ((1 - φ_1²)/(2σ²_a)) (y_1 - c/(1 - φ_1))² - (1/(2σ²_a)) Σ_{t=2}^T (y_t - c - φ_1 y_{t-1})²
The estimator is not linear, as the log-likelihood function is non-linear in φ_1. However, if y_1 is regarded as fixed in repeated realisations, then its marginal distribution does not enter the likelihood function, which is then given by
log L = -((T-1)/2) log(2π) - ((T-1)/2) log σ²_a - (1/(2σ²_a)) Σ_{t=2}^T (y_t - c - φ_1 y_{t-1})²
The conditional ML estimator is then linear, being given by a regression of ty on
1−ty . The asymptotic
properties of the estimator are not affected by this simplification. In AR models, the parameters can be
estimated by Ordinary Least Squares (OLS).
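A minimal sketch of this conditional estimator: regress y_t on a constant and y_{t-1} by ordinary least squares (the simulated series is placeholder data with true c = 1 and φ_1 = 0.8):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
a = rng.normal(size=T)
y = np.empty(T)
y[0] = 1.0 / (1 - 0.8) + a[0]                  # start near the marginal mean c/(1-phi)
for t in range(1, T):
    y[t] = 1.0 + 0.8 * y[t - 1] + a[t]

X = np.column_stack([np.ones(T - 1), y[:-1]])   # regressors: constant and y_{t-1}
c_hat, phi_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
sigma2_hat = np.mean((y[1:] - c_hat - phi_hat * y[:-1]) ** 2)   # estimated innovation variance
```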
Example: Estimated model for the stationary transformation of the monthly European M3 (∆∆_12 log M3)
1.7 Prediction
The analysis of univariate time series has two objectives:
i) Description of the dynamic evolution of the series
ii) Prediction of future values
Once we have fulfilled the first objective, we have modelled the dependency of each observation with
respect to the past. Therefore, to predict we have to extrapolate into the future this dependency. If the
prediction criterion is to minimize the Mean Square Error (MSE), then
ŷ_{T+k|T} = E(y_{T+k} | Y_T)
The point predictions are obtained recursively. All future observations are replaced by their predictions (conditional expected values) and all future innovations are set equal to zero. We will obtain expressions for future predictions assuming that the parameters and the within-sample innovations are known.
Examples:
i) Consider the following AR(1) model
y_t = c + φ_1 y_{t-1} + a_t
The next observations in the series can be forecast by
ŷ_{T+1|T} = E(y_{T+1} | Y_T) = E(c + φ_1 y_T + a_{T+1} | Y_T) = c + φ_1 y_T
ŷ_{T+2|T} = E(y_{T+2} | Y_T) = E(c + φ_1 y_{T+1} + a_{T+2} | Y_T) = c + φ_1 ŷ_{T+1|T} = c(1 + φ_1) + φ_1² y_T
ŷ_{T+3|T} = E(y_{T+3} | Y_T) = c + φ_1 ŷ_{T+2|T} = c(1 + φ_1 + φ_1²) + φ_1³ y_T
...
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = c + φ_1 ŷ_{T+k-1|T} = c(1 + φ_1 + ... + φ_1^{k-1}) + φ_1^k y_T
When the prediction horizon tends to infinity, the predictions tend to the marginal mean:
lim_{k→∞} ŷ_{T+k|T} = c / (1 - φ_1) = µ
As we predict further into the future, the information contained in the actual observations loses its weight. Note that the closer φ_1 is to one (to non-stationarity), the more weight is given to y_T.
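A minimal sketch of this recursion (parameter values and the last observation y_T are placeholders):

```python
def ar1_forecasts(c, phi, y_T, k):
    """Recursive AR(1) point predictions: y_hat(T+j|T) = c + phi * y_hat(T+j-1|T), starting from y_T."""
    preds, prev = [], y_T
    for _ in range(k):
        prev = c + phi * prev
        preds.append(prev)
    return preds

# Predictions converge to the marginal mean c/(1-phi) = 25 as the horizon grows
print(ar1_forecasts(c=5.0, phi=0.8, y_T=10.0, k=20))
```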
ii) Consider now the MA(1) model
y_t = c + a_t - θ_1 a_{t-1}
In this case, the predictions of future observations are given by
ŷ_{T+1|T} = E(y_{T+1} | Y_T) = E(c + a_{T+1} - θ_1 a_T | Y_T) = c - θ_1 a_T
ŷ_{T+2|T} = E(y_{T+2} | Y_T) = E(c + a_{T+2} - θ_1 a_{T+1} | Y_T) = c
...
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = E(c + a_{T+k} - θ_1 a_{T+k-1} | Y_T) = c,  k = 2, 3, ...
From two periods ahead onwards, the predictions are equal to c (the marginal mean). The information contained in the values of the series at the end of the sample is not relevant for predicting the future beyond that horizon. The memory of MA models is very short.
iii) Consider the ARI(1,1) model given by
∆y_t = c + φ_1 ∆y_{t-1} + a_t
In this case, to obtain the predictions of future values of y_t, we rewrite the model as an AR(2) model:
y_t = c + (1 + φ_1) y_{t-1} - φ_1 y_{t-2} + a_t
Using this expression of the model, the predictions of future observations can easily be obtained as follows:
ŷ_{T+1|T} = E(y_{T+1} | Y_T) = E(c + (1 + φ_1) y_T - φ_1 y_{T-1} + a_{T+1} | Y_T) = c + (1 + φ_1) y_T - φ_1 y_{T-1}
ŷ_{T+2|T} = E(y_{T+2} | Y_T) = c + (1 + φ_1) ŷ_{T+1|T} - φ_1 y_T
ŷ_{T+3|T} = E(y_{T+3} | Y_T) = c + (1 + φ_1) ŷ_{T+2|T} - φ_1 ŷ_{T+1|T}
...
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = c + (1 + φ_1) ŷ_{T+k-1|T} - φ_1 ŷ_{T+k-2|T}
Note that in this case (non-stationary models), the values of the actual observations keep their weight in future predictions forever.
iv) Finally, consider the following IMA(2,1) model
∆²y_t = c + a_t - θ_1 a_{t-1}
In order to obtain expressions for future predictions, the model can be rewritten as
y_t = c + 2y_{t-1} - y_{t-2} + a_t - θ_1 a_{t-1}
Using this expression, future predictions are given by
ŷ_{T+1|T} = E(y_{T+1} | Y_T) = E(c + 2y_T - y_{T-1} + a_{T+1} - θ_1 a_T | Y_T) = c + 2y_T - y_{T-1} - θ_1 a_T
ŷ_{T+2|T} = E(y_{T+2} | Y_T) = c + 2ŷ_{T+1|T} - y_T
ŷ_{T+3|T} = E(y_{T+3} | Y_T) = c + 2ŷ_{T+2|T} - ŷ_{T+1|T}
...
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = c + 2ŷ_{T+k-1|T} - ŷ_{T+k-2|T}
Up to now we have seen how to obtain point predictions. However, in practice it is also of interest to have a measure of the uncertainty associated with these predictions. Therefore, we now derive the variances of the prediction errors. In order to derive these variances, we consider the Wold representation of the model, in which y_t is expressed as a linear combination of past innovations:
y_t = Σ_{i=0}^∞ Ψ_i a_{t-i}
where Ψ_0 = 1 and {a_t} is a white noise process. In general, we do not impose the stationarity condition Σ_{i=0}^∞ Ψ_i² < ∞ because we also want to consider the prediction of non-stationary time series. As we have seen before, the prediction of y_{T+k} given the information available at time T is given by the conditional mean
ŷ_{T+k|T} = E(y_{T+k} | Y_T)
Therefore, what we do in the Wold representation is to set all future innovations equal to zero:
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = Σ_{i=0}^∞ Ψ_i E(a_{T+k-i} | Y_T) = Σ_{i=k}^∞ Ψ_i a_{T+k-i}
The prediction error is then given by
e_{T+k} = y_{T+k} - ŷ_{T+k|T} = Σ_{i=0}^∞ Ψ_i a_{T+k-i} - Σ_{i=k}^∞ Ψ_i a_{T+k-i} = Σ_{i=0}^{k-1} Ψ_i a_{T+k-i}
Note that the prediction error is a linear combination of the future innovations, which we set to zero when predicting although in reality they will be different from zero. Using the previous expression of the prediction error, it is easy to show that
E(e_{T+k}) = E( Σ_{i=0}^{k-1} Ψ_i a_{T+k-i} ) = 0
Var(e_{T+k}) = E( Σ_{i=0}^{k-1} Ψ_i a_{T+k-i} )² = Σ_{i=0}^{k-1} Ψ_i² E(a²_{T+k-i}) = σ²_a Σ_{i=0}^{k-1} Ψ_i²
Note that when the model is stationary, Σ_{i=0}^∞ Ψ_i² < ∞ and, consequently, the variance is bounded. The uncertainty about the future increases with the prediction horizon, but it is bounded. Furthermore, from the expression of the variance above, we can see that in this case the limit of the variance of the prediction errors is the marginal variance. However, when the model is not stationary, the variance of future prediction errors and, consequently, the uncertainty about the future increase without bound.
Using the results above and assuming that the innovations are Gaussian, we can construct prediction intervals for y_{T+k} at the (1-α) level as follows:
ŷ_{T+k|T} ± z_{α/2} (Var(e_{T+k}))^{1/2}
where z_{α/2} is the corresponding percentile of the standard Normal distribution.
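A minimal sketch for the AR(1) case, where the Wold weights are Ψ_i = φ_1^i, so Var(e_{T+k}) = σ²_a Σ_{i=0}^{k-1} φ_1^{2i} (all parameter values below are placeholders):

```python
import numpy as np
from scipy.stats import norm

def ar1_prediction_interval(c, phi, sigma2_a, y_T, k, alpha=0.05):
    """Point prediction and (1-alpha) interval for y_{T+k} in an AR(1) model."""
    point = c * sum(phi ** i for i in range(k)) + phi ** k * y_T   # c(1+phi+...+phi^{k-1}) + phi^k y_T
    var_e = sigma2_a * sum(phi ** (2 * i) for i in range(k))       # sigma_a^2 * sum of psi_i^2, psi_i = phi^i
    z = norm.ppf(1 - alpha / 2)
    half_width = z * np.sqrt(var_e)
    return point, (point - half_width, point + half_width)

print(ar1_prediction_interval(c=5.0, phi=0.8, sigma2_a=1.0, y_T=10.0, k=5))
```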
[Figure: Multistep predictions of the stationary transformation of M3]
REFERENCES
Brooks, C. (2002), Chapter 5.
Maravall, A. (1993), An application of nonlinear time series forecasting, Journal of Business and Economic Statistics, 1, 66-74.
Mills, T.C. (1999), Chapter 2.
Tsay, R.S. (2002), Sections 2.1 to 2.7 and 2.10.
EXERCISES
1.1 Obtain the acf of the AR(2) and ARMA(1,1) models.
1.2 Obtain the Wold representation of the AR(1), MA(1) and ARMA(1,1) models.
1.3 Exercises 1, 2, 5 and 6 of Chapter 2 of Tsay (2002).