Master in Business Administration and Quantitative Methods
FINANCIAL ECONOMETRICS
2009-2010
ESTHER RUIZ
CHAPTER 1. BASIC CONCEPTS IN TIME SERIES ANALYSIS
1.1 Characteristics of time series: Stationary stochastic processes
A time series is a succession of observations ordered in time, usually (though not necessarily) equally spaced:
y_1, y_2, ..., y_T
At each point in time, the observation can correspond to a unique variable or to several variables. In the
former case, we have what is known as a univariate series and in the latter a multivariate series. For the
moment, our focus is on univariate series.
Examples of real time series:
[Figure: Daily Euro-Dollar exchange rate, 4 January 1999 to 25 May 2005]
[Figure: Quarterly GDP, Europe, 1991Q1 to 2004Q3]
[Figure: Monthly CPI, Europe, January 1991 to December 2004]
The main objectives when analysing a univariate time series are:
i) Description of the dynamic properties of the series: trend, seasonal components,
dependencies with the past
ii) Prediction of future values of the series
To achieve these objectives, we have to take into account that the observations of a variable obtained at
different moments of time have two main properties:
i) They are dependent.
ii) They are obtained in a changing context.
As a consequence, we cannot assume, as in the classical framework, that we have T independent observations of a given random variable. Consider, for example, the observations of the CPI above. The observations clearly depend on each other. Moreover, we want to analyse this dependency in order to extrapolate it into the future:
y_t = f(y_{t-1}, ..., y_1) + a_t
where f(y_{t-1}, ..., y_1) is the predictable part and a_t the unexpected part.
On the other hand, the CPI in January 1991 is not observed in the same circumstances as the CPI in December 2004. Therefore, we cannot assume that we have 168 observations of a random variable with mean µ and variance σ².
Alternatively, we assume that the observed time series is the realization of a stochastic process. A
stochastic process is a succession of random variables ordered in time (they can be ordered with
alternative indexes)
Y(1), Y(2), ..., Y(T)
Example:
The stochastic process can generate in theory an infinite number of realizations over the period t=1,…,T.
Y(1)      Y(2)      ...   Y(T)
 ↓         ↓               ↓
y_1^(1)   y_2^(1)   ...   y_T^(1)
y_1^(2)   y_2^(2)   ...   y_T^(2)
y_1^(3)   y_2^(3)   ...   y_T^(3)
 ...       ...      ...    ...
When several realizations are available, the mean of each variable of the process, µ_t = E(Y(t)), can be estimated by the ensemble average
µ̂_t = (1/m) Σ_{j=1}^m y_t^(j)
where y_t^(j) denotes the jth observation on y_t and m is the number of realizations. However, in most empirical problems, only a single realization is available. Each observation in our time series is a realization of one of the random variables of the process.
Y(1)   Y(2)   ...   Y(T)
 ↓      ↓            ↓
y_1    y_2    ...   y_T
Consequently, we have to restrict the properties of the process to carry out inference.
To allow estimation, we need to restrict the process to be stationary. There are two main concepts of
stationarity: strict and weak stationarity.
Strict stationarity: The process {Y(1), Y(2), ..., Y(T)} is said to be strictly stationary if the joint distribution of {Y(t_1), Y(t_2), ..., Y(t_k)} is identical to that of {Y(t_1+h), Y(t_2+h), ..., Y(t_k+h)} for all h, where k is an arbitrary positive integer and (t_1, t_2, ..., t_k) is a collection of k positive integers. Therefore, strict stationarity requires that the distribution of {Y(t_1), Y(t_2), ..., Y(t_k)} is invariant under time shifts. For example,
f(y_1) = f(y_2) = ... = f(y_T)
f(y_1, y_2) = f(y_2, y_3) = ... = f(y_7, y_8) = ...
f(y_1, y_6) = f(y_2, y_7) = ... = f(y_6, y_11) = ...
Weak stationarity: The process {Y(1), Y(2), ..., Y(T)} is said to be weakly stationary if:
i) E(Y(t)) = µ_y, for all t
ii) Var(Y(t)) = σ²_y, for all t
iii) Cov(Y(t), Y(t+h)) = γ_y(h), for all t
The first condition implies that the mean of each random variable in the process is the same regardless of
the particular random variable chosen. Even if we have a collection of random variables, as all of them
have the same mean, we can use the sample mean to estimate their common mean as follows:
µ̂ = ȳ = (1/T) Σ_{t=1}^T y_t
The other two conditions have similar interpretations. Given that all the random variables of the process
have the same variance, we can use the information contained in all of them to estimate it. In particular,
we estimate σ² by
σ̂² = s²_y = (1/T) Σ_{t=1}^T (y_t - ȳ)²
Finally, the third condition tells us that the dependency between any two random variables of the process
depends on the distance between them but not on the moment of time in which we observe them. The
relationship between two observations separated by, for example, one period of time is the same at the
beginning and at the end of the sample:
Cov(Y(1), Y(2)) = Cov(Y(300), Y(301))
This condition allows us to use the sample covariances to estimate the linear relationship between
observations separated by h periods of time as follows:
γ̂(h) = c(h) = (1/T) Σ_{t=h+1}^T (y_t - ȳ)(y_{t-h} - ȳ)
Therefore, weak stationarity requires that the first two moments of the process are time invariant. It is easy to see that weak stationarity does not imply strict stationarity, as the latter requires that the whole distribution is invariant. However, it is important to note that strict stationarity implies weak stationarity if the variance of the process is finite. In the context of non-linear, non-Gaussian models, there are examples of strictly stationary processes without finite variance that are not weakly stationary.
When {Y(1), Y(2), ..., Y(T)} has a joint multivariate Normal distribution, the process is said to be Gaussian. In this case, the distribution of every subset {Y(t_1), Y(t_2), ..., Y(t_k)} is also multivariate Normal. In particular, the marginal distribution of each variable of the process is Normal. Given that the Normal distribution is characterized by its first two moments, weak stationarity implies strict stationarity. In this case, weak and strict stationarity coincide.
1.2 Transformations to stationarity
When a series of observations is generated by a stationary process, the observations fluctuate around a constant level and there is no tendency for their spread to increase or decrease over time. Consider, for example, the following series generated by a stationary process known as white noise. A white noise process is a particular case of a stationary process with µ = 0 and γ(h) = 0 for all h ≠ 0.
[Figure: Gaussian white noise process with variance 1]
However, in practice, most real economic time series do not fluctuate around a constant level, but instead
show some kind of systematic upward or downward movement known as trend. Therefore, their marginal
means are not constant over time and, consequently, the series is not stationary.
Example:
[Figure: Quarterly GDP, Europe, 1991Q1 to 2004Q3]
Furthermore, in many macroeconomic time series, we observe not only trends but also that the spread of the series around its mean increases with the level of the series. In this case, neither the marginal means nor the marginal variances are constant.
[Figure: Monthly Spanish exports, January 1960 to July 2004]
Although real time series are usually non-stationary, appropriate transformations often render them stationary. Next, we describe some of these transformations. One of the most popular, used to stabilize the variance when it increases with the level of the series, is the logarithmic transformation. Consider the previous example of monthly exports in Spain. After taking logs, the series is
[Figure: Monthly Spanish exports in logs, January 1960 to July 2004]
We focus now on transformations to stabilize the mean. Consider first that the series of interest is stationary with marginal mean µ. One possible representation for this series is
y_t = µ + ε_t
where ε_t is a stationary process with zero mean. If the mean of y_t is not constant but evolves over time, then we can represent y_t by the following model:
y_t = µ_t + ε_t
µ_t = µ_{t-1} + η_t
where η_t is a NID(0, σ²_η) process independent of ε_t. The process µ_t is known as a random walk. Note that when σ²_η = 0, we go back to stationarity. In general, if σ²_η ≠ 0, the level of the series evolves over time without a trend.
[Figure: Random walk plus noise with σ_ε = 1 and σ_η = 0]
[Figure: Random walk plus noise with σ_ε = 1 and σ_η = 1]
[Figure: Random walk plus noise with σ_ε = 1 and σ_η = 0.1]
[Figure: Random walk plus noise with σ_ε = 1 and σ_η = 0.5]
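Series like the ones in the figures above can be simulated along the following lines; this is a minimal sketch (the function name and default values are illustrative, not the code used to produce the figures):

```python
import numpy as np

def simulate_rw_plus_noise(T=200, sig_eps=1.0, sig_eta=1.0, seed=0):
    """Simulate y_t = mu_t + eps_t with mu_t = mu_{t-1} + eta_t (random walk plus noise)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sig_eta, T)   # level disturbances eta_t
    eps = rng.normal(0.0, sig_eps, T)   # noise eps_t
    mu = np.cumsum(eta)                 # random walk level mu_t
    return mu + eps

y = simulate_rw_plus_noise(sig_eta=0.1)   # compare sig_eta=0 (stationary) with sig_eta=1
```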
In this case, if we take first differences,
∆y_t = y_t - y_{t-1} = µ_t - µ_{t-1} + ε_t - ε_{t-1} = η_t + ε_t - ε_{t-1}
E{∆y_t} = E{η_t + ∆ε_t} = 0
which is a stationary process. When a non-stationary process becomes stationary after taking first differences, the process is said to be integrated of order 1, I(1).
Example:
[Figure: Daily Euro-Dollar exchange rate, 4 January 1999 to 25 May 2005]
[Figure: First differences of logs of the Euro-Dollar exchange rate: returns]
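In practice, the returns plotted above are simply the first differences of the logged series; a minimal sketch, assuming the observations are stored in a NumPy array (the values below are illustrative, not the actual exchange-rate data):

```python
import numpy as np

prices = np.array([1.18, 1.19, 1.21, 1.20, 1.22])   # illustrative values only
returns = np.diff(np.log(prices))                    # log-returns: first differences of logs
```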
If we have a series with a trend with constant slope, the following representation is appropriate:
y_t = µ_t + ε_t
µ_t = µ_{t-1} + β + η_t
where β is the rate of growth of the trend.
[Figure: Random walk with drift plus noise with σ_ε = 1 and σ_η = 1]
In this case, taking first differences,
E{∆y_t} = E{η_t + β + ∆ε_t} = β
so the differenced series is also stationary. Therefore, we again have an I(1) series. Note that introducing a constant in a time series model can completely change the long-run behaviour of the series.
[Figure: First differences of random walk with drift plus noise]
Finally, we can also observe series in which the rate of growth (slope) of the trend changes over time. Following the same arguments as before, we can model this behaviour as follows:
y_t = µ_t + ε_t
µ_t = µ_{t-1} + β_{t-1} + η_t
β_t = β_{t-1} + ξ_t
where ξ_t is a white noise process with variance σ²_ξ independent of ε_t and η_t. In this case, if we take first differences, the series is still non-stationary:
E{∆y_t} = E{η_t + β_{t-1} + ∆ε_t} = β_{t-1}
[Figure: Stochastic trend model with σ_ε = 1, σ_η = 1 and σ_ξ = 0.5]
[Figure: First differences of stochastic trend model]
However, taking a second difference,
∆²y_t = ∆(∆y_t) = ∆(η_t + β_{t-1} + ∆ε_t) = η_t - η_{t-1} + β_{t-1} - β_{t-2} + ε_t - 2ε_{t-1} + ε_{t-2} = η_t - η_{t-1} + ξ_{t-1} + ε_t - 2ε_{t-1} + ε_{t-2}
E(∆²y_t) = E(η_t - η_{t-1} + ξ_{t-1} + ε_t - 2ε_{t-1} + ε_{t-2}) = 0
Therefore, by taking two differences, we obtain a stationary series with zero mean.
[Figure: Second differences of stochastic trend model]
Example:
[Figure: Monthly European M3]
[Figure: First differences of logarithmic M3]
[Figure: Seasonal and regular differences of logarithmic M3]
Finally, when both parameters σ²_η = σ²_ξ = 0, we obtain a deterministic trend. In this case, it is possible to fit a regression model, and the stationary transformation is the series of residuals from this model.
[Figure: Deterministic trend plus noise]
After fitting a deterministic trend, we obtain the following results:
[Estimation results of the deterministic trend regression not reproduced]
The residuals from this model are the corresponding stationary transformation:
[Figure: Residuals from the deterministic trend regression]
1.3 Wold theorem: linear ARMA models
The objective of time series analysis is to decompose the observed values of a time series into a component that depends on past observations (and can be predicted) and an unexpected component:
y_t = f(y_{t-1}, ..., y_1) + a_t
The process {a_t} is known as the innovation. By definition, it should not be related to the past.
Wold Theorem: If a process is stationary and has no deterministic components, then
y_t = Σ_{i=0}^∞ Ψ_i a_{t-i}
where Ψ_0 = 1, Σ_{i=0}^∞ Ψ_i² < ∞ and {a_t} is a white noise process, i.e. an uncorrelated sequence. When the variables of the process {a_t} are independent, it is called strict white noise.
The usefulness of the Wold Theorem is that it allows us to approximate the dynamic evolution of a variable y_t by a linear model. If the innovations {a_t} are independent, then the linear model is the only possible representation. However, when {a_t} is merely an uncorrelated but not independent sequence, the linear model exists but is not the unique representation of the dynamic dependence of the series. In the latter case, it is possible that the linear model is not very useful, and there may be a nonlinear model relating the observed value of y_t with its past evolution.
The Wold representation depends on an infinite number of parameters and, consequently, is not useful in practice. To solve this problem, it is approximated by models with a finite number of parameters:
y_t = Ψ_∞(L) a_t
where Ψ_∞(L) = 1 + Ψ_1 L + Ψ_2 L² + ... and L is the lag operator such that L x_t = x_{t-1}. The infinite order polynomial Ψ_∞(L) is approximated by the following ratio:
Ψ_∞(L) = Θ_q(L) / Φ_p(L)
where Θ_q(L) = 1 - θ_1 L - ... - θ_q L^q and Φ_p(L) = 1 - φ_1 L - ... - φ_p L^p. The stationarity condition is that the roots of the autoregressive polynomial Φ_p(L) are outside the unit circle. Furthermore, for identification, we assume that the two polynomials have no common roots. The resulting model is known as the ARMA(p,q) model and is given by
y_t = φ_1 y_{t-1} + ... + φ_p y_{t-p} + a_t - θ_1 a_{t-1} - ... - θ_q a_{t-q}
This approximation is valid in a large proportion of cases. However, there are stationary models that
cannot be approximated by using ARMA models. This is the case of long-memory series.
Once we have the ARMA model, we need to decide how to choose p and q suitable for the series of
interest.
1.4 The autocorrelation function and the correlogram
As we mentioned above, the main objective in the analysis of univariate time series is to decompose the observed series into the part that depends on the past (and can consequently be predicted) and the unexpected part:
y_t = f(y_{t-1}, ..., y_1) + a_t
where f(y_{t-1}, ..., y_1) is the predictable part and a_t the unexpected part.
Because of the Wold theorem, we can focus, as a first approximation, on linear dynamic dependencies (f is linear). In this case, the autocovariances are the main instrument available to analyze these dependencies. The autocovariance of order h is given by
γ(h) = E{(y_t - µ)(y_{t-h} - µ)} = E{y_t y_{t-h}} - µ²
and the corresponding autocorrelation is given by
ρ(h) = γ(h) / γ(0).
Note that we are assuming that the autocorrelations are constant over time. This assumption is needed to estimate them using the sample autocorrelations computed in the usual way. As a consequence of stationarity, ρ(h) = ρ(-h), so it is unnecessary to consider negative values of h. To identify the model appropriate for a given time series, we match the shape of its estimated acf (the correlogram) with the shapes of the theoretical acf's implied by the alternative ARMA models. However, we should take into account the sampling properties of the estimators of the autocorrelations. To estimate the autocorrelation of order h, we use its sample analogue
r(h) = Σ_{t=h+1}^T (y_t - ȳ)(y_{t-h} - ȳ) / Σ_{t=1}^T (y_t - ȳ)²
If {y_t} is an iid sequence with finite variance, then r(h) is asymptotically normal with mean zero and variance 1/T. This result can be used to build the Bartlett confidence bounds for the estimated autocorrelations.
Finally, if we want to test jointly that the first m autocorrelations are zero, i.e.
H_0: ρ(1) = ρ(2) = ... = ρ(m) = 0,
then we can use the Ljung-Box-Pierce statistic given by
Q(m) = T(T+2) Σ_{h=1}^m r(h)² / (T - h)
If {y_t} is an iid sequence with finite fourth order moment, then Q(m) has asymptotically a chi-squared distribution with m degrees of freedom. In practice, the selection of m affects the finite sample performance of the statistic. Simulation studies suggest that the choice m ≈ log(T) provides the best power.
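A minimal sketch computing the correlogram, the Bartlett bounds and the Ljung-Box-Pierce statistic directly from the formulas above (the array y below is a placeholder for the series of interest):

```python
import numpy as np
from scipy.stats import chi2

def sample_acf(y, max_lag):
    """r(h) = sum_{t=h+1..T} (y_t - ybar)(y_{t-h} - ybar) / sum_t (y_t - ybar)^2."""
    dev = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[h:] * dev[:-h]) / denom for h in range(1, max_lag + 1)])

def ljung_box(y, m):
    """Q(m) = T(T+2) sum_{h=1..m} r(h)^2/(T-h); chi-squared(m) under the iid null."""
    T = len(y)
    r = sample_acf(y, m)
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, m + 1)))
    return Q, 1 - chi2.cdf(Q, df=m)

y = np.random.default_rng(0).normal(size=200)    # placeholder: Gaussian white noise
bartlett_bound = 1.96 / np.sqrt(len(y))           # approximate 95% bands for r(h)
print(sample_acf(y, 10), ljung_box(y, m=int(np.log(len(y)))))
```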
Examples:
i) Correlogram of stationary transformation of monthly European M3
ii) Correlogram of daily returns of Euro/Dollar exchange rate
When, as in the case of the Euro-Dollar exchange rate, the correlogram of a series has no significant autocorrelations, there is no linear model that predicts the future of the series using the information contained in the past. However, it is possible that nonlinear transformations of the series are correlated and, consequently, can be predicted.
Example:
[Figure: Daily CAC index, 7 December 1998 to 9 September 2005]
[Figure: Daily returns of the CAC index, 7 December 1998 to 9 September 2005]
[Figure: Correlogram of daily returns of the CAC index]
[Figure: Correlogram of squared daily returns of the CAC index]
1.5 Linear ARMA models
In this section, we describe the statistical properties of different ARMA models. The objective is to
analyse the dynamic dependence (autocorrelations) between observations generated by each of the
models considered. We can then compare the shape of the correlogram of a real time series with the
theoretical autocorrelations to decide which model seems to be more likely to have generated the series.
Autoregressive models
In autoregressive models of order p, the parameters θ are zero and the value of the series at time t is a
linear combination of the last p observations of the series. In the simplest case, the value of the series at
time t only depends on the previous observation. In this case, we have the AR(1) model given by:
y_t = c + φ_1 y_{t-1} + a_t
The stationarity condition is |φ_1| < 1. In this case, the marginal mean is given by
E(y_t) = c + φ_1 E(y_{t-1})  ⇒  µ = c / (1 - φ_1)
The model can be reparametrized as
y_t - µ = φ_1 (y_{t-1} - µ) + a_t
If |φ_1| < 1, the observations fluctuate around µ, which is the mean of the process.
[Figure: AR(1) series y(t) = 0.2 y(t-1) + a(t)]
[Figure: AR(1) series y(t) = 5 + 0.2 y(t-1) + a(t)]
In the previous example, it is possible to observe that the mean is not 5 but 5/(1-0.2)=6.25.
The parameter φ_1 is related to the memory of the series. The closer it is to zero, the shorter the memory. As φ_1 increases, the memory becomes longer and, consequently, the dependence on the past becomes stronger.
[Figure: AR(1) series y(t) = 0.5 y(t-1) + a(t)]
[Figure: AR(1) series y(t) = 0.8 y(t-1) + a(t)]
[Figure: AR(1) series y(t) = 0.95 y(t-1) + a(t)]
[Figure: AR(1) series y(t) = -0.8 y(t-1) + a(t)]
In a stationary model, the effects of the innovations are transitory while in non-stationary models, their
effects are permanent (the series is not mean reverting). To illustrate this point, consider the following
AR(1) model:
y_t = φ_1 y_{t-1} + a_t
Substituting recursively backwards, it is possible to obtain the following representation of y_t in terms of past innovations:
y_t = Σ_{i=0}^∞ φ_1^i a_{t-i}
If |φ_1| < 1, then the effects of the innovations become weaker as time passes. However, if |φ_1| = 1, then the effects are permanent. In this case, the model is known as a random walk.
Next, we derive the autocorrelation function of series generated by AR(1) models. We derive first the marginal variance:
Var(y_t) = E(y_t - µ)² = E(φ_1(y_{t-1} - µ) + a_t)² = φ_1² E(y_{t-1} - µ)² + E(a_t²) + 2φ_1 E((y_{t-1} - µ) a_t)  ⇒  σ²_y = σ²_a / (1 - φ_1²)
The autocovariances are given by
γ(1) = E{(y_t - µ)(y_{t-1} - µ)} = E{(φ_1(y_{t-1} - µ) + a_t)(y_{t-1} - µ)} = φ_1 σ²_y
γ(2) = E{(y_t - µ)(y_{t-2} - µ)} = E{(φ_1(y_{t-1} - µ) + a_t)(y_{t-2} - µ)} = φ_1 γ(1) = φ_1² σ²_y
γ(h) = E{(y_t - µ)(y_{t-h} - µ)} = φ_1 γ(h-1) = φ_1^h σ²_y
Therefore, it is straightforward to see that the autocorrelation function (acf) of an AR(1) model is given by
ρ(h) = φ_1^h,  h = 0, 1, 2, ...
Consider, for example, the correlograms of the AR(1) series plotted before:
a) y_t = 0.2 y_{t-1} + a_t
b) y_t = 5 + 0.2 y_{t-1} + a_t
c) y_t = 0.5 y_{t-1} + a_t
d) y_t = 0.8 y_{t-1} + a_t
e) y_t = 0.95 y_{t-1} + a_t
f) y_t = -0.8 y_{t-1} + a_t
[Figure: correlograms of the series a) to f)]
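A minimal sketch for reproducing this kind of comparison: simulate an AR(1) series and set its sample correlogram against the theoretical acf ρ(h) = φ_1^h (function names and parameter values are illustrative):

```python
import numpy as np

def simulate_ar1(phi, T=200, c=0.0, seed=0):
    """Simulate y_t = c + phi*y_{t-1} + a_t with Gaussian innovations."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=T)
    y = np.empty(T)
    y[0] = c / (1 - phi) + a[0]              # start near the marginal mean
    for t in range(1, T):
        y[t] = c + phi * y[t - 1] + a[t]
    return y

y = simulate_ar1(phi=0.8)
theoretical_acf = [0.8 ** h for h in range(1, 11)]   # rho(h) = phi^h
# the sample correlogram can be computed with the sample_acf sketch given earlier
```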
We have seen before that the marginal mean and variance of the process are given by
µ_y = c / (1 - φ_1)   and   σ²_y = σ²_a / (1 - φ_1²),
respectively. Given that the process is stationary, these moments are constant over time. Consider now the corresponding conditional moments, which are given by
E(y_t | y_{t-1}, ..., y_1) = φ_1 y_{t-1}
Var(y_t | y_{t-1}, ..., y_1) = E{(y_t - E(y_t | y_{t-1}, ..., y_1))² | y_{t-1}, ..., y_1} = E{a_t² | y_{t-1}, ..., y_1} = σ²_a < σ²_a / (1 - φ_1²)
Consider now an AR(2) model, which is given by
y_t = c + φ_1 y_{t-1} + φ_2 y_{t-2} + a_t
The stationarity condition is that the roots of 1 - φ_1 x - φ_2 x² have modulus larger than one. In this case, it is possible to derive the following acf of the AR(2) model:
ρ(h) = φ_1 / (1 - φ_2),              h = 1
ρ(h) = φ_1 ρ(h-1) + φ_2 ρ(h-2),      h > 1
When the roots of 1 - φ_1 x - φ_2 x² are complex, the AR(2) model is able to generate cyclical behaviour.
Examples:
a) y_t = 1.6 y_{t-1} - 0.8 y_{t-2} + a_t
[Figure: y(t) = 1.6 y(t-1) - 0.8 y(t-2) + a(t)]
b) y_t = -1.5 y_{t-1} - 0.7 y_{t-2} + a_t
[Figure: y(t) = -1.5 y(t-1) - 0.7 y(t-2) + a(t)]
The autocorrelations of AR(p) models decay exponentially towards zero. However, looking at them it is
difficult to know which order of the model could be more appropriate. The partial autocorrelations help us
to decide the order p of the AR(p) models. The partial autocorrelation of order h is defined as:
φ_hh = Corr(y_t, y_{t-h} | y_{t-1}, ..., y_{t-h+1})
To compute the partial autocorrelations, we can construct the following models
y_t = φ_11 y_{t-1} + a_1t
y_t = φ_21 y_{t-1} + φ_22 y_{t-2} + a_2t
y_t = φ_31 y_{t-1} + φ_32 y_{t-2} + φ_33 y_{t-3} + a_3t
In an AR(p) model, the partial autocorrelations are zero for orders larger than p.
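A minimal sketch of this regression approach: for each order h, regress y_t on its first h lags by least squares and keep the coefficient of y_{t-h} (plain NumPy; the helper name is illustrative):

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """phi_hh: coefficient of y_{t-h} in the OLS regression of y_t on y_{t-1},...,y_{t-h}."""
    y = np.asarray(y, dtype=float)
    pacf = []
    for h in range(1, max_lag + 1):
        Y = y[h:]                                                      # dependent variable y_t
        X = np.column_stack([y[h - j:-j] for j in range(1, h + 1)])    # lags y_{t-1},...,y_{t-h}
        X = np.column_stack([np.ones(len(Y)), X])                      # add an intercept
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        pacf.append(beta[-1])                                          # coefficient of the longest lag
    return np.array(pacf)
```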
Moving average models
When the parameters φ of the Wold representation are zero, we obtain Moving Average models of order q, MA(q), in which the observed series at time t is a linear combination of the last q innovations. In the simplest case, we have the MA(1) model given by
y_t = c + a_t - θ_1 a_{t-1}
MA models are always stationary. Therefore, we do not need to restrict the parameters for the marginal
mean, variance and covariances to be constant over time. The constant c is once more related to the mean
of the process. In this case,
E(y_t) = c
Example:
a) y_t = a_t - 0.5 a_{t-1}
[Figure: y(t) = a(t) - 0.5 a(t-1)]
b) y_t = 5 + a_t - 0.5 a_{t-1}
[Figure: y(t) = 5 + a(t) - 0.5 a(t-1)]
When the coefficient of a_{t-1} is positive, successive values of y_t are positively correlated and the process tends to be smoother than the white noise a_t. A negative coefficient yields a series that is more irregular than a random series, in the sense that positive values of y_t tend to be followed by negative values and vice versa.
[Figure: y(t) = a(t) + 0.5 a(t-1)]
We now derive the autocorrelation function of an MA(1) model:
Var(y_t) = E(y_t - µ)² = E(a_t - θ_1 a_{t-1})² = E(a_t²) + θ_1² E(a_{t-1}²) - 2θ_1 E(a_t a_{t-1})  ⇒  σ²_y = σ²_a (1 + θ_1²)
γ(1) = E{(y_t - µ)(y_{t-1} - µ)} = E{(a_t - θ_1 a_{t-1})(a_{t-1} - θ_1 a_{t-2})} = -θ_1 σ²_a
γ(2) = E{(y_t - µ)(y_{t-2} - µ)} = E{(a_t - θ_1 a_{t-1})(a_{t-2} - θ_1 a_{t-3})} = 0
γ(h) = E{(y_t - µ)(y_{t-h} - µ)} = E{(a_t - θ_1 a_{t-1})(a_{t-h} - θ_1 a_{t-h-1})} = 0,  h > 1
Therefore, the acf of the MA(1) model is given by
ρ(h) = -θ_1 / (1 + θ_1²),  h = 1
ρ(h) = 0,                  h > 1
Examples:
a) y_t = a_t - 0.5 a_{t-1}
b) y_t = a_t + 0.5 a_{t-1}
c) y_t = a_t - 0.8 a_{t-1}
[Figure: correlograms of the series a) to c)]
It is important to note that the maximum absolute value of the autocorrelation of order one in an MA(1) model is 0.5.
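To see why, note that |ρ(1)| = |θ_1| / (1 + θ_1²); the derivative of θ_1 / (1 + θ_1²) with respect to θ_1 is (1 - θ_1²) / (1 + θ_1²)², which vanishes at θ_1 = ±1, so |ρ(1)| attains its maximum value 1/2 at θ_1 = ±1.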
Finally, we look at the conditional mean and variance of MA(1) models:
E(y_t | y_{t-1}, ..., y_1) = -θ_1 a_{t-1}
Var(y_t | y_{t-1}, ..., y_1) = E{(y_t - E(y_t | y_{t-1}, ..., y_1))² | y_{t-1}, ..., y_1} = E{a_t² | y_{t-1}, ..., y_1} = σ²_a < σ²_a (1 + θ_1²)
Once more, using the information contained in the past history of the variable, we reduce the uncertainty
about the future.
Consider now an MA(2) model:
y_t = c + a_t - θ_1 a_{t-1} - θ_2 a_{t-2}
In this case, it is easy to show that the acf is given by
ρ(h) = (-θ_1 + θ_1 θ_2) / (1 + θ_1² + θ_2²),  h = 1
ρ(h) = -θ_2 / (1 + θ_1² + θ_2²),              h = 2
ρ(h) = 0,                                     h > 2
In general, the acf of a MA(q) model has autocorrelations of orders larger than q equal to zero.
Example:
y_t = a_t - 0.8 a_{t-1} - 0.5 a_{t-2}
ARMA models
Finally, we may have mixed models with both autoregressive and moving average components. An ARMA(p,q) model is stationary if the autoregressive part of the model is stationary. In the simplest case, the ARMA(1,1) model is given by
y_t = c + φ_1 y_{t-1} + a_t - θ_1 a_{t-1}
The condition for stationarity is |φ_1| < 1. In this case, the marginal mean is given by µ = c / (1 - φ_1) and the acf is given by
ρ(h) = (1 - φ_1 θ_1)(φ_1 - θ_1) / (1 + θ_1² - 2 φ_1 θ_1),  h = 1
ρ(h) = φ_1 ρ(h-1),                                          h > 1
Examples:
a) y_t = 0.8 y_{t-1} + a_t - 0.5 a_{t-1}
[Figure: y(t) = 0.8 y(t-1) + a(t) - 0.5 a(t-1)]
b) y_t = 0.8 y_{t-1} + a_t + 0.5 a_{t-1}
[Figure: y(t) = 0.8 y(t-1) + a(t) + 0.5 a(t-1)]
c) y_t = 0.5 y_{t-1} + a_t + 0.5 a_{t-1}
[Figure: y(t) = 0.5 y(t-1) + a(t) + 0.5 a(t-1)]
Summary
Note that even if the process is stationary the conditional moments can evolve over time. In stationary
ARMA models, the marginal mean and variance are constant over time but the conditional mean evolves
over time while the conditional variance is constant.
Model     | Marginal mean   | Marginal variance                        | Conditional mean              | Conditional variance | Acf               | Partial acf
AR(1)     | c/(1-φ_1)       | σ²_a/(1-φ_1²)                            | φ_1 y_{t-1}                   | σ²_a                 | Exponential decay | 0 for h>1
AR(2)     | c/(1-φ_1-φ_2)   | σ²_a(1-φ_2)/[(1+φ_2)((1-φ_2)²-φ_1²)]     | φ_1 y_{t-1} + φ_2 y_{t-2}     | σ²_a                 | Exponential decay | 0 for h>2
MA(1)     | c               | σ²_a(1+θ_1²)                             | -θ_1 a_{t-1}                  | σ²_a                 | 0 for h>1         | Exponential decay
MA(2)     | c               | σ²_a(1+θ_1²+θ_2²)                        | -θ_1 a_{t-1} - θ_2 a_{t-2}    | σ²_a                 | 0 for h>2         | Exponential decay
ARMA(1,1) | c/(1-φ_1)       | σ²_a(1+θ_1²-2φ_1θ_1)/(1-φ_1²)            | φ_1 y_{t-1} - θ_1 a_{t-1}     | σ²_a                 | Exponential decay | Exponential decay
[Figure: series generated by (1 - 0.6L)(1 - 0.8L¹²) y(t) = (1 - 0.3L¹²) a(t)]
ARIMA models
Remember that in previous sections, we have seen that many economic time series are not stationary in
the sense that the marginal mean is not constant over time. However, in many cases, these series are
stationary after being differenced. We have denoted by I(d) a series that is stationary after being
differenced d times. On the other hand, ARMA models are designed for stationary time series.
Consequently, in practice, we will start the analysis of a time series by finding the stationary
transformation and then fitting the corresponding ARMA model to this transformation. Denote by
w_t = ∆^d y_t the stationary transformation. Then, we fit an ARMA(p,q) model to w_t as follows:
w_t = φ_1 w_{t-1} + ... + φ_p w_{t-p} + a_t - θ_1 a_{t-1} - ... - θ_q a_{t-q}
which is equivalent to
∆^d y_t = φ_1 ∆^d y_{t-1} + ... + φ_p ∆^d y_{t-p} + a_t - θ_1 a_{t-1} - ... - θ_q a_{t-q}
The model above is known as the ARIMA(p,d,q) model. Consider, for example, the ARIMA(1,1,0) model given by
∆y_t = φ_1 ∆y_{t-1} + a_t
The model for y_t can be rewritten as follows:
y_t - y_{t-1} = φ_1 (y_{t-1} - y_{t-2}) + a_t
y_t = (1 + φ_1) y_{t-1} - φ_1 y_{t-2} + a_t = φ*_1 y_{t-1} + φ*_2 y_{t-2} + a_t
Therefore, an ARIMA(1,1,0) model is an AR(2) model with a unit root.
We can also extend this notation to multiplicative seasonal ARIMA models as follows:
Φ_p(L) Φ_P(L^s) ∆^d ∆_s^D y_t = Θ_q(L) Θ_Q(L^s) a_t
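As an illustration of this notation, a multiplicative model such as ARIMA(0,1,1)×(0,1,1)_12 could be fitted with statsmodels' SARIMAX; this is a minimal sketch with placeholder data, not an analysis of any of the series above:

```python
import numpy as np
import statsmodels.api as sm

# Placeholder monthly series (a simulated random walk); replace with the data of interest.
y = np.cumsum(np.random.default_rng(0).normal(size=240))

# ARIMA(0,1,1)x(0,1,1)_12: regular and seasonal differences with regular and seasonal MA terms
model = sm.tsa.statespace.SARIMAX(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
result = model.fit(disp=False)
print(result.summary())
```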
1.6 Estimation and testing
Testing for non-stationarity
As we have seen above, the first step when analysing a given time series is to decide which is the appropriate stationary transformation. For this reason, it is very important to develop tests of stationarity. One of the most popular tests is the Dickey-Fuller (DF) test. Consider, for example, that the series is generated by an AR(1) process. In this case,
y_t = c + φ_1 y_{t-1} + a_t
If the series is stationary then |φ_1| < 1, while if it is not, then |φ_1| = 1. Therefore, the DF test is designed to test the following hypotheses:
H_0: φ_1 = 1
H_1: φ_1 < 1
A couple of comments are due before we describe the test. First of all, note that under the null we are interested only in the positive unit root. The case φ_1 = -1 is not of interest for economic time series. Also note that when testing for unit roots, we need to introduce a constant in the model because, under the alternative hypothesis, we are testing stationarity and not whether the mean of the series is zero. Finally, note that the test is a one-sided test.
Instead of estimating the equation y_t = c + φ_1 y_{t-1} + a_t, the test is based on the following equivalent equation:
∆y_t = c + (φ_1 - 1) y_{t-1} + a_t = c + φ* y_{t-1} + a_t
H_0: φ* = 0
H_1: φ* < 0
The previous model is estimated by Ordinary Least Squares (OLS) and the t-statistic of φ* is constructed in the usual way. However, because under the null hypothesis y_t is not stationary, the asymptotic distribution of this t-statistic is not the usual Normal distribution. Dickey and Fuller tabulated the corresponding distribution by simulation.
Example:
a) DF test for the Euro-Dollar exchange rate
b) DF test for returns of Euro-Dollar exchange rate
The test can be extended (Augmented DF) to allow for a dynamic structure richer than the AR(1) model considered above. In this case, the distribution of the test statistic does not change. However, the distribution does depend on the deterministic components included in the model (constants, trends, dummy variables, etc.).
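A minimal sketch of the (augmented) Dickey-Fuller test using statsmodels; regression="c" includes the constant discussed above, and the series y is a placeholder (here a simulated random walk, so the unit-root null should not be rejected):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

y = np.cumsum(np.random.default_rng(0).normal(size=500))   # placeholder: a random walk

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(f"ADF statistic: {stat:.3f}, p-value: {pvalue:.3f}, critical values: {crit}")
```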
Maximum Likelihood estimation of the parameters
The parameters of ARMA models can be estimated by Maximum Likelihood (ML) assuming a particular conditional distribution of the series of interest. Note that even though the observations are not mutually independent, the likelihood can always be decomposed as follows:
L = f(Y_T) = f(y_T | Y_{T-1}) f(Y_{T-1}) = f(y_T | Y_{T-1}) f(y_{T-1} | Y_{T-2}) f(Y_{T-2}) = ... = [ Π_{t=2}^T f(y_t | Y_{t-1}) ] f(y_1)
where Y_t = (y_1, ..., y_t). Taking logs,
log L = Σ_{t=2}^T log f(y_t | Y_{t-1}) + log f(y_1)
− +=
If ty is conditionally Normal then its conditional distribution is given by
( )( )
−
−=−
−
−
−)|(2
)|(exp
)|(2
1)|(
1
2
1
2/1
1
1
tt
ttt
tt
ttYyVar
YyEy
YyVarYyf
π
If we further assume that the process is stationary and Gaussian, so that the marginal distribution of the initial observations is also Gaussian, then the marginal density of y_1 is given by
f(y_1) = (2π σ²_y)^{-1/2} exp{ -(y_1 - µ)² / (2 σ²_y) }
Finally, the Gaussian log-likelihood is given by
log L = Σ_{t=2}^T log f(y_t | Y_{t-1}) + log f(y_1)
      = -(T/2) log(2π) - (1/2) Σ_{t=2}^T log Var(y_t | Y_{t-1}) - (1/2) Σ_{t=2}^T (y_t - E(y_t | Y_{t-1}))² / Var(y_t | Y_{t-1}) - (1/2) log σ²_y - (y_1 - µ)² / (2 σ²_y)
As we have seen before, in ARMA models the conditional variance is constant and equal to the variance of the innovations. Therefore, Var(y_t | Y_{t-1}) = σ²_a and the Gaussian log-likelihood is given by
log L = -(T/2) log(2π) - ((T-1)/2) log σ²_a - (1/(2σ²_a)) Σ_{t=2}^T (y_t - E(y_t | Y_{t-1}))² - (1/2) log σ²_y - (y_1 - µ)² / (2 σ²_y)
On the other hand, the conditional mean, as well as the marginal distribution of y_1, depends on the particular model fitted to the data. If, for example, we fit an AR(1) model, we have seen before that
µ = c / (1 - φ_1)
σ²_y = σ²_a / (1 - φ_1²)
E(y_t | Y_{t-1}) = c + φ_1 y_{t-1}
Therefore, in this case, the Gaussian log-likelihood function is given by
log L = -(T/2) log(2π) - (T/2) log σ²_a + (1/2) log(1 - φ_1²) - ((1 - φ_1²)/(2σ²_a)) (y_1 - c/(1 - φ_1))² - (1/(2σ²_a)) Σ_{t=2}^T (y_t - c - φ_1 y_{t-1})²
The estimator is not linear, as the log-likelihood function is non-linear in φ_1. However, if y_1 is regarded as fixed in repeated realisations, then its marginal distribution does not enter the likelihood function, which is then given by
log L = -((T-1)/2) log(2π) - ((T-1)/2) log σ²_a - (1/(2σ²_a)) Σ_{t=2}^T (y_t - c - φ_1 y_{t-1})²
The conditional ML estimator is then linear, being given by a regression of ty on
1−ty . The asymptotic
properties of the estimator are not affected by this simplification. In AR models, the parameters can be
estimated by Ordinary Least Squares (OLS).
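A minimal sketch of this conditional estimator: regress y_t on a constant and y_{t-1} by ordinary least squares (the simulated series is placeholder data with true c = 1 and φ_1 = 0.8):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
a = rng.normal(size=T)
y = np.empty(T)
y[0] = 1.0 / (1 - 0.8) + a[0]                  # start near the marginal mean c/(1-phi)
for t in range(1, T):
    y[t] = 1.0 + 0.8 * y[t - 1] + a[t]

X = np.column_stack([np.ones(T - 1), y[:-1]])   # regressors: constant and y_{t-1}
c_hat, phi_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
sigma2_hat = np.mean((y[1:] - c_hat - phi_hat * y[:-1]) ** 2)   # estimated innovation variance
```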
Example: Estimated model for the stationary transformation of the monthly European M3 (∆∆_12 log M3)
1.7 Prediction
The analysis of univariate time series has two objectives:
i) Description of the dynamic evolution of the series
ii) Prediction of future values
Once we have fulfilled the first objective, we have modelled the dependency of each observation with
respect to the past. Therefore, to predict we have to extrapolate into the future this dependency. If the
prediction criterion is to minimize the Mean Square Error (MSE), then
ŷ_{T+k|T} = E(y_{T+k} | Y_T)
The point predictions are obtained recursively. All future observations are replaced by their predictions (conditional expected values) and all future innovations are set equal to zero. We will obtain expressions for future predictions assuming that the parameters and the within-sample innovations are known.
Examples:
i) Consider the following AR(1) model
y_t = c + φ_1 y_{t-1} + a_t
The next observations in the series can be forecast by
ŷ_{T+1|T} = E(y_{T+1} | Y_T) = E(c + φ_1 y_T + a_{T+1} | Y_T) = c + φ_1 y_T
ŷ_{T+2|T} = E(y_{T+2} | Y_T) = E(c + φ_1 y_{T+1} + a_{T+2} | Y_T) = c + φ_1 ŷ_{T+1|T} = c(1 + φ_1) + φ_1² y_T
ŷ_{T+3|T} = E(y_{T+3} | Y_T) = c + φ_1 ŷ_{T+2|T} = c(1 + φ_1 + φ_1²) + φ_1³ y_T
...
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = c + φ_1 ŷ_{T+k-1|T} = c(1 + φ_1 + ... + φ_1^{k-1}) + φ_1^k y_T
When the prediction horizon tends to infinity, the predictions tend to the marginal mean:
lim_{k→∞} ŷ_{T+k|T} = c / (1 - φ_1) = µ
As we predict further into the future, the information contained in the actual observations loses its weight. Note that the closer φ_1 is to one (to non-stationarity), the more weight is given to y_T.
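A minimal sketch of this recursion (parameter values and the last observation y_T are placeholders):

```python
def ar1_forecasts(c, phi, y_T, k):
    """Recursive AR(1) point predictions: y_hat(T+j|T) = c + phi * y_hat(T+j-1|T), starting from y_T."""
    preds, prev = [], y_T
    for _ in range(k):
        prev = c + phi * prev
        preds.append(prev)
    return preds

# Predictions converge to the marginal mean c/(1-phi) = 25 as the horizon grows
print(ar1_forecasts(c=5.0, phi=0.8, y_T=10.0, k=20))
```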
ii) Consider now the MA(1) model
y_t = c + a_t - θ_1 a_{t-1}
In this case, the predictions of future observations are given by
ŷ_{T+1|T} = E(y_{T+1} | Y_T) = E(c + a_{T+1} - θ_1 a_T | Y_T) = c - θ_1 a_T
ŷ_{T+2|T} = E(y_{T+2} | Y_T) = E(c + a_{T+2} - θ_1 a_{T+1} | Y_T) = c
...
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = E(c + a_{T+k} - θ_1 a_{T+k-1} | Y_T) = c,  k = 2, 3, ...
From two periods ahead onwards, the predictions are equal to c (the marginal mean). The information contained in the values of the series at the end of the sample is not relevant for predicting the future beyond that horizon. The memory of MA models is very short.
iii) Consider the ARI(1,1) model given by
∆y_t = c + φ_1 ∆y_{t-1} + a_t
In this case, to obtain the predictions of future values of y_t, we rewrite the model as an AR(2) model:
y_t = c + (1 + φ_1) y_{t-1} - φ_1 y_{t-2} + a_t
Using this expression of the model, the predictions of future observations can easily be obtained as follows:
ŷ_{T+1|T} = E(y_{T+1} | Y_T) = E(c + (1 + φ_1) y_T - φ_1 y_{T-1} + a_{T+1} | Y_T) = c + (1 + φ_1) y_T - φ_1 y_{T-1}
ŷ_{T+2|T} = E(y_{T+2} | Y_T) = c + (1 + φ_1) ŷ_{T+1|T} - φ_1 y_T
ŷ_{T+3|T} = E(y_{T+3} | Y_T) = c + (1 + φ_1) ŷ_{T+2|T} - φ_1 ŷ_{T+1|T}
...
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = c + (1 + φ_1) ŷ_{T+k-1|T} - φ_1 ŷ_{T+k-2|T}
Note that in this case (non-stationary models), the values of the actual observations keep their weight in future predictions forever.
iv) Finally, consider the following IMA(2,1) model
∆²y_t = c + a_t - θ_1 a_{t-1}
In order to obtain expressions for future predictions, the model can be rewritten as
y_t = c + 2y_{t-1} - y_{t-2} + a_t - θ_1 a_{t-1}
Using this expression, future predictions are given by
ŷ_{T+1|T} = E(y_{T+1} | Y_T) = E(c + 2y_T - y_{T-1} + a_{T+1} - θ_1 a_T | Y_T) = c + 2y_T - y_{T-1} - θ_1 a_T
ŷ_{T+2|T} = E(y_{T+2} | Y_T) = c + 2ŷ_{T+1|T} - y_T
ŷ_{T+3|T} = E(y_{T+3} | Y_T) = c + 2ŷ_{T+2|T} - ŷ_{T+1|T}
...
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = c + 2ŷ_{T+k-1|T} - ŷ_{T+k-2|T}
Up to now we have seen how to obtain point predictions. However, in practice it is also of interest to have a measure of the uncertainty associated with these predictions. Therefore, we now derive the variances of the prediction errors. In order to derive these variances, we consider the Wold representation of the model, in which y_t is expressed as a linear combination of past innovations:
y_t = Σ_{i=0}^∞ Ψ_i a_{t-i}
where Ψ_0 = 1 and {a_t} is a white noise process. In general, we do not impose the stationarity condition Σ_{i=0}^∞ Ψ_i² < ∞ because we also want to consider the prediction of non-stationary time series. As we have seen before, the prediction of y_{T+k} given the information available at time T is given by the conditional mean
ŷ_{T+k|T} = E(y_{T+k} | Y_T)
Therefore, what we do in the Wold representation is to set all future innovations equal to zero:
ŷ_{T+k|T} = E(y_{T+k} | Y_T) = Σ_{i=0}^∞ Ψ_i E(a_{T+k-i} | Y_T) = Σ_{i=k}^∞ Ψ_i a_{T+k-i}
The prediction error is then given by
e_{T+k} = y_{T+k} - ŷ_{T+k|T} = Σ_{i=0}^∞ Ψ_i a_{T+k-i} - Σ_{i=k}^∞ Ψ_i a_{T+k-i} = Σ_{i=0}^{k-1} Ψ_i a_{T+k-i}
Note that the prediction error is a linear combination of the future innovations, which we set to zero when predicting although in reality they will be different from zero. Using the previous expression of the prediction error, it is easy to show that
E(e_{T+k}) = E( Σ_{i=0}^{k-1} Ψ_i a_{T+k-i} ) = 0
Var(e_{T+k}) = E( Σ_{i=0}^{k-1} Ψ_i a_{T+k-i} )² = Σ_{i=0}^{k-1} Ψ_i² E(a²_{T+k-i}) = σ²_a Σ_{i=0}^{k-1} Ψ_i²
Note that when the model is stationary, Σ_{i=0}^∞ Ψ_i² < ∞ and, consequently, the variance is bounded. The uncertainty about the future increases with the prediction horizon, but it is bounded. Furthermore, from the expression of the variance above, we can see that in this case the limit of the variance of the prediction errors is the marginal variance. However, when the model is not stationary, the variance of future prediction errors and, consequently, the uncertainty about the future increase without bound.
Using the results above and assuming that the innovations are Gaussian, we can construct prediction intervals for y_{T+k} at the (1-α) level as follows:
ŷ_{T+k|T} ± z_{α/2} (Var(e_{T+k}))^{1/2}
where z_{α/2} is the corresponding percentile of the standard Normal distribution.
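A minimal sketch for the AR(1) case, where the Wold weights are Ψ_i = φ_1^i, so Var(e_{T+k}) = σ²_a Σ_{i=0}^{k-1} φ_1^{2i} (all parameter values below are placeholders):

```python
import numpy as np
from scipy.stats import norm

def ar1_prediction_interval(c, phi, sigma2_a, y_T, k, alpha=0.05):
    """Point prediction and (1-alpha) interval for y_{T+k} in an AR(1) model."""
    point = c * sum(phi ** i for i in range(k)) + phi ** k * y_T   # c(1+phi+...+phi^{k-1}) + phi^k y_T
    var_e = sigma2_a * sum(phi ** (2 * i) for i in range(k))       # sigma_a^2 * sum of psi_i^2, psi_i = phi^i
    z = norm.ppf(1 - alpha / 2)
    half_width = z * np.sqrt(var_e)
    return point, (point - half_width, point + half_width)

print(ar1_prediction_interval(c=5.0, phi=0.8, sigma2_a=1.0, y_T=10.0, k=5))
```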
[Figure: Multistep predictions of the stationary transformation of M3]
REFERENCES
Brooks, C. (2002), Chapter 5.
Maravall, A. (1993), An application of nonlinear time series forecasting, Journal of Business and Economic Statistics, 1, 66-74.
Mills, T.C. (1999), Chapter 2.
Tsay, R.S. (2002), Sections 2.1 to 2.7 and 2.10.
EXERCISES
1.1 Obtain the acf of the AR(2) and ARMA(1,1) models.
1.2 Obtain the Wold representation of the AR(1), MA(1) and ARMA(1,1) models.
1.3 Exercises 1, 2, 5 and 6 of Chapter 2 of Tsay (2002).