Time Series Analysis and Forecasting
Will McLennan
Based on lectures by Dr W. Scott
Short Description
Management can only hope to achieve effective planning and control procedures if reliable forecasts can be made. Stochastic models for time-dependent data are considered.
Seasonal variation; weighted moving averages; Savitzky-Golay smoothing; the use of SPSS in fitting models; practical examples from science and business
Summary of Intended Learning Outcomes
An understanding of the principles behind modern forecasting techniques. The ability to select an appropriate model, to fit parameter values, and to carry out the forecasting calculations.
Contents
1 Introduction
2 Sample autocovariance and autocorrelation
   2.1 White noise
   2.2 The large-sample distribution of rk
   2.3 Hypothesis testing
3 Smoothing methods
   3.1 Whittaker-Henderson smoothing
   3.2 Adjusted average smoothing
      3.2.1 Example: Spencer's 21 term formula
      3.2.2 Example: King's adjusted average formula
      3.2.3 Formulae for R²_z
      3.2.4 The Slutsky-Yule effect
4 Autoregressive moving average models
   4.1 MA(q)
   4.2 MA(2)
5 Autoregressive models
   5.1 Example
   5.2 ARMA(p,q) models
6 ARIMA models
   6.1 Example: Random walk with drift
   6.2 Example: Stock market
   6.3 Forecasting with ARMA models
      6.3.1 Example: An ARIMA(1,1,0) model
1 Introduction
A time series may be thought of as a set of measurements {Xt} indexed by time, t, which may be continuous or discrete. We say that the process {Xt} is strictly stationary if for all m, k and t1, t2, . . . , tm the joint distribution of (X_{t1+k}, X_{t2+k}, . . . , X_{tm+k}) is the same as that of (X_{t1}, X_{t2}, . . . , X_{tm}).
Theorem. If {Xt} is strictly stationary,
E(Xt) = µ (a constant) ∀ t (1.1)
and
cov(X_{t1}, X_{t2}) = E(X_{t1} X_{t2}) − E(X_{t1}) E(X_{t2})   (1.2)
= γx(t1 − t2)   (1.3)

where γx(t1 − t2) is an even function of t1 − t2.
A process for which 1.1 and 1.3 hold is said to be weakly stationary (or second-order stationary). Thus strictly stationary ⇒ weakly stationary. The converse is not true unless {Xt} is a Gaussian (multivariate normal) process.
Definitions. Consider a second-order stationary process, {Xt}. The autocovariance function is defined as
γx(k) = cov(Xt, Xt+k) ∀ t (1.4)
γx(−k) = γx(k) ∀ k (1.5)
where k is the lag. The autocorrelation function is defined as

ρx(k) = corr(Xt, X_{t+k}) = γx(k)/γx(0)   (1.6)
Notes.
1. γx(0) = var(Xt) = cov(Xt, Xt) and var(Xt+k) = var(Xt)
2. ρx(k) = ρx(−k) for all k and ρx(0) = 1
3. If (as is often the case) we have a discrete process it is sufficient to consider ρx(k) for k = 1, 2, 3, . . . since ρx(−k) = ρx(k).
Remarks. When we have a (second-order) stationary process, E(Xt) = µ for all t, so there are no long-term trends (Xt tends to return to µ). This is not generally the case in practice. By considering differences of {Xt} we can allow for polynomial trends.
We can deal with polynomial trends by using ARIMA (Auto Regressive Integrated Moving Average) models, but one may not be sure that a polynomial trend will continue far into the future.
2 Sample autocovariance and autocorrelation
Suppose that we have a discrete process {Xt} which is, at least, weakly stationary, and that we have obtained the observations X1, X2, X3, . . . , Xn. µ is estimated by the sample mean and we estimate γx(k) by the sample covariance at lag k, i.e.,

Ck = [Σ_{t=1}^{n−k} (Xt − X̄)(X_{t+k} − X̄)] / n*   (2.1)

*some books use n − k here. We estimate ρx(k) by

rk = sample autocorrelation at lag k = Ck/C0 = ACF(k)   (2.2)
where ACF is the autocorrelation function.
1. We should only use rk when k is smaller than about n/3.
2. A plot of rk against k is called a correlogram and some features of the time series may be deduced from its correlogram.
2.1 White noise
Definition. A white noise process, {et}, is such that
E(et) = 0 ∀ t (2.3)
{et} are uncorrelated (2.4)
var(et) = σ2 ∀ t (2.5)
That is, γe(k) = 0 for all k ≠ 0. If, in addition, et ∼ N(0, σ²) for all t, we have Gaussian white noise. Note that for Gaussian white noise the {et} are independent (as well as uncorrelated). There is also the so-called 'white noise plus constant mean' model, where we suppose that
Xt = µ+ et (2.6)
where {et} is white noise. Similarly for Gaussian white noise.
2.2 The large-sample distribution of rk
Suppose that {Xt} is a discrete white noise process. It may be shown that for large n,
rk ∼ N(0, 1/n)   (2.7)
for any given k = 1, 2, . . ., also the rk’s are approximately independent.
2.3 Hypothesis testing
We want to test whether {Xt} is white noise plus a constant mean. We must have constant variance for all t and in addition ρ(k) = 0 for k = 1, 2, 3, . . .. We will test the null hypothesis
H0 : ρ(k) = 0 (2.8)
against the alternative

H1 : ρ(k) ≠ 0   (2.9)
where k is fixed (i.e., k = 1, 2, 3, . . .). Under H0,
rk ∼ N(0, 1/n)   (2.10)
Hence we reject H0 at the 5% significance level if
|rk| > 1.96/√n   (2.11)
We may also construct a 95% confidence interval for ρ(k) by using the result that
rk ∼ N(ρ(k), 1/n)   (2.12)

which gives the 95% confidence interval rk ± 1.96/√n.
These calculations refer to a fixed k, but we may also wish to test
ρ(1) = ρ(2) = . . . = ρ(k) = 0 (2.13)
for some k (up to about n/3). This leads to the problem of multiple testing. In this problem, however, the quantities rk are approximately independent N(0, 1/n) variables if H0 is true. Thus the number of rk's which lie outside ±1.96/√n is approximately binomial(k, 0.05), which has a mean of 0.05k.
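As a numerical illustration (a sketch, not part of the original notes; it uses only the Python standard library), we can simulate white noise and count how many of the first 20 sample autocorrelations breach the ±1.96/√n bound:

```python
import random
import math

def sample_acf(x, k):
    """Sample autocorrelation r_k = C_k / C_0, with divisor n as in (2.1)."""
    n = len(x)
    xbar = sum(x) / n
    c0 = sum((xi - xbar) ** 2 for xi in x) / n
    ck = sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / n
    return ck / c0

random.seed(1)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]   # Gaussian white noise

bound = 1.96 / math.sqrt(n)                  # 5% critical value under H0
exceed = sum(1 for k in range(1, 21) if abs(sample_acf(x, k)) > bound)
# Under H0 the count is approximately binomial(20, 0.05), with mean 1
print(exceed)
```

With 20 lags tested, roughly one exceedance is expected even when H0 is true, which is exactly the multiple-testing point made above.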
3 Smoothing methods
3.1 Whittaker-Henderson smoothing
We first consider Whittaker-Henderson smoothing, in which we consider the:

{ui : i = 1, 2, . . . , n} = raw rates   (3.1)
{vi : i = 1, 2, . . . , n} = smoothed rates   (3.2)

We wish to choose v1, v2, . . . , vn so that, firstly, they are fairly close to the ui's (adherence to data) and, secondly, they are quite smooth (this is often measured by third differences). In particular, we choose v1, v2, . . . , vn to minimise the quantity

Σ_{i=1}^{n} (ui − vi)² + λ Σ_{i=1}^{n−3} (∆³vi)²   (3.3)
where λ is a positive constant and ∆ is the forward difference operator.
Let I = Identity operator so that
If(x) = f(x) ∀x (3.4)
and let E = forward shift operator so that
Ef(x) = f(x+ 1) ∀x (3.5)
We define the forward difference operator ∆ to be
∆ = E − I (3.6)
Thus
∆f(x) = (E − I)f(x) = f(x+ 1)− f(x) (3.7)
∆2f(x) = ∆(∆f(x)) = f(x+ 2)− 2f(x+ 1) + f(x) (3.8)
In general we have the formal binomial expansion

∆^p f(x) = (E − I)^p f(x)   (3.9)
= Σ_{j=0}^{p} (p choose j)(−1)^{p−j} f(x + j)   (3.10)
If we now write u and v in vector form as:
u = (u1, u2, . . . , un)T (3.11)
v = (v1, v2, . . . , vn)T (3.12)
Thus

∆³vi = Σ_{r=0}^{3} (3 choose r)(−1)^{3−r} v_{i+r}   (3.13)
= Σ_{j=i}^{i+3} (3 choose j−i)(−1)^{3+j−i} vj   (1 ≤ i ≤ n − 3)   (3.14)

We conclude that

∆³vi = Σ_{j=1}^{n} Kij vj   (3.15)

where K is the (n − 3) × n matrix with entries

Kij = (−1)^{3+j−i} (3 choose j−i)   for j = i, i+1, i+2, i+3
Kij = 0   otherwise   (3.16)
Hence

Kv = (∆³v1, ∆³v2, . . . , ∆³v_{n−3})^T   (3.17)

and therefore

Σ_{i=1}^{n−3} (∆³vi)² = (Kv)^T Kv   (3.18)
= v^T K^T K v   (3.19)
We may now express our minimisation problem in matrix form:

minimise f(v) = (v − u)^T(v − u) + λ v^T K^T K v   (3.20)
= v^T B v − 2v^T u + u^T u   (3.21)
where B = I + λ K^T K   (3.22)
This is a positive definite matrix, since I is positive definite and λK^T K is positive semi-definite. That is,

x^T B x = x^T x + λ(Kx)^T(Kx)   (3.23)
≥ x^T x   ∀ x   (3.24)
i.e., x^T B x > 0 for all x ≠ 0   (3.25)
It follows that B is invertible.
Theorem.

f(v) = (v − B^{−1}u)^T B(v − B^{−1}u) + a constant   (3.26)

Proof. Using the symmetry of B, so that (B^{−1}u)^T = u^T B^{−1},

(v − B^{−1}u)^T B(v − B^{−1}u) = (v^T − u^T B^{−1}) B (v − B^{−1}u)   (3.27)
= v^T B(v − B^{−1}u) − u^T(v − B^{−1}u)   (3.28)
= v^T B v − v^T u − u^T v + u^T B^{−1} u   (3.29)
= v^T B v − 2v^T u + constant   (3.30)
= f(v) + constant   (3.31)
Conclusion.

Since B is positive definite, the minimum value of f(v) occurs when

v − B^{−1}u = 0   (3.32)

i.e.,

v = B^{−1}u   (3.33)
This is the Whittaker-Henderson vector of smoothed values. Note that each vi will depend in general on all the ui's and not just the values nearby. This is in contrast to adjusted average smoothing.
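The solve step can be sketched in a few lines (an illustration assuming numpy, not part of the notes). Since the third differences of any quadratic vanish, a quadratic input should be returned unchanged for any λ:

```python
import numpy as np

def whittaker_henderson(u, lam):
    """Return v = B^{-1} u with B = I + lam * K^T K, K the third-difference matrix."""
    n = len(u)
    K = np.diff(np.eye(n), 3, axis=0)   # (n-3) x n: (K v)_i = third difference of v at i
    B = np.eye(n) + lam * K.T @ K
    return np.linalg.solve(B, u)

t = np.arange(20, dtype=float)
quad = 0.5 + 0.1 * t - 0.02 * t**2      # quadratic trend: third differences vanish
v = whittaker_henderson(quad, lam=100.0)
print(np.max(np.abs(v - quad)))         # essentially zero
```

Adding noise to the input and varying λ shows the trade-off between adherence to data and smoothness.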
3.2 Adjusted average smoothing
Suppose that the raw rates are

vx = ux + ex   (3.34)

for a number of consecutive values of x, where {ux} denotes a smooth trend (unknown) and {ex} denotes Gaussian white noise. An adjusted average formula is a set of coefficients,
{Kj : j = α, α+ 1, . . . , β} (3.35)
which are used to find the smoothed rates
v′x = Σ_{j=α}^{β} Kj v_{x+j} = u′x + e′x   (3.36)
We may write

v′x = Σ_{j=−∞}^{∞} Kj v_{x+j}   (3.37)

if we set Kj = 0 for j < α and j > β.
Special formulae may be needed near the tails of the data set. The length of the formula is l = β − α + 1. A central formula is such that α = −β and thus has length l = 2β + 1. A symmetric formula is a central formula with Kj = K−j for j = 1, 2, . . . , β.
For example, Spencer's 15 term formula is a symmetric formula of length 15,

(1/320)(−3, −6, −5, 3, 21, 46, 67, 74, 67, 46, 21, 3, −5, −6, −3)   (3.38)
We want u′x to be as close as possible to ux, minimising distortion of the trend; in particular we often wish the formula to be exact on cubics. We also want the new errors {e′x} to be smaller than the old errors, and smoother than the old errors (or perhaps a combination of these features). These properties are assessed by variances, i.e., we want
var(e′x) < var(ex) (3.39)
var(∆3e′x) < var(∆3ex) (3.40)
In particular we define, for z = 0, 1, 2, . . .,

R²_z = var(∆^z e′x) / var(∆^z ex)   (3.41)

We may choose formulae to make R²_z as small as possible for some z. In particular, when z = 0 we obtain minimum variance formulae if we minimise R²_0. These formulae are also called Savitzky-Golay formulae and are widely used in chemistry. We define the error reducing index,

E.R.I. = √(R²_0) = (s.d. of e′x)/(s.d. of ex)   (3.42)

and the smoothing index,

S.I. = √(R²_3) = (s.d. of ∆³e′x)/(s.d. of ∆³ex)   (3.43)
We will show later how to find the formulae which minimise R²_z.
We also want to avoid distortion of the trend; recall that
u′x = Σ_{j=α}^{β} Kj u_{x+j}   (3.44)
should be as close to the original trend as possible. In particular, we often require our smoothing to be 'exact on cubics', i.e.,
u′x = ux if ux+j is a cubic for α ≤ j ≤ β (3.45)
Now suppose that u_{x+j} is a cubic for α ≤ j ≤ β, i.e.,

u_{x+j} = β0 + β1 j + β2 j² + β3 j³   for α ≤ j ≤ β   (3.46)
where β0, β1, β2, β3 are constants which may or may not depend on x. We have,
u′x = Σ_{j=α}^{β} Kj u_{x+j} = Σ_{j=α}^{β} Kj {β0 + β1 j + β2 j² + β3 j³}   (3.47)
= β0(Σ Kj) + β1(Σ j Kj) + β2(Σ j² Kj) + β3(Σ j³ Kj)   (3.48)

and this must equal ux = β0 for any β0, β1, β2, β3.   (3.49)
We thus have four equations which must be satisfied for a formula to be exact on cubics:

Σ_{j=α}^{β} Kj = 1,   Σ_{j=α}^{β} j Kj = 0,   Σ_{j=α}^{β} j² Kj = 0,   Σ_{j=α}^{β} j³ Kj = 0   (3.50)
If the formula is symmetric, i.e., α = −β and Kj = K−j, the odd-moment conditions hold automatically and we only need two equations:

Σ_{j=−β}^{β} Kj = 1,   Σ_{j=1}^{β} j² Kj = 0   (3.51)
3.2.1 Example: Spencer’s 21 term formula
This is a symmetric formula of length 21, with β = 10; the coefficients are

(1/350)(−1, −3, −5, −5, −2, 6, 18, 33, 47, 57, 60, . . .)   (3.52)

where 60 is the central term. We can check that this is exact on cubics and calculate the ERI and SI. The ERI = 0.378 (moderate) and the SI = 0.00626 (good).
3.2.2 Example: King's adjusted average formula
A formula of length 15:

Kj = −0.008 for j = −7, −6, −5, −4, −3
Kj = 0.216 for j = −2, −1, 0, 1, 2
Kj = −0.008 for j = 3, 4, 5, 6, 7   (3.53)

This formula was devised to deal with systematic errors with a 5 year cycle. We can check that it is exact on cubics; the ERI is 0.484.
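Both checks are easy to automate (a sketch assuming numpy; the coefficients are those of the formula above). The four moment conditions confirm exactness on cubics, and the ERI is √(Σ Kj²):

```python
import numpy as np

# King's coefficients K_j for j = -7, ..., 7
K = np.array([-0.008] * 5 + [0.216] * 5 + [-0.008] * 5)
j = np.arange(-7, 8)

# Exactness on cubics: sum K_j = 1 and sum j^p K_j = 0 for p = 1, 2, 3
moments = [float(np.sum(j**p * K)) for p in range(4)]

eri = float(np.sqrt(np.sum(K**2)))   # error reducing index, sqrt(R^2_0)
print(moments, round(eri, 3))        # moments ~ [1, 0, 0, 0]; ERI = 0.484
```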
3.2.3 Formulae for R²_z
We note that

∆^z ex = Σ_{j=0}^{z} (z choose j)(−1)^{z−j} e_{x+j}   (3.54)

Using ∆ = E − I we can show that:

var(∆^z ex) = var[Σ_{j=0}^{z} (z choose j)(−1)^{z−j} e_{x+j}]   (3.55)
= σ² Σ_{j=0}^{z} [(z choose j)(−1)^{z−j}]²   (3.56)
= (2z choose z) σ²   (3.57)
It can be shown (not here) that

var(∆^z e′x) = Σ_{j=α−z}^{β} (∆^z Kj)² σ²   (3.58)

where we take Kj = 0 if j < α or j > β. Hence

R²_z = [Σ_{j=α−z}^{β} (∆^z Kj)²] / (2z choose z)   (3.59)

In particular if z = 3 we get:

R²_3 = [Σ_{j=α−3}^{β} (∆³Kj)²] / 20   (3.60)
Hence we can evaluate the smoothing index by constructing a difference table for {Kj}. We may find minimum R²_z formulae by matrix methods, or in some cases by direct calculation. We omit the theory of minimum R²_z formulae, except for a few special cases which may be dealt with by solving a few equations for K0, K1, etc.
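The difference-table evaluation can be sketched as follows (assuming numpy; `R2` is a hypothetical helper name). Padding {Kj} with z zeros on each side before differencing implements the sum over j = α − z, . . . , β. As a sanity check, the identity 'formula' K = (1) leaves the errors untouched, so R²_z = 1 for every z:

```python
import numpy as np
from math import comb

def R2(K, z):
    """R^2_z = sum_j (Delta^z K_j)^2 / C(2z, z), with K_j = 0 outside the formula."""
    padded = np.concatenate([np.zeros(z), np.asarray(K, float), np.zeros(z)])
    dz = np.diff(padded, z) if z > 0 else padded
    return float(np.sum(dz**2)) / comb(2 * z, z)

print(R2([1.0], 0), R2([1.0], 3))   # 1.0 1.0: identity formula neither reduces nor smooths
```

For any formula K, the indices above are then E.R.I. = √R2(K, 0) and S.I. = √R2(K, 3).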
3.2.4 The Slutsky-Yule effect
This effect is the fact that the 'new' errors {e′x} are correlated, and consequently the residuals are also correlated:

v′x − vx = e′x − ex   (3.61)
assuming no distortion of the trend. To show this, we note that:

e′x = Σ_{j=α}^{β} Kj e_{x+j}   (3.62)

where the {e_{x+j}} are independent N(0, σ²) variables. Now, for k = 1, 2, . . .,

cov(e′x, e′_{x+k}) = E(e′x e′_{x+k}) − 0   (3.63)

since E(e′x) = E(e′_{x+k}) = 0, therefore:

cov(e′x, e′_{x+k}) = E[(Σ_{i=α}^{β} Ki e_{x+i})(Σ_{j=α}^{β} Kj e_{x+k+j})]   (3.64)
= Σ_{i=α}^{β} Σ_{j=α}^{β} Ki Kj E(e_{x+i} e_{x+k+j})   (3.65)
= (Σ_{i=α}^{β} Ki K_{i−k}) σ²   (3.66)
where we define K_{i−k} = 0 if i − k lies outside the range α to β. Hence the autocorrelation between e′x and e′_{x+k} is given by:

ρk = cov(e′x, e′_{x+k}) / √(var(e′x) var(e′_{x+k}))   (3.67)
= Σ_i Ki K_{i+k} / Σ_i Ki²   (3.68)

which is generally non-zero. Similar calculations show that the residuals are also correlated, since

e′x − ex = (K0 − 1)ex + Σ_{j≠0} Kj e_{x+j}   (3.69)
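The effect is easy to see by simulation (a sketch assuming numpy; Spencer's 15 term coefficients are used as the formula). The lag-1 autocorrelation of the smoothed errors agrees with (3.68) and is far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.normal(0, 1, 200_000)                       # independent errors e_x

# Spencer's 15 term formula (divisor 320)
K = np.array([-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]) / 320

e_smooth = np.convolve(e, K, mode="valid")          # smoothed errors e'_x

r1 = np.corrcoef(e_smooth[:-1], e_smooth[1:])[0, 1] # empirical lag-1 autocorrelation
rho1 = np.sum(K[:-1] * K[1:]) / np.sum(K**2)        # theoretical value from (3.68)
print(round(r1, 2), round(rho1, 2))                 # both about 0.92
```

A smooth formula makes adjacent coefficients nearly equal, which is exactly why the smoothed noise is so strongly autocorrelated.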
4 Autoregressive moving average models
ARMA (autoregressive moving average) models are widely used for time series with no trend (or possibly a constant trend). If the trend is polynomial, we can use ARIMA (autoregressive integrated moving average) models; however, these models assume there is a long term polynomial trend, which may not be reliable.
4.1 MA(q)
Let {Zt} be [Gaussian] white noise. We suppose that {Yt} is (at least) second order stationary with zero mean, i.e., E(Yt) = 0 for all t. {Yt} is said to be a moving average model of order q, MA(q), if there are constants θ1, θ2, . . . , θq such that

Yt = Zt + θ1 Z_{t−1} + . . . + θq Z_{t−q}   ∀ t   (4.1)
This model has the following properties:
1. var(Yt) = (1 + θ1² + θ2² + . . . + θq²) σ²   (4.2)
2. For k = 1, 2, . . . , q,

γ(k) = cov(Yt, Y_{t+k}) = E(Yt Y_{t+k}) − E(Yt) E(Y_{t+k})   (4.3)
= E[(Zt + . . . + θq Z_{t−q})(Z_{t+k} + . . . + θq Z_{t+k−q})]   (4.4)
= σ² [Σ_{i=0}^{q−k} θi θ_{i+k}]   (4.5)
where we define θ0 = 1 and use the fact that

E(Zi Zj) = σ² if i = j, 0 otherwise   (4.6)

For k > q, γ(k) = 0, and general theory shows that γ(−k) = γ(k). Thus:

ρ(k) = Σ_{i=0}^{q−k} θi θ_{i+k} / Σ_{i=0}^{q} θi²   for 0 ≤ k ≤ q   (4.7)

with ρ(−k) = ρ(k), ρ(0) = 1 and ρ(k) = 0 for k > q.
We may write the MA(q) model as:

Yt = θ(B) Zt   ∀ t   (4.8)

where B is the backward shift operator (B Zt = Z_{t−1}) and

θ(B) = 1 + θ1 B + θ2 B² + . . . + θq B^q   (4.9)

We sometimes write θ0 = 1 as before.

We want our model to be 'invertible'; it is sufficient for invertibility that the roots of the equation θ(B) = 0 lie outside the unit circle. Consider the MA(1) model where

Yt = Zt − θ Z_{t−1}   (4.10)
= θ(B) Zt   with θ(B) = 1 − θB   (4.11)

The root of θ(B) = 0 is 1/θ, so we have an invertible process when |θ| < 1.
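A quick simulation (a sketch assuming numpy) confirms the two fingerprints of this MA(1) model: ρ(1) = −θ/(1 + θ²), and an ACF that cuts off after lag q = 1:

```python
import numpy as np

rng = np.random.default_rng(42)
theta = 0.6                                  # |theta| < 1, so the model is invertible
z = rng.normal(0, 1, 100_000)
y = z[1:] - theta * z[:-1]                   # MA(1): Y_t = Z_t - theta Z_{t-1}

r1 = np.corrcoef(y[:-1], y[1:])[0, 1]        # sample ACF at lag 1
r2 = np.corrcoef(y[:-2], y[2:])[0, 1]        # sample ACF at lag 2
rho1 = -theta / (1 + theta**2)               # theoretical ACF at lag 1

print(round(r1, 2), round(rho1, 2), round(r2, 2))   # r2 is near zero
```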
4.2 MA(2)
Here we have

Yt = Zt + θ1 Z_{t−1} + θ2 Z_{t−2}   (4.12)
var(Yt) = (1 + θ1² + θ2²) σ²   (4.13)
ρ(k) = Σ_{i=0}^{q−k} θi θ_{i+k} / Σ_{i=0}^{q} θi²   (4.14)

which gives

ρ(1) = θ1(1 + θ2) / (1 + θ1² + θ2²)   (4.15)
ρ(2) = θ2 / (1 + θ1² + θ2²)   (4.16)
5 Autoregressive models
We say that {Yt} is autoregressive of order p, AR(p), if

Yt = φ1 Y_{t−1} + . . . + φp Y_{t−p} + Zt   ∀ t   (5.1)

for some constants φ1, φ2, . . . , φp. We may write this as:

φ(B) Yt = Zt   ∀ t   (5.2)

where

φ(B) = 1 − φ1 B − φ2 B² − . . . − φp B^p   (5.3)
We normally assume E(Yt) = 0 for all t, and for technical reasons wish the process to be invertible (or 'causal'). This is the case if the roots of φ(B) = 0 lie outside the unit circle. We then find that

Yt = (1/φ(B)) Zt = Σ_{i=0}^{∞} ψi Z_{t−i}

where the {ψi} depend on the φ's, with ψ0 = 1 and Σ_{i=0}^{∞} |ψi| < ∞.
We recall that:

Yt = Σ_{i=0}^{∞} ψi Z_{t−i}   (5.4)
It follows that for k = 1, 2, 3, . . .

γ(k) = cov(Yt, Y_{t+k}) = E(Yt Y_{t+k}) − E(Yt) E(Y_{t+k})   (5.5)
= E[(Σ_{i=0}^{∞} ψi Z_{t−i})(Σ_{j=0}^{∞} ψj Z_{t+k−j})]   (5.6)
= Σ_{i=0}^{∞} Σ_{j=0}^{∞} ψi ψj E(Z_{t−i} Z_{t+k−j})   (5.7)
= σ² Σ_{i=0}^{∞} ψi ψ_{i+k}   (5.8)

since E(Z_{t−i} Z_{t+k−j}) = σ² if j = i + k and 0 otherwise.
Thus

ρ(k) = cov(Yt, Y_{t+k}) / √(var(Yt) var(Y_{t+k}))   (5.9)
Now

var(Yt) = var(Σ_{i=0}^{∞} ψi Z_{t−i}) = (Σ_{i=0}^{∞} ψi²) σ² = var(Y_{t+k})   (5.10)

Hence

ρ(k) = Σ_{i=0}^{∞} ψi ψ_{i+k} / Σ_{i=0}^{∞} ψi²   (5.11)
Note that ρ(−k) = ρ(k) for all k and ρ(0) = 1.
5.1 Example
Consider the AR(1) model:

Yt = φ1 Y_{t−1} + Zt   (5.12)

where |φ1| < 1, φ1 ≠ 0. We have shown that

ψi = φ1^i   for i = 0, 1, 2, . . .   (5.13)

Therefore

ρ(k) = Σ_{i=0}^{∞} ψi ψ_{i+k} / Σ_{i=0}^{∞} ψi² = Σ_{i=0}^{∞} φ1^{2i+k} / Σ_{i=0}^{∞} φ1^{2i} = φ1^k   for k = 1, 2, . . .   (5.14)
Note that if the sample correlogram {rk} looks like:

[Figure 1: Sample correlogram for which an AR(1) model could be used]

we could try an AR(1) model (with 0 < φ1 < 1). Also, if the correlogram looks like:

[Figure 2: Sample correlogram for which an AR(1) model could be used]

again we could try an AR(1) model (now with −1 < φ1 < 0).
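The geometric decay ρ(k) = φ1^k is easy to verify by simulation (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(7)
phi = 0.7
n = 100_000
z = rng.normal(0, 1, n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + z[t]            # AR(1): Y_t = phi Y_{t-1} + Z_t

def r(k):
    """Sample autocorrelation of y at lag k."""
    return np.corrcoef(y[:-k], y[k:])[0, 1]

print([round(r(k), 2) for k in (1, 2, 3)])      # close to phi, phi^2, phi^3
print([round(phi**k, 2) for k in (1, 2, 3)])    # 0.7, 0.49, 0.34
```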
5.2 ARMA (p,q) models
These combine MA(q) and AR(p) models, i.e.,

φ(B) Yt = θ(B) Zt   ∀ t   (5.15)

or

[1 − φ1 B − φ2 B² − . . . − φp B^p] Yt = [1 + θ1 B + . . . + θq B^q] Zt   (5.16)
These models include MA(q) and AR(p) as special cases if we allow p = 0 and q = 0 respectively. We normally require the model to be 'regular', i.e.,
1. The roots of φ(B) and θ(B) lie outside the unit circle
2. φ(B) and θ(B) have no common roots (which could be cancelled to simplify the model)
We again have the expression

Yt = Σ_{i=0}^{∞} ψi Z_{t−i}   (5.17)

which follows from the formal division

Yt = [θ(B)/φ(B)] Zt = ψ(B) Zt   (5.18)

where

ψ(B) = θ(B)/φ(B) = ψ0 + ψ1 B + ψ2 B² + . . .   (5.19)

where ψ0 = 1 and the coefficients satisfy

Σ_{i=0}^{∞} |ψi| < ∞   (5.20)
Sometimes we can find ψ(B) by little tricks. For example, suppose that:

φ(B) Yt = θ(B) Zt   (5.21)

where φ(B) = 1 − αB and θ(B) = 1 + 3αB. We know that the roots lie outside the unit circle because we are given that 0 < α < 1/3. Thus

Yt = ψ(B) Zt   (5.22)
ψ(B) = (1 + 3αB)(1 − αB)^{−1}   (5.23)
= (1 + 3αB)(1 + αB + α²B² + . . .)   (5.24)
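Collecting powers of B in (5.24) gives ψ0 = 1 and ψi = α^i + 3α·α^{i−1} = 4α^i for i ≥ 1; a small sketch (plain Python, with an arbitrary admissible α) confirms this:

```python
alpha = 0.2     # any value with 0 < alpha < 1/3

def psi(i):
    """Coefficient of B^i in (1 + 3*alpha*B)(1 + alpha*B + alpha^2*B^2 + ...)."""
    return 1.0 if i == 0 else alpha**i + 3 * alpha * alpha**(i - 1)

# Collecting terms: psi_0 = 1 and psi_i = 4 * alpha^i for i >= 1
print([psi(i) for i in range(4)])
print([1.0] + [4 * alpha**i for i in (1, 2, 3)])
```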
This allows us to evaluate ψ(B). We have assumed until now that the trend is zero; we can also have a constant trend, E(Yt) = µ, where µ may be non-zero. Suppose we have an ARMA(p,q) model with E(Yt) = µ for all t. We may subtract µ from Yt and proceed as before, i.e.,

φ(B)[Yt − µ] = θ(B) Zt   (5.25)

This can be written as

φ(B) Yt = θ(B) Zt + µ′   (5.26)

where µ′ = (1 − φ1 − . . . − φp) µ. We shall normally assume that E(Yt) = 0.
Parameter estimation: assuming an ARMA(p,q) model with E(Yt) = µ, we have the parameters {φi}, {θi}, σ² and µ. We should check the fit by residual analysis (there are various tests) and perhaps alter p and q and try again. This process is usually done by a computer.
6 ARIMA models
Now suppose that E(Yt) is not necessarily constant, but is a polynomial of degree d. This assumption may be questionable over a long time-scale. We now use ARIMA(p,d,q) models, where ARIMA stands for Auto Regressive Integrated Moving Average. We say that {Yt} is ARIMA(p,d,q) if

Wt = (I − B)^d Yt = ∇^d Yt   (t = 1, 2, 3, . . .)   (6.1)

is ARMA(p,q), where ∇ = I − B is the backward difference operator. If E(Wt) = µ ≠ 0, then the trend in Yt is a polynomial of degree d. We consider t = 1, 2, . . . only, and we specify Y0 and possibly Y−1, Y−2, . . . depending on d.
6.1 Example: Random walk with drift
Consider the ARIMA(0,1,0) model:

Wt = ∇Yt = Zt + µ   (6.2)

We assume that Y0 is specified; we then have:

Y1 − Y0 = Z1 + µ, so Y1 = Y0 + Z1 + µ   (6.3)

and

Y2 − Y1 = Z2 + µ, so Y2 = Y0 + Z1 + Z2 + 2µ   (6.4)

and so on, i.e.,

Yt = Y0 + tµ + Z1 + Z2 + . . . + Zt   (6.5)

The expectation of Yt is given by:

E(Yt) = Y0 + tµ   (6.6)
as the expectation of all the white noise is zero. The variance is given by:
var(Yt) = tσ2 (6.7)
Notes:
1. If {Zt} is Gaussian white noise then Yt ∼ N(Y0 + µt, tσ2)
2. If µ = 0, there is no drift
3. For any t we are 95% confident that Yt lies in the parabola shown:
[Figure 3: 95% confidence limits Y0 + µt ± 1.96σ√t on the ARIMA(0,1,0) model]
The correlation coefficient of Yt and Ys (for t ≤ s) is given by

ρ_{t,s} = √(t/s)   (6.8)

since cov(Yt, Ys) = min(t, s) σ².
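A Monte Carlo sketch (assuming numpy) agrees with both var(Yt) = tσ² and ρ_{t,s} = √(t/s):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.5, 1.0
# 100,000 independent paths of Y_t - Y_0 for t = 1, ..., 20
paths = rng.normal(mu, sigma, size=(100_000, 20)).cumsum(axis=1)

t, s = 5, 20
r_ts = np.corrcoef(paths[:, t - 1], paths[:, s - 1])[0, 1]
print(round(r_ts, 2), round((t / s) ** 0.5, 2))   # both about 0.5
print(round(float(paths[:, t - 1].var()), 1))     # about t * sigma^2 = 5.0
```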
6.2 Example: Stock market
Suppose you invest £1,000 now, and let:

Yt = ln[proceeds of investment at time t],   t = 1, 2, . . .   (6.9)

It is assumed that

ln[(proceeds at time t)/(proceeds at time t − 1)] = µ + Zt   (6.10)

where µ is a constant and {Zt} is Gaussian white noise. Thus,

Wt = Yt − Y_{t−1} = µ + Zt ∼ N(µ, σ²)   (6.11)

By previous results

Yt ∼ N(Y0 + µt, tσ²)   (6.12)
where Y0 = ln(1000). Suppose that µ = 0.058135 and σ² = 0.000267. What is the chance that the proceeds at time t = 15 will be under £2,100? E(Y15) = ln(1000) + 15µ = 7.77978 and the standard deviation of Y15 is √(15σ²) = 0.063285. It follows that

P(Y15 < ln(2100)) = P(Z < [ln(2100) − 7.77978]/0.063285)   (6.13)
≈ 0.02   (6.14)
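The arithmetic can be reproduced with the standard normal CDF written via erf (Python standard library only):

```python
from math import log, sqrt, erf

mu, sigma2 = 0.058135, 0.000267
y0 = log(1000)

mean15 = y0 + 15 * mu            # E(Y_15) = 7.77978...
sd15 = sqrt(15 * sigma2)         # s.d. of Y_15 = 0.063285...

z = (log(2100) - mean15) / sd15
p = 0.5 * (1 + erf(z / sqrt(2))) # Phi(z), the standard normal CDF
print(round(p, 2))               # 0.02
```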
6.3 Forecasting with ARMA models
We recall that

Yt = Zt + ψ1 Z_{t−1} + ψ2 Z_{t−2} + . . .   (6.15)

Suppose we are at time n and we wish to forecast the value h steps ahead, Y_{n+h}. The 'natural' forecast is:

Ŷ_{n+h} = E(Y_{n+h} | Yn, Y_{n−1}, . . .)   (6.16)

where Yn, Y_{n−1}, . . . are known. It is stated without proof that:

Ŷ_{n+h} = Σ_{i=h}^{∞} ψi Z_{n+h−i}   (6.17)
The error in the forecast is:

en(h) = Y_{n+h} − Ŷ_{n+h} = Σ_{i=0}^{h−1} ψi Z_{n+h−i}   (6.18)

where Y_{n+h} is the real value and Ŷ_{n+h} the forecast. We observe that en(h) has mean 0 and variance

σ² Σ_{i=0}^{h−1} ψi²   (6.19)
If we assume Gaussian white noise for {Zt}, we have

en(h) ∼ N(0, σ² Σ_{i=0}^{h−1} ψi²)   (6.20)

Thus a 95% prediction interval for Y_{n+h} is given by

Ŷ_{n+h} ± 1.96 σ √(Σ_{i=0}^{h−1} ψi²)   (6.21)

The width of the interval increases as h increases, as expected.
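For a concrete case, take an AR(1) model, for which ψi = φ^i. The half-width of the 95% prediction interval grows with h towards a finite stationary limit (a sketch with illustrative parameters, standard library only):

```python
import math

phi, sigma = 0.7, 1.0   # illustrative AR(1) parameters, psi_i = phi^i

def halfwidth(h):
    """Half-width 1.96 * sigma * sqrt(sum_{i=0}^{h-1} psi_i^2) of the 95% interval."""
    s2 = sum(phi ** (2 * i) for i in range(h))
    return 1.96 * sigma * math.sqrt(s2)

print([round(halfwidth(h), 2) for h in (1, 2, 5, 20)])
# increases towards the stationary limit 1.96 * sigma / sqrt(1 - phi^2)
print(round(1.96 * sigma / math.sqrt(1 - phi**2), 2))
```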
6.3.1 Example: An ARIMA(1,1,0) model
Suppose that:

Wt = ∇Yt = φ W_{t−1} + Zt   (6.22)

i.e.,

Yt = (1 + φ) Y_{t−1} − φ Y_{t−2} + Zt   (6.23)

where we take Y0 = Y−1 = 0. We get that

Y1 = Z1   (6.24)
Y2 = (1 + φ) Z1 + Z2   (6.25)
Y3 = Z3 + (1 + φ) Z2 + (1 + φ + φ²) Z1   (6.26)
Giving in general

Yt = Zt + (1 + φ) Z_{t−1} + . . . + (1 + φ + φ² + . . . + φ^{t−1}) Z1   for t = 1, 2, 3, . . .   (6.27)

Thus E(Yt) = 0 for all t, and using the geometric progression formula the variance is given by

var(Yt) = [1 + ((1 − φ²)/(1 − φ))² + . . . + ((1 − φ^t)/(1 − φ))²] σ²   (6.28)
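A direct check (plain Python, with illustrative parameter values): summing the squared coefficients of (6.27) reproduces the closed form (6.28):

```python
phi, sigma = 0.5, 1.0
t = 6

# Coefficient of Z_{t-j} in (6.27) is 1 + phi + ... + phi^j = (1 - phi^(j+1)) / (1 - phi)
coeffs = [(1 - phi ** (j + 1)) / (1 - phi) for j in range(t)]
direct = sigma**2 * sum(c**2 for c in coeffs)

# Closed form (6.28): terms (1 - phi^j)^2 / (1 - phi)^2 for j = 1, ..., t
closed = sigma**2 * sum(((1 - phi**j) / (1 - phi)) ** 2 for j in range(1, t + 1))
print(direct, closed)   # equal
```

Note that var(Yt) grows without bound as t increases, as it should for an integrated process.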