Time Series Analysis and Forecasting
Will McLennan
Based on lectures by Dr W. Scott
Short Description
Management can only hope to achieve effective planning and control procedures if reliable forecasts can be made. Stochastic models for time-dependent data are considered.
Seasonal variation; weighted moving averages; Savitzky-Golay smoothing; the use of SPSS in fitting models; practical examples from science and business
Summary of Intended Learning Outcomes
An understanding of the principles behind modern forecasting techniques. The ability to select an appropriate model, to fit parameter values, and to carry out the forecasting calculations.
Contents
1 Introduction
2 Sample autocovariance and autocorrelation
   2.1 White noise
   2.2 The large-sample distribution of rk
   2.3 Hypothesis testing
3 Smoothing methods
   3.1 Whittaker-Henderson smoothing
   3.2 Adjusted average smoothing
      3.2.1 Example: Spencer's 21 term formula
      3.2.2 Example: King's adjusted average formula
      3.2.3 Formulae for R²_z
      3.2.4 The Slutsky-Yule effect
4 Autoregressive moving average models
   4.1 MA(q)
   4.2 MA(2)
5 Autoregressive models
   5.1 Example
   5.2 ARMA(p,q) models
6 ARIMA models
   6.1 Example: Random walk with drift
   6.2 Example: Stock market
   6.3 Forecasting with ARMA models
      6.3.1 Example: An ARIMA(1,1,0) model
1 Introduction
A time series may be thought of as a set of measurements {Xt} indexed by time, t, which may be continuous or discrete. We say that the process {Xt} is strictly stationary if for all m, k and t1, t2, . . . , tm the joint distribution of (X_{t1+k}, X_{t2+k}, . . . , X_{tm+k}) is the same as that of (X_{t1}, X_{t2}, . . . , X_{tm}).
Theorem. If {Xt} is strictly stationary,
E(Xt) = µ (a constant) ∀ t (1.1)
and
cov(X_{t1}, X_{t2}) = E(X_{t1} X_{t2}) − E(X_{t1}) E(X_{t2})   (1.2)
= γx(t1 − t2)   (1.3)

where γx(t1 − t2) is an even function of t1 − t2.
A process for which 1.1 and 1.3 hold is said to be weakly stationary (or second-order stationary). Thus strictly stationary ⇒ weakly stationary. The converse is not true unless {Xt} is a Gaussian (multivariate normal) process.
Definitions. Consider a second-order stationary process, {Xt}. The autocovariance function is defined as
γx(k) = cov(Xt, Xt+k) ∀ t (1.4)
γx(−k) = γx(k) ∀ k (1.5)
where k is the lag. The autocorrelation function is defined as

ρx(k) = corr(Xt, X_{t+k}) = γx(k)/γx(0)   (1.6)
Notes.
1. γx(0) = var(Xt) = cov(Xt, Xt) and var(Xt+k) = var(Xt)
2. ρx(k) = ρx(−k) for all k and ρx(0) = 1
3. If (as is often the case) we have a discrete process it is sufficient to consider ρx(k) for k = 1, 2, 3, . . . since ρx(−k) = ρx(k).
Remarks. When we have a (second-order) stationary process, E(Xt) = µ for all t, so there are no long-term trends (Xt tends to return to µ). This is not generally the case in practice. By considering differences of {Xt} we can allow for polynomial trends.
We can deal with polynomial trends by using ARIMA (Auto Regressive Integrated Moving Average) models, but one may not be sure that a polynomial trend will continue far into the future.
2 Sample autocovariance and autocorrelation
Suppose that we have a discrete process {Xt} which is, at least, weakly stationary, and that we have obtained the observations X1, X2, X3, . . . , Xn. µ is estimated by the sample mean and we estimate γx(k) by the sample covariance at lag k, i.e.,

Ck = [Σ_{t=1}^{n−k} (Xt − X̄)(X_{t+k} − X̄)] / n*   (2.1)

*some books use n − k here. We estimate ρx(k) by

rk = sample autocorrelation at lag k = Ck/C0 = ACF(k)   (2.2)
where ACF is the autocorrelation function.
1. We should only use rk when k is smaller than about n/3.
2. A plot of rk against k is called a correlogram and some features of the time series may be deduced from its correlogram.
2.1 White noise
Definition. A white noise process, {et}, is such that
E(et) = 0 ∀ t (2.3)
{et} are uncorrelated (2.4)
var(et) = σ2 ∀ t (2.5)
That is, γe(k) = 0 for all k ≠ 0. If, in addition, et ∼ N(0, σ²) for all t, we have Gaussian white noise. Note that for Gaussian white noise the {et} are independent (as well as uncorrelated). There is also the so-called 'white noise plus constant mean' model, where we suppose that
Xt = µ+ et (2.6)
where {et} is white noise. Similarly for Gaussian white noise.
2.2 The large-sample distribution of rk
Suppose that {Xt} is a discrete white noise process. It may be shown that for large n,
rk ∼ N(0, 1/n)   (2.7)
for any given k = 1, 2, . . ., also the rk’s are approximately independent.
2.3 Hypothesis testing
We want to test whether {Xt} is white noise plus a constant mean. We must have constant variance for all t and in addition ρ(k) = 0 for k = 1, 2, 3, . . .. We will test the null hypothesis
H0 : ρ(k) = 0 (2.8)
against the alternative

H1 : ρ(k) ≠ 0   (2.9)
where k is fixed (i.e., k = 1, 2, 3, . . .). Under H0,
rk ∼ N(0, 1/n)   (2.10)
Hence we reject H0 at the 5% significance level if
|rk| > 1.96/√n   (2.11)
We may also construct a 95% confidence interval for ρ(k) by using the result that
rk ∼ N(ρ(k), 1/n)   (2.12)

which gives the 95% confidence interval rk ± 1.96/√n.
These calculations refer to a fixed k, but we may also wish to test
ρ(1) = ρ(2) = . . . = ρ(k) = 0 (2.13)
for some k (up to about n/3). This leads to the problem of multiple testing. In this problem, however, the quantities rk are approximately independent N(0, 1/n) variables if H0 is true. Thus the number of rk's which lie outside ±1.96/√n is approximately binomial(k, 0.05), which has a mean of 0.05k.
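As a numerical illustration (a sketch, not part of the original notes; it uses only the Python standard library), we can simulate white noise and count how many of the first 20 sample autocorrelations breach the ±1.96/√n bound:

```python
import random
import math

def sample_acf(x, k):
    """Sample autocorrelation r_k = C_k / C_0, with divisor n as in (2.1)."""
    n = len(x)
    xbar = sum(x) / n
    c0 = sum((xi - xbar) ** 2 for xi in x) / n
    ck = sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / n
    return ck / c0

random.seed(1)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]   # Gaussian white noise

bound = 1.96 / math.sqrt(n)                  # 5% critical value under H0
exceed = sum(1 for k in range(1, 21) if abs(sample_acf(x, k)) > bound)
# Under H0 the count is approximately binomial(20, 0.05), with mean 1
print(exceed)
```

With 20 lags tested, roughly one exceedance is expected even when H0 is true, which is exactly the multiple-testing point made above.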
3 Smoothing methods
3.1 Whittaker-Henderson smoothing
We first consider Whittaker-Henderson smoothing, in which we consider the:

{ui : i = 1, 2, . . . , n} = raw rates   (3.1)
{vi : i = 1, 2, . . . , n} = smoothed rates   (3.2)

We wish to choose v1, v2, . . . , vn so that, firstly, they are fairly close to the ui's (adherence to data) and, secondly, they are quite smooth (this is often measured by third differences). In particular, we choose v1, v2, . . . , vn to minimise the quantity

Σ_{i=1}^{n} (ui − vi)² + λ Σ_{i=1}^{n−3} (∆³vi)²   (3.3)
where λ is a positive constant and ∆ is the forward difference operator.
Let I = Identity operator so that
If(x) = f(x) ∀x (3.4)
and let E = forward shift operator so that
Ef(x) = f(x+ 1) ∀x (3.5)
We define the forward difference operator ∆ to be
∆ = E − I (3.6)
Thus
∆f(x) = (E − I)f(x) = f(x+ 1)− f(x) (3.7)
∆2f(x) = ∆(∆f(x)) = f(x+ 2)− 2f(x+ 1) + f(x) (3.8)
In general we have the formal binomial expansion

∆^p f(x) = (E − I)^p f(x)   (3.9)
= Σ_{j=0}^{p} (p choose j)(−1)^{p−j} f(x + j)   (3.10)
If we now write u and v in vector form as:
u = (u1, u2, . . . , un)T (3.11)
v = (v1, v2, . . . , vn)T (3.12)
Thus

∆³vi = Σ_{r=0}^{3} (3 choose r)(−1)^{3−r} v_{i+r}   (3.13)
= Σ_{j=i}^{i+3} (3 choose j−i)(−1)^{3+j−i} vj   (1 ≤ i ≤ n − 3)   (3.14)

We conclude that

∆³vi = Σ_{j=1}^{n} Kij vj   (3.15)

where K is the (n − 3) × n matrix with entries

Kij = (−1)^{3+j−i} (3 choose j−i)   for j = i, i+1, i+2, i+3
Kij = 0   otherwise   (3.16)
Hence

Kv = (∆³v1, ∆³v2, . . . , ∆³v_{n−3})^T   (3.17)

and therefore

Σ_{i=1}^{n−3} (∆³vi)² = (Kv)^T Kv   (3.18)
= v^T K^T K v   (3.19)
We may now express our minimisation problem in matrix form:

minimise f(v) = (v − u)^T(v − u) + λ v^T K^T K v   (3.20)
= v^T B v − 2v^T u + u^T u   (3.21)
where B = I + λ K^T K   (3.22)
This is a positive definite matrix, since I is positive definite and λK^T K is positive semi-definite. That is,

x^T B x = x^T x + λ(Kx)^T(Kx)   (3.23)
≥ x^T x   ∀ x   (3.24)
i.e., x^T B x > 0 for all x ≠ 0   (3.25)
It follows that B is invertible.
Theorem.

f(v) = (v − B^{−1}u)^T B(v − B^{−1}u) + a constant   (3.26)

Proof. Using the symmetry of B, so that (B^{−1}u)^T = u^T B^{−1},

(v − B^{−1}u)^T B(v − B^{−1}u) = (v^T − u^T B^{−1}) B (v − B^{−1}u)   (3.27)
= v^T B(v − B^{−1}u) − u^T(v − B^{−1}u)   (3.28)
= v^T B v − v^T u − u^T v + u^T B^{−1} u   (3.29)
= v^T B v − 2v^T u + constant   (3.30)
= f(v) + constant   (3.31)
Conclusion.

Since B is positive definite, the minimum value of f(v) occurs when

v − B^{−1}u = 0   (3.32)

i.e.,

v = B^{−1}u   (3.33)
This is the Whittaker-Henderson vector of smoothed values. Note that each vi will depend in general on all the ui's and not just the values nearby. This is in contrast to adjusted average smoothing.
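The solve step can be sketched in a few lines (an illustration assuming numpy, not part of the notes). Since the third differences of any quadratic vanish, a quadratic input should be returned unchanged for any λ:

```python
import numpy as np

def whittaker_henderson(u, lam):
    """Return v = B^{-1} u with B = I + lam * K^T K, K the third-difference matrix."""
    n = len(u)
    K = np.diff(np.eye(n), 3, axis=0)   # (n-3) x n: (K v)_i = third difference of v at i
    B = np.eye(n) + lam * K.T @ K
    return np.linalg.solve(B, u)

t = np.arange(20, dtype=float)
quad = 0.5 + 0.1 * t - 0.02 * t**2      # quadratic trend: third differences vanish
v = whittaker_henderson(quad, lam=100.0)
print(np.max(np.abs(v - quad)))         # essentially zero
```

Adding noise to the input and varying λ shows the trade-off between adherence to data and smoothness.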
3.2 Adjusted average smoothing
Suppose that the raw rates are

vx = ux + ex   (3.34)

for a number of consecutive values of x, where {ux} denotes a smooth trend (unknown) and {ex} denotes Gaussian white noise. An adjusted average formula is a set of coefficients,
{Kj : j = α, α+ 1, . . . , β} (3.35)
which are used to find the smoothed rates
v′x = Σ_{j=α}^{β} Kj v_{x+j} = u′x + e′x   (3.36)
We may write

v′x = Σ_{j=−∞}^{∞} Kj v_{x+j}   (3.37)

if we set Kj = 0 for j < α and j > β.
Special formulae may be needed near the tails of the data set. The length of the formula is l = β − α + 1. A central formula is such that α = −β and thus has length l = 2β + 1. A symmetric formula is a central formula with Kj = K−j for j = 1, 2, . . . , β.
For example, Spencer's 15 term formula is a symmetric formula of length 15,

(1/320)(−3, −6, −5, 3, 21, 46, 67, 74, 67, 46, 21, 3, −5, −6, −3)   (3.38)
We want u′x to be as close as possible to ux, minimising distortion of the trend; in particular we often wish the formula to be exact on cubics. We also want the new errors {e′x} to be smaller than the old errors, and smoother than the old errors (or perhaps a combination of these features). These properties are assessed by variances, i.e., we want
var(e′x) < var(ex) (3.39)
var(∆3e′x) < var(∆3ex) (3.40)
In particular we define, for z = 0, 1, 2, . . .,

R²_z = var(∆^z e′x) / var(∆^z ex)   (3.41)

We may choose formulae to make R²_z as small as possible for some z. In particular, when z = 0 we obtain minimum variance formulae if we minimise R²_0. These formulae are also called Savitzky-Golay formulae and are widely used in chemistry. We define the error reducing index,

E.R.I. = √(R²_0) = (s.d. of e′x)/(s.d. of ex)   (3.42)

and the smoothing index,

S.I. = √(R²_3) = (s.d. of ∆³e′x)/(s.d. of ∆³ex)   (3.43)
We will show later how to find the formulae which minimise R²_z.
We also want to avoid distortion of the trend; recall that
u′x = Σ_{j=α}^{β} Kj u_{x+j}   (3.44)
should be as close to the original trend as possible. In particular, we often require our smoothing to be 'exact on cubics', i.e.,
u′x = ux if ux+j is a cubic for α ≤ j ≤ β (3.45)
Now suppose that u_{x+j} is a cubic for α ≤ j ≤ β, i.e.,

u_{x+j} = β0 + β1 j + β2 j² + β3 j³   for α ≤ j ≤ β   (3.46)
where β0, β1, β2, β3 are constants which may or may not depend on x. We have,
u′x = Σ_{j=α}^{β} Kj u_{x+j} = Σ_{j=α}^{β} Kj {β0 + β1 j + β2 j² + β3 j³}   (3.47)
= β0(Σ Kj) + β1(Σ j Kj) + β2(Σ j² Kj) + β3(Σ j³ Kj)   (3.48)

and this must equal ux = β0 for any β0, β1, β2, β3.   (3.49)
We thus have four equations which must be satisfied for a formula to be exact on cubics:

Σ_{j=α}^{β} Kj = 1,   Σ_{j=α}^{β} j Kj = 0,   Σ_{j=α}^{β} j² Kj = 0,   Σ_{j=α}^{β} j³ Kj = 0   (3.50)
If the formula is symmetric, i.e., α = −β and Kj = K−j, the odd-moment conditions hold automatically and we only need two equations:

Σ_{j=−β}^{β} Kj = 1,   Σ_{j=1}^{β} j² Kj = 0   (3.51)
3.2.1 Example: Spencer’s 21 term formula
This is a symmetric formula of length 21, with β = 10; the coefficients are

(1/350)(−1, −3, −5, −5, −2, 6, 18, 33, 47, 57, 60, . . .)   (3.52)

where 60 is the central term. We can check that this is exact on cubics and calculate the ERI and SI. The ERI = 0.378 (moderate) and the SI = 0.00626 (good).
3.2.2 Example: King's adjusted average formula
A formula of length 15:

Kj = −0.008 for j = −7, −6, −5, −4, −3
Kj = 0.216 for j = −2, −1, 0, 1, 2
Kj = −0.008 for j = 3, 4, 5, 6, 7   (3.53)

This formula was devised to deal with systematic errors with a 5 year cycle. We can check that it is exact on cubics; the ERI is 0.484.
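Both checks are easy to automate (a sketch assuming numpy; the coefficients are those of the formula above). The four moment conditions confirm exactness on cubics, and the ERI is √(Σ Kj²):

```python
import numpy as np

# King's coefficients K_j for j = -7, ..., 7
K = np.array([-0.008] * 5 + [0.216] * 5 + [-0.008] * 5)
j = np.arange(-7, 8)

# Exactness on cubics: sum K_j = 1 and sum j^p K_j = 0 for p = 1, 2, 3
moments = [float(np.sum(j**p * K)) for p in range(4)]

eri = float(np.sqrt(np.sum(K**2)))   # error reducing index, sqrt(R^2_0)
print(moments, round(eri, 3))        # moments ~ [1, 0, 0, 0]; ERI = 0.484
```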
3.2.3 Formulae for R²_z
We note that

∆^z ex = Σ_{j=0}^{z} (z choose j)(−1)^{z−j} e_{x+j}   (3.54)

Using ∆ = E − I we can show that:

var(∆^z ex) = var[Σ_{j=0}^{z} (z choose j)(−1)^{z−j} e_{x+j}]   (3.55)
= σ² Σ_{j=0}^{z} [(z choose j)(−1)^{z−j}]²   (3.56)
= (2z choose z) σ²   (3.57)
It can be shown (not here) that

var(∆^z e′x) = Σ_{j=α−z}^{β} (∆^z Kj)² σ²   (3.58)

where we take Kj = 0 if j < α or j > β. Hence

R²_z = [Σ_{j=α−z}^{β} (∆^z Kj)²] / (2z choose z)   (3.59)

In particular if z = 3 we get:

R²_3 = [Σ_{j=α−3}^{β} (∆³Kj)²] / 20   (3.60)
Hence we can evaluate the smoothing index by constructing a difference table for {Kj}. We may find minimum R²_z formulae by matrix methods, or in some cases by direct calculation. We omit the theory of minimum R²_z formulae, except for a few special cases which may be dealt with by solving a few equations for K0, K1, etc.
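The difference-table evaluation can be sketched as follows (assuming numpy; `R2` is a hypothetical helper name). Padding {Kj} with z zeros on each side before differencing implements the sum over j = α − z, . . . , β. As a sanity check, the identity 'formula' K = (1) leaves the errors untouched, so R²_z = 1 for every z:

```python
import numpy as np
from math import comb

def R2(K, z):
    """R^2_z = sum_j (Delta^z K_j)^2 / C(2z, z), with K_j = 0 outside the formula."""
    padded = np.concatenate([np.zeros(z), np.asarray(K, float), np.zeros(z)])
    dz = np.diff(padded, z) if z > 0 else padded
    return float(np.sum(dz**2)) / comb(2 * z, z)

print(R2([1.0], 0), R2([1.0], 3))   # 1.0 1.0: identity formula neither reduces nor smooths
```

For any formula K, the indices above are then E.R.I. = √R2(K, 0) and S.I. = √R2(K, 3).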
3.2.4 The Slutsky-Yule effect
This effect is the fact that the 'new' errors {e′x} are correlated, and consequently the residuals are also correlated:

v′x − vx = e′x − ex   (3.61)
assuming no distortion of the trend. To show this, we note that:

e′x = Σ_{j=α}^{β} Kj e_{x+j}   (3.62)

where the {e_{x+j}} are independent N(0, σ²) variables. Now, for k = 1, 2, . . .,

cov(e′x, e′_{x+k}) = E(e′x e′_{x+k}) − 0   (3.63)

since E(e′x) = E(e′_{x+k}) = 0, therefore:

cov(e′x, e′_{x+k}) = E[(Σ_{i=α}^{β} Ki e_{x+i})(Σ_{j=α}^{β} Kj e_{x+k+j})]   (3.64)
= Σ_{i=α}^{β} Σ_{j=α}^{β} Ki Kj E(e_{x+i} e_{x+k+j})   (3.65)
= (Σ_{i=α}^{β} Ki K_{i−k}) σ²   (3.66)
where we define K_{i−k} = 0 if i − k lies outside the range α to β. Hence the autocorrelation between e′x and e′_{x+k} is given by:

ρk = cov(e′x, e′_{x+k}) / √(var(e′x) var(e′_{x+k}))   (3.67)
= Σ_i Ki K_{i+k} / Σ_i Ki²   (3.68)

which is generally non-zero. Similar calculations show that the residuals are also correlated, since

e′x − ex = (K0 − 1)ex + Σ_{j≠0} Kj e_{x+j}   (3.69)
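The effect is easy to see by simulation (a sketch assuming numpy; Spencer's 15 term coefficients are used as the formula). The lag-1 autocorrelation of the smoothed errors agrees with (3.68) and is far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.normal(0, 1, 200_000)                       # independent errors e_x

# Spencer's 15 term formula (divisor 320)
K = np.array([-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]) / 320

e_smooth = np.convolve(e, K, mode="valid")          # smoothed errors e'_x

r1 = np.corrcoef(e_smooth[:-1], e_smooth[1:])[0, 1] # empirical lag-1 autocorrelation
rho1 = np.sum(K[:-1] * K[1:]) / np.sum(K**2)        # theoretical value from (3.68)
print(round(r1, 2), round(rho1, 2))                 # both about 0.92
```

A smooth formula makes adjacent coefficients nearly equal, which is exactly why the smoothed noise is so strongly autocorrelated.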
4 Autoregressive moving average models
ARMA (autoregressive moving average) models are widely used for time series with no trend (or possibly a constant trend). If the trend is polynomial, we can use ARIMA (autoregressive integrated moving average) models; however, these models assume there is a long term polynomial trend, which may not be reliable.
4.1 MA(q)
Let {Zt} be [Gaussian] white noise. We suppose that {Yt} is (at least) second order stationary with zero mean, i.e., E(Yt) = 0 for all t. {Yt} is said to be a moving average model of order q, MA(q), if there are constants θ1, θ2, . . . , θq such that

Yt = Zt + θ1 Z_{t−1} + . . . + θq Z_{t−q}   ∀ t   (4.1)
This model has the following properties:
1. var(Yt) = (1 + θ1² + θ2² + . . . + θq²) σ²   (4.2)
2. For k = 1, 2, . . . , q,

γ(k) = cov(Yt, Y_{t+k}) = E(Yt Y_{t+k}) − E(Yt) E(Y_{t+k})   (4.3)
= E[(Zt + . . . + θq Z_{t−q})(Z_{t+k} + . . . + θq Z_{t+k−q})]   (4.4)
= σ² [Σ_{i=0}^{q−k} θi θ_{i+k}]   (4.5)
where we define θ0 = 1 and use the fact that

E(Zi Zj) = σ² if i = j, 0 otherwise   (4.6)

For k > q, γ(k) = 0, and general theory shows that γ(−k) = γ(k). Thus:

ρ(k) = Σ_{i=0}^{q−k} θi θ_{i+k} / Σ_{i=0}^{q} θi²   for 0 ≤ k ≤ q   (4.7)

with ρ(−k) = ρ(k), ρ(0) = 1 and ρ(k) = 0 for k > q.
We may write the MA(q) model as:

Yt = θ(B) Zt   ∀ t   (4.8)

where B is the backward shift operator (B Zt = Z_{t−1}) and

θ(B) = 1 + θ1 B + θ2 B² + . . . + θq B^q   (4.9)

We sometimes write θ0 = 1 as before.

We want our model to be 'invertible'; it is sufficient for invertibility that the roots of the equation θ(B) = 0 lie outside the unit circle. Consider the MA(1) model where

Yt = Zt − θ Z_{t−1}   (4.10)
= θ(B) Zt   with θ(B) = 1 − θB   (4.11)

The root of θ(B) = 0 is 1/θ, so we have an invertible process when |θ| < 1.
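A quick simulation (a sketch assuming numpy) confirms the two fingerprints of this MA(1) model: ρ(1) = −θ/(1 + θ²), and an ACF that cuts off after lag q = 1:

```python
import numpy as np

rng = np.random.default_rng(42)
theta = 0.6                                  # |theta| < 1, so the model is invertible
z = rng.normal(0, 1, 100_000)
y = z[1:] - theta * z[:-1]                   # MA(1): Y_t = Z_t - theta Z_{t-1}

r1 = np.corrcoef(y[:-1], y[1:])[0, 1]        # sample ACF at lag 1
r2 = np.corrcoef(y[:-2], y[2:])[0, 1]        # sample ACF at lag 2
rho1 = -theta / (1 + theta**2)               # theoretical ACF at lag 1

print(round(r1, 2), round(rho1, 2), round(r2, 2))   # r2 is near zero
```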
4.2 MA(2)
Here we have

Yt = Zt + θ1 Z_{t−1} + θ2 Z_{t−2}   (4.12)
var(Yt) = (1 + θ1² + θ2²) σ²   (4.13)
ρ(k) = Σ_{i=0}^{q−k} θi θ_{i+k} / Σ_{i=0}^{q} θi²   (4.14)

which gives

ρ(1) = θ1(1 + θ2) / (1 + θ1² + θ2²)   (4.15)
ρ(2) = θ2 / (1 + θ1² + θ2²)   (4.16)
5 Autoregressive models
We say that {Yt} is autoregressive of order p, AR(p), if

Yt = φ1 Y_{t−1} + . . . + φp Y_{t−p} + Zt   ∀ t   (5.1)

for some constants φ1, φ2, . . . , φp. We may write this as:

φ(B) Yt = Zt   ∀ t   (5.2)

where

φ(B) = 1 − φ1 B − φ2 B² − . . . − φp B^p   (5.3)
We normally assume E(Yt) = 0 for all t, and for technical reasons wish the process to be invertible (or 'causal'). This is the case if the roots of φ(B) = 0 lie outside the unit circle. We then find that

Yt = (1/φ(B)) Zt = Σ_{i=0}^{∞} ψi Z_{t−i}

where the {ψi} depend on the φ's, with ψ0 = 1 and Σ_{i=0}^{∞} |ψi| < ∞.
We recall that:

Yt = Σ_{i=0}^{∞} ψi Z_{t−i}   (5.4)
It follows that for k = 1, 2, 3, . . .

γ(k) = cov(Yt, Y_{t+k}) = E(Yt Y_{t+k}) − E(Yt) E(Y_{t+k})   (5.5)
= E[(Σ_{i=0}^{∞} ψi Z_{t−i})(Σ_{j=0}^{∞} ψj Z_{t+k−j})]   (5.6)
= Σ_{i=0}^{∞} Σ_{j=0}^{∞} ψi ψj E(Z_{t−i} Z_{t+k−j})   (5.7)
= σ² Σ_{i=0}^{∞} ψi ψ_{i+k}   (5.8)

since E(Z_{t−i} Z_{t+k−j}) = σ² if j = i + k and 0 otherwise.
Thus

ρ(k) = cov(Yt, Y_{t+k}) / √(var(Yt) var(Y_{t+k}))   (5.9)
Now

var(Yt) = var(Σ_{i=0}^{∞} ψi Z_{t−i}) = (Σ_{i=0}^{∞} ψi²) σ² = var(Y_{t+k})   (5.10)

Hence

ρ(k) = Σ_{i=0}^{∞} ψi ψ_{i+k} / Σ_{i=0}^{∞} ψi²   (5.11)
Note that ρ(−k) = ρ(k) for all k and ρ(0) = 1.
5.1 Example
Consider the AR(1) model:

Yt = φ1 Y_{t−1} + Zt   (5.12)

where |φ1| < 1, φ1 ≠ 0. We have shown that

ψi = φ1^i   for i = 0, 1, 2, . . .   (5.13)

Therefore

ρ(k) = Σ_{i=0}^{∞} ψi ψ_{i+k} / Σ_{i=0}^{∞} ψi² = Σ_{i=0}^{∞} φ1^{2i+k} / Σ_{i=0}^{∞} φ1^{2i} = φ1^k   for k = 1, 2, . . .   (5.14)
Note that if the sample correlogram {rk} looks like:

[Figure 1: Sample correlogram for which an AR(1) model could be used]

we could try an AR(1) model (with 0 < φ1 < 1). Also, if the correlogram looks like:

[Figure 2: Sample correlogram for which an AR(1) model could be used]

again we could try an AR(1) model (now with −1 < φ1 < 0).
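The geometric decay ρ(k) = φ1^k is easy to verify by simulation (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(7)
phi = 0.7
n = 100_000
z = rng.normal(0, 1, n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + z[t]            # AR(1): Y_t = phi Y_{t-1} + Z_t

def r(k):
    """Sample autocorrelation of y at lag k."""
    return np.corrcoef(y[:-k], y[k:])[0, 1]

print([round(r(k), 2) for k in (1, 2, 3)])      # close to phi, phi^2, phi^3
print([round(phi**k, 2) for k in (1, 2, 3)])    # 0.7, 0.49, 0.34
```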
5.2 ARMA (p,q) models
These combine MA(q) and AR(p) models, i.e.,

φ(B) Yt = θ(B) Zt   ∀ t   (5.15)

or

[1 − φ1 B − φ2 B² − . . . − φp B^p] Yt = [1 + θ1 B + . . . + θq B^q] Zt   (5.16)
These models include MA(q) and AR(p) as special cases if we allow p = 0 and q = 0 respectively. We normally require the model to be 'regular', i.e.,
1. The roots of φ(B) and θ(B) lie outside the unit circle
2. φ(B) and θ(B) have no common roots (which could be cancelled to simplify the model)
We again have the expression

Yt = Σ_{i=0}^{∞} ψi Z_{t−i}   (5.17)

which follows from the formal division

Yt = [θ(B)/φ(B)] Zt = ψ(B) Zt   (5.18)

where

ψ(B) = θ(B)/φ(B) = ψ0 + ψ1 B + ψ2 B² + . . .   (5.19)

where ψ0 = 1 and the coefficients satisfy

Σ_{i=0}^{∞} |ψi| < ∞   (5.20)
Sometimes we can find ψ(B) by little tricks. For example, suppose that:

φ(B) Yt = θ(B) Zt   (5.21)

where φ(B) = 1 − αB and θ(B) = 1 + 3αB. We know that the roots lie outside the unit circle because we are given that 0 < α < 1/3. Thus

Yt = ψ(B) Zt   (5.22)
ψ(B) = (1 + 3αB)(1 − αB)^{−1}   (5.23)
= (1 + 3αB)(1 + αB + α²B² + . . .)   (5.24)
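Collecting powers of B in (5.24) gives ψ0 = 1 and ψi = α^i + 3α·α^{i−1} = 4α^i for i ≥ 1; a small sketch (plain Python, with an arbitrary admissible α) confirms this:

```python
alpha = 0.2     # any value with 0 < alpha < 1/3

def psi(i):
    """Coefficient of B^i in (1 + 3*alpha*B)(1 + alpha*B + alpha^2*B^2 + ...)."""
    return 1.0 if i == 0 else alpha**i + 3 * alpha * alpha**(i - 1)

# Collecting terms: psi_0 = 1 and psi_i = 4 * alpha^i for i >= 1
print([psi(i) for i in range(4)])
print([1.0] + [4 * alpha**i for i in (1, 2, 3)])
```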
This allows us to evaluate ψ(B). We have assumed until now that the trend is zero; we can also have a constant trend, E(Yt) = µ, where µ may be non-zero. Suppose we have an ARMA(p,q) model with E(Yt) = µ for all t. We may subtract µ from Yt and proceed as before, i.e.,

φ(B)[Yt − µ] = θ(B) Zt   (5.25)

This can be written as

φ(B) Yt = θ(B) Zt + µ′   (5.26)

where µ′ = (1 − φ1 − . . . − φp) µ. We shall normally assume that E(Yt) = 0.
Parameter estimation: assuming an ARMA(p,q) model with E(Yt) = µ, we have the parameters {φi}, {θi}, σ² and µ. We should check the fit by residual analysis (there are various tests) and perhaps alter p and q and try again. This process is usually done by a computer.
6 ARIMA models
Now suppose that E(Yt) is not necessarily constant, but is a polynomial of degree d. This assumption may be questionable over a long time-scale. We now use ARIMA(p,d,q) models, where ARIMA stands for Auto Regressive Integrated Moving Average. We say that {Yt} is ARIMA(p,d,q) if

Wt = (I − B)^d Yt = ∇^d Yt   (t = 1, 2, 3, . . .)   (6.1)

is ARMA(p,q), where ∇ = I − B is the backward difference operator. If E(Wt) = µ ≠ 0, then the trend in Yt is a polynomial of degree d. We consider t = 1, 2, . . . only, and we specify Y0 and possibly Y−1, Y−2, . . . depending on d.
6.1 Example: Random walk with drift
Consider the ARIMA(0,1,0) model:

Wt = ∇Yt = Zt + µ   (6.2)

We assume that Y0 is specified; we then have:

Y1 − Y0 = Z1 + µ, so Y1 = Y0 + Z1 + µ   (6.3)

and

Y2 − Y1 = Z2 + µ, so Y2 = Y0 + Z1 + Z2 + 2µ   (6.4)

and so on, i.e.,

Yt = Y0 + tµ + Z1 + Z2 + . . . + Zt   (6.5)

The expectation of Yt is given by:

E(Yt) = Y0 + tµ   (6.6)
as the expectation of all the white noise is zero. The variance is given by:
var(Yt) = tσ2 (6.7)
Notes:
1. If {Zt} is Gaussian white noise then Yt ∼ N(Y0 + µt, tσ2)
2. If µ = 0, there is no drift
3. For any t we are 95% confident that Yt lies in the parabola shown:
[Figure 3: 95% confidence limits Y0 + µt ± 1.96σ√t on the ARIMA(0,1,0) model]
The correlation coefficient of Yt and Ys (for t ≤ s) is given by

ρ_{t,s} = √(t/s)   (6.8)

since cov(Yt, Ys) = min(t, s) σ².
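A Monte Carlo sketch (assuming numpy) agrees with both var(Yt) = tσ² and ρ_{t,s} = √(t/s):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.5, 1.0
# 100,000 independent paths of Y_t - Y_0 for t = 1, ..., 20
paths = rng.normal(mu, sigma, size=(100_000, 20)).cumsum(axis=1)

t, s = 5, 20
r_ts = np.corrcoef(paths[:, t - 1], paths[:, s - 1])[0, 1]
print(round(r_ts, 2), round((t / s) ** 0.5, 2))   # both about 0.5
print(round(float(paths[:, t - 1].var()), 1))     # about t * sigma^2 = 5.0
```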
6.2 Example: Stock market
Suppose you invest £1,000 now, and let:

Yt = ln[proceeds of investment at time t],   t = 1, 2, . . .   (6.9)

It is assumed that

ln[(proceeds at time t)/(proceeds at time t − 1)] = µ + Zt   (6.10)

where µ is a constant and {Zt} is Gaussian white noise. Thus,

Wt = Yt − Y_{t−1} = µ + Zt ∼ N(µ, σ²)   (6.11)

By previous results

Yt ∼ N(Y0 + µt, tσ²)   (6.12)
where Y0 = ln(1000). Suppose that µ = 0.058135 and σ² = 0.000267. What is the chance that the proceeds at time t = 15 will be under £2,100? E(Y15) = ln(1000) + 15µ = 7.77978 and the standard deviation of Y15 is √(15σ²) = 0.063285. It follows that

P(Y15 < ln(2100)) = P(Z < [ln(2100) − 7.77978]/0.063285)   (6.13)
≈ 0.02   (6.14)
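The arithmetic can be reproduced with the standard normal CDF written via erf (Python standard library only):

```python
from math import log, sqrt, erf

mu, sigma2 = 0.058135, 0.000267
y0 = log(1000)

mean15 = y0 + 15 * mu            # E(Y_15) = 7.77978...
sd15 = sqrt(15 * sigma2)         # s.d. of Y_15 = 0.063285...

z = (log(2100) - mean15) / sd15
p = 0.5 * (1 + erf(z / sqrt(2))) # Phi(z), the standard normal CDF
print(round(p, 2))               # 0.02
```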
6.3 Forecasting with ARMA models
We recall that

Yt = Zt + ψ1 Z_{t−1} + ψ2 Z_{t−2} + . . .   (6.15)

Suppose we are at time n and we wish to forecast the value h steps ahead, Y_{n+h}. The 'natural' forecast is:

Ŷ_{n+h} = E(Y_{n+h} | Yn, Y_{n−1}, . . .)   (6.16)

where Yn, Y_{n−1}, . . . are known. It is stated without proof that:

Ŷ_{n+h} = Σ_{i=h}^{∞} ψi Z_{n+h−i}   (6.17)
The error in the forecast is:

en(h) = Y_{n+h} − Ŷ_{n+h} = Σ_{i=0}^{h−1} ψi Z_{n+h−i}   (6.18)

where Y_{n+h} is the real value and Ŷ_{n+h} the forecast. We observe that en(h) has mean 0 and variance

σ² Σ_{i=0}^{h−1} ψi²   (6.19)
If we assume Gaussian white noise for {Zt}, we have

en(h) ∼ N(0, σ² Σ_{i=0}^{h−1} ψi²)   (6.20)

Thus a 95% prediction interval for Y_{n+h} is given by

Ŷ_{n+h} ± 1.96 σ √(Σ_{i=0}^{h−1} ψi²)   (6.21)

The width of the interval increases as h increases, as expected.
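For a concrete case, take an AR(1) model, for which ψi = φ^i. The half-width of the 95% prediction interval grows with h towards a finite stationary limit (a sketch with illustrative parameters, standard library only):

```python
import math

phi, sigma = 0.7, 1.0   # illustrative AR(1) parameters, psi_i = phi^i

def halfwidth(h):
    """Half-width 1.96 * sigma * sqrt(sum_{i=0}^{h-1} psi_i^2) of the 95% interval."""
    s2 = sum(phi ** (2 * i) for i in range(h))
    return 1.96 * sigma * math.sqrt(s2)

print([round(halfwidth(h), 2) for h in (1, 2, 5, 20)])
# increases towards the stationary limit 1.96 * sigma / sqrt(1 - phi^2)
print(round(1.96 * sigma / math.sqrt(1 - phi**2), 2))
```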
6.3.1 Example: An ARIMA(1,1,0) model
Suppose that:

Wt = ∇Yt = φ W_{t−1} + Zt   (6.22)

i.e.,

Yt = (1 + φ) Y_{t−1} − φ Y_{t−2} + Zt   (6.23)

where we take Y0 = Y−1 = 0. We get that

Y1 = Z1   (6.24)
Y2 = (1 + φ) Z1 + Z2   (6.25)
Y3 = Z3 + (1 + φ) Z2 + (1 + φ + φ²) Z1   (6.26)
Giving in general

Yt = Zt + (1 + φ) Z_{t−1} + . . . + (1 + φ + φ² + . . . + φ^{t−1}) Z1   for t = 1, 2, 3, . . .   (6.27)

Thus E(Yt) = 0 for all t, and using the geometric progression formula the variance is given by

var(Yt) = [1 + ((1 − φ²)/(1 − φ))² + . . . + ((1 − φ^t)/(1 − φ))²] σ²   (6.28)
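A direct check (plain Python, with illustrative parameter values): summing the squared coefficients of (6.27) reproduces the closed form (6.28):

```python
phi, sigma = 0.5, 1.0
t = 6

# Coefficient of Z_{t-j} in (6.27) is 1 + phi + ... + phi^j = (1 - phi^(j+1)) / (1 - phi)
coeffs = [(1 - phi ** (j + 1)) / (1 - phi) for j in range(t)]
direct = sigma**2 * sum(c**2 for c in coeffs)

# Closed form (6.28): terms (1 - phi^j)^2 / (1 - phi)^2 for j = 1, ..., t
closed = sigma**2 * sum(((1 - phi**j) / (1 - phi)) ** 2 for j in range(1, t + 1))
print(direct, closed)   # equal
```

Note that var(Yt) grows without bound as t increases, as it should for an integrated process.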