Forecasting
• The purpose is to forecast, not to explain the historical pattern
• Models for forecasting may not make sense as a description of the "physical" behaviour of the time series
• Common sense and mathematics in good combination produce "optimal" forecasts
• With time series regression models, forecasting (prediction) is a natural step and forecasting limits (intervals) can be constructed
• With classical decomposition, forecasting may be done, but estimates of accuracy are lacking and no forecasting limits are produced
• Classical decomposition is usually combined with exponential smoothing methods
Exponential smoothing
• Use the historical data to forecast the future
• Let different parts of the history have different impact on the forecasts
• Forecast model is not developed from any statistical theory
Single exponential smoothing
• Given are historical values y_1, y_2, …, y_T
• Assume the data contain no trend
Algorithm for forecasting:
ℓ_t = α·y_t + (1 − α)·ℓ_{t−1};  t = t_0+1, …, T
ŷ_{T+τ} = ℓ_T;  τ = 1, 2, …  (constant forecasts!)

where α is a smoothing parameter with value between 0 and 1 and ℓ_t is the current (smoothed) level of the series
• The forecast procedure is a recursion formula
• How shall we choose α?
• Where should we start, i.e. what is the initial value ℓ_{t_0}?
For long time series:
Use a part (usually the first half) of the historical data, y_1, …, y_{t_0}, and calculate their average:
ȳ_hist = (1/t_0) · Σ_{t=1}^{t_0} y_t
Set ℓ_{t_0} = ȳ_hist
Update with the rest of the historical data, y_{t_0+1}, …, y_T, using the recursion formula.
Example: Sales of everyday commodities
Year Sales values
1985 151
1986 151
1987 147
1988 149
1989 146
1990 142
1991 143
1992 145
1993 141
1994 143
1995 145
1996 138
1997 147
1998 151
1999 148
2000 148
Note! This time series is short but we use it for illustration purposes!
Calculate the average of the first 8 observations of the series:
ȳ_hist = (151 + 151 + … + 145)/8 = 146.75
Set ℓ_8 = ȳ_hist = 146.75
Assume first that the sales are very stable, i.e. that during the period the background mean value is assumed not to change.
Set α to be relatively small. This means that the latest observation plays a smaller role than the history in the forecasts. Rule of thumb: 0.05 < α < 0.3
E.g. Set α=0.1
Update using the next 8 values of the historical data (the eighth update, giving ℓ_16, appears together with the forecasts below):

ℓ_9  = 0.1·y_9  + 0.9·ℓ_8  = 0.1·141 + 0.9·146.75   = 146.175
ℓ_10 = 0.1·y_10 + 0.9·ℓ_9  = 0.1·143 + 0.9·146.175  = 145.8575
ℓ_11 = 0.1·y_11 + 0.9·ℓ_10 = 0.1·145 + 0.9·145.8575 ≈ 145.772
ℓ_12 = 0.1·y_12 + 0.9·ℓ_11 = 0.1·138 + 0.9·145.772  ≈ 144.995
ℓ_13 = 0.1·y_13 + 0.9·ℓ_12 = 0.1·147 + 0.9·144.995  = 145.1955
ℓ_14 = 0.1·y_14 + 0.9·ℓ_13 = 0.1·151 + 0.9·145.1955 ≈ 145.776
ℓ_15 = 0.1·y_15 + 0.9·ℓ_14 = 0.1·148 + 0.9·145.776  ≈ 145.998
Forecasts:

ℓ_16 = 0.1·y_16 + 0.9·ℓ_15 = 0.1·148 + 0.9·145.998 ≈ 146.2
ŷ_17 = ℓ_16 = 146.2
ŷ_18 = 146.2
ŷ_19 = 146.2
etc.
For short time series:
Calculate the average of all historical data, i.e.
ȳ_hist = (1/T) · Σ_{t=1}^{T} y_t
Set ℓ_0 = ȳ_hist
Update from the beginning of the time series:
ℓ_t = α·y_t + (1 − α)·ℓ_{t−1};  t = 1, 2, …, T
There are a lot of alternatives:
• Average of all data, update from the middle of the series
• Average of the first half, update from the beginning
• etc.
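Before turning to MINITAB, here is a minimal Python sketch of the recursion and the "long series" initialisation above (the function name and style are our own illustration, not part of the original material):

def ses_forecast(y, alpha, n_init):
    """Constant SES forecast: returns l_T after initialising on the first n_init values."""
    level = sum(y[:n_init]) / n_init            # l_{t0} = average of first n_init observations
    for obs in y[n_init:]:                      # recursion: l_t = alpha*y_t + (1 - alpha)*l_{t-1}
        level = alpha * obs + (1 - alpha) * level
    return level

sales = [151, 151, 147, 149, 146, 142, 143, 145,
         141, 143, 145, 138, 147, 151, 148, 148]
print(ses_forecast(sales, alpha=0.1, n_init=8))  # ≈ 146.198, i.e. 146.2 as in the hand calculation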
Analysis of example data with MINITAB
MTB > Name c3 "FORE1" c4 "UPPE1" c5 "LOWE1"
MTB > SES 'Sales values';
SUBC> Weight 0.1;
SUBC> Initial 8;
SUBC> Forecasts 3;
SUBC> Fstore 'FORE1';
SUBC> Upper 'UPPE1';
SUBC> Lower 'LOWE1';
SUBC> Title "SES alpha=0.1".
Single Exponential Smoothing for Sales values
Data Sales values
Length 16
Smoothing Constant
Alpha 0.1
Accuracy Measures
MAPE 2.2378
MAD 3.2447
MSD 14.4781
Forecasts
Period Forecast Lower Upper
17 146.043 138.094 153.992
18 146.043 138.094 153.992
19 146.043 138.094 153.992
MINITAB uses smoothing from 1st value!
Assume now that the sales are less stable, i.e. during the period the background mean value is possibly changing.
(Note that a change means an occasional "level shift", not a systematic trend)
Set α to be relatively large. This means that the latest observation becomes more important in the forecasts.
E.g. Set α=0.5 (A bit exaggerated)
Single Exponential Smoothing for Sales values
Data Sales values
Length 16
Smoothing Constant
Alpha 0.5
Accuracy Measures
MAPE 1.9924
MAD 2.8992
MSD 13.0928
Forecasts
Period Forecast Lower Upper
17 147.873 140.770 154.976
18 147.873 140.770 154.976
19 147.873 140.770 154.976
Slightly narrower prediction intervals
We can also use an adaptive procedure to continuously evaluate the forecasting ability and possibly change the smoothing parameter over time.
Alternatively, we can run the process with different alphas and choose the one that performs best. This can be done with the MINITAB procedure.
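A sketch of the "try different alphas" idea in Python (our own illustration; MINITAB's optimiser is more sophisticated, so the chosen α will not match exactly). Each α is scored by the sum of squared one-step-ahead forecast errors over the updating period:

def one_step_sse(y, alpha, n_init):
    """Sum of squared one-step-ahead errors; the forecast of y_t is the previous level."""
    level = sum(y[:n_init]) / n_init
    sse = 0.0
    for obs in y[n_init:]:
        sse += (obs - level) ** 2
        level = alpha * obs + (1 - alpha) * level
    return sse

sales = [151, 151, 147, 149, 146, 142, 143, 145,
         141, 143, 145, 138, 147, 151, 148, 148]
alphas = [i / 100 for i in range(1, 100)]
best = min(alphas, key=lambda a: one_step_sse(sales, a, 8))
print(best)   # best alpha on this tiny series; will not equal MINITAB's 0.567101 exactly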
Single Exponential Smoothing for Sales values
---
Smoothing Constant
Alpha 0.567101
Accuracy Measures
MAPE 1.7914
MAD 2.5940
MSD 12.1632
Forecasts
Period Forecast Lower Upper
17 148.013 141.658 154.369
18 148.013 141.658 154.369
19 148.013 141.658 154.369
[Figure: MINITAB time series plot "SES optimal alpha" – actual sales values, fits, forecasts and 95% PI plotted against index; smoothing constant α = 0.567101; accuracy measures MAPE 1.7914, MAD 2.5940, MSD 12.1632]
Still narrower prediction intervals
Exponential smoothing for time series with trend and/or seasonal variation
• Double exponential smoothing (one smoothing parameter) for trend
• Holt's method (two smoothing parameters) for trend
• Multiplicative Winters' method (three smoothing parameters) for seasonality (and trend)
• Additive Winters' method (three smoothing parameters) for seasonality (and trend)
Modern methods
The classical approach:
Method: Time series regression
Pros:
• Easy to implement
• Fairly easy to interpret
• Covariates may be added (normalization)
• Inference is possible (though sometimes questionable)
Cons:
• Static
• Normal-based inference not generally reliable
• Cyclic component hard to estimate

Method: Decomposition
Pros:
• Easy to interpret
• Possible to have dynamic seasonal effects
• Cyclic components can be estimated
Cons:
• Descriptive (no inference per definition)
• Static in trend
Explanation of the static behaviour:
The classical approach assumes all components except the irregular ones (i.e. ε_t and IR_t) to be deterministic, i.e. fixed functions or constants.
To overcome this problem, all components should be allowed to be stochastic, i.e. to be random variables.
From a statistical point of view, a time series y_t should be treated as a stochastic process.
We will use the terms time series and process interchangeably, depending on the situation.
Stationary and non-stationary time series
[Figures: a stationary series fluctuating around a constant level, and a non-stationary series with trend, both plotted against index]
Characteristics of a stationary time series:
• Constant mean
• Constant variance
A time series with a trend is non-stationary!
AutoRegressive Integrated Moving Average (ARIMA) – Box-Jenkins models
A stationary time series can be modelled on the basis of its serial correlations.
A non-stationary time series can be transformed into a stationary time series, modelled, and then back-transformed to the original scale (e.g. for purposes of forecasting).
ARIMA models:
The AR and MA parts are modelled on a stationary series; the I (integrated) part corresponds to the transformation.
Different types of transformation
1. From a series with linear trend to a series with no trend:
First-order differences: z_t = y_t − y_{t−1}
MTB > diff c1 c2
Note that the differenced series varies around zero.
[Figure: the series with linear trend and its differenced version ("no trend")]
2. From a series with quadratic trend to a series with no trend:
Second-order differences:
w_t = z_t − z_{t−1} = (y_t − y_{t−1}) − (y_{t−1} − y_{t−2}) = y_t − 2·y_{t−1} + y_{t−2}
MTB > diff 2 c3 c4
[Figure: the series with quadratic trend and its second-order differences ("no trend 2"), which vary around zero]
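The same differencing can be sketched in Python with NumPy's np.diff (our own analogue of the MTB > diff commands above):

import numpy as np

t = np.arange(1, 21)
linear = 2.0 + 0.5 * t              # series with linear trend
quadratic = 1.0 + 0.1 * t ** 2      # series with quadratic trend

z = np.diff(linear)                 # first-order differences: constant 0.5, trend removed
w = np.diff(quadratic, n=2)         # second-order differences: constant 0.2, trend removed
print(z[:5], w[:5])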
3. From a series with non-constant variance (heteroscedastic) to a series with constant variance (homoscedastic):
Box-Cox transformations (Box & Cox, 1964)
In practice, the shift λ₂ is chosen so that y_t + λ₂ is always > 0.
Simpler form, if we know that y_t is always > 0 (as is the usual case for measurements):

g(y_t) = (y_t^λ − 1)/λ   for λ ≠ 0 and y_t > 0
g(y_t) = ln y_t          for λ = 0 and y_t > 0

Rules of thumb for choosing the transform:

g(y_t) = √y_t     if modest heteroscedasticity
g(y_t) = ⁴√y_t    – " –
g(y_t) = ln y_t   if pronounced heteroscedasticity
g(y_t) = 1/√y_t   if heavy heteroscedasticity
g(y_t) = 1/y_t    if extreme heteroscedasticity
The log transform (ln y_t) usually also makes the data "more" normally distributed.
Example: Application of root (√y_t) and log (ln y_t) transforms
[Figure: the original series together with its root- and log-transformed versions]
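A hedged Python sketch of these variance-stabilising transforms (the generated series and the use of SciPy's boxcox are our own illustration, not from the original material):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
t = np.arange(1, 101)
y = np.exp(0.02 * t + rng.normal(0.0, 0.2, size=t.size))  # positive series, variability grows with level

y_root = np.sqrt(y)            # for modest heteroscedasticity
y_log = np.log(y)              # for pronounced heteroscedasticity
y_bc, lam = stats.boxcox(y)    # data-driven choice of lambda
print(f"Box-Cox lambda estimated from data: {lam:.2f}")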
AR-models (for stationary time series)
Consider the model
y_t = δ + φ·y_{t−1} + a_t
with {a_t} i.i.d. with zero mean and constant variance σ², and where δ (delta) and φ (phi) are (unknown) parameters.
Set δ = 0 for the sake of simplicity ⇒ E(y_t) = 0
Let R(k) = Cov(y_t, y_{t−k}) = Cov(y_t, y_{t+k}) = E(y_t·y_{t−k}) = E(y_t·y_{t+k})
R(0) = Var(y_t), assumed to be constant
Now:
R(0) = E(y_t·y_t) = E(y_t·(φ·y_{t−1} + a_t)) = φ·E(y_t·y_{t−1}) + E(y_t·a_t) =
= φ·R(1) + E((φ·y_{t−1} + a_t)·a_t) = φ·R(1) + φ·E(y_{t−1}·a_t) + E(a_t·a_t) =
= φ·R(1) + 0 + σ²   (since a_t is independent of y_{t−1})

R(1) = E(y_t·y_{t+1}) = E(y_t·(φ·y_t + a_{t+1})) = φ·E(y_t·y_t) + E(y_t·a_{t+1}) =
= φ·R(0) + 0   (since a_{t+1} is independent of y_t)

R(2) = E(y_t·y_{t+2}) = E(y_t·(φ·y_{t+1} + a_{t+2})) = φ·E(y_t·y_{t+1}) + E(y_t·a_{t+2}) =
= φ·R(1) + 0   (since a_{t+2} is independent of y_t)

⇒
R(0) = φ·R(1) + σ²
R(1) = φ·R(0)        (Yule-Walker equations)
R(2) = φ·R(1)
…
R(k) = φ·R(k − 1) = … = φ^k·R(0)

⇒ R(0) = φ²·R(0) + σ²  ⇒  R(0) = σ²/(1 − φ²)

Note that for R(0) to be positive and finite (which we require of a variance) the following must hold:
φ² < 1  ⇔  −1 < φ < 1
This is in effect the condition for an AR(1)-process to be weakly stationary.
Now, note that
ρ_k = Corr(y_t, y_{t+k}) = Cov(y_t, y_{t+k}) / √(Var(y_t)·Var(y_{t+k})) = R(k)/√(R(0)·R(0)) = R(k)/R(0) = φ^k·R(0)/R(0) = φ^k
ρ_k is called the Autocorrelation function (ACF) of y_t.
”Auto” because it gives correlations within the same time series.
For pairs of different time series one can define the Cross correlation function which gives correlations at different lags between series.
By studying the ACF it might be possible to identify the approximate magnitude of φ.
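A small simulation (our own check, not from the original material) confirms both results above: the sample variance approaches σ²/(1 − φ²) and the lag-k autocorrelations approach φ^k:

import numpy as np

rng = np.random.default_rng(0)
phi, sigma, n = 0.8, 1.0, 10_000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal(0.0, sigma)   # y_t = phi*y_{t-1} + a_t (delta = 0)

print(y.var(), sigma**2 / (1 - phi**2))              # both close to 1/(1 - 0.64) ≈ 2.78
for k in (1, 2, 3):
    r_k = np.corrcoef(y[:-k], y[k:])[0, 1]           # sample lag-k autocorrelation
    print(k, round(r_k, 3), round(phi**k, 3))        # close to phi^k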
Examples:
[Figures: ACF of AR(1) processes for k = 1, …, 15. For φ = 0.1, 0.3, 0.5, 0.8 and 0.99 all bars are positive and decay geometrically as ρ_k = φ^k (the larger φ, the slower the decay); for φ = −0.1, −0.5 and −0.8 the bars alternate in sign while decaying in magnitude]
The look of an ACF can be similar for different kinds of time series, e.g. the ACF for an AR(1) with φ = 0.3 could be approximately the same as the ACF for an autoregressive time series of order higher than 1 (we will discuss higher-order AR-models later).
To do a less ambiguous identification we need another statistic:
The Partial Autocorrelation function (PACF):
υ_k = Corr(y_t, y_{t−k} | y_{t−k+1}, y_{t−k+2}, …, y_{t−1})
i.e. the conditional correlation between y_t and y_{t−k}, given all observations in between.
Note that −1 ≤ υ_k ≤ 1.
A concept sometimes hard to interpret, but it can be shown that for AR(1)-models with positive φ the PACF looks like the first plot below, and for AR(1)-models with negative φ like the second.
[Figures: PACF for AR(1), k = 1, …, 15 – a single positive spike at k = 1 for positive φ, a single negative spike at k = 1 for negative φ, and zero thereafter]
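One way to make the PACF concrete (a sketch under our own conventions, not from the original material): υ_k equals the last coefficient when an autoregression of order k is fitted by least squares.

import numpy as np

def pacf_at_lag(y: np.ndarray, k: int) -> float:
    """Last coefficient of an order-k autoregression fitted by least squares."""
    x = y - y.mean()
    # design matrix with columns x_{t-1}, ..., x_{t-k}
    X = np.column_stack([x[k - j - 1:len(x) - j - 1] for j in range(k)])
    coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
    return coef[-1]

rng = np.random.default_rng(4)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.7 * y[t - 1] + rng.normal()

print(round(pacf_at_lag(y, 1), 2))   # ≈ 0.7 (equals phi for an AR(1))
print(round(pacf_at_lag(y, 2), 2))   # ≈ 0 (cut-off after lag 1)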
Assume now that we have a sample y_1, y_2, …, y_n from a time series assumed to follow an AR(1)-model.
Example:
Monthly exchange rates DKK/USD 1991-1998
[Figure: time series plot of the monthly exchange rate]
The ACF and the PACF can be estimated from data by their sample counterparts:
Sample Autocorrelation function (SAC):

r_k = Σ_{t=1}^{n−k} (y_t − ȳ)(y_{t+k} − ȳ) / Σ_{t=1}^{n} (y_t − ȳ)²

(valid if n is large; otherwise a scaling might be needed)

Sample Partial Autocorrelation function (SPAC):
Complicated structure, so it is not shown here.
The variance functions of these two estimators can also be estimated, which gives the opportunity to test
H0: ρ_k = 0 vs. Ha: ρ_k ≠ 0
or
H0: υ_k = 0 vs. Ha: υ_k ≠ 0
for a particular value of k.
Estimated sample functions are usually plotted together with critical limits based on estimated variances.
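A sketch of the SAC formula together with simple critical limits (we use the common ±2/√n approximation as an assumption here; MINITAB's limits are based on estimated variances and differ in detail):

import numpy as np

def sac(y: np.ndarray, k: int) -> float:
    """r_k = sum_{t=1}^{n-k}(y_t - ybar)(y_{t+k} - ybar) / sum_{t=1}^{n}(y_t - ybar)^2"""
    ybar = y.mean()
    return np.sum((y[:-k] - ybar) * (y[k:] - ybar)) / np.sum((y - ybar) ** 2)

rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.9 * y[t - 1] + rng.normal()   # AR(1) with phi = 0.9, like the exchange-rate series

limit = 2 / np.sqrt(len(y))                # approximate 95% critical limit
for k in range(1, 6):
    r = sac(y, k)
    print(k, round(r, 2), "outside limits" if abs(r) > limit else "within limits")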
Example (cont.) DKK/USD exchange rates:
[Figures: SAC and SPAC of the series, plotted with critical limits]
Ignoring all bars within the red limits, we would identify the series as an AR(1) with positive φ.
The value of φ is approximately 0.9 (the ordinate of the first bar in the SAC plot and in the SPAC plot).
Higher-order AR-models
AR(2): y_t = φ_1·y_{t−1} + φ_2·y_{t−2} + a_t
or y_t = φ_2·y_{t−2} + a_t
(the term φ_2·y_{t−2} must be present)

AR(3): y_t = φ_1·y_{t−1} + φ_2·y_{t−2} + φ_3·y_{t−3} + a_t
or other combinations with φ_3·y_{t−3}

AR(p): y_t = φ_1·y_{t−1} + … + φ_p·y_{t−p} + a_t
i.e. different combinations with φ_p·y_{t−p}
Stationarity conditions:
For p > 2, difficult to express in closed form.
For p = 2, with the model
y_t = φ_1·y_{t−1} + φ_2·y_{t−2} + a_t
the values of φ_1 and φ_2 must lie within the triangle defined by
φ_1 + φ_2 < 1,  φ_2 − φ_1 < 1,  −1 < φ_2 < 1
[Figure: the stationarity triangle in the (φ_1, φ_2)-plane]
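These conditions are easy to encode; a tiny helper (our own, not from the original material):

def ar2_is_stationary(phi1: float, phi2: float) -> bool:
    """AR(2) stationarity triangle: phi1 + phi2 < 1, phi2 - phi1 < 1, |phi2| < 1."""
    return phi1 + phi2 < 1 and phi2 - phi1 < 1 and abs(phi2) < 1

print(ar2_is_stationary(0.5, 0.3))   # True: inside the triangle
print(ar2_is_stationary(0.9, 0.5))   # False: phi1 + phi2 = 1.4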
Typical patterns of the ACF and PACF for higher-order stationary AR-models (AR(p)):

ACF: Similar pattern as for AR(1), i.e. (exponentially) decreasing bars, (most often) positive when φ_1 is positive and alternating when φ_1 is negative.

PACF: The first p values υ_1, …, υ_p are non-zero with decreasing magnitude; the rest are all zero (cut-off point at p). (Most often) all positive if φ_1 is positive and alternating if φ_1 is negative.
Examples:
[Figures: for an AR(2) with φ_1 positive, the ACF shows positive, decaying bars and the PACF cuts off after lag 2; for an AR(5) with φ_1 negative, the ACF alternates in sign and the PACF cuts off after lag 5 (k = 1, …, 15 in both cases)]
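Finally, a sketch (assuming statsmodels is installed; not part of the original material) that simulates an AR(2) and prints its sample ACF and PACF, so the decay vs. cut-off patterns can be seen numerically:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

# AR(2): y_t = 0.5*y_{t-1} + 0.3*y_{t-2} + a_t
# (ArmaProcess uses the lag-polynomial convention, so AR coefficients are negated)
np.random.seed(3)
process = ArmaProcess(ar=np.array([1.0, -0.5, -0.3]), ma=np.array([1.0]))
print(process.isstationary)              # True: (0.5, 0.3) lies inside the triangle

y = process.generate_sample(nsample=5000)
print(np.round(acf(y, nlags=6), 2))      # slowly decaying bars
print(np.round(pacf(y, nlags=6), 2))     # cut-off after lag 2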