Luciano Rispoli Department of Economics, Mathematics and Statistics
Birkbeck College (University of London)
1
Forecasting: definition
“Forecasting is the process of making statements about events whose actual outcomes (typically) have not yet been observed” (Wikipedia)
There are many categories of forecasting methods:
Categorical vs Quantitative
Naïve approach (forecast based solely on previous period realisation)
Time Series (Box Jenkins Methodology)
Judgemental methods (based on subjective probability)
2
Why Forecasting?
Economic Forecasting (Inflation/GDP forecasting)
Sales Forecasting
Supply Chain Forecasting
Earthquake Forecasting
Weather Forecasting
Hotel Management (Room bookings forecasting)
3
Forecasting medical time series: a sample of existing studies
“The Application of Forecasting Techniques to Modelling Emergency Medical System Calls in Calgary, Alberta” [see Channouf et al (2006)]
“Box Jenkins Methodology in Medical Research” [see Helfenstein (1996)]
“Time series modelling for syndromic surveillance” [see Reis and Mandl (2003)]
“Conventional and advanced time series estimation: application to the Australian and New Zealand Intensive Care Society (ANZICS) adult patient database, 1993–2008” [see Solomon and Moran (2011)]
4
Today’s application: UK Hospital Waiting Lists
“PM's election pledge in jeopardy as report reveals patients waiting 6% longer”… The Guardian
“NHS Chief warns of rising Hospital waiting times” BBC News
“NHS waiting times may increase at one in three flagship hospitals: report” The Telegraph
“…We will match people's symptoms to certain groups of conditions and try to provide a general forecast…” Weather used to forecast illness, Daily Mail
“Surgery waiting lists hit one million” Mail Online
5
Solution: Forecasting the number of patients placed on a waiting list! But why is it so important?
Waiting lists grow if the demand for a specific treatment outstrips hospital capacity (supply)
Hence, forecasting the number of patients placed on a waiting list at a given time might provide an estimate of the imbalance between demand and supply
Hospital managers could, in principle, estimate by how much the list is growing or declining (within a confidence interval)
This methodology could easily be extended to forecast individual hospital waiting lists, as well as treatment- or surgery-specific waiting lists
6
Data available at:
http://www.dh.gov.uk/en/Publicationsandstatistics/Statistics/Performancedataandstatistics/HospitalWaitingTimesandListStatistics/index.htm
This dataset contains information on patients waiting to be admitted to NHS hospitals in England either as a day case or ordinary admission.
Provider based
Time series data from April 1998 to Feb 2010 (143 obs)
It does not contain:
Emergency cases and outpatients
7
Methodology
ARMA models (ARIMA models with no differencing) have been found particularly useful in describing stationary (non-seasonal) time series.
A stationary stochastic process is one whose joint distribution does not change when shifted in time; in the weaker (covariance) sense used here, it has a constant, finite mean and variance, and autocovariances that depend only on the lag
Wold’s Theorem: Any covariance-stationary series can be expressed as the sum of two components: a perfectly forecastable (deterministic) series and a moving average of possibly infinite order.
Thus a stationary non-seasonal series can always be represented by an MA(∞) model, which in turn can be approximated by an ARMA (p , q) with small orders p and q, and hence a small number of parameters.
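In symbols, the decomposition and its ARMA approximation can be sketched as follows. The notation is introduced here for illustration and is not taken from the slides: ε_t is white noise, ψ_j are moving-average weights, η_t is the perfectly forecastable component, and φ_i, θ_i are the ARMA coefficients.

```latex
% Wold decomposition of a covariance-stationary series
X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} + \eta_t,
\qquad \psi_0 = 1, \qquad \sum_{j=0}^{\infty} \psi_j^2 < \infty

% ARMA(p, q) approximation with a small number of coefficients
X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
```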
8
Methodology cont’d
However, most time series are not stationary and usually have a seasonal component!
We have to transform these series into stationary, non-seasonal series before we can model them
Seasonal differencing
Non-stationarity can be classified as:
Trend in mean (difference as many times as required)
Trend in variance (apply a power transformation, e.g. log)
Note: The latter should be applied only if it stabilises the variance!
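The transformations listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the slides: the function names and the sample values are hypothetical, and the seasonal lag of 12 assumes monthly data.

```python
import math


def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def log_growth(series):
    """First difference of the natural log (an approximate growth rate)."""
    return difference([math.log(x) for x in series], lag=1)


# Hypothetical values, made up for illustration only
waiting_list = [1000.0, 1010.0, 1005.0, 990.0, 985.0]

growth = log_growth(waiting_list)            # removes a trend in the mean of the logs
# Seasonal differencing for monthly data would use lag=12:
# deseasonalised = difference(monthly_series, lag=12)
```

Each first difference shortens the series by one observation, and a seasonal difference at lag 12 shortens it by twelve.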
9
Total waiting list x 1000 patients
[Figure: total waiting list (x 1000 patients, axis range 600 to 1400) against time t, monthly from 1998m1 to 2010m1]
Source: author’s calculations
10
Growth Rate of Hospital Waiting Lists (difference the log)
11
[Figure: dlwsa (axis range -.04 to .02) against time t, monthly from 1998m1 to 2010m1]
Source: author’s calculations
Box Jenkins Methodology
Plot the series and identify the trend (is the series trending in the mean/variance?)
Test for stationarity (Augmented Dickey Fuller Test, KPSS Test, PP Test)
Transform the series into a stationary series (power transformation, seasonal differencing, first differencing, etc.)
Plot the autocorrelation and partial autocorrelation functions (identify possible models)
Estimate the possible models (check for coefficient significance and white noise residuals)
Select the best models based on the information criteria (the model with the lowest AIC, BIC, HQ)
Select the best two models given the above and test their forecasting accuracy (Diebold Mariano Test, Granger Newbold)
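The information-criteria step can be sketched as below. This is an illustrative version that assumes Gaussian errors and drops additive constants, so the levels may differ from what a statistics package reports, although the ranking of models fitted to the same sample should agree. The function name and the residual values are hypothetical.

```python
import math


def information_criteria(residuals, n_params):
    """AIC, BIC and HQ from model residuals (Gaussian form, constants dropped)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    fit = n * math.log(rss / n)          # goodness-of-fit term, shared by all three
    return {
        "AIC": fit + 2 * n_params,
        "BIC": fit + n_params * math.log(n),
        "HQ": fit + 2 * n_params * math.log(math.log(n)),
    }


# Hypothetical residuals from two candidate models (made up for illustration)
small_model = information_criteria([1.0, -1.0, 1.0, -1.0], n_params=1)
large_model = information_criteria([1.0, -1.0, 1.0, -1.0], n_params=3)
# Same fit but more parameters, so the larger model is penalised on every criterion.
```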
12
Model identification: ACF, PACF
[Figure: autocorrelations of dltwl, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]
[Figure: partial autocorrelations of dltwl, lags 0 to 40, with 95% confidence bands (se = 1/sqrt(n))]
13
ACF: Correlation between the series and a lag of itself: $r_k = \mathrm{Corr}(X_t, X_{t-k}) = \gamma_k / \gamma_0$
PACF: Amount of correlation between a variable and a lag of itself that is not explained by correlations at all lower-order lags
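Both functions can be computed directly from the data. The sketch below is one possible implementation, not the routine used to produce the slides: the sample ACF uses the usual divisor n, and the PACF is obtained from the ACF with the Durbin-Levinson recursion. The function names and the sample series are hypothetical.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag, with r_k = gamma_k / gamma_0."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    gamma0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n / gamma0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    prev = []                              # coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Hypothetical series, made up for illustration only
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0]
r = acf(x, 3)
p = pacf(x, 3)
```

As a rough reading guide, an AR(p) process tends to show a PACF that cuts off after lag p, while an MA(q) process tends to show an ACF that cuts off after lag q.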
List of potential models
AR (1)
AR (2)
MA (1)
MA (2)
ARMA (1 , 1)
ARMA (2 , 1)
ARMA (1 , 2)
Note: ARMA (1 , 2) was the model with the smallest information criteria, individually and jointly significant coefficients and uncorrelated residuals, and it also yielded the most accurate forecasts!
14
Selecting the best Model
The selected model is an ARMA (1 , 2)
$\text{dltwl}_t = 0.90\,\text{dltwl}_{t-1} + \varepsilon_t - 0.28\,\varepsilon_{t-1} - 0.19\,\varepsilon_{t-2}$
The series shows a high degree of persistence insofar as it inherits a large proportion of the past period realisation
Invertibility, causality and stationarity conditions:
The process satisfies the stationarity condition since the coefficient on the AR component is less than one in absolute value, hence ensuring that the process has finite first and second moments
The process satisfies the invertibility and causality conditions since the roots of the characteristic equations of the autoregressive and moving average parts lie outside the unit circle.
These last two requirements imply that the model is uniquely identified in its parameters!
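These conditions can be checked numerically from the reported coefficients. The sketch below assumes the sign convention in which the AR polynomial is 1 - 0.90 z and the MA polynomial is 1 - 0.28 z - 0.19 z^2; the variable and function names are hypothetical.

```python
import cmath

# Coefficients as reported for the selected ARMA (1 , 2) model
phi1 = 0.90
theta1, theta2 = -0.28, -0.19


def quadratic_roots(a, b, c):
    """Both roots of a*z**2 + b*z + c = 0 (possibly complex)."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))


# AR polynomial 1 - phi1*z has the single root z = 1/phi1
ar_root = 1 / phi1

# MA polynomial 1 + theta1*z + theta2*z**2, written as a quadratic in z
ma_roots = quadratic_roots(theta2, theta1, 1.0)

stationary = abs(ar_root) > 1                      # AR root outside the unit circle
invertible = all(abs(z) > 1 for z in ma_roots)     # MA roots outside the unit circle
```

Under this convention the AR root is about 1.11 and the MA roots are about 1.67 and -3.15, so all three lie outside the unit circle.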
15
FORECASTING
16
ARMA (1,2) Model forecast
“Out of sample forecast”
17
[Figure: dlwsa and the one-step prediction (xb prediction, one-step) against time t, 1998m1 to 2012m1]
No change forecast
[Figure: dltwl and its first lag L.dltwl against time t, 1998m1 to 2010m1]
18
Forecasting Diagnostics
Theil U-statistics
$$U = \frac{\dfrac{1}{T}\sum_{t=1}^{T-1}\left(\dfrac{f_{t+1}-y_{t+1}}{y_t}\right)^{2}}{\dfrac{1}{T}\sum_{t=1}^{T-1}\left(\dfrac{y_{t+1}-y_t}{y_t}\right)^{2}} = 0.34$$
If U < 1 the model is superior to a No change forecast
If U > 1 the model is inferior to a No change forecast
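A minimal sketch of the comparison with a No change forecast follows. It assumes U is the ratio of summed squared relative errors, so the 1/T factors cancel; many references take the square root of this ratio, which leaves the threshold of 1 unchanged. The function name and the data are hypothetical, and the actual values must be non-zero.

```python
def theil_u(actual, forecast):
    """Ratio of squared relative forecast errors to those of a no-change forecast.

    forecast[t] is the model's forecast of actual[t]; forecast[0] is not used.
    """
    steps = range(1, len(actual))
    model_err = sum(((forecast[t] - actual[t]) / actual[t - 1]) ** 2 for t in steps)
    naive_err = sum(((actual[t] - actual[t - 1]) / actual[t - 1]) ** 2 for t in steps)
    return model_err / naive_err


# Hypothetical values, made up for illustration only
actual = [1.0, 2.0, 4.0, 3.0]
no_change = [actual[0]] + actual[:-1]      # forecast of y(t) is y(t-1)

u_naive = theil_u(actual, no_change)       # the no-change forecast scores exactly 1
u_perfect = theil_u(actual, actual)        # a perfect forecast scores 0
```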
19
Drawbacks of this approach
Lack of data availability
It assumes that the Data Generating Process is time invariant (however it might well not be the case!)
The forecast confidence interval becomes wider and wider as the forecasting horizon increases! (This matters less if we are only concerned with one-step-ahead forecasts and the data set is sufficiently large for our purposes)
It is very unlikely that we will find a model that fits the data particularly well; in practice, models can often explain very little!
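The widening of the forecast interval can be illustrated with the coefficients reported for the ARMA (1 , 2) model (0.90, -0.28, -0.19). The sketch assumes the coefficients are known exactly, so parameter uncertainty is ignored, and it expresses the h-step forecast error variance as a multiple of the innovation variance. The function names are hypothetical.

```python
def psi_weights(phi, thetas, n):
    """First n MA(infinity) weights of an ARMA(1, q) process."""
    psi = [1.0]
    for j in range(1, n):
        value = phi * psi[j - 1]
        if j <= len(thetas):
            value += thetas[j - 1]
        psi.append(value)
    return psi


def forecast_variance_factors(phi, thetas, horizon):
    """factors[h-1] = sum of psi_j**2 for j < h (variance relative to sigma**2)."""
    factors, total = [], 0.0
    for weight in psi_weights(phi, thetas, horizon):
        total += weight * weight
        factors.append(total)
    return factors


factors = forecast_variance_factors(0.90, [-0.28, -0.19], horizon=6)
# The interval half-width at horizon h is proportional to factors[h-1] ** 0.5
```

The factors never decrease with the horizon, which is why the interval around the point forecast keeps widening.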
20
Conclusions
The past values of a variable contain very important information about the future of that variable!
Time series forecasting is a very useful tool and is easy to implement
It can be applied in a variety of fields, one of which is healthcare/medical research
In this example, just by looking at the time series properties of the data, we were able to infer the sign of the growth rate of hospital waiting lists!
21
To conclude, if your model fails…
22