
Luciano Rispoli Department of Economics, Mathematics and Statistics

Birkbeck College (University of London)


Forecasting: definition

“Forecasting is the process of making statements about events whose actual outcomes (typically) have not yet been observed” (Wikipedia)

There are many categories of forecasting methods:

Categorical vs Quantitative

Naïve approach (forecast based solely on previous period realisation)

Time Series (Box Jenkins Methodology)

Judgemental methods (based on subjective probability)


Why Forecasting?

Economic Forecasting (Inflation/GDP forecasting)

Sales Forecasting

Supply Chain Forecasting

Earthquake Forecasting

Weather Forecasting

Hotel Management (Room bookings forecasting)


Forecasting medical time series: a sample of existing studies

“The Application of Forecasting Techniques to Modelling Emergency Medical System Calls in Calgary, Alberta” [see Channouf et al (2006)]

“Box Jenkins Methodology in Medical Research” [see Helfenstein (1996)]

“Time series modelling for syndromic surveillance” [see Reis and Mandl (2003)]

“Conventional and advanced time series estimation: application to the Australian and New Zealand Intensive Care Society (ANZICS) adult patient database, 1993–2008” [see Solomon and Moran (2011)]


Today’s application: UK Hospital Waiting Lists

“PM's election pledge in jeopardy as report reveals patients waiting 6% longer”… The Guardian

“NHS Chief warns of rising Hospital waiting times” BBC News

“NHS waiting times may increase at one in three flagship hospitals: report” The Telegraph

“…We will match people's symptoms to certain groups of conditions and try to provide a general forecast…” Weather used to forecast illness, Daily Mail

“Surgery waiting lists hit one million” Mail online

Solution: Forecasting the number of patients placed on a waiting list! But why is it so important?

Waiting lists grow if the demand for a specific treatment outstrips hospital capacity (supply)

Hence, forecasting the number of patients placed on a waiting list at a given time might provide an estimate of the demand and supply imbalance

Hospital managers could, in principle, know by how much the list is growing/declining (at least within a confidence interval)

This methodology could be easily extended to forecast individual hospital waiting lists, as well as treatments/surgery specific waiting lists


Data available at:

http://www.dh.gov.uk/en/Publicationsandstatistics/Statistics/Performancedataandstatistics/HospitalWaitingTimesandListStatistics/index.htm

This dataset contains information on patients waiting to be admitted to NHS hospitals in England either as a day case or ordinary admission.

Provider based

Time series data from April 1998 to Feb 2010 (143 obs)

It does not contain:

Emergency cases and outpatients

Methodology

ARIMA models have been found particularly useful in describing stationary (non-seasonal) time series.

A stationary stochastic process is a process whose joint distribution does not change when shifted in time. In the weak (covariance) sense used here, it has finite first and second order moments: a constant mean and variance, and autocovariances that depend only on the lag

Wold’s Theorem: Any stationary series can be expressed as the sum of two components: a perfectly forecastable series and a moving average of possibly infinite order.

Thus a stationary non-seasonal series can always be approximated by an MA(∞) model, which in turn can be approximated by an ARMA (p , q) with a small number of parameters p, q.
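
In symbols, the decomposition and the ARMA approximation can be sketched as follows. This uses standard textbook notation rather than anything shown on the slides: the symbols $\kappa_t$ (the perfectly forecastable part), $\psi_j$, $\phi_i$, $\theta_j$ and the white noise shock $\varepsilon_t$ are assumptions introduced for illustration.

```latex
% Wold decomposition: deterministic part plus a moving average of possibly infinite order
X_t = \kappa_t + \sum_{j=0}^{\infty} \psi_j \, \varepsilon_{t-j},
\qquad \psi_0 = 1, \quad \sum_{j=0}^{\infty} \psi_j^2 < \infty

% ARMA(p, q): a parsimonious approximation with p + q coefficients
X_t = \sum_{i=1}^{p} \phi_i \, X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
```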


Methodology cont’d

However, most time series are not stationary and usually have a seasonal component!

We have to transform these series into stationary, non-seasonal series before we can model them

Seasonality (apply seasonal differencing)

Non-stationarity can be classified as:

Trend in mean (difference as many times as required)

Trend in variance (apply a power transformation, e.g. log)

Note: The latter should be applied only if it stabilises the variance!
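
The transformations listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the slides: the function names and the sample numbers are hypothetical, and a period of 12 is assumed because the data are monthly.

```python
import numpy as np

def log_difference(x):
    """First difference of the natural log: an approximate period-on-period growth rate."""
    x = np.asarray(x, dtype=float)
    return np.diff(np.log(x))

def seasonal_difference(x, period=12):
    """Subtract the value observed `period` steps earlier (12 assumed for monthly data)."""
    x = np.asarray(x, dtype=float)
    return x[period:] - x[:-period]

# Hypothetical monthly levels (illustrative numbers, not the NHS data)
levels = np.array([1000.0, 1010.0, 1005.0, 990.0])
growth = log_difference(levels)
```

Each first difference shortens the series by one observation and each seasonal difference by `period` observations, which matters with only 143 data points.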


Total waiting list x 1000 patients

[Figure: monthly time series plot of the total waiting list (x 1000 patients), 1998m1 to 2010m1; vertical axis from 600 to 1400]

Source: author’s calculations

Growth Rate of Hospital Waiting Lists (difference the log)

[Figure: time series plot of dlwsa, 1998m1 to 2010m1; vertical axis from -.04 to .02]

Source: author’s calculations

Box Jenkins Methodology

Plot the series and identify the trend (is the series trending in the mean/variance?)

Test for stationarity (Augmented Dickey Fuller Test, KPSS Test, PP-Test)

Transform the series into a stationary series (power transformation/seasonal differencing/first differencing etc.)

Plot the Autocorrelation and Partial Autocorrelation functions (identify possible models)

Estimate the possible models (check for coefficient significance and white noise residuals)

Select the best models based on the information criteria (the model with lowest AIC, BIC, HQ)

Select the best two models given the above and test their forecasting accuracy (Diebold Mariano Test, Granger Newbold)
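
The model selection step can be illustrated with a small helper that turns a model's residuals into the three information criteria. This is a sketch under stated assumptions: zero-mean Gaussian residuals, the innovation variance counted as one extra parameter, and a hypothetical function name. Statistical packages may use slightly different conventions, so only compare values computed the same way.

```python
import numpy as np

def information_criteria(residuals, n_params):
    """AIC, BIC and HQ from a Gaussian log-likelihood evaluated at the residual variance."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    sigma2 = np.mean(e ** 2)                       # ML estimate of the innovation variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = n_params + 1                               # assumption: the variance counts as a parameter
    return {
        "AIC": -2 * loglik + 2 * k,
        "BIC": -2 * loglik + k * np.log(n),
        "HQ": -2 * loglik + 2 * k * np.log(np.log(n)),
    }
```

For example, an ARMA (1 , 2) without a constant would be passed `n_params=3`; among candidate models fitted to the same sample, the one with the lowest value is preferred.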


Model identification: ACF, PACF

[Figure: autocorrelations of dltwl, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

[Figure: partial autocorrelations of dltwl, lags 0 to 40, with 95% confidence bands (se = 1/sqrt(n))]

ACF: Correlation between the series and a lag of itself, $r_k = \mathrm{Corr}(X_t, X_{t-k}) = \gamma_k / \gamma_0$

PACF: Amount of correlation between a variable and a lag of itself that is not explained by correlations at all lower-order lags
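
Both quantities can be computed directly from the data. The sketch below is an illustration rather than the routine behind the plots: it estimates the ACF from sample autocovariances and obtains the PACF with the Durbin-Levinson recursion, one of several standard estimators. Function names are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag, with r_k = gamma_k / gamma_0."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n = d.size
    gamma0 = np.dot(d, d) / n
    return np.array([np.dot(d[k:], d[:n - k]) / n / gamma0
                     for k in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi = np.array([phi_kk])
        else:
            # Correlation at lag k left over after the lower-order lags are accounted for
            num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
            den = 1.0 - np.dot(phi_prev, r[1:k])
            phi_kk = num / den
            phi = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
        phi_prev = phi
    return pacf
```

As a rule of thumb, a PACF that cuts off after lag p points towards an AR(p), while an ACF that cuts off after lag q points towards an MA(q).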

List of potential models

AR (1)

AR (2)

MA (1)

MA (2)

ARMA (1 , 1)

ARMA (2 , 1)

ARMA (1 , 2)

Note: ARMA (1 , 2) was the model with the smallest information criteria, individually/jointly significant coefficients and uncorrelated residuals, as well as yielding the most accurate forecasts!


Selecting the best Model

The selected model is an ARMA ( 1 , 2 )

$\text{dltwl}_t = 0.90\,\text{dltwl}_{t-1} + \varepsilon_t - 0.28\,\varepsilon_{t-1} - 0.19\,\varepsilon_{t-2}$

The series shows a high degree of persistence, insofar as it inherits a large proportion of the previous period's realisation

Invertibility, causality and stationarity conditions:

The process satisfies the stationarity condition since the coefficient on the AR component is less than one in absolute value, hence ensuring that the process has finite first and second moments

The process satisfies the invertibility and causality conditions since the roots of the characteristic equations of the autoregressive and moving average components lie outside the unit circle.

These last two requirements imply that the model is uniquely identified in its parameters!
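
These conditions can be checked numerically from the reported coefficients. The sketch below assumes the usual convention in which the AR polynomial is 1 - 0.90 z and the MA polynomial is 1 - 0.28 z - 0.19 z^2, and it treats the slide's rounded estimates as exact. The helper name is hypothetical.

```python
import numpy as np

# Coefficients as reported for the ARMA (1 , 2) model
ar_coefs = [0.90]            # coefficient on dltwl(-1)
ma_coefs = [-0.28, -0.19]    # coefficients on eps(t-1) and eps(t-2)

def characteristic_roots(coefs, is_ar):
    """Roots of 1 - c1 z - c2 z^2 - ... (AR) or 1 + c1 z + c2 z^2 + ... (MA)."""
    sign = -1.0 if is_ar else 1.0
    # np.roots expects the coefficient of the highest power first
    poly = [sign * c for c in reversed(coefs)] + [1.0]
    return np.roots(poly)

ar_roots = characteristic_roots(ar_coefs, is_ar=True)
ma_roots = characteristic_roots(ma_coefs, is_ar=False)

stationary = bool(np.all(np.abs(ar_roots) > 1))   # AR roots outside the unit circle
invertible = bool(np.all(np.abs(ma_roots) > 1))   # MA roots outside the unit circle
```

With these numbers the AR root is 1/0.90 and the two MA roots are real with modulus above one, so both flags come out true.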


FORECASTING


ARMA (1,2) Model forecast

“Out of sample forecast”
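
A minimal sketch of how out-of-sample forecasts follow from the fitted ARMA (1 , 2) equation: iterate it forward and replace every future shock by its expected value of zero. The default coefficients are the reported estimates; the function name, its arguments and the input values in any call are hypothetical, and no constant term is included because none is reported.

```python
import numpy as np

def arma12_forecast(last_value, last_eps, prev_eps, horizon,
                    phi=0.90, theta1=-0.28, theta2=-0.19):
    """Iterate the ARMA(1,2) equation forward, setting future shocks to zero."""
    forecasts = []
    y_prev = last_value
    eps = [last_eps, prev_eps]       # [eps_T, eps_{T-1}], the last two in-sample residuals
    for _ in range(horizon):
        y_hat = phi * y_prev + theta1 * eps[0] + theta2 * eps[1]
        forecasts.append(y_hat)
        y_prev = y_hat
        eps = [0.0, eps[0]]          # the next shock is unknown; its expectation is zero
    return np.array(forecasts)
```

The moving average terms only affect the first two steps; from the third step onwards each forecast is 0.90 times the previous one, so the forecasts decay geometrically towards zero.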

[Figure: dlwsa and one-step xb prediction, 1998m1 to 2012m1; vertical axis from -.04 to .02]

No change forecast

[Figure: dltwl and L.dltwl, 1998m1 to 2010m1; vertical axis from -.04 to .02]

Forecasting Diagnostics

Theil U-statistics

$$U = \frac{\frac{1}{T}\sum_{t=1}^{T-1}\left(\frac{f_{t+1}-y_{t+1}}{y_t}\right)^2}{\frac{1}{T}\sum_{t=1}^{T-1}\left(\frac{y_{t+1}-y_t}{y_t}\right)^2} = 0.34$$

If U < 1 the model is superior to a No change forecast

If U > 1 the model is inferior to a No change forecast
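
The statistic compares the model's relative one-step errors with those of the no change forecast. The sketch below follows the ratio as it appears on the slide; the common textbook form (often called U2) is its square root, available here through the hypothetical `take_sqrt` flag. Taking the root does not change which side of 1 the statistic falls on. The function name and argument layout are assumptions.

```python
import numpy as np

def theil_u(forecasts, actuals, take_sqrt=False):
    """Theil's U for one-step-ahead forecasts against a no change benchmark.

    forecasts[t] is the forecast of actuals[t + 1] made at time t, so
    forecasts has one element fewer than actuals.
    """
    y = np.asarray(actuals, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    model_err = (f - y[1:]) / y[:-1]         # relative error of the model
    naive_err = (y[1:] - y[:-1]) / y[:-1]    # relative error of "no change"
    ratio = np.mean(model_err ** 2) / np.mean(naive_err ** 2)
    return np.sqrt(ratio) if take_sqrt else ratio
```

Feeding in the no change forecast itself returns exactly 1, and a perfect forecast returns 0. Because the errors are divided by the previous observation, the statistic is unstable when the series passes close to zero.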


Drawbacks of this approach

Lack of data availability

It assumes that the Data Generating Process is time invariant (which might well not be the case!)

The point forecast confidence interval becomes wider and wider as the forecasting horizon increases! (Unless we are only concerned with one-step-ahead forecasts and the data-set is sufficiently large for our purposes)

It is very unlikely that we find a model that fits the data particularly well; in practice, models can often explain very little!
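
The widening of the forecast interval can be made concrete for an ARMA (1 , 2). The sketch below is an illustration under simplifying assumptions: it uses the reported coefficients as if they were known exactly (ignoring estimation uncertainty), and `sigma`, the standard deviation of the shocks, is a hypothetical input. The h-step forecast error variance is sigma^2 times the sum of the first h squared MA(∞) weights.

```python
import numpy as np

def arma12_psi_weights(n, phi=0.90, theta1=-0.28, theta2=-0.19):
    """First n MA(infinity) weights: psi_0 = 1, psi_j = phi * psi_{j-1} + theta_j."""
    theta = [theta1, theta2]
    psi = [1.0]
    for j in range(1, n):
        extra = theta[j - 1] if j <= 2 else 0.0   # theta_j is zero beyond lag 2
        psi.append(phi * psi[-1] + extra)
    return np.array(psi)

def forecast_std(horizon, sigma=1.0):
    """Standard error of the h-step-ahead forecast error for h = 1..horizon."""
    psi = arma12_psi_weights(horizon)
    return sigma * np.sqrt(np.cumsum(psi ** 2))

# Half-width of an approximate 95% interval at horizons 1 to 12 (sigma = 1)
half_width = 1.96 * forecast_std(12)
```

The half-width grows with every step but at a decreasing rate, because the weights shrink geometrically once the moving average terms have dropped out.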


Conclusions

The past values of a variable contain very important information about the future of that variable!

Time series forecasting is a very useful tool and it is easy to implement

It can be applied to a variety of fields, one of which is Healthcare/Medical Research

In this example, we have seen that just by looking at the time series properties of the data we were able to infer the sign of the growth rate of hospital admissions!


To conclude, if your model fails…
