Seasonal Mortality Forecasting based on Influenza Epidemics in The Netherlands Emanuela Festa Master’s Thesis to obtain the degree in Actuarial Science and Mathematical Finance University of Amsterdam Faculty of Economics and Business Amsterdam School of Economics Author: Emanuela Festa Student nr: 10746994 Email: emanuela [email protected] Date: October 29, 2016 Supervisor: Andrei Lalu Second reader: Prof. Dr. Ir. Michel Vellekoop


Seasonal Mortality Forecasting based on Influenza Epidemics in The Netherlands

Emanuela Festa

Master’s Thesis to obtain the degree in
Actuarial Science and Mathematical Finance
University of Amsterdam
Faculty of Economics and Business
Amsterdam School of Economics

Author: Emanuela Festa
Student nr: 10746994
Email: emanuela [email protected]

Date: October 29, 2016
Supervisor: Andrei Lalu
Second reader: Prof. Dr. Ir. Michel Vellekoop


Statement of Originality

This document is written by Student Emanuela Festa who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

It is important for pension funds and life insurance companies to be able to determine longevity and mortality risk with great accuracy, especially due to Solvency II regulations. Current mortality forecasting methods provide estimates of future mortality rates using observed annual data. Mortality, however, does not stay constant throughout the year. Studies have shown that economically developed countries such as The Netherlands tend to have a much higher mortality rate in winter than in summer. Possible causes include influenza epidemics and cold winter temperatures. A higher degree of accuracy in mortality forecasting could perhaps be achieved if future mortality rates could be predicted using higher frequency recordings, for example, weekly observations instead of annual ones. To this end, this thesis focuses on weekly mortality rates in The Netherlands in the time span of 1995 to 2015. In order to explain their seasonal behaviour, correlations between mortality and temperature, and between mortality and influenza epidemics are studied. The standard Lee-Carter model is then used as the basis for designing seasonal mortality forecasting models. Different types of seasonal ARIMA models to estimate Lee-Carter’s time-varying parameter, the mortality index, are considered. One of the model specifications incorporates a flu indicator to estimate the excess winter mortality more accurately. The other models are seasonal ARIMA structures and ARIMA with seasonal dummy variables. The models are then evaluated according to their performance in out-of-sample forecasts and according to measures of residual variance. The results show that a flu indicator increases the out-of-sample forecast performance of seasonal ARIMA models significantly and residual variance is reduced the most when using an ARIMA with seasonal dummy variables. Finally, the thesis argues that a flu indicator model could be a useful tool for testing different future flu epidemics scenarios.

Keywords: Lee-Carter, Seasonal ARIMA, Seasonal mortality, Excess winter mortality, Forecasting mortality, Dutch mortality data, Influenza epidemics


Contents

Acknowledgements

1 Introduction

2 Data description
  2.1 Influenza epidemics
  2.2 Average temperature
  2.3 Central mortality rate
  2.4 Correlation of $m_t$ with flu data and average temperatures in winter
    2.4.1 Correlation between $m_t$ and average winter temperatures
    2.4.2 Correlation between $m_t$ and number of recorded ILI patients

3 Modelling of weekly Dutch population mortality
  3.1 The Lee-Carter Model
    3.1.1 Estimation of the age specific and time-varying parameters
  3.2 Application of the Lee-Carter Model to weekly non-age specific mortality
    3.2.1 Estimation of the parameters
  3.3 Non-seasonal ARIMA models
    3.3.1 Model specification
    3.3.2 Random Walk with Drift model
    3.3.3 Mortality forecasting with RWD
  3.4 Seasonal ARIMA structures
    3.4.1 SARIMA(0,1,0)(0,1,0)$_{52}$ with drift model
  3.5 ARIMA model with weekly dummy variables
  3.6 SARIMA with flu indicator as exogenous regressor
    3.6.1 Constructing a flu indicator

4 Results
  4.1 Full sample forecasts
    4.1.1 Random Walk with Drift
    4.1.2 SARIMA(p,d,q)(P,D,Q)$_{52}$ with drift
    4.1.3 RWD with weekly dummy variables
    4.1.4 SARIMA(p,d,q)(P,D,Q)$_{52}$ with drift, with flu indicator
    4.1.5 Model evaluation
  4.2 Out-of-sample forecasts of $\kappa_t$ and $m_t$
    4.2.1 SARIMA(2,0,1)(0,1,1)$_{52}$ with drift
    4.2.2 RWD with weekly dummy variables
    4.2.3 SARIMA(2,0,1)(0,1,1)$_{52}$ with drift, with flu indicator
    4.2.4 Out-of-sample forecasts of the central mortality rate
    4.2.5 Model evaluation
  4.3 Long-term forecasts

5 Conclusions

Appendix A: Monthly to weekly adjustments of population data

Appendix B: Parameters of ARIMA models

Appendix C: R code

Bibliography


Acknowledgements

First of all I would like to thank my supervisor Andrei Lalu for his availability and the discussions on seasonal mortality forecasting. My gratitude goes also to my husband Pedro Borges for his loving support during my studies and for contributing with the reviewing of my thesis. Finally, a big thank you to my family for their unconditional support, which made writing this thesis possible.


Chapter 1

Introduction

Calculating longevity and mortality risk is important for pension funds and insurance companies providing pension plans and life insurance. In Europe, in order to be compliant with Solvency II regulations, these institutions need to have enough capital reserved to be able to finance the liabilities they will incur over the next year with a probability of at least 99.5%. It is therefore very important to be able to estimate and forecast mortality as accurately as possible.

Existing mortality forecasting methods use observed annual mortality rates as input and hence yield annual forecasts. However, in certain countries, mortality does not stay constant throughout the year. In economically developed countries, mortality rates tend to increase significantly in the winter season, and in some years more than in others. A study by Reichert et al. (2004) suggests that influenza epidemics play a very important role in causing this phenomenon. Another factor that could be responsible for higher mortality rates in winter is the occurrence of cold spells (Huynen et al., 2001).

In order to increase accuracy of mortality forecasts and therefore improve the determination of mortality risk, one could study the death rate observed on a monthly or weekly basis. Moreover, if a strong correlation between winter excess mortality and either average temperature or influenza epidemics can be found, one of these variables could be added as an indicator in mortality forecasting models based on higher frequency mortality rates. Creating a model that can forecast the mortality on a weekly basis instead of yearly might allow pension funds and insurance companies to manage reserves more effectively and estimate their risk for the coming years with more accuracy.

The forecasting method chosen for this research is the standard model by Lee and Carter (1992) due to its simplicity and empirical accuracy. The Lee-Carter model is one of the most frequently used models for mortality forecasting and various extensions, improvements and applications have been found since its introduction in 1992 (see for example Lee, 2000).

The aim of this thesis is to find appropriate models for forecasting the seasonal behaviour of the time-varying parameter of the Lee-Carter model, and hence of the central mortality rate. To this end, various specifications of seasonal autoregressive integrated moving average (ARIMA) models are considered. Furthermore, an improved ARIMA specification that uses information from a flu indicator to model excess winter mortality is proposed.

As input for the models, weekly mortality rates in The Netherlands spanning 20 years (from 1995 to 2015) are used. The performance of these models is evaluated using out-of-sample forecasts and by comparing measures of residual variance. Finally, the out-of-sample forecasts of the seasonal models are compared to the out-of-sample forecast of a yearly model by aggregating the weekly forecasts of the mortality rate into yearly ones. This is an interesting comparison to make because it could provide some evidence in support of the hypothesis that by aggregating higher frequency forecasts a higher degree of accuracy of yearly mortality forecasts can be obtained. If this is the case, modelling higher frequency mortality rates and then aggregating the forecasts could become a more accurate method for determining future yearly mortality, and hence mortality and longevity risk.

Here is a brief outline of how the thesis is structured. Chapter 2 provides a description of the types of data that were studied and their sources. Also, a discussion on the choice of variables to use for modelling excess winter mortality is given. Chapter 3 focuses on the theory of the models considered for forecasting yearly and seasonal mortality. The first section describes the standard Lee-Carter model and the estimation of its parameters, whereas the second section presents the non-age specific version of the model. The third section describes a method for forecasting the time-trend parameter of the Lee-Carter model, also called the mortality index. Sections 4 and 5 describe more complex methods that can be applied in order to model and forecast the seasonal behaviour of the weekly mortality index, and the last section describes how the research on flu epidemics can be applied in order to model excess winter mortality. Chapter 4 presents the results of the forecasts obtained with the models described in chapter 3 and evaluates their performance. Finally, in chapter 5, the conclusions are presented.


Chapter 2

Data description

Three types of data are considered for the research. The first type gives an indication of how many people are affected by influenza in The Netherlands each week and the second is a measure of the average temperature in The Netherlands per week. The third type concerns demographics: more specifically number of deaths and population size in The Netherlands per week. These statistics have been used in order to calculate the central mortality rate per week in The Netherlands over the last 20 years. The next three sections explain in detail what kind of data was obtained and from which sources. In the fourth section examples are shown of how the central mortality rate correlates with data on flu epidemics and with average temperature during winter seasons.

2.1 Influenza epidemics

The World Health Organization (WHO) talks in terms of “influenza seasons”. An influenza season is a time period that begins each year at week 40 and ends at week 20 in the following year. The advantage of talking in terms of influenza seasons instead of years is that normally an influenza season includes at most one flu epidemic whereas a year could possibly include two. This system is more useful because it makes it clear which flu epidemic one is referring to.

The Dutch government established that a flu epidemic is present in The Netherlands when, for two consecutive weeks, at least 51 out of 100,000 people per week have reported an influenza-like illness (ILI) to their general practitioner. One of the organisations that gathers this type of statistics in The Netherlands is the Netherlands Institute for Research of Healthcare (NIVEL). The NIVEL publishes weekly bulletins called “NIVEL Zorgregistraties eerste lijn” (Hooiveld et al., 2016) during the influenza seasons, reporting, among other things, the number of ILI patients that have been registered by a certain number of general practitioners, referred to as the sentinel doctors, who are working in sentinel station practices. The sentinel stations are spread all over the country but the total number and their distribution across The Netherlands may vary per year. This kind of registration is called Continuous Morbidity Registration (CMR). The NIVEL also publishes extensive yearly reports called “Continue Morbiditeits Registratie Peilstations Nederland”, which gather and analyse all the information collected in the previous year and compare it to information collected in earlier years. See for example Donker (2012) and Bartelds (2003). The reports are not limited to ILI but cover a wide range of pathologies.

The data from the weekly NIVEL bulletins and from the yearly NIVEL reports published in the last 20 years has been used in this research to derive an indicator for the intensity and length of flu epidemics from 1995 to 2015.
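The epidemic criterion described in this section (at least 51 ILI reports per 100,000 people in two consecutive weeks) can be illustrated with a short sketch. The function name, the sample incidences and the choice to flag both weeks of a qualifying pair are assumptions made for illustration; they are not taken from the NIVEL bulletins or from the thesis.

```python
def epidemic_weeks(ili_per_100k, threshold=51):
    """Flag weeks that belong to a run of at least two consecutive weeks
    with ILI incidence at or above the threshold (per 100,000 people).

    Flagging both weeks of a qualifying pair is an illustrative choice.
    """
    above = [value >= threshold for value in ili_per_100k]
    flags = [False] * len(above)
    for i in range(len(above) - 1):
        if above[i] and above[i + 1]:
            flags[i] = True
            flags[i + 1] = True
    return flags


# Hypothetical weekly ILI incidences per 100,000 (illustrative values only)
sample = [20, 55, 30, 60, 75, 52, 40]
print(epidemic_weeks(sample))
```

A single week above the threshold (the second value in the sample) is not flagged, because the criterion requires two consecutive weeks.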


2.2 Average temperature

Daily and monthly average temperatures are available from the European Climate Assessment & Dataset (ECA&D) (Klein Tank, 2002). In order to obtain weekly average temperatures the daily average temperatures have been averaged to form weekly ones. The location in The Netherlands where the temperatures were measured is De Bilt. This location is chosen because it is in the centre of The Netherlands and therefore gives more or less an average of the temperatures in the whole country. In the North and more inland towards the South and the East, temperatures tend to be slightly lower during winter than in the centre of The Netherlands, whereas in the West they tend to be slightly higher. Overall, however, average temperatures in different regions of The Netherlands are fairly similar.

2.3 Central mortality rate

The age specific central mortality rate $m_{x,t}$ is defined as the average death rate experienced by a population group of age $x$ at time $t$. It is calculated as follows:

$$m_{x,t} = \frac{d_{x,t}}{e_{x,t}}, \tag{2.1}$$

where $d_{x,t}$ represents the number of people aged $x$ that have died at time $t$ and $e_{x,t}$, also known as the exposure to mortality risk, represents the average number of people aged $x$ living at time $t$.

In general time $t$ is expressed in terms of years and it is unusual for actuaries to work with non-annual demographic data. Death rates of a higher frequency (for example monthly or weekly) are not as common and therefore harder to obtain for a specific time-span. The time-span that can be studied is therefore determined by the availability of weekly data from Statistics Netherlands (CBS). This is a time-span ranging from 1995 up to and including 2014. For this time-span the CBS has provided the number of deaths in The Netherlands per week and the number of people living in The Netherlands per month (measured at the beginning of the month). Weekly population figures were estimated from the monthly figures using a method described in Appendix A. The data on number of deaths and population is also available for female and male population subgroups and (for shorter time-spans) for different age groups.

Due to data availability constraints age differentiation will be excluded in the model. Furthermore, instead of the yearly behaviour of the mortality time trend the seasonal behaviour will be studied. To this end the non-age specific central mortality rate, denoted by $m_t$, will be used. Here $t$ represents the week number belonging to a particular year. For simplification it is assumed that a year consists of exactly 52 weeks.

The non-age specific central mortality rate is defined as follows:

$$m_t = \frac{d_t}{e_t}, \tag{2.2}$$

where $d_t$ represents the number of deaths in week $t$ and $e_t$ represents the average number of people living in week $t$.
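As a small illustration of equation 2.2, the sketch below computes weekly central mortality rates from deaths and exposure. The function name and the numbers are hypothetical and only indicate the order of magnitude one might expect; they are not CBS figures.

```python
def central_mortality_rate(deaths, exposure):
    """Return m_t = d_t / e_t for each week t (equation 2.2)."""
    if len(deaths) != len(exposure):
        raise ValueError("deaths and exposure must have the same length")
    return [d / e for d, e in zip(deaths, exposure)]


# Hypothetical weekly deaths and average population sizes (illustrative only)
deaths = [2900, 3100, 2700]
exposure = [15_400_000, 15_401_000, 15_402_000]
m_t = central_mortality_rate(deaths, exposure)
print(m_t)
```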


2.4 Correlation of $m_t$ with flu data and average temperatures in winter

The first part of this section evaluates the correlation between average temperature and the central mortality rate $m_t$, whereas the second part addresses the correlation between flu epidemics and $m_t$. Based on these evaluations, the most suitable variable for fitting seasonal mortality is chosen. Only data from the winter seasons (starting on week 40 in year $y$ and ending on week 20 in year $y + 1$) is shown because measurements regarding influenza are only published for this period of time by organisations such as the NIVEL.

2.4.1 Correlation between $m_t$ and average winter temperatures

In order to study the correlation between $m_t$ and weekly average temperature, the $R^2$ values of these variables were calculated for all winter seasons occurring between 1995 and 2015. The plots in figure 2.1 illustrate four examples (taken in five year intervals) of the correlations found. The regression lines and $R^2$ values belonging to the plots are given in table 2.1.

Figure 2.1: Correlation between $m_t$ and average temperature for four specific winter seasons.

| Winter season | Linear regression equation | $R^2$ |
|---|---|---|
| 1997/1998 | $y = 9.2 \times 10^{4}x - 8$ | 6.2% |
| 2002/2003 | $y = -1.8 \times 10^{5}x + 37$ | 17.9% |
| 2007/2008 | $y = -2.0 \times 10^{5}x + 41$ | 19.5% |
| 2012/2013 | $y = -2.2 \times 10^{5}x + 43$ | 41.1% |

Table 2.1: Linear regression equations and $R^2$ values from the scatter plots in figure 2.1.

Three of the plots in figure 2.1 suggest a negative correlation between average temperature and $m_t$, whereas in one of the plots (winter 1997/1998) it appears to be positive. Out of the 20 plots made, however, only one finds a positive correlation. Therefore, the correlation between average temperature and $m_t$ is predominantly negative, as intuitively expected.
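The regression lines and $R^2$ values reported per winter season can be computed with ordinary least squares. The following is a minimal sketch, not the code used in the thesis: the function name is hypothetical, and reading $x$ as $m_t$ and $y$ as the second variable is an assumption based on the size of the slopes in table 2.1.

```python
def linear_fit_r2(x, y):
    """Ordinary least squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the squared
    sample correlation between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Hypothetical weekly values for one winter season (illustrative only):
# x holds central mortality rates, y holds average temperatures in degrees Celsius.
x = [1.8e-4, 1.9e-4, 2.0e-4, 2.1e-4, 2.2e-4]
y = [6.0, 5.0, 2.5, 2.0, -1.0]
slope, intercept, r2 = linear_fit_r2(x, y)
print(slope, intercept, r2)
```

Applying the function to each of the 20 winter seasons in turn would give the set of $R^2$ values whose distribution is discussed next.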


The histogram in figure 2.2 illustrates the distribution of the 20 $R^2$ values calculated. They range from 6% to 61% and the median was found to be 31%.

Figure 2.2: Histogram showing the $R^2$ values between average temperature and mortality for all the winter seasons between 1995 and 2015. (Horizontal axis: $R^2$; vertical axis: frequency.)

2.4.2 Correlation between $m_t$ and number of recorded ILI patients

The correlation between $m_t$ and the number of ILI patients recorded by sentinel doctors was calculated for all winter seasons occurring between 1995 and 2015. In figure 2.3 four examples of correlation plots between these two variables are shown for the same winter seasons considered in figure 2.1. The regression lines and $R^2$ values belonging to the plots are given in table 2.2.

Figure 2.3: Correlation between $m_t$ and number of recorded ILI patients for four specific winter seasons.

The correlations in the plots of figure 2.3 were found to be positive. Positive correlations were also found for the remaining 16 years in the time range from 1995 to 2015.


| Winter season | Linear regression equation | $R^2$ |
|---|---|---|
| 1997/1998 | $y = 3.0 \times 10^{6}x - 473$ | 59.0% |
| 2002/2003 | $y = 9.2 \times 10^{5}x - 134$ | 29.4% |
| 2007/2008 | $y = 5.6 \times 10^{5}x - 53$ | 11.3% |
| 2012/2013 | $y = 3.0 \times 10^{6}x - 383$ | 75.3% |

Table 2.2: Linear regression equations and $R^2$ values from the scatter plots in figure 2.3.

The histogram in figure 2.4 illustrates the distribution of the 20 $R^2$ values calculated. The values range from 7% to 87% and the median was found to be 57%.

Figure 2.4: Histogram showing the $R^2$ values between recorded ILI patients and mortality for all the winter seasons between 1995 and 2015. (Horizontal axis: $R^2$; vertical axis: frequency.)

The histograms show that the correlation strength between flu epidemics and $m_t$ varies considerably from year to year but is overall stronger than the correlation found between average temperature and mortality. In fact, the correlation between flu epidemics and $m_t$ was found to be stronger for 16 out of the 20 years considered, that is to say in 80% of the cases. Furthermore, the median $R^2$ value was found to be 57% for flu epidemics versus $m_t$, and only 31% for average temperature versus $m_t$. Following these observations, the decision was made to use the recorded ILI patients as exogenous variable for modelling seasonal mortality in The Netherlands.


Chapter 3

Modelling of weekly Dutch population mortality

The modelling of weekly Dutch population mortality from 1995 to 2015 in this research is based on the standard Lee-Carter model (Lee and Carter, 1992). As mentioned in the introduction, this model is chosen due to its simplicity and empirical accuracy.

Section 3.1 briefly explains the theory of the model presented in Lee and Carter’s paper from 1992, Modeling and Forecasting US Mortality. Section 3.2 describes how the non-age specific version of the Lee-Carter model is applied in order to conduct research on seasonal mortality in The Netherlands. The remaining sections describe various ARIMA models that were used to fit the $\kappa_t$ parameter of the Lee-Carter model, also known as the mortality index.

In section 3.3 the base specification for the ARIMA model used to fit $\kappa_t$ is described. It is a simple ARIMA(0,1,0) with constant, which can also be referred to as a random walk with drift (RWD). However, the mortality index determined per week over a 20 year time span has a strong seasonal behaviour due to excess mortality occurring in the winter season. This seasonal behaviour cannot be modelled using a simple RWD model and therefore more sophisticated models need to be found.

In order to capture seasonality, other types of ARIMA models are explored. Section 3.4 describes seasonal ARIMA (SARIMA) structures where both the seasonal and non-seasonal behaviours of $\kappa_t$ can be modelled. Section 3.5 describes an ARIMA(0,1,0) model with 51 seasonal dummy variables and section 3.6 introduces a SARIMA model in which an influenza indicator based on the number of reported flu cases in the winter season is used as an exogenous regressor to fit seasonal behaviour of the mortality index.

3.1 The Lee-Carter Model

Lee and Carter published in 1992 a very easy to use model for fitting and forecasting the central mortality rate. According to this model, the age specific log-mortality $\ln(m_{x,t})$ is given by

$$\ln(m_{x,t}) = \alpha_x + \beta_x \kappa_t + \sigma_x \varepsilon_t. \tag{3.1}$$

The $\alpha_x$ and $\beta_x$ parameters are age specific, whereas the $\kappa_t$ parameter, a stochastic process also known as the mortality index, represents the change in mortality rates over time. The $\alpha_x$ parameter can be interpreted as the average mortality belonging to a certain age group and the $\beta_x$ parameter as the age sensitivity of the mortality index. The error term $\varepsilon_t$ is assumed to be independent and identically distributed with mean 0 and variance 1.

Without restrictions the model is undetermined. One can easily verify that for any constant $c$, the following sets of parameters will lead to the exact same model: $\{\alpha_x, \beta_x, \kappa_t\}$, $\{\alpha_x, \beta_x/c, c \cdot \kappa_t\}$ or $\{\alpha_x - c \cdot \beta_x, \beta_x, \kappa_t + c\}$. To make the model identifiable, Lee and Carter imposed the following constraints on $\beta_x$ and $\kappa_t$:

$$\sum_x \beta_x = 1; \tag{3.2}$$

$$\sum_t \kappa_t = 0. \tag{3.3}$$

From these restrictions it follows that the $\alpha_x$ parameter is equal to the age specific log-mortality averaged over time:

$$\alpha_x = \frac{1}{T} \sum_{t=1}^{T} \ln(m_{x,t}). \tag{3.4}$$
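The step from the constraints to equation 3.4 can be made explicit. As a brief sketch, averaging equation 3.1 over $t = 1, \ldots, T$ and applying constraint 3.3 gives the expression below; treating the time-averaged error term $\bar{\varepsilon}$ as negligible is an assumption that the text does not state explicitly.

```latex
\frac{1}{T}\sum_{t=1}^{T}\ln(m_{x,t})
  = \alpha_x
    + \beta_x \cdot \frac{1}{T}\sum_{t=1}^{T}\kappa_t
    + \sigma_x \cdot \frac{1}{T}\sum_{t=1}^{T}\varepsilon_t
  = \alpha_x + \sigma_x \bar{\varepsilon},
\qquad
\bar{\varepsilon} = \frac{1}{T}\sum_{t=1}^{T}\varepsilon_t .
```

The middle term vanishes because the $\kappa_t$ sum to zero, and $\bar{\varepsilon}$ has mean zero, so the time-average of the log-mortality equals $\alpha_x$ in expectation.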

3.1.1 Estimation of the age specific and time-varying parameters

Equation 3.4 can be used to estimate the $\alpha_x$ parameter:

$$\hat{\alpha}_x = \frac{1}{T} \sum_{t=1}^{T} \ln(m_{x,t}). \tag{3.5}$$

The estimation of $\beta_x$ and $\kappa_t$ can be obtained by finding the least squares solution to

$$\sum_{x,t} (Z_{x,t} - \beta_x \kappa_t)^2, \tag{3.6}$$

where $Z_{x,t} = \ln(m_{x,t}) - \hat{\alpha}_x$ (Koissi and Shapiro, 2008). This problem was solved by Lee and Carter through Singular Value Decomposition (SVD). According to SVD, matrix $Z_{x,t}$ can be decomposed into a series of singular column-vectors $U_{x,i}$ multiplied by singular row-vectors $V^T_{t,i}$, multiplied by factors $\sqrt{\eta_i}$, where $i = 1, \ldots, r$ and $r = \mathrm{rank}[Z_{x,t}]$:

$$Z_{x,t} = \sum_{i=1}^{r} \sqrt{\eta_i}\, U_{x,i} V^T_{t,i}. \tag{3.7}$$

$Z_{x,t}$ can then be approximated using just the first component, which corresponds to the highest eigenvalue $\eta_1$:

$$Z_{x,t} \approx \sqrt{\eta_1}\, U_{x,1} V^T_{t,1}. \tag{3.8}$$

The estimation of $\beta_x$ and $\kappa_t$ hence follows from the minimisation of equation 3.6 and from equation 3.8:

$$\hat{\beta}_x = \frac{U_{x,1}}{\sum_{x'} U_{x',1}}; \tag{3.9}$$

$$\hat{\kappa}_t = \sqrt{\eta_1}\, V^T_{t,1} \sum_{x'} U_{x',1}. \tag{3.10}$$

Note that in expressing $\hat{\beta}_x$ the first constraint imposed by Lee and Carter, given by equation 3.2, has been taken into account, and hence it is normalised: the sum in equations 3.9 and 3.10 runs over all ages, so that the $\hat{\beta}_x$ add up to 1.

The mortality time trend in this model is the same for all ages. This implies that the forecasting of future central mortality rates for all age groups simplifies to forecasting future values of $\kappa_t$. The modelling and forecasting of the mortality index $\kappa_t$ is discussed from section 3.3 onwards.
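The estimation steps of this section can be sketched with NumPy. This is an illustrative sketch under stated assumptions rather than the thesis code (which is in R): the function name `fit_lee_carter` and the synthetic input are hypothetical, and the sketch assumes the first left singular vector does not sum to zero.

```python
import numpy as np


def fit_lee_carter(log_m):
    """Estimate alpha_x, beta_x and kappa_t from a matrix of log-mortality.

    log_m has shape (number of ages, number of periods) and holds ln(m_{x,t}).
    """
    log_m = np.asarray(log_m, dtype=float)
    alpha = log_m.mean(axis=1)            # time-average per age (equation 3.5)
    Z = log_m - alpha[:, None]            # centred log-mortality
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    u1, v1 = U[:, 0], Vt[0, :]            # first singular vectors
    scale = u1.sum()                      # normalisation so that sum(beta) = 1
    beta = u1 / scale
    kappa = s[0] * v1 * scale             # s[0] plays the role of sqrt(eta_1)
    return alpha, beta, kappa


# Synthetic, noise-free example (illustrative values only)
alpha_true = np.array([-6.0, -5.0, -4.0])
beta_true = np.array([0.2, 0.3, 0.5])          # sums to 1
kappa_true = np.array([0.3, 0.1, -0.1, -0.3])  # sums to 0
log_m = alpha_true[:, None] + np.outer(beta_true, kappa_true)

alpha, beta, kappa = fit_lee_carter(log_m)
print(beta.sum(), kappa.sum())
```

Dividing by the sum of the first singular vector also removes the sign ambiguity of the SVD, since a sign flip in both singular vectors cancels out in the estimates.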

3.2 Application of the Lee-Carter Model to weekly non-age specific mortality

The non-age specific version of the Lee-Carter model can be seen as a special case of the model described in the previous section. In this case, the log-mortality is given by

$$\ln(m_t) = \alpha + \kappa_t + \sigma \varepsilon_t, \tag{3.11}$$

where the $\alpha$ parameter is equal to the average log-mortality over time and the $\beta$ parameter is equal to 1 because this model does not include the age sensitivity of the mortality time trend $\kappa_t$. Both $\alpha$ and $\beta$ can be expressed as one single term instead of as vectors. The error term $\varepsilon_t$ is again assumed to be independent and identically distributed with mean 0 and variance 1. This error term represents the variability that is not explained by the model, and which this thesis aims to decrease by looking into seasonal patterns. Lee and Carter used yearly measurements of the central mortality rate in order to determine $\kappa_t$ but in order to model seasonality weekly measurements are used. Therefore, $t$ denotes the week number belonging to a particular year instead of a year.

Note that the restrictions given in equations 3.2 and 3.3 still apply to this model. These constraints lead to

$$\alpha = \frac{1}{T} \sum_{t=1}^{T} \ln(m_t). \tag{3.12}$$

3.2.1 Estimation of the parameters

For the estimation of $\alpha$ the result obtained in equation 3.12 can be used. It is equal to the average log-mortality over time:

$$\hat{\alpha} = \frac{1}{T} \sum_{t=1}^{T} \ln(m_t). \tag{3.13}$$

The $\beta$ parameter is equal to 1 in the one dimensional case (no age subgroups) and therefore no estimation is needed.

$\kappa_t$ can simply be estimated using equation 3.11:

$$\hat{\kappa}_t = \ln(m_t) - \hat{\alpha}. \tag{3.14}$$

Note that the one dimensional case has been formulated using the same notation as in section 3.1.1 in order to be consistent with the general specification.
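In the one dimensional case the estimation reduces to a few lines. The sketch below follows equations 3.13 and 3.14; the function name and the input rates are hypothetical.

```python
import math


def mortality_index(m):
    """Return (alpha, kappa) for a sequence of weekly central mortality rates.

    alpha is the average log-mortality (equation 3.13) and
    kappa[t] = ln(m[t]) - alpha (equation 3.14).
    """
    log_m = [math.log(value) for value in m]
    alpha = sum(log_m) / len(log_m)
    kappa = [value - alpha for value in log_m]
    return alpha, kappa


# Hypothetical weekly central mortality rates (illustrative only)
rates = [2.0e-4, 1.8e-4, 1.6e-4, 1.9e-4]
alpha, kappa = mortality_index(rates)
print(alpha, kappa)
```

By construction the estimated $\kappa_t$ values sum to zero, so constraint 3.3 is satisfied automatically.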

3.3 Non-seasonal ARIMA models

When $\kappa_t$ is calculated using the central mortality rate in The Netherlands per week a strong seasonal variation is found. A time series that is measured at this frequency can be decomposed in a trend component $T_t$, a seasonal component $S_t$ and an error component $R_t$. An illustration of the seasonal behaviour of $\kappa_t$ is shown in figure 3.1. The figure shows values for all weeks of the year from 1995 up to and including 2014. The trend component, which was obtained using a moving average (method: symmetric window of length 52 with equal weights) is shown in red.

Figure 3.1: Plot of $\kappa_t$ per week obtained from Dutch mortality rates from 1995 to 2015. The red line denotes the trend $T_t$, obtained using a moving average. (Horizontal axis: year; vertical axis: $\kappa_t$.)

In order to model the stochastic process $\kappa_t$ various types of ARIMA(p,d,q) models can be used. The non-seasonal type is described in this section and more complex models that include seasonality are described from section 3.4 onwards.


3.3.1 Model specification

ARIMA models are named this way because they may include autoregression (AR), differencing and moving averages (MA). In an ARIMA(p,d,q) time series, $p$ represents the order of the autoregressive part, $d$ is the order of integration and $q$ represents the order of the moving average part (Heij et al., 2004). The general formulation of this type of time series is

$$\phi(L)(1 - L)^d y_t = \phi_0 + \theta(L)\varepsilon_t, \tag{3.15}$$

where

  • $L$ is the lag operator which shifts the data back by one or more periods. For example: $L y_t = y_{t-1}$ and $L^d y_t = L^{d-1} y_{t-1} = y_{t-d}$;

  • $(1 - L)^d$ is an expression for the differencing process, which takes place in order to remove trends or seasonality and make the time series stationary. A differenced time series is said to have integration order $d$;

  • $\phi(L)$, which can also be written as AR(p), is the autoregressive equation with parameters $\phi_i$ where $i = 1, \ldots, p$:

$$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \ldots - \phi_{p-1} L^{p-1} - \phi_p L^p; \tag{3.16}$$

  • $\theta(L)$, which can also be written as MA(q), is the moving average equation with parameters $\theta_j$ where $j = 1, \ldots, q$:

$$\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \ldots + \theta_{q-1} L^{q-1} + \theta_q L^q; \tag{3.17}$$

  • $\phi_0$ is a constant that is equal to the drift of the time series when $d > 0$;

  • $\varepsilon_t$ is a normally distributed, independent and identically distributed error term with mean 0 and variance $\sigma^2_\varepsilon$, also known as a white noise process.
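The differencing operator $(1 - L)^d$ in the specification above can be illustrated with a short sketch. The function name and the example series are hypothetical; the point is only that applying first differences $d$ times removes a polynomial trend of degree $d$.

```python
def difference(y, d=1):
    """Apply (1 - L)^d to a sequence by taking first differences d times."""
    out = list(y)
    for _ in range(d):
        out = [current - previous for previous, current in zip(out, out[1:])]
    return out


# A quadratic series: one difference leaves a linear trend,
# a second difference leaves a constant.
y = [1, 4, 9, 16, 25]
print(difference(y, 1))
print(difference(y, 2))
```

Each round of differencing shortens the series by one observation, which is why a differenced model of order $d$ needs $d$ initial values.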

3.3.2 Random Walk with Drift model

Lee and Carter (1992) used a random walk with drift model (RWD), which is anARIMA(0,1,0) with constant, in order to fit and then forecast the mortality index κt.The process can be written as

κt = φ0 + κt−1 + εt, (3.18)

where φ0 is the drift of the time trend and εt the error term with mean 0 and varianceσ2ε (Vellekoop, 2016). Note that the above equation can also be written as

κt = κ0 + tφ0 +

t∑s=1

εs. (3.19)

By rearranging the terms the drift can be estimated:

$$\phi_0 = \frac{\kappa_t - \kappa_0}{t}, \quad \text{for } t > 0. \tag{3.20}$$

Here the error term vanishes since it has zero mean. Furthermore, the residual variance can be estimated with the following equation:

$$\sigma_\varepsilon^2 = \frac{1}{t-1} \sum_{j=1}^{t} (\kappa_j - \kappa_{j-1} - \phi_0)^2, \quad \text{for } t > 1. \tag{3.21}$$

This means that the drift and residual variance parameters can be directly obtained from the data.
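As a minimal sketch of how equations 3.20 and 3.21 translate into a calculation, the Python function below takes a series whose first element plays the role of κ0. The function name and the toy input are illustrative assumptions, not part of the thesis.

```python
def rwd_drift_and_variance(kappa):
    """Estimate drift (eq. 3.20) and residual variance (eq. 3.21).

    kappa[0] plays the role of kappa_0 and kappa[-1] of kappa_t,
    so the series contains t = len(kappa) - 1 increments.
    """
    t = len(kappa) - 1
    if t < 2:
        raise ValueError("at least three observations are needed")
    drift = (kappa[-1] - kappa[0]) / t
    squared_residuals = [
        (kappa[j] - kappa[j - 1] - drift) ** 2 for j in range(1, t + 1)
    ]
    variance = sum(squared_residuals) / (t - 1)
    return drift, variance


# Toy series with increments 1, 2 and 0 (illustrative values only).
drift, variance = rwd_drift_and_variance([0.0, 1.0, 3.0, 3.0])
```

Only the first and last values enter the drift estimate, which is why this estimator is sensitive to where the series starts and ends.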

For the yearly data used by Lee and Carter (1992), this model provided a fairly good fit and gave reasonable forecasts. However, when using weekly data the drift should not be determined with equation 3.20 because the first and last values of κt can vary significantly depending on which year (and on which week of the year) the time series ends and begins. In this case, these values represent the mortality index in week 1 (of year 1995) and week 52 (of year 2014) but the mortality index in any specific week of the year can vary considerably from year to year. The difference between the first and last values of the time series can be equal to any in a wide range of values, both positive and negative, and the resulting drift would therefore, very probably, not represent the actual, observed drift of the time series. Another method for determining the drift of the weekly κt time series should therefore be used.

The drift can alternatively be determined by fitting a regression line to the time trend Tt of κt, which is shown in red in figure 3.1. The regression line of Tt is illustrated in figure 3.2, and through this method the drift is found to be $\phi_0 = -1.00 \times 10^{-4}$. This is the value that will be used for the drift in all models discussed in this thesis for fitting the mortality index.

Figure 3.2: Trend Tt of the mortality index per week (red line) and linear regression (black line) for determining the drift of the time series.
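The difference between the two drift estimates can be illustrated on synthetic data. The sketch below is an assumption-laden toy example in Python: the series is a made-up linear trend plus a cosine season (not the thesis data), it is fitted directly rather than through a separately extracted trend Tt, and the function names are hypothetical.

```python
import math


def ols_slope(values):
    """Least-squares slope of values regressed on the time index 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


def endpoint_drift(values):
    """Drift implied by the first and last observations only (as in eq. 3.20)."""
    return (values[-1] - values[0]) / (len(values) - 1)


# Synthetic weekly series: drift of -1.0e-4 per week plus a seasonal cosine
# with a 52-week period, ending half a season after it started.
true_drift = -1.0e-4
series = [true_drift * t + 0.1 * math.cos(2 * math.pi * t / 52) for t in range(1066)]

slope = ols_slope(series)       # close to the true drift
naive = endpoint_drift(series)  # distorted by the seasonal phase at the ends
```

In this toy setting the regression slope stays near the true drift, while the endpoint estimate is pulled far away from it because the series starts at a seasonal peak and ends at a seasonal trough.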

3.3.3 Mortality forecasting with RWD

In order to forecast mortality using the Lee-Carter model it suffices to forecast future values of κt. When κt follows a simple RWD model and all the values of κt are known up to and including time T, the forecast $\kappa_{T+f}$ can be written as

$$\kappa_{T+f} = \kappa_{T-1+f} + \phi_0 + \varepsilon_{T+f}. \tag{3.22}$$

This is equivalent to

$$\kappa_{T+f} = \kappa_T + f\phi_0 + \sum_{h=1}^{f} \varepsilon_{T+h}. \tag{3.23}$$

Since the error terms have zero means, the expected value of the forecast for $\kappa_{T+f}$ will be simply equal to $\kappa_T + f\phi_0$. Furthermore, $\kappa_{T+f}$ is normally distributed and has variance $\sigma_\varepsilon^2 f$. Hence the 95% confidence interval will be equal to the expected value ±1.96 standard deviations, or $\pm 1.96\,\sigma_\varepsilon\sqrt{f}$.

This result can then be used to find the expected value of the forecasted central mortality rate:

$$E[m_{T+f}] = \exp\{\alpha + \kappa_T + f\phi_0\}. \tag{3.24}$$
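The forecast mean, the 95% interval and equation 3.24 can be put together in a short Python sketch. The values of `kappa_T` and `alpha` below are hypothetical placeholders; the drift and residual variance are the ones quoted in this thesis for the RWD model, and the function names are illustrative assumptions.

```python
import math


def rwd_forecast(kappa_T, drift, sigma, horizon, z=1.96):
    """Mean and confidence bounds of kappa_{T+f} under a random walk with drift."""
    mean = kappa_T + horizon * drift
    half_width = z * sigma * math.sqrt(horizon)
    return mean, mean - half_width, mean + half_width


def expected_central_rate(alpha, kappa_T, drift, horizon):
    """Expected central mortality rate as written in eq. 3.24."""
    return math.exp(alpha + kappa_T + horizon * drift)


kappa_T = 0.05              # hypothetical last observed mortality index
alpha = -8.7                # hypothetical Lee-Carter level parameter
drift = -1.0e-4             # drift obtained from the regression line
sigma = math.sqrt(1.97e-3)  # square root of the RWD residual variance

mean, lower, upper = rwd_forecast(kappa_T, drift, sigma, horizon=52)
rate = expected_central_rate(alpha, kappa_T, drift, horizon=52)
```

Because the interval width grows with the square root of the horizon, a one-year-ahead weekly forecast (f = 52) already has an interval roughly seven times wider than a one-week-ahead forecast.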

3.4 Seasonal ARIMA structures

One way to take seasonality into account in ARIMA models is to apply seasonal differencing. A seasonal ARIMA model can be written as SARIMA(p,d,q)(P,D,Q)m, where m is the number of periods in a year, and can include both a non-seasonal part, denoted by (p,d,q), and a seasonal part, denoted by (P,D,Q)m. The general equation of a SARIMA with constant is of the following form:

$$\phi(L)\Phi(L)(1 - L)^d (1 - L^m)^D y_t = \phi_0 + \theta(L)\Theta(L)\varepsilon_t, \tag{3.25}$$

where, in addition to the non-seasonal terms described in section 3.3.1:

• $(1 - L^m)^D$ is an expression for the seasonal differencing process of order D;

• Φ(L) is the seasonal autoregressive equation AR(P) with parameters Φi, where i = 1, ..., P:

$$\Phi(L) = 1 - \Phi_1 L^m - \Phi_2 L^{2m} - \dots - \Phi_{P-1} L^{(P-1)m} - \Phi_P L^{Pm}; \tag{3.26}$$

• Θ(L) is the seasonal moving average equation MA(Q) with parameters Θj, where j = 1, ..., Q:

$$\Theta(L) = 1 + \Theta_1 L^m + \Theta_2 L^{2m} + \dots + \Theta_{Q-1} L^{(Q-1)m} + \Theta_Q L^{Qm}. \tag{3.27}$$

Note that in the seasonal terms all lags are raised to the power m, so that only lags that are multiples of m appear. For a more in-depth description of this type of seasonal ARIMA structure the reader may refer to Enders (2010, p. 97-103).


3.4.1 SARIMA(0,1,0)(0,1,0)52 with drift model

As equation 3.25 shows, SARIMA specifications can become complex. To illustrate how κt can be fitted using a SARIMA model, a relatively simple model will be taken as an example: the SARIMA(0,1,0)(0,1,0)52 with drift. This model takes into account a general yearly trend in the non-seasonal part and applies differencing in the seasonal part (with m = 52 for weekly data) in order to model the seasonal trend as well.

Equation 3.25 reduces to:

$$(1 - L)(1 - L^{52})\kappa_t = \phi_0 + \varepsilon_t, \tag{3.28}$$

which is equivalent to

$$\kappa_t = \kappa_{t-1} + \kappa_{t-52} - \kappa_{t-53} + \phi_0 + \varepsilon_t. \tag{3.29}$$

The forecasted values of κt in this model are obtained with:

$$\kappa_{T+f} = \kappa_{T-1+f} + \kappa_{T-52+f} - \kappa_{T-53+f} + \phi_0 + \varepsilon_{T+f}. \tag{3.30}$$

Another way of expressing this is:

$$\kappa_{T+f} - \kappa_{T-52+f} = \kappa_T - \kappa_{T-52} + f\phi_0 + \sum_{h=1}^{f} \varepsilon_{T+h}. \tag{3.31}$$

Therefore, the expected values of the forecasts can be found through:

$$E[\kappa_{T+f}] = \kappa_{T-52+f} + \kappa_T - \kappa_{T-52} + f\phi_0. \tag{3.32}$$

Note that when forecasting $\kappa_{T+f}$ values where f > 52, the term $\kappa_{T-52+f}$ is itself a forecast, so the previous forecasts (where f < 53) need to be computed first.
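The recursion behind equations 3.30 and 3.32 can be sketched in Python as follows. The function names, the toy history and the short season length m = 4 used in the example are illustrative assumptions; with weekly data m would be 52.

```python
def seasonal_rwd_forecast(history, drift, horizon, m=52):
    """Expected forecasts for a SARIMA(0,1,0)(0,1,0)_m with drift.

    Each step applies eq. 3.30 with the error term set to its mean of zero.
    Forecasts are appended to the path, so horizons beyond one season
    automatically reuse earlier forecasts.
    """
    if len(history) < m + 1:
        raise ValueError("history must contain at least m + 1 observations")
    path = list(history)
    for _ in range(horizon):
        path.append(path[-1] + path[-m] - path[-m - 1] + drift)
    return path[len(history):]


def closed_form_forecast(history, drift, f, m=52):
    """Eq. 3.32, valid for 1 <= f <= m when only observed values are needed."""
    T = len(history) - 1
    return history[T - m + f] + history[T] - history[T - m] + f * drift


# Toy example with a season of length 4 (illustrative values only).
history = [0.0, 1.0, 0.5, 0.2, 0.3, 1.4, 0.8, 0.6]
forecasts = seasonal_rwd_forecast(history, drift=0.1, horizon=6, m=4)
```

For the first m steps the recursion and the closed form agree; from step m + 1 onwards the recursion draws on forecasts made in earlier steps, as the note above describes.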

3.5 ARIMA model with weekly dummy variables

Another way to model the seasonal behaviour shown in figure 3.1 is to assume that every single week in a year follows its own specific time trend. In this case an ARIMA time series can be expressed in the most general form as

$$\phi(L)(1 - L)^d y_t = \beta_t + \theta(L)\varepsilon_t, \tag{3.33}$$

where the drift βt takes on 52 different values, depending on which week of the year the tth observation is made. Therefore, β1 will be equal to β53, β2 to β54 and so on.


This model can also be described using so-called dummy variables. For a detailed explanation on the use of dummy variables for modelling parameter variation the reader may refer to Heij et al. (2004, p. 303-319). Dummy variables $D_h$ (where h = 1, 2, ..., 52) are defined as follows: $D_{ht}$ is equal to 1 when the tth observation falls in week h and is equal to zero otherwise. Equation 3.33 can subsequently be rewritten as

$$\phi(L)(1 - L)^d y_t = \beta_1 D_{1t} + \beta_2 D_{2t} + \dots + \beta_{52} D_{52t} + \theta(L)\varepsilon_t. \tag{3.34}$$

However, since it is preferable to have a constant term in the model (denoting the drift), the first dummy variable D1 is eliminated and the first term on the right hand side of equation 3.34 is replaced by φ0:

$$\phi(L)(1 - L)^d y_t = \phi_0 + \sum_{i=2}^{52} \beta_i D_{it} + \theta(L)\varepsilon_t, \tag{3.35}$$

where φ0 describes the drift that the time trend would have if only the first week of every year were taken into account. Then β2 is equal to the amount one would need to add to or subtract from φ0 if only week 2 were taken into account every year instead of week 1. The same holds for β3 up to β52.

To fit the mortality index κt a simple ARIMA(0,1,0) is used in combination with dummy variables and equation 3.35 reduces to

$$\kappa_t = \kappa_{t-1} + \phi_0 + \sum_{i=2}^{52} \beta_i D_{it} + \varepsilon_t. \tag{3.36}$$

This is an interesting model to investigate because it models each week of the year separately. One drawback of this model is that there are a lot of parameters to estimate. A second drawback is that the mortality rate can vary significantly in the same week from one year to the next and therefore the estimated parameters can be inaccurate.

The forecasts of κt are given by:

$$\kappa_{T+f} = \kappa_{T+f-1} + \phi_0 + \sum_{i=2}^{52} \beta_i D_{i(T+f)} + \varepsilon_{T+f}, \tag{3.37}$$

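which, applied repeatedly for each of the f forecast weeks, simplifies to

$$\kappa_{T+f} = \kappa_T + f\phi_0 + \sum_{h=1}^{f} \sum_{i=2}^{52} \beta_i D_{i(T+h)} + \sum_{h=1}^{f} \varepsilon_{T+h}, \tag{3.38}$$

so that the expected values of the forecasts can be found through

$$E[\kappa_{T+f}] = \kappa_T + f\phi_0 + \sum_{h=1}^{f} \sum_{i=2}^{52} \beta_i D_{i(T+h)}. \tag{3.39}$$

Each forecast week T + h thus contributes the dummy coefficient of the week of the year in which it falls.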
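Iterating the one-step recursion in equation 3.37 with the error terms set to their mean of zero gives the expected forecast path directly. The Python sketch below does exactly that; the function name, the dictionary representation of the β coefficients and the three-week "year" in the example are assumptions made purely for illustration.

```python
def dummy_rwd_forecast(kappa_T, week_T, drift, beta, horizon, weeks_per_year=52):
    """Expected forecast path for a random walk with week-specific drifts.

    beta maps a week number (2..weeks_per_year) to its dummy coefficient;
    week 1 is the reference week and contributes only the constant drift.
    week_T is the week of the year of the last observation.
    """
    value = kappa_T
    week = week_T
    path = []
    for _ in range(horizon):
        week = week % weeks_per_year + 1        # move to the next week of the year
        value += drift + beta.get(week, 0.0)    # one step of eq. 3.37 without noise
        path.append(value)
    return path


# Toy example with a three-week "year" (illustrative values only).
beta = {2: 0.5, 3: -0.2}
path = dummy_rwd_forecast(kappa_T=1.0, week_T=3, drift=0.1, beta=beta,
                          horizon=4, weeks_per_year=3)
```

Since the week-specific increments repeat every year, the expected path consists of identical seasonal cycles shifted by the accumulated yearly drift.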


3.6 SARIMA with flu indicator as exogenous regressor

This section describes how to use an indicator representing the intensity of an influenza epidemic to act as an exogenous regressor for the mortality index κt. The purpose of the indicator is to take into account the excess mortality in The Netherlands during the winter season and hence improve the quality of the fit and of the mortality forecast. The main advantage of using a predictor in the model such as a flu indicator is that, when information on imminent influenza epidemics is available, the mortality forecasts for the immediate future can be improved.

In general the model can be expressed as

$$\phi(L)\Phi(L)(1 - L)^d (1 - L^m)^D y_t = cI_t + \phi_0 + \theta(L)\Theta(L)\varepsilon_t, \tag{3.40}$$

where $I_t$ is the influenza indicator and c is a parameter to be estimated.

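For a SARIMA(0,1,0)(0,1,0)52 with drift, with flu indicator, for example, the model is given by

$$\kappa_t = \kappa_{t-1} + \kappa_{t-52} - \kappa_{t-53} + cI_t + \phi_0 + \varepsilon_t. \tag{3.41}$$

The forecasts for this model are

$$\kappa_{T+f} = \kappa_{T-1+f} + \kappa_{T-52+f} - \kappa_{T-53+f} + cI_{T+f} + \phi_0 + \varepsilon_{T+f}. \tag{3.42}$$

This simplifies to

$$\kappa_{T+f} - \kappa_{T-52+f} = \kappa_T - \kappa_{T-52} + c\sum_{h=1}^{f} I_{T+h} + f\phi_0 + \sum_{h=1}^{f} \varepsilon_{T+h}, \tag{3.43}$$

and the expected values of the forecasts are given by:

$$E[\kappa_{T+f}] = \kappa_{T-52+f} + \kappa_T - \kappa_{T-52} + c\sum_{h=1}^{f} I_{T+h} + f\phi_0. \tag{3.44}$$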
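Equation 3.44 lends itself to comparing flu scenarios, since the indicator enters the forecast through its cumulative sum. The Python sketch below evaluates the closed form for horizons up to one season; the function name, the value of c, the history and both scenarios are hypothetical and chosen only to keep the example small (m = 4 instead of 52).

```python
def flu_forecast(history, drift, c, flu_future, m=52):
    """Expected forecasts from eq. 3.44 for horizons 1..len(flu_future).

    flu_future holds the assumed indicator values I_{T+1}, I_{T+2}, ...
    The closed form only needs observed values when the horizon is at
    most one season, so longer scenarios are rejected.
    """
    if len(history) < m + 1:
        raise ValueError("history must contain at least m + 1 observations")
    if len(flu_future) > m:
        raise ValueError("closed form is only used here for horizons up to m")
    T = len(history) - 1
    base = history[T] - history[T - m]
    cumulative_flu = 0.0
    forecasts = []
    for f, indicator in enumerate(flu_future, start=1):
        cumulative_flu += indicator
        forecasts.append(history[T - m + f] + base + c * cumulative_flu + f * drift)
    return forecasts


# Toy comparison of a mild and a severe scenario (all values hypothetical).
history = [0.0, 1.0, 0.5, 0.2, 0.3, 1.4, 0.8, 0.6]
mild = flu_forecast(history, drift=0.1, c=0.01, flu_future=[0.0, 0.0, 0.0], m=4)
severe = flu_forecast(history, drift=0.1, c=0.01, flu_future=[0.0, 10.0, 20.0], m=4)
```

The gap between the two scenarios at horizon f equals c times the cumulative difference in the indicator up to that week, which is one way to read the role of the flu term in this specification.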

3.6.1 Constructing a flu indicator

One way to construct the flu indicator $I_t$ in the model is simply to set it equal to the number of flu cases registered by the NIVEL. In figures 3.3 and 3.4 the flu indicator is plotted together with a multiple of κt for comparison. Figure 3.3 compares these two variables from 1995 to 2005 and figure 3.4 shows how they compare from 2005 to 2015. The figures show that during the winter season the two are very highly correlated.

The peaks of the mortality index κt coincide quite well with the peaks of the number of ILI cases registered by the NIVEL, with the exception of the 2009/2010 winter season, when a different kind of influenza epidemic (the A(H1N1), also known as “swine flu”) broke out. According to the NIVEL, the A(H1N1) influenza affects mostly the younger part of the population, causing fewer deaths than the A(H2N2) and A(H3N2) viruses, which affect mostly the elderly. In order to improve the quality of the indicator one could choose to exclude registered influenza cases caused by the A(H1N1) virus. See also the paper by Reichert et al. (2004), whose study finds that excess mortality in winter had a far higher correlation with the occurrence of A(H2N2) and A(H3N2) virus epidemics than with A(H1N1) and B virus epidemics.

Occasionally, the mortality index peaks slightly precede the peaks of the number of ILI cases (see, for example, the winters of 1997, 1999 and 2009). This seems illogical but could perhaps be caused by delays in registration of the ILI cases.

Figure 3.3: Number of flu cases per week registered by NIVEL and κt × 1000 from 1995 to 2005.

Figure 3.4: Number of flu cases per week registered by NIVEL and κt × 1000 from 2005 to 2015.

The graphs also show that the summer seasons during the period 2005 to 2015 are not well described by the indicator. The troughs of κt, unlike the peaks, differ considerably from the “troughs” of the flu indicator, which are simply equal to zero. Sharp peaks occurring in summer can also be observed (albeit smaller than the winter peaks). These could possibly be caused by heat waves but a correlation study between temperature and mortality in the summer season would have to be conducted in order to answer this question with more certainty. The different summer behaviour will decrease the quality of the fit and one could consider modifying the indicator such that negative values are inserted to indicate the summer seasons, or even include an extra indicator for the summer. However, modelling summer troughs goes beyond the scope of this thesis, which is therefore limited to forecasting winter peaks.


Chapter 4

Results

This chapter presents the results obtained from forecasting mortality using the four ARIMA models described in chapter 3. Section 4.1 shows three-year forecasts of the mortality index κt per week and compares the relative performance of the models. Section 4.2 compares how well the models perform in out-of-sample forecasts, by measuring the difference between forecasted and observed values of κt and mt. The section concludes with a comparison between the forecasting accuracies of the seasonal models and that of a yearly RWD model. Finally, section 4.3 shows the results obtained for long-term forecasts (twenty-year projections). The parameters of all the ARIMA models mentioned in this chapter can be found in Appendix B and the R code used for the calculations can be found in Appendix C.

4.1 Full sample forecasts

Three-year forecasts of κt have been made with the following models: Random Walk with Drift, SARIMA(p,d,q)(P,D,Q)52 with drift, ARIMA(0,1,0) with weekly dummy variables and SARIMA(p,d,q)(P,D,Q)52 with drift, with flu indicator as exogenous regressor. The models were estimated with observations from 1995 to 2015. At the end of the section a table summarises the main properties of the forecasts.

In the forecast graphs presented in this chapter, the black solid line represents observed values of κt whereas the solid blue line represents forecasted values. Two confidence intervals are shown in the graphs of the first section: the smaller one with darker shade shows the 80% confidence interval and the larger one, shaded light grey, shows the 95% confidence interval.

4.1.1 Random Walk with Drift

When the model used for fitting κt is an ARIMA(0,1,0) with constant, only the trend can be forecasted since the seasonal component is not modelled. Figure 4.1 shows the three-year forecast of the trend of κt. The residual variance $\sigma_\varepsilon^2$ is $1.97 \times 10^{-3}$ and the confidence intervals are quite large, indicating the imprecision of this model.

The drift φ0 was calculated in chapter 3 as $-1.00 \times 10^{-4}$. The calculation was done using the trend line in figure 3.2 rather than using the Arima function from the forecast package (Hyndman, 2015) in R, in order to avoid obtaining an incorrect drift value. The Arima function calculates the drift using equation 3.20, thereby only considering the first and last values of the time series (κ1 and κ1040). As explained in chapter 3, the difference between the first and last values of the time series can be equal to any in a wide range of values, both positive and negative, and the resulting drift would therefore be quite different from the general drift of the time series.

Figure 4.1: Three year forecast of the mortality index κt per week using the RWD model. (Plot title: "Forecasts from ARIMA(0,1,0) with drift"; x-axis: Year; y-axis: κt; legend: κt, forecast.)

In order to obtain the drift estimated in chapter 3 a correction was applied to the first and final values of the observed time series. The value κ1 was set equal to the first point of the regression line and κ1040 to the last point (times a factor to account for the different number of data points). Following this correction, when the Arima function was used to fit the time series a drift φ0 of $-1.00 \times 10^{-4}$ was obtained, with a standard error (s.e.) of $1.40 \times 10^{-3}$, implying that the drift estimate is not statistically significant.

Since the forecast provides no information whatsoever on the future seasonal behaviour of κt, the model is clearly more suitable for fitting yearly rather than weekly time series. Therefore, the rest of this chapter will focus on the other three types of ARIMA models.

4.1.2 SARIMA(p,d,q)(P,D,Q)52 with drift

If seasonal effects are taken into account by fitting κt using a SARIMA(p,d,q)(P,D,Q)52 with drift structure, as described in section 3.4, a seasonal trend appears in the forecast.

Different SARIMA(p,d,q)(P,D,Q)52 models were tested using the standard Box-Jenkins methodology for ARIMA model selection (Box et al., 2015). Amongst these, the SARIMA(2,0,1)(0,1,1)52 with drift was found to be the best fit for κt. Figure 4.2 shows the three-year forecast of κt using this model. The forecast has a residual variance $\sigma_\varepsilon^2$ of $1.68 \times 10^{-3}$, about 15% lower than in the RWD model. The confidence intervals are also much narrower than the ones shown in figure 4.1.

When the model is chosen using the Box-Jenkins methodology, a correction for the drift, like the one that was applied in the RWD model, is no longer necessary since higher orders of p, q and Q render the model more accurate. The drift estimated with the Arima function in R is equal to the drift that was estimated in chapter 3 using figure 3.2 ($\phi_0 = -1.00 \times 10^{-4}$). See Appendix B for the estimates of the remaining parameter coefficients. Apart from φ2, all parameter estimates are significant.

Figure 4.2: Three year forecast of the mortality index κt per week using the SARIMA(2,0,1)(0,1,1)52 with drift model. (Plot title: "Forecasts from SARIMA(2,0,1)(0,1,1)[52] with drift"; x-axis: Year; y-axis: κt; legend: κt, forecast.)

4.1.3 RWD with weekly dummy variables

Figure 4.3: Three year forecast of the mortality index κt per week using a RWD with seasonal dummy variables. (Plot title: "Forecasts from RWD with Dummy Variables"; x-axis: Year; y-axis: κt; legend: κt, forecast.)

Figure 4.3 shows the three-year forecast obtained by using a RWD with dummy variables for each week of the year (see section 3.5). In Appendix B the values of the estimated parameters (including 51 dummy variables) can be found.

Similarly to the forecast obtained using the SARIMA(2,0,1)(0,1,1)52 with drift model, each period (of length 1 year) looks exactly the same. The shape of a single period in the forecast is the result of fitting each of the 52 weeks in a year with separate parameters. The residual variance $\sigma_\varepsilon^2$ is lower than the one in the RWD model but higher than in the SARIMA model: $1.75 \times 10^{-3}$.

In a RWD model the drift is determined by the first and last values of the time series, so in this case a correction also needs to be made in order to ensure that the forecast uses the correct drift ($\phi_0 = -1.00 \times 10^{-4}$). The correction was made by first setting κ1 and κ1040 (the first and last values) equal to the average of all the first and last values of the years in the sample. Then it was calculated that, in order to obtain the correct drift, the difference between the last and first value needs to be -0.104. Half of this value was therefore subtracted from κ1 and half of it was added to κ1040.
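The value of -0.104 follows from the target drift and the number of weekly steps between the first and last observation, as the short Python check below illustrates. The endpoint average of 0.08 is a hypothetical placeholder and the function name is an assumption; only the drift and the series length of 1040 weeks come from the text.

```python
n_obs = 1040            # weekly observations kappa_1 ... kappa_1040
target_drift = -1.0e-4  # drift obtained from the regression line

# With n_obs observations there are n_obs - 1 weekly steps between the endpoints.
required_difference = (n_obs - 1) * target_drift


def corrected_endpoints(endpoint_average, required_difference):
    """Shift both endpoints symmetrically around their common average."""
    half = required_difference / 2
    first = endpoint_average - half   # subtract half of the (negative) difference
    last = endpoint_average + half    # add half of the (negative) difference
    return first, last


first, last = corrected_endpoints(0.08, required_difference)  # 0.08 is hypothetical
implied_drift = (last - first) / (n_obs - 1)
```

After the shift, the drift implied by the two endpoints coincides with the target drift, whatever the value of the common average.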

Similarly to the SARIMA models, the fit and drift with this type of model are also improved when higher orders of p and q are considered. Appendix B also includes the parameters estimated for an ARIMA(5,1,1) with dummy variables, which is the best model with dummy variables according to Box-Jenkins methodology (Box et al., 2015). The estimated β coefficients of the dummy variables are approximately equal to those of the ARIMA(0,1,0) with dummy variables model. The standard errors, however, are reduced considerably (up to a factor of 2). The residual variance $\sigma_\varepsilon^2$ is also considerably reduced.

What all the models so far fail to predict are exceptional spikes in the winter season, indicating higher than usual excess mortality. The peaks (and troughs) in the forecast are all very regular while the time series clearly shows that there is great variation in the height of the peaks. This aspect of the time series is considered in the model described in section 3.6, which takes into account the intensity of influenza epidemics during the winter season. The results of the forecasts obtained with the flu indicator model are shown in the following subsection.

4.1.4 SARIMA(p,d,q)(P,D,Q)52 with drift, with flu indicator

Figure 4.4: Three year forecast of the mortality index κt per week using a SARIMA(2,0,1)(0,1,1)52 with a flu indicator as exogenous regressor. (Plot title: "Forecasts from SARIMA(2,0,1)(0,1,1)[52] with drift, with Flu Indicator"; x-axis: Year; y-axis: κt; legend: κt, forecast.)

If the intensity of influenza epidemics in the coming winter season is known approximately, the prediction of excess mortality in winter could be made more accurately using the model described in section 3.6. This method is therefore mainly applicable for short-term mortality forecasting.

Figure 4.4 shows a hypothetical three-year forecast of the mortality index in winter. The flu indicator predicting influenza intensity for years 2015 to 2018 was chosen to be equal to the one used for years 1995 to 1998. From the graph it can be seen that the three predicted peaks follow the same pattern of relative heights as the peaks in years 1995 to 1998. However, they do not reach the same level of mortality because of the declining mortality rate observed in the second decade relative to the first one.

Similarly to the SARIMA(2,0,1)(0,1,1)52 model without flu indicator, there was no need to correct the drift since the model, selected through Box-Jenkins methodology, manages to capture the correct drift by including higher orders of p, q and Q. Furthermore, the residual variance $\sigma_\varepsilon^2$ is lower than in the RWD, the SARIMA without flu indicator and the RWD with weekly dummy variables models: $1.54 \times 10^{-3}$.

The c parameter in equation 3.41 was estimated as $6 \times 10^{-4}$ (with a s.e. of $1 \times 10^{-4}$) and the coefficients of the remaining parameters can be found in Appendix B. It is interesting to note that all the parameter coefficients, apart from the one belonging to Θ1 (which is already highly significant), improve their significance considerably with respect to the parameter coefficients of the SARIMA(2,0,1)(0,1,1)52 model without flu indicator.

4.1.5 Model evaluation

This subsection shows how the models discussed in this section perform in terms of the “Akaike Information Criterion” (AIC), the “Bayesian Information Criterion” (BIC) and the residual variance $\sigma_\varepsilon^2$. These criteria are useful in selecting the best ARIMA models because they all represent a measure for the variance (or information loss). Therefore, the lower their values, the better the model. However, model selection should also take into account the relative complexity of the models. When considering models that yield a similar goodness of fit, less complex models should be preferred to more complex ones. To this end, the AIC and BIC also include a penalty term which is proportional to the number of parameters fitted.

The AIC and BIC can be calculated as follows:

AIC = −2 ln(L) + 2k, (4.1)

BIC = −2 ln(L) + k ln(n), (4.2)

where

• L is the maximum value of the likelihood function for the model;

• k denotes the total number of parameters used (including the variance parameter $\sigma_\varepsilon^2$);

• n denotes the number of observations used for estimation.
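Equations 4.1 and 4.2 can be evaluated with a few lines of Python. In the sketch below the log-likelihoods, parameter counts and sample size are hypothetical numbers chosen to show how the two penalties behave; they are not the values behind the results in this chapter.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion, eq. 4.1, with log_likelihood = ln(L)."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion, eq. 4.2."""
    return -2.0 * log_likelihood + k * math.log(n)


# Hypothetical comparison: a small model and a richer model whose
# log-likelihood is 15 units higher at the cost of 51 extra parameters.
n = 1039
small = (aic(1000.0, 2), bic(1000.0, 2, n))
rich = (aic(1015.0, 53), bic(1015.0, 53, n))
```

With a sample of this size ln(n) is close to 7, so the BIC penalises each extra parameter more than three times as heavily as the AIC. This is why models with many parameters can rank well on the AIC and poorly on the BIC.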

Table 4.1 shows that the residual variance $\sigma_\varepsilon^2$ is smallest when using an ARIMA(5,1,1) model with weekly dummy variables. Despite the large number of parameters involved, this model gives a better performance than the ARIMA(0,1,0) model with weekly dummy variables also in terms of the AIC and BIC.

| Model | AIC | BIC | $\sigma_\varepsilon^2$ |
|---|---|---|---|
| RWD | -3516 | -3507 | 0.00197 |
| SARIMA(2,0,1)(0,1,1)52 with drift | -3267 | -3238 | 0.00168 |
| RWD with weekly dummy variables | -3536 | -3274 | 0.00175 |
| ARIMA(5,1,1) with weekly dummy variables, with drift | -3746 | -3454 | 0.00141 |
| SARIMA(2,0,1)(0,1,1)52 with drift, with flu indicator | -3358 | -3324 | 0.00154 |

Table 4.1: An evaluation of the models used for forecasting the mortality index.

The RWD model is the model that performs least well in terms of residual variance. As noted earlier, the model can be used for forecasting the trend of κt (although a correction for determining the right drift is needed) but gives no information on its seasonal behaviour.

The SARIMA(2,0,1)(0,1,1)52 with flu indicator is the second best model when considering the residual variance. It has the advantage that it has fewer parameters to estimate than the ARIMA(5,1,1) model with weekly dummy variables, and produces a more accurate forecast of the winter mortality when information on flu epidemics is available.

4.2 Out-of-sample forecasts of κt and mt

In this section the performances of the SARIMA(2,0,1)(0,1,1)52 with drift, RWD with weekly dummy variables and SARIMA(2,0,1)(0,1,1)52 with drift, with flu indicator models are measured through computation of out-of-sample forecasts. The out-of-sample forecasts are based on the following data: the mortality index per week from the first week of 1995 to the last week of 2011. In other words, the full sample except for the last three years is used.

The graphs below show how well the forecasts from 2012 to 2015 (represented by the blue lines) match the observed values of κt (represented by the red lines) in the same three-year period according to the various models that were tested. Confidence intervals are not included with the forecasts in order to be able to zoom in better on the graphs.

In table 4.2 the out-of-sample forecast performance of the models is compared by calculating the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE).

Further in this section, out-of-sample forecasts of mt are shown. The performance of the models is measured again using the MAE and RMSE. These values are shown in table 4.3.

At the end of the section the accuracy of the out-of-sample forecasts of the seasonal models is compared to the accuracy of out-of-sample forecasts of a yearly RWD model by aggregating the weekly forecasts into yearly ones and then calculating the MAE and RMSE (see table 4.4).

4.2.1 SARIMA(2,0,1)(0,1,1)52 with drift

Figure 4.5 shows the out-of-sample forecast of the mortality index per week using the SARIMA(2,0,1)(0,1,1)52 with drift model. The forecast is a good approximation for the actual values of κt in the summer seasons but excess mortality in winter is less well forecasted. The height of the peaks only matches in the 2013/2014 winter season. For the remaining winter seasons, the forecast of the peaks is too low.

Figure 4.5: Three year forecast of κt using the SARIMA(2,0,1)(0,1,1)52 with drift model (blue line) and actual values of κt from 1995 to 2012 (black line) and from 2012 to 2015 (red line).

4.2.2 RWD with weekly dummy variables

Figure 4.6: Three year forecast of κt using an ARIMA(0,1,0) with seasonal dummy variables (blue line) and actual values of κt from 1995 to 2012 (black line) and from 2012 to 2015 (red line).

Figure 4.6 shows the out-of-sample forecast of the mortality index per week using the ARIMA(0,1,0) with weekly dummy variables model. Years 2012 to 2015 appear to be fairly well forecasted. Similarly to the SARIMA model, the summer seasons match reasonably well. What the model fails to capture are the high peaks at the end of years 2012 and 2014, as this model will always yield equal peak heights. Peaks appear to be slightly higher than in the SARIMA model.

4.2.3 SARIMA(2,0,1)(0,1,1)52 with drift, with flu indicator

Figure 4.7: Three year forecast of κt using a SARIMA(2,0,1)(0,1,1)52 with drift, with a flu indicator as exogenous regressor (blue line) and actual values of κt from 1995 to 2012 (black line) and from 2012 to 2015 (red line).

The out-of-sample forecast of the mortality index per week using a flu indicator as exogenous regressor with a SARIMA(2,0,1)(0,1,1)52 with drift model is shown in figure 4.7. The forecast yields similar results to the SARIMA(2,0,1)(0,1,1)52 with drift model without flu indicator but the forecast of the winter peaks is improved. The improvement is largest for the highest peak, which occurs in the 2012/2013 winter season.

4.2.4 Out-of-sample forecasts of the central mortality rate

It is also interesting to look at how the out-of-sample results for κt translate to the forecasted central mortality rate. Using equation 3.11 and the obtained forecasts of κt, the forecasted central mortality rate can be estimated as follows:

$$m_{T+f} = \exp\{\alpha + \kappa_{T+f}\}. \tag{4.3}$$

The following three graphs (figures 4.8, 4.9 and 4.10) show how $m_{T+f}$ compares to the actual mt values from 2012 to 2015.

Figure 4.8: Three year forecast of mt using the SARIMA(2,0,1)(0,1,1)52 with drift model (blue line) and actual values of mt from 2012 to 2015 (black line). (Plot title: "Forecasts from SARIMA(2,0,1)(0,1,1)[52] with drift"; x-axis: Year; y-axis: central mortality rate mt; legend: mt 2012-2015, forecast (out-of-sample).)

Figure 4.9: Three year forecast of mt using an RWD with seasonal dummy variables (blue line) and actual values of mt from 2012 to 2015 (black line). (Plot title: "Forecasts from RWD with Dummy Variables"; x-axis: Year; y-axis: central mortality rate mt; legend: mt 2012-2015, forecast (out-of-sample).)

4.2.5 Model evaluation

By defining Yt as the observation of the mortality index at week t and Ft as the forecast of the mortality index at week t it is possible to compare the models based on the forecast error $e_t = Y_t - F_t$. According to Hyndman and Koehler (2006), the $\text{MAE} = \text{mean}(|e_t|)$ and the $\text{RMSE} = \sqrt{\text{mean}(e_t^2)}$ are suitable measures for comparing different methods applied to the same data set. Measures involving percentage errors (such as the Mean Absolute Percentage Error), on the other hand, should be avoided when there are values of Yt close to 0 because these will have a highly skewed distribution.
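Both error measures are straightforward to compute. The Python sketch below uses hypothetical function names and toy observations and forecasts, not the values from this thesis.

```python
import math


def mean_absolute_error(observed, forecast):
    """MAE = mean(|e_t|) with e_t = Y_t - F_t."""
    errors = [y - f for y, f in zip(observed, forecast)]
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square_error(observed, forecast):
    """RMSE = sqrt(mean(e_t^2)) with e_t = Y_t - F_t."""
    errors = [y - f for y, f in zip(observed, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Toy data (illustrative values only): the errors are 0, 1 and -2.
observed = [1.0, 2.0, 3.0]
forecast = [1.0, 1.0, 5.0]
mae = mean_absolute_error(observed, forecast)
rmse = root_mean_square_error(observed, forecast)
```

The RMSE is never smaller than the MAE and exceeds it more strongly when a few errors are much larger than the rest, so it weighs missed winter peaks more heavily.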

Figure 4.10: Three year forecast of mt using a SARIMA(2,0,1)(0,1,1)52 with drift, with a flu indicator as exogenous regressor (blue line) and actual values of mt from 2012 to 2015 (black line). (Plot title: "Forecasts from SARIMA(2,0,1)(0,1,1)[52] with drift, with Flu Indicator"; x-axis: Year; y-axis: central mortality rate mt; legend: mt 2012-2015, forecast (out-of-sample).)

Mortality index

In table 4.2 the MAE and RMSE for the three ARIMA models used in this section are shown. The calculations include all of the out-of-sample observations and forecasts.

| Model | MAE | RMSE |
|---|---|---|
| SARIMA(2,0,1)(0,1,1)52 with drift | 0.0513 | 0.0644 |
| RWD with weekly dummy variables | 0.0400 | 0.0503 |
| SARIMA(2,0,1)(0,1,1)52 with drift, with flu indicator | 0.0364 | 0.0463 |

Table 4.2: A comparison of the out-of-sample performance of the models used for forecasting the mortality index.

From the results shown in the table it can be concluded that the SARIMA(2,0,1)(0,1,1)52 with flu indicator model yields the best results as it produces the most accurate forecast of κt for the years 2012 to 2015. Compared to the SARIMA(2,0,1)(0,1,1)52 with drift it improves the MAE by 29% and the RMSE by 28%, which is a considerable improvement. The model that performs worst is the SARIMA(2,0,1)(0,1,1)52 with drift.
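The quoted percentages can be reproduced from the MAE and RMSE values in table 4.2 with a small Python calculation. The helper name `relative_improvement` is an assumption introduced for this illustration.

```python
def relative_improvement(baseline, candidate):
    """Fraction by which the candidate error is lower than the baseline error."""
    return (baseline - candidate) / baseline


# Values from table 4.2: SARIMA with drift (baseline) versus
# SARIMA with drift and flu indicator (candidate).
mae_gain = relative_improvement(0.0513, 0.0364)
rmse_gain = relative_improvement(0.0644, 0.0463)

# Same comparison for the RWD with weekly dummy variables.
dummy_mae_gain = relative_improvement(0.0513, 0.0400)
```

By the same calculation the RWD with weekly dummy variables lowers the MAE of the SARIMA with drift by roughly a fifth, which places it between the two SARIMA variants.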

Central mortality rate

Table 4.3 shows the MAE and RMSE obtained when the models are used to forecast the central mortality rate mt:

| Model | MAE | RMSE |
|---|---|---|
| SARIMA(2,0,1)(0,1,1)52 with drift | $7.9 \times 10^{-6}$ | $1.0 \times 10^{-5}$ |
| RWD with weekly dummy variables | $6.4 \times 10^{-6}$ | $8.2 \times 10^{-6}$ |
| SARIMA(2,0,1)(0,1,1)52 with drift, with flu indicator | $5.7 \times 10^{-6}$ | $7.4 \times 10^{-6}$ |

Table 4.3: A comparison of the out-of-sample performance of the models used for forecasting the central mortality rate.

Similarly to the forecasts of the mortality index κt, the model that performs best in out-of-sample forecasting is the SARIMA(2,0,1)(0,1,1)52 with drift, with flu indicator and the one that performs least well is the SARIMA(2,0,1)(0,1,1)52 with drift.

A comparison between seasonal and yearly models

Finally, a comparison of the forecasting performance in terms of MAE and RMSE was made between the seasonal models with weekly data and a RWD model with annual data. In order to compare weekly and annual models, each set of 52 weekly out-of-sample forecasts of mt from the seasonal models was aggregated to form a yearly forecast of mt. The results are shown in table 4.4.

Model                                                   Input data   Forecast               MAE          RMSE
RWD                                                     annual       annual mt              3.0 × 10^-4  3.0 × 10^-4
SARIMA(2,0,1)(0,1,1)52 with drift                       weekly       aggregated weekly mt   3.9 × 10^-4  3.9 × 10^-4
RWD with weekly dummy variables                         weekly       aggregated weekly mt   1.8 × 10^-4  1.8 × 10^-4
SARIMA(2,0,1)(0,1,1)52 with drift, with flu indicator   weekly       aggregated weekly mt   2.0 × 10^-4  2.0 × 10^-4

Table 4.4: A comparison of the out-of-sample forecasting performance of a yearly and three seasonal models used for forecasting the central mortality rate.

The results show that for two out of three models, the aggregated seasonal forecasts of mt are more accurate than forecasts from a standard RWD model that uses annual data. The only model that performs worse is the SARIMA with drift model (without flu indicator). Although the sample used for these calculations is quite small, the results suggest that some seasonal ARIMA models are more accurate than the yearly RWD model.
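The aggregation step used for this comparison can be sketched as follows. The exact aggregation rule is not spelled out in the text, so the sketch assumes that the 52 weekly rates of each year are summed; the function name and the simulated values are hypothetical.

```r
# Sum consecutive blocks of 52 weekly values into yearly values
# (assumption: yearly mt is the sum of the weekly mt of that year)
aggregate_weekly <- function(x, weeks_per_year = 52) {
  stopifnot(length(x) %% weeks_per_year == 0)
  colSums(matrix(x, nrow = weeks_per_year))   # one column per year
}

# Hypothetical weekly observations and forecasts for three years
set.seed(1)
weekly_obs  <- rep(1.6e-4, 156) + rnorm(156, sd = 5e-6)
weekly_fcst <- rep(1.6e-4, 156)

yearly_obs  <- aggregate_weekly(weekly_obs)
yearly_fcst <- aggregate_weekly(weekly_fcst)

# Accuracy of the aggregated forecasts, comparable with an annual model
mae_yearly  <- mean(abs(yearly_obs - yearly_fcst))
rmse_yearly <- sqrt(mean((yearly_obs - yearly_fcst)^2))
```

With a three year out-of-sample period this leaves only three yearly errors per model, which is why the comparison in table 4.4 should be read with caution.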

4.3 Long-term forecasts

In this section the forecasts of κt in the long term are shown. The graphs were obtained by extending the forecasts in the first section of this chapter from three to twenty years. Once again confidence intervals are left out in order to zoom in better on the graphs.

The SARIMA(2,0,1)(0,1,1)52 with drift and RWD with seasonal dummy variables models in figures 4.11 and 4.12 show very similar forecasts. One difference between the two models is that the RWD with seasonal dummy variables model predicts slightly higher peaks. Both models yield very regular forecasts, since in each one every period repeats exactly the same pattern; they therefore do not reflect the great variation in peak height shown by the observed κt values from 1995 to 2015.

The flu indicator model is suitable for testing various future influenza epidemics scenarios. As an example, figure 4.13 shows a twenty year forecast made with the flu indicator model. The flu indicator used for forecasting years 2015 to 2035 was the same as the one used for fitting the model from year 1995 to 2015. As the example of figure 4.13 shows, the winter peaks follow roughly the pattern of the peaks from 1995 to 2015.

[Figure 4.11 plot: "Forecasts from SARIMA(2,0,1)(0,1,1)[52] with drift"; x-axis: Year (2000 to 2030); y-axis: κt (−0.2 to 0.3); legend: κt, forecast.]

Figure 4.11: Twenty year forecast of the mortality index per week using the SARIMA(2,0,1)(0,1,1)52 with drift model.

[Figure 4.12 plot: "Forecasts from RWD with Dummy Variables"; x-axis: Year (2000 to 2030); y-axis: κt (−0.2 to 0.3); legend: κt, forecast.]

Figure 4.12: Twenty year forecast of the mortality index per week using a RWD model with seasonal dummy variables.

[Figure 4.13 plot: "Forecasts from SARIMA(2,0,1)(0,1,1)[52] with drift, with Flu Indicator"; x-axis: Year (2000 to 2030); y-axis: κt (−0.2 to 0.3); legend: κt, forecast.]

Figure 4.13: Twenty year forecast of the mortality index per week using a SARIMA(2,0,1)(0,1,1)52 with drift, with a flu indicator as exogenous regressor. The flu indicator used for the forecast uses the same flu values observed from 1995 to 2015.

However, it is possible to see that the accuracy of the forecast decreases with time as the winter peaks tend to become more and more regular instead of following the peak pattern of the observed time series.
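Scenario testing with an exogenous regressor can be sketched in a few lines of base R. The sketch below is not the model fitted in this thesis: it uses simulated data, a deliberately simplified seasonal specification (an AR(1) with one seasonal difference) and the arima() and predict() functions from the stats package rather than the forecast package. All variable names and both flu scenarios are hypothetical.

```r
set.seed(42)
m       <- 52                      # weeks per year
n_years <- 8
week    <- rep(1:m, n_years)

# Hypothetical flu indicator: non-zero in winter weeks, intensity varies per year
winter    <- week <= 10 | week >= 48
intensity <- rep(runif(n_years, 0.5, 2), each = m)
flu       <- ifelse(winter, intensity, 0)

# Simulated mortality index: seasonal cycle + flu effect + AR(1) noise
noise <- as.numeric(arima.sim(list(ar = 0.5), n = m * n_years, sd = 0.02))
kappa <- ts(0.1 * cos(2 * pi * week / m) + 0.05 * flu + noise, frequency = m)

# Simplified seasonal model with the flu indicator as exogenous regressor
xreg_fit <- cbind(flu = flu)
fit <- arima(kappa, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 0), period = m),
             xreg = xreg_fit)

# Two hypothetical scenarios for the coming year: a mild and a severe epidemic
winter_next <- (1:m) <= 10 | (1:m) >= 48
flu_mild    <- cbind(flu = ifelse(winter_next, 0.5, 0))
flu_severe  <- cbind(flu = ifelse(winter_next, 2.0, 0))

fc_mild   <- predict(fit, n.ahead = m, newxreg = flu_mild)$pred
fc_severe <- predict(fit, n.ahead = m, newxreg = flu_severe)$pred
```

In this type of model the two scenario forecasts differ only through the regression term, that is, by the estimated flu coefficient times the difference between the two indicator paths. This makes the flu coefficient the single lever through which a scenario acts on the forecast.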

Chapter 5

Conclusions

The first conclusion that can be drawn from the results obtained in this research is that, in order to model the seasonal component of the time-varying parameter κt, which was determined per week, a more complex type of ARIMA structure is needed than the RWD used by Lee and Carter (1992) to model yearly mortality. Therefore, to find a suitable model for forecasting seasonal mortality, three types of seasonal ARIMA models were investigated: a SARIMA(p,d,q)(P,D,Q)m, an ARIMA with m − 1 dummy variables and a SARIMA(p,d,q)(P,D,Q)m combined with an indicator based on influenza intensity.

After a careful model selection procedure following Box-Jenkins methodology, it was found that a SARIMA(2,0,1)(0,1,1)52 and an ARIMA(5,1,1) with dummy variables can indeed approximate the seasonal behaviour of the weekly mortality index (and hence of the weekly mortality rate) well. When comparing the two models, it was found that the ARIMA with dummy variables can produce acceptable results even when a simple RWD form is considered (although model selection through Box-Jenkins methodology will of course improve the goodness of fit). The SARIMA(p,d,q)(P,D,Q)m, however, needs to be more complex than a RWD SARIMA (a SARIMA(0,1,0)(0,1,0)52 with constant) in order to achieve similar results.

The out-of-sample forecasts presented in the “Results” chapter show that the models perform well in the short term since the forecasts, in general, do not differ considerably from the actual observations (except at the “winter peaks”). The models, however, produce a very regular type of forecast that assumes that the mortality index will behave exactly the same way each year, apart from a slight overall drift that tends to decrease the mortality over the years.

In order to model the great variation in excess mortality during the winter season shown by κt, a flu indicator was added to the SARIMA(2,0,1)(0,1,1)52 model. The out-of-sample forecast results show that the addition of this exogenous regressor to the model does improve forecast accuracy substantially (29% improvement of the MAE and 28% improvement of the RMSE with respect to the SARIMA without indicator). What the flu indicator does not model, however, is the mortality variation in the summer season. Also, the long term forecasts show that, given a specific flu indicator scenario, the ability to forecast excess mortality using the flu indicator decreases with time. So the further in the future the forecast is, the lower the influence of the flu indicator.

When out-of-sample forecasts of the central mortality rate obtained with a standard annual RWD model were compared with aggregated seasonal out-of-sample forecasts, two of the three seasonal models provided more accurate forecasts. These were the SARIMA with flu indicator and the RWD with dummy variables. This suggests that aggregating higher frequency forecasts could improve the accuracy of annual forecasts. However, more tests should be made (and with larger samples) in order to further investigate this claim.

Influenza was chosen as a predictor for excess mortality in winter because flu epidemics occur almost every year and studies suggest that they are very likely the single cause of winter excess mortality in economically developed countries (see Reichert et al. (2004)). Moreover, the histograms in chapter 2 show that it is a better variable to use than average temperature due to its stronger correlation with the weekly mortality rate (in the time period of 1995 to 2015). The results show that, if information on influenza epidemics is available and it is incorporated in the SARIMA model described in this thesis, seasonal mortality forecasts can become considerably more accurate than when influenza is not taken into account.

Although predicting the intensity of future flu epidemics is not yet possible, research on this topic is taking place in the medical field and such predictions may become feasible in the future. See for example Viboud et al. (2003), which describes a method that proved suitable for retrospectively predicting temporal and geographical diffusion of influenza up to 10 weeks in advance, and Shaman and Karspeck (2012), for more recent studies using web-based estimates of local influenza infection rates.

Another interesting development is the increasing use of flu vaccines in The Netherlands by elderly people, which could be one of the causes of the decreasing flu epidemic intensity observed over the last decades (Dijkstra et al., 2009). If a clear correlation between flu vaccinations and excess winter mortality exists, the flu indicator model described in this thesis could be used to study the impact of different vaccination policies on seasonal mortality.

Improvements of the forecasts could be achieved once a larger sample is available. Since the current flu indicator only models winter excess mortality, it would also be a good idea to investigate the correlation between heat waves and summer mortality (see for example Huynen et al., 2001) so that an indicator for excess summer mortality can be constructed. Moreover, to improve forecasts of excess winter mortality another indicator based on temperature could be added to the model. Furthermore, the estimation of the drift parameter φ0 could, perhaps, be improved by using yearly data instead of applying the estimation procedure described in chapter 3.

In order to continue the research on seasonal mortality forecasting, one could differentiate the data (if available) according to gender and age. Finally, it would also be interesting to model seasonal mortality using another frequency, for example, using monthly data instead of weekly.

Appendix A: Monthly to weekly adjustments of population data

Here follows a method that was used to estimate the population in The Netherlands per week, $P_{w_i}$ at the beginning of week $w_i$ (with $i = 1, 2, \ldots, 52$), from monthly population data, $P_{m_j}$ ($j = 1, 2, \ldots, 12$), and the number of deaths in The Netherlands per week, $D_{w_i}$. Note that $P_{m_j}$ denotes the total population at the beginning of month $m_j$ and $D_{w_i}$ is the number of deaths that occurred during week $w_i$.

The change in population in The Netherlands in month $m_j$, $\Delta P_{m_j} = P_{m_{j+1}} - P_{m_j}$, can be calculated by subtracting the total number of emigrations and deaths from the total number of births and immigrations that have occurred in that month. However, since only the number of deaths per month, $D_{m_j}$, is known (following aggregation of $D_{w_i}$), the total number of births and immigrations reduced by the number of emigrations is an unknown number $X_{m_j}$, and $\Delta P_{m_j}$ can be written as

$$\Delta P_{m_j} = X_{m_j} - D_{m_j}. \tag{5.1}$$

It is further assumed that the unknown changes in population are equally spread out throughout the month, so that if week $w_i$ occurs in month $m_j$ then $X_{m_j}$ is simply equal to $X_{w_i}$ times the number of weeks in month $m_j$. It follows that $X_{w_i}$ can be expressed as

$$X_{w_i} = \frac{D_{m_j} + \Delta P_{m_j}}{\text{number of weeks in month } m_j}. \tag{5.2}$$

Let $w_i$ be the first week of month $m_j$ and suppose this month consists of 4 weeks. Then $P_{w_i}$ is simply equal to $P_{m_j}$, and in order to calculate the population per week in the remaining weeks of the month one can subtract the number of deaths in the previous week from the population at the beginning of the previous week and add $X_{w_i}$:

$$P_{w_{i+1}} = P_{w_i} - D_{w_i} + X_{w_i}; \tag{5.3}$$

$$P_{w_{i+2}} = P_{w_{i+1}} - D_{w_{i+1}} + X_{w_i}; \tag{5.4}$$

$$P_{w_{i+3}} = P_{w_{i+2}} - D_{w_{i+2}} + X_{w_i}. \tag{5.5}$$
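The procedure of equations (5.1) to (5.5) can be sketched for a single month as follows. The function name, its arguments and the numbers in the example are hypothetical; the sketch only assumes, as in the text, that the month consists of a whole number of weeks.

```r
# Weekly population at the beginning of each week of one month
#   pop_start     : population at the beginning of the month
#   pop_next      : population at the beginning of the next month
#   weekly_deaths : number of deaths in each week of the month
weekly_population <- function(pop_start, pop_next, weekly_deaths) {
  n_weeks <- length(weekly_deaths)
  delta_p <- pop_next - pop_start                       # change in population over the month
  x_week  <- (sum(weekly_deaths) + delta_p) / n_weeks   # equation (5.2)

  pop    <- numeric(n_weeks)
  pop[1] <- pop_start                                   # first week starts with the monthly figure
  for (i in seq_len(n_weeks - 1)) {
    pop[i + 1] <- pop[i] - weekly_deaths[i] + x_week    # equations (5.3) to (5.5)
  }
  pop
}

# Hypothetical month of 4 weeks
deaths    <- c(2, 3, 1, 4)
pop_weeks <- weekly_population(1000, 1010, deaths)
```

A useful consistency check is that applying the recursion once more to the last week reproduces the population at the beginning of the next month.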

Appendix B: Parameters of ARIMA models

ARIMA(0,1,0) with drift

Parameters φ0 σ²ε

Coefficients -0.0001 0.00197

s.e. 0.0014 n.a.

ARIMA(0,1,0) with Seasonal Dummy Variables, with drift

Parameters φ0 σ²ε β2 β3 β4 β5 β6 β7

Coefficients -0.0001 0.00175 0.0270 0.0205 -0.0073 -0.0150 -0.0156 -0.0193

s.e. 0.0013 n.a. 0.0095 0.0131 0.0159 0.0181 0.0200 0.0217

Parameters β8 β9 β10 β11 β12 β13 β14 β15

Coefficients -0.0282 -0.0206 -0.0199 -0.0204 -0.0428 -0.0513 -0.0644 -0.0690

s.e. 0.0231 0.0244 0.0256 0.0267 0.0276 0.0285 0.0293 0.0300

Parameters β16 β17 β18 β19 β20 β21 β22 β23

Coefficients -0.0846 -0.0816 -0.0977 -0.1255 -0.1234 -0.1253 -0.1423 -0.1482

s.e. 0.0306 0.0312 0.0317 0.0322 0.0325 0.0329 0.0332 0.0334

Parameters β24 β25 β26 β27 β28 β29 β30 β31

Coefficients -0.1419 -0.1520 -0.1494 -0.1526 -0.1438 -0.1589 -0.1642 -0.1554

s.e. 0.0338 0.0338 0.0337 0.0336 0.0334 0.0331 0.0329 0.0325

Parameters β32 β33 β34 β35 β36 β37 β38 β39

Coefficients -0.1678 -0.1773 -0.1643 -0.1743 -0.1986 -0.1858 -0.1837 -0.1735

s.e. 0.0321 0.0336 0.0337 0.0338 0.0317 0.0312 0.0306 0.0300

Parameters β40 β41 β42 β43 β44 β45 β46 β47

Coefficients -0.1726 -0.1590 -0.1531 -0.1407 -0.1294 -0.1350 -0.1318 -0.1180

s.e. 0.0292 0.0285 0.0276 0.0266 0.0255 0.0244 0.0231 0.0216

Parameters β48 β49 β50 β51 β52

Coefficients -0.1038 -0.0866 -0.0789 -0.0417 -0.0117

s.e. 0.0199 0.0180 0.0157 0.0130 0.0093

ARIMA(5,1,1) with Seasonal Dummy Variables, with drift

Parameters φ0 φ1 φ2 φ3 φ4 φ5 θ1 σ²ε

Coefficients -0.0001 0.5751 0.0369 0.1042 -0.1024 0.0611 -0.9850 0.00141

s.e. 0.0001 0.0317 0.0357 0.0357 0.0357 0.0316 0.0063 n.a.

Parameters β2 β3 β4 β5 β6 β7 β8 β9

Coefficients 0.0263 0.0239 -0.0041 -0.0116 -0.0124 -0.0162 -0.0250 -0.0175

s.e. 0.0095 0.0115 0.0124 0.0135 0.0138 0.0141 0.0144 0.0146

Parameters β10 β11 β12 β13 β14 β15 β16 β17

Coefficients -0.0168 -0.0174 -0.0399 -0.0484 -0.0615 -0.0661 -0.0817 -0.0788

s.e. 0.0147 0.0149 0.0149 0.0150 0.0150 0.0151 0.0151 0.0151

Parameters β18 β19 β20 β21 β22 β23 β24 β25

Coefficients -0.0949 -0.1228 -0.1207 -0.1225 -0.1396 -0.1455 -0.1393 -0.1493

s.e. 0.0151 0.0152 0.0152 0.0152 0.0152 0.0152 0.0152 0.0152

Parameters β26 β27 β28 β29 β30 β31 β32 β33

Coefficients -0.1467 -0.1500 -0.1412 -0.1563 -0.1616 -0.1528 -0.1653 -0.1748

s.e. 0.0152 0.0152 0.0152 0.0152 0.0152 0.0152 0.0152 0.0152

Parameters β34 β35 β36 β37 β38 β39 β40 β41

Coefficients -0.1618 -0.1719 -0.1962 -0.1834 -0.1813 -0.1712 -0.1704 -0.1568

s.e. 0.0152 0.0152 0.0151 0.0151 0.0151 0.0151 0.0150 0.0150

Parameters β42 β43 β44 β45 β46 β47 β48 β49

Coefficients -0.1510 -0.1386 -0.1274 -0.1331 -0.1299 -0.1162 -0.1022 -0.0852

s.e. 0.0149 0.0148 0.0147 0.0146 0.0144 0.0141 0.0138 0.0135

Parameters β50 β51 β52

Coefficients -0.0777 -0.0411 -0.0112

s.e. 0.0123 0.0114 0.0093

SARIMA(2,0,1)(0,1,1)52 with drift

Parameters φ0 φ1 φ2 θ1 Θ1 σ²ε

Coefficients -0.0001 1.0894 -0.2251 -0.5047 -0.9278 0.00168

s.e. 0.0000 0.2282 0.1606 0.2192 0.0360 n.a.

SARIMA(2,0,1)(0,1,1)52 with Flu Indicator, with drift

Parameters φ0 φ1 φ2 θ1 Θ1 c σ²ε

Coefficients -0.0001 1.3621 -0.3804 -0.8927 -0.9069 0.0006 0.00154

s.e. 0.0000 0.0624 0.0520 0.0469 0.0304 0.0001 n.a.
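The tables above list coefficients together with their standard errors. As a rough reading aid, the ratio of the two can be compared with a standard normal distribution. The sketch below does this for the SARIMA(2,0,1)(0,1,1)52 with flu indicator; it is an approximation based on the rounded values printed above and not a test reported in this thesis, and φ0 is left out because its standard error is printed as 0.0000 after rounding.

```r
# Coefficients and standard errors as printed in the table above
est <- c(phi1 = 1.3621, phi2 = -0.3804, theta1 = -0.8927, Theta1 = -0.9069, c = 0.0006)
se  <- c(phi1 = 0.0624, phi2 = 0.0520,  theta1 = 0.0469,  Theta1 = 0.0304,  c = 0.0001)

z_ratio <- est / se                    # approximate z-statistics
p_value <- 2 * pnorm(-abs(z_ratio))    # two-sided p-values, normal approximation

round(cbind(estimate = est, se = se, z = z_ratio, p = p_value), 4)
```

Because the printed values are rounded to four decimals, the ratio for the flu coefficient c in particular is only indicative.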

Appendix C: R code

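Below we give the R code used for modelling and forecasting weekly mortality. Note that by changing the number of age groups, which is set to 1 in the definition of the matrix M and in the loop that subtracts a (lines 7 and 10 of the original script), it is possible to use this script also for calculating age-specific Lee-Carter parameters.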

library(forecast)
library(astsa)

# MODEL INPUT (LOG DEATH RATES)
nl <- read.table("weekly mortality input with indicator.txt", fileEncoding="UTF-16", header=TRUE)
rates <- nl$Total
logmu <- log(rates)
M <- matrix(logmu, 1040, 1, byrow=TRUE)  # log death rates: 1040 weeks (rows) by 1 age cohort (column)

# FITTING THE LEE-CARTER MODEL USING SVD
a <- colMeans(M)                         # mean log rate for each group (the alphas in the LC equation)
for (j in 1:1) M[,j] <- M[,j] - a[j]     # subtract a from all weeks
d <- svd(M, 1, 1)                        # compute the SVD
b <- d$v/sum(d$v)                        # normalize the b's so they add up to 1
sum(b)                                   # should equal 1
k <- d$u*sum(d$v)*d$d[1]                 # calculate the k vector
sum(k)                                   # should sum up to 0

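# CALCULATE STANDARD DEVIATION
y <- diff(as.numeric(k))                 # first differences k[t+1] - k[t] (1039 values)
drift <- mean(y)                         # drift is not defined in the original listing; assumed here to be
                                         # the mean first difference (the usual RWD drift estimate)
q <- (y - drift)^2
sigma <- sqrt(sum(q)/(length(y) - 1))    # same divisor as the original 1/1038
sigma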

# PLOT Kt AND TREND
k <- ts(as.numeric(k), frequency=52, start=c(1995,1))  # kt as a weekly time series
j <- decompose(k)                                      # j$x is the series, j$trend its trend component
png('OriginalTimeSeries.png')
plot(j$x, xlab="Year", ylab=expression(kappa[t]))
lines(j$trend, col="red")
legend(2008, 0.35, c(expression(kappa[t]),expression(T[t])), lty=c(1,1), col=c("black","red"))
dev.off()

# SEASONAL DUMMY VARIABLES
dummies <- seasonaldummy(k)              # creates 51 seasonal dummy variables

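# FLU INDICATORS (PAST AND FUTURE)
I <- nl$Indicator
tI <- as.matrix(I)                       # indicator as a one-column matrix; the original listing called
                                         # transpose(), which is not a base R function, so this
                                         # conversion is an assumption about what was intended
tIf <- head(tI, 156)                     # first 156 weeks of the indicator, reused as the future scenario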

# PLOTTING FORECASTS WITH 4 ARIMA MODELS

# RWD
fit1 <- Arima(k, order=c(0,1,0), include.drift=TRUE)
png('ForecastRWD.png')
plot(forecast(fit1, h=156), xlab="Year", ylab=expression(kappa[t]))
legend(1998, 1, c(expression(kappa[t]),"forecast"), lty=c(1,1), col=c("black","blue"))
dev.off()

# SARIMA
fit2 <- Arima(k, order=c(2,0,1), seasonal=list(order=c(0,1,1),period=52), include.drift=TRUE)
png('ForecastSARIMA.png')
plot(forecast(fit2, h=156), xlab="Year", ylab=expression(kappa[t]),
     main="Forecasts from SARIMA(2,0,1)(0,1,1)[52]")
legend(2008, 0.35, c(expression(kappa[t]),"forecast"), lty=c(1,1), col=c("black","blue"))
dev.off()

# ARIMA with dummy variables
fit3 <- Arima(k, order=c(0,1,0), xreg=dummies, include.drift=TRUE)
png('ForecastDummies.png')
# the original listing used seasonaldummyf(k, 156); newer versions of the forecast
# package provide the same future dummies through seasonaldummy(k, h)
plot(forecast(fit3, xreg=seasonaldummy(k, 156)), xlab="Year", ylab=expression(kappa[t]),
     main="Forecasts from ARIMA(0,1,0) with Dummy Variables")
legend(1998, 1, c(expression(kappa[t]),"forecast"), lty=c(1,1), col=c("black","blue"))
dev.off()

# SARIMA WITH FLU INDICATOR
fit4 <- Arima(k, order=c(2,0,1), seasonal=list(order=c(0,1,1),period=52), xreg=tI, include.drift=TRUE)
png('ForecastFluIndicator.png')
plot(forecast(fit4, h=156, xreg=tIf), xlab="Year", ylab=expression(kappa[t]),
     main="Forecasts from SARIMA(2,0,1)(0,1,1)[52] with Flu Indicator")
legend(2008, 0.35, c(expression(kappa[t]),"forecast"), lty=c(1,1), col=c("black","blue"))
dev.off()
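As a complement to the script, which treats a single age group, the same SVD steps can be sketched for several age groups. The example below uses simulated, noise-free log death rates for three hypothetical age groups, so that the recovered parameters can be compared with the values used to build the data; none of the numbers come from the data used in this thesis.

```r
# Hypothetical noise-free log death rates: 104 weeks (rows) by 3 age groups (columns)
n_weeks <- 104
a_true  <- c(-9, -7, -5)                     # age-specific average log rates
b_true  <- c(0.2, 0.3, 0.5)                  # age-specific sensitivities, summing to 1
k_true  <- cos(2 * pi * (1:n_weeks) / 52)    # seasonal mortality index
k_true  <- k_true - mean(k_true)             # centred so that it sums to 0
M <- outer(k_true, b_true) + matrix(a_true, n_weeks, 3, byrow = TRUE)

# Lee-Carter fit via SVD, with the same normalisation as in the script above
a  <- colMeans(M)                            # alphas: mean log rate per age group
Mc <- sweep(M, 2, a)                         # subtract a from every week
d  <- svd(Mc, 1, 1)                          # first left and right singular vectors
b  <- d$v[, 1] / sum(d$v[, 1])               # betas, normalised to sum to 1
k  <- d$u[, 1] * sum(d$v[, 1]) * d$d[1]      # mortality index, sums to 0

sum(b)
sum(k)
```

Dividing the right singular vector by its sum and multiplying the left singular vector by the same sum removes the sign ambiguity of the SVD, so b and k are recovered with the intended orientation.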

Bibliography

Anon. (2016), Bevolking op de eerste van de maand; geslacht vanaf 1950. CBS StatLine, URL http://statline.cbs.nl/Statweb/selection/?DM=SLNL&PA=82218NED&VW=T.

Anon. (2016), Overledenen; geslacht en leeftijd, per week. CBS StatLine, URL http://statline.cbs.nl/Statweb/selection/?VW=T&DM=SLNL&PA=70895ned&D1=0&D2=a&D3=a&D4=1161-1170&HDR=T%2cG2%2cG1&STB=G3/.

Bartelds, A.I.M. (1996), Continue Morbiditeits Registratie Peilstations Nederland 1995. NIVEL, URL http://www.nivel.nl/.

Bartelds, A.I.M. (2003), Continue Morbiditeits Registratie Peilstations Nederland 2002. NIVEL, URL http://www.nivel.nl/.

Box, George E.P., Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung (2015), Time series analysis: forecasting and control. John Wiley & Sons.

Dijkstra, F, GA Donker, B Wilbrink, AB Van Gageldonk-Lafeber, and MAB Van Der Sande (2009), “Long time trends in influenza-like illness and associated determinants in The Netherlands.” Epidemiology and infection, 137, 473–479.

Donker, G.A. (2006), Continue Morbiditeits Registratie Peilstations Nederland 2005. NIVEL, URL http://www.nivel.nl/.

Donker, G.A. (2009), Continue Morbiditeits Registratie Peilstations Nederland 2008. NIVEL, URL http://www.nivel.nl/.

Donker, G.A. (2012), Continue Morbiditeits Registratie Peilstations Nederland 2011. NIVEL, URL http://www.nivel.nl/.

Donker, G.A. (2015), NIVEL Zorgregistraties eerste lijn - Peilstations 2014. NIVEL, URL http://www.nivel.nl/.

Enders, W. (2010), Applied Econometric Time Series, 3rd edition. John Wiley & Sons.

Heij, Christiaan, Paul De Boer, Philip Hans Franses, Teun Kloek, Herman K Van Dijk, et al. (2004), Econometric methods with applications in business and economics. OUP Oxford.

Hooiveld, M., G.A. Donker, and F.G. Schellevis (2016), Wekelijkse surveillance cijfers. Uit: NIVEL Zorgregistraties eerste lijn. Utrecht, NIVEL, URL http://www.nivel.nl/surveillance.

Huynen, M Martens, Pim Martens, Dieneke Schram, Matty P Weijenberg, and Anton E Kunst (2001), “The impact of heat waves and cold spells on mortality rates in the Dutch population.” Environmental health perspectives, 109, 463.

Hyndman, R.J. (2015), forecast: Forecasting functions for time series and linear models. R package version 6.2, URL http://github.com/robjhyndman/forecast.

Hyndman, R.J. and Y. Khandakar (2008), “Automatic time series forecasting: the forecast package for R.” Journal of Statistical Software, 26, 1–22.

Hyndman, R.J. and A.B. Koehler (2006), “Another look at measures of forecast accuracy.” International Journal of Forecasting, 22, 679–688.

Klein Tank, A.M.G and Coauthors (2002), “Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment.” International Journal of Climatology, 22, 1441–1453, URL http://www.ecad.eu.

Koissi, Marie-Claire and Arnold F Shapiro (2008), “The Lee-Carter model under the condition of variables age-specific parameters.” In 43rd Actuarial Research Conference.

Lee, R.D. and L.R. Carter (1992), “Modeling and forecasting US mortality.” Journal of the American Statistical Association, 87, 659–671.

Lee, Ronald (2000), “The Lee-Carter method for forecasting mortality, with various extensions and applications.” North American actuarial journal, 4, 80–91.

Osterhaus, A.D.M.E., G.F. Rimmelzwaan, J.C. de Jong, M.L. Heijnen, W. Dorigo, A.I.M. Bartelds, and J.K. van Wijngaarden (2000), NIEUWSBRIEF INFLUENZA-SURVEILLANCE ‘99 - ‘00 Vol. 8 no. 9. NIC; EUR; RIVM; NIVEL; IGZ.

R Development Core Team (2012), R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, URL http://www.R-project.org/. ISBN 3-900051-07-0.

Reichert, Thomas A., Lone Simonsen, Ashutosh Sharma, Scott A. Pardo, David S. Fedson, and Mark A. Miller (2004), “Influenza and the Winter Increase in Mortality in the United States, 1959-1999.” American Journal of Epidemiology, 160, 492–502.

Shaman, Jeffrey and Alicia Karspeck (2012), “Forecasting seasonal outbreaks of influenza.” Proceedings of the National Academy of Sciences, 109, 20425–20430.

Vellekoop, M. H. (2016), Lecture notes Caput Financing of Pensions Week VI: Human Mortality. Universiteit van Amsterdam, Amsterdam.

Viboud, Cecile, Pierre-Yves Boelle, Fabrice Carrat, Alain-Jacques Valleron, and Antoine Flahault (2003), “Prediction of the spread of influenza epidemics by the method of analogues.” American Journal of Epidemiology, 158, 996–1006.