POLITECNICO DI MILANO
SCHOOL OF INDUSTRIAL AND INFORMATION ENGINEERING
Department of Mathematics
Master of Science in Mathematical Engineering
Count processes approach to recurrent event data:
a Bayesian model for blood donations
ENRICO SPINELLI
MATRICOLA: 875462
SUPERVISOR: PROF. ALESSANDRA GUGLIELMI
COADVISOR: PROF. ETTORE LANZARONE
A.Y. 2018-2019
Abstract
This work addresses an important practical problem: predicting the number of donations at a specific blood centre, in order to plan the collection phase of the blood supply chain efficiently.
First, statistical models for estimating the rate of blood donations are considered. Models of this kind make it possible to predict the return time to donation for an individual. The real data analyzed come from the databases of the Milan section of the Associazione Volontari Italiani Sangue (AVIS). The models and methods used are those of Bayesian statistics, and blood donations are modeled as recurrent events. Specifically, the focus is on the rate function, which is the instantaneous probability of event occurrence. The object of inference in this approach is the counting process {Ni(t) : t ≥ 0}, for each donor i, where Ni(t) represents the number of donations made up to time t by the i-th donor.
Usually the waiting times between donations are modeled; modelling the counts instead allows the process to retain memory and to evolve with an occurrence rate that depends on the time of the event.
The analysis highlights a decreasing trend in the rate function and identifies some significant covariates. Moreover, through the random effects included in the model, heterogeneity among individuals is captured, and for each donor the posterior density of one parameter (called frailty) summarises his/her personal propensity to donate.
The behaviour of existing donors has been modeled within the context of recurrent events. Since the blood supply also comes from occasional or new donors, a Bayesian time series model is proposed to make predictions in this context.
Abstract in Italian (Estratto in lingua italiana)
This work seeks a solution to a very important and practical problem: forecasting the number of donations at a specific blood collection centre, in order to plan the collection phase of the blood supply chain efficiently.
First, statistical models for estimating the rate of blood donations were considered. Models of this kind make it possible to predict the return time to donation for an individual. The real data analyzed come from the databases of the Milan section of the Associazione Volontari Italiani Sangue (AVIS). The models and methods used are those of Bayesian statistics, and blood donations were modeled as recurrent events. Specifically, attention was focused on the rate function, which is the instantaneous probability that the event occurs. The object of inference in this approach is the counting process {Ni(t) : t ≥ 0}, for each donor i, where Ni(t) represents the number of donations made up to time t by the i-th donor.
Usually the waiting times between donations are considered, but modelling the counts allows the process to retain memory and to evolve with an occurrence rate that changes as time passes.
The analysis highlights a decreasing trend in the rate function and identifies some covariates as significant. Moreover, with the inclusion of random effects in the model, heterogeneity among individuals is explained, and for each donor the posterior distribution of one parameter (called frailty) summarises his/her personal propensity to donate.
The behaviour of existing donors was modeled in the context of recurrent events. Since part of the blood supply comes from occasional or new donors, a Bayesian time series model was proposed to forecast this phenomenon.
Table of Contents

Abstract
Abstract in Italian (Estratto in lingua italiana)
Table of Contents
List of Figures
List of Tables
Introduction
1 Theoretical background on modelling recurrent events
    1.1 Framework and notation
    1.2 Recurrent events as gap times
    1.3 Recurrent events as event counts
    1.4 Heterogeneity between individuals
        1.4.1 Covariates in the Poisson process
        1.4.2 Random effects
    1.5 Extensions to renewal and Poisson processes
        1.5.1 "At risk" indicator function
        1.5.2 General intensity-based model
        1.5.3 Multi-state Markov models
        1.5.4 Modelling the baseline intensity function
    1.6 The Bayesian approach
        1.6.1 Bayesian Statistics
        1.6.2 Monte Carlo Markov Chains
        1.6.3 Discretization of the Gamma process prior
        1.6.4 Autocorrelated prior for the baseline intensity function
    1.7 Model evaluation in terms of predictive performances
        1.7.1 Log posterior predictive density
        1.7.2 Computation of WAIC
        1.7.3 Evaluating predictive accuracy in the case of recurrent events
2 Data source
    2.1 The AVIS association
        2.1.1 Brief history of AVIS
        2.1.2 Italian donation rules
    2.2 Data sources
        2.2.1 The EMONET database
        2.2.2 The AVIS database
        2.2.3 Data selection
        2.2.4 Suspensions
    2.3 Features selection and data transformation
    2.4 Descriptive analysis
        2.4.1 Rate of donations and gap times
        2.4.2 Covariates
3 Modelling blood donations as recurrent events
    3.1 Recurrent event models for blood donations
    3.2 Modelling choices
        3.2.1 Baseline intensity function
        3.2.2 Frailty parameters
        3.2.3 Covariates
        3.2.4 At risk indicator function, censoring and suspensions
    3.3 The Bayesian model for recurrent data of M donors
        3.3.1 The likelihood
        3.3.2 Prior elicitation
    3.4 The predictive distribution of the counting process of a new incoming donor
4 Posterior inference on AVIS data
    4.1 Posterior inference
    4.2 Inference on parameters
        4.2.1 Baseline intensity function
        4.2.2 Covariates coefficients
        4.2.3 Random effects
        4.2.4 Predictive density for the count process of a new incoming donor
    4.3 Point predictions
5 Forecasting new donors
    5.1 State Space Models
    5.2 Descriptive analysis
    5.3 A Bayesian model for the new donors
    5.4 Posterior inference
Conclusions and further developments
Bibliography
List of Figures

2.1 Histogram of the empirical rates of donation (number of donations divided by the years of observation)
2.2 Boxplot of the number of days from each donor's last observed donation to their censoring time
2.3 Trend of gap times with the number of donations
2.4 Trend of the gap times with the years passed since entrance
2.5 Histogram of the logarithm of the gap times
2.6 Boxplots of the BMI according to the values of the categorical covariates
2.7 Boxplots of the first donation age according to the values of the categorical covariates
2.8 Boxplots of the donation rate grouped by the categorical variables
2.9 Scatterplot of the donation rates against the continuous variables (AGE and BMI)
3.1 Histogram of gap times of female donors; the red line corresponds to 180 days
3.2 Histogram of gap times of female donors; the red line corresponds to 90 days
3.3 Percentage of earlier-than-allowed donations as a function of the threshold age for menopause
4.1 95% credibility intervals for the baseline intensity function
4.2 Estimated log posterior predictive density
4.3 95% credibility intervals for the parameters $\beta_i$
4.4 Summaries of $w_i$
4.5 Predictive densities of $T_{i,n_i+1}$ given $T_{i,n_i}$ for some donors
4.6 95% posterior predictive credibility intervals of $w^{new}_j$, $j = 1, \ldots, J$, the frailty of a new donor from zone j. In grey, the estimate obtained with the model with no areal dependence.
4.7 Pointwise predictive 95% credibility intervals for $N^{new}(t) \mid x^{new}$, where $x^{new}$ is set to the mean (or to the mode) of the features used as covariates
4.8 Mean functions for $N^{new}(t) \mid x^{new}, \text{data}$. Unless stated otherwise, the covariates are set to the mean (or to the mode)
5.1 Weekly arrivals of new donors grouped by years
5.2 Weekly arrivals of new donors grouped by months
5.3 Time series of the weekly arrivals
5.4 Traceplots of the variance parameters of Model 1
5.5 Model 1: decomposition of the time series
5.6 Model 2: decomposition of the time series
5.7 Prediction of new weekly arrivals: 95% credibility intervals
5.8 Predictive mean of the seasonal component
List of Tables

2.1 Variables from table PRESENTAZIONI in EMONET database that are included in our dataset
2.2 Variables from table DONAZIONI in EMONET database that are included in our dataset
2.3 Variables from table ANAGRAFICHE in EMONET database that are included in our dataset
2.4 Variables from table EMC_DONABILI in EMONET database that are included in our dataset
2.5 Variables from table TIPIZZAZIONE in EMONET database that are included in our dataset
2.6 Variables from table STILIVITA in AVIS database that are included in our dataset
2.7 Variables from table SOSPENSIONI in AVIS database that are included in our dataset
2.8 Frequency table relating the type of suspension to compliance with the suspension
2.9 Description of the features
2.10 Number of donors that made exactly n total donations (after the first one)
2.11 Table of the sample frequencies of the categorical variables
2.12 Mean and standard deviation of the continuous variables
4.1 Bayesian p-values and hazard ratios
4.2 Evaluation of the predictive performance of models with different sets of covariates using 10-fold cross-validation
4.3 Point prediction errors
5.1 Summaries of the empirical distribution of the time series of the weekly arrivals
5.2 Prediction of future weekly arrivals
Introduction
Human blood is a natural product that cannot be reproduced artificially, so the only way of
guaranteeing its availability for health purposes is through donations from living individuals.
Blood is needed to save lives, to improve their quality and to extend their length. It is
essential in first aid, emergency services, surgery, organ and bone marrow transplants, and
the treatment of oncological and haematological diseases. Blood is essential not only in
exceptional cases such as natural disasters, accidents or serious pathological conditions; it is
also a unique source of survival in chronic conditions such as anaemia, liver dysfunction,
lack of coagulation factors and disorders of the immune system.
The blood donation supply chain can be divided into four phases: collection, transportation,
storage and utilization. In the collection phase the donor's eligibility to donate is checked;
then, if the donation takes place, the blood is screened in a laboratory to prevent the
transmission of infectious diseases and is possibly fractionated into subcomponents. Afterwards
it is transported to hospitals or transfusion centres, where it is stored, and finally it is
used for a transfusion.
Bas Güre et al. (2018) discuss how the management of blood collection from donors has
not been adequately considered so far. Indeed, most of the efforts in the scientific literature
are aimed at demand prediction or at the efficient management of storage and distribution.
Despite this lack of attention, collection is one of the most important phases of the blood
donation supply chain. Blood has a limited shelf life, so the demand of hospitals and
transfusion centres has to be matched as precisely as possible to avoid wasting this resource.
When neither the demand nor an estimate of it is available, the storage should be planned to
keep the number of blood units of each type constant across days in every centre. Moreover,
knowing in advance the number of incoming donors can lead to an optimal planning of the
appointment scheduling system, with the purpose of reconciling the production balancing
requirements with the service planning requirements. The quality of the service from the
donors' perspective would benefit as a result.
In Italy, as in many other western countries, the acquisition of blood products relies
on voluntary donations. The major organization of volunteer blood donors is
the Associazione Volontari Italiani Sangue (AVIS). A precise forecast of arrivals is clearly
necessary for an efficient management of blood collection, and modelling and understanding
the behaviour of donors is one way to obtain it. Some statistical models have been proposed
in the scientific literature. Previous works rely on logistic regression or on modelling the
gap times between donations in the framework of recurrent events. Apart from Gianoli (2016),
all these publications use frequentist methods, while the Bayesian approach is largely unexplored.
The class of methods used in this thesis belongs to Bayesian statistics and a recurrent
event approach is adopted but, unlike in Gianoli (2016), event counts over time are
modelled, not the waiting times between two successive blood donations.
In recent decades, thanks to improvements in the performance of computing systems and to
the spread of MCMC methods, the Bayesian approach has been gaining ground in the scientific
world, since it can give richer inference than classical statistics. Indeed, probabilistic
statements are exact in the sense that they do not rely on large-sample theory, and
instruments like interval estimates have a clearer meaning. Moreover, through predictive
distributions, the Bayesian paradigm offers a natural way to do forecasting.
This thesis deals with the analysis of a dataset built from real data provided by the
AVIS section of Milan.
Suitable data have been downloaded, using SQL queries, from two databases in the
AVIS’ server. Afterwards, a stage of pre-processing followed in order to make the raw
data usable for a statistical analysis. As a result, a dataset of M individuals has been
built. Times of donations, personal features and the total time of observation (namely,
censoring time) were available for each individual.
Subsequently, a proper model for treating blood donations as recurrent events has
been formulated. At first, statistical modelling of recurrent event processes has been
studied. A brief survey of the state of the art of statistical methods in the field of
recurrent blood donations has been carried out, covering both Bayesian and frequentist
statistics. Then, a suitable class of models has been identified. However, some modifications
were made to adapt the class of models to the real phenomenon. For instance, the model can
handle some typical features of the blood donation cycle, such as the mandatory deferral
time after each donation or suspensions from donor activity.
Posterior inference was computed via Stan (see Stan Development Team and others,
2016), an open-source software package written in C++ that performs MCMC sampling.
Finally, posterior inference in the form of MCMC output has been analyzed and
interpreted; moreover, a way to sample from a recurrent event process has been proposed.
Appropriate measures of goodness of fit and of predictive accuracy have
been discussed and used to compare different models (for instance, different
parametrizations or different subsets of covariates).
The result of the work summarized above is a mathematical model that can explain
the behaviour of a blood donor starting from the moment of his/her first donation.
Individual features are present in the model as covariates. Some of them have been identified
as statistically significant and associated with a higher or lower number of donations per
unit of time. The model can also be used to make individual-specific predictions of new donations.
Finally, to have a complete modelling of the number of donations in a specific blood
collection centre, the time series of the weekly number of new donors has been briefly
analyzed as a State Space Model. Summing up, the original contributions of this work
are:
• composition of the dataset;
• the study of models for the rate function of recurrent events, particularly using the
Bayesian approach;
• application of the class of models to the dataset;
• predictive accuracy comparison of different models;
• a State Space model to predict the number of new donors;
• Stan implementation of the models.
The thesis is organized as follows.
In the first chapter an overview on recurrent event processes and on the various
modeling techniques will be given, both in frequentist and Bayesian frameworks.
The second chapter is dedicated to the description of the data sources.
The particular modeling choices regarding the examined dataset will be explained in
detail in the third chapter.
The fourth chapter is dedicated to the presentation of the results of the analysis.
Posterior inference on the parameters of the model will be shown and discussed.
The last chapter is devoted to the time series modelling of new donors’ weekly
number.
1 Theoretical background on modelling recurrent events
In this chapter, a brief review of the statistical models used in the analysis of recurrent events is presented. Recurrent event processes are processes in which events are generated repeatedly over time.
A brief recall of Bayesian statistics and MCMC methods follows. Model evaluation in terms of predictive performance concludes the chapter. Almost all the material included in this chapter is from Cook and Lawless (2007).
1.1 Framework and notation
A single recurrent event process starting at time t = 0 is characterized by an increasing
sequence of event times {Tk, k ∈ N}, where each element of the sequence denotes the
time of the corresponding event. This sequence is associated with the counting process
{N(t), t ≥ 0}, defined as:
$$N(t) = \sum_{k=0}^{\infty} I(T_k \le t), \tag{1.1}$$
where I(Tk ≤ t) is the indicator function, equal to 1 when Tk ≤ t and to 0 otherwise. The
counting process evaluated at time t records the cumulative number of events that occurred
in the interval [0, t]. Moreover, the number of events that occurred in the interval (s, t]
can be expressed as N(t)−N(s).
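As an illustration of definition (1.1), the following minimal Python sketch evaluates N(t) and N(t)−N(s) from a sorted list of event times. The function names and the example donation times are hypothetical and are not taken from the AVIS data.

```python
from bisect import bisect_right

def counting_process(event_times, t):
    """Return N(t): the number of event times T_k with T_k <= t.

    event_times must be sorted in increasing order.
    """
    return bisect_right(event_times, t)

def events_between(event_times, s, t):
    """Return N(t) - N(s): the number of events in the interval (s, t]."""
    return counting_process(event_times, t) - counting_process(event_times, s)

# Hypothetical event times, in days since the start of observation
times = [95.0, 200.0, 310.0, 480.0]
print(counting_process(times, 310.0))        # events in [0, 310]
print(events_between(times, 100.0, 480.0))   # events in (100, 480]
```

The binary search relies on the event times being increasing, which is exactly the assumption made on the sequence {Tk}.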
Let H(t) = {N(s), 0 < s ≤ t} be the history of the process. A recurrent event process
can be defined by specifying the instantaneous probability that an event occurs given the
previous history, under the hypothesis that two events cannot occur simultaneously.
Considering the probability that an event occurs in the interval (t, t+∆t], one can define
the intensity function:
$$\lambda(t \mid H(t)) = \lim_{\Delta t \to 0} \frac{P\big(N(t+\Delta t) - N(t) = 1 \mid H(t)\big)}{\Delta t}. \tag{1.2}$$
Once the intensity function is known, it is possible to write the probability of a specified
event history and conditional probabilities for inter-event times through the following
results (see Cook and Lawless, 2007).
Conditionally on H(τ0), the probability density of the outcome "n events, at times t1 < . . . < tn", where n > 0, for a process with an integrable intensity λ(t|H(t)), over the specified interval [τ0, τ], is:
$$\exp\left(-\int_{\tau_0}^{\tau} \lambda(u \mid H(u))\,du\right) \prod_{j=1}^{n} \lambda(t_j \mid H(t_j)). \tag{1.3}$$
For a process with integrable intensity λ(t|H(t)),
$$P\big(N(t) - N(s) = 0 \mid H(s)\big) = \exp\left(-\int_{s}^{t} \lambda(u \mid H(u))\,du\right). \tag{1.4}$$
Let $W_j = T_j - T_{j-1}$ be the waiting time between events $j-1$ and $j$; then:
$$P\big(W_j > w \mid T_{j-1} = t_{j-1}, H(t_{j-1})\big) = \exp\left(-\int_{t_{j-1}}^{t_{j-1}+w} \lambda(u \mid H(u))\,du\right). \tag{1.5}$$
It is clear from the formulas above that the amount of information contained in the
intensity function leads it to play a crucial role in modelling a recurrent event process.
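To make the role of the intensity concrete, the sketch below evaluates the logarithm of (1.3) and the gap-time survival probability (1.5) numerically. It is a simplified illustration under the assumption that the intensity depends on t only (no dependence on the history); the helper names, the trapezoidal quadrature and the example intensity are assumptions introduced here.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def log_density(intensity, event_times, tau0, tau):
    """Logarithm of (1.3) for an intensity that is a function of t only."""
    log_prod = sum(math.log(intensity(t)) for t in event_times)
    return log_prod - integrate(intensity, tau0, tau)

def gap_survival(intensity, t_prev, w):
    """P(W_j > w | T_{j-1} = t_prev), equation (1.5), under the same assumption."""
    return math.exp(-integrate(intensity, t_prev, t_prev + w))

# Hypothetical decreasing intensity (events per year)
def rate(t):
    return 2.0 * math.exp(-0.1 * t)

print(log_density(rate, [0.4, 1.1, 1.9], 0.0, 3.0))
print(gap_survival(rate, 1.9, 0.5))
```

A history-dependent intensity would require re-evaluating the integrand piecewise between successive event times; that extension is not attempted here.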
According to the goal of the analysis, it is possible to model event occurrences through
two main ways: event count and gap times. In the first scenario the focus is on the counting
process N(t), while in the second case the waiting times between two consecutive events
are modelled. In the next sections a brief summary of the two approaches will be given.
1.2 Recurrent events as gap times
The analysis of recurrent events as gap times is common when the events are relatively
infrequent or when the system returns to its initial state after every occurrence. In this
case the process is called a renewal process, and it is a useful framework for system failures
or cyclical phenomena. In a renewal process the gap times $W_j = T_j - T_{j-1}$
between events $j-1$ and $j$ are independent and identically distributed conditionally
on the parameters. This condition is equivalent to:
$$\lambda(t \mid H(t)) = h\big(t - T_{N(t^-)}\big), \tag{1.6}$$
$$N(t^-) := \lim_{\Delta t \to 0^+} N(t - \Delta t), \tag{1.7}$$
where h(t) is the hazard function, defined as follows:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le W < t+\Delta t \mid W \ge t)}{\Delta t} = \frac{f(t)}{S(t)}, \tag{1.8}$$
where f(t) is the density function of the waiting times and S(t) = P(W ≥ t) is their survival
function. The hazard function is the main focus of a branch of
statistics called survival analysis, in which times to events such as failures or deaths are
analyzed. Processes of this kind are often called time-to-event processes. The hazard
function and the intensity function are clearly similar: both represent
the instantaneous probability that an event occurs at time t. Hence, the same modeling
approach can be followed for both functions.
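As a small worked example of (1.8), the sketch below computes the hazard as f(t)/S(t) for Weibull gap times. The Weibull family is chosen here only for illustration and is not a modelling choice stated in this chapter; with shape equal to 1 it reduces to the Exponential case, whose hazard is constant.

```python
import math

def weibull_density(t, shape, scale):
    """Density f(t) of a Weibull waiting time."""
    z = (t / scale) ** shape
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-z)

def weibull_survival(t, shape, scale):
    """Survival function S(t) = P(W >= t) of a Weibull waiting time."""
    return math.exp(-((t / scale) ** shape))

def hazard(t, shape, scale):
    """Hazard h(t) = f(t) / S(t), as in equation (1.8)."""
    return weibull_density(t, shape, scale) / weibull_survival(t, shape, scale)

# Shape 1 (Exponential gap times): the hazard does not depend on t
print(hazard(0.5, 1.0, 2.0), hazard(3.0, 1.0, 2.0))
# Shape 2: the hazard grows with the time elapsed since the last event
print(hazard(0.5, 2.0, 1.0), hazard(2.0, 2.0, 1.0))
```

In a renewal process, (1.6) evaluates this hazard at the time elapsed since the most recent event, so the same curve restarts after every occurrence.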
A renewal process is equivalent to many time-to-event processes which occur one
after the other, since, as it can be noticed in equation (1.6), the intensity function gets
the same values after every event, losing memory of the past. A renewal process can be
generalized by inducing dependence between gap times through linear models. Thus it is
possible to have a trend in the waiting times.
1.3 Recurrent events as event counts
The main way of representing a recurrent event process through event counts is to model
it as a Poisson process. In this framework the events occur randomly in such a way
that their numbers in disjoint time intervals are statistically independent. Equivalently,
the intensity function at time t does not depend on the history H(t) of the process, and it
can be expressed in the following form:
$$\lambda(t \mid H(t)) = \rho(t), \quad t > 0, \tag{1.9}$$
where ρ(t) is a non-negative integrable function called the rate function. If ρ(t) = ρ for
each t, which means that the intensity is constant over time, the Poisson process
is called homogeneous; otherwise it is called non-homogeneous. Since ρ(t)dt represents the
probability that an event occurs in the infinitesimal interval [t, t+dt], it is also the
mean number of events in that interval. Hence
$$\mu(t) = \int_0^t \rho(u)\,du \tag{1.10}$$
is the mean number of events in the interval [0, t] and it is called the cumulative rate function.
The definition of Poisson processes implies the following properties:
• if t ≥ s ≥ 0, N(s, t) = N(t)−N(s) has a Poisson distribution with mean µ(t)−µ(s);
• if (s1, t1] and (s2, t2] are non-overlapping intervals, then N(s1, t1) and N(s2, t2) are
independent random variables;
• in the case of a homogeneous Poisson process with intensity ρ, the gap times
$W_j = T_j - T_{j-1}$ are independent and identically distributed Exponential random
variables with survivor function
$$P(W_j > w) = \exp(-\rho w), \quad w \ge 0; \tag{1.11}$$
• if the Poisson process is non-homogeneous with mean function µ(t), the process
defined on the new time scale s = µ(t) as
$$M(s) = N(\mu^{-1}(s)), \quad s > 0, \tag{1.12}$$
is a homogeneous Poisson process with unit intensity.
Hence the intensity function ρ(t) can be used to model a time trend in the events.
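The time-scale property (1.12) suggests a simple way to simulate a non-homogeneous Poisson process: generate unit-rate Exponential gaps on the scale s, as in (1.11), and map the resulting points back through the inverse of µ. The following is a minimal sketch under the assumption of a hypothetical rate function ρ(t) = a·exp(−bt), whose cumulative rate function has a closed-form inverse; the function names and parameter values are illustrative only.

```python
import math
import random

def simulate_nhpp(mu, mu_inv, tau, rng):
    """Event times on [0, tau] of a Poisson process with cumulative rate function mu."""
    s_max = mu(tau)
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)      # unit-rate Exponential gap on the scale s
        if s > s_max:
            return times
        times.append(mu_inv(s))        # map back to the original time scale

a, b = 2.0, 0.1                        # hypothetical: rho(t) = a * exp(-b * t)

def mu(t):
    return a / b * (1.0 - math.exp(-b * t))

def mu_inv(s):
    return -math.log(1.0 - b * s / a) / b

rng = random.Random(0)
counts = [len(simulate_nhpp(mu, mu_inv, 5.0, rng)) for _ in range(20000)]
# The average count should be close to mu(5), the mean number of events in [0, 5]
print(sum(counts) / len(counts), mu(5.0))
```

The comparison printed at the end is a Monte Carlo check of the first property in the list above, with s = 0 and t = 5.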
1.4 Heterogeneity between individuals
In some contexts the events generating process may differ among individuals; such
heterogeneity can be modeled by including covariates and random effects in the model.
1.4.1 Covariates in the Poisson process
The most common way of including a vector of time-varying covariates x(t) in an
intensity-based recurrent event process is to consider first of all a baseline intensity
function λ0(t), which corresponds to the intensity function of a particular individual
(for example an individual who has x(t) = 0).
The next step is to consider intensities of the form:
$$\lambda(t \mid x(t)) = \lambda_0(t)\, g(x(t); \beta), \tag{1.13}$$
where g(x(t);β) is a non-negative integrable function and β is a vector of regression
parameters. Typically g(x(t);β) = exp(x(t)′β). This is called the multiplicative model or
log-linear model.
When the covariates are time-invariant, their effect on a Poisson process has a simple
interpretation. Indeed, conditionally on the covariates, the corresponding Poisson process
is characterized by intensity λ0(t)g(x;β) and mean function $g(x;\beta)\int_0^t \lambda_0(u)\,du$. As a
consequence, the mean and the rate functions for two individuals with covariates x1 and
x2 are proportional, and $g(x_1;\beta)/g(x_2;\beta)$ is the constant of proportionality (in the
multiplicative model the constant is exp((x1 − x2)′β)). This property does not hold in
general when the covariates are time-dependent.
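The proportionality property can be checked with a few lines of Python. The sketch below assumes the log-linear form g(x;β) = exp(x′β) with time-invariant covariates; the coefficient and covariate values are hypothetical.

```python
import math

def relative_intensity(x, beta):
    """g(x; beta) = exp(x' beta), the multiplicative term in (1.13)."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

def rate_ratio(x1, x2, beta):
    """Constant of proportionality between the rate functions of two individuals."""
    return relative_intensity(x1, beta) / relative_intensity(x2, beta)

beta = [0.3, -0.2]    # hypothetical regression coefficients
x1 = [1.0, 0.5]       # hypothetical covariates of individual 1
x2 = [0.0, 0.5]       # hypothetical covariates of individual 2
# The two individuals differ only in the first covariate, so the ratio is exp(0.3)
print(rate_ratio(x1, x2, beta), math.exp(0.3))
```

Because the baseline λ0(t) cancels in the ratio, the result does not depend on t, which is what makes the coefficients interpretable as log rate ratios.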
Moreover, some generalizations of the multiplicative model (1.13) can be considered.
A possible extension is to include, as covariates, components based on the prior events
history (e.g. the number of events experienced before t or the time since the last event).
Because of history-dependence, in this case the process is not a Poisson process anymore
and it is called modulated Poisson process.
Another possible extension is to consider intensity functions of the form
$$\lambda(t \mid x(t)) = \lambda_0(t) + g(x(t); \beta), \tag{1.14}$$
where g(x(t);β) has to be chosen such that λ(t|x(t)) ≥ 0. This model is called additive.
The last possible extension presented here is the time transform model, analogous to
the accelerated failure time model in survival analysis:
$$\lambda(t \mid x(t)) = \lambda_0\left(\int_0^t \exp(x(u)'\beta)\,du\right) \exp(x(t)'\beta). \tag{1.15}$$
In this case $s = \int_0^t \exp(x(u)'\beta)\,du$ can be considered as a transformed time scale.
1.4.2 Random effects
In some situations unobservable factors may create heterogeneity across different in-
dividuals that experience the same recurrent event process. In this case it is useful
to introduce random effects in order to capture this feature in the model. Thus, the
subject-specific intensity function for the i− th individual can be written as:
λi(t|H(t),ui, x,β)= uiλ0(t), (1.16)
where ui is called frailty and it represents the unobservable individual specific random
effect.
Typically, for inference purposes, all the random effects ui can be modeled as inde-
pendent random variables equally distributed with Gamma density with mean equal to
1 and variance equal to φ, with φ≥ 0. This model is equivalent to state that, condition-
ally to ui, the stochastic process {Ni(t) : 0 ≤ t}, which represents the number of events
occurred to individual i, is a Poisson process with intensity equal to uiλ0(t).
9
CHAPTER 1. THEORETICAL BACKGROUND ON MODELLING RECURRENTEVENTS
However, marginalizing over the random effects yields a process that is no longer
Poisson.
Indeed:
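$$E[N_i(t) \mid \lambda_0(t)] = \mu_0(t); \tag{1.17}$$
$$\mathrm{var}[N_i(t) \mid \lambda_0(t)] = \mu_0(t) + \mu_0(t)^2\,\phi; \tag{1.18}$$
$$\mathrm{cov}[N_i(s_1,t_1),\, N_i(s_2,t_2) \mid \lambda_0(t)] = \phi\,\mu_0(s_1,t_1)\,\mu_0(s_2,t_2); \tag{1.19}$$
where $\mu_0(s,t)=\int_s^t \lambda_0(u)\,du$, $\mu_0(t)=\mu_0(0,t)$ and $s_1 < t_1 < s_2 < t_2$.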
Of course, some properties of the Poisson process are violated, for instance the mean and
the variance functions are not equal. Moreover the counts in disjoint intervals are not
statistically independent since their covariance is different from zero. From equations
(1.18) and (1.19) it is clear that the variance of the random effects φ quantifies both the
heterogeneity across individuals (since the variance is an increasing function of it) and
the dependence between counts in disjoint intervals.
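The overdispersion stated in (1.17) and (1.18) can be checked numerically. The following is a minimal sketch, assuming a constant baseline intensity (so that µ0(t) = lam0 · t); the function name `simulate_count` and the values of `lam0`, `phi` and `t` are illustrative choices, not quantities taken from the thesis data.

```python
import random
import statistics

def simulate_count(lam0, phi, t, rng):
    """Draw N(t) for one subject: a Gamma frailty u with mean 1 and variance phi,
    then a homogeneous Poisson process with rate u * lam0 observed on [0, t]."""
    # random.gammavariate takes (shape, scale): shape 1/phi and scale phi
    # give mean 1 and variance phi.
    u = rng.gammavariate(1.0 / phi, phi)
    count = 0
    clock = rng.expovariate(u * lam0)      # first exponential gap time
    while clock <= t:
        count += 1
        clock += rng.expovariate(u * lam0)
    return count

rng = random.Random(0)
lam0, phi, t = 2.0, 0.5, 3.0               # illustrative values only
counts = [simulate_count(lam0, phi, t, rng) for _ in range(20000)]
mu0 = lam0 * t
print(statistics.mean(counts), mu0)                        # compare with (1.17)
print(statistics.variance(counts), mu0 + phi * mu0 ** 2)   # compare with (1.18)
```

The empirical variance should clearly exceed the empirical mean, which is the departure from the Poisson case discussed above.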
Marginalizing equation (1.16) over the random effect ui leads to:
$$\lambda_i(t \mid H(t)) = \lambda_0(t)\,\frac{1+\phi N_i(t^-)}{1+\phi\,\mu_0(t)}, \tag{1.20}$$
where $N_i(t^-)=\lim_{s\to t^-} N_i(s)$.
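This can be done by writing
$$P\big(N_i(t+\Delta t)-N_i(t) \mid H_i(t)\big) = \int_0^\infty P\big(N_i(t+\Delta t)-N_i(t) \mid H_i(t),u_i\big)\,\frac{P(H_i(t)\mid u_i)\,g(u_i\mid\phi)}{\int_0^\infty P(H_i(t)\mid u_i)\,g(u_i\mid\phi)\,du_i}\,du_i, \tag{1.21}$$
then, for small ∆t,
$$P\big(N_i(t+\Delta t)-N_i(t)\mid H_i(t),u_i\big) = \lambda_0(t)\,u_i\,\Delta t, \tag{1.22}$$
remembering that the density g(ui|φ) is a Gamma with shape and rate parameters
both equal to $\phi^{-1}$ and that
$$P(H_i(t)\mid u_i) = \left\{\prod_{j=1}^{N_i(t^-)} u_i\,\lambda_0(t_{ij})\right\}\exp\left(-\int_0^t u_i\,\lambda_0(x)\,dx\right), \tag{1.23}$$
since it is the expression of the likelihood of the process (1.3).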
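Substituting (1.22) and (1.23) in (1.21) and simplifying gives
$$\frac{P\big(N_i(t+\Delta t)-N_i(t)\mid H_i(t)\big)}{\Delta t} = \lambda_0(t)\,\frac{\int_0^\infty u_i^{N_i(t^-)+\phi^{-1}}\exp\left(-u_i\left(\int_0^t\lambda_0(x)\,dx+\phi^{-1}\right)\right)du_i}{\int_0^\infty u_i^{N_i(t^-)+\phi^{-1}-1}\exp\left(-u_i\left(\int_0^t\lambda_0(x)\,dx+\phi^{-1}\right)\right)du_i} \tag{1.24}$$
$$= \lambda_0(t)\,\frac{\Gamma\big(N_i(t^-)+\phi^{-1}+1\big)}{\Gamma\big(N_i(t^-)+\phi^{-1}\big)}\,\frac{\left(\phi^{-1}+\int_0^t\lambda_0(x)\,dx\right)^{N_i(t^-)+\phi^{-1}}}{\left(\phi^{-1}+\int_0^t\lambda_0(x)\,dx\right)^{N_i(t^-)+\phi^{-1}+1}} = \lambda_0(t)\,\frac{1+\phi N_i(t^-)}{1+\phi\,\mu_0(t)}, \tag{1.25}$$
which is (1.20).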
Hence, if random effects are present in the model, the intensity depends on the
number of events experienced by the individual.
The random effects approach and the multiplicative model including covariates can be
combined.
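As a small illustration of how the marginal intensity reacts to the event history, the formula (1.20) can be evaluated directly. This is only a sketch: the function name `marginal_intensity` and its argument names are hypothetical, and the baseline quantities λ0(t) and µ0(t) are assumed to be supplied by the user.

```python
def marginal_intensity(lam0_t, mu0_t, n_prev, phi):
    """Evaluate (1.20): lam0_t is the baseline intensity at t, mu0_t the
    cumulative baseline intensity up to t, n_prev the number of events
    strictly before t, and phi the frailty variance."""
    return lam0_t * (1.0 + phi * n_prev) / (1.0 + phi * mu0_t)

# With phi = 0 there is no heterogeneity and the baseline is recovered.
print(marginal_intensity(lam0_t=2.0, mu0_t=6.0, n_prev=9, phi=0.0))
# A subject with more events than expected (9 against mu0 = 6) gets a higher intensity.
print(marginal_intensity(lam0_t=2.0, mu0_t=6.0, n_prev=9, phi=0.5))
```

When the observed count equals µ0(t) the ratio is 1, so the marginal intensity coincides with the baseline regardless of φ.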
1.5 Extensions to renewal and Poisson processes
1.5.1 "At risk" indicator function
Another feature that can be included in the model is the heterogeneity of the observation
time of each individual. In order to do so we introduce the risk indicator function Yi(t),
which is equal to 1 when the i-th individual is observed (and he or she is "at risk"
of experiencing the event) and equal to 0 otherwise. For example, if an individual
is observed in the interval [τ0i,τi], then Yi(t) = I(τ0i ≤ t ≤ τi). The notation can also
accommodate settings where individuals are observed over disjoint time intervals, for
example if an individual is lost to follow-up for a certain period of time.
The right end of the observation window τi is typically called the censoring time and it
represents the termination of the study for the i-th individual.
It is now possible to define respectively the observed part of the counting process, the
history and the intensity of the observable process:
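$$\bar N_i(t) := \int_0^t Y_i(u)\,dN_i(u); \tag{1.26}$$
$$\bar H_i(t) := \{\bar N_i(s),\, Y_i(s),\, 0\le s<t\}; \tag{1.27}$$
$$\bar\lambda_i(t\mid \bar H_i(t)) := \lim_{\Delta t\to 0}\frac{P\big(\bar N_i(t+\Delta t)-\bar N_i(t)=1\mid \bar H_i(t)\big)}{\Delta t}. \tag{1.28}$$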
In some cases information from the history of the process is incorporated into the
intensity function. As a consequence, $\Delta N_i(t) := \lim_{\Delta t\to 0}\,[N_i(t+\Delta t)-N_i(t)]$ and Yi(t) are
conditionally independent given the history, and so the intensity of the observable process,
written with an overbar to distinguish it from the intensity of the underlying process,
is such that
$$\bar\lambda_i(t\mid\bar H_i(t)) = \lambda_i(t\mid H_i(t))\,Y_i(t). \tag{1.29}$$
Basically, the observable process has intensity 0 outside of the observation scheme.
The likelihood (1.3) can now be expressed in terms of the observable process as:
$$\exp\left(-\int_0^\infty \lambda(u\mid H(u))\,Y(u)\,du\right)\times\prod_{j=1}^{n}\lambda(t_j\mid H(t_j)), \tag{1.30}$$
and it can be used to estimate λ(t|H(t)).
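To make the role of the at-risk indicator in (1.30) concrete, the log-likelihood of one subject can be evaluated numerically. This is a minimal sketch under simplifying assumptions that are not in the text: the intensity is a function of time only (no history dependence), the subject is observed on a single window [tau0, tau], and the integral is approximated with a midpoint rule. The name `log_likelihood` and its arguments are illustrative.

```python
import math

def log_likelihood(event_times, intensity, tau0, tau, n_grid=10000):
    """Log of (1.30) for one subject with Y(u) = 1 on [tau0, tau] and 0 elsewhere."""
    step = (tau - tau0) / n_grid
    # Midpoint rule for the integral of intensity(u) * Y(u) over the window.
    integral = sum(intensity(tau0 + (k + 0.5) * step) for k in range(n_grid)) * step
    # Only events that fall inside the observation window contribute.
    observed = [t for t in event_times if tau0 <= t <= tau]
    return -integral + sum(math.log(intensity(t)) for t in observed)

# Illustrative check with a constant rate of 2 events per unit of time.
value = log_likelihood([0.4, 1.1, 2.5], lambda t: 2.0, tau0=0.0, tau=3.0)
print(value, -2.0 * 3.0 + 3 * math.log(2.0))   # numerical value and closed form
```

Events outside the window are dropped and the integral stops at the window boundaries, which is exactly the effect of multiplying the intensity by Y(u).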
1.5.2 General intensity-based model
In the previous sections the intensity functions of renewal processes and of the counting
processes have been analyzed. In case of renewal processes the intensity is a function
of the time since the last event. This function is called hazard function in analogy to
survival analysis. In case of counting process the intensity is called rate function. Both
models can be extended with covariates and random effects, by multiplying the baseline
intensity function with a function of a linear combination of the covariates and/or with a
parameter called frailty, which represents the variability between individuals that is not
captured by the observed features.
The two models can be combined in order to have dependence both on the recurrent
event count and on the gap times. In this case the intensity can be written as:
$$\lambda(t\mid H(t)) = \exp\big(\alpha+\beta\,g_1(t)+\gamma\,I(N(t^-)>0)\,g_2(t-T_{N(t^-)})\big). \tag{1.31}$$
The functions g1(t) and g2(t) express the dependence on calendar time and on the
time since the last event, respectively. When the parameter γ is equal to 0 the recurrent
event process is a Poisson process, and when the parameter β is equal to 0 the recurrent
event process is a renewal process, since the intensity depends only on the waiting times.
The intensity depends on the process itself, and so it is not always possible to have a well
defined analytical framework like in the Poisson process model. However, thanks to (1.5),
it is possible to simulate the gap-times and hence to have a Monte Carlo estimate of the
law of N(t), for any t.
1.5.3 Multi-state Markov models
There are at least two possible approaches to introduce the dependence of the
recurrent event process on the number of events experienced until time t.
The first is to include a function of N(t−) among the covariates, while the second is to model
the process as a multi-state Markov model. In this framework every individual
is, at every time, in a state that corresponds to the cumulative number of
events experienced until that moment. The only possible transitions are those
from state k to state k+1, and every transition is associated with an
intensity αk(t), where
$$\alpha_k(t) = \lim_{\Delta t\to 0}\frac{P\big(N(t)-N(t-\Delta t)=1\mid N(t-\Delta t)=k,\, H(t)\big)}{\Delta t}. \tag{1.32}$$
Hence, the intensity of the process can be written as:
$$\lambda(t\mid H(t)) = \sum_{k=0}^{\infty}\alpha_k(t)\,I(N(t)=k). \tag{1.33}$$
In the case αk(t)=α(t) for every k the model is the canonical Poisson process with α(t)
as a rate function.
1.5.4 Modelling the baseline intensity function
Once covariates and frailties have been introduced in the model, an important issue
is the choice of the baseline intensity function. This choice can be either parametric
or non-parametric. The simplest parametric choice for the baseline intensity function
is the constant intensity. This choice implies a homogeneous Poisson process, where
gap times are distributed as Exponential random variables and the mean function is
linear in time.
In some contexts the intensity function cannot be constant over time. This is the case
either of diseases with significant infant mortality (decreasing intensity
function) or of aging processes in which events become more likely once some
time has passed (increasing intensity function). Then a possible extension is the Weibull
model, with intensity $\lambda\alpha t^{\alpha-1}$, where the time to the first event has density:
$$f(x\mid\lambda,\alpha) = \lambda\,\alpha\,x^{\alpha-1}e^{-\lambda x^{\alpha}}\,I_{\{x\ge 0\}}(x). \tag{1.34}$$
If α> 1 then the intensity is increasing, if α= 1 the intensity is constant, otherwise it is
decreasing. Under this assumption:
$$N(t)\sim \mathrm{Poisson}(\lambda t^{\alpha}),\quad \forall t\ge 0 \tag{1.35}$$
and {N(t) : 0≤ t} is a Poisson process.
The baseline intensity function can assume a non-parametric form in the following
way. Let us divide the observation time into K disjoint intervals taking a0 = 0 < a1 < . . . < aK
as cut-points. For each of the resulting sub-intervals (ak−1,ak] let us assume that
the intensity is constant and equal to λk > 0. Now the baseline intensity function is
characterized by the vector of parameters (λ1, . . . ,λK ). This kind of model can approximate
the shape of any intensity function, and the approximation improves as K
grows. However, a larger K implies more parameters to estimate and thus a
greater computational effort. In this case, including time-varying covariates {xi(t) : 0≤ t},
the likelihood of the model can be expressed as the product of the contributions that
every individual makes on each interval:
$$\prod_{k=1}^{K}\left\{\lambda_k^{\,n_{\cdot k}}\prod_{i=1}^{M}\exp\left(\sum_{j=1}^{n_i} x_i(t_{ij})'\beta\; I_{(a_{k-1},a_k]}(t_{ij}) - \lambda_k\int_{a_{k-1}}^{a_k} Y_i(s)\exp\big(x_i(s)'\beta\big)\,ds\right)\right\}, \tag{1.36}$$
where tij is the time of the j-th event experienced by the i-th individual, M is the
total number of individuals, and $n_{\cdot k}=\sum_{i=1}^{M} n_{ik}$, where nik is the total number of events
between ak−1 and ak experienced by the i-th individual.
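The piecewise-constant baseline described above is straightforward to code. The sketch below is only illustrative: the function names, the cut-points and the step heights are made up, and the convention of extending the last step beyond aK is an arbitrary choice of this example.

```python
import bisect

def baseline_intensity(t, cuts, lams):
    """Return lams[k-1] for t in (cuts[k-1], cuts[k]], with cuts = [a_0, ..., a_K]."""
    k = bisect.bisect_left(cuts, t)       # index of the first cut-point >= t
    k = min(max(k, 1), len(lams))         # t = a_0 uses the first step, t > a_K the last
    return lams[k - 1]

def cumulative_intensity(t, cuts, lams):
    """Integral of the baseline intensity over [0, min(t, a_K)]."""
    total = 0.0
    for k, lam in enumerate(lams, start=1):
        lo, hi = cuts[k - 1], cuts[k]
        total += lam * max(0.0, min(t, hi) - lo)   # overlap of [0, t] with (lo, hi]
    return total

cuts = [0.0, 1.0, 2.5, 4.0]   # a_0 < a_1 < a_2 < a_3, illustrative
lams = [3.0, 1.5, 0.5]        # lambda_1, lambda_2, lambda_3, illustrative
print(baseline_intensity(1.2, cuts, lams))    # second interval
print(cumulative_intensity(2.0, cuts, lams))  # 3.0 * 1.0 + 1.5 * 1.0
```

The cumulative intensity is piecewise linear, and it is the quantity that multiplies the covariate term in the integral of (1.36) when covariates are constant on each interval.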
The cut-points can be chosen in different ways. In order to guarantee an estimate of every
λk, at least one observation must fall into the corresponding interval; for
this reason one possible choice is to set ak as the k/K
empirical quantile of the distribution
of the event times. Another possible choice, simpler and independent of the observations,
is to divide the observation time into K equispaced intervals. Of course this modelling choice
must be the object of a sensitivity analysis, both on the number of cut-points K and on
their position on the time domain. In the literature of survival analysis Gustafson et al.
(2003) suggest the use of the quantiles, while Yin et al. (2006) and Sahu et al. (1997)
propose the use of equispaced grids. In the field of recurrent event process in a Bayesian
setting Pennell and Dunson (2006) use a tightly spaced grid and an auto-correlated prior
in order to borrow strength between intervals.
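The two ways of choosing the cut-points can be sketched as follows. The function names and the event times are hypothetical; the quantile version relies on `statistics.quantiles` from the Python standard library with the "inclusive" method, which is one of several possible quantile definitions and is an assumption of this example.

```python
import statistics

def quantile_cut_points(event_times, K):
    """Cut-points a_0 = 0 < ... < a_K with a_k at the k/K empirical quantile."""
    inner = statistics.quantiles(event_times, n=K, method="inclusive")  # K - 1 values
    return [0.0] + inner + [max(event_times)]

def equispaced_cut_points(t_max, K):
    """K intervals of equal length on [0, t_max]."""
    return [t_max * k / K for k in range(K + 1)]

event_times = [0.5, 1.0, 1.5, 2.0, 4.0, 8.0, 9.0, 9.5]   # made-up event times
print(quantile_cut_points(event_times, K=4))
print(equispaced_cut_points(10.0, K=4))
```

With the quantile grid every interval contains roughly the same number of events, while the equispaced grid may leave some intervals with very few events when the event times are unevenly spread.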
If one imposes that λ1, . . . ,λK are independent random variables distributed with
proper Gamma distributions, the resulting cumulative intensity function $\mu(t)=\int_0^t\lambda(u)\,du$
is a realization of a Gamma process (see Kalbfleisch, 1978), which is a particular stochastic
process built such that the increments are independent Gamma-distributed random variables,
namely:
$$\mu(t)-\mu(s)\sim\Gamma\big(\phi(t)-\phi(s),\; c\big), \tag{1.37}$$
where φ(t) is an increasing function, and c is a positive-valued parameter.
1.6 The Bayesian approach
1.6.1 Bayesian Statistics
In a statistical model once a dataset y= (y1, . . . , yn) is observed it is possible to associate
a measure of beliefs through p(y|θ), which depends on a vector of parameters θ. p(y|θ)
is called likelihood, and it is a probability measure. The vector θ typically summarises
the characteristics of the population from which the dataset y is sampled. While in
the frequentist framework θ is a fixed number, in Bayesian statistics it is a random
variable, and a probability measure π(θ) is associated with its possible values. π(θ) is
called the prior distribution. Hence the likelihood function p(y|θ) has to be interpreted as the
probability associated with y when θ is the true parameter vector. Summarising:
• π(θ) is a measure of beliefs that θ represents the true characteristics of the popula-
tion;
• p(y|θ) is a measure of beliefs that y would be sampled from the population if θ is
the true parameter.
The Bayesian approach offers a way to update the prior beliefs about θ through the
computation of the posterior distribution π(θ|y), which summarises the
beliefs about θ once y is observed. This is done by using Bayes' theorem:
$$\pi(\theta\mid y) = \frac{p(y\mid\theta)\,\pi(\theta)}{\int p(y\mid\theta)\,\pi(\theta)\,d\theta}, \tag{1.38}$$
where the integral is over the whole support of θ.
Once this function is known it is possible to compute all the summaries of the posterior
distribution, like the posterior mean E[θ|y] and the posterior variance Var[θ|y], or to build
an interval estimate C such that P(θ ∈ C|y) = 1−α.
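Bayes' theorem (1.38) can be illustrated on a toy problem where the answer is known in closed form. The sketch below assumes Poisson counts with a Gamma(a, b) prior (shape a, rate b) on the rate θ, for which the posterior is Gamma(a + Σy, b + n); the data, the hyperparameters and the grid are made up for illustration, and the denominator of (1.38) is approximated with a Riemann sum.

```python
import math

def grid_posterior(counts, a, b, grid):
    """Posterior density of theta on an equispaced grid, following (1.38)."""
    def log_unnormalised(theta):
        log_lik = sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in counts)
        log_prior = (a - 1.0) * math.log(theta) - b * theta   # Gamma prior up to a constant
        return log_lik + log_prior
    logs = [log_unnormalised(theta) for theta in grid]
    top = max(logs)                                  # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    step = grid[1] - grid[0]
    normaliser = sum(weights) * step                 # Riemann sum for the denominator of (1.38)
    return [w / normaliser for w in weights]

counts = [3, 1, 4, 2, 2]          # made-up data
a, b = 2.0, 1.0                   # made-up prior hyperparameters
step = 0.001
grid = [step * (k + 1) for k in range(20000)]        # theta in (0, 20]
density = grid_posterior(counts, a, b, grid)
post_mean = sum(theta * d for theta, d in zip(grid, density)) * step
print(post_mean, (a + sum(counts)) / (b + len(counts)))   # grid value and conjugate closed form
```

In realistic models no closed form is available and the grid approach does not scale with the dimension of θ, which motivates the simulation methods of the next section.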
The Bayesian method offers a typical scientific approach where some hypotheses on a
phenomenon (summarised in π(θ)) are validated against the collected data y, yielding a
new point of view, namely the posterior distribution π(θ|y).
In this thesis the Bayesian approach will be followed. In Chapter 3 the statistical
model is set up with the likelihood of the data and the prior elicitation, while in Chapter
4 the posterior inference is shown and commented.
1.6.2 Monte Carlo Markov Chains
Equation (1.38) is usually an intractable expression, hence inference is typically carried
out by simulating a sample from the posterior distribution. Monte Carlo Markov Chains
(MCMC) methods offer a way to do so.
MCMC is a class of algorithms in which a Markov chain whose stationary distribution
is the posterior distribution is simulated. This means that every step of the Markov chain
can be considered as a draw from the posterior distribution, provided that the simulation
is run for long enough. The MCMC algorithm generates a Markov chain θ(1), . . . ,θ(T), where
θ(t) is independent of θ(1), . . . ,θ(t−2) conditionally on θ(t−1). Then, under general conditions,
if T →∞ and if h(θ) is a measurable function:
$$\frac{1}{T}\sum_{t=1}^{T} h\big(\theta^{(t)}\big) \to \int h(\theta)\,\pi(\theta\mid y)\,d\theta = E[h(\theta)\mid y]. \tag{1.39}$$
Hence all the summaries of the posterior distribution can be approximated by averaging
over the MCMC sample.
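The ergodic average (1.39) can be illustrated with a very simple sampler. The sketch below uses a random-walk Metropolis algorithm, which is a different and much simpler MCMC scheme than the Hamiltonian Monte Carlo used in this thesis; the target is an assumed Gamma density with shape 14 and rate 6, chosen only because its mean 14/6 is known, and the step size, chain length and burn-in are arbitrary.

```python
import math
import random

def log_target(theta):
    """Log density, up to a constant, of a Gamma with shape 14 and rate 6."""
    if theta <= 0:
        return -math.inf
    return 13.0 * math.log(theta) - 6.0 * theta

def metropolis(log_density, start, step, n_iter, rng):
    """Random-walk Metropolis with Gaussian proposals of standard deviation `step`."""
    chain, current, current_lp = [], start, log_density(start)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, target ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

rng = random.Random(1)
chain = metropolis(log_target, start=2.0, step=0.8, n_iter=50000, rng=rng)
kept = chain[5000:]                          # discard an arbitrary burn-in
estimate = sum(kept) / len(kept)             # ergodic average (1.39) with h(theta) = theta
print(estimate, 14.0 / 6.0)
```

Successive states of the chain are correlated, so the precision of the average is lower than that of an independent sample of the same size.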
The MCMC algorithm used in this thesis is the Hamiltonian Monte Carlo (HMC),
which is efficiently implemented in a software called Stan (see Stan Development Team
and others, 2016). Stan is an open source software written in C++ that can be integrated
with the software R with the package rstan.
1.6.3 Discretization of the Gamma process prior
A possible implementation of a non-parametric intensity model is the Gamma process.
In Johnson et al. (2010), Section 13.2.5, a discretization of the Gamma process prior
in the survival setting is given, but this can be extended to the framework of recurrent
events. The model is the following.
First of all, a partition of the time domain must be given. Let us call it a0 := 0, . . . ,aK .
Then the idea is to center the intensity function on a certain value λ∗, which corresponds
to the intensity function of an Exponential random variable of parameter λ∗. As a
consequence, all the pieces of the intensity function must satisfy the equation:
E[λk|λ∗]=λ∗. (1.40)
A further requirement is that the prior variance of each λk is inversely proportional
to the length of the corresponding interval ak −ak−1 and to another parameter called w,
which is common to all the steps of the intensity function. Once mean and variance are
defined, the last condition to impose is that the increments are Gamma-distributed, and
this is equivalent to:
$$\lambda_k\mid w,\lambda^{*}\ \overset{\text{ind}}{\sim}\ \mathrm{Gamma}\big(\lambda^{*}w\,(a_k-a_{k-1}),\; w\,(a_k-a_{k-1})\big),\quad k=1,\dots,K. \tag{1.41}$$
The parameters λ∗ and w can be fixed or they can be modelled with a prior distribution.
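The prior on the steps of the intensity function can be simulated to check its mean and variance. The sketch below assumes that the second argument of the Gamma distribution is a rate, so that each step has prior mean λ∗ and prior variance λ∗ divided by w times the interval length; the function name, the cut-points and the values of λ∗ and w are illustrative.

```python
import random
import statistics

def sample_baseline_steps(cuts, lam_star, w, rng):
    """One prior draw of the steps (lambda_1, ..., lambda_K) on the grid `cuts`."""
    steps = []
    for k in range(1, len(cuts)):
        width = cuts[k] - cuts[k - 1]
        shape, rate = lam_star * w * width, w * width
        # random.gammavariate takes (shape, scale), hence scale = 1 / rate.
        steps.append(rng.gammavariate(shape, 1.0 / rate))
    return steps

rng = random.Random(42)
cuts = [0.0, 1.0, 5.0]            # two intervals of length 1 and 4, illustrative
draws = [sample_baseline_steps(cuts, lam_star=2.0, w=5.0, rng=rng) for _ in range(20000)]
first = [d[0] for d in draws]
second = [d[1] for d in draws]
print(statistics.mean(first), statistics.mean(second))          # both close to lam_star
print(statistics.variance(first), statistics.variance(second))  # shorter interval, larger variance
```

Larger values of w concentrate every step around λ∗, so w acts as a measure of confidence in the prior guess.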
Another important reference for Bayesian modelling of recurrent events is Ouyang
et al. (2013), which also discusses the case where the termination of the
observation of the recurrent event process depends on the process itself. In their work
Ouyang et al. (2013) propose to model the steps of the intensity function as a priori
independent and identically distributed, which is the approach used in this
thesis.
1.6.4 Autocorrelated prior for the baseline intensity function
In Pennell and Dunson (2006) the prior structure of λ1, . . . ,λK is built to have correlations
among the parameters. Every step of the intensity function is written as
$$\lambda_k = \tilde\lambda_k\,\Delta_k, \tag{1.42}$$
where $\tilde\lambda_k$ is the initial guess on the baseline intensity in that interval and $\Delta_k$ is a
multiplicative effect. The multiplicative effects are modelled in the following way:
$$\Delta_k = \nu_0\prod_{h=1}^{k}\nu_h,\quad k=1,\dots,K \tag{1.43}$$
$$\nu_0\sim\mathrm{Gamma}(\phi,\phi) \tag{1.44}$$
$$\nu_k\ \overset{\text{i.i.d.}}{\sim}\ \mathrm{Gamma}(\psi,\psi),\quad k=1,\dots,K. \tag{1.45}$$
It can be noticed that $\Delta_k=\Delta_{k-1}\nu_k$, and so a covariance structure is induced in the
multiplicative effects. Moreover φ controls the degree of shrinkage of the posterior towards
the initial guess on the baseline, and ψ regulates the smoothness in the deviations from
the prior estimate.
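The construction of the multiplicative effects can be simulated in a few lines. This is a sketch under the assumption that Gamma(φ,φ) and Gamma(ψ,ψ) are parameterized by shape and rate, so that every factor has prior mean 1; the function name and the values of K, φ and ψ are illustrative.

```python
import random
import statistics

def sample_multiplicative_effects(K, phi, psi, rng):
    """One prior draw of the K multiplicative effects as cumulative products
    of a common factor nu_0 and step-specific factors."""
    current = rng.gammavariate(phi, 1.0 / phi)       # nu_0, mean 1 and variance 1 / phi
    effects = []
    for _ in range(K):
        current *= rng.gammavariate(psi, 1.0 / psi)  # multiply by the next factor
        effects.append(current)
    return effects

rng = random.Random(7)
samples = [sample_multiplicative_effects(K=5, phi=20.0, psi=50.0, rng=rng) for _ in range(10000)]
mean_first = statistics.mean([s[0] for s in samples])
print(mean_first)   # close to 1, so the prior is centred on the initial guess
```

Since consecutive effects share all the previous factors, neighbouring steps of the baseline are positively correlated a priori, and a larger ψ makes them closer to each other.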
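Another autocorrelated prior is proposed in Arjas and Gasbarra (1994). In this case
$$\lambda_1\sim\mathrm{Gamma}(\alpha_1,\beta_1) \tag{1.46}$$
$$\lambda_k\mid\lambda_{k-1},\dots,\lambda_1\sim\mathrm{Gamma}\left(\alpha,\ \frac{\alpha}{\lambda_{k-1}}\right),\quad k=2,\dots,K, \tag{1.47}$$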
with α1 and β1 that have to be chosen in order to model the value at time t = 0 of
the baseline intensity function. The parameter α is inversely proportional to the prior
variance of the parameters λk. In fact from the following equations
$$E[\lambda_k\mid\lambda_{k-1},\dots,\lambda_1] = \lambda_{k-1} \tag{1.48}$$
$$\sqrt{\mathrm{Var}[\lambda_k\mid\lambda_{k-1},\dots,\lambda_1]} = \frac{\lambda_{k-1}}{\sqrt{\alpha}}, \tag{1.49}$$
it can be noticed that, if α is very small, large deviations from the mean are allowed. In
the limiting case of α→∞ the baseline intensity function is a priori constant. Equation
(1.48) is equivalent to assuming that the baseline intensity function has a martingale
structure with respect to the prior distribution and the internal filtration.
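The martingale behaviour expressed by (1.48) and the role of α can be seen by simulating the prior. The sketch assumes the shape and rate parameterization of the Gamma distribution, which is the one consistent with (1.48) and (1.49); the function name and all numerical values are illustrative.

```python
import random
import statistics

def sample_martingale_steps(K, alpha1, beta1, alpha, rng):
    """One prior draw of the K steps: the first from Gamma(alpha1, beta1),
    each following one from a Gamma centred on the previous step."""
    lam = rng.gammavariate(alpha1, 1.0 / beta1)      # (shape, scale) with scale = 1 / rate
    steps = [lam]
    for _ in range(K - 1):
        lam = rng.gammavariate(alpha, lam / alpha)   # shape alpha, rate alpha / previous step
        steps.append(lam)
    return steps

rng = random.Random(3)
third = [sample_martingale_steps(3, 4.0, 2.0, 50.0, rng)[2] for _ in range(20000)]
print(statistics.mean(third), 4.0 / 2.0)   # the prior mean of every step equals alpha1 / beta1

flat = sample_martingale_steps(5, 4.0, 2.0, 1e6, rng)
print(flat)                                # with a very large alpha the steps are almost equal
```

Small values of α allow abrupt changes between consecutive steps, while large values force an almost flat baseline.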
1.7 Model evaluation in terms of predictive performances
The fitting of a statistical model is often followed by its evaluation in terms of predictive
accuracy. The idea is to obtain an unbiased and accurate measure of the out-of-sample
predictive error. This issue has been tackled also in Bayesian statistics, for example in
Gelman et al. (2014) and Vehtari et al. (2017).
The most natural way to estimate the predictive error is through cross-validation,
however it requires multiple fits of the model and, especially in the Bayesian setting,
this could be a problem because of the computational burden of the MCMC methods.
Alternative methods aim to estimate the out-of-sample predictive error with the data,
using a correction for the bias that arises from evaluating the model’s prediction on
the data used to fit it. Some of these measures are the Akaike Information Criterion
(AIC), the Deviance Information Criterion (DIC), or the Watanabe–Akaike information
criterion (WAIC), which is a fully Bayesian method.
1.7.1 Log posterior predictive density
Consider data y1, . . . , yn modeled as observations of independent random variables given
parameter θ. The contribution of the single data point yi to the likelihood of the model is
p(yi|θ), while the total likelihood is $p(y\mid\theta)=\prod_{i=1}^{n}p(y_i\mid\theta)$. The notation can be generalized
to the case with covariates by substituting p(yi|θ) with p(yi|θ, xi). If a new data point
$\tilde y$ is produced by the true data generating process, the out-of-sample predictive fit for
this datum can be computed as:
$$\log p(\tilde y\mid y_1,\dots,y_n) = \log E\big[p(\tilde y\mid\theta)\mid y_1,\dots,y_n\big] = \log\int p(\tilde y\mid\theta)\,p(\theta\mid y_1,\dots,y_n)\,d\theta \tag{1.50}$$
This quantity can be estimated by:
$$\mathrm{lppd} = \log\prod_{i=1}^{n}p(y_i\mid y_1,\dots,y_n) = \sum_{i=1}^{n}\log\int p(y_i\mid\theta)\,p(\theta\mid y_1,\dots,y_n)\,d\theta, \tag{1.51}$$
where lppd stands for log pointwise predictive density. Equation (1.51) is a biased
estimate of (1.50), since the out-of-sample predictive fit is evaluated at the observed data
points themselves: the observation yi appears both in the likelihood p(yi|θ) and in
p(θ|y1, . . . , yn), which is the posterior distribution of θ.
To compute (1.51), it is possible to evaluate the expectation using draws from the posterior
distribution of the parameters p(θ|y1, . . . , yn), which are indicated as θ(s), s = 1, . . . ,S:
$$\text{computed lppd} = \sum_{i=1}^{n}\log\left(\frac{1}{S}\sum_{s=1}^{S}p\big(y_i\mid\theta^{(s)}\big)\right) \tag{1.52}$$
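In practice (1.52) is computed from a matrix of pointwise log-likelihood values, working on the log scale to avoid numerical underflow. The following sketch assumes that such a matrix is available as a list of rows, one per posterior draw; the names `log_mean_exp` and `computed_lppd` and the toy numbers are illustrative.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))

def computed_lppd(log_lik):
    """Equation (1.52), where log_lik[s][i] = log p(y_i | theta^(s))."""
    n = len(log_lik[0])
    return sum(log_mean_exp([row[i] for row in log_lik]) for i in range(n))

# Two posterior draws (rows) and two observations (columns), made-up values.
log_lik = [[math.log(0.2), math.log(0.5)],
           [math.log(0.4), math.log(0.5)]]
print(computed_lppd(log_lik), math.log(0.3) + math.log(0.5))
```

The average is taken over the draws inside the logarithm, so the result differs from the posterior mean of the log-likelihood.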
1.7.2 Computation of WAIC
WAIC (introduced by Watanabe in 2010) estimates the out-of-sample predictive measure
by computing expression (1.52) and then adding a bias correction. Then, the expected log
pointwise predictive density is computed as:
$$\mathrm{elppd}_{\mathrm{WAIC}} = \mathrm{lppd} - p_{\mathrm{WAIC}}, \tag{1.53}$$
where $p_{\mathrm{WAIC}}$ is the adjustment, which can be computed in two ways:
• $p_{\mathrm{WAIC}1} = 2\sum_{i=1}^{n}\big(\log E[p(y_i\mid\theta)\mid y_1,\dots,y_n] - E[\log p(y_i\mid\theta)\mid y_1,\dots,y_n]\big)$;
• $p_{\mathrm{WAIC}2} = \sum_{i=1}^{n}\mathrm{Var}[\log p(y_i\mid\theta)\mid y_1,\dots,y_n]$.
Both the measures can be approximated once an MCMC sample is available.
Gelman et al. (2014) recommend $p_{\mathrm{WAIC}2}$, because, in its series expansion, equation (1.53)
resembles leave-one-out cross validation.
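Both adjustments can be approximated from the same matrix of pointwise log-likelihood values used for the lppd. The sketch below assumes one row per posterior draw and uses the sample variance over the draws for the second adjustment; the function name `waic` and the returned dictionary keys are illustrative.

```python
import math
import statistics

def waic(log_lik):
    """WAIC summaries from log_lik[s][i] = log p(y_i | theta^(s))."""
    n = len(log_lik[0])
    lppd = p_waic1 = p_waic2 = 0.0
    for i in range(n):
        column = [row[i] for row in log_lik]         # values for observation i over the draws
        top = max(column)
        log_mean = top + math.log(sum(math.exp(v - top) for v in column) / len(column))
        lppd += log_mean
        p_waic1 += 2.0 * (log_mean - statistics.fmean(column))
        p_waic2 += statistics.variance(column)       # sample variance over the draws
    return {"lppd": lppd, "p_waic1": p_waic1, "p_waic2": p_waic2,
            "elppd_waic": lppd - p_waic2}

# One observation and two made-up posterior draws.
result = waic([[math.log(0.2)], [math.log(0.4)]])
print(result)
```

When all the draws give the same log-likelihood, both adjustments vanish and the expected log pointwise predictive density coincides with the lppd.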
1.7.3 Evaluating predictive accuracy in the case of recurrent events
All the formulas in the previous section rely on a partition of the data into units
(the yi's, for which it is possible to compute the probability p(yi|θ)).
In the case of recurrent event processes one possibility is to consider the whole process of
events for every individual i in the study. Hence p(yi|θ) is:
$$\exp\left(-\int_0^\infty\lambda_i(u\mid H_i(u),\theta)\,Y_i(u)\,du\right)\times\prod_{j=1}^{n_i}\lambda_i(t_{ij}\mid H_i(t_{ij}),\theta). \tag{1.54}$$
In the case of a multiplicative model, with random effects and with the presence of
covariates,
$$\lambda_i(t\mid H_i(t),\theta) = w_{new}\,\lambda_0(t\mid H(t),\theta)\exp(x_i'\beta). \tag{1.55}$$
Since the main interest lies in predictive accuracy, wnew is not the random effect of
individual i (which is estimated in the model), but the frailty of a new incoming
individual given the observations.
2 Data source
In this chapter details on the dataset that has been analyzed are given.
The first section is devoted to presenting the AVIS association, from its history to the
rules that regulate blood donations. All the information given is taken from the
websites of AVIS and AVIS Milan.
It is followed by a thorough description of the AVIS and EMONET databases (the data sources).
2.1 The AVIS association
2.1.1 Brief history of AVIS
The Associazione Volontari Italiani Sangue (AVIS) was born in Milan in 1927 thanks to
the physician Vittorio Formentano, who made an appeal in a daily newspaper of the time
to form a group of volunteer donors. Seventeen people answered the call and formed
the first AVIS group in history.
However, the official formation of the association dates to 1929; transfusion therapies
started to be accessible to everybody, and not only to wealthy people. At the same
time the memorandum of the association was approved. A passage of the memorandum
can be translated as follows: "The finality of the Association is to promote, especially
in the working class, the humanitarian, social and patriotic concept of the voluntary
offering of their own blood." In this period blood donor associations were born in
other cities like Ancona, Bergamo, Brescia, Torino, Napoli, Cagliari, Cremona.
With the purpose of coordinating the local groups spread across Italy, in 1946 the Association
assumed a national form, with Milan as headquarters.
In 1950 the Republic of Italy gave legal recognition to AVIS with Law n. 49; in
1967 Law n. 592 recognized the civic and social role of AVIS in the organization and
promotion in matter of transfusion, while in 1990 another law established the principle
of the gratuity of blood donation. Furthermore, it is stated that the voluntary blood
donor associations and the related federations contribute to the institutional aims of the
National Health Service concerning the promotion and development of blood donations
and the protection of donors.
The activity of the association became more and more popular: in 2005 AVIS reached
the goal of one million donors, and in 2009, for the first time since the foundation, more
than two million donations took place in Italy.
In 2017 AVIS had its 90th birthday; through its long life it has become one of the most
important voluntary associations in Italy.
2.1.2 Italian donation rules
Because of the importance of blood in healthcare, there are some rules that regulate
the mechanism of blood donations. These rules are meant to protect both the health
of the patient who will receive the blood and the health of the donor himself/herself. A
legislative act called "Disposizioni relative ai requisiti di qualità e sicurezza del sangue e
degli emocomponenti" (see Ministero Della Salute, 2015) collects all these rules.
Any candidate donor must be between 18 and 60 years old. However, the responsible
physician can allow a candidate donor older than 60 to donate for the first time.
The age limit is raised to 65 years for periodic donors; in this case too
the physician can allow a person to donate until the age of 70 after a clinical evaluation
of the age-related risks. Every donor must weigh more than 50 kg, and blood
pressure, heart rate and hemoglobin level must lie within
certain ranges. The yearly maximum number of donations is 4 for men and for women who
are in menopause, and 2 for the other women. By law, the minimum gap time between
two consecutive donations is 90 days. In order to respect the restriction on the yearly
maximum number of donations for women, their minimum gap time is set to 180 days,
but this is an internal rule of the association, not a legal limit. However, the responsible
physician can bring the donation forward if he or she thinks that the health and the well-being
of the donor are not in danger. The donor can be suspended from the activity for a certain
time or permanently if the donation can in some way compromise his/her own health status
or the quality of the donated component. Suspensions are not exceptional events; for
example, journeys to exotic countries, dental care, a change of partner or a recent flu
are some causes of temporary suspension. Of course, the length of the suspension is
related to the severity of the cause.
2.2 Data sources
The data of Milan’s AVIS section come from two databases: the EMONET database and
the AVIS database. Data used in this work have been collected from multiple tables of
the two databases. The EMONET database is made of tables concerning donations or
personal data of the donors; the AVIS database contains information about suspensions
and donors’ habits. All the data have been extracted using SQL queries on the AVIS’
servers, and have been joined with the unique ID of the donor and/or with the unique ID
of the blood donation. The dataset has been built only from the tables described in Tables 2.1 to
2.7. In the next subsections some tables describe the two databases.
2.2.1 The EMONET database
We have considered five tables in the EMONET database:
• tables PRESENTAZIONI and DONAZIONI contain some information about the
donations (see Tables 2.1 and 2.2);
• tables TIPIZZAZIONE and ANAGRAFICHE contain information about the donors
(see Tables 2.5 and 2.3);
• table EMC_DONABILI records the blood components that could be donated (see
Table 2.4).
| Column | Type | Description |
| --- | --- | --- |
| CAI | numerical | donor unique id |
| DTPRES | date-time | date and time |
| IDPRES | numerical | donation unique id |
| TIPO_ATTIVITA | categorical | (D for donation, C for control) |
| ID_PUNTPREL | numerical | AVIS location unique id |

Table 2.1: Variables from table PRESENTAZIONI in EMONET database that are included in our dataset
| Column | Type | Description |
| --- | --- | --- |
| CAI | numerical | donor unique id |
| DTPRES | date-time | date and time |
| IDPRES | numerical | donation unique id |
| ID_EMCDON | categorical | blood component unique id: 1 for whole blood, 2 for plasma, ... |
| VALIDITA | categorical | V if the donation was effective, N otherwise |

Table 2.2: Variables from table DONAZIONI in EMONET database that are included in our dataset
| Column | Type | Description |
| --- | --- | --- |
| CAI | numerical | donor unique id |
| SESSO | numerical | donor gender (1 for man, 2 for woman) |
| DATANASCITA | date | donor's date of birth |
| CAP_DOMIC | categorical | donor's domicile postal code |
| CAP_RESID | categorical | donor's residence postal code |

Table 2.3: Variables from table ANAGRAFICHE in EMONET database that are included in our dataset
| Column | Type | Description |
| --- | --- | --- |
| ID_EMCDON | categorical | blood component unique id: 1 for whole blood, 2 for plasma, ... |
| INTERVALLO | numerical | minimum gap time between two donations of the component |
| DESCR | character | description of the blood component |
| NDONMAXMAS | numerical | maximum number of donations in a year for men |
| NDONMAXFEM | numerical | maximum number of donations in a year for women |

Table 2.4: Variables from table EMC_DONABILI in EMONET database that are included in our dataset
| Column | Type | Description |
| --- | --- | --- |
| CAI | numerical | donor unique id |
| AB0 | numerical | blood type (A, A1, A2, A3, B, AB, A1B, A2B, 0) |
| TIPO_RH | categorical | Rhesus factor (POS or NEG) |

Table 2.5: Variables from table TIPIZZAZIONE in EMONET database that are included in our dataset
2.2.2 The AVIS database
In the AVIS database two tables have been considered. Table STILIVITA registers some
information about the lifestyle of the donors (Table 2.6), while all the suspensions have
been recorded in table SOSPENSIONI (Table 2.7).
| Column | Type | Description |
| --- | --- | --- |
| CAI | numerical | donor unique id |
| FUMO | categorical | smoking habits |
| ALCOOL | categorical | drinking habits |
| THE | categorical | tea consumption |
| CAFFE | categorical | coffee consumption |
| DIETA | categorical | diet type |
| STRESS | categorical | stress level |
| ATTIVITAFISICA | categorical | physical activity habits |
| CIRCONFERENZAVITA | numerical | abdominal circumference |
| ALTEZZA | numerical | height |
| PESO | numerical | weight |
| BMI | numerical | Body Mass Index |

Table 2.6: Variables from table STILIVITA in AVIS database that are included in our dataset
| Column | Type | Description |
| --- | --- | --- |
| CAI | numerical | donor unique id |
| TIPO_SOSP | categorical | T for temporary, D for definitive |
| DATAINSERIMENTO | date-time | suspension starting date |
| DATARIAMMISSIONE | date-time | suspension ending date |

Table 2.7: Variables from table SOSPENSIONI in AVIS database that are included in our dataset
2.2.3 Data selection
For this work the whole period going from the 1st of January 2010 to the 30th of June
2018 has been considered as observation time. The focus of the analysis is on donations of
whole blood performed in the main building of AVIS Milano, which is located in the district
of Lambrate. We have considered only "new" donors, namely people who have become
donors in this period, discarding all the others. For every donor there is an observation
interval that has its origin in his/her first whole blood donation and its end on the 30th of
June 2018, which is considered as the censoring time. According to these selection criteria
the dataset is composed of 9175 donors; each donor's observation time generally has a
different length from the others, with a different number of donations.
2.2.4 Suspensions
The donor can be suspended from his/her activity for a certain period of time if his/her
well-being or the quality of the blood component is in danger. These facts are registered
and the suspensions are collected in the databases of the Association (see Table 2.7).
In this period 805 suspensions related to 618 donors are registered. However, many
of these suspensions overlap; this may happen when, after a further control, the
suspension is extended because the reasons that prevent the person from donating remain. For
each suspension the beginning and the end times are recorded, and a categorical variable
named TIPO_SOSP indicates whether it is a permanent or a temporary suspension.
Among these, there are 421 temporary suspensions for 348 donors without an end date,
hence it is difficult to assess the effect of these suspensions on the individuals' donations.
The remaining ones are not respected in 92 cases, which is about 25% of the times,
so that a blood donation is performed during the suspension.
| | Definitive | Temporary |
| --- | --- | --- |
| NOT RESPECTED | 5 | 87 |
| RESPECTED | 42 | 250 |

Table 2.8: Frequency table that relates the type of suspension to the respect of the suspension
The fact that not all the suspensions are respected does not mean that there is a lack
of control by the Association on this issue; indeed the responsible physician can decide to
bring forward the end of the suspension, and this is probably the case. A possible solution to
this issue could be to take as the real end of the suspension the minimum between the
time of the next donation and the registered end of the suspension. However, the
temporary suspensions without an end time remain an issue, because with the above
solution an individual who does not return to donate for a
long time by his/her own choice can be confused with an individual for whom the donation is
precluded.
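The rule proposed above for the real end of a suspension can be written as a small helper. This is only a sketch: the function name `effective_suspension_end`, the use of `None` for a missing date and the example dates are assumptions made for illustration and do not reflect the actual data processing code.

```python
from datetime import date

def effective_suspension_end(registered_end, next_donation):
    """Minimum between the registered end of the suspension and the date of the
    next donation; a missing date is passed as None and ignored."""
    candidates = [d for d in (registered_end, next_donation) if d is not None]
    return min(candidates) if candidates else None

# Donation performed before the registered end: the suspension ends at the donation.
print(effective_suspension_end(date(2015, 6, 1), date(2015, 4, 10)))
# Temporary suspension without an end date: only the next donation is informative.
print(effective_suspension_end(None, date(2016, 1, 20)))
# No end date and no later donation: the end of the suspension stays unknown.
print(effective_suspension_end(None, None))
```

The last case is the problematic one discussed above, since the data alone cannot tell a donor who is still suspended from one who has chosen not to come back.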
Other available data concern donations of other blood components. It is possible to
treat the period of rest after each such donation as a suspension from the donations of whole
blood and to include this information in the analysis. From Tables 2.4, 2.1, 2.2 it is
possible to obtain the start and the end date of the period of inactivity due to donations
of blood components different from whole blood. Hence the data about suspensions are
completed with 727 observations related to 267 donors. Only 5 of these are not respected.
2.3 Features selection and data transformation
Feature          Levels   Description                               Missing
SESSO            2        Gender of the donor (1 Male, 0 Female)    8
FUMO             15       Daily number of cigarettes                315
ALCOOL           6        Daily weight of alcohol consumed          315
THE              7        Daily number of cups of tea               2843
CAFFE            7        Daily number of cups of coffee            2232
DIETA            7        Kind of diet                              315
ATTIVITAFISICA   15       Sport level                               315
PESO             -        Weight in kg                              315
ALTEZZA          -        Height in cm                              315
STRESS           5        From absent to stressed                   315
AB0              9        Blood type                                0
RH               2        Positive or negative                      0
CAP_DOMIC        1482     Postal code                               51
Table 2.9: Description of the features
There are many features in the databases that can be used as covariates in a statistical
model (see Sections 2.2.1 and 2.2.2). Let us focus on the ones described in Table 2.9.
There are missing values for some donors. When a feature had a large number of
missing values (as for the variables THE and CAFFE, namely the daily number of cups of tea
and coffee, see Table 2.9) the whole feature was discarded (column-wise deletion),
while for all the other features only the individuals with missing values were discarded
(row-wise deletion). Most of the features are categorical variables with many levels. In
order to make them suitable for a statistical model they have been transformed into
binary dummy variables.
• The variable FUMO takes value 1 if the donor is a smoker, 0 otherwise;
• the variable ALCOOL takes value 0 if the donor declares not to consume alcoholic
beverages, 1 otherwise;
• the variable ATTIVITAFISICA takes value 0 if the donor declares a sedentary
lifestyle, or considers his/her level of physical activity low or irregular, 1 otherwise;
• the blood type is transformed into a dummy variable with 4 levels (A, B, 0, AB). For
instance (1,0,0,0) is blood type A, (0,1,0,0) is B, and so on.
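A minimal sketch of this kind of encoding is given below. The function names and the raw inputs (a daily number of cigarettes, a blood group already reduced to one of the four labels) are assumptions made for illustration; the actual coding of the raw levels in the AVIS databases is not reproduced here.

```python
# Order of the levels as in the text: (A, B, 0, AB).
BLOOD_TYPES = ("A", "B", "0", "AB")


def blood_type_dummies(blood_type: str) -> tuple:
    """Encode the blood type as a 4-component indicator vector."""
    if blood_type not in BLOOD_TYPES:
        raise ValueError(f"unknown blood type: {blood_type!r}")
    return tuple(1 if blood_type == level else 0 for level in BLOOD_TYPES)


def smoker_dummy(cigarettes_per_day: int) -> int:
    """Hypothetical binarisation: 1 for any positive daily consumption."""
    return 1 if cigarettes_per_day > 0 else 0
```

For example, `blood_type_dummies("A")` gives `(1, 0, 0, 0)` and `blood_type_dummies("B")` gives `(0, 1, 0, 0)`, matching the convention stated above.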
The variables DIETA and STRESS do not seem useful for the analysis, since almost all
the donors declare a balanced diet and an absent level of stress.
Numerical features are also present:
• AGE is the age of the donor at the time of his/her first donation;
• from the variables PESO (weight) and ALTEZZA (height) the Body Mass Index
(BMI) has been computed as
\[ \mathrm{BMI} = \frac{\mathrm{Weight}}{\mathrm{Height}^2} \]
where the weight is expressed in kg and the height in meters.
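As a small worked example of the formula, the sketch below computes the BMI starting from a height recorded in centimetres (as in Table 2.9) and converts it to metres first. The function name and argument names are illustrative assumptions.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body Mass Index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100.0  # the formula requires metres
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2
```

For a donor of 70 kg and 175 cm this gives 70 / 1.75² ≈ 22.86.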
2.4 Descriptive analysis
2.4.1 Rate of donations and gap times
At the end of the data selection procedure 9175 donors were included in the dataset.
Altogether they made 34864 donations of whole blood in the period
from the 1st of January 2010 to the 30th of June 2018.
As can be seen in Table 2.10, about 35% of them only entered the study, without
making any further donation. Since the goal of the proposed models is to describe donations as
recurrent events, these individuals are excluded from the analysis. The others will be
called "recurrent donors".
The total number of donations of a donor does not give all the information about how
much a person donates in a certain time period. This number must be related to the time
during which each individual is observed, for example by dividing it by the years of observation.
The empirical rates of donation have been computed (only for recurrent donors) and are
shown in Figure 2.1. Notice that the empirical distribution of the yearly rate of donation
is right-skewed: most of the donors made fewer than two donations per year.
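The empirical yearly rate can be sketched as below. Several choices here are assumptions and not taken from the thesis: the observation window is taken from the first donation to the censoring date, the first donation is not counted as an event (in line with Table 2.10), and a year is taken to be 365.25 days.

```python
from datetime import date

CENSORING_DATE = date(2018, 6, 30)  # last day of observation
DAYS_PER_YEAR = 365.25              # assumed conversion factor


def empirical_yearly_rate(donation_dates, censoring=CENSORING_DATE):
    """Number of donations after the first one, divided by years observed."""
    dates = sorted(donation_dates)
    if not dates:
        raise ValueError("at least one donation is required")
    entry = dates[0]            # entry in the study = first donation
    n_events = len(dates) - 1   # donations after the first one
    years = (censoring - entry).days / DAYS_PER_YEAR
    if years <= 0:
        raise ValueError("no observation time after the first donation")
    return n_events / years
```

A donor who entered two years before the censoring date and donated twice afterwards has a rate of about one donation per year.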
Total donations (n)   Donors   Sample frequency
0                     3238     0.3529
1                     1608     0.1753
2                     1101     0.1200
3                     723      0.0788
4                     555      0.0605
5                     417      0.0454
6                     292      0.0318
7                     262      0.0286
8                     178      0.0194
9                     160      0.0174
10                    119      0.0130
11                    101      0.0110
12                    75       0.0082
13                    64       0.0070
>13                   282      0.0307
Table 2.10: Number of donors that made exactly n total donations (after the first one)
[Figure: histogram titled "Empirical yearly rate of donation"; x-axis RATE (0 to 4), y-axis Density (0.0 to 0.5)]
Figure 2.1: Histogram of the empirical rates of donation (number of donations divided by the years of observation)
There could be a problem of loss to follow-up. This can be
seen by computing, for each donor, the number of days elapsed from the last donation
to the censoring time (namely, the last day of observation). See Figure 2.2 for the boxplots
of this quantity.
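The quantity just described can be computed as in the following sketch; the function name and the representation of a donor as a list of donation dates are assumptions made for illustration.

```python
from datetime import date


def days_since_last_donation(donation_dates, censoring):
    """Days elapsed from the last observed donation to the censoring time."""
    last = max(donation_dates)
    if last > censoring:
        raise ValueError("donation recorded after the censoring time")
    return (censoring - last).days
```

Large values of this quantity are what suggest a possible loss to follow-up, although by themselves they do not distinguish a donor who stopped from one who is postponing.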
Loss to follow-up happens when an individual voluntarily abandons the study, and
so does not show up for a long period of time. However, blood donations are made on a
voluntary basis, hence we do not know whether a donor has actually decided to stop his/her activity
or is only postponing the next donation.
If one believes that the history of the process influences the fact that some individuals
do not show up for a while, then some choices about the censoring time $C_i$ have to be made,
and the dependence between the process and $C_i$ must be modelled (see Ouyang et al.
(2013) for event-dependent censoring times and Chapter 7 of Cook and Lawless (2007) for
more details about loss to follow-up).
[Figure: boxplots for non-recurrent donors and recurrent donors; y-axis in days (0 to 3000)]
Figure 2.2: Boxplot of the number of days elapsed from the last observed donation of each donor to his/her censoring time
[Figure, panel (a): boxplots; x-axis repetition (2 to 11), y-axis log(gap times) (4.5 to 7.5). Panel (b): mean and median; x-axis repetition (5 to 20), y-axis days (100 to 220)]
(a) Boxplots of the waiting times of all individuals $W_{i,j} = T_{i,j+1} - T_{i,j}$ grouped by the j-th repetition
(b) Trend of the mean and of the median of $W_{i,j}$ with respect to j
Figure 2.3: Trend of gap times with the number of donations
[Figure, panel (a): boxplots; x-axis year (1 to 9), y-axis log(gap times) (4.5 to 7.5). Panel (b): mean and median; x-axis year (2 to 8), y-axis days (100 to 250)]
(a) Boxplots of the waiting times of all individuals $W_{i,j} = T_{i,j+1} - T_{i,j}$ grouped by the year in which the events occurred
(b) Trend of the mean (red) and of the median (blue) of $W_{i,j}$ with respect to the year in which the events occurred
Figure 2.4: Trend of the gap times with the years passed since entrance
In Gianoli (2016) it has been observed that the waiting times between two events
seem to have a decreasing trend as the number of donations increases (see Figure 2.3).
Figure 2.3 gives some information about the rate of donation at time t conditionally on
the number of events that occurred up to that time. However, in this thesis we are interested
in investigating how the rates of donation change as time passes, without taking into
account the information on the number of events experienced. In this sense it is more
meaningful to investigate the relationship between the gap times and the year in which
the corresponding events occurred.
Looking at Figures 2.4a and 2.4b, a trend is not evident; in fact the
medians seem constant over the years (recall that the years are counted from the first
donation of each individual). The tails of the empirical distributions become longer, as
can be seen from the growth of the mean and from the boxplots.
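The grouping of the gap times by year since the first donation can be sketched as follows. The sketch assumes that each gap $W_{i,j} = T_{i,j+1} - T_{i,j}$ is assigned to the year (counted from the donor's first donation, with 365.25 days per year) in which the later donation $T_{i,j+1}$ falls; this assignment rule and all names are assumptions, not the procedure documented in the thesis.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def gap_times_by_year(donation_dates):
    """Return (year since first donation, gap in days) for one donor."""
    dates = sorted(donation_dates)
    if not dates:
        return []
    first = dates[0]
    out = []
    for prev, curr in zip(dates, dates[1:]):
        gap = (curr - prev).days                         # W_{i,j} in days
        year = int((curr - first).days // 365.25) + 1    # 1 = first year
        out.append((year, gap))
    return out


def summarise_by_year(donors):
    """Mean and median gap time per year, pooling all donors."""
    grouped = defaultdict(list)
    for dates in donors.values():
        for year, gap in gap_times_by_year(dates):
            grouped[year].append(gap)
    return {y: (mean(g), median(g)) for y, g in sorted(grouped.items())}
```

Comparing the mean with the median in each year is a quick way to see the lengthening of the tail mentioned above: a mean that grows while the median stays flat points to a few very long gaps.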
[Figure: histogram; x-axis log(gap times) (4.5 to 7.5), y-axis Frequency (0 to 2000)]
Figure 2.5: Histogram of the logarithm of the gap times
An interesting fact is the bimodality of the distribution of the gap times, which reflects
the different donation rules for the two genders: men are allowed to donate
again sooner than women. In Figure 2.5 the histogram of the logarithm of the gap times is shown; the red lines
correspond to the logarithms of 90 and 180 days, namely the minimum waiting times for men
and women respectively.
2.4.2 Covariates
As mentioned before, some features are used as covariates in the statistical model that
we propose. Time-dependent covariates are not taken into account in this work; all
the covariates we include in the models were registered at the time of entry into the study,
specifically the time at which a person signs up with AVIS.
In Table 2.11 all the categorical covariates are summarised with their sample
frequencies. Some of them are objective (like sex, blood type or Rhesus factor), while the
others are self-reported (smoking and alcohol habits and level of
physical activity).
Variable            Value              Sample frequency
Sex                 F                  0.372
                    M                  0.628
Smoke               Non-smoker         0.656
                    Smoker             0.344
Alcohol             Not consumer       0.697
                    Consumer           0.303
Physical Activity   Sedentary life     0.327
                    Active life        0.673
AB0                 A                  0.432
                    B                  0.123
                    AB                 0.012
                    0                  0.462
RH                  POS                0.865
                    NEG                0.135
DIETA               Balanced           0.938
                    Highly caloric     0.011
                    Lowly caloric      0.004
                    Vegan/Vegetarian   0.016
STRESS              Absent             0.0658
                    Negative 1         0.824
                    Negative 2         0.060
                    Negative 3         0.011
                    Positive           0.004
Table 2.11: Sample frequencies of the categorical variables
There are more men than women among the donors in the dataset. The most common
blood group is 0, and the positive Rhesus factor is more frequent than the negative
one. As far as lifestyle variables are concerned, donors seem to have a healthy life.
In fact there are more non-smokers than smokers, and most of the individuals
declare an active life. Moreover, the consumers of alcoholic beverages are outnumbered
by non-consumers.
Variable   Sample mean   Standard deviation
AGE        31.64         9.83
BMI        23.67         3.88
Table 2.12: Mean and standard deviation of the continuous variables
As for the continuous covariates, their empirical means and
standard deviations can be found in Table 2.12. These values are used in the standardization of these
features.
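A minimal sketch of the standardization is given below, under the assumption that the usual z-score (subtract the sample mean, divide by the sample standard deviation) is meant; the constants are the values reported in Table 2.12, while the function and constant names are illustrative.

```python
# Sample means and standard deviations from Table 2.12.
AGE_MEAN, AGE_SD = 31.64, 9.83
BMI_MEAN, BMI_SD = 23.67, 3.88


def standardize(value: float, sample_mean: float, sample_sd: float) -> float:
    """z-score: (value - mean) / standard deviation (assumed definition)."""
    if sample_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - sample_mean) / sample_sd
```

With this convention a donor whose first donation was at 31.64 years has standardized AGE equal to 0, and one standard deviation above the mean corresponds to about 41.5 years.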
From this table it can be seen that on average a person becomes a donor
at about 32 years of age. Notice in the boxplots in Figure 2.7 that the first and the third
quartiles are at about 25 and 40 years.
The Body Mass Index is a continuous measure that classifies the weight
status of a person, from underweight to severe obesity.
[Figure: six boxplot panels with y-axis BMI (15 to 45), grouped by Sex (F, M), Active life (0, 1), Smoke (NO, YES), Alcohol (NO, YES), Rhesus factor (NEG, POS) and Blood type (0, A, AB, B)]
Figure 2.6: Boxplots of the BMI according to the values of the categorical covariates
The normal weight range goes from $18\,\mathrm{kg/m^2}$ to $25\,\mathrm{kg/m^2}$. From Table 2.12 and Figure
2.6, one can say that the donors are, on the whole, in a healthy condition.
Figures 2.6 and 2.7 are useful to look for correlation patterns between the
continuous and the categorical variables. However, it is not clear from the boxplots whether any
significant correlation exists.
The goal of the model proposed in this thesis is to estimate the donation
rate, namely the number of donations per unit of time. It is evident from Figure 2.8 that
the distribution of the rates reaches higher values for males than for females. This was
expected since, according to the law, men can donate twice as often as
women. No other correlations are evident in the figure.
[Figure: six boxplot panels with y-axis First donation age (20 to 60), grouped by Sex (F, M), Active life (0, 1), Smoke (NO, YES), Alcohol (NO, YES), Rhesus factor (NEG, POS) and Blood type (0, A, AB, B)]
Figure 2.7: Boxplots of the first donation age according to the values of the categorical covariates
In Figure 2.9 the logarithm of the empirical rate is plotted against the corresponding
values of the BMI and of the first donation age. The red line is the one obtained with the
OLS estimator. The estimated correlation is positive for both covariates, but further
investigation is needed in order to establish the significance of this relationship.
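A line of this kind can be obtained with the closed-form simple linear regression estimator, sketched below for a single covariate. This is an illustrative implementation with assumed names, not the code used to produce the figure.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx                   # sign matches the sample correlation
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

Here `x` would hold the BMI (or the first donation age) and `y` the logarithm of the empirical rate of the same donors. The sign of the slope agrees with the sign of the sample correlation, but it says nothing about its significance.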
[Figure: six boxplot panels with y-axis Donation rate (N/years) (0 to 6), grouped by Sex (F, M), Active life (NO, YES), Smokers (NO, YES), Alcohol consumers (NO, YES), Blood type (0, A, AB, B) and Rhesus factor (NEG, POS)]
Figure 2.8: Boxplots of the donation rate grouped by the categorical variables
[Figure: scatter plots of log(rate) (y-axis, −1.5 to 1.5) against BMI (x-axis, 15 to 45) and against the first donation age]
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●● ●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
● ●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
● ●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
● ●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
● ●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●● ●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●● ●
●
●
●●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●●
●
●
●●
●
●
●●●
● ●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●●
●
●
●
●●
●
●
●
●
●●
● ● ●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●● ●●
● ●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●●
●
●●●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●● ●
●
●●
●
● ●
●
●
●●
●● ●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●● ●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
● ● ●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
● ●
●
●●
●
●
●
● ●
●●
●
●
●
●
●●●
●●
●●
●
● ●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
● ●
●
●● ●
●
●●
●●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●● ●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
● ●
●●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
● ●
●●●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●●●
●
●●●
●
●
● ●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●●
●
●
●
●
● ●● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●● ●
●
●
●● ●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
● ●
● ●
●
● ● ●
●
●
●
● ●●
● ●
●
●
●●
●●
●
●●
●
●
●
●
●
●
● ●
●● ●
●
●
●
●●
●
● ●
●
●●
●●
●
●
● ●
●
●●
●
● ●●
●●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
● ●●
●
●
● ●●
●
●●
●
●
●
●
●
●● ●
● ●●
●
●
●
●
●
●●
● ●●
●
●
●● ●
●●●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●
●●
●
●●● ●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●●●
● ●
●
●
● ●●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
● ●
●●
●
●
●●
●●
●
●
●
●●●
●
●● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●●●
●
●●
●●
●
●
●
●
●● ● ●●
● ●
●●●●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
● ● ● ●
●
●
●●
●
● ●
●
●
●●
●
●
● ●
●
●
●
● ●
●●
●
●●
●●
●●
●
●
●
●
●●
●
●●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
● ●●
●
●
●
●
●
●●
● ●
●
●
●● ●
●
● ●● ●
●
●● ●
●
●
● ●
●
●
● ●
●
●●
●
●●●
●
●●●
●
●
●● ●
●
●●
●
●
●
●
●
●
●●●
●●
●●
●
●
●
●
●
●●
●
●●● ●
●
●
●
●
●
●
●
●
●
●● ●●
●●
●●
●
●
●
●
● ●●
●
●●
●
●
●
●
● ●
●
●
●●
●
●
●
●
●●
●
● ●● ●
● ●●●●
●● ●
●●
●
●●
●
●
●
● ●● ●
●●
●
●
●●●
●●
●
●
●
●●
●
● ●●●
●
●
●
●●
●
●
●
●●●
●
●●
●●●
●
●
●
●
●
●
●
● ●●
●
● ● ●● ●●
●
●●
●●●●● ●
●
●● ●● ●
●● ●●
●●● ●● ● ●
●●
●●●●
●●
●
● ●●
20 30 40 50 60
−1.
5−
0.5
0.5
1.5
First donation age
log(
rate
)
Figure 2.9: Scatterplot of the donation rates against the continuous variable (AGE andBMI)
36
3 Modelling blood donations as recurrent events
In this chapter, the model used to analyze the blood donation data is explained in detail. First, the state of the art of predictive models for blood donation is discussed. Afterwards, there is a subsection for each class of parameters in the model, and the full model is summarized in the third section of the chapter. Finally, it is explained how to obtain an MCMC sample of recurrent event processes.
3.1 Recurrent event models for blood donations
Bosnes et al. (2005) set up a logistic regression model to predict whether donors actually show
up at a scheduled donation session. However, this kind of approach focuses on a single
donation on a specific date and, as a consequence, gives limited insight into the long-term
behaviour of blood donors. Logistic regression has also been applied in Flegel et al.
(2000), where the probability that a person returns to donate within a preselected time
interval has been modelled. James and Matthews (1996) follow a time-to-event approach,
using non-parametric methods of survival analysis: the Kaplan-Meier estimator
for the hazard function of the first 5 donation cycles has been built. A proportional hazards
model is then used to establish the effect of the covariates. Ownby et al. (1999) approach gap-time
recurrent event modelling with a proportional hazards model to describe the first 10
return times to donation of an individual. Then, the first 5 return times were combined
using a homogeneous Poisson process with proportional hazards. All the previous
publications rely on frequentist methods. As far as the Bayesian setting is concerned, in
Gianoli (2016) blood donations are treated as recurrent events in the framework of gap
times between events. In particular, a class of autoregressive Bayesian semiparametric
models for gap times has been considered. Fixed and time-dependent covariates were
included, and an individual- and time-specific random effect was modelled through a
Dirichlet process (DP) mixture prior, inducing clustering among donors.
3.2 Modelling choices
In the framework of recurrent event process the goal is to estimate the intensity function:
$$\lambda(t \mid H(t)) = \lim_{\Delta t \to 0} \frac{P\big(N(t) - N(t - \Delta t) = 1 \mid H(t)\big)}{\Delta t}. \tag{3.1}$$
Before starting the discussion, it is important to clarify what the time variable $t$ means. For every individual, the origin of the time axis is the time of his/her first whole
blood donation, so the time $t$ is the number of days elapsed since that moment.
Hence, for individual $i$ the set of observations is:
$T_{i,1}, \ldots, T_{i,n_i}$,
where $n_i$ is the number of donations made by the donor and $T_{i,j}$ is the number
of days elapsed from time $t = 0$ to the $j$-th recurrent blood donation (after the first one).
For every individual a blood donation occurred at time $t = 0$; this donation is not to be
considered as an event time but only as part of the initial conditions, hence all the analyses
are carried out conditionally on this event.
Once the intensity function and the time scale are defined it is possible to compute
the likelihood of the realization of the recurrent event process of one single donor using
the formula in Section 2.1 in Cook and Lawless (2007):
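$$P\big(n \text{ events at times } t_1 < t_2 < \dots < t_n \mid H(\tau_0)\big) = \exp\left(-\int_{\tau_0}^{\tau} \lambda(u \mid H(u))\, Y(u)\, du\right) \prod_{j=1}^{n} \lambda(t_j \mid H(t_j)), \qquad 0 = \tau_0 < t_1 < \dots < t_n < \tau,$$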
where Y (t) is the "at risk" indicator function, a binary function that indicates if an
individual is at risk to experience an event or not (see Section 1.5.1).
Let us denote by $\lambda_i(t \mid H(t))$ the intensity function and by $Y_i(t)$ the "at risk"
indicator function of the $i$-th individual. The intensity is modeled in the
framework of the multiplicative model, as explained in Section 1.4. Hence:
$$\lambda_i(t \mid H(t)) = Y_i(t) \times w_i \times \lambda_0(t \mid H(t)) \times \exp(x_i'\beta), \tag{3.2}$$
where:
• $\lambda_0(t \mid H(t))$ is the baseline intensity function;
• $w_i$ is an individual-specific random effect (also called frailty);
• $x_i$ is the vector of covariates of the $i$-th individual and $\beta$ is a vector of coefficients;
• $Y_i(t)$ is the "at risk" indicator function, which is treated as a datum for each
individual.
3.2.1 Baseline intensity function
The baseline intensity function is modeled as the product of two components:
$$\lambda_0(t \mid H(t)) = \left(\sum_{k=1}^{K} \lambda_k\, I_{(a_{k-1}, a_k]}(t)\right) \times I_{(t - T_{N(t^-)} > \phi_G)}(t). \tag{3.3}$$
The first component is independent of $H(t)$ (the history of the process until time $t$) and
is expressed as a piecewise constant function (see Section 1.5.4). This model is very
flexible but requires partitioning the time domain into a fixed number of intervals. Let
$K$ be the number of intervals, delimited by the cut-points $a_0 = 0 < a_1 < \ldots < a_K$. For each of these intervals
there is a parameter $\lambda_k$ that can be interpreted, in analogy with the homogeneous Poisson
process, as the occurrence rate of the events in that interval. The choice of the knots is
the object of a predictive performance analysis, concerning both the value of $K$ (5, 10 or 20) and
the type of division of the time domain. For the latter, two kinds of cut-points have been
considered:
• the quantiles of the donation times of each individual recorded from
the 1st of July 2001 to the 31st of December 2009. Recall that the time of a donation
of a person is the number of days elapsed since the first donation of that individual;
• an equispaced grid from time 0 to the maximum observed time.
As already mentioned in Chapter 1, it is common to choose the quantiles of the event
times as cut-points of the time domain. However, this is a data-driven choice and, by
definition, it is not independent of the data that the model aims to fit. To keep the number
of events in each interval balanced while making a data-independent choice of the grid,
we selected the quantiles of the event times in a past time window of the same length as
the one used to extract the data for this thesis.
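The two kinds of grid can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the thesis: the function names are hypothetical, and the convention of taking the quantile levels 1/K, ..., (K-1)/K as inner cut-points, with 0 and the maximum time as endpoints, is an assumption, since the text does not spell out which quantile levels were used.

```python
import numpy as np

def equispaced_cutpoints(K, t_max):
    """K intervals of equal length on [0, t_max]: returns a_0 = 0 < ... < a_K = t_max."""
    return np.linspace(0.0, t_max, K + 1)

def quantile_cutpoints(event_times, K, t_max):
    """K intervals whose inner cut-points are empirical quantiles of event_times.

    Assumed convention: inner cut-points at levels 1/K, ..., (K-1)/K,
    with a_0 = 0 and a_K = t_max added as endpoints.
    """
    levels = np.arange(1, K) / K
    inner = np.quantile(np.asarray(event_times, dtype=float), levels)
    return np.concatenate(([0.0], inner, [t_max]))

# Illustrative donation times (days since the first donation), not real data.
past_times = np.arange(1, 101, dtype=float)
grid_q = quantile_cutpoints(past_times, K=4, t_max=200.0)
grid_e = equispaced_cutpoints(K=5, t_max=3100.0)
```

With the quantile grid, each interval except possibly the last contains roughly the same number of past events, which is the balancing property mentioned above.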
A prior probability is assigned to the rates λ1, . . . ,λK .
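$$\lambda_k \overset{\text{i.i.d.}}{\sim} \Gamma(\alpha_\lambda, \beta_\lambda), \quad k = 1, \ldots, K, \qquad \alpha_\lambda, \beta_\lambda \text{ fixed}. \tag{3.4}$$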
While the first part of the baseline intensity function does not depend on the
past, the second component of (3.3) depends on the history of the process. Moreover, it
repeats itself identically after each event, like the intensity function of a renewal process.
Since features of both a Poisson process and a renewal process coexist in this model,
we are in the case of the general intensity-based model (see Section 1.5.2). The indicator
function in the second part of (3.3) models the fact that a person cannot
donate for a certain period of time $\phi_G$, which depends on his/her gender. Indeed, the
intensity, and hence the probability of donating, is set equal to 0 for $\phi_G$ days after every event.
According to AVIS rules, the post-donation rest time $\phi_G$ should be equal to
$\phi_M = 90$ days for men and $\phi_F = 180$ days for women. However, some donations
happen earlier (see Figures 3.1 and 3.2), since a physician is allowed to bring
donations forward. Hence we set the parameter $\phi_M$ to 85 days and $\phi_F$ to 150 days, discarding
from the analysis all the donors who at least once did not respect this relaxed restriction.
The thresholds have been fixed heuristically, with the goal of discarding as
few individuals as possible from the study while allowing reasonably early donations. With
this particular choice, only 5 men and 82 women have been discarded. If the information
about the fertility status of a female donor had been available, it would have been possible
to apply the men's threshold to women in menopause as well, as the association (in
principle) does. However, since no particular trend has been noticed between the gap
times of the women and their age at the time of donation, it has been decided not to
investigate the Association databases any further and to treat the female population
as a whole, lowering the post-donation rest time more than for males.
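The exclusion rule described above can be sketched as a simple check on the gap times of one donor. This is an illustrative sketch, not the thesis code: the function name and the gender labels "M" and "F" are hypothetical, and a gap exactly equal to the threshold is treated as a violation because the indicator in (3.3) uses a strict inequality.

```python
import numpy as np

# Relaxed post-donation rest times (days) stated in the text.
REST_DAYS = {"M": 85.0, "F": 150.0}

def violates_rest_time(donation_times, gender):
    """Return True if some gap between consecutive donations is too short.

    donation_times: days since the entry donation (which sits at time 0).
    """
    times = np.concatenate(([0.0], np.sort(np.asarray(donation_times, dtype=float))))
    gaps = np.diff(times)
    # The intensity is positive only when the gap strictly exceeds the rest time.
    return bool(np.any(gaps <= REST_DAYS[gender]))

# Illustrative donors, not real data.
keep_male = not violates_rest_time([90.0, 200.0], "M")   # gaps 90 and 110
drop_male = violates_rest_time([90.0, 170.0], "M")       # second gap is 80
```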
Figure 3.1: Histogram of gap times of female donors (horizontal axis: gap times, from 0 to 1000 days; vertical axis: frequency); the red line corresponds to 180 days
Figure 3.2: Histogram of gap times of male donors (horizontal axis: gap times, from 0 to 1000 days; vertical axis: frequency); the red line corresponds to 90 days
Figure 3.3: Percentage of earlier than allowed donations as a function of the threshold age for menopause (horizontal axis: threshold age for menopause, from 20 to 60 years; vertical axis: % of early women donations; one curve for each post-donation rest time: 150, 160, 170, 175 and 180 days)
In fact, as can be seen in Figure 3.3, for each possible choice of the post-donation
rest time for women $\phi_F$ and for each reasonable choice of a threshold age for menopause
there remains a significant percentage of "earlier than allowed" donations, namely early
donations of young women. With a choice of 150 days for the post-donation rest time and
50 years as the menopausal age, only 1 % of early donations are observed.
3.2.2 Frailty parameters
The random effects or frailties are denoted by $w_i$, where the subscript $i$ is the index
of the individual. These parameters are meant to capture the heterogeneity between
individuals and have a multiplicative effect on the intensity function: a value greater or
smaller than 1 can be interpreted, respectively, as a higher or
lower propensity to experience an event. Usually these parameters are modeled
as Gamma random variables with mean equal to 1 and variance equal to $\eta$. In this
work these conditions hold conditionally on the variance parameter $\eta$, which has its own
marginal prior distribution, inducing correlation among the random effects through
exchangeability. Summing up:
$$w_i \mid \eta \overset{\text{iid}}{\sim} \Gamma(\eta^{-1}, \eta^{-1}), \quad i = 1, \ldots, M \tag{3.5}$$
$$\eta \sim \Gamma(2, 2), \tag{3.6}$$
where the shape and rate parameters of the prior of $\eta$ have been chosen after a
sensitivity analysis.
Another option is to consider a division of the individuals into groups according
to their postal code. In this case the frailties are area-dependent, and one variance
parameter $\eta_j$ is estimated for the $j$-th zone. The areal dependence of the random effects
has been addressed in the literature by many authors; see for example Banerjee et al. (2003),
Henderson et al. (2002), Li and Ryan (2002). There, the prior structure of the random effects'
parameters is mainly based on the distance matrix of the areas. However, to keep the
model simple, a different prior has been chosen in this work. The parameters $\eta_j$, for
$j = 1, \ldots, J$ ($J$ is the number of areas), are a priori exchangeable, hence correlation is
induced among them once the hyperparameters are marginalized. In this case the model
is:
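$$w_i \mid \eta_j \overset{\text{iid}}{\sim} \Gamma(\eta_j^{-1}, \eta_j^{-1}), \quad i = 1, \ldots, M, \text{ where } j \text{ is the zone of the } i\text{-th individual} \tag{3.7}$$
$$\eta_j \mid \alpha_\eta, \beta_\eta \overset{\text{iid}}{\sim} \Gamma(\alpha_\eta, \beta_\eta), \quad j = 1, \ldots, J \tag{3.8}$$
$$\alpha_\eta, \beta_\eta \overset{\text{iid}}{\sim} \Gamma(a, b), \quad a, b \text{ fixed} \tag{3.9}$$
$$a = 3, \quad b = 2 \tag{3.10}$$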
Once the posterior distribution of the variance parameter $\eta$ is known, it is possible to
compute the predictive density of a new donor's random effect, which we call $w^{new}$. Let
us denote by $\mathcal{L}(\eta \mid data)$ the posterior law of $\eta$. Then:
$$\mathcal{L}(w^{new} \mid data) = \int \mathcal{L}(w^{new} \mid \eta, data)\, \mathcal{L}(\eta \mid data)\, d\eta = \int \mathcal{L}(w^{new} \mid \eta)\, \mathcal{L}(\eta \mid data)\, d\eta. \tag{3.11}$$
Then, if for every $\eta^{(s)}$ in an MCMC sample of size $S$ from $\mathcal{L}(\eta \mid data)$ a value $w^{new,(s)}$ is
sampled independently from a Gamma distribution with shape and rate parameters both equal
to $1/\eta^{(s)}$, the result $\{w^{new,(1)}, \ldots, w^{new,(S)}\}$ is an MCMC sample from $\mathcal{L}(w^{new} \mid data)$, namely
the predictive density of the frailty of a new incoming donor.
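The composition step behind (3.11) can be sketched in a few lines. This is a minimal illustration, assuming the posterior draws of η are already available as an array; the function name is hypothetical, and the placeholder draws below are taken from the Γ(2, 2) prior only to make the example runnable, not from an actual posterior.

```python
import numpy as np

def sample_new_frailty(eta_draws, rng):
    """Draw one w_new per draw of eta, with w_new | eta ~ Gamma(shape=1/eta, rate=1/eta)."""
    eta_draws = np.asarray(eta_draws, dtype=float)
    # NumPy parameterizes the Gamma by shape and scale, and scale = 1 / rate = eta.
    return rng.gamma(shape=1.0 / eta_draws, scale=eta_draws)

rng = np.random.default_rng(0)
# Placeholder for S = 2000 posterior draws of eta (assumption: shape 2, rate 2).
eta_draws = rng.gamma(shape=2.0, scale=0.5, size=2000)
w_new = sample_new_frailty(eta_draws, rng)
```

Conditionally on η, each draw has mean 1 and variance η, so the spread of `w_new` reflects both the heterogeneity among donors and the posterior uncertainty on η.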
3.2.3 Covariates
As mentioned in the previous chapter, donor-specific fixed-time covariates are considered
in the analysis. The maximum number of covariates included in the model is 9, but
models with fewer covariates will be compared through goodness-of-fit indicators. The
whole set of covariates considered is as follows:
• age at the time of the first donation (standardized);
• binary variable for gender (1 male, 0 female);
• Body Mass Index (standardized);
• binary variable for smoker (1 smoker, 0 otherwise);
• binary variable for alcohol consumption (1 consumer, 0 otherwise);
• binary variable if the donor has an active life (1 if yes, 0 if not);
• dummy variable for blood type 0 (equal to 1 if the donor’s blood type is 0, otherwise
0);
• dummy variable for blood type A (equal to 1 if the donor’s blood type is A, otherwise
0);
• binary variable for Rhesus factor (1 if positive, 0 negative).
A dummy variable for blood type AB has not been considered, since very few donors in
the dataset have blood type AB.
To have a non-informative prior distribution, the parameters $\beta_1, \ldots, \beta_p$ are a priori
independent and identically distributed normal random variables with mean 0 and variance
equal to 100.
3.2.4 At risk indicator function, censoring and suspensions
The interval of observation is not the same for all the individuals. In fact, the time-axis
origin is the time of the first donation of a donor, while the other extreme of the interval
of observation is the number of days between the time-axis origin and the 30th of June
2018. This is a censoring phenomenon.
Moreover, some donors cannot be observed in certain periods, since a suspension from
donations can occur in case of health issues. The suspensions of each donor are
available in the AVIS database and are treated in the model as data. These two phenomena
are modeled through a function $Y_i(t)$, which is equal to 1 if donor $i$ is neither
censored nor suspended at time $t$, and 0 otherwise. As explained in Cook and Lawless
(2007), if the value of $Y_i(t)$ is independent of the recurrent event process, the intensity
function can be rewritten as:
$$Y_i(t)\, \lambda(t \mid H(t)), \tag{3.12}$$
and the likelihood becomes:
$$\exp\left(-\int_{0}^{\tau} \lambda_i(u)\, Y_i(u)\, du\right) \times \left\{\prod_{j=1}^{n_i} \lambda_i(t_{ij})\right\}. \tag{3.13}$$
However, since the suspensions concern few individuals and are very noisy (see Section 2.2.4),
they are not included in the function $Y_i(t)$ for the analysis.
3.3 The Bayesian model for recurrent data of M donors
3.3.1 The likelihood
For any i = 1, . . . , M we define the observations as:
• $n_{ik}$ = number of events experienced by the $i$-th individual in the interval $(a_{k-1}, a_k]$;
• $n_{\cdot k} = \sum_{i=1}^{M} n_{ik}$ = total number of events in the interval $(a_{k-1}, a_k]$;
• $n_i = \sum_{k=1}^{K} n_{ik}$ = total number of events experienced by the $i$-th individual;
• $Y_i(t) = I(i\text{-th individual is observed at time } t)$; it contains information about censoring and, possibly, suspensions;
• $\tau_{ik} = \int_{a_{k-1}}^{a_k} Y_i(u)\, I_{(u - T_{N_i(u^-)} > \phi_G)}(u)\, du$ =
total time that the $i$-th individual has been observed in the interval $(a_{k-1}, a_k]$;
• $x_i = (x_{i1}, \ldots, x_{ip})'$, $p \le 9$,
$p$-dimensional vector of covariates of the $i$-th individual.
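The exposure times τ_ik defined above can be computed by intersecting each at-risk spell of a donor with the intervals of the grid. The following is a sketch under stated assumptions, not the thesis code: the function name and arguments are hypothetical, suspensions are ignored (as in the analysis), and the at-risk spells are taken to run from the end of each rest period to the next donation, with the last spell ending at the censoring time.

```python
import numpy as np

def exposure_per_interval(event_times, cutpoints, rest_time, censor_time):
    """Time at risk of one donor in each interval (a_{k-1}, a_k] of the grid."""
    cutpoints = np.asarray(cutpoints, dtype=float)
    events = np.sort(np.asarray(event_times, dtype=float))
    # A rest period follows the entry donation at time 0 and every later donation.
    starts = np.concatenate(([0.0], events)) + rest_time
    # Each at-risk spell ends at the next donation, or at censoring for the last one.
    ends = np.concatenate((events, [censor_time]))
    tau = np.zeros(len(cutpoints) - 1)
    for s, e in zip(starts, ends):
        lo = np.maximum(cutpoints[:-1], s)
        hi = np.minimum(cutpoints[1:], e)
        # Negative overlaps mean the spell does not touch the interval.
        tau += np.clip(hi - lo, 0.0, None)
    return tau

# Illustrative donor: donations at days 100 and 300, rest time 85, censored at day 400.
tau = exposure_per_interval([100.0, 300.0], [0.0, 200.0, 400.0], 85.0, 400.0)
```

In this example the donor is at risk on (85, 100], (185, 300] and (385, 400], which gives 30 days in the first interval and 115 days in the second.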
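The likelihood function of the proposed model is:
$$\prod_{k=1}^{K} \left\{ \lambda_k^{n_{\cdot k}} \prod_{i=1}^{M} \left\{ w_i^{n_{ik}} \exp\big(n_{ik}\, x_i'\beta - w_i \exp(x_i'\beta)\, \lambda_k \tau_{ik}\big) \right\} \right\}. \tag{3.14}$$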
3.3.2 Prior elicitation
The parameters of the model can be collected in a vector $\theta$ defined in the following way:
$$\theta := (\lambda_1, \ldots, \lambda_K, \beta_1, \ldots, \beta_p, w_1, \ldots, w_M, \eta), \tag{3.15}$$
or
$$\theta := (\lambda_1, \ldots, \lambda_K, \beta_1, \ldots, \beta_p, w_1, \ldots, w_M, \eta_1, \ldots, \eta_J), \tag{3.16}$$
where:
• $\lambda_1, \ldots, \lambda_K$ are the interval-specific rates;
• $\beta := (\beta_1, \ldots, \beta_p)'$ is the $p$-dimensional vector of covariate coefficients;
• $w_1, \ldots, w_M$ are the individual-specific random effects;
• $\eta_1, \ldots, \eta_J$ or $\eta$ are the variances of the random effects, with or without
areal dependence respectively.
Given the parameter $\theta$ and the vector $x_i$, the intensity function of the $i$-th individual is:
$$\lambda_i(t \mid H(t), \theta) = I_{(t - T_{N_i(t^-)} > \phi_G)}(t) \times \sum_{k=1}^{K} w_i \exp(x_i'\beta)\, \lambda_k\, I_{(a_{k-1}, a_k]}(t), \quad i = 1, \ldots, M, \tag{3.17}$$
where $\phi_G$ is a fixed parameter that depends on the sex of the individual and represents
the post-donation rest time.
A priori independence among blocks of parameters is assumed, with marginal priors as
follows:
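$$\beta \sim \mathcal{N}(0, \sigma^2 I_p), \quad \sigma^2 \text{ fixed}, \quad I_p \text{ identity matrix} \in \mathbb{R}^{p \times p}. \tag{3.18}$$
$$\lambda_k \overset{\text{iid}}{\sim} \Gamma(\alpha_\lambda, \beta_\lambda), \quad k = 1, \ldots, K, \quad \alpha_\lambda, \beta_\lambda \text{ fixed}. \tag{3.19}$$
If the model has zone-dependent frailties:
$$w_i \mid \eta_j \overset{\text{ind}}{\sim} \Gamma(\eta_j^{-1}, \eta_j^{-1}), \quad i = 1, \ldots, M. \tag{3.20}$$
$$\eta_j \mid \alpha_\eta, \beta_\eta \overset{\text{iid}}{\sim} \Gamma(\alpha_\eta, \beta_\eta), \quad j = 1, \ldots, J. \tag{3.21}$$
$$\alpha_\eta, \beta_\eta \overset{\text{iid}}{\sim} \Gamma(a, b), \quad a, b \text{ fixed}. \tag{3.22}$$
otherwise:
$$w_i \mid \eta \overset{\text{iid}}{\sim} \Gamma(\eta^{-1}, \eta^{-1}), \quad i = 1, \ldots, M. \tag{3.23}$$
$$\eta \sim \Gamma(a_\eta, b_\eta), \quad a_\eta, b_\eta \text{ fixed}. \tag{3.24}$$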
3.4 The predictive distribution of the counting process of a new incoming donor
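The predictive distribution of the point process of a new incoming donor with known
covariates $x^{new}$ can be computed as:
$$\mathcal{L}(N^{new}(t) \mid data, x^{new}) = \int \mathcal{L}(N^{new}(t) \mid w^{new}, \lambda, x^{new}, \beta)\, \mathcal{L}(w^{new}, \lambda, \beta \mid data)\, dw^{new}\, d\lambda\, d\beta, \tag{3.25}$$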
which can be estimated through MCMC once $\mathcal{L}(N^{new}(t) \mid w^{new}, \lambda, x^{new}, \beta)$ is analytically
known (for example in the case of a Poisson process).
The analytical expression of the law of $N^{new}(t)$ given the parameters would require the
computation of the law of all the event times $T_1^{new}, T_2^{new}, \ldots$, since
$$N^{new}(t) \geq k \iff T_k^{new} \leq t. \tag{3.26}$$
However, the distribution of $T_k^{new}$ is not trivial if the intensity function depends on the
history of the process, as is the case for the one used in this thesis. As a consequence, the estimation
of $\mathcal{L}(N^{new}(t) \mid data, x^{new})$ can be done via MCMC, drawing one realization of the process
for each vector of parameters drawn from the posterior distribution. In order to do this,
it is necessary to be able to sample a realization of a recurrent event process with intensity
function $\lambda(t \mid H(t), \theta)$. A possible strategy is to use the inversion method to draw a sequence
of event times $\{T_1, T_2, \ldots\}$ by using the cumulative distribution function of $T_j$ given $T_{j-1}$
obtained in (1.5).
Hence the sampling scheme is:
• $t_0^{new} := 0$;
At step $j$:
• derive $F(t_j^{new} \mid t_{j-1}^{new}, \theta, x^{new}) = P(T_j < t_j^{new} \mid T_{j-1} = t_{j-1}^{new}, \theta, x^{new})$;
• sample $U_j \sim \mathrm{Unif}([0,1])$;
• $t_j^{new}$ solves the equation $U_j = F(t_j^{new} \mid t_{j-1}^{new}, \theta, x^{new})$.
The algorithm terminates when $t_j^{new}$ exits from the time window.
The resulting sequence $\{t_0^{new}, t_1^{new}, \ldots\}$ is a realization of a recurrent event process
with intensity function $\lambda(t \mid H(t), \theta)$. An MCMC sample of $\mathcal{L}(N^{new}(t) \mid data, x^{new})$ can be
obtained in the following way:
• $\theta^{(s)}$ is a vector of the MCMC sample from the posterior distribution of the model
(3.14) ($s = 1, \ldots, S$);
• sample $\{t_{0s}^{new}, t_{1s}^{new}, \ldots\}$ from a recurrent event process with intensity function $\lambda(t \mid H(t), \theta^{(s)}, x^{new})$;
• $N_s^{new}(t) = \sum_{i \geq 1} I(t_{is}^{new} \leq t)$.
$\{N_1^{new}(t), \ldots, N_S^{new}(t)\}$ is a sample from $\mathcal{L}(N^{new}(t) \mid data, x^{new})$, the posterior distribution
of the point process of a new incoming donor.
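For the piecewise-constant intensity with rest time used here, the inversion step has a closed form: setting U = 1 - exp(-Λ), where Λ is the cumulative intensity accumulated since the end of the rest period, the next event time is found by walking through the intervals until Λ reaches -log(1 - U). The sketch below illustrates this for one parameter vector. It is not the thesis code: the function names are hypothetical, `mult` stands for the product w^new exp(x^new'β), and the first cut-point is assumed to be 0.

```python
import numpy as np

def simulate_event_times(lam, cutpoints, mult, rest_time, rng):
    """One realization of the event times on (0, cutpoints[-1]] by inversion."""
    cutpoints = np.asarray(cutpoints, dtype=float)
    rates = mult * np.asarray(lam, dtype=float)   # intensity in each interval when at risk
    end = cutpoints[-1]
    times, t_prev = [], 0.0
    while True:
        t = t_prev + rest_time                    # the intensity is 0 during the rest period
        if t >= end:
            break
        # Solve U = 1 - exp(-Lambda) for the cumulative intensity Lambda.
        target = -np.log1p(-rng.uniform())
        k = int(np.searchsorted(cutpoints, t, side="right")) - 1
        event = None
        while k < len(rates):
            mass = rates[k] * (cutpoints[k + 1] - t)   # cumulative intensity left in interval k
            if rates[k] > 0 and mass >= target:
                event = t + target / rates[k]
                break
            target -= mass
            t = cutpoints[k + 1]
            k += 1
        if event is None:                         # the next event falls outside the time window
            break
        times.append(event)
        t_prev = event
    return np.array(times)

def count_process(times, t):
    """N(t): number of simulated events up to time t."""
    return int(np.sum(np.asarray(times) <= t))

rng = np.random.default_rng(1)
# Illustrative values only: two intervals, rest time of 85 days.
sim = simulate_event_times([0.01, 0.005], [0.0, 500.0, 1000.0], 1.5, 85.0, rng)
```

Repeating the call once for each posterior draw of the parameters and of the new frailty, and evaluating `count_process` on a grid of times, gives the sample of counting processes described above.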
4 Posterior inference on AVIS data
This chapter presents posterior and predictive inference for the models described in Chapter 3 and applied to the AVIS data (see Chapter 2). In the first section it is described how the inference has been obtained, while the second section is
devoted to illustrating the inference on the parameters.
4.1 Posterior inference
Sampling from the posterior distribution has been done using Stan (Stan Development
Team and others, 2016), which is a more efficient software for MCMC sampling than
those based on the BUGS language, like JAGS or WinBUGS. However, Stan
has the drawback that discrete parameters cannot be sampled; this was
not relevant for this model, since all the parameters are continuous.
Unless stated otherwise, the sampling is performed with 50000 warm-up iterations
plus a further 50000 sampling iterations, thinned by 25. The result is an MCMC sample of
2000 draws. The likelihood of the model is (3.14); the prior for the parameters is
specified in (3.18), (3.19) and (3.23) (or (3.20)). The covariates are the ones described
in Section 3.2.3. The time domain (in days) is [0,3100].
The initial dataset was composed of 9175 donors. Among these individuals, 3238
entered the study without making any donation apart from the one at
time t = 0. Since the goal of this work is to model the behaviour of a donor making
multiple blood donations at a specific blood collection point, these 3238 individuals have
been excluded from the analysis. Among the remaining 5937 persons, there are 87 donors
who at least once did not respect the post-donation rest time of 85 days for men and 150
for women. Moreover, a further 92 donors have some missing values in their covariates. The
final sample is composed of 5758 donors and 25073 whole blood donations.
The convergence diagnostics of all the simulations have always been checked, showing
that all the MCMC chains have reached stationarity.
4.2 Inference on parameters
4.2.1 Baseline intensity function
The baseline intensity function is modelled as a step function. The steps are a priori
independent Gamma-distributed random variables with fixed shape and rate
parameters ($\alpha_\lambda = \beta_\lambda = 2$ in equation (3.4)). A sensitivity analysis showed that the model is
robust with respect to the choice of the hyperparameters: simulations with parameters
$\alpha_\lambda = \beta_\lambda = 3$, $\alpha_\lambda = \beta_\lambda = 0.01$, and $\alpha_\lambda = 2.5$, $\beta_\lambda = 1.5$ have been run to check the robustness of
the model.
Preliminary choices to be discussed are the type and the number of intervals (5,
10 or 20 intervals, denoted by $K$ in equation (3.3)). The evaluated intervals were either
equispaced or had as cut-points the empirical quantiles of the donations that occurred in
the eight and a half years preceding the study (recall that the study lasted from the
1st of January 2010 to the 30th of June 2018, namely eight and a half years). Posterior
inference for each of these six possible choices is shown in Figure 4.1. Observe that in all
the plots in Figure 4.1 the baseline intensity function has a decreasing trend over time,
meaning that the propensity to donate is higher at the beginning of the life as a donor.
However, there are some fluctuations at the end of the time domain in Figure 4.1e and in
the first intervals in Figures 4.1d and 4.1f.
If the cut-points are chosen as quantiles, the first interval (in days) is
[0,85], where no events occurred because of the post-donation rest time. In this case the
MCMC algorithm sampled from the prior, namely a Gamma distribution with shape and
rate equal to $\alpha_\lambda = \beta_\lambda = 2$.
The six choices were evaluated using WAIC. However, the diagnostics of this method
were not good (the majority of the components of the sum in $p_{WAIC2}$ exceed the value 0.4,
which, according to Gelman et al. (2014), can lead to an unreliable estimate of the lppd).
Hence the log-posterior predictive density was evaluated on the data. From Figure 4.2
it seems that the gain in predictive performance obtained by doubling the
intervals from 10 to 20 is not significant; indeed, an "elbow" appears. Moreover, the
estimated lppd values are nearly equal for the two choices of cut-points in the case of 20
intervals.
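The quantities involved in this comparison can be computed from a matrix of pointwise log-likelihood values, one row per posterior draw and one column per observation. The sketch below uses the standard definitions: lppd as the sum of the logs of the posterior-mean likelihoods, and p_WAIC2 as the sum of the posterior variances of the pointwise log-likelihood. The function name and the toy matrix are hypothetical, and the 0.4 threshold is the one quoted in the text.

```python
import numpy as np

def lppd_and_pwaic2(log_lik):
    """log_lik: array of shape (S, n) with log p(y_i | theta^(s)).

    Returns the lppd and the n pointwise terms of p_WAIC2.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    S = log_lik.shape[0]
    # Pointwise log of the posterior mean of the likelihood, computed stably.
    lppd_i = np.logaddexp.reduce(log_lik, axis=0) - np.log(S)
    # Pointwise posterior variance of the log-likelihood.
    p_waic_i = np.var(log_lik, axis=0, ddof=1)
    return float(lppd_i.sum()), p_waic_i

# Toy example: 2 draws, 1 observation with likelihood values 0.2 and 0.4.
lppd, p_terms = lppd_and_pwaic2(np.log([[0.2], [0.4]]))
share_above = float(np.mean(p_terms > 0.4))   # fraction of terms above the 0.4 threshold
```

A WAIC-based estimate of the expected log predictive density is then `lppd - p_terms.sum()`; a large share of terms above the threshold signals that this estimate may not be trustworthy.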
The inference on the other parameters is robust with respect to the choice of the
intervals.
Figure 4.1: 95 % credibility intervals for the baseline intensity function (horizontal axis: days, from 0 to 3000; vertical axis: baseline rate function). Panels: (a) 5 equispaced intervals; (b) 5 quantiles-defined intervals; (c) 10 equispaced intervals; (d) 10 quantiles-defined intervals; (e) 20 equispaced intervals; (f) 20 quantiles-defined intervals
Figure 4.2: Estimated log posterior predictive density (horizontal axis: number of intervals; vertical axis: elppd; one series for quantile-defined intervals and one for equispaced intervals)
4.2.2 Covariates coefficients
The model also takes into account the dependence of the intensity function on some
individual features. The relationship between these quantities is captured by a
multiplicative effect on the baseline intensity function, expressed as
the exponential of a linear combination of the covariates with coefficients $\beta_i$, $i = 1, \ldots, p$.
The variables associated with each of the coefficients are presented in Section 3.2.3.
Figure 4.3 reports the 95 % credibility intervals for the covariate coefficients.
Figure 4.3: 95 % credibility intervals for the $\beta_i$ parameters (one interval for each covariate: Age, Sex, BMI, Smoke, Alcohol, Active life, Type 0, Type A, Rh +)
To assess how significant these parameters are, Bayesian p-values can be computed:
$$\text{Bayesian p-value} = \min\big\{P(\beta_i > 0 \mid data),\, P(\beta_i < 0 \mid data)\big\}. \tag{4.1}$$
A low Bayesian p-value denotes that 0 lies in the tail of the posterior distribution, and so
that the coefficient is significant.
Coefficient    Bayesian p-value    Hazard ratio (q0.025, q0.975)
Age            0.00                (1.23, 1.30)
Sex            0.00                (1.58, 1.80)
BMI            0.19                (0.98, 1.04)
Smoke          0.00                (0.81, 0.92)
Alcohol        0.22                (0.92, 1.03)
Active life    0.14                (0.98, 1.10)
Type 0         0.35                (0.91, 1.07)
Type A         0.31                (0.90, 1.07)
Rh +           0.03                (0.86, 1.01)
Table 4.1: Bayesian p-values and hazard ratios
The variable Age, which denotes the age of the donor at entry into the study, has a
positive effect on the intensity function: the older the individual, the higher the rate of
donation. The variable Sex also has a positive effect, which means that men have a
higher propensity to donate than women. The other two significant covariates are
Smoke and Rh+, which have a negative effect on the rate function. Individuals with a
negative Rhesus factor can receive transfusions only from other individuals with a negative
Rhesus factor. The fact that a positive Rhesus factor lowers the intensity function
may suggest that Rh-negative individuals feel a greater sense of responsibility in their role
as donors, considering that a negative Rhesus factor is known to be less frequent than a
positive one.
The effect of the covariates can be quantified by computing exp(βi), which is the ratio
between the intensity functions of two individuals that differ by one unit in covariate xi.
For a categorical variable, this is the effect of belonging to group xi. In survival analysis
exp(βi) is called the hazard ratio. The 95 % credibility intervals of the hazard ratios are in
the third column of Table 4.1.
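The credibility interval of a hazard ratio can be obtained by transforming each posterior draw of βi with the exponential and taking empirical quantiles. The sketch below shows one possible way to do this; the helper names and the interpolation rule for the quantiles are assumptions of the example, not a description of the code behind Table 4.1.

```python
import math


def empirical_quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def hazard_ratio_interval(beta_draws, level=0.95):
    """Equal-tailed credibility interval for exp(beta) from posterior draws of beta."""
    hr = [math.exp(b) for b in beta_draws]
    tail = (1 - level) / 2
    return empirical_quantile(hr, tail), empirical_quantile(hr, 1 - tail)
```

Since the exponential is monotone, the same interval is obtained by exponentiating the quantiles of the βi draws directly.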
To find an optimal subset of covariates, a sensitivity analysis has been carried out. Three
models with different subsets of covariates have been compared, with the cut-points
fixed to 10 quantiles-defined intervals. As explained before, WAIC was not reliable for
estimating the log posterior predictive density, and so 10-fold cross-validation has been
used as a measure of predictive accuracy. The three compared sets of covariates are:
• the maximal set of covariates (p = 9);
• only the significant covariates (Age, Sex, Smoke, Rh +) and the dummies for the
blood type (p = 6);
• only the significant covariates (p = 4).
Age   Sex   BMI   Smoke   Alcohol   Active life   Type 0   Type A   Rh    lppd          p
✓     ✓     ✓     ✓       ✓         ✓             ✓        ✓        ✓     -149085.04    9
✓     ✓     ✗     ✓       ✗         ✗             ✓        ✓        ✓     -149087.98    6
✓     ✓     ✗     ✓       ✗         ✗             ✗        ✗        ✓     -149087.14    4
Table 4.2: Predictive performance of models with different sets of covariates, evaluated using 10-fold cross-validation.
The estimates of the coefficients are robust with respect to the presence of other features
in the model. The results of the predictive accuracy comparison are in Table 4.2. The
differences between the estimated lppd's do not seem significant. In this case the best
practice could be to select the simplest model, that is, the model with p = 4 (or p = 6 if one
wants to keep the information about blood types in the model).
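For reference, the log pointwise predictive density compared above can be computed from a matrix of pointwise log-likelihoods, averaging the likelihood over the posterior draws for each held-out observation and summing the logarithms. The sketch below assumes that `log_lik[s][i]` holds log p(yi | θs), where in K-fold cross-validation θs is a draw from the posterior fitted without the fold containing observation i; the names are hypothetical.

```python
import math


def log_mean_exp(log_values):
    """Numerically stable log of the average of exp(log_values)."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))


def lppd(log_lik):
    """Sum over observations of the log posterior-averaged likelihood.

    log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    """
    n_points = len(log_lik[0])
    return sum(log_mean_exp([row[i] for row in log_lik]) for i in range(n_points))
```

Subtracting the maximum before exponentiating avoids underflow, which matters when the pointwise log-likelihoods are strongly negative.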
[Figure 4.4: Summaries of wi. (a) Scatterplot of the posterior mean of each wi against the empirical rate of donations of the i-th individual (x axis: log(Rate); y axis: log of the posterior mean of w). (b) Marginal posterior densities of the frailty of 3 donors in the sample (donor-specific) and posterior predictive density of wnew (new donor).]
[Figure 4.5: Predictive densities of Ti,ni+1 given Ti,ni for some donors. (a) Predictive density for the next donation of the 19-th individual; (b) Posterior density of w19 and predictive density for wnew; (c) Predictive density for the next donation of the 28-th individual; (d) Posterior density of w28 and predictive density for wnew. Legend: donor-specific, new donor.]
[Figure 4.6: 95 % posterior predictive credibility intervals of wnew_j, j = 1, . . . , J, the frailty of a new donor from zone j. In grey the estimate obtained with the model with no areal dependence. Zones are grouped as: cities in the province of Milan, districts in the city of Milan, provinces of Lombardy, cities outside Lombardy.]
4.2.3 Random effects
As already mentioned, the individual-specific random effects have been modelled in two
ways. The first one assigns the random effects an exchangeable Gamma prior with mean
equal to 1 and a common variance parameter called η.
One random effect is estimated for each individual in the dataset. As can be seen
in Figure 4.4a, there appears to be a linear association between the observed rate of
donation of a donor and the posterior mean of his/her frailty. Every individual in the
dataset contributes to the estimate of the variance η of the random effects' population,
and so it is possible to estimate the predictive density of the frailty wnew of a new
incoming donor (see 3.2.2). Moreover, every donor in the dataset is characterized by the
posterior density of his/her frailty (some examples, compared to the predictive density
for a new incoming donor, are in Figure 4.4b), and so it is possible to make an
individual-specific prediction. Figure 4.5 shows the predictive density of the next donation
given the last observed donation for some donors in the dataset. The density in blue is
computed with the predictive density of wnew (as if the donor were not in the sample),
while the one in red is computed with the individual-specific random effect wi.
Summing up, Figures 4.4 and 4.5 show that there is heterogeneity between
individuals that is not captured by the observable features. The random effects
do not concentrate on a single value (as they would if every donor experienced the
recurrent event process in the same manner), but are spread over a wide range (see Figure
4.4a). Therefore, this approach is useful for making individual-specific predictions for
the individuals in the sample (see Figures 4.5a and 4.5c). Furthermore, the variability
with which a new donor approaches blood donation is captured by the
predictive density of the random effect wnew of a new donor.
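The predictive density of wnew described above can be approximated by drawing, for each posterior draw of η, one value from a Gamma distribution with mean 1 and variance η. A minimal sketch follows; it assumes that a list of posterior draws of η is available (here called `eta_draws`, a hypothetical name) and uses the shape/scale parametrization of Python's `random.gammavariate`.

```python
import random


def draw_new_frailty(eta_draws, rng):
    """One draw of w_new per posterior draw of eta (Gamma with mean 1, variance eta)."""
    draws = []
    for eta in eta_draws:
        shape = 1.0 / eta  # shape * scale = 1, so the mean is 1
        scale = eta        # shape * scale**2 = eta, so the variance is eta
        draws.append(rng.gammavariate(shape, scale))
    return draws
```

A histogram or kernel density estimate of the returned values then approximates the predictive density of the frailty of a donor who is not in the sample.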
4.2.3.1 Areal dependent frailties
The second approach to the random effects divides the individuals into areas, assigning
each donor to a zone according to the postal code of his/her residence. The number
of different zones is denoted by J. The prior is specified in (3.20), (3.21) and (3.22).
In this way the n j parameters are exchangeable and they can "borrow strength" from
each other in the estimate of the posterior distribution. The division into zones has been
done in the following way:
• each municipality in the province of Milan has its specific zone (51 zones);
• each of the 38 postal codes associated with a district in the municipality of Milan has
its specific zone (38 zones);
• one zone for each province in Lombardy (11 zones: Bergamo, Brescia, Como, Cremona,
Lecco, Lodi, Mantova, Monza e Brianza, Pavia, Sondrio, Varese);
• another zone which collects all the municipalities that do not belong to the previous
categories.
This division results in J = 102 zones.
Figure 4.6 shows the inference on the posterior predictive density of the random
effect of a new incoming individual from zone j. The plot reveals no particular
dependence on the area of origin, since the differences with respect to the estimate
obtained with the model without zone dependence (in grey in the figure) are not significant.
In addition, each of the credibility intervals in Figure 4.6 has been colored according to
a further division into 4 macro-areas (red for districts of Milan, blue for municipalities
in the province of Milan, yellow for cities in other provinces of Lombardy and green for
the "rest of the world"). Fitting the model with this coarser division (J = 4) did not reveal
any significant dependence of the random effects on the macro-areas either.
[Figure 4.7: Pointwise predictive 95 % credibility intervals for Nnew(t)|xnew, where xnew is set to the mean (or to the mode) of the features used as covariates. Panels: CI male donor; CI female donor (x axis: Day; y axis: N(t)).]
[Figure 4.8: Mean functions for Nnew(t)|xnew, data. Unless stated otherwise, the covariates are set to the mean (or to the mode). Panels (x axis: Day; y axis: E[N(t)|data, x]): Male vs Female; Male with Age = 0, 1, −1; Male non-smoker vs smoker; Male Rh − vs Rh +.]
4.2.4 Predictive density for the count process of a new incoming donor
The law of N(t) given the data can be estimated by simulating one recurrent event
process for each draw in the MCMC sample (see 3.4). Figure 4.7 displays 95 % posterior
predictive credibility intervals for Nnew(t) in the case of a male and of a female new
incoming donor (t ranges over the first two years after the first blood donation). The
plots refer to a vector of covariates xnew equal to 0 for the continuous variables (i.e. the
sample mean) and equal to the sample modes for the categorical variables (non-smoker,
non-alcohol consumer, blood type 0, positive Rh).
Figure 4.8 displays the posterior mean function of Nnew(t)|xnew for some possible
configurations of the covariates. The mean function of a man is about twice that of a
woman. This is natural since, by law, a man is allowed to donate twice as often per year
as a woman. The posterior predictive credible bands for Nnew(t) widen over time, since
the lower bound remains close to 0 over the whole time domain, while the upper bound
increases with time.
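To make the simulation step concrete, the sketch below generates one path of event times per posterior draw and reads off a pointwise 95 % band for the count at a given day. It is only an illustration under simplifying assumptions that are not taken from Section 3.4: the process is treated as a Poisson process with a piecewise-constant baseline rate, the covariate effect and the frailty are collapsed into a single multiplier, and all numerical values are hypothetical.

```python
import random


def simulate_event_times(cuts, rates, multiplier, horizon, rng):
    """Event times on [0, horizon) for a Poisson process whose rate is
    rates[k] * multiplier on the interval [cuts[k], cuts[k + 1])."""
    times = []
    for k, rate in enumerate(rates):
        start = cuts[k]
        if start >= horizon:
            break
        end = min(cuts[k + 1], horizon)
        lam = rate * multiplier
        if lam <= 0:
            continue
        # Within an interval the rate is constant, so gaps are exponential;
        # memorylessness allows restarting the clock at each cut-point.
        t = start + rng.expovariate(lam)
        while t < end:
            times.append(t)
            t += rng.expovariate(lam)
    return times


def count_at(times, t):
    """Value of the counting process N(t): number of events up to time t."""
    return sum(1 for s in times if s <= t)


# Hypothetical posterior draws: (baseline rates per interval, multiplier).
rng = random.Random(1)
cuts = [0.0, 365.0, 730.0]
draws = [([0.006, 0.004], rng.gammavariate(2.0, 0.5)) for _ in range(2000)]
counts = sorted(
    count_at(simulate_event_times(cuts, r, m, 730.0, rng), 730.0) for r, m in draws
)
lower = counts[int(0.025 * len(counts))]
upper = counts[int(0.975 * len(counts)) - 1]
```

Repeating the last step over a grid of days gives pointwise bands of the kind shown in Figure 4.7, while averaging the counts instead of taking quantiles gives an estimate of the mean function.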
4.3 Point predictions
Let us consider the Mean Absolute Error (MAE) between some predicted values y∗i and
the respective observed values yi, i = 1, . . . , M:
\[ \mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} |y_i^* - y_i| \tag{4.2} \]
MAE is easily interpretable as the average absolute error between the predictions
and the observed values. In the case of recurrent events yi and y∗i represent days, and so the
forecast error has a clear unit of measurement.
In order to have an intuitive measure of accuracy, point predictions coming from the
posterior predictive distribution of the last donation have been considered.
To have an unbiased measure of the prediction error, the dataset has been divided
into a train set and a test set. The train set is composed of all the donations except the
last one of each donor, who is considered censored after the last but one donation. The
test set is composed of the last donation of each donor. With this division, it is possible to
have an MCMC sample of wi for i = 1, . . . , M, and to evaluate the model using these
individual-specific parameters as well.
Using the model with 10 quantiles-defined cut-points, for each i = 1, . . . , M, an MCMC
sample from L(Ti,ni | Ti,ni−1, data, xi) has been obtained. Then, the mean and the median of
the posterior predictive distribution have been estimated. All the different subsets of
covariates have been considered to evaluate the point predictive accuracy (p = 4, 6, 9).
Summing up:
• MEANp4, MEANp6, MEANp9 are the predictors that use the mean of the posterior
predictive distribution of Ti,ni given Ti,ni−1. The subscripts indicate how
many covariates are used in the test set;
• MEDIANp4, MEDIANp6, MEDIANp9 are the predictors that use the median of
the posterior predictive distribution of Ti,ni given Ti,ni−1. The subscripts indicate
how many covariates are used in the test set.
Moreover, some "naive" predictors have been taken into account:
• NAIVE_MEAN for donor i predicts Ti,ni−1 plus the mean of the gap times
Wi,1 = Ti,2 − Ti,1, ..., Wi,ni−2 = Ti,ni−1 − Ti,ni−2;
• NAIVE_MEDIAN for donor i predicts Ti,ni−1 plus the median of the gap times
Wi,1 = Ti,2 − Ti,1, ..., Wi,ni−2 = Ti,ni−1 − Ti,ni−2;
• NAIVE_MEAN_ALL for donor i predicts Ti,ni−1 plus the mean gap time of all the
donors of the same sex as i;
• NAIVE_MINIMUM for donor i predicts Ti,ni−1 plus the minimum gap time according
to AVIS rules (i.e. 90 days if i is male, and 180 if i is female).
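The four naive predictors listed above can be sketched as follows. The function names, the sex coding ("M"/"F") and the data layout are hypothetical; moreover, the sketch assumes that the mean gap time "of all the donors" is computed by pooling the gap times of every donor of the same sex, which is one possible reading of the definition. Each donor is assumed to have at least two donations in the train set, so that at least one gap time exists.

```python
from statistics import mean, median

MIN_GAP_DAYS = {"M": 90, "F": 180}  # minimum gap times quoted in the text


def gap_times(times):
    """Gap times W between consecutive donation times of one donor."""
    return [b - a for a, b in zip(times, times[1:])]


def naive_mean(train_times):
    return train_times[-1] + mean(gap_times(train_times))


def naive_median(train_times):
    return train_times[-1] + median(gap_times(train_times))


def naive_mean_all(train_times, sex, all_train):
    """all_train: list of (sex, train_times) pairs, one per donor."""
    pooled = [g for s, ts in all_train if s == sex for g in gap_times(ts)]
    return train_times[-1] + mean(pooled)


def naive_minimum(train_times, sex):
    return train_times[-1] + MIN_GAP_DAYS[sex]
```

Here `train_times` is the list of donation times of a donor in the train set, so that its last element plays the role of Ti,ni−1.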
Table 4.3 shows that, among the model-based predictors, MEDIANp6 and MEDIANp9
perform best according to MAE (about 118.2 days). However, they are comparable to the
naive predictor NAIVE_MEAN_ALL (117.81 days), which uses only the last observed
donation time of the donor and the mean gap time of all the donors of the same sex.
Another possible measure of point prediction error is the Root Mean Square Error
(RMSE):
\[ \mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} (y_i^* - y_i)^2} \tag{4.3} \]
By squaring each error, RMSE penalizes large deviations between predictions and
observed values more heavily than MAE does. However, RMSE is not as easily
interpretable as MAE.
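Both error measures, (4.2) and (4.3), are straightforward to compute once predictions and observed values are paired. A minimal sketch, with hypothetical function names:

```python
import math


def mae(pred, obs):
    """Mean Absolute Error between predictions and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root Mean Square Error between predictions and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
```

For example, with predictions (10, 20) and observed values (13, 16) the absolute errors are 3 and 4 days, so MAE = 3.5, while RMSE = sqrt(12.5), which is slightly larger because the bigger error weighs more.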
The posterior mean predictors perform better in terms of RMSE. Moreover, according to
this measure and unlike what happens with MAE, the naive predictors (apart from
NAIVE_MEAN_ALL) do not reach the accuracy of the posterior predictive summaries (see Table 4.3).
PREDICTOR         MAE      RMSE
MEANp4            136.32   227.68
MEANp6            137.96   225.25
MEANp9            137.98   225.21
MEDIANp4          120.21   234.78
MEDIANp6          118.15   230.00
MEDIANp9          118.19   229.96
NAIVE_MEAN        125.79   247.85
NAIVE_MEDIAN      124.24   256.59
NAIVE_MEAN_ALL    117.81   231.18
NAIVE_MINIMUM     133.95   260.44
Table 4.3: Point prediction errors
5 Forecasting new donors
The previous chapters deal with modelling the behaviour of already enrolled donors. To have a complete picture of the number of blood donations in a specific collection center, a time series model for new donors is proposed in this chapter.
First, State Space Models are presented. Then, this family of models is applied to AVIS data in order to estimate the weekly number of new incoming donors.
5.1 State Space Models
State Space Models (SSMs) are widely used in time series analysis. Within this
framework, the time series is decomposed in two parts. The first part represents the
observational level and usually consists of temporally independent specifications of the
elements of the time series. The second part, instead, describes the evolution of the process
at a latent, unobserved level. The unobservable variables introduced are often referred
to as states. The result is a very flexible and general class of latent-variable models
which can be used in many applications.
SSMs were originally introduced to model continuous time series data, but a
straightforward extension to discrete-valued time series has subsequently been developed.
SSMs can also be tackled from a Bayesian perspective. One of the most general classes of
SSMs was introduced by West et al. (1985) and is called the dynamic generalized linear
model (DGLM). Consider the time series y1, . . . , yT, and let EF(µ,φ) denote an exponential
family distribution with mean µ and variance φ c(µ), where c(µ) is a function of the
mean.
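The decomposition for t = 1, . . . , T is given by the equations
\begin{align}
\text{Observation equation:} \quad & y_t \mid x_t, \theta \overset{\text{ind}}{\sim} EF(\mu_t, \phi) \tag{5.1} \\
\text{Link function:} \quad & g(\mu_t) = z_t' x_t \tag{5.2} \\
\text{System equation:} \quad & x_t = G_t x_{t-1} + w_t \tag{5.3} \\
\text{Residual equation:} \quad & w_t \mid \theta \overset{\text{ind}}{\sim} N(0, W) \tag{5.4}
\end{align}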
where
• zt is a known vector at time t, which could possibly include covariates;
• xt is a time-dependent latent state at time t;
• Gt is the matrix that describes the evolution of the latent state;
• θ is a vector of all the hyperparameters (including φ and W).
A prior distribution on the hyperparameters θ and on the initial state x0 completes
the formulation of the model in the Bayesian perspective.
The DGLM considers only linear models at the link relation and at the system evolution
levels; however, linearity is a suitable hypothesis in many applications. The
general formulation of the observation equation allows one to treat both continuous (e.g.
with a Gaussian density) and discrete (with Poisson, Binomial or Negative Binomial
distributions) time series data.
Some useful features (like the level of the series, the local growth and the seasonality)
can be represented within this formulation. This point is discussed further in Section 5.3,
where the employed model is specified.
5.2 Descriptive analysis
As mentioned in the previous chapter, 9175 individuals became donors in the period
from the 1st of January 2010 to the 30th of June 2018. While not all of them
were considered in the analysis of blood donations as recurrent events, in this case
there is no reason to keep some of them out of the analysis. Indeed, in the first case the
goal of the analysis was to estimate the behaviour of the existing donors, and
so an individual who just entered the study without any further donation could not
be considered as drawn from the population of recurrent donors. On the other hand,
each entrance in the study is associated with a whole blood donation and is part of the
blood supply chain, even if performed by a non-recurrent donor.
The time series considered here is the weekly number of new incoming
donors. The data collection period starts on the 1st of January 2010, which is a Friday. As a
consequence yt, with t = 1, . . . , N, is the number of new donors in week t, which goes
from Friday to the subsequent Thursday. The resulting time series has length N = 443.
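The weekly aggregation just described can be sketched as follows: each first-donation date is mapped to the index of the Friday-to-Thursday week it falls in, and the dates are then counted week by week. Function and variable names are hypothetical and the input dates are only illustrative.

```python
from collections import Counter
from datetime import date

START = date(2010, 1, 1)  # first day of data collection, a Friday


def week_index(day):
    """1-based index t of the Friday-to-Thursday week containing `day`."""
    return (day - START).days // 7 + 1


def weekly_counts(first_donation_dates, n_weeks):
    """Series y_1, ..., y_N of new donors per week (weeks with no arrivals count 0)."""
    counts = Counter(week_index(d) for d in first_donation_dates)
    return [counts.get(t, 0) for t in range(1, n_weeks + 1)]
```

With this convention, Thursday the 28th of June 2018 closes week 443, which agrees with the length N = 443 of the series.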
Figure 5.1, which displays boxplots of the weekly arrivals of new donors grouped by
year, shows that this number has grown over the years.
[Figure 5.1: Weekly arrivals of new donors grouped by years (boxplots; x axis: YEAR, 2010 to 2018; y axis: WEEK COUNT).]
A seasonal pattern can be seen in Figure 5.2: the number of new donors declines
in January, August and December.
Figure 5.3 shows the whole time series. It is interesting to observe that some high
peaks appear in 2016 and 2017, possibly because of exceptional events that occurred at
that time. In particular, at the end of August 2016 an earthquake hit Central Italy, causing
injuries and damage. As a consequence, health authorities appealed to people to donate
blood to cope with the emergency.
The empirical distribution of the time series is summarized in Table 5.1. On average
there are about 20 new donors every week.
[Figure 5.2: Weekly arrivals of new donors grouped by months (boxplots; x axis: MONTH, 1 to 12; y axis: WEEK COUNT).]
[Figure 5.3: Time series of the weekly arrivals (x axis: week index, 2010 to 2018; y axis: Weekly arrivals).]
5.3 A Bayesian model for the new donors
In this section we describe the class of models used in this work. The counts of new
arrivals are modelled in the observation equation as Poisson random variables,
independent conditionally on the parameters. For each t the Poisson parameter is
the exponential of the sum of two components. The first component is the trend µt,
while the second is the seasonal effect τt, which has a periodicity of S = 52 weeks, namely a
year. The term δt has the interpretation of the local growth of the µt parameter. All
three classes of parameters µt, δt and τt are modelled a priori as random walks centered
in a linear combination of their past values. In particular, µt is centered in µt−1
with a slope correction given by δt, which in turn is centered in δt−1. As for
the seasonal effect τt, for any t the sum of the components over a period,
\sum_{s=0}^{S-1} \tau_{t-s}, has mean equal to zero. No particular features are available
to be used as covariates.

Summary         Value
Minimum         2
1st Quartile    13
Median          20
3rd Quartile    27
Maximum         99
Mean            20.66
Sd              9.94
Table 5.1: Summaries of the empirical distribution of the time series of the weekly arrivals
The standard deviations of the hidden variables are a priori assumed independent and
marginally uniformly distributed on the interval [0,T], where T has been fixed to 100.
The following model (Model 1) has been implemented in Stan (Stan Development Team
and others, 2016), with 100000 warm-up iterations and 200000 sampling iterations
(thinned every 50 iterations), so that an MCMC sample of 4000 draws has been
obtained.
Summing up:
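\begin{align}
y_t \mid \lambda_t &\overset{\text{ind}}{\sim} \text{Poisson}(\lambda_t), \qquad t = 1, \dots, N \tag{5.5} \\
\log(\lambda_t) &= \mu_t + \tau_t \tag{5.6} \\
\mu_t \mid \mu_{t-1}, \delta_t, \sigma_\mu &\overset{\text{ind}}{\sim} \mathcal{N}(\mu_{t-1} + \delta_t, \sigma_\mu^2) && \text{trend} \tag{5.7} \\
\sigma_\mu &\sim \text{Uniform}([0, T]) \tag{5.8} \\
\delta_t \mid \delta_{t-1}, \sigma_\delta &\overset{\text{ind}}{\sim} \mathcal{N}(\delta_{t-1}, \sigma_\delta^2) && \text{local growth} \tag{5.9} \\
\sigma_\delta &\sim \text{Uniform}([0, T]) \tag{5.10} \\
\tau_t \mid \tau_{t-1}, \dots, \tau_{t-S+1}, \sigma_\tau &\overset{\text{ind}}{\sim} \mathcal{N}\Big(-\sum_{s=1}^{S-1} \tau_{t-s}, \sigma_\tau^2\Big) && \text{season effect} \tag{5.11} \\
\sigma_\tau &\sim \text{Uniform}([0, T]) \tag{5.12}
\end{align}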
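The generative structure of Model 1 can be made concrete with a short forward simulation. The following Python sketch is only an illustration and not the Stan implementation used in this work: it fixes the three standard deviations instead of drawing them from their uniform priors, it reads the mean of the seasonal effect as minus the sum of the previous S − 1 effects (the reading implied by the conditioning set in (5.11)), and the names simulate_model1, sample_poisson, mu0, delta0 and tau_init are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate by counting unit-rate exponential arrivals in [0, lam)."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_model1(n_weeks, period, mu0, delta0, tau_init,
                    sigma_mu, sigma_delta, sigma_tau, seed=0):
    """Forward-simulate (mu_t, delta_t, tau_t, y_t) with fixed standard deviations.

    tau_init holds the period - 1 seasonal effects preceding week 1 (oldest first);
    mu0 and delta0 are the (assumed known) states at week 0.
    """
    if period < 2 or len(tau_init) != period - 1:
        raise ValueError("need period >= 2 and len(tau_init) == period - 1")
    rng = random.Random(seed)
    mu, delta, tau = mu0, delta0, list(tau_init)
    path = []
    for _ in range(n_weeks):
        delta = rng.gauss(delta, sigma_delta)                      # (5.9) local growth
        mu = rng.gauss(mu + delta, sigma_mu)                       # (5.7) trend
        tau_t = rng.gauss(-sum(tau[-(period - 1):]), sigma_tau)    # (5.11) seasonal effect
        tau.append(tau_t)
        lam = math.exp(mu + tau_t)                                 # (5.6) log link
        path.append({"mu": mu, "delta": delta, "tau": tau_t,
                     "y": sample_poisson(lam, rng)})               # (5.5) observation
    return path
```

With all standard deviations set to zero, the trend grows linearly by delta0 per week and the seasonal pattern repeats exactly every S weeks, which is a convenient check of the recursion.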
As can be seen in Figure 5.4, the traceplot of the parameter $\sigma_\delta$ shows that the
chain has a non-negligible autocorrelation.
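The autocorrelation suggested by a traceplot can be quantified with the sample autocorrelation function of the chain. The sketch below is a generic Python illustration, not the diagnostic actually used in this work; the function name autocorrelation is hypothetical, and the input would be the vector of retained draws of the parameter of interest.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a scalar MCMC chain at the given lag."""
    n = len(chain)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(chain)")
    mean = sum(chain) / n
    centred = [x - mean for x in chain]
    variance = sum(c * c for c in centred)
    if variance == 0.0:
        raise ValueError("chain is constant, autocorrelation is undefined")
    # Lagged cross-products, normalised by the lag-0 sum of squares.
    covariance = sum(centred[i] * centred[i + lag] for i in range(n - lag))
    return covariance / variance
```

Values close to 1 at small lags indicate slow mixing, in which case the retained draws carry less information than the same number of independent draws.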
Figure 5.4: Traceplots of the variance parameters of Model 1 (panels: SIGMA_MU, SIGMA_DELTA, SIGMA_TAU)
As an alternative we have considered a second model (Model 2), obtained by removing the
local slope component $\delta_t$. The settings of the sampler are the same as for Model 1,
and the convergence of the chain has been checked.
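\begin{align}
y_t \mid \lambda_t &\overset{\text{ind}}{\sim} \text{Poisson}(\lambda_t), \qquad t = 1, \dots, N \tag{5.13} \\
\log(\lambda_t) &= \mu_t + \tau_t \tag{5.14} \\
\mu_t \mid \mu_{t-1}, \sigma_\mu &\overset{\text{ind}}{\sim} \mathcal{N}(\mu_{t-1}, \sigma_\mu^2) && \text{trend} \tag{5.15} \\
\sigma_\mu &\sim \text{Uniform}([0, T]) \tag{5.16} \\
\tau_t \mid \tau_{t-1}, \dots, \tau_{t-S+1}, \sigma_\tau &\overset{\text{ind}}{\sim} \mathcal{N}\Big(-\sum_{s=1}^{S-1} \tau_{t-s}, \sigma_\tau^2\Big) && \text{season effect} \tag{5.17} \\
\sigma_\tau &\sim \text{Uniform}([0, T]) \tag{5.18}
\end{align}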
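The seasonal prior shared by the two models can be read as a zero-sum constraint perturbed by noise. The short derivation below assumes that the conditional mean of the seasonal effect is minus the sum of the previous S − 1 effects, as implied by the conditioning set in (5.11) and (5.17); the innovation symbol $\varepsilon_t$ is introduced here only for illustration.

```latex
\begin{align*}
\tau_t &= -\sum_{s=1}^{S-1} \tau_{t-s} + \varepsilon_t,
  \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma_\tau^2), \\
\sum_{s=0}^{S-1} \tau_{t-s} &= \varepsilon_t, \\
\tau_t - \tau_{t-S} &= \varepsilon_t - \varepsilon_{t-1}.
\end{align*}
```

Hence the sum of the seasonal effects over any window of S consecutive weeks has mean zero and variance $\sigma_\tau^2$, and when $\sigma_\tau = 0$ the seasonal pattern repeats exactly with period S.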
5.4 Posterior inference
Figures 5.5 and 5.6 display the posterior means of each class of parameters into which
the time series is decomposed under the two models. The series $\{\mu_t,\ t = 1, \dots, N\}$
has an increasing trend, which confirms the rise in the number of new arrivals through the
years observed in Figure 5.1.
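Let $\theta_t^{(s)}$ denote the $s$-th draw of the MCMC sample of the parameter $\theta_t$. An MCMC
sample from the predictive distribution $\mathcal{L}(y_{N+k} \mid \text{data})$, $k \geq 1$, can be
obtained with the following scheme:
• draw $\delta_{N+k}^{(s)}$ from $\mathcal{N}\big(\delta_{N+k-1}^{(s)}, (\sigma_\delta^{(s)})^2\big)$ (or set it equal to 0 in the case of Model 2);
• draw $\mu_{N+k}^{(s)}$ from $\mathcal{N}\big(\mu_{N+k-1}^{(s)} + \delta_{N+k}^{(s)}, (\sigma_\mu^{(s)})^2\big)$;
• draw $\tau_{N+k}^{(s)}$ from $\mathcal{N}\big(-\sum_{p=1}^{S-1} \tau_{N+k-p}^{(s)}, (\sigma_\tau^{(s)})^2\big)$;
• draw $y_{N+k}^{(s)}$ from $\text{Poisson}\big(\exp(\mu_{N+k}^{(s)} + \tau_{N+k}^{(s)})\big)$.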
The sequence $\{y_{N+k}^{(s)} : s = 1, \dots, 4000\}$ is an MCMC sample from the predictive distribution
of $y_{N+k}$. Figure 5.7 shows the credible bands for the prediction of new arrivals for
$k = 1, \dots, 52$. The two models are in agreement. After the 10th week the prediction
starts to oscillate, with an amplitude that grows as the weeks pass. This is due to the
fluctuations of the prediction of the seasonal components, see Figure 5.8. Table 5.2 shows
the numerical values of the prediction for the first 12 weeks, before the oscillations of the
credible bands.
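The predictive scheme of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce the results: the posterior draws (states at week N and standard deviations) are taken as given inputs, the trend uses the freshly drawn local growth as in (5.7), the seasonal mean uses the previous S − 1 effects, the quantiles are crude order statistics, and the names forecast_path, credible_band and sample_poisson are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate by counting unit-rate exponential arrivals in [0, lam)."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def forecast_path(mu, delta, tau_last, sigma_mu, sigma_delta, sigma_tau,
                  horizon, rng, model=1):
    """One predictive path y_{N+1}, ..., y_{N+horizon} built from a single posterior draw.

    mu, delta: the draw's states at week N; tau_last: its last S - 1 seasonal
    effects (oldest first). With model=2 the local growth is fixed at 0.
    """
    if not tau_last:
        raise ValueError("tau_last must contain the last S - 1 seasonal effects")
    tau = list(tau_last)
    width = len(tau)  # S - 1
    ys = []
    for _ in range(horizon):
        delta = rng.gauss(delta, sigma_delta) if model == 1 else 0.0
        mu = rng.gauss(mu + delta, sigma_mu)
        tau_new = rng.gauss(-sum(tau[-width:]), sigma_tau)
        tau.append(tau_new)
        # Note: a very large mu + tau_new makes this step slow (or overflow).
        ys.append(sample_poisson(math.exp(mu + tau_new), rng))
    return ys


def credible_band(paths, lower=0.025, upper=0.975):
    """Per-week (lower quantile, median, upper quantile) across predictive paths."""
    def quantile(sorted_vals, q):
        return sorted_vals[min(len(sorted_vals) - 1, int(q * len(sorted_vals)))]

    bands = []
    for week in zip(*paths):
        vals = sorted(week)
        bands.append((quantile(vals, lower), quantile(vals, 0.5), quantile(vals, upper)))
    return bands
```

Calling forecast_path once per retained posterior draw and passing the resulting list of paths to credible_band gives, for each step forward, a triple of the same kind as the rows of Table 5.2.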
Figure 5.5: Model 1: decomposition of the time series (panels: Trend, Local growth, Seasonal Trend; x-axis: Week)
Figure 5.6: Model 2: decomposition of the time series (panels: Trend, Seasonal Trend; x-axis: Week)
Figure 5.7: Prediction of new weekly arrivals: 95% credibility intervals (panels: (a) Prediction Model 1, (b) Prediction Model 2; x-axis: Week; y-axis: Prediction)
Figure 5.8: Predictive mean of the seasonal component (x-axis: Step; y-axis: Seasonal Trend, predictive mean)
Step forward   q0.025   Median   q0.975
1              6.00     15.00    30.00
2              5.00     15.00    35.00
3              5.00     16.00    41.00
4              4.00     17.00    44.00
5              4.00     16.00    48.00
6              3.00     15.00    49.00
7              3.00     16.00    61.00
8              3.00     16.00    69.00
9              2.00     12.00    56.00
10             1.00     11.00    53.00
11             2.00     17.00    81.00
12             3.00     25.00    138.00
Table 5.2: Prediction of future weekly arrivals
Conclusions and further developments
In this thesis, we have proposed a statistical model to describe and predict recurrent
blood donations. Forecasting the number of arrivals in a blood collection centre is very
important for planning efficiently the storage of this resource in the transfusion centres. A
solution to this problem would benefit the whole healthcare system by improving
the quality of the service from the donors’ point of view, by reducing the costs of the service
and by increasing the number of donations. This work has been possible
thanks to the collaboration of AVIS Milan, which provided the data.
The approach followed in this work has been to include a donor in the study once
he/she donates for the first time in his/her life. Then, all the subsequent donations have
been modelled as a recurrent event process, using the Bayesian approach. Since the focus
was on event counts over time, the intensity function has been modelled in the framework
of the multiplicative model, with a step function as baseline intensity function. The
analysis revealed a decreasing trend in the rate of donations, meaning that a donor has a
higher propensity to donate at the beginning of his/her donor life than after some
time has passed.
Four covariates have been identified as significant: the gender of the donor,
his/her age, smoking habits and the Rhesus factor. However, all these covariates are
time-fixed because they were recorded at the beginning of the study; hence a possible
extension could be to introduce time-dependent features in the model.
The heterogeneity among donors has been captured using random effects in the
intensity function. These parameters are individual-specific and make it possible to
discriminate among donors, since the posterior distribution of each random effect
summarises the reliability of the corresponding donor. Moreover, with this approach it is
possible to customize the prediction for each donor in the sample and, in the case of new
incoming donors, to make predictions that take into account the variability between individuals.
Suspensions from donation could, in principle, be handled by the model. However, the
corresponding data proved to be noisy, and so this phenomenon was not included in the
model formulation. A better understanding of these data would help to formulate the
model so that it handles suspensions properly.
Another question that deserves further investigation is the different deferral time for
women depending on whether or not they are in menopause. Identifying the two sub-populations
could be a way to improve the model, since the mandatory rest time after a donation is a
fundamental part of it.
To have a complete picture of the number of blood donations in a specific blood
collection centre, the time series of new donor arrivals has been modelled. However, this
part of the work should be regarded as preliminary, and indeed some issues
arose. For example, Stan has been used to make posterior inference, but a
more suitable MCMC algorithm should be used (e.g. particle filter methods). Moreover,
covariates were not included in this model, but appropriate features could reduce the
variability of the prediction. The resulting predictions were not satisfactory, since the
credible bands oscillate because of the seasonal components. An improvement
of the proposed model should include a theoretical study of the properties of the model to
understand this phenomenon.
Bibliography
Arjas, E. and Gasbarra, D. (1994).
Nonparametric Bayesian inference from right censored survival data, using the Gibbs
sampler.
Statistica sinica, pages 505–524.
Banerjee, S., Wall, M. M., and Carlin, B. P. (2003).
Frailty modeling for spatially correlated survival data, with application to infant
mortality in Minnesota.
Biostatistics, 4(1):123–142.
Bas Güre, S., Carello, G., Lanzarone, E., and Yalçındag, S. (2018).
Unaddressed problems and research perspectives in scheduling blood collection from
donors.
Production Planning & Control, 29(1):84–90.
Bosnes, V., Aldrin, M., and Heier, H. E. (2005).
Predicting blood donor arrival.
Transfusion, 45(2):162–170.
Cook, R. J. and Lawless, J. (2007).
The statistical analysis of recurrent events.
Springer Science & Business Media.
Flegel, W., Besenfelder, W., and Wagner, F. (2000).
Predicting a donor’s likelihood of donating within a preselected time interval.
Transfusion Medicine, 10(3):181–192.
Gamerman, D., Abanto-Valle, C. A., Silva, R. S., and Martins, T. G. (2015).
Dynamic Bayesian models for discrete-valued time series.
Handbook of Discrete-Valued Time Series, pages 165–186.
Gelman, A., Hwang, J., and Vehtari, A. (2014).
Understanding predictive information criteria for Bayesian models.
Statistics and computing, 24(6):997–1016.
Gianoli, I. (2016).
Analysis of gap times of recurrent blood donations via Bayesian nonparametric models.
MSc. Thesis, Politecnico di Milano.
Gustafson, P., Aeschliman, D., and Levy, A. R. (2003).
A simple approach to fitting Bayesian survival models.
Lifetime data analysis, 9(1):5–19.
Henderson, R., Shimakura, S., and Gorst, D. (2002).
Modeling spatial variation in leukemia survival data.
Journal of the American Statistical Association, 97(460):965–972.
James, R. and Matthews, D. (1996).
Analysis of blood donor return behaviour using survival regression methods.
Transfusion medicine, 6(1):21–30.
Johnson, W., Branscum, A., Hanson, T. E., and Christensen, R. (2010).
Bayesian ideas and data analysis: an introduction for scientists and statisticians.
CRC Press.
Kalbfleisch, J. D. (1978).
Non-parametric Bayesian analysis of survival time data.
Journal of the Royal Statistical Society: Series B (Methodological), 40(2):214–221.
Li, Y. and Ryan, L. (2002).
Modeling spatial survival data using semiparametric frailty models.
Biometrics, 58(2):287–297.
Ministero Della Salute (2015).
Disposizioni relative ai requisiti di qualità e sicurezza del sangue e degli emocompo-
nenti.
Gazzetta Ufficiale.
Ouyang, B., Sinha, D., Slate, E. H., and Van Bakel, A. B. (2013).
Bayesian analysis of recurrent event with dependent termination: an application to a
heart transplant study.
Statistics in medicine, 32(15):2629–2642.
Ownby, H., Kong, F., Watanabe, K., Tu, Y., Nass, C. C., and Study, R. E. D. (1999).
Analysis of donor return behavior.
Transfusion, 39(10):1128–1135.
Pennell, M. L. and Dunson, D. B. (2006).
Bayesian semiparametric dynamic frailty models for multiple event time data.
Biometrics, 62(4):1044–1052.
Sahu, S. K., Dey, D. K., Aslanidou, H., and Sinha, D. (1997).
A Weibull regression model with gamma frailties for multivariate survival data.
Lifetime data analysis, 3(2):123–137.
Soyer, R., Aktekin, T., and Kim, B. (2015).
Bayesian modeling of time series of counts with business applications.
Handbook of Discrete-Valued Time Series, Davis RA, Holan SH, Lund R, Ravishanker N, pages 245–264.
Stan Development Team and others (2016).
Stan modeling language users guide and reference manual.
Technical report.
Vehtari, A., Gelman, A., and Gabry, J. (2017).
Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.
Statistics and Computing, 27(5):1413–1432.
West, M., Harrison, P. J., and Migon, H. S. (1985).
Dynamic generalized linear models and Bayesian forecasting.
Journal of the American Statistical Association, 80(389):73–83.
Yin, G., Ibrahim, J. G., et al. (2006).
Bayesian transformation hazard models.
In Optimality, pages 170–182. Institute of Mathematical Statistics.