Shared Frailty Survival Analysis
Using Semiparametric Bayesian Method
Shaban A. Shaban1,* and Ayman A. Mostafa2,**
1Department of Mathematical Statistics,
Institute of Statistical Studies and Research
2Department of Technical Accounting General Insurance
Egyptian Insurance Supervisory Authority
28, Talaat Harb St., Cairo P.O. Box: 2545
* [email protected] ** [email protected]
SUMMARY. In survival data analysis, the proportional hazard model was introduced
by Cox (1972) in order to estimate the effects of different covariates influencing the
time-to-event data. The proportional hazard model has been used extensively in
biomedicine, reliability engineering and, recently, interest in its application in different
areas of knowledge has increased. However, the proportional hazards model makes a
number of assumptions, which may be violated. The objective of this article is to present a
Bayesian analysis of survival models with frailty under an additive framework for the
hazard function, in contrast to the proportional hazards model. Frailty models in survival
analysis deal with the unobserved heterogeneity among subjects. The Gibbs sampling
technique is used to assess the posterior quantities of interest. An illustrative analysis
within the context of survival time data is given.
KEY WORDS: Survival Analysis, Regression Models, Additive Survival Analysis,
Bayesian Inference, Frailty Models, BUGS
1. Introduction
Survival data is a term used for describing data that measure the time to some event.
Statistical models and methods for such data and other time-to-event data are
extensively used in many fields, including the biomedical sciences, engineering, the
environmental sciences, economics, actuarial sciences, management, and social
sciences.
In survival analysis, the additive, the multiplicative and the general additive-multiplicative hazard models provide the three principal frameworks for studying the association between covariates and the survival time. The hazard function, also called the risk or intensity function, of a survival time T associated with a p-vector of covariates x is defined as
$h(t \mid x) = f(t \mid x) / [1 - F(t \mid x)],$
where $f(\cdot \mid x)$ and $F(\cdot \mid x)$ are the density function and the distribution function, respectively, of the random variable T conditioned on the vector of covariates x. The function $S(t \mid x) = 1 - F(t \mid x)$ is called the survival function.
Under the additive hazard model (Lin and Ying, 1994; Beamonte and Bermúdez, 2003), the hazard function takes the form
$h(t \mid x) = h_0(t) + \beta_0' x,$ (1.1)
under the multiplicative hazard model (Cox, 1972), it takes the form
$h(t \mid x) = h_0(t) \exp(\alpha_0' x),$ (1.2)
and under the class of general additive-multiplicative models (Lin and Ying, 1995; Dunson and Herring, 2004), it takes the form
$h(t \mid Z) = g\{\beta_0' R\} + h_0(t)\, h\{\alpha_0' x\},$ (1.3)
where $h_0(t)$ is an unspecified “baseline hazard function”, $Z = (R', x')'$ is a p-vector of covariates and $\theta_0 = (\beta_0', \alpha_0')'$ (say) is a p-vector of unknown regression parameters. The covariate Z can be time-dependent, and $g\{\cdot\}$ and $h\{\cdot\}$ are known link functions. Model (1.3) encompasses both models (1.1) and (1.2): taking g as the identity and $h \equiv 1$ gives (1.1), while taking $g \equiv 0$ and $h = \exp$ gives (1.2).
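The nesting of (1.1) and (1.2) inside (1.3) can be checked numerically. The following is a minimal sketch only; the function names and the numeric values are illustrative assumptions and do not come from the article.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def additive_hazard(h0_t, beta, x):
    """Model (1.1): baseline hazard plus a linear predictor."""
    return h0_t + dot(beta, x)

def multiplicative_hazard(h0_t, alpha, x):
    """Model (1.2): baseline hazard scaled by exp(linear predictor)."""
    return h0_t * math.exp(dot(alpha, x))

def additive_multiplicative_hazard(h0_t, beta, r, alpha, x, g, h):
    """Model (1.3) with user-supplied link functions g and h."""
    return g(dot(beta, r)) + h0_t * h(dot(alpha, x))

# Illustrative (made-up) values, not taken from the article.
h0_t, coef, cov = 0.2, [0.05, -0.01], [1.0, 3.0]

# g = identity and h = 1 recover the additive model (1.1).
as_additive = additive_multiplicative_hazard(
    h0_t, coef, cov, [0.0, 0.0], cov, g=lambda u: u, h=lambda u: 1.0)

# g = 0 and h = exp recover the multiplicative model (1.2).
as_multiplicative = additive_multiplicative_hazard(
    h0_t, [0.0, 0.0], cov, coef, cov, g=lambda u: 0.0, h=math.exp)
```

Both special cases use the same covariate vector for R and x, which is the simplest way to compare them with (1.1) and (1.2).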
In many applications, such as in the biomedical sciences, the survival time T is often subject to right censoring because certain patients may still be alive at the end of the study period. Furthermore, owing to the complexity of the biological process, it is desirable not to parameterize $h_0(t)$; therefore, only semiparametric inference has been used for models (1.1), (1.2) and consequently (1.3).
In order to draw semiparametric inference for model (1.2), Cox (1972, 1975) introduced the partial likelihood approach to estimate the regression parameter vector $\alpha_0$. Since the proportional hazards assumptions are often violated, the need for more flexible models motivates the introduction of models (1.1) and (1.3). Model (1.2) has also been approached from a Bayesian perspective; the article by Sinha and Dey (1997) and the book on Bayesian survival analysis by Ibrahim et al. (2001) contain an excellent survey.
Although the additive hazard model (1.1) has been advocated and successfully utilized by numerous authors (e.g., Buckly, 1984; McKeague and Sasieni, 1994, and other references therein), no satisfactory semiparametric methods of estimation have been developed, because the partial likelihood approach cannot be used directly to eliminate the nuisance function $h_0(\cdot)$ in estimating $\beta_0$.
The three models defined in expressions (1.1), (1.2) and (1.3) treat the data as if all the individuals in the sample (conditionally on the vector of covariates) were drawn from a single homogeneous population. Frequently, however, there is heterogeneity in the population that the available covariates do not properly explain. We now consider generalizations of model (1.1) that allow for such unexplained heterogeneity (unobserved individual effects). These effects are usually referred to as ‘frailty’ in the biomedical sciences. The notion of frailty provides a convenient way to introduce random effects, association and unobserved heterogeneity into models for survival data. In its simplest form, a frailty is an unobserved proportionality factor that modifies the hazard function (1.1). As discussed in Sinha (1993), the idea of frailty, which was introduced by Vaupel et al. (1979), is particularly natural in the context of the proportional hazards model (1.2). In some situations the extra random frailty component of the proportional hazards model is required only to obtain correct inference on the fixed effects of covariates, whereas in other cases the distribution of the random subject effect may itself be one of the major interests.
In many applications, the study population cannot be assumed to be homogeneous but must be considered a heterogeneous sample, i.e., a mixture of individuals with different hazards. For example, it is often impossible to measure all relevant covariates related to the event of interest, sometimes because the importance of some covariates is still unknown and sometimes for economic reasons. The frailty approach is therefore a statistical modeling concept that aims to account for heterogeneity caused by unmeasured covariates.
This article focuses on the shared frailty model in the additive hazard model (1.1).
The shared frailty model is relevant to event times of related individuals, similar
members and repeated measurements (parallel data). Individuals in a group (cluster) are
assumed to share the same frailty, which is why this model is called “shared frailty
model”. It was introduced by Clayton (1978) and extensively studied in Hougaard
(2000). The survival times are assumed to be conditionally independent given the shared (common) frailty. Most research in this area has focused on the multiplicative model (1.2) with frailty, from both the classical approach (e.g., Vu, 2003) and the Bayesian approach (e.g., Clayton, 1991). By contrast, there has been little consideration of the additive model (1.1) or of additive-multiplicative models in the Bayesian literature. Notable exceptions are the article by Beamonte and Bermúdez (2003), which proposed an approach for Bayesian inference in an additive Gamma-polygonal hazard model, and the recent article by Dunson and Herring (2003), which focused on the problem of variable selection and inference, first in model (1.1) and then in the more general model (1.3). These recent articles did not consider frailty for additive or additive-multiplicative hazard models.
In this article, we propose a hierarchical model in which the shared frailty is assumed to follow a Gamma distribution and the hazard function is given by model (1.1). We take the parametric part of (1.1), $\beta_0' x$, to be an Exponential hazard function specific to each individual but associated with the covariates, which are assumed to be time-independent, through a probabilistic model. This approach is especially appropriate for semiparametric Bayesian inference with frailty.
Fully Bayesian computation for the hierarchical model is carried out by simulation, using Markov chain Monte Carlo (MCMC) algorithms.
Section 2 describes the proposed model. Section 3 introduces some notation and
then derives the likelihood function. Section 4 derives the conditional posterior
distributions of the model’s unknown parameters, and Section 5 illustrates the
methodology with a well-known data set. Appendix A gives detailed proofs of some
of the results. Appendix B gives the BUGS code that was used to obtain the results of
the example.
2. The Additive Exponential-Piecewise Linear Hazard Model with the Shared Frailty
We consider an analysis of multiple event data in which there are n groups (clusters), either individual subjects or groups of subjects; the ith cluster has $m_i$ individuals and is associated with an unobserved frailty $w_i$, 1 ≤ i ≤ n. The jth individual in the ith cluster, 1 ≤ j ≤ $m_i$, is associated with the fixed covariate vector $x_{ij}$. Individuals are assigned to the same cluster because they are related somehow, say by family association or geographic location. Conditional on the frailties $w_i$, the complete survival times are assumed to be independent. For convenience, from now on we suppress the subscripts indexing individuals and consider the model
$h(t \mid w, x) = w\,[h_0(t) + h_1(t \mid x)], \quad t \ge 0,$ (2.1)
where $h(t \mid w, x)$ represents a hazard function that has been modified by the inclusion of a frailty. The frailty random variable w is assumed to be independent of t and x for all clusters and to follow some parametric distribution with unit mean (when the mean is assumed to be finite), usually Gamma (Clayton, 1991), where the unknown variance of w (say, η) quantifies the amount of heterogeneity among individuals. That is, we may assume that
$w_i \mid \eta \sim \mathrm{Ga}(w_i \mid \eta^{-1}, \eta^{-1}),$ (2.2)
that is, given η, each $w_i$, i = 1,2,…,n, follows a Gamma distribution with shape parameter $\eta^{-1}$ and rate (inverse scale) parameter $\eta^{-1}$, so that it has mean 1 and variance η. It is important to keep in mind the interpretation of w in expression (2.1). The frailty random variable w measures the random susceptibility of the ith cluster to the event of interest after eliminating the effect of the covariates. In other words, if the value of the frailty in (2.1) is greater than one, the individual has a larger than average hazard and is said to be more ‘frail’, and vice versa. That is why, for a finite-mean frailty, we need to assume unit mean (to ensure identifiability), and we need to assume that the frailty distributions of individuals at different covariate levels have the same mean, although they may have different variability.
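The unit-mean property of the Gamma frailty in (2.2) can be illustrated by simulation. This is a sketch only: the value of η is an arbitrary assumption, and note that Python's `random.gammavariate` is parameterized by shape and scale, so the scale is set to η (the reciprocal of the rate).

```python
import random

def sample_frailties(eta, n, seed=0):
    """Draw n frailties from a Gamma with shape 1/eta and rate 1/eta.

    random.gammavariate takes (shape, scale), so scale = 1/rate = eta.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(1.0 / eta, eta) for _ in range(n)]

eta = 0.5  # illustrative frailty variance, not an estimate from the article
w = sample_frailties(eta, 100_000)

sample_mean = sum(w) / len(w)
sample_var = sum((wi - sample_mean) ** 2 for wi in w) / (len(w) - 1)
print(round(sample_mean, 2), round(sample_var, 2))
```

The sample mean should be close to one and the sample variance close to η, which is the sense in which η measures heterogeneity among clusters.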
The non-parametric part of model (2.1), $h_0(t)$, is assumed to be ‘a piecewise linear hazard’. An ordinary piecewise constant hazard (see, e.g., Gamerman, 1991), which is an example of a semiparametric hazard specification, has the advantage of being a simple way to obtain a flexible hazard function with simple estimation. On the other hand, it has the major disadvantage that the hazard is not continuous as a function of time, as there are jumps at the interval end points. The piecewise linear hazard is used in order to avoid the discontinuity of the ordinary piecewise Exponential model (Gamerman, 1991). To construct this model, we first split the time axis into intervals $0 = a_0 < a_1 < \dots < a_g$, where g is the number of intervals of observation time, i.e., $a_g > t_{ij}$ for all i = 1,2,…,n and j = 1,2,…,$m_i$. The hazard in the interval $I_k = (a_{k-1}, a_k]$ is $\lambda_{00} + (t - a_{k-1})\,\lambda_{0k}\, I(a_{k-1} < t)$. Thus, the hazard function can be described as
$h_0(t) = \lambda_{00} + \sum_{k=1}^{g} (t - a_{k-1})\,\lambda_{0k}\, I(a_{k-1} < t),$ (2.3)
by means of an indicator of death $\delta_{ijk} = \delta_{ij}\, I[a_{k-1} < t_{ij}]$ for the jth individual in the ith cluster throughout the kth interval, and the observation time $t_{ijk}$ in the interval. This quantity equals
$t_{ijk} = t_{ij} - a_{k-1}$ if $t_{ij} > a_{k-1}$, and $t_{ijk} = 0$ if $t_{ij} \le a_{k-1}$.
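The continuity of the piecewise linear hazard at the interval end points can be checked with a short sketch of (2.3). The knot grid and the λ values below are hypothetical and serve only as an illustration.

```python
def baseline_hazard(t, lam00, lam, knots):
    """Piecewise linear baseline hazard in the form of (2.3).

    knots = [a_0, a_1, ..., a_g] with a_0 = 0; lam[k-1] holds lambda_0k.
    """
    h = lam00
    for k in range(1, len(knots)):
        a_prev = knots[k - 1]
        if a_prev < t:                      # indicator I(a_{k-1} < t)
            h += (t - a_prev) * lam[k - 1]
    return h

knots = [0.0, 5.0, 10.0, 20.0]   # a_0 < a_1 < a_2 < a_3 (hypothetical grid)
lam00, lam = 0.05, [0.01, 0.02, 0.005]

# Size of the jump in the hazard at each interior knot (should be negligible).
eps = 1e-9
jumps = [abs(baseline_hazard(a + eps, lam00, lam, knots)
             - baseline_hazard(a - eps, lam00, lam, knots))
         for a in knots[1:-1]]
```

Each slope term enters at zero when its interval opens, so the hazard changes slope at a knot but does not jump, unlike a piecewise constant hazard.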
In general, the survival function is related to the hazard function through the expression $S(t) = \exp[-H(t)]$, where the integrated hazard function, $H(t) = \int_0^t h(u)\,du$, equals $-\ln[S(t)]$. Hence, given the relationship between the hazard and the survival function, it can be shown that the individual survival function modified by the inclusion of a frailty, given the parameters $w = (w_1, w_2, \dots, w_n)$, $\lambda_0 = (\lambda_{00}, \lambda_{01}, \dots, \lambda_{0g})$ and θ, is
$S(t \mid w, \lambda_0, \theta) = [S_0(t \mid \lambda_0)\, S_1(t \mid \theta)]^{w},$ (2.4)
where $S_0(t)$ and $S_1(t)$ are, respectively, the survival functions related to the piecewise linear hazard function and the Exponential hazard function, i.e.,
$S_0(t \mid \lambda_0) = \exp\big[-\int_0^t h_0(u)\,du\big] = \exp\big[-\sum_{k=0}^{g} c_k(t)\,\lambda_{0k}\big],$
$S_1(t \mid \theta) = \int_t^{\infty} \frac{1}{\theta} \exp(-u/\theta)\,du = \exp(-t/\theta),$
where $c_k(t)$ are positive statistics. Therefore, the density function takes the form
$f(t \mid w, \lambda_0, \theta) = h(t \mid w, \lambda_0, \theta)\, S(t \mid w, \lambda_0, \theta) = w\,[h_0(t \mid \lambda_0) + h_1(t \mid \theta)]\,[S_0(t \mid \lambda_0)\, S_1(t \mid \theta)]^{w}.$ (2.5)
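The relationship between (2.3), (2.4) and the integrated hazard can be verified numerically. In the sketch below, the explicit forms $c_0(t) = t$ and $c_k(t) = (t - a_{k-1})^2/2$ for $a_{k-1} < t$ are an assumption obtained by integrating (2.3) term by term (the article only states that the $c_k(t)$ are positive statistics), and $h_1 = 1/\theta$ is the Exponential hazard implied by $S_1$. All numeric values are hypothetical.

```python
import math

def c_stats(t, knots):
    """c_0(t) = t and c_k(t) = (t - a_{k-1})^2 / 2 if a_{k-1} < t, else 0."""
    c = [t]
    for k in range(1, len(knots)):
        a_prev = knots[k - 1]
        c.append(0.5 * (t - a_prev) ** 2 if a_prev < t else 0.0)
    return c

def survival(t, w, lam0, theta, knots):
    """S(t | w, lambda_0, theta) = [S_0(t) S_1(t)]^w as in (2.4)."""
    s0 = math.exp(-sum(ck * lk for ck, lk in zip(c_stats(t, knots), lam0)))
    s1 = math.exp(-t / theta)
    return (s0 * s1) ** w

def hazard(t, w, lam0, theta, knots):
    """h(t | w) = w [h_0(t) + 1/theta], with h_0 taken from (2.3)."""
    h0 = lam0[0] + sum((t - knots[k - 1]) * lam0[k]
                       for k in range(1, len(knots)) if knots[k - 1] < t)
    return w * (h0 + 1.0 / theta)

# Hypothetical parameter values chosen only for illustration.
knots, lam0 = [0.0, 5.0, 10.0, 20.0], [0.05, 0.01, 0.02, 0.005]
theta, w = 8.0, 1.3

# A midpoint-rule integral of the hazard should reproduce -log S(t).
t_end, steps = 12.0, 120_000
dt = t_end / steps
cum_hazard = sum(hazard((i + 0.5) * dt, w, lam0, theta, knots)
                 for i in range(steps)) * dt
```

Comparing `math.exp(-cum_hazard)` with `survival(t_end, w, lam0, theta, knots)` confirms that the closed form and the numerical integral agree.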
We assume that the parameter θ is specific to each individual in the population, but related to the covariates x through a probabilistic model. In order to facilitate implementation, it is convenient to assume that
$\log\theta \mid x \sim N(\log\theta \mid \beta x, \sigma_\theta^2),$ (2.6)
that is, given x, the logarithm of the mean, log θ, follows a Normal distribution with mean βx, a linear combination of the covariate effects, where β = (β1,β2,…,βp) and x is the N×p matrix with rows x1,x2,…,xN, and with variance $\sigma_\theta^2$. The hyperparameters β and $\sigma_\theta^2$ are unknown constants common to all individuals in the population. Expression (2.6) is equivalent to saying that, given the hyperparameters, the mean of the Exponential distribution is log-Normally distributed.
The hierarchical model given by the two stages (2.5) and (2.6) allows complete heterogeneity (‘frailty’) in the population, so that two different individuals with the same covariate vector need not have identical hazard functions. The first stage (2.5) describes the data through a parametric model given a vector of individual-specific parameters, to be estimated later, while the second stage (2.6) accounts for cross-sectional (between-subject) heterogeneity of the parameter vector θ = {θij} (i = 1,2,…,n and j = 1,2,…,mi), where θij denotes the expected survival time of the jth individual in cluster i. The hierarchical representation of the model enables us to use MCMC methodology, which allows a Bayesian analysis of the problem.
3. The Likelihood Specification Using the Counting Process Approach
Suppose that the survival time $T_{ij}$ of the jth individual in the ith cluster is an absolutely continuous random variable, conditionally independent of a right censoring time $Z_{ij}$ given the covariates $x_{ij}$ and frailty $w_i$. Let $V_{ij} = \min(T_{ij}, Z_{ij})$ and $\delta_{ij} = I(T_{ij} \le Z_{ij})$ denote the time to the end-point event and the indicator that the event of interest takes place, respectively. Suppose that $(V_{ij}, \delta_{ij}, x_{ij}, w_i)$ are i.i.d. for i = 1,2,…,n, j = 1,2,…,$m_i$, and that the conditional hazard function of $T_{ij}$ given $x_{ij}$ and $w_i$ satisfies the additive Exponential-piecewise linear hazard model.
For subject j in cluster i, let $N_{ij}(t) = 1$ if $\delta_{ij} = 1$ in the interval [0,t] and $N_{ij}(t) = 0$ otherwise, and let $Y_{ij}(t) = 1$ if the subject is still exposed to risk at time t and $Y_{ij}(t) = 0$ otherwise. Hence, we have a set of $N = \sum_{i=1}^{n} m_i$ subjects such that the counting process $\{N_{ij}(t);\ t \ge 0\}$ for the jth subject in the ith cluster records the number of observed events up to time t. Letting $dN_{ij}(t)$ denote the increment of $N_{ij}(t)$ over the small interval [t,t+dt), the likelihood of the data conditional on $w_i$ is proportional to
$\prod_{i=1}^{n} \prod_{j=1}^{m_i} \Big( \prod_{t \ge 0} \big\{ Y_{ij}(t)\, w_i\, [h_0(t \mid \lambda_0) + h_1(t \mid \theta)] \big\}^{dN_{ij}(t)} \Big) \times \exp\Big( -\int_{t \ge 0} Y_{ij}(t)\, w_i\, [h_0(t \mid \lambda_0) + h_1(t \mid \theta)]\, dt \Big).$ (3.1)
Since we allow each $N_{ij}(t)$ to take at most one jump for each subject, the $dN_{ij}(t)$ contribute to the likelihood in the same manner as independent Poisson random variables, even though $dN_{ij}(t) \le 1$ for all i, j and t.
Suppose that, as described in Section 2, the time axis [0,∞) is partitioned into g + 1 disjoint intervals $I_1, I_2, \dots, I_{g+1}$, where $I_k = [a_{k-1}, a_k)$ for k = 1,2,…,g+1, with $a_0 = 0$ and $a_{g+1} = \infty$. In the kth interval, given $w_i$, the jth subject in the ith cluster has a hazard of the form $w_i\{h_0(t_{ij} \mid \lambda_{0k}) + h_1(t_{ij} \mid \theta_{ij})\}$, k = 1,2,…,$g_{ij}$, where $g_{ij}$ denotes the number of intervals of the time axis for the jth subject in the ith group. Given the complete data (T, w), where T = {$t_{ij}$ : i = 1,2,…,n; j = 1,2,…,$m_i$} and w = ($w_1$,…,$w_n$), the likelihood (3.1) can be re-expressed as
$\prod_{i=1}^{n} \prod_{j=1}^{m_i} \prod_{k=1}^{g_{ij}} \Big( \prod_{t \in (a_{k-1}, a_k]} \big\{ Y_{ij}(t)\, w_i\, [h_0(t \mid \lambda_0) + h_1(t \mid \theta)] \big\}^{dN_{ijk}} \Big) \times \exp\Big( -\int_{t \ge 0} Y_{ij}(t)\, w_i\, [h_0(t \mid \lambda_0) + h_1(t \mid \theta)]\, dt \Big),$ (3.2)
where $dN_{ijk}$ is the change in the counting process for the jth subject in the ith group in interval k. Under the assumption that the risk accumulated in the interval $I_k$ is small, i.e.,
$\int_{a_{k-1}}^{a_k} Y_{ij}(t)\, \{h_0(t \mid \lambda_0) + h_1(t \mid \theta)\}\, dt \approx 0$ for all i, j, k. (3.3)
The likelihood contribution across this interval for individuals at risk is approximately
$\Big\{ w_i \Big[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \Big] \Big\}^{dN_{ijk}} \times \exp\Big( -w_i \Big[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \Big] \Big),$ (3.4)
where $dH_{0k} = \int_{a_{k-1}}^{a_k} h_0(t)\,dt$ is the usual cumulative baseline intensity for the kth interval.
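The interval integral $dH_{0k}$ can be written out explicitly. As a worked step, assuming that the baseline hazard on $I_k$ has the linear form $\lambda_{00} + (t - a_{k-1})\lambda_{0k}$ stated in Section 2:

```latex
\begin{aligned}
dH_{0k} &= \int_{a_{k-1}}^{a_k} \bigl[\lambda_{00} + (t - a_{k-1})\,\lambda_{0k}\bigr]\,dt \\
        &= \lambda_{00}\,(a_k - a_{k-1}) + \tfrac{1}{2}\,\lambda_{0k}\,(a_k - a_{k-1})^2 .
\end{aligned}
```

Under this assumption the Poisson mean in (3.4) splits into three additive pieces, one for $\lambda_{00}$, one for $\lambda_{0k}$ and one for $1/\theta_{ij}$, each multiplied by $w_i$.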
Hence, the likelihood (3.4) is essentially Poisson in form, reflecting the fact that the likelihood may be thought of as generated by independent contributions of many data ‘atoms’, each concerned with the observation of an individual over a very short interval during which the intensity may be regarded as constant and approximately zero (for a review of this point, see Clayton, 1994). Therefore, we replace (3.4) with
$\prod_{i=1}^{n} \prod_{j=1}^{m_i} \prod_{k:\, Y_{ijk} = 1} \Big\{ w_i \Big[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \Big] \Big\}^{dN_{ijk}} \times \exp\Big( -w_i \Big[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \Big] \Big),$ (3.5)
where $Y_{ijk} = 1$ if the jth subject in the ith group is exposed to risk at time $t \in (a_{k-1}, a_k]$, and $Y_{ijk} = 0$ otherwise.
4. Prior Distribution
To complete a Bayesian specification of the model, prior distributions have to be specified for the parameter vector $\lambda_0$ and for the hyperparameters β, $\sigma_\theta^2$ and η. It seems natural to assume independent priors for $\lambda_0$, (β, $\sigma_\theta^2$) and η. For $\lambda_0 = (\lambda_{00}, \lambda_{01}, \dots, \lambda_{0g})'$, we assume independent Gamma priors, i.e.,
$\lambda_{0k} \sim \mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k}), \quad k = 1, 2, \dots, g_{ij},$ (4.1)
where $a_{0k}/b_{0k}$ is the prior expectation of $\lambda_{0k}$ and $a_{0k}/b_{0k}^2$ is the prior variance, with prior independence assumed across the intervals; hence
$\lambda_0 \sim \prod_{k=1}^{g_{ij}} \mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k}).$ (4.2)
For (β, $\sigma_\theta^2$), we choose the usual Normal-Inverse Gamma conjugate priors, i.e.,
$\beta \mid \sigma_\theta^2 \sim N_p(\beta \mid m_\theta, \sigma_\theta^2 V_\theta),$ (4.3)
with
$\sigma_\theta^2 \sim \mathrm{Ga}(1/\sigma_\theta^2 \mid a_\theta, b_\theta).$ (4.4)
Finally, we suggest a Gamma distribution as a prior for η, i.e.,
$\eta \sim \mathrm{Ga}(\phi_1, \phi_2),$ (4.5)
where $\phi_1/\phi_2$ is the prior expectation of η and $\phi_1/\phi_2^2$ is the prior variance.
4.1 Data Augmentation and Gibbs Sampler
To derive the conditional posterior distributions, we use the approach of ‘data augmentation’ (Tanner and Wong, 1987). The idea of data augmentation is to augment the observed data with so-called latent or missing data, in order to exploit the simplicity of the resulting conditional posterior distributions of the parameters of interest. Although this increases the dimensionality of the problem (possibly at the expense of extra computing time), the Gibbs sampler will be seen to be simple, as follows.
First note that the likelihood (3.4) is essentially Poisson in form, reflecting the fact that it may be thought of as generated by independent contributions of many data ‘atoms’, each concerned with the observation of individual j of cluster i over a very short interval during which the intensity may be regarded as constant, i.e.,
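$P(N_{ijk} = n \mid dH_{0k}, x_{ij}, \theta_{ij}, w_i, Y_{ij} = 1) = \frac{1}{n!} \Big\{ w_i \Big[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \Big] \Big\}^{n} \times \exp\Big( -w_i \Big[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \Big] \Big).$ (4.6)
Hence we have
$dN_{ijk} \overset{\text{ind}}{\sim} \mathrm{Poisson}\Big[ dN_{ijk} \,\Big|\, w_i\, dH_{0k} + \frac{w_i}{\theta_{ij}} (a_k - a_{k-1}) \Big].$ (4.7)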
Since the additive form of the Poisson mean does not yield conditional posterior distributions in closed form, we solve this problem by re-expressing (4.7) in an augmented form involving independent Poisson latent variables (unobserved or missing data), one for each term in the expression for the Poisson mean. In particular, we assume
$dN_{ijk} = dN_{ijk0} + dN_{ijk2} + dN_{ijk1}$, for all i, j: $Y_{ij} = 1$, (4.8)
such that
$dN_{ijk0} \sim \mathrm{Poisson}[dN_{ijk0} \mid w_i\, \lambda_{00}\, (a_k - a_{k-1})],$
$dN_{ijk2} \sim \mathrm{Poisson}[dN_{ijk2} \mid \tfrac{w_i}{2}\, \lambda_{0k}\, (a_k - a_{k-1})^2], \quad k = 1, 2, \dots, g,$
$dN_{ijk1} \sim \mathrm{Poisson}[dN_{ijk1} \mid \tfrac{w_i}{\theta_{ij}}\, (a_k - a_{k-1})].$
Using the property that the sum of independent Poisson random variables is also Poisson, it is straightforward to show that (4.8) is equivalent to (4.7). This representation allows us to take advantage of Poisson-Gamma conjugacy to obtain simple conditional posteriors wherever possible. Some of the derivations of these conditional distributions are outlined in Appendix A. The sampler iterates through the following steps:
Step 1. Sample the latent variables $(dN_{ijk0}, dN_{ijk2}, dN_{ijk1})'$, for all i, j, k: $Y_{ij} = 1$, jointly from their full conditional posterior distribution (A.3) as follows:
1. If $dN_{ijk} = 0$, then let $dN_{ijk0} = dN_{ijk2} = dN_{ijk1} = 0$.
2. If $dN_{ijk} > 0$, then sample $(dN_{ijk0}, dN_{ijk2}, dN_{ijk1})$ from Multinomial($dN_{ijk} \mid P_{ijk0}, P_{ijk2}, P_{ijk1}$), where, after cancelling the common factor $w_i$,
$P_{ijk0} = \lambda_{00}\,(a_k - a_{k-1}) / D_{ijk},$
$P_{ijk2} = \tfrac{1}{2}\,\lambda_{0k}\,(a_k - a_{k-1})^2 / D_{ijk},$
$P_{ijk1} = \tfrac{1}{\theta_{ij}}\,(a_k - a_{k-1}) / D_{ijk},$
and $D_{ijk} = \lambda_{00}\,(a_k - a_{k-1}) + \tfrac{1}{2}\,\lambda_{0k}\,(a_k - a_{k-1})^2 + \tfrac{1}{\theta_{ij}}\,(a_k - a_{k-1})$ is the common denominator.
Step 2. Sample $\lambda_{00}$ from expression (A.4).
Step 3. Sample $\lambda_{0k}$, k = 1,2,…,$g_{ij}$, from expression (A.5).
Step 4. Sample $w_i$, i = 1,2,…,n, from expression (A.6).
Step 5. Sample $\sigma_\theta^2$ and then $\beta \mid \sigma_\theta^2$, from expressions (A.6) and (A.8), respectively.
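The multinomial allocation in Step 1 can be sketched as follows. This is an illustrative sketch, not the BUGS implementation used in the article; the function names and numeric values are assumptions, and the allocation probabilities use the fact that the frailty $w_i$ cancels between numerator and denominator.

```python
import random

def allocation_probs(lam00, lam0k, theta, width):
    """Return (P_ijk0, P_ijk2, P_ijk1); the frailty w_i cancels."""
    parts = (lam00 * width, 0.5 * lam0k * width ** 2, width / theta)
    total = sum(parts)
    return tuple(p / total for p in parts)

def sample_latent_counts(dN, probs, rng):
    """Split dN events into (dN_ijk0, dN_ijk2, dN_ijk1) multinomially."""
    counts = [0, 0, 0]
    for _ in range(dN):                      # dN = 0 leaves all counts at 0
        counts[rng.choices((0, 1, 2), weights=probs)[0]] += 1
    return tuple(counts)

rng = random.Random(1)
# Hypothetical current parameter values for one subject and one interval.
probs = allocation_probs(lam00=0.05, lam0k=0.02, theta=8.0, width=5.0)
split_one = sample_latent_counts(1, probs, rng)    # interval with one event
split_none = sample_latent_counts(0, probs, rng)   # interval with no event
```

Because each subject has at most one event, `dN` is 0 or 1 in practice, so the allocation simply decides which of the three hazard components the event is attributed to.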
The other conditionals do not have a conjugate form. For each j = 1,2,…,$m_i$ and i = 1,2,…,n, the conditional distribution of $\theta_{ij}$ is proportional to
$[Y_{ij}(t)\, w_i\, h(t \mid \lambda_0, \theta_{ij})]^{dN_{ijk}}\, S(t \mid \lambda_0, \theta_{ij})\, f(\theta_{ij} \mid x_{ij}, \beta, \sigma_\theta^2)$, for all k = 1,2,…,$g_{ij}$. (4.9)
Expression (4.9) does not have a closed form, but it is still possible to sample from it using a Metropolis algorithm.
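A random-walk Metropolis update for a single $\theta_{ij}$ might look like the sketch below. Several choices here are assumptions rather than the article's method: the walk is run on $\phi = \log\theta_{ij}$ (so the log-Normal prior becomes a Normal prior on $\phi$), the likelihood part uses the per-interval Poisson form of (4.7), and the proposal standard deviation and all data values are hypothetical.

```python
import math
import random

def log_target(phi, dN, dH0, widths, w, prior_mean, prior_sd):
    """Log full conditional of phi = log(theta), up to an additive constant."""
    theta = math.exp(phi)
    total = 0.0
    for n_k, h_k, d_k in zip(dN, dH0, widths):
        mu = w * (h_k + d_k / theta)        # Poisson mean of interval k, as in (4.7)
        total += n_k * math.log(mu) - mu
    return total - 0.5 * ((phi - prior_mean) / prior_sd) ** 2

def metropolis_step(phi, step_sd, rng, **data):
    """One symmetric random-walk proposal with accept/reject."""
    proposal = phi + rng.gauss(0.0, step_sd)
    log_ratio = log_target(proposal, **data) - log_target(phi, **data)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal, True
    return phi, False

rng = random.Random(7)
# Hypothetical data for one subject: event in the third at-risk interval.
data = dict(dN=[0, 0, 1], dH0=[0.3, 0.4, 0.5], widths=[5.0, 5.0, 10.0],
            w=1.2, prior_mean=2.0, prior_sd=1.0)
phi, accepted, draws = 2.0, 0, []
for _ in range(2000):
    phi, ok = metropolis_step(phi, 0.5, rng, **data)
    accepted += ok
    draws.append(math.exp(phi))             # back-transform to theta
acceptance_rate = accepted / 2000
```

Working on the log scale keeps every proposal inside the support $\theta_{ij} > 0$, so no proposals are wasted on invalid values.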
Finally, letting $\xi = \eta^{-1}$, the full conditional distribution of ξ does not have a closed form either. It is proportional to
$\Big( \prod_{i=1}^{n} w_i \Big)^{\xi - 1}\, \frac{\xi^{n\xi}\, \exp\big( -\xi \sum_{i=1}^{n} w_i \big)}{[\Gamma(\xi)]^{n}}\, f(\xi),$ (4.10)
with
$\xi \sim \mathrm{Ga}(\xi \mid \phi_1, \phi_2).$ (4.11)
With this choice of priors, it can be shown that the above full conditional density is log-concave. Thus, we can use the adaptive rejection algorithm of Gilks and Wild (1992) to sample from this full conditional.
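The log-concavity claim can be checked numerically on a grid. The sketch below evaluates the logarithm of (4.10) with the Gamma prior (4.11) and inspects its second differences; it does not implement adaptive rejection sampling itself, and the frailty values and hyperparameters are hypothetical.

```python
import math

def log_cond_xi(xi, w, phi1, phi2):
    """Log of (4.10) with the Ga(phi1, phi2) prior (4.11), up to a constant."""
    n = len(w)
    return ((xi - 1.0) * sum(math.log(wi) for wi in w)
            + n * xi * math.log(xi)
            - xi * sum(w)
            - n * math.lgamma(xi)
            + (phi1 - 1.0) * math.log(xi) - phi2 * xi)

w = [0.6, 1.4, 0.9, 1.1, 1.0]      # hypothetical current frailty values
step = 0.05
grid = [0.1 + step * i for i in range(200)]
values = [log_cond_xi(x, w, phi1=1.0, phi2=1.0) for x in grid]

# A concave function has negative second differences everywhere on the grid.
second_diffs = [values[i - 1] - 2.0 * values[i] + values[i + 1]
                for i in range(1, len(values) - 1)]
is_concave_on_grid = all(d < 0.0 for d in second_diffs)
```

A grid check of this kind is only a sanity test for particular values; it complements, but does not replace, the analytical argument.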
5. Application
Here we demonstrate the method using the well-known leukemia data analyzed by Cox (1972), Hougaard (2000, subsection 1.5.4), Ibrahim et al. (2001, example 3.4) and Spiegelhalter et al. (2004), among others. These data, listed in Table 1 as reported by Hougaard (2000), consist of 21 matched pairs of leukemia patients. The random variable of interest is the remission time (in weeks) of patients assigned to treatment with a drug or a placebo during remission maintenance therapy. The patients were matched according to center and remission status, either partial or complete, so that one patient in each pair received 6-MP as a treatment and the other received a placebo. These data have been used in many articles, but most of them neglect the pairing. The aim is to find the effect of the treatment, and the corresponding covariate is of the matched pair’s type.
TABLE 1: Leukemia Remission Time Data
Pair  Status  Placebo  6-MP
 1    P        1       10
 2    C       22        7
 3    C        3       32+
 4    C       12       23
 5    C        8       22
 6    P       17        6
 7    C        2       16
 8    C       11       34+
 9    C        8       32+
10    C       12       25+
11    C        2       11+
12    P        5       20+
13    C        4       19+
14    C       15        6
15    C        8       17+
16    P       23       35+
17    P        5        6
18    C       11       13
19    C        4        9+
20    C        1        6+
21    C        8       10+
Status: P = partial remission, C = complete remission. A plus sign denotes a censored time.
Source. Hougaard (2000, page 15)
In the analysis below, we have used the program BUGS.
Given the model assumptions, this program performs the Gibbs sampler by simulating from the full conditional distributions. The code to specify this model and to obtain the posterior distributions of the parameters is in Appendix B. The Bayesian estimates were obtained through the implementation of the Gibbs sampling scheme described in the previous section. We ran 10,000 iterations of the algorithm and discarded the first 500 iterations as a burn-in. Spiegelhalter et al. (2004), the BUGS team, use the idea of parallel multiple chains to check the convergence of the Gibbs sampler and recommend using 2 to 5 chains. As mentioned by the BUGS team, fully quantitative monitoring of parallel multiple chains was first proposed by Gelman and Rubin (1992a, b). The chains should start from over-dispersed initial values to ensure good coverage of the parameter space. To generate the Gibbs posterior samples, we chose to use two parallel chains. Convergence of the chains was monitored via the Brooks and Gelman (1998) convergence diagnostic graph. Once convergence has been achieved, 10,000 observations are taken from each chain after the burn-in period, giving 20,000 observations in total. Inspecting the Brooks and Gelman diagnostic graphs (Figures 1a and 2a), we find that the BGR (Brooks and Gelman Ratio) converges to one, which indicates convergence for the regression coefficient β and the standard deviation of the frailty, $\sigma_b$. Therefore, beyond the burn-in period, a sample of 10,000 observations from each of the two chains is drawn.
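The idea behind multiple-chain monitoring can be sketched with the classic variance-based Gelman and Rubin statistic. Note the assumptions: this is not the interval-width version plotted by BUGS, the function name is illustrative, and the chains below are synthetic Normal draws rather than output from the model.

```python
import random
import statistics

def gelman_rubin(chains):
    """Variance-based potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)   # W
    between_over_n = statistics.variance(means)                         # B / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5

rng = random.Random(42)
# Two synthetic chains sampling the same distribution: ratio close to one.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
# Same chains, but the second is shifted to mimic a chain stuck elsewhere.
stuck = [mixed[0], [x + 3.0 for x in mixed[1]]]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

A ratio near one says the between-chain spread is no larger than the within-chain spread, which is why over-dispersed starting values make the check informative.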
Figure 1. Diagnostics related to β: a) Brooks & Gelman convergence diagnostics; b) trace plot of β for each chain; c) ACF for the iterations for each chain; d) history plot of β for each chain.
Figure 2. Diagnostics related to $\sigma_b$: a) Brooks & Gelman convergence diagnostics; b) trace plot of $\sigma_b$ for each chain; c) ACF for the iterations for each chain.
For each of the two chains, the BUGS software plots the sampled parameter values as a function of the iteration number (Figures 1b and 2b). Additionally, the BUGS software offers a graph of the autocorrelation function (ACF) of the iterations up to lag 50 for each chain independently (Figures 1c and 2c). The autocorrelation plot in Figure 2c illustrates the dependence between successive observations, which appears to die out well before lag 40. This indicates fairly rapid mixing and thus good coverage of the parameter space with a reasonably small number of iterations. As a rule of thumb, high autocorrelations mean that more iterations are needed to get rid of the dependence structure; here, however, from Figures 1b and 1d we can be reasonably confident that convergence of β has been achieved (the two chains appear to overlap one another).
For each node in the model, a similar set of graphs is produced to monitor convergence and independence. They are omitted from this article for reasons of space.
Once one is satisfied with the ACF and convergence graphs, at least for the parameters of interest, and most importantly with the convergence of all model parameters, a Gibbs sample of size 20,000 is drawn for each parameter. In Table 2, the columns 2.5% and 97.5% give the respective posterior percentiles of β and $\sigma_b$.
TABLE 2. Posterior summaries of β and $\sigma_b$
Parameter  Mean   SD      2.5%   Median  97.5%
β          −1.54  0.6604  −3.02  −1.473  −0.4582
σ_b        1.865  0.2648  1.427  1.838   2.461
The 95% credible interval for β is thus (−3.02, −0.4582), and most of the mass of the posterior distribution of β lies to the left of zero, indicating that treatment with the 6-MP drug has a significant effect compared with placebo. This can be further illustrated by a plot of the marginal posterior density of β, shown in Figure 3.
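Summaries of this kind are computed directly from the retained MCMC draws. The sketch below shows one way to obtain the mean, the percentile-based 95% credible interval and the posterior probability of a negative coefficient; the helper names are illustrative, and the toy input is a simple integer sequence, not the draws behind Table 2.

```python
def quantile(sorted_draws, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (pos - lo) * (sorted_draws[hi] - sorted_draws[lo])

def summarize(draws):
    """Posterior mean, central 95% interval, median and P(parameter < 0)."""
    s = sorted(draws)
    return {
        "mean": sum(s) / len(s),
        "2.5%": quantile(s, 0.025),
        "median": quantile(s, 0.5),
        "97.5%": quantile(s, 0.975),
        "P(<0)": sum(d < 0 for d in s) / len(s),
    }

# Toy input (the integers -100..0) used only to exercise the functions.
toy = summarize(range(-100, 1))
```

Applied to the pooled draws of β from both chains, the entry `"P(<0)"` quantifies how much posterior mass lies to the left of zero.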
Figure 3. Estimated marginal density for β (chains 1:2, sample of 20,000).
Figure 4, below, demonstrates the types of inference for the survival probability for the
two groups separately and simultaneously that can be obtained from the full posterior
samples.
Figure 4. Posterior mean and central 95% limits for survival probabilities S(t) against time (weeks): the treatment group, the placebo group, and the treatment and placebo groups simultaneously.
6. Conclusion
The current article makes the additive hazard model a practical alternative to the proportional hazards model. When the proportional hazards assumptions are violated, there is a need for alternative models such as the one studied here. The additive hazard model, formulated through the counting process approach as an alternative to the proportional hazards model, is extended to deal with the unobserved heterogeneity among the individuals in the study. A Bayesian analysis of survival models with frailty, especially survival models with a PHM or AHM for the hazard function, was impossible only a few years ago. Nowadays, with the great increase in the computational power of computers, the analysis of these kinds of models is advisable in survival analysis, since such models can better explain the relationship between the lifetime random variable and the diagnostic factors (covariates). Spiegelhalter et al. (2004), the BUGS team, analyzed the data set used in Section 5 under the assumption that the proportional hazards assumptions hold. We assumed, for the purpose of illustration, that one of the proportional hazards assumptions is violated for this data set. Hence, we developed the BUGS code, starting from the code available from the BUGS team, to implement the algorithm described in Section 4.
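APPENDIX A
Derivation of Conditional Posteriors:
The joint posterior density of the parameters $(\lambda_0, \theta, \beta, \sigma_\theta^2, w)$ and latent variables ($dN_{ijk0}$, $dN_{ijk2}$, $dN_{ijk1}$) is proportional to
$\prod_{i=1}^{n} \prod_{j=1}^{m_i} \prod_{k=1}^{g_{ij}} \Big\{ I(dN_{ijk} = dN_{ijk0} + dN_{ijk2} + dN_{ijk1})\, \mathrm{Poisson}[dN_{ijk0} \mid w_i \lambda_{00} (a_k - a_{k-1})]$
$\quad \times\, \mathrm{Poisson}[dN_{ijk2} \mid \tfrac{w_i}{2} \lambda_{0k} (a_k - a_{k-1})^2]\, \mathrm{Poisson}[dN_{ijk1} \mid \tfrac{w_i}{\theta_{ij}} (a_k - a_{k-1})]$
$\quad \times\, N(\log\theta_{ij} \mid x_{ij}\beta, \sigma_\theta^2)\, \mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k})\, \mathrm{Ga}(w_i \mid \eta^{-1}, \eta^{-1}) \Big\}$
$\quad \times\, \mathrm{Ga}(\eta \mid \phi_1, \phi_2)\, \mathrm{Ga}(\lambda_{00} \mid a_{00}, b_{00})\, N(\beta \mid m_\theta, \sigma_\theta^2 V_\theta)\, \mathrm{Ga}(1/\sigma_\theta^2 \mid a_\theta, b_\theta).$ (A.1)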
Step 1. It follows from expression (A.1) that the full conditional distribution of the
latent variables is proportional to
ijk0
ijk1ijk2
ijk2
dNi 00 k k 1
ijk ijk0 ijk2 ijk1ijk0
dNdN2i ij k k 1i k k 1
dNijk1
[w (a a )]I(dN dN dN dN )
dN !
[(w )(a a )][(1 2)w (a a ) ] .dN !!
−
−−
λ −= + + ×
θ −−×
×
(A.2)
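On the other hand, given (A.1), expression (A.2) is also proportional to
\[
\begin{aligned}
& \frac{dN_{ijk}!}{\big[w_i\lambda_{00}(a_k - a_{k-1}) + (w_i/2)\lambda_{0k}(a_k - a_{k-1})^2 + (w_i/\theta_{ij})(a_k - a_{k-1})\big]^{dN_{ijk}}} \\
& \quad \times \frac{[w_i\lambda_{00}(a_k - a_{k-1})]^{dN_{ijk0}}}{dN_{ijk0}!}
\times \frac{[(w_i/2)\lambda_{0k}(a_k - a_{k-1})^2]^{dN_{ijk2}}}{dN_{ijk2}!}
\times \frac{[(w_i/\theta_{ij})(a_k - a_{k-1})]^{dN_{ijk1}}}{dN_{ijk1}!},
\end{aligned}
\]
\[
\begin{aligned}
\propto\; & \frac{dN_{ijk}!}{dN_{ijk0}!\,dN_{ijk2}!\,dN_{ijk1}!} \\
& \times \left[\frac{w_i\lambda_{00}(a_k - a_{k-1})}{w_i\lambda_{00}(a_k - a_{k-1}) + (w_i/2)\lambda_{0k}(a_k - a_{k-1})^2 + (w_i/\theta_{ij})(a_k - a_{k-1})}\right]^{dN_{ijk0}} \\
& \times \left[\frac{(w_i/2)\lambda_{0k}(a_k - a_{k-1})^2}{w_i\lambda_{00}(a_k - a_{k-1}) + (w_i/2)\lambda_{0k}(a_k - a_{k-1})^2 + (w_i/\theta_{ij})(a_k - a_{k-1})}\right]^{dN_{ijk2}} \\
& \times \left[\frac{(w_i/\theta_{ij})(a_k - a_{k-1})}{w_i\lambda_{00}(a_k - a_{k-1}) + (w_i/2)\lambda_{0k}(a_k - a_{k-1})^2 + (w_i/\theta_{ij})(a_k - a_{k-1})}\right]^{dN_{ijk1}},
\end{aligned}
\]
\[
= \frac{dN_{ijk}!}{dN_{ijk0}!\,dN_{ijk2}!\,dN_{ijk1}!}\,
P_{ijk0}^{\,dN_{ijk0}}\,P_{ijk2}^{\,dN_{ijk2}}\,P_{ijk1}^{\,dN_{ijk1}},
\]
\[
\propto \mathrm{Multinomial}\big(\{dN_{ijk0}, dN_{ijk2}, dN_{ijk1}\} \mid dN_{ijk}, \{P_{ijk0}, P_{ijk2}, P_{ijk1}\}\big),
\tag{A.3}
\]
where $P_{ijk0}$, $P_{ijk2}$ and $P_{ijk1}$ are defined in subsection 4.1. The frailty $w_i$ is common to
the numerator and the denominator of each ratio and therefore cancels.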
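The multinomial split of Step 1 can be sketched in a few lines. This is a minimal illustration in Python with NumPy rather than the BUGS code used in the paper; the function name `split_increment`, its argument names and the numerical values are assumptions made for the example only.

```python
import numpy as np

def split_increment(dN, w, lam00, lam0k, theta, delta, rng):
    """Split one observed increment dN into latent counts (dN0, dN2, dN1).

    delta stands for the interval width a_k - a_{k-1} (hypothetical argument name).
    """
    # Poisson means of the three additive components appearing in (A.2).
    rates = np.array([
        w * lam00 * delta,             # constant baseline part       -> dN0
        0.5 * w * lam0k * delta ** 2,  # interval-specific baseline   -> dN2
        (w / theta) * delta,           # covariate (parametric) part  -> dN1
    ])
    probs = rates / rates.sum()        # the frailty w cancels in this ratio
    dN0, dN2, dN1 = rng.multinomial(dN, probs)
    return (dN0, dN2, dN1), probs

rng = np.random.default_rng(0)
counts, probs = split_increment(dN=1, w=1.3, lam00=0.05, lam0k=0.02,
                                theta=4.0, delta=2.0, rng=rng)
```

Because the frailty cancels, the cell probabilities depend only on the two baseline parameters, the covariate term and the interval width.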
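Step 2. The full conditional distribution of $\lambda_{00}$ is proportional to
\[
\prod_{i,j,k:\,Y_{ij}=1}
\left\{ \frac{[w_i\lambda_{00}(a_k - a_{k-1})]^{dN_{ijk0}}}{dN_{ijk0}!}
\exp\big[-w_i\lambda_{00}(a_k - a_{k-1})\big] \right\}
\times (\lambda_{00})^{a_{00}-1}\exp(-b_{00}\lambda_{00}),
\]
\[
\propto (\lambda_{00})^{\,a_{00} + \sum_{i,j,k:\,Y_{ij}=1} dN_{ijk0} - 1}
\exp\Big\{-\lambda_{00}\Big[b_{00} + \sum_{i,j,k} Y_{ij}\,w_i\,(a_k - a_{k-1})\Big]\Big\},
\]
\[
\propto \mathrm{Ga}\Big(\lambda_{00} \;\Big|\; a_{00} + \sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk0},\;
b_{00} + \sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} Y_{ij}\,w_i\,(a_k - a_{k-1})\Big).
\tag{A.4}
\]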
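As an illustration of the conjugate update in (A.4), the following Python/NumPy sketch computes the gamma shape and rate and draws one value of the constant baseline parameter. It is not the paper's BUGS code: the function name, the array layout (cluster, subject, interval) and the use of an at-risk indicator stored per interval are assumptions made for the example.

```python
import numpy as np

def draw_lambda00(dN0, Y, w, delta, a00, b00, rng):
    """One Gibbs draw of lambda_00 from its gamma full conditional.

    dN0, Y : arrays of shape (n, m, g), latent counts and at-risk indicators
    w      : array of shape (n,), cluster frailties
    delta  : array of shape (g,), interval widths a_k - a_{k-1}
    """
    shape = a00 + dN0.sum()
    exposure = (Y * w[:, None, None] * delta[None, None, :]).sum()
    rate = b00 + exposure
    # NumPy parameterises the gamma by shape and scale, so scale = 1 / rate.
    return rng.gamma(shape, 1.0 / rate), shape, rate

rng = np.random.default_rng(0)
dN0 = np.array([[[1, 0]], [[0, 1]]])   # n = 2 clusters, m = 1 subject, g = 2 intervals
Y = np.ones((2, 1, 2))
w = np.array([1.0, 2.0])
delta = np.array([0.5, 1.5])
lam00, shape, rate = draw_lambda00(dN0, Y, w, delta, a00=0.1, b00=0.1, rng=rng)
```

In this toy input the shape is 0.1 + 2 events and the rate is 0.1 + (1.0 + 2.0)(0.5 + 1.5), matching the two sums in (A.4).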
Step 3. The full conditional distribution of $\lambda_{0k}$, $k = 1, 2, \ldots, g_{ij}$, is proportional to
\[
\prod_{i,j:\,Y_{ij}=1}
\big[(w_i/2)\lambda_{0k}(a_k - a_{k-1})^2\big]^{dN_{ijk2}}
\exp\big[-(w_i/2)\lambda_{0k}(a_k - a_{k-1})^2\big]
\times (\lambda_{0k})^{a_{0k}-1}\exp(-b_{0k}\lambda_{0k}),
\]
\[
\propto (\lambda_{0k})^{\sum_{i=1}^{n}\sum_{j=1}^{m_i} dN_{ijk2}}
\exp\Big[-\lambda_{0k}\sum_{i=1}^{n}\sum_{j=1}^{m_i} (w_i/2)\,Y_{ij}\,(a_k - a_{k-1})^2\Big]
\times (\lambda_{0k})^{a_{0k}-1}\exp(-b_{0k}\lambda_{0k}),
\]
\[
\propto \mathrm{Ga}\Big(\lambda_{0k} \;\Big|\; a_{0k} + \sum_{i,j} dN_{ijk2},\;
b_{0k} + \sum_{i,j} (w_i/2)\,Y_{ij}\,(a_k - a_{k-1})^2\Big).
\tag{A.5}
\]
Step 4. To derive the conditional distribution of $w_i$, $i = 1, 2, \ldots, n$, we start with the joint
posterior density of the parameters prior to augmentation, which is proportional to
\[
\prod_{j=1}^{m_i}\prod_{k:\,Y_{ij}=1}
\Big\{ w_i\Big[dH_{0k} + \frac{1}{\theta_{ij}}(a_k - a_{k-1})\Big] \Big\}^{dN_{ijk}}
\exp\Big\{ -w_i\Big[dH_{0k} + \frac{1}{\theta_{ij}}(a_k - a_{k-1})\Big] \Big\}
\times w_i^{\,\eta^{-1}-1}\exp(-\eta^{-1} w_i),
\]
\[
\propto (w_i)^{\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk} + \eta^{-1} - 1}
\exp\Big\{ -w_i\Big(\eta^{-1} + \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}}
\Big[dH_{0k} + \frac{1}{\theta_{ij}}(a_k - a_{k-1})\Big]\Big) \Big\},
\]
\[
\propto \mathrm{Ga}\Big(w_i \;\Big|\; \eta^{-1} + \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk},\;
\eta^{-1} + \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}}
\Big[dH_{0k} + \frac{1}{\theta_{ij}}(a_k - a_{k-1})\Big]\Big).
\tag{A.6}
\]
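The gamma full conditionals of Steps 3 and 4 can be sketched as follows. This is a hedged Python/NumPy illustration, not the authors' BUGS implementation; the function names, the array layouts and the assumption that the hazard increments dH_0k + (a_k - a_{k-1})/theta_ij are precomputed for the at-risk intervals are all introduced here for the example.

```python
import numpy as np

def draw_lambda0k(dN2_k, Y_k, w, delta_k, a0k, b0k, rng):
    """Draw lambda_0k for one interval k; dN2_k and Y_k have shape (n, m)."""
    shape = a0k + dN2_k.sum()
    rate = b0k + (0.5 * w[:, None] * Y_k * delta_k ** 2).sum()
    return rng.gamma(shape, 1.0 / rate)        # NumPy uses scale = 1 / rate

def draw_frailty(dN_i, haz_i, eta, rng):
    """Draw the frailty w_i of one cluster.

    dN_i  : event counts of cluster i over its at-risk (subject, interval) cells
    haz_i : matching values of dH_0k + (a_k - a_{k-1}) / theta_ij
    """
    shape = 1.0 / eta + dN_i.sum()
    rate = 1.0 / eta + haz_i.sum()
    return rng.gamma(shape, 1.0 / rate)

rng = np.random.default_rng(1)
w = np.array([0.8, 1.2])
lam0k = draw_lambda0k(np.array([[1, 0], [0, 0]]), np.ones((2, 2)), w,
                      delta_k=2.0, a0k=0.1, b0k=0.1, rng=rng)
dN_i = np.array([1, 0, 1, 1])
haz_i = np.array([0.2, 0.3, 0.1, 0.4])
w_draws = np.array([draw_frailty(dN_i, haz_i, eta=0.5, rng=rng) for _ in range(20000)])
```

With eta = 0.5 the frailty full conditional in this toy example has shape 2 + 3 and rate 2 + 1, so the draws should average close to 5/3.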
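Step 5. The full conditional distribution of $(\beta, \sigma_\theta^2)$ is proportional to
\[
\Big\{ \prod_{i=1}^{n}\prod_{j=1}^{m_i} N(\log\theta_{ij} \mid x_{ij}\beta, \sigma_\theta^2) \Big\}\,
N(\beta \mid m_\theta, \sigma_\theta^2 V_\theta)\,
\mathrm{Ga}(1/\sigma_\theta^2 \mid a_\theta, b_\theta).
\]
This expression is the same as the one that appears in the usual conjugate analysis of
Normal data (see, e.g., DeGroot, 1970, pages 249-252). It is therefore proportional to a
multivariate Normal-Inverse Gamma distribution, i.e.,
\[
\beta \mid \sigma_\theta^2 \sim N_p\big(\beta \mid \hat{\beta},\; \sigma_\theta^2\,(V_\theta^{-1} + x'x)^{-1}\big),
\tag{A.7}
\]
\[
1/\sigma_\theta^2 \sim \mathrm{Ga}\Big(1/\sigma_\theta^2 \;\Big|\; a_\theta + \frac{n}{2},\;
b_\theta + \frac{1}{2}\big[(y - x\hat{\beta})'y + (m_\theta - \hat{\beta})'V_\theta^{-1} m_\theta\big]\Big),
\tag{A.8}
\]
where $y = (\log\theta_{11}, \ldots, \log\theta_{1m_1}, \ldots, \log\theta_{nm_n})'$, $x$ is the covariate matrix, and the estimate
of the regression coefficient, $\hat{\beta}$, is calculated from
$\hat{\beta} = (V_\theta^{-1} + x'x)^{-1}(V_\theta^{-1} m_\theta + x'y)$.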
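The Normal-Inverse Gamma update of Step 5 can be illustrated with the following sketch. It is written in Python with NumPy as a stand-in for the BUGS code; the function name `draw_beta_sigma2`, the simulated data and the hyperparameter values are assumptions made for the example, and `n` is taken to be the length of the response vector.

```python
import numpy as np

def draw_beta_sigma2(y, x, m, V, a, b, rng):
    """One joint draw of (beta, sigma2) following (A.7) and (A.8)."""
    n = y.shape[0]
    V_inv = np.linalg.inv(V)
    cov_unit = np.linalg.inv(V_inv + x.T @ x)        # (V^{-1} + x'x)^{-1}
    beta_hat = cov_unit @ (V_inv @ m + x.T @ y)
    shape = a + n / 2.0
    rate = b + 0.5 * ((y - x @ beta_hat) @ y + (m - beta_hat) @ V_inv @ m)
    sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)      # precision is gamma, so invert the draw
    beta = rng.multivariate_normal(beta_hat, sigma2 * cov_unit)
    return beta, sigma2, beta_hat

# Simulated stand-in for y = log(theta) and a two-column covariate matrix.
rng = np.random.default_rng(2)
x = np.column_stack([np.ones(200), rng.normal(size=200)])
true_beta = np.array([1.0, -0.5])
y = x @ true_beta + 0.3 * rng.normal(size=200)
beta, sigma2, beta_hat = draw_beta_sigma2(y, x, m=np.zeros(2), V=100.0 * np.eye(2),
                                          a=0.01, b=0.01, rng=rng)
```

With a diffuse prior such as the one used here, the posterior mean of the regression coefficients is close to the least-squares estimate.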
APPENDIX B
Here we give the program code used to analyze the data described in Section 5
with the program BUGS. WinBUGS does not allow a[0] or dH[0] to be used, so j
starts from 2, and 'j = 0' in the original formula is treated as 'j = 1'. Therefore, dH[j] is the
intensity in the (j-1)th interval.
Model; leukemia data   # the name of the program
{
# Set up data
   for(i in 1:N) {      # N is the total number of patients
      for(j in 2:T) {   # T is the number of unique failure times
         # risk set = 1 if obs.t >= a, where obs.t[i] is the observed remission or censoring time of the ith patient
         # eps = 0.00001 will be used to guard against numerical imprecision in the step function
         # a[T] is the unique failure time + maximum censoring time
         Y[i,j] …
# Model
   for(j in 2:T) {
      # Idt[N,T] is the total intensity process
      # I0dt[N,T] and I2dt[N,T] are the intensities for the baseline hazard function
      # I1dt[N,T] is the intensity for the parametric part of the hazard function
      # Y[N,T] = 1 if the subject is observed and 0 if the subject is not observed
      for(i in 1:N) {
         Idt[i, j] …
sigma.theta …
REFERENCES
Clayton, D. (1994). Bayesian analysis of frailty models. Technical Report, Medical
Research Council Biostatistics Unit, Cambridge.
Clayton, D.G. (1978). A model for association in bivariate life tables and its application
in epidemiological studies of familial tendency in chronic disease incidence.
Biometrika, 65, 141-151.
Cox, D.R. (1972). Regression models and life-tables (with discussion). J.R. Statist. Soc.,
B34, 187-220.
Cox, D.R. (1975). Partial likelihood. Biometrika, 62, 269-276.
DeGroot, M. H. (1970). Optimal Statistical Decisions. New York: McGraw-Hill.
Dunson, D.B. and Herring, A.H. (2004). Bayesian model selection and averaging in
additive and proportional hazards model (available for download at
www.ftp.isds.duke.edu/workingPapers/04-16.pdf).
Gamerman, D. (1991). Dynamic Bayesian models for survival data. Applied Statistics,
40, 63-79.
Gelman, A. and Rubin, D. (1992a). Inference from iterative simulation using multiple
sequences. Statistical Science, 7, 457-511.
Gelman, A. and Rubin, D. (1992b). A single series from the Gibbs sampler provides a false
sense of security. Bayesian Statistics, 4, eds. J.M. Bernardo, J.O. Berger, A.P.
Dawid and A.F.M. Smith, New York: Oxford University Press, 625-631.
Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling.
Applied Statistics, 41, 337-348.
Hougaard, P. (2000). Analysis of Multivariate Survival Data. New York:
Springer-Verlag.
Ibrahim, J.G., Chen, M.H. and Sinha, D. (2001). Bayesian Survival Analysis. New
York: Springer-Verlag.
Lin, D.Y. and Ying, Z.L. (1994). Semiparametric analysis of the additive risk model.
Biometrika, 81, 61-71.
Lin, D.Y. and Ying, Z.L. (1995). Semiparametric analysis of general additive-multiplicative
hazard models for counting processes. Annals of Statistics,
23, 1712-1734.
McKeague, I. W. and Sasieni, P.D. (1994). A partly parametric additive risk model.
Biometrika, 81, 501-514.
Sinha, D. (1993). Semiparametric Bayesian analysis of multiple event time data.
Journal of the American Statistical Association, 88, 979-983.
Sinha, D. and Dey, D.K. (1997). Semiparametric Bayesian analysis of survival data.
Journal of the American Statistical Association, 92, 1195-1212.
Sorensen, D. and Gianola, D. (2002). Likelihood, Bayesian, and MCMC Methods in
Quantitative Genetics. New York: Springer-Verlag.
Spiegelhalter, D.J., Thomas, A., Best, N.G., Gilks, W.R. and Lunn, D. (2004). BUGS:
Bayesian Inference Using Gibbs Sampling. MRC Biostatistics Unit, Cambridge,
England.
Tanner, M.A. and Wong, W.H. (1987). The calculation of posterior distributions by data
augmentation (with discussion). Journal of the American Statistical Association, 82,
528-550.
Vaupel, J.W., Manton, K.G., and Stallard, E. (1979). The impact of heterogeneity in
individual frailty on the dynamics of mortality. Demography, 16, 439-454.
Vu, H.T. (2003). Parametric and semiparametric conditional shared gamma frailty
models with events before study entry. Communications in Statistics:
Simulation and Computation, 32(4), 1223-1248.