Shared Frailty Survival Analysis Using Semiparametric Bayesian Method Shaban A. Shaban 1, * and Ayman A. Mostafa 2, ** 1 Department of Mathematical Statistics, Institute of Statistical Studies and Research 2 Department of Technical Accounting General Insurance Egyptian Insurance Supervisory Authority 28, Talaat Harb St., Cairo P.O. Box: 2545 * [email protected] ** [email protected] 1

Shared Frailty Survival Analysis Using Semiparametric Bayesian Methodinterstat.statjournals.net/YEAR/2005/articles/0511002.pdf · 2016. 4. 26. · Bayesian analysis for survival models


  • SUMMARY. In survival data analysis, the proportional hazard model was introduced

    by Cox (1972) in order to estimate the effects of different covariates influencing the

    time-to-event data. The proportional hazard model has been used extensively in

    biomedicine, reliability engineering and, recently, interest in its application in different

    areas of knowledge has increased. However, the proportional hazard model makes a number of assumptions, which may be violated. The object of this article is to present a Bayesian analysis for survival models with frailty under an additive framework for the hazard function, in contrast to the proportional hazard model. Frailty models in survival analysis deal with the unobserved heterogeneity among subjects. The Gibbs sampling technique is used to assess the posterior quantities of interest. An illustrative analysis

    within the context of survival time data is given.

    KEY WORDS: Survival Analysis, Regression Models, Additive Survival Analysis,

    Bayesian Inference, Frailty Models, BUGS


    1. Introduction

    Survival data is a term used for describing data that measure the time to some event. Statistical models and methods for such data and other time-to-event data are extensively used in many fields, including the biomedical sciences, engineering, the environmental sciences, economics, actuarial sciences, management, and social sciences.

    In survival analysis, the additive, the multiplicative and the general additive-multiplicative hazard models provide the three principal frameworks for studying the association between covariates and the survival time. The hazard function, also called the risk or intensity function, of a survival time T associated with a p-vector of covariates x is defined as $h(t \mid x) = f(t \mid x)/[1 - F(t \mid x)]$, where $f(\cdot \mid x)$ and $F(\cdot \mid x)$ are the density function and the distribution function, respectively, of the random variable T conditioned on the vector of covariates x. The function $S(t \mid x) = 1 - F(t \mid x)$ is called the survival function.

    Under the additive hazard model (Lin and Ying, 1994; Beamonte and Bermúdez, 2003), the hazard function takes the form

    $h(t \mid x) = h_0(t) + \beta_0' x$, (1.1)

    under the multiplicative hazard model (Cox, 1972), it takes the form

    $h(t \mid x) = h_0(t) \exp(\alpha_0' x)$, (1.2)

    and under the class of general additive-multiplicative models (Lin and Ying, 1995; Dunson and Herring, 2004), the hazard function takes the form

    $h(t \mid Z) = g\{\beta_0' R\} + h_0(t)\, h\{\alpha_0' x\}$, (1.3)

    where $h_0(t)$ is an unspecified "baseline hazard function", $Z = (R', x')'$ is a p-vector of covariates and $\theta_0 = (\beta_0', \alpha_0')'$ (say) is a p-vector of unknown regression parameters. The covariate Z can be time-dependent, and $g\{\cdot\}$ and $h\{\cdot\}$ are known link functions. It is obvious that (1.3) encompasses both models (1.1) and (1.2).

    In many applications, such as the biomedical sciences, the survival time T is often subject to right censoring because certain patients may still be surviving at the end of the study period. Furthermore, due to the complexity of the biological process, it is desirable not to parameterize $h_0(t)$, and therefore only semiparametric inference has been used for models (1.1), (1.2) and consequently (1.3).

    In order to draw semiparametric inference for model (1.2), Cox (1972, 1975) introduced the partial likelihood approach to estimate the regression parameter vector $\alpha_0$. Since the proportional hazards assumptions are often violated, the need for more flexible models motivated the introduction of models (1.1) and (1.3). Model (1.2) has also been approached from a Bayesian perspective. The article by Sinha and Dey (1997) and the book on Bayesian survival analysis by Ibrahim et al. (2001) contain an excellent survey.

    Although the additive hazard model (1.1) has been advocated and successfully utilized by numerous authors (e.g., Buckly, 1984; McKeague and Sasieni, 1994, and other references therein), no satisfactory semiparametric methods of estimation have been developed, because the partial likelihood approach cannot be directly used to eliminate the nuisance function $h_0(\cdot)$ in estimating $\beta_0$.

    The three models defined in expressions (1.1), (1.2) and (1.3) treat the data as if all the individuals in the sample (conditionally on the vector of covariates) were drawn from a single homogeneous population. But frequently there is heterogeneity in the population that the available covariates do not properly explain. We now consider generalizations of model (1.1) that allow for effects the covariates do not properly explain (unobserved individual effects). These are usually referred to as 'frailty' in the biomedical sciences. The notion of frailty provides a convenient way to introduce random effects, association and unobserved heterogeneity into models for survival data. In its simplest form a frailty is an unobserved proportionality factor that modifies the hazard function (1.1). As discussed in Sinha (1993), the idea of frailty, which was introduced by Vaupel et al. (1979), is particularly natural in the context of the proportional hazard model (1.2). In some situations the extra random frailty component of the proportional hazard model is required only to get correct inference on the fixed effects of covariates, whereas in other cases the distribution of the random subject effect could be one of the major interests.

    In many applications, the study population cannot be assumed to be homogeneous but must be considered as a heterogeneous sample, i.e., a mixture of individuals with different hazards. For example, it is often impossible to measure all relevant covariates related to the event of interest, sometimes because the importance of some covariates is still unknown and sometimes for economic reasons. The frailty approach is therefore a statistical modeling concept that aims to account for heterogeneity caused by unmeasured covariates.

    This article focuses on the shared frailty model in the additive hazard model (1.1). The shared frailty model is relevant to event times of related individuals, similar members and repeated measurements (parallel data). Individuals in a group (cluster) are assumed to share the same frailty, which is why this model is called the "shared frailty model". It was introduced by Clayton (1978) and extensively studied in Hougaard (2000). The survival times are assumed to be conditionally independent given the shared (common) frailty. Most research in this area has focused on the multiplicative model (1.2) with frailty, from both the classical approach (e.g., Vu, 2003) and the Bayesian approach (e.g., Clayton, 1991). In contrast, there has been almost no consideration of the additive model (1.1) or of additive-multiplicative models in the Bayesian literature. Notable exceptions are the article by Beamonte and Bermúdez (2003), which proposed an approach for Bayesian inference in an additive Gamma-polygonal hazard model, and the recent article by Dunson and Herring (2003), which focused on the problem of variable selection and inference, first in model (1.1) and then in the more general model (1.3). These recent articles did not consider frailty for additive or additive-multiplicative hazard models.

    In this article, we propose a hierarchical model where the shared frailty is assumed to follow a Gamma distribution and the hazard function is given by model (1.1). We consider the parametric part in (1.1), $\beta_0' x$, to be an Exponential hazard function specific to each individual but associated with the covariates, which are assumed to be time-independent, through a probabilistic model. This approach is especially appropriate for semiparametric Bayesian analysis with a frailty.

    Fully Bayesian computation for the hierarchical model is conducted by simulation, using Markov chain Monte Carlo (MCMC) algorithms.

    Section 2 describes the proposed model. Section 3 introduces some notation and then derives the likelihood function. Section 4 derives the conditional posterior distributions of the model's unknown parameters and Section 5 illustrates the methodology with a well-known data set. Appendix A gives detailed proofs of some of the results. Appendix B gives the BUGS code that has been used to obtain the results of the example.

    2. The Additive Exponential-Piecewise Linear Hazard Model with the Shared Frailty

    We consider an analysis of multiple event data where there are n groups (clusters), either individual subjects or groups of subjects, and the ith cluster has $m_i$ individuals and is associated with an unobserved frailty $w_i$, 1 ≤ i ≤ n. The jth individual in the ith cluster, 1 ≤ j ≤ $m_i$, is associated with the fixed covariate vector $x_{ij}$. Such individuals are assigned to a specific cluster because they are related somehow, say by family association or geographic location. Conditional on the frailties $w_i$, the complete survival times are assumed to be independent. For convenience we suppress from now on the subscripts indexing individuals and consider the model

    (2.1) $h(t \mid w, x) = w[h_0(t) + h_1(t \mid x)], \quad t \ge 0$,

    where $h(t \mid w, x)$ represents a hazard function that has been modified by the inclusion of a frailty. The frailty random variable, w, is assumed to be independent of t and x for all clusters and to follow some parametric distribution with unit mean (when the mean is assumed to be finite), usually Gamma (Clayton, 1991), where the unknown variance of w (say, η) quantifies the amount of heterogeneity among individuals. That is, we may assume that

    (2.2) $w_i \mid \eta \sim \mathrm{Ga}(w_i \mid \eta^{-1}, \eta^{-1})$,

    that is, given η, $w_i$, i = 1,2,…,n, is modeled as a Gamma distribution with shape parameter $\eta^{-1}$ and rate (inverse scale) parameter $\eta^{-1}$, so that its mean is one and its variance is η. It is important to keep in mind the interpretation of w in expression (2.1). The frailty random variable w measures the random sensitivity of the ith cluster to the event of interest after eliminating the effect of the covariates. In other words, if the value of the frailty in (2.1) is greater than one, the individual has a larger than average hazard and is said to be more 'frail', and vice versa. That is why, for a frailty with finite mean, we need to assume unit mean (to assure identifiability), and we need to assume that the frailty distributions of individuals at different covariate levels have the same mean but may have different variability.
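    The unit-mean Gamma frailty in (2.2) can be checked by simulation. The following is a minimal sketch, assuming the second Gamma parameter acts as a rate (which is what a unit mean and variance η require); the function name, the seed and the value of η are hypothetical and chosen only for illustration.

```python
import random
import statistics

def draw_frailties(n_clusters, eta, rng):
    """Draw one frailty per cluster from a Gamma law with mean 1 and variance eta."""
    shape = 1.0 / eta
    scale = eta  # random.gammavariate takes a scale, i.e. 1 / rate, and the rate is 1 / eta
    return [rng.gammavariate(shape, scale) for _ in range(n_clusters)]

rng = random.Random(2005)   # arbitrary seed, for reproducibility only
eta = 0.5                   # hypothetical frailty variance
w = draw_frailties(20000, eta, rng)
sample_mean = statistics.fmean(w)
sample_var = statistics.variance(w)
print(round(sample_mean, 2), round(sample_var, 2))
```

    The sample mean should sit close to one and the sample variance close to η, which is the sense in which η measures heterogeneity between clusters.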

    The non-parametric part of model (2.1), $h_0(t)$, is assumed to be a 'piecewise linear hazard'. An ordinary piecewise constant hazard (see, e.g., Gamerman, 1991), which is an example of a semiparametric hazard specification, has the advantage that it is a simple way to get a flexible hazard function, with simple estimation. On the other hand, it has the major disadvantage that the hazard is not continuous as a function of time, as there are jumps at the interval end points. The piecewise linear hazard is used in order to avoid the discontinuity of the ordinary piecewise Exponential model (Gamerman, 1991). To construct this model, we first split the time axis into intervals $0 = a_0 < a_1 < \dots < a_g$, where g is the number of intervals of observation time, i.e., $a_g > t_{ij}$ for all i = 1,2,…,n and j = 1,2,…,$m_i$. The hazard in the interval $I_k = (a_{k-1}, a_k]$ is $\lambda_{00} + (t - a_{k-1})\, \lambda_{0k}\, I(a_{k-1} < t)$. Thus, the hazard function can be described as

    (2.3) $h_0(t) = \lambda_{00} + \sum_{k=1}^{g} (t - a_{k-1})\, \lambda_{0k}\, I(a_{k-1} < t)$,

    by means of an indicator of death $\delta_{ijk} = \delta_{ij}\, I[a_{k-1} < t_{ij}]$ for the jth individual in the ith cluster throughout the kth interval, and the observation time $t_{ijk}$ in the interval. This quantity equals

    $t_{ijk} = \begin{cases} t_{ij} - a_{k-1} & \text{if } t_{ij} > a_{k-1}, \\ 0 & \text{if } t_{ij} \le a_{k-1}. \end{cases}$

    In general, the survival function is related to the hazard function through the expression $S(t) = \exp[-H(t)]$, where the integrated hazard function is $H(t) = \int_0^t h(u)\,du = -\ln S(t)$. Hence, given the relationship between the hazard and the survival function, it can be shown that the individual survival function modified by the inclusion of a frailty, given the parameters $w = (w_1, w_2, \dots, w_n)$, $\lambda_0 = (\lambda_{00}, \lambda_{01}, \dots, \lambda_{0g})$ and θ, is

    (2.4) $S(t \mid w, \lambda_0, \theta) = [S_0(t \mid \lambda_0)\, S_1(t \mid \theta)]^{w}$,

    where $S_0(t)$ and $S_1(t)$ are, respectively, the survival functions related to the piecewise linear hazard function and the Exponential hazard function, i.e.,

    $S_0(t \mid \lambda_0) = \exp\left[-\int_0^t h_0(u)\,du\right] = \exp\left[-\sum_{k=0}^{g} c_k(t)\, \lambda_{0k}\right]$,

    $S_1(t \mid \theta) = \int_t^{\infty} \frac{1}{\theta} \exp\left(-\frac{u}{\theta}\right) du = \exp\left(-\frac{t}{\theta}\right)$,

    where the $c_k(t)$ are positive statistics. Therefore, the density function takes the form

    (2.5) $f(t \mid w, \lambda_0, \theta) = h(t \mid w, \lambda_0, \theta)\, S(t \mid w, \lambda_0, \theta) = w[h_0(t \mid \lambda_0) + h_1(t \mid \theta)]\, [S_0(t \mid \lambda_0)\, S_1(t \mid \theta)]^{w}$.
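    The relation between (2.3) and (2.4) can be illustrated numerically. The sketch below assumes the baseline hazard is the sum written in (2.3) and an Exponential hazard $1/\theta$ for the parametric part; the closed-form integral in `cumulative_baseline` is derived here from (2.3) rather than quoted from the article, and all numeric values and function names are hypothetical.

```python
import math

def baseline_hazard(t, cuts, lam00, lam):
    """h_0(t) = lam00 + sum_k (t - a_{k-1}) * lam_k * I(a_{k-1} < t), as in (2.3)."""
    return lam00 + sum((t - a) * l for a, l in zip(cuts, lam) if a < t)

def cumulative_baseline(t, cuts, lam00, lam):
    """Closed-form integral of baseline_hazard over [0, t] (derived, not quoted)."""
    return lam00 * t + sum(0.5 * l * (t - a) ** 2 for a, l in zip(cuts, lam) if a < t)

def survival(t, w, theta, cuts, lam00, lam):
    """S(t | w, lambda_0, theta) = [S_0(t) S_1(t)]^w, as in (2.4)."""
    s0 = math.exp(-cumulative_baseline(t, cuts, lam00, lam))
    s1 = math.exp(-t / theta)
    return (s0 * s1) ** w

# Hypothetical values: cuts holds a_0, ..., a_{g-1}; lam holds lambda_01, ..., lambda_0g.
cuts = [0.0, 10.0, 20.0]
lam00, lam = 0.02, [0.001, 0.002, 0.0005]
w, theta, t = 1.3, 25.0, 15.0

# Compare the closed form with a midpoint-rule integral of the frailty-modified hazard.
steps = 20000
dt = t / steps
numeric_H = sum(
    w * (baseline_hazard((i + 0.5) * dt, cuts, lam00, lam) + 1.0 / theta) * dt
    for i in range(steps)
)
closed = survival(t, w, theta, cuts, lam00, lam)
print(closed, math.exp(-numeric_H))
```

    The two printed values should agree to several decimal places, which is the statement $S(t) = \exp[-H(t)]$ applied to the hazard $w[h_0(t) + 1/\theta]$.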

    We assume that the parameter θ is specific to each individual in the population, but related to the covariates x through a probabilistic model. In order to facilitate implementation, it is convenient to assume that

    $\log\theta \mid x \sim N(\log\theta \mid \beta x, \sigma_\theta^2)$, (2.6)

    that is, given x, the logarithm of the mean, log θ, is modeled as a Normal distribution with mean βx, a linear combination of the covariate effects, where $\beta = (\beta_1, \beta_2, \dots, \beta_p)$ and x is the N×p matrix with rows $x_1, x_2, \dots, x_N$, and with variance $\sigma_\theta^2$. The hyperparameters β and $\sigma_\theta^2$ are unknown constants common to all individuals in the population. Expression (2.6) is equivalent to saying that, given the hyperparameters, the mean of the Exponential distribution is log-Normally distributed.

    The hierarchical model given by the two stages (2.5) and (2.6) allows complete heterogeneity ('frailty') in the population, so we can find two different individuals who have the same covariate vector but whose hazard functions are not necessarily identical. Further, the first stage (2.5) describes the data through a parametric model given vectors of specific parameters, later to be estimated, while the second stage (2.6) accounts for cross-sectional (between-subject) heterogeneity of the parameter vector θ = {$\theta_{ij}$} (i = 1,2,…,n and j = 1,2,…,$m_i$). Thus $\theta_{ij}$ denotes the expected time to death for the jth individual in cluster i. The hierarchical representation of the model enables us to use MCMC methodology, which allows the Bayesian analysis of the problem.

    3. The Likelihood Specification Using the Counting Process Approach

    Suppose that the survival time $T_{ij}$ of the jth individual in the ith cluster is an absolutely continuous random variable, conditionally independent of a right censoring time $Z_{ij}$ given the covariates $x_{ij}$ and frailty $w_i$. Let $V_{ij} = \min(T_{ij}, Z_{ij})$ and $\delta_{ij} = I(T_{ij} \le Z_{ij})$ denote the time to the end-point event and the indicator that the event of interest took place, respectively. Suppose that $(V_{ij}, \delta_{ij}, x_{ij}, w_i)$ are i.i.d. for i = 1,2,…,n, j = 1,2,…,$m_i$, and that the conditional hazard function of $T_{ij}$ given $x_{ij}$ and $w_i$ satisfies the additive Exponential-piecewise linear hazard model.

    For subject j in cluster i, let $N_{ij}(t) = 1$ if $\delta_{ij} = 1$ in the interval [0,t] and $N_{ij}(t) = 0$ otherwise, and let $Y_{ij}(t) = 1$ if the subject is still exposed to risk at time t and $Y_{ij}(t) = 0$ otherwise. Hence, we have a set of $N = \sum_{i=1}^{n} m_i$ subjects such that the counting process $\{N_{ij}(t);\, t \ge 0\}$ for the jth subject in the ith cluster records the number of observed events up to time t. Letting $dN_{ij}(t)$ denote the increment of $N_{ij}(t)$ over the small interval [t,t+dt), the likelihood of the data conditioned on $w_i$ is proportional to

    (3.1) $\prod_{i=1}^{n} \prod_{j=1}^{m_i} \left( \prod_{t \ge 0} \left\{ Y_{ij}(t)\, w_i \left[ h_0(t \mid \lambda_0) + h_1(t \mid \theta) \right] \right\}^{dN_{ij}(t)} \right) \exp\left( -\int_{t \ge 0} Y_{ij}(t)\, w_i \left[ h_0(t \mid \lambda_0) + h_1(t \mid \theta) \right] dt \right)$.

    Since we allow each $N_{ij}(t)$ to take at most one jump for each subject, the $dN_{ij}(t)$ contribute to the likelihood in the same manner as independent Poisson random variables, even though $dN_{ij}(t) \le 1$ for all i, j and t.

    Suppose that, as described in Section 2, the time axis [0,∞) is partitioned into g + 1 disjoint intervals $I_1, I_2, \dots, I_{g+1}$, where $I_k = [a_{k-1}, a_k)$ for k = 1,2,…,g+1, with $a_0 = 0$ and $a_{g+1} = \infty$. In the kth interval, given $w_i$, the jth subject in the ith cluster has a hazard of the form $w_i\{h_0(t_{ij} \mid \lambda_{0k}) + h_1(t_{ij} \mid \theta_{ij})\}$, k = 1,2,…,$g_{ij}$. Recall that $g_{ij}$ denotes the number of partitions of the time interval for the jth subject in the ith group. Given the complete data (T, w), where T = {$t_{ij}$: i = 1,2,…,n; j = 1,2,…,$m_i$} and w = ($w_1$,…,$w_n$), the likelihood (3.1) can be re-expressed as

    (3.2) $\prod_{i=1}^{n} \prod_{j=1}^{m_i} \prod_{k=1}^{g_{ij}} \left( \prod_{t \in (a_{k-1}, a_k]} \left\{ Y_{ij}(t)\, w_i \left[ h_0(t \mid \lambda_0) + h_1(t \mid \theta) \right] \right\}^{dN_{ijk}} \right) \exp\left( -\int_{t \ge 0} Y_{ij}(t)\, w_i \left[ h_0(t \mid \lambda_0) + h_1(t \mid \theta) \right] dt \right)$,

    where $dN_{ijk}$ is the change in the count function for the jth subject in the ith group in interval k. Suppose that the risk accumulated over the interval $I_k$ is small, i.e.,

    (3.3) $\int_{a_{k-1}}^{a_k} Y_{ij}(t) \left\{ h_0(t \mid \lambda_0) + h_1(t \mid \theta) \right\} dt \approx 0$ for all i, j, k.

    The likelihood contribution across this interval for individuals at risk is approximately

    $\left\{ w_i \left[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \right] \right\}^{dN_{ijk}} \times \exp\left( -w_i \left[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \right] \right)$, (3.4)

    where $dH_{0k} = \int_{a_{k-1}}^{a_k} h_0(t)\,dt$ is the usual cumulative baseline intensity for the kth interval.

    Hence, the likelihood (3.4) is essentially Poisson in form, reflecting the fact that the likelihood may be thought of as generated by independent contributions of many data 'atoms', each concerned with the observation of an individual over a very short interval during which the intensity may be regarded as constant and approximately zero (for a review of this point, see Clayton, 1994). Therefore, we replace (3.4) with

    $\prod_{i=1}^{n} \prod_{j=1}^{m_i} \prod_{k:\, Y_{ijk} = 1} \left\{ w_i \left[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \right] \right\}^{dN_{ijk}} \exp\left( -w_i \left[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \right] \right)$, (3.5)

    where $Y_{ijk} = 1$ if the jth subject in the ith group is exposed to risk at time $t \in (a_{k-1}, a_k]$, and $Y_{ijk} = 0$ otherwise.
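    The Poisson form of (3.5) translates directly into a sum of log-kernels over the subject-intervals at risk. The following is a minimal sketch, not the authors' BUGS code: it assumes $dH_{0k} = \lambda_{00}(a_k - a_{k-1}) + \tfrac{1}{2}\lambda_{0k}(a_k - a_{k-1})^2$, the form implied by the Poisson means used later in (4.8), and the record layout, function names and numeric values are hypothetical.

```python
import math

def interval_mean(w_i, theta_ij, lam00, lam_k, width):
    """Poisson mean w_i * [dH_0k + width / theta_ij] for one interval at risk."""
    # Assumed form of dH_0k, read off the Poisson means in (4.8).
    dH0k = lam00 * width + 0.5 * lam_k * width ** 2
    return w_i * (dH0k + width / theta_ij)

def log_likelihood(records, lam00, lam):
    """Log of (3.5): sum of dN * log(mu) - mu over intervals with Y_ijk = 1.

    Each record is (w_i, theta_ij, k, width, dN_ijk) for one subject-interval at risk.
    """
    total = 0.0
    for w_i, theta_ij, k, width, dN in records:
        mu = interval_mean(w_i, theta_ij, lam00, lam[k - 1], width)
        total += dN * math.log(mu) - mu
    return total

# Hypothetical subject at risk in intervals 1 and 2, with the event in interval 2.
lam00, lam = 0.05, [0.01, 0.02]
records = [(1.0, 20.0, 1, 1.0, 0), (1.0, 20.0, 2, 1.0, 1)]
ll = log_likelihood(records, lam00, lam)
print(ll)
```

    Intervals in which the subject is no longer at risk are simply left out of `records`, which mirrors the restriction to k with $Y_{ijk} = 1$ in (3.5).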

    4. Prior Distribution

    To complete a Bayesian specification of the model, prior distributions have to be specified for the parameter vector $\lambda_0$ and the hyperparameters β, $\sigma_\theta^2$ and η. It seems natural to assume independent priors for $\lambda_0$, (β, $\sigma_\theta^2$) and η. For $\lambda_0 = (\lambda_{00}, \lambda_{01}, \dots, \lambda_{0g})'$, we assume independent Gamma priors, i.e.,

    (4.1) $\lambda_{0k} \sim \mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k}), \quad k = 1, 2, \dots, g_{ij}$,

    where $a_{0k}/b_{0k}$ is the prior expectation for $\lambda_{0k}$ and $a_{0k}/b_{0k}^2$ is the prior variance, with prior independence assumed across intervals, hence

    (4.2) $\lambda_0 \sim \prod_{k=1}^{g_{ij}} \mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k})$.

    For (β, $\sigma_\theta^2$), we choose the usual Normal-Inverse Gamma conjugate priors, i.e.,

    $\beta \mid \sigma_\theta^2 \sim N_p(\beta \mid m_\theta, \sigma_\theta^2 V_\theta)$ (4.3)

    with $\sigma_\theta^{-2} \sim \mathrm{Ga}(\sigma_\theta^{-2} \mid a_\theta, b_\theta)$. (4.4)

    Finally, we suggest a Gamma distribution as a prior for η, i.e.,

    $\eta \sim \mathrm{Ga}(\phi_1, \phi_2)$, (4.5)

    where $\phi_1/\phi_2$ is the prior expectation for η and $\phi_1/\phi_2^2$ is the prior variance.

    4.1 Data Augmentation and Gibbs Sampler

    To derive the conditional posterior distributions, we use the approach of 'data augmentation' (Tanner and Wong, 1987). The idea of data augmentation is to augment the observed data with so-called latent or missing data, in order to exploit the simplicity of the resulting conditional posterior distributions of the parameter vectors of interest. Although this will increase the dimensionality of the problem (possibly at the expense of extra computing time), the Gibbs sampler will be seen to be simple, as follows.

    First note that (3.4) is essentially Poisson in form, reflecting the fact that the likelihood may be thought of as generated by independent contributions of many data atoms, each concerned with the observation of individual j of cluster i over a very short interval during which the intensity may be regarded as constant, i.e.,

    $P(N_{ijk} = n \mid dH_{0k}, x_{ij}, \theta_{ij}, w_i, Y_{ij} = 1) = \frac{1}{n!} \left\{ w_i \left[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \right] \right\}^{n} \exp\left( -w_i \left[ dH_{0k} + \frac{1}{\theta_{ij}} (a_k - a_{k-1}) \right] \right)$. (4.6)

    Hence we have

    $dN_{ijk} \overset{\text{ind}}{\sim} \mathrm{Poisson}\left[ dN_{ijk} \,\Big|\, w_i\, dH_{0k} + \frac{w_i}{\theta_{ij}} (a_k - a_{k-1}) \right]$. (4.7)

    Since the additive form of the Poisson mean does not lead to conditional posterior distributions in closed form, we solve this problem by re-expressing (4.7) in an augmented form involving independent Poisson latent variables (unobserved or missing data), one for each term in the expression for the Poisson mean. In particular, we assume

    (4.8) $dN_{ijk} = dN_{ijk0} + dN_{ijk2} + dN_{ijk1}$, for all i, j: $Y_{ij} = 1$,

    such that

    $dN_{ijk0} \sim \mathrm{Poisson}[dN_{ijk0} \mid w_i\, \lambda_{00}\, (a_k - a_{k-1})]$,

    $dN_{ijk2} \sim \mathrm{Poisson}\left[ dN_{ijk2} \,\Big|\, \frac{w_i}{2}\, \lambda_{0k}\, (a_k - a_{k-1})^2 \right], \quad k = 1, 2, \dots, g$,

    $dN_{ijk1} \sim \mathrm{Poisson}\left[ dN_{ijk1} \,\Big|\, \frac{w_i}{\theta_{ij}} (a_k - a_{k-1}) \right]$.

    Using the property that the sum of independent Poisson random variables is also Poisson, it is straightforward to show that (4.8) is equivalent to (4.7). Such an expression allows us to take advantage of Poisson-Gamma conjugacy to obtain simple conditional posteriors wherever possible. Some of the derivations of these conditional distributions are outlined in Appendix A. The sampler iterates through the following steps:

    Step 1. Sample the latent variables $(dN_{ijk0}, dN_{ijk2}, dN_{ijk1})'$, for all i, j, k: $Y_{ij} = 1$, jointly from their full conditional posterior distribution (A.3) as follows:

    1. If $dN_{ijk} = 0$ then let $dN_{ijk0} = dN_{ijk2} = dN_{ijk1} = 0$.

    2. If $dN_{ijk} > 0$ then sample $(dN_{ijk0}, dN_{ijk2}, dN_{ijk1})$ from Multinomial($dN_{ijk} \mid P_{ijk0}, P_{ijk2}, P_{ijk1}$), where, writing $D_{ijk} = \lambda_{00}(a_k - a_{k-1}) + \frac{1}{2}\lambda_{0k}(a_k - a_{k-1})^2 + \frac{1}{\theta_{ij}}(a_k - a_{k-1})$ for the common denominator,

    $P_{ijk0} = \frac{\lambda_{00}(a_k - a_{k-1})}{D_{ijk}}$, $\quad P_{ijk2} = \frac{\frac{1}{2}\lambda_{0k}(a_k - a_{k-1})^2}{D_{ijk}}$, $\quad P_{ijk1} = \frac{(a_k - a_{k-1})/\theta_{ij}}{D_{ijk}}$.

    Step 2. Sample $\lambda_{00}$ from expression (A.4).

    Step 3. Sample $\lambda_{0k}$, k = 1,2,…,$g_i$, from expression (A.5).

    Step 5. Sample $w_i$, i = 1,2,…,n, from expression (A.6).

    Step 6. Sample $\sigma_\theta^2$ and then $\beta \mid \sigma_\theta^2$ from expressions (A.6) and (A.8), respectively.
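    Step 1 above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the allocation probabilities are taken proportional to the three Poisson means in (4.8), so the common factor $w_i$ cancels, and the function names and numeric values are hypothetical.

```python
import random

def allocation_probs(theta_ij, lam00, lam_k, width):
    """Probabilities (P_ijk0, P_ijk2, P_ijk1), proportional to the means in (4.8)."""
    parts = [lam00 * width, 0.5 * lam_k * width ** 2, width / theta_ij]
    total = sum(parts)
    return [p / total for p in parts]

def split_count(dN, probs, rng):
    """Allocate dN observed events to the three latent components (order 0, 2, 1)."""
    counts = [0, 0, 0]
    for _ in range(dN):
        u = rng.random()
        if u < probs[0]:
            counts[0] += 1
        elif u < probs[0] + probs[1]:
            counts[1] += 1
        else:
            counts[2] += 1
    return counts

rng = random.Random(4)  # arbitrary seed
probs = allocation_probs(theta_ij=20.0, lam00=0.05, lam_k=0.02, width=1.0)
no_event = split_count(0, probs, rng)   # case 1: all latent counts are zero
one_event = split_count(1, probs, rng)  # case 2: the single event goes to one component
print(probs, no_event, one_event)
```

    Because each $dN_{ijk}$ is at most one, the multinomial draw in case 2 reduces to picking a single component with probabilities $(P_{ijk0}, P_{ijk2}, P_{ijk1})$.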

    The other conditionals do not have a conjugate analysis. For each j = 1,2,…,$m_i$ and i = 1,2,…,n, the conditional distribution for $\theta_{ij}$ is proportional to

    (4.9) $\left[ Y_{ij}(t)\, w_i\, h(t \mid \lambda_0, \theta_{ij}) \right]^{dN_{ijk}} S(t \mid \lambda_0, \theta_{ij})\, f(\theta_{ij} \mid x_{ij}, \beta, \sigma_\theta^2)$, for all k = 1,2,…,$g_{ij}$.

    Expression (4.9) does not have a closed form, but it is still possible to sample from it using a Metropolis algorithm.
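    A random-walk Metropolis update for a single $\theta_{ij}$ might look as follows. This is a sketch under several assumptions that are not in the article: the update works on $\log\theta_{ij}$, the likelihood part uses the Poisson form of (3.5) in place of the exact kernel (4.9), and the step size, starting value and all numeric inputs are hypothetical.

```python
import math
import random

def log_target(phi, intervals, w_i, lam00, prior_mean, prior_sd):
    """Log conditional density of phi = log(theta_ij), up to an additive constant."""
    theta = math.exp(phi)
    value = -0.5 * ((phi - prior_mean) / prior_sd) ** 2   # Normal prior on log(theta)
    for width, lam_k, dN in intervals:                    # Poisson kernels, as in (3.5)
        mu = w_i * (lam00 * width + 0.5 * lam_k * width ** 2 + width / theta)
        value += dN * math.log(mu) - mu
    return value

def metropolis(n_iter, phi0, step, rng, **target_args):
    """Random-walk Metropolis on phi; returns draws of theta and the acceptance rate."""
    phi, current = phi0, log_target(phi0, **target_args)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = phi + rng.gauss(0.0, step)
        candidate = log_target(proposal, **target_args)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            phi, current = proposal, candidate
            accepted += 1
        draws.append(math.exp(phi))
    return draws, accepted / n_iter

rng = random.Random(7)  # arbitrary seed
intervals = [(1.0, 0.01, 0), (1.0, 0.02, 1)]   # (width, lambda_0k, dN_ijk), hypothetical
draws, rate = metropolis(
    2000, phi0=math.log(20.0), step=0.5, rng=rng,
    intervals=intervals, w_i=1.0, lam00=0.05, prior_mean=3.0, prior_sd=1.0,
)
print(round(rate, 2), round(sum(draws) / len(draws), 2))
```

    Working on the log scale keeps every proposal positive after back-transformation, and the symmetric Gaussian proposal means the acceptance ratio involves only the target density.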

    Finally, letting $\xi = \eta^{-1}$, the full conditional distribution of ξ does not have a closed form either. It is proportional to

    $\frac{\xi^{n\xi}}{[\Gamma(\xi)]^{n}} \left( \prod_{i=1}^{n} w_i \right)^{\xi - 1} \exp\left( -\xi \sum_{i=1}^{n} w_i \right) f(\xi)$, (4.10)

    with $\xi \sim \mathrm{Ga}(\xi \mid \phi_1, \phi_2)$. (4.11)

    With this choice of priors, it can be shown that the above full conditional density is log-concave. Thus, we can use the adaptive rejection algorithm of Gilks and Wild (1992) to sample from this full conditional.
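    The log-concavity that adaptive rejection sampling relies on can be checked numerically for given frailties. The sketch below evaluates the logarithm of (4.10) with the Gamma prior (4.11) and inspects second differences on a grid; it is a numerical illustration, not a proof, and the frailty values, the grid and the prior parameters ($\phi_1 = 2$, $\phi_2 = 1$) are hypothetical.

```python
import math

def log_conditional_xi(xi, w, phi1, phi2):
    """Log of (4.10) with the Ga(phi1, phi2) prior (4.11), up to an additive constant."""
    n = len(w)
    sum_w = sum(w)
    sum_log_w = sum(math.log(x) for x in w)
    log_lik = (n * xi * math.log(xi) - n * math.lgamma(xi)
               + (xi - 1.0) * sum_log_w - xi * sum_w)
    log_prior = (phi1 - 1.0) * math.log(xi) - phi2 * xi
    return log_lik + log_prior

w = [0.6, 1.4, 0.9, 1.1, 1.0]               # hypothetical current frailties
grid = [0.2 + 0.1 * i for i in range(100)]  # xi from 0.2 to 10.1
values = [log_conditional_xi(x, w, 2.0, 1.0) for x in grid]
second_diffs = [values[i - 1] - 2.0 * values[i] + values[i + 1]
                for i in range(1, len(values) - 1)]
is_concave = all(d < 0.0 for d in second_diffs)
print(is_concave)
```

    Negative second differences across the grid are what one expects from a log-concave density; an adaptive rejection sampler would use evaluations of this same function to build its envelope.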

    5. Application

    Here we demonstrate the method using the well-known leukemia data analyzed by Cox (1972), Hougaard (2000, subsection 1.5.4), Ibrahim et al. (2001, example 3.4) and Spiegelhalter et al. (2004), among others. These data, listed in Table 1 as reported by Hougaard (2000), consist of 21 matched pairs of leukemia patients. The random variable of interest is the remission time (in weeks) of patients assigned to treatment with a drug or a placebo during remission maintenance therapy. The patients were matched according to center and remission status, either partial or complete. Thus one patient in each pair received 6-MP as a treatment and one received placebo. These data have been used in many articles, but most of them neglect the pairing. The aim is to find the effect of the treatment, and the corresponding covariate is of the matched pair's type.

    TABLE 1: Leukemia Remission Time Data

    Pair  Status  Placebo  6-MP
    1     P       1        10
    2     C       22       7
    3     C       3        32+
    4     C       12       23
    5     C       8        22
    6     P       17       6
    7     C       2        16
    8     C       11       34+
    9     C       8        32+
    10    C       12       25+
    11    C       2        11+
    12    P       5        20+
    13    C       4        19+
    14    C       15       6
    15    C       8        17+
    16    P       23       35+
    17    P       5        6
    18    C       11       13
    19    C       4        9+
    20    C       1        6+
    21    C       8        10+

    Status: P = partial remission, C = complete remission; + marks a right-censored time.

    Source. Hougaard (2000, page 15)

    In the analysis below, we have used the program BUGS.

    Given the model assumptions, this program performs the Gibbs sampler by simulating from the full conditional distributions. The code to specify this model and to obtain the posterior distributions of the parameters is in Appendix B. The Bayesian estimators were obtained through the implementation of the Gibbs sampling scheme described in the previous section. We ran 10,000 iterations of the algorithm and discarded the first 500 iterations as a burn-in. Spiegelhalter et al. (2004), the BUGS team, use the idea of parallel multiple chains to check the convergence of the Gibbs sampler and recommend using 2 to 5 chains. As mentioned by the BUGS team, fully quantitative monitoring of parallel multiple chains was first proposed by Gelman and Rubin (1992a, b). The chains should start from over-dispersed initial values to ensure good coverage of the parameter space. To generate the Gibbs posterior samples in the previous section, we chose to use two parallel chains. Convergence of the chains was monitored in this article via the Brooks and Gelman (1998) convergence diagnostic graph. Once convergence has been achieved, 10,000 observations are taken from each chain after the burn-in period to reach our goal of 20,000 observations. Inspecting the Brooks and Gelman diagnostic graphs (Figures 1a and 2a), we find that the BGR (Brooks and Gelman ratio) converges to one, which indicates convergence for the regression coefficient β and the standard deviation of the frailty $\sigma_b$. Therefore, beyond the burn-in period, a sample of 10,000 observations from each of the two chains is drawn.
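    The idea behind monitoring parallel chains can be illustrated with the variance-based potential scale reduction factor of Gelman and Rubin. The sketch below is an assumption-laden illustration: it is the classical variance-ratio statistic, which is related to but not necessarily computed in the same way as the BGR plot produced by BUGS, and the simulated chains and function name are hypothetical.

```python
import random
import statistics

def potential_scale_reduction(chains):
    """Gelman-Rubin style ratio for equal-length chains; values near 1 suggest agreement."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(11)  # arbitrary seed
# Two chains sampling the same distribution, and two chains stuck in different regions.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
stuck = [[rng.gauss(0.0, 1.0) for _ in range(2000)],
         [rng.gauss(3.0, 1.0) for _ in range(2000)]]
r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
print(round(r_mixed, 3), round(r_stuck, 3))
```

    A ratio close to one for the first pair and well above one for the second shows why chains started from over-dispersed values are informative: disagreement between chains inflates the pooled variance relative to the within-chain variance.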

    Figure 1. Diagnostics related to β: a) Brooks & Gelman convergence diagnostics; b) trace plot of β for each chain; c) ACF for the iterations for each chain; d) history plot of β for each chain.

    Figure 2. Diagnostics related to $\sigma_b$: a) Brooks & Gelman convergence diagnostics; b) trace plot of $\sigma_b$ for each chain; c) ACF for the iterations for each chain.

    For each of the two chains, BUGS plots the estimated parameters as a function of the iteration number (Figures 1b and 2b). Additionally, BUGS offers a graph of the autocorrelation function (ACF) of the iterations up to lag 50 for each chain independently (Figures 1c and 2c). The autocorrelation plot in Figure 2c illustrates the dependence between successive observations, which appears to die out well before lag 40. This indicates fairly rapid mixing and thus good coverage of the parameter space with a reasonably small number of iterations. As a rule of thumb, persistent autocorrelations call for further steps to get rid of the dependence structure, but from Figures 1b and 1d we can be reasonably confident that convergence of β has been achieved (the two chains appear to overlap one another).

    For each node of the data set, a similar set of graphs is produced to monitor convergence and independence. They are omitted from this article for reasons of space.

    Once one is satisfied with the ACF and convergence graphs, at least for the parameters of interest, and most importantly with the convergence of all model parameters, a Gibbs sample of size 20,000 is drawn for each parameter. In Table 2, the 2.5% and 97.5% columns correspond to the respective posterior percentiles of β and $\sigma_b$.

    TABLE 2. Posterior summaries of β and $\sigma_b$

    Parameter    Mean     SD       2.5%     Median   97.5%
    β            −1.54    0.6604   −3.02    −1.473   −0.4582
    $\sigma_b$   1.865    0.2648   1.427    1.838    2.461

    The 95% credible interval for β is thus (−3.02, −0.4582), and the mass of the posterior distribution of β lies to the left of zero, indicating that treatment with 6-MP has a significant effect compared to placebo. This can be further illustrated in a plot of the marginal posterior density of β as shown below in Figure 3.

    Figure 3. Estimated marginal density for β (chains 1:2, sample of 20,000)

    Figure 4, below, demonstrates the types of inference for the survival probability for the

    two groups separately and simultaneously that can be obtained from the full posterior

    samples.

    Figure 4. Posterior mean and central 95% limits for survival probabilities S(t) against time (weeks): the treatment group, the placebo group, and the treatment and placebo groups simultaneously.

    6. Conclusion

    This article presents the additive hazard model as a practical alternative to the
    proportional hazard model. When the proportional hazard assumptions are violated,
    there is a need for alternative models such as the one studied here. The additive
    hazard model, formulated like the proportional hazard model in terms of counting
    processes, is extended to deal with unobserved heterogeneity among the individuals
    under study. A Bayesian analysis of survival models with frailty, especially models
    with a PHM or AHM form for the hazard function, was impossible only a few years ago.
    With the great increase in computing power, the analysis of such models is now
    advisable in survival analysis, because they can better explain the relationship
    between the lifetime random variable and the diagnostic factors (covariates).
    Spiegelhalter et al. (2004), the BUGS team, analyzed the data set used in Section 5
    under the assumption that the proportional hazard assumptions hold. We assume, for the
    purpose of illustration, that one of the proportional hazard assumptions is violated
    for this data set. Hence, we have extended the BUGS code available from the BUGS team
    to implement the algorithm described in Section 4.

    APPENDIX A Derivation of Conditional Posteriors:

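    The joint posterior density of the parameters $(\lambda_0, \theta, \beta, \sigma_\theta^2, w)$ and latent variables
    $(dN_{ijk0}, dN_{ijk2}, dN_{ijk1})$ is proportional to

    $$
    \begin{aligned}
    \prod_{i=1}^{n}\prod_{j=1}^{m_i}\prod_{k=1}^{g_{ij}}
    \Big\{ & I(dN_{ijk} = dN_{ijk0} + dN_{ijk2} + dN_{ijk1})\,
    \mathrm{Poisson}[dN_{ijk0} \mid w_i\lambda_{00}(a_k - a_{k-1})] \\
    & \times \mathrm{Poisson}[dN_{ijk2} \mid \tfrac{w_i}{2}\lambda_{0k}(a_k - a_{k-1})^2]\,
    \mathrm{Poisson}[dN_{ijk1} \mid \tfrac{w_i}{\theta_{ij}}(a_k - a_{k-1})] \\
    & \times N(\log\theta_{ij} \mid x_{ij}\beta, \sigma_\theta^2)\,
    \mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k})\,
    \mathrm{Ga}(w_i \mid \eta^{-1}, \eta^{-1}) \Big\} \\
    & \times \mathrm{Ga}(\eta \mid \phi_1, \phi_2)\,
    \mathrm{Ga}(\lambda_{00} \mid a_{00}, b_{00})\,
    N(\beta \mid m_\theta, \sigma_\theta^2 V_\theta)\,
    \mathrm{Ga}(1/\sigma_\theta^2 \mid a_\theta, b_\theta).
    \end{aligned}
    \tag{A.1}
    $$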

    Step 1. It follows from expression (A.1) that the full conditional distribution of the
    latent variables is proportional to

    $$
    \begin{aligned}
    & I(dN_{ijk} = dN_{ijk0} + dN_{ijk2} + dN_{ijk1})
    \times \frac{[w_i\lambda_{00}(a_k - a_{k-1})]^{dN_{ijk0}}}{dN_{ijk0}!} \\
    & \times \frac{[(1/2)\,w_i\lambda_{0k}(a_k - a_{k-1})^2]^{dN_{ijk2}}}{dN_{ijk2}!}
    \times \frac{[(w_i/\theta_{ij})(a_k - a_{k-1})]^{dN_{ijk1}}}{dN_{ijk1}!}.
    \end{aligned}
    \tag{A.2}
    $$

    On the other hand, given (A.1), expression (A.2) is also proportional to

    $$
    \begin{aligned}
    & \frac{dN_{ijk}!}{[w_i\lambda_{00}(a_k - a_{k-1}) + (w_i/2)\lambda_{0k}(a_k - a_{k-1})^2 + (w_i/\theta_{ij})(a_k - a_{k-1})]^{dN_{ijk}}} \\
    & \times \frac{[w_i\lambda_{00}(a_k - a_{k-1})]^{dN_{ijk0}}}{dN_{ijk0}!}
    \times \frac{[(w_i/2)\lambda_{0k}(a_k - a_{k-1})^2]^{dN_{ijk2}}}{dN_{ijk2}!}
    \times \frac{[(w_i/\theta_{ij})(a_k - a_{k-1})]^{dN_{ijk1}}}{dN_{ijk1}!},
    \end{aligned}
    $$

    $$
    \begin{aligned}
    \propto\; & \frac{dN_{ijk}!}{dN_{ijk0}!\,dN_{ijk2}!\,dN_{ijk1}!} \\
    & \times \left[\frac{w_i\lambda_{00}(a_k - a_{k-1})}{w_i\lambda_{00}(a_k - a_{k-1}) + (w_i/2)\lambda_{0k}(a_k - a_{k-1})^2 + (w_i/\theta_{ij})(a_k - a_{k-1})}\right]^{dN_{ijk0}} \\
    & \times \left[\frac{(w_i/2)\lambda_{0k}(a_k - a_{k-1})^2}{w_i\lambda_{00}(a_k - a_{k-1}) + (w_i/2)\lambda_{0k}(a_k - a_{k-1})^2 + (w_i/\theta_{ij})(a_k - a_{k-1})}\right]^{dN_{ijk2}} \\
    & \times \left[\frac{(w_i/\theta_{ij})(a_k - a_{k-1})}{w_i\lambda_{00}(a_k - a_{k-1}) + (w_i/2)\lambda_{0k}(a_k - a_{k-1})^2 + (w_i/\theta_{ij})(a_k - a_{k-1})}\right]^{dN_{ijk1}},
    \end{aligned}
    $$

    $$
    = \frac{dN_{ijk}!}{dN_{ijk0}!\,dN_{ijk2}!\,dN_{ijk1}!}\,
    P_{ijk0}^{\,dN_{ijk0}}\, P_{ijk2}^{\,dN_{ijk2}}\, P_{ijk1}^{\,dN_{ijk1}},
    $$

    $$
    \propto \mathrm{Multinomial}\big(\{dN_{ijk0}, dN_{ijk2}, dN_{ijk1}\} \mid dN_{ijk}, \{P_{ijk0}, P_{ijk2}, P_{ijk1}\}\big),
    \tag{A.3}
    $$

    where $P_{ijk0}$, $P_{ijk2}$ and $P_{ijk1}$ are defined in subsection 4.1.
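    Step 1 amounts to splitting each observed increment among the three hazard components with a single multinomial draw. The sketch below assumes that the three probabilities are the normalized Poisson means appearing in (A.2), so that the frailty w_i cancels; the definition in subsection 4.1 is not reproduced here, and the function names, argument names and numerical values are illustrative only.

```python
import random

def allocation_probs(w, lam00, lam0k, theta, width):
    """Normalized Poisson means for the components, in the order (0, 2, 1)."""
    m0 = w * lam00 * width               # constant baseline part
    m2 = (w / 2.0) * lam0k * width ** 2  # interval-specific baseline part
    m1 = (w / theta) * width             # parametric (covariate) part
    total = m0 + m2 + m1
    return [m0 / total, m2 / total, m1 / total]

def draw_latent_counts(dN, probs, rng=random):
    """One multinomial draw of (dN0, dN2, dN1) whose entries sum to dN."""
    counts = [0, 0, 0]
    for _ in range(dN):
        counts[rng.choices([0, 1, 2], weights=probs)[0]] += 1
    return counts

# Illustrative values (assumptions), with interval width a_k - a_{k-1} = 1.
random.seed(3)
probs = allocation_probs(w=1.2, lam00=0.05, lam0k=0.01, theta=4.0, width=1.0)
counts = draw_latent_counts(dN=1, probs=probs)
```

    Because the frailty multiplies every component, changing `w` leaves `probs` unchanged, which is why the allocation depends only on the relative sizes of the three hazard pieces.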

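    Step 2. The full conditional distribution of $\lambda_{00}$ is proportional to

    $$
    \begin{aligned}
    & \prod_{i,j,k:\,Y_{ij}=1} \left\{ \frac{[w_i\lambda_{00}(a_k - a_{k-1})]^{dN_{ijk0}}}{dN_{ijk0}!}
    \exp[-w_i\lambda_{00}(a_k - a_{k-1})] \right\}
    \times (\lambda_{00})^{a_{00}-1}\exp(-b_{00}\lambda_{00}), \\
    & \propto (\lambda_{00})^{a_{00} + \sum_{i,j,k:\,Y_{ij}=1} dN_{ijk0} - 1}
    \exp\Big[-\lambda_{00}\Big(b_{00} + \sum_{i,j,k} Y_{ij}\, w_i (a_k - a_{k-1})\Big)\Big], \\
    & \propto \mathrm{Ga}\Big(\lambda_{00} \,\Big|\, a_{00} + \sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk0},\;
    b_{00} + \sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} Y_{ij}\, w_i (a_k - a_{k-1})\Big).
    \end{aligned}
    \tag{A.4}
    $$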
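    The gamma update in (A.4) only needs two running totals, one for the shape and one for the rate. The sketch below assumes, for simplicity, a common grid of interval end points for all subjects and nested Python lists for the data; `draw_lambda00` and the toy inputs are hypothetical.

```python
import random

def draw_lambda00(a00, b00, dN0, Y, w, a, rng=random):
    """Draw lambda_00 from Ga(shape, rate) as in (A.4).

    dN0[i][j][k-1] : latent count for the constant baseline part in interval k
    Y[i][j]        : at-risk indicator (0 or 1)
    w[i]           : frailty of group i
    a              : interval end points a_0 < a_1 < ... < a_g (common grid, assumed)
    """
    shape, rate = a00, b00
    for i in range(len(w)):
        for j in range(len(Y[i])):
            for k in range(1, len(a)):
                shape += dN0[i][j][k - 1]
                rate += Y[i][j] * w[i] * (a[k] - a[k - 1])
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate), shape, rate

# Toy inputs (assumptions): two groups with one subject each, two intervals.
random.seed(4)
lam00, shape00, rate00 = draw_lambda00(
    a00=0.5, b00=0.5,
    dN0=[[[1, 0]], [[0, 1]]],
    Y=[[1], [1]],
    w=[1.0, 2.0],
    a=[0.0, 1.0, 3.0],
)
```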

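    Step 3. The full conditional distribution of $\lambda_{0k}$, $k = 1, 2, \ldots, g_{ij}$, is proportional to

    $$
    \begin{aligned}
    & \prod_{i,j:\,Y_{ij}=1} [(w_i/2)\lambda_{0k}(a_k - a_{k-1})^2]^{dN_{ijk2}}
    \exp[-(w_i/2)\lambda_{0k}(a_k - a_{k-1})^2]
    \times (\lambda_{0k})^{a_{0k}-1}\exp(-b_{0k}\lambda_{0k}), \\
    & \propto (\lambda_{0k})^{\sum_{i=1}^{n}\sum_{j=1}^{m_i} dN_{ijk2}}
    \exp\Big[-\lambda_{0k}\sum_{i=1}^{n}\sum_{j=1}^{m_i} (w_i/2)\, Y_{ij}(a_k - a_{k-1})^2\Big]
    \times (\lambda_{0k})^{a_{0k}-1}\exp(-b_{0k}\lambda_{0k}), \\
    & \propto \mathrm{Ga}\Big(\lambda_{0k} \,\Big|\, a_{0k} + \sum_{i,j} dN_{ijk2},\;
    b_{0k} + \sum_{i,j} (w_i/2)\, Y_{ij}(a_k - a_{k-1})^2\Big).
    \end{aligned}
    \tag{A.5}
    $$

    Step 4. To derive the conditional distribution of $w_i$, $i = 1, 2, \ldots, n$, we start with the joint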

    posterior density of parameters prior to augmentation that is proportional to

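    $$
    \begin{aligned}
    & \prod_{j=1}^{m_i}\prod_{k:\,Y_{ij}=1}
    \Big\{ \Big[w_i\Big(dH_{0k} + \tfrac{1}{\theta_{ij}}(a_k - a_{k-1})\Big)\Big]^{dN_{ijk}}
    \exp\Big[-w_i\Big(dH_{0k} + \tfrac{1}{\theta_{ij}}(a_k - a_{k-1})\Big)\Big] \Big\}
    \times w_i^{\eta^{-1}-1}\exp(-\eta^{-1} w_i), \\
    & \propto (w_i)^{\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk} + \eta^{-1} - 1}
    \exp\Big\{-w_i\Big(\eta^{-1} + \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}}
    \Big[dH_{0k} + \tfrac{1}{\theta_{ij}}(a_k - a_{k-1})\Big]\Big)\Big\}, \\
    & \propto \mathrm{Ga}\Big(w_i \,\Big|\, \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk} + \eta^{-1},\;
    \eta^{-1} + \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}}
    \Big[dH_{0k} + \tfrac{1}{\theta_{ij}}(a_k - a_{k-1})\Big]\Big).
    \end{aligned}
    \tag{A.6}
    $$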
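    The frailty update in (A.6) can be sketched in the same way as the other gamma steps: the shape collects the observed event counts of group i and the rate collects its cumulated hazard. The sketch assumes a common grid of interval end points and list-based inputs; `draw_frailty` and the toy values are hypothetical.

```python
import random

def draw_frailty(eta, dN_i, dH0, theta_i, a, rng=random):
    """Draw w_i from the gamma full conditional (A.6).

    dN_i[j][k-1] : observed counting-process increment, subject j, interval k
    dH0[k-1]     : baseline hazard increment in interval k
    theta_i[j]   : theta_ij for subject j of group i
    a            : interval end points a_0 < a_1 < ... < a_g (common grid, assumed)
    """
    shape = 1.0 / eta
    rate = 1.0 / eta
    for j in range(len(theta_i)):
        for k in range(1, len(a)):
            shape += dN_i[j][k - 1]
            rate += dH0[k - 1] + (a[k] - a[k - 1]) / theta_i[j]
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate), shape, rate

# Toy inputs (assumptions): one subject in the group, two intervals.
random.seed(5)
w_i, shape_w, rate_w = draw_frailty(
    eta=0.5, dN_i=[[0, 1]], dH0=[0.1, 0.2], theta_i=[2.0], a=[0.0, 1.0, 3.0]
)
```

    The ratio `shape_w / rate_w` is the conditional posterior mean of the frailty, so groups with more events than their cumulated hazard predicts receive frailties above one.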

    Step 5. The full conditional distribution of $(\beta, \sigma_\theta^2)$ is proportional to

    $$
    \Big\{\prod_{i=1}^{n}\prod_{j=1}^{m_i} N(\log\theta_{ij} \mid x_{ij}\beta, \sigma_\theta^2)\Big\}\,
    N(\beta \mid m_\theta, \sigma_\theta^2 V_\theta)\,
    \mathrm{Ga}(1/\sigma_\theta^2 \mid a_\theta, b_\theta).
    $$

    That expression is the same as the one that appears in the usual conjugate analysis of
    Normal data (see, e.g., DeGroot, 1970, pages 249-252). It is then proportional to a
    multivariate Normal-Inverse Gamma distribution, i.e.,

    $$
    \beta \mid \sigma_\theta^2 \sim N_p\big(\beta \mid \hat\beta,\; \sigma_\theta^2 (V_\theta^{-1} + x'x)^{-1}\big),
    \tag{A.7}
    $$

    $$
    1/\sigma_\theta^2 \sim \mathrm{Ga}\Big(1/\sigma_\theta^2 \,\Big|\, a_\theta + \frac{n}{2},\;
    b_\theta + \frac{1}{2}\big[(y - x\hat\beta)'y + (m_\theta - \hat\beta)' V_\theta^{-1} m_\theta\big]\Big),
    \tag{A.8}
    $$

    where $y = (\log\theta_{11}, \ldots, \log\theta_{1m_1}, \ldots, \log\theta_{nm_n})'$, $x$ is the covariate matrix and the
    estimate of the regression coefficient, $\hat\beta$, is calculated from
    $\hat\beta = (V_\theta^{-1} + x'x)^{-1}(V_\theta^{-1} m_\theta + x'y)$.
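    For a single covariate, such as the treatment indicator, every quantity in (A.7) and (A.8) is a scalar and the update can be sketched without matrix algebra. This is a minimal illustration under stated assumptions: `draw_beta_sigma2` is a hypothetical helper, the n in (A.8) is taken to be the number of log θ values in y, and the toy inputs are not data from the paper.

```python
import random

def draw_beta_sigma2(y, x, m, V, a, b, rng=random):
    """One draw of (beta, sigma2) for a scalar beta, following (A.7)-(A.8)."""
    n = len(y)  # assumption: n counts the entries of y
    xx = sum(xi * xi for xi in x)
    xy = sum(xi * yi for xi, yi in zip(x, y))
    precision = 1.0 / V + xx                     # V^-1 + x'x
    beta_hat = (m / V + xy) / precision          # (V^-1 + x'x)^-1 (V^-1 m + x'y)
    resid_term = sum((yi - xi * beta_hat) * yi for xi, yi in zip(x, y))
    shape = a + n / 2.0
    rate = b + 0.5 * (resid_term + (m - beta_hat) * m / V)
    # 1/sigma2 ~ Ga(shape, rate); gammavariate takes (shape, scale).
    sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
    beta = rng.gauss(beta_hat, (sigma2 / precision) ** 0.5)
    return beta, sigma2, beta_hat

# Toy inputs (assumptions): three log-theta values, intercept-only design, vague prior.
random.seed(6)
beta, sigma2, beta_hat = draw_beta_sigma2(
    y=[1.0, 2.0, 3.0], x=[1.0, 1.0, 1.0], m=0.0, V=1e6, a=1.0, b=1.0
)
```

    With a vague prior (large V), `beta_hat` approaches the least-squares estimate, which is the sample mean of y in this intercept-only toy case.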

    APPENDIX B

    Here we give the program code used to analyze the data described in Section 5
    with the program BUGS. WinBUGS does not allow a[0] or dH[0] to be used, so j
    starts from 2, and ‘j = 0’ in the original formula is treated as ‘j = 1’. Therefore, dH[j] is the
    intensity in the (j − 1)th interval.

    Model; leukemia data    # the name of the program
    {
      # Set up data
      for(i in 1:N) {        # N is the total number of patients
        for(j in 2:T) {      # T is the number of unique failure times
          # risk set = 1 if obs.t >= a, where obs.t[i] is the observed remission
          # or censoring time of the ith patient
          # eps = 0.00001 will be used to guard against numerical imprecision in the step function
          # a[T] is the unique failure time + maximum censoring time
          Y[i,j]

      # Model
      for(j in 2:T) {
        # Idt[N,T] is the total intensity process
        # I0dt[N,T] and I2dt[N,T] are the intensities for the baseline hazard function
        # I1dt[N,T] is the intensity for the parametric part of the hazard function
        # Y[N,T] = 1 if the subject is observed and 0 if the patient is not observed
        for(i in 1:N) {
          Idt[i, j]

    sigma.theta
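    The comments in the set-up loop describe an at-risk indicator built with a step function and a small tolerance eps. The Python sketch below illustrates that idea only; it is not the authors' BUGS code, and it assumes that the indicator equals 1 whenever obs.t[i] − a[j] + eps is non-negative. The function name `risk_set` and the toy times are hypothetical.

```python
def risk_set(obs_t, a, eps=0.00001):
    """Y[i][j] = 1 if patient i is still at risk at time a[j], else 0."""
    return [[1 if t - aj + eps >= 0 else 0 for aj in a] for t in obs_t]

# Toy inputs (assumptions): two patients and five grid times.
Y = risk_set(obs_t=[6, 10], a=[1, 6, 7, 10, 35])
```

    The tolerance keeps a patient in the risk set at the exact time of the event or censoring even if the two stored times differ by rounding error.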

  • Clayton, D. (1994). Bayesian analysis of frailty models. Technical Report, Medical

    Research Council Biostatistics Unit, Cambridge.

    Clayton, D.G. (1978). A model for association in bivariate life-tables and its application

    in epidemiological studies of chronic disease incidence. Biometrika, 65, 141-

    151.

    Cox, D.R. (1972). Regression models and life-tables (with discussion). J.R. Statist. Soc.,

    B34, 187-220.

    Cox, D.R. (1975). Partial likelihood. Biometrika, 62, 269-276.

    DeGroot, M. H. (1970). Optimal Statistical Decisions. New York: McGraw-Hill.

    Dunson, D.B. and Herring, A.H. (2004). Bayesian model selection and averaging in
    additive and proportional hazards models (available for download at
    www.ftp.isds.duke.edu/workingPapers/04-16.pdf).

    Gamerman, D. (1991). Dynamic Bayesian models for survival data. Applied Statistics,

    40, 63-79.

    Gelman, A. and Rubin, D. (1992a). Inference from iterative simulation using multiple

    sequences. Statistical Science, 7, 457-511.

    Gelman, A. and Rubin, D. (1992b). A single series from the Gibbs sampler provides a false
    sense of security. Bayesian Statistics, 4, eds. J.M. Bernardo, J.O. Berger, A.P.
    Dawid and A.F.M. Smith, New York: Oxford University Press, 625-631.

    Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling.
    Applied Statistics, 41, 337-348.

    Hougaard, P. (2000). Analysis of Multivariate Survival Data. New York: Springer-Verlag.

    Ibrahim, J.G., Chen, M.H. and Sinha, D. (2001). Bayesian Survival Analysis. New
    York: Springer-Verlag.

    Lin, D.Y. and Ying, Z.L. (1994). Semiparametric analysis of the additive risk model.

    Biometrika, 81, 61-71.


  • Lin, D.Y. and Ying, Z.L. (1995). Semiparametric analysis of general
    additive-multiplicative hazard models for counting processes. Annals of Statistics,
    23, 1712-1734.

    McKeague, I. W. and Sasieni, P.D. (1994). A partly parametric additive risk model.

    Biometrika, 81, 501-514.

    Sinha, D. (1993). Semiparametric Bayesian analysis of multiple event time data.

    Journal of the American Statistical Association, 88, 979-983.

    Sinha, D. and Dey, D.K. (1997). Semiparametric Bayesian analysis of survival data.
    Journal of the American Statistical Association, 92, 1195-1212.

    Sorensen, D. and Gianola, D. (2002). Likelihood, Bayesian, and MCMC Methods in
    Quantitative Genetics. New York: Springer-Verlag.

    Spiegelhalter, D.J., Thomas, A., Best, N.G., Gilks, W.R. and Lunn, D. (2004). BUGS:
    Bayesian Inference Using Gibbs Sampling. MRC Biostatistics Unit, Cambridge,
    England.

    Tanner, M.A. and Wong, W.H. (1987). The calculation of posterior distributions by data
    augmentation (with discussion). Journal of the American Statistical Association, 82,
    528-550.

    Vaupel, J.W., Manton, K.G., and Stallard, E. (1979). The impact of heterogeneity in

    individual frailty on the dynamics of mortality. Demography, 16, 439-454.

    Vu, H.T. (2003). Parametric and semiparametric conditional shared gamma frailty
    models with events before study entry. Communications in Statistics:
    Simulation and Computation, 32(4), 1223-1248.
