interstat.statjournals.net/YEAR/2005/articles/0511002.pdf

Shared Frailty Survival Analysis Using Semiparametric Bayesian Method

Shaban A. Shaban (1,*) and Ayman A. Mostafa (2,**)

(1) Department of Mathematical Statistics,

Institute of Statistical Studies and Research

(2) Department of Technical Accounting General Insurance

Egyptian Insurance Supervisory Authority

28, Talaat Harb St., Cairo P.O. Box: 2545

* [email protected] ** [email protected]


SUMMARY. In survival data analysis, the proportional hazard model was introduced by Cox (1972) to estimate the effects of different covariates on time-to-event data. The proportional hazard model has been used extensively in biomedicine and reliability engineering, and interest in its application in other areas has recently increased. However, the proportional hazard model makes a number of assumptions that may be violated. The objective of this article is to present a Bayesian analysis of survival models with frailty under an additive framework for the hazard function, in contrast to the proportional hazard model. Frailty models in survival analysis deal with unobserved heterogeneity among subjects. The Gibbs sampling technique is used to estimate the posterior quantities of interest, and an illustrative analysis of survival time data is given.

KEY WORDS: Survival Analysis, Regression Models, Additive Survival Analysis, Bayesian Inference, Frailty Models, BUGS


1. Introduction

Survival data is a term used to describe data that measure the time to some event. Statistical models and methods for such data, and for other time-to-event data, are used extensively in many fields, including the biomedical sciences, engineering, the environmental sciences, economics, actuarial science, management and the social sciences.

In survival analysis, the additive, the multiplicative and the general additive-multiplicative hazard models provide the three principal frameworks for studying the association between covariates and survival time. The hazard function (also called the risk or intensity function) of a survival time $T$ associated with a $p$-vector of covariates $x$ is defined as

$$h(t \mid x) = \frac{f(t \mid x)}{1 - F(t \mid x)},$$

where $f(\cdot \mid x)$ and $F(\cdot \mid x)$ are, respectively, the density function and the distribution function of the random variable $T$ conditional on the vector of covariates $x$. The function $S(t \mid x) = 1 - F(t \mid x)$ is called the survival function.

Under the additive hazard model (Lin and Ying, 1994; Beamonte and Bermúdez, 2003), the hazard function takes the form

$$h(t \mid x) = h_0(t) + \beta_0' x, \tag{1.1}$$

Under the multiplicative hazard model (Cox, 1972), the hazard function takes the form

$$h(t \mid x) = h_0(t)\exp(\alpha_0' x), \tag{1.2}$$

and under the class of general additive-multiplicative models (Lin and Ying, 1995; Dunson and Herring, 2004), the hazard function takes the form

$$h(t \mid Z) = g\{\beta_0' R\} + h_0(t)\, h\{\alpha_0' x\}, \tag{1.3}$$

where $h_0(t)$ is an unspecified “baseline hazard function”, $Z = (R', x')'$ is a $p$-vector of covariates and $\theta_0 = (\beta_0', \alpha_0')'$ (say) is a $p$-vector of unknown regression parameters. The covariate $Z$ can be time-dependent, and $g\{\cdot\}$ and $h\{\cdot\}$ are known link functions. Model (1.3) encompasses both (1.1) and (1.2): taking $R = x$, $g\{u\} = u$ and $h\{u\} \equiv 1$ gives (1.1), while taking $g\{u\} \equiv 0$ and $h\{u\} = \exp(u)$ gives (1.2).
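The two special cases of (1.3) can be checked numerically. The sketch below is a minimal illustration that passes the link functions as Python callables; the function name `general_hazard`, the baseline `h0` and all numbers are hypothetical and chosen only for the example.

```python
import math
from typing import Callable, Sequence

Link = Callable[[float], float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u'v."""
    return sum(a * b for a, b in zip(u, v))


def general_hazard(t: float, R: Sequence[float], x: Sequence[float],
                   beta0: Sequence[float], alpha0: Sequence[float],
                   h0: Link, g: Link, h: Link) -> float:
    """Additive-multiplicative hazard (1.3): g{beta0'R} + h0(t) * h{alpha0'x}."""
    return g(dot(beta0, R)) + h0(t) * h(dot(alpha0, x))


def h0(t: float) -> float:
    """Illustrative baseline hazard (hypothetical)."""
    return 0.1 + 0.02 * t


x = [1.0, 0.5]
coef = [0.3, -0.2]
zeros = [0.0, 0.0]

# (1.1): R = x, g = identity, h = 1  ->  h0(t) + beta0'x
additive = general_hazard(2.0, x, x, coef, zeros, h0, lambda u: u, lambda u: 1.0)
# (1.2): g = 0, h = exp  ->  h0(t) * exp(alpha0'x)
cox = general_hazard(2.0, x, x, zeros, coef, h0, lambda u: 0.0, math.exp)

print(additive, cox)
```

Only the choice of `g` and `h` changes between the two calls, which is the sense in which (1.3) nests both models.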

In many applications, such as in the biomedical sciences, the survival time $T$ is often subject to right censoring because some patients may still be alive at the end of the study period. Furthermore, because of the complexity of biological processes, it is desirable not to parameterize $h_0(t)$; therefore, only semiparametric inference has been used for models (1.1), (1.2) and, consequently, (1.3).

To draw semiparametric inference for model (1.2), Cox (1972, 1975) introduced the partial likelihood approach to estimate the regression parameter vector $\alpha_0$. Since the proportional hazards assumptions are often violated, the need for more flexible models motivated the introduction of models (1.1) and (1.3). Model (1.2) has also been approached from a Bayesian perspective; the article by Sinha and Dey (1997) and the book on Bayesian survival analysis by Ibrahim et al. (2001) contain excellent surveys.

Although the additive hazard model (1.1) has been advocated and successfully used by numerous authors (e.g., Buckley, 1984; McKeague and Sasieni, 1994, and references therein), no satisfactory semiparametric method of estimation has been developed, because the partial likelihood approach cannot be used directly to eliminate the nuisance function $h_0(\cdot)$ when estimating $\beta_0$.

The three models (1.1), (1.2) and (1.3) treat the data as if all the individuals in the sample were drawn (conditionally on the vector of covariates) from a single homogeneous population. Frequently, however, there is heterogeneity in the population that the available covariates do not properly explain.

We now consider generalizations of model (1.1) that allow for heterogeneity not explained by the covariates (unobserved individual effects). Such effects are usually referred to as ‘frailty’ in the biomedical sciences. The notion of frailty provides a convenient way to introduce random effects, association and unobserved heterogeneity into models for survival data. In its simplest form, a frailty is an unobserved proportionality factor that modifies the hazard function (1.1). As discussed in Sinha (1993), the idea of frailty, which was introduced by Vaupel et al. (1979), is particularly natural in the context of the proportional hazard model (1.2). In some situations the extra random frailty component of the proportional hazard model is needed only to obtain correct inference on the fixed effects of the covariates, whereas in other cases the distribution of the random subject effect may be of major interest.

In many applications, the study population cannot be assumed to be homogeneous but must be considered a heterogeneous sample, i.e., a mixture of individuals with different hazards. For example, it is often impossible to measure all relevant covariates related to the event of interest, sometimes because the importance of some covariates is still unknown and sometimes for economic reasons. The frailty approach is therefore a statistical modeling concept that aims to account for heterogeneity caused by unmeasured covariates.

This article focuses on the shared frailty model within the additive hazard model (1.1). The shared frailty model is relevant to event times of related individuals, similar members and repeated measurements (parallel data). Individuals in a group (cluster) are assumed to share the same frailty, which is why the model is called the “shared frailty model”. It was introduced by Clayton (1978) and studied extensively in Hougaard (2000). The survival times are assumed to be conditionally independent given the shared (common) frailty. Most research in this area has focused on the multiplicative model (1.2) with frailty, from both the classical approach (e.g., Vu, 2003) and the Bayesian approach (e.g., Clayton, 1991). In contrast, the additive model (1.1) and additive-multiplicative models have received almost no consideration in the Bayesian literature. Notable exceptions are the article by Beamonte and Bermúdez (2003), which proposed an approach for Bayesian inference in an additive Gamma-polygonal hazard model, and the recent article by Dunson and Herring (2003), which focused on variable selection and inference, first in model (1.1) and then in the more general model (1.3). These recent articles did not consider frailty for additive or additive-multiplicative hazard models.

In this article, we propose a hierarchical model in which the shared frailty is assumed to follow a Gamma distribution and the hazard function is given by model (1.1). We take the parametric part of (1.1), $\beta_0' x$, to be an Exponential hazard function specific to each individual but associated with the covariates, which are assumed to be time-independent, through a probabilistic model. This approach is especially suited to semiparametric Bayesian analysis with frailty.

Fully Bayesian computation of the hierarchical model is carried out using simulation techniques, namely Markov chain Monte Carlo (MCMC) algorithms.

Section 2 describes the proposed model. Section 3 introduces some notation and derives the likelihood function. Section 4 specifies the prior distributions and derives the conditional posterior distributions of the model's unknown parameters, and Section 5 illustrates the methodology with a well-known data set. Appendix A gives detailed proofs of some of the results, and Appendix B gives the BUGS code used to obtain the results of the example.

2. The Additive Exponential-Piecewise Linear Hazard Model with the Shared Frailty

We consider multiple event data with $n$ groups (clusters), which may be individual subjects or groups of subjects; the $i$th cluster has $m_i$ individuals and is associated with an unobserved frailty $w_i$, $1 \le i \le n$. The $j$th individual in the $i$th cluster, $1 \le j \le m_i$, is associated with the fixed covariate vector $x_{ij}$. Individuals are assigned to the same cluster because they are related in some way, say by family association or geographic location. Conditional on the frailties $w_i$, the complete survival times are assumed to be independent. For convenience we suppress from now on the subscripts indexing individuals and consider the model

$$h(t \mid w, x) = w\,[h_0(t) + h_1(t \mid x)], \quad t \ge 0, \tag{2.1}$$

where $h(t \mid w, x)$ represents a hazard function that has been modified by the inclusion of a frailty. The frailty random variable $w$ is assumed to be independent of $t$ and $x$ for all clusters and to follow some parametric distribution with unit mean (when the mean is assumed finite), usually a Gamma distribution (Clayton, 1991); the unknown variance of $w$ (say, $\eta$) quantifies the amount of heterogeneity among individuals. That is, we may assume that

$$w_i \mid \eta \sim \mathrm{Ga}(w_i \mid \eta^{-1}, \eta^{-1}), \tag{2.2}$$

that is, given $\eta$, each $w_i$, $i = 1, 2, \ldots, n$, is modeled by a Gamma distribution with shape parameter $\eta^{-1}$ and rate (inverse scale) parameter $\eta^{-1}$, so that $E(w_i) = 1$ and $\mathrm{Var}(w_i) = \eta$. It is important to keep in mind the interpretation of $w$ in expression (2.1). The frailty $w$ measures the random sensitivity of the $i$th cluster to the event of interest after the effect of the covariates has been removed. In other words, if the frailty in (2.1) is greater than one, the individual has a larger than average hazard and is said to be more ‘frail’, and vice versa. This is why, for a frailty with finite mean, we assume unit mean (to ensure identifiability), and we assume that the frailty distributions of individuals at different covariate levels have the same mean but may have different variability.
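Because Gamma parameterizations differ between texts and software, a quick simulation helps confirm the moments implied by (2.2). The sketch below assumes the shape-rate reading Ga(shape = 1/η, rate = 1/η) and uses Python's `random.gammavariate`, which takes a shape and a scale, so the rate 1/η is passed as the scale η. The function name and the value η = 0.5 are illustrative.

```python
import random
import statistics


def sample_frailties(eta: float, n: int, rng: random.Random) -> list[float]:
    """Draw w_1..w_n from Ga(shape=1/eta, rate=1/eta): mean 1, variance eta."""
    shape = 1.0 / eta
    scale = eta  # random.gammavariate expects scale = 1 / rate
    return [rng.gammavariate(shape, scale) for _ in range(n)]


rng = random.Random(2005)
eta = 0.5
w = sample_frailties(eta, 200_000, rng)
w_mean = statistics.fmean(w)
w_var = statistics.pvariance(w, mu=w_mean)
print(f"mean ~ {w_mean:.3f}, variance ~ {w_var:.3f} (eta = {eta})")
```

Passing 1/η as the scale instead would give a frailty mean of 1/η² (4 in this example), which would break the unit-mean identifiability constraint.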

The nonparametric part of model (2.1), $h_0(t)$, is assumed to be a ‘piecewise linear hazard’. An ordinary piecewise constant hazard (see, e.g., Gamerman, 1991), which is an example of a semiparametric hazard specification, has the advantage of being a simple way to obtain a flexible hazard function with simple estimation. Its major disadvantage, on the other hand, is that the hazard is not continuous as a function of time, since there are jumps at the interval end points. To obtain a more flexible shape than the ordinary piecewise Exponential model (Gamerman, 1991), we let the hazard vary linearly within each interval. To construct this model, we first split the time axis into intervals $0 = a_0 < a_1 < \cdots < a_g$, where $g$ is the number of intervals of observation time, chosen so that $a_g > t_{ij}$ for all $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m_i$. The hazard in the interval $I_k = (a_{k-1}, a_k]$ is $\lambda_{00} + (t - a_{k-1})\lambda_{0k}$. Thus the hazard function can be written as

$$h_0(t) = \sum_{k=1}^{g}\left[\lambda_{00} + (t - a_{k-1})\,\lambda_{0k}\right] I(a_{k-1} < t \le a_k). \tag{2.3}$$

The data in each interval are summarized by the indicator of death of the $j$th individual in the $i$th cluster in the $k$th interval, $\delta_{ijk} = \delta_{ij}\, I(a_{k-1} < t_{ij} \le a_k)$, and by the observation time $t_{ijk}$ in that interval, which equals

$$t_{ijk} = \begin{cases} \min(t_{ij}, a_k) - a_{k-1}, & \text{if } t_{ij} > a_{k-1}, \\ 0, & \text{if } t_{ij} \le a_{k-1}. \end{cases}$$
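The piecewise linear baseline (2.3) and the per-interval summaries $\delta_{ijk}$ and $t_{ijk}$ can be computed as follows. This is a sketch under the interval-by-interval reading of (2.3), in which the slope term restarts at each $a_{k-1}$; the function names, cut points and $\lambda$ values are hypothetical.

```python
import bisect


def baseline_hazard(t: float, a: list[float], lam00: float, lam0: list[float]) -> float:
    """h0(t) = lam00 + lam0k * (t - a_{k-1}) for t in I_k = (a_{k-1}, a_k].

    a = [a_0 = 0, a_1, ..., a_g]; lam0[k - 1] stores lambda_0k, k = 1..g.
    """
    if not a[0] < t <= a[-1]:
        raise ValueError("t must lie in (a_0, a_g]")
    k = bisect.bisect_left(a, t)  # index with a[k-1] < t <= a[k]
    return lam00 + lam0[k - 1] * (t - a[k - 1])


def interval_summaries(t_ij: float, delta_ij: int, a: list[float]) -> list[tuple[int, float]]:
    """(delta_ijk, t_ijk) for k = 1..g: death indicator and time observed in I_k."""
    rows = []
    for k in range(1, len(a)):
        exposure = min(t_ij, a[k]) - a[k - 1] if t_ij > a[k - 1] else 0.0
        died_here = int(delta_ij == 1 and a[k - 1] < t_ij <= a[k])
        rows.append((died_here, exposure))
    return rows


a = [0.0, 5.0, 10.0, 20.0]
lam00, lam0 = 0.05, [0.01, 0.0, 0.02]
print(baseline_hazard(12.0, a, lam00, lam0))   # 0.05 + 0.02 * (12 - 10)
print(interval_summaries(12.0, 1, a))          # [(0, 5.0), (0, 5.0), (1, 2.0)]
```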

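In general, the survival function is related to the hazard function through $S(t) = \exp[-H(t)]$, where the integrated hazard function is $H(t) = \int_0^t h(u)\,du = -\ln S(t)$. Given this relationship between the hazard and the survival function, it can be shown that the individual survival function modified by the inclusion of a frailty, given the parameters $w = (w_1, w_2, \ldots, w_n)$, $\lambda_0 = (\lambda_{00}, \lambda_{01}, \ldots, \lambda_{0g})$ and $\theta$, is

$$S(t \mid w, \lambda_0, \theta) = \left[S_0(t \mid \lambda_0)\, S_1(t \mid \theta)\right]^{w}, \tag{2.4}$$

where $S_0(t)$ and $S_1(t)$ are, respectively, the survival functions associated with the piecewise linear hazard function and the Exponential hazard function, i.e.,

$$S_0(t \mid \lambda_0) = \exp\left[-\int_0^t h_0(u)\,du\right] = \exp\left[-\sum_{k=0}^{g} c_k(t)\,\lambda_{0k}\right], \qquad S_1(t \mid \theta) = \int_t^{\infty}\frac{1}{\theta}\exp\left(-\frac{u}{\theta}\right)du = \exp\left(-\frac{t}{\theta}\right),$$

where the $c_k(t)$ are nonnegative statistics; from (2.3), $c_0(t) = t$ and, for $k \ge 1$, $c_k(t) = \tfrac{1}{2}[\min(t, a_k) - a_{k-1}]^2$ if $t > a_{k-1}$ and $c_k(t) = 0$ otherwise. Since the Exponential component has hazard $h_1(t \mid \theta) = 1/\theta$, the density function takes the form

$$f(t \mid w, \lambda_0, \theta) = h(t \mid w, \lambda_0, \theta)\, S(t \mid w, \lambda_0, \theta) = w\,[h_0(t \mid \lambda_0) + h_1(t \mid \theta)]\left[S_0(t \mid \lambda_0)\, S_1(t \mid \theta)\right]^{w}. \tag{2.5}$$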
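A numerical check of (2.4) and (2.5): the density should equal minus the derivative of the survival function. The sketch assumes the interval-wise form of (2.3), so that $\int_0^t h_0 = \lambda_{00} t + \sum_k \lambda_{0k} t_k^2/2$ with $t_k$ the time spent in $I_k$; all names and numbers are illustrative.

```python
import math


def hazard0(t, a, lam00, lam0):
    """Piecewise linear baseline (2.3) on (a_0, a_g]."""
    for k in range(1, len(a)):
        if a[k - 1] < t <= a[k]:
            return lam00 + lam0[k - 1] * (t - a[k - 1])
    raise ValueError("t outside (a_0, a_g]")


def cum_hazard0(t, a, lam00, lam0):
    """Integral of h0 over [0, t]: sum_k c_k(t) * lambda_0k with c_0(t) = t."""
    total = lam00 * t
    for k in range(1, len(a)):
        if t > a[k - 1]:
            t_k = min(t, a[k]) - a[k - 1]
            total += lam0[k - 1] * t_k ** 2 / 2.0
    return total


def survival(t, w, theta, a, lam00, lam0):
    """(2.4): [S0(t) * S1(t)]**w with S1(t) = exp(-t / theta)."""
    return math.exp(-w * (cum_hazard0(t, a, lam00, lam0) + t / theta))


def density(t, w, theta, a, lam00, lam0):
    """(2.5): w * [h0(t) + 1/theta] * S(t)."""
    return w * (hazard0(t, a, lam00, lam0) + 1.0 / theta) * survival(t, w, theta, a, lam00, lam0)


a, lam00, lam0 = [0.0, 5.0, 10.0, 20.0], 0.05, [0.01, 0.0, 0.02]
w, theta, t, eps = 1.3, 8.0, 12.0, 1e-6
args = (w, theta, a, lam00, lam0)
numeric = -(survival(t + eps, *args) - survival(t - eps, *args)) / (2 * eps)
print(numeric, density(t, *args))
```

The central difference is taken away from the cut points, where the interval-wise hazard is smooth.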


We assume that the parameter $\theta$ is specific to each individual in the population but related to the covariates $x$ through a probabilistic model. To facilitate implementation, it is convenient to assume that

$$\log\theta \mid x, \beta, \sigma_\theta^2 \sim N(\log\theta \mid \beta x, \sigma_\theta^2), \tag{2.6}$$

that is, given $x$, the logarithm of the mean, $\log\theta$, is modeled by a Normal distribution with mean $\beta x$, a linear combination of the covariate effects, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ and $x$ is the $N \times p$ matrix with rows $x_1, x_2, \ldots, x_N$, and with variance $\sigma_\theta^2$. The hyperparameters $\beta$ and $\sigma_\theta^2$ are unknown constants common to all individuals in the population. Expression (2.6) is equivalent to saying that, given the hyperparameters, the mean of the Exponential distribution is log-Normally distributed.
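The second stage (2.6) can be simulated directly. The sketch draws $\theta$ for two individuals with the same covariate vector to show that their Exponential hazards $1/\theta$ differ, and checks that the median of $\theta$ is $\exp(\beta x)$. The covariate layout (an intercept plus a treatment indicator) and the values of $\beta$ and $\sigma_\theta$ are hypothetical.

```python
import math
import random
import statistics


def sample_theta(x, beta, sigma_theta, rng):
    """One draw of theta with log(theta) | x ~ N(beta x, sigma_theta^2), as in (2.6)."""
    mean_log = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(rng.gauss(mean_log, sigma_theta))


rng = random.Random(1)
x = [1.0, 1.0]       # hypothetical: intercept, treatment indicator
beta = [2.0, 0.5]    # hypothetical hyperparameter values, so beta x = 2.5
sigma_theta = 0.5

theta_a = sample_theta(x, beta, sigma_theta, rng)
theta_b = sample_theta(x, beta, sigma_theta, rng)
print("Exponential hazards 1/theta:", 1 / theta_a, 1 / theta_b)

draws = [sample_theta(x, beta, sigma_theta, rng) for _ in range(50_000)]
median_theta = statistics.median(draws)
print("median theta:", median_theta, "vs exp(beta x):", math.exp(2.5))
```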

The hierarchical model given by the two stages (2.5) and (2.6) allows complete heterogeneity (‘frailty’) in the population, so two individuals with the same covariate vector need not have identical hazard functions. The first stage (2.5) specifies the parametric model for each individual given its own parameters, which are later estimated, while the second stage (2.6) accounts for cross-sectional (between-subject) heterogeneity in the parameter vector $\theta = \{\theta_{ij}\}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m_i$). Thus $\theta_{ij}$ denotes the mean of the Exponential component of the survival time of the $j$th individual in cluster $i$. The hierarchical representation of the model enables us to use MCMC methodology for the Bayesian analysis of the problem.

3. The Likelihood Specification Using the Counting Process Approach

Suppose that the survival time $T_{ij}$ of the $j$th individual in the $i$th cluster is an absolutely continuous random variable, conditionally independent of a right censoring time $Z_{ij}$ given the covariates $x_{ij}$ and the frailty $w_i$. Let $V_{ij} = \min(T_{ij}, Z_{ij})$ and $\delta_{ij} = I(T_{ij} \le Z_{ij})$ denote, respectively, the time to the end-point and the indicator that the event of interest has occurred. Suppose that $(V_{ij}, \delta_{ij}, x_{ij}, w_i)$ are i.i.d. for $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m_i$, and that the conditional hazard function of $T_{ij}$ given $x_{ij}$ and $w_i$ satisfies the additive Exponential-piecewise linear hazard model.

For subject $j$ in cluster $i$, let $N_{ij}(t) = 1$ if the event has been observed in $[0, t]$ (i.e., $\delta_{ij} = 1$ and $V_{ij} \le t$) and $N_{ij}(t) = 0$ otherwise, and let $Y_{ij}(t) = 1$ if the subject is still at risk at time $t$ and $Y_{ij}(t) = 0$ otherwise. We thus have a set of $N = \sum_{i=1}^{n} m_i$ subjects, and the counting process $\{N_{ij}(t);\ t \ge 0\}$ for the $j$th subject in the $i$th cluster records the number of observed events up to time $t$. Letting $dN_{ij}(t)$ denote the increment of $N_{ij}(t)$ over the small interval $[t, t + dt)$, the likelihood of the data conditional on $w_i$ is proportional to

$$\prod_{i=1}^{n}\prod_{j=1}^{m_i}\left(\prod_{t \ge 0}\left[Y_{ij}(t)\, w_i\{h_0(t \mid \lambda_0) + h_1(t \mid \theta_{ij})\}\right]^{dN_{ij}(t)}\right)\exp\left(-\int_{t \ge 0} Y_{ij}(t)\, w_i\{h_0(t \mid \lambda_0) + h_1(t \mid \theta_{ij})\}\,dt\right). \tag{3.1}$$

Since each $N_{ij}(t)$ is allowed at most one jump, the increments $dN_{ij}(t)$ contribute to the likelihood in the same manner as independent Poisson random variables, even though $dN_{ij}(t) \le 1$ for all $i$, $j$ and $t$.

Suppose that, as described in Section 2, the time axis $[0, \infty)$ is partitioned into $g + 1$ disjoint intervals $I_1, I_2, \ldots, I_{g+1}$, where $I_k = (a_{k-1}, a_k]$ for $k = 1, 2, \ldots, g + 1$, with $a_0 = 0$ and $a_{g+1} = \infty$. In the $k$th interval, given $w_i$, the $j$th subject in the $i$th cluster has hazard $w_i\{h_0(t \mid \lambda_{0k}) + h_1(t \mid \theta_{ij})\}$, $k = 1, 2, \ldots, g_{ij}$, where $g_{ij}$ denotes the number of intervals of the partition for the $j$th subject in the $i$th group. Given the complete data $(T, w)$, where $T = \{t_{ij}: i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m_i\}$ and $w = (w_1, \ldots, w_n)$, the likelihood (3.1) can be re-expressed as

$$\prod_{i=1}^{n}\prod_{j=1}^{m_i}\prod_{k=1}^{g_{ij}}\left(\prod_{t \in (a_{k-1}, a_k]}\left[Y_{ij}(t)\, w_i\{h_0(t \mid \lambda_0) + h_1(t \mid \theta_{ij})\}\right]^{dN_{ijk}}\right)\exp\left(-\int_{t \ge 0} Y_{ij}(t)\, w_i\{h_0(t \mid \lambda_0) + h_1(t \mid \theta_{ij})\}\,dt\right), \tag{3.2}$$

where $dN_{ijk}$ is the change in the counting process of the $j$th subject in the $i$th group in interval $k$. Under the assumption that the risk accumulated within each interval $I_k$ is small, i.e.,

$$\int_{a_{k-1}}^{a_k} Y_{ij}(t)\{h_0(t \mid \lambda_0) + h_1(t \mid \theta_{ij})\}\,dt \approx 0 \quad \text{for all } i, j, k, \tag{3.3}$$

the likelihood contribution across this interval for individuals at risk is approximately

$$\left\{w_i\left[dH_{0k} + \frac{a_k - a_{k-1}}{\theta_{ij}}\right]\right\}^{dN_{ijk}}\exp\left(-w_i\left[dH_{0k} + \frac{a_k - a_{k-1}}{\theta_{ij}}\right]\right), \tag{3.4}$$

where $dH_{0k} = \int_{a_{k-1}}^{a_k} h_0(t)\,dt$ is the usual cumulative baseline intensity for the $k$th interval; under (2.3), $dH_{0k} = \lambda_{00}(a_k - a_{k-1}) + \tfrac{1}{2}\lambda_{0k}(a_k - a_{k-1})^2$.

Hence the likelihood (3.4) is essentially Poisson in form, reflecting the fact that the likelihood may be thought of as generated by independent contributions of many data ‘atoms’, each concerned with the observation of an individual over a very short interval during which the intensity may be regarded as constant and approximately zero (for a review of this point, see Clayton, 1994). Therefore, we replace (3.4) with

$$\prod_{i=1}^{n}\prod_{j=1}^{m_i}\prod_{k:\,Y_{ijk} = 1}\left\{w_i\left[dH_{0k} + \frac{a_k - a_{k-1}}{\theta_{ij}}\right]\right\}^{dN_{ijk}}\exp\left(-w_i\left[dH_{0k} + \frac{a_k - a_{k-1}}{\theta_{ij}}\right]\right), \tag{3.5}$$

where $Y_{ijk} = 1$ if the $j$th subject in the $i$th group is at risk at some time $t \in (a_{k-1}, a_k]$, and $Y_{ijk} = 0$ otherwise.
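The Poisson-form likelihood (3.5) reduces to a sum of Poisson log-kernels, one per at-risk interval. The sketch below assumes the data are stored as hypothetical records (i, j, k, dN_ijk, a_k − a_{k−1}) for the intervals with $Y_{ijk} = 1$, and computes $dH_{0k}$ from the interval-wise form of (2.3). The record layout and function names are illustrative; the factor $1/dN_{ijk}!$ is omitted, as in (3.5), and it equals 1 here because $dN_{ijk} \le 1$.

```python
import math


def interval_baseline(lam00, lam0k, length):
    """dH_0k for a full interval of the given length under (2.3)."""
    return lam00 * length + 0.5 * lam0k * length ** 2


def log_lik_35(records, w, theta, lam00, lam0):
    """Log of (3.5): sum of dN*log(mu) - mu with mu = w_i*(dH_0k + length/theta_ij)."""
    total = 0.0
    for i, j, k, dN, length in records:
        mu = w[i] * (interval_baseline(lam00, lam0[k], length) + length / theta[(i, j)])
        total += dN * math.log(mu) - mu
    return total


# One subject (i=0, j=0) at risk in two intervals of length 5, with an event in the second.
records = [(0, 0, 1, 0, 5.0), (0, 0, 2, 1, 5.0)]
w = {0: 2.0}
theta = {(0, 0): 10.0}
lam0 = {1: 0.02, 2: 0.01}
value = log_lik_35(records, w, theta, 0.1, lam0)
print(value)
```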

4. Prior Distribution

To complete the Bayesian specification of the model, prior distributions must be specified for the parameter vector $\lambda_0$ and for the hyperparameters $\beta$, $\sigma_\theta^2$ and $\eta$. It seems natural to assume independent priors for $\lambda_0$, $(\beta, \sigma_\theta^2)$ and $\eta$. For $\lambda_0 = (\lambda_{00}, \lambda_{01}, \ldots, \lambda_{0g})'$ we assume independent Gamma priors, i.e.,

$$\lambda_{0k} \sim \mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k}), \quad k = 0, 1, \ldots, g, \tag{4.1}$$

where $a_{0k}/b_{0k}$ is the prior expectation of $\lambda_{0k}$ and $a_{0k}/b_{0k}^2$ is its prior variance. With prior independence assumed across intervals,

$$\pi(\lambda_0) = \prod_{k=0}^{g}\mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k}). \tag{4.2}$$

For $(\beta, \sigma_\theta^2)$ we choose the usual Normal-Inverse Gamma conjugate priors, i.e.,

$$\beta \mid \sigma_\theta^2 \sim N_p(\beta \mid m_\theta, \sigma_\theta^2 V_\theta) \tag{4.3}$$

with

$$1/\sigma_\theta^2 \sim \mathrm{Ga}(1/\sigma_\theta^2 \mid a_\theta, b_\theta). \tag{4.4}$$

Finally, we suggest a Gamma prior for $\eta$, i.e.,

$$\eta \sim \mathrm{Ga}(\phi_1, \phi_2), \tag{4.5}$$

where $\phi_1/\phi_2$ is the prior expectation of $\eta$ and $\phi_1/\phi_2^2$ is its prior variance.
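Under the rate parameterization used in (4.1) and (4.5) (prior mean $a/b$, prior variance $a/b^2$), hyperparameters can be chosen from an elicited prior mean and variance by solving these two equations. A minimal sketch; the function name is hypothetical.

```python
def gamma_hyperparameters(mean: float, variance: float) -> tuple[float, float]:
    """Return (a, b) of Ga(a, b) with rate b such that a/b = mean and a/b**2 = variance."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    b = mean / variance
    a = mean * b
    return a, b


a0, b0 = gamma_hyperparameters(2.0, 8.0)
print(a0, b0)  # 0.5 0.25
```

For instance, a vague prior with mean 1 and variance 100 gives $a = b = 0.01$.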

4.1 Data Augmentation and Gibbs Sampler

To obtain tractable conditional posterior distributions, we use the approach of ‘data augmentation’ (Tanner and Wong, 1987). The idea of data augmentation is to augment the observed data with so-called latent (or missing) data in order to exploit the simplicity of the resulting conditional posterior distributions of the parameters of interest. Although this increases the dimensionality of the problem (possibly at the expense of extra computing time), the Gibbs sampler becomes simple, as follows.

First note that, under (3.4), the likelihood is essentially Poisson in form, reflecting the fact that it may be thought of as generated by independent contributions of many data, each concerned with the observation of individual $j$ of cluster $i$ over a very short interval during which the intensity may be regarded as constant, i.e.,

$$P(N_{ijk} = n \mid dH_{0k}, x_{ij}, \theta_{ij}, w_i, Y_{ij} = 1) = \frac{1}{n!}\left\{w_i\left[dH_{0k} + \frac{a_k - a_{k-1}}{\theta_{ij}}\right]\right\}^{n}\exp\left(-w_i\left[dH_{0k} + \frac{a_k - a_{k-1}}{\theta_{ij}}\right]\right). \tag{4.6}$$

Hence we have

$$dN_{ijk} \overset{\text{ind}}{\sim} \mathrm{Poisson}\left[dN_{ijk} \mid w_i\left(dH_{0k} + \frac{a_k - a_{k-1}}{\theta_{ij}}\right)\right]. \tag{4.7}$$

Since the additive form of the Poisson mean does not lead to a conditional posterior distribution in closed form, we re-express (4.7) in an augmented form involving independent Poisson latent variables (unobserved or missing data), one for each term in the Poisson mean. In particular, we assume

$$dN_{ijk} = dN_{ijk0} + dN_{ijk2} + dN_{ijk1}, \quad \text{for all } i, j \text{ with } Y_{ij} = 1, \tag{4.8}$$

such that

$$dN_{ijk0} \sim \mathrm{Poisson}[dN_{ijk0} \mid w_i\lambda_{00}(a_k - a_{k-1})],$$

$$dN_{ijk2} \sim \mathrm{Poisson}\left[dN_{ijk2} \mid \tfrac{w_i}{2}\lambda_{0k}(a_k - a_{k-1})^2\right], \quad k = 1, 2, \ldots, g,$$

$$dN_{ijk1} \sim \mathrm{Poisson}\left[dN_{ijk1} \mid \tfrac{w_i}{\theta_{ij}}(a_k - a_{k-1})\right].$$

Using the property that a sum of independent Poisson random variables is also Poisson, and noting that the three means add up to $w_i[dH_{0k} + (a_k - a_{k-1})/\theta_{ij}]$ because $dH_{0k} = \lambda_{00}(a_k - a_{k-1}) + \tfrac{1}{2}\lambda_{0k}(a_k - a_{k-1})^2$ under (2.3), it is straightforward to show that (4.8) is equivalent to (4.7). Such an expression allows us to take advantage of Poisson-Gamma conjugacy to obtain simple conditional posteriors wherever possible. Some of the derivations of these conditional distributions are outlined in Appendix A. The sampler iterates through the following steps:

Step 1. Sample the latent variables $(dN_{ijk0}, dN_{ijk2}, dN_{ijk1})'$, for all $i, j, k$ with $Y_{ij} = 1$, jointly from their full conditional posterior distribution (A.3) as follows:

1. If $dN_{ijk} = 0$, then let $dN_{ijk0} = dN_{ijk2} = dN_{ijk1} = 0$.

2. If $dN_{ijk} > 0$, then sample $(dN_{ijk0}, dN_{ijk2}, dN_{ijk1})$ from $\mathrm{Multinomial}(dN_{ijk} \mid P_{ijk0}, P_{ijk2}, P_{ijk1})$, where

$$P_{ijk0} = \frac{\lambda_{00}(a_k - a_{k-1})}{D_{ijk}}, \qquad P_{ijk2} = \frac{\tfrac{1}{2}\lambda_{0k}(a_k - a_{k-1})^2}{D_{ijk}}, \qquad P_{ijk1} = \frac{(a_k - a_{k-1})/\theta_{ij}}{D_{ijk}},$$

with $D_{ijk} = \lambda_{00}(a_k - a_{k-1}) + \tfrac{1}{2}\lambda_{0k}(a_k - a_{k-1})^2 + (a_k - a_{k-1})/\theta_{ij}$ (the common factor $w_i$ of the three Poisson means cancels).

Step 2. Sample $\lambda_{00}$ from expression (A.4).

Step 3. Sample $\lambda_{0k}$, $k = 1, 2, \ldots, g_{ij}$, from expression (A.5).

Step 4. Sample $w_i$, $i = 1, 2, \ldots, n$, from expression (A.6).

Step 5. Sample $\sigma_\theta^2$ and then $\beta \mid \sigma_\theta^2$ from expressions (A.7) and (A.8), respectively.
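Steps 1 to 3 involve only multinomial and Gamma draws. The sketch below assumes each at-risk cell is stored as a small dict with hypothetical keys `dN0`, `dN2`, `w` and `length`; the function names are also hypothetical. It uses `random.choices` for the multinomial allocation (one categorical draw per event) and `random.gammavariate` with scale = 1/rate for (A.4) and (A.5). For brevity the same cells are reused for both updates, whereas (A.5) would use only the at-risk cells of interval $k$.

```python
import random


def allocation_probs(lam00, lam0k, length, theta):
    """(P_ijk0, P_ijk2, P_ijk1) from Step 1; the common factor w_i cancels."""
    parts = (lam00 * length, 0.5 * lam0k * length ** 2, length / theta)
    total = sum(parts)
    return tuple(p / total for p in parts)


def split_events(dN, probs, rng):
    """Step 1: (dN_ijk0, dN_ijk2, dN_ijk1) ~ Multinomial(dN; probs); zeros when dN = 0."""
    draws = rng.choices((0, 1, 2), weights=probs, k=dN)
    return tuple(draws.count(c) for c in (0, 1, 2))


def lam00_posterior(a00, b00, cells):
    """(A.4): Gamma shape and rate for lambda_00."""
    shape = a00 + sum(c["dN0"] for c in cells)
    rate = b00 + sum(c["w"] * c["length"] for c in cells)
    return shape, rate


def lam0k_posterior(a0k, b0k, cells_k):
    """(A.5): Gamma shape and rate for lambda_0k from the at-risk cells of interval k."""
    shape = a0k + sum(c["dN2"] for c in cells_k)
    rate = b0k + sum(0.5 * c["w"] * c["length"] ** 2 for c in cells_k)
    return shape, rate


rng = random.Random(7)
probs = allocation_probs(lam00=0.1, lam0k=0.02, length=5.0, theta=10.0)
print(probs)                        # approximately (0.4, 0.2, 0.4)
print(split_events(3, probs, rng))  # a random split of 3 events

cells = [{"dN0": 1, "dN2": 0, "w": 2.0, "length": 5.0},
         {"dN0": 0, "dN2": 1, "w": 1.0, "length": 5.0}]
shape, rate = lam00_posterior(1.0, 1.0, cells)
lam00_draw = rng.gammavariate(shape, 1.0 / rate)   # scale = 1 / rate
shape_k, rate_k = lam0k_posterior(1.0, 2.0, cells)
lam0k_draw = rng.gammavariate(shape_k, 1.0 / rate_k)
print(shape, rate, shape_k, rate_k)
```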

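The other conditionals do not have a conjugate form. For each $j = 1, 2, \ldots, m_i$ and $i = 1, 2, \ldots, n$, the conditional distribution of $\theta_{ij}$ is proportional to

$$\left[Y_{ij}(t)\, w_i\, h(t \mid \lambda_0, \theta_{ij})\right]^{dN_{ijk}} S(t \mid \lambda_0, \theta_{ij})\, f_\theta(\theta_{ij} \mid x_{ij}, \beta, \sigma_\theta^2), \quad \text{for all } k = 1, 2, \ldots, g_{ij}. \tag{4.9}$$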

Expression (4.9) does not have a closed form, but it is still possible to sample from it using a Metropolis algorithm.
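A random-walk Metropolis update for $\theta_{ij}$ can be run on $\phi = \log\theta_{ij}$, for which the prior (2.6) is Normal and no Jacobian term is needed. As a simplifying assumption, the sketch uses the Poisson-form contributions of (3.5) for the likelihood part of (4.9) rather than the continuous-time expression. Here `cells` lists hypothetical tuples (dN_ijk, dH_0k, a_k − a_{k−1}) for the at-risk intervals of one subject; all names and tuning values are illustrative.

```python
import math
import random


def log_target(phi, cells, w_i, mean_log, sigma_theta):
    """Log full conditional of phi = log(theta_ij), up to a constant."""
    theta = math.exp(phi)
    value = -0.5 * ((phi - mean_log) / sigma_theta) ** 2  # Normal prior (2.6) on log(theta)
    for dN, dH0k, length in cells:
        mu = w_i * (dH0k + length / theta)
        value += dN * math.log(mu) - mu
    return value


def metropolis_step(phi, cells, w_i, mean_log, sigma_theta, step, rng):
    """One random-walk Metropolis update with a symmetric Gaussian proposal."""
    proposal = phi + rng.gauss(0.0, step)
    log_ratio = (log_target(proposal, cells, w_i, mean_log, sigma_theta)
                 - log_target(phi, cells, w_i, mean_log, sigma_theta))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return phi


rng = random.Random(11)
# Sanity check: with no data the chain should sample the N(mean_log, sigma^2) prior.
phi, chain = 0.0, []
for _ in range(50_000):
    phi = metropolis_step(phi, [], 1.0, 2.5, 0.5, 1.0, rng)
    chain.append(phi)
prior_mean = sum(chain[1000:]) / len(chain[1000:])
print(prior_mean)  # close to 2.5

# With data: one event in one interval of length 5 (dH_0k = 0.75), frailty 1.2.
phi = 2.5
for _ in range(2_000):
    phi = metropolis_step(phi, [(1, 0.75, 5.0)], 1.2, 2.5, 0.5, 0.5, rng)
print(math.exp(phi))
```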

Finally, letting $\xi = \eta^{-1}$, the full conditional distribution of $\xi$ does not have a closed form either. It is proportional to

$$\frac{\xi^{n\xi}}{[\Gamma(\xi)]^{n}}\left(\prod_{i=1}^{n} w_i\right)^{\xi - 1}\exp\left(-\xi\sum_{i=1}^{n} w_i\right) f(\xi), \tag{4.10}$$

with

$$\xi \sim \mathrm{Ga}(\xi \mid \phi_1, \phi_2). \tag{4.11}$$

With this choice of prior, it can be shown that the above full conditional density is log-concave. Thus we can use the adaptive rejection algorithm of Gilks and Wild (1992) to sample from this full conditional.
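The log-concavity claim can be checked numerically. In the logarithm of (4.10) times the Ga($\phi_1$, $\phi_2$) prior (4.11), the terms $n\xi\log\xi - n\log\Gamma(\xi)$ are concave because the trigamma function satisfies $\psi'(\xi) > 1/\xi$, the terms in $\xi\sum\log w_i$ and $\xi\sum w_i$ are linear, and $(\phi_1 - 1)\log\xi$ is concave when $\phi_1 \ge 1$; so $\phi_1 \ge 1$ is a sufficient condition. The sketch evaluates the log density and checks second differences on a grid. It does not implement the Gilks and Wild sampler itself, and the frailty values and hyperparameters are illustrative.

```python
import math


def log_xi_conditional(xi, w, phi1, phi2):
    """Log of (4.10) times the Ga(phi1, phi2) prior (4.11), up to an additive constant."""
    n = len(w)
    return (n * xi * math.log(xi) - n * math.lgamma(xi)
            + (xi - 1.0) * sum(math.log(v) for v in w)
            - xi * sum(w)
            + (phi1 - 1.0) * math.log(xi) - phi2 * xi)


w = [0.6, 1.4, 0.9, 1.1, 0.8, 1.3, 1.0, 0.7, 1.2, 1.0]   # illustrative frailties
phi1, phi2, h = 2.0, 1.0, 1e-3
grid = [0.2 + 0.1 * i for i in range(199)]                 # xi from 0.2 to 20.0
second_diffs = [log_xi_conditional(x + h, w, phi1, phi2)
                - 2.0 * log_xi_conditional(x, w, phi1, phi2)
                + log_xi_conditional(x - h, w, phi1, phi2) for x in grid]
print(max(second_diffs) < 0)  # True: log-concave on the grid
```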

5. Application

Here we demonstrate the method using the well-known leukemia data analyzed by Cox (1972), Hougaard (2000, subsection 1.5.4), Ibrahim et al. (2001, example 3.4) and Spiegelhalter et al. (2004), among others. The data, listed in Table 1 as reported by Hougaard (2000), consist of 21 matched pairs of leukemia patients. The random variable of interest is the remission time (in weeks) of patients assigned to treatment with a drug (6-MP) or a placebo during remission maintenance therapy. The patients were matched according to center and remission status, either partial or complete, and in each pair one patient received 6-MP and the other received placebo. These data have been used in many articles, but most of them neglect the pairing. The aim is to estimate the effect of the treatment, and the corresponding covariate is the treatment type within each matched pair. In the analysis below, we used the program BUGS.

TABLE 1: Leukemia Remission Time Data (weeks)

Pair  Status  Placebo  6-MP
 1    P         1      10
 2    C        22       7
 3    C         3      32+
 4    C        12      23
 5    C         8      22
 6    P        17       6
 7    C         2      16
 8    C        11      34+
 9    C         8      32+
10    C        12      25+
11    C         2      11+
12    P         5      20+
13    C         4      19+
14    C        15       6
15    C         8      17+
16    P        23      35+
17    P         5       6
18    C        11      13
19    C         4       9+
20    C         1       6+
21    C         8      10+

Status: P = partial remission, C = complete remission; + denotes a censored remission time.

Source. Hougaard (2000, page 15)
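For use with the counting-process notation of Section 3, Table 1 can be transcribed into one record per patient, with each matched pair treated as a cluster of size $m_i = 2$. In the sketch below the treatment coding (x = 1 for 6-MP, x = 0 for placebo) is an assumption and not necessarily the coding used in the authors' BUGS code; the names are hypothetical.

```python
# Table 1 as (status, placebo time, 6-MP time); "+" marks a censored remission time.
PAIRS = [
    ("P", "1", "10"), ("C", "22", "7"), ("C", "3", "32+"), ("C", "12", "23"),
    ("C", "8", "22"), ("P", "17", "6"), ("C", "2", "16"), ("C", "11", "34+"),
    ("C", "8", "32+"), ("C", "12", "25+"), ("C", "2", "11+"), ("P", "5", "20+"),
    ("C", "4", "19+"), ("C", "15", "6"), ("C", "8", "17+"), ("P", "23", "35+"),
    ("P", "5", "6"), ("C", "11", "13"), ("C", "4", "9+"), ("C", "1", "6+"),
    ("C", "8", "10+"),
]


def parse_time(entry: str) -> tuple[float, int]:
    """'32+' -> (32.0, 0) censored; '7' -> (7.0, 1) relapse observed."""
    censored = entry.endswith("+")
    return float(entry.rstrip("+")), 0 if censored else 1


def to_records(pairs):
    """One record per patient: (cluster i, treatment x, time t_ij, delta_ij)."""
    records = []
    for i, (_status, placebo, drug) in enumerate(pairs):
        for x, entry in ((0, placebo), (1, drug)):  # hypothetical coding: x = 1 for 6-MP
            t, delta = parse_time(entry)
            records.append((i, x, t, delta))
    return records


records = to_records(PAIRS)
events = {x: sum(d for _, arm, _, d in records if arm == x) for x in (0, 1)}
print(len(records), events)  # 42 {0: 21, 1: 9}
```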

Given the model assumptions, BUGS performs the Gibbs sampler by simulating from the full conditional distributions. The code that specifies this model and obtains the posterior distributions of the parameters is given in Appendix B. The Bayesian estimators were obtained through the Gibbs sampling scheme described in the previous section. We ran 10,000 iterations of the algorithm and discarded the first 500 iterations as burn-in. Spiegelhalter et al. (2004), the BUGS team, use parallel multiple chains to check the convergence of the Gibbs sampler and recommend using 2 to 5 chains. As mentioned by the BUGS team, fully quantitative monitoring of parallel multiple chains was first proposed by Gelman and Rubin (1992a, b). The chains should start from over-dispersed initial values to ensure good coverage of the parameter space. To generate the Gibbs posterior samples, we chose to use two parallel chains, and convergence was monitored via the Brooks and Gelman (1998) convergence-diagnostic graph. Once convergence had been achieved, 10,000 observations were taken from each chain after the burn-in period, giving a total of 20,000 observations. Inspection of the Brooks and Gelman diagnostic graphs (Figures 1a and 2a) shows the BGR (Brooks and Gelman ratio) converging to one, indicating convergence for the regression coefficient $\beta$ and for the standard deviation of the frailty, $\sigma_b$.
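The idea behind the BGR diagnostic can be illustrated with the basic potential scale reduction factor of Gelman and Rubin (1992), which compares between-chain and within-chain variability; values close to one suggest convergence. BUGS plots the Brooks and Gelman (1998) version over iterations, so this sketch is a simplified relative of that graph, not a reproduction of it. The chains here are synthetic.

```python
import math
import random
import statistics


def psrf(chains: list[list[float]]) -> float:
    """Gelman-Rubin potential scale reduction factor for m chains of equal length n."""
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    between = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    within = statistics.fmean([statistics.variance(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(3)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(2)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(5000)] for mu in (0.0, 3.0)]
r_mixed, r_stuck = psrf(mixed), psrf(stuck)
print(round(r_mixed, 3), round(r_stuck, 3))
```

Two chains exploring the same distribution give a value near one, while chains stuck in different regions give a value well above one.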

Figure 1. Diagnostics related to β. Panels: (a) Brooks & Gelman convergence diagnostics (chains 1:2, iterations 501 to 10000); (b) trace plot of β for each chain; (c) ACF of the iterations for each chain (lags 0 to 40); (d) history plot of β for each chain.

Figure 2. Diagnostics related to σ_b. Panels: (a) Brooks & Gelman convergence diagnostics (chains 1:2, iterations 501 to 10000); (b) trace plot of σ_b for each chain; (c) ACF of the iterations for each chain (lags 0 to 40).

For each of the two chains, BUGS plots the sampled parameter values as a function of the iteration number (Figures 1b and 2b). Additionally, BUGS provides a graph of the autocorrelation function (ACF) of the iterations up to lag 50 for each chain separately (Figures 1c and 2c). The autocorrelation plot in Figure 2c illustrates the dependence between successive observations, which appears to die out well before lag 40. This indicates fairly rapid mixing and thus good coverage of the parameter space with a reasonably small number of iterations. As a rule of thumb, high autocorrelations indicate that longer runs are needed to remove the dependence structure; however, from Figures 1b and 1d we can be reasonably confident that convergence of β has been achieved (the two chains overlap one another), and thus the convergence looks reasonable.
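The ACF panels can be reproduced from saved draws with the usual sample autocorrelation estimator. The sketch applies it to a synthetic AR(1) chain with coefficient 0.9, whose theoretical autocorrelation at lag k is 0.9^k; the chain and parameter values are illustrative, and the lag limit of 50 mirrors the BUGS plots described above.

```python
import random


def acf(draws: list[float], max_lag: int) -> list[float]:
    """Sample autocorrelations r_0..r_max_lag of an MCMC trace."""
    n = len(draws)
    mean = sum(draws) / n
    centred = [v - mean for v in draws]
    denom = sum(c * c for c in centred)
    return [sum(centred[t] * centred[t + lag] for t in range(n - lag)) / denom
            for lag in range(max_lag + 1)]


rng = random.Random(5)
rho, value, chain = 0.9, 0.0, []
for _ in range(20_000):
    value = rho * value + rng.gauss(0.0, 1.0)
    chain.append(value)
r = acf(chain, 50)
print([round(v, 2) for v in r[:3]], round(r[40], 3))
```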

For each node in the model, a similar set of graphs is produced to monitor convergence and independence. These graphs are omitted here for reasons of space.

Once one is satisfied with the ACF and convergence graphs, at least for the parameters of interest, and, most importantly, with the convergence of all model parameters, a Gibbs sample of size 20,000 is drawn for each parameter. In Table 2, the 2.5% and 97.5% columns give the respective posterior percentiles of β and σ_b.

TABLE 2. Posterior summaries of β and σ_b

Parameter   Mean     SD       2.5%     Median    97.5%
β           −1.54    0.6604   −3.02    −1.473    −0.4582
σ_b          1.865   0.2648    1.427    1.838     2.461

The 95% credible interval for β is thus (−3.02, −0.4582), so almost all of the posterior mass of β lies to the left of zero, indicating that the 6-MP drug has a significant effect compared with placebo. This can be further illustrated by a plot of the marginal posterior density of β, shown in Figure 3.

Figure 3. Estimated marginal posterior density of β (chains 1:2 pooled; 20,000 samples).

Figure 4 shows the type of inference about the survival probabilities of the two groups, separately and simultaneously, that can be obtained from the full posterior samples.

Figure 4. Posterior mean and central 95% limits for survival probabilities. Panels: the survival probability for the treatment group; the survival probability for the placebo group; the survival probabilities for the treatment and placebo groups simultaneously. Each panel plots S(t) against time (weeks), from 0 to 40.
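Pointwise posterior means and central 95% limits, as plotted in Figure 4, can be obtained from posterior draws of S(t) evaluated on a common time grid. The sketch assumes the draws are already available as lists (one curve per MCMC iteration); the toy Exponential curves below are illustrative only and are not the fitted model.

```python
import math
import random
import statistics


def survival_band(curves: list[list[float]]) -> list[tuple[float, float, float]]:
    """For each grid point: (posterior mean, 2.5% quantile, 97.5% quantile)."""
    band = []
    for values in zip(*curves):
        cuts = statistics.quantiles(values, n=40, method="inclusive")  # cut points 1/40..39/40
        band.append((statistics.fmean(values), cuts[0], cuts[-1]))
    return band


rng = random.Random(9)
grid = [0.0, 10.0, 20.0, 30.0, 40.0]  # weeks
theta_draws = [math.exp(rng.gauss(3.0, 0.3)) for _ in range(2000)]  # toy posterior draws
curves = [[math.exp(-t / th) for t in grid] for th in theta_draws]
band = survival_band(curves)
for t, (mean, lo, hi) in zip(grid, band):
    print(f"t = {t:4.0f}: S = {mean:.3f} ({lo:.3f}, {hi:.3f})")
```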

6. Conclusion

The present article makes the additive hazard model a practical alternative to the proportional hazard model. When the proportional hazard assumptions are violated, alternative models such as the one studied here are needed. The additive hazard model, formulated through the counting process approach as an alternative to the proportional hazard model, is extended here to deal with unobserved heterogeneity among the individuals in the study. A Bayesian analysis of survival models with frailty, especially with a proportional hazard model (PHM) or an additive hazard model (AHM) for the hazard function, was impossible only a few years ago. Nowadays, with the great increase in computing power, the analysis of such models is advisable in survival analysis, because they can better explain the relationship between the lifetime random variable and the diagnostic factors (covariates). Spiegelhalter et al. (2004), the BUGS team, analyzed the data set used in Section 5 assuming that the proportional hazard assumptions hold. For the purpose of illustration, we assume instead that one of the proportional hazard assumptions is violated for this data set. Hence we have developed BUGS code, based on the code available from the BUGS team, to implement the algorithm described in Section 4.

APPENDIX A. Derivation of Conditional Posteriors

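The joint posterior density of the parameters $(\lambda_0, \theta, \beta, \sigma_\theta^2, w, \eta)$ and the latent variables $(dN_{ijk0}, dN_{ijk2}, dN_{ijk1})$ is proportional to

$$\begin{aligned}
&\prod_{i=1}^{n}\prod_{j=1}^{m_i}\prod_{k=1}^{g_{ij}}\Big\{I(dN_{ijk} = dN_{ijk0} + dN_{ijk2} + dN_{ijk1})\,\mathrm{Poisson}[dN_{ijk0} \mid w_i\lambda_{00}(a_k - a_{k-1})] \\
&\quad\times \mathrm{Poisson}\left[dN_{ijk2} \mid \tfrac{w_i}{2}\lambda_{0k}(a_k - a_{k-1})^2\right]\mathrm{Poisson}\left[dN_{ijk1} \mid \tfrac{w_i}{\theta_{ij}}(a_k - a_{k-1})\right] \\
&\quad\times N(\log\theta_{ij} \mid \beta x_{ij}, \sigma_\theta^2)\,\mathrm{Ga}(\lambda_{0k} \mid a_{0k}, b_{0k})\,\mathrm{Ga}(w_i \mid \eta^{-1}, \eta^{-1})\Big\} \\
&\times \mathrm{Ga}(\eta \mid \phi_1, \phi_2)\,\mathrm{Ga}(\lambda_{00} \mid a_{00}, b_{00})\,N(\beta \mid m_\theta, \sigma_\theta^2 V_\theta)\,\mathrm{Ga}(1/\sigma_\theta^2 \mid a_\theta, b_\theta).
\end{aligned} \tag{A.1}$$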

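Step 1. It follows from expression (A.1) that the full conditional distribution of the latent variables is proportional to the following (the exponential factors of the three Poisson densities do not involve the latent variables):

$$I(dN_{ijk} = dN_{ijk0} + dN_{ijk2} + dN_{ijk1})\,\frac{[w_i\lambda_{00}(a_k - a_{k-1})]^{dN_{ijk0}}}{dN_{ijk0}!}\,\frac{[\tfrac{1}{2}w_i\lambda_{0k}(a_k - a_{k-1})^2]^{dN_{ijk2}}}{dN_{ijk2}!}\,\frac{[(w_i/\theta_{ij})(a_k - a_{k-1})]^{dN_{ijk1}}}{dN_{ijk1}!}. \tag{A.2}$$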

On the other hand, given (A.1), (A.2) is also proportional to

$$
\frac{dN_{ijk}!}{\big[w_i\lambda_{00}(a_k-a_{k-1}) + \tfrac{w_i}{2}\lambda_{0k}(a_k-a_{k-1})^2 + \tfrac{w_i}{\theta_{ij}}(a_k-a_{k-1})\big]^{dN_{ijk}}}
\times\frac{[w_i\lambda_{00}(a_k-a_{k-1})]^{dN_{ijk0}}}{dN_{ijk0}!}
\times\frac{[\tfrac{w_i}{2}\lambda_{0k}(a_k-a_{k-1})^2]^{dN_{ijk2}}}{dN_{ijk2}!}
\times\frac{[\tfrac{w_i}{\theta_{ij}}(a_k-a_{k-1})]^{dN_{ijk1}}}{dN_{ijk1}!}
$$

$$
\begin{aligned}
&\propto \frac{dN_{ijk}!}{dN_{ijk0}!\,dN_{ijk2}!\,dN_{ijk1}!}
\left[\frac{\lambda_{00}(a_k-a_{k-1})}{\lambda_{00}(a_k-a_{k-1}) + \tfrac{\lambda_{0k}}{2}(a_k-a_{k-1})^2 + \theta_{ij}^{-1}(a_k-a_{k-1})}\right]^{dN_{ijk0}} \\
&\qquad\times\left[\frac{\tfrac{\lambda_{0k}}{2}(a_k-a_{k-1})^2}{\lambda_{00}(a_k-a_{k-1}) + \tfrac{\lambda_{0k}}{2}(a_k-a_{k-1})^2 + \theta_{ij}^{-1}(a_k-a_{k-1})}\right]^{dN_{ijk2}}
\left[\frac{\theta_{ij}^{-1}(a_k-a_{k-1})}{\lambda_{00}(a_k-a_{k-1}) + \tfrac{\lambda_{0k}}{2}(a_k-a_{k-1})^2 + \theta_{ij}^{-1}(a_k-a_{k-1})}\right]^{dN_{ijk1}} \\
&= \frac{dN_{ijk}!}{dN_{ijk0}!\,dN_{ijk2}!\,dN_{ijk1}!}\,P_{ijk0}^{dN_{ijk0}}P_{ijk2}^{dN_{ijk2}}P_{ijk1}^{dN_{ijk1}}
\propto \mathrm{Multinomial}\big(\{dN_{ijk0}, dN_{ijk2}, dN_{ijk1}\} \mid dN_{ijk}, \{P_{ijk0}, P_{ijk2}, P_{ijk1}\}\big),
\end{aligned}
\tag{A.3}
$$

where $P_{ijk0}$, $P_{ijk2}$ and $P_{ijk1}$ are defined in subsection 4.1.
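Step 1 of the Gibbs sampler therefore redistributes each observed count $dN_{ijk}$ among its three sources. A minimal standard-library Python sketch of this draw follows; the function name and argument layout are illustrative assumptions. The weights are the unnormalized numerators of $P_{ijk0}$, $P_{ijk2}$ and $P_{ijk1}$ in (A.3) (the frailty $w_i$ cancels), and the multinomial is sampled as $dN_{ijk}$ independent categorical draws, which has the same distribution.

```python
import random


def split_count(dN_ijk, lam00, lam0k, theta_ij, a_prev, a_k, rng=random):
    """Draw (dN_ijk0, dN_ijk2, dN_ijk1) ~ Multinomial(dN_ijk, {P_ijk0, P_ijk2, P_ijk1}).

    Hypothetical helper for Step 1; a_prev stands for a_{k-1}.
    """
    width = a_k - a_prev
    weights = [
        lam00 * width,             # numerator of P_ijk0
        0.5 * lam0k * width ** 2,  # numerator of P_ijk2
        width / theta_ij,          # numerator of P_ijk1
    ]
    # random.choices normalizes the weights; dN_ijk categorical draws = one multinomial draw
    draws = rng.choices([0, 1, 2], weights=weights, k=dN_ijk)
    return draws.count(0), draws.count(1), draws.count(2)


rng = random.Random(2005)
print(split_count(5, 0.2, 0.1, 4.0, 1.0, 3.0, rng=rng))  # three counts that always sum to 5
```

For large counts a dedicated multinomial sampler (for example, sequential binomial draws) would be faster, but the categorical loop keeps the sketch dependency-free.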

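Step 2. The full conditional distribution of $\lambda_{00}$ is proportional to

$$
\begin{aligned}
&\prod_{i,j,k:\,Y_{ijk}=1}\left\{\frac{[w_i\lambda_{00}(a_k-a_{k-1})]^{dN_{ijk0}}}{dN_{ijk0}!}\exp\big[-w_i\lambda_{00}(a_k-a_{k-1})\big]\right\}
\times \lambda_{00}^{a_{00}-1}\exp(-b_{00}\lambda_{00}) \\
&\propto \lambda_{00}^{\sum_{i,j,k} dN_{ijk0}}\exp\Big[-\lambda_{00}\sum_{i,j,k} Y_{ijk}\,w_i(a_k-a_{k-1})\Big]
\times \lambda_{00}^{a_{00}-1}\exp(-b_{00}\lambda_{00}) \\
&\propto \mathrm{Ga}\Big(\lambda_{00} \,\Big|\, a_{00} + \sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk0},\;
b_{00} + \sum_{i=1}^{n}\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} Y_{ijk}\,w_i(a_k-a_{k-1})\Big).
\end{aligned}
\tag{A.4}
$$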
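Because (A.4) is a standard Gamma distribution, Step 2 is a direct draw. In the Python sketch below, which illustrates the update under an assumed data layout and is not the authors' code, each at-risk cell is a hypothetical tuple (i, a_prev, a_k, dN_ijk0) and w is a list of current frailties indexed by cluster. Note that `random.gammavariate` takes a shape and a scale, so the rate in (A.4) has to be inverted.

```python
import random


def lambda00_params(cells, w, a00, b00):
    """Shape and rate of the Gamma full conditional (A.4) of lambda_00.

    cells: (i, a_prev, a_k, dN_ijk0) for every (i, j, k) with Y_ijk = 1 (hypothetical layout);
    w: current frailties w_i, indexed by cluster i.
    """
    shape, rate = a00, b00
    for i, a_prev, a_k, dN0 in cells:
        shape += dN0                   # a_00 + sum of dN_ijk0
        rate += w[i] * (a_k - a_prev)  # b_00 + sum of Y_ijk * w_i * (a_k - a_{k-1})
    return shape, rate


def draw_lambda00(cells, w, a00, b00, rng=random):
    """One Gibbs draw of lambda_00 from (A.4)."""
    shape, rate = lambda00_params(cells, w, a00, b00)
    # random.gammavariate(alpha, beta) uses beta as a SCALE, i.e. 1 / rate
    return rng.gammavariate(shape, 1.0 / rate)
```

For instance, with the hypothetical cells [(0, 0.0, 1.0, 1), (0, 1.0, 2.5, 0), (1, 0.0, 1.0, 2)], frailties w = [1.0, 2.0] and $a_{00} = b_{00} = 1$, `lambda00_params` returns shape 4 and rate 5.5.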

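Step 3. The full conditional distribution of $\lambda_{0k}$, $k = 1, 2, \ldots, g_{ij}$, is proportional to

$$
\begin{aligned}
&\prod_{i,j:\,Y_{ijk}=1}\Big[\tfrac{w_i}{2}\lambda_{0k}(a_k-a_{k-1})^2\Big]^{dN_{ijk2}}\exp\Big[-\tfrac{w_i}{2}\lambda_{0k}(a_k-a_{k-1})^2\Big]
\times \lambda_{0k}^{a_{0k}-1}\exp(-b_{0k}\lambda_{0k}) \\
&\propto \lambda_{0k}^{\sum_{i=1}^{n}\sum_{j=1}^{m_i} dN_{ijk2}}\exp\Big[-\lambda_{0k}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\tfrac{w_i}{2}Y_{ijk}(a_k-a_{k-1})^2\Big]
\times \lambda_{0k}^{a_{0k}-1}\exp(-b_{0k}\lambda_{0k}) \\
&\propto \mathrm{Ga}\Big(\lambda_{0k} \,\Big|\, a_{0k} + \sum_{i,j} dN_{ijk2},\; b_{0k} + \sum_{i,j}\tfrac{w_i}{2}Y_{ijk}(a_k-a_{k-1})^2\Big).
\end{aligned}
\tag{A.5}
$$

Step 4. To derive the conditional distribution of $w_i$, $i = 1, 2, \ldots, n$, we start with the joint posterior density of the parameters prior to augmentation, which is proportional to

$$
\begin{aligned}
&\prod_{j=1}^{m_i}\prod_{k:\,Y_{ijk}=1}\left\{\Big[w_i\Big(dH_{0k} + \tfrac{1}{\theta_{ij}}(a_k-a_{k-1})\Big)\Big]^{dN_{ijk}}
\exp\Big[-w_i\Big(dH_{0k} + \tfrac{1}{\theta_{ij}}(a_k-a_{k-1})\Big)\Big]\right\}
\times w_i^{\eta-1}\exp(-\eta w_i) \\
&\propto w_i^{\sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk} + \eta - 1}
\exp\Big\{-w_i\Big[\eta + \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}}\Big(dH_{0k} + \tfrac{1}{\theta_{ij}}(a_k-a_{k-1})\Big)\Big]\Big\} \\
&\propto \mathrm{Ga}\Big(w_i \,\Big|\, \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}} dN_{ijk} + \eta,\;
\eta + \sum_{j=1}^{m_i}\sum_{k=1}^{g_{ij}}\Big[dH_{0k} + \tfrac{1}{\theta_{ij}}(a_k-a_{k-1})\Big]\Big).
\end{aligned}
\tag{A.6}
$$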
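Steps 3 and 4 are again conjugate Gamma draws. The Python sketch below computes the shape and rate parameters of (A.5) and (A.6) and then draws from the resulting Gamma distribution; the tuple layouts, the container `dH` holding the baseline increments $dH_{0k}$, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import random


def lambda0k_params(cells_k, w, a0k, b0k):
    """Shape and rate of (A.5) for one interval k.

    cells_k: (i, a_prev, a_k, dN_ijk2) for every (i, j) at risk in interval k (hypothetical layout).
    """
    shape, rate = a0k, b0k
    for i, a_prev, a_k, dN2 in cells_k:
        shape += dN2                             # a_0k + sum of dN_ijk2
        rate += 0.5 * w[i] * (a_k - a_prev) ** 2  # b_0k + sum of (w_i/2) Y_ijk (a_k - a_{k-1})^2
    return shape, rate


def frailty_params(cells_i, dH, eta):
    """Shape and rate of (A.6) for one cluster i.

    cells_i: (k, theta_ij, a_prev, a_k, dN_ijk) for every (j, k) at risk in cluster i;
    dH: sequence or dict mapping interval k to dH_0k (hypothetical layout).
    """
    shape, rate = eta, eta
    for k, theta_ij, a_prev, a_k, dN in cells_i:
        shape += dN                                # eta + sum of dN_ijk
        rate += dH[k] + (a_k - a_prev) / theta_ij  # eta + sum of [dH_0k + (a_k - a_{k-1}) / theta_ij]
    return shape, rate


def gamma_draw(shape, rate, rng=random):
    """Gamma(shape, rate) draw; gammavariate expects a scale, so pass 1 / rate."""
    return rng.gammavariate(shape, 1.0 / rate)
```

Keeping the parameter computation separate from the random draw makes the conjugate algebra easy to check by hand against (A.5) and (A.6).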

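Step 5. The full conditional distribution of $(\beta, \sigma_\theta^2)$ is proportional to

$$
\Big\{\prod_{i=1}^{n}\prod_{j=1}^{m_i} N(\log\theta_{ij} \mid x_{ij}'\beta, \sigma_\theta^2)\Big\}\,
N(\beta \mid m_\theta, \sigma_\theta^2 V_\theta)\,\mathrm{Ga}(1/\sigma_\theta^2 \mid a_\theta, b_\theta).
$$

This expression is the same as the one that appears in the usual conjugate analysis of Normal data (see, e.g., DeGroot, 1970, pages 249-252). It is therefore proportional to a multivariate Normal-Inverse Gamma distribution, i.e.,

$$
\beta \mid \sigma_\theta^2 \sim N_p\big(\beta \mid \hat\beta,\ \sigma_\theta^2 (V_\theta^{-1} + x'x)^{-1}\big),
\tag{A.7}
$$

$$
1/\sigma_\theta^2 \sim \mathrm{Ga}\Big(1/\sigma_\theta^2 \,\Big|\, a_\theta + \tfrac{1}{2}\sum_{i=1}^{n} m_i,\
b_\theta + \tfrac{1}{2}\big[(y - x\hat\beta)'y + (m_\theta - \hat\beta)'V_\theta^{-1}m_\theta\big]\Big),
\tag{A.8}
$$

where $y = (\log\theta_{11}, \ldots, \log\theta_{1m_1}, \ldots, \log\theta_{nm_n})'$ is the vector of the $\sum_i m_i$ values of $\log\theta_{ij}$, $x$ is the covariate matrix, and the estimate of the regression coefficient, $\hat\beta$, is calculated from $\hat\beta = (V_\theta^{-1} + x'x)^{-1}(V_\theta^{-1}m_\theta + x'y)$.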
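Because (A.7) and (A.8) form a Normal-Inverse Gamma pair, Step 5 can be carried out by composition: draw $1/\sigma_\theta^2$ from (A.8), then $\beta$ given $\sigma_\theta^2$ from (A.7). The dependency-free Python sketch below does this with a hand-written Cholesky factorization of $V_\theta^{-1} + x'x$. The function names, the list-of-lists matrix layout and the choice to pass the prior precision $V_\theta^{-1}$ directly are assumptions made for illustration; the Gamma shape uses the length of y, i.e. the number of $\log\theta_{ij}$ values.

```python
import math
import random


def _cholesky(A):
    """Lower-triangular L with A = L L' (A symmetric positive definite)."""
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for r in range(p):
        for c in range(r + 1):
            s = A[r][c] - sum(L[r][t] * L[c][t] for t in range(c))
            L[r][c] = math.sqrt(s) if r == c else s / L[c][c]
    return L


def _back_solve(L, z):
    """Solve L' u = z (back substitution on the transpose of L)."""
    p = len(L)
    u = [0.0] * p
    for r in reversed(range(p)):
        u[r] = (z[r] - sum(L[t][r] * u[t] for t in range(r + 1, p))) / L[r][r]
    return u


def nig_posterior(x, y, m, Vinv, a, b):
    """beta_hat, Cholesky factor of A = Vinv + x'x, and shape/rate of (A.8).

    x: rows x_ij; y: values log theta_ij; m: prior mean m_theta; Vinv: prior V_theta^{-1}.
    """
    p = len(m)
    A = [[Vinv[r][c] + sum(row[r] * row[c] for row in x) for c in range(p)] for r in range(p)]
    Vinv_m = [sum(Vinv[r][c] * m[c] for c in range(p)) for r in range(p)]
    rhs = [Vinv_m[r] + sum(row[r] * yi for row, yi in zip(x, y)) for r in range(p)]
    L = _cholesky(A)
    fwd = [0.0] * p
    for r in range(p):  # forward substitution: L fwd = rhs
        fwd[r] = (rhs[r] - sum(L[r][t] * fwd[t] for t in range(r))) / L[r][r]
    beta_hat = _back_solve(L, fwd)  # (L L') beta_hat = rhs
    resid_y = sum((yi - sum(xv * bh for xv, bh in zip(row, beta_hat))) * yi for row, yi in zip(x, y))
    prior_term = sum((m[r] - beta_hat[r]) * Vinv_m[r] for r in range(p))
    shape = a + 0.5 * len(y)
    rate = b + 0.5 * (resid_y + prior_term)
    return beta_hat, L, shape, rate


def draw_beta_sigma2(x, y, m, Vinv, a, b, rng=random):
    """Composition draw: 1/sigma^2 from (A.8), then beta | sigma^2 from (A.7)."""
    beta_hat, L, shape, rate = nig_posterior(x, y, m, Vinv, a, b)
    sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
    z = [rng.gauss(0.0, 1.0) for _ in beta_hat]
    u = _back_solve(L, z)  # Cov(u) = (L L')^{-1} = (Vinv + x'x)^{-1}
    beta = [bh + math.sqrt(sigma2) * uv for bh, uv in zip(beta_hat, u)]
    return beta, sigma2
```

With a single covariate column of ones, y = (1, 3), $m_\theta = 0$, $V_\theta^{-1} = 1$ and $a_\theta = b_\theta = 1$, `nig_posterior` gives $\hat\beta = 4/3$, shape 2 and rate $10/3$, which agrees with the equivalent form $b_\theta + \tfrac12[y'y + m_\theta'V_\theta^{-1}m_\theta - \hat\beta'(V_\theta^{-1} + x'x)\hat\beta]$.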

APPENDIX B

Here we give the program code, written for the program BUGS, to analyze the data described in Section 5. WinBUGS does not allow a[0] or dH[0] to be used, so the time index j starts from 2 and the index value 0 of the original formulas is mapped to 1. Therefore, dH[j] is the intensity in the (j − 1)th interval.

    # Model: leukemia data (the name of the program)
    model
    {
      # Set up data
      for (i in 1:N) {      # N is the total number of patients
        for (j in 2:T) {    # T is the number of unique failure times
          # risk set = 1 if obs.t >= a, where obs.t[i] is the observed remission
          #   or censoring time of the ith patient
          # eps = 0.00001 is used to guard against numerical imprecision in the step function
          # a[T] is the unique failure time + maximum censoring time
          Y[i,j] ...
          ...

      # Model
      for (j in 2:T) {
        # Idt[N,T] is the total intensity process
        # I0dt[N,T] and I2dt[N,T] are the intensities for the baseline hazard function
        # I1dt[N,T] is the intensity for the parametric part of the hazard function
        # Y[N,T] = 1 if the patient is under observation and 0 otherwise
        for (i in 1:N) {
          Idt[i, j] ...
          ...

      sigma.theta ...
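The listing relies on two small devices described in its comments and in the paragraph preceding it: a risk-set indicator that equals 1 when obs.t >= a, computed with the BUGS `step()` function and a tolerance eps = 0.00001, and a shift of the interval index by one. The Python sketch below mirrors only these two devices; the function names are hypothetical, and the example numbers are chosen purely to show why eps matters.

```python
def step(x):
    """Python analogue of the BUGS step() function: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def at_risk(obs_t, a, eps=0.00001):
    """Risk-set indicator: 1 if obs_t >= a, with eps absorbing rounding error."""
    return step(obs_t - a + eps)


def interval_of_column(j):
    """BUGS column j (j = 2, ..., T) holds the (j - 1)th interval of the formulas."""
    return j - 1


a = 0.1 + 0.2   # stored as 0.30000000000000004, slightly above 0.3
obs_t = 0.3     # intended to satisfy obs_t >= a
print(step(obs_t - a), at_risk(obs_t, a))  # prints: 0 1
```

Without eps the comparison fails because of floating-point rounding; a tolerance much smaller than the gap between distinct observed times avoids this without altering genuine comparisons.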

Clayton, D. (1994). Bayesian analysis of frailty models. Technical Report, Medical Research Council Biostatistics Unit, Cambridge.

Clayton, D.G. (1978). A model for association in bivariate life-tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, 141-151.

Cox, D.R. (1972). Regression models and life-tables (with discussion). J. R. Statist. Soc. B, 34, 187-220.

Cox, D.R. (1975). Partial likelihood. Biometrika, 62, 269-276.

DeGroot, M.H. (1970). Optimal Statistical Decisions. New York: McGraw-Hill.

Dunson, D.B. and Herring, A.H. (2004). Bayesian model selection and averaging in additive and proportional hazards models (available for download at www.ftp.isds.duke.edu/workingPapers/04-16.pdf).

Gamerman, D. (1991). Dynamic Bayesian models for survival data. Applied Statistics, 40, 63-79.

Gelman, A. and Rubin, D. (1992a). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-511.

Gelman, A. and Rubin, D. (1992b). A single series from the Gibbs sampler provides a false sense of security. In Bayesian Statistics 4, eds. J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith. New York: Oxford University Press, 625-631.

Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337-348.

Hougaard, P. (2000). Analysis of Multivariate Survival Data. New York: Springer-Verlag.

Ibrahim, J.G., Chen, M.H. and Sinha, D. (2001). Bayesian Survival Analysis. New York: Springer-Verlag.

Lin, D.Y. and Ying, Z.L. (1994). Semiparametric analysis of the additive risk model. Biometrika, 81, 61-71.


Lin, D.Y. and Ying, Z.L. (1995). Semiparametric analysis of general additive-multiplicative hazard models for counting processes. Annals of Statistics, 23, 1712-1734.

McKeague, I.W. and Sasieni, P.D. (1994). A partly parametric additive risk model. Biometrika, 81, 501-514.

Sinha, D. (1993). Semiparametric Bayesian analysis of multiple event time data. Journal of the American Statistical Association, 88, 979-983.

Sinha, D. and Dey, D.K. (1997). Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association, 92, 1195-1212.

Sorensen, D. and Gianola, D. (2002). Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. New York: Springer-Verlag.

Spiegelhalter, D.J., Thomas, A., Best, N.G., Gilks, W.R. and Lunn, D. (2004). BUGS: Bayesian Inference Using Gibbs Sampling. MRC Biostatistics Unit, Cambridge, England.

Tanner, M.A. and Wong, W.H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82, 528-550.

Vaupel, J.W., Manton, K.G. and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439-454.

Vu, H.T. (2003). Parametric and semiparametric conditional shared gamma frailty models with events before study entry. Communications in Statistics: Simulation and Computation, 32(4), 1223-1248.
