
Multivariate Frailty Models for Exchangeable Survival Data With Covariates

Catalina STEFANESCU

London Business School
London NW1 4SA
United Kingdom


Bruce W. TURNBULL

School of Operations Research
Cornell University
Ithaca, NY 14853


We consider a multivariate lognormal frailty model for correlated exchangeable failure time data, where the marginal lifetimes have conditional Weibull distributions. We discuss Bayesian statistical methods to fit this model to experimental data with varying cluster sizes. The Bayesian inferential approach arises naturally from the hierarchical structure of the frailty model. In contrast, implementation of the maximum likelihood approach encounters practical difficulties. The methodology is illustrated with the analyses of three datasets.

KEY WORDS: Correlated survival data; Gibbs sampling; Load-sharing models; Markov chain Monte Carlo; Multivariate lognormal frailty; Reliability.

1. INTRODUCTION

In this article, we consider parametric models for correlated failure time data and discuss Bayesian methods for fitting these models to experimental data. An important example of such data is provided by component lifetimes in reliability studies (Crowder, Kimber, Smith, and Sweeting 1991, sec. 7.2; Roy 2001; Lawless 2002, sec. 11.2). Association may arise between component lifetimes because they share a common operating environment. Also, the failure of one part of a system may place a higher stress on the surviving components, potentially leading to earlier failure. Such ideas of load sharing may occur in many situations (e.g., in software reliability studies, power plant safety assessment, or materials testing with fiber composites); see Hollander and Peña (1995) and Kvam and Peña (2005).

Correlated failure time data may be regarded as grouped in clusters, where the clusters are lifetimes of components belonging to the same system. The observations from different clusters are independent, but those from the same cluster are associated. In many such cases, it is reasonable to assume that the responses within a cluster are exchangeable. For example, in the previous application, the load-sharing mechanism may be quite complex to model; if the components are identical, however, considerations of symmetry would imply the reasonable minimal assumption that the failure times be exchangeable.

Sometimes it is the association itself between the clustered responses that is of scientific interest; its magnitude and dependence on explanatory variables may then need to be estimated. In other situations, the intracluster association is regarded merely as a nuisance characteristic of the data, whereas interest focuses on the marginal parameters. In these cases, the association still needs to be taken into account in order to make correct inference about the regression parameters. Indeed, the standard analytic tools that ignore the intracluster association may lead to inefficient, inconsistent, and biased estimates, as well as to erroneous standard errors.

A number of different models for multivariate survival data have been proposed in the literature; an extensive survey of these models and their applications is presented by Hougaard (2000). One particularly flexible modeling tool for correlated survival data is the frailty approach. The models in this class assume that, conditional on some unobserved quantity, W say, the lifetimes are independent. When the unknown W is integrated out, the lifetimes become dependent; the dependence is induced by the common value of W. When W is a scalar, we say we have a univariate or “shared” frailty model. In the reliability literature, several articles have focused on shared frailty models (Lindley and Singpurwalla 1986; Nayak 1987; Hougaard 1989; Whitmore and Lee 1991; Jaisingh, Dey, and Griffith 1993). The theory of multivariate frailty models, where W is a vector, has received less attention. Yet these models have several advantages; they can take into account individual-level explanatory variables (discussed later) and allow a more flexible dependence structure within clusters of observations. Indeed, the univariate frailty model can only model positive dependence between lifetimes. In practice, negative dependence could also arise, for example, when components may be competing for resources. Xue and Brookmeyer (1996, sec. 2) discussed several modeling situations where a multivariate rather than a univariate frailty approach is needed.

In addition to the modeling of association, a second issue concerns the introduction of covariates for multivariate survival data. It may be reasonable to assume that the clustered survival times are exchangeable only after taking the presence of explanatory variables into account. We shall say that the clustered lifetimes T_1, T_2, …, T_r with covariates x_1, …, x_r are exchangeable after adjustment for covariates if their joint survival function satisfies

$$S(t_1,\dots,t_r;\mathbf{x}_1,\dots,\mathbf{x}_r) = S(t_{\pi(1)},\dots,t_{\pi(r)};\mathbf{x}_{\pi(1)},\dots,\mathbf{x}_{\pi(r)}) \tag{1}$$

for any t_1, …, t_r and any permutation π of the indices 1, 2, …, r. The inclusion of covariates is important because their omission could bias the results of the analysis and because their effects on the marginal survival time distributions may themselves be of interest. Covariates may act either at a cluster level or at an individual level. For example, the covariates specific to the operating conditions of the system (such as temperature, load, and pressure) act at the cluster level, whereas the covariates specific to each component (e.g., lot number and manufacturer) act at the individual level (Costigan and Klein 1993). There is a need for models rich enough to allow both types of covariates while keeping the exchangeable association structure fairly flexible.

© 2006 American Statistical Association and the American Society for Quality
TECHNOMETRICS, AUGUST 2006, VOL. 48, NO. 3
DOI 10.1198/004017006000000048

The main contributions of this article are to investigate a class of parametric multivariate lognormal frailty models for failure time data that can incorporate covariates and to implement a Bayesian approach to model fitting when clusters of varying sizes are available. Hougaard (2000, sec. 10.6) mentioned the multivariate frailty model as a promising way of describing negative dependence and more general complex dependence structures. Xue and Brookmeyer (1996) discussed the case of a bivariate lognormal frailty and proposed a modified expectation–maximization (EM) algorithm for estimation. In the multivariate case, direct maximization of the likelihood is generally difficult to achieve because of the computational intractability of the high-dimensional Laplace transforms involved in the likelihood function. As an alternative to maximum likelihood estimation, we propose a Bayesian framework for inference. This arises naturally from the inherent hierarchical structure of the models and may be implemented by means of Gibbs sampling. Similar Bayesian approaches to inference for univariate frailty models with Weibull marginals have been discussed by Sahu, Dey, Aslanidou, and Sinha (1997), Sahu and Dey (2000), and Ibrahim, Chen, and Sinha (1991, chap. 4).

The article is structured as follows. Section 2 examines parameterizations of models for clustered exchangeable survival variables. The Bayesian framework for estimation is developed in Section 3, and Section 4 describes three applications.

2. EXCHANGEABLE MODELS FOR SURVIVAL DATA

Let (T_1, T_2, …, T_r) be a cluster of lifetimes and denote by S(t_1, …, t_r) their joint survival function. Let x_i ∈ ℝ^p be a vector of covariates corresponding to the ith observation of the cluster and let β be the vector of regression parameters. We assume that the lifetimes are exchangeable after adjustment for covariates. Let W = (W_1, …, W_r) be an r-vector of frailties representing the unmeasured effects of the environment on component life lengths. Given W, the lifetimes are assumed to be independent. The exchangeability of the T_i implies that the frailties W_i are also exchangeable. Possible choices for the frailty distribution are the multivariate normal, the multivariate logistic (Kotz, Balakrishnan, and Johnson 2000, p. 551), and the multivariate t (Spiegelhalter, Thomas, Best, and Lunn 2003). Following Hougaard (2000, p. 379), we take the frailties to have a multivariate normal distribution W ∼ N_r(0, Σ_r), with the covariance matrix Σ_r parameterized by

$$(\Sigma_r)_{ii} = \sigma^2,\quad i = 1,\dots,r, \qquad (\Sigma_r)_{ij} = \rho\sigma^2,\quad i,j = 1,\dots,r,\ i \neq j. \tag{2}$$

Here σ² ≥ 0 is the frailty variance and ρ is the frailty correlation. The restriction ρ > −1/(r − 1) is needed to ensure that Σ_r is positive definite.
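The positive-definiteness condition can be checked directly: the compound-symmetric matrix in (2) has eigenvalue σ²{1 + (r − 1)ρ} for the vector of ones and σ²(1 − ρ) with multiplicity r − 1, so its determinant is σ^{2r}(1 − ρ)^{r−1}{1 + (r − 1)ρ}. The Python sketch below assumes NumPy and uses hypothetical helper names (`frailty_cov`, `closed_form_eigenvalues`, and so on); it compares these closed forms with a numerical eigendecomposition.

```python
import numpy as np


def frailty_cov(r: int, rho: float, sigma2: float) -> np.ndarray:
    """Compound-symmetric covariance matrix Sigma_r from equation (2)."""
    return sigma2 * ((1.0 - rho) * np.eye(r) + rho * np.ones((r, r)))


def closed_form_eigenvalues(r: int, rho: float, sigma2: float) -> np.ndarray:
    """sigma2*(1 - rho) repeated r - 1 times and sigma2*(1 + (r - 1)*rho), sorted ascending."""
    vals = [sigma2 * (1.0 - rho)] * (r - 1) + [sigma2 * (1.0 + (r - 1) * rho)]
    return np.sort(np.array(vals))


def closed_form_det(r: int, rho: float, sigma2: float) -> float:
    """det(Sigma_r) as the product of the eigenvalues above."""
    return sigma2**r * (1.0 - rho) ** (r - 1) * (1.0 + (r - 1) * rho)


def is_positive_definite(r: int, rho: float, sigma2: float) -> bool:
    """True when every eigenvalue of Sigma_r is strictly positive."""
    return bool(np.all(closed_form_eigenvalues(r, rho, sigma2) > 0.0))


numeric = np.linalg.eigvalsh(frailty_cov(4, -0.2, 1.5))  # eigenvalues in ascending order
print(np.allclose(numeric, closed_form_eigenvalues(4, -0.2, 1.5)))  # True
print(is_positive_definite(3, -0.5, 1.0))   # False: rho = -1/(r - 1) is on the boundary
print(is_positive_definite(3, -0.49, 1.0))  # True
```

Working with the closed-form eigenvalues rather than a numerical factorization makes the boundary ρ = −1/(r − 1) exact instead of subject to rounding.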

We consider the case where the conditional lifetime distributions are Weibull:

$$T_i \mid (W_i, \mathbf{x}_i) \sim \text{Weibull}\bigl(\alpha,\ \exp(W_i + \mathbf{x}_i\boldsymbol\beta)/\mu\bigr),\quad i = 1,\dots,r, \tag{3}$$

where α, µ > 0, and Weibull(α, λ) denotes the distribution with shape α and survival function exp{−(λt)^α}, so that exp(W_i + x_iβ)/µ plays the role of a rate parameter. The focus on the Weibull distribution is motivated by mathematical convenience and by the fact that Weibull is perhaps the most widely used lifetime distribution in reliability (Crowder et al. 1991; Lawless 2002). It provides a useful approximating model for a monotone failure rate, which is often a reasonable assumption. Its mathematical properties make it a particularly convenient and flexible generalization of the exponential distribution. It is sometimes also used in biological and medical applications as an alternative to nonparametric and semiparametric specifications.

Note that the conditional hazard function, given by

$$h(t_i \mid W_i;\mathbf{x}_i) = \alpha\mu^{-\alpha}\exp[\alpha(W_i + \mathbf{x}_i\boldsymbol\beta)]\,t_i^{\alpha-1},$$

has a Cox (1972) proportional hazards structure. The marginal moments and correlations of (T_1, …, T_r) may be expressed as functions of the model parameters by way of successive conditioning. Thus, for example, on the log scale we have

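$$E(\log T;\mathbf{x}) = E[E(\log T \mid W;\mathbf{x})] = E\Bigl[-\frac{\gamma}{\alpha} - \log\bigl(e^{W+\mathbf{x}\boldsymbol\beta}/\mu\bigr)\Bigr] = -\frac{\gamma}{\alpha} + \log\mu - \mathbf{x}\boldsymbol\beta, \tag{4}$$

where γ = .5772… is Euler’s constant. Equation (4) holds because log(T) has an extreme-value type I conditional distribution: given W, the variable (e^{W+xβ}T/µ)^α is standard exponential, so log T = log µ − W − xβ + α^{−1} log Z with Z ∼ Exp(1), and E(log Z) = −γ. Similarly,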

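$$\operatorname{var}(\log T;\mathbf{x}) = \operatorname{var}[E(\log T \mid W;\mathbf{x})] + E[\operatorname{var}(\log T \mid W;\mathbf{x})] = \frac{\Psi'(1)}{\alpha^2} + \sigma^2,$$

where Ψ′(1) = π²/6 is the variance of the extreme-value type I distribution. It can also be shown that, for i ≠ j,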

$$\operatorname{corr}(\log T_i, \log T_j;\mathbf{x}_i,\mathbf{x}_j) = \frac{\rho\sigma^2}{\Psi'(1)/\alpha^2 + \sigma^2}. \tag{5}$$
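The log-scale results (4) and (5), together with the variance formula, can be checked by simulation. The sketch below assumes NumPy and draws lifetimes from model (3) by inverting the conditional cumulative hazard, using the fact that, given W_i, (T_i e^{W_i + x_iβ}/µ)^α is standard exponential. The function names and parameter values are illustrative only, and a single value of xβ is shared by all components of a cluster.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma
TRIGAMMA_1 = np.pi**2 / 6          # Psi'(1), variance of the standard extreme-value law


def simulate_clusters(n_clusters, r, alpha, mu, rho, sigma2, xbeta=0.0, seed=0):
    """Draw an (n_clusters, r) array of lifetimes from model (3).

    Given W_i, (T_i * exp(W_i + x beta) / mu)**alpha ~ Exp(1), so
    T_i = mu * exp(-(W_i + x beta)) * E_i**(1/alpha) with E_i ~ Exp(1).
    """
    rng = np.random.default_rng(seed)
    cov = sigma2 * ((1.0 - rho) * np.eye(r) + rho * np.ones((r, r)))
    w = rng.multivariate_normal(np.zeros(r), cov, size=n_clusters)
    e = rng.exponential(1.0, size=(n_clusters, r))
    return mu * np.exp(-(w + xbeta)) * e ** (1.0 / alpha)


def log_scale_moments(alpha, mu, rho, sigma2, xbeta=0.0):
    """Mean (4), variance, and pairwise correlation (5) of log T."""
    mean = -EULER_GAMMA / alpha + np.log(mu) - xbeta
    var = TRIGAMMA_1 / alpha**2 + sigma2
    corr = rho * sigma2 / (TRIGAMMA_1 / alpha**2 + sigma2)
    return mean, var, corr


params = dict(alpha=2.0, mu=5.0, rho=0.5, sigma2=0.8, xbeta=0.3)
log_t = np.log(simulate_clusters(100_000, 3, **params))
theory = log_scale_moments(**params)
empirical = (log_t.mean(), log_t.var(), np.corrcoef(log_t[:, 0], log_t[:, 1])[0, 1])
print("theory:   ", np.round(theory, 3))
print("simulated:", np.round(empirical, 3))
```

The same simulator is a convenient source of synthetic clustered data for testing any fitting code before it is applied to real observations.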

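On the original time scale, the expressions for the marginal moments are more complicated. They are given by

$$E(T;\mathbf{x}) = E[E(T \mid W;\mathbf{x})] = \Gamma(1 + 1/\alpha)\,\mu\exp(\sigma^2/2 - \mathbf{x}\boldsymbol\beta),$$

where Γ(z) = ∫₀^∞ t^{z−1} e^{−t} dt, z > 0, is the gamma function, and

$$\operatorname{var}(T;\mathbf{x}) = e^{\sigma^2 - 2\mathbf{x}\boldsymbol\beta}\mu^2\bigl[e^{\sigma^2}\Gamma(1 + 2/\alpha) - \Gamma(1 + 1/\alpha)^2\bigr].$$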

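For i ≠ j, because W_i + W_j ∼ N(0, 2(1 + ρ)σ²), we have

$$\begin{aligned}
E(T_iT_j;\mathbf{x}_i,\mathbf{x}_j) &= E[E(T_iT_j \mid W;\mathbf{x}_i,\mathbf{x}_j)] = E[E(T_i \mid W;\mathbf{x}_i)\,E(T_j \mid W;\mathbf{x}_j)] \\
&= \Gamma(1+1/\alpha)^2\,e^{-(\mathbf{x}_i+\mathbf{x}_j)\boldsymbol\beta}\mu^2\,E\bigl[e^{-(W_i+W_j)}\bigr] \\
&= \Gamma(1+1/\alpha)^2\,e^{-(\mathbf{x}_i+\mathbf{x}_j)\boldsymbol\beta}\mu^2\,e^{(\rho+1)\sigma^2},
\end{aligned}$$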


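and, hence,

$$\operatorname{corr}(T_i,T_j;\mathbf{x}_i,\mathbf{x}_j) = \frac{E(T_iT_j;\mathbf{x}_i,\mathbf{x}_j) - E(T_i;\mathbf{x}_i)E(T_j;\mathbf{x}_j)}{\sqrt{\operatorname{var}(T_i;\mathbf{x}_i)\operatorname{var}(T_j;\mathbf{x}_j)}} = \frac{\Gamma(1+1/\alpha)^2\,(e^{\rho\sigma^2} - 1)}{e^{\sigma^2}\Gamma(1+2/\alpha) - \Gamma(1+1/\alpha)^2}. \tag{6}$$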

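Note that the correlation between lifetimes does not depend on the values of the covariates. In particular, when the conditional distributions are exponential (α = 1), we have

$$E(T;\mathbf{x}) = \mu e^{\sigma^2/2 - \mathbf{x}\boldsymbol\beta},\qquad \operatorname{var}(T;\mathbf{x}) = \mu^2 e^{\sigma^2 - 2\mathbf{x}\boldsymbol\beta}\bigl(2e^{\sigma^2} - 1\bigr),$$

and

$$\operatorname{corr}(T_i,T_j) = \frac{e^{\rho\sigma^2} - 1}{2e^{\sigma^2} - 1}.$$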
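Formula (6) and the marginal moments are easy to evaluate numerically. In this sketch (Python standard library only; the function names are hypothetical), the exponential case is recovered exactly because Γ(2) = 1 and Γ(3) = 2. It also shows that, for ρ = 1, the limiting correlation as σ² → ∞ is Γ(1 + 1/α)²/Γ(1 + 2/α), which equals .5 only in the exponential case (it is π/4 ≈ .785 for α = 2).

```python
import math


def lifetime_mean_var(alpha, mu, sigma2, xbeta=0.0):
    """E(T; x) and var(T; x) on the original time scale."""
    g1, g2 = math.gamma(1 + 1 / alpha), math.gamma(1 + 2 / alpha)
    mean = g1 * mu * math.exp(sigma2 / 2 - xbeta)
    var = math.exp(sigma2 - 2 * xbeta) * mu**2 * (math.exp(sigma2) * g2 - g1**2)
    return mean, var


def lifetime_corr(alpha, rho, sigma2):
    """Correlation (6); mu and the covariates do not enter."""
    g1, g2 = math.gamma(1 + 1 / alpha), math.gamma(1 + 2 / alpha)
    return g1**2 * (math.exp(rho * sigma2) - 1) / (math.exp(sigma2) * g2 - g1**2)


# Exponential special case (alpha = 1) against the simplified formula
rho, sigma2 = 0.6, 1.0
print(lifetime_corr(1.0, rho, sigma2),
      (math.exp(rho * sigma2) - 1) / (2 * math.exp(sigma2) - 1))
print(lifetime_corr(1.0, 1.0, 30.0))  # shared frailty, large sigma2: close to 0.5
print(lifetime_corr(2.0, 1.0, 30.0))  # alpha = 2: close to pi/4
```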

The bivariate correlation when σ² = 1 in the exponential case is graphed in figure 2 of Lindeboom and Van Den Berg (1994). In general, corr(T_i, T_j) is monotone in ρ. When ρ = 1 and we have a shared frailty model, the exponential-case correlation tends to .5 as σ² → ∞. When ρ = −1/(r − 1), the lower bound for the exponential-case correlation depends on the cluster size r, and it is obtained for σ² = σ₀², where σ₀² is the root of the equation

$$2\exp\Bigl(\sigma^2\frac{r}{r-1}\Bigr) - 2\exp(\sigma^2)\frac{r}{r-1} + \frac{1}{r-1} = 0.$$

For example, in the exponential case with r = 2, we have −.1716 ≤ corr(T_i, T_j) ≤ .5. In general, in all survival models with frailties, the intracluster correlation between lifetimes does not vary over the whole range [−1, 1]. For example, the shared univariate frailty model can only describe positive association of lifetimes. Note, however, that the correlation on the logarithmic scale is not similarly restricted. Indeed, from (5) it follows that, for r = 2, corr(log(T_i), log(T_j)) can take any value in the (−1, 1) range; for larger clusters its lower limit is −1/(r − 1), the bound that applies to any r exchangeable random variables.
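The lower bound can be computed by solving the equation for σ₀² numerically. Writing g(s) for its left side with s = σ², we have g(0) = −1/(r − 1) < 0 and g′(s) = 2{r/(r − 1)}(e^{sr/(r−1)} − e^s) > 0 for s > 0, so the root is unique and bisection suffices. The Python sketch below (hypothetical function names, standard library only) reproduces the value −.1716 for r = 2, which equals −(3 − 2√2).

```python
import math


def exp_corr(rho, sigma2):
    """Exponential-case correlation (e^{rho sigma2} - 1) / (2 e^{sigma2} - 1)."""
    return (math.exp(rho * sigma2) - 1) / (2 * math.exp(sigma2) - 1)


def corr_lower_bound(r, tol=1e-13):
    """Return (sigma0^2, minimum correlation) for rho = -1/(r - 1), exponential case.

    sigma0^2 is the positive root of 2 exp(s r/(r-1)) - 2 exp(s) r/(r-1) + 1/(r-1) = 0.
    The left side is -1/(r-1) < 0 at s = 0 and increasing for s > 0, so bisection works.
    """
    a = r / (r - 1)

    def g(s):
        return 2 * math.exp(a * s) - 2 * a * math.exp(s) + 1 / (r - 1)

    lo, hi = 0.0, 1.0
    while g(hi) < 0:  # expand the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    s0 = (lo + hi) / 2
    return s0, exp_corr(-1 / (r - 1), s0)


for r in (2, 3, 5):
    s0, bound = corr_lower_bound(r)
    print(r, round(s0, 4), round(bound, 4))
```

Because the derivative of the correlation with respect to σ² has the same sign as g, the root found here is the global minimizer over σ² > 0, not just a stationary point.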

Alternative measures of intracluster dependence include Spearman’s correlation coefficient ρ_S, the median concordance, and Kendall’s τ coefficient (Hougaard 2000, secs. 4.2–4.4). Numerical integration is necessary to compute Spearman’s ρ_S. Also, the median concordance and Kendall’s τ both depend on the Laplace transform of the lognormal distribution. Because no simple formulas are available for it, approximations or numerical integration must also be used to compute τ and the median concordance (Hougaard 2000, pp. 229–230, 244).
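When a numerical value of Kendall’s τ is wanted for particular parameter values, a simple alternative to numerical integration (a swapped-in technique, not one discussed in the text) is Monte Carlo: simulate pairs from the model and compute the sample τ. The sketch below assumes NumPy; `simulate_pair` and `kendall_tau` are hypothetical helpers, and the O(n²) pair count is only practical for a few thousand draws. Because τ is invariant under increasing transformations of each component, it depends on neither µ nor the covariates, so the sketch sets µ = 1 and xβ = 0.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a by direct O(n^2) pair counting (ties have probability 0 here)."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    concordance = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    return concordance.sum() / (n * (n - 1))  # each unordered pair is counted twice


def simulate_pair(n, alpha, rho, sigma2, seed=1):
    """n bivariate draws (T_1, T_2) from model (3) with mu = 1 and x beta = 0."""
    rng = np.random.default_rng(seed)
    cov = sigma2 * np.array([[1.0, rho], [rho, 1.0]])
    w = rng.multivariate_normal(np.zeros(2), cov, size=n)
    e = rng.exponential(1.0, size=(n, 2))
    return np.exp(-w) * e ** (1.0 / alpha)


t = simulate_pair(1500, alpha=2.0, rho=0.8, sigma2=1.0)
print(round(kendall_tau(t[:, 0], t[:, 1]), 3))
```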

Independence between clustered observations can be obtained in the multivariate frailty model when either ρ = 0 or σ² = 0. The two conditions lead, however, to different interpretations. If σ² = 0, then the frailties are constant, the frailty correlation parameter ρ is not identifiable, and the clustered lifetimes are independent. If ρ = 0 but σ² > 0, then the clustered lifetimes T_1, …, T_r are again independent, but their hazard rates are still affected by the univariate (nonshared) frailties W_1, …, W_r. These frailties now represent the unmeasurable covariates that lead to unobserved heterogeneity among clustered observations (Hougaard 1995).

The multivariate frailty model defined by (3) is an extension of the shared univariate frailty model with Weibull hazards, which is obtained when ρ = 1. The main advantage of the multivariate model is its more flexible dependence structure, in particular, its ability to describe negative dependence. However, the multivariate model is sometimes to be preferred to the univariate model even when the dependence is positive; see the example in Section 4.1. The multivariate model is particularly useful for the analysis of data with several sources of variation giving rise to different degrees of dependence (Hougaard 2000, p. 345). Indeed, the model can be extended to accommodate a more complex random effects structure; see the application in Section 4.3.

3. ESTIMATION

In this section we discuss a Bayesian approach for fitting model (3) to experimental data. Suppose that K independent clusters of varying sizes are available for inference. Denote by {((T_{ki}, C_{ki}, δ_{ki}), x_{ki}) | 1 ≤ k ≤ K, 1 ≤ i ≤ r_k} the data, where k indexes the clusters and i the observations within a cluster. The lifetimes T_{k1}, …, T_{kr_k} in the kth cluster are possibly right censored by C_{k1}, …, C_{kr_k}, with censoring indicators δ_k = (δ_{k1}, …, δ_{kr_k}) given by

$$\delta_{ki} = \begin{cases} 1, & \text{if } T_{ki} \le C_{ki},\\ 0, & \text{otherwise.} \end{cases}$$

The observed data consist of {((Y_{ki}, δ_{ki}), x_{ki}) | 1 ≤ k ≤ K, 1 ≤ i ≤ r_k}, where Y_{ki} = min(T_{ki}, C_{ki}). Let the maximum cluster size be R = max{r_k; 1 ≤ k ≤ K}. The model parameters must satisfy the constraints α, µ > 0, σ² ≥ 0, and also ρ ≥ −1/(R − 1) in order to ensure that Σ_r is positive definite for all 1 ≤ r ≤ R.
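In code, the observed data are obtained from latent lifetimes and censoring times as follows. This is a minimal NumPy sketch with hypothetical names; the example times for one litter of three rats are invented for illustration, with administrative censoring at 104 weeks as in Section 4.2. The helper `rho_lower_limit` returns the bound −1/(R − 1) used in the constraints above.

```python
import numpy as np


def observe(t, c):
    """Observed data (Y, delta) from latent lifetimes T and censoring times C."""
    t, c = np.asarray(t, dtype=float), np.asarray(c, dtype=float)
    y = np.minimum(t, c)
    delta = (t <= c).astype(int)  # 1 if the failure is observed, 0 if right censored
    return y, delta


def rho_lower_limit(cluster_sizes):
    """Lower limit -1/(R - 1) for rho, where R is the largest cluster size."""
    R = max(cluster_sizes)
    return -1.0 / (R - 1) if R > 1 else -np.inf


# Hypothetical litter of three rats, censored at 104 weeks
y, delta = observe([71.0, 120.0, 104.0], [104.0, 104.0, 104.0])
print(y, delta)
print(rho_lower_limit([3] * 50))  # -0.5
```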

Let w_k = (w_{k1}, …, w_{kr_k}) be the frailty vector for the kth cluster. We assume that, conditionally on w, the censoring is independent and noninformative of w. The observed but incomplete data for the kth cluster are (y_k, δ_k); the complete data are (y_k, δ_k, w_k). For j = 1, …, r_k, we have

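$$f(y_{kj},\delta_{kj} \mid \mathbf{w}_k) = \bigl[\alpha\mu^{-\alpha}y_{kj}^{\alpha-1}\exp\{\alpha(w_{kj} + \mathbf{x}_{kj}\boldsymbol\beta)\}\bigr]^{\delta_{kj}} \times \exp\bigl\{-\mu^{-\alpha}y_{kj}^{\alpha}\exp[\alpha(w_{kj} + \mathbf{x}_{kj}\boldsymbol\beta)]\bigr\},$$

and the complete data density in the kth cluster is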

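$$f\bigl((y_{k1},\delta_{k1}),\dots,(y_{kr_k},\delta_{kr_k}),\mathbf{w}_k\bigr) = \phi_{r_k}(\mathbf{w}_k;\rho,\sigma^2)\prod_{j=1}^{r_k} f(y_{kj},\delta_{kj} \mid \mathbf{w}_k),$$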

where φ_r(·; ρ, σ²) is the density function of the multivariate normal distribution with mean 0 and covariance matrix Σ_r given by (2). Hence, the full likelihood of the sample in terms of parameters θ is given by

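$$L(\theta,\mathbf{w}) = \prod_{k=1}^{K}\phi_{r_k}(\mathbf{w}_k) \times \prod_{k=1}^{K}\prod_{j=1}^{r_k}\Bigl[\bigl\{\alpha\mu^{-\alpha}e^{\alpha(w_{kj}+\mathbf{x}_{kj}\boldsymbol\beta)}y_{kj}^{\alpha-1}\bigr\}^{\delta_{kj}} \times \exp\bigl\{-\mu^{-\alpha}e^{\alpha(w_{kj}+\mathbf{x}_{kj}\boldsymbol\beta)}y_{kj}^{\alpha}\bigr\}\Bigr]. \tag{7}$$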
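For reference, the logarithm of (7) can be evaluated cluster by cluster. The sketch below assumes NumPy, represents the data as one array per cluster (a layout chosen for illustration), takes the linear predictors x_{kj}β as precomputed inputs, and requires σ² > 0 and −1/(r_k − 1) < ρ < 1 so that Σ_{r_k} is nonsingular; the function name is hypothetical.

```python
import numpy as np


def log_augmented_likelihood(y, delta, xbeta, w, alpha, mu, rho, sigma2):
    """Log of (7). y, delta, xbeta, w are lists with one 1-D array per cluster,
    holding y_kj, delta_kj, x_kj beta, and the frailties w_kj for j = 1, ..., r_k."""
    total = 0.0
    for yk, dk, xbk, wk in zip(y, delta, xbeta, w):
        yk, dk, xbk, wk = (np.asarray(a, dtype=float) for a in (yk, dk, xbk, wk))
        r = len(yk)
        cov = sigma2 * ((1.0 - rho) * np.eye(r) + rho * np.ones((r, r)))
        # log phi_{r_k}(w_k; rho, sigma2), the N_r(0, Sigma_r) log density
        _, logdet = np.linalg.slogdet(cov)
        total += -0.5 * (r * np.log(2 * np.pi) + logdet + wk @ np.linalg.solve(cov, wk))
        # Weibull part: delta * log hazard minus cumulative hazard, per observation
        eta = alpha * (wk + xbk)
        log_h = np.log(alpha) - alpha * np.log(mu) + eta + (alpha - 1) * np.log(yk)
        cum_h = mu ** (-alpha) * np.exp(eta) * yk**alpha
        total += float(np.sum(dk * log_h - cum_h))
    return total


# Two hypothetical clusters of sizes 2 and 1
print(log_augmented_likelihood([[2.0, 5.0], [3.0]], [[1, 0], [1]], [[0.1, -0.2], [0.0]],
                               [[0.2, -0.4], [0.1]], alpha=1.5, mu=3.0, rho=0.4, sigma2=0.5))
```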

The likelihood of the sample is obtained from (7) by integrating out the unobserved frailties with respect to their lognormal densities. Maximum likelihood estimators are difficult to obtain directly because no closed-form analytical expressions are available for this likelihood and the multivariate integrals become increasingly more difficult to compute with higher cluster sizes.
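To see what integrating out the frailties involves, the observed-data likelihood of a single cluster can be approximated by plain Monte Carlo, averaging the product of the conditional densities f(y_{kj}, δ_{kj} | w_k) over draws w_k ∼ N_{r_k}(0, Σ_{r_k}). This is only an illustration under stated assumptions (NumPy, hypothetical names, a fixed number of draws), not a method proposed in the article; such simple estimators can become noisy as r_k grows, which is one practical face of the difficulty described above.

```python
import numpy as np


def cluster_likelihood_mc(yk, dk, xbk, alpha, mu, rho, sigma2, n_draws=20_000, seed=0):
    """Plain Monte Carlo estimate of one cluster's observed-data likelihood:
    the average over frailty draws w ~ N_r(0, Sigma_r) of prod_j f(y_kj, delta_kj | w)."""
    rng = np.random.default_rng(seed)
    yk, dk, xbk = (np.asarray(a, dtype=float) for a in (yk, dk, xbk))
    r = len(yk)
    cov = sigma2 * ((1.0 - rho) * np.eye(r) + rho * np.ones((r, r)))
    # Square root V diag(sqrt(lambda)) via eigh, so sigma2 = 0 (no frailty) also works
    vals, vecs = np.linalg.eigh(cov)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    w = rng.standard_normal((n_draws, r)) @ root.T  # each row ~ N_r(0, Sigma_r)
    eta = alpha * (w + xbk)
    log_f = (dk * (np.log(alpha) - alpha * np.log(mu) + eta + (alpha - 1) * np.log(yk))
             - mu ** (-alpha) * np.exp(eta) * yk**alpha)
    return float(np.exp(log_f.sum(axis=1)).mean())


print(cluster_likelihood_mc([2.0, 5.0, 1.0], [1, 0, 1], [0.0, 0.1, -0.1],
                            alpha=1.5, mu=3.0, rho=0.3, sigma2=0.5))
```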

We propose the use of a Bayesian approach for estimation. This is natural because of the inherent hierarchical structure of the multivariate lognormal frailty model (3). The first level of the hierarchy is given by the parameters of the lognormal distribution, ρ and σ²; on the second level lie the unobserved random effects w and the Weibull scale and shape parameters µ and α; and the third level consists of the observed data given by the clustered observations.

The parameters and hyperparameters of the model are θ = (α, µ, β, ρ, σ²), together with the frailties w. A Bayesian specification requires prior distributions to be chosen for all parameters and hyperparameters in the hierarchy. We assume that the parameters are a priori independent, and with little external information available, we generally would like to specify noninformative priors p(·) for the components of θ. The prior distribution of θ then becomes

$$p(\theta) = p(\alpha)\,p(\mu)\,p(\boldsymbol\beta)\,p(\rho)\,p(\sigma^2),$$

and the joint posterior density p(θ, {w_k} | {(y, δ)}) is proportional to the product of the prior and the augmented likelihood given by (7).

The marginal posterior density of each parameter is obtained by integrating out the other parameters from the joint posterior density. This is difficult to achieve analytically; therefore, we propose the use of Gibbs sampling (Geman and Geman 1984; Gelfand and Smith 1990) for generation of the marginal posterior distributions. Details of its implementation are given in the Appendix. For all parameters, 95% credible intervals can be computed from the draws generated from the posterior distributions, and these can then be used in testing specific hypotheses about the parameters. In particular, the hypotheses that β = 0 (no covariate effect), ρ = 1 (univariate frailty), ρ = 0 (independence with unobserved heterogeneity), or σ² = 0 (independence) may be of interest. Ibrahim et al. (1991, chap. 6) described several methods for model comparison in a Bayesian framework. Following Spiegelhalter et al. (2003), in the applications described in Section 4 we shall use the deviance information criterion (DIC) to choose among different models fitted to the same dataset.
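Posterior summaries of the kind reported in Section 4 (mean, standard deviation, median, and equal-tailed credible interval) follow directly from the retained draws. The sketch below assumes NumPy; the function name and the synthetic normal “chain” are illustrative stand-ins, not output from the models in this article.

```python
import numpy as np


def posterior_summary(draws, level=0.95, burn_in=0):
    """Mean, sd, median, and equal-tailed credible interval from MCMC draws after burn-in."""
    kept = np.asarray(draws, dtype=float)[burn_in:]
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(kept, [tail, 100.0 - tail])
    return {
        "mean": kept.mean(),
        "sd": kept.std(ddof=1),
        "median": float(np.median(kept)),
        "interval": (float(lo), float(hi)),
        "excludes_zero": not (lo <= 0.0 <= hi),  # e.g., for checking beta = 0 or rho = 0
    }


# Synthetic chain: 1,000 burn-in values followed by N(0.5, 0.2^2) draws
rng = np.random.default_rng(0)
chain = np.concatenate([np.full(1_000, 5.0), rng.normal(0.5, 0.2, size=50_000)])
summary = posterior_summary(chain, level=0.90, burn_in=1_000)
print(summary)
```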

4. APPLICATIONS

4.1 Load-Sharing Data

Kim and Kvam (2004) in their table 1 described two simulated failure time datasets, which were then used to illustrate order-restricted estimation of load-sharing models. Each of the datasets consists of 20 clusters of equal size R = 3. None of the failure times is censored and there are no covariates. We analyzed the data from their sample 1 using several frailty models, in particular, those discussed in Section 2. Kim and Kvam (2004) discussed many load-sharing models including local and nonmonotone load sharing. However, in the absence of an indication of any particular load-sharing mechanism, it is useful to consider a “minimalist” model for which the correlation structure is based only on the assumption of exchangeability of component lifetimes. In particular, this can serve as a benchmark against which to judge the fit of more specific load-sharing models.

Table 1. Load-Sharing Data: Parameter Estimates

Variable  Mean     Standard error   Median   95% credible intervals

Model 1: multivariate lognormal frailty (DIC = 285.271)
α         3.56     3.92             2.24     (1.39, 15.33)
µ         6.384    1.417            6.276    (3.929, 9.430)
ρ         .784     .151             .800     (.463, .994)
σ²        .966     .429             .925     (.280, 1.920)

Model 2: multivariate lognormal frailty, ρ = 0 (DIC = 366.906)
α         1.21     .15              1.20     (.953, 1.519)
µ         8.079    .980             8.038    (6.264, 10.130)
σ²        .065     .083             .034     (.004, .314)

Model 3: univariate lognormal frailty (DIC = 332.312)
α         1.75     .24              1.75     (1.29, 2.25)
µ         7.004    1.520            6.867    (4.458, 10.310)
σ²        .735     .370             .674     (.197, 1.626)

Model 4: multivariate t-distributed frailty (DIC = 314.242)
α         2.42     2.48             1.98     (1.41, 6.38)
µ         7.839    1.376            7.784    (5.228, 10.710)
ρ         .84      .14              .88      (.49, .99)
σ²        .368     .249             .305     (.080, 1.036)

The frailty correlation is restricted by −1/2 ≤ ρ ≤ 1, and we specified a uniform prior for ρ on (−1/2, 1) (Ibrahim et al. 1991, p. 138). The posterior estimates of the parameters given in Table 1 are based on an inverse Γ(.1, .1) prior for α and σ², and a N(0, 10³) prior for log(µ). As in the example of Section 4.3, we have investigated different choices of priors, but they had little influence on the posterior estimates. The Gibbs sampler was started with initial values α_0 = 1, µ_0 = 1, σ_0² = 1, and ρ_0 = 0. The chains were run for 30,000 iterations, with the first 10,000 iterations discarded as the burn-in period.

We fit the multivariate lognormal frailty model with general correlation ρ (model 1), then the same model with the constraint ρ = 0 (model 2), then a univariate lognormal frailty model (model 3), and finally, a multivariate-t frailty model with k = 3 degrees of freedom (model 4). The models can be compared using the deviance information criterion (DIC), with smaller values indicating a better fit. Not surprisingly, model 2, which implies independence of failure times, has the worst fit. This is consistent with the results of testing that ρ = 0 in the other models; because none of the credible intervals for ρ contain 0, the hypothesis of independence can be rejected. Model 1 has the best fit, followed by model 4. Therefore, the multivariate lognormal frailty seems preferable to the multivariate-t frailty in this case. Note that the difference in the goodness of fit between models 1 and 3 (the multivariate versus univariate frailty) is quite substantial and that the estimated ρ is rather high. This provides an example of a case when a multivariate frailty model gives a much better fit to the data than a univariate frailty model, even though the association is strong and positive.
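The DIC comparison can be reproduced from MCMC output with the usual definition DIC = D̄ + p_D, where D̄ is the posterior mean of the deviance and p_D = D̄ − D(θ̄) is the effective number of parameters, with D(θ̄) the deviance at the posterior mean. A minimal sketch (NumPy assumed, hypothetical function name) follows, together with the ranking implied by the DIC values reported in Table 1.

```python
import numpy as np


def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD): DIC = Dbar + pD, with Dbar the mean deviance and pD = Dbar - D(theta_bar)."""
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


print(dic([10.0, 12.0, 14.0], 11.0))  # (13.0, 1.0)

# Ranking the four models of Table 1 by their reported DIC (smaller is better)
table1_dic = {"model 1": 285.271, "model 2": 366.906, "model 3": 332.312, "model 4": 314.242}
print(sorted(table1_dic, key=table1_dic.get))  # ['model 1', 'model 4', 'model 3', 'model 2']
```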

4.2 Tumorigenesis Data

Mantel, Bohidar, and Ciminera (1977) reported data from a litter-matched tumorigenesis experiment. The experiment involved 50 male and 50 female litters, each of three rats. Two rats in each litter served as controls, and the remaining rat received a drug. The data recorded are the time to tumor appearance. Censoring was induced by death from other causes, as well as by the end of study after 104 weeks.

Common genetic factors and shared carcinogenic exposure induce association in the times to tumor appearance between litter mates. To assess the treatment effect, it is, therefore, important to account for intralitter dependence. Note, in particular, that it seems reasonable to assume exchangeability of the responses within a litter. The sample of male rats was heavily censored, because there were only two male rats that developed tumors. Therefore, we restrict our analysis to the subset of the data concerning the female rats. Here treatment is an individual-level covariate.

We now fit the multivariate lognormal frailty model (3) to the female rats’ tumor time data, with treatment as a binary individual-level covariate (control x = 0 or drug x = 1). The results reported here are based on an inverse Γ(.1, .1) prior for σ², a Γ(.1, .1) prior for α, a N(0, 10⁵) prior for log(µ), and a N(0, 10⁶) prior for β. Because the cluster size is R = 3, the frailty correlation is restricted by −1/2 ≤ ρ ≤ 1, and an appropriate noninformative prior for ρ is given by the uniform distribution on (−1/2, 1) (Ibrahim et al. 1991, p. 138). To study the sensitivity of the results to other choices of prior for ρ, however, we also fit the model with diffuse beta-type priors on (−1/2, 1). The results are displayed in Table 2. It can be seen that the estimates were robust to the prior specification. Also, not shown, we found that different choices of diffuse priors for α, σ², β, and log(µ) had little influence on the estimates of µ and β. The Gibbs sampler was started with initial values α_0 = 1, µ_0 = 1, σ_0² = 1, ρ_0 = 0, and β_0 = 0. The chain was run for 50,000 iterations, with the first 10,000 iterations discarded as the burn-in period. Table 2 presents the mean, standard deviation, and median of the marginal posterior distributions of the parameters, as well as the 90% credible intervals. It appears that the chain of samples from the marginal posterior distributions of σ² and ρ has very slow mixing properties, and this is corroborated by the relatively large standard errors.

Table 2. Female Rat Tumor Time Data: Parameter Estimates With Different Priors for Frailty Correlation ρ

Variable  Mean      Standard error   Median    90% credible intervals

Prior: ρ ∼ U(−.5, 1)
α         6.620     5.110            4.830     (3.130, 16.600)
µ         163.000   19.140           160.700   (136.200, 198.500)
β         .260      .103             .255      (.097, .437)
ρ         .443      .257             .434      (.043, .889)
σ²        .171      .092             .158      (.052, .340)

Prior: ρ = −.5 + 1.5u, u ∼ beta(ζ, ζ), ζ ∼ Γ(.1, .1)
α         5.904     4.491            4.205     (2.954, 19.950)
µ         164.600   17.890           161.800   (140.000, 208.300)
β         .243      .107             .236      (.045, .480)
ρ         .591      .340             .607      (−.040, .999)
σ²        .152      .080             .138      (.039, .340)

Prior: ρ = −.5 + 1.5u, u ∼ beta(ζ, ζ), ζ ∼ Γ(1, .01)
α         5.588     3.481            4.431     (2.948, 16.460)
µ         163.600   15.920           162.800   (136.300, 197.700)
β         .264      .095             .263      (.086, .452)
ρ         .506      .249             .503      (.022, .925)
σ²        .147      .082             .133      (.030, .321)

These litter tumorigenesis data were also analyzed by Klein and Moeschberger (2003, pp. 429–435), who fit a Cox proportional hazards model first assuming independence and then with a univariate gamma frailty. They found no evidence of a litter effect in this experiment, and their estimate of the treatment effect β is close to the estimate in Table 2. Ripatti, Larsen, and Palmgren (2002) fit a univariate frailty model with a lognormal frailty to the litter tumorigenesis data. Hougaard (2000, p. 284) reported analyses of this dataset using Weibull models with univariate stable and gamma frailty. All of these analyses found only a slight intracluster dependence.

Unlike the univariate frailty models, our approach using a multivariate frailty model allows for negative correlation within a litter. The independence case no longer lies on the boundary of the parameter space, and, thus, testing whether the correlation is statistically significant becomes straightforward. (When the null value is on the boundary of the parameter space, testing is more complicated; see, e.g., Andersen, Borgan, Gill, and Keiding 1993, p. 663.) For the rat example, the 90% credible interval for the frailty correlation ρ does not include 0 under the uniform prior or the Γ(1, .01) hyperprior (its lower limit is −.040 under the Γ(.1, .1) hyperprior), and, hence, there is some suggestive evidence of a positive dependence among litter mates.

4.3 Breaking Strengths of Rigging Lines

Crowder et al. (1991, table 7.1) reported data from an experiment on the breaking strengths of parachute rigging lines. The test was carried out on eight lines on each of six parachutes, and the strength was measured on each line at six equally spaced positions at increasing distances from the hem. There is a clear difference in strength between the parachutes due to different exposure times to weathering.

In an attempt to determine how breaking strength varies with line and position, we fit the frailty model (3) to the parachute dataset. In fact, we need to use a slight generalization to allow one of the covariates to be modeled as a random effect rather than a fixed effect. The correlation between measurements on the same line is modeled by choosing each line to be a cluster. We model the breaking strengths as exchangeable after accounting for the variable “distance from the hem” (coded as “positions” with values 1, …, 6), an individual-level covariate. As suggested in the analysis of Crowder et al. (1991, p. 141), we might start by considering a linear effect of this covariate. A parachute random effect is introduced in order to model the heterogeneity in breaking strengths among parachutes. Model (3) thus becomes

$$T_{kji} \mid (W_{kj}, u_k, x_{kji}) \sim \text{Weibull}\bigl(\alpha,\ \exp(W_{kji} + u_k + \beta x_{kji})/\mu\bigr) \tag{8}$$

for i = 1, …, 6, j = 1, …, 8, and k = 1, …, 6, where u_k ∼ N(0, σ_u²). Here k is the index for parachutes, j for lines, and i for position on the line. The parachute random effects u_1, …, u_6 will also be sampled at each iteration of the Gibbs sampler, and the posterior conditional distributions (A.2)–(A.7) will include the variance of the random effect σ_u² as well.

Because R = 6, the frailty correlation is restricted by −1/5 ≤ ρ ≤ 1, and we specified a uniform prior for ρ on (−1/5, 1) (Ibrahim et al. 1991, p. 138). To check the sensitivity of our estimates, we specified a range of different priors: gamma and inverse gamma for α and σ², and normal with variances between 10³ and 10⁶ for β and log(µ). These choices of priors had little influence on the estimates of µ and β. The results reported here are based on an inverse Γ(.1, .1) prior for α, σ², and σ_u², a N(0, 10⁵) prior for log(µ), and a N(0, 10⁶) prior for β. The Gibbs sampler was started with initial values α_0 = 1, µ_0 = 1, σ_0² = 1, ρ_0 = 0, σ_u0² = 1, and β_0 = 0.

Table 3. Rigging-Line Breaking-Strength Data: Parameter Estimates

Variable  Mean      Standard error   Median    90% credible intervals
α         42.13     5.54             41.49     (34.52, 52.09)
µ         1,412.00  199.50           1,382.00  (1,132.00, 1,709.00)
β         −.0092    .0013            −.0092    (−.0113, −.0071)
ρ         .896      .050             .903      (.803, .962)
σ²        .0051     .0012            .0050     (.0035, .0073)
σ_u²      .178      .220             .118      (.032, .512)

The chain was run for 25,000 iterations, with the first 10,000 iterations discarded as the burn-in period. The posterior estimates are given in Table 3. The chain of samples from the marginal posterior distributions of σ² and ρ has very slow mixing properties, mainly due to the small number of clusters in the dataset. The estimated β is negative; because the conditional hazard is proportional to exp(αβx), the hazard decreases, and hence the breaking strength tends to increase, with position. This agrees with Crowder’s conclusion that the strength of the line decreases as one moves from position 6 down to position 1. The estimate for the frailty correlation is high: ρ̂ = .90 with 90% credible interval (.80, .96).

5. CONCLUSION

The model discussed in this article extends both the shared univariate frailty model, which only allows for positive dependence, and the univariate nonshared frailty model, which captures unobserved heterogeneity due to neglected individual covariates. More complex dependence structures could easily be modeled by relaxing the exchangeability assumption and allowing different values for the frailty correlations in Σ_r. Our computational experience shows that Gibbs sampling is an efficient approach to estimation in the Bayesian framework. As expected, the computational complexity and the time to convergence of the Markov chains increased with cluster size, but they remained reasonable in all the applications that we investigated.

ACKNOWLEDGMENTS

This research was supported in part by grant R01 CA66218 from the U.S. National Institutes of Health and by an RAMD grant from London Business School.

APPENDIX: IMPLEMENTATION OF THE GIBBS SAMPLER

The Gibbs sampler proceeds by successively updating each variable by sampling from its conditional distribution given the current values of all other variables. Under mild conditions, the resulting Markov chain can be shown to converge. After a sufficiently large number of iterations, the values of the updated variables therefore form a (dependent) sample from the joint distribution (see, e.g., Robert and Casella 1999).
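The updating scheme can be illustrated on a toy target, a standard bivariate normal with correlation ρ. Its two full conditionals are $N(\rho y, 1-\rho^2)$ for x and $N(\rho x, 1-\rho^2)$ for y.

The Python sketch below is illustrative only. The target and the function name are assumptions, unrelated to the frailty model. The same sweep structure carries over when the two normal draws are replaced by draws from the full conditionals of the frailty model.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each sweep draws x from p(x | y) and then y from p(y | x), always
    conditioning on the most recent value of the other variable.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of each full conditional
    x, y = 0.0, 0.0                       # arbitrary starting values
    kept = []
    for it in range(n_iter):
        x = rng.gauss(rho * y, cond_sd)   # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)   # y | x ~ N(rho * x, 1 - rho^2)
        if it >= burn_in:                 # discard the burn-in sweeps
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8, n_iter=20_000, burn_in=1_000)
    mean_xy = sum(x * y for x, y in draws) / len(draws)
    print(len(draws), round(mean_xy, 2))  # E[xy] equals rho for this target
```

Note that the y update uses the x value drawn in the same sweep. Updating both coordinates from the previous sweep's values would not be a Gibbs sampler.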

The Gibbs sampler requires all the full conditional posterior distributions, which can be derived from the joint posterior:

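• The random vectors $\mathbf{w}_1, \ldots, \mathbf{w}_K$ are conditionally independent, and $\mathbf{w}_k \mid \mathbf{y}_k, \boldsymbol{\delta}_k, \boldsymbol{\theta}$ has density proportional to

$$
\phi_{r_k}(\mathbf{w}_k)\exp\left[\alpha\sum_{j=1}^{r_k} w_{kj}\delta_{kj} - \mu^{-\alpha}\sum_{j=1}^{r_k}\exp\{\alpha(w_{kj}+x_{kj}\beta)\}\,y_{kj}^{\alpha}\right]. \tag{A.1}
$$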

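• The posterior distribution of α conditional on $(\mu, \beta, \rho, \sigma^2)$, $\{\mathbf{y}_k\}$, $\{\boldsymbol{\delta}_k\}$, $\{\mathbf{x}_k\}$, and $\{\mathbf{w}_k\}$ has density proportional to

$$
p(\alpha)\prod_{k=1}^{K}\prod_{j=1}^{r_k}\left[\left\{\alpha\mu^{-\alpha}e^{\alpha(w_{kj}+x_{kj}\beta)}y_{kj}^{\alpha-1}\right\}^{\delta_{kj}}\exp\left\{-\mu^{-\alpha}e^{\alpha(w_{kj}+x_{kj}\beta)}y_{kj}^{\alpha}\right\}\right]. \tag{A.2}
$$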

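• The posterior distribution of µ conditional on $(\alpha, \beta, \rho, \sigma^2)$, $\{\mathbf{y}_k\}$, $\{\boldsymbol{\delta}_k\}$, $\{\mathbf{x}_k\}$, and $\{\mathbf{w}_k\}$ has density proportional to

$$
p(\mu)\,\mu^{-\alpha\sum_{k,j}\delta_{kj}}\exp\left\{-\mu^{-\alpha}\sum_{k=1}^{K}\sum_{j=1}^{r_k}y_{kj}^{\alpha}\exp[\alpha(w_{kj}+x_{kj}\beta)]\right\}. \tag{A.3}
$$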

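• The posterior distribution of β conditional on $(\alpha, \mu, \rho, \sigma^2)$, $\{\mathbf{y}_k\}$, $\{\boldsymbol{\delta}_k\}$, $\{\mathbf{x}_k\}$, and $\{\mathbf{w}_k\}$ has density proportional to

$$
p(\beta)\exp\left\{\left(\sum_{k=1}^{K}\sum_{j=1}^{r_k}x_{kj}\delta_{kj}\right)\alpha\beta - \mu^{-\alpha}\sum_{k=1}^{K}\sum_{j=1}^{r_k}y_{kj}^{\alpha}\exp[\alpha(w_{kj}+x_{kj}\beta)]\right\}. \tag{A.4}
$$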

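• The posterior distribution of ρ conditional on $(\alpha, \mu, \beta, \sigma^2)$, $\{\mathbf{y}_k\}$, $\{\boldsymbol{\delta}_k\}$, $\{\mathbf{x}_k\}$, and $\{\mathbf{w}_k\}$ has density proportional to

$$
p(\rho)\,\sigma^{-\sum_{k=1}^{K} r_k}\,(1-\rho)^{-\sum_{k=1}^{K}(r_k-1)/2}\prod_{k=1}^{K}\{1+(r_k-1)\rho\}^{-1/2}\,f(\rho,\sigma^2,\{\mathbf{w}_k\}), \tag{A.5}
$$

where

$$
f(\rho,\sigma^2,\{\mathbf{w}_k\}) = \exp\left[-\sum_{k=1}^{K}\frac{\{1+(r_k-2)\rho\}S_{k1} - 2\rho S_{k2}}{2\sigma^2(1-\rho)\{1+(r_k-1)\rho\}}\right],
$$

with $S_{k1} = \sum_{j=1}^{r_k} w_{kj}^2$ and $S_{k2} = \sum_{i<j} w_{ki}w_{kj}$. To see this, note that the posterior conditional distribution of ρ is proportional to

$$
p(\rho)\left\{\prod_{k=1}^{K}\phi_{r_k}(\mathbf{w}_k;\rho,\sigma^2)\right\} \propto p(\rho)\prod_{k=1}^{K}\frac{1}{(2\pi)^{r_k/2}|\Sigma_{r_k}|^{1/2}}\exp\left\{-\frac{1}{2}\mathbf{w}_k'\Sigma_{r_k}^{-1}\mathbf{w}_k\right\}. \tag{A.6}
$$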

TECHNOMETRICS, AUGUST 2006, VOL. 48, NO. 3


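But $|\Sigma_r| = \sigma^{2r}(1-\rho)^{r-1}\{1+(r-1)\rho\}$ and

$$
\Sigma_r^{-1} = \frac{1}{\sigma^2(1-\rho)\{1+(r-1)\rho\}}\left[I_r\{1+(r-1)\rho\} - \rho\,\mathbf{1}_r\right],
$$

where $I_r$ is the identity matrix of order $r$ and $\mathbf{1}_r$ is the $r \times r$ matrix of 1's. Hence

$$
\mathbf{w}_k'\Sigma_{r_k}^{-1}\mathbf{w}_k = \frac{\{1+(r_k-1)\rho\}S_{k1} - \rho\left(\sum_{j=1}^{r_k} w_{kj}\right)^2}{\sigma^2(1-\rho)\{1+(r_k-1)\rho\}}.
$$

Because $(\sum_j w_{kj})^2 = S_{k1} + 2\sum_{i<j} w_{ki}w_{kj}$, the numerator equals $\{1+(r_k-2)\rho\}S_{k1} - 2\rho\sum_{i<j} w_{ki}w_{kj}$. Substituting this and $|\Sigma_{r_k}|$ into (A.6) gives (A.5).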

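• The posterior distribution of σ² conditional on $(\alpha, \mu, \beta, \rho)$, $\{\mathbf{y}_k\}$, $\{\boldsymbol{\delta}_k\}$, $\{\mathbf{x}_k\}$, and $\{\mathbf{w}_k\}$ has density proportional to

$$
p(\sigma^2)\,(\sigma^2)^{-\sum_{k=1}^{K} r_k/2}\,f(\rho,\sigma^2,\{\mathbf{w}_k\}). \tag{A.7}
$$

This follows from arguments similar to those used to derive (A.5).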
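The full conditionals (A.1) through (A.7) are easiest to check and to sample from when written as log densities. The Python sketch below is an illustration, not the authors' code (they point to WinBUGS). It rests on several assumptions:

* $x_{kj}$ is a scalar covariate;
* $\delta_{kj} \in \{0, 1\}$ marks an observed failure;
* the log-prior functions are supplied by the user;
* all function names are hypothetical.

The exchangeable normal term $\phi_{r_k}$ is evaluated through the closed forms for $|\Sigma_r|$ and $\Sigma_r^{-1}$, with the cross-product sum taken over pairs $i < j$.

```python
import math

NEG_INF = float("-inf")


def exch_quad_logdet(w, rho, sigma2):
    """Return (w' Sigma_r^{-1} w, log|Sigma_r|) for Sigma_r = sigma2*[(1-rho)I + rho*J]."""
    r = len(w)
    s1 = sum(v * v for v in w)                                        # S_k1
    s2 = sum(w[i] * w[j] for i in range(r) for j in range(i + 1, r))  # pairs i < j
    c = 1.0 + (r - 1) * rho
    quad = ((1.0 + (r - 2) * rho) * s1 - 2.0 * rho * s2) / (sigma2 * (1.0 - rho) * c)
    logdet = r * math.log(sigma2) + (r - 1) * math.log(1.0 - rho) + math.log(c)
    return quad, logdet


def log_phi(w, rho, sigma2):
    """Log density of an exchangeable normal frailty vector w ~ N_r(0, Sigma_r)."""
    quad, logdet = exch_quad_logdet(w, rho, sigma2)
    return -0.5 * (len(w) * math.log(2.0 * math.pi) + logdet + quad)


def weibull_loglik(alpha, mu, beta, y, delta, x, w):
    """Log of the bracketed factors of (A.2) for one cluster, given its frailties.

    delta[j] is 1 for an observed failure and 0 for a censored time; x[j] is
    assumed to be a scalar covariate.
    """
    total = 0.0
    for yj, dj, xj, wj in zip(y, delta, x, w):
        eta = alpha * (wj + xj * beta)
        if dj:  # density factor, observed failures only
            total += math.log(alpha) - alpha * math.log(mu) + eta + (alpha - 1.0) * math.log(yj)
        total -= mu ** (-alpha) * math.exp(eta) * yj ** alpha  # survivor factor, every unit
    return total


def log_cond_w(wk, yk, dk, xk, alpha, mu, beta, rho, sigma2):
    """(A.1) up to an additive constant: log phi plus the cluster likelihood."""
    return log_phi(wk, rho, sigma2) + weibull_loglik(alpha, mu, beta, yk, dk, xk, wk)


def log_cond_weibull(alpha, mu, beta, clusters, log_prior):
    """(A.2)-(A.4) up to a constant: hold two of (alpha, mu, beta) fixed, vary the third.

    clusters is a list of (y_k, delta_k, x_k, w_k) tuples; log_prior is a
    hypothetical user-supplied function of (alpha, mu, beta).
    """
    if alpha <= 0.0 or mu <= 0.0:
        return NEG_INF
    return log_prior(alpha, mu, beta) + sum(
        weibull_loglik(alpha, mu, beta, y, d, x, w) for (y, d, x, w) in clusters
    )


def log_cond_frailty(rho, sigma2, w_clusters, log_prior):
    """(A.5) and (A.7) up to a constant: log prior plus the sum of log phi_{r_k}."""
    r_max = max(len(w) for w in w_clusters)
    lower = -1.0 / (r_max - 1) if r_max > 1 else -1.0  # keeps each Sigma_{r_k} positive definite
    if sigma2 <= 0.0 or not lower < rho < 1.0:
        return NEG_INF
    return log_prior(rho, sigma2) + sum(log_phi(w, rho, sigma2) for w in w_clusters)
```

Each function returns a log density only up to an additive constant. That is all a grid-based or Metropolis update needs.

If the prior is placed on log(µ), as in the rigging-line analysis, and µ itself is the variable being updated, `log_prior` must include the Jacobian term −log µ.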

Sampling from the conditional posterior distributions can be carried out with a griddy Gibbs approach (Ritter and Tanner 1992). In practice, the Gibbs sampler may be implemented in WinBUGS (Spiegelhalter et al. 2003), and convergence diagnostics are usually computed with CODA (Cowles and Carlin 1996).
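A griddy Gibbs update handles a full conditional that has no standard form, such as (A.5) for ρ. The procedure has three steps:

1. Evaluate the unnormalized density on a grid.
2. Accumulate the values into an approximate cumulative distribution.
3. Invert that distribution at a uniform draw.

The Python sketch below is a simplified, piecewise-constant version of that idea. Ritter and Tanner's sampler may interpolate between grid points differently. The function name, the grid size, and the choice of bounds are assumptions. The demo uses a Beta(3, 5) kernel only because its mean, 3/8, is known.

```python
import bisect
import math
import random


def griddy_gibbs_draw(log_density, lo, hi, n_grid=200, rng=random):
    """Draw one value from a univariate density known only up to a constant.

    The log density is evaluated at the midpoints of n_grid equal cells on
    (lo, hi); a cell is chosen with probability proportional to its density
    value, and the draw is placed uniformly inside that cell. At least one
    midpoint must have a finite log density.
    """
    width = (hi - lo) / n_grid
    mids = [lo + (i + 0.5) * width for i in range(n_grid)]
    logs = [log_density(t) for t in mids]
    top = max(logs)                       # subtract the max to avoid overflow
    cdf, total = [], 0.0
    for v in logs:
        total += math.exp(v - top)        # exp(-inf) == 0.0 for excluded points
        cdf.append(total)
    u = rng.random() * total
    i = min(bisect.bisect_right(cdf, u), n_grid - 1)  # first cell with cdf > u
    return mids[i] + (rng.random() - 0.5) * width


if __name__ == "__main__":
    # Illustration on a Beta(3, 5) kernel, whose mean is 3/8.
    gen = random.Random(0)

    def kernel(t):
        return 2.0 * math.log(t) + 4.0 * math.log(1.0 - t)

    draws = [griddy_gibbs_draw(kernel, 0.0, 1.0, n_grid=100, rng=gen) for _ in range(5000)]
    print(sum(draws) / len(draws))
```

For ρ, natural bounds are lo = −1/(r_max − 1) and hi = 1, assuming the largest cluster size r_max exceeds 1. Grid midpoints never touch those endpoints, where (A.5) is undefined.

The grid is rebuilt at every call because the conditional changes with the current values of the other parameters.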


REFERENCES

Andersen, P. K., Borgan, O., Gill, R., and Keiding, N. (1993), Statistical Models Based on Counting Processes, New York: Springer-Verlag.

Costigan, T. M., and Klein, J. P. (1993), “Multivariate Survival Analysis Based on Frailty Models,” in Advances in Reliability, ed. A. P. Basu, Amsterdam: Elsevier, pp. 43–58.

Cowles, M. K., and Carlin, B. P. (1996), “Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review,” Journal of the American Statistical Association, 91, 883–904.

Cox, D. R. (1972), “Regression Models and Life Tables” (with discussion), Journal of the Royal Statistical Society, Ser. B, 34, 187–220.

Crowder, M. J., Kimber, A. C., Smith, R. L., and Sweeting, T. J. (1991), Statistical Analysis of Reliability Data, London: Chapman & Hall.

Gelfand, A. E., and Smith, A. F. M. (1990), “Sampling-Based Approaches to Calculating Marginal Densities,” Journal of the American Statistical Association, 85, 398–409.

Geman, S., and Geman, D. (1984), “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Hollander, M., and Pena, E. A. (1995), “Dynamic Reliability Models With Conditional Proportional Hazards,” Lifetime Data Analysis, 1, 377–401.

Hougaard, P. (1989), “Fitting a Multivariate Failure Time Distribution,” IEEE Transactions on Reliability, 38, 444–448.

Hougaard, P. (1995), “Frailty Models for Survival Data,” Lifetime Data Analysis, 1, 255–273.

Hougaard, P. (2000), Analysis of Multivariate Survival Data, New York: Springer-Verlag.

Ibrahim, J. G., Chen, M. H., and Sinha, D. (1991), Bayesian Survival Analysis, New York: Springer-Verlag.

Jaisingh, L. R., Dey, D. K., and Griffith, W. S. (1993), “Properties of a Multivariate Survival Distribution Generated by a Weibull and Inverse Gaussian Mixture,” IEEE Transactions on Reliability, 42, 618–622.

Kim, H., and Kvam, P. H. (2004), “Reliability Estimation Based on System Data With an Unknown Load Share Rule,” Lifetime Data Analysis, 10, 83–94.

Klein, J. P., and Moeschberger, M. (2003), Survival Analysis, New York: Springer-Verlag.

Kotz, S., Balakrishnan, N., and Johnson, N. L. (2000), Continuous Multivariate Distributions, Vol. 1 (2nd ed.), New York: Wiley.

Kvam, P. H., and Pena, E. A. (2005), “Estimating Load-Sharing Properties in a Dynamic Reliability System,” Journal of the American Statistical Association, 100, 262–272.

Lawless, J. F. (2002), Statistical Models and Methods for Lifetime Data (2nd ed.), New York: Wiley.

Lindeboom, M., and Van Den Berg, G. J. (1994), “Heterogeneity in Models for Bivariate Survival: The Importance of the Mixing Distribution,” Journal of the Royal Statistical Society, Ser. B, 56, 1016–1022.

Lindley, D. V., and Singpurwalla, N. D. (1986), “Multivariate Distributions for the Life Lengths of Components of a System Sharing a Common Environment,” Journal of Applied Probability, 23, 418–431.

Mantel, N., Bohidar, N. R., and Ciminera, J. L. (1977), “Mantel–Haenszel Analyses of Litter-Matched Time-to-Response Data, With Modifications for Recovery of Interlitter Information,” Cancer Research, 37, 3863–3868.

Nayak, T. K. (1987), “Multivariate Lomax Distribution: Properties and Usefulness in Reliability Theory,” Journal of Applied Probability, 24, 170–177.

Ripatti, S., Larsen, K., and Palmgren, J. (2002), “Maximum Likelihood Inference for Multivariate Frailty Models Using an Automated Monte Carlo EM Algorithm,” Lifetime Data Analysis, 8, 349–360.

Ritter, C., and Tanner, M. A. (1992), “The Gibbs Stopper and the Griddy Gibbs Sampler,” Journal of the American Statistical Association, 87, 861–868.

Robert, C. P., and Casella, G. (1999), Monte Carlo Statistical Methods, New York: Springer-Verlag.

Roy, D. (2001), “Some Properties of a Classification System for Multivariate Life Distributions,” IEEE Transactions on Reliability, 50, 214–220.

Sahu, S. K., and Dey, D. K. (2000), “A Comparison of Frailty and Other Models for Bivariate Survival Data,” Lifetime Data Analysis, 6, 207–227.

Sahu, S. K., Dey, D. K., Aslanidou, H., and Sinha, D. (1997), “A Weibull Regression Model With Gamma Frailties for Multivariate Survival Data,” Lifetime Data Analysis, 3, 123–137.

Spiegelhalter, D. J., Thomas, A., Best, N. G., and Lunn, D. (2003), WinBUGS Version 1.4 User Manual, Cambridge, U.K.: MRC Biostatistics Unit.

Whitmore, G. A., and Lee, M. T. (1991), “A Multivariate Survival Distribution Generated by an Inverse Gaussian Mixture of Exponentials,” Technometrics, 33, 39–50.

Xue, X., and Brookmeyer, R. (1996), “Bivariate Frailty Model for the Analysis of Multivariate Survival Time,” Lifetime Data Analysis, 2, 277–289.
