Gabriel Escarela and Jacques F. Carriere. Fitting competing risks with an assumed copula. Statistical Methods in Medical Research (Stat Methods Med Res) 2003 12: 333. DOI: 10.1191/0962280203sm335ra. The online version of this article can be found at: http://smm.sagepub.com/content/12/4/333. Published by: http://www.sagepublications.com. Version of Record: Aug 1, 2003. Downloaded from smm.sagepub.com at UNIV OF VIRGINIA on September 14, 2012.

Fitting competing risks with an assumed copula


Gabriel Escarela Departamento de Matemáticas, Universidad Autónoma Metropolitana, Unidad Iztapalapa, Mexico DF, Mexico and Jacques F. Carrière Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada

We propose a fully parametric model for the analysis of competing risks data where the types of failure may not be independent. We show how the dependence between the cause-specific survival times can be modelled with a copula function. Features include: identifiability of the problem; accessible understanding of the dependence structures; and flexibility in choosing marginal survival functions. The model is constructed in such a way that it allows us to adjust for concomitant variables and for a dependence parameter to assess the effects of these on each marginal survival model and on the relationship between the causes of death. The methods are applied to a prostate cancer data set. We find that, with the copula model, more accurate inferences are obtained than with the use of a simpler model such as the independent competing risks approach.

1 Introduction

A number of regression models for competing risks data have been proposed.¹⁻⁴ Most of these models, however, can be criticized on the basis of their unwarranted assumptions, difficulty of interpretation and susceptibility to overfit the data. In particular, the use of the semiparametric cause-specific Cox proportional hazards has become routine. This method implies an infinite-dimensional specification for the cause-specific hazard functions. Therefore, if one’s interest lies in estimating the cause-specific survival probabilities, for instance, the resulting estimated hazards will generally be very erratic stepwise functions and will have very wide confidence bands. The contribution of this article is to suggest a flexible class of parametric specifications of the multivariate survival function that provides a number of desirable elements for modelling competing risks.

A precise parametric specification of a dependent competing risks model should lead to more efficient estimation of regression parameters and related quantities; it should also provide the basis for a more concise summary of the data and enhance the understanding of the cause-specific failure-time process. Most fully parametric survival models in the competing risks framework, nevertheless, have assumed the events to be mutually independent.³,⁵,⁶ The main argument to use such an independent structure has been that for most dependent models the relationship between the $m$ cause-specific survival times is not identifiable.⁷ Thus, the choice of the independent model has been a result both of its tractability and the identifiability of the problem. It is common, however, to find situations where the assumption of independence may be questionable; furthermore, there might be situations where researchers would like to assess the degree of dependence between the competing survival times in the presence of covariates.

Address for correspondence: Dr Gabriel Escarela, Departamento de Matemáticas, Universidad Autónoma Metropolitana, Unidad Iztapalapa, AT-223, Av. San Rafael Atlixco No. 186, Col. Vicentina, CP 09340, Mexico DF, Mexico. E-mail: [email protected]

Statistical Methods in Medical Research 2003; 12: 333–349

© Arnold 2003 10.1191/0962280203sm335ra

In this article, we formulate a methodology that extends standard statistical models for competing risks. First, we believe that there are many situations in which the cause-specific risks of failure may not be independent, and therefore choose a copula model, which allows for a range of dependence structures through a dependence parameter (or a set of parameters) as well as allowing for separate marginal models to act in the survival times on each risk. Secondly, we also believe that the covariates will act differently on different types of failure, and thus we extend the model into a more general competing risks framework. This general model may be simplified for any particular data set; examination of the parameter estimates will inform us whether the complexity of the dependent competing risks is necessary.

In section 2 below we introduce the general competing risks set-up and provide a description of the identifiability problem. A copula approach is introduced in section 3, where the extended dependent competing risks model is described, which is then further developed to allow the marginal hazards to depend on covariates. The models are fitted and calibrated to a prostate cancer data set in section 4, where particular attention is paid to the fitting procedure of both the copula model and the independent approach, in order to compare the coefficients; the paper concludes with a discussion of these results and of the study.

2 The competing risks framework

Consider the competing risks set-up with $m$ failure types. Let $z_i$ be a $p \times 1$ covariate vector and $T_{ij}$ be the $j$th conceptual failure time for the $i$th subject, where $j = 1, \dots, m$ and $i = 1, \dots, n$. In this case, we make use of the data set $\{X_i, \psi_{ij}\}$, where $X_i = \min(T_i, C_i)$, $T_i = \min\{T_{ij};\ j = 1, \dots, m\}$ is the so-called actual survival time, and $C_i$ is the censoring time; here, the status indicator matrix $\psi_{ij}$ is defined as $\psi_{ij} = I(T_i = T_{ij})$ and $\psi_{i\cdot} = \sum_{j=1}^{m} \psi_{ij}$ is the usual censoring status vector. To avoid ambiguity, we assume that $T_{ij} \neq T_{ik}$ for $j \neq k$.

Our goal is to model the joint survival function of the random vector of cause-specific survival times $(T_1, T_2, \dots, T_m)$, often called the multiple decrement function, which is defined as follows:

$$S(t_1, \dots, t_m) = \Pr\{T_1 > t_1, \dots, T_m > t_m\}. \tag{1}$$

Throughout this study, we assume that $S(t_1, \dots, t_m)$ is absolutely continuous and that its marginal survival functions, defined as $S_j(t_j) = \Pr\{T_j > t_j\}$, are not defective, i.e., $\lim_{t_j \to \infty} S_j(t_j) = 0$.


The crude hazard function, also known as the ‘cause-specific hazard rate’, is defined as

$$h^{(j)}(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T_j < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f^{(j)}(t)}{S_T(t)} \tag{2}$$

where $S_T(t) = \Pr(T > t) = \Pr\{\cap_{j=1}^{m} (T_j > t)\} = S(t, t, \dots, t)$ is the overall survival function, and

$$f^{(j)}(t) = -\frac{\partial}{\partial t_j} S(t_1, \dots, t_m) \Big|_{t_k = t,\ \forall k} \tag{3}$$

is the crude density function. Analogous to the usual hazard function, that is, when the event ‘any failure’ is analysed, $h^{(j)}(t)$ can be interpreted as the instantaneous failure rate for the event of failing from cause $j$ at time $t$ given that the individual had not failed from any cause prior to $t$ in the presence of competing risks. Note that the crude hazard function $h^{(j)}(t_j)$ does not usually coincide with the marginal hazard, which is defined as

$$h_j(t_j) = \lim_{\Delta t_j \to 0} \frac{\Pr(t_j \le T_j < t_j + \Delta t_j \mid T_j \ge t_j)}{\Delta t_j} = -\frac{d}{dt_j} \{\log S_j(t_j)\}.$$

An exception is when the survival times $T_j$ are mutually independent, that is, when $S(t_1, \dots, t_m) = \prod_{j=1}^{m} \Pr(T_j > t_j)$. It is also possible to find expressions where the crude and marginal hazard functions coincide when the $T_j$s are statistically dependent.

If we observe the pair of values $(T, J)$, where $T = \min(T_1, \dots, T_m)$ and $J = \sum_{j=1}^{m} j\,I(T = T_j)$, then the time at death and the cause of death are well identified. It follows that the survival distribution corresponding to the random pair $(T, J)$ is

$$S_j^*(t) = \Pr\{T > t, J = j\} = \int_t^{\infty} f^{(j)}(z)\,dz, \tag{4}$$

which is the crude probability of eventually dying from cause $j$ at a time greater than $t$. Essentially, our objective is to calculate the marginal survival function $S_j(t)$ assuming that the form of the crude function $S_j^*(t)$, and consequently the form of $f^{(j)}(t)$, is known. If the marginal survival function $S_j(t)$ can be calculated then we say that it is identifiable.
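As a check on the independence exception noted above, the following short derivation may help. It is a sketch that uses only definitions (2) and (3); the marginal density notation $f_k = -S_k'$ is introduced here for convenience and is not part of the original text. It shows why the crude and marginal hazards coincide when the $T_j$ are mutually independent:

```latex
\begin{align*}
S(t_1,\dots,t_m) &= \prod_{k=1}^{m} S_k(t_k)
  && \text{(independence)} \\
f^{(j)}(t) &= -\frac{\partial}{\partial t_j} S(t_1,\dots,t_m)\Big|_{t_k=t,\ \forall k}
  = f_j(t)\prod_{k\neq j} S_k(t),
  \qquad f_j(t) = -\frac{d}{dt}S_j(t) \\
h^{(j)}(t) &= \frac{f^{(j)}(t)}{S_T(t)}
  = \frac{f_j(t)\prod_{k\neq j} S_k(t)}{\prod_{k=1}^{m} S_k(t)}
  = \frac{f_j(t)}{S_j(t)} = h_j(t)
\end{align*}
```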

The mathematical definition of identifiability is given as follows. Let $\mathcal{F}$ denote a family of multivariate survival functions defined in equation (1) and let $\mathcal{G}$ denote a family of crude survival functions defined in equation (4). Let $\Psi$ be a mapping from $\mathcal{F}$ onto $\mathcal{G}$, defined so that $\Psi(S) = S^*$. We say that $\mathcal{F}$ is identifiable by $\mathcal{G}$ whenever $\Psi$ is an injective function. That is, if $\Psi(S_A) = \Psi(S_B)$ then $S_A = S_B$. It is known, however, that in general $\Psi$ is not injective.⁷


In this set-up, we assume that the censoring variable $C_i$ on an individual $i$ during the experiment and the survival times corresponding to cause $j$, $T_{ij}$, are independent. Thus, the likelihood has the following form⁸

$$L_n = \prod_{i=1}^{n} \left( \prod_{j=1}^{m} \left[ f^{(j)}(X_i) \right]^{\psi_{ij}} \right) \left[ S_T(X_i) \right]^{1 - \psi_{i\cdot}} \tag{5}$$

We confine this study to the simplest situation of only two causes of failure (i.e., $m = 2$) and take a copula modelling approach for specifying the dependency structure between the failure times of the competing risks.

3 The copula model

Copula models are classes of bivariate survival distributions, specified in terms of the marginal survivor functions and a copula function, which is a continuous bivariate distribution function on the unit square $[0,1]^2$ with uniform marginals. An attractive characteristic of the copula class is that the elimination of the marginals through the copula helps to model and understand dependence structures effectively, as the dependence has no relationship with the marginal behaviour of individual characteristics.

In the competing risks context, it is possible to specify the multiple decrement function in terms of two marginal survival distributions and a copula that allows for dependence relationships among the individual random variables corresponding to each risk of failure. Thus, the survival copula $C$ of $T_j$, $j = 1, 2$, is found by making marginal probability integral transforms on $T_j$ so that the joint survival function of the $T_j$s is expressed as

$$S(t_1, t_2) = C[S_1(t_1), S_2(t_2)]$$

Carriere⁹ proved that if the marginals $S_1(t_1)$ and $S_2(t_2)$ are continuous then $C$ exists and $C$ is unique. He showed that copulas can be used to characterize the relationship between the crude survival functions and the marginal hazard functions. This relationship is made through a system of nonlinear differential equations with a unique solution that solves the problem of identifiability.

More recent research, mainly focused on the study of bivariate failure-time data,¹⁰ has concentrated on the use of a subclass of copulas that is invariant under continuous monotone transformations of each marginal component. The bivariate decrement function corresponding to this family of copulas, called Archimedean, takes the general form

$$S(t_1, t_2) = \psi_\theta\left[\psi_\theta^{-1}\{S_1(t_1)\} + \psi_\theta^{-1}\{S_2(t_2)\}\right] \tag{6}$$

where $\psi_\theta(\cdot)$ is a generator function, $\psi_\theta: [0, \infty] \to [0, 1]$, which is twice differentiable with $\psi_\theta(0) = 1$, $\psi_\theta'(\cdot) < 0$ and $\psi_\theta''(\cdot) > 0$, and $\psi_\theta^{-1}(\cdot)$ is its inverse function.


Useful models of this family of copulas include the bivariate frailty family where the generator is the Laplace transform $\psi_\theta(s) = E[\exp(-sW)] = \int_0^{\infty} e^{-sw}\,dH(w)$ corresponding to a non-negative random variable $W$, sometimes called the frailty variable, whose distribution function is $H$. For this class of functions, Oakes¹¹ showed that the survival copula defined in equation (6) asserts that $T_1$ and $T_2$ are conditionally independent given the frailty $W$, so that $S(t_1, t_2 \mid W = w) = S_1(t_1 \mid W = w)\,S_2(t_2 \mid W = w)$. Models in the Archimedean class of copulas include Gumbel’s,¹²,¹³ Frank’s,¹⁴ Clayton’s,¹⁵ and Hougaard’s.¹⁶

The analysis requires that we choose a class of copulas. Because of its well documented properties, we adopt Frank’s¹⁴ family of copulas defined as

$$C_\theta(u, v) = -\frac{1}{\theta} \log\left\{ 1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1} \right\}, \quad \theta \neq 0 \tag{7}$$

The use of Frank’s copula is appealing since it is able to capture the full range of dependence.¹⁷,¹⁸ It includes the Fréchet upper and lower bound copulas as well as the product copula. The latter case is found when $\theta \to 0$ and it defines the independent model $uv$.
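Equation (7) and its independence limit can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code: the function name `frank_copula` and the small-$\theta$ cut-off `1e-10` are assumptions made here, and the function is not intended for very large $|\theta|$, where the logarithm loses precision.

```python
import math

def frank_copula(u, v, theta):
    """Frank's copula C_theta(u, v) of equation (7).

    theta close to 0 is treated as the product (independence) copula u * v.
    The 1e-10 cut-off is an arbitrary choice for this sketch.
    """
    if abs(theta) < 1e-10:
        return u * v
    # expm1(x) = e^x - 1 and log1p(x) = log(1 + x) keep precision for small theta
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta
```

For example, `frank_copula(u, 1.0, theta)` should return `u` for any `theta` (uniform marginals), and the value at a fixed `(u, v)` should increase with `theta`, lying above `u * v` for positive `theta` and below it for negative `theta`.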

It follows that for two arbitrary survival functions $S_1$ and $S_2$, the multiple decrement function (MDF) specified by Frank’s family of copulas can be expressed as $S(t_1, t_2) = C_\theta[S_1(t_1), S_2(t_2)]$ if $\theta \neq 0$, and $S(t_1, t_2) = S_1(t_1) S_2(t_2)$ if $\theta = 0$. Differentiating this MDF with respect to $t_1$ and $t_2$, when $\theta \neq 0$, we find that the corresponding pdf is of the form

$$f(t_1, t_2) = h_1(t_1) h_2(t_2) S_1(t_1) S_2(t_2)\, \frac{\theta \exp\{-\theta[S_1(t_1) + S_2(t_2) - 2S(t_1, t_2)]\}}{1 - e^{-\theta}}$$

which is non-negative for all $t_1, t_2 > 0$.

The corresponding crude density functions defined in equation (3) can then be written as

$$f^{(j)}(t) = h_j(t) S_j(t)\, \frac{\exp\{-\theta S_j(t)\}\,[\exp\{-\theta S_{3-j}(t)\} - 1]}{(e^{-\theta} - 1) \exp\{-\theta S_T(t)\}} \quad (j = 1, 2) \tag{8}$$

where $\theta \neq 0$ and

$$S_T(t) = -\frac{1}{\theta} \log\left\{ 1 + \frac{(\exp\{-\theta S_1(t)\} - 1)(\exp\{-\theta S_2(t)\} - 1)}{e^{-\theta} - 1} \right\}$$

When $\theta = 0$, $f(t_1, t_2) = h_1(t_1) h_2(t_2) S_1(t_1) S_2(t_2)$ and $f^{(j)}(t) = h_j(t) S_j(t) S_{3-j}(t)$, which is the limit of equation (8) as $\theta \to 0$.


The copula parameter $\theta$ measures the strength of dependence. One way to interpret this parameter is through Kendall’s $\tau$ measure of association which, under Frank’s model, can be expressed as

$$\tau_\theta = 1 - \frac{4}{\theta} \left( 1 - \frac{1}{\theta} \int_0^{\theta} \frac{t}{e^t - 1}\,dt \right) \tag{9}$$

Note that $\tau_\theta = -\tau_{-\theta}$.

The identifiability of Frank’s copula model at this point seems obvious. Let $f^{(j)}(t \mid \theta)$ be the crude density function in equation (8) evaluated at $t$ when the specific value of $\theta$ is given. It is clear that if $\theta_1 \neq \theta_2$ then $C_{\theta_1}(u, v) \neq C_{\theta_2}(u, v)$ for all $(u, v)$ when $C_\theta(\cdot, \cdot)$ is as defined in equation (7). This is true because Kendall’s $\tau$ in equation (9) is a monotonic function in $\theta$. It follows that $f^{(j)}(t \mid \theta_1) \neq f^{(j)}(t \mid \theta_2)$ and thus we have proved identifiability of the dependence structure.
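Equation (9) has no closed form, but it is straightforward to evaluate numerically. The sketch below uses composite Simpson's rule, which is a choice made here for self-containedness rather than anything stated in the paper; the function names and the number of subintervals `n` are likewise assumptions.

```python
import math

def _debye_integrand(t):
    # t / (e^t - 1), using its limiting value 1 at t = 0
    return 1.0 if t == 0.0 else t / math.expm1(t)

def kendall_tau_frank(theta, n=2000):
    """Kendall's tau under Frank's copula, equation (9).

    The integral from 0 to theta is approximated with composite Simpson's rule
    on n subintervals (n must be even). The step h is signed, so negative
    theta is handled by the same code.
    """
    if theta == 0.0:
        return 0.0
    h = theta / n
    total = _debye_integrand(0.0) + _debye_integrand(theta)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * _debye_integrand(k * h)
    integral = total * h / 3.0
    return 1.0 - (4.0 / theta) * (1.0 - integral / theta)
```

Evaluating `kendall_tau_frank(5.74)` should give a value close to 0.5, and `kendall_tau_frank(4.28)` a value close to 0.41, in line with the pairs of $\theta$ and $\tau_\theta$ quoted elsewhere in the paper; the antisymmetry $\tau_\theta = -\tau_{-\theta}$ can be checked in the same way.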

In theory, any marginal distributions for non-negative random variables may be appropriate. In our study, we consider the Weibull family of survivor distributions whose survival function is specified by $S_j(t_j) = \exp\{-(\lambda_j t_j)^{\alpha_j}\}$, where $\lambda_j, \alpha_j > 0$, and the corresponding hazard function is represented as $h_j(t_j) = \alpha_j \lambda_j (\lambda_j t_j)^{\alpha_j - 1}$. Thus, when the bivariate survival function defined by Frank’s model is assigned Weibull marginals, a bivariate Weibull with absolute continuity is obtained.
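To make the combination of Weibull marginals with Frank's copula concrete, the following sketch evaluates the crude density of equation (8) and the crude hazard of equation (2) for $m = 2$. All function names, the `params` layout and the small-$\theta$ cut-off are assumptions of this illustration, not the authors' implementation.

```python
import math

def weibull_surv(t, lam, alpha):
    """Marginal Weibull survival S_j(t) = exp{-(lam * t)^alpha}."""
    return math.exp(-((lam * t) ** alpha))

def weibull_haz(t, lam, alpha):
    """Marginal Weibull hazard h_j(t) = alpha * lam * (lam * t)^(alpha - 1)."""
    return alpha * lam * (lam * t) ** (alpha - 1.0)

def overall_surv(s1, s2, theta):
    """S_T(t) = C_theta[S_1(t), S_2(t)] under Frank's copula."""
    if abs(theta) < 1e-10:
        return s1 * s2
    num = math.expm1(-theta * s1) * math.expm1(-theta * s2)
    return -math.log1p(num / math.expm1(-theta)) / theta

def crude_density(t, j, params, theta):
    """Crude density f^(j)(t), equation (8).

    params = [(lam1, alpha1), (lam2, alpha2)]; j is 1 or 2.
    """
    s = [weibull_surv(t, *p) for p in params]
    sj, sk = s[j - 1], s[2 - j]                 # own and competing marginal survival
    fj = weibull_haz(t, *params[j - 1]) * sj    # marginal density h_j * S_j
    if abs(theta) < 1e-10:
        return fj * sk                          # independence limit
    st = overall_surv(s[0], s[1], theta)
    return (fj * math.exp(-theta * sj) * math.expm1(-theta * sk)
            / (math.expm1(-theta) * math.exp(-theta * st)))

def crude_hazard(t, j, params, theta):
    """Crude hazard h^(j)(t) = f^(j)(t) / S_T(t), equation (2)."""
    s = [weibull_surv(t, *p) for p in params]
    return crude_density(t, j, params, theta) / overall_surv(s[0], s[1], theta)
```

With `params = [(0.028, 2.0), (0.039, 1.5)]`, the crude hazard at `theta = 0` should agree with `weibull_haz` for the same cause, while for a non-zero `theta` the two generally differ, which is the distinction between crude and marginal hazards drawn in section 2.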

Figure 1 shows a selection of bivariate distributional, crude distributional and crude hazard shapes for a particular case of Weibull marginals. Note that the parameters $\lambda_j$ and $\alpha_j$ only govern scaling and power transformation. The effect of the dependence parameter $\theta$ is more involved and can be broadly summarized as follows. In general, when $\theta$ decreases from 0 to $-\infty$, with the other parameters held constant, the contours move away from the origin and are squeezed towards the line $t_1 + t_2 = \text{constant}$, giving rise to negative correlation between $t_1$ and $t_2$. The crude density with the lowest mode seems to be the more affected, approaching the density with the highest mode. A similar effect is observed with the crude hazard functions, which tend to take a similar shape. When $\theta$ increases from 0 to $\infty$, the contours are pulled towards the origin and are squeezed towards the line $t_1 - t_2 = \text{constant}$, giving rise to a positive correlation between $t_1$ and $t_2$. With positive dependence, both the crude density and crude hazard functions tend to separate and change shape.

To interpret concomitant information in each marginal model, we proceed to use the log-linear form of the scale parameter $\lambda_j$:

$$\log \lambda_1 = \beta_{10} + \beta_1^T u \quad \text{and} \quad \log \lambda_2 = \beta_{20} + \beta_2^T v \tag{10}$$

which involves the vectors $u$ and $v$ of auxiliary variables, the intercept $\beta_{j0}$, and a vector $\beta_j$ of regression coefficients. Notice that the models in equation (10) allow for different covariate sets for each risk of death, that is, $u$ may contain some or all of the variables in $v$, as well as other variables not included in $v$.


Note that, if we set $\beta_j^{*T} = \alpha_j \beta_j^T$ and $\rho_j = \exp(\beta_{j0})$, the Weibull hazard model can take the form of a proportional hazards model with a Weibull baseline hazard function as follows:

$$h_j(t_j) = h_{0j}(t_j) \exp(\beta_j^{*T} z)$$

where $h_{0j}(t) = \alpha_j \rho_j (\rho_j t)^{\alpha_j - 1}$ is the baseline hazard function. Therefore, under the product copula, that is, when $\theta = 0$, the Weibull model described above is a specialization of the proportional hazards and accelerated failure time models presented by Prentice et al.²
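The step from the log-linear scale parameter in equation (10) to the proportional hazards form can be written out explicitly. The derivation below is a sketch: it writes $z$ for the covariate vector of risk $j$ and $\beta_j^{*}$ for the rescaled coefficients, a notation adopted here for clarity.

```latex
\begin{align*}
h_j(t_j) &= \alpha_j \lambda_j (\lambda_j t_j)^{\alpha_j - 1}
          = \alpha_j \lambda_j^{\alpha_j} t_j^{\alpha_j - 1},
  \qquad \lambda_j = \exp(\beta_{j0} + \beta_j^T z) = \rho_j \exp(\beta_j^T z) \\
         &= \alpha_j \rho_j^{\alpha_j} t_j^{\alpha_j - 1} \exp(\alpha_j \beta_j^T z)
          = \underbrace{\alpha_j \rho_j (\rho_j t_j)^{\alpha_j - 1}}_{h_{0j}(t_j)}
            \exp(\beta_j^{*T} z),
  \qquad \beta_j^{*} = \alpha_j \beta_j
\end{align*}
```

The same coefficients therefore have two readings: $\beta_j$ acts on the time scale (accelerated failure time), while $\alpha_j \beta_j$ acts multiplicatively on the hazard.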

Figure 1 Bivariate density contours, crude density and crude hazard curves for the selected dependence parameter values $\tau_{-5.74} = -0.5$, $\tau_0 = 0$, $\tau_{5.74} = 0.5$ of the Weibull model with $\lambda_1 = 0.028$, $\alpha_1 = 2$, $\lambda_2 = 0.039$, $\alpha_2 = 1.5$

An important characteristic of this copula model with its specific marginals is that the process of finding a parsimonious model follows standard likelihood-ratio-based ideas, such as those used in the theory of generalized linear models.¹⁹ However, this model is not linear, so numerical techniques need to be used in order to find the maximum in equation (5). We used the function nlminb in the statistical package S-PLUS²⁰ to minimize $d = -2 \times \log L_n$, the deviance. Unlike many other programs that approximate the covariance matrix with automatic routines, we used the observed information matrix; for this, we used the mathematical package MAPLE²¹ to obtain the Hessian matrix and translated the code into S-PLUS.
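The objective being minimized can be sketched as a plain function. The example below is written in Python rather than S-PLUS and is only an illustration of the deviance $d = -2 \log L_n$ from equation (5) for $m = 2$ with Frank's copula and Weibull marginals. The log-parameterization, the absence of covariates, the function names and the toy data are all assumptions made here; none of it reproduces the authors' code or the prostate cancer data.

```python
import math

def frank(u, v, theta):
    """Frank's copula, with theta near 0 treated as independence."""
    if abs(theta) < 1e-10:
        return u * v
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def deviance(par, data):
    """-2 log L_n of equation (5) for two causes.

    par  = (log_lam1, log_alpha1, log_lam2, log_alpha2, theta); logs keep
           the Weibull parameters positive (a choice made for this sketch).
    data = list of (x, cause), cause in {1, 2}, or 0 for a censored time.
    """
    lam = (math.exp(par[0]), math.exp(par[2]))
    alpha = (math.exp(par[1]), math.exp(par[3]))
    theta = par[4]
    loglik = 0.0
    for x, cause in data:
        s = [math.exp(-((lam[k] * x) ** alpha[k])) for k in (0, 1)]
        st = frank(s[0], s[1], theta)
        if cause == 0:
            loglik += math.log(st)            # censored: contributes S_T(x)
            continue
        j, k = cause - 1, 2 - cause           # failing cause and competing cause
        fj = alpha[j] * lam[j] * (lam[j] * x) ** (alpha[j] - 1.0) * s[j]
        if abs(theta) < 1e-10:
            dens = fj * s[k]
        else:
            dens = (fj * math.exp(-theta * s[j]) * math.expm1(-theta * s[k])
                    / (math.expm1(-theta) * math.exp(-theta * st)))
        loglik += math.log(dens)              # failure: contributes f^(j)(x)
    return -2.0 * loglik

# Hypothetical toy data, for illustration only
toy = [(5.0, 1), (12.0, 2), (20.0, 0), (33.0, 2), (41.0, 1), (60.0, 0)]
start = (math.log(0.028), math.log(2.0), math.log(0.039), math.log(1.5), 4.28)
print(deviance(start, toy))
```

A function of this shape could be handed to any general-purpose minimizer, for example `scipy.optimize.minimize` if SciPy is available, in the same role that nlminb plays in the paper; covariates would enter by replacing the two log-scale parameters with the linear predictors of equation (10).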

4 The prostate cancer data

4.1 The data set

We illustrate the proposed methods with the prostate cancer data set published by Andrews and Herzberg.²² Among the 483 patients with complete data, there were 125 (26%) deaths from prostate cancer, 219 from ‘other diseases’ (45%) and 139 censored observations (29%). The goal of the analysis is to compare the levels of diethylstilbestrol (DES), a drug to treat prostate cancer, with respect to survival of the patients. As a result of potentially fatal side effects from DES, such as cardiovascular-related or other types of cancer, the assessment of the actual benefit of the treatment must take into account not only the death time from prostate cancer but also from other competing causes. In addition, the study should look for important treatment × covariate interactions which might lead to the definition of subsets of patients in which treatment differences are significantly more marked or even reversed.

Various approaches and classification of the risks of death have been used to analyse the prostate cancer data set. Kay²³ and Lunn and McNeil²⁴ focused on the cause-specific Cox proportional hazards models classifying the causes of death as cancer (any), cardiovascular or other. Kay estimated regression coefficients separately for each cause with deaths from remaining causes treated as censored observations, and Lunn and McNeil used a duplication method Cox regression by stratifying the types of failure. Byar and Green²⁵ and Cheng et al.²⁶ classified the nonprostate cancer deaths in the other causes category. Byar and Green focused on the overall exponential survival model; Cheng et al. concentrated on the estimation of the cumulative incidence function when fitting the cause-specific proportional hazard model used by Kay.

In this analysis, we categorize the risks of death as prostate cancer and ‘other’ and consider the following covariate information: RX, the treatment (0 = placebo and 0.2 mg estrogen, the ‘low-dose group’, and 1 = 1.0 mg estrogen and 4 = 5.0 mg estrogen, the ‘high-dose group’); age, age of the patient at diagnosis; wt, standardized weight (weight in kg − height in cm + 200); PF, performance rating (0 = normal activity and 1 = in bed at least 50% of daytime); HX, history of cardiovascular disease (0 = no, 1 = yes); hg, haemoglobin in mg/100 ml; sz, size of primary lesion estimated in cm² from rectal examination; and sg, combined index of tumour stage and histologic grade.

4.2 Regression modelling

To resolve the choice of appropriate covariates to be included in the systematic part of each cause-specific marginal hazard, we use backward elimination (where the least important variable is successively removed until all the remaining variables are significant) to find the best models. We thus test differences in deviance of nested models against a critical value of $\chi^2_{0.95;\,\mathrm{df}=1} = 3.84$.


Maximizing the log of the likelihood specified in equation (5), we fitted both Frank’s and the product copula models with Weibull marginals as described in the previous section, so that we could make comparisons of deviances and coefficients when different marginals and copulas were assumed. The deviances of the maximized log-likelihoods for several models are displayed in Table 1.

History of cardiovascular disease did not seem important for the prostate cancer risk. Similarly, neither serum haemoglobin, size of primary tumour nor histologic grade seemed important for ‘other’ causes of death. When comparing models 4 and 5 (Table 1) we find that interaction RX × age shows significant effects under both specifications. We also found that the interaction RX × wt was significant only for the prostate cancer risk. All remaining RX × covariate interactions showed a significant lack of fit for both risks. The performance rating did not show significant effects in any risk under either approach. Thus, using either specification, model 4 is the most parsimonious.
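A single step of the deviance comparison can be illustrated with the values reported in Table 1. The sketch below compares models 5 and 6 under Frank's copula, which differ only by the RX × wt term for the prostate cancer risk (16 versus 15 parameters). The helper `chi2_sf_1df` is a name introduced here; it uses the standard identity between the one-degree-of-freedom chi-square tail and the complementary error function.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Deviances under Frank's copula as reported in Table 1
dev_model5 = 3610.31   # 16 parameters, includes RX x wt for prostate cancer
dev_model6 = 3616.28   # 15 parameters, RX x wt removed

lr_stat = dev_model6 - dev_model5          # difference in deviance, 1 df
p_value = chi2_sf_1df(lr_stat)
print(round(lr_stat, 2), lr_stat > 3.84)   # prints: 5.97 True
```

Because the difference exceeds the critical value 3.84, removing RX × wt from the prostate cancer margin worsens the fit significantly at the 5% level, which agrees with the term being retained.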

Table 1 also shows the estimated Kendall’s $\tau$ and 95% confidence intervals, based on the approximation $(\hat\theta - \theta)/\widehat{\mathrm{se}}_{\hat\theta} \sim N(0, 1)$, for each model under Frank’s specification. It is interesting to see how the dependence parameter changes from positive to negative as covariates are removed. It is possible to observe that the confidence intervals in models 1 to 6 do not include zero, which demonstrates that, in the presence of important covariates, the two risks of death are positively dependent. It is also possible to note that the confidence intervals in models 7 and 8 are very wide and do include zero. This is a fascinating result: the two cause-specific survival times do not seem to show evidence of dependence either with no covariates or with the treatment RX on its own as originally believed. Here, we observed that the dependence parameter shows a statistically significant positive association as key covariates are included in the model.

Table 1 Deviances and estimated Kendall’s $\tau$ for the prostate cancer data using Frank’s copula model and the product copula model with Weibull marginals

| Model | Formula: Prostate cancer | Formula: Other | Copula | Np | Deviance | $\hat\tau_\theta$ (95% CI) |
|---|---|---|---|---|---|---|
| 1 | Rx*(Age + Wt + PF + Hx + Hg + Sz + Sg) | Rx*(Age + Wt + PF + Hx + Hg + Sz + Sg) | Frank | 35 | 3589.71 | 0.55 (0.34, 0.69) |
| | | | Product | 34 | 3593.18 | 0 |
| 2 | Rx*(Age + Wt) + PF + Hx + Hg + Sz + Sg | Rx*Age + Wt + PF + Hx + Hg + Sz + Sg | Frank | 24 | 3595.05 | 0.57 (0.32, 0.70) |
| | | | Product | 23 | 3600.37 | 0 |
| 3 | Rx*(Age + Wt) + PF + Hx + Hg + Sz + Sg | Rx*Age + Wt + PF + Hx | Frank | 21 | 3595.91 | 0.48 (0.31, 0.59) |
| | | | Product | 20 | 3602.58 | 0 |
| 4 | Rx*(Age + Wt) + Hg + Sz + Sg | Rx*Age + Wt + Hx | Frank | 18 | 3602.16 | 0.41 (0.29, 0.51) |
| | | | Product | 17 | 3605.78 | 0 |
| 5 | Age + Rx*Wt + Hg + Sz + Sg | Rx + Age + Wt + Hx | Frank | 16 | 3610.31 | 0.34 (0.20, 0.45) |
| | | | Product | 15 | 3612.29 | 0 |
| 6 | Rx + Age + Wt + Hg + Sz + Sg | Rx + Age + Wt + Hx | Frank | 15 | 3616.28 | 0.26 (0.11, 0.37) |
| | | | Product | 14 | 3617.37 | 0 |
| 7 | Rx | Rx | Frank | 7 | 3819.46 | −0.49 (−0.76, 0.42) |
| | | | Product | 6 | 3820.55 | 0 |
| 8 | null | null | Frank | 5 | 3828.84 | −0.47 (−0.70, 0.15) |
| | | | Product | 4 | 3829.18 | 0 |

Np = number of parameters

We chose model 4 as the most satisfactory. Table 2 gives the maximum likelihood estimators and their estimated standard errors. We first concentrate on the regression coefficients and their standard errors. The table confirms that, while the interaction RX : age has a major impact on both types of failure (significant at p ≤ 0.05 in each type of death) under Frank’s copula model, its contribution is only significant upon the risk of death from prostate cancer under the independent approach. Thus, with the dependent competing risks approach, we may conclude that the treatment of DES affects the course of both the prostate cancer (PC) and ‘other’ types of death, and that it interacts with age for both risks of death and with the weight index wt for the risk of prostate cancer death.

This is consistent with the model for overall survival considered by Byar and Green.²⁵ Their model, nevertheless, indicates a significant RX : sg relationship, which we did not find. Another discrepancy is that they found no significant effects of weight, whereas we have observed an important interaction with the treatment for the risk of prostate cancer and significant main effects in the risk of ‘other’ type of death.

The estimated Kendall’s $\tau_\theta$ obtained from Frank’s copula model in Table 2 is 0.409, which indicates a positive association between the risks of dying from prostate cancer and ‘other’. To test for the significance of the dependence parameter $\theta$, it is possible to perform the null hypothesis test $H_0: \theta = 0$, which corresponds to the independent model, versus the alternative hypothesis $H_1: \theta \neq 0$. Assuming large sample properties we then proceeded to use the familiar standard normal test statistic $Z = |\hat\theta / \mathrm{se}_{\hat\theta}|$. From Table 2 we obtained $Z = 5.38$, which suggests that we reject the null hypothesis. Thus, the dependence between the two cause-specific survival times is significant and therefore we must take account of the parameter $\theta$.
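The arithmetic behind this test is short enough to reproduce. The sketch below uses the estimate and standard error of $\theta$ reported in Table 2; the two-sided p-value and the 95% interval for $\theta$ are computed here under the same normal approximation and are not figures quoted by the authors.

```python
import math

theta_hat, se_theta = 4.280, 0.795            # Frank's copula, Table 2

z = abs(theta_hat / se_theta)                 # Wald statistic for H0: theta = 0
p_two_sided = math.erfc(z / math.sqrt(2.0))   # equals 2 * (1 - Phi(z))
ci = (theta_hat - 1.96 * se_theta, theta_hat + 1.96 * se_theta)
print(round(z, 2))                            # prints: 5.38
```

Mapping the end points of `ci` through equation (9) is one way to obtain an interval on the Kendall's $\tau$ scale of the kind shown in Table 1.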

Table 2 Estimates of the coefficients and standard errors for the prostate cancer data using Frank’s and product copula models with Weibull marginals

| Parameter | Frank: PC ($j = 1$) | Frank: Other ($j = 2$) | Independent: PC ($j = 1$) | Independent: Other ($j = 2$) |
|---|---|---|---|---|
| Intercept | −3.997 (1.114) | −4.162 (1.128) | −3.735 (1.355) | −5.696 (1.020) |
| RX | −1.069 (1.457) | −3.675 (1.717) | −0.628 (1.745) | −3.040 (1.826) |
| age | −0.024 (0.011) | 0.016 (0.016) | −0.035 (0.012) | 0.031 (0.012) |
| wt | 4 × 10⁻⁴ (0.005) | −0.015 (0.005) | 0.001 (0.006) | −0.014 (0.005) |
| HX | | 0.636 (0.091) | | 0.802 (0.081) |
| hg | −0.008 (0.003) | | −0.012 (0.004) | |
| sz | 0.027 (0.004) | | 0.034 (0.004) | |
| sg | 0.199 (0.031) | | 0.237 (0.032) | |
| RX : age | 0.039 (0.017) | 0.050 (0.023) | 0.037 (0.018) | 0.043 (0.026) |
| RX : wt | −0.022 (0.009) | | −0.026 (0.011) | |
| $\alpha_j$ | 1.303 (0.265) | 0.983 (0.062) | 1.264 (0.130) | 0.939 (0.056) |
| $\theta$ | 4.280 (0.795) | | 0 | |

When the assumption that the competing risks follow a dependent structure is questionable, we can carry out a sensitivity analysis of inference to local departures of $\theta$ from its maximum. For this, we define the profile log-likelihood as

$$\ell^*(\theta) = \max_{\delta \mid \theta} \{\log L_n(\delta, \theta)\} = \log L_n[\hat\delta(\theta), \theta] \tag{11}$$

where $\delta^T = (\beta_{10}, \beta_1, \beta_{20}, \beta_2, \alpha_1, \alpha_2)$. We are particularly interested in the local shape of $\ell^*$ at $\theta = \hat\theta$; if, assuming any marginal model, the curve is very flat, then the point of maximum can depend sensitively on any transformation, which indicates that the dependence structure is poorly described by the data.²⁷

To carry out the sensitivity analysis, we employ two special cases of the three-parameter Burr model characterized by the survival function $S_j(t_j) = [1 + \gamma_j (t_j \lambda_j)^{\alpha_j}]^{-1/\gamma_j}$, where $\lambda_j, \alpha_j, \gamma_j > 0$, to be used as the marginals of Frank’s copula. We thus adopt the Pareto survival model,²⁸ obtained when $\alpha_j = 1$, and the log-logistic survival model,²⁹ obtained when $\gamma_j = 1$. Note that the Weibull marginal model adopted in this study is obtained when $1/\gamma_j$ approaches infinity.
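The three marginal families can be seen as one function with different parameter restrictions. The sketch below codes the Burr survival function as written above, checks the Weibull limit numerically, and adds a closed-form median obtained here by solving $S_j(M) = 1/2$; the median formula and all function names are introduced for this illustration and are not taken from the paper.

```python
import math

def burr_surv(t, lam, alpha, gamma):
    """Burr survival function S(t) = [1 + gamma * (t * lam)^alpha]^(-1/gamma)."""
    return (1.0 + gamma * (t * lam) ** alpha) ** (-1.0 / gamma)

def burr_median(lam, alpha, gamma):
    """Median M solving S(M) = 1/2, i.e. (lam * M)^alpha = (2^gamma - 1) / gamma."""
    return ((2.0 ** gamma - 1.0) / gamma) ** (1.0 / alpha) / lam

def weibull_surv(t, lam, alpha):
    """Weibull survival function, the limit of the Burr model as gamma -> 0."""
    return math.exp(-((lam * t) ** alpha))

def pareto_surv(t, lam, gamma):
    return burr_surv(t, lam, 1.0, gamma)      # special case alpha = 1

def loglogistic_surv(t, lam, alpha):
    return burr_surv(t, lam, alpha, 1.0)      # special case gamma = 1
```

For a small value such as `gamma = 1e-6`, `burr_surv` should be numerically indistinguishable from `weibull_surv`, and `burr_median` should approach the Weibull median $(\log 2)^{1/\alpha_j} / \lambda_j$.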

Rather than attempting to estimate the vectors $\beta_1(\theta)$ and $\beta_2(\theta)$ for the various marginals in Frank’s copula model, we may only be interested in the median survival time for each marginal risk, denoted here as $M_j(\theta)$ $(j = 1, 2)$. For this, we take

$$\hat\lambda_1(\theta) = \exp\{\hat\beta_{10}(\theta) + \hat\beta_1^T(\theta)\,\bar{u}\} \quad \text{and} \quad \hat\lambda_2(\theta) = \exp\{\hat\beta_{20}(\theta) + \hat\beta_2^T(\theta)\,\bar{v}\}$$

where $\bar{u} = n^{-1} \sum_{i=1}^{n} u_i$ and $\bar{v} = n^{-1} \sum_{i=1}^{n} v_i$; here, $u_i$ and $v_i$ contain the sets of covariates displayed in Table 2.

Figure 2 displays the profile log-likelihoods $\ell^*$ for each survival marginal in Frank’s copula model plotted against $\theta$. We can note that the curves are ‘well behaved’ in the sense that their shapes are fairly close to quadratic for values close to the maximum. We observed that for the log-logistic model $\hat\theta = 3.40$ ($\tau_{\hat\theta} = 0.34$), for the Pareto model $\hat\theta = 0.56$ ($\tau_{\hat\theta} = 0.061$), and for the Weibull model $\hat\theta = 4.28$ ($\tau_{\hat\theta} = 0.41$), which suggests that the point of maximum depends sensitively on the marginals used. Consistently, Figure 3 illustrates that the extent to which the selection of the marginal model influences the estimation of the marginal survival medians $M_j(\theta)$ is also sensitive to the marginals used. When it came to using the copula model with Burr marginals, we observed that the parameter estimates were very similar to those obtained with the Weibull model. We thus conclude that the data tend to give sufficient information about local departures of $\theta$ from the maximum and that different inferences can be obtained for the given marginals.

Figure 2 Profile log-likelihood for the dependence parameter $\theta$, as estimated by using the log-logistic, Pareto and Weibull marginal models

4.3 The predictive value of the model

For the assessment of the overall adequacy of the competing risks model, we can compare the fitted and empirical versions of the crude probability distributions $Q_j^*(t)$ (i.e., the cause-specific failure probabilities),

$$Q_j^*(t) = \Pr\{T \le t,\, J = j\} = \int_0^t f^{(j)}(z)\,dz \qquad (12)$$

for every cause of death. The fitted cause-specific failure probabilities can be calculated by integrating equation (8) for the given marginals $S_j(t)$. The primitive, however, does not have an analytic form, so we used the function integrate in the package S-PLUS to obtain a numerical approximation. For the computation of the empirical cause-specific failure distributions, we adopt the implementation of the nonparametric procedure suggested by Gaynor et al.,$^{4}$ which is described below.
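To illustrate the numerical integration in equation (12), the sketch below computes the crude probabilities for two risks in Python. It rests on several assumptions that are not confirmed by this section: the joint survivor function is taken to be Frank’s copula applied to the marginal survivor functions, so that the crude density is $f^{(j)}(t) = f_j(t)\,\partial C/\partial u$ evaluated at $(S_j(t), S_{j'}(t))$; the Weibull marginals use the parameterization $S(t) = \exp(-\lambda t^{\alpha})$; and the parameter values are made up for illustration. Composite Simpson’s rule replaces the S-PLUS integrate function.

```python
import math

def frank_copula(u, v, theta):
    """Frank's copula C(u, v) for theta != 0."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def frank_du(u, v, theta):
    """Partial derivative of Frank's copula with respect to its first argument."""
    a, b = math.expm1(-theta * u), math.expm1(-theta * v)
    return math.exp(-theta * u) * b / (math.expm1(-theta) + a * b)

def weibull_surv(t, lam, alpha):
    # Assumed parameterization: S(t) = exp(-lam * t**alpha)
    return math.exp(-lam * t ** alpha)

def weibull_dens(t, lam, alpha):
    return lam * alpha * t ** (alpha - 1.0) * math.exp(-lam * t ** alpha)

def crude_prob(t, j, theta, params, n=4000):
    """Q_j*(t) = int_0^t f^(j)(z) dz by composite Simpson's rule (n even).
    j is 0 or 1; params[j] = (lam, alpha) for risk j (hypothetical values)."""
    own, other = params[j], params[1 - j]

    def crude_density(z):
        u = weibull_surv(z, *own)
        v = weibull_surv(z, *other)
        # Frank's copula is symmetric, so the same derivative serves both risks.
        return weibull_dens(z, *own) * frank_du(u, v, theta)

    h = t / n
    total = crude_density(0.0) + crude_density(t)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * crude_density(i * h)
    return h * total / 3.0

# Illustrative parameters only; they are not estimates from the paper.
params = [(0.05, 2.0), (0.10, 1.5)]
theta, t = 4.28, 5.0
q1 = crude_prob(t, 0, theta, params)
q2 = crude_prob(t, 1, theta, params)
overall = 1.0 - frank_copula(weibull_surv(t, *params[0]),
                             weibull_surv(t, *params[1]), theta)
print(q1, q2, q1 + q2, overall)
```

Under these assumptions the two crude probabilities should sum to the overall failure probability $1 - C(S_1(t), S_2(t))$, which gives a simple consistency check on the quadrature.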

Figure 3 Marginal medians for the dependence parameter $\theta$, as estimated by using the log-logistic, Pareto and Weibull models

Consider the crude hazard function defined as in equation (2) and assume that there are $r + 1$ distinct survival times $t_{(0)} = 0 < t_{(1)} < t_{(2)} < \cdots < t_{(r)}$. Dinse and Larson$^{30}$ showed, for a more general semi-Markov model, that if $d_{jk}$ denotes the number of individuals who die from cause $j$ at $t_{(k)}$ and $n_k$ denotes the number of individuals at risk, then the maximum likelihood estimate of $h^{(j)}(t)$ is $\hat{h}^{(j)}(t) = d_{jk}/n_k$, which is approximately unbiased. It follows that the nonparametric cause-specific failure probability can be written as:

$$\tilde{Q}_j^*(t) = \sum_{\{k:\, t_{(k)} \le t\}} \frac{d_{jk}}{n_k}\, \tilde{S}_{KM}(t_{(k-1)}) \qquad (13)$$

where

$$\tilde{S}_{KM}(t_{(k-1)}) = \prod_{l=1}^{k-1} \left(1 - \frac{d_l}{n_l}\right), \qquad d_l = \sum_{j=1}^{m} d_{jl}$$

is the Kaplan–Meier estimate of the probability of surviving from any cause before $t_{(k-1)}$, that is, the overall survivor proportion. Equation (13) indicates that an individual must survive from all causes up to time $t$ to subsequently fail of cause $j$ at time $t$. An appealing result from equation (13) is that

$$\tilde{F}_{KM}(t) = 1 - \tilde{S}_{KM}(t) = \sum_{j=1}^{m} \tilde{Q}_j^*(t)$$
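The estimator in equation (13) is straightforward to compute directly. The Python sketch below is a minimal illustration rather than the implementation used by the authors or by Gaynor et al.; the function name, the coding of censoring as cause 0, and the small data set are all hypothetical.

```python
def crude_failure_estimates(times, causes, n_causes):
    """Nonparametric cause-specific failure probabilities, as in equation (13).

    times  : observed follow-up times
    causes : 0 for a censored observation, 1..n_causes for the cause of failure
    Returns (event_times, Q, S), where Q[j-1][k] estimates Q_j* at event_times[k]
    and S[k] is the Kaplan-Meier overall survivor estimate at event_times[k].
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    Q = [[] for _ in range(n_causes)]
    S = []
    surv_prev = 1.0            # Kaplan-Meier estimate at the previous event time
    cum = [0.0] * n_causes     # running sums for each cause
    for tk in event_times:
        n_k = sum(1 for t in times if t >= tk)          # number at risk at t_(k)
        d = [sum(1 for t, c in zip(times, causes) if t == tk and c == j)
             for j in range(1, n_causes + 1)]           # deaths by cause at t_(k)
        for j in range(n_causes):
            cum[j] += d[j] / n_k * surv_prev
            Q[j].append(cum[j])
        surv_prev *= 1.0 - sum(d) / n_k
        S.append(surv_prev)
    return event_times, Q, S

# Made-up data for illustration only (two causes, 0 = censored).
times = [2, 3, 3, 5, 6, 7, 8, 8, 9, 10]
causes = [1, 2, 0, 1, 0, 2, 1, 1, 0, 2]
event_times, Q, S = crude_failure_estimates(times, causes, n_causes=2)
for k, tk in enumerate(event_times):
    print(tk, Q[0][k], Q[1][k], 1.0 - S[k])
```

At every event time the cause-specific estimates add up to one minus the Kaplan–Meier survivor estimate, which is the identity stated after equation (13).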

Diagnostics of model fit can be carried out by borrowing the graphical methods used to assess the predictive accuracy of multivariate prognostic models.$^{31}$ One of these graphical methods is the split-sample technique, which offers a procedure to obtain information about the real predictive value of a statistical model. This consists of dividing the observations into various subsamples that represent groups of individuals which are well separated and sufficiently large to be useful in a medical setting; the empirical distribution and the fitted probabilities are then compared within each group. In most cases, the definition of these groups is given by the quantiles of the distribution of the prognostic index $\mathrm{PI}_i = \hat{\beta}^T x_i$, where $x_i$ is the vector of covariates and $\hat{\beta}$ is the estimator of the regression coefficient.

Applying such methods to the competing risks model requires some modifications, since we do not have a unique PI to compute the marginal scale models in equation (10). To overcome the problem of nonlinearity, we propose instead that a pair of prognostic indexes $\mathrm{PPI}_i = (\hat{\beta}_1^T u_i, \hat{\beta}_2^T v_i)$ should be defined. This can be treated in a similar way to the PI of the linear model referred to above. To find appropriate risk groups, we select the closest observations to four vectors whose entries are the 12.5, 37.5, 62.5 and 87.5% quantiles of each of the corresponding prognostic vector entries, so that we can allocate four risk groups of cases respectively. Although restrictive, this method provides a pragmatic way of partitioning the data set. We then proceed to use Hartigan’s$^{32}$ k-means algorithm, which partitions neighbourhoods of indexes so that indexes within clusters are close. We used the S-PLUS function kmeans to find the groups mentioned above, taking the quantile vectors as the four cluster centres. Different results might be expected if the cluster centres of the groups are estimated with different criteria.
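The grouping step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors’ procedure: the paper uses Hartigan’s algorithm through the S-PLUS function kmeans, whereas the sketch substitutes plain Lloyd iterations; the quantile rule (linear interpolation), the function names and the simulated index pairs are all hypothetical.

```python
import random

def quantile(values, p):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def kmeans_from_quantiles(ppi, probs=(0.125, 0.375, 0.625, 0.875), n_iter=100):
    """Cluster pairs of prognostic indexes, starting from quantile-vector centres.

    ppi : list of (pi1, pi2) pairs. Each initial centre holds the same quantile
    of both index components. Uses Lloyd iterations, not Hartigan's algorithm.
    """
    centres = [tuple(quantile([p[d] for p in ppi], q) for d in range(2))
               for q in probs]
    labels = None
    for _ in range(n_iter):
        # Assign each pair to its nearest centre (squared Euclidean distance).
        new_labels = [min(range(len(centres)),
                          key=lambda g: (p[0] - centres[g][0]) ** 2
                                        + (p[1] - centres[g][1]) ** 2)
                      for p in ppi]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centre to the mean of its members.
        for g in range(len(centres)):
            members = [p for p, lab in zip(ppi, labels) if lab == g]
            if members:  # keep the old centre if a cluster becomes empty
                centres[g] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels, centres

# Simulated index pairs for illustration only; not the prostate cancer data.
random.seed(0)
ppi = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200)]
labels, centres = kmeans_from_quantiles(ppi)
print([labels.count(g) for g in range(4)])
print(centres)
```

Seeding the centres at the quantile vectors makes the resulting groups depend on that choice, which is consistent with the remark that other criteria for the centres could lead to different partitions.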


Figure 4 shows a scatter plot of the pairs $(\hat{\beta}_1^T u_i, \hat{\beta}_2^T v_i)$, where $\hat{\beta}_1$ and $\hat{\beta}_2$ are the MLEs of the copula model with Weibull marginals displayed in Table 2. Note that the points are classified by the groups obtained with the k-means method; here, the number N denotes the number of individuals in each group. The nonparametric formulation is applied to the four groups and plotted together with the parametric crude distribution probabilities evaluated at the four different averaged PPIs. This is illustrated in Figure 5. We observe that the fit is reasonably close in all four groups. We therefore conclude that the parametric model gives a good fit to the prostate cancer data set and that the proposed discrimination method is useful for separating individuals with different responses.

5 Conclusions

The main goal of this article has been to present a statistical procedure with parametric specifications for analysing failure when the event of interest can be categorized into possibly dependent types of death. We have argued that this development has several advantages over previous approaches, such as the semiparametric cause-specific Cox proportional hazards model, in which the multiple decrement function and other related quantities are fitted using an arbitrary post hoc procedure. First, it allows for a rich parametric dependence specification in the multiple decrement function. Secondly, it can simultaneously estimate the dependence parameter and the parametric marginal survival distributions, avoid over-fitting problems and make censoring more informative. Thirdly, the technique allows for separate statistical models to be specified for the marginal model of each type of death. Lastly, the estimation procedure follows routine model fitting such as maximum likelihood inference.

Figure 4 Prognostic indexes for groups of individuals classified by the k-means discriminant method

In our case study, we observed that slightly more accurate inferences can be obtained with Frank’s copula model than with the model that assumes independent risks of death. Our argument that the risks could be dependent upon the DES treatment turned out to be weak when we fitted Frank’s model with RX alone in each marginal model and observed no significant pattern of dependence; nonetheless, the argument was strengthened once we had added an important set of covariates, which included RX itself. The categorization of the covariates in our analysis differs from previous studies, which have constructed categories of the continuous variables using a somewhat ‘arbitrary’ method. Greater insight into the nature of each covariate in each risk of death may be achieved using the proposals of Durrleman and Simon,$^{33}$ and Hastie and Tibshirani.$^{34}$ Finding the categories for the covariates, however, may not be a trivial matter. Further research in this area is needed to chart the boundaries of the results we have identified here.

Figure 5 Observed (stepwise) and fitted (smooth) crude distribution curves in four risk groups for the prostate cancer data set


Acknowledgements

The authors gratefully acknowledge the helpful comments of a referee and the editor. The first author wishes to thank CONACYT, Mexico, for their financial support. The assistance of colleagues Dr Robert Hughes and Dr Alberto Castillo is much appreciated.

References

1 Holt JD. Competing risks analyses with special reference to matched pair experiments. Biometrika 1978; 65: 159–66.

2 Prentice RL, Kalbfleisch JD, Peterson A, Flournoy N, Farewell V, Breslow N. The analysis of failure times in the presence of competing risks. Biometrics 1978; 34: 541–54.

3 Spivey LB, Gross AJ. Concomitant information in competing risk analysis. Biometrical Journal 1991; 33: 419–27.

4 Gaynor JJ, Feuer EJ, Tan CC et al. On the use of cause-specific failure and conditional failure probabilities: Examples from clinical oncology data. Journal of the American Statistical Association 1993; 88: 400–409.

5 Kanie H, Nonaka Y. Estimation of Weibull shape-parameters for 2 independent competing risks. IEEE Transactions on Reliability 1985; 34: 53–56.

6 Rao BR, Talwalker S. Random censorship, competing risks and a simple proportional hazards model. Biometrical Journal 1991; 33: 461–83.

7 Tsiatis A. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences 1975; 72: 20–22.

8 Elandt-Johnson RC, Johnson NL. Survival models and data analysis. New York: Wiley, 1980: 269–93.

9 Carriere JF. Removing cancer when it is correlated with other causes of death. Biometrical Journal 1995; 37: 339–50.

10 Wang W, Wells MT. Model selection and semiparametric inference for bivariate failure-time data. Journal of the American Statistical Association 2000; 95: 62–72.

11 Oakes D. Bivariate survival models induced by frailties. Journal of the American Statistical Association 1989; 84: 487–93.

12 Gumbel EJ. Bivariate exponential distributions. Journal of the American Statistical Association 1960; 55: 698–707.

13 Gumbel EJ. Bivariate logistic distributions. Journal of the American Statistical Association 1961; 56: 335–49.

14 Frank MJ. On the simultaneous associativity of $f(x, y)$ and $x + y - f(x, y)$. Aequationes Mathematicae 1979; 19: 194–226.

15 Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978; 65: 141–51.

16 Hougaard P. A class of multivariate failure time distributions. Biometrika 1986; 73: 671–78.

17 Nelsen RB. Properties of a one-parameter family of bivariate distributions with specified marginals. Communications in Statistics – Theory and Methods 1986; 15: 3277–85.

18 Nelsen RB. An introduction to copulas. New York: Springer-Verlag, 1999.

19 McCullagh P, Nelder JA. Generalized linear models, 2nd edn. London: Chapman & Hall, 1989.

20 StatSci. S-PLUS User’s Manual. Washington DC: StatSci, 1993.

21 Char BW, Geddes KO, Gonnet GH, Leong BL, Monagan MB, Watt SM. Maple V Library Reference Manual. New York: Springer-Verlag, 1991.

22 Andrews DF, Herzberg AM. Data: A collection of problems from many fields for the student and research worker. New York: Springer-Verlag, 1985.

23 Kay R. Treatment effects in competing-risks analysis of prostate cancer data. Biometrics 1986; 42: 203–11.

24 Lunn M, McNeil D. Applying Cox regression to competing risks. Biometrics 1995; 51: 524–32.

25 Byar DP, Green SB. Prognostic variables for survival in a randomized comparison of treatments for prostatic cancer. Bulletin du Cancer (Paris) 1980; 67: 477–90.

26 Cheng SC, Fine JP, Wei LJ. Prediction of cumulative incidence function under the proportional hazards model. Biometrics 1998; 54: 219–28.


27 Copas JB, Li HG. Inference for non-random samples. Journal of the Royal Statistical Society B 1997; 59: 55–95.

28 Clayton DG, Cuzick J. The semi-parametric Pareto model for regression analysis of survival times. Bulletin of the International Statistical Institute 1985; 51(23.3): 1–18.

29 Bennett S. Log-logistic regression models for survival data. Applied Statistics 1983; 32: 165–71.

30 Dinse GE, Larson MG. A note on semi-Markov models for partially censored data. Biometrika 1986; 73: 379–86.

31 Harrell FE, Lee KL, Mark DB. Multivariate prognostic models: issues in developing models, evaluating assumptions and accuracy, and measuring and predicting errors. Statistics in Medicine 1996; 15: 361–87.

32 Hartigan JA. Clustering algorithms. New York: Wiley, 1975.

33 Durrleman S, Simon R. Flexible regression models with cubic splines. Statistics in Medicine 1989; 8: 551–61.

34 Hastie TJ, Tibshirani RJ. Generalized additive models. London: Chapman & Hall, 1990.
