A Representation of Mortality Data by Competing Risks. Author(s): David G. Hoel. Source: Biometrics, Vol. 28, No. 2 (Jun., 1972), pp. 475-488. Published by: International Biometric Society. Stable URL: http://www.jstor.org/stable/2556161


BIOMETRICS 28, 475-488 June 1972

A REPRESENTATION OF MORTALITY DATA BY COMPETING RISKS

DAVID G. HOEL¹

National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina 27709, U. S. A.

SUMMARY

Cohort mortality data is represented by a probabilistic combination of competing risks (diseases). Each risk is described by an age-at-death distribution and a net probability of occurrence. This representation is illustrated by a set of pathology data from a well-controlled laboratory animal experiment.

1. INTRODUCTION

In recent years, rapidly increasing interest in the relationship between human health and various environmental pollutants has focused mounting attention on the analysis of mortality data. The usual actuarial methods have generally been concerned less with individual diseases than with the construction of life tables and general mortality rates (Seal [1954], Grenander [1956], Kimball [1960]). However, in a well-controlled laboratory animal experiment or epidemiological study, the investigator is often concerned primarily with the effects that a certain treatment (exposure to pollutants) has upon the occurrence of a few specific terminal diseases (causes of death). He may, for example, use smog as the pollutant while being primarily interested in the occurrence of lung tumors. When he compares his data with those from a control group, the observed incidence of lung tumors may turn out to be lower in the treated group. This could happen if the treatment lowered the age at death throughout the population: the animals would then not live long enough for lung tumors, which tend to occur in older animals, to develop. It is thus easy to imagine some of the difficulties in interpreting this type of data; for further discussion the reader is referred to the paper by Kimball [1958].

We shall assume that we have a complete set of cohort autopsy data giving both the age and the cause of death for each member of a population. The data used for illustration in this paper were obtained from a well-controlled laboratory experiment. Using these data we wish to describe both the incidence of, and the age at death from, each particular cause of death.

Along this line, Kimball [1958] has approached the problem by estimating the individual disease incidences in a population without any distributional assumptions. Another novel approach, also suggested by Kimball [1958], is to apply the results of Sampford [1952]. Sampford proposes an 'accidental death' model in which each individual has potential survival times $t_1$ and $t_2$, where $t_2$ corresponds to accidental death and $t_1$ to 'natural' death. He assumes that $t_1$ and $t_2$ are independent and normally distributed, and that death occurs at the minimum of $t_1$ and $t_2$. Berkson and Elveback [1960] have also used this approach with exponential survival times. More recently, Moeschberger and David [1971] have generalized the problem to $k$ causes of death with general survival distributions. Chiang [1968] has also considered competing risks for grouped data under the assumption that the force of mortality associated with a disease remains, over time, a constant proportion of the total force of mortality (see David [1970]).

¹ Some of the work was done while the author was at the Oak Ridge National Laboratory.

To describe mortality data, we shall employ a modification of Sampford's method, as follows. For each possible cause of death (risk) we assume that there is both an age-at-death distribution and a probability of occurrence. This probability of occurrence is defined as the probability of death from the cause if it is the only cause in effect in the population; it is usually referred to as the net probability over the entire life-span of the population (see Chiang [1968]). Net probabilities are usually calculated for a specific interval of the life-span, and the net probability over the entire life-span is then taken to be 1; here it may be less than 1. Our use of net probabilities is similar to the quantity that Sampford [1954] calls 'proportion not immune' and that Boag [1949] and Kodlin [1961] call 'fraction uncured and subject to a particular cause of death'.

To obtain a realization of the age at death of a subject, a collection of risks is constructed, each risk belonging to the collection with probability equal to its net probability. This collection is nonempty with probability 1, since we assume that one of the risks is the so-called 'other causes' and that its net probability is 1. For each risk in the collection an age is drawn from its associated age-at-death distribution, and the subject's age at death is taken to be the minimum of these ages; only this minimum is actually observed. In essence, the risk corresponding to the minimum age is the cause of death.
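The construction just described can be mimicked by a short simulation. The following is a sketch under stated assumptions, not code from the paper: the function name `simulate_subject`, the exponential age-at-death distributions (means of 600 and 300 days) and the net probability 0.6 are hypothetical. For these exponential choices, risk 1 is the cause of death for a fraction $0.6 \times (1/300)/(1/300 + 1/600) = 0.4$ of subjects in the long run, which the simulated share should approximate.

```python
import random

def simulate_subject(net_probs, samplers, rng):
    """Draw (age_at_death, cause) for one subject under the competing-risk model.

    net_probs[i] is the net probability p_i (net_probs[0] must be 1, 'other causes');
    samplers[i](rng) draws one age from the age-at-death distribution F_i.
    """
    if net_probs[0] != 1.0:
        raise ValueError("risk 0 ('other causes') must have net probability 1")
    # Risk i joins the collection with probability p_i; risk 0 always joins.
    contenders = [i for i, p in enumerate(net_probs) if rng.random() < p]
    # An age is drawn for every contending risk, but only the minimum is observed.
    ages = {i: samplers[i](rng) for i in contenders}
    cause = min(ages, key=ages.get)
    return ages[cause], cause

# Hypothetical example: 'other causes' with mean age 600 days, risk 1 with mean 300 days.
rng = random.Random(1)
samplers = [lambda r: r.expovariate(1 / 600), lambda r: r.expovariate(1 / 300)]
deaths = [simulate_subject([1.0, 0.6], samplers, rng) for _ in range(20_000)]
share_risk_1 = sum(cause == 1 for _, cause in deaths) / len(deaths)
```

Passing net probability 0 for a risk removes it from every subject's collection, so all deaths are then attributed to 'other causes'.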

In order to fit this model to a set of data, we assume that the functional form of each age-at-death distribution is known except for some unknown parameters. These parameters, together with the unknown net probabilities, are then estimated from the data by the usual maximum likelihood (ML) methods. Alternatively, if no parametric assumptions are made, the nonparametric approach of Kaplan and Meier [1958] may be applied to estimate both the net probability and the age-at-death distribution.

The salient feature of this method of describing autopsy data is that, by combining probabilistic models for the age at death and the probability of occurrence of each cause of death, we obtain a single probabilistic description of the subject's age and cause of death. This permits treatment and control groups to be compared risk by risk, and for each risk both the probabilities of occurrence and the age-at-death distributions may be compared.

2. MORTALITY MODEL

Suppose that there are $k$ causes of death $S_1, \ldots, S_k$, which may represent such things as myeloid leukemia, nephrosclerosis, ovarian tumor, etc. These are the causes of death in which the investigator is interested; all other types of death are grouped together in $S_0$, which we refer to as 'other causes'. An $S_i$ may also be chosen to represent a collection of diseases instead of a single cause of death; for example, the experimenter may be interested in tumors of the female endocrine system rather than simply ovarian tumors. For each risk $S_i$ there is a net probability $p_i$ of being subject to $S_i$, and we assume that $p_0 = 1$. Next, let the positive continuous random variable $X_i$ denote the age at death from cause $S_i$, and let $F_i$ denote the distribution function of $X_i$ ($i = 0, 1, \ldots, k$). Finally, we define the random variables

$$Y_i = \begin{cases} 1 & \text{with probability } p_i, \\ \infty & \text{with probability } 1 - p_i, \end{cases}$$

for $i = 0, 1, \ldots, k$, and assume that the random variables $X_0, \ldots, X_k, Y_0, \ldots, Y_k$ are mutually independent. This last assumption of independence is important because it essentially states that the causes of death act independently. It will not be valid in general unless the $S_i$ represent unrelated diseases, which will hopefully be at least approximately true in most cases. It must also be stressed that the $S_i$ represent actual causes of death, and not simply death with a particular disease present, such as certain tumors. It is inappropriate to apply competing-risk methodology to nonlethal disease occurrences.

In this formulation the random variable representing the subject's age at death is defined by

$$X = \min(X_0 Y_0, X_1 Y_1, \ldots, X_k Y_k),$$

and the cause of death is the $S_i$ for which $X = X_i Y_i$. Note that $X$ is finite with probability 1, since $Y_0 = 1$ with probability 1. The improper variables $Y_i$ are simply an artificial device that lets us write $X$ in a compact form: a cause with $Y_i = \infty$ contributes $X_i Y_i = \infty$ and can never attain the minimum.

A subject's age at death $X$ and cause of death $S_i$ can then be interpreted as follows. A set $\mathcal{S}$ of contending causes of death is determined for the subject, where the cause $S_i$ belongs to $\mathcal{S}$ with probability $p_i$. The set $\mathcal{S}$ is thus the set of possible causes of death for the particular individual under consideration. Next, an age at death $X_i$ is found for each $S_i$ belonging to $\mathcal{S}$; the smallest such age $X_i$ is the subject's age at death, and the corresponding $S_i$ is the cause of death. Only the smallest age $X_i$ is actually observable.

If we let $F$ denote the distribution function of $X$, it follows from the above assumptions that

$$F(x) = 1 - \prod_{i=0}^{k} [1 - p_i F_i(x)], \qquad (1)$$

since $X > x$ exactly when $X_i Y_i > x$ for every $i$, and $\Pr(X_i Y_i > x) = 1 - p_i F_i(x)$.

Another quantity of interest is $p_i^*$, the probability that $S_i$ is the cause of death (the crude probability). It is not difficult to show that

$$p_i^* = \int_0^\infty \Big\{ \prod_{j \ne i} [1 - p_j F_j(x)] \Big\} p_i \, dF_i(x). \qquad (2)$$

Now $p_i^*$, which corresponds to the observed incidence of $S_i$, should not be confused with $p_i$. The net probability $p_i$ is the probability of being affected by cause $S_i$ if the subject is prevented from being affected by all other risks. Thus $p_i$ does not depend on which other causes we include in the model, whereas $p_i^*$ does. Also, from (2) we have $p_i \ge p_i^*$, which is intuitively correct.

Another quantity of interest is the distribution $F_i^*$ of the observed ages at death due to cause $S_i$. Thus $F_i^*$ is the conditional distribution of $X$ given that $S_i$ is the cause of death, and may be written as

$$F_i^*(t) = \int_0^t \Big\{ \prod_{j \ne i} [1 - p_j F_j(x)] \Big\} p_i \, dF_i(x) \Big/ p_i^*. \qquad (3)$$

The mean of $F_i^*$, referred to as the observed mean age at death for cause $S_i$, is then given by

$$\mu_i^* = \int_0^\infty t \Big\{ \prod_{j \ne i} [1 - p_j F_j(t)] \Big\} p_i \, dF_i(t) \Big/ p_i^*. \qquad (4)$$

Some statements can be made about the effects on $p_i^*$, $F_i^*$ and $\mu_i^*$ of a change in some other cause $S_j$ ($j \ne i$). We see, first of all, from (2) that $p_i^*$ is a nonincreasing function of both $p_j$ and $F_j$. In other words, the observed incidence of death due to $S_i$ cannot increase, and will generally decrease, as the incidence of $S_j$ increases or as the age at death due to $S_j$ decreases.

To see the effect on $F_i^*$ of a change in $p_j$ we write (3) as

$$F_i^*(t) = \int_0^t [1 - p_j F_j(x)] \, dG(x) \Big/ \int_0^\infty [1 - p_j F_j(x)] \, dG(x), \qquad (5)$$

where $G(x) = \int_0^x \prod_{r \ne i, j} [1 - p_r F_r(z)] \, p_i \, dF_i(z)$ is a nondecreasing function that does not depend on $p_j$ or $F_j$. Differentiating (5) with respect to $p_j$, we find that $F_i^*(t)$ will be a nondecreasing function of $p_j$ provided that

$$\int_0^t (1 - p_j F_j) \, dG \int_0^\infty F_j \, dG \ge \int_0^\infty (1 - p_j F_j) \, dG \int_0^t F_j \, dG. \qquad (6)$$

Now (6) is equivalent to

$$\int_0^t dG \int_0^\infty F_j \, dG \ge \int_0^\infty dG \int_0^t F_j \, dG, \qquad (7)$$

which is valid since $F_j$ and $G$ being nondecreasing functions implies that

$$\int_0^t F_j \, dG \Big/ \int_0^t dG$$

is a nondecreasing function of $t$. We therefore conclude that the observed age-at-death distribution $F_i^*$ is an increasing (nondecreasing) function of $p_j$. This in turn implies that the observed mean age at death $\mu_i^*$ is a decreasing (nonincreasing) function of $p_j$. Thus the experimenter will often observe a life-shortening with respect to a particular cause of death when there is an increase in the incidence of another disease. Nothing definite can be said about the effects on $F_i^*$ and $\mu_i^*$ of changes in either $F_j$ or $\mu_j$. For example, a location change in $F_j$ may increase or decrease $\mu_i^*$; it simply depends on the distributions involved.
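The monotonicity statements can be checked numerically. This sketch borrows the Makeham-Gompertz estimates for the conventional mice from Table 3 purely as plausible curve shapes, fixes a hypothetical net probability of 0.95 for reticulum cell sarcoma, and lets the net probability $p_j$ of thymic lymphoma vary from 0 to 1. By the argument above, both the crude probability $p_i^*$ and the observed mean age at death $\mu_i^*$ for reticulum cell sarcoma should be nonincreasing in $p_j$. The distribution function $1 - a^{1 - e^{bt}}$ used here is the integral of the Makeham-Gompertz density (23).

```python
import math

def gompertz(a, b):
    """Makeham-Gompertz shape: F(t) = 1 - a^(1 - e^(bt)) and its density, as in (23)."""
    c = math.log(a)
    cdf = lambda t: -math.expm1(c * (1.0 - math.exp(b * t)))
    pdf = lambda t: b * c * math.exp(b * t) * math.exp(c * (1.0 - math.exp(b * t)))
    return cdf, pdf

def simpson(g, lo, hi, n=6000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3

def crude_and_mean(i, net_probs, shapes, upper=3000.0):
    """Return (p_i*, mu_i*) from (2) and (4) by numerical integration up to `upper` days."""
    _, pdf_i = shapes[i]
    def weight(x):
        value = net_probs[i] * pdf_i(x)
        for j, (cdf_j, _) in enumerate(shapes):
            if j != i:
                value *= 1.0 - net_probs[j] * cdf_j(x)
        return value
    crude = simpson(weight, 0.0, upper)
    return crude, simpson(lambda x: x * weight(x), 0.0, upper) / crude

# Shapes borrowed from Table 3 (conventional mice): 0 = other causes,
# 1 = thymic lymphoma, 2 = reticulum cell sarcoma. The net probabilities are hypothetical.
shapes = [gompertz(1.055, 0.00403), gompertz(1.026, 0.01118), gompertz(1.00009, 0.01378)]
p_grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
results = [crude_and_mean(2, [1.0, p_tl, 0.95], shapes) for p_tl in p_grid]
```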

3. ESTIMATION OF THE MODEL

Specifically, we assume that there is a sample $\{x_{ij}\}$ ($j = 1, \ldots, n_i$; $i = 0, 1, \ldots, k$) of ages at death, where the $n_i$ ages $x_{i1}, x_{i2}, \ldots, x_{in_i}$ are deaths due to cause $S_i$. From this sample we wish to obtain the ML estimates of the net probabilities $\{p_i\}$ and of the unknown parameters $\theta_i$ of the distributions $\{F_i\}$. To do this, we observe that the likelihood function is proportional to

$$L = \prod_{i=0}^{k} \prod_{j=1}^{n_i} \Big\{ p_i f_i(x_{ij}) \prod_{r \ne i} [1 - p_r F_r(x_{ij})] \Big\}, \qquad (8)$$

where $f_i$ is the probability density function associated with the distribution $F_i$. The terms of $L$ can, however, be rearranged so that

$$L = \prod_{i=0}^{k} L_i, \qquad (9)$$

with

$$L_i = \Big\{ \prod_{j=1}^{n_i} p_i f_i(x_{ij}) \Big\} \Big\{ \prod_{r \ne i} \prod_{j=1}^{n_r} [1 - p_i F_i(x_{rj})] \Big\}. \qquad (10)$$

Moeschberger and David [1971] also deal with the likelihood function (8), but take the net probabilities to be unity.

If we assume that each distribution has its own set of parameters $\theta_i$, then the estimation can be performed separately for each cause. That is, for cause $S_i$, we estimate $p_i$ and $\theta_i$ by maximizing $L_i$. So the estimates for a given cause do not depend on how the other causes that make up the model are specified or parametrized. This fact was first pointed out by Sampford [1952] for his multi-response model.
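The rearrangement from (8) to (9) and (10) can be confirmed numerically. In this sketch every $F_i$ is a hypothetical exponential distribution and the ages and parameter values are made up; the point is only that the log of (8) equals the sum of the logs of the $L_i$, each of which involves the parameters of a single cause.

```python
import math

def exp_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)

def exp_pdf(x, rate):
    return rate * math.exp(-rate * x)

def log_lik_full(data, params):
    """log of (8); data[i] lists ages at death from cause i, params[i] = (p_i, rate_i)."""
    total = 0.0
    for i, ages in data.items():
        p_i, rate_i = params[i]
        for x in ages:
            total += math.log(p_i * exp_pdf(x, rate_i))
            for r, (p_r, rate_r) in enumerate(params):
                if r != i:
                    total += math.log(1.0 - p_r * exp_cdf(x, rate_r))
    return total

def log_lik_cause(i, data, p_i, rate_i):
    """log of (10): involves only cause i's own parameters (p_i, rate_i)."""
    total = sum(math.log(p_i * exp_pdf(x, rate_i)) for x in data[i])
    for r, ages in data.items():
        if r != i:
            total += sum(math.log(1.0 - p_i * exp_cdf(x, rate_i)) for x in ages)
    return total

# Hypothetical ages at death (days) by cause, and hypothetical parameter values.
data = {0: [50.0, 420.0, 610.0], 1: [120.0, 300.0], 2: [80.0, 150.0, 260.0]}
params = [(1.0, 1 / 600), (0.6, 1 / 300), (0.3, 1 / 200)]
full = log_lik_full(data, params)
by_cause = [log_lik_cause(i, data, *params[i]) for i in range(len(params))]
```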

To find the ML estimates $\hat p_i$ and $\hat\theta_i$ we may either maximize $L_i$ directly, numerically, with respect to $p_i$ and $\theta_i$, or solve the system of equations

$$\partial \log L_i / \partial p_i = 0 \qquad (11)$$

and

$$\partial \log L_i / \partial \theta_{ij} = 0, \quad j = 1, \ldots, M_i, \qquad (12)$$

for $p_i$ and $\theta_i$, where $\theta_i = (\theta_{i1}, \ldots, \theta_{iM_i})$. From the definition of $L_i$ in (10) it follows that (11) is equivalent to

$$\sum_{r \ne i} \sum_{j=1}^{n_r} \frac{p_i F_i(x_{rj})}{1 - p_i F_i(x_{rj})} = n_i \qquad (13)$$

and (12) is equivalent to

$$\sum_{j=1}^{n_i} \frac{\partial f_i(x_{ij}) / \partial \theta_{im}}{f_i(x_{ij})} - \sum_{r \ne i} \sum_{j=1}^{n_r} \frac{p_i \, \partial F_i(x_{rj}) / \partial \theta_{im}}{1 - p_i F_i(x_{rj})} = 0 \qquad (14)$$

for $m = 1, \ldots, M_i$.

To obtain some idea as to the variability associated with these estimates

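we appeal to the fact that $(\hat\theta_{i0}, \hat\theta_{i1}, \ldots, \hat\theta_{iM_i})$ is asymptotically normally distributed with covariance matrix

$$\Big[ -E\Big( \frac{\partial^2 \log L_i}{\partial \theta_{ij} \, \partial \theta_{il}} \Big) \Big]^{-1}, \quad j, l = 0, 1, \ldots, M_i, \qquad (15)$$

where $\theta_{i0} = p_i$. It can be shown that the expected value in (15) is equal to

$$-n \int_0^\infty \frac{1}{p_i f_i(z)} \frac{\partial [p_i f_i(z)]}{\partial \theta_{ij}} \frac{\partial [p_i f_i(z)]}{\partial \theta_{il}} \prod_{m \ne i} [1 - p_m F_m(z)] \, dz - n \int_0^\infty \frac{1}{1 - p_i F_i(z)} \frac{\partial [p_i F_i(z)]}{\partial \theta_{ij}} \frac{\partial [p_i F_i(z)]}{\partial \theta_{il}} \Big\{ \sum_{m \ne i} p_m f_m(z) \prod_{r \ne i, m} [1 - p_r F_r(z)] \Big\} dz, \qquad (16)$$

where $n = \sum_i n_i$.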

It should be pointed out that, although the estimates for risk $S_i$ do not depend on the other risks, expression (16), and therefore the asymptotic covariance matrix, does depend on them. Clearly the asymptotic covariance between a parameter of risk $S_i$ and one of risk $S_j$ ($j \ne i$) is zero, since by (9) the log likelihood is a sum of terms each involving the parameters of a single risk. Once the estimates have been obtained for all causes, (16) may be approximated by substituting the estimated parameter values, and from these approximations (15) gives our estimated asymptotic covariance matrix for the parameters of risk $S_i$.

Besides giving information about the precision of our estimates, (15) is also useful in determining whether or not a treatment has significantly affected the age distribution or the incidence of a particular risk of death.
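As a sketch of direct numerical maximization of $L_i$, the code below fits a hypothetical exponential $F_i$ to simulated data by nested golden-section search (the net probability is profiled out for each trial rate), then approximates standard errors from the observed information, a finite-difference Hessian of $\log L_i$ at the estimate. The observed information is a common substitute for the expected information in (15) and (16), not the paper's formula, and all simulation settings (sample size, means of 600 and 300 days, $p_i = 0.6$) are arbitrary choices.

```python
import math
import random

def log_lik_cause(p, rate, own, others):
    """log L_i from (10), assuming (hypothetically) F_i(x) = 1 - exp(-rate * x).

    own: ages at death from cause i; others: ages at death from every other cause.
    """
    ll = len(own) * math.log(p * rate) - rate * sum(own)          # sum of log[p f_i(x)]
    ll += sum(math.log(1.0 - p * (1.0 - math.exp(-rate * x))) for x in others)
    return ll

def golden_max(g, lo, hi, iters=40):
    """Maximize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    gc, gd = g(c), g(d)
    for _ in range(iters):
        if gc >= gd:                      # maximum lies in [a, d]
            b, d, gd = d, c, gc
            c = b - inv_phi * (b - a)
            gc = g(c)
        else:                             # maximum lies in [c, b]
            a, c, gc = c, d, gd
            d = a + inv_phi * (b - a)
            gd = g(d)
    return (a + b) / 2.0

def fit_cause(own, others):
    """Direct numerical maximization of log L_i: p profiled out inside a search over log(rate)."""
    def best_p(rate):
        return golden_max(lambda q: log_lik_cause(q, rate, own, others), 1e-6, 1.0 - 1e-9)
    def profile(log_rate):
        rate = math.exp(log_rate)
        return log_lik_cause(best_p(rate), rate, own, others)
    rate = math.exp(golden_max(profile, math.log(1e-5), 0.0))
    return best_p(rate), rate

def observed_info_se(p, rate, own, others):
    """Standard errors from the inverse observed information (finite-difference Hessian)."""
    f = lambda q, r: log_lik_cause(q, r, own, others)
    hp, hr = 1e-4, rate * 1e-3
    f0 = f(p, rate)
    d_pp = (f(p + hp, rate) - 2.0 * f0 + f(p - hp, rate)) / hp ** 2
    d_rr = (f(p, rate + hr) - 2.0 * f0 + f(p, rate - hr)) / hr ** 2
    d_pr = (f(p + hp, rate + hr) - f(p + hp, rate - hr)
            - f(p - hp, rate + hr) + f(p - hp, rate - hr)) / (4.0 * hp * hr)
    det = d_pp * d_rr - d_pr ** 2         # determinant of the 2x2 Hessian (same as for -H)
    return math.sqrt(-d_rr / det), math.sqrt(-d_pp / det)

# Simulated data (hypothetical settings): other causes exponential with mean 600 days, p_0 = 1;
# the risk of interest has p_i = 0.6 and an exponential age distribution with mean 300 days.
rng = random.Random(2024)
own, others = [], []
for _ in range(1200):
    x0 = rng.expovariate(1 / 600)
    x1 = rng.expovariate(1 / 300) if rng.random() < 0.6 else math.inf
    (own if x1 < x0 else others).append(min(x0, x1))
p_hat, rate_hat = fit_cause(own, others)
se_p, se_rate = observed_info_se(p_hat, rate_hat, own, others)
```

Golden-section search assumes the profile likelihood is unimodal in the search range, which holds for this simple model in practice but is not guaranteed in general; a general-purpose optimizer would be the usual choice for models such as the Makeham-Gompertz.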

4. NONPARAMETRIC ESTIMATION OF THE MODEL

Suppose the ages at death are ordered as $0 < t_1 \le t_2 \le \cdots \le t_n$, and let $G_i(t) = \Pr(X_i Y_i \le t)$ be the (improper) distribution function associated with cause $S_i$. The $\{t_r\}$ are then made up of ages associated with $S_i$ and ages at death due to the other causes. Thus we can consider the data to be a set of randomly censored observations. Kaplan and Meier [1958] have given a nonparametric ML estimate of $G_i(t)$, which they call the product-limit estimate. This estimate is given by

$$\hat G_i(t) = 1 - \prod_{r_i} \frac{n - r_i}{n - r_i + 1}, \qquad (17)$$

where $r_i$ assumes the values for which $t_{r_i} \le t$ and $t_{r_i}$ is an age associated with cause $S_i$. They have further shown that if the number of deaths at ages greater than or equal to $t$ tends to infinity as $n$ tends to infinity, then $\hat G_i(t)$ converges in probability to $G_i(t)$.
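A direct transcription of (17), as a sketch: deaths from the other causes play the role of censored observations, ties are assumed absent, and exact fractions are used so that the result can be checked by hand. With the four hypothetical deaths below (causes A, B, A, B), the A deaths at ranks 1 and 3 contribute factors $3/4$ and $1/2$, so $\hat G_A = 1 - 3/8 = 5/8$ beyond the last A death.

```python
from fractions import Fraction

def product_limit(ages, causes, cause, t):
    """Equation (17): 1 - prod over ranks r of `cause` deaths with t_r <= t of (n - r)/(n - r + 1).

    Deaths from other causes act as censored observations; ties are assumed absent.
    """
    n = len(ages)
    survival = Fraction(1)
    for r, (age, c) in enumerate(sorted(zip(ages, causes)), start=1):  # r = rank of the age
        if age > t:
            break
        if c == cause:
            survival *= Fraction(n - r, n - r + 1)
    return 1 - survival

# Hypothetical data: four deaths, causes 'A' and 'B'.
ages = [1.0, 2.0, 3.0, 4.0]
causes = ["A", "B", "A", "B"]
```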

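Since the model identifies $G_i(t)$ with $p_i F_i(t)$, we estimate the net probability $p_i$ by

$$\hat p_i = \hat G_i(\infty) \qquad (18)$$

and the age-at-death distribution $F_i$ by

$$\hat F_i = \hat G_i / \hat p_i. \qquad (19)$$

Combining these estimates we find

$$\hat F(t) = 1 - \prod_{i=0}^{k} [1 - \hat G_i(t)] = 1 - \prod_{i=0}^{k} \prod_{r_i} \frac{n - r_i}{n - r_i + 1} = r/n,$$

where $r$ is the largest integer $s$ such that $t_s \le t$; the last step holds because the double product runs over every rank $s \le r$ and telescopes to $(n - r)/n$. In other words, $\hat F(t)$ is the empirical distribution function for the complete set of age-at-death data.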
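The estimates (18) and (19), and the identity just derived, can be checked on a small hypothetical sample. The sketch repeats the product-limit computation of (17) so that it is self-contained, takes $\hat G_i(\infty)$ to be the estimate at the largest observed age, and verifies that $1 - \prod_i [1 - \hat G_i(t)]$ equals the empirical distribution function at every observed age.

```python
from fractions import Fraction

def product_limit(ages, causes, cause, t):
    """hat G_i(t) from (17); other causes are treated as censoring, no ties assumed."""
    n = len(ages)
    survival = Fraction(1)
    for r, (age, c) in enumerate(sorted(zip(ages, causes)), start=1):
        if age > t:
            break
        if c == cause:
            survival *= Fraction(n - r, n - r + 1)
    return 1 - survival

# Hypothetical sample of 8 ages at death (days) with their causes.
ages = [12, 15, 21, 26, 30, 34, 41, 47]
causes = ["other", "A", "B", "A", "other", "B", "A", "other"]
labels = sorted(set(causes))

# (18): hat p_i = hat G_i(infinity), i.e. the estimate at the largest observed age.
p_hat = {c: product_limit(ages, causes, c, max(ages)) for c in labels}

def f_hat(c, t):
    """(19): hat F_i(t) = hat G_i(t) / hat p_i."""
    return product_limit(ages, causes, c, t) / p_hat[c]

def total_hat(t):
    """1 - prod_i [1 - hat G_i(t)], which should equal the empirical distribution function."""
    survival = Fraction(1)
    for c in labels:
        survival *= 1 - product_limit(ages, causes, c, t)
    return 1 - survival
```

Because the largest age in this sample is an 'other causes' death, $\hat p$ for that group comes out as exactly 1, in line with the convention $p_0 = 1$.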

The calculation of the nonparametric estimates (18) and (19) is straightforward in comparison with the difficulty of obtaining the parameter estimates for a given functional form of $F_i(t)$, as described in section 3. An alternative method for finding these parametric estimates is to construct a sample based on the nonparametric estimate $\hat F_i$ of $F_i$. To do this, suppose $x_1, x_2, \ldots, x_m$ denote the points of increase of $\hat F_i$, with jumps of sizes $j_1, j_2, \ldots, j_m$ (i.e. $\hat F_i(x_l) - \hat F_i(x_l^-) = j_l$). Since each $j_l$ is a rational fraction, we let $j_l = h_l / d_l$ with $h_l$ and $d_l$ integers ($l = 1, \ldots, m$) and define $d = \prod_{l=1}^{m} d_l$, so that every $j_l d$ is an integer. Then a sample with $j_l d$ observations equal to $x_l$ ($l = 1, \ldots, m$) will have $\hat F_i$ as its empirical distribution function. We then propose estimating the parameters of $F_i$ by their ML estimates based on the constructed sample. The likelihood function for this sample is

$$L = \prod_{l=1}^{m} f_i(x_l)^{j_l d},$$

and thus it is required to maximize $\prod_{l=1}^{m} f_i(x_l)^{j_l}$. Estimates determined in this manner will naturally differ from the ML estimates; however, they are generally easier to find.
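A sketch of the constructed-sample idea. It assumes the jumps of $\hat F_i$ are available as exact fractions summing to 1, and it uses the least common multiple of the denominators in place of their product (any common multiple makes every $j_l d$ an integer). The exponential model and the data points are hypothetical; for that model the ML rate is the reciprocal of the sample mean, and it coincides with maximizing $\prod_l f_i(x_l)^{j_l}$ directly, which shows that the replication step is only a device.

```python
import math
from fractions import Fraction

def constructed_sample(points, jumps):
    """Replicate each point x_l exactly j_l * d times, with d a common multiple of the denominators."""
    if sum(jumps) != 1:
        raise ValueError("the jumps of hat F_i must sum to 1")
    d = math.lcm(*(j.denominator for j in jumps))   # requires Python 3.9+
    sample = []
    for x, j in zip(points, jumps):
        sample.extend([x] * int(j * d))
    return sample, d

# Hypothetical points of increase of hat F_i (days) and their jump sizes.
points = [100.0, 200.0, 300.0]
jumps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
sample, d = constructed_sample(points, jumps)

# For a hypothetical exponential F_i, the ML rate from the constructed sample is 1 / (sample mean),
rate_from_sample = len(sample) / sum(sample)
# which coincides with maximizing prod f(x_l)^{j_l}: rate = 1 / sum(j_l * x_l).
rate_from_weights = 1.0 / sum(float(j) * x for x, j in zip(points, jumps))
```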

Finally, there is a relationship between the product-limit estimate (17) and the usual actuarial interval estimates. To obtain the interval estimates, divide the time axis into $m$ intervals $I_j$ ($j = 1, \ldots, m$) and let $n_j$ denote the number of animals alive at the start of interval $I_j$. The net probability of death due to risk $S_i$ in interval $I_j$ is defined as (see Chiang [1968])

$$q_{ij} = \Pr\{\text{an animal alive at the start of } I_j \text{ will die in } I_j \text{ if } S_i \text{ is the only risk acting on the population}\}.$$

From these net interval probabilities we can form a cumulative mortality due to risk $S_i$ from the relation

$$F_{ij} = F_{i,j-1} + (1 - F_{i,j-1}) \, q_{ij} \qquad (20)$$

($F_{i0} = 0$), where $F_{ij}$ is the probability of death due to $S_i$ up to and including interval $I_j$ when $S_i$ is the only risk acting on the population.

Now assume the intervals are sufficiently small that at most one death occurs in each interval. Most estimates of $q_{ij}$, including Chiang's, are then of the form

$$\hat q_{ij} = \begin{cases} 1/n_j & \text{if a death due to } S_i \text{ occurs in } I_j, \\ 0 & \text{otherwise.} \end{cases}$$

Let $x_1 < x_2 < \cdots < x_{n_i}$ represent the observed deaths due to risk $S_i$, and define $s_l$ to be the number of animals alive just before the death at $x_l$. In the limit as the interval lengths tend to zero, the cumulative mortality function (20) has jumps only at the observed deaths $x_l$. The value of the estimated cumulative mortality function at $x_l$ is then

$$\hat F(x_l) = \hat F(x_{l-1}) + [1 - \hat F(x_{l-1})]/s_l, \qquad (21)$$

with $\hat F(x_0) = 0$. Solving (21) we find

$$\hat F(x_l) = 1 - \prod_{h=1}^{l} (1 - 1/s_h), \qquad (22)$$

which equals the product-limit estimate (17): in the absence of ties, an animal dying at the $r$th ordered age has $s = n - r + 1$ animals alive just before its death, so that $1 - 1/s = (n - r)/(n - r + 1)$.
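The recursion (21) and its solution (22) can be compared with exact arithmetic. In this hypothetical example $n = 8$ and the deaths from $S_i$ have ranks 2, 4 and 7, so $s_l = n - r + 1$ gives 7, 5 and 2 animals alive just before those deaths. Both forms give $1/7$, $11/35$ and $23/35$, and since $1 - 1/s_l = (n - r)/(n - r + 1)$, the last value is the product-limit estimate (17) for the same ranks.

```python
from fractions import Fraction

def recursive_estimate(at_risk):
    """Equation (21): F(x_l) = F(x_{l-1}) + [1 - F(x_{l-1})] / s_l, starting from F(x_0) = 0."""
    value, out = Fraction(0), []
    for s in at_risk:
        value = value + (1 - value) / s
        out.append(value)
    return out

def product_form(at_risk):
    """Equation (22): F(x_l) = 1 - prod_{h <= l} (1 - 1/s_h)."""
    survival, out = Fraction(1), []
    for s in at_risk:
        survival *= 1 - Fraction(1, s)
        out.append(1 - survival)
    return out

# Hypothetical example: n = 8 animals, deaths from S_i at ranks 2, 4 and 7,
# so s_l = n - r + 1 animals are alive just before each of these deaths.
n, ranks = 8, [2, 4, 7]
at_risk = [n - r + 1 for r in ranks]
```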

5. EXAMPLES

To illustrate the methods of the previous sections we shall use some mortality data kindly provided by Dr. H. E. Walburg, Jr., of the Oak Ridge National Laboratory. The data given in Tables 1 and 2 were obtained from a laboratory experiment on two groups of RFM strain male mice that had received a radiation dose of 300r at an age of 5-6 weeks. The first group of mice (Table 1) lived in a conventional laboratory environment, while the second group (Table 2) lived in a germ-free environment. We chose to consider only the two major causes of death (thymic lymphoma and reticulum cell sarcoma) and combined all other causes into a single group. Biologists believe that both of these diseases are lethal and that they are independent of one another and of other causes of death. Thus the basic assumptions of the model are reasonably well satisfied.

In selecting a functional form for $F_i(t)$, the Makeham-Gompertz distribution

$$f(t) = b (\ln a) e^{bt} a^{1 - e^{bt}}, \quad t > 0 \quad (a > 1, \; b > 0), \qquad (23)$$

was chosen because of its history in describing cumulative mortality curves (see Kimball [1960]).
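The density (23) integrates in closed form to $F(t) = 1 - a^{1 - e^{bt}}$, which can be inverted as $t = \ln[1 - \ln(1 - u)/\ln a]/b$ for simulating ages at death. The sketch below codes these three functions and checks them against numerical integration; the parameter values are the Table 3 estimates for reticulum cell sarcoma in the conventional mice, used only as an example.

```python
import math

def gompertz_pdf(t, a, b):
    """Equation (23): f(t) = b (ln a) e^{bt} a^{1 - e^{bt}}, for a > 1, b > 0, t > 0."""
    c = math.log(a)
    return b * c * math.exp(b * t) * math.exp(c * (1.0 - math.exp(b * t)))

def gompertz_cdf(t, a, b):
    """F(t) = 1 - a^{1 - e^{bt}}, the integral of (23) from 0 to t."""
    return -math.expm1(math.log(a) * (1.0 - math.exp(b * t)))

def gompertz_quantile(u, a, b):
    """Inverse of F: t = ln(1 - ln(1 - u) / ln a) / b."""
    return math.log1p(-math.log1p(-u) / math.log(a)) / b

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3

# Parameter values borrowed from Table 3 (reticulum cell sarcoma, conventional mice).
a, b = 1.00009, 0.01378
area = simpson(lambda t: gompertz_pdf(t, a, b), 0.0, 500.0)
```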


TABLE 1
AUTOPSY DATA FOR 99 RFM CONVENTIONAL MALE MICE WHICH RECEIVED A RADIATION DOSE OF 300r AT AGE 5-6 WEEKS (AGES AT DEATH IN DAYS)

Thymic Lymphoma (22%): 159, 189, 191, 198, 200, 207, 220, 235, 245, 250, 256, 261, 265, 266, 280, 343, 356, 383, 403, 414, 428, 432

Reticulum Cell Sarcoma (38%): 317, 318, 399, 495, 525, 536, 549, 552, 554, 557, 558, 571, 586, 594, 596, 605, 612, 621, 628, 631, 636, 643, 647, 648, 649, 661, 663, 666, 670, 695, 697, 700, 705, 712, 713, 738, 748, 753

Other Causes (39%): 40, 42, 51, 62, 163, 179, 206, 222, 228, 252, 249, 282, 324, 333, 341, 366, 385, 407, 420, 431, 441, 461, 462, 482, 517, 517, 524, 564, 567, 586, 619, 620, 621, 622, 647, 651, 686, 761, 763

For the given data, ML estimates of the parameters were obtained by maximizing the likelihood function numerically by computer; this turned out to be more expedient than solving the likelihood equations (13) and (14). The estimates are given in Table 3, and the resulting distributions, along with the estimated nonparametric distributions, are shown in Figures 1 and 2. A possible criterion for the adequacy of a parametric model for the mortality function is how well it agrees with the nonparametric mortality curve. On this basis, it appears from Figures 1 and 2 that the Makeham-Gompertz assumption is quite adequate for reticulum cell sarcoma but fails for thymic lymphoma. A more 'exponential-type' function such as the Weibull distribution should provide a better fit. The estimated and observed total mortality are given in Figure 3. The estimated total mortality was obtained from (1), using the ML estimates to estimate the individual $F_i$. Here the inadequacy of the Makeham-Gompertz distribution for thymic lymphoma reveals itself, particularly in the germ-free mice, where over a third of the observed deaths were due to thymic lymphoma and occurred in the lower age range.

Table 4 demonstrates a fairly close agreement between the net probabilities obtained under the Gompertz assumption and those obtained by the nonparametric method. Upon comparing these probabilities with the observed frequencies (Tables 1 and 2), we see a large discrepancy for reticulum cell sarcoma but not for thymic lymphoma. This is to be expected, since thymic lymphoma occurs early in life and reticulum cell sarcoma late. Another point of interest is that both the Gompertz and the nonparametric estimates of the crude probabilities (see (2)) agreed almost exactly with the observed incidence. Finally, the mean ages at death are given in Table 5. For the Gompertz model, the mean is found by substituting the parameter estimates into the expression for the Makeham-Gompertz mean given in the Appendix. The means for both models do not differ much from the observed means for the conventional mice; for the germ-free mice, however, the fitted means for reticulum cell sarcoma exceed the observed mean by roughly 80 to 100 days. This would alter any statement made about the effect of a germ-free environment on the mean age at death from reticulum cell sarcoma.

TABLE 2
AUTOPSY DATA FOR 82 RFM GERM-FREE MALE MICE WHICH RECEIVED A RADIATION DOSE OF 300r AT AGE 5-6 WEEKS (AGES AT DEATH IN DAYS)

Thymic Lymphoma (35%): 158, 192, 193, 194, 195, 202, 212, 215, 229, 230, 237, 240, 244, 247, 259, 300, 301, 321, 337, 415, 434, 444, 485, 496, 529, 537, 624, 707, 800

Reticulum Cell Sarcoma (18%): 430, 590, 606, 638, 655, 679, 691, 693, 696, 747, 752, 760, 778, 821, 986

Other Causes (46%): 136, 246, 255, 376, 421, 565, 616, 617, 652, 655, 658, 660, 662, 675, 681, 734, 736, 737, 757, 769, 777, 800, 807, 825, 855, 857, 864, 868, 870, 870, 873, 882, 895, 910, 934, 942, 1015, 1019

TABLE 3
ML ESTIMATES OF THE MAKEHAM-GOMPERTZ PARAMETERS

                         Conventional Mice       Germ-Free Mice
                         a          b            a          b
Thymic Lymphoma          1.026      .01118       1.35       .003375
Reticulum Cell Sarcoma   1.00009    .01378       1.0009     .008242
Other Causes             1.055      .00403       1.0019     .007321

FIGURE 1. ESTIMATED CUMULATIVE MORTALITY FUNCTIONS FOR THE CONVENTIONAL MICE (Makeham-Gompertz and nonparametric curves for thymic lymphoma and reticulum cell sarcoma; cumulative mortality from 0 to 1 against age in days, 0 to 1200)

FIGURE 2. ESTIMATED CUMULATIVE MORTALITY FUNCTIONS FOR THE GERM-FREE MICE (Makeham-Gompertz and nonparametric curves for thymic lymphoma and reticulum cell sarcoma; cumulative mortality from 0 to 1 against age in days, 0 to 1200)

FIGURE 3. ESTIMATED TOTAL CUMULATIVE MORTALITY FUNCTIONS (estimated and observed total mortality for the conventional and germ-free mice; cumulative mortality from 0 to 1 against age in days, 0 to 1100)


TABLE 4
ESTIMATED NET PROBABILITIES OF OCCURRENCE

                         Conventional Mice            Germ-Free Mice
                         Gompertz   Nonparametric     Gompertz   Nonparametric
Thymic Lymphoma          .26        .27               .39        .41
Reticulum Cell Sarcoma   .97        .94               .55        .58

TABLE 5
MEANS OF THE FITTED MORTALITY DISTRIBUTIONS AND OF THE OBSERVED DATA (DAYS)

                         Conventional Mice              Germ-Free Mice
                         Gompertz   Nonpara.   Obs.     Gompertz   Nonpara.   Obs.
Thymic Lymphoma          285        288        281      362        375        344
Reticulum Cell Sarcoma   634        607        609      782        803        701

6. FINAL COMMENTS

Although it should be a universally accepted procedure, we emphasize the need to examine mortality data on the basis of competing risks. All too often, unfortunately, conclusions about the effect of a particular treatment on a disease are based simply on the observed incidence and mean age at death. Our purpose has been to describe a risk by both a net probability and an associated age-at-death distribution. When a treatment group is compared with a control group, the net probability, the age distribution, or both may change. This raises the problem of deciding whether or not these changes are significant. One useful approach, considered by Efron [1965], Gehan [1965], and Breslow [1970], has been to modify the Wilcoxon statistic to account for censoring. (Using Breslow's statistic as an approximate test of differences between the germ-free and conventional mice of section 5, we find a highly significant difference for reticulum cell sarcoma but no difference for thymic lymphoma.) These tests, however, compare the net probability and the age-at-death distribution jointly, so it would be useful to be able to test these quantities individually.

A second problem of practical importance is the weakening of the assumption of independent risks. This is certainly a relevant concern, and it will probably require some large and carefully designed laboratory experiments to suggest reasonable structures for models of disease dependencies.

ACKNOWLEDGMENT

I am indebted to Dr. H. E. Walburg, Jr., of the Oak Ridge National Laboratory for providing the data used and for his guidance on biological questions.


A REPRESENTATION OF MORTALITY DATA BY COMPETING RISKS

SUMMARY

The mortality of a cohort is represented by a stochastic combination of competing risks (diseases). Each risk is described by a probability distribution of the age at death and by the crossed probabilities of occurrence. This representation is illustrated by a set of pathology data from a well-controlled animal experiment laboratory.

REFERENCES

Berkson, J. and Elveback, L. [1960]. Competing exponential risks, with particular reference to smoking and lung cancer. J. Amer. Statist. Ass. 55, 415-28.

Breslow, N. [1970]. A generalized Kruskal-Wallis test for comparing K samples subject to unequal patterns of censorship. Biometrika 57, 579-94.

Boag, J. W. [1949]. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J. R. Statist. Soc. B 11, 15-53.

Chiang, C. L. [1968]. Introduction to Stochastic Processes in Biostatistics. Wiley, New York.

David, H. A. [1970]. On Chiang's proportionality assumption in the theory of competing risks. Biometrics 26, 336-9.

Efron, B. [1965]. The two sample problem with censored data. Proc. 5th Berkeley Symp. Math. Statist. Prob. IV, 831-53.

Gehan, E. A. [1965]. A generalized Wilcoxon test for comparing arbitrarily singly censored samples. Biometrika 52, 203-23.

Grenander, U. [1956]. On the theory of mortality measurement. Skand. Aktuarietidskr. 39, 1-26.

Kaplan, E. L. and Meier, P. [1958]. Nonparametric estimation from incomplete observations. J. Amer. Statist. Ass. 53, 457-81.

Kimball, A. W. [1958]. Disease incidence estimation in populations subject to multiple causes of death. Bull. Internat. Statist. Inst. 36, 193-204.

Kimball, A. W. [1960]. Estimation of mortality intensities in animal experiments. Biometrics 16, 505-21.

Kodlin, D. [1961]. Survival time analysis for treatment evaluation in cancer therapy. Cancer Research 21, 1103-07.

Moeschberger, M. L. and David, H. A. [1971]. Life-tests under competing causes of failure and the theory of competing risks. Biometrics 27, 909-33.

Sampford, M. R. [1952]. The estimation of response-time distributions II. Multi-stimulus distributions. Biometrics 8, 307-53.

Sampford, M. R. [1954]. The estimation of response-time distributions III. Truncation and survival. Biometrics 10, 531-61.

Seal, H. L. [1954]. The estimation of mortality and other decremental probabilities. Skand. Aktuarietidskr. 37, 137-62.

APPENDIX

In order to evaluate the moments of the Makeham-Gompertz density (23) we write

$$\int_0^\infty t^m \, b (\ln a) e^{bt} a^{1 - e^{bt}} \, dt = a b^{-m} \int_{\ln a}^{\infty} [\ln (u / \ln a)]^m e^{-u} \, du,$$

which follows from the substitution $u = (\ln a) e^{bt}$. Then, by using the relationship

$$\int_y^\infty (\ln t)^m e^{-t} \, dt = \Big[ \frac{\partial^m}{\partial s^m} \Gamma(s + 1, y) \Big]_{s = 0},$$

where

$$\Gamma(s, y) = \int_y^\infty t^{s - 1} e^{-t} \, dt = \Gamma(s) - \sum_{n=0}^{\infty} \frac{(-1)^n y^{s + n}}{(s + n) \, n!},$$

it can be shown that the first two moments of (23) are

$$-\frac{a}{b} \Big[ C + \ln \ln a + \sum_{j=1}^{\infty} \frac{(-\ln a)^j}{j \cdot j!} \Big]$$

and

$$\frac{a}{b^2} \Big[ (C + \ln \ln a)^2 + \frac{\pi^2}{6} + 2 \sum_{j=1}^{\infty} \frac{(-\ln a)^j}{j^2 \cdot j!} \Big],$$

respectively, where $C$ is Euler's constant.
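The two series can be checked numerically. The sketch below, which is not the author's computation, sums the series and compares them with direct numerical integration of (23). For the Table 3 estimates it reproduces the Gompertz means of Table 5: about 634 days for reticulum cell sarcoma in the conventional mice and about 362 days for thymic lymphoma in the germ-free mice. The truncation of the series at 80 terms and the Simpson integration range are arbitrary choices that are ample for these parameter values.

```python
import math

EULER = 0.5772156649015329  # Euler's constant C

def gompertz_moments(a, b, terms=80):
    """First two moments of (23) from the Appendix series (requires a > 1, b > 0)."""
    c = math.log(a)
    term, s1, s2 = 1.0, 0.0, 0.0
    for j in range(1, terms + 1):
        term *= -c / j              # term = (-ln a)^j / j!
        s1 += term / j
        s2 += term / j ** 2
    base = EULER + math.log(c)      # C + ln ln a
    mean = -(a / b) * (base + s1)
    second = (a / b ** 2) * (base ** 2 + math.pi ** 2 / 6 + 2.0 * s2)
    return mean, second

def gompertz_pdf(t, a, b):
    """Equation (23)."""
    c = math.log(a)
    return b * c * math.exp(b * t) * math.exp(c * (1.0 - math.exp(b * t)))

def simpson(g, lo, hi, n=20_000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(lo + k * h)
    return s * h / 3

# Table 3 estimates for reticulum cell sarcoma in the conventional mice.
a, b = 1.00009, 0.01378
mean, second = gompertz_moments(a, b)
numeric_mean = simpson(lambda t: t * gompertz_pdf(t, a, b), 0.0, 2000.0)
numeric_second = simpson(lambda t: t * t * gompertz_pdf(t, a, b), 0.0, 2000.0)
```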

Received May 1971, Revised December 1971

Key Words: Competing risks; Survival experiment; Cumulative mortality function; Gompertz distribution.
