

Environment International Vol. 1, pp. 347-350. Pergamon Press Ltd. 1978. Printed in Great Britain.

Nonidentifiability of Competing Risks*

David M. Rose
Energy Technology Applications Division, Boeing Computer Services Company, Seattle, Washington, U.S.A.

Frequently one is interested in examining the survival experience of a set of individuals exposed to $k$ risks. If $W$ is the time of failure for the individual and $J$ is the indicator variable for cause of death, then the competing risk framework assumes that $W = \min_i X_i$ and $J = j$ if $X_j$ is the minimum, where the $X_i$'s are the potential lifetimes when only risk $R_i$ is operating on the population. To examine the underlying structure of the survival experience one has to know the joint distribution of the $X_i$ ($F_X$). It is shown here that if only $W$ and $J$ are observed, the joint distribution of the $X_i$ ($F_X$) is nonidentifiable.

1. Introduction

The subject of competing risks arises from the study of a population, or several populations, where each member of a population has a lifetime $W$ and, associated with the failure at time $W$, a cause of failure $J$. Let $R_j$ denote the risk associated with cause of failure $J = j$. The usual treatment of the subject thus assumes that each failure is attributable to a unique cause. The populations may be biological units, where lifetime may be time until death or time until onset of a well-defined state, such as remission of disease. The associated risks may be the set of possible pathological causes of death, disease remission, or loss to follow-up. The populations may also be sets of physical units, such as motors, where the lifetime may be running time until breakdown, and the risks, $R_j$, are the causes of breakdown.

Concern about the two fundamental problems described below has focused attention on the subject of competing risks. Whether the competing-risks method can solve these problems is closely related to the main topic addressed here: the nonidentifiability of competing risks.

Problem 1: A population is subject to $k$ modes of failure. One risk, $R_1$, is removed. What is the resulting effect on the distribution of life $W$ and cause of failure $J$? What mode of failure becomes dominant?

Problem 2: Two populations are to be compared with respect to risk $R_1$. The populations have dissimilar histories with respect to the other risks (or diseases) present. How can the effect of these nuisance risks be removed in order to compare the incidence of $R_1$ in these two populations?

*Research supported under an N.I.H. Biometry grant, 1971-72, University of Washington.

The interest of biostatisticians in the area of competing risks is, in the author's view, chiefly due to the work of C. L. Chiang. His papers have collected much of the prior work in the area in addition to his own work and presented the material in an extremely readable and well-organized fashion (e.g., Chiang, 1961, 1968).

2. Statement of structure

The above problems may be mapped into the following mathematical structure. Each individual has a lifetime $W$ and a mode of failure $J$, with cumulative (in $W$) distribution

$$F_{W,J}(w, j) = P[W \le w,\ J = j],$$

and density $f_{W,J}(w, j)$. $W$ is sometimes referred to as the crude lifetime. Each individual has a set of $k$ potential or net lifetimes $X_i$, $i = 1, \ldots, k$, where $X_i$ is the lifetime if risk $R_i$ is the only mode of failure. These net lifetimes are related to $W$, $J$ by

$$W = \min_i X_i$$

and

$$J = j \quad \text{if } X_j < X_i \text{ for all } i \ne j.$$

So that $W$ and $J$ are well defined, it is assumed that

$$P[X_i = X_j] = 0 \quad \text{for } i \ne j.$$

These net lifetimes are governed by a joint density $f_X(x)$, cumulative distribution

$$F_X(x) = P[X_1 \le x_1, \ldots, X_k \le x_k],$$

and survival or reliability function

$$H_X(x) = P[X_1 > x_1, \ldots, X_k > x_k].$$

The marginal densities for the $X_i$ are denoted by $f_{X_i}(x)$, the


marginal cumulative distribution by

$$F_{X_i}(x) = P[X_i \le x],$$

and the marginal survival function by $H_{X_i}(x) = P[X_i > x]$.

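Similarly the marginal density, cumulative distribution and survival function for $W$ are denoted by $f_W(w)$, $F_W(w)$ and $H_W(w)$. The basic relation between $W$, $J$, and $X$ can then be given by

$$F_{W,J}(w, j) = \int_0^w \int_{x_j}^\infty \cdots \int_{x_j}^\infty f_X(x)\, dx_1 \cdots dx_{j-1}\, dx_{j+1} \cdots dx_k\, dx_j$$

or

$$f_{W,J}(w, j) = \int_w^\infty \cdots \int_w^\infty f_X(x)\Big|_{x_j = w}\, dx_1 \cdots dx_{j-1}\, dx_{j+1} \cdots dx_k, \tag{2.1}$$

where the following relation holds:

$$f_{W,J}(w, j) = -\frac{\partial H_X(x)}{\partial x_j}\bigg|_{x_1 = x_2 = \cdots = x_k = w}. \tag{2.2}$$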

The chief problem in competing risks involves the situation where one makes all observations in the $(W, J)$ space, but would like to make statements about the structure in the $X$ space.
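The reduction from the $X$ space to the observable $(W, J)$ space can be sketched in a few lines of Python. This is only an illustration: the function names, the choice of independent exponential net lifetimes, and the rate values are assumptions made for the example and are not part of the paper.

```python
import random


def crude_observation(net_lifetimes):
    """Map net lifetimes (X_1, ..., X_k) to the crude pair (W, J).

    W is the smallest net lifetime and J is the 1-based index of the
    risk that attains it; ties are assumed to have probability zero.
    """
    w = min(net_lifetimes)
    j = net_lifetimes.index(w) + 1
    return w, j


def simulate_crude(n, rates, seed=0):
    """Draw n crude observations from hypothetical net lifetimes.

    Independence and the exponential form are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [crude_observation([rng.expovariate(r) for r in rates])
            for _ in range(n)]


# Only these (W, J) pairs are observed; the losing net lifetimes are discarded.
sample = simulate_crude(5, rates=[0.5, 1.5])
```

Each simulated individual carries $k$ net lifetimes, but only the minimum and its index survive into `sample`, which is the information loss the rest of the paper is concerned with.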

In all subsequent discussion the joint distribution $F_{W,J}$ will be assumed to be known without error. Does knowledge of $F_{W,J}$ determine $F_X$; that is, is there identifiability?

In discussing the problem it is convenient to deal with hazard rates, instantaneous failure rates, or forces of mortality, defined as

$$\lambda_{X_i}(x) = f_{X_i}(x) / H_{X_i}(x)$$

and

$$\lambda_W(w) = f_W(w) / H_W(w).$$

Note that $\lambda_W$ determines $F_W$; $\lambda_{X_i}$ determines $F_{X_i}$; but $\lambda_{X_1}, \ldots, \lambda_{X_k}$ do not determine $F_X$ (Fréchet's theorem).

A partial hazard rate $\lambda_{W,J}(w, j)$ is introduced as the instantaneous failure rate from risk $R_j$ when all modes of failure are present:

$$\lambda_{W,J}(w, j) = f_{W,J}(w, j) / H_W(w).$$

Thus, one obtains

$$\sum_{j=1}^{k} \lambda_{W,J}(w, j) = \lambda_W(w).$$
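The summation identity follows in one line from the definitions, since each failure is attributed to exactly one cause and hence $f_W(w) = \sum_j f_{W,J}(w, j)$. Written out as a short worked step:

$$
\sum_{j=1}^{k} \lambda_{W,J}(w, j)
= \frac{1}{H_W(w)} \sum_{j=1}^{k} f_{W,J}(w, j)
= \frac{f_W(w)}{H_W(w)}
= \lambda_W(w).
$$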

3. Identifiability

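What assumptions are necessary such that $F_{W,J}$ uniquely determines $H_X$?

Proposition 1: If the $X_i$'s are stochastically independent then $F_{W,J}$ uniquely determines $H_X$.

Proof: From $F_{W,J}(w, j)$ one obtains directly $F_W(w)$, $H_W(w)$, and $f_{W,J}(w, j)$. Since the $X_i$ are independent, one has

$$f_X(x) = \prod_{i=1}^{k} f_{X_i}(x_i).$$

Using equation (2.1), one obtains

$$
\begin{aligned}
f_{W,J}(w, j) &= \int_w^\infty \cdots \int_w^\infty f_X(x)\Big|_{x_j = w} \prod_{\substack{i=1 \\ i \ne j}}^{k} dx_i \\
&= f_{X_j}(w) \prod_{\substack{i=1 \\ i \ne j}}^{k} \int_w^\infty f_{X_i}(x_i)\, dx_i \\
&= H_W(w)\, \lambda_{X_j}(w),
\end{aligned}
\tag{3.1}
$$

where the last step uses $f_{X_j}(w) = \lambda_{X_j}(w) H_{X_j}(w)$ and, under independence, $H_W(w) = \prod_{i=1}^{k} H_{X_i}(w)$.

Then from equation (3.1) and the definition of $\lambda_{W,J}$, one obtains

$$\lambda_{X_j}(w) = f_{W,J}(w, j) / H_W(w) = \lambda_{W,J}(w, j) \tag{3.2}$$

and

$$H_{X_j}(x) = \exp\left[-\int_0^x \lambda_{W,J}(w, j)\, dw\right],$$

which yields $H_X$ as a function of $\lambda_{W,J}$. Thus, if one assumes independence one can observe in the $(W, J)$ space and determine the structure in the $X$ space.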
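Proposition 1 can be illustrated numerically. The sketch below simulates independent exponential net lifetimes (an assumption made only for this example, as are the rates and sample size), keeps only the crude pairs $(W, J)$, and estimates the integrated partial hazard with a Nelson-Aalen-type sum. That estimator is a standard empirical device swapped in here for the exact integral; it is not taken from the paper.

```python
import math
import random


def cumulative_partial_hazard(samples, cause, x):
    """Nelson-Aalen-type estimate of the integral over [0, x] of the
    partial hazard for `cause`, from uncensored (w, j) pairs."""
    ordered = sorted(samples)
    n = len(ordered)
    total = 0.0
    for rank, (w, j) in enumerate(ordered):
        if w > x:
            break
        if j == cause:
            total += 1.0 / (n - rank)  # n - rank individuals are still at risk
    return total


def net_survival_estimate(samples, cause, x):
    """Estimate H_{X_j}(x) as exp(-integrated partial hazard).

    This is valid only under the independence assumption of Proposition 1.
    """
    return math.exp(-cumulative_partial_hazard(samples, cause, x))


# Hypothetical example: two independent exponential risks.
rng = random.Random(1)
rates = {1: 0.5, 2: 1.5}
samples = []
for _ in range(20000):
    x1 = rng.expovariate(rates[1])
    x2 = rng.expovariate(rates[2])
    samples.append((x1, 1) if x1 < x2 else (x2, 2))

estimate = net_survival_estimate(samples, cause=1, x=0.5)
truth = math.exp(-rates[1] * 0.5)  # net survival of X_1 at 0.5
```

Under independence, `estimate` should be close to `truth` even though $X_1$ itself is never observed once risk 2 strikes first. Without independence the same computation still runs, but what it returns need not be the net survival function.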

Proposition 2: $F_{W,J}$ does not uniquely define $H_X$ ($H_X$ is nonidentifiable) without further assumptions.

Proof: The nonidentifiability of $H_X$ is most easily shown by a counterexample to the identifiability conjecture (Example 1 below). The underlying structure that leads to this nonidentifiability is most easily shown by the following geometrical argument.

Let $k = 2$. There are two risks $R_1$, $R_2$ operating on the population with net lifetimes $X_1$, $X_2$. Then one obtains

$$F_{W,J}(w, 1) = P[W \le w,\ X_1 < X_2] = P[X_1 \le w,\ X_1 < X_2] \tag{3.3}$$

and, similarly,

$$F_{W,J}(w, 2) = P[X_2 \le w,\ X_2 < X_1].$$

If $w_1 > w_0$, then the probability that an item will fail from risk $R_1$ in the interval $(w_0, w_1)$ is given by

$$F_{W,J}(w_1, 1) - F_{W,J}(w_0, 1) = P[w_0 < X_1 \le w_1,\ X_1 < X_2]$$

(see Fig. 1). Thus from equation (3.3) we can find the probability of any "trapezoid" bounded on the bottom by the 45° line, bounded on the left and right by lines parallel to the $X_2$ axis, and unbounded on the top. Similarly, using $F_{W,J}(w, 2)$, we can find the probability of "trapezoids" on the other side of the 45° line (see Fig. 1).

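[Figure: the $(x_1, x_2)$ plane with the line $X_1 = X_2$; one strip above the line between $w_0$ and $w_1$ carries probability $F_{W,J}(w_1, 1) - F_{W,J}(w_0, 1)$, and one strip below the line between $w_2$ and $w_3$ carries probability $F_{W,J}(w_3, 2) - F_{W,J}(w_2, 2)$.]

Fig. 1. Sets whose probabilities are well defined.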


The probability of any set in the $\sigma$-algebra generated by these "trapezoids" is well defined in terms of $F_{W,J}$. However, this $\sigma$-algebra is properly contained in the Borel field on this $(X_1, X_2)$ space. (For example, a rectangle may not be formed by unions and intersections of these trapezoids.) The extension of the probability measure from this $\sigma$-algebra to the Borel field is not unique (see Example 1).

Example 1: Let $X_1^*$ and $X_2^*$ be independent uniform (0,1) random variables. Thus $f_X^*(x)$ is one over the unit square. In the upper left-hand corner of the unit square we draw a square with sides of length $d < \tfrac{1}{2}$, and subdivide that square into four squares. Then, starting with the upper left square, we place its probability mass evenly over the lower left square, and take the probability mass of the lower right square and place it evenly in the upper right square. The resulting bivariate density we obtain for the new variables $X_1$ and $X_2$ is

$$
f_X(x_1, x_2) =
\begin{cases}
0 & \text{if } 0 < x_1 < d/2,\ 1 - d/2 < x_2 < 1, \\
  & \text{or } d/2 < x_1 < d,\ 1 - d < x_2 < 1 - d/2; \\[4pt]
2 & \text{if } 0 < x_1 < d/2,\ 1 - d < x_2 < 1 - d/2, \\
  & \text{or } d/2 < x_1 < d,\ 1 - d/2 < x_2 < 1; \\[4pt]
1 & \text{for all other values of } x_1 \text{ and } x_2 \text{ in the unit square}; \\[4pt]
0 & \text{if } (x_1, x_2) \text{ is not in the unit square.}
\end{cases}
$$

(See Fig. 2.)

Note that the marginal distributions of $X_1$ and $X_2$ are still uniform (0,1), but $X_1$ and $X_2$ are not independent since

$$f_X(x) \ne f_{X_1}(x_1)\, f_{X_2}(x_2),$$

the inequality holding whenever $X_1$ and $X_2$ lie in the square where the uniform density is "disturbed". $F_{W,J}(w, j)$ can be quickly evaluated geometrically if we note that the volume under $f_X(x)$ for the shaded area in Fig. 3 is the same as for $f_X^*(x)$. The area where $f_X$ is zero in the shaded area is the same as the area where $f_X$ is 2 in the shaded area.

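[Figure: the unit square in the $(x_1, x_2)$ plane, with the perturbed square of side $d$ in the upper left-hand corner divided at $x_1 = d/2$ and $x_2 = 1 - d/2$.]

Fig. 2. Bivariate dependent density illustrating Proposition 2.

[Figure: the unit square with the 45° line and the hatched region to the left of $x_1 = w$ and above the line.]

Fig. 3. $F_{W,J}(w, 1)$ equals the hatched area for the density given by Fig. 2.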

Furthermore, this volume is constant as the parameter $d$ varies between 0 and $\tfrac{1}{2}$, and as the perturbed square floats around above or below, but does not intersect, the 45° line: a family of distributions is thus generated that all look the same in the $(W, J)$ space.
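The construction of Example 1 can be checked by simulation. In the sketch below, which is only an illustration, the mass shift is implemented as a vertical move of each affected point by $d/2$; the helper names, the value $d = 0.4$, and the sample size are assumptions made for the example. Because the move leaves $x_1$ unchanged and keeps the point above the 45° line, every individual keeps exactly the same $(W, J)$.

```python
import random


def perturb(x1, x2, d):
    """Apply the mass shift of Example 1 to one point of the unit square.

    Points in the upper left sub-square move down by d/2; points in the
    lower right sub-square move up by d/2; all other points stay put.
    """
    if 0 < x1 < d / 2 and 1 - d / 2 < x2 < 1:
        return x1, x2 - d / 2
    if d / 2 < x1 < d and 1 - d < x2 < 1 - d / 2:
        return x1, x2 + d / 2
    return x1, x2


def crude(x1, x2):
    """Crude lifetime W and cause J for two net lifetimes."""
    return (x1, 1) if x1 < x2 else (x2, 2)


def in_upper_left(x1, x2, d):
    """True if the point lies in the upper left sub-square."""
    return 0 < x1 < d / 2 and 1 - d / 2 < x2 < 1


d = 0.4  # illustrative value with d < 1/2
rng = random.Random(0)
independent = [(rng.random(), rng.random()) for _ in range(50000)]
dependent = [perturb(x1, x2, d) for x1, x2 in independent]

# Observable data agree point by point ...
same_crude = all(crude(*p) == crude(*q)
                 for p, q in zip(independent, dependent))
# ... yet the joint laws differ: the perturbed law puts no mass here.
count_before = sum(in_upper_left(x1, x2, d) for x1, x2 in independent)
count_after = sum(in_upper_left(x1, x2, d) for x1, x2 in dependent)
```

The two samples are indistinguishable in the $(W, J)$ space, while `count_before` and `count_after` show that they come from different joint distributions of $(X_1, X_2)$.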

The question arises as to what structural assumptions are necessary to yield identifiability. Proposition 1 shows independence of risks is sufficient. Independence of the risks implies (equation 3.2) equality of the marginal hazard rate $\lambda_{X_j}$ and the partial hazard rate $\lambda_{W,J}$. One might conjecture that the property (3.2) implies independence and hence identifiability. This is false, as can be seen through Example 1.

The family of distributions generated when we let the perturbed square "float around" above and below the 45° line all look the same in the $(W, J)$ space, and yield the same value for

$$\lambda_{W,J}(w, j) = \lambda_{X_j}(w) = \frac{1}{1 - w}, \quad 0 < w < 1,\ j = 1, 2,$$

which is the same as for independent uniform net lives.
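The value $1/(1 - w)$ can be verified directly for the independent uniform case, which by Example 1 shares its $F_{W,J}$ and its uniform marginals with every member of the perturbed family. A brief worked check, using (3.1) with $k = 2$ and $i \ne j$:

$$
\begin{aligned}
f_{W,J}(w, j) &= f_{X_j}(w)\, H_{X_i}(w) = 1 \cdot (1 - w), \\
H_W(w) &= H_{X_1}(w)\, H_{X_2}(w) = (1 - w)^2, \\
\lambda_{W,J}(w, j) &= \frac{1 - w}{(1 - w)^2} = \frac{1}{1 - w}, \\
\lambda_{X_j}(w) &= \frac{f_{X_j}(w)}{H_{X_j}(w)} = \frac{1}{1 - w}, \qquad 0 < w < 1.
\end{aligned}
$$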

Proposition 3: If the net lifetimes are dependent with joint survival function $H_X$, generating a distribution for crude life $F_{W,J}$, there exists a distribution $H_X^*$ for which the risks are independent and which generates the same crude life distribution $F_{W,J}$.

Proof: From $F_{W,J}$ one obtains the partial hazard rates $\lambda_{W,J}$ and sets the marginal hazard rates $\lambda_j^*$ equal to them:

$$\lambda_j^*(w) = \lambda_{W,J}(w, j).$$

Then one obtains

$$H_{X_j}^*(x) = \exp\left[-\int_0^x \lambda_j^*(w)\, dw\right]$$

and

$$H_X^*(x) = \prod_{i=1}^{k} H_{X_i}^*(x_i).$$

This joint distribution looks the same as $H_X$ in the $(W, J)$ space since the $\lambda_{W,J}$ uniquely define $F_{W,J}$, again yielding the nonidentifiability of $H_X$. It should be noted that the distributions $H_{X_i}^*$ may be improper (have some mass at infinity) if

$$\int_0^\infty \lambda_i^*(w)\, dw$$

is finite.
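The step that the independent construction reproduces the crude distribution can be written out explicitly. The following is a sketch, assuming nonnegative lifetimes so that $H_W(0) = 1$ and $H_W(w) = \exp[-\int_0^w \lambda_W(u)\, du]$; it applies (3.1) to the independent model and then uses the summation identity for the partial hazard rates:

$$
\begin{aligned}
f_{W,J}^*(w, j)
&= \lambda_j^*(w) \prod_{i=1}^{k} H_{X_i}^*(w) \\
&= \lambda_{W,J}(w, j)\, \exp\left[-\int_0^w \sum_{i=1}^{k} \lambda_{W,J}(u, i)\, du\right] \\
&= \lambda_{W,J}(w, j)\, \exp\left[-\int_0^w \lambda_W(u)\, du\right] \\
&= \lambda_{W,J}(w, j)\, H_W(w) = f_{W,J}(w, j).
\end{aligned}
$$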


Example 2: Proposition 3 can be illustrated using the bivariate dependent exponential model of Gumbel (1960):

$$H_X(x) = \exp(-\lambda_1 x_1 - \lambda_2 x_2 - \delta x_1 x_2), \quad 0 \le \delta \le \lambda_1 \lambda_2,$$

with exponential marginals; thus, one has

$$\lambda_{X_i} = \lambda_i$$

and uses equation (2.2) to obtain

$$f_{W,J}(w, j) = (\lambda_j + \delta w) \exp[-(\lambda_1 + \lambda_2) w - \delta w^2]$$

and

$$\lambda_{W,J}(w, j) = \lambda_j + \delta w.$$

Thus, the partial and marginal hazard rates are not equal. Letting

$$\lambda_j^*(w) = \lambda_j + \delta w$$

defines a distribution of independent risks with linear hazard rates for the marginal distributions, which looks the same in the $(W, J)$ space as the bivariate Gumbel:

$$
\begin{aligned}
H_{X_j}^*(x) &= \exp\left[-\int_0^x (\lambda_j + \delta w)\, dw\right] \\
&= \exp\left[-\left(\lambda_j x + \tfrac{\delta}{2} x^2\right)\right].
\end{aligned}
$$
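Example 2 lends itself to a direct numerical check. The sketch below compares the crude density of the Gumbel model, obtained from relation (2.2) by a central finite difference, with the crude density of the independent model with linear hazards. The parameter values, function names, and step size are assumptions chosen only for illustration.

```python
import math

# Illustrative parameter values satisfying 0 <= DELTA <= L1 * L2.
L1, L2, DELTA = 1.0, 2.0, 0.5


def gumbel_survival(x1, x2):
    """Joint survival function H_X of the dependent Gumbel model."""
    return math.exp(-L1 * x1 - L2 * x2 - DELTA * x1 * x2)


def crude_density_gumbel(w, j, h=1e-5):
    """f_{W,J}(w, j) via (2.2): minus the partial derivative of H_X
    with respect to x_j, evaluated at x_1 = x_2 = w."""
    if j == 1:
        slope = (gumbel_survival(w + h, w) - gumbel_survival(w - h, w)) / (2 * h)
    else:
        slope = (gumbel_survival(w, w + h) - gumbel_survival(w, w - h)) / (2 * h)
    return -slope


def independent_net_survival(x, j):
    """Marginal survival H*_{X_j} of the independent model."""
    rate = L1 if j == 1 else L2
    return math.exp(-(rate * x + DELTA * x * x / 2))


def crude_density_independent(w, j):
    """f*_{W,J}(w, j) = hazard of X_j times the product of marginals."""
    rate = L1 if j == 1 else L2
    return ((rate + DELTA * w)
            * independent_net_survival(w, 1)
            * independent_net_survival(w, 2))


max_gap = max(abs(crude_density_gumbel(w, j) - crude_density_independent(w, j))
              for w in (0.1, 0.5, 1.0, 2.0) for j in (1, 2))
```

Up to finite-difference error, `max_gap` is zero: the dependent and independent models generate the same crude density at every point tested, as Proposition 3 asserts.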

$H_X$ is also identifiable in the following trivial case. The form of $H_{X,\theta}$ is known but the vector of parameters $\theta$ is unknown. $F_{W,J}$ also depends on $\theta$. If the vector $\theta$ is estimable in the $(W, J)$ space, then the structure of $H_X$ is identifiable, and "solutions" to Problems 1 and 2 are available.

Discussion

The competing-risks framework presented above is an attempt to provide a structure to Problems 1 and 2 which admits solutions.

In Problem 1, a population is subject to $k$ modes of failure. If a mode of failure, $R_1$, is removed, how will the survival experience change? What will be the leading mode of failure? The applicability of the standard competing-risk structure to this problem is subject to question. The usual approach is to let $X_1$ be integrated out of $H_X$, thus letting the survival experience be

$$W^* = \min_{i = 2, \ldots, k} X_i$$

with $J^*$ the corresponding indicator variable. If the $X_i$'s are not stochastically independent, then $H_X$ is not identifiable, and thus no solution is possible. Pursuing an answer by assuming independence for the sake of simplicity invites error. Bounds are, however, available [see Peterson (1977)]. The assumption that removing a risk is equivalent to integrating out the corresponding variable from $f_X$ is subject to criticism. There are situations where removing a risk results in a decrease in the original survival function $H_W$, which this approach does not allow. This approach may be replaced by others where it seems appropriate; e.g., removing risk $R_1$ is equivalent to ensuring that $X_1$ is greater than some value $x_0$, and thus

$$W^* = \min_{i = 1, \ldots, k} X_i, \quad \text{given } X_1 > x_0.$$

This approach may make sense in a physical system where one component is replaced by an improved component with a guarantee time of $x_0$, but the replacement of this component does not affect the interrelationships of the $X_i$'s.
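The practical consequence for Problem 1 can be seen with the two models of Example 2, which agree in the $(W, J)$ space. With $k = 2$, integrating $X_1$ out leaves $W^* = X_2$, so the predicted survival after removing $R_1$ is the marginal survival of $X_2$ under whichever model is assumed. The sketch below uses hypothetical parameter values and function names chosen only for illustration.

```python
import math

# Illustrative parameter values satisfying 0 <= DELTA <= L1 * L2.
L1, L2, DELTA = 1.0, 2.0, 0.5


def survival_without_risk1_gumbel(x):
    """Marginal survival of X_2 under the dependent Gumbel model,
    obtained by setting x_1 = 0 in H_X."""
    return math.exp(-L2 * x)


def survival_without_risk1_independent(x):
    """Marginal survival of X_2 under the independent model that
    generates the same crude distribution."""
    return math.exp(-(L2 * x + DELTA * x * x / 2))


# Gap between the two predictions at a few illustrative ages.
gaps = {x: survival_without_risk1_gumbel(x) - survival_without_risk1_independent(x)
        for x in (0.5, 1.0, 2.0)}
```

Both models fit the observable data equally well, yet they give different answers about survival once $R_1$ is removed, and no amount of $(W, J)$ data can choose between them.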

Problem 2 addresses the comparison of two populations with respect to a given mode of failure, removing the effect of all other nuisance risks. The populations usually have dissimilar histories with respect to the nuisance risks. Usually there is serious doubt as to the independence of the risks. The risks can only be theoretically removed in that the populations will continue to be subject to all modes of failure in the future. Comparing the crude survival curves does not seem a good method of comparison, and the independent competing-risks approach is also of questionable value. The integral of the partial hazard rate may be a metric that provides some method of comparison.

References

Chiang C. L. (1961) On the probability of death from specific causes in the presence of competing risks, Proc. Fourth Berkeley Symp., pp. 169-180. University of California Press, Berkeley, California.

Chiang C. L. (1968) Introduction to Stochastic Processes in Biostatistics, John Wiley and Sons, New York.

Gumbel E. J. (1960) Bivariate exponential distributions, J. Am. Statist. Assoc. 55, 698-707.

Peterson A. V. (1977) Dependent competing risks: Bounds for net-survival functions with fixed crude survival functions, Proceedings of this workshop.

Discussion

Sohel: Has anybody tried to give bounds for net-survival functions in the dependent case?

Rose: That is a good question. I suggested this in a talk, I think, at a meeting in Corvallis. Someone then said that Peterson had done just that. By chance, Peterson's talk will follow and he will answer your question.