Lecture 19: Competing Risk Regression

When Competing Risks?

• Recall censoring assumption:
– Event times and censoring times are independent

• If this is questionable, then competing risks is likely more appropriate

• But… must be able to distinguish the other events/risks

Competing Risks Data

• Subject can fail from any of K event types, but only the earliest failure time can be observed.

• As in the non-competing risks setting, observations take the form of (T, δ).

• T is the minimum of t1, t2, … tK

• δ is 1, 2, …, K if failed
– Can also be 0 if no failure has yet occurred

• Z are the covariates we are interested in
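The structure of (T, δ) can be sketched with a small simulation. This is only an illustration: the latent failure times, the rates, and the censoring distribution are all hypothetical and are not part of the lecture data.

```r
# Hypothetical illustration of competing risks data (T, delta) with K = 2 causes
set.seed(1)
n <- 5

# Latent failure times for each cause (assumed exponential, rates are made up)
latent <- cbind(t1 = rexp(n, rate = 0.10), t2 = rexp(n, rate = 0.05))

# Assumed independent censoring times
cens <- runif(n, min = 5, max = 30)

# Earliest latent failure time and the cause that produced it
first_fail  <- apply(latent, 1, min)
first_cause <- apply(latent, 1, which.min)

# Observed time is the earliest of failure and censoring;
# delta is the cause (1 or 2), or 0 if no failure has yet occurred
T_obs <- pmin(first_fail, cens)
delta <- ifelse(first_fail <= cens, first_cause, 0)

data.frame(T_obs = round(T_obs, 2), delta = delta)
```

Only the earliest latent time is ever seen, which is why the later quantities are defined in terms of the observed pair rather than the latent times.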

Examples of Types of Observations


Summarizing Competing Risks

• For a population, three approaches
– Kaplan Meier: “net”
– Cumulative incidence: “crude”
– Conditional probability

• Cumulative incidence most appropriate (and most commonly used) for most settings of CR

• However, each provides its own potentially useful information

Recall The Issue

• We’ve already discussed this for estimation of the survival distribution.

• In the case of Kaplan-Meier analysis, recall that it
– Assumes that the event of interest is the only risk acting on the population
– Censors all other events
  • i.e. treats all other events the same as loss to follow-up (LTFU) or drop-out

Competing Risks

• For estimation in a population, it is usually shown using ‘cumulative incidence’ instead of survival

• Let’s review how CR approach differs from KM approach…

Recall: KM Survival

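• Estimate of survival for event r at time t_i, with r_i the number of type r events at t_i and Y_i the number at risk just prior to t_i

$$\hat S_{KM,r}(t) = \prod_{t_i \le t} \left(1 - \frac{r_i}{Y_i}\right)$$

$$\hat S_{KM,r}(t_i) = \hat S_{KM,r}(t_{i-1}) \times \frac{\text{\# of pts who don't experience event } r \text{ at } t_i}{\text{\# of patients at risk just prior to } t_i}$$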

“Net”: KM Cumulative Incidence (CI)

• Estimate of cumulative incidence at time t_i

$$\widehat{CI}_{KM,r}(t_i) = 1 - \hat S_{KM,r}(t_i)$$

$$\widehat{CI}_{KM,r}(t_i) = \widehat{CI}_{KM,r}(t_{i-1}) + \hat S_{KM,r}(t_{i-1}) \times \frac{\text{\# of events of type } r \text{ at } t_i}{\text{\# of pts at risk just prior to } t_i}$$

(The CI at t_i is the previous CI plus the current incidence rate)

Cumulative Incidence Approach

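• Estimating CI, with r_i the number of events of interest, d_i the number of competing events, and Y_i the number at risk just prior to t_i:

– For t ≥ t_1

$$\widehat{CI}_{cr}(t) = \sum_{t_i \le t} \left[\prod_{j=1}^{i-1} \left(1 - \frac{d_j + r_j}{Y_j}\right)\right] \frac{r_i}{Y_i}$$

– Or

$$\widehat{CI}_{cr}(t) = \sum_{t_i \le t} \hat S_{KM,r+d}(t_{i-1}) \, \frac{r_i}{Y_i}$$

where \hat S_{KM,r+d} is the KM estimate that counts a failure from either the event of interest or a competing event.

• Alternatively it can be written as:

$$\widehat{CI}_{cr}(t_i) = \widehat{CI}_{cr}(t_{i-1}) + \hat S_{KM,r+d}(t_{i-1}) \times \frac{\text{\# of events of type } r \text{ at } t_i}{\text{\# of pts at risk just prior to } t_i}$$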

How does the cumulative incidence (CI) differ from Kaplan Meier (KM)…

Comparison

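• KM

$$\hat S_{KM,r}(t_i) = \hat S_{KM,r}(t_{i-1}) \times \frac{\text{\# of pts who don't have event } r \text{ at } t_i}{\text{\# of patients at risk just prior to } t_i}$$

$$\widehat{CI}_{KM,r}(t_i) = 1 - \hat S_{KM,r}(t_i)$$

$$\widehat{CI}_{KM,r}(t_i) = \widehat{CI}_{KM,r}(t_{i-1}) + \hat S_{KM,r}(t_{i-1}) \times \frac{\text{\# of pts who have event } r \text{ at } t_i}{\text{\# of patients at risk just prior to } t_i}$$

• CI

$$\widehat{CI}_{cr}(t_i) = \widehat{CI}_{cr}(t_{i-1}) + \hat S_{KM,r+d}(t_{i-1}) \times \frac{\text{\# of pts who have event } r \text{ at } t_i}{\text{\# of patients at risk just prior to } t_i}$$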
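The two recursions differ only in the survival weight applied to each increment: the "net" KM version weights by survival from event r alone, while the "crude" version weights by survival from all events. A minimal base R sketch on made-up data shows the effect. The data and the function name `cuminc_by_hand` are hypothetical, and status is coded 0 = censored, 1 = event of interest, 2 = competing event.

```r
# Hypothetical toy data: status 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(1, 2, 3, 4, 5, 6)
status <- c(1, 2, 1, 0, 2, 1)

cuminc_by_hand <- function(time, status, cause = 1) {
  event_times <- sort(unique(time[status != 0]))
  s_km_r <- 1  # KM survival counting only `cause` as a failure
  s_any  <- 1  # KM survival counting any event as a failure
  ci_cr  <- 0  # "crude" cumulative incidence
  out <- data.frame(time = event_times, ci_km = NA_real_, ci_cr = NA_real_)
  for (i in seq_along(event_times)) {
    t_i <- event_times[i]
    y_i <- sum(time >= t_i)                                  # at risk just prior to t_i
    r_i <- sum(time == t_i & status == cause)                # events of interest at t_i
    d_i <- sum(time == t_i & status != 0 & status != cause)  # competing events at t_i
    ci_cr  <- ci_cr + s_any * r_i / y_i   # weight is all-cause survival at t_{i-1}
    s_km_r <- s_km_r * (1 - r_i / y_i)
    s_any  <- s_any * (1 - (r_i + d_i) / y_i)
    out$ci_km[i] <- 1 - s_km_r            # "net" KM cumulative incidence
    out$ci_cr[i] <- ci_cr
  }
  out
}

res <- cuminc_by_hand(time, status)
print(res)
```

In this toy example the KM-based value never falls below the competing-risks value, because subjects removed by the competing event are treated by KM as if they could still fail from the event of interest.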

Why Competing Risk Regression?

• Understand the effect of therapy on different subgroups
– Allow us to target interventions to those most likely to benefit
– Allow us to summarize the absolute failure via estimation of cause-specific failure probabilities

Cause-Specific Cumulative Hazard

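• Chief quantity in competing risks setting is the cause-specific hazard function λ_k(t)

• This also can be used to define the cause-specific cumulative hazard

$$\Lambda_k(t) = \int_0^t \lambda_k(u)\,du = \sum_{t_i \le t} \lambda_k(t_i)$$

(the sum form applies when the hazard is taken at the discrete event times t_i)

• Overall cumulative hazard function is

$$\Lambda(t) = \Lambda_1(t) + \Lambda_2(t) + \dots + \Lambda_K(t)$$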

Cumulative Incidence

• How does this all relate to our earlier discussion of estimating overall incidence?

• We are still interested in cumulative incidence

$$F_k(t) = P(\text{failure time} \le t;\ \text{cause} = k) = \int_0^t S(u)\,\lambda_k(u)\,du$$

where

$$S(t) = \exp\{-\Lambda(t)\}$$

Cumulative Incidence

• Sum of the K cumulative incidence functions is the probability of failure from any cause

$$\sum_{h=1}^{K} F_h(t) = 1 - S(t)$$

• NOTE! The cumulative probability of event k in the presence of competing risks is often miscalculated as

$$1 - S_k(t)$$

where S_k(t) is the "net" survival function that treats cause k as the only risk.
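A quick numerical check may make the warning concrete. The sketch below assumes constant cause-specific hazards, which the lecture does not; the hazard values and the time point are made up. Under that assumption F_k(t) has the closed form λ_k / λ × (1 − S(t)), with λ the sum of the cause-specific hazards.

```r
# Hypothetical constant cause-specific hazards (assumption for illustration only)
lambda <- c(relapse = 0.10, death = 0.05)
t <- 5

# Overall survival: S(t) = exp(-Lambda(t)), Lambda(t) = sum of cause-specific cumulative hazards
S_overall <- exp(-sum(lambda) * t)

# Cumulative incidence: F_k(t) = integral_0^t S(u) lambda_k du
F_k <- lambda / sum(lambda) * (1 - S_overall)

# Same quantity for relapse by numerical integration, as a cross-check
F_relapse_num <- integrate(
  function(u) exp(-sum(lambda) * u) * lambda[["relapse"]],
  lower = 0, upper = t
)$value

# The common miscalculation: 1 - S_k(t), ignoring the competing cause
naive <- 1 - exp(-lambda * t)

round(rbind(F_k = F_k, naive = naive), 4)
c(sum_F = sum(F_k), one_minus_S = 1 - S_overall)
```

The cumulative incidences add up to 1 − S(t), while each naive value is larger than the corresponding F_k(t), so the naive values overstate the probability of each event.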

Competing Risk Model

• Recall a subject can fail from one of K events

• This means that we have partial information on all events
– A subject who experiences the kth event at time t_i is known to have survived to time t_i for all other competing events.

Competing Risk Model

• There are two basic approaches to competing risk regression

– Modeling cause-specific hazard
– Modeling cumulative incidence

• Both are analogous to the Cox proportional hazards model

Modeling Cause Specific Hazard

• Model denoted as

$$\lambda_k(t \mid Z) = \lambda_{k0}(t) \exp\left(\sum_{j=1}^{p} \beta_j Z_j\right)$$

• Function of some unspecified baseline hazard function λ_{k0}(t) for kth cause

• Also function of covariate vector Z and regression coefficients β

BMT Example

• Recall our BMT data examining the association between time to event and disease type
– ALL, AML low risk, AML high risk
– Other factors: FAB class, donor/patient characteristics, waiting time, platelet recovery, …

• Originally we had considered time to relapse or death as a single event.

• What if we wanted to examine factors for each?

Cause Specific Hazard Models

library(survival); library(KMsurv)

### Cause-specific hazard model ###
data(bmt)
colnames(bmt)<-c("dgroup","TTD","DFS","dead","relapse","Either","tAGVHD", "AGvHD","tCGVHD", "CGvHD","tPR","PR","PtAge","DonAge","PtSex","DonSex", "PtCMV","DonCVM", "TTTrans","FAB","Hosp","MTX")

### Either Death or Relapse
reg.cox<-coxph(Surv(DFS, Either)~factor(dgroup) + FAB + PtAge + DonAge + PtAge*DonAge, data=bmt)

### Relapse Model (deaths without relapse are treated as censored)
rreg.cox<-coxph(Surv(DFS, relapse)~factor(dgroup) + FAB + PtAge + DonAge + PtAge*DonAge, data=bmt)

Death or Relapse Model

> summary(reg.cox)
coxph(formula = Surv(DFS, Either) ~ factor(dgroup)+FAB+PtAge+DonAge+PtAge*DonAge, data = bmt)
  n= 137, number of events= 83

                    coef exp(coef) se(coef)      z Pr(>|z|)
factor(dgroup)2 -1.0906     0.3359 0.354279 -3.078 0.002080 **
factor(dgroup)3 -0.4039     0.6677 0.362776 -1.113 0.265549
FAB              0.8374     2.3104 0.278464  3.007 0.002636 **
PtAge           -0.08164    0.9216 0.036107 -2.261 0.023756 *
DonAge          -0.08459    0.9189 0.030097 -2.810 0.004947 **
PtAge:DonAge     0.00316    1.0032 0.000951  3.323 0.000891 ***

Concordance= 0.665 (se = 0.033 )
Rsquare= 0.213 (max possible= 0.996 )
Likelihood ratio test= 32.8 on 6 df, p=1.144e-05
Wald test = 33.02 on 6 df, p=1.039e-05
Score (logrank) test = 35.75 on 6 df, p=3.078e-06

Relapse Specific Hazard Model

> summary(rreg.cox)
coxph(formula = Surv(DFS, relapse) ~ factor(dgroup)+FAB+PtAge+DonAge+PtAge*DonAge, data = bmt)
  n= 137, number of events= 42

                    coef exp(coef) se(coef)      z Pr(>|z|)
factor(dgroup)2 -1.8406     0.1587 0.582111 -3.162 0.00157 **
factor(dgroup)3 -0.5794     0.5602 0.540797 -1.071 0.28403
FAB              1.4239     4.1531 0.433179  3.287 0.00101 **
PtAge           -0.0384     0.9624 0.052539 -0.730 0.46511
DonAge          -0.0835     0.9199 0.044086 -1.894 0.05822 .
PtAge:DonAge     0.0024     1.0024 0.001432  1.648 0.09937 .

Concordance= 0.75 (se = 0.046 )
Rsquare= 0.203 (max possible= 0.938 )
Likelihood ratio test= 31.16 on 6 df, p=2.361e-05
Wald test = 27.15 on 6 df, p=0.0001355
Score (logrank) test = 31.55 on 6 df, p=1.993e-05

Interpretation of Cause-Specific Model

• Interpret the hazard ratio for two individuals the same as from a normal Cox model

• However, there is a dependency between the failure types…

• Problem
– Effects seen in the model may reflect the influence of the competing events
– As a result, covariate effects don’t necessarily pertain to the cumulative incidence of the kth event

Modeling Cumulative Incidence

• Analogous to our general approach for a study population with competing risks, there is a model based on cumulative incidence.

• Competing risk regression model (Fine and Gray)
– Direct regression of the effect of covariates on cumulative incidence
– Distinguish between patients who have had other events and those at risk for event of interest
– Based on PHM approach

Modeling Cumulative Incidence

• In this case the form of the model is

$$\lambda_k^*(t \mid Z) = \lambda_{k0}^*(t) \exp\left(\sum_{j=1}^{p} \beta_j Z_j\right)$$

• Where λ*_k(t) is referred to as the sub-distribution hazard
– crude hazard from CI_cr

Sub-Distribution Hazard

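• Expression for the sub-distribution hazard

$$\lambda_k^*(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P\{t \le T \le t + \Delta t,\ \delta = k \mid T \ge t \ \cup\ (T \le t \cap \delta \ne k)\}$$

• Can also think of this as the hazard function for an improper random variable

$$T^* = I(\delta = k) \times T + \{1 - I(\delta = k)\} \times \infty$$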
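A short derivation, not on the original slides, may help connect this hazard to the cumulative incidence function F_k(t). It uses the fact that the conditioning event (still event-free, or already failed from another cause) has probability 1 − F_k(t), and introduces f_k(t) = dF_k(t)/dt as shorthand:

```latex
\lambda_k^*(t) = \frac{f_k(t)}{1 - F_k(t)} = -\frac{d}{dt}\log\{1 - F_k(t)\}
\quad\Longrightarrow\quad
F_k(t) = 1 - \exp\left\{-\int_0^t \lambda_k^*(u)\,du\right\}
```

This is why a proportional model on the sub-distribution hazard translates directly into statements about cumulative incidence, which is not true of a model on the cause-specific hazard.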

Risk Set in CRR Model

• Just as in the case of the Cox model, we need a likelihood expression for estimation of the model

• Definition of the risk set slightly altered

$$R_i = \{\, j : T_j \ge T_i \ \cup\ (T_j \le T_i \cap \delta_j \ne k) \,\}$$

Partial Likelihood

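• The expression for the partial likelihood based on our newly defined risk set is

$$L(\beta) = \prod_{i=1}^{n} \left[\frac{\exp(\beta' Z_i)}{\sum_{j \in R_i} \exp(\beta' Z_j)}\right]^{I(\delta_i = k)}$$

• Partial log-likelihood

$$l(\beta) = \sum_{i=1}^{n} I(\delta_i = k) \left\{\beta' Z_i - \log\left[\sum_{j \in R_i} \exp(\beta' Z_j)\right]\right\}$$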
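The altered risk set and the partial log-likelihood can be sketched in base R for complete (uncensored) data and a single covariate. This is an illustrative sketch only: the data and the helper names `risk_set` and `pll` are hypothetical, and it is not how the cmprsk package computes its estimates.

```r
# Hypothetical complete data: failure time, cause of failure (1 or 2), one covariate
ftime <- c(2, 3, 5, 7, 8, 10)
cause <- c(1, 2, 1, 2, 1, 1)
z     <- c(1, 0, 1, 1, 0, 0)

# Risk set for subject i and cause k: subjects who have not failed by T_i,
# plus subjects who already failed from a cause other than k
risk_set <- function(i, ftime, cause, k = 1) {
  which(ftime >= ftime[i] | (ftime <= ftime[i] & cause != k))
}

# Partial log-likelihood: only subjects failing from cause k contribute a term
pll <- function(beta, ftime, cause, z, k = 1) {
  idx <- which(cause == k)
  sum(vapply(idx, function(i) {
    r <- risk_set(i, ftime, cause, k)
    beta * z[i] - log(sum(exp(beta * z[r])))
  }, numeric(1)))
}

# Subject 3 fails from cause 1 at time 5; subject 2 (cause 2 at time 3) stays in the risk set
risk_set(3, ftime, cause)

# Maximize over a single coefficient
fit <- optimize(function(b) pll(b, ftime, cause, z),
                interval = c(-5, 5), maximum = TRUE)
fit$maximum
```

Note that subjects who failed from the competing cause are never removed from the risk set, which is the only change from the ordinary Cox risk set in the complete-data case.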

Estimation & Testing

• We can maximize the partial log-likelihood in the same way we did with the Cox PHM

• Additionally, Fine and Gray developed a score test to make inference about the regression coefficients in the model

What About Censoring

• Up until now, we have been assuming we have “complete” data.

• If we observe censoring we must change the definition of the risk set slightly…

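$$R_i = \{\, j : \min(C_j, T_j) \ge T_i \ \cup\ (T_j \le T_i \cap \delta_j \ne k \cap C_j \ge T_i) \,\}$$

where C_j is the censoring time for subject j.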

Competing Risk Regression in R

• Can implement the cause-specific hazard model using the coxph function in the survival library

• The “cmprsk” package in R implements the Fine and Gray model we’ve just discussed

• So back to our BMT example

“cmprsk” Library

• crr: Fit the Fine and Gray model of the subdistribution functions in competing risk
– ftime and fstatus: define survival data
– cov1 and cov2: matrix of covariates
– tf: functions of time for covariate in cov2
– failcode: which event are you modeling?

• Other functionality deals with estimating and comparing cumulative incidence across groups

Fitting CRR Model

library(cmprsk)

### Fine and Gray Cumulative Incidence Model ###
### First we have to generate a single event type variable
### etype: 0 = censored, 1 = relapse, 2 = death without relapse
etime<-ifelse(bmt$relapse==1, bmt$DFS, bmt$TTD)
etype<-ifelse(bmt$relapse==1, 1, 0)
etype<-ifelse(bmt$dead==1 & bmt$relapse==0, 2, etype)

### crr needs a numeric covariate matrix, so build dummy and interaction columns by hand
dx2<-ifelse(bmt$dgroup==2, 1, 0)
dx3<-ifelse(bmt$dgroup==3, 1, 0)
ptdn.intx<-bmt$PtAge*bmt$DonAge
fab<-bmt$FAB
ptage<-bmt$PtAge
dnage<-bmt$DonAge
cov<-cbind(dx2, dx3, fab, ptage, dnage, ptdn.intx)
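To see what the event coding does, the same three `ifelse` steps can be run on a tiny made-up data frame. The four rows below are hypothetical patients, not BMT records; only the column names are borrowed from the lecture code.

```r
# Hypothetical patients: relapse then death, death without relapse,
# censored with no event, relapse and still alive
toy <- data.frame(
  DFS     = c(100, 250, 400, 600),
  TTD     = c(180, 250, 400, 600),
  relapse = c(1, 0, 0, 1),
  dead    = c(1, 1, 0, 0)
)

# Same coding steps as in the lecture code, applied to the toy data
etime <- ifelse(toy$relapse == 1, toy$DFS, toy$TTD)
etype <- ifelse(toy$relapse == 1, 1, 0)
etype <- ifelse(toy$dead == 1 & toy$relapse == 0, 2, etype)

cbind(toy, etime, etype)
```

A patient who relapses and later dies is counted once, as a relapse at the relapse time, so a death is recorded as the competing event only when it occurs without prior relapse.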

CRR Model

> rreg.crr<-crr(etime, etype, cov, failcode=1)
> summary(rreg.crr)
Competing Risks Regression
Call: crr(ftime = etime, fstatus = etype, cov1 = cov, failcode = 1)

              coef exp(coef) se(coef)      z p-value
dx2       -1.55581     0.211  0.55215 -2.818  0.0048
dx3       -0.53913     0.583  0.52981 -1.018  0.3100
fab        1.30349     3.682  0.43894  2.970  0.0030
ptage     -0.01688     0.983  0.06499 -0.260  0.8000
dnage     -0.05874     0.943  0.04879 -1.204  0.2300
ptdn.intx  0.00152     1.002  0.00176  0.864  0.3900

Num. cases = 137
Pseudo Log-likelihood = -187
Pseudo likelihood ratio test = 24.1 on 6 df,

CRR Model Results

          exp(coef) exp(-coef)   2.5%  97.5%
dx2           0.211      4.739 0.0715  0.623
dx3           0.583      1.715 0.2065  1.648
fab           3.682      0.272 1.5577  8.704
ptage         0.983      1.017 0.8657  1.117
dnage         0.943      1.060 0.8570  1.038
ptdn.intx     1.002      0.998 0.9981  1.005

Additional Notes

• Can also include a matrix of time varying covariates

• Can NOT use factor here… Must create dummy variables for all factor variables of interest in the data
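One way to avoid building each dummy variable by hand is base R's `model.matrix`, which expands factors into indicator columns. This is a suggestion rather than the lecture's approach, and the small data frame below is made up for illustration.

```r
# Hypothetical data with a three-level disease group and a binary FAB indicator
toy <- data.frame(dgroup = c(1, 2, 3, 2, 1), FAB = c(0, 1, 1, 0, 1))

# model.matrix expands factor(dgroup) into indicator columns;
# dropping the first column removes the intercept, which crr does not need
cov_mat <- model.matrix(~ factor(dgroup) + FAB, data = toy)[, -1]

colnames(cov_mat)
cov_mat
```

The resulting numeric matrix has the same shape as one built with `cbind` from hand-made indicators, with group 1 as the reference level.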

Comparison of CHR and CRR

                   Cause-specific                          Competing risks
Covariate          b        HR (95% CI)           p        b        HR (95% CI)           p
AML low risk      -1.841    0.16 (0.05, 0.50)     0.0016  -1.556    0.21 (0.07, 0.62)     0.0048
AML high risk     -0.579    0.56 (0.19, 1.62)     0.2800  -0.539    0.58 (0.21, 1.65)     0.3100
FAB                1.424    4.15 (1.78, 9.71)     0.0010   1.303    3.68 (1.56, 8.70)     0.0030
Patient Age       -0.038    0.96 (0.87, 1.07)     0.4700  -0.017    0.98 (0.87, 1.12)     0.8000
Donor Age         -0.084    0.92 (0.84, 1.00)     0.0580  -0.059    0.94 (0.86, 1.04)     0.2300
PatAge x DonAge    0.0024   1.00 (0.999, 1.005)   0.0990   0.0015   1.00 (0.998, 1.005)   0.3900

Cause-Specific vs CRR approach

• If competing risks are truly independent
– Cause-specific model provides valid estimates of the risk of each event
– CRR model tends to be biased towards the null

• If competing risks are dependent
– Cause specific model

Which One to Use?

• It depends…

• The cause-specific approach
– can give cause-specific hazards and CIFs.
– However, we cannot examine covariate effects on the CIF directly

• Subdistribution approach
– allows us to test covariate effects on the CIF
– but subdistribution hazards are difficult to interpret and should be used with caution.

• Cumulative incidence model more realistically models treatment effect in a population.

• If we want real world probabilities of death then competing risks methodology should be used as opposed to standard survival analysis methods.

• Allows us to separate the probability of death into different causes.


Model Selection for CRR

• A nice feature of the CRR approach is that we can evaluate associations between covariates and our different events

• This means we can conduct model selection to find a more parsimonious model
– Use same approaches as before
  • P-values
  • AIC, BIC
  • Forward/backward selection

#p-value approach, with forward selection
dx2<-ifelse(bmt$dgroup==2, 1, 0); dx3<-ifelse(bmt$dgroup==3, 1, 0)
ptage<-bmt$PtAge; dnage<-bmt$DonAge; ptdn.intx<-bmt$PtAge*bmt$DonAge
fab<-bmt$FAB
tttrans<-bmt$TTTrans
mtx<-bmt$MTX
ptsex<-bmt$PtSex; dnsex<-bmt$DonSex; sx.intx<-bmt$PtSex*bmt$DonSex
h2<-ifelse(bmt$Hosp==2, 1, 0); h3<-ifelse(bmt$Hosp==3, 1, 0); h4<-ifelse(bmt$Hosp==4, 1, 0)

#Candidate covariate sets, each added to the disease group indicators
cov1a<-cbind(dx2, dx3, fab)
cov1b<-cbind(dx2, dx3, ptage, dnage, ptdn.intx)
cov1c<-cbind(dx2, dx3, ptsex, dnsex, sx.intx)
cov1d<-cbind(dx2, dx3, tttrans)
cov1e<-cbind(dx2, dx3, mtx)
cov1f<-cbind(dx2, dx3, h2,h3,h4)

rreg.crra<-crr(etime, etype, cov1a, failcode=1) #p FAB = 0.0039
rreg.crrb<-crr(etime, etype, cov1b, failcode=1) #p Pt/Dn age = 0.80, 0.23, 0.39
rreg.crrc<-crr(etime, etype, cov1c, failcode=1) #p Pt/Dn sex = 0.36, 0.65, 0.69
rreg.crrd<-crr(etime, etype, cov1d, failcode=1) #p TTTrans = 0.055
rreg.crre<-crr(etime, etype, cov1e, failcode=1) #p MTX = 0.73
rreg.crrf<-crr(etime, etype, cov1f, failcode=1) #p Hosp = 0.94

Example (forward using p-values)

#Final Model (choosing p<0.1)
> cov2a<-cbind(dx2, dx3, fab, tttrans) #definition inferred from the covariates in the output below
> rreg.crra<-crr(etime, etype, cov2a, failcode=1)
> summary(rreg.crra)
Competing Risks Regression
Call: crr(ftime = etime, fstatus = etype, cov1 = cov2a, failcode = 1)

            coef exp(coef) se(coef)      z p-value
dx2     -1.58888     0.204 0.481107 -3.303 0.00096
dx3     -0.40978     0.664 0.500140 -0.819 0.41000
fab      1.20707     3.344 0.423032  2.853 0.00430
tttrans -0.00105     0.999 0.000562 -1.875 0.06100

        exp(coef) exp(-coef)   2.5% 97.5%
dx2         0.204      4.898 0.0795 0.524
dx3         0.664      1.506 0.2491 1.769
fab         3.344      0.299 1.4593 7.661
tttrans     0.999      1.001 0.9978 1.000

Num. cases = 137
Pseudo Log-likelihood = -187
Pseudo likelihood ratio test = 25 on 4 df,

Automated Model Selection

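• There is an automated selection algorithm for the Fine and Gray model

• Traditional selection criteria include
– AIC = -2 log L_p + 2p
– BIC = -2 log L_p + p log(n)

• Alternative proposed in the literature
– BICcr = -2 log L_p + p log(n*)

• Here p is the number of model parameters, n the total sample size, and n* the number of events of interest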

BICcr selection criteria

• Should select a more parsimonious model than the AIC, and has a less stringent penalty than the BIC.

• Provides a good compromise for working with the Fine and Gray model for competing risk data.
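The three penalties can be compared with a few lines of arithmetic. The sketch below plugs in values that appear elsewhere in this lecture for the four-covariate relapse model (log pseudo-likelihood −186.79, 137 cases). Taking n* = 42, the number of relapse events reported for the relapse-specific Cox model, is an assumption made here for illustration.

```r
# Values taken from the lecture output for the 4-covariate relapse model
loglik <- -186.79   # log pseudo-likelihood
p      <- 4         # number of regression coefficients
n      <- 137       # total sample size

# Assumption: n* is the number of events of interest (42 relapses)
n_star <- 42

aic   <- -2 * loglik + 2 * p
bic   <- -2 * loglik + p * log(n)
biccr <- -2 * loglik + p * log(n_star)

round(c(AIC = aic, BICcr = biccr, BIC = bic), 2)
```

Because 2 < log(n*) < log(n) here, the BICcr penalty sits between the AIC and BIC penalties, which is the compromise described above. The AIC value computed this way, 381.58, agrees with the 381.57 that crrstep reports for the selected model up to rounding of the log-likelihood.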

crrstep Function in R

• R package called crrstep implements stepwise model selection for the Fine and Gray competing risks model.

• Available selection criteria for choosing covariates are AIC, BIC, and BICcr.

BMT Example

### Automated Approach
library(crrstep)

mAIC<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+DonAge+PtAge*DonAge,
  etype=etype, data=bmt, direction="forward", failcode=1, criterion = "AIC")

mBIC<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+DonAge+PtAge*DonAge,
  etype=etype, data=bmt, direction="forward", failcode=1, criterion = "BIC")

mBICcr<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+DonAge+PtAge*DonAge,
  etype=etype, data=bmt, direction="forward", failcode=1, criterion = "BICcr")

BMT Example (AIC)

> mAIC<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+DonAge+PtAge*DonAge, etype=etype, data=bmt, direction="forward", failcode=1, criterion = "AIC")
NULL
                    AIC
+FAB             390.38
+factor(dgroup)  390.46
<none>           398.60
+PtSex           399.53
+TTTrans         399.54
+DonSex          399.69
+MTX             400.36
+PtAge:DonAge    400.47
+PtAge           400.54
+DonAge          400.58
[1] "FAB"

BMT Example (AIC)

> mAIC<-crrstep(etime~factor(dgroup)+FAB+TTTrans+MTX+PtSex+DonSex+PtAge+DonAge+PtAge*DonAge, etype=etype, data=bmt, direction="forward", failcode=1, criterion = "AIC")
…
[1] "FAB" "factor(dgroup)" "TTTrans"
                    AIC
<none>           381.57
+DonSex          383.08
+PtSex           383.19
+PtAge:DonAge    383.25
+MTX             383.26
+DonAge          383.58
+PtAge           384.73

Comparison AIC to p-value approach

> mAIC$coefficients
                estimate std.error t-stat
FAB              1.21000  0.423000  2.850
factor(dgroup)2 -1.59000  0.481000  3.300
factor(dgroup)3 -0.41000  0.500000  0.819
TTTrans         -0.00105  0.000562  1.870

$log.likelihood
[1] -186.79

> summary(rreg.crra)
Competing Risks Regression
Call: crr(ftime = etime, fstatus = etype, cov1 = cov2a, failcode = 1)

            coef exp(coef) se(coef)      z p-value
dx2     -1.58888     0.204 0.481107 -3.303 0.00096
dx3     -0.40978     0.664 0.500140 -0.819 0.41000
fab      1.20707     3.344 0.423032  2.853 0.00430
tttrans -0.00105     0.999 0.000562 -1.875 0.06100

Pseudo Log-likelihood = -187

Comparison of Three Criteria

> mAIC
                estimate std.error t-stat
FAB              1.21000  0.423000  2.850
factor(dgroup)2 -1.59000  0.481000  3.300
factor(dgroup)3 -0.41000  0.500000  0.819
TTTrans         -0.00105  0.000562  1.870

> mBIC
                estimate std.error t-stat
FAB                1.220     0.422  2.890
factor(dgroup)2   -1.360     0.485  2.800
factor(dgroup)3   -0.303     0.499  0.608

> mBICcr
                estimate std.error t-stat
FAB                1.220     0.422  2.890
factor(dgroup)2   -1.360     0.485  2.800
factor(dgroup)3   -0.303     0.499  0.608

One Final Note

• Competing risk regression generally assumes that events are mutually exclusive
– In the BMT data this isn’t true
– We can’t really look at relapse as competing with death

• Joint frailty modeling offers an alternative and can be implemented using frailtypack
– Idea is to model recurrent events jointly with some terminal event

Next Time

• A little about sample size estimation and power
