Statistics 262: Intermediate Biostatistics

Jonathan Taylor and Kristin Cobb

May 18, 2004: Cox Regression III: residuals and diagnostics, repeated events



Residuals

Residuals are used to investigate the lack of fit of a model to a given subject.

For Cox regression, there's no easy analog to the usual "observed minus predicted" residual of linear regression.


Deviance Residuals

Deviance residuals are based on martingale residuals: $c_i$ (1 if event, 0 if censored) minus the estimated cumulative hazard up to $t_i$ (a function of the fitted model) for individual $i$:

$$\hat{M}_i = c_i - \hat{H}(t_i, x_i, \hat{\beta})$$

See Hosmer and Lemeshow for more discussion…
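The martingale residual above, and the deviance residual built from it, can be sketched in Python for a single covariate. This is a minimal illustration under assumptions not stated on the slide: the coefficient `beta` is taken as given (e.g. from an already fitted model), the baseline cumulative hazard is estimated with the Breslow estimator, and the deviance residual uses the standard transform $d_i = \operatorname{sign}(\hat M_i)\sqrt{-2[\hat M_i + c_i \log(c_i - \hat M_i)]}$. All function names are hypothetical.

```python
import math
from typing import List


def cumulative_hazards(times: List[float], events: List[int],
                       x: List[float], beta: float) -> List[float]:
    """H(t_i, x_i, beta) = H0(t_i) * exp(x_i * beta) for every subject,
    with H0 estimated by the Breslow estimator (assumption)."""
    n = len(times)
    risk = [math.exp(xi * beta) for xi in x]
    # Breslow jump at each event time t_j: 1 / sum of exp(x_k * beta)
    # over the risk set {k : t_k >= t_j}.
    jumps = []
    for j in range(n):
        if events[j] == 1:
            at_risk = sum(risk[k] for k in range(n) if times[k] >= times[j])
            jumps.append((times[j], 1.0 / at_risk))
    h0 = [sum(jump for t, jump in jumps if t <= times[i]) for i in range(n)]
    return [h0[i] * risk[i] for i in range(n)]


def martingale_residuals(times, events, x, beta):
    """M_i = c_i - H(t_i, x_i, beta)."""
    H = cumulative_hazards(times, events, x, beta)
    return [c - h for c, h in zip(events, H)]


def deviance_residuals(times, events, x, beta):
    """d_i = sign(M_i) * sqrt(-2 * [M_i + c_i * log(c_i - M_i)])."""
    out = []
    for c, m in zip(events, martingale_residuals(times, events, x, beta)):
        # For censored subjects (c_i = 0) the log term is dropped.
        inner = m + (math.log(c - m) if c == 1 else 0.0)
        out.append(math.copysign(math.sqrt(max(-2.0 * inner, 0.0)), m))
    return out


if __name__ == "__main__":
    # Toy data: follow-up times, event indicators, one covariate, assumed beta.
    times, events, x = [2, 3, 5, 7, 8], [1, 0, 1, 1, 0], [0.5, 1.2, 0.3, 0.9, 1.5]
    print(deviance_residuals(times, events, x, beta=0.4))
```

With the Breslow estimator the martingale residuals sum to zero for any `beta`, which makes a quick sanity check; the square-root transform pulls in the long negative tail of the martingale residuals so the deviance residuals are closer to symmetric.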


Deviance Residuals

Behave like residuals from ordinary linear regression: they should be roughly symmetrically distributed around 0, with a standard deviation of about 1.0.

Negative for observations whose observed survival time is longer than expected; positive for those that fail sooner than expected.

Plot deviance residuals against covariates to look for unusual patterns.
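A scatter plot is the usual tool; the same idea can be sketched without a plotting library by sorting subjects on a covariate, cutting them into groups of roughly equal size, and comparing the average deviance residual across groups. Group means that drift steadily away from 0 would be the kind of unusual pattern meant here, for example a covariate whose effect is not linear on the log-hazard scale. The function name and the equal-size binning are illustrative assumptions.

```python
from statistics import mean
from typing import List, Tuple


def binned_residual_means(covariate: List[float], residuals: List[float],
                          n_bins: int = 4) -> List[Tuple[float, float]]:
    """Text-mode stand-in for a residual-vs-covariate plot: sort subjects by
    the covariate, split them into n_bins groups of near-equal size, and
    return (mean covariate, mean residual) for each non-empty group."""
    pairs = sorted(zip(covariate, residuals))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            bins.append((mean(c for c, _ in chunk), mean(r for _, r in chunk)))
    return bins
```

Under a well-specified model the group means should hover around 0 with no trend.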


Deviance Residuals

In SAS, request them with an option on the OUTPUT statement of PROC PHREG: output out=outdata resdev=


Schoenfeld residuals

Schoenfeld (1982) proposed the first set of residuals for use with Cox regression packages:
Schoenfeld D. Partial residuals for the proportional hazards regression model. Biometrika, 1982, 69(1):239-241.

Instead of a single residual for each individual, there is a separate residual for each individual for each covariate.

Based on the individual contributions to the derivative of the log partial likelihood (see chapter 6 in Hosmer and Lemeshow for more math details, pp. 198-199).

Note: Schoenfeld residuals are not defined for censored individuals.


Schoenfeld residuals

Where k is the covariate of interest, the Schoenfeld residual is the covariate value, $x_{ik}$, for the person (i) who actually died at time $t_i$ minus the expected value of the covariate for the risk set at $t_i$ (a weighted average of the covariate, weighted by each individual's likelihood of dying at $t_i$):

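$$\hat{r}_{ik} = x_{ik} - \sum_{j \in R(t_i)} x_{jk}\,\hat{p}_{ij}, \qquad \hat{p}_{ij} = \frac{\exp(x_j^{\top}\hat{\beta})}{\sum_{l \in R(t_i)} \exp(x_l^{\top}\hat{\beta})}$$

where $R(t_i)$ is the risk set at $t_i$.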

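Plot Schoenfeld residuals against time to evaluate the PH assumption: a systematic trend over time suggests that the covariate's effect is not constant, i.e., that the hazards are not proportional.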
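The formula can be sketched for a single covariate (so the index k is dropped), assuming a given coefficient `beta` and tied event times handled Breslow-style (each tied event gets the same risk set). The function name is hypothetical; censored subjects are skipped because the residual is not defined for them.

```python
import math
from typing import List, Tuple


def schoenfeld_residuals(times: List[float], events: List[int],
                         x: List[float], beta: float) -> List[Tuple[float, float]]:
    """Return (t_i, r_i) for every subject with an event, where
    r_i = x_i - sum_{j in R(t_i)} x_j * p_ij and
    p_ij = exp(x_j * beta) / sum_{l in R(t_i)} exp(x_l * beta)."""
    out = []
    for i in range(len(times)):
        if events[i] != 1:
            continue  # not defined for censored individuals
        risk_set = [j for j in range(len(times)) if times[j] >= times[i]]
        weights = [math.exp(x[j] * beta) for j in risk_set]
        expected = sum(w * x[j] for w, j in zip(weights, risk_set)) / sum(weights)
        out.append((times[i], x[i] - expected))
    return sorted(out)  # ordered by event time, ready to plot against time
```

Summed over events, these residuals equal the partial-likelihood score at `beta`, so at the maximum partial likelihood estimate they sum to zero. Plotting the returned (time, residual) pairs, ideally with a smoother, is the PH check described above.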


Schoenfeld residuals

In SAS: option on the OUTPUT statement: ressch=


Influence diagnostics

How would the result change if a particular observation were removed from the analysis?


Influence statistics

• Likelihood displacement (ld): measures the influence of removing one individual on the model as a whole. What's the change in the likelihood when this individual is omitted? (Twice the drop in the full-data log partial likelihood when it is evaluated at the coefficients estimated without that individual.)
• DFBETA: how much each coefficient changes when a single observation is removed
• A negative DFBETA indicates that the coefficient increases when the observation is removed
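Both statistics can be computed exactly by refitting the model without each observation in turn. The sketch below does this for a single covariate with a hand-written Newton-Raphson fit of the Breslow partial likelihood; it assumes the maximum partial likelihood estimate exists for the full data and for every leave-one-out subset (the covariate must not perfectly separate early from late failures). Packages typically report one-step approximations that avoid the n refits, so their values may differ somewhat. Function names are hypothetical.

```python
import math


def log_partial_likelihood(times, events, x, beta):
    """Log partial likelihood for one covariate (Breslow handling of ties)."""
    n = len(times)
    ll = 0.0
    for i in range(n):
        if events[i] == 1:
            denom = sum(math.exp(x[j] * beta) for j in range(n) if times[j] >= times[i])
            ll += x[i] * beta - math.log(denom)
    return ll


def fit_cox(times, events, x, tol=1e-10, max_iter=100):
    """Single-covariate maximum partial likelihood estimate by Newton-Raphson,
    halving the step whenever it would lower the log partial likelihood."""
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if events[i] != 1:
                continue
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(x[j] * beta) for j in risk]
            s0 = sum(w)
            mean_x = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
            mean_x2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
            score += x[i] - mean_x
            info += mean_x2 - mean_x ** 2
        step = score / info
        current = log_partial_likelihood(times, events, x, beta)
        while log_partial_likelihood(times, events, x, beta + step) < current:
            step /= 2.0
        if abs(step) < tol:
            return beta + step
        beta += step
    return beta


def influence(times, events, x):
    """Exact leave-one-out statistics:
    DFBETA_i = beta_hat - beta_hat_(i)
    LD_i     = 2 * [l(beta_hat) - l(beta_hat_(i))], l = full-data log partial likelihood."""
    n = len(times)
    beta_hat = fit_cox(times, events, x)
    ll_hat = log_partial_likelihood(times, events, x, beta_hat)
    dfbeta, ld = [], []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        beta_i = fit_cox([times[j] for j in keep], [events[j] for j in keep],
                         [x[j] for j in keep])
        dfbeta.append(beta_hat - beta_i)
        ld.append(2.0 * (ll_hat - log_partial_likelihood(times, events, x, beta_i)))
    return beta_hat, dfbeta, ld


if __name__ == "__main__":
    times = [2, 3, 5, 7, 8, 11, 12, 15]
    events = [1, 1, 0, 1, 1, 1, 0, 1]
    x = [1, 0, 1, 1, 0, 0, 1, 0]
    print(influence(times, events, x))
```

The sign convention matches the slide: DFBETA_i is the full-data estimate minus the estimate without observation i, so a negative value means the coefficient increases when that observation is removed. LD is never negative because the full-data estimate maximizes the full-data likelihood.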


Influence statistics

In SAS: options on the OUTPUT statement: ld= dfbeta=


What about repeated events?

Death (presumably) can only happen once, but many outcomes can happen more than once:
• Fractures
• Heart attacks
• Pregnancy
• Etc.


Repeated events: Strategy 1

Run a second Cox regression (among those who had a first event), using the first event time as the origin. Repeat for the third, fourth, fifth events, etc.

Problem: increasingly smaller sample sizes.
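Strategy 1 amounts to building one dataset per event number. A minimal sketch, assuming a hypothetical input layout (each id mapped to its sorted event times and its end of follow-up) and gap times measured from the previous event; each resulting dataset would then be fitted with an ordinary Cox regression.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: id -> (sorted event times, end of follow-up).
History = Dict[str, Tuple[List[float], float]]


def kth_event_dataset(history: History, k: int) -> List[Tuple[str, float, int]]:
    """Rows (id, gap_time, event) for analysing the k-th event (k = 1, 2, ...).
    Only people who had k - 1 earlier events are at risk, and the clock
    restarts at the (k-1)-th event time (time 0 when k = 1)."""
    rows = []
    for pid, (event_times, end) in history.items():
        if len(event_times) < k - 1:
            continue  # never reached the k-th spell
        origin = event_times[k - 2] if k >= 2 else 0.0
        if len(event_times) >= k:
            rows.append((pid, event_times[k - 1] - origin, 1))  # k-th event observed
        else:
            rows.append((pid, end - origin, 0))  # censored at end of follow-up
    return rows


if __name__ == "__main__":
    history = {"a": ([2.0, 5.0, 9.0], 12.0), "b": ([4.0], 10.0),
               "c": ([], 8.0), "d": ([1.0, 6.0], 7.0)}
    print([len(kth_event_dataset(history, k)) for k in range(1, 5)])  # -> [4, 3, 2, 1]
```

The row counts shrink with k, which is the sample-size problem noted above.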


Repeated events: Strategy 2

Treat each interval as a distinct observation, so that someone who had 3 events, for example, contributes 3 observations to the dataset.

Major problem: dependence between observations from the same individual.
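Strategy 2 corresponds to splitting each person's follow-up at every event. A minimal sketch with a hypothetical input layout (each id mapped to its sorted event times and end of follow-up); as an assumption beyond the slide, the time after a person's last event is kept as a censored interval, so someone with 3 events and further follow-up contributes 4 rows. Rows keep their (start, stop] times since entry; gap times (stop - start) would serve equally well.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input layout: id -> (sorted event times, end of follow-up).
History = Dict[str, Tuple[List[float], float]]


def interval_rows(history: History) -> List[Tuple[str, int, float, float, int]]:
    """One row (id, spell, start, stop, event) per interval. Each event closes
    an interval; time after the last event, if any, is a censored interval."""
    rows = []
    for pid, (event_times, end) in history.items():
        start = 0.0
        for spell, t in enumerate(event_times, start=1):
            rows.append((pid, spell, start, t, 1))
            start = t
        if end > start:
            rows.append((pid, len(event_times) + 1, start, end, 0))
    return rows


if __name__ == "__main__":
    history = {"a": ([2.0, 5.0, 9.0], 12.0), "b": ([4.0], 10.0),
               "c": ([], 8.0), "d": ([1.0, 6.0], 7.0)}
    rows = interval_rows(history)
    # Rows per person: anyone with several events contributes several
    # correlated rows to what a naive Cox fit treats as independent data.
    print(Counter(pid for pid, *_ in rows))
```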


Repeated events: Strategy 3

Stratify by individual ("fixed effects partial likelihood").

In PROC PHREG: strata id;

Problems:
• Does not work well with RCT data, since randomized treatment is usually constant within an individual.
• Requires that most individuals have at least 2 events.
• Can only estimate coefficients for covariates that vary across successive spells for each individual; this excludes constant personal characteristics such as age, education, gender, ethnicity, genotype.
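Why constant characteristics drop out can be seen directly in the stratified partial likelihood. In this sketch (single covariate, gap time within each individual, Breslow ties, hypothetical function name), every risk set contains only that individual's own spells, so a covariate that is constant within an individual contributes the same factor $e^{x\beta}$ to the numerator and to every term of the denominator, and the likelihood no longer depends on $\beta$.

```python
import math
from typing import List


def stratified_log_pl(ids: List[str], times: List[float], events: List[int],
                      x: List[float], beta: float) -> float:
    """Log partial likelihood stratified by individual: each event's risk
    set contains only spells from the same individual."""
    ll = 0.0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        denom = sum(math.exp(x[j] * beta) for j in range(len(times))
                    if ids[j] == ids[i] and times[j] >= times[i])
        ll += x[i] * beta - math.log(denom)
    return ll


if __name__ == "__main__":
    ids = ["a", "a", "a", "b", "b"]
    gaps = [3, 5, 2, 4, 6]          # gap time of each spell
    events = [1, 1, 0, 1, 1]
    x_const = [1, 1, 1, 0, 0]       # e.g. gender: constant within person
    x_vary = [0, 1, 1, 1, 0]        # changes from spell to spell
    for b in (0.0, 2.5):
        print(b, stratified_log_pl(ids, gaps, events, x_const, b),
              stratified_log_pl(ids, gaps, events, x_vary, b))
```

The constant covariate gives a flat likelihood (nothing to estimate), while the spell-varying covariate does not.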