View
0
Download
0
Category
Preview:
Citation preview
12th UK Stata Users’ meeting, September 2006 1 Patrick Royston
Visualising and analysing time-to-event data: lifting the veil of censoring
Patrick RoystonCancer Group, MRC Clinical
Trials Unit, London
12th UK Stata Users’ meeting, September 2006 2 Patrick Royston
A poet writes about censored observations:
Last night I saw upon the stairA little man who wasn't thereHe wasn't there again todayOh, how I wish he'd go away!
From Antigonish (1899)
Hughes Mearns (1875-1965)
12th UK Stata Users’ meeting, September 2006 3 Patrick Royston
Outline
• Why is censoring of time-to-event data an issue?
• Example in breast cancer• Visualisation of censored data using
model-based imputation• Multiple imputation and analysis of
survival data with missing covariate observations
• Demonstration with Stata
12th UK Stata Users’ meeting, September 2006 4 Patrick Royston
Why is censoring an issue?
• You can’t picture the raw data easily• Reliance on Kaplan-Meier plots
Exaggerates differences between groupsAttracts attention to unreliable survival estimates at extreme times
• Data will be analysed using Cox modelStill the almost-automatic choice – although decent alternatives exist
• Time is “forgotten about” in the Cox modelAnalysis is based on the ranks of failure times
12th UK Stata Users’ meeting, September 2006 5 Patrick Royston
• Results of Cox regression models are usually expressed as (log) hazard ratios
Indirect – not dealing directly with timeCan be hard to interpret – different effect on survival curves at high and low survival probsParticularly difficult for interactions – ‘ratio of hazard ratios’
• Non-proportional hazardsData with long-term follow-up typically have itModelling and interpretation may be complex
12th UK Stata Users’ meeting, September 2006 6 Patrick Royston
Example: Primary node-positive breast cancer
• GBSG trial BMFT-2• 686 patients, 299 events for recurrence-
free survival (RFS)• Patients assigned to hormonal therapy
(TAM) or not• Visualise the effect of TAM on RFS• Visualise interaction between TAM and ER
(estrogen receptor status)
12th UK Stata Users’ meeting, September 2006 7 Patrick Royston
Traditional visualisation: Kaplan-Meier by TAM group
0.00
0.25
0.50
0.75
1.00
S(t)
0 2 4 6 8Recurrence-free survival time, yr
hormone = No TAM hormone = TAM
Kaplan-Meier survival estimates, by hormone
12th UK Stata Users’ meeting, September 2006 8 Patrick Royston
Dot plot by TAM therapy –unhelpful with censored data
00000010110000111111011111111110111000111101111111111111111111111111111111111110100111111111111011110011111111111110000001100101011111110110110001101100101101010011001011010010111011100010010101111111010101000100011110110000001110110000101010001101011010110100100011100100110000000011010101000101000111000010000000000101001000100100100000100000011000000001000100010000000110000000100000000000000000000010000001001000000010000000001000001000
000011100011101001101110111111100011111111111111111111111001111011001101011000010100111000001010010001010101100100100110001011000010000110011010000000011100000000001100001000000000000000000000100000000001110000100010000011000000000000010000000000
02
46
8R
ecur
renc
e-fre
e su
rviv
al ti
me,
yr
No TAM TAMHormone therapy, 1=no, 2=yes
fig2
12th UK Stata Users’ meeting, September 2006 9 Patrick Royston
How better to visualise survival times?
• To make progress with visualisation, aim to impute the “missing” part of censored times
• Assume a parametric distribution of survival time• Survival times are sometimes approximately
lognormally distributed (Royston 2001a)Can check by using modified Normal Q-Q plot
• If lognormal approximation is not good, can consider Box-Cox transformation of time
Or another transformation towards normality
12th UK Stata Users’ meeting, September 2006 10 Patrick Royston
Assessing lognormality: modified Normal Q-Q plot• Simple transformation of Kaplan-Meier survival curve
.2.5
12
510
Rec
urre
nce-
free
surv
ival
tim
e (lo
g sc
ale)
-3 -2 -1 0 1Normal equivalent deviates
12th UK Stata Users’ meeting, September 2006 11 Patrick Royston
Normal Q-Q plot by TAM group
.2.5
12
510
Rec
urre
nce-
free
surv
ival
tim
e (lo
g sc
ale)
-3 -2 -1 0 1Normal equivalent deviates
No TAM TAM
12th UK Stata Users’ meeting, September 2006 12 Patrick Royston
Visualisation of censored data using imputation
• Create m (≥ 1) copies of the data with censored survival times imputed
• Need an imputation model to reflectDistribution of times (e.g. lognormal)Effects of covariates (prognostic factors)
• Creating an imputation model:Use mfp with cnreg (censored normal regrn.) to model poss. non-linear effects of covariatesE.g. mfp cnreg lnt x1 x2 x3 x4a x4b x5 x6 x7 hormone, censored(c) select(1) dfdefault(2)
12th UK Stata Users’ meeting, September 2006 13 Patrick Royston
Creating the imputed dataset(s)
• Can use the ice multiple imputation command to create the imputations
Royston (2004, 2005a, 2005b) Stata J• ice varlist using filename[.dta][if exp] [in range] [weight],[m(#) cmd(cmdlist) cycles(#) boot[(varlist)] seed(#) dryruneq(eqlist) passive(passivelist) substitute(sublist) dropmissinginterval(intlist) other_options]
12th UK Stata Users’ meeting, September 2006 14 Patrick Royston
Interval censoring with ice
• gen ll = lnt
• gen ul = cond(_d==1, lnt, ln(50))// chose upper limit of 50 years for
RFS: can use . for +∞• (generate FP transformations of x1, x5, x6)• ice x1_1 x2 x3 x4a x4b x5_1 x6_1 x7 hormone ll ul lnt using imputed.dta, interval(lnt:ll ul) m(10)
12th UK Stata Users’ meeting, September 2006 15 Patrick Royston
How interval() works
Censoredobs
Upperlimit
Completeobs
-4 -3 -2 -1 0 1 2 3 4
• Sample randomly from truncated normal distribution (shaded)
12th UK Stata Users’ meeting, September 2006 16 Patrick Royston
Code fragment from uvis.ado
`cmd' `yvarlist' `xvars' `wgt',`options'
...if "`cmd'"=="intreg" {
tempvar PhiA PhiBgen `PhiA‘ = cond(missing(`ll'), 0,
norm((`ll'-`xb')/`rmsestar'))gen `PhiB‘ = cond(missing(`ul'), 1,
norm((`ul'-`xb')/`rmsestar'))replace `yimp‘ = `xb‘
+`rmsestar'*invnorm(`u'*(`PhiB'-`PhiA')+`PhiA')
}
12th UK Stata Users’ meeting, September 2006 17 Patrick Royston
Uses of the interval() option
• Impute right-, left- or interval-censored outcomes
Response variable in time-to-event studies• Impute when a covariate is sometimes
partly observed, sometimes completeSome observations recorded exactlyOthers known to be below or above a cutoffE.g. D-dimer in DVT, PgR/ER in breast cancer
• Interval censored covariatesIncome in surveys recorded as ranges only
12th UK Stata Users’ meeting, September 2006 18 Patrick Royston
Breast cancer data: visualisation of time to recurrence
00000000000001011011010111101111110101101101111111101011111111100011011111011111111111111111111111111110101111111011111011001111111111111110111110110011100111011110101111111101111101011011001111111101000111110101000001111010001101011100110011101100100100101001001001101111101111110011111101000000100010010001011001101101010000000010100000100101011001001011100011100000110000100111001101110110101100100001010000010000010001000101000000001101000000010110111001000100010110011000100001001010100000011000000100000000101000000000101000000100000000000000010000000000010000000000000010001000000100010000000011001010000000000000001100001000100000000000001000100100000000000000001000000000010000
11111111111111111111111101111111111101111111111111111111111111111111111111111111111111111111111111101111011111111111011111111111111111111111111011111111111001111111111111111111111111111010110101101111111101011111111110110101111101110101011001111111010111111111111101111001011110101100110001010011110001100111011011010001010000000100010101001011101101001100000100010000000000010001100001000100001000000000000000000110001010000010000100000000000000000000000000000000000001000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
.05
.1.2
.51
25
1020
50R
ecur
renc
e-fre
e su
rviv
al ti
me,
yr
0 1imputation number
fig_response
12th UK Stata Users’ meeting, September 2006 19 Patrick Royston
Visualisation: some plots using the first imputed sample
.2.5
12
510
2050
Rec
urre
nce-
free
surv
ival
tim
e (lo
g sc
ale)
No TAM TAMHormone therapy
.2.5
12
510
2050
Rec
urre
nce-
free
surv
ival
tim
e (lo
g sc
ale)
0 10 20 30 40 50x5 - number of positive lymph nodes
12th UK Stata Users’ meeting, September 2006 20 Patrick Royston
Visualisation: treatment by covariate interaction
.2.5
12
510
2050
Rec
urre
nce-
free
surv
ival
tim
e (lo
g sc
ale)
ER neg, no TAM ER neg, TAM ER pos, no TAM ER pos, TAMTAM x ER status
12th UK Stata Users’ meeting, September 2006 21 Patrick Royston
Limitations
• Imputed times to event are helpful for visualisation, but less so for analysis
Effectively, such imputations are extrapolations into the futureWe don’t know the future distributionEstimates of means, SD’s, regression coeffsetc. are heavily dependent on the distributional assumptionsPotential for bias if assumed distr’n is wrong
• Imputed times may be unrealisticE.g. survival time 150 years!
12th UK Stata Users’ meeting, September 2006 22 Patrick Royston
Other approaches
• A reasonably large literature exists• Buckley-James estimation (Buckley & James
1979)Estimates the mean of the censored partNot so good for visualisation
• Wei & Tanner (1991)Two algorithms which give multiple imputations of the censored partRelaxes the normality assumption – samples taken from the distribution of the residuals
• stpm (Royston 2001b, Royston & Parmar 2002)More flexible distributions of survival time available
12th UK Stata Users’ meeting, September 2006 23 Patrick Royston
Imputation of survival data with missing covariate observations
• So far, have assumed covariates have complete data• If covariates have missing data, need a suitable algorithm
for multiple imputation of all missing valuese.g. MICE (ice)
• To reduce bias, must include the response (time-to-event) in the imputation model
How?• “Standard” approach is to include (censored) log time and
the censoring indicator in the imputation model No theoretical justification
• May be better toInclude covariates as usualImpute right-censored times using ice with interval() option
• Can also use imputed data for visualisation
12th UK Stata Users’ meeting, September 2006 24 Patrick Royston
Analysis of survival data with missing covariate observations
• Disregard the imputed times in the MI dataset
Except for visualisation purposes
• Use original time and censoring indicator• Can analyse the MI dataset using
stcox (Cox regression)streg (several models available)stpm (flexible parametric survival models)
• micombine supports such models
12th UK Stata Users’ meeting, September 2006 25 Patrick Royston
Conclusions
• Use of familiar graphical tools with imputed times to event can give greater insight into censored survival data
Scatter plots, smoothers, etc
• Treatment or prognostic effects may be depressingly small when displayed as scatter plots of times
Much overlap between groupsWeak regression relationships
• Imputation of times may be helpful in multiple imputation with missing covariate values
12th UK Stata Users’ meeting, September 2006 26 Patrick Royston
Some references
• Buckley J, James I (1979) Linear regression with censored data. Biometrika 66: 429-436• Faucett CL, Schenker N, Taylor JMG (2002) Survival analysis using auxiliary variables via multiple
imputation, with application to AIDS clinical trial data. Biometrics 58: 37-47.• Ma SG (2006) Multiple augmentation with partial missing regressors. Biometrical Journal 48: 83-92• Pan W (2000) A multiple imputation approach to Cox regression with interval-censored data.
Biometrics 56: 199-203• Royston P (2001a) The lognormal distribution as a model for survival time in cancer, with an
emphasis on prognostic factors. Statistica Neerlandica 55: 89-104• Royston P (2001b) Flexible alternatives to the Cox model, and more. The Stata Journal 1:1-28. • Royston P, Parmar MKB (2002) Flexible proportional-hazards and proportional-odds models for
censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21: 2175-2197
• Royston P (2004) Multiple imputation of missing values. Stata Journal 4: 227-241• Royston P (2005a) Multiple imputation of missing values: update. Stata Journal 5: 188-201• Royston P. (2005b) Multiple imputation of missing values: update of ice. Stata Journal 5: 527-536• Tanner MA, Wing HW (1987) The calculation of posterior distributions by data augmentation. JASA
82: 528-540. [Cited 642 times, WoS 10.9.2006]• Wei GCG, Tanner MA (1990) Posterior computations for censored regression data. JASA 85: 829-39• Wei GCG, Tanner MA (1991) Application of multiple imputation to the analysis of censored
regression data. Biometrics 47: 1297-1309
Recommended