1
On ranking in survival analysis: Bounds on the concordance index
Vikas C. Raykar | Harald Steck | Balaji Krishnapuram
CAD & Knowledge Solutions (IKM CKS), Siemens Medical Solutions USA, Inc., Malvern, USA
Cary Dehing-Oberije | Philippe Lambin
Maastro clinic, University Hospital Maastricht, University Maastricht-GROW, The Netherlands
NIPS 2007
2
Organization
• Motivation
• Brief review of survival analysis
• Concordance index
• Our proposed ranking approach
• Connections to survival analysis
• Results
3
Motivation: Personalized medicine
Predict survival time of lung cancer patients.
Different kinds of treatment: chemo/radiotherapy dosage
Different patient characteristics: age/gender/health
Survival time
Dataset available from our collaborator, the MAASTRO hospital.
4
Why not use regression?
• Survival data are not amenable to standard statistical/machine learning methods because of censored data.
• The problem is well studied in statistics as survival analysis.
5
Review: Survival Analysis
Branch of statistics that deals with the time until the occurrence of an event
When did a patient die? When did the disease manifest? When did the machine fail?
Widely used in medical statistics, epidemiology, reliability engineering, economics, sociology, marketing, insurance, etc.
6
What is censored data?
[Figure: study timeline along a TIME axis, from 2001 (start of the study; data collected at this time) to 2005 (end of study). Labels: Patient 1, Death, Censored Data.]
Some patients die during the study period.
At the end of the study a lot of patients may still survive.
Patient unavailable for follow-up
The exact survival time may be longer than the observation period
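A minimal sketch of how right-censored observations like these are commonly encoded: each patient gets an observed time and an event flag (1 = death observed, 0 = censored). The function name `right_censor`, the entry and death years, and the use of NaN for "death never observed" are illustrative assumptions, not part of the original study. Loss to follow-up would need an extra "last seen" date and is not handled here.

```python
import numpy as np

# Study window taken from the slide: 2001 to 2005.
STUDY_END = 2005.0

def right_censor(entry_year, death_year, study_end=STUDY_END):
    """Return (observed_time, event) per patient.

    death_year may be NaN (never observed) or later than study_end.
    event = 1 means the death was observed; event = 0 means censored.
    """
    entry_year = np.asarray(entry_year, dtype=float)
    death_year = np.asarray(death_year, dtype=float)
    # NaN compares as False, so unknown deaths count as censored.
    died_in_study = death_year <= study_end
    last_seen = np.where(died_in_study, death_year, study_end)
    return last_seen - entry_year, died_in_study.astype(int)

# Hypothetical patients: one dies in 2003, one is alive at study end,
# one dies in 2007 (after the study, so only a lower bound is known).
time, event = right_censor([2001, 2001, 2002], [2003, np.nan, 2007])
print(time, event)
```

For a censored patient the observed time is only a lower bound on the true survival time, which is why ordinary regression on `time` alone is not appropriate.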
7
Censoring provides only partial information
[Figure: observed data versus censored data; axis: Survival Time.]
Typically a large portion of the data is censored.
9
Proportional Hazard (PH) Model
• Has become a standard model for studying the effect of covariates on survival time distributions.
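$h(t \mid x) = h_0(t)\,\exp(w^\top x)$
where $h_0(t)$ is the baseline hazard function, $\exp(w^\top x)$ is the relative hazard function, $x$ is the covariate vector, and $w$ are the unknown regression parameters.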
• Parameter estimates for PH model are obtained by maximizing Cox’s partial likelihood.
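As a rough illustration of what "maximizing Cox's partial likelihood" involves, the sketch below evaluates the log partial likelihood for a hazard of the standard form h0(t) exp(w·x). It assumes no tied event times (no tie correction is applied); the function name and the toy data are hypothetical.

```python
import numpy as np

def cox_log_partial_likelihood(w, X, time, event):
    """Log of Cox's partial likelihood, assuming no tied event times.

    Each observed death i contributes
        w.x_i - log( sum of exp(w.x_j) over subjects j still at risk ),
    where "at risk" means time[j] >= time[i].
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    scores = X @ np.asarray(w, dtype=float)
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        total += scores[i] - np.log(np.sum(np.exp(scores[at_risk])))
    return total

# Toy check: with w = 0 every subject at risk is equally likely to fail,
# so three ordered deaths give log(1/3) + log(1/2) + log(1/1).
X_toy = np.zeros((3, 1))
value = cox_log_partial_likelihood([0.0], X_toy, [1.0, 2.0, 3.0], [1, 1, 1])
print(value, -np.log(6.0))
```

The baseline hazard cancels out of every term, which is why it never has to be estimated to fit w. Censored subjects contribute no term of their own but do appear in the risk sets.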
10
Concordance Index or c-index
• Standard performance measure for model assessment in survival analysis.
• Generalization of the area under the ROC curve to regression problems/censored data.
• Fraction of all pairs of subjects whose survival times can be ordered such that the subject with higher predicted survival is the one who actually survived longer.
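The definition above can be sketched directly for the uncensored case. This is a plain O(n²) illustration, not an optimized implementation. Two conventions here are assumptions: pairs with tied survival times are skipped, and tied predictions count as not concordant (some definitions give them half credit).

```python
from itertools import combinations

def concordance_index(time, predicted):
    """Fraction of orderable pairs whose predicted order matches the true order."""
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue  # tied survival times cannot be ordered
        comparable += 1
        # Same sign means the prediction orders the pair correctly.
        if (predicted[i] - predicted[j]) * (time[i] - time[j]) > 0:
            concordant += 1
    return concordant / comparable

survival = [1, 2, 3, 4, 5]
print(concordance_index(survival, [1, 2, 3, 4, 5]))  # all pairs correct
print(concordance_index(survival, [1, 2, 3, 5, 4]))  # one of ten pairs swapped
print(concordance_index(survival, [5, 4, 3, 2, 1]))  # all pairs reversed
```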
11
Concordance Index: no censoring
[Figure: five subjects (1-5) plotted by survival time against a covariate.]
C = 1: perfect prediction accuracy
C = 0.5: as good as a random predictor
12
Concordance Index: with censoring
[Figure: five subjects (1-5) ordered by survival time, with some points marked as censored.]
No arrow can go above a censored point
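One way to read this rule in code: a pair can only be ordered when the subject with the shorter observed time actually died, because a censored subject's true survival time is unknown beyond that point. The sketch below follows that reading. The function name and toy data are hypothetical, and pairs with equal observed times are simply skipped.

```python
def censored_concordance_index(time, event, predicted):
    """C-index over pairs (i, j) where i's death is observed and time[i] < time[j]."""
    concordant = 0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

time = [1, 2, 3, 4, 5]
event = [1, 0, 1, 1, 0]          # subjects 2 and 5 are censored
print(censored_concordance_index(time, event, [1, 2, 3, 4, 5]))
print(censored_concordance_index(time, event, [1, 2, 3, 5, 4]))
```

With this toy data only 7 of the 10 pairs are comparable, so censoring reduces the amount of ordering information available to the index.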
13
Proposed approach: Maximize CI directly
• While CI is widely used to evaluate a learnt model, it is not generally used as an objective function for training.
• CI is invariant to monotone transformation of the survival times.
• Hence the model learnt by maximizing the CI is a ranking function. (N-partite ranking problem)
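The invariance claim is easy to check numerically: the CI depends only on the ordering of the survival times, so any strictly increasing transformation leaves it unchanged. The sketch below uses synthetic, uncensored data; the helper `c_index` and all values are illustrative assumptions.

```python
import numpy as np

def c_index(time, predicted):
    """Uncensored C-index via pairwise differences (ties in time are skipped)."""
    time = np.asarray(time, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    dt = time[:, None] - time[None, :]
    dp = predicted[:, None] - predicted[None, :]
    comparable = dt > 0            # ordered pairs where the first subject lived longer
    return np.mean(dp[comparable] > 0)

rng = np.random.default_rng(0)
time = rng.exponential(scale=12.0, size=50)            # synthetic survival times
predicted = time + rng.normal(scale=6.0, size=50)      # noisy predictions

ci_raw = c_index(time, predicted)
ci_log = c_index(np.log(time), predicted)              # monotone transform
ci_cubed = c_index(time ** 3, predicted)               # another monotone transform
print(ci_raw, ci_log, ci_cubed)
```

Because only the ordering matters, a model trained to maximize the CI does not need to predict survival times on any particular scale. It only needs to rank subjects.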
14
Lower bounds on the CI
Discrete optimization problem
Use a differentiable concave lower bound
Related to the PH model
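The bound formulas on this slide did not survive extraction. As an illustration only, the sketch below checks two standard differentiable, concave lower bounds on the indicator 1[z > 0] that appears in the CI when it is written as a sum over pairs: a log-sigmoid bound and an exponential bound. Treating these as the bounds the slide refers to is an assumption.

```python
import numpy as np

def indicator(z):
    """The 0/1 term counted by the CI for one pair, as a function of the margin z."""
    return (z > 0).astype(float)

def log_sigmoid_bound(z):
    """1 + log2(sigmoid(z)); logaddexp(0, -z) = log(1 + exp(-z)) is numerically stable."""
    return 1.0 - np.logaddexp(0.0, -z) / np.log(2.0)

def exponential_bound(z):
    """1 - exp(-z)."""
    return 1.0 - np.exp(-z)

z = np.linspace(-5.0, 5.0, 1001)
tol = 1e-12  # allow for floating-point rounding at z = 0
print(np.all(log_sigmoid_bound(z) <= indicator(z) + tol))
print(np.all(exponential_bound(z) <= indicator(z) + tol))
```

Both surrogates pass through 0 at z = 0 and approach 1 as z grows, so summing them over comparable pairs gives a smooth objective that never exceeds the true pair count.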
15
Maximize lower bounds on the CI
Linear ranking functions
Regularization
Use gradient-based methods to maximize this
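A minimal sketch of this recipe, under stated assumptions: a linear ranking function f(x) = w·x in which a higher score means longer survival, a log-sigmoid surrogate for each comparable pair, an L2 penalty, and plain gradient ascent. The surrogate choice, the step size, the penalty weight, and the synthetic data are all illustrative. The slide does not specify them.

```python
import numpy as np

def comparable_pairs(time, event):
    """Index arrays (i, j) with subject i's death observed and time[i] < time[j]."""
    earlier = time[:, None] < time[None, :]
    observed = (event == 1)[:, None]
    return np.nonzero(earlier & observed)

def fit_linear_ranker(X, time, event, lam=0.01, lr=0.1, steps=1000):
    """Gradient ascent on mean(1 + log2 sigmoid(w.(x_j - x_i))) - (lam/2)||w||^2."""
    i, j = comparable_pairs(time, event)
    D = X[j] - X[i]                      # we want w @ d > 0 for every pair
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        z = D @ w
        sig_neg = np.exp(-np.logaddexp(0.0, z))   # sigmoid(-z), computed stably
        # d/dz [1 + log2 sigmoid(z)] = sigmoid(-z) / log(2)
        grad = (D * sig_neg[:, None]).mean(axis=0) / np.log(2.0) - lam * w
        w += lr * grad
    return w

def pairwise_accuracy(w, X, time, event):
    """Censored C-index of the linear ranker on the given data."""
    i, j = comparable_pairs(time, event)
    return np.mean((X[j] - X[i]) @ w > 0)

# Synthetic data (hypothetical): log survival time is roughly linear in x.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
w_true = np.array([1.0, -2.0, 0.5])
true_time = np.exp(X @ w_true + 0.1 * rng.normal(size=n))
censor_time = rng.exponential(scale=2.0 * np.median(true_time), size=n)
time = np.minimum(true_time, censor_time)
event = (true_time <= censor_time).astype(int)

w_hat = fit_linear_ranker(X, time, event)
train_ci = pairwise_accuracy(w_hat, X, time, event)
print(w_hat, train_ci)
```

Because the surrogate is concave in w and the penalty is strictly concave, the objective has a single maximum, so simple gradient ascent is adequate for a sketch like this.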
16
Connection to the PH model
For a proportional hazard model we can show that
This is a common assumption made in the ranking literature. We have shown that it holds exactly when PH models are used.
Log-likelihood for correct ranking
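The formula on this slide is missing from the extracted text. A standard property of proportional hazard models is that the probability of one subject outliving another is a sigmoid of the difference in their linear scores. The simulation below checks this under explicit assumptions: hazard h0(t) exp(w·x) with a constant baseline h0 = 1 (so survival times are exponential), and hypothetical values for w and the two covariate vectors. The sign convention may differ from the one used on the original slide.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.8, -0.5])        # hypothetical regression parameters
x_i = np.array([1.0, 0.0])       # hypothetical covariates of subject i
x_j = np.array([0.0, 1.0])       # hypothetical covariates of subject j

# With constant baseline hazard 1, T | x is exponential with rate exp(w @ x).
rate_i = np.exp(w @ x_i)
rate_j = np.exp(w @ x_j)
n = 200_000
T_i = rng.exponential(scale=1.0 / rate_i, size=n)
T_j = rng.exponential(scale=1.0 / rate_j, size=n)

empirical = np.mean(T_i > T_j)
# P(T_i > T_j) = rate_j / (rate_i + rate_j) = sigmoid(w @ (x_j - x_i))
closed_form = 1.0 / (1.0 + np.exp(w @ (x_i - x_j)))
print(empirical, closed_form)
```

The result does not depend on the exponential baseline: applying the cumulative baseline hazard to both survival times is a monotone transformation that preserves their order, so the same probability holds for any baseline.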
18
Cox partial likelihood
• Our proposed method explicitly maximizes a lower bound.
• Cox method maximizes partial likelihood.
• Experimental results indicate that both do well.
• Conjecture: Is Cox’s partial likelihood also a lower bound on the CI?