Model and Variable Selections for Personalized Medicine
Lu Tian (Northwestern University)
Hajime Uno (Kitasato University)
Tianxi Cai, Els Goetghebeur, L.J. Wei (Harvard University)
Outline
• Background and motivation
• Developing and evaluating prediction rules based on a set of markers for
  – Continuous or binary outcome
  – Censored event time outcome
• Evaluating the incremental value of a biomarker over
  – the entire population
  – various sub-populations
• Incorporating the patient-level precision of the prediction
  – Prediction intervals/sets
• Remarks
Background and Motivation
Diagnosis, Prognosis, Treatment
Personalized medicine: using information about a person’s biological and genetic makeup to tailor strategies for the prevention, detection and treatment of disease
Important step: develop prediction rules that can accurately predict a health outcome or diagnose a clinical phenotype
Background and Motivation
Predictor Z: subject characteristics, biomarkers, genetic markers
Outcome Y: disease status, time to event, treatment response
Accurate prediction of disease outcome and treatment response, however, is a complex and difficult task.
Developing prediction rules involves
• Identifying important predictors
• Evaluating the accuracy of the prediction
• Evaluating the incremental value of new markers
Background and Motivation: AIDS Clinical Trial ACTG320
Study objective: to compare
• 3-drug regimen (n=579): Zidovudine + Lamivudine + Indinavir
• 2-drug regimen (n=577): Zidovudine + Lamivudine
Identify biomarkers for predicting treatment response
How well can we predict the treatment response? Is RNA needed?
Predictor Z: Age, CD4 week 0, CD4 week 8, RNA week 0, RNA week 8
Outcome Y: CD4 week 24
Background and Motivation
Regression analysis of Y (CD4 week 24) on the predictors Z: $E(Y \mid Z) = \beta' Z$
Association: are the coefficients for RNA significant?
Is RNA needed?
Background and Motivation: AIDS Clinical Trial
Regression coefficients

            Age     RNA week 0   RNA week 8   CD4 week 0   CD4 week 8
Estimate   -0.55       0.08        -12.06        0.03         0.68
SE          0.35       5.53          2.80        0.07         0.10
P-value     0.12       0.99          0.00        0.72         0.00

The coefficient for RNA week 8 is highly significant. Is RNA needed for a more precise prediction of response?
Background and Motivation
Y = CD4 week 24, Z = predictors. Is RNA needed?
Does adding RNA improve the prediction? Is $\hat{Y}_2(Z, \text{RNA})$ closer to Y than $\hat{Y}_1(Z)$ is?
Prediction procedure: $\hat{Y}_1(Z)$ without RNA, $\hat{Y}_2(Z, \text{RNA})$ with RNA
1. Prediction rule: based on regression models
2. The distance between $\hat{Y}(Z)$ and Y?
Developing Prediction Rules Based on a Set of Markers
Regression approach to approximate Y | Z
• Continuous or binary outcome: generalized linear regression
• Survival outcome:
  – Proportional hazards model
  – Time-specific prediction models: $P(T \ge t \mid Z) = g_t(\beta_t' Z)$
Regression modeling as a vehicle: the procedure has to be valid when the imposed statistical model is not the true model!
Developing and Evaluating Prediction Rules
Predict Y with Z based on the prediction model: $\hat{Y}(Z)$
Examples: $\hat{Y}(Z) = g(\hat{\beta}' Z)$, $\hat{Y}(Z) = I\{g(\hat{\beta}' Z) \ge c\}$
Evaluate the performance of the prediction by the average “distance” between $\hat{Y}(Z)$ and Y
• The utility or cost of predicting Y as $\hat{Y}(Z)$ is $d\{Y, \hat{Y}(Z)\}$
• The average “distance” is $D = E[d\{Y, \hat{Y}(Z)\}]$
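Examples:
• Absolute prediction error: $d\{Y, \hat{Y}(Z)\} = |Y - \hat{Y}(Z)|$
• Total “cost” of risk stratification: $d\{Y = y, \hat{Y}(Z) = k\} = d_{ky}$

         $\hat{Y}(Z) = 1$   $\hat{Y}(Z) = 2$   $\hat{Y}(Z) = 3$
Y = 0       $d_{10}$           $d_{20}$           $d_{30}$
Y = 1       $d_{11}$           $d_{21}$           $d_{31}$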
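As an illustration of the two distance functions, here is a minimal Python sketch. The cost values in `COST`, the helper names, and the toy data are all hypothetical; the slides give no numeric costs. The table is indexed as (stratum k, outcome y), i.e. the cost of assigning stratum k to a subject whose true outcome is y.

```python
def absolute_error(y, y_hat):
    """Absolute prediction error |Y - Yhat(Z)| for a continuous outcome."""
    return abs(y - y_hat)

# Hypothetical cost table: key (k, y) is the cost of assigning risk stratum k
# (1, 2 or 3) to a subject whose true binary outcome is y. Values are made up.
COST = {
    (1, 0): 0.0, (2, 0): 1.0, (3, 0): 3.0,  # subjects with Y = 0
    (1, 1): 5.0, (2, 1): 2.0, (3, 1): 0.0,  # subjects with Y = 1
}

def stratification_cost(y, k, cost=COST):
    """Cost of predicting stratum k for a subject with outcome y."""
    return cost[(k, y)]

def average_distance(ys, preds, d):
    """Sample average of d(Y_i, prediction_i)."""
    return sum(d(y, p) for y, p in zip(ys, preds)) / len(ys)

# Toy data: four subjects with their outcomes and assigned strata.
ys = [0, 1, 1, 0]
strata = [1, 3, 2, 2]
total = average_distance(ys, strata, stratification_cost)
print(total)  # (0 + 0 + 2 + 1) / 4 = 0.75
```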
Evaluating and Comparing Prediction Rules
The performance of the prediction model/rule with $\hat{Y}(Z)$ can be estimated by
$\hat{D} = n^{-1} \sum_{i=1}^{n} d\{Y_i, \hat{Y}(Z_i)\}$
Prediction model/rule comparison:
• Prediction with E(Y | Z) = g1(a’Z) vs E(Y | W) = g2(b’W)
• Compare the two models/rules by comparing $d\{Y, \hat{Y}_1(Z)\}$ and $d\{Y, \hat{Y}_2(Z)\}$:
$\hat{\Delta} = \hat{D}_1 - \hat{D}_2 = n^{-1} \sum_{i=1}^{n} [d\{Y_i, \hat{Y}_1(Z_i)\} - d\{Y_i, \hat{Y}_2(Z_i)\}]$
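The apparent-error estimates and their difference can be sketched in Python as follows. This is a minimal illustration on simulated data, assuming a linear working model fitted by least squares and the absolute prediction error as the distance; the variable names (`z`, `w`, `fit_predict`) are illustrative and not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)  # routine marker (simulated)
w = rng.normal(size=n)  # new marker (simulated)
y = 1.0 + 2.0 * z + 1.0 * w + rng.normal(size=n)

def fit_predict(X, y):
    """Least-squares fit with an intercept; returns in-sample predictions."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ beta

# Per-subject absolute prediction errors under each rule.
loss1 = np.abs(y - fit_predict(z, y))                        # rule 1: z only
loss2 = np.abs(y - fit_predict(np.column_stack([z, w]), y))  # rule 2: z and w

D1_hat, D2_hat = loss1.mean(), loss2.mean()
delta_hat = D1_hat - D2_hat  # estimated reduction in error from adding w
print(D1_hat, D2_hat, delta_hat)
```

Because both rules are fitted and evaluated on the same subjects, these are apparent errors and tend to be optimistic.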
Variability in the Estimated Prediction Performance Measures
Variability in the prediction errors: Estimate = 50, SE = 1? SE = 50?
Inference about D and $\Delta = D_1 - D_2$
Confidence intervals based on large-sample approximations to the distributions of $n^{1/2}(\hat{D} - D)$ and $n^{1/2}(\hat{\Delta} - \Delta)$
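The slides base their intervals on large-sample approximations. As a rough stand-in, the sketch below swaps in a nonparametric bootstrap that refits the working model in each resample; this is a different technique, used only to illustrate how a standard error and interval for the estimated error might be obtained. The data are simulated and all names are hypothetical.

```python
import numpy as np

def apparent_error(X, y):
    """Mean absolute error of a least-squares fit evaluated on its own data."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.mean(np.abs(y - X1 @ beta)))

def bootstrap_ci(X, y, n_boot=300, level=0.95, seed=0):
    """Percentile bootstrap for the apparent error, refitting in each resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        stats[b] = apparent_error(X[idx], y[idx])
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return apparent_error(X, y), float(stats.std(ddof=1)), (float(lo), float(hi))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([2.0, 1.0]) + rng.normal(size=200)
est, se, (lo, hi) = bootstrap_ci(X, y)
print(est, se, lo, hi)
```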
Bias Correction
Bias issue in the apparent-error-type estimators
Bias correction via cross-validation:
• Data partition: $T_k$, $V_k$, k = 1, ..., K
• For each partition
  – Obtain $\hat{\beta}_{(-k)}$ based on observations in $T_k$
  – Obtain $\hat{D}_{(k)}(\hat{\beta}_{(-k)})$ based on observations in $V_k$
• Obtain the cross-validated estimator $\tilde{D} = K^{-1} \sum_{k=1}^{K} \hat{D}_{(k)}(\hat{\beta}_{(-k)})$
$n^{1/2}(\tilde{D} - D)$ and $n^{1/2}(\hat{D}(\hat{\beta}) - D)$ have the same limiting distribution
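The cross-validation steps above can be sketched as follows: fit on the training part, evaluate on the held-out part, and average over the K folds. This is a minimal illustration assuming a linear working model, absolute prediction error, and simulated data with many weak predictors so that the optimism of the apparent error is visible; function names are hypothetical.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients (intercept first)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def abs_error(beta, X, y):
    """Mean absolute prediction error of the rule beta on the data (X, y)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return float(np.mean(np.abs(y - X1 @ beta)))

def cv_error(X, y, K=10, seed=0):
    """K-fold cross-validated error: average of the K validation-fold errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    errs = []
    for k in range(K):
        val = folds[k]                                           # validation set
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        beta_k = fit(X[train], y[train])                         # fit without fold k
        errs.append(abs_error(beta_k, X[val], y[val]))           # evaluate on fold k
    return float(np.mean(errs))

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 10))
y = X[:, 0] + rng.normal(size=60)
apparent = abs_error(fit(X, y), X, y)
cv = cv_error(X, y)
print(apparent, cv)
```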
Example: AIDS Clinical Trial
Objective: identify biomarkers to predict the treatment response
Outcome: Y = CD4 week 24
Predictors Z: Age, CD4 week 0, CD4 week 8, RNA week 0, RNA week 8
Working model: $E(Y \mid Z) = \beta' Z$
Example: AIDS Clinical Trial, Incremental Value of RNA

Estimates (standard error estimates in parentheses)
              Full Model   w/o RNA    Gain Due to RNA
Apparent      51 (2.7)     52 (2.7)   -0.61 (0.61)
10-fold CV    52           53         -0.64
2n/3 CV       53           53         -0.28

95% C.I.
              Full Model   w/o RNA    Gain Due to RNA
Apparent      [46, 56]     [47, 57]   [-2.0, 0.4]
10-fold CV    [47, 57]     [48, 58]   [-2.0, 0.4]
2n/3 CV       [48, 58]     [48, 58]   [-1.5, 0.9]
Incremental Value of RNA within Various Sub-populations
Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)
• Prognostic importance of the left ventricular dysfunction
  – Thune et al (2005): Diamond study
  – Trace study (Kober et al 2005, NEJM)
• Designed to determine whether patients with left ventricular dysfunction soon after myocardial infarction benefit from long-term oral ACE inhibition
• Between 1990 and 1992, a total of 6676 patients with myocardial infarction were screened with echocardiography
• A total of 5921 subjects had available data
Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)
• Routine markers include:
  – Age
  – creatinine (CRE)
  – occurrence of heart failure (CHF)
  – history of diabetes (DIA)
  – history of hypertension (HYP)
  – cardiogenic shock after MI (KS)
• We are interested in evaluating the incremental value of wall motion index (WMI)
          Age     CRE     CHF     DIA     HYP     KS      WMI
Est       .055   -.010    .759    .718    .187    1.153   -1.097
SE        .004    .002    .067    .101    .073    .163     .083
P-value   .000    .000    .000    .000    .010    .000     .000
• Does WMI improve the prediction of 5-year survival?
Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)
Population Average Incremental Value of WMI: Predicting 5-year Survival
5-year mortality rate = 42%

                                     OME
Routine markers w/o WMI              0.28
Markers including WMI                0.26
Population gain attributed to WMI    0.02
$D(Y, \hat{Y}) = I(Y \ne \hat{Y}) = I(Y = 0, \hat{Y} = 1) + I(Y = 1, \hat{Y} = 0)$
D1: error of Y = 0 and $\hat{Y}$ = 1; D2: error of Y = 1 and $\hat{Y}$ = 0
[Figure: gain due to WMI with respect to D, shown for a weighted combination of $I(Y = 0, \hat{Y} = 1)$ and $I(Y = 1, \hat{Y} = 0)$ with weight = 1, 4, 9]
Example: Breast Cancer Gene Expression Study
Objective: construct a new classifier that can accurately predict future disease outcome
van’t Veer et al (2002) established a classifier based on a 70-gene profile:
• A good- or poor-prognosis signature is assigned based on the correlation of each tumor's profile with the previously determined average profile in tumors from patients with good prognosis
• Classify subjects as
  – Good prognosis if Gene score > cut-off
  – Poor prognosis if Gene score < cut-off
van de Vijver et al (2002) evaluated the accuracy of this classifier by using hazard ratios and signature-specific Kaplan-Meier curves
Example: Breast Cancer Gene Expression Study
Data consist of
• 295 subjects
• Outcome T: time to death
• Predictors: Lymph-Node Status, Estrogen Receptor Status, gene score
We are interested in
• Constructing prediction rules to identify subjects who would survive t years, $Y = I(T \ge t) = 1$
• Evaluating the incremental value of the Gene Score
Example: Breast Cancer Data, Predicting 10-year Survival

Model                    Apparent Error    10-fold CV    Random CV
Naïve                    0.30 (0.031)      0.29          0.30
Clinical only            0.28 (0.033)      0.30          0.28
Clinical + Gene Score    0.25 (0.036)      0.27          0.28
Van de Vijver            0.35 (0.050)
Evaluating the Prediction Rule Based on Various Accuracy Measures
For a future patient with $T_0$ and $Z_0$, we predict
$T_0 \ge t$ if $g(\hat{\beta}' Z_0) \ge c$; $T_0 < t$ if $g(\hat{\beta}' Z_0) < c$
Classification accuracy measures
• Sensitivity: $SE(c) = P\{g(\hat{\beta}' Z_0) \ge c \mid T_0 \ge t\}$
• Specificity: $SP(c) = P\{g(\hat{\beta}' Z_0) < c \mid T_0 < t\}$
Prediction accuracy measures
• $PPV(c) = P\{T_0 \ge t \mid g(\hat{\beta}' Z_0) \ge c\}$
• $NPV(c) = P\{T_0 < t \mid g(\hat{\beta}' Z_0) < c\}$
Example: Breast Cancer Data, Predicting 10-year Survival
[Figure legend: Naïve, Clinical, Clinical + Gene, van de Vijver]
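The four accuracy measures can be computed from a 2x2 table once a cut-off c is fixed. The sketch below is a simplified illustration that assumes every subject's t-year status is fully observed (no censoring, which the slides' setting does have to handle) and that a score at or above c predicts a t-year survivor. The scores and outcomes are made up.

```python
def accuracy_measures(scores, survived, c):
    """SE, SP, PPV, NPV for the rule 'predict survivor when score >= c'.

    scores   : model scores, one per subject
    survived : 1 if the subject survived to time t, else 0 (no censoring assumed)
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, survived):
        pred = 1 if s >= c else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(a, b):
        # Undefined when the denominator is empty.
        return a / b if b else float("nan")

    return {
        "SE": ratio(tp, tp + fn),   # among true survivors, fraction predicted to survive
        "SP": ratio(tn, tn + fp),   # among non-survivors, fraction predicted not to survive
        "PPV": ratio(tp, tp + fp),  # among predicted survivors, fraction who survive
        "NPV": ratio(tn, tn + fn),  # among predicted non-survivors, fraction who do not
    }

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
survived = [1, 1, 0, 1, 0, 0]
measures = accuracy_measures(scores, survived, c=0.5)
print(measures)  # every measure equals 2/3 for this toy data
```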
Example: Breast Cancer Data
To compare
• Model II: g(a + Node + ER)
• Model III: g(a + Node + ER + Gene)
Choosing cut-off values for each model to achieve SE = 69%, which is an attainable value for Model II, then

             SP      PPV     NPV
Model II     0.45    0.35    0.77
Model III    0.75    0.54    0.85

95% CI for the difference in SP: [0.11, 0.45], PPV: [0.01, 0.24], NPV: [0.06, 0.19]
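Comparing two models at a common sensitivity means choosing a separate cut-off for each model and then reading off the remaining measures. The sketch below shows one way this could be done, again assuming fully observed outcomes; the scores are invented and the helper names are hypothetical.

```python
import math

def cutoff_for_sensitivity(scores, positive, target_se):
    """Largest cut-off c such that at least target_se of positives have score >= c."""
    pos_scores = sorted((s for s, y in zip(scores, positive) if y == 1), reverse=True)
    k = math.ceil(target_se * len(pos_scores))  # positives that must be flagged
    return pos_scores[k - 1]

def specificity(scores, positive, c):
    """Fraction of negatives with score below the cut-off c."""
    neg = [s for s, y in zip(scores, positive) if y == 0]
    return sum(s < c for s in neg) / len(neg)

positive = [1, 1, 1, 1, 0, 0, 0, 0]
model2 = [0.9, 0.6, 0.5, 0.2, 0.7, 0.55, 0.3, 0.1]  # made-up scores, weaker model
model3 = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]   # made-up scores, stronger model

c2 = cutoff_for_sensitivity(model2, positive, 0.75)
c3 = cutoff_for_sensitivity(model3, positive, 0.75)
sp2 = specificity(model2, positive, c2)
sp3 = specificity(model3, positive, c3)
print(c2, sp2, c3, sp3)  # 0.5 0.5 0.7 1.0
```

Fixing sensitivity first puts both models on the same footing, so the difference in specificity reflects the added marker and not a different choice of threshold.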
Prediction Interval: Accounting for the Precision of the Prediction
Based on a prediction model
• predict the response $Y_0$ as $\hat{Y}(Z_0)$
• summarize the corresponding population average accuracy $D = E[d\{Y_0, \hat{Y}(Z_0)\}]$, estimated by $\hat{D}$
What if the population average accuracy of 70% is not satisfactory? How to achieve 90% accuracy?
What if $\hat{Y}(Z_0)$ can predict $Y_0$ more precisely for certain $Z_0$, while failing to predict $Y_0$ accurately for other $Z_0$?
Account for the precision of the prediction? Identify patients who would need further assessment?
Classic rule:
• Risk of death < 0.50: survivor {Y=0}
• Risk of death ≥ 0.50: non-survivor {Y=1}
Predicted risk = 0.04: {0}. Predicted risk = 0.51: {1}.
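Prediction Interval
To account for patient-level prediction error, one may instead predict $Y_0 \in \hat{K}(Z_0)$ such that the conditional coverage probability $P\{Y_0 \in \hat{K}(Z_0) \mid Z_0\}$ reaches a prescribed level
The optimal interval for the population with $Z_0$ is
$\hat{K}(Z_0) = \{y : \hat{f}(y \mid Z_0) \ge c\}$
$\hat{f}(y \mid Z_0)$: estimated conditional density function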
Example: Breast Cancer Study
Data: 295 patients
Response: 10-year survival
Predictors: Lymph-Node Status, Estrogen Receptor Status, Gene Score
Model: $P(T < 10 \mid Z) = g(\beta' Z)$
Possible prediction sets: {}, {0}, {1}, {0,1}
Classic prediction: considers {0}, {1} only.
Predicted risk = 0.51: 90% prediction set {0,1}
Predicted risk = 0.04: 90% prediction set {0}
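For a binary outcome, a prediction set of this kind can be built by adding outcomes in order of decreasing estimated probability until the target coverage is reached. The sketch below is a plug-in illustration only: it treats the predicted risk as exact and ignores estimation error, which the slides' procedure may account for, and it never returns the empty set. The function name is hypothetical.

```python
def prediction_set(risk, level=0.90):
    """Smallest set of outcomes whose estimated probability reaches `level`.

    risk : estimated probability that Y = 1 (non-survivor)
    """
    probs = {1: risk, 0: 1.0 - risk}
    chosen, covered = set(), 0.0
    # Add outcomes from most to least probable until coverage is reached.
    for y in sorted(probs, key=probs.get, reverse=True):
        if covered >= level:
            break
        chosen.add(y)
        covered += probs[y]
    return chosen

print(prediction_set(0.04))  # {0}
print(prediction_set(0.51))  # {0, 1}
```

With a predicted risk of 0.04 the single outcome 0 already has probability 0.96, so the 90% set is {0}; with a risk of 0.51 neither outcome alone reaches 90%, so the set must contain both.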
Example: Breast Cancer Study, Prediction Sets Based on Clinical + Gene Score
[Figure: prediction sets based on Clinical + Gene Score; labelled percentages (0%), (63%), (37%) and 4%, 39%, 57%]
Remarks
Proper choice of the accuracy/cost measure
• Classification accuracy vs predictive values
• Utility function: what is the consequence of predicting a subject with outcome Y as $\hat{Y}(Z)$?
With an expensive or invasive marker
• Should it be applied to the entire population?
• Is it helpful for a certain sub-population?
• Should the cost of the marker be considered when evaluating its value?