Use of Candidate Predictive Biomarkers in the Design of Phase III Clinical Trials
Richard Simon, D.Sc.
Chief, Biometric Research Branch
National Cancer Institute
http://brb.nci.nih.gov
Predictive biomarkers
Measured before treatment to identify who is likely or unlikely to benefit from a particular treatment
ER, HER2, KRAS, EGFR
Biomarker Validity
Analytical validity
  Measures what it's supposed to
  Reproducible and robust
Clinical validity (correlation)
  Correlates with a clinical characteristic or outcome
Medical utility
  Actionable, resulting in patient benefit
Developing a drug with a companion test increases the complexity and cost of development, but should improve the chance of success and has substantial benefits for patients and for the economics of health care.
How can we do it in a way that provides the kind of reliable answers we expect from phase III trials?
When the Biology is Clear
1. Develop a completely specified classifier of the patients likely (or unlikely) to benefit from a new drug
   Classifier is based on either a single gene/protein or a composite score
2. Develop an analytically validated test
3. Design a focused clinical trial to evaluate effectiveness of the new treatment and how it relates to the test
Targeted (Enrichment) Design
[Diagram: Using phase II data, develop a predictor of response to the new drug. Patients predicted responsive are randomized to new drug vs control; patients predicted non-responsive are off study.]
Evaluating the Efficiency of Targeted Design
Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006.
Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.
Relative efficiency of targeted design depends on
  proportion of patients test positive
  effectiveness of new drug (compared to control) for test negative patients
When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients than the standard design in which the marker is not used
Comparing T vs C on Survival or DFS
5% 2-sided Significance and 90% Power

% Reduction in Hazard    Number of Events Required
25%                      509
30%                      332
35%                      227
40%                      162
45%                      118
50%                      88
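The event counts in this table are close to what the Schoenfeld approximation for the log-rank test gives, D = 4 (z_{1-alpha/2} + z_{power})^2 / (ln HR)^2. The slides do not say which formula was used, so the sketch below is an assumption. It reproduces the 35% to 50% rows and lands within an event or two of the 25% and 30% rows. The function name `events_required` is illustrative only.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_reduction, alpha=0.05, power=0.90):
    """Approximate events needed for a two-arm log-rank test (1:1 randomization).

    Assumes the Schoenfeld approximation; the slides do not name their formula.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    log_hr = log(1 - hazard_reduction)             # hazard ratio = 1 - reduction
    return ceil(4 * (z_alpha + z_power) ** 2 / log_hr ** 2)


for reduction in (0.25, 0.30, 0.35, 0.40, 0.45, 0.50):
    print(f"{reduction:.0%} reduction in hazard: {events_required(reduction)} events")
```

The required number of events grows with 1 / (ln HR)^2, which is why small diluted effects are so expensive to detect.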
Hazard ratio 0.60 for test + patients
  40% reduction in hazard
Hazard ratio 1.0 for test – patients
  0% reduction in hazard
33% of patients test positive
Hazard ratio for unselected population is 0.33*0.60 + 0.67*1 = 0.87
  13% reduction in hazard
To have 90% power for detecting 40% reduction in hazard within a biomarker positive subset
  Number of events within subset = 162
To have 90% power for detecting 13% reduction in hazard overall
  Number of events = 2172
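As a rough check of this arithmetic, the sketch below mixes the two hazard ratios by marker prevalence, as the slide does, and then applies the Schoenfeld approximation for the log-rank test. Both the formula and the variable names are assumptions, not something the slides specify. It should give 162 events for the test positive subset and roughly 2,170 events for the unselected population, in line with the 2172 quoted here.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha=0.05, power=0.90):
    """Schoenfeld approximation (assumed here) for a 1:1 randomized comparison."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(4 * z ** 2 / log(hazard_ratio) ** 2)


prevalence = 0.33                      # fraction of patients who test positive
hr_positive, hr_negative = 0.60, 1.00  # hazard ratios from the slide

# Prevalence-weighted average, rounded to two decimals as on the slide.
hr_overall = round(prevalence * hr_positive + (1 - prevalence) * hr_negative, 2)

events_subset = events_required(hr_positive)
events_overall = events_required(hr_overall)
print(f"overall hazard ratio: {hr_overall}")
print(f"events needed in test + subset: {events_subset}")
print(f"events needed in unselected population: {events_overall}")
```

The weighted average of hazard ratios is itself an approximation; it is used here only because the slide uses it.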
Stratification Design
[Diagram: Develop predictor of response to new Rx. Patients predicted responsive to new Rx are randomized to new Rx vs control; patients predicted non-responsive to new Rx are also randomized to new Rx vs control.]
Develop prospective analysis plan for evaluation of treatment effect and how it relates to biomarker
  Type I error should be protected for multiple comparisons
  Trial sized for evaluating treatment effect overall and in subsets defined by test
"Stratifying" (balancing) the randomization is useful to ensure that all randomized patients have the test performed but is not necessary for the validity of comparing treatments within marker defined subsets
Post-stratification provides more time for development of analytically validated tests but risks validity of the results if adequate specimens are not collected in close to 100% of cases
Fallback Analysis Plan
Compare the new drug to the control overall for all patients ignoring the classifier.
  If p_overall ≤ 0.01 claim effectiveness for the eligible population as a whole
Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients
  If p_subset ≤ 0.04 claim effectiveness for the classifier + patients.
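The decision rule of the fallback plan can be written as a short function. This is a minimal sketch: the function name and the returned strings are illustrative, and the two p-values are assumed to come from pre-specified two-sided tests. The thresholds 0.01 and 0.04 add up to the overall 0.05 type I error.

```python
def fallback_analysis(p_overall, p_subset, alpha_overall=0.01, alpha_subset=0.04):
    """Apply the two-step fallback plan to an overall and a subset p-value."""
    if p_overall <= alpha_overall:
        # Step 1: overall comparison, ignoring the classifier.
        return "effective in the eligible population as a whole"
    if p_subset <= alpha_subset:
        # Step 2: single subset analysis in classifier-positive patients.
        return "effective in classifier-positive patients"
    return "no claim of effectiveness"


print(fallback_analysis(p_overall=0.20, p_subset=0.03))
```

The subset test is only consulted when the overall test fails, so the subset p-value has no influence on a trial that is already positive overall.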
Sample size for Analysis Plan
To have 90% power for detecting uniform 33% reduction in overall hazard at 1% two-sided level requires 370 events.
If 33% of patients are positive, then when there are 370 total events there will be approximately 123 events in positive patients
  123 events provide 90% power for detecting a 45% reduction in hazard at a 4% two-sided significance level.
To detect a 40% reduction in hazard in an a-priori defined subset with 90% power and a 5% significance level requires 162 events in the subset.
To detect a 40% reduction in hazard in an a-priori defined subset with 90% power and a 4% two-sided significance level requires 171 events in the subset.
If the prevalence of the marker is 33%, then the trial might be sized for 3*171 = 513 total events.
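The split-alpha sizing can be approximated in a few lines. The sketch assumes the Schoenfeld formula for the log-rank test, which the slides do not name, and treats a 33% prevalence as one event in three, as the slide's 3*171 does. Under those assumptions the subset requirement comes out at 171 events and the overall requirement at the 1% level within a few events of the 370 quoted.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_reduction, alpha, power=0.90):
    """Schoenfeld approximation (an assumption) for a two-sided log-rank test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(4 * z ** 2 / log(1 - hazard_reduction) ** 2)


# Overall test of a uniform 33% hazard reduction at the 1% two-sided level.
events_overall = events_required(0.33, alpha=0.01)

# Subset test of a 40% hazard reduction at the 4% two-sided level.
events_subset = events_required(0.40, alpha=0.04)

# With about one patient in three marker positive, total events needed so
# that the positive subset accrues enough events on its own.
total_events_for_subset = 3 * events_subset

print(events_overall, events_subset, total_events_for_subset)
```

Sizing for the subset dominates here: the trial needs more total events to power the 4% subset test than to power the 1% overall test.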
R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14:5984-93, 2008
R Simon. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics, Expert Opinion in Medical Diagnostics 2:721-29, 2008
Web Based Software for Planning Clinical Trials of Treatments with a Candidate Predictive Biomarker
http://brb.nci.nih.gov
The Biology is Often Not So Clear
Cancer biology is complex and it is not always possible to have the right single completely defined predictive classifier identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual
K Candidate Biomarkers Design
Based on Adaptive Threshold Design
W Jiang, B Freidlin & R Simon
JNCI 99:1036-43, 2007
K Candidate Biomarkers Design
Have identified K candidate binary classifiers B_1, …, B_K thought to be predictive of patients likely to benefit from T relative to C
Eligibility not restricted by candidate markers
Compare T vs C for all patients
  If results are significant at level .01 claim broad effectiveness of T
  Otherwise proceed as follows
Compare T vs C for the subset of patients positive for marker 1; compute p_1
Similarly compare T vs C for the subset of patients positive for marker 2 (p_2), positive for marker 3 (p_3), …, positive for marker K (p_K)
Compute p* = min{p_1, p_2, …, p_K}
Compute whether a value of p* is statistically significant when adjusted for multiple testing
  Adjust for multiple testing using permutation of treatment labels to adjust for correlation among tests
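The permutation adjustment for the minimum p-value can be sketched as follows. To stay self-contained, the sketch swaps in a binary response endpoint and a two-proportion z statistic where the design itself would use survival data and a log-rank test; that substitution, and all function and variable names, are assumptions made for illustration. Taking the largest |z| across markers is equivalent to taking the smallest p-value, because each p-value is a decreasing function of |z|.

```python
import random
from math import sqrt


def subset_z(treat, outcome, marker):
    """|z| comparing response rates of T (1) vs C (0) among marker-positive patients."""
    t = [y for a, y, m in zip(treat, outcome, marker) if m and a == 1]
    c = [y for a, y, m in zip(treat, outcome, marker) if m and a == 0]
    if not t or not c:
        return 0.0
    pooled = (sum(t) + sum(c)) / (len(t) + len(c))
    se = sqrt(pooled * (1 - pooled) * (1 / len(t) + 1 / len(c)))
    if se == 0:
        return 0.0
    return abs(sum(t) / len(t) - sum(c) / len(c)) / se


def adjusted_min_p(treat, outcome, markers, n_perm=1000, seed=0):
    """Permutation p-value for the best of K marker-positive subset comparisons."""
    rng = random.Random(seed)
    observed = max(subset_z(treat, outcome, m) for m in markers)
    labels = list(treat)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute treatment labels; markers and outcomes stay fixed
        permuted = max(subset_z(labels, outcome, m) for m in markers)
        if permuted >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because every permutation recomputes all K subset statistics on the same patients, the correlation among overlapping marker subsets is carried into the null distribution automatically.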
To detect a 40% reduction in hazard in an a-priori defined subset with 90% power and a 4% two-sided significance level requires 171 events in the subset.
If the prevalence of the marker is 33%, then the trial might be sized for 3*171 = 513 total events.
To adjust for multiplicity with 4 independent tests, the 171 subset events increase to 224 and the 513 total events to 672.
Designs When there are Many Candidate Markers and too Much Patient Heterogeneity for any Single Marker
Adaptive Signature Design
Boris Freidlin and Richard Simon
Clinical Cancer Research 11:7872-8, 2005
Biomarker Adaptive Signature Design
Randomized trial of T vs C
Large number of candidate predictive biomarkers available
Eligibility not restricted by any biomarker
This approach can be used with any set of candidate markers
End of Trial Analysis
Fallback Analysis
Compare T to C for all patients at significance level α_0 (e.g. 0.01)
  If overall H_0 is rejected, then claim effectiveness of T for eligible patients
  Otherwise proceed as follows
Using only a randomly selected subset of patients of pre-specified size (e.g. 1/3) to be used as a training set T, develop a binary classifier M of whether a patient is likely to benefit from T relative to C
  The classifier may use multiple markers
  The classifier classifies patients into only 2 subsets: those predicted to benefit from T and those for whom T is not predicted better than C
Apply the classifier M to classify patients in the validation set V = D - T
Compare T vs C in the subset of V who are predicted to benefit from T using a threshold of significance of 0.04
This approach can also be used to identify the subset of patients who don't benefit from T in cases where T is superior to C overall at the 0.01 level.
Cross-Validated Adaptive Signature Design
Freidlin B, Jiang W, Simon R
Clinical Cancer Research 16(2) 2010
At the conclusion of the trial randomly partition the patients into K approximately equally sized sets P_1, …, P_K
Let D_-i denote the full dataset minus data for patients in P_i
Omit patients in P_1
  Apply the defined algorithm to analyze the data in D_-1 to obtain a classifier M_-1
  Classify each patient j in P_1 using model M_-1
  Record the treatment recommendation T or C
Repeat the above for all K loops of the cross-validation
All patients have been classified once as what their optimal treatment is predicted to be
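The cross-validation loop above can be sketched as follows. The classifier-building algorithm is left abstract: `fit_classifier` is a hypothetical callable supplied by the analyst that takes the training patients and returns a function mapping one patient to "T" or "C". The patient records and all names are assumptions for illustration.

```python
import random


def cross_validated_recommendations(patients, fit_classifier, k=10, seed=0):
    """Classify every patient once, using a model built without that patient's fold."""
    rng = random.Random(seed)
    order = list(range(len(patients)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]  # P_1, ..., P_K of roughly equal size

    recommended = [None] * len(patients)
    for fold in folds:
        held_out = set(fold)
        # D_-i: the full dataset minus the patients in this fold.
        training = [p for i, p in enumerate(patients) if i not in held_out]
        model = fit_classifier(training)          # classifier M_-i
        for i in fold:
            recommended[i] = model(patients[i])   # record "T" or "C"
    return recommended
```

A usage sketch: with `recs = cross_validated_recommendations(patients, fit_classifier)`, the set of patients for whom T is predicted optimal is `[i for i, r in enumerate(recs) if r == "T"]`. The important design point is that the classifier applied to a patient never saw that patient's data.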
Let S_T denote the set of patients for whom treatment T is predicted optimal
Compare outcomes for patients in S_T who actually received T to those in S_T who actually received C
  Compute Kaplan Meier curves of those receiving T and those receiving C
  Let z_T = standardized log-rank statistic
Test of Significance for Effectiveness of T vs C
Compute statistical significance of z_T by randomly permuting treatment labels and repeating the entire cross-validation procedure
  Do this 1000 or more times to generate the permutation null distribution of treatment effect for the patients in each subset
By applying the analysis algorithm to the full RCT dataset D, recommendations are developed for how future patients should be treated
The size of the T vs C treatment effect for the indicated population is (conservatively) estimated by the Kaplan Meier survival curves of T and of C in S_T
70% Response to T in Sensitive Patients
25% Response to T Otherwise
25% Response to C
30% Patients Sensitive

                              ASD      CV-ASD
Overall 0.05 Test             0.830    0.838
Overall 0.04 Test             0.794    0.808
Sensitive Subset 0.01 Test    0.306    0.723
Overall Power                 0.825    0.918
506 prostate cancer patients were randomly allocated to one of four arms:
  Placebo and 0.2 mg of diethylstilbestrol (DES) were combined as control arm C
  1.0 mg DES and 5.0 mg DES were combined as T
The end-point was overall survival (death from any cause).
Covariates:
  Age: in years
  Performance status (pf): not bed-ridden at all vs other
  Tumor size (sz): size of the primary tumor (cm2)
  Index of a combination of tumor stage and histologic grade (sg)
  Serum prostatic acid phosphatase levels (ap)
Figure 1: Overall analysis. The value of the log-rank statistic is 2.9 and the corresponding p-value is 0.09. The new treatment thus shows no benefit overall at the 0.05 level.
Figure 2: Cross-validated survival curves for patients predicted to benefit from the new treatment. Log-rank statistic = 10.0, permutation p-value is .002
Figure 3: Survival curves for cases predicted not to benefit from the new treatment. The value of the log-rank statistic is 0.54.
Acknowledgements
Boris Freidlin
Yingdong Zhao
Wenyu Jiang
Aboubakar Maitournam