Upload
madlyn-rogers
View
216
Download
3
Tags:
Embed Size (px)
Citation preview
11
Issues in the Use of Adaptive Clinical Trial Designs
Issues in the Use of Adaptive Clinical Trial Designs
Scott S. Emerson, M.D., Ph.D.Professor of Biostatistics University of Washington
Scott S. Emerson, M.D., Ph.D.Professor of Biostatistics University of Washington
© 2002, 2003, 2004 Scott S. Emerson, M.D., Ph.D.© 2002, 2003, 2004 Scott S. Emerson, M.D., Ph.D.
22
Clinical TrialsClinical Trials
Experimentation in human volunteers– Investigates a new treatment/preventive agent
• Safety: » Are there adverse effects that clearly outweigh any
potential benefit?• Efficacy:
» Can the treatment alter the disease process in a beneficial way?
• Effectiveness: » Would adoption of the treatment as a standard affect
morbidity / mortality in the population?
Experimentation in human volunteers– Investigates a new treatment/preventive agent
• Safety: » Are there adverse effects that clearly outweigh any
potential benefit?• Efficacy:
» Can the treatment alter the disease process in a beneficial way?
• Effectiveness: » Would adoption of the treatment as a standard affect
morbidity / mortality in the population?
33
Clinical Trial DesignClinical Trial Design
Finding an approach that best addresses the often competing goals: Science, Ethics, Efficiency
• Basic scientists: focus on mechanisms• Clinical scientists: focus on overall patient health• Ethical: focus on patients on trial, future patients• Economic: focus on profits and/or costs• Governmental: focus on validity of marketing claims• Statistical: focus on questions answered precisely • Operational: focus on feasibility of mounting trial
Finding an approach that best addresses the often competing goals: Science, Ethics, Efficiency
• Basic scientists: focus on mechanisms• Clinical scientists: focus on overall patient health• Ethical: focus on patients on trial, future patients• Economic: focus on profits and/or costs• Governmental: focus on validity of marketing claims• Statistical: focus on questions answered precisely • Operational: focus on feasibility of mounting trial
44
Statistical PlanningStatistical Planning
Ensure that the trial will satisfy the various collaborators as much as possible
• Discriminate between relevant scientific hypotheses– Scientific and statistical credibility
• Protect economic interests of sponsor– Efficient designs– Economically important estimates
• Protect interests of patients on trial– Stop if unsafe or unethical– Stop when credible decision can be made
• Promote rapid discovery of new beneficial treatments
Ensure that the trial will satisfy the various collaborators as much as possible
• Discriminate between relevant scientific hypotheses– Scientific and statistical credibility
• Protect economic interests of sponsor– Efficient designs– Economically important estimates
• Protect interests of patients on trial– Stop if unsafe or unethical– Stop when credible decision can be made
• Promote rapid discovery of new beneficial treatments
55
Refine Scientific HypothesesRefine Scientific Hypotheses
– Target population• Inclusion, exclusion, important subgroups
– Intervention• Dose, administration (intention to treat)
– Measurement of outcome(s)• Efficacy/effectiveness, toxicity
– Statistical hypotheses in terms of some summary measure of outcome distribution
• Mean, geometric mean, median, odds, hazard, etc.
– Criteria for statistical credibility• Frequentist (type I, II errors) or Bayesian
– Target population• Inclusion, exclusion, important subgroups
– Intervention• Dose, administration (intention to treat)
– Measurement of outcome(s)• Efficacy/effectiveness, toxicity
– Statistical hypotheses in terms of some summary measure of outcome distribution
• Mean, geometric mean, median, odds, hazard, etc.
– Criteria for statistical credibility• Frequentist (type I, II errors) or Bayesian
66
Statistics to Address VariabilityStatistics to Address Variability
At the end of the study:– Frequentist and/or Bayesian data analysis to assess
the credibility of clinical trial results• Estimate of the treatment effect
– Single best estimate– Precision of estimates
• Decision for or against hypotheses– Binary decision– Quantification of strength of evidence
At the end of the study:– Frequentist and/or Bayesian data analysis to assess
the credibility of clinical trial results• Estimate of the treatment effect
– Single best estimate– Precision of estimates
• Decision for or against hypotheses– Binary decision– Quantification of strength of evidence
77
Statistical Sampling PlanStatistical Sampling Plan
Ethical and efficiency concerns are addressed through sequential sampling
• During the conduct of the study, data are analyzed at periodic intervals and reviewed by the DMC
• Using interim estimates of treatment effect– Decide whether to continue the trial– If continuing, decide on any modifications to
» scientific / statistical hypotheses and/or» sampling scheme
Ethical and efficiency concerns are addressed through sequential sampling
• During the conduct of the study, data are analyzed at periodic intervals and reviewed by the DMC
• Using interim estimates of treatment effect– Decide whether to continue the trial– If continuing, decide on any modifications to
» scientific / statistical hypotheses and/or» sampling scheme
88
Sampling Plan: General ApproachSampling Plan: General Approach
– Perform analyses when sample sizes N1. . . NJ
• Can be randomly determined
– At each analysis choose stopping boundaries• aj < bj < cj < dj
– Compute test statistic T(X1. . . XNj)
• Stop if T < aj (extremely low)
• Stop if bj < T < cj (approximate equivalence)
• Stop if T > dj (extremely high)
• Otherwise continue (with possible adaptive modification of analysis schedule, sample size, etc.)
– Boundaries for modification of sampling plan
– Perform analyses when sample sizes N1. . . NJ
• Can be randomly determined
– At each analysis choose stopping boundaries• aj < bj < cj < dj
– Compute test statistic T(X1. . . XNj)
• Stop if T < aj (extremely low)
• Stop if bj < T < cj (approximate equivalence)
• Stop if T > dj (extremely high)
• Otherwise continue (with possible adaptive modification of analysis schedule, sample size, etc.)
– Boundaries for modification of sampling plan
99
Sequential Sampling IssuesSequential Sampling Issues
– Design stage• Choosing sampling plan which satisfies desired operating
characteristics– E.g., type I error, power, sample size requirements
– Monitoring stage• Flexible implementation of the stopping rule to account for
assumptions made at design stage– E.g., adjust sample size to account for observed
variance
– Analysis stage• Providing inference based on true sampling distribution of
test statistics
– Design stage• Choosing sampling plan which satisfies desired operating
characteristics– E.g., type I error, power, sample size requirements
– Monitoring stage• Flexible implementation of the stopping rule to account for
assumptions made at design stage– E.g., adjust sample size to account for observed
variance
– Analysis stage• Providing inference based on true sampling distribution of
test statistics
1010
Sequential Sampling StrategiesSequential Sampling Strategies
Two broad categories of sequential sampling– Prespecified stopping guidelines
– Adaptive procedures
Two broad categories of sequential sampling– Prespecified stopping guidelines
– Adaptive procedures
1111
Prespecified Stopping PlansPrespecified Stopping Plans
Prior to collection of data, specify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Rule for determining maximal statistical information
• E.g., fix power, maximal sample size, or calendar time
– Randomization scheme– Rule for determining schedule of analyses
• E.g., according to sample size, statistical information, or calendar time
– Rule for determining conditions for early stopping• E.g., boundary shape function for stopping boundaries on the
scale of some test statistic
Prior to collection of data, specify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Rule for determining maximal statistical information
• E.g., fix power, maximal sample size, or calendar time
– Randomization scheme– Rule for determining schedule of analyses
• E.g., according to sample size, statistical information, or calendar time
– Rule for determining conditions for early stopping• E.g., boundary shape function for stopping boundaries on the
scale of some test statistic
1212
Adaptive Sampling PlansAdaptive Sampling Plans
At each interim analysis, possibly modify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping
At each interim analysis, possibly modify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping
1313
Adaptive Sampling: ExamplesAdaptive Sampling: Examples
– E.g., Modify sample size to account for estimated information (variance or baseline rates)
• No effect on type I error IF– Estimated information independent of estimate of
treatment effect» Proportional hazards,» Normal data, and/or» Carefully phrased alternatives
– And willing to use conditional inference» Carefully phrased alternatives
– E.g., Modify sample size to account for estimated information (variance or baseline rates)
• No effect on type I error IF– Estimated information independent of estimate of
treatment effect» Proportional hazards,» Normal data, and/or» Carefully phrased alternatives
– And willing to use conditional inference» Carefully phrased alternatives
1414
Estimation of Statistical InformationEstimation of Statistical InformationIf maximal sample size is maintained, the study
discriminates between null hypothesis and an alternative measured in units of statistical information
If maximal sample size is maintained, the study discriminates between null hypothesis and an alternative measured in units of statistical information
V
nV
n2
01
21
201
21
)()(
1515
Estimation of Statistical InformationEstimation of Statistical InformationIf statistical power is maintained, the study sample
size is measured in units of statistical information
If statistical power is maintained, the study sample size is measured in units of statistical information
201
21
201
21
)()(
V
nVn
1616
Adaptive Sampling: ExamplesAdaptive Sampling: Examples
– E.g., Proschan & Hunsberger (1995)• Modify ultimate sample size based on conditional power
– Computed under current best estimate (if high enough)• Make adjustment to inference to maintain Type I error
– E.g., Self-designing Trial (Fisher, 1998)• Combine arbitrary test statistics from sequential groups• Prespecify weighting of groups “just in time”
– Specified at immediately preceding analysis• Fisher’s test statistic is N(0,1) under the null hypothesis of no
treatment difference on any of the endpoints tested
– E.g., Randomized Play the Winner• Biased coin favors currently best performing treatment
– E.g., Proschan & Hunsberger (1995)• Modify ultimate sample size based on conditional power
– Computed under current best estimate (if high enough)• Make adjustment to inference to maintain Type I error
– E.g., Self-designing Trial (Fisher, 1998)• Combine arbitrary test statistics from sequential groups• Prespecify weighting of groups “just in time”
– Specified at immediately preceding analysis• Fisher’s test statistic is N(0,1) under the null hypothesis of no
treatment difference on any of the endpoints tested
– E.g., Randomized Play the Winner• Biased coin favors currently best performing treatment
1717
Motivation for Adaptive DesignsMotivation for Adaptive Designs
Scientific and statistical hypotheses of interest– Modify target population, intervention, measurement
of outcome, alternative hypotheses of interest– Possible justification
• Changing conditions in medical environment– Approval/withdrawal of competing/ancillary treatments– Diagnostic procedures
• New knowledge from other trials about similar treatments• Evidence from ongoing trial
– Toxicity profile (therapeutic index)– Subgroup effects
Scientific and statistical hypotheses of interest– Modify target population, intervention, measurement
of outcome, alternative hypotheses of interest– Possible justification
• Changing conditions in medical environment– Approval/withdrawal of competing/ancillary treatments– Diagnostic procedures
• New knowledge from other trials about similar treatments• Evidence from ongoing trial
– Toxicity profile (therapeutic index)– Subgroup effects
1818
Motivation for Adaptive DesignsMotivation for Adaptive Designs
Modification of other design parameters may have great impact on the hypotheses considered– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping
Modification of other design parameters may have great impact on the hypotheses considered– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping
1919
Prespecified vs AdaptivePrespecified vs Adaptive
Major issues with use of adaptive designs– What do we truly gain?
• Can proper evaluation of trial designs obviate need?
– What can we lose?• Efficiency? (and how should it be measured?)• Scientific inference?
– Science vs Statistics vs Game theory – Definition of scientific/statistical hypotheses– Quantifying precision of inference
Major issues with use of adaptive designs– What do we truly gain?
• Can proper evaluation of trial designs obviate need?
– What can we lose?• Efficiency? (and how should it be measured?)• Scientific inference?
– Science vs Statistics vs Game theory – Definition of scientific/statistical hypotheses– Quantifying precision of inference
2020
Major Issue: Frequentist InferenceMajor Issue: Frequentist Inference
Frequentist inference is still the most commonly used form of quantifying statistical strength of evidence– Estimates which minimize bias, MSE– Confidence intervals– P values; type I, II errors
Frequentist inference depends on sampling distribution
Frequentist inference is still the most commonly used form of quantifying statistical strength of evidence– Estimates which minimize bias, MSE– Confidence intervals– P values; type I, II errors
Frequentist inference depends on sampling distribution
2121
Prespecified Sampling PlanPrespecified Sampling Plan
– Perform analyses when sample sizes N1. . . NJ
• Can be randomly determined
– At each analysis choose stopping boundaries• aj < bj < cj < dj
– Compute test statistic T(X1. . . XNj)
• Stop if T < aj (extremely low)
• Stop if bj < T < cj (approximate equivalence)
• Stop if T > dj (extremely high)
• Otherwise continue as prespecified
– Perform analyses when sample sizes N1. . . NJ
• Can be randomly determined
– At each analysis choose stopping boundaries• aj < bj < cj < dj
– Compute test statistic T(X1. . . XNj)
• Stop if T < aj (extremely low)
• Stop if bj < T < cj (approximate equivalence)
• Stop if T > dj (extremely high)
• Otherwise continue as prespecified
2222
Boundary ScalesBoundary Scales
– Stopping rule for one test statistic is easily transformed to a rule for another statistic
• “Group sequential stopping rules”– Sum of observations– Point estimate of treatment effect– Normalized (Z) statistic– Fixed sample P value– Error spending function
• Conditional probability• Predictive probability• Bayesian posterior probability
– Stopping rule for one test statistic is easily transformed to a rule for another statistic
• “Group sequential stopping rules”– Sum of observations– Point estimate of treatment effect– Normalized (Z) statistic– Fixed sample P value– Error spending function
• Conditional probability• Predictive probability• Bayesian posterior probability
2323
Unified Family: MLE ScaleUnified Family: MLE Scale
Boundary shape function unifying families of stopping rules (Kittelson & Emerson, 1999) – Wang & Tsiatis (1987) based families (R=0, A=0)
• P=1: O’Brien & Fleming (1979); P= 0.5: Pocock (1977)• Emerson & Fleming (1989); Pampallona & Tsiatis (1994)
– Triangular test (Whitehead, 1983): (P=1, R=0, A=1)– Seq cond probability ratio test (Xiong & Tan, 1994)– Some boundaries constant on conditional or
predictive power– Extensions: Peto-Haybittle (using Burington &
Emerson, 2003)
Boundary shape function unifying families of stopping rules (Kittelson & Emerson, 1999) – Wang & Tsiatis (1987) based families (R=0, A=0)
• P=1: O’Brien & Fleming (1979); P= 0.5: Pocock (1977)• Emerson & Fleming (1989); Pampallona & Tsiatis (1994)
– Triangular test (Whitehead, 1983): (P=1, R=0, A=1)– Seq cond probability ratio test (Xiong & Tan, 1994)– Some boundaries constant on conditional or
predictive power– Extensions: Peto-Haybittle (using Burington &
Emerson, 2003)
2424
Spectrum of Conditions for Early StoppingSpectrum of Conditions for Early Stopping
– Down columns: Early stopping vs no early stopping– Across rows: One-sided vs two-sided decisions
– Down columns: Early stopping vs no early stopping– Across rows: One-sided vs two-sided decisions
2525
Spectrum of Boundary Shape FunctionsSpectrum of Boundary Shape FunctionsA wide variety of boundary shapes possible
– All of the rules depicted have the same type I error and power to detect the design alternative
A wide variety of boundary shapes possible– All of the rules depicted have the same type I error
and power to detect the design alternative
2626
Boundary ScalesBoundary Scales
Conditional Probability Scale:– Threshold at final analysis from unified family– Hypothesized value of mean
Conditional Probability Scale:– Threshold at final analysis from unified family– Hypothesized value of mean
jJ
jJXJ
jJXJJXj
NN
xNtN
XtXtC
j
**
*;*
1
|Pr,
JXt
*
2727
Boundary ScalesBoundary Scales
Predictive Probability Scale:– Prior distribution
Predictive Probability Scale:– Prior distribution 2,~ N
2222
222
1
|,|Pr
jJjJ
jjJjjXjJ
jjjXjjXj
NNNN
xNNxtNN
dXXtXtH
2828
Boundary ScalesBoundary Scales
Bayesian Posterior Scale:– Prior
Bayesian Posterior Scale:– Prior
22
2222*
,,1**
1
|Pr
j
jjj
N
N
xNN
XXBjj
2,~ N
2929
Major Issue: Frequentist InferenceMajor Issue: Frequentist InferenceFrequentist operating characteristics are based on
the sampling distribution– Stopping rules do affect the sampling distribution of
the usual statistics • MLEs are not normally distributed• Z scores are not standard normal under the null
– (1.96 is irrelevant)• The null distribution of fixed sample P values is not uniform
– (They are not true P values)
Frequentist operating characteristics are based on the sampling distribution– Stopping rules do affect the sampling distribution of
the usual statistics • MLEs are not normally distributed• Z scores are not standard normal under the null
– (1.96 is irrelevant)• The null distribution of fixed sample P values is not uniform
– (They are not true P values)
3030
Sequential Sampling: The PriceSequential Sampling: The Price
It is only through full knowledge of the sampling plan that we can assess the full complement of frequentist operating characteristics– In order to obtain inference with maximal precision
and minimal bias, the sampling plan must be well quantified
– (Note that adaptive designs using ancillary statistics pose no special problems if we condition on those ancillary statistics.)
It is only through full knowledge of the sampling plan that we can assess the full complement of frequentist operating characteristics– In order to obtain inference with maximal precision
and minimal bias, the sampling plan must be well quantified
– (Note that adaptive designs using ancillary statistics pose no special problems if we condition on those ancillary statistics.)
3131
Sampling Distribution of EstimatesSampling Distribution of Estimates
3232
Sampling Distribution of EstimatesSampling Distribution of Estimates
3333
Sampling Distributions with Stopping RulesSampling Distributions with Stopping Rules
3434
Sampling DistributionSampling Distribution
For any known stopping rule, however, we can compute the correct sampling distribution with specialized software– From the computed sampling distributions we then
compute• Bias adjusted estimates• Correct (adjusted) confidence intervals• Correct (adjusted) P values
– Candidate designs can then be compared with respect to their operating characteristics
For any known stopping rule, however, we can compute the correct sampling distribution with specialized software– From the computed sampling distributions we then
compute• Bias adjusted estimates• Correct (adjusted) confidence intervals• Correct (adjusted) P values
– Candidate designs can then be compared with respect to their operating characteristics
3535
Evaluation of DesignsEvaluation of Designs
Process of choosing a trial design– Define candidate design
• Usually constrain two operating characteristics– Type I error, power at design alternative– Type I error, maximal sample size
– Evaluate other operating characteristics• Different criteria of interest to different investigators
– Modify design– Iterate
Process of choosing a trial design– Define candidate design
• Usually constrain two operating characteristics– Type I error, power at design alternative– Type I error, maximal sample size
– Evaluate other operating characteristics• Different criteria of interest to different investigators
– Modify design– Iterate
3636
Operating CharacteristicsOperating Characteristics
The same regardless of the type of stopping rule – Frequentist power curve
• Type I error (null) and power (design alternative)
– Sample size requirements• Maximum, average, median, other quantiles• Stopping probabilities
– Inference at study termination (at each boundary)• Frequentist or Bayesian (under spectrum of priors)
– Futility measures• Conditional power, predictive power
The same regardless of the type of stopping rule – Frequentist power curve
• Type I error (null) and power (design alternative)
– Sample size requirements• Maximum, average, median, other quantiles• Stopping probabilities
– Inference at study termination (at each boundary)• Frequentist or Bayesian (under spectrum of priors)
– Futility measures• Conditional power, predictive power
3737
At Design StageAt Design Stage
In particular, at design stage we can know – Conditions under which trial will continue at each
analysis• Estimates, inference, conditional and predictive power
– Tradeoffs between early stopping and loss in unconditional power
In particular, at design stage we can know – Conditions under which trial will continue at each
analysis• Estimates, inference, conditional and predictive power
– Tradeoffs between early stopping and loss in unconditional power
3838
Frequentist InferenceFrequentist InferenceO'Brien-Fleming Pocock
N MLEBias AdjEstimate 95% CI P val MLE
Bias AdjEstimate 95% CI P val
Efficacy
425 -0.171 -0.163 (-0.224, -0.087) 0.000 -0.099 -0.089 (-0.152, -0.015) 0.010
850 -0.086 -0.080 (-0.130, -0.025) 0.002 -0.070 -0.065 (-0.114, -0.004) 0.018
1275 -0.057 -0.054 (-0.096, -0.007) 0.012 -0.057 -0.055 (-0.101, -0.001) 0.023
1700 -0.043 -0.043 (-0.086, 0.000) 0.025 -0.050 -0.050 (-0.099, 0.000) 0.025
Futility
425 0.086 0.077 (0.001, 0.139) 0.977 0.000 -0.010 (-0.084, 0.053) 0.371
850 0.000 -0.006 (-0.061, 0.044) 0.401 -0.029 -0.035 (-0.095, 0.014) 0.078
1275 -0.029 -0.031 (-0.079, 0.010) 0.067 -0.042 -0.044 (-0.098, 0.002) 0.029
1700 -0.043 -0.043 (-0.086, 0.000) 0.025 -0.050 -0.050 (-0.099, 0.000) 0.025
3939
Efficiency / Unconditional PowerEfficiency / Unconditional Power
Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size
Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size
4040
Stochastic CurtailmentStochastic Curtailment
Boundaries transformed to conditional or predictive power– Key issue: Computations are based on assumptions
about the true treatment effect• Conditional power
– “Design”: based on hypotheses– “Estimate”: based on current estimates
• Predictive power– “Prior assumptions”
Boundaries transformed to conditional or predictive power– Key issue: Computations are based on assumptions
about the true treatment effect• Conditional power
– “Design”: based on hypotheses– “Estimate”: based on current estimates
• Predictive power– “Prior assumptions”
4141
Conditional/Predictive PowerConditional/Predictive Power
Symmetric O’Brien-Fleming O’Brien-Fleming Efficacy, P=0.8 Futility
Conditional Power Predictive Power Conditional Power Predictive Power
N MLE Design Estimate Sponsor Noninf MLE Design Estimate Sponsor Noninf
Efficacy (rejects 0.00) Efficacy (rejects 0.00)
425 -0.171 0.500 0.000 0.002 0.000 -0.170 0.500 0.000 0.002 0.000
850 -0.085 0.500 0.002 0.015 0.023 -0.085 0.500 0.002 0.015 0.023
1275 -0.057 0.500 0.091 0.077 0.124 -0.057 0.500 0.093 0.077 0.126
Futility (rejects -0.0855) Futility (rejects -0.0866)
425 0.085 0.500 0.000 0.077 0.000 0.047 0.719 0.000 0.222 0.008
850 0.000 0.500 0.002 0.143 0.023 -0.010 0.648 0.015 0.247 0.063
1275 -0.028 0.500 0.091 0.241 0.124 -0.031 0.592 0.142 0.312 0.177
4242
Efficiency / Unconditional PowerEfficiency / Unconditional Power
Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size
Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size
4343
Key IssuesKey Issues
Very different probabilities based on assumptions about the true treatment effect– Extremely conservative O’Brien-Fleming boundaries
correspond to conditional power of 50% (!) under alternative rejected by the boundary
– Resolution of apparent paradox: if the alternative were true, there is less than .0001 probability of stopping for futility at the first analysis
Very different probabilities based on assumptions about the true treatment effect– Extremely conservative O’Brien-Fleming boundaries
correspond to conditional power of 50% (!) under alternative rejected by the boundary
– Resolution of apparent paradox: if the alternative were true, there is less than .0001 probability of stopping for futility at the first analysis
4444
Further CommentsFurther Comments
Neither conditional power nor predictive power have good foundational motivation– Frequentists should use Neyman-Pearson paradigm
and consider optimal unconditional power across alternatives
• And conditional/predictive power is not a good indicator in loss of unconditional power
– Bayesians should use posterior distributions for decisions
Neither conditional power nor predictive power have good foundational motivation– Frequentists should use Neyman-Pearson paradigm
and consider optimal unconditional power across alternatives
• And conditional/predictive power is not a good indicator in loss of unconditional power
– Bayesians should use posterior distributions for decisions
4545
Fully Adaptive SamplingFully Adaptive Sampling
What is the cost of planning not to plan?– In order to provide frequentist estimation, we must
know the rule used to modify the clinical trial• Hypothesis testing of a null is possible with fully adaptive trials
– Statistics: type I error is controlled– Game theory: chance of “winning” with completely
ineffective therapy is controlled– Science:
» At best: ability to discriminate clinically relevant hypothesis may be impaired
» At worst: uncertainty as to what the treatment has effect on
What is the cost of planning not to plan?– In order to provide frequentist estimation, we must
know the rule used to modify the clinical trial• Hypothesis testing of a null is possible with fully adaptive trials
– Statistics: type I error is controlled– Game theory: chance of “winning” with completely
ineffective therapy is controlled– Science:
» At best: ability to discriminate clinically relevant hypothesis may be impaired
» At worst: uncertainty as to what the treatment has effect on
4646
Prespecified Modification RulesPrespecified Modification Rules
Adaptive sampling plans exact a price in statistical efficiency– Tsiatis & Mehta (2002)
• A classic prespecified group sequential stopping rule can be found that is more efficient than a given adaptive design
– Shi & Emerson (2003)• Fisher’s test statistic in the self-designing trial provides
markedly less precise inference than that based on the MLE– To compute the sampling distribution of the latter, the
sampling plan must be known
Adaptive sampling plans exact a price in statistical efficiency– Tsiatis & Mehta (2002)
• A classic prespecified group sequential stopping rule can be found that is more efficient than a given adaptive design
– Shi & Emerson (2003)• Fisher’s test statistic in the self-designing trial provides
markedly less precise inference than that based on the MLE– To compute the sampling distribution of the latter, the
sampling plan must be known
4747
Conditional/Predictive PowerConditional/Predictive Power
Additional issues with maintaining conditional or predictive power– Modification of sample size may allow precise
knowledge of interim treatment effect• Interim estimates may cause change in study population
– Time trends due to investigators gaining or losing enthusiasm
• In extreme cases, potential for unblinding of individual patients
– Effect of outliers on test statistics
Additional issues with maintaining conditional or predictive power– Modification of sample size may allow precise
knowledge of interim treatment effect• Interim estimates may cause change in study population
– Time trends due to investigators gaining or losing enthusiasm
• In extreme cases, potential for unblinding of individual patients
– Effect of outliers on test statistics
4848
Self-Designing TrialSelf-Designing Trial
Additional issues with Self-Designing Trial– The self-designing trial requires pre-specification of
the analysis at which the trial stops• Trial stops when all of the remaining weight is to be applied
at the current analysis, as specified at the previous analysis
Additional issues with Self-Designing Trial– The self-designing trial requires pre-specification of
the analysis at which the trial stops• Trial stops when all of the remaining weight is to be applied
at the current analysis, as specified at the previous analysis
4949
Randomized Play the WinnerRandomized Play the Winner
Additional issues with Randomized Play the Winner– For a fixed total sample size, greatest efficiency is
obtained when ratio of sample sizes is equal to ratio of statistical information from arms
• Constant ratio of standard deviation of observations to sample size
– (Of course, PTW is designed to minimize the number of subjects receiving an inferior treatment, which may be a greater cost in total patients and time)
Additional issues with Randomized Play the Winner– For a fixed total sample size, greatest efficiency is
obtained when ratio of sample sizes is equal to ratio of statistical information from arms
• Constant ratio of standard deviation of observations to sample size
– (Of course, PTW is designed to minimize the number of subjects receiving an inferior treatment, which may be a greater cost in total patients and time)
5050
Final CommentsFinal Comments
Adaptive designs versus prespecified stopping rules– Adaptive designs come at a price of efficiency and
(sometimes) scientific interpretation
– With adequate tools for careful evaluation of designs, there is little need for adaptive designs
Adaptive designs versus prespecified stopping rules– Adaptive designs come at a price of efficiency and
(sometimes) scientific interpretation
– With adequate tools for careful evaluation of designs, there is little need for adaptive designs
5151
Bottom LineBottom Line
You better think (think)
about what you’re
trying to do…
-Aretha Franklin
You better think (think)
about what you’re
trying to do…
-Aretha Franklin