51
1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington 02, 2003, 2004 Scott S. Emerson, M.D., Ph.D.

1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

Embed Size (px)

Citation preview

Page 1: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

11

Issues in the Use of Adaptive Clinical Trial Designs

Issues in the Use of Adaptive Clinical Trial Designs

Scott S. Emerson, M.D., Ph.D.Professor of Biostatistics University of Washington

Scott S. Emerson, M.D., Ph.D.Professor of Biostatistics University of Washington

© 2002, 2003, 2004 Scott S. Emerson, M.D., Ph.D.© 2002, 2003, 2004 Scott S. Emerson, M.D., Ph.D.

Page 2: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

22

Clinical TrialsClinical Trials

Experimentation in human volunteers– Investigates a new treatment/preventive agent

• Safety: » Are there adverse effects that clearly outweigh any

potential benefit?• Efficacy:

» Can the treatment alter the disease process in a beneficial way?

• Effectiveness: » Would adoption of the treatment as a standard affect

morbidity / mortality in the population?

Experimentation in human volunteers– Investigates a new treatment/preventive agent

• Safety: » Are there adverse effects that clearly outweigh any

potential benefit?• Efficacy:

» Can the treatment alter the disease process in a beneficial way?

• Effectiveness: » Would adoption of the treatment as a standard affect

morbidity / mortality in the population?

Page 3: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

33

Clinical Trial DesignClinical Trial Design

Finding an approach that best addresses the often competing goals: Science, Ethics, Efficiency

• Basic scientists: focus on mechanisms• Clinical scientists: focus on overall patient health• Ethical: focus on patients on trial, future patients• Economic: focus on profits and/or costs• Governmental: focus on validity of marketing claims• Statistical: focus on questions answered precisely • Operational: focus on feasibility of mounting trial

Finding an approach that best addresses the often competing goals: Science, Ethics, Efficiency

• Basic scientists: focus on mechanisms• Clinical scientists: focus on overall patient health• Ethical: focus on patients on trial, future patients• Economic: focus on profits and/or costs• Governmental: focus on validity of marketing claims• Statistical: focus on questions answered precisely • Operational: focus on feasibility of mounting trial

Page 4: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

44

Statistical PlanningStatistical Planning

Ensure that the trial will satisfy the various collaborators as much as possible

• Discriminate between relevant scientific hypotheses– Scientific and statistical credibility

• Protect economic interests of sponsor– Efficient designs– Economically important estimates

• Protect interests of patients on trial– Stop if unsafe or unethical– Stop when credible decision can be made

• Promote rapid discovery of new beneficial treatments

Ensure that the trial will satisfy the various collaborators as much as possible

• Discriminate between relevant scientific hypotheses– Scientific and statistical credibility

• Protect economic interests of sponsor– Efficient designs– Economically important estimates

• Protect interests of patients on trial– Stop if unsafe or unethical– Stop when credible decision can be made

• Promote rapid discovery of new beneficial treatments

Page 5: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

55

Refine Scientific HypothesesRefine Scientific Hypotheses

– Target population• Inclusion, exclusion, important subgroups

– Intervention• Dose, administration (intention to treat)

– Measurement of outcome(s)• Efficacy/effectiveness, toxicity

– Statistical hypotheses in terms of some summary measure of outcome distribution

• Mean, geometric mean, median, odds, hazard, etc.

– Criteria for statistical credibility• Frequentist (type I, II errors) or Bayesian

– Target population• Inclusion, exclusion, important subgroups

– Intervention• Dose, administration (intention to treat)

– Measurement of outcome(s)• Efficacy/effectiveness, toxicity

– Statistical hypotheses in terms of some summary measure of outcome distribution

• Mean, geometric mean, median, odds, hazard, etc.

– Criteria for statistical credibility• Frequentist (type I, II errors) or Bayesian

Page 6: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

66

Statistics to Address VariabilityStatistics to Address Variability

At the end of the study:– Frequentist and/or Bayesian data analysis to assess

the credibility of clinical trial results• Estimate of the treatment effect

– Single best estimate– Precision of estimates

• Decision for or against hypotheses– Binary decision– Quantification of strength of evidence

At the end of the study:– Frequentist and/or Bayesian data analysis to assess

the credibility of clinical trial results• Estimate of the treatment effect

– Single best estimate– Precision of estimates

• Decision for or against hypotheses– Binary decision– Quantification of strength of evidence

Page 7: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

77

Statistical Sampling PlanStatistical Sampling Plan

Ethical and efficiency concerns are addressed through sequential sampling

• During the conduct of the study, data are analyzed at periodic intervals and reviewed by the DMC

• Using interim estimates of treatment effect– Decide whether to continue the trial– If continuing, decide on any modifications to

» scientific / statistical hypotheses and/or» sampling scheme

Ethical and efficiency concerns are addressed through sequential sampling

• During the conduct of the study, data are analyzed at periodic intervals and reviewed by the DMC

• Using interim estimates of treatment effect– Decide whether to continue the trial– If continuing, decide on any modifications to

» scientific / statistical hypotheses and/or» sampling scheme

Page 8: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

88

Sampling Plan: General ApproachSampling Plan: General Approach

– Perform analyses when sample sizes N1. . . NJ

• Can be randomly determined

– At each analysis choose stopping boundaries• aj < bj < cj < dj

– Compute test statistic T(X1. . . XNj)

• Stop if T < aj (extremely low)

• Stop if bj < T < cj (approximate equivalence)

• Stop if T > dj (extremely high)

• Otherwise continue (with possible adaptive modification of analysis schedule, sample size, etc.)

– Boundaries for modification of sampling plan

– Perform analyses when sample sizes N1. . . NJ

• Can be randomly determined

– At each analysis choose stopping boundaries• aj < bj < cj < dj

– Compute test statistic T(X1. . . XNj)

• Stop if T < aj (extremely low)

• Stop if bj < T < cj (approximate equivalence)

• Stop if T > dj (extremely high)

• Otherwise continue (with possible adaptive modification of analysis schedule, sample size, etc.)

– Boundaries for modification of sampling plan

Page 9: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

99

Sequential Sampling IssuesSequential Sampling Issues

– Design stage• Choosing sampling plan which satisfies desired operating

characteristics– E.g., type I error, power, sample size requirements

– Monitoring stage• Flexible implementation of the stopping rule to account for

assumptions made at design stage– E.g., adjust sample size to account for observed

variance

– Analysis stage• Providing inference based on true sampling distribution of

test statistics

– Design stage• Choosing sampling plan which satisfies desired operating

characteristics– E.g., type I error, power, sample size requirements

– Monitoring stage• Flexible implementation of the stopping rule to account for

assumptions made at design stage– E.g., adjust sample size to account for observed

variance

– Analysis stage• Providing inference based on true sampling distribution of

test statistics

Page 10: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1010

Sequential Sampling StrategiesSequential Sampling Strategies

Two broad categories of sequential sampling– Prespecified stopping guidelines

– Adaptive procedures

Two broad categories of sequential sampling– Prespecified stopping guidelines

– Adaptive procedures

Page 11: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1111

Prespecified Stopping PlansPrespecified Stopping Plans

Prior to collection of data, specify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Rule for determining maximal statistical information

• E.g., fix power, maximal sample size, or calendar time

– Randomization scheme– Rule for determining schedule of analyses

• E.g., according to sample size, statistical information, or calendar time

– Rule for determining conditions for early stopping• E.g., boundary shape function for stopping boundaries on the

scale of some test statistic

Prior to collection of data, specify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Rule for determining maximal statistical information

• E.g., fix power, maximal sample size, or calendar time

– Randomization scheme– Rule for determining schedule of analyses

• E.g., according to sample size, statistical information, or calendar time

– Rule for determining conditions for early stopping• E.g., boundary shape function for stopping boundaries on the

scale of some test statistic

Page 12: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1212

Adaptive Sampling PlansAdaptive Sampling Plans

At each interim analysis, possibly modify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping

At each interim analysis, possibly modify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping

Page 13: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1313

Adaptive Sampling: ExamplesAdaptive Sampling: Examples

– E.g., Modify sample size to account for estimated information (variance or baseline rates)

• No effect on type I error IF– Estimated information independent of estimate of

treatment effect» Proportional hazards,» Normal data, and/or» Carefully phrased alternatives

– And willing to use conditional inference» Carefully phrased alternatives

– E.g., Modify sample size to account for estimated information (variance or baseline rates)

• No effect on type I error IF– Estimated information independent of estimate of

treatment effect» Proportional hazards,» Normal data, and/or» Carefully phrased alternatives

– And willing to use conditional inference» Carefully phrased alternatives

Page 14: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1414

Estimation of Statistical InformationEstimation of Statistical InformationIf maximal sample size is maintained, the study

discriminates between null hypothesis and an alternative measured in units of statistical information

If maximal sample size is maintained, the study discriminates between null hypothesis and an alternative measured in units of statistical information

V

nV

n2

01

21

201

21

)()(

Page 15: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1515

Estimation of Statistical InformationEstimation of Statistical InformationIf statistical power is maintained, the study sample

size is measured in units of statistical information

If statistical power is maintained, the study sample size is measured in units of statistical information

201

21

201

21

)()(

V

nVn

Page 16: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1616

Adaptive Sampling: ExamplesAdaptive Sampling: Examples

– E.g., Proschan & Hunsberger (1995)• Modify ultimate sample size based on conditional power

– Computed under current best estimate (if high enough)• Make adjustment to inference to maintain Type I error

– E.g., Self-designing Trial (Fisher, 1998)• Combine arbitrary test statistics from sequential groups• Prespecify weighting of groups “just in time”

– Specified at immediately preceding analysis• Fisher’s test statistic is N(0,1) under the null hypothesis of no

treatment difference on any of the endpoints tested

– E.g., Randomized Play the Winner• Biased coin favors currently best performing treatment

– E.g., Proschan & Hunsberger (1995)• Modify ultimate sample size based on conditional power

– Computed under current best estimate (if high enough)• Make adjustment to inference to maintain Type I error

– E.g., Self-designing Trial (Fisher, 1998)• Combine arbitrary test statistics from sequential groups• Prespecify weighting of groups “just in time”

– Specified at immediately preceding analysis• Fisher’s test statistic is N(0,1) under the null hypothesis of no

treatment difference on any of the endpoints tested

– E.g., Randomized Play the Winner• Biased coin favors currently best performing treatment

Page 17: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1717

Motivation for Adaptive DesignsMotivation for Adaptive Designs

Scientific and statistical hypotheses of interest– Modify target population, intervention, measurement

of outcome, alternative hypotheses of interest– Possible justification

• Changing conditions in medical environment– Approval/withdrawal of competing/ancillary treatments– Diagnostic procedures

• New knowledge from other trials about similar treatments• Evidence from ongoing trial

– Toxicity profile (therapeutic index)– Subgroup effects

Scientific and statistical hypotheses of interest– Modify target population, intervention, measurement

of outcome, alternative hypotheses of interest– Possible justification

• Changing conditions in medical environment– Approval/withdrawal of competing/ancillary treatments– Diagnostic procedures

• New knowledge from other trials about similar treatments• Evidence from ongoing trial

– Toxicity profile (therapeutic index)– Subgroup effects

Page 18: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1818

Motivation for Adaptive DesignsMotivation for Adaptive Designs

Modification of other design parameters may have great impact on the hypotheses considered– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping

Modification of other design parameters may have great impact on the hypotheses considered– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping

Page 19: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

1919

Prespecified vs AdaptivePrespecified vs Adaptive

Major issues with use of adaptive designs– What do we truly gain?

• Can proper evaluation of trial designs obviate need?

– What can we lose?• Efficiency? (and how should it be measured?)• Scientific inference?

– Science vs Statistics vs Game theory – Definition of scientific/statistical hypotheses– Quantifying precision of inference

Major issues with use of adaptive designs– What do we truly gain?

• Can proper evaluation of trial designs obviate need?

– What can we lose?• Efficiency? (and how should it be measured?)• Scientific inference?

– Science vs Statistics vs Game theory – Definition of scientific/statistical hypotheses– Quantifying precision of inference

Page 20: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2020

Major Issue: Frequentist InferenceMajor Issue: Frequentist Inference

Frequentist inference is still the most commonly used form of quantifying statistical strength of evidence– Estimates which minimize bias, MSE– Confidence intervals– P values; type I, II errors

Frequentist inference depends on sampling distribution

Frequentist inference is still the most commonly used form of quantifying statistical strength of evidence– Estimates which minimize bias, MSE– Confidence intervals– P values; type I, II errors

Frequentist inference depends on sampling distribution

Page 21: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2121

Prespecified Sampling PlanPrespecified Sampling Plan

– Perform analyses when sample sizes N1. . . NJ

• Can be randomly determined

– At each analysis choose stopping boundaries• aj < bj < cj < dj

– Compute test statistic T(X1. . . XNj)

• Stop if T < aj (extremely low)

• Stop if bj < T < cj (approximate equivalence)

• Stop if T > dj (extremely high)

• Otherwise continue as prespecified

– Perform analyses when sample sizes N1. . . NJ

• Can be randomly determined

– At each analysis choose stopping boundaries• aj < bj < cj < dj

– Compute test statistic T(X1. . . XNj)

• Stop if T < aj (extremely low)

• Stop if bj < T < cj (approximate equivalence)

• Stop if T > dj (extremely high)

• Otherwise continue as prespecified

Page 22: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2222

Boundary ScalesBoundary Scales

– Stopping rule for one test statistic is easily transformed to a rule for another statistic

• “Group sequential stopping rules”– Sum of observations– Point estimate of treatment effect– Normalized (Z) statistic– Fixed sample P value– Error spending function

• Conditional probability• Predictive probability• Bayesian posterior probability

– Stopping rule for one test statistic is easily transformed to a rule for another statistic

• “Group sequential stopping rules”– Sum of observations– Point estimate of treatment effect– Normalized (Z) statistic– Fixed sample P value– Error spending function

• Conditional probability• Predictive probability• Bayesian posterior probability

Page 23: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2323

Unified Family: MLE ScaleUnified Family: MLE Scale

Boundary shape function unifying families of stopping rules (Kittelson & Emerson, 1999) – Wang & Tsiatis (1987) based families (R=0, A=0)

• P=1: O’Brien & Fleming (1979); P= 0.5: Pocock (1977)• Emerson & Fleming (1989); Pampallona & Tsiatis (1994)

– Triangular test (Whitehead, 1983): (P=1, R=0, A=1)– Seq cond probability ratio test (Xiong & Tan, 1994)– Some boundaries constant on conditional or

predictive power– Extensions: Peto-Haybittle (using Burington &

Emerson, 2003)

Boundary shape function unifying families of stopping rules (Kittelson & Emerson, 1999) – Wang & Tsiatis (1987) based families (R=0, A=0)

• P=1: O’Brien & Fleming (1979); P= 0.5: Pocock (1977)• Emerson & Fleming (1989); Pampallona & Tsiatis (1994)

– Triangular test (Whitehead, 1983): (P=1, R=0, A=1)– Seq cond probability ratio test (Xiong & Tan, 1994)– Some boundaries constant on conditional or

predictive power– Extensions: Peto-Haybittle (using Burington &

Emerson, 2003)

Page 24: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2424

Spectrum of Conditions for Early StoppingSpectrum of Conditions for Early Stopping

– Down columns: Early stopping vs no early stopping– Across rows: One-sided vs two-sided decisions

– Down columns: Early stopping vs no early stopping– Across rows: One-sided vs two-sided decisions

Page 25: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2525

Spectrum of Boundary Shape FunctionsSpectrum of Boundary Shape FunctionsA wide variety of boundary shapes possible

– All of the rules depicted have the same type I error and power to detect the design alternative

A wide variety of boundary shapes possible– All of the rules depicted have the same type I error

and power to detect the design alternative

Page 26: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2626

Boundary ScalesBoundary Scales

Conditional Probability Scale:– Threshold at final analysis from unified family– Hypothesized value of mean

Conditional Probability Scale:– Threshold at final analysis from unified family– Hypothesized value of mean

jJ

jJXJ

jJXJJXj

NN

xNtN

XtXtC

j

**

*;*

1

|Pr,

JXt

*

Page 27: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2727

Boundary ScalesBoundary Scales

Predictive Probability Scale:– Prior distribution

Predictive Probability Scale:– Prior distribution 2,~ N

2222

222

1

|,|Pr

jJjJ

jjJjjXjJ

jjjXjjXj

NNNN

xNNxtNN

dXXtXtH

Page 28: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2828

Boundary ScalesBoundary Scales

Bayesian Posterior Scale:– Prior

Bayesian Posterior Scale:– Prior

22

2222*

,,1**

1

|Pr

j

jjj

N

N

xNN

XXBjj

2,~ N

Page 29: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

2929

Major Issue: Frequentist InferenceMajor Issue: Frequentist InferenceFrequentist operating characteristics are based on

the sampling distribution– Stopping rules do affect the sampling distribution of

the usual statistics • MLEs are not normally distributed• Z scores are not standard normal under the null

– (1.96 is irrelevant)• The null distribution of fixed sample P values is not uniform

– (They are not true P values)

Frequentist operating characteristics are based on the sampling distribution– Stopping rules do affect the sampling distribution of

the usual statistics • MLEs are not normally distributed• Z scores are not standard normal under the null

– (1.96 is irrelevant)• The null distribution of fixed sample P values is not uniform

– (They are not true P values)

Page 30: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3030

Sequential Sampling: The PriceSequential Sampling: The Price

It is only through full knowledge of the sampling plan that we can assess the full complement of frequentist operating characteristics– In order to obtain inference with maximal precision

and minimal bias, the sampling plan must be well quantified

– (Note that adaptive designs using ancillary statistics pose no special problems if we condition on those ancillary statistics.)

It is only through full knowledge of the sampling plan that we can assess the full complement of frequentist operating characteristics– In order to obtain inference with maximal precision

and minimal bias, the sampling plan must be well quantified

– (Note that adaptive designs using ancillary statistics pose no special problems if we condition on those ancillary statistics.)

Page 31: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3131

Sampling Distribution of EstimatesSampling Distribution of Estimates

Page 32: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3232

Sampling Distribution of EstimatesSampling Distribution of Estimates

Page 33: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3333

Sampling Distributions with Stopping RulesSampling Distributions with Stopping Rules

Page 34: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3434

Sampling DistributionSampling Distribution

For any known stopping rule, however, we can compute the correct sampling distribution with specialized software– From the computed sampling distributions we then

compute• Bias adjusted estimates• Correct (adjusted) confidence intervals• Correct (adjusted) P values

– Candidate designs can then be compared with respect to their operating characteristics

For any known stopping rule, however, we can compute the correct sampling distribution with specialized software– From the computed sampling distributions we then

compute• Bias adjusted estimates• Correct (adjusted) confidence intervals• Correct (adjusted) P values

– Candidate designs can then be compared with respect to their operating characteristics

Page 35: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3535

Evaluation of DesignsEvaluation of Designs

Process of choosing a trial design– Define candidate design

• Usually constrain two operating characteristics– Type I error, power at design alternative– Type I error, maximal sample size

– Evaluate other operating characteristics• Different criteria of interest to different investigators

– Modify design– Iterate

Process of choosing a trial design– Define candidate design

• Usually constrain two operating characteristics– Type I error, power at design alternative– Type I error, maximal sample size

– Evaluate other operating characteristics• Different criteria of interest to different investigators

– Modify design– Iterate

Page 36: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3636

Operating CharacteristicsOperating Characteristics

The same regardless of the type of stopping rule – Frequentist power curve

• Type I error (null) and power (design alternative)

– Sample size requirements• Maximum, average, median, other quantiles• Stopping probabilities

– Inference at study termination (at each boundary)• Frequentist or Bayesian (under spectrum of priors)

– Futility measures• Conditional power, predictive power

The same regardless of the type of stopping rule – Frequentist power curve

• Type I error (null) and power (design alternative)

– Sample size requirements• Maximum, average, median, other quantiles• Stopping probabilities

– Inference at study termination (at each boundary)• Frequentist or Bayesian (under spectrum of priors)

– Futility measures• Conditional power, predictive power

Page 37: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3737

At Design StageAt Design Stage

In particular, at design stage we can know – Conditions under which trial will continue at each

analysis• Estimates, inference, conditional and predictive power

– Tradeoffs between early stopping and loss in unconditional power

In particular, at design stage we can know – Conditions under which trial will continue at each

analysis• Estimates, inference, conditional and predictive power

– Tradeoffs between early stopping and loss in unconditional power

Page 38: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3838

Frequentist InferenceFrequentist InferenceO'Brien-Fleming Pocock

N MLEBias AdjEstimate 95% CI P val MLE

Bias AdjEstimate 95% CI P val

Efficacy

425 -0.171 -0.163 (-0.224, -0.087) 0.000 -0.099 -0.089 (-0.152, -0.015) 0.010

850 -0.086 -0.080 (-0.130, -0.025) 0.002 -0.070 -0.065 (-0.114, -0.004) 0.018

1275 -0.057 -0.054 (-0.096, -0.007) 0.012 -0.057 -0.055 (-0.101, -0.001) 0.023

1700 -0.043 -0.043 (-0.086, 0.000) 0.025 -0.050 -0.050 (-0.099, 0.000) 0.025

Futility

425 0.086 0.077 (0.001, 0.139) 0.977 0.000 -0.010 (-0.084, 0.053) 0.371

850 0.000 -0.006 (-0.061, 0.044) 0.401 -0.029 -0.035 (-0.095, 0.014) 0.078

1275 -0.029 -0.031 (-0.079, 0.010) 0.067 -0.042 -0.044 (-0.098, 0.002) 0.029

1700 -0.043 -0.043 (-0.086, 0.000) 0.025 -0.050 -0.050 (-0.099, 0.000) 0.025

Page 39: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

3939

Efficiency / Unconditional PowerEfficiency / Unconditional Power

Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size

Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size

Page 40: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4040

Stochastic CurtailmentStochastic Curtailment

Boundaries transformed to conditional or predictive power– Key issue: Computations are based on assumptions

about the true treatment effect• Conditional power

– “Design”: based on hypotheses– “Estimate”: based on current estimates

• Predictive power– “Prior assumptions”

Boundaries transformed to conditional or predictive power– Key issue: Computations are based on assumptions

about the true treatment effect• Conditional power

– “Design”: based on hypotheses– “Estimate”: based on current estimates

• Predictive power– “Prior assumptions”

Page 41: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4141

Conditional/Predictive PowerConditional/Predictive Power

Symmetric O’Brien-Fleming O’Brien-Fleming Efficacy, P=0.8 Futility

Conditional Power Predictive Power Conditional Power Predictive Power

N MLE Design Estimate Sponsor Noninf MLE Design Estimate Sponsor Noninf

Efficacy (rejects 0.00) Efficacy (rejects 0.00)

425 -0.171 0.500 0.000 0.002 0.000 -0.170 0.500 0.000 0.002 0.000

850 -0.085 0.500 0.002 0.015 0.023 -0.085 0.500 0.002 0.015 0.023

1275 -0.057 0.500 0.091 0.077 0.124 -0.057 0.500 0.093 0.077 0.126

Futility (rejects -0.0855) Futility (rejects -0.0866)

425 0.085 0.500 0.000 0.077 0.000 0.047 0.719 0.000 0.222 0.008

850 0.000 0.500 0.002 0.143 0.023 -0.010 0.648 0.015 0.247 0.063

1275 -0.028 0.500 0.091 0.241 0.124 -0.031 0.592 0.142 0.312 0.177

Page 42: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4242

Efficiency / Unconditional PowerEfficiency / Unconditional Power

Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size

Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size

Page 43: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4343

Key IssuesKey Issues

Very different probabilities based on assumptions about the true treatment effect– Extremely conservative O’Brien-Fleming boundaries

correspond to conditional power of 50% (!) under alternative rejected by the boundary

– Resolution of apparent paradox: if the alternative were true, there is less than .0001 probability of stopping for futility at the first analysis

Very different probabilities based on assumptions about the true treatment effect– Extremely conservative O’Brien-Fleming boundaries

correspond to conditional power of 50% (!) under alternative rejected by the boundary

– Resolution of apparent paradox: if the alternative were true, there is less than .0001 probability of stopping for futility at the first analysis

Page 44: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4444

Further CommentsFurther Comments

Neither conditional power nor predictive power have good foundational motivation– Frequentists should use Neyman-Pearson paradigm

and consider optimal unconditional power across alternatives

• And conditional/predictive power is not a good indicator in loss of unconditional power

– Bayesians should use posterior distributions for decisions

Neither conditional power nor predictive power have good foundational motivation– Frequentists should use Neyman-Pearson paradigm

and consider optimal unconditional power across alternatives

• And conditional/predictive power is not a good indicator in loss of unconditional power

– Bayesians should use posterior distributions for decisions

Page 45: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4545

Fully Adaptive SamplingFully Adaptive Sampling

What is the cost of planning not to plan?– In order to provide frequentist estimation, we must

know the rule used to modify the clinical trial• Hypothesis testing of a null is possible with fully adaptive trials

– Statistics: type I error is controlled– Game theory: chance of “winning” with completely

ineffective therapy is controlled– Science:

» At best: ability to discriminate clinically relevant hypothesis may be impaired

» At worst: uncertainty as to what the treatment has effect on

What is the cost of planning not to plan?– In order to provide frequentist estimation, we must

know the rule used to modify the clinical trial• Hypothesis testing of a null is possible with fully adaptive trials

– Statistics: type I error is controlled– Game theory: chance of “winning” with completely

ineffective therapy is controlled– Science:

» At best: ability to discriminate clinically relevant hypothesis may be impaired

» At worst: uncertainty as to what the treatment has effect on

Page 46: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4646

Prespecified Modification RulesPrespecified Modification Rules

Adaptive sampling plans exact a price in statistical efficiency– Tsiatis & Mehta (2002)

• A classic prespecified group sequential stopping rule can be found that is more efficient than a given adaptive design

– Shi & Emerson (2003)• Fisher’s test statistic in the self-designing trial provides

markedly less precise inference than that based on the MLE– To compute the sampling distribution of the latter, the

sampling plan must be known

Adaptive sampling plans exact a price in statistical efficiency– Tsiatis & Mehta (2002)

• A classic prespecified group sequential stopping rule can be found that is more efficient than a given adaptive design

– Shi & Emerson (2003)• Fisher’s test statistic in the self-designing trial provides

markedly less precise inference than that based on the MLE– To compute the sampling distribution of the latter, the

sampling plan must be known

Page 47: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4747

Conditional/Predictive PowerConditional/Predictive Power

Additional issues with maintaining conditional or predictive power– Modification of sample size may allow precise

knowledge of interim treatment effect• Interim estimates may cause change in study population

– Time trends due to investigators gaining or losing enthusiasm

• In extreme cases, potential for unblinding of individual patients

– Effect of outliers on test statistics

Additional issues with maintaining conditional or predictive power– Modification of sample size may allow precise

knowledge of interim treatment effect• Interim estimates may cause change in study population

– Time trends due to investigators gaining or losing enthusiasm

• In extreme cases, potential for unblinding of individual patients

– Effect of outliers on test statistics

Page 48: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4848

Self-Designing TrialSelf-Designing Trial

Additional issues with Self-Designing Trial– The self-designing trial requires pre-specification of

the analysis at which the trial stops• Trial stops when all of the remaining weight is to be applied

at the current analysis, as specified at the previous analysis

Additional issues with Self-Designing Trial– The self-designing trial requires pre-specification of

the analysis at which the trial stops• Trial stops when all of the remaining weight is to be applied

at the current analysis, as specified at the previous analysis

Page 49: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

4949

Randomized Play the WinnerRandomized Play the Winner

Additional issues with Randomized Play the Winner– For a fixed total sample size, greatest efficiency is

obtained when ratio of sample sizes is equal to ratio of statistical information from arms

• Constant ratio of standard deviation of observations to sample size

– (Of course, PTW is designed to minimize the number of subjects receiving an inferior treatment, which may be a greater cost in total patients and time)

Additional issues with Randomized Play the Winner– For a fixed total sample size, greatest efficiency is

obtained when ratio of sample sizes is equal to ratio of statistical information from arms

• Constant ratio of standard deviation of observations to sample size

– (Of course, PTW is designed to minimize the number of subjects receiving an inferior treatment, which may be a greater cost in total patients and time)

Page 50: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

5050

Final CommentsFinal Comments

Adaptive designs versus prespecified stopping rules– Adaptive designs come at a price of efficiency and

(sometimes) scientific interpretation

– With adequate tools for careful evaluation of designs, there is little need for adaptive designs

Adaptive designs versus prespecified stopping rules– Adaptive designs come at a price of efficiency and

(sometimes) scientific interpretation

– With adequate tools for careful evaluation of designs, there is little need for adaptive designs

Page 51: 1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

5151

Bottom LineBottom Line

You better think (think)

about what you’re

trying to do…

-Aretha Franklin

You better think (think)

about what you’re

trying to do…

-Aretha Franklin