1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,

11

Issues in the Use of Adaptive Clinical Trial Designs

Issues in the Use of Adaptive Clinical Trial Designs

Scott S. Emerson, M.D., Ph.D.Professor of Biostatistics University of Washington

Scott S. Emerson, M.D., Ph.D.Professor of Biostatistics University of Washington

© 2002, 2003, 2004 Scott S. Emerson, M.D., Ph.D.© 2002, 2003, 2004 Scott S. Emerson, M.D., Ph.D.

22

Clinical TrialsClinical Trials

Experimentation in human volunteers– Investigates a new treatment/preventive agent

• Safety: » Are there adverse effects that clearly outweigh any

potential benefit?• Efficacy:

» Can the treatment alter the disease process in a beneficial way?

• Effectiveness: » Would adoption of the treatment as a standard affect

morbidity / mortality in the population?

Experimentation in human volunteers– Investigates a new treatment/preventive agent

• Safety: » Are there adverse effects that clearly outweigh any

potential benefit?• Efficacy:

» Can the treatment alter the disease process in a beneficial way?

• Effectiveness: » Would adoption of the treatment as a standard affect

morbidity / mortality in the population?

33

Clinical Trial DesignClinical Trial Design

Finding an approach that best addresses the often competing goals: Science, Ethics, Efficiency

• Basic scientists: focus on mechanisms• Clinical scientists: focus on overall patient health• Ethical: focus on patients on trial, future patients• Economic: focus on profits and/or costs• Governmental: focus on validity of marketing claims• Statistical: focus on questions answered precisely • Operational: focus on feasibility of mounting trial

Finding an approach that best addresses the often competing goals: Science, Ethics, Efficiency

• Basic scientists: focus on mechanisms• Clinical scientists: focus on overall patient health• Ethical: focus on patients on trial, future patients• Economic: focus on profits and/or costs• Governmental: focus on validity of marketing claims• Statistical: focus on questions answered precisely • Operational: focus on feasibility of mounting trial

44

Statistical PlanningStatistical Planning

Ensure that the trial will satisfy the various collaborators as much as possible

• Discriminate between relevant scientific hypotheses– Scientific and statistical credibility

• Protect economic interests of sponsor– Efficient designs– Economically important estimates

• Protect interests of patients on trial– Stop if unsafe or unethical– Stop when credible decision can be made

• Promote rapid discovery of new beneficial treatments

Ensure that the trial will satisfy the various collaborators as much as possible

• Discriminate between relevant scientific hypotheses– Scientific and statistical credibility

• Protect economic interests of sponsor– Efficient designs– Economically important estimates

• Protect interests of patients on trial– Stop if unsafe or unethical– Stop when credible decision can be made

• Promote rapid discovery of new beneficial treatments

55

Refine Scientific HypothesesRefine Scientific Hypotheses

– Target population• Inclusion, exclusion, important subgroups

– Intervention• Dose, administration (intention to treat)

– Measurement of outcome(s)• Efficacy/effectiveness, toxicity

– Statistical hypotheses in terms of some summary measure of outcome distribution

• Mean, geometric mean, median, odds, hazard, etc.

– Criteria for statistical credibility• Frequentist (type I, II errors) or Bayesian

– Target population• Inclusion, exclusion, important subgroups

– Intervention• Dose, administration (intention to treat)

– Measurement of outcome(s)• Efficacy/effectiveness, toxicity

– Statistical hypotheses in terms of some summary measure of outcome distribution

• Mean, geometric mean, median, odds, hazard, etc.

– Criteria for statistical credibility• Frequentist (type I, II errors) or Bayesian

66

Statistics to Address VariabilityStatistics to Address Variability

At the end of the study:– Frequentist and/or Bayesian data analysis to assess

the credibility of clinical trial results• Estimate of the treatment effect

– Single best estimate– Precision of estimates

• Decision for or against hypotheses– Binary decision– Quantification of strength of evidence

At the end of the study:– Frequentist and/or Bayesian data analysis to assess

the credibility of clinical trial results• Estimate of the treatment effect

– Single best estimate– Precision of estimates

• Decision for or against hypotheses– Binary decision– Quantification of strength of evidence

77

Statistical Sampling PlanStatistical Sampling Plan

Ethical and efficiency concerns are addressed through sequential sampling

• During the conduct of the study, data are analyzed at periodic intervals and reviewed by the DMC

• Using interim estimates of treatment effect– Decide whether to continue the trial– If continuing, decide on any modifications to

» scientific / statistical hypotheses and/or» sampling scheme

Ethical and efficiency concerns are addressed through sequential sampling

• During the conduct of the study, data are analyzed at periodic intervals and reviewed by the DMC

• Using interim estimates of treatment effect– Decide whether to continue the trial– If continuing, decide on any modifications to

» scientific / statistical hypotheses and/or» sampling scheme

88

Sampling Plan: General ApproachSampling Plan: General Approach

– Perform analyses when sample sizes N1. . . NJ

• Can be randomly determined

– At each analysis choose stopping boundaries• aj < bj < cj < dj

– Compute test statistic T(X1. . . XNj)

• Stop if T < aj (extremely low)

• Stop if bj < T < cj (approximate equivalence)

• Stop if T > dj (extremely high)

• Otherwise continue (with possible adaptive modification of analysis schedule, sample size, etc.)

– Boundaries for modification of sampling plan








• Otherwise continue (with possible adaptive modification of analysis schedule, sample size, etc.)

– Boundaries for modification of sampling plan

99

Sequential Sampling IssuesSequential Sampling Issues

– Design stage• Choosing sampling plan which satisfies desired operating

characteristics– E.g., type I error, power, sample size requirements

– Monitoring stage• Flexible implementation of the stopping rule to account for

assumptions made at design stage– E.g., adjust sample size to account for observed

variance

– Analysis stage• Providing inference based on true sampling distribution of

test statistics

– Design stage• Choosing sampling plan which satisfies desired operating

characteristics– E.g., type I error, power, sample size requirements

– Monitoring stage• Flexible implementation of the stopping rule to account for

assumptions made at design stage– E.g., adjust sample size to account for observed

variance

– Analysis stage• Providing inference based on true sampling distribution of

test statistics

1010

Sequential Sampling StrategiesSequential Sampling Strategies

Two broad categories of sequential sampling– Prespecified stopping guidelines

– Adaptive procedures

Two broad categories of sequential sampling– Prespecified stopping guidelines

– Adaptive procedures

1111

Prespecified Stopping PlansPrespecified Stopping Plans

Prior to collection of data, specify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Rule for determining maximal statistical information

• E.g., fix power, maximal sample size, or calendar time

– Randomization scheme– Rule for determining schedule of analyses

• E.g., according to sample size, statistical information, or calendar time

– Rule for determining conditions for early stopping• E.g., boundary shape function for stopping boundaries on the

scale of some test statistic

Prior to collection of data, specify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Rule for determining maximal statistical information

• E.g., fix power, maximal sample size, or calendar time

– Randomization scheme– Rule for determining schedule of analyses

• E.g., according to sample size, statistical information, or calendar time

– Rule for determining conditions for early stopping• E.g., boundary shape function for stopping boundaries on the

scale of some test statistic

1212

Adaptive Sampling PlansAdaptive Sampling Plans

At each interim analysis, possibly modify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping

At each interim analysis, possibly modify– Scientific and statistical hypotheses of interest– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping

1313

Adaptive Sampling: ExamplesAdaptive Sampling: Examples

– E.g., Modify sample size to account for estimated information (variance or baseline rates)

• No effect on type I error IF– Estimated information independent of estimate of

treatment effect» Proportional hazards,» Normal data, and/or» Carefully phrased alternatives

– And willing to use conditional inference» Carefully phrased alternatives

– E.g., Modify sample size to account for estimated information (variance or baseline rates)

• No effect on type I error IF– Estimated information independent of estimate of

treatment effect» Proportional hazards,» Normal data, and/or» Carefully phrased alternatives

– And willing to use conditional inference» Carefully phrased alternatives

1414

Estimation of Statistical InformationEstimation of Statistical InformationIf maximal sample size is maintained, the study

discriminates between null hypothesis and an alternative measured in units of statistical information

If maximal sample size is maintained, the study discriminates between null hypothesis and an alternative measured in units of statistical information

V

nV

n2

01

21

201

21

)()(

1515

Estimation of Statistical InformationEstimation of Statistical InformationIf statistical power is maintained, the study sample

size is measured in units of statistical information

If statistical power is maintained, the study sample size is measured in units of statistical information

201

21

201

21

)()(

V

nVn

1616

Adaptive Sampling: ExamplesAdaptive Sampling: Examples

– E.g., Proschan & Hunsberger (1995)• Modify ultimate sample size based on conditional power

– Computed under current best estimate (if high enough)• Make adjustment to inference to maintain Type I error

– E.g., Self-designing Trial (Fisher, 1998)• Combine arbitrary test statistics from sequential groups• Prespecify weighting of groups “just in time”

– Specified at immediately preceding analysis• Fisher’s test statistic is N(0,1) under the null hypothesis of no

treatment difference on any of the endpoints tested

– E.g., Randomized Play the Winner• Biased coin favors currently best performing treatment

– E.g., Proschan & Hunsberger (1995)• Modify ultimate sample size based on conditional power

– Computed under current best estimate (if high enough)• Make adjustment to inference to maintain Type I error

– E.g., Self-designing Trial (Fisher, 1998)• Combine arbitrary test statistics from sequential groups• Prespecify weighting of groups “just in time”

– Specified at immediately preceding analysis• Fisher’s test statistic is N(0,1) under the null hypothesis of no

treatment difference on any of the endpoints tested

– E.g., Randomized Play the Winner• Biased coin favors currently best performing treatment

1717

Motivation for Adaptive DesignsMotivation for Adaptive Designs

Scientific and statistical hypotheses of interest– Modify target population, intervention, measurement

of outcome, alternative hypotheses of interest– Possible justification

• Changing conditions in medical environment– Approval/withdrawal of competing/ancillary treatments– Diagnostic procedures

• New knowledge from other trials about similar treatments• Evidence from ongoing trial

– Toxicity profile (therapeutic index)– Subgroup effects

Scientific and statistical hypotheses of interest– Modify target population, intervention, measurement

of outcome, alternative hypotheses of interest– Possible justification

• Changing conditions in medical environment– Approval/withdrawal of competing/ancillary treatments– Diagnostic procedures

• New knowledge from other trials about similar treatments• Evidence from ongoing trial

– Toxicity profile (therapeutic index)– Subgroup effects

1818

Motivation for Adaptive DesignsMotivation for Adaptive Designs

Modification of other design parameters may have great impact on the hypotheses considered– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping

Modification of other design parameters may have great impact on the hypotheses considered– Statistical criteria for credible evidence– Maximal statistical information– Randomization ratios– Schedule of analyses– Conditions for early stopping

1919

Prespecified vs AdaptivePrespecified vs Adaptive

Major issues with use of adaptive designs– What do we truly gain?

• Can proper evaluation of trial designs obviate need?

– What can we lose?• Efficiency? (and how should it be measured?)• Scientific inference?

– Science vs Statistics vs Game theory – Definition of scientific/statistical hypotheses– Quantifying precision of inference

Major issues with use of adaptive designs– What do we truly gain?

• Can proper evaluation of trial designs obviate need?

– What can we lose?• Efficiency? (and how should it be measured?)• Scientific inference?

– Science vs Statistics vs Game theory – Definition of scientific/statistical hypotheses– Quantifying precision of inference

2020

Major Issue: Frequentist InferenceMajor Issue: Frequentist Inference

Frequentist inference is still the most commonly used form of quantifying statistical strength of evidence– Estimates which minimize bias, MSE– Confidence intervals– P values; type I, II errors

Frequentist inference depends on sampling distribution

Frequentist inference is still the most commonly used form of quantifying statistical strength of evidence– Estimates which minimize bias, MSE– Confidence intervals– P values; type I, II errors

Frequentist inference depends on sampling distribution

2121

Prespecified Sampling PlanPrespecified Sampling Plan








• Otherwise continue as prespecified








• Otherwise continue as prespecified

2222

Boundary ScalesBoundary Scales

– Stopping rule for one test statistic is easily transformed to a rule for another statistic

• “Group sequential stopping rules”– Sum of observations– Point estimate of treatment effect– Normalized (Z) statistic– Fixed sample P value– Error spending function

• Conditional probability• Predictive probability• Bayesian posterior probability

– Stopping rule for one test statistic is easily transformed to a rule for another statistic

• “Group sequential stopping rules”– Sum of observations– Point estimate of treatment effect– Normalized (Z) statistic– Fixed sample P value– Error spending function

• Conditional probability• Predictive probability• Bayesian posterior probability

2323

Unified Family: MLE ScaleUnified Family: MLE Scale

Boundary shape function unifying families of stopping rules (Kittelson & Emerson, 1999) – Wang & Tsiatis (1987) based families (R=0, A=0)

• P=1: O’Brien & Fleming (1979); P= 0.5: Pocock (1977)• Emerson & Fleming (1989); Pampallona & Tsiatis (1994)

– Triangular test (Whitehead, 1983): (P=1, R=0, A=1)– Seq cond probability ratio test (Xiong & Tan, 1994)– Some boundaries constant on conditional or

predictive power– Extensions: Peto-Haybittle (using Burington &

Emerson, 2003)

Boundary shape function unifying families of stopping rules (Kittelson & Emerson, 1999) – Wang & Tsiatis (1987) based families (R=0, A=0)

• P=1: O’Brien & Fleming (1979); P= 0.5: Pocock (1977)• Emerson & Fleming (1989); Pampallona & Tsiatis (1994)

– Triangular test (Whitehead, 1983): (P=1, R=0, A=1)– Seq cond probability ratio test (Xiong & Tan, 1994)– Some boundaries constant on conditional or

predictive power– Extensions: Peto-Haybittle (using Burington &

Emerson, 2003)

2424

Spectrum of Conditions for Early StoppingSpectrum of Conditions for Early Stopping

– Down columns: Early stopping vs no early stopping– Across rows: One-sided vs two-sided decisions

– Down columns: Early stopping vs no early stopping– Across rows: One-sided vs two-sided decisions

2525

Spectrum of Boundary Shape FunctionsSpectrum of Boundary Shape FunctionsA wide variety of boundary shapes possible

– All of the rules depicted have the same type I error and power to detect the design alternative

A wide variety of boundary shapes possible– All of the rules depicted have the same type I error

and power to detect the design alternative

2626


Conditional Probability Scale:– Threshold at final analysis from unified family– Hypothesized value of mean

Conditional Probability Scale:– Threshold at final analysis from unified family– Hypothesized value of mean

jJ

jJXJ

jJXJJXj

NN

xNtN

XtXtC

j

**

*;*

1

|Pr,

JXt

*

2727


Predictive Probability Scale:– Prior distribution

Predictive Probability Scale:– Prior distribution 2,~ N

2222

222

1

|,|Pr

jJjJ

jjJjjXjJ

jjjXjjXj

NNNN

xNNxtNN

dXXtXtH

2828


Bayesian Posterior Scale:– Prior

Bayesian Posterior Scale:– Prior

22

2222*

,,1**

1

|Pr

j

jjj

N

N

xNN

XXBjj

2,~ N

2929

Major Issue: Frequentist InferenceMajor Issue: Frequentist InferenceFrequentist operating characteristics are based on

the sampling distribution– Stopping rules do affect the sampling distribution of

the usual statistics • MLEs are not normally distributed• Z scores are not standard normal under the null

– (1.96 is irrelevant)• The null distribution of fixed sample P values is not uniform

– (They are not true P values)

Frequentist operating characteristics are based on the sampling distribution– Stopping rules do affect the sampling distribution of

the usual statistics • MLEs are not normally distributed• Z scores are not standard normal under the null

– (1.96 is irrelevant)• The null distribution of fixed sample P values is not uniform

– (They are not true P values)

3030

Sequential Sampling: The PriceSequential Sampling: The Price

It is only through full knowledge of the sampling plan that we can assess the full complement of frequentist operating characteristics– In order to obtain inference with maximal precision

and minimal bias, the sampling plan must be well quantified

– (Note that adaptive designs using ancillary statistics pose no special problems if we condition on those ancillary statistics.)

It is only through full knowledge of the sampling plan that we can assess the full complement of frequentist operating characteristics– In order to obtain inference with maximal precision

and minimal bias, the sampling plan must be well quantified

– (Note that adaptive designs using ancillary statistics pose no special problems if we condition on those ancillary statistics.)

3131

Sampling Distribution of EstimatesSampling Distribution of Estimates

3232

Sampling Distribution of EstimatesSampling Distribution of Estimates

3333

Sampling Distributions with Stopping RulesSampling Distributions with Stopping Rules

3434

Sampling DistributionSampling Distribution

For any known stopping rule, however, we can compute the correct sampling distribution with specialized software– From the computed sampling distributions we then

compute• Bias adjusted estimates• Correct (adjusted) confidence intervals• Correct (adjusted) P values

– Candidate designs can then be compared with respect to their operating characteristics

For any known stopping rule, however, we can compute the correct sampling distribution with specialized software– From the computed sampling distributions we then

compute• Bias adjusted estimates• Correct (adjusted) confidence intervals• Correct (adjusted) P values

– Candidate designs can then be compared with respect to their operating characteristics

3535

Evaluation of DesignsEvaluation of Designs

Process of choosing a trial design– Define candidate design

• Usually constrain two operating characteristics– Type I error, power at design alternative– Type I error, maximal sample size

– Evaluate other operating characteristics• Different criteria of interest to different investigators

– Modify design– Iterate

Process of choosing a trial design– Define candidate design

• Usually constrain two operating characteristics– Type I error, power at design alternative– Type I error, maximal sample size

– Evaluate other operating characteristics• Different criteria of interest to different investigators

– Modify design– Iterate

3636

Operating CharacteristicsOperating Characteristics

The same regardless of the type of stopping rule – Frequentist power curve

• Type I error (null) and power (design alternative)

– Sample size requirements• Maximum, average, median, other quantiles• Stopping probabilities

– Inference at study termination (at each boundary)• Frequentist or Bayesian (under spectrum of priors)

– Futility measures• Conditional power, predictive power

The same regardless of the type of stopping rule – Frequentist power curve

• Type I error (null) and power (design alternative)

– Sample size requirements• Maximum, average, median, other quantiles• Stopping probabilities

– Inference at study termination (at each boundary)• Frequentist or Bayesian (under spectrum of priors)

– Futility measures• Conditional power, predictive power

3737

At Design StageAt Design Stage

In particular, at design stage we can know – Conditions under which trial will continue at each

analysis• Estimates, inference, conditional and predictive power

– Tradeoffs between early stopping and loss in unconditional power

In particular, at design stage we can know – Conditions under which trial will continue at each

analysis• Estimates, inference, conditional and predictive power

– Tradeoffs between early stopping and loss in unconditional power

3838

Frequentist InferenceFrequentist InferenceO'Brien-Fleming Pocock

N MLEBias AdjEstimate 95% CI P val MLE

Bias AdjEstimate 95% CI P val

Efficacy

425 -0.171 -0.163 (-0.224, -0.087) 0.000 -0.099 -0.089 (-0.152, -0.015) 0.010

850 -0.086 -0.080 (-0.130, -0.025) 0.002 -0.070 -0.065 (-0.114, -0.004) 0.018

1275 -0.057 -0.054 (-0.096, -0.007) 0.012 -0.057 -0.055 (-0.101, -0.001) 0.023

1700 -0.043 -0.043 (-0.086, 0.000) 0.025 -0.050 -0.050 (-0.099, 0.000) 0.025

Futility

425 0.086 0.077 (0.001, 0.139) 0.977 0.000 -0.010 (-0.084, 0.053) 0.371

850 0.000 -0.006 (-0.061, 0.044) 0.401 -0.029 -0.035 (-0.095, 0.014) 0.078

1275 -0.029 -0.031 (-0.079, 0.010) 0.067 -0.042 -0.044 (-0.098, 0.002) 0.029

1700 -0.043 -0.043 (-0.086, 0.000) 0.025 -0.050 -0.050 (-0.099, 0.000) 0.025

3939

Efficiency / Unconditional PowerEfficiency / Unconditional Power

Tradeoffs between early stopping and loss of powerBoundaries Loss of Power Avg Sample Size


4040

Stochastic CurtailmentStochastic Curtailment

Boundaries transformed to conditional or predictive power– Key issue: Computations are based on assumptions

about the true treatment effect• Conditional power

– “Design”: based on hypotheses– “Estimate”: based on current estimates

• Predictive power– “Prior assumptions”

Boundaries transformed to conditional or predictive power– Key issue: Computations are based on assumptions

about the true treatment effect• Conditional power

– “Design”: based on hypotheses– “Estimate”: based on current estimates

• Predictive power– “Prior assumptions”

4141

Conditional/Predictive PowerConditional/Predictive Power

Symmetric O’Brien-Fleming O’Brien-Fleming Efficacy, P=0.8 Futility

Conditional Power Predictive Power Conditional Power Predictive Power

N MLE Design Estimate Sponsor Noninf MLE Design Estimate Sponsor Noninf

Efficacy (rejects 0.00) Efficacy (rejects 0.00)

425 -0.171 0.500 0.000 0.002 0.000 -0.170 0.500 0.000 0.002 0.000

850 -0.085 0.500 0.002 0.015 0.023 -0.085 0.500 0.002 0.015 0.023

1275 -0.057 0.500 0.091 0.077 0.124 -0.057 0.500 0.093 0.077 0.126

Futility (rejects -0.0855) Futility (rejects -0.0866)

425 0.085 0.500 0.000 0.077 0.000 0.047 0.719 0.000 0.222 0.008

850 0.000 0.500 0.002 0.143 0.023 -0.010 0.648 0.015 0.247 0.063

1275 -0.028 0.500 0.091 0.241 0.124 -0.031 0.592 0.142 0.312 0.177

4242

Efficiency / Unconditional PowerEfficiency / Unconditional Power



4343

Key IssuesKey Issues

Very different probabilities based on assumptions about the true treatment effect– Extremely conservative O’Brien-Fleming boundaries

correspond to conditional power of 50% (!) under alternative rejected by the boundary

– Resolution of apparent paradox: if the alternative were true, there is less than .0001 probability of stopping for futility at the first analysis

Very different probabilities based on assumptions about the true treatment effect– Extremely conservative O’Brien-Fleming boundaries

correspond to conditional power of 50% (!) under alternative rejected by the boundary

– Resolution of apparent paradox: if the alternative were true, there is less than .0001 probability of stopping for futility at the first analysis

4444

Further CommentsFurther Comments

Neither conditional power nor predictive power have good foundational motivation– Frequentists should use Neyman-Pearson paradigm

and consider optimal unconditional power across alternatives

• And conditional/predictive power is not a good indicator in loss of unconditional power

– Bayesians should use posterior distributions for decisions

Neither conditional power nor predictive power have good foundational motivation– Frequentists should use Neyman-Pearson paradigm

and consider optimal unconditional power across alternatives

• And conditional/predictive power is not a good indicator in loss of unconditional power

– Bayesians should use posterior distributions for decisions

4545

Fully Adaptive SamplingFully Adaptive Sampling

What is the cost of planning not to plan?– In order to provide frequentist estimation, we must

know the rule used to modify the clinical trial• Hypothesis testing of a null is possible with fully adaptive trials

– Statistics: type I error is controlled– Game theory: chance of “winning” with completely

ineffective therapy is controlled– Science:

» At best: ability to discriminate clinically relevant hypothesis may be impaired

» At worst: uncertainty as to what the treatment has effect on

What is the cost of planning not to plan?– In order to provide frequentist estimation, we must

know the rule used to modify the clinical trial• Hypothesis testing of a null is possible with fully adaptive trials

– Statistics: type I error is controlled– Game theory: chance of “winning” with completely

ineffective therapy is controlled– Science:

» At best: ability to discriminate clinically relevant hypothesis may be impaired

» At worst: uncertainty as to what the treatment has effect on

4646

Prespecified Modification RulesPrespecified Modification Rules

Adaptive sampling plans exact a price in statistical efficiency– Tsiatis & Mehta (2002)

• A classic prespecified group sequential stopping rule can be found that is more efficient than a given adaptive design

– Shi & Emerson (2003)• Fisher’s test statistic in the self-designing trial provides

markedly less precise inference than that based on the MLE– To compute the sampling distribution of the latter, the

sampling plan must be known

Adaptive sampling plans exact a price in statistical efficiency– Tsiatis & Mehta (2002)

• A classic prespecified group sequential stopping rule can be found that is more efficient than a given adaptive design

– Shi & Emerson (2003)• Fisher’s test statistic in the self-designing trial provides

markedly less precise inference than that based on the MLE– To compute the sampling distribution of the latter, the

sampling plan must be known

4747

Conditional/Predictive PowerConditional/Predictive Power

Additional issues with maintaining conditional or predictive power– Modification of sample size may allow precise

knowledge of interim treatment effect• Interim estimates may cause change in study population

– Time trends due to investigators gaining or losing enthusiasm

• In extreme cases, potential for unblinding of individual patients

– Effect of outliers on test statistics

Additional issues with maintaining conditional or predictive power– Modification of sample size may allow precise

knowledge of interim treatment effect• Interim estimates may cause change in study population

– Time trends due to investigators gaining or losing enthusiasm

• In extreme cases, potential for unblinding of individual patients

– Effect of outliers on test statistics

4848

Self-Designing TrialSelf-Designing Trial

Additional issues with Self-Designing Trial– The self-designing trial requires pre-specification of

the analysis at which the trial stops• Trial stops when all of the remaining weight is to be applied

at the current analysis, as specified at the previous analysis

Additional issues with Self-Designing Trial– The self-designing trial requires pre-specification of

the analysis at which the trial stops• Trial stops when all of the remaining weight is to be applied

at the current analysis, as specified at the previous analysis

4949

Randomized Play the WinnerRandomized Play the Winner

Additional issues with Randomized Play the Winner– For a fixed total sample size, greatest efficiency is

obtained when ratio of sample sizes is equal to ratio of statistical information from arms

• Constant ratio of standard deviation of observations to sample size

– (Of course, PTW is designed to minimize the number of subjects receiving an inferior treatment, which may be a greater cost in total patients and time)

Additional issues with Randomized Play the Winner– For a fixed total sample size, greatest efficiency is

obtained when ratio of sample sizes is equal to ratio of statistical information from arms

• Constant ratio of standard deviation of observations to sample size

– (Of course, PTW is designed to minimize the number of subjects receiving an inferior treatment, which may be a greater cost in total patients and time)

5050

Final CommentsFinal Comments

Adaptive designs versus prespecified stopping rules– Adaptive designs come at a price of efficiency and

(sometimes) scientific interpretation

– With adequate tools for careful evaluation of designs, there is little need for adaptive designs

Adaptive designs versus prespecified stopping rules– Adaptive designs come at a price of efficiency and

(sometimes) scientific interpretation

– With adequate tools for careful evaluation of designs, there is little need for adaptive designs

5151

Bottom LineBottom Line

You better think (think)

about what you’re

trying to do…

-Aretha Franklin

You better think (think)

about what you’re

trying to do…

-Aretha Franklin

Documents

1 1 Issues in the Use of Adaptive Clinical Trial Designs Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Scott S. Emerson,