Goal: Beat the frequentists at their own game in phase III clinical trial design

Requirements: Maintain the overall false-positive error rate and the targeted power

Compare to O’Brien-Fleming, Pocock and optimal group-sequential designs

The method must be robust, and hence must not depend on the proportional hazards assumption


Bayesian Doubly Optimal Group Sequential Design for Clinical Trials

Solution: A Bayesian Doubly Optimal Group Sequential (BDOGS) Design

(Wathen and Thall, Stat in Medicine, 2008)

1. A robust Bayesian decision-theoretic approach to designing group sequential clinical trials

2. The focus is on two-arm trials with time-to-failure (TTF) outcomes

3. Uses Bayesian adaptive model selection

4. Maintains overall frequentist size and power

Basic Elements of BDOGS

1) Assume the data come from one of M models (characterized by their hazard functions)

2) Before the trial: Derive the Optimal Decision Bounds for each model, and store them

3) During the trial: At each interim analysis, make decisions using the Optimal Decision Bounds of the Optimal Model

4) The optimal boundaries depend on the model, and the model is re-selected adaptively, so the decision boundaries may change from one interim evaluation to the next

[Figure: BDOGS illustration]

A Doubly Optimal Procedure

Step 1 (Before the Trial): For each of M specific models, obtain the Optimal Decision Boundaries using forward simulation.

Step 2 (During the Trial): Obtain posterior model probabilities for the set of M possible models using approximate Bayes factors to determine the Optimal Model.

Step 3 (During the Trial): Apply the optimal decision boundaries corresponding to the optimal model at each interim decision based on the most recent data.
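Step 2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: it approximates each model's log marginal likelihood with the Schwarz (BIC) approximation, one common way to obtain approximate Bayes factors, and assumes each model has already been fit by maximum likelihood to the current data. The function names and the log-likelihood values in the usage note are hypothetical.

```python
import math

def posterior_model_probs(loglik, n_params, n_obs, prior=None):
    """Approximate posterior model probabilities Pr(M_k | data).

    Uses the BIC (Schwarz) approximation to each log marginal likelihood,
        log m_k ~= loglik_k - 0.5 * n_params_k * log(n_obs),
    so ratios of the m_k are approximate Bayes factors.
    """
    m = len(loglik)
    if prior is None:
        prior = [1.0 / m] * m  # equally likely models (0.20 each when M = 5)
    log_post = [
        ll - 0.5 * k * math.log(n_obs) + math.log(p)
        for ll, k, p in zip(loglik, n_params, prior)
    ]
    # Normalize on the log scale for numerical stability (log-sum-exp).
    top = max(log_post)
    w = [math.exp(v - top) for v in log_post]
    total = sum(w)
    return [v / total for v in w]

def optimal_model(loglik, n_params, n_obs, prior=None):
    """Index of the model with the largest approximate posterior probability."""
    probs = posterior_model_probs(loglik, n_params, n_obs, prior)
    return max(range(len(probs)), key=probs.__getitem__)
```

For example, with equal priors and hypothetical maximized log-likelihoods `[-120.0, -118.5, -119.0, -117.2, -117.9]` for models with `[2, 4, 4, 4, 4]` parameters and `n_obs = 40`, `optimal_model` returns `0` (M1): the BIC penalty outweighs the better raw fit of the four-parameter models. The interim decision would then use M1's stored boundaries.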

$\delta = \mu_E - \mu_S$ = actual improvement in median failure time of experimental (E) over standard (S), where $\mu_E$ and $\mu_S$ denote the two median failure times; a parameter under the Bayesian model (hence random)

$\delta^*$ = fixed desired improvement in median failure time of E over S

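Expected Utility = $-\left[\tfrac{1}{2} E_{\delta=0}(N) + \tfrac{1}{2} E_{\delta=\delta^*}(N)\right]$, i.e., minus the average expected sample size under $\delta = 0$ and $\delta = \delta^*$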

Decision Boundaries

To facilitate computation, for each model BDOGS uses the two parametric boundary functions

$P_U = a_U - b_U \{N^+(X_n)/N\}^{c_U}$

$P_L = a_L + b_L \{N^+(X_n)/N\}^{c_L}$

where $N$ = maximum sample size, and

$N^+(X_n)$ = # failure events in data $X_n$

$(a_U, b_U, c_U, a_L, b_L, c_L)$ characterize the decision boundary for a given model
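The two boundary functions translate directly into code. The sketch below assumes the six parameters have already been chosen for the current model; the parameter values in the example are hypothetical, picked only so that $P_U$ and $P_L$ meet at the final analysis, and are not the values used in the NSCLC design.

```python
def decision_bounds(n_events, n_max, a_u, b_u, c_u, a_l, b_l, c_l):
    """Return (P_U, P_L) at an interim analysis.

    P_U = a_U - b_U * (N+/N)**c_U and P_L = a_L + b_L * (N+/N)**c_L,
    where N+ = number of failure events so far and N = maximum sample size.
    With b and c positive, P_U falls and P_L rises as events accumulate.
    """
    frac = n_events / n_max
    p_upper = a_u - b_u * frac ** c_u
    p_lower = a_l + b_l * frac ** c_l
    return p_upper, p_lower


# Hypothetical parameters, evaluated at the NSCLC looks (N = 122).
params = dict(a_u=0.99, b_u=0.04, c_u=1.0, a_l=0.05, b_l=0.90, c_l=2.0)
for events in (25, 50, 75, 87, 112, 122):
    p_u, p_l = decision_bounds(events, 122, **params)
    print(f"{events:4d} events: P_U = {p_u:.3f}, P_L = {p_l:.3f}")
```

The power parameters $c_U$ and $c_L$ control how quickly the bounds move: values above 1 keep them nearly flat early and move them sharply near the end, values below 1 do the opposite.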

Decision Rules

Superiority of S over E

$p_S = \Pr(\text{S superior to E} \mid x) > P_U$: Stop and select S

Superiority of E over S

$p_E = \Pr(\text{E superior to S} \mid x) > P_U$: Stop and select E

Futility

$p_S < P_L$ and $p_E < P_L$: Stop for futility

Acquire more information

$p_S \le P_U$ and $p_E \le P_U$, with $p_S \ge P_L$ or $p_E \ge P_L$: Continue randomizing to obtain more information
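A minimal sketch of these four rules at a single interim analysis, assuming $p_S$, $p_E$, $P_U$ and $P_L$ have already been computed. The order of the two superiority checks is an assumption; it only matters if both probabilities exceed $P_U$ at once, which cannot happen when $P_U > .5$ because "S superior" and "E superior" are disjoint events.

```python
def interim_decision(p_s, p_e, p_upper, p_lower):
    """Apply the four decision rules at one interim analysis.

    p_s, p_e: posterior probabilities that S (resp. E) is the superior arm.
    p_upper, p_lower: the current bounds P_U and P_L.
    Returns "select S", "select E", "futility", or "continue".
    """
    if p_s > p_upper:
        return "select S"
    if p_e > p_upper:
        return "select E"
    if p_s < p_lower and p_e < p_lower:
        return "futility"
    # Neither arm is clearly superior and at least one probability is
    # still above P_L, so more information is needed.
    return "continue"
```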

Forward Simulation

Simulate the entire trial 5000 times assuming $\delta = 0$, and 5000 times assuming $\delta = \delta^*$:

1. At each interim analysis, calculate $p_E$ and $p_S$, and store them together with the number of patients and the number of events in each treatment arm.

2. Apply the decision rule d to obtain the expected utility of a trial using d.

3. Find d that maximizes the expected utility.

(A complex search algorithm is required.)
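Steps 2 and 3 can be sketched as follows, assuming step 1 has already produced, for each simulated trial, a list of stored looks (events, patients, $p_S$, $p_E$). Several pieces are assumptions rather than the published method: a plain random search stands in for the complex search algorithm, the search ranges are arbitrary, a trial that reaches the last look without triggering any rule counts as not rejecting the null, size counts selection of either arm under $\delta = 0$ (matching the two-sided alternative), and the expected utility is taken to be minus the average expected sample size.

```python
import random

def run_trial(looks, d, n_max):
    """Apply rule d = (a_U, b_U, c_U, a_L, b_L, c_L) to one stored trial.

    looks: list of (n_events, n_patients, p_s, p_e) tuples, one per analysis.
    Returns (decision, number of patients enrolled when the trial stopped).
    """
    a_u, b_u, c_u, a_l, b_l, c_l = d
    for n_events, n_patients, p_s, p_e in looks:
        frac = n_events / n_max
        p_upper = a_u - b_u * frac ** c_u
        p_lower = a_l + b_l * frac ** c_l
        if p_s > p_upper:
            return "S", n_patients
        if p_e > p_upper:
            return "E", n_patients
        if p_s < p_lower and p_e < p_lower:
            return "futility", n_patients
    # No rule fired at any look: treat as failing to reject the null.
    return "none", looks[-1][1]


def evaluate(d, null_trials, alt_trials, n_max):
    """Return (size, power, expected utility) of rule d over the stored trials."""
    null = [run_trial(t, d, n_max) for t in null_trials]
    alt = [run_trial(t, d, n_max) for t in alt_trials]
    size = sum(dec in ("S", "E") for dec, _ in null) / len(null)
    power = sum(dec == "E" for dec, _ in alt) / len(alt)
    mean_n_null = sum(n for _, n in null) / len(null)
    mean_n_alt = sum(n for _, n in alt) / len(alt)
    # Assumed utility: minus the average expected sample size.
    utility = -(0.5 * mean_n_null + 0.5 * mean_n_alt)
    return size, power, utility


def search(null_trials, alt_trials, n_max, alpha=0.05, target_power=0.90,
           n_candidates=2000, seed=1):
    """Random search for the rule d with the highest expected utility among
    candidates that satisfy size <= alpha and power >= target_power."""
    rng = random.Random(seed)
    best = None  # (d, utility)
    for _ in range(n_candidates):
        d = (rng.uniform(0.90, 1.00), rng.uniform(0.00, 0.20), rng.uniform(0.5, 3.0),
             rng.uniform(0.00, 0.20), rng.uniform(0.00, 0.80), rng.uniform(0.5, 3.0))
        size, power, utility = evaluate(d, null_trials, alt_trials, n_max)
        if size <= alpha and power >= target_power:
            if best is None or utility > best[1]:
                best = (d, utility)
    return best
```

In use, `null_trials` and `alt_trials` would each hold the 5000 stored trials for one model, and `search` would be run once per model before the trial starts. Because the simulated trials are stored, each candidate d is evaluated without re-simulating any data.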

Examples of Hazard Functions (Models)

M1 (exponential distribution): the hazard function is constant
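The model shapes can be made concrete with standard parametric hazard functions. The sketch below uses the exponential (constant hazard, as in M1), Weibull (monotone hazard) and log-normal (rising, then falling hazard) families; these families and the parameter values in the example are illustrative choices, not the exact M1 to M5 specifications used in the design.

```python
import math

def hazard_exponential(t, rate):
    """Constant hazard: h(t) = rate for all t."""
    return rate

def hazard_weibull(t, shape, scale):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1):
    increasing if shape > 1, decreasing if shape < 1, constant if shape == 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard_lognormal(t, mu, sigma):
    """Log-normal hazard h(t) = f(t) / S(t), which rises and then falls."""
    z = (math.log(t) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    surv = 0.5 * math.erfc(z / math.sqrt(2))  # S(t) = 1 - Phi(z)
    return pdf / surv

# Median of 4 months: exponential rate ln(2)/4; log-normal mu = ln(4),
# with a hypothetical sigma = 1.
mu = math.log(4.0)
for t in (0.5, 3.0, 40.0):
    print(t, hazard_exponential(t, math.log(2) / 4.0), hazard_lognormal(t, mu, 1.0))
```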

A Metastatic Non-Small Cell Lung Cancer (NSCLC) Trial

Median overall survival (OS) in metastatic NSCLC is about 4 months

A phase III trial of localized surgery or radiation therapy versus systemic chemotherapy for metastatic NSCLC was designed with the goal of improving median progression-free survival (PFS) from 4 to 8 months

Initially, a conventional .05/.90 group sequential design with O’Brien-Fleming boundaries was planned, with up to 3 tests at 30, 60 and 89 events.
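As a rough cross-check of the 89-event maximum, Schoenfeld's approximation gives the number of events a fixed-sample, two-sided log-rank test needs. This sketch assumes exponential PFS (so proportional hazards hold and the hazard ratio equals the ratio of medians, 4/8 = 0.5) and 1:1 randomization; it is not presented as the way the planned design was derived.

```python
import math
from statistics import NormalDist

def schoenfeld_events(alpha, power, hazard_ratio):
    """Events needed by a fixed-sample, two-sided log-rank test with 1:1
    allocation (Schoenfeld approximation):
        d = 4 * (z_{1 - alpha/2} + z_{power})**2 / (log HR)**2
    """
    z = NormalDist().inv_cdf
    d = 4 * (z(1 - alpha / 2) + z(power)) ** 2 / math.log(hazard_ratio) ** 2
    return math.ceil(d)

# Exponential PFS with medians 4 (S) and 8 (E) months gives HR = 4/8 = 0.5.
print(schoenfeld_events(0.05, 0.90, 4 / 8))  # 88
```

The fixed-sample figure of 88 events sits just below the 89-event maximum of the three-look design, consistent with the small inflation in maximum information that O’Brien-Fleming boundaries typically require.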

Under the “usual” assumptions, accruing 2 to 4 patients/month, a typical O’Brien-Fleming .05/.90 group sequential design will require ~100 to 120 patients and take ~2½ to 4½ years to complete

Analysis of Historical Data on PFS Time in Metastatic NSCLC

A preliminary goodness-of-fit analysis, based on a published Kaplan-Meier plot of PFS times of NSCLC patients with metastatic disease, showed that the Log Normal distribution gave a much better fit than the Weibull or Exponential.

The proportional hazards assumption was very likely invalid.

The hazard function was very likely non-monotone.

1. To test $H_0: \delta = 0$ versus $H_1: \delta \ne 0$

2. Assume med(T) = 4 mos. for std. therapy

3. Type I error = .05, power = 0.90 for $\delta$ = 4 months, i.e., improvement to med(T) = 8 mos.

4. Assume 2 patients per month accrual

5. Up to 5 interim analyses + 1 final analysis, at 25, 50, 75, 87, 112 and 122 events

6. Five possible models

A BDOGS Design for the NSCLC Trial

Possible Models (Hazard Functions)

M1 = constant (Exponential model)

M2 = increasing

M3 = decreasing

M4 = initially increasing, then a slight decrease

M5 = initially increasing, then a large decrease

A priori, the 5 models were assumed to be equally likely: Pr(M1) = … = Pr(M5) = .20.

Non-Constant Hazard Functions (Models)

For comparability in the simulations:

An O’Brien-Fleming design was constructed to have the same 6 looks, for both superiority (reject the null) and inferiority (accept the null) decisions.

Both designs had the same maximum sample size N = 122 patients.

For each case (underlying true PFS distribution) studied, the data were simulated ahead of time and each method was presented with the same data.
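Presenting both methods with the same data is a common-random-numbers device: each replicate trial is generated once, from a fixed seed, and then handed to both designs, so differences in the results reflect the designs rather than simulation noise. A minimal sketch, assuming log-normal PFS times with medians 4 and 8 months, a hypothetical log-scale standard deviation of 1, 61 patients per arm (half of N = 122), and ignoring staggered accrual and censoring; the design functions named in the comments are hypothetical.

```python
import math
import random

def simulate_arm(n, median, sigma, rng):
    """Log-normal failure times with the given median: T = median * exp(sigma * Z)."""
    return [median * math.exp(sigma * rng.gauss(0.0, 1.0)) for _ in range(n)]

def simulate_datasets(n_reps, n_per_arm, median_s, median_e, sigma=1.0, seed=2008):
    """Generate every replicate trial once, ahead of time, from a fixed seed."""
    rng = random.Random(seed)
    return [
        (simulate_arm(n_per_arm, median_s, sigma, rng),
         simulate_arm(n_per_arm, median_e, sigma, rng))
        for _ in range(n_reps)
    ]

datasets = simulate_datasets(n_reps=100, n_per_arm=61, median_s=4.0, median_e=8.0)
# Both designs would then be run on this same list, e.g.
# results_bdogs = [run_bdogs(s, e) for s, e in datasets]        # hypothetical
# results_of = [run_obrien_fleming(s, e) for s, e in datasets]  # hypothetical
```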

Simulation Study for the NSCLC Trial

Non-constant Hazards Used in Simulation Study for S (solid line) and E (dashed line)

[Figure, Panel A: Null Case. Sample size (y-axis) for BDOGS (B) and O’Brien-Fleming (OF) under each true PFS distribution: Exp, LN-BF, W-BF, WD, LN-ID2, LN-ID3, Exp]

Simulation Results: Null Case

B = BDOGS, OF = O’Brien-Fleming

Lower and upper lines = 2.5th and 97.5th percentiles; box = 25th to 75th percentiles; line in box = median; dot in box = mean

[Figure, Panel B: Alternative Case. Sample size (y-axis) for BDOGS (B) and O’Brien-Fleming (OF) under each true PFS distribution: Exp, LN-BF, W-BF, WD, LN-ID2, LN-ID3, WI]

Simulation Results: Alternative Case


Simulation Results

If the hazard is constant, both BDOGS and OF maintain the targeted size and power, but OF requires a much larger sample (33% to 51% more patients)


If the PFS distribution is log-normal, both BDOGS and OF maintain the targeted size and power, but OF requires a much larger sample


If the PFS distribution is Weibull, both BDOGS and OF maintain the targeted power, BDOGS has a reduced size of .02, and OF requires a much larger sample


If the PFS distribution is Weibull with a decreasing hazard, BDOGS has size .07, OF has reduced power (.81), and OF requires a much larger sample


If the PFS distribution is Weibull with an increasing hazard, both methods have a greatly reduced size (.01), OF has greatly increased power (.99), and OF requires a 61% to 141% larger sample
