
Goal : Beat the frequentists at their own game in phase III clinical trial design


Page 1

Bayesian Doubly Optimal Group Sequential Design for Clinical Trials

Goal: Beat the frequentists at their own game in phase III clinical trial design

Requirements:

Maintain overall false-positive error rate and targeted power

Compare to O’Brien-Fleming, Pocock and Optimal group-sequential designs

The method must be robust, and hence must not depend on the proportional hazards assumption

Page 2

Solution: A Bayesian Doubly Optimal Group Sequential (BDOGS) Design

(Wathen and Thall, Statistics in Medicine, 2008)

1. A robust Bayesian decision-theoretic approach to designing group sequential clinical trials

2. The focus is on two-arm trials with time-to-failure (TTF) outcomes

3. Uses Bayesian adaptive model selection

4. Maintains overall frequentist size and power

Page 3

Basic Elements of BDOGS

1) Assume the data come from one of M models (characterized by their hazard functions)

2) Before the trial: Derive the Optimal Decision Bounds for each model, and store them

3) During the trial: At each interim analysis, make decisions using the Optimal Decision Bounds of the Optimal Model

4) The optimal boundaries depend on the model, and the model is optimized adaptively. The decision boundaries may change from one interim evaluation to the next.

BDOGS illustration

Page 4

A Doubly Optimal Procedure

Step 1 (Before the Trial): For each of M specific models, obtain the Optimal Decision Boundaries using forward simulation.

Step 2 (During the Trial): Obtain posterior model probabilities for the set of M possible models using approximate Bayes Factors to determine the Optimal Model.

Step 3 (During the Trial): Apply the optimal decision boundaries corresponding to the optimal model at each interim decision based on the most recent data.
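Step 2 can be sketched in a few lines. The slides do not say which Bayes factor approximation is used, so this sketch assumes the BIC (Schwarz) approximation, in which exp(-BIC/2) stands in for each model's marginal likelihood. The function names, the effective sample size `n_eff`, and the log-likelihood values are all hypothetical.

```python
import math
from typing import List, Sequence


def posterior_model_probs(
    log_liks: Sequence[float],
    n_params: Sequence[int],
    n_eff: int,
    priors: Sequence[float],
) -> List[float]:
    """Approximate Pr(model | data) with the BIC approximation (an assumption)."""
    # log of prior * exp(-BIC/2), where BIC = -2*loglik + k*log(n_eff)
    log_weights = [
        ll - 0.5 * k * math.log(n_eff) + math.log(p)
        for ll, k, p in zip(log_liks, n_params, priors)
    ]
    top = max(log_weights)  # subtract the max for numerical stability
    unnorm = [math.exp(w - top) for w in log_weights]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def optimal_model(probs: Sequence[float]) -> int:
    """Return the 0-based index of the model with the largest posterior probability."""
    return max(range(len(probs)), key=lambda m: probs[m])


# Hypothetical maximized log-likelihoods for M = 5 models at one interim look.
probs = posterior_model_probs(
    log_liks=[-100.0, -95.0, -101.0, -99.0, -102.0],
    n_params=[1, 2, 2, 2, 2],
    n_eff=50,
    priors=[0.2] * 5,
)
best = optimal_model(probs)
```

In Step 3 the index `best` would then pick which stored set of decision boundaries to apply at the current interim analysis.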

Page 5

$\delta = \mu_E - \mu_S$ = actual improvement in median failure time of experimental (E) over standard (S), a parameter under the Bayesian model (hence random)

$\delta^*$ = fixed desired improvement in median failure time of E over S

Expected Utility = $\tfrac{1}{2} E_{\delta = 0}(N) + \tfrac{1}{2} E_{\delta = \delta^*}(N)$

Page 6

Decision Boundaries

To facilitate computation, for each model BDOGS uses the two parametric boundary functions

$P_U = a_U - b_U \{ N^+(X_n)/N \}^{c_U}$

$P_L = a_L + b_L \{ N^+(X_n)/N \}^{c_L}$

where $N$ = maximum sample size, and $N^+(X_n)$ = # failure events in data $X_n$

$(a_U, b_U, c_U, a_L, b_L, c_L)$ characterize the decision boundary for a given model
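As a minimal sketch, the two boundary functions can be written directly, taking c_U and c_L as exponents on the information fraction N+(Xn)/N. The function names and the parameter values in the usage lines are hypothetical, chosen only to show the shape of the boundaries.

```python
def upper_bound(n_events: int, n_max: int, a_u: float, b_u: float, c_u: float) -> float:
    """P_U = a_U - b_U * (N+(X_n) / N) ** c_U."""
    return a_u - b_u * (n_events / n_max) ** c_u


def lower_bound(n_events: int, n_max: int, a_l: float, b_l: float, c_l: float) -> float:
    """P_L = a_L + b_L * (N+(X_n) / N) ** c_L."""
    return a_l + b_l * (n_events / n_max) ** c_l


# Hypothetical boundary parameters, not values from the talk.
N_MAX = 122
for events in (25, 50, 75, 122):
    p_u = upper_bound(events, N_MAX, a_u=0.99, b_u=0.05, c_u=2.0)
    p_l = lower_bound(events, N_MAX, a_l=0.05, b_l=0.30, c_l=2.0)
    print(events, round(p_u, 4), round(p_l, 4))
```

With positive b_U and b_L, the upper boundary falls and the lower boundary rises as events accumulate, so the continuation region narrows over the course of the trial.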

Page 7

Decision Rules

Superiority of S over E

$\pi_S = \Pr(\ldots \mid x) > P_U$: Stop and select S

Superiority of E over S

$\pi_E = \Pr(\ldots \mid x) > P_U$: Stop and select E

Futility

$\pi_S < P_L$ and $\pi_E < P_L$: Stop for futility

Acquire more information

$P_L \le \pi_S, \pi_E \le P_U$: Continue randomizing to obtain more information
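The four rules map onto a small function. This is only a sketch: `pi_s` and `pi_e` are placeholder names for the two posterior probabilities on the slide, and checking S before E when both exceed the upper boundary is an assumption, since the slide does not cover that case.

```python
def interim_decision(pi_s: float, pi_e: float, p_u: float, p_l: float) -> str:
    """Apply the slide's rules given two posterior probabilities and the current bounds."""
    if pi_s > p_u:
        return "select S"   # superiority of S over E
    if pi_e > p_u:
        return "select E"   # superiority of E over S
    if pi_s < p_l and pi_e < p_l:
        return "futility"   # neither arm is likely to be shown superior
    return "continue"       # keep randomizing


# Hypothetical values at one interim analysis.
print(interim_decision(pi_s=0.02, pi_e=0.97, p_u=0.95, p_l=0.10))
```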

Page 8

Forward Simulation

Simulate the entire trial 5000 times assuming $\delta = 0$, and 5000 times assuming $\delta = \delta^*$:

1. For each interim analysis, calculate the posterior probabilities $\pi_E$ and $\pi_S$, and store $\pi_E$, $\pi_S$, and also [# of patients], [# events] for each treatment arm.

2. Apply the decision rule $d$ to obtain the expected utility for a trial using $d$.

3. Find the $d$ that maximizes the expected utility. (A complex search algorithm is required.)
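Because the simulated trials are stored, any candidate rule d can be scored by replaying it over them. The sketch below rests on several assumptions that are not in the slides: the criterion is the equally weighted average of the sample size at stopping under the null and the alternative, and smaller is better. A plain scan over a candidate list stands in for the complex search algorithm, and the size and power constraints are ignored. All names and data layouts are hypothetical.

```python
from typing import Sequence, Tuple

Look = Tuple[float, float, int, int]  # (pi_S, pi_E, n_patients, n_events) at one look
Rule = Tuple[float, float, float, float, float, float]  # (aU, bU, cU, aL, bL, cL)


def stopping_sample_size(trial: Sequence[Look], d: Rule, n_max: int) -> int:
    """Replay one stored trial under rule d; return the patients enrolled at stopping."""
    a_u, b_u, c_u, a_l, b_l, c_l = d
    for pi_s, pi_e, n_patients, n_events in trial:
        frac = n_events / n_max
        p_u = a_u - b_u * frac ** c_u
        p_l = a_l + b_l * frac ** c_l
        stop_superiority = pi_s > p_u or pi_e > p_u
        stop_futility = pi_s < p_l and pi_e < p_l
        if stop_superiority or stop_futility:
            return n_patients
    return trial[-1][2]  # never stopped early: sample size at the last look


def average_sample_size(null_trials, alt_trials, d: Rule, n_max: int) -> float:
    """0.5 * mean N under the null + 0.5 * mean N under the alternative."""
    def mean_n(trials):
        return sum(stopping_sample_size(t, d, n_max) for t in trials) / len(trials)
    return 0.5 * mean_n(null_trials) + 0.5 * mean_n(alt_trials)


def best_rule(candidates: Sequence[Rule], null_trials, alt_trials, n_max: int) -> Rule:
    """Pick the candidate with the smallest average sample size (simple scan)."""
    return min(candidates, key=lambda d: average_sample_size(null_trials, alt_trials, d, n_max))
```

A real search would also estimate the size and power of each candidate from the same stored trials and discard rules that miss the targets.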

Page 9

Examples of Hazard Functions (Models)

Hazard function for M1 = exponential distribution is constant

Page 15

A Metastatic Non-Small Cell Lung Cancer (NSCLC) Trial

Median overall survival (OS) in metastatic NSCLC is about 4 months

A phase III trial of localized surgery or radiation therapy versus systemic chemotherapy for metastatic NSCLC was designed with the goal to improve median progression-free survival (PFS) from 4 to 8 months

Initially, a conventional .05/.90 group sequential design with O’Brien-Fleming boundaries was planned, with up to 3 tests at 30, 60 and 89 events.
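As a rough check on these event counts, the number of events for a fixed-sample log-rank test can be approximated with Schoenfeld's formula. The sketch assumes a two-sided .05 test, 1:1 randomization, and exponential PFS in both arms (so that doubling the median from 4 to 8 months gives a hazard ratio of 0.5). None of these assumptions is stated on the slide.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio: float, alpha: float = 0.05, power: float = 0.90) -> float:
    """Events needed for a two-sided log-rank test with 1:1 allocation (Schoenfeld)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


# Exponential model: rate = ln(2) / median.
rate_s = math.log(2) / 4   # standard arm, median 4 months
rate_e = math.log(2) / 8   # experimental arm, median 8 months
hazard_ratio = rate_e / rate_s
events = schoenfeld_events(hazard_ratio)
print(math.ceil(events))  # prints 88
```

Under these assumptions the fixed-sample requirement is about 88 events, close to the 89 events at the final O’Brien-Fleming look.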

Page 16

Under the “usual” assumptions, accruing 2 to 4 patients/month, a typical O’Brien-Fleming .05/.90 group sequential design will require ~ 100 to 120 patients and take ~ 2 ½ to 4 ½ years to complete

Page 17

Analysis of Historical Data on PFS time in Metastatic NSCLC

A preliminary goodness-of-fit analysis, based on a published Kaplan-Meier plot of PFS times of NSCLC patients with metastatic disease, showed that the Log Normal distribution gave a much better fit than the Weibull or Exponential.

The proportional hazards assumption was very likely invalid.

The hazard function was very likely non-monotone.

Page 18

A BDOGS Design for the NSCLC Trial

1. To test $H_0: \delta = 0$ versus $H_1: \delta \neq 0$

2. Assume med(T) = 4 mos. for std. therapy

3. Type I Error = .05, Power = 0.90 for $\delta^* = 4$ months, improvement to med(T) = 8 mos.

4. Assume 2 patients per month accrual

5. Up to 5 interim analyses + 1 final analysis, at 25, 50, 75, 87, 112 and 122 events

6. Five possible models

Page 19

Possible Models (Hazard Functions)

M1 = constant (Exponential model)

M2 = increasing

M3 = decreasing

M4 = initially increasing, then a slight decrease

M5 = initially increasing, then a large decrease

A priori, the 5 models were assumed to be equally likely: Pr(M1) = … = Pr(M5) = .20.
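The hazard shapes listed above can be illustrated with standard parametric families. The slides only identify M1 as exponential, so the families below are assumptions: a Weibull for the monotone increasing and decreasing hazards, and a log-normal for a hazard that rises and then falls. All parameter values are hypothetical.

```python
import math


def exponential_hazard(t: float, rate: float) -> float:
    """Constant hazard."""
    return rate


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Increasing if shape > 1, decreasing if shape < 1, constant if shape == 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def lognormal_hazard(t: float, mu: float, sigma: float) -> float:
    """Density divided by survival; rises to a peak and then declines."""
    z = (math.log(t) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    surv = 0.5 * math.erfc(z / math.sqrt(2))
    return pdf / surv


# Hypothetical parameters; the log-normal has median exp(mu) = 4 months.
for t in (0.5, 4.0, 100.0):
    print(t, lognormal_hazard(t, mu=math.log(4), sigma=1.0))
```

The size of the decline after the peak depends on the log-normal parameters, which is one way to obtain both a slight and a large decrease.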

Page 20

Non-Constant Hazard Functions (Models)

Page 21

Simulation Study for the NSCLC Trial

For comparability in the simulations:

An O’Brien-Fleming design was constructed to have the same 6 looks, for both superiority (reject the null) and inferiority (accept the null) decisions.

Both designs had the same maximum sample size N = 122 patients.

For each case (underlying true PFS distribution) studied, the data were simulated ahead of time and each method was presented with the same data.

Page 22

Non-constant Hazards Used in Simulation Study for S (solid line) and E (dashed line)

Page 23

Simulation Results: Null Case

[Figure, panel A: Null Case. Box plots of sample size (y-axis, 30 to 130) for B and OF under each simulated case: Exp, LN-BF, W-BF, WD, LN-ID2, LN-ID3, Exp]

B = BDOGS, OF = O’Brien-Fleming

Lower - Upper Lines = 2.5 - 97.5 Percentiles; Line in Box = Median

Box = 25 - 75 Percentiles; Dot in Box = Mean

Page 24

Simulation Results: Alternative Case

[Figure, panel B: Alternative Case. Box plots of sample size (y-axis, 30 to 130) for B and OF under each simulated case: Exp, LN-BF, W-BF, WD, LN-ID2, LN-ID3, WI]

B = BDOGS, OF = O’Brien-Fleming

Lower - Upper Lines = 2.5 - 97.5 Percentiles; Line in Box = Median

Box = 25 - 75 Percentiles; Dot in Box = Mean

Page 25

Simulation Results

If the hazard is constant, both BDOGS and OF maintain targeted size and power, but OF requires a much larger sample (33% to 51% more patients)

Page 26

Simulation Results

If the hazard is Log Normal, both BDOGS and OF maintain targeted size and power, but OF requires a much larger sample

Page 27

Simulation Results

If the hazard is Weibull, both BDOGS and OF maintain targeted power, BDOGS has a reduced size = .02, and OF requires a much larger sample

Page 28

Simulation Results

If the hazard is Weibull with decreasing hazard, BDOGS has size .07, OF has reduced power .81, and OF requires a much larger sample

Page 29

Simulation Results

If the hazard is Weibull with increasing hazard, both methods have greatly reduced size .01, OF has greatly increased power .99, and OF has a 61% to 141% larger sample size