
Clinical Trials 2008; 5: 93–106
ARTICLE

A predictive probability design for phase II cancer clinical trials

J Jack Lee and Diane D Liu

Background Two- or three-stage designs are commonly used in phase II cancer clinical trials. These designs possess good frequentist properties and allow early termination of the trial when the interim data indicate that the experimental regimen is inefficacious. The rigid study design, however, can be difficult to follow exactly because the response has to be evaluated at a prespecified, fixed number of patients.

Purpose Our goal is to develop an efficient and flexible design that possesses desirable statistical properties.

Methods A flexible design based on Bayesian predictive probability and the minimax criterion is constructed. A three-dimensional search algorithm is implemented to determine the design parameters.

Results The new design controls type I and type II error rates, and allows continuous monitoring of the trial outcome. Consequently, under the null hypothesis when the experimental treatment is not efficacious, the design is more efficient in stopping the trial earlier, which results in a smaller expected sample size. Exact computation and simulation studies demonstrate that the predictive probability design possesses good operating characteristics.

Limitations The predictive probability design is more computationally intensive than two- or three-stage designs. Similar to all designs with early stopping due to futility, the resulting estimate of treatment efficacy may be biased.

Conclusions The predictive probability design is efficient and remains robust in controlling type I and type II error rates when the trial conduct deviates from the original design. It is more adaptable than traditional multi-stage designs in evaluating the study outcome; hence, it is easier to implement. S-PLUS/R programs are provided to assist the study design. Clinical Trials 2008; 5: 93–106. http://ctj.sagepub.com

Introduction

Phase II studies play a pivotal role in drug development. The main purpose of phase II clinical trials is to determine whether a new treatment demonstrates sufficient efficacy to warrant further investigation. These trials help to screen out inefficacious agents and avoid sending them to large phase III trials [1,2]. A commonly used primary endpoint in phase II cancer clinical trials is the clinical response to a treatment, which is a binary endpoint defined as the patient achieving complete or partial response within a predefined treatment course. In the early phase II development of new drugs, most trials are open label, single-arm studies, while late phase II trials tend to be multi-arm, randomized studies. Overviews of statistical methods for phase II trials can be found in the work of Thall and Simon [3], Kramar et al. [4], Mariani and Marubini [5], and Lee and Feng [6].

In this paper, we focus on the one-arm phase II trials with binary endpoints. Multi-stage designs are often implemented in such settings to increase the study efficiency by allowing early termination if the treatment is deemed inefficacious. The first phase II design proposed for cancer research is a two-stage procedure by Gehan [7]. With that design, if no responses are observed in the first stage, the new treatment is abandoned. Otherwise, additional patients are enrolled in the second stage to provide a better estimation of the response rate. The commonly used Simon’s designs [8] are also two-stage designs. Simon’s designs allow early stopping due to futility, i.e., when lack of efficacy is observed. If the experimental treatment works well, more patients are treated in the second stage, which also allows better estimation of the response rate and toxicity of the new treatment, before launching a phase III trial. Under the null hypothesis, the optimal two-stage design minimizes the expected sample size, while the minimax design minimizes the maximum sample size. Both designs are subject to the constraints of type I and type II error rates. Many other multi-stage designs with different objectives and optimization criteria can be found in the literature [9–12]. Jung et al. [13] devised a graphical method for searching all design parameters to attain certain admissible two-stage designs with good design characteristics. They later generalized the method and identified a family of admissible designs using a Bayesian decision-theoretic criterion [14].

Author for correspondence: J Jack Lee, Ph.D., Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, 1515 Holcombe Blvd., Unit 447, Houston, Texas 77030, USA. E-mail: [email protected]

Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA

© Society for Clinical Trials 2008. SAGE Publications, Los Angeles, London, New Delhi and Singapore. 10.1177/1740774508089279

Multi-stage designs possess better statistical properties than single-stage designs by utilizing the information gained in the interim data. Furthermore, three-stage designs are generally more efficient than two-stage designs because the additional interim look at the data allows for earlier decision to stop the trial if convincing evidence to support the null or alternative hypothesis is found. These designs, however, are more difficult to conduct because of the rigid requirement of examining the outcome at the specified sample size in each fixed stage. The strict sample size guideline in each stage is particularly difficult to adhere to in multi-center trials due to the complexity of coordinating patient accrual and follow-up across multiple sites. Temporarily halting the study accrual can also stall the momentum of the trial and lower the enthusiasm for investigators to participate in the trial. In addition, when the actual conduct deviates from the original design, the stopping boundaries are left undefined and the planned statistical properties no longer hold. Many authors have recognized this problem and have offered solutions. Green and Dahlberg [15] examined the performance of planned versus attained designs and gave an empirical solution by adapting the stopping rules to achieve desirable statistical properties when the actual sample size deviates from the planned one. Herndon proposed a hybrid design by blending the one-stage and two-stage designs, which allows uninterrupted accrual between the stages [16]. Chen and Ng [17] gave a collection of two-stage designs with a range of sample sizes in the first and second stages to construct optimal flexible designs.

Another limitation of the fixed design is that even after an extra long series of failures is observed, there is no formal mechanism to stop the trial before the predetermined sample size is reached. Furthermore, investigators often need to decide whether to continue or to terminate a trial at certain interim points, not always at the time points initially planned, due to slow accrual or other practical considerations [18]. This lack of design flexibility exposes a fundamental limitation of all frequentist-based methods because statistical inference is made by computing the probability of observing certain data conditioned on a particular design and the sampling plan. When there is a disparity between the proposed design and the actual trial conduct, which is more a norm than an exception in clinical trials, adjustments must be made in all statistical inferences. All these reasons support the need for more flexible designs.

Bayesian methods, on the other hand, offer a different approach for designing and monitoring clinical trials by computing the probability of parameters given data. Based on the likelihood principle, all information pertinent to the parameters is contained in the data and is not constrained by the design. Bayesian methods are particularly appealing in clinical trial design because, inherently, they allow for flexibility in trial conduct and impart the ability to examine interim data, update the posterior probability of parameters, and make sensible decisions accordingly. Bayesian methods can also incorporate relevant information, both internal and external to the trial, for decision making [19,20]. Several Bayesian phase II designs have been proposed in the literature. Sylvester proposed a decision theoretical approach by minimizing the expected loss to determine an optimal Bayesian design [21]. Thall and Simon developed a design with continuous monitoring until achieving a high posterior probability that either a drug is promising or not promising, or until reaching the maximum sample size [22,23]. Also based on the posterior probability, Heitjan [24] advocated the use of ‘persuasion probability’ as a consistent criterion for determining whether the drug is promising or not. Tan and Machin [25] constructed Bayesian two-stage designs, which mimic the frequentist alternatives, by calibrating design parameters based on the posterior probability approach. Similar approaches based on informative priors or mixtures of informative priors have also been reported [26,27].

Applying the concept of predictive probability, we take the proper Bayesian approach in the sense that we apply the full likelihood approach, but without specifying a loss function as is done in the decision theoretical approach. Hence, we do not compute the Bayes risk [28]. One advantage of the predictive probability approach is that it closely mimics the clinical decision making process. Predictive probability is obtained by calculating the probability of a positive conclusion (rejecting the null hypothesis) should the trial be conducted to the maximum planned sample size given the interim observed data. In this framework, the chance that the trial will show a conclusive result at the end of the study, given the current information, is evaluated. Then, the decision to continue or to stop the trial can be made according to the strength of the predictive probability. Several adaptations of predictive probability approaches for interim monitoring of clinical trials have been discussed [29–31, 32–34, 35–39]. Notably, Choi et al. [29] calculated predictive probability in making an early decision in a large sample two-arm trial where the outcome is the proportion of successes, using normal approximation under a large sample assumption. Spiegelhalter et al. [31] suggested that the predictive power, derived by averaging the conditional power based on current data over the current belief about the unknown parameters, should be used to detect significant differences between treatments. As for trial design, Herson [36] first proposed the predictive probability approach for designing phase II clinical trials with dichotomous outcomes. The proposed maximum sample size and critical region originate from a frequentist one-stage design. The choice of the cutoff values for the predictive probability is somewhat arbitrary and does not guarantee the type I and type II error rates.

Under the predictive probability framework, our approach searches for the design parameters within the given constraints such that both the size and power of the test can be guaranteed. In the section ‘Methods for Simon’s two-stage design and predictive probability design’, we give an overview of Simon’s two-stage design and define the proposed predictive probability approach in a Bayesian setting. Exact computation and a searching procedure have been developed to facilitate the predictive probability design. In the section ‘Comparison between predictive probability approach and Simon’s two-stage design’, we investigate the property of the predictive probability approach and compare its performance with Simon’s two-stage design through several examples. In the section ‘Properties of the predictive probability approach’, we describe the simulation studies for evaluating the robustness of the design when trial conduct deviates from the originally proposed design and when the trial is stopped prematurely. The estimation bias and the comparison between the predictive probability and the posterior probability approaches are also given. We provide a discussion in the final section.

Methods for Simon’s two-stage design and predictive probability design

Under the hypothesis testing framework, a phase IIA clinical trial is designed to test

$$H_0: p \le p_0$$

$$H_1: p \ge p_1$$

where $p_0$ represents a prespecified response rate of the standard treatment and $p_1$ represents a target response rate of a new treatment. A study is designed such that

$$P(\text{Accept New Treatment} \mid H_0) \le \alpha$$

and

$$P(\text{Reject New Treatment} \mid H_1) \le \beta$$

where $\alpha$ and $\beta$ are type I and type II error rates, respectively. Given $p_0$, $p_1$, the maximum number of patients, number of stages, cohort size at each stage, acceptance region and rejection region at each stage, the type I and type II error rates, the probability of early termination (PET) of the trial and the expected sample size ($E(N)$) under $H_0$ can be calculated by applying the recursive formulas of Schultz et al. [38]. A desirable design is one with operating characteristics that satisfy the constraints of the type I and type II error rates, with a high probability of early termination and a small expected sample size under $H_0$. Note that we define the acceptance region as the outcome space leading to the acceptance of the new treatment (i.e., reject the null hypothesis) and the rejection region as the outcome space leading to the rejection of the agent (i.e., fail to reject the null hypothesis).

Simon’s two-stage designs

The algorithm for implementing Simon’s two-stage designs is as follows:

Stage I: Enroll $n_1$ patients. Stop the trial and reject the new treatment (stop for futility) if the number of observed responses, $x_1$, is $r_1$ or less. Otherwise, continue to stage II.

Stage II: Enroll $(N_{max} - n_1)$ patients to reach a total of $N_{max}$ patients. Reject the new treatment if the number of responses is $r$ or less. Otherwise, consider that the new treatment warrants further development.

Based on these decision rules, PET and $E(N)$ under $H_0$ can be calculated:

$$\mathrm{PET}(p_0) = \mathrm{Prob}(\text{Early Termination} \mid H_0) = \mathrm{Prob}(X_1 \le r_1 \mid H_0),$$

where $X_1$ is the number of responses in $n_1$ ($n_1 < N_{max}$) patients from the first stage, and $X_1$ follows a binomial distribution with $X_1 \sim \mathrm{binomial}(n_1, p_0)$,

$$E(N \mid p_0) = n_1 + [1 - \mathrm{PET}(p_0)] \times (N_{max} - n_1).$$

Among all designs that satisfy the constraint of the type I and type II error rates, Simon’s optimal two-stage design is obtained when $E(N \mid p_0)$ is the smallest. On the other hand, the minimax design is defined when the maximum sample size is the smallest.
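The two formulas above can be checked numerically. The following is a minimal Python sketch, not the authors' S-PLUS/R program; the function names (`binom_pmf`, `binom_cdf`, `simon_two_stage`) are hypothetical helpers introduced here for illustration. It evaluates PET, $E(N)$ and the probability of accepting the treatment for the optimal two-stage design quoted later in the lung cancer example ($n_1 = 17$, $r_1 = 3$, $N_{max} = 37$, $r = 10$, $p_0 = 0.2$, $p_1 = 0.4$).

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ binomial(n, p); 0 when k < 0, 1 when k >= n."""
    return sum(binom_pmf(j, n, p) for j in range(min(k, n) + 1))


def simon_two_stage(n1, r1, n_max, r, p):
    """Return (PET, E(N), P(accept new treatment)) at true response rate p."""
    pet = binom_cdf(r1, n1, p)                    # stop after stage I if X1 <= r1
    expected_n = n1 + (1 - pet) * (n_max - n1)    # E(N | p)
    n2 = n_max - n1
    # Accept only if stage I is passed (x1 > r1) and the total exceeds r.
    accept = sum(
        binom_pmf(x1, n1, p) * (1 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    return pet, expected_n, accept


pet, expected_n, alpha = simon_two_stage(17, 3, 37, 10, p=0.2)
_, _, power = simon_two_stage(17, 3, 37, 10, p=0.4)
print(f"PET(p0) = {pet:.2f}, E(N | p0) = {expected_n:.2f}")
print(f"type I error = {alpha:.3f}, type II error = {1 - power:.3f}")
```

The printed values can be compared with the operating characteristics reported for this design in Table 2.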

Predictive probability approach in a Bayesian setting

In the Bayesian approach, we assume that the prior distribution of the response rate, $\pi(p)$, follows a beta distribution, $\mathrm{beta}(a_0, b_0)$. It represents the investigator’s previous knowledge or belief of the efficacy of the new regimen. The quantity $a_0/(a_0 + b_0)$ reflects the prior mean while the size of $a_0 + b_0$ indicates how informative the prior is. The quantities $a_0$ and $b_0$ can be considered as the number of responses and the number of nonresponses, respectively. Thus, $a_0 + b_0$ can be considered as a measure of the amount of information contained in the prior. The larger the value of $a_0 + b_0$, the more informative the prior and the stronger the belief it contains. We set a maximum accrual of patients to $N_{max}$. We assume the number of responses in the current $n$ ($n \le N_{max}$) patients, $X$, follows a binomial distribution, $\mathrm{binomial}(n, p)$, and the likelihood function for the observed data $x$ is

$$L_x(p) \propto p^x \times (1 - p)^{n - x}.$$

Consequently, the posterior distribution of the response rate follows a beta distribution

$$P \mid x \sim \mathrm{beta}(a_0 + x, b_0 + n - x).$$

Thus, the number of responses in the potential $m = N_{max} - n$ future patients, $Y$, follows a beta-binomial distribution, $\text{beta-binomial}(m, a_0 + x, b_0 + n - x)$.

When $Y = i$, we denote the posterior probability of $P$ as $f(p \mid X = x, Y = i)$, where $P \mid X = x, Y = i \sim \mathrm{beta}(a_0 + x + i, b_0 + N_{max} - x - i)$.
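The conjugate update and the beta-binomial predictive distribution can be illustrated with a short Python sketch. The helper names (`log_beta`, `posterior_params`, `beta_binomial_pmf`) are assumptions made for this illustration and are not taken from the authors' programs. The inputs are those of the worked example in Table 1, Panel B ($N_{max} = 40$, $x = 16$, $n = 23$, prior $\mathrm{beta}(0.6, 0.4)$).

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_params(a0, b0, x, n):
    """Posterior beta parameters after x responses in n patients."""
    return a0 + x, b0 + n - x


def beta_binomial_pmf(i, m, a, b):
    """P(Y = i) for Y ~ beta-binomial(m, a, b)."""
    return comb(m, i) * exp(log_beta(a + i, b + m - i) - log_beta(a, b))


a, b = posterior_params(0.6, 0.4, x=16, n=23)   # posterior is beta(16.6, 7.4)
m = 40 - 23                                     # 17 future patients
pmf = [beta_binomial_pmf(i, m, a, b) for i in range(m + 1)]
for i, prob in enumerate(pmf):
    print(f"Prob(Y = {i:2d} | x) = {prob:.4f}")
```

The printed probabilities correspond to the second column of Table 1, Panel B.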

From here on, the observed response $X = x$ will be abbreviated as $x$. To calculate the predictive probability, we further define

$$B_i = \mathrm{Prob}(P > p_0 \mid x, Y = i),$$

where $P$ follows a beta distribution. $B_i$ measures the probability that the response rate is larger than $p_0$ given $x$ responses in $n$ patients in the current data and $i$ responses in $m$ future patients. Comparing $B_i$ to a threshold value $\theta_T$ yields an indicator $I_i$ for considering that the treatment is efficacious at the end of the trial given the current data and the potential outcome of $Y = i$ (see Table 1, Panel A for a schematic representation).

We define

$$\begin{aligned}
\text{Predictive Probability (PP)} &= \sum_{i=0}^{m} \{\mathrm{Prob}(Y = i \mid x) \times I(\mathrm{Prob}(P > p_0 \mid x, Y = i) > \theta_T)\} \\
&= \sum_{i=0}^{m} \{\mathrm{Prob}(Y = i \mid x) \times I(B_i > \theta_T)\} \\
&= \sum_{i=0}^{m} \{\mathrm{Prob}(Y = i \mid x) \times I_i\}
\end{aligned}$$

where $\mathrm{Prob}(Y = i \mid x)$ is the probability of observing $i$ responses in $m$ patients given current data $x$, where $Y$ follows a beta-binomial distribution. The weighted sum of indicator $I_i$ over $Y$ yields the predictive probability (PP) of concluding a positive result by the end of the trial based on the cumulative information in the current stage. A high PP means that the treatment is likely to be efficacious by the end of the study, given the current data, whereas a low PP suggests that the treatment may not have sufficient activity. Therefore, PP can be used to determine whether the trial should be stopped early due to efficacy/futility or continued because the current data are not yet conclusive. The decision rules can be constructed as follows:

If $\mathrm{PP} < \theta_L$, then stop the trial and reject the alternative hypothesis;

If $\mathrm{PP} > \theta_U$, then stop the trial and reject the null hypothesis;

Otherwise continue to the next stage until reaching $N_{max}$ patients.
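The three rules above can be written as a small function. This is only a sketch: the function name `pp_decision` and the returned labels are hypothetical, and the PP values passed in below are arbitrary illustrations rather than results from the paper.

```python
def pp_decision(pp, theta_l, theta_u=1.0):
    """Apply the PP stopping rules for a given predictive probability."""
    if pp < theta_l:
        return "stop: reject the alternative hypothesis"
    if pp > theta_u:
        return "stop: reject the null hypothesis"
    return "continue"


# Arbitrary illustrative values: theta_l = 0.10 and theta_u = 1.0.
for pp in (0.05, 0.50, 1.00):
    print(pp, "->", pp_decision(pp, theta_l=0.10, theta_u=1.0))
```

With `theta_u = 1.0` the second branch can never be taken, because a probability cannot exceed 1; the trial then stops early only for futility.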

Typically, we choose $\theta_L$ as a small positive number and $\theta_U$ as a large positive constant, both between 0 and 1 (inclusive). $\mathrm{PP} < \theta_L$ indicates that it is unlikely the response rate will be larger than $p_0$ at the end of the trial given the current information. When this happens, we may as well stop the trial and reject the alternative hypothesis at that point. On the other hand, when $\mathrm{PP} > \theta_U$, the current data suggest that, if the same trend continues, we will have a high probability to conclude that the treatment is efficacious at the end of the study. This result, then, provides evidence to stop the trial early due to efficacy. By choosing $\theta_L > 0$ and $\theta_U < 1.0$, the trial can terminate early due to either futility or efficacy. For phase IIA trials, we prefer to choose $\theta_L > 0$ and $\theta_U = 1.0$ to allow early stopping due to futility, but not due to efficacy.

For example (Table 1, Panel B), in a phase II trial, an investigator plans to enroll a maximum of $N_{max} = 40$ patients into the study. At a given time, $x = 16$ responses are observed in $n = 23$ patients. What is $P(\text{response rate} > 60\%)$? We assume a prior distribution of the response rate ($p$) of $\mathrm{beta}(0.6, 0.4)$. The number of responses in the future $m = 17$ patients, $Y$, then follows a beta-binomial distribution, $\text{beta-binomial}(17, 16.6, 7.4)$. At each possible value of $Y = i$, the posterior probability of $P$ follows a beta distribution, $P \mid x, Y = i \sim \mathrm{beta}(16.6 + i, 24.4 - i)$. In this example, we set $\theta_T = 0.90$. As can be seen, when $Y$ is in the range from 0 to 11, the resulting $P(\text{response rate} > 60\%)$ ranges from 0.0059 to 0.8415. Hence, we will conclude $H_0$ for $Y \le 11$. On the other hand, when $Y$ goes from 12 to 17, the resulting $P(\text{response rate} > 60\%)$ ranges from 0.9089 to 0.9990. Therefore, we will conclude $H_1$ for $Y \ge 12$. The predictive probability is then the weighted average (weighted by the probability of the realization of each $Y$) of the indicator of a positive trial should the current trend continue and the trial be conducted until the end of the study. The calculation yields $\mathrm{PP} = 0.5656$. If we choose $\theta_L = 0.10$, the trial will not be stopped because PP is greater than $\theta_L$.
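The worked example can be reproduced with a self-contained Python sketch. It is not the authors' S-PLUS/R code: the function names are hypothetical, and the beta distribution function is evaluated with a standard continued-fraction expansion of the regularized incomplete beta function, an implementation choice made here so that only the standard library is needed.

```python
from math import comb, exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction for the regularized incomplete beta function."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        for num in (
            k * (b - k) * x / ((a + 2 * k - 1) * (a + 2 * k)),
            -(a + k) * (a + b + k) * x / ((a + 2 * k) * (a + 2 * k + 1)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:   # last correction factor is close to 1
            break
    return h


def beta_cdf(x, a, b):
    """P(P <= x) for P ~ beta(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_binomial_pmf(i, m, a, b):
    """P(Y = i) for Y ~ beta-binomial(m, a, b)."""
    log_b = lambda u, v: lgamma(u) + lgamma(v) - lgamma(u + v)
    return comb(m, i) * exp(log_b(a + i, b + m - i) - log_b(a, b))


def predictive_probability(x, n, n_max, a0, b0, p0, theta_t):
    """PP = sum over i of Prob(Y = i | x) * I(B_i > theta_T)."""
    a, b = a0 + x, b0 + n - x          # posterior after the interim data
    m = n_max - n                      # number of future patients
    pp = 0.0
    for i in range(m + 1):
        b_i = 1.0 - beta_cdf(p0, a + i, b + m - i)   # Prob(P > p0 | x, Y = i)
        if b_i > theta_t:
            pp += beta_binomial_pmf(i, m, a, b)
    return pp


pp = predictive_probability(x=16, n=23, n_max=40, a0=0.6, b0=0.4, p0=0.6, theta_t=0.90)
print(f"PP = {pp:.4f}")
```

The printed value can be compared with the PP reported in the example above.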

Table 1 Bayesian predictive probability approach: a schema and an example

Panel A: Schema

| $Y = i$ | $\mathrm{Prob}(Y = i \mid x)$ | $B_i = \mathrm{Prob}(P > p_0 \mid x, Y = i)$ | $I_i(B_i > \theta_T)$ |
|---|---|---|---|
| 0 | $\mathrm{Prob}(Y = 0 \mid x)$ | $B_0 = \mathrm{Prob}(P > p_0 \mid x, Y = 0)$ | 0 |
| 1 | $\mathrm{Prob}(Y = 1 \mid x)$ | $B_1 = \mathrm{Prob}(P > p_0 \mid x, Y = 1)$ | 0 |
| ... | ... | ... | ... |
| $m - 1$ | $\mathrm{Prob}(Y = m - 1 \mid x)$ | $B_{m-1} = \mathrm{Prob}(P > p_0 \mid x, Y = m - 1)$ | 1 |
| $m$ | $\mathrm{Prob}(Y = m \mid x)$ | $B_m = \mathrm{Prob}(P > p_0 \mid x, Y = m)$ | 1 |

Panel B: Example: $N_{max} = 40$, $x = 16$, $n = 23$, prior distribution of $P \sim \mathrm{beta}(0.6, 0.4)$

| $Y = i$ | $\mathrm{Prob}(Y = i \mid x)$ | $B_i = \mathrm{Prob}(P > 60\% \mid x, Y = i)$ | $I_i(B_i > 0.90)$ |
|---|---|---|---|
| 0 | 0.0000 | 0.0059 | 0 |
| 1 | 0.0000 | 0.0138 | 0 |
| 2 | 0.0001 | 0.0296 | 0 |
| 3 | 0.0006 | 0.0581 | 0 |
| 4 | 0.0021 | 0.1049 | 0 |
| 5 | 0.0058 | 0.1743 | 0 |
| 6 | 0.0135 | 0.2679 | 0 |
| 7 | 0.0276 | 0.3822 | 0 |
| 8 | 0.0497 | 0.5085 | 0 |
| 9 | 0.0794 | 0.6349 | 0 |
| 10 | 0.1129 | 0.7489 | 0 |
| 11 | 0.1426 | 0.8415 | 0 |
| 12 | 0.1587 | 0.9089 | 1 |
| 13 | 0.1532 | 0.9528 | 1 |
| 14 | 0.1246 | 0.9781 | 1 |
| 15 | 0.0811 | 0.9910 | 1 |
| 16 | 0.0381 | 0.9968 | 1 |
| 17 | 0.0099 | 0.9990 | 1 |

Designing a trial using the PP approach: Search $N_{max}$, $\theta_L$, $\theta_T$, and $\theta_U$ for the PP design to satisfy the constraints of the type I and type II error rates

Given $p_0$, $p_1$, the prior distribution of response rate $\pi(p)$ and the cohort size for interim monitoring, we search the design parameters, including the maximum sample size $N_{max}$, $\theta_L$, $\theta_T$, and $\theta_U$, to yield a design satisfying the type I and type II error rates constraints simultaneously. As mentioned earlier, we choose $\theta_U = 1.0$ because if the treatment is working, there is little reason to stop the trial early: enrolling more patients to the active treatment is good. Treating more patients until the maximum sample size is reached (usually, less than 100) can also increase the precision in estimating the response rate. Given $N_{max}$, the question is, ‘Are there values of $\theta_L$ and $\theta_T$ that yield desirable design properties?’ By searching over all possible values of $\theta_L$ and $\theta_T$, our goal is to identify the combinations of $\theta_L$ and $\theta_T$ to yield the desirable power within the constraint of the specified type I error rates. There may exist ranges of $\theta_L$ and $\theta_T$ that satisfy the constraints. By varying $N_{max}$ from small to large, the design with the smallest $N_{max}$ that controls both type I and type II error rates at the nominal level is the one we choose. This idea is similar to finding the minimax design, i.e., minimizing the maximum sample size.

The framework of this PP method allows the investigator to monitor the trial continuously or by any cohort size. We choose to start computing PP and making interim decisions after the first 10 patients have been treated and evaluated for their response status. Although the choice of treating a minimum of 10 patients is somewhat arbitrary, a minimum number of patients is required to provide sufficient information in order to obtain a good estimate of the treatment efficacy and avoid making premature decisions based on spurious results from a small number of patients. After 10 patients, we calculate PP continuously (i.e., with cohort size of 1) to monitor the treatment efficacy. If PP is low, the data indicate that the treatment may not be efficacious. A sufficiently low PP (e.g., $\mathrm{PP} < \theta_L$) suggests that the trial could be stopped early due to lack of efficacy. Note that PP can be computed for any cohort size and at any interim time. A trial can be stopped anytime due to excessive toxicity, however.
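Continuous monitoring can be turned into a table of futility boundaries: for each $n$ from 10 to $N_{max}$, find the largest number of responses for which $\mathrm{PP} < \theta_L$. The Python sketch below does this under stated assumptions. It is not the authors' search program, all function names are hypothetical, and the inputs are borrowed from the lung cancer example reported later in the paper ($N_{max} = 36$, $\theta_L = 0.001$, prior $\mathrm{beta}(0.2, 0.8)$, $p_0 = 0.2$), with $\theta_T = 0.90$ chosen here as one value inside the reported interval.

```python
from math import comb, exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction for the regularized incomplete beta function."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        for num in (
            k * (b - k) * x / ((a + 2 * k - 1) * (a + 2 * k)),
            -(a + k) * (a + b + k) * x / ((a + 2 * k) * (a + 2 * k + 1)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_sf(x, a, b):
    """P(P > x) for P ~ beta(a, b), with 0 < x < 1."""
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return 1.0 - front * _beta_cf(a, b, x) / a
    return front * _beta_cf(b, a, 1.0 - x) / b


def beta_binomial_pmf(i, m, a, b):
    """P(Y = i) for Y ~ beta-binomial(m, a, b)."""
    log_b = lambda u, v: lgamma(u) + lgamma(v) - lgamma(u + v)
    return comb(m, i) * exp(log_b(a + i, b + m - i) - log_b(a, b))


def predictive_probability(x, n, n_max, a0, b0, p0, theta_t):
    """PP given x responses in n patients; at n = n_max it is 0 or 1."""
    a, b, m = a0 + x, b0 + n - x, n_max - n
    return sum(
        beta_binomial_pmf(i, m, a, b)
        for i in range(m + 1)
        if beta_sf(p0, a + i, b + m - i) > theta_t
    )


def futility_bounds(n_max, a0, b0, p0, theta_l, theta_t, n_start=10):
    """Map n -> largest x with PP < theta_L (-1 if no x leads to stopping)."""
    bounds = {}
    for n in range(n_start, n_max + 1):
        r = -1
        for x in range(n + 1):
            if predictive_probability(x, n, n_max, a0, b0, p0, theta_t) < theta_l:
                r = x
            else:
                break   # PP increases with x, so no larger x can qualify
        bounds[n] = r
    return bounds


bounds = futility_bounds(36, a0=0.2, b0=0.8, p0=0.2, theta_l=0.001, theta_t=0.90)
for n, r in bounds.items():
    print(f"n = {n}: stop if responses <= {r}")
```

The early exit in the inner loop relies on PP being nondecreasing in the number of observed responses, which is an assumption of this sketch rather than a statement from the paper.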

Comparison between predictive probability approach and Simon’s two-stage design

The comparison between the PP design and Simon’s two-stage design is performed by examining the design’s operating characteristics, such as the type I error rate, power, $\mathrm{PET}(p_0)$ and $E(N \mid p_0)$. The concept of designing trials using the PP approach is illustrated in the following examples.

Example 1: A lung cancer trial

The primary objective of this study is to assess the efficacy of a combination therapy as front-line treatment in patients with advanced nonsmall cell lung cancer. The study involves the combination of a vascular endothelial growth factor antibody plus an epidermal growth factor receptor tyrosine kinase inhibitor. The primary endpoint is the clinical response rate (i.e., rate of complete response and partial response combined) to the new regimen.

The current standard treatment yields a response rate of approximately 20% ($p_0$). The target response rate of the new regimen is 40% ($p_1$). With the constraint of both type I and type II error rates $\le 0.1$, Simon’s optimal two-stage design yields $n_1 = 17$, $r_1 = 3$, $N_{max} = 37$, $r = 10$, $\mathrm{PET}(p_0) = 0.55$ and $E(N \mid p_0) = 26.02$ with $\alpha = 0.095$ and $\beta = 0.097$. The corresponding minimax design yields $n_1 = 19$, $r_1 = 3$, $N_{max} = 36$, $r = 10$, $\mathrm{PET}(p_0) = 0.46$ and $E(N \mid p_0) = 28.26$ with $\alpha = 0.086$ and $\beta = 0.098$.

Taking the PP approach, we assume that the response rate $P$ has a prior distribution of $\mathrm{beta}(0.2, 0.8)$. The trial is monitored continuously after evaluating the responses of the first 10 patients. For each $N_{max}$ between 25 and 50, we search the $\theta_L$ and $\theta_T$ space to generate designs that have both type I and type II error rates under 0.10. Listed in Table 2 are some of the results in the order of the maximum sample size, $N_{max}$. Among all the designs, the design with $N_{max} = 36$ (the first predictive probability row of Table 2) is the design with the smallest $N_{max}$ that has both type I and type II error rates controlled under 0.1.

Based on this setting, $\theta_L$ and $\theta_T$ are determined to be 0.001 and [0.852, 0.922], respectively. The corresponding rejection region (in number of responses/$n$) is 0/10, 1/17, 2/21, 3/24, 4/27, 5/29, 6/31, 7/33, 8/34, 9/35, and 10/36. The trial will be stopped and the treatment is considered ineffective when the number of responses first falls into the rejection region. Based on these boundaries, if the true response rate is 20%, the probability of accepting the treatment is 0.088. On the other hand, if the true response rate is 40%, the probability of accepting the treatment is 0.906. The probability of stopping the trial early is 0.86 and the expected sample size is 27.67 when the true response rate is 20%. Compared to Simon’s minimax two-stage design, the trial is monitored more frequently in the PP design, which also has a larger probability of early termination and a smaller expected sample size in the null case. Both designs have the same maximum sample size with controlled type I and type II error rates.
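Given the rejection region listed above, the operating characteristics can be obtained by exact enumeration over the number of responses. The Python sketch below is an illustration under stated assumptions, not the authors' exact-computation program: the function names are hypothetical, and each listed boundary $r/n$ is assumed to stay in force until the next listed $n$ is reached.

```python
def expand_bounds(points, n_max):
    """Expand (r, n) pairs into a map n -> r for every monitored n."""
    first_n = points[0][1]
    return {
        n: max(r for r, n_r in points if n_r <= n)
        for n in range(first_n, n_max + 1)
    }


def operating_characteristics(p, n_max, bounds):
    """Return (P(accept treatment), PET, E(N)) at true response rate p."""
    dist = {0: 1.0}              # P(x responses so far and trial not stopped)
    pet = expected_n = 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for x, pr in dist.items():               # outcome of patient n
            nxt[x] = nxt.get(x, 0.0) + pr * (1 - p)
            nxt[x + 1] = nxt.get(x + 1, 0.0) + pr * p
        r = bounds.get(n, -1)                    # no look before the first boundary
        stopped = sum(pr for x, pr in nxt.items() if x <= r)
        dist = {x: pr for x, pr in nxt.items() if x > r}
        expected_n += n * stopped
        if n < n_max:                            # rejection at n_max is not early
            pet += stopped
    accept = sum(dist.values())
    expected_n += n_max * accept
    return accept, pet, expected_n


# Rejection region of the lung cancer example, as (responses, n) pairs.
region = [(0, 10), (1, 17), (2, 21), (3, 24), (4, 27), (5, 29),
          (6, 31), (7, 33), (8, 34), (9, 35), (10, 36)]
bounds = expand_bounds(region, n_max=36)

alpha, pet, expected_n = operating_characteristics(0.2, 36, bounds)
power, _, _ = operating_characteristics(0.4, 36, bounds)
print(f"P(accept | p = 0.2) = {alpha:.3f}, P(accept | p = 0.4) = {power:.3f}")
print(f"PET(p0) = {pet:.2f}, E(N | p0) = {expected_n:.2f}")
```

The printed values can be compared with the figures quoted in the paragraph above; small differences would point to a different reading of the boundaries than the one assumed here.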

Example 2: A tongue cancer trial

The primary objective of a second study is to assess the efficacy of induction chemotherapy (with paclitaxel, ifosfamide, and carboplatin) followed by radiation in treating young patients with prior untreated squamous cell carcinoma of the tongue. Previous results show that radiation alone gives a response rate of 60% ($p_0$). With induction chemotherapy plus radiation, the target response rate is set at 80% ($p_1$). Under the constraint of a type I error rate $\le 0.05$ and a type II error rate $\le 0.20$, Simon’s optimal two-stage design yields $n_1 = 11$, $r_1 = 7$, $N_{max} = 43$, $r = 30$, $\mathrm{PET}(p_0) = 0.70$, and $E(N \mid p_0) = 20.48$ with $\alpha = 0.049$, $\beta = 0.198$. The corresponding minimax design yields $n_1 = 13$, $r_1 = 8$, $N_{max} = 35$, $r = 25$, $\mathrm{PET}(p_0) = 0.65$, and $E(N \mid p_0) = 20.77$ with $\alpha = 0.050$ and $\beta = 0.192$.

Taking the PP approach, we search the design parameters with a maximum number of patients ranging between 25 and 44, assuming the prior distribution of the response rate as $\mathrm{beta}(0.6, 0.4)$, and using continuous monitoring after the first 10 patients are evaluated. Table 3 lists the resulting designs. We find that the design with the smallest $N_{max}$ that has desirable type I and II error rates is the one with $N_{max} = 35$ (shown in boldface type in Table 3), where $\theta_L$ and $\theta_T$ are in the ranges of [0.075, 0.079] and [0.924, 0.963], respectively. In this design, if the true response rate is 60%, the probability of accepting the treatment is 0.050, the probability of early termination of the trial is 0.94 and the expected sample size is 16.87. On the other hand, if the true response rate is 80%, the probability of accepting the treatment is 0.815. Again, compared to Simon’s designs, the trial is monitored more frequently applying the PP approach and has a higher probability of early stopping with a smaller expected sample size, if the regimen is not efficacious. In this example, the maximum number of patients under the PP approach is also the same as that under Simon’s minimax two-stage design.

More examples

Operating characteristics of more examples using Simon’s minimax and optimal two-stage designs, and the PP design are shown in Table 4, with $p_0 = 0.1$ to 0.7 by 0.1 intervals, $p_1 - p_0 = 0.2$, type I and type II error rates of 0.10, and continuous monitoring after the first 10 evaluable patients in the PP design. While the type I and type II error rates are controlled in both PP and Simon’s two-stage designs, the probability of stopping the trial early is always larger in the PP design. In addition, we find that the PP design has a smaller expected sample size under the null hypothesis in all cases except when $p_0 = 0.1$ and $p_1 = 0.3$. Compared to multi-stage designs, PP designs are also more flexible to conduct and have more robust design properties when the trial conduct does not follow the plan exactly. This topic is described in greater detail in the next section.

Table 2 Operating characteristics of Simon’s two-stage designs and the PP design with type I and type II error rates ≤ 0.10, prior for $p$ = beta(0.2, 0.8), $p_0 = 0.2$, $p_1 = 0.4$

Simon’s minimax/optimal two-stage designs

| Design | $r_1/n_1$ | $r/N_{\max}$ | PET($p_0$) | $E(N \mid p_0)$ | $\alpha$ | $\beta$ |
|---|---|---|---|---|---|---|
| Minimax | 3/19 | 10/36 | 0.46 | 28.26 | 0.086 | 0.098 |
| Optimal | 3/17 | 10/37 | 0.55 | 26.02 | 0.095 | 0.097 |

Predictive probability design

| $\theta_L$ | $\theta_T$ | $r/N_{\max}$ | PET($p_0$) | $E(N \mid p_0)$ | $\alpha$ | $\beta$ |
|---|---|---|---|---|---|---|
| 0.001 | [0.852, 0.922]* | 10/36 | 0.86 | 27.67 | 0.088 | 0.094 |
| 0.011 | [0.830, 0.908] | 10/37 | 0.85 | 25.13 | 0.099 | 0.084 |
| 0.001 | [0.876, 0.935] | 11/39 | 0.88 | 29.24 | 0.073 | 0.092 |
| 0.001 | [0.857, 0.923] | 11/40 | 0.86 | 30.23 | 0.086 | 0.075 |
| 0.003 | [0.837, 0.910] | 11/41 | 0.85 | 30.27 | 0.100 | 0.062 |
| 0.043 | [0.816, 0.895] | 11/42 | 0.86 | 23.56 | 0.099 | 0.083 |
| 0.001 | [0.880, 0.935] | 12/43 | 0.88 | 32.13 | 0.072 | 0.074 |
| 0.001 | [0.862, 0.924] | 12/44 | 0.87 | 33.71 | 0.085 | 0.059 |
| 0.001 | [0.844, 0.912] | 12/45 | 0.85 | 34.69 | 0.098 | 0.048 |
| 0.032 | [0.824, 0.898] | 12/46 | 0.86 | 26.22 | 0.098 | 0.068 |
| 0.001 | [0.884, 0.936] | 13/47 | 0.89 | 35.25 | 0.071 | 0.058 |
| 0.001 | [0.868, 0.925] | 13/48 | 0.87 | 36.43 | 0.083 | 0.047 |
| 0.001 | [0.850, 0.914] | 13/49 | 0.86 | 37.86 | 0.095 | 0.038 |
| 0.020 | [0.832, 0.901] | 13/50 | 0.86 | 30.60 | 0.100 | 0.046 |

*The numbers within the brackets indicate a closed interval with both endpoints included. This applies to all intervals shown in the table.


Properties of the predictive probability approach

Robustness for the predictive probability design

We have discussed the schema of continuous monitoring after evaluating the first cohort. In practice, instead of continuous monitoring, investigators may want to monitor a trial by different cohort sizes. To examine the robustness of the PP design without continuous monitoring, we perform simulation studies. Specifically, once we choose $\theta_L$, $\theta_T$, and $N_{\max}$ based on prespecified type I and type II error rates to yield a desirable design, we evaluate the method’s robustness by fixing these design parameters but varying the number of stages and the cohort size in each stage.
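Because the design parameters are held fixed, the predictive probability can be evaluated at whatever interim look is actually taken. The sketch below is one way to compute it, assuming the beta-binomial definition of predictive probability given earlier in the paper (sum the beta-binomial probabilities of those future outcomes for which the final posterior probability $P(p > p_0)$ exceeds $\theta_T$). The function names are hypothetical, and the incomplete beta function is evaluated with a standard continued fraction rather than with the authors’ S-PLUS/R code.

```python
from math import comb, exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """P(p <= x) for p ~ beta(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(a * log(x) + b * log(1.0 - x) - log_beta(a, b))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def predictive_probability(x, n, n_max, p0, theta_t, a0, b0):
    """Predictive probability of a positive trial given x responses in n patients."""
    m = n_max - n                      # patients still to be enrolled
    a, b = a0 + x, b0 + n - x          # current posterior is beta(a, b)
    pp = 0.0
    for i in range(m + 1):             # i = number of future responses
        final_tail = 1.0 - beta_cdf(p0, a + i, b + m - i)
        if final_tail > theta_t:
            # beta-binomial probability of i responses among m future patients
            pp += exp(log(comb(m, i)) + log_beta(a + i, b + m - i) - log_beta(a, b))
    return pp


# Example 1 settings: prior beta(0.2, 0.8), p0 = 0.2, Nmax = 36,
# theta_T = 0.887 (midpoint of the reported range [0.852, 0.922]).
print(predictive_probability(4, 20, 36, 0.2, 0.887, 0.2, 0.8))
```

Under the design rule, the trial would be stopped for futility at a look where the returned value falls below $\theta_L$. With no patients remaining (n equal to `n_max`), the function returns 0 or 1 according to the final decision.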

In Example 1 (‘Example 1: A lung cancer trial’), with $N_{\max} = 36$, $\theta_L$ is set at 0.001 and $\theta_T$ is in a range of [0.852, 0.922]. The trial is designed to monitor every patient after the first 10 patients until the trial reaches the maximum sample size. We investigate the results of monitoring every five or ten patients. In addition, three scenarios of monitoring schedules for this trial are examined. In scenarios 1 and 2, the numbers of stages are fixed to 3 and 10, respectively, but cohort sizes after the first 10 patients are chosen randomly. In scenario 3, both the number of stages, between 3 and 10, and cohort sizes after the first 10 patients vary randomly. We use uniform distributions to choose the number of stages (scenario 3) and the size of each stage (scenarios 1, 2, and 3). When we monitor the trials after treating every 5 or 10 patients with the true response rate of 0.2, the type I error rates are 0.088 and 0.088, the powers are 0.907 and 0.907, the probabilities of early termination are 0.86 and 0.45, and the expected numbers of patients treated are 29.74 and 30.87, respectively. Applying the variable width distribution plot [39], summaries of the operating characteristics for scenarios 1, 2, and 3 are shown in Figure 1, based on 1000 simulated trials under each scenario. These results indicate that the PP design remains robust in controlling type I and type II error rates. When the trial is monitored less frequently than planned, type I error rates are still under the nominal level of 0.10 and the power is higher than the nominal level of 0.90. Also, the chance of early termination due to futility is smaller than under continuous monitoring for all the scenarios studied. The expected sample size is slightly larger in scenarios 2 and 3 than those from the original PP design and Simon’s minimax two-stage design.

In Example 2 (‘Example 2: A tongue cancer trial’), under the same evaluation of robustness, almost all the type I error rates are slightly larger than 0.05 and the power values are all higher than 0.80 (Figure 2). Again, the probability of early termination due to futility is smaller and the expected sample size is larger than those from the original PP design. However, all the probabilities of early termination due to futility in the 10-stage scenario and the majority of the probabilities of early termination in the random-stage scenario are larger than those from Simon’s minimax two-stage design. This causes all the expected sample sizes in

Table 3 Operating characteristics of Simon’s two-stage designs and the PP design with type I error rate ≤ 0.05 and type II error rate ≤ 0.20, prior for $p$ = beta(0.6, 0.4), $p_0 = 0.6$, $p_1 = 0.8$

Simon’s minimax/optimal two-stage designs

| Design | $r_1/n_1$ | $r/N_{\max}$ | PET($p_0$) | $E(N \mid p_0)$ | $\alpha$ | $\beta$ |
|---|---|---|---|---|---|---|
| Minimax | 8/13 | 25/35 | 0.65 | 20.77 | 0.050 | 0.192 |
| Optimal | 7/11 | 30/43 | 0.70 | 20.48 | 0.049 | 0.198 |

Predictive probability design

| $\theta_L$ | $\theta_T$ | $r/N_{\max}$ | PET($p_0$) | $E(N \mid p_0)$ | $\alpha$ | $\beta$ |
|---|---|---|---|---|---|---|
| **[0.075, 0.079]*** | **[0.924, 0.963]** | **25/35** | **0.94** | **16.87** | **0.050** | **0.1855** |
| 0.001 | [0.940, 0.972] | 26/36 | 0.94 | 23.16 | 0.045 | 0.168 |
| 0.001 | [0.953, 0.978] | 27/37 | 0.96 | 23.05 | 0.035 | 0.191 |
| [0.061, 0.067] | [0.925, 0.963] | 27/38 | 0.94 | 18.20 | 0.050 | 0.158 |
| 0.001 | [0.941, 0.971] | 28/39 | 0.94 | 25.12 | 0.045 | 0.141 |
| 0.001 | [0.953, 0.978] | 29/40 | 0.96 | 24.87 | 0.035 | 0.161 |
| 0.051 | [0.927, 0.962] | 29/41 | 0.94 | 19.57 | 0.0500 | 0.135 |
| 0.001 | [0.942, 0.971] | 30/42 | 0.94 | 27.28 | 0.045 | 0.119 |
| 0.001 | [0.954, 0.977] | 31/43 | 0.96 | 26.88 | 0.035 | 0.136 |
| 0.051 | [0.929, 0.962] | 31/44 | 0.94 | 21.40 | 0.050 | 0.111 |

*The numbers within the brackets indicate a closed interval with both endpoints included. This applies to all intervals shown in the table.


Table 4 Operating characteristics of additional examples using Simon’s two-stage designs and the PP design with type I and type II error rates of 0.10

| $p_0/p_1$ | Simon(a) $r_1/n_1$ | Simon $r/N_{\max}$ | Simon PET($p_0$) | Simon $E(N \mid p_0)$ | Simon $\alpha$ | Simon $\beta$ | PP $\theta_L$ | PP $\theta_T$ | PP $r/N_{\max}$ | PP PET($p_0$) | PP $E(N \mid p_0)$ | PP $\alpha$ | PP $\beta$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1/0.3 | 1/16 | 4/25 | 0.52 | 20.37 | 0.095 | 0.097 | 0.001 | [0.782, 0.911] | 4/25 | 0.79 | 19.99 | 0.096 | 0.095 |
| | 1/12 | 5/35 | 0.66 | 19.84 | 0.098 | 0.099 | 0.001 | [0.855, 0.942] | 5/29 | 0.86 | 21.98 | 0.062 | 0.099 |
| | | | | | | | 0.001 | [0.838, 0.933] | 5/30 | 0.85 | 23.25 | 0.072 | 0.081 |
| | | | | | | | 0.001 | [0.821, 0.924] | 5/31 | 0.83 | 23.94 | 0.082 | 0.068 |
| | | | | | | | 0.001 | [0.804, 0.913] | 5/32 | 0.81 | 25.42 | 0.093 | 0.054 |
| | | | | | | | 0.003 | [0.786, 0.902] | 5/33 | 0.81 | 23.50 | 0.100 | 0.056 |
| | | | | | | | [0.037, 0.046]* | [0.767, 0.890] | 5/34 | 0.84 | 19.96 | 0.099 | 0.069 |
| | | | | | | | 0.001 | [0.879, 0.949] | 6/35 | 0.89 | 26.01 | 0.054 | 0.070 |
| | | | | | | | 0.001 | [0.866, 0.942] | 6/36 | 0.87 | 26.75 | 0.061 | 0.059 |
| | | | | | | | 0.001 | [0.852, 0.934] | 6/37 | 0.86 | 28.16 | 0.070 | 0.048 |
| | | | | | | | 0.001 | [0.838, 0.926] | 6/38 | 0.85 | 29.14 | 0.078 | 0.040 |
| 0.2/0.4 | 3/19 | 10/36 | 0.46 | 28.26 | 0.086 | 0.098 | 0.001 | [0.852, 0.922] | 10/36 | 0.86 | 27.67 | 0.088 | 0.094 |
| | 3/17 | 10/37 | 0.55 | 26.02 | 0.095 | 0.097 | 0.011 | [0.830, 0.908] | 10/37 | 0.85 | 25.13 | 0.100 | 0.084 |
| | | | | | | | 0.001 | [0.876, 0.935] | 11/39 | 0.88 | 29.24 | 0.073 | 0.092 |
| | | | | | | | 0.001 | [0.857, 0.923] | 11/40 | 0.86 | 30.23 | 0.086 | 0.075 |
| | | | | | | | 0.003 | [0.837, 0.910] | 11/41 | 0.85 | 30.27 | 0.100 | 0.062 |
| | | | | | | | 0.043 | [0.816, 0.895] | 11/42 | 0.86 | 23.56 | 0.099 | 0.083 |
| 0.3/0.5 | 7/28 | 15/39 | 0.37 | 34.99 | 0.094 | 0.100 | 0.001 | [0.859, 0.918] | 16/42 | 0.86 | 32.47 | 0.096 | 0.083 |
| | 7/22 | 17/46 | 0.67 | 29.89 | 0.097 | 0.095 | 0.001 | [0.881, 0.932] | 17/44 | 0.88 | 33.50 | 0.081 | 0.088 |
| | | | | | | | 0.001 | [0.858, 0.916] | 17/45 | 0.86 | 34.84 | 0.098 | 0.069 |
| | | | | | | | 0.048 | [0.834, 0.899] | 17/46 | 0.87 | 25.11 | 0.099 | 0.092 |
| | | | | | | | 0.001 | [0.880, 0.929] | 18/47 | 0.88 | 35.70 | 0.083 | 0.073 |
| | | | | | | | 0.001 | [0.858, 0.914] | 18/48 | 0.86 | 37.33 | 0.100 | 0.056 |
| 0.4/0.6 | 11/28 | 20/41 | 0.55 | 33.84 | 0.095 | 0.099 | 0.001 | [0.868, 0.923] | 20/41 | 0.87 | 31.07 | 0.096 | 0.098 |
| | 7/18 | 22/46 | 0.56 | 30.22 | 0.095 | 0.100 | 0.001 | [0.875, 0.927] | 21/43 | 0.88 | 32.12 | 0.091 | 0.093 |
| | | | | | | | 0.001 | [0.882, 0.930] | 22/45 | 0.89 | 33.65 | 0.086 | 0.087 |
| | | | | | | | [0.024, 0.025] | [0.854, 0.911] | 22/46 | 0.87 | 27.33 | 0.100 | 0.084 |
| | | | | | | | 0.001 | [0.889, 0.934] | 23/47 | 0.89 | 35.00 | 0.082 | 0.082 |
| | | | | | | | [0.010, 0.011] | [0.862, 0.915] | 23/48 | 0.87 | 31.58 | 0.099 | 0.068 |
| 0.5/0.7 | 11/23 | 23/39 | 0.50 | 31.00 | 0.098 | 0.099 | 0.001 | [0.869, 0.925] | 23/39 | 0.87 | 29.15 | 0.100 | 0.095 |
| | 11/21 | 26/45 | 0.67 | 28.96 | 0.096 | 0.098 | [0.019, 0.020] | [0.863, 0.920] | 24/41 | 0.88 | 25.39 | 0.100 | 0.091 |
| | | | | | | | 0.001 | [0.892, 0.939] | 25/42 | 0.90 | 30.82 | 0.082 | 0.097 |
| | | | | | | | 0.037 | [0.858, 0.915] | 25/43 | 0.88 | 24.39 | 0.100 | 0.091 |
| | | | | | | | 0.001 | [0.887, 0.934] | 26/44 | 0.89 | 32.48 | 0.087 | 0.082 |
| | | | | | | | 0.060 | [0.852, 0.910] | 26/45 | 0.88 | 24.31 | 0.099 | 0.087 |
| 0.6/0.8 | 18/27 | 24/35 | 0.82 | 28.47 | 0.097 | 0.100 | 0.001 | [0.885, 0.939] | 25/36 | 0.89 | 25.51 | 0.090 | 0.089 |
| | 6/11 | 26/38 | 0.47 | 25.38 | 0.097 | 0.096 | [0.042, 0.046] | [0.864, 0.924] | 26/38 | 0.88 | 21.65 | 0.099 | 0.081 |
| | | | | | | | 0.001 | [0.889, 0.940] | 27/39 | 0.89 | 27.73 | 0.088 | 0.075 |
| | | | | | | | 0.001 | [0.909, 0.952] | 28/40 | 0.91 | 27.43 | 0.071 | 0.088 |
| | | | | | | | 0.026 | [0.869, 0.926] | 28/41 | 0.88 | 24.65 | 0.100 | 0.062 |
| 0.7/0.9 | 11/16 | 20/25 | 0.55 | 20.05 | 0.091 | 0.902 | [0.001, 0.011] | [0.884, 0.953] | 20/25 | 0.89 | 16.42 | 0.091 | 0.098 |
| | 6/9 | 22/28 | 0.54 | 17.79 | 0.099 | 0.090 | [0.103, 0.133] | [0.860, 0.937] | 22/28 | 0.88 | 15.73 | 0.100 | 0.077 |
| | | | | | | | 0.001 | [0.883, 0.949] | 23/29 | 0.89 | 19.55 | 0.093 | 0.064 |
| | | | | | | | 0.001 | [0.903, 0.959] | 24/30 | 0.91 | 19.64 | 0.077 | 0.073 |

(a) Within each $p_0/p_1$ block, the first row represents Simon’s minimax two-stage design and the second row represents Simon’s optimal two-stage design.

*The numbers within the brackets indicate a closed interval with both endpoints included. This applies to all intervals shown in the table.


the 10-stage scenario and most of the expected sample sizes in the random-stage scenario to be smaller than those from Simon’s minimax two-stage design.

When trial conduct deviates from the original design, predictive probability can still be calculated at any time during the trial to provide investigators updated information. The robustness study indicates that the PP design remains robust even when the continuous monitoring rule is not implemented after observing the outcome of every patient. We found that the inflation of the type I error rate is small and usually less than 10% of the prespecified level in all of our simulation studies.

Unplanned early termination

Clinical trials can be terminated earlier than planned due to slow accrual or some other reasons. For this situation, we examine the influence of early termination on the design properties. Using our first trial example, presented in section ‘Example 1: A lung cancer trial’, we suppose the trial is stopped after 20 evaluable patients. When $\theta_T$ is held at the prespecified value of 0.922, we will declare that the new regimen is not promising if six or fewer responses have been observed in 20 patients, and promising otherwise. The corresponding type I error rate and power are 0.087 and 0.750, respectively. When this unplanned early termination occurs, the type I error rate is still controlled at a level under 10%, whereas power suffers a loss (< 90%) due to the smaller than planned sample size. Similarly, we assume that the trial is stopped after 20 patients in our second example (section ‘Example 2: A tongue cancer trial’). With $\theta_T = 0.963$, the type I error rate is controlled at 0.051 (a 2% increase from 0.05) but the power is reduced to 0.630, much less than the targeted 80% level due to the smaller sample size.
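Once the trial is cut short at a fixed number of patients, the error rates reduce to binomial tail probabilities. A minimal sketch follows; the helper name is hypothetical. The cutoff of seven or more responses for Example 1 comes from the text, whereas the cutoff of 16 or more responses for Example 2 is not stated there and is assumed here only because it is consistent with the reported 0.051 and 0.630.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Example 1: trial stopped at 20 patients, promising if 7 or more responses
type_i_ex1 = binom_tail(7, 20, 0.2)   # true response rate p0 = 0.2
power_ex1 = binom_tail(7, 20, 0.4)    # true response rate p1 = 0.4

# Example 2: assumed cutoff of 16 or more responses out of 20 (not in the text)
type_i_ex2 = binom_tail(16, 20, 0.6)  # true response rate p0 = 0.6
power_ex2 = binom_tail(16, 20, 0.8)   # true response rate p1 = 0.8

print(round(type_i_ex1, 3), round(power_ex1, 3))
print(round(type_i_ex2, 3), round(power_ex2, 3))
```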

Estimation bias

It is well known that a sequential design with outcome-based early stopping rules can result in bias in estimating the parameters corresponding to that outcome. We performed simulation studies to evaluate the bias for Examples 1 and 2 presented in the ‘Comparison between predictive probability approach and Simon’s two-stage design’ section. Based on 5000 simulations, the median and mean (standard deviation) of the estimated response rates for Example 1 under $p_0 = 0.20$ and $p_1 = 0.40$ are 0.19, 0.17 (0.09) and 0.39, 0.40 (0.09), respectively. For Example 2, the corresponding statistics for the estimated response rates under $p_0 = 0.60$ and $p_1 = 0.80$ are 0.55, 0.55 (0.12) and 0.80, 0.78 (0.10), respectively. Due to the higher probability of early stopping under $H_0$, the bias is more prominent under $H_0$ than under $H_1$. Under $H_0$, the observed median and mean underestimate the true response rate by about 10%. The early stopping rate under $H_1$ is low. Hence, less bias is observed in estimating the true response rate.

Figure 1 Operating characteristics for 1000 simulated trials under three scenarios for Example 1: 3 stages, 10 stages, and a random number of stages, all with random cohort sizes. Subpanels: (a) Type I error rates, (b) Power for detecting a response rate of 40%, (c) Probability of early termination if the true response rate is 20%, and (d) Expected sample size if the true response rate is 20%. Reference lines mark Simon’s minimax two-stage design, the predictive probability design monitoring every patient (dashed), and the predictive probability design monitoring every five patients (dash-dot).

Comparison between the designs based on the posterior probability versus designs based on predictive probability

Taking the settings in Examples 1 and 2, we also evaluate the operating characteristics for the designs based on the posterior approach. Analogous to the aforementioned predictive probability approach, we define the decision rules for the posterior probability approach: (1) stop the trial for futility when the posterior probability $P(p > p_1) < \theta_L$ and (2) declare a positive trial at the end of the study when the posterior probability $P(p > p_0) > \theta_T$, where $\theta_L$ and $\theta_T$ here denote the cutoffs applied to the posterior probability. In the setting of Example 1, with $N_{\max} = 36$, the corresponding design parameters for the posterior probability approach are $\theta_L = 0.001$ and $\theta_T \in [0.852, 0.922]$. The resulting type I error rate, probability of early stopping, and expected sample size under $H_0$ are 0.088, 0.45, and 28.73, respectively. Under $H_1$, the study power is 0.905 and the expected sample size is 35.73. These results are quite similar to those of the predictive probability approach reported in Section 3, except that the probability of early stopping is much lower (0.45 vs 0.86) under $H_0$.

Another notable difference is the construction of the rejection regions. For the posterior probability approach, the rejection region is 0/10, 1/15, 2/20, 3/24, 4/28, 5/32, and 10/36. There is a big jump toward the end of the study (5/32 to 10/36) for the posterior probability approach, while the transition for the predictive probability approach is smoother. Similar results are found in Example 2 (data not shown).
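Given a set of futility boundaries, the operating characteristics can be computed exactly by carrying the distribution of the response count forward from look to look. The sketch below illustrates this for the rejection region listed above. It assumes that each entry r/n means "stop for futility if r or fewer responses are observed at n patients", that the listed sample sizes are the points at which the boundary first increases, and that the final entry 10/36 is the end-of-study decision; the function names are hypothetical.

```python
from math import comb


def advance(dist, k, p):
    """Add k further patients to a {responses: probability} distribution."""
    out = {}
    for x, pr in dist.items():
        for j in range(k + 1):
            out[x + j] = out.get(x + j, 0.0) + pr * comb(k, j) * p**j * (1 - p) ** (k - j)
    return out


def operating_characteristics(boundaries, n_max, r_final, p):
    """boundaries maps look size n to r: stop for futility if responses <= r.

    Returns (probability of early termination, expected sample size,
    probability of declaring the regimen promising) at true response rate p.
    """
    dist = {0: 1.0}          # response-count distribution of trials still running
    prev_n = 0
    pet = 0.0
    expected_n = 0.0
    for n in sorted(boundaries):
        dist = advance(dist, n - prev_n, p)
        r = boundaries[n]
        stopped = sum(pr for x, pr in dist.items() if x <= r)
        pet += stopped
        expected_n += n * stopped
        dist = {x: pr for x, pr in dist.items() if x > r}
        prev_n = n
    dist = advance(dist, n_max - prev_n, p)
    expected_n += n_max * sum(dist.values())
    promising = sum(pr for x, pr in dist.items() if x > r_final)
    return pet, expected_n, promising


looks = {10: 0, 15: 1, 20: 2, 24: 3, 28: 4, 32: 5}   # interim futility boundaries
print(operating_characteristics(looks, 36, 10, 0.2))   # under H0
print(operating_characteristics(looks, 36, 10, 0.4))   # under H1
```

Because the response count can only increase, checking a constant boundary at every patient within a segment leads to the same stopping time as checking it at the first patient of that segment, which is why only the listed sample sizes appear as looks.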

Figure 2 Operating characteristics for 1000 simulated trials under three scenarios for Example 2: 3 stages, 10 stages, and a random number of stages, all with random cohort sizes. Subpanels: (a) Type I error rates, (b) Power for detecting a response rate of 80%, (c) Probability of early termination if the true response rate is 60%, and (d) Expected sample size if the true response rate is 60%. Reference lines mark Simon’s minimax two-stage design, the predictive probability design monitoring every patient (dashed), and the predictive probability design monitoring every five patients (dash-dot).


Discussion

In the frequentist multi-stage design, the study design is rigid in the sense that decisions can only be made at predetermined cohort sizes of the trial. The design with the predictive probability approach, in contrast, provides an excellent alternative for conducting multi-stage phase II trials. It is efficient and flexible. The design parameters can be found by searching the parameter space and choosing appropriate values of $\theta_L$ and $\theta_T$ such that the type I and type II error rates are controlled. The sample size can be determined by choosing the smallest $N_{\max}$ among all designs that satisfy the design criteria. The PP design yields a higher probability of early stopping and a smaller expected sample size compared to Simon’s two-stage designs when the treatment is unlikely to be efficacious. By varying the number of stages and cohort sizes, the PP design also shows robust operating characteristics. When the trial conduct deviates from the original plan, the type I error rate only increases slightly. To ensure that the nominal type I error rate is controlled, we can design the trial with a stricter type I error rate. For example, we can set type I error rates as 0.045 and 0.09 for targeting 0.05 and 0.10 type I error rates, respectively. Furthermore, through the specification of the prior distribution, the PP design can formally incorporate the additional efficacy information obtained before and throughout the trial into the decision making process. The PP design allows for a highly flexible monitoring schedule that is suitable for clinical applications, yet its operating characteristics remain robust. In our trial examples, we monitor the trial continuously after the first 10 patients are treated and are evaluable for response. Although we present the PP design using continuous monitoring, the approach can be applied with any number of stages and cohort sizes. In practice, the first cohort of patients and the number of stages are usually determined after consulting with study investigators to identify a reasonable choice that balances the practical considerations with trial efficiency.

We find that comparable operating characteristics can be obtained with both the predictive probability and the posterior probability approaches. However, the predictive probability approach resembles more closely the clinical decision making process, i.e., deciding whether to continue the study based on the interim data, as well as projecting into the future. The predictive probability approach also leads to higher early stopping rates under the null hypothesis (when the treatment is not working). Furthermore, the rejection region has a smooth transition compared to the posterior probability approach, which is likely to have a ‘jump’ between the interim data analysis and the final analysis. Further research on comparing the applications of the two approaches in designing clinical trials is warranted.

Regarding the choice of $\theta_L$ and $\theta_T$, in general, the lower $\theta_T$ is, the easier it is to reject $H_0$; hence, there is increased power and type I error rate. On the other hand, the higher $\theta_T$ is, the harder it is to reject $H_0$; hence, there is decreased power and type I error rate. Likewise, a larger $\theta_L$ corresponds to easier stopping due to futility and decreased power and type I error rate, and a smaller $\theta_L$ corresponds to the opposites. However, due to the discrete nature of the binomial distribution, for a fixed parameter (e.g., response rate = 20%) with a given prior distribution, there may be a range of $\theta_L$ and $\theta_T$ which gives the same rejection boundaries. In this case, from the trial conduct point of view, it does not matter which values of $\theta_L$ and $\theta_T$ are used. If a range of $\theta_L$ and $\theta_T$ exists which satisfies the prespecified constraints, one may choose the midpoint of the range for defining the decision rule using the predictive probability. The effect of varying parameters or the prior distributions can be evaluated accordingly.

Although frequentist approaches have historically dominated the field of clinical trial design and analysis, Bayesian methods provide many appealing properties. These include the ability to incorporate the information obtained before the trial into the study design, to use the information obtained during the trial for monitoring the study, flexibility in trial conduct, and a consistent way of making inference [36,40–42]. Instead of taking an ideological stand of choosing either the Bayesian or frequentist method, we take a pragmatic approach in searching for a better method for evaluating treatment efficacy in the phase II setting. We use the Bayesian framework as a tool to design clinical trials with desirable frequentist properties [19]. Taking the Bayesian approach, we derive an efficient and flexible design. At the same time, we choose the design parameters (in our case, $N_{\max}$, $\theta_L$ and $\theta_T$) to control the type I and type II error rates under a variety of settings. Several similar approaches of taking the Bayesian framework to achieve good frequentist properties have been proposed in the literature [43,44]. We advocate the use of extensive computation and/or simulation studies to search for the optimal design parameters and to evaluate the design’s operating characteristics. Sensitivity analysis should be conducted to examine how the design parameters influence the design properties. Robustness analysis should also be carried out to study the impact on design properties when the study conduct deviates from the original design. Based on our extensive studies, we conclude that the predictive probability approach offers practical and desirable designs for phase II cancer clinical trials. S-PLUS/R programs can be downloaded from http://biostatistics.mdanderson.org/SoftwareDownload/ to assist in developing a study design using the described predictive probability method.

One reviewer pointed out the limitation of the traditional hypothesis testing framework when designs are constructed with the primary goal of controlling type I and type II error rates. For example, as shown in Example 2 and Table 3, both Simon’s minimax design and our predictive probability design will fail to reject the null hypothesis if we observe 25 responses in 35 patients when $p_0 = 0.6$, $p_1 = 0.8$, $\alpha = 0.05$, and $\beta = 0.2$. It seems unreasonable that we fail to reject $p_0 = 0.6$ when the observed response rate $\hat{p} = 0.71$ is considerably larger than 0.6. This is a result of choosing a small type I error rate. To achieve a valid scientific inference, we emphasize the importance of estimation even when the trial is designed under the hypothesis testing framework. In this example, the posterior distribution for the response rate is beta(25.6, 10.4) if we take the prior as beta(0.6, 0.4). The probabilities that the response rate is greater than 0.6 and 0.8 are 0.92 and 0.11, respectively. The 95% highest probability density interval for the response rate is (0.56, 0.85). The posterior probability provides a more thorough assessment of the response rate under the Bayesian framework.
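The posterior tail probabilities quoted above can be checked numerically. The following is a minimal sketch that integrates the beta(25.6, 10.4) density with a simple midpoint rule; the function names and the number of integration steps are illustrative choices, and the approach assumes both beta parameters exceed 1 so that the density is bounded.

```python
from math import exp, lgamma, log


def beta_pdf(p, a, b):
    """Density of the beta(a, b) distribution at 0 < p < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(p) + (b - 1) * log(1 - p))


def beta_tail(c, a, b, steps=20000):
    """P(p > c) for p ~ beta(a, b), by the midpoint rule on (c, 1)."""
    h = (1.0 - c) / steps
    return h * sum(beta_pdf(c + (k + 0.5) * h, a, b) for k in range(steps))


# Prior beta(0.6, 0.4) updated with 25 responses and 10 non-responses
a_post, b_post = 0.6 + 25, 0.4 + 10

print(round(beta_tail(0.6, a_post, b_post), 2))  # P(response rate > 0.6)
print(round(beta_tail(0.8, a_post, b_post), 2))  # P(response rate > 0.8)
```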

Lastly, we would like to emphasize that reaching an efficacy threshold is not the only purpose of phase II trials. Information collected from phase II studies will enable the investigators to provide further safety analysis, to determine appropriate dosing schedules, to identify appropriate populations for a new treatment, and to assess the possibility for combination studies, etc. Depending on the purpose of the study, the investigators may not always want to terminate the study early, even when the early stopping boundary is crossed. However, the predictive probability approach still offers a consistent way to evaluate the strength of the treatment efficacy based on the observed data. Phase II trials can be considered as learning studies. The predictive probability design under the Bayesian framework provides an ideal environment for learning.

Acknowledgments

This work was supported in part by grants from the National Cancer Institute, CA97007, and the Department of Defense, W81XWH-04-1-0142 and W81XWH-05-2-0027. The authors thank the reviewers for their helpful comments and Lee Ann Chastain and M. Victoria Cervantes for their editorial assistance.

References

1. Green S, Benedetti J, Crowley J. Clinical trials in oncology, 2nd edn. Chapman & Hall/CRC, Boca Raton, FL, 2002.

2. Palesch YY, Tilley BC, Sackett DL, Johnson KC, Woolson R. Applying a phase II futility study design to therapeutic stroke trials. Stroke 2005; 36: 2410–14.

3. Thall PF, Simon RM. Recent developments in the design of phase II clinical trials. Cancer Treat Res 1995; 75: 49–71.

4. Kramar A, Potvin D, Hill C. Multistage designs for phase II clinical trials: statistical issues in cancer research. Br J Cancer 1996; 74: 1317–20.

5. Mariani L, Marubini E. Design and analysis of phase II cancer trials: a review of statistical methods and guidelines for medical researchers. Int Stat Rev 1996; 64(1): 61–88.

6. Lee JJ, Feng L. Randomized phase II designs in cancer clinical trials: current status and future directions. J Clin Oncol 2005; 23: 4450–57.

7. Gehan EA. The determination of the number of patients required in a preliminary and a follow-up trial of a new chemotherapeutic agent. J Chronic Dis 1961; 13: 346–53.

8. Simon R. Optimal two-stage designs for phase II clinical trials. Cont Clin Trials 1989; 10: 1–10.

9. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics 1982; 38: 143–51.

10. Lee YJ. Phase II trials in cancer: present status and analysis methods. Drugs Exp Clin Res 1986; 12: 57–71.

11. Ensign LG, Gehan EA, Kamen DS, Thall PF. An optimal three-stage design for phase II clinical trials. Stat Med 1994; 13: 1727–36.

12. Chen TT. Optimal three-stage designs for phase II cancer clinical trials. Stat Med 1997; 16: 2701–11.

13. Jung SH, Carey M, Kim KM. Graphical search for two-stage designs for phase II clinical trials. Cont Clin Trials 2001; 22: 367–72.

14. Jung SH, Lee T, Kim KM, George SL. Admissible two-stage designs for phase II cancer clinical trials. Stat Med 2004; 23: 561–9.

15. Green SJ, Dahlberg S. Planned versus attained design in phase II clinical trials. Stat Med 1992; 11: 853–62.

16. Herndon JE II. A design alternative for two-stage, phase II, multicenter cancer clinical trials. Cont Clin Trials 1998; 19: 440–50.

17. Chen TT, Ng TH. Optimal flexible designs in phase II clinical trials. Stat Med 1998; 17: 2301–12.

18. Korn EL, Simon R. Data monitoring committees and problems of lower-than-expected accrual or events rates. Cont Clin Trials 1996; 17: 526–35.

19. Berry DA. Introduction to Bayesian methods III: use and interpretation of Bayesian tools in design and analysis. Clin Trials 2005; 2: 295–300; discussion 301–4, 364–78.

20. Berry DA. Clinical trials: is the Bayesian approach ready for prime time? Yes! Stroke 2005; 36: 1621–22.

21. Sylvester RJ. A Bayesian approach to the design of phase II clinical trials. Biometrics 1988; 44: 823–36.

22. Thall PF, Simon R. A Bayesian approach to establishing sample size and monitoring criteria for phase II clinical trials. Cont Clin Trials 1994; 15: 463–81.

23. Thall PF, Simon R. Practical Bayesian guidelines for phase IIB clinical trials. Biometrics 1994; 50: 337–49.

24. Heitjan DF. Bayesian interim analysis of phase II cancer clinical trials. Stat Med 1997; 16: 1791–802.

25. Tan SB, Machin D. Bayesian two-stage designs for phase II clinical trials. Stat Med 2002; 21: 1991–2012.


26. Mayo MS, Gajewski BJ. Bayesian sample size calculations in phase II clinical trials using informative conjugate priors. Cont Clin Trials 2004; 25: 157–67.

27. Gajewski BJ, Mayo MS. Bayesian sample size calculations in phase II clinical trials using a mixture of informative priors. Stat Med 2006; 25: 2554–66.

28. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons, Chichester, West Sussex, 2004.

29. Choi SC, Smith PJ, Becker DP. Early decision in clinical trials when the treatment differences are small. Experience of a controlled trial in head trauma. Cont Clin Trials 1985; 6: 280–88.

30. Choi SC, Pepple PA. Monitoring clinical trials based on predictive probability of significance. Biometrics 1989; 45: 317–23.

31. Spiegelhalter DJ, Freedman LS, Blackburn PR. Monitoring clinical trials: conditional or predictive power? Cont Clin Trials 1986; 7: 8–17.

32. Spiegelhalter DJ, Freedman LS. A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Stat Med 1986; 5: 1–13.

33. Grieve AP. Predictive probability in clinical trials. Biometrics 1991; 47: 323–30.

34. Johns D, Andersen JS. Use of predictive probabilities in phase II and phase III clinical trials. J Biopharm Stat 1999; 9: 67–79.

35. Gould AL. Timing of futility analyses for proof of concept trials. Stat Med 2005; 24: 1815–35.

36. Berry DA. A guide to drug discovery: Bayesian clinical trials. Nature Reviews Drug Discovery 2006; 5: 27–36.

37. Herson J. Predictive probability early termination plans for phase II clinical trials. Biometrics 1979; 35: 775–83.

38. Schultz JR, Nichol FR, Elfring GL, Weed SD. Multiple-stage procedures for drug screening. Biometrics 1973; 29: 293–300.

39. Lee JJ, Tu ZN. A versatile one-dimensional distribution plot: the BLiP plot. The American Statistician 1997; 51: 353–58.

40. Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomized trials. Journal of the Royal Statistical Society Series A 1994; 157: 357–416.

41. Grossman J, Parmar MKB, Spiegelhalter DJ, Freedman LS. A unified method for monitoring and analysing controlled trials. Statistics in Medicine 1994; 13: 1815–26.

42. Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR. Bayesian methods in health technology assessment: A review. Health Technology Assessment 2000; 4: 1–121.

43. Stallard N, Whitehead J, Cleall S. Decision-making in a phase II clinical trial: a new approach combining Bayesian and frequentist concepts. Pharmaceutical Statistics 2005; 4: 119–28.

44. Wang YG, Leung DHY, Li M, Tan SB. Bayesian designs with frequentist and Bayesian error rate considerations. Statistical Methods in Medical Research 2005; 14: 445–56.

Abbreviations

E(N) expected sample size

PET probability of early termination

PP predictive probability
