

A group sequential type design for three-arm non-inferiority trials with binary endpoints

Gang Li*,1 and Shan Gao2

1 GlaxoSmithKline, 1250 S. Collegeville Rd, Collegeville, PA 19426, USA
2 Techdata LLC, 700 American Ave., King of Prussia, PA 19406, USA

Received 14 August 2009, revised 11 February 2010, accepted 30 April 2010

The three-arm design with a test treatment, an active control and a placebo group is the gold standard design for non-inferiority trials if it is ethically justifiable to expose patients to placebo. In this paper, we first use the closed testing principle to establish the hierarchical testing procedure for the multiple comparisons involved in the three-arm design. For the effect preservation test we derive the explicit formula for the optimal allocation ratios. We propose a group sequential type design, which naturally accommodates the hierarchical testing procedure. Under this proposed design, Monte Carlo simulations are conducted to evaluate the performance of the sequential effect preservation test when the variance of the test statistic is estimated based on the restricted maximum likelihood estimators of the response rates under the null hypothesis. When there are uncertainties for the placebo response rate, the proposed design demonstrates better operating characteristics than the fixed sample design.

Key words: Effect preservation test; Group sequential design; Non-inferiority; Three-arm trial.

Supporting Information for this article is available from the author or on the WWW under http://dx.doi.org/10.1002/bimj.200900188.

1 Introduction

Non-inferiority clinical trials are widely conducted in the pharmaceutical industry, and they have two major objectives. As in any clinical trial that studies a new treatment, the first objective is to establish the effectiveness of the test treatment. In addition, researchers want to compare the relative effectiveness of the test treatment and a control treatment, which is often a standard treatment on the market for the investigated disease indication. Statistically, this means showing that the test treatment is non-inferior to the control treatment.

The two-arm active-controlled design has been widely adopted for non-inferiority trials in the pharmaceutical industry. Such a trial design provides a head-to-head comparison between the test treatment and the control treatment. Non-inferiority is established by showing that the efficacy of the test treatment is not inferior to that of the control treatment by more than some pre-specified non-inferiority margin (FDA, 2010). The design is favored by pharmaceutical companies because it provides a direct comparison with a marketed drug and does not have the potential ethical problems of placebo-controlled trials.

However, as pointed out in the literature (Fleming, 1987; Tsong et al., 2003; Hung, Wang and O’Neill, 2005, 2007, 2009), the two-arm trial has major deficiencies in design, data analysis and interpretation. First, the effectiveness of the test treatment cannot be established directly from the trial data. Instead, it is inferred by combining the direct comparison of the test treatment versus the control treatment from the current trial with the assessment of the control treatment versus placebo from historical trials. Second, without a placebo arm, the assay sensitivity of the trial is not verifiable from the trial data, so it has to rely on external information such as historical placebo trials for the control treatment. The only exception is when the trial shows superiority of either treatment over the other. Without assay sensitivity, it is questionable to interpret the non-inferiority testing results as well as the effectiveness of the test treatment. Additional statistical risks are involved in the use of historical placebo trial data; Hung et al. (2007) discussed two kinds of type-I errors, within-trial and across-trial errors. They concluded that consideration of both kinds of error rates is important for defining a non-inferiority margin. For the indirect statistical inference, any method that controls only the across-trial type-I error is inadequate.

*Corresponding author: e-mail: [email protected], Phone: +1-610-917-7332, Fax: +1-610-917-4538

© 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Biometrical Journal 52 (2010) 4, 504–518 DOI: 10.1002/bimj.200900188

Whenever it is ethically justifiable to expose patients to placebo in the studied indication, a three-arm design is recommended for non-inferiority trials (Tang and Tang, 2004). The three-arm design with a test treatment, an active control and a placebo group is referred to as the gold standard design for non-inferiority trials in the International Conference on Harmonization (ICH) E10 guideline (ICH, 2000). With a placebo group, the effectiveness of the test treatment can be directly established. The design also avoids the need to define a fixed non-inferiority margin for the assessment of non-inferiority. Instead, the non-inferiority test problem can be formulated to evaluate whether the test treatment preserves a pre-specified proportion of the effect of the active control over placebo.

In this paper, we consider the design of three-arm non-inferiority trials with binary endpoints in which the treatment effects are compared via the risk difference. In Section 2, we present the statistical model and use the closed testing principle to establish the hierarchical testing procedure for the multiple comparisons involved in three-arm non-inferiority trials. We also derive the explicit optimal allocation ratios for the effect preservation test. In Section 3, we propose a group sequential type design which fits naturally with the hierarchical testing procedure. The effect preservation hypotheses are tested using the restricted maximum likelihood estimation (RMLE) method, for which the variance of the test statistic is estimated based on the restricted maximum likelihood estimators of the treatment response rates. The actual type-I error and the power of the effect preservation test are assessed via Monte-Carlo simulations. The proposed design is compared with the fixed sample design in Section 4 when there are uncertainties about the placebo response rate. We conclude with a discussion in Section 5.

2 Three-arm non-inferiority trial design

2.1 Closed testing procedure

Let $X_T$, $X_C$ and $X_P$ be independent binary response variables with binomial distributions $\mathrm{Bin}(n_T, p_T)$, $\mathrm{Bin}(n_C, p_C)$ and $\mathrm{Bin}(n_P, p_P)$, respectively, where $p_T$, $p_C$ and $p_P$ represent the true efficacy rates in the test, control and placebo treatment groups, respectively. In such a study design, there are four possible tests we can conduct to compare the treatments.

With a placebo arm in the trial, the effectiveness of the test treatment can be directly established by the superiority test

$$H_{01}: p_T \le p_P \quad\text{versus}\quad H_{11}: p_T > p_P. \tag{1}$$

Similarly, the effectiveness of the control treatment can be verified by the superiority test

$$H_{02}: p_C \le p_P \quad\text{versus}\quad H_{12}: p_C > p_P. \tag{2}$$


The non-inferiority objective can be assessed by showing that the efficacy of the test treatment is not inferior to that of the control treatment by more than a non-inferiority margin, say $\delta$, in the following hypotheses

$$\tilde H_{03}: p_T - p_C \le -\delta \quad\text{versus}\quad \tilde H_{13}: p_T - p_C > -\delta,$$

where $\delta$ is considered a fixed non-inferiority margin and must be pre-specified in the protocol. In three-arm trials the effect preservation test (Holmgren, 1999) provides an alternative approach to non-inferiority testing. In this approach $\delta$ is replaced by a fraction of the control treatment effect, $(1-r)(p_C - p_P)$, where $r > 0$. If the null hypothesis of (1) is rejected, the above non-inferiority hypotheses can be rewritten as the following effect preservation hypotheses

$$H_{03}: \frac{p_C - p_P}{p_T - p_P} \ge \frac{1}{r} \quad\text{versus}\quad H_{13}: \frac{p_C - p_P}{p_T - p_P} < \frac{1}{r}. \tag{3}$$

The proportion $r$ defines the hypotheses and must be pre-specified in the trial design. Rejection of the null hypothesis $H_{03}$ implies that the test treatment preserves at least $100r\%$ of the control treatment effect.

If non-inferiority is established, one may be interested in testing the superiority of the test treatment over the control treatment in the following hypotheses

$$H_{04}: p_T \le p_C \quad\text{versus}\quad H_{14}: p_T > p_C. \tag{4}$$

With a placebo arm in the study design, the comparison of the test treatment with placebo provides the most straightforward and convincing evidence for the effectiveness of the test treatment. If the superiority of the test treatment over placebo cannot be shown, the trial will fail. For this reason we propose to test hypotheses (1) first. Once $H_{01}$ is rejected at a one-sided level $\alpha$, where the value of $\alpha$ is usually set at 2.5%, we proceed to test

$$H_0: H_{02} \cap H_{03} \cap H_{04} \quad\text{versus}\quad H_1: H_{02}^c \cup H_{03}^c \cup H_{04}^c.$$

Koch and Röhmel (2004) discussed this testing procedure for the fixed margin non-inferiority test with continuous endpoints. In this paper we consider the effect preservation non-inferiority test for binary endpoints.

In the following arguments we use the closed testing principle (Marcus, Peritz and Gabriel, 1976) to address the multiplicity problems in the multiple comparisons of the three treatment arms. With three original hypotheses, there are three intersection hypotheses containing two original hypotheses, $\{H_{02} \cap H_{03}\}$, $\{H_{02} \cap H_{04}\}$ and $\{H_{03} \cap H_{04}\}$, and one intersection hypothesis containing all three original hypotheses, $\{H_{02} \cap H_{03} \cap H_{04}\}$.

The intersection $\{H_{02} \cap H_{03}\}$ corresponds to $(p_T - p_P)/r \le p_C - p_P \le 0$, which contradicts the rejection of $H_{01}$. The intersection $\{H_{02} \cap H_{04}\}$ corresponds to $p_T \le p_C \le p_P$, which again contradicts the rejection of $H_{01}$. The intersection $\{H_{03} \cap H_{04}\}$ corresponds to $p_C - p_P \ge (p_T - p_P)/r$ and $p_T \le p_C$. Given the rejection of $H_{01}$, this is equivalent to $H_{03}$. The intersection $\{H_{02} \cap H_{03} \cap H_{04}\}$ contradicts the rejection of $H_{01}$, as $\{H_{02} \cap H_{03}\}$ does.
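The third equivalence is stated without its intermediate step. One way to fill it in, assuming the preservation fraction satisfies $0 < r \le 1$ (an assumption made here; the text only requires $r > 0$), is as follows. Once $H_{01}$ is rejected we take $p_T - p_P > 0$, so under $H_{03}$

$$p_C - p_P \;\ge\; \frac{p_T - p_P}{r} \;\ge\; p_T - p_P \quad\Longrightarrow\quad p_C \ge p_T .$$

Hence $H_{03}$ implies $H_{04}$ on this set, and $H_{03} \cap H_{04}$ reduces to $H_{03}$.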

By the closed testing principle, $H_{02}$ is rejected if both $H_{01}$ and $H_{02}$ are rejected at the one-sided significance level $\alpha$; $H_{03}$ is rejected if both $H_{01}$ and $H_{03}$ are rejected at the one-sided significance level $\alpha$; and $H_{04}$ is rejected if $H_{01}$, $H_{03}$ and $H_{04}$ are all rejected at the one-sided significance level $\alpha$. Although $H_{04}$ can be tested without inflating the overall type-I error, this test is generally not powered in a non-inferiority trial, so we focus on the other two tests.
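As a rough illustration of this decision rule, the sketch below maps four one-sided p-values onto the hierarchy just described. The function name, the p-value inputs and the convention of rejecting when the p-value is at most the significance level are assumptions made for illustration; they are not taken from the paper.

```python
def hierarchical_decisions(p01, p02, p03, p04, alpha=0.025):
    """Apply the hierarchy: H01 gates H02 and H03, and H03 gates H04.

    p01..p04 are one-sided p-values for H01..H04 (hypothetical inputs).
    Returns a dict saying which null hypotheses are rejected.
    """
    reject_h01 = p01 <= alpha
    reject_h02 = reject_h01 and p02 <= alpha
    reject_h03 = reject_h01 and p03 <= alpha
    # H04 additionally requires that H03 has been rejected.
    reject_h04 = reject_h03 and p04 <= alpha
    return {"H01": reject_h01, "H02": reject_h02,
            "H03": reject_h03, "H04": reject_h04}


# Example: superiority over placebo and effect preservation both succeed,
# but the control-versus-placebo comparison does not reach significance.
example = hierarchical_decisions(0.010, 0.200, 0.010, 0.010)
```

Each test is carried out at the full level because every later hypothesis can only be rejected after the ones it depends on.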

2.2 Optimal allocation ratios for the effect preservation test

The effect preservation test was first proposed by Holmgren (1999) and has been studied by many authors, including Pigeot et al. (2003), Tang and Tang (2004) and Kieser and Friede (2007). When the null hypothesis of (1) is rejected, the effect preservation test (3) for binary data can be expressed as

$$H_{03}: p_T - r p_C - (1-r) p_P \le 0 \quad\text{versus}\quad H_{13}: p_T - r p_C - (1-r) p_P > 0.$$

We follow Tang and Tang (2004) and define

$$\hat\psi = \hat p_T - r \hat p_C - (1-r) \hat p_P,$$

where $\hat p_i = X_i/n_i$, $i = T, C, P$. The variance of $\hat\psi$ is

$$\sigma^2(\hat\psi) = \frac{p_T(1-p_T)}{n_T} + \frac{r^2 p_C(1-p_C)}{n_C} + \frac{(1-r)^2 p_P(1-p_P)}{n_P}. \tag{5}$$

For the sample size estimation, we use the following notation. Let $\{p_i^{(1)}; i = T, C, P\}$ be the response rates under the alternative hypothesis and $\{\tilde p_i^{(0)}; i = T, C, P\}$ be the restricted maximum likelihood estimates under the constraint $\tilde p_T^{(0)} - r \tilde p_C^{(0)} - (1-r) \tilde p_P^{(0)} = 0$. Let $\psi^{(1)} = p_T^{(1)} - r p_C^{(1)} - (1-r) p_P^{(1)}$ and

$$\tau^2 = p_T(1-p_T) + \frac{r^2 p_C(1-p_C)}{\lambda_C} + \frac{(1-r)^2 p_P(1-p_P)}{\lambda_P},$$

where $\lambda_C = n_C/n_T$ and $\lambda_P = n_P/n_T$. Asymptotically, $Z^{(0)} = \sqrt{n_T}\,(\hat\psi/\tau)$ and $Z^{(1)} = \sqrt{n_T}\,[(\hat\psi - \psi^{(1)})/\tau]$ follow the standard normal distribution under the null and alternative hypothesis, respectively. Since $\tau^2$ depends on $\{p_i; i = T, C, P\}$, an approximate sample size formula can be obtained by replacing them with appropriate estimates (Kieser and Friede, 2007). When the sample sizes are calculated based on the variance under the alternative hypothesis, $\{p_i; i = T, C, P\}$ are replaced by $\{p_i^{(1)}; i = T, C, P\}$ and the sample size in the test treatment group is

$$n_T^{(11)} = (z_\alpha + z_\beta)^2 \, \frac{p_T^{(1)}(1-p_T^{(1)}) + r^2 p_C^{(1)}(1-p_C^{(1)})/\lambda_C + (1-r)^2 p_P^{(1)}(1-p_P^{(1)})/\lambda_P}{\left[ p_T^{(1)} - r p_C^{(1)} - (1-r) p_P^{(1)} \right]^2}, \tag{6}$$

where $z_x$ is the $100(1-x)\%$ percentile of the standard normal distribution. The total sample size is $N_{11} = (1 + \lambda_C + \lambda_P)\, n_T^{(11)}$.
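Formula (6) is straightforward to evaluate numerically. The following is a minimal sketch, assuming the variance is evaluated at the alternative response rates as in (6). The function and argument names are hypothetical, and rounding each arm up to the next integer is a convention chosen here rather than one stated in the text.

```python
from math import ceil
from statistics import NormalDist


def n_test_arm(p_t, p_c, p_p, r, lam_c, lam_p, alpha=0.025, beta=0.10):
    """Unrounded test-arm sample size from formula (6)."""
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha), z(1 - beta)
    # Variance factor tau^2 evaluated at the alternative response rates.
    tau_sq = (p_t * (1 - p_t)
              + r ** 2 * p_c * (1 - p_c) / lam_c
              + (1 - r) ** 2 * p_p * (1 - p_p) / lam_p)
    psi_1 = p_t - r * p_c - (1 - r) * p_p
    if psi_1 <= 0:
        raise ValueError("the alternative must satisfy psi > 0")
    return (z_alpha + z_beta) ** 2 * tau_sq / psi_1 ** 2


def total_sample_size(n_t, lam_c, lam_p):
    """Total size after rounding each arm up (rounding rule assumed here)."""
    return ceil(n_t) + ceil(lam_c * n_t) + ceil(lam_p * n_t)


# Illustrative scenario: p_T = p_C = 0.9, p_P = 0.7, r = 0.5, allocation 1:1:1.
n_t = n_test_arm(0.9, 0.9, 0.7, r=0.5, lam_c=1.0, lam_p=1.0)
n_total = total_sample_size(n_t, 1.0, 1.0)
```

For this scenario the unrounded test-arm size is about 173.4, so each arm is rounded up to 174.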

Kieser and Friede (2007) considered the optimal allocation ratios under a fixed ratio $\lambda_C$. We relax this constraint and obtain the optimal allocation ratios. Since the total sample size is proportional to

$$h(\lambda_C, \lambda_P) = (1 + \lambda_C + \lambda_P) \left[ p_T^{(1)}(1-p_T^{(1)}) + \frac{r^2 p_C^{(1)}(1-p_C^{(1)})}{\lambda_C} + \frac{(1-r)^2 p_P^{(1)}(1-p_P^{(1)})}{\lambda_P} \right],$$

the optimal allocation ratios $\{\lambda_C^*, \lambda_P^*\}$ minimize the function $h(\lambda_C, \lambda_P)$. By the Cauchy-Schwarz inequality we have

$$h(\lambda_C, \lambda_P) \ge \left[ \sqrt{p_T^{(1)}(1-p_T^{(1)})} + r \sqrt{p_C^{(1)}(1-p_C^{(1)})} + (1-r) \sqrt{p_P^{(1)}(1-p_P^{(1)})} \right]^2, \tag{7}$$

where equality holds at the following optimal allocation ratios

$$\lambda_C^* = r\, \frac{\sqrt{p_C^{(1)}(1-p_C^{(1)})}}{\sqrt{p_T^{(1)}(1-p_T^{(1)})}}, \qquad \lambda_P^* = (1-r)\, \frac{\sqrt{p_P^{(1)}(1-p_P^{(1)})}}{\sqrt{p_T^{(1)}(1-p_T^{(1)})}}. \tag{8}$$
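The bound (7) and the ratios (8) can be checked numerically. The sketch below, with hypothetical function names and an illustrative scenario, evaluates $h$ at the ratios from (8) and compares it with the right-hand side of (7) and with the value at a 1:1:1 allocation. It assumes $0 < r < 1$ so that both ratios are positive.

```python
from math import sqrt


def h(lam_c, lam_p, p_t, p_c, p_p, r):
    """Quantity proportional to the total sample size."""
    return (1 + lam_c + lam_p) * (p_t * (1 - p_t)
                                  + r ** 2 * p_c * (1 - p_c) / lam_c
                                  + (1 - r) ** 2 * p_p * (1 - p_p) / lam_p)


def optimal_allocation(p_t, p_c, p_p, r):
    """Allocation ratios (lambda_C*, lambda_P*) from formula (8)."""
    s_t = sqrt(p_t * (1 - p_t))
    s_c = sqrt(p_c * (1 - p_c))
    s_p = sqrt(p_p * (1 - p_p))
    return r * s_c / s_t, (1 - r) * s_p / s_t


def lower_bound(p_t, p_c, p_p, r):
    """Right-hand side of inequality (7)."""
    return (sqrt(p_t * (1 - p_t))
            + r * sqrt(p_c * (1 - p_c))
            + (1 - r) * sqrt(p_p * (1 - p_p))) ** 2


rates = dict(p_t=0.9, p_c=0.9, p_p=0.7, r=0.5)  # illustrative scenario
lam_c_opt, lam_p_opt = optimal_allocation(**rates)
h_opt = h(lam_c_opt, lam_p_opt, **rates)   # attains the bound in (7)
h_equal = h(1.0, 1.0, **rates)             # 1:1:1 allocation, for comparison
```

In this scenario the optimal ratios are roughly 1 : 0.5 : 0.76, and the 1:1:1 allocation costs about 7% more subjects than the optimum under the normal approximation.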

In the other two sample size formulae, the variance of the test statistic under the null hypothesis is involved. Since the restricted maximum likelihood estimates of the response rates under the null hypothesis depend on the allocation ratios, there are no analytic solutions for the optimal allocation ratios under these two sample size formulae.

It is noteworthy that the optimal allocation ratios derived above are based on sample size formulae which rely on the normal approximation. Due to the discreteness of the binomial distribution, the actual power could differ considerably from the nominal power when this approximation does not work well. In addition, although the optimal allocation ratios minimize the total sample size, the allocation ratios for a practical trial are usually selected for other reasons. For example, to simplify the randomization procedure and to help maintain the blind, the allocation ratios are chosen as ratios of integers, such as 1:1:1 or 2:1:1.


3 A group sequential type design

3.1 Proposed design

As mentioned before, there are two major objectives for three-arm non-inferiority trials, with a natural hierarchical testing structure. The superiority of the test treatment over placebo is a prerequisite for the effect preservation test. However, the fixed sample design does not take this hierarchical structure into account. In a fixed sample design, the sample size is determined upfront and fixed. The statistical tests are conducted after enrollment is completed and all subject data are available. If the null hypothesis of (1) is not rejected by the trial data, the data from the control treatment group might still be used for the assessment of assay sensitivity or for further exploration, as pointed out by one referee. However, none of this helps much with the study objectives, and a waste of resources is unavoidable.

This is problematic when the response rate of the placebo arm is highly variable or unknown. For certain bacterial infections, the historical placebo trials of the control treatment cannot provide reliable estimates of the response rates for both arms because of the emergence of pathogen resistance. In this case, a conservative assumption is usually made about the placebo response rate and the trial is powered to detect a minimal clinically relevant treatment difference. Since the sample size required for the superiority test is generally much smaller than that required for the effect preservation test, the sample size estimation for a fixed sample design is usually driven by the effect preservation test. This could result in a larger sample size than actually needed.

In such settings, a group sequential type design is more appropriate than a fixed sample design. With pre-specified stopping boundaries, the trial has the option of stopping early for efficacy, for futility, or for both. If the actual placebo response rate is lower than the assumed rate, the trial has a good chance of stopping early to claim efficacy. In this section, we propose a group sequential type design which naturally accommodates the hierarchical objectives and the testing procedure.

There are two stages in our proposed design. In the first stage, the sample size is determined based on the superiority test (1). Assuming allocation ratios of $1 : \lambda_C : \lambda_P$ and response rates of $\{p_T^{(1)}, p_C^{(1)}, p_P^{(1)}\}$, with a one-sided type-I error of $\alpha_{sup}$ and a type-II error of $\beta_{sup}$, the sample sizes in the first stage are

$$n_{1T} = \frac{\left( z_{\alpha_{sup}} \sqrt{\bar p_{TP}(1-\bar p_{TP})(1 + 1/\lambda_P)} + z_{\beta_{sup}} \sqrt{p_T^{(1)}(1-p_T^{(1)}) + p_P^{(1)}(1-p_P^{(1)})/\lambda_P} \right)^2}{\left( p_T^{(1)} - p_P^{(1)} \right)^2},$$

$$n_{1C} = n_{1T}\lambda_C, \qquad n_{1P} = n_{1T}\lambda_P, \qquad \bar p_{TP} = \frac{p_T^{(1)} + p_P^{(1)}\lambda_P}{1 + \lambda_P}.$$
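The first-stage sample sizes can be computed directly from these expressions. The sketch below uses hypothetical function and argument names. Rounding the test arm up and then rounding each derived arm up is an assumption made here; under that convention the placebo sizes for $p_T = 0.8$, $p_P = 0.3$ agree with the $n_P$ column reported in Table 3.

```python
from math import ceil, sqrt
from statistics import NormalDist


def first_stage_sizes(p_t, p_p, lam_c, lam_p, alpha_sup=0.025, beta_sup=0.10):
    """Stage-one sample sizes (n_1T, n_1C, n_1P) for the superiority test (1)."""
    z = NormalDist().inv_cdf
    # Pooled response rate of the test and placebo arms.
    p_bar = (p_t + lam_p * p_p) / (1 + lam_p)
    sd_null = sqrt(p_bar * (1 - p_bar) * (1 + 1 / lam_p))
    sd_alt = sqrt(p_t * (1 - p_t) + p_p * (1 - p_p) / lam_p)
    n_raw = (z(1 - alpha_sup) * sd_null + z(1 - beta_sup) * sd_alt) ** 2
    n_raw /= (p_t - p_p) ** 2
    n_1t = ceil(n_raw)  # rounding convention assumed for this sketch
    return n_1t, ceil(n_1t * lam_c), ceil(n_1t * lam_p)


# Scenario with p_T = 0.8 and p_P = 0.3 under three allocation choices.
sizes_111 = first_stage_sizes(0.8, 0.3, lam_c=1.0, lam_p=1.0)
sizes_1_1_half = first_stage_sizes(0.8, 0.3, lam_c=1.0, lam_p=0.5)
sizes_112 = first_stage_sizes(0.8, 0.3, lam_c=1.0, lam_p=2.0)
```

Tightening `alpha_sup` or `beta_sup` enlarges the first stage, which is the lever discussed later for avoiding a very small placebo arm.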

Once enrollment for the first stage is completed, an interim analysis is conducted and the superiority of the test treatment over placebo is tested. If the null hypothesis of (1) is not rejected, the trial is stopped for futility; otherwise, we continue to perform the effect preservation test (3). If the null hypothesis of (3) is rejected, the trial stops; otherwise the trial enters its second stage.

In the second stage, the placebo arm is dropped from the trial and enrollment continues for the other two arms with the same allocation ratio of $1 : \lambda_C$. The second stage could use a traditional group sequential design, and more interim analyses might be planned. Since multiple tests are involved in the sequential testing, the stopping boundaries for the effect preservation test are determined so as to maintain the overall type-I error rate. In the second stage the trial could be designed for early stopping for efficacy, for futility, or for both. In this paper, we only consider stopping for efficacy in the effect preservation test. Figure 1 demonstrates the testing procedure under this proposed design.

There has been debate in the literature on whether $H_{02}$ must also be rejected for the trial to be successful. Proponents argue that the rejection of $H_{02}$ is required to demonstrate assay sensitivity. In our proposed study design, assay sensitivity has been demonstrated following the rejection of $H_{01}$ in the first stage, as pointed out by one referee. However, we propose to perform the superiority test (2) the first time the null hypothesis of the effect preservation test (3) is rejected, in order to support the interpretation of the effect preservation test results. If $H_{02}$ is rejected, then the rejection of $H_{03}$ can be interpreted as an effect preservation statement. Otherwise, an effect preservation statement cannot be made because the control treatment effect is not demonstrated in the same trial. Koch and Röhmel (2004) provided an interesting discussion of the interpretation in this situation.

3.2 Testing procedure and sample size for the effect preservation test

Let $\{n_{ik}; i = T, C\}$ be the sample sizes for the test treatment and the control treatment arms at the $k$th interim analysis, where $k = 1, \ldots, K$, and let $n_P$ be the sample size for the placebo arm determined in the first stage. Denote by $Z_k^{(0)}$ the test statistic calculated from the data accumulated at the $k$th interim analysis, so that

$$Z_k^{(0)} = \frac{\hat\psi_k}{\sigma(\hat\psi_k)}, \tag{9}$$

where $\hat\psi_k = \hat p_{Tk} - r \hat p_{Ck} - (1-r) \hat p_P$, $\hat p_{ik} = X_{ik}/n_{ik}$, $i = T, C$, $\hat p_P = X_P/n_P$ and $\sigma^2(\hat\psi_k) = p_T(1-p_T)/n_{Tk} + r^2 p_C(1-p_C)/n_{Ck} + (1-r)^2 p_P(1-p_P)/n_P$. Under the null hypothesis of (3), the test statistics $(Z_1^{(0)}, \ldots, Z_K^{(0)})$ have the null joint distribution:

(i) $(Z_1^{(0)}, \ldots, Z_K^{(0)})$ follow a multivariate normal distribution asymptotically;

(ii) $E(Z_k^{(0)}) = 0$, $k = 1, \ldots, K$;

(iii) $\mathrm{cov}(Z_i^{(0)}, Z_j^{(0)}) = \sigma(\hat\psi_j)/\sigma(\hat\psi_i)$, $1 \le i \le j \le K$. (10)
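To make the covariance structure in (10) concrete, the sketch below builds the standard errors $\sigma(\hat\psi_k)$ for a made-up three-analysis design and fills in the full symmetric covariance matrix. All sample sizes and response rates in the example are hypothetical, and the function names are chosen here for illustration.

```python
from math import sqrt


def sigma_psi(p_t, p_c, p_p, r, n_t, n_c, n_p):
    """Standard error of psi_hat at one analysis (placebo size stays fixed)."""
    return sqrt(p_t * (1 - p_t) / n_t
                + r ** 2 * p_c * (1 - p_c) / n_c
                + (1 - r) ** 2 * p_p * (1 - p_p) / n_p)


def z_covariance(sigmas):
    """Covariance matrix of (Z_1, ..., Z_K): sigma_j / sigma_i for i <= j."""
    k = len(sigmas)
    return [[sigmas[max(i, j)] / sigmas[min(i, j)] for j in range(k)]
            for i in range(k)]


# Hypothetical design: the placebo arm is fixed after stage one,
# while the test and control arms keep accruing subjects.
n_t_by_look = [82, 132, 182]
n_c_by_look = [41, 66, 91]
n_p = 82
sigmas = [sigma_psi(0.8, 0.9, 0.7, 0.5, n_t, n_c, n_p)
          for n_t, n_c in zip(n_t_by_look, n_c_by_look)]
cov = z_covariance(sigmas)
```

Because the placebo term in the variance does not shrink after the first stage, the standard errors decrease slowly and the analyses are more strongly correlated than in a design with equal information increments.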

As described in Section 2.2, an appropriate estimate of the variance of the test statistic $Z_k^{(0)}$ at the $k$th interim analysis is used in the testing. In this paper, we use the RMLE method and replace $\{p_i; i = T, C, P\}$ in $Z_k^{(0)}$ by $\{\tilde p_{ik}^{(0)}; i = T, C, P\}$, which are the restricted maximum likelihood estimates based on the data accumulated at the $k$th interim analysis under the constraint $\tilde p_{Tk}^{(0)} - r \tilde p_{Ck}^{(0)} - (1-r) \tilde p_{Pk}^{(0)} = 0$.

Figure 1 Flow chart of the testing procedure under the proposed design.

A group sequential test with possible early stopping to reject the null hypothesis of (3) is defined by stopping boundary constants $\{b_k; k = 1, \ldots, K\}$:

After the $k$th interim analysis, $k = 1, \ldots, K-1$:
  if $Z_k^{(0)} \ge b_k$, stop the trial and reject $H_{03}$;
  otherwise, continue to group $k+1$.

After the final analysis $K$:
  if $Z_K^{(0)} \ge b_K$, stop the trial and reject $H_{03}$;
  if $Z_K^{(0)} < b_K$, stop the trial and accept $H_{03}$.

The type-I error for the effect preservation test (3) is

$$\mathrm{Prob}\left( Z_k^{(0)} \ge b_k \text{ for some } k = 1, \ldots, K \right), \tag{11}$$

where $\{Z_1^{(0)}, \ldots, Z_K^{(0)}\}$ follow the null distribution (10). Commonly applied tests, such as those of Pocock (1977) and O’Brien and Fleming (1979), could be used for the stopping boundaries. Each type of test uses a different sequence of critical values $\{b_1, \ldots, b_K\}$, but all are chosen to ensure that the overall one-sided type-I error is maintained at the target level $\alpha$.
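The crossing probability in (11) can be approximated by simulation for any candidate set of boundaries. The sketch below is not the numerical integration used in the paper. It is a Monte Carlo stand-in that relies on the covariance structure in (10): writing $I_k = 1/\sigma^2(\hat\psi_k)$, the process $S_k = Z_k\sqrt{I_k}$ has independent normal increments with variance $I_k - I_{k-1}$. The function name, the seed and the example inputs are assumptions for illustration.

```python
import random
from math import sqrt


def crossing_probability(sigmas, bounds, psi=0.0, n_sim=100_000, seed=2010):
    """Monte Carlo estimate of Prob(Z_k >= b_k for some k).

    sigmas: standard errors sigma(psi_hat_k), strictly decreasing in k.
    bounds: efficacy boundaries b_1, ..., b_K.
    psi:    true value of psi (0 on the boundary of the null hypothesis).
    """
    rng = random.Random(seed)
    info = [1.0 / s ** 2 for s in sigmas]  # information I_k = 1 / sigma_k^2
    hits = 0
    for _ in range(n_sim):
        score, prev_info = 0.0, 0.0
        for i_k, b_k in zip(info, bounds):
            step = i_k - prev_info
            # Independent increment of the score process S_k = psi_hat_k * I_k.
            score += rng.gauss(psi * step, sqrt(step))
            prev_info = i_k
            if score / sqrt(i_k) >= b_k:  # Z_k = S_k / sqrt(I_k)
                hits += 1
                break
    return hits / n_sim


# One analysis at 1.96 versus three equally spaced analyses at 1.96 each.
single_look = crossing_probability([1.0], [1.959964])
equal_info = [1.0, 1.0 / sqrt(2.0), 1.0 / sqrt(3.0)]
three_looks = crossing_probability(equal_info, [1.959964] * 3)
```

Reusing the unadjusted critical value at three equally spaced analyses roughly doubles the one-sided error, which is why the boundaries have to be recomputed for each design. Passing a positive `psi` gives a simulated power for the same boundaries.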

In the traditional group sequential design with equal increments in information, the boundaries depend only on the number of interim analyses. However, for the design proposed in this paper, since the sample size for the first stage is determined independently of the effect preservation test, the variance-covariance matrix of the test statistics $\{Z_1^{(0)}, \ldots, Z_K^{(0)}\}$ depends on many other factors, such as $r$, $n_{1T}$ and the response rates. We use numerical integration to determine these boundaries. Table 1 presents some examples of O’Brien & Fleming boundaries, where $\delta n_T$ is the sample size increment per additional interim analysis in the test treatment group after the first stage.

Table 1 O’Brien & Fleming boundary for efficacy; $p_T = p_C = 0.9$, $p_P = 0.7$, $K = 3$, $\lambda_C = 0.5$, $\alpha_{sup} = 2.5\%$, $\beta_{sup} = 10\%$, $\alpha = 2.5\%$.

| $r$ | $\lambda_P$ | $\delta n_T$ | Critical value $b_1$ | Critical value $b_2$ | Critical value $b_3$ | Alpha spent at $b_1$ | Alpha spent at $b_2$ | Alpha spent at $b_3$ |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 1 | 50 | 3.017 | 2.361 | 2.004 | 0.00128 | 0.0095 | 0.025 |
| 0.5 | 1 | 100 | 3.734 | 2.481 | 1.987 | 0.00009 | 0.0068 | 0.025 |
| 0.5 | 0.5 | 50 | 2.660 | 2.270 | 2.013 | 0.00391 | 0.0125 | 0.025 |
| 0.5 | 0.5 | 100 | 3.145 | 2.380 | 1.992 | 0.00083 | 0.0089 | 0.025 |
| 0.8 | 1 | 50 | 3.030 | 2.371 | 2.013 | 0.00122 | 0.0093 | 0.025 |
| 0.8 | 1 | 100 | 3.751 | 2.492 | 1.996 | 0.00009 | 0.0064 | 0.025 |
| 0.8 | 0.5 | 50 | 2.680 | 2.287 | 2.028 | 0.00368 | 0.0121 | 0.025 |
| 0.8 | 0.5 | 100 | 3.167 | 2.397 | 2.006 | 0.00077 | 0.0085 | 0.025 |

Let $\{p_i^{(1)}; i = T, C, P\}$ be the response rates under the alternative hypothesis and

$$Z_k^{(1)} = \frac{\hat\psi_k - \psi^{(1)}}{\sigma(\hat\psi_k)}, \tag{12}$$

where $\psi^{(1)} = p_T^{(1)} - r p_C^{(1)} - (1-r) p_P^{(1)}$. Under the alternative hypothesis of (3), the statistics $(Z_1^{(1)}, \ldots, Z_K^{(1)})$ have the following joint distribution:

(i) $(Z_1^{(1)}, \ldots, Z_K^{(1)})$ follow a multivariate normal distribution asymptotically;

(ii) $E(Z_k^{(1)}) = 0$, $k = 1, \ldots, K$;

(iii) $\mathrm{cov}(Z_i^{(1)}, Z_j^{(1)}) = \sigma(\hat\psi_j)/\sigma(\hat\psi_i)$, $1 \le i \le j \le K$. (13)

The power of the effect preservation test at $\psi = \psi^{(1)}$ is

$$\mathrm{Prob}\left\{ \bigcup_{k=1}^{K} \left[ Z_j^{(1)} < b_j \text{ for } j = 1, \ldots, k-1 \text{ and } Z_k^{(1)} \ge b_k \right] \right\}, \tag{14}$$

where $\{Z_1^{(1)}, \ldots, Z_K^{(1)}\}$ follow the distribution (13). For given $r$, $\lambda_C$, $\lambda_P$, $K$ and boundaries $\{b_i; i = 1, \ldots, K\}$, the sample sizes for the three treatment groups can be found such that (14) is equal to the target power $1-\beta$ under the assumed response rates $\{p_T^{(1)}, p_C^{(1)}, p_P^{(1)}\}$ at the alternative hypothesis of (3).

There is no explicit formula for the sample size, so R programs were written to search for it numerically. As in the testing procedure, the variance-covariance matrix of the test statistics $(Z_1^{(1)}, \ldots, Z_K^{(1)})$ under the alternative hypothesis depends on the treatment response rates $\{p_T, p_C, p_P\}$. In our programs, the treatment response rates at the $k$th interim analysis are replaced by either the restricted maximum likelihood estimates $\{\tilde p_{ik}^{(0)}; i = T, C, P\}$ (method 1) or the assumed response rates $\{p_i^{(1)}; i = T, C, P\}$ at the alternative hypothesis (method 2). We denote the resulting total sample sizes by $N_{00}$ and $N_{01}$, respectively. For the numerical search of the sample size that meets the target power, we start with $\delta n_T = 2$ and use the R function PMNORM to calculate the statistical power. In each step we increase $\delta n_T$ by 1 until the resulting power reaches the target level.
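The step-by-one search can be illustrated on a deliberately simplified problem. The sketch below is not the authors' R program: it replaces the multivariate normal power calculation and the boundary recomputation with the single-analysis normal approximation, so that the result can be checked against formula (6). Function names, the search cap and the example scenario are assumptions made here.

```python
from math import sqrt
from statistics import NormalDist


def fixed_design_power(n_t, p_t, p_c, p_p, r, lam_c, lam_p, alpha=0.025):
    """Approximate power of a single-analysis effect preservation test."""
    nd = NormalDist()
    tau = sqrt(p_t * (1 - p_t)
               + r ** 2 * p_c * (1 - p_c) / lam_c
               + (1 - r) ** 2 * p_p * (1 - p_p) / lam_p)
    psi_1 = p_t - r * p_c - (1 - r) * p_p
    return nd.cdf(sqrt(n_t) * psi_1 / tau - nd.inv_cdf(1 - alpha))


def search_n_t(target_power, start=2, max_n=100_000, **design):
    """Increase n_T by 1 until the approximate power reaches the target."""
    n_t = start
    while fixed_design_power(n_t, **design) < target_power:
        n_t += 1
        if n_t > max_n:
            raise RuntimeError("target power not reached within max_n")
    return n_t


# Same illustrative scenario as for formula (6): 1:1:1 allocation, r = 0.5.
n_t_found = search_n_t(0.90, p_t=0.9, p_c=0.9, p_p=0.7,
                       r=0.5, lam_c=1.0, lam_p=1.0)
```

In the sequential setting each candidate increment would also require a fresh set of boundaries before the power is evaluated, which is what makes the full search more expensive than this one-dimensional version.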

Since the boundary values $\{b_1, \ldots, b_K\}$ depend on $\delta n_T$, new boundary values have to be determined in each step. Table 2 provides the maximum and average sample sizes for some scenarios with $K = 3$.

Table 2 Total sample sizes for two sample size estimation methods based on the RMLE testing method for the effect preservation test; $K = 3$, $\alpha_{sup} = 2.5\%$, $\beta_{sup} = 10\%$, $\alpha = 2.5\%$, $\beta = 10\%$.

| $(p_T, p_C, p_P)$ | $r$ | $(\lambda_C, \lambda_P)$ | $N_{00}$ Max | $N_{00}$ Mean | $N_{01}$ Max | $N_{01}$ Mean |
|---|---|---|---|---|---|---|
| (0.9, 0.9, 0.7) | 0.5 | (1, 1) | 698 | 551 | 734 | 575 |
| (0.9, 0.9, 0.7) | 0.5 | (1, 0.5) | 834 | 630 | 1034 | 766 |
| (0.9, 0.9, 0.7) | 0.5 | (0.5, 0.5) | 704 | 538 | 887 | 660 |
| (0.9, 0.9, 0.7) | 0.5 | (1, 2) | 632 | 519 | 612 | 506 |
| (0.9, 0.9, 0.5) | 0.7 | (1, 1) | 522 | 409 | 538 | 419 |
| (0.9, 0.9, 0.5) | 0.7 | (1, 0.5) | 645 | 494 | 761 | 565 |
| (0.9, 0.9, 0.5) | 0.7 | (0.5, 0.5) | 570 | 435 | 696 | 513 |
| (0.9, 0.9, 0.5) | 0.7 | (1, 2) | 440 | 355 | 420 | 344 |
| (0.8, 0.8, 0.5) | 0.6 | (1, 1) | 768 | 599 | 776 | 603 |
| (0.8, 0.8, 0.5) | 0.6 | (1, 0.5) | 1020 | 771 | 1128 | 838 |
| (0.8, 0.8, 0.5) | 0.6 | (0.5, 0.5) | 919 | 695 | 1033 | 766 |
| (0.8, 0.8, 0.5) | 0.6 | (1, 2) | 640 | 516 | 624 | 507 |
| (0.8, 0.8, 0.3) | 0.8 | (1, 1) | 1277 | 983 | 1185 | 928 |
| (0.8, 0.8, 0.3) | 0.8 | (1, 0.5) | 1834 | 1372 | 1718 | 1300 |
| (0.8, 0.8, 0.3) | 0.8 | (0.5, 0.5) | 1829 | 1365 | 1733 | 1306 |
| (0.8, 0.8, 0.3) | 0.8 | (1, 2) | 964 | 767 | 912 | 737 |

It is noted from Table 2 that when the assumed response rate for placebo is much lower than that forthe test treatment, the required total sample size might be very high. This is due to the fact that thenumber of subjects in the placebo arm is determined by the superiority test (1) in the first stage. A largedifference in the response rates between the test treatment and placebo would require a small sample sizefor the placebo group and subsequently a large sample size for the effect preservation test. To avoid toosmall number of subjects in the placebo arm, we could estimate the sample size for the superiority test(1) at a different significance level or a different power level. For example, we can choose to set the type-Ierror at a lower level, say asup50.001 or power the study at a higher level, say 1�bsup599%. In thisway, the total sample sizes are significantly reduced while the number of subjects in the placebo groupdoes not increased much. The following Table 3 demonstrates the sample size reductions under theseapproaches. It indicates that the total sample size decreases with nP in a certain range. In addition,appropriate choice of the allocation ratios will also help reducing the total sample size. In practical trials,we may be able to choose an nP which corresponds to an appropriate total sample size numerically.

3.3 Monte-Carlo simulations

In this section, we investigate the performance of the sequential effect preservation test in the proposed design. First, we examine the type-I error of the RMLE testing method. Second, we compare the power of the two testing methods mentioned in Section 3.2.

Table 3 Total sample sizes based on the RMLE testing method for the effect preservation test when the sample size in the first stage is estimated under the specified $\alpha_{sup}$ and $\beta_{sup}$; $K = 3$, $p_T = p_C = 0.8$, $p_P = 0.3$, $\rho = 0.8$, $\alpha = 2.5\%$, $\beta = 10\%$.

| ($\lambda_C$, $\lambda_P$) | $\alpha_{sup}$ (%) | $\beta_{sup}$ (%) | $n_P$ | $N_{00}$ Max | $N_{00}$ Mean | $N_{01}$ Max | $N_{01}$ Mean |
|---|---|---|---|---|---|---|---|
| (1,1) | 2.5 | 10 | 19 | 1277 | 983 | 1185 | 928 |
| (1,1) | 0.1 | 10 | 35 | 873 | 697 | 837 | 676 |
| (1,1) | 2.5 | 1 | 32 | 904 | 719 | 864 | 695 |
| (1,1) | 0.1 | 1 | 52 | 788 | 637 | 764 | 622 |
| (1,0.5) | 2.5 | 10 | 14 | 1834 | 1372 | 1718 | 1300 |
| (1,0.5) | 0.1 | 10 | 26 | 1004 | 783 | 948 | 749 |
| (1,0.5) | 2.5 | 1 | 24 | 1058 | 822 | 994 | 783 |
| (1,0.5) | 0.1 | 1 | 38 | 850 | 673 | 818 | 653 |
| (0.5,0.5) | 2.5 | 10 | 14 | 1829 | 1365 | 1733 | 1306 |
| (0.5,0.5) | 0.1 | 10 | 26 | 964 | 752 | 940 | 738 |
| (0.5,0.5) | 2.5 | 1 | 24 | 1019 | 792 | 986 | 771 |
| (0.5,0.5) | 0.1 | 1 | 38 | 806 | 639 | 800 | 635 |
| (1,2) | 2.5 | 10 | 28 | 964 | 767 | 912 | 737 |
| (1,2) | 0.1 | 10 | 52 | 788 | 645 | 764 | 631 |
| (1,2) | 2.5 | 1 | 46 | 808 | 658 | 780 | 642 |
| (1,2) | 0.1 | 1 | 78 | 760 | 627 | 740 | 616 |

Given the nominal one-sided type-I error of $\alpha$, the critical region for the sequential effect preservation test is given by the stopping boundary $\{b_1, \ldots, b_K\}$, which is determined numerically. The type-I error and power are given in (11) and (14), respectively. To examine the type-I error of the RMLE method, we consider the scenarios with $(p_C, p_P) = (0.9, 0.6), (0.75, 0.4), (0.6, 0.2)$ and $p_T = \rho p_C + (1-\rho) p_P$ with $\rho = 0.5, 0.8$. To examine the statistical power, we assume $p_T = p_C$ at the alternative, so that $(p_C - p_P)/(p_T - p_P) = 1$. We consider the following three allocation ratios (test:control:placebo): (i) 1:1:1; (ii) 1:1:0.5; (iii) 1:0.5:0.5, with the sample size increment in the second stage $\delta n_T = 40, 80, 120$ and $K = 3, 4, 5$. For each scenario, simulations with 10000 replications were performed. The simulated type-I error and power are the proportions of the simulations that cross the boundaries $\{b_1, \ldots, b_K\}$ under $p_T = \rho p_C + (1-\rho) p_P$ and $(p_C - p_P)/(p_T - p_P) = 1$, respectively.
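The simulation setup can be illustrated with a much simplified sketch. It draws binomial data at the boundary of the null hypothesis, $p_T = \rho p_C + (1-\rho) p_P$, and estimates the rejection rate of a single-look Wald-type test that plugs the unrestricted sample proportions into the variance. This is a deliberate simplification and not the sequential RMLE test studied in the paper: there is only one analysis ($K = 1$), the critical value is the fixed normal quantile, and the arm size, number of replications, seed and function names are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_statistic(x_t, n_t, x_c, n_c, x_p, n_p, rho):
    """Wald-type statistic for H0: p_T - rho*p_C - (1-rho)*p_P <= 0."""
    pt, pc, pp = x_t / n_t, x_c / n_c, x_p / n_p
    estimate = pt - rho * pc - (1.0 - rho) * pp
    variance = (pt * (1 - pt) / n_t
                + rho ** 2 * pc * (1 - pc) / n_c
                + (1 - rho) ** 2 * pp * (1 - pp) / n_p)
    if variance <= 0.0:
        return 0.0  # degenerate sample; treat as no evidence against H0
    return estimate / sqrt(variance)


def binomial(rng, n, p):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def simulated_type1_error(p_c, p_p, rho, n_per_arm, alpha=0.025,
                          replications=2000, seed=2010):
    """Share of simulated trials rejecting H0 when H0 holds with equality."""
    rng = random.Random(seed)
    p_t = rho * p_c + (1.0 - rho) * p_p  # boundary of the null hypothesis
    critical = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(replications):
        z = wald_statistic(binomial(rng, n_per_arm, p_t), n_per_arm,
                           binomial(rng, n_per_arm, p_c), n_per_arm,
                           binomial(rng, n_per_arm, p_p), n_per_arm, rho)
        rejections += z > critical
    return rejections / replications


rate = simulated_type1_error(p_c=0.75, p_p=0.4, rho=0.5, n_per_arm=150)
print(f"simulated type-I error: {100 * rate:.2f}%")
```

Replacing the single look by $K$ looks with the numerically determined boundary, and the plug-in variance by the RMLE-based variance, would bring the sketch closer to the procedure evaluated in Table 4.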

Table 4 displays the simulated type-I error of the RMLE method. The type-I error rates for the RMLE test are well maintained for the simulated scenarios: none exceeds the nominal level by more than 20%, i.e. all are at most 3.0% for the nominal 2.5%. This suggests that the RMLE test is robust according to Cochran’s criterion (Cochran, 1952). Table 5 gives the simulated power of the RMLE method for the effect preservation test. Comparatively, method 2 shows slightly more consistent power than method 1. For all investigated scenarios, however, the simulated power of both methods is close to the nominal power, and the power difference between the two methods is minor.

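Table 4 Simulation results on the type-I error rate (%) for the effect preservation test at $p_T = \rho p_C + (1-\rho) p_P$ based on the RMLE method; $\alpha_{sup} = 2.5\%$, $\beta_{sup} = 10\%$, $\alpha = 2.5\%$.

| $K$ | ($p_C$, $p_P$) | ($\lambda_C$, $\lambda_P$) | $\delta n_T = 40$, $\rho = 0.5$ | $\delta n_T = 40$, $\rho = 0.8$ | $\delta n_T = 80$, $\rho = 0.5$ | $\delta n_T = 80$, $\rho = 0.8$ | $\delta n_T = 120$, $\rho = 0.5$ | $\delta n_T = 120$, $\rho = 0.8$ |
|---|---|---|---|---|---|---|---|---|
| 3 | (0.9,0.6) | (1,1) | 2.48 | 2.41 | 2.41 | 2.54 | 2.46 | 2.46 |
| 3 | (0.9,0.6) | (1,0.5) | 2.80 | 2.26 | 2.69 | 2.67 | 2.87 | 2.25 |
| 3 | (0.9,0.6) | (0.5,0.5) | 2.51 | 2.30 | 2.44 | 2.53 | 2.26 | 2.46 |
| 3 | (0.75,0.4) | (1,1) | 2.63 | 2.42 | 2.51 | 2.27 | 2.34 | 2.50 |
| 3 | (0.75,0.4) | (1,0.5) | 2.35 | 2.45 | 2.42 | 2.67 | 2.28 | 2.51 |
| 3 | (0.75,0.4) | (0.5,0.5) | 2.55 | 2.37 | 2.53 | 2.36 | 2.19 | 2.51 |
| 3 | (0.6,0.2) | (1,1) | 2.42 | 2.60 | 2.17 | 2.37 | 2.34 | 2.53 |
| 3 | (0.6,0.2) | (1,0.5) | 2.30 | 2.76 | 2.43 | 2.46 | 2.04 | 2.42 |
| 3 | (0.6,0.2) | (0.5,0.5) | 2.71 | 2.27 | 2.14 | 2.47 | 2.09 | 2.46 |
| 4 | (0.9,0.6) | (1,1) | 2.35 | 2.25 | 2.38 | 2.68 | 2.58 | 2.61 |
| 4 | (0.9,0.6) | (1,0.5) | 2.54 | 2.17 | 2.63 | 2.26 | 2.60 | 2.46 |
| 4 | (0.9,0.6) | (0.5,0.5) | 2.37 | 2.32 | 2.41 | 2.41 | 2.64 | 2.56 |
| 4 | (0.75,0.4) | (1,1) | 2.52 | 2.46 | 2.48 | 2.65 | 2.60 | 2.66 |
| 4 | (0.75,0.4) | (1,0.5) | 2.23 | 2.55 | 2.35 | 2.58 | 2.30 | 2.90 |
| 4 | (0.75,0.4) | (0.5,0.5) | 2.36 | 2.68 | 2.49 | 2.46 | 2.12 | 2.66 |
| 4 | (0.6,0.2) | (1,1) | 2.25 | 2.53 | 2.25 | 2.30 | 2.32 | 2.50 |
| 4 | (0.6,0.2) | (1,0.5) | 2.46 | 2.57 | 1.88 | 2.58 | 1.96 | 2.68 |
| 4 | (0.6,0.2) | (0.5,0.5) | 2.15 | 2.53 | 2.14 | 2.49 | 2.02 | 2.49 |
| 5 | (0.9,0.6) | (1,1) | 2.76 | 2.32 | 2.33 | 2.48 | 2.50 | 2.67 |
| 5 | (0.9,0.6) | (1,0.5) | 2.45 | 2.32 | 2.57 | 2.83 | 2.73 | 2.87 |
| 5 | (0.9,0.6) | (0.5,0.5) | 2.25 | 2.64 | 2.64 | 2.74 | 2.23 | 2.36 |
| 5 | (0.75,0.4) | (1,1) | 2.34 | 2.70 | 2.20 | 2.28 | 2.40 | 2.64 |
| 5 | (0.75,0.4) | (1,0.5) | 2.41 | 2.69 | 2.51 | 2.25 | 2.30 | 2.33 |
| 5 | (0.75,0.4) | (0.5,0.5) | 2.19 | 2.35 | 2.38 | 2.31 | 2.42 | 2.48 |
| 5 | (0.6,0.2) | (1,1) | 2.36 | 2.52 | 2.18 | 2.27 | 1.81 | 2.15 |
| 5 | (0.6,0.2) | (1,0.5) | 2.24 | 2.58 | 1.96 | 2.32 | 1.75 | 2.69 |
| 5 | (0.6,0.2) | (0.5,0.5) | 2.24 | 2.51 | 1.72 | 2.32 | 1.99 | 2.38 |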

3.4 A hypothetical example

In this section we use a hypothetical example to demonstrate the test procedure for the proposed design. Suppose we have designed a three-arm non-inferiority trial in acute bacterial otitis media. Because of resistance to the current treatment for acute bacterial otitis media and the lack of historical placebo data, assumptions on the efficacy rates for the treatments are made with high uncertainty. Hypothetically, we assume the efficacy rates are 85%, 85% and 65% for the test, control and placebo, respectively. Assuming the 1:1:1 allocation ratio, the first-stage sample size is $(n_{T1}, n_{C1}, n_P) = (97, 97, 97)$. With a maximum of $K = 4$ sequential effect preservation tests and a target of 50% of the control effect to be preserved by the test treatment, the O’Brien & Fleming type boundary is $(b_1, b_2, b_3, b_4) = (4.167, 2.866, 2.320, 2.000)$. To achieve 90% power for the effect preservation test, the sample size increment $\delta n_T$ is 108 in stage 2 of the design.
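The boundary values quoted above can be checked numerically. The sketch below assumes an O’Brien & Fleming type shape of the form $b_k = c\sqrt{n_{T,K}/n_{T,k}}$, where $n_{T,k} = n_{T1} + (k-1)\,\delta n_T$ is the cumulative test-arm sample size at analysis $k$ and $c$ is the final critical value. Both the functional form and the use of the test-arm sample size as the information scale are assumptions inferred from the reported numbers, not formulas stated in this section; $c = 2.000$ is taken directly from the reported $b_4$ rather than derived. The function name is hypothetical.

```python
from math import sqrt


def obf_type_boundaries(n_first, increment, num_looks, final_value):
    """O'Brien & Fleming type boundaries scaled by cumulative sample size.

    Assumed shape: b_k = final_value * sqrt(n_K / n_k), with
    n_k = n_first + (k - 1) * increment.
    """
    sizes = [n_first + k * increment for k in range(num_looks)]
    n_max = sizes[-1]
    return sizes, [final_value * sqrt(n_max / n_k) for n_k in sizes]


# Hypothetical example: n_T1 = 97, increment 108, K = 4, b_4 = 2.000.
sizes, bounds = obf_type_boundaries(97, 108, 4, 2.000)
for k, (n_k, b_k) in enumerate(zip(sizes, bounds), start=1):
    print(f"analysis {k}: n_T = {n_k}, boundary = {b_k:.3f}")
```

Under these assumptions the cumulative sizes are 97, 205, 313 and 421, and the boundaries round to 4.167, 2.866, 2.320 and 2.000, in agreement with the values reported for the example.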

Table 5 Simulation results on the power (%) at $p_T = p_C$ for the effect preservation test based on the RMLE method; $\alpha_{sup} = 2.5\%$, $\beta_{sup} = 10\%$, $\alpha = 2.5\%$, $\beta = 10\%$.

| $K$ | ($p_C$, $p_P$) | ($\lambda_C$, $\lambda_P$) | $\rho = 0.5$, Method 1 | $\rho = 0.5$, Method 2 | $\rho = 0.8$, Method 1 | $\rho = 0.8$, Method 2 |
|---|---|---|---|---|---|---|
| 3 | (0.9,0.6) | (1,1) | 90.61 | 90.42 | 89.03 | 89.80 |
| 3 | (0.9,0.6) | (1,0.5) | 89.42 | 91.04 | 88.09 | 90.62 |
| 3 | (0.9,0.6) | (0.5,0.5) | 88.71 | 90.89 | 87.76 | 90.41 |
| 3 | (0.75,0.4) | (1,1) | 90.74 | 89.89 | 89.60 | 90.07 |
| 3 | (0.75,0.4) | (1,0.5) | 89.84 | 89.76 | 89.77 | 90.26 |
| 3 | (0.75,0.4) | (0.5,0.5) | 89.35 | 89.74 | 89.34 | 89.74 |
| 3 | (0.6,0.2) | (1,1) | 92.46 | 90.27 | 92.51 | 90.12 |
| 3 | (0.6,0.2) | (1,0.5) | 93.49 | 90.47 | 92.50 | 89.87 |
| 3 | (0.6,0.2) | (0.5,0.5) | 93.00 | 89.54 | 90.98 | 90.10 |
| 4 | (0.9,0.6) | (1,1) | 90.32 | 91.10 | 88.95 | 90.36 |
| 4 | (0.9,0.6) | (1,0.5) | 89.47 | 91.19 | 87.99 | 90.61 |
| 4 | (0.9,0.6) | (0.5,0.5) | 89.25 | 90.69 | 87.96 | 90.17 |
| 4 | (0.75,0.4) | (1,1) | 90.22 | 90.62 | 90.24 | 90.29 |
| 4 | (0.75,0.4) | (1,0.5) | 89.89 | 90.30 | 90.44 | 89.75 |
| 4 | (0.75,0.4) | (0.5,0.5) | 90.37 | 90.33 | 89.93 | 89.63 |
| 4 | (0.6,0.2) | (1,1) | 92.63 | 90.15 | 92.80 | 90.09 |
| 4 | (0.6,0.2) | (1,0.5) | 93.15 | 89.69 | 93.46 | 89.79 |
| 4 | (0.6,0.2) | (0.5,0.5) | 93.53 | 89.78 | 92.93 | 89.91 |
| 5 | (0.9,0.6) | (1,1) | 90.06 | 91.20 | 89.32 | 90.47 |
| 5 | (0.9,0.6) | (1,0.5) | 89.36 | 91.41 | 87.97 | 90.53 |
| 5 | (0.9,0.6) | (0.5,0.5) | 89.24 | 91.49 | 88.33 | 90.02 |
| 5 | (0.75,0.4) | (1,1) | 90.60 | 90.04 | 89.93 | 89.43 |
| 5 | (0.75,0.4) | (1,0.5) | 90.34 | 90.87 | 89.70 | 89.74 |
| 5 | (0.75,0.4) | (0.5,0.5) | 89.87 | 90.41 | 90.22 | 90.06 |
| 5 | (0.6,0.2) | (1,1) | 92.79 | 89.98 | 92.46 | 90.29 |
| 5 | (0.6,0.2) | (1,0.5) | 93.39 | 89.57 | 89.23 | 89.90 |
| 5 | (0.6,0.2) | (0.5,0.5) | 93.09 | 89.85 | 92.73 | 89.85 |

Assuming the true efficacy rates $(p_T, p_C, p_P) = (0.82, 0.82, 0.5)$, a hypothetical trial is generated by Monte Carlo simulation. The simulated data and testing results are summarized in Table 6. In the first stage, the test treatment is shown to be superior to placebo as the test statistic (3.408) is higher than 1.96. The effect preservation test statistic (1.393) in the first interim analysis does not cross the boundary (4.167), and enrollment continues for the test and control treatment groups. In the second interim analysis, the test statistic (1.804) is still smaller than the boundary value (2.866) and enrollment continues. The third interim analysis rejects the null hypothesis of the effect preservation test as the test statistic (2.937) is higher than the boundary value (2.320). It is also shown in this interim analysis that the control treatment is superior to placebo (test statistic 5.048).
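The walk-through above can be retraced in a few lines. The sketch recomputes the two superiority statistics from the counts in Table 6 using an unpooled Wald z-statistic, which is an assumption about the form of the superiority test (the test itself is defined earlier in the paper), and then compares the reported effect preservation statistics with the boundary to find the first analysis at which the null hypothesis is rejected. The effect preservation statistics are copied from Table 6 rather than recomputed, since they rely on the RMLE variance estimate. Function names are hypothetical.

```python
from math import sqrt


def wald_z(x1, n1, x2, n2):
    """Unpooled Wald z-statistic for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def first_rejection(statistics, boundaries):
    """1-based index of the first analysis whose statistic exceeds its boundary."""
    for k, (z, b) in enumerate(zip(statistics, boundaries), start=1):
        if z > b:
            return k
    return None  # boundary never crossed in the analyses performed


z_tp = wald_z(74, 97, 52, 97)     # T versus P, first stage
z_cp = wald_z(255, 313, 52, 97)   # C versus P, third interim analysis
stop = first_rejection([1.393, 1.804, 2.937], [4.167, 2.866, 2.320, 2.000])
print(f"T vs P: {z_tp:.3f}, C vs P: {z_cp:.3f}, rejection at analysis {stop}")
```

Under the Wald assumption the two superiority statistics round to 3.408 and 5.048, matching Table 6, and the boundary is first crossed at the third analysis, so the fourth analysis is never performed.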

4 Design comparison

In practical settings, there exist situations in which the placebo response rate is either highly variable or unknown, and the assumption on the placebo response rate is made with large uncertainty. It is important to study the impact of this uncertainty on the study design and the operating characteristics of the effect preservation test.

In this section we compare the proposed design with the fixed sample design under three situations. We assume $p_T = p_C = 0.9$ to be the true response rates for the test and the control treatment. Let $p_P$ and $\tilde{p}_P$ be the assumed and true placebo response rates, respectively, and $\varepsilon = \tilde{p}_P / p_P$. $\varepsilon > 1$ implies that the placebo response rate is underestimated, while $\varepsilon < 1$ corresponds to an overestimated placebo response rate.

In the first situation, we make a conservative assumption on the placebo response rate with $p_P = 0.7$. The superiority test of the test treatment versus placebo is powered at a treatment difference of 0.2, which is assumed to be minimally clinically important. Compared with the fixed sample design, the proposed design has the advantage of early stopping for efficacy if the placebo response rate is overestimated in the design. We explore the operating characteristics of the test when the placebo rate is overestimated with $\varepsilon = 0.9, 0.8, 0.7$. In the second situation, we assume a large difference between the test treatment and placebo with an underestimated $p_P = 0.3$ and $\varepsilon = 1.1, 1.2, 1.3$. Lastly, we assume a modest placebo response rate $p_P = 0.5$, which may be considered close to the true placebo response rate. Correspondingly, we consider a smaller uncertainty around this assumption with $\varepsilon = 1.05$ or $0.95$. For all three situations, we have also included $\varepsilon = 1$ as a reference. With the different assumptions made on the placebo rate and treatment difference in the three situations, the proportions of the control treatment effect to be preserved by the test treatment are chosen as $\rho = 0.5$, $0.8$ and $0.65$, respectively.
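To make the three situations concrete, the short sketch below lists the true placebo response rate $\tilde{p}_P = \varepsilon\, p_P$ implied by each assumed rate and misspecification factor, together with the true difference from the test treatment ($p_T = 0.9$). The dictionary layout and variable names are illustrative only.

```python
P_TEST = 0.9  # true response rate of the test and control treatments

# assumed placebo rate -> misspecification factors epsilon (incl. reference 1.0)
situations = {
    0.7: [1.0, 0.9, 0.8, 0.7],   # assumed placebo rate too high
    0.3: [1.0, 1.1, 1.2, 1.3],   # assumed placebo rate too low
    0.5: [0.95, 1.0, 1.05],      # small uncertainty in either direction
}

# True placebo rate is epsilon times the assumed rate.
true_rates = {
    assumed: [round(eps * assumed, 3) for eps in eps_values]
    for assumed, eps_values in situations.items()
}

for assumed, rates in true_rates.items():
    for eps, rate in zip(situations[assumed], rates):
        print(f"assumed p_P={assumed}, eps={eps}: true p_P={rate:.3f}, "
              f"true difference T - P={P_TEST - rate:.3f}")
```

In the first situation, for instance, $\varepsilon = 0.7$ corresponds to a true placebo rate of 0.49, so the true difference between the test treatment and placebo is 0.41 rather than the 0.2 used for powering the superiority test.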

Simulations are conducted for each scenario with 10000 replications to produce the average total sample size and testing power. Table 7 presents the comparisons of the proposed design with the fixed sample design. For all three situations, the power of the effect preservation test for the proposed design is comparable to that for the fixed sample design, while the proposed design needs a smaller average total sample size in all cases. In addition, since the sample size for the placebo arm is determined by the superiority test (1), the proposed design requires far fewer subjects in the placebo arm than the fixed sample design.
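The size of the saving can be read off Table 7. As a small worked example, the sketch below computes the relative reduction in expected total sample size of the proposed design against the fixed sample design for the (1,1) allocation at $\varepsilon = 1$, using values copied from Table 7; the helper name is hypothetical.

```python
def relative_reduction(expected_n_proposed, n_fixed):
    """Fractional reduction in total sample size versus the fixed design."""
    return 1.0 - expected_n_proposed / n_fixed


# (assumed p_P, E(N01) proposed, N01 fixed); allocation (1,1), epsilon = 1.0
rows = [(0.7, 405, 456), (0.5, 202, 243), (0.3, 244, 312)]

reductions = {p_p: relative_reduction(e_n, n_fix) for p_p, e_n, n_fix in rows}
for p_p, saving in reductions.items():
    print(f"assumed p_P={p_p}: {100 * saving:.1f}% fewer subjects on average")
```

For these three rows the average saving is roughly 11%, 17% and 22%, respectively, with the largest saving in the situation where the assumed placebo rate is lowest.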

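Table 6 Simulation data and testing results for a hypothetical example

| Interim | Data: T | Data: C | Data: P | Superiority, T versus P | Pass? | Superiority, C versus P | Pass? | Effect preservation statistic | Pass? |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 74/97 | 81/97 | 52/97 | 3.408 | Yes | | | 1.393 | No |
| 2 | 156/205 | 171/205 | 52/97 | | | | | 1.804 | No |
| 3 | 245/313 | 255/313 | 52/97 | | | 5.048 | Yes | 2.937 | Yes |
| 4 | NA | NA | NA | | | | | | |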

Table 7 Comparisons between the proposed design and the fixed sample design when there are uncertainties on the placebo response rate; $p_T = p_C = 0.9$, $\alpha_{sup} = 2.5\%$, $\alpha = 2.5\%$, $\beta = 20\%$.

| $p_P$ | $\rho$ | $\beta_{sup}$ | $\varepsilon$ | ($\lambda_C$, $\lambda_P$) | Proposed $n_P$ | Proposed $E(N_{01})$ | Proposed power (%) | Fixed $n_P$ | Fixed $N_{01}$ | Fixed power (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.7 | 0.5 | 10% | 1.0 | (1,1) | 82 | 405 | 80.98 | 152 | 456 | 80.51 |
| 0.7 | 0.5 | 10% | 1.0 | (1,2) | 126 | 404 | 81.68 | 272 | 544 | 80.99 |
| 0.7 | 0.5 | 10% | 1.0 | (0.5,0.5) | 60 | 383 | 80.63 | 98 | 391 | 80.11 |
| 0.7 | 0.5 | 10% | 0.9 | (1,1) | 82 | 355 | 95.92 | 152 | 456 | 96.19 |
| 0.7 | 0.5 | 10% | 0.9 | (1,2) | 126 | 365 | 96.46 | 272 | 544 | 96.36 |
| 0.7 | 0.5 | 10% | 0.9 | (0.5,0.5) | 60 | 326 | 95.85 | 98 | 391 | 96.33 |
| 0.7 | 0.5 | 10% | 0.8 | (1,1) | 82 | 310 | 99.53 | 152 | 456 | 99.59 |
| 0.7 | 0.5 | 10% | 0.8 | (1,2) | 126 | 330 | 99.71 | 272 | 544 | 99.55 |
| 0.7 | 0.5 | 10% | 0.8 | (0.5,0.5) | 60 | 333 | 99.46 | 98 | 391 | 99.67 |
| 0.7 | 0.5 | 10% | 0.7 | (1,1) | 82 | 276 | 99.97 | 152 | 456 | 99.98 |
| 0.7 | 0.5 | 10% | 0.7 | (1,2) | 126 | 301 | 99.99 | 272 | 544 | 100 |
| 0.7 | 0.5 | 10% | 0.7 | (0.5,0.5) | 60 | 255 | 99.93 | 98 | 391 | 99.99 |
| 0.5 | 0.65 | 1% | 0.95 | (1,1) | 44 | 198 | 85.35 | 81 | 243 | 86.24 |
| 0.5 | 0.65 | 1% | 0.95 | (1,2) | 64 | 207 | 85.51 | 152 | 304 | 86.19 |
| 0.5 | 0.65 | 1% | 0.95 | (0.5,0.5) | 33 | 170 | 84.58 | 50 | 199 | 85.92 |
| 0.5 | 0.65 | 1% | 1.0 | (1,1) | 44 | 202 | 81.58 | 81 | 243 | 81.17 |
| 0.5 | 0.65 | 1% | 1.0 | (1,2) | 64 | 210 | 81.55 | 152 | 304 | 81.78 |
| 0.5 | 0.65 | 1% | 1.0 | (0.5,0.5) | 33 | 174 | 80.18 | 50 | 199 | 81.47 |
| 0.5 | 0.65 | 1% | 1.05 | (1,1) | 44 | 206 | 76.31 | 81 | 243 | 77.17 |
| 0.5 | 0.65 | 1% | 1.05 | (1,2) | 64 | 213 | 76.50 | 152 | 304 | 76.70 |
| 0.5 | 0.65 | 1% | 1.05 | (0.5,0.5) | 33 | 178 | 76.20 | 50 | 199 | 76.06 |
| 0.3 | 0.8 | 1% | 1.0 | (1,1) | 20 | 244 | 81.01 | 104 | 312 | 81.41 |
| 0.3 | 0.8 | 1% | 1.0 | (1,2) | 28 | 239 | 80.96 | 204 | 408 | 82.16 |
| 0.3 | 0.8 | 1% | 1.0 | (0.5,0.5) | 15 | 229 | 81.55 | 63 | 251 | 82.52 |
| 0.3 | 0.8 | 1% | 1.1 | (1,1) | 20 | 248 | 77.37 | 104 | 312 | 77.94 |
| 0.3 | 0.8 | 1% | 1.1 | (1,2) | 28 | 243 | 76.96 | 204 | 408 | 77.04 |
| 0.3 | 0.8 | 1% | 1.1 | (0.5,0.5) | 15 | 234 | 77.39 | 63 | 251 | 77.95 |
| 0.3 | 0.8 | 1% | 1.2 | (1,1) | 20 | 251 | 72.72 | 104 | 312 | 73.09 |
| 0.3 | 0.8 | 1% | 1.2 | (1,2) | 28 | 246 | 73.16 | 204 | 408 | 72.87 |
| 0.3 | 0.8 | 1% | 1.2 | (0.5,0.5) | 15 | 237 | 71.99 | 63 | 251 | 73.08 |
| 0.3 | 0.8 | 1% | 1.3 | (1,1) | 20 | 255 | 68.37 | 104 | 312 | 67.43 |
| 0.3 | 0.8 | 1% | 1.3 | (1,2) | 28 | 249 | 68.32 | 204 | 408 | 68.01 |
| 0.3 | 0.8 | 1% | 1.3 | (0.5,0.5) | 15 | 241 | 67.47 | 63 | 251 | 68.60 |

5 Discussion

In this paper we have argued that the three-arm non-inferiority design has a natural hierarchical structure for its two major objectives. The superiority of the test treatment versus placebo must be demonstrated first before proceeding to conduct the effect preservation test. However, sample sizes for fixed sample designs are determined in the reverse order: in the fixed sample design, the sample sizes are usually calculated based on the effect preservation test. Whenever it is difficult to elicit the placebo response rate, there may be considerable uncertainty, which may have a large impact on the trial design since it is crucial to demonstrate superiority of the test treatment over placebo. An overestimated placebo response rate will result in a large trial, while an underestimated placebo response rate carries a risk of failing to show superiority.

We propose a group sequential type design which naturally accommodates the hierarchical structure of the two major objectives of the non-inferiority trial. Both the sample size determination and the testing procedure follow this structure. It offers great flexibility through the early stopping options, which are especially useful when there are great uncertainties in the study assumptions. In the comparisons with the fixed sample design, simulation results demonstrate that the proposed design has better operating characteristics. In this paper, we only consider stopping early for efficacy in the sequential tests for effect preservation. More options can be considered, such as stopping for futility and sample size re-estimation. In future research we will investigate these options.

Like classic group sequential designs, the proposed design has more operational challenges than the fixed sample design in the conduct of clinical trials. When implementing the proposed design, an independent data monitoring committee (IDMC) may need to be considered to maintain the validity and integrity of the clinical trial. In addition, the IDMC should reveal only limited information, together with its recommendations, to investigators or sponsors about treatment effects and statistical methods in order to minimize biases. As with any trial design in which treatment arms are dropped during the trial, special handling should be in place to maintain the trial blinding and data integrity. For example, as pointed out by the referees and in the EMEA reflection paper on adaptive designs (EMEA/CHMP, 2007), the patient population enrolled in stage 2 might differ (e.g. be more severe) from that enrolled in stage 1 if the time at which the placebo arm is dropped is known to the investigators. The reflection paper suggests attempting to restrict such knowledge during the trial. In study planning, the target patient population should also be defined carefully, using inclusion/exclusion criteria that are as objective as possible. In addition, if the severity of the disease can be monitored based on the blinded data collected from the patients, close monitoring of such data could be helpful. In the analysis stage, checks for consistency between the results from different stages are also recommended.

Acknowledgements The authors would like to thank Professor Man-Lai Tang, two anonymous referees and the editor for their thorough review and valuable comments which greatly improved the manuscript.

Conflict of Interest

The authors have declared no conflict of interest.

References

Cochran, W. G. (1952). The $\chi^2$ test of goodness of fit. Annals of Mathematical Statistics 23, 315–345.

EMEA Committee for Medicinal Products for Human Use. (2007). Reflection paper on methodological issues in confirmatory clinical trials planned with an adaptive design, Doc Ref. CHMP/EWP/2459/02.

FDA. (2010). Draft guidance for industry on non-inferiority clinical trials. Food and Drug Administration, HHS. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM202140.pdf.

Fleming, T. R. (1987). Treatment evaluation in active control studies. Cancer Treatment Report 71, 1061–1064.

Holmgren, E. B. (1999). Establishing equivalence by showing that a prespecified percentage of the effect of the active control over placebo is maintained. Journal of Biopharmaceutical Statistics 9, 651–659.

Hung, H. M. J., Wang, S. J., O’Neill, R. (2005). A regulatory perspective on choice of margin and statistical inference issue in non-inferiority trials. Biometrical Journal 47, 28–36.

Hung, H. M. J., Wang, S. J., O’Neill, R. (2007). Issues with statistical risks for testing methods in non-inferiority trial without a placebo arm. Journal of Biopharmaceutical Statistics 17, 201–213.

Hung, H. M. J., Wang, S. J., O’Neill, R. (2009). Challenges and regulatory experiences with non-inferiority trial design without placebo arm. Biometrical Journal 51, 324–334.

International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. (2000). Choice of control group in clinical trials. CPMP/ICH/346/96.

Kieser, M. and Friede, T. (2007). Planning and analysis of three-arm non-inferiority trials with binary endpoints. Statistics in Medicine 26, 253–273.

Koch, A. and Röhmel, J. (2004). Hypothesis testing in the "gold standard" design for proving the efficacy of an experimental treatment relative to placebo and a reference. Journal of Biopharmaceutical Statistics 14, 315–325.

Marcus, R., Peritz, E. and Gabriel, K. R. (1976). On closed testing procedure with special reference to ordered analysis of variance. Biometrika 63, 655–660.

O’Brien, P. C. and Fleming, T. R. (1979). A multiple testing procedure for clinical trials. Biometrics 35, 549–556.

Pigeot, I., Schäfer, J., Röhmel, J. and Hauschke, D. (2003). Assessing non-inferiority of a new treatment in a three-arm clinical trial including a placebo. Statistics in Medicine 22, 883–899.

Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika 64, 191–199.

Tang, M-L and Tang, N-S. (2004). Tests of non-inferiority via rate difference for three-arm clinical trials with placebo. Journal of Biopharmaceutical Statistics 14, 337–347.

Tsong, Y., Wang, S. J., Hung, H. M. J. and Cui, L. (2003). Statistical issues on objective, design and analysis of non-inferiority active controlled trial. Journal of Biopharmaceutical Statistics 13, 29–42.
