
Pilot–pivotal trials for average bioequivalence


Journal of Statistical Planning and Inference 138 (2008) 2106–2116
www.elsevier.com/locate/jspi

Thomas Mathew∗, Yanping Wu

Department of Mathematics and Statistics, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA

Received 22 August 2006; received in revised form 30 January 2007; accepted 9 September 2007
Available online 12 October 2007

Abstract

Before carrying out a full scale bioequivalence trial, it is desirable to conduct a pilot trial to decide if a generic drug product shows promise of bioequivalence. The purpose of a pilot trial is to screen test formulations, and hence small sample sizes can be used. Based on the outcome of the pilot trial, one can decide whether or not a full scale pivotal trial should be carried out to assess bioequivalence. This article deals with the design of a pivotal trial, based on the evidence from the pilot trial. A two-stage adaptive procedure is developed in order to determine the sample size and the decision rule for the pivotal trial, for testing average bioequivalence using the two one-sided test (TOST). Numerical implementation of the procedure is discussed in detail, and the required tables are provided. Numerical results indicate that the required sample sizes could be smaller than those recommended by the FDA for a single trial, especially when the pilot study provides strong evidence in favor of bioequivalence.

© 2007 Elsevier B.V. All rights reserved.

Keywords: AUC; Cmax; Conditional power; Two one-sided test (TOST); Two-stage adaptive procedure

1. Introduction

The goal of bioequivalence testing in generic drug development is to compare the bioavailabilities of two drug products: a brand name drug (or reference drug, R), and a generic drug (or test drug, T). Crossover designs are used for this purpose and the response that is usually obtained consists of the area under the plasma concentration–time curve, simply referred to as area under the curve or AUC. Other responses of interest are the maximum blood concentration, denoted by Cmax, and the time to reach the maximum concentration, denoted by Tmax. Typically, AUC and Cmax are assumed to follow a log-normal distribution; hence the log-transformed data are usually modeled and analyzed.

Before carrying out a full scale bioequivalence trial, a pilot trial can be first carried out to decide if a generic drug product shows promise of bioequivalence. The FDA (2003a) guidance document does recommend that “a pilot study in a small number of subjects can be carried out before proceeding with a full bioequivalence study” (see p. 7 of the document). Furthermore, the FDA (2003b) guidance document points out that “The pilot study can also provide an estimate of the number of subjects to be included in the pivotal study . . .” (see p. 28 of the document). The purpose of a pilot trial is to screen test formulations, in order to assess the acceptability of the formulation for further assessment in a pivotal trial. Thus small sample sizes can be used in the pilot trial. When there is a pilot trial, it appears natural to determine the sample size for the pivotal trial using information from the pilot trial. Intuitively, one can conclude that if the pilot trial provides very strong evidence in favor of bioequivalence, a somewhat smaller sample size will give satisfactory power at the pivotal stage, compared to a situation where the evidence from the pilot trial, though in favor of bioequivalence, is somewhat weak. It should however be noted that the drug product for the pilot trial may not be from the same batch as that used for the pivotal trial; the formulation may be modified before the pivotal trial.

∗ Corresponding author. Tel.: +1 410 455 2418; fax: +1 410 455 1066.
E-mail address: [email protected] (T. Mathew).

0378-3758/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2007.09.009

In practice, the determination of the sample size in a bioequivalence trial is based on the relative bioavailability and variability of T versus R. In this paper, we develop a formal procedure for designing a pivotal average bioequivalence (ABE) study, following the results of a pilot study. We shall develop the procedure for a 2 × 4 crossover design, assuming a certain mixed model for the data. Our procedure can be easily adapted for other designs and models. The 2 × 4 crossover design that we shall consider is given by the two sequences RTTR and TRRT. The response of interest consists of AUC, Cmax, or Tmax, after a log-transformation, since log-normality is typically assumed. Let $Y_{ijkl}$ denote the $l$th response for the $j$th subject in the $i$th sequence receiving the $k$th formulation, where $l = 1, 2$; $k = T, R$; $j = 1, 2, \ldots, n$; $i = 1, 2$. Here $n$ denotes the number of subjects in each sequence; we assume the same number of subjects in each sequence. Following Chinchilli and Esinhart (1996), we shall consider the following model for $Y_{ijkl}$:

$$Y_{ijkl} = \mu_k + \gamma_{ikl} + \delta_{ijk} + \varepsilon_{ijkl}, \tag{1}$$

where $\mu_T$ and $\mu_R$ are population mean responses corresponding to treatments T and R, respectively, $\gamma_{ikl}$ is a fixed effect corresponding to the $l$th administration of treatment $k$ in sequence $i$, satisfying the estimability condition $\sum_{i=1}^{2}\sum_{l=1}^{2} \gamma_{ikl} = 0$ for $k = T$ and $R$, $\delta_{ijk}$ is a random subject effect corresponding to treatment $k$ for subject $j$ in sequence $i$, and the $\varepsilon_{ijkl}$ are within-subject errors. It is assumed that the $\varepsilon_{ijkl}$ are normally and independently distributed with mean zero and variance $\sigma^2_{Wk}$, $k = T, R$. It is further assumed that $(\delta_{ijT}, \delta_{ijR})$ follows a bivariate normal distribution with zero means and variance covariance matrix

$$\Sigma_B = \begin{pmatrix} \sigma^2_{BT} & \rho\,\sigma_{BT}\sigma_{BR} \\ \rho\,\sigma_{BT}\sigma_{BR} & \sigma^2_{BR} \end{pmatrix}.$$

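Let

$$\bar{Y}_{ijk} = \tfrac{1}{2}(Y_{ijk1} + Y_{ijk2}), \quad \bar{Y}_{i.T} = \frac{1}{n}\sum_{j=1}^{n} \bar{Y}_{ijT}, \quad \bar{Y}_{i.R} = \frac{1}{n}\sum_{j=1}^{n} \bar{Y}_{ijR}, \quad D = \frac{1}{2}\sum_{i=1}^{2} (\bar{Y}_{i.T} - \bar{Y}_{i.R}),$$

$$S^2 = \sum_{i=1}^{2}\sum_{j=1}^{n} \left[(\bar{Y}_{ijT} - \bar{Y}_{ijR}) - (\bar{Y}_{i.T} - \bar{Y}_{i.R})\right]^2, \quad \sigma^2_D = \sigma^2_{BT} + \sigma^2_{BR} - 2\rho\,\sigma_{BT}\sigma_{BR},$$

and

$$\sigma^2 = \sigma^2_D + \tfrac{1}{2}(\sigma^2_{WT} + \sigma^2_{WR}).$$

Then $D \sim N\left(\mu_T - \mu_R, \sigma^2/(2n)\right)$ and $S^2/\sigma^2 \sim \chi^2_{2(n-1)}$, a central chi-square distribution with $2(n-1)$ degrees of freedom.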

Thus $\sigma^2$ can be estimated as $\hat{\sigma}^2 = S^2/(2(n-1))$. The hypothesis of ABE is

$$H_0: |\mu_T - \mu_R| \ge \ln(1.25) \quad \text{vs.} \quad H_1: |\mu_T - \mu_R| < \ln(1.25),$$

where $\ln(1.25)$ is the limit defined by regulatory agencies. The ABE hypothesis can be tested using the two one-sided test (TOST) procedure due to Schuirmann (1981, 1987). The test consists of rejecting $H_0$ and concluding average bioequivalence at the significance level $\alpha$ if

$$\frac{D + \ln(1.25)}{\hat{\sigma}/\sqrt{2n}} > t_{2(n-1)}(\alpha) \quad \text{and} \quad \frac{D - \ln(1.25)}{\hat{\sigma}/\sqrt{2n}} < -t_{2(n-1)}(\alpha),$$

where $t_v(\alpha)$ is the upper $\alpha$ percentile of a t distribution with $v$ degrees of freedom. Equivalently, we conclude ABE if

$$\frac{|D| - \ln(1.25)}{\hat{\sigma}/\sqrt{2n}} < -t_{2(n-1)}(\alpha).$$

Thus $(|D| - \ln(1.25))/(\hat{\sigma}/\sqrt{2n})$ can be thought of as the test statistic for the TOST. However, the distribution of this statistic does depend on the nuisance parameter $\sigma$. For a detailed discussion of the TOST, we refer to the books by Chow and Liu (2000), Patterson and Jones (2006), and Hauschke et al. (2007).
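The quantities $D$, $\hat{\sigma}$ and the TOST statistic defined above can be computed directly from the per-subject differences $\bar{Y}_{ijT} - \bar{Y}_{ijR}$. The following is a minimal sketch, not the authors' code: the function name `tost_abe` and the input layout (a 2 × n array, one row per sequence) are assumptions made for illustration, and SciPy is assumed to be available.

```python
import numpy as np
from scipy import stats

LIMIT = np.log(1.25)  # regulatory ABE limit on the log scale


def tost_abe(diffs, alpha=0.05):
    """TOST for ABE in a 2 x 4 crossover (hypothetical helper).

    diffs: array of shape (2, n); diffs[i, j] is the subject-level
    difference Ybar_ijT - Ybar_ijR for subject j in sequence i.
    Returns (D, sigma_hat, statistic, conclude_abe).
    """
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.shape[1]                      # subjects per sequence
    seq_means = diffs.mean(axis=1)          # Ybar_i.T - Ybar_i.R
    D = seq_means.mean()                    # average over the two sequences
    S2 = ((diffs - seq_means[:, None]) ** 2).sum()
    df = 2 * (n - 1)
    sigma_hat = np.sqrt(S2 / df)            # sigma_hat^2 = S^2 / (2(n-1))
    stat = (abs(D) - LIMIT) / (sigma_hat / np.sqrt(2 * n))
    crit = -stats.t.ppf(1 - alpha, df)      # -t_{2(n-1)}(alpha)
    return D, sigma_hat, stat, bool(stat < crit)
```

With made-up differences centred at zero the test concludes ABE; shifting every difference by 0.3 (beyond $\ln(1.25) \approx 0.223$) makes the statistic positive, so ABE is not concluded.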


In a recent article, Koyama et al. (2005) have developed a calculus for two-stage designs; in particular, they have developed the formulas for sample size determination in the second stage, given the results of the first stage. Their setup is that of comparing the means of two normal populations with a common variance (assumed to be known). In this paper, we shall use the ideas in Koyama et al. (2005), except that we shall not assume the variance to be known. However, note that the sample size calculation does require the specification of a value of the variance, even though this value will not be used to carry out the test.

Wang and Zhou (1999) and Pan and Wang (2006) have addressed the problem of choosing the sample size for the pilot trial, taking into account a maximum affordable sample size for the pivotal trial. However, the criterion used by these authors in the pilot phase is not the one in the ABE hypothesis mentioned above, since the objectives of a pilot trial are different from those of a pivotal trial. It is well known that the TOST carried out at the 5% significance level is equivalent to computing the usual t-interval for $\mu_T - \mu_R$ at the 90% confidence level, and concluding ABE if this interval is a subset of $(-\ln(1.25), \ln(1.25))$. For the pilot trial, Wang and Zhou (1999) and Pan and Wang (2006) recommend using an interval wider than $(-\ln(1.25), \ln(1.25))$. Another option is to analyze the pilot trial data using the usual ABE criterion, but using a significance level higher than 5%. This latter option is essentially equivalent to using an interval wider than $(-\ln(1.25), \ln(1.25))$; this is briefly explained in the next section.
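The equivalence between the TOST at level 5% and the 90% t-interval can be checked numerically from summary statistics. The sketch below is illustrative only: `tost_reject` and `ci_inside` are hypothetical helper names, and the inputs ($D$, $\hat{\sigma}$, and $n$ per sequence for a 2 × 4 design) are assumed to have been computed already.

```python
import math
from scipy import stats

LIMIT = math.log(1.25)


def tost_reject(D, sigma_hat, n, alpha=0.05):
    """Conclude ABE via the TOST statistic (|D| - ln 1.25) / se."""
    df = 2 * (n - 1)
    se = sigma_hat / math.sqrt(2 * n)
    return bool((abs(D) - LIMIT) / se < -stats.t.ppf(1 - alpha, df))


def ci_inside(D, sigma_hat, n, alpha=0.05):
    """Conclude ABE if the 100(1 - 2*alpha)% t-interval for mu_T - mu_R
    lies inside (-ln 1.25, ln 1.25)."""
    df = 2 * (n - 1)
    half = stats.t.ppf(1 - alpha, df) * sigma_hat / math.sqrt(2 * n)
    return bool(-LIMIT < D - half and D + half < LIMIT)
```

Both functions encode the same inequality, $|D| + t_{2(n-1)}(\alpha)\,\hat{\sigma}/\sqrt{2n} < \ln(1.25)$, so they agree for any inputs (up to floating-point ties). Raising `alpha` shortens the interval, which is why a higher pilot-stage significance level acts like a wider acceptance region.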

As far as we know, the problem of selecting the sample size for the pivotal trial, based on the outcome from the pilot trial, has not been addressed at all. In this paper, we shall develop a two-stage adaptive procedure for pilot–pivotal trials. Numerical implementation of our procedure will be discussed in detail, and practical recommendations will be made. Comparison with a single trial will also be discussed. Even though we have developed our procedure for a 2×4 crossover design, the methodology can be easily developed for other designs, for example, a 2×2 or 4×4 crossover design. It is also possible to have a 2×2 crossover design at the pilot stage, and a 2×4 or 4×4 crossover design at the pivotal stage. Furthermore, it is also possible to accommodate possibly different variances at the pilot and pivotal stages.

2. Pilot stage criteria and the two-stage adaptive procedure

In Pan and Wang (2006), the criterion used in the pilot stage consists of continuing to the pivotal stage if the 90% confidence interval for $\mu_T - \mu_R$ based on the pilot data is contained in the interval $(-\ln(\lambda \times 1.25), \ln(\lambda \times 1.25))$, where $\lambda > 1$. Note that this is equivalent to testing (using the TOST)

$$H_{01}: |\mu_T - \mu_R| \ge \ln(1.25) + \Delta_1 \quad \text{vs.} \quad H_{11}: |\mu_T - \mu_R| < \ln(1.25) + \Delta_1$$

and continuing to the pivotal stage if $H_{01}$ is rejected at the 5% significance level, where $\Delta_1 = \ln(\lambda) > 0$. It is easy to show that rejecting $H_{01}$ at level $\alpha$ (using the TOST) will result in a rejection probability $\alpha_1 > \alpha$ at the parameter value $|\mu_T - \mu_R| = \ln(1.25)$. Conversely, the rejection of $H_0: |\mu_T - \mu_R| \ge \ln(1.25)$ at level $\alpha_1$ (using the TOST) implies the existence of $\Delta_1 > 0$ such that the rejection probability is a specified quantity $\alpha$ (where $\alpha < \alpha_1$) at the parameter value $|\mu_T - \mu_R| = \ln(1.25) + \Delta_1$. Such a $\Delta_1$ will depend on the pilot stage variability $\sigma^2$. We thus conclude that an alternative criterion for the pilot stage consists of carrying out the TOST for testing the usual average bioequivalence (ABE) hypothesis at a significance level $\alpha_1$ that is bigger than $\alpha$. In our analysis, we have chosen $\alpha_1 = 0.10$; however, it is possible to compute the value of $\alpha_1$ that corresponds to the criterion in Pan and Wang (2006). We would like to emphasize that here $\alpha_1$ refers to the probability of rejecting a true $H_{01}$, and thereby continuing to the pivotal stage; it is not the probability of erroneously concluding bioequivalence based on the pilot sample.

Considering the pilot trial as stage I and the pivotal trial as stage II, we now develop a two-stage procedure in a pilot–pivotal study, applicable to 2 × 4 designs. For the stage I and stage II designs, the sample size per sequence will be denoted by $n_1$ and $n_2$, respectively. At stage I, let

$$X_1 = \frac{|D_1| - \ln(1.25)}{\hat{\sigma}_1/\sqrt{2n_1}},$$

where $D_1$ is the observed value of the statistic $D$ in the pilot trial, and $\hat{\sigma}_1^2$ is the estimate of $\sigma^2$ in the pilot trial. The decision rule during the pilot trial is

$$\begin{cases} \text{reject } H_0 \text{ and move to the pivotal trial} & \text{if } X_1 \le k_1, \\ \text{do not reject } H_0 & \text{if } X_1 > k_1, \end{cases}$$

where $k_1 = -t_{2(n_1-1)}(\alpha_1)$. Note that when we do not reject $H_0$ based on the pilot trial, we are concluding that ABE does not hold, and hence there is no need to carry out a pivotal trial. As mentioned earlier, we shall assume $\alpha_1 = 0.10$ in our numerical results.
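The pilot-stage rule above is easy to evaluate from the pilot summary statistics. The following sketch assumes SciPy is available; the function names are hypothetical and the inputs ($D_1$, $\hat{\sigma}_1$, $n_1$) are assumed to come from an analysis of the pilot data under model (1).

```python
import math
from scipy import stats

THETA0 = math.log(1.25)  # ABE limit


def pilot_critical_value(n1, alpha1=0.10):
    """k1 = -t_{2(n1-1)}(alpha1), with t_v(alpha) the upper-alpha point."""
    return float(-stats.t.ppf(1 - alpha1, 2 * (n1 - 1)))


def pilot_statistic(D1, sigma1_hat, n1):
    """X1 = (|D1| - ln 1.25) / (sigma1_hat / sqrt(2 n1))."""
    return (abs(D1) - THETA0) / (sigma1_hat / math.sqrt(2 * n1))


def proceed_to_pivotal(D1, sigma1_hat, n1, alpha1=0.10):
    """Return (x1, k1, go) where go is True when x1 <= k1."""
    x1 = pilot_statistic(D1, sigma1_hat, n1)
    k1 = pilot_critical_value(n1, alpha1)
    return x1, k1, x1 <= k1
```

For $n_1 = 4$ and $\alpha_1 = 0.10$ this gives $k_1 \approx -1.4398$, the value quoted later in the paper. As an illustration with made-up pilot results $D_1 = 0.05$ and $\hat{\sigma}_1 = 0.2$, the statistic is about $-2.45$, so the pivotal trial would go ahead.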


At stage II, let

$$X_2 = \frac{|D_2| - \ln(1.25)}{\hat{\sigma}_2/\sqrt{2n_2}},$$

where $D_2$ is the observed value of $D$ in the pivotal trial, $n_2$ is the unknown sample size per sequence in the pivotal trial, and $\hat{\sigma}_2^2$ is the estimate of $\sigma^2$ in the pivotal trial. The decision rule for the pivotal trial is

$$\begin{cases} \text{reject } H_0 \text{ and conclude bioequivalence} & \text{if } X_2 \le w(x_1), \\ \text{do not conclude bioequivalence} & \text{if } X_2 > w(x_1), \end{cases}$$

where $w(x_1)$ is a function of $x_1$, to be determined.

Following Koyama et al. (2005), we define the conditional power functions:

$$A(x_1, \theta_i) = P(\text{reject } H_0 \text{ in the pivotal trial} \mid X_1 = x_1,\ \mu_T - \mu_R = \theta_i), \quad i = 0, 1.$$

Here $\theta_0 = \ln(1.25)$, and $\theta_1$ is a value under $H_1$, i.e., a value less than $\ln(1.25)$. It is necessary to assume a functional form for $A(x_1, \theta_i)$ before we can develop our procedure. There is certainly arbitrariness in the choice of this function. The choices that we shall use are given below; we shall comment on this later. Note that we require $A(x_1, \theta_i)$ to be a non-increasing function of $x_1$. The assumed functional form of $A(x_1, \theta_0)$ is given by

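$$A(x_1, \theta_0) = P\left(\text{reject } H_0 \text{ in the pivotal trial} \mid X_1 = x_1,\ D_2 \sim N(\theta_0, \sigma^2/(2n_2))\right) = \begin{cases} 0 & \text{if } x_1 > k_1, \\ a + b(k_1 - x_1) & \text{if } a + b(k_1 - x_1) \le 1 \text{ and } x_1 \le k_1, \\ 1 & \text{if } a + b(k_1 - x_1) > 1 \text{ and } x_1 \le k_1, \end{cases}$$

where $a$ and $b$ are coefficients to be determined. Also

$$A(x_1, \theta_1) = P\left(\text{reject } H_0 \text{ in the pivotal trial} \mid X_1 = x_1,\ D_2 \sim N(\theta_1, \sigma^2/(2n_2))\right) = \begin{cases} 0 & \text{if } x_1 > k_1, \\ c + d(k_1 - x_1) & \text{if } c + d(k_1 - x_1) \le 1 \text{ and } x_1 \le k_1, \\ 1 & \text{if } c + d(k_1 - x_1) > 1 \text{ and } x_1 \le k_1. \end{cases}$$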

The coefficients $a$, $b$, $c$ and $d$ are to be determined so that the unconditional type I error probability, i.e. $E[A(X_1, \theta_0)]$, is a specified quantity, say $\alpha_2$, and the unconditional power, i.e. $E[A(X_1, \theta_1)]$, is a specified quantity, say $1 - \beta_2$. That is,

$$\begin{cases} E[A(X_1, \theta_0)] = \int_{-\infty}^{+\infty} A(x_1, \theta_0) f_0(x_1)\,dx_1 = \alpha_2, \\ E[A(X_1, \theta_1)] = \int_{-\infty}^{+\infty} A(x_1, \theta_1) f_1(x_1)\,dx_1 = 1 - \beta_2, \end{cases}$$

where $f_0(x_1)$ and $f_1(x_1)$ denote the density function of $X_1$ when $\mu_T - \mu_R = \theta_0$ and $\mu_T - \mu_R = \theta_1$, respectively.

Note that $E[A(X_1, \theta_0)]$ and $E[A(X_1, \theta_1)]$ do depend on $\sigma^2$. An estimate of $\sigma^2$ is certainly available from the pilot trial and can be used in the above calculations. The coefficients $a$, $b$, $c$, $d$ are actually not unique. It is possible to choose these to make $E[A(X_1, \theta_0)]$ and $E[A(X_1, \theta_1)]$ somewhat insensitive to $\sigma^2$; this will be explained in the next section. This is certainly desirable, and we have indeed made such a choice.

We shall now explain the procedure for determining the sample size $n_2$ and the critical value $w(x_1)$ in the pivotal stage, given the outcome $x_1$ from the pilot study. Note that we have $x_1 \le k_1$, where $k_1 = -t_{2(n_1-1)}(\alpha_1)$, and that $n_2$ is also a function of $x_1$. Furthermore, in the pivotal study we reject $H_0$ and conclude ABE if $X_2 \le w(x_1)$, where

$$X_2 = \frac{|D_2| - \theta_0}{\hat{\sigma}_2/\sqrt{2n_2}}.$$

Under the pivotal trial, we shall obtain representations for the type I error probability $A(x_1, \theta_0)$ and power $A(x_1, \theta_1)$, using the distribution of $X_2$. Both $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$ so derived are obviously functions of $n_2$ and $w(x_1)$. We shall equate them to the functional forms of $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$ given earlier. We thus get two equations involving the two unknown quantities $n_2$ and $w(x_1)$. Solving, we get $n_2$ and $w(x_1)$, for each $x_1$. Recall that for computing the type I error probability, we use the distribution $D_2 \sim N(\theta_0, \sigma^2/(2n_2))$ and for computing power, we use the distribution $D_2 \sim N(\theta_1, \sigma^2/(2n_2))$. Now

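$$\begin{aligned}
A(x_1, \theta_0) &= P\left(X_2 \le w(x_1) \,\middle|\, X_1 = x_1 \text{ and } D_2 \sim N\left(\theta_0, \tfrac{\sigma^2}{2n_2}\right)\right) \\
&= P\left(\frac{|D_2| - \theta_0}{\hat{\sigma}_2/\sqrt{2n_2}} \le w(x_1) \,\middle|\, X_1 = x_1 \text{ and } D_2 \sim N\left(\theta_0, \tfrac{\sigma^2}{2n_2}\right)\right) \\
&= P\left(-\theta_0 - w(x_1)\frac{\hat{\sigma}_2}{\sqrt{2n_2}} \le D_2 \le \theta_0 + w(x_1)\frac{\hat{\sigma}_2}{\sqrt{2n_2}} \,\middle|\, X_1 = x_1 \text{ and } D_2 \sim N\left(\theta_0, \tfrac{\sigma^2}{2n_2}\right)\right) \\
&= P\left(\frac{D_2 - \theta_0}{\hat{\sigma}_2/\sqrt{2n_2}} \le w(x_1) \,\middle|\, X_1 = x_1 \text{ and } D_2 \sim N\left(\theta_0, \tfrac{\sigma^2}{2n_2}\right)\right) \\
&\quad - P\left(\frac{D_2 - \theta_0 + 2\theta_0}{\hat{\sigma}_2/\sqrt{2n_2}} \le -w(x_1) \,\middle|\, X_1 = x_1 \text{ and } D_2 \sim N\left(\theta_0, \tfrac{\sigma^2}{2n_2}\right)\right) \\
&= P\left(T_{2(n_2-1)} \le w(x_1)\right) - P\left(T_{2(n_2-1)}\left(\frac{2\theta_0}{\sigma/\sqrt{2n_2}}\right) \le -w(x_1)\right),
\end{aligned}$$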

where $T_{2(n_2-1)}$ denotes a central t random variable with $2(n_2 - 1)$ df and $T_{2(n_2-1)}(\tau)$ denotes a non-central t random variable with $2(n_2 - 1)$ df and non-centrality parameter $\tau$. A similar derivation gives

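$$A(x_1, \theta_1) = P\left(T_{2(n_2-1)}\left(-\frac{\theta_0 - \theta_1}{\sigma/\sqrt{2n_2}}\right) \le w(x_1)\right) - P\left(T_{2(n_2-1)}\left(\frac{\theta_0 + \theta_1}{\sigma/\sqrt{2n_2}}\right) \le -w(x_1)\right).$$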

The two unknown quantities n2 and w(x1) can now be obtained by numerically solving the following equations:

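$$a + b(k_1 - x_1) = P\left(T_{2(n_2-1)} \le w(x_1)\right) - P\left(T_{2(n_2-1)}\left(\frac{2\theta_0}{\sigma/\sqrt{2n_2}}\right) \le -w(x_1)\right), \tag{2}$$

$$c + d(k_1 - x_1) = P\left(T_{2(n_2-1)}\left(-\frac{\theta_0 - \theta_1}{\sigma/\sqrt{2n_2}}\right) \le w(x_1)\right) - P\left(T_{2(n_2-1)}\left(\frac{\theta_0 + \theta_1}{\sigma/\sqrt{2n_2}}\right) \le -w(x_1)\right). \tag{3}$$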

The next section is on the numerical computation of $n_2$ and $w(x_1)$ by solving the above two equations. We note that the determination of these quantities does require the specification of a value for $\sigma$, perhaps obtained from the pilot study.
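One way to solve Eqs. (2) and (3) numerically is sketched below with SciPy. This is an illustration, not the authors' program. Two choices are assumptions: the root of Eq. (2) in $w$ is found with `brentq` on a fixed bracket (which presumes the target level is well below 1), and, since $n_2$ is an integer and Eq. (3) cannot hold exactly, the sketch returns the smallest $n_2$ on the grid whose power meets or exceeds the target. The coefficients $a$, $b$, $c$, $d$ are passed in by the caller.

```python
import math
from scipy import optimize, stats

THETA0 = math.log(1.25)


def cond_type1(w, n2, sigma):
    """Right-hand side of Eq. (2)."""
    df = 2 * (n2 - 1)
    se = sigma / math.sqrt(2 * n2)
    return stats.t.cdf(w, df) - stats.nct.cdf(-w, df, 2 * THETA0 / se)


def cond_power(w, n2, sigma, theta1):
    """Right-hand side of Eq. (3).

    Uses P(T(-tau) <= w) = P(T(tau) >= -w) so that only non-negative
    non-centrality parameters are passed to SciPy.
    """
    df = 2 * (n2 - 1)
    se = sigma / math.sqrt(2 * n2)
    first = stats.nct.sf(-w, df, (THETA0 - theta1) / se)
    second = stats.nct.cdf(-w, df, (THETA0 + theta1) / se)
    return first - second


def solve_w(target_level, n2, sigma):
    """Solve Eq. (2) for w(x1) at a fixed n2 (assumes target_level < 1)."""
    f = lambda w: cond_type1(w, n2, sigma) - target_level
    return optimize.brentq(f, -30.0, 30.0)


def solve_pivotal(x1, k1, sigma, a, b, c, d, theta1=0.05,
                  n2_grid=range(5, 101)):
    """Return (n2, w) for a pilot outcome x1 <= k1, or None if the grid
    contains no n2 reaching the target power (assumed selection rule)."""
    target_level = min(a + b * (k1 - x1), 1.0)
    target_power = min(c + d * (k1 - x1), 1.0)
    for n2 in n2_grid:
        w = solve_w(target_level, n2, sigma)
        if cond_power(w, n2, sigma, theta1) >= target_power:
            return n2, w
    return None
```

For example, with $k_1 = -t_6(0.10)$, $x_1 = -8$, $\sigma = 0.2$, $a = 0.0495$ and $b = 0.0095$, `solve_w(a + b * (k1 + 8), 5, 0.2)` gives a critical value close to the entry $-1.3191$ reported for $n_2 = 5$ in Table 1.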

3. Numerical implementation

The constants $a$, $b$, $c$, $d$ have to be determined subject to the conditions $E[A(X_1, \theta_0)] = \alpha_2$ and $E[A(X_1, \theta_1)] = 1 - \beta_2$. Note also that the quantity $k_1$ in the functional form of $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$ is $-t_{2(n_1-1)}(\alpha_1)$, where $t_{2(n_1-1)}(\alpha_1)$ is the upper $\alpha_1$ percentile of a t distribution with $2(n_1 - 1)$ df. It is clear that $a$, $b$, $c$, $d$ are not unique. In what follows, we shall fix values for $a$ and $c$ and then determine $b$ and $d$ so that $E[A(X_1, \theta_0)] = \alpha_2$ and $E[A(X_1, \theta_1)] = 1 - \beta_2$. In fact we shall choose $a$ and $c$ close to $\alpha_2$ and $1 - \beta_2$, respectively, so that $b$ and $d$ come out to be quite small. This results in functions $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$ that are rather insensitive to the choice of the pilot sample size $n_1$, the value of $\sigma$ and also $\alpha_1$ (the type I error probability in the pilot stage). In other words, this will result in a “universal choice” of the functions $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$, once the values of $\theta_0$, $\theta_1$, $\alpha_2$ and $1 - \beta_2$ are specified. For the values $\alpha_1 = 0.1$, $\theta_0 = \ln(1.25)$, $\theta_1 = 0.05$, $\alpha_2 = 0.05$ and $1 - \beta_2 = 0.9$, the functions are given by

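$$A(x_1, \theta_0) = \begin{cases} 0 & \text{if } x_1 > k_1, \\ 0.0495 + 0.0095 \times (k_1 - x_1) & \text{if } 0.0495 + 0.0095 \times (k_1 - x_1) \le 1 \text{ and } x_1 \le k_1, \\ 1 & \text{if } 0.0495 + 0.0095 \times (k_1 - x_1) > 1 \text{ and } x_1 \le k_1 \end{cases} \tag{4}$$

and

$$A(x_1, \theta_1) = \begin{cases} 0 & \text{if } x_1 > k_1, \\ 0.9 + 0.0019 \times (k_1 - x_1) & \text{if } 0.9 + 0.0019 \times (k_1 - x_1) \le 1 \text{ and } x_1 \le k_1, \\ 1 & \text{if } 0.9 + 0.0019 \times (k_1 - x_1) > 1 \text{ and } x_1 \le k_1. \end{cases} \tag{5}$$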

The values $a = 0.0495$ and $c = 0.9$ were chosen in advance. The quantities $b$ and $d$ were then chosen so as to satisfy $E[A(X_1, \theta_0)] = 0.05$ and $E[A(X_1, \theta_1)] = 0.9$, where we estimated these expected values using 10,000 simulations.


Table 1
Values of $n_2$ and $w(x_1)$ for $\alpha_1 = 0.1$, $\alpha_2 = 0.05$, $1 - \beta_2 = 0.9$, $n_1 = 4$ and 6, $\theta_1 = 0.05$ and $\sigma = 0.2$

| $x_1$ | $n_2$ ($n_1 = 4$) | $w(x_1)$ ($n_1 = 4$) | $n_2$ ($n_1 = 6$) | $w(x_1)$ ($n_1 = 6$) |
|---|---|---|---|---|
| −8 | 5 | −1.3191 | 5 | −1.3151 |
| −7.5 | 6 | −1.3266 | 6 | −1.3223 |
| −7 | 6 | −1.3569 | 6 | −1.3528 |
| −6.5 | 6 | −1.3885 | 6 | −1.3841 |
| −6 | 6 | −1.4213 | 6 | −1.4168 |
| −5.5 | 6 | −1.4557 | 6 | −1.4510 |
| −5 | 6 | −1.4917 | 6 | −1.4867 |
| −4.5 | 6 | −1.5295 | 6 | −1.5242 |
| −4 | 6 | −1.5693 | 6 | −1.5638 |
| −3.5 | 6 | −1.6115 | 6 | −1.6057 |
| −3 | 7 | −1.6319 | 7 | −1.6259 |
| −2.5 | 7 | −1.6782 | 7 | −1.6717 |
| −2 | 7 | −1.7278 | 7 | −1.7209 |

$n_1$: number of subjects per sequence in the pilot trial; $n_2$: number of subjects per sequence in the pivotal trial; $\alpha_1$: significance level for the pilot trial; $\alpha_2$: unconditional type I error probability for the pivotal trial; $1 - \beta_2$: unconditional power for the pivotal trial; $\theta_1$: a value of $|\mu_T - \mu_R|$ under the alternative hypothesis; $x_1$: value of the test statistic based on pilot trial data; $w(x_1)$: pivotal trial critical value.

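Table 2
Values of $n_2$ and $w(x_1)$ for $\alpha_1 = 0.1$ and 0.15, $\sigma = 0.2$, $\theta_1 = 0.05$ and $n_1 = 4$ (the various quantities are as defined for Table 1)

| $x_1$ | $n_2$ ($\alpha_1 = 0.1$) | $w(x_1)$ ($\alpha_1 = 0.1$) | $n_2$ ($\alpha_1 = 0.15$) | $w(x_1)$ ($\alpha_1 = 0.15$) |
|---|---|---|---|---|
| −8 | 5 | −1.3191 | 5 | −1.3011 |
| −7.5 | 6 | −1.3266 | 5 | −1.3307 |
| −7 | 6 | −1.3569 | 6 | −1.3382 |
| −6.5 | 6 | −1.3885 | 6 | −1.369 |
| −6 | 6 | −1.4213 | 6 | −1.4011 |
| −5.5 | 6 | −1.4557 | 6 | −1.4345 |
| −5 | 6 | −1.4917 | 6 | −1.4695 |
| −4.5 | 6 | −1.5295 | 6 | −1.5061 |
| −4 | 6 | −1.5693 | 6 | −1.5447 |
| −3.5 | 6 | −1.6115 | 6 | −1.5854 |
| −3 | 7 | −1.6319 | 6 | −1.6286 |
| −2.5 | 7 | −1.6782 | 7 | −1.6495 |
| −2 | 7 | −1.7278 | 7 | −1.697 |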

The value $\theta_1 = 0.05$ has been recommended in the FDA (2001) guidance document for the purpose of sample size calculation. We would like to point out that even though the coefficients of $k_1 - x_1$ in (4) and (5) are rather small, the values of $x_1$ do have a significant influence on the solutions to $n_2$ and $w(x_1)$; this will be clear from the numerical results.

It turns out that the above functions remain practically unaffected if we choose, for example, $\alpha_1 = 0.15$ instead of $\alpha_1 = 0.1$. The functions are also insensitive for $0.05 \le \sigma \le 0.75$. However, for smaller values of $\sigma$, we noted that $E[A(X_1, \theta_1)] > 0.9$. For $\sigma = 0.01$, we have $E[A(X_1, \theta_1)] = 0.9886$ and for $\sigma = 0.001$, we have $E[A(X_1, \theta_1)] = 1$. However, even for $\sigma = 0.01$ and 0.001, we still have $E[A(X_1, \theta_0)] = 0.05$. Thus we conclude that the functions $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$ can be used in all pilot–pivotal bioequivalence problems as long as we fix the values $\theta_0 = \ln(1.25)$, $\theta_1 = 0.05$, $\alpha_2 = 0.05$ and $1 - \beta_2 = 0.9$. The only noticeable change is in the power function $A(x_1, \theta_1)$, with unconditional power more than the specified value of 0.90, when $\sigma$ becomes very small.


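Table 3
Values of $n_2$ and $w(x_1)$ for different values of $\sigma$ and $x_1$, and for $\theta_1 = 0.05$ (the various quantities are as defined for Table 1)

| $x_1$ | $n_2$ ($\sigma = 0.2$) | $w(x_1)$ | $n_2$ ($\sigma = 0.25$) | $w(x_1)$ | $n_2$ ($\sigma = 0.3$) | $w(x_1)$ | $n_2$ ($\sigma = 0.35$) | $w(x_1)$ |
|---|---|---|---|---|---|---|---|---|
| −8 | 5 | −1.3191 | 8 | −1.2733 | 11 | −1.2558 | 15 | −1.2445 |
| −7.5 | 6 | −1.3266 | 8 | −1.3014 | 11 | −1.283 | 15 | −1.2711 |
| −7 | 6 | −1.3569 | 8 | −1.3304 | 11 | −1.3112 | 15 | −1.2987 |
| −6.5 | 6 | −1.3885 | 8 | −1.3606 | 11 | −1.3404 | 15 | −1.3273 |
| −6 | 6 | −1.4213 | 8 | −1.392 | 12 | −1.3707 | 16 | −1.3546 |
| −5.5 | 6 | −1.4557 | 9 | −1.4153 | 12 | −1.3977 | 16 | −1.3854 |
| −5 | 6 | −1.4917 | 9 | −1.449 | 12 | −1.4264 | 16 | −1.4175 |
| −4.5 | 6 | −1.5295 | 9 | −1.4844 | 12 | −1.4647 | 16 | −1.451 |
| −4 | 6 | −1.5693 | 9 | −1.5215 | 13 | −1.4962 | 17 | −1.4838 |
| −3.5 | 6 | −1.6115 | 9 | −1.5607 | 13 | −1.5339 | 17 | −1.5207 |
| −3 | 7 | −1.6319 | 9 | −1.6023 | 13 | −1.5737 | 17 | −1.5598 |
| −2.5 | 7 | −1.6782 | 10 | −1.6363 | 13 | −1.6161 | 18 | −1.5987 |
| −2 | 7 | −1.7278 | 10 | −1.683 | 14 | −1.6565 | 18 | −1.6428 |

| $x_1$ | $n_2$ ($\sigma = 0.4$) | $w(x_1)$ | $n_2$ ($\sigma = 0.45$) | $w(x_1)$ | $n_2$ ($\sigma = 0.5$) | $w(x_1)$ | $n_2$ ($\sigma = 0.55$) | $w(x_1)$ |
|---|---|---|---|---|---|---|---|---|
| −8 | 19 | −1.2382 | 23 | −1.2343 | 29 | −1.2305 | 35 | −1.2281 |
| −7.5 | 19 | −1.2646 | 24 | −1.2597 | 29 | −1.2565 | 35 | −1.254 |
| −7 | 19 | −1.2918 | 24 | −1.2867 | 30 | −1.2829 | 36 | −1.2804 |
| −6.5 | 20 | −1.3188 | 25 | −1.3138 | 30 | −1.3106 | 36 | −1.308 |
| −6 | 20 | −1.348 | 25 | −1.3428 | 31 | −1.3389 | 37 | −1.3363 |
| −5.5 | 20 | −1.3784 | 25 | −1.373 | 31 | −1.3689 | 38 | −1.3658 |
| −5 | 21 | −1.4087 | 26 | −1.4035 | 32 | −1.3995 | 38 | −1.3968 |
| −4.5 | 21 | −1.4418 | 26 | −1.4363 | 32 | −1.4321 | 39 | −1.4289 |
| −4 | 22 | −1.4751 | 27 | −1.4698 | 33 | −1.4656 | 40 | −1.4624 |
| −3.5 | 22 | −1.5115 | 27 | −1.5059 | 34 | −1.5009 | 41 | −1.4977 |
| −3 | 22 | −1.55 | 28 | −1.5431 | 34 | −1.5388 | 41 | −1.5353 |
| −2.5 | 23 | −1.5893 | 29 | −1.5826 | 35 | −1.5783 | 42 | −1.5748 |
| −2 | 24 | −1.6314 | 29 | −1.6256 | 36 | −1.6204 | 43 | −1.6169 |

| $x_1$ | $n_2$ ($\sigma = 0.6$) | $w(x_1)$ | $n_2$ ($\sigma = 0.65$) | $w(x_1)$ | $n_2$ ($\sigma = 0.7$) | $w(x_1)$ | $n_2$ ($\sigma = 0.75$) | $w(x_1)$ |
|---|---|---|---|---|---|---|---|---|
| −8 | 41 | −1.2264 | 48 | −1.225 | 56 | −1.2238 | 64 | −1.2229 |
| −7.5 | 42 | −1.252 | 49 | −1.2505 | 56 | −1.2495 | 65 | −1.2485 |
| −7 | 42 | −1.2786 | 49 | −1.2771 | 57 | −1.2759 | 66 | −1.2748 |
| −6.5 | 43 | −1.3059 | 50 | −1.304 | 58 | −1.3032 | 67 | −1.3021 |
| −6 | 44 | −1.3342 | 51 | −1.3327 | 59 | −1.3315 | 68 | −1.3304 |
| −5.5 | 44 | −1.3639 | 52 | −1.3622 | 60 | −1.3609 | 69 | −1.3598 |
| −5 | 45 | −1.3946 | 53 | −1.3928 | 61 | −1.3915 | 70 | −1.3904 |
| −4.5 | 46 | −1.4266 | 54 | −1.4248 | 62 | −1.4235 | 71 | −1.4224 |
| −4 | 47 | −1.4602 | 55 | −1.4584 | 64 | −1.4569 | 73 | −1.4557 |
| −3.5 | 48 | −1.4955 | 56 | −1.4936 | 65 | −1.4921 | 74 | −1.4909 |
| −3 | 49 | −1.5327 | 57 | −1.5308 | 66 | −1.5292 | 76 | −1.5279 |
| −2.5 | 50 | −1.5721 | 59 | −1.57 | 68 | −1.5684 | 78 | −1.5671 |
| −2 | 52 | −1.6139 | 60 | −1.6119 | 70 | −1.6102 | 80 | −1.6089 |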

The conditional type I error probability can certainly assume bigger values for certain choices of $x_1$. In particular, we are allowing the possibility of $A(x_1, \theta_0)$ to be even 1. Note that the probability of this is really small. For example, for $n_1 = 4$, $\alpha_1 = 0.1$, $k_1 = -t_{2(n_1-1)}(\alpha_1) = -1.4398$, $\sigma = 0.2$, we have $A(x_1, \theta_0) > 0.1$ if $x_1 < -6.7559$. The probability for this is only 0.0399. We shall comment on this further when we discuss the choice of the functions $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$ in the last section.
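The piecewise-linear forms (4) and (5) and the threshold quoted above can be reproduced with a few lines of code. This is a small sketch with hypothetical function names; the default coefficients are the ones given in (4) and (5).

```python
def conditional_level(x1, k1, a=0.0495, b=0.0095):
    """A(x1, theta_0) from Eq. (4): zero if the pilot fails, otherwise
    a + b (k1 - x1) capped at 1."""
    if x1 > k1:
        return 0.0
    return min(a + b * (k1 - x1), 1.0)


def conditional_power(x1, k1, c=0.9, d=0.0019):
    """A(x1, theta_1) from Eq. (5), same shape with coefficients c, d."""
    if x1 > k1:
        return 0.0
    return min(c + d * (k1 - x1), 1.0)


def level_threshold(level, k1, a=0.0495, b=0.0095):
    """Value of x1 below which A(x1, theta_0) exceeds `level`."""
    return k1 - (level - a) / b
```

With $k_1 = -1.4398$, `level_threshold(0.1, -1.4398)` evaluates to about $-6.7556$, which agrees with the quoted $-6.7559$ up to rounding of the coefficients.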


We now have to numerically solve Eqs. (2) and (3) in order to obtain $n_2$ and $w(x_1)$. For $x_1$, we shall consider only the values $-8 \le x_1 \le k_1$ since $P(X_1 < -8)$ is almost zero. Our approach toward solving Eqs. (2) and (3) is as follows. Specify a value of $x_1$ and a large interval for $n_2$, say $5 \le n_2 \le 100$. For each value of $n_2$ in this interval, determine $w(x_1)$ so that Eq. (2) holds. Now substitute the $(n_2, w(x_1))$ pairs so obtained into Eq. (3), and pick the pair that satisfies Eq. (3).

Table 1 contains the values so obtained for sample sizes $n_1 = 4$ and $n_1 = 6$ of the pilot trial when $\sigma = 0.20$. We observe that $n_1$ does not have a significant influence on $n_2$ and $w(x_1)$. We also did the same computation for different values of $\sigma$ and the conclusion was the same. Hence, in the following simulations, we only consider the case of $n_1 = 4$. Table 2 shows solutions of $n_2$ and $w(x_1)$ for $\alpha_1 = 0.1$ and 0.15 in the pilot trial; the solutions are not significantly different between $\alpha_1 = 0.1$ and $\alpha_1 = 0.15$. Table 3 gives solutions of $n_2$ and $w(x_1)$ for various combinations of $x_1$ and $\sigma$. We observe that $n_2$ decreases when $x_1$ decreases, while $w(x_1)$ increases when $x_1$ decreases. This is to be expected. In a pilot trial, a smaller value of $x_1$ indicates more evidence in favor of bioequivalence. Table 3 also shows that $n_2$ and $w(x_1)$ increase when $\sigma$ increases; once again, as expected. A similar pattern also shows up in the remainder of Table 3. In practice, if values of $\sigma$ and $x_1$ different from those in Table 3 are encountered, linear interpolation can be used to obtain the corresponding $n_2$ and $w(x_1)$. The smallest value of $\sigma$ reported in Table 3 is $\sigma = 0.2$. If a smaller value is under consideration, our recommendation is to use the sample size corresponding to $\sigma = 0.2$, since these sample sizes are already rather small. Note that some of the $\sigma$ values in Table 3 are rather big, and are unlikely to be encountered in applications. We have actually tried to cover the range of $\sigma$ values considered in the FDA (2001) document. Furthermore, somewhat higher values of $\sigma$ will be encountered in the context of highly variable drug products; for example, see Patterson et al. (2001) and Tothfalusi and Endrenyi (2003).

Even though we have developed our procedure for a 2 × 4 design, our approach covers the 4 × 4 and 2 × 2 designs as well. In the case of a 4 × 4 design, simply replace $2n_1$ with $4n_1$, $2n_2$ with $4n_2$, and the df’s $2(n_1 - 1)$ and $2(n_2 - 1)$ with $4(n_1 - 1)$ and $4(n_2 - 1)$, respectively.

4. Summary and practical recommendation

From a practical point of view, our procedure can be summarized as follows, assuming that a 2×4 crossover design is used in the pilot stage and pivotal stage.

(i) Use a sample size $n_1$ of between 4 and 6 subjects per sequence in the pilot stage, and use $\alpha_1 = 0.1$ as the significance level at the pilot stage. Let $k_1 = -t_{2(n_1-1)}(\alpha_1)$.

(ii) Analyze the pilot trial data using the model (1) and compute $D_1$ and $\hat{\sigma}_1$. Let $X_1 = (|D_1| - \theta_0)/(\hat{\sigma}_1/\sqrt{2n_1})$, where $\theta_0 = \ln(1.25)$.

(iii) Let $x_1$ denote the observed value of $X_1$. If $x_1 > k_1$, conclude that T and R are not average bioequivalent, and the pivotal trial will not be carried out. If $x_1 \le k_1$, the pivotal trial will be carried out.

(iv) Let $\alpha_2 = 0.05$ and $1 - \beta_2 = 0.90$, respectively, denote the significance level and the desired power at the pivotal stage, for a specified value of $\sigma$ at the alternative $\theta_1 = 0.05$. After observing the value of $X_1$, the sample size $n_2$ (per sequence) required in the pivotal study can be obtained from Table 3. Use linear interpolation if the value of $\sigma$ or $x_1$ is not given in Table 3. If a value of $\sigma$ smaller than 0.2 is under consideration, use the sample size corresponding to $\sigma = 0.2$.

(v) Based on the data from the pivotal trial, compute the value of $X_2$, where $X_2$ is defined similarly to $X_1$. Let $x_2$ denote the observed value of $X_2$. Conclude average bioequivalence if $x_2 \le w(x_1)$.
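Step (iv) relies on a table lookup with linear interpolation. The sketch below shows one way this might be done with NumPy, using only a small corner of Table 3 ($x_1 \in \{-8, -7.5, -7\}$, $\sigma \in \{0.2, 0.25\}$) to keep it short. Rounding the interpolated $n_2$ up to the next integer is an assumption of this sketch, as are the names `lookup` and `_bilinear`; note also that `np.interp` holds the edge value outside the stored grid, so a real implementation would need the full table.

```python
import math
import numpy as np

# Corner of Table 3: rows follow X1_GRID, columns follow SIGMA_GRID.
X1_GRID = np.array([-8.0, -7.5, -7.0])
SIGMA_GRID = np.array([0.20, 0.25])
N2 = np.array([[5.0, 8.0], [6.0, 8.0], [6.0, 8.0]])
W = np.array([[-1.3191, -1.2733],
              [-1.3266, -1.3014],
              [-1.3569, -1.3304]])


def _bilinear(table, x1, sigma):
    """Interpolate along sigma within each x1 row, then along x1."""
    col = np.array([np.interp(sigma, SIGMA_GRID, row) for row in table])
    return float(np.interp(x1, X1_GRID, col))


def lookup(x1, sigma):
    """Return (n2, w) for a pilot outcome x1 and an assumed sigma."""
    sigma = max(sigma, float(SIGMA_GRID[0]))  # use sigma = 0.2 if smaller
    n2 = math.ceil(_bilinear(N2, x1, sigma) - 1e-9)  # round up (assumption)
    w = _bilinear(W, x1, sigma)
    return n2, w
```

At a tabulated point the lookup returns the table entry itself, e.g. `lookup(-8.0, 0.2)` gives `(5, -1.3191)`; halfway between $\sigma = 0.2$ and $0.25$ at $x_1 = -8$ it gives $n_2 = 7$ and $w \approx -1.2962$.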

5. Comparison with a single trial

A question of considerable practical interest is whether a pivotal trial following a pilot trial will result in sample size savings, compared to a single trial only. In order to answer this question, Table 4 gives the recommended sample sizes (per sequence) given in the FDA (2001) guidance document for the case of a single trial. These sample sizes are also for 2 × 4 designs, and correspond to 80% power at the alternative $\theta_1 = 0.05$, for a 5% significance level. Note that the total expected sample size per sequence in a pilot–pivotal trial scenario is $n_1 + E(n_2)$.

Table 4
FDA recommended sample sizes (per sequence) for a single trial for 80% power at $\theta_1 = 0.05$

| $\sigma_{WT} = \sigma_{WR}$ | $\sigma_D$ | 2 × 4 design |
|---|---|---|
| 0.15 | 0.01 | 6 |
| 0.15 | 0.1 | 10 |
| 0.15 | 0.15 | 12 |
| 0.23 | 0.01 | 12 |
| 0.23 | 0.1 | 16 |
| 0.23 | 0.15 | 18 |
| 0.30 | 0.01 | 20 |
| 0.30 | 0.1 | 24 |
| 0.30 | 0.15 | 26 |
| 0.5 | 0.01 | 54 |
| 0.5 | 0.1 | 58 |
| 0.5 | 0.15 | 60 |

$\sigma_{WT}$, $\sigma_{WR}$: within-subject standard deviations for the test drug and reference drug, respectively; $\sigma_D^2$: variance component representing subject-by-formulation interaction.

Table 5
Values of $n_2$ for 2 × 4 designs, $\alpha_1 = 0.1$, $\alpha_2 = 0.05$, $\beta_1 = \beta_2 = 0.1$, $n_1 = 4$ and $\theta_1 = 0.05$

| $x_1$ | $\sigma = 0.15$ | $\sigma = 0.212$ | $\sigma = 0.316$ |
|---|---|---|---|
| −8 | 5 | 6 | 12 |
| −7.5 | 6 | 7 | 12 |
| −7 | 6 | 7 | 12 |
| −6.5 | 6 | 7 | 12 |
| −6 | 6 | 7 | 13 |
| −5.5 | 6 | 7 | 13 |
| −5 | 6 | 7 | 13 |
| −4.5 | 6 | 7 | 13 |
| −4 | 6 | 7 | 14 |
| −3.5 | 6 | 7 | 14 |
| −3 | 7 | 8 | 14 |
| −2.5 | 7 | 8 | 14 |
| −2 | 7 | 8 | 15 |

$1 - \beta_1$: power for the pilot trial; the other quantities are as defined for Table 1.

Let us first look at the overall significance level and overall power when we have a pilot trial followed by a pivotal trial. Toward this, note that given $X_1 \le k_1$, the probability of rejecting $H_0$ in the pivotal stage is $A(X_1, \theta_0)$. Thus the overall type I error probability is at most $\alpha_1\alpha_2$. For example, when we choose $\alpha_1 = 0.10$ and $\alpha_2 = 0.05$, the overall significance level is only 0.005. Similarly, suppose the pilot and pivotal trials have been designed in such a way that the power for each is 0.90 for a specified value of $\sigma$, at an alternative value $\theta_1$. Then the overall power is $(0.90)^2$.
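As a quick check of the arithmetic in this paragraph, using the values quoted there and writing $1-\beta_1$ and $1-\beta_2$ for the pilot-stage and pivotal-stage powers (the notation of Table 5):

$$\alpha_1\alpha_2 = 0.10 \times 0.05 = 0.005, \qquad (1-\beta_1)(1-\beta_2) = (0.90)^2 = 0.81 .$$

The overall power is therefore slightly above 0.80, which is the power level used for the single trial in Table 4.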

We shall consider a limited comparison of pilot–pivotal trials with a single trial, as follows. Keep the significance level as 0.1 in the pilot and 0.05 in the pivotal trials, and 0.05 in the single trial. Fix a value of $\sigma$ and consider the specific alternative $\theta_1 = 0.05$. Now consider the sample sizes required for a power of 0.90 for the pilot trial and an unconditional power of 0.90 for the pivotal trial, so that the overall unconditional power is approximately 0.80. In this setup, we shall compare the total expected sample size per sequence for the pilot–pivotal trial with that for a single trial for which the power is 0.80. Note that if $\sigma$ is large, a rather large sample size $n_1$ will be required in the pilot trial to guarantee a power of 0.90, and this may be unrealistic. Thus we shall consider values of $\sigma$ that give reasonable sample sizes in the pilot trial.

Table 5 contains the values so obtained for some combinations of $n_1$ and $\sigma$ by using our approach. Comparing Tables 4 and 5, we observe that our total expected sample sizes (per sequence) are close to, and sometimes less than, those recommended by the FDA.

For the first case, let $\sigma_D = 0.01$ and $\sigma_{WT} = \sigma_{WR} = 0.15$. Thus $\sigma = \sqrt{\sigma_D^2 + (\sigma_{WT}^2 + \sigma_{WR}^2)/2} = 0.15$, which is less than 0.2. Hence, the estimated sample size for $\sigma = 0.2$ is applied. The estimated sample size per sequence for a 2 × 4 design is 6 according to the FDA guidance document. In our approach, the sample size in the pilot trial is 4 and our estimated sample size is 5–7 in the pivotal trial. Thus, the total estimated sample size per sequence is 9–11, which is larger than the one recommended by the FDA. However, this is reasonable since we may not want the pivotal sample size to be less than the one in the pilot trial.

Now let $\sigma_D = 0.15$ and $\sigma_{WT} = \sigma_{WR} = 0.15$. Thus $\sigma = \sqrt{\sigma_D^2 + (\sigma_{WT}^2 + \sigma_{WR}^2)/2} = 0.212$, which is between 0.2 and 0.25, and linear interpolation is applied to compute the estimated sample size. The estimated sample size per sequence for a 2 × 4 design is 12 according to the FDA guidance document. In our approach, the estimated sample size for $\sigma = 0.212$ is 6–8 in the pivotal trial. Thus, the total estimated sample size per sequence is 10–12, which is close to the one recommended by the FDA.

Now consider a third case with $\sigma_D = 0.1$ and $\sigma_{WT} = \sigma_{WR} = 0.30$. Thus $\sigma = \sqrt{\sigma_D^2 + (\sigma_{WT}^2 + \sigma_{WR}^2)/2} = 0.316$. The estimated sample size per sequence for a 2 × 4 design is 24 according to the FDA guidance document. In our approach, the estimated sample size for $\sigma = 0.316$ is 12–15 in the pivotal trial. Thus, the total estimated sample size per sequence is 16–19.

It is clear that the total estimated sample sizes can be smaller than those recommended by the FDA. Furthermore, we have the option of stopping the trial if we do not reject the null hypothesis in the pilot stage. Clearly, a pivotal trial following a pilot trial is advantageous.
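The three comparisons above reduce to a small amount of arithmetic, which the following Python sketch reproduces. The FDA values are the Table 4 entries, the pivotal ranges are the smallest and largest $n_2$ in the corresponding Table 5 columns, and $n_1 = 4$; the function and variable names are illustrative only.

```python
import math

N1 = 4  # pilot sample size per sequence used for Table 5

# (sigma_D, sigma_WT, sigma_WR, FDA n per sequence from Table 4,
#  (smallest, largest) n2 in the matching Table 5 column)
CASES = [
    (0.01, 0.15, 0.15, 6, (5, 7)),
    (0.15, 0.15, 0.15, 12, (6, 8)),
    (0.10, 0.30, 0.30, 24, (12, 15)),  # Table 4 row: sigma_WT = 0.30, sigma_D = 0.1
]


def total_sigma(sigma_d, sigma_wt, sigma_wr):
    """sigma = sqrt(sigma_D^2 + (sigma_WT^2 + sigma_WR^2) / 2)."""
    return math.sqrt(sigma_d ** 2 + (sigma_wt ** 2 + sigma_wr ** 2) / 2)


def compare(cases=CASES, n1=N1):
    """Return (sigma, FDA n, total low, total high) for each case."""
    rows = []
    for sigma_d, sigma_wt, sigma_wr, fda_n, (n2_lo, n2_hi) in cases:
        sigma = total_sigma(sigma_d, sigma_wt, sigma_wr)
        rows.append((round(sigma, 3), fda_n, n1 + n2_lo, n1 + n2_hi))
    return rows


for sigma, fda_n, lo, hi in compare():
    print(f"sigma={sigma}: single trial {fda_n}, pilot+pivotal {lo}-{hi}")
```

Only in the first case does the pilot–pivotal total exceed the single-trial value; in the third case even the largest total is below it.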

6. Discussion

The two-stage adaptive procedure derived in this paper appears to be the first attempt to formally design a pivotal bioequivalence trial, following a pilot trial, for testing average bioequivalence. Earlier work on pilot–pivotal trials, due to Wang and Zhou (1999) and Pan and Wang (2006), dealt with the design of a pilot trial after fixing the various parameters of a pivotal trial. Since the pivotal trial follows the pilot trial, it appears natural to design the pivotal trial following the outcome of the pilot trial. Also note that at the pilot stage, one may not have any idea about the magnitude of the variance, and an arbitrary value has to be used to determine the sample size. On the other hand, while designing the pivotal study, we do have some information concerning the variance, namely, the estimate obtained from the pilot data. We also believe that the strength of evidence in favor of ABE at the pilot trial should have a bearing on the design of the pivotal trial. In other words, it appears natural and practically useful to design the pivotal study following the pilot study.

The procedure we have developed depends on the functional forms of $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$ given in (4) and (5); clearly there is arbitrariness in the choice of these functions. The functions have a linear component, apart from the constants 0 and 1. We investigated the option of replacing the linear component by a quadratic component, namely $e + f(k_1 - x_1)^2$, and chose $e$ and $f$ similar to the choices of $a$ and $b$, or $c$ and $d$. In the following discussion, we shall refer to this as the quadratic case, and the choices in (4) and (5) as the linear case. The consequence of the quadratic choice is two-fold. (i) For certain values of $x_1$, the conditional type I error probability can be significantly larger than $\alpha_1$, even though the unconditional type I error probability is still $\alpha_1$. In the linear case, such inflated conditional type I error probabilities occurred only at fairly large negative values of $x_1$, and these have very small probabilities. (ii) When $x_1$ is close to $k_1$, the conditional power is much smaller in the quadratic case compared to the linear case; this will make the required sample size much larger in the quadratic case. The converse is true for values of $x_1$ far removed from $k_1$. Thus the choice of the functions $A(x_1, \theta_0)$ and $A(x_1, \theta_1)$ should be motivated by two considerations. First, the conditional type I error probability should not be too large compared to $\alpha_1$ for values of $x_1$ that are quite likely. Second, the conditional power should be such that the resulting sample sizes are reasonable for practical use. We believe that the choices in (4) and (5) do meet these requirements.
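The qualitative contrast between the two shapes can be illustrated with a short Python sketch. It assumes the linear component has the form a + b(k1 − x1), with the function set to 0 when x1 > k1 (no pivotal trial) and capped at 1. The constants below are hypothetical: they are not the values used in (4) and (5), and the quadratic coefficients are simply matched to the linear ones at x1 = k1 and at a distance L below k1, rather than calibrated to the unconditional constraints as in the paper.

```python
def clip01(p):
    """Keep a probability inside [0, 1]."""
    return max(0.0, min(1.0, p))


def linear_shape(x1, k1, a, b):
    """Assumed linear form a + b * (k1 - x1); zero when the pivotal trial is not run."""
    if x1 > k1:
        return 0.0
    return clip01(a + b * (k1 - x1))


def quadratic_shape(x1, k1, e, f):
    """Quadratic alternative e + f * (k1 - x1)**2; zero when the pivotal trial is not run."""
    if x1 > k1:
        return 0.0
    return clip01(e + f * (k1 - x1) ** 2)


# Hypothetical constants, for illustration only.
K1 = -1.44          # pilot-stage cut-off
A, B = 0.02, 0.01   # linear intercept and slope
L = 6.0             # distance below k1 at which the two shapes are made to agree
E, F = A, B / L     # quadratic constants matched at x1 = k1 and x1 = k1 - L

for dist in (0.5, 1.0, 3.0, 6.0, 8.0):
    x1 = K1 - dist
    print(dist, round(linear_shape(x1, K1, A, B), 4), round(quadratic_shape(x1, K1, E, F), 4))
```

Under this matching, the quadratic shape lies below the linear one for x1 near k1 and above it beyond the matching distance, which mirrors the behavior described in (i) and (ii).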

In our work, we have relied on the ideas developed in Koyama et al. (2005). However, there are some differences between that paper and our work. The test statistic considered in Koyama et al. (2005) follows a normal distribution since they assume the case of a known variance. The test we have considered is the standard test for ABE, namely the TOST, and does not require knowledge of the variance to carry out the test. In their stage I, Koyama et al. (2005) have a three-action decision rule: accept the null, reject the null, or continue into stage II; see Section 2. We have a two-action decision rule: to conclude that ABE does not hold, or to continue into the pivotal study.

A major conclusion regarding the pivotal trial is that the required sample size is insensitive to the pilot trial sample size. From their analysis, Pan and Wang (2006) and Wang and Zhou (1999) also noted that the pilot trial sample size is insensitive to the pivotal trial sample size. A final point to note is that our procedure really does not require the assumption of a common variance at the pilot stage and pivotal stage. It is conceivable that entirely different groups of subjects will be used in the two trials, and not having to assume a common variance appears more realistic.


Acknowledgments

We are grateful to two referees and an Associate Editor, whose comments resulted in the clarification of several ideas, and in a significant improvement in the presentation of the results.

References

Chinchilli, V.M., Esinhart, J.D., 1996. Design and analysis of intra-subject variability in crossover experiments. Statist. Med. 15, 1619–1634.
Chow, S.-C., Liu, J.-P., 2000. Design and Analysis of Bioavailability and Bioequivalence Studies, second ed. Marcel Dekker, New York.
Hauschke, D., Steinijans, V., Pigeot, I., 2007. Bioequivalence Studies in Drug Development: Methods and Applications. Wiley, New York.
Koyama, T., Sampson, A.R., Gleser, L.J., 2005. A calculus for design of two-stage adaptive procedures. J. Amer. Statist. Assoc. 100, 197–203.
Pan, G., Wang, Y., 2006. Average bioequivalence evaluation: general methods for pilot trials. J. Biopharmaceutical Statist. 16, 207–225.
Patterson, S., Jones, B., 2006. Bioequivalence and Statistics in Clinical Pharmacology. Chapman & Hall/CRC Press, New York.
Patterson, S.D., Zariffa, N.M.-D., Montague, T.H., Howland, K., 2001. Non-traditional study designs to demonstrate average bioequivalence for highly variable drug products. European J. Clinical Pharmacology 57, 663–670.
Schuirmann, D.J., 1981. On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics 37, 617.
Schuirmann, D.J., 1987. A comparison of the two one-sided procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinetics Biopharmaceutics 15, 657–680.
Tothfalusi, L., Endrenyi, L., 2003. Limits for the scaled average bioequivalence of highly variable drugs and drug products. Pharmaceutical Res. 20, 382–389.
US Food and Drug Administration, 2001. Guidance for industry: statistical approaches to establishing bioequivalence. 〈www.fda.gov/cder/guidance/3616fnl.htm〉.
US Food and Drug Administration, 2003a. Guidance for industry: bioavailability and bioequivalence studies for orally administered drug products-general considerations. 〈www.fda.gov/cder/guidance/5356fnl.pdf〉.
US Food and Drug Administration, 2003b. Guidance for industry: bioavailability and bioequivalence studies for nasal aerosols and nasal sprays for local action. 〈www.fda.gov/cder/guidance/5383DFT.pdf〉.
Wang, Y., Zhou, S., 1999. Pilot trial for the assessment of relative bioavailability in generic drug product development: statistical power. J. Biopharmaceutical Statist. 9, 179–187.