
Testing non-inferiority and superiority for two endpoints for several treatments with a control


John Lawrence*

Some multiple comparison procedures are described for multiple armed studies. The procedures are appropriate for testing all hypotheses for comparing two endpoints and multiple test arms to a single control group, for example three different fixed doses compared to a placebo. The procedure assumes that among the two endpoints, one is designated as a primary endpoint such that for a given treatment arm, no hypothesis for the secondary endpoint can be rejected unless the hypothesis for the primary endpoint was rejected. The procedures described control the family-wise error rate in the strong sense at a specified level α. Copyright © 2010 John Wiley & Sons, Ltd.

Keywords: multiple comparisons with a control; Dunnett’s test; Gatekeeping tests

1. INTRODUCTION

In this article, we focus on the multiple comparisons problem for comparing m treatment arms to a control group. The different treatment arms could be different drugs or could be different dosing regimens of the same drug. Also, we assume that there are two endpoints. One is designated as the primary endpoint. We consider the special situation where the control is an active control (not placebo) and the primary endpoint is first tested for non-inferiority with respect to the active control group. Examples of active controls might be warfarin for treating atrial fibrillation or paclitaxel for treating some kinds of cancer; it depends on the objectives of the trial and what benefits the investigators hope to demonstrate for the test drugs. The primary endpoint might be the time to myocardial infarction or death, for example. Non-inferiority means that the test arm is not too much worse than the control, where how much worse is defined by some pre-defined margin. Demonstrating non-inferiority usually shows that the drug is effective as long as the margin is smaller than the established effect of the active control. Further, it is desirable to test superiority for the primary endpoint for any arm that is shown to be non-inferior. Also, for a given treatment arm, the hypothesis for the secondary endpoint cannot be rejected unless the non-inferiority hypothesis for the primary endpoint was rejected. The secondary endpoint may be time to first cardiovascular hospitalization, for example.

Dmitrienko et al. [1] describe a multiple testing procedure based on Dunnett's test for the situation where the endpoints can be ordered into groups. For example, group 1 might be non-inferiority for the primary endpoint, group 2 superiority for the primary endpoint, and group 3 superiority for the secondary endpoint. Using that procedure, a win for a particular treatment on an endpoint in group 3 requires winning on the primary endpoint for that same treatment and winning for at least one treatment in group 2. However, this article deals with the special topic of testing non-inferiority, which deserves attention because it has some special characteristics. Moreover, the procedures in this article do not impose any logical restriction on the testing of hypotheses that forces the tests to be ordered into groups. For example, the procedures would allow a treatment to be superior for the primary endpoint, but no treatments superior for the secondary endpoint, and vice versa. This seems to make more sense for this situation.

This article first describes testing one treatment versus a control, and then describes how to test three treatments versus a control; an extension to m treatments is shown in Appendix C. Extensions to more than one secondary endpoint may also be possible, but are not discussed here.

2. ONE DOSE

Consider a clinical trial where patients with heart disease are randomized to an active control or a test drug. The primary endpoint is the time to the first stroke, myocardial infarction, or death from any cause. The test drug will be declared effective (or non-inferior) if it can be shown to have a hazard ratio compared to the active control of no more than some fixed margin Δ, for example 1.1. If it is non-inferior for the primary endpoint, then we may also want to test if it is superior to the active control for the primary endpoint. Moreover, we may also want to test whether it is superior to the active control on some secondary endpoint, say time to first hospitalization.


MAIN PAPER

Published online 15 October 2010 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/pst.468

Pharmaceut. Statist. 2011, 10 318–324 Copyright © 2010 John Wiley & Sons, Ltd.

a US Food and Drug Administration, Silver Spring, MD, USA

*Correspondence to: John Lawrence, US Food and Drug Administration, 10903 New Hampshire Ave., Silver Spring, MD 20993-002, USA. E-mail: [email protected]


These last two goals are equally important in the sense that we would want to test one of them even if we did not succeed on the other. The goal is to make a testing procedure that has those properties and controls the family-wise error rate at level α for testing all three hypotheses described in this situation.

Trials with the objective of showing non-inferiority are usually very large (or have many events) and we will assume that large sample theory applies for all estimates and test statistics [see the Discussion section for advice about the small sample case]. Let λ̂_P denote the estimate of the log-hazard ratio of the test group compared to the active control for the primary endpoint. Also, let λ̂_S denote the estimated log-hazard ratio for the secondary endpoint. By the large sample theory, (λ̂_P, λ̂_S)′ has mean equal to the true values of the parameters (λ_P, λ_S)′ and is approximately bivariate normal with correlation matrix Λ. We will assume Λ is known and does not depend on any unknown parameters, and that the standard deviations are σ_P and σ_S.

The elementary hypotheses to be tested can be written as H1: λ_P ≥ log(Δ), H2: λ_P ≥ 0, and H3: λ_S ≥ 0. To construct a valid procedure, the test drug will be declared non-inferior for the primary endpoint if T^ni_P = (λ̂_P − log(Δ))/σ_P < c1, where Δ is the non-inferiority margin for the hazard ratio and c1 is yet to be determined. If it is declared non-inferior for the primary endpoint, then it will be declared superior for the primary endpoint if T^s_P = λ̂_P/σ_P < c2, and superior for the secondary endpoint if T_S = λ̂_S/σ_S < c3.

Since no other hypotheses can be tested unless the study succeeds in showing non-inferiority, we can define c1 by P[T^ni_P < c1 | λ_P = log(Δ)] = α. Since T^ni_P is standard normal when λ_P = log(Δ), the solution is the lower α-quantile of the standard normal distribution, i.e. c1 = Φ⁻¹(α).
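As a quick numerical illustration (a sketch, not from the paper, which uses R in its appendix), the boundary critical value c1 = Φ⁻¹(α) can be computed directly; for α = 0.025 it is about −1.96:

```python
from scipy.stats import norm

alpha = 0.025
c1 = norm.ppf(alpha)  # lower alpha-quantile of the standard normal
print(round(c1, 3))   # about -1.96
```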

Now, we construct a closed test procedure [2] by considering how to test each of the seven non-empty intersections of the three null hypotheses. We will use T^ni_P < c1 as the rejection criterion to test these four intersections: H123 = H1 ∩ H2 ∩ H3, H12 = H1 ∩ H2, H13 = H1 ∩ H3, and H1. Also, we will use {T^s_P < c2 or T_S < c3} to test H23 = H2 ∩ H3, use T^s_P < c2 to test H2, and use T_S < c3 to test H3. It remains to find c2 and c3 that will make these last three tests valid.

The constants c2 and c3 need to satisfy the following inequality: P[T^s_P < c2 or T_S < c3 | λ_P = λ_S = 0] ≤ α. This is sufficient because for any {x, y} ∈ [0, ∞) × [0, ∞), one can show:

P[T^s_P < c2 | λ_P = x] ≤ P[T^s_P < c2 or T_S < c3 | {λ_P, λ_S} = {x, y}],
P[T_S < c3 | λ_S = y] ≤ P[T^s_P < c2 or T_S < c3 | {λ_P, λ_S} = {x, y}], and
P[T^s_P < c2 or T_S < c3 | {λ_P, λ_S} = {x, y}] ≤ P[T^s_P < c2 or T_S < c3 | λ_P = λ_S = 0].

For example, if Corr(λ̂_P, λ̂_S) = 0.5 and α = 0.025, then one possible choice is c2 = c3 = −2.212. Figure 1 shows a graph of other choices when Corr(λ̂_P, λ̂_S) = 3/4, 1/2, 1/4, and 0.

Note that c2 and c3 are always smaller than c1. Also, c3 is a strictly decreasing function of c2 such that as c2 decreases to −∞, c3 approaches c1.
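The stated example can be checked numerically (a scipy sketch under the null configuration λ_P = λ_S = 0, where the two statistics are standard bivariate normal with correlation 0.5):

```python
from scipy.stats import multivariate_normal

rho = 0.5
c = -2.212  # candidate common value for c2 = c3
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
# P[T_P^s < c or T_S < c] = 1 - P[both statistics >= c];
# by symmetry of the zero-mean normal, P[both >= c] = P[both <= -c]
p_reject = 1.0 - mvn.cdf([-c, -c])
print(round(p_reject, 4))  # approximately 0.025
```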

This procedure is simple to implement, but it is not an α-exhaustive procedure [3]. It could be made slightly more powerful by testing each intersection at level α. The α-exhaustive procedure would conclude superiority for the primary endpoint if T^s_P < c2 or {T^s_P < Φ⁻¹(α) and T_S < c3}, and conclude superiority for the secondary endpoint if {T^ni_P < c1 and T_S < c3} or {T^s_P < c2 and T_S < Φ⁻¹(α)}.

It can be seen that the gain in power of the α-exhaustive procedure over the original procedure is only for the superiority of the primary and secondary hypotheses (they are identical as far as non-inferiority). For example, with α = 0.025, Corr(λ̂_P, λ̂_S) = 0.5, c2 = c3 = −2.212, and assuming (1/2)E[T^ni_P] = E[T^s_P] = E[T_S] = −2, where E[T] denotes the expected value of the statistic T, the chance of concluding superiority for the primary endpoint using the simpler procedure is P[T^s_P < c2] = Φ(2 + c2) ≈ 0.416, while the chance of concluding superiority using the α-exhaustive procedure is

P[T^s_P < c2 or {T^s_P < Φ⁻¹(α) and T_S < c3}] = P[T^s_P < c2] + P[c2 < T^s_P < Φ⁻¹(α) and T_S < c3] ≈ 0.416 + 0.042 = 0.458.

The chance of concluding superiority for the secondary endpoint using the simpler procedure is also 0.416, while the chance using the α-exhaustive procedure is

P[{T^ni_P < c1 and T_S < c3} or {T^s_P < c2 and T_S < Φ⁻¹(α)}]
= P[T^ni_P < c1 and T_S < c3] + P[T^s_P < c2 and T_S < Φ⁻¹(α)] − P[T^s_P < c2 and T_S < c3] ≈ 0.457.
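The 0.458 figure for the primary endpoint can be reproduced numerically (an illustrative scipy sketch, not the paper's code, under the assumed means of −2 and correlation 0.5):

```python
from scipy.stats import norm, multivariate_normal

alpha, rho = 0.025, 0.5
c2 = c3 = -2.212
qa = norm.ppf(alpha)  # Phi^{-1}(alpha), about -1.96
# T_P^s and T_S are bivariate normal with means -2 and correlation rho
mvn = multivariate_normal(mean=[-2.0, -2.0], cov=[[1.0, rho], [rho, 1.0]])

p_simple = norm.cdf(c2, loc=-2.0)  # P[T_P^s < c2], about 0.416
# extra region gained by the alpha-exhaustive test:
# c2 < T_P^s < Phi^{-1}(alpha) and T_S < c3
p_extra = mvn.cdf([qa, c3]) - mvn.cdf([c2, c3])
p_exhaustive = p_simple + p_extra
print(round(p_simple, 3), round(p_exhaustive, 3))  # about 0.416 and 0.458
```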

Another comparison can be made to a Bonferroni-based closed test procedure that starts by testing for non-inferiority using level α. If this is rejected, the two tests for superiority are done using the equally weighted Holm's procedure. Under the same assumptions as in the last paragraph, the probability of concluding superiority for the primary endpoint is

P[T^s_P < Φ⁻¹(α/2) or {T^s_P < Φ⁻¹(α) and T_S < Φ⁻¹(α/2)}]
= P[T^s_P < Φ⁻¹(α/2)] + P[Φ⁻¹(α/2) < T^s_P < Φ⁻¹(α) and T_S < Φ⁻¹(α/2)] ≈ 0.452

and the probability of concluding superiority for the secondary endpoint is

P[{T^ni_P < Φ⁻¹(α) and T_S < Φ⁻¹(α/2)} or {T^s_P < Φ⁻¹(α/2) and T_S < Φ⁻¹(α)}]
= P[T^ni_P < Φ⁻¹(α) and T_S < Φ⁻¹(α/2)] + P[T^s_P < Φ⁻¹(α/2) and T_S < Φ⁻¹(α)] − P[T^s_P < Φ⁻¹(α/2) and T_S < Φ⁻¹(α/2)] ≈ 0.450.

These calculations in the last two paragraphs illustrate that the Bonferroni-based closed test procedure is more powerful than the simple test procedure, but not as powerful as the full closure procedure.
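The primary-endpoint probability for the Bonferroni/Holm procedure can be checked in the same way (again a scipy sketch under the stated assumptions):

```python
from scipy.stats import norm, multivariate_normal

alpha, rho = 0.025, 0.5
qa = norm.ppf(alpha)       # Phi^{-1}(alpha)
qa2 = norm.ppf(alpha / 2)  # Phi^{-1}(alpha/2), the Holm critical value
mvn = multivariate_normal(mean=[-2.0, -2.0], cov=[[1.0, rho], [rho, 1.0]])

# P[T_P^s < Phi^{-1}(a/2)]
#   + P[Phi^{-1}(a/2) < T_P^s < Phi^{-1}(a) and T_S < Phi^{-1}(a/2)]
p_primary = norm.cdf(qa2, loc=-2.0) + (mvn.cdf([qa, qa2]) - mvn.cdf([qa2, qa2]))
print(round(p_primary, 3))  # close to the paper's 0.452
```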

Figure 1. Possible values of c2 and c3 when α = 0.025 and Corr(λ̂_P, λ̂_S) = 3/4, 1/2, 1/4, and 0 (going from top to bottom).

3. THREE DOSES

Consider a clinical trial similar to the one in Section 2, except the patients are randomized to an active control or to one of three doses of a test drug. The primary endpoint is the time to the first stroke, myocardial infarction, or death from any cause. There are now three hypotheses for non-inferiority for the primary endpoint, three for superiority for the primary endpoint, and three for the secondary endpoint. The goal is to make a testing procedure that has similar properties to those in Section 2 (that is, if non-inferiority is concluded for any dose, then superiority can be tested for both endpoints for that dose) and controls the family-wise error rate at level α for testing all nine of these hypotheses.

Let λ̂_{P,1}, λ̂_{P,2}, and λ̂_{P,3} denote the estimates of the log-hazard ratio of the three test groups compared to the active control for the primary endpoint. Also, let λ̂_{S,1}, λ̂_{S,2}, and λ̂_{S,3} denote the estimated log-hazard ratios for the secondary endpoint. By the large sample theory, (λ̂_{P,1}, λ̂_{P,2}, λ̂_{P,3}, λ̂_{S,1}, λ̂_{S,2}, λ̂_{S,3})′ has mean equal to the true values of the parameters and is approximately multivariate normal with covariance matrix Λ. We will assume Λ is known and does not depend on any unknown parameters, and that the standard deviations are σ_{P,i} and σ_{S,i}, i = 1, 2, 3. The nine hypotheses to be tested are H_{1,i}: λ_{P,i} ≥ log(Δ), H_{2,i}: λ_{P,i} ≥ 0, and H_{3,i}: λ_{S,i} ≥ 0 for i = 1, 2, 3.

Group i will be declared non-inferior for the primary endpoint if T^ni_{P,i} = (λ̂_{P,i} − log(Δ))/σ_{P,i} < c1, where Δ is the non-inferiority margin for the hazard ratio and c1 is yet to be determined. If group i is declared non-inferior for the primary endpoint, then it will be declared superior for the primary endpoint if T^s_{P,i} = λ̂_{P,i}/σ_{P,i} < c2 and/or superior for the secondary endpoint if T_{S,i} = λ̂_{S,i}/σ_{S,i} < c3.

Since no other hypotheses can be tested for a group unless it succeeds in showing non-inferiority, we can define c1 by P[min_i T^ni_{P,i} < c1 | λ_{P,i} = log(Δ) for all i] = α. This can be found numerically using the algorithm for computing multivariate normal probabilities described in [4].
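For the equally allocated design of Section 4, the three comparisons share one control group and are equicorrelated with correlation 1/2, so c1 solves a trivariate normal equation; a scipy sketch (at the boundary each T^ni_{P,i} is standard normal) recovers the Dunnett-type value −2.349 quoted later:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal

# equicorrelated (rho = 1/2) trivariate normal: three arms sharing one control
cov = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
mvn = multivariate_normal(mean=np.zeros(3), cov=cov)

def excess(c1, alpha=0.025):
    # P[min_i T_{P,i}^ni < c1] - alpha; by symmetry P[all >= c1] = P[all <= -c1]
    return (1.0 - mvn.cdf([-c1, -c1, -c1])) - alpha

c1 = brentq(excess, -2.5, -2.0)
print(round(c1, 3))  # about -2.349
```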

The constants c2 and c3 will depend on c1 and on each other. There is some flexibility in the choice of these constants, and they should be inversely related. Suppose that c3 = −∞, which should allow c2 to be as large as possible. We are going to think about how we can test each intersection of null hypotheses. For any j, we want to be able to reject the intersection H_{2,j} ∩ {∩_{i≠j} H_{1,i}} if T^s_{P,j} < c2 or min_{i≠j} T^ni_{P,i} < c1. So, c2 will need to be small enough to ensure P[T^s_{P,j} < c2 or min_{i≠j} T^ni_{P,i} < c1 | λ_{P,j} = 0 and λ_{P,i} = log(Δ) for i ≠ j] ≤ α. Similarly, we want to ensure P[T^ni_{P,j} < c1 or min_{i≠j} T^s_{P,i} < c2 | λ_{P,j} = log(Δ) and λ_{P,i} = 0 for i ≠ j] ≤ α. Also, we need P[min_i T^s_{P,i} < c2 | λ_{P,i} = 0 for all i] ≤ α. Note that all these conditions are satisfied if and only if c2 ≤ c1.

After c2 is chosen, c3 will need to be small enough to satisfy the following inequalities:

(i) P[T^s_{P,j} < c2 or T_{S,j} < c3 or min_{i≠j} T^ni_{P,i} < c1] ≤ α given λ_{P,j} = λ_{S,j} = 0 and λ_{P,i} = log(Δ) for i ≠ j;
(ii) P[T^ni_{P,j} < c1 or min_{i≠j} T^s_{P,i} < c2 or min_{i≠j} T_{S,i} < c3] ≤ α given λ_{P,j} = log(Δ) and λ_{P,i} = λ_{S,i} = 0 for i ≠ j;
(iii) P[min_i T^s_{P,i} < c2 or min_i T_{S,i} < c3 | λ_{P,i} = λ_{S,i} = 0 for all i] ≤ α.

If non-inferiority is declared for a particular group i, i.e. T^ni_{P,i} < c1, then the test procedure will declare superiority on the primary endpoint if T^s_{P,i} < c2 and will independently declare superiority for the secondary endpoint if T_{S,i} < c3. This procedure is a closed test procedure [3]. As such, it controls the family-wise error rate at level α. To see why it is a closed test procedure, we need to describe how each intersection of null hypotheses will be tested. The process should be clearer from a specific example such as H_{2,1} ∩ H_{3,1} ∩ H_{1,3} ∩ H_{3,3}. This will be rejected if T^s_{P,1} < c2 or T_{S,1} < c3 or T^ni_{P,3} < c1. Notice that the part H_{3,3} in the intersection is ignored in defining the rejection criteria because the primary non-inferiority hypothesis for group 3 supersedes it. Then, it is straightforward to define the rejection criteria for the remaining hypotheses. This is an α-level procedure for testing this intersection because P[T^s_{P,1} < c2 or T_{S,1} < c3 or T^ni_{P,3} < c1] given H_{2,1} ∩ H_{3,1} ∩ H_{1,3} ∩ H_{3,3} is less than or equal to P[T^ni_{P,3} < c1 or min_{i≠3} T^s_{P,i} < c2 or min_{i≠3} T_{S,i} < c3] given λ_{P,3} = log(Δ) and λ_{P,i} = λ_{S,i} = 0 for i ≠ 3. This last probability is less than or equal to α by the definition of the constants, because the maximum over the region occurs on the boundary and one event is contained in the other.

As in Section 2, this procedure is easy to implement, but is not α-exhaustive. It can be made α-exhaustive by testing each intersection at level α. There are 511 non-empty intersections. Some of these intersections are redundant since they use the same test. But, as we saw in Section 2, the extra effort might pay off in a non-trivial increase in power. See the next section and Appendices B and C for more details about α-exhaustive procedures.

4. EXAMPLE WITH THREE DOSES

Suppose the study is planned to have a total of 2000 events for the primary endpoint, the patients are equally allocated between treatment arms, the non-inferiority margin is 1.1, α = 0.025, and there will be 3000 events for the secondary endpoint. In this case,

    Λ =
    [ 1    1/2  1/2  ρ    ρ/2  ρ/2 ]
    [ 1/2  1    1/2  ρ/2  ρ    ρ/2 ]
    [ 1/2  1/2  1    ρ/2  ρ/2  ρ   ]
    [ ρ    ρ/2  ρ/2  1    1/2  1/2 ]
    [ ρ/2  ρ    ρ/2  1/2  1    1/2 ]
    [ ρ/2  ρ/2  ρ    1/2  1/2  1   ]

where ρ represents the correlation between the two endpoints; assume ρ to be 0.5 in this example.

Now, c1 = −2.349 and we can choose c2 to be anything smaller than that. Suppose we take c2 = −2.6. Then, the largest values of c3 that satisfy inequalities (i), (ii), and (iii) in Section 3 are −2.602, −2.585, and −2.569, respectively (see Appendix A). Since we need c3 to satisfy all three, we have to use the smallest, i.e. c3 = −2.602. Figure 2 shows some other choices of c2 and c3. The figure shows that there is a tradeoff between the power of concluding superiority for the primary endpoint and the power of concluding superiority for the secondary endpoint. If the objectives are to have roughly the same chance of concluding either given the same effect sizes, then a choice such as c2 = −2.601 and c3 = −2.601 would be reasonable.
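The binding constraint here is inequality (i), which for j = 1 involves (T^s_{P,1}, T^ni_{P,2}, T^ni_{P,3}, T_{S,1}) and the corresponding 4×4 block of the correlation matrix. A scipy sketch mirroring the structure of the R program in Appendix A (under the boundary configuration, where each statistic is standard normal):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal

rho, alpha, c1, c2 = 0.5, 0.025, -2.349, -2.6

# correlation matrix for (P1, P2, P3, S1, S2, S3): 1/2 between arms,
# rho within an arm across endpoints, rho/2 across arms and endpoints
A = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
corr = np.block([[A, rho * A], [rho * A, A]])

sub = corr[:4, :4]  # (P1, P2, P3, S1) for inequality (i) with j = 1
mvn = multivariate_normal(mean=np.zeros(4), cov=sub)

def excess(c3):
    # P[T_{P,1}^s < c2 or T_{P,2}^ni < c1 or T_{P,3}^ni < c1 or T_{S,1} < c3] - alpha
    lower = np.array([c2, c1, c1, c3])
    return (1.0 - mvn.cdf(-lower)) - alpha

c3 = brentq(excess, -3.5, -1.9)
print(round(c3, 3))  # about -2.602
```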

One can define a Bonferroni-based closed test procedure for this scenario also. We will use the graphical approach described in [5] to define one possible procedure, shown in Figure 3. In the graph, the hypotheses are represented by nodes. The three nodes in the top row are for non-inferiority for the primary endpoint for the three doses, the next row is for superiority for the primary endpoint, and the bottom row is for superiority for the secondary endpoint. In words, for each dose, non-inferiority is first tested at level α/3. If rejected, then Holm's procedure is used to test both superiority hypotheses for that dose. If all three hypotheses are rejected for a particular dose, then the other doses can each be tested at level α/2 rather than the initial α/3 (this transfer is represented in the graph by the edges with weight ε/2). Finally, if in addition all three hypotheses are rejected for a second dose, then the remaining dose can be tested at level α.

It is difficult to compare the power of the simple procedure described in this section with the procedure in Figure 3 because there are different ways of defining power. Certainly, neither procedure dominates the other. For starters, the graphical procedure rejects no hypotheses unless min{T^ni_{P,i}} < Φ⁻¹(0.025/3) ≈ −2.394, but the other procedure will reject a non-inferiority hypothesis for dose i if T^ni_{P,i} < −2.349. This tells us that the graphical procedure does not dominate the other procedure; moreover, if power is defined as the probability of rejecting at least one hypothesis, the simple Dunnett-based procedure is uniformly (for any postulated effect sizes for the primary endpoint) more powerful. But, on the other hand, if all the hypotheses for dose 1 can be rejected in the Bonferroni-based procedure, then non-inferiority for the other doses (i = 2 and 3) can be concluded if T^ni_{P,i} < Φ⁻¹(0.025/2) ≈ −2.241, but the simple Dunnett-based procedure still uses the critical value −2.349.

Furthermore, the full closure of the Dunnett's test dominates the simpler procedure and also dominates the procedure in Figure 3 because it is based on the same concept, but uses the correlation to define larger critical values. However, it is much harder to write down. In Appendix B, an α-exhaustive closed test based on Dunnett's test is defined explicitly and compared to the procedure in Figure 3. In each case in the table, the rejection region for the Dunnett-based test includes the rejection region for the Bonferroni-based test, so the former dominates the latter. Appendix C shows a general algorithm for constructing a Dunnett-based closed test procedure for the case of m treatment arms. Note: the algorithm in Appendix C was not used to define the procedure in Appendix B. In Appendix B, it was modified slightly to construct a procedure that dominates the Bonferroni-based procedure, but is still based on the closure of the Dunnett's test. There are many ways of defining such procedures based on Dunnett's test, and these just illustrate two of them.

5. DISCUSSION

This article provides an algorithm for testing non-inferiority and superiority for a primary endpoint and superiority for a secondary endpoint. The examples were mainly for time-to-event endpoints, but the procedure also works with any type of endpoints as long as the test statistics are jointly multivariate normal and the tests for superiority and non-inferiority for the same endpoint within the same treatment arm differ by a constant. The procedure described is easy to implement after the constants are chosen, but is not α-exhaustive. It could be made more powerful by modifying each test of every intersection hypothesis so that it is tested at level α.

The correlations between endpoints are estimated from the data, but in the procedure, we assume they are known. If the sample size (or number of events) is very large, then one would expect that the estimated correlation could be treated as a known parameter. However, if the sample size is too small to assume the correlation is known but still large enough to assume the statistics are normal, then one could use the maximum critical value either over all possible correlations in (−1, 1) or over, say, an approximate 99% confidence interval for the correlation. One possible way to make an approximate confidence interval for the correlation is to use the bootstrap distribution, randomly choosing subjects with replacement from the pooled set of subjects from both groups. This simulates data under the global null hypothesis that there is no difference between the groups [not under the null hypothesis for non-inferiority]. Another possible way is to resample with replacement from each group separately. Then use the upper and lower 0.5% quantiles of the distribution of the correlations from this simulation. The coverage probability will not be exactly 99%, but it should be adequate for the purpose of giving a reasonable upper and lower bound for the true correlation. The procedures that use Dunnett's test in this article do not apply to the scenario where the number of events is too small to use the normal approximation. In that case, a Bonferroni-based procedure such as described in Figure 3 is preferable.

Figure 2. Possible values of c2 and c3 for the scenario in this section.

Figure 3. Graphical representation of a Bonferroni-based closed test procedure.
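A minimal sketch of the percentile bootstrap for the endpoint correlation (illustrative only: the paired data, sample size, and true correlation below are hypothetical, and real survival data would require correlations computed on suitable within-subject scores):

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_rho = 500, 0.5
# hypothetical paired scores for the two endpoints on the same subjects
cov = [[1.0, true_rho], [true_rho, 1.0]]
data = rng.multivariate_normal([0.0, 0.0], cov, size=n)

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)  # resample subjects with replacement
    sample = data[idx]
    boot.append(np.corrcoef(sample[:, 0], sample[:, 1])[0, 1])

lo, hi = np.quantile(boot, [0.005, 0.995])  # upper and lower 0.5% quantiles
print(round(lo, 3), round(hi, 3))
```

The maximum critical value over the resulting interval would then replace the plug-in correlation, as described above.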

Acknowledgements

Thanks to Norman Stockbridge for writing the software to produce Figure 3. The views expressed are those of the author and not necessarily those of the U.S. Food and Drug Administration.

REFERENCES

[1] Dmitrienko A, Offen W, Wang O, Xiao D. Gatekeeping procedures in dose-response clinical trials based on the Dunnett test. Pharmaceutical Statistics 2006; 5:19–28.

[2] Marcus R, Peritz E, Gabriel KR. On closed testing procedures with special reference to ordered analysis of variance. Biometrika 1976; 63:655–660.

[3] Grechanovsky E, Hochberg Y. Closed procedures are better and often admit a shortcut. Journal of Statistical Planning and Inference 1999; 76:79–91.

[4] Genz A, Bretz F. Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Vol. 195. Springer: Heidelberg, 2009.

[5] Bretz F, Maurer W, Brannath W, Posch M. A graphical approach to sequentially rejective multiple test procedures. Statistics in Medicine 2009; 28:586–604.

APPENDIX A

R program to calculate the constants in Section 4 (the line symmetrizing m2 was added so that a complete covariance matrix is passed to pmvnorm):

library(mvtnorm)  # uses package 'mvtnorm'
alpha <- 0.025
m2 <- diag(c(1/2, 1/2, 1/2, 1/3, 1/3, 1/3))
m2[lower.tri(m2)] <- c(1/4, 1/4, 0.5/sqrt(6), 0.5/2/sqrt(6),
  0.5/2/sqrt(6), 1/4, 0.5/sqrt(6)/2, 0.5/sqrt(6), 0.5/sqrt(6)/2,
  0.5/sqrt(6)/2, 0.5/sqrt(6)/2, 0.5/sqrt(6), 1/6, 1/6, 1/6)
m2[upper.tri(m2)] <- t(m2)[upper.tri(m2)]  # make the covariance matrix symmetric
# rescale the covariance matrix to a correlation matrix
for (i in 1:6) m2[i, ] <- m2[i, ]/sqrt(m2[i, i])
for (i in 1:6) m2[, i] <- m2[, i]/m2[i, i]
f1 <- function(c1, alpha) 1 - pmvnorm(lower = rep(c1, 3),
  upper = rep(Inf, 3), sigma = m2[1:3, 1:3]) - alpha
c1 <- round(uniroot(f1, c(-2.5, -1.9), alpha = alpha)$root, 3)
c2 <- -2.6
f1 <- function(c3, c1, c2, alpha) 1 - pmvnorm(lower = c(c2, c1, c1, c3),
  upper = rep(Inf, 4), sigma = m2[1:4, 1:4]) - alpha
c31 <- round(uniroot(f1, c(-3.5, -1.9), c1 = c1, c2 = c2, alpha = alpha)$root, 3)
f1 <- function(c3, c1, c2, alpha) 1 - pmvnorm(lower = c(c2, c2, c1, c3, c3),
  upper = rep(Inf, 5), sigma = m2[1:5, 1:5]) - alpha
c32 <- round(uniroot(f1, c(-3.5, -1.9), c1 = c1, c2 = c2, alpha = alpha)$root, 3)
f1 <- function(c3, c1, c2, alpha) 1 - pmvnorm(lower = c(c2, c2, c2, c3, c3, c3),
  upper = rep(Inf, 6), sigma = m2) - alpha
c33 <- round(uniroot(f1, c(-3.5, -1.9), c1 = c1, c2 = c2, alpha = alpha)$root, 3)
print(c(c31, c32, c33, min(c(c31, c32, c33))))

APPENDIX B

This appendix describes the α-exhaustive Dunnett-based procedure and the Bonferroni-based procedure from Section 4 in more detail. First, order the nine hypotheses H_{1,1}, H_{2,1}, H_{3,1}, H_{1,2}, H_{2,2}, H_{3,2}, H_{1,3}, H_{2,3}, H_{3,3}. Use the notation H111111111 to denote the intersection of all nine hypotheses, H111011011 to denote the intersection of all hypotheses except the 4th and 7th, etc. In this way, by using a sequence of nine 0's and 1's, we can denote all the 511 non-empty intersection hypotheses. Furthermore, a dot in any of the positions will be a wild card which can represent either 0 or 1. For example, H1··1··1·· represents any of the 64 intersection hypotheses contained in the intersection of the 1st, 4th, and 7th. Furthermore, note that the procedures are symmetrical with respect to the three doses. For example, once the test is defined for H111000011, it is automatically defined for H111011000, H011111000, H000111011, H011000111, and H000011111. The following table defines the tests for all the 511 non-empty intersection hypotheses.
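As a small illustration of this bookkeeping (not from the paper), the non-empty intersections can be enumerated as 0/1 patterns and the wildcard families counted:

```python
from itertools import product

# every non-empty subset of the nine hypotheses, as a 0/1 pattern
patterns = [p for p in product((0, 1), repeat=9) if any(p)]
n_total = len(patterns)  # 2^9 - 1 = 511

# H1..1..1..: positions 1, 4, 7 (indices 0, 3, 6) fixed to 1, rest free
family = [p for p in patterns if p[0] == 1 and p[3] == 1 and p[6] == 1]
print(n_total, len(family))  # 511 64
```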

Each entry below gives the intersection hypothesis, in parentheses the number of intersection hypotheses it defines (through the wild cards and the symmetry over doses), and then the rejection regions for the Dunnett-based full closure and the Bonferroni-based closure.

H1··1··1·· (64):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T^ni_{P,2} ≥ 2.349 or T^ni_{P,3} ≥ 2.349
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T^ni_{P,2} ≥ 2.394 or T^ni_{P,3} ≥ 2.394

H1··1··011 (48):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T^ni_{P,2} ≥ 2.349 or T^s_{P,3} ≥ 2.601 or T_{S,3} ≥ 2.601
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T^ni_{P,2} ≥ 2.394 or T^s_{P,3} ≥ 2.638 or T_{S,3} ≥ 2.638

H1··1··001 (48):
  Dunnett:    T^ni_{P,1} ≥ 2.356 or T^ni_{P,2} ≥ 2.356 or T_{S,3} ≥ 2.394
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T^ni_{P,2} ≥ 2.394 or T_{S,3} ≥ 2.394

H1··1··010 (48):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T^ni_{P,2} ≥ 2.349 or T^s_{P,3} ≥ 2.349
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T^ni_{P,2} ≥ 2.394 or T^s_{P,3} ≥ 2.394

H1··1··000 (48):
  Dunnett:    T^ni_{P,1} ≥ 2.212 or T^ni_{P,2} ≥ 2.212
  Bonferroni: T^ni_{P,1} ≥ 2.241 or T^ni_{P,2} ≥ 2.241

H1··011011 (12):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T^s_{P,2} ≥ 2.592 or T_{S,2} ≥ 2.592 or T^s_{P,3} ≥ 2.592 or T_{S,3} ≥ 2.592
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T^s_{P,2} ≥ 2.638 or T_{S,2} ≥ 2.638 or T^s_{P,3} ≥ 2.638 or T_{S,3} ≥ 2.638

H1··011001 (24):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T^s_{P,2} ≥ 2.592 or T_{S,2} ≥ 2.592 or T_{S,3} ≥ 2.387
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T^s_{P,2} ≥ 2.638 or T_{S,2} ≥ 2.638 or T_{S,3} ≥ 2.394

H1··011010 (24):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T^s_{P,2} ≥ 2.592 or T_{S,2} ≥ 2.592 or T^s_{P,3} ≥ 2.358
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T^s_{P,2} ≥ 2.638 or T_{S,2} ≥ 2.638 or T^s_{P,3} ≥ 2.394

H1··011000 (24):
  Dunnett:    T^ni_{P,1} ≥ 2.212 or T^s_{P,2} ≥ 2.463 or T_{S,2} ≥ 2.463
  Bonferroni: T^ni_{P,1} ≥ 2.241 or T^s_{P,2} ≥ 2.498 or T_{S,2} ≥ 2.498

H1··001001 (12):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T_{S,2} ≥ 2.38 or T_{S,3} ≥ 2.38
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T_{S,2} ≥ 2.394 or T_{S,3} ≥ 2.394

H1··001010 (24):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T_{S,2} ≥ 2.4 or T^s_{P,3} ≥ 2.358
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T_{S,2} ≥ 2.394 or T^s_{P,3} ≥ 2.394

H1··001000 (24):
  Dunnett:    T^ni_{P,1} ≥ 2.212 or T_{S,2} ≥ 2.252
  Bonferroni: T^ni_{P,1} ≥ 2.241 or T_{S,2} ≥ 2.241

H1··010010 (12):
  Dunnett:    T^ni_{P,1} ≥ 2.349 or T^s_{P,2} ≥ 2.349 or T^s_{P,3} ≥ 2.349
  Bonferroni: T^ni_{P,1} ≥ 2.394 or T^s_{P,2} ≥ 2.394 or T^s_{P,3} ≥ 2.394

H1··010000 (24):
  Dunnett:    T^ni_{P,1} ≥ 2.212 or T^s_{P,2} ≥ 2.212
  Bonferroni: T^ni_{P,1} ≥ 2.241 or T^s_{P,2} ≥ 2.241

H1··000000 (12):
  Dunnett:    T^ni_{P,1} ≥ 1.960
  Bonferroni: T^ni_{P,1} ≥ 1.960

H011011011 (1):
  Dunnett:    T^s_{P,1} ≥ 2.584 or T_{S,1} ≥ 2.584 or T^s_{P,2} ≥ 2.584 or T_{S,2} ≥ 2.584 or T^s_{P,3} ≥ 2.584 or T_{S,3} ≥ 2.584
  Bonferroni: T^s_{P,1} ≥ 2.638 or T_{S,1} ≥ 2.638 or T^s_{P,2} ≥ 2.638 or T_{S,2} ≥ 2.638 or T^s_{P,3} ≥ 2.638 or T_{S,3} ≥ 2.638

H011011001 (3):
  Dunnett:    T^s_{P,1} ≥ 2.584 or T_{S,1} ≥ 2.584 or T^s_{P,2} ≥ 2.584 or T_{S,2} ≥ 2.584 or T_{S,3} ≥ 2.366
  Bonferroni: T^s_{P,1} ≥ 2.638 or T_{S,1} ≥ 2.638 or T^s_{P,2} ≥ 2.638 or T_{S,2} ≥ 2.638 or T_{S,3} ≥ 2.394

H011011010 (3):
  Dunnett:    T^s_{P,1} ≥ 2.584 or T_{S,1} ≥ 2.584 or T^s_{P,2} ≥ 2.584 or T_{S,2} ≥ 2.584 or T^s_{P,3} ≥ 2.366
  Bonferroni: T^s_{P,1} ≥ 2.638 or T_{S,1} ≥ 2.638 or T^s_{P,2} ≥ 2.638 or T_{S,2} ≥ 2.638 or T^s_{P,3} ≥ 2.394

H011011000 (3):
  Dunnett:    T^s_{P,1} ≥ 2.453 or T_{S,1} ≥ 2.453 or T^s_{P,2} ≥ 2.453 or T_{S,2} ≥ 2.453
  Bonferroni: T^s_{P,1} ≥ 2.498 or T_{S,1} ≥ 2.498 or T^s_{P,2} ≥ 2.498 or T_{S,2} ≥ 2.498

H011001001 (3):
  Dunnett:    T^s_{P,1} ≥ 2.601 or T_{S,1} ≥ 2.601 or T_{S,2} ≥ 2.349 or T_{S,3} ≥ 2.349
  Bonferroni: T^s_{P,1} ≥ 2.638 or T_{S,1} ≥ 2.638 or T_{S,2} ≥ 2.394 or T_{S,3} ≥ 2.394

H011001010 (6):
  Dunnett:    T^s_{P,1} ≥ 2.584 or T_{S,1} ≥ 2.584 or T_{S,2} ≥ 2.378 or T^s_{P,3} ≥ 2.366
  Bonferroni: T^s_{P,1} ≥ 2.638 or T_{S,1} ≥ 2.638 or T_{S,2} ≥ 2.394 or T^s_{P,3} ≥ 2.394

H011001000 (6):
  Dunnett:    T^s_{P,1} ≥ 2.453 or T_{S,1} ≥ 2.453 or T_{S,2} ≥ 2.223
  Bonferroni: T^s_{P,1} ≥ 2.498 or T_{S,1} ≥ 2.498 or T_{S,2} ≥ 2.241

H011010010 (3):
  Dunnett:    T^s_{P,1} ≥ 2.584 or T_{S,1} ≥ 2.584 or T^s_{P,2} ≥ 2.363 or T^s_{P,3} ≥ 2.363
  Bonferroni: T^s_{P,1} ≥ 2.638 or T_{S,1} ≥ 2.638 or T^s_{P,2} ≥ 2.394 or T^s_{P,3} ≥ 2.394

H011010000 (6):
  Dunnett:    T^s_{P,1} ≥ 2.453 or T_{S,1} ≥ 2.453 or T^s_{P,2} ≥ 2.223
  Bonferroni: T^s_{P,1} ≥ 2.498 or T_{S,1} ≥ 2.498 or T^s_{P,2} ≥ 2.241

H011000000 (3):
  Dunnett:    T^s_{P,1} ≥ 2.212 or T_{S,1} ≥ 2.212
  Bonferroni: T^s_{P,1} ≥ 2.241 or T_{S,1} ≥ 2.241

H001001001 (1):
  Dunnett:    T_{S,1} ≥ 2.349 or T_{S,2} ≥ 2.349 or T_{S,3} ≥ 2.349
  Bonferroni: T_{S,1} ≥ 2.394 or T_{S,2} ≥ 2.394 or T_{S,3} ≥ 2.394

H001001010 (3):
  Dunnett:    T_{S,1} ≥ 2.356 or T_{S,2} ≥ 2.356 or T^s_{P,3} ≥ 2.394
  Bonferroni: T_{S,1} ≥ 2.394 or T_{S,2} ≥ 2.394 or T^s_{P,3} ≥ 2.394

H001001000 (3):
  Dunnett:    T_{S,1} ≥ 2.212 or T_{S,2} ≥ 2.212
  Bonferroni: T_{S,1} ≥ 2.241 or T_{S,2} ≥ 2.241

H001010010 (3):
  Dunnett:    T_{S,1} ≥ 2.381 or T^s_{P,2} ≥ 2.363 or T^s_{P,3} ≥ 2.363
  Bonferroni: T_{S,1} ≥ 2.394 or T^s_{P,2} ≥ 2.394 or T^s_{P,3} ≥ 2.394

H001010000 (6):
  Dunnett:    T_{S,1} ≥ 2.24 or T^s_{P,2} ≥ 2.223
  Bonferroni: T_{S,1} ≥ 2.241 or T^s_{P,2} ≥ 2.241

H001000000 (3):
  Dunnett:    T_{S,1} ≥ 1.960
  Bonferroni: T_{S,1} ≥ 1.960

H010010010 (1):
  Dunnett:    T^s_{P,1} ≥ 2.349 or T^s_{P,2} ≥ 2.349 or T^s_{P,3} ≥ 2.349
  Bonferroni: T^s_{P,1} ≥ 2.394 or T^s_{P,2} ≥ 2.394 or T^s_{P,3} ≥ 2.394

H010010000 (3):
  Dunnett:    T^s_{P,1} ≥ 2.212 or T^s_{P,2} ≥ 2.212
  Bonferroni: T^s_{P,1} ≥ 2.241 or T^s_{P,2} ≥ 2.241

H010000000 (3):
  Dunnett:    T^s_{P,1} ≥ 1.960
  Bonferroni: T^s_{P,1} ≥ 1.960

J. Lawrence, Pharmaceut. Statist. 2011, 10, 318–324. Copyright © 2010 John Wiley & Sons, Ltd.
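The table is applied through the closed testing principle: an individual hypothesis is rejected only if every intersection hypothesis containing it is rejected by its local test. A generic sketch of that logic, with a hypothetical `local_test` standing in for the tabulated rejection regions (the Bonferroni toy test and the p-values below are illustrative only, not values from the paper):

```python
from itertools import product

def closed_test(local_test, n=9):
    """Closure principle: elementary hypothesis j is rejected iff every
    non-empty intersection hypothesis whose j-th position is 1 is
    rejected by its local (intersection-level) test."""
    intersections = ["".join(b) for b in product("01", repeat=n) if "1" in b]
    rejected = {h for h in intersections if local_test(h)}
    return [all(h in rejected for h in intersections if h[j] == "1")
            for j in range(n)]

# Toy local test: Bonferroni on hypothetical one-sided p-values --
# reject an intersection if the smallest p-value among its components
# is at most 0.025 divided by the number of components.
pvals = [0.001, 0.2, 0.3, 0.004, 0.5, 0.6, 0.7, 0.8, 0.9]
bonf = lambda h: (min(pvals[j] for j in range(9) if h[j] == "1")
                  <= 0.025 / h.count("1"))
print(closed_test(bonf))  # only the 1st hypothesis survives full closure
```

Note that the 4th hypothesis (p = 0.004) is rejected by its singleton test but not by closure, because the 8-way intersection excluding the 1st hypothesis has Bonferroni threshold 0.025/8 < 0.004.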

APPENDIX C

This appendix describes an α-exhaustive Dunnett-based procedure for the general case of m treatment arms. First, order the 3m hypotheses as in Appendix B: H1,1, H2,1, H3,1, H1,2, H2,2, H3,2, ..., H1,m, H2,m, H3,m. For each intersection hypothesis, there is a binary sequence of length 3 for each family of three hypotheses corresponding to treatment arm i: H1,i, H2,i, H3,i. Let b_i denote this binary sequence for family i. Take a specific intersection hypothesis. The algorithm proceeds recursively. If any b_i = 000, then we can ignore that treatment arm and use the test that we would use if there were m−1 treatment arms for this intersection hypothesis. So, we can assume that no b_i = 000. Suppose the number of b_i that begin with 1 is a1, the number of b_i = 011 is a2, and the number of b_i = 001 is a3, so that the number of b_i = 010 is m − a1 − a2 − a3. For each intersection hypothesis, the rejection region will have the form

  ∪_{b_i=1··} {T^ni_{P,i} ≥ c1} ∪ ∪_{b_i=011} {T^s_{P,i} ≥ c2 or T_{S,i} ≥ c2} ∪ ∪_{b_i=001} {T_{S,i} ≥ c3} ∪ ∪_{b_i=010} {T^s_{P,i} ≥ c4}
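As a concreteness check, the union above can be evaluated directly once the constants are known. A minimal sketch (function and variable names are illustrative; the constants are plausible values in the range of the tabulated Dunnett-based entries, not a complete specification):

```python
def rejected(b, stats, c1, c2, c3, c4):
    """Evaluate the Appendix C rejection region for one intersection
    hypothesis.  b[i] classifies arm i ('1..' = begins with 1, '011',
    '001', or '010'); stats[i] = (T_ni_P, T_s_P, T_S) for arm i."""
    for bi, (t_ni, t_sp, t_s) in zip(b, stats):
        if bi.startswith("1") and t_ni >= c1:          # b_i = 1..
            return True
        if bi == "011" and (t_sp >= c2 or t_s >= c2):  # b_i = 011
            return True
        if bi == "001" and t_s >= c3:                  # b_i = 001
            return True
        if bi == "010" and t_sp >= c4:                 # b_i = 010
            return True
    return False

# Arm 2 clears its superiority threshold, so the intersection is rejected.
print(rejected(["1..", "011", "001"],
               [(1.0, 0.5, 0.2), (2.1, 2.7, 1.0), (0.3, 0.1, 0.9)],
               c1=2.349, c2=2.592, c3=2.387, c4=2.358))  # True
```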


where the constants may be different for different intersection hypotheses. There are four cases to consider; each successive case assumes the previous cases do not apply:

(i) If a1 = m, then the rejection region is ∪_{i=1}^{m} {T^ni_{P,i} ≥ c1}, where c1 is chosen to make this an α-level test under the null hypothesis ∩_{i=1}^{m} H1,i.
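For orientation, the constant c1 in case (i) is a one-sided Dunnett-type critical value. A Monte Carlo sketch for m = 3 and α = 0.025, assuming known-variance z-statistics with equal sample sizes, so the statistics share a common control term and have pairwise correlation 1/2 (the simulation size and seed are illustrative):

```python
import math
import random

random.seed(12345)
n_sim = 200_000
max_z = []
for _ in range(n_sim):
    u0 = random.gauss(0.0, 1.0)                           # control-arm noise
    zs = [(random.gauss(0.0, 1.0) - u0) / math.sqrt(2.0)  # arm i vs control
          for _ in range(3)]
    max_z.append(max(zs))
max_z.sort()
c1 = max_z[int(0.975 * n_sim)]  # P(max_i T_i >= c1) = 0.025 under the global null
print(round(c1, 2))             # close to the tabulated 2.349
```

The Bonferroni analogue in the table, 2.394, is simply the normal quantile at 0.025/3 and needs no simulation.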

(ii) If a1 + a2 + a3 < m, then momentarily replace the b_i for one of the families with b_i = 010 by the sequence 011. Find the constants for this new intersection hypothesis and call them c1′, c2′, c3′, c4′. Now, return to the original intersection hypothesis and define the rejection region by

  ∪_{b_i=1··} {T^ni_{P,i} ≥ c1′} ∪ ∪_{b_i=011} {T^s_{P,i} ≥ c2′ or T_{S,i} ≥ c2′} ∪ ∪_{b_i=001} {T_{S,i} ≥ c3′} ∪ ∪_{b_i=010} {T^s_{P,i} ≥ c4}

where c4 is chosen to make this an α-level test under the null hypothesis ∩_{b_i=1··} H1,i ∩ ∩_{b_i=01·} H2,i ∩ ∩_{b_i=0·1} H3,i.

(iii) If a3 > 0, then momentarily replace the b_i for one of the families with b_i = 001 by the sequence 011. Find the constants for this new intersection hypothesis and call them c1′, c2′, c3′, c4′. Now, return to the original intersection hypothesis and define the rejection region by

  ∪_{b_i=1··} {T^ni_{P,i} ≥ c1′} ∪ ∪_{b_i=011} {T^s_{P,i} ≥ c2′ or T_{S,i} ≥ c2′} ∪ ∪_{b_i=001} {T_{S,i} ≥ c3} ∪ ∪_{b_i=010} {T^s_{P,i} ≥ c4′}

where c3 is chosen to make this an α-level test under the null hypothesis ∩_{b_i=1··} H1,i ∩ ∩_{b_i=01·} H2,i ∩ ∩_{b_i=0·1} H3,i.

(iv) If a2 > 0, then momentarily replace the b_i for one of the families with b_i = 011 by the sequence 111. Find the constants for this new intersection hypothesis and call them c1′, c2′, c3′, c4′. Now, return to the original intersection hypothesis and define the rejection region by

  ∪_{b_i=1··} {T^ni_{P,i} ≥ c1′} ∪ ∪_{b_i=011} {T^s_{P,i} ≥ c2 or T_{S,i} ≥ c2} ∪ ∪_{b_i=001} {T_{S,i} ≥ c3′} ∪ ∪_{b_i=010} {T^s_{P,i} ≥ c4′}

where c2 is chosen to make this an α-level test under the null hypothesis ∩_{b_i=1··} H1,i ∩ ∩_{b_i=01·} H2,i ∩ ∩_{b_i=0·1} H3,i.
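The case analysis above can be organized as a small recursion. The sketch below only traces which constant is solved at each level of the recursion; actually solving for a constant requires the joint null distribution of the test statistics, which is outside the scope of this sketch (the function names and the '1..' placeholder for "any sequence beginning with 1" are illustrative):

```python
def replace_first(fams, old, new):
    """Momentarily upgrade one family's sequence, as in cases (ii)-(iv)."""
    i = fams.index(old)
    return fams[:i] + [new] + fams[i + 1:]

def trace(families, steps=None):
    """families: one sequence b_i per arm, each of '1..' (begins with 1),
    '011', '001', '010', or '000'.  Returns the order in which the
    constants c1..c4 are pinned down by the recursion."""
    if steps is None:
        steps = []
    fams = [b for b in families if b != "000"]  # ignore arms with b_i = 000
    if all(b.startswith("1") for b in fams):
        steps.append("c1")                      # case (i): base case
    elif "010" in fams:                         # case (ii): one 010 -> 011
        trace(replace_first(fams, "010", "011"), steps)
        steps.append("c4")
    elif "001" in fams:                         # case (iii): one 001 -> 011
        trace(replace_first(fams, "001", "011"), steps)
        steps.append("c3")
    else:                                       # case (iv): one 011 -> 111
        trace(replace_first(fams, "011", "1.."), steps)
        steps.append("c2")
    return steps

print(trace(["1..", "011", "001"]))  # → ['c1', 'c2', 'c2', 'c3']
```

Each recursive call resolves exactly one constant for its intersection hypothesis and inherits the primed constants from the upgraded hypothesis one level deeper, terminating at the Dunnett base case (i).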
