ANALYZING MULTIPLE ENDPOINTS WITH A TWO-STAGE GROUP SEQUENTIAL DESIGN

IN CLINICAL TRIALS

by

Claudine Legault

Department of Biostatistics, University of North Carolina at Chapel Hill, NC.

Institute of Statistics Mimeo Series No. 1889T

September 1991

A dissertation submitted to the faculty of the University of North Carolina at Chapel

Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in

the Department of Biostatistics.

Chapel Hill

1991

ABSTRACT

CLAUDINE LEGAULT. Analyzing Multiple Endpoints with a two-stage group

sequential design in clinical trials. (Under the direction of Timothy M. Morgan.)

In many clinical trials, the assessment of the response to the various treatments

can include a large variety of outcome variables which are generally correlated.

Different endpoints may be regarded by the investigators as important in determining if

a certain treatment is effective. The more variables there are, the more likely it is that

differences will appear at random if adjustments are not made for the multiple tests.

Bonferroni's adjustment for multiple comparisons is one of the approaches used

when multiple correlated outcomes are being compared. For alternative hypotheses in

which several endpoints are affected in the same direction, Bonferroni's procedure may

lack power because the rejection of the overall hypothesis is based on the smallest

p-value of all the test statistics. Hotelling's $T^2$ makes no distinction between variables

that change favorably and variables that change unfavorably. It lacks power to detect

any specific types of departure considered a priori to be biologically plausible in a clinical

trial and was therefore considered unsuitable by Pocock (1987) for the analysis of clinical

trials. A test proposed by O'Brien (1984) focusses on alternative hypotheses with all

endpoints showing an effect in the same direction. In that situation it provides better

power but it deteriorates sharply to a power of only 5% when variables are affected in

opposite directions.

This dissertation first compares the critical regions and the power contours of

the three procedures mentioned above. The efficiency and robustness of these

procedures are compared as a function of the direction of the alternative hypotheses.

A new test is first derived using data from the interim look in a two-stage group

sequential design to form the rejection boundary at the second stage. Initially, the test

uses a Hotelling's $T^2$ rejection region at the end of the first stage and an O'Brien 'type'

procedure at the end of the second stage. The test is then extended to allow for early

acceptance. The distribution of the proposed tests is presented and their power and

efficiency are compared to common procedures. Finally, two examples are presented

and future research recommendations are discussed.

ACKNOWLEDGMENTS

First and foremost, I would like to thank Dr Tim Morgan for his support. His

guidance, his availability, his patience and his sincere care have been constant and

precious throughout this work.

I would also like to thank Dr P. K. Sen, the chairman of my committee and my

academic advisor, for his judicious advice. Appreciation is also extended to the other

members of my committee, Professors C. Ed Davis, Paul Stewart and Gerardo Heiss for

their constructive comments.

I warmly thank my sisters Marie-Andree and Monique, my brother Benoit and

their families for encouraging me in the pursuit of my doctoral degree. Their presence

on the day of my final oral examination means a lot to me.

I have made dear friends during my stay in Chapel Hill and I want to thank

them all, as well as my friends from Montreal, for their support and friendship. The

most heartfelt and enduring gratitude is expressed to Susan Lewis who enthusiastically

and cheerfully supported me in many ways.

Finally, I am indebted to the 'Fonds de la Recherche en Sante du Quebec' for

their financial support.

TABLE OF CONTENTS

List of Tables

List of Figures

Chapter 1: Introduction and Literature Review

1.1 Introduction

1.2 Literature Review

1.2.1 Multiple endpoint procedures

1.2.1.1 Bonferroni's Inequality

1.2.1.2 Hotelling's $T^2$

1.2.1.3 O'Brien's Test

1.2.1.3.1 Nonparametric procedure

1.2.1.3.2 GLS Parametric procedure

1.2.2 Sequential designs

1.3 Proposed research

Chapter 2: Evaluation and Comparison of Three Common Multivariate Testing Procedures

2.1 Introduction

2.2 Bonferroni's Inequality

2.3 Hotelling's $T^2$

2.4 O'Brien's Test

2.5 Comparison of the three procedures

2.6 Evaluation with correlated data

Chapter 3: Two-Stage Group Sequential Test with Multiple Endpoints

3.1 Introduction

3.2 The new test statistic L

3.2.1 Density of L under $H_0$

3.2.2 Power of L

3.2.2.1 Special case: p =~ = 0.5 and 1J2 = O

3.2.2.2 General case: p = ~1 and IJ* = {Ii 1J1

3.2.2.3 Distribution of L, allowing for early stopping

3.3 Transformation for correlated data

3.4 Using the new test L with more than two endpoints

Chapter 4: Two Examples

4.1 Reduction in incidence of coronary heart disease

4.2 Oral contraceptives and coronary atherosclerosis of cynomolgus monkeys

Chapter 5: Summary and Suggestions for Future Research

References

LIST OF TABLES

Table 2.5.1: Power of O'Brien's test, for different alternative hypotheses when the power of Hotelling's $T^2$ is 80% and the number of endpoints p = 2, 3, 4, 5 and 10

Table 3.2.1.1: Critical values $l_c$ and power for L, $\alpha$ = 0.05

Table 3.3.1: Critical values $l_c$ and power for L, when allowing for early stopping, $\alpha$ = 0.05

Table 3.3.2: Power of L, when allowing for early stopping, for alternative hypotheses for which the power of Hotelling's $T^2$ is 80%

Table 3.3.3: Power of L, when allowing for early stopping, for alternative hypotheses for which the power of Hotelling's $T^2$ is 50%

Table 4.1.1: Mean cholesterol and triglycerides levels, all subjects

Table 4.1.2: Mean cholesterol and triglycerides levels, first accrual period

Table 4.1.3: Mean cholesterol and triglycerides levels, second accrual period

Table 4.1.4: Drug effects and their corresponding T statistics, for each accrual period

Table 4.1.5: Test results at the end of the LRC-CPPT

Table 4.2.1: Mean LTHDL and LMIA, all subjects

Table 4.2.2: Mean LTHDL and LMIA, for each accrual period

Table 4.2.3: Test results at the end of the cynomolgus monkeys trial

LIST OF FIGURES

Figure 2.2.1: Critical regions: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.2.2: Contours for powers of 50%, 80% and 90%: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.4.1: New axes to determine the power of O'Brien's test

Figure 2.5.1: Regions in which each test has greater power than the other two tests: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.5.2: One-degree sections for 80% power contour of Hotelling's test

Figure 2.5.2.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.5.2.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.5.2.3: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.1.1: Critical regions, rho = -0.5: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.1.2: Critical regions, rho = -0.9: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.1.3: Critical regions, rho = 0.5: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.1.4: Critical regions, rho = 0.9: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.2.1: Contours for powers of 50%, 80% and 90%, rho = -0.5: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.2.2: Contours for powers of 50%, 80% and 90%, rho = -0.9: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.2.3: Contours for powers of 50%, 80% and 90%, rho = 0.5: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.2.4: Contours for powers of 50%, 80% and 90%, rho = 0.9: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.3.1.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = -0.5

Figure 2.6.3.1.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = -0.5: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.3.3.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = 0.5: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 2.6.3.4.3: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = 0.9: Bonferroni (B) Hotelling (H) O'Brien (O)

Figure 3.2.1.1: Density of L, p = 0.1

Figure 3.2.1.2:

Figure 3.2.1.3:

Figure 3.2.3.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%: Bonferroni (B) Hotelling (H) O'Brien (O) New test (L)

Figure 3.2.3.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%: Bonferroni (B) Hotelling (H) O'Brien (O) New test (L)

Figure 4.1.1: Cholestyramine trial: L statistic = OL, O'Brien's statistic = OB

Figure 4.2.1: Oral contraceptives trial: L statistic = OL, O'Brien's statistic = OB

CHAPTER 1

INTRODUCTION AND LITERATURE REVIEW

1.1 Introduction

A study often involves a great number of response variables, each of them

reflecting one aspect of the overall question of interest. Different endpoints may be

regarded by the investigators as important in determining if a certain treatment is

effective. The more variables there are, the more likely it is that differences will appear

at random if adjustments are not made for the multiple tests; however, investigators do

not want to ignore any potentially important endpoint. In many clinical trials, the

assessment of the response to the various treatments can include a large variety of

outcome variables. For example, Pocock et al (1987) discussed a chronic respiratory

disease crossover trial studying the effect of the addition of an inhaled drug to each

patient's normal treatment on respiratory function. Three standard respiratory function

measures were taken: the peak expiratory flow rate (PEFR), the forced expiratory

volume (FEV1) and the forced vital capacity (FVC). O'Brien (1984) examined a trial

comparing two therapies for the treatment of diabetes. The improvement of the nerve

function was measured by 34 electromyographic variables. Smith et al (1987) reported

on 67 trials from the 1982 issues of four medical journals: Lancet, New England Journal

of Medicine, Journal of the American Medical Association and the British Medical

Journal. Of the 67 trials, 66 contained more than one therapeutic comparison. The

main source of these multiple therapeutic comparisons was multiple outcomes with a

mean of 21.7 different analyzed outcomes per trial. Only two of these trials contained

any statistical adjustments for the multiple therapeutic comparisons. None of the trials

in the study used methods developed for analyzing interrelated outcomes.

One common approach to the analysis of an overall question is to consider the

outcomes simultaneously and to present multiple p-values from univariate tests on each

of the specified endpoints. This presents some problems. First, the use of multiple

significance tests is likely to increase the chance of detecting a difference in at least one

of the outcomes between two treatments. This difference may, in fact, appear more

important than it really is, the probability of a significant result increasing with the

number of tests being performed, when the null hypothesis of no effect is true. A

correction is often imposed on the level of significance $\alpha$. Second, the endpoints may be

correlated and separate univariate tests do not, by themselves, take into account the

correlation structure. Conclusions based on such analyses can only be looked at with

reservation. Sometimes a single primary endpoint is first specified and analyzed at the

prespecified level $\alpha$. Secondary endpoints are then defined and multiple significance test

results are reported to the reader who may look at them as exploratory results or

modify the $\alpha$ level according to the number of tests performed. The choice of a primary

endpoint may be arduous; the distinction between the primary and some of the

secondary endpoints may not be obvious. Finally, if the endpoints are not all affected in

the same direction, the interpretation of the results may present some difficulties in

deciding if there is an appreciable difference. For example, if some of the variables

measuring the nerve function showed a significant improvement under one therapy while

the remaining variables demonstrated a deterioration under the same therapy, the

overall evaluation of the effect would certainly cause problems to the researchers.

Common procedures for making multiple comparisons of correlated outcomes,

multivariate tests combining several endpoints, and procedures for comparing outcomes

at multiple time points will be reviewed subsequently. A brief description of group

sequential designs will also be given. This dissertation will consider new methods that

use information obtained from interim analyses in group sequential designs in making

treatment comparisons on multiple endpoints.

1.2 Literature Review

1.2.1 Multiple endpoint procedures

Several testing procedures have been proposed in the literature for performing

multiple hypothesis tests, including those proposed by Tukey (1951), Duncan (1951),

Scheffé (1953), Dunn (1959) and Roy & Bose (1953). General references on

simultaneous statistical inference are Miller (1981), Anderson (1958), Press (1972),

Morrison (1976) and Hochberg & Tamhane (1987). Recently, Berry (1988) and Breslow

(1990) presented some Bayesian approaches to the problems of multiplicity. This review

will focus on three procedures: Bonferroni's inequality (1936) which is a commonly used

procedure for multiple hypothesis tests, Hotelling's $T^2$ (1931) which is a standard

classical test for the comparison of multivariate samples, and a test recently adapted to

clinical trials by O'Brien (1984). Descriptions and comparisons of these methods will be

presented in the following sections.

1.2.1.1 Bonferroni's Inequality

Let $P_1, \ldots, P_n$ be a set of p-values to test hypotheses $H_1, \ldots, H_n$. The Bonferroni procedure will lead to the rejection of $H_0 = \{H_1, \ldots, H_n\}$ if any p-value is less than or equal to $\alpha/n$, where $\alpha$ is the overall level of significance. Furthermore, each hypothesis $H_i$ ($i = 1, \ldots, n$) will be individually rejected if $P_i \le \alpha/n$. The Bonferroni inequality,

$$\Pr\left\{\bigcup_{i=1}^{n} (P_i \le \alpha/n)\right\} \le \alpha \qquad (0 \le \alpha \le 1) \tag{1.2.1}$$

ensures that the probability of rejecting at least one hypothesis when all are true is no greater than $\alpha$ (Simes, 1986). If the $n$ endpoints are independent,

$$\begin{aligned}
\Pr(\text{smallest p-value} \le \alpha/n) &= \Pr(\text{rejecting at least one hypothesis}) \\
&= 1 - \Pr(\text{not rejecting any hypothesis}) \\
&= 1 - (1 - \alpha/n)^n \\
&< 1 - (1 - n \cdot \alpha/n) \\
&= \alpha.
\end{aligned}$$

For alternative hypotheses in which several endpoints are affected in the same

direction, Bonferroni's procedure may lack power because the rejection of the overall

hypothesis is based on the smallest p-value of the $n$ test statistics.

In practice, endpoints are generally correlated. Pocock et al. (1987) have demonstrated that Bonferroni's correction works reasonably well for moderately correlated, normally distributed endpoints with known variance and the same correlation $\rho$ for all possible pairs within each of two compared groups. The conservatism of Bonferroni's approach increases as $\rho$ increases, but there is no noticeable deterioration in Bonferroni's correction as the number of correlated endpoints increases. With five endpoints, $\alpha = 0.05$ and $\rho = 0.5$, they have shown that the value $\alpha'$, which the smallest of 5 one-sided p-values obtained from the normal test statistics will reach with probability $\alpha$ under the null hypothesis, is 0.0128 compared with $\alpha/n = 0.01$. However, multiple endpoints are not usually equicorrelated and normally distributed. Pocock et al. suggest that similar findings should occur for any continuous asymptotically normal test statistics and that if most pairwise correlations are less than 0.5, serious conservatism should not occur.

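Sidak (1967) proposed a modification of Bonferroni's inequality. Instead of testing each hypothesis at $\alpha_j = \alpha/n$, he recommended using a level of significance $\alpha_j = 1 - (1 - \alpha)^{1/n}$. Similarly to Bonferroni's approach, Sidak showed that, for $n$ independent endpoints:

$$\begin{aligned}
\Pr\left(\text{smallest p-value} \le 1 - (1 - \alpha)^{1/n}\right) &= 1 - \left\{1 - \left[1 - (1 - \alpha)^{1/n}\right]\right\}^n \\
&= 1 - \left\{(1 - \alpha)^{1/n}\right\}^n \\
&= 1 - (1 - \alpha) \\
&= \alpha.
\end{aligned}$$

Since $1 - (1 - \alpha)^{1/n} \ge \alpha/n$, the per-test level is never smaller than Bonferroni's, while the overall level is still $\alpha$. For $n < 10$ and $\alpha = 0.05$, Sidak's multiplicative inequality only leads to a slightly more powerful test than Bonferroni's.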
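The difference between the two per-test levels can be checked numerically. The following is a minimal sketch, assuming independent tests; the function names are illustrative and do not come from the dissertation.

```python
def bonferroni_level(alpha, n):
    """Per-test significance level under Bonferroni's correction."""
    return alpha / n


def sidak_level(alpha, n):
    """Per-test significance level under Sidak's correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)


def overall_level_independent(per_test_level, n):
    """Probability of at least one rejection among n independent true nulls."""
    return 1.0 - (1.0 - per_test_level) ** n


if __name__ == "__main__":
    alpha = 0.05
    for n in (2, 5, 9):
        b = bonferroni_level(alpha, n)
        s = sidak_level(alpha, n)
        # Columns: n, Bonferroni level, Sidak level, overall level of each.
        print(n, b, s,
              overall_level_independent(b, n),
              overall_level_independent(s, n))
```

Under these assumptions the Sidak level is only marginally larger than $\alpha/n$, which is why the gain in power over Bonferroni's procedure is slight.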

Holm (1979) presented a sequentially rejective Bonferroni test where tests can be

conducted at successively higher significance levels. It is as simple to compute as the

classical Bonferroni test and has a strictly larger probability of rejecting each hypothesis

individually. However, the probability of rejecting $H_0 = \{H_1, \ldots, H_n\}$ is the same for

both tests.

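Let $Y_1, Y_2, \ldots, Y_n$ be some test statistics, and let $P_k(Y_k)$ be the p-value for the outcome of the test statistic $Y_k$, $k = 1, \ldots, n$. Let $R_1 \le R_2 \le \cdots \le R_n$ denote these p-values in increasing order, with $H_1, \ldots, H_n$ the correspondingly ordered hypotheses. The test is:

Step 1. Is $R_1 \le \alpha/n$? If no, accept $H_1, \ldots, H_n$ and stop. If yes, reject $H_1$ and go to the next step.

Step 2. Is $R_2 \le \alpha/(n-1)$? If no, accept $H_2, \ldots, H_n$ and stop. If yes, reject $H_2$ and go to the next step.

...

Step n. Is $R_n \le \alpha/1$? If no, accept $H_n$ and stop. If yes, reject $H_n$ and stop.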
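The step-down scheme above can be sketched in a few lines of code. This is an illustrative implementation only, assuming the inputs are the unordered p-values of the $n$ tests; the function name `holm` is hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Sequentially rejective Bonferroni test (step-down).

    Returns a list of booleans, in the original order of p_values,
    where True means the corresponding hypothesis is rejected.
    """
    n = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    rejected = [False] * n
    for step, idx in enumerate(order):
        # Step 1 compares with alpha/n, step 2 with alpha/(n-1), and so on.
        if p_values[idx] <= alpha / (n - step):
            rejected[idx] = True
        else:
            break  # accept this and all remaining hypotheses
    return rejected


if __name__ == "__main__":
    print(holm([0.01, 0.02, 0.03]))
    print(holm([0.001, 0.04, 0.03, 0.5]))
```

With p-values 0.01, 0.02 and 0.03 and $\alpha = 0.05$, all three hypotheses are rejected, whereas the classical Bonferroni test would reject only the first, since only 0.01 is below $0.05/3$.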

The power gain obtained by using a sequentially rejective Bonferroni test instead

of a classical Bonferroni test depends very much upon the alternative hypothesis. It is

small if all the hypotheses are 'almost true', but it may be considerable if a number of

hypotheses are 'completely wrong'. If m of the n basic hypotheses are 'completely

wrong', the corresponding levels attain small values, and these hypotheses are rejected in

the first m steps with high probability. The other levels are then compared to $\alpha/k$ for

$k = n-m, n-m-1, n-m-2, \ldots, 2, 1$, which is equivalent to performing a sequentially

rejective Bonferroni test only on those hypotheses that are not 'completely wrong'.

A great advantage with the sequentially rejective Bonferroni test (as well as the

classical Bonferroni test) is that there are no restrictions on the type of tests, the only

requirement being that it should be possible to calculate the obtained level for each

separate test. Furthermore, when the test statistics are independent, the comparison constants $\alpha/n, \alpha/(n-1), \ldots, \alpha/1$ can be replaced by $1-(1-\alpha)^{1/n}, 1-(1-\alpha)^{1/(n-1)}, \ldots, 1-(1-\alpha)^{1}$, which are at least as large. This means that the test is more powerful, but the increase in power is not very big.

It may happen that some hypotheses are more important than others, which

may imply the use of higher levels of significance for the most important hypotheses and

smaller levels of significance for the less important hypotheses when the Bonferroni

technique is applied. The sequentially rejective Bonferroni test can be adapted for this

situation. At each step in the procedure the obtained levels for the not yet rejected

hypotheses are compared to portions of $\alpha$ that are proportional to the corresponding

constants.

Hommel (1983) introduced yet another level $\alpha$ test less conservative than Bonferroni's, based on Rüger's inequality (1978). Let $P_{(k)}$ be the $k$th smallest of $n$ p-values, $2 \le k \le n$. Reject $H_0$ if $P_{(k)} \le k\alpha/n$. Here $k$ has to be determined before performing the $n$ tests.

To avoid choosing $k$ in advance, one can use the following level $\alpha$ test also proposed by Hommel (1983): reject $H_0$ if $P_{(k)} \le k\alpha/(nC_n)$ for at least one $k$, $1 \le k \le n$, where $C_n = \sum_{i=1}^{n} 1/i$. Simes (1986) introduced a modification of Hommel's procedure that will be described later.

Shaffer (1986) modified Holm's sequentially rejective procedure to obtain a further increase in power when there are logical implications among the hypotheses and alternatives so that not all combinations of true and false hypotheses are possible. Given that $j-1$ hypotheses have been rejected, instead of using $\alpha/(n-j+1)$ for the next test as in Holm's procedure, the denominator can be set at $t_j$, where $t_j$ equals the maximum number of hypotheses that could be true, given that at least $j-1$ hypotheses are false. Obviously, $t_j$ is never greater than $n-j+1$, and for some values of $j$ it may be strictly smaller. This modified sequentially rejective Bonferroni (MSRB) procedure will never be less powerful than the sequentially rejective procedure while maintaining an experimentwise significance level $\le \alpha$.

Often the $n$ hypotheses are not tested separately unless a more comprehensive hypothesis has initially been rejected at significance level $\alpha$, where such rejection implies that at least some number $r$ of the $n$ hypotheses are false, $r = 1, 2, \ldots, n-1$. A further improvement in the MSRB is then possible; the critical values $\alpha/t_j$ can be replaced by $\alpha/t_{n-r}$ without increasing the overall significance level above $\alpha$.

Another modification of the MSRB procedure takes into account the particular hypotheses rejected. The power of the MSRB procedure can be increased, at the cost of greater complexity, by substituting for $\alpha/t_j$ at stage $j$ the value $\alpha/t_j^*$, where $t_j^*$ is the maximum number of hypotheses that could be true, given that the specific ordered hypotheses $H_1, H_2, \ldots, H_{j-1}$ are false. This procedure has an experimentwise significance level $\le \alpha$. Shaffer's modifications are independent of the particular test statistic used, except for the knowledge of their respective marginal distributions.

Worsley (1982) presented an improved Bonferroni inequality which gives an

upper bound for the probability of the union of an arbitrary sequence of events. It is

constructed in terms of the joint probability of pairs of events, which are represented by

edges on a graph. His procedure represents an improvement over the Bonferroni, Sidak

and Holm approaches, but it requires knowledge of the joint probabilities of pairs of

events which is not always easily available.

Armitage and Parmar (1986) developed a sequential method to investigate the p-values of test statistics which follow a multivariate normal distribution: the 'peeling' procedure. For $k$ ordered p-values, the $i$th Bonferroni-adjusted p-value is

$$\text{adj}\,P_{(i)} = 1 - \left(1 - P_{(i)}\right)^{k-(i-1)}, \qquad i = 1, \ldots, d, \tag{1.2.2}$$

where d is the first adjusted value to be judged non-significant. We should expect d to

be small even for large k. However, any Bonferroni-type approach is too conservative

when tests are correlated. Thus, they proposed an adjusted correction which allows for

correlations:

(1.2.3)

where $0 \le x \le 1$. For $k$ independent tests $x = 1$ and for fully correlated tests $x = 0$. For

the general case, x is defined as a function of the correlation structure.

The maximum relative error in using adj $P_{(1)}$ as an adjusted Bonferroni correction for 5 correlated tests, when $0.001 \le P_{(1)} \le 0.05$, was calculated assuming multivariate normality for the test statistics, using Schervish's algorithm for the multivariate normal integral. Thirty-three different correlation structures and p-values were considered. The maximum relative error was 8%. The authors have found that adj $P_{(1)}$ also gives very good results for $k$ = 2, 3 and 4 dimensions.
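As an illustration of the peeling idea in its uncorrelated form, the sketch below computes adjusted p-values of the form $1 - (1 - P_{(i)})^{k-(i-1)}$ and stops at the first one judged non-significant. It assumes independent tests, takes "non-significant" to mean an adjusted value above $\alpha$, and does not implement the correlation-adjusted version; the function name is hypothetical.

```python
def peeling_adjusted_p(p_values, alpha=0.05):
    """Adjusted p-values for the ordered tests, up to and including
    the first adjusted value that exceeds alpha (index d in the text)."""
    k = len(p_values)
    ordered = sorted(p_values)
    adjusted = []
    for i, p in enumerate(ordered, start=1):
        # k - (i - 1) tests are still in play once i - 1 have been peeled off.
        adj = 1.0 - (1.0 - p) ** (k - (i - 1))
        adjusted.append(adj)
        if adj > alpha:
            break  # this is the d-th value; stop peeling
    return adjusted


if __name__ == "__main__":
    print(peeling_adjusted_p([0.001, 0.2, 0.01]))
```

In this example the first two adjusted values fall below 0.05 and the third does not, so $d = 3$.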

Simes (1986) presented a generalization of the Bonferroni procedure which has

an actual significance level closer to the nominal level in a wide range of circumstances

and which has a lower type II error rate for a given nominal significance level than the

classical procedure. It is a modification that is less conservative than Hommel's procedure because it omits the constant $C_n$. His procedure uses different critical values for each p-value. For $n$ ordered p-values $P_{(1)} \le \cdots \le P_{(n)}$ testing hypotheses $H_{(1)}, \ldots, H_{(n)}$, one rejects the overall null hypothesis $H_0 = \{H_1, \ldots, H_n\}$ if $P_{(k)} \le k\alpha/n$ for any $k = 1, \ldots, n$. This procedure has type I error probability equal to $\alpha$ for independent tests. Simes simulated a multivariate normal distribution with unit variances and common correlation coefficient $\rho$ as well as chi-squared tests to estimate

the type I error rate of the classical and generalized test procedures. The classical

Bonferroni procedure has similar type I error rate for independent tests but is more

conservative than the generalized test for highly correlated outcomes. The simulation

study of test statistics under various alternative hypotheses showed that the

improvement in power is appreciable when several of the alternative hypotheses are

correct. One disadvantage seems to be a slight increase in computation. Finally, the

modified Bonferroni test procedure does not allow specific alternative hypotheses to be

identified; statements about individual hypotheses should be considered exploratory.
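The three global tests discussed so far differ only in their critical values, which the following sketch makes explicit. It is a minimal illustration with hypothetical function names; each function returns True when the overall null hypothesis is rejected.

```python
def bonferroni_global(p_values, alpha=0.05):
    """Classical Bonferroni: reject if the smallest p-value is <= alpha/n."""
    return min(p_values) <= alpha / len(p_values)


def simes_global(p_values, alpha=0.05):
    """Simes (1986): reject if P_(k) <= k*alpha/n for any k."""
    n = len(p_values)
    ordered = sorted(p_values)
    return any(p <= k * alpha / n for k, p in enumerate(ordered, start=1))


def hommel1983_global(p_values, alpha=0.05):
    """Hommel (1983): reject if P_(k) <= k*alpha/(n*C_n) for any k,
    where C_n = 1 + 1/2 + ... + 1/n."""
    n = len(p_values)
    c_n = sum(1.0 / i for i in range(1, n + 1))
    ordered = sorted(p_values)
    return any(p <= k * alpha / (n * c_n)
               for k, p in enumerate(ordered, start=1))


if __name__ == "__main__":
    p = [0.02, 0.03, 0.04]
    print(bonferroni_global(p), simes_global(p), hommel1983_global(p))
```

For the p-values 0.02, 0.03 and 0.04, only the Simes test rejects at $\alpha = 0.05$: the second smallest p-value is below $2\alpha/3$, while no p-value is below $\alpha/3$ or below Hommel's smaller critical values.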

Hommel (1988) extended Simes' procedure to make inferences on individual hypotheses. Let $J = \{i' \in \{1, \ldots, n\}: P_{(n-i'+k)} > k\alpha/i';\ k = 1, \ldots, i'\}$, where the p-values are ordered. If $J$ is nonempty, reject $H_{(i)}$ whenever $P_{(i)} \le \alpha/j'$, with $j' = \max\{i' \in J\}$. If $J$ is empty, reject all $H_i$ ($i = 1, \ldots, n$).

At the same time, Hochberg (1988) gave a simple sequential way of making

inferences on individual hypotheses that is able to reject at least one individual

hypothesis when the global null hypothesis is rejected. Using the ordered p-values,

reject $H_{(i)}$ if there exists a $j$ ($1 \le j \le n$) such that $P_{(j)} \le \alpha/(n-j+1)$ and $P_{(i)} \le P_{(j)}$.

Hommel's procedure, although more complicated, is more powerful than Hochberg's

procedure.
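Hochberg's rule works from the largest p-value downward. A minimal sketch, with an illustrative function name and assuming unordered p-values as input:

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg (1988) step-up procedure.

    Returns a list of booleans, in the original order of p_values,
    where True means the corresponding hypothesis is rejected.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    rejected = [False] * n
    # Start with the largest p-value (j = n) and move down.
    for j in range(n, 0, -1):
        if p_values[order[j - 1]] <= alpha / (n - j + 1):
            # Reject every hypothesis whose p-value is <= P_(j).
            for idx in order[:j]:
                rejected[idx] = True
            break
    return rejected


if __name__ == "__main__":
    print(hochberg([0.03, 0.04, 0.045]))
    print(hochberg([0.01, 0.04, 0.6]))
```

With p-values 0.03, 0.04 and 0.045 the largest one is already below $\alpha = 0.05$, so all three hypotheses are rejected, although none of them is below $\alpha/3$.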

Rom (1990) showed that the superiority of Hommel's procedure was due to the conservatism of Hochberg's procedure, i.e. its size was strictly less than $\alpha$ for $n > 2$. He corrected this undesired property by modifying the critical points of Hochberg's procedure to obtain a new procedure that would still strongly control the family-wise error rate at the designated significance level $\alpha$. Hochberg's critical points $\alpha/n, \alpha/(n-1), \ldots, \alpha$ are replaced by $c_{1n}, \ldots, c_{nn}$, respectively, where $c_{11} = \alpha$ and $c_{ij} = c_{i+1,j+1}$, $1 \le i \le j$. The modified critical points are obtained iteratively by solving the recurrence relationship:

$$\sum_{i=1}^{n-1} \left[ c_{nn}^{\,i} - \binom{n}{i}\, c_{(n-i)n}^{\,n-i} \right] = 0.$$

The modified critical values are greater than the original ones, except for $n \le 2$, so the modified procedure always rejects the global null hypothesis whenever the original one does. The inference on the individual hypotheses can be done in the following sequential way: if $P_{(n)} \le c_{nn}$ then all $H_i$'s are rejected; otherwise, $H_{(n)}$ cannot be rejected and one goes on to compare $P_{(n-1)}$ with $c_{(n-1)n}$, etc.
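The iterative solution can be sketched as follows. The code assumes the recurrence $\sum_{i=1}^{n-1} [c_{nn}^{\,i} - \binom{n}{i} c_{(n-i)n}^{\,n-i}] = 0$ with $c_{nn} = \alpha$, and uses the property $c_{ij} = c_{i+1,j+1}$, so that a critical value depends only on its distance from the largest p-value. The function name and the internal array `d` are illustrative.

```python
from math import comb


def rom_critical_values(n, alpha=0.05):
    """Return [c_1n, ..., c_nn], the critical values compared with the
    ordered p-values P_(1) <= ... <= P_(n)."""
    # d[m] is the critical value m steps from the top:
    # d[1] = c_nn (for P_(n)), d[2] = c_(n-1)n, ..., d[n] = c_1n.
    d = [None, alpha]
    for m in range(2, n + 1):
        geometric = sum(alpha ** i for i in range(1, m))
        binomial = sum(comb(m, i) * d[i + 1] ** (m - i)
                       for i in range(1, m - 1))
        # The i = m - 1 term of the recurrence is m * d[m]; solve for it.
        d.append((geometric - binomial) / m)
    return list(reversed(d[1:]))


if __name__ == "__main__":
    print(rom_critical_values(3))
    print([0.05 / 3, 0.05 / 2, 0.05])  # Hochberg's values for comparison
```

For $n = 3$ and $\alpha = 0.05$ the smallest critical value becomes 0.016875, slightly above Hochberg's $\alpha/3 \approx 0.016667$, while the other two are unchanged.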

In summary:

1. Bonferroni's procedure may lack power for alternative hypotheses in which several

endpoints are correlated.

2. Sidak's multiplicative inequality only leads to a slightly more powerful test than

Bonferroni's.

3. The power gain of Holm's sequentially rejective Bonferroni test over the classical test is small if all the hypotheses are 'almost true', but it may be considerable if a number of hypotheses are 'completely wrong'. The test can also be used with weights.

4. To use the procedure based on Rüger's inequality, one must determine $k$ (for $P_{(k)}$) before performing the $n$ tests. To avoid choosing $k$ in advance, Hommel introduced a second procedure which is not less powerful than Holm's procedure but is more conservative than Simes's test.

5. Shaffer modified Holm's sequentially rejective procedure to obtain a further increase

in power when there are logical implications among the hypotheses and alternatives so

that not all combinations of true and false hypotheses are possible. An even bigger

increase in power can be obtained if i) a more comprehensive hypothesis has initially

been rejected, or ii) the particular hypotheses rejected are taken into account.

6. Worsley's procedure is an improvement over Bonferroni's but it requires knowledge of the joint probabilities of pairs of events which is not usually available.

7. Armitage & Parmar's procedure takes into account the correlation structure but is more difficult to apply.

8. Simes's procedure is less conservative than Hommel's. The improvement in power is

appreciable when several of the alternative hypotheses are correct. However, it does not

allow specific statements about individual hypotheses.

9. With Hommel's extension of Simes's procedure, one can make inferences on

individual hypotheses.

10. Hochberg's procedure is another sequential way of making inferences on individual

hypotheses and it is simpler to use than Hommel's procedure. However it is not as

powerful as Hommel's.

11. Rom's modification of Hochberg's critical values makes Hochberg's procedure as

powerful as Hommel's while it is still simpler to use.

1.2.1.2 Hotelling's $T^2$

Hotelling's $T^2$ (1931) is a standard approach to study several normally distributed endpoints simultaneously. To test the null hypothesis $H_0: \mu = \mu_0$, the statistic is defined as:

$$T^2 = N(\bar{Y} - \mu_0)'\, S^{-1}\, (\bar{Y} - \mu_0), \tag{1.2.4}$$

where $\bar{Y}$ is a vector of means from a sample of size $N$ drawn from a population $N_p(\mu, \Sigma)$, and $S^{-1}$ is the inverse of the sample covariance matrix. $\frac{N-p}{(N-1)p}\, T^2$ is distributed as a non-central $F_{p,N-p}$ with non-centrality parameter $N(\mu - \mu_0)'\, \Sigma^{-1}\, (\mu - \mu_0)$. If $\mu = \mu_0$, the distribution is the central $F$ (Anderson, 1958).

If the prime interest is to compare the means of two normal populations where the covariance matrices are assumed equal but unknown, the $T^2$ statistic can also be used.

Let $Y_1^{(i)}, \ldots, Y_{N_i}^{(i)}$ be a sample from $N(\mu^{(i)}, \Sigma)$, $i = 1, 2$. The null hypothesis is $H_0: \mu^{(1)} = \mu^{(2)}$. $\bar{Y}^{(i)}$ is distributed $N(\mu^{(i)}, (1/N_i)\Sigma)$ and

$$T^2 = \frac{N_1 N_2}{N_1 + N_2}\, (\bar{Y}^{(1)} - \bar{Y}^{(2)})'\, S^{-1}\, (\bar{Y}^{(1)} - \bar{Y}^{(2)}), \tag{1.2.5}$$

where

$$S = \frac{1}{N_1 + N_2 - 2} \left\{ \sum_{i=1}^{N_1} (Y_i^{(1)} - \bar{Y}^{(1)})(Y_i^{(1)} - \bar{Y}^{(1)})' + \sum_{i=1}^{N_2} (Y_i^{(2)} - \bar{Y}^{(2)})(Y_i^{(2)} - \bar{Y}^{(2)})' \right\}.$$

Thus, $\frac{N_1 + N_2 - p - 1}{(N_1 + N_2 - 2)p}\, T^2$ is distributed as a non-central $F_{p,\,N_1+N_2-p-1}$ with non-centrality parameter

$$\frac{N_1 N_2}{N_1 + N_2}\, (\mu^{(1)} - \mu^{(2)})'\, \Sigma^{-1}\, (\mu^{(1)} - \mu^{(2)}),$$

and the test is carried out at significance level $\alpha$. The distribution is the central $F$ under $H_0$.

O'Brien (1984) noted that the $T^2$ statistic makes no distinction between

variables that change favorably and variables that change unfavorably. It is a test that

can detect a possible difference at a certain standardized distance from Ho in all

directions. Pocock (1987) further added that because it is intended to detect any

departure from the null hypothesis, it lacks power to detect any specific types of

departure considered a priori to be biologically plausible in a clinical trial and therefore

Hotelling's $T^2$ is unsuitable for the analysis of clinical trials.
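The lack of directionality can be seen in a small numerical sketch of the two-sample statistic for two endpoints ($p = 2$). The code assumes the pooled-covariance form $T^2 = \frac{N_1 N_2}{N_1+N_2}(\bar{Y}^{(1)} - \bar{Y}^{(2)})' S^{-1} (\bar{Y}^{(1)} - \bar{Y}^{(2)})$ and the scaling $\frac{N_1+N_2-p-1}{(N_1+N_2-2)p} T^2$; the function name and the toy data are illustrative, not taken from the dissertation.

```python
def hotelling_two_sample_2d(sample1, sample2):
    """Two-sample Hotelling T^2 for bivariate data.

    sample1, sample2: lists of (y1, y2) pairs.
    Returns (T2, F, df1, df2).
    """
    p = 2
    n1, n2 = len(sample1), len(sample2)

    def mean(sample):
        return [sum(obs[k] for obs in sample) / len(sample) for k in range(p)]

    def scatter(sample, m):
        # Sum of outer products of deviations from the group mean.
        s = [[0.0, 0.0], [0.0, 0.0]]
        for obs in sample:
            dev = [obs[0] - m[0], obs[1] - m[1]]
            for a in range(p):
                for b in range(p):
                    s[a][b] += dev[a] * dev[b]
        return s

    m1, m2 = mean(sample1), mean(sample2)
    a1, a2 = scatter(sample1, m1), scatter(sample2, m2)
    dof = n1 + n2 - 2
    s = [[(a1[a][b] + a2[a][b]) / dof for b in range(p)] for a in range(p)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    quad = sum(diff[a] * inv[a][b] * diff[b]
               for a in range(p) for b in range(p))
    t2 = n1 * n2 / (n1 + n2) * quad
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, p, n1 + n2 - p - 1


if __name__ == "__main__":
    control = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    both_up = [(2, 1), (1, 2), (0, 1), (1, 0)]      # shifted by (+1, +1)
    opposite = [(2, -1), (1, 0), (0, -1), (1, -2)]  # shifted by (+1, -1)
    print(hotelling_two_sample_2d(control, both_up))
    print(hotelling_two_sample_2d(control, opposite))
```

In this toy example the two endpoints are uncorrelated, and $T^2$ takes the same value whether both endpoints shift in the same direction or in opposite directions.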

1.2.1.3 O'Brien's Test

O'Brien's interest (1984) focussed on tests with alternative hypotheses with all

endpoints showing an effect in the same direction. He was concerned with the lack of

power of the Bonferroni inequality and the lack of discrimination of Hotelling's $T^2$. He

was seeking a single global test that would allow making overall probability statements

instead of having to interpret multiple test results, when some effect was consistent

among all endpoints. O'Brien observed, through simulations in which the number of

endpoints studied is large relative to sample size, that while separate tests on each

variable may not reach statistical significance, the overall evidence may suggest strong

differences. He also showed that tests such as Hotelling's T 2 achieved low power in such

circumstances.

He considered three global procedures: a nonparametric procedure that is a rank-

sum-type test, and two parametric approaches that are similar, one being based on

generalized least squares (GLS) estimation while the other one uses ordinary least

squares (OLS) estimation methods. As he pointed out, the efficiency of the OLS

procedure relative to the GLS procedure is $\leq 1$ so attention will be focussed on the GLS

approach.

Let $Y_{ijk}$ represent the $k$th variable for the $j$th subject in group $i$ ($k = 1, \ldots, K$; $j = 1, \ldots, n_i$; $i = 1, \ldots, I$), with

$$\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) = \begin{cases} \sigma_{kk'} & \text{if } ij = i'j', \\ 0 & \text{otherwise.} \end{cases}$$

Assume $Y_{ijk}$ is defined so that large values are better than small values for each $k = 1, \ldots, K$ and the $Y_{ij}$ are independently distributed with mean $\mu_i$ and covariance matrix $\Sigma$. The null hypothesis $H_0$: $\mu_1 = \cdots = \mu_I$ versus the alternative hypotheses for which $\mu_{ik} > \mu_{i'k}$ for $k = 1, 2, \ldots, K$, are of prime interest, i.e. if the mean of variable 1 is greater in group 1 than in group 2, then the means of variables $2, \ldots, K$ will all be greater in group 1 than in group 2.

1.2.1.3.1 Nonparametric procedure

The nonparametric test is particularly recommended when the variables are not

normally distributed or the sample size is small.

Let $R_{ijk}$ represent the rank of $Y_{ijk}$ among all values of variable $k$ in the pooled set of $I$ samples.

Define $S_{ij} = \sum_{k=1}^{K} R_{ijk}$.

Perform a one-way analysis of variance on the $\{S_{ij}\}$ values.
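The three steps of the nonparametric procedure can be sketched as follows. This is only an illustration under stated assumptions: the function names (`average_ranks`, `obrien_rank_sums`, `one_way_anova_f`) and the small data set are hypothetical, and ties are given average ranks, a convention the text does not specify.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        tied_rank = (pos + end) / 2 + 1  # average of sorted positions pos..end
        for idx in order[pos:end + 1]:
            ranks[idx] = tied_rank
        pos = end + 1
    return ranks


def obrien_rank_sums(groups):
    """groups[i][j] is the list of K endpoint values of subject j in group i.

    Each endpoint is ranked in the pooled sample and the ranks are summed
    per subject, giving S_ij arranged like the input groups.
    """
    n_endpoints = len(groups[0][0])
    pooled = [subject for group in groups for subject in group]
    sums = [0.0] * len(pooled)
    for k in range(n_endpoints):
        ranks = average_ranks([subject[k] for subject in pooled])
        sums = [s + r for s, r in zip(sums, ranks)]
    out, start = [], 0
    for group in groups:
        out.append(sums[start:start + len(group)])
        start += len(group)
    return out


def one_way_anova_f(samples):
    """One-way ANOVA F statistic with its numerator and denominator df."""
    all_values = [v for sample in samples for v in sample]
    grand = mean(all_values)
    between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    return (between / df_between) / (within / df_within), df_between, df_within


# Hypothetical data: I = 2 groups, 3 subjects each, K = 2 endpoints.
group_a = [[5.1, 7.0], [6.3, 8.2], [5.8, 7.7]]
group_b = [[4.0, 6.1], [4.9, 6.0], [5.0, 6.9]]
rank_sums = obrien_rank_sums([group_a, group_b])
f_stat, df1, df2 = one_way_anova_f(rank_sums)
```

The resulting `f_stat` would then be referred to an $F$ distribution with `df1` and `df2` degrees of freedom; with two groups this is equivalent to a two-sample t-test on the rank sums.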

1.2.1.3.2 GLS Parametric procedure

First, assume the $\{Y_{ijk}\}$ values are standardized, i.e. the overall mean is subtracted from each observation and the result is then divided by the pooled within-group sample standard deviation. Then compute:

$$F = \frac{\sum_i n_i \{ J' \hat{\Sigma}^{-1} (\bar{Y}_{i.} - \bar{Y}_{..}) \}^2}{(I-1)\, J' \hat{\Sigma}^{-1} J} \qquad (1.2.6)$$

where

$J' = (1, 1, \ldots, 1)$,

$\bar{Y}_{i.} = \sum_j Y_{ij} / n_i$,

$\bar{Y}_{..} = \sum_{ij} Y_{ij} / \sum_i n_i$, and

$\hat{\Sigma}_{a,b} = \sum_{ij} (Y_{ija} - \bar{Y}_{i.a})(Y_{ijb} - \bar{Y}_{i.b}) / \sum_i (n_i - 1)$.

Reject $H_0$ if $F$ exceeds the $(1-\alpha) \times 100$ percentile of the standard $F$ distribution with $I - 1$ and $\sum_i (n_i - K)$ degrees of freedom in the numerator and denominator, respectively.

O'Brien has shown that the GLS procedure is remarkably robust to the

normality assumption and achieves optimality in the normal theory setting. He has also

demonstrated that both procedures, parametric and nonparametric, asymptotically

provide approximations of the probability of type-I error. In the repeated measures

setting, when variances are heterogeneous, the GLS procedure may allow a considerably

greater power with a slight increase in the size of the test.

In brief, O'Brien's nonparametric test is simply performing a one-way analysis of

variance on the sum of the ranks assigned to each subject or a univariate t-test on that

sum when comparing two samples. His parametric approach is also a one-way analysis

of variance but this time on the average of the standardized data. Both procedures

have the property of collapsing the multiple variables into one summary statistic.

Before proceeding to a more detailed analysis of the three tests presented, it is of

interest to review the use of sequential designs in clinical trials. As mentioned earlier,

they will play an important role in the methods developed in this dissertation.

1.2.2 Sequential designs

The theory for sequential designs was developed by Wald in 1947. A test is

performed after the accrual of each pair of observations; the only decision to be made is

whether to terminate or continue the trial. This classical sequential design is called an

'open' plan because there is no fixed sample size. Wald and Wolfowitz (1948) showed

that fully sequential designs led to the lowest expected sample size under the null and

the alternative hypotheses. Armitage (1957) introduced the 'closed' sequential design to

impose a limit on the sample size. In 1971, McPherson and Armitage developed theory

on repeated significance tests on accumulating data which is similar to the 'closed'

sequential design. Later, Armitage (1978) and Jones and Whitehead (1979) applied

sequential designs to survival data. Despite the savings in sample size, the need for

constant data monitoring, rapid response measures and matching of the participants was

not appealing. The requirement of analysis after each pair of outcomes was also

cumbersome.

Group sequential designs were subsequently developed to avoid some of the

problems of classical sequential designs. They are more practical and the increase in

sample size relative to the classical design is slight. The subjects are divided into I

equal-sized groups with 2n subjects in each. The data are then analyzed a maximum of

I times i.e. once after each accrual of 2n subjects. If the statistic $Z_i$ is outside a

prespecified stopping boundary, the experiment is stopped and the null hypothesis is

rejected. If the statistic is inside the boundary, the experiment continues until i = I.

When i = I, the trial stops and the null hypothesis either is or is not rejected.

Haybittle (1971) and Peto (1976), Pocock (1977) and O'Brien & Fleming (1979) suggested different group sequential stopping boundaries for the standardized normal statistic $Z_i$. Haybittle & Peto favored a large critical value such as $Z_i = \pm 3.0$ for all interim tests and the conventional critical value for the last test. Pocock proposed to use a constant critical value based on the number of analyses such that the overall significance level would be $\alpha$. O'Brien & Fleming suggested using $Z^* \sqrt{I/i}$ where $Z^*$ is such that the overall significance level $\alpha$ is achieved. Extensions and modifications to

these designs were then proposed by DeMets & Ware (1980), Tsiatis (1982), Fairbanks

(1982), Gould & Pecore (1982), Harrington, Fleming & Green (1982), Whitehead

(1983), Whitehead & Stratton (1983), Selke & Siegmund (1983), Lan & DeMets (1983),

DeMets & Lan (1984), Lan, DeMets & Halperin (1984), Jennison, Turnbull & Tsiatis

(1984), Fleming, Harrington & O'Brien (1984), Freedman, Lowe & Macaskill (1984),

Gail (1984) and Geller & Pocock (1987).
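For the two-stage case ($I = 2$) that is central to this dissertation, the three boundary shapes can be compared numerically. The sketch below is an illustration only: it assumes two equal-sized stages, a two-sided test at $\alpha = 0.05$, and a second-stage statistic of the form $Z_2 = (Z_1 + W)/\sqrt{2}$ with $W$ an independent standard normal increment. The function names and the Simpson integration are choices made here, not the cited authors' algorithms.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def continue_and_accept_prob(c1, c2, steps=400):
    """P(|Z_1| < c1 and |Z_2| < c2) under H0 for two equal-sized stages.

    Conditioning on Z_1 = z, |Z_2| < c2 means the increment W lies in
    (-c2*sqrt(2) - z, c2*sqrt(2) - z); integrate over z by Simpson's rule.
    """
    def integrand(z):
        upper = STD_NORMAL.cdf(c2 * sqrt(2) - z)
        lower = STD_NORMAL.cdf(-c2 * sqrt(2) - z)
        return STD_NORMAL.pdf(z) * (upper - lower)

    h = 2 * c1 / steps  # steps is even, as Simpson's rule requires
    total = integrand(-c1) + integrand(c1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(-c1 + i * h)
    return total * h / 3


def overall_alpha(c1, c2):
    """Overall two-sided type I error of the two-stage boundary (c1, c2)."""
    return 1 - continue_and_accept_prob(c1, c2)


def solve_constant(shape, alpha=0.05, lo=1.5, hi=4.0):
    """Bisection for the constant c whose boundary shape(c) has level alpha."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if overall_alpha(*shape(mid)) > alpha:
            lo = mid  # boundary too narrow: rejects too often
        else:
            hi = mid
    return (lo + hi) / 2


# Pocock: same critical value at both analyses.
pocock_c = solve_constant(lambda c: (c, c))
# O'Brien & Fleming: Z* sqrt(I/i), i.e. Z* sqrt(2) then Z* when I = 2.
obf_c = solve_constant(lambda c: (c * sqrt(2), c))
# Haybittle & Peto: 3.0 at the interim look, conventional 1.96 at the end.
haybittle_peto_alpha = overall_alpha(3.0, STD_NORMAL.inv_cdf(0.975))
```

Under these assumptions the Pocock constant comes out near 2.18 and the O'Brien & Fleming constant near 1.98, while the Haybittle & Peto rule slightly exceeds the nominal 5% level because the final test is not adjusted.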

Tang, Gnecco and Geller (1989) looked at the analysis of multiple endpoints with group sequential designs. Using O'Brien's GLS approach, they proposed a method for the design of clinical trials that allows for interim analyses and considers all endpoints simultaneously. Patients are entered sequentially in a clinical trial. After each accrual of 2n patients (n randomized to treatment A and n to treatment B), an interim analysis is undertaken on the accumulated data. Assume the patients' data are independent k-dimensional variables with mean $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{ik})'$, $i = A, B$ and common known covariance matrix $\Sigma$. The null hypotheses of interest are $H_{0i}$: $\mu_{Ai} - \mu_{Bi} = 0$. The alternative hypotheses are $H_{ai}$: $\mu_{Ai} - \mu_{Bi} = \lambda \delta_i$, $i = 1, 2, \ldots, k$ where $\delta_i$ specifies the relative difference of interest. In O'Brien's original model, $\delta_i = \sigma_i$ for all $i$. The null hypothesis can now be written $H_0$: $\lambda = 0$.

Let $Y_j = (y_{1j}, \ldots, y_{kj})'$, for the first $j$ groups of patients' data, have a multivariate normal distribution with mean $\lambda \delta$ and covariance matrix $2\Sigma/(nj)$, where

$$y_{ij} = \sum_{m=1}^{j} (\bar{x}_{Aim} - \bar{x}_{Bim}) / j, \qquad i = 1, 2, \ldots, k,$$

and $\bar{x}_{lim}$ is the mean of endpoint $i$ for patients on treatment $l$ in group $m$, i.e. accrued between the $(m-1)$st and $m$th analyses, $l = A, B$. The GLS estimate of $\lambda$ is

$$\hat{\lambda} = \delta' \Sigma^{-1} Y_j / (\delta' \Sigma^{-1} \delta). \qquad (1.2.7)$$

O'Brien's test is then defined as:

$$F = (nj/2)^{1/2}\, \delta' \Sigma^{-1} Y_j / (\delta' \Sigma^{-1} \delta)^{1/2}. \qquad (1.2.8)$$

Under $H_0$, $F \sim N(0,1)$. Under $H_a$: $\lambda = \lambda_0$ $(> 0)$, the mean of $F$ is $\lambda_0 (nj\, \delta' \Sigma^{-1} \delta / 2)^{1/2}$. For two-stage and three-stage group sequential trials, O'Brien's statistic would generally be compared to Pocock boundaries or O'Brien and Fleming boundaries to decide if the trial is to be continued or not.

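The main advantage of this procedure is the sample-size saving. Let $n_i$ be the sample size when only the $i$th endpoint is analyzed. Equating the mean of the global statistic with the mean of the univariate statistic under the alternative gives

$$(n\, \delta' \Sigma^{-1} \delta / 2)^{1/2} = (n_i/2)^{1/2}\, \delta_i / \sigma_i \;\Rightarrow\; (n/n_i)^{1/2} = (\delta_i/\sigma_i)\, (\delta' \Sigma^{-1} \delta)^{-1/2}.$$

The authors proved that $\delta' \Sigma^{-1} \delta \geq \delta_i^2 / \sigma_i^2$ for all $i$, so the sample size $n \leq \min(n_i)$. This implies that their test is more powerful than the univariate test on any one endpoint.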
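The size of the saving can be illustrated for two endpoints. The sketch below is a hypothetical example, not part of the cited paper: it assumes a bivariate covariance matrix with standard deviations `sigma1`, `sigma2` and correlation `rho`, and evaluates the ratio $n/n_i = (\delta_i^2/\sigma_i^2) / (\delta' \Sigma^{-1} \delta)$ implied by the relation above.

```python
def quadratic_form_2x2(delta, sigma1, sigma2, rho):
    """delta' Sigma^{-1} delta for two endpoints with correlation rho."""
    d1, d2 = delta[0] / sigma1, delta[1] / sigma2  # standardized effects
    return (d1 * d1 - 2 * rho * d1 * d2 + d2 * d2) / (1 - rho * rho)


def sample_size_ratio(delta, sigma1, sigma2, rho, endpoint):
    """n / n_i: size needed by the global test relative to endpoint i alone."""
    sigma = (sigma1, sigma2)[endpoint]
    single = (delta[endpoint] / sigma) ** 2
    return single / quadratic_form_2x2(delta, sigma1, sigma2, rho)


# O'Brien's original model, delta_i = sigma_i, with unit variances.
ratio_uncorrelated = sample_size_ratio((1.0, 1.0), 1.0, 1.0, 0.0, endpoint=0)
ratio_correlated = sample_size_ratio((1.0, 1.0), 1.0, 1.0, 0.5, endpoint=0)
```

With $\delta_i = \sigma_i$ the ratio reduces to $(1 + \rho)/2$, so under these assumptions two uncorrelated endpoints halve the required sample size and the saving shrinks as the correlation grows.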

Some of the limitations of the proposed procedure reside in the fact that the data

must be normally distributed with known covariance matrix. Furthermore, early

stopping may be based on small sample size and the handling of missing data needs

more investigation. To summarize, this approach simply applies a group sequential

design to an O'Brien type univariate linear combination of the multivariate outcomes.

1.3 Proposed research

The objective of this research is to develop a new procedure to analyze multiple

endpoints that will take advantage of the techniques used to implement group sequential

designs combined with the advantages of multivariate testing. Although these designs

will be discussed in terms of their application to randomized clinical trials, the procedure

presented can be used in other contexts where multiple endpoints are analyzed.

In Chapter 2, this dissertation will compare the critical regions and the contours for the power of each of three common procedures: Bonferroni's inequality, Hotelling's $T^2$ and O'Brien's test. The efficiency and robustness of these procedures will be compared as a function of the direction of the alternative hypotheses.

A new test is proposed in Chapter 3 that uses information from the interim

analysis in a two-stage group sequential design to form the rejection boundaries at the

second stage. The test uses a Hotelling $T^2$ rejection region at the end of the first stage

and an O'Brien type procedure at the end of the second stage. The distribution of the

proposed test will be derived and its power and efficiency will be compared to common

procedures. A modification of this test that allows for early stopping i.e. acceptance of

the null hypothesis at the interim analysis will also be presented.

In Chapter 4, calculation of the new test statistic is illustrated by applying it to

two example data sets. The implications of this dissertation and suggestions for future

research are discussed in Chapter 5.

CHAPTER 2

EVALUATION AND COMPARISON OF

THREE COMMON MULTIVARIATE TESTING PROCEDURES

2.1 Introduction

Studies are often designed to compare two or more groups with respect to one or

more variables. For clarity, we will restrict ourselves to two groups and two variables

for most of this chapter. Define $\hat{\theta}_i$ as a statistic computed from the observed data and $\sigma_i$ as the standard deviation of $\hat{\theta}_i$, $i = 1, 2$. The null hypothesis to be tested is of the form $H_0$: $\theta = \theta_0$ where $\theta' = (\theta_1, \theta_2)$ is generally the differences between the means of the two groups for each variable but could also be another appropriate parameter. Let $Z' = (Z_1, Z_2)$ where $Z_i = (\hat{\theta}_i - \theta_{i0}) / \sigma_i$. $Z_1$ and $Z_2$ will be assumed to be distributed normally or at least approximately normally distributed based on large sample theory, with zero means and unit variances under the null hypothesis. One major concern is the power of the test i.e. the probability of rejecting $H_0$ given $H_a$ is true, based on the true values $\theta_{1a}$ and $\theta_{2a}$. But what does 'rejecting $H_0$' mean in the case of multivariate samples? Rejection of $H_0$: $\theta_1 = \theta_2 = 0$ leads to eight possible alternatives, since each of $\theta_1$ and $\theta_2$ may be negative, zero or positive as long as both are not zero: for instance $\theta_1 = 0$ and $\theta_2 > 0$, or $\theta_1 < 0$ and $\theta_2 < 0$, or $\theta_1 < 0$ and $\theta_2 = 0$, or $\theta_1 < 0$ and $\theta_2 > 0$. Should all the possible alternatives be looked at jointly or should more power be allowed to detect some of them considered of prime interest? In this chapter, the power for these different alternatives will be studied for each of the three procedures described in Chapter 1: Bonferroni's inequality, Hotelling's $T^2$ and O'Brien's test.

Let $1-\beta$ denote the power with a type I error level $\alpha$. The values of $\theta_{1a}$ and $\theta_{2a}$

for which the power is 50%, 80% and 90% with a significance level $\alpha = 0.05$ will be

derived. Initially, in Sections 2.2 to 2.4, $Z_1$ and $Z_2$ will be assumed independent. The

case where Z1 and Z2 are correlated will be discussed in Section 2.6. The critical regions

for each test will be shown as well as the contours for the power. These will form the

basis for describing the advantages and disadvantages of existing procedures.

2.2 Bonferroni's inequality

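With Bonferroni's correction, we will reject $H_0 = \{H_1, H_2\}$ if any $p_i \leq \alpha/2$, $i = 1, 2$, where $\alpha/2 = 0.025$ for a two-sided test. The critical value of the test is $z_{.0125} = 2.24$. The square in Figure 2.2.1 represents the critical region of the test. Letting $Z_1$ and $Z_2$ be defined as previously, the power is given by

$$\begin{aligned}
\Pr(\text{rejecting } H_0 \mid H_a \text{ true})
&= \Pr(Z_1 > 2.24 \text{ or } Z_1 < -2.24 \text{ or } Z_2 > 2.24 \text{ or } Z_2 < -2.24 \mid H_a \text{ true}) \\
&= 1 - \Pr(-2.24 < Z_1 < 2.24 \text{ and } -2.24 < Z_2 < 2.24 \mid H_a \text{ true}) \qquad (2.2.1) \\
&= 1 - [\Pr(-2.24 < Z_1 < 2.24 \mid H_a \text{ true}) \cdot \Pr(-2.24 < Z_2 < 2.24 \mid H_a \text{ true})] \\
&= 1 - [\Pr(-2.24 - \theta_{1a} < Z_1^* < 2.24 - \theta_{1a}) \cdot \Pr(-2.24 - \theta_{2a} < Z_2^* < 2.24 - \theta_{2a})]
\end{aligned}$$

where $Z_i^* = Z_i - \theta_{ia}$ is a random variable with standard normal distribution and

$$\begin{aligned}
\Pr(-2.24 < Z_i < 2.24 \mid H_a \text{ true})
&= \Pr(-2.24 - \theta_{ia} < Z_i - \theta_{ia} < 2.24 - \theta_{ia}) \quad \text{where } \theta_{ia} = \mu_{ia}^{(1)} - \mu_{ia}^{(2)} \text{ under } H_a \\
&= \Pr(-2.24 - \theta_{ia} < Z_i^* < 2.24 - \theta_{ia}).
\end{aligned}$$

Figure 2.2.2 shows the contours for powers of 50%, 80% and 90% with $\alpha = 0.05$.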

[Figure 2.2.1 Critical regions. Bonferroni (B), Hotelling (H), O'Brien (O). Horizontal axis: $Z_1$, from -4 to 4.]

[Figure 2.2.2 Contours for powers of 50%, 80% and 90%. Bonferroni (B), Hotelling (H), O'Brien (O). Vertical axis: $\theta_{2a}$, from -6 to 6.]

For a prespecified power and a fixed value of $\theta_{1a}$, a computer search was done to determine the positive value of $\theta_{2a}$ that would ensure the desired power. This was repeated for $0 \leq \theta_{1a} \leq 4$ to generate the points $(\theta_{1a}, \theta_{2a})$ in the first quadrant. The points in the other quadrants were obtained by symmetry relations with the ones already calculated. All the points $(\theta_{1a}, \theta_{2a})$ on the almost circular contours satisfy the equations above with power 50%, 80% and 90% respectively. Power contours for Simes'

modification of Bonferroni's correction were also determined. The results were so similar

that the contours of both tests coincide in Figure 2.2.2.
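The power formula of this section and the search for contour points can be sketched as follows. This is a minimal illustration assuming independent $Z_1$ and $Z_2$ with unit variances; the function names are hypothetical and the bisection stands in for whatever search routine was actually used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
CRIT = STD_NORMAL.inv_cdf(1 - 0.0125)  # two-sided Bonferroni value, about 2.24


def accept_prob(theta):
    """P(-CRIT < Z < CRIT) when Z ~ N(theta, 1)."""
    return STD_NORMAL.cdf(CRIT - theta) - STD_NORMAL.cdf(-CRIT - theta)


def bonferroni_power(theta1, theta2):
    """Power of the Bonferroni procedure for independent Z_1 and Z_2."""
    return 1 - accept_prob(theta1) * accept_prob(theta2)


def theta2_for_power(theta1, power, hi=10.0):
    """Positive theta2 giving the target power at a fixed theta1.

    Returns None when theta1 alone already exceeds the target power,
    in which case the contour has no point at this theta1.
    """
    if bonferroni_power(theta1, 0.0) > power:
        return None
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if bonferroni_power(theta1, mid) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# First-quadrant points of the 80% contour on a coarse grid of theta1 values.
contour_80 = [(t / 2, theta2_for_power(t / 2, 0.80)) for t in range(0, 9)]
```

Reflecting the resulting points across both axes gives the remaining quadrants, which is the symmetry argument used in the text.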

2.3 Hotelling's T 2

For the two sample problem with known variance-covariance matrix, let $\{y_i^{(1)}\}$, $i = 1, \ldots, N_1$ be a sample from a $N(\mu^{(1)}, \Sigma)$ population and $\{y_i^{(2)}\}$, $i = 1, \ldots, N_2$ be a sample from a population $N(\mu^{(2)}, \Sigma)$. Define

(2.3.1)

and

(2.3.2)

So, for $H_0$: $\delta = \mu^{(1)} - \mu^{(2)} = 0$, the confidence region, when $\Sigma$ is known, is defined as

(2.3.3)

Asymptotically, for $p = 2$, $\Sigma = I$ and $N_1 = N_2 = N$, the confidence region is

$$Z'Z = \frac{N}{2} (\bar{y}^{(1)} - \bar{y}^{(2)})' (\bar{y}^{(1)} - \bar{y}^{(2)}) \leq \chi^2_2 = 5.99 \text{ for } \alpha = 0.05. \qquad (2.3.4)$$

This inequality is the interior and boundary of a circle with center at (0,0) and radius $= \sqrt{5.99}$ as shown in Figure 2.2.1.

Under $H_a$,

$$Z'Z = \frac{N_1 N_2}{N_1 + N_2} (\bar{y}^{(1)} - \bar{y}^{(2)})' \Sigma^{-1} (\bar{y}^{(1)} - \bar{y}^{(2)}) \sim \chi^2_{p,nc}$$

where the non-centrality parameter $nc$ is

Asymptotically, for $p = 2$, $\Sigma = I$ and $N_1 = N_2 = N$,

(2.3.5)

(2.3.6)

(2.3.7)

So, for all the values $(\theta_{1a}, \theta_{2a})$ where $\theta_{1a} = \sqrt{N/2}\, (\mu_1^{(1)} - \mu_1^{(2)})$ and $\theta_{2a} = \sqrt{N/2}\, (\mu_2^{(1)} - \mu_2^{(2)})$ such that $\theta_{1a}^2 + \theta_{2a}^2 = nc$, the probability of rejecting $H_0$ given $(\theta_{1a}, \theta_{2a})$ is

$$1 - \Pr(\chi^2_{2,nc} < \chi^2_2), \qquad nc = \theta_{1a}^2 + \theta_{2a}^2. \qquad (2.3.8)$$

Power contours for Hotelling's test can be seen on Figure 2.2.2. A computer search determined the distance from the origin for which powers of 50%, 80% and 90% were respectively obtained. All points on a circle with radius equal to this distance have the same power.
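The probability in (2.3.8) and the search for the contour radius can be sketched without special libraries. The code below is an illustration under stated assumptions: it writes the non-central $\chi^2$ with 2 degrees of freedom as a Poisson mixture of central $\chi^2$ distributions with even degrees of freedom, which have a closed-form distribution function. The function names and the truncation at 200 terms are choices made here.

```python
from math import exp, log


def central_chi2_cdf_even(x, df):
    """CDF of a central chi-square with even df, in closed form."""
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1 - exp(-half) * total


def noncentral_chi2_cdf_2df(x, nc, terms=200):
    """CDF of a non-central chi-square with 2 df as a Poisson(nc/2) mixture."""
    weight = exp(-nc / 2)  # Poisson probability of j = 0
    total = 0.0
    for j in range(terms):
        total += weight * central_chi2_cdf_even(x, 2 + 2 * j)
        weight *= (nc / 2) / (j + 1)
    return total


CRIT_2DF = -2 * log(0.05)  # 95th percentile of chi-square(2), about 5.99


def hotelling_power(theta1, theta2):
    """Power of the known-covariance test at (theta1, theta2), as in (2.3.8)."""
    return 1 - noncentral_chi2_cdf_2df(CRIT_2DF, theta1 ** 2 + theta2 ** 2)


def radius_for_power(power, hi=10.0):
    """Distance from the origin at which the test reaches the target power."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if hotelling_power(mid, 0.0) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


radius_80 = radius_for_power(0.80)
```

Because the power depends on $(\theta_{1a}, \theta_{2a})$ only through $\theta_{1a}^2 + \theta_{2a}^2$, a single radius per power level describes the whole circular contour.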

2.4 O'Brien's Test

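For the two sample problem with the previous notation, $p = 2$, $\hat{\Sigma} = I$, O'Brien's test for the null hypothesis $H_0$: $\mu^{(1)} = \mu^{(2)}$ is asymptotically

$$F = \frac{\sum_i n_i \{ J' \hat{\Sigma}^{-1} (\bar{Y}_{i.} - \bar{Y}_{..}) \}^2}{(I-1)\, J' \hat{\Sigma}^{-1} J} = n\, \frac{\{ (\bar{y}_1^{(1)} - \bar{y}_1^{(2)}) + (\bar{y}_2^{(1)} - \bar{y}_2^{(2)}) \}^2}{2 \cdot 2}. \qquad (2.4.1)$$

The confidence region is

$$|Z_1 + Z_2| = \sqrt{n/2}\; |(\bar{y}_1^{(1)} - \bar{y}_1^{(2)}) + (\bar{y}_2^{(1)} - \bar{y}_2^{(2)})| \leq \sqrt{2\,(F_{0.95,1,\infty})} = \sqrt{2\,(3.8416)} = 2.77. \qquad (2.4.2)$$

This inequality represents the region included between the two lines

$$Z_2 = -Z_1 + 2.77 \quad \text{and} \quad Z_2 = -Z_1 - 2.77. \qquad (2.4.3)$$

The points $(\theta_{1a}, \theta_{2a})$ satisfying equations (2.4.3) have power 0.5 i.e.

(2.4.4)

Similarly, all the points lying on the two parallel lines, perpendicular to the 45° line through the origin in quadrant I and III equidistant from the origin, will have the same power. To calculate this power, let's consider new axes U and V centered in a given point $(\theta_{1a}, \theta_{2a})$ as in Figure 2.4.1. It is easy to verify that the length of the new axis U from the origin to the line with power 0.5 is 1.96. So, the power at $(\theta_{1a}, \theta_{2a})$ is

$$1 - \beta = \Pr(\text{rejecting } H_0 \mid (\theta_{1a}, \theta_{2a})) = \Pr\left(U > 1.96 - \sqrt{\theta_{1a}^2 + \theta_{2a}^2}\right) + \Pr\left(U < -\left(1.96 + \sqrt{\theta_{1a}^2 + \theta_{2a}^2}\right)\right). \qquad (2.4.5)$$

All points on the lines perpendicular to the 45° line through the origin in these points satisfy the equations:

For example, to ensure a power of 0.8, $(\theta_{1a}, \theta_{2a}) = (1.99, 1.99)$.

$$\Pr\left(U > 1.96 - \sqrt{\theta_{1a}^2 + \theta_{2a}^2}\right) + \Pr\left(U < -\left(1.96 + \sqrt{\theta_{1a}^2 + \theta_{2a}^2}\right)\right) = 0.8$$

$$\Rightarrow 1.96 - \sqrt{\theta_{1a}^2 + \theta_{2a}^2} = -0.85$$

$$\Rightarrow \theta_{1a} = \theta_{2a} = 1.99.$$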
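The calculation in (2.4.5) can be checked numerically. The sketch below is an illustration, not the author's program: it assumes independent unit-variance $Z_1$ and $Z_2$ and uses the projection $(\theta_{1a} + \theta_{2a})/\sqrt{2}$ of the alternative onto the 45° line, which equals $\sqrt{\theta_{1a}^2 + \theta_{2a}^2}$ for points on that line. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def obrien_power(theta1, theta2, z_crit=1.96):
    """Two-sided power of the test based on (Z_1 + Z_2) / sqrt(2)."""
    shift = (theta1 + theta2) / sqrt(2)  # mean of the standardized sum
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


power_on_diagonal = obrien_power(1.99, 1.99)       # close to 0.8
power_opposite_effects = obrien_power(2.0, -2.0)   # equals the level, about 0.05
```

Every point with the same sum $\theta_{1a} + \theta_{2a}$ has the same power under these assumptions, which is why the contours are straight lines perpendicular to the 45° line.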

[Figure 2.4.1 New axes to determine the power of O'Brien's test. Axes: $\theta_{1a}$ and $\theta_{2a}$, each from -6 to 6, with the point $(\theta_{1a}, \theta_{2a})$ marked.]

All points on $\theta_{2a,1-\beta} = -\theta_{1a,1-\beta} + 3.98$ and $\theta_{2a,1-\beta} = -\theta_{1a,1-\beta} - 3.98$ have power 0.8 (Figure 2.2.2). Similarly, all points on $\theta_{2a,1-\beta} = -\theta_{1a,1-\beta} + 4.6$ and $\theta_{2a,1-\beta} = -\theta_{1a,1-\beta} - 4.6$ have power 0.9.

2.5 Comparison of the three procedures

The comparison will first be done in a bivariate context assuming independence,

although it can be generalized to more than two correlated variables. Of the three tests

previously described, O'Brien's test has the greatest power to detect an alternative

hypothesis that lies on the diagonal in quadrants I and III. This attractive property

holds for all $(\theta_{1a}, \theta_{2a})$ within a certain symmetric distance from the diagonal as shown on Figure 2.5.1. The regions 'O' were determined from the intersections of O'Brien's and Hotelling's contours for power varying from 5% to 95%. The set of $(\theta_{1a}, \theta_{2a})$ for which the power of O'Brien's test is better than for Hotelling's and Bonferroni's test is indicated in the areas 'O'. The regions 'B' were obtained in a similar way but the intersections of Bonferroni's and Hotelling's contours were considered. If the power is greater than 27%, Bonferroni's test is better within the regions 'B'. For a fixed alternative $(\theta_{1a}, \theta_{2a})$, outside the regions 'O', Hotelling's test will have better power than Bonferroni's and O'Brien's procedures. If the truth lies on the diagonal in quadrants II and IV, O'Brien's power is then at its minimum i.e. 5%. It should be emphasized that O'Brien's test is optimal when all variables are believed to be affected in exactly the same direction and the same magnitude while Hotelling's $T^2$ is not affected by the direction of the effect of the variables. This property makes the $T^2$ a more robust approach not only when the truth lies on the diagonal through quadrants II and IV but for all $(\theta_{1a}, \theta_{2a})$ included in the regions 'H' on Figure 2.5.1. However, if the power is at least 27% and if one of the two statistics is close to zero while the other one reaches its maximum value, then Bonferroni's procedure will have better chances of detecting a difference. It can be seen that intersection points on Figure 2.2.2 for powers of 50%, 80% and 90% fall on the contours of the regions observed on Figure 2.5.1.

[Figure 2.5.1 Regions in which each test has greater power than the other two tests. Bonferroni (B), Hotelling (H), O'Brien (O).]

Showing the regions for which each test is the best does not indicate how much

better each test is. To quantify the differences in power, a region covering -90° to 90°,

from the diagonal of quadrants II and IV was considered. The half circle from

Hotelling's contours included in this region was divided into 180 one-degree sections

(Figure 2.5.2). The values of $(\theta_{1a}, \theta_{2a})$ for each one-degree section on Hotelling's

contour at 50% power were determined and corresponding Bonferroni's and O'Brien's

powers were evaluated. Figure 2.5.2.1 shows the power for all three tests when

Hotelling's is 50%. Figures 2.5.2.2 and 2.5.2.3 correspond to Hotelling's powers of 80%

and 90% respectively.

As expected, the curves are symmetric about zero degree and the power of

O'Brien's test is 5% at -90° and 90° i.e. on the diagonal in quadrants II and IV.

O'Brien's test has maximum power at 0°, on the diagonal of quadrants I and III. This

maximal improvement is only of 11% when the power of Hotelling's test is 50% and

decreases to 4.5% when the power of Hotelling's test is 90%. It is also obvious from

these figures that although O'Brien's test has better power when all variables are

affected in the same direction, it deteriorates sharply to achieve the lowest power of 5%

when variables are affected in opposite directions. Bonferroni's largest improvement

over Hotelling's T 2 is small (~1%), only occurs for powers greater than 27% and is

limited to few values of $(\theta_{1a}, \theta_{2a})$.

In conclusion, O'Brien's test has better power when the outcomes are in exactly the same direction but is by far the worst to detect a difference when the outcomes are affected in opposite directions. Hotelling's constant power dominates Bonferroni's conservatism except for a narrow range of $(\theta_{1a}, \theta_{2a})$ where the improvement due to Bonferroni is much smaller than the one gained by Hotelling's when its power is greater.

[Figure 2.5.2 One-degree sections for 80% power contour of Hotelling's test.]

[Figure 2.5.2.1 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%. Bonferroni (B), Hotelling (H), O'Brien (O). Axes: angle (degrees) from -90 to 90, power from 0.0 to 1.0.]

[Figure 2.5.2.2 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%. Bonferroni (B), Hotelling (H), O'Brien (O). Axes: angle (degrees) from -90 to 90, power from 0.0 to 1.0.]

[Figure 2.5.2.3 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%. Bonferroni (B), Hotelling (H), O'Brien (O). Axes: angle (degrees) from -90 to 90, power from 0.0 to 1.0.]

To compare the power of O'Brien's test with that of Hotelling's $T^2$ when more than two uncorrelated outcomes are studied, different alternatives were considered. The non-centrality parameters (NC), when the power of Hotelling's test is 80%, were evaluated with the $\chi^2$ distribution for 2, 3, 4, 5 and 10 endpoints; their values were 9.63, 10.90, 11.94, 12.83 and 16.25 respectively. The cosine of the angle between $\theta^*$, the vector of alternative hypotheses, and the diagonal $\mathbf{1}$ was determined. For example, for $p = 2$ endpoints:

$$\cos\gamma = \frac{\theta^*_{1a} + \theta^*_{2a}}{\sqrt{2}\, \sqrt{\theta^{*2}_{1a} + \theta^{*2}_{2a}}}$$

where $\theta^*_{1a}$ and $\theta^*_{2a}$ are the parameter values under the alternative hypothesis; $(\theta^*_{1a}, \theta^*_{2a})$ can be any point on the axis V on Figure 2.4.1 and $\sqrt{NC} \cos\gamma = \sqrt{\theta_{1a}^2 + \theta_{2a}^2}$ on the same figure. So, $\cos\gamma = 1$ for the alternative hypothesis where all $\theta^*_i$, $i = 1, 2, \ldots, p$, are equal and $\gamma = 0°$. If $(p-1)$ $\theta^*_i$'s are equal and one $\theta^*_i = 0$, $\cos\gamma = \sqrt{(p-1)/p}$. If $(p-2)$ $\theta^*_i$'s are equal and two $\theta^*_i$'s $= 0$, $\cos\gamma = \sqrt{(p-2)/p}$, so if $(p-r)$ $\theta^*_i$'s are equal and $r$ $\theta^*_i$'s $= 0$ then $\cos\gamma = \sqrt{(p-r)/p}$, $r < p$. Once $\cos\gamma$ was determined, the power was evaluated as follows:

$$\text{power} = 1 - \Phi(1.96 - \sqrt{NC} \cos\gamma) + \Phi(-1.96 - \sqrt{NC} \cos\gamma)$$

and results are presented in Table 2.5.1. When all the $\theta_{ia}$'s are equal, the power of O'Brien's test increases as the number of endpoints increases. However, as the number of $\theta_{ia}$'s that are equal decreases, the power also decreases; it can be as low as 24.7% for ten endpoints. When only half the $\theta_{ia}$'s are equal, the power is about 80% for ten endpoints but it drops to 68.6% when four endpoints are considered and even lower to 59.2% with two endpoints.

Table 2.5.1 Power of O'Brien's test, for different alternative hypotheses when the power of Hotelling's $T^2$ is 80% and the number of endpoints p = 2, 3, 4, 5 and 10.

                                             Number of endpoints
Alternative hypotheses               p = 2         p = 3         p = 4         p = 5         p = 10

all θia's are equal                  87.3% (0°)*   91.0% (0°)    93.3% (0°)    94.8% (0°)    98.1% (0°)

(p-1) θia's are equal & one θia = 0  59.2% (45°)   76.9% (35°)   84.9% (30°)   89.3% (27°)   96.9% (18°)

(p-2) θia's are equal & two = 0                    47.9% (55°)   68.6% (45°)   79.2% (39°)   95.0% (27°)

(p-3) θia's are equal & three = 0                                40.8% (60°)   62.0% (51°)   92.1% (33°)

(p-4) θia's are equal & four = 0                                               36.0% (63°)   87.7% (39°)

(p-9) θia's are equal & nine = 0                                                             24.7% (72°)

(*) Angle between $\theta^*$, the vector of alternative hypotheses, and the diagonal $\mathbf{1}$.
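The entries of Table 2.5.1 follow directly from the power formula above and can be recomputed as a check. The sketch below assumes the non-centrality values quoted in the text and $\cos\gamma = \sqrt{(p-r)/p}$; the names `NC_80` and `obrien_power_multi` are hypothetical.

```python
from math import acos, degrees, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Non-centrality parameters giving Hotelling's test 80% power, as quoted above.
NC_80 = {2: 9.63, 3: 10.90, 4: 11.94, 5: 12.83, 10: 16.25}


def obrien_power_multi(p, r, z_crit=1.96):
    """Power of O'Brien's test when p - r effects are equal and r are zero.

    Returns the power and the angle (in degrees) between the alternative
    and the diagonal.
    """
    cos_gamma = sqrt((p - r) / p)
    shift = sqrt(NC_80[p]) * cos_gamma
    power = 1 - STD_NORMAL.cdf(z_crit - shift) + STD_NORMAL.cdf(-z_crit - shift)
    return power, degrees(acos(cos_gamma))


# Recompute every row of the table, one dictionary entry per (p, r) cell.
table = {(p, r): obrien_power_multi(p, r) for p in NC_80 for r in range(p)}
```

For instance, `table[(2, 1)]` gives a power close to 59% at an angle of 45°, matching the second row of the first column.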

In the following section, the same three tests will be examined with correlated

data. This will allow the comparison of their behavior when the data are assumed to be

from $N(0, \Sigma)$ populations.

2.6 Evaluation with correlated data.

In many clinical trials, it is common to find outcome variables which are

correlated. The assumption of independence used in the previous sections does not hold

anymore. By applying a transformation to the $Z_1$ and $Z_2$ independent standard normal variables, it is possible to obtain the critical regions and power contours for Hotelling's and O'Brien's test with correlated data.

Let $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ be a matrix such that $X = AZ$ and $\Sigma = AA'$.

Let $Z = (Z_1, Z_2)'$ be a vector of independent $N(0,1)$ variables and $X = (X_1, X_2)'$ be $N(0, \Sigma)$ where $\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$.

$$\Rightarrow X_1 = aZ_1 + bZ_2, \qquad X_2 = bZ_1 + cZ_2$$

$$\Rightarrow X_i = (a + b) Z_j, \qquad X_i = (b + c) Z_j$$

$$\Rightarrow a = c. \qquad (2.6.1)$$

$$\begin{aligned}
\rho = \mathrm{Cov}(X_1, X_2) &= \mathrm{Cov}(aZ_1 + bZ_2,\; bZ_1 + cZ_2) \\
&= ab\, \mathrm{Cov}(Z_1, Z_1) + bc\, \mathrm{Cov}(Z_2, Z_2) \\
&= ab + bc \\
&= 2ab \quad (a = c) \qquad (2.6.2)
\end{aligned}$$

(2.6.1) and (2.6.2), together with the unit variance of $X_1$ ($a^2 + b^2 = 1$), $\Rightarrow$

$$a = \sqrt{\left(1 + \sqrt{1 - \rho^2}\right) / 2}, \qquad b = \rho / (2a). \qquad (2.6.3)$$

So $A = \begin{bmatrix} a & b \\ b & a \end{bmatrix}$ where $a$ and $b$ are defined in (2.6.3) and $\Sigma = AA'$.

The critical regions and power contours have been redefined using that transformation for $\rho$ = -0.5, -0.9, 0.5 and 0.9. Figures 2.6.1.1 to 2.6.2.4 show the transformed results. Bonferroni's critical regions are not affected by the correlation structure but the powers were recalculated using an algorithm that provides bivariate normal probabilities. The product of two univariate probabilities from (2.2.1) does not hold with correlated data.
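The transformation can be verified numerically for the four correlations considered. The sketch below is an illustration only: it assumes a symmetric matrix $A$ with diagonal entry $a$ and off-diagonal entry $b = \rho/(2a)$, the latter taken from the relation $\rho = 2ab$ in (2.6.2). The function names are hypothetical.

```python
from math import sqrt


def symmetric_root(rho):
    """Entries (a, b) of A = [[a, b], [b, a]] with A A' = [[1, rho], [rho, 1]]."""
    a = sqrt((1 + sqrt(1 - rho * rho)) / 2)
    b = rho / (2 * a)  # from rho = 2ab
    return a, b


def transform(z1, z2, rho):
    """Map independent standard normal values to correlated ones, X = A Z."""
    a, b = symmetric_root(rho)
    return a * z1 + b * z2, b * z1 + a * z2


# Check the unit variance and the covariance for each correlation studied.
checks = {}
for rho in (-0.5, -0.9, 0.5, 0.9):
    a, b = symmetric_root(rho)
    checks[rho] = (a * a + b * b, 2 * a * b)  # (Var(X_1), Cov(X_1, X_2))
```

Applying `transform` to each point of an uncorrelated critical region or power contour gives the corresponding point for correlated data.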

Bonferroni's and Hotelling's power contours are flattened and rotated so that the longer axis is in the direction of the correlation. O'Brien's regions do not rotate; they are shifted towards the origin as $\rho$ decreases (Figures 2.6.2.1 to 2.6.2.4). As for the uncorrelated data, O'Brien's power remains the greatest in the neighbourhood of the 45° line of quadrants I and III although the range of values for which O'Brien's power is the greatest varies widely with the degree of correlation. The more negatively correlated the variables are, the wider the range is where O'Brien's power is the best. For $\rho$ = -0.9, O'Brien's power surpasses Hotelling's for almost all the points included in the region covering 180° from the diagonal of quadrants II and IV (Figures 2.6.3.2.1 to 2.6.3.2.3). For positively correlated variables, the gain in power with O'Brien's test over Hotelling's $T^2$ is lost with only a slight deviation from the 45° line of quadrants I and III (Figures 2.6.3.3.1 to 2.6.3.4.3). The maximum gain regardless of the correlation and the value of Hotelling's power (50%, 80% or 90%) does not exceed 11%. Bonferroni's power improves around 0° with a positive correlation but greatly deteriorates when the variables are negatively correlated (Figures 2.6.3.1.1 to 2.6.3.2.3). When $\rho$ = -0.9, its power remains close to 10% for all the values of $(\theta_{1a}, \theta_{2a})$ in the first quadrant. The same low power is observed in quadrants II and IV when $\rho$ = 0.9. The conclusions reached when the variables were not correlated remain the same. The gain in power from O'Brien's or Bonferroni's test over Hotelling's $T^2$ remains overall small in comparison to the improvement achieved when Hotelling's power is the greatest of the three. However, O'Brien's test always has the best power to detect a difference when both variables are affected in exactly the same direction and the same magnitude. It is that property of O'Brien's test combined with the robustness of Hotelling's $T^2$ that motivated the investigation of a new test. It will be presented and its distribution will be derived in Chapter 3.

41

4

3

2

Z20

-1

0

-2

-3

-4 -3 -2 -1 o

Z1

2 3 4

Figure 2.6.1.1 Critical regions, rho = -0.5

Bonferroni (B) Hotelling (H) O'Brien (0)

42

4

3

2

o

-1

-2

-3

-4 "'r-----r--,--...,.-----,--,----.---,---,.-

-4 -3 -2 -1 o 1 2 3 4

Figure 2.6.1.2 Critical regions, rho = -0.9Bonferroni (B) Hotelling (H) O'Brien (0)

43

4

3

2

o

-1

-2

-3

-4 ';-----r-----,.---.,.-----r---.----,------y-----,

-4 -3 -2 -1 o 2 3 4 .

Figure 2.6.1.3 Critical regions, rho = 0.5Bonferroni (B) Hotelling (H) O'Brien (0)

44

4

3

2

o

-1

-2

-3

B

-4 -3 -2 -1 o 2 3 4

Figure 2.6.1.4 Critical regions, rho = 0.9Bonferroni (B) Hotelling (H) O'Brien (0)

45

6

5

4

3

2

9 200

-1

-2

-3

-4

-5

-6

-6 -5 -4 -3 -2 -1 0

() 10

23456

Figure 2.6.2.1 Contours for powers of 50%, 80% and90%, rho = -0.5Bonferroni (B) Hotelling (H) O'Brien (0)

46

6

5

4

3

2

() 200

-1

-2

-3

-4

-5

-6

-6 -5 -4 -3 -2 -1 0 23456

Figure 2.6.2.2 Contours for powers of 50%, 80% and 90%rho = -0.9Bonferroni (B) Hotelling (H) O'Brien (0)

47

6

B

.............:; 0

H

2

3

4

5

8 200

-1

-2

-3

-4

-5

-6

-6 -5 -4 -3 -2 -1 0 23456

Figure 2.6.2.3 Contours for powers of 50%. 80% and 90%rho = 0.5Bonferroni (B) Hotelling (H) O'Brien (0)

48

6

5

4

3

2

e200

-1

-2

-3

-4

-5

-6

·-6 -5 -4 -3 -2 -1 0

8 10

23456

Figure 2.6.2.4 Contours for powers of 50%. 80% and 90%rho = 0.9Bonferroni (B) Hotelling (H) O'Brien (0)

49

90o 45

ANGLE (deg-ees)

-450.0'r-----...------....------...------.,-

-90

0.2

0.1

0.3

1.0

0.8

0.9

0.7P00.6WE 0.5 --.,;::""",.-----,,L----------~-~,...<:;.---

R 0.4

Figure 2.6.3.1.1 Power at each one-degree sectionfor alternative hypotheses for whichthe power of Hotelling's test is 50%rho = -0.5Bonferroni (B) Hotelling (H) O'Brien (0)

50

1.0

0.9

0.8

0.7P0 0.6

W 0.5ER 0.4

0.3

0.2

O. 1

0.0-90 -45 0 45 90

ANGLE (deg-ees)


51

1.0B

O. 1

0.9~======="""""-----''''''::::':::::''''-'''------=:::::::''''~-""'''?''''''''''::::::==

0.8

0.3

0.2

0.7

~ 0.6

~ 0.5

0.4

90o 45

ANGLE (deg-ees)

-450.0"lr------r-------,.....----~----__,_

-90


52

1.0

0.9

0.8

o

B

0.1

0.3

0.2

0.7

~F? 0.6

0.5-1~~_c-------------------:.......----,L-

H0.4

90.0 45

ANGLE (deg-ees)

-450.0 l.,------.....-----~----_.__----___r

-90

Figure 2.6.3.2. 1 Power at each one-degree sectionfor alternative hypotheses for whichthe power of Hotelling's test is 50%rho = -0.9Bonferroni (B) Hotelling (H) O'Brien (0)

53

1.0

B

o

H

O. 1

0.8-l-J"r---..,..c--------------~~-r_

0.9

0.3

0.2

0.7

~ 0.6

~ 0.5

0.4

90o 45

ANGLE (deg-ees)

-4~

0.0"r------,------r--------.--------r-90

Figure 2.6.3.2.2 Power at each one-degree sectionfor alternative hypotheses for whichthe power of Hotelling's test is 80%rho =-0.9Bonferroni (B) Hotelling (H) O'Brien (0)

54

1.0B

0.9H

0.8

0.7iO.G

0.5

0.4

0.3

0.2

O. 1 O.

0.0-90 -45 0 45 90

ANGLE(deg-ees)


55

1.0

0.9

0.8

0.7

~ 0.6

~ 0.5

0.4

0.3

0.2

o. 1

H

B

o

90o 45

ANGLE(deg-ees)

-45o.o-.,..- ~----_..,._----_,_----__.

-90

Figure 2.6.3.3.1 Power at each one-degree sectionfor alternative hypotheses for whichthe power of Hotelling's test is 50%rho = 0.5Bonferroni (B) Hotelling (H) O'Brien (0)


Figure 2.6.3.4.1 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = 0.9. Axes: power versus angle (degrees). Bonferroni (B), Hotelling (H), O'Brien (O).


Figure 2.6.3.4.3 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = 0.9. Axes: power versus angle (degrees). Bonferroni (B), Hotelling (H), O'Brien (O).

CHAPTER 3

TWO-STAGE GROUP SEQUENTIAL TEST WITH MULTIPLE ENDPOINTS

3.1 Introduction

In Chapter 2, O'Brien's test was shown to be the best test to detect a difference

between treatments when the true effects of the variables are all in the same direction

predicted by O'Brien; however, the test becomes very inefficient with departure from

this prespecified direction. A group sequential approach that would allow the

investigators to look at the direction of the data through an interim analysis will be

investigated. It would relax the requirement of having all effects in the same direction

and exactly the same effect sizes and would result in a very robust test.

In this chapter, a two-stage group sequential test for randomized clinical trials with multiple endpoints will be developed to test the hypothesis $H_0: \boldsymbol{\xi} = \boldsymbol{\xi}_0$ vs $H_a: \boldsymbol{\xi} \neq \boldsymbol{\xi}_0$. Its distribution under the null and the alternative hypotheses will be derived. The proportion p of the whole sample after which the interim analysis should be done will then be examined.

3.2 The new test statistic L

Consider a two-stage group sequential design with the interim analysis performed after the accrual of $n_1$ subjects. The final analysis takes place after $n_2$ additional subjects have been included in the study, i.e. with $n = n_1 + n_2$ subjects.

Let $X_1$ and $Y_1$ be normally distributed test statistics for two uncorrelated variables based on data from the 1st accrual period with mean zero and unit variance under the null hypothesis; let $X$ and $Y$ be the same test statistics based on the data from the whole trial. Let $\mathbf{v}_1$ be the vector from the origin to $(X_1, Y_1)$, the statistics calculated from the data of the 1st accrual period, $\mathbf{v}$ be the vector from the origin to $(X, Y)$, the statistics calculated from all the data at the end of the trial, and $\mathbf{v}_2 = (\mathbf{v} - \sqrt{n_1/n}\,\mathbf{v}_1)/\sqrt{n_2/n}$. O'Brien's test is based on the length of the projection of $\mathbf{v}$ on the diagonal through the first and third quadrants, i.e. $\mathbf{a}'\mathbf{v}$ where $\mathbf{a}' = (1/\sqrt{2}, 1/\sqrt{2})$. O'Brien's test can easily be extended to be optimal for detecting any alternative that lies on a specified line in the direction of $\mathbf{a}$, where $\mathbf{a}$ may be chosen by 'experts'. However, as was noted in Chapter 2, the test can perform extremely poorly if the true effect deviates in direction from the chosen $\mathbf{a}$. Of particular concern would be the case where the data from an interim review suggest that the effect is 60° away from the direction chosen before the trial began. Do the investigators continue the trial with the knowledge that they are very likely looking in the wrong direction or do they 'cheat' and change the test in mid-trial? The proposed test is based on choosing $\mathbf{a}$ to be in the direction indicated by $\mathbf{v}_1$ from the interim analysis. The new statistic is defined as:

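$$L = \frac{\mathbf{v}_1'\mathbf{v}}{\sqrt{\mathbf{v}_1'\mathbf{v}_1}}. \tag{3.2.1}$$

Replacing $\mathbf{v}$ by $\sqrt{n_1/n}\,\mathbf{v}_1 + \sqrt{n_2/n}\,\mathbf{v}_2$,

$$L = \sqrt{n_1/n}\,\frac{\mathbf{v}_1'\mathbf{v}_1}{\sqrt{\mathbf{v}_1'\mathbf{v}_1}} + \sqrt{n_2/n}\,\frac{\mathbf{v}_1'\mathbf{v}_2}{\sqrt{\mathbf{v}_1'\mathbf{v}_1}} \tag{3.2.2}$$

$$\phantom{L} = \sqrt{p}\,L_1 + \sqrt{1-p}\,L_2, \tag{3.2.3}$$

where $p = n_1/n$, $L_1 = \sqrt{\mathbf{v}_1'\mathbf{v}_1}$ and $L_2 = \mathbf{v}_1'\mathbf{v}_2/\sqrt{\mathbf{v}_1'\mathbf{v}_1}$. The distributions of $L_1$ and $L_2$ follow immediately. Letting $\mathbf{a}' = \mathbf{v}_1'/\sqrt{\mathbf{v}_1'\mathbf{v}_1}$,

$$L_2 = \mathbf{a}'\mathbf{v}_2 \quad\text{where}\quad \mathbf{v}_2 \sim N(\mathbf{0}, \mathbf{I}) \quad\text{and}\quad L_2 \sim N(0, 1). \tag{3.2.4}$$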
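The decomposition above can be checked numerically. The following is a minimal sketch, assuming two uncorrelated endpoints; the function name `two_stage_L` and the numbers passed to it are hypothetical and serve only to illustrate that the projection form of L and the form $\sqrt{p}\,L_1 + \sqrt{1-p}\,L_2$ agree.

```python
import math

def two_stage_L(v1, v, p):
    """Return (L, L1, L2) for interim vector v1, final vector v and p = n1/n.

    v1 and v are 2-vectors of standardized, uncorrelated test statistics.
    """
    norm_v1 = math.hypot(v1[0], v1[1])
    # Second-stage vector recovered from v = sqrt(p) v1 + sqrt(1 - p) v2.
    v2 = [(v[i] - math.sqrt(p) * v1[i]) / math.sqrt(1.0 - p) for i in range(2)]
    L1 = norm_v1                                         # length of v1
    L2 = (v1[0] * v2[0] + v1[1] * v2[1]) / norm_v1       # projection of v2 on v1
    L = (v1[0] * v[0] + v1[1] * v[1]) / norm_v1          # projection of v on v1
    return L, L1, L2

# Hypothetical interim and final statistics (illustrative numbers only).
p = 0.5
L, L1, L2 = two_stage_L(v1=(3.0, 4.0), v=(4.0, 3.0), p=p)
recombined = math.sqrt(p) * L1 + math.sqrt(1.0 - p) * L2
```

Here `L` and `recombined` should coincide up to rounding error, which is the identity used in the derivations that follow.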

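3.2.1 Density function of L under $H_0$

To derive the distribution of $L = \sqrt{p}\,L_1 + \sqrt{1-p}\,L_2$ under the null hypothesis, let $K = L_1^2$. Then,

$$L_2 = \frac{L - \sqrt{pK}}{\sqrt{1-p}} \quad\text{and}\quad L_1 = \sqrt{K}, \tag{3.2.5}$$

and the Jacobian of the transformation from $(L_2, K)$ to $(L, K)$ is

$$J = \begin{vmatrix} \dfrac{1}{\sqrt{1-p}} & -\dfrac{\sqrt{p}}{2\sqrt{k}\sqrt{1-p}} \\ 0 & 1 \end{vmatrix} = \frac{1}{\sqrt{1-p}}.$$

$K = L_1^2$ has a chi-square distribution with 2 degrees of freedom and $L_2$ is distributed N(0,1). $\mathbf{v}_1$ and $\mathbf{v}_2$ are independent because they are from independent samples, and the distribution of $L_2$ given $\mathbf{v}_1$ is N(0,1) whatever the direction of $\mathbf{v}_1$, so $L_1$ and $L_2$ are also independent under $H_0$. So,

$$f_{L,K}(l, k) = \frac{1}{2}\,e^{-k/2}\;\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(l - \sqrt{pk})^2}{2(1-p)}\right\}\frac{1}{\sqrt{1-p}}, \qquad k > 0.$$

It follows that

$$\begin{aligned}
f_L(l) &= \frac{1}{2\sqrt{2\pi}\sqrt{1-p}} \int_0^\infty \exp\left\{-\frac{k}{2} - \frac{(l - \sqrt{pk})^2}{2(1-p)}\right\} dk \\
&= \frac{1}{2\sqrt{2\pi}\sqrt{1-p}} \int_0^\infty \exp\left\{-\frac{k(1-p) + l^2 - 2l\sqrt{pk} + pk}{2(1-p)}\right\} dk \\
&= \frac{1}{2\sqrt{2\pi}\sqrt{1-p}} \int_0^\infty \exp\left\{-\frac{k + l^2 - 2l\sqrt{pk}}{2(1-p)}\right\} dk \\
&= \frac{e^{-l^2/2}}{2\sqrt{1-p}} \int_0^\infty \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{(\sqrt{k} - \sqrt{p}\,l)^2}{2(1-p)}\right\} dk,
\end{aligned} \tag{3.2.6}$$

where the last line uses $k - 2l\sqrt{pk} + l^2 = (\sqrt{k} - \sqrt{p}\,l)^2 + (1-p)\,l^2$.

But

$$\int_0^\infty \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{(\sqrt{k} - \sqrt{p}\,l)^2}{2(1-p)}\right\} dk$$

can be evaluated with the following transformation. Let

$$u = \frac{\sqrt{k} - \sqrt{p}\,l}{\sqrt{1-p}};$$

it follows that $\sqrt{k} = \sqrt{1-p}\,u + \sqrt{p}\,l$ and

$$du = \frac{1}{2\sqrt{k}\sqrt{1-p}}\,dk.$$

If $k = 0$ then $u = -\sqrt{p}\,l/\sqrt{1-p}$, and if $k = \infty$ then $u = \infty$. We then have

$$\begin{aligned}
\int_0^\infty \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{(\sqrt{k} - \sqrt{p}\,l)^2}{2(1-p)}\right\} dk
&= \int_{-\sqrt{p}\,l/\sqrt{1-p}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\; 2\left(\sqrt{1-p}\,u + \sqrt{p}\,l\right)\sqrt{1-p}\; du \\
&= \frac{2(1-p)}{\sqrt{2\pi}} \int_{-\sqrt{p}\,l/\sqrt{1-p}}^{\infty} u\, e^{-u^2/2}\, du + 2\sqrt{p(1-p)}\; l \int_{-\sqrt{p}\,l/\sqrt{1-p}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du \\
&= \frac{2(1-p)}{\sqrt{2\pi}} \exp\left\{-\frac{p\,l^2}{2(1-p)}\right\} + 2\sqrt{p(1-p)}\; l\; \Phi\!\left(\frac{\sqrt{p}\,l}{\sqrt{1-p}}\right).
\end{aligned}$$

So,

$$f_L(l) = \frac{\sqrt{1-p}}{\sqrt{2\pi}} \exp\left\{-\frac{l^2}{2(1-p)}\right\} + \sqrt{p}\; l\; e^{-l^2/2}\; \Phi\!\left(\frac{\sqrt{p}\,l}{\sqrt{1-p}}\right), \tag{3.2.7}$$

where $\Phi$ is the standard normal distribution function.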

The density function of L for p = 0.1, 0.5 and 0.9 is graphed in Figures 3.2.1.1 to 3.2.1.3, respectively. Note that when p = 0, $L \sim N(0,1)$ and when p = 1, L is equivalent to Hotelling's $T^2$.

The critical region, for a fixed p, was determined by finding the value $l_c$ for which $\int_{-l_c}^{l_c} f_L(l)\,dl = 0.95$. Euler's method of numerical integration was used iteratively and the results are shown in Table 3.2.1.1.
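As an illustration of how such critical values can be obtained, the sketch below integrates the null density (3.2.7) numerically and solves for $l_c$ by bisection. It is not the original computation: the text reports Euler's method, whereas this sketch assumes a composite Simpson rule, and all function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def density_L(l, p):
    """Null density of L in the form of (3.2.7); valid for 0 <= p < 1."""
    q = 1.0 - p
    gaussian_part = math.sqrt(q / (2.0 * math.pi)) * math.exp(-l * l / (2.0 * q))
    skew_part = (math.sqrt(p) * l * math.exp(-l * l / 2.0)
                 * std_normal_cdf(math.sqrt(p / q) * l))
    return gaussian_part + skew_part

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def critical_value(p, alpha=0.05, upper=8.0, tol=1e-8):
    """Bisection for l_c such that P(-l_c <= L <= l_c) = 1 - alpha under H0."""
    def coverage(c):
        return simpson(lambda l: density_L(l, p), -c, c)
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lc_p0 = critical_value(0.0)    # p = 0: L is standard normal
lc_p50 = critical_value(0.5)   # compare with the p = 0.50 row of Table 3.2.1.1
```

Under these assumptions the values should agree with the tabulated 1.96 and 2.3027 to about three decimal places; small differences are expected from the different integration scheme.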

3.2.2 Power of L

3.2.2.1 Special case: $p = n_1/n = 0.5$ and $\mu_2 = 0$.

To evaluate the power of L, let $X_1 \sim N(\sqrt{n_1}\,\mu_1, 1)$, $Y_1 \sim N(\sqrt{n_1}\,\mu_2, 1)$, $X_2 \sim N(\sqrt{n_2}\,\mu_1, 1)$ and $Y_2 \sim N(\sqrt{n_2}\,\mu_2, 1)$. The asymptotic distribution will be derived assuming $\mu_1$ and $\mu_2$ are of $O(n^{-1/2})$. Let $\mu_2 = 0$ and $\mu = \sqrt{n_1}\,\mu_1 = \sqrt{n_2}\,\mu_1$,

Figure 3.2.1.1 Density of L, p = 0.1. Axes: f(l) versus l.

Figure 3.2.1.2 Density of L, p = 0.5. Axes: f(l) versus l.

Figure 3.2.1.3 Density of L, p = 0.9. Axes: f(l) versus l.

Table 3.2.1.1. Critical values $l_c$ and power for L, $\alpha$ = 0.05.

p       l_c      power

0.00    1.96     0.5275
0.05    2.0079   0.5821
0.10    2.0526   0.6276
0.15    2.0941   0.6642
0.20    2.1325   0.6935
0.25    2.1677   0.7169
0.30    2.2000   0.7355
0.35    2.2294   0.7503
0.40    2.2561   0.7620
0.45    2.2805   0.7711
0.50    2.3027   0.7783
0.55    2.3229   0.7838
0.60    2.3415   0.7881
0.65    2.3584   0.7914
0.70    2.3741   0.7938
0.75    2.3886   0.7956
0.80    2.4020   0.7970
0.85    2.4146   0.7979
0.90    2.4264   0.7984
0.95    2.4376   0.7987
1.00    2.4474   0.8000

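then

$$X_1 \sim N(\mu, 1), \quad X_2 \sim N(\mu, 1), \quad Y_1 \sim N(0, 1), \quad Y_2 \sim N(0, 1).$$

Consequently,

$$L = \sqrt{n_1/n}\,\sqrt{X_1^2 + Y_1^2} + \sqrt{n_2/n}\;\frac{X_1X_2 + Y_1Y_2}{\sqrt{X_1^2 + Y_1^2}} \tag{3.2.8}$$

$$\phantom{L} = \sqrt{p}\,\sqrt{X_1^2 + Y_1^2} + \sqrt{1-p}\;\frac{X_1X_2 + Y_1Y_2}{\sqrt{X_1^2 + Y_1^2}}. \tag{3.2.9}$$

The general case, using $p = n_1/n$, will be developed later. In a first step, $n_1$ and $n_2$ will be assumed equal to $n/2$ so that, from (3.2.8),

$$L = \frac{1}{\sqrt{2}}\sqrt{X_1^2 + Y_1^2} + \frac{X_1X_2 + Y_1Y_2}{\sqrt{2}\,\sqrt{X_1^2 + Y_1^2}}. \tag{3.2.10}$$

The joint density of the four statistics is

$$f_{X_1,X_2,Y_1,Y_2}(x_1, x_2, y_1, y_2) = \frac{1}{(2\pi)^2} \exp\left\{-\tfrac{1}{2}\left[(x_1 - \mu)^2 + (x_2 - \mu)^2 + y_1^2 + y_2^2\right]\right\}. \tag{3.2.11}$$

Letting $R = \sqrt{X_1^2 + Y_1^2}$,

$$\sqrt{2}\,L = R + \frac{X_1X_2 + Y_1Y_2}{R} \;\Rightarrow\; Y_2 = \frac{\sqrt{2}\,RL - R^2 - X_1X_2}{Y_1} \quad\text{and}\quad \frac{\partial Y_2}{\partial L} = \frac{\sqrt{2}\,R}{Y_1}. \tag{3.2.12}$$

The density of L is:

$$f_L(l) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{\sqrt{2}\,r}{|y_1|}\,\frac{1}{(2\pi)^2} \exp\left\{-\tfrac{1}{2}A\right\} dx_2\, dy_1\, dx_1, \tag{3.2.13}$$

where $r = \sqrt{x_1^2 + y_1^2}$ and

$$\begin{aligned}
A &= (x_1 - \mu)^2 + (x_2 - \mu)^2 + y_1^2 + \left(\frac{\sqrt{2}\,rl - r^2 - x_1x_2}{y_1}\right)^2 \\
&= \left\{x_2\frac{r}{y_1} - \left(\frac{\mu y_1}{r} + \frac{x_1}{y_1}(\sqrt{2}\,l - r)\right)\right\}^2 - \left(\frac{\mu y_1}{r} + \frac{x_1}{y_1}(\sqrt{2}\,l - r)\right)^2 + \frac{r^2}{y_1^2}\left(y_1^2 + (\sqrt{2}\,l - r)^2\right) + 2\mu^2 - 2x_1\mu \\
&= \left\{x_2\frac{r}{y_1} - \left(\frac{\mu y_1}{r} + \frac{x_1}{y_1}(\sqrt{2}\,l - r)\right)\right\}^2 + 2l^2 - 2\sqrt{2}\,l\left(r + \frac{\mu x_1}{r}\right) - \frac{\mu^2 y_1^2}{r^2} + 2\mu^2 + 2r^2 \\
&= \left\{x_2\frac{r}{y_1} - \left(\frac{\mu y_1}{r} + \frac{x_1}{y_1}(\sqrt{2}\,l - r)\right)\right\}^2 + B
\end{aligned}$$

and

$$B = 2l^2 - 2\sqrt{2}\,l\left(r + \frac{\mu x_1}{r}\right) - \frac{\mu^2 y_1^2}{r^2} + 2\mu^2 + 2r^2.$$

By completing the square,

$$\begin{aligned}
B &= 2l^2 - 2\sqrt{2}\,l\left(r + \frac{\mu x_1}{r}\right) - \frac{\mu^2 y_1^2}{r^2} + 2\mu^2 + 2r^2 \\
&= \left\{\sqrt{2}\,l - \left(r + \frac{\mu x_1}{r}\right)\right\}^2 - \left(r + \frac{\mu x_1}{r}\right)^2 - \frac{\mu^2 y_1^2}{r^2} + 2\mu^2 + 2r^2 \\
&= \left\{\sqrt{2}\,l - \left(r + \frac{\mu x_1}{r}\right)\right\}^2 + (\mu - x_1)^2 + y_1^2.
\end{aligned}$$

Integrating out $x_2$ leaves only the term B in the exponent. So the power at the critical value $l_c$ is $F_L(-l_c) + (1 - F_L(l_c))$ where

$$F_L(l_c) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{l_c} \frac{\sqrt{2}}{(2\pi)^{3/2}} \exp\left\{-\tfrac{1}{2}\left[\left(\sqrt{2}\,l - \left(r + \frac{\mu x_1}{r}\right)\right)^2 + (\mu - x_1)^2 + y_1^2\right]\right\} dl\, dy_1\, dx_1. \tag{3.2.17}$$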

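The more general case where $p = n_1/n$ and $\mu^* = \sqrt{n}\,\mu_1$ will now be developed.

3.2.2.2 General case: $p = n_1/n$ and $\mu^* = \sqrt{n}\,\mu_1$.

From (3.2.9),

$$L = \sqrt{p}\,\sqrt{X_1^2 + Y_1^2} + \sqrt{1-p}\;\frac{X_1X_2 + Y_1Y_2}{\sqrt{X_1^2 + Y_1^2}}.$$

Then, $\sqrt{n_1}\,\mu_1 = \sqrt{pn}\,\mu_1 = \sqrt{p}\,\mu^*$ and $\sqrt{n_2}\,\mu_1 = \sqrt{(1-p)n}\,\mu_1 = \sqrt{1-p}\,\mu^*$ so,

$$f_{X_1,X_2,Y_1,Y_2}(x_1, x_2, y_1, y_2) = \frac{1}{(2\pi)^2} \exp\left\{-\tfrac{1}{2}\left[(x_1 - \sqrt{p}\,\mu^*)^2 + (x_2 - \sqrt{1-p}\,\mu^*)^2 + y_1^2 + y_2^2\right]\right\}. \tag{3.2.19}$$

Letting $R = \sqrt{X_1^2 + Y_1^2}$,

$$Y_2 = \frac{RL - \sqrt{p}\,R^2 - \sqrt{1-p}\,X_1X_2}{\sqrt{1-p}\;Y_1} \quad\text{and}\quad \frac{\partial Y_2}{\partial L} = \frac{R}{\sqrt{1-p}\;Y_1}.$$

The density of L is:

$$f_L(l) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{r}{\sqrt{1-p}\,|y_1|}\,\frac{1}{(2\pi)^2} \exp\left\{-\tfrac{1}{2}A\right\} dx_2\, dy_1\, dx_1, \tag{3.2.20}$$

where $r = \sqrt{x_1^2 + y_1^2}$ and

$$\begin{aligned}
A &= (x_1 - \sqrt{p}\,\mu^*)^2 + (x_2 - \sqrt{1-p}\,\mu^*)^2 + y_1^2 + \frac{\left(rl - \sqrt{p}\,r^2 - \sqrt{1-p}\,x_1x_2\right)^2}{(1-p)\,y_1^2} \\
&= x_2^2\frac{r^2}{y_1^2} - 2x_2\frac{r}{y_1}\left(\frac{\sqrt{1-p}\,\mu^* y_1}{r} + \frac{x_1(l - \sqrt{p}\,r)}{\sqrt{1-p}\,y_1}\right) + \frac{r^2}{(1-p)\,y_1^2}\left((1-p)\,y_1^2 + (l - \sqrt{p}\,r)^2\right) + \mu^{*2} - 2x_1\sqrt{p}\,\mu^* \\
&= \left\{x_2\frac{r}{y_1} - \left(\frac{\sqrt{1-p}\,\mu^* y_1}{r} + \frac{x_1(l - \sqrt{p}\,r)}{\sqrt{1-p}\,y_1}\right)\right\}^2 - \left(\frac{\sqrt{1-p}\,\mu^* y_1}{r} + \frac{x_1(l - \sqrt{p}\,r)}{\sqrt{1-p}\,y_1}\right)^2 \\
&\qquad + \frac{r^2}{(1-p)\,y_1^2}\left((1-p)\,y_1^2 + (l - \sqrt{p}\,r)^2\right) + \mu^{*2} - 2x_1\sqrt{p}\,\mu^* \\
&= \left\{x_2\frac{r}{y_1} - \left(\frac{\sqrt{1-p}\,\mu^* y_1}{r} + \frac{x_1(l - \sqrt{p}\,r)}{\sqrt{1-p}\,y_1}\right)\right\}^2 + B.
\end{aligned}$$

By completing the square,

$$\begin{aligned}
B &= \frac{l^2}{1-p} - 2l\left(\frac{\sqrt{p}\,r}{1-p} + \frac{\mu^* x_1}{r}\right) - \frac{(1-p)\,\mu^{*2} y_1^2}{r^2} + \mu^{*2} + \frac{r^2}{1-p} \\
&= \frac{1}{1-p}\left\{\left(l - \left(\sqrt{p}\,r + \frac{(1-p)\,\mu^* x_1}{r}\right)\right)^2 - \left(\sqrt{p}\,r + \frac{(1-p)\,\mu^* x_1}{r}\right)^2 - \frac{(1-p)^2\mu^{*2} y_1^2}{r^2} + (1-p)\,\mu^{*2} + r^2\right\} \\
&= \frac{1}{1-p}\left(l - \left(\sqrt{p}\,r + \frac{(1-p)\,\mu^* x_1}{r}\right)\right)^2 + (\sqrt{p}\,\mu^* - x_1)^2 + y_1^2.
\end{aligned}$$

So the power at the critical value $l_c$ is $F_L(-l_c) + (1 - F_L(l_c))$ where

$$F_L(l_c) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{l_c} \frac{1}{\sqrt{1-p}\,(2\pi)^{3/2}} \exp\left\{-\tfrac{1}{2}\left[\frac{1}{1-p}\left(l - \left(\sqrt{p}\,r + \frac{(1-p)\,\mu^* x_1}{r}\right)\right)^2 + (\sqrt{p}\,\mu^* - x_1)^2 + y_1^2\right]\right\} dl\, dy_1\, dx_1. \tag{3.2.26}$$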

The power of this test, when the power of Hotelling's $T^2$ is 80%, was obtained by evaluating $F_L(-l_c) + (1 - F_L(l_c))$ where $\mu^* = 3.1064$ and $l_c$ was taken from Table 3.2.1.1 for each value of p. For example, when p = 0.10, $l_c = 2.0526$. When Hotelling's power is 80%, the radius of its power contour is 3.1064. As expected, without allowing for early stopping, the power of the new test is not superior to that of Hotelling's $T^2$. If $\mathbf{v}_1$ is a short vector, i.e. the statistics observed at the interim analysis do not deviate substantially from the null hypothesis, then the direction of the vector $\mathbf{v}_2$ can vary greatly, as can its projection on $\mathbf{v}_1$, which affects the power of the new test. Allowing for early acceptance of the null hypothesis takes care of this situation and improves the power.
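The power integral can also be approximated by simulation. The sketch below is a Monte Carlo check rather than the numerical integration described above; it assumes the sampling model of (3.2.19) with $\mu_2 = 0$, a two-sided rejection region $|L| > l_c$, and hypothetical names (`simulate_power`, `n_sim`, `seed`).

```python
import math
import random

def simulate_power(p, l_c, mu_star, n_sim=100_000, seed=1):
    """Estimate P(|L| > l_c) when the effect lies along the first endpoint.

    Stage statistics follow (3.2.19): X1 ~ N(sqrt(p) mu*, 1),
    X2 ~ N(sqrt(1 - p) mu*, 1), Y1 and Y2 ~ N(0, 1), all independent.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x1 = rng.gauss(math.sqrt(p) * mu_star, 1.0)
        y1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(math.sqrt(1.0 - p) * mu_star, 1.0)
        y2 = rng.gauss(0.0, 1.0)
        r = math.hypot(x1, y1)
        L = math.sqrt(p) * r + math.sqrt(1.0 - p) * (x1 * x2 + y1 * y2) / r
        if abs(L) > l_c:
            rejections += 1
    return rejections / n_sim

# p = 0.50 row of Table 3.2.1.1, with mu* = 3.1064 as quoted in the text.
power_p50 = simulate_power(p=0.5, l_c=2.3027, mu_star=3.1064)
```

With 100,000 replicates the standard error is roughly 0.001, so the estimate should sit close to the tabulated power of 0.7783 for p = 0.50.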

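3.2.3 Distribution of L, allowing for early stopping.

In the previous section, the distribution and power of L have been derived without allowing for early stopping. However, if $L_1$ is very small, the possibility of not rejecting $H_0$ must be considered. In the same manner, if $L_1$ is very large, $H_0$ could be rejected and the trial stopped. Let $c_1$, $c_2$ and $l_c$ be three critical values such that:

Pr (rejecting $H_0$ at the interim analysis) = Pr ($L_1 > c_1$)

and

Pr (going to the 2nd stage and rejecting $H_0$) = Pr ($c_2 < L_1 < c_1$ and $|L| > l_c$).

Assume $c_1 = \infty$, so that the probability of rejecting $H_0$ at the interim analysis is zero. Then, modifying (3.2.6) so that the integration over k starts at $c_2^2$ instead of 0, the density of L over the continuation region $L_1 > c_2$ is

$$f_L(l) = e^{-l^2/2}\left\{\frac{\sqrt{1-p}}{\sqrt{2\pi}} \exp\left[-\frac{(c_2 - \sqrt{p}\,l)^2}{2(1-p)}\right] + \sqrt{p}\; l\; \Phi\!\left(-\frac{c_2 - \sqrt{p}\,l}{\sqrt{1-p}}\right)\right\}. \tag{3.3.1}$$

For a fixed p, and $c_2$ such that $P(L_1 < c_2) = P_1$, the probability of rejecting the null hypothesis is:

$$\Pr(\text{rejecting } H_0) = 1 - P_1 - P_2, \tag{3.3.2}$$

where

$$P_1 = \Pr(L_1 < c_2) = 1 - e^{-c_2^2/2}$$

and

$$P_2 = \Pr(L_1 \ge c_2 \text{ and } |L| \le l_c) = \int_{-l_c}^{l_c} f_L(l)\, dl.$$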

Table 3.2.3.1 shows the critical values $l_c$ for which the above probability is 0.05 as $p$ ($= n_1/n$) and $P_1$ vary from 0 to 0.90 by increments of 0.10. The power under the alternative hypothesis for which Hotelling's power is 80% is presented in Table 3.2.3.2. The greatest power, 89.3%, is observed at $p = n_1/n = 0.50$ and $P_1 = P(L_1 < c_2) = 0.70$. In that neighbourhood, for p between 0.40 and 0.60 and $P_1$ between 0.5 and 0.8, the power is approximately 88%. It would seem reasonable and intuitive to consider $p = P_1 = 0.50$ in the planning of a trial. Finally, from Table 3.2.3.3, it can be seen that the greatest power for the new test, 63.4%, for alternative hypotheses for which Hotelling's power is 50%, is obtained when p = 0.50 and $P_1 = 0.70$. Figures 3.2.3.1 and 3.2.3.2 show the added line for the power of L when $P_1 = 0.7$ and p = 0.5. The power of the new test L is constant for alternative hypotheses over the whole 180° region and it is also greater than the power of the other three tests for the same alternative hypotheses. The gain in power may be explained by the O'Brien 'type' approach used with the new test, but part of the improvement may also be due to the use of a two-stage design compared to a one-stage design for the other three tests.
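The early-acceptance boundary and the second-stage critical value can be sketched numerically as follows. This is an illustration under stated assumptions, not the original computation: it assumes $c_1 = \infty$, uses the fact that $L_1$ is the length of a bivariate standard normal vector under $H_0$ (so $P(L_1 < c_2) = 1 - e^{-c_2^2/2}$), takes the restricted density in the form of (3.3.1), and uses a Simpson rule with bisection. All function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def c2_from_P1(P1):
    """Boundary c2 with P(L1 < c2) = P1 when L1^2 is chi-square with 2 df."""
    return math.sqrt(-2.0 * math.log(1.0 - P1))

def sub_density_L(l, p, c2):
    """Density of L restricted to the continuation region L1 > c2, as in (3.3.1)."""
    q = 1.0 - p
    u0 = (c2 - math.sqrt(p) * l) / math.sqrt(q)
    return math.exp(-l * l / 2.0) * (
        math.sqrt(q / (2.0 * math.pi)) * math.exp(-u0 * u0 / 2.0)
        + math.sqrt(p) * l * std_normal_cdf(-u0)
    )

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def critical_value_early_stop(p, P1, alpha=0.05, upper=8.0, tol=1e-8):
    """Bisection for l_c with P(L1 >= c2, |L| <= l_c) = 1 - alpha - P1."""
    c2 = c2_from_P1(P1)
    target = 1.0 - alpha - P1   # corresponds to the P2 column of Table 3.2.3.1
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if simpson(lambda l: sub_density_L(l, p, c2), -mid, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c2_70 = c2_from_P1(0.70)
lc_a = critical_value_early_stop(p=0.0, P1=0.50)
lc_b = critical_value_early_stop(p=0.5, P1=0.70)
```

Under these assumptions the results should be close to the corresponding entries of Table 3.2.3.1 (1.5518 for $c_2$, and 1.6450 and 2.1940 for $l_c$). The sketch requires $P_1 < 1 - \alpha$ and $p < 1$.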

Table 3.2.3.1 Critical values $l_c$ and power for L, allowing for early stopping, $\alpha$ = 0.05.

p      P1     c2       l_c      P2

0.00   0.00   0.0000   1.9600   0.95
0.00   0.10   0.4590   1.9150   0.85
0.00   0.20   0.6680   1.8630   0.75
0.00   0.30   0.8446   1.8030   0.65
0.00   0.40   1.0108   1.7320   0.55
0.00   0.50   1.1774   1.6450   0.45
0.00   0.60   1.3537   1.5350   0.35
0.00   0.70   1.5518   1.3830   0.25
0.00   0.80   1.7941   1.1510   0.15
0.00   0.90   2.1460   0.6750   0.05

0.10   0.00   0.0000   2.0530   0.95
0.10   0.10   0.4590   2.0240   0.85
0.10   0.20   0.6680   1.9890   0.75
0.10   0.30   0.8446   1.9470   0.65
0.10   0.40   1.0108   1.8960   0.55
0.10   0.50   1.1774   1.8300   0.45
0.10   0.60   1.3537   1.7410   0.35
0.10   0.70   1.5518   1.6130   0.25
0.10   0.80   1.7941   1.3990   0.15
0.10   0.90   2.1460   0.8950   0.05

0.20   0.00   0.0000   2.1330   0.95
0.20   0.10   0.4590   2.1150   0.85
0.20   0.20   0.6680   2.0940   0.75
0.20   0.30   0.8446   2.0660   0.65
0.20   0.40   1.0108   2.0300   0.55
0.20   0.50   1.1774   1.9810   0.45
0.20   0.60   1.3537   1.9130   0.35
0.20   0.70   1.5518   1.8090   0.25
0.20   0.80   1.7941   1.6260   0.15
0.20   0.90   2.1460   1.1510   0.05

0.30   0.00   0.0000   2.2000   0.95
0.30   0.10   0.4590   2.1910   0.85
0.30   0.20   0.6680   2.1780   0.75
0.30   0.30   0.8446   2.1600   0.65
0.30   0.40   1.0108   2.1360   0.55
0.30   0.50   1.1774   2.1010   0.45
0.30   0.60   1.3537   2.0490   0.35
0.30   0.70   1.5518   1.9660   0.25
0.30   0.80   1.7941   1.8140   0.15
0.30   0.90   2.1460   1.3940   0.05

0.40   0.00   0.0000   2.2560   0.95
0.40   0.10   0.4590   2.2520   0.85
0.40   0.20   0.6680   2.2450   0.75
0.40   0.30   0.8446   2.2340   0.65
0.40   0.40   1.0108   2.2180   0.55
0.40   0.50   1.1774   2.1940   0.45
0.40   0.60   1.3537   2.1560   0.35
0.40   0.70   1.5518   2.0920   0.25
0.40   0.80   1.7941   1.9690   0.15
0.40   0.90   2.1460   1.6060   0.05

0.50   0.00   0.0000   2.3030   0.95
0.50   0.10   0.4590   2.3010   0.85
0.50   0.20   0.6680   2.2980   0.75
0.50   0.30   0.8446   2.2920   0.65
0.50   0.40   1.0108   2.2830   0.55
0.50   0.50   1.1774   2.2670   0.45
0.50   0.60   1.3537   2.2410   0.35
0.50   0.70   1.5518   2.1940   0.25
0.50   0.80   1.7941   2.0980   0.15
0.50   0.90   2.1460   1.7940   0.05

0.60   0.00   0.0000   2.3420   0.95
0.60   0.10   0.4590   2.3410   0.85
0.60   0.20   0.6680   2.3400   0.75
0.60   0.30   0.8446   2.3380   0.65
0.60   0.40   1.0108   2.3330   0.55
0.60   0.50   1.1774   2.3240   0.45
0.60   0.60   1.3537   2.3080   0.35
0.60   0.70   1.5518   2.2770   0.25
0.60   0.80   1.7941   2.2070   0.15
0.60   0.90   2.1460   1.9620   0.05

0.70   0.00   0.0000   2.3740   0.95
0.70   0.10   0.4590   2.3740   0.85
0.70   0.20   0.6680   2.3740   0.75
0.70   0.30   0.8446   2.3730   0.65
0.70   0.40   1.0108   2.3720   0.55
0.70   0.50   1.1774   2.3680   0.45
0.70   0.60   1.3537   2.3610   0.35
0.70   0.70   1.5518   2.3430   0.25
0.70   0.80   1.7941   2.2980   0.15
0.70   0.90   2.1460   2.1150   0.05

0.80   0.00   0.0000   2.4020   0.95
0.80   0.10   0.4590   2.4020   0.85
0.80   0.20   0.6680   2.4020   0.75
0.80   0.30   0.8446   2.4020   0.65
0.80   0.40   1.0108   2.4020   0.55
0.80   0.50   1.1774   2.4010   0.45
0.80   0.60   1.3537   2.3990   0.35
0.80   0.70   1.5518   2.3930   0.25
0.80   0.80   1.7941   2.3710   0.15
0.80   0.90   2.1460   2.2530   0.05

0.90   0.00   0.0000   2.4270   0.95
0.90   0.10   0.4590   2.4270   0.85
0.90   0.20   0.6680   2.4270   0.75
0.90   0.30   0.8446   2.4270   0.65
0.90   0.40   1.0108   2.4270   0.55
0.90   0.50   1.1774   2.4270   0.45
0.90   0.60   1.3537   2.4270   0.35
0.90   0.70   1.5518   2.4260   0.25
0.90   0.80   1.7941   2.4220   0.15
0.90   0.90   2.1460   2.3750   0.05

Table 3.2.3.2 Power of L, allowing for early stopping, for alternatives for which Hotelling's power is 80%.

                              P(L1 < c2)
p      0.00   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90

0.10   0.628  0.651  0.670  0.683  0.690  0.691  0.687  0.684  0.681  0.650
0.20   0.694  0.721  0.746  0.767  0.783  0.793  0.797  0.795  0.791  0.764
0.30   0.736  0.763  0.789  0.812  0.831  0.846  0.854  0.854  0.843  0.808
0.40   0.762  0.787  0.811  0.833  0.853  0.869  0.880  0.882  0.872  0.831
0.50   0.778  0.800  0.820  0.841  0.859  0.875  0.887  0.893  0.885  0.845
0.60   0.788  0.805  0.822  0.839  0.855  0.870  0.882  0.890  0.887  0.852
0.70   0.794  0.807  0.819  0.832  0.845  0.857  0.868  0.877  0.879  0.853
0.80   0.797  0.805  0.814  0.823  0.831  0.840  0.848  0.855  0.860  0.847
0.90   0.798  0.803  0.807  0.811  0.815  0.820  0.824  0.828  0.832  0.831

Table 3.2.3.3 Power of L, allowing for early stopping, for alternatives for which Hotelling's power is 50%.

                              P(L1 < c2)
p      0.00   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90

0.10   0.391  0.416  0.441  0.465  0.487  0.504  0.513  0.509  0.487  0.457
0.20   0.428  0.455  0.483  0.510  0.537  0.560  0.578  0.583  0.569  0.515
0.30   0.454  0.481  0.508  0.536  0.564  0.591  0.612  0.623  0.611  0.551
0.40   0.468  0.493  0.518  0.545  0.571  0.597  0.619  0.633  0.627  0.565
0.50   0.484  0.505  0.527  0.550  0.574  0.598  0.619  0.634  0.632  0.578
0.60   0.492  0.509  0.527  0.546  0.565  0.585  0.603  0.618  0.621  0.579
0.70   0.497  0.510  0.523  0.537  0.552  0.566  0.581  0.594  0.599  0.573
0.80   0.500  0.508  0.517  0.526  0.535  0.545  0.555  0.564  0.570  0.559
0.90   0.500  0.505  0.509  0.514  0.518  0.523  0.527  0.532  0.536  0.536

Figure 3.2.3.1 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%. Axes: power versus angle (degrees). Bonferroni (B), Hotelling (H), O'Brien (O), New test (L).

Figure 3.2.3.2 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%. Axes: power versus angle (degrees). Bonferroni (B), Hotelling (H), O'Brien (O), New test (L).

3.3 Transformation for correlated data

The new test presented above has been developed for uncorrelated outcomes. In

clinical trials, correlated outcomes are often encountered. In that case, it is necessary to

convert them into uncorrelated outcomes before the new test can be used.

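Let $\mathbf{y}$ be a vector distributed as $N(\mathbf{0}, \boldsymbol{\Sigma})$ that is to be transformed into $\mathbf{z}$, a vector of uncorrelated N(0,1) variables. There exists a matrix $\mathbf{A}$ equal to the "square root" of $\boldsymbol{\Sigma}^{-1}$ such that $\mathbf{z} = \mathbf{A}\mathbf{y}$. By definition, $\mathbf{A}$ is the "square root" of the positive definite matrix $\boldsymbol{\Sigma}^{-1}$ if $\mathbf{A}'\mathbf{A} = \boldsymbol{\Sigma}^{-1}$, so that the covariance matrix of $\mathbf{z}$ is $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}' = \mathbf{I}$. Methods to evaluate $\mathbf{A}$ have already been derived (Graybill, 1969).

This transformation must be applied to the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ before the new test can be used if their components are correlated.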
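For two endpoints with unit variances and correlation $\rho$, one possible choice of $\mathbf{A}$ is the symmetric inverse square root of $\boldsymbol{\Sigma}$, which has a closed form. The sketch below assumes that choice (other square roots, such as a Cholesky factor, also satisfy $\mathbf{A}'\mathbf{A} = \boldsymbol{\Sigma}^{-1}$); the function names are hypothetical, and $\rho = 0.14$ is used purely as an illustrative value.

```python
import math

def inv_sqrt_corr_2x2(rho):
    """Symmetric A with A'A = Sigma^{-1} for Sigma = [[1, rho], [rho, 1]].

    Sigma has eigenvalues 1 + rho and 1 - rho along (1, 1) and (1, -1),
    so Sigma^{-1/2} has eigenvalues (1 + rho)^{-1/2} and (1 - rho)^{-1/2}.
    """
    a = 1.0 / math.sqrt(1.0 + rho)
    b = 1.0 / math.sqrt(1.0 - rho)
    diag, off = 0.5 * (a + b), 0.5 * (a - b)
    return [[diag, off], [off, diag]]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    """Transpose of a 2x2 matrix stored as nested lists."""
    return [[A[j][i] for j in range(2)] for i in range(2)]

rho = 0.14                                  # illustrative correlation
Sigma = [[1.0, rho], [rho, 1.0]]
A = inv_sqrt_corr_2x2(rho)
whitened_cov = matmul(matmul(A, Sigma), transpose(A))   # should be the identity
```

Applying `A` to the correlated statistics gives components with identity covariance, which is the condition required by the derivation of L.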

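3.4 Using the new test L with more than two endpoints

Let $\mathbf{v}_1^*$ (r×1) be the vector of normally distributed statistics from the data at the end of the first accrual period where r, the number of endpoints, is greater than 2. Let $\mathbf{v}^*$ (r×1) be the corresponding vector from all the data at the end of the trial and $\mathbf{v}_2^* = (\mathbf{v}^* - \sqrt{p}\,\mathbf{v}_1^*)/\sqrt{1-p}$. Define

$$L^* = \frac{\mathbf{v}_1^{*\prime}\mathbf{v}^*}{\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}} = \sqrt{p}\,\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*} + \sqrt{1-p}\;\frac{\mathbf{v}_1^{*\prime}\mathbf{v}_2^*}{\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}}.$$

There exists an orthonormal matrix $\mathbf{A}$ such that $\mathbf{v}_1 = \mathbf{A}'\mathbf{v}_1^*$ and $\mathbf{v}_2 = \mathbf{A}'\mathbf{v}_2^*$.

Using the Gram-Schmidt orthogonalization procedure, we can construct two orthonormal vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ that are linear combinations of $\mathbf{v}_1^*$ and $\mathbf{v}_2^*$. Let $\mathbf{g}_1$ and $\mathbf{g}_2$ be two linear functions of $\mathbf{v}_1^*$ and $\mathbf{v}_2^*$ such that $\mathbf{a}_1$ and $\mathbf{a}_2$ will be the normalized $\mathbf{g}_i$, i.e. $\mathbf{a}_i = \mathbf{g}_i/\|\mathbf{g}_i\|$, i = 1, 2.

Let's start by choosing $\mathbf{g}_1 = \mathbf{v}_1^*$; then

$$\mathbf{a}_1 = \frac{\mathbf{g}_1}{\|\mathbf{g}_1\|} = \frac{\mathbf{v}_1^*}{\|\mathbf{v}_1^*\|} = \frac{\mathbf{v}_1^*}{\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}}.$$

Next, let's choose $\mathbf{g}_2 = \mathbf{v}_2^* - \theta_1\mathbf{a}_1$ where $\theta_1$ is such that $\mathbf{a}_1$ and $\mathbf{g}_2$ are orthogonal. Then, $\mathbf{a}_1'\mathbf{g}_2 = \mathbf{a}_1'\mathbf{v}_2^* - \theta_1 = 0$, so $\theta_1 = \mathbf{a}_1'\mathbf{v}_2^*$. So,

$$\mathbf{g}_2 = \mathbf{v}_2^* - (\mathbf{a}_1'\mathbf{v}_2^*)\,\mathbf{a}_1 = \mathbf{v}_2^* - \frac{(\mathbf{v}_1^{*\prime}\mathbf{v}_2^*)\,\mathbf{v}_1^*}{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}$$

and $\mathbf{a}_2 = \mathbf{g}_2/\|\mathbf{g}_2\|$. Therefore, $\mathbf{A} = \{\mathbf{a}_1, \mathbf{a}_2\}$ is an orthonormal matrix and

$$\mathbf{v}_1 = \mathbf{A}'\mathbf{v}_1^* = \{\mathbf{a}_1'\mathbf{v}_1^*,\; \mathbf{a}_2'\mathbf{v}_1^*\}' = \left\{\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*},\; 0\right\}'$$

and

$$\mathbf{v}_2 = \left\{\frac{\mathbf{v}_1^{*\prime}\mathbf{v}_2^*}{\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}},\; \frac{M}{\sqrt{N}}\right\}'$$

where

$$M = \mathbf{v}_2^{*\prime}\mathbf{v}_2^* - \frac{(\mathbf{v}_1^{*\prime}\mathbf{v}_2^*)(\mathbf{v}_1^{*\prime}\mathbf{v}_2^*)}{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}$$

and

$$N = \left(\mathbf{v}_2^* - \frac{(\mathbf{v}_1^{*\prime}\mathbf{v}_2^*)\,\mathbf{v}_1^*}{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}\right)'\left(\mathbf{v}_2^* - \frac{(\mathbf{v}_1^{*\prime}\mathbf{v}_2^*)\,\mathbf{v}_1^*}{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}\right).$$

Then, the new statistic L based on the 2×1 vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ is equal to $L^*$ based on the r×1 vectors $\mathbf{v}_1^*$ and $\mathbf{v}_2^*$ because it is invariant to orthogonal transformations, i.e.

$$\begin{aligned}
L &= \sqrt{p}\,\sqrt{\mathbf{v}_1'\mathbf{v}_1} + \sqrt{1-p}\;\frac{\mathbf{v}_1'\mathbf{v}_2}{\sqrt{\mathbf{v}_1'\mathbf{v}_1}} \\
&= \sqrt{p}\,\sqrt{\mathbf{v}_1^{*\prime}\mathbf{A}\mathbf{A}'\mathbf{v}_1^*} + \sqrt{1-p}\;\frac{\mathbf{v}_1^{*\prime}\mathbf{A}\mathbf{A}'\mathbf{v}_2^*}{\sqrt{\mathbf{v}_1^{*\prime}\mathbf{A}\mathbf{A}'\mathbf{v}_1^*}} \\
&= \sqrt{p}\,\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*} + \sqrt{1-p}\;\frac{\mathbf{v}_1^{*\prime}\mathbf{v}_2^*}{\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}} \\
&= \frac{\mathbf{v}_1^{*\prime}\mathbf{v}^*}{\sqrt{\mathbf{v}_1^{*\prime}\mathbf{v}_1^*}} = L^*,
\end{aligned}$$

where the third line holds because $\mathbf{A}\mathbf{A}'$ is the projection onto the plane spanned by $\mathbf{v}_1^*$ and $\mathbf{v}_2^*$ and therefore leaves both vectors unchanged.

Furthermore, as $\mathbf{v}_1$ and $\mathbf{v}_2$ are also normally distributed and of dimension (2×1), the theory already developed for the distribution of L still holds for $L^*$.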
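The invariance argument can be illustrated with a short numerical sketch. It assumes r = 4 uncorrelated endpoints and made-up stage vectors; `project_to_plane` and `L_statistic` are hypothetical helper names, and the Gram-Schmidt step requires that the two stage vectors are not parallel.

```python
import math

def dot(u, w):
    """Inner product of two equal-length sequences."""
    return sum(ui * wi for ui, wi in zip(u, w))

def project_to_plane(v1_star, v2_star):
    """Gram-Schmidt: 2-D coordinates (v1, v2) of two r-vectors in their own plane."""
    norm1 = math.sqrt(dot(v1_star, v1_star))
    a1 = [x / norm1 for x in v1_star]                    # first orthonormal vector
    theta1 = dot(a1, v2_star)
    g2 = [x - theta1 * y for x, y in zip(v2_star, a1)]   # part of v2* orthogonal to a1
    norm2 = math.sqrt(dot(g2, g2))
    a2 = [x / norm2 for x in g2]                         # second orthonormal vector
    v1 = (dot(a1, v1_star), dot(a2, v1_star))
    v2 = (dot(a1, v2_star), dot(a2, v2_star))
    return v1, v2

def L_statistic(v1, v2, p):
    """sqrt(p) * |v1| + sqrt(1 - p) * (v1'v2) / |v1| in any dimension."""
    norm1 = math.sqrt(dot(v1, v1))
    return math.sqrt(p) * norm1 + math.sqrt(1.0 - p) * dot(v1, v2) / norm1

# Hypothetical stage-wise statistics for r = 4 endpoints (illustrative only).
v1_star = [2.0, -1.0, 0.5, 1.5]
v2_star = [1.0, 0.5, -0.5, 2.0]
p = 0.5

L_star = L_statistic(v1_star, v2_star, p)        # computed in r dimensions
v1, v2 = project_to_plane(v1_star, v2_star)
L_2d = L_statistic(v1, v2, p)                    # computed from the 2x1 vectors
```

`L_star` and `L_2d` should agree up to rounding error, and the second coordinate of `v1` should be numerically zero, as in the expression for $\mathbf{v}_1$ above.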

CHAPTER 4

TWO EXAMPLES

4.1 Reduction in incidence of coronary heart disease

The Lipid Research Clinics Coronary Primary Prevention Trial (LRC-CPPT) is

a multicenter, randomized, double-blind clinical trial that was initiated in 1973 to study

the efficacy of lowering total plasma cholesterol levels (TOTAL-C) and low-density

lipoprotein cholesterol levels (LDL-C) in reducing risk of coronary heart disease (CHD).

High-density lipoprotein cholesterol levels (HDL-C) were also looked at as it is believed

that they are negatively correlated with the incidence of CHD. A cohort of men, aged

35 to 59 years, with a high risk of developing CHD was followed for an average period of

7.4 years. The accrual phase consisted of four screening visits at monthly intervals. At

the second screening visit, a moderate cholesterol-lowering diet was prescribed for all

potential participants. At the fifth visit to the clinic, eligible participants were randomly

assigned to one of two groups. The treatment group was prescribed the bile acid

sequestrant cholestyramine resin and the control group received a placebo. Participants

were followed up bimonthly for the entire duration of the trial. The primary endpoint for

evaluating the treatment was the combination of definite CHD death and/or definite

nonfatal myocardial infarction. The effect of the drug on the different cholesterol levels

and triglyceride levels (TG) was also investigated.

To illustrate the use of the new statistic L as well as Hotelling's $T^2$ and O'Brien's test, 389 participants from one of the twelve clinics of the original trial were studied. The two treatments were compared with respect to the relative changes (%Δ) in HDL-C/TOTAL-C and TG at the end of the first year. A positive relative change in HDL-C/TOTAL-C and a negative relative change in TG would indicate a beneficial effect from the cholestyramine resin. Percent changes from the participant baseline were computed for each individual and their averages are presented in Table 4.1.1. A 23.5% increase in HDL-C/TOTAL-C was observed in the cholestyramine group after one year compared to 2.9% in the placebo group. However, triglyceride levels rose by 10.8% in the treatment group and 6.1% in the placebo group. In order to have the treatment effect on both variables in the same direction, -%Δ(TG) was used in the analyses. The correlation between %Δ(HDL-C/TOTAL-C) and -%Δ(TG) was 0.14. A two-stage group sequential design was simulated by dividing the subjects into two groups with 50% of them for each accrual period. The division was based on the date of randomization. Tables 4.1.2 and 4.1.3 show the means for each accrual period. They are consistent with what was observed for the whole sample. For both accrual periods, the relative changes for both endpoints are larger in the cholestyramine group. The drug effects on the percent changes for each accrual period are shown in Table 4.1.4 together with their corresponding T statistics. The relative change in triglyceride levels was larger in the treatment group than in the control group. At the end of the first accrual period, $(n-3)T^2/[2(n-2)] = 191\,T^2/(192 \times 2) = 24.38$, which is distributed as F with 2 and 191 degrees of freedom. With $\Pr(L_1 < c_2)$ set at the 0.7 level, early acceptance of the null hypothesis could not be achieved with this result (p < 0.3), so the data from the second stage were analysed and the results at the end of the trial are included in Table 4.1.5. For O'Brien's test, the data were standardized first and the F statistic was 32.82 (p < 0.001). The L test was calculated from the uncorrelated outcomes. All three statistics were significant, indicating that the cholestyramine resin had an effect on HDL-C/TOTAL-C and on triglycerides. Figure 4.1.1 presents a graphical representation of O'Brien's test and the L test. The projections of the vector $\mathbf{v}$, the statistics at the end

Table 4.1.1. Mean cholesterol and triglycerides levels, all subjects (n=389).

                     Placebo (n=194)            Treatment (n=195)
                     Pre      Post     %Δ       Pre      Post     %Δ

TOTAL-C              292.6    275.3             291.7    236.7
HDL-C                42.6     40.6              43.9     42.9
HDL-C/TOTAL-C        0.147    0.149    0.029    0.152    0.185    0.235
TG                   166.8    167.2    0.061    161.4    171.1    0.108

Table 4.1.2. Mean cholesterol and triglycerides levels, first accrual period ($n_1$=194).

                     Placebo (n=98)             Treatment (n=96)
                     Pre      Post     %Δ       Pre      Post     %Δ

TOTAL-C              295.5    275.9             294.4    238.5
HDL-C                44.3     42.0              45.6     43.8
HDL-C/TOTAL-C        0.151    0.153    0.022    0.156    0.187    0.221
TG                   168.3    166.3    0.048    159.0    165.7    0.085

Table 4.1.3. Mean cholesterol and triglycerides levels, second accrual period.

                     Placebo (n=96)             Treatment (n=99)
                     Pre      Post     %Δ       Pre      Post     %Δ

TOTAL-C              289.6    272.6             289.1    234.9
HDL-C                40.7     39.2              42.3     42.0
HDL-C/TOTAL-C        0.142    0.145    0.037    0.148    0.182    0.249
TG                   165.2    168.1    0.075    163.8    176.3    0.130

Table 4.1.4. Drug effects and their corresponding T statistics, for each accrual period.

                          Drug effect (Treatment - Placebo)    T statistic

Accrual 1:
  %Δ(HDL-C/TOTAL-C)        0.199                                6.645
  -%Δ(TG)                 -0.037                               -0.921

Accrual 2:
  %Δ(HDL-C/TOTAL-C)        0.212                                7.881
  -%Δ(TG)                 -0.055                               -1.245

Table 4.1.5. Test results at the end of the LRC-CPPT trial.

O'Brien's test:     F(1,385) = 32.82, p-value = 2.04 E-8
Hotelling's T^2:    F(2,386) = 58.94, p-value = 0
L test:             L = 11.1298, p-value = 9.03 E-7

of the trial, on $\mathbf{v}_1$ (L test), the corresponding statistics at the end of the first accrual period, and on the diagonal of quadrants I and III (O'Brien's test) are shown.

Figure 4.1.1 Cholestyramine trial. L statistic = OL, O'Brien's statistic = OB. Horizontal axis: %change(HDL-C/TOTAL-C).

4.2 Oral contraceptives and coronary artery atherosclerosis of cynomolgus monkeys

Studies have shown that an increase in high-density lipoprotein (HDL) concentrations can reduce coronary artery atherosclerosis. As the effect of some oral contraceptives is to reduce HDL concentrations in women, there is a potential risk of increasing coronary artery atherosclerosis. Clarkson et al. (1990) studied the effect of two contraceptive steroid preparations on 83 adult female cynomolgus macaques fed a moderately atherogenic diet. Their age varied between 4 and 8 years and none were pregnant. The two preparations were: ethinyl estradiol with norgestrel and ethinyl estradiol with ethynodiol diacetate. The monkeys were randomized into three groups, one for each preparation and the control group, balanced for the ratio of total plasma cholesterol to HDL cholesterol, age, and the frequency of menstrual cycles. These characteristics are known to influence atherogenesis. There were also no differences between the three groups, during the pre-experimental period, in social status rankings, based on aggressive behavior, plasma lipid concentrations and low-density lipoprotein (LDL) cholesterol. Atherosclerosis was characterized as the cross-sectional area of intimal lesion in mm² of a histologic section of a tissue block. At necropsy, after the animals were given sodium pentobarbital, five tissue blocks were cut for each of three coronary arteries: aorta, carotid and iliaco-femoral arteries.

For this example, the effect of contraceptives on cholesterol concentrations as

well as atherosclerosis are considered. The two endpoints of interest are the natural

logarithm of the ratio of the total plasma cholesterol to HDL cholesterol as measured at

the end of the experiment (LTHDL) and the natural logarithm of the mean of the

intimal areas over the five sections of the three coronary arteries (LMIA). The two

groups that received the oral contraceptives are combined (n=49) and compared to the

group that was given the placebo (n=24). The correlation between the two endpoints is

0.79 and, the mean and standard deviation for each group are presented in Table 4.2.1.

Table 4.2.1 Mean LTHDL and LMIA, all subjects.

                  LTHDL          LMIA

Placebo           2.097±0.15     -3.736±0.49
Contraceptive     2.653±0.12     -4.243±0.31

In the original study, all macaques were recruited at the same time. To simulate a first accrual period with 50% of the monkeys ($p = n_1/n = 0.5$), each group is randomly divided into two subgroups. The means and standard deviations of each subgroup are shown in Table 4.2.2.

Table 4.2.2 Mean LTHDL and LMIA, for each accrual period

                           LTHDL          LMIA

Placebo (n=24)
  1st accrual (n=12)       2.087±0.18     -4.024±0.61
  2nd accrual (n=12)       2.106±0.25     -3.448±0.79

Contraceptive (n=49)
  1st accrual (n=24)       2.656±0.19     -4.098±0.43
  2nd accrual (n=25)       2.651±0.16     -4.383±0.44

At the interim analysis, there was an increase of 0.5688 for LTHDL in the contraceptive group while a decrease of 0.0741 was observed for LMIA. Their respective T statistics were 1.883 and -0.099. Hotelling's $T^2$ was 8.5688, which did not allow for early acceptance of the null hypothesis of no difference between the two treatments, so the trial would continue through its second stage. At the end of the trial, the group effects are 0.5566 (T statistic = 2.683) for LTHDL and -0.5073 (T statistic = -0.916) for LMIA. Table 4.2.3 shows the final results. Hotelling's $T^2$ is 31.9616 with a p-value of 0.0000002 when comparing it to a $\chi^2$ with 2 degrees of freedom. The overall mean is subtracted from each observation and the result is divided by the pooled within-group sample standard deviation before obtaining O'Brien's F, equal to 0.8718; the corresponding p-value from the standard F distribution with 1 and 69 degrees of freedom in the numerator and denominator is 0.3537. Finally, the L test is calculated after transforming the variables into uncorrelated outcomes with the transformation described in Chapter 3 where

$$\mathbf{A} = \begin{bmatrix} 0.8975 & 0.6377 \\ 0.6377 & 0.8975 \end{bmatrix}.$$

Its value is 9.1201 with a p-value of 0.000001 obtained from numerical integration using the distribution function of L. The results are illustrated in Figure 4.2.1. While O'Brien's test would lead to the conclusion that the treatment does not have a significant effect on the outcomes, both Hotelling's $T^2$ and the L test would indicate that there is a significant difference between the treatment group and the control group. The power of O'Brien's test to detect alternative hypotheses that are in opposite directions, i.e. on the diagonal of quadrants II and IV, is only 5%; therefore, it is not surprising to observe a p-value > 0.05.

Table 4.2.3. Test results at the end of the cynomolgus monkeys trial.

O'Brien's test:     F(1,69) = 0.8718, p-value = 0.3537
Hotelling's T^2:    F(2,70) = 31.9616, p-value = 0.0000002
L test:             L = 9.1201, p-value = 0.000001

These examples were included to illustrate how each of the three procedures can be implemented. The goal was not to show which test is the best one, as that could not be determined through a few examples. Rather, the power of each test for different alternative hypotheses should be taken into account during the planning of a trial to select the most appropriate procedure.

Figure 4.2.1 Oral contraceptives trial. L statistic = OL, O'Brien's statistic = OB. Axes: LTHDL (horizontal) and LMIA (vertical).

CHAPTER 5

SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH

This work has presented a global test statistic for the analysis of multiple

endpoints. Pocock and O'Brien have discussed the use of a global statistic as an

additional tool to univariate methods. Instead of leaving the reader with the

interpretation of multiple p-values, the global test provides an overall conclusion about

the differences between two treatments that takes into account the correlation structure

of the multiple endpoints. From the literature review, three procedures emerged as being

the most commonly used: Bonferroni's procedure, which performs well with moderately

correlated outcomes; Hotelling's T², which has the same power to detect a difference in

any direction; and O'Brien's test, which has the greatest power of the three tests when the

variables are all affected in exactly the same direction and by the same magnitude.
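The contrast among the three procedures can be sketched numerically. The example below is only an illustration under simplifying assumptions that are not part of the analyses above: two endpoints, standardized z statistics, a known correlation rho, and large-sample reference distributions (chi-square with 2 degrees of freedom for the Hotelling-type quadratic form, standard normal for an O'Brien-type generalized least squares statistic). The function names and the input values are hypothetical.

```python
import math

def normal_sf(x):
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def global_tests(z1, z2, rho):
    """Large-sample global tests for two standardized endpoint statistics.

    z1, z2 : per-endpoint z statistics (treatment minus control)
    rho    : their correlation, assumed known in this sketch
    Returns a dict of two-sided p-values, one per procedure.
    """
    # Bonferroni: smallest two-sided p-value times the number of tests (2).
    p_min = min(2.0 * normal_sf(abs(z1)), 2.0 * normal_sf(abs(z2)))
    p_bonferroni = min(1.0, 2.0 * p_min)

    # Hotelling-type quadratic form z' R^{-1} z; chi-square(2) when R is known.
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    p_hotelling = math.exp(-quad / 2.0)  # chi-square(2) survival function

    # O'Brien-type GLS statistic J' R^{-1} z / sqrt(J' R^{-1} J), with J = (1, 1)'.
    obrien = (z1 + z2) / math.sqrt(2.0 * (1.0 + rho))
    p_obrien = 2.0 * normal_sf(abs(obrien))

    return {"bonferroni": p_bonferroni, "hotelling": p_hotelling, "obrien": p_obrien}

# Hypothetical inputs: equal effects in the same direction, then opposite directions.
same = global_tests(2.0, 2.0, 0.3)
opposite = global_tests(3.0, -3.0, 0.3)
print(same)
print(opposite)
```

When both endpoints move by the same amount in the same direction, the O'Brien-type statistic gives the smallest p-value of the three in this sketch. When the effects are equal and opposite, that statistic is exactly zero, while the quadratic form is unaffected by the direction of the departure.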

The new test that has been proposed combines the robustness of Hotelling's T²

and the optimality properties of O'Brien's test for alternatives that have their effect in

the same direction and the same magnitude with the use of a two-stage group sequential

design. The new test allows one to 'cheat' and look at the data at the interim analysis

in order to use an O'Brien type test at the end of the trial. The 'cheating' is permissible

provided one pays the price of using the correct distribution derived in this dissertation.

It is not limited to continuous variables like the difference between means; it can be used

with any test statistic that is normally distributed or at least asymptotically normally

distributed, such as log odds, hazard ratios, etc. It is invariant to rotation, like Hotelling's

T², which makes it robust for alternative hypotheses in any direction. Unlike the power

of O'Brien's test, its power does not deteriorate sharply when the variables are not

affected in exactly the same direction and by the same magnitude. An optimal p = n1/n,

the proportion of participants recruited during the first accrual period, and P1 = P(L1 <

C2), the probability of accepting the null hypothesis at the interim analysis, can be

determined so that the power of the new test is greater than the power of the three

common procedures in the neighborhood of these specified values. Another attractive

property of this new test is that its application to more than two endpoints is

immediate. Furthermore, even for vectors of dimension greater than two, there always

exists an orthogonal matrix that will project the vectors onto a two-dimensional space

while preserving the angle between them so that the distribution properties already

derived remain the same for three or more endpoints.
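The projection argument can be checked numerically. The sketch below is an illustration only, not the construction used in the derivation: it builds an orthonormal basis of the plane spanned by two vectors by Gram-Schmidt (these would be the first two rows of an orthogonal matrix) and verifies that the angle between the vectors is unchanged in the resulting two-dimensional coordinates. The vectors u and v and the helper names are hypothetical.

```python
import math

def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def angle(a, b):
    """Angle in radians between two nonzero vectors."""
    c = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    return math.acos(max(-1.0, min(1.0, c)))

def plane_coordinates(u, v):
    """Coordinates of u and v in an orthonormal basis (e1, e2) of span{u, v}.

    e1 and e2 play the role of the first two rows of an orthogonal matrix;
    any remaining rows are orthogonal to both u and v, so the corresponding
    coordinates are zero and can be dropped.
    """
    norm_u = math.sqrt(dot(u, u))
    e1 = [x / norm_u for x in u]
    # Remove from v its component along e1, then normalize the remainder.
    resid = [y - dot(v, e1) * x for x, y in zip(e1, v)]
    norm_r = math.sqrt(dot(resid, resid))  # assumes u and v are not collinear
    e2 = [x / norm_r for x in resid]
    return (dot(u, e1), dot(u, e2)), (dot(v, e1), dot(v, e2))

u = [1.0, 2.0, 2.0]  # hypothetical direction with three endpoints
v = [2.0, 0.0, 1.0]  # hypothetical direction with three endpoints
u2, v2 = plane_coordinates(u, v)
print(angle(u, v), angle(u2, v2))
```

The same function works for vectors of any common dimension, which is the sense in which results stated for two dimensions carry over to three or more endpoints.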

Future research could be performed in order to increase the usefulness of this test

in practice. For this research, the main interest was to investigate the improvement in

power when using the results generated at the interim analysis in the development of the

final statistic, so the probability of rejecting the null hypothesis at the end of the first

stage of the trial was assumed to be zero. Relaxing this assumption to allow for

rejection of the null hypothesis at the interim analysis would reduce the expected sample

size of the two-stage design, and a study of the power in that case should be considered.

Some endpoints may be related but hard to combine because of the different

level of importance of each one. For example, in a clinical trial where death and/or

myocardial infarction are the two endpoints of interest, the investigators may want to

consider death as a more severe outcome than myocardial infarction. An extension of

the new test that would allow the use of weights that reflect the importance of each

variable should be explored.

The test can be used in situations where the parameters are not just means. It

may be applied to statistics that are asymptotically normal. However, the parameter p,

which corresponds to the proportion of information observed at the time of the

analysis at the first stage, may differ from the proportion of subjects recruited

during the first period of accrual. For example, in survival analysis, p would be the

proportion of deaths at the end of the first stage relative to the total expected deaths

at the end of the trial. Although the new test applies to most maximum likelihood estimates, the

derivation of the distribution assumes that p is the same for all variables. This would

cause difficulties in combining survival and mean endpoints, where p for a mean is

typically n1/n and p in a survival context is typically d1/d, where d1 is the number of deaths

at the end of the first stage and d is the number of deaths at the end of the trial.

Further research is required to allow different p's for different variables.
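A small numerical sketch shows how the two proportions can diverge within the same trial. The counts below are hypothetical and serve only to illustrate the definitions: n1/n for an endpoint analyzed as a mean and d1/d for a survival endpoint.

```python
def information_fraction(stage1, total):
    """Share of the final statistical information available at the interim look."""
    if not 0 < stage1 <= total:
        raise ValueError("need 0 < stage1 <= total")
    return stage1 / total

# Hypothetical trial: 200 of 400 participants are enrolled at the interim
# analysis, but only 30 of the 120 expected deaths have been observed by then.
p_mean = information_fraction(200, 400)     # n1 / n for a mean endpoint
p_survival = information_fraction(30, 120)  # d1 / d for a survival endpoint
print(p_mean, p_survival)
```

With these made-up counts the mean endpoint has half of its final information at the interim analysis while the survival endpoint has a quarter, so a single common p would not describe both endpoints.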

Finally, as mentioned in Chapter 3, the improvement in power for the new

test may be partly due to the two-stage design. Further comparisons with a two-stage

design using Hotelling's T² at each stage should be investigated.

In summary, the theory presented here is an important step in the direction of

providing more powerful tools for the analysis of multiple endpoints. Although the

results have been presented in the context of clinical trials, the new procedure can be

used in other situations where multiple outcomes are analyzed with a two-stage design.


REFERENCES

Abelson, R.P. and Tukey, J.W. (1963). Efficient Utilization of Non-numerical Information in Quantitative Analysis: General Theory and the Case of Simple Order, Annals of Mathematical Statistics, 34, 1341-1369.

Anderson, T.W. (1958). An Introduction to Multivariate Statistical Analysis, Wiley, New York.

Armitage, P. (1957). Restricted Sequential Procedures, Biometrika, 44, 9-26.

Armitage, P. (1975). Sequential Medical Trials, Oxford: Blackwell.

Armitage, P. (1978). Sequential Medical Trials, Biomedicine Special Issue, 28, 40-41.

Armitage, P. and Parmar, M. (1986). Some Approaches to the Problem of Multiplicity in Clinical Trials, Proceedings of the XIIIth International Biometrics Conference.

Bauer, P. (1981). On the Assessment of the Performance of Multiple Test Procedures, Biom. Journal, 28, 811-819.

Bauer, P. (1986). Two Stage Sampling for Simultaneously Testing Main and Side Effects in Clinical Trials, Biom. Journal, 28, 811-819.

Bauer, P., Hackl, P., Hommel, G., Sonnemann, E. (1986). Multiple Testing of Pairs of One-Sided Hypotheses, Metrika, 33, 121-127.

Berry, D.A. (1988). Multiple Comparisons, Multiple Tests, and Data Dredging: A Bayesian Perspective, Bayesian Statistics 3, 79-84.

Breslow, N. (1990). Biostatistics and Bayes, Statistical Science, 5, 269-298.

Clarkson, T.B., Shively, C.A., Morgan, T.M., Korotnik, D.R., Adams, M.R. and Kaplan, J.R. (1990). Oral Contraceptives and Coronary Artery Atherosclerosis of Cynomolgus Monkeys, Obstetrics and Gynecology, 75, 217-222.

Cupples, L.A., Heeren, B.A., Schatzkin, A., Colton, T. (1984). Multiple Testing of Hypotheses in Comparing Two Groups, Annals of Internal Medicine, 100, 122-129.

DeMets, D.L. and Ware, J.H. (1980). Group Sequential Methods for Clinical Trials with a One-Sided Hypothesis, Biometrika, 67, 651-660.

DeMets, D.L. and Lan, K.K.G. (1984). An Overview of Sequential Methods and their Application in Clinical Trials, Communications in Statistics - Theory and Methods, 13 (19), 2315-2338.

Duncan, D.B. (1951). A Significance Test for Differences Between Ranked Treatments in an Analysis of Variance, Virginia Journal of Sciences, 2, 171-189.

Duncan, D.B. (1952). On the Properties of the Multiple Comparison Test, Virginia Journal of Sciences, 3, 49-61.


Dunn, O.J. (1959). Confidence Intervals for the Means of Dependent, Normally Distributed Variables, Journal of the American Statistical Association, 54, 613-621.

Fairbanks, K. and Madsen, R. (1982). P Values for Tests Using a Repeated Significance Test Design, Biometrika, 69, 69-74.

Fleming, T.R., Harrington, D.P., O'Brien, P.C. (1984). Designs for Group Sequential Tests, Controlled Clinical Trials, 5, 348-361.

Freedman, L.S., Lowe, D., Macaskill, P. (1984). Stopping Rules for Clinical Trials Incorporating Clinical Opinion, Biometrics, 40, 575-586.

Friedman, L.M., Furberg, C.D., DeMets, D.L. (1981). Fundamentals of Clinical Trials, Wright, Boston.

Gail, M. (1984). Nonparametric Frequentist Proposals for Monitoring Comparative Survival Studies, Handbook of Statistics, P.R. Krishnaiah and P.K. Sen (eds), 4, 791-811.

Geller, N.L., Pocock, S.J. (1987). Interim Analyses in Randomized Clinical Trials: Ramifications and Guidelines for Practitioners, Biometrics, 43, 213-223.

Godfrey, K. (1985). Comparing the Means of Several Groups, The New England Journal of Medicine, 313, 1450-1456.

Gould, A.L. and Pecore, V.J. (1982). Group Sequential Methods for Clinical Trials Allowing Early Acceptance of H0 and Incorporating Costs, Biometrika, 69, 75-80.

Graybill, F.A. (1969). Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Company Inc., Belmont, California.

Harrington, D.P., Fleming, T.R., Green, S.J. (1982). Procedures for Serial Testing in Censored Survival Data, IMS Monograph Series, 269-286.

Haybittle, J.L. (1971). Repeated Assessment of Results in Clinical Trials of Cancer Treatment, British Journal of Radiology, 44, 793-797.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance, Biometrika, 75, 4, 800-802.

Hochberg, Y., Tamhane, A.C. (1987). Multiple Comparison Procedures, Wiley, New York.

Holm, S. (1979). A Simple Sequentially Rejective Multiple Test Procedure, Scandinavian Journal of Statistics, 6, 65-70.

Hommel, G. (1983). Tests of the overall hypothesis for arbitrary dependence structures, Biometrical Journal, 25, 423-430.

Hommel, G. (1986). Multiple test procedures for arbitrary dependence structures, Metrika, 33, 321-336.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test, Biometrika, 75, 2, 383-386.


Hommel, G. (1989). A comparison of two modified Bonferroni procedures, Biometrika, 76, 3, 624-625.

Hotelling, H. (1931). The Generalization of Student's Ratio, Annals of Mathematical Statistics, 2, 360-378.

Jennison, C. and Turnbull, B.W. (1983). Repeated Confidence Interval for Group Sequential Clinical Trials, Controlled Clinical Trials, 5, 33-45.

Johnson, L.W. and Riess, R.D. (1977). Numerical Analysis, Addison-Wesley Publishing Company, Reading.

Jones, D. and Whitehead, J. (1979a). Sequential Forms of The Log Rank and Modified Wilcoxon Tests for Censored Data, Biometrika, 66, 105-113.

Jones, D.R. and Whitehead, J. (1979b). Group Sequential Methods, British Journal of Cancer, 40, 171.

Lan, K.K.G. and DeMets, D.L. (1983). Discrete Sequential Boundaries for Clinical Trials, Biometrika, 70, 659-663.

Lan, K.K.G., DeMets, D.L., Halperin, M. (1984). More Flexible Sequential and Non-Sequential Designs in Long-Term Clinical Trials, Communications in Statistics - Theory and Methods, 13 (19), 2339-2353.

Lipid Research Clinics Program (1984). The Lipid Research Clinics Coronary Primary Prevention Trial Results: I. Reduction in Incidence of Coronary Heart Disease, Journal of the American Medical Association, 251, 351-364.

Lipid Research Clinics Program (1984). The Lipid Research Clinics Coronary Primary Prevention Trial Results: II. The Relationship of Reduction in Incidence of Coronary Heart Disease to Cholesterol Lowering, Journal of the American Medical Association, 251, 365-374.

McPherson, K. (1974). Statistics: The Problem of Examining Accumulating Data More Than Once, The New England Journal of Medicine, 290, 501-502.

McPherson, K. and Armitage, P. (1971). Repeated significance tests on accumulating data when the null hypothesis is not true, Journal of the Royal Statistical Society, Series A, 134, 15-25.

Meier, P. (1975). Statistics and Medical Experimentation, Biometrics, 31, 511-529.

Miller, R. (1981). Simultaneous Statistical Inference, McGraw-Hill, New York.

Morrison, D.F. (1976). Multivariate Statistical Methods, McGraw-Hill, New York.

Noble, B. (1969). Applied Linear Algebra, Prentice-Hall Inc., Englewood Cliffs, New Jersey.

O'Brien, P.C. (1984). Procedures for Comparing Samples with Multiple Endpoints, Biometrics, 40, 1079-1087.

O'Brien, P.C. and Fleming, T.R. (1979). A Multiple Testing Procedure for Clinical Trials, Biometrics, 35, 549-556.


Peto, R., Pike, M.C., Armitage, P., Breslow, N.E., Cox, D.R., Howard, S.V., Mantel, N., McPherson, K., Peto, J., Smith, P.G. (1976). Design and Analysis of Randomized Clinical Trials Requiring Prolonged Observation of Each Patient. I. Introduction and Design, British Journal of Cancer, 34, 585-612.

Pocock, S.J. (1977). Group Sequential Methods in the Design and Analysis of Clinical Trials, Biometrika, 64, 191-199.

Pocock, S.J. (1982). Interim Analyses for Randomized Clinical Trials: The Group Sequential Approach, Biometrics, 38, 153-162.

Pocock, S.J. (1985). Current Issues in the Design and Interpretation of Clinical Trials, British Medical Journal, 290, 39-42.

Pocock, S.J., Geller, N.L., Tsiatis, A. (1987). The Analysis of Multiple Endpoints in Clinical Trials, Biometrics, 43, 487-498.

Press, S.J. (1972). Applied Multivariate Analysis, New York: Holt, Rinehart & Winston.

Rom, D.M. (1990). A sequentially rejective test procedure based on a modified Bonferroni inequality, Biometrika, 77, 3, 663-665.

Roy, S.N. and Bose, R.C. (1953). Simultaneous Confidence Interval Estimation, Annals of Mathematical Statistics, 24, 513-536.

Ruger, B. (1978). Das Maximale Signifikanzniveau des Tests: "Lehne H0 ab, wenn k unter n Gegebenen Tests zur Ablehnung Fuhren", Metrika, 25, 171-178.

Scheffe, H. (1953). A Method for Judging all Contrasts in the Analysis of Variance, Biometrika, 40, 87-104.

Sellke, T. and Siegmund, D. (1983). Sequential Analysis of the Proportional Hazards Model, Biometrika, 70, 315-326.

Shaffer, J.P. (1986). Modified sequentially rejective multiple test procedures, Journal of the American Statistical Association, 81, 395, 826-831.

Sidak, Z. (1967). Rectangular Confidence Regions for the Means of Multivariate Normal Distributions, Journal of the American Statistical Association, 62, 626-633.

Sidak, Z. (1968). On Multivariate Normal Probabilities of Rectangles: Their Dependence on Correlation, Annals of Mathematical Statistics, 5, 1425-1434.

Sidak, Z. (1971). On Probabilities of Rectangles in Multivariate Student Distributions: Their Dependence on Correlations, Annals of Mathematical Statistics, 1, 169-175.

Simes, R.J. (1986). An Improved Bonferroni Procedure for Multiple Tests of Significance, Biometrika, 73, 751-754.

Smith, D.G., Clemens, J., Crede, W., Harvey, M., Gracely, E.J. (1987). Impact of Multiple Comparisons in Randomized Clinical Trials, The American Journal of Medicine, 83, 545-550.


Tang, D., Gnecco, C., Geller, N.L. (1989). Design of Group Sequential Clinical Trials with Multiple Endpoints, Journal of the American Statistical Association, 84, 776-779.

Tsiatis, A.A. (1982). Repeated Significance Testing for a General Class of Statistics Used in Censored Survival Analysis, Journal of the American Statistical Association, 77, 855-861.

Tsiatis, A.A., Rosner, G.L., Tritchler, D.L. (1985). Group Sequential Tests with Censored Survival Data Adjusting for Covariates, Biometrika, 72, 365-373.

Tukey, J.W. (1977). Some Thoughts on Clinical Trials, Especially on Problems of Multiplicity, Science, 198, 679-684.

Wald, A. (1947). Sequential Analysis, Wiley, New York.

Wald, A. and Wolfowitz, J. (1948). Optimum Character of the Sequential Probability Ratio Test, Annals of Mathematical Statistics, 19, 326-339.

Whitehead, J. (1983). The Design and Analysis of Sequential Clinical Trials, Ellis Horwood Limited, Chichester.

Whitehead, J. and Stratton, I. (1983). Group Sequential Clinical Trials with Triangular Continuation Regions, Biometrics, 39, 227-236.

Worsley, K.J. (1982). An Improved Bonferroni Inequality and Applications, Biometrika, 69, 297-302.
