ANALYZING MULTIPLE ENDPOINTS WITH A TWO-STAGE GROUP SEQUENTIAL DESIGN
IN CLINICAL TRIALS
by
Claudine Legault
Department of Biostatistics, University of North Carolina at Chapel Hill, NC.
Institute of Statistics Mimeo Series No. 1889T
September 1991
ANALYZING MULTIPLE ENDPOINTS
WITH A TWO-STAGE GROUP SEQUENTIAL DESIGN
IN CLINICAL TRIALS
by
Claudine Legault
A dissertation submitted to the faculty of the University of North Carolina at Chapel
Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in
the Department of Biostatistics.
Chapel Hill
1991
ABSTRACT
CLAUDINE LEGAULT. Analyzing Multiple Endpoints with a two-stage group
sequential design in clinical trials. (Under the direction of Timothy M. Morgan.)
In many clinical trials, the assessment of the response to the various treatments
can include a large variety of outcome variables which are generally correlated.
Different endpoints may be regarded by the investigators as important in determining if
a certain treatment is effective. The more variables there are, the more likely it is that
differences will appear at random if adjustments are not made for the multiple tests.
Bonferroni's adjustment for multiple comparisons is one of the approaches used
when multiple correlated outcomes are being compared. For alternative hypotheses in
which several endpoints are affected in the same direction, Bonferroni's procedure may
lack power because the rejection of the overall hypothesis is based on the smallest
p-value of all the test statistics. Hotelling's T² makes no distinction between variables
that change favorably and variables that change unfavorably. It lacks power to detect
any specific types of departure considered a priori to be biologically plausible in a clinical
trial and was therefore considered unsuitable by Pocock (1987) for the analysis of clinical
trials. A test proposed by O'Brien (1984) focusses on alternative hypotheses with all
endpoints showing an effect in the same direction. In that situation it provides better
power but it deteriorates sharply to a power of only 5% when variables are affected in
opposite directions.
This dissertation first compares the critical regions and the power contours of
the three procedures mentioned above. The efficiency and robustness of these
procedures are compared as a function of the direction of the alternative hypotheses.
A new test is first derived using data from the interim look in a two-stage group
sequential design to form the rejection boundary at the second stage. Initially, the test
uses a Hotelling T² rejection region at the end of the first stage and an O'Brien 'type'
procedure at the end of the second stage. The test is then extended to allow for early
acceptance. The distribution of the proposed tests is presented and their power and
efficiency are compared to common procedures. Finally, two examples are presented
and future research recommendations are discussed.
ACKNOWLEDGMENTS
First and foremost, I would like to thank Dr Tim Morgan for his support. His
guidance, his availability, his patience and his sincere care have been constant and
precious throughout this work.
I would also like to thank Dr P. K. Sen, the chairman of my committee and my
academic advisor, for his judicious advice. Appreciation is also extended to the other
members of my committee, Professors C. Ed Davis, Paul Stewart and Gerardo Heiss for
their constructive comments.
I warmly thank my sisters Marie-Andree and Monique, my brother Benoit and
their families for encouraging me in the pursuit of my doctoral degree. Their presence
on the day of my final oral examination means a lot to me.
I have made dear friends during my stay in Chapel Hill and I want to thank
them all, as well as my friends from Montreal, for their support and friendship. The
most heartfelt and enduring gratitude is expressed to Susan Lewis who enthusiastically
and cheerfully supported me in many ways.
Finally, I am indebted to the 'Fonds de la Recherche en Sante du Quebec' for
their financial support.
TABLE OF CONTENTS
Page
List of Tables vi
List of Figures vii
Chapter 1: Introduction and Literature Review 1
1.1 Introduction 1
1.2 Literature Review 3
1.2.1 Multiple endpoint procedures 3
1.2.1.1 Bonferroni's Inequality 3
1.2.1.2 Hotelling's T² 12
1.2.1.3 O'Brien's Test 14
1.2.1.3.1 Nonparametric procedure 15
1.2.1.3.2 GLS Parametric procedure 15
1.2.2 Sequential designs 16
1.3 Proposed research 19
Chapter 2: Evaluation and Comparison of Three Common Multivariate Testing
Procedures 21
2.1 Introduction 21
2.2 Bonferroni's Inequality 22
2.3 Hotelling's T² 25
2.4 O'Brien's Test 27
2.5 Comparison of the three procedures 30
2.6 Evaluation with correlated data 39
Chapter 3: Two-Stage Group Sequential Test with Multiple endpoints 62
3.1 Introduction 62
3.2 The new test statistic L 62
3.2.1 Density of L under H_0 64
3.2.2 Power of L 66
3.2.2.1 Special case: p = 1/2 = 0.5 and μ_2 = 0 66
3.2.2.2 General case: p = ~1 and IJ* = {Ii 1J1 73
3.2.2.3 Distribution of L, allowing for early stopping 76
3.3 Transformation for correlated data 85
3.4 Using the new test L with more than two endpoints 85
Chapter 4: Two Examples 88
4.1 Reduction in incidence of coronary heart disease 88
4.2 Oral contraceptives and coronary atherosclerosis of cynomolgus monkeys 91
Chapter 5: Summary and Suggestions for Future Research 97
References 100
LIST OF TABLES
Table 2.5.1: Power of O'Brien's test, for different alternative hypotheses when the power of Hotelling's T² is 80% and the number of endpoints p = 2, 3, 4, 5 and 10 38
Table 3.2.1.1: Critical values l_c and power for L, α = 0.05 70
Table 3.3.1: Critical values l_c and power for L, when allowing for early stopping, α = 0.05 78
Table 3.3.2: Power of L, when allowing for early stopping, for alternative hypotheses for which the power of Hotelling's T² is 80% 81
Table 3.3.3: Power of L, when allowing for early stopping, for alternative hypotheses for which the power of Hotelling's T² is 50% 82
Table 4.1.1: Mean cholesterol and triglycerides levels, all subjects 90
Table 4.1.2: Mean cholesterol and triglycerides levels, first accrual period 90
Table 4.1.3: Mean cholesterol and triglycerides levels, second accrual period 90
Table 4.1.4: Drug effects and their corresponding T statistics, for each accrual period 91
Table 4.1.5: Test results at the end of the LRC-CPPT 91
Table 4.2.1: Mean LTHDL and LMIA, all subjects 93
Table 4.2.2: Mean LTHDL and LMIA, for each accrual period 94
Table 4.2.3: Test results at the end of the cynomolgus monkeys trial 95
LIST OF FIGURES
Figure 2.2.1: Critical regions: Bonferroni (B) Hotelling (H) O'Brien (O) 23
Figure 2.2.2: Contours for powers of 50%, 80% and 90%: Bonferroni (B) Hotelling (H) O'Brien (O) 24
Figure 2.4.1: New axes to determine the power of O'Brien's test 29
Figure 2.5.1: Regions in which each test has greater power than the other two tests: Bonferroni (B) Hotelling (H) O'Brien (O) 31
Figure 2.5.2: One-degree sections for 80% power contour of Hotelling's test 33
Figure 2.5.2.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%: Bonferroni (B) Hotelling (H) O'Brien (O) 34
Figure 2.5.2.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%: Bonferroni (B) Hotelling (H) O'Brien (O) 35
Figure 2.5.2.3: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%: Bonferroni (B) Hotelling (H) O'Brien (O) 36
Figure 2.6.1.1: Critical regions, rho = -0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 42
Figure 2.6.1.2: Critical regions, rho = -0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 43
Figure 2.6.1.3: Critical regions, rho = 0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 44
Figure 2.6.1.4: Critical regions, rho = 0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 45
Figure 2.6.2.1: Contours for powers of 50%, 80% and 90%, rho = -0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 46
Figure 2.6.2.2: Contours for powers of 50%, 80% and 90%, rho = -0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 47
Figure 2.6.2.3: Contours for powers of 50%, 80% and 90%, rho = 0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 48
Figure 2.6.2.4: Contours for powers of 50%, 80% and 90%, rho = 0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 49
Figure 2.6.3.1.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = -0.5 50
Figure 2.6.3.1.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = -0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 51
Figure 2.6.3.1.3: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = -0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 52
Figure 2.6.3.2.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = -0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 53
Figure 2.6.3.2.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = -0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 54
Figure 2.6.3.2.3: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = -0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 55
Figure 2.6.3.3.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = 0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 56
Figure 2.6.3.3.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = 0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 57
Figure 2.6.3.3.3: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = 0.5: Bonferroni (B) Hotelling (H) O'Brien (O) 58
Figure 2.6.3.4.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = 0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 59
Figure 2.6.3.4.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = 0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 60
Figure 2.6.3.4.3: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = 0.9: Bonferroni (B) Hotelling (H) O'Brien (O) 61
Figure 3.2.1.1: Density of L, p = 0.1 67
Figure 3.2.1.2: Density of L, p = 0.5 68
Figure 3.2.1.3: Density of L, p = 0.9 69
Figure 3.2.3.1: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%: Bonferroni (B) Hotelling (H) O'Brien (O) New test (L) 83
Figure 3.2.3.2: Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%: Bonferroni (B) Hotelling (H) O'Brien (O) New test (L) 84
Figure 4.1.1: Cholestyramine trial: L statistic = OL, O'Brien's statistic = OB 92
Figure 4.2.1: Oral contraceptives trial: L statistic = OL, O'Brien's statistic = OB 96
CHAPTER 1
INTRODUCTION AND LITERATURE REVIEW
1.1 Introduction
A study often involves a great number of response variables, each of them
reflecting one aspect of the overall question of interest. Different endpoints may be
regarded by the investigators as important in determining if a certain treatment is
effective. The more variables there are, the more likely it is that differences will appear
at random if adjustments are not made for the multiple tests; however, investigators do
not want to ignore any potentially important endpoint. In many clinical trials, the
assessment of the response to the various treatments can include a large variety of
outcome variables. For example, Pocock et al (1987) discussed a chronic respiratory
disease crossover trial studying the effect of the addition of an inhaled drug to each
patient's normal treatment on respiratory function. Three standard respiratory function
measures were taken: the peak expiratory flow rate (PEFR), the forced expiratory
volume (FEV1) and the forced vital capacity (FVC). O'Brien (1984) examined a trial
comparing two therapies for the treatment of diabetes. The improvement of the nerve
function was measured by 34 electromyographic variables. Smith et al (1987) reported
on 67 trials from the 1982 issues of four medical journals: Lancet, New England Journal
of Medicine, Journal of the American Medical Association and the British Medical
Journal. Of the 67 trials, 66 contained more than one therapeutic comparison. The
main source of these multiple therapeutic comparisons was multiple outcomes with a
mean of 21.7 different analyzed outcomes per trial. Only two of these trials contained
any statistical adjustments for the multiple therapeutic comparisons. None of the trials
in the study used methods developed for analyzing interrelated outcomes.
One common approach to the analysis of an overall question is to consider the
outcomes simultaneously and to present multiple p-values from univariate tests on each
of the specified endpoints. This presents some problems. First, the use of multiple
significance tests is likely to increase the chance of detecting a difference in at least one
of the outcomes between two treatments. This difference may, in fact, appear more
important than it really is, the probability of a significant result increasing with the
number of tests being performed, when the null hypothesis of no effect is true. A
correction is often imposed on the level of significance α. Second, the endpoints may be
correlated and separate univariate tests do not, by themselves, take into account the
correlation structure. Conclusions based on such analyses can only be looked at with
reservation. Sometimes a single primary endpoint is first specified and analyzed at the
prespecified level α. Secondary endpoints are then defined and multiple significance test
results are reported to the reader who may look at them as exploratory results or
modify the α level according to the number of tests performed. The choice of a primary
endpoint may be arduous; the distinction between the primary and some of the
secondary endpoints may not be obvious. Finally, if the endpoints are not all affected in
the same direction, the interpretation of the results may present some difficulties in
deciding if there is an appreciable difference. For example, if some of the variables
measuring the nerve function showed a significant improvement under one therapy while
the remaining variables demonstrated a deterioration under the same therapy, the
overall evaluation of the effect would certainly cause problems to the researchers.
Common procedures for making multiple comparisons of correlated outcomes,
multivariate tests combining several endpoints, and procedures for comparing outcomes
at multiple time points will be reviewed subsequently. A brief description of group
sequential designs will also be given. This dissertation will consider new methods that
use information obtained from interim analyses in group sequential designs in making
treatment comparisons on multiple endpoints.
1.2 Literature Review
1.2.1 Multiple endpoint procedures
Several testing procedures have been proposed in the literature for performing
multiple hypothesis tests, including those proposed by Tukey (1951), Duncan (1951),
Scheffe (1953), Dunn (1959) and Roy & Bose (1953). General references on
simultaneous statistical inference are Miller (1981), Anderson (1958), Press (1972),
Morrison (1976) and Hochberg & Tamhane (1987). Recently, Berry (1988) and Breslow
(1990) presented some Bayesian approaches to the problems of multiplicity. This review
will focus on three procedures: Bonferroni's inequality (1936) which is a commonly used
procedure for multiple hypothesis tests, Hotelling's T² (1931) which is a standard
classical test for the comparison of multivariate samples, and a test recently adapted to
clinical trials by O'Brien (1984). Descriptions and comparisons of these methods will be
presented in the following sections.
1.2.1.1 Bonferroni's Inequality
Let P_1, ..., P_n be a set of p-values to test hypotheses H_1, ..., H_n. The Bonferroni
procedure will lead to the rejection of H_0 = {H_1, ..., H_n} if any p-value is less than α/n,
where α is the overall level of significance. Furthermore, each hypothesis H_i (i = 1, ..., n)
will be individually rejected if P_i ≤ α/n. The Bonferroni inequality,

$$\Pr\Big\{\bigcup_{i=1}^{n} (P_i \le \alpha/n)\Big\} \le \alpha \qquad (0 \le \alpha \le 1) \tag{1.2.1}$$

ensures that the probability of rejecting at least one hypothesis when all are true is no
greater than α (Simes, 1986). If the n endpoints are independent,

$$\begin{aligned}
\Pr(\text{smallest p-value} \le \alpha/n) &= \Pr(\text{rejecting at least one hypothesis}) \\
&= 1 - \Pr(\text{not rejecting any hypothesis}) \\
&= 1 - (1 - \alpha/n)^n \\
&\le 1 - \{1 - n(\alpha/n)\} \\
&= \alpha.
\end{aligned}$$

For alternative hypotheses in which several endpoints are affected in the same
direction, Bonferroni's procedure may lack power because the rejection of the overall
hypothesis is based on the smallest p-value of the n test statistics.
In practice, endpoints are generally correlated. Pocock et al. (1987) have
demonstrated that Bonferroni's correction works reasonably well for moderately
correlated normally distributed endpoints with known variance and the same correlation
ρ for all possible pairs within each of two compared groups. The conservatism of
Bonferroni's approach increases as ρ increases but there is no noticeable deterioration in
Bonferroni's correction as the number of correlated endpoints increases. With five
endpoints, α = 0.05 and ρ = 0.5, they have shown that the value α', which the smallest of
5 one-sided p-values obtained from the normal test statistics will reach with probability
α under the null hypothesis, is 0.0128 compared with α/n = 0.01. However, multiple
endpoints are not usually equicorrelated and normally distributed. Pocock suggests that
similar findings should occur for any continuous asymptotically normal test statistics
and that if most pairwise correlations are less than 0.5, serious conservatism should not
occur.
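The adjusted level α' quoted above can be checked numerically. The following is a minimal sketch, assuming n one-sided z statistics that are standard normal under the null hypothesis with a common correlation ρ between 0 and 1, written as a shared factor plus independent noise. The function names (`prob_min_p_below`, `adjusted_level`) and the integration settings are illustrative choices, not taken from Pocock et al.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_min_p_below(level, n, rho, steps=2000, span=8.0):
    """P(smallest of n one-sided p-values <= level) under H0.

    Assumes Z_i = sqrt(rho) * U + sqrt(1 - rho) * E_i with U, E_i independent
    standard normals, which gives equicorrelated statistics (0 <= rho < 1).
    The outer integral over U uses a simple midpoint rule.
    """
    c = STD_NORMAL.inv_cdf(1.0 - level)  # critical z value for this level
    h = 2.0 * span / steps
    prob_all_below = 0.0
    for s in range(steps):
        u = -span + (s + 0.5) * h
        inner = STD_NORMAL.cdf((c - sqrt(rho) * u) / sqrt(1.0 - rho))
        prob_all_below += STD_NORMAL.pdf(u) * inner ** n * h
    return 1.0 - prob_all_below


def adjusted_level(alpha, n, rho, iterations=40):
    """Bisection for the per-test level giving overall error alpha."""
    lo, hi = alpha / n, alpha  # Bonferroni level is too small, alpha too large
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if prob_min_p_below(mid, n, rho) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Setting discussed in the text: five endpoints, alpha = 0.05, rho = 0.5.
    print(round(adjusted_level(0.05, 5, 0.5), 4))
```

Under these assumptions the result for five endpoints should be close to the 0.0128 reported in the text, and setting ρ = 0 should recover the independence value 1 - (1 - α)^(1/n).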
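Sidak (1967) proposed a modification of Bonferroni's inequality. Instead of
testing each hypothesis at α_j = α/n, he recommended using a level of significance
α_j = 1 - (1 - α)^(1/n). Similarly to Bonferroni's approach, for n independent endpoints
the overall type I error probability can be written out directly, and here it equals α exactly:

$$\begin{aligned}
\Pr(\text{smallest p-value} \le 1 - (1-\alpha)^{1/n}) &= 1 - \{1 - [1 - (1-\alpha)^{1/n}]\}^n \\
&= 1 - \{(1-\alpha)^{1/n}\}^n \\
&= \alpha.
\end{aligned}$$

Because 1 - (1 - α)^(1/n) ≥ α/n, each individual test is carried out at a level at least as
large as Bonferroni's.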
For n < 10 and α = 0.05, Sidak's multiplicative inequality only leads to a slightly
more powerful test than Bonferroni's.
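How slight the difference is can be seen by tabulating the two per-test levels. The sketch below assumes independent tests; the helper names (`bonferroni_level`, `sidak_level`, `fwer_independent`) are illustrative and not part of the original text.

```python
def bonferroni_level(alpha, n):
    """Per-test significance level under Bonferroni's correction."""
    return alpha / n


def sidak_level(alpha, n):
    """Per-test significance level under Sidak's multiplicative correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)


def fwer_independent(level, n):
    """P(at least one rejection) for n independent tests, all nulls true."""
    return 1.0 - (1.0 - level) ** n


if __name__ == "__main__":
    alpha = 0.05
    for n in range(2, 10):
        b = bonferroni_level(alpha, n)
        s = sidak_level(alpha, n)
        print(n, round(b, 5), round(s, 5),
              round(fwer_independent(b, n), 5),
              round(fwer_independent(s, n), 5))
```

For every n in this range the Sidak level exceeds the Bonferroni level only in the fourth decimal place, which is why the gain in power is small.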
Holm (1979) presented a sequentially rejective Bonferroni test where tests can be
conducted at successively higher significance levels. It is as simple to compute as the
classical Bonferroni test and has a strictly larger probability of rejecting each hypothesis
individually. However, the probability of rejecting H_0 = {H_1, ..., H_n} is the same for
both tests.
Let Y_1, Y_2, ..., Y_n be some test statistics and let P_k(Y_k) be the p-value for the
outcome of the test statistic Y_k, k = 1, ..., n. Let R_1 ≤ R_2 ≤ ... ≤ R_n denote these
p-values in increasing order, with H_1, ..., H_n the correspondingly ordered hypotheses.
The test is:

Is R_1 ≤ α/n?
    no: accept H_1, ..., H_n, stop.
    yes: reject H_1 and continue.
Is R_2 ≤ α/(n-1)?
    no: accept H_2, ..., H_n, stop.
    yes: reject H_2 and continue.
...
Is R_n ≤ α/1?
    no: accept H_n, stop.
    yes: reject H_n, stop.
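The sequentially rejective scheme can be sketched in a few lines. The function names `holm` and `bonferroni` and the example p-values below are illustrative assumptions, not taken from Holm (1979).

```python
def holm(p_values, alpha=0.05):
    """Sequentially rejective Bonferroni test.

    Returns a list of booleans, in the original order of p_values,
    that is True where the corresponding hypothesis is rejected.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * n
    for step, idx in enumerate(order):
        # Step 1 compares with alpha/n, step 2 with alpha/(n-1), and so on.
        if p_values[idx] <= alpha / (n - step):
            reject[idx] = True
        else:
            break  # accept this and all remaining hypotheses
    return reject


def bonferroni(p_values, alpha=0.05):
    """Classical Bonferroni test: every p-value is compared with alpha/n."""
    n = len(p_values)
    return [p <= alpha / n for p in p_values]


if __name__ == "__main__":
    p = [0.012, 0.015, 0.02, 0.04]
    print(bonferroni(p))  # only the smallest p-value is below 0.05/4
    print(holm(p))        # each ordered p-value passes its own threshold
```

In this made-up example the classical test rejects one hypothesis while the sequential version rejects all four, although both reject the overall null hypothesis in exactly the same cases.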
The power gain obtained by using a sequentially rejective Bonferroni test instead
of a classical Bonferroni test depends very much upon the alternative hypothesis. It is
small if all the hypotheses are 'almost true', but it may be considerable if a number of
hypotheses are 'completely wrong'. If m of the n basic hypotheses are 'completely
wrong', the corresponding levels attain small values, and these hypotheses are rejected in
the first m steps with high probability. The other levels are then compared to α/k for
k = n - m, n - m - 1, n - m - 2,..... , 2, 1, which is equivalent to performing a sequentially
rejective Bonferroni test only on those hypotheses that are not 'completely wrong'.
A great advantage with the sequentially rejective Bonferroni test (as well as the
classical Bonferroni test) is that there are no restrictions on the type of tests, the only
requirement being that it should be possible to calculate the obtained level for each
separate test. Furthermore, when the test statistics are independent, the comparison
constants α/n, α/(n-1), ..., α/1 can be replaced by 1-(1-α)^(1/n), 1-(1-α)^(1/(n-1)), ...,
1-(1-α)^1, which are greater. This means that the test is more powerful, but the increase in
power is not very big.
It may happen that some hypotheses are more important than others, which
may imply the use of higher levels of significance for the most important hypotheses and
smaller levels of significance for the less important hypotheses when the Bonferroni
technique is applied. The sequentially rejective Bonferroni test can be adapted for this
situation. At each step in the procedure the obtained levels for the not yet rejected
hypotheses are compared to parts of α, which are proportional to the corresponding
constants.
Hommel (1983) introduced yet another level α test less conservative than
Bonferroni's, based on Ruger's inequality (1978). Let P_(k) be the kth smallest of n
p-values, 2 ≤ k ≤ n. Reject H_0 if P_(k) ≤ kα/n. Here k has to be determined before
performing the n tests.
To avoid choosing k in advance, one can use the following level α test also
proposed by Hommel (1983): reject H_0 if P_(k) ≤ kα/(n C_n) for at least one k, 1 ≤ k ≤ n,
where C_n = 1/1 + 1/2 + ... + 1/n. Simes (1986) introduced a modification of Hommel's procedure
that will be described later.
Shaffer (1986) modified Holm's sequentially rejective procedure to obtain a
further increase in power when there are logical implications among the hypotheses and
alternatives so that not all combinations of true and false hypotheses are possible.
Given that j-1 hypotheses have been rejected, instead of using α/(n-j+1) for the next test
as in Holm's procedure, the denominator can be set at t_j, where t_j equals the maximum
number of hypotheses that could be true, given that at least j-1 hypotheses are false.
Obviously, t_j is never greater than n-j+1, and for some values of j it may be strictly
smaller. This modified sequentially rejective Bonferroni (MSRB) procedure will never be
less powerful than the sequentially rejective procedure while maintaining an
experimentwise significance level ≤ α.
Often the n hypotheses are not tested separately unless a more comprehensive
hypothesis has initially been rejected at significance level α, where such rejection implies
that at least some number r of the n hypotheses are false, r = 1, 2, ..., n-1. A further
improvement in the MSRB is then possible; the critical values α/t_j can be replaced by
α/t_(n-r) without increasing the overall significance level above α.
Another modification of the MSRB procedure takes into account the particular
hypotheses rejected. The power of the MSRB procedure can be increased, at the cost of
greater complexity, by substituting for α/t_j at stage j the value α/t_j*, where t_j* is the
maximum number of hypotheses that could be true, given that the specific ordered
hypotheses H_1, H_2, ..., H_(j-1) are false. This procedure has an experimentwise
significance level ≤ α. Shaffer's modifications are independent of the particular test
statistic used, except for the knowledge of their respective marginal distributions.
Worsley (1982) presented an improved Bonferroni inequality which gives an
upper bound for the probability of the union of an arbitrary sequence of events. It is
constructed in terms of the joint probability of pairs of events, which are represented by
edges on a graph. His procedure represents an improvement over the Bonferroni, Sidak
and Holm approaches, but it requires knowledge of the joint probabilities of pairs of
events which is not always easily available.
Armitage and Parmar (1986) developed a sequential method to investigate the
p-values of test statistics which follow a multivariate normal distribution: the 'peeling'
procedure. For k ordered p-values, the ith Bonferroni-adjusted p-value is

$$\mathrm{adj}\,P_{(i)} = 1 - (1 - P_{(i)})^{k-(i-1)}, \qquad i = 1, \ldots, d, \tag{1.2.2}$$

where d is the first adjusted value to be judged non-significant. We should expect d to
be small even for large k. However, any Bonferroni-type approach is too conservative
when tests are correlated. Thus, they proposed an adjusted correction which allows for
correlations:
(1.2.3)
where 0 ≤ x ≤ 1. For k independent tests x = 1 and for fully correlated tests x = 0. For
the general case, x is defined as a function of the correlation structure.
The maximum relative error in using adj P_(1) as an adjusted Bonferroni
correction for 5 correlated tests, when 0.001 ≤ P_(1) ≤ 0.05, was calculated assuming
multivariate normality for the test statistics, using Schervish's algorithm for the
multivariate normal integral. Thirty-three different correlation structures and p-values
were considered. The maximum relative error was 8%. The authors have found that
adj P_(1) also gives very good results for k = 2, 3 and 4 dimensions.
Simes (1986) presented a generalization of the Bonferroni procedure which has
an actual significance level closer to the nominal level in a wide range of circumstances
and which has a lower type II error rate for a given nominal significance level than the
classical procedure. It is a modification that is less conservative than Hommel's
procedure because of omitting the constant C_n. His procedure uses different critical
values for each p-value. For n ordered p-values P_(1) ≤ ... ≤ P_(n) testing hypotheses
H_(1), ..., H_(n), one rejects the overall null hypothesis H_0 = {H_1, ..., H_n} if P_(k) ≤ kα/n
for any k = 1, ..., n. This procedure has type I error probability equal to α for
independent tests. Simes simulated a multivariate normal distribution with unit
variances and common correlation coefficient ρ as well as chi-squared tests to estimate
the type I error rate of the classical and generalized test procedures. The classical
Bonferroni procedure has similar type I error rate for independent tests but is more
conservative than the generalized test for highly correlated outcomes. The simulation
study of test statistics under various alternative hypotheses showed that the
improvement in power is appreciable when several of the alternative hypotheses are
correct. One disadvantage seems to be a slight increase in computation. Finally, the
modified Bonferroni test procedure does not allow specific alternative hypotheses to be
identified; statements about individual hypotheses should be considered exploratory.
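A small sketch may help to contrast the generalized (Simes) rule with the classical Bonferroni rule for the global null hypothesis. The function names and the example p-values are illustrative assumptions, not taken from Simes (1986).

```python
def simes_global_test(p_values, alpha=0.05):
    """Reject the global null if P_(k) <= k*alpha/n for at least one k."""
    n = len(p_values)
    ordered = sorted(p_values)
    return any(p <= k * alpha / n for k, p in enumerate(ordered, start=1))


def bonferroni_global_test(p_values, alpha=0.05):
    """Reject the global null if the smallest p-value is <= alpha/n."""
    return min(p_values) <= alpha / len(p_values)


if __name__ == "__main__":
    # Several moderately small p-values, none below alpha/n = 0.0125.
    p = [0.02, 0.03, 0.035, 0.04]
    print(bonferroni_global_test(p))  # False
    print(simes_global_test(p))       # True: third ordered value is <= 3*0.05/4
```

The example is of the kind where several alternatives hold at once: no single p-value reaches α/n, yet the third ordered p-value falls below its larger critical value. Any rejection by the classical rule is also a rejection by the generalized rule, since the k = 1 condition is the Bonferroni condition.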
Hommel (1988) extended Simes' procedure to make inferences on individual
hypotheses. Let J = {i' ∈ {1, ..., n}: P_(n-i'+k) > kα/i'; k = 1, ..., i'}, where the P's are ordered.
If J is nonempty, reject H_(i) whenever P_(i) ≤ α/j', with j' = max{i' ∈ J}. If J is empty,
reject all H_i (i = 1, ..., n).
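The definition of the set J translates almost directly into code. The following is a minimal sketch of the rule as stated above; the function name `hommel` and the example p-values are illustrative.

```python
def hommel(p_values, alpha=0.05):
    """Hommel's (1988) rule for individual hypotheses, as described in the text.

    Returns a list of booleans in the original order of p_values.
    """
    n = len(p_values)
    ordered = sorted(p_values)  # ordered[m - 1] is P_(m)
    # J collects every i for which P_(n-i+k) > k*alpha/i holds for all k = 1..i.
    J = [
        i for i in range(1, n + 1)
        if all(ordered[n - i + k - 1] > k * alpha / i for k in range(1, i + 1))
    ]
    if not J:
        return [True] * n  # J empty: reject every hypothesis
    j = max(J)
    return [p <= alpha / j for p in p_values]


if __name__ == "__main__":
    print(hommel([0.01, 0.02, 0.03, 0.04]))   # J is empty, all rejected
    print(hommel([0.012, 0.02, 0.2, 0.6]))    # max of J is 3, threshold 0.05/3
```

In the first example every p-value is below α, so no index can enter J and all hypotheses are rejected. In the second, the largest element of J is 3 and only the p-value below 0.05/3 leads to a rejection.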
At the same time, Hochberg (1988) gave a simple sequential way of making
inferences on individual hypotheses that is able to reject at least one individual
hypothesis when the global null hypothesis is rejected. Using the ordered p-values,
reject H_(i) if there exists a j (1 ≤ j ≤ n) such that P_(j) ≤ α/(n-j+1) and P_(i) ≤ P_(j).
Hommel's procedure, although more complicated, is more powerful than Hochberg's
procedure.
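Hochberg's rule works upward from the largest p-value. The sketch below is one possible implementation under that reading; the function name `hochberg` and the example values are illustrative.

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg's (1988) step-up rule.

    Finds the largest j with P_(j) <= alpha/(n-j+1) and rejects the
    hypotheses belonging to the j smallest p-values.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * n
    for j in range(n, 0, -1):  # start from the largest p-value
        if p_values[order[j - 1]] <= alpha / (n - j + 1):
            for idx in order[:j]:
                reject[idx] = True
            break
    return reject


if __name__ == "__main__":
    print(hochberg([0.02, 0.03, 0.04, 0.045]))  # largest p <= alpha: all rejected
    print(hochberg([0.01, 0.04, 0.2]))          # only the smallest is rejected
```

The first example shows the contrast with a step-down scheme: the smallest p-value, 0.02, is above α/4, so a procedure that starts from the smallest p-value would stop immediately, whereas the step-up rule rejects everything because the largest p-value is below α.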
Rom (1990) showed that the superiority of Hommel's procedure was due to the
conservatism of Hochberg's procedure, i.e. its size was strictly less than α for n > 2. He
corrected this undesired property by modifying the critical points of Hochberg's
procedure to obtain a new procedure that would still strongly control the family-wise
error rate at the designated significance level α. Hochberg's critical points α/n, α/(n-1),
..., α are replaced by c_1n, ..., c_nn, respectively, where c_11 = α and c_ij = c_(i+1)(j+1),
1 ≤ i ≤ j. The modified critical points are obtained iteratively by solving the recurrence
relationship:

$$\sum_{i=1}^{n-1} \left[ c_{nn}^{\,i} - \binom{n}{i}\, c_{(n-i)n}^{\,n-i} \right] = 0.$$

The modified critical values are greater than the original ones, except for n ≤ 2, so the
modified procedure always rejects the global null hypothesis whenever the original one
does. The inference on the individual hypotheses can be done in the following sequential
way: if P_(n) ≤ c_nn then all H_i's are rejected; otherwise, H_(n) cannot be rejected and one goes
on to compare P_(n-1) with c_(n-1)n, etc.
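Rom's critical points can be generated recursively. The sketch below assumes the recurrence in the form in which it is commonly quoted: writing c_m for the critical point applied to the m-th largest p-value, c_1 = α, c_2 = α/2, and c_m = [Σ_{j=1}^{m-1} α^j - Σ_{j=1}^{m-2} C(m,j) c_{j+1}^{m-j}] / m. This indexing (from the largest p-value) and the function name `rom_critical_points` are assumptions of the sketch.

```python
from math import comb


def rom_critical_points(n, alpha=0.05):
    """Rom-type critical points c_1, ..., c_n (index 0 holds c_1).

    c_m is compared with the m-th largest p-value, so c_1 = alpha is used
    for the largest p-value and c_n for the smallest.
    """
    c = [alpha]
    for m in range(2, n + 1):
        power_sum = sum(alpha ** j for j in range(1, m))
        # c[j] holds c_{j+1} because the list is zero-indexed.
        correction = sum(comb(m, j) * c[j] ** (m - j) for j in range(1, m - 1))
        c.append((power_sum - correction) / m)
    return c


if __name__ == "__main__":
    for m, value in enumerate(rom_critical_points(5), start=1):
        # Compare with Hochberg's alpha/m for the same position.
        print(m, round(value, 5), round(0.05 / m, 5))
```

Under this recurrence the first two points coincide with Hochberg's α and α/2, and from the third point onward the values are slightly larger than α/m, in line with the statement that the modification only matters for n > 2.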
In summary:
1. Bonferroni's procedure may lack power for alternative hypotheses in which several
endpoints are affected in the same direction.
2. Sidak's multiplicative inequality only leads to a slightly more powerful test than
Bonferroni's.
3. The power gain of Holm's sequentially rejective Bonferroni test over the classical test is
small if all the hypotheses are 'almost true', but it may be considerable if a number of
hypotheses are 'completely wrong'. It can also be used with weights.
4. To use the procedure based on Ruger's inequality, one must determine k (for P_(k))
before performing the n tests. To avoid choosing k in advance, Hommel introduced a
second procedure which is strictly not less powerful than Holm's procedure but is more
conservative than Simes's test.
5. Shaffer modified Holm's sequentially rejective procedure to obtain a further increase
in power when there are logical implications among the hypotheses and alternatives so
that not all combinations of true and false hypotheses are possible. An even bigger
increase in power can be obtained if i) a more comprehensive hypothesis has initially
been rejected, or ii) the particular hypotheses rejected are taken into account.
6. Worsley's procedure is an improvement over Bonferroni's but it requires knowledge
of the joint probabilities of pairs of events which is not usually available.
7. Armitage & Parmar's procedure takes into account the correlation structure but is
more difficult to apply.
8. Simes's procedure is less conservative than Hommel's. The improvement in power is
appreciable when several of the alternative hypotheses are correct. However, it does not
allow specific statements about individual hypotheses.
9. With Hommel's extension of Simes's procedure, one can make inferences on
individual hypotheses.
10. Hochberg's procedure is another sequential way of making inferences on individual
hypotheses and it is simpler to use than Hommel's procedure. However it is not as
powerful as Hommel's.
11. Rom's modification of Hochberg's critical values makes Hochberg's procedure as
powerful as Hommel's while it is still simpler to use.
1.2.1.2 Hotelling's T²
Hotelling's T² (1931) is a standard approach to study several normally
distributed endpoints simultaneously. To test the null hypothesis H_0: μ = μ_0, the
statistic is defined as:

$$T^2 = N(\bar{y} - \mu_0)'\, S^{-1}\, (\bar{y} - \mu_0), \tag{1.2.4}$$

where ȳ is a vector of means from a sample of size N drawn from a population N_p(μ, Σ),
and S^(-1) is the inverse of the sample covariance matrix. [(N-p)/(p(N-1))] T² is distributed as a
non-central F_(p,N-p) with non-centrality parameter N(μ - μ_0)' Σ^(-1) (μ - μ_0). If μ = μ_0,
the distribution is the central F (Anderson, 1958).
If the prime interest is to compare the means of two normal populations where
the covariance matrices are assumed equal but unknown, the T² statistic can also be
used. Let y_1^(i), ..., y_(N_i)^(i) be a sample from N(μ^(i), Σ), i = 1, 2. The null hypothesis is
H_0: μ^(1) = μ^(2). ȳ^(i) is distributed N(μ^(i), (1/N_i)Σ) and

$$T^2 = \frac{N_1 N_2}{N_1 + N_2}\, (\bar{y}^{(1)} - \bar{y}^{(2)})'\, S^{-1}\, (\bar{y}^{(1)} - \bar{y}^{(2)}), \tag{1.2.5}$$

where

$$S = \frac{1}{N_1 + N_2 - 2} \left\{ \sum_{i=1}^{N_1} (y_i^{(1)} - \bar{y}^{(1)})(y_i^{(1)} - \bar{y}^{(1)})' + \sum_{i=1}^{N_2} (y_i^{(2)} - \bar{y}^{(2)})(y_i^{(2)} - \bar{y}^{(2)})' \right\}.$$

Thus, for a test at significance level α, [(N_1+N_2-p-1)/((N_1+N_2-2)p)] T² is distributed as a
non-central F_(p,N_1+N_2-p-1) with non-centrality parameter:

$$\frac{N_1 N_2}{N_1 + N_2}\, (\mu^{(1)} - \mu^{(2)})'\, \Sigma^{-1}\, (\mu^{(1)} - \mu^{(2)}).$$

The distribution is the central F under H_0.
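The two-sample statistic can be computed without any matrix library for small p. The following is a minimal sketch using plain Python lists; all function names (`mean_vector`, `pooled_covariance`, `solve`, `hotelling_two_sample`) and the data are illustrative, and no p-value is computed because that would need the F distribution function.

```python
def mean_vector(rows):
    """Column means of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    return [sum(r[k] for r in rows) / n for k in range(p)]


def pooled_covariance(x1, x2):
    """Pooled sample covariance matrix S with N1 + N2 - 2 degrees of freedom."""
    p = len(x1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (x1, x2):
        m = mean_vector(rows)
        for r in rows:
            d = [r[k] - m[k] for k in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    dof = len(x1) + len(x2) - 2
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for cc in range(col, n + 1):
                m[r][cc] -= f * m[col][cc]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def hotelling_two_sample(x1, x2):
    """Return (T2, F) where F is the scaled statistic referred to F(p, N1+N2-p-1)."""
    n1, n2, p = len(x1), len(x2), len(x1[0])
    d = [a - b for a, b in zip(mean_vector(x1), mean_vector(x2))]
    s_inv_d = solve(pooled_covariance(x1, x2), d)  # S^{-1} d without inverting S
    t2 = n1 * n2 / (n1 + n2) * sum(di * wi for di, wi in zip(d, s_inv_d))
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat


if __name__ == "__main__":
    group1 = [[1, 2], [2, 1], [3, 5], [4, 3]]
    group2 = [[2, 3], [3, 5], [5, 4], [6, 8]]
    print(hotelling_two_sample(group1, group2))
```

With a single endpoint (p = 1) the statistic reduces to the square of the usual pooled two-sample t statistic, which gives a convenient check of the implementation.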
O'Brien (1984) noted that the T² statistic makes no distinction between
variables that change favorably and variables that change unfavorably. It is a test that
can detect a possible difference at a certain standardized distance from Ho in all
directions. Pocock (1987) further added that because it is intended to detect any
departure from the null hypothesis, it lacks power to detect any specific types of
departure considered a priori to be biologically plausible in a clinical trial and therefore
Hotelling's $T^2$ is unsuitable for the analysis of clinical trials.
1.2.1.3 O'Brien's Test
O'Brien's interest (1984) focussed on tests with alternative hypotheses with all
endpoints showing an effect in the same direction. He was concerned with the lack of
power of the Bonferroni inequality and the lack of discrimination of Hotelling's $T^2$. He
was seeking a single global test that would allow making overall probability statements
instead of having to interpret multiple test results, when some effect was consistent
among all endpoints. O'Brien observed, through simulations in which the number of
endpoints studied is large relative to sample size, that while separate tests on each
variable may not reach statistical significance, the overall evidence may suggest strong
differences. He also showed that tests such as Hotelling's T 2 achieved low power in such
circumstances.
He considered three global procedures: a nonparametric procedure that is a rank-
sum-type test, and two parametric approaches that are similar, one being based on
generalized least squares (GLS) estimation while the other one uses ordinary least
squares (OLS) estimation methods. As he pointed out, the efficiency of the OLS
procedure relative to the GLS procedure is $\le 1$, so attention will be focussed on the GLS
approach.
Let $Y_{ijk}$ represent the kth variable for the jth subject in group i ($k = 1, \ldots, K$;
$j = 1, \ldots, n_i$; $i = 1, \ldots, I$), with
$$\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) = \begin{cases} \sigma_{kk'} & \text{if } ij = i'j', \\ 0 & \text{otherwise.} \end{cases}$$
Assume $Y_{ijk}$ is defined so that large values are better than small values for each
$k = 1, \ldots, K$ and the $Y_{ij}$ are independently distributed with mean $\mu_i$ and covariance matrix $\Sigma$.
The null hypothesis $H_0: \mu_1 = \cdots = \mu_I$ versus the alternative hypotheses for which $\mu_{ik} > \mu_{i'k}$
for $k = 1, 2, \ldots, K$, are of prime interest, i.e. if the mean of variable 1 is greater in
group 1 than in group 2, then the means of variables 2, ..., K will all be greater in group
1 than in group 2.
1.2.1.3.1 Nonparametric procedure
The nonparametric test is particularly recommended when the variables are not
normally distributed or the sample size is small.
Let $R_{ijk}$ represent the rank of $Y_{ijk}$ among all values of variable k in the pooled
set of I samples.
Define $S_{ij} = \sum_{k=1}^{K} R_{ijk}$.
Perform a one-way analysis of variance on the $\{S_{ij}\}$ values.
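The rank-sum steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`midranks`, `obrien_rank_sums`, `one_way_anova_f`), the use of average ranks for ties, and the toy data are not taken from O'Brien (1984) or from this dissertation. The sketch returns the F statistic and its degrees of freedom, leaving the comparison with an F percentile to the reader.

```python
from statistics import mean

def midranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank (assumption)."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based ranks pos+1 .. end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks

def obrien_rank_sums(groups):
    """groups[i][j] is the list of K endpoint values of subject j in group i.
    Each endpoint is ranked in the pooled sample; the ranks are summed per subject."""
    pooled = [subject for group in groups for subject in group]
    n_endpoints = len(pooled[0])
    rank_cols = [midranks([s[k] for s in pooled]) for k in range(n_endpoints)]
    sums = [sum(rank_cols[k][s] for k in range(n_endpoints)) for s in range(len(pooled))]
    out, start = [], 0
    for group in groups:
        out.append(sums[start:start + len(group)])
        start += len(group)
    return out

def one_way_anova_f(samples):
    """One-way ANOVA F statistic with its numerator and denominator degrees of freedom."""
    n_total = sum(len(s) for s in samples)
    grand = mean([x for s in samples for x in s])
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical data: two groups, three subjects each, two endpoints per subject.
toy_groups = [[[5.1, 12], [6.0, 15], [5.8, 14]],
              [[4.2, 10], [4.9, 11], [4.0, 9]]]
scores = obrien_rank_sums(toy_groups)
f_stat, df1, df2 = one_way_anova_f(scores)
```

With two groups, the F statistic is the square of the two-sample t statistic computed on the same rank sums.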
1.2.1.3.2 GLS Parametric procedure
First, assume the $\{Y_{ijk}\}$ values are standardized, i.e. the overall mean is
subtracted from each observation and the result is then divided by the pooled within-group
sample standard deviation. Then compute:
$$F = \frac{\sum_i n_i \{J' \hat{\Sigma}^{-1} (\bar{Y}_{i.} - \bar{Y}_{..})\}^2}{(I-1)\, J' \hat{\Sigma}^{-1} J}, \tag{1.2.6}$$
where
$J' = (1, 1, \ldots, 1)$,
$\bar{Y}_{i.} = \sum_j Y_{ij} / n_i$,
$\bar{Y}_{..} = \sum_{ij} Y_{ij} / \sum_i n_i$, and
$\hat{\sigma}_{ab} = \sum_{ij} (Y_{ija} - \bar{Y}_{i.a})(Y_{ijb} - \bar{Y}_{i.b}) / \sum_i (n_i - 1)$ is the (a, b) element of $\hat{\Sigma}$.
Reject $H_0$ if F exceeds the $(1-\alpha) \times 100$ percentile of the standard F distribution
with $I-1$ and $\sum_i (n_i - K)$ degrees of freedom in the numerator and denominator,
respectively.
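A minimal sketch of the GLS statistic follows, assuming the reading of (1.2.6) in which the weights are J' times the inverse of the pooled within-group covariance matrix of the standardized data. The helper names (`solve`, `obrien_gls_f`) and the small Gaussian-elimination routine are illustrative choices, not part of O'Brien's paper. The function returns F together with the two degrees of freedom quoted in the text.

```python
from math import sqrt

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[r]] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def obrien_gls_f(groups):
    """groups[i][j] is the list of K endpoint values of subject j in group i."""
    n_groups = len(groups)
    k_end = len(groups[0][0])
    sizes = [len(g) for g in groups]
    df_pool = sum(n - 1 for n in sizes)
    # Group means and overall mean of each endpoint on the raw scale.
    gmean = [[sum(s[k] for s in g) / len(g) for k in range(k_end)] for g in groups]
    omean = [sum(s[k] for g in groups for s in g) / sum(sizes) for k in range(k_end)]
    # Pooled within-group standard deviation of each endpoint.
    sd = [sqrt(sum((s[k] - gmean[i][k]) ** 2
                   for i, g in enumerate(groups) for s in g) / df_pool)
          for k in range(k_end)]
    # Standardized deviation of each group mean from the overall mean.
    dev = [[(gmean[i][k] - omean[k]) / sd[k] for k in range(k_end)]
           for i in range(n_groups)]
    # Pooled within-group covariance of the standardized data.
    cov = [[sum((s[a] - gmean[i][a]) * (s[b] - gmean[i][b])
                for i, g in enumerate(groups) for s in g) / (df_pool * sd[a] * sd[b])
            for b in range(k_end)] for a in range(k_end)]
    weights = solve(cov, [1.0] * k_end)  # cov^{-1} J; cov is symmetric
    num = sum(sizes[i] * sum(weights[k] * dev[i][k] for k in range(k_end)) ** 2
              for i in range(n_groups))
    den = (n_groups - 1) * sum(weights)  # (I - 1) J' cov^{-1} J
    return num / den, n_groups - 1, sum(n - k_end for n in sizes)
```

With a single endpoint (K = 1) the statistic reduces to the ordinary one-way ANOVA F, which gives a quick sanity check on the sketch.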
O'Brien has shown that the GLS procedure is remarkably robust to the
normality assumption and achieves optimality in the normal theory setting. He has also
demonstrated that both procedures, parametric and nonparametric, asymptotically
provide approximations of the probability of type-I error. In the repeated measures
setting, when variances are heterogeneous, the GLS procedure may allow a considerably
greater power with a slight increase in the size of the test.
In brief, O'Brien's nonparametric test is simply performing a one-way analysis of
variance on the sum of the ranks assigned to each subject or a univariate t-test on that
sum when comparing two samples. His parametric approach is also a one-way analysis
of variance but this time on the average of the standardized data. Both procedures
have the property of collapsing the multiple variables into one summary statistic.
Before proceeding to a more detailed analysis of the three tests presented, it is of
interest to review the use of sequential designs in clinical trials. As mentioned earlier,
they will play an important role in the methods developed in this dissertation.
1.2.2 Sequential designs
The theory for sequential designs was developed by Wald in 1947. A test is
performed after the accrual of each pair of observations; the only decision to be made is
whether to terminate or continue the trial. This classical sequential design is called an
'open' plan because there is no fixed sample size. Wald and Wolfowitz (1948) showed
that fully sequential designs led to the lowest expected sample size under the null and
the alternative hypotheses. Armitage (1957) introduced the 'closed' sequential design to
impose a limit on the sample size. In 1971, McPherson and Armitage developed theory
on repeated significance tests on accumulating data which is similar to the 'closed'
sequential design. Later, Armitage (1978) and Jones and Whitehead (1979) applied
sequential designs to survival data. Despite the savings in sample size, the need for
constant data monitoring, rapid response measures and matching of the participants was
not appealing. The requirement of analysis after each pair of outcomes was also
cumbersome.
Group sequential designs were subsequently developed to avoid some of the
problems of classical sequential designs. They are more practical and the increase in
sample size relative to the classical design is slight. The subjects are divided into I
equal-sized groups with 2n subjects in each. The data are then analyzed a maximum of
I times, i.e. once after each accrual of 2n subjects. If the statistic $Z_i$ is outside a
prespecified stopping boundary, the experiment is stopped and the null hypothesis is
rejected. If the statistic is inside the boundary, the experiment continues until i = I.
When i = I, the trial stops and the null hypothesis either is or is not rejected.
Haybittle (1971) and Peto (1976), Pocock (1977) and O'Brien & Fleming (1979)
suggested different group sequential stopping boundaries for the standardized normal
statistic $Z_i$. Haybittle and Peto favored a large critical value such as $Z_i = \pm 3.0$ for all
interim tests and the conventional critical value for the last test. Pocock proposed to
use a constant critical value based on the number of analyses such that the overall
significance level would be $\alpha$. O'Brien & Fleming suggested using $Z^* \sqrt{I/i}$ where $Z^*$ is
such that the overall significance level $\alpha$ is achieved. Extensions and modifications to
these designs were then proposed by DeMets & Ware (1980), Tsiatis (1982), Fairbanks
(1982), Gould & Pecore (1982), Harrington, Fleming & Green (1982), Whitehead
(1983), Whitehead & Stratton (1983), Selke & Siegmund (1983), Lan & DeMets (1983),
DeMets & Lan (1984), Lan, DeMets & Halperin (1984), Jennison, Turnbull & Tsiatis
(1984), Fleming, Harrington & O'Brien (1984), Freedman, Lowe & Macaskill (1984),
Gail (1984) and Geller & Pocock (1987).
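For a two-stage design with equal-sized groups, the overall type I error of the Pocock and O'Brien & Fleming boundaries can be checked numerically. The sketch below assumes the stage-2 statistic is $(Z_1 + W)/\sqrt{2}$ with W an independent standard normal increment, and integrates over the continuation region with a midpoint rule. The function names are hypothetical. The constants 2.178 (Pocock) and 1.977 (O'Brien & Fleming) are commonly tabulated two-stage values for $\alpha = 0.05$ and do not come from this text.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def pocock_bounds(constant, looks):
    """Same critical value at every analysis."""
    return [constant] * looks

def obrien_fleming_bounds(z_final, looks):
    """Critical value z_final * sqrt(I / i) at analysis i = 1..I."""
    return [z_final * sqrt(looks / i) for i in range(1, looks + 1)]

def two_stage_type1_error(c1, c2, steps=4000):
    """P(|Z_1| > c1 or |Z_2| > c2) under H0, with Z_2 = (Z_1 + W) / sqrt(2)."""
    width = 2 * c1 / steps
    accept = 0.0
    for s in range(steps):
        z1 = -c1 + (s + 0.5) * width  # midpoint of the s-th slice of [-c1, c1]
        # |Z_2| <= c2 is equivalent to -c2*sqrt(2) - z1 <= W <= c2*sqrt(2) - z1
        inside = STD.cdf(c2 * sqrt(2) - z1) - STD.cdf(-c2 * sqrt(2) - z1)
        accept += STD.pdf(z1) * inside * width
    return 1.0 - accept

pocock = pocock_bounds(2.178, 2)
obf = obrien_fleming_bounds(1.977, 2)
print(two_stage_type1_error(*pocock), two_stage_type1_error(*obf))
```

The same function can be used to see how a Haybittle and Peto style rule (3.0 at the interim analysis, 1.96 at the end) slightly inflates the overall level above 0.05.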
Tang, Gnecco and Geller (1989) looked at the analysis of multiple endpoints with
group sequential designs. Using O'Brien's GLS approach, they proposed a method for
the design of clinical trials that allows for interim analyses and considers all endpoints
simultaneously. Patients are entered sequentially in a clinical trial. After each accrual
of 2n patients (n randomized to treatment A and n to treatment B), an interim analysis
is undertaken on the accumulated data. Assume the patients' data are independent
k-dimensional variables with mean $\mu_l = (\mu_{l1}, \mu_{l2}, \ldots, \mu_{lk})'$, $l$ = A, B and common known
covariance matrix $\Sigma$. The null hypotheses of interest are $H_{0i}: \mu_{Ai} - \mu_{Bi} = 0$. The
alternative hypotheses are $H_{ai}: \mu_{Ai} - \mu_{Bi} = \lambda \delta_i$, $i = 1, 2, \ldots, k$ where $\delta_i$ specifies the
relative difference of interest. In O'Brien's original model, $\delta_i = \sigma_i$ for all i. The null
hypothesis can now be written $H_0: \lambda = 0$.
Let $Y_j = (Y_{1j}, \ldots, Y_{kj})'$, for the first j groups of patients' data, have a multivariate
normal distribution with mean $\lambda \delta$ and covariance matrix $2\Sigma / (nj)$, where
$$Y_{ij} = \sum_{m=1}^{j} (\bar{x}_{Aim} - \bar{x}_{Bim}) / j, \quad i = 1, 2, \ldots, k,$$
and $\bar{x}_{lim}$ is the average for patients in group m, i.e. accrued between the (m-1)st
and mth analyses, $l$ = A, B. The GLS estimate of $\lambda$ is
$$\hat{\lambda} = \delta' \Sigma^{-1} Y_j / (\delta' \Sigma^{-1} \delta). \tag{1.2.7}$$
O'Brien's test is then defined as:
$$F = (nj/2)^{1/2}\, \delta' \Sigma^{-1} Y_j / (\delta' \Sigma^{-1} \delta)^{1/2}. \tag{1.2.8}$$
Under $H_0$, $F \sim N(0,1)$. Under $H_a: \lambda = \lambda_0$ (> 0), the mean of F is $\lambda_0 (nj\, \delta' \Sigma^{-1} \delta / 2)^{1/2}$.
For two-stage and three-stage group sequential trials, O'Brien's statistic would generally
be compared to Pocock boundaries or O'Brien and Fleming boundaries to decide if the
trial is to be continued or not.
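The main advantage of this procedure is the sample-size saving. Let $n_i$ be the
sample size when only the ith endpoint is analyzed. Equating the means of the two test
statistics under the alternative gives
$$\left(\frac{n}{2}\, \delta' \Sigma^{-1} \delta\right)^{1/2} = \left(\frac{n_i}{2}\right)^{1/2} \frac{\delta_i}{\sigma_i}$$
$$\Rightarrow \left(\frac{n}{n_i}\right)^{1/2} = \frac{\delta_i}{\sigma_i}\, (\delta' \Sigma^{-1} \delta)^{-1/2}.$$
The authors proved that $\delta' \Sigma^{-1} \delta \ge \delta_i^2 / \sigma_i^2$, for all i, so the sample size $n \le \min(n_i)$.
This implies that their test is more powerful than the univariate test on any one
endpoint.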
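As a rough illustration of the size of this saving, consider an assumed special case that is not taken from Tang, Gnecco and Geller: two endpoints with unit variances, correlation $\rho$, and a vector of relative differences $\delta = (1, 1)'$, where $\Sigma$ denotes the common covariance matrix and $\sigma_i$ the standard deviation of endpoint i.

```latex
\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad
\Sigma^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}, \qquad
\delta' \Sigma^{-1} \delta = \frac{2 - 2\rho}{1 - \rho^2} = \frac{2}{1+\rho}
\quad\Longrightarrow\quad
\frac{n}{n_i} = \frac{\delta_i^2 / \sigma_i^2}{\delta' \Sigma^{-1} \delta} = \frac{1+\rho}{2} \le 1 .
```

Under these assumptions, uncorrelated endpoints ($\rho = 0$) would halve the required sample size, while $\rho = 0.5$ would give $n / n_i = 0.75$; the saving vanishes as $\rho$ approaches 1.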
Some of the limitations of the proposed procedure reside in the fact that the data
must be normally distributed with known covariance matrix. Furthermore, early
stopping may be based on small sample size and the handling of missing data needs
more investigation. To summarize, this approach simply applies a group sequential
design to an O'Brien type univariate linear combination of the multivariate outcomes.
1.3 Proposed research
The objective of this research is to develop a new procedure to analyze multiple
endpoints that will take advantage of the techniques used to implement group sequential
designs combined with the advantages of multivariate testing. Although these designs
will be discussed in terms of their application to randomized clinical trials, the procedure
presented can be used in other contexts where multiple endpoints are analyzed.
In Chapter 2, this dissertation will compare the critical regions and the contours
for the power of each of three common procedures: Bonferroni's inequality, Hotelling's
$T^2$ and O'Brien's test. The efficiency and robustness of these procedures will be
compared as a function of the direction of the alternative hypotheses.
A new test is proposed in Chapter 3 that uses information from the interim
analysis in a two-stage group sequential design to form the rejection boundaries at the
second stage. The test uses a Hotelling $T^2$ rejection region at the end of the first stage
and an O'Brien-type procedure at the end of the second stage. The distribution of the
proposed test will be derived and its power and efficiency will be compared to common
procedures. A modification of this test that allows for early stopping i.e. acceptance of
the null hypothesis at the interim analysis will also be presented.
In Chapter 4, calculation of the new test statistic is illustrated by applying it to
two example data sets. The implications of this dissertation and suggestions for future
research are discussed in Chapter 5.
CHAPTER 2
EVALUATION AND COMPARISON OF
THREE COMMON MULTIVARIATE TESTING PROCEDURES
2.1 Introduction
Studies are often designed to compare two or more groups with respect to one or
more variables. For clarity, we will restrict ourselves to two groups and two variables
for most of this chapter. Define $\hat{\theta}_i$ as a statistic computed from the observed data and
$\sigma_i$ as the standard deviation of $\hat{\theta}_i$, $i = 1, 2$. The null hypothesis to be tested is of the form
$H_0: \theta = \theta_0$ where $\theta' = (\theta_1, \theta_2)$ is generally the differences between the means of the two
groups for each variable but could also be another appropriate parameter. Let $Z' = (Z_1, Z_2)$
where $Z_i = (\hat{\theta}_i - \theta_{i0}) / \sigma_i$. $Z_1$ and $Z_2$ will be assumed to be distributed normally or at
least approximately normally distributed based on large sample theory, with zero means
and unit variances under the null hypothesis. One major concern is the power of the
test, i.e. the probability of rejecting $H_0$ given $H_a$ is true, based on the true values $\theta_{1a}$
and $\theta_{2a}$. But what does 'rejecting $H_0$' mean in the case of multivariate samples?
Rejection of $H_0: \theta_1 = \theta_2 = 0$ leads to eight possible alternatives: $\theta_1 > 0$ and $\theta_2 > 0$, or
$\theta_1 > 0$ and $\theta_2 = 0$, or $\theta_1 > 0$ and $\theta_2 < 0$, or $\theta_1 = 0$ and $\theta_2 > 0$, or $\theta_1 = 0$ and $\theta_2 < 0$, or
$\theta_1 < 0$ and $\theta_2 < 0$, or $\theta_1 < 0$ and $\theta_2 = 0$, or $\theta_1 < 0$ and $\theta_2 > 0$. Should all the possible
alternatives be looked at jointly or should more power be allowed to detect some of
them considered of prime interest? In this chapter, the power for these different
alternatives will be studied for each of the three procedures described in Chapter 1:
Bonferroni's inequality, Hotelling's $T^2$ and O'Brien's test.
Let $1 - \beta$ denote the power with a type I error level $\alpha$. The values of $\theta_{1a}$ and $\theta_{2a}$
for which the power is 50%, 80% and 90% with a significance level $\alpha = 0.05$ will be
derived. Initially, in Sections 2.2 to 2.4, $Z_1$ and $Z_2$ will be assumed independent. The
case where $Z_1$ and $Z_2$ are correlated will be discussed in Section 2.6. The critical regions
for each test will be shown as well as the contours for the power. These will form the
basis for describing the advantages and disadvantages of existing procedures.
2.2 Bonferroni's inequality
Using Bonferroni's correction, we will reject $H_0 = \{H_1, H_2\}$ if any $p_i \le \alpha/2$, $i = 1, 2$, with $\alpha/2 = 0.025$
for a two-sided test. The critical value of the test is $Z_{0.0125} = 2.24$. The square in
Figure 2.2.1 represents the critical region of the test. Letting $Z_1$ and $Z_2$ be defined as
previously, the power is given by
$$\Pr(\text{rejecting } H_0 \mid H_a \text{ true})$$
$$= \Pr(Z_1 > 2.24 \text{ or } Z_1 < -2.24 \text{ or } Z_2 > 2.24 \text{ or } Z_2 < -2.24 \mid H_a \text{ true})$$
$$= 1 - \Pr(-2.24 < Z_1 < 2.24 \text{ and } -2.24 < Z_2 < 2.24 \mid H_a \text{ true}) \tag{2.2.1}$$
$$= 1 - [\Pr(-2.24 < Z_1 < 2.24 \mid H_a \text{ true}) \cdot \Pr(-2.24 < Z_2 < 2.24 \mid H_a \text{ true})]$$
$$= 1 - [\Pr(-2.24 - \theta_{1a} < Z_1^* < 2.24 - \theta_{1a}) \cdot \Pr(-2.24 - \theta_{2a} < Z_2^* < 2.24 - \theta_{2a})]$$
where $Z_i^* = Z_i - \theta_{ia}$ is a random variable with standard normal distribution and
$$\Pr(-2.24 < Z_i < 2.24 \mid H_a \text{ true}) = \Pr(-2.24 - \theta_{ia} < Z_i - \theta_{ia} < 2.24 - \theta_{ia}) = \Pr(-2.24 - \theta_{ia} < Z_i^* < 2.24 - \theta_{ia}),$$
where $\theta_{ia} = \mu_{ia}^{(1)} - \mu_{ia}^{(2)}$ under $H_a$.
Figure 2.2.2 shows the contours for powers of 50%, 80% and 90% with $\alpha = 0.05$.
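The product form of (2.2.1) for two independent statistics is easy to evaluate directly. The following is a minimal sketch using only the Python standard library; the function name `bonferroni_power` and its arguments are illustrative, and the critical value is computed exactly rather than rounded to 2.24.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def bonferroni_power(theta1, theta2, alpha=0.05):
    """Power of the two-endpoint, two-sided Bonferroni test when Z1 and Z2 are
    independent with unit variance and means theta1 and theta2."""
    crit = STD.inv_cdf(1 - alpha / 4)  # alpha/2 per test, alpha/4 per tail

    def accept(theta):
        # Pr(-crit < Z < crit) when Z ~ N(theta, 1)
        return STD.cdf(crit - theta) - STD.cdf(-crit - theta)

    return 1 - accept(theta1) * accept(theta2)

print(bonferroni_power(0.0, 0.0))  # size of the test under H0
print(bonferroni_power(2.0, 2.0))  # power at one alternative
```

Under the null hypothesis the sketch returns 1 - 0.975^2, slightly below 0.05, which reflects the mild conservatism of the Bonferroni bound for independent tests.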
Figure 2.2.1 Critical regions in the ($Z_1$, $Z_2$) plane: Bonferroni (B), Hotelling (H), O'Brien (O)
Figure 2.2.2 Contours for powers of 50%, 80% and 90%: Bonferroni (B), Hotelling (H), O'Brien (O)
For a prespecified power and a fixed value of $\theta_{1a}$, a computer search was done to
determine the positive value of $\theta_{2a}$ that would ensure the desired power. This was
repeated for $0 \le \theta_{1a} \le 4$ to generate the points $(\theta_{1a}, \theta_{2a})$ in the first quadrant. The
points in the other quadrants were obtained by symmetry relations with the ones
already calculated. All the points $(\theta_{1a}, \theta_{2a})$ on the almost circular contours satisfy the
equations above with power 50%, 80% and 90% respectively. Power contours for Simes'
modification of Bonferroni's correction were also determined. The results were so similar
that the contours of both tests coincide in Figure 2.2.2.
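The dissertation does not say which search algorithm was used. One simple possibility is a bisection on $\theta_{2a}$, which works because the power increases with $\theta_{2a}$ on the positive axis. The sketch below is therefore an assumption, as are the names `search_theta2`, `upper` and `tol`; it returns `None` when the target power is already reached at $\theta_{2a} = 0$, so that no positive solution exists.

```python
from statistics import NormalDist

STD = NormalDist()
CRIT = STD.inv_cdf(1 - 0.05 / 4)  # two-sided Bonferroni critical value, about 2.24

def bonferroni_power(theta1, theta2):
    """Power for two independent unit-variance statistics with means theta1, theta2."""
    def accept(theta):
        return STD.cdf(CRIT - theta) - STD.cdf(-CRIT - theta)
    return 1 - accept(theta1) * accept(theta2)

def search_theta2(theta1, target, upper=10.0, tol=1e-8):
    """Bisection for the positive theta2 giving `target` power at a fixed theta1."""
    if bonferroni_power(theta1, 0.0) >= target:
        return None  # the contour does not reach this theta1 in the first quadrant
    low, high = 0.0, upper
    while high - low > tol:
        mid = (low + high) / 2
        if bonferroni_power(theta1, mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

# First-quadrant points of the 80% contour on a coarse grid of theta1 values.
contour = [(t1 / 4, search_theta2(t1 / 4, 0.80)) for t1 in range(0, 17)]
```

Points in the other quadrants follow by changing the signs of the coordinates, as described in the text.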
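2.3 Hotelling's $T^2$
For the two sample problem with known variance-covariance matrix, let $\{y_i^{(1)}\}$,
$i = 1, \ldots, N_1$ be a sample from a $N(\mu^{(1)}, \Sigma)$ population and $\{y_i^{(2)}\}$, $i = 1, \ldots, N_2$ be a
sample from a population $N(\mu^{(2)}, \Sigma)$. Define
$$\bar{y}^{(1)} = \frac{1}{N_1} \sum_{i=1}^{N_1} y_i^{(1)} \tag{2.3.1}$$
and
$$\bar{y}^{(2)} = \frac{1}{N_2} \sum_{i=1}^{N_2} y_i^{(2)}. \tag{2.3.2}$$
So, for $H_0: \theta = \mu^{(1)} - \mu^{(2)} = 0$, the confidence region, when $\Sigma$ is known, is defined as
$$\frac{N_1 N_2}{N_1 + N_2} (\bar{y}^{(1)} - \bar{y}^{(2)})' \Sigma^{-1} (\bar{y}^{(1)} - \bar{y}^{(2)}) \le \chi^2_p(\alpha). \tag{2.3.3}$$
Asymptotically, for p = 2, $\Sigma = I$ and $N_1 = N_2 = N$, the confidence region is
$$Z'Z = \frac{N}{2} (\bar{y}^{(1)} - \bar{y}^{(2)})' (\bar{y}^{(1)} - \bar{y}^{(2)}) \le \chi^2_2 = 5.99 \text{ for } \alpha = 0.05. \tag{2.3.4}$$
This inequality is the interior and boundary of a circle with center at (0,0) and
radius $\sqrt{5.99}$ as shown in Figure 2.2.1.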
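Under $H_a$,
$$Z'Z = \frac{N_1 N_2}{N_1 + N_2} (\bar{y}^{(1)} - \bar{y}^{(2)})' \Sigma^{-1} (\bar{y}^{(1)} - \bar{y}^{(2)}) \sim \chi^2_{p,nc} \tag{2.3.5}$$
where the non-centrality parameter nc is
$$nc = \frac{N_1 N_2}{N_1 + N_2} (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}). \tag{2.3.6}$$
Asymptotically, for p = 2, $\Sigma = I$ and $N_1 = N_2 = N$,
$$nc = \frac{N}{2} (\mu^{(1)} - \mu^{(2)})' (\mu^{(1)} - \mu^{(2)}). \tag{2.3.7}$$
So, for all the values $(\theta_{1a}, \theta_{2a})$ where $\theta_{1a} = \sqrt{N/2}\, (\mu_{1a}^{(1)} - \mu_{1a}^{(2)})$ and
$\theta_{2a} = \sqrt{N/2}\, (\mu_{2a}^{(1)} - \mu_{2a}^{(2)})$ such that $\theta_{1a}^2 + \theta_{2a}^2 = nc$, the probability of rejecting $H_0$ given $(\theta_{1a}, \theta_{2a})$ is
$$1 - \Pr(\chi^2_{2,nc} < \chi^2_2), \quad nc = \theta_{1a}^2 + \theta_{2a}^2. \tag{2.3.8}$$
Power contours for Hotelling's test can be seen in Figure 2.2.2. A computer
search determined the distance from the origin for which powers of 50%, 80% and 90%
were respectively obtained. All points on a circle with radius equal to this distance have
the same power.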
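The power in (2.3.8) and the radius search can be sketched without external libraries, because for two degrees of freedom the non-central chi-square distribution is a Poisson mixture of central chi-square distributions with even degrees of freedom, each of which has a closed-form distribution function. The function names, the truncation at 200 terms and the bisection are assumptions made for this illustration, not the method used in the dissertation.

```python
from math import exp

def chi2_cdf_even_df(x, df):
    """Central chi-square CDF for even df: 1 - exp(-x/2) * sum_{m < df/2} (x/2)^m / m!."""
    half = x / 2.0
    term, total = 1.0, 0.0
    for m in range(df // 2):
        if m > 0:
            term *= half / m
        total += term
    return 1.0 - exp(-half) * total

def hotelling_power_2d(theta1, theta2, crit=5.99, terms=200):
    """1 - Pr(noncentral chi-square(2, nc) < crit), with nc = theta1^2 + theta2^2."""
    lam = (theta1 ** 2 + theta2 ** 2) / 2.0
    weight = exp(-lam)  # Poisson(lam) probability of j = 0
    cdf = 0.0
    for j in range(terms):
        if j > 0:
            weight *= lam / j  # Poisson probability of j from that of j - 1
        cdf += weight * chi2_cdf_even_df(crit, 2 + 2 * j)
    return 1.0 - cdf

def radius_for_power(target, upper=10.0, tol=1e-8):
    """Bisection for the distance from the origin at which the power equals `target`."""
    low, high = 0.0, upper
    while high - low > tol:
        mid = (low + high) / 2
        if hotelling_power_2d(mid, 0.0) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

radii = {power: radius_for_power(power) for power in (0.5, 0.8, 0.9)}
```

Because the power depends on $(\theta_{1a}, \theta_{2a})$ only through $\theta_{1a}^2 + \theta_{2a}^2$, a single search along one axis gives the radius of each circular contour.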
2.4 O'Brien's Test
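For the two sample problem with the previous notation, p = 2, $\hat{\Sigma} = I$, O'Brien's
test for the null hypothesis $H_0: \mu^{(1)} = \mu^{(2)}$ is asymptotically
$$F = \frac{\sum_i n_i \{J' \hat{\Sigma}^{-1} (\bar{Y}_{i.} - \bar{Y}_{..})\}^2}{(I-1)\, J' \hat{\Sigma}^{-1} J} = \frac{n \{(\bar{y}_1^{(1)} - \bar{y}_1^{(2)}) + (\bar{y}_2^{(1)} - \bar{y}_2^{(2)})\}^2}{2 \cdot 2}. \tag{2.4.1}$$
The confidence region is
$$|Z_1 + Z_2| = \sqrt{n/2}\, |(\bar{y}_1^{(1)} - \bar{y}_1^{(2)}) + (\bar{y}_2^{(1)} - \bar{y}_2^{(2)})| \le \sqrt{2\, F_{0.95,1,\infty}} = \sqrt{2 (3.8416)} = 2.77. \tag{2.4.2}$$
This inequality represents the region included between the two lines
$$Z_2 = -Z_1 + 2.77 \quad \text{and} \quad Z_2 = -Z_1 - 2.77. \tag{2.4.3}$$
All the points lying on the lines of equations (2.4.3) have power 0.5, i.e. all $(\theta_{1a}, \theta_{2a})$
satisfying
$$\theta_{2a} = -\theta_{1a} \pm 2.77. \tag{2.4.4}$$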
Similarly, all the points lying on the two parallel lines, perpendicular to the 45°
line through the origin in quadrants I and III and equidistant from the origin, will have the
same power. To calculate this power, consider new axes U and V centered at a
given point $(\theta_{1a}, \theta_{2a})$ on the 45° line, as in Figure 2.4.1. It is easy to verify that the length of the new
axis U from the origin to the line with power 0.5 is 1.96. So, the power at $(\theta_{1a}, \theta_{2a})$ is
$$1 - \beta = \Pr(\text{rejecting } H_0 \mid (\theta_{1a}, \theta_{2a})) = \Pr\left(U > 1.96 - \sqrt{\theta_{1a}^2 + \theta_{2a}^2}\right) + \Pr\left(U < -\left(1.96 + \sqrt{\theta_{1a}^2 + \theta_{2a}^2}\right)\right). \tag{2.4.5}$$
All points on the line perpendicular to the 45° line through such a point have this same power.
For example, to ensure a power of 0.8, $(\theta_{1a}, \theta_{2a}) = (1.99, 1.99)$:
$$\Pr\left(U > 1.96 - \sqrt{\theta_{1a}^2 + \theta_{2a}^2}\right) + \Pr\left(U < -\left(1.96 + \sqrt{\theta_{1a}^2 + \theta_{2a}^2}\right)\right) = 0.8$$
$$\Rightarrow 1.96 - \sqrt{\theta_{1a}^2 + \theta_{2a}^2} = -0.85$$
$$\Rightarrow \theta_{1a} = \theta_{2a} = 1.99.$$
Figure 2.4.1 New axes to determine the power of O'Brien's test
All points on $\theta_{2a,1-\beta} = -\theta_{1a,1-\beta} + 3.98$ and $\theta_{2a,1-\beta} = -\theta_{1a,1-\beta} - 3.98$ have
power 0.8 (Figure 2.2.2). Similarly, all points on $\theta_{2a,1-\beta} = -\theta_{1a,1-\beta} + 4.6$ and
$\theta_{2a,1-\beta} = -\theta_{1a,1-\beta} - 4.6$ have power 0.9.
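The power of O'Brien's test at an arbitrary alternative can be sketched as follows. The sketch rests on an assumption drawn from the rejection region $|Z_1 + Z_2| > 2.77$: the statistic $U = (Z_1 + Z_2)/\sqrt{2}$ is standard normal apart from a mean shift equal to the projection of the alternative onto the 45° line, which coincides with the distance used in (2.4.5) when the alternative lies on that line. The function name `obrien_power` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()

def obrien_power(theta1, theta2, z_crit=1.96):
    """Power of the bivariate O'Brien test for independent unit-variance Z1, Z2
    with means theta1 and theta2."""
    # Mean of U = (Z1 + Z2) / sqrt(2): projection of (theta1, theta2) on the diagonal.
    shift = (theta1 + theta2) / sqrt(2)
    return (1 - STD.cdf(z_crit - shift)) + STD.cdf(-z_crit - shift)

print(obrien_power(1.99, 1.99))  # point on the diagonal from the example in the text
print(obrien_power(3.98, 0.0))   # another point on the line theta2 = -theta1 + 3.98
print(obrien_power(2.0, -2.0))   # effects in opposite directions
```

The first two calls return the same value, close to 0.8, because both points lie on the same line perpendicular to the diagonal. The third returns the size of the test, since the projection onto the diagonal is zero.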
2.5 Comparison of the three procedures
The comparison will first be done in a bivariate context assuming independence,
although it can be generalized to more than two correlated variables. Of the three tests
previously described, O'Brien's test has the greatest power to detect an alternative
hypothesis that lies on the diagonal in quadrants I and III. This attractive property
holds for all $(\theta_{1a}, \theta_{2a})$ within a certain symmetric distance from the diagonal as shown
in Figure 2.5.1. The regions 'O' were determined from the intersections of O'Brien's
and Hotelling's contours for power varying from 5% to 95%. The set of $(\theta_{1a}, \theta_{2a})$ for
which the power of O'Brien's test is better than for Hotelling's and Bonferroni's tests is
indicated in the areas 'O'. The regions 'B' were obtained in a similar way but the
intersections of Bonferroni's and Hotelling's contours were considered. If the power is
greater than 27%, Bonferroni's test is better within the regions 'B'. For a fixed
alternative $(\theta_{1a}, \theta_{2a})$ outside the regions 'O', Hotelling's test will have better power
than Bonferroni's and O'Brien's procedures. If the truth lies on the diagonal in
quadrants II and IV, O'Brien's power is then at its minimum, i.e. 5%. It should be
emphasized that O'Brien's test is optimal when all variables are believed to be affected
in exactly the same direction and the same magnitude while Hotelling's $T^2$ is not affected
by the direction of the effect of the variables. This property makes the $T^2$ a more
robust approach not only when the truth lies on the diagonal through quadrants II and
IV but for all $(\theta_{1a}, \theta_{2a})$ included in the regions 'H' in Figure 2.5.1. However, if the
power is at least 27% and if one of the two statistics is close to zero while the other one
reaches its maximum value, then Bonferroni's procedure will have better chances of
detecting a difference. It can be seen that the intersection points in Figure 2.2.2 for powers
of 50%, 80% and 90% fall on the contours of the regions observed in Figure 2.5.1.
Figure 2.5.1 Regions in which each test has greater power than the other two tests: Bonferroni (B), Hotelling (H), O'Brien (O)
Showing the regions for which each test is the best does not indicate how much
better each test is. To quantify the differences in power, a region covering -90° to 90°,
from the diagonal of quadrants II and IV was considered. The half circle from
Hotelling's contours included in this region was divided into 180 one-degree sections
(Figure 2.5.2). The values of $(\theta_{1a}, \theta_{2a})$ for each one-degree section on Hotelling's
contour at 50% power were determined and corresponding Bonferroni's and O'Brien's
powers were evaluated. Figure 2.5.2.1 shows the power for all three tests when
Hotelling's is 50%. Figures 2.5.2.2 and 2.5.2.3 correspond to Hotelling's powers of 80%
and 90% respectively.
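A sketch of this angle-by-angle comparison is given below for independent statistics. It assumes that the angle is measured from the diagonal of quadrants I and III, so that 0° is the direction in which both effects are equal and ±90° the directions in which they are opposite. It takes the non-centrality value 9.63 quoted in this chapter for 80% power of Hotelling's test with two endpoints as given. All function names are hypothetical.

```python
from math import sqrt, cos, sin, radians
from statistics import NormalDist

STD = NormalDist()

def point_on_contour(nc, angle_deg):
    """Point at distance sqrt(nc) from the origin, `angle_deg` away from the
    diagonal of quadrants I and III."""
    radius = sqrt(nc)
    phi = radians(45.0 + angle_deg)
    return radius * cos(phi), radius * sin(phi)

def obrien_power(theta1, theta2, z_crit=1.96):
    shift = (theta1 + theta2) / sqrt(2)  # projection on the diagonal
    return (1 - STD.cdf(z_crit - shift)) + STD.cdf(-z_crit - shift)

def bonferroni_power(theta1, theta2, alpha=0.05):
    crit = STD.inv_cdf(1 - alpha / 4)
    def accept(theta):
        return STD.cdf(crit - theta) - STD.cdf(-crit - theta)
    return 1 - accept(theta1) * accept(theta2)

def power_profile(nc, step=1):
    """(angle, Bonferroni power, O'Brien power) along a Hotelling contour."""
    rows = []
    for angle in range(-90, 91, step):
        theta1, theta2 = point_on_contour(nc, angle)
        rows.append((angle, bonferroni_power(theta1, theta2), obrien_power(theta1, theta2)))
    return rows

profile_80 = power_profile(9.63)  # contour on which Hotelling's power is 80%
```

Plotting the second and third columns of `profile_80` against the first, with a horizontal line at 0.8 for Hotelling's test, gives curves of the kind described for Figure 2.5.2.2.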
As expected, the curves are symmetric about zero degrees and the power of
O'Brien's test is 5% at -90° and 90°, i.e. on the diagonal in quadrants II and IV.
O'Brien's test has maximum power at 0°, on the diagonal of quadrants I and III. This
maximal improvement is only 11% when the power of Hotelling's test is 50% and
decreases to 4.5% when the power of Hotelling's test is 90%. It is also obvious from
these figures that although O'Brien's test has better power when all variables are
affected in the same direction, it deteriorates sharply to achieve the lowest power of 5%
when variables are affected in opposite directions. Bonferroni's largest improvement
over Hotelling's $T^2$ is small (about 1%), only occurs for powers greater than 27% and is
limited to a few values of $(\theta_{1a}, \theta_{2a})$.
In conclusion, O'Brien's test has better power when the outcomes are in exactly
the same direction but is by far the worst to detect a difference when the outcomes are
affected in opposite directions. Hotelling's constant power dominates Bonferroni's
conservatism except for a narrow range of $(\theta_{1a}, \theta_{2a})$ where the improvement due to
Bonferroni is much smaller than the one gained by Hotelling's when its power is
greater.
Figure 2.5.2 One-degree sections for 80% power contour of Hotelling's test
Figure 2.5.2.1 Power (vertical axis) at each one-degree section (horizontal axis: angle in degrees, -90 to 90) for alternative hypotheses for which the power of Hotelling's test is 50%: Bonferroni (B), Hotelling (H), O'Brien (O)
Figure 2.5.2.2 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%: Bonferroni (B), Hotelling (H), O'Brien (O)
Figure 2.5.2.3 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%: Bonferroni (B), Hotelling (H), O'Brien (O)
To compare the power of O'Brien's test with that of Hotelling's $T^2$ when more
than two uncorrelated outcomes are studied, different alternatives were considered. The
non-centrality parameters (NC), when the power of Hotelling's test is 80%, were
evaluated with the $\chi^2$ distribution for 2, 3, 4, 5 and 10 endpoints; their values were 9.63,
10.90, 11.94, 12.83 and 16.25 respectively. The cosine of the angle $\gamma$ between $\theta_a^*$, the
vector of alternative hypotheses, and the diagonal $\mathbf{1}$ was determined. For example, for
p = 2 endpoints:
$$\cos \gamma = \frac{\theta_{1a}^* + \theta_{2a}^*}{\sqrt{2}\, \sqrt{\theta_{1a}^{*2} + \theta_{2a}^{*2}}}$$
where $\theta_{1a}^*$ and $\theta_{2a}^*$ are the parameter values under the alternative hypothesis; $(\theta_{1a}^*, \theta_{2a}^*)$
can be any point on the axis V in Figure 2.4.1 and $\sqrt{NC} \cos \gamma = \sqrt{\theta_{1a}^2 + \theta_{2a}^2}$ on
the same figure. So, $\cos \gamma = 1$ for the alternative hypothesis where all $\theta_{ia}^*$, $i = 1, 2, \ldots, p$,
are equal and $\gamma = 0°$. If (p-1) $\theta_{ia}^*$'s are equal and one $\theta_{ia}^* = 0$, $\cos \gamma = \sqrt{(p-1)/p}$. If (p-2)
$\theta_{ia}^*$'s are equal and two $\theta_{ia}^*$'s = 0, $\cos \gamma = \sqrt{(p-2)/p}$, so if (p-r) $\theta_{ia}^*$'s are equal and r $\theta_{ia}^*$'s =
0 then $\cos \gamma = \sqrt{(p-r)/p}$, r < p. Once $\cos \gamma$ was determined, the power was evaluated as
follows:
$$\text{power} = 1 - \Phi(1.96 - \sqrt{NC} \cos \gamma) + \Phi(-1.96 - \sqrt{NC} \cos \gamma),$$
and results are presented in Table 2.5.1. When all the $\theta_{ia}$'s are equal, the power of
O'Brien's test increases as the number of endpoints increases. However, as the number
of $\theta_{ia}$'s that are equal decreases, the power also decreases; it can be as low as 24.7% for
ten endpoints. When only half the $\theta_{ia}$'s are equal, the power is about 80% for ten
endpoints but it drops to 68.6% when four endpoints are considered and even lower to
59.2% with two endpoints.

Table 2.5.1 Power of O'Brien's test for different alternative hypotheses when the
power of Hotelling's $T^2$ is 80% and the number of endpoints is p = 2, 3, 4, 5 and 10.

                                                    Number of endpoints
Alternative hypotheses                      p = 2     p = 3     p = 4     p = 5     p = 10
all $\theta_{ia}$'s are equal               87.3%     91.0%     93.3%     94.8%     98.1%
                                            (0°)(*)   (0°)      (0°)      (0°)      (0°)
(p-1) $\theta_{ia}$'s are equal &           59.2%     76.9%     84.9%     89.3%     96.9%
one $\theta_{ia}$ = 0                       (45°)     (35°)     (30°)     (27°)     (18°)
(p-2) $\theta_{ia}$'s are equal &                     47.9%     68.6%     79.2%     95.0%
two $\theta_{ia}$'s = 0                               (55°)     (45°)     (39°)     (27°)
(p-3) $\theta_{ia}$'s are equal &                               40.8%     62.0%     92.1%
three $\theta_{ia}$'s = 0                                       (60°)     (51°)     (33°)
(p-4) $\theta_{ia}$'s are equal &                                         36.0%     87.7%
four $\theta_{ia}$'s = 0                                                  (63°)     (39°)
(p-9) $\theta_{ia}$'s are equal &                                                   24.7%
nine $\theta_{ia}$'s = 0                                                            (72°)

(*) Angle between $\theta_a^*$, the vector of alternative hypotheses, and the diagonal $\mathbf{1}$.
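The entries of Table 2.5.1 can be approximately reproduced from the power formula and the non-centrality values quoted above. The sketch below takes those NC values as given rather than recomputing them; the dictionary `NC_AT_80` and the function `obrien_power_entry` are illustrative names.

```python
from math import sqrt, acos, degrees
from statistics import NormalDist

STD = NormalDist()

# Non-centrality parameters quoted in the text for 80% power of Hotelling's test.
NC_AT_80 = {2: 9.63, 3: 10.90, 4: 11.94, 5: 12.83, 10: 16.25}

def obrien_power_entry(p, r, z_crit=1.96):
    """Power of O'Brien's test and angle (degrees) when p - r of the p effects
    are equal and the remaining r are zero (r < p)."""
    cos_gamma = sqrt((p - r) / p)
    shift = sqrt(NC_AT_80[p]) * cos_gamma
    power = 1 - STD.cdf(z_crit - shift) + STD.cdf(-z_crit - shift)
    return power, degrees(acos(cos_gamma))

for p in sorted(NC_AT_80):
    for r in (0, 1, 2, 3, 4, 9):
        if r < p:
            power, angle = obrien_power_entry(p, r)
            print(f"p={p:2d} r={r}: power={100 * power:5.1f}%  angle={angle:4.0f} deg")
```

Small differences in the last printed digit relative to the table are possible, since the NC values are rounded to two decimals.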
In the following section, the same three tests will be examined with correlated
data. This will allow the comparison of their behavior when the data are assumed to be
from $N(0, \Sigma)$ populations.
2.6 Evaluation with correlated data.
In many clinical trials, it is common to find outcome variables which are
correlated. The assumption of independence used in the previous sections does not hold
anymore. By applying a transformation to the Zl and Z2 independent standard normal
variables, it is possible to obtain the critical regions and power contours for Hotelling's
and O'Brien's test with correlated data.
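Let $Z = (Z_1, Z_2)'$ be a vector of independent N(0,1) variables and $X = (X_1, X_2)'$
be $N(0, \Sigma)$ where
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$
Let $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ be a matrix such that $X = AZ$ and $\Sigma = AA'$.
$$\Rightarrow X_1 = aZ_1 + bZ_2, \quad X_2 = bZ_1 + cZ_2$$
$$\Rightarrow \mathrm{Var}(X_1) = (a^2 + b^2)\, \mathrm{Var}(Z_i) = 1, \quad \mathrm{Var}(X_2) = (b^2 + c^2)\, \mathrm{Var}(Z_i) = 1$$
$$\Rightarrow a = c. \tag{2.6.1}$$
$$\rho = \mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(aZ_1 + bZ_2,\, bZ_1 + cZ_2) = ab\, \mathrm{Cov}(Z_1, Z_1) + bc\, \mathrm{Cov}(Z_2, Z_2) = ab + bc = 2ab \quad (a = c) \tag{2.6.2}$$
$$(2.6.1) \text{ and } (2.6.2) \Rightarrow a = \sqrt{(1 + \sqrt{1 - \rho^2}) / 2}, \quad b = \rho / (2a). \tag{2.6.3}$$
So $A = \begin{pmatrix} a & b \\ b & a \end{pmatrix}$ where a and b are defined in (2.6.3) and $\Sigma = AA'$.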
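A quick numerical check of this transformation is sketched below. It assumes the symmetric form of A with diagonal entry $a = \sqrt{(1 + \sqrt{1 - \rho^2})/2}$ and off-diagonal entry $b = \rho/(2a)$, a target covariance with unit variances and correlation $\rho$, and $\rho \ne \pm 1$ so that a is not zero. The function names are illustrative.

```python
from math import sqrt

def correlation_transform(rho):
    """Entries (a, b) of the symmetric matrix A = [[a, b], [b, a]]."""
    a = sqrt((1 + sqrt(1 - rho ** 2)) / 2)
    b = rho / (2 * a)
    return a, b

def implied_covariance(a, b):
    """A A' for A = [[a, b], [b, a]]."""
    return [[a * a + b * b, 2 * a * b],
            [2 * a * b, a * a + b * b]]

def apply_transform(rho, z1, z2):
    """Map a point (z1, z2) of the independent scale to the correlated scale."""
    a, b = correlation_transform(rho)
    return a * z1 + b * z2, b * z1 + a * z2

for rho in (-0.9, -0.5, 0.5, 0.9):
    a, b = correlation_transform(rho)
    print(rho, implied_covariance(a, b))
```

For each of the four correlations used in this chapter, the printed matrix has ones on the diagonal and $\rho$ off the diagonal, up to rounding error. Applying `apply_transform` to the points of a critical region or contour drawn for independent statistics gives the corresponding curve for correlated data.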
The critical regions and power contours have been redefined using that
transformation for $\rho$ = -0.5, -0.9, 0.5 and 0.9. Figures 2.6.1.1 to 2.6.2.4 show the
transformed results. Bonferroni's critical regions are not affected by the correlation
structure but the powers were recalculated using an algorithm that provides bivariate
normal probabilities. The product of two univariate probabilities from (2.2.1) does not
hold with correlated data.
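The algorithm used for the bivariate normal probabilities is not specified in the text. One simple way to obtain them, sketched here as an assumption, is to condition on the first statistic and integrate numerically: given $X_1 = x$, $X_2$ is normal with mean $\theta_2 + \rho (x - \theta_1)$ and variance $1 - \rho^2$. The function name and the midpoint rule with 4000 steps are illustrative choices, and $|\rho| < 1$ is required.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()

def bonferroni_power_correlated(theta1, theta2, rho, alpha=0.05, steps=4000):
    """Power of the two-endpoint Bonferroni test when (X1, X2) is bivariate normal
    with means (theta1, theta2), unit variances and correlation rho (|rho| < 1)."""
    crit = STD.inv_cdf(1 - alpha / 4)
    cond_sd = sqrt(1 - rho ** 2)
    width = 2 * crit / steps
    accept = 0.0
    for k in range(steps):
        x = -crit + (k + 0.5) * width           # midpoint of a slice of [-crit, crit]
        cond_mean = theta2 + rho * (x - theta1)  # mean of X2 given X1 = x
        inner = (STD.cdf((crit - cond_mean) / cond_sd)
                 - STD.cdf((-crit - cond_mean) / cond_sd))
        accept += STD.pdf(x - theta1) * inner * width
    return 1 - accept

print(bonferroni_power_correlated(0.0, 0.0, 0.0))  # independent case under H0
print(bonferroni_power_correlated(0.0, 0.0, 0.9))  # strongly correlated case under H0
```

With $\rho = 0$ the result agrees with the product form of (2.2.1), which gives a check on the integration.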
Bonferroni's and Hotelling's power contours are flattened and rotated so that the
longer axis is in the direction of the correlation. O'Brien's regions do not rotate; they
are shifted towards the origin as $\rho$ decreases (Figures 2.6.2.1 to 2.6.2.4). As for the
uncorrelated data, O'Brien's power remains the greatest in the neighbourhood of the 45°
line of quadrants I and III although the range of values for which O'Brien's power is the
greatest varies widely with the degree of correlation. The more negatively correlated the
variables are, the wider the range is where O'Brien's power is the best. For $\rho$ = -0.9,
O'Brien's power surpasses Hotelling's for almost all the points included in the region
covering 180° from the diagonal of quadrants II and IV (Figures 2.6.3.2.1 to 2.6.3.2.3).
For positively correlated variables, the gain in power with O'Brien's test over Hotelling's
$T^2$ is lost with only a slight deviation from the 45° line of quadrants I and III
(Figures 2.6.3.3.1 to 2.6.3.4.3). The maximum gain regardless of the correlation and the
value of Hotelling's power (50%, 80% or 90%) does not exceed 11%. Bonferroni's power
improves around 0° with a positive correlation but greatly deteriorates when the
variables are negatively correlated (Figures 2.6.3.1.1 to 2.6.3.2.3). When $\rho$ = -0.9, its
power remains close to 10% for all the values of $(\theta_{1a}, \theta_{2a})$ in the first quadrant. The
same low power is observed in quadrants II and IV when $\rho$ = 0.9. The conclusions
reached when the variables were not correlated remain the same. The gain in power
from O'Brien's or Bonferroni's test over Hotelling's $T^2$ remains overall small in
comparison to the improvement achieved when Hotelling's power is the greatest of the
three. However, O'Brien's test always has the best power to detect a difference when
both variables are affected in exactly the same direction and the same magnitude. It is
that property of O'Brien's test combined with the robustness of Hotelling's $T^2$ that
motivated the investigation of a new test. It will be presented and its distribution will
be derived in Chapter 3.
41
4
3
2
Z20
-1
0
-2
-3
-4 -3 -2 -1 o
Z1
2 3 4
Figure 2.6.1.1 Critical regions, rho = -0.5
Bonferroni (B) Hotelling (H) O'Brien (0)
42
4
3
2
o
-1
-2
-3
-4 "'r-----r--,--...,.-----,--,----.---,---,.-
-4 -3 -2 -1 o 1 2 3 4
Figure 2.6.1.2 Critical regions, rho = -0.9Bonferroni (B) Hotelling (H) O'Brien (0)
43
4
3
2
o
-1
-2
-3
-4 ';-----r-----,.---.,.-----r---.----,------y-----,
-4 -3 -2 -1 o 2 3 4 .
Figure 2.6.1.3 Critical regions, rho = 0.5Bonferroni (B) Hotelling (H) O'Brien (0)
44
4
3
2
o
-1
-2
-3
B
-4 -3 -2 -1 o 2 3 4
Figure 2.6.1.4 Critical regions, rho = 0.9Bonferroni (B) Hotelling (H) O'Brien (0)
45
6
5
4
3
2
9 200
-1
-2
-3
-4
-5
-6
-6 -5 -4 -3 -2 -1 0
() 10
23456
Figure 2.6.2.1 Contours for powers of 50%, 80% and90%, rho = -0.5Bonferroni (B) Hotelling (H) O'Brien (0)
46
6
5
4
3
2
() 200
-1
-2
-3
-4
-5
-6
-6 -5 -4 -3 -2 -1 0 23456
Figure 2.6.2.2 Contours for powers of 50%, 80% and 90%rho = -0.9Bonferroni (B) Hotelling (H) O'Brien (0)
47
6
B
.............:; 0
H
2
3
4
5
8 200
-1
-2
-3
-4
-5
-6
-6 -5 -4 -3 -2 -1 0 23456
Figure 2.6.2.3 Contours for powers of 50%. 80% and 90%rho = 0.5Bonferroni (B) Hotelling (H) O'Brien (0)
48
6
5
4
3
2
e200
-1
-2
-3
-4
-5
-6
·-6 -5 -4 -3 -2 -1 0
8 10
23456
Figure 2.6.2.4 Contours for powers of 50%. 80% and 90%rho = 0.9Bonferroni (B) Hotelling (H) O'Brien (0)
49
Figure 2.6.3.1.1 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = -0.5. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.1.2 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = -0.5. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.1.3 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = -0.5. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.2.1 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = -0.9. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.2.2 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = -0.9. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.2.3 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = -0.9. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.3.1 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = 0.5. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.3.2 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = 0.5. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.3.3 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = 0.5. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.4.1 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%, rho = 0.9. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.4.2 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%, rho = 0.9. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
Figure 2.6.3.4.3 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 90%, rho = 0.9. Bonferroni (B), Hotelling (H), O'Brien (O). (Power plotted against angle in degrees, -90 to 90.)
CHAPTER 3
TWO-STAGE GROUP SEQUENTIAL TEST WITH MULTIPLE ENDPOINTS
3.1 Introduction
In Chapter 2, O'Brien's test was shown to be the best test to detect a difference between treatments when the true effects of the variables are all in the direction predicted by O'Brien; however, the test becomes very inefficient with departure from this prespecified direction. A group sequential approach that allows the investigators to look at the direction of the data through an interim analysis will be investigated. It would relax the requirement of having all effects in the same direction with exactly the same effect sizes and would result in a very robust test.
In this chapter, a two-stage group sequential test for randomized clinical trials with multiple endpoints will be developed to test the hypothesis $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs $H_a: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$. Its distribution under the null and the alternative hypotheses will be derived. The proportion p of the whole sample after which the interim analysis should be done will then be examined.
3.2 The new test statistic L
Consider a two-stage group sequential design with the interim analysis performed after the accrual of $n_1$ subjects. The final analysis would take place after $n_2$ additional subjects have been included in the study, i.e. with $n = n_1 + n_2$ subjects. Throughout, $p = n_1/n$.
Let $X_1$ and $Y_1$ be normally distributed test statistics for two uncorrelated variables based on data from the first accrual period, with mean zero and unit variance under the null hypothesis; let $X$ and $Y$ be the same test statistics based on the data from the whole trial. Let $\mathbf{V}_1$ be the vector from the origin to $(X_1, Y_1)$, the statistics calculated from the data of the first accrual period, $\mathbf{V}$ be the vector from the origin to $(X, Y)$, the statistics calculated from all the data at the end of the trial, and $\mathbf{V}_2 = (\mathbf{V} - \sqrt{n_1/n}\,\mathbf{V}_1)/\sqrt{n_2/n}$. O'Brien's test is based on the length of the projection of $\mathbf{V}$ on the diagonal through the first and third quadrants, i.e. $\mathbf{a}'\mathbf{V}$ where $\mathbf{a}' = (1/\sqrt{2}, 1/\sqrt{2})$. O'Brien's test can easily be extended to be optimal for detecting any hypotheses that lie on a specified line in the direction of $\mathbf{a}$, where $\mathbf{a}$ may be chosen by 'experts'. However, as was noted in Chapter 2, the test can perform extremely poorly if the true effect deviates in direction from the chosen $\mathbf{a}$. Of particular concern would be the case where the data from an interim review suggest that the effect is 60° away from the direction chosen before the trial began. Do the investigators continue the trial with the knowledge that they are very likely looking in the wrong direction, or do they 'cheat' and change the test in mid-trial? The proposed test is based on choosing $\mathbf{a}$ to be in the direction indicated by $\mathbf{V}_1$ from the interim analysis. The new statistic is defined as:
$$L = \frac{\mathbf{V}_1'\mathbf{V}}{\sqrt{\mathbf{V}_1'\mathbf{V}_1}}. \qquad (3.2.1)$$
Replacing $\mathbf{V}$ by $\sqrt{n_1/n}\,\mathbf{V}_1 + \sqrt{n_2/n}\,\mathbf{V}_2$,
$$L = \sqrt{p}\,\sqrt{\mathbf{V}_1'\mathbf{V}_1} + \sqrt{1-p}\,\frac{\mathbf{V}_1'\mathbf{V}_2}{\sqrt{\mathbf{V}_1'\mathbf{V}_1}} = \sqrt{p}\,\sqrt{L_1} + \sqrt{1-p}\,L_2, \qquad (3.2.2)$$
where
$$L_1 = \mathbf{V}_1'\mathbf{V}_1 \qquad (3.2.3)$$
and
$$L_2 = \frac{\mathbf{V}_1'\mathbf{V}_2}{\sqrt{\mathbf{V}_1'\mathbf{V}_1}}. \qquad (3.2.4)$$
The distributions of $L_1$ and $L_2$ follow immediately. Letting $\mathbf{a}' = \mathbf{V}_1'/\sqrt{\mathbf{V}_1'\mathbf{V}_1}$, $L_2 = \mathbf{a}'\mathbf{V}_2$ where $\mathbf{V}_2 \sim N(\mathbf{0}, \mathbf{I})$ and $L_2 \sim N(0,1)$.
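The definition of L and its decomposition into a first-stage and a second-stage part can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function name `l_statistic` and the numerical values of the interim and final vectors are hypothetical and chosen only for illustration.

```python
import math

def l_statistic(v1, v, p):
    """Return (L, L1, L2) for interim vector v1, final vector v and p = n1/n."""
    l1 = sum(a * a for a in v1)  # L1 = V1'V1
    # Second-stage vector V2 = (V - sqrt(p) V1) / sqrt(1 - p)
    v2 = [(b - math.sqrt(p) * a) / math.sqrt(1.0 - p) for a, b in zip(v1, v)]
    # L2 is the projection of V2 on the direction of V1
    l2 = sum(a * b for a, b in zip(v1, v2)) / math.sqrt(l1)
    # L = V1'V / sqrt(V1'V1), as in (3.2.1)
    l = sum(a * b for a, b in zip(v1, v)) / math.sqrt(l1)
    return l, l1, l2

# Hypothetical interim and final statistics (illustration only).
v1 = [1.5, -0.5]
v = [2.4, -0.3]
p = 0.5
l, l1, l2 = l_statistic(v1, v, p)
# The same value rebuilt from the two stages, as in (3.2.2)
decomposed = math.sqrt(p) * math.sqrt(l1) + math.sqrt(1.0 - p) * l2
print(l, decomposed)
```

The two printed values agree up to rounding error, which is all the sketch is meant to show.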
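3.2.1 Density function of L under $H_0$
To derive the distribution of
$$L = \sqrt{p}\,\sqrt{L_1} + \sqrt{1-p}\,L_2 \qquad (3.2.5)$$
under the null hypothesis, let $K = L_1$. Then,
$$L_2 = \frac{L - \sqrt{pK}}{\sqrt{1-p}} \quad \text{and} \quad L_1 = K$$
and
$$J = \begin{vmatrix} \partial L_2/\partial L & \partial L_2/\partial K \\ \partial L_1/\partial L & \partial L_1/\partial K \end{vmatrix} = \begin{vmatrix} \dfrac{1}{\sqrt{1-p}} & -\dfrac{\sqrt{p}}{2\sqrt{K(1-p)}} \\ 0 & 1 \end{vmatrix} = \frac{1}{\sqrt{1-p}}.$$
$L_1$ has a chi-square distribution with 2 degrees of freedom and $L_2$ is distributed $N(0,1)$. $\mathbf{V}_1$ and $\mathbf{V}_2$ are independent because they are from independent samples, so $L_1$ and $L_2$ are also independent under $H_0$. So,
$$f_{L,K}(l,k) = \frac{1}{2\sqrt{2\pi}\,\sqrt{1-p}} \exp\left\{-\frac{k}{2} - \frac{(l-\sqrt{pk})^2}{2(1-p)}\right\}.$$
It follows that
$$f_L(l) = \frac{1}{2\sqrt{2\pi}\,\sqrt{1-p}} \int_0^\infty \exp\left\{-\frac{k}{2} - \frac{(l-\sqrt{pk})^2}{2(1-p)}\right\} dk$$
$$= \frac{1}{2\sqrt{2\pi}\,\sqrt{1-p}} \int_0^\infty \exp\left\{\frac{-k(1-p) - (l^2 - 2l\sqrt{pk} + pk)}{2(1-p)}\right\} dk$$
$$= \frac{1}{2\sqrt{2\pi}\,\sqrt{1-p}} \int_0^\infty \exp\left\{\frac{-k - l^2 + 2l\sqrt{pk}}{2(1-p)}\right\} dk$$
$$= \frac{e^{-l^2/2}}{2\sqrt{1-p}} \int_0^\infty \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{(\sqrt{k}-\sqrt{p}\,l)^2}{2(1-p)}\right\} dk. \qquad (3.2.6)$$
But
$$\int_0^\infty \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{(\sqrt{k}-\sqrt{p}\,l)^2}{2(1-p)}\right\} dk$$
can be evaluated with the following transformation. Let
$$u = \frac{\sqrt{k} - \sqrt{p}\,l}{\sqrt{1-p}};$$
it follows that $\sqrt{k} = \sqrt{1-p}\,u + \sqrt{p}\,l$ and
$$du = \frac{1}{2\sqrt{k}\,\sqrt{1-p}}\,dk.$$
If $k = 0$ then $u = -\sqrt{p}\,l/\sqrt{1-p}$ and if $k = \infty$ then $u = \infty$. We then have
$$\int_0^\infty \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{(\sqrt{k}-\sqrt{p}\,l)^2}{2(1-p)}\right\} dk = \int_{-\sqrt{p}\,l/\sqrt{1-p}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\; 2\left(\sqrt{1-p}\,u + \sqrt{p}\,l\right)\sqrt{1-p}\; du$$
$$= \frac{2(1-p)}{\sqrt{2\pi}} \int_{-\sqrt{p}\,l/\sqrt{1-p}}^{\infty} u\,e^{-u^2/2}\,du + 2\sqrt{p(1-p)}\;l \int_{-\sqrt{p}\,l/\sqrt{1-p}}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du$$
$$= \frac{2(1-p)}{\sqrt{2\pi}} \exp\left\{-\frac{p\,l^2}{2(1-p)}\right\} + 2\sqrt{p(1-p)}\;l\;\Phi\!\left(\frac{\sqrt{p}\,l}{\sqrt{1-p}}\right).$$
So,
$$f_L(l) = \sqrt{\frac{1-p}{2\pi}}\,\exp\left\{-\frac{l^2}{2(1-p)}\right\} + \sqrt{p}\;l\;e^{-l^2/2}\;\Phi\!\left(\frac{\sqrt{p}\,l}{\sqrt{1-p}}\right). \qquad (3.2.7)$$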
The density function of L for p = 0.1, 0.5 and 0.9 is graphed in Figures 3.2.1.1 to 3.2.1.3, respectively. Note that when p = 0, L ~ N(0,1) and when p = 1, L is equivalent to Hotelling's $T^2$.
The critical region, for a fixed p, was determined by finding the value $l_c$ for which $\int_{-l_c}^{l_c} f_L(l)\,dl = 0.95$. Euler's method of numerical integration was used iteratively and the results are shown in Table 3.2.1.1.
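The search for $l_c$ can be sketched in a few lines. The sketch below is an illustration under stated assumptions rather than the computation used for Table 3.2.1.1: it evaluates the density (3.2.7), uses composite Simpson's rule in place of Euler's method, and finds $l_c$ by bisection. The function names and the bracketing interval [0.5, 5] are hypothetical choices, and the formula requires p < 1.

```python
import math

def phi_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def density_l(l, p):
    """Null density of L as in (3.2.7); valid for 0 <= p < 1."""
    q = 1.0 - p
    normal_part = math.sqrt(q / (2.0 * math.pi)) * math.exp(-l * l / (2.0 * q))
    skew_part = math.sqrt(p) * l * math.exp(-l * l / 2.0) * phi_cdf(math.sqrt(p / q) * l)
    return normal_part + skew_part

def coverage(c, p, steps=2000):
    """Composite Simpson approximation of the integral of the density over [-c, c]."""
    h = 2.0 * c / steps
    total = density_l(-c, p) + density_l(c, p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density_l(-c + i * h, p)
    return total * h / 3.0

def critical_value(p, level=0.95, lo=0.5, hi=5.0):
    """Bisection for the value l_c whose coverage equals `level`."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if coverage(mid, p) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lc_zero = critical_value(0.0)   # p = 0: L is standard normal
lc_half = critical_value(0.5)   # compare with the p = 0.50 row of Table 3.2.1.1
print(lc_zero, lc_half)
```

Under these assumptions the p = 0 case should recover the usual two-sided normal critical value, and the p = 0.5 case should land close to the tabulated 2.3027.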
3.2.2 Power of L
3.2.2.1 Special case: $p = n_1/n = 0.5$ and $\mu_2 = 0$.
To evaluate the power of L, define $X_1 \sim N(\sqrt{n_1}\,\mu_1, 1)$, $Y_1 \sim N(\sqrt{n_1}\,\mu_2, 1)$, $X_2 \sim N(\sqrt{n_2}\,\mu_1, 1)$ and $Y_2 \sim N(\sqrt{n_2}\,\mu_2, 1)$. The asymptotic distribution will be derived assuming $\mu_1$ and $\mu_2$ are of $O(n^{-1/2})$. Let $\mu_2 = 0$ and $\mu = \sqrt{n_1}\,\mu_1 = \sqrt{n_2}\,\mu_1$; then
$X_1 \sim N(\mu, 1)$, $X_2 \sim N(\mu, 1)$, $Y_1 \sim N(0, 1)$, $Y_2 \sim N(0, 1)$.
Figure 3.2.1.1 Density of L, p = 0.1. (f(l) plotted against l, -10 to 10.)
Figure 3.2.1.2 Density of L, p = 0.5. (f(l) plotted against l, -10 to 10.)
Figure 3.2.1.3 Density of L, p = 0.9. (f(l) plotted against l, -10 to 10.)
Table 3.2.1.1. Critical values $l_c$ and power for L, $\alpha$ = 0.05.
  p      l_c      power
 0.00   1.96     0.5275
 0.05   2.0079   0.5821
 0.10   2.0526   0.6276
 0.15   2.0941   0.6642
 0.20   2.1325   0.6935
 0.25   2.1677   0.7169
 0.30   2.2000   0.7355
 0.35   2.2294   0.7503
 0.40   2.2561   0.7620
 0.45   2.2805   0.7711
 0.50   2.3027   0.7783
 0.55   2.3229   0.7838
 0.60   2.3415   0.7881
 0.65   2.3584   0.7914
 0.70   2.3741   0.7938
 0.75   2.3886   0.7956
 0.80   2.4020   0.7970
 0.85   2.4146   0.7979
 0.90   2.4264   0.7984
 0.95   2.4376   0.7987
 1.00   2.4474   0.8000
Consequently,
(3.2.8)
(3.2.9)
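The general case, using $p = n_1/n$, will be developed later. In a first step, $n_1$ and $n_2$ will be assumed equal to n/2 so that, from (3.2.8),
$$L = \frac{1}{\sqrt{2}}\left[\sqrt{X_1^2 + Y_1^2} + \frac{X_1X_2 + Y_1Y_2}{\sqrt{X_1^2 + Y_1^2}}\right]. \qquad (3.2.10)$$
The joint density of the four statistics is
$$f_{X_1,X_2,Y_1,Y_2}(x_1,x_2,y_1,y_2) = \frac{1}{(2\pi)^2} \exp\left\{-\tfrac{1}{2}\left[(x_1-\mu)^2 + (x_2-\mu)^2 + y_1^2 + y_2^2\right]\right\}. \qquad (3.2.11)$$
Letting $R = \sqrt{X_1^2 + Y_1^2}$,
$$\sqrt{2}\,L = R + \frac{X_1X_2 + Y_1Y_2}{R} \;\Rightarrow\; Y_2 = \frac{\sqrt{2}\,R\,L - R^2 - X_1X_2}{Y_1} \quad \text{and} \quad \frac{\partial Y_2}{\partial L} = \frac{\sqrt{2}\,R}{Y_1}.$$
The density of L is:
$$f_L(l) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{(2\pi)^2}\,\frac{\sqrt{2}\,r}{|y_1|}\,\exp\left\{-\tfrac{1}{2}A\right\} dx_2\,dy_1\,dx_1 \qquad (3.2.12)$$
where $r = \sqrt{x_1^2 + y_1^2}$ and
$$A = (x_1-\mu)^2 + (x_2-\mu)^2 + y_1^2 + \frac{(\sqrt{2}\,r\,l - r^2 - x_1x_2)^2}{y_1^2}. \qquad (3.2.13)$$
Collecting the terms in $x_2$,
$$A = \left\{x_2\frac{r}{y_1} - \left(\frac{\mu y_1}{r} + \frac{x_1}{y_1}(\sqrt{2}\,l - r)\right)\right\}^2 - \left(\frac{\mu y_1}{r} + \frac{x_1}{y_1}(\sqrt{2}\,l - r)\right)^2 + \frac{r^2}{y_1^2}\left(y_1^2 + (\sqrt{2}\,l - r)^2\right) + 2\mu^2 - 2x_1\mu \qquad (3.2.14)$$
$$= \left\{x_2\frac{r}{y_1} - \left(\frac{\mu y_1}{r} + \frac{x_1}{y_1}(\sqrt{2}\,l - r)\right)\right\}^2 + 2l^2 - 2\sqrt{2}\,l\left(r + \frac{\mu x_1}{r}\right) - \frac{\mu^2 y_1^2}{r^2} + 2\mu^2 + 2r^2 \qquad (3.2.15)$$
$$= \left\{x_2\frac{r}{y_1} - \left(\frac{\mu y_1}{r} + \frac{x_1}{y_1}(\sqrt{2}\,l - r)\right)\right\}^2 + B \qquad (3.2.16)$$
and
$$B = 2l^2 - 2\sqrt{2}\,l\left(r + \frac{\mu x_1}{r}\right) - \frac{\mu^2 y_1^2}{r^2} + 2\mu^2 + 2r^2.$$
By completing the square,
$$B = \left\{\sqrt{2}\,l - \left(r + \frac{\mu x_1}{r}\right)\right\}^2 - \left(r + \frac{\mu x_1}{r}\right)^2 - \frac{\mu^2 y_1^2}{r^2} + 2\mu^2 + 2r^2$$
$$= \left\{\sqrt{2}\,l - \left(r + \frac{\mu x_1}{r}\right)\right\}^2 + (\mu - x_1)^2 + y_1^2.$$
So the power for the critical value $l_c$ is $F_L(-l_c) + (1 - F_L(l_c))$ where
$$F_L(l_c) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{l_c} \frac{\sqrt{2}}{(2\pi)^{3/2}} \exp\left\{-\tfrac{1}{2}\left[\left(\sqrt{2}\,l - \left(r + \frac{\mu x_1}{r}\right)\right)^2 + (\mu - x_1)^2 + y_1^2\right]\right\} dl\,dy_1\,dx_1. \qquad (3.2.17)$$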
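The more general case where $p = n_1/n$ and $\mu^* = \sqrt{n}\,\mu_1$ will now be developed.
3.2.2.2 General case: $p = n_1/n$ and $\mu^* = \sqrt{n}\,\mu_1$.
From (3.2.9),
$$L = \sqrt{p}\,\sqrt{X_1^2 + Y_1^2} + \sqrt{1-p}\,\frac{X_1X_2 + Y_1Y_2}{\sqrt{X_1^2 + Y_1^2}}.$$
Then, $\sqrt{n_1}\,\mu_1 = \sqrt{pn}\,\mu_1 = \sqrt{p}\,\mu^*$ and $\sqrt{n_2}\,\mu_1 = \sqrt{(1-p)n}\,\mu_1 = \sqrt{1-p}\,\mu^*$ so,
$$f_{X_1,X_2,Y_1,Y_2}(x_1,x_2,y_1,y_2) = \frac{1}{(2\pi)^2} \exp\left\{-\tfrac{1}{2}\left[(x_1-\sqrt{p}\,\mu^*)^2 + (x_2-\sqrt{1-p}\,\mu^*)^2 + y_1^2 + y_2^2\right]\right\}. \qquad (3.2.19)$$
Letting $R = \sqrt{X_1^2 + Y_1^2}$,
$$Y_2 = \frac{R\,L - \sqrt{p}\,R^2 - \sqrt{1-p}\,X_1X_2}{\sqrt{1-p}\;Y_1} \quad \text{and} \quad \frac{\partial Y_2}{\partial L} = \frac{R}{\sqrt{1-p}\;Y_1}.$$
The density of L is:
$$f_L(l) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{(2\pi)^2}\,\frac{r}{\sqrt{1-p}\,|y_1|}\,\exp\left\{-\tfrac{1}{2}A\right\} dx_2\,dy_1\,dx_1 \qquad (3.2.20)$$
where $r = \sqrt{x_1^2 + y_1^2}$ and
$$A = (x_1-\sqrt{p}\,\mu^*)^2 + (x_2-\sqrt{1-p}\,\mu^*)^2 + y_1^2 + \frac{(r\,l - \sqrt{p}\,r^2 - \sqrt{1-p}\,x_1x_2)^2}{(1-p)\,y_1^2}$$
$$= x_1^2 - 2x_1\sqrt{p}\,\mu^* + p\,\mu^{*2} + x_2^2 - 2x_2\sqrt{1-p}\,\mu^* + (1-p)\,\mu^{*2} + y_1^2 + \frac{r^2l^2}{(1-p)y_1^2} + \frac{p\,r^4}{(1-p)y_1^2} + \frac{x_1^2x_2^2}{y_1^2} - \frac{2\sqrt{p}\,r^3\,l}{(1-p)y_1^2} - \frac{2\sqrt{1-p}\;r\,l\,x_1x_2}{(1-p)y_1^2} + \frac{2\sqrt{p}\,r^2\sqrt{1-p}\;x_1x_2}{(1-p)y_1^2} \qquad (3.2.21)$$
$$= x_2^2\frac{r^2}{y_1^2} - 2x_2\frac{r}{y_1}\left(\frac{\sqrt{1-p}\,\mu^* y_1}{r} + \frac{x_1(l - \sqrt{p}\,r)}{\sqrt{1-p}\;y_1}\right) + \frac{r^2\left((1-p)y_1^2 + (l - \sqrt{p}\,r)^2\right)}{(1-p)\,y_1^2} + \mu^{*2} - 2x_1\sqrt{p}\,\mu^* \qquad (3.2.22)$$
$$= \left\{x_2\frac{r}{y_1} - \left(\frac{\sqrt{1-p}\,\mu^* y_1}{r} + \frac{x_1(l - \sqrt{p}\,r)}{\sqrt{1-p}\;y_1}\right)\right\}^2 - \left(\frac{\sqrt{1-p}\,\mu^* y_1}{r} + \frac{x_1(l - \sqrt{p}\,r)}{\sqrt{1-p}\;y_1}\right)^2 + \frac{r^2\left((1-p)y_1^2 + (l - \sqrt{p}\,r)^2\right)}{(1-p)\,y_1^2} + \mu^{*2} - 2x_1\sqrt{p}\,\mu^* \qquad (3.2.23)$$
$$= \left\{x_2\frac{r}{y_1} - \left(\frac{\sqrt{1-p}\,\mu^* y_1}{r} + \frac{x_1(l - \sqrt{p}\,r)}{\sqrt{1-p}\;y_1}\right)\right\}^2 + B.$$
By completing the square,
$$B = \frac{l^2}{1-p} - 2l\left(\frac{\sqrt{p}\,r}{1-p} + \frac{\mu^* x_1}{r}\right) - \frac{(1-p)\,\mu^{*2} y_1^2}{r^2} + \mu^{*2} + \frac{r^2}{1-p}$$
$$= \frac{1}{1-p}\left\{\left(l - \left(\sqrt{p}\,r + \frac{(1-p)\mu^* x_1}{r}\right)\right)^2 - \left(\sqrt{p}\,r + \frac{(1-p)\mu^* x_1}{r}\right)^2 - \frac{(1-p)^2\mu^{*2} y_1^2}{r^2} + (1-p)\,\mu^{*2} + r^2\right\}$$
$$= \frac{1}{1-p}\left(l - \left(\sqrt{p}\,r + \frac{(1-p)\mu^* x_1}{r}\right)\right)^2 + (\sqrt{p}\,\mu^* - x_1)^2 + y_1^2.$$
So the power for the critical value $l_c$ is $F_L(-l_c) + (1 - F_L(l_c))$ where
$$F_L(l_c) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{l_c} \frac{1}{\sqrt{1-p}\,(2\pi)^{3/2}} \exp\left\{-\tfrac{1}{2}\left[\frac{1}{1-p}\left(l - \left(\sqrt{p}\,r + \frac{(1-p)\mu^* x_1}{r}\right)\right)^2 + (\sqrt{p}\,\mu^* - x_1)^2 + y_1^2\right]\right\} dl\,dy_1\,dx_1. \qquad (3.2.26)$$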
The power of this test, when the power of Hotelling's $T^2$ is 80%, was obtained by evaluating $F_L(-l_c) + (1 - F_L(l_c))$ where $\mu^* = 3.1064$ and $l_c$ was taken from Table 3.2.1.1 for each different value of p. For example, when p = 0.10, $l_c$ = 2.0526. When Hotelling's power is 80%, the radius of its power contour is 3.1064. As expected, without allowing for early stopping, the power of the new test is not superior to that of Hotelling's $T^2$. If $\mathbf{V}_1$ is a short vector, i.e. the statistics observed at the interim analysis do not deviate substantially from the null hypothesis, then the direction of the vector $\mathbf{V}_2$ can vary greatly, as can its projection on $\mathbf{V}_1$, which affects the power of the new test. Allowing for early acceptance of the null hypothesis takes care of this situation and improves the power.
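The triple integral for the power can be reduced to a double integral, because the inner integral over l is a normal distribution function: conditional on $(x_1, y_1)$, L is normal with mean $\sqrt{p}\,r + (1-p)\mu^* x_1/r$ and variance $1-p$. The following minimal sketch evaluates the remaining two-dimensional integral with a midpoint rule. It is an illustration only: the function name `power_l`, the grid half-width of 8 and the step of 0.05 are assumptions made here, not the numerical scheme used for Table 3.2.1.1, and the formula requires p < 1.

```python
import math

def phi_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_l(p, lc, mu_star, half_width=8.0, h=0.05):
    """Two-sided rejection probability Pr(|L| > lc) without early stopping, 0 <= p < 1."""
    q = 1.0 - p
    sd = math.sqrt(q)
    center = math.sqrt(p) * mu_star          # mean of X1; Y1 has mean zero
    n = int(round(2.0 * half_width / h))
    accept = 0.0
    for i in range(n):
        x1 = center - half_width + (i + 0.5) * h
        wx = math.exp(-0.5 * (x1 - center) ** 2)
        for j in range(n):
            y1 = -half_width + (j + 0.5) * h  # midpoints never hit y1 = 0 exactly
            r = math.hypot(x1, y1)
            # Conditional mean of L given the interim statistics
            m = math.sqrt(p) * r + q * mu_star * x1 / r
            inside = phi_cdf((lc - m) / sd) - phi_cdf((-lc - m) / sd)
            accept += wx * math.exp(-0.5 * y1 * y1) * inside
    accept *= h * h / (2.0 * math.pi)
    return 1.0 - accept

size = power_l(0.5, 2.3027, 0.0)         # under H0 this should be close to 0.05
power80 = power_l(0.5, 2.3027, 3.1064)   # compare with the p = 0.50 row of Table 3.2.1.1
print(size, power80)
```

Setting $\mu^* = 0$ gives a convenient check, since the result should then reproduce the nominal size of the test.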
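3.2.3 Distribution of L, allowing for early stopping.
In the previous section, the distribution and power of L have been derived without allowing for early stopping. However, if $L_1$ is very small, the possibility of not rejecting $H_0$ must be considered. In the same manner, if $L_1$ is very large, $H_0$ could be rejected and the trial stopped. Let $c_1$, $c_2$ and $l_c$ be three critical values such that:
Pr(rejecting $H_0$ at the interim analysis) = Pr($L_1 > c_1$)
and
Pr(going to the 2nd stage and rejecting $H_0$) = Pr($c_2 < L_1 < c_1$ and $|L| > l_c$).
The cutoffs are applied on the scale of $\sqrt{L_1}$, the length of $\mathbf{V}_1$, so that Pr($L_1 < c_2$) stands for $\Pr(\sqrt{L_1} < c_2) = 1 - e^{-c_2^2/2}$; this is the scale on which the values of $c_2$ in Table 3.2.3.1 are given.
Assume $c_1 = \infty$, so that the probability of rejecting $H_0$ at the interim analysis is zero. Then, modifying (3.2.6) by restricting the integration to $\sqrt{k} > c_2$,
$$f_L(l) = e^{-l^2/2}\left\{\sqrt{\frac{1-p}{2\pi}}\,\exp\left[-\frac{(c_2 - \sqrt{p}\,l)^2}{2(1-p)}\right] + \sqrt{p}\;l\;\Phi\!\left(-\frac{c_2 - \sqrt{p}\,l}{\sqrt{1-p}}\right)\right\}. \qquad (3.3.1)$$
For a fixed p, and $c_2$ such that $P(L_1 < c_2) = P_1$, the probability of rejecting the null hypothesis is:
$$1 - P_1 - P_2 \qquad (3.3.2)$$
where
$$P_1 = \Pr(L_1 < c_2)$$
and
$$P_2 = \Pr(L_1 > c_2 \text{ and } |L| \le l_c) = \int_{-l_c}^{l_c} f_L(l)\,dl,$$
with $f_L$ given by (3.3.1).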
Table 3.2.3.1 shows the critical values $l_c$ for which the above probability is 0.05, as p (= $n_1/n$) and $P_1$ vary from 0 to 0.90 by increments of 0.10. The power using the alternative hypothesis for which Hotelling's power is 80% is presented in Table 3.2.3.2. The greatest power, 89.3%, is observed at p = $n_1/n$ = 0.50 and $P_1 = P(L_1 < c_2)$ = 0.70. In that neighbourhood, for p between 0.40 and 0.60 and $P_1$ between 0.5 and 0.8, the power is approximately 88%. It would seem reasonable and intuitive to consider p = $P_1$ = 0.50 in the planning of a trial. Finally, from Table 3.2.3.3, it can be seen that the greatest power for the new test, 63.4%, for alternative hypotheses for which Hotelling's power is 50%, is obtained when p = 0.50 and $P_1$ = 0.70. Figures 3.2.3.1 and 3.2.3.2 show the added line for the power of L when $P_1$ = 0.7 and p = 0.5. The power of the new test L is constant for alternative hypotheses on the 180° region and it is also greater than the power of the other three tests for the same alternative hypotheses. The gain in power may be explained by the O'Brien 'type' approach used with the new test, but part of the improvement may also be due to the use of a two-stage design compared to a one-stage design for the other three tests.
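The early-stopping quantities can be checked numerically. The sketch below is an illustration under stated assumptions, not the computation used for the tables: it takes the interim cutoff on the scale of the length of the first-stage vector (so that $c_2 = \sqrt{-2\ln(1-P_1)}$ for a chi-square variable with 2 degrees of freedom), uses the sub-density of L on the event that the trial continues, and integrates its two tails with Simpson's rule. The function names and the upper integration limit of 12 are hypothetical choices, and p < 1 is required.

```python
import math

def phi_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def c2_from_p1(p1):
    """Cutoff c2 with Pr(sqrt(L1) < c2) = p1 when L1 is chi-square with 2 df."""
    return math.sqrt(-2.0 * math.log(1.0 - p1))

def density_continue(l, p, c2):
    """Sub-density of L on the event sqrt(L1) > c2 (trial continues), 0 <= p < 1."""
    q = 1.0 - p
    z = (c2 - math.sqrt(p) * l) / math.sqrt(q)
    first = math.sqrt(q / (2.0 * math.pi)) * math.exp(-0.5 * z * z)
    second = math.sqrt(p) * l * phi_cdf(-z)
    return math.exp(-0.5 * l * l) * (first + second)

def reject_probability(lc, p, c2, upper=12.0, steps=4000):
    """Pr(trial continues and |L| > lc), by Simpson's rule on both tails."""
    def simpson(a, b):
        h = (b - a) / steps
        s = density_continue(a, p, c2) + density_continue(b, p, c2)
        for i in range(1, steps):
            s += (4 if i % 2 else 2) * density_continue(a + i * h, p, c2)
        return s * h / 3.0
    return simpson(lc, upper) + simpson(-upper, -lc)

c2 = c2_from_p1(0.5)                          # compare with c2 for P1 = 0.50 in Table 3.2.3.1
alpha = reject_probability(2.2670, 0.5, c2)   # l_c taken from the p = 0.50, P1 = 0.50 row
print(c2, alpha)
```

With the tabulated critical value for p = 0.50 and $P_1$ = 0.50, the computed rejection probability should be close to the nominal 0.05.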
Table 3.2.3.1 Critical values $l_c$ and power for L, allowing for early stopping, $\alpha$ = 0.05.
  p      P1      c2       l_c      P2
 0.00   0.00   0.0000   1.9600   0.95
 0.00   0.10   0.4590   1.9150   0.85
 0.00   0.20   0.6680   1.8630   0.75
 0.00   0.30   0.8446   1.8030   0.65
 0.00   0.40   1.0108   1.7320   0.55
 0.00   0.50   1.1774   1.6450   0.45
 0.00   0.60   1.3537   1.5350   0.35
 0.00   0.70   1.5518   1.3830   0.25
 0.00   0.80   1.7941   1.1510   0.15
 0.00   0.90   2.1460   0.6750   0.05

 0.10   0.00   0.0000   2.0530   0.95
 0.10   0.10   0.4590   2.0240   0.85
 0.10   0.20   0.6680   1.9890   0.75
 0.10   0.30   0.8446   1.9470   0.65
 0.10   0.40   1.0108   1.8960   0.55
 0.10   0.50   1.1774   1.8300   0.45
 0.10   0.60   1.3537   1.7410   0.35
 0.10   0.70   1.5518   1.6130   0.25
 0.10   0.80   1.7941   1.3990   0.15
 0.10   0.90   2.1460   0.8950   0.05

 0.20   0.00   0.0000   2.1330   0.95
 0.20   0.10   0.4590   2.1150   0.85
 0.20   0.20   0.6680   2.0940   0.75
 0.20   0.30   0.8446   2.0660   0.65
 0.20   0.40   1.0108   2.0300   0.55
 0.20   0.50   1.1774   1.9810   0.45
 0.20   0.60   1.3537   1.9130   0.35
 0.20   0.70   1.5518   1.8090   0.25
 0.20   0.80   1.7941   1.6260   0.15
 0.20   0.90   2.1460   1.1510   0.05

 0.30   0.00   0.0000   2.2000   0.95
 0.30   0.10   0.4590   2.1910   0.85
 0.30   0.20   0.6680   2.1780   0.75
 0.30   0.30   0.8446   2.1600   0.65
 0.30   0.40   1.0108   2.1360   0.55
 0.30   0.50   1.1774   2.1010   0.45
 0.30   0.60   1.3537   2.0490   0.35
 0.30   0.70   1.5518   1.9660   0.25
 0.30   0.80   1.7941   1.8140   0.15
 0.30   0.90   2.1460   1.3940   0.05

 0.40   0.00   0.0000   2.2560   0.95
 0.40   0.10   0.4590   2.2520   0.85
 0.40   0.20   0.6680   2.2450   0.75
 0.40   0.30   0.8446   2.2340   0.65
 0.40   0.40   1.0108   2.2180   0.55
 0.40   0.50   1.1774   2.1940   0.45
 0.40   0.60   1.3537   2.1560   0.35
 0.40   0.70   1.5518   2.0920   0.25
 0.40   0.80   1.7941   1.9690   0.15
 0.40   0.90   2.1460   1.6060   0.05

 0.50   0.00   0.0000   2.3030   0.95
 0.50   0.10   0.4590   2.3010   0.85
 0.50   0.20   0.6680   2.2980   0.75
 0.50   0.30   0.8446   2.2920   0.65
 0.50   0.40   1.0108   2.2830   0.55
 0.50   0.50   1.1774   2.2670   0.45
 0.50   0.60   1.3537   2.2410   0.35
 0.50   0.70   1.5518   2.1940   0.25
 0.50   0.80   1.7941   2.0980   0.15
 0.50   0.90   2.1460   1.7940   0.05

 0.60   0.00   0.0000   2.3420   0.95
 0.60   0.10   0.4590   2.3410   0.85
 0.60   0.20   0.6680   2.3400   0.75
 0.60   0.30   0.8446   2.3380   0.65
 0.60   0.40   1.0108   2.3330   0.55
 0.60   0.50   1.1774   2.3240   0.45
 0.60   0.60   1.3537   2.3080   0.35
 0.60   0.70   1.5518   2.2770   0.25
 0.60   0.80   1.7941   2.2070   0.15
 0.60   0.90   2.1460   1.9620   0.05

 0.70   0.00   0.0000   2.3740   0.95
 0.70   0.10   0.4590   2.3740   0.85
 0.70   0.20   0.6680   2.3740   0.75
 0.70   0.30   0.8446   2.3730   0.65
 0.70   0.40   1.0108   2.3720   0.55
 0.70   0.50   1.1774   2.3680   0.45
 0.70   0.60   1.3537   2.3610   0.35
 0.70   0.70   1.5518   2.3430   0.25
 0.70   0.80   1.7941   2.2980   0.15
 0.70   0.90   2.1460   2.1150   0.05

 0.80   0.00   0.0000   2.4020   0.95
 0.80   0.10   0.4590   2.4020   0.85
 0.80   0.20   0.6680   2.4020   0.75
 0.80   0.30   0.8446   2.4020   0.65
 0.80   0.40   1.0108   2.4020   0.55
 0.80   0.50   1.1774   2.4010   0.45
 0.80   0.60   1.3537   2.3990   0.35
 0.80   0.70   1.5518   2.3930   0.25
 0.80   0.80   1.7941   2.3710   0.15
 0.80   0.90   2.1460   2.2530   0.05

 0.90   0.00   0.0000   2.4270   0.95
 0.90   0.10   0.4590   2.4270   0.85
 0.90   0.20   0.6680   2.4270   0.75
 0.90   0.30   0.8446   2.4270   0.65
 0.90   0.40   1.0108   2.4270   0.55
 0.90   0.50   1.1774   2.4270   0.45
 0.90   0.60   1.3537   2.4270   0.35
 0.90   0.70   1.5518   2.4260   0.25
 0.90   0.80   1.7941   2.4220   0.15
 0.90   0.90   2.1460   2.3750   0.05
Table 3.2.3.2 Power of L, allowing for early stopping, for alternatives for which Hotelling's power is 80%.
                                  P(L1 < c2)
  p     0.00   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
 0.10  0.628  0.651  0.670  0.683  0.690  0.691  0.687  0.684  0.681  0.650
 0.20  0.694  0.721  0.746  0.767  0.783  0.793  0.797  0.795  0.791  0.764
 0.30  0.736  0.763  0.789  0.812  0.831  0.846  0.854  0.854  0.843  0.808
 0.40  0.762  0.787  0.811  0.833  0.853  0.869  0.880  0.882  0.872  0.831
 0.50  0.778  0.800  0.820  0.841  0.859  0.875  0.887  0.893  0.885  0.845
 0.60  0.788  0.805  0.822  0.839  0.855  0.870  0.882  0.890  0.887  0.852
 0.70  0.794  0.807  0.819  0.832  0.845  0.857  0.868  0.877  0.879  0.853
 0.80  0.797  0.805  0.814  0.823  0.831  0.840  0.848  0.855  0.860  0.847
 0.90  0.798  0.803  0.807  0.811  0.815  0.820  0.824  0.828  0.832  0.831
Table 3.2.3.3 Power of L, allowing for early stopping, for alternatives for which Hotelling's power is 50%.
                                  P(L1 < c2)
  p     0.00   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
 0.10  0.391  0.416  0.441  0.465  0.487  0.504  0.513  0.509  0.487  0.457
 0.20  0.428  0.455  0.483  0.510  0.537  0.560  0.578  0.583  0.569  0.515
 0.30  0.454  0.481  0.508  0.536  0.564  0.591  0.612  0.623  0.611  0.551
 0.40  0.468  0.493  0.518  0.545  0.571  0.597  0.619  0.633  0.627  0.565
 0.50  0.484  0.505  0.527  0.550  0.574  0.598  0.619  0.634  0.632  0.578
 0.60  0.492  0.509  0.527  0.546  0.565  0.585  0.603  0.618  0.621  0.579
 0.70  0.497  0.510  0.523  0.537  0.552  0.566  0.581  0.594  0.599  0.573
 0.80  0.500  0.508  0.517  0.526  0.535  0.545  0.555  0.564  0.570  0.559
 0.90  0.500  0.505  0.509  0.514  0.518  0.523  0.527  0.532  0.536  0.536
Figure 3.2.3.1 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 80%. Bonferroni (B), Hotelling (H), O'Brien (O), New test (L). (Power plotted against angle in degrees, -90 to 90.)
Figure 3.2.3.2 Power at each one-degree section for alternative hypotheses for which the power of Hotelling's test is 50%. Bonferroni (B), Hotelling (H), O'Brien (O), New test (L). (Power plotted against angle in degrees, -90 to 90.)
3.3 Transformation for correlated data
The new test presented above has been developed for uncorrelated outcomes. In clinical trials, correlated outcomes are often encountered. In that case, it is necessary to convert them into uncorrelated outcomes before the new test can be used.
Let $\mathbf{U}$ be a vector distributed as $N(\mathbf{0}, \Sigma)$ that is to be transformed into $\mathbf{Z}$, a vector of uncorrelated N(0,1) variables. There exists a matrix $\mathbf{A}$ equal to the "square root" of $\Sigma^{-1}$ such that $\mathbf{Z} = \mathbf{A}\mathbf{U}$. By definition, $\mathbf{A}$ is the "square root" of the positive definite matrix $\Sigma^{-1}$ if $\mathbf{A}'\mathbf{A} = \Sigma^{-1}$. Methods to evaluate $\mathbf{A}$ have already been derived (Graybill, 1969).
If the components of the vectors $\mathbf{V}_1$ and $\mathbf{V}_2$ are correlated, this transformation must be applied to them before the new test can be used.
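One way to obtain a matrix satisfying the defining property of the "square root" of the inverse covariance matrix is through a Cholesky factorization. The sketch below handles the 2x2 case in closed form. It is an assumption-laden illustration: the square root is not unique, the text does not say which of Graybill's methods is intended, and the function names and the correlation of 0.5 are hypothetical.

```python
import math

def whitening_matrix(s11, s12, s22):
    """Return a 2x2 matrix A (nested lists) with A'A equal to the inverse of
    Sigma = [[s11, s12], [s12, s22]], built from the Cholesky factor of Sigma."""
    c11 = math.sqrt(s11)
    c21 = s12 / c11
    c22 = math.sqrt(s22 - c21 * c21)
    # Sigma = C C' with C lower triangular; A is the inverse of C.
    return [[1.0 / c11, 0.0], [-c21 / (c11 * c22), 1.0 / c22]]

def transformed_covariance(a, s11, s12, s22):
    """Return A Sigma A', the covariance matrix of the transformed vector."""
    sigma = [[s11, s12], [s12, s22]]
    a_sigma = [[sum(a[i][k] * sigma[k][j] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return [[sum(a_sigma[i][k] * a[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

rho = 0.5  # hypothetical correlation between two standardized statistics
a = whitening_matrix(1.0, rho, 1.0)
cov = transformed_covariance(a, 1.0, rho, 1.0)
print(cov)  # should be the identity matrix up to rounding
```

A symmetric square root obtained from the eigenvalues of the covariance matrix would serve equally well; the two choices differ only by a rotation.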
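3.4 Using the new test L with more than two endpoints
Let $\mathbf{V}_1^*$ (r×1) be the vector of normally distributed statistics from the data at the end of the first accrual period where r, the number of endpoints, is greater than 2. Let $\mathbf{V}^*$ (r×1) be the corresponding vector from all the data at the end of the trial and $\mathbf{V}_2^* = (\mathbf{V}^* - \sqrt{p}\,\mathbf{V}_1^*)/\sqrt{1-p}$. Define
$$L^* = \frac{\mathbf{V}_1^{*\prime}\mathbf{V}^*}{\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}} = \sqrt{p}\,\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*} + \sqrt{1-p}\,\frac{\mathbf{V}_1^{*\prime}\mathbf{V}_2^*}{\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}}.$$
There exists an orthonormal matrix $\mathbf{A}$ such that $\mathbf{V}_1 = \mathbf{A}'\mathbf{V}_1^*$ and $\mathbf{V}_2 = \mathbf{A}'\mathbf{V}_2^*$.
Using the Gram-Schmidt orthogonalization procedure, we can construct two orthonormal vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ that are linear combinations of $\mathbf{V}_1^*$ and $\mathbf{V}_2^*$. Let $\mathbf{g}_1$ and $\mathbf{g}_2$ be two linear functions of $\mathbf{V}_1^*$ and $\mathbf{V}_2^*$ such that $\mathbf{a}_1$ and $\mathbf{a}_2$ will be the normalized $\mathbf{g}_i$, i.e. $\mathbf{a}_i = \mathbf{g}_i/\|\mathbf{g}_i\|$, i = 1, 2.
Let's start by choosing $\mathbf{g}_1 = \mathbf{V}_1^*$; then
$$\mathbf{a}_1 = \frac{\mathbf{g}_1}{\|\mathbf{g}_1\|} = \frac{\mathbf{V}_1^*}{\|\mathbf{V}_1^*\|} = \frac{\mathbf{V}_1^*}{\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}}.$$
Next, let's choose $\mathbf{g}_2 = \mathbf{V}_2^* - \theta_1\mathbf{a}_1$ where $\theta_1$ is such that $\mathbf{a}_1$ and $\mathbf{g}_2$ are orthogonal. Then,
$$\mathbf{a}_1'\mathbf{g}_2 = \mathbf{a}_1'\mathbf{V}_2^* - \theta_1 = 0 \;\Rightarrow\; \theta_1 = \mathbf{a}_1'\mathbf{V}_2^*.$$
So,
$$\mathbf{g}_2 = \mathbf{V}_2^* - (\mathbf{a}_1'\mathbf{V}_2^*)\,\mathbf{a}_1 = \mathbf{V}_2^* - \frac{(\mathbf{V}_1^{*\prime}\mathbf{V}_2^*)\,\mathbf{V}_1^*}{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}$$
and $\mathbf{a}_2 = \mathbf{g}_2/\|\mathbf{g}_2\|$. Therefore, $\mathbf{A} = \{\mathbf{a}_1, \mathbf{a}_2\}$ is an orthonormal matrix and
$$\mathbf{V}_1 = \mathbf{A}'\mathbf{V}_1^* = \{\mathbf{a}_1'\mathbf{V}_1^*,\ \mathbf{a}_2'\mathbf{V}_1^*\}' = \left\{\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*},\ 0\right\}'$$
and
$$\mathbf{V}_2 = \left\{\frac{\mathbf{V}_1^{*\prime}\mathbf{V}_2^*}{\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}},\ \frac{M}{\sqrt{N}}\right\}'$$
where
$$M = \mathbf{V}_2^{*\prime}\mathbf{V}_2^* - \frac{(\mathbf{V}_1^{*\prime}\mathbf{V}_2^*)(\mathbf{V}_1^{*\prime}\mathbf{V}_2^*)}{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}$$
and
$$N = \left(\mathbf{V}_2^* - \frac{(\mathbf{V}_1^{*\prime}\mathbf{V}_2^*)\,\mathbf{V}_1^*}{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}\right)'\left(\mathbf{V}_2^* - \frac{(\mathbf{V}_1^{*\prime}\mathbf{V}_2^*)\,\mathbf{V}_1^*}{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}\right).$$
Then, the new statistic L based on the 2×1 vectors $\mathbf{V}_1$ and $\mathbf{V}_2$ is equal to $L^*$ based on the r×1 vectors $\mathbf{V}_1^*$ and $\mathbf{V}_2^*$ because it is invariant to orthogonal transformations, i.e.
$$L = \sqrt{p}\,\sqrt{\mathbf{V}_1'\mathbf{V}_1} + \sqrt{1-p}\,\frac{\mathbf{V}_1'\mathbf{V}_2}{\sqrt{\mathbf{V}_1'\mathbf{V}_1}} = \sqrt{p}\,\sqrt{\mathbf{V}_1^{*\prime}\mathbf{A}\mathbf{A}'\mathbf{V}_1^*} + \sqrt{1-p}\,\frac{\mathbf{V}_1^{*\prime}\mathbf{A}\mathbf{A}'\mathbf{V}_2^*}{\sqrt{\mathbf{V}_1^{*\prime}\mathbf{A}\mathbf{A}'\mathbf{V}_1^*}}$$
$$= \sqrt{p}\,\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*} + \sqrt{1-p}\,\frac{\mathbf{V}_1^{*\prime}\mathbf{V}_2^*}{\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}} = \frac{\mathbf{V}_1^{*\prime}\mathbf{V}^*}{\sqrt{\mathbf{V}_1^{*\prime}\mathbf{V}_1^*}} = L^*.$$
Furthermore, as $\mathbf{V}_1$ and $\mathbf{V}_2$ are also normally distributed and of dimension (2×1), the theory already developed for the distribution of L still holds for $L^*$.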
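The reduction from r endpoints to two dimensions can be illustrated with a short numerical check. This is a minimal sketch under stated assumptions: the three-endpoint vectors are hypothetical values chosen for illustration, the function names are not from the original, and the two vectors are assumed not to be collinear so that the second Gram-Schmidt direction exists.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def reduce_to_two_dims(v1_star, v2_star):
    """Project the r x 1 stage vectors onto the Gram-Schmidt basis (a1, a2)."""
    norm1 = math.sqrt(dot(v1_star, v1_star))
    a1 = [x / norm1 for x in v1_star]                 # a1 = V1*/||V1*||
    theta1 = dot(a1, v2_star)                         # makes g2 orthogonal to a1
    g2 = [y - theta1 * x for x, y in zip(a1, v2_star)]
    norm2 = math.sqrt(dot(g2, g2))                    # assumes V2* is not collinear with V1*
    a2 = [x / norm2 for x in g2]
    v1 = [dot(a1, v1_star), dot(a2, v1_star)]
    v2 = [dot(a1, v2_star), dot(a2, v2_star)]
    return v1, v2

def l_from_stages(v1, v2, p):
    """L = sqrt(p) sqrt(V1'V1) + sqrt(1 - p) V1'V2 / sqrt(V1'V1)."""
    l1 = dot(v1, v1)
    return math.sqrt(p) * math.sqrt(l1) + math.sqrt(1.0 - p) * dot(v1, v2) / math.sqrt(l1)

# Hypothetical statistics for r = 3 endpoints (illustration only).
v1_star = [1.2, -0.4, 0.9]
v2_star = [0.3, 1.1, -0.7]
p = 0.5
v1, v2 = reduce_to_two_dims(v1_star, v2_star)
l_star = l_from_stages(v1_star, v2_star, p)   # computed in r dimensions
l_two = l_from_stages(v1, v2, p)              # computed from the 2 x 1 projections
print(l_star, l_two)
```

The two printed values agree up to rounding, and the second component of the projected first-stage vector is zero, as in the derivation.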
CHAPTER 4
TWO EXAMPLES
4.1 Reduction in incidence of coronary heart disease
The Lipid Research Clinics Coronary Primary Prevention Trial (LRC-CPPT) is a multicenter, randomized, double-blind clinical trial that was initiated in 1973 to study the efficacy of lowering total plasma cholesterol levels (TOTAL-C) and low-density lipoprotein cholesterol levels (LDL-C) in reducing the risk of coronary heart disease (CHD). High-density lipoprotein cholesterol levels (HDL-C) were also examined, as they are believed to be negatively correlated with the incidence of CHD. A cohort of men, aged 35 to 59 years, with a high risk of developing CHD was followed for an average period of 7.4 years. The accrual phase consisted of four screening visits at monthly intervals. At the second screening visit, a moderate cholesterol-lowering diet was prescribed for all potential participants. At the fifth visit to the clinic, eligible participants were randomly assigned to one of two groups. The treatment group was prescribed the bile acid sequestrant cholestyramine resin and the control group received a placebo. Participants were followed up bimonthly for the entire duration of the trial. The primary endpoint for evaluating the treatment was the combination of definite CHD death and/or definite nonfatal myocardial infarction. The effect of the drug on the different cholesterol levels and triglyceride levels (TG) was also investigated.
To illustrate the use of the new statistic L as well as Hotelling's $T^2$ and O'Brien's test, 389 participants from one of the twelve clinics of the original trial were studied. The two treatments were compared with respect to the relative changes (%Δ) in HDL-C/TOTAL-C and TG at the end of the first year. A positive relative change in HDL-C/TOTAL-C and a negative relative change in TG would indicate a beneficial effect from the cholestyramine resin. Percent changes from the participant baseline were computed for each individual and their averages are presented in Table 4.1.1. A 23.5% increase in HDL-C/TOTAL-C was observed in the cholestyramine group after one year compared to 2.9% in the placebo group. However, triglyceride levels rose by 10.8% in the treatment group and 6.1% in the placebo group. In order to have the treatment effect on both variables in the same direction, -%Δ(TG) was used in the analyses. The correlation between %Δ(HDL-C/TOTAL-C) and -%Δ(TG) was 0.14. A two-stage group sequential design was simulated by dividing the subjects into two groups with 50% of them for each accrual period. The division was based on the date of randomization. Tables 4.1.2 and 4.1.3 show the means for each accrual period. They are consistent with what was observed for the whole sample. For both accrual periods, the relative changes for both endpoints are larger in the cholestyramine group. The drug effects on the percent changes for each accrual period are shown in Table 4.1.4 together with their corresponding T statistics. The relative change in triglyceride levels was larger in the treatment group than in the control group. At the end of the first accrual period, $(n-3)T^2/[2(n-2)] = 191\,T^2/(192 \cdot 2) = 24.38$ and was distributed F with 2 and 191 degrees of freedom. With $\Pr(L_1 < c_2)$ at the 0.7 level, early acceptance of the null hypothesis could not be achieved with this result (p < 0.3), so the data from the second stage were analysed and the results at the end of the trial are included in Table 4.1.5. For O'Brien's test, the data were standardized first and the F statistic was 32.82 (p < 0.001). The L test was calculated from the uncorrelated outcomes. All three statistics were significant, indicating that the cholestyramine resin had an effect on HDL-C/TOTAL-C and on triglycerides. Figure 4.1.1 presents a graphical representation of O'Brien's test and the L test. The projections of the vector $\mathbf{V}$, the statistics at the end of the trial, on $\mathbf{V}_1$ (L test), the corresponding statistics at the end of the first accrual period, and on the diagonal of quadrants I and III (O'Brien's test) are shown.
Table 4.1.1. Mean cholesterol and triglyceride levels, all subjects.
All (n=389)
                    Placebo (n=194)            Treatment (n=195)
                    Pre      Post     %Δ       Pre      Post     %Δ
TOTAL-C             292.6    275.3             291.7    236.7
HDL-C                42.6     40.6              43.9     42.9
HDL-C/TOTAL-C       0.147    0.149    0.029    0.152    0.185    0.235
TG                  166.8    167.2    0.061    161.4    171.1    0.108
Table 4.1.2. Mean cholesterol and triglyceride levels, first accrual period.
Accrual 1 (n1=194)
                    Placebo (n=98)             Treatment (n=96)
                    Pre      Post     %Δ       Pre      Post     %Δ
TOTAL-C             295.5    275.9             294.4    238.5
HDL-C                44.3     42.0              45.6     43.8
HDL-C/TOTAL-C       0.151    0.153    0.022    0.156    0.187    0.221
TG                  168.3    166.3    0.048    159.0    165.7    0.085
Table 4.1.3. Mean cholesterol and triglyceride levels, second accrual period.
                    Placebo (n=96)             Treatment (n=99)
                    Pre      Post     %Δ       Pre      Post     %Δ
TOTAL-C             289.6    272.6             289.1    234.9
HDL-C                40.7     39.2              42.3     42.0
HDL-C/TOTAL-C       0.142    0.145    0.037    0.148    0.182    0.249
TG                  165.2    168.1    0.075    163.8    176.3    0.130
Table 4.1.4. Drug effects and their corresponding T statistics, for each accrual period.
                          Drug effect (Treatment - Placebo)    T statistic
Accrual 1:
  %Δ(HDL-C/TOTAL-C)        0.199                                6.645
  -%Δ(TG)                 -0.037                               -0.921
Accrual 2:
  %Δ(HDL-C/TOTAL-C)        0.212                                7.881
  -%Δ(TG)                 -0.055                               -1.245
Table 4.1.5. Test results at the end of the LRC-CPPT trial.
O'Brien's test:      F(1,385) = 32.82, p-value = 2.04 E-8
Hotelling's T^2:     F(2,386) = 58.94, p-value = 0
L test:              L = 11.1298, p-value = 9.03 E-7
Figure 4.1.1 Cholestyramine trial. L statistic = OL, O'Brien's statistic = OB. (Horizontal axis: %CHANGE(HDL-C/TOTAL-C), -12 to 12.)
4.2 Oral contraceptives and coronary artery atherosclerosis of cynomolgus monkeys
Studies have shown that an increase in high-density lipoprotein (HDL) concentrations can reduce coronary artery atherosclerosis. As the effect of some oral contraceptives is to reduce HDL concentrations in women, there is a potential risk of increasing coronary artery atherosclerosis. Clarkson et al. (1990) studied the effect of two contraceptive steroid preparations on 83 adult female cynomolgus macaques fed a moderately atherogenic diet. Their age varied between 4 and 8 years and none were pregnant. The two preparations were: ethinyl estradiol with norgestrel and ethinyl estradiol with ethynodiol diacetate. The monkeys were randomized into three groups, one for each preparation and the control group, balanced for the ratio of total plasma cholesterol to HDL cholesterol, age, and the frequency of menstrual cycles. These characteristics are known to influence atherogenesis. There were also no differences between the three groups, during the pre-experimental period, in social status rankings, based on aggressive behavior, plasma lipid concentrations and low-density lipoprotein (LDL) cholesterol. Atherosclerosis was characterized as the cross-sectional area of intimal lesion in mm² of a histologic section of a tissue block. At necropsy, after the animals were given sodium pentobarbital, five tissue blocks were cut for each of three coronary arteries: aorta, carotid and iliaca-femoral arteries.
For this example, the effect of contraceptives on cholesterol concentrations as well as on atherosclerosis is considered. The two endpoints of interest are the natural logarithm of the ratio of the total plasma cholesterol to HDL cholesterol as measured at the end of the experiment (LTHDL) and the natural logarithm of the mean of the intimal areas over the five sections of the three coronary arteries (LMIA). The two groups that received the oral contraceptives are combined (n=49) and compared to the group that was given the placebo (n=24). The correlation between the two endpoints is 0.79, and the mean and standard deviation for each group are presented in Table 4.2.1.
Table 4.2.1 Mean LTHDL and LMIA, all subjects.
                    LTHDL          LMIA
Placebo             2.097±0.15     -3.736±0.49
Contraceptive       2.653±0.12     -4.243±0.31
In the original study, all macaques were recruited at the same time. To simulate a first accrual period with 50% of the monkeys (p = $n_1/n$ = 0.5), each group is randomly divided into two subgroups. The mean and standard deviation of each subgroup are shown in Table 4.2.2.
Table 4.2.2 Mean LTHDL and LMIA, for each accrual period
                            LTHDL          LMIA
Placebo (n=24)
  1st accrual (n=12)        2.087±0.18     -4.024±0.61
  2nd accrual (n=12)        2.106±0.25     -3.448±0.79
Contraceptive (n=49)
  1st accrual (n=24)        2.656±0.19     -4.098±0.43
  2nd accrual (n=25)        2.651±0.16     -4.383±0.44
At the interim analysis, there was an increase of 0.5688 for LTHDL in the contraceptive group while a decrease of 0.0741 was observed for LMIA. Their respective T statistics were 1.883 and -0.099. Hotelling's $T^2$ was 8.5688, which did not allow for early acceptance of the null hypothesis of no difference between the two treatments, so the trial would continue through its second stage. At the end of the trial, the group effects are 0.5566 (T statistic = 2.683) for LTHDL and -0.5073 (T statistic = -0.916) for LMIA. Table 4.2.3 shows the final results. Hotelling's $T^2$ is 31.9616 with a p-value of 0.0000002 when comparing it to a $\chi^2$ with 2 degrees of freedom. The overall mean is subtracted from each observation and the result is divided by the pooled within-group sample standard deviation before obtaining O'Brien's F equal to 0.8718; the corresponding p-value from the standard F distribution with 1 and 69 degrees of freedom in the numerator and denominator is 0.3537. Finally, the L test is calculated after transforming the variables into uncorrelated outcomes with the transformation described in Chapter 3 where
$$\mathbf{A} = \begin{pmatrix} 0.8975 & 0.6377 \\ -0.6377 & 0.8975 \end{pmatrix}.$$
Its value is 9.1201 with a p-value of 0.000001 obtained from numerical integration using the distribution function of L. The results are illustrated in Figure 4.2.1. While O'Brien's test would lead to the conclusion that the treatment does not have a significant effect on the outcomes, both Hotelling's $T^2$ and the L test would indicate that there is a significant difference between the treatment group and the control group. The power of O'Brien's test to detect alternative hypotheses that are in opposite directions, i.e. on the diagonal of quadrants II and IV, is only 5%; therefore, it is not surprising to observe a p-value > 0.05.
Table 4.2.3. Test results at the end of the cynomolgus monkeys trial.
O'Brien's test:      F(1,69) = 0.8718, p-value = 0.3537
Hotelling's T^2:     F(2,70) = 31.9616, p-value = 0.0000002
L test:              L = 9.1201, p-value = 0.000001
These examples were included to illustrate how each one of the three procedures can be implemented. The goal was not to show which test is the best one, as that cannot be determined through a few examples. Rather, the power of each test for different alternative hypotheses should be taken into account during the planning of a trial to select the most appropriate procedure.
Figure 4.2.1 Oral contraceptives trial. L statistic = OL, O'Brien's statistic = OB. (Horizontal axis: LTHDL, -10 to 10; vertical axis: LMIA, -10 to 10.)
CHAPTER 5SUMMARY AND SUGGESTIONS FOR FUTURE RESEARCH
This work has presented a global test statistic for the analysis of multiple
endpoints. Pocock and O'Brien have discussed the use of a global statistic as an
additional tool to univariate methods. Instead of leaving the reader with the
interpretation of multiple p-values, the global test provides an overall conclusion about
the differences between two treatments that takes into account the correlation structure
of the multiple endpoints. From the literature review, three procedures arose as being
the most commonly used: Bonferroni's procedure, which performs well with moderately
correlated outcomes; Hotelling's $T^2$, which has the same power to detect a difference in
any direction; and O'Brien's test, which has the greatest power of the three tests when the
variables are all affected in exactly the same direction and with the same magnitude.
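The contrast between the three procedures can be sketched numerically. The following is a minimal illustration only, assuming two endpoints whose standardized treatment differences z1 and z2 are standard normal under the null hypothesis with a known correlation rho. It uses large-sample (normal and chi-square) versions of the tests rather than the F-based forms reported in Chapter 4, and the function name `global_statistics` and its arguments are hypothetical.

```python
import math


def global_statistics(z1, z2, rho):
    """Large-sample sketch of three global tests for two endpoints.

    z1, z2: standardized treatment differences (assumed N(0, 1) under H0).
    rho:    correlation between the endpoints, assumed known, |rho| < 1.
    """
    def p_two_sided(z):
        # Two-sided tail probability of a standard normal statistic.
        return math.erfc(abs(z) / math.sqrt(2.0))

    # Bonferroni: smallest univariate p-value times the number of tests.
    bonferroni_p = min(1.0, 2.0 * min(p_two_sided(z1), p_two_sided(z2)))

    # Hotelling-type quadratic form z' R^{-1} z. With R known it is
    # chi-square with 2 df under H0, whose upper tail is exp(-q / 2).
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    hotelling_p = math.exp(-q / 2.0)

    # O'Brien-type statistic: sum of the z's rescaled to unit variance.
    ob = (z1 + z2) / math.sqrt(2.0 + 2.0 * rho)
    obrien_p = p_two_sided(ob)

    return {
        "bonferroni_p": bonferroni_p,
        "hotelling_q": q,
        "hotelling_p": hotelling_p,
        "obrien_z": ob,
        "obrien_p": obrien_p,
    }


# Effects in the same direction versus effects in opposite directions.
same = global_statistics(2.0, 2.0, 0.5)
opposite = global_statistics(2.0, -2.0, 0.5)
```

With effects in the same direction the O'Brien-type statistic is the larger of the two global statistics on its own scale; with effects in opposite directions it collapses to zero while the quadratic form grows, which mirrors the behaviour described for the monkey trial data.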
The new test that has been proposed combines the robustness of Hotelling's $T^2$
and the optimality properties of O'Brien's test for alternatives that have their effect in
the same direction and with the same magnitude, through the use of a two-stage group sequential
design. The new test allows one to 'cheat' and look at the data at the interim analysis
in order to use an O'Brien type test at the end of the trial. The 'cheating' is permissible
provided one pays the price of using the correct distribution derived in this dissertation.
The test is not limited to continuous variables like the difference between means; it can be used
with any test statistic that is normally distributed, or at least asymptotically normally
distributed, such as log odds, hazard ratios, etc. It is invariant to rotation like Hotelling's
$T^2$, which makes it robust for alternative hypotheses in any direction. Unlike the power
of O'Brien's test, its power does not deteriorate sharply when the variables are not
affected in exactly the same direction and with the same magnitude. An optimal $p = n_1/n$,
the proportion of participants recruited during the first accrual period, and $P_1 = P(L_1 < C_2)$,
the probability of accepting the null hypothesis at the interim analysis, can be
determined so that the power of the new test is greater than the power of the three
common procedures in the neighborhood of these specified values. Another attractive
property of this new test is that its application to more than two endpoints is
immediate. Furthermore, even for vectors of dimension greater than two, there always
exists an orthogonal matrix that will project the vectors onto a two-dimensional space
while preserving the angle between them so that the distribution properties already
derived remain the same for three or more endpoints.
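The projection argument can be illustrated with a short sketch. It assumes two linearly independent vectors u and v in k dimensions and builds, by Gram-Schmidt, the first two rows of an orthogonal matrix spanning their plane; all remaining coordinates of u and v under that matrix are zero, so only the two returned coordinates matter. The helper names `dot`, `project_to_plane` and `angle` are hypothetical and are not taken from the dissertation.

```python
import math


def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def project_to_plane(u, v):
    """Return the 2-D coordinates of u and v in an orthonormal basis of span{u, v}."""
    norm_u = math.sqrt(dot(u, u))
    e1 = [x / norm_u for x in u]
    # Gram-Schmidt step: remove from v its component along e1, then normalize.
    c = dot(v, e1)
    w = [x - c * y for x, y in zip(v, e1)]
    norm_w = math.sqrt(dot(w, w))
    if norm_w == 0.0:
        raise ValueError("u and v must be linearly independent")
    e2 = [x / norm_w for x in w]
    return (dot(u, e1), dot(u, e2)), (dot(v, e1), dot(v, e2))


def angle(a, b):
    """Angle in radians between two vectors."""
    cos_ab = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    return math.acos(max(-1.0, min(1.0, cos_ab)))


# Three endpoints reduced to two coordinates (illustrative vectors only).
u = [1.0, 2.0, 2.0]
v = [2.0, 0.0, 1.0]
u2, v2 = project_to_plane(u, v)
```

Because the basis is orthonormal, lengths and the angle between u and v are unchanged, which is the property the distributional results rely on.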
Future research could be performed in order to increase the usefulness of this test
in practice. For this research, the main interest was to investigate the improvement in
power when using the results generated at the interim analysis in the development of the
final statistic, so the probability of rejecting the null hypothesis at the end of the first
stage of the trial was assumed to be zero. Relaxing this assumption to allow for
rejection of the null hypothesis at the interim analysis would reduce the expected sample
size of the two-stage design and a study of the power in that case should be considered.
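The effect on the expected sample size can be written out as a short worked equation. This is a sketch under stated assumptions: $n_1$ and $n$ are the first-stage and total sample sizes, $P_1$ is the probability of accepting the null hypothesis at the interim analysis, and $R_1$ is a hypothetical symbol, not used elsewhere in this work, for the probability of rejecting the null hypothesis at the interim analysis. All probabilities are evaluated under the same hypothesis.

```latex
% Design studied here: the trial stops at stage 1 only by accepting H_0.
E(N) = n_1 + (1 - P_1)\,(n - n_1)

% Design allowing early rejection as well (R_1 is a hypothetical symbol).
E(N) = n_1 + (1 - P_1 - R_1)\,(n - n_1)
```

Since $R_1 \geq 0$, the second expression is never larger than the first when $P_1$ is held fixed, which is the reduction in expected sample size referred to above.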
Some endpoints may be related but hard to combine because of the different
level of importance of each one. For example, in a clinical trial where death and
myocardial infarction are the two endpoints of interest, the investigators may want to
consider death as a more severe outcome than myocardial infarction. An extension of
the new test that would allow the use of weights that reflect the importance of each
variable should be explored.
The test can be used in situations where the parameters are not just means. It
may be applied to statistics that are asymptotically normal. However, the parameter p,
which corresponds to the proportion of information that is observed at the time of the
analysis at the first stage, may be different from the proportion of subjects recruited
during the first period of accrual. For example, in survival analysis, p would be the
proportion of deaths at the end of the first stage relative to the total expected deaths
at the end of the trial. Although the new test applies to most maximum likelihood estimates, the
derivation of the distribution assumes that p is the same for all variables. This would
cause difficulties in combining survival and mean endpoints, where p for a mean is
typically $n_1/n$ and p in a survival context is typically $d_1/d$, where $d_1$ is the number of deaths
at the end of the first stage and d is the number of deaths at the end of the trial.
Further research is required to allow different p's for different variables.
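A small numerical sketch shows how the two information fractions can differ within the same trial. The counts below are hypothetical and chosen only for illustration, and the function name `information_fraction` is an assumption rather than notation from this work.

```python
def information_fraction(observed_at_interim, expected_at_end):
    """Proportion of the final information available at the interim analysis."""
    if not 0 < observed_at_interim <= expected_at_end:
        raise ValueError("need 0 < observed_at_interim <= expected_at_end")
    return observed_at_interim / expected_at_end


# Hypothetical trial: 100 of 200 subjects recruited at the interim analysis,
# but only 30 of the 120 expected deaths observed by then.
p_mean = information_fraction(100, 200)      # n1 / n for a mean endpoint
p_survival = information_fraction(30, 120)   # d1 / d for a survival endpoint
```

Here p would be 0.5 for the mean endpoint and 0.25 for the survival endpoint, so a derivation that assumes a single common p would not apply directly to both.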
Finally, as mentioned in Chapter 3, the improvement in power for the new
test may be partly due to the two-stage design. Further comparisons with a two-stage
design using Hotelling's $T^2$ at each stage should be investigated.
In summary, the theory presented here is an important step in the direction of
providing more powerful tools for the analysis of multiple endpoints. Although the
results have been presented in the context of clinical trials, the new procedure can be
used in other situations where multiple outcomes are analyzed with a two-stage design.
REFERENCES
Abelson, R.P. and Tukey, J.W. (1963). Efficient Utilization of Non-numerical Information in Quantitative Analysis: General Theory and the Case of Simple Order, Annals of Mathematical Statistics, 34, 1347-1369.
Anderson, T.W. (1958). An Introduction to Multivariate Statistical Analysis, Wiley, New York.
Armitage, P. (1957). Restricted Sequential Procedures, Biometrika, 44, 9-26.
Armitage, P. (1975). Sequential Medical Trials, Oxford: Blackwell.
Armitage, P. (1978). Sequential Medical Trials, Biomedicine Special Issue, 28, 40-41.
Armitage, P. and Parmar, M. (1986). Some Approaches to the Problem of Multiplicity in Clinical Trials, Proceedings of the XIIIth International Biometrics Conference.
Bauer, P. (1981). On the Assessment of the Performance of Multiple Test Procedures, Biom. Journal, 28, 811-819.
Bauer, P. (1986). Two Stage Sampling for Simultaneously Testing Main and Side Effects in Clinical Trials, Biom. Journal, 28, 811-819.
Bauer, P., Hackl, P., Hommel, G., Sonnemann, E. (1986). Multiple Testing of Pairs of One-Sided Hypotheses, Metrika, 33, 121-127.
Berry, D.A. (1988). Multiple Comparisons, Multiple Tests, and Data Dredging: A Bayesian Perspective, Bayesian Statistics 3, 79-84.
Breslow, N. (1990). Biostatistics and Bayes, Statistical Science, 5, 269-298.
Clarkson, T.B., Shively, C.A., Morgan, T.M., Korotnik, D.R., Adams, M.R. and Kaplan, J.R. (1990). Oral Contraceptives and Coronary Artery Atherosclerosis of Cynomolgus Monkeys, Obstetrics and Gynecology, 75, 217-222.
Cupples, L.A., Heeren, B.A., Schatzkin, A., Colton, T. (1984). Multiple Testing of Hypotheses in Comparing Two Groups, Annals of Internal Medicine, 100, 122-129.
DeMets, D.L. and Ware, J.H. (1980). Group Sequential Methods for Clinical Trials with a One-Sided Hypothesis, Biometrika, 67, 651-660.
DeMets, D.L. and Lan, K.K.G. (1984). An Overview of Sequential Methods and their Application in Clinical Trials, Communications in Statistics - Theory and Methods, 13 (19), 2315-2338.
Duncan, D.B. (1951). A Significance Test for Differences Between Ranked Treatments in an Analysis of Variance, Virginia Journal of Sciences, 2, 171-189.
Duncan, D.B. (1952). On the Properties of the Multiple Comparison Test, Virginia Journal of Sciences, 3, 49-61.
Dunn, O.J. (1959). Confidence Intervals for the Means of Dependent, Normally Distributed Variables, Journal of the American Statistical Association, 54, 613-621.
Fairbanks, K. and Madsen, R. (1982). P Values for Tests Using a Repeated Significance Test Design, Biometrika, 69, 69-74.
Fleming, T.R., Harrington, D.P., O'Brien, P.C. (1984). Designs for Group Sequential Tests, Controlled Clinical Trials, 5, 348-361.
Freedman, L.S., Lowe, D., Macaskill, P. (1984). Stopping Rules for Clinical Trials Incorporating Clinical Opinion, Biometrics, 40, 575-586.
Friedman, L.M., Furberg, C.D., DeMets, D.L. (1981). Fundamentals of Clinical Trials, Wright, Boston.
Gail, M. (1984). Nonparametric Frequentist Proposals for Monitoring Comparative Survival Studies, Handbook of Statistics, P.R. Krishnaiah and P.K. Sen (eds), 4, 791-811.
Geller, N.L., Pocock, S.J. (1987). Interim Analyses in Randomized Clinical Trials: Ramifications and Guidelines for Practitioners, Biometrics, 43, 213-223.
Godfrey, K. (1985). Comparing the Means of Several Groups, The New England Journal of Medicine, 313, 1450-1456.
Gould, A.L. and Pecore, V.J. (1982). Group Sequential Methods for Clinical Trials Allowing Early Acceptance of H0 and Incorporating Costs, Biometrika, 69, 75-80.
Graybill, F.A. (1969). Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Company Inc., Belmont, California.
Harrington, D.P., Fleming, T.R., Green, S.J. (1982). Procedures for Serial Testing in Censored Survival Data, IMS Monograph Series, 269-286.
Haybittle, J.L. (1971). Repeated Assessment of Results in Clinical Trials of Cancer Treatment, British Journal of Radiology, 44, 793-797.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance, Biometrika, 75, 4, 800-802.
Hochberg, Y., Tamhane, A.C. (1987). Multiple Comparison Procedures, Wiley, New York.
Holm, S. (1979). A Simple Sequentially Rejective Multiple Test Procedure, Scandinavian Journal of Statistics, 6, 65-70.
Hommel, G. (1983). Tests of the overall hypothesis for arbitrary dependence structures, Biometrical Journal, 25, 423-430.
Hommel, G. (1986). Multiple test Procedures for arbitrary dependence structures, Metrika, 33, 321-336.
Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test, Biometrika, 75, 2, 383-386.
Hommel, G. (1989). A comparison of two modified Bonferroni procedures, Biometrika, 76, 3, 624-625.
Hotelling, H. (1931). The Generalization of Student's Ratio, Annals of Mathematical Statistics, 2, 360-378.
Jennison, C. and Turnbull, B.W. (1983). Repeated Confidence Interval for Group Sequential Clinical Trials, Controlled Clinical Trials, 5, 33-45.
Johnson, L.W. and Riess, R.D. (1977). Numerical Analysis, Addison-Wesley Publishing Company, Reading.
Jones, D. and Whitehead, J. (1979a). Sequential Forms of The Log Rank and Modified Wilcoxon Tests for Censored Data, Biometrika, 66, 105-113.
Jones, D.R. and Whitehead, J. (1979b). Group Sequential Methods, British Journal of Cancer, 40, 171.
Lan, K.K.G. and DeMets, D.L. (1983). Discrete Sequential Boundaries for Clinical Trials, Biometrika, 70, 659-663.
Lan, K.K.G., DeMets, D.L., Halperin, M. (1984). More Flexible Sequential and Non-Sequential Designs in Long-Term Clinical Trials, Communications in Statistics - Theory and Methods, 13 (19), 2339-2353.
Lipid Research Clinics Program (1984). The Lipid Research Clinics Coronary Primary Prevention Trial Results: I. Reduction in Incidence of Coronary Heart Disease, Journal of the American Medical Association, 251, 351-364.
Lipid Research Clinics Program (1984). The Lipid Research Clinics Coronary Primary Prevention Trial Results: II. The Relationship of Reduction in Incidence of Coronary Heart Disease to Cholesterol Lowering, Journal of the American Medical Association, 251, 365-374.
McPherson, K. (1974). Statistics: The problem of Examining Accumulating Data More Than Once, The New England Journal of Medicine, 290, 501-502.
McPherson, K. and Armitage, P. (1971). Repeated Significance tests on accumulating data when the null hypothesis is not true, Journal of the Royal Statistical Society, Series A, 134, 15-25.
Meier, P. (1975). Statistics and Medical Experimentation, Biometrics, 31, 511-529.
Miller, R. (1981). Simultaneous Statistical Inference, McGraw-Hill, New York.
Morrison, D.F. (1976). Multivariate Statistical Methods, McGraw-Hill, New York.
Noble, B. (1969). Applied Linear Algebra, Prentice-Hall Inc., Englewood Cliffs, New Jersey.
O'Brien, P.C. (1984). Procedures for Comparing Samples with Multiple Endpoints, Biometrics, 40, 1079-1087.
O'Brien, P.C. and Fleming, T.R. (1979). A Multiple Testing Procedure for Clinical Trials, Biometrics, 35, 549-556.
Peto, R., Pike, M.C., Armitage, P., Breslow, N.E., Cox, D.R., Howard, S.V., Mantel, N., McPherson, K., Peto, J., Smith, P.G. (1976). Design and Analysis of Randomized Clinical Trials Requiring Prolonged Observation of Each Patient. I. Introduction and Design, British Journal of Cancer, 34, 585-612.
Pocock, S.J. (1977). Group Sequential Methods in the Design and Analysis of Clinical Trials, Biometrika, 64, 191-199.
Pocock, S.J. (1982). Interim Analyses for Randomized Clinical Trials: The Group Sequential Approach, Biometrics, 38, 153-162.
Pocock, S.J. (1985). Current Issues in the Design and Interpretation of Clinical Trials, British Medical Journal, 290, 39-42.
Pocock, S.J., Geller, N.L., Tsiatis, A. (1987). The Analysis of Multiple Endpoints in Clinical Trials, Biometrics, 43, 487-498.
Press, S.J. (1972). Applied Multivariate Analysis, New York: Holt, Rinehart & Winston.
Rom, D.M. (1990). A sequentially rejective test procedure based on a modified Bonferroni inequality, Biometrika, 77, 3, 663-665.
Roy, S.N. and Bose, R.C. (1953). Simultaneous Confidence Interval Estimation, Annals of Mathematical Statistics, 24, 513-536.
Ruger, B. (1978). Das Maximale Signifikanzniveau des Tests: "Lehne H0 ab, wenn k unter n Gegebenen Tests zur Ablehnung Fuhren", Metrika, 25, 171-178.
Scheffe, H. (1953). A Method for Judging all Contrasts in the Analysis of Variance, Biometrika, 40, 87-104.
Sellke, T. and Siegmund, D. (1983). Sequential Analysis of the Proportional Hazards Model, Biometrika, 70, 315-326.
Shaffer, J.P. (1986). Modified sequentially rejective multiple test procedures, Journal of the American Statistical Association, 81, 395, 826-831.
Sidak, Z. (1967). Rectangular Confidence Regions for the Means of Multivariate Normal Distributions, Journal of the American Statistical Association, 62, 626-633.
Sidak, Z. (1968). On Multivariate Normal Probabilities of Rectangles: Their dependence on Correlation, Annals of Mathematical Statistics, 5, 1425-1434.
Sidak, Z. (1971). On Probabilities of Rectangles in Multivariate Student Distributions: Their Dependence on Correlations, Annals of Mathematical Statistics, 1, 169-175.
Simes, R.J. (1986). An Improved Bonferroni Procedure for Multiple Tests of Significance, Biometrika, 73, 751-754.
Smith, D.G., Clemens, J., Crede, W., Harvey, M., Gracely, E.J. (1987). Impact of Multiple Comparisons in Randomized Clinical Trials, The American Journal of Medicine, 83, 545-550.
Tang, D., Gnecco, C., Geller, N.L. (1989). Design of Group Sequential Clinical Trials with Multiple Endpoints, Journal of the American Statistical Association, 84, 776-779.
Tsiatis, A.A. (1982). Repeated Significance Testing for a General Class of Statistics Used in Censored Survival Analysis, Journal of the American Statistical Association, 77, 855-861.
Tsiatis, A.A., Rosner, G.L., Tritchler, D.L. (1985). Group Sequential Tests with Censored Survival Data Adjusting for Covariates, Biometrika, 72, 365-373.
Tukey, J.W. (1977). Some Thoughts on Clinical Trials, Especially on Problems of Multiplicity, Science, 198, 679-684.
Wald, A. (1947). Sequential Analysis, Wiley, New York.
Wald, A. and Wolfowitz, J. (1948). Optimum Character of the Sequential Probability Ratio Test, Annals of Mathematical Statistics, 19, 326-339.
Whitehead, J. (1983). The Design and Analysis of Sequential Clinical Trials, Ellis Horwood Limited, Chichester.
Whitehead, J. and Stratton, I. (1983). Group Sequential Clinical Trials with Triangular Continuation Regions, Biometrics, 39, 227-236.
Worsley, K.J. (1982). An Improved Bonferroni Inequality and Applications, Biometrika, 69, 297-302.