
A p-value for testing the equivalence of the variances of a bivariate normal distribution


Journal of Statistical Planning and Inference 138 (2008) 3982–3992, www.elsevier.com/locate/jspi

Thomas Mathew a,∗, Gitanjali Paul a,b

a Department of Mathematics and Statistics, University of Maryland, 1000 Hilltop Circle, Baltimore, MD 21250, USA
b GlaxoSmithKline, King of Prussia, Pennsylvania, PA 19406, USA

Received 18 May 2005; received in revised form 20 April 2007; accepted 19 February 2008
Available online 29 February 2008

Abstract

A p-value is developed for testing the equivalence of the variances of a bivariate normal distribution. The unknown correlation coefficient is a nuisance parameter in the problem. If the correlation is known, the proposed p-value provides an exact test. For large samples, the p-value can be computed by replacing the unknown correlation by the sample correlation, and the resulting test is quite satisfactory. For small samples, it is proposed to compute the p-value by replacing the unknown correlation by a scalar multiple of the sample correlation. However, a single scalar is not satisfactory, and it is proposed to use different scalars depending on the magnitude of the sample correlation coefficient. In order to implement this approach, tables are obtained providing sub-intervals for the sample correlation coefficient, and the scalars to be used if the sample correlation coefficient belongs to a particular sub-interval. Once such tables are available, the proposed p-value is quite easy to compute since it has an explicit analytic expression. Numerical results on the type I error probability and power are reported on the performance of such a test, and the proposed p-value test is also compared to another test based on a rejection region. The results are illustrated with two examples: an example dealing with the comparability of two measuring devices, and an example dealing with the assessment of bioequivalence.
© 2008 Elsevier B.V. All rights reserved.

MSC: 62F03; 62H15; 62P99

Keywords: Bioequivalence; Crossover design; Incomplete beta; Wishart distribution

1. Introduction

We address the problem of testing if the variances of a bivariate normal distribution are equivalent, i.e., if they are close according to a specified criterion. Let

\[
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}
\]

be the variance–covariance matrix of a bivariate normal distribution. The hypothesis of interest to us is the equivalence of σ11 and σ22, i.e., we want to test if the ratio σ11/σ22 is close to 1. The hypothesis can be stated as

\[
H_0: \frac{\sigma_{11}}{\sigma_{22}} \ge c \quad\text{or}\quad \frac{\sigma_{11}}{\sigma_{22}} \le \frac{1}{c}
\qquad\text{vs.}\qquad
H_1: \frac{1}{c} < \frac{\sigma_{11}}{\sigma_{22}} < c, \tag{1}
\]

∗ Corresponding author. Tel.: +1 410 455 2418; fax: +1 410 455 1066.
E-mail address: [email protected] (T. Mathew).

0378-3758/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2008.02.005


for a suitably chosen c > 1. Note that we conclude the equivalence of σ11 and σ22 if H0 is rejected. Define the parameter

\[
\psi = \max\!\left( \frac{\sigma_{11}}{\sigma_{22}}, \frac{\sigma_{22}}{\sigma_{11}} \right). \tag{2}
\]

Then our hypotheses can equivalently be stated as

\[
H_0: \psi \ge c \quad\text{vs.}\quad H_1: \psi < c. \tag{3}
\]

An article that addresses the above testing problem is Wang (1999a), who has derived a rejection region for the problem. Wang's (1999a) test is quite satisfactory; it is not too conservative. Our purpose here is to derive an easily computable p-value.

The above problem is of interest in several applications. The problem is relevant in the context of establishing the equivalency of measuring devices; for example, when we want to establish the equivalence of an alternative measuring device to a standard device. Wellek (2002, p. 85) reports an example dealing with the comparability of blood pressure measurements taken using two different automatic devices. The sample for this consists of bivariate observations on the diastolic blood pressure of 20 individuals obtained using the two devices (the data are reproduced in Table 6). Note that even if the means are equivalent, we cannot conclude that the two devices are equivalent, unless the variances are also equivalent. The problem of comparing two measuring devices is very common in applications in industrial hygiene, where it is required to obtain data on workplace exposure to toxicants. For gathering such data, industrial hygienists would prefer to use a cheap or easy to use sampling device, provided it is equivalent to an accurate standard device. For background information and examples on this, we refer to the article by Krishnamoorthy and Mathew (2002).

The testing problem (1) is also of interest in the assessment of bioequivalence in two settings. Suppose a 2 × 2 crossover design is used, with n subjects, to test the bioequivalence of a test drug T with a reference drug R. Then each subject receives T and R once, and let Y_jT and Y_jR denote the corresponding responses for the jth subject. Typically, the response obtained is the area under the curve (AUC), or the maximum blood concentration (Cmax), after a log transformation. Several authors have used the one-way random model for Y_jT and Y_jR; see, for example, Sheiner (1992), Schall and Luus (1993), Schall (1995) and Wang (1999b). Thus the model is

\[
Y_{jT} = \mu_T + \delta_{jT} + \varepsilon_{jT}, \qquad
Y_{jR} = \mu_R + \delta_{jR} + \varepsilon_{jR}, \tag{4}
\]

j = 1, 2, ..., n, where μT and μR are population mean responses corresponding to treatments T and R, δ_jT and δ_jR are random subject effects, and ε_jT and ε_jR are the random within-subject errors. It is further assumed that (δ_jT, δ_jR) follows a bivariate normal distribution with zero means and variance–covariance matrix, say ΣB, given by

\[
\Sigma_B = \begin{pmatrix} \sigma_{BT}^2 & \sigma_{BTR} \\ \sigma_{BTR} & \sigma_{BR}^2 \end{pmatrix}, \tag{5}
\]

and ε_jT ∼ N(0, σ²_WT) and ε_jR ∼ N(0, σ²_WR). Define

\[
\bar{Y}_T = \frac{1}{n}\sum_{j=1}^{n} Y_{jT}, \qquad
\bar{Y}_R = \frac{1}{n}\sum_{j=1}^{n} Y_{jR},
\]
\[
S = \sum_{j=1}^{n}
\begin{pmatrix} Y_{jT} - \bar{Y}_T \\ Y_{jR} - \bar{Y}_R \end{pmatrix}
\bigl( Y_{jT} - \bar{Y}_T,\; Y_{jR} - \bar{Y}_R \bigr),
\]
\[
\Sigma = \Sigma_B + \mathrm{diag}(\sigma_{WT}^2, \sigma_{WR}^2)
= \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \text{ (say)}.
\]

Then it is easily verified that μ̂_T = Ȳ_T and μ̂_R = Ȳ_R are unbiased estimators of μT and μR. Furthermore,

\[
\mathrm{Var}\begin{pmatrix} Y_{jT} \\ Y_{jR} \end{pmatrix} = \Sigma, \qquad
\mathrm{Var}\begin{pmatrix} \hat{\mu}_T \\ \hat{\mu}_R \end{pmatrix} = \frac{1}{n}\Sigma
\qquad\text{and}\qquad
S \sim W_2(\Sigma, n-1),
\]

where W_2(Σ, n − 1) denotes the two-dimensional Wishart distribution with n − 1 degrees of freedom and associated variance–covariance matrix Σ. We note that the alternative hypothesis in (1), or equivalently in (3), states that μ̂_T and μ̂_R (and also Y_jT and Y_jR) have equivalent variances. This is also noted in Wang (1999a).
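
For readers who want to reproduce these summary quantities, the following is a minimal sketch; it is not from the paper (whose computations were done in SAS) and assumes the paired log-responses of model (4) sit in an (n, 2) NumPy array, an array layout chosen only for this illustration.

import numpy as np

# Paired log-responses under model (4): row j = (Y_jT, Y_jR), an (n, 2) array.
def summary_stats(y):
    n = y.shape[0]
    mu_hat = y.mean(axis=0)                          # (mu_hat_T, mu_hat_R)
    resid = y - mu_hat
    S = resid.T @ resid                              # S ~ W2(Sigma, n - 1)
    rho_hat = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])   # sample correlation
    return mu_hat, S, rho_hat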


Now suppose we have an s × 4 crossover design for testing the bioequivalence of T and R, and we use the model considered by the FDA (2001); see Chow and Liu (2000) for details. Let Y_ijkl denote the lth response (log(AUC) or log(Cmax), for example) from formulation k (i.e., T or R) for the jth subject in the ith sequence; i = 1, 2, ..., s, j = 1, 2, ..., n_i, k = T, R, l = 1, 2, where n_i is the number of subjects in the ith sequence. The statistical model in the FDA (2001) document is

\[
Y_{ijkl} = \mu_k + \gamma_{ikl} + \delta_{ijk} + \varepsilon_{ijkl}, \tag{6}
\]

where μT and μR are population mean responses corresponding to treatments T and R, respectively, γ_ikl is the fixed effect corresponding to the lth application of treatment k in sequence i, satisfying the estimability condition \(\sum_{i=1}^{s}\sum_{l=1}^{2}\gamma_{ikl}=0\) (for k = T, R), δ_ijk is the random effect corresponding to treatment k for subject j in sequence i, and the ε_ijkl's are the random within-subject errors. Similar to the assumptions made for the one-way random model (4), we assume that the ε_ijkl's are normally and independently distributed with mean zero and variance σ²_Wk, k = T, R. It is further assumed that (δ_ijT, δ_ijR) follows a bivariate normal distribution with zero means and variance–covariance matrix ΣB given in (5). Define

\[
\bar{Y}_{ijk.} = (Y_{ijk1} + Y_{ijk2})/2, \qquad
\bar{Y}_{i.k.} = \frac{1}{n_i}\sum_{j=1}^{n_i} \bar{Y}_{ijk.}, \qquad
\hat{\mu}_k = \frac{1}{s}\sum_{i=1}^{s} \bar{Y}_{i.k.} \quad (k = T, R),
\]
\[
S = \sum_{i=1}^{s}\sum_{j=1}^{n_i}
\begin{pmatrix} \bar{Y}_{ijT.} - \bar{Y}_{i.T.} \\ \bar{Y}_{ijR.} - \bar{Y}_{i.R.} \end{pmatrix}
\bigl( \bar{Y}_{ijT.} - \bar{Y}_{i.T.},\; \bar{Y}_{ijR.} - \bar{Y}_{i.R.} \bigr),
\]
\[
\Sigma = \Sigma_B + \tfrac{1}{2}\,\mathrm{diag}(\sigma_{WT}^2, \sigma_{WR}^2)
= \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \text{ (say)},
\]
\[
c^2 = \frac{1}{s^2}\sum_{i=1}^{s}\frac{1}{n_i}
\qquad\text{and}\qquad
\nu = \sum_{i=1}^{s} n_i - s. \tag{7}
\]

Then

\[
\begin{pmatrix} \hat{\mu}_T \\ \hat{\mu}_R \end{pmatrix}
\sim N\!\left( \begin{pmatrix} \mu_T \\ \mu_R \end{pmatrix}, \; c^2\Sigma \right)
\qquad\text{and}\qquad
S \sim W_2(\Sigma, \nu). \tag{8}
\]

It should once again be clear that the alternative hypothesis in (1), or equivalently in (3), states that μ̂_T and μ̂_R have equivalent variances.
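
As a concrete illustration of the definitions in (7), here is a minimal sketch (assuming NumPy, and a nested-list data layout chosen only for this illustration, not taken from the paper) that computes μ̂_T, μ̂_R, S, c² and ν for the s × 4 crossover design.

import numpy as np

# y[i][j] = (Y_ijT1, Y_ijT2, Y_ijR1, Y_ijR2): subject j of sequence i.
def crossover_stats(y):
    s = len(y)
    S = np.zeros((2, 2))
    seq_means, c2, nu = [], 0.0, 0
    for seq in y:
        ni = len(seq)
        # subject-level averages over the two administrations of T and of R
        ybar = np.array([[(r[0] + r[1]) / 2.0, (r[2] + r[3]) / 2.0] for r in seq])
        seq_mean = ybar.mean(axis=0)            # (Ybar_i.T., Ybar_i.R.)
        resid = ybar - seq_mean
        S += resid.T @ resid                    # S ~ W2(Sigma, nu)
        seq_means.append(seq_mean)
        c2 += 1.0 / ni
        nu += ni - 1                            # nu = sum n_i - s
    mu_hat = np.mean(seq_means, axis=0)         # (mu_hat_T, mu_hat_R)
    return mu_hat, S, c2 / s ** 2, nu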

The paper is organized as follows. In the next section we shall develop our p-value for testing the hypotheses in (3), based on S ∼ W_2(Σ, m), where the df m depends on the model under consideration, as explained above in the context of the models (4) and (6). Towards this, we first give an explicit analytic expression for the p-value as a function of the population correlation coefficient ρ. Since ρ is unknown, we proceed as follows. Replace ρ by hρ̂, where ρ̂ is the sample correlation and the constant h is chosen so that the test performs well in terms of type I error. However, it turns out that a single scalar h does not provide satisfactory performance. Consequently, we provide sub-intervals for ρ̂, and a different constant h depending on the sub-interval to which ρ̂ belongs. Numerical results on the performance of our p-value are given in Section 3, and we have also compared our procedure with the rejection region developed by Wang (1999a). Two examples are presented in Section 4. Some concluding remarks appear in Section 5.

2. The p-value

A canonical form for our problem is as follows. Let

\[
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \sim W_2(\Sigma, m),
\]

where

\[
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}
= \begin{pmatrix} \sigma_{11} & \rho\sqrt{\sigma_{11}\sigma_{22}} \\ \rho\sqrt{\sigma_{11}\sigma_{22}} & \sigma_{22} \end{pmatrix}. \tag{9}
\]


The problem of interest is to test the hypotheses in (1), or equivalently in (3), where the parameter of interest, namely ψ, is defined in (2). Our approach can be motivated as follows. From the definition of ψ, it follows that if ρ is known, it is reasonable to use

\[
T_0 = \max\!\left( \frac{S_{11}}{S_{22}}, \frac{S_{22}}{S_{11}} \right) \tag{10}
\]

as our test statistic. It also turns out that the distribution of T_0 is stochastically increasing in ψ; this is proved in the Appendix. Hence a p-value, say p(ρ), for testing the hypotheses in (3) is given by

\[
p(\rho) = \max_{H_0} P(T_0 \le t) = P(T_0 \le t \mid \psi = c), \tag{11}
\]

where t is the observed value of T_0. Here we have used the notation p(ρ) to emphasize that the p-value depends on the unknown correlation ρ. In order that the above p-value be useful, we have to take care of the dependence of p(ρ) on ρ. Before we address this difficulty, we shall give an expression for p(ρ). The expression is based on the distribution of W = \(\sqrt{(S_{11}/\sigma_{11})/(S_{22}/\sigma_{22})}\), derived in Finney (1938); see also Kotz et al. (2000, pp. 451–454). For w ≥ 1, we have

\[
P(W \ge w) = I_{w_0}\!\left( \frac{m}{2}, \frac{m}{2} \right),
\]

where

\[
w_0 = \frac{1}{2}\left[ 1 - \frac{w - \dfrac{1}{w}}{\sqrt{\left( w + \dfrac{1}{w} \right)^2 - 4\rho^2}} \right], \tag{12}
\]

and I_{w_0}(m/2, m/2) denotes the incomplete beta function with parameters (m/2, m/2). Assume σ11 ≥ σ22 without loss of generality, so that ψ = σ11/σ22. Then we have W = \(\sqrt{(1/\psi)(S_{11}/S_{22})}\). Hence, from (10) and (11),

\[
\begin{aligned}
p(\rho) &= P\!\left( \frac{1}{t} \le \frac{S_{11}}{S_{22}} \le t \,\Big|\, \psi = c \right)
= P\!\left( \sqrt{1/(ct)} \le W \le \sqrt{t/c} \right) \\
&= P\!\left( W \ge \sqrt{1/(ct)} \right) - P\!\left( W \ge \sqrt{t/c} \right) \\
&= P\!\left( W \le \sqrt{ct} \right) - P\!\left( W \ge \sqrt{t/c} \right) \\
&= 1 - P\!\left( W \ge \sqrt{ct} \right) - P\!\left( W \ge \sqrt{t/c} \right), \tag{13}
\end{aligned}
\]

where we have used the fact that W and 1/W have the same distribution. Note that ct ≥ 1. However, t/c could be < 1. Define

\[
t_1 = \frac{1}{2}\left[ 1 - \frac{\sqrt{ct} - \sqrt{1/(ct)}}{\sqrt{\bigl( \sqrt{ct} + \sqrt{1/(ct)} \bigr)^2 - 4\rho^2}} \right],
\]
\[
t_2 = \frac{1}{2}\left[ 1 - \frac{\sqrt{t/c} - \sqrt{c/t}}{\sqrt{\bigl( \sqrt{t/c} + \sqrt{c/t} \bigr)^2 - 4\rho^2}} \right] \quad\text{if } t/c \ge 1,
\]
\[
t_3 = \frac{1}{2}\left[ 1 - \frac{\sqrt{c/t} - \sqrt{t/c}}{\sqrt{\bigl( \sqrt{c/t} + \sqrt{t/c} \bigr)^2 - 4\rho^2}} \right] \quad\text{if } t/c < 1. \tag{14}
\]


From (12)–(14), we get

\[
p(\rho) =
\begin{cases}
1 - I_{t_1}\!\left( \dfrac{m}{2}, \dfrac{m}{2} \right) - I_{t_2}\!\left( \dfrac{m}{2}, \dfrac{m}{2} \right) & \text{if } t/c \ge 1,\\[2ex]
I_{t_3}\!\left( \dfrac{m}{2}, \dfrac{m}{2} \right) - I_{t_1}\!\left( \dfrac{m}{2}, \dfrac{m}{2} \right) & \text{if } t/c < 1.
\end{cases} \tag{15}
\]
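
For concreteness, a minimal computational sketch of (12)–(15) is given below; it is not from the paper (whose computations were done in SAS), the function names are ours, and it assumes SciPy's regularized incomplete beta function betainc(a, b, x) = I_x(a, b).

import numpy as np
from scipy.special import betainc

def w0(w, rho):
    # w0 of (12), for an argument w >= 1
    return 0.5 * (1.0 - (w - 1.0 / w) / np.sqrt((w + 1.0 / w) ** 2 - 4.0 * rho ** 2))

def p_value(t, rho, c, m):
    # p(rho) of (15): t = max(S11/S22, S22/S11), c = equivalence limit, m = df
    a = m / 2.0
    t1 = w0(np.sqrt(c * t), rho)              # corresponds to P(W >= sqrt(ct))
    if t / c >= 1.0:
        t2 = w0(np.sqrt(t / c), rho)          # corresponds to P(W >= sqrt(t/c))
        return 1.0 - betainc(a, a, t1) - betainc(a, a, t2)
    t3 = w0(np.sqrt(c / t), rho)              # sqrt(t/c) < 1, so use the symmetry of W
    return betainc(a, a, t3) - betainc(a, a, t1)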

Since ρ is unknown, suppose we replace it with the sample correlation ρ̂ and carry out the test based on p(ρ̂) computed using the formula (15). Numerical results show that this results in a test procedure that is liberal, i.e., the size of the test exceeds the nominal level, especially for small values of m. However, our numerical results also show that when m is somewhat large, the test based on p(ρ̂) has size close to the nominal level. For smaller values of m, we explore the following strategy: replace ρ by hρ̂ for a suitable constant h, and carry out the test based on the p-value p(hρ̂). The constant h can be chosen so that the size of the test does not exceed the nominal level. Unfortunately, if we use a single constant h, the test based on p(hρ̂) becomes too conservative for certain values of ρ. To overcome this drawback, we use the following strategy. Note from (12) that p(ρ) is a function of |ρ|. Divide the interval [0, 1) for |ρ̂| into smaller sub-intervals. Choose a different constant h depending on the sub-interval to which |ρ̂| belongs. The division into sub-intervals and the choice of the constants h can be numerically determined so that the size of the test is close to the nominal level for various values of ρ ∈ [0, 1). For some values of m and for a 5% significance level, the partition of [0, 1) into sub-intervals is discussed in the next section.

Here then is a summary of our procedure for computing a p-value for testing the hypotheses in (3):

1. Obtain a partition of [0, 1) into sub-intervals, and obtain the corresponding h values, as illustrated in the next section.
2. Let t = observed value of max(S11/S22, S22/S11).
3. Compute the sample correlation coefficient ρ̂ = S12/√(S11 S22) and choose the h-value corresponding to the sub-interval of [0, 1) to which |ρ̂| belongs.
4. Compute t1, along with t2 or t3, using the formulas in (14), with hρ̂ in place of ρ.
5. The p-value is then given by p(hρ̂), and can be computed using the formula (15); a short computational sketch follows below.
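
Continuing the p_value() sketch given after (15) (again with function names of our own choosing), steps 2–5 can be wrapped up as follows; the h value of step 1 still has to be read off Tables 1–3 for the given m and |ρ̂|.

import numpy as np

def equivalence_p_value(S, m, c, h):
    S11, S12, S22 = S[0, 0], S[0, 1], S[1, 1]
    t = max(S11 / S22, S22 / S11)            # step 2
    rho_hat = S12 / np.sqrt(S11 * S22)       # step 3
    return p_value(t, h * rho_hat, c, m)     # steps 4 and 5

For instance, with m = 16 and ρ̂ = 0.375 the next section prescribes h = 1.08, so the reported p-value is p(1.08 × 0.375) evaluated at the observed t.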

3. Numerical results

Our numerical results show that the test based on p(ρ̂) has type I error probabilities that can exceed the nominal level, especially for small values of m, as already pointed out. Detailed numerical results on this are not reported here. However, if m is large, the test based on p(ρ̂) performs well in terms of type I error probability. In fact the test based on p(ρ̂) exhibits satisfactory performance for m ≥ 33. Our simulation results for the test based on p(ρ̂), for m = 33, gave a maximum (with respect to ρ) type I error probability of 0.0529, and most of the type I error probabilities were very close to 0.05. All of our numerical results in this section correspond to a 5% significance level, and we have chosen c = 1.25².

We shall now give some tables providing the sub-intervals of [0, 1) and the corresponding h-values required to compute the p-value p(hρ̂). Tables 1–3 provide these for values of m ranging from 8 to 32. It is possible to use the same set of h-values for several values of m; the tables have been prepared taking this into account. For example, if m = 16 and ρ̂ = 0.375, we choose h = 1.08 and the p-value is p(1.08ρ̂). Type I errors of our proposed test are given in Tables 4 and 5 for m = 16 and 22. These tables also include the type I error probabilities of a test due to Wang (1999a), to be described below, and power comparisons. We note that for the test based on the p-value p(hρ̂), the type I errors are all close to the nominal level. This is the case for all values of m ranging from 8 to 32, even though we have reported the type I error probabilities only for m = 16 and 22. Indeed, the h-values in Tables 1–3 have been obtained so as to achieve satisfactory type I error performance. Note that once we decide upon a value of c and a particular significance level, the only variable in the problem is the degrees of freedom m. Thus, once Tables 1–3 are available, the p-value is then very easy to calculate since we have a closed form formula for p(hρ̂). Similar tables for the h-values (and a SAS code for computing them) are available from the authors for other values of m, c, and the significance level.

In order to describe Wang's (1999a) rejection region for testing the hypotheses in (1), let F denote a central F random variable with df = (m, m) and let d0 satisfy

\[
P\!\left( \frac{1}{c\,d_0} < F < \frac{d_0}{c} \right) = \alpha,
\]


Table 1
Values of h for 8 ≤ m ≤ 16, c = 1.25² and a 5% significance level

8 ≤ m ≤ 11                            12 ≤ m ≤ 16
Interval for |ρ̂|           h          Interval for |ρ̂|           h
0.000 ≤ |ρ̂| < 0.005       0.600      0.00 ≤ |ρ̂| < 0.06         1.100
0.005 ≤ |ρ̂| < 0.015       0.950      0.06 ≤ |ρ̂| < 0.23         1.170
0.015 ≤ |ρ̂| < 0.045       1.150      0.23 ≤ |ρ̂| < 0.27         1.130
0.045 ≤ |ρ̂| < 0.150       1.200      0.27 ≤ |ρ̂| < 0.29         1.000
0.150 ≤ |ρ̂| < 0.220       1.150      0.29 ≤ |ρ̂| < 0.31         0.800
0.220 ≤ |ρ̂| < 0.240       1.130      0.31 ≤ |ρ̂| < 0.34         0.920
0.240 ≤ |ρ̂| < 0.400       1.100      0.34 ≤ |ρ̂| < 0.39         1.080
0.250 ≤ |ρ̂| < 0.350       1.000      0.39 ≤ |ρ̂| < 0.43         1.020
0.400 ≤ |ρ̂| < 0.650       0.875      0.43 ≤ |ρ̂| < 0.46         0.900
0.650 ≤ |ρ̂| < 0.710       0.865      0.46 ≤ |ρ̂| < 0.50         0.950
0.710 ≤ |ρ̂| < 0.740       0.750      0.50 ≤ |ρ̂| < 0.65         0.990
0.740 ≤ |ρ̂| < 0.880       0.880      0.65 ≤ |ρ̂| < 0.73         0.870
0.880 ≤ |ρ̂| < 0.960       0.960      0.73 ≤ |ρ̂| < 0.79         0.850
0.960 ≤ |ρ̂| < 0.980       0.994      0.79 ≤ |ρ̂| < 0.85         0.870
0.980 ≤ |ρ̂| < 1.000       0.999      0.85 ≤ |ρ̂| < 0.87         0.890
                                      0.87 ≤ |ρ̂| < 0.94         0.975
                                      0.94 ≤ |ρ̂| < 0.98         0.995
                                      0.98 ≤ |ρ̂| < 1.00         0.999

Table 2
Values of h for 17 ≤ m ≤ 24, c = 1.25² and a 5% significance level

17 ≤ m ≤ 20                           21 ≤ m ≤ 24
Interval for |ρ̂|           h          Interval for |ρ̂|           h
0.000 ≤ |ρ̂| < 0.060       1.20       0.00 ≤ |ρ̂| < 0.04         1.22
0.060 ≤ |ρ̂| < 0.220       1.15       0.04 ≤ |ρ̂| < 0.28         1.18
0.220 ≤ |ρ̂| < 0.240       1.10       0.28 ≤ |ρ̂| < 0.35         1.03
0.240 ≤ |ρ̂| < 0.320       1.20       0.35 ≤ |ρ̂| < 0.41         1.01
0.320 ≤ |ρ̂| < 0.420       1.10       0.41 ≤ |ρ̂| < 0.43         1.05
0.420 ≤ |ρ̂| < 0.510       0.75       0.43 ≤ |ρ̂| < 0.51         1.00
0.510 ≤ |ρ̂| < 0.530       0.80       0.51 ≤ |ρ̂| < 0.57         0.98
0.530 ≤ |ρ̂| < 0.570       0.95       0.57 ≤ |ρ̂| < 0.70         0.87
0.570 ≤ |ρ̂| < 0.660       0.80       0.70 ≤ |ρ̂| < 0.83         0.95
0.660 ≤ |ρ̂| < 0.710       0.72       0.83 ≤ |ρ̂| < 0.92         0.99
0.710 ≤ |ρ̂| < 0.770       0.78       0.92 ≤ |ρ̂| < 0.96         0.995
0.770 ≤ |ρ̂| < 0.840       0.88       0.96 ≤ |ρ̂| < 0.98         1.00
0.840 ≤ |ρ̂| < 0.880       0.895      0.98 ≤ |ρ̂| < 0.992        0.999
0.880 ≤ |ρ̂| < 0.935       0.997      0.992 ≤ |ρ̂| < 1.000       1.000
0.935 ≤ |ρ̂| < 0.980       0.995
0.980 ≤ |ρ̂| < 0.987       0.999
0.987 ≤ |ρ̂| < 1.000       0.9995

where α is the significance level. For α = 0.05, c = 1.25², and m = 16 and 22, we have d0 = 1.0483 and 1.0476, respectively. Let t_{m−1}(α) denote the 1 − α percentile of a central t distribution with df = m − 1. Define

\[
H_0(\hat{\rho}) = \frac{c\,(m-1)}{\left\{ t_{m-1}(\alpha)\,(1-\hat{\rho}^2)^{1/2} + \left[ m - 1 + (1-\hat{\rho}^2)\,t_{m-1}^2(\alpha) \right]^{1/2} \right\}^{2}},
\]
\[
H_1(\hat{\rho}) = \max\{ H_0(\hat{\rho}), 1 \},
\]
\[
H_2(\hat{\rho}) =
\begin{cases}
H_1(\hat{\rho}) & \text{if } |\hat{\rho}| > \rho_0,\\[1ex]
d_0 + \bigl( H_1(\rho_0) - d_0 \bigr)\dfrac{\hat{\rho}^2}{\rho_0^2} & \text{otherwise},
\end{cases}
\]


Table 3
Values of h for 25 ≤ m ≤ 32, c = 1.25² and a 5% significance level

25 ≤ m ≤ 28                           29 ≤ m ≤ 32
Interval for |ρ̂|         h            Interval for |ρ̂|         h
0.00 ≤ |ρ̂| < 0.03       1.22         0.00 ≤ |ρ̂| < 0.01       0.50
0.03 ≤ |ρ̂| < 0.08       1.18         0.01 ≤ |ρ̂| < 0.11       0.60
0.08 ≤ |ρ̂| < 0.10       0.75         0.11 ≤ |ρ̂| < 0.16       0.75
0.10 ≤ |ρ̂| < 0.14       1.15         0.16 ≤ |ρ̂| < 0.19       0.40
0.14 ≤ |ρ̂| < 0.20       0.60         0.19 ≤ |ρ̂| < 0.28       0.99
0.20 ≤ |ρ̂| < 0.27       0.85         0.28 ≤ |ρ̂| < 0.32       0.70
0.27 ≤ |ρ̂| < 0.30       1.05         0.32 ≤ |ρ̂| < 0.33       0.55
0.30 ≤ |ρ̂| < 0.33       0.90         0.33 ≤ |ρ̂| < 0.36       0.80
0.33 ≤ |ρ̂| < 0.38       1.00         0.36 ≤ |ρ̂| < 0.51       0.95
0.38 ≤ |ρ̂| < 0.41       0.75         0.51 ≤ |ρ̂| < 0.54       0.90
0.41 ≤ |ρ̂| < 0.61       0.90         0.54 ≤ |ρ̂| < 0.65       0.85
0.61 ≤ |ρ̂| < 0.65       0.99         0.65 ≤ |ρ̂| < 0.67       0.88
0.65 ≤ |ρ̂| < 0.72       0.94         0.67 ≤ |ρ̂| < 0.70       0.95
0.72 ≤ |ρ̂| < 0.81       0.97         0.70 ≤ |ρ̂| < 0.80       0.99
0.81 ≤ |ρ̂| < 0.91       0.985        0.80 ≤ |ρ̂| < 0.83       0.92
0.91 ≤ |ρ̂| < 0.96       0.999        0.83 ≤ |ρ̂| < 0.88       0.99
0.96 ≤ |ρ̂| < 0.98       0.995        0.88 ≤ |ρ̂| < 0.95       0.995
0.98 ≤ |ρ̂| < 1.00       0.9995       0.95 ≤ |ρ̂| < 0.99       0.999
                                      0.99 ≤ |ρ̂| < 1.00       1.00

Table 4
Type I error probabilities and power for Wang's test and the proposed test for m = 16, c = 1.25² and a 5% significance level

ρ        P(Type I error)             Power at σ11/σ22 = 1
         Wang        Proposed        Wang        Proposed
0.00     0.0458      0.0499          0.0712      0.0737
0.10     0.0472      0.0505          0.0774      0.0736
0.20     0.0466      0.0508          0.0736      0.0746
0.30     0.0422      0.0494          0.0718      0.0770
0.40     0.0452      0.0505          0.0754      0.0807
0.50     0.0468      0.0497          0.0818      0.0870
0.60     0.0414      0.0503          0.0836      0.0961
0.70     0.0492      0.0501          0.0928      0.1111
0.80     0.0390      0.0488          0.1266      0.1509
0.90     0.0420      0.0486          0.3014      0.3270
0.95     0.0504      0.0474          0.6656      0.6618
0.99     0.0506      0.0460          0.9980      0.9979

for a constant ρ0 to be chosen. Wang's test consists of rejecting H0 when

\[
\frac{1}{H_2(\hat{\rho})} \le \frac{S_{11}}{S_{22}} \le H_2(\hat{\rho}). \tag{16}
\]

The constant ρ0 is to be chosen subject to the condition that

\[
\sup_{0 \le \rho \le 1} P\!\left( \frac{1}{H_2(\hat{\rho})} \le \frac{S_{11}}{S_{22}} \le H_2(\hat{\rho}) \,\Big|\, \psi = c \right) = \alpha.
\]

For m = 16 and 22, the values of ρ0 are 0.891 and 0.840, respectively.
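
The following is a minimal sketch (assuming SciPy; not Wang's own code) of the construction just described: d0 is obtained by solving the F-probability equation numerically, H2(ρ̂) is assembled from H0 and H1 as reconstructed above, and ρ0 is taken from the text (0.891 for m = 16, 0.840 for m = 22).

import numpy as np
from scipy import stats
from scipy.optimize import brentq

def find_d0(c, m, alpha):
    # solve P(1/(c*d0) < F_{m,m} < d0/c) = alpha for d0
    F = stats.f(m, m)
    g = lambda d: F.cdf(d / c) - F.cdf(1.0 / (c * d)) - alpha
    return brentq(g, 1.0, c)   # for c = 1.25**2, m = 16, alpha = 0.05 this should be close to 1.0483

def H2(rho_hat, c, m, alpha, rho0, d0):
    tq = stats.t.ppf(1.0 - alpha, m - 1)          # t_{m-1}(alpha)
    def H1(r):
        denom = (tq * np.sqrt(1.0 - r ** 2)
                 + np.sqrt(m - 1.0 + (1.0 - r ** 2) * tq ** 2)) ** 2
        return max(c * (m - 1.0) / denom, 1.0)    # H1 = max(H0, 1)
    r = abs(rho_hat)
    if r > rho0:
        return H1(r)
    return d0 + (H1(rho0) - d0) * r ** 2 / rho0 ** 2

def wang_reject(S, c, m, alpha, rho0, d0):
    # rejection region (16): 1/H2 <= S11/S22 <= H2
    ratio = S[0, 0] / S[1, 1]
    rho_hat = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
    h2 = H2(rho_hat, c, m, alpha, rho0, d0)
    return 1.0 / h2 <= ratio <= h2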

The numerical results in Tables 4 and 5 indicate that the proposed test based on p(hρ̂) and Wang's test both have good performance in terms of type I error probability, even though Wang's test is somewhat conservative in some cases,


Table 5
Type I error probabilities and power for Wang's test and the proposed test for m = 22, c = 1.25² and a 5% significance level

ρ        P(Type I error)             Power at σ11/σ22 = 1
         Wang        Proposed        Wang        Proposed
0.00     0.04950     0.0494          0.08532     0.0858
0.10     0.04977     0.0497          0.08646     0.0862
0.20     0.04947     0.0487          0.08727     0.0889
0.30     0.04836     0.0512          0.08769     0.0914
0.40     0.04893     0.0505          0.09129     0.0965
0.50     0.04593     0.0506          0.09578     0.1063
0.60     0.04376     0.0512          0.10321     0.1195
0.70     0.04089     0.0505          0.11773     0.1488
0.80     0.03996     0.0504          0.17557     0.2287
0.90     0.04670     0.0507          0.48596     0.5111
0.95     0.04986     0.0510          0.84590     0.8461
0.99     0.04999     0.0507          1.00000     0.9999

especially around ρ = 0.80. This explains the better performance of the proposed test in terms of power, especially around ρ = 0.80.

The type I error probabilities and powers in Tables 4 and 5 were obtained based on 10,000 simulated data sets from the bivariate Wishart distribution. All the simulations were carried out using SAS Version 8.2. Our method for generating a sample from a Wishart distribution with parameters m and Σ is as follows. Suppose we wish to generate S ∼ W_2(Σ, m), where

\[
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}
\qquad\text{and}\qquad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}.
\]

It is known that

\[
\frac{S_{22}}{\sigma_{22}} \sim \chi^2(m), \qquad
\frac{S_{11.2}}{\sigma_{11.2}} \sim \chi^2(m-1)
\qquad\text{and}\qquad
\left[ S_{12} - \frac{\sigma_{12}}{\sigma_{22}} S_{22} \right] \Big/ \sqrt{\sigma_{11.2}\,S_{22}} \sim N(0, 1), \tag{17}
\]

where S_{11.2} = S_{11} − S²_{12}/S_{22}, σ_{11.2} = σ_{11} − σ²_{12}/σ_{22}, and the three random variables in (17) are independent; see Muirhead (1982, p. 93). In view of this result, in order to generate the bivariate Wishart matrix S, we first generated the independently distributed random variables u²_1 ∼ χ² with df = m, u²_2 ∼ χ² with df = m − 1, and Z ∼ N(0, 1). Using the result (17), S can be generated using the relationships S_{22} = σ_{22}u²_1, S_{11.2} = σ_{11.2}u²_2, S_{12} = Z√(σ_{11.2}S_{22}) + σ_{12}S_{22}/σ_{22}, and S_{11} = S_{11.2} + S²_{12}/S_{22}.
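
Here is a minimal sketch of this generation scheme (assuming NumPy); the paper's simulations were done in SAS 8.2, so this is only an illustrative re-implementation.

import numpy as np

def rwishart2(Sigma, m, rng):
    # one draw S ~ W2(Sigma, m) via the decomposition in (17)
    s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    s11_2 = s11 - s12 ** 2 / s22                 # sigma_{11.2}
    S22 = s22 * rng.chisquare(m)
    S11_2 = s11_2 * rng.chisquare(m - 1)
    S12 = rng.normal() * np.sqrt(s11_2 * S22) + s12 * S22 / s22
    S11 = S11_2 + S12 ** 2 / S22
    return np.array([[S11, S12], [S12, S22]])

rng = np.random.default_rng(0)
S = rwishart2(np.array([[1.0, 0.9], [0.9, 1.0]]), 16, rng)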

The h-values in Tables 1–3 were obtained by running a simulation study as follows. For a fixed m and a 5% significance level, the type I error probabilities of the proposed test were computed for various values of h (0 ≤ h ≤ 2), for a specified value of ρ. We chose that value of h for which the type I error probability was close to 0.05 (for the specified value of ρ). The values of ρ having nearly the same value of h were then grouped into one interval. Some trial and error was needed in order to arrive at a final set of h-values, since ρ is unknown, and the test has to be carried out using ρ̂. However, the trial and error adjustment was needed only for large values of ρ (values greater than 0.9). In fact the type I error probability turned out to be very sensitive to slight changes in the h-values as |ρ| approached one, which is where most of the trial and error was involved. Tables 1–3 were obtained following the above procedure. We would like to emphasize the lack of theoretical arguments to show that our proposed test is a level α test; we have only numerical results on the performance of the test. However, since p(ρ) in (11), or equivalently in (15), is the appropriate p-value when ρ is known, it appears natural to suitably modify this quantity for the case of an unknown ρ. We have arrived at a modification that performs satisfactorily, and is also easy to compute, even though our conclusions are based entirely on numerical results.

We note that the values of h in Tables 1–3 do not exhibit any monotonicity. We do not expect this since, as a function of ρ, p(ρ) is not always monotone; in fact, it can be shown that the monotonicity actually depends on the observed value t.


In other words, for certain observed values of t, p(ρ) is an increasing function of ρ, and for certain other observed values, it is a decreasing function of ρ. A proof of this is not included here.

4. Examples

Two examples are analyzed here. The first example is on the comparison of two measuring devices. The second example deals with bioequivalence testing. We have used c = 1.25² and α = 0.05 in both the examples.

4.1. Comparison of two measuring devices

The data given in Table 6 are the diastolic blood pressure measurements of 20 individuals obtained using two automatic devices. The corresponding bivariate random variable is denoted by (Y1, Y2)′. We first checked the bivariate normality of the data using a graphical method proposed by Srivastava (1984); see also Srivastava (2002, p. 70). The sample variance–covariance matrix S, based on the data in Table 6, is given by

\[
S = \begin{pmatrix} 229.93083 & 228.74226 \\ 228.74226 & 236.80891 \end{pmatrix}.
\]

Let the bivariate observations in Table 6 be denoted by y_i = (Y_{1i}, Y_{2i})′ (i = 1, 2, ..., 20), and let h1 and h2 denote the orthonormal eigenvectors of S given above. Then h1 = (0.70177, 0.71240)′ and h2 = (0.71240, −0.70177)′. Srivastava's (1984) graphical method consists of obtaining two normal probability plots based on the data h′1 y_i, i = 1, 2, ..., 20, and h′2 y_i, i = 1, 2, ..., 20. These plots (not given here) indicated that bivariate normality is reasonable.

For S given above, ρ̂ = 0.9803. From Table 2, we have h = 0.999 and the p-value is 0.0000899. Thus we reject H0 in (1) and conclude that the two measuring devices have equivalent variances.
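
The reported numbers can be checked with the p_value() sketch of Section 2 (here n = 20 paired measurements, so m = 19, and Table 2 gives h = 0.999 for |ρ̂| = 0.9803); this snippet is ours and is not part of the original analysis.

import numpy as np

S = np.array([[229.93083, 228.74226],
              [228.74226, 236.80891]])
t = max(S[0, 0] / S[1, 1], S[1, 1] / S[0, 0])
rho_hat = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])    # approximately 0.9803
p = p_value(t, 0.999 * rho_hat, 1.25 ** 2, 19)    # the text reports about 0.0000899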

4.2. An example on bioequivalence assessment

This example is based on bioequivalence data taken from the FDA website (www.fda.gov/cder/bioequivdata/). We did not carry out any model diagnostics for this example; we carried out our analysis based on the model (6) considered by the FDA (2001), for the log-transformed data.

The data for the example is based on a 4-sequence and 4-period crossover design to test the equivalence of the generic drug Anti-Depressant IR to a reference drug. The four sequences are: RRTT, RTTR, TTRR, TRRT. The response of interest in this case is Cmax, the maximum blood concentration. There were five subjects in each of the four sequences. The matrix S defined in (7) has the distribution W_2(Σ, 16), and has the value

\[
S = \begin{pmatrix} 1.702 & 1.027 \\ 1.027 & 1.442 \end{pmatrix}.
\]

We now have ρ̂ = 0.6257. From Table 1, h = 0.99 and the p-value turns out to be 0.175. Thus we fail to reject H0, and we cannot conclude that the variances of μ̂_T and μ̂_R are equivalent.

Table 6
Data for the first example

Y1         Y2          Y1          Y2
62.167     62.667      74.500      76.667
85.667     85.333      91.667      93.500
80.667     80.000      73.667      69.167
55.167     53.833      63.833      71.333
92.000     93.500      80.333      77.667
91.000     93.000      61.167      57.833
107.833    108.833     63.167      67.333
93.667     93.833      73.167      67.500
101.167    100.000     103.333     101.000
80.500     79.667      87.333      89.833


5. Concluding remarks

When we want to test the equality of the variances of a bivariate normal distribution, the unknown correlation coefficient ρ is a nuisance parameter. If ρ is known, the p-value p(ρ) in (11), or equivalently in (15), is the appropriate quantity to use for the testing problem addressed in the paper. Thus in the practical situation of an unknown ρ, it is natural to think of using ρ̂ and suitably modifying p(ρ̂), to be used as a p-value. Our suggestion is to use p(hρ̂), where the constants h have been tabulated. The scalar h to be used depends on the value of the sample correlation coefficient. Given that p(ρ) is a fairly complicated function of ρ, theoretical results appear to be difficult to obtain. However, once the table of h-values is available, our p-value is quite easy to calculate. Numerical results show that our test performs well in terms of both type I error probability and power. The results are applied to examples dealing with testing the equivalency of two measuring devices, and testing bioequivalence.

Appendix

Here we shall prove that the distribution of T_0 in (10) is stochastically increasing in ψ = max(σ11/σ22, σ22/σ11). In the proof, we shall use the expression for the density of the random variable W = \(\sqrt{(S_{11}/\sigma_{11})/(S_{22}/\sigma_{22})}\). Bose (1935) and Finney (1938) have derived the probability density function of W, and the density is

\[
f(w) = \frac{2(1-\rho^2)^{m/2}}{B\!\left( \frac{m}{2}, \frac{m}{2} \right)}\,
\frac{w^{m-1}}{(1+w^2)^m}
\left\{ 1 - \frac{4\rho^2 w^2}{(1+w^2)^2} \right\}^{-(m+1)/2}.
\]

Without loss of generality, we can assume σ11 ≥ σ22, so that ψ = max(σ11/σ22, σ22/σ11) = σ11/σ22. Define V11 = S11/σ11 and V22 = S22/σ22, so that W = \(\sqrt{V_{11}/V_{22}}\). Let

\[
\begin{aligned}
P^*(\psi) &= P\!\left( \max\!\left( \frac{S_{11}}{S_{22}}, \frac{S_{22}}{S_{11}} \right) \le k \right)
= P\!\left( \max\!\left( \frac{\psi V_{11}}{V_{22}}, \frac{V_{22}}{\psi V_{11}} \right) \le k \right) \\
&= P\!\left( \frac{\psi V_{11}}{V_{22}} \le k, \; \frac{V_{22}}{\psi V_{11}} \le k \right)
= P\!\left( \frac{1}{k\psi} \le \frac{V_{11}}{V_{22}} \le \frac{k}{\psi} \right) \\
&= P\!\left( \frac{1}{\sqrt{k\psi}} \le W \le \sqrt{\frac{k}{\psi}} \right).
\end{aligned}
\]

Using the density of W given above, we get

\[
P^*(\psi) = \int_{1/\sqrt{k\psi}}^{\sqrt{k/\psi}}
\frac{2(1-\rho^2)^{m/2}}{B\!\left( \frac{m}{2}, \frac{m}{2} \right)}\,
\frac{w^{m-1}}{(1+w^2)^m}
\left\{ 1 - \frac{4\rho^2 w^2}{(1+w^2)^2} \right\}^{-(m+1)/2} \mathrm{d}w,
\]

so that

\[
\begin{aligned}
\frac{\partial P^*(\psi)}{\partial \psi}
&= \frac{2(1-\rho^2)^{m/2}}{B\!\left( \frac{m}{2}, \frac{m}{2} \right)}\,
\frac{\partial}{\partial \psi}
\left[ \int_{1/\sqrt{k\psi}}^{\sqrt{k/\psi}}
\frac{w^{m-1}}{(1+w^2)^m}
\left\{ 1 - \frac{4\rho^2 w^2}{(1+w^2)^2} \right\}^{-(m+1)/2} \mathrm{d}w \right] \\
&= \frac{2(1-\rho^2)^{m/2}}{B\!\left( \frac{m}{2}, \frac{m}{2} \right)}
\left[ \frac{(k/\psi)^{(m-1)/2}}{(1 + k/\psi)^m}
\left\{ 1 - \frac{4\rho^2 (k/\psi)}{(1 + k/\psi)^2} \right\}^{-(m+1)/2}
\left( -\tfrac{1}{2} k^{1/2} \psi^{-3/2} \right) \right. \\
&\qquad\qquad \left. {} - \frac{\bigl(1/(k\psi)\bigr)^{(m-1)/2}}{\bigl(1 + 1/(k\psi)\bigr)^m}
\left\{ 1 - \frac{4\rho^2/(k\psi)}{\bigl(1 + 1/(k\psi)\bigr)^2} \right\}^{-(m+1)/2}
\left( -\tfrac{1}{2} k^{-1/2} \psi^{-3/2} \right) \right] \\
&= \frac{(1-\rho^2)^{m/2}}{B\!\left( \frac{m}{2}, \frac{m}{2} \right)}
\left[ \frac{k^{m/2}\psi^{(m-2)/2}}{(1 + k\psi)^m}
\left\{ 1 - \frac{4\rho^2 k\psi}{(1 + k\psi)^2} \right\}^{-(m+1)/2}
- \frac{k^{m/2}\psi^{(m-2)/2}}{(k + \psi)^m}
\left\{ 1 - \frac{4\rho^2 k\psi}{(k + \psi)^2} \right\}^{-(m+1)/2} \right] \\
&= \frac{(1-\rho^2)^{m/2}}{B\!\left( \frac{m}{2}, \frac{m}{2} \right)}\,
k^{m/2}\psi^{(m-2)/2}
\left[ \left\{ 1 - \frac{4\rho^2 k\psi}{(1 + k\psi)^2} \right\}^{-(m+1)/2}
\frac{1}{(1 + k\psi)^m}
- \left\{ 1 - \frac{4\rho^2 k\psi}{(k + \psi)^2} \right\}^{-(m+1)/2}
\frac{1}{(\psi + k)^m} \right]. \tag{18}
\end{aligned}
\]

Note that (kψ + 1) − (k + ψ) = (k − 1)(ψ − 1) ≥ 0, since ψ ≥ 1 and k ≥ 1. Hence,

\[
\left[ 1 - \frac{4\rho^2 k\psi}{(k\psi + 1)^2} \right]
\ge
\left[ 1 - \frac{4\rho^2 k\psi}{(k + \psi)^2} \right]
\;\Rightarrow\;
\left[ 1 - \frac{4\rho^2 k\psi}{(k\psi + 1)^2} \right]^{-(m+1)/2}
\le
\left[ 1 - \frac{4\rho^2 k\psi}{(k + \psi)^2} \right]^{-(m+1)/2}.
\]

We also have 1/(kψ + 1)^m ≤ 1/(k + ψ)^m. Hence the expression inside the bracket in (18) is ≤ 0, implying that

\[
P\!\left( \max\!\left( \frac{S_{11}}{S_{22}}, \frac{S_{22}}{S_{11}} \right) \le k \right) \downarrow \psi
\;\Rightarrow\;
P\!\left( \max\!\left( \frac{S_{11}}{S_{22}}, \frac{S_{22}}{S_{11}} \right) > k \right) \uparrow \psi.
\]

Hence, the distribution of max(S11/S22, S22/S11) is stochastically increasing in max(σ11/σ22, σ22/σ11).
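
As a quick sanity check of this ordering (not part of the original paper), one can estimate P(T0 > k) by Monte Carlo for increasing values of ψ, reusing the rwishart2() sketch from Section 3; under the result just proved, the estimates should increase with ψ.

import numpy as np

rng = np.random.default_rng(1)
m, rho, k, reps = 16, 0.5, 1.3, 20000
for psi in (1.0, 1.25, 1.5625):
    Sigma = np.array([[psi, rho * np.sqrt(psi)],
                      [rho * np.sqrt(psi), 1.0]])
    exceed = 0
    for _ in range(reps):
        S = rwishart2(Sigma, m, rng)
        exceed += max(S[0, 0] / S[1, 1], S[1, 1] / S[0, 0]) > k
    print(psi, exceed / reps)     # estimated P(T0 > k); increasing in psi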

References

Bose, S.S., 1935. On the distribution of the ratio of variances of two samples drawn from a given normal bivariate correlated population. Sankhyā 2, 65–72.
Chow, S.C., Liu, J.P., 2000. Design and Analysis of Bioavailability and Bioequivalence Studies, second ed. Marcel Dekker, New York.
FDA, 2001. Guidance for Industry: Statistical Approaches to Establishing Bioequivalence. Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Rockville, MD, USA.
Finney, D.J., 1938. The distribution of the ratio of estimates of the two variances in a sample from a normal bivariate population. Biometrika 30, 190–192.
Kotz, S., Balakrishnan, N., Johnson, N.L., 2000. Continuous Multivariate Distributions, second ed. Wiley, New York.
Krishnamoorthy, K., Mathew, T., 2002. Statistical methods for establishing equivalency of a sampling device to the OSHA standard. Amer. Industrial Hygiene Assoc. J. 63, 567–571.
Muirhead, R.J., 1982. Aspects of Multivariate Statistical Theory. Wiley, New York.
Schall, R., 1995. Assessment of individual and population bioequivalence using the probability that bioavailabilities are similar. Biometrics 51, 615–626.
Schall, R., Luus, H.G., 1993. On population and individual bioequivalence. Statist. Med. 12, 1109–1124.
Sheiner, L.B., 1992. Bioequivalence revisited. Statist. Med. 11, 1777–1788.
Srivastava, M.S., 1984. A measure of skewness and kurtosis and a graphical method for assessing multivariate normality. Statist. Probab. Lett. 2, 263–267.
Srivastava, M.S., 2002. Methods of Multivariate Statistics. Wiley, New York.
Wang, W., 1999a. On equivalence of two variances of a bivariate normal vector. J. Statist. Plann. Inference 81, 279–292.
Wang, W., 1999b. On testing of individual bioequivalence. J. Amer. Statist. Assoc. 94, 880–887.
Wellek, S., 2002. Testing Statistical Hypotheses of Equivalence. Chapman and Hall, New York.