Journal of the American Statistical Association, Vol. 88, No. 421 (March 1993), pp. 252-260. Stable URL: http://www.jstor.org/stable/2290720

A Quality Index Based on Data Depth and Multivariate Rank Tests

REGINA Y. LIU and KESAR SINGH*

Let F and G be the distribution functions of two given populations on $\mathbb{R}^p$, $p \ge 1$. We introduce and study a parameter $Q = Q(F, G)$, which measures the overall "outlyingness" of population G relative to population F. The parameter Q can be defined using any concept of data depth. Its value ranges from 0 to 1, and is .5 when F and G are identical. We show that within the class of elliptical distributions, when G departs from F in location or G has a larger spread, or both, the value of Q dwindles down from .5. Hence Q can be used to detect a loss of accuracy or precision in a manufacturing process, and thus it should serve as an important measure in quality assurance. This in fact is the reason why we refer to Q as a quality index in this article. In addition to studying the properties of Q, we provide an exact rank test for testing $Q = .5$ versus $Q < .5$. This test can be viewed as a multivariate analog of Wilcoxon's rank sum test. The tests proposed here have power against location change and scale increase simultaneously. We introduce some estimates of Q and investigate their limiting distributions when F = G. We also consider a version of Q and its estimates defined after correcting for the location shift of G; in this case Q measures scale increase only.

KEY WORDS: Data depth; Multivariate rank tests; Quality assurance; Quality index.

1. INTRODUCTION

A data depth is a device for measuring the centrality of a multivariate data point with respect to a given data cloud. The main purpose of this article is to propose and study a two-population parameter based on a concept of data depth, which we shall call the quality index $Q = Q(F, G)$, and some test statistics pertaining to Q. Here F and G are the distribution functions of the two given independent populations in $\mathbb{R}^p$, $p \ge 1$. The parameter Q gauges the overall "outlyingness" of the G population with respect to the given F population. The index Q can detect whether G has a different location and/or has additional dispersion as compared to F. The range of Q is [0, 1]. Under the null hypothesis $H_0: F = G$, $Q = \frac{1}{2}$; $Q < \frac{1}{2}$ indicates a possible location shift and/or a scale increase from F to G. If $Q > \frac{1}{2}$, then G has a smaller dispersion, with perhaps the same location or a relatively minor location shift. Suppose F is the population of measurements of an accepted lot arising from a manufacturing process and G is the population of a future lot whose quality, in terms of the same measurements, is to be monitored. The parameter Q studied here appears to be an attractive tool for measuring whether G is meeting the same standards as F. The proposed tests on Q may serve as useful tests for carrying out on-line inspection in quality assurance.

We now proceed to describe briefly the proposed parameter Q and its related statistics. To begin with, we use $D(F; \cdot)$ to denote a measure of data depth. Generally speaking, for a point x in $\mathbb{R}^p$, the larger $D(F; x)$ is, the deeper (or the more central) the point x is with respect to the distribution F.

In Section 2 we review the following specific notions of data depth: Mahalanobis's depth, Tukey's depth, the simplicial depth, and the majority depth. For any $y \in \mathbb{R}^p$, let

$$R(F; y) = P_F\{X : D(F; X) \le D(F; y)\},$$

* Regina Y. Liu is Associate Professor and Kesar Singh is Professor, Department of Statistics, Rutgers University, New Brunswick, NJ 08903. The authors gratefully acknowledge the support of the National Science Foundation through Grants DMS 9022126 and DMS 90-04658 and thank Arthur B. Yeh for his computing assistance and the referees and the associate editor for their helpful comments.

where X is a random observation from the F population. In the context of quality control, F is regarded as the "good" population (i.e., meeting the required standard specifications), and y is an observation from a future population G. If Y is a random observation from G, then Proposition 3.1 shows that under the hypothesis F = G, the distribution of $R(F; Y)$ is $U[0, 1]$, the uniform distribution with support [0, 1]. We define

$$Q(F, G) = P\{D(F; X) \le D(F; Y) \mid X \sim F,\ Y \sim G\}.$$

Note that $Q(F, G) = E_G R(F; Y)$. Roughly speaking, with respect to the F population, $R(F; y)$ is the fraction of the F population that is "less central" than the value y, and $Q(F, G)$ is the average of such fractions over all y's from the G population. When $Q < \frac{1}{2}$, on average more than 50% of the F population is deeper than a given measurement Y from G, indicating an inconsistency between F and G.

Because affine invariance is a practical requirement of statistics used in data analysis, we focus mainly on affine-invariant data depths. A functional defined on F, say $h(F; \cdot)$, is said to be affine invariant if for any nonsingular matrix A and any constant vector b, $h(F_{A,b}; Ax + b) = h(F; x)$, where $F_{A,b}$ is the distribution function of $AX + b$ and $X \sim F$. It is worth mentioning that if a depth notion $D(F; \cdot)$ is affine invariant, then so are its corresponding $R(F; \cdot)$ and Q. All depths mentioned in Section 2 are affine invariant.

This article is organized as follows. After reviewing some notions of data depth in Section 2, we establish several important properties of Q for elliptical distributions in Section 3. In particular, we appeal to a celebrated lemma of T. W. Anderson (1955) to show that Q is monotonically decreasing as the location parameter shifts along a line and that $Q < \frac{1}{2}$ when G is more spread out than F. In Section 4 we propose an exact rank test that in essence is a multivariate analog of the Wilcoxon rank sum test for testing $H_0: Q = \frac{1}{2}$ versus $H_a: Q < \frac{1}{2}$. In fact, the test may be called a three-sample rank sum test. In Sections 5 and 6 we consider the estimation of Q together with the related testing procedures. Specifically, the two-sample estimate $\hat{Q} = Q(F_m, G_n)$ is studied in detail in Section 6. Here $\mathbf{X} = \{X_1, \ldots, X_m\}$ is a sample from F and $\mathbf{Y} = \{Y_1, \ldots, Y_n\}$ is a sample from G, and $F_m$ and $G_n$ are their empirical distributions. Finding the limiting null distribution of $\hat{Q}$ is a nontrivial problem in asymptotics. In some cases we have been able to prove this to be $N(\frac{1}{2}, (1/m + 1/n)/12)$ (cf. Thm. 6.5), and we believe the same should hold for the remaining cases. Here $N(a, b)$ stands for a normal distribution with mean a and variance b. In Section 7 we argue that for measuring scale increase alone, one may filter out the location effect before defining Q. We refer to this modified version of Q as the scale index. Its estimate may be used for testing whether G has a larger scale than F, disregarding the location factor. To tighten the article's exposition, we defer most of the proofs to the Appendix.

For the sake of illustration, we list in Table 1 some values of Q for a bivariate case, which clearly show that location change and scale increase have a compounded (though nonlinear) effect in bringing down the value of Q. At the end of the article we also present a histogram of $Q(F_m, G_n)$ for trivariate standard normal F and G, which clearly demonstrates that the limiting distribution of $Q(F_m, G_n)$ is $N(\frac{1}{2}, (1/m + 1/n)/12)$.

Some related papers, though intrinsically different in objectives, are Brown and Hettmansperger (1987, 1989), Brown, Hettmansperger, Nyblom, and Oja (1992), and Oja and Nyblom (1989). In those papers the authors developed bivariate rank tests for testing location shifts, using the concept of ordering due to Oja (1983). In principle one can also define Q based on Oja's idea of ordering. Similarly, several other criteria that have been used to define multivariate location, as seen in Rousseeuw and Leroy (1987, chap. 7), can be viewed as depth measures, and their corresponding ranks can be used to define Q.

2. DATA DEPTH

Let $\mathbf{X} = \{X_1, \ldots, X_m\}$ be a random sample from a p-dimensional distribution F, $p \ge 1$. We describe here some notions of data depth that are affine-invariant and whose population versions all have the ability to measure the "depth" of a distribution at a point in the following sense.

Definition 2.1 (Monotonicity Property). Let F be a distribution symmetric around a point $\theta_0$; that is, if F is the distribution function of X, then $(X - \theta_0)$ and $(\theta_0 - X)$ have the same distribution. We say that the depth function $D(F; \cdot)$ has the monotonicity property if $D(F; \theta_0 + a(x - \theta_0)) \ge D(F; x)$ for all x and for any a such that $0 < a < 1$.

Table 1. $Q(F, G)$ Values for $F \sim N\bigl(\binom{0}{0}, I\bigr)$ and $G \sim N\bigl(\mu\binom{1}{1}, \sigma^2 I\bigr)$

μ \ σ     1          √2         2          3
0         .5         .333333    .2         .1
.5        .441248    .282161    .163746    .079852
1         .303265    .171139    .089866    .040657
2         .067668    .023161    .008152    .002732

The monotonicity property means that $D(F; \cdot)$ is monotonically decreasing along any fixed ray stemming from the center $\theta_0$. Unless stated otherwise, the notation $D(F; \cdot)$ stands for any of the following four depth measures throughout the article:

1. Mahalanobis's depth (MhD) (Mahalanobis 1936): Let $d(x, \mu_F)$ be the Mahalanobis distance between x and $\mu_F$ (the mean of F); that is, $d(x, \mu_F) = (x - \mu_F)'\Sigma_F^{-1}(x - \mu_F)$. We define $\mathrm{MhD}(F; x) = [1 + d(x, \mu_F)]^{-1}$, and its sample version is $[1 + (x - \bar{X})'S^{-1}(x - \bar{X})]^{-1}$, where $\Sigma_F$ and S are the covariance matrix and the sample covariance matrix and $\bar{X}$ is the sample mean. Here $x'$ indicates the transpose of the $p \times 1$ vector x.
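For concreteness, a minimal Python sketch of this sample version (our own illustration, not part of the original article):

```python
import numpy as np

def mahalanobis_depth(X, x):
    """Sample Mahalanobis depth: [1 + (x - Xbar)' S^{-1} (x - Xbar)]^{-1},
    where Xbar and S are the sample mean and sample covariance of X."""
    Xbar = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse of sample covariance
    d = (x - Xbar) @ S_inv @ (x - Xbar)             # quadratic form d(x, Xbar)
    return 1.0 / (1.0 + d)
```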

2. Tukey's depth (TD) (Tukey 1975): For a point x, we define Tukey's depth at x to be

$$\mathrm{TD}(F; x) = \inf\{F(H) : H \text{ is a closed halfspace containing } x\}.$$

The sample version of $\mathrm{TD}(F; x)$ is defined by replacing F by $F_m$. Note that when $p = 1$, $\mathrm{TD}(F; x) = \min\{F(x), 1 - F(x-)\}$.

3. Simplicial depth (SD) (Liu 1988, 1990): Let $X_1, \ldots, X_{p+1}$ be $(p + 1)$ iid observations from F. The simplicial depth at the point x is

$$\mathrm{SD}(F; x) = P_F(x \text{ is inside the closed simplex whose vertices are } X_1, \ldots, X_{p+1}),$$

and its sample version can be obtained by replacing F by $F_m$ in this expression or by computing $\binom{m}{p+1}^{-1} \sum_{(*)} I(x$ is inside the closed simplex whose vertices are $X_{i_1}, \ldots, X_{i_{p+1}})$, where the sum $(*)$ is taken over all possible subsets of $\mathbf{X}$ of size $(p + 1)$ and $I(A) = 1$ if event A occurs and 0 otherwise. In the real line case, $\mathrm{SD}(F; x) = 2F(x)(1 - F(x-))$.
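The sample simplicial depth is a simple, if expensive, count. A direct $O(m^3)$ sketch for the bivariate case (helper names are ours):

```python
import numpy as np
from itertools import combinations

def simplicial_depth_2d(X, x):
    """Fraction of the C(m,3) closed triangles with vertices in X that contain x."""
    def contains(a, b, c, pt):
        # signed-area test: pt lies in the closed triangle abc iff the three
        # cross products do not have strictly opposite signs
        s1 = (b[0]-a[0])*(pt[1]-a[1]) - (b[1]-a[1])*(pt[0]-a[0])
        s2 = (c[0]-b[0])*(pt[1]-b[1]) - (c[1]-b[1])*(pt[0]-b[0])
        s3 = (a[0]-c[0])*(pt[1]-c[1]) - (a[1]-c[1])*(pt[0]-c[0])
        return min(s1, s2, s3) >= 0 or max(s1, s2, s3) <= 0
    triples = list(combinations(range(len(X)), 3))
    return sum(contains(X[i], X[j], X[k], x) for i, j, k in triples) / len(triples)
```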

4. Majority depth (MjD) (Singh 1991): For a given random sample $X_1, \ldots, X_p$ from F, a unique hyperplane containing these points is obtained and denoted by $H(X_1, \ldots, X_p)$. With this hyperplane as the common boundary, two closed halfspaces are obtained. We say a point x is in the major side if x falls inside the halfspace that has probability greater than or equal to $\frac{1}{2}$, and we define

$$\mathrm{MjD}(F; x) = P_F\{(X_1, \ldots, X_p) : x \text{ is in a major side}\}.$$

The sample version is $\mathrm{MjD}(F_m; x)$. Note that when $p = 1$, $\mathrm{MjD}(F; x) = \frac{1}{2} + \min\{F(x), 1 - F(x-)\}$.

Remark 2.1. The monotonicity property holds for the Tukey's, simplicial, and majority depths even for the so-called angularly symmetric distributions, a class of distributions broader than the class of symmetric distributions (cf. Liu 1990). A distribution F is angularly symmetric about $\theta_0$ if $(X - \theta_0)/\|X - \theta_0\|$ and $-(X - \theta_0)/\|X - \theta_0\|$ are identically distributed.

Remark 2.2. Under proper conditions, the uniform consistency of the sample version of a depth function $D(F; \cdot)$ (i.e., $\sup_{x \in \mathbb{R}^p} |D(F_m; x) - D(F; x)| \to 0$ almost surely as $m \to \infty$) holds for all four depths. Specifically, for MhD it holds if $E_F\|X\|^2 < \infty$; for TD and SD it holds if F is absolutely continuous; and for MjD it holds if F is an elliptical distribution.

Remark 2.3. The depth MhD is the easiest to compute; however, it leads in general to nonrobust procedures, as it is defined in terms of nonrobust statistics: the sample mean and the sample dispersion.

3. THE PROPERTIES OF Q

Recall that under a given data depth $D(\cdot\,; \cdot)$, for any two given p-dimensional populations F and G, with $X \sim F$ and $Y \sim G$, we define

$$R(F; y) = P\{D(F; X) \le D(F; y) \mid X \sim F\}$$

and

$$Q(F, G) = P\{D(F; X) \le D(F; Y) \mid X \sim F,\ Y \sim G\}$$

($= E_G R(F; Y)$). In addition to the intuitive interpretation of Q in the context of quality control mentioned in Section 1, Q has several useful mathematical properties.

Proposition 3.1. If the distribution of $D(F; X)$ is continuous, then $R(F; X) \sim U[0, 1]$, the uniform distribution supported on [0, 1].

The result follows directly from the probability integral transformation.

Proposition 3.2. If $D(F; \cdot)$ is affine-invariant, then so are $R(F; Y)$ and $Q(F, G)$; that is,

$$R(F; Y) = R(F_{A,b}; AY + b)$$

and

$$Q(F, G) = Q(F_{A,b}, G_{A,b}),$$

where $F_{A,b}$ is the distribution function of $AX + b$, $G_{A,b}$ is the distribution function of $AY + b$, A is a $p \times p$ nonsingular matrix, and b is a $p \times 1$ vector.

It is worth pointing out that the preceding proposition asserts that the value of Q does not depend on the scales of the underlying measurements, as long as they are the same for F and G. In the following we argue that Q decreases from $\frac{1}{2}$ if there is a location shift or a scale increase, or both. For mathematical convenience, we restrict ourselves to the class of elliptical distributions on $\mathbb{R}^p$. We begin with the definition of such a distribution.

Definition 3.1. If a distribution has a density of the form

$$g(x) = c\,|\Sigma|^{-1/2}\, h\bigl((x - \theta)'\Sigma^{-1}(x - \theta)\bigr),$$

where $h(\cdot)$ is a function from $\mathbb{R}^+$ to $\mathbb{R}^+$, then it is called elliptical with parameters $\theta$ and $\Sigma$, and we denote it by $\mathrm{ell}(h; \theta, \Sigma)$. Here $\Sigma$ is positive definite.

For instance, if $h(t) = \exp(-t/2)$, then the distribution is multivariate normal. If the second moment of the distribution exists, then $\theta$ is the mean vector and $k\Sigma$ is the dispersion matrix, for some constant $k > 0$. Without moment conditions, $\theta$ can be viewed as a center and $\Sigma$ as a measure of spread in the following sense. If $W_1 \sim \mathrm{ell}(h; \theta, \Sigma_1)$, $W_2 \sim \mathrm{ell}(h; \theta, \Sigma_2)$, and $\Sigma_2 - \Sigma_1$ is positive definite, then $\|W_1 - \theta\| \overset{st}{<} \|W_2 - \theta\|$, where $\overset{st}{<}$ means "stochastically smaller."
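For simulation purposes, draws from an elliptical distribution are easy to generate in the normal case $h(t) = \exp(-t/2)$ via the transformation $X = \theta + \Sigma^{1/2}Z$; a minimal sketch (our own illustration):

```python
import numpy as np

def sample_elliptical_normal(theta, Sigma, size, rng=None):
    """Draw `size` points from ell(h; theta, Sigma) with h(t) = exp(-t/2),
    i.e. N(theta, Sigma), via X = theta + L Z with L the Cholesky factor."""
    rng = rng or np.random.default_rng()
    L = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((size, len(theta)))
    return np.asarray(theta) + Z @ L.T
```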

We first consider the case of a location shift but no change in dispersion. We prove that when $F \sim \mathrm{ell}(h; \theta_0, \Sigma_0)$ and $G \sim \mathrm{ell}(h; \theta, \Sigma_0)$, the Q function decreases as $\theta$ is moved away from $\theta_0$ along any line. For mathematical convenience, we assume hereafter that h is a nonincreasing function.

Proposition 3.3. Let $F \sim \mathrm{ell}(h; \theta_0, \Sigma_0)$. Assume that $D(F; \cdot)$ has affine invariance and the monotonicity property. Let $\theta_1$ and $\theta_2$ be related as

$$\theta_1 = \theta_0 + a(\theta_2 - \theta_0),$$

where $0 < a < 1$. If $Y \sim \mathrm{ell}(h; \theta_1, \Sigma_0)$ and $Z \sim \mathrm{ell}(h; \theta_2, \Sigma_0)$, then

$$R(F; Z) \overset{st}{\le} R(F; Y)$$

and

$$Q(F, \mathrm{ell}(h; \theta_2, \Sigma_0)) \le Q(F, \mathrm{ell}(h; \theta_1, \Sigma_0)).$$

Consider now the case of the same location but different dispersions; that is, $F \sim \mathrm{ell}(h; \theta_0, \Sigma_0)$ and $G \sim \mathrm{ell}(h; \theta_1, \Sigma_1)$, where

$$\theta_0 = \theta_1 \quad \text{and} \quad (\Sigma_1 - \Sigma_0) \text{ is positive definite.} \tag{1}$$

Proposition 3.4. Assume that (1) and the affine invariance of $D(F; \cdot)$ hold. Then

$$R(F; Y) \overset{st}{<} R(F; X), \quad \text{and thus} \quad Q(F, G) < \tfrac{1}{2}.$$

Proof. Let $Z \sim \mathrm{ell}(h; \theta_0, I)$, and let $X - \theta_0 = \Sigma_0^{1/2}(Z - \theta_0)$ and $Y - \theta_0 = \Sigma_1^{1/2}(Z - \theta_0)$. (Thus X and Y have their desired marginal distributions.) We need to show that

$$P\bigl((X - \theta_0)'\Sigma_0^{-1}(X - \theta_0) \le a\bigr) \ge P\bigl((Y - \theta_0)'\Sigma_0^{-1}(Y - \theta_0) \le a\bigr). \tag{2}$$

This, together with the arguments used in the proof of Proposition 3.3, is sufficient to claim the result. To prove (2), we note that

$$\text{left side} = P\bigl((Z - \theta_0)'(Z - \theta_0) \le a\bigr)$$

and

$$\text{right side} = P\bigl((Z - \theta_0)'\Sigma_1^{1/2}\Sigma_0^{-1}\Sigma_1^{1/2}(Z - \theta_0) \le a\bigr).$$

That $(\Sigma_1 - \Sigma_0)$ is positive definite implies that $(\Sigma_0^{-1} - \Sigma_1^{-1})$ is positive definite, which in turn implies that $(\Sigma_1^{1/2}\Sigma_0^{-1}\Sigma_1^{1/2} - I)$ is positive definite. The latter clearly suffices for the claim.

We provide a special case here to exemplify the just-discussed property of Q.

Example 3.1. Let F be bivariate normal $N(\mu, \Sigma)$ and let G be $N(\mu, \sigma^2\Sigma)$, where $\sigma > 0$. To obtain the Q value in this case, it suffices to consider the case where $\Sigma = I$ and $\mu = \binom{0}{0}$, in view of the invariance property. It can be shown that $Q = \int_0^\infty \sigma^{-2} e^{-w} e^{-w/\sigma^2}\, dw$, and thus $Q = (1 + \sigma^2)^{-1}$. The figures in row 1 of Table 1, obtained with a different approach, agree with this result.
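A quick Monte Carlo check of this closed form is straightforward, since under MhD the event $\{D(F; X) \le D(F; Y)\}$ here is $\{X'X \ge Y'Y\}$; a sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 2.0, 200_000
X = rng.standard_normal((n, 2))            # F = N(0, I)
Y = sigma * rng.standard_normal((n, 2))    # G = N(0, sigma^2 I)
q_mc = np.mean((X**2).sum(axis=1) >= (Y**2).sum(axis=1))  # estimates P(X'X >= Y'Y)
print(q_mc, 1 / (1 + sigma**2))            # both should be close to .2
```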

For the case of both location and scale changes, we have the following result.

Proposition 3.5. Let $F \sim \mathrm{ell}(h; \theta_0, \Sigma_0)$ and $G \sim \mathrm{ell}(h; \theta, \Sigma_1)$, where $\Sigma_1 - \Sigma_0$ is positive definite. Then $Q(F, G)$ decreases monotonically as $\theta$ is moved away from $\theta_0$ along any line. This proposition follows from the previous propositions.

Finally, we mention a monotonic change in Q in terms of increase in the scale due to Huber's contamination

$$G_\alpha = (1 - \alpha)F + \alpha H, \tag{3}$$

where $0 \le \alpha \le 1$ and H is an elliptical distribution with a higher scale than F.

Proposition 3.6. Let $F \sim \mathrm{ell}(h; \theta_0, \Sigma_0)$ and $H \sim \mathrm{ell}(h; \theta_0, \Sigma_1)$, with $\Sigma_1 - \Sigma_0$ positive definite. Define $G_\alpha$ as in (3). Then $Q(F, G_\alpha)$ is monotonically decreasing as $\alpha$ increases in [0, 1].

This result can be shown following Propositions 3.4 and 3.5 and the definition in (3).

Remark 3.1. One naturally expects some type of monotonicity property of $Q(F, G)$ as the dispersion of G "increases," but so far we have not been able to establish such a result. We tend to believe that $Q(F, G_1) > Q(F, G_2)$ when $F \sim \mathrm{ell}(h; \theta_0, \Sigma_0)$, $G_1 \sim \mathrm{ell}(h; \theta_0, \Sigma_1)$, and $G_2 \sim \mathrm{ell}(h; \theta_0, \Sigma_2)$, with both $(\Sigma_1 - \Sigma_0)$ and $(\Sigma_2 - \Sigma_1)$ being positive definite. This belief is in fact supported by Example 3.1.

Remark 3.2. It is expected from the theory that when F and G have the same location and G has a smaller scale, then $Q > \frac{1}{2}$. For example, when $F \sim N(0, 1)$ and $G \sim N(0, \frac{1}{4})$, then Q is approximately .7.

To illustrate this point, we present some values of $Q(F, G)$ in Table 1, where $F \sim N\bigl(\binom{0}{0}, I\bigr)$ and $G \sim N\bigl(\mu\binom{1}{1}, \sigma^2 I\bigr)$ with different $\mu$ and $\sigma$. Note that due to the invariance property of normal distributions, the same result holds if I is replaced by a general nonsingular $\Sigma$ and if the same vector is added to both means. Also, the value of Q is independent of the depth used, as long as it is affine-invariant and has the strict monotonicity property defined at the beginning of Section 2. The values in Table 1 are obtained based on the following observations. Under MhD, $Q(F, G) = P(X'X \ge Y'Y \mid X \sim F, Y \sim G)$. Note that $X'X \sim \chi^2_2$, a chi-squared distribution with 2 degrees of freedom, and $Y'Y \sim \sigma^2 \chi^2_2(2\mu^2)$, where $\chi^2_2(2\mu^2)$ is a chi-squared distribution with 2 degrees of freedom and noncentrality $2\mu^2$. Because $X'X$ and $Y'Y$ are independent, $Q(F, G) = P(W \le \sigma^{-2})$, where W follows a noncentral F distribution with degrees of freedom (2, 2) and noncentrality $2\mu^2$. Based on the last formula, the computer software Mathematica was used to calculate Q for various combinations of $\mu$ and $\sigma$.
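The same values can be reproduced numerically; a minimal sketch using SciPy's noncentral F distribution (our own choice of tool; the paper used Mathematica):

```python
import numpy as np
from scipy.stats import ncf

def quality_index(mu, sigma):
    """Q(F, G) for F = N(0, I_2), G = N(mu*(1,1)', sigma^2 I_2), via
    Q = P(W <= sigma^{-2}) with W ~ noncentral F(2, 2; 2*mu^2)."""
    return ncf.cdf(sigma**-2, dfn=2, dfd=2, nc=2 * mu**2)

for mu in (0, 0.5, 1, 2):
    print([round(quality_index(mu, s), 6) for s in (1, np.sqrt(2), 2, 3)])
# each printed row should match the corresponding row of Table 1
```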

4. AN EXACT RANK SUM TEST FOR Q

Let $\mathbf{X} = \{X_1, X_2, \ldots, X_m\}$ and $\mathbf{Y} = \{Y_1, Y_2, \ldots, Y_n\}$ be samples from F and G. A well-known nonparametric test for location shift in the real line case is the Wilcoxon rank sum test. The test statistic W in this method is the sum of the ranks of the Y values in the combined sample $\mathbf{X} \cup \mathbf{Y}$. The exact distribution of W is the same as that of $\gamma_1 + \gamma_2 + \cdots + \gamma_n$, where $(\gamma_1, \gamma_2, \ldots, \gamma_n)$ is a random draw without replacement from the numbers $\{1, 2, 3, \ldots, m + n\}$. The exact distributions of this statistic for different m, n are tabulated, and the asymptotics are known (see, for example, Hettmansperger 1984 and Lehmann 1975). We propose here a test of $H_0: F = G$ versus $H_a: Q < \frac{1}{2}$. Even though there are differences in the nature of the ranking here, the proposed test can be viewed as a multivariate extension of the Wilcoxon rank sum test.

Suppose that we simply combine the X and Y samples and define W as the sum of the center-outward ranks of the Y values according to a certain notion of data depth. Such a W can surely detect an increase of scale in the G population, but it cannot detect a change of location. If the scales are the same, then the distribution of this W under a change of location will be similar to the null distribution. Therefore, we approach the problem of defining a meaningful rank sum statistic somewhat differently.

Suppose that there is an additional sample $\mathbf{Z} = \{Z_1, Z_2, \ldots, Z_{n_0}\}$ from the F population, perhaps with $n_0$ substantially larger than m and n. We propose to use the empirical population of $\mathbf{Z}$, $F_{n_0}$, as the reference population. That is, for each $X_i$ we define $R(F_{n_0}; X_i)$ = the proportion of the Z sample having lower depth (computed with respect to the Z sample itself) than $X_i$; in other words, $R(F_{n_0}; X_i)$ = the proportion of $Z_j$'s with $D(F_{n_0}; Z_j) \le D(F_{n_0}; X_i)$. Note that $R(F_{n_0}; X_i)$ can be viewed as the relative rank of $X_i$ with respect to $\mathbf{Z}$. Similarly, define $R(F_{n_0}; Y_j)$. Arrange the $m + n$ values $R(F_{n_0}; X_i)$ and $R(F_{n_0}; Y_j)$ for all i and j in ascending order and assign the ranks $1, \ldots, m + n$ to them accordingly. Let

$$W = \text{the sum of the ranks of } R(F_{n_0}; Y_j), \quad j = 1, \ldots, n.$$

Under the null hypothesis F = G, the ranks of the $R(F_{n_0}; Y_j)$'s behave like a random draw of n numbers from $\{1, 2, \ldots, m + n\}$ without replacement; however, under the alternative $Q < \frac{1}{2}$, the $R(F_{n_0}; Y_j)$ values will tend to be lower than the $R(F_{n_0}; X_i)$ values and, as a result, their ranks will be lower. This in turn will make W relatively smaller. Thus this W will be able to detect changes in the index Q successfully. One technical difficulty in using this W may be the problem of tie-breaking. We propose the following nonrandom scheme for breaking the ties.

Tie-Breaking Scheme (*). Regard the values $R(F_{n_0}; X_i)$ and $R(F_{n_0}; Y_j)$, $i = 1, \ldots, m$ and $j = 1, \ldots, n$, as $p_1, p_2, \ldots, p_m, p_{m+1}, \ldots, p_{m+n}$. If k of these values are equal, say $p_{i_1} = p_{i_2} = \cdots = p_{i_k}$ with $i_1 < i_2 < \cdots < i_k$, and $\gamma$ of the $p_i$'s are smaller than these k values, then the ranks of $p_{i_1}, p_{i_2}, \ldots, p_{i_k}$ are $\gamma + 1, \gamma + 2, \ldots, \gamma + k$, in that order.
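The whole construction is easy to code; a sketch using MhD as the depth (our own helper names; SciPy's rankdata with method='ordinal' assigns tied values their ranks in order of position, which matches scheme (*) when the X-based values are listed before the Y-based ones):

```python
import numpy as np
from scipy.stats import rankdata

def depth_mahalanobis(ref, pts):
    """MhD of each row of pts with respect to the reference sample ref."""
    mu, S_inv = ref.mean(axis=0), np.linalg.inv(np.cov(ref, rowvar=False))
    diff = pts - mu
    return 1.0 / (1.0 + np.einsum('ij,jk,ik->i', diff, S_inv, diff))

def rank_sum_W(Z, X, Y):
    """W = sum of combined-sample ranks of R(F_{n0}; Y_j), with Z as reference."""
    dZ = depth_mahalanobis(Z, Z)
    rX = np.mean(dZ[None, :] <= depth_mahalanobis(Z, X)[:, None], axis=1)
    rY = np.mean(dZ[None, :] <= depth_mahalanobis(Z, Y)[:, None], axis=1)
    ranks = rankdata(np.concatenate([rX, rY]), method='ordinal')
    return ranks[len(rX):].sum()
```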

Theorem 4.1. Let $\gamma_1, \gamma_2, \ldots, \gamma_n$ be a random sample without replacement from $\{1, 2, \ldots, m + n\}$, and let $H_{m,n}$ stand for the sampling distribution of $\gamma_1 + \cdots + \gamma_n$. Assume that F has density function $f(\cdot)$. Under the null hypothesis F = G, the conditional distribution of W under the scheme (*), given a set of distinct values $(z_1, z_2, \ldots, z_{n_0})$ for $\mathbf{Z}$ and $\{x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n\}$ for $(\mathbf{X}, \mathbf{Y})$ disregarding the order, is $H_{m,n}$. Moreover, because the sample values are distinct with probability 1, the unconditional distribution of W is also $H_{m,n}$.

Proof. Under the scheme (*), each of the $(m + n)!$ permutations of $\{x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n\}$ of $\mathbf{X} \cup \mathbf{Y}$ corresponds to exactly one ranking of the $p_i$'s. The claim clearly follows from this observation.

Remark 4.1. Instead of the preceding scheme (*), one may use the following random tiebreaker: if $p_{i_1} = p_{i_2} = \cdots = p_{i_k}$, then select one of the $k!$ orderings of the ranks $\gamma + 1, \gamma + 2, \ldots, \gamma + k$ at random and assign them to $p_{i_1}, \ldots, p_{i_k}$.

Remark 4.2. The proposed test will require a considerably larger sample from the F population (of size $n_0 + m$). But this should not be of much concern, because the same sample from the F population will be used repeatedly for testing the quality of upcoming batches, a quite standard practice in the manufacturing sector.

5. Q AS A LOCATION PARAMETER

In this section it is assumed that the population F is known and that one has a sample $\mathbf{Y} = \{Y_1, Y_2, \ldots, Y_n\}$ from the population G. In practice this could mean either (a) one regards F as the collection of all measurements of one (or several) acceptable lot(s), or (b) one specifies a model for F, say an elliptical distribution (e.g., multivariate normal) with $\mu$ and $\Sigma$ obtained from the measurements of a large acceptable batch.

Recall that $Q = E_G R(F; Y)$, the mean of the random variable $R(F; Y)$ with $Y \sim G$. We may estimate Q by the corresponding sample mean:

$$Q(F, G_n) = \frac{1}{n} \sum_{i=1}^n R(F; Y_i).$$

From Proposition 3.1 of Section 3, we obtain the following theorem.

Theorem 5.1. If the distribution of $D(F; X)$ is continuous, then the distribution of $Q(F, G_n)$ under the null hypothesis F = G is the same as that of $\sum_{i=1}^n U_i/n$, where $U_1, U_2, \ldots, U_n$ are iid $U[0, 1]$ random variables.

With the values $R(F; Y_i)$, $i = 1, \ldots, n$, in hand, the problem of testing $Q(F, G) < \frac{1}{2}$ is only a location problem. One may use the sample mean $Q(F, G_n)$ to this end or resort to a nonparametric procedure, such as the sign test or the signed rank test.
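Since Theorem 5.1 gives the exact null law as a mean of n iid U[0, 1] variables, a normal approximation $N(\frac{1}{2}, 1/(12n))$ already yields a usable one-sided test for moderate n. A hedged sketch (names ours):

```python
import numpy as np
from scipy.stats import norm

def one_sample_q_test(r_values):
    """One-sided test of Q = 1/2 vs. Q < 1/2 from the values R(F; Y_i).

    Under F = G these values are iid U[0,1], so their mean is approximately
    N(1/2, 1/(12 n)) by the CLT."""
    r = np.asarray(r_values)
    z = (r.mean() - 0.5) / np.sqrt(1.0 / (12 * len(r)))
    return r.mean(), norm.cdf(z)   # estimate of Q and one-sided p-value
```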

Even if F is assumed to be a completely specified distribution, it may be an extremely tedious job (computationally) to compute the required function $R(F; Y_i)$, except in the case when one is using MhD as the underlying depth. But it turns out that when F is an elliptical distribution, the value of $R(F; Y)$ does not depend on the depth used, provided that the depth is affine-invariant and satisfies the monotonicity property. We formally state this fact as the following theorem.

Theorem 5.2. Assume that $F \sim \mathrm{ell}(h; \mu, \Sigma)$. If the depth function $D(F; \cdot)$ is affine-invariant and satisfies the strict monotonicity property on the support of F, then

$$R_D(F; Y) = R_{\mathrm{MhD}}(F; Y).$$

Here $R_D(F; Y)$ stands for $R(F; Y)$ derived from the depth $D(F; \cdot)$.

Proof. Under the assumed conditions, the contours of constant D are of the form $(x - \mu_F)'\Sigma_F^{-1}(x - \mu_F) = c$. Thus

$$R_D(F; Y) = P_F\{(X - \mu_F)'\Sigma_F^{-1}(X - \mu_F) \ge (Y - \mu_F)'\Sigma_F^{-1}(Y - \mu_F)\} = P_F\{\mathrm{MhD}(F; X) \le \mathrm{MhD}(F; Y)\} = R_{\mathrm{MhD}}(F; Y).$$

Remark 5.1. Note that if $h(t) > 0$ on $t \in [0, k]$ for some $k > 0$, or on $[0, \infty)$, then the depths of Section 2 have strict monotonicity on the support of F.

Theorem 5.2 states that if F is elliptical, then it suffices to obtain $R(F; Y_j)$ with MhD as the depth. Fortunately, computing MhD requires only $\mu$ and $\Sigma$ and not the function h.

An alternative to assuming that F is elliptical with known $\mu$ and $\Sigma$ is to treat F as a finite population consisting of the measurements of an acceptable, fully inspected lot and to use a computer-intensive approach. This approach involves repeatedly drawing a random sample of size n from F, computing $Q(F, G_n)$ each time, and finally obtaining a histogram of these $Q(F, G_n)$ values. Then a future value of $Q(F, G_n)$ based on an actual $\mathbf{Y}$ from G is placed on this histogram to test whether this value agrees with the hypothesis F = G.
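This computer-intensive recipe is a few lines of code; a sketch (ours), with F stored as a finite array of past measurements and `depth` a callable such as depth_mahalanobis from the Section 4 sketch:

```python
import numpy as np

def q_against_finite_F(F_data, Y, depth):
    """Q(F, G_n) when F is a finite population: the mean over Y of the
    fraction of F that is less deep."""
    dF, dY = depth(F_data, F_data), depth(F_data, Y)
    return np.mean([np.mean(dF <= d) for d in dY])

def null_histogram(F_data, n, depth, B=1000, rng=None):
    """Reference distribution of Q(F, G_n) under F = G, by redrawing
    samples of size n from the finite population F."""
    rng = rng or np.random.default_rng()
    draws = [F_data[rng.choice(len(F_data), size=n)] for _ in range(B)]
    return np.array([q_against_finite_F(F_data, Y, depth) for Y in draws])
```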

6. TWO-SAMPLE ESTIMATE OF Q

In practice a more realistic situation is that one has two samples, $\mathbf{X} = \{X_1, \ldots, X_m\}$ from F and $\mathbf{Y} = \{Y_1, \ldots, Y_n\}$ from G, rather than the distribution F being completely known. In this case a natural estimate of Q is

$$Q(F_m, G_n) = \frac{1}{n} \sum_{i=1}^n R(F_m; Y_i),$$

where $R(F_m; Y_i)$ is the proportion of $X_j$'s having $D(F_m; X_j) \le D(F_m; Y_i)$. Here $D(F_m; \cdot)$ is the empirical depth computed with respect to $F_m$. The difference between $Q(F_m, G_n)$ and $\frac{1}{2}$ can be used to test $H_0: F = G$ versus $H_a: Q(F, G) < \frac{1}{2}$. In this section we first prove the consistency of $Q(F_m, G_n)$ and then present some asymptotic distribution results under $H_0$.
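In code, the two-sample estimate is a direct transcription (our sketch; `depth` is a callable like depth_mahalanobis from the Section 4 sketch):

```python
import numpy as np

def q_two_sample(X, Y, depth):
    """Q(F_m, G_n) = (1/n) sum_i R(F_m; Y_i), where R(F_m; Y_i) is the
    proportion of X_j with D(F_m; X_j) <= D(F_m; Y_i)."""
    dX, dY = depth(X, X), depth(X, Y)   # depths of both samples w.r.t. F_m
    return np.mean([np.mean(dX <= d) for d in dY])
```

Under the null, this estimate is centered near $\frac{1}{2}$; Theorem 6.5 below supplies the normal approximation used for testing.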

Theorem 6.1 (Consistency of $Q(F_m, G_n)$). Assume that

$$\sup_{x \in \mathbb{R}^p} |D(F_m; x) - D(F; x)| \to 0 \quad \text{almost surely as } m \to \infty, \tag{4}$$

and that $D(F; Y)$ has a continuous distribution. Then

$$Q(F_m, G_n) \to Q(F, G) \quad \text{almost surely as } \min(m, n) \to \infty.$$

Proof. Define

$$R_\epsilon^+(F; y) = P_F\{X : D(F; X) \le D(F; y) + \epsilon\}$$

and

$$R_\epsilon^-(F; y) = P_F\{X : D(F; X) \le D(F; y) - \epsilon\}.$$

Note that for any $\epsilon > 0$ and for all large m and n,

$$\frac{1}{n}\sum_{i=1}^n R_\epsilon^-(F; Y_i) \le Q(F_m, G_n) \le \frac{1}{n}\sum_{i=1}^n R_\epsilon^+(F; Y_i)$$

almost surely. Now the result can be obtained using the strong law of large numbers and the following facts: as $\epsilon \to 0$,

$$E_G R_\epsilon^+(F; Y) \to E_G R(F; Y) = Q(F, G)$$

and

$$E_G R_\epsilon^-(F; Y) \to E_G R(F; Y) = Q(F, G).$$

The latter claims follow from the monotone convergence theorem and the condition that $D(F; Y)$ has a continuous distribution.

We now turn to the asymptotic null distribution of $[Q(F_m, G_n) - Q(F, G)]$ $(= [Q(F_m, G_n) - \frac{1}{2}])$. Assume that F = G throughout the rest of this section. We believe this distribution to be $N(0, (1/m + 1/n)/12)$ in general; however, we have been able to establish this result in Theorems 6.3, 6.4, and 6.5 only in the real line case for all four depths and in the general multivariate case for MhD. Finding the limiting distribution in general multivariate cases for SD, TD, and MjD remains an open problem.

We begin with a general result on the conditional limiting distribution of the difference $Q(F_m, G_n) - Q(F_m, F)$, where $Q(F_m, F) = E[Q(F_m, G_n) \mid \mathbf{X}]$.

Theorem 6.2. Assume that $D(F; X)$ has a continuous distribution under F and assume that (4) holds. Then, conditionally on $\mathbf{X}$, as $n \to \infty$ and $m \to \infty$,

$$\sqrt{12n}\,[Q(F_m, G_n) - Q(F_m, F)] \to N(0, 1) \text{ in distribution}$$

along almost all sequences of $\mathbf{X}$.

We first comment on the role played by this result in the eventual null distribution. Write

$$Q(F_m, G_n) - \tfrac{1}{2} = [Q(F_m, G_n) - Q(F_m, F)] + [Q(F_m, F) - \tfrac{1}{2}] = (\mathrm{I}) + (\mathrm{II}), \quad \text{say.}$$

In the cases considered in Theorems 6.3 and 6.4, the limiting distribution of (II) is $N(0, 1/(12m))$. The final claim of the limit $N(0, (1/m + 1/n)/12)$ in Theorem 6.5 follows from the fact that the limit in Theorem 6.2 is independent of $\mathbf{X}$, via a characteristic function argument.

Theorem 6.2 can be shown using the Lindeberg-Feller central limit theorem and the following lemma.

Lemma 6.1. Under the conditions of Theorem 6.2 and the assumption that F = G, conditionally on $\mathbf{X}$ we have $R(F_m; Y) \to R(F; Y)$ as $m \to \infty$, for almost all Y, along almost all $\mathbf{X}$ sequences.

Before stating the final result in Theorem 6.5, we first prove the following results regarding the limit distribution of (II).

Theorem 6.3. Assume that F is defined on $\mathbb{R}^1$, is continuous, and has a density bounded above and below in a neighborhood of the median M. For TD, SD, and MjD on $\mathbb{R}^1$, $\sqrt{m}\,[Q(F_m, F) - \tfrac{1}{2}] \to N(0, \tfrac{1}{12})$ in distribution, as $m \to \infty$.

Theorem 6.4. Assume that F is defined on $\mathbb{R}^p$ and is absolutely continuous. Assume also that $E_F\|X\|^2 < \infty$. If MhD is used to define Q, then, as $m \to \infty$,

$$\sqrt{m}\,[Q(F_m, F) - \tfrac{1}{2}] \to N(0, \tfrac{1}{12}) \text{ in distribution.}$$

Theorem 6.5. Under the conditions of Theorem 6.3 or the conditions of Theorem 6.4,

$$[(1/m + 1/n)/12]^{-1/2}\,[Q(F_m, G_n) - \tfrac{1}{2}] \to N(0, 1) \text{ in distribution, as } \min(m, n) \to \infty.$$

Figure 1 illustrates the result established in Theorem 6.5 when p = 3. We let m = n = 50 and use MhD as the depth measure. The figure is a histogram of 1,000 random values of $Z = \sqrt{300}\,(Q(F_m, G_n) - \frac{1}{2})$, where F and G are both standard normal distributions in $\mathbb{R}^3$. Note that $(1/m + 1/n)/12 = 1/300$ here.
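This experiment can be replayed with the sketches above (q_two_sample and depth_mahalanobis); the following is our own reconstruction of the simulation, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(2)
m = n = 50
p, reps = 3, 1000
z = np.empty(reps)
for b in range(reps):
    X = rng.standard_normal((m, p))   # F = N(0, I_3)
    Y = rng.standard_normal((n, p))   # G = F under the null
    q_hat = q_two_sample(X, Y, depth_mahalanobis)
    z[b] = (q_hat - 0.5) / np.sqrt((1/m + 1/n) / 12)
# a histogram of z should look approximately N(0, 1), as in Figure 1
```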

[Figure 1. 1,000 Simulations of Z's for the Three-Dimensional Case. Histogram not reproduced.]

Our simulation results reveal a negative bias in $Q(F_m, G_n)$ that seems to increase with the dimension. According to our simulation, the bias is roughly .016 when p = 2 and .03 when p = 3. A theoretical study of the nature of this bias should be interesting. We would recommend using resampling procedures, such as the jackknife or the bootstrap, to eliminate this bias in high-dimensional cases. In view of the second-order properties of the bootstrap, one would expect that the bootstrap could capture this bias and give a better approximation of the limiting distribution of $[(1/m + 1/n)/12]^{-1/2}(Q(F_m, G_n) - \frac{1}{2})$ than the normal approximation. (See also Remark 5 in Section 8.) Our simulation seems to confirm this expectation. The bootstrap bias estimate we obtained is .018 when p = 2.

7. INDEX OF SCALE: TESTING SCALE WHEN LOCATIONS ARE DIFFERENT

We begin this section with an important remark regarding local alternatives and Q. There is a stark resemblance between Q and the mean squared error (MSE) in terms of their sensitivity to local changes in the variance and in the location. For convenience, we consider only the univariate case (p = 1), though this discussion clearly carries over to general multivariate cases. Let $X \sim F$ and $Y \sim G$ on $\mathbb{R}$, where $F(\cdot) = F_0((\cdot - \theta_0)/\sigma_0)$ and $G(\cdot) = F_0((\cdot - \theta)/\sigma)$. Assume that $F_0$ has a density symmetric about 0 with a bounded derivative. In case $\sigma = \sigma_0$ and $\theta = \theta_0 + O(n^{-1/2})$, we have

$$Q = Q(F, G) = \tfrac{1}{2} + O(n^{-1})$$

and

$$E(Y - \theta_0)^2 = E(X - \theta_0)^2 + O(n^{-1}). \tag{5}$$

Note that the diluted effect is of order $O(n^{-1})$ on both Q and the MSE, whereas the location change is of the order $O(n^{-1/2})$. On the other hand, if $\theta = \theta_0$ and $\sigma = \sigma_0 + O(n^{-1/2})$, then

$$Q = \tfrac{1}{2} + O(n^{-1/2})$$

and

$$E(Y - \theta_0)^2 = E(X - \theta_0)^2 + O(n^{-1/2}). \tag{6}$$

The implication of (5) is that tests based on Q are locally inferior to other standard tests for location shift alone (keeping the dispersion unchanged). On the positive side, (5) and (6) suggest that to test scale change only, one could translate G to $G^0$, the distribution of $Y - (\theta - \theta_0)$ for $Y \sim G$, redefine Q between F and $G^0$, and then obtain a test for scale accordingly. This test seems to have good local properties. Specifically, we let $Q_s(F, G) = Q(F, G^0)$. Then a natural two-sample estimate of $Q_s(F, G)$ is

$$\hat{Q}_s = Q_s(F_m, G_n) \;(= Q(F_m, G_n^*)),$$

where $G_n^*$ stands for the empirical distribution of the adjusted sample $\mathbf{Y}^* = \{Y_i^* = Y_i - (\hat{\theta} - \hat{\theta}_0),\ i = 1, \ldots, n\}$, and $\hat{\theta}_0$ and $\hat{\theta}$ are $\sqrt{m}$- and $\sqrt{n}$-consistent estimators of $\theta_0$ and $\theta$. Because the effect of a location shift on Q is diluted, one expects to see that $Q(F_m, G_n^*) = Q(F_m, G_n^0) + o_p(\max(m^{-1/2}, n^{-1/2}))$. Here $G_n^0$ stands for the empirical distribution of $\mathbf{Y}^0 = \{Y_i^0 = Y_i - (\theta - \theta_0),\ i = 1, \ldots, n\}$, which is a theoretical sample from $G^0$. Consequently, $Q(F_m, G_n^*)$ and $Q(F_m, G_n^0)$ should have the same limiting distribution under the hypothesis that F and G differ merely in location (in this case F and $G^0$ are identical).
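A sketch of the corresponding estimate (ours; coordinatewise sample medians stand in for the root-n-consistent location estimators, one reasonable choice):

```python
import numpy as np

def scale_index(X, Y, depth):
    """Q_s(F_m, G_n) = Q(F_m, G_n^*): translate Y by the estimated location
    difference, then apply the usual two-sample estimate q_two_sample."""
    shift = np.median(Y, axis=0) - np.median(X, axis=0)   # theta_hat - theta0_hat
    return q_two_sample(X, Y - shift, depth)
```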

We state the univariate version of this result in the following theorem and present its proof in the Appendix.

Theorem 7.1. Assume that $F(\cdot) = F_0((\cdot - \theta_0)/\sigma_0)$, $G(\cdot) = F_0((\cdot - \theta)/\sigma_0)$, and $F_0$ has a bounded, uniformly continuous density symmetric around 0. With any of the four depths on the real line, if $\hat{\theta}_0$ and $\hat{\theta}$ are $\sqrt{m}$- and $\sqrt{n}$-consistent estimators of $\theta_0$ and $\theta$, then, as $\min(m, n) \to \infty$,

$$(\hat{Q}_s - \tfrac{1}{2}) \to N(0, (1/m + 1/n)/12) \text{ in distribution.}$$

Remark 7.1. If G and F differ in their scales, that is, $F(\cdot) = F_0((\cdot - \theta_0)/\sigma_0)$, $G(\cdot) = F_0((\cdot - \theta)/\sigma)$, and $\sigma \ne \sigma_0$, then $Q(F_m, G_n^*)$ will reflect the scale difference the way $Q(F_m, G_n)$ does. Note that the populations of X and $Y^0$, namely F and $G^0$, have the same location and the original standard deviations $\sigma_0$ and $\sigma$. For instance, if $\sigma_0 < \sigma$, then $Q(F_m, G_n^0)$ (or $Q(F_m, G_n^*)$) will tend to be smaller than $\frac{1}{2}$.

8. SOME CONCLUDING REMARKS, QUESTIONS, AND OPEN PROBLEMS

1. If one uses TD, SD, or MjD to define Q, then the theory presented here is free of any moment requirement. This may be regarded as an advantage of our approach for testing location, scale, or inconsistency of the output of a production line over other moment-dependent methods. For instance, Q could be used to test a possible increase of dispersion in a multivariate Cauchy population.

2. Kolmogorov-Smirnov (K-S)-type statistics, for testing F = G versus $F \ne G$, can also detect location shift and scale change simultaneously. (By scale change we mean both scale increase and decrease.) But in quality assurance the K-S test may well give a false alarm when the location of the $Y_i$'s stays the same (accuracy stays the same) and their scale decreases (precision is improved), which is obviously a desirable property in quality control. In this case of scale decrease, Q will actually give higher values, indicating improvement in consistency with the target measurements.

3. Q may be viewed as a "loss function"-free measure of the average increased "deviation" from a preset target value, in contrast to other measures such as MSE matrix (around the target value).

4. How would the power function depend on the choice of data depth? Is there an optimal notion of depth for a given location-scale family of distributions? For elliptical distributions, all four depth functions have similar contours. It should be interesting to see what implication this has on the power functions within the class of elliptical distributions.

5. The second-order property of the bootstrap asserts, roughly speaking, that if the limiting distribution is free of population parameters (as in our case here, with the limiting distribution $N(0, (1/m + 1/n)/12)$), then the bootstrap approximation is closer to the actual distribution than the limiting distribution is. This extra accuracy of the bootstrap is due to the cancellation of an extra term in the asymptotic expansion of the sampling distribution. This suggests that in principle one should be able to get a better approximation for the sampling distribution of $Q(F_m, G_n)$ by using the bootstrap. Our simulation results support this suggestion.

APPENDIX: PROOFS

A.1 Proof of Proposition 3.3

We state one of the key steps of the proof as the following lemma.

Lemma 3.1. Let $F \sim \mathrm{ell}(h; \theta_0, \Sigma_0)$. If D is affine-invariant and has the strict monotonicity property on the support of F, then the contours $\{x : D(F; x) = c\}$ are of the form

$$(x - \theta_0)'\Sigma_0^{-1}(x - \theta_0) = d_c,$$

and the contours are nested within one another as c decreases.

Proof. Let Z be a random variable with distribution H, $H \sim \mathrm{ell}(h; \theta_0, I)$. Applying proper rotations to Z, together with the affine invariance and the monotonicity properties of $D(H; \cdot)$, we see that the contours $\{x : D(H; x) = c\}$ are nested spheres. We then apply the transformation $(X - \theta_0) = \Sigma_0^{1/2}(Z - \theta_0)$ (which implies $X \sim F$) and the invariance of $D(F; \cdot)$ to see that the transformed contours are elliptical and nested.

As a direct consequence of Lemma 3.1, the set $\{R(F; Y) \ge t\}$ is of the form $(Y - \theta_0)'\Sigma_0^{-1}(Y - \theta_0) \le t^*$. Using Theorem 1 of Anderson (1955), we see that $P_G((Y - \theta_0)'\Sigma_0^{-1}(Y - \theta_0) \le t^*)$ decreases monotonically as $\theta$ moves away from $\theta_0$ on a line. The result is thus proved, because

$$E_G R(F; Y) = \int_0^1 P_G(R(F; Y) \ge t)\, dt.$$

A.2 Proof of Lemma 6.1

It suffices to show that $R(F_m; y)$ converges to $R(F; y)$ for almost all fixed y (with respect to F) along almost all sequences $\mathbf{X}$. Fix a sequence $\mathbf{X}$ along which (4) holds. For any given $\epsilon > 0$, $\sup_{y \in \mathbb{R}^p} |D(F_m; y) - D(F; y)| \le \epsilon/2$ for all $m \ge m_0$, for some $m_0$. Therefore, for all such m,

$$\{Y : D(F; Y) \ge D(F; y) + \epsilon\} \subseteq \{Y : D(F_m; Y) \ge D(F_m; y)\} \subseteq \{Y : D(F; Y) \ge D(F; y) - \epsilon\}.$$

The claim is deduced by letting $\epsilon$ tend to 0.

A.3 Proof of Theorem 6.3

In the cases considered here we can write

$$Q(F_m, F) - \tfrac{1}{2} = 2\int F_m(x) \wedge (1 - F_m(x))\, dF(x) - \tfrac{1}{2}$$

$$= -2\int F(x)\, d[F_m(x) \wedge (1 - F_m(x))] - \tfrac{1}{2} \qquad \text{(using integration by parts)}$$

$$= -\frac{2}{m}\sum_{i=1}^m \xi_i + \tfrac{1}{2} + O(m^{-1}),$$

where $M_m$ is the sample median of $F_m$ and

$$\xi_i = \begin{cases} F(X_i) & \text{if } X_i \le M_m \\ 1 - F(X_i) & \text{if } X_i > M_m. \end{cases}$$

Let

$$\xi_{i,t} = \begin{cases} F(X_i) & \text{if } X_i \le t \\ 1 - F(X_i) & \text{if } X_i > t. \end{cases}$$

Note that $\xi_i = \xi_{i,M_m}$. Note also that $E\xi_{i,t} = \tfrac{1}{4} + O((t - M)^2)$ in a neighborhood of M, where M is the median of F. This can be shown by checking that

$$|\xi_{i,t} - \xi_{i,M}| = O(|t - M|) \text{ on the interval } [M, t] \text{ if } M < t, \text{ or on } [t, M] \text{ if } M \ge t, \text{ and } = 0 \text{ elsewhere.}$$

Thus it remains to be shown that

$$2\sqrt{m}\,\Bigl[\frac{1}{m}\sum_{i=1}^m (\xi_{i,M_m} - E\xi_{i,M_m})\Bigr] \to N\Bigl(0, \frac{1}{12}\Bigr)$$

as $m \to \infty$. Because the distribution of $2\xi_{i,M}$ is $U[0, 1]$, this is achieved by showing that

$$T(M_m) = \Bigl[\frac{1}{m}\sum_{i=1}^m (\xi_{i,M_m} - E\xi_{i,M_m})\Bigr] - \Bigl[\frac{1}{m}\sum_{i=1}^m (\xi_{i,M} - E\xi_{i,M})\Bigr] = o_p(m^{-1/2}).$$

Toward this end, we show that

$$\sup_{|x - M| \le m^{-1/2}\log m} |T(x)| = o_p(m^{-1/2}),$$

using a standard set of arguments on the line: the sup is estimated by the max at the endpoints of shrinking partitions of length $m^{-1}$, and finally a probability bound coupled with the Bonferroni inequality is used.

A.4 Proof of Theorem 6.4

Let $\bar{X}$ and S denote the sample mean and the sample dispersion matrix of the $\mathbf{X}$ data set. Given $x \in \mathbb{R}^p$, define

$$A(x, \theta, \Lambda) = \{y \in \mathbb{R}^p : (y - \theta)'\Lambda^{-1}(y - \theta) \ge (x - \theta)'\Lambda^{-1}(x - \theta)\},$$

where $\theta$ is a $p \times 1$ vector and $\Lambda$ is a $p \times p$ invertible matrix. With this notation, one can write

$$Q(F_m, F) - \tfrac{1}{2} = \int \{F_m(A(x, \bar{X}, S)) - F(A(x, \bar{X}, S))\}\, dF(x),$$

provided that S is nonsingular, which is true almost surely for all large m. Because for any fixed $\mu$ and nonsingular $\Sigma$, $F(A(X, \mu, \Sigma))$ is distributed as $U[0, 1]$, it follows that

$$\sqrt{m}\int \{F_m(A(x, \mu, \Sigma)) - F(A(x, \mu, \Sigma))\}\, dF(x) \to N\Bigl(0, \frac{1}{12}\Bigr) \text{ in distribution.}$$

The proof is completed by showing that

$$\int \{F_m(A(x, \bar{X}, S)) - F(A(x, \bar{X}, S)) - F_m(A(x, \mu, \Sigma)) + F(A(x, \mu, \Sigma))\}\, dF(x) = o_p(1/\sqrt{m}).$$

Note that this result needs suitable estimates of the fluctuations of multivariate empirical processes. We have been unable to find results in the probability literature that could meet this need. On the other hand, we have succeeded in constructing a very lengthy and tedious proof for the bivariate case along the lines of the Bahadur-Kiefer representation on the real line (see, for instance, Babu and Singh 1978). We choose to omit this part of the proof here.

A.5 Proof of Theorem 7.1

Without loss of generality, we assume that $\sigma_0 = 1$. It suffices to prove that, as $l = \min\{n, m\} \to \infty$,

$$\sqrt{l}\,\bigl(Q(F_m, G_n^*) - Q(F_m, G_n^0)\bigr) \to 0 \quad \text{in probability.}$$

Note that when $D(F; \cdot)$ is taken as TD, SD, or MjD on the line,

$$Q(F_m, G_n^*) = \int 2[F_m(y) \wedge (1 - F_m(y))]\, dG_n^*(y), \tag{A.1}$$

where $G_n^*(\cdot)$ is the empirical distribution of $\mathbf{Y}^*$ and $a \wedge b = \min\{a, b\}$.

The following proof is for $Q(F_m, G_n^*)$ in (A.1); a similar but separate argument is needed for the case of MhD. We first fix constants K and $\epsilon$ such that $P(|(\hat{\theta} - \hat{\theta}_0) - (\theta - \theta_0)| > K/\sqrt{l}) < \epsilon$ and show that, with $d = \theta - \theta_0$,

$$\sup_{|t| \le K/\sqrt{l}} \sqrt{l}\,\Bigl|\int \{F_m(y - d) \wedge (1 - F_m(y - d)) - F_m(y - d + t) \wedge (1 - F_m(y - d + t))\}\, dG_n^0(y)\Bigr| \to 0 \tag{A.2}$$

as $l \to \infty$. We write this expression as (I) + (II), where (I) = (A.2) with $F_m$ replaced by F and (II) = the difference of (A.2) and (I). On the region $|y - \theta| \ge \epsilon$, the integrand in (I) equals (with $f(\cdot) = F'(\cdot) = f_0(\cdot - \theta_0)$)

$$t\,[f(y - d)\,\mathrm{sign}(y - \theta) + \delta_1(y)], \tag{A.3}$$

where $\sup_y |\delta_1(y)| \to 0$ as $l \to \infty$, using the uniform continuity of the density. On the other hand, when $|y - \theta| < \epsilon$, the integrand in (I) is of the order $o(l^{-1/2})$ uniformly in $|t| \le K/\sqrt{l}$. Thus

$$(\mathrm{I}) \le \frac{|t|}{n} \sum_{i=1}^n \psi(Y_i) + O(\epsilon\, l^{-1/2}),$$

where $\psi(Y_i) = f(Y_i - d)\,\mathrm{sign}(Y_i - \theta)\, I(|Y_i - \theta| \ge \epsilon) = f_0(Y_i - \theta)\,\mathrm{sign}(Y_i - \theta)\, I(|Y_i - \theta| \ge \epsilon)$. Because $E\psi(Y_i) = 0$ for all i and $\epsilon$ is arbitrary, $(\mathrm{I}) = o_p(l^{-1/2})$. Now it remains to be shown that

$$\sqrt{l}\, |(\mathrm{II})| \to 0 \quad \text{in probability.}$$

The result follows from the known bounds on the empirical process. We estimate the integral in two different regions separately: $|y - \theta_0| \le l^{-1/8}$ and $|y - \theta_0| > l^{-1/8}$. On the first we use the usual bound $O_p(n^{-1/2})$ on the empiricals; on the second we use the oscillation bound on the empirical process (see, for example, Stute 1982).

[Received November 1990. Revised June 1992.]

REFERENCES

Anderson, T. W. (1955), "The Integral of a Symmetric Unimodal Function Over a Symmetric Convex Set and Some Probability Inequalities," in Proceedings of the American Mathematical Society, 6, 170-176.

Babu, G. J., and Singh, K. (1978), "On Deviation Between Empirical and Quantile Processes for Mixing Random Variables," Journal of Multivariate Analysis, 8, 532-549.

Brown, B. M., and Hettmansperger, T. P. (1987), "Affine-Invariant Rank Methods in the Bivariate Location Model," Journal of the Royal Statistical Society, Ser. B, 49, 301-310.

(1989), "The Affine-Invariant Bivariate Version of the Sign Test," Journal of the Royal Statistical Society, Ser. B, 51, 117-125.

Brown, B. M., Hettmansperger, T. P., Nyblom, J., and Oja, H. (1992), "On Certain Bivariate Sign Tests and Medians," unpublished manuscript submitted to Journal of the American Statistical Association.

Hettmansperger, T. P. (1984), Statistical Inference Based on Ranks, New York: John Wiley.

Lehmann, E. L. (1975), Nonparametrics: Statistical Methods Based on Ranks, San Francisco: Holden-Day.

Liu, R. (1988), "On a Notion of Simplicial Depth," in Proceedings of the National Academy of Sciences, 85, 1732-1734.

(1990), "On a Notion of Data Depth Based on Random Simplices," The Annals of Statistics, 18, 405-414.

Mahalanobis, P. C. (1936), "On the Generalized Distance in Statistics," in Proceedings of the National Institute of Sciences of India, 2, 49-55.

Oja, H. (1983), "Descriptive Statistics for Multivariate Distributions," Statistics and Probability Letters, 1, 327-332.

Oja, H., and Nyblom, J. (1989), "On Bivariate Sign Tests," Journal of the American Statistical Association, 84, 249-259.

Rousseeuw, P. J., and Leroy, A. M. (1987), Robust Regression and Outlier Detection, New York: John Wiley.

Singh, K. (1991), "A Notion of Majority Depth," technical report, Rutgers University, Dept. of Statistics.

Stute, W. (1982), "The Oscillation Behavior of Empirical Processes," The Annals of Probability, 10, 86-107.

Tukey, J. W. (1975), "Mathematics and Picturing Data," in Proceedings of the 1974 International Congress of Mathematicians, Vancouver, 2, 523-531.