Kruskal, William. "The Geometry of Generalized Inverses." Journal of the Royal Statistical Society. Series B (Methodological), Vol. 37, No. 2 (1975), pp. 272-283. Published by Wiley for the Royal Statistical Society. Stable URL: http://www.jstor.org/stable/2984898


    The Geometry of Generalized Inverses

By WILLIAM KRUSKAL

University of Chicago

    [Received July 1973. Final revision November 1974]

SUMMARY

Generalized inverses of linear transformations must satisfy at least the natural requirement that they are true inverses for appropriately restricted subspaces. There are three other characteristics that may or may not hold independently. The four properties are described in coordinate-free geometrical terms, followed by discussion of dimensionalities and other topics.

Keywords: GENERALIZED INVERSES; INVERSES; LEAST SQUARES; LINEAR TRANSFORMATIONS; MOORE-PENROSE INVERSE

1. INTRODUCTION

J. BIBBY (1972), in his review of a monograph on generalized matrix inverses, complained wryly that three† current books on the subject made no reference to each other, nor did they use consistent terminologies, nor did they explain fully and clearly the different varieties of generalized inverses. Those varieties are formed by the ways in which three or four dichotomous characteristics may hold or fail to hold.

    Bibby's third complaint is his most interesting, and it motivates this paper. I shall present simple geometrical interpretations for the three optional dichotomous characteristics of common discourse, and for the underlying characteristic that appears essential for any transformation worthy of the name "inverse". The exposition will be from the coordinate-free approach to vector spaces, an approach given masterful textbook exposition by Paul Halmos (1958); two articles of mine in the statistical literature (Kruskal 1961, 1968) exemplify use of the approach in the linear hypothesis context. In particular, I shall write mostly in a language of linear transformations rather than one of matrices.

The underlying characteristic, presumably a sine qua non for generalized inverseness, is this: A- is a generalized inverse of the linear transformation A if A- is a one-to-one linear transformation from the range of A to some linear manifold such that application of A-, followed by application of A, is the identity on A's range.

The linear manifold mentioned just above may, except in the presently uninteresting case that A is non-singular, be chosen in many ways. It must, of course, have the same dimension as the range of A, and it must be disjoint‡ from A's null space.

† Bibby's review was of Boullion and Odell (1971). The two other monographs mentioned were Pringle and Rayner (1971) and Rao and Mitra (1971). Since then, at least two more monographs about generalized inverses have appeared: Albert (1972) and Ben-Israel and Greville (1974). It is remarkable that four of these five monographs are of double authorship.

    t "Disjoint" for linear manifolds means having no vectors in common except for the origin, the zero vector. The preimage of a set N under a transformation B is the set of all vectors taken by B into N.


A second scope for free choice is in the definition of A- for arguments not in the range of A. So far as the sine qua non basic property is concerned, there is no restriction whatever on what A- does outside A's range, provided only that A- is a linear transformation.

    The three dichotomous characteristics now to be sketched cut down the above free choices in different, geometrically natural, ways. If all three are required, there is a unique generalized inverse.

    The first of the three characteristics is the egalitarian one of equal rank; it says that the rank of A- is the same as that of A. An equivalent requirement is that the range of A- be exactly the choosable manifold mentioned three paragraphs above.

The second characteristic is orthogonality of the choosable manifold and the null space of A. The third is orthogonality of the range of A and the preimage‡ under A- of the null space of A.

I shall, following sensible tradition, suppose that the underlying or sine qua non property must hold for membership in the generalized inverse club. The other three may hold or fail independently (in the logical sense), so that there are eight resulting kinds of generalized inverse. Each may be readily described and discussed in the above geometrical terms.

The analysis of this paper can hardly be novel. Yet the recent monographs on generalized inverses, and Bibby's review, suggest that a succinct exposition of the coordinate-free approach might be useful.† In another direction, I should mention more general approaches, in particular those of Beutler and Root (1974) and the twin papers of Nashed and Votruba (1974); note the continued tradition of double authorship. These papers, and earlier ones to which they refer, extend the generalized inverse concept to classes of spaces broader than the finite-dimensional vector spaces of the present discussion.

    I end this paper with an outline of the relationships of our theme to the solution of simultaneous linear relations and to generalized least squares in two senses.

Some readers may share my desire to gain security and concreteness from simple examples. To that end, one section of this paper exemplifies the general discussion in terms of a very simple example indeed: a 2 x 3 matrix consisting entirely of zeroes save for a single one. Yet this example provides enough singular richness to illustrate the general ideas.‡

    The next section presents formal background so that a more precise discussion may begin.

2. THE BACKDROP

We begin with two finite-dimensional vector spaces, V1 and V2, of dimension n1 and n2 respectively. For some purposes we shall need inner products in the two spaces, and we shall suppose them given and denoted by (u, v), where u and v are

† The Editor has kindly told me of the paper by Masaaki Sibuya (1970). It is perhaps closest to the present paper in spirit and content among the many treatments I have seen.

‡ Another way of gaining a sense of concreteness is to draw balloon-shaped objects denoting the various linear manifolds we shall be considering (especially N, M, R and S in notation soon to be introduced) and to show on the resulting diagram arrows for A and A-. I understand that such aids to the mind are at best activities of child-like naivete, akin to rhythmic toe tapping by a string quartet player, and with analogous dangers of misinterpretation and social contempt. Nonetheless, used quietly and with care not to substitute a highly special case for a proof, diagrams like the one suggested can be most useful; it would be disingenuous to remain silent about their utility.


    either both in V1 or both in V2. (There is ambiguity in using the same inner product notation for both spaces, but it creates no difficulty in our context.) For convenience, take both vector spaces over the real line. Of course V1 and V2 may be the same, but there are advantages, both technical and intuitive, in working with the slightly more general case.

There is under consideration a linear transformation, A, from V1 to V2. The range of A is denoted by R = AV1, a subspace of V2 of rank p, say. The null space of A is denoted by N; it is the subspace of V1, of rank v1 = n1 - p, made up of all x ∈ V1 such that Ax = 0. We also define v2 = n2 - p, so that p + vi = ni (i = 1, 2).

If v1 = v2 = 0 (and hence n1 = n2 = p) the transformation A is non-singular, i.e. it establishes a one-to-one relationship between V1 and V2. Our interest, however, is in the singular case, where at least one vi is positive. In such a case how might we usefully discuss a concept of inverse, a so-called generalized inverse?
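As a purely illustrative aside in modern notation, the dimension bookkeeping above is easily checked numerically. The numpy sketch below uses a singular matrix of my own choosing (it is not the example of Section 10), and all identifiers are mine:

```python
import numpy as np

# Illustrative singular A from V1 = R^3 to V2 = R^2 (my choice, not the
# paper's example): the second row is twice the first, so the rank is 1.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

n2, n1 = A.shape                 # A has n2 rows and n1 columns
p = np.linalg.matrix_rank(A)     # dimension of R, the range of A
v1, v2 = n1 - p, n2 - p          # dim N, and the codimension of R in V2

print(p, v1, v2)                 # 1 2 1, and p + vi = ni for i = 1, 2
```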

3. IDENTITY ON THE RANGE OF A

The least we can expect of a generalized inverse is that there be a p-dimensional linear manifold (i.e. subspace) of V1 (call it M) such that the generalized inverse provides a one-to-one relationship between R and M.

For the one-to-one relationship we must require that M be disjoint from N, but that is also sufficient: for any p-dimensional subspace M of V1 disjoint from N, A establishes a one-to-one relationship between R and M. Proof is immediate: disjointness implies that at most one vector in M is taken by A into a given vector of R; common dimensionality provides that AM = R.

Denote by A- the backward transformation from R to such an M. The one-to-one nature of the relationship is expressed by

A-AxM = xM,   AA-yR = yR,   (0)

for all xM ∈ M and yR ∈ R. (In general, subscripts on a symbol like "x" or "y" indicate membership in a given linear manifold. I shall use "x" to denote vectors in V1, and "y" vectors in V2.) It is readily shown that A- is linear from R to M.

    For each such M, then, there is an A-, whose extension to a linear transformation on all of V2 to V1 will soon be discussed. As to the multiplicity of such M's, there is at least one and in general there are many. (Only when N is itself V1, or the manifold consisting of 0 alone, is there a single such M.) So in the interesting cases there are many A- transformations.

    For any such A- based on an M, it follows that

    AA-A = A, (1)

since M + N = V1 and Ax = AxM + AxN = AxM, where xM + xN is the unique decomposition of x into a sum of vectors in M and N. Note that (1) simply says that AA- is the identity on R.

Conversely, if a linear transformation A- from V2 to V1 satisfies (1), then A- defines a one-to-one relationship between R and a linear manifold in V1 of dimension p and disjoint from N. For (1) says that AA-yR = yR for all yR ∈ R. The set of all A-yR, as yR wanders over R, must be a linear manifold in V1; call it M = A-R. Next, dim(M) ≤ p, since dimension cannot be increased by a linear transformation; but if dim(M) were less than p, AM = R would also have dimension less than p, thus generating a contradiction. Finally, disjointness of M and N may also be seen by


contradiction, for suppose a non-zero x ∈ M ∩ N; then x = A-yR (for some non-zero yR ∈ R) and Ax = 0, whence AA-yR = yR = 0.

    Thus condition (1) and the description of A- in terms of M = A-R are equivalent. There are two so far undetermined elements, whose ambiguity reflects the generally many possible choices for A-. First, M may in general be chosen in many different ways. Second, having chosen M, so that A- is defined on R, we have yet (unless p = n2) to define A- on the rest of V2. Except when n1 = n2 = p, there is free choice in determining A-.

    It seems reasonable to require (1) or its equivalent as a sine qua non for the notion of a generalized inverse.
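For concreteness, here is a minimal numpy sketch of the sine qua non property (1); np.linalg.pinv serves merely as a convenient source of one generalized inverse, and the random rank-2 matrix is an assumption of the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# A singular A: a 4x6 matrix of rank 2, built as a product of thin factors.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))

# pinv is used only as one convenient A-; any A- satisfying (1) would serve.
A_minus = np.linalg.pinv(A)

assert np.allclose(A @ A_minus @ A, A)       # the sine qua non, condition (1)

y_R = A @ rng.standard_normal(6)             # an arbitrary vector of R = AV1
assert np.allclose(A @ A_minus @ y_R, y_R)   # AA- is the identity on R
```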

4. DESCRIPTION OF AN A-

Together, the disjoint manifolds M and N span V1 and provide a useful way of describing A. Is there some similar useful way of describing an A- over all of V2, not just R? We might, of course, choose any linear manifold in V2, of dimension n2 - p and disjoint from R, as an aid in describing A-. Indeed we shall later do that. But there is one such manifold (depending on A-) of particular interest, the preimage of N under A-.

To motivate that interest note first that for any x ∈ V1, Ax ∈ R, A-Ax ∈ M and x - A-Ax ∈ N (because of (1)). Hence

    A-Ax+(I-A-A)x = x

provides explicitly the unique decomposition of x into its M and N components.

Now begin with any y ∈ V2 and go the other way, starting with AA-y ∈ R. Consider y - AA-y, and note that transforming it by A- gives a vector in N; that is, by (1),

AA-(y - AA-y) = AA-y - (AA-A)A-y = AA-y - AA-y = 0.

That draws attention to the set of all vectors in V2 taken by A- into N, the preimage of N; call it S,

S = {y ∈ V2 : A-y ∈ N} = {y ∈ V2 : AA-y = 0}.

Clearly S is a linear manifold and disjoint from R (by disjointness of M and N). Thus

    AA-y+(I-AA-)y = y

provides explicitly the decomposition of y into its R and S components. Since R and S span V2, the dimension of S is v2 = n2 - p. Of course S includes the null space of A-.

The description of a given A- by its effects on the components of y in R and S is useful. If we are interested in the totality of A-'s, however, there is no S to begin with. So we may proceed by first choosing an M, then choosing a subspace S of V2 (of dimension v2 and disjoint from R), and finally specifying an arbitrary linear transformation from S to N.

    The next three sections in effect treat these three steps in specifying an A-, but (for convenience of exposition) in a different order: first the S to N transformation, then M and then S.

(In some circumstances we may have no special interest in S; then A- may be fully described by stating M and by defining A- on any subspace W of V2 having dimension v2 and disjoint from R.)
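A small numerical sketch, again assuming a random rank-deficient matrix and using np.linalg.pinv as one convenient A-, exhibits the two decompositions just described:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # rank 2
A_minus = np.linalg.pinv(A)      # one particular A- satisfying (1)

x = rng.standard_normal(6)
x_M = A_minus @ A @ x            # M component of x
x_N = x - x_M                    # N component of x
assert np.allclose(A @ x_N, 0)             # x_N lies in the null space N
assert np.allclose(A @ x, A @ x_M)         # A sees only the M component

y = rng.standard_normal(4)
y_R = A @ A_minus @ y            # R component of y
y_S = y - y_R                    # S component of y
assert np.allclose(A @ A_minus @ y_S, 0)   # A- takes y_S into N, so y_S is in S
```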


5. EQUAL RANK

The simplest way of determining the S to N transformation is just A-yS = 0 for all yS ∈ S, i.e. A-S = 0. This is equivalent to requiring that S is the null space of A-, and that might be motivated by requiring, in an egalitarian spirit, that A- be of the same rank as A, that is p. Another way of stating the condition is to require that A-R be the range of A-.

    The A-S = 0 condition is also equivalent (assuming (1)) to the more usually stated

    A-AA- = A-, (2)

i.e. A-A is the identity on the range of A-. In one direction, suppose A-yS = 0 and write y = yR + yS, so that A-y = A-yR ∈ M. Then from (0), equivalent to (1), A-AA-y = A-y, which is (2).

Conversely, suppose (1) and (2) both hold for an A-; we are to show A-S = 0. With y = yR + yS again, by (2),

A-yR + A-yS = A-AA-(yR + yS)
            = A-AA-yR
            = A-yR.

The second line comes from the first because A-yS ∈ N; the third line comes from the second by (0). Hence A-yS = 0.

    In short, given (1), requiring (2) is equivalent to requiring the same rank for A and A-. The linear manifolds M and S are still at our disposal, subject to rank and disjointness.
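The equivalence may be checked numerically. In the hedged sketch below, np.linalg.pinv supplies an A- satisfying (2), while adding an arbitrary N-valued piece produces an A- that keeps (1) but inflates the rank and so violates (2); the construction is my illustration, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # rank p = 2
A0 = np.linalg.pinv(A)           # satisfies both (1) and (2)

assert np.allclose(A0 @ A @ A0, A0)                             # condition (2)
assert np.linalg.matrix_rank(A0) == np.linalg.matrix_rank(A)    # equal rank

# An A- satisfying (1) but not (2): add a map whose range lies in N.
# (I - A0 A) projects onto N and T is arbitrary, so generically the rank grows.
T = rng.standard_normal((6, 4))
A_fat = A0 + (np.eye(6) - A0 @ A) @ T
assert np.allclose(A @ A_fat @ A, A)                            # (1) survives
assert np.linalg.matrix_rank(A_fat) > np.linalg.matrix_rank(A)  # rank 4 here
assert not np.allclose(A_fat @ A @ A_fat, A_fat)                # so (2) fails
```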

6. ORTHOGONALITY OF M, N

We now make use of the inner product in V1, in particular via orthogonality and symmetry. It may seem natural to specify as M the orthogonal complement of N, and that is what this section contemplates. It is equivalent to specify that A-A be symmetric.

Proof. A-A, a linear transformation from V1 into itself, is idempotent by (1) and hence a projection (Halmos, 1958, Section 41). The projection is onto M along N. Now a projection is an orthogonal projection if and only if it is symmetric (Halmos, 1958, Section 75). That is, M and N are orthogonal if and only if A-A is symmetric.

7. ORTHOGONALITY OF R, S

In exactly the same way, but with respect to the given inner product in V2, requiring orthogonality of R and S is equivalent to requiring symmetry of AA-. That is because AA- is the projection onto R along S.
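Both symmetry criteria are easy to observe numerically. The sketch below checks them for the Moore-Penrose inverse and then, as my own illustration, tilts M away from the orthogonal complement of N to break the symmetry of A-A while preserving (1):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))
A_plus = np.linalg.pinv(A)       # for pinv, M = N-perp and S = R-perp

assert np.allclose((A_plus @ A).T, A_plus @ A)   # A-A symmetric: M perp N
assert np.allclose((A @ A_plus).T, A @ A_plus)   # AA- symmetric: R perp S

# Adding to A- a piece that takes R into N preserves (1) but tilts M,
# destroying the symmetry of A-A:
T = rng.standard_normal((6, 4))
A_g = A_plus + (np.eye(6) - A_plus @ A) @ T @ (A @ A_plus)
assert np.allclose(A @ A_g @ A, A)
print(np.allclose((A_g @ A).T, A_g @ A))         # False (generically)
```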

8. SUMMARY SO FAR

If we require (1) in any case, as seems natural for any concept of generalized inverse, then there are eight combinations of the three further characteristics described above. Geometrical interpretations have been provided. If all three characteristics are required (equal rank and both orthogonalities), then A- is uniquely determined. The resulting unique A- is often called the Moore-Penrose inverse.
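As a numerical companion to this summary, one may construct the unique A- independently through the singular value decomposition (a standard route, sketched here under my own naming) and confirm that it satisfies all four characteristics and agrees with np.linalg.pinv:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # rank 2

# Invert A on its range and send the orthogonal complement of R to zero.
U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.array([1.0 / si if si > tol else 0.0 for si in s])
MP = Vt.T[:, :len(s)] @ np.diag(s_inv) @ U.T

assert np.allclose(A @ MP @ A, A)                # (1): sine qua non
assert np.allclose(MP @ A @ MP, MP)              # (2): equal rank
assert np.allclose((MP @ A).T, MP @ A)           # M perp N
assert np.allclose((A @ MP).T, A @ MP)           # R perp S
assert np.allclose(MP, np.linalg.pinv(A))        # the unique Moore-Penrose inverse
```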


9. SYMMETRY OF A

If V1 = V2, A is itself symmetric, and the two inner products are the same, then R and N are orthogonal to begin with. It is then doubly natural to take R as M and to take S as N. The thus fully specified A- is the bona fide inverse of A considered as a non-singular transformation from R onto R; the same A- takes N into zero. (It might be convenient sometimes to relax the equal rank requirement and to permit A- to take N into itself in other ways.) The symmetric case is perhaps of greatest statistical importance because of its connection with traditional least squares.

10. CONCRETE EXAMPLE

To make matters as explicit as possible, a scandalously simple concrete example in conventional matrix terms will now be sketched. We take V1, V2 respectively as ordinary 3- and 2-dimensional coordinate space (with vertical coordinate vectors), and we adopt the conventional inner products. The specific A we work with is expressed by the 2 x 3 matrix

A = [1  0  0]
    [0  0  0],

so that A is orthogonal projection in 3-dimensional coordinate space onto the 1 axis, followed by removal of the third coordinate entirely.

Thus n1 = 3, n2 = 2, p = 1, v1 = 2, v2 = 1. Further, R is the 1 axis in 2-dimensional coordinate space, and N is the 2, 3 coordinate plane in 3-space.

An A- takes (1, 0)', spanning R in V2, into some one-dimensional manifold in 3-space disjoint from N, i.e. into some manifold spanned by

(1, α2, α3)'

for arbitrary α2, α3. The "1" in the first coordinate guarantees disjointness from N without being otherwise limiting. The vector (0, 1)' may be taken by an A- into any vector of 3-space, so the general A-, i.e. the matrix form of the generalized inverse satisfying (1), is

A- = [1   β1]
     [α2  β2]
     [α3  β3]

for any numbers α2, α3, β1, β2, β3.

(Note in passing that such A-'s form a flat, or coset, in the space of linear transformations from V2 to V1. The flat clearly has dimension 5 = n1 n2 - p². We shall return to this dimensionality matter.)
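This five-parameter family is easily verified by machine; the helper function below is hypothetical, but the matrix it returns is exactly the general form displayed above:

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

def general_ginv(a2, a3, b1, b2, b3):
    """The five-parameter flat of generalized inverses derived above
    (function name and parameter order are mine)."""
    return np.array([[1.0, b1],
                     [a2,  b2],
                     [a3,  b3]])

# Condition (1) holds for every choice of the five parameters:
A_minus = general_ginv(0.7, -1.2, 3.0, 0.5, 2.0)
assert np.allclose(A @ A_minus @ A, A)
```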

Next, to impose the requirement that the rank of A- is the same as that of A (that is, 1) is to say that the two columns of A- are proportional. Hence the general form of A- with equal rank is

A- = [1   β1   ]
     [α2  α2 β1]
     [α3  α3 β1].

It is also readily seen that S is the 1-dimensional manifold spanned by (-β1, 1)', and that hence requiring A-S = 0 is equivalent to β2 = α2 β1, β3 = α3 β1. That is a slightly different route to the same result.

Consider next imposing (only) the condition M ⊥ N. This says that M is spanned by (1, 0, 0)', so that the general A- satisfying M ⊥ N is

A- = [1  β1]
     [0  β2]
     [0  β3]

for any β1, β2, β3. (Note that these form a flat in the space of linear transformations of dimension 3 = v2 n1.)

What about the S ⊥ R condition (only)? A description of S was given above, so to say S ⊥ R is to say β1 = 0. Thus the general A- satisfying S ⊥ R is given by

A- = [1   0 ]
     [α2  β2]
     [α3  β3]

(forming a flat of dimension 4 = v1 n2). Equal rank cum M ⊥ N give the form

A- = [1  β1]
     [0  0 ]
     [0  0 ]

(to define a flat of transformations of dimension 1 = v2 p). Equal rank cum R ⊥ S give

A- = [1   0]
     [α2  0]
     [α3  0]

(to define a flat of transformations of dimension 2 = v1 p).


Requiring both orthogonalities, M ⊥ N and R ⊥ S, gives

A- = [1  0 ]
     [0  β2]
     [0  β3]

(and a flat of transformations of dimension 2 = v1 v2). Finally, requiring equal rank and both orthogonalities gives, of course, the unique transformation

A- = [1  0]
     [0  0]
     [0  0]

(defining a flat of transformations of dimension zero).
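As a check, numpy's pinv applied to our concrete A does return precisely this unique transformation:

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

# Equal rank plus both orthogonalities leave no free parameters; the result
# should be the Moore-Penrose inverse, and numpy agrees:
MP = np.array([[1.0, 0.0],
               [0.0, 0.0],
               [0.0, 0.0]])
assert np.allclose(np.linalg.pinv(A), MP)
```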

11. DESCRIPTION OF THE A-'s AGAIN

If we deal with two or more A-'s there is some inconvenience in working with S, since S changes in general as A- does. As noted earlier, it may be simpler to fix any v2-dimensional subspace of V2 disjoint from R, call that subspace W, and work with W.

So an A- may be described by stating M, which establishes A- on R, and then defining A- on W in any linear way as a transformation to V1. If A0- is any specific such A-, then it is easily seen that A- - A0- takes R into N and is arbitrary on W. The linear transformations taking R to N form a linear manifold (of transformations) of dimension p v1, and those taking W to V1 a manifold of dimension v2 n1. Hence the A-'s form a flat of dimension

v1 p + v2(p + v1) = v1 v2 + p(v1 + v2) = n1 n2 - p².

Note that this checks with our concrete example.

Imposing equal rank means that the transformation from W may no longer be arbitrary but must go into M only. It is curious that this does not lead to a flat, as may be seen in the simple concrete example: the terms α2 β1 and α3 β1 show how the non-linearity is exhibited. We may also sense the non-linearity in the condition A-AA- = A-, where A- appears twice on the left, thus in a sense quadratically.

If, however, we restrict ourselves for the moment to a specific M, i.e. to A-'s satisfying the equal rank condition and taking R into a particular M, held fixed pro tem, then the resulting transformations do clearly form a flat in transformation space of dimension v2 p (the number of ways of going linearly from W to M). If we then add on the natural component stemming from arbitrariness of M, or p v1 (the number of ways of going linearly from R to N), we obtain a kind of quasi-dimension "p(v1 + v2)". Note that in the concrete example, p(v1 + v2) = 2 + 1 = 3, corresponding to the number of parameters: α2, α3 and β1.

Now consider the M ⊥ N condition only. That specifies A- acting on R. The resulting flat in transformation space therefore has dimension v2 n1 = v1 v2 + p v2.

Next work with the S ⊥ R condition only. This may seem harder, but it is lightened by taking W = S, which one may do since S, the orthogonal complement of R, is


fixed for transformations in the set under consideration. For the dimension of the flat of transformations, we have p v1 for A- defined on R as before. Since S is the preimage of N, however, we obtain v2 v1 for A- defined on S. Altogether, we obtain v1 v2 + p v1 = v1 n2, which checks in the concrete example.

Equal rank and M ⊥ N means that A- is unique on R and takes W into M only. The resulting dimension is p v2.

Equal rank and S ⊥ R means that A- takes S (serving as W) into 0. The resulting dimension is p v1 since it is only A- acting on R that is free to vary.

Imposing both orthogonalities only means that A- is unique on R and that S goes into N. The resulting dimension is v1 v2.

    We may put these dimensionality facts together in an elegant partial ordering picture (Fig. 1).

[Figure 1 appeared here: eight boxes, one per combination of conditions, joined by inclusion lines, with the "no restrictions" box, of dimension v1 v2 + p(v1 + v2) = n1 n2 - p², at the top.]

FIG. 1. Inclusion and dimensionality partial ordering relations for the eight flats. The eight flats are in the vector space of linear transformations from V2 to V1. An upper box corresponds to higher dimension and inclusion relative to a lower box. The equal-rank-only box is an exception: it does not correspond to a flat, and its "dimension" is given in quotation marks for dubiety. All transformations satisfy the sine qua non property (0) or (1).

12. EXPRESSING ONE GENERALIZED INVERSE IN TERMS OF ANOTHER

If A0- is a generalized inverse of A, how might we usefully express other generalized inverses in terms of A0-? Theorem 2.4.1 of Rao and Mitra (1971) presents two such formal expressions; in our geometrical language we approach these results a bit differently.

Let M and S be A0-R and the preimage of N under A0-. (It would be notationally better, but confusing to the eye, to write M0, S0.) We shall let S serve as W in describing A-'s, and we remind ourselves that the R, S components of y ∈ V2 are

yR = AA0-y,   yS = (I - AA0-)y,


and that the M, N components of x ∈ V1 are

xM = A0-Ax,   xN = (I - A0-A)x.

It is also useful to contemplate all (A- - A0-)'s, that is, to look at the set of linear transformations from V2 to V1 that take R into N, and S into V1 without restriction.

If T is an arbitrary linear transformation from V2 to V1, we may construct from it an A- - A0- as follows: accept whatever T does to S; for yR ∈ R, subtract from TyR its component in M, i.e. A0-ATyR. Thus any T leads to the following A- - A0-:

(A- - A0-)y = TyS + TyR - A0-ATyR = Ty - A0-ATAA0-y.

Conversely, any such transformation is clearly an A- - A0-, for if y = yR ∈ R, it provides (I - A0-A)TyR ∈ N. Further, any A- - A0- may be expressed this way. Thus we have restated, motivated and proved (2.4.2) of Rao-Mitra.
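The (2.4.2) form lends itself to a direct numerical check; in the sketch below A0- is taken to be np.linalg.pinv(A) purely for convenience, and T is an arbitrary random map:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # rank 2
A0 = np.linalg.pinv(A)            # the fixed reference inverse A0-
T = rng.standard_normal((6, 4))   # an arbitrary map from V2 to V1

# The (2.4.2) form: A- = A0- + T - A0-ATAA0-.
A_new = A0 + T - A0 @ A @ T @ A @ A0
assert np.allclose(A @ A_new @ A, A)   # every such A_new satisfies (1)
```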

The second Rao-Mitra form follows at least as readily. Let T(1) and T(2) be two (unrelated) linear transformations from V2 to V1. Then construct the following transformations on y:

T(1)yS,

N component of T(2)y = N component of T(2)yS + N component of T(2)yR,

and add them to obtain

(T(1) + N component of T(2))yS + N component of T(2)yR.

Clearly any A- - A0- may be so expressed. And, conversely, any transformation so expressed is an A- - A0- since it takes R into N.

Finally, go back to the first displayed expression above and write out the sum in terms of the explicit expressions for components,

T(1)(I - AA0-)y + (I - A0-A)T(2)y.

This is (2.4.3) of Rao-Mitra.
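The (2.4.3) form may be verified the same way, again with pinv standing in for A0- and with two arbitrary maps T1 and T2:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))
A0 = np.linalg.pinv(A)
T1, T2 = rng.standard_normal((6, 4)), rng.standard_normal((6, 4))

# The (2.4.3) form: an arbitrary piece acting on S plus an arbitrary
# N-valued piece; both vanish when sandwiched between A and A.
A_new = A0 + T1 @ (np.eye(4) - A @ A0) + (np.eye(6) - A0 @ A) @ T2
assert np.allclose(A @ A_new @ A, A)
```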

13. ADJOINTS

We may define A' by (y, Ax) = (A'y, x) for all x ∈ V1, y ∈ V2. The resulting A' is readily shown to be a linear transformation from V2 to V1. Its null space is the orthogonal complement of R, and its range is the orthogonal complement of N. Hence, in particular, the rank of A' is p. For purposes of this paper, the adjoint is not needed, but I mention it briefly because some treatments do use it, for example, in explication of the M ⊥ N and R ⊥ S orthogonality characteristics.

14. RELATIONS TO SOLUTION OF LINEAR RELATIONS AND TO LEAST SQUARES

I end this paper with a brief exposition of standard material in our framework. First consider which x's satisfy Ax = y, where y ∈ R so that the set of satisfying x's is non-empty. It is readily seen that the satisfying set forms a flat (or coset) whose


generating linear manifold is just N. In other words, if x* is one solution, that is, if Ax* = y, then the set of all solutions is the set of all x* + xN as xN runs over N.

That solution with minimum length in terms of the V1 inner product is of course the one in the orthogonal complement of N. Hence the minimum length x satisfying Ax = y ∈ R is A-y for any A- satisfying the M ⊥ N condition. Conditions on S are irrelevant since we are here only concerned with y in R.
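A short sketch, assuming a random onto A so that every y lies in R, illustrates the minimum-length property; pinv is used because it satisfies the M ⊥ N condition:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((2, 4))   # onto (generically), with dim N = 2
y = rng.standard_normal(2)        # so y automatically lies in R

x_min = np.linalg.pinv(A) @ y     # the solution in the orthogonal complement of N
assert np.allclose(A @ x_min, y)

# Any other solution differs by a vector of N and is at least as long:
z = rng.standard_normal(4)
x_N = (np.eye(4) - np.linalg.pinv(A) @ A) @ z   # an arbitrary element of N
assert np.allclose(A @ (x_min + x_N), y)
assert np.linalg.norm(x_min + x_N) >= np.linalg.norm(x_min)
```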

    Whether it is in fact desirable to minimize length in terms of the V1 inner product is another matter entirely, and one that clearly depends on context.

Now consider what happens if y is not in R, so that Ax = y for no x. Following a traditional path, we may decide that the best we can do is solve Ax = yc, where yc is in some sense the closest vector in R to y. It is traditional to measure closeness by the length of y - yc in terms of the V2 inner product, so that yc = PR y, the orthogonal projection of y onto R. Then x = A-yc for any A- provides an approximate "least squares" solution in the above sense, i.e. minimizes ||Ax - y|| with respect to x, where || · || is length in terms of the V2 inner product.

But A-PR, for any A-, is itself a generalized inverse satisfying the equal rank and the S ⊥ R conditions. Conversely, any generalized inverse satisfying those conditions may be written as A-PR.

    Whether it is desirable to minimize length in terms of the V2 inner product is another question, statistical rather than mathematical, and not treated here.

Finally, putting the two discussions just above together, the generalized inverse providing the shortest x that minimizes the length of Ax - y is the Moore-Penrose inverse, the unique generalized inverse satisfying the egalitarian and the two orthogonality conditions.
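Numerically, the Moore-Penrose solution coincides with the minimum-norm least squares answer returned by a standard routine; the rank-deficient matrix below is my illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3)) @ np.diag([1.0, 1.0, 0.0])  # rank 2, N nontrivial
y = rng.standard_normal(5)                                  # generically not in R

x = np.linalg.pinv(A) @ y        # shortest minimizer of ||Ax - y||

# np.linalg.lstsq also returns the minimum-norm least squares solution for a
# rank-deficient A (it works through the SVD), so the two agree:
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_ls)
assert np.allclose(A.T @ (A @ x - y), 0)   # normal equations: residual perp R
```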

ACKNOWLEDGEMENTS

This research was carried out in the Department of Statistics, University of Chicago, under partial sponsorship of the Statistics Branch, Office of Naval Research, Navy N00014-67-A-0825-0009, and by Research Grant No. NSF GP 32037 from the Division of Mathematical, Physical and Engineering Sciences of the National Science Foundation.

Highly helpful comments and suggestions have come to me from the following readers of a draft of this paper: John Bibby, John Chipman, Oscar Kempthorne, Joseph B. Kruskal, Frederick Mosteller, Patrick L. Odell, C. R. Rao, Arthur A. Rayner and Geoffrey Watson. I express warm appreciation together with the conventional disclaimer.

REFERENCES

ALBERT, A. (1972). Regression and the Moore-Penrose Pseudo-inverse. New York: Academic Press.
BEN-ISRAEL, A. and GREVILLE, T. N. E. (1974). Generalized Inverses: Theory and Applications. New York: Wiley-Interscience.
BEUTLER, F. J. and ROOT, W. L. (1973). The operator pseudoinverse in control and systems identification. Computer, Information and Control Engineering Program, University of Michigan.
BIBBY, J. (1972). Review of Boullion and Odell (1971). J. R. Statist. Soc. A, 135, 608-609.
BOULLION, T. L. and ODELL, P. L. (1971). Generalized Inverse Matrices. New York: Wiley.
HALMOS, P. R. (1958). Finite-dimensional Vector Spaces. Princeton, N.J.: Van Nostrand.†

† After completion of this paper, I learned that a new edition of Halmos's text has just been published or is about to be published.


KRUSKAL, W. (1961). The coordinate-free approach to Gauss-Markov estimation, and its application to missing and extra observations. Proc. 4th Berk. Symp. Math. Stat. and Prob., 1, 435-451. Berkeley and Los Angeles: University of California Press.
KRUSKAL, W. (1968). When are Gauss-Markov and least squares estimators identical? A coordinate-free approach. Ann. Math. Statist., 39, 70-75.

NASHED, M. Z. and VOTRUBA, G. F. (1974). A unified approach to generalized inverses of linear operators: I. Algebraic, topological and projectional properties. Bull. Amer. Math. Soc., 80, 825-830.
NASHED, M. Z. and VOTRUBA, G. F. (1974). A unified approach to generalized inverses of linear operators: II. Extremal and proximal properties. Bull. Amer. Math. Soc., 80, 831-835.

    PRINGLE, R. M. and RAYNER, A. A. (1971). Generalized Inverse Matrices with Applications to Statistics. London: Griffin.

    RAO, C. R. and MITRA, S. K. (1971). Generalized Inverse of Matrices and its Applications. New York: Wiley.

    SIBUYA, M. (1970). Subclasses of generalized inverses of matrices. Ann. Inst. Stat. Math., 22, 543-556.
