
  • Matrix Analysis for Scientists & Engineers


  • Matrix Analysis for Scientists & Engineers

    Alan J. Laub
    University of California
    Davis, California

    SIAM

  • Copyright 2005 by the Society for Industrial and Applied Mathematics.

    1 0 9 8 7 6 5 4 3 2 1

    All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

    MATLAB is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7101, [email protected], www.mathworks.com

    Mathematica is a registered trademark of Wolfram Research, Inc.

    Mathcad is a registered trademark of Mathsoft Engineering & Education, Inc.

    Library of Congress Cataloging-in-Publication Data

    Laub, Alan J., 1948-
    Matrix analysis for scientists and engineers / Alan J. Laub.
    p. cm.
    Includes bibliographical references and index.
    ISBN 0-89871-576-8 (pbk.)

    1. Matrices. 2. Mathematical analysis. I. Title.

    QA188.L38 2005
    512.9'434 dc22

    2004059962

    About the cover: The original artwork featured on the cover was created by freelance artist Aaron Tallon of Philadelphia, PA. Used by permission.

    SIAM is a registered trademark.

  • To my wife, Beverley

    (who captivated me in the UBC math library nearly forty years ago)


  • Contents

    Preface xi

    1 Introduction and Review 1
      1.1 Some Notation and Terminology 1
      1.2 Matrix Arithmetic 3
      1.3 Inner Products and Orthogonality 4
      1.4 Determinants 4

    2 Vector Spaces 7
      2.1 Definitions and Examples 7
      2.2 Subspaces 9
      2.3 Linear Independence 10
      2.4 Sums and Intersections of Subspaces 13

    3 Linear Transformations 17
      3.1 Definition and Examples 17
      3.2 Matrix Representation of Linear Transformations 18
      3.3 Composition of Transformations 19
      3.4 Structure of Linear Transformations 20
      3.5 Four Fundamental Subspaces 22

    4 Introduction to the Moore-Penrose Pseudoinverse 29
      4.1 Definitions and Characterizations 29
      4.2 Examples 30
      4.3 Properties and Applications 31

    5 Introduction to the Singular Value Decomposition 35
      5.1 The Fundamental Theorem 35
      5.2 Some Basic Properties 38
      5.3 Row and Column Compressions 40

    6 Linear Equations 43
      6.1 Vector Linear Equations 43
      6.2 Matrix Linear Equations 44
      6.3 A More General Matrix Linear Equation 47
      6.4 Some Useful and Interesting Inverses 47

    7 Projections, Inner Product Spaces, and Norms 51
      7.1 Projections 51
        7.1.1 The four fundamental orthogonal projections 52
      7.2 Inner Product Spaces 54
      7.3 Vector Norms 57
      7.4 Matrix Norms 59

    8 Linear Least Squares Problems 65
      8.1 The Linear Least Squares Problem 65
      8.2 Geometric Solution 67
      8.3 Linear Regression and Other Linear Least Squares Problems 67
        8.3.1 Example: Linear regression 67
        8.3.2 Other least squares problems 69
      8.4 Least Squares and Singular Value Decomposition 70
      8.5 Least Squares and QR Factorization 71

    9 Eigenvalues and Eigenvectors 75
      9.1 Fundamental Definitions and Properties 75
      9.2 Jordan Canonical Form 82
      9.3 Determination of the JCF 85
        9.3.1 Theoretical computation 86
        9.3.2 On the +1's in JCF blocks 88
      9.4 Geometric Aspects of the JCF 89
      9.5 The Matrix Sign Function 91

    10 Canonical Forms 95
      10.1 Some Basic Canonical Forms 95
      10.2 Definite Matrices 99
      10.3 Equivalence Transformations and Congruence 102
        10.3.1 Block matrices and definiteness 104
      10.4 Rational Canonical Form 104

    11 Linear Differential and Difference Equations 109
      11.1 Differential Equations 109
        11.1.1 Properties of the matrix exponential 109
        11.1.2 Homogeneous linear differential equations 112
        11.1.3 Inhomogeneous linear differential equations 112
        11.1.4 Linear matrix differential equations 113
        11.1.5 Modal decompositions 114
        11.1.6 Computation of the matrix exponential 114
      11.2 Difference Equations 118
        11.2.1 Homogeneous linear difference equations 118
        11.2.2 Inhomogeneous linear difference equations 118
        11.2.3 Computation of matrix powers 119
      11.3 Higher-Order Equations 120

    12 Generalized Eigenvalue Problems 125
      12.1 The Generalized Eigenvalue/Eigenvector Problem 125
      12.2 Canonical Forms 127
      12.3 Application to the Computation of System Zeros 130
      12.4 Symmetric Generalized Eigenvalue Problems 131
      12.5 Simultaneous Diagonalization 133
        12.5.1 Simultaneous diagonalization via SVD 133
      12.6 Higher-Order Eigenvalue Problems 135
        12.6.1 Conversion to first-order form 135

    13 Kronecker Products 139
      13.1 Definition and Examples 139
      13.2 Properties of the Kronecker Product 140
      13.3 Application to Sylvester and Lyapunov Equations 144

    Bibliography 151

    Index 153


  • Preface

    This book is intended to be used as a text for beginning graduate-level (or even senior-level) students in engineering, the sciences, mathematics, computer science, or computational science who wish to be familiar with enough matrix analysis that they are prepared to use its tools and ideas comfortably in a variety of applications. By matrix analysis I mean linear algebra and matrix theory together with their intrinsic interaction with and application to linear dynamical systems (systems of linear differential or difference equations). The text can be used in a one-quarter or one-semester course to provide a compact overview of much of the important and useful mathematics that, in many cases, students meant to learn thoroughly as undergraduates, but somehow didn't quite manage to do. Certain topics that may have been treated cursorily in undergraduate courses are treated in more depth and more advanced material is introduced. I have tried throughout to emphasize only the more important and "useful" tools, methods, and mathematical structures. Instructors are encouraged to supplement the book with specific application examples from their own particular subject area.

    The choice of topics covered in linear algebra and matrix theory is motivated both by applications and by computational utility and relevance. The concept of matrix factorization is emphasized throughout to provide a foundation for a later course in numerical linear algebra. Matrices are stressed more than abstract vector spaces, although Chapters 2 and 3 do cover some geometric (i.e., basis-free or subspace) aspects of many of the fundamental notions. The books by Meyer [18], Noble and Daniel [20], Ortega [21], and Strang [24] are excellent companion texts for this book. Upon completion of a course based on this text, the student is then well-equipped to pursue, either via formal courses or through self-study, follow-on topics on the computational side (at the level of [7], [11], [23], or [25], for example) or on the theoretical side (at the level of [12], [13], or [16], for example).

    Prerequisites for using this text are quite modest: essentially just an understanding of calculus and definitely some previous exposure to matrices and linear algebra. Basic concepts such as determinants, singularity of matrices, eigenvalues and eigenvectors, and positive definite matrices should have been covered at least once, even though their recollection may occasionally be "hazy." However, requiring such material as prerequisite permits the early (but "out-of-order" by conventional standards) introduction of topics such as pseudoinverses and the singular value decomposition (SVD). These powerful and versatile tools can then be exploited to provide a unifying foundation upon which to base subsequent topics. Because tools such as the SVD are not generally amenable to "hand computation," this approach necessarily presupposes the availability of appropriate mathematical software on a digital computer. For this, I highly recommend MATLAB although other software such as



    Mathematica or Mathcad is also excellent. Since this text is not intended for a course in numerical linear algebra per se, the details of most of the numerical aspects of linear algebra are deferred to such a course.

    The presentation of the material in this book is strongly influenced by computational issues for two principal reasons. First, "real-life" problems seldom yield to simple closed-form formulas or solutions. They must generally be solved computationally and it is important to know which types of algorithms can be relied upon and which cannot. Some of the key algorithms of numerical linear algebra, in particular, form the foundation upon which rests virtually all of modern scientific and engineering computation. A second motivation for a computational emphasis is that it provides many of the essential tools for what I call "qualitative mathematics." For example, in an elementary linear algebra course, a set of vectors is either linearly independent or it is not. This is an absolutely fundamental concept. But in most engineering or scientific contexts we want to know more than that. If a set of vectors is linearly independent, how "nearly dependent" are the vectors? If they are linearly dependent, are there "best" linearly independent subsets? These turn out to be much more difficult problems and frequently involve research-level questions when set in the context of the finite-precision, finite-range floating-point arithmetic environment of most modern computing platforms.

    Some of the applications of matrix analysis mentioned briefly in this book derive from the modern state-space approach to dynamical systems. State-space methods are now standard in much of modern engineering where, for example, control systems with large numbers of interacting inputs, outputs, and states often give rise to models of very high order that must be analyzed, simulated, and evaluated. The "language" in which such models are conveniently described involves vectors and matrices. It is thus crucial to acquire a working knowledge of the vocabulary and grammar of this language. The tools of matrix analysis are also applied on a daily basis to problems in biology, chemistry, econometrics, physics, statistics, and a wide variety of other fields, and thus the text can serve a rather diverse audience. Mastery of the material in this text should enable the student to read and understand the modern language of matrices used throughout mathematics, science, and engineering.

    While prerequisites for this text are modest, and while most material is developed from basic ideas in the book, the student does require a certain amount of what is conventionally referred to as "mathematical maturity." Proofs are given for many theorems. When they are not given explicitly, they are either obvious or easily found in the literature. This is ideal material from which to learn a bit about mathematical proofs and the mathematical maturity and insight gained thereby. It is my firm conviction that such maturity is neither encouraged nor nurtured by relegating the mathematical aspects of applications (for example, linear algebra for elementary state-space theory) to an appendix or introducing it "on-the-fly" when necessary. Rather, one must lay a firm foundation upon which subsequent applications and perspectives can be built in a logical, consistent, and coherent fashion.

    I have taught this material for many years, many times at UCSB and twice at UC Davis, and the course has proven to be remarkably successful at enabling students from disparate backgrounds to acquire a quite acceptable level of mathematical maturity and rigor for subsequent graduate studies in a variety of disciplines. Indeed, many students who completed the course, especially the first few times it was offered, remarked afterward that if only they had had this course before they took linear systems, or signal processing,


    or estimation theory, etc., they would have been able to concentrate on the new ideas they wanted to learn, rather than having to spend time making up for deficiencies in their background in matrices and linear algebra. My fellow instructors, too, realized that by requiring this course as a prerequisite, they no longer had to provide as much time for "review" and could focus instead on the subject at hand. The concept seems to work.

    AJL, June 2004


  • Chapter 1

    Introduction and Review

    1.1 Some Notation and Terminology

    We begin with a brief introduction to some standard notation and terminology to be used throughout the text. This is followed by a review of some basic notions in matrix analysis and linear algebra.

    The following sets appear frequently throughout subsequent chapters:

    1. R^n = the set of n-tuples of real numbers represented as column vectors. Thus, x ∈ R^n means

        x = [x_1, ..., x_n]^T,

    where x_i ∈ R for i ∈ n. Henceforth, the notation n denotes the set {1, ..., n}.

    Note: Vectors are always column vectors. A row vector is denoted by y^T, where y ∈ R^n and the superscript T is the transpose operation. That a vector is always a column vector rather than a row vector is entirely arbitrary, but this convention makes it easy to recognize immediately throughout the text that, e.g., x^T y is a scalar while x y^T is an n x n matrix.

    2. C^n = the set of n-tuples of complex numbers represented as column vectors.

    3. R^{m x n} = the set of real (or real-valued) m x n matrices.

    4. R_r^{m x n} = the set of real m x n matrices of rank r. Thus, R_n^{n x n} denotes the set of real nonsingular n x n matrices.

    5. C^{m x n} = the set of complex (or complex-valued) m x n matrices.

    6. C_r^{m x n} = the set of complex m x n matrices of rank r.


    Each of the above also has a "block" analogue obtained by replacing scalar components in the respective definitions by block submatrices. For example, if A ∈ R^{n x n}, B ∈ R^{n x m}, and C ∈ R^{m x m}, then the (n + m) x (n + m) matrix [A B; 0 C] is block upper triangular.

    The transpose of a matrix A is denoted by A^T and is the matrix whose (i, j)th entry is the (j, i)th entry of A, that is, (A^T)_ij = a_ji. Note that if A ∈ R^{m x n}, then A^T ∈ R^{n x m}. If A ∈ C^{m x n}, then its Hermitian transpose (or conjugate transpose) is denoted by A^H (or sometimes A^*) and its (i, j)th entry is (A^H)_ij = conj(a_ji), where conj denotes complex conjugation; i.e., if z = α + jβ (j = i = sqrt(-1)), then conj(z) = α - jβ. A matrix A is symmetric if A = A^T and Hermitian if A = A^H. We henceforth adopt the convention that, unless otherwise noted, an equation like A = A^T implies that A is real-valued while a statement like A = A^H implies that A is complex-valued.

    Remark 1.1. While sqrt(-1) is most commonly denoted by i in mathematics texts, j is the more common notation in electrical engineering and system theory. There is some advantage to being conversant with both notations. The notation j is used throughout the text but reminders are placed at strategic locations.

    Example 1.2.

    1. [example matrix not reproduced in this copy] is symmetric (and Hermitian).

    2. [example matrix not reproduced in this copy] is complex-valued symmetric but not Hermitian.

    3. [example matrix not reproduced in this copy] is Hermitian (but not symmetric).

    Transposes of block matrices can be defined in an obvious way. For example, it is easy to see that if A_ij are appropriately dimensioned subblocks, then

        [A_11  A_12; A_21  A_22]^T = [A_11^T  A_21^T; A_12^T  A_22^T].

    We now classify some of the more familiar "shaped" matrices. A matrix A ∈ R^{n x n} (or A ∈ C^{n x n}) is

    diagonal if a_ij = 0 for i ≠ j.
    upper triangular if a_ij = 0 for i > j.
    lower triangular if a_ij = 0 for i < j.
    tridiagonal if a_ij = 0 for |i - j| > 1.
    pentadiagonal if a_ij = 0 for |i - j| > 2.
    upper Hessenberg if a_ij = 0 for i - j > 1.
    lower Hessenberg if a_ij = 0 for j - i > 1.


    1.2 Matrix Arithmetic

    It is assumed that the reader is familiar with the fundamental notions of matrix addition, multiplication of a matrix by a scalar, and multiplication of matrices.

    A special case of matrix multiplication occurs when the second matrix is a column vector x, i.e., the matrix-vector product Ax. A very important way to view this product is to interpret it as a weighted sum (linear combination) of the columns of A. That is, suppose

        A = [a_1, ..., a_n] ∈ R^{m x n} with a_i ∈ R^m  and  x = [x_1, ..., x_n]^T ∈ R^n.

    Then

        Ax = x_1 a_1 + ... + x_n a_n ∈ R^m.

    The importance of this interpretation cannot be overemphasized. As a numerical example, take

        A = [9 8 7; 6 5 4],  x = [3, 2, 1]^T.

    Then we can quickly calculate dot products of the rows of A with the column x to find Ax = [50, 32]^T, but this matrix-vector product can also be computed via

        Ax = 3 [9; 6] + 2 [8; 5] + 1 [7; 4].

    For large arrays of numbers, there can be important computer-architecture-related advantages to preferring the latter calculation method.
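    As a quick check of the two viewpoints just described, the following short NumPy sketch (my illustration; the book itself recommends MATLAB) computes Ax both by row-wise dot products and as a linear combination of the columns of A.

        import numpy as np

        A = np.array([[9.0, 8.0, 7.0],
                      [6.0, 5.0, 4.0]])
        x = np.array([3.0, 2.0, 1.0])

        # Row-oriented view: each entry of Ax is a dot product of a row of A with x.
        ax_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

        # Column-oriented view: Ax is the weighted sum x_1*a_1 + ... + x_n*a_n.
        ax_cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))

        assert np.allclose(ax_rows, ax_cols)      # both give [50, 32]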

    For matrix multiplication, suppose A ∈ R^{m x n} and B = [b_1, ..., b_p] ∈ R^{n x p} with b_i ∈ R^n. Then the matrix product AB can be thought of as above, applied p times:

        AB = [Ab_1, ..., Ab_p].

    There is also an alternative, but equivalent, formulation of matrix multiplication that appears frequently in the text and is presented below as a theorem. Again, its importance cannot be overemphasized. It is deceptively simple and its full understanding is well rewarded.

    Theorem 1.3. Let U = [u_1, ..., u_n] ∈ R^{m x n} with u_i ∈ R^m and V = [v_1, ..., v_n] ∈ R^{p x n} with v_i ∈ R^p. Then

        U V^T = sum_{i=1}^n u_i v_i^T ∈ R^{m x p}.

    If matrices C and D are compatible for multiplication, recall that (CD)^T = D^T C^T (or (CD)^H = D^H C^H). This gives a dual to the matrix-vector result above. Namely, if C ∈ R^{m x n} has row vectors c_i^T ∈ R^{1 x n}, and is premultiplied by a row vector y^T ∈ R^{1 x m}, then the product can be written as a weighted linear sum of the rows of C as follows:

        y^T C = y_1 c_1^T + ... + y_m c_m^T ∈ R^{1 x n}.

    Theorem 1.3 can then also be generalized to its "row dual." The details are left to the reader.
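    A small NumPy sketch (an illustration, not part of the original text) verifying the outer-product formulation of Theorem 1.3 on random data:

        import numpy as np

        m, n, p = 4, 3, 5
        rng = np.random.default_rng(0)
        U = rng.standard_normal((m, n))   # columns u_1, ..., u_n
        V = rng.standard_normal((p, n))   # columns v_1, ..., v_n

        # U @ V.T as a sum of n rank-one (outer product) terms u_i v_i^T.
        outer_sum = sum(np.outer(U[:, i], V[:, i]) for i in range(n))

        assert np.allclose(U @ V.T, outer_sum)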


    1.3 Inner Products and Orthogonality

    For vectors x, y ∈ R^n, the Euclidean inner product (or inner product, for short) of x and y is given by

        (x, y) := x^T y = sum_{i=1}^n x_i y_i.

    Note that the inner product is a scalar. If x, y ∈ C^n, we define their complex Euclidean inner product (or inner product, for short) by

        (x, y)_c := x^H y = sum_{i=1}^n conj(x_i) y_i.

    Note that (x, y)_c = conj((y, x)_c), i.e., the order in which x and y appear in the complex inner product is important. The more conventional definition of the complex inner product is (x, y)_c = y^H x = sum_{i=1}^n x_i conj(y_i), but throughout the text we prefer the symmetry with the real case.

    Example 1.4. Let x = [1, j]^T and y = [., 1/2]^T (the remaining entry of y is not legible in this copy). Computing (x, y)_c and (y, x)_c directly from the definition, we see that, indeed, (x, y)_c = conj((y, x)_c).

    Note that x^T x = 0 if and only if x = 0 when x ∈ R^n but that this is not true if x ∈ C^n. What is true in the complex case is that x^H x = 0 if and only if x = 0. To illustrate, consider the nonzero vector x above. Then x^T x = 0 but x^H x = 2.

    Two nonzero vectors x, y ∈ R^n are said to be orthogonal if their inner product is zero, i.e., x^T y = 0. Nonzero complex vectors are orthogonal if x^H y = 0. If x and y are orthogonal and x^T x = 1 and y^T y = 1, then we say that x and y are orthonormal. A matrix A ∈ R^{n x n} is an orthogonal matrix if A^T A = A A^T = I, where I is the n x n identity matrix. The notation I is sometimes used to denote the identity matrix in R^{n x n} (or C^{n x n}). Similarly, a matrix A ∈ C^{n x n} is said to be unitary if A^H A = A A^H = I. Clearly an orthogonal or unitary matrix has orthonormal rows and orthonormal columns. There is no special name attached to a nonsquare matrix A ∈ R^{m x n} (or C^{m x n}) with orthonormal rows or columns.
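    A short NumPy illustration of the points above (mine, not the book's; the vector y here is an arbitrary choice): for the nonzero complex vector x = [1, j]^T we have x^T x = 0 but x^H x = 2, and the complex inner product is conjugate-symmetric.

        import numpy as np

        x = np.array([1.0, 1j])
        y = np.array([1j, 0.5])          # arbitrary complex vector for illustration

        # For complex vectors, x^T x can vanish even when x != 0 ...
        print(x @ x)                     # 1 + j^2 = 0
        # ... but the Hermitian form x^H x vanishes only for x = 0.
        print(np.vdot(x, x).real)        # |1|^2 + |j|^2 = 2

        # The complex inner product used in the text, (x, y)_c = x^H y:
        assert np.isclose(np.vdot(x, y), np.conj(np.vdot(y, x)))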

    1.4 Determinants

    It is assumed that the reader is familiar with the basic theory of determinants. For A ∈ R^{n x n} (or A ∈ C^{n x n}) we use the notation det A for the determinant of A. We list below some of



    the more useful properties of determinants. Note that this is not a minimal set, i.e., several properties are consequences of one or more of the others.

    1. If A has a zero row or if any two rows of A are equal, then det A = 0.

    2. If A has a zero column or if any two columns of A are equal, then det A = 0.

    3. Interchanging two rows of A changes only the sign of the determinant.

    4. Interchanging two columns of A changes only the sign of the determinant.

    5. Multiplying a row of A by a scalar a results in a new matrix whose determinant isa det A.

    6. Multiplying a column of A by a scalar a results in a new matrix whose determinantis a det A.

    7. Multiplying a row of A by a scalar and then adding it to another row does not changethe determinant.

    8. Multiplying a column of A by a scalar and then adding it to another column does notchange the determinant.

    9. det A^T = det A (det A^H = conj(det A) if A ∈ C^{n x n}).

    10. If A is diagonal, then det A = a_11 a_22 ... a_nn, i.e., det A is the product of its diagonal elements.

    11. If A is upper triangular, then det A = a_11 a_22 ... a_nn.

    12. If A is lower triangular, then det A = a_11 a_22 ... a_nn.

    13. If A is block diagonal (or block upper triangular or block lower triangular), with square diagonal blocks A_11, A_22, ..., A_nn (of possibly different sizes), then det A = det A_11 det A_22 ... det A_nn.

    14. If A, B ∈ R^{n x n}, then det(AB) = det A det B.

    15. If A ∈ R^{n x n} is nonsingular, then det(A^{-1}) = 1/det A.

    16. If A ∈ R^{n x n} and D ∈ R^{m x m}, then det [A B; C D] = det A det(D - C A^{-1} B).
    Proof: This follows easily from the block LU factorization

        [A B; C D] = [I 0; C A^{-1} I] [A B; 0 D - C A^{-1} B].

    17. If A ∈ R^{n x n} and D ∈ R^{m x m}, then det [A B; C D] = det D det(A - B D^{-1} C).
    Proof: This follows easily from the block UL factorization

        [A B; C D] = [I B D^{-1}; 0 I] [A - B D^{-1} C 0; C D].


    Remark 1.5. The factorization of a matrix A into the product of a unit lower triangular matrix L (i.e., lower triangular with all 1's on the diagonal) and an upper triangular matrix U is called an LU factorization; see, for example, [24]. Another such factorization is UL where U is unit upper triangular and L is lower triangular. The factorizations used above are block analogues of these.

    Remark 1.6. The matrix D - C A^{-1} B is called the Schur complement of A in [A B; C D]. Similarly, A - B D^{-1} C is the Schur complement of D in [A B; C D].
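    A quick NumPy check (my illustration only) of properties 16 and 17 and Remark 1.6 on a random block matrix with well-conditioned diagonal blocks:

        import numpy as np

        rng = np.random.default_rng(1)
        n, m = 3, 2
        A = rng.standard_normal((n, n)) + n * np.eye(n)   # keep A comfortably nonsingular
        B = rng.standard_normal((n, m))
        C = rng.standard_normal((m, n))
        D = rng.standard_normal((m, m)) + m * np.eye(m)

        M = np.block([[A, B], [C, D]])
        schur_A = D - C @ np.linalg.solve(A, B)           # Schur complement of A
        schur_D = A - B @ np.linalg.solve(D, C)           # Schur complement of D

        assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(schur_A))
        assert np.isclose(np.linalg.det(M), np.linalg.det(D) * np.linalg.det(schur_D))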

    EXERCISES

    1. If A ∈ R^{n x n} and α is a scalar, what is det(αA)? What is det(-A)?

    2. If A is orthogonal, what is det A? If A is unitary, what is det A?

    3. Let x, y ∈ R^n. Show that det(I - x y^T) = 1 - y^T x.

    4. Let U_1, U_2, ..., U_k ∈ R^{n x n} be orthogonal matrices. Show that the product U = U_1 U_2 ... U_k is an orthogonal matrix.

    5. Let A ∈ R^{n x n}. The trace of A, denoted Tr A, is defined as the sum of its diagonal elements, i.e., Tr A = sum_{i=1}^n a_ii.

    (a) Show that the trace is a linear function; i.e., if A, B ∈ R^{n x n} and α, β ∈ R, then Tr(αA + βB) = α Tr A + β Tr B.

    (b) Show that Tr(AB) = Tr(BA), even though in general AB ≠ BA.

    (c) Let S ∈ R^{n x n} be skew-symmetric, i.e., S^T = -S. Show that Tr S = 0. Then either prove the converse or provide a counterexample.

    6. A matrix A ∈ R^{n x n} is said to be idempotent if A^2 = A. [The specific 2 x 2 example matrix in the remainder of this exercise is not legible in this copy.]

  • Chapter 2

    Vector Spaces

    In this chapter we give a brief review of some of the basic concepts of vector spaces. The emphasis is on finite-dimensional vector spaces, including spaces formed by special classes of matrices, but some infinite-dimensional examples are also cited. An excellent reference for this and the next chapter is [10], where some of the proofs that are not given here may be found.

    2.1 Definitions and Examples

    Definition 2.1. A field is a set F together with two operations +, . : F x F -> F such that

    (A1) α + (β + γ) = (α + β) + γ for all α, β, γ ∈ F.
    (A2) there exists an element 0 ∈ F such that α + 0 = α for all α ∈ F.
    (A3) for all α ∈ F, there exists an element (-α) ∈ F such that α + (-α) = 0.
    (A4) α + β = β + α for all α, β ∈ F.

    (M1) α . (β . γ) = (α . β) . γ for all α, β, γ ∈ F.
    (M2) there exists an element 1 ∈ F such that α . 1 = α for all α ∈ F.
    (M3) for all α ∈ F, α ≠ 0, there exists an element α^{-1} ∈ F such that α . α^{-1} = 1.
    (M4) α . β = β . α for all α, β ∈ F.

    (D) α . (β + γ) = α . β + α . γ for all α, β, γ ∈ F.

    Axioms (A1)-(A3) state that (F, +) is a group and an abelian group if (A4) also holds. Axioms (M1)-(M4) state that (F \ {0}, .) is an abelian group.

    Generally speaking, when no confusion can arise, the multiplication operator "." is not written explicitly.


    Example 2.2.

    1. R with ordinary addition and multiplication is a field.

    2. C with ordinary complex addition and multiplication is a field.

    3. R(x) = the field of rational functions in the indeterminate x,

        R(x) = { (a_0 + a_1 x + ... + a_p x^p) / (b_0 + b_1 x + ... + b_q x^q) : a_i, b_j ∈ R; p, q ∈ Z+ },

    where Z+ = {0, 1, 2, ...}, is a field.

    4. R_r^{m x n} = {m x n matrices of rank r with real coefficients} is clearly not a field since, for example, (M1) does not hold unless m = n. Moreover, R^{n x n} is not a field either since (M4) does not hold in general (although the other 8 axioms hold).

    Definition 2.3. A vector space over a field F is a set V together with two operations + : V x V -> V and . : F x V -> V such that

    (V1) (V, +) is an abelian group.
    (V2) (α . β) . v = α . (β . v) for all α, β ∈ F and for all v ∈ V.
    (V3) (α + β) . v = α . v + β . v for all α, β ∈ F and for all v ∈ V.
    (V4) α . (v + w) = α . v + α . w for all α ∈ F and for all v, w ∈ V.
    (V5) 1 . v = v for all v ∈ V (1 ∈ F).

    A vector space is denoted by (V, F) or, when there is no possibility of confusion as to the underlying field, simply by V.

    Remark 2.4. Note that + and . in Definition 2.3 are different from the + and . in Definition 2.1 in the sense of operating on different objects in different sets. In practice, this causes no confusion and the . operator is usually not even written explicitly.

    Example 2.5.

    1. (R^n, R) with addition defined by

        x + y = [x_1 + y_1, ..., x_n + y_n]^T

    and scalar multiplication defined by

        α x = [α x_1, ..., α x_n]^T

    is a vector space. Similar definitions hold for (C^n, C).

    2. (R^{m x n}, R) is a vector space with addition defined by

        A + B = [a_ij + b_ij]

    and scalar multiplication defined by

        α A = [α a_ij].

    3. Let (V, F) be an arbitrary vector space and D be an arbitrary set. Let Φ(D, V) be the set of functions f mapping D to V. Then Φ(D, V) is a vector space with addition defined by

        (f + g)(d) = f(d) + g(d)

    and scalar multiplication defined by

        (α f)(d) = α f(d).

    Special Cases:

    (a) D = [t_0, t_1], (V, F) = (R^n, R), and the functions are piecewise continuous =: (PC[t_0, t_1])^n or continuous =: (C[t_0, t_1])^n.

    4. Let A ∈ R^{n x n}. Then {x(t) : x'(t) = A x(t)} is a vector space (of dimension n).

    2.2 Subspaces

    Definition 2.6. Let (V, F) be a vector space and let W ⊆ V, W ≠ ∅. Then (W, F) is a subspace of (V, F) if and only if (W, F) is itself a vector space or, equivalently, if and only if (α w_1 + w_2) ∈ W for all α ∈ F and for all w_1, w_2 ∈ W.

    Remark 2.7. The latter characterization of a subspace is often the easiest way to check or prove that something is indeed a subspace (or vector space); i.e., verify that the set in question is closed under addition and scalar multiplication. Note, too, that since 0 ∈ F, this implies that the zero vector must be in any subspace.

    Notation: When the underlying field is understood, we write W ⊆ V, and the symbol ⊆, when used with vector spaces, is henceforth understood to mean "is a subspace of." The less restrictive meaning "is a subset of" is specifically flagged as such.

    Example 2.8.

    1. Consider (V, F) = (R^{n x n}, R) and let W = {A ∈ R^{n x n} : A is symmetric}. Then W ⊆ V.

    Proof: Suppose A_1, A_2 are symmetric. Then it is easily shown that α A_1 + β A_2 is symmetric for all α, β ∈ R.

    2. Let W = {A ∈ R^{n x n} : A is orthogonal}. Then W is not a subspace of R^{n x n}.

    3. Consider (V, F) = (R^2, R) and for each v ∈ R^2 of the form v = [v_1, v_2]^T identify v_1 with the x-coordinate in the plane and v_2 with the y-coordinate. For α, β ∈ R, define

        W_{α,β} = { v = [v_1, v_2]^T : v_2 = α v_1 + β }.

    Then W_{α,β} is a subspace of V if and only if β = 0. As an interesting exercise, sketch W_{2,1}, W_{2,0}, W_{1/2,1}, and W_{1/2,0}. Note, too, that the vertical line through the origin (i.e., α = ∞) is also a subspace.

    All lines through the origin are subspaces. Shifted subspaces W_{α,β} with β ≠ 0 are called linear varieties.

    Henceforth, we drop the explicit dependence of a vector space on an underlying field. Thus, V usually denotes a vector space with the underlying field generally being R unless explicitly stated otherwise.

    Definition 2.9. If R and S are vector spaces (or subspaces), then R = S if and only if R ⊆ S and S ⊆ R.

    Note: To prove two vector spaces are equal, one usually proves the two inclusions separately: An arbitrary r ∈ R is shown to be an element of S and then an arbitrary s ∈ S is shown to be an element of R.

    2.3 Linear Independence

    Let X = {v_1, v_2, ...} be a nonempty collection of vectors v_i in some vector space V.

    Definition 2.10. X is a linearly dependent set of vectors if and only if there exist k distinct elements v_1, ..., v_k ∈ X and scalars α_1, ..., α_k not all zero such that

        α_1 v_1 + ... + α_k v_k = 0.

    X is a linearly independent set of vectors if and only if for any collection of k distinct elements v_1, ..., v_k of X and for any scalars α_1, ..., α_k,

        α_1 v_1 + ... + α_k v_k = 0 implies α_1 = 0, ..., α_k = 0.

    Example 2.11.

    1. Let V = R^3. Then [a set of three vectors shown in the original] is a linearly independent set. Why? However, [a second set of three vectors shown in the original] is a linearly dependent set (since 2 v_1 - v_2 + v_3 = 0).

    2. Let A ∈ R^{n x n} and B ∈ R^{n x m}. Then consider the rows of e^{tA} B as vectors in C^m[t_0, t_1] (recall that e^{tA} denotes the matrix exponential, which is discussed in more detail in Chapter 11). Independence of these vectors turns out to be equivalent to a concept called controllability, to be studied further in what follows.

    Let v_i ∈ R^n, i ∈ k, and consider the matrix V = [v_1, ..., v_k] ∈ R^{n x k}. The linear dependence of this set of vectors is equivalent to the existence of a nonzero vector a ∈ R^k such that Va = 0. An equivalent condition for linear dependence is that the k x k matrix V^T V is singular. If the set of vectors is independent, and there exists a ∈ R^k such that Va = 0, then a = 0. An equivalent condition for linear independence is that the matrix V^T V is nonsingular.
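    A short NumPy sketch (my illustration, with arbitrary data) of the independence test just described, based on the nonsingularity of the Gram matrix V^T V:

        import numpy as np

        # Columns of V are the vectors v_1, ..., v_k.
        V = np.array([[1.0, 3.0],
                      [2.0, 4.0],
                      [3.0, 5.0]])
        k = V.shape[1]

        # The columns are linearly independent exactly when the k x k matrix
        # V^T V is nonsingular, i.e., has full rank k.
        print(np.linalg.matrix_rank(V.T @ V) == k)              # True here

        # Append the dependent column v_1 + v_2 and the test fails.
        W = np.column_stack([V, V[:, 0] + V[:, 1]])
        print(np.linalg.matrix_rank(W.T @ W) == W.shape[1])     # False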

    Definition 2.12. Let X = {v_1, v_2, ...} be a collection of vectors v_i ∈ V. Then the span of X is defined as

        Sp(X) = { α_1 v_1 + ... + α_k v_k : α_i ∈ F, v_i ∈ X, k ∈ N },

    where N = {1, 2, ...}.

    Example 2.13. Let V = R^n and define

        e_1 = [1, 0, ..., 0]^T,  e_2 = [0, 1, 0, ..., 0]^T,  ...,  e_n = [0, ..., 0, 1]^T.

    Then Sp{e_1, e_2, ..., e_n} = R^n.

    Definition 2.14. A set of vectors X is a basis for V if and only if

    1. X is a linearly independent set (of basis vectors), and

    2. Sp(X) = V.

    Example 2.15. {e_1, ..., e_n} is a basis for R^n (sometimes called the natural basis).

    Now let b_1, ..., b_n be a basis (with a specific order associated with the basis vectors) for V. Then for all v ∈ V there exists a unique n-tuple {ξ_1, ..., ξ_n} such that

        v = ξ_1 b_1 + ... + ξ_n b_n.

    Definition 2.16. The scalars {ξ_i} are called the components (or sometimes the coordinates) of v with respect to the basis {b_1, ..., b_n} and are unique. We say that the vector x of components represents the vector v with respect to the basis B.

    Example 2.17. In R^n, the components of a vector with respect to the natural basis {e_1, ..., e_n} are simply its own entries. We can also determine components of v with respect to another basis. [The worked computation displayed in the original at this point is not reproduced in this copy.]

    Theorem 2.18. The number of elements in a basis of a vector space is independent of the particular basis considered.

    Definition 2.19. If a basis X for a vector space V (≠ 0) has n elements, V is said to be n-dimensional or have dimension n and we write dim(V) = n or dim V = n. For

    consistency, and because the 0 vector is in any vector space, we define dim(0) = 0. A vector space V is finite-dimensional if there exists a basis X with n < +∞ elements; otherwise, V is infinite-dimensional.

    Thus, Theorem 2.18 says that dim(V) = the number of elements in a basis.

    Example 2.20.

    1. dim(R^n) = n.

    2. dim(R^{m x n}) = mn.

    Note: Check that a basis for R^{m x n} is given by the mn matrices E_ij, i ∈ m, j ∈ n, where E_ij is a matrix all of whose elements are 0 except for a 1 in the (i, j)th location. The collection of E_ij matrices can be called the "natural basis matrices."

    3. dim(C[t_0, t_1]) = +∞.

    4. dim{A ∈ R^{n x n} : A = A^T} = 1/2 n(n + 1).
    (To see why, determine 1/2 n(n + 1) symmetric basis matrices.)

    5. dim{A ∈ R^{n x n} : A is upper (lower) triangular} = 1/2 n(n + 1).

    2.4 Sums and Intersections of Subspaces

    Definition 2.21. Let (V, F) be a vector space and let R, S ⊆ V. The sum and intersection of R and S are defined respectively by:

    1. R + S = {r + s : r ∈ R, s ∈ S}.

    2. R ∩ S = {v : v ∈ R and v ∈ S}.

    Theorem 2.22.

    1. R + S ⊆ V (in general, R_1 + ... + R_k =: sum_{i=1}^k R_i ⊆ V, for finite k).

    2. R ∩ S ⊆ V (in general, the intersection of the R_α over an arbitrary index set A is a subspace of V).

    Remark 2.23. The union of two subspaces, R ∪ S, is not necessarily a subspace.

    Definition 2.24. T = R ⊕ S is the direct sum of R and S if

    1. R ∩ S = 0, and

    2. R + S = T (in general, R_i ∩ (sum_{j ≠ i} R_j) = 0 and sum_i R_i = T).

    The subspaces R and S are said to be complements of each other in T.


    Remark 2.25. The complement of R (or S) is not unique. For example, consider V = R^2 and let R be any line through the origin. Then any other distinct line through the origin is a complement of R. Among all the complements there is a unique one orthogonal to R. We discuss more about orthogonal complements elsewhere in the text.

    Theorem 2.26. Suppose T = R ⊕ S. Then

    1. every t ∈ T can be written uniquely in the form t = r + s with r ∈ R and s ∈ S.

    2. dim(T) = dim(R) + dim(S).

    Proof: To prove the first part, suppose an arbitrary vector t ∈ T can be written in two ways as t = r_1 + s_1 = r_2 + s_2, where r_1, r_2 ∈ R and s_1, s_2 ∈ S. Then r_1 - r_2 = s_2 - s_1. But r_1 - r_2 ∈ R and s_2 - s_1 ∈ S. Since R ∩ S = 0, we must have r_1 = r_2 and s_1 = s_2, from which uniqueness follows.

    The statement of the second part is a special case of the next theorem.

    Theorem 2.27. For arbitrary subspaces R, S of a vector space V,

        dim(R + S) = dim(R) + dim(S) - dim(R ∩ S).

    Example 2.28. Let U be the subspace of upper triangular matrices in R^{n x n} and let L be the subspace of lower triangular matrices in R^{n x n}. Then it may be checked that U + L = R^{n x n} while U ∩ L is the set of diagonal matrices in R^{n x n}. Using the fact that dim{diagonal matrices} = n, together with Examples 2.20.2 and 2.20.5, one can easily verify the validity of the formula given in Theorem 2.27.

    Example 2.29. Let (V, F) = (R^{n x n}, R), let R be the set of skew-symmetric matrices in R^{n x n}, and let S be the set of symmetric matrices in R^{n x n}. Then V = R ⊕ S.

    Proof: This follows easily from the fact that any A ∈ R^{n x n} can be written in the form

        A = 1/2 (A + A^T) + 1/2 (A - A^T).

    The first matrix on the right-hand side above is in S while the second is in R.
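    A short NumPy illustration (mine, not the book's) of the decomposition used in the proof of Example 2.29:

        import numpy as np

        rng = np.random.default_rng(2)
        A = rng.standard_normal((4, 4))

        S = 0.5 * (A + A.T)      # symmetric part
        R = 0.5 * (A - A.T)      # skew-symmetric part

        assert np.allclose(S, S.T)
        assert np.allclose(R, -R.T)
        assert np.allclose(A, S + R)     # A = S + R, and the splitting is unique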

    EXERCISES

    1. Suppose {v_1, ..., v_k} is a linearly dependent set. Then show that one of the vectors must be a linear combination of the others.

    2. Let x_1, x_2, ..., x_k ∈ R^n be nonzero mutually orthogonal vectors. Show that {x_1, ..., x_k} must be a linearly independent set.

    3. Let v_1, ..., v_n be orthonormal vectors in R^n. Show that Av_1, ..., Av_n are also orthonormal if and only if A ∈ R^{n x n} is orthogonal.

    4. Consider the vectors v_1 = [2, 1]^T and v_2 = [3, 1]^T. Prove that v_1 and v_2 form a basis for R^2. Find the components of the vector v = [4, 1]^T with respect to this basis.


    5. Let P denote the set of polynomials of degree less than or equal to two of the form p_0 + p_1 x + p_2 x^2, where p_0, p_1, p_2 ∈ R. Show that P is a vector space over R. Show that the polynomials 1, x, and 2x^2 - 1 are a basis for P. Find the components of the polynomial 2 + 3x + 4x^2 with respect to this basis.

    6. Prove Theorem 2.22 (for the case of two subspaces R and S only).

    7. Let P^n denote the vector space of polynomials of degree less than or equal to n, and of the form p(x) = p_0 + p_1 x + ... + p_n x^n, where the coefficients p_i are all real. Let P_E denote the subspace of all even polynomials in P^n, i.e., those that satisfy the property p(-x) = p(x). Similarly, let P_O denote the subspace of all odd polynomials, i.e., those satisfying p(-x) = -p(x). Show that P^n = P_E ⊕ P_O.

    8. Repeat Example 2.28 using instead the two subspaces T of tridiagonal matrices and U of upper triangular matrices.


  • Chapter 3

    Linear Transformations

    3.1 Definition and Examples

    We begin with the basic definition of a linear transformation (or linear map, linear function, or linear operator) between two vector spaces.

    Definition 3.1. Let (V, F) and (W, F) be vector spaces. Then L : V -> W is a linear transformation if and only if

        L(α v_1 + β v_2) = α L v_1 + β L v_2  for all α, β ∈ F and for all v_1, v_2 ∈ V.

    The vector space V is called the domain of the transformation L while W, the space into which it maps, is called the co-domain.

    Example 3.2.

    1. Let F = R and take V = W = PC[t_0, +∞). Define L : PC[t_0, +∞) -> PC[t_0, +∞) by the integral operator displayed in the original [not reproduced in this copy].

    2. Let F = R and take V = W = R^{m x n}. Fix M ∈ R^{m x m}. Define L : R^{m x n} -> R^{m x n} by

        L X = M X.

    3. Let F = R and take V = P^n = {p(x) = a_0 + a_1 x + ... + a_n x^n : a_i ∈ R} and W = P^{n-1}. Define L : V -> W by L p = p', where ' denotes differentiation with respect to x.


    3.2 Matrix Representation of Linear Transformations

    Linear transformations between vector spaces with specific bases can be represented conveniently in matrix form. Specifically, suppose L : (V, F) -> (W, F) is linear and further suppose that {v_i, i ∈ n} and {w_j, j ∈ m} are bases for V and W, respectively. Then the ith column of A = Mat L (the matrix representation of L with respect to the given bases for V and W) is the representation of L v_i with respect to {w_j, j ∈ m}. In other words,

        A = [a_1, ..., a_n]  with  L v_i = a_{1i} w_1 + ... + a_{mi} w_m = W a_i

    represents L, where W = [w_1, ..., w_m] and a_i = [a_{1i}, ..., a_{mi}]^T is the ith column of A. Note that A = Mat L depends on the particular bases for V and W. This could be reflected by subscripts, say, in the notation, but this is usually not done.

    The action of L on an arbitrary vector v ∈ V is uniquely determined (by linearity) by its action on a basis. Thus, if v = ξ_1 v_1 + ... + ξ_n v_n = V x (where v, and hence x, is arbitrary), then

        L v = ξ_1 L v_1 + ... + ξ_n L v_n = W A x.

    Thus, L V = W A since x was arbitrary. When V = R^n, W = R^m and {v_i, i ∈ n}, {w_j, j ∈ m} are the usual (natural) bases, the equation L V = W A becomes simply L = A. We thus commonly identify A as a linear transformation with its matrix representation, i.e., y = A x. Thinking of A both as a matrix and as a linear transformation from R^n to R^m usually causes no confusion. Change of basis then corresponds naturally to appropriate matrix multiplication.
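    As a concrete illustration of building Mat L column by column (my sketch, using the differentiation operator of Example 3.2.3 on P^2, with the monomial basis {1, x, x^2} for V and {1, x} for W):

        import numpy as np

        # L p = p' maps P^2 -> P^1.  Represent p = a0 + a1*x + a2*x^2 by (a0, a1, a2).
        def differentiate(coeffs):
            """Return the coefficient vector of p' given that of p."""
            return np.array([k * coeffs[k] for k in range(1, len(coeffs))])

        basis_V = np.eye(3)                 # coordinate vectors of 1, x, x^2
        # The ith column of A is the representation of L v_i in the basis {1, x}.
        A = np.column_stack([differentiate(v) for v in basis_V])
        print(A)        # [[0. 1. 0.]
                        #  [0. 0. 2.]]

        # Check on p(x) = 2 + 3x + 4x^2:  p'(x) = 3 + 8x.
        p = np.array([2.0, 3.0, 4.0])
        assert np.allclose(A @ p, differentiate(p))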


    3.3 Composition of Transformations

    Consider three vector spaces U, V, and W and transformations B from U to V and A from V to W. Then we can define a new transformation C as follows:

        C u = A(B u)  for all u ∈ U.

    This composition C = AB can be illustrated by a diagram U -> V -> W (shown in the original; not reproduced here). Note that in most texts, the arrows in such a diagram are reversed. However, it might be useful to prefer the former since the transformations A and B appear in the same order in both the diagram and the equation. If dim U = p, dim V = n, and dim W = m, and if we associate matrices with the transformations in the usual way, then composition of transformations corresponds to standard matrix multiplication. That is, we have C = A B. The above is sometimes expressed componentwise by the formula

        c_ij = sum_{k=1}^n a_ik b_kj.

    Two Special Cases:

    Inner Product: Let x, y ∈ R^n. Then their inner product is the scalar

        x^T y = sum_{i=1}^n x_i y_i.

    Outer Product: Let x ∈ R^m, y ∈ R^n. Then their outer product is the m x n matrix

        x y^T = [x_i y_j].

    Note that any rank-one matrix A ∈ R^{m x n} can be written in the form A = x y^T above (or x y^H if A ∈ C^{m x n}). A rank-one symmetric matrix can be written in the form x x^T (or x x^H).


    3.4 Structure of Linear Transformations

    Let A : V -> W be a linear transformation.

    Definition 3.3. The range of A, denoted R(A), is the set {w ∈ W : w = Av for some v ∈ V}. Equivalently, R(A) = {Av : v ∈ V}. The range of A is also known as the image of A and denoted Im(A).

    The nullspace of A, denoted N(A), is the set {v ∈ V : Av = 0}. The nullspace of A is also known as the kernel of A and denoted Ker(A).

    Theorem 3.4. Let A : V -> W be a linear transformation. Then

    1. R(A) ⊆ W.

    2. N(A) ⊆ V.

    Note that N(A) and R(A) are, in general, subspaces of different spaces.

    Theorem 3.5. Let A ∈ R^{m x n}. If A is written in terms of its columns as A = [a_1, ..., a_n], then

        R(A) = Sp{a_1, ..., a_n}.

    Proof: The proof of this theorem is easy, essentially following immediately from the definition.

    Remark 3.6. Note that in Theorem 3.5 and throughout the text, the same symbol (A) is used to denote both a linear transformation and its matrix representation with respect to the usual (natural) bases. See also the last paragraph of Section 3.2.

    Definition 3.7. Let {v_1, ..., v_k} be a set of nonzero vectors v_i ∈ R^n. The set is said to be orthogonal if v_i^T v_j = 0 for i ≠ j and orthonormal if v_i^T v_j = δ_ij, where δ_ij is the Kronecker delta defined by

        δ_ij = 1 if i = j,  δ_ij = 0 if i ≠ j.

    Example 3.8.

    1. [example vectors not reproduced in this copy] is an orthogonal set.

    2. [example vectors not reproduced in this copy] is an orthonormal set.

    3. If {v_1, ..., v_k} with v_i ∈ R^n is an orthogonal set, then {v_1/sqrt(v_1^T v_1), ..., v_k/sqrt(v_k^T v_k)} is an orthonormal set.


    Definition 3.9. Let S ⊆ R^n. Then the orthogonal complement of S is defined as the set

        S^⊥ = {v ∈ R^n : v^T s = 0 for all s ∈ S}.

    Example 3.10. Let S be the span of the two vectors given in the original [not reproduced in this copy]. Then it can be shown that S^⊥ is the subspace displayed there. Working from the definition, the computation involved is simply to find all nontrivial (i.e., nonzero) solutions of the corresponding system of equations.

    Note that there is nothing special about the two vectors in the basis defining S being orthogonal. Any set of vectors will do, including dependent spanning vectors (which would, of course, then give rise to redundant equations).

    Theorem 3.11. Let R, S ⊆ R^n be subspaces. Then several basic properties of orthogonal complements hold; the items displayed in the original are not fully legible in this copy, but item 2, used repeatedly below, states that

        R^n = S ⊕ S^⊥.

    Proof: We prove and discuss only item 2 here. The proofs of the other results are left as exercises. Let {v_1, ..., v_k} be an orthonormal basis for S and let x ∈ R^n be an arbitrary vector. Set

        x_1 = sum_{i=1}^k (x^T v_i) v_i,   x_2 = x - x_1.

    Then x_1 ∈ S and, since x_2^T v_j = x^T v_j - x^T v_j = 0 for each j,

    It is also easy to see directly that, when we have such direct sum decompositions, wecan write vectors in a unique way with respect to the corresponding subspaces. Suppose,for example, that x = x1 + x2. = x'1+ x'2, where x\, x 1 E S and x2, x'2 e S1. Then(x'1 x1)T(x '2 x2) = 0 by definition of ST. But then (x'1 x 1 ) T ( x ' 1 x1) = 0 sincex2 X2 = (x'1 x1) (which follows by rearranging the equation x1+x2 = x'1 + x'2). Thus,x1 x'1 and x2 = x2. DTheorem 3.12. Let A : Rn > Rm. Then

    1. N(A)1" = 7(Ar). (Note: This holds only for finite-dimensional vector spaces.)2. 'R,(A)1~ J\f(AT). (Note: This also holds for infinite-dimensional vector spaces.)

    Proof: To prove the first part, take an arbitrary x e A/"(A). Then Ax = 0 and this isequivalent to yT Ax = 0 for all v. But yT Ax = ( A T y ) x. Thus, Ax = 0 if and only if xis orthogonal to all vectors of the form AT v, i.e., x e R(A r) . Since x was arbitrary, wehave established that N(A)1 = U(AT}.

    The proof of the second part is similar and is left as an exercise. D

    Definition 3.13. Let A : R^n -> R^m. Then {v ∈ R^n : Av = 0} is sometimes called the right nullspace of A. Similarly, {w ∈ R^m : w^T A = 0} is called the left nullspace of A. Clearly, the right nullspace is N(A) while the left nullspace is N(A^T).

    Theorem 3.12 and part 2 of Theorem 3.11 can be combined to give two very fundamental and useful decompositions of vectors in the domain and co-domain of a linear transformation A. See also Theorem 2.26.

    Theorem 3.14 (Decomposition Theorem). Let A : R^n -> R^m. Then

    1. every vector v in the domain space R^n can be written in a unique way as v = x + y, where x ∈ N(A) and y ∈ N(A)^⊥ = R(A^T) (i.e., R^n = N(A) ⊕ R(A^T)).

    2. every vector w in the co-domain space R^m can be written in a unique way as w = x + y, where x ∈ R(A) and y ∈ R(A)^⊥ = N(A^T) (i.e., R^m = R(A) ⊕ N(A^T)).

    This key theorem becomes very easy to remember by carefully studying and understanding Figure 3.1 in the next section.

    3.5 Four Fundamental Subspaces

    Consider a general matrix A ∈ R^{m x n}. When thought of as a linear transformation from R^n to R^m, many properties of A can be developed in terms of the four fundamental subspaces


    Figure 3.1. Four fundamental subspaces.

    R(A), R(A)^⊥, N(A), and N(A)^⊥. Figure 3.1 makes many key properties seem almost obvious and we return to this figure frequently both in the context of linear transformations and in illustrating concepts such as controllability and observability.

    Definition 3.15. Let V and W be vector spaces and let A : V -> W be a linear transformation.

    1. A is onto (also called epic or surjective) if R(A) = W.

    2. A is one-to-one or 1-1 (also called monic or injective) if N(A) = 0. Two equivalent characterizations of A being 1-1 that are often easier to verify in practice are the following:

        (a) A v_1 = A v_2 implies v_1 = v_2.
        (b) v_1 ≠ v_2 implies A v_1 ≠ A v_2.

    Definition 3.16. Let A : R^n -> R^m. Then rank(A) = dim R(A). This is sometimes called the column rank of A (maximum number of independent columns). The row rank of A is


    dim R(A^T) (maximum number of independent rows). The dual notion to rank is the nullity of A, sometimes denoted nullity(A) or corank(A), and is defined as dim N(A).

    Theorem 3.17. Let A : R^n -> R^m. Then dim R(A) = dim N(A)^⊥. (Note: Since N(A)^⊥ = R(A^T), this theorem is sometimes colloquially stated "row rank of A = column rank of A.")

    Proof: Define a linear transformation T : N(A)^⊥ -> R(A) by

        T v = A v  for all v ∈ N(A)^⊥.

    Clearly T is 1-1 (since N(T) = 0). To see that T is also onto, take any w ∈ R(A). Then by definition there is a vector x ∈ R^n such that Ax = w. Write x = x_1 + x_2, where x_1 ∈ N(A)^⊥ and x_2 ∈ N(A). Then A x_1 = w = T x_1 since x_1 ∈ N(A)^⊥. The last equality shows that T is onto. We thus have that dim R(A) = dim N(A)^⊥ since it is easily shown that if {v_1, ..., v_r} is a basis for N(A)^⊥, then {T v_1, ..., T v_r} is a basis for R(A). Finally, if we apply this and several previous results, the following string of equalities follows easily: "column rank of A" = rank(A) = dim R(A) = dim N(A)^⊥ = dim R(A^T) = rank(A^T) = "row rank of A."

    The following corollary is immediate. Like the theorem, it is a statement about equality of dimensions; the subspaces themselves are not necessarily in the same vector space.

    Corollary 3.18. Let A : R^n -> R^m. Then dim N(A) + dim R(A) = n, where n is the dimension of the domain of A.

    Proof: From Theorems 3.11 and 3.17 we see immediately that

        n = dim N(A) + dim N(A)^⊥ = dim N(A) + dim R(A).

    For completeness, we include here a few miscellaneous results about ranks of sums and products of matrices.

    Theorem 3.19. Let A, B ∈ R^{n x n}. Then [the rank inequalities displayed in the original are not reproduced in this copy].

    Part 4 of Theorem 3.19 suggests looking at the general problem of the four fundamental subspaces of matrix products. The basic results are contained in the following easily proved theorem.


    Theorem 3.20. Let A ∈ R^{m x n}, B ∈ R^{n x p}. Then

    1. R(AB) ⊆ R(A).

    2. N(AB) ⊇ N(B).

    3. R((AB)^T) ⊆ R(B^T).

    4. N((AB)^T) ⊇ N(A^T).

    The next theorem is closely related to Theorem 3.20 and is also easily proved. It is extremely useful in text that follows, especially when dealing with pseudoinverses and linear least squares problems.

    Theorem 3.21. Let A ∈ R^{m x n}. Then

    1. R(A) = R(A A^T).

    2. R(A^T) = R(A^T A).

    3. N(A) = N(A^T A).

    4. N(A^T) = N(A A^T).

    We now characterize 1-1 and onto transformations and provide characterizations in terms of rank and invertibility.

    Theorem 3.22. Let A : R^n -> R^m. Then

    1. A is onto if and only if rank(A) = m (A has linearly independent rows or is said to have full row rank; equivalently, A A^T is nonsingular).

    2. A is 1-1 if and only if rank(A) = n (A has linearly independent columns or is said to have full column rank; equivalently, A^T A is nonsingular).

    Proof: Proof of part 1: If A is onto, dim R(A) = m = rank(A). Conversely, let y ∈ R^m be arbitrary. Let x = A^T (A A^T)^{-1} y ∈ R^n. Then y = Ax, i.e., y ∈ R(A), so A is onto.

    Proof of part 2: If A is 1-1, then N(A) = 0, which implies that dim N(A)^⊥ = n = dim R(A^T), and hence dim R(A) = n by Theorem 3.17. Conversely, suppose A x_1 = A x_2. Then A^T A x_1 = A^T A x_2, which implies x_1 = x_2 since A^T A is invertible. Thus, A is 1-1.

    Definition 3.23. A : V -> W is invertible (or bijective) if and only if it is 1-1 and onto. Note that if A is invertible, then dim V = dim W. Also, A : R^n -> R^n is invertible or nonsingular if and only if rank(A) = n.

    Note that in the special case when A ∈ R^{n x n} is nonsingular, the transformations A, A^T, and A^{-1} are all 1-1 and onto between the two spaces N(A)^⊥ and R(A). The transformations A^T and A^{-1} have the same domain and range but are in general different maps unless A is orthogonal. Similar remarks apply to A and A^{-T}.


    If a linear transformation is not invertible, it may still be right or left invertible. Definitions of these concepts are followed by a theorem characterizing left and right invertible transformations.

    Definition 3.24. Let A : V -> W. Then

    1. A is said to be right invertible if there exists a right inverse transformation A^{-R} : W -> V such that A A^{-R} = I_W, where I_W denotes the identity transformation on W.

    2. A is said to be left invertible if there exists a left inverse transformation A^{-L} : W -> V such that A^{-L} A = I_V, where I_V denotes the identity transformation on V.

    Theorem 3.25. Let A : V -> W. Then

    1. A is right invertible if and only if it is onto.

    2. A is left invertible if and only if it is 1-1.

    Moreover, A is invertible if and only if it is both right and left invertible, i.e., both 1-1 and onto, in which case A^{-1} = A^{-R} = A^{-L}.

    Note: From Theorem 3.22 we see that if A : R^n -> R^m is onto, then a right inverse is given by A^{-R} = A^T (A A^T)^{-1}. Similarly, if A is 1-1, then a left inverse is given by A^{-L} = (A^T A)^{-1} A^T.

    Theorem 3.26. Let A : V -> V.

    1. If there exists a unique right inverse A^{-R} such that A A^{-R} = I, then A is invertible.

    2. If there exists a unique left inverse A^{-L} such that A^{-L} A = I, then A is invertible.

    Proof: We prove the first part and leave the proof of the second to the reader. Notice the following:

        A (A^{-R} + A^{-R} A - I) = A A^{-R} + A A^{-R} A - A = I + A - A = I.

    Thus, (A^{-R} + A^{-R} A - I) must be a right inverse and, therefore, by uniqueness it must be the case that A^{-R} + A^{-R} A - I = A^{-R}. But this implies that A^{-R} A = I, i.e., that A^{-R} is a left inverse. It then follows from Theorem 3.25 that A is invertible.

    Example 3.27.

    1. Let A = [1 2] : R^2 -> R^1. Then A is onto. (Proof: Take any α ∈ R^1; then one can always find v ∈ R^2 such that [1 2][v_1; v_2] = α.) Obviously A has full row rank (= 1) and A^{-R} = [-1; 1] is a right inverse. Also, it is clear that there are infinitely many right inverses for A. In Chapter 6 we characterize all right inverses of a matrix by characterizing all solutions of the linear matrix equation AR = I.


    2. Let A = [1; 2] : R^1 -> R^2. Then A is 1-1. (Proof: The only solution to 0 = Av = [1; 2]v is v = 0, whence N(A) = 0 so A is 1-1.) It is now obvious that A has full column rank (= 1) and A^{-L} = [3 -1] is a left inverse. Again, it is clear that there are infinitely many left inverses for A. In Chapter 6 we characterize all left inverses of a matrix by characterizing all solutions of the linear matrix equation LA = I.

    3. The matrix shown in the original [not reproduced in this copy], when considered as a linear transformation on R^3, is neither 1-1 nor onto. We give below bases for its four fundamental subspaces. [The bases displayed in the original are likewise not reproduced here.]

    EXERCISES

    1. Let A be the 2 x 3 matrix given in the original (not fully legible in this copy) and consider A as a linear transformation mapping R^3 to R^2. Find the matrix representation of A with respect to the bases of R^3 and of R^2 given there.

    2. Consider the vector space R^{n x n} over R, let S denote the subspace of symmetric matrices, and let R denote the subspace of skew-symmetric matrices. For matrices X, Y ∈ R^{n x n} define their inner product by (X, Y) = Tr(X^T Y). Show that, with respect to this inner product, R = S^⊥.

    3. Consider the differentiation operator L defined in Example 3.2.3. Is L 1-1? Is L onto?

    4. Prove Theorem 3.4.


    5. Prove Theorem 3.11.4.

    6. Prove Theorem 3.12.2.

    7. Determine bases for the four fundamental subspaces of the matrix given in the original [not reproduced in this copy].

    8. Suppose A ∈ R^{m x n} has a left inverse. Show that A^T has a right inverse.

    9. Let A be the 2 x 2 matrix given in the original (not legible in this copy). Determine N(A) and R(A). Are they equal? Is this true in general? If this is true in general, prove it; if not, provide a counterexample.

    10. Suppose A ∈ R_r^{69 x 48} (the rank subscript r in the original is not legible in this copy). How many linearly independent solutions can be found to the homogeneous linear system Ax = 0?

    11. Modify Figure 3.1 to illustrate the four fundamental subspaces associated with A^T ∈ R^{n x m} thought of as a transformation from R^m to R^n.

  • Chapter 4

    Introduction to the Moore-Penrose Pseudoinverse

    In this chapter we give a brief introduction to the Moore-Penrose pseudoinverse, a generalization of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any matrix and, as is shown in the following text, brings great notational and conceptual clarity to the study of solutions to arbitrary systems of linear equations and linear least squares problems.

    4.1 Definitions and Characterizations

    Consider a linear transformation A : X -> Y, where X and Y are arbitrary finite-dimensional vector spaces. Define a transformation T : N(A)^⊥ -> R(A) by

        T x = A x  for all x ∈ N(A)^⊥.

    Then, as noted in the proof of Theorem 3.17, T is bijective (1-1 and onto), and hence we can define a unique inverse transformation T^{-1} : R(A) -> N(A)^⊥. This transformation can be used to give our first definition of A^+, the Moore-Penrose pseudoinverse of A. Unfortunately, the definition neither provides nor suggests a good computational strategy for determining A^+.

    Definition 4.1. With A and T as defined above, define a transformation A^+ : Y -> X by

        A^+ y = T^{-1} y_1,

    where y = y_1 + y_2 with y_1 ∈ R(A) and y_2 ∈ R(A)^⊥. Then A^+ is the Moore-Penrose pseudoinverse of A.

    Although X and Y were arbitrary vector spaces above, let us henceforth consider the case X = R^n and Y = R^m. We have thus defined A^+ for all A ∈ R^{m x n}. A purely algebraic characterization of A^+ is given in the next theorem, which was proved by Penrose in 1955; see [22].


    Theorem 4.2. Let A ∈ R_r^{m x n}. Then G = A^+ if and only if

    (P1) AGA = A.
    (P2) GAG = G.
    (P3) (AG)^T = AG.
    (P4) (GA)^T = GA.

    Furthermore, A^+ always exists and is unique.

    Note that the inverse of a nonsingular matrix satisfies all four Penrose properties. Also, a right or left inverse satisfies no fewer than three of the four properties. Unfortunately, as with Definition 4.1, neither the statement of Theorem 4.2 nor its proof suggests a computational algorithm. However, the Penrose properties do offer the great virtue of providing a checkable criterion in the following sense. Given a matrix G that is a candidate for being the pseudoinverse of A, one need simply verify the four Penrose conditions (P1)-(P4). If G satisfies all four, then by uniqueness, it must be A^+. Such a verification is often relatively straightforward.

    Example 4.3. Consider A = [1; 2]. Verify directly that A^+ = [1/5 2/5] satisfies (P1)-(P4). Note that other left inverses (for example, A^{-L} = [3 -1]) satisfy properties (P1), (P2), and (P4) but not (P3).
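    The following NumPy sketch (my illustration of the checkable criterion above, using the matrices of Example 4.3 as reconstructed here) tests a candidate G against the four Penrose conditions:

        import numpy as np

        def penrose_conditions(A, G, tol=1e-12):
            """Return a tuple of booleans for the four Penrose conditions."""
            return (np.allclose(A @ G @ A, A, atol=tol),        # (P1)
                    np.allclose(G @ A @ G, G, atol=tol),        # (P2)
                    np.allclose((A @ G).T, A @ G, atol=tol),    # (P3)
                    np.allclose((G @ A).T, G @ A, atol=tol))    # (P4)

        A = np.array([[1.0], [2.0]])
        print(penrose_conditions(A, np.array([[0.2, 0.4]])))    # all True: this is A+
        print(penrose_conditions(A, np.array([[3.0, -1.0]])))   # (P3) fails for this left inverse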

    Still another characterization of A^+ is given in the following theorem, whose proof can be found in [1, p. 19]. While not generally suitable for computer implementation, this characterization can be useful for hand calculation of small examples.

    Theorem 4.4. Let A ∈ R_r^{m x n}. Then

        A^+ = lim_{δ -> 0} (A^T A + δ^2 I)^{-1} A^T = lim_{δ -> 0} A^T (A A^T + δ^2 I)^{-1}.

    4.2 Examples

    Each of the following can be derived or verified by using the above definitions or characterizations.

    Example 4.5. A^+ = A^T (A A^T)^{-1} if A is onto (independent rows) (A is right invertible).

    Example 4.6. A^+ = (A^T A)^{-1} A^T if A is 1-1 (independent columns) (A is left invertible).

    Example 4.7. For any scalar α, α^+ = α^{-1} if α ≠ 0, and α^+ = 0 if α = 0.


    Example 4.8. For any vector v ∈ R^n, v^+ = v^T/(v^T v) if v ≠ 0, and v^+ = 0^T if v = 0.

    Example 4.9. [specific matrix example not reproduced in this copy]

    Example 4.10. [specific matrix example not reproduced in this copy]
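    A quick NumPy check (mine, not the book's) of the full-rank formulas in Examples 4.5 and 4.6, using np.linalg.pinv as the reference:

        import numpy as np

        rng = np.random.default_rng(6)
        A_onto = rng.standard_normal((2, 4))     # full row rank (onto) almost surely
        A_1to1 = rng.standard_normal((4, 2))     # full column rank (1-1) almost surely

        # Example 4.5: A+ = A^T (A A^T)^{-1} when A has independent rows.
        assert np.allclose(np.linalg.pinv(A_onto),
                           A_onto.T @ np.linalg.inv(A_onto @ A_onto.T))

        # Example 4.6: A+ = (A^T A)^{-1} A^T when A has independent columns.
        assert np.allclose(np.linalg.pinv(A_1to1),
                           np.linalg.inv(A_1to1.T @ A_1to1) @ A_1to1.T)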

    4.3 Properties and Applications

    This section presents some miscellaneous useful results on pseudoinverses. Many of these are used in the text that follows.

    Theorem 4.11. Let A ∈ R^{m x n} and suppose U ∈ R^{m x m}, V ∈ R^{n x n} are orthogonal (M is orthogonal if M^T = M^{-1}). Then

        (U A V)^+ = V^T A^+ U^T.

    Proof: For the proof, simply verify that the expression above does indeed satisfy each of the four Penrose conditions.

    Theorem 4.12. Let S ∈ R^{n x n} be symmetric with U^T S U = D, where U is orthogonal and D is diagonal. Then S^+ = U D^+ U^T, where D^+ is again a diagonal matrix whose diagonal elements are determined according to Example 4.7.

    Theorem 4.13. For all A ∈ R^{m x n},

    1. A^+ = (A^T A)^+ A^T = A^T (A A^T)^+.

    2. (A^T)^+ = (A^+)^T.

    Proof: Both results can be proved using the limit characterization of Theorem 4.4. The proof of the first result is not particularly easy and does not even have the virtue of being especially illuminating. The interested reader can consult the proof in [1, p. 27]. The proof of the second result (which can also be proved easily by verifying the four Penrose conditions) is as follows: [the displayed verification is not reproduced in this copy].


Note that by combining Theorems 4.12 and 4.13 we can, in theory at least, compute the Moore-Penrose pseudoinverse of any matrix (since A A^T and A^T A are symmetric). This turns out to be a poor approach in finite-precision arithmetic, however (see, e.g., [7], [11], [23]), and better methods are suggested in text that follows.
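To make the remark concrete, here is a sketch (our own illustration, not a recommended algorithm) that computes A^+ = (A^T A)^+ A^T via a symmetric eigendecomposition of A^T A, exactly as Theorems 4.12 and 4.13 suggest; squaring the singular values is what makes this numerically inferior in general:

    import numpy as np

    def pinv_via_eig(A, tol=1e-12):
        # Theorem 4.12: (A^T A)^+ = U D^+ U^T from an eigendecomposition of A^T A.
        D, U = np.linalg.eigh(A.T @ A)                               # A^T A = U diag(D) U^T
        Dplus = np.array([1.0 / d if d > tol else 0.0 for d in D])   # Example 4.7, entrywise
        # Theorem 4.13: A^+ = (A^T A)^+ A^T.
        return (U * Dplus) @ U.T @ A.T

    A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]])
    print(np.allclose(pinv_via_eig(A), np.linalg.pinv(A)))   # True on this small example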

Theorem 4.11 is suggestive of a "reverse-order" property for pseudoinverses of products of matrices such as exists for inverses of products. Unfortunately, in general,

    (AB)^+ ≠ B^+ A^+.

As an example consider A = [0  1] and B = [1  1]^T. Then

    (AB)^+ = 1^+ = 1,

while

    B^+ A^+ = [1/2  1/2] [0  1]^T = 1/2.
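A two-line numerical check of this counterexample (using the matrices as reconstructed above):

    import numpy as np

    A = np.array([[0.0, 1.0]])        # 1 x 2
    B = np.array([[1.0], [1.0]])      # 2 x 1
    print(np.linalg.pinv(A @ B))                      # [[1.]]
    print(np.linalg.pinv(B) @ np.linalg.pinv(A))      # [[0.5]]  -> (AB)^+ != B^+ A^+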

However, necessary and sufficient conditions under which the reverse-order property does hold are known and we quote a couple of moderately useful results for reference.

Theorem 4.14. (AB)^+ = B^+ A^+ if and only if

    R(B B^T A^T) ⊆ R(A^T)  and  R(A^T A B) ⊆ R(B).

Proof: For the proof, see [9]. □

Theorem 4.15. (AB)^+ = B_1^+ A_1^+, where B_1 = A^+ A B and A_1 = A B_1 B_1^+.

Proof: For the proof, see [5]. □

Theorem 4.16. If A ∈ R^{n×r} has full column rank r and B ∈ R^{r×m} has full row rank r, then (AB)^+ = B^+ A^+.

Proof: Since A has full column rank, A^+ = (A^T A)^{-1} A^T, whence A^+ A = I_r. Similarly, since B has full row rank, we have B^+ = B^T (B B^T)^{-1}, whence B B^+ = I_r. The result then follows by taking B_1 = B, A_1 = A in Theorem 4.15. □

    The following theorem gives some additional useful properties of pseudoinverses.

Theorem 4.17. For all A ∈ R^{m×n},


Note: Recall that A ∈ R^{n×n} is normal if A A^T = A^T A. For example, if A is symmetric, skew-symmetric, or orthogonal, then it is normal. However, a matrix can be none of the preceding but still be normal, such as

    A = [ a   b ]
        [ -b  a ]

for scalars a, b ∈ R.

The next theorem is fundamental to facilitating a compact and unifying approach to studying the existence of solutions of (matrix) linear equations and linear least squares problems.

Theorem 4.18. Suppose A ∈ R^{n×p}, B ∈ R^{n×m}. Then R(B) ⊆ R(A) if and only if AA^+B = B.

Proof: Suppose R(B) ⊆ R(A) and take arbitrary x ∈ R^m. Then Bx ∈ R(B) ⊆ R(A), so there exists a vector y ∈ R^p such that Ay = Bx. Then we have

    Bx = Ay = AA^+Ay = AA^+Bx,

where one of the Penrose properties is used above. Since x was arbitrary, we have shown that B = AA^+B.

To prove the converse, assume that AA^+B = B and take arbitrary y ∈ R(B). Then there exists a vector x ∈ R^m such that Bx = y, whereupon

    y = Bx = AA^+Bx ∈ R(A).  □
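Theorem 4.18 thus gives a purely computational test for a range inclusion. A small sketch (the random matrices are illustrative only):

    import numpy as np

    def range_contained_in(B, A, tol=1e-10):
        """Return True if R(B) is contained in R(A), via the test A A^+ B = B."""
        return np.allclose(A @ np.linalg.pinv(A) @ B, B, atol=tol)

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 2))
    print(range_contained_in(A @ rng.standard_normal((2, 3)), A))   # True: columns lie in R(A)
    print(range_contained_in(rng.standard_normal((5, 3)), A))       # False (almost surely)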

EXERCISES

1. Use Theorem 4.4 to compute the pseudoinverse of

    2 21

2. If x, y ∈ R^n, show that (x y^T)^+ = (x^T x)^+ (y^T y)^+ y x^T.

3. For A ∈ R^{m×n}, prove that R(A) = R(A A^T) using only definitions and elementary properties of the Moore-Penrose pseudoinverse.

4. For A ∈ R^{m×n}, prove that R(A^+) = R(A^T).

5. For A ∈ R^{p×n} and B ∈ R^{m×n}, show that N(A) ⊆ N(B) if and only if B A^+ A = B.

6. Let A ∈ R^{n×n}, B ∈ R^{n×m}, and D ∈ R^{m×m} and suppose further that D is nonsingular.

    (a) Prove or disprove that

    (b) Prove or disprove that


Chapter 5

Introduction to the Singular Value Decomposition

In this chapter we give a brief introduction to the singular value decomposition (SVD). We show that every matrix has an SVD and describe some useful properties and applications of this important matrix factorization. The SVD plays a key conceptual and computational role throughout (numerical) linear algebra and its applications.

5.1 The Fundamental Theorem

Theorem 5.1. Let A ∈ R^{m×n} have rank r. Then there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} such that

    A = U Σ V^T,   (5.1)

where

    Σ = [ S  0 ] ∈ R^{m×n},   S = diag(σ_1, ..., σ_r) ∈ R^{r×r},
        [ 0  0 ]

and σ_1 ≥ ··· ≥ σ_r > 0. More specifically, we have

    A = [U_1  U_2] [ S  0 ] [V_1  V_2]^T   (5.2)
                   [ 0  0 ]
      = U_1 S V_1^T.   (5.3)

The submatrix sizes are all determined by r (which must be ≤ min{m, n}), i.e., U_1 ∈ R^{m×r}, U_2 ∈ R^{m×(m-r)}, V_1 ∈ R^{n×r}, V_2 ∈ R^{n×(n-r)}, and the 0-subblocks in Σ are compatibly dimensioned.

Proof: Since A^T A ≥ 0 (A^T A is symmetric and nonnegative definite; recall, for example, [24, Ch. 6]), its eigenvalues are all real and nonnegative. (Note: The rest of the proof follows analogously if we start with the observation that A A^T ≥ 0 and the details are left to the reader as an exercise.) Denote the set of eigenvalues of A^T A by {σ_i^2, i ∈ n} with σ_1 ≥ ··· ≥ σ_r > 0 = σ_{r+1} = ··· = σ_n. Let {v_i, i ∈ n} be a set of corresponding orthonormal eigenvectors and let V_1 = [v_1, ..., v_r], V_2 = [v_{r+1}, ..., v_n]. Letting S = diag(σ_1, ..., σ_r), we can write A^T A V_1 = V_1 S^2. Premultiplying by V_1^T gives V_1^T A^T A V_1 = V_1^T V_1 S^2 = S^2, the latter equality following from the orthonormality of the v_i vectors. Pre- and postmultiplying by S^{-1} gives the equation

    S^{-1} V_1^T A^T A V_1 S^{-1} = I.   (5.4)


Turning now to the eigenvalue equations corresponding to the eigenvalues σ_{r+1}, ..., σ_n we have that A^T A V_2 = V_2 0 = 0, whence V_2^T A^T A V_2 = 0. Thus, A V_2 = 0. Now define the matrix U_1 ∈ R^{m×r} by U_1 = A V_1 S^{-1}. Then from (5.4) we see that U_1^T U_1 = I; i.e., the columns of U_1 are orthonormal. Choose any matrix U_2 ∈ R^{m×(m-r)} such that [U_1  U_2] is orthogonal. Then

    U^T A V = [ U_1^T A V_1   U_1^T A V_2 ] = [ U_1^T A V_1   0 ]
              [ U_2^T A V_1   U_2^T A V_2 ]   [ U_2^T A V_1   0 ]

since A V_2 = 0. Referring to the equation U_1 = A V_1 S^{-1} defining U_1, we see that U_1^T A V_1 = S and U_2^T A V_1 = U_2^T U_1 S = 0. The latter equality follows from the orthogonality of the columns of U_1 and U_2. Thus, we see that, in fact,

    U^T A V = [ S  0 ]
              [ 0  0 ],

and defining this matrix to be Σ completes the proof. □

Definition 5.2. Let A = U Σ V^T be an SVD of A as in Theorem 5.1.

1. The set {σ_1, ..., σ_r} is called the set of (nonzero) singular values of the matrix A and is denoted Σ(A). From the proof of Theorem 5.1 we see that σ_i(A) = λ_i^{1/2}(A^T A) = λ_i^{1/2}(A A^T). Note that there are also min{m, n} - r zero singular values.

2. The columns of U are called the left singular vectors of A (and are the orthonormal eigenvectors of A A^T).

3. The columns of V are called the right singular vectors of A (and are the orthonormal eigenvectors of A^T A).
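These characterizations are easy to confirm numerically. A brief sketch (random matrix chosen for illustration; a library SVD returns the singular values already sorted in decreasing order):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 3))

    U, s, Vt = np.linalg.svd(A)        # full SVD: A = U @ Sigma @ Vt

    # Reconstruct A = U Sigma V^T.
    Sigma = np.zeros(A.shape)
    np.fill_diagonal(Sigma, s)
    print(np.allclose(U @ Sigma @ Vt, A))                     # True

    # Singular values are the square roots of the eigenvalues of A^T A.
    evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
    print(np.allclose(s**2, evals))                           # True

    # Columns of V (rows of Vt) are eigenvectors of A^T A; columns of U of A A^T.
    print(np.allclose(A.T @ A @ Vt.T, Vt.T * s**2))           # True
    print(np.allclose(A @ A.T @ U[:, :3], U[:, :3] * s**2))   # True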

Remark 5.3. The analogous complex case in which A ∈ C^{m×n} is quite straightforward. The decomposition is A = U Σ V^H, where U and V are unitary and the proof is essentially identical, except for Hermitian transposes replacing transposes.

Remark 5.4. Note that U and V can be interpreted as changes of basis in both the domain and co-domain spaces with respect to which A then has a diagonal matrix representation. Specifically, thinking of A as a linear transformation mapping R^n to R^m and rewriting A = U Σ V^T as A V = U Σ, we see that the matrix representation of this transformation is Σ with respect to the bases {v_1, ..., v_n} for R^n and {u_1, ..., u_m} for R^m (see the discussion in Section 3.2). See also Remark 5.16.

Remark 5.5. The singular value decomposition is not unique. For example, an examination of the proof of Theorem 5.1 reveals that

  • any orthonormal basis for N(A) can be used for V_2;
  • there may be nonuniqueness associated with the columns of V_1 (and hence U_1) corresponding to multiple σ_i's;
  • any U_2 can be used so long as [U_1  U_2] is orthogonal;
  • columns of U and V can be changed (in tandem) by sign (or multiplier of the form e^{jθ} in the complex case).

What is unique, however, is the matrix Σ and the span of the columns of U_1, U_2, V_1, and V_2 (see Theorem 5.11). Note, too, that a "full SVD" (5.2) can always be constructed from a "compact SVD" (5.3).

Remark 5.6. Computing an SVD by working directly with the eigenproblem for A^T A or A A^T is numerically poor in finite-precision arithmetic. Better algorithms exist that work directly on A via a sequence of orthogonal transformations; see, e.g., [7], [11], [25].

Example 5.7.

    where U is an arbitrary 2 × 2 orthogonal matrix, is an SVD.

Example 5.8.

    where θ is arbitrary, is an SVD.

Example 5.9.

    is an SVD.

Example 5.10. Let A ∈ R^{n×n} be symmetric and positive definite. Let V be an orthogonal matrix of eigenvectors that diagonalizes A, i.e., V^T A V = Λ > 0. Then A = V Λ V^T is an SVD of A.

A factorization U Σ V^T of an m × n matrix A qualifies as an SVD if U and V are orthogonal and Σ is an m × n "diagonal" matrix whose diagonal elements in the upper left corner are positive (and ordered). For example, if A = U Σ V^T is an SVD of A, then V Σ^T U^T is an SVD of A^T.
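Example 5.10 is easy to check in code. The sketch below (with a randomly generated symmetric positive definite matrix, chosen only for illustration) confirms that an orthogonal eigendecomposition of such a matrix already meets the requirements of an SVD:

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((4, 4))
    A = M @ M.T + 4 * np.eye(4)             # symmetric positive definite

    lam, V = np.linalg.eigh(A)              # A = V diag(lam) V^T, lam ascending and positive
    lam, V = lam[::-1], V[:, ::-1]          # reorder so the "singular values" decrease

    print(np.allclose(V @ np.diag(lam) @ V.T, A))        # True: A = V Lambda V^T
    print(np.allclose(V.T @ V, np.eye(4)))               # True: V is orthogonal
    print(np.allclose(np.linalg.svd(A)[1], lam))         # True: eigenvalues = singular values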


5.2 Some Basic Properties

Theorem 5.11. Let A ∈ R^{m×n} have a singular value decomposition A = U Σ V^T. Using the notation of Theorem 5.1, the following properties hold:

    1. rank(A) = r = the number of nonzero singular values of A.

2. Let U = [u_1, ..., u_m] and V = [v_1, ..., v_n]. Then A has the dyadic (or outer product) expansion

       A = Σ_{i=1}^{r} σ_i u_i v_i^T.   (5.5)

3. The singular vectors satisfy the relations

       A v_i = σ_i u_i,   (5.6)
       A^T u_i = σ_i v_i   (5.7)

   for i ∈ r.

4. Let U_1, U_2, V_1, V_2 be as defined in Theorem 5.1. Then the columns of U_1, U_2, V_1, and V_2 are orthonormal bases for R(A), N(A^T), N(A)^⊥ = R(A^T), and N(A), respectively.

Remark 5.12. Part 4 of the above theorem provides a numerically superior method for finding (orthonormal) bases for the four fundamental subspaces compared to methods based on, for example, reduction to row or column echelon form. Note that each subspace requires knowledge of the rank r. The relationship to the four fundamental subspaces is summarized nicely in Figure 5.1.

Figure 5.1. SVD and the four fundamental subspaces.

Remark 5.13. The elegance of the dyadic decomposition (5.5) as a sum of outer products and the key vector relations (5.6) and (5.7) explain why it is conventional to write the SVD as A = U Σ V^T rather than, say, A = U Σ V.

Theorem 5.14. Let A ∈ R^{m×n} have a singular value decomposition A = U Σ V^T as in Theorem 5.1. Then

       A^+ = V Σ^+ U^T,

where

       Σ^+ = [ S^{-1}  0 ] ∈ R^{n×m},
             [ 0       0 ]

with the 0-subblocks appropriately sized. Furthermore, if we let the columns of U and V be as defined in Theorem 5.11, then

       A^+ = Σ_{i=1}^{r} (1/σ_i) v_i u_i^T.

Proof: The proof follows easily by verifying the four Penrose conditions. □

Remark 5.15. Note that none of the expressions above quite qualifies as an SVD of A^+ if we insist that the singular values be ordered from largest to smallest. However, a simple reordering accomplishes the task:

This can also be written in matrix terms by using the so-called reverse-order identity matrix (or exchange matrix) P = [e_r, e_{r-1}, ..., e_2, e_1], which is clearly orthogonal and symmetric.

is the matrix version of (5.11). A "full SVD" can be similarly constructed.
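Theorem 5.14 is also how the pseudoinverse is typically computed in practice. A short sketch (random matrix for illustration) builds A^+ from a compact SVD and checks it against a library pseudoinverse:

    import numpy as np

    def pinv_from_svd(A, tol=1e-12):
        U1, s, V1t = np.linalg.svd(A, full_matrices=False)    # compact SVD: A = U1 diag(s) V1^T
        s_inv = np.array([1.0 / si if si > tol else 0.0 for si in s])
        return (V1t.T * s_inv) @ U1.T                         # sum of (1/sigma_i) v_i u_i^T

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # a rank-3, 5 x 4 matrix
    print(np.allclose(pinv_from_svd(A), np.linalg.pinv(A)))         # True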

Remark 5.16. Recall the linear transformation T used in the proof of Theorem 3.17 and in Definition 4.1. Since T is determined by its action on a basis, and since {v_1, ..., v_r} is a basis for N(A)^⊥, then T can be defined by T v_i = σ_i u_i, i ∈ r. Similarly, since {u_1, ..., u_r} is a basis for R(A), then T^{-1} can be defined by T^{-1} u_i = (1/σ_i) v_i, i ∈ r. From Section 3.2, the matrix representation for T with respect to the bases {v_1, ..., v_r} and {u_1, ..., u_r} is clearly S, while the matrix representation for the inverse linear transformation T^{-1} with respect to the same bases is S^{-1}.

5.3 Row and Column Compressions

Row compression

Let A ∈ R^{m×n} have an SVD given by (5.1). Then

    U^T A = Σ V^T = [ S V_1^T ]
                    [    0    ] ∈ R^{m×n}.

Notice that N(A) = N(U^T A) = N(S V_1^T) and the matrix S V_1^T ∈ R^{r×n} has full row rank. In other words, premultiplication of A by U^T is an orthogonal transformation that "compresses" A by row transformations. Such a row compression can also be accomplished by orthogonal row transformations performed directly on A to reduce it to the form

    [ R ]
    [ 0 ],

where R is upper triangular. Both compressions are analogous to the so-called row-reduced echelon form which, when derived by a Gaussian elimination algorithm implemented in finite-precision arithmetic, is not generally as reliable a procedure.

Column compression

Again, let A ∈ R^{m×n} have an SVD given by (5.1). Then

    A V = U Σ = [ U_1 S   0 ] ∈ R^{m×n}.

This time, notice that R(A) = R(A V) = R(U_1 S) and the matrix U_1 S ∈ R^{m×r} has full column rank. In other words, postmultiplication of A by V is an orthogonal transformation that "compresses" A by column transformations. Such a compression is analogous to the


so-called column-reduced echelon form, which is not generally a reliable procedure when performed by Gauss transformations in finite-precision arithmetic. For details, see, for example, [7], [11], [23], [25].
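The two compressions are easy to visualize numerically. A small sketch (random low-rank matrix for illustration) shows that U^T A has only r nonzero rows and A V has only r nonzero columns:

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # rank-2, 6 x 5

    U, s, Vt = np.linalg.svd(A)          # full SVD: U is 6x6, Vt is 5x5
    r = np.sum(s > 1e-10)                # numerical rank (here r = 2)

    row_compressed = U.T @ A             # = Sigma V^T: last m - r rows are (numerically) zero
    col_compressed = A @ Vt.T            # = U Sigma:   last n - r columns are (numerically) zero

    print(r)                                              # 2
    print(np.allclose(row_compressed[r:, :], 0))          # True
    print(np.allclose(col_compressed[:, r:], 0))          # True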

    EXERCISES

1. Let X ∈ R^{m×n}. If X^T X = 0, show that X = 0.

2. Prove Theorem 5.1 starting from the observation that A A^T ≥ 0.

3. Let A ∈ R^{n×n} be symmetric but indefinite. Determine an SVD of A.

4. Let x ∈ R^m, y ∈ R^n be nonzero vectors. Determine an SVD of the matrix A ∈ R^{m×n} defined by A = x y^T.

5. Determine SVDs of the matrices

6. Let A ∈ R^{m×n} and suppose W ∈ R^{m×m} and Y ∈ R^{n×n} are orthogonal.

   (a) Show that A and W A Y have the same singular values (and hence the same rank).
   (b) Suppose that W and Y are nonsingular but not necessarily orthogonal. Do A and W A Y have the same singular values? Do they have the same rank?

7. Let A ∈ R^{n×n}. Use the SVD to determine a polar factorization of A, i.e., A = Q P where Q is orthogonal and P = P^T > 0. Note: this is analogous to the polar form z = re^{jθ} of a complex scalar z (where j = i = √-1).


Chapter 6

    Linear Equations

In this chapter we examine existence and uniqueness of solutions of systems of linear equations. General linear systems of the form

    A X = B;   A ∈ R^{m×n}, B ∈ R^{m×k},   (6.1)

are studied and include, as a special case, the familiar vector system

    A x = b;   A ∈ R^{n×n}, b ∈ R^n.   (6.2)

6.1 Vector Linear Equations

We begin with a review of some of the principal results associated with vector linear systems.

Theorem 6.1. Consider the system of linear equations

    A x = b;   A ∈ R^{m×n}, b ∈ R^m.   (6.3)

1. There exists a solution to (6.3) if and only if b ∈ R(A).

2. There exists a solution to (6.3) for all b ∈ R^m if and only if R(A) = R^m, i.e., A is onto; equivalently, there exists a solution if and only if rank([A, b]) = rank(A), and this is possible only if m ≤ n (since m = dim R(A) = rank(A) ≤ min{m, n}).

3. A solution to (6.3) is unique if and only if N(A) = 0, i.e., A is 1-1.

4. There exists a unique solution to (6.3) for all b ∈ R^m if and only if A is nonsingular; equivalently, A ∈ R^{m×m} and A has neither a 0 singular value nor a 0 eigenvalue.

5. There exists at most one solution to (6.3) for all b ∈ R^m if and only if the columns of A are linearly independent, i.e., N(A) = 0, and this is possible only if m ≥ n.

6. There exists a nontrivial solution to the homogeneous system Ax = 0 if and only if rank(A) < n.


Proof: The proofs are straightforward and can be consulted in standard texts on linear algebra. Note that some parts of the theorem follow directly from others. For example, to prove part 6, note that x = 0 is always a solution to the homogeneous system. Therefore, we must have the case of a nonunique solution, i.e., A is not 1-1, which implies rank(A) < n by part 3. □

6.2 Matrix Linear Equations

In this section we present some of the principal results concerning existence and uniqueness of solutions to the general matrix linear system (6.1). Note that the results of Theorem 6.1 follow from those below for the special case k = 1, while results for (6.2) follow by specializing even further to the case m = n.

Theorem 6.2 (Existence). The matrix linear equation

    A X = B;   A ∈ R^{m×n}, B ∈ R^{m×k},   (6.4)

has a solution if and only if R(B) ⊆ R(A); equivalently, a solution exists if and only if AA^+B = B.

Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix. The matrix criterion is Theorem 4.18. □

Theorem 6.3. Let A ∈ R^{m×n}, B ∈ R^{m×k} and suppose that AA^+B = B. Then any matrix of the form

    X = A^+B + (I - A^+A)Y,  where Y ∈ R^{n×k} is arbitrary,   (6.5)

is a solution of

    A X = B.   (6.6)

Furthermore, all solutions of (6.6) are of this form.

Proof: To verify that (6.5) is a solution, premultiply by A:

    A X = A A^+ B + A(I - A^+A)Y = B + (A - A A^+ A)Y = B,

the last equality following from AA^+B = B (by hypothesis) and the Penrose property AA^+A = A. That all solutions are of this form can be seen as follows. Let Z be an arbitrary solution of (6.6), i.e., A Z = B. Then we can write

    Z = A^+ A Z + (I - A^+ A) Z
      = A^+ B + (I - A^+ A) Z,

and this is clearly of the form (6.5). □
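A quick numerical illustration of Theorem 6.3 (a consistent system is constructed at random for the purpose): every choice of Y below yields a solution of AX = B, and A^+ B is the particular solution corresponding to Y = 0.

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((4, 6))              # wide, so N(A) is nontrivial
    B = A @ rng.standard_normal((6, 2))          # guarantees R(B) is contained in R(A)

    Aplus = np.linalg.pinv(A)
    print(np.allclose(A @ Aplus @ B, B))         # True: solvability test of Theorem 6.2

    for _ in range(3):
        Y = rng.standard_normal((6, 2))          # arbitrary Y
        X = Aplus @ B + (np.eye(6) - Aplus @ A) @ Y
        print(np.allclose(A @ X, B))             # True each time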


Remark 6.4. When A is square and nonsingular, A^+ = A^{-1} and so (I - A^+A) = 0. Thus, there is no "arbitrary" component, leaving only the unique solution X = A^{-1}B.

Remark 6.5. It can be shown that the particular solution X = A^+B is the solution of (6.6) that minimizes Tr X^T X. (Tr(·) denotes the trace of a matrix; recall that Tr X^T X = Σ_{i,j} x_{ij}^2.)

Theorem 6.6 (Uniqueness). A solution of the matrix linear equation

    A X = B;   A ∈ R^{m×n}, B ∈ R^{m×k},   (6.7)

is unique if and only if A^+A = I; equivalently, (6.7) has a unique solution if and only if N(A) = 0.

Proof: The first equivalence is immediate from Theorem 6.3. The second follows by noting that A^+A = I can occur only if r = n, where r = rank(A) (recall r ≤ n). But rank(A) = n if and only if A is 1-1 or N(A) = 0. □

Example 6.7. Suppose A ∈ R^{n×n}. Find all solutions of the homogeneous system Ax = 0.

Solution:

    x = A^+ 0 + (I - A^+A)y = (I - A^+A)y,

where y ∈ R^n is arbitrary. Hence, there exists a nonzero solution if and only if A^+A ≠ I. This is equivalent to either rank(A) = r < n or A being singular. Clearly, if there exists a nonzero solution, it is not unique.

Computation: Since y is arbitrary, it is easy to see that all solutions are generated from a basis for R(I - A^+A). But if A has an SVD given by A = U Σ V^T, then it is easily checked that I - A^+A = V_2 V_2^T and R(V_2 V_2^T) = R(V_2) = N(A).

Example 6.8. Characterize all right inverses of a matrix A ∈ R^{m×n}; equivalently, find all solutions R of the equation AR = I_m. Here, we write I_m to emphasize the m × m identity matrix.

Solution: There exists a right inverse if and only if R(I_m) ⊆ R(A) and this is equivalent to AA^+I_m = I_m. Clearly, this can occur if and only if rank(A) = r = m (since r ≤ m) and this is equivalent to A being onto (A^+ is then a right inverse). All right inverses of A are then of the form

    R = A^+ I_m + (I - A^+A)Y = A^+ + (I - A^+A)Y,

where Y ∈ R^{n×m} is arbitrary. There is a unique right inverse if and only if A^+A = I (N(A) = 0), in which case A must be invertible and R = A^{-1}.
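A short check of Example 6.8 (a random wide matrix, which is onto with probability 1): every Y gives a right inverse, and A^+ is the particular one with Y = 0.

    import numpy as np

    rng = np.random.default_rng(7)
    m, n = 3, 5
    A = rng.standard_normal((m, n))              # onto, so right inverses exist

    Aplus = np.linalg.pinv(A)
    Y = rng.standard_normal((n, m))
    R = Aplus + (np.eye(n) - Aplus @ A) @ Y      # a right inverse built from arbitrary Y

    print(np.allclose(A @ Aplus, np.eye(m)))     # True: A^+ itself is a right inverse
    print(np.allclose(A @ R, np.eye(m)))         # True: so is R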


Example 6.9. Consider the system of linear first-order difference equations

    x_{k+1} = A x_k + B u_k,   k ≥ 0,   (6.8)

with A ∈ R^{n×n} and B ∈ R^{n×m} (n ≥ 1, m ≥ 1). The vector x_k in linear system theory is known as the state vector at time k while u_k is the input (control) vector. The general solution of (6.8) is given by

    x_k = A^k x_0 + Σ_{j=0}^{k-1} A^{k-1-j} B u_j
        = A^k x_0 + [B, AB, ..., A^{k-1}B] [u_{k-1}; u_{k-2}; ...; u_0]   (6.9)

for k ≥ 1. We might now ask the question: Given x_0 = 0, does there exist an input sequence {u_j}_{j=0}^{k-1} such that x_k takes an arbitrary value in R^n? In linear system theory, this is a question of reachability. Since m ≥ 1, from the fundamental Existence Theorem, Theorem 6.2, we see that (6.8) is reachable if and only if

    R([B, AB, ..., A^{n-1}B]) = R^n

or, equivalently, if and only if

    rank [B, AB, ..., A^{n-1}B] = n.

A related question is the following: Given an arbitrary initial vector x_0, does there exist an input sequence {u_j}_{j=0}^{n-1} such that x_n = 0? In linear system theory, this is called controllability. Again from Theorem 6.2, we see that (6.8) is controllable if and only if

    R(A^n) ⊆ R([B, AB, ..., A^{n-1}B]).

Clearly, reachability always implies controllability and, if A is nonsingular, controllability and reachability are equivalent. The matrices A = [0  1; 0  0] and B = [1; 0] provide an example of a system that is controllable but not reachable.

The above are standard conditions with analogues for continuous-time models (i.e., linear differential equations). There are many other algebraically equivalent conditions.
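The reachability and controllability tests reduce to rank and range-inclusion computations that are easy to script. A sketch (using the example pair as reconstructed above, so treat the specific A and B as illustrative):

    import numpy as np

    def reachability_matrix(A, B):
        """Form [B, AB, ..., A^{n-1}B]."""
        n = A.shape[0]
        blocks, Ak = [], np.eye(n)
        for _ in range(n):
            blocks.append(Ak @ B)
            Ak = Ak @ A
        return np.hstack(blocks)

    A = np.array([[0.0, 1.0], [0.0, 0.0]])      # illustrative pair: controllable, not reachable
    B = np.array([[1.0], [0.0]])

    R = reachability_matrix(A, B)
    n = A.shape[0]
    print(np.linalg.matrix_rank(R) == n)                    # False: not reachable

    # Controllability: R(A^n) contained in R([B, AB, ..., A^{n-1}B]),
    # checked with the range-inclusion test of Theorem 4.18.
    An = np.linalg.matrix_power(A, n)
    print(np.allclose(R @ np.linalg.pinv(R) @ An, An))      # True: controllable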

Example 6.10. We now introduce an output vector y_k to the system (6.8) of Example 6.9 by appending the equation

    y_k = C x_k + D u_k   (6.10)

with C ∈ R^{p×n} and D ∈ R^{p×m} (p ≥ 1). We can then pose some new questions about the overall system that are dual in the system-theoretic sense to reachability and controllability. The answers are cast in terms that are dual in the linear algebra sense as well. The condition dual to reachability is called observability: When does knowledge of {u_j}_{j=0}^{n-1} and {y_j}_{j=0}^{n-1} suffice to determine (uniquely) x_0? As a dual to controllability, we have the notion of reconstructibility: When does knowledge of {u_j}_{j=0}^{n-1} and {y_j}_{j=0}^{n-1} suffice to determine (uniquely) x_n? The fundamental duality result from linear system theory is the following:

    (A, B) is reachable [controllable] if and only if (A^T, B^T) is observable [reconstructible].


    To derive a condition for observability, notice that

    Thus,

Let v denote the (known) vector on the left-hand side of (6.13) and let R denote the matrix on the right-hand side. Then, by definition, v ∈ R(R), so a solution exists. By the fundamental Uniqueness Theorem, Theorem 6.6, the solution is then unique if and only if N(R) = 0, or, equivalently, if and only if

    rank [C; CA; ...; CA^{n-1}] = n.

6.3 A More General Matrix Linear Equation

Theorem 6.11. Let A ∈ R^{m×n}, B ∈ R^{m×q}, and C ∈ R^{p×q}. Then the equation

    A X C = B   (6.14)

has a solution if and only if AA^+BC^+C = B, in which case the general solution is of the form

    X = A^+ B C^+ + Y - A^+ A Y C C^+,   (6.15)

where Y ∈ R^{n×p} is arbitrary.
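A numerical sanity check of Theorem 6.11, using the general-solution formula as reconstructed above (so treat the exact expression as a sketch): a consistent equation AXC = B is built by construction, and each choice of Y yields a solution.

    import numpy as np

    rng = np.random.default_rng(8)
    m, n, p, q = 3, 5, 4, 2
    A = rng.standard_normal((m, n))
    C = rng.standard_normal((p, q))
    B = A @ rng.standard_normal((n, p)) @ C      # consistent by construction

    Ap, Cp = np.linalg.pinv(A), np.linalg.pinv(C)
    print(np.allclose(A @ Ap @ B @ Cp @ C, B))   # True: solvability criterion holds

    for _ in range(3):
        Y = rng.standard_normal((n, p))
        X = Ap @ B @ Cp + Y - Ap @ A @ Y @ C @ Cp
        print(np.allclose(A @ X @ C, B))         # True each time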

A compact matrix criterion for uniqueness of solutions to (6.14) requires the notion of the Kronecker product of matrices for its statement. Such a criterion (involving CC^+ and A^+A) is given later in the text, where Kronecker products are studied.

6.4 Some Useful and Interesting Inverses

1. (A + BDC)^{-1} = A^{-1} - A^{-1}B(D^{-1} + CA^{-1}B)^{-1}CA^{-1}.
   This result is known as the Sherman-Morrison-Woodbury formula. It has many applications (and is frequently "rediscovered") including, for example, formulas for the inverse of a sum of matrices such as (A + D)^{-1} or (A^{-1} + D^{-1})^{-1}. It also yields very efficient "updating" or "downdating" formulas in expressions such as (A + xx^T)^{-1} (with symmetric A ∈ R^{n×n} and x ∈ R^n) that arise in optimization theory.

   where E = (D - CA^{-1}B)^{-1} (E is the inverse of the Schur complement of A). This result follows easily from the block LU factorization in property 16 of Section 1.4.

   where F = (A - BD^{-1}C)^{-1}. This result follows easily from the block UL factorization in property 17 of Section 1.4.

   Both of these matrices satisfy the matrix equation X^2 = I, from which it is obvious that X^{-1} = X. Note that the positions of the I and -I blocks may be exchanged.

EXERCISES

1. As in Example 6.8, characterize all left inverses of a matrix A ∈ R^{m×n}.

2. Let A ∈ R^{m×n}, B ∈ R^{m×k} and suppose A has an SVD as in Theorem 5.1. Assuming R(B) ⊆ R(A), characterize all solutions of the matrix linear equation

       A X = B

   in terms of the SVD of A.


3. Let x, y ∈ R^n and suppose further that x^T y ≠ 1. Show that

4. Let x, y ∈ R^n and suppose further that x^T y ≠ 1. Show that

where c = 1/(1 - x^T y).

5. Let A ∈ R^{n×n} and let A^{-1} have columns c_1, ..., c_n and individual elements γ_{ij}. Assume that γ_{ji} ≠ 0 for some i and j. Show that the matrix B = A - (1/γ_{ji}) e_i e_j^T (i.e., A with 1/γ_{ji} subtracted from its (ij)th element) is singular.
Hint: Show that c_i ∈ N(B).


Chapter 7

Projections, Inner Product Spaces, and Norms

7.1 Projections

Definition 7.1. Let V be a vector space with V = X ⊕ Y. By Theorem 2.26, every v ∈ V has a unique decomposition v = x + y with x ∈ X and y ∈ Y. Define P_{X,Y} : V -> X ⊆ V by

    P_{X,Y} v = x   for all v ∈ V.

P_{X,Y} is called the (oblique) projection on X along Y. Figure 7.1 displays the projection of v on both X and Y in the case V = R^2.

Figure 7.1. Oblique projections.

Theorem 7.2. P_{X,Y} is linear and P_{X,Y}^2 = P_{X,Y}.

Theorem 7.3. A linear transformation P is a projection if and only if it is idempotent, i.e., P^2 = P. Also, P is a projection if and only if I - P is a projection. In fact, P_{Y,X} = I - P_{X,Y}.

Proof: Suppose P is a projection, say on X along Y (using the notation of Definition 7.1).


Let v ∈ V be arbitrary. Then Pv = P(x + y) = Px = x. Moreover, P^2v = PPv = Px = x = Pv. Thus, P^2 = P. Conversely, suppose P^2 = P. Let X = {v ∈ V : Pv = v} and Y = {v ∈ V : Pv = 0}. It is easy to check that X and Y are subspaces. We now prove that V = X ⊕ Y. First note that if v ∈ X, then Pv = v. If v ∈ Y, then Pv = 0. Hence if v ∈ X ∩ Y, then v = 0. Now let v ∈ V be arbitrary. Then v = Pv + (I - P)v. Let x = Pv, y = (I - P)v. Then Px = P^2v = Pv = x so x ∈ X, while Py = P(I - P)v = Pv - P^2v = 0 so y ∈ Y. Thus, V = X ⊕ Y and the projection on X along Y is P. Essentially the same argument shows that I - P is the projection on Y along X. □

Definition 7.4. In the special case where Y = X^⊥, P_{X,X^⊥} is called an orthogonal projection and we then use the notation P_X = P_{X,X^⊥}.

Theorem 7.5. P ∈ R^{n×n} is the matrix of an orthogonal projection (onto R(P)) if and only if P^2 = P = P^T.

Proof: Let P be an orthogonal projection (on X, say, along X^⊥) and let x, y ∈ R^n be arbitrary. Note that (I - P)x = (I - P_{X,X^⊥})x = P_{X^⊥,X} x by Theorem 7.3. Thus, (I - P)x ∈ X^⊥. Since Py ∈ X, we have (Py)^T(I - P)x = y^T P^T (I - P)x = 0. Since x and y were arbitrary, we must have P^T(I - P) = 0. Hence P^T = P^T P = P, with the second equality following since P^T P is symmetric. Conversely, suppose P is a symmetric projection matrix and let x be arbitrary. Write x = Px + (I - P)x. Then x^T P^T (I - P)x = x^T P(I - P)x = 0. Thus, since Px ∈ R(P), then (I - P)x ∈ R(P)^⊥ and P must be an orthogonal projection. □

    7.1.1 The four fundamental orthogonal projections

Using the notation of Theorems 5.1 and 5.11, let A ∈ R^{m×n} with SVD A = U Σ V^T = U_1 S V_1^T. Then

    P_{R(A)}    = A A^+     = U_1 U_1^T,
    P_{R(A)^⊥}  = I - A A^+ = U_2 U_2^T,
    P_{N(A)}    = I - A^+ A = V_2 V_2^T,
    P_{N(A)^⊥}  = A^+ A     = V_1 V_1^T

are easily checked to be (unique) orthogonal projections onto the respective four fundamental subspaces.
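A numerical check of these four projections (random matrix for illustration): each matrix below is symmetric and idempotent, as Theorem 7.5 requires, and projects onto the advertised subspace.

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2, 5 x 4

    Ap = np.linalg.pinv(A)
    P_range      = A @ Ap                 # projects onto R(A)
    P_range_perp = np.eye(5) - A @ Ap     # projects onto R(A)^perp = N(A^T)
    P_null       = np.eye(4) - Ap @ A     # projects onto N(A)
    P_null_perp  = Ap @ A                 # projects onto N(A)^perp = R(A^T)

    for P in (P_range, P_range_perp, P_null, P_null_perp):
        print(np.allclose(P, P.T), np.allclose(P @ P, P))   # True True (Theorem 7.5)

    print(np.allclose(P_range @ A, A))    # True: columns of A are fixed by P_range
    print(np.allclose(A @ P_null, 0))     # True: N(A) is annihilated by A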


Example 7.6. Determine the orthogonal projection of a vector v ∈ R^n on another nonzero vector w ∈ R^n.

Solution: Think of the vector w as an element of the one-dimensional subspace R(w). Then the desired projection is simply

    P_{R(w)} v = w w^+ v = (w^T v / w^T w) w

(using Example 4.8). Moreover, the vector z that is orthogonal to w and such that v = Pv + z is given by z = P_{R(w)^⊥} v = (I - P_{R(w)})v = v - (w^T v / w^T w) w. See Figure 7.2. A direct calculation shows that z and w are, in fact, orthogonal:

    w^T z = w^T v - (w^T v / w^T w) w^T w = w^T v - w^T v = 0.

    Figure 7.2. Orthogonal projection on a "line."

Example 7.7. Recall the proof of Theorem 3.11. There, {v_1, ..., v_k} was an orthonormal basis for a subspace S of R^n. An arbitrary vector x ∈ R^n was chosen and a formula for x_1 appeared rather mysteriously. The expression for x_1 is simply the orthogonal projection of x on S. Specifically,

    x_1 = P_S x = Σ_{i=1}^{k} (v_i v_i^T) x = Σ_{i=1}^{k} (v_i^T x) v_i.

Example 7.8. Recall the diagram of the four fundamental subspaces. The indicated direct sum decompositions of the domain R^n and co-domain R^m are given easily as follows. Let x ∈ R^n be an arbitrary vector. Then

    x = P_{N(A)^⊥} x + P_{N(A)} x = A^+A x + (I - A^+A) x.

Similarly, let y ∈ R^m be an arbitrary vector. Then

    y = P_{R(A)} y + P_{R(A)^⊥} y = A A^+ y + (I - A A^+) y.

    Example 7.9. Let

    Then

and we can decompose the vector [2  3  4]^T uniquely into the sum of a vector in N(A)^⊥ and a vector in N(A), respectively, as follows:

7.2 Inner Product Spaces

Definition 7.10. Let V be a vector space over R. Then ⟨·, ·⟩ : V × V -> R is a real inner product if

1. ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 if and only if x = 0.
2. ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V.
3. ⟨x, αy_1 + βy_2⟩ = α⟨x, y_1⟩ + β⟨x, y_2⟩ for all x, y_1, y_2 ∈ V and for all α, β ∈ R.

Example 7.11. Let V = R^n. Then ⟨x, y⟩ = x^T y is the "usual" Euclidean inner product or dot product.

Example 7.12. Let V = R^n. Then ⟨x, y⟩_Q = x^T Q y, where Q = Q^T > 0 is an arbitrary n × n positive definite matrix, defines a "weighted" inner product.

Definition 7.13. If A ∈ R^{m×n}, then A^T ∈ R^{n×m} is the unique linear transformation or map such that ⟨x, Ay⟩ = ⟨A^T x, y⟩ for all x ∈ R^m and for all y ∈ R^n.


It is easy to check that, with this more "abstract" definition of transpose, and if the (i, j)th element of A is a_{ij}, then the (i, j)th element of A^T is a_{ji}. It can also be checked that all the usual properties of the transpose hold, such as (AB)^T = B^T A^T. However, the definition above allows us to extend the concept of transpose to the case of weighted inner products in the following way. Suppose A ∈ R^{m×n} and let ⟨·, ·⟩_Q and ⟨·, ·⟩_R, with Q and R positive definite, be weighted inner products on R^m and R^n, respectively. Then we can define the "weighted transpose" A^# as the unique map that satisfies

    ⟨x, Ay⟩_Q = ⟨A^# x, y⟩_R   for all x ∈ R^m and for all y ∈ R^n.

By Example 7.12 above, we must then have x^T Q A y = x^T (A^#)^T R y for all x, y. Hence we must have QA = (A^#)^T R. Taking transposes (of the usual variety) gives A^T Q = R A^#. Since R is nonsingular, we find

    A^# = R^{-1} A^T Q.

We can also generalize the notion of orthogonality (x^T y = 0) to Q-orthogonality (Q is a positive definite matrix). Two vectors x, y ∈ R^n are Q-orthogonal if ⟨x, y⟩_Q = x^T Q y = 0.
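A small numerical confirmation of the weighted-transpose formula (the matrices below are arbitrary illustrative choices; the positive definite weights are built as Gram matrices plus the identity):

    import numpy as np

    rng = np.random.default_rng(10)
    m, n = 4, 3
    A = rng.standard_normal((m, n))
    M = rng.standard_normal((m, m)); Q = M @ M.T + np.eye(m)   # Q = Q^T > 0 on R^m
    N = rng.standard_normal((n, n)); R = N @ N.T + np.eye(n)   # R = R^T > 0 on R^n

    A_sharp = np.linalg.solve(R, A.T @ Q)                      # A^# = R^{-1} A^T Q

    x = rng.standard_normal(m)
    y = rng.standard_normal(n)
    print(np.isclose(x @ Q @ (A @ y), (A_sharp @ x) @ R @ y))  # True: <x, Ay>_Q = <A^# x, y>_R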

Definition 7.14. Let V be a vector space over C. Then ⟨·, ·⟩ : V × V -> C is a complex inner product if

1. ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 if and only if x = 0.
2. ⟨x, y⟩ equals the complex conjugate of ⟨y, x⟩ for all x, y ∈ V.
3. ⟨x, αy_1 + βy_2⟩ = α⟨x, y_1⟩ + β⟨x, y_2⟩ for all x, y_1, y_2 ∈ V and for all α, β ∈ C.

Remark 7.15. We could use the notation ⟨·, ·⟩_C to denote a complex inner product, but if the vectors involved are complex-valued, the complex inner product is to be understood. Note, too, from part 2 of the definition, that ⟨x, x⟩ must be real for all x.

Remark 7.16. Note from parts 2 and 3 of Definition 7.14 that we have

    ⟨αx_1 + βx_2, y⟩ = conj(α)⟨x_1, y⟩ + conj(β)⟨x_2, y⟩,

where conj(·) denotes complex conjugation.

Remark 7.17. The Euclidean inner product of x, y ∈ C^n is given by

    ⟨x, y⟩ = x^H y.

The conventional definition of the complex Euclidean inner product is ⟨x, y⟩ = y^H x but we use its complex conjugate x^H y here for symmetry with the real case.

Remark 7.18. A weighted inner product can be defined as in the real case by ⟨x, y⟩_Q = x^H Q y, for arbitrary Q = Q^H > 0. The notion of Q-orthogonality can be similarly generalized to the complex case.


Definition 7.19. A vector space (V, F) endowed with a specific inner product is called an inner product space. If F = C, we call V a complex inner product space. If F = R, we call V a real inner product space.

Example 7.20.

1. Check that V = R^{n×n} with the inner product ⟨A, B⟩ = Tr A^T B is a real inner product space. Note that other choices are possible since by properties of the trace function, Tr A^T B = Tr B^T A = Tr A B^T = Tr B A^T.

2. Check that V = C^{n×n} with the inner product ⟨A, B⟩ = Tr A^H B is a complex inner product space. Again, other choices are possible.

Definition 7.21. Let V be an inner product space. For v ∈ V, we define the norm (or length) of v by ||v|| = √⟨v, v⟩. This is called the norm induced by ⟨·, ·⟩.

Example 7.22.

1. If V = R^n with the usual inner product, the induced norm is given by ||v|| = (Σ_{i=1}^{n} v_i^2)^{1/2}.


Theorem 7.25 (Polarization Identity).

1. For x, y ∈ R^n, an inner product is defined by

       ⟨x, y⟩ = x^T y = (||x + y||^2 - ||x||^2 - ||y||^2)/2 = (||x + y||^2 - ||x - y||^2)/4.

2. For x, y ∈ C^n, an inner product is defined by

       ⟨x, y⟩ = x^H y = (||x + y||^2 - ||x - y||^2)/4 + j(||x - jy||^2 - ||x + jy||^2)/4,

   where j = i = √-1.

7.3 Vector Norms

Definition 7.26. Let (V, F) be a vector space. Then || · || : V -> R is a vector norm if it satisfies the following three properties:

1. ||x|| ≥ 0 for all x ∈ V and ||x|| = 0 if and only if x = 0.
2. ||αx|| = |α| ||x|| for all x ∈ V and for all α ∈ F.
3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ V.
   (This is called the triangle inequality, as seen readily from the usual diagram illustrating the sum of two vectors in R^2.)

    Remark 7.27. It is convenient in the remainder of this section to state results for complex-valued vectors. The specialization to the real case is obvious.

Definition 7.28. A vector space (V, F) is said to be a normed linear space if and only if there exists a vector norm || · || : V -> R satisfying the three conditions of Definition 7.26.

Example 7.29.

1. For x ∈ C^n, the Hölder norms, or p-norms, are defined by

       ||x||_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p},   1 ≤ p ≤ +∞.

   Special cases:

   (a) ||x||_1 = Σ_{i=1}^{n} |x_i|.
   (b) ||x||_2 = (Σ_{i=1}^{n} |x_i|^2)^{1/2} = (x^H x)^{1/2}.
   (c) ||x||_∞ = max_{i ∈ n} |x_i| = lim_{p→+∞} ||x||_p.
       (The second equality is a theorem that requires proof.)
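These special cases are easy to experiment with; the sketch below (arbitrary test vector) also illustrates numerically that ||x||_p approaches ||x||_∞ as p grows:

    import numpy as np

    x = np.array([3.0, -4.0, 1.0, 0.5])

    print(np.linalg.norm(x, 1), np.abs(x).sum())            # 8.5 8.5
    print(np.linalg.norm(x, 2), np.sqrt(x @ x))             # same value twice (about 5.12)
    print(np.linalg.norm(x, np.inf), np.abs(x).max())       # 4.0 4.0

    for p in (2, 5, 20, 100):
        print(p, np.linalg.norm(x, p))                      # decreases toward 4.0 as p grows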


2. Some weighted p-norms:

   (a) ||x||_{1,D} = Σ_{i=1}^{n} d_i |x_i|, where d_i > 0.
   (b) ||x||_{2,Q} = (x^H Q x)^{1/2}, where Q = Q^H > 0 (this norm is more commonly denoted || · ||_Q).

3. On the vector space (C[t_0, t_1], R), define the vector norm

       ||f|| = max_{t_0 ≤ t ≤ t_1} |f(t)|.

   On the vector space ((C[t_0, t_1])^n, R),