Lecture 4: Orthogonality


November 5, 2010

    INNER PRODUCT SPACES

    RODICA D. COSTIN

    Contents

1. Inner product
1.1. Inner product
1.2. Inner product spaces
2. Orthogonal bases
2.1. Orthogonal projections
2.2. Existence of orthogonal basis in finite dimension
2.3. Orthogonal subspaces, orthogonal complements
2.4. Splitting of linear transformations
2.5. The adjoint and the four fundamental subspaces of a matrix
3. Least squares approximations
3.1. The Cauchy-Schwarz inequality
3.2. More about orthogonal projections
3.3. Overdetermined systems: best fit solution
3.4. Another formula for orthogonal projections
4. Orthogonal and unitary matrices, QR factorization
4.1. Unitary and orthogonal matrices
4.2. Rectangular matrices with orthonormal columns. A simple formula for orthogonal projections
4.3. QR factorization

    1. Inner product

    1.1. Inner product.

1.1.1. Inner product on real spaces. Vectors in $\mathbb{R}^n$ have more properties than the ones listed in the definition of vector spaces: we can define their length, and the angle between two vectors.

Recall that two vectors are orthogonal if and only if their dot product is zero, and, more generally, the cosine of the angle between two unit vectors in $\mathbb{R}^3$ is their dot product. The notion of inner product extracts the essential properties of the dot product, while allowing it to be defined on quite general vector spaces. We will first define it for real vector spaces, and then we will formulate it for complex ones.



Definition 1. An inner product on a vector space $V$ over $F = \mathbb{R}$ is an operation which associates to two vectors $x, y \in V$ a scalar $\langle x, y\rangle \in \mathbb{R}$ that satisfies the following properties:
(i) it is positive definite: $\langle x, x\rangle \ge 0$, and $\langle x, x\rangle = 0$ if and only if $x = 0$,
(ii) it is symmetric: $\langle x, y\rangle = \langle y, x\rangle$,
(iii) it is linear in the second argument: $\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$ and $\langle x, cy\rangle = c\langle x, y\rangle$.

Note that by symmetry it follows that an inner product is linear in the first argument as well: $\langle x + z, y\rangle = \langle x, y\rangle + \langle z, y\rangle$ and $\langle cx, y\rangle = c\langle x, y\rangle$.

Example 1. The dot product in $\mathbb{R}^2$ or $\mathbb{R}^3$ is clearly an inner product: if $x = (x_1, x_2, x_3)$ and $y = (y_1, y_2, y_3)$ then define
$$\langle x, y\rangle = x\cdot y = x_1y_1 + x_2y_2 + x_3y_3$$
Example 2. More generally, an inner product on $\mathbb{R}^n$ is
$$(1)\qquad \langle x, y\rangle = x\cdot y = x_1y_1 + \dots + x_ny_n$$
Example 3. Here is another inner product on $\mathbb{R}^3$:
$$\langle x, y\rangle = 5x_1y_1 + 10x_2y_2 + 2x_3y_3$$
(some directions are weighted more than others).
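As a quick numeric illustration (added here; the vectors are arbitrary examples, not from the notes), the following Python/NumPy sketch evaluates the dot product (1) and the weighted inner product of Example 3:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Example 2: the standard dot product <x, y> = x1*y1 + ... + xn*yn
dot = x @ y

# Example 3: the weighted inner product <x, y> = 5*x1*y1 + 10*x2*y2 + 2*x3*y3
w = np.array([5.0, 10.0, 2.0])
weighted = np.sum(w * x * y)

print(dot, weighted)   # 32.0 156.0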

Example 4. On spaces of functions the most useful inner products use integration. For example, let $P$ be the vector space of all polynomials with real coefficients. Then
$$\langle p, q\rangle = \int_0^1 p(t)\,q(t)\,dt$$
is an inner product on $P$ (check!).

Example 5. Sometimes a weight is used: let $w(t)$ be a positive function. Then
$$\langle p, q\rangle = \int_0^1 w(t)\,p(t)\,q(t)\,dt$$
is also an inner product on $P$ (check!).

1.1.2. Inner product on complex spaces. For complex vector spaces extra care is needed. The blueprint of the construction can be seen in the simplest case, $\mathbb{C}$ viewed as a one dimensional vector space over $\mathbb{C}$: the inner product of $z$ with itself needs to be a positive number! It makes sense to define $\langle z, z\rangle = \bar{z}z$.

Definition 2. An inner product on a vector space $V$ over $F = \mathbb{C}$ is an operation which associates to two vectors $x, y \in V$ a scalar $\langle x, y\rangle \in \mathbb{C}$ that satisfies the following properties:
(i) it is positive definite: $\langle x, x\rangle \ge 0$, and $\langle x, x\rangle = 0$ if and only if $x = 0$,
(ii) it is linear in the second argument: $\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$ and $\langle x, cy\rangle = c\langle x, y\rangle$,
(iii) it is conjugate symmetric: $\langle x, y\rangle = \overline{\langle y, x\rangle}$.


Note that conjugate symmetry combined with linearity implies that $\langle\cdot,\cdot\rangle$ is conjugate linear in the first variable: $\langle x + z, y\rangle = \langle x, y\rangle + \langle z, y\rangle$ and $\langle cx, y\rangle = \bar{c}\langle x, y\rangle$.

Please keep in mind that most mathematical books use an inner product linear in the first variable, and conjugate linear in the second one. You should make sure you know the convention used by each author.

Example 1. On the one dimensional complex vector space $\mathbb{C}$ an inner product is $\langle z, w\rangle = \bar{z}w$.

Example 2. More generally, an inner product on $\mathbb{C}^n$ is
$$(2)\qquad \langle x, y\rangle = \bar{x}_1y_1 + \dots + \bar{x}_ny_n$$
Example 3. Here is another inner product on $\mathbb{C}^3$:
$$\langle x, y\rangle = 5\bar{x}_1y_1 + 10\bar{x}_2y_2 + 2\bar{x}_3y_3$$
(some directions are weighted more than others).

Example 4. Let $P_{\mathbb{C}}$ be the vector space of all polynomials with complex coefficients. Then
$$\langle p, q\rangle = \int_0^1 \overline{p(t)}\,q(t)\,dt$$
is an inner product on $P_{\mathbb{C}}$ (check!).

Example 5. Weights need to be positive: let $w(t)$ be a given positive function. Then
$$\langle p, q\rangle = \int_0^1 w(t)\,\overline{p(t)}\,q(t)\,dt$$
is also an inner product on $P_{\mathbb{C}}$ (check!).
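A minimal sketch (added for illustration) of the integral inner products of Examples 4 and 5, using NumPy's polynomial utilities; coefficient arrays are listed lowest degree first, and the conjugation covers the complex case (for real polynomials it has no effect):

import numpy as np
from numpy.polynomial import polynomial as Poly

def poly_inner(p, q):
    """<p, q> = integral_0^1 conj(p(t)) q(t) dt; p, q are coefficient arrays, lowest degree first."""
    prod = Poly.polymul(np.conjugate(p), q)   # conj(p(t)) q(t); valid since t is real on [0, 1]
    antideriv = Poly.polyint(prod)            # term-by-term antiderivative
    return Poly.polyval(1.0, antideriv) - Poly.polyval(0.0, antideriv)

p = np.array([0.0, 1.0])         # p(t) = t
q = np.array([1.0, 0.0, 1.0])    # q(t) = 1 + t^2
print(poly_inner(p, q))          # 1/2 + 1/4 = 0.75
print(poly_inner(p, p) >= 0)     # positive definiteness on this example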

1.2. Inner product spaces.

Definition 3. A vector space $V$ equipped with an inner product, $(V, \langle\cdot,\cdot\rangle)$, is called an inner product space.

Examples 1-5 of §1.1.1 are inner product spaces over $\mathbb{R}$, while Examples 1-5 of §1.1.2 are inner product spaces over $\mathbb{C}$.

Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space. The quantity
$$\|x\| = \sqrt{\langle x, x\rangle}$$
is called the norm of the vector $x$. For $V = \mathbb{R}^3$ with the usual inner product (which is the dot product) the norm of a vector is its length.

Vectors of norm one, $x$ with $\|x\| = 1$, are called unit vectors.

    2. Orthogonal bases

Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space.

Definition 4. Two vectors $x, y \in V$ are called orthogonal if $\langle x, y\rangle = 0$.


    Note that the zero vector is orthogonal to any vector:

$$\langle x, 0\rangle = 0 \quad\text{for all } x \in V$$
since $\langle x, 0\rangle = \langle x, 0y\rangle = 0\,\langle x, y\rangle = 0$.

Definition 5. A set $B = \{v_1, \dots, v_n\} \subset V$ is called an orthogonal set if $\langle v_j, v_i\rangle = 0$ for all $i \neq j$.

Remark. An orthogonal set which does not contain the zero vector is a linearly independent set: if $\{v_1, \dots, v_k\}$ are orthogonal and $c_1v_1 + \dots + c_kv_k = 0$, then for any $j = 1, \dots, k$
$$0 = \langle v_j, c_1v_1 + \dots + c_kv_k\rangle = c_1\langle v_j, v_1\rangle + \dots + c_j\langle v_j, v_j\rangle + \dots + c_k\langle v_j, v_k\rangle = c_j\langle v_j, v_j\rangle$$
and since $v_j \neq 0$, we have $\langle v_j, v_j\rangle \neq 0$, which implies $c_j = 0$. $\square$

Definition 6. An orthogonal set of vectors which forms a basis for $V$ is called an orthogonal basis.

    An orthogonal basis made of unit vectors is called an orthonormal basis.

For example, the standard basis $e_1, \dots, e_n$ is an orthonormal basis of $\mathbb{R}^n$ (when equipped with the inner product given by the dot product).

Of course, if $v_1, \dots, v_n$ is an orthogonal basis, then $\frac{1}{\|v_1\|}v_1, \dots, \frac{1}{\|v_n\|}v_n$ is an orthonormal basis.

In an inner product space, the coefficients of the expansion of a vector in

    an orthogonal basis can be easily calculated:

Theorem 7. Let $B = \{v_1, \dots, v_n\}$ be an orthogonal basis for $V$. Then the coefficients of the expansion of any $x \in V$ in the basis $B$ are found as
$$(3)\qquad x = c_1v_1 + \dots + c_nv_n \quad\text{where}\quad c_j = \frac{\langle v_j, x\rangle}{\|v_j\|^2}$$

Proof. Consider the expansion of $x$ in the basis $B$: $x = c_1v_1 + \dots + c_nv_n$. For each $j = 1, \dots, n$
$$\langle v_j, x\rangle = \langle v_j, c_1v_1 + \dots + c_nv_n\rangle = c_1\langle v_j, v_1\rangle + \dots + c_j\langle v_j, v_j\rangle + \dots + c_n\langle v_j, v_n\rangle = c_j\langle v_j, v_j\rangle = c_j\|v_j\|^2$$
which gives formula (3) for $c_j$. $\square$
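A short NumPy check of formula (3) (added for illustration; the basis here is generated randomly and is orthogonal but not orthonormal):

import numpy as np

rng = np.random.default_rng(0)
# Build an orthogonal (not orthonormal) basis: orthonormal columns from QR, rescaled.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V = Q * np.array([1.0, 2.0, 3.0, 4.0])      # columns v_1, ..., v_4, mutually orthogonal

x = rng.standard_normal(4)
# Coefficients c_j = <v_j, x> / ||v_j||^2, as in formula (3)
c = np.array([V[:, j] @ x / (V[:, j] @ V[:, j]) for j in range(4)])

print(np.allclose(V @ c, x))                # True: x = c_1 v_1 + ... + c_4 v_4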

2.1. Orthogonal projections. Given two vectors $x, v \in V$, the orthogonal projection of $x$ in the direction of $v$ is
$$(4)\qquad P_v\,x = \frac{\langle v, x\rangle}{\|v\|^2}\,v$$
For $V = \mathbb{R}^3$ with the inner product given by the dot product of vectors, this is the usual geometric projection.

    Remarks:


1. $P_v\,x$ is a scalar multiple of $v$.
2. (4) can be rewritten as
$$P_v\,x = \left\langle \frac{v}{\|v\|},\, x\right\rangle \frac{v}{\|v\|}, \quad\text{where } \frac{v}{\|v\|} \text{ is a unit vector,}$$
which shows that the projection in the direction of $v$ depends only on its direction, and not on its norm. Furthermore, note that $P_{cv} = P_v$ for any scalar $c \neq 0$, so a more appropriate name for $P_v$ is the orthogonal projection onto the subspace $Sp(v)$.
3. $P_v : V \to V$ is a linear transformation whose range is $Sp(v)$ and which satisfies $P_v^2 = P_v$.

Note that $(x - P_v\,x) \perp v$ (check!).

More generally, we can define orthogonal projections onto higher dimensional subspaces. Consider for example a two dimensional subspace $U = Sp(v_1, v_2)$, where we assume $v_1 \perp v_2$ for simplicity (and $v_1, v_2 \neq 0$). Consider for any $x$ the vector
$$(5)\qquad P_{Sp(v_1, v_2)}\,x \equiv Px = \frac{\langle v_1, x\rangle}{\|v_1\|^2}\,v_1 + \frac{\langle v_2, x\rangle}{\|v_2\|^2}\,v_2$$
We have $Px \in U$, $P^2 = P$ and $(x - Px) \perp Sp(v_1, v_2)$ (check!), so $P$ is the orthogonal projection (transformation) onto $U$. We will give exact characterizations of all orthogonal projections below, in §2.5.

2.2. Existence of orthogonal basis in finite dimension. We know that any vector space has a basis. Moreover, any finite dimensional inner product space has an orthogonal basis, and here is how to find it:

Theorem 8. Gram-Schmidt orthogonalization. Let $\{u_1, \dots, u_n\}$ be a basis for $V$. Then an orthogonal basis $\{v_1, \dots, v_n\}$ can be found as follows:
$$(6)\qquad v_1 = u_1 \in Sp(u_1)$$
$$(7)\qquad v_2 = u_2 - \frac{\langle v_1, u_2\rangle}{\|v_1\|^2}\,v_1 \in Sp(u_1, u_2)$$
$$(8)\qquad v_3 = u_3 - \frac{\langle v_1, u_3\rangle}{\|v_1\|^2}\,v_1 - \frac{\langle v_2, u_3\rangle}{\|v_2\|^2}\,v_2 \in Sp(u_1, u_2, u_3)$$
$$\vdots$$
$$(9)\qquad v_k = u_k - \sum_{j=1}^{k-1}\frac{\langle v_j, u_k\rangle}{\|v_j\|^2}\,v_j \in Sp(u_1, \dots, u_k)$$
$$\vdots$$
$$(10)\qquad v_n = u_n - \sum_{j=1}^{n-1}\frac{\langle v_j, u_n\rangle}{\|v_j\|^2}\,v_j \in Sp(u_1, \dots, u_n)$$


The Gram-Schmidt orthogonalization process is exactly as in $\mathbb{R}^n$. At the first step we keep $u_1$, (6). Next, we consider $Sp(u_1, u_2)$ and we replace $u_2$ by a vector which is orthogonal to $v_1 = u_1$, namely $u_2 - P_{v_1}u_2$, giving (7). We have produced the orthogonal basis $v_1, v_2$ for $Sp(u_1, u_2)$.

For the next step we add one more vector and consider $Sp(u_1, u_2, u_3) = Sp(v_1, v_2, u_3)$. We need to replace $u_3$ by a vector orthogonal to $Sp(v_1, v_2)$, and this is $u_3 - P_{Sp(v_1, v_2)}u_3$, which is (8).

The procedure is continued; at each step $k$ we subtract from $u_k$ the orthogonal projection of $u_k$ on $Sp(v_1, \dots, v_{k-1})$.

To check that $v_k \perp v_i$ for all $i = 1, \dots, k-1$ we calculate, using the linearity in the second argument of the inner product,
$$\langle v_i, v_k\rangle = \langle v_i, u_k\rangle - \sum_{j=1}^{k-1}\frac{\langle v_j, u_k\rangle}{\|v_j\|^2}\,\langle v_i, v_j\rangle = \langle v_i, u_k\rangle - \frac{\langle v_i, u_k\rangle}{\|v_i\|^2}\,\langle v_i, v_i\rangle = 0$$
$\square$
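A minimal NumPy sketch of the recursion (6)-(10) (added for illustration), using the dot product of $\mathbb{R}^n$ as the inner product and assuming the input columns are linearly independent:

import numpy as np

def gram_schmidt(U):
    """Columns of U are u_1, ..., u_n (assumed independent); returns orthogonal columns v_1, ..., v_n."""
    V = np.zeros_like(U, dtype=float)
    for k in range(U.shape[1]):
        v = U[:, k].copy()
        for j in range(k):
            # subtract the projection of u_k onto v_j, as in (9)
            v -= (V[:, j] @ U[:, k]) / (V[:, j] @ V[:, j]) * V[:, j]
        V[:, k] = v
    return V

U = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = gram_schmidt(U)
print(np.round(V.T @ V, 10))   # off-diagonal entries are 0: the columns are orthogonal

As noted at the end of §4.3, this classical form of the process accumulates round-off errors on large problems; it is shown here only to mirror the formulas above.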

Corollary 9. Let $U$ be a subspace of $V$. Any orthogonal basis of $U$ can be completed to an orthogonal basis of $V$.

Indeed, $U$ has an orthogonal basis $v_1, \dots, v_k$ by Theorem 8, which can be completed to a basis of $V$: $v_1, \dots, v_k, u_{k+1}, \dots, u_n$. Applying the Gram-Schmidt orthogonalization procedure to this basis, we obtain an orthogonal basis of $V$. It can be easily seen that the procedure leaves $v_1, \dots, v_k$ unchanged. $\square$

2.3. Orthogonal subspaces, orthogonal complements. Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space.

Definition 10. Let $U$ be a subspace of $V$. We say that a vector $w \in V$ is orthogonal to $U$ if it is orthogonal to every vector in $U$:
$$w \perp U \quad\text{if and only if}\quad w \perp u \text{ for all } u \in U$$
Definition 11. Two subspaces $U, W \subset V$ are orthogonal ($U \perp W$) if every $u \in U$ is orthogonal to every $w \in W$.

Remark. If $U \perp W$ then $U \cap W = \{0\}$. Indeed, suppose that $x \in U \cap W$. Then $x \in U$ and $x \in W$, and therefore we must have $\langle x, x\rangle = 0$, which implies $x = 0$.

Examples. Let $V = \mathbb{R}^3$.
(i) If the (subspace consisting of all vectors on the) $z$-axis is orthogonal to a one dimensional subspace $\ell$ (a line through the origin), then $\ell$ is included in the $xy$-plane.
(ii) In fact the $z$-axis is orthogonal to the $xy$-plane.
(iii) But the intuition coming from classical geometry stops here. As vector subspaces, the $yz$-plane is not orthogonal to the $xy$-plane (they have a subspace in common!).


Recall that any subspace $U \subset V$ has a complement in $V$ (in fact, it has infinitely many). But exactly one of those is orthogonal to $U$:

Theorem 12. Let $U$ be a subspace of $V$. Denote
$$U^\perp = \{w \in V \,|\, w \perp U\}$$
Then $U^\perp$ is a subspace and
$$U \oplus U^\perp = V$$
The space $U^\perp$ is called the orthogonal complement of $U$ in $V$.

Proof of Theorem 12. To show that $U^\perp$ is a subspace, let $w, w' \in U^\perp$; then $\langle u, w\rangle = 0$ and $\langle u, w'\rangle = 0$ for any $u \in U$. Then for any scalars $c, c'$ in $F$ we have
$$\langle u, cw + c'w'\rangle = c\langle u, w\rangle + c'\langle u, w'\rangle = 0$$
which shows that $cw + c'w' \in U^\perp$.

The sum is direct since if $u \in U \cap U^\perp$ then $u \in U$ and $u \in U^\perp$, therefore $\langle u, u\rangle = 0$, therefore $u = 0$.

Finally, to show that $U + U^\perp = V$, let $u_1, \dots, u_k$ be an orthogonal basis of $U$, which can be completed to an orthogonal basis of $V$ by Corollary 9: $u_1, \dots, u_k, v_1, \dots, v_{n-k}$.

Then $U^\perp = Sp(v_1, \dots, v_{n-k})$. Indeed, clearly $Sp(v_1, \dots, v_{n-k}) \subset U^\perp$, and by dimensionality, $U^\perp$ cannot contain any other linearly independent vectors. Therefore $U \oplus U^\perp = V$. $\square$

Remark. For any subspace $U$ of a finite dimensional inner product space $V$, the orthogonal complement of the orthogonal complement of $U$ is $U$:
$$\left(U^\perp\right)^\perp = U$$

    The proof is left as an exercise.

2.4. Splitting of linear transformations. Let $U, V$ be two inner product spaces over the same scalar field $F$, and consider any linear transformation $T : U \to V$.

If $T$ is not one to one we can restrict the domain of $T$ and obtain a one to one linear transformation: split $U = \mathcal{N}(T) \oplus \mathcal{N}(T)^\perp$ and define $\tilde{T}$ to be the restriction of $T$ to the subspace $\mathcal{N}(T)^\perp$:
$$\tilde{T} : \mathcal{N}(T)^\perp \to V, \quad \tilde{T}(x) = T(x)$$
Note that $\mathcal{R}(\tilde{T}) = \mathcal{R}(T)$.

If $T$ is not onto, then split $V = \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp$. The linear transformation $\hat{T} : U \to \mathcal{R}(T)$ defined by $\hat{T}(x) = T(x)$ for all $x \in U$ is onto (and it is essentially $T$).

By splitting both the domain and the codomain we can write
$$(11)\qquad T : \mathcal{N}(T) \oplus \mathcal{N}(T)^\perp \to \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp$$


where the restriction of $T$,
$$T_0 : \mathcal{N}(T)^\perp \to \mathcal{R}(T), \quad T_0(x) = T(x)$$
is one to one and onto: it is invertible!

Let us look at the matrix associated to $T$ in suitable bases. Let $B_1 = \{u_1, \dots, u_k\}$ be a basis for $\mathcal{N}(T)^\perp$ and $u_{k+1}, \dots, u_n$ a basis for $\mathcal{N}(T)$. Then $B_U = \{u_1, \dots, u_n\}$ is a basis for $U$. Similarly, let $B_2 = \{v_1, \dots, v_k\}$ be a basis for $\mathcal{R}(T)$ and $v_{k+1}, \dots, v_m$ a basis for $\mathcal{R}(T)^\perp$. Then $B_V = \{v_1, \dots, v_m\}$ is a basis for $V$. The matrix associated to $T$ in the bases $B_U$, $B_V$ has the block form
$$\begin{bmatrix} M_0 & 0 \\ 0 & 0 \end{bmatrix}$$
where $M_0$ is the $k\times k$ invertible matrix associated to $T_0$ in the bases $B_1$, $B_2$.

2.5. The adjoint and the four fundamental subspaces of a matrix.

2.5.1. The adjoint of a linear transformation. Let $(V, \langle\cdot,\cdot\rangle_V)$, $(U, \langle\cdot,\cdot\rangle_U)$ be two inner product spaces over the same scalar field $F$. We restrict the discussion to finite dimensional spaces.

Definition 13. If $T : U \to V$ is a linear transformation, its adjoint $T^*$ is the linear transformation $T^* : V \to U$ which satisfies
$$\langle Tu, v\rangle_V = \langle u, T^*v\rangle_U \quad\text{for all } u \in U,\ v \in V$$
The existence of the adjoint, and its uniqueness, needs a proof, which is not difficult, but will be omitted here.

We focus instead on linear transformations given by matrix multiplication, in the vector spaces $\mathbb{R}^n$ or $\mathbb{C}^n$ with the usual inner product (1), respectively (2). In this case we will denote, for simplicity, the linear operator and its matrix by the same letter and say, by abuse of language and notation, that a matrix is a linear transformation. Definition 13 becomes: if $M$ is an $m\times n$ matrix with entries in $F$, its adjoint is the matrix $M^*$ for which

$$(12)\qquad \langle Mu, v\rangle_V = \langle u, M^*v\rangle_U \quad\text{for all } u \in F^n,\ v \in F^m$$
For $F = \mathbb{R}$ the usual linear algebra notation for the inner product (1) is
$$\langle x, y\rangle = x^Ty$$
and in this notation, recalling that $(AB)^T = B^TA^T$, relation (12) is
$$(13)\qquad u^TM^Tv = u^TM^*v$$
which shows that for real matrices the adjoint is the transpose: $M^* = M^T$.

For $F = \mathbb{C}$ another linear algebra notation for the inner product (2) is
$$(14)\qquad \langle x, y\rangle = x^Hy, \quad\text{where } x^H \text{ is the Hermitian of the vector } x:\ x^H = \bar{x}^T,$$
and relation (12) is
$$u^HM^Hv = u^HM^*v$$


which shows that for complex matrices the adjoint is the complex conjugate of the transpose: $M^* = M^H = \bar{M}^T$.

The notations (13), (14) will not be used here: they are very useful for finite dimensional vector spaces, but not for infinite dimensional ones. We prefer to use the unified notation $M^*$ for the adjoint of a matrix, keeping in mind that $M^* = M^T$ in the real case and $M^* = M^H$ in the complex case.

Note that
$$(15)\qquad (M^*)^* = M$$
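A numeric illustration (added here) of Definition 13 with the inner product (2): for a complex matrix the adjoint is the conjugate transpose, $M^* = M^H$. The matrix and vectors below are arbitrary random examples.

import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
M = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))   # M maps C^n -> C^m
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(m) + 1j * rng.standard_normal(m)

inner = lambda a, b: np.vdot(a, b)      # <a, b> = conj(a) . b, matching convention (2)
M_adj = M.conj().T                      # the adjoint M* = M^H (conjugate transpose)

print(np.isclose(inner(M @ u, v), inner(u, M_adj @ v)))   # True: <Mu, v> = <u, M*v>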

2.5.2. The four fundamental subspaces of a matrix. Let $M$ be an $m\times n$ real matrix.

a) Since the columns of $M^T$ are the rows of $M$, $\mathcal{R}(M^T)$ = the column space of $M^T$ = the row space of $M$. Theorem 14 (ii) shows that the row space of $M$ and the null space of $M$ are orthogonal complements of each other.

b) The left null space of $M$ is, by definition, the set of all $x \in \mathbb{R}^m$ such that $x^TM = 0$. Taking a transposition, we see that this is equivalent to $M^Tx = 0$, therefore the left null space of $M$ equals $\mathcal{N}(M^T)$. Theorem 14 (i) shows that the left null space of $M$ and the column space of $M$ are orthogonal complements of each other.

Theorem 14. Let $M$ be an $m\times n$ matrix. The following hold:
(i) $\mathcal{N}(M^*) = \mathcal{R}(M)^\perp$
(ii) $\mathcal{R}(M^*) = \mathcal{N}(M)^\perp$

Proof. We need the following very useful fact:

Remark 15. Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space. We have: $\langle x, v\rangle = 0$ for all $v \in V$ if and only if $x = 0$.

This is easy to see, since if we take in particular $v = x$ then $\langle x, x\rangle = 0$, which implies $x = 0$.

To prove Theorem 14 (i): we have $u \in \mathcal{R}(M)^\perp$ if and only if $\langle u, Mv\rangle = 0$ for all $v \in \mathbb{R}^n$, if and only if $\langle M^*u, v\rangle = 0$ for all $v \in \mathbb{R}^n$, if and only if (by Remark 15) $M^*u = 0$, hence $u \in \mathcal{N}(M^*)$.

Statement (ii) of Theorem 14 follows from (i) after replacing $M$ by $M^*$ and using (15). $\square$
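A quick numeric check (added for illustration) of Theorem 14 (i) for a real matrix: a basis of the left null space $\mathcal{N}(M^T)$, obtained here from the SVD, is orthogonal to the column space $\mathcal{R}(M)$. The matrix is an arbitrary rank-deficient example.

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # a 5x4 matrix of rank at most 3

# Basis of N(M^T): right singular vectors of M^T belonging to (numerically) zero singular values
_, s, Vt = np.linalg.svd(M.T)
null_MT = Vt[np.sum(s > 1e-10):]        # rows span the left null space of M

# Every such vector is orthogonal to every column of M, i.e. to R(M)
print(np.allclose(null_MT @ M, 0.0))    # True: N(M^T) = R(M)^perp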

We should now take another look at the splitting in §2.4. Formula (11),
$$T : \mathcal{N}(T) \oplus \mathcal{N}(T)^\perp \to \mathcal{R}(T) \oplus \mathcal{R}(T)^\perp,$$
is a splitting using the four fundamental subspaces, and it can also be written as
$$T : \mathcal{R}(T^*) \oplus \mathcal{N}(T) \to \mathcal{R}(T) \oplus \mathcal{N}(T^*)$$


    (in infinite dimensions extra care is needed here).

    3. Least squares approximations

3.1. The Cauchy-Schwarz inequality. One of the most useful, and deep, inequalities in mathematics, which holds in finite or infinite dimensional inner product spaces, is:

Theorem 16. The Cauchy-Schwarz inequality. In an inner product vector space any two vectors $x, y$ satisfy
$$(16)\qquad |\langle x, y\rangle| \le \|x\|\,\|y\|$$
with equality if and only if $x, y$ are linearly dependent.

Before noting its rigorous and very short proof, let us have an intuitive view based on the geometry of $\mathbb{R}^3$. Recall that the cosine of the angle $\theta$ between two vectors $x, y \in \mathbb{R}^3$ is $\cos\theta = x\cdot y/(\|x\|\,\|y\|)$. Since $|\cos\theta| \le 1$, the Cauchy-Schwarz inequality follows. Equality holds only for $\theta = 0$ or $\theta = \pi$ (in which cases $x, y$ are scalar multiples of each other). We can also have equality in the Cauchy-Schwarz inequality if $x$ or $y$ is the zero vector (when we cannot define the angle between them), and in this case the two vectors are linearly dependent as well.

Proof. If $y = 0$ then the inequality is trivially true. Otherwise, denote $c = \langle y, x\rangle/\langle y, y\rangle$, and we have
$$0 \le \langle x - cy,\ x - cy\rangle = \langle x, x\rangle - \frac{|\langle x, y\rangle|^2}{\langle y, y\rangle}$$
which gives (16). $\square$
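A tiny numeric check of (16) (added for illustration; the vectors are random examples):

import numpy as np

rng = np.random.default_rng(3)
x, y = rng.standard_normal(6), rng.standard_normal(6)

lhs = abs(x @ y)
rhs = np.linalg.norm(x) * np.linalg.norm(y)
print(lhs <= rhs)                             # True for any x, y

z = -2.5 * x                                  # linearly dependent case: equality in (16)
print(np.isclose(abs(x @ z), np.linalg.norm(x) * np.linalg.norm(z)))   # True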

3.2. More about orthogonal projections. We already defined the orthogonal projection (linear transformation/matrix) onto one dimensional and two dimensional subspaces in §2.1. The formulas are similar for higher dimensional subspaces which are specified by an orthogonal basis. Here are two more characterizations of orthogonal projections.

The first theorem gives us a very easy way to see if a linear transformation is, or is not, an orthogonal projection:

Theorem 17. Let $V$ be a finite dimensional inner product space. A linear transformation $P : V \to V$ is an orthogonal projection if and only if $P^2 = P$ and $P^* = P$.

Proof. Suppose that $P^2 = P$ and $P^* = P$; we show that $P$ is an orthogonal projection. We first need to identify the subspace onto which $P$ projects; obviously, this should be $\mathcal{R}(P)$. We need to check that $x - Px \perp \mathcal{R}(P)$, therefore that $\langle Py, x - Px\rangle = 0$ for all $y \in V$. Indeed,
$$\langle Py, x - Px\rangle = \langle Py, x\rangle - \langle Py, Px\rangle = \langle Py, x\rangle - \langle P^*Py, x\rangle$$


$$= \langle Py, x\rangle - \langle P^2y, x\rangle = \langle Py, x\rangle - \langle Py, x\rangle = 0$$
Conversely, let $P$ be the orthogonal projection onto a subspace $U$. If $v_1, \dots, v_k$ is an orthonormal basis for $U$, then the formula for $P$ is the natural generalization of (5), which is (taking into account that we assumed our $v_j$ to be unit vectors)
$$Px = \langle v_1, x\rangle v_1 + \dots + \langle v_k, x\rangle v_k$$
Note that $Pv_j = \langle v_1, v_j\rangle v_1 + \dots + \langle v_k, v_j\rangle v_k = \langle v_j, v_j\rangle v_j = v_j$ for all $j$.

To check that $P^2 = P$, calculate
$$P^2x = P\left(\langle v_1, x\rangle v_1 + \dots + \langle v_k, x\rangle v_k\right) = \langle v_1, x\rangle Pv_1 + \dots + \langle v_k, x\rangle Pv_k = \langle v_1, x\rangle v_1 + \dots + \langle v_k, x\rangle v_k = Px$$
To check that $P^* = P$, calculate, for any $x, y \in V$,
$$\langle y, Px\rangle = \langle y, \langle v_1, x\rangle v_1 + \dots + \langle v_k, x\rangle v_k\rangle = \langle v_1, x\rangle\langle y, v_1\rangle + \dots + \langle v_k, x\rangle\langle y, v_k\rangle$$
$$= \langle v_1, x\rangle\overline{\langle v_1, y\rangle} + \dots + \langle v_k, x\rangle\overline{\langle v_k, y\rangle} = \langle \langle v_1, y\rangle v_1 + \dots + \langle v_k, y\rangle v_k,\ x\rangle = \langle Py, x\rangle$$
which completes the proof. $\square$
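The two properties of Theorem 17 can be checked numerically (an added sketch; the subspace is the column span of a random orthonormal pair, and $P^* = P^T$ in the real case):

import numpy as np

rng = np.random.default_rng(4)
# Orthonormal basis of a 2-dimensional subspace U of R^5 (columns of V)
V, _ = np.linalg.qr(rng.standard_normal((5, 2)))

# P x = <v_1, x> v_1 + <v_2, x> v_2, i.e. P = V V^T (cf. Theorem 24 below)
P = V @ V.T

print(np.allclose(P @ P, P))     # P^2 = P
print(np.allclose(P.T, P))       # P* = P (real case: P is symmetric)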

In an inner product space we can define the distance between two vectors $x, y$ as $\|x - y\|$. Note that this is the usual distance between two points in $\mathbb{R}^3$.

The second characterization of orthogonal projections is very useful in approximations. In planar and spatial geometry, the shortest distance between a point and a line (or a plane) is the one measured on a perpendicular line; this is true in general:

Theorem 18. Let $U$ be a subspace of the inner product space $V$. For any $x \in V$, the point in $U$ which is at minimal distance to $x$ is the orthogonal projection $Px$ of $x$ onto $U$:
$$\|x - Px\| = \min\{\|x - u\| \,:\, u \in U\}$$
Proof. This is an immediate consequence of the Pythagorean theorem:
$$(17)\qquad \|x - u\|^2 = \|u - Px\|^2 + \|x - Px\|^2, \quad\text{for any } u \in U$$
Exercise. Prove (17) using the fact that $P$ is the orthogonal projection onto $U$, therefore $(x - Px) \perp U$, and that $P^2 = P$, $P^* = P$ (see Theorem 17). $\square$

3.3. Overdetermined systems: best fit solution. Let $M$ be an $m\times n$ matrix with entries in $\mathbb{R}$ (we could as well work with $F = \mathbb{C}$). By abuse of notation we will speak of the matrix $M$ both as a matrix and as the linear transformation $\mathbb{R}^n \to \mathbb{R}^m$ which takes $x$ to $Mx$, thus denoting by $\mathcal{R}(M)$ the range of the transformation (which is the same as the column space of $M$).

The linear system $Mx = b$ has solutions if and only if $b \in \mathcal{R}(M)$. If $b \notin \mathcal{R}(M)$ then the system has no solutions, and it is called overdetermined.

In practice overdetermined systems are not uncommon, usual sources being that linear systems are only models for more complicated phenomena,


and the collected data is subject to errors and fluctuations. For practical problems it is important to produce a best fit solution: an $x$ for which the error $Mx - b$ is as small as possible.

There are many ways of measuring such an error; the most convenient is the least squares error: find $x$ which minimizes the square error
$$S = r_1^2 + \dots + r_m^2 \quad\text{where}\quad r_j = (Mx)_j - b_j$$

Of course, this is the same as minimizing $\|Mx - b\|$, where the inner product is the usual dot product on $\mathbb{R}^m$. By Theorem 18 it follows that $Mx$ must equal $p = Pb$, the orthogonal projection of $b$ onto the subspace $\mathcal{R}(M)$.

We only need to solve the system $Mx = Pb$, which is solvable since $Pb \in \mathcal{R}(M)$. If $M$ is one to one, then there is a unique solution $x$; otherwise any vector in the set $x + \mathcal{N}(M)$ is a solution as well.

To find a formula for $x$, recall that we must have $(b - Pb) \perp \mathcal{R}(M)$, therefore, by Theorem 14 (i), we must have $(b - Pb) \in \mathcal{N}(M^*)$, hence $M^*b = M^*Pb$, so
$$M^*b = M^*Mx$$
which is called the normal equation in statistics.

If $M$ is one to one, then so is $M^*M$ (indeed, $M^*Mx = 0$ implies $\|Mx\|^2 = \langle x, M^*Mx\rangle = 0$, hence $x = 0$), and since $M^*M$ is a square matrix, it is invertible. We can then solve $x = (M^*M)^{-1}M^*b$, and we also have a formula for the projection:
$$(18)\qquad Pb = M(M^*M)^{-1}M^*b$$
If $M$ is not one to one, then one particular least squares solution can be chosen among all those in $x + \mathcal{N}(M)$. Choosing the vector with the smallest norm gives $x = M^+b$, where $M^+$ is called the pseudoinverse of $M$. The notion of pseudoinverse will be studied in more detail later on.
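An added NumPy sketch of the normal equation and of the pseudoinverse solution on a small overdetermined system (the data is an arbitrary random example); np.linalg.lstsq is the library routine for the same least squares problem.

import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 3))             # 6 equations, 3 unknowns; full column rank, so M is one to one
b = rng.standard_normal(6)                  # generically b is not in R(M)

# Solve the normal equation M^T M x = M^T b
x_normal = np.linalg.solve(M.T @ M, M.T @ b)

# Same answer from the pseudoinverse and from the library least squares routine
x_pinv = np.linalg.pinv(M) @ b
x_lstsq, *_ = np.linalg.lstsq(M, b, rcond=None)
print(np.allclose(x_normal, x_pinv), np.allclose(x_normal, x_lstsq))   # True True

# The residual b - Mx is orthogonal to the column space R(M)
print(np.allclose(M.T @ (b - M @ x_normal), 0.0))                      # True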

3.4. Another formula for orthogonal projections. Formula (18) is another useful way of writing projections. Suppose that $U$ is a subspace of $\mathbb{R}^m$ which is given as the range of a matrix $M$; finding an orthogonal basis for $U$ usually requires some work, but this turns out not to be necessary in order to write a nice formula for the orthogonal projection onto $U = \mathcal{R}(M)$. If the matrix is not one to one, then by restricting its domain to a smaller subspace we obtain a smaller matrix with the same range, see §2.4. With this extra care, formula (18) gives the orthogonal projection onto the subspace $\mathcal{R}(M)$.

    4. Orthogonal and unitary matrices, QR factorization

4.1. Unitary and orthogonal matrices. Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space. The following type of linear transformations are what isomorphisms of inner product spaces should be: linear, invertible, and preserving the inner product (therefore angles and lengths):

Definition 19. A linear transformation $U : V \to V$ is called a unitary transformation if $UU^* = U^*U = I$.


In particular, for $V = \mathbb{C}^n$ or $\mathbb{R}^n$, the transformation may be given by multiplication by a matrix $U$:

Definition 20. A unitary matrix is an $n\times n$ matrix $U$ with $UU^* = U^*U = I$, and therefore $U^{-1} = U^*$,

and

Definition 21. A unitary matrix with real entries is called an orthogonal matrix.

We should immediately state and prove the following properties of unitary matrices; each one can be used as a definition of a unitary matrix:

Theorem 22. The following statements are equivalent:
(i) The $n\times n$ matrix $U$ is unitary.
(ii) The columns of $U$ are an orthonormal set of vectors (therefore an orthonormal basis of $\mathbb{C}^n$).
(iii) $U$ preserves angles: $\langle Ux, Uy\rangle = \langle x, y\rangle$ for all $x, y \in \mathbb{C}^n$.
(iv) $U$ is an isometry: $\|Ux\| = \|x\|$ for all $x \in \mathbb{C}^n$.

Remark. It also follows that the rows of $U$ are orthonormal whenever the columns of $U$ are orthonormal.

Examples. Rotation matrices and reflection matrices in $\mathbb{R}^n$, and their products, are orthogonal (they preserve the length of vectors).

Remark. An isometry is necessarily one to one, and therefore it is also onto (in finite dimensional spaces).

Remark. The equivalence between (i) and (iv) is not true in infinite dimensions (unless the isometry is assumed to be onto).

We will need the following result, which shows that in an inner product space the inner product is completely recovered if we know the norm of every vector:

Theorem 23. The polarization identity: for complex spaces
$$\langle f, g\rangle = \frac{1}{4}\left(\|f+g\|^2 - \|f-g\|^2 - i\|f+ig\|^2 + i\|f-ig\|^2\right)$$
and for real spaces
$$\langle f, g\rangle = \frac{1}{4}\left(\|f+g\|^2 - \|f-g\|^2\right)$$
The proof is by a straightforward calculation.

Proof of Theorem 22.
(i)⇔(ii) is obvious by matrix multiplication (line $j$ of $U^*$ multiplying, place by place, column $i$ of $U$ is exactly the dot product of column $j$ complex conjugated and column $i$).
(i)⇔(iii) is obvious because $U^*U = I$ is equivalent to $\langle U^*Ux, y\rangle = \langle x, y\rangle$ for all $x, y \in \mathbb{C}^n$, which is equivalent to (iii).


(iii)⇒(iv) follows by taking $y = x$. (iv)⇒(iii) follows from the polarization identity. $\square$
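An added numeric illustration of Theorem 22 for a plane rotation (an arbitrary angle): its columns are orthonormal and it preserves inner products and norms.

import numpy as np

theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by theta: an orthogonal matrix

x = np.array([3.0, -1.0])
y = np.array([0.5,  2.0])

print(np.allclose(U.T @ U, np.eye(2)))                       # U*U = I (columns orthonormal)
print(np.isclose((U @ x) @ (U @ y), x @ y))                  # <Ux, Uy> = <x, y>
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))  # ||Ux|| = ||x||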

4.2. Rectangular matrices with orthonormal columns. A simple formula for orthogonal projections.

Let $M = [u_1, \dots, u_k]$ be an $n\times k$ matrix whose columns $u_1, \dots, u_k$ are an orthonormal set of vectors. Then necessarily $n \ge k$. If $n = k$ then the matrix $M$ is unitary, but assume here that $n > k$.

Note that $\mathcal{N}(M) = \{0\}$ since the columns are independent. Note also that
$$M^*M = I$$
(since line $j$ of $M^*$ multiplying, place by place, column $i$ of $M$ is exactly $\langle u_j, u_i\rangle$, which equals 1 for $i = j$ and 0 otherwise). Then the least squares minimization formula (18) takes the simple form $P = MM^*$:

Theorem 24. Let $u_1, \dots, u_k$ be an orthonormal set, and $M = [u_1, \dots, u_k]$. The orthogonal projection onto $U = Sp(u_1, \dots, u_k) = \mathcal{R}(M)$ (cf. §3.4) is
$$P = MM^*$$
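An added sketch of Theorem 24 on a random example: for a matrix with orthonormal columns, $M^*M = I$ and the projection $P = MM^*$ agrees with the general formula (18).

import numpy as np

rng = np.random.default_rng(6)
M, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # 5x2 matrix with orthonormal columns

print(np.allclose(M.T @ M, np.eye(2)))             # M*M = I
P_simple = M @ M.T                                 # P = M M*
P_general = M @ np.linalg.inv(M.T @ M) @ M.T       # formula (18); the inverse is just I here
print(np.allclose(P_simple, P_general))            # True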

4.3. QR factorization. The following decomposition of matrices has countless applications, and extends to infinite dimensions.

If an $m\times n$ matrix $M = [u_1, \dots, u_n]$ has linearly independent columns (hence $m \ge n$ and $\mathcal{N}(M) = \{0\}$), then applying the Gram-Schmidt process to the columns $u_1, \dots, u_n$, where we also normalize to produce an orthonormal basis (not only an orthogonal one), amounts to factoring $M$ as described below.

Theorem 25. QR factorization of matrices. Any $m\times k$ matrix $M = [u_1, \dots, u_k]$ with linearly independent columns can be factored as $M = QR$, where $Q$ is an $m\times k$ matrix whose columns form an orthonormal basis for $\mathcal{R}(M)$ (hence $Q^*Q = I$) and $R$ is a $k\times k$ upper triangular matrix with positive entries on its diagonal (hence $R$ is invertible).

In the case of a square matrix $M$, $Q$ is also square, and it is a unitary matrix.

If the matrix $M$ has real entries, then $Q$ and $R$ have real entries, and if $k = m$ then $Q$ is an orthogonal matrix.

Remark 26. A similar factorization can be written $M = Q_1R_1$, with $Q_1$ an $m\times m$ unitary matrix and $R_1$ an $m\times k$ rectangular matrix whose first $k$ rows are the upper triangular matrix $R$ and whose last $m-k$ rows are zero:
$$M = Q_1R_1 = [Q\ \ Q_2]\begin{bmatrix} R \\ 0 \end{bmatrix} = QR$$

    Proof of Theorem 25.


Every step of the Gram-Schmidt process (6), (7), ..., (9) is completed by normalization: let
$$(19)\qquad q_j = \alpha_j\frac{v_j}{\|v_j\|} \quad\text{for all } j = 1, \dots, k, \text{ where } \alpha_j\in\mathbb{C},\ |\alpha_j| = 1,$$
to obtain an orthonormal set $q_1, \dots, q_k$.

First, use (19) to replace the orthogonal set of vectors $v_1, \dots, v_k$ in (6), (7), ..., (9) by the orthonormal set $q_1, \dots, q_k$.

Then invert: write the $u_j$'s in terms of the $q_j$'s. Since $q_1 \in Sp(u_1)$, $q_2 \in Sp(u_1, u_2)$, ..., $q_j \in Sp(u_1, u_2, \dots, u_j)$, ..., then $u_1 \in Sp(q_1)$, $u_2 \in Sp(q_1, q_2)$, ..., $u_j \in Sp(q_1, q_2, \dots, q_j)$, ..., and therefore there are scalars $c_{ij}$ so that
$$(20)\qquad u_j = c_{1j}q_1 + c_{2j}q_2 + \dots + c_{jj}q_j \quad\text{for each } j = 1, \dots, k,$$

and since $q_1, \dots, q_k$ are orthonormal,
$$(21)\qquad c_{ij} = \langle q_i, u_j\rangle$$
Relations (20), (21) can be written in matrix form as
$$M = QR$$
with
$$Q = [q_1, \dots, q_k], \qquad R = \begin{bmatrix} \langle q_1, u_1\rangle & \langle q_1, u_2\rangle & \dots & \langle q_1, u_k\rangle \\ 0 & \langle q_2, u_2\rangle & \dots & \langle q_2, u_k\rangle \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & \langle q_k, u_k\rangle \end{bmatrix}$$

We have the freedom of choosing the constants $\alpha_j$ of modulus 1, and they can be chosen so that all diagonal elements of $R$ are positive (since $\langle q_j, u_j\rangle = \overline{\alpha_j}\,\frac{\langle v_j, u_j\rangle}{\|v_j\|}$, choose $\alpha_j = \langle v_j, u_j\rangle/|\langle v_j, u_j\rangle|$). $\square$

For numerical calculations the Gram-Schmidt process described above accumulates round-off errors. For large $m$ and $k$ other more efficient, numerically stable algorithms exist, and should be used.

Applications of the QR factorization to solving linear systems.

1. Suppose $M$ is an invertible square matrix. To solve $Mx = b$, factor $M = QR$; the system is $QRx = b$, or $Rx = Q^*b$, which can be easily solved since $R$ is triangular.

2. Suppose $M$ is an $m\times k$ rectangular matrix of full rank $k$. Since $m > k$, the linear system $Mx = b$ may be overdetermined. Using the QR factorization in Remark 26, the system is $Q_1R_1x = b$, or $R_1x = Q_1^*b$, and it is easy to see whether it has solutions: the last $m-k$ rows of $Q_1^*b$ must be zero. If this is the case, the system can be easily solved since $R$ is upper triangular.
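An added sketch of application 1 in the rectangular (reduced) form: np.linalg.qr computes the factorization of Theorem 25 by a numerically stable LAPACK routine, and solving $Rx = Q^*b$ recovers an exact solution when the system is consistent and the least squares solution of §3.3 otherwise. The data below is an arbitrary example.

import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((6, 3))        # full column rank, m > k

# Reduced QR factorization: Q is 6x3 with orthonormal columns, R is 3x3 upper triangular
Q, R = np.linalg.qr(M)
print(np.allclose(Q @ R, M))           # True

# A consistent right-hand side: b in R(M), so Mx = b has an exact solution
x_true = np.array([1.0, -2.0, 0.5])
b = M @ x_true
x = np.linalg.solve(R, Q.T @ b)        # solve Rx = Q*b (a triangular system)
print(np.allclose(x, x_true))          # True

# For a general b, the same steps return the least squares solution of 3.3
b_noisy = b + 0.1 * rng.standard_normal(6)
x_ls = np.linalg.solve(R, Q.T @ b_noisy)
print(np.allclose(x_ls, np.linalg.lstsq(M, b_noisy, rcond=None)[0]))   # True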