Upload
doankhuong
View
227
Download
0
Embed Size (px)
Citation preview
APPLIED MATRIX THEORY
j
Lecture Notes for Math 464/514 Presented by
DR. MONIKA NITSCHE
j
Typeset and Editted by
ERIC M. BENNER
j
STUDENTS PRESSDecember 3, 2013
Copyright © 2013
Contents
1 Introduction to Linear Algebra 1
1.1 Lecture 1: August 19, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 1
About the class, 1. Linear Systems, 1. Example: Application to boundary valueproblem, 2. Analysis of error, 3. Solution of the discretized equation, 4.
2 Matrix Inversion 5
2.1 Lecture 2: August 21, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Gaussian Elimination, 5. Inner-product based implementation, 7. Office hours andother class notes, 8. Example: Gauss Elimination, 8.
2.2 Lecture 3: August 23, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Example: Gauss Elimination, cont., 8. Operation Cost of Forward Elimination, 9.Cost of the Order of an Algorithm, 10. Validation of Lower/Upper Triangular Form, 11.Theoretical derivation of Lower/Upper Form, 11.
2.3 HW 1: Due August 30, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Factorization 15
3.1 Lecture 4: August 26, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Elementary Matrices, 15. Solution of Matrix using the Lower/Upper factorization, 18.Sparse and Banded Matrices, 18. Motivation for Gauss Elimination with Pivoting, 19.
3.2 Lecture 5: August 28, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Motivation for Gauss Elimination with Pivoting, cont., 19. Discussion of well-posedness, 20.Gaussian elimination with pivoting, 21.
3.3 Lecture 6: August 30, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Discussion of HW problem 2, 22. PLU factorization, 22.
3.4 Lecture 7: September 4, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 24
PLU Factorization, 24. Triangular Matrices, 25. Multiplication of lower triangular ma-trices, 25. Inverse of a lower triangular matrix, 25. Uniqueness of LU factorization, 26.Existence of the LU factorization, 26.
3.5 Lecture 8: September 6, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 27
About Homeworks, 27. Discussion of ill-conditioned systems, 27. Inversion of lowertriangular matrices, 28. Example of LU decomposition of a lower triangular matrix, 28.Banded matrix example, 29.
iii
Nitsche and Benner Applied Matrix Theory
3.6 Lecture 9: September 9, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 29
Existence of the LU factorization (cont.), 29. Rectangular matrices, 31.
3.7 HW 2: Due September 13, 2013 . . . . . . . . . . . . . . . . . . . . . . . 32
4 Rectangular Matrices 35
4.1 Lecture 10: September 11, 2013 . . . . . . . . . . . . . . . . . . . . . . . 35
Rectangular matrices (cont.), 35. Example of RREF of a Rectangular Matrix, 37.
4.2 Lecture 11: September 13, 2013 . . . . . . . . . . . . . . . . . . . . . . . 38
Solving Ax = b, 38. Example, 38. Linear functions, 39. Example: Transposeoperator, 40. Example: trace operator, 40. Matrix multiplication, 41. Proof oftransposition property, 42.
4.3 Lecture 12: September 16, 2013 . . . . . . . . . . . . . . . . . . . . . . . 42
Inverses, 42. Low rank perturbations of I, 43. The Sherman–Morrison Formula, 44.Finite difference example with periodic boundary conditions, 44. Examples of pertur-bation, 45. Small perturbations of I, 45.
4.4 Lecture 13: September 18, 2013 . . . . . . . . . . . . . . . . . . . . . . . 46
Small perturbations of I (cont.), 46. Matrix Norms, 47. Condition Number, 48.
4.5 HW 3: Due September 27, 2013 . . . . . . . . . . . . . . . . . . . . . . . 49
5 Vector Spaces 55
5.1 Lecture 14: September 20, 2013 . . . . . . . . . . . . . . . . . . . . . . . 55
Topics in Vector Spaces, 55. Field, 55. Vector Space, 56. Examples of functionspaces, 57.
5.2 Lecture 15: September 23, 2013 . . . . . . . . . . . . . . . . . . . . . . . 58
The four subspaces of Am×n, 58.
5.3 Lecture 16: September 25, 2013 . . . . . . . . . . . . . . . . . . . . . . . 61
The Four Subspaces of A, 62. Linear Independence, 63.
5.4 Lecture 17: September 27, 2013 . . . . . . . . . . . . . . . . . . . . . . . 64
Linear functions (rev), 64. Review for exam, 64. Previous lecture continued, 65.
5.5 Lecture 18: October 2, 2013. . . . . . . . . . . . . . . . . . . . . . . . . . 66
Exams and Points, 66. Continuation of last lecture, 66.
6 Least Squares 69
6.1 Lecture 19: October 4, 2013. . . . . . . . . . . . . . . . . . . . . . . . . . 69
Least Squares, 69.
6.2 Lecture 20: October 7, 2013. . . . . . . . . . . . . . . . . . . . . . . . . . 70
Properties of Transpose Multiplication, 71. The Normal Equations, 71. Exam 1, 73.
6.3 Lecture 21: October 9, 2013. . . . . . . . . . . . . . . . . . . . . . . . . . 74
Exam Review, 74. Least squares and minimization, 74.
6.4 HW 4: Due October 21, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 76
iv
Nitsche and Benner Applied Matrix Theory
7 Linear Transformations 81
7.1 Lecture 22: October 14, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 81
Linear Transformations, 83. Examples of Linear Functions, 83. Matrix representationof linear transformations, 83.
7.2 Lecture 23: October 16, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 84
Basis of a linear transformation, 84. Action of linear transform, 87. Change of Basis, 88.
7.3 Lecture 24: October 21, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 89
Change of Basis (cont.), 89.
7.4 Lecture 25: October 23, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 91
Properties of Special Bases, 91. Invariant Subspaces, 93.
7.5 HW 5: Due November 4, 2013 . . . . . . . . . . . . . . . . . . . . . . . . 94
8 Norms 99
8.1 Lecture 26: October 25, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 99
Difinition of norms, 99. Vector Norms, 99. The two norm, 99. Matrix Norms, 101.Induced Norms, 102.
8.2 Lecture 27: October 28, 2013 . . . . . . . . . . . . . . . . . . . . . . . . .102
Matrix norms (review), 102. Frobenius Norm, 102. Induced Matrix Norms, 104.
8.3 Lecture 28: October 30, 2013 . . . . . . . . . . . . . . . . . . . . . . . . .106
The 2-norm, 106.
9 Orthogonalization with Projection and Rotation 109
9.1 Lecture 28 (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .109
Inner Product Spaces, 109.
9.2 Lecture 29: November 1, 2013 . . . . . . . . . . . . . . . . . . . . . . . .110
Inner Product Spaces, 110. Fourier Expansion, 111. Orthogonalization Process(Gramm-Schmidt), 111.
9.3 Lecture 30: November 4, 2013 . . . . . . . . . . . . . . . . . . . . . . . .112
Gramm–Schmidt Orthogonalization, 112.
9.4 Lecture 31: November 6, 2013 . . . . . . . . . . . . . . . . . . . . . . . .115
Unitary (orthogonal) matrices, 116. Rotation, 117. Reflection, 118.
9.5 HW 6: Due November 11, 2013 . . . . . . . . . . . . . . . . . . . . . . .118
9.6 Lecture 32: November 8, 2013 . . . . . . . . . . . . . . . . . . . . . . . .120
Elementary orthogonal projectors, 120. Elementary reflection, 121. ComplimentarySubspaces of V, 121. Projectors, 121.
9.7 Lecture 33: November 11, 2013. . . . . . . . . . . . . . . . . . . . . . . .122
Projectors, 122. Representation of a projector, 123.
9.8 Lecture 34: November 13, 2013. . . . . . . . . . . . . . . . . . . . . . . .124
Projectors, 124. Decompositions of Rn, 125. Range Nullspace decomposition ofAn×n, 126.
9.9 HW 7: Due November 22, 2013 . . . . . . . . . . . . . . . . . . . . . . .126
v
Nitsche and Benner Applied Matrix Theory
9.10 Lecture 35: November 15, 2013. . . . . . . . . . . . . . . . . . . . . . . .128Range Nullspace decomposition of An×n, 128. Corresponding factorization of A, 129.
10 Singular Value Decomposition 131
10.1 Lecture 35 (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .131Singular Value Decomposition, 131.
10.2 Lecture 36: November 18, 2013. . . . . . . . . . . . . . . . . . . . . . . .132Singular Value Decomposition, 132. Existence of the Singular Value Decomposition, 133.
10.3 Lecture 37: November 20, 2013. . . . . . . . . . . . . . . . . . . . . . . .136Review and correction from last time, 136. Singular Value Decomposition, 136. Geometricinterpretation, 138.
10.4 Lecture 38: November 22, 2013. . . . . . . . . . . . . . . . . . . . . . . .139Review for Exam 2, 139. Norms, 139. More major topics, 140.
10.5 HW 8: Due December 10, 2013 . . . . . . . . . . . . . . . . . . . . . . .142
10.6 Lecture 39: November 27, 2013. . . . . . . . . . . . . . . . . . . . . . . .144Singular Value Decomposition, 144. SVD in Matlab, 145.
11 Additional Topics 149
11.1 Lecture 39 (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .149The Determinant, 149.
11.2 Lecture 40: December 2, 2013 . . . . . . . . . . . . . . . . . . . . . . . .150Further details for class, 150. Diagonalizable Matrices, 150. Eigenvalues and eigenvec-tors, 150.
Index 155
Other Contents 157
vi
UNIT 1
Introduction to Linear Algebra
1.1 Lecture 1: August 19, 2013
About the class
The textbook for the class will be Matrix Analysis and Applied Linear Algebra by Meyer.Another highly recommended text is Laub’s Matrix Analysis for Scientists and Engineers.
Linear Systems
A linear system may be of the general form
Ax = b. (1.1.1)
This may be represented in several equivalent ways.
2x1 + x2 − 3x3 = 18, (1.1.2a)
−4x1 + 5x3 = −28, (1.1.2b)
6x1 + 13x2 = 37. (1.1.2c)
This also may be put in matrix form 2 1 −3−4 0 5
6 13 0
x1
x2
x3
=
18−28
37
. (1.1.3)
Finally, a the third common form is vector form: 2−4
6
x1 +
10
13
x2 +
−350
x3 =
18−28
37
. (1.1.4)
1
Nitsche and Benner Unit 1. Introduction to Linear Algebra
t
y
t0 t1 t2 t3 · · · tn
y(t)
Figure 1.1. Finite difference approximation of a 1D boundary value problem.
Example: Application to boundary value problem
We will use finite difference approximations on a rectangular grid to solve the system,
− y′′(t) = f(t), for t ∈ [0, 1], (1.1.5)
with the boundary conditions
y(0) = 0, (1.1.6a)
y(1) = 0. (1.1.6b)
This is a 1D version of the general Laplace equation represented by,
−∆u = f (1.1.7)
or in more engineering/science form
−∇2u = f. (1.1.8)
The Laplace operator in cartesian coordinates,
∇2u =∇ · (∇u), (1.1.9a)
= uxx + uyy + uzz. (1.1.9b)
Finite Difference Approximation
Let tj = j∆t, with j = 0, . . . , N . The approximate forms of the solution yj ≈ y(tj).Now we need to approximate the derivatives with discrete values of the variables. The
forward difference approximation is
y′(tj) =yj+1 − yjtj+1 − tj
, (1.1.10)
or
y′(tj) =yj+1 − yj
∆t, (1.1.11)
2
1.1. Lecture 1: August 19, 2013 Applied Matrix Theory
The backward difference approximation is
y′(tj) =yj − yj−1
∆t. (1.1.12)
The centered difference approximation is
y′(tj) =yj+1 − yj−1
2∆t. (1.1.13)
Each of these are useful approximations to the first derivative that have varying propertieswhen applied to specific differential equations.
The second derivative may be approximated by combining the approximations of the firstderivative
(y′)′(tj) ≈y′j+ 1
2
− y′j− 1
2
∆t, (1.1.14a)
=yj+1−yj
∆t− yj−yj−1
∆t
∆t, (1.1.14b)
=yj+1 − 2yj + yj−1
∆t2. (1.1.14c)
Analysis of error
To understand the error of this approximation we may utilize the Taylor series . A generalTaylor series is
f(x) = f(a) + f ′(a)(x− a) +1
2f ′′(a)(x− a)2 +
1
3!f ′′′(a)(x− a)3 + · · · (1.1.15)
By the Taylor remainder theorem, we may approximate the error with a special truncationof the series,
f(x) = f(a) + f ′(a)(x− a) +1
2f ′′(a)(x− a)2 +
1
3!f ′′′(ξ)(x− a)3, (1.1.16)
or simply
f(x) = f(a) + f ′(a)(x− a) +1
2f ′′(a)(x− a)2 +O
((x− a)3
). (1.1.17)
The difference we are interested in to find the error is,
E = y′′(tj)−y(tj+1)− 2y(tj) + y(tj−1)
∆t2(1.1.18)
The Taylor series,
y(tj+1) = y(tj + ∆t) = y(tj) + y′(tj)∆t+O(∆t2), (1.1.19a)
y(tj−1) = y(tj −∆t) = y(tj)− y′(tj)∆t+O(∆t2)
(1.1.19b)
will need to be substituted.A function g is said to be order 2, or g = O(h2), if,
|g| ≤ Ch2. (1.1.20)
3
Nitsche and Benner Unit 1. Introduction to Linear Algebra
Solution of the discretized equation
We now substitute the discrete difference,
− yj+1 − 2yj + yj−1
∆t2= f(tj), for j = 1, . . . , n− 1 (1.1.21)
and the boundary conditions become
y0 = 0, (1.1.22a)
yn = 0. (1.1.22b)
This gives the linear system which will need to be solved for the unknowns yi.2 −1 0 · · · 0
−1 2 −1. . .
...
0 −1 2. . . 0
.... . . . . . . . . −1
0 · · · 0 −1 2
y1
y2...
yn−2
yn−1
= ∆t2
f(t1)f(t2)
...f(tn−2)f(tn−1)
. (1.1.23)
4
UNIT 2
Matrix Inversion
2.1 Lecture 2: August 21, 2013
Previously we came up with a tridiagonal system for finite difference solution last time.
Gaussian Elimination
We want to solve Ax = b. Claim: Gaussian elimination: A = LU
Notation:
A = [aij] (2.1.1)
Lower triangular system Lx = b. In class we use underlines to indicate the vector. Ingeneral these vectors are column vectors, and we will use xᵀ to indicate the row vector.
Lower triangular system Lx = b`11 0 0 0`21 `22 0 · · · 0`31 `32 `21 0
.... . .
...`n1 `n2 `n3 · · · `nn
x1
...
xn
=
b1
...
bn
(2.1.2)
or
`11x1 = b1 (2.1.3a)
`21x1 + `22x2 = b2 (2.1.3b)
· · · (2.1.3c)
`n1x1 + `n2x2 + · · ·+ `nnxn = bn (2.1.3d)
5
Nitsche and Benner Unit 2. Matrix Inversion
Rearranging to solve the equations,
x1 =b1
`11
(2.1.4a)
x2 =b2 − `21x1
`22
(2.1.4b)
· · · (2.1.4c)
xi =bi −
(`i(i−1)xi−1 + · · ·+ `i1x1
)`ii
(2.1.4d)
The basic algorithm for solution of the above system in pseudo code follows:
1: x1 ← b1/`11
2: for i← 2, n do3: xi ← [bi −
∑i−1k=1 `ikxk]/`ii
4: end for
The operation count, Nops, becomes,
Nops = 1 +n∑i=2
[1︸︷︷︸
division
+ 1︸︷︷︸substitution
+ (i− 1)︸ ︷︷ ︸multiplication
+ (i− 2)︸ ︷︷ ︸addition
]. (2.1.5)
Each of the terms arise directly from the steps of the algorithm shown above.
ASIDE: Finite sums
We need the following sums for our derivations of the operation counts,
n∑i=1
i =n(n+ 1)
2, (2.1.6)
n∑i=1
i2 =n(n+ 1)(2n+ 1)
6. (2.1.7)
Evaluating the operation count,
Nops = 1 +n∑i=2
(2i− 1), (2.1.8a)
=n∑i=1
(2i− 1), (2.1.8b)
= 2
(n∑i=1
i
)− n, (2.1.8c)
= n(n+ 1)− n, (2.1.8d)
= n2. (2.1.8e)
6
2.1. Lecture 2: August 21, 2013 Applied Matrix Theory
Implementation of lower triangular solution in Matlab
We give a Matlab code for this solution,
1 function x = L t r i s o l (L , b)2 % so l v e $Lx = b$ , assuming $L { i i } \ne 0$3 n = length (b ) ;4 % i n i t i a l i z e the s i z e o f your v e c t o r s5 x1 = b (1)/ l ( 1 , 1 ) ;6 for i = 2 : n7 x ( i ) = b( i ) ;8 for k = 1 : i−19 x ( i ) = x ( i ) − l ( i , k ) ∗ x ( k ) ;
10 end11 end12 %13 end
This would be saved as the code Ltrisol.m and would be run as
>> L = ...; b = ...;
>> x = Ltrisol(L, b)
Warning: Matlab loops are very slow!
Inner-product based implementation
How do we re-write the code as inner products?We can reorder the second for-loop so that it is simply an inner-product,
1 function x = L t r i s o l (L , b)2 % so l v e $Lx = b$ , assuming $L { i i } \ne 0$3 n = length (b ) ;4 % i n i t i a l i z e the s i z e o f your v e c t o r s5 x1 = b (1)/ l ( 1 , 1 ) ;6 for i = 2 : n7 x ( i ) = (b( i ) − l ( i , 1 : i −1)∗x ( 1 : i −1))/ l ( i , i ) ;8 end9 %
10 end
Note that the l(i,1:i-1) term is a row vector and x(1:i-1) is a column vector so thiscode will work fine. Recall that this required that x be initialized as a column vector. Theinner part can also be rewritten more cleanly as,
1 function x = L t r i s o l (L , b)2 % so l v e $Lx = b$ , assuming $L { i i } \ne 0$3 n = length (b ) ;4 % i n i t i a l i z e the s i z e o f your v e c t o r s5 x1 = b (1)/ l ( 1 , 1 ) ;6 for i = 2 : n7 k = 1 : i −1;8 x ( i ) = (b( i ) − l ( i , k )∗x ( k ) )/ l ( i , i ) ;9 end
7
Nitsche and Benner Unit 2. Matrix Inversion
10 %11 end
Office hours and other class notes
Office hours will be from 12–1 on MWF, the web address is, www.math.unm.edu/~nitsche/math464.html.
Example: Gauss Elimination
Example:
2x1 − x2 + 3x3 = 13 (2.1.9a)
−4x1 + 6x2 − 5x3 = −28 (2.1.9b)
6x1 + 13x2 − 16x3 = 37 (2.1.9c)
Let’s perform each step in full equation form. So we execute the steps R2 → R2 − (−2)R1
and R3 → R3 − (−3)R1.
2x1 − x2 + 3x3 = 13 (2.1.10a)
4x2 + x3 = −2 (2.1.10b)
16x2 + 7x3 = −2 (2.1.10c)
Next step will be R3 → R3 − (4)R2.
2.2 Lecture 3: August 23, 2013
Example: Gauss Elimination, cont.
Example:
2x1 − x2 + 3x3 = 13 (2.2.1a)
−4x1 + 6x2 − 5x3 = −28 (2.2.1b)
6x1 + 13x2 − 16x3 = 37 (2.2.1c)
Let’s perform each step in full equation form. So we execute the steps R2 → R2 − (−2)R1
and R3 → R3 − (−3)R1.
2x1 − x2 + 3x3 = 13 (2.2.2a)
4x2 + x3 = −2 (2.2.2b)
16x2 + 7x3 = −2 (2.2.2c)
8
2.2. Lecture 3: August 23, 2013 Applied Matrix Theory
Next step will be R3 → R3 − (4)R2.
2x1 − x2 + 3x3 = 13 (2.2.3a)
4x2 + x3 = −2 (2.2.3b)
3x3 = 6 (2.2.3c)
Now we begin the backward substitution.
x3 = 2; (2.2.4a)
x2 = (−2− x3)/4, (2.2.4b)
= −1; (2.2.4c)
x1 = (13 + x2 − 3x3)/2, (2.2.4d)
= 3. (2.2.4e)
Gauss Elimination is forward elimination and backward substitution. Now we will do thesame problem in matrix form, 2 −1 3 13
−4 6 −5 −286 13 16 37
→ 2 −1 3 13
0 4 1 −20 16 7 −2
, (2.2.5a)
→
2 −1 3 130 4 1 −20 0 3 6
. (2.2.5b)
Operation Cost of Forward Elimination
Now we want to know the operation count for the forward elimination step when we takeA→ U without pivoting for a general n× n matrix, A = [aij]. As an example of each step:
a11 a12 a13 a14 a15
a21 a22 a23 a24 a25
a31 a32 a33 a34 a35
a41 a42 a43 a44 a45
a51 a52 a53 a54 a55
→a11 a12 a13 a14 a15
0 a′22 a′23 a′24 a′25
0 a′32 a′33 a′34 a′35
0 a′42 a′43 a′44 a′45
0 a′52 a′53 a′54 a′55
(2.2.6a)
These operations are given by, rowj → rowj − `ijrowi, where `ij =aijaii
if aii 6= 0 (aii should
not be close to zero or we will need to use pivoting). An example, a1j → aij − a1ja11a1j = 0.
The next step,
→
a11 a12 a13 a14 a15
0 a′22 a′23 a′24 a′25
0 0 a′′33 a′′34 a′′35
0 0 a′′43 a′′44 a′′45
0 0 a′′53 a′′54 a′′55
(2.2.6b)
9
Nitsche and Benner Unit 2. Matrix Inversion
t
y
t0 t1 t2 t3 · · · tn
y(t)
(a) n grid
t
y
t0 t2 t4 t6 t8 t10 t12 t14 t16 · · · t4n
y(t)
(b) 4n grid
Figure 2.1. One-dimensional discrete grids.
At ith step (i = 1 : n− 1),B(n−i)×(n−i) → B(n−i)×(n−i), (2.2.7)
the cost of the individual step: n− i︸ ︷︷ ︸comp `ij
+ 2(n− i)2︸ ︷︷ ︸comp aij
. The total cost is thus,
Nops =n−1∑i=1
[(n− i) + 2(n− 1)2
](2.2.8a)
Let k = n− i then i = 1→ k = n− 1 and i = n− 1→ k = n− (n− 1) = 1
=1∑
k=n−1
(k + 2k2), (2.2.8b)
=(n− 1)n
2︸ ︷︷ ︸O(n2)
+2(n− 1)n(2(n− 1) + 1)
6︸ ︷︷ ︸O(n3)
, (2.2.8c)
≈ O(n3). (2.2.8d)
This means that the problem scales with order 3.
Cost of the Order of an Algorithm
For an order 3 algorithm, if you increase the size of your matrix by a factor of 2, the expenseof computer time will increase by a factor of 8. Similarly, if it took one day to solve aboundary value problem in 1D with n = 1000, then it will take 64 days to do n = 4000 (seefigure 2.1).
Alternatively, if you are doing a 2D simulation, increasing by a factor of 4, as shown infigure 2.2, would increase the domain to 16 and thus the calculations would increase to 163.This gets very expensive! This is one of the reasons that models of phenomena such as theweather is very difficult.
10
2.2. Lecture 3: August 23, 2013 Applied Matrix Theory
x
y
x0 xny0
yn
(a) n× n grid
x
y
x0 x4n
y0
y4n
(b) 4n× 4n grid
Figure 2.2. Two-dimensional discrete grids.
Validation of Lower/Upper Triangular Form
Consider that we have the Gaussian Elimination with A = LU, where
L =
(1 0`ij 1
). (2.2.9)
Check our previous system: 2 −1 3−4 6 −5
6 13 16
=
1 0 0−2 1 0
3 4 1
2 −1 30 4 10 0 3
. (2.2.10)
This works!
Theoretical derivation of Lower/Upper Form
We want to show that Gauss elimination naturally leads to the LU form using elementaryrow operations. The three elementary operations are:
1. Multiply row by α;
2. Switch rowi and rowj;
3. Add multiple of rowi to rowj.
All are equivalent to pre-multiplying A by an elementary matrix. Let’s illustrate these:
11
Nitsche and Benner Unit 2. Matrix Inversion
1. Multiply by α.1 0 0 00 1 0 · · · 00 0 α 0
......
0 0 0 · · · 1
︸ ︷︷ ︸
Ei
a11 a12 a13 a1n
a21 a22 a23 · · · a2n
a31 a32 a33 a3n...
. . ....
an1 an2 an3 · · · ann
=
a11 a12 a13 a1n
a21 a22 a23 · · · a2n
αa31 αa32 αa33 αa3n...
. . ....
an1 an2 an3 · · · ann
(2.2.11a)
2.3 Homework Assignment 1: Due Friday, August 30,
2013
1. Use Taylor series expansions of f(x± h) about x to show that
f ′′(x) =f(x+ h)− 2f(x) + f(x− h)
h2− h2
12f (4)(x) +O
(h4). (2.3.1)
2. Consider the two-point boundary value problem
y′′(x) = ex, y(−1) =1
e, y(1) = e (2.3.2)
where x ∈ [−1, 1], Divide the interval [−1, 1] into N equal subintervals and applythe finite difference method presented in class to find the approximate the solutionyj ≈ y(xj) at theN−1 interior points j = 1, . . . , N−1, where xj = a+jh, h = (b−a)/N ,and [a, b] = [−1, 1]. Compare the approximate values at the grid points with the exactsolution at the grid points. Use N = 2, 4, 8, . . . , 29 and report the maximal absoluteerror for each N in a table. Your writeup should contain:
• the Matlab code;
• a table with two columns. The first contains h, the second contains the corre-sponding maximal errors. By how much is the error reduced every time N isdoubled? Can you conclude whether the error is O(h), O(h2) or O(hp) for someother integer p?
Regarding Matlab: If needed, go over the Matlab tutorial on the course website,items 1–6. This covers more than you need for this problem. In Matlab, type
help diag or help ones
to find what these commands do. The (N−1)×(N−1) matrix with 2s on the diagonaland –1 on the off-diagonals can be constructed by
v=ones(1,n-1);
A=2*diag(v)-diag(v(1:n-2),1)-diag(v(1:n-2),-1);
12
2.3. HW 1: Due August 30, 2013 Applied Matrix Theory
The system Ax = b can be solved in Matlab by x = A\b. The maximal differencebetween two vectors x and y is error=max(abs(x-y)). Your code should have thefollowing structure
Listing 2.1. code stub for tridiagonal solver
1 disp ( sprintf ( h e r r o r )2 a = . . . ; b = . . . ; % Set va l u e s o f endpo in t s3 ya = . . . ; yb = . . . ; % Set va l u e s o f y at the endpo in t s4 for n = . . . ;5 h=2/n ;6 x=a : h : b ;7 % Set matrix A of the l i n e a r system to be so l v ed .8 v=ones (1 , n−1);9 A=2∗diag ( v)−diag ( v ( 1 : n−2) ,1)−diag ( v ( 1 : n−2) ,−1);
10 % Set r i g h t hand s i d e o f l i n e a r system .11 rhs = . . .12 % Solve l i n e a r system to f i nd approximate s o l u t i o n .13 y ( 2 : n)=A\ r h s ; y(1)=ya ; y (n+1)=yb ;14 % Compute exac t s o l u t i o n and approximation error15 yex = . . . % se t exac t s o l u t i o n16 plot (x , y , b− , x , yex , r − ) % to compare v i s u a l l y17 error=max(abs (y−yex ) )18 disp ( sprintf ( %15.10 f %20.15 f , h , e r ror ) )19 end
Note that in Matlab the index of all vectors starts with 1. Thus, x=-1:h:1, is avector of length n+ 1 and the interior points are x(2:n).
3. Let U be an upper triangular n× n matrix with nonzero entries uij, j ≥ i.
(a) Write an algorithm that solves Ux = b for a given right hand side b for theunknown x.
(b) Find the number of operations that it takes to solve for x, using your algorithmabove.
(c) Write a Matlab function function x=utrisol(u,b) that implements your al-gorithm and returns the solution x.
4. Given A,b below,
(a) find the LU factorization of A (using the Gauss Elimination algorithm);
(b) use it to solve Ax = b.
A =
2 −1 0 0−1 2 −1 0
0 −1 2 −10 0 −1 2
, b =
0005
. (2.3.3)
5. Sparsity of L and U, given sparsity of A = LU. If A, B, C, D have non-zeros in thepositions marked by x, which zeros (marked by 0) are still guaranteed to be zero in
13
Nitsche and Benner Unit 2. Matrix Inversion
their factors L and U? (B,C,D are all band matrices with p = 3 bands, but differingsparsity within the bands. The question is how much of this sparsity is preserved.) Ineach case, highlight the new nonzero entries in L and U.
A =
x x x xx x x 00 x x x0 0 x x
, B =
x 0 x 0 0 00 x 0 x 0 0x 0 x 0 x 00 x 0 x 0 00 0 x 0 x 00 0 0 x 0 x
,
C =
x x x 0 0 00 x 0 x 0 0x 0 x 0 x 00 x 0 x 0 00 0 x 0 x 00 0 0 x 0 x
, D =
x 0 0 x 0 00 x 0 0 x 0x 0 x 0 0 x0 x 0 x 0 00 0 x 0 x 00 0 0 x 0 x
,
6. Consider solving a differential equation in a unit cube, using N points to discretize eachdimension. That is, you have a total of N3 points at which you want to approximatethe solution. Suppose that at each time step, you need to solve a linear system Ax = b,where A is an N3×N3 matrix, which you solve using Gauss Elimination, and supposethere are no other computations involved. Assume your personal computer runs at 1GigaFLOPS, that is, it executes 109 floating point operations per second.
(a) How much time does it take to solve your problem for N = 500 for 1000 timesteps?
(b) When you double the number of points N , you typically also have to halve thetimestep, that is, double the total number of timesteps taken. By what factordoes the runtime increase each time you double N?
(c) How much time will it take to solve the problem if you use N = 2000?
14
UNIT 3
Factorization
3.1 Lecture 4: August 26, 2013
For the h in the homework, for n = 2.^(1:1:10). We want to deduce the order of themethod from the table of h and the error.
Elementary Matrices
1. Multiply rowi by α:
E1 =
1 0 0 0 0
0 . . . 0 0 0
0 0 α 0 0
0 0 0 . . . 0
0 0 0 0 1
. (3.1.1)
The inverse is,
E−11 =
1 0 0 0 0
0 . . . 0 0 0
0 0 1α
0 0
0 0 0 . . . 0
0 0 0 0 1
. (3.1.2)
E1E−11 = I (3.1.3)
2. Exchange rowi and rowj:
E2 =
1 0 0 0 0 00 1 0 0 0 00 0 0 1 0 00 0 1 0 0 00 0 0 0 1 00 0 0 0 0 1
. (3.1.4)
15
Nitsche and Benner Unit 3. Factorization
E22 = I (3.1.5)
3. Replace rowj by rowj + αrowi.
E3 =
1 0 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 α 1 0 00 0 0 0 1 00 0 0 0 0 1
. (3.1.6)
E−13 =
1 0 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 −α 1 0 00 0 0 0 1 00 0 0 0 0 1
. (3.1.7)
What happens if we post-multiply by the elementary matrices? The matrices will act onthe columns instead of the rows.
AE1 =
a11 a12 a13 a1n
a21 a22 a23 · · · a2n
a31 a32 a33 a3n...
. . ....
an1 an2 an3 · · · an
1 0 0 0 0
0 . . . 0 0 0
0 0 α 0 0
0 0 0 . . . 0
0 0 0 0 1
=
a11 a12 αa13 a1n
a21 a22 αa23 · · · a2n
a31 a32 αa33 a3n...
. . ....
an1 an2α an3 · · · an
(3.1.8)
AE2 =
a11 a12 a13 a1n
a21 a22 a23 · · · a2n
a31 a32 a33 a3n...
. . ....
an1 an2 an3 · · · an
1 0 0 0 0 00 1 0 0 0 00 0 0 1 0 00 0 1 0 0 00 0 0 0 1 00 0 0 0 0 1
(3.1.9)
Gaussian Elimination without pivoting
Premultiply by elementary matrices type 3 repeatedly.
`ji =ajiaii, for j > i (3.1.10)
E−21A =
x x x x x0 x x x xx x x x xx x x x xx x x x x
(3.1.11)
16
3.1. Lecture 4: August 26, 2013 Applied Matrix Theory
E−31E−21A =
x x x x x0 x x x x0 x x x xx x x x xx x x x x
(3.1.12)
This sequence continues until we have introduced zeros to get the lower diagonal:
E−n,n−1 · · ·E−n1 · · ·E−31E−21A =
x x x x x0 x x x x0 0 x x x0 0 0 x x0 0 0 0 x
= U (3.1.13)
Thus,A = E21E31 · · ·En−1,n−2En,n−2En,n−1︸ ︷︷ ︸
L
U (3.1.14)
E21E31 =
1 0 0 0 0 0`21 1 0 0 0 00 0 1 0 0 00 0 0 1 0 00 0 0 0 1 00 0 0 0 0 1
1 0 0 0 0 00 1 0 0 0 0`31 0 1 0 0 00 0 0 1 0 00 0 0 0 1 00 0 0 0 0 1
=
1 0 0 0 0 0`21 1 0 0 0 0`31 0 1 0 0 00 0 0 1 0 00 0 0 0 1 00 0 0 0 0 1
. (3.1.15)
Which extends to
E1 = En1 · · ·E21E31 =
1 0 0 0 0 0`21 1 0 0 0 0`31 0 1 0 0 0... 0 0
. . . 0 0`n−1,1 0 0 0 1 0`n1 0 0 0 0 1
. (3.1.16)
This further extends to,
E1E2 =
1 0 0 0 0 0`21 1 0 0 0 0`31 `32 1 0 0 0...
... 0. . . 0 0
`n−1,1 `n−1,2 0 0 1 0`n1 `n2 0 0 0 1
. (3.1.17)
Finally we get that
E1E2 · · · En−1 =
1 0 0 0 0 0`21 1 0 0 0 0`31 `32 1 0 0 0...
.... . . . . . 0 0
`n−1,1 `n−1,2 · · · `n−1,n−2 1 0`n1 `n2 · · · `n,n−2 `n,n−1 1
. (3.1.18)
17
Nitsche and Benner Unit 3. Factorization
Solution of Matrix using the Lower/Upper factorization
To use A = LU to solve Ax = b.
1. Find L,U (number of operations: 23n3)
2. L(Ux) = b First solve Ly = b (number of operations: n2), then solve, Ux = y(number of operations: n2).
Example:
To solve Ax = b_k k= 1,10^4
%
Find L, U once O(2/3 n^3)
then solve
L y = b
U x = y
10,000 times
O(10,000 * n^2 * 2)
Sparse and Banded Matrices
Given
A =
x 0 0 0 0
0 . . . 0 0 0
0 0 x 0 0
0 0 0 . . . 0
0 0 0 0 x
(3.1.19)
the bandwidth is 1. Below,
A =
x x 0 0 0 0x x x 0 0 00 x x x 0 00 0 x x x 00 0 0 x x x0 0 0 0 x x
, (3.1.20)
the bandwidth is 3—this is a tridiagonal matrix . This type of matrix maintains it’s sparsitywhen it undergoes LU decomposition.
x x 0 0 0 0x x x 0 0 00 x x x 0 00 0 x x x 00 0 0 x x x0 0 0 0 x x
=
1 0 0 0 0 0x 1 0 0 0 00 x 1 0 0 00 0 x 1 0 00 0 0 x 1 00 0 0 0 x 1
x x 0 0 0 00 x x 0 0 00 0 x x 0 00 0 0 x x 00 0 0 0 x x0 0 0 0 0 x
. (3.1.21)
18
3.2. Lecture 5: August 28, 2013 Applied Matrix Theory
Motivation for Gauss Elimination with Pivoting
When does Gauss elimination give us a problem? For example
1.
(0 11 1
)
2. A =
(δ 11 1
). Solve Ax =
(1 + δ
2
), the exact solution is
(11
). However, we run into
numerical problems.
3.2 Lecture 5: August 28, 2013
Motivation for Gauss Elimination with Pivoting, cont.
When does Gauss elimination give us a problem? Returning to the example problem, A =(δ 11 1
). Solve Ax =
(1 + δ
2
), the exact solution is
(11
), but we run into numerical
problems.There are a couple approaches to this problem. First, solve for x by first finding L,U
and using them numerically,
A =
(δ 11 1
)→(δ 10 1− 1
δ
)= U (3.2.1)
and
L =
(1δ
10 1
)(3.2.2)
Now we want to solve L (Ux) = b
1 for j =1:162 d e l t a = 10ˆ(− j ) ;3 b = [ 1 + del ta , 2 ] ;4 L = [ 1 , 0 ; 1/ de l ta , 1 ] ;5 U = [ de l ta , 1 ; 0 , 1−1/ de l t a ] ;6 % Solve Ly = b \ to y7 y (1 ) = b ( 1 ) ; y (2 ) = b (2) − L(2 ,1 )∗ y ( 1 ) ;8 % Solve Ux = y \ to x9 x (2 ) = y (2)/ u ( 2 , 2 ) ; x (1 ) = ( y (1 ) − u (1 ,2 )∗ x ( 2 ) ) / u ( 1 , 1 ) ;
10 %11 disp ( sprintf ( ’ %5.0 e %20.15 f %20.15 f %10.8 e ’ , de l ta , x ( 1 ) , x ( 2 ) ,norm(x− [ 1 , 1 ] ) ) ;12 end
Note that the norm is the Euclidian norm, x − [1, 1] =√
(x(1)− 1)2 + (x(2)− 1)2 . Thisgives us a table of results as shown below
Conclusion: Ax = b is a good problem (well-posed) introducing small perturbations(e.g., by roundoff) does not change the solution by much. Matlab’s algorithm A\b is agood algorithm (stable); LU decomposition does not give a good algorithm (unstable).
19
Nitsche and Benner Unit 3. Factorization
Table 3.1. Variation of error with the perturbation variable
δ x(1) x(2) ||x− [1, 1]||21e-01 1.000 1.000 8e-161e-02 1.000 1.000 1e-131e-03 0.999 1.000 6e-121e-04 1.000. . . 28 1.000 e-111e-05 . . . 1.000 e-10. . . . . . . . . . . .1e-16 0.888 1.000 e-0
Discussion of well-posedness
Geometrically, Ax = b,
δx1 + x2 = 1 + δ, (3.2.3a)
x1 + x2 = 2. (3.2.3b)
This is a well-posed system. Rearranging
x2 ≈ 1− δx1, x2 = 2− x1. (3.2.4a)
Our other system Ly = b,
y1 = 1 (3.2.5a)
1
δy1 + y2 = 2 (3.2.5b)
This makes a very ill-posed system because small wiggles in δ give much larger errors becausethe slopes are so near each other.
Now we consider Ux = y,
δx1 + x2 = 1, (3.2.6a)(1− 1
δ
)x2 = y2. (3.2.6b)
This is also ill-posed as well. All of these linear problems are illustrated in figure 3.1.
20
3.2. Lecture 5: August 28, 2013 Applied Matrix Theory
x1
x2
(1, 1)
(a) Ax = b
x1
x2
(b) Ly = b
x1
x2
(c) Ux = y
Figure 3.1. Plot of linear problems and their solutions.
Gaussian elimination with pivoting
Pivoting means we exchange rows such that the current |aii| = maxj≥i|aji|. Similarly, `ji =
ajiaii≤ 1 for all j > i. Now,[
δ 1 1 + δ1 1 2
]→[
1 1 2δ 1 1 + δ
](3.2.7a)
R2←R2−δR1−−−−−−−→[
1 1 20 1− δ 1 + δ − 2δ
](3.2.7b)
→[
1 1 20 1− δ 1− δ
](3.2.7c)
PLU always works. Theorem: Gaussian elimination with pivoting yields PA = LU. Thepermutation matrix is P. Every matrix has a PLU factorization.
To do the pivoting, at each step, first premultiply A by
Pk =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 0 1 0 00 0 1 0 0 0
0 0 0 0. . . 0
0 0 0 0 0 1
(3.2.8)
then premultiply by
Lk =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 1 1 0 00 0 `k−1,k 1 0 0
0 0... 0
. . . 00 0 `n,k 0 0 1
(3.2.9)
21
Nitsche and Benner Unit 3. Factorization
We do this in succession,
Ln−1Pn−1 · · ·L2P2L1P1A = U (3.2.10)
How do these commute into a useful P and L matrix?
3.3 Lecture 6: August 30, 2013
Discussion of HW problem 2
− yj−1 + 2yj − yj+1 = h2f(xj), for j = 1, . . . , n− 1. (3.3.1)
2 −1 0 · · · 0
−1 2 −1. . .
...
0 −1 2. . . 0
.... . . . . . . . . −1
0 · · · 0 −1 2
y1
y2...
yn−2
yn−1
= h2
f(t1) + y0
f(t2)...
f(tn−2)f(tn−1) + yn
. (3.3.2)
So we’ve set up our matrix
rhs = matrix of zeros size \(1 \times n-1\)
for
A_{(n-1)x(n-1)}
x = a:h:b = linspace(a,b,n+1)
rhs = h^2*f(x(2:n));
rhs(1) = rhs(1) + ya;
rhs(n-1) = rhs(n-1) + yb;
Recall that our f(x) = −ex:
− y′′ = −ex (3.3.3)
PLU factorization
For PLU factorization, we are doing Gauss elimination with pivoting. At each kth stepof Gaussian elimination, switch rows so that the pivots, a
(k)kk , are the largest number by
magnitude in the kth column.
For example, 1 −1 3−1 0 −2
2 2 4
x1
x2
x3
=
−310
. (3.3.4)
22
3.3. Lecture 6: August 30, 2013 Applied Matrix Theory
or 1 −1 3 −3−1 0 −2 1
2 2 4 0
→ 2 2 4 0−1 0 −2 1
1 −1 3 −3
, row1 ↔ row3 (3.3.5a)
→
2 2 4 00 1 −0 10 −2 1 −3
, row2 ← row2 −1
3row1, and row3 ← row3 −
1
2row1
(3.3.5b)
→
2 2 4 00 −2 1 −30 1 −0 1
, row2 ↔ row3 (3.3.5c)
→
2 2 4 00 −2 1 −30 0 1/2 −1/2
, row3 ← row3 −(−1
2
)row2
(3.3.5d)
We need to do the back substitution to solve this system. But more importantly, we wantto know what the factorization of this system would be. Recall,
Lk =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 1 0 0 00 0 `k−1,k 1 0 0
0 0... 0
. . . 00 0 `n,k 0 0 1
, (3.3.6)
and
L−(n−1)Pn−1 · · ·L−2P2L−1P1A = U. (3.3.7)
Reordering,
Pn−1 · · ·L−2P2L−1P1A = L(n−1)U. (3.3.8)
We want to move each P to be right next to A and all the Ls such that we can form a trueL. Claim,
PjL−k = L−kPj, j > k. (3.3.9)
Pj permutation moves columns below the kth row. This allows us to move L’s out.
PjL−kPj = L−k (3.3.10a)
L−n · · · ˜L−1Pn−1 · · ·P1A = U (3.3.11)
23
Nitsche and Benner Unit 3. Factorization
Now we can return to our example but with keeping track of the 1 −1 3 −3−1 0 −2 1
2 2 4 0
→ 2 2 4 0−1 0 −2 1
1 −1 3 −3
, row1 ↔ row3,P1 =
0 0 10 1 01 0 0
(3.3.12a)
→
2 2 4 0
−12
1 −0 1
12−2 1 −3
, row2 ← row2 −(−1
2
)row1, row3 ← row3 −
1
2row1
(3.3.12b)
→
2 2 4 0
12−2 1 −3
−12
1 −0 1
, row2 ↔ row3,P2 =
0 0 11 0 00 1 0
(3.3.12c)
→
2 2 4 0
12
−2 1 −3
−12
−12
1/2 −1/2
, row3 ← row3 −(−1
2
)row2
(3.3.12d)
Because P = P−1, we should remember that,
PA = LU (3.3.13a)
A = PLU. (3.3.13b)
3.4 Lecture 7: September 4, 2013
PLU Factorization
RecallPA = LU (3.4.1)
always exists by construction. This is because we can make anything non-zero by the per-mutation. This is also equivalent to,
A = PLU (3.4.2)
because P = P−1. To use this in an actual solution,
PAx = Pb, (3.4.3)
orLUx = Pb, (3.4.4)
So this system is determined by:
24
3.4. Lecture 7: September 4, 2013 Applied Matrix Theory
1. Solving Ly = Pb,
2. Solving Ux = y.
In Matlab, we would use the commands [L,U,P] = lu(A), to find these three matrices.This factorization is not unique. We want to show the uniqueness of the LU factorization,and are also interested in when it exists.
Triangular Matrices
We are interested in the determinants of lower or upper triangular matrices. Let’s discussdet(L).
L =
`11 0 0 0 0...
. . . 0 0 0`i1 · · · `jj 0 0... · · · ...
. . . 0`n1 · · · `nj . . . `nn
(3.4.5)
the determinant is det(L) =∏n
i=1 `ii. Thus L is invertible only if `ii 6= 0 for all `ii. Weconjecture the product of two lower triangular matrices will give us lower a triangular matrix.e.g.
L1L2 = L12 (3.4.6)
We want to prove this!
Multiplication of lower triangular matrices
Prove that L1L2 is lower triangular. Assume AB are lower triangular. Show C = AB islower triangular. We know that bijaij = 0 for j > i. In our proof, we first consider matrixmultiplication.
eij =∑
aikbkj. (3.4.7)
We know that aik = 0 for k > i, and bkj = 0 for j > k. If j > i, then when k < i we havethat k < j so bkj = 0. Alternatively, if k > i then aik = 0. Thus, in either case one of thetwo products is zero and we have proved our hypothesis.
Inverse of a lower triangular matrix
A lower triangular matrix’s inverse is also a lower triangular matrix;
L−1 =
`11 · · · 0...
. . ....
`n1 · · · `nn
= Lower triangular (3.4.8)
So, this helps with inversion of the form,
L−n · · ·L−2L−1A = U. (3.4.9)
25
Nitsche and Benner Unit 3. Factorization
For matrixes of the form
L−k =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 1 0 0 00 0 −`ij 1 0 0
0 0... 0
. . . 00 0 −`nj 0 0 1
; (3.4.10)
the inverse matrix is
Lk =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 1 0 0 00 0 `ij 1 0 0
0 0... 0
. . . 00 0 `nj 0 0 1
. (3.4.11)
For any
Lk =
1 0 0 0 0...
. . . 0 0 00 · · · 1 0 0... 0
.... . . 0
0 · · · `in . . . 1
`11 0 0 0 0...
. . . 0 0 00 · · · `ii 0 0... 0
.... . . 0
0 · · · 0 . . . `nn
(3.4.12)
To find L−1, [L I]GE−−→ [I L−1]. Use Gaussian elimination on L, and we go through each
column.
Uniqueness of LU factorization
Theorem: If A is such that no non-zero pivots are encountered, then A = LU with `ii = 1and uii 6= 0, which are the pivots. For, `ij =
aijaii
for j < i by construction.Proof: Assume A = L1U1 = L2U2, then
L−12 L1U1 = U2, (3.4.13a)
L−12 L1 = U2U
−11 (3.4.13b)
= diagonal matrix (3.4.13c)
= I. (3.4.13d)
If this is the case, then L−12 L1 = I or L2 = L1, and similarly U2 = U1. Thus these matrices
are the same and the solution must be unique.
Existence of the LU factorization
Theorem: A = LU with no zero pivots, then all leading principal submatrices Ak are non-singular. We define the leading principle sub matrices Ak of An×n is Ak = A(1:k),(1:k). Theseare the upper-left square matrices of the full matrix.
26
3.5. Lecture 8: September 6, 2013 Applied Matrix Theory
Part 2. A = LU then define Ak 6= 0 for any k. We want to prove that if A = LU, showthat Ak is invertible. Then if Ak is invertible show that A = LU.
3.5 Lecture 8: September 6, 2013
About Homeworks
The median score was 50 out of 60. A histogram was shown with the general grade distri-bution. 1 around 10, 3 around 25, 1 around 40, 4 from 45–50, 4 from 50–55, 6 from 55–60.Comments: write in working Matlab code. Also, L must have ones on the diagonal, whileU has pivots on the diagonal. “Computing efficiently” means using the LU decomposition,not invert the matrix A.
For homework 2, we will have applications of finding the inverse of A or solve
AX = I (3.5.1)
or
A(x1 x2 · · · xn
)=(e1 e2 · · · en
)(3.5.2)
To find A−1, solve
Axj = ej, (3.5.3)
for all j = 1, 2, . . . , n. Use the LU decomposition.
Discussion of ill-conditioned systems
We define Ax = b as an ill-conditioned system if small changes in A or b introduces largechanges in the solution. Geometrically we showed this interpretation previously on a 2 × 2system, and we noted that the slopes were very similar to each-other. Numerically, we havetrouble because the roundoff when we solve Ax = b. We also may compute a conditionnumber which tells us the amplification factor of errors in the system.
In Matlab, the command cond(A) gives you the condition. This should hopefully beunder a thousand. The condition number essentially tells you how much accuracy you canexpect to get from the final solution. In other words, if your condition number is 1 × 105
then you can only expect to have about 11 significant digits in our solution at floating pointarithmetic.
27
Nitsche and Benner Unit 3. Factorization
Inversion of lower triangular matrices
Show that if A is a lower triangular matrix then so is A−1. So let’s solve AX = I with Alower triangular.
x 0 0 0 0 1 0 0 0 0x x 0 0 0 0 1 0 0 0x x x 0 0 0 0 1 0 0x x x x 0 0 0 0 1 0x x x x x 0 0 0 0 1
→
x 0 0 0 0 1 0 0 0 0x x 0 0 0 y 1 0 0 0x x x 0 0 y 0 1 0 0x x x x 0 y 0 0 1 0x x x x x y 0 0 0 1
, (3.5.4a)
→
x 0 0 0 0 1 0 0 0 0x x 0 0 0 y 1 0 0 0x x x 0 0 y y 1 0 0x x x x 0 y y 0 1 0x x x x x y y 0 0 1
, (3.5.4b)
→
x 0 0 0 0 1 0 0 0 0x x 0 0 0 y 1 0 0 0x x x 0 0 y y 1 0 0x x x x 0 y y y 1 0x x x x x y y y 0 1
, (3.5.4c)
→
x 0 0 0 0 1 0 0 0 0x x 0 0 0 y 1 0 0 0x x x 0 0 y y 1 0 0x x x x 0 y y y 1 0x x x x x y y y y 1
. (3.5.4d)
We now have shown that we can get the lower triangular matrix A into the form LD. Nowwe do backward substitution to get our X. In this case this is simply deviding each row bythe value of the pivot of that row. In this way with D = U, we have X = D−1L−1.
Example of LU decomposition of a lower triangular matrix
Given the matrix,
2 0 01 3 02 1 4
=
1 0 012
1 01 1
31
2 0 00 3 00 0 4
, (3.5.5a)
= LU. (3.5.5b)
28
3.6. Lecture 9: September 9, 2013 Applied Matrix Theory
Banded matrix example
Exercise 3.10.7: Band matrix A with bandwidth w is a matrix with aij = 0 if |i− j| > w.If w = 0, we have a diagonal matrix.
Aw=0 =
a11 0 0 0 00 a22 0 0 00 0 a33 0 00 0 0 a44 00 0 0 0 a55
. (3.5.6)
For bandwidth, w = 1,
Aw=1 =
a11 a12 0 0 0a21 a22 a23 0 00 a32 a33 a34 00 0 a43 a44 a45
0 0 0 a54 a55
. (3.5.7)
For bandwidth, w = 2,
Aw=2 =
a11 a12 a13 0 0a21 a22 a23 a24 0a31 a32 a33 a34 a35
0 a42 a43 a44 a45
0 0 a53 a54 a55
. (3.5.8)
In the LU decomposition these zeros are preserved. However there are other cases (as shownin the homework) where the zeros may not be preserved.
We will return to our theorem on Monday. For the homework, a matrix has an LUdecomposition if and only if all principle submatrices are invertible.
3.6 Lecture 9: September 9, 2013
Existence of the LU factorization (cont.)
When does LU factorization exist? Theorem: If no zero pivots that appears in Gaussianelimination (including the nth one) then A = LU, `ii = 1 and uii 6= 0 are pivots. Then L,U are unique.
Theorem: A = LU if and only if the leading principle submatrices Ak is invertible.Proof: Assume (for block matrices of length k × k, n− k × n− k and the difference)
A = LU, (3.6.1)
=
(L11 0L21 L22
)(U11 U12
0 U22
), (3.6.2)
=
(L11U11 L11U12
L21U11 L22U22
)(3.6.3)
29
Nitsche and Benner Unit 3. Factorization
Now our question: is Ak = L11U11? We know that det L11 =∏k
j=1 `jj 6= 0 so L11 is
invertible. Similarly, U11 =∏k
j=1 ujj 6= 0 so it is also invertibles. Since we know that theproduct of two invertible matrices is also invertible, Ak must also be invertible.
We will now do a proof by induction: If we assume that all Ak are invertible. Show thatA = LU.
ASIDE: Example of proof by induction.
We want to show,n∑
j=1
j2 =n(n+ 1)(2n+ 1)
6. (3.6.4)
The steps of proof by induction are
1. First we show that this holds for n = 1,
2. next we assume it holds for n,
3. finally we show that it holds for n+ 1.
Let’s show the third step,
n+1∑j=1
j2 =
n∑j=1
j2 + (n+ 1)2, (3.6.5a)
=n(n+ 1)(2n+ 1)
6+ (n+ 1)2, (3.6.5b)
=n(n+ 1)(2n+ 1) + 6(n+ 1)2
6, (3.6.5c)
=(n+ 1) [n(2n+ 1) + 6(n+ 1)]
6, (3.6.5d)
=(n+ 1)
[2n2 + 7n+ 1
]6
, (3.6.5e)
=(n+ 1)(n+ 2)(2n+ 3)
6. (3.6.5f)
Which is what would be expected, and we have proved this relation by induction.
So for our system,
1. First we show that this holds for n = 1,
A = [a11] = [1] [a11] where a11 6= 0.
2. Assume true for n:
If Ak, k = 1, . . . , n are invertible, then An×n = Ln×nUn×n.
3. Show it holds for n+ 1.
So let’s move onto the third step, assume A(n+1)×(n+1) with Ak, k = 1, . . . , n+1 are invertible.By induction assumption An = LnUn, since A1, . . . ,An are invertible. Now we need to showthat An+1 = Ln+1Un+1,
An+1 =
(An bcᵀ α
), (3.6.6a)
=
(LnUn b
cᵀ α
), (3.6.6b)
=
(Ln 0yᵀ 1
)(Un x0ᵀ
β
). (3.6.6c)
30
3.6. Lecture 9: September 9, 2013 Applied Matrix Theory
We want Lnx = b so we let x = L−1n b which supposes that L−1
n exists. We also wantyᵀUn = cᵀ so we let yᵀ = cᵀU−1
n . Finally, we want yᵀx + β = α, so we let β = α−yᵀx. Weknow,
An+1 =
(LnUn b
cᵀ α
), (3.6.7a)
=
(Ln 0
cᵀU−1n 1
)(Un L−1
n b0ᵀ
α− cᵀU−1n L−1
n b
). (3.6.7b)
Since A = An+1 is invertible, we must have β 6= 0 because if β = 0 then det(Ln+1) det(Un+1) =0, in which case An+1 would not be invertible. So, An+1 has an LU decomposition and byprinciple of induction we have proven our theorem.
Rectangular matrices
For a rectangular matrix Am×n ∈ Rm×n. Our question: is Ax = b solvable? Is the solutionunique? We are presented with there options: no solution, unique solution, or infinitelymany solutions. We are going to do Gaussian elimination to reduce the form of the matrixto see how many solutions we will have. So we will do row echelon form (REF) reduction.
Example of row echelon form
A =
1 2 1 3 32 4 0 4 41 2 3 5 52 4 0 4 7
, (3.6.8a)
→
1 2 1 3 30 0 −2 −2 −20 0 2 2 20 0 −2 −1 1
, (3.6.8b)
→
1 2 1 3 30 0 1 1 10 0 0 0 00 0 0 0 2
, (3.6.8c)
→
1 2 1 3 30 0 1 1 10 0 0 0 10 0 0 0 0
. (3.6.8d)
Where we made interchanges to have leading ones for the columns. What do we know aboutour matrix A from this information? First, we know what columns are linearly independent.We are trying to find the column space of our matrix.
31
Nitsche and Benner Unit 3. Factorization
3.7 Homework Assignment 2: Due Friday, September
13, 2013
1. Textbook 3.10.1 (a, c): LU and PLU factorizations
Let, A =
1 4 54 18 263 16 30
.
(a) Determine the LU factors of A
(c) Use the LU factors to determine A−1
2. Textbook 3.10.2
Let A and b be the matrices,
A =
1 2 4 173 6 −12 32 3 −3 20 2 −2 6
and b =
17334
.(a) Explain why A does not have an LU factorization.
(b) Use partial pivoting and find the permutation matrix P as well as the LU factorssuch that PA = LU.
(c) Use the information in P, L, and U to solve Ax = b.
3. Textbook 3.10.3
Determine all values of ξ for which A =
ξ 2 01 ξ 10 1 ξ
fails to have an LU factorization.
4. Textbook 3.10.5
If A is a matrix that contains only integer entries and all of its pivots are 1, explainwhy A−1 must also be an integer matrix. Note: This fact can be used to constructrandom integer matrices that posses integer inverses by randomly generating integermatrices L and U with unit diagonals and then constructing the product A = LU.
5. Lower triangular matrices
Let A be a 3× 3 matrix with real entries. We showed that GE is equivalent to findinglower triangular matrices L−1 and L−2 such that L−2L−1A = U where U is uppertriangular and,
L−1 =
1 0 0−`21 1 0−`31 0 1
, L−2 =
1 0 00 1 00 −`32 1
, (3.7.1)
32
3.7. HW 2: Due September 13, 2013 Applied Matrix Theory
with
(L−1)−1 =
1 0 0`21 1 0`31 0 1
= L1, (L−2)−1 =
1 0 00 1 00 `32 1
= L2. (3.7.2)
It follows that A = L2L1U. Show that
L2L1 =
1 0 0`21 1 0`31 `32 1
. (3.7.3)
Show by example that generally,
L2L1 6= L1L2 (3.7.4)
That is, the order in which these lower triangular matrices are multiplied matters.
6. Textbook 1.6.4: Conditioning
Using geometric considerations, rank the following three systems according to theircondition.
(a)
1.001x− y = 0.235,
x+ 0.0001y = 0.765.
(b)
1.001x− y = 0.235,
x+ 0.9999y = 0.765.
(c)
1.001x+ y = 0.235,
x+ 0.9999y = 0.765.
7. Textbook 1.6.5
Determine the exact solution of the following system:
8x+ 5y + 2z = 15,
21x+ 19y + 16z = 56,
39x+ 48y + 53z = 140.
Now change 15 to 14 in the first equation and again solve the system with exactarithmetic. Is the system ill-conditioned?
33
Nitsche and Benner Unit 3. Factorization
8. Textbook 1.6.6
Show that the system
v − w − x− y − z = 0,
w − x− y − z = 0,
x− y − z = 0,
y − z = 0,
z = 1,
is ill-conditioned by considering the following perturbed system:
v − w − x− y − z = 0,
− 1
15v + w − x− y − z = 0,
− 1
15v + x− y − z = 0,
− 1
15v + y − z = 0,
− 1
15v + z = 1.
34
UNIT 4
Rectangular Matrices
4.1 Lecture 10: September 11, 2013
Rectangular matrices (cont.)
We are interested in a rectangular matrix, Am×n. We may apply REF, or RREF to find thecolumn dependence, what the basic columns are, and what the rank of the matrix is. Thisway we can find for any system Ax = b, whether the system is consistent and find all thesolutions; whether it is homogeneous, or what the free variables are; and what the particularsolutions are. Last time’s example, we went from
A =
1 2 1 3 32 4 0 4 41 2 3 5 52 4 0 4 7
, (4.1.1a)
→
1 2 1 3 30 0 2 2 20 0 0 0 30 0 0 0 0
. (4.1.1b)
The first, third, and fifth columns have pivots and are the basic columns. They correspondto the linearly independent columns in A. How do we write the other two columns (c2, c4)as functions of the other three columns? We can notice that, c2 = 2c1, and similarly c4 =2c1 + c3. The reduced row echelon form (RREF) has pivots on 1, and zeros below and above
35
Nitsche and Benner Unit 4. Rectangular Matrices
x1
x2
(a) Intersecting system (onesolution)
x1
x2
(b) Parallel system (no solu-tion)
x1
x2
(c) Equivalent system (infi-nite solutions)
Figure 4.1. Geometric illustration of linear systems and their solutions.
all pivots. So, 1 2 1 3 30 0 2 2 20 0 0 0 30 0 0 0 0
→
1 2 1 3 30 0 1 1 10 0 0 0 10 0 0 0 0
, (4.1.2a)
→
1 2 1 3 00 0 1 1 00 0 0 0 10 0 0 0 0
, (4.1.2b)
→
1 2 0 2 00 0 1 1 00 0 0 0 10 0 0 0 0
. (4.1.2c)
In this form, the basic columns are very clear, and the relations between the dependentcolumns and the basic columns is also obvious. So again we can see that, c2 = 2c1 andc4 = 2c1 +1c3. The rank of the matrix is the number of linearly independent columns, whichis also the number of linearly independent rows, and also the number of pivots in row-echelonform of the matrix. A consistent system, Ax = b is a system that has at least one solution.It is inconsistent if it has no solutions. To determine if Ax = b is consistent, in a 2 × 2system, Ax = b,
a11x1 + a12x2 = b1, (4.1.3a)
a21x1 + a22x2 = b2. (4.1.3b)
Since this system is a linear system we can see three cases: one intersection, parallel andseparated, and parallel and the same. Each of these cases are illustrated in Figure 4.1.In general, for any size matrix, we find the row echelon form of the augmented system
36
4.1. Lecture 10: September 11, 2013 Applied Matrix Theory
[A b]→[E b
]. x x x x x
0 x x x x0 0 0 0 α
(4.1.4)
If α 6= 0, then the system is inconsistent. So Ax = b is consistent if rank([A b]) = rank(A).If α = 0 then b is not a basic column of (A b). The we can write b as a linear combinationof the basic columns of E. We can write b as linear combinations of basic columns of A.
In our example, we had c1, c3, and c5 where the basic columns and Ax = b was consistent.Here then if we were to preform a reduction, the b = x1c1 + x3c3 + x5c5, or in other words,
A
x1
0x3
0x5
= b. (4.1.5)
Example of RREF of a Rectangular Matrix
Given the matrix, 1 1 2 2 1 12 2 4 4 3 12 2 4 4 2 23 5 8 6 5 3
→
1 1 2 2 1 10 0 0 0 1 −10 0 0 0 0 00 2 2 0 2 0
, (4.1.6a)
→
1 1 2 2 1 10 2 2 0 2 00 0 0 0 1 −10 0 0 0 0 0
. (4.1.6b)
Thus, our system is consistent. We have that rank([A b]) = rank(A). Similarly, we observethat we have 3 basic columns, r, and 2 linearly dependent columns, n− r. (If n > m, thenn > r, so n− r 6= 0). Let’s continue on to perform the reduced row echelon form.
1 1 2 2 1 10 2 2 0 2 00 0 0 0 1 −10 0 0 0 0 0
→
1 1 2 2 1 10 1 1 0 1 00 0 0 0 1 −10 0 0 0 0 0
, (4.1.7a)
→
1 1 2 2 0 20 1 1 0 0 10 0 0 0 1 −10 0 0 0 0 0
, (4.1.7b)
→
1 0 1 2 0 10 1 1 0 0 10 0 0 0 1 −10 0 0 0 0 0
. (4.1.7c)
37
Nitsche and Benner Unit 4. Rectangular Matrices
Thus our b = 1c1 + 1c2 − 1c5. Therefore, b = 1c1 + 1c2 − 1c5, and
x =
1100−1
. (4.1.8)
So in review, 1 1 2 2 1 12 2 4 4 3 12 2 4 4 2 23 5 8 6 5 3
→
1 0 1 2 0 10 1 1 0 0 10 0 0 0 1 −10 0 0 0 0 0
. (4.1.9)
We found a particular solution, xp = (1 1 0 0 − 1)ᵀ of Ax = b. For any solution xh ofAx = 0, we have that A (xp + xH) = b + 0. So (xp + xH) also solves Ax = b.
4.2 Lecture 11: September 13, 2013
Solving Ax = b
Ax = b is consistent if rank[A |b] = rank(A). We have that b is a nonbasic column of[A |b]. We can express b in terms of columns of A to get a solution Axp = b. The set of allsolutions is xp + xH , where Axp = b has the particular solution to Ax = b. We also solveAxH = 0, and get all homogeneous solutions, xH . Since we can add these two solutions, wehave A (xp + xH) = b.
Now to actually find the particular solution, xp, we write b in terms of basic columns.To find the homogeneous solutions, xH , we solve Ax = 0 by solving for basic variables xiin terms of the n− r free variables. Basic variables correspond to basic columns, while freevariables correspond to nonbasic columns. Note that if n > r then the set of columns islinearly independent and we can find x 6= 0 such that Ax = 0.
Example
From our example 1 1 2 2 1 12 2 4 4 3 12 2 4 4 2 23 5 8 6 5 3
→
1 0 1 2 0 10 1 1 0 0 10 0 0 0 1 −10 0 0 0 0 0
, (4.2.1a)
we have that
b = a:1 + a:2 − a:5, (4.2.2a)
= x1a:1 + x2a:2 − x5a:5, (4.2.2b)
= Axp, where xp =(1 1 0 0 −1
)ᵀ. (4.2.2c)
38
4.2. Lecture 11: September 13, 2013 Applied Matrix Theory
Solve,
[A |0] =
1 0 1 2 0 00 1 1 0 0 00 0 0 0 1 00 0 0 0 0 0
. (4.2.3a)
This gives us the three equations for the homogeneous solutions,
x1 = −x3 − 2x4, (4.2.4a)
x2 = −x3, (4.2.4b)
x5 = 0. (4.2.4c)
This gives us the homogeneous solutions of the form,
xH =
−x3 − 2x4
−x3
x3
x4
0
, (4.2.5a)
= x3
−1−1
000
+ x4
−2
0010
. (4.2.5b)
Thus the set of all solutions are,
x = xp + xH , (4.2.6a)
=
1100−1
+ x3
−1−1
000
+ x4
−2
0010
. (4.2.6b)
This solves Ax = b for any x3 and x4. Therefore we have infinitely many solutions. Not wecan only have unique solutions if n = r.
Linear functions
We have any function f : D → R is a linear function if
1. f(x+ y) = f(x) + f(y),
2. f(αx) = αf(x).
39
Nitsche and Benner Unit 4. Rectangular Matrices
For example, f(x) = ax+ b, with b 6= 0.
f(x+ y) = (ax+ b) + (ay + b) , (4.2.7a)
= a(x+ y) + 2b, (4.2.7b)
6= a(x+ y) + b. (4.2.7c)
Thus this is not a linear function. However when b = 0, the function f(x) = ax can beverified to be linear.
Example: Transpose operator
The transpose operator is f(A) = Aᵀ. Define that if A = [aij], then Aᵀ = [aji] andA∗ = Aᵀ = [aji]. Is this linear?
f(A + B) = (A + B)ᵀ, (4.2.8a)
= [aij + bij]ᵀ, (4.2.8b)
= [aji + bji] , (4.2.8c)
= Aᵀ
+ Bᵀ. (4.2.8d)
To check the second criterion,
f(αA) = [αA]ᵀ, (4.2.9a)
= α [A]ᵀ, (4.2.9b)
= αf(A). (4.2.9c)
So this operator is linear.
Example: trace operator
The trace operator is f(A) = tr(A) =∑
i aii.
f(A + B) =∑i
(aii + bii) , (4.2.10a)
=∑i
aii +∑i
bii, (4.2.10b)
= tr(A) + tr(B). (4.2.10c)
The second cirterion,
f(αA) = tr(αA), (4.2.11a)
=∑i
αaii, (4.2.11b)
= α∑i
aii, (4.2.11c)
= α tr(A), (4.2.11d)
= αf(A). (4.2.11e)
We have therefore shown that this is a linear operator.
40
4.2. Lecture 11: September 13, 2013 Applied Matrix Theory
Matrix multiplication
Given,
A =
(a bc d
), B =
(a b
c d
). (4.2.12)
Then consider
f(x) = Ax =
(ax1 + bx2
cx1 + dx2
), g(x) = Bx =
(ax1 + bx2
cx1 + dx2
). (4.2.13)
Take
f(g(x)) = A (Bx) ≡ ABx. (4.2.14)
But,
f(g(x)) =
(a(ax1 + bx2) + b(cx1 + dx2)
c(ax1 + bx2) + d(cx1 + dx2)
), (4.2.15a)
=
((aa+ bc)x1 + (ab+ bd)x2
(ca+ dc)x1 + (cb+ dd)x2
), (4.2.15b)
=
(aa+ bc ab+ bd
ca+ dc cb+ dd
)(x1
x2
), (4.2.15c)
≡ AB. (4.2.15d)
Now if we define AB = [Ai:B:j]︸ ︷︷ ︸(AB)ij
or Ai:B:j =∑n
k=1AikBkj. We get that matrix multiplication
is not generally commutative, or AB 6= BA. If AB = 0 then either A = 0 or B = 0 unlessA or B are invertible. Further we know that we have the distributive properties,
A (B + C) = AB + AC, (4.2.16)
or
(A + B) D = AD + BD, (4.2.17)
and the associative property
(AB) C = A (BC) . (4.2.18)
A property of the transpose operator is,
(AB)ᵀ
= BᵀAᵀ, (4.2.19)
which also helps to understand that,
tr(AB) = tr(BA). (4.2.20)
Note, however, that tr(ABC) 6= tr(ACB) as we will demonstrate on the homework.
41
Nitsche and Benner Unit 4. Rectangular Matrices
Proof of transposition property
We want to prove the useful property,
(AB)ᵀ
= BᵀAᵀ. (4.2.21)
Dealing with our left hand side of the equation,
LHS : (AB)ᵀ
=[(AB)
ᵀij
], (4.2.22a)
= [(AB)ji], (4.2.22b)
= [Aj:B:i]. (4.2.22c)
Manipulating the right hand side of the property,
RHS : BᵀAᵀ
=[(
BᵀAᵀ)ij
], (4.2.23a)
=[Bᵀi:Aᵀ:j
], (4.2.23b)
= [B:iAj:], (4.2.23c)
= [Aj:B:i], (4.2.23d)
= LHS. (4.2.23e)
Thus, we have proved the identity.
4.3 Lecture 12: September 16, 2013
We will be having an exam on September 30th.
Inverses
We define: A has an inverse if each A−1 exists such that,
AA−1 = A−1A = I. (4.3.1)
We also have the properties:
• (AB)−1 = B−1A−1,
• (Aᵀ)−1
= (A−1)ᵀ,
• (A−1)−1
= A.
What about the inverse of sums (A + B)−1? There are the special cases,
• low rank perturbations of In×n : (I + CDᵀ)−1
, where C, D ∈ Rn×k or the matrices areof rank k.
• small perturbation of I : (I + A)−1, where ||A||.
42
4.3. Lecture 12: September 16, 2013 Applied Matrix Theory
We have a rank-1 matrix uvᵀ, with u, v ∈ Rn = Rn×1.
uvᵀ
=
u1
u2...uk
(v1 v2 · · · vk), (4.3.2a)
=
u1v1 u1v2 · · · u1vku2v1 u2v2 · · · u2vk
......
. . ....
ukv1 ukv2 · · · ukvk
, (4.3.2b)
=
u1v
ᵀ
u2vᵀ
...ukv
ᵀ
. (4.3.2c)
Now let’s say we have an example where all matrix entries are zero except for αij at somepoint (i, j).
0 · · · 0
... α...
0 · · · 0
=
0...α...0
(0 · · · 1 · · · 0
), (4.3.3a)
= αeieᵀj . (4.3.3b)
Low rank perturbations of I
We make the claim the if u, v are such that vᵀu + 1 6= 0 then
(I + uv
ᵀ)−1= I− uvᵀ
1 + vᵀu(4.3.4)
Proof: (I + uv
ᵀ)(I− uvᵀ
1 + vᵀu
)= I− uvᵀ
1 + vᵀu+ uv
ᵀ − u (vᵀu) vᵀ
1 + vᵀu, (4.3.5a)
= I− uvᵀ
1 + vᵀu+ uv
ᵀ − (vᵀu)
1 + vᵀuuvᵀ, (4.3.5b)
= I− uvᵀ(
1
1 + vᵀu+ 1− (vᵀu)
1 + vᵀu
), (4.3.5c)
= I− uvᵀ
���������(
1− 1 + vᵀu
1 + vᵀu
), (4.3.5d)
= I. (4.3.5e)
43
Nitsche and Benner Unit 4. Rectangular Matrices
So if c, d ∈ Rn such that dᵀ
(A−1c) + 1 6= 0, we are interested in A−1.[A + cd
ᵀ]−1=[A(I + A−1cd
ᵀ)], (4.3.6a)
=(I +
(A−1c
)dᵀ)
A−1, (4.3.6b)
=
(I− A−1cd
ᵀ
1 + dᵀA−1c
)A−1, (4.3.6c)
= A−1 − A−1cdᵀA−1
1 + dᵀA−1c
. (4.3.6d)
The Sherman–Morrison Formula
The Sherman–Morrison formula states that if A is invertible and C, D ∈ Rn×k such thatI + DᵀA−1C is invertible. Then,(
A + CDᵀ)−1
= A−1 −A−1C(I + D
ᵀA−1C
)−1DᵀA−1 (4.3.7)
Finite difference example with periodic boundary conditions
Previously, we had,
−y′′ = f, on [a, b], (4.3.8a)
y(a) = ya, (4.3.8b)
y(b) = yb. (4.3.8c)
We get the finite difference approximation
    [2 −1 0 ⋯ 0; −1 2 −1 ⋯ 0; 0 −1 2 ⋱ 0; ⋮ ⋱ ⋱ −1; 0 0 ⋯ −1 2] (y1, y2, . . . , y_{n−1})ᵀ = h² (f1, . . . , f_{n−1})ᵀ + (y0, 0, . . . , 0, yn)ᵀ.   (4.3.9)
If we instead use periodic boundary conditions, we have perturbed the system:
    −y′′ = f on [a, b],   y(a) = y(b),   y′(a) = y′(b),   (4.3.10)
and the matrix picks up corner entries:
    [2 −1 0 ⋯ −1; −1 2 −1 ⋯ 0; 0 −1 2 ⋱ 0; ⋮ ⋱ ⋱ −1; −1 0 ⋯ −1 2] (y1, . . . , y_{n−1})ᵀ = h² (f1, . . . , f_{n−1})ᵀ.   (4.3.11)
The new matrix is a low-rank (corner) perturbation of the tridiagonal one, so the Sherman–Morrison formula would help greatly with the inversion.
Examples of perturbation
Given a matrix
    A = [1 2; 1 3],   A⁻¹ = [3 −2; −1 1].   (4.3.12a,b)
Perturb one entry:
    B = [1 2; 2 3] = A + [0 0; 1 0] = A + e2 e1ᵀ.   (4.3.12c–e)
Applying the Sherman–Morrison formula with u = e2, v = e1,
    B⁻¹ = A⁻¹ − A⁻¹ e2 e1ᵀ A⁻¹ / (1 + e1ᵀA⁻¹e2)   (4.3.12f)
        = A⁻¹ − (1/(1 − 2)) (−2, 1)ᵀ (3, −2)   (4.3.12g)
        = A⁻¹ + [−6 4; 3 −2]   (4.3.12h)
        = [−3 2; 2 −1].   (4.3.12i)
Check: det B = −1, so direct inversion also gives B⁻¹ = [−3 2; 2 −1].
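The same worked example in NumPy, a small sketch to confirm the arithmetic above:

import numpy as np

A = np.array([[1., 2.], [1., 3.]])
Ainv = np.linalg.inv(A)                        # [[3, -2], [-1, 1]]
u = np.array([[0.], [1.]])                     # e2
v = np.array([[1.], [0.]])                     # e1
B = A + u @ v.T                                # [[1, 2], [2, 3]]

denom = 1.0 + (v.T @ Ainv @ u).item()          # 1 + e1^T A^{-1} e2 = -1
Binv = Ainv - (Ainv @ u @ v.T @ Ainv) / denom  # Sherman-Morrison
assert np.allclose(Binv, np.linalg.inv(B))
print(Binv)                                    # [[-3.  2.], [ 2. -1.]]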
Small perturbations of I
We want to show what happens when we have small perturbations of the identity matrix I:
    (I − A)⁻¹ ?= I + A + A² + ⋯,   (4.3.13)
when ‖A‖ < 1. We first consider the geometric series,
    (1 − x)⁻¹ = 1/(1 − x) = ∑_{n=0}^∞ xⁿ = 1 + x + x² + x³ + ⋯,   (4.3.14)
when |x| < 1.
To be continued. . .
4.4 Lecture 13: September 18, 2013
Small perturbations of I (cont.)
We want to show what happens when we have small perturbations of the identity matrix I:
    (I − A)⁻¹ ?= I + A + A² + ⋯,   (4.4.1)
when ‖A‖ < 1. We first consider the geometric series,
    (1 − x)⁻¹ = 1/(1 − x) = ∑_{n=0}^∞ xⁿ = 1 + x + x² + x³ + ⋯,   (4.4.2)
when |x| < 1. This is proved as follows. Let S = ∑_{k=0}^n xᵏ. Then
    S − xS = 1 + x + x² + ⋯ + xⁿ − x − x² − ⋯ − x^{n+1} = 1 − x^{n+1},   (4.4.3)
so
    S = (1 − x^{n+1})/(1 − x)  →  1/(1 − x)  as n → ∞, when |x| < 1.
Returning to the full series for a matrix,
    (I − A)(I + A + ⋯ + Aⁿ) = I + A + ⋯ + Aⁿ − A − A² − ⋯ − A^{n+1} = I − A^{n+1}.   (4.4.4)
If A is small, so that Aⁿ → 0 as n → ∞, then
    (I − A) ∑_{k=0}^∞ Aᵏ = I,   i.e.   (I − A)⁻¹ = ∑_{k=0}^∞ Aᵏ.
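A minimal NumPy sketch of this Neumann series (the scaling 0.1 below is our own choice, just to guarantee ‖A‖ < 1):

import numpy as np

rng = np.random.default_rng(2)
A = 0.1 * rng.standard_normal((4, 4))        # small, so A^n -> 0
exact = np.linalg.inv(np.eye(4) - A)

S, term = np.eye(4), np.eye(4)
for _ in range(50):                          # partial sums I + A + A^2 + ...
    term = term @ A
    S += term
assert np.allclose(S, exact)

print(np.linalg.norm(exact - (np.eye(4) + A)))   # first-order truncation error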
Let's consider the convergence of this series now. Write
    L = ∑_{k=1}^∞ ak.   (4.4.5)
We say the series converges (L is finite) if lim_{n→∞} ∑_{k=1}^n ak exists and is finite; having ak → 0 is necessary but not sufficient. For example, ∑_{n=1}^∞ 1/n diverges, since ∑_{k=1}^n 1/k → ∞. For a convergent series the remainder satisfies
    L − ∑_{k=1}^n ak → 0, as n → ∞,   (4.4.6)
so we may approximate
    L ≈ ∑_{k=1}^n ak, with error → 0 as n → ∞.   (4.4.7)
In particular if A is small then,
(I−A)−1 ≈ I + A. (4.4.8)
For example,
    (A + B)⁻¹ = [A(I + A⁻¹B)]⁻¹   (assuming A⁻¹ exists)   (4.4.9a)
              = (I + A⁻¹B)⁻¹ A⁻¹   (4.4.9b)
              ≈ (I − A⁻¹B) A⁻¹   (4.4.9c)
              = A⁻¹ − A⁻¹BA⁻¹.   (4.4.9d)
Matrix Norms
A function ‖·‖ on matrices A ∈ R^{m×n} is a matrix norm if it satisfies
1. ‖A‖ ≥ 0, and ‖A‖ = 0 implies A = 0,
2. ‖A + B‖ ≤ ‖A‖ + ‖B‖,
3. ‖αA‖ = |α| ‖A‖,
and, in addition to the vector-norm axioms, the fourth property
4. ‖AB‖ ≤ ‖A‖ ‖B‖.
As an example of a matrix norm,
    ‖A‖ = maxj ∑i |aij|,   (4.4.10)
which is the maximum absolute column sum. If ‖A‖ < 1, then 0 ≤ ‖Aⁿ‖ ≤ ‖A‖ⁿ → 0 as n → ∞, so ‖Aⁿ‖ → 0 and hence Aⁿ → 0 as n → ∞.
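This norm is easy to compute; a short NumPy sketch (this max-column-sum norm is exactly what np.linalg.norm(·, 1) returns for matrices):

import numpy as np

A = np.array([[1., -2., 3.],
              [4.,  0., -1.]])
col_sum_norm = max(np.abs(A).sum(axis=0))        # max_j sum_i |a_ij|
assert np.isclose(col_sum_norm, np.linalg.norm(A, 1))
print(col_sum_norm)                              # 5.0 for this A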
When is ‖A⁻¹B‖ small? We have
    ‖A⁻¹B‖ ≤ ‖A⁻¹‖ ‖B‖ = ‖A⁻¹‖ ‖A‖ (‖B‖/‖A‖) = κ(A) ‖B‖/‖A‖,   (4.4.11)
where κ(A) = ‖A‖ ‖A⁻¹‖ is the condition number of A. Note that we cannot replace this bound by ‖B‖/‖A‖: that would require ‖A⁻¹‖ ≤ 1/‖A‖, which is generally false. Supposing ‖I‖ = 1,
    1 = ‖AA⁻¹‖ ≤ ‖A‖ ‖A⁻¹‖,   (4.4.13)
so
    ‖A⁻¹‖ ≥ 1/‖A‖,
and indeed ‖A⁻¹‖ = κ(A)/‖A‖, which exceeds 1/‖A‖ by the factor κ(A) ≥ 1.
Condition Number
As an example pertaining to the condition number, suppose we have Ax = b and the perturbed system (A + B)x̃ = b, where ‖A⁻¹B‖ < 1 (that is, B is sufficiently small). The relative change in x introduced by the change in A is
    ‖x̃ − x‖/‖x‖ = ‖(A + B)⁻¹b − A⁻¹b‖/‖x‖   (4.4.14a)
               = ‖[(A + B)⁻¹ − A⁻¹]b‖/‖x‖.   (4.4.14b)
If we use (A + B)⁻¹ ≈ A⁻¹ − A⁻¹BA⁻¹, then
               ≈ ‖A⁻¹BA⁻¹b‖/‖x‖ = ‖A⁻¹Bx‖/‖x‖   (4.4.14c)
               ≤ ‖A⁻¹B‖   (4.4.14d)
               ≤ ‖A⁻¹‖ ‖B‖ = (‖B‖/‖A‖) κ(A).   (4.4.14e)
Thus, κ(A) measures the amplification of relative errors.
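A small NumPy illustration (the nearly singular A below is our own toy example): the relative change in the solution can exceed the relative change in the data by a factor of roughly κ(A).

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])                 # nearly singular
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)                     # exact solution (1, 1)

B = 1e-6 * np.array([[0.0, 1.0], [0.0, 0.0]]) # tiny perturbation of A
x_pert = np.linalg.solve(A + B, b)

print(np.linalg.cond(A))                      # kappa(A), about 4e4
print((np.linalg.norm(x_pert - x) / np.linalg.norm(x)) /
      (np.linalg.norm(B) / np.linalg.norm(A)))  # amplification factor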
4.5 Homework Assignment 3: Due Friday, September 27, 2013

For the first four problems, you may use the Matlab commands rref(a) and a\b to check your work.
1. Textbook 2.2.1: Row Echelon Form, Rank, Consistency, General solution of Ax = b.
Determine the reduced row echelon form for each of the following matrices and thenexpress each nonbasic column in terms of the basic columns:
(a) [1 2 3 3; 2 4 6 9; 2 6 7 6]

(b) [2 1 1 3 0 4 1; 4 2 4 4 1 5 5; 2 1 3 1 0 4 3; 6 3 4 8 1 9 5; 0 0 3 −3 0 0 3; 8 4 2 14 1 13 3]
2. Textbook 2.3.3
If A is an m × n matrix with rank(A) = m, explain why the system [A|b] must be consistent for every right-hand side b.
3. Textbook 2.5.1
Determine the general solution for each of the following non homogeneous systems.
(a)
x1 + 2x2 + x3 + 2x4 = 3, (4.5.1a)
2x1 + 4x2 + x3 + 3x4 = 4, (4.5.1b)
3x1 + 6x2 + x3 + 4x4 = 5. (4.5.1c)
(b)
2x+ y + z = 4, (4.5.2a)
4x+ 2y + z = 6, (4.5.2b)
6x+ 3y + z = 8, (4.5.2c)
8x+ 4y + z = 10. (4.5.2d)
(c)
x1 + x2 + 2x3 = 3, (4.5.3a)
3x1 + 3x3 + 3x4 = 6, (4.5.3b)
2x1 + x2 + 3x3 + x4 = 3, (4.5.3c)
x1 + 2x2 + 3x3 − x4 = 0. (4.5.3d)
(d)
2x+ y + z = 2, (4.5.4a)
4x+ 2y + z = 5, (4.5.4b)
6x+ 3y + z = 8, (4.5.4c)
8x+ 5y + z = 8. (4.5.4d)
4. Textbook 2.5.4
Consider the following system:
2x+ 2y + 3z = 0, (4.5.5a)
4x+ 8y + 12z = −4, (4.5.5b)
6x+ 2y + αz = 4. (4.5.5c)
(a) Determine all values of α for which the system is consistent.
(b) Determine all values of α for which there is a unique solution, and compute thesolution for these cases.
(c) Determine all values of α for which there are infinitely many different solutions,and give the general solution for these cases.
5. Textbook 3.3.1: Linear Functions
Each of the following is a function from R² into R². Determine which are linear functions.

(a) f(x, y) = (x, 1 + y)ᵀ.
(b) f(x, y) = (y, x)ᵀ.
(c) f(x, y) = (0, xy)ᵀ.
(d) f(x, y) = (x², y²)ᵀ.
(e) f(x, y) = (x, sin y)ᵀ.
(f) f(x, y) = (x + y, x − y)ᵀ.

Figure 4.2. Figures for Textbook problem 3.3.4.
6. Textbook 3.3.4
Determine which of the following three transformations in R2 are linear.
7. Textbook 3.5.4: Matrix Multiplication
Let ej denote the jth unit column that contains a 1 in the jth position and zeros everywhere else. For a general matrix A_{n×n}, describe the following products: (a) Aej, (b) ejᵀA, (c) ejᵀAej.
8. Textbook 3.5.6
(please use induction)
For A = [1/2 α; 0 1/2], determine lim_{n→∞} Aⁿ. Hint: Compute a few powers of A and try to deduce the general form of Aⁿ.
9. Textbook 3.5.9
If A = [aij(t)] is a matrix whose entries are functions of a variable t, the derivative of A with respect to t is defined to be the matrix of derivatives. That is,
    dA/dt = [daij/dt].
Derive the product rule for differentiation,
    d(AB)/dt = (dA/dt) B + A (dB/dt).
10. Textbook 3.6.2
For all matrices A_{n×k} and B_{k×n} show that the block matrix
    L = [I − BA, B; 2A − ABA, AB − I]
has the property L² = I. Matrices with this property are said to be involutory, and they occur in the science of cryptography.
11. Textbook 3.6.3
For the matrix
A =
[1 0 0 1/3 1/3 1/3; 0 1 0 1/3 1/3 1/3; 0 0 1 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3],
determine A³⁰⁰. Hint: A square matrix C is said to be idempotent when it has the property that C² = C. Make use of the idempotent submatrices in A.
12. Textbook 3.6.5
If A and B are symmetric matrices that commute, prove that the product AB is alsosymmetric. If AB 6= BA, is AB necessarily symmetric?
13. Textbook 3.6.7
For each matrix An×n, explain why it is impossible to find a solution for Xn×n in thematrix equation
AX − XA = I. (4.5.6)
Hint: Consider the trace function.
14. Textbook 3.6.11
Prove that each of the following statements is true for conformable matrices
(a) tr (ABC) = tr(BCA) = tr(CAB).
(b) tr (ABC) can be different from tr (BAC).
(c) tr(AᵀB) = tr(ABᵀ).
15. Textbook 3.7.2: Inverses
Find the matrix X such that X = AX + B, where
A = [0 −1 0; 0 0 −1; 0 0 0]   and   B = [1 2; 2 1; 3 3].
16. Textbook 3.7.6
If A is a square matrix such that I−A is nonsingular, prove that
A (I−A)−1 = (I−A)−1 A.
17. Textbook 3.7.8
If A, B, and A + B are each nonsingular, prove that
A(A + B)⁻¹B = B(A + B)⁻¹A = (A⁻¹ + B⁻¹)⁻¹.
18. Textbook 3.7.9
Let S be a skew-symmetric matrix with real entries.
(a) Prove that I− S is nonsingular. Hint: xᵀx = 0 means x = 0.
(b) If A = (I + S) (I− S)−1, show that A−1 = Aᵀ.
19. Textbook 3.9.9: Sherman–Morrison formula, rank 1 matrices
Prove that rank(A_{m×n}) = 1 if and only if there are nonzero columns u_{m×1} and v_{n×1} such that A = uvᵀ.
20. Textbook 3.9.10
Prove that if rank(A_{n×n}) = 1, then A² = τA, where τ = tr(A).
UNIT 5
Vector Spaces
5.1 Lecture 14: September 20, 2013
Topics in Vector Spaces
We will be discussing the following topics in this lecture (and possibly the next couple).
• Field
• Vector Space
• Subspace
• Spanning Set
• Basis
• Dimension
• The four subspaces of Am×n
Field
We define a field as a set F with the following properties:
• Closed under addition (+) and multiplication ( · ). Thus if α, β ∈ F , then α + β ∈ Fand α · β ∈ F .
• Addition and multiplication are commutative.
• Addition and multiplication are associative. This means that (α+β)+γ = α+(β+γ)and (αβ)γ = α(βγ).
• Addition with multiplication is distributive. α(β + γ) = αβ + αγ.
• There exists an additive and multiplicative identity α + 0 = α, α · 1 = α.
• There exists an additive and multiplicative inverse α + (−α) = 0, α(α−1) = 1.
For example, the reals and the complex numbers are fields; the rational numbers are as well, but the natural numbers are not. The set Z₂ = {0, 1} is a field with the operations 0 + 0 = 0, 0 + 1 = 1, 1 + 1 = 0.
Vector Space
We define a vector space V over a field F as a set V with operations + and · such that,
• v + w ∈ V for any v,w ∈ V .
• αv ∈ V for any v ∈ V , α ∈ F .
• v + w = w + v for any v,w ∈ V . This is the commutative property of addition.
• (u+v)+w = u+(v+w) for any u,v,w ∈ V , which is the associative law of addition.
• There exists 0 ∈ V such that u + 0 = u for any u ∈ V.
• For each u ∈ V there exists −u ∈ V such that u + (−u) = 0.
• (αβ)u = α(βu) for any α, β ∈ F , u ∈ V .
• (α+ β)u = αu + βu for any α, β ∈ F , u ∈ V . This is the first form of the distributiveproperty.
• 1 · u = u, the 1 multiplication identity in F .
• α(u + v) = αu + αv for any α ∈ F , and u,v ∈ V .
Examples of vector spaces over R include Rⁿ = R^{n×1}, R^{n×m}, C^{m×n}, all functions [0, 1] → R, and all polynomials R → R.
Theorem 5.1. A subset S of a vector space V over F is a vector space over F if
• v + w ∈ S, for any v,w ∈ S.
• αv ∈ S for any α ∈ F , v ∈ S.
Several examples: all continuous functions [0, 1] → R = C[0, 1], all polynomials of degree at most n, and S = {0} contained in V.
Definition 5.2. Let {v1, . . . , vn} ⊂ V; then span{v1, . . . , vn} = {α1v1 + α2v2 + · · · + αnvn : αk ∈ F}.
Theorem 5.3. The span of {v1, . . . , vn} is a subspace.
Definition 5.4. The set {v1, . . . ,vn} is a spanning set of span{v1, . . . ,vn}.
Note that 0 ∈ span{v1, . . . , vn}; indeed, 0 belongs to every subspace.
For example, span{ (1, 2)ᵀ } ⊂ R² is the same as span{ (1, 2)ᵀ, (−2, −4)ᵀ }. This gives rise to the basis vector (1, 2)ᵀ, so the subspace is one-dimensional. The basis vector is illustrated along with the spanned line in Figure 5.1.
Definition 5.5. A basis for a vector space is a minimal spanning set.
Theorem 5.6. Any two bases for a vector space have the same number of elements.
Definition 5.7. The number of elements in the basis is equal to the dimension of the space.
Figure 5.1. Basis vector of the example subspace.
For example, P2 = {a1 + a2x + a3x²} has basis {1, x, x²}, so it is three-dimensional. In general, the space Pn of polynomials of degree at most n has dim(Pn) = n + 1.
As another example, S = {0} has basis ∅, the empty set, so it is zero-dimensional. In particular, zero cannot be an element of a basis.
Definition 5.8. A set {v1, . . . ,vn} is linearly independent if α1v1 + α2v2 + · · ·+ αnvn = 0implies α1 = α2 = · · · = αn = 0.
It follows that {0} is not linearly independent, since
    α0 = 0 for any α ≠ 0.   (5.1.1)
Similarly, any set containing 0 is not linearly independent.
Examples of function spaces
One example is the set of solutions to y′′ = 0. This is the set {y = αx + β : α, β ∈ R}; the vector space has two dimensions and basis {1, x}. Another example is the set of solutions of y′′ = y, namely {y = c1eˣ + c2e⁻ˣ}, which has the two-dimensional basis {eˣ, e⁻ˣ}. A third example is the set of solutions of y′′ = −y, namely {y = c1 sin x + c2 cos x}, with two-dimensional basis {sin x, cos x}. A final example of interest is y′′ = 2, which gives the solution set {y = x² + αx + β}. This, however, is not a vector space, because we are restricted by the coefficient of x² being one! This results from the fact that the equation is nonhomogeneous, unlike the other examples, which may be rearranged into homogeneous form.
In the general example of R^{2×2} = { [a b; c d] }, a basis of this space is
    { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] }.
5.2 Lecture 15: September 23, 2013
The four subspaces of Am×n
We now define the four fundamental subspaces of Am×n : Rn → Rm. These are:
1. R(A) = {y : y = Ax,x ∈ Rn} ⊂ Rm This is the column space.
2. N(A) = {x ∈ Rn : Ax = 0} ⊂ Rn. This is the null space of A.
3. R(Aᵀ) = {y : y = Aᵀx, x ∈ Rᵐ} ⊂ Rⁿ. Equivalently, R(Aᵀ) = {y : yᵀ = xᵀA, x ∈ Rᵐ}, which is why this is called the row space of A.
4. N(Aᵀ) = {x ∈ Rm : Aᵀx = 0 or xᵀA = 0ᵀ} ⊂ Rm. This is called the left null space of
A.
We want to show that R(A) is a vector space. So we let y1,y2 ∈ R(A) Then, y1 = Ax1 andy2 = Ax2 for some x1,x2. This tells us that
y1 + y2 = Ax1 + Ax2, (5.2.1a)
= A (x1 + x2) ∈ R(A). (5.2.1b)
Also
αy1 = αAx1, (5.2.2a)
= Aαx1 ∈ R(A). (5.2.2b)
Thus R(A) is a subspace of Rᵐ.
An example: find spanning sets for all four subspaces of
A = [1 2 1 3 3; 2 4 0 4 4; 1 2 3 5 5; 2 4 0 4 2]  →  [1 2 0 2 0; 0 0 1 1 0; 0 0 0 0 1; 0 0 0 0 0].   (5.2.3)
So the column space (range), spanned by the basic columns 1, 3, 5, is
    R(A) = span{ (1, 2, 1, 2)ᵀ, (1, 0, 3, 0)ᵀ, (3, 4, 5, 2)ᵀ } ⊂ R⁴.   (5.2.4)
To find the null space, we solve the homogeneous equation Ax = 0:
    x1 = −2x2 − 2x4,   x3 = −x4,   x5 = 0,   (5.2.5)
or
    x = x2 (−2, 1, 0, 0, 0)ᵀ + x4 (−2, 0, −1, 1, 0)ᵀ.   (5.2.6)
Thus,
    N(A) = span{ (−2, 1, 0, 0, 0)ᵀ, (−2, 0, −1, 1, 0)ᵀ } ⊂ R⁵.   (5.2.7)
Now write the reduction A → EA as P_{m×m} A_{m×n} = EA,   (5.2.8)
where P is square and invertible (it is a product of elementary matrices). Since PA = EA, the rows of EA are linear combinations of the rows of A; similarly, A = P⁻¹EA shows the rows of A are linear combinations of the rows of EA, so the row space of A equals the row space of EA. So,
    R(Aᵀ) = (row space of A)ᵀ   (5.2.9a)
          = span{ (1, 2, 0, 2, 0)ᵀ, (0, 0, 1, 1, 0)ᵀ, (0, 0, 0, 0, 1)ᵀ } ⊂ R⁵   (5.2.9b)
          = {y : y = Aᵀx, or yᵀ = xᵀA}.   (5.2.9c)
To find the fourth space, N(Aᵀ), row reduce Aᵀ:
    [1 2 1 2; 2 4 2 4; 1 0 3 0; 3 4 5 4; 3 4 5 2]
  → [1 2 1 2; 0 0 0 0; 0 −2 2 −2; 0 −2 2 −2; 0 −2 2 −4]
  → ⋯ →
    [1 0 3 0; 0 1 −1 0; 0 0 0 1; 0 0 0 0; 0 0 0 0].   (5.2.10)
So the solution of Aᵀx = 0 is
    x1 = −3x3,   x2 = x3,   x3 free,   x4 = 0,   (5.2.11)
or
    x = x3 (−3, 1, 1, 0)ᵀ.   (5.2.12)
This finally gives us
    N(Aᵀ) = span{ (−3, 1, 1, 0)ᵀ } ⊂ R⁴.   (5.2.13)
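These spanning sets are easy to verify numerically; here is a minimal NumPy sketch of the checks:

import numpy as np

A = np.array([[1, 2, 1, 3, 3],
              [2, 4, 0, 4, 4],
              [1, 2, 3, 5, 5],
              [2, 4, 0, 4, 2]], float)
N = np.array([[-2, 1, 0, 0, 0],
              [-2, 0, -1, 1, 0]], float).T     # basis of N(A) found above
LN = np.array([[-3, 1, 1, 0]], float).T        # basis of N(A^T) found above

assert np.allclose(A @ N, 0)                   # A x = 0 on N(A)
assert np.allclose(A.T @ LN, 0)                # A^T x = 0 on N(A^T)
print(np.linalg.matrix_rank(A))                # r = 3, so dim N(A) = 5 - 3 = 2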
So the dimension of the column space (range) of A is
    dim(R(A)) = r,   (5.2.14)
which is the rank of A. The dimensions of the other spaces are
    dim(N(A)) = n − r,   (5.2.15)
    dim(R(Aᵀ)) = r,   (5.2.16)
    dim(N(Aᵀ)) = m − r.   (5.2.17)
An alternative way to find the left null space of A, that is,
    N(Aᵀ) = {x : xᵀA = 0ᵀ},   (5.2.18)
is to use PA = EA, where EA has r nonzero rows b1, . . . , br followed by m − r zero rows:
    PA = [— b1 —; ⋮; — br —; — 0 —; ⋮].   (5.2.19)
Split P into blocks,
    P = [P1; P2],   (5.2.20)
so that
    PA = [P1; P2] A = [P1A; P2A].   (5.2.21)
We know that P2A = 0. So we claim that the rows of P2 span the left null space of A, i.e.
    R(P2ᵀ) = N(Aᵀ).   (5.2.22)
5.3 Lecture 16: September 25, 2013
Dr. Nitsche is out of town October 18 and the Wednesday before Thanksgiving; we may have to arrange alternate class times.
The Four Subspaces of A
To recall what we discussed last class,
• R(A) is the range of A, or the column space. It has dimension r.
• N(A) = {x : Ax = 0} is the null space of A. It has dimension n − r.
• R(Aᵀ) = {Aᵀy} is the row space of A. It has dimension r.
• N(Aᵀ) = {x : Aᵀx = 0} = {x : xᵀA = 0ᵀ} is the left null space of A. It has dimension m − r.
Returning to the manipulation A → EA with PA = EA, where P_{m×m} is invertible,
    [P1; P2] A = [P1A; P2A] = [B1; 0],   (5.3.1)
where P2A = 0.

Theorem 5.9.
    N(Aᵀ) = R(P2ᵀ),   (5.3.2)
where the right-hand side is the row space of P2.
Proof. (⊇) Assume y ∈ R(P2ᵀ). Then y = P2ᵀx for some x. Reformulating, yᵀ = xᵀP2, so yᵀA = xᵀP2A = xᵀ0 = 0ᵀ, which gives y ∈ N(Aᵀ).
(⊆) Assume y ∈ N(Aᵀ), so yᵀA = 0ᵀ. Write Q = P⁻¹ = [Q1 | Q2] and EA = [U; 0], where U_{r×n} has full row rank. Then
    0ᵀ = yᵀA = yᵀP⁻¹EA = yᵀ[Q1 | Q2] [U; 0] = (yᵀQ1)U,
and since U has full row rank, yᵀQ1 = 0ᵀ. We know that QP = I:
    [Q1 | Q2] [P1; P2] = I,   so   Q1P1 + Q2P2 = I,   Q1P1 = I − Q2P2.
This gives 0ᵀ = yᵀQ1P1 = yᵀ(I − Q2P2), so yᵀ = yᵀQ2P2, and hence
    y = P2ᵀ(Q2ᵀy) ∈ R(P2ᵀ).   (5.3.3)
□
As an example, augment A with I and row reduce:
    [1 2 1 3 3 | 1 0 0 0     [1 2 0 2 0 | 0 −1/2  0    1
     2 4 0 4 4 | 0 1 0 0  →   0 0 1 1 0 | 0 −2/3  1/3  1/2
     1 2 3 5 5 | 0 0 1 0      0 0 0 0 1 | 0  1/2  0   −1/2     (5.3.4)
     2 4 0 4 2 | 0 0 0 1]     0 0 0 0 0 | 1 −1/3 −1/3  0]
Note that N(Aᵀ) is orthogonal to R(A). We also find from this manipulation that
    R(A) = span{ (1, 2, 1, 2)ᵀ, (1, 0, 3, 0)ᵀ, (3, 4, 5, 2)ᵀ }   (5.3.5)
and
    N(Aᵀ) = R(P2ᵀ) = span{ (1, −1/3, −1/3, 0)ᵀ } = span{ (3, −1, −1, 0)ᵀ }.   (5.3.6)
Linear Independence
Definition 5.10. A set {v1, . . . , vn} is linearly independent if α1v1 + · · · + αnvn = 0 implies α1 = · · · = αn = 0. From this we get the equivalent statements:
• {v1, . . . , vn} is linearly independent;
• A = [v1 ⋯ vn] has full column rank, r = n;
• N(A) = {α : Aα = 0} = {0}.
For example, the polynomial set {1, x, x², . . . , xⁿ} is linearly independent, because c0 + c1x + c2x² + · · · + cnxⁿ = 0 (as a function) implies c0 = · · · = cn = 0. By contrast, the zero set {0} is not linearly independent, since α0 = 0 for any α ≠ 0; any set containing 0, e.g. {v1, . . . , vn, 0}, is linearly dependent.
Another example is any set of distinct unit vectors {e_{i1}, e_{i2}, . . . , e_{in}}, where ei ∈ Rᵐ and n ≤ m. Such a set is linearly independent; for instance,
    A = [0 0 1; 0 0 0; 1 0 0; 0 1 0; 0 0 0].   (5.3.7)
We take as another example the Vandermonde matrix, which has applications in polynomial interpolation. Let x1, . . . , xm be distinct real numbers and
    A = [1 x1 x1² ⋯ x1^{n−1}; 1 x2 x2² ⋯ x2^{n−1}; ⋮; 1 xm xm² ⋯ xm^{n−1}],   (5.3.8)
where n ≤ m. Then Ac = y, with c = (c0 ⋯ c_{n−1})ᵀ, expresses the interpolation conditions p(xk) = yk for the polynomial p(x) = c0 + c1x + ⋯ + c_{n−1}x^{n−1}; a solution of Ac = y gives a polynomial that interpolates the points (xk, yk). If Ac = 0, then p has m distinct roots x1, . . . , xm; but a nonzero polynomial of degree n − 1 can have at most n − 1 distinct roots, and m > n − 1. So p ≡ 0, and therefore c = 0: the columns of A are linearly independent.
Figure 5.2. Interpolating system: a curve through the points (x1, y1), . . . , (xk, yk), . . . , (xn, yn).
5.4 Lecture 17: September 27, 2013
Linear functions (review)

Is f linear? Here it was good to find the formula; some cases could be done by inspection. We should also check f(p1 + p2) = f(p1) + f(p2) and f(αp) = αf(p). Let's find the matrices of some of these functions. For the flipping (reflection) function,
    f(x, y) = (x, −y)ᵀ = [1 0; 0 −1] (x, y)ᵀ.   (5.4.1)
For the projection onto the line y = x,
    f(x, y) = ( (x + y)/2, (x + y)/2 )ᵀ = [1/2 1/2; 1/2 1/2] (x, y)ᵀ.   (5.4.2)
For the rotation, write x = r cos ψ and y = r sin ψ, and denote the rotated point with primes: x′ = r cos(ψ + θ), y′ = r sin(ψ + θ). Using the angle-addition identities, x′ = r(cos ψ cos θ − sin ψ sin θ) = x cos θ − y sin θ and y′ = r(sin ψ cos θ + cos ψ sin θ) = y cos θ + x sin θ. This gives the function
    f(x, y) = (x′, y′)ᵀ = [cos θ −sin θ; sin θ cos θ] (x, y)ᵀ.   (5.4.3)
Note this is an orthogonal matrix with determinant equal to 1.
Review for exam
Anything on the first three homeworks is fair game. We have been doing computations of the LU, PLU, REF, and RREF factorizations; we have solved Ax = b and written systems of linear equations in matrix form. We have talked about the elementary matrices, the process of premultiplication, and their invertibility.
We have also discussed some proofs, especially this last one. We showed these major identities: tr(AB) = tr(BA), (AB)ᵀ = BᵀAᵀ, (AB)⁻¹ = B⁻¹A⁻¹, (A⁻¹)ᵀ = (Aᵀ)⁻¹ = A⁻ᵀ. Similarly, we have shown that the LU decomposition exists if all principal submatrices are invertible, and the relation (I − A)⁻¹ = ∑_{k=0}^∞ Aᵏ if Aᵏ → 0. We also discussed (A + B)⁻¹ with perturbation matrices. Finally, we discussed rank-one matrices, so we need to know the Sherman–Morrison formula for (I + uvᵀ)⁻¹.
Previous lecture continued
Comment on previous lecture:
A = [1 x1 x1² ⋯ x1^{n−1}; 1 x2 x2² ⋯ x2^{n−1}; ⋮; 1 xm xm² ⋯ xm^{n−1}]_{m×n}.   (5.4.4)
The system Ac = y is equivalent to p(xi) = yi, i = 1, . . . , m, that is, c0 + c1xi + c2xi² + · · · + c_{n−1}xi^{n−1} = yi for i = 1, . . . , m: a linear system in the coefficients ck, with m ≥ n. In terms of vectors, the columns
    (1, . . . , 1)ᵀ, (x1, . . . , xm)ᵀ, (x1², . . . , xm²)ᵀ, . . . , (x1^{n−1}, . . . , xm^{n−1})ᵀ   (5.4.5)
are linearly independent, so rank(A) = n. To show the independence, set up the system
    c0 (1, . . . , 1)ᵀ + c1 (x1, . . . , xm)ᵀ + c2 (x1², . . . , xm²)ᵀ + · · · + c_{n−1} (x1^{n−1}, . . . , xm^{n−1})ᵀ = (0, . . . , 0)ᵀ.   (5.4.6)
This says the polynomial p(x) = c0 + c1x + · · · + c_{n−1}x^{n−1} has at least m distinct roots, but p ∈ P_{n−1} has at most n − 1 roots; we know this by the fundamental theorem of algebra. Since m > n − 1, the polynomial must be identically the zero polynomial, p ≡ 0, and ck = 0 for all k.
So suppose we want to interpolate with a polynomial p(x) ∈ P_{n−1}: we set up p(xi) = yi for i = 1, . . . , m. If m = n we have a unique solution to the interpolation problem. If instead m > n, the system is overdetermined and has either no solution or exactly one. We defined the span of a set as the set of all linear combinations over the field: span{v1, . . . , vn} = {∑ ckvk, ck ∈ R}. A basis for a vector space V is a set {v1, . . . , vk} that spans V and is linearly independent. We also know that the basis for {0} is the empty set ∅; thus, for convenience, we define span{∅} = {0}.
Theorem 5.11. If {v1, . . . , vn} is a basis of V, then any set {u1, . . . , um} ⊂ V with m > n is linearly dependent.
5.5 Lecture 18: October 2, 2013
Exams and Points
We decided that we will have three exams total, but only the best two will each count for20% of our semester grade. Homework will be worth 60%. Lecture notes will be postedonline.
Continuation of last lecture
Theorem 5.12. If {u1, . . . , un} spans V and S = {v1, . . . , vm} ⊂ V with m > n, then S is linearly dependent.
Proof. Consider ∑_{i=1}^m αivi = 0. Using vi = ∑_{j=1}^n cijuj,
    ∑_{i=1}^m αi ∑_{j=1}^n cijuj = 0,   (5.5.1a)
    ∑_{j=1}^n ( ∑_{i=1}^m αicij ) uj = 0,   (5.5.1b)
where the inner sum is (Cᵀα)j. Since the system Cᵀ_{n×m} α = 0 has m − n > 0 free variables, there exists α ≠ 0 such that Cᵀα = 0. Then (Cᵀα)j = 0 for every j, so ∑i αivi = 0 with not all αi zero. □
Definition 5.13. A basis of V is a linearly independent spanning set of V .
Theorem 5.14. Any two bases have the same number of elements.
Equivalent characterizations of basis,
• linearly independent spanning set
• minimal spanning set
• maximal linearly independent subset of V.
Definition 5.15. dim(V) is equal to the number of elements in the basis.
Recalling the four subspaces for a matrix,
A_{m×n} = [a1 a2 ⋯ an],   (5.5.2)
• R(A) ⊂ Rᵐ, dim = r;
• N(A) ⊂ Rⁿ, dim = n − r;
• R(Aᵀ) ⊂ Rⁿ, dim = r;
• N(Aᵀ) ⊂ Rᵐ, dim = m − r.
Definition 5.16. If X and Y are two subspaces of V then
X + Y = {x + y,x ∈ X ,y ∈ Y} . (5.5.3)
Is X + Y a subspace? We shall illustrate this in two parts
1. Given z ∈ X + Y , is αz ∈ X + Y?
If this is the case, z = x + y and αz = αx + αy ∈ X + Y , where we recalled that thevectors x and y are within their respective sets.
2. Given z1, z2 ∈ X + Y , is z1 + z2 ∈ X + Y?
Here we substitute for the summed vectors of each of the z vectors, (x1+y1)+(x2+y2) =(x1 + x2) + (y1 + y2) ∈ X + Y .
Theorem 5.17. dim(X + Y) = dim(X ) + dim(Y)− dim(X ∩ Y).
Proof. Let B_{X∩Y} = {z1, . . . , zk} be a basis for X ∩ Y. Then we can extend the set to bases for X and Y:
    B_X = {z1, . . . , zk, x1, . . . , xn},   (5.5.4a)
    B_Y = {z1, . . . , zk, y1, . . . , ym}.   (5.5.4b)
We claim that S = {z1, . . . , zk, x1, . . . , xn, y1, . . . , ym} = B_{X+Y}. Does S span X + Y? Let z ∈ X + Y; then z = x + y with x ∈ X and y ∈ Y, so
    z = ( ∑i αizi + ∑i βixi ) + ( ∑i αi′zi + ∑i γiyi )   (5.5.5a)
      = ∑i (αi + αi′) zi + ∑i βixi + ∑i γiyi ∈ span(S).   (5.5.5b)
Is S linearly independent? Consider
    ∑ αizi + ∑ βixi + ∑ γiyi = 0.   (5.5.6a)
Then ∑ γiyi = −( ∑ αizi + ∑ βixi ) ∈ X, while the left side lies in Y, so ∑ γiyi ∈ X ∩ Y, say ∑ γiyi = ∑ δizi, i.e.
    ∑ γiyi − ∑ δizi = 0.   (5.5.6b)
Since B_Y is linearly independent, this indicates γi = δi ≡ 0. The remaining relation
    ∑ αizi + ∑ βixi = 0,   (5.5.6c)
with B_X linearly independent, indicates αi = βi ≡ 0. □
From our example, the four subspaces were spanned by
    R(A) = span{ (1, 2, 1, 2)ᵀ, (1, 0, 3, 0)ᵀ, (3, 4, 5, 2)ᵀ } ⊂ R⁴,   (5.5.7a)
    N(A) = span{ (−2, 1, 0, 0, 0)ᵀ, (−2, 0, −1, 1, 0)ᵀ } ⊂ R⁵,   (5.5.7b)
    R(Aᵀ) = span{ (1, 2, 0, 2, 0)ᵀ, (0, 0, 1, 1, 0)ᵀ, (0, 0, 0, 0, 1)ᵀ } ⊂ R⁵,   (5.5.7c)
    N(Aᵀ) = span{ (3, −1, −1, 0)ᵀ } ⊂ R⁴.   (5.5.7d)
Theorem 5.18. (a) R(A) is orthogonal to N(Aᵀ), and (b) R(A) ∩ N(Aᵀ) = {0}.
This means R(A) + N(Aᵀ) = Rᵐ and R(Aᵀ) + N(A) = Rⁿ: any A_{m×n} gives an orthogonal decomposition of Rⁿ and of Rᵐ.

Proof. (a) Let y ∈ R(A), so y = Az for some z, and let x ∈ N(Aᵀ), so Aᵀx = 0 and hence xᵀA = 0ᵀ. Then xᵀy = xᵀAz = 0, so x is orthogonal to y; therefore R(A) ⊥ N(Aᵀ).
(b) If x ∈ R(A) and x ∈ N(Aᵀ), then by (a) xᵀx = 0, which implies x = 0. □
UNIT 6
Least Squares
6.1 Lecture 19: October 4, 2013
Least Squares
We will now be covering the concept of least squares. Given an equation Ax = b, we may multiply by the transpose of the matrix to obtain the least squares (normal) equations AᵀAx = Aᵀb. We will show that this system is consistent even if Ax = b is inconsistent.
Previously we showed,
Theorem 6.1. dim(X + Y) = dim(X ) + dim(Y) − dim(X ∩ Y), where X ,Y are subspacesof V.
We now consider,
Theorem 6.2. Given conformable matrices A and B,
    rank(A + B) ≤ rank(A) + rank(B),   (6.1.1)
where rank(·) = dim(R(·)).

Proof. R(A + B) ⊂ R(A) + R(B), since if y ∈ R(A + B) then
    y = (A + B)x = Ax + Bx ∈ R(A) + R(B).   (6.1.2)
Further,
    dim(R(A + B)) ≤ dim(R(A) + R(B))   (6.1.3a)
                  = dim(R(A)) + dim(R(B)) − dim(R(A) ∩ R(B))   (6.1.3b)
                  ≤ dim(R(A)) + dim(R(B))   (6.1.3c)
                  = rank(A) + rank(B). □   (6.1.3d)
Theorem 6.3. rank(AB) = rank(B)− dim(N(A) ∩ R(B))
Proof. Let S = {x1, . . . , xs} be a basis of N(A) ∩ R(B). Since N(A) ∩ R(B) ⊂ R(B), we can extend S to a basis for R(B),
    B_{R(B)} = {x1, . . . , xs, z1, . . . , zt},   (6.1.4)
so rank(B) = s + t. To prove dim(R(AB)) = t, we claim S1 = {Az1, . . . , Azt} is a basis for R(AB). First we show that it spans. Let b ∈ R(AB), so b = ABy for some y, where By ∈ R(B). Then
    b = A( ∑i αixi + ∑i βizi )   (6.1.5a)
      = ∑i αi Axi + ∑i βi Azi,   with Axi = 0 since xi ∈ N(A),   (6.1.5b)
      = ∑i βi Azi ∈ span(S1).   (6.1.5c)
Next we show that S1 is linearly independent. Suppose ∑i αiAzi = 0. Rearranging, A(∑i αizi) = 0, and ∑i αizi ∈ N(A) ∩ R(B) since zi ∈ R(B). Thus ∑i αizi = ∑i βixi for some βi, i.e. ∑i αizi − ∑i βixi = 0. Therefore αi = βi = 0, since {zi, xi} is linearly independent. □
Theorem 6.4. Given matrices Am×n and Bn×p, then
rank(A) + rank(B)− n ≤ rank(AB) ≤ min(rank(A), rank(B)) (6.1.6)
Proof. We consider the right inequality first. By the previous theorem, rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≤ rank(B). Also rank(AB) = rank((AB)ᵀ) = rank(BᵀAᵀ) ≤ rank(Aᵀ) = rank(A).
For the left inequality, N(A) ∩ R(B) ⊂ N(A), so dim(N(A) ∩ R(B)) ≤ dim(N(A)) = n − rank(A). Hence rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≥ rank(B) − (n − rank(A)). □
Theorem 6.5. (1) rank(AᵀA) = rank(A) and rank(AAᵀ) = rank(Aᵀ) = rank(A).
(2) R(AᵀA) = R(Aᵀ) and R(AAᵀ) = R(A).
(3) N(AᵀA) = N(A) and N(AAᵀ) = N(Aᵀ).
Proof. For part (1), rank(AᵀA) = rank(A) − dim(N(Aᵀ) ∩ R(A)); but N(Aᵀ) ⊥ R(A), so N(Aᵀ) ∩ R(A) = {0}. Indeed, if x ∈ N(Aᵀ) and x ∈ R(A), then Aᵀx = 0 and x = Ay for some y, which gives xᵀx = (Ay)ᵀx = yᵀAᵀx = 0, so x = 0. Hence dim(N(Aᵀ) ∩ R(A)) = 0.
To be continued. . . □
6.2 Lecture 20: October 7, 2013
We will have two weeks for the next homework.
Properties of Transpose Multiplication
In review we covered the following theorems last time:
Theorem 6.6. dim(X + Y) = dim(X ) + dim(Y) − dim(X ∩ Y), where X ,Y are subspacesof V.
We also had the theorem,
Theorem 6.7. rank(AB) = rank(B)− dim(N(A) ∩ R(B))
And finally we showed the rank inequality,
Theorem 6.8. rank(A) + rank(B) − n ≤ rank(AB) ≤ min(rank(A), rank(B)).
We left off at the theorem covering multiplication relations and the rank and dimensionsof the matrix,
Theorem 6.9. (1) rank(AᵀA) = rank(A) and rank(AAᵀ) = rank(Aᵀ) = rank(A).
(2) R(AᵀA) = R(Aᵀ) and R(AAᵀ) = R(A).
(3) N(AᵀA) = N(A) and N(AAᵀ) = N(Aᵀ).
We proved the first one using the third of the theorems above. We now prove the secondand third parts of this theorem.
Proof. For part (2), let y ∈ R(AᵀA); then y = AᵀAx for some x, so y = Aᵀz with z = Ax, and y ∈ R(Aᵀ). So R(AᵀA) ⊂ R(Aᵀ); since dim(R(AᵀA)) = dim(R(Aᵀ)) by part (1), the two spaces are equal, R(AᵀA) = R(Aᵀ). (A basis of R(AᵀA) extends to a basis of R(Aᵀ), but since both have the same number of elements, the bases coincide.)
For part (3), we show one null space is contained in the other and then compare dimensions. Let x ∈ N(A); then Ax = 0, so AᵀAx = 0 and x ∈ N(AᵀA), giving N(A) ⊂ N(AᵀA). But dim(N(A)) = n − r and dim(N(AᵀA)) = n − r as well, so the two sets must be the same: N(A) = N(AᵀA). □
The Normal Equations
Definition 6.10. The normal equations for a system Ax = b is
AᵀAx = Aᵀb.   (6.2.1)
Theorem 6.11. For any A, AᵀAx = Aᵀb is consistent.
Proof. The right-hand side Aᵀb lies in R(Aᵀ) = R(AᵀA) by the previous theorem, so there exists x with AᵀAx = Aᵀb. □
Note: the solution to the normal equation is unique when rank(A) = n.
Example 6.12. Fit the data (xi, yi), i = 1, . . . , m, by a polynomial of degree 2, p(x) = c0 + c1x + c2x², where m > 3. The problem to solve is p(xi) = yi, that is, c0 + c1xi + c2xi² = yi, i = 1, . . . , m, which is linear in the unknowns c0, c1, c2. In matrix form,
    [1 x1 x1²; 1 x2 x2²; ⋮; 1 xm xm²] (c0, c1, c2)ᵀ = (y1, y2, . . . , ym)ᵀ,   (6.2.2)
or Ac = y, where
    Ac = (p(x1), p(x2), . . . , p(xm))ᵀ.   (6.2.3)
What is the rank of A? Any 3 × 3 submatrix built from three distinct nodes is a Vandermonde matrix, hence invertible (Ac = 0 implies c = 0), so
    rank(A) = 3   (6.2.4)
(full column rank), and AᵀAc = Aᵀy has a unique solution.
To solve the normal equations, form
    [1 1 ⋯ 1; x1 x2 ⋯ xm; x1² x2² ⋯ xm²] [1 x1 x1²; 1 x2 x2²; ⋮; 1 xm xm²] (c0, c1, c2)ᵀ
        = [1 1 ⋯ 1; x1 x2 ⋯ xm; x1² x2² ⋯ xm²] (y1, y2, . . . , ym)ᵀ,   (6.2.5a)
that is,
    [m ∑xi ∑xi²; ∑xi ∑xi² ∑xi³; ∑xi² ∑xi³ ∑xi⁴] (c0, c1, c2)ᵀ = (∑yi, ∑xiyi, ∑xi²yi)ᵀ.   (6.2.5b)
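A minimal NumPy sketch of this quadratic fit (the synthetic data below are our own; the normal-equation solution is compared against NumPy's least squares solver):

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 20)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.05 * rng.standard_normal(x.size)

A = np.column_stack([np.ones_like(x), x, x**2])   # m-by-3, rank 3
c_normal = np.linalg.solve(A.T @ A, A.T @ y)      # normal equations
c_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)   # SVD-based solver

assert np.allclose(c_normal, c_lstsq)
print(c_normal)                                   # roughly (1, -2, 3)

In floating point the normal equations square the condition number (κ(AᵀA) = κ(A)² for full-rank A), which is why library routines prefer QR or SVD; for well-conditioned problems the two answers agree.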
Suggestion: keep an outline of the major proofs we have shown in class in mind; go back and review them.
Theorem 6.13. The solution x of AᵀAx = Aᵀb minimizes ‖Ax − b‖2² = (Ax − b)ᵀ(Ax − b).
Figure 6.1. Minimization of distance between a point b and the plane of vectors Ax.
Figure 6.2. Parabolic fitting by least squares.
By corollary this is an if-and-only-if statement: every solution of the normal equations minimizes the sum of the squares of the entries of the residual vector, ∑_{i=1}^m (Ax − b)i². Note here ‖x‖2² = xᵀx = ∑ xi². We illustrate this in Figure 6.1, where the minimal line connecting a point to a plane is shown.

Example 6.14. What does the solution to the normal equations minimize in our example? The solution c0, c1, c2 minimizes
    ∑_{i=1}^m (Ac − y)i² = ∑ ((Ac)i − yi)² = ∑ (p(xi) − yi)².
We can visualize this parabolic least squares fit as shown in Figure 6.2.
Exam 1
We had a range from 36–98, with a median of 66. For this exam: 70–100 is an A-range score, 50–70 is about a B, and below is a C (as long as you are showing involvement in the class). The first two problems went fine; problem four was covered in class, and five was on the homework. We will go over the solution of the sixth problem in class next time.
6.3 Lecture 21: October 9, 2013
We need to hold a couple of classes early because of the missed day next Friday; next Monday and Wednesday we will start at 8:35.
We will review problem 6 from the exam, then finish up least squares, and cover linear independence and finally linear transformations.
Exam Review
We review exam problem 6. Given u, v ∈ Rⁿ: (a) show that A = I + uvᵀ has an inverse of the form A⁻¹ = I + αuvᵀ, and find α.
We check that AA⁻¹ = I:
    AA⁻¹ = (I + uvᵀ)(I + αuvᵀ)   (6.3.1a)
         = I + αuvᵀ + uvᵀ + uvᵀαuvᵀ   (6.3.1b)
         = I + αuvᵀ + uvᵀ + α(vᵀu)uvᵀ   (6.3.1c)
         = I + uvᵀ ( 1 + α(1 + vᵀu) ).   (6.3.1d)
This is equal to I if 1 + α(1 + vᵀu) = 0, i.e. when α = −1/(1 + vᵀu). Thus, the Sherman–Morrison formula is
    (I + uvᵀ)⁻¹ = I − uvᵀ/(1 + vᵀu).   (6.3.2)
For part (b), B = A + αeiejᵀ = A(I + αA⁻¹eiejᵀ), where A is invertible. For the inverse of B:
    B⁻¹ = (I + αA⁻¹eiejᵀ)⁻¹ A⁻¹   (6.3.3a)
        = [ I − αA⁻¹eiejᵀ/(1 + αejᵀA⁻¹ei) ] A⁻¹.   (6.3.3b)
This exists if 1 + αejᵀA⁻¹ei ≠ 0; note ejᵀA⁻¹ei = (A⁻¹)ji, so we can always make α sufficiently small.
Least squares and minimization
Theorem 6.15. x solves AᵀAx = Aᵀb if and only if x minimizes (Ax − b)ᵀ(Ax − b) = ‖Ax − b‖2², where ‖x‖2² = xᵀx = ∑i xi².
Note:
    f(x) = f(x1, x2, . . . , xn)   (6.3.4a)
         = (Ax − b)ᵀ(Ax − b)   (6.3.4b)
         = (xᵀAᵀ − bᵀ)(Ax − b)   (6.3.4c)
         = xᵀAᵀAx − xᵀAᵀb − bᵀAx + bᵀb.   (6.3.4d)
Since a 1 × 1 matrix equals its own transpose, bᵀAx = (bᵀAx)ᵀ = xᵀAᵀb. This reduces the previous result to
    f(x) = xᵀAᵀAx − 2xᵀAᵀb + bᵀb.   (6.3.5)
This is a quadratic form, and the minimum occurs where ∂f/∂xi = 0 for all i.
Proof. To prove the right-to-left direction, suppose x minimizes f(x). Then for each i,
    0 = ∂f/∂xi   (6.3.6a)
      = (∂xᵀ/∂xi) AᵀAx + xᵀAᵀA (∂x/∂xi) − 2 (∂xᵀ/∂xi) Aᵀb   (6.3.6b)
      = 2eiᵀAᵀAx − 2eiᵀAᵀb.   (6.3.6c)
This gives us
    eiᵀAᵀAx = eiᵀAᵀb,   (6.3.7)
i.e.
    (AᵀAx)i = (Aᵀb)i for every i,   (6.3.8)
which is exactly AᵀAx = Aᵀb.
ASIDE: we used the product rule,
    ∂(uv)/∂xi = (∂u/∂xi) v + u (∂v/∂xi).   (6.3.9)
To prove the other direction, suppose that x solves AᵀAx = Aᵀb; we show f(y) ≥ f(x) for any y. Using Aᵀb = AᵀAx,
    f(y) − f(x) = yᵀAᵀAy − 2yᵀAᵀb − (xᵀAᵀAx − 2xᵀAᵀb)   (6.3.10a)
               = yᵀAᵀAy − 2yᵀAᵀAx + xᵀAᵀAx   (6.3.10b)
               = (Ay − Ax)ᵀ(Ay − Ax)   (6.3.10c)
               = ‖A(y − x)‖2²   (6.3.10d)
               ≥ 0.   (6.3.10e)
So any solution of the normal equations minimizes ‖Ax − b‖2². Further, if A has full rank (no nontrivial null space), the inequality is strict for y ≠ x, so we are guaranteed a unique least squares solution x. Finally, if A has a nontrivial null space (r < n), then we have infinitely many least squares solutions. □
In Matlab we can do help \ to find out what solution it returns for underdetermined systems. What does it minimize?
6.4 Homework Assignment 4: Due Monday, October 21, 2013
1. Textbook 4.1.1: Vector spaces, subspaces, fundamental subspaces of a matrix.
Determine which of the following subsets of Rn are in fact subspaces of Rn (n > 2).
(a) {x | xi ≥ 0},
(b) {x | x1 = 0},
(c) {x | x1x2 = 0},
(d) {x | ∑_{j=1}^n xj = 0},
(e) {x | ∑_{j=1}^n xj = 1},
(f) {x | Ax = b, where A_{m×n} ≠ 0 and b_{m×1} ≠ 0}.
2. Textbook 4.1.2
Determine which of the following subsets of Rn×n are in fact subspaces of Rn×n.
(a) The symmetric matrices.
(b) The diagonal matrices.
(c) The nonsingular matrices.
(d) The singular matrices.
(e) The triangular matrices.
(f) The upper-triangular matrices.
(g) All matrices that commute with a given matrix A.
(h) All matrices such that A2 = A.
(i) All matrices such that tr(A) = 0.
3. Textbook 4.1.6
Which of the following are spanning sets for R3?
(a) {(1 1 1)},
(b) {(1 0 0), (0 0 1)},
(c) {(1 0 0), (0 1 0), (0 0 1), (1 1 1)},
(d) {(1 2 1), (2 0 −1), (4 4 1)},
(e) {(1 2 1), (2 0 −1), (4 4 0)}.
4. Textbook 4.1.7
For a vector space V , and for M,N ⊆ V , explain why span(M∪N ) = span(M) +span(N ).
5. Textbook 4.2.1
Determine spanning sets for each of the four fundamental subspaces associated with
A = [1 2 1 1 5; −2 −4 0 4 −2; 1 2 2 4 9].
6. Textbook 4.2.3
Suppose that A is a 3× 3 matrix such that
    R = { (1, 2, 3)ᵀ, (1, −1, 2)ᵀ }   and   N = { (−2, 1, 0)ᵀ }
span R(A) and N(A), respectively. Consider a linear system Ax = b, where
    b = (1, −7, 0)ᵀ.
(a) Explain why Ax = b must be consistent.
(b) Explain why Ax = b cannot have a unique solution.
7. Textbook 4.2.7
If A = [A1; A2] is a square matrix such that N(A1) = R(A2ᵀ), prove that A must be nonsingular.
8. Textbook 4.2.8
Consider a linear system of equations Ax = b for which yᵀb = 0 for every y ∈ N(Aᵀ).Explain why this means the system must be consistent.
9. Textbook 4.3.1(abc): Linear independence, basis.
Determine which of the following sets are linearly independent. For those sets that are linearly dependent, write one of the vectors as a linear combination of the others.

(a) { (1, 2, 3)ᵀ, (2, 1, 0)ᵀ, (1, 5, 9)ᵀ }
(b) { (1 2 3), (0 4 5), (0 0 6), (1 1 1) }
(c) { (3, 2, 1)ᵀ, (1, 0, 0)ᵀ, (2, 1, 0)ᵀ }
10. Textbook 4.3.4
Consider a particular species of wild flower in which each plant has several stems, leaves, and flowers, and for each plant let the following hold:
S = the average stem length (in inches),
L = the average leaf width (in inches),
F = the number of flowers.
Four particular plants are examined, and the information is tabulated in the following matrix:
         S  L  F
    #1 [ 1  1  10
    #2   2  1  12
    #3   2  2  15
    #4   3  2  17 ]
For these four plants, determine whether or not there exists a linear relationship between S, L, and F. In other words, do there exist constants α0, α1, α2, and α3 such that α0 + α1S + α2L + α3F = 0?
11. Textbook 4.3.13
Which of the following sets of functions are linearly independent?
(a) {sin x, cos x, x sin x}.
(b) {eˣ, xeˣ, x²eˣ}.
(c) {sin²x, cos²x, cos 2x}.
12. Textbook 4.4.2
Find a basis for each of the four fundamental subspaces associated with
A = [1 2 0 2 1; 3 6 1 9 6; 2 4 1 7 5].   (6.4.1)
13. Textbook 4.4.8
Let B = {b1, b2, . . . , bn} be a basis for a vector space V. Prove that each v ∈ V can be expressed as a linear combination of the bi's, v = α1b1 + α2b2 + · · · + αnbn, in only one way; i.e., the coordinates αi are unique.
14. Textbook 4.5.5
For A ∈ Rm×n, explain why AᵀA = 0 implies A = 0.
15. Textbook 4.5.8
Is rank(AB) = rank(BA) when both products are defined? Why?
16. Textbook 4.5.14
Prove that if the entries of F_{r×r} satisfy ∑_{j=1}^r |fij| < 1 for each i (i.e., each absolute row sum is < 1), then I + F is nonsingular. Hint: Use the triangle inequality for scalars, |α + β| ≤ |α| + |β|, to show N(I + F) = {0}.
17. Textbook 4.5.18
If A is n× n, prove that the following statements are equivalent:
(a) N(A) = N(A2)
(b) R(A) = R(A2)
(c) R(A) ∩ N(A) = {0}
18. Textbook 4.6.1: Least Squares.
Hooke's law says that the displacement y of an ideal spring is proportional to the force x that is applied, i.e., y = kx for some constant k. Consider a spring in which k is unknown. Various masses are attached, and the resulting displacements shown in the figure are observed. Using these observations, determine the least squares estimate for k.
19. Textbook 4.6.2
Show that the slope of the line that passes through the origin in R² and comes closest in the least squares sense to passing through the points {(x1, y1), (x2, y2), . . . , (xn, yn)} is given by m = ∑i xiyi / ∑i xi².
20. Textbook 4.6.6
After studying a certain type of cancer, a researcher hypothesizes that in the short run the number (y) of malignant cells in a particular tissue grows exponentially with time (t). That is, y = α0 e^{α1 t}. Determine least squares estimates for the parameters α0 and α1 from the researcher's observed data given below.
    t (days):  1   2   3   4   5
    y (cells): 16  27  45  74  122
Hint: What common transformation converts an exponential function into a linear function?
UNIT 7
Linear Transformations
7.1 Lecture 22: October 14, 2013
Theorem 7.1. Given a vector space V, if {u1, . . . , un} spans V and {vi}_{i=1}^m ⊂ V with m > n, then {vi}_{i=1}^m is linearly dependent (there are more vectors than in a spanning set).

Proof. Consider ∑_{i=1}^m αivi = 0. Using vi = ∑_{j=1}^n cijuj,
    ∑_{i=1}^m αi ∑_{j=1}^n cijuj = ∑_{j=1}^n ( ∑_{i=1}^m αicij ) uj = ∑_{j=1}^n (Cᵀα)j uj = 0,
with α = (α1 ⋯ αm)ᵀ. The system Cᵀα = 0 has nonzero solutions α, since there are m − n > 0 free variables. For such an α ≠ 0 we get (Cᵀα)j = 0 for every j and hence ∑ αivi = 0, so the set is linearly dependent. □
Any two bases for V have the same number of elements.
Definition 7.2. Let V be a vector space with basis B = {b1, . . . , bn}. The coordinates of x ∈ V are the scalars c1, . . . , cn such that x = ∑_{j=1}^n cjbj.

Theorem 7.3. The coordinates of x ∈ V with respect to the basis B are unique; we write
    [x]B = (c1, . . . , cn)ᵀ.
Example 7.4. We take as an example a vector x ∈ R³,
    x = (1, 2, 3)ᵀ = 1e1 + 2e2 + 3e3 = ı + 2ȷ + 3k,   (7.1.1)
so with the standard basis S = {e1, . . . , en} of Rⁿ,
    [x]S = (1, 2, 3)ᵀ.   (7.1.2)
We can have another basis for R³:
    B = { (1, 1, 0)ᵀ, (1, 1, 1)ᵀ, (2, 0, 0)ᵀ }.   (7.1.3)
This is linearly independent because the matrix [2 1 1; 0 1 1; 0 0 1], the basis vectors written as columns in a reordered way, is nonsingular. In this basis,
    [x]B = (−1, 3, −1/2)ᵀ.   (7.1.4)
To find (c1, c2, c3) such that
    c1 (1, 1, 0)ᵀ + c2 (1, 1, 1)ᵀ + c3 (2, 0, 0)ᵀ = 1e1 + 2e2 + 3e3,   (7.1.5)
write it in matrix form,
    [1 1 2; 1 1 0; 0 1 0] (c1, c2, c3)ᵀ = (1, 2, 3)ᵀ.   (7.1.6)
Solving for the individual variables: the third equation gives c2 = 3; the second gives c1 = 2 − c2 = −1; the first gives 2c3 = 1 − c1 − c2 = −1, so c3 = −1/2.   (7.1.7)
Summary
• For any vector space, V , there exists a basis B.
• Any x ∈ V is represented uniquely by a tuple of numbers, the coordinates [x]B.
Linear Transformations
Definition 7.5. Given the vector spaces U ,V , a map T : U → V such that,
• T(x + y) = T(x) + T(y)
• T(αx) = αT(x)
is a linear transformation of U → V .
We also recognize that a linear transformation is a linear function on vector spaces.
Definition 7.6. A linear transformation U → U is a linear operator on U .
Our goal now is twofold:
• show that the set of all linear transformations U → V is a vector space, L(U, V);
• find a basis for L(U, V) and the coordinates of any T ∈ L(U, V).
Examples of Linear Functions
Example 7.7. T(x) = A_{m×n}x_{n×1}, so T : Rⁿ → Rᵐ. Examples: rotation A = R(θ), projection, reflection.
Example 7.8. f(x) = ax, f : R → R.
Example 7.9. D(f) = df/dx, D : Pn → P_{n−1}, or D : C¹ → the set of all functions.
Example 7.10. I(f) = ∫_a^b f(x) dx, I : C⁰ → R.
Example 7.11. One final example regarding matrices: T(B_{n×k}) = A_{m×n}B_{n×k}, T : R^{n×k} → R^{m×k}.
Matrix representation of linear transformations
Every linear transformation on finite-dimensional spaces has a matrix representation. Suppose T : U → V, B = {u1, . . . , un} is a basis for U, and B′ = {v1, . . . , vm} is a basis for V. Then the action of T on u ∈ U is
    T(u) = T( ∑_{i=1}^n ξiui )   (7.1.8a)
         = ∑_{i=1}^n ξi T(ui)   (7.1.8b)
         = ∑_{i=1}^n ξi ∑_{j=1}^m αij vj   (7.1.8c)
         = ∑_{i=1}^n ∑_{j=1}^m αij ξi vj,   (7.1.8d)
where the numbers αij, defined by T(ui) = ∑j αij vj, describe the action of T.
Theorem 7.12. The set of all linear transformations T : U → V, denoted L(U, V), is a vector space.

Proof. Given T1, T2 ∈ L(U, V), we have (T1 + T2)(x) = T1x + T2x, so T1 + T2 ∈ L(U, V); further, (αT1)(x) = αT1(x), so αT1 ∈ L(U, V). Some other properties of note: the zero transformation 0x = 0 is in L(U, V); T1 − T1 = 0; etc. □
Theorem 7.13. Given U with basis B = {u1, . . . , un} and V with basis B′ = {v1, . . . , vm}, a basis for L(U, V) is {Bij}, i = 1, . . . , n, j = 1, . . . , m, where Bij : U → V is defined by Bij(u) = ξivj for u = ∑_{k=1}^n ξkuk.
It follows that dim(L(U, V)) = dim(U) dim(V) = nm.

Proof. Let's prove linear independence: consider ∑ ηijBij = 0. Then, applying to uk,
    0 = ( ∑_{ij} ηijBij )(uk)   (7.1.9a)
      = ∑_{ij} ηij Bij(uk)   (7.1.9b)
      = ∑_j ηkj vj,   (7.1.9c)
since Bij(uk) = 0 for i ≠ k and Bkj(uk) = vj (note [uk]B = (0, . . . , 1, . . . , 0)ᵀ with the 1 in the kth position). Since {vj} is linearly independent, it follows that ηkj = 0 for all j and each k. Therefore the Bij are linearly independent. □
7.2 Lecture 23: October 16, 2013
The next major things we are going to try to cover are:
• Basis for L(U, V); coordinates for T ∈ L(U, V)
• Action of T
• Change of coordinates of u ∈ U under change of basis
• Change of coordinates of T ∈ L(U ,V) under change of basis
Basis of a linear transformation
The set of linear transformations is
    L(U, V) = {T : U → V | T is a linear transformation}.   (7.2.1)

Theorem 7.14. Let B = {u1, . . . , un} be a basis for U and B′ = {v1, . . . , vm} a basis for V, and for u = ∑_{k=1}^n ξkuk define Bij : U → V by Bij(u) = ξjvi. Then {Bij} is a basis for L(U, V).
Proof. First, we observe that we have linear independence (shown last time). Second, we check the span. Let T ∈ L(U, V); then
    T(u) = T( ∑j ξjuj ) = ∑j ξj T(uj) = ∑j ξj ∑_{i=1}^m αijvi,   (7.2.2a–c)
where we recognize that T(uj) = ∑_{i=1}^m αijvi. Hence
    T(u) = ∑j ∑_{i=1}^m αij ξjvi = ( ∑j ∑_{i=1}^m αij Bij )(u)   (7.2.2d–e)
for any u. Thus T = ∑j ∑i αijBij, so {Bij} spans L(U, V). It follows that
    [T]BB′ = {αij} = [α11 α12 ⋯ α1n; α21 α22 ⋯ α2n; ⋮; αm1 αm2 ⋯ αmn]   (7.2.3a,b)
           = ( [T(u1)]B′  [T(u2)]B′  ⋯  [T(un)]B′ ).   (7.2.3c)
□
If T : U → U is a linear operator mapping a space to itself, we abbreviate [T]BB = [T]B for convenience.
Example 7.15. Let D : Pn → P_{n−1} by D(p) = dp/dx. Our basis is B = {1, x, . . . , xⁿ}, and the target basis is B′ = {1, x, . . . , x^{n−1}}. So,
    [D(1)]B′ = [0]B′ = (0, . . . , 0)ᵀ,   (7.2.4a,b)
    [D(x)]B′ = [1]B′ = (1, 0, . . . , 0)ᵀ,   (7.2.4c,d)
    [D(x²)]B′ = [2x]B′ = (0, 2, 0, . . . , 0)ᵀ,   (7.2.4e,f)
    ⋮
    [D(xⁿ)]B′ = [nx^{n−1}]B′ = (0, . . . , 0, n)ᵀ.   (7.2.4g,h)
This allows us to represent the differentiation operator by the matrix
    [D]BB′ = [0 1 0 0 ⋯ 0; 0 0 2 0 ⋯ 0; 0 0 0 3 ⋱ 0; ⋮ ⋱ ⋱ ⋮; 0 0 0 0 ⋯ n]_{n×(n+1)}.   (7.2.5)
Example 7.16. Let D : Pn → Pn by D(p) = dp/dx. This is the same as the previous example except we add a row of zeros at the bottom, giving a square matrix:
    [D]B = [0 1 0 0 ⋯ 0; 0 0 2 0 ⋯ 0; 0 0 0 3 ⋱ 0; ⋮ ⋱ ⋱ n; 0 0 0 0 ⋯ 0]_{(n+1)×(n+1)}.   (7.2.6)
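A small NumPy sketch of this matrix acting on (ascending-order) coefficient vectors; the check against numpy.polynomial's derivative routine is our own addition:

import numpy as np
from numpy.polynomial import polynomial as P

n = 3
D = np.diag(np.arange(1.0, n + 1), k=1)   # (n+1)x(n+1) matrix of (7.2.6)

p = np.array([4.0, 3.0, 2.0, 1.0])        # p(x) = 4 + 3x + 2x^2 + x^3
dp = D @ p                                # (3, 4, 3, 0), i.e. 3 + 4x + 3x^2
assert np.allclose(dp[:-1], P.polyder(p))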
We may do this for any operator; for example, we could do the same for a projection. What we want is a basis that gives a nice representation of the operator; highly sparse representations are nice.
Action of a linear transformation

The action of T : U → V. Recall,
    T(u) = T( ∑_{j=1}^n ξjuj )   (7.2.7a)
         = ∑_{j=1}^n ξj T(uj)   (7.2.7b)
         = ∑_{j=1}^n ξj ∑_{i=1}^m αijvi   (7.2.7c)
         = ∑_{i=1}^m ( ∑_{j=1}^n αijξj ) vi = ∑_{i=1}^m [Aξ]i vi.   (7.2.7d)
This gives us the coordinates in the V basis:
    [T(u)]B′ = Aξ = [T]BB′ [u]B.   (7.2.8)
Thus the action is represented by matrix multiplication. Now return to our example.
Example 7.17. Let D : Pn → P_{n−1} by D(p) = dp/dx, with bases B = {1, x, . . . , xⁿ} and B′ = {1, x, . . . , x^{n−1}}. If we consider p(x) = α0 + α1x + · · · + αnxⁿ, then D(p(x)) = α1 + 2α2x + · · · + nαnx^{n−1}. This gives the vector representation
    [D(p)]B′ = (α1, 2α2, 3α3, . . . , nαn)ᵀ = [0 1 0 0 ⋯ 0; 0 0 2 0 ⋯ 0; 0 0 0 3 ⋱ 0; ⋮ ⋱ ⋱ ⋮; 0 0 0 0 ⋯ n] (α0, α1, α2, α3, . . . , αn)ᵀ.   (7.2.9)
It follows that [L + T]BB′ = [L]BB′ + [T]BB′ and [αL]BB′ = α[L]BB′. We may also consider the composition of linear operators: with L(T(x)) = (LT)(x), where T : U → V and L : V → W, we have [LT]BB′′ = [L]B′B′′ [T]BB′.
Change of Basis
Suppose we change the coordinates of our system. Given a vector space U, let B = {u1, . . . , un} and B′ = {v1, . . . , vn} be two bases for U. The relation between [u]B and [u]B′ is given by
    [u]B′ = P [u]B.   (7.2.10)
P is called the change of basis matrix from B to B′. Recall [T(u)]B′ = [T]BB′ [u]B; clearly P is this matrix when T = I, i.e. P = [I]BB′. We illustrate with polynomials once more.

Example 7.18. Given U = P2 with bases B = {1, t, t²} and B′ = {1, 1 + t, 1 + t + t²},
    P = [I]BB′ = ( [1]B′  [t]B′  [t²]B′ ) = [1 −1 0; 0 1 −1; 0 0 1].   (7.2.11)
We know this is true for any u. We can find the representation of the polynomial p(t) = 3 + 2t + 4t² in B′:
    [p]B′ = [1 −1 0; 0 1 −1; 0 0 1] (3, 2, 4)ᵀ = (1, −2, 4)ᵀ.   (7.2.12)
Finally, let U be a vector space with bases B = {u1, . . . , un} and B′ = {v1, . . . , vn}, and let T : U → U. We want the relation between [T]B and [T]B′; let P = [I]BB′. We have
    [T(u)]B′ = [T]BB′ [u]B = A [u]B,   (7.2.13)
and furthermore
    [u]B′ = P [u]B,   [T(u)]B′ = P [T(u)]B.   (7.2.14)
So
    P [T(u)]B = A . . .   (7.2.15)
to be continued. . .
Note: No class Friday.
7.3 Lecture 24: October 21, 2013
Change of Basis (cont.)
If we have T : U → U, let U be a vector space with bases B = {u1, . . . , un} and B′ = {v1, . . . , vn}. Recall:
1. A basis for L(U, V) is {Bij : Bij(u) = ξjvi, where u = ∑k ξkuk}, and the coordinates of T are [T] = ( [Tu1]B′ [Tu2]B′ ⋯ [Tun]B′ ).
2. The action of T: [T(u)]B′ = [T]BB′ [u]B.
3. Given x ∈ U with two bases B, B′ for U: [x]B′ = P [x]B, with P = [I]BB′.
4. For T : U → U with two bases B, B′ for U, we want to relate [T]B and [T]B′.
To show property 4,
    [Tu]B′ = [T]B′B′ [u]B′,   [Tu]B = [T]BB [u]B.   (7.3.1)
But also, with P = [I]BB′,
    [Tu]B′ = P [Tu]B,   [u]B′ = P [u]B.   (7.3.2)
So
    P [T]BB [u]B = P [Tu]B = [Tu]B′ = [T]B′B′ P [u]B   (7.3.3)
for every u, and we get
    [T]BB = P⁻¹ [T]B′B′ P,   i.e.   [T]B = P⁻¹ [T]B′ P.   (7.3.4)
The matrix representations of T under different bases are similar.

Definition 7.19. If A = C⁻¹BC for some C, then A and B are similar (A, B, C ∈ R^{n×n}).

Theorem 7.20. Any two similar matrices A, B represent the same linear transformation under two different bases.
Example 7.21. An example illustrating similarity, [T]B = P⁻¹[T]B′P. Let T ∈ L(U, U) be defined by
    Tu = [0 1; −2 3] (x, y)ᵀ,   where u = xu1 + yu2,   (7.3.5)
so
    Tu = (y, −2x + 3y)ᵀ = yu1 + (−2x + 3y)u2.   (7.3.6)
In basis notation,
    [Tu]B = M [u]B.   (7.3.7)
Now consider two bases: S = {e1, e2} and S′ = { (1, 1)ᵀ, (1, 2)ᵀ }. First,
    [T]S = ( [Te1]S  [Te2]S ) = ( (0, −2)ᵀ  (1, 3)ᵀ ) = [0 1; −2 3] = M.   (7.3.8)
Now in the other basis, since T(1, 1)ᵀ = (1, 1)ᵀ and T(1, 2)ᵀ = (2, 4)ᵀ = 2(1, 2)ᵀ,
    [T]S′ = ( [T(1, 1)ᵀ]S′  [T(1, 2)ᵀ]S′ ) = [1 0; 0 2].   (7.3.9)
This helps us by diagonalizing the operator. Now we want to find P:
    P = [I]SS′ = ( [e1]S′  [e2]S′ ) = [2 −1; −1 1].   (7.3.10)
Similarly,
    P⁻¹ = [1 1; 1 2].   (7.3.11)
We can verify this:
    P⁻¹ [T]S′ P = [1 1; 1 2] [1 0; 0 2] [2 −1; −1 1]   (7.3.12a)
                = [1 1; 1 2] [2 −1; −2 2]   (7.3.12b)
                = [0 1; −2 3].   (7.3.12c)
So this checks out.
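The same verification in NumPy, a direct transcription of the example:

import numpy as np

M = np.array([[0., 1.], [-2., 3.]])       # [T] in the standard basis
Q = np.array([[1., 1.], [1., 2.]])        # columns: q1, q2 of S'

assert np.allclose(np.linalg.inv(Q) @ M @ Q, np.diag([1., 2.]))  # [T]_{S'}

P = np.linalg.inv(Q)                      # P = [I]_{S S'}
assert np.allclose(np.linalg.inv(P) @ np.diag([1., 2.]) @ P, M)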
Example 7.22. Let M ∈ L(Rⁿ, Rⁿ) be defined by [M(u)]S = M [u]S, where S is the standard basis. Then
    [M]S = M = ( [Me1]S  [Me2]S  ⋯  [Men]S ),   (7.3.13)
and define S′ = {q1, . . . , qn}. With Q = [I]S′S = ( [q1]S ⋯ [qn]S ) = ( q1 q2 ⋯ qn ),
    [M]S′ = Q⁻¹MQ.   (7.3.14)
Now let A = Q⁻¹BQ with S = {e1, . . . , en} and S′ = {q1, . . . , qn}, and let L(u) = Bu. Then [L]S = B and [I]S′S = Q, so [L]S′ = Q⁻¹BQ.
If T ∈ L(U ,U) and X ⊂ U such that T(X ) ⊂ X where T(X ) = {T(x) such that x ∈ X}then X is an invariant subspace of U under T.
Example 7.23. If (λ, v) is an eigen-pair of A, then
    (A − λI)v = 0,   i.e.   Av = λv,   (7.3.15)
and span{v} is an invariant subspace under A.
7.4 Lecture 25: October 23, 2013
Properties of Special Bases
If B and B′ are bases for U and T : U → U, then we have
    [T]BB′ = ( [T(u1)]B′ ⋯ [T(un)]B′ ),   (7.4.1a)
    [T]B = ( [T(u1)]B ⋯ [T(un)]B ) = P⁻¹ [T]B′ P,   (7.4.1b,c)
with P = [I]BB′. Consider T on Rⁿ, T(x) = Ax, so [T]S = A. Then A = P⁻¹BP for appropriate B and P, with B = [T]B′.
Note: a tuple is an ordered list of numbers.
Now we have two goals:
1. find a basis such that [T]B is simple;
2. find invariant quantities.
Example 7.24. tr(P⁻¹BP) = tr(BPP⁻¹) = tr(B).

Example 7.25. For T : Pn → Pn by T(p) = Dp,
    [T]B = [0 1 0 ⋯ 0; 0 0 2 ⋱ ⋮; ⋮ ⋱ n; 0 ⋯ 0 0],   (7.4.2)
so tr(T) = 0.
Example 7.26. rank(P−1BP) = rank(B)
Example 7.27. A nilpotent operator of index k is N : U → U such that Nᵏ = 0 but N^{k−1} ≠ 0. On the homework we will show that {x, Nx, N²x, . . . , N^{k−1}x} is a basis for Rᵏ, where x is chosen such that N^{k−1}(x) ≠ 0. In this basis,
    [N]B = [0 0 ⋯ 0; 1 0 ⋱ ⋮; ⋮ ⋱ 0 0; 0 ⋯ 1 0] = J.   (7.4.3)
Example 7.28. An idempotent operator E : U → U has the property E² = E; these are projection operators, which return the same answer when applied twice. With the basis
    B = { x1, . . . , xr (a basis of R(E)), y1, . . . , y_{n−r} (a basis of N(E)) },
we get
    [E]B = [I_{r×r} 0; 0 0].   (7.4.4)
Example 7.29. If A has a full set of eigenvectors qj, j = 1, . . . , n, then Aqj = λjqj. With bases S and P = {q1, . . . , qn},
    [I]PS = ( q1 ⋯ qn ) = Q,   [T]P = Q⁻¹ [T]S Q,   i.e.   Λ = Q⁻¹AQ,   (7.4.5)
so
    [T]P = [λ1 0 ⋯ 0; 0 λ2 ⋱ ⋮; ⋮ ⋱ ⋱ 0; 0 ⋯ 0 λn],   where T(x) = Ax.   (7.4.6)
Invariant Subspaces
Let T be a linear operator T : U → U .
Definition 7.30. A subset X ⊂ U is invariant under T if Tx ∈ X for any x ∈ X (i.e. T(X) ⊂ X). The restriction T/X : X → X is then itself a linear operator.
Example 7.31. Given
    T(x) = Ax,   A = [−1 −1 −1 −1; 0 −5 −16 −22; 0 3 10 14; 4 8 12 14],   (7.4.7)
let X = span{q1, q2}, where q1 = (2, −1, 0, 0)ᵀ and q2 = (−1, 2, −1, 0)ᵀ. Show that X is invariant under T. So,
    T(q1) = (−1, 5, −3, 0)ᵀ = q1 + 3q2 ∈ X,   (7.4.8a,b)
    T(q2) = (0, 6, −4, 0)ᵀ = 2q1 + 4q2 ∈ X.   (7.4.8c,d)
So for any x = α1q1 + α2q2, T(α1q1 + α2q2) = α1T(q1) + α2T(q2) ∈ X. Thus, for T : R⁴ → R⁴, the restriction T/X : X → X has
    [T/X]_{q1,q2} = [1 2; 3 4].   (7.4.9)
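A two-line numerical confirmation of the invariance (NumPy):

import numpy as np

A = np.array([[-1., -1., -1., -1.],
              [0., -5., -16., -22.],
              [0., 3., 10., 14.],
              [4., 8., 12., 14.]])
q1 = np.array([2., -1., 0., 0.])
q2 = np.array([-1., 2., -1., 0.])

assert np.allclose(A @ q1, q1 + 3 * q2)      # T(q1) = q1 + 3 q2
assert np.allclose(A @ q2, 2 * q1 + 4 * q2)  # T(q2) = 2 q1 + 4 q2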
Now extend to a basis of R⁴: take P = { q1, q2, e1, e4 }. Then
    [T]P = [1 2 0 x; 3 4 0 x; 0 0 −1 x; 0 0 4 x],   (7.4.10)
so we have gained a block of zeros below [T/X]. The third and fourth columns come from
    T(e1) = Ae1 = (−1, 0, 0, 4)ᵀ = −e1 + 4e4,   (7.4.11a)
    T(e4) = Ae4 = (−1, −22, 14, 14)ᵀ,   (7.4.11b)
expressed in the coordinates of P.
Now suppose X, Y are subspaces of U, both invariant under T (T(X) ⊂ X and T(Y) ⊂ Y), with X + Y = U. Then with B = {x1, . . . , xr, y1, . . . , y_{n−r}},
    [T]B = ( [T(x1)]B ⋯ [T(xr)]B  [T(y1)]B ⋯ [T(y_{n−r})]B )   (7.4.12a)
         = [ [T/X]_{Bx}  0;  0  [T/Y]_{By} ]   (7.4.12b)
         = Q⁻¹AQ.   (7.4.12c)
7.5 Homework Assignment 5: Due Monday, November 4, 2013
1. Explain how we proved in class that, for any A ∈ R^{m×n}, the linear system AᵀAx = Aᵀb is consistent. Do not reproduce all proofs, but outline the train of thought, starting from basic linear algebra facts.
2. For the overdetermined linear system
    [1 2; 1 2; 1 2] x = (1, 1, 2)ᵀ,
(a) Is the matrix A rank-deficient or of full rank? What is the rank of AᵀA?
(b) Find all least squares solutions.
(c) Find the solution that Matlab returns using A\b. Also find the least squares solution of minimum norm. Do they agree?
(d) What criterion does Matlab use to choose a solution? (Use help mldivide to find out.)
3. Textbook 4.7.2: Linear transformations
For A ∈ Rn×n, determine which of the following functions are linear transformations.
(a) T(Xn×n) = AX−XA,
(b) T(xn×1) = Ax + b for b 6= 0,
(c) T(A) = Aᵀ,
(d) T(Xn×n) = (X + Xᵀ) /2.
4. Textbook 4.7.6
For the operator T : R2 → R2 defined by T(x, y) = (x+ y,−2x+ 4y), determine [T]B,
where B is the basis B = { (1, 1)ᵀ, (1, 2)ᵀ }.
5. Textbook 4.7.11
Let P be the projector that maps each point v ∈ R² to its orthogonal projection on the line y = x, as depicted in Figure 4.7.4.

Figure 7.1. Figure 4.7.4 (projection onto the line y = x).

(a) Determine the coordinate matrix of P with respect to the standard basis.
(b) Determine the orthogonal projection of v = (α, β)ᵀ onto the line y = x.
6. Textbook 4.7.13
For P₂ and P₃ (the spaces of polynomials of degree at most two and three, respectively), let S : P₂ → P₃ be the linear transformation defined by S(p) = ∫₀ᵗ p(x) dx. Determine [S]_{BB′}, where B = {1, t, t²} and B′ = {1, t, t², t³}.
7. Textbook 4.8.1: Change of basis
Explain why rank is a similarity invariant.
8. Textbook 4.8.2
Explain why similarity is transitive in the sense that A ∼ B and B ∼ C implies A ∼ C.
9. Textbook 4.8.3
A(x, y, z) = (x+ 2y − z,−y, x+ 7z) is a linear operator on R3.
(a) Determine [A]S , where S is the standard basis.
(b) Determine [A]_{S′} as well as the nonsingular matrix Q such that [A]_{S′} = Q^{−1}[A]_S Q, for S′ = {(1, 0, 0)ᵀ, (1, 1, 0)ᵀ, (1, 1, 1)ᵀ}.
10. Textbook 4.8.11
(a) N is nilpotent of index k when N^k = 0 but N^{k−1} ≠ 0. If N is a nilpotent operator of index n on Rⁿ, and if N^{n−1}(y) ≠ 0, show B = {y, N(y), N²(y), . . . , N^{n−1}(y)} is a basis for Rⁿ, and then demonstrate that

[N]_B = J = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}.
(c) Explain why all n×n nilpotent matrices of index n must have a zero trace and beof rank n− 1.
11. Textbook 4.8.12
E is idempotent when E² = E. For an idempotent operator E on Rⁿ, let X = {x_i}_{i=1}^{r} and Y = {y_i}_{i=1}^{n−r} be bases for R(E) and N(E), respectively.
(a) Prove that B = X ∪ Y is a basis for Rn. Hint: Show Exi = xi and use this todeduce that B is linearly independent.
(b) Show that [E]_B = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.
(c) Explain why two n× n idempotent matrices of the same rank must be similar.
(d) If F is an idempotent matrix, prove that rank(F) = tr(F).
12. Textbook 4.9.3: Invariant subspaces
Let T be the linear operator on R4 defined by
T(x1, x2, x3, x4) = (x1 + x2 + 2x3 − x4, x2 + x4, 2x3 − x4, x3 + x4),
and let X = span {e1, e2} be the subspace that is spanned by the first two unit vectorsin R4.
(a) Explain why X is invariant under T.
(b) Determine [T/X]_{\{e₁,e₂\}}.
(c) Describe the structure of [T]B, where B is any basis obtained from an extension of{e1, e2}.
13. Textbook 4.9.4
Let T and Q be the matrices
T = \begin{pmatrix} −2 & −1 & −5 & −2 \\ −9 & 0 & −8 & −2 \\ 2 & 3 & 11 & 5 \\ 3 & −5 & −13 & −7 \end{pmatrix} and Q = \begin{pmatrix} 1 & 0 & 0 & −1 \\ 1 & 1 & 3 & −4 \\ −2 & 0 & 1 & 0 \\ 3 & −1 & −4 & 3 \end{pmatrix}
(a) Explain why the columns of Q are a basis for R4.
(b) Verify that X = span {Q:1,Q:2} and Y = span {Q:3,Q:4} are each invariant sub-spaces under T.
(c) Describe the structure of Q−1TQ without doing any computation.
(d) Now compute the product Q^{−1}TQ to determine [T/X]_{\{Q_{:1},Q_{:2}\}} and [T/Y]_{\{Q_{:3},Q_{:4}\}}.
14. Textbook 4.9.7
If A is an n × n matrix and λ is a scalar such that (A − λI) is singular (i.e., λ is aneigenvalue), explain why the associated space of eigenvectors N(A−λI) is an invariantsubspace under A.
15. Textbook 4.9.8
Consider the matrix A = \begin{pmatrix} −9 & 4 \\ −24 & 11 \end{pmatrix}.
(a) Determine the eigenvalues of A.
(b) Identify all subspaces of R2 that are invariant under A.
(c) Find a nonsingular matrix Q such that Q−1AQ is a diagonal matrix.
UNIT 8
Norms
8.1 Lecture 26: October 25, 2013
Homework 5 due Friday
Definition of norms
Norm acts on a vector space V over R or C.
Definition 8.1. A norm is a function ‖ · ‖ : V → R, x ↦ ‖x‖, such that
1. ‖x‖ ≥ 0 for any x ∈ V , and ‖x‖ = 0 if and only if x = 0
2. ‖αx‖ = |α|‖x‖
3. ‖x + y‖ ≤ ‖x‖+ ‖y‖
Vector Norms
Some norms:

• ‖x‖₂ = sqrt(Σ_{i=1}^{n} x_i²), the 2-norm or Euclidean norm

• ‖x‖₁ = Σ_{i=1}^{n} |x_i|

• ‖x‖_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p}

• ‖x‖_∞ = max_i |x_i| = lim_{p→∞} ‖x‖_p
The two norm

A unit vector is x/‖x‖, and the unit ball in R² is {x ∈ R² : ‖x‖ ≤ 1}. We illustrate the unit spheres {‖x‖ = 1} for the three primary norms: ‖x‖₂ = 1 gives a circle; ‖x‖₁ = 1, i.e. |x₁| + |x₂| = 1, gives a rhombus; ‖x‖_∞ = 1, i.e. max(|x₁|, |x₂|) = 1, gives a square.
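As a quick numerical illustration (a hedged sketch, not from the lecture), Matlab's built-in norm evaluates all of the norms above; the example vector is an arbitrary choice:

x = [3; -4; 1];
n1 = norm(x,1);      % sum of absolute values = 8
n2 = norm(x,2);      % Euclidean norm = sqrt(26)
ninf = norm(x,inf);  % largest absolute entry = 4
% Theorem 8.2 below: ninf <= n2 <= n1
disp([ninf, n2, n1])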
Theorem 8.2. ‖x‖_∞ ≤ ‖x‖₂ ≤ ‖x‖₁

Proof.

‖x‖_∞ = max_i |x_i| = max_i sqrt(x_i²) = sqrt(x_k²), for some k,
 ≤ sqrt(Σ_{i=1}^{n} x_i²) = ‖x‖₂;   (8.1.1a-e)

‖x‖₂ = sqrt(Σ |x_i|²) ≤ sqrt((Σ |x_i|)²) = ‖x‖₁.   (8.1.1f-h) □
Our goal is now to prove the triangle inequality for the 2-norm. Note that ‖x‖₂² = Σ x_i² = xᵀx, where xᵀy is the standard inner product.
Theorem 8.3. The Cauchy–Schwarz inequality (or CBS): |xᵀy| ≤ ‖x‖₂‖y‖₂.

Proof. Let α = xᵀy / xᵀx; note xᵀy = yᵀx. Then

xᵀ(αx − y) = xᵀ((xᵀy/xᵀx) x − y) = (xᵀy/xᵀx) xᵀx − xᵀy = xᵀy − xᵀy = 0.   (8.1.2)

Further,

0 ≤ ‖αx − y‖₂² = (αx − y)ᵀ(αx − y) = α xᵀ(αx − y) − yᵀ(αx − y)
 = −α yᵀx + yᵀy = −|xᵀy|²/‖x‖₂² + ‖y‖₂².   (8.1.3)

This gives ‖y‖₂² ≥ |xᵀy|²/‖x‖₂², and therefore ‖x‖₂‖y‖₂ ≥ |xᵀy|. □
Theorem 8.4. ‖x + y‖₂ ≤ ‖x‖₂ + ‖y‖₂

Proof.

‖x + y‖₂² = (x + y)ᵀ(x + y) = xᵀx + 2xᵀy + yᵀy
 ≤ ‖x‖₂² + 2|xᵀy| + ‖y‖₂²
 ≤ ‖x‖₂² + 2‖x‖₂‖y‖₂ + ‖y‖₂²   (by CBS)
 = (‖x‖₂ + ‖y‖₂)²,   (8.1.4)

so ‖x + y‖₂ ≤ ‖x‖₂ + ‖y‖₂. □
Matrix Norms
Definition 8.5. A matrix norm is a function ‖ · ‖ : Rn×m → R such that,
1. ‖A‖ ≥ 0 for any A ∈ Rn×m, and ‖A‖ = 0 if and only if A = 0
2. ‖αA‖ = |α|‖A‖
3. ‖A + B‖ ≤ ‖A‖+ ‖B‖
The Frobenius Norm

The Frobenius norm is defined by

‖A‖_F = sqrt(Σ_{i,j} a_{ij}²),   (8.1.5)

or

‖A‖_F² = Σ_i ‖A_{i,:}‖₂² = Σ_j ‖A_{:,j}‖₂² = Σ_j a_jᵀa_j = tr(AᵀA),   (8.1.6)

which gives us a convenient way of expressing this norm.
Induced Norms

Given a vector norm on Rⁿ we may define (where sup is the smallest upper bound)

‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖ = sup_{‖x‖=1} ‖Ax‖.   (8.1.7)

Since the unit sphere is closed and bounded, the sup is attained and may be replaced by a maximum. We can now form ‖A‖₂, ‖A‖₁, and ‖A‖_∞.
8.2 Lecture 27: October 28, 2013
Matrix norms (review)
Definition 8.6. A norm on V
1. ‖A‖ ≥ 0 for any A ∈ Rn×m, and ‖A‖ = 0 if and only if A = 0
2. ‖αA‖ = |α|‖A‖
3. ‖A + B‖ ≤ ‖A‖+ ‖B‖
Frobenius Norm

The Frobenius norm is defined by

‖A‖_F² = Σ_{i,j} |a_{ij}|² = Σ_i ‖A_{i,:}‖₂² = Σ_j ‖A_{:,j}‖₂² = tr(AᵀA) = tr(A*A)   (8.2.1)

for A ∈ C^{n×m}. In the real case A* = Aᵀ.
Properties of the Frobenius norm:

1. ‖Ax‖₂ ≤ ‖A‖_F ‖x‖₂

2. ‖AB‖_F ≤ ‖A‖_F ‖B‖_F
Proof. Property (1), using CBS on each row:

‖Ax‖₂² = Σ_i (Ax)_i² = Σ_i (A_{i,:} x)² ≤ Σ_i ‖A_{i,:}‖₂² ‖x‖₂² = ‖x‖₂² ‖A‖_F².   (8.2.2)

Property (2), applying Property (1) to each column of B:

‖AB‖_F² = Σ_j ‖(AB)_{:,j}‖₂² = Σ_j ‖A B_{:,j}‖₂² ≤ Σ_j ‖A‖_F² ‖B_{:,j}‖₂² = ‖A‖_F² ‖B‖_F². □   (8.2.3)
Example 8.7.

A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}   (8.2.4)

AᵀA = \begin{pmatrix} 1 & 0 \\ 2 & 2 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 2 & 8 \end{pmatrix}.   (8.2.5)

So

‖A‖_F = sqrt(tr(AᵀA)) = sqrt(9) = 3,   (8.2.6)

which may be computed with norm(A,'fro') in Matlab.
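A short Matlab check of this trace identity (a sketch; the numbers follow the example above):

A = [1 2; 0 2];
nF = norm(A,'fro');        % built-in Frobenius norm
nTr = sqrt(trace(A'*A));   % via the trace identity (8.1.6)
disp([nF, nTr])            % both print 3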
Induced Matrix Norms

Definition 8.8. For A ∈ R^{n×m} the induced norm of the matrix is

‖A‖ = max_{x≠0} ‖Ax‖/‖x‖ = max_{‖x‖=1} ‖Ax‖.   (8.2.7)

Example 8.9.

A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}   (8.2.8)

‖A‖₁ = max_{‖x‖₁=1} ‖Ax‖₁ = max_{‖x‖₁=1} Σ_i |(Ax)_i|.   (8.2.9)

This describes how A remaps the vectors x. For example, we may find the images of the corners of the unit rhombus {‖x‖₁ = 1}; the largest 1-norm among the images gives ‖A‖₁ (here 4), though this picture is not the most illuminating. Returning to the ∞-norm,

‖A‖_∞ = max_{‖x‖_∞=1} ‖Ax‖_∞ = max_{‖x‖_∞=1} max_i |(Ax)_i|.   (8.2.10)

Here the corners of the unit square are remapped to a stretched parallelogram. What is the maximum ∞-norm? From the figure, we can see it is 3. Now we are interested in the mapping of the 2-norm unit ball, the unit circle:

‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂ ≈ 2.92.   (8.2.11)
ASIDE: For points on the image of the unit circle,

(Ax)₁² + (Ax)₂² = (a₁₁x₁ + a₁₂x₂)² + (a₂₁x₁ + a₂₂x₂)²
 = (a₁₁² + a₂₁²)x₁² + 2x₁x₂(a₁₁a₁₂ + a₂₁a₂₂) + (a₁₂² + a₂₂²)x₂² = constant,   (8.2.12)

which gives an ellipse.
Theorem 8.10. ‖A‖₁ = max_j Σ_i |a_{ij}|, the maximum column sum, and ‖A‖_∞ = max_i Σ_j |a_{ij}|, the maximum row sum.
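These formulas are easy to test numerically; a small hedged sketch using the example matrix above:

A = [1 2; 0 2];
colSums = sum(abs(A),1);           % column sums: [1 4]
rowSums = sum(abs(A),2);           % row sums: [3; 2]
disp([max(colSums), norm(A,1)])    % both 4
disp([max(rowSums), norm(A,inf)])  % both 3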
Properties

The induced norms of a matrix have similar properties to the Frobenius norm:

1. ‖Ax‖ ≤ ‖A‖‖x‖, since ‖Ax‖/‖x‖ ≤ ‖A‖

2. ‖AB‖ ≤ ‖A‖‖B‖ (will be shown in the homework)

Example 8.11. The induced norm of the identity matrix is 1: ‖I‖ = 1.
Proof (of the 1-norm formula).

‖A‖₁ = max_{‖x‖₁=1} Σ_i |(Ax)_i|
 = max_{‖x‖₁=1} Σ_i |Σ_j a_{ij} x_j|
 ≤ max_{‖x‖₁=1} Σ_i Σ_j |a_{ij}||x_j|
 = max_{‖x‖₁=1} Σ_j |x_j| Σ_i |a_{ij}|
 ≤ max_{‖x‖₁=1} (max_j Σ_i |a_{ij}|) Σ_j |x_j|   (the bracketed factor is independent of x)
 = max_j Σ_i |a_{ij}|,   since Σ_j |x_j| = ‖x‖₁ = 1.   (8.2.13)

Now find an x for which the upper bound is attained. Let k be a column for which Σ_i |a_{ik}| = max_j Σ_i |a_{ij}|, and let x = e_k. Then

‖Ax‖₁ = ‖Ae_k‖₁ = ‖A_{:,k}‖₁ = Σ_i |a_{ik}| = max_j Σ_i |a_{ij}| = upper bound. □   (8.2.14)
Further, ‖A‖₂² = max ‖Ax‖₂² subject to ‖x‖₂² = 1; that is, ‖A‖₂² = max xᵀAᵀAx subject to xᵀx = 1. This calls for Lagrange multipliers: ∇f = λ∇g.
8.3 Lecture 28: October 30, 2013
The 2-norm

Given the 2-norm ‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂, we maximize f(x) = xᵀAᵀAx subject to g(x) = xᵀx = 1, where f : Rⁿ → R. This needs Lagrange multipliers, ∇f = λ∇g, as for a minimization problem. Recall the product rule

∂(UV)/∂x_j = (∂U/∂x_j) V + U (∂V/∂x_j).   (8.3.1)
Lemma 8.12. If B is symmetric, ∇(xᵀBx) = 2Bx.

Note: ∇(xᵀx) = 2x.

Proof. To prove this lemma,

∂/∂x_j (xᵀBx) = (∂xᵀ/∂x_j) Bx + xᵀB (∂x/∂x_j)
 = e_jᵀBx + xᵀBe_j
 = e_jᵀBx + (xᵀBe_j)ᵀ
 = e_jᵀBx + e_jᵀBᵀx   (B = Bᵀ)
 = 2e_jᵀBx = 2(Bx)_j. □   (8.3.2)
�
Proof. Alternatively, we may consider,

∂/∂x_j (Σ_i x_i (Bx)_i) = ∂/∂x_j (Σ_i Σ_k x_i B_{ik} x_k)
 = Σ_k B_{jk} x_k + Σ_i x_i B_{ij}
 = Σ_k B_{jk} x_k + Σ_k B_{kj} x_k
 = Σ_k B_{jk} x_k + Σ_k B_{jk} x_k   (B symmetric)
 = 2(Bx)_j. □   (8.3.3)
So ∇f = λ∇g gives

2AᵀAx = λ · 2x,  i.e.  AᵀAx = λx,   (8.3.4)

and the solution (λ, x) is an eigenpair of AᵀA. Note, for these x, f(x) = xᵀAᵀAx = xᵀλx = λxᵀx = λ. Thus,

max f = λ_max = max_k λ_k,   (8.3.5)

where the λ_k are the eigenvalues of AᵀA. Note further that AᵀA is symmetric, so the eigenvalues are real; and f(x) = ‖Ax‖₂² ≥ 0, so λ_k ≥ 0.
Example 8.13. Given

A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}   (8.3.6)

and

AᵀA = \begin{pmatrix} 1 & 2 \\ 2 & 8 \end{pmatrix}.   (8.3.7)

Then

det(AᵀA − λI) = \begin{vmatrix} 1−λ & 2 \\ 2 & 8−λ \end{vmatrix} = (1−λ)(8−λ) − 4 = λ² − 9λ + 4.   (8.3.8)

So

λ_{1,2} = (9 ± sqrt(81 − 16))/2 = (9 ± sqrt 65)/2,   (8.3.9)

and

λ_max = (9 + sqrt 65)/2.   (8.3.10)

Therefore:

‖A‖₂ = sqrt(λ_max) = sqrt((9 + sqrt 65)/2) ≈ 2.9208 . . .   (8.3.11)
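A quick hedged check of this computation in Matlab:

A = [1 2; 0 2];
lam = eig(A'*A);          % eigenvalues of A'A: (9 -/+ sqrt(65))/2
n2 = sqrt(max(lam));      % 2-norm from lambda_max, approx 2.9208
disp([n2, norm(A,2)])     % agrees with the built-in 2-norm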
Now, ‖x‖_∞ ≤ ‖x‖₂ ≤ ‖x‖₁ for vectors; this inequality does not hold for matrices. Some properties (where UᵀU = I and VᵀV = I):

• ‖A‖₂ = ‖Aᵀ‖₂

• ‖AᵀA‖₂ = ‖A‖₂²

• ∥\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}∥₂ = max(‖A‖₂, ‖B‖₂)

• ‖UᵀAV‖₂ = ‖A‖₂

• ‖A^{−1}‖₂ = 1/sqrt(λ_min(AᵀA))
UNIT 9

Orthogonalization with Projection and Rotation
9.1 Lecture 28 (cont.)
Inner Product Spaces
An inner product space is a vector space V together with an inner product.
Definition 9.1. Given a vector space V , an inner product is a function f : V ×V → R or Cby f(x,y) = 〈x,y〉 such that
• 〈x, y〉 = \overline{〈y, x〉}

• 〈x, αy〉 = α〈x, y〉 (linearity in the second argument; it follows that 〈αx, y〉 = \overline{α}〈x, y〉)
• 〈x + z,y〉 = 〈x,y〉+ 〈z,y〉
• 〈x,x〉 ≥ 0 for any x ∈ V
• 〈x,x〉 = 0 implies x = 0
Example 9.2. 1. 〈x, y〉 = xᵀy with V = Rⁿ, and 〈x, y〉 = x*y with V = Cⁿ, where x* = \overline{x}ᵀ.

2. 〈x, y〉_A = xᵀAᵀAy with V = Rⁿ, and 〈x, y〉_A = x*A*Ay with V = Cⁿ. This gives us a new norm ‖x‖_A = sqrt(xᵀAᵀAx) = ‖Ax‖₂.

3. 〈f, g〉 = ∫_a^b f(x)g(x) dx, V = C⁰[a, b], and ‖f‖ = sqrt(∫_a^b |f(x)|² dx).

4. 〈f, g〉 = ∫_a^b ω(x)f(x)g(x) dx, where ω(x) ≥ 0.

5. 〈A, B〉 = tr(AᵀB) and ‖A‖ = sqrt(tr(AᵀA)) = ‖A‖_F.
9.2 Lecture 29: November 1, 2013
Inner Product Spaces
Reviewing properties of inner product spaces,
• 〈x,y〉 = 〈y,x〉
• 〈x, αy〉 = α 〈x,y〉
• 〈x + z,y〉 = 〈x,y〉+ 〈z,y〉
• 〈x,x〉 ≥ 0 for any x ∈ V
• 〈x,x〉 = 0 implies x = 0
Now we may define norms ‖x‖ = sqrt(〈x, x〉). Let's say we want to define angles between vectors; by the law of cosines, ‖y − x‖² = ‖x‖² + ‖y‖² − 2‖x‖‖y‖ cos θ. Rearranged,

cos θ = (‖x‖² + ‖y‖² − ‖y − x‖²)/(2‖x‖‖y‖)
 = (〈x, x〉 + 〈y, y〉 − 〈y − x, y − x〉)/(2‖x‖‖y‖)
 = (〈y, x〉 + 〈x, y〉)/(2‖x‖‖y‖)
 = 〈x, y〉/(‖x‖‖y‖),   (9.2.1)

the last step only if 〈x, y〉 ∈ R. More generally 〈y, x〉 + 〈x, y〉 = 〈y, x〉 + \overline{〈y, x〉} = 2 Re〈y, x〉, so taking the real part handles the conjugate that would otherwise obstruct defining the angle.
Definition 9.3. The angle between x,y is given by
cos(θ) =〈x,y〉‖x‖‖y‖
. (9.2.2)
So, for x ⊥ y means 〈x,y〉 = 0.
Note: If the inner product is not a real number, then 〈x,y〉 = 0 means ‖x‖2 + ‖y‖2 =‖y − x‖2, but not vice-versa.
Example 9.4.

x = (1, −2, 3, −1)ᵀ and y = (4, 1, −2, −4)ᵀ.

So x ⊥ y in 〈x, y〉 = xᵀy, but x ⊥̸ y in 〈x, y〉_A = xᵀAᵀAy, where

A = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Definition 9.5. A set {u1, . . . ,un} is orthonormal if ‖uk‖ = 1 for any k and 〈uj,uk〉 = 0for any j 6= k.
Fourier Expansion
Given an orthonormal basis {u₁, . . . , u_n} for V we can write x ∈ V as

x = c₁u₁ + c₂u₂ + · · · + c_n u_n   (9.2.3)

with 〈u_j, x〉 = c_j 〈u_j, u_j〉 = c_j.
Example 9.6. The family {(1/√π) sin(kx)}_{k=1}^{n} is orthonormal with respect to the inner product ∫_{−π}^{π} f(x)g(x) dx. How do we compute the normalizing integrals? Use ∫ sin²(kx) dx = ∫ (1 − cos(2kx))/2 dx. So if f ∈ span{sin(kx)} then f = (1/√π) Σ_{k=1}^{n} c_k sin(kx), with c_k = (1/√π) ∫_{−π}^{π} f(x) sin(kx) dx.
In the homework we will approximate a line on [−π, π] with the sine and cosine Fourier series. This is essentially the best 2-norm approximation from the span of the Fourier modes. The Gibbs phenomenon will be observed: the truncated series overshoots the function. Orthonormal bases of this kind are very useful in partial differential equations applications.
Orthogonalization Process (Gram–Schmidt)

Goal: Given a basis {a₁, . . . , a_n}, find an orthonormal basis {u₁, . . . , u_n} for V. This is the orthogonalization process. Method: find u_k such that span{u₁, . . . , u_k} = span{a₁, . . . , a_k} for k = 1, . . . , n. Now let's show the process.

k = 1:

u₁ = a₁/‖a₁‖

k = 2:

u₂ = (a₂ − 〈u₁, a₂〉u₁) / ‖a₂ − 〈u₁, a₂〉u₁‖
To verify the orthogonality of u₁ and u₂, write ℓ₂ = ‖a₂ − 〈u₁, a₂〉u₁‖:

〈u₁, u₂〉 = 〈u₁, (a₂ − 〈u₁, a₂〉u₁)/ℓ₂〉
 = (1/ℓ₂) 〈u₁, a₂ − 〈u₁, a₂〉u₁〉
 = (1/ℓ₂) [〈u₁, a₂〉 − 〈u₁, a₂〉〈u₁, u₁〉]
 = 0,   (9.2.4)

since 〈u₁, u₁〉 = 1.
k = 3: . . .

In general,

u_k = (a_k − 〈u₁, a_k〉u₁ − 〈u₂, a_k〉u₂ − · · · − 〈u_{k−1}, a_k〉u_{k−1}) / ‖a_k − 〈u₁, a_k〉u₁ − · · · − 〈u_{k−1}, a_k〉u_{k−1}‖.

This is the Gram–Schmidt orthogonalization process. If we want, we can write it as

u_k = (I − U_{k−1}U*_{k−1}) a_k / ‖(I − U_{k−1}U*_{k−1}) a_k‖,   (9.2.5)

where U_{k−1} = (u₁ · · · u_{k−1}).

9.3 Lecture 30: November 4, 2013
Gram–Schmidt Orthogonalization

Given a basis {a₁, . . . , a_n}, find an orthonormal basis {u₁, . . . , u_n} that spans the same space. Algorithm:

u₁ = a₁/‖a₁‖,   (9.3.1a)
u₂ = (a₂ − (u₁ᵀa₂)u₁)/ℓ₂,   (9.3.1b)

where the subtracted term is a projection:

〈u₁, a₂〉u₁ = (u₁ᵀa₂)u₁ = u₁u₁ᵀ a₂ = P_∥ a₂.   (9.3.2)
From

u₂ = (a₂ − (u₁ᵀa₂)u₁)/‖a₂ − (u₁ᵀa₂)u₁‖ = (I − u₁u₁ᵀ)a₂ / ‖(I − u₁u₁ᵀ)a₂‖,   (9.3.3)

i.e. u₂ is the normalized projection P_⊥a₂.
Example 9.7. Given the vectors

a₁ = (0, 3, 4)ᵀ, a₂ = (−20, 27, 11)ᵀ, and a₃ = (−14, −4, −2)ᵀ,

we can find the orthonormal vectors. First,

u₁ = (1/5)(0, 3, 4)ᵀ.   (9.3.4)

Then, since 〈u₁, a₂〉 = (1/5)(0 · (−20) + 3 · 27 + 4 · 11) = 25,

v₂ = a₂ − 〈u₁, a₂〉u₁ = (−20, 27, 11)ᵀ − 25 · (1/5)(0, 3, 4)ᵀ = (−20, 12, −9)ᵀ,   (9.3.5)

with ‖v₂‖ = 25. Proceeding the same way with a₃, we obtain

u₁ = (1/5)(0, 3, 4)ᵀ,   (9.3.6a)
u₂ = (1/25)(−20, 12, −9)ᵀ,   (9.3.6b)
u₃ = (1/25)(−15, −16, 12)ᵀ.   (9.3.6c)
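A minimal Matlab sketch of classical Gram–Schmidt applied to this example (hedged: classical GS is shown for clarity, even though, as noted later, it is not the numerically stable choice):

A = [0 -20 -14; 3 27 -4; 4 11 -2];    % columns a1, a2, a3
[m,n] = size(A); U = zeros(m,n);
for k = 1:n
    v = A(:,k);
    for j = 1:k-1
        v = v - (U(:,j)'*A(:,k))*U(:,j);  % subtract projections onto earlier u_j
    end
    U(:,k) = v/norm(v);                   % normalize
end
disp(25*U)   % columns: (0,15,20), (-20,12,-9), (-15,-16,12)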
Now rewriting our system,

u₁ = a₁/ℓ₁,
u₂ = (a₂ − r₁₂u₁)/ℓ₂,
u₃ = (a₃ − r₁₃u₁ − r₂₃u₂)/ℓ₃,
 ⋮
u_n = (a_n − r_{1n}u₁ − r_{2n}u₂ − · · · − r_{n−1,n}u_{n−1})/ℓ_n,   (9.3.7)

where r_{ij} = 〈u_i, a_j〉. Equivalently, in vector form,

a₁ = ℓ₁u₁,
a₂ = r₁₂u₁ + ℓ₂u₂,
a₃ = r₁₃u₁ + r₂₃u₂ + ℓ₃u₃,
 ⋮
a_n = r_{1n}u₁ + r_{2n}u₂ + · · · + r_{n−1,n}u_{n−1} + ℓ_n u_n.   (9.3.8)

We can put this in matrix form if A is of full rank (which requires m ≥ n, since there can be at most m linearly independent vectors a_i in Rᵐ). With A = QR,

(a₁ a₂ · · · a_n)_{m×n} = (u₁ u₂ · · · u_n)_{m×n} \begin{pmatrix} ℓ₁ & r₁₂ & r₁₃ & \cdots & r_{1n} \\ 0 & ℓ₂ & r₂₃ & \cdots & r_{2n} \\ 0 & 0 & ℓ₃ & \ddots & r_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & ℓ_n \end{pmatrix}_{n×n},   (9.3.9)

where r_{ii} = ℓ_i > 0, so R is invertible. This uniquely determines the Fourier coefficients of the Fourier expansion of each a_j in the basis {u₁, . . . , u_n}.
Thus, every matrix A of full rank has a unique decomposition, known as a QR factorization, A_{m×n} = Q_{m×n}R_{n×n}, where R is invertible. What do we know about QᵀQ? (QᵀQ)_{ij} = u_iᵀu_j, which is zero for i ≠ j and one for i = j. So QᵀQ = I_{n×n}: the columns of Q are orthonormal.
Decompositions of A:
• Am×n = Qm×nRn×n, where QᵀQ = I and R is invertible.
• A = LU if |Ak| 6= 0.
• PA = LU always exists.
Now what about QQᵀ? It will be an m×m matrix, but otherwise we know little about it.
Example 9.8. Returning to our example,

\begin{pmatrix} 0 & −20 & −14 \\ 3 & 27 & −4 \\ 4 & 11 & −2 \end{pmatrix} = \begin{pmatrix} 0 & −20/25 & −15/25 \\ 3/5 & 12/25 & −16/25 \\ 4/5 & −9/25 & 12/25 \end{pmatrix} \begin{pmatrix} 5 & 25 & r₁₃ \\ 0 & ℓ₂ & r₂₃ \\ 0 & 0 & ℓ₃ \end{pmatrix}   (9.3.10)

In this case Q has three linearly independent columns and three linearly independent rows, so Qᵀ also has orthonormal columns, and, interestingly, (Qᵀ)ᵀQᵀ = QQᵀ = I. This Q is an orthogonal matrix: it is square, invertible, and has orthonormal columns. In general this is not the case: if m > n, Q is not square and QQᵀ is not necessarily the identity.
Use A = QR:
Example 9.9. Assume A_{n×n} invertible; solve Ax = b. Rewrite

QRx = b,   (9.3.11a)
QᵀQRx = Qᵀb,   (9.3.11b)
Rx = Qᵀb.   (9.3.11c)

This triangular system is quick to solve (once Q and R are known).
Example 9.10. Assume A_{m×n} of full rank with m > n. Then Ax = b is an overdetermined system and the least squares solution satisfies

AᵀAx = Aᵀb,   (9.3.12a)
RᵀQᵀQRx = RᵀQᵀb,   (9.3.12b)
RᵀRx = RᵀQᵀb,   (9.3.12c)
Rx = (Rᵀ)^{−1}RᵀQᵀb,   (9.3.12d)
Rx = Qᵀb.   (9.3.12e)

Go through this derivation and the solutions manual. Then we will see how the SVD can improve things later.
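A hedged Matlab sketch of both uses of the QR factorization (the matrices are illustrative placeholders, not from the lecture):

% Square system: solve Ax = b via QR
A = [4 1; 1 3]; b = [1; 2];
[Q,R] = qr(A);
x = R\(Q'*b);            % same answer as A\b

% Overdetermined least squares via reduced QR
A = [1 0; 1 1; 1 2]; b = [1; 2; 2];
[Q,R] = qr(A,0);         % reduced QR: Q is 3x2, R is 2x2
xls = R\(Q'*b);          % least squares solution via Rx = Q'b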
9.4 Lecture 31: November 6, 2013
In the homework, the reduced QR factorization referred to is the one derived above: we can always write A_{m×n} = Q_{m×n}R_{n×n}, where QᵀQ = I and R_{n×n} is upper triangular. This factorization is unique, but we may also write

QR = (q₁ · · · q_n) \begin{pmatrix} ∗ & ∗ & \cdots & ∗ \\ 0 & ∗ & \ddots & ∗ \\ \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & ∗ \end{pmatrix}   (9.4.1)
since {q₁, . . . , q_n} is an orthonormal basis for R(A) ⊂ Rᵐ. Now, in the full form,

QR = (q₁ · · · q_n q_{n+1} · · · q_m)_{m×m} \begin{pmatrix} ∗ & ∗ & \cdots & ∗ \\ 0 & ∗ & \cdots & ∗ \\ \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & ∗ \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}_{m×n}.   (9.4.2)

In this case, the appended columns q_{n+1}, . . . , q_m, and hence the full QR, are not unique.
Unitary (orthogonal) matrices
The unitary refers to the complex case and the orthogonal refers to the real.
Definition 9.11. A unitary matrix is Q ∈ C^{n×n} such that Q*Q = I. This means Q has n orthonormal columns; additionally, since Q is square, it has n orthonormal rows. The real case is an orthogonal matrix: Q ∈ R^{n×n} with QᵀQ = QQᵀ = I_{n×n}.
Properties
Some properties for a unitary Q:
• Q∗Q = QQ∗ = In×n
• Q−1 = Q∗
• columns are orthonormal
• rows are orthonormal
• (Qx)∗Qy = x∗Q∗Qy = x∗y for any x,y.
Note: ‖Qx‖ = ‖x‖, so Q is an isometry. Also, if U, V are unitary, then UV is unitary, since

(UV)*(UV) = V*U*UV = V*V = I = (UV)(UV)*.   (9.4.3)
Example 9.12. Q in the full QR factorization of any A. In Matlab, [Q,R] = qr(A) gives the full QR, and [Q,R] = qr(A,0) gives the reduced QR.
Now, to compute the QR factorization, the Gram–Schmidt algorithm is not numerically stable: small changes in the input matrix values can cause large changes in the result. One alternative is the modified Gram–Schmidt algorithm, which improves the stability properties; we will not cover it here, but it is discussed in later courses. A better algorithm is to obtain the QR by premultiplying by orthogonal matrices until the matrix is triangular,

Q_n · · · Q₁ A = R,  with Q* = Q_n · · · Q₁,   (9.4.4)

so that A = QR. This is better because it uses only orthogonal transformations, not projections (which are not orthogonal). Rotations and reflections are the orthogonal matrices used to introduce zeros. As an example,
Rotation
Example 9.13. Rotation in the xy plane about the origin is given by the matrix

P_θ = \begin{pmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{pmatrix}.

Now P_θ^{−1} = P_{−θ} = \begin{pmatrix} \cos θ & \sin θ \\ −\sin θ & \cos θ \end{pmatrix} = P_θᵀ. This again shows that it is orthogonal; in particular the columns are orthonormal. These are rotations in the plane.
Example 9.14. 3D Rotation. Rotation in three dimensions about the z-axis is very similar:

P = \begin{pmatrix} \cos θ & −\sin θ & 0 \\ \sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{pmatrix};   (9.4.5)

this rotates in the xy plane.
We can further rotate in any (i, j) plane in Rⁿ: P equals the identity except for P_{ii} = \cos θ, P_{ij} = −\sin θ, P_{ji} = \sin θ, P_{jj} = \cos θ,   (9.4.6)

so that

Px = (x₁, . . . , x_{i−1}, \cos θ x_i − \sin θ x_j, x_{i+1}, . . . , x_{j−1}, \sin θ x_i + \cos θ x_j, x_{j+1}, . . . , x_n)ᵀ.   (9.4.7)
This is called a Givens rotation. We can choose θ such that (P_θ x)_j = 0, so a single rotation zeroes one entry:

P_θ \begin{pmatrix} ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \end{pmatrix} = \begin{pmatrix} ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ 0 & ∗ & ∗ & ∗ \end{pmatrix}.   (9.4.8)

So the QR factorization by Givens rotations is

P_{θ_N} · · · P_{θ₂}P_{θ₁} A = R,  with Q* = P_{θ_N} · · · P_{θ₁}.   (9.4.9)
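A small hedged sketch of one Givens step in Matlab, choosing the cosine and sine directly rather than θ (a common convention; the sign placement differs from (9.4.6) only by transposition):

x = [3; 4];                  % want to zero x(2)
r = hypot(x(1), x(2));       % = 5
c = x(1)/r;  s = x(2)/r;
G = [c s; -s c];             % Givens rotation acting on rows 1 and 2
disp(G*x)                    % = [5; 0]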
Note, projections are not orthogonal matrices: for a projector P we can check that PP* ≠ I. Indeed P annihilates some nonzero vector (e.g. P_⊥(u) = 0), so P has a nontrivial null space and is not invertible, whereas orthogonal matrices are invertible.
Reflection
Example 9.15. Suppose we have vectors u and x, where ‖u‖ = 1. We want to reflect x across the plane orthogonal to u, u⊥ = {v : vᵀu = 0}; call this operation Rx. This operation is also orthogonal. First, the orthogonal projection onto u⊥ subtracts the component 〈u, x〉u:

Px = x − 〈u, x〉u = (I − uu*)x,   (9.4.10a,b)
Rx = (I − 2uu*)x,   (9.4.10c)

where P is the projection onto the subspace and R is the reflection across the subspace. Now R* = I − 2uu* = R and R² = I. This implies that R^{−1} = R* and R is orthogonal.
9.5 Homework Assignment 6: Due Monday, November 11, 2013
1. Let A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. Find ‖A‖_p for p = 1, 2, ∞, F.
2. Show that ‖A‖_∞ = max_i Σ_j |a_{ij}|. (Hint: make sure you understand how the analogous formula for ‖A‖₁ was derived in class.)

3. (a) Given a vector norm ‖x‖, prove that the formula ‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖ defines a matrix norm. (This is called the induced matrix norm.)

(b) Show that for any induced matrix norm, ‖Ax‖ ≤ ‖A‖‖x‖.

(c) Prove that any induced matrix norm also satisfies ‖AB‖ ≤ ‖A‖‖B‖.
4. Consider the formula ‖A‖ = max_{i,j} |a_{ij}|.
(a) Show that it defines a matrix norm.
(b) Show that it is not induced by a vector norm.
5. Meyer, Exercise 5.2.6
Establish the following properties of the matrix 2-norm.
(a) ‖A‖₂ = max_{‖x‖₂=1, ‖y‖₂=1} |y*Ax|,

(b) ‖A‖₂ = ‖A*‖₂,

(c) ‖A*A‖₂ = ‖A‖₂²,

(d) ∥\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}∥₂ = max{‖A‖₂, ‖B‖₂} (take A, B to be real),

(e) ‖U*AV‖₂ = ‖A‖₂ when UU* = I and V*V = I.
6. Show that ‖A^{−1}‖₂ = 1/sqrt(λ_min), where λ_min is the smallest eigenvalue of AᵀA.
7. Show that 〈A,B〉 = tr(A∗B) defines an inner product.
8. Meyer, Exercise 5.3.4
For a real inner-product space with ‖·‖² = 〈·, ·〉, derive the inequality 〈x, y〉 ≤ (‖x‖² + ‖y‖²)/2. Hint: Consider x − y.
9. Meyer, Exercise 5.3.5
For n× n matrices A and B, explain why each of the following inequalities is valid.
(a) |tr(B)|² ≤ n tr(B*B).

(b) tr(B²) ≤ tr(BᵀB) for real matrices.

(c) tr(AᵀB) ≤ (tr(AᵀA) + tr(BᵀB))/2 for real matrices.
10. Given

A = \begin{pmatrix} 1 & 0 & −1 \\ 1 & 2 & 1 \\ 1 & 1 & −3 \\ 0 & 1 & 1 \end{pmatrix} and b = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}.
.(a) Find an orthonormal basis for R(A), using the standard inner product.
(b) Find the (reduced) QR decomposition of A.
(c) For the matrix Q in (b), compute QᵀQ and QQᵀ.
(d) Find the least squares solution of Ax = b, using your results above.
(e) Determine the Fourier expansion of b with respect to the basis you found in (a).
11. Explain why the (reduced) QR factorization of a matrix A of full rank is unique.
12. Meyer, Exercise 5.5.11
Let V be the inner-product space of real-valued continuous functions defined on theinterval [−1, 1], where the inner product is defined by
〈f, g〉 = ∫_{−1}^{1} f(x)g(x) dx,
and let S be the subspace of V that is spanned by the three linearly independentpolynomials q0 = 1, q1 = x, q2 = x2.
(a) Use the Gram–Schmidt process to determine an orthonormal set of polynomials{p0, p1, p2} that spans S. These polynomials are the first three normalized Legendrepolynomials.
(b) Verify that p_n satisfies Legendre's differential equation
(1− x2)y′′ − 2xy′ + n(n+ 1)y = 0
for n = 0, 1, 2. This equation and its solutions are of considerable importance inapplied mathematics.
9.6 Lecture 32: November 8, 2013
From last time:

Elementary orthogonal projectors

Let u with ‖u‖ = 1. The projection of a vector x onto the hyperplane orthogonal to u is P_⊥x = x − 〈u, x〉u; that is, P_∥ = uu* and P_⊥ = I − uu*. These projectors are not orthogonal matrices, because an orthogonal matrix satisfies Q* = Q^{−1}, i.e. Q*Q = QQ* = I. Now

P_⊥* = I − (u*)*u* = P_⊥,   (9.6.1)

and this further gives

P*P = P² = P ≠ I.   (9.6.2)

This property shows that once we project, projecting a second time does not change the result. Also N(P) ≠ {0}, so projectors are not invertible. The null space of P_∥ is u⊥, N(P_∥) = u⊥; similarly N(P_⊥) = span(u).
Elementary reflection

Now Rx = x − 2〈u, x〉u, and in this case R is orthogonal: R* = R and R*R = RR* = I. Indeed,

(I − 2uu*)(I − 2uu*) = I − 2uu* − 2uu* + 4u(u*u)u* = I − 4uu* + 4uu* = I.   (9.6.3)

Now use reflectors to compute A = QR. Choosing u = (x − ‖x‖e₁)/‖x − ‖x‖e₁‖, where x is the first column of A, gives R_u x = ‖x‖e₁, so

R_{u₁}A = \begin{pmatrix} ∗ & ∗ & ∗ \\ 0 & ∗ & ∗ \\ 0 & ∗ & ∗ \\ 0 & ∗ & ∗ \end{pmatrix}.   (9.6.4)

Doing successive reflections on the remaining subcolumns,

R_{u_N} · · · R_{u₂}R_{u₁} A = R,  with Q* = R_{u_N} · · · R_{u₁}.   (9.6.5)

This gives us the Householder method.
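A hedged Matlab sketch of one Householder step (the sign choice sign(x(1)) is a standard stabilization, an assumption beyond what the lecture states):

x = [3; 4; 0];                    % column to be zeroed below entry 1
e1 = [1; 0; 0];
u = x + sign(x(1))*norm(x)*e1;    % u proportional to x -/+ ||x|| e1
u = u/norm(u);
R1 = eye(3) - 2*(u*u');           % elementary reflector
disp(R1*x)                        % = (-5, 0, 0)'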
Complementary Subspaces of V

Definition 9.16. If V = X + Y, where X, Y are subspaces such that X ∩ Y = {0}, then X and Y are called complementary subspaces and V = X ⊕ Y is the direct sum of X and Y.

Given the general picture, how do we define the angle between two subspaces? Note: If V = X ⊕ Y then any z ∈ V can be written uniquely as z = x + y, for x ∈ X and y ∈ Y. Further dim(V) = dim(X) + dim(Y) and B_V = B_X ∪ B_Y.
Proof. If z = x₁ + y₁ = x₂ + y₂ then x₁ − x₂ = y₂ − y₁ ∈ X ∩ Y = {0}, so x₁ = x₂ and y₁ = y₂. □
Example 9.17. For A ∈ R^{m×n} we have Rᵐ = R(A) ⊕ N(Aᵀ).
Projectors
Definition 9.18. We define general projectors: The projector P onto X along Y is thelinear operator such that P(z) = P(x + y) = x.
Note: If P projects onto X along Y then P² = P, because P²(x + y) = P(x) = P(x + 0) = x = P(z). The null space is N(P) = Y, since P(z) = P(x + y) = x = 0 exactly when x = 0. Further, R(P) = X. Also, R(P) ⊕ N(P) = Rⁿ, as we showed in Homework 5.
Ultimately, we want to find the Jordan canonical form of our matrices. In general R(A) + N(A) ≠ Rⁿ. For A ∈ R^{m×n} with m ≠ n this is obvious because the two spaces live in different dimensions, so the question only makes sense for A ∈ R^{n×n}. But even if A is square: let y ∈ N(A) ∩ R(A); then Ay = 0 and y = Az for some z, so A(Az) = A²z = 0. Hence if A² has a larger null space than A, then N(A) and R(A) have nontrivial intersection.
Example 9.19. Obviously such a matrix cannot be invertible; for example

A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} and A² = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

This is an example of a nilpotent matrix. For projectors, by contrast, range and null space always split the space, as the next theorem makes precise.
Theorem 9.20. P is a projector if and only if P2 = P. These are also known as idempotentmatrices.
9.7 Lecture 33: November 11, 2013
From last time:
Definition 9.21. P : V → V is a projector if there are subspaces X, Y with V = X ⊕ Y such that P(z) = x for any z = x + y ∈ V.

Note: R(P) = X and N(P) = Y.
Projectors
Theorem 9.22. P is a projector if and only if P2 = P. These are also known as idempotentmatrices.
Proof. Given the vector space V and an operator with P = P², set X = R(P) and Y = N(P). For x ∈ R(P), write x = Px₀ for some x₀; then Px = P²x₀ = Px₀ = x. Hence for z = x + y with x ∈ R(P) and y ∈ N(P),

P(x + y) = Px + Py = x.   (9.7.1)

Going the other way, every z splits as

z = Pz + (z − Pz),  with Pz ∈ R(P) and z − Pz ∈ N(P) (since P(z − Pz) = Pz − P²z = 0),

so V = R(P) ⊕ N(P).   (9.7.2)
�
Representation of a projector
We discuss the matrix representation of P. Given {m₁, . . . , m_r} a basis for R(P) = X and {n₁, . . . , n_{n−r}} a basis for N(P) = Y, we have Pm_i = m_i and Pn_i = 0. Let B = [M | N]. Then

PB = P[M | N] = [M | 0],   (9.7.3)

so

[P]_S = P = [M | 0]B^{−1} = [M | N] \begin{pmatrix} I_{r×r} & 0 \\ 0 & 0 \end{pmatrix} B^{−1} = B \begin{pmatrix} I_{r×r} & 0 \\ 0 & 0 \end{pmatrix} B^{−1},   (9.7.4)

which is the change-of-basis formula [P]_S = [I]_{BS}[P]_B[I]_{BS}^{−1}.
Definition 9.23. For any subspace M ⊂ V, M⊥ = {v ∈ V : 〈v, u〉 = 0 for all u ∈ M}.
Theorem 9.24. For any subspace M⊂ V, V =M⊕M⊥
Proof. Given a basis {b₁, . . . , b_m} of M, extend it by an orthogonal set {b_{m+1}, . . . , b_n} so that {b₁, . . . , b_m} is a basis for M, {b_{m+1}, . . . , b_n} is a basis for M⊥, and their union is a basis for V. □
Example 9.25. Rⁿ = R(A) ⊕ N(Aᵀ), where R(A) ⊥ N(Aᵀ). The orthogonal projector onto M is

P_M = [M | N] \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} [M | N]^{−1},   (9.7.5)

where N spans M⊥, so that M*N = 0 and N*M = 0, with

M = (m₁ · · · m_m)_{n×m} and N = (n_{m+1} · · · n_n)_{n×(n−m)}.   (9.7.6)

Note:

\begin{pmatrix} (M*M)^{−1}M* \\ (N*N)^{−1}N* \end{pmatrix} (M N) = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix},   (9.7.7)

using M*N = 0 and N*M = 0, so the left factor is exactly [M | N]^{−1}.
and

P_M = [M | 0] \begin{pmatrix} (M*M)^{−1}M* \\ (N*N)^{−1}N* \end{pmatrix} = M(M*M)^{−1}M*.   (9.7.8)

Thus, given any basis {m₁, . . . , m_m} for the subspace M, the orthogonal projector is

P_M = M(M*M)^{−1}M*.   (9.7.9)

But how does the formula change if the basis is orthonormal? If {m₁, . . . , m_m} are orthonormal then M*M = I and

P_M = MM*.   (9.7.10)
Example 9.26. Elementary orthogonal projectors:

P_∥ = uu*   (9.7.11)

and

P_⊥ = I − uu*.   (9.7.12)

Theorem 9.27.

‖x − P_M x‖₂² = min_{y∈M} ‖x − y‖₂².   (9.7.13)
(we will prove this as an exercise)
Note: A(AᵀA)^{−1}Aᵀ is the projector onto the range of A, P_{R(A)}, where we assume that A has full rank. The normal equations to solve Ax = b are

AᵀAx = Aᵀb,   (9.7.14)

so

x = (AᵀA)^{−1}Aᵀ b,   (9.7.15)

where (AᵀA)^{−1}Aᵀ is the pseudoinverse. So,

Ax = A(AᵀA)^{−1}Aᵀb = P_{R(A)}b.   (9.7.16)
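A hedged Matlab sketch of the orthogonal projector formula and its least-squares meaning (the matrix is an illustrative placeholder):

A = [1 0; 1 1; 1 2];  b = [1; 2; 2];
P = A/(A'*A)*A';           % projector onto R(A): A*inv(A'*A)*A'
disp(norm(P*P - P))        % idempotent: ~0
disp(norm(P' - P))         % symmetric: ~0
disp(norm(P*b - A*(A\b)))  % P*b equals A times the least squares solution: ~0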
9.8 Lecture 34: November 13, 2013
Projectors

We discussed the projector P onto X along Y: it is idempotent, P² = P, with R(P) = X and N(P) = Y, and

[P]_S = [M | N] \begin{pmatrix} I_{r×r} & 0 \\ 0 & 0 \end{pmatrix} [M | N]^{−1}.   (9.8.1)
The orthogonal projector onto M = R(M), where M = [m₁ · · · m_m] collects a basis of M, is

P = M(M*M)^{−1}M*.   (9.8.2)

The normal equations for Ax = b, with A of full rank, give

Ax = P_{R(A)}b.   (9.8.3)

If a projector P is orthogonal, then P* = P.
Proof. If P is an orthogonal projector constructed as above,

P = M(M*M)^{−1}M*,  so  P* = M((M*M)^{−1})*M* = M(M*M)^{−1}M* = P.   (9.8.4)

Conversely, suppose P = P² and P = P*. We want to show that N(P) ⊥ R(P) in the standard inner product. Let x ∈ R(P) and y ∈ N(P), and consider

y*x = y*Px = (P*y)*x = (Py)*x = 0*x = 0.   (9.8.5)

If the {m_i} are orthonormal, P_M = MM*. □
Example 9.28.

P_∥ = uu*,   (9.8.6a)
P_⊥ = I − uu*,   (9.8.6b)

corresponding to the splitting V = X ⊕ Y with X = span(u), Y = u⊥.
Decompositions of Rⁿ

Given A_{n×n}, we know R(A) ⊕ N(Aᵀ) = Rⁿ and R(Aᵀ) ⊕ N(A) = Rⁿ, with R(A)⊥ = N(Aᵀ). Let B_U = {u₁, . . . , u_r, u_{r+1}, . . . , u_n} be orthonormal, the first r a basis for R(A) and the rest a basis for N(Aᵀ); similarly let B_V = {v₁, . . . , v_r, v_{r+1}, . . . , v_n} be orthonormal, the first r a basis for R(Aᵀ) and the rest a basis for N(A). So

UᵀAV = (U_{R(A)} U_{N(Aᵀ)})ᵀ A (V_{R(Aᵀ)} V_{N(A)})
 = \begin{pmatrix} Uᵀ_{R(A)} \\ Uᵀ_{N(Aᵀ)} \end{pmatrix} (AV_{R(Aᵀ)} AV_{N(A)})
 = \begin{pmatrix} Uᵀ_{R(A)}AV_{R(Aᵀ)} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} C_{r×r} & 0 \\ 0 & 0 \end{pmatrix},   (9.8.7)

since AV_{N(A)} = 0 and

(Uᵀ_{N(Aᵀ)}A)ᵀ = AᵀU_{N(Aᵀ)} = 0.   (9.8.8)
Range-Nullspace decomposition of A_{n×n}

Theorem 9.29. Rⁿ = R(A^k) ⊕ N(A^k) for some k. This is not necessarily an orthogonal decomposition. The smallest such k is called the index of A.

Proof. First, note that R(A^{k+1}) ⊆ R(A^k) for any k: if y ∈ R(A^{k+1}), then y = A^{k+1}z for some z, so y = A^k(Az). Second, the decreasing chain R(A) ⊇ R(A²) ⊇ R(A³) ⊇ · · · must reach equality for some k: R(A^k) = R(A^{k+1}) = R(A^{k+2}) = · · ·. □

to be continued. . .
9.9 Homework Assignment 7: Due Friday, November 22, 2013
You may use Matlab to compute matrix products, or to reduce a matrix to Row EchelonForm.
1. (a) Let A ∈ Rm×n. Prove R(A) and N(Aᵀ) are orthogonal complements of Rm.
(b) Verify this fact for A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 1 \\ 1 & 2 & 0 \end{pmatrix}.
2. Prove: If X, Y are subspaces of V such that V = X ⊕ Y, then for any z ∈ V there exist a unique x ∈ X and y ∈ Y such that z = x + y.
3. Prove: If X ,Y are subspaces of V such that V = X+Y and dim(X )+dim(Y) = dim(V)then X ∩ Y = {0}.
4. Textbook 5.11.3:
Find a basis for the orthogonal complement of M = span{(1, 2, 0, 3)ᵀ, (2, 4, 1, 6)ᵀ}.
5. Let P be a projector. Let P′ = I−P.
(a) Show that P′ = I−P is also a projector. It is called the complementary projectorof P.
(b) Any projector projects a point z ∈ V onto X along Y , where X ⊕ Y = V , byP(z) = P(x + y) = x. What are the X and Y for P and I−P, respectively?
6. Textbook 5.9.1:
Let X and Y be subspaces of R3 whose respective bases are
B_X = {(1, 1, 1)ᵀ, (1, 2, 2)ᵀ} and B_Y = {(1, 2, 3)ᵀ}
(a) Explain why X and Y are complementary subspaces of R3.
(b) Determine the projector P onto X along Y as well as the complementary projectorQ onto Y along X .
(c) Determine the projection of v = (2, −1, 1)ᵀ onto Y along X.
(d) Verify that P and Q are both idempotent.
(e) Verify that R(P) = X = N(Q) and N(P) = Y = R(Q).
7. (a) Find the orthogonal projection of b = (4, 8)ᵀ onto M = span {u}, where u =(3, 1)ᵀ.
(b) Find the orthogonal projection of b onto u⊥, for b, u given in (a).
(c) Find the orthogonal projection of b = (5, 2, 5, 3)ᵀ onto
M = span{
(3/5, 0, 4/5, 0)ᵀ, (0, 0, 0, 1)
ᵀ, (4/5, 0, 3/5, 0)
ᵀ}.
(Note: the given columns are orthonormal.)
(d) Find the orthogonal projection of b = (1, 1, 1)ᵀ onto the range of

A = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ 1 & 0 \end{pmatrix}
8. (a) Show that ‖P‖2 ≥ 1 for every projector P 6= 0. When is ‖P‖2 = 1?
(b) Show that ‖I−P‖2 = ‖P‖2 for all projectors P 6= 0, I.
9. (a) Show that the eigenvalues of a unitary matrix satisfy |λ| = 1. Show by a counterexample that the reverse is not true.

(b) Show that the eigenvalues of a projector are either 0 or 1. Show by a counterexample that the reverse is not true.
10. Let u be a unit vector. The elementary reflector about u⊥ is defined to be R = I−2uu∗.
(a) Prove that all elementary reflectors are involutory (R² = I), hermitian, and unitary.

(b) Prove that if Rx = μe_i, then μ = ±‖x‖₂, and that R_{:i} = Re_i = ±x.

(c) Find the elementary reflector that maps x = (1/3)(1, −2, −2)ᵀ onto the x-axis.

(d) Verify by direct computation that your reflector in (c) is symmetric, orthogonal, and involutory.

(e) Extend the vector x in (c) to an orthonormal basis for R³. (Hint: what do you know about the columns of R from parts (a, b) above?)
11. Textbook 5.6.17:
Perform the following sequence of rotations in R³ beginning with v₀ = (1, 1, −1)ᵀ:

1. Rotate v₀ counterclockwise 45° around the x-axis to produce v₁.

2. Rotate v₁ clockwise 90° around the y-axis to produce v₂.

3. Rotate v₂ counterclockwise 30° around the z-axis to produce v₃.

Determine the coordinates of v₃ as well as an orthogonal matrix Q such that Qv₀ = v₃.
12. (a) Find the index of A = \begin{pmatrix} −2 & 0 & −4 \\ 4 & 2 & 4 \\ 3 & 2 & 2 \end{pmatrix}. Find its core-nilpotent decomposition.
(b) A matrix is said to be nilpotent if Ak = 0 for some k. Show that the index ofa nilpotent matrix is the smallest k for which Ak = 0. Find its core-nilpotentdecomposition.
(c) Find the index of a projector that is not the identity. Find its core-nilpotentdecomposition.
(d) What is the index of the identity?
9.10 Lecture 35: November 15, 2013

Range-Nullspace decomposition of A_{n×n}

Theorem 9.30. For any A_{n×n} and some k, Rⁿ = R(A^k) ⊕ N(A^k). The smallest such k is called the index of A.

Example 9.31. A nilpotent matrix has some k such that N^k = 0, R(N^k) = {0}, and N(N^k) = Rⁿ.

Proof. First, note that R(A^{k+1}) ⊆ R(A^k) for any k: if y ∈ R(A^{k+1}), then y = A^{k+1}z for some z, so y = A^k(Az). Second, the decreasing chain R(A) ⊇ R(A²) ⊇ R(A³) ⊇ · · · must reach equality for some k, since the dimensions decrease strictly while the inclusions are proper. Third, once equality is achieved, it is maintained through the rest of the chain:

R(A^{k+2}) = R(A^{k+1}A) = A R(A^{k+1}) = A R(A^k) = R(A^{k+1}) = R(A^k).   (9.10.1)

Fourth, N(A⁰) ⊆ N(A) ⊆ N(A²) ⊆ · · · ⊆ N(A^k) = N(A^{k+1}) = N(A^{k+2}) = · · ·. Why does the nullspace chain stabilize at the same spot as the column space chain? Because dim N(A^k) = n − dim R(A^k), so once the column space dimensions are constant, the nullspace dimensions are constant too. Fifth, R(A^k) ∩ N(A^k) = {0}: let y ∈ R(A^k) ∩ N(A^k); then y = A^k x for some x and A^k y = 0, so A^{2k}x = 0 and x ∈ N(A^{2k}) = N(A^k), hence y = A^k x = 0. Sixth, R(A^k) + N(A^k) = Rⁿ, since the dimensions add up to n and the intersection is trivial. □
Now, how can we factor the matrix?

Corresponding factorization of A

Let {x₁, . . . , x_r} be a basis for R(A^k) and {y₁, . . . , y_{n−r}} be a basis for N(A^k). Then S = [x₁, . . . , x_r, y₁, . . . , y_{n−r}], and we note that X = span{x₁, . . . , x_r} and Y = span{y₁, . . . , y_{n−r}} are both invariant subspaces of A. So

S^{−1}AS = \begin{pmatrix} C_{r×r} & 0 \\ 0 & N_{(n−r)×(n−r)} \end{pmatrix}.   (9.10.2)

Note S^{−1}A^kS = (S^{−1}AS)^k, because the inner factors SS^{−1} cancel in the exponentiation. Thus

S^{−1}A^kS = \begin{pmatrix} C^k & 0 \\ 0 & N^k \end{pmatrix},   (9.10.3)

while A^k[X Y] = [A^kX 0], since Y is a basis of N(A^k). Comparing blocks forces N^k = 0, so N is nilpotent, and C is invertible. So we have a core-nilpotent factorization of A, a similarity factorization which always exists. Compare the decomposition, for any A ∈ R^{n×n}, Rⁿ = R(A) ⊕ N(Aᵀ) = R(Aᵀ) ⊕ N(A), with corresponding factorization

UᵀAV = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix}.   (9.10.4)
UNIT 10
Singular Value Decomposition
10.1 Lecture 35 (cont.)
Singular Value Decomposition

The singular value decomposition finds orthogonal matrices U and V that diagonalize any A, built up as products of simple orthogonal factors:

Uᵀ_m · · · Uᵀ₂Uᵀ₁ A V₁V₂ · · · V_m = \begin{pmatrix} σ₁ & & & & \\ & \ddots & & & \\ & & σ_r & & \\ & & & 0 & \\ & & & & \ddots \end{pmatrix}.   (10.1.1)
Theorem 10.1. For any A_{m×n} there exist orthogonal U and V such that

A_{m×n} = UDVᵀ = [U]_{m×m} \begin{pmatrix} σ₁ & & & 0 \\ & \ddots & & \\ & & σ_r & \\ 0 & & & 0 \end{pmatrix}_{m×n} [Vᵀ]_{n×n},   (10.1.2)

where the σ_i are real and greater than 0, with σ₁ ≥ σ₂ ≥ · · · ≥ σ_r and r = rank(A).

Definition 10.2. The σ_i are the singular values of A.
Note:

1. The σ_i are uniquely determined, but U, V are not unique.

2. rank(A) = rank(D).

3. ‖A‖₂ = ‖D‖₂ = σ₁, using ∥\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}∥₂ = max(‖A‖₂, ‖B‖₂).

4. If A is invertible,

A = U diag(σ₁, . . . , σ_n) Vᵀ,  A^{−1} = V diag(1/σ₁, . . . , 1/σ_n) Uᵀ,   (10.1.3)

and the singular values of A^{−1}, in decreasing order, are 1/σ_n ≥ · · · ≥ 1/σ₁.

Now κ(A) = ‖A‖ · ‖A^{−1}‖ = σ₁/σ_n, which quantifies how close A is to singular.
Example 10.3. Prove ‖I − P‖₂ = ‖P‖₂ for a projector P. What is the norm of P and of I − P? From an illustration we can use tangents to the unit ball; one then needs to show that the maximal stretchings ‖Pω‖ and ‖(I − P)ω‖ agree.
10.2 Lecture 36: November 18, 2013
We will do review for exam on Friday.
Singular Value Decomposition
SVD:
Theorem 10.4. For any A_{m×n} there exist orthogonal U, V such that

A_{m×n} = U_{m×m} D_{m×n} Vᵀ_{n×n},   (10.2.1)

where

D = \begin{pmatrix} σ₁ & & & 0 \\ & \ddots & & \\ & & σ_r & \\ 0 & & & 0 \end{pmatrix}_{m×n}   (10.2.2)

and σ₁ ≥ σ₂ ≥ · · · ≥ σ_r > 0.
Notes (verified numerically in the sketch below):

1. ‖A‖₂ = σ₁ and, if A is invertible, ‖A^{−1}‖₂ = 1/σ_n. The condition number is κ(A) = σ₁/σ_n.

2. r = rank(A).

3. |det(A)| = ∏_{i=1}^{n} σ_i.

4. A^{−1} = VD^{−1}Uᵀ.
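A hedged Matlab check of these consequences, with an arbitrary example matrix:

A = [1 2; 0 2];
s = svd(A);                    % singular values, descending
disp([s(1), norm(A,2)])        % sigma_1 = ||A||_2
disp([s(1)/s(2), cond(A)])     % sigma_1/sigma_n = condition number
disp([prod(s), abs(det(A))])   % product of sigmas = |det(A)| = 2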
Existence of the Singular Value Decomposition

Proof. We know that there exist orthogonal U and V such that

UᵀAV = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix},   (10.2.3)

where C_{r×r} is invertible, and ‖C‖₂ = ‖A‖₂. Let x with ‖x‖₂ = 1 be a maximizer:

‖C‖₂ = max_{‖y‖₂=1} ‖Cy‖₂ = ‖Cx‖₂ = σ₁ = ‖A‖₂.   (10.2.4)

Let y = Cx/‖Cx‖₂, and complete x and y to orthogonal matrices [x | X] and [y | Y]. Now

[y | Y]ᵀ C [x | X] = \begin{pmatrix} yᵀ \\ Yᵀ \end{pmatrix} (Cx CX) = \begin{pmatrix} yᵀCx & yᵀCX \\ YᵀCx & YᵀCX \end{pmatrix}.   (10.2.5)
Further,

yᵀCx = xᵀCᵀCx/‖Cx‖₂ = ‖Cx‖₂²/‖Cx‖₂ = ‖Cx‖₂ = σ₁.   (10.2.6)

Similarly YᵀCx = (Yᵀy)‖Cx‖₂ = 0, and, since x is an eigenvector of CᵀC (CᵀCx = λ₁x; see the correction in Lecture 37),

yᵀCX = xᵀCᵀCX/‖Cx‖₂ = λ₁ xᵀX/‖Cx‖₂ = 0,   (10.2.7)

because the columns of X are orthogonal to x. So we have reduced to

\begin{pmatrix} σ₁ & 0 \\ 0 & C̃ \end{pmatrix}.

We may then repeat the argument on C̃, maximizing the 2-norm, to get the full singular value decomposition. □
Notes:

A_{m×n} = [U]_{m×m} D_{m×n} [Vᵀ]_{n×n} = (u₁ · · · u_r)_{m×r} \begin{pmatrix} σ₁ & & \\ & \ddots & \\ & & σ_r \end{pmatrix}_{r×r} \begin{pmatrix} v₁ᵀ \\ \vdots \\ v_rᵀ \end{pmatrix}_{r×n},   (10.2.8)

from trimming out the zeros. Here σ₁, . . . , σ_r are unique, and u₁, . . . , u_r and v₁, . . . , v_r are unique up to sign.
From the existence of A = UDVᵀ, what can we deduce? We know that UᵀU = UUᵀ = I and VᵀV = VVᵀ = I. So

[AV]_{:j} = [UD]_{:j} = U (0, . . . , 0, σ_j, 0, . . . , 0)ᵀ = σ_j u_j,   (10.2.9)

using (AB)_{:j} = AB_{:j}. Now

Av_j = σ_j u_j for 1 ≤ j ≤ r, and Av_j = 0 for j > r;   (10.2.10a)
Aᵀ = VDᵀUᵀ, so AᵀU = VDᵀ and Aᵀu_j = σ_j v_j for 1 ≤ j ≤ r, Aᵀu_j = 0 for j > r.   (10.2.10b,c)
So, the four fundamental subspaces are
• R(A) = span {u1, . . . ,ur}• N(A) = span {vr+1, . . . ,vn}• R(Aᵀ) = span {v1, . . . ,vr}• N(Aᵀ) = span {ur+1, . . . ,um}(
AᵀA)n×n = VD
ᵀUᵀUDV
ᵀ, (10.2.11a)
= VDᵀDV
ᵀ, (10.2.11b)
= V
σ21 0 · · · 0 0 · · · 0
0 σ22
. . ....
......
.... . . . . . 0 0 · · · 0
0 · · · 0 σ2r 0 · · · 0
0 · · · 0 0 0 · · · 0...
......
.... . .
...0 · · · 0 0 0 · · · 0
n×n
Vᵀ, (10.2.11c)
(AᵀAV
):j
=(VD
ᵀD)
:j, (10.2.11d)
AᵀAvj =
{σ1jvj, j ≤ r
0, j > r. (10.2.11e)
135
Nitsche and Benner Unit 10. Singular Value Decomposition
Thus, σ_j = sqrt(λ_j(AᵀA)) for j = 1, . . . , r. Similarly, the v_j are eigenvectors of AᵀA for j = 1, . . . , r, and the v_j are orthogonal because eigenvectors of symmetric matrices are orthogonal. To construct the SVD, we (see the sketch below):

1. find the eigenvalues λ_j of AᵀA and the corresponding eigenvectors v_j;

2. find u₁, . . . , u_r from σ_j u_j = Av_j;

3. find complementary orthonormal sets u_{r+1}, . . . , u_m and v_{r+1}, . . . , v_n.
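A hedged Matlab sketch of this three-step construction for a full-rank square A (so r = m = n and step 3 is empty; eig's ordering is an assumption handled by the explicit sort):

A = [1 2; 0 2];
[V,L] = eig(A'*A);                % step 1: eigenpairs of A'A
[lam, idx] = sort(diag(L),'descend');
V = V(:,idx);
sig = sqrt(lam);                  % singular values
U = A*V*diag(1./sig);             % step 2: u_j = A v_j / sigma_j
disp(norm(A - U*diag(sig)*V'))    % ~0: A = U D V'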
10.3 Lecture 37: November 20, 2013
Review and correction from last time

From last time:

UᵀAV = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix}.   (10.3.1)

Then we said there exists an x such that ‖Cx‖ = ‖C‖₂ = σ₁, and we let y = Cx/σ₁ and considered [x | X] and [y | Y]. The correction from last lecture: x is the eigenvector corresponding to λ = σ₁² of CᵀC, so CᵀCx = λx, hence xᵀCᵀC = λxᵀ and

yᵀCX = xᵀCᵀCX/σ₁ = λ xᵀX/σ₁ = 0.   (10.3.2)
SVD will not be on the exam, but will be on the final.
Singular Value Decomposition
We know

A = UDVᵀ,   (10.3.3)

so

AV = UD.   (10.3.4)

This means that

Av_j = σ_j u_j for j ≤ r, and 0 for j > r.   (10.3.5)

Then,

Aᵀu_j = σ_j v_j for j ≤ r, and 0 for j > r.   (10.3.6)

Thus the v_j are called the right singular vectors, the u_j the left singular vectors, and σ_j = sqrt(λ_j(AᵀA)) the singular values. Also we may read off the four subspaces:
• R(A) = span{u₁, . . . , u_r}
• N(A) = span{v_{r+1}, . . . , v_n}
• R(Aᵀ) = span{v₁, . . . , v_r}
• N(Aᵀ) = span{u_{r+1}, . . . , u_m}
So if we have the SVD, it is easy to describe these subspaces, and we can construct the SVD using these facts. Now,

AᵀA = VDᵀUᵀUDVᵀ = VDᵀDVᵀ,  so  AᵀAV = VDᵀD.   (10.3.7)
Example 10.5. Given

A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}.   (10.3.8)

Then r = 1 and

AᵀA = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 5 \\ 5 & 5 \end{pmatrix}.   (10.3.9)

For AᵀAv = λv,

det(AᵀA − λI) = \begin{vmatrix} 5−λ & 5 \\ 5 & 5−λ \end{vmatrix} = 25 − 10λ + λ² − 25 = λ² − 10λ = λ(λ − 10).   (10.3.10)
So to find v₁, solve (AᵀA − λ₁I)v = 0 with λ₁ = 10:

\begin{pmatrix} −5 & 5 \\ 5 & −5 \end{pmatrix}\begin{pmatrix} v₁ \\ v₂ \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},  so −5v₁ + 5v₂ = 0, v₂ = v₁, and

v₁ = (1/√2)(1, 1)ᵀ.   (10.3.11)

So

Av₁ = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}(1/√2)\begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1/√2)\begin{pmatrix} 2 \\ 4 \end{pmatrix}.   (10.3.12)
Thus, σ₁ = ‖Av₁‖ = √10 and

u₁ = (1/√20)(2, 4)ᵀ = (1/√5)(1, 2)ᵀ.   (10.3.13)

So

A = (1/√5)\begin{pmatrix} 1 \\ 2 \end{pmatrix}(√10)(1/√2)(1 1) = UDVᵀ = (1/√5)\begin{pmatrix} 1 & 2 \\ 2 & −1 \end{pmatrix}\begin{pmatrix} √10 & 0 \\ 0 & 0 \end{pmatrix}(1/√2)\begin{pmatrix} 1 & 1 \\ −1 & 1 \end{pmatrix},   (10.3.14)

where the second form is the full SVD, obtained by extending u₁ and v₁ to orthonormal bases.
This is great to do by hand, but is not a very numerically stable way to find the SVD.
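Checking the hand computation against Matlab's svd (a hedged sketch; the signs of the singular vectors may differ):

A = [1 1; 2 2];
[U,S,V] = svd(A);
disp(diag(S)')         % sqrt(10), 0
disp(U(:,1)*sqrt(5))   % +/- (1, 2)'
disp(V(:,1)*sqrt(2))   % +/- (1, 1)'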
Geometric interpretation

Consider the image of the unit sphere S = {x ∈ Rⁿ : ‖x‖₂ = 1}:

y = Ax = UDVᵀx,  so  Uᵀy = DVᵀx.   (10.3.15)

Let y′ = Uᵀy and x′ = Vᵀx. So

y′ = Dx′,  i.e.  y′_j = σ_j x′_j.   (10.3.16)

Now ‖x‖₂² = 1 and ‖x′‖₂² = ‖Vᵀx‖₂² = 1. Thus,

(x′₁)² + (x′₂)² + · · · + (x′_n)² = 1,  i.e.  (y′₁/σ₁)² + (y′₂/σ₂)² + · · · + (y′_n/σ_n)² = 1,   (10.3.17)

which is a hyperellipse! Viewing the transformation as Av_j = σ_j u_j, the σ_j give the major and minor axes of the multi-dimensional ellipsoid, along the directions u_j.
There is a nice fact about the SVD for low-rank approximation (the second step may be read off easily from the matrix form):

A = UDVᵀ = Σ_{j=1}^{r} σ_j u_j v_jᵀ.   (10.3.18)

This is a way to write any matrix as a sum of rank-1 matrices. Since the σ_j decrease, we may truncate the series when σ_j gets close to zero. Let A_k = Σ_{j=1}^{k} σ_j u_j v_jᵀ, with rank(A_k) = k.
Theorem 10.6. ‖A − A_k‖₂ = σ_{k+1}, and A_k is the best rank-k approximation:

‖A − A_k‖₂ = min_{rank(B)=k} ‖A − B‖₂.   (10.3.19)

This follows from

A − A_k = U diag(σ₁, . . . , σ_k, σ_{k+1}, . . . , σ_r, 0, . . .) Vᵀ − U diag(σ₁, . . . , σ_k, 0, . . . , 0) Vᵀ
 = U diag(0, . . . , 0, σ_{k+1}, . . . , σ_r, 0, . . .) Vᵀ.   (10.3.20)

We will explore the proof and implications of this theorem later.
10.4 Lecture 38: November 22, 2013
Review for Exam 2

From the homework, we need to be able to go through proofs like these:

• the ‖A‖_∞ formula, derived analogously to the ‖A‖₁ formula
• verifying that a given formula defines a matrix norm
• uniqueness of the QR factorization
• ‖A^{−1}‖₂ = 1/sqrt(λ_min(AᵀA))
• ‖A‖₂ = sqrt(λ_max(AᵀA))
Norms

To show that something is a norm (for matrices or vectors), we must show the following properties:

1. ‖x‖ ≥ 0 for any x, and ‖x‖ = 0 implies x = 0
2. ‖αx‖ = |α|‖x‖
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖

Several matrix norms, the induced norms and the Frobenius norm, have the fourth property

‖AB‖ ≤ ‖A‖‖B‖.   (10.4.1)
More major topics

The exam covers chapters 4 and 5 (minus the SVD). These are things to know:

• Subspaces (closed under addition and scalar multiplication)

• Linear transformations (definition: preserve addition and scalar multiplication)

• Coordinates and change of basis:

x = [x]_S = Σ_i x_i e_i = Ix = Σ_i c_i u_i = Uc,   (10.4.2)

where c = [x]_B; finding c is clearly a problem of inverting a matrix. The formula is

c = [x]_B = ([e₁]_B [e₂]_B · · · [e_n]_B)[x]_S = U^{−1}[x]_S.   (10.4.3)

So we really care about the representation of a linear operator in a given basis:

[T]_B = ([T(u₁)]_B [T(u₂)]_B · · · [T(u_n)]_B),  [T(x)]_B = [T]_B[x]_B.   (10.4.4)

• Change of coordinates: [T]_B ∼ [T]_{B′}, related by T = ST′S^{−1}.   (10.4.5)

• Least squares for Ax = b: the normal equations are

AᵀAx = Aᵀb.   (10.4.6)
This connects with projections because

Ax = A(AᵀA)^{−1}Aᵀ b = P_{R(A)}b,   (10.4.7)

with P_{R(A)} the orthogonal projector onto R(A). The solution is unique if the matrix is of full rank, because then AᵀA is invertible.
• Projectors: defined by P² = P, with the complementary projector I − P and its corresponding properties. A projector is orthogonal when P* = P. (A unitary matrix, by contrast, is one with orthonormal columns: Q*Q = I and Q* = Q^{−1}.) A projector always projects onto its range. Know the proofs for P and I − P.

• Gram–Schmidt is needed to orthogonalize a set of vectors, with elementary projectors P_∥ = uu* and P_⊥ = I − uu*.
Show A = QR is unique, rjj > 0, Q is orthonormal, R is upper triangular. Existenceand uniqueness? From the Gramm–Schmidt construction process we know we can getit because we can always construct it. GS was
a1 = r11q1, (10.4.8a)
a2 = r12q1 + r22q2, (10.4.8b)
· · · (10.4.8c)
an = r1nq1 + r2nq2 + · · ·+ r2,nqn. (10.4.8d)
Uniqueness: this also shows uniqueness directly because you have these equations andmay invert them. (Invertibility) a1 = r11q1 implies ‖a1‖ = ‖r11q1‖ = |r11|‖q1‖ and wemay find the r11 so then q1 = 1
r11a1. Then induction may prove this is true for all the
other values of n. First we would show true for n = 1 (all qk are uniquely determined),then show if true for n = k then it’s also still true for n = k + 1. This is done withshowing r1,k+1, . . . , rk+1,k+1, qk+1 are uniquely determined.
ak+1 = r1,k+1q1 + · · ·+ rk,k+1qk + rk+1,k+1qk+1 (10.4.9a)
This is a Fourier series and we may take⟨ak+1,qj
⟩= rj,k+1
⟨qj,qj
⟩= rj,k+1 for j < k+1
therefore all we have left is to find the vector
rk+1,k+1qk+1 = ak+1 − r1,k+1q1 − · · · − rk,k+1qk (10.4.9b)
and we can do the same argument again to finish with rk+1,k+1 and qk+1∥∥rk+1,k+1qk+1
∥∥ = ‖b‖, (10.4.10a)
|rk+1,k+1|∥∥qk+1
∥∥︸ ︷︷ ︸1
= ‖b‖, (10.4.10b)
|rk+1,k+1| = ‖b‖, (10.4.10c)
rk+1,k+1 = ‖b‖. (10.4.10d)
141
Nitsche and Benner Unit 10. Singular Value Decomposition
For positive rj,j.
So we have several decompositions now to work with.

• Invariant subspaces give a block diagonal form of the matrix.

We will have class on Wednesday.
10.5 Homework Assignment 8: Due Tuesday, December 10, 2013
You may use Matlab to compute matrix products, or to reduce a matrix to Row EchelonForm.
1. Determine the SVDs of the following matrices (by hand calculation).
(a) \begin{pmatrix} 3 & 0 \\ 0 & −2 \end{pmatrix}

(b) \begin{pmatrix} 0 & 2 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}

(c) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
2. Let A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}.
fine).
(b) In one plot draw the unit circle C and indicate the vectors v1,v2, and in anotherplot draw the ellipse AC (i.e. the image of the circle under the transformation x→Ax) and indicate the vectors Av1 = σ1u1, Av2 = σ2u2. Use the axis(’square’)
command in Matlab to ensure that the horizontal and vertical axes have thesame scale.
(c) Find A1, the best rank-1 approximation to A in the 2-norm. Find ‖A−A1‖2.
3. Let A ∈ Rm×n, with rank r. Use the singular value decomposition of A to prove thefollowing.
(a) N(A) and R(Aᵀ) are orthogonal complementary subspaces of Rn.
(b) The properties in 5.2.6 (b, c, d, e):

Establish the following properties of the matrix 2-norm.

 (b) ‖A‖₂ = ‖A*‖₂,
 (c) ‖A*A‖₂ = ‖A‖₂²,
 (d) ∥\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}∥₂ = max{‖A‖₂, ‖B‖₂} (take A, B to be real),
 (e) ‖U*AV‖₂ = ‖A‖₂ when UU* = I and V*V = I.

(c) ‖A‖_F = sqrt(σ₁² + σ₂² + · · · + σ_r²).
4. Show that if A ∈ R^{n×n} is symmetric then σ_j = |λ_j|.

5. Compute the determinants of the matrices given in 6.1.3 (a), 6.1.3 (c), 6.2.1 (b).
(a) A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 1 \\ 1 & 4 & 4 \end{pmatrix}

(b) A = \begin{pmatrix} 1 & 2 & −3 & 4 \\ 4 & 8 & 12 & −8 \\ 2 & 3 & 2 & 1 \\ −3 & −1 & 1 & −4 \end{pmatrix}

(c) \begin{vmatrix} 0 & 0 & −2 & 3 \\ 1 & 0 & 1 & 2 \\ −1 & 1 & 2 & 1 \\ 0 & 2 & −3 & 0 \end{vmatrix}

6. (a) Show that if A is invertible, then det(A^{−1}) = 1/det(A).
(b) Show that for any invertible matrix S, det(SAS−1) = det(A).
(c) If A is n× n, show that det(αA) = αn det(A).
(d) If A is skew-symmetric, show that A is singular whenever n is odd.
(e) Show by example that in general, det(A + B) 6= det(A) + det(B).
7. (a) Let An×n = diag {d1, d2, . . . , dn}. What are the eigenvalues and eigenvectors of A?
(b) Let A be a nonsingular matrix and let λ be an eigenvalue of A. Show that 1/λ isan eigenvalue of A−1.
(c) Let A be an n × n matrix and let B = A − αI for some scalar α. How do theeigenvalues of A and B compare? Explain.
(d) Show that all eigenvalues of a nilpotent matrix are 0.
8. For each of the two matrices,
A = A₁ = \begin{pmatrix} 3 & 2 & 1 \\ 0 & 2 & 0 \\ −2 & −3 & 0 \end{pmatrix}, A = A₂ = \begin{pmatrix} −4 & −3 & −3 \\ 0 & −1 & 0 \\ 6 & 6 & 5 \end{pmatrix}
(a) a nonsingular P such that P−1AP is diagonal.
(b) A100
(c) eA.
9. Use diagonalization to solve the system

dx/dt = x + y,  dy/dt = −x + y,  x(0) = 100, y(0) = 100.
10. 7.4.1

Suppose that A_{n×n} is diagonalizable, and let P = [x₁|x₂| · · · |x_n] be a matrix whose columns are a complete set of linearly independent eigenvectors corresponding to eigenvalues λ_i. Show that the solution to u′ = Au, u(0) = c, can be written as

u(t) = ξ₁e^{λ₁t}x₁ + ξ₂e^{λ₂t}x₂ + · · · + ξ_n e^{λ_n t}x_n,

in which the coefficients ξ_i satisfy the algebraic system Pξ = c.
11. 7.5.3
Show that A ∈ Rn×n is normal and has real eigenvalues if and only if A is symmetric.
12. 7.5.4
Prove that the eigenvalues of a real skew-symmetric or skew-hermitian matrix must bepure imaginary numbers (i.e., multiples of i).
13. 7.6.1
Which of the following matrices are positive definite?
A = \begin{pmatrix} 1 & −1 & −1 \\ −1 & 5 & 1 \\ −1 & 1 & 5 \end{pmatrix}, B = \begin{pmatrix} 20 & 6 & 8 \\ 6 & 3 & 0 \\ 8 & 0 & 8 \end{pmatrix}, C = \begin{pmatrix} 2 & 0 & 2 \\ 0 & 6 & 2 \\ 2 & 2 & 4 \end{pmatrix}.
14. 7.6.4
By diagonalizing the quadratic form 13x2 + 10xy + 13y2, show that the rotated graphof 13x2 + 10xy + 13y2 = 72 is an ellipse in standard form as shown in Figure 7.2.1 onp. 505.
10.6 Lecture 39: November 27, 2013
We will have one more homework before the end, on the SVD and eigenvalues with diagonalization; we will cover the Jordan canonical form but may not put it on the homework. The homework is due next Friday, so there is time for solutions before the final. The final is cumulative and will be held on Wednesday.
Singular Value Decomposition
We know that A = UΣVᵀ for any matrix A, where Σ is a diagonal matrix. We may rearrange,

AV = UΣ,  i.e.  Av_j = σ_j u_j for j ≤ r, and 0 for j > r.   (10.6.1)
The SVD gives A = Σ_{j=1}^{r} σ_j u_j v_jᵀ for a matrix of rank r. We may define A_k = Σ_{j=1}^{k} σ_j u_j v_jᵀ and have an approximation of rank k.

Theorem 10.7.

‖A − A_k‖₂ = σ_{k+1} = min_{rank(B)=k} ‖A − B‖₂.   (10.6.2)

In words, A_k is a best approximation of rank k to A in the 2-norm.
Proof. The first part follows from the matrix form: A − A_k has singular values σ_{k+1}, . . . , σ_r, so its 2-norm is σ_{k+1}. For the second part, assume there is a matrix B of rank k with ‖A − B‖₂ < σ_{k+1}. Then there exists a subspace W of dim(W) = n − k such that Bw = 0 for any w ∈ W. For such a w,

‖Aw‖₂ = ‖(A − B)w‖₂ ≤ ‖A − B‖₂‖w‖₂ < σ_{k+1}‖w‖₂.   (10.6.3)

But on the subspace V = span{v₁, . . . , v_{k+1}} of dim(V) = k + 1 we have ‖Aw‖₂ ≥ σ_{k+1}‖w‖₂ for all w ∈ V. Since dim(V) + dim(W) > n, there exists w ≠ 0 in V ∩ W; this w must satisfy both ‖Aw‖₂ < σ_{k+1}‖w‖₂ and ‖Aw‖₂ ≥ σ_{k+1}‖w‖₂, a contradiction. This proof is a little more elementary than the proof in the book. □

Thus, we can approximate a matrix by lower-rank matrices. This is useful because the approximation has far fewer numbers to store, reducing the cost.
SVD in Matlab
Example handed out in class: in Matlab, load clown.mat loads a matrix X (type whos to see it), which may be displayed with image(X). Then we do [U,S,V] = svd(X). The first figure (Figure 10.1) plots the diagonal entries of S; we see that the small values can be truncated. As we increase k = 3, 10, 30, the approximations improve significantly, as shown in Figure 10.2. Here A_k = UΣVᵀ truncated to rank k, computed with Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)'. For k = 30 we already have a good approximation that is significantly cheaper to store than the original matrix. Further, Table 10.1 shows that the relative error decreases significantly with k.
Table 10.1. Relative error of the SVD approximation matrix A_k

k    relative error σ_{k+1}/σ₁    compression ratio 520k/(200·320)
3    0.155                        0.024
10   0.077                        0.081
30   0.027                        0.244

Listing 10.1. svdimag.m

% application of the SVD to image compression
% from "Applied Numerical Linear Algebra", by J. Demmel, page 114 (SIAM)
load clown.mat
% X is a matrix of pixels of dimension 200 by 320
[U,S,V] = svd(X);
%%
figure(1)
plot(diag(S));
set(gca,'FontSize',15)
xlabel('k')
ylabel('\sigma_k')
title('Singular values of X')
%%
figure(2)
ifont = 12;
colormap('gray')
subplot('position',[.07,.54,.40,.40])
k = 3; image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=3')
set(gca,'FontSize',ifont)
set(gca,'XTickLabel','')
%
subplot('position',[.5,.54,.40,.40])
k = 10; image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=10')
set(gca,'FontSize',ifont)
set(gca,'YTickLabel','')
set(gca,'XTickLabel','')
%
subplot('position',[.07,.06,.40,.40])
k = 30; image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=30','FontSize',ifont)
set(gca,'FontSize',ifont)
%
subplot('position',[.5,.06,.40,.40])
image(X); title('original')
set(gca,'FontSize',ifont)
set(gca,'YTickLabel','')
Figure 10.1. Singular values σ_k of matrix X versus k. [The plot shows σ_k decaying from about 8,000 toward zero over k = 1, . . . , 200.]
Figure 10.2. Rank k approximations of original image. [Four panels: k = 3, k = 10, k = 30, and the original 200 × 320 image.]
UNIT 11
Additional Topics
11.1 Lecture 39 (cont.)
The Determinant
We will quickly cover the essentials of chapter 6. The determinant is defined as follows.

Definition 11.1.

    det(A) = Σ_p σ(p) a_{1p_1} a_{2p_2} ⋯ a_{np_n},    (11.1.1)

where the sum runs over all permutations p: (1, . . . , n) → (p_1, p_2, . . . , p_n), and σ(p) is the sign of the permutation,

    σ(p) = { +1, if an even number of exchanges is needed to obtain p from (1, . . . , n),
           { −1, if an odd number of exchanges is needed to obtain p from (1, . . . , n).    (11.1.2)
If the determinant is non-zero, then Ax = b has a unique solution.
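For small n, definition (11.1.1) can be implemented directly. A Matlab sketch (the function name permdet is ours, and the cost grows like n · n!, so this is purely illustrative; in practice one computes determinants via the LU factorization):

function d = permdet(A)
% determinant via the permutation sum (11.1.1); exponential cost
n = size(A,1);
P = perms(1:n);                          % all n! permutations
d = 0;
for r = 1:size(P,1)
    p = P(r,:);
    nswaps = 0;                          % count inversions to get sign(p)
    for i = 1:n-1
        nswaps = nswaps + sum(p(i) > p(i+1:n));
    end
    d = d + (-1)^nswaps * prod(A(sub2ind([n n], 1:n, p)));
end
end

For A = randn(4), permdet(A) agrees with Matlab's det(A) to roundoff.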
Theorem 11.2. We have several interesting properties of determinants.
1. Triangular matrices: the determinant is the product of the diagonal entries,

    det [ a11  a12  ⋯  a1n
          0    a22  ⋯  a2n
          ⋮          ⋱   ⋮
          0    0    ⋯  ann ] = ∏_{i=1}^{n} aii.    (11.1.3)
2. det(Aᵀ) = det(A)
3. det(AB) = det(A) det(B).
4. If B is obtained from A by

    • exchanging row i with row j, then det(B) = −det(A);
    • multiplying row i by α, then det(B) = α det(A);
    • adding a multiple of row i to row j, then det(B) = det(A).
5. det(A) is a multilinear function of the rows (and of the columns) of A, i.e., linear in each row separately.
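These properties are easy to spot-check numerically. A minimal Matlab sketch with random test matrices (each printed difference should be ~0 up to roundoff):

% numerical spot-check of Theorem 11.2
A = randn(4); B = randn(4);
det(A') - det(A)                       % property 2
det(A*B) - det(A)*det(B)               % property 3
det(A([2 1 3 4],:)) + det(A)           % row exchange flips the sign
det([2*A(1,:); A(2:4,:)]) - 2*det(A)   % scaling a row scales the determinant
C = A; C(2,:) = C(2,:) + 3*C(1,:);
det(C) - det(A)                        % adding a multiple of a row: unchanged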
11.2 Lecture 40: December 2, 2013
Further details for class
Homework is due Friday; the latest it can possibly be turned in is Tuesday before 4:30 (in order to get solutions). The final is on Wednesday at 7:30–9:30. (?)
Today we will cover eigenvalues and eigenvectors. Then on Wednesday we will coverpositive-definite matrices.
For the final, we will review on Friday. Some homework problems can safely be ignored because they were too involved.
Diagonalizable Matrices
We know that for any matrix,

    A ∼ B    (11.2.1)

means

    A = SBS⁻¹    (11.2.2)

for some invertible matrix S. Now we want to know when A ∼ D, where D is a diagonal matrix.
Eigenvalues and eigenvectors
Say we have the eigenpair (λ, v), where

    Av = λv,    (11.2.3a)
    (A − λI)v = 0,    (11.2.3b)

which holds exactly when v ∈ N(A − λI). For a nonzero eigenvector v to exist, this null space must be nontrivial, so we care about det(A − λI) = 0. So,
    det(A − λI) = | a11−λ   a12     ⋯   a1n   |
                  | a21     a22−λ   ⋯   a2n   |
                  | ⋮                ⋱    ⋮    |
                  | an1     an2     ⋯   ann−λ |    (11.2.4a)

                = (a11 − λ)(a22 − λ) ⋯ (ann − λ) + powers of λ of degree ≤ n − 2    (11.2.4b)

                = p(λ)    (11.2.4c)

                = (−1)ⁿλⁿ + (−1)ⁿ⁻¹λⁿ⁻¹(a11 + a22 + ⋯ + ann) + lower-order terms in λᵏ, k ≤ n − 2,
                  where a11 + a22 + ⋯ + ann = tr(A)    (11.2.4d)

                = (λ − λ1)(λ − λ2) ⋯ (λ − λn)(−1)ⁿ    (11.2.4e)

                = (−1)ⁿλⁿ + (−1)ⁿ⁻¹λⁿ⁻¹(λ1 + λ2 + ⋯ + λn) + l.o.t.    (11.2.4f)

                = (−1)ⁿ[λⁿ + λⁿ⁻¹(−λ1 − λ2 − ⋯ − λn) + l.o.t.],    (11.2.4g)

where the factorization in (11.2.4e) comes from the fundamental theorem of algebra. From this we get the following:
• Every n × n matrix A has n eigenvalues, counted with multiplicity.
• The sum∑λk = tr(A).
•∏λk = p(0) = det(A).
• If A is triangular, then det(A − λI) = ∏_i (aii − λ), so the roots are simply the diagonal entries: λ_i = aii.
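These facts are easy to confirm numerically. A minimal Matlab sketch with arbitrary test matrices:

% eigenvalue facts: trace, determinant, triangular case
A = randn(5);
lam = eig(A);
sum(lam) - trace(A)             % ~0 up to roundoff
prod(lam) - det(A)              % ~0 up to roundoff
T = triu(randn(5));
sort(eig(T)) - sort(diag(T))    % triangular: eigenvalues are the diagonal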
Example 11.3. As a brief review, find the eigenvalues and the eigenvectors of

    A = [ 1  −1
          1   1 ].

So,
    det(A − λI) = | 1−λ   −1  |
                  | 1    1−λ  | = (1 − λ)² + 1    (11.2.5a)

                = λ² − 2λ + 2,    (11.2.5b)

    λ_{1,2} = (2 ± √(4 − 8)) / 2    (11.2.5c)

            = 1 ± i.    (11.2.5d)
Then for λ1 = 1 + i we solve (A − λ1I)v = 0:

    [ 1−(1+i)    −1      0 ]   [ −i  −1  0 ]
    [ 1       1−(1+i)    0 ] = [  1  −i  0 ],    (11.2.6a)

    → [  1  −i  0 ]
      [ −i  −1  0 ],    (11.2.6b)

    → [  1  −i  0 ]
      [  0   0  0 ].    (11.2.6c)
So the components of v satisfy

    v1 − iv2 = 0,    (11.2.7a)
    v1 = iv2,    (11.2.7b)

giving the eigenvector

    v1 = (i, 1)ᵀ.    (11.2.7c)
Similarly, for the second eigenvalue,

    λ2 = 1 − i,    (11.2.8a)
    v2 = (−i, 1)ᵀ.    (11.2.8b)
Note that the eigenvectors v1,v2 are linearly independent.
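Matlab's eig confirms this example (a quick sketch; the ordering of the eigenvalue pair may differ):

% Example 11.3 numerically
A = [1 -1; 1 1];
[V,D] = eig(A)
% diag(D) contains 1+i and 1-i; the columns of V are
% (complex) scalar multiples of (i,1) and (-i,1)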
Note: If A has a linearly independent set of n eigenvectors {v1, . . . , vn}, then

    V = [ v1 v2 ⋯ vn ]    (eigenvectors as columns)

is invertible and Avj = λjvj. Then, for the diagonal matrix D with the eigenvalues along the diagonal,

    (AV):j = (VD):j,    (11.2.9a)
    AV = VD,    (11.2.9b)
    A = VDV⁻¹.    (11.2.9c)
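Numerically, the diagonalization can be checked with eig (a sketch, reusing Example 11.3):

% A = V D V^{-1}
A = [1 -1; 1 1];
[V,D] = eig(A);
norm(A - V*D/V)   % ~0; V*D/V evaluates V*D*inv(V)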
However, not all matrices are diagonalizable, as the following examples show.
Example 11.4. The matrix

    A = [ 1 1
          0 1 ]

has the double eigenvalue 1, λ1 = λ2 = 1. So,
    A − λI = [ 0 1
               0 0 ],    (11.2.10a)

    dim(N(A − λI)) = 1.    (11.2.10b)
Thus there is only one linearly independent eigenvector, and A is not diagonalizable.
Example 11.5. The matrix

    A = [ 1 0
          0 1 ]

also has the double eigenvalue 1, λ1 = λ2 = 1. But here,

    A − λI = [ 0 0
               0 0 ],    (11.2.11a)

    dim(N(A − λI)) = 2,    (11.2.11b)

and there are two linearly independent eigenvectors:
    v1 = (1, 0)ᵀ  and  v2 = (0, 1)ᵀ.
Example 11.6. A nonzero nilpotent matrix N, with Nᵏ = 0 for some k, does not have a full set of eigenvectors. For instance, when

    A ∼ [ 0 ⋯ 0 0
          1 0   0
            ⋱ ⋱ ⋮
          0   1 0 ],    (11.2.12)

we get λ1 = λ2 = ⋯ = λn = 0 and dim(N(A − λI)) = dim(N(A)) = 1, so there is only a single eigenvector.
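A quick Matlab illustration with the 4 × 4 shift matrix (a sketch):

% nilpotent shift matrix: all eigenvalues zero, one-dimensional null space
A = diag(ones(3,1), -1);   % ones on the subdiagonal, A^4 = 0
eig(A)                     % all zeros
size(null(A), 2)           % dim N(A) = 1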
Theorem 11.7. If A has n distinct eigenvalues, then the corresponding eigenvectors are linearly independent.
Proof. Assume the eigenvectors {vk} are linearly dependent. Then we can write one of them as a linear combination of a linearly independent subset of the others: vk = Σ_{j≠k} cj vj, where the {vj} are linearly independent and every cj ≠ 0 (drop any terms with zero coefficient). Then,

    (A − λkI)vk = (A − λkI) Σ_{j≠k} cj vj,    (11.2.13a)

where the left side is Avk − λkvk = λkvk − λkvk = 0, so

    0 = Σ_{j≠k} cj (Avj − λkvj)    (11.2.13b)
      = Σ_{j≠k} cj (λj − λk) vj    (11.2.13c)
      = Σ_{j≠k} αj vj,  with αj = cj (λj − λk) ≠ 0,    (11.2.13d)

since the eigenvalues are distinct. This means the set {vj} is linearly dependent, which contradicts its construction. So the eigenvectors are linearly independent. □
Now if A = VDV⁻¹, then

    Aᵏ = VDV⁻¹ VDV⁻¹ ⋯ VDV⁻¹    (11.2.14a)
       = VDᵏV⁻¹,    (11.2.14b)

since the interior V⁻¹V factors cancel.
Similarly, we can evaluate a power series in A term by term, since each power is VDᵏV⁻¹. This will be useful in solving systems of differential equations.
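As a quick numerical check of (11.2.14), continuing Example 11.3 (a sketch):

% powers via the diagonalization
A = [1 -1; 1 1];
[V,D] = eig(A);
norm(A^5 - V*D.^5/V)   % ~0; for diagonal D, D.^5 equals D^5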
Index
backward substitution, 9
basic columns, 38
basis, 56, 66, 84
bilinear operator, 149

Cauchy–Schwarz inequality, 100
change of basis, 88
column space, 58
complementary projector, 127
complementary subspaces, 121
condition number, 27, 48
consistent system, 36

determinant, 149
diagonal matrix, 150
differentiation, 86
direct sum, 121

eigenvalues, 150
eigenvectors, vi, 150
elementary operations, 15
Euclidean norm, 19
exams, 73, 74

field, 55
finite difference, 2, 44
four fundamental subspaces, 58
Frobenius norm, 101
fundamental theorem of algebra, 65, 150

geometric series, 46
Givens rotation, 118
Gram–Schmidt orthogonalization, 112

homogeneous solutions, 39
Householder method, 121

idempotent matrices, 122
idempotent operator, 92
ill-posed, 20
induced norm, 104
inner product, 109
interpolation, 63
invariant subspace, 91
isometry, 116

Laplace equation, 2
least squares, 69
left null space, 58
linear function, 39
linear system, 1
linear transformation, 83
    action, 83, 87
linearly dependent, 66
linearly independent, 57, 63
lower triangular, 25
lower triangular system, 5

matrix form, 1
matrix norm, 101
minimization, 74
modified Gram–Schmidt, 116

nilpotent matrix, 128
nilpotent operator, 92
nonbasic columns, 38
norm, 47, 99
normal equations, 71
null space, 58

operation count, 9
order, 3
orthogonal projector, 123
orthogonalization, 111
orthonormal, 111
orthonormal basis, 111

partial differential equations, 111
particular solution, 38
periodic boundary conditions, 44
perturbations, 42
pivoting, 19, 22
PLU factorization, 22
projection, 118

QR factorization, 114

rank, 61
reduced row echelon form, 35
reflection, 118
review, 140
rotation, 117
row echelon form, 31
row space, 58

self-similar, 89
Sherman–Morrison formula, 44
singular value decomposition, 131
singular values, 131
smallest upper bound, 102
spanning set, 56
sparsity, 18
submatrices, 26
subspaces, 67

Taylor series, 3
trace, 40
tridiagonal matrix, 18
tuple, 92

Vandermonde matrix, 63
vector form, 1
vector space, 56

well-posed, 20
Figures
1.1 Finite difference approximation of a 1D boundary value problem. . . . . . . 2
2.1 One-dimensional discrete grids . . . . . . 10
2.2 Two-dimensional discrete grids . . . . . . 11

3.1 Plot of linear problems and their solutions . . . . . . 21

4.1 Geometric illustration of linear systems and their solutions . . . . . . 36
4.2 Figures for Textbook problem 3.3.4 . . . . . . 51

5.1 Basis vector of example solution . . . . . . 57
5.2 Interpolating system . . . . . . 64

6.1 Minimization of distance between point and a plane . . . . . . 73
6.2 Parabolic fitting by least squares . . . . . . 73

7.1 Figure 4.7.4 . . . . . . 95

10.1 Singular values σk of matrix X versus k . . . . . . 147
10.2 Rank k approximations of original image . . . . . . 147
Tables
3.1 Variation of error with the perturbation variable . . . . . . . . . . . . . . . . 20
10.1 Relative error of SVD approximation matrix Ak . . . . . . . . . . . . . . . . 146
Listings
2.1 code stub for tridiagonal solver . . . . . . 13
10.1 svdimag.m . . . . . . 145