Upload
doankhuong
View
227
Download
0
Embed Size (px)
Citation preview
APPLIED MATRIX THEORY
j
Lecture Notes for Math 464/514 Presented by
DR. MONIKA NITSCHE
j
Typeset and Editted by
ERIC M. BENNER
j
STUDENTS PRESSDecember 3, 2013
Copyright © 2013
Contents
1 Introduction to Linear Algebra 1
1.1 Lecture 1: August 19, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 1
About the class, 1. Linear Systems, 1. Example: Application to boundary valueproblem, 2. Analysis of error, 3. Solution of the discretized equation, 4.
2 Matrix Inversion 5
2.1 Lecture 2: August 21, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Gaussian Elimination, 5. Inner-product based implementation, 7. Office hours andother class notes, 8. Example: Gauss Elimination, 8.
2.2 Lecture 3: August 23, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Example: Gauss Elimination, cont., 8. Operation Cost of Forward Elimination, 9.Cost of the Order of an Algorithm, 10. Validation of Lower/Upper Triangular Form, 11.Theoretical derivation of Lower/Upper Form, 11.
2.3 HW 1: Due August 30, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Factorization 15
3.1 Lecture 4: August 26, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Elementary Matrices, 15. Solution of Matrix using the Lower/Upper factorization, 18.Sparse and Banded Matrices, 18. Motivation for Gauss Elimination with Pivoting, 19.
3.2 Lecture 5: August 28, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Motivation for Gauss Elimination with Pivoting, cont., 19. Discussion of well-posedness, 20.Gaussian elimination with pivoting, 21.
3.3 Lecture 6: August 30, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Discussion of HW problem 2, 22. PLU factorization, 22.
3.4 Lecture 7: September 4, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 24
PLU Factorization, 24. Triangular Matrices, 25. Multiplication of lower triangular ma-trices, 25. Inverse of a lower triangular matrix, 25. Uniqueness of LU factorization, 26.Existence of the LU factorization, 26.
3.5 Lecture 8: September 6, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 27
About Homeworks, 27. Discussion of ill-conditioned systems, 27. Inversion of lowertriangular matrices, 28. Example of LU decomposition of a lower triangular matrix, 28.Banded matrix example, 29.
iii
Nitsche and Benner Applied Matrix Theory
3.6 Lecture 9: September 9, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 29
Existence of the LU factorization (cont.), 29. Rectangular matrices, 31.
3.7 HW 2: Due September 13, 2013 . . . . . . . . . . . . . . . . . . . . . . . 32
4 Rectangular Matrices 35
4.1 Lecture 10: September 11, 2013 . . . . . . . . . . . . . . . . . . . . . . . 35
Rectangular matrices (cont.), 35. Example of RREF of a Rectangular Matrix, 37.
4.2 Lecture 11: September 13, 2013 . . . . . . . . . . . . . . . . . . . . . . . 38
Solving Ax = b, 38. Example, 38. Linear functions, 39. Example: Transposeoperator, 40. Example: trace operator, 40. Matrix multiplication, 41. Proof oftransposition property, 42.
4.3 Lecture 12: September 16, 2013 . . . . . . . . . . . . . . . . . . . . . . . 42
Inverses, 42. Low rank perturbations of I, 43. The Sherman–Morrison Formula, 44.Finite difference example with periodic boundary conditions, 44. Examples of pertur-bation, 45. Small perturbations of I, 45.
4.4 Lecture 13: September 18, 2013 . . . . . . . . . . . . . . . . . . . . . . . 46
Small perturbations of I (cont.), 46. Matrix Norms, 47. Condition Number, 48.
4.5 HW 3: Due September 27, 2013 . . . . . . . . . . . . . . . . . . . . . . . 49
5 Vector Spaces 55
5.1 Lecture 14: September 20, 2013 . . . . . . . . . . . . . . . . . . . . . . . 55
Topics in Vector Spaces, 55. Field, 55. Vector Space, 56. Examples of functionspaces, 57.
5.2 Lecture 15: September 23, 2013 . . . . . . . . . . . . . . . . . . . . . . . 58
The four subspaces of Am×n, 58.
5.3 Lecture 16: September 25, 2013 . . . . . . . . . . . . . . . . . . . . . . . 61
The Four Subspaces of A, 62. Linear Independence, 63.
5.4 Lecture 17: September 27, 2013 . . . . . . . . . . . . . . . . . . . . . . . 64
Linear functions (rev), 64. Review for exam, 64. Previous lecture continued, 65.
5.5 Lecture 18: October 2, 2013. . . . . . . . . . . . . . . . . . . . . . . . . . 66
Exams and Points, 66. Continuation of last lecture, 66.
6 Least Squares 69
6.1 Lecture 19: October 4, 2013. . . . . . . . . . . . . . . . . . . . . . . . . . 69
Least Squares, 69.
6.2 Lecture 20: October 7, 2013. . . . . . . . . . . . . . . . . . . . . . . . . . 70
Properties of Transpose Multiplication, 71. The Normal Equations, 71. Exam 1, 73.
6.3 Lecture 21: October 9, 2013. . . . . . . . . . . . . . . . . . . . . . . . . . 74
Exam Review, 74. Least squares and minimization, 74.
6.4 HW 4: Due October 21, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 76
iv
Nitsche and Benner Applied Matrix Theory
7 Linear Transformations 81
7.1 Lecture 22: October 14, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 81
Linear Transformations, 83. Examples of Linear Functions, 83. Matrix representationof linear transformations, 83.
7.2 Lecture 23: October 16, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 84
Basis of a linear transformation, 84. Action of linear transform, 87. Change of Basis, 88.
7.3 Lecture 24: October 21, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 89
Change of Basis (cont.), 89.
7.4 Lecture 25: October 23, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 91
Properties of Special Bases, 91. Invariant Subspaces, 93.
7.5 HW 5: Due November 4, 2013 . . . . . . . . . . . . . . . . . . . . . . . . 94
8 Norms 99
8.1 Lecture 26: October 25, 2013 . . . . . . . . . . . . . . . . . . . . . . . . . 99
Difinition of norms, 99. Vector Norms, 99. The two norm, 99. Matrix Norms, 101.Induced Norms, 102.
8.2 Lecture 27: October 28, 2013 . . . . . . . . . . . . . . . . . . . . . . . . .102
Matrix norms (review), 102. Frobenius Norm, 102. Induced Matrix Norms, 104.
8.3 Lecture 28: October 30, 2013 . . . . . . . . . . . . . . . . . . . . . . . . .106
The 2-norm, 106.
9 Orthogonalization with Projection and Rotation 109
9.1 Lecture 28 (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .109
Inner Product Spaces, 109.
9.2 Lecture 29: November 1, 2013 . . . . . . . . . . . . . . . . . . . . . . . .110
Inner Product Spaces, 110. Fourier Expansion, 111. Orthogonalization Process(Gramm-Schmidt), 111.
9.3 Lecture 30: November 4, 2013 . . . . . . . . . . . . . . . . . . . . . . . .112
Gramm–Schmidt Orthogonalization, 112.
9.4 Lecture 31: November 6, 2013 . . . . . . . . . . . . . . . . . . . . . . . .115
Unitary (orthogonal) matrices, 116. Rotation, 117. Reflection, 118.
9.5 HW 6: Due November 11, 2013 . . . . . . . . . . . . . . . . . . . . . . .118
9.6 Lecture 32: November 8, 2013 . . . . . . . . . . . . . . . . . . . . . . . .120
Elementary orthogonal projectors, 120. Elementary reflection, 121. ComplimentarySubspaces of V, 121. Projectors, 121.
9.7 Lecture 33: November 11, 2013. . . . . . . . . . . . . . . . . . . . . . . .122
Projectors, 122. Representation of a projector, 123.
9.8 Lecture 34: November 13, 2013. . . . . . . . . . . . . . . . . . . . . . . .124
Projectors, 124. Decompositions of Rn, 125. Range Nullspace decomposition ofAn×n, 126.
9.9 HW 7: Due November 22, 2013 . . . . . . . . . . . . . . . . . . . . . . .126
v
Nitsche and Benner Applied Matrix Theory
9.10 Lecture 35: November 15, 2013. . . . . . . . . . . . . . . . . . . . . . . .128Range Nullspace decomposition of An×n, 128. Corresponding factorization of A, 129.
10 Singular Value Decomposition 131
10.1 Lecture 35 (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .131Singular Value Decomposition, 131.
10.2 Lecture 36: November 18, 2013. . . . . . . . . . . . . . . . . . . . . . . .132Singular Value Decomposition, 132. Existence of the Singular Value Decomposition, 133.
10.3 Lecture 37: November 20, 2013. . . . . . . . . . . . . . . . . . . . . . . .136Review and correction from last time, 136. Singular Value Decomposition, 136. Geometricinterpretation, 138.
10.4 Lecture 38: November 22, 2013. . . . . . . . . . . . . . . . . . . . . . . .139Review for Exam 2, 139. Norms, 139. More major topics, 140.
10.5 HW 8: Due December 10, 2013 . . . . . . . . . . . . . . . . . . . . . . .142
10.6 Lecture 39: November 27, 2013. . . . . . . . . . . . . . . . . . . . . . . .144Singular Value Decomposition, 144. SVD in Matlab, 145.
11 Additional Topics 149
11.1 Lecture 39 (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .149The Determinant, 149.
11.2 Lecture 40: December 2, 2013 . . . . . . . . . . . . . . . . . . . . . . . .150Further details for class, 150. Diagonalizable Matrices, 150. Eigenvalues and eigenvec-tors, 150.
Index 155
Other Contents 157
vi
UNIT 1
Introduction to Linear Algebra
1.1 Lecture 1: August 19, 2013
About the class
The textbook for the class will be Matrix Analysis and Applied Linear Algebra by Meyer.Another highly recommended text is Laub’s Matrix Analysis for Scientists and Engineers.
Linear Systems
A linear system may be of the general form
Ax = b. (1.1.1)
This may be represented in several equivalent ways.
2x1 + x2 − 3x3 = 18, (1.1.2a)
−4x1 + 5x3 = −28, (1.1.2b)
6x1 + 13x2 = 37. (1.1.2c)
This also may be put in matrix form 2 1 −3−4 0 5
6 13 0
x1
x2
x3
=
18−28
37
. (1.1.3)
Finally, a the third common form is vector form: 2−4
6
x1 +
10
13
x2 +
−350
x3 =
18−28
37
. (1.1.4)
1
Nitsche and Benner Unit 1. Introduction to Linear Algebra
t
y
t0 t1 t2 t3 · · · tn
y(t)
Figure 1.1. Finite difference approximation of a 1D boundary value problem.
Example: Application to boundary value problem
We will use finite difference approximations on a rectangular grid to solve the system,
− y′′(t) = f(t), for t ∈ [0, 1], (1.1.5)
with the boundary conditions
y(0) = 0, (1.1.6a)
y(1) = 0. (1.1.6b)
This is a 1D version of the general Laplace equation represented by,
−∆u = f (1.1.7)
or in more engineering/science form
−∇2u = f. (1.1.8)
The Laplace operator in cartesian coordinates,
∇2u =∇ · (∇u), (1.1.9a)
= uxx + uyy + uzz. (1.1.9b)
Finite Difference Approximation
Let tj = j∆t, with j = 0, . . . , N . The approximate forms of the solution yj ≈ y(tj).Now we need to approximate the derivatives with discrete values of the variables. The
forward difference approximation is
y′(tj) =yj+1 − yjtj+1 − tj
, (1.1.10)
or
y′(tj) =yj+1 − yj
∆t, (1.1.11)
2
1.1. Lecture 1: August 19, 2013 Applied Matrix Theory
The backward difference approximation is
y′(tj) =yj − yj−1
∆t. (1.1.12)
The centered difference approximation is
y′(tj) =yj+1 − yj−1
2∆t. (1.1.13)
Each of these are useful approximations to the first derivative that have varying propertieswhen applied to specific differential equations.
The second derivative may be approximated by combining the approximations of the firstderivative
(y′)′(tj) ≈y′j+ 1
2
− y′j− 1
2
∆t, (1.1.14a)
=yj+1−yj
∆t− yj−yj−1
∆t
∆t, (1.1.14b)
=yj+1 − 2yj + yj−1
∆t2. (1.1.14c)
Analysis of error
To understand the error of this approximation we may utilize the Taylor series . A generalTaylor series is
f(x) = f(a) + f ′(a)(x− a) +1
2f ′′(a)(x− a)2 +
1
3!f ′′′(a)(x− a)3 + · · · (1.1.15)
By the Taylor remainder theorem, we may approximate the error with a special truncationof the series,
f(x) = f(a) + f ′(a)(x− a) +1
2f ′′(a)(x− a)2 +
1
3!f ′′′(ξ)(x− a)3, (1.1.16)
or simply
f(x) = f(a) + f ′(a)(x− a) +1
2f ′′(a)(x− a)2 +O
((x− a)3
). (1.1.17)
The difference we are interested in to find the error is,
E = y′′(tj)−y(tj+1)− 2y(tj) + y(tj−1)
∆t2(1.1.18)
The Taylor series,
y(tj+1) = y(tj + ∆t) = y(tj) + y′(tj)∆t+O(∆t2), (1.1.19a)
y(tj−1) = y(tj −∆t) = y(tj)− y′(tj)∆t+O(∆t2)
(1.1.19b)
will need to be substituted.A function g is said to be order 2, or g = O(h2), if,
|g| ≤ Ch2. (1.1.20)
3
Nitsche and Benner Unit 1. Introduction to Linear Algebra
Solution of the discretized equation
We now substitute the discrete difference,
− yj+1 − 2yj + yj−1
∆t2= f(tj), for j = 1, . . . , n− 1 (1.1.21)
and the boundary conditions become
y0 = 0, (1.1.22a)
yn = 0. (1.1.22b)
This gives the linear system which will need to be solved for the unknowns yi.2 −1 0 · · · 0
−1 2 −1. . .
...
0 −1 2. . . 0
.... . . . . . . . . −1
0 · · · 0 −1 2
y1
y2...
yn−2
yn−1
= ∆t2
f(t1)f(t2)
...f(tn−2)f(tn−1)
. (1.1.23)
4
UNIT 2
Matrix Inversion
2.1 Lecture 2: August 21, 2013
Previously we came up with a tridiagonal system for finite difference solution last time.
Gaussian Elimination
We want to solve Ax = b. Claim: Gaussian elimination: A = LU
Notation:
A = [aij] (2.1.1)
Lower triangular system Lx = b. In class we use underlines to indicate the vector. Ingeneral these vectors are column vectors, and we will use xᵀ to indicate the row vector.
Lower triangular system Lx = b`11 0 0 0`21 `22 0 · · · 0`31 `32 `21 0
.... . .
...`n1 `n2 `n3 · · · `nn
x1
...
xn
=
b1
...
bn
(2.1.2)
or
`11x1 = b1 (2.1.3a)
`21x1 + `22x2 = b2 (2.1.3b)
· · · (2.1.3c)
`n1x1 + `n2x2 + · · ·+ `nnxn = bn (2.1.3d)
5
Nitsche and Benner Unit 2. Matrix Inversion
Rearranging to solve the equations,
x1 =b1
`11
(2.1.4a)
x2 =b2 − `21x1
`22
(2.1.4b)
· · · (2.1.4c)
xi =bi −
(`i(i−1)xi−1 + · · ·+ `i1x1
)`ii
(2.1.4d)
The basic algorithm for solution of the above system in pseudo code follows:
1: x1 ← b1/`11
2: for i← 2, n do3: xi ← [bi −
∑i−1k=1 `ikxk]/`ii
4: end for
The operation count, Nops, becomes,
Nops = 1 +n∑i=2
[1︸︷︷︸
division
+ 1︸︷︷︸substitution
+ (i− 1)︸ ︷︷ ︸multiplication
+ (i− 2)︸ ︷︷ ︸addition
]. (2.1.5)
Each of the terms arise directly from the steps of the algorithm shown above.
ASIDE: Finite sums
We need the following sums for our derivations of the operation counts,
n∑i=1
i =n(n+ 1)
2, (2.1.6)
n∑i=1
i2 =n(n+ 1)(2n+ 1)
6. (2.1.7)
Evaluating the operation count,
Nops = 1 +n∑i=2
(2i− 1), (2.1.8a)
=n∑i=1
(2i− 1), (2.1.8b)
= 2
(n∑i=1
i
)− n, (2.1.8c)
= n(n+ 1)− n, (2.1.8d)
= n2. (2.1.8e)
6
2.1. Lecture 2: August 21, 2013 Applied Matrix Theory
Implementation of lower triangular solution in Matlab
We give a Matlab code for this solution,
1 function x = L t r i s o l (L , b)2 % so l v e $Lx = b$ , assuming $L { i i } \ne 0$3 n = length (b ) ;4 % i n i t i a l i z e the s i z e o f your v e c t o r s5 x1 = b (1)/ l ( 1 , 1 ) ;6 for i = 2 : n7 x ( i ) = b( i ) ;8 for k = 1 : i−19 x ( i ) = x ( i ) − l ( i , k ) ∗ x ( k ) ;
10 end11 end12 %13 end
This would be saved as the code Ltrisol.m and would be run as
>> L = ...; b = ...;
>> x = Ltrisol(L, b)
Warning: Matlab loops are very slow!
Inner-product based implementation
How do we re-write the code as inner products?We can reorder the second for-loop so that it is simply an inner-product,
1 function x = L t r i s o l (L , b)2 % so l v e $Lx = b$ , assuming $L { i i } \ne 0$3 n = length (b ) ;4 % i n i t i a l i z e the s i z e o f your v e c t o r s5 x1 = b (1)/ l ( 1 , 1 ) ;6 for i = 2 : n7 x ( i ) = (b( i ) − l ( i , 1 : i −1)∗x ( 1 : i −1))/ l ( i , i ) ;8 end9 %
10 end
Note that the l(i,1:i-1) term is a row vector and x(1:i-1) is a column vector so thiscode will work fine. Recall that this required that x be initialized as a column vector. Theinner part can also be rewritten more cleanly as,
1 function x = L t r i s o l (L , b)2 % so l v e $Lx = b$ , assuming $L { i i } \ne 0$3 n = length (b ) ;4 % i n i t i a l i z e the s i z e o f your v e c t o r s5 x1 = b (1)/ l ( 1 , 1 ) ;6 for i = 2 : n7 k = 1 : i −1;8 x ( i ) = (b( i ) − l ( i , k )∗x ( k ) )/ l ( i , i ) ;9 end
7
Nitsche and Benner Unit 2. Matrix Inversion
10 %11 end
Office hours and other class notes
Office hours will be from 12–1 on MWF, the web address is, www.math.unm.edu/~nitsche/math464.html.
Example: Gauss Elimination
Example:
2x1 − x2 + 3x3 = 13 (2.1.9a)
−4x1 + 6x2 − 5x3 = −28 (2.1.9b)
6x1 + 13x2 − 16x3 = 37 (2.1.9c)
Let’s perform each step in full equation form. So we execute the steps R2 → R2 − (−2)R1
and R3 → R3 − (−3)R1.
2x1 − x2 + 3x3 = 13 (2.1.10a)
4x2 + x3 = −2 (2.1.10b)
16x2 + 7x3 = −2 (2.1.10c)
Next step will be R3 → R3 − (4)R2.
2.2 Lecture 3: August 23, 2013
Example: Gauss Elimination, cont.
Example:
2x1 − x2 + 3x3 = 13 (2.2.1a)
−4x1 + 6x2 − 5x3 = −28 (2.2.1b)
6x1 + 13x2 − 16x3 = 37 (2.2.1c)
Let’s perform each step in full equation form. So we execute the steps R2 → R2 − (−2)R1
and R3 → R3 − (−3)R1.
2x1 − x2 + 3x3 = 13 (2.2.2a)
4x2 + x3 = −2 (2.2.2b)
16x2 + 7x3 = −2 (2.2.2c)
8
2.2. Lecture 3: August 23, 2013 Applied Matrix Theory
Next step will be R3 → R3 − (4)R2.
2x1 − x2 + 3x3 = 13 (2.2.3a)
4x2 + x3 = −2 (2.2.3b)
3x3 = 6 (2.2.3c)
Now we begin the backward substitution.
x3 = 2; (2.2.4a)
x2 = (−2− x3)/4, (2.2.4b)
= −1; (2.2.4c)
x1 = (13 + x2 − 3x3)/2, (2.2.4d)
= 3. (2.2.4e)
Gauss Elimination is forward elimination and backward substitution. Now we will do thesame problem in matrix form, 2 −1 3 13
−4 6 −5 −286 13 16 37
→ 2 −1 3 13
0 4 1 −20 16 7 −2
, (2.2.5a)
→
2 −1 3 130 4 1 −20 0 3 6
. (2.2.5b)
Operation Cost of Forward Elimination
Now we want to know the operation count for the forward elimination step when we takeA→ U without pivoting for a general n× n matrix, A = [aij]. As an example of each step:
a11 a12 a13 a14 a15
a21 a22 a23 a24 a25
a31 a32 a33 a34 a35
a41 a42 a43 a44 a45
a51 a52 a53 a54 a55
→a11 a12 a13 a14 a15
0 a′22 a′23 a′24 a′25
0 a′32 a′33 a′34 a′35
0 a′42 a′43 a′44 a′45
0 a′52 a′53 a′54 a′55
(2.2.6a)
These operations are given by, rowj → rowj − `ijrowi, where `ij =aijaii
if aii 6= 0 (aii should
not be close to zero or we will need to use pivoting). An example, a1j → aij − a1ja11a1j = 0.
The next step,
→
a11 a12 a13 a14 a15
0 a′22 a′23 a′24 a′25
0 0 a′′33 a′′34 a′′35
0 0 a′′43 a′′44 a′′45
0 0 a′′53 a′′54 a′′55
(2.2.6b)
9
Nitsche and Benner Unit 2. Matrix Inversion
t
y
t0 t1 t2 t3 · · · tn
y(t)
(a) n grid
t
y
t0 t2 t4 t6 t8 t10 t12 t14 t16 · · · t4n
y(t)
(b) 4n grid
Figure 2.1. One-dimensional discrete grids.
At ith step (i = 1 : n− 1),B(n−i)×(n−i) → B(n−i)×(n−i), (2.2.7)
the cost of the individual step: n− i︸ ︷︷ ︸comp `ij
+ 2(n− i)2︸ ︷︷ ︸comp aij
. The total cost is thus,
Nops =n−1∑i=1
[(n− i) + 2(n− 1)2
](2.2.8a)
Let k = n− i then i = 1→ k = n− 1 and i = n− 1→ k = n− (n− 1) = 1
=1∑
k=n−1
(k + 2k2), (2.2.8b)
=(n− 1)n
2︸ ︷︷ ︸O(n2)
+2(n− 1)n(2(n− 1) + 1)
6︸ ︷︷ ︸O(n3)
, (2.2.8c)
≈ O(n3). (2.2.8d)
This means that the problem scales with order 3.
Cost of the Order of an Algorithm
For an order 3 algorithm, if you increase the size of your matrix by a factor of 2, the expenseof computer time will increase by a factor of 8. Similarly, if it took one day to solve aboundary value problem in 1D with n = 1000, then it will take 64 days to do n = 4000 (seefigure 2.1).
Alternatively, if you are doing a 2D simulation, increasing by a factor of 4, as shown infigure 2.2, would increase the domain to 16 and thus the calculations would increase to 163.This gets very expensive! This is one of the reasons that models of phenomena such as theweather is very difficult.
10
2.2. Lecture 3: August 23, 2013 Applied Matrix Theory
x
y
x0 xny0
yn
(a) n× n grid
x
y
x0 x4n
y0
y4n
(b) 4n× 4n grid
Figure 2.2. Two-dimensional discrete grids.
Validation of Lower/Upper Triangular Form
Consider that we have the Gaussian Elimination with A = LU, where
L =
(1 0`ij 1
). (2.2.9)
Check our previous system: 2 −1 3−4 6 −5
6 13 16
=
1 0 0−2 1 0
3 4 1
2 −1 30 4 10 0 3
. (2.2.10)
This works!
Theoretical derivation of Lower/Upper Form
We want to show that Gauss elimination naturally leads to the LU form using elementaryrow operations. The three elementary operations are:
1. Multiply row by α;
2. Switch rowi and rowj;
3. Add multiple of rowi to rowj.
All are equivalent to pre-multiplying A by an elementary matrix. Let’s illustrate these:
11
Nitsche and Benner Unit 2. Matrix Inversion
1. Multiply by α.1 0 0 00 1 0 · · · 00 0 α 0
......
0 0 0 · · · 1
︸ ︷︷ ︸
Ei
a11 a12 a13 a1n
a21 a22 a23 · · · a2n
a31 a32 a33 a3n...
. . ....
an1 an2 an3 · · · ann
=
a11 a12 a13 a1n
a21 a22 a23 · · · a2n
αa31 αa32 αa33 αa3n...
. . ....
an1 an2 an3 · · · ann
(2.2.11a)
2.3 Homework Assignment 1: Due Friday, August 30,
2013
1. Use Taylor series expansions of f(x± h) about x to show that
f ′′(x) =f(x+ h)− 2f(x) + f(x− h)
h2− h2
12f (4)(x) +O
(h4). (2.3.1)
2. Consider the two-point boundary value problem
y′′(x) = ex, y(−1) =1
e, y(1) = e (2.3.2)
where x ∈ [−1, 1], Divide the interval [−1, 1] into N equal subintervals and applythe finite difference method presented in class to find the approximate the solutionyj ≈ y(xj) at theN−1 interior points j = 1, . . . , N−1, where xj = a+jh, h = (b−a)/N ,and [a, b] = [−1, 1]. Compare the approximate values at the grid points with the exactsolution at the grid points. Use N = 2, 4, 8, . . . , 29 and report the maximal absoluteerror for each N in a table. Your writeup should contain:
• the Matlab code;
• a table with two columns. The first contains h, the second contains the corre-sponding maximal errors. By how much is the error reduced every time N isdoubled? Can you conclude whether the error is O(h), O(h2) or O(hp) for someother integer p?
Regarding Matlab: If needed, go over the Matlab tutorial on the course website,items 1–6. This covers more than you need for this problem. In Matlab, type
help diag or help ones
to find what these commands do. The (N−1)×(N−1) matrix with 2s on the diagonaland –1 on the off-diagonals can be constructed by
v=ones(1,n-1);
A=2*diag(v)-diag(v(1:n-2),1)-diag(v(1:n-2),-1);
12
2.3. HW 1: Due August 30, 2013 Applied Matrix Theory
The system Ax = b can be solved in Matlab by x = A\b. The maximal differencebetween two vectors x and y is error=max(abs(x-y)). Your code should have thefollowing structure
Listing 2.1. code stub for tridiagonal solver
1 disp ( sprintf ( h e r r o r )2 a = . . . ; b = . . . ; % Set va l u e s o f endpo in t s3 ya = . . . ; yb = . . . ; % Set va l u e s o f y at the endpo in t s4 for n = . . . ;5 h=2/n ;6 x=a : h : b ;7 % Set matrix A of the l i n e a r system to be so l v ed .8 v=ones (1 , n−1);9 A=2∗diag ( v)−diag ( v ( 1 : n−2) ,1)−diag ( v ( 1 : n−2) ,−1);
10 % Set r i g h t hand s i d e o f l i n e a r system .11 rhs = . . .12 % Solve l i n e a r system to f i nd approximate s o l u t i o n .13 y ( 2 : n)=A\ r h s ; y(1)=ya ; y (n+1)=yb ;14 % Compute exac t s o l u t i o n and approximation error15 yex = . . . % se t exac t s o l u t i o n16 plot (x , y , b− , x , yex , r − ) % to compare v i s u a l l y17 error=max(abs (y−yex ) )18 disp ( sprintf ( %15.10 f %20.15 f , h , e r ror ) )19 end
Note that in Matlab the index of all vectors starts with 1. Thus, x=-1:h:1, is avector of length n+ 1 and the interior points are x(2:n).
3. Let U be an upper triangular n× n matrix with nonzero entries uij, j ≥ i.
(a) Write an algorithm that solves Ux = b for a given right hand side b for theunknown x.
(b) Find the number of operations that it takes to solve for x, using your algorithmabove.
(c) Write a Matlab function function x=utrisol(u,b) that implements your al-gorithm and returns the solution x.
4. Given A,b below,
(a) find the LU factorization of A (using the Gauss Elimination algorithm);
(b) use it to solve Ax = b.
A =
2 −1 0 0−1 2 −1 0
0 −1 2 −10 0 −1 2
, b =
0005
. (2.3.3)
5. Sparsity of L and U, given sparsity of A = LU. If A, B, C, D have non-zeros in thepositions marked by x, which zeros (marked by 0) are still guaranteed to be zero in
13
Nitsche and Benner Unit 2. Matrix Inversion
their factors L and U? (B,C,D are all band matrices with p = 3 bands, but differingsparsity within the bands. The question is how much of this sparsity is preserved.) Ineach case, highlight the new nonzero entries in L and U.
A =
x x x xx x x 00 x x x0 0 x x
, B =
x 0 x 0 0 00 x 0 x 0 0x 0 x 0 x 00 x 0 x 0 00 0 x 0 x 00 0 0 x 0 x
,
C =
x x x 0 0 00 x 0 x 0 0x 0 x 0 x 00 x 0 x 0 00 0 x 0 x 00 0 0 x 0 x
, D =
x 0 0 x 0 00 x 0 0 x 0x 0 x 0 0 x0 x 0 x 0 00 0 x 0 x 00 0 0 x 0 x
,
6. Consider solving a differential equation in a unit cube, using N points to discretize eachdimension. That is, you have a total of N3 points at which you want to approximatethe solution. Suppose that at each time step, you need to solve a linear system Ax = b,where A is an N3×N3 matrix, which you solve using Gauss Elimination, and supposethere are no other computations involved. Assume your personal computer runs at 1GigaFLOPS, that is, it executes 109 floating point operations per second.
(a) How much time does it take to solve your problem for N = 500 for 1000 timesteps?
(b) When you double the number of points N , you typically also have to halve thetimestep, that is, double the total number of timesteps taken. By what factordoes the runtime increase each time you double N?
(c) How much time will it take to solve the problem if you use N = 2000?
14
UNIT 3
Factorization
3.1 Lecture 4: August 26, 2013
For the h in the homework, for n = 2.^(1:1:10). We want to deduce the order of themethod from the table of h and the error.
Elementary Matrices
1. Multiply rowi by α:
E1 =
1 0 0 0 0
0 . . . 0 0 0
0 0 α 0 0
0 0 0 . . . 0
0 0 0 0 1
. (3.1.1)
The inverse is,
E−11 =
1 0 0 0 0
0 . . . 0 0 0
0 0 1α
0 0
0 0 0 . . . 0
0 0 0 0 1
. (3.1.2)
E1E−11 = I (3.1.3)
2. Exchange rowi and rowj:
E2 =
1 0 0 0 0 00 1 0 0 0 00 0 0 1 0 00 0 1 0 0 00 0 0 0 1 00 0 0 0 0 1
. (3.1.4)
15
Nitsche and Benner Unit 3. Factorization
E22 = I (3.1.5)
3. Replace rowj by rowj + αrowi.
E3 =
1 0 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 α 1 0 00 0 0 0 1 00 0 0 0 0 1
. (3.1.6)
E−13 =
1 0 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 −α 1 0 00 0 0 0 1 00 0 0 0 0 1
. (3.1.7)
What happens if we post-multiply by the elementary matrices? The matrices will act onthe columns instead of the rows.
AE1 =
a11 a12 a13 a1n
a21 a22 a23 · · · a2n
a31 a32 a33 a3n...
. . ....
an1 an2 an3 · · · an
1 0 0 0 0
0 . . . 0 0 0
0 0 α 0 0
0 0 0 . . . 0
0 0 0 0 1
=
a11 a12 αa13 a1n
a21 a22 αa23 · · · a2n
a31 a32 αa33 a3n...
. . ....
an1 an2α an3 · · · an
(3.1.8)
AE2 =
a11 a12 a13 a1n
a21 a22 a23 · · · a2n
a31 a32 a33 a3n...
. . ....
an1 an2 an3 · · · an
1 0 0 0 0 00 1 0 0 0 00 0 0 1 0 00 0 1 0 0 00 0 0 0 1 00 0 0 0 0 1
(3.1.9)
Gaussian Elimination without pivoting
Premultiply by elementary matrices type 3 repeatedly.
`ji =ajiaii, for j > i (3.1.10)
E−21A =
x x x x x0 x x x xx x x x xx x x x xx x x x x
(3.1.11)
16
3.1. Lecture 4: August 26, 2013 Applied Matrix Theory
E−31E−21A =
x x x x x0 x x x x0 x x x xx x x x xx x x x x
(3.1.12)
This sequence continues until we have introduced zeros to get the lower diagonal:
E−n,n−1 · · ·E−n1 · · ·E−31E−21A =
x x x x x0 x x x x0 0 x x x0 0 0 x x0 0 0 0 x
= U (3.1.13)
Thus,A = E21E31 · · ·En−1,n−2En,n−2En,n−1︸ ︷︷ ︸
L
U (3.1.14)
E21E31 =
1 0 0 0 0 0`21 1 0 0 0 00 0 1 0 0 00 0 0 1 0 00 0 0 0 1 00 0 0 0 0 1
1 0 0 0 0 00 1 0 0 0 0`31 0 1 0 0 00 0 0 1 0 00 0 0 0 1 00 0 0 0 0 1
=
1 0 0 0 0 0`21 1 0 0 0 0`31 0 1 0 0 00 0 0 1 0 00 0 0 0 1 00 0 0 0 0 1
. (3.1.15)
Which extends to
E1 = En1 · · ·E21E31 =
1 0 0 0 0 0`21 1 0 0 0 0`31 0 1 0 0 0... 0 0
. . . 0 0`n−1,1 0 0 0 1 0`n1 0 0 0 0 1
. (3.1.16)
This further extends to,
E1E2 =
1 0 0 0 0 0`21 1 0 0 0 0`31 `32 1 0 0 0...
... 0. . . 0 0
`n−1,1 `n−1,2 0 0 1 0`n1 `n2 0 0 0 1
. (3.1.17)
Finally we get that
E1E2 · · · En−1 =
1 0 0 0 0 0`21 1 0 0 0 0`31 `32 1 0 0 0...
.... . . . . . 0 0
`n−1,1 `n−1,2 · · · `n−1,n−2 1 0`n1 `n2 · · · `n,n−2 `n,n−1 1
. (3.1.18)
17
Nitsche and Benner Unit 3. Factorization
Solution of Matrix using the Lower/Upper factorization
To use A = LU to solve Ax = b.
1. Find L,U (number of operations: 23n3)
2. L(Ux) = b First solve Ly = b (number of operations: n2), then solve, Ux = y(number of operations: n2).
Example:
To solve Ax = b_k k= 1,10^4
%
Find L, U once O(2/3 n^3)
then solve
L y = b
U x = y
10,000 times
O(10,000 * n^2 * 2)
Sparse and Banded Matrices
Given
A =
x 0 0 0 0
0 . . . 0 0 0
0 0 x 0 0
0 0 0 . . . 0
0 0 0 0 x
(3.1.19)
the bandwidth is 1. Below,
A =
x x 0 0 0 0x x x 0 0 00 x x x 0 00 0 x x x 00 0 0 x x x0 0 0 0 x x
, (3.1.20)
the bandwidth is 3—this is a tridiagonal matrix . This type of matrix maintains it’s sparsitywhen it undergoes LU decomposition.
x x 0 0 0 0x x x 0 0 00 x x x 0 00 0 x x x 00 0 0 x x x0 0 0 0 x x
=
1 0 0 0 0 0x 1 0 0 0 00 x 1 0 0 00 0 x 1 0 00 0 0 x 1 00 0 0 0 x 1
x x 0 0 0 00 x x 0 0 00 0 x x 0 00 0 0 x x 00 0 0 0 x x0 0 0 0 0 x
. (3.1.21)
18
3.2. Lecture 5: August 28, 2013 Applied Matrix Theory
Motivation for Gauss Elimination with Pivoting
When does Gauss elimination give us a problem? For example
1.
(0 11 1
)
2. A =
(δ 11 1
). Solve Ax =
(1 + δ
2
), the exact solution is
(11
). However, we run into
numerical problems.
3.2 Lecture 5: August 28, 2013
Motivation for Gauss Elimination with Pivoting, cont.
When does Gauss elimination give us a problem? Returning to the example problem, A =(δ 11 1
). Solve Ax =
(1 + δ
2
), the exact solution is
(11
), but we run into numerical
problems.There are a couple approaches to this problem. First, solve for x by first finding L,U
and using them numerically,
A =
(δ 11 1
)→(δ 10 1− 1
δ
)= U (3.2.1)
and
L =
(1δ
10 1
)(3.2.2)
Now we want to solve L (Ux) = b
1 for j =1:162 d e l t a = 10ˆ(− j ) ;3 b = [ 1 + del ta , 2 ] ;4 L = [ 1 , 0 ; 1/ de l ta , 1 ] ;5 U = [ de l ta , 1 ; 0 , 1−1/ de l t a ] ;6 % Solve Ly = b \ to y7 y (1 ) = b ( 1 ) ; y (2 ) = b (2) − L(2 ,1 )∗ y ( 1 ) ;8 % Solve Ux = y \ to x9 x (2 ) = y (2)/ u ( 2 , 2 ) ; x (1 ) = ( y (1 ) − u (1 ,2 )∗ x ( 2 ) ) / u ( 1 , 1 ) ;
10 %11 disp ( sprintf ( ’ %5.0 e %20.15 f %20.15 f %10.8 e ’ , de l ta , x ( 1 ) , x ( 2 ) ,norm(x− [ 1 , 1 ] ) ) ;12 end
Note that the norm is the Euclidian norm, x − [1, 1] =√
(x(1)− 1)2 + (x(2)− 1)2 . Thisgives us a table of results as shown below
Conclusion: Ax = b is a good problem (well-posed) introducing small perturbations(e.g., by roundoff) does not change the solution by much. Matlab’s algorithm A\b is agood algorithm (stable); LU decomposition does not give a good algorithm (unstable).
19
Nitsche and Benner Unit 3. Factorization
Table 3.1. Variation of error with the perturbation variable
δ x(1) x(2) ||x− [1, 1]||21e-01 1.000 1.000 8e-161e-02 1.000 1.000 1e-131e-03 0.999 1.000 6e-121e-04 1.000. . . 28 1.000 e-111e-05 . . . 1.000 e-10. . . . . . . . . . . .1e-16 0.888 1.000 e-0
Discussion of well-posedness
Geometrically, Ax = b,
δx1 + x2 = 1 + δ, (3.2.3a)
x1 + x2 = 2. (3.2.3b)
This is a well-posed system. Rearranging
x2 ≈ 1− δx1, x2 = 2− x1. (3.2.4a)
Our other system Ly = b,
y1 = 1 (3.2.5a)
1
δy1 + y2 = 2 (3.2.5b)
This makes a very ill-posed system because small wiggles in δ give much larger errors becausethe slopes are so near each other.
Now we consider Ux = y,
δx1 + x2 = 1, (3.2.6a)(1− 1
δ
)x2 = y2. (3.2.6b)
This is also ill-posed as well. All of these linear problems are illustrated in figure 3.1.
20
3.2. Lecture 5: August 28, 2013 Applied Matrix Theory
x1
x2
(1, 1)
(a) Ax = b
x1
x2
(b) Ly = b
x1
x2
(c) Ux = y
Figure 3.1. Plot of linear problems and their solutions.
Gaussian elimination with pivoting
Pivoting means we exchange rows such that the current |aii| = maxj≥i|aji|. Similarly, `ji =
ajiaii≤ 1 for all j > i. Now,[
δ 1 1 + δ1 1 2
]→[
1 1 2δ 1 1 + δ
](3.2.7a)
R2←R2−δR1−−−−−−−→[
1 1 20 1− δ 1 + δ − 2δ
](3.2.7b)
→[
1 1 20 1− δ 1− δ
](3.2.7c)
PLU always works. Theorem: Gaussian elimination with pivoting yields PA = LU. Thepermutation matrix is P. Every matrix has a PLU factorization.
To do the pivoting, at each step, first premultiply A by
Pk =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 0 1 0 00 0 1 0 0 0
0 0 0 0. . . 0
0 0 0 0 0 1
(3.2.8)
then premultiply by
Lk =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 1 1 0 00 0 `k−1,k 1 0 0
0 0... 0
. . . 00 0 `n,k 0 0 1
(3.2.9)
21
Nitsche and Benner Unit 3. Factorization
We do this in succession,
Ln−1Pn−1 · · ·L2P2L1P1A = U (3.2.10)
How do these commute into a useful P and L matrix?
3.3 Lecture 6: August 30, 2013
Discussion of HW problem 2
− yj−1 + 2yj − yj+1 = h2f(xj), for j = 1, . . . , n− 1. (3.3.1)
2 −1 0 · · · 0
−1 2 −1. . .
...
0 −1 2. . . 0
.... . . . . . . . . −1
0 · · · 0 −1 2
y1
y2...
yn−2
yn−1
= h2
f(t1) + y0
f(t2)...
f(tn−2)f(tn−1) + yn
. (3.3.2)
So we’ve set up our matrix
rhs = matrix of zeros size \(1 \times n-1\)
for
A_{(n-1)x(n-1)}
x = a:h:b = linspace(a,b,n+1)
rhs = h^2*f(x(2:n));
rhs(1) = rhs(1) + ya;
rhs(n-1) = rhs(n-1) + yb;
Recall that our f(x) = −ex:
− y′′ = −ex (3.3.3)
PLU factorization
For PLU factorization, we are doing Gauss elimination with pivoting. At each kth stepof Gaussian elimination, switch rows so that the pivots, a
(k)kk , are the largest number by
magnitude in the kth column.
For example, 1 −1 3−1 0 −2
2 2 4
x1
x2
x3
=
−310
. (3.3.4)
22
3.3. Lecture 6: August 30, 2013 Applied Matrix Theory
or 1 −1 3 −3−1 0 −2 1
2 2 4 0
→ 2 2 4 0−1 0 −2 1
1 −1 3 −3
, row1 ↔ row3 (3.3.5a)
→
2 2 4 00 1 −0 10 −2 1 −3
, row2 ← row2 −1
3row1, and row3 ← row3 −
1
2row1
(3.3.5b)
→
2 2 4 00 −2 1 −30 1 −0 1
, row2 ↔ row3 (3.3.5c)
→
2 2 4 00 −2 1 −30 0 1/2 −1/2
, row3 ← row3 −(−1
2
)row2
(3.3.5d)
We need to do the back substitution to solve this system. But more importantly, we wantto know what the factorization of this system would be. Recall,
Lk =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 1 0 0 00 0 `k−1,k 1 0 0
0 0... 0
. . . 00 0 `n,k 0 0 1
, (3.3.6)
and
L−(n−1)Pn−1 · · ·L−2P2L−1P1A = U. (3.3.7)
Reordering,
Pn−1 · · ·L−2P2L−1P1A = L(n−1)U. (3.3.8)
We want to move each P to be right next to A and all the Ls such that we can form a trueL. Claim,
PjL−k = L−kPj, j > k. (3.3.9)
Pj permutation moves columns below the kth row. This allows us to move L’s out.
PjL−kPj = L−k (3.3.10a)
L−n · · · ˜L−1Pn−1 · · ·P1A = U (3.3.11)
23
Nitsche and Benner Unit 3. Factorization
Now we can return to our example but with keeping track of the 1 −1 3 −3−1 0 −2 1
2 2 4 0
→ 2 2 4 0−1 0 −2 1
1 −1 3 −3
, row1 ↔ row3,P1 =
0 0 10 1 01 0 0
(3.3.12a)
→
2 2 4 0
−12
1 −0 1
12−2 1 −3
, row2 ← row2 −(−1
2
)row1, row3 ← row3 −
1
2row1
(3.3.12b)
→
2 2 4 0
12−2 1 −3
−12
1 −0 1
, row2 ↔ row3,P2 =
0 0 11 0 00 1 0
(3.3.12c)
→
2 2 4 0
12
−2 1 −3
−12
−12
1/2 −1/2
, row3 ← row3 −(−1
2
)row2
(3.3.12d)
Because P = P−1, we should remember that,
PA = LU (3.3.13a)
A = PLU. (3.3.13b)
3.4 Lecture 7: September 4, 2013
PLU Factorization
RecallPA = LU (3.4.1)
always exists by construction. This is because we can make anything non-zero by the per-mutation. This is also equivalent to,
A = PLU (3.4.2)
because P = P−1. To use this in an actual solution,
PAx = Pb, (3.4.3)
orLUx = Pb, (3.4.4)
So this system is determined by:
24
3.4. Lecture 7: September 4, 2013 Applied Matrix Theory
1. Solving Ly = Pb,
2. Solving Ux = y.
In Matlab, we would use the commands [L,U,P] = lu(A), to find these three matrices.This factorization is not unique. We want to show the uniqueness of the LU factorization,and are also interested in when it exists.
Triangular Matrices
We are interested in the determinants of lower or upper triangular matrices. Let’s discussdet(L).
L =
`11 0 0 0 0...
. . . 0 0 0`i1 · · · `jj 0 0... · · · ...
. . . 0`n1 · · · `nj . . . `nn
(3.4.5)
the determinant is det(L) =∏n
i=1 `ii. Thus L is invertible only if `ii 6= 0 for all `ii. Weconjecture the product of two lower triangular matrices will give us lower a triangular matrix.e.g.
L1L2 = L12 (3.4.6)
We want to prove this!
Multiplication of lower triangular matrices
Prove that L1L2 is lower triangular. Assume AB are lower triangular. Show C = AB islower triangular. We know that bijaij = 0 for j > i. In our proof, we first consider matrixmultiplication.
eij =∑
aikbkj. (3.4.7)
We know that aik = 0 for k > i, and bkj = 0 for j > k. If j > i, then when k < i we havethat k < j so bkj = 0. Alternatively, if k > i then aik = 0. Thus, in either case one of thetwo products is zero and we have proved our hypothesis.
Inverse of a lower triangular matrix
A lower triangular matrix’s inverse is also a lower triangular matrix;
L−1 =
`11 · · · 0...
. . ....
`n1 · · · `nn
= Lower triangular (3.4.8)
So, this helps with inversion of the form,
L−n · · ·L−2L−1A = U. (3.4.9)
25
Nitsche and Benner Unit 3. Factorization
For matrixes of the form
L−k =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 1 0 0 00 0 −`ij 1 0 0
0 0... 0
. . . 00 0 −`nj 0 0 1
; (3.4.10)
the inverse matrix is
Lk =
1 0 0 0 0 0
0. . . 0 0 0 0
0 0 1 0 0 00 0 `ij 1 0 0
0 0... 0
. . . 00 0 `nj 0 0 1
. (3.4.11)
For any
Lk =
1 0 0 0 0...
. . . 0 0 00 · · · 1 0 0... 0
.... . . 0
0 · · · `in . . . 1
`11 0 0 0 0...
. . . 0 0 00 · · · `ii 0 0... 0
.... . . 0
0 · · · 0 . . . `nn
(3.4.12)
To find L−1, [L I]GE−−→ [I L−1]. Use Gaussian elimination on L, and we go through each
column.
Uniqueness of LU factorization
Theorem: If A is such that no non-zero pivots are encountered, then A = LU with `ii = 1and uii 6= 0, which are the pivots. For, `ij =
aijaii
for j < i by construction.Proof: Assume A = L1U1 = L2U2, then
L−12 L1U1 = U2, (3.4.13a)
L−12 L1 = U2U
−11 (3.4.13b)
= diagonal matrix (3.4.13c)
= I. (3.4.13d)
If this is the case, then L−12 L1 = I or L2 = L1, and similarly U2 = U1. Thus these matrices
are the same and the solution must be unique.
Existence of the LU factorization
Theorem: A = LU with no zero pivots, then all leading principal submatrices Ak are non-singular. We define the leading principle sub matrices Ak of An×n is Ak = A(1:k),(1:k). Theseare the upper-left square matrices of the full matrix.
26
3.5. Lecture 8: September 6, 2013 Applied Matrix Theory
Part 2. A = LU then define Ak 6= 0 for any k. We want to prove that if A = LU, showthat Ak is invertible. Then if Ak is invertible show that A = LU.
3.5 Lecture 8: September 6, 2013
About Homeworks
The median score was 50 out of 60. A histogram was shown with the general grade distri-bution. 1 around 10, 3 around 25, 1 around 40, 4 from 45–50, 4 from 50–55, 6 from 55–60.Comments: write in working Matlab code. Also, L must have ones on the diagonal, whileU has pivots on the diagonal. “Computing efficiently” means using the LU decomposition,not invert the matrix A.
For homework 2, we will have applications of finding the inverse of A or solve
AX = I (3.5.1)
or
A(x1 x2 · · · xn
)=(e1 e2 · · · en
)(3.5.2)
To find A−1, solve
Axj = ej, (3.5.3)
for all j = 1, 2, . . . , n. Use the LU decomposition.
Discussion of ill-conditioned systems
We define Ax = b as an ill-conditioned system if small changes in A or b introduces largechanges in the solution. Geometrically we showed this interpretation previously on a 2 × 2system, and we noted that the slopes were very similar to each-other. Numerically, we havetrouble because the roundoff when we solve Ax = b. We also may compute a conditionnumber which tells us the amplification factor of errors in the system.
In Matlab, the command cond(A) gives you the condition. This should hopefully beunder a thousand. The condition number essentially tells you how much accuracy you canexpect to get from the final solution. In other words, if your condition number is 1 × 105
then you can only expect to have about 11 significant digits in our solution at floating pointarithmetic.
27
Nitsche and Benner Unit 3. Factorization
Inversion of lower triangular matrices
Show that if A is a lower triangular matrix then so is A−1. So let’s solve AX = I with Alower triangular.
x 0 0 0 0 1 0 0 0 0x x 0 0 0 0 1 0 0 0x x x 0 0 0 0 1 0 0x x x x 0 0 0 0 1 0x x x x x 0 0 0 0 1
→
x 0 0 0 0 1 0 0 0 0x x 0 0 0 y 1 0 0 0x x x 0 0 y 0 1 0 0x x x x 0 y 0 0 1 0x x x x x y 0 0 0 1
, (3.5.4a)
→
x 0 0 0 0 1 0 0 0 0x x 0 0 0 y 1 0 0 0x x x 0 0 y y 1 0 0x x x x 0 y y 0 1 0x x x x x y y 0 0 1
, (3.5.4b)
→
x 0 0 0 0 1 0 0 0 0x x 0 0 0 y 1 0 0 0x x x 0 0 y y 1 0 0x x x x 0 y y y 1 0x x x x x y y y 0 1
, (3.5.4c)
→
x 0 0 0 0 1 0 0 0 0x x 0 0 0 y 1 0 0 0x x x 0 0 y y 1 0 0x x x x 0 y y y 1 0x x x x x y y y y 1
. (3.5.4d)
We now have shown that we can get the lower triangular matrix A into the form LD. Nowwe do backward substitution to get our X. In this case this is simply deviding each row bythe value of the pivot of that row. In this way with D = U, we have X = D−1L−1.
Example of LU decomposition of a lower triangular matrix
Given the matrix,
2 0 01 3 02 1 4
=
1 0 012
1 01 1
31
2 0 00 3 00 0 4
, (3.5.5a)
= LU. (3.5.5b)
28
3.6. Lecture 9: September 9, 2013 Applied Matrix Theory
Banded matrix example
Exercise 3.10.7: Band matrix A with bandwidth w is a matrix with aij = 0 if |i− j| > w.If w = 0, we have a diagonal matrix.
Aw=0 =
a11 0 0 0 00 a22 0 0 00 0 a33 0 00 0 0 a44 00 0 0 0 a55
. (3.5.6)
For bandwidth, w = 1,
Aw=1 =
a11 a12 0 0 0a21 a22 a23 0 00 a32 a33 a34 00 0 a43 a44 a45
0 0 0 a54 a55
. (3.5.7)
For bandwidth, w = 2,
Aw=2 =
a11 a12 a13 0 0a21 a22 a23 a24 0a31 a32 a33 a34 a35
0 a42 a43 a44 a45
0 0 a53 a54 a55
. (3.5.8)
In the LU decomposition these zeros are preserved. However there are other cases (as shownin the homework) where the zeros may not be preserved.
We will return to our theorem on Monday. For the homework, a matrix has an LUdecomposition if and only if all principle submatrices are invertible.
3.6 Lecture 9: September 9, 2013
Existence of the LU factorization (cont.)
When does LU factorization exist? Theorem: If no zero pivots that appears in Gaussianelimination (including the nth one) then A = LU, `ii = 1 and uii 6= 0 are pivots. Then L,U are unique.
Theorem: A = LU if and only if the leading principle submatrices Ak is invertible.Proof: Assume (for block matrices of length k × k, n− k × n− k and the difference)
A = LU, (3.6.1)
=
(L11 0L21 L22
)(U11 U12
0 U22
), (3.6.2)
=
(L11U11 L11U12
L21U11 L22U22
)(3.6.3)
29
Nitsche and Benner Unit 3. Factorization
Now our question: is Ak = L11U11? We know that det L11 =∏k
j=1 `jj 6= 0 so L11 is
invertible. Similarly, U11 =∏k
j=1 ujj 6= 0 so it is also invertibles. Since we know that theproduct of two invertible matrices is also invertible, Ak must also be invertible.
We will now do a proof by induction: If we assume that all Ak are invertible. Show thatA = LU.
ASIDE: Example of proof by induction.
We want to show,n∑
j=1
j2 =n(n+ 1)(2n+ 1)
6. (3.6.4)
The steps of proof by induction are
1. First we show that this holds for n = 1,
2. next we assume it holds for n,
3. finally we show that it holds for n+ 1.
Let’s show the third step,
n+1∑j=1
j2 =
n∑j=1
j2 + (n+ 1)2, (3.6.5a)
=n(n+ 1)(2n+ 1)
6+ (n+ 1)2, (3.6.5b)
=n(n+ 1)(2n+ 1) + 6(n+ 1)2
6, (3.6.5c)
=(n+ 1) [n(2n+ 1) + 6(n+ 1)]
6, (3.6.5d)
=(n+ 1)
[2n2 + 7n+ 1
]6
, (3.6.5e)
=(n+ 1)(n+ 2)(2n+ 3)
6. (3.6.5f)
Which is what would be expected, and we have proved this relation by induction.
So for our system,
1. First we show that this holds for n = 1,
A = [a11] = [1] [a11] where a11 6= 0.
2. Assume true for n:
If Ak, k = 1, . . . , n are invertible, then An×n = Ln×nUn×n.
3. Show it holds for n+ 1.
So let’s move onto the third step, assume A(n+1)×(n+1) with Ak, k = 1, . . . , n+1 are invertible.By induction assumption An = LnUn, since A1, . . . ,An are invertible. Now we need to showthat An+1 = Ln+1Un+1,
An+1 =
(An bcᵀ α
), (3.6.6a)
=
(LnUn b
cᵀ α
), (3.6.6b)
=
(Ln 0yᵀ 1
)(Un x0ᵀ
β
). (3.6.6c)
30
3.6. Lecture 9: September 9, 2013 Applied Matrix Theory
We want Lnx = b so we let x = L−1n b which supposes that L−1
n exists. We also wantyᵀUn = cᵀ so we let yᵀ = cᵀU−1
n . Finally, we want yᵀx + β = α, so we let β = α−yᵀx. Weknow,
An+1 =
(LnUn b
cᵀ α
), (3.6.7a)
=
(Ln 0
cᵀU−1n 1
)(Un L−1
n b0ᵀ
α− cᵀU−1n L−1
n b
). (3.6.7b)
Since A = An+1 is invertible, we must have β 6= 0 because if β = 0 then det(Ln+1) det(Un+1) =0, in which case An+1 would not be invertible. So, An+1 has an LU decomposition and byprinciple of induction we have proven our theorem.
Rectangular matrices
For a rectangular matrix Am×n ∈ Rm×n. Our question: is Ax = b solvable? Is the solutionunique? We are presented with there options: no solution, unique solution, or infinitelymany solutions. We are going to do Gaussian elimination to reduce the form of the matrixto see how many solutions we will have. So we will do row echelon form (REF) reduction.
Example of row echelon form
A =
1 2 1 3 32 4 0 4 41 2 3 5 52 4 0 4 7
, (3.6.8a)
→
1 2 1 3 30 0 −2 −2 −20 0 2 2 20 0 −2 −1 1
, (3.6.8b)
→
1 2 1 3 30 0 1 1 10 0 0 0 00 0 0 0 2
, (3.6.8c)
→
1 2 1 3 30 0 1 1 10 0 0 0 10 0 0 0 0
. (3.6.8d)
Where we made interchanges to have leading ones for the columns. What do we know aboutour matrix A from this information? First, we know what columns are linearly independent.We are trying to find the column space of our matrix.
31
Nitsche and Benner Unit 3. Factorization
3.7 Homework Assignment 2: Due Friday, September
13, 2013
1. Textbook 3.10.1 (a, c): LU and PLU factorizations
Let, A =
1 4 54 18 263 16 30
.
(a) Determine the LU factors of A
(c) Use the LU factors to determine A−1
2. Textbook 3.10.2
Let A and b be the matrices,
A =
1 2 4 173 6 −12 32 3 −3 20 2 −2 6
and b =
17334
.(a) Explain why A does not have an LU factorization.
(b) Use partial pivoting and find the permutation matrix P as well as the LU factorssuch that PA = LU.
(c) Use the information in P, L, and U to solve Ax = b.
3. Textbook 3.10.3
Determine all values of ξ for which A =
ξ 2 01 ξ 10 1 ξ
fails to have an LU factorization.
4. Textbook 3.10.5
If A is a matrix that contains only integer entries and all of its pivots are 1, explainwhy A−1 must also be an integer matrix. Note: This fact can be used to constructrandom integer matrices that posses integer inverses by randomly generating integermatrices L and U with unit diagonals and then constructing the product A = LU.
5. Lower triangular matrices
Let A be a 3× 3 matrix with real entries. We showed that GE is equivalent to findinglower triangular matrices L−1 and L−2 such that L−2L−1A = U where U is uppertriangular and,
L−1 =
1 0 0−`21 1 0−`31 0 1
, L−2 =
1 0 00 1 00 −`32 1
, (3.7.1)
32
3.7. HW 2: Due September 13, 2013 Applied Matrix Theory
with
(L−1)−1 =
1 0 0`21 1 0`31 0 1
= L1, (L−2)−1 =
1 0 00 1 00 `32 1
= L2. (3.7.2)
It follows that A = L2L1U. Show that
L2L1 =
1 0 0`21 1 0`31 `32 1
. (3.7.3)
Show by example that generally,
L2L1 6= L1L2 (3.7.4)
That is, the order in which these lower triangular matrices are multiplied matters.
6. Textbook 1.6.4: Conditioning
Using geometric considerations, rank the following three systems according to theircondition.
(a)
1.001x− y = 0.235,
x+ 0.0001y = 0.765.
(b)
1.001x− y = 0.235,
x+ 0.9999y = 0.765.
(c)
1.001x+ y = 0.235,
x+ 0.9999y = 0.765.
7. Textbook 1.6.5
Determine the exact solution of the following system:
8x+ 5y + 2z = 15,
21x+ 19y + 16z = 56,
39x+ 48y + 53z = 140.
Now change 15 to 14 in the first equation and again solve the system with exactarithmetic. Is the system ill-conditioned?
33
Nitsche and Benner Unit 3. Factorization
8. Textbook 1.6.6
Show that the system
v − w − x− y − z = 0,
w − x− y − z = 0,
x− y − z = 0,
y − z = 0,
z = 1,
is ill-conditioned by considering the following perturbed system:
v − w − x− y − z = 0,
− 1
15v + w − x− y − z = 0,
− 1
15v + x− y − z = 0,
− 1
15v + y − z = 0,
− 1
15v + z = 1.
34
UNIT 4
Rectangular Matrices
4.1 Lecture 10: September 11, 2013
Rectangular matrices (cont.)
We are interested in a rectangular matrix, Am×n. We may apply REF, or RREF to find thecolumn dependence, what the basic columns are, and what the rank of the matrix is. Thisway we can find for any system Ax = b, whether the system is consistent and find all thesolutions; whether it is homogeneous, or what the free variables are; and what the particularsolutions are. Last time’s example, we went from
A =
1 2 1 3 32 4 0 4 41 2 3 5 52 4 0 4 7
, (4.1.1a)
→
1 2 1 3 30 0 2 2 20 0 0 0 30 0 0 0 0
. (4.1.1b)
The first, third, and fifth columns have pivots and are the basic columns. They correspondto the linearly independent columns in A. How do we write the other two columns (c2, c4)as functions of the other three columns? We can notice that, c2 = 2c1, and similarly c4 =2c1 + c3. The reduced row echelon form (RREF) has pivots on 1, and zeros below and above
35
Nitsche and Benner Unit 4. Rectangular Matrices
x1
x2
(a) Intersecting system (onesolution)
x1
x2
(b) Parallel system (no solu-tion)
x1
x2
(c) Equivalent system (infi-nite solutions)
Figure 4.1. Geometric illustration of linear systems and their solutions.
all pivots. So, 1 2 1 3 30 0 2 2 20 0 0 0 30 0 0 0 0
→
1 2 1 3 30 0 1 1 10 0 0 0 10 0 0 0 0
, (4.1.2a)
→
1 2 1 3 00 0 1 1 00 0 0 0 10 0 0 0 0
, (4.1.2b)
→
1 2 0 2 00 0 1 1 00 0 0 0 10 0 0 0 0
. (4.1.2c)
In this form, the basic columns are very clear, and the relations between the dependentcolumns and the basic columns is also obvious. So again we can see that, c2 = 2c1 andc4 = 2c1 +1c3. The rank of the matrix is the number of linearly independent columns, whichis also the number of linearly independent rows, and also the number of pivots in row-echelonform of the matrix. A consistent system, Ax = b is a system that has at least one solution.It is inconsistent if it has no solutions. To determine if Ax = b is consistent, in a 2 × 2system, Ax = b,
a11x1 + a12x2 = b1, (4.1.3a)
a21x1 + a22x2 = b2. (4.1.3b)
Since this system is a linear system we can see three cases: one intersection, parallel andseparated, and parallel and the same. Each of these cases are illustrated in Figure 4.1.In general, for any size matrix, we find the row echelon form of the augmented system
36
4.1. Lecture 10: September 11, 2013 Applied Matrix Theory
[A b]→[E b
]. x x x x x
0 x x x x0 0 0 0 α
(4.1.4)
If α 6= 0, then the system is inconsistent. So Ax = b is consistent if rank([A b]) = rank(A).If α = 0 then b is not a basic column of (A b). The we can write b as a linear combinationof the basic columns of E. We can write b as linear combinations of basic columns of A.
In our example, we had c1, c3, and c5 where the basic columns and Ax = b was consistent.Here then if we were to preform a reduction, the b = x1c1 + x3c3 + x5c5, or in other words,
A
x1
0x3
0x5
= b. (4.1.5)
Example of RREF of a Rectangular Matrix
Given the matrix, 1 1 2 2 1 12 2 4 4 3 12 2 4 4 2 23 5 8 6 5 3
→
1 1 2 2 1 10 0 0 0 1 −10 0 0 0 0 00 2 2 0 2 0
, (4.1.6a)
→
1 1 2 2 1 10 2 2 0 2 00 0 0 0 1 −10 0 0 0 0 0
. (4.1.6b)
Thus, our system is consistent. We have that rank([A b]) = rank(A). Similarly, we observethat we have 3 basic columns, r, and 2 linearly dependent columns, n− r. (If n > m, thenn > r, so n− r 6= 0). Let’s continue on to perform the reduced row echelon form.
1 1 2 2 1 10 2 2 0 2 00 0 0 0 1 −10 0 0 0 0 0
→
1 1 2 2 1 10 1 1 0 1 00 0 0 0 1 −10 0 0 0 0 0
, (4.1.7a)
→
1 1 2 2 0 20 1 1 0 0 10 0 0 0 1 −10 0 0 0 0 0
, (4.1.7b)
→
1 0 1 2 0 10 1 1 0 0 10 0 0 0 1 −10 0 0 0 0 0
. (4.1.7c)
37
Nitsche and Benner Unit 4. Rectangular Matrices
Thus our b = 1c1 + 1c2 − 1c5. Therefore, b = 1c1 + 1c2 − 1c5, and
x =
1100−1
. (4.1.8)
So in review, 1 1 2 2 1 12 2 4 4 3 12 2 4 4 2 23 5 8 6 5 3
→
1 0 1 2 0 10 1 1 0 0 10 0 0 0 1 −10 0 0 0 0 0
. (4.1.9)
We found a particular solution, xp = (1 1 0 0 − 1)ᵀ of Ax = b. For any solution xh ofAx = 0, we have that A (xp + xH) = b + 0. So (xp + xH) also solves Ax = b.
4.2 Lecture 11: September 13, 2013
Solving Ax = b
Ax = b is consistent if rank[A |b] = rank(A). We have that b is a nonbasic column of[A |b]. We can express b in terms of columns of A to get a solution Axp = b. The set of allsolutions is xp + xH , where Axp = b has the particular solution to Ax = b. We also solveAxH = 0, and get all homogeneous solutions, xH . Since we can add these two solutions, wehave A (xp + xH) = b.
Now to actually find the particular solution, xp, we write b in terms of basic columns.To find the homogeneous solutions, xH , we solve Ax = 0 by solving for basic variables xiin terms of the n− r free variables. Basic variables correspond to basic columns, while freevariables correspond to nonbasic columns. Note that if n > r then the set of columns islinearly independent and we can find x 6= 0 such that Ax = 0.
Example
From our example 1 1 2 2 1 12 2 4 4 3 12 2 4 4 2 23 5 8 6 5 3
→
1 0 1 2 0 10 1 1 0 0 10 0 0 0 1 −10 0 0 0 0 0
, (4.2.1a)
we have that
b = a:1 + a:2 − a:5, (4.2.2a)
= x1a:1 + x2a:2 − x5a:5, (4.2.2b)
= Axp, where xp =(1 1 0 0 −1
)ᵀ. (4.2.2c)
38
4.2. Lecture 11: September 13, 2013 Applied Matrix Theory
Solve,
[A |0] =
1 0 1 2 0 00 1 1 0 0 00 0 0 0 1 00 0 0 0 0 0
. (4.2.3a)
This gives us the three equations for the homogeneous solutions,
x1 = −x3 − 2x4, (4.2.4a)
x2 = −x3, (4.2.4b)
x5 = 0. (4.2.4c)
This gives us the homogeneous solutions of the form,
xH =
−x3 − 2x4
−x3
x3
x4
0
, (4.2.5a)
= x3
−1−1
000
+ x4
−2
0010
. (4.2.5b)
Thus the set of all solutions are,
x = xp + xH , (4.2.6a)
=
1100−1
+ x3
−1−1
000
+ x4
−2
0010
. (4.2.6b)
This solves Ax = b for any x3 and x4. Therefore we have infinitely many solutions. Not wecan only have unique solutions if n = r.
Linear functions
We have any function f : D → R is a linear function if
1. f(x+ y) = f(x) + f(y),
2. f(αx) = αf(x).
39
Nitsche and Benner Unit 4. Rectangular Matrices
For example, f(x) = ax+ b, with b 6= 0.
f(x+ y) = (ax+ b) + (ay + b) , (4.2.7a)
= a(x+ y) + 2b, (4.2.7b)
6= a(x+ y) + b. (4.2.7c)
Thus this is not a linear function. However when b = 0, the function f(x) = ax can beverified to be linear.
Example: Transpose operator
The transpose operator is f(A) = Aᵀ. Define that if A = [aij], then Aᵀ = [aji] andA∗ = Aᵀ = [aji]. Is this linear?
f(A + B) = (A + B)ᵀ, (4.2.8a)
= [aij + bij]ᵀ, (4.2.8b)
= [aji + bji] , (4.2.8c)
= Aᵀ
+ Bᵀ. (4.2.8d)
To check the second criterion,
f(αA) = [αA]ᵀ, (4.2.9a)
= α [A]ᵀ, (4.2.9b)
= αf(A). (4.2.9c)
So this operator is linear.
Example: trace operator
The trace operator is f(A) = tr(A) =∑
i aii.
f(A + B) =∑i
(aii + bii) , (4.2.10a)
=∑i
aii +∑i
bii, (4.2.10b)
= tr(A) + tr(B). (4.2.10c)
The second cirterion,
f(αA) = tr(αA), (4.2.11a)
=∑i
αaii, (4.2.11b)
= α∑i
aii, (4.2.11c)
= α tr(A), (4.2.11d)
= αf(A). (4.2.11e)
We have therefore shown that this is a linear operator.
40
4.2. Lecture 11: September 13, 2013 Applied Matrix Theory
Matrix multiplication
Given,
A =
(a bc d
), B =
(a b
c d
). (4.2.12)
Then consider
f(x) = Ax =
(ax1 + bx2
cx1 + dx2
), g(x) = Bx =
(ax1 + bx2
cx1 + dx2
). (4.2.13)
Take
f(g(x)) = A (Bx) ≡ ABx. (4.2.14)
But,
f(g(x)) =
(a(ax1 + bx2) + b(cx1 + dx2)
c(ax1 + bx2) + d(cx1 + dx2)
), (4.2.15a)
=
((aa+ bc)x1 + (ab+ bd)x2
(ca+ dc)x1 + (cb+ dd)x2
), (4.2.15b)
=
(aa+ bc ab+ bd
ca+ dc cb+ dd
)(x1
x2
), (4.2.15c)
≡ AB. (4.2.15d)
Now if we define AB = [Ai:B:j]︸ ︷︷ ︸(AB)ij
or Ai:B:j =∑n
k=1AikBkj. We get that matrix multiplication
is not generally commutative, or AB 6= BA. If AB = 0 then either A = 0 or B = 0 unlessA or B are invertible. Further we know that we have the distributive properties,
A (B + C) = AB + AC, (4.2.16)
or
(A + B) D = AD + BD, (4.2.17)
and the associative property
(AB) C = A (BC) . (4.2.18)
A property of the transpose operator is,
(AB)ᵀ
= BᵀAᵀ, (4.2.19)
which also helps to understand that,
tr(AB) = tr(BA). (4.2.20)
Note, however, that tr(ABC) 6= tr(ACB) as we will demonstrate on the homework.
41
Nitsche and Benner Unit 4. Rectangular Matrices
Proof of transposition property
We want to prove the useful property,
(AB)ᵀ
= BᵀAᵀ. (4.2.21)
Dealing with our left hand side of the equation,
LHS : (AB)ᵀ
=[(AB)
ᵀij
], (4.2.22a)
= [(AB)ji], (4.2.22b)
= [Aj:B:i]. (4.2.22c)
Manipulating the right hand side of the property,
RHS : BᵀAᵀ
=[(
BᵀAᵀ)ij
], (4.2.23a)
=[Bᵀi:Aᵀ:j
], (4.2.23b)
= [B:iAj:], (4.2.23c)
= [Aj:B:i], (4.2.23d)
= LHS. (4.2.23e)
Thus, we have proved the identity.
4.3 Lecture 12: September 16, 2013
We will be having an exam on September 30th.
Inverses
We define: A has an inverse if each A−1 exists such that,
AA−1 = A−1A = I. (4.3.1)
We also have the properties:
• (AB)−1 = B−1A−1,
• (Aᵀ)−1
= (A−1)ᵀ,
• (A−1)−1
= A.
What about the inverse of sums (A + B)−1? There are the special cases,
• low rank perturbations of In×n : (I + CDᵀ)−1
, where C, D ∈ Rn×k or the matrices areof rank k.
• small perturbation of I : (I + A)−1, where ||A||.
42
4.3. Lecture 12: September 16, 2013 Applied Matrix Theory
We have a rank-1 matrix uvᵀ, with u, v ∈ Rn = Rn×1.
uvᵀ
=
u1
u2...uk
(v1 v2 · · · vk), (4.3.2a)
=
u1v1 u1v2 · · · u1vku2v1 u2v2 · · · u2vk
......
. . ....
ukv1 ukv2 · · · ukvk
, (4.3.2b)
=
u1v
ᵀ
u2vᵀ
...ukv
ᵀ
. (4.3.2c)
Now let’s say we have an example where all matrix entries are zero except for αij at somepoint (i, j).
0 · · · 0
... α...
0 · · · 0
=
0...α...0
(0 · · · 1 · · · 0
), (4.3.3a)
= αeieᵀj . (4.3.3b)
Low rank perturbations of I
We make the claim the if u, v are such that vᵀu + 1 6= 0 then
(I + uv
ᵀ)−1= I− uvᵀ
1 + vᵀu(4.3.4)
Proof: (I + uv
ᵀ)(I− uvᵀ
1 + vᵀu
)= I− uvᵀ
1 + vᵀu+ uv
ᵀ − u (vᵀu) vᵀ
1 + vᵀu, (4.3.5a)
= I− uvᵀ
1 + vᵀu+ uv
ᵀ − (vᵀu)
1 + vᵀuuvᵀ, (4.3.5b)
= I− uvᵀ(
1
1 + vᵀu+ 1− (vᵀu)
1 + vᵀu
), (4.3.5c)
= I− uvᵀ
���������(
1− 1 + vᵀu
1 + vᵀu
), (4.3.5d)
= I. (4.3.5e)
43
Nitsche and Benner Unit 4. Rectangular Matrices
So if c, d ∈ Rn such that dᵀ
(A−1c) + 1 6= 0, we are interested in A−1.[A + cd
ᵀ]−1=[A(I + A−1cd
ᵀ)], (4.3.6a)
=(I +
(A−1c
)dᵀ)
A−1, (4.3.6b)
=
(I− A−1cd
ᵀ
1 + dᵀA−1c
)A−1, (4.3.6c)
= A−1 − A−1cdᵀA−1
1 + dᵀA−1c
. (4.3.6d)
The Sherman–Morrison Formula
The Sherman–Morrison formula states that if A is invertible and C, D ∈ Rn×k such thatI + DᵀA−1C is invertible. Then,(
A + CDᵀ)−1
= A−1 −A−1C(I + D
ᵀA−1C
)−1DᵀA−1 (4.3.7)
Finite difference example with periodic boundary conditions
Previously, we had,
−y′′ = f, on [a, b], (4.3.8a)
y(a) = ya, (4.3.8b)
y(b) = yb. (4.3.8c)
We get the finite difference approximation
    [2 −1 0 ⋯ 0; −1 2 −1 ⋯ 0; 0 −1 2 ⋱ 0; ⋮ ⋱ ⋱ −1; 0 0 ⋯ −1 2] (y1, y2, . . . , y_{n−1})ᵀ = h² (f1, . . . , f_{n−1})ᵀ + (y0, 0, . . . , 0, yn)ᵀ.   (4.3.9)
If we instead use periodic boundary conditions, we have perturbed the system:
    −y′′ = f on [a, b],   y(a) = y(b),   y′(a) = y′(b),   (4.3.10)
and the matrix picks up corner entries:
    [2 −1 0 ⋯ −1; −1 2 −1 ⋯ 0; 0 −1 2 ⋱ 0; ⋮ ⋱ ⋱ −1; −1 0 ⋯ −1 2] (y1, . . . , y_{n−1})ᵀ = h² (f1, . . . , f_{n−1})ᵀ.   (4.3.11)
The new matrix is a low-rank (corner) perturbation of the tridiagonal one, so the Sherman–Morrison formula would help greatly with the inversion.
Examples of perturbation
Given a matrix
    A = [1 2; 1 3],   A⁻¹ = [3 −2; −1 1].   (4.3.12a,b)
Perturb one entry:
    B = [1 2; 2 3] = A + [0 0; 1 0] = A + e2 e1ᵀ.   (4.3.12c–e)
Applying the Sherman–Morrison formula with u = e2, v = e1,
    B⁻¹ = A⁻¹ − A⁻¹ e2 e1ᵀ A⁻¹ / (1 + e1ᵀA⁻¹e2)   (4.3.12f)
        = A⁻¹ − (1/(1 − 2)) (−2, 1)ᵀ (3, −2)   (4.3.12g)
        = A⁻¹ + [−6 4; 3 −2]   (4.3.12h)
        = [−3 2; 2 −1].   (4.3.12i)
Check: det B = −1, so direct inversion also gives B⁻¹ = [−3 2; 2 −1].
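The same worked example in NumPy, a small sketch to confirm the arithmetic above:

import numpy as np

A = np.array([[1., 2.], [1., 3.]])
Ainv = np.linalg.inv(A)                        # [[3, -2], [-1, 1]]
u = np.array([[0.], [1.]])                     # e2
v = np.array([[1.], [0.]])                     # e1
B = A + u @ v.T                                # [[1, 2], [2, 3]]

denom = 1.0 + (v.T @ Ainv @ u).item()          # 1 + e1^T A^{-1} e2 = -1
Binv = Ainv - (Ainv @ u @ v.T @ Ainv) / denom  # Sherman-Morrison
assert np.allclose(Binv, np.linalg.inv(B))
print(Binv)                                    # [[-3.  2.], [ 2. -1.]]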
Small perturbations of I
We want to show what happens when we have small perturbations of the identity matrix I:
    (I − A)⁻¹ ?= I + A + A² + ⋯,   (4.3.13)
when ‖A‖ < 1. We first consider the geometric series,
    (1 − x)⁻¹ = 1/(1 − x) = ∑_{n=0}^∞ xⁿ = 1 + x + x² + x³ + ⋯,   (4.3.14)
when |x| < 1.
To be continued. . .
4.4 Lecture 13: September 18, 2013
Small perturbations of I (cont.)
We want to show what happens when we have small perturbations of the identity matrix I:
    (I − A)⁻¹ ?= I + A + A² + ⋯,   (4.4.1)
when ‖A‖ < 1. We first consider the geometric series,
    (1 − x)⁻¹ = 1/(1 − x) = ∑_{n=0}^∞ xⁿ = 1 + x + x² + x³ + ⋯,   (4.4.2)
when |x| < 1. This is proved as follows. Let S = ∑_{k=0}^n xᵏ. Then
    S − xS = 1 + x + x² + ⋯ + xⁿ − x − x² − ⋯ − x^{n+1} = 1 − x^{n+1},   (4.4.3)
so
    S = (1 − x^{n+1})/(1 − x)  →  1/(1 − x)  as n → ∞, when |x| < 1.
Returning to the full series for a matrix,
    (I − A)(I + A + ⋯ + Aⁿ) = I + A + ⋯ + Aⁿ − A − A² − ⋯ − A^{n+1} = I − A^{n+1}.   (4.4.4)
If A is small, so that Aⁿ → 0 as n → ∞, then
    (I − A) ∑_{k=0}^∞ Aᵏ = I,   i.e.   (I − A)⁻¹ = ∑_{k=0}^∞ Aᵏ.
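A minimal NumPy sketch of this Neumann series (the scaling 0.1 below is our own choice, just to guarantee ‖A‖ < 1):

import numpy as np

rng = np.random.default_rng(2)
A = 0.1 * rng.standard_normal((4, 4))        # small, so A^n -> 0
exact = np.linalg.inv(np.eye(4) - A)

S, term = np.eye(4), np.eye(4)
for _ in range(50):                          # partial sums I + A + A^2 + ...
    term = term @ A
    S += term
assert np.allclose(S, exact)

print(np.linalg.norm(exact - (np.eye(4) + A)))   # first-order truncation error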
Let's consider the convergence of this series now. Write
    L = ∑_{k=1}^∞ ak.   (4.4.5)
We say the series converges (L is finite) if lim_{n→∞} ∑_{k=1}^n ak exists and is finite; having ak → 0 is necessary but not sufficient. For example, ∑_{n=1}^∞ 1/n diverges, since ∑_{k=1}^n 1/k → ∞. For a convergent series the remainder satisfies
    L − ∑_{k=1}^n ak → 0, as n → ∞,   (4.4.6)
so we may approximate
    L ≈ ∑_{k=1}^n ak, with error → 0 as n → ∞.   (4.4.7)
In particular if A is small then,
(I−A)−1 ≈ I + A. (4.4.8)
For example,
    (A + B)⁻¹ = [A(I + A⁻¹B)]⁻¹   (assuming A⁻¹ exists)   (4.4.9a)
              = (I + A⁻¹B)⁻¹ A⁻¹   (4.4.9b)
              ≈ (I − A⁻¹B) A⁻¹   (4.4.9c)
              = A⁻¹ − A⁻¹BA⁻¹.   (4.4.9d)
Matrix Norms
A function ‖·‖ on matrices A ∈ R^{m×n} is a matrix norm if it satisfies
1. ‖A‖ ≥ 0, and ‖A‖ = 0 implies A = 0,
2. ‖A + B‖ ≤ ‖A‖ + ‖B‖,
3. ‖αA‖ = |α| ‖A‖,
and, in addition to the vector-norm axioms, the fourth property
4. ‖AB‖ ≤ ‖A‖ ‖B‖.
As an example of a matrix norm,
    ‖A‖ = maxj ∑i |aij|,   (4.4.10)
which is the maximum absolute column sum. If ‖A‖ < 1, then 0 ≤ ‖Aⁿ‖ ≤ ‖A‖ⁿ → 0 as n → ∞, so ‖Aⁿ‖ → 0 and hence Aⁿ → 0 as n → ∞.
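This norm is easy to compute; a short NumPy sketch (this max-column-sum norm is exactly what np.linalg.norm(·, 1) returns for matrices):

import numpy as np

A = np.array([[1., -2., 3.],
              [4.,  0., -1.]])
col_sum_norm = max(np.abs(A).sum(axis=0))        # max_j sum_i |a_ij|
assert np.isclose(col_sum_norm, np.linalg.norm(A, 1))
print(col_sum_norm)                              # 5.0 for this A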
When is ‖A⁻¹B‖ small? We have
    ‖A⁻¹B‖ ≤ ‖A⁻¹‖ ‖B‖ = ‖A⁻¹‖ ‖A‖ (‖B‖/‖A‖) = κ(A) ‖B‖/‖A‖,   (4.4.11)
where κ(A) = ‖A‖ ‖A⁻¹‖ is the condition number of A. Note that we cannot replace this bound by ‖B‖/‖A‖: that would require ‖A⁻¹‖ ≤ 1/‖A‖, which is generally false. Supposing ‖I‖ = 1,
    1 = ‖AA⁻¹‖ ≤ ‖A‖ ‖A⁻¹‖,   (4.4.13)
so
    ‖A⁻¹‖ ≥ 1/‖A‖,
and indeed ‖A⁻¹‖ = κ(A)/‖A‖, which exceeds 1/‖A‖ by the factor κ(A) ≥ 1.
Condition Number
As an example pertaining to the condition number, suppose we have Ax = b and the perturbed system (A + B)x̃ = b, where ‖A⁻¹B‖ < 1 (that is, B is sufficiently small). The relative change in x introduced by the change in A is
    ‖x̃ − x‖/‖x‖ = ‖(A + B)⁻¹b − A⁻¹b‖/‖x‖   (4.4.14a)
               = ‖[(A + B)⁻¹ − A⁻¹]b‖/‖x‖.   (4.4.14b)
If we use (A + B)⁻¹ ≈ A⁻¹ − A⁻¹BA⁻¹, then
               ≈ ‖A⁻¹BA⁻¹b‖/‖x‖ = ‖A⁻¹Bx‖/‖x‖   (4.4.14c)
               ≤ ‖A⁻¹B‖   (4.4.14d)
               ≤ ‖A⁻¹‖ ‖B‖ = (‖B‖/‖A‖) κ(A).   (4.4.14e)
Thus, κ(A) measures the amplification of relative errors.
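A small NumPy illustration (the nearly singular A below is our own toy example): the relative change in the solution can exceed the relative change in the data by a factor of roughly κ(A).

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])                 # nearly singular
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)                     # exact solution (1, 1)

B = 1e-6 * np.array([[0.0, 1.0], [0.0, 0.0]]) # tiny perturbation of A
x_pert = np.linalg.solve(A + B, b)

print(np.linalg.cond(A))                      # kappa(A), about 4e4
print((np.linalg.norm(x_pert - x) / np.linalg.norm(x)) /
      (np.linalg.norm(B) / np.linalg.norm(A)))  # amplification factor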
4.5 Homework Assignment 3: Due Friday, September 27, 2013

For the first four problems, you may use the Matlab commands rref(a) and a\b to check your work.
1. Textbook 2.2.1: Row Echelon Form, Rank, Consistency, General solution of Ax = b.
Determine the reduced row echelon form for each of the following matrices and thenexpress each nonbasic column in terms of the basic columns:
(a) [1 2 3 3; 2 4 6 9; 2 6 7 6]

(b) [2 1 1 3 0 4 1; 4 2 4 4 1 5 5; 2 1 3 1 0 4 3; 6 3 4 8 1 9 5; 0 0 3 −3 0 0 3; 8 4 2 14 1 13 3]
2. Textbook 2.3.3
If A is an m × n matrix with rank(A) = m, explain why the system [A|b] must be consistent for every right-hand side b.
3. Textbook 2.5.1
Determine the general solution for each of the following non homogeneous systems.
(a)
x1 + 2x2 + x3 + 2x4 = 3, (4.5.1a)
2x1 + 4x2 + x3 + 3x4 = 4, (4.5.1b)
3x1 + 6x2 + x3 + 4x4 = 5. (4.5.1c)
(b)
2x+ y + z = 4, (4.5.2a)
4x+ 2y + z = 6, (4.5.2b)
6x+ 3y + z = 8, (4.5.2c)
8x+ 4y + z = 10. (4.5.2d)
(c)
x1 + x2 + 2x3 = 3, (4.5.3a)
3x1 + 3x3 + 3x4 = 6, (4.5.3b)
2x1 + x2 + 3x3 + x4 = 3, (4.5.3c)
x1 + 2x2 + 3x3 − x4 = 0. (4.5.3d)
(d)
2x+ y + z = 2, (4.5.4a)
4x+ 2y + z = 5, (4.5.4b)
6x+ 3y + z = 8, (4.5.4c)
8x+ 5y + z = 8. (4.5.4d)
4. Textbook 2.5.4
Consider the following system:
2x+ 2y + 3z = 0, (4.5.5a)
4x+ 8y + 12z = −4, (4.5.5b)
6x+ 2y + αz = 4. (4.5.5c)
(a) Determine all values of α for which the system is consistent.
(b) Determine all values of α for which there is a unique solution, and compute thesolution for these cases.
(c) Determine all values of α for which there are infinitely many different solutions,and give the general solution for these cases.
5. Textbook 3.3.1: Linear Functions
Each of the following is a function from R² into R². Determine which are linear functions.

(a) f(x, y) = (x, 1 + y)ᵀ.
(b) f(x, y) = (y, x)ᵀ.
(c) f(x, y) = (0, xy)ᵀ.
(d) f(x, y) = (x², y²)ᵀ.
(e) f(x, y) = (x, sin y)ᵀ.
(f) f(x, y) = (x + y, x − y)ᵀ.

Figure 4.2. Figures for Textbook problem 3.3.4.
6. Textbook 3.3.4
Determine which of the following three transformations in R2 are linear.
7. Textbook 3.5.4: Matrix Multiplication
Let ej denote the jth unit column that contains a 1 in the jth position and zeros everywhere else. For a general matrix A_{n×n}, describe the following products: (a) Aej, (b) ejᵀA, (c) ejᵀAej.
8. Textbook 3.5.6
(please use induction)
For A = [1/2 α; 0 1/2], determine lim_{n→∞} Aⁿ. Hint: Compute a few powers of A and try to deduce the general form of Aⁿ.
9. Textbook 3.5.9
If A = [aij(t)] is a matrix whose entries are functions of a variable t, the derivative of A with respect to t is defined to be the matrix of derivatives. That is,
    dA/dt = [daij/dt].
Derive the product rule for differentiation,
    d(AB)/dt = (dA/dt) B + A (dB/dt).
10. Textbook 3.6.2
For all matrices A_{n×k} and B_{k×n} show that the block matrix
    L = [I − BA, B; 2A − ABA, AB − I]
has the property L² = I. Matrices with this property are said to be involutory, and they occur in the science of cryptography.
11. Textbook 3.6.3
For the matrix
A =
[1 0 0 1/3 1/3 1/3; 0 1 0 1/3 1/3 1/3; 0 0 1 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3],
determine A³⁰⁰. Hint: A square matrix C is said to be idempotent when it has the property that C² = C. Make use of the idempotent submatrices in A.
12. Textbook 3.6.5
If A and B are symmetric matrices that commute, prove that the product AB is alsosymmetric. If AB 6= BA, is AB necessarily symmetric?
13. Textbook 3.6.7
For each matrix An×n, explain why it is impossible to find a solution for Xn×n in thematrix equation
AX − XA = I. (4.5.6)
Hint: Consider the trace function.
14. Textbook 3.6.11
Prove that each of the following statements is true for conformable matrices
(a) tr (ABC) = tr(BCA) = tr(CAB).
(b) tr (ABC) can be different from tr (BAC).
(c) tr(AᵀB) = tr(ABᵀ).
15. Textbook 3.7.2: Inverses
Find the matrix X such that X = AX + B, where
A = [0 −1 0; 0 0 −1; 0 0 0]   and   B = [1 2; 2 1; 3 3].
16. Textbook 3.7.6
If A is a square matrix such that I−A is nonsingular, prove that
A (I−A)−1 = (I−A)−1 A.
17. Textbook 3.7.8
If A, B, and A + B are each nonsingular, prove that
A(A + B)⁻¹B = B(A + B)⁻¹A = (A⁻¹ + B⁻¹)⁻¹.
18. Textbook 3.7.9
Let S be a skew-symmetric matrix with real entries.
(a) Prove that I− S is nonsingular. Hint: xᵀx = 0 means x = 0.
(b) If A = (I + S) (I− S)−1, show that A−1 = Aᵀ.
19. Textbook 3.9.9: Sherman–Morrison formula, rank 1 matrices
Prove that rank(A_{m×n}) = 1 if and only if there are nonzero columns u_{m×1} and v_{n×1} such that A = uvᵀ.
20. Textbook 3.9.10
Prove that if rank(A_{n×n}) = 1, then A² = τA, where τ = tr(A).
UNIT 5
Vector Spaces
5.1 Lecture 14: September 20, 2013
Topics in Vector Spaces
We will be discussing the following topics in this lecture (and possibly the next couple).
• Field
• Vector Space
• Subspace
• Spanning Set
• Basis
• Dimension
• The four subspaces of Am×n
Field
We define a field as a set F with the following properties:
• Closed under addition (+) and multiplication ( · ). Thus if α, β ∈ F , then α + β ∈ Fand α · β ∈ F .
• Addition and multiplication are commutative.
• Addition and multiplication are associative. This means that (α+β)+γ = α+(β+γ)and (αβ)γ = α(βγ).
• Addition with multiplication is distributive. α(β + γ) = αβ + αγ.
• There exists an additive and multiplicative identity α + 0 = α, α · 1 = α.
• There exists an additive and multiplicative inverse α + (−α) = 0, α(α−1) = 1.
For example, the reals and the complex numbers are fields; the rational numbers are as well, but the natural numbers are not. The set Z₂ = {0, 1} is a field with the operations 0 + 0 = 0, 0 + 1 = 1, 1 + 1 = 0.
Vector Space
We define a vector space V over a field F as a set V with operations + and · such that,
• v + w ∈ V for any v,w ∈ V .
• αv ∈ V for any v ∈ V , α ∈ F .
• v + w = w + v for any v,w ∈ V . This is the commutative property of addition.
• (u+v)+w = u+(v+w) for any u,v,w ∈ V , which is the associative law of addition.
• There exists 0 ∈ V such that u + 0 = u for any u ∈ V.
• For each u ∈ V there exists −u ∈ V such that u + (−u) = 0.
• (αβ)u = α(βu) for any α, β ∈ F , u ∈ V .
• (α+ β)u = αu + βu for any α, β ∈ F , u ∈ V . This is the first form of the distributiveproperty.
• 1 · u = u, the 1 multiplication identity in F .
• α(u + v) = αu + αv for any α ∈ F , and u,v ∈ V .
Examples of vector spaces over R include Rⁿ = R^{n×1}, R^{n×m}, C^{m×n}, all functions [0, 1] → R, and all polynomials R → R.
Theorem 5.1. A subset S of a vector space V over F is a vector space over F if
• v + w ∈ S, for any v,w ∈ S.
• αv ∈ S for any α ∈ F , v ∈ S.
Several examples: all continuous functions [0, 1] → R = C[0, 1], all polynomials of degree at most n, and S = {0} contained in V.
Definition 5.2. Let {v1, . . . , vn} ⊂ V; then span{v1, . . . , vn} = {α1v1 + α2v2 + · · · + αnvn : αk ∈ F}.
Theorem 5.3. The span of {v1, . . . , vn} is a subspace.
Definition 5.4. The set {v1, . . . ,vn} is a spanning set of span{v1, . . . ,vn}.
Note that 0 ∈ span{v1, . . . , vn}; indeed, 0 belongs to every subspace.
For example, span{ (1, 2)ᵀ } ⊂ R² is the same as span{ (1, 2)ᵀ, (−2, −4)ᵀ }. This gives rise to the basis vector (1, 2)ᵀ, so the subspace is one-dimensional. The basis vector is illustrated along with the spanned line in Figure 5.1.
Definition 5.5. A basis for a vector space is a minimal spanning set.
Theorem 5.6. Any two bases for a vector space have the same number of elements.
Definition 5.7. The number of elements in the basis is equal to the dimension of the space.
Figure 5.1. Basis vector of the example subspace.
For example, P2 = {a1 + a2x + a3x²} has basis {1, x, x²}, so it is three-dimensional. In general, the space Pn of polynomials of degree at most n has dim(Pn) = n + 1.
As another example, S = {0} has basis ∅, the empty set, so it is zero-dimensional. In particular, zero cannot be an element of a basis.
Definition 5.8. A set {v1, . . . ,vn} is linearly independent if α1v1 + α2v2 + · · ·+ αnvn = 0implies α1 = α2 = · · · = αn = 0.
It follows that {0} is not linearly independent, since
    α0 = 0 for any α ≠ 0.   (5.1.1)
Similarly, any set containing 0 is not linearly independent.
Examples of function spaces
One example is the set of solutions to y′′ = 0. This is the set {y = αx + β : α, β ∈ R}; the vector space has two dimensions and basis {1, x}. Another example is the set of solutions of y′′ = y, namely {y = c1eˣ + c2e⁻ˣ}, which has the two-dimensional basis {eˣ, e⁻ˣ}. A third example is the set of solutions of y′′ = −y, namely {y = c1 sin x + c2 cos x}, with two-dimensional basis {sin x, cos x}. A final example of interest is y′′ = 2, which gives the solution set {y = x² + αx + β}. This, however, is not a vector space, because we are restricted by the coefficient of x² being one! This results from the fact that the equation is nonhomogeneous, unlike the other examples, which may be rearranged into homogeneous form.
In the general example of R^{2×2} = { [a b; c d] }, a basis of this space is
    { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] }.
5.2 Lecture 15: September 23, 2013
The four subspaces of Am×n
We now define the four fundamental subspaces of Am×n : Rn → Rm. These are:
1. R(A) = {y : y = Ax,x ∈ Rn} ⊂ Rm This is the column space.
2. N(A) = {x ∈ Rn : Ax = 0} ⊂ Rn. This is the null space of A.
3. R(Aᵀ) = {y : y = Aᵀx, x ∈ Rᵐ} ⊂ Rⁿ. Equivalently, R(Aᵀ) = {y : yᵀ = xᵀA, x ∈ Rᵐ}, which is why this is called the row space of A.
4. N(Aᵀ) = {x ∈ Rm : Aᵀx = 0 or xᵀA = 0ᵀ} ⊂ Rm. This is called the left null space of
A.
We want to show that R(A) is a vector space. So we let y1,y2 ∈ R(A) Then, y1 = Ax1 andy2 = Ax2 for some x1,x2. This tells us that
y1 + y2 = Ax1 + Ax2, (5.2.1a)
= A (x1 + x2) ∈ R(A). (5.2.1b)
Also
αy1 = αAx1, (5.2.2a)
= Aαx1 ∈ R(A). (5.2.2b)
Thus R(A) is a subspace of Rᵐ.
An example: find spanning sets for all four subspaces of
A = [1 2 1 3 3; 2 4 0 4 4; 1 2 3 5 5; 2 4 0 4 2]  →  [1 2 0 2 0; 0 0 1 1 0; 0 0 0 0 1; 0 0 0 0 0].   (5.2.3)
So the column space (range), spanned by the basic columns 1, 3, 5, is
    R(A) = span{ (1, 2, 1, 2)ᵀ, (1, 0, 3, 0)ᵀ, (3, 4, 5, 2)ᵀ } ⊂ R⁴.   (5.2.4)
To find the null space, we solve the homogeneous equation Ax = 0:
    x1 = −2x2 − 2x4,   x3 = −x4,   x5 = 0,   (5.2.5)
or
    x = x2 (−2, 1, 0, 0, 0)ᵀ + x4 (−2, 0, −1, 1, 0)ᵀ.   (5.2.6)
Thus,
    N(A) = span{ (−2, 1, 0, 0, 0)ᵀ, (−2, 0, −1, 1, 0)ᵀ } ⊂ R⁵.   (5.2.7)
Now write the reduction A → EA as P_{m×m} A_{m×n} = EA,   (5.2.8)
where P is square and invertible (it is a product of elementary matrices). Since PA = EA, the rows of EA are linear combinations of the rows of A; similarly, A = P⁻¹EA shows the rows of A are linear combinations of the rows of EA, so the row space of A equals the row space of EA. So,
    R(Aᵀ) = (row space of A)ᵀ   (5.2.9a)
          = span{ (1, 2, 0, 2, 0)ᵀ, (0, 0, 1, 1, 0)ᵀ, (0, 0, 0, 0, 1)ᵀ } ⊂ R⁵   (5.2.9b)
          = {y : y = Aᵀx, or yᵀ = xᵀA}.   (5.2.9c)
To find the fourth space, N(Aᵀ), row reduce Aᵀ:
    [1 2 1 2; 2 4 2 4; 1 0 3 0; 3 4 5 4; 3 4 5 2]
  → [1 2 1 2; 0 0 0 0; 0 −2 2 −2; 0 −2 2 −2; 0 −2 2 −4]
  → ⋯ →
    [1 0 3 0; 0 1 −1 0; 0 0 0 1; 0 0 0 0; 0 0 0 0].   (5.2.10)
So the solution of Aᵀx = 0 is
    x1 = −3x3,   x2 = x3,   x3 free,   x4 = 0,   (5.2.11)
or
    x = x3 (−3, 1, 1, 0)ᵀ.   (5.2.12)
This finally gives us
    N(Aᵀ) = span{ (−3, 1, 1, 0)ᵀ } ⊂ R⁴.   (5.2.13)
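These spanning sets are easy to verify numerically; here is a minimal NumPy sketch of the checks:

import numpy as np

A = np.array([[1, 2, 1, 3, 3],
              [2, 4, 0, 4, 4],
              [1, 2, 3, 5, 5],
              [2, 4, 0, 4, 2]], float)
N = np.array([[-2, 1, 0, 0, 0],
              [-2, 0, -1, 1, 0]], float).T     # basis of N(A) found above
LN = np.array([[-3, 1, 1, 0]], float).T        # basis of N(A^T) found above

assert np.allclose(A @ N, 0)                   # A x = 0 on N(A)
assert np.allclose(A.T @ LN, 0)                # A^T x = 0 on N(A^T)
print(np.linalg.matrix_rank(A))                # r = 3, so dim N(A) = 5 - 3 = 2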
So the dimension of the column space (range) of A is
    dim(R(A)) = r,   (5.2.14)
which is the rank of A. The dimensions of the other spaces are
    dim(N(A)) = n − r,   (5.2.15)
    dim(R(Aᵀ)) = r,   (5.2.16)
    dim(N(Aᵀ)) = m − r.   (5.2.17)
An alternative way to find the left null space of A, that is,
    N(Aᵀ) = {x : xᵀA = 0ᵀ},   (5.2.18)
is to use PA = EA, where EA has r nonzero rows b1, . . . , br followed by m − r zero rows:
    PA = [— b1 —; ⋮; — br —; — 0 —; ⋮].   (5.2.19)
Split P into blocks,
    P = [P1; P2],   (5.2.20)
so that
    PA = [P1; P2] A = [P1A; P2A].   (5.2.21)
We know that P2A = 0. So we claim that the rows of P2 span the left null space of A, i.e.
    R(P2ᵀ) = N(Aᵀ).   (5.2.22)
5.3 Lecture 16: September 25, 2013
Dr. Nitsche is out of town October 18 and the Wednesday before Thanksgiving; we may have to arrange alternate class times.
The Four Subspaces of A
To recall what we discussed last class,
• R(A) is the range of A, or the column space. It has dimension r.
• N(A) = {x : Ax = 0} is the null space of A. It has dimension n − r.
• R(Aᵀ) = {Aᵀy} is the row space of A. It has dimension r.
• N(Aᵀ) = {x : Aᵀx = 0} = {x : xᵀA = 0ᵀ} is the left null space of A. It has dimension m − r.
Returning to the manipulation A → EA with PA = EA, where P_{m×m} is invertible,
    [P1; P2] A = [P1A; P2A] = [B1; 0],   (5.3.1)
where P2A = 0.

Theorem 5.9.
    N(Aᵀ) = R(P2ᵀ),   (5.3.2)
where the right-hand side is the row space of P2.
Proof. (⊇) Assume y ∈ R(P2ᵀ). Then y = P2ᵀx for some x. Reformulating, yᵀ = xᵀP2, so yᵀA = xᵀP2A = xᵀ0 = 0ᵀ, which gives y ∈ N(Aᵀ).
(⊆) Assume y ∈ N(Aᵀ), so yᵀA = 0ᵀ. Write Q = P⁻¹ = [Q1 | Q2] and EA = [U; 0], where U_{r×n} has full row rank. Then
    0ᵀ = yᵀA = yᵀP⁻¹EA = yᵀ[Q1 | Q2] [U; 0] = (yᵀQ1)U,
and since U has full row rank, yᵀQ1 = 0ᵀ. We know that QP = I:
    [Q1 | Q2] [P1; P2] = I,   so   Q1P1 + Q2P2 = I,   Q1P1 = I − Q2P2.
This gives 0ᵀ = yᵀQ1P1 = yᵀ(I − Q2P2), so yᵀ = yᵀQ2P2, and hence
    y = P2ᵀ(Q2ᵀy) ∈ R(P2ᵀ).   (5.3.3)
□
As an example, augment A with I and row reduce:
    [1 2 1 3 3 | 1 0 0 0     [1 2 0 2 0 | 0 −1/2  0    1
     2 4 0 4 4 | 0 1 0 0  →   0 0 1 1 0 | 0 −2/3  1/3  1/2
     1 2 3 5 5 | 0 0 1 0      0 0 0 0 1 | 0  1/2  0   −1/2     (5.3.4)
     2 4 0 4 2 | 0 0 0 1]     0 0 0 0 0 | 1 −1/3 −1/3  0]
Note that N(Aᵀ) is orthogonal to R(A). We also find from this manipulation that
    R(A) = span{ (1, 2, 1, 2)ᵀ, (1, 0, 3, 0)ᵀ, (3, 4, 5, 2)ᵀ }   (5.3.5)
and
    N(Aᵀ) = R(P2ᵀ) = span{ (1, −1/3, −1/3, 0)ᵀ } = span{ (3, −1, −1, 0)ᵀ }.   (5.3.6)
Linear Independence
Definition 5.10. A set {v1, . . . , vn} is linearly independent if α1v1 + · · · + αnvn = 0 implies α1 = · · · = αn = 0. From this we get the equivalent statements:
• {v1, . . . , vn} is linearly independent;
• A = [v1 ⋯ vn] has full column rank, r = n;
• N(A) = {α : Aα = 0} = {0}.
For example, the polynomial set {1, x, x², . . . , xⁿ} is linearly independent, because c0 + c1x + c2x² + · · · + cnxⁿ = 0 (as a function) implies c0 = · · · = cn = 0. By contrast, the zero set {0} is not linearly independent, since α0 = 0 for any α ≠ 0; any set containing 0, e.g. {v1, . . . , vn, 0}, is linearly dependent.
Another example is any set of distinct unit vectors {e_{i1}, e_{i2}, . . . , e_{in}}, where ei ∈ Rᵐ and n ≤ m. Such a set is linearly independent; for instance,
    A = [0 0 1; 0 0 0; 1 0 0; 0 1 0; 0 0 0].   (5.3.7)
We take as another example the Vandermonde matrix, which has applications in polynomial interpolation. Let x1, . . . , xm be distinct real numbers and
    A = [1 x1 x1² ⋯ x1^{n−1}; 1 x2 x2² ⋯ x2^{n−1}; ⋮; 1 xm xm² ⋯ xm^{n−1}],   (5.3.8)
where n ≤ m. Then Ac = y, with c = (c0 ⋯ c_{n−1})ᵀ, expresses the interpolation conditions p(xk) = yk for the polynomial p(x) = c0 + c1x + ⋯ + c_{n−1}x^{n−1}; a solution of Ac = y gives a polynomial that interpolates the points (xk, yk). If Ac = 0, then p has m distinct roots x1, . . . , xm; but a nonzero polynomial of degree n − 1 can have at most n − 1 distinct roots, and m > n − 1. So p ≡ 0, and therefore c = 0: the columns of A are linearly independent.
Figure 5.2. Interpolating system: a curve through the points (x1, y1), . . . , (xk, yk), . . . , (xn, yn).
5.4 Lecture 17: September 27, 2013
Linear functions (review)

Is f linear? Here it was good to find the formula; some cases could be done by inspection. We should also check f(p1 + p2) = f(p1) + f(p2) and f(αp) = αf(p). Let's find the matrices of some of these functions. For the flipping (reflection) function,
    f(x, y) = (x, −y)ᵀ = [1 0; 0 −1] (x, y)ᵀ.   (5.4.1)
For the projection onto the line y = x,
    f(x, y) = ( (x + y)/2, (x + y)/2 )ᵀ = [1/2 1/2; 1/2 1/2] (x, y)ᵀ.   (5.4.2)
For the rotation, write x = r cos ψ and y = r sin ψ, and denote the rotated point with primes: x′ = r cos(ψ + θ), y′ = r sin(ψ + θ). Using the angle-addition identities, x′ = r(cos ψ cos θ − sin ψ sin θ) = x cos θ − y sin θ and y′ = r(sin ψ cos θ + cos ψ sin θ) = y cos θ + x sin θ. This gives the function
    f(x, y) = (x′, y′)ᵀ = [cos θ −sin θ; sin θ cos θ] (x, y)ᵀ.   (5.4.3)
Note this is an orthogonal matrix with determinant equal to 1.
Review for exam
Anything on the first three homeworks is fair game. We have been doing computations of the LU, PLU, REF, and RREF factorizations; we have solved Ax = b and written systems of linear equations in matrix form. We have talked about the elementary matrices, the process of premultiplication, and their invertibility.
We have also discussed some proofs, especially this last one. We showed these major identities: tr(AB) = tr(BA), (AB)ᵀ = BᵀAᵀ, (AB)⁻¹ = B⁻¹A⁻¹, (A⁻¹)ᵀ = (Aᵀ)⁻¹ = A⁻ᵀ. Similarly, we have shown that the LU decomposition exists if all principal submatrices are invertible, and the relation (I − A)⁻¹ = ∑_{k=0}^∞ Aᵏ if Aᵏ → 0. We also discussed (A + B)⁻¹ with perturbation matrices. Finally, we discussed rank-one matrices, so we need to know the Sherman–Morrison formula for (I + uvᵀ)⁻¹.
Previous lecture continued
Comment on previous lecture:
A = [1 x1 x1² ⋯ x1^{n−1}; 1 x2 x2² ⋯ x2^{n−1}; ⋮; 1 xm xm² ⋯ xm^{n−1}]_{m×n}.   (5.4.4)
The system Ac = y is equivalent to p(xi) = yi, i = 1, . . . , m, that is, c0 + c1xi + c2xi² + · · · + c_{n−1}xi^{n−1} = yi for i = 1, . . . , m: a linear system in the coefficients ck, with m ≥ n. In terms of vectors, the columns
    (1, . . . , 1)ᵀ, (x1, . . . , xm)ᵀ, (x1², . . . , xm²)ᵀ, . . . , (x1^{n−1}, . . . , xm^{n−1})ᵀ   (5.4.5)
are linearly independent, so rank(A) = n. To show the independence, set up the system
    c0 (1, . . . , 1)ᵀ + c1 (x1, . . . , xm)ᵀ + c2 (x1², . . . , xm²)ᵀ + · · · + c_{n−1} (x1^{n−1}, . . . , xm^{n−1})ᵀ = (0, . . . , 0)ᵀ.   (5.4.6)
This says the polynomial p(x) = c0 + c1x + · · · + c_{n−1}x^{n−1} has at least m distinct roots, but p ∈ P_{n−1} has at most n − 1 roots; we know this by the fundamental theorem of algebra. Since m > n − 1, the polynomial must be identically the zero polynomial, p ≡ 0, and ck = 0 for all k.
So suppose we want to interpolate with a polynomial p(x) ∈ P_{n−1}: we set up p(xi) = yi for i = 1, . . . , m. If m = n we have a unique solution to the interpolation problem. If instead m > n, the system is overdetermined and has either no solution or exactly one. We defined the span of a set as the set of all linear combinations over the field: span{v1, . . . , vn} = {∑ ckvk, ck ∈ R}. A basis for a vector space V is a set {v1, . . . , vk} that spans V and is linearly independent. We also know that the basis for {0} is the empty set ∅; thus, for convenience, we define span{∅} = {0}.
Theorem 5.11. If {v1, . . . , vn} is a basis of V, then any set {u1, . . . , um} ⊂ V with m > n is linearly dependent.
5.5 Lecture 18: October 2, 2013
Exams and Points
We decided that we will have three exams total, but only the best two will each count for20% of our semester grade. Homework will be worth 60%. Lecture notes will be postedonline.
Continuation of last lecture
Theorem 5.12. If {u1, . . . , un} spans V and S = {v1, . . . , vm} ⊂ V with m > n, then S is linearly dependent.
Proof. Consider ∑_{i=1}^m αivi = 0. Using vi = ∑_{j=1}^n cijuj,
    ∑_{i=1}^m αi ∑_{j=1}^n cijuj = 0,   (5.5.1a)
    ∑_{j=1}^n ( ∑_{i=1}^m αicij ) uj = 0,   (5.5.1b)
where the inner sum is (Cᵀα)j. Since the system Cᵀ_{n×m} α = 0 has m − n > 0 free variables, there exists α ≠ 0 such that Cᵀα = 0. Then (Cᵀα)j = 0 for every j, so ∑i αivi = 0 with not all αi zero. □
Definition 5.13. A basis of V is a linearly independent spanning set of V .
Theorem 5.14. Any two bases have the same number of elements.
Equivalent characterizations of basis,
• linearly independent spanning set
• minimal spanning set
• maximal linearly independent subset of V.
Definition 5.15. dim(V) is equal to the number of elements in the basis.
Recalling the four subspaces for a matrix,
A_{m×n} = [a1 a2 ⋯ an],   (5.5.2)
• R(A) ⊂ Rᵐ, dim = r;
• N(A) ⊂ Rⁿ, dim = n − r;
• R(Aᵀ) ⊂ Rⁿ, dim = r;
• N(Aᵀ) ⊂ Rᵐ, dim = m − r.
Definition 5.16. If X and Y are two subspaces of V then
X + Y = {x + y,x ∈ X ,y ∈ Y} . (5.5.3)
Is X + Y a subspace? We shall illustrate this in two parts
1. Given z ∈ X + Y , is αz ∈ X + Y?
If this is the case, z = x + y and αz = αx + αy ∈ X + Y , where we recalled that thevectors x and y are within their respective sets.
2. Given z1, z2 ∈ X + Y , is z1 + z2 ∈ X + Y?
Here we substitute for the summed vectors of each of the z vectors, (x1+y1)+(x2+y2) =(x1 + x2) + (y1 + y2) ∈ X + Y .
Theorem 5.17. dim(X + Y) = dim(X ) + dim(Y)− dim(X ∩ Y).
Proof. Let B_{X∩Y} = {z1, . . . , zk} be a basis for X ∩ Y. Then we can extend the set to bases for X and Y:
    B_X = {z1, . . . , zk, x1, . . . , xn},   (5.5.4a)
    B_Y = {z1, . . . , zk, y1, . . . , ym}.   (5.5.4b)
We claim that S = {z1, . . . , zk, x1, . . . , xn, y1, . . . , ym} = B_{X+Y}. Does S span X + Y? Let z ∈ X + Y; then z = x + y with x ∈ X and y ∈ Y, so
    z = ( ∑i αizi + ∑i βixi ) + ( ∑i αi′zi + ∑i γiyi )   (5.5.5a)
      = ∑i (αi + αi′) zi + ∑i βixi + ∑i γiyi ∈ span(S).   (5.5.5b)
Is S linearly independent? Consider
    ∑ αizi + ∑ βixi + ∑ γiyi = 0.   (5.5.6a)
Then ∑ γiyi = −( ∑ αizi + ∑ βixi ) ∈ X, while the left side lies in Y, so ∑ γiyi ∈ X ∩ Y, say ∑ γiyi = ∑ δizi, i.e.
    ∑ γiyi − ∑ δizi = 0.   (5.5.6b)
Since B_Y is linearly independent, this indicates γi = δi ≡ 0. The remaining relation
    ∑ αizi + ∑ βixi = 0,   (5.5.6c)
with B_X linearly independent, indicates αi = βi ≡ 0. □
From our example, the four subspaces were spanned by
    R(A) = span{ (1, 2, 1, 2)ᵀ, (1, 0, 3, 0)ᵀ, (3, 4, 5, 2)ᵀ } ⊂ R⁴,   (5.5.7a)
    N(A) = span{ (−2, 1, 0, 0, 0)ᵀ, (−2, 0, −1, 1, 0)ᵀ } ⊂ R⁵,   (5.5.7b)
    R(Aᵀ) = span{ (1, 2, 0, 2, 0)ᵀ, (0, 0, 1, 1, 0)ᵀ, (0, 0, 0, 0, 1)ᵀ } ⊂ R⁵,   (5.5.7c)
    N(Aᵀ) = span{ (3, −1, −1, 0)ᵀ } ⊂ R⁴.   (5.5.7d)
Theorem 5.18. (a) R(A) is orthogonal to N(Aᵀ), and (b) R(A) ∩ N(Aᵀ) = {0}.
This means R(A) + N(Aᵀ) = Rᵐ and R(Aᵀ) + N(A) = Rⁿ: any A_{m×n} gives an orthogonal decomposition of Rⁿ and of Rᵐ.

Proof. (a) Let y ∈ R(A), so y = Az for some z, and let x ∈ N(Aᵀ), so Aᵀx = 0 and hence xᵀA = 0ᵀ. Then xᵀy = xᵀAz = 0, so x is orthogonal to y; therefore R(A) ⊥ N(Aᵀ).
(b) If x ∈ R(A) and x ∈ N(Aᵀ), then by (a) xᵀx = 0, which implies x = 0. □
UNIT 6
Least Squares
6.1 Lecture 19: October 4, 2013
Least Squares
We will now be covering the concept of least squares. Given an equation Ax = b, we may multiply by the transpose of the matrix to obtain the least squares (normal) equations AᵀAx = Aᵀb. We will show that this system is consistent even if Ax = b is inconsistent.
Previously we showed,
Theorem 6.1. dim(X + Y) = dim(X ) + dim(Y) − dim(X ∩ Y), where X ,Y are subspacesof V.
We now consider,
Theorem 6.2. Given conformable matrices A and B,
    rank(A + B) ≤ rank(A) + rank(B),   (6.1.1)
where rank(·) = dim(R(·)).

Proof. R(A + B) ⊂ R(A) + R(B), since if y ∈ R(A + B) then
    y = (A + B)x = Ax + Bx ∈ R(A) + R(B).   (6.1.2)
Further,
    dim(R(A + B)) ≤ dim(R(A) + R(B))   (6.1.3a)
                  = dim(R(A)) + dim(R(B)) − dim(R(A) ∩ R(B))   (6.1.3b)
                  ≤ dim(R(A)) + dim(R(B))   (6.1.3c)
                  = rank(A) + rank(B). □   (6.1.3d)
Theorem 6.3. rank(AB) = rank(B)− dim(N(A) ∩ R(B))
Proof. Let S = {x1, . . . , xs} be a basis of N(A) ∩ R(B). Since N(A) ∩ R(B) ⊂ R(B), we can extend S to a basis for R(B),
    B_{R(B)} = {x1, . . . , xs, z1, . . . , zt},   (6.1.4)
so rank(B) = s + t. To prove dim(R(AB)) = t, we claim S1 = {Az1, . . . , Azt} is a basis for R(AB). First we show that it spans. Let b ∈ R(AB), so b = ABy for some y, where By ∈ R(B). Then
    b = A( ∑i αixi + ∑i βizi )   (6.1.5a)
      = ∑i αi Axi + ∑i βi Azi,   with Axi = 0 since xi ∈ N(A),   (6.1.5b)
      = ∑i βi Azi ∈ span(S1).   (6.1.5c)
Next we show that S1 is linearly independent. Suppose ∑i αiAzi = 0. Rearranging, A(∑i αizi) = 0, and ∑i αizi ∈ N(A) ∩ R(B) since zi ∈ R(B). Thus ∑i αizi = ∑i βixi for some βi, i.e. ∑i αizi − ∑i βixi = 0. Therefore αi = βi = 0, since {zi, xi} is linearly independent. □
Theorem 6.4. Given matrices Am×n and Bn×p, then
rank(A) + rank(B)− n ≤ rank(AB) ≤ min(rank(A), rank(B)) (6.1.6)
Proof. We consider the right inequality first. By the previous theorem, rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≤ rank(B). Also rank(AB) = rank((AB)ᵀ) = rank(BᵀAᵀ) ≤ rank(Aᵀ) = rank(A).
For the left inequality, N(A) ∩ R(B) ⊂ N(A), so dim(N(A) ∩ R(B)) ≤ dim(N(A)) = n − rank(A). Hence rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≥ rank(B) − (n − rank(A)). □
Theorem 6.5. (1) rank(AᵀA) = rank(A) and rank(AAᵀ) = rank(Aᵀ) = rank(A).
(2) R(AᵀA) = R(Aᵀ) and R(AAᵀ) = R(A).
(3) N(AᵀA) = N(A) and N(AAᵀ) = N(Aᵀ).
Proof. For part (1), rank(AᵀA) = rank(A) − dim(N(Aᵀ) ∩ R(A)); but N(Aᵀ) ⊥ R(A), so N(Aᵀ) ∩ R(A) = {0}. Indeed, if x ∈ N(Aᵀ) and x ∈ R(A), then Aᵀx = 0 and x = Ay for some y, which gives xᵀx = (Ay)ᵀx = yᵀAᵀx = 0, so x = 0. Hence dim(N(Aᵀ) ∩ R(A)) = 0.
To be continued. . . □
6.2 Lecture 20: October 7, 2013
We will have two weeks for the next homework.
Properties of Transpose Multiplication
In review we covered the following theorems last time:
Theorem 6.6. dim(X + Y) = dim(X ) + dim(Y) − dim(X ∩ Y), where X ,Y are subspacesof V.
We also had the theorem,
Theorem 6.7. rank(AB) = rank(B)− dim(N(A) ∩ R(B))
And finally we showed the rank inequality,
Theorem 6.8. rank(A) + rank(B) − n ≤ rank(AB) ≤ min(rank(A), rank(B)).
We left off at the theorem covering multiplication relations and the rank and dimensionsof the matrix,
Theorem 6.9. (1) rank(AᵀA) = rank(A) and rank(AAᵀ) = rank(Aᵀ) = rank(A).
(2) R(AᵀA) = R(Aᵀ) and R(AAᵀ) = R(A).
(3) N(AᵀA) = N(A) and N(AAᵀ) = N(Aᵀ).
We proved the first one using the third of the theorems above. We now prove the secondand third parts of this theorem.
Proof. For part (2), let y ∈ R(AᵀA); then y = AᵀAx for some x, so y = Aᵀz with z = Ax, and y ∈ R(Aᵀ). So R(AᵀA) ⊂ R(Aᵀ); since dim(R(AᵀA)) = dim(R(Aᵀ)) by part (1), the two spaces are equal, R(AᵀA) = R(Aᵀ). (A basis of R(AᵀA) extends to a basis of R(Aᵀ), but since both have the same number of elements, the bases coincide.)
For part (3), we show one null space is contained in the other and then compare dimensions. Let x ∈ N(A); then Ax = 0, so AᵀAx = 0 and x ∈ N(AᵀA), giving N(A) ⊂ N(AᵀA). But dim(N(A)) = n − r and dim(N(AᵀA)) = n − r as well, so the two sets must be the same: N(A) = N(AᵀA). □
The Normal Equations
Definition 6.10. The normal equations for a system Ax = b is
AᵀAx = Aᵀb.   (6.2.1)
Theorem 6.11. For any A, AᵀAx = Aᵀb is consistent.
Proof. The right-hand side Aᵀb lies in R(Aᵀ) = R(AᵀA) by the previous theorem, so there exists x with AᵀAx = Aᵀb. □
Note: the solution to the normal equation is unique when rank(A) = n.
Example 6.12. Fit the data (xi, yi), i = 1, . . . , m, by a polynomial of degree 2, p(x) = c0 + c1x + c2x², where m > 3. The problem to solve is p(xi) = yi, that is, c0 + c1xi + c2xi² = yi, i = 1, . . . , m, which is linear in the unknowns c0, c1, c2. In matrix form,
    [1 x1 x1²; 1 x2 x2²; ⋮; 1 xm xm²] (c0, c1, c2)ᵀ = (y1, y2, . . . , ym)ᵀ,   (6.2.2)
or Ac = y, where
    Ac = (p(x1), p(x2), . . . , p(xm))ᵀ.   (6.2.3)
What is the rank of A? Any 3 × 3 submatrix built from three distinct nodes is a Vandermonde matrix, hence invertible (Ac = 0 implies c = 0), so
    rank(A) = 3   (6.2.4)
(full column rank), and AᵀAc = Aᵀy has a unique solution.
To solve the normal equations, form
    [1 1 ⋯ 1; x1 x2 ⋯ xm; x1² x2² ⋯ xm²] [1 x1 x1²; 1 x2 x2²; ⋮; 1 xm xm²] (c0, c1, c2)ᵀ
        = [1 1 ⋯ 1; x1 x2 ⋯ xm; x1² x2² ⋯ xm²] (y1, y2, . . . , ym)ᵀ,   (6.2.5a)
that is,
    [m ∑xi ∑xi²; ∑xi ∑xi² ∑xi³; ∑xi² ∑xi³ ∑xi⁴] (c0, c1, c2)ᵀ = (∑yi, ∑xiyi, ∑xi²yi)ᵀ.   (6.2.5b)
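A minimal NumPy sketch of this quadratic fit (the synthetic data below are our own; the normal-equation solution is compared against NumPy's least squares solver):

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 20)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.05 * rng.standard_normal(x.size)

A = np.column_stack([np.ones_like(x), x, x**2])   # m-by-3, rank 3
c_normal = np.linalg.solve(A.T @ A, A.T @ y)      # normal equations
c_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)   # SVD-based solver

assert np.allclose(c_normal, c_lstsq)
print(c_normal)                                   # roughly (1, -2, 3)

In floating point the normal equations square the condition number (κ(AᵀA) = κ(A)² for full-rank A), which is why library routines prefer QR or SVD; for well-conditioned problems the two answers agree.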
Suggestion: keep an outline of the major proofs we have shown in class in mind; go back and review them.
Theorem 6.13. The solution x of AᵀAx = Aᵀb minimizes ‖Ax − b‖2² = (Ax − b)ᵀ(Ax − b).
Figure 6.1. Minimization of distance between a point b and the plane of vectors Ax.
Figure 6.2. Parabolic fitting by least squares.
By corollary this is an if-and-only-if statement: every solution of the normal equations minimizes the sum of the squares of the entries of the residual vector, ∑_{i=1}^m (Ax − b)i². Note here ‖x‖2² = xᵀx = ∑ xi². We illustrate this in Figure 6.1, where the minimal line connecting a point to a plane is shown.

Example 6.14. What does the solution to the normal equations minimize in our example? The solution c0, c1, c2 minimizes
    ∑_{i=1}^m (Ac − y)i² = ∑ ((Ac)i − yi)² = ∑ (p(xi) − yi)².
We can visualize this parabolic least squares fit as shown in Figure 6.2.
Exam 1
We had a range from 36–98, with a median of 66. For this exam: 70–100 is an A-range score, 50–70 is about a B, and below is a C (as long as you are showing involvement in the class). The first two problems went fine; problem four was covered in class, and five was on the homework. We will go over the solution of the sixth problem in class next time.
6.3 Lecture 21: October 9, 2013
We need to hold a couple of classes early because of the missed day next Friday; next Monday and Wednesday we will start at 8:35.
We will review problem 6 from the exam, then finish up least squares, and cover linear independence and finally linear transformations.
Exam Review
We review exam problem 6. Given u, v ∈ Rⁿ: (a) show that A = I + uvᵀ has an inverse of the form A⁻¹ = I + αuvᵀ, and find α.
We check that AA⁻¹ = I:
    AA⁻¹ = (I + uvᵀ)(I + αuvᵀ)   (6.3.1a)
         = I + αuvᵀ + uvᵀ + uvᵀαuvᵀ   (6.3.1b)
         = I + αuvᵀ + uvᵀ + α(vᵀu)uvᵀ   (6.3.1c)
         = I + uvᵀ ( 1 + α(1 + vᵀu) ).   (6.3.1d)
This is equal to I if 1 + α(1 + vᵀu) = 0, i.e. when α = −1/(1 + vᵀu). Thus, the Sherman–Morrison formula is
    (I + uvᵀ)⁻¹ = I − uvᵀ/(1 + vᵀu).   (6.3.2)
For part (b), B = A + αeiejᵀ = A(I + αA⁻¹eiejᵀ), where A is invertible. For the inverse of B:
    B⁻¹ = (I + αA⁻¹eiejᵀ)⁻¹ A⁻¹   (6.3.3a)
        = [ I − αA⁻¹eiejᵀ/(1 + αejᵀA⁻¹ei) ] A⁻¹.   (6.3.3b)
This exists if 1 + αejᵀA⁻¹ei ≠ 0; note ejᵀA⁻¹ei = (A⁻¹)ji, so we can always make α sufficiently small.
Least squares and minimization
Theorem 6.15. x solves AᵀAx = Aᵀb if and only if x minimizes (Ax − b)ᵀ(Ax − b) = ‖Ax − b‖2², where ‖x‖2² = xᵀx = ∑i xi².
Note:
    f(x) = f(x1, x2, . . . , xn)   (6.3.4a)
         = (Ax − b)ᵀ(Ax − b)   (6.3.4b)
         = (xᵀAᵀ − bᵀ)(Ax − b)   (6.3.4c)
         = xᵀAᵀAx − xᵀAᵀb − bᵀAx + bᵀb.   (6.3.4d)
Since a 1 × 1 matrix equals its own transpose, bᵀAx = (bᵀAx)ᵀ = xᵀAᵀb. This reduces the previous result to
    f(x) = xᵀAᵀAx − 2xᵀAᵀb + bᵀb.   (6.3.5)
This is a quadratic form, and the minimum occurs where ∂f/∂xi = 0 for all i.
Proof. To prove the right-to-left direction, suppose x minimizes f(x). Then for each i,
    0 = ∂f/∂xi   (6.3.6a)
      = (∂xᵀ/∂xi) AᵀAx + xᵀAᵀA (∂x/∂xi) − 2 (∂xᵀ/∂xi) Aᵀb   (6.3.6b)
      = 2eiᵀAᵀAx − 2eiᵀAᵀb.   (6.3.6c)
This gives us
    eiᵀAᵀAx = eiᵀAᵀb,   (6.3.7)
i.e.
    (AᵀAx)i = (Aᵀb)i for every i,   (6.3.8)
which is exactly AᵀAx = Aᵀb.
ASIDE: we used the product rule,
    ∂(uv)/∂xi = (∂u/∂xi) v + u (∂v/∂xi).   (6.3.9)
To prove the other direction, suppose that x solves AᵀAx = Aᵀb; we show f(y) ≥ f(x) for any y. Using Aᵀb = AᵀAx,
    f(y) − f(x) = yᵀAᵀAy − 2yᵀAᵀb − (xᵀAᵀAx − 2xᵀAᵀb)   (6.3.10a)
               = yᵀAᵀAy − 2yᵀAᵀAx + xᵀAᵀAx   (6.3.10b)
               = (Ay − Ax)ᵀ(Ay − Ax)   (6.3.10c)
               = ‖A(y − x)‖2²   (6.3.10d)
               ≥ 0.   (6.3.10e)
So any solution of the normal equations minimizes ‖Ax − b‖2². Further, if A has full rank (no nontrivial null space), the inequality is strict for y ≠ x, so we are guaranteed a unique least squares solution x. Finally, if A has a nontrivial null space (r < n), then we have infinitely many least squares solutions. □
In Matlab we can do help \ to find out what solution it returns for underdetermined systems. What does it minimize?
6.4 Homework Assignment 4: Due Monday, October 21, 2013
1. Textbook 4.1.1: Vector spaces, subspaces, fundamental subspaces of a matrix.
Determine which of the following subsets of Rn are in fact subspaces of Rn (n > 2).
(a) {x | xi ≥ 0},
(b) {x | x1 = 0},
(c) {x | x1x2 = 0},
(d) {x | ∑_{j=1}^n xj = 0},
(e) {x | ∑_{j=1}^n xj = 1},
(f) {x | Ax = b, where A_{m×n} ≠ 0 and b_{m×1} ≠ 0}.
2. Textbook 4.1.2
Determine which of the following subsets of Rn×n are in fact subspaces of Rn×n.
(a) The symmetric matrices.
(b) The diagonal matrices.
(c) The nonsingular matrices.
(d) The singular matrices.
(e) The triangular matrices.
(f) The upper-triangular matrices.
(g) All matrices that commute with a given matrix A.
(h) All matrices such that A2 = A.
(i) All matrices such that tr(A) = 0.
3. Textbook 4.1.6
Which of the following are spanning sets for R3?
(a) {(1 1 1)},
(b) {(1 0 0), (0 0 1)},
(c) {(1 0 0), (0 1 0), (0 0 1), (1 1 1)},
(d) {(1 2 1), (2 0 −1), (4 4 1)},
(e) {(1 2 1), (2 0 −1), (4 4 0)}.
4. Textbook 4.1.7
For a vector space V , and for M,N ⊆ V , explain why span(M∪N ) = span(M) +span(N ).
5. Textbook 4.2.1
Determine spanning sets for each of the four fundamental subspaces associated with
A = [1 2 1 1 5; −2 −4 0 4 −2; 1 2 2 4 9].
6. Textbook 4.2.3
Suppose that A is a 3× 3 matrix such that
    R = { (1, 2, 3)ᵀ, (1, −1, 2)ᵀ }   and   N = { (−2, 1, 0)ᵀ }
span R(A) and N(A), respectively. Consider a linear system Ax = b, where
    b = (1, −7, 0)ᵀ.
(a) Explain why Ax = b must be consistent.
(b) Explain why Ax = b cannot have a unique solution.
7. Textbook 4.2.7
If A = [A1; A2] is a square matrix such that N(A1) = R(A2ᵀ), prove that A must be nonsingular.
8. Textbook 4.2.8
Consider a linear system of equations Ax = b for which yᵀb = 0 for every y ∈ N(Aᵀ).Explain why this means the system must be consistent.
9. Textbook 4.3.1(abc): Linear independence, basis.
Determine which of the following sets are linearly independent. For those sets that are linearly dependent, write one of the vectors as a linear combination of the others.

(a) { (1, 2, 3)ᵀ, (2, 1, 0)ᵀ, (1, 5, 9)ᵀ }
(b) { (1 2 3), (0 4 5), (0 0 6), (1 1 1) }
(c) { (3, 2, 1)ᵀ, (1, 0, 0)ᵀ, (2, 1, 0)ᵀ }
10. Textbook 4.3.4
Consider a particular species of wild flower in which each plant has several stems, leaves, and flowers, and for each plant let the following hold:
S = the average stem length (in inches),
L = the average leaf width (in inches),
F = the number of flowers.
Four particular plants are examined, and the information is tabulated in the following matrix:
         S  L  F
    #1 [ 1  1  10
    #2   2  1  12
    #3   2  2  15
    #4   3  2  17 ]
For these four plants, determine whether or not there exists a linear relationship between S, L, and F. In other words, do there exist constants α0, α1, α2, and α3 such that α0 + α1S + α2L + α3F = 0?
11. Textbook 4.3.13
Which of the following sets of functions are linearly independent?
(a) {sin x, cos x, x sin x}.
(b) {eˣ, xeˣ, x²eˣ}.
(c) {sin²x, cos²x, cos 2x}.
12. Textbook 4.4.2
Find a basis for each of the four fundamental subspaces associated with
A = [1 2 0 2 1; 3 6 1 9 6; 2 4 1 7 5].   (6.4.1)
13. Textbook 4.4.8
Let B = {b1, b2, . . . , bn} be a basis for a vector space V. Prove that each v ∈ V can be expressed as a linear combination of the bi's, v = α1b1 + α2b2 + · · · + αnbn, in only one way; i.e., the coordinates αi are unique.
14. Textbook 4.5.5
For A ∈ Rm×n, explain why AᵀA = 0 implies A = 0.
15. Textbook 4.5.8
Is rank(AB) = rank(BA) when both products are defined? Why?
16. Textbook 4.5.14
Prove that if the entries of F_{r×r} satisfy ∑_{j=1}^r |fij| < 1 for each i (i.e., each absolute row sum is < 1), then I + F is nonsingular. Hint: Use the triangle inequality for scalars, |α + β| ≤ |α| + |β|, to show N(I + F) = {0}.
17. Textbook 4.5.18
If A is n× n, prove that the following statements are equivalent:
(a) N(A) = N(A2)
(b) R(A) = R(A2)
(c) R(A) ∩ N(A) = {0}
18. Textbook 4.6.1: Least Squares.
Hooke's law says that the displacement y of an ideal spring is proportional to the force x that is applied, i.e., y = kx for some constant k. Consider a spring in which k is unknown. Various masses are attached, and the resulting displacements shown in the figure are observed. Using these observations, determine the least squares estimate for k.
19. Textbook 4.6.2
Show that the slope of the line that passes through the origin in R² and comes closest in the least squares sense to passing through the points {(x1, y1), (x2, y2), . . . , (xn, yn)} is given by m = ∑i xiyi / ∑i xi².
20. Textbook 4.6.6
After studying a certain type of cancer, a researcher hypothesizes that in the short run the number (y) of malignant cells in a particular tissue grows exponentially with time (t). That is, y = α0 e^{α1 t}. Determine least squares estimates for the parameters α0 and α1 from the researcher's observed data given below.
    t (days):  1   2   3   4   5
    y (cells): 16  27  45  74  122
Hint: What common transformation converts an exponential function into a linear function?
UNIT 7
Linear Transformations
7.1 Lecture 22: October 14, 2013
Theorem 7.1. Given a vector space V, if {u1, . . . , un} spans V and {vi}_{i=1}^m ⊂ V with m > n, then {vi}_{i=1}^m is linearly dependent (there are more vectors than in a spanning set).

Proof. Consider ∑_{i=1}^m αivi = 0. Using vi = ∑_{j=1}^n cijuj,
    ∑_{i=1}^m αi ∑_{j=1}^n cijuj = ∑_{j=1}^n ( ∑_{i=1}^m αicij ) uj = ∑_{j=1}^n (Cᵀα)j uj = 0,
with α = (α1 ⋯ αm)ᵀ. The system Cᵀα = 0 has nonzero solutions α, since there are m − n > 0 free variables. For such an α ≠ 0 we get (Cᵀα)j = 0 for every j and hence ∑ αivi = 0, so the set is linearly dependent. □
Any two bases for V have the same number of elements.
Definition 7.2. Let V be a vector space with basis B = {b1, . . . , bn}. The coordinates of x ∈ V are the scalars c1, . . . , cn such that x = ∑_{j=1}^n cjbj.

Theorem 7.3. The coordinates of x ∈ V with respect to the basis B are unique; we write
    [x]B = (c1, . . . , cn)ᵀ.
Example 7.4. We take as an example a vector x ∈ R³,
    x = (1, 2, 3)ᵀ = 1e1 + 2e2 + 3e3 = ı + 2ȷ + 3k,   (7.1.1)
so with the standard basis S = {e1, . . . , en} of Rⁿ,
    [x]S = (1, 2, 3)ᵀ.   (7.1.2)
We can have another basis for R³:
    B = { (1, 1, 0)ᵀ, (1, 1, 1)ᵀ, (2, 0, 0)ᵀ }.   (7.1.3)
This is linearly independent because the matrix [2 1 1; 0 1 1; 0 0 1], the basis vectors written as columns in a reordered way, is nonsingular. In this basis,
    [x]B = (−1, 3, −1/2)ᵀ.   (7.1.4)
To find (c1, c2, c3) such that
    c1 (1, 1, 0)ᵀ + c2 (1, 1, 1)ᵀ + c3 (2, 0, 0)ᵀ = 1e1 + 2e2 + 3e3,   (7.1.5)
write it in matrix form,
    [1 1 2; 1 1 0; 0 1 0] (c1, c2, c3)ᵀ = (1, 2, 3)ᵀ.   (7.1.6)
Solving for the individual variables: the third equation gives c2 = 3; the second gives c1 = 2 − c2 = −1; the first gives 2c3 = 1 − c1 − c2 = −1, so c3 = −1/2.   (7.1.7)
Summary
• For any vector space, V , there exists a basis B.
• Any x ∈ V is represented uniquely by a tuple of numbers, the coordinates [x]B.
Linear Transformations
Definition 7.5. Given the vector spaces U ,V , a map T : U → V such that,
• T(x + y) = T(x) + T(y)
• T(αx) = αT(x)
is a linear transformation of U → V .
We also recognize that a linear transformation is a linear function on vector spaces.
Definition 7.6. A linear transformation U → U is a linear operator on U .
Our goal now is twofold:
• show that the set of all linear transformations U → V is a vector space, L(U, V);
• find a basis for L(U, V) and the coordinates of any T ∈ L(U, V).
Examples of Linear Functions
Example 7.7. T(x) = A_{m×n}x_{n×1}, so T : Rⁿ → Rᵐ. Examples: rotation A = R(θ), projection, reflection.
Example 7.8. f(x) = ax, f : R → R.
Example 7.9. D(f) = df/dx, D : Pn → P_{n−1}, or D : C¹ → the set of all functions.
Example 7.10. I(f) = ∫_a^b f(x) dx, I : C⁰ → R.
Example 7.11. One final example regarding matrices: T(B_{n×k}) = A_{m×n}B_{n×k}, T : R^{n×k} → R^{m×k}.
Matrix representation of linear transformations
Every linear transformation on finite-dimensional spaces has a matrix representation. Suppose T : U → V, B = {u1, . . . , un} is a basis for U, and B′ = {v1, . . . , vm} is a basis for V. Then the action of T on u ∈ U is
    T(u) = T( ∑_{i=1}^n ξiui )   (7.1.8a)
         = ∑_{i=1}^n ξi T(ui)   (7.1.8b)
         = ∑_{i=1}^n ξi ∑_{j=1}^m αij vj   (7.1.8c)
         = ∑_{i=1}^n ∑_{j=1}^m αij ξi vj,   (7.1.8d)
where the numbers αij, defined by T(ui) = ∑j αij vj, describe the action of T.
Theorem 7.12. The set of all linear transformations T : U → V, denoted L(U, V), is a vector space.

Proof. Given T1, T2 ∈ L(U, V), we have (T1 + T2)(x) = T1x + T2x, so T1 + T2 ∈ L(U, V); further, (αT1)(x) = αT1(x), so αT1 ∈ L(U, V). Some other properties of note: the zero transformation 0x = 0 is in L(U, V); T1 − T1 = 0; etc. □
Theorem 7.13. Given U with basis B = {u1, . . . , un} and V with basis B′ = {v1, . . . , vm}, a basis for L(U, V) is {Bij}, i = 1, . . . , n, j = 1, . . . , m, where Bij : U → V is defined by Bij(u) = ξivj for u = ∑_{k=1}^n ξkuk.
It follows that dim(L(U, V)) = dim(U) dim(V) = nm.

Proof. Let's prove linear independence: consider ∑ ηijBij = 0. Then, applying to uk,
    0 = ( ∑_{ij} ηijBij )(uk)   (7.1.9a)
      = ∑_{ij} ηij Bij(uk)   (7.1.9b)
      = ∑_j ηkj vj,   (7.1.9c)
since Bij(uk) = 0 for i ≠ k and Bkj(uk) = vj (note [uk]B = (0, . . . , 1, . . . , 0)ᵀ with the 1 in the kth position). Since {vj} is linearly independent, it follows that ηkj = 0 for all j and each k. Therefore the Bij are linearly independent. □
7.2 Lecture 23: October 16, 2013
The next major things we are going to try to cover are:
• Basis for L(U, V); coordinates for T ∈ L(U, V)
• Action of T
• Change of coordinates of u ∈ U under change of basis
• Change of coordinates of T ∈ L(U ,V) under change of basis
Basis of a linear transformation
The set of linear transformations is
    L(U, V) = {T : U → V | T is a linear transformation}.   (7.2.1)

Theorem 7.14. Let B = {u1, . . . , un} be a basis for U and B′ = {v1, . . . , vm} a basis for V, and for u = ∑_{k=1}^n ξkuk define Bij : U → V by Bij(u) = ξjvi. Then {Bij} is a basis for L(U, V).
Proof. First, we observe that we have linear independence (shown last time). Second, we check the span. Let T ∈ L(U, V); then
    T(u) = T( ∑j ξjuj ) = ∑j ξj T(uj) = ∑j ξj ∑_{i=1}^m αijvi,   (7.2.2a–c)
where we recognize that T(uj) = ∑_{i=1}^m αijvi. Hence
    T(u) = ∑j ∑_{i=1}^m αij ξjvi = ( ∑j ∑_{i=1}^m αij Bij )(u)   (7.2.2d–e)
for any u. Thus T = ∑j ∑i αijBij, so {Bij} spans L(U, V). It follows that
    [T]BB′ = {αij} = [α11 α12 ⋯ α1n; α21 α22 ⋯ α2n; ⋮; αm1 αm2 ⋯ αmn]   (7.2.3a,b)
           = ( [T(u1)]B′  [T(u2)]B′  ⋯  [T(un)]B′ ).   (7.2.3c)
□
If T : U → U is a linear operator mapping a space to itself, we abbreviate [T]BB = [T]B for convenience.
Example 7.15. Let D : Pn → P_{n−1} by D(p) = dp/dx. Our basis is B = {1, x, . . . , xⁿ}, and the target basis is B′ = {1, x, . . . , x^{n−1}}. So,
    [D(1)]B′ = [0]B′ = (0, . . . , 0)ᵀ,   (7.2.4a,b)
    [D(x)]B′ = [1]B′ = (1, 0, . . . , 0)ᵀ,   (7.2.4c,d)
    [D(x²)]B′ = [2x]B′ = (0, 2, 0, . . . , 0)ᵀ,   (7.2.4e,f)
    ⋮
    [D(xⁿ)]B′ = [nx^{n−1}]B′ = (0, . . . , 0, n)ᵀ.   (7.2.4g,h)
This allows us to represent the differentiation operator by the matrix
    [D]BB′ = [0 1 0 0 ⋯ 0; 0 0 2 0 ⋯ 0; 0 0 0 3 ⋱ 0; ⋮ ⋱ ⋱ ⋮; 0 0 0 0 ⋯ n]_{n×(n+1)}.   (7.2.5)
Example 7.16. Let D : Pn → Pn by D(p) = dp/dx. This is the same as the previous example except we add a row of zeros at the bottom, giving a square matrix:
    [D]B = [0 1 0 0 ⋯ 0; 0 0 2 0 ⋯ 0; 0 0 0 3 ⋱ 0; ⋮ ⋱ ⋱ n; 0 0 0 0 ⋯ 0]_{(n+1)×(n+1)}.   (7.2.6)
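A small NumPy sketch of this matrix acting on (ascending-order) coefficient vectors; the check against numpy.polynomial's derivative routine is our own addition:

import numpy as np
from numpy.polynomial import polynomial as P

n = 3
D = np.diag(np.arange(1.0, n + 1), k=1)   # (n+1)x(n+1) matrix of (7.2.6)

p = np.array([4.0, 3.0, 2.0, 1.0])        # p(x) = 4 + 3x + 2x^2 + x^3
dp = D @ p                                # (3, 4, 3, 0), i.e. 3 + 4x + 3x^2
assert np.allclose(dp[:-1], P.polyder(p))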
We may do this for any operator; for example, we could do the same for a projection. What we want is a basis that gives a nice representation of the operator; highly sparse representations are nice.
Action of a linear transformation

The action of T : U → V. Recall,
    T(u) = T( ∑_{j=1}^n ξjuj )   (7.2.7a)
         = ∑_{j=1}^n ξj T(uj)   (7.2.7b)
         = ∑_{j=1}^n ξj ∑_{i=1}^m αijvi   (7.2.7c)
         = ∑_{i=1}^m ( ∑_{j=1}^n αijξj ) vi = ∑_{i=1}^m [Aξ]i vi.   (7.2.7d)
This gives us the coordinates in the V basis:
    [T(u)]B′ = Aξ = [T]BB′ [u]B.   (7.2.8)
Thus the action is represented by matrix multiplication. Now return to our example.
Example 7.17. Let D : Pn → P_{n−1} by D(p) = dp/dx, with bases B = {1, x, . . . , xⁿ} and B′ = {1, x, . . . , x^{n−1}}. If we consider p(x) = α0 + α1x + · · · + αnxⁿ, then D(p(x)) = α1 + 2α2x + · · · + nαnx^{n−1}. This gives the vector representation
    [D(p)]B′ = (α1, 2α2, 3α3, . . . , nαn)ᵀ = [0 1 0 0 ⋯ 0; 0 0 2 0 ⋯ 0; 0 0 0 3 ⋱ 0; ⋮ ⋱ ⋱ ⋮; 0 0 0 0 ⋯ n] (α0, α1, α2, α3, . . . , αn)ᵀ.   (7.2.9)
It follows that [L + T]BB′ = [L]BB′ + [T]BB′ and [αL]BB′ = α[L]BB′. We may also consider the composition of linear operators: with L(T(x)) = (LT)(x), where T : U → V and L : V → W, we have [LT]BB′′ = [L]B′B′′ [T]BB′.
Change of Basis
Suppose we change the coordinates of our system. Given a vector space U, let B = {u1, . . . , un} and B′ = {v1, . . . , vn} be two bases for U. The relation between [u]B and [u]B′ is given by
    [u]B′ = P [u]B.   (7.2.10)
P is called the change of basis matrix from B to B′. Recall [T(u)]B′ = [T]BB′ [u]B; clearly P is this matrix when T = I, i.e. P = [I]BB′. We illustrate with polynomials once more.

Example 7.18. Given U = P2 with bases B = {1, t, t²} and B′ = {1, 1 + t, 1 + t + t²},
    P = [I]BB′ = ( [1]B′  [t]B′  [t²]B′ ) = [1 −1 0; 0 1 −1; 0 0 1].   (7.2.11)
We know this is true for any u. We can find the representation of the polynomial p(t) = 3 + 2t + 4t² in B′:
    [p]B′ = [1 −1 0; 0 1 −1; 0 0 1] (3, 2, 4)ᵀ = (1, −2, 4)ᵀ.   (7.2.12)
Finally, let U be a vector space with bases B = {u1, . . . , un} and B′ = {v1, . . . , vn}, and let T : U → U. We want the relation between [T]B and [T]B′; let P = [I]BB′. We have
    [T(u)]B′ = [T]BB′ [u]B = A [u]B,   (7.2.13)
and furthermore
    [u]B′ = P [u]B,   [T(u)]B′ = P [T(u)]B.   (7.2.14)
So
    P [T(u)]B = A . . .   (7.2.15)
to be continued. . .
Note: No class Friday.
7.3 Lecture 24: October 21, 2013
Change of Basis (cont.)
If we have T : U → U, let U be a vector space with bases B = {u1, . . . , un} and B′ = {v1, . . . , vn}. Recall:
1. A basis for L(U, V) is {Bij : Bij(u) = ξjvi, where u = ∑k ξkuk}, and the coordinates of T are [T] = ( [Tu1]B′ [Tu2]B′ ⋯ [Tun]B′ ).
2. The action of T: [T(u)]B′ = [T]BB′ [u]B.
3. Given x ∈ U with two bases B, B′ for U: [x]B′ = P [x]B, with P = [I]BB′.
4. For T : U → U with two bases B, B′ for U, we want to relate [T]B and [T]B′.
To show property 4,
    [Tu]B′ = [T]B′B′ [u]B′,   [Tu]B = [T]BB [u]B.   (7.3.1)
But also, with P = [I]BB′,
    [Tu]B′ = P [Tu]B,   [u]B′ = P [u]B.   (7.3.2)
So
    P [T]BB [u]B = P [Tu]B = [Tu]B′ = [T]B′B′ P [u]B   (7.3.3)
for every u, and we get
    [T]BB = P⁻¹ [T]B′B′ P,   i.e.   [T]B = P⁻¹ [T]B′ P.   (7.3.4)
The matrix representations of T under different bases are similar.

Definition 7.19. If A = C⁻¹BC for some C, then A and B are similar (A, B, C ∈ R^{n×n}).

Theorem 7.20. Any two similar matrices A, B represent the same linear transformation under two different bases.
Example 7.21. An example illustrating similarity, [T]B = P⁻¹[T]B′P. Let T ∈ L(U, U) be defined by
    Tu = [0 1; −2 3] (x, y)ᵀ,   where u = xu1 + yu2,   (7.3.5)
so
    Tu = (y, −2x + 3y)ᵀ = yu1 + (−2x + 3y)u2.   (7.3.6)
In basis notation,
    [Tu]B = M [u]B.   (7.3.7)
Now consider two bases: S = {e1, e2} and S′ = { (1, 1)ᵀ, (1, 2)ᵀ }. First,
    [T]S = ( [Te1]S  [Te2]S ) = ( (0, −2)ᵀ  (1, 3)ᵀ ) = [0 1; −2 3] = M.   (7.3.8)
Now in the other basis, since T(1, 1)ᵀ = (1, 1)ᵀ and T(1, 2)ᵀ = (2, 4)ᵀ = 2(1, 2)ᵀ,
    [T]S′ = ( [T(1, 1)ᵀ]S′  [T(1, 2)ᵀ]S′ ) = [1 0; 0 2].   (7.3.9)
This helps us by diagonalizing the operator. Now we want to find P:
    P = [I]SS′ = ( [e1]S′  [e2]S′ ) = [2 −1; −1 1].   (7.3.10)
Similarly,
    P⁻¹ = [1 1; 1 2].   (7.3.11)
We can verify this:
    P⁻¹ [T]S′ P = [1 1; 1 2] [1 0; 0 2] [2 −1; −1 1]   (7.3.12a)
                = [1 1; 1 2] [2 −1; −2 2]   (7.3.12b)
                = [0 1; −2 3].   (7.3.12c)
So this checks out.
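The same verification in NumPy, a direct transcription of the example:

import numpy as np

M = np.array([[0., 1.], [-2., 3.]])       # [T] in the standard basis
Q = np.array([[1., 1.], [1., 2.]])        # columns: q1, q2 of S'

assert np.allclose(np.linalg.inv(Q) @ M @ Q, np.diag([1., 2.]))  # [T]_{S'}

P = np.linalg.inv(Q)                      # P = [I]_{S S'}
assert np.allclose(np.linalg.inv(P) @ np.diag([1., 2.]) @ P, M)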
Example 7.22. Let M ∈ L(Rⁿ, Rⁿ) be defined by [M(u)]S = M [u]S, where S is the standard basis. Then
    [M]S = M = ( [Me1]S  [Me2]S  ⋯  [Men]S ),   (7.3.13)
and define S′ = {q1, . . . , qn}. With Q = [I]S′S = ( [q1]S ⋯ [qn]S ) = ( q1 q2 ⋯ qn ),
    [M]S′ = Q⁻¹MQ.   (7.3.14)
Now let A = Q⁻¹BQ with S = {e1, . . . , en} and S′ = {q1, . . . , qn}, and let L(u) = Bu. Then [L]S = B and [I]S′S = Q, so [L]S′ = Q⁻¹BQ.
If T ∈ L(U ,U) and X ⊂ U such that T(X ) ⊂ X where T(X ) = {T(x) such that x ∈ X}then X is an invariant subspace of U under T.
Example 7.23. If (λ, v) is an eigen-pair of A, then
    (A − λI)v = 0,   i.e.   Av = λv,   (7.3.15)
and span{v} is an invariant subspace under A.
7.4 Lecture 25: October 23, 2013
Properties of Special Bases
If B and B′ are bases for U and T : U → U, then we have
    [T]BB′ = ( [T(u1)]B′ ⋯ [T(un)]B′ ),   (7.4.1a)
    [T]B = ( [T(u1)]B ⋯ [T(un)]B ) = P⁻¹ [T]B′ P,   (7.4.1b,c)
with P = [I]BB′. Consider T on Rⁿ, T(x) = Ax, so [T]S = A. Then A = P⁻¹BP for appropriate B and P, with B = [T]B′.
Note: a tuple is an ordered list of numbers.
Now we have two goals:
1. find a basis such that [T]B is simple;
2. find invariant quantities.
Example 7.24. tr(P⁻¹BP) = tr(BPP⁻¹) = tr(B).

Example 7.25. For T : Pn → Pn by T(p) = Dp,
    [T]B = [0 1 0 ⋯ 0; 0 0 2 ⋱ ⋮; ⋮ ⋱ n; 0 ⋯ 0 0],   (7.4.2)
so tr(T) = 0.
Example 7.26. rank(P−1BP) = rank(B)
Example 7.27. A nilpotent operator of index k is N : U → U such that Nᵏ = 0 but N^{k−1} ≠ 0. On the homework we will show that {x, Nx, N²x, . . . , N^{k−1}x} is a basis for Rᵏ, where x is chosen such that N^{k−1}(x) ≠ 0. In this basis,
    [N]B = [0 0 ⋯ 0; 1 0 ⋱ ⋮; ⋮ ⋱ 0 0; 0 ⋯ 1 0] = J.   (7.4.3)
Example 7.28. An idempotent operator E : U → U has the property E² = E; these are projection operators, which return the same answer when applied twice. With the basis
    B = { x1, . . . , xr (a basis of R(E)), y1, . . . , y_{n−r} (a basis of N(E)) },
we get
    [E]B = [I_{r×r} 0; 0 0].   (7.4.4)
Example 7.29. If A has a full set of eigenvectors qj, j = 1, . . . , n, then Aqj = λjqj. With bases S and P = {q1, . . . , qn},
    [I]PS = ( q1 ⋯ qn ) = Q,   [T]P = Q⁻¹ [T]S Q,   i.e.   Λ = Q⁻¹AQ,   (7.4.5)
so
    [T]P = [λ1 0 ⋯ 0; 0 λ2 ⋱ ⋮; ⋮ ⋱ ⋱ 0; 0 ⋯ 0 λn],   where T(x) = Ax.   (7.4.6)
Invariant Subspaces
Let T be a linear operator T : U → U .
Definition 7.30. A subset X ⊂ U is invariant under T if Tx ∈ X for any x ∈ X (i.e. T(X) ⊂ X). The restriction T/X : X → X is then itself a linear operator.
Example 7.31. Given
    T(x) = Ax,   A = [−1 −1 −1 −1; 0 −5 −16 −22; 0 3 10 14; 4 8 12 14],   (7.4.7)
let X = span{q1, q2}, where q1 = (2, −1, 0, 0)ᵀ and q2 = (−1, 2, −1, 0)ᵀ. Show that X is invariant under T. So,
    T(q1) = (−1, 5, −3, 0)ᵀ = q1 + 3q2 ∈ X,   (7.4.8a,b)
    T(q2) = (0, 6, −4, 0)ᵀ = 2q1 + 4q2 ∈ X.   (7.4.8c,d)
So for any x = α1q1 + α2q2, T(α1q1 + α2q2) = α1T(q1) + α2T(q2) ∈ X. Thus, for T : R⁴ → R⁴, the restriction T/X : X → X has
    [T/X]_{q1,q2} = [1 2; 3 4].   (7.4.9)
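A two-line numerical confirmation of the invariance (NumPy):

import numpy as np

A = np.array([[-1., -1., -1., -1.],
              [0., -5., -16., -22.],
              [0., 3., 10., 14.],
              [4., 8., 12., 14.]])
q1 = np.array([2., -1., 0., 0.])
q2 = np.array([-1., 2., -1., 0.])

assert np.allclose(A @ q1, q1 + 3 * q2)      # T(q1) = q1 + 3 q2
assert np.allclose(A @ q2, 2 * q1 + 4 * q2)  # T(q2) = 2 q1 + 4 q2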
Now extend to a basis of R⁴: take P = { q1, q2, e1, e4 }. Then
    [T]P = [1 2 0 x; 3 4 0 x; 0 0 −1 x; 0 0 4 x],   (7.4.10)
so we have gained a block of zeros below [T/X]. The third and fourth columns come from
    T(e1) = Ae1 = (−1, 0, 0, 4)ᵀ = −e1 + 4e4,   (7.4.11a)
    T(e4) = Ae4 = (−1, −22, 14, 14)ᵀ,   (7.4.11b)
expressed in the coordinates of P.
Now suppose X, Y are subspaces of U, both invariant under T (T(X) ⊂ X and T(Y) ⊂ Y), with X + Y = U. Then with B = {x1, . . . , xr, y1, . . . , y_{n−r}},
    [T]B = ( [T(x1)]B ⋯ [T(xr)]B  [T(y1)]B ⋯ [T(y_{n−r})]B )   (7.4.12a)
         = [ [T/X]_{Bx}  0;  0  [T/Y]_{By} ]   (7.4.12b)
         = Q⁻¹AQ.   (7.4.12c)
7.5 Homework Assignment 5: Due Monday, November 4, 2013
1. Explain how we proved in class that, for any A ∈ R^{m×n}, the linear system AᵀAx = Aᵀb is consistent. Do not reproduce all proofs, but outline the train of thought, starting from basic linear algebra facts.
2. For the overdetermined linear system
    [1 2; 1 2; 1 2] x = (1, 1, 2)ᵀ,
(a) Is the matrix A rank-deficient or of full rank? What is the rank of AᵀA?
(b) Find all least squares solutions.
(c) Find the solution that Matlab returns using A\b. Also find the least squares solution of minimum norm. Do they agree?
(d) What criterion does Matlab use to choose a solution? (Use help mldivide to find out.)
3. Textbook 4.7.2: Linear transformations
For A ∈ Rn×n, determine which of the following functions are linear transformations.
(a) T(Xn×n) = AX−XA,
(b) T(xn×1) = Ax + b for b 6= 0,
(c) T(A) = Aᵀ,
(d) T(Xn×n) = (X + Xᵀ) /2.
4. Textbook 4.7.6
For the operator T : R2 → R2 defined by T(x, y) = (x+ y,−2x+ 4y), determine [T]B,
where B is the basis B = { (1, 1)ᵀ, (1, 2)ᵀ }.
5. Textbook 4.7.11
Let P be the projector that maps each point v ∈ R² to its orthogonal projection on the line y = x, as depicted in Figure 4.7.4.

Figure 7.1. Figure 4.7.4 (projection onto the line y = x).

(a) Determine the coordinate matrix of P with respect to the standard basis.
(b) Determine the orthogonal projection of v = (α, β)ᵀ onto the line y = x.
6. Textbook 4.7.13
For P₂ and P₃ (the spaces of polynomials of degree at most two and three, respectively), let S : P₂ → P₃ be the linear transformation defined by S(p) = ∫₀ᵗ p(x) dx. Determine [S]_{BB′}, where B = {1, t, t²} and B′ = {1, t, t², t³}.
7. Textbook 4.8.1: Change of basis
Explain why rank is a similarity invariant.
8. Textbook 4.8.2
Explain why similarity is transitive in the sense that A ∼ B and B ∼ C implies A ∼ C.
9. Textbook 4.8.3
A(x, y, z) = (x+ 2y − z,−y, x+ 7z) is a linear operator on R3.
(a) Determine [A]S , where S is the standard basis.
(b) Determine [A]_{S′} as well as the nonsingular matrix Q such that [A]_{S′} = Q^{−1}[A]_S Q, for S′ = {(1, 0, 0)ᵀ, (1, 1, 0)ᵀ, (1, 1, 1)ᵀ}.
10. Textbook 4.8.11
(a) N is nilpotent of index k when N^k = 0 but N^{k−1} ≠ 0. If N is a nilpotent operator of index n on Rⁿ, and if N^{n−1}(y) ≠ 0, show B = {y, N(y), N²(y), . . . , N^{n−1}(y)} is a basis for Rⁿ, and then demonstrate that

[N]_B = J = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}.
(c) Explain why all n×n nilpotent matrices of index n must have a zero trace and beof rank n− 1.
11. Textbook 4.8.12
E is idempotent when E² = E. For an idempotent operator E on Rⁿ, let X = {x_i}_{i=1}^{r} and Y = {y_i}_{i=1}^{n−r} be bases for R(E) and N(E), respectively.
(a) Prove that B = X ∪ Y is a basis for Rn. Hint: Show Exi = xi and use this todeduce that B is linearly independent.
(b) Show that [E]_B = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.
(c) Explain why two n× n idempotent matrices of the same rank must be similar.
(d) If F is an idempotent matrix, prove that rank(F) = tr(F).
12. Textbook 4.9.3: Invariant subspaces
Let T be the linear operator on R4 defined by
T(x1, x2, x3, x4) = (x1 + x2 + 2x3 − x4, x2 + x4, 2x3 − x4, x3 + x4),
and let X = span {e1, e2} be the subspace that is spanned by the first two unit vectorsin R4.
(a) Explain why X is invariant under T.
(b) Determine [T/X]_{\{e₁,e₂\}}.
(c) Describe the structure of [T]B, where B is any basis obtained from an extension of{e1, e2}.
13. Textbook 4.9.4
Let T and Q be the matrices
T = \begin{pmatrix} −2 & −1 & −5 & −2 \\ −9 & 0 & −8 & −2 \\ 2 & 3 & 11 & 5 \\ 3 & −5 & −13 & −7 \end{pmatrix} and Q = \begin{pmatrix} 1 & 0 & 0 & −1 \\ 1 & 1 & 3 & −4 \\ −2 & 0 & 1 & 0 \\ 3 & −1 & −4 & 3 \end{pmatrix}
(a) Explain why the columns of Q are a basis for R4.
(b) Verify that X = span {Q:1,Q:2} and Y = span {Q:3,Q:4} are each invariant sub-spaces under T.
(c) Describe the structure of Q−1TQ without doing any computation.
(d) Now compute the product Q^{−1}TQ to determine [T/X]_{\{Q_{:1},Q_{:2}\}} and [T/Y]_{\{Q_{:3},Q_{:4}\}}.
14. Textbook 4.9.7
If A is an n × n matrix and λ is a scalar such that (A − λI) is singular (i.e., λ is aneigenvalue), explain why the associated space of eigenvectors N(A−λI) is an invariantsubspace under A.
15. Textbook 4.9.8
Consider the matrix A = \begin{pmatrix} −9 & 4 \\ −24 & 11 \end{pmatrix}.
(a) Determine the eigenvalues of A.
(b) Identify all subspaces of R2 that are invariant under A.
(c) Find a nonsingular matrix Q such that Q−1AQ is a diagonal matrix.
UNIT 8
Norms
8.1 Lecture 26: October 25, 2013
Homework 5 due Friday
Definition of norms
Norm acts on a vector space V over R or C.
Definition 8.1. A norm is a function ‖ · ‖ : V → R, x ↦ ‖x‖, such that
1. ‖x‖ ≥ 0 for any x ∈ V , and ‖x‖ = 0 if and only if x = 0
2. ‖αx‖ = |α|‖x‖
3. ‖x + y‖ ≤ ‖x‖+ ‖y‖
Vector Norms
Some norms:

• ‖x‖₂ = sqrt(Σ_{i=1}^{n} x_i²), the 2-norm or Euclidean norm

• ‖x‖₁ = Σ_{i=1}^{n} |x_i|

• ‖x‖_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p}

• ‖x‖_∞ = max_i |x_i| = lim_{p→∞} ‖x‖_p
The two norm

A unit vector is x/‖x‖, and the unit ball in R² is {x ∈ R² : ‖x‖ ≤ 1}. We illustrate the unit spheres {‖x‖ = 1} for the three primary norms: ‖x‖₂ = 1 gives a circle; ‖x‖₁ = 1, i.e. |x₁| + |x₂| = 1, gives a rhombus; ‖x‖_∞ = 1, i.e. max(|x₁|, |x₂|) = 1, gives a square.
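As a quick numerical illustration (a hedged sketch, not from the lecture), Matlab's built-in norm evaluates all of the norms above; the example vector is an arbitrary choice:

x = [3; -4; 1];
n1 = norm(x,1);      % sum of absolute values = 8
n2 = norm(x,2);      % Euclidean norm = sqrt(26)
ninf = norm(x,inf);  % largest absolute entry = 4
% Theorem 8.2 below: ninf <= n2 <= n1
disp([ninf, n2, n1])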
Theorem 8.2. ‖x‖_∞ ≤ ‖x‖₂ ≤ ‖x‖₁

Proof.

‖x‖_∞ = max_i |x_i| = max_i sqrt(x_i²) = sqrt(x_k²), for some k,
 ≤ sqrt(Σ_{i=1}^{n} x_i²) = ‖x‖₂;   (8.1.1a-e)

‖x‖₂ = sqrt(Σ |x_i|²) ≤ sqrt((Σ |x_i|)²) = ‖x‖₁.   (8.1.1f-h) □
Our goal is now to prove the triangle inequality for the 2-norm. Note that ‖x‖₂² = Σ x_i² = xᵀx, where xᵀy is the standard inner product.
Theorem 8.3. The Cauchy–Schwarz inequality (or CBS): |xᵀy| ≤ ‖x‖₂‖y‖₂.

Proof. Let α = xᵀy / xᵀx; note xᵀy = yᵀx. Then

xᵀ(αx − y) = xᵀ((xᵀy/xᵀx) x − y) = (xᵀy/xᵀx) xᵀx − xᵀy = xᵀy − xᵀy = 0.   (8.1.2)

Further,

0 ≤ ‖αx − y‖₂² = (αx − y)ᵀ(αx − y) = α xᵀ(αx − y) − yᵀ(αx − y)
 = −α yᵀx + yᵀy = −|xᵀy|²/‖x‖₂² + ‖y‖₂².   (8.1.3)

This gives ‖y‖₂² ≥ |xᵀy|²/‖x‖₂², and therefore ‖x‖₂‖y‖₂ ≥ |xᵀy|. □
Theorem 8.4. ‖x + y‖₂ ≤ ‖x‖₂ + ‖y‖₂

Proof.

‖x + y‖₂² = (x + y)ᵀ(x + y) = xᵀx + 2xᵀy + yᵀy
 ≤ ‖x‖₂² + 2|xᵀy| + ‖y‖₂²
 ≤ ‖x‖₂² + 2‖x‖₂‖y‖₂ + ‖y‖₂²   (by CBS)
 = (‖x‖₂ + ‖y‖₂)²,   (8.1.4)

so ‖x + y‖₂ ≤ ‖x‖₂ + ‖y‖₂. □
Matrix Norms
Definition 8.5. A matrix norm is a function ‖ · ‖ : Rn×m → R such that,
1. ‖A‖ ≥ 0 for any A ∈ Rn×m, and ‖A‖ = 0 if and only if A = 0
2. ‖αA‖ = |α|‖A‖
3. ‖A + B‖ ≤ ‖A‖+ ‖B‖
The Frobenius Norm

The Frobenius norm is defined by

‖A‖_F = sqrt(Σ_{i,j} a_{ij}²),   (8.1.5)

or

‖A‖_F² = Σ_i ‖A_{i,:}‖₂² = Σ_j ‖A_{:,j}‖₂² = Σ_j a_jᵀa_j = tr(AᵀA),   (8.1.6)

which gives us a convenient way of expressing this norm.
Induced Norms

Given a vector norm on Rⁿ we may define (where sup is the smallest upper bound)

‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖ = sup_{‖x‖=1} ‖Ax‖.   (8.1.7)

Since the unit sphere is closed and bounded, the sup is attained and may be replaced by a maximum. We can now form ‖A‖₂, ‖A‖₁, and ‖A‖_∞.
8.2 Lecture 27: October 28, 2013
Matrix norms (review)
Definition 8.6. A norm on V
1. ‖A‖ ≥ 0 for any A ∈ Rn×m, and ‖A‖ = 0 if and only if A = 0
2. ‖αA‖ = |α|‖A‖
3. ‖A + B‖ ≤ ‖A‖+ ‖B‖
Frobenius Norm

The Frobenius norm is defined by

‖A‖_F² = Σ_{i,j} |a_{ij}|² = Σ_i ‖A_{i,:}‖₂² = Σ_j ‖A_{:,j}‖₂² = tr(AᵀA) = tr(A*A)   (8.2.1)

for A ∈ C^{n×m}. In the real case A* = Aᵀ.
Properties of the Frobenius norm:

1. ‖Ax‖₂ ≤ ‖A‖_F ‖x‖₂

2. ‖AB‖_F ≤ ‖A‖_F ‖B‖_F
Proof. Property (1), using CBS on each row:

‖Ax‖₂² = Σ_i (Ax)_i² = Σ_i (A_{i,:} x)² ≤ Σ_i ‖A_{i,:}‖₂² ‖x‖₂² = ‖x‖₂² ‖A‖_F².   (8.2.2)

Property (2), applying Property (1) to each column of B:

‖AB‖_F² = Σ_j ‖(AB)_{:,j}‖₂² = Σ_j ‖A B_{:,j}‖₂² ≤ Σ_j ‖A‖_F² ‖B_{:,j}‖₂² = ‖A‖_F² ‖B‖_F². □   (8.2.3)
Example 8.7.

A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}   (8.2.4)

AᵀA = \begin{pmatrix} 1 & 0 \\ 2 & 2 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 2 & 8 \end{pmatrix}.   (8.2.5)

So

‖A‖_F = sqrt(tr(AᵀA)) = sqrt(9) = 3,   (8.2.6)

which may be computed with norm(A,'fro') in Matlab.
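A short Matlab check of this trace identity (a sketch; the numbers follow the example above):

A = [1 2; 0 2];
nF = norm(A,'fro');        % built-in Frobenius norm
nTr = sqrt(trace(A'*A));   % via the trace identity (8.1.6)
disp([nF, nTr])            % both print 3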
Induced Matrix Norms

Definition 8.8. For A ∈ R^{n×m} the induced norm of the matrix is

‖A‖ = max_{x≠0} ‖Ax‖/‖x‖ = max_{‖x‖=1} ‖Ax‖.   (8.2.7)

Example 8.9.

A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}   (8.2.8)

‖A‖₁ = max_{‖x‖₁=1} ‖Ax‖₁ = max_{‖x‖₁=1} Σ_i |(Ax)_i|.   (8.2.9)

This describes how A remaps the vectors x. For example, we may find the images of the corners of the unit rhombus {‖x‖₁ = 1}; the largest 1-norm among the images gives ‖A‖₁ (here 4), though this picture is not the most illuminating. Returning to the ∞-norm,

‖A‖_∞ = max_{‖x‖_∞=1} ‖Ax‖_∞ = max_{‖x‖_∞=1} max_i |(Ax)_i|.   (8.2.10)

Here the corners of the unit square are remapped to a stretched parallelogram. What is the maximum ∞-norm? From the figure, we can see it is 3. Now we are interested in the mapping of the 2-norm unit ball, the unit circle:

‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂ ≈ 2.92.   (8.2.11)
ASIDE: For points on the image of the unit circle,

(Ax)₁² + (Ax)₂² = (a₁₁x₁ + a₁₂x₂)² + (a₂₁x₁ + a₂₂x₂)²
 = (a₁₁² + a₂₁²)x₁² + 2x₁x₂(a₁₁a₁₂ + a₂₁a₂₂) + (a₁₂² + a₂₂²)x₂² = constant,   (8.2.12)

which gives an ellipse.
Theorem 8.10. ‖A‖₁ = max_j Σ_i |a_{ij}|, the maximum column sum, and ‖A‖_∞ = max_i Σ_j |a_{ij}|, the maximum row sum.
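These formulas are easy to test numerically; a small hedged sketch using the example matrix above:

A = [1 2; 0 2];
colSums = sum(abs(A),1);           % column sums: [1 4]
rowSums = sum(abs(A),2);           % row sums: [3; 2]
disp([max(colSums), norm(A,1)])    % both 4
disp([max(rowSums), norm(A,inf)])  % both 3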
Properties

The induced norms of a matrix have similar properties to the Frobenius norm:

1. ‖Ax‖ ≤ ‖A‖‖x‖, since ‖Ax‖/‖x‖ ≤ ‖A‖

2. ‖AB‖ ≤ ‖A‖‖B‖ (will be shown in the homework)

Example 8.11. The induced norm of the identity matrix is 1: ‖I‖ = 1.
Proof (of the 1-norm formula).

‖A‖₁ = max_{‖x‖₁=1} Σ_i |(Ax)_i|
 = max_{‖x‖₁=1} Σ_i |Σ_j a_{ij} x_j|
 ≤ max_{‖x‖₁=1} Σ_i Σ_j |a_{ij}||x_j|
 = max_{‖x‖₁=1} Σ_j |x_j| Σ_i |a_{ij}|
 ≤ max_{‖x‖₁=1} (max_j Σ_i |a_{ij}|) Σ_j |x_j|   (the bracketed factor is independent of x)
 = max_j Σ_i |a_{ij}|,   since Σ_j |x_j| = ‖x‖₁ = 1.   (8.2.13)

Now find an x for which the upper bound is attained. Let k be a column for which Σ_i |a_{ik}| = max_j Σ_i |a_{ij}|, and let x = e_k. Then

‖Ax‖₁ = ‖Ae_k‖₁ = ‖A_{:,k}‖₁ = Σ_i |a_{ik}| = max_j Σ_i |a_{ij}| = upper bound. □   (8.2.14)
Further, ‖A‖₂² = max ‖Ax‖₂² subject to ‖x‖₂² = 1; that is, ‖A‖₂² = max xᵀAᵀAx subject to xᵀx = 1. This calls for Lagrange multipliers: ∇f = λ∇g.
8.3 Lecture 28: October 30, 2013
The 2-norm

Given the 2-norm ‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂, we maximize f(x) = xᵀAᵀAx subject to g(x) = xᵀx = 1, where f : Rⁿ → R. This needs Lagrange multipliers, ∇f = λ∇g, as for a minimization problem. Recall the product rule

∂(UV)/∂x_j = (∂U/∂x_j) V + U (∂V/∂x_j).   (8.3.1)
Lemma 8.12. If B is symmetric, ∇(xᵀBx) = 2Bx.

Note: ∇(xᵀx) = 2x.

Proof. To prove this lemma,

∂/∂x_j (xᵀBx) = (∂xᵀ/∂x_j) Bx + xᵀB (∂x/∂x_j)
 = e_jᵀBx + xᵀBe_j
 = e_jᵀBx + (xᵀBe_j)ᵀ
 = e_jᵀBx + e_jᵀBᵀx   (B = Bᵀ)
 = 2e_jᵀBx = 2(Bx)_j. □   (8.3.2)
�
Proof. Alternatively, we may consider,

∂/∂x_j (Σ_i x_i (Bx)_i) = ∂/∂x_j (Σ_i Σ_k x_i B_{ik} x_k)
 = Σ_k B_{jk} x_k + Σ_i x_i B_{ij}
 = Σ_k B_{jk} x_k + Σ_k B_{kj} x_k
 = Σ_k B_{jk} x_k + Σ_k B_{jk} x_k   (B symmetric)
 = 2(Bx)_j. □   (8.3.3)
So ∇f = λ∇g gives

2AᵀAx = λ · 2x,  i.e.  AᵀAx = λx,   (8.3.4)

and the solution (λ, x) is an eigenpair of AᵀA. Note, for these x, f(x) = xᵀAᵀAx = xᵀλx = λxᵀx = λ. Thus,

max f = λ_max = max_k λ_k,   (8.3.5)

where the λ_k are the eigenvalues of AᵀA. Note further that AᵀA is symmetric, so the eigenvalues are real; and f(x) = ‖Ax‖₂² ≥ 0, so λ_k ≥ 0.
Example 8.13. Given

A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}   (8.3.6)

and

AᵀA = \begin{pmatrix} 1 & 2 \\ 2 & 8 \end{pmatrix}.   (8.3.7)

Then

det(AᵀA − λI) = \begin{vmatrix} 1−λ & 2 \\ 2 & 8−λ \end{vmatrix} = (1−λ)(8−λ) − 4 = λ² − 9λ + 4.   (8.3.8)

So

λ_{1,2} = (9 ± sqrt(81 − 16))/2 = (9 ± sqrt 65)/2,   (8.3.9)

and

λ_max = (9 + sqrt 65)/2.   (8.3.10)

Therefore:

‖A‖₂ = sqrt(λ_max) = sqrt((9 + sqrt 65)/2) ≈ 2.9208 . . .   (8.3.11)
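A quick hedged check of this computation in Matlab:

A = [1 2; 0 2];
lam = eig(A'*A);          % eigenvalues of A'A: (9 -/+ sqrt(65))/2
n2 = sqrt(max(lam));      % 2-norm from lambda_max, approx 2.9208
disp([n2, norm(A,2)])     % agrees with the built-in 2-norm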
Now, ‖x‖_∞ ≤ ‖x‖₂ ≤ ‖x‖₁ for vectors; this inequality does not hold for matrices. Some properties (where UᵀU = I and VᵀV = I):

• ‖A‖₂ = ‖Aᵀ‖₂

• ‖AᵀA‖₂ = ‖A‖₂²

• ∥\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}∥₂ = max(‖A‖₂, ‖B‖₂)

• ‖UᵀAV‖₂ = ‖A‖₂

• ‖A^{−1}‖₂ = 1/sqrt(λ_min(AᵀA))
UNIT 9

Orthogonalization with Projection and Rotation
9.1 Lecture 28 (cont.)
Inner Product Spaces
An inner product space is a vector space V together with an inner product.
Definition 9.1. Given a vector space V , an inner product is a function f : V ×V → R or Cby f(x,y) = 〈x,y〉 such that
• 〈x, y〉 = \overline{〈y, x〉}

• 〈x, αy〉 = α〈x, y〉 (linearity in the second argument; it follows that 〈αx, y〉 = \overline{α}〈x, y〉)
• 〈x + z,y〉 = 〈x,y〉+ 〈z,y〉
• 〈x,x〉 ≥ 0 for any x ∈ V
• 〈x,x〉 = 0 implies x = 0
Example 9.2. 1. 〈x, y〉 = xᵀy with V = Rⁿ, and 〈x, y〉 = x*y with V = Cⁿ, where x* = \overline{x}ᵀ.

2. 〈x, y〉_A = xᵀAᵀAy with V = Rⁿ, and 〈x, y〉_A = x*A*Ay with V = Cⁿ. This gives us a new norm ‖x‖_A = sqrt(xᵀAᵀAx) = ‖Ax‖₂.

3. 〈f, g〉 = ∫_a^b f(x)g(x) dx, V = C⁰[a, b], and ‖f‖ = sqrt(∫_a^b |f(x)|² dx).

4. 〈f, g〉 = ∫_a^b ω(x)f(x)g(x) dx, where ω(x) ≥ 0.

5. 〈A, B〉 = tr(AᵀB) and ‖A‖ = sqrt(tr(AᵀA)) = ‖A‖_F.
9.2 Lecture 29: November 1, 2013
Inner Product Spaces
Reviewing properties of inner product spaces,
• 〈x,y〉 = 〈y,x〉
• 〈x, αy〉 = α 〈x,y〉
• 〈x + z,y〉 = 〈x,y〉+ 〈z,y〉
• 〈x,x〉 ≥ 0 for any x ∈ V
• 〈x,x〉 = 0 implies x = 0
Now we may define norms ‖x‖ = sqrt(〈x, x〉). Let's say we want to define angles between vectors; by the law of cosines, ‖y − x‖² = ‖x‖² + ‖y‖² − 2‖x‖‖y‖ cos θ. Rearranged,

cos θ = (‖x‖² + ‖y‖² − ‖y − x‖²)/(2‖x‖‖y‖)
 = (〈x, x〉 + 〈y, y〉 − 〈y − x, y − x〉)/(2‖x‖‖y‖)
 = (〈y, x〉 + 〈x, y〉)/(2‖x‖‖y‖)
 = 〈x, y〉/(‖x‖‖y‖),   (9.2.1)

the last step only if 〈x, y〉 ∈ R. More generally 〈y, x〉 + 〈x, y〉 = 〈y, x〉 + \overline{〈y, x〉} = 2 Re〈y, x〉, so taking the real part handles the conjugate that would otherwise obstruct defining the angle.
Definition 9.3. The angle between x,y is given by
cos(θ) =〈x,y〉‖x‖‖y‖
. (9.2.2)
So, for x ⊥ y means 〈x,y〉 = 0.
Note: If the inner product is not a real number, then 〈x,y〉 = 0 means ‖x‖2 + ‖y‖2 =‖y − x‖2, but not vice-versa.
Example 9.4.

x = (1, −2, 3, −1)ᵀ and y = (4, 1, −2, −4)ᵀ.

So x ⊥ y in 〈x, y〉 = xᵀy, but x ⊥̸ y in 〈x, y〉_A = xᵀAᵀAy, where

A = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Definition 9.5. A set {u1, . . . ,un} is orthonormal if ‖uk‖ = 1 for any k and 〈uj,uk〉 = 0for any j 6= k.
Fourier Expansion
Given an orthonormal basis {u₁, . . . , u_n} for V we can write x ∈ V as

x = c₁u₁ + c₂u₂ + · · · + c_n u_n   (9.2.3)

with 〈u_j, x〉 = c_j 〈u_j, u_j〉 = c_j.
Example 9.6. The family {(1/√π) sin(kx)}_{k=1}^{n} is orthonormal with respect to the inner product ∫_{−π}^{π} f(x)g(x) dx. How do we compute the normalizing integrals? Use ∫ sin²(kx) dx = ∫ (1 − cos(2kx))/2 dx. So if f ∈ span{sin(kx)} then f = (1/√π) Σ_{k=1}^{n} c_k sin(kx), with c_k = (1/√π) ∫_{−π}^{π} f(x) sin(kx) dx.
In the homework we will approximate a line on [−π, π] with the sine and cosine Fourier series. This is essentially the best 2-norm approximation from the span of the Fourier modes. The Gibbs phenomenon will be observed: the truncated series overshoots the function. Orthonormal bases of this kind are very useful in partial differential equations applications.
Orthogonalization Process (Gram–Schmidt)

Goal: Given a basis {a₁, . . . , a_n}, find an orthonormal basis {u₁, . . . , u_n} for V. This is the orthogonalization process. Method: find u_k such that span{u₁, . . . , u_k} = span{a₁, . . . , a_k} for k = 1, . . . , n. Now let's show the process.

k = 1:

u₁ = a₁/‖a₁‖

k = 2:

u₂ = (a₂ − 〈u₁, a₂〉u₁) / ‖a₂ − 〈u₁, a₂〉u₁‖
To verify the orthogonality of u₁ and u₂, write ℓ₂ = ‖a₂ − 〈u₁, a₂〉u₁‖:

〈u₁, u₂〉 = 〈u₁, (a₂ − 〈u₁, a₂〉u₁)/ℓ₂〉
 = (1/ℓ₂) 〈u₁, a₂ − 〈u₁, a₂〉u₁〉
 = (1/ℓ₂) [〈u₁, a₂〉 − 〈u₁, a₂〉〈u₁, u₁〉]
 = 0,   (9.2.4)

since 〈u₁, u₁〉 = 1.
k = 3: . . .

In general,

u_k = (a_k − 〈u₁, a_k〉u₁ − 〈u₂, a_k〉u₂ − · · · − 〈u_{k−1}, a_k〉u_{k−1}) / ‖a_k − 〈u₁, a_k〉u₁ − · · · − 〈u_{k−1}, a_k〉u_{k−1}‖.

This is the Gram–Schmidt orthogonalization process. If we want, we can write it as

u_k = (I − U_{k−1}U*_{k−1}) a_k / ‖(I − U_{k−1}U*_{k−1}) a_k‖,   (9.2.5)

where U_{k−1} = (u₁ · · · u_{k−1}).

9.3 Lecture 30: November 4, 2013
Gram–Schmidt Orthogonalization

Given a basis {a₁, . . . , a_n}, find an orthonormal basis {u₁, . . . , u_n} that spans the same space. Algorithm:

u₁ = a₁/‖a₁‖,   (9.3.1a)
u₂ = (a₂ − (u₁ᵀa₂)u₁)/ℓ₂,   (9.3.1b)

where the subtracted term is a projection:

〈u₁, a₂〉u₁ = (u₁ᵀa₂)u₁ = u₁u₁ᵀ a₂ = P_∥ a₂.   (9.3.2)
From

u₂ = (a₂ − (u₁ᵀa₂)u₁)/‖a₂ − (u₁ᵀa₂)u₁‖ = (I − u₁u₁ᵀ)a₂ / ‖(I − u₁u₁ᵀ)a₂‖,   (9.3.3)

i.e. u₂ is the normalized projection P_⊥a₂.
Example 9.7. Given the vectors

a₁ = (0, 3, 4)ᵀ, a₂ = (−20, 27, 11)ᵀ, and a₃ = (−14, −4, −2)ᵀ,

we can find the orthonormal vectors. First,

u₁ = (1/5)(0, 3, 4)ᵀ.   (9.3.4)

Then, since 〈u₁, a₂〉 = (1/5)(0 · (−20) + 3 · 27 + 4 · 11) = 25,

v₂ = a₂ − 〈u₁, a₂〉u₁ = (−20, 27, 11)ᵀ − 25 · (1/5)(0, 3, 4)ᵀ = (−20, 12, −9)ᵀ,   (9.3.5)

with ‖v₂‖ = 25. Proceeding the same way with a₃, we obtain

u₁ = (1/5)(0, 3, 4)ᵀ,   (9.3.6a)
u₂ = (1/25)(−20, 12, −9)ᵀ,   (9.3.6b)
u₃ = (1/25)(−15, −16, 12)ᵀ.   (9.3.6c)
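A minimal Matlab sketch of classical Gram–Schmidt applied to this example (hedged: classical GS is shown for clarity, even though, as noted later, it is not the numerically stable choice):

A = [0 -20 -14; 3 27 -4; 4 11 -2];    % columns a1, a2, a3
[m,n] = size(A); U = zeros(m,n);
for k = 1:n
    v = A(:,k);
    for j = 1:k-1
        v = v - (U(:,j)'*A(:,k))*U(:,j);  % subtract projections onto earlier u_j
    end
    U(:,k) = v/norm(v);                   % normalize
end
disp(25*U)   % columns: (0,15,20), (-20,12,-9), (-15,-16,12)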
Now rewriting our system,

u₁ = a₁/ℓ₁,
u₂ = (a₂ − r₁₂u₁)/ℓ₂,
u₃ = (a₃ − r₁₃u₁ − r₂₃u₂)/ℓ₃,
 ⋮
u_n = (a_n − r_{1n}u₁ − r_{2n}u₂ − · · · − r_{n−1,n}u_{n−1})/ℓ_n,   (9.3.7)

where r_{ij} = 〈u_i, a_j〉. Equivalently, in vector form,

a₁ = ℓ₁u₁,
a₂ = r₁₂u₁ + ℓ₂u₂,
a₃ = r₁₃u₁ + r₂₃u₂ + ℓ₃u₃,
 ⋮
a_n = r_{1n}u₁ + r_{2n}u₂ + · · · + r_{n−1,n}u_{n−1} + ℓ_n u_n.   (9.3.8)

We can put this in matrix form if A is of full rank (which requires m ≥ n, since there can be at most m linearly independent vectors a_i in Rᵐ). With A = QR,

(a₁ a₂ · · · a_n)_{m×n} = (u₁ u₂ · · · u_n)_{m×n} \begin{pmatrix} ℓ₁ & r₁₂ & r₁₃ & \cdots & r_{1n} \\ 0 & ℓ₂ & r₂₃ & \cdots & r_{2n} \\ 0 & 0 & ℓ₃ & \ddots & r_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & ℓ_n \end{pmatrix}_{n×n},   (9.3.9)

where r_{ii} = ℓ_i > 0, so R is invertible. This uniquely determines the Fourier coefficients of the Fourier expansion of each a_j in the basis {u₁, . . . , u_n}.
Thus, every matrix A of full rank has a unique decomposition, known as a QR factorization, A_{m×n} = Q_{m×n}R_{n×n}, where R is invertible. What do we know about QᵀQ? (QᵀQ)_{ij} = u_iᵀu_j, which is zero for i ≠ j and one for i = j. So QᵀQ = I_{n×n}: the columns of Q are orthonormal.
Decompositions of A:
• Am×n = Qm×nRn×n, where QᵀQ = I and R is invertible.
• A = LU if |Ak| 6= 0.
• PA = LU always exists.
Now what about QQᵀ? It will be an m×m matrix, but otherwise we know little about it.
Example 9.8. Returning to our example,

\begin{pmatrix} 0 & −20 & −14 \\ 3 & 27 & −4 \\ 4 & 11 & −2 \end{pmatrix} = \begin{pmatrix} 0 & −20/25 & −15/25 \\ 3/5 & 12/25 & −16/25 \\ 4/5 & −9/25 & 12/25 \end{pmatrix} \begin{pmatrix} 5 & 25 & r₁₃ \\ 0 & ℓ₂ & r₂₃ \\ 0 & 0 & ℓ₃ \end{pmatrix}   (9.3.10)

In this case Q has three linearly independent columns and three linearly independent rows, so Qᵀ also has orthonormal columns, and, interestingly, (Qᵀ)ᵀQᵀ = QQᵀ = I. This Q is an orthogonal matrix: it is square, invertible, and has orthonormal columns. In general this is not the case: if m > n, Q is not square and QQᵀ is not necessarily the identity.
Use A = QR:
Example 9.9. Assume A_{n×n} invertible; solve Ax = b. Rewrite

QRx = b,   (9.3.11a)
QᵀQRx = Qᵀb,   (9.3.11b)
Rx = Qᵀb.   (9.3.11c)

This triangular system is quick to solve (once Q and R are known).
Example 9.10. Assume A_{m×n} of full rank with m > n. Then Ax = b is an overdetermined system and the least squares solution satisfies

AᵀAx = Aᵀb,   (9.3.12a)
RᵀQᵀQRx = RᵀQᵀb,   (9.3.12b)
RᵀRx = RᵀQᵀb,   (9.3.12c)
Rx = (Rᵀ)^{−1}RᵀQᵀb,   (9.3.12d)
Rx = Qᵀb.   (9.3.12e)

Go through this derivation and the solutions manual. Then we will see how the SVD can improve things later.
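A hedged Matlab sketch of both uses of the QR factorization (the matrices are illustrative placeholders, not from the lecture):

% Square system: solve Ax = b via QR
A = [4 1; 1 3]; b = [1; 2];
[Q,R] = qr(A);
x = R\(Q'*b);            % same answer as A\b

% Overdetermined least squares via reduced QR
A = [1 0; 1 1; 1 2]; b = [1; 2; 2];
[Q,R] = qr(A,0);         % reduced QR: Q is 3x2, R is 2x2
xls = R\(Q'*b);          % least squares solution via Rx = Q'b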
9.4 Lecture 31: November 6, 2013
In the homework, the reduced QR factorization referred to is the one derived above: we can always write A_{m×n} = Q_{m×n}R_{n×n}, where QᵀQ = I and R_{n×n} is upper triangular. This factorization is unique, but we may also write

QR = (q₁ · · · q_n) \begin{pmatrix} ∗ & ∗ & \cdots & ∗ \\ 0 & ∗ & \ddots & ∗ \\ \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & ∗ \end{pmatrix}   (9.4.1)
since {q₁, . . . , q_n} is an orthonormal basis for R(A) ⊂ Rᵐ. Now, in the full form,

QR = (q₁ · · · q_n q_{n+1} · · · q_m)_{m×m} \begin{pmatrix} ∗ & ∗ & \cdots & ∗ \\ 0 & ∗ & \cdots & ∗ \\ \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & ∗ \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}_{m×n}.   (9.4.2)

In this case, the appended columns q_{n+1}, . . . , q_m, and hence the full QR, are not unique.
Unitary (orthogonal) matrices
The unitary refers to the complex case and the orthogonal refers to the real.
Definition 9.11. A unitary matrix is Q ∈ C^{n×n} such that Q*Q = I. This means Q has n orthonormal columns; additionally, since Q is square, it has n orthonormal rows. The real case is an orthogonal matrix: Q ∈ R^{n×n} with QᵀQ = QQᵀ = I_{n×n}.
Properties
Some properties for a unitary Q:
• Q∗Q = QQ∗ = In×n
• Q−1 = Q∗
• columns are orthonormal
• rows are orthonormal
• (Qx)∗Qy = x∗Q∗Qy = x∗y for any x,y.
Note: ‖Qx‖ = ‖x‖, so Q is an isometry. Also, if U, V are unitary, then UV is unitary, since

(UV)*(UV) = V*U*UV = V*V = I = (UV)(UV)*.   (9.4.3)
Example 9.12. Q in the full QR factorization of any A. In Matlab, [Q,R] = qr(A) gives the full QR, and [Q,R] = qr(A,0) gives the reduced QR.
Now, to compute the QR factorization, the Gram–Schmidt algorithm is not numerically stable: small changes in the input matrix values can cause large changes in the result. One alternative is the modified Gram–Schmidt algorithm, which improves the stability properties; we will not cover it here, but it is discussed in later courses. A better algorithm is to obtain the QR by premultiplying by orthogonal matrices until the matrix is triangular,

Q_n · · · Q₁ A = R,  with Q* = Q_n · · · Q₁,   (9.4.4)

so that A = QR. This is better because it uses only orthogonal transformations, not projections (which are not orthogonal). Rotations and reflections are the orthogonal matrices used to introduce zeros. As an example,
Rotation
Example 9.13. Rotation in the xy plane about the origin is given by the matrix

P_θ = \begin{pmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{pmatrix}.

Now P_θ^{−1} = P_{−θ} = \begin{pmatrix} \cos θ & \sin θ \\ −\sin θ & \cos θ \end{pmatrix} = P_θᵀ. This again shows that it is orthogonal; in particular the columns are orthonormal. These are rotations in the plane.
Example 9.14. 3D Rotation. Rotation in three dimensions about the z-axis is very similar:

P = \begin{pmatrix} \cos θ & −\sin θ & 0 \\ \sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{pmatrix};   (9.4.5)

this rotates in the xy plane.
We can further rotate in any (i, j) plane in Rⁿ: P equals the identity except for P_{ii} = \cos θ, P_{ij} = −\sin θ, P_{ji} = \sin θ, P_{jj} = \cos θ,   (9.4.6)

so that

Px = (x₁, . . . , x_{i−1}, \cos θ x_i − \sin θ x_j, x_{i+1}, . . . , x_{j−1}, \sin θ x_i + \cos θ x_j, x_{j+1}, . . . , x_n)ᵀ.   (9.4.7)
This is called a Givens rotation. We can choose θ such that (P_θ x)_j = 0, so a single rotation zeroes one entry:

P_θ \begin{pmatrix} ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \end{pmatrix} = \begin{pmatrix} ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ ∗ & ∗ & ∗ & ∗ \\ 0 & ∗ & ∗ & ∗ \end{pmatrix}.   (9.4.8)

So the QR factorization by Givens rotations is

P_{θ_N} · · · P_{θ₂}P_{θ₁} A = R,  with Q* = P_{θ_N} · · · P_{θ₁}.   (9.4.9)
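A small hedged sketch of one Givens step in Matlab, choosing the cosine and sine directly rather than θ (a common convention; the sign placement differs from (9.4.6) only by transposition):

x = [3; 4];                  % want to zero x(2)
r = hypot(x(1), x(2));       % = 5
c = x(1)/r;  s = x(2)/r;
G = [c s; -s c];             % Givens rotation acting on rows 1 and 2
disp(G*x)                    % = [5; 0]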
Note, projections are not orthogonal matrices: for a projector P we can check that PP* ≠ I. Indeed P annihilates some nonzero vector (e.g. P_⊥(u) = 0), so P has a nontrivial null space and is not invertible, whereas orthogonal matrices are invertible.
Reflection
Example 9.15. Suppose we have vectors u and x, where ‖u‖ = 1. We want to reflect x across the plane orthogonal to u, u⊥ = {v : vᵀu = 0}; call this operation Rx. This operation is also orthogonal. First, the orthogonal projection onto u⊥ subtracts the component 〈u, x〉u:

Px = x − 〈u, x〉u = (I − uu*)x,   (9.4.10a,b)
Rx = (I − 2uu*)x,   (9.4.10c)

where P is the projection onto the subspace and R is the reflection across the subspace. Now R* = I − 2uu* = R and R² = I. This implies that R^{−1} = R* and R is orthogonal.
9.5 Homework Assignment 6: Due Monday, November 11, 2013
1. Let A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. Find ‖A‖_p for p = 1, 2, ∞, F.
2. Show that ‖A‖_∞ = max_i Σ_j |a_{ij}|. (Hint: make sure you understand how the analogous formula for ‖A‖₁ was derived in class.)

3. (a) Given a vector norm ‖x‖, prove that the formula ‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖ defines a matrix norm. (This is called the induced matrix norm.)

(b) Show that for any induced matrix norm, ‖Ax‖ ≤ ‖A‖‖x‖.

(c) Prove that any induced matrix norm also satisfies ‖AB‖ ≤ ‖A‖‖B‖.
4. Consider the formula ‖A‖ = max_{i,j} |a_{ij}|.
(a) Show that it defines a matrix norm.
(b) Show that it is not induced by a vector norm.
5. Meyer, Exercise 5.2.6
Establish the following properties of the matrix 2-norm.
(a) ‖A‖₂ = max_{‖x‖₂=1, ‖y‖₂=1} |y*Ax|,

(b) ‖A‖₂ = ‖A*‖₂,

(c) ‖A*A‖₂ = ‖A‖₂²,

(d) ∥\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}∥₂ = max{‖A‖₂, ‖B‖₂} (take A, B to be real),

(e) ‖U*AV‖₂ = ‖A‖₂ when UU* = I and V*V = I.
6. Show that ‖A^{−1}‖₂ = 1/sqrt(λ_min), where λ_min is the smallest eigenvalue of AᵀA.
7. Show that 〈A,B〉 = tr(A∗B) defines an inner product.
8. Meyer, Exercise 5.3.4
For a real inner-product space with ‖·‖² = 〈·, ·〉, derive the inequality 〈x, y〉 ≤ (‖x‖² + ‖y‖²)/2. Hint: Consider x − y.
9. Meyer, Exercise 5.3.5
For n× n matrices A and B, explain why each of the following inequalities is valid.
(a) |tr(B)|² ≤ n tr(B*B).

(b) tr(B²) ≤ tr(BᵀB) for real matrices.

(c) tr(AᵀB) ≤ (tr(AᵀA) + tr(BᵀB))/2 for real matrices.
10. Given

A = \begin{pmatrix} 1 & 0 & −1 \\ 1 & 2 & 1 \\ 1 & 1 & −3 \\ 0 & 1 & 1 \end{pmatrix} and b = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}.
.(a) Find an orthonormal basis for R(A), using the standard inner product.
(b) Find the (reduced) QR decomposition of A.
(c) For the matrix Q in (b), compute QᵀQ and QQᵀ.
(d) Find the least squares solution of Ax = b, using your results above.
(e) Determine the Fourier expansion of b with respect to the basis you found in (a).
11. Explain why the (reduced) QR factorization of a matrix A of full rank is unique.
12. Meyer, Exercise 5.5.11
Let V be the inner-product space of real-valued continuous functions defined on theinterval [−1, 1], where the inner product is defined by
〈f, g〉 = ∫_{−1}^{1} f(x)g(x) dx,
and let S be the subspace of V that is spanned by the three linearly independentpolynomials q0 = 1, q1 = x, q2 = x2.
(a) Use the Gram–Schmidt process to determine an orthonormal set of polynomials{p0, p1, p2} that spans S. These polynomials are the first three normalized Legendrepolynomials.
(b) Verify that p_n satisfies Legendre's differential equation
(1− x2)y′′ − 2xy′ + n(n+ 1)y = 0
for n = 0, 1, 2. This equation and its solutions are of considerable importance inapplied mathematics.
9.6 Lecture 32: November 8, 2013
From last time:

Elementary orthogonal projectors

Let u with ‖u‖ = 1. The projection of a vector x onto the hyperplane orthogonal to u is P_⊥x = x − 〈u, x〉u; that is, P_∥ = uu* and P_⊥ = I − uu*. These projectors are not orthogonal matrices, because an orthogonal matrix satisfies Q* = Q^{−1}, i.e. Q*Q = QQ* = I. Now

P_⊥* = I − (u*)*u* = P_⊥,   (9.6.1)

and this further gives

P*P = P² = P ≠ I.   (9.6.2)

This property shows that once we project, projecting a second time does not change the result. Also N(P) ≠ {0}, so projectors are not invertible. The null space of P_∥ is u⊥, N(P_∥) = u⊥; similarly N(P_⊥) = span(u).
Elementary reflection

Now Rx = x − 2〈u, x〉u, and in this case R is orthogonal: R* = R and R*R = RR* = I. Indeed,

(I − 2uu*)(I − 2uu*) = I − 2uu* − 2uu* + 4u(u*u)u* = I − 4uu* + 4uu* = I.   (9.6.3)

Now use reflectors to compute A = QR. Choosing u = (x − ‖x‖e₁)/‖x − ‖x‖e₁‖, where x is the first column of A, gives R_u x = ‖x‖e₁, so

R_{u₁}A = \begin{pmatrix} ∗ & ∗ & ∗ \\ 0 & ∗ & ∗ \\ 0 & ∗ & ∗ \\ 0 & ∗ & ∗ \end{pmatrix}.   (9.6.4)

Doing successive reflections on the remaining subcolumns,

R_{u_N} · · · R_{u₂}R_{u₁} A = R,  with Q* = R_{u_N} · · · R_{u₁}.   (9.6.5)

This gives us the Householder method.
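A hedged Matlab sketch of one Householder step (the sign choice sign(x(1)) is a standard stabilization, an assumption beyond what the lecture states):

x = [3; 4; 0];                    % column to be zeroed below entry 1
e1 = [1; 0; 0];
u = x + sign(x(1))*norm(x)*e1;    % u proportional to x -/+ ||x|| e1
u = u/norm(u);
R1 = eye(3) - 2*(u*u');           % elementary reflector
disp(R1*x)                        % = (-5, 0, 0)'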
Complementary Subspaces of V

Definition 9.16. If V = X + Y, where X, Y are subspaces such that X ∩ Y = {0}, then X and Y are called complementary subspaces and V = X ⊕ Y is the direct sum of X and Y.

Given the general picture, how do we define the angle between two subspaces? Note: If V = X ⊕ Y then any z ∈ V can be written uniquely as z = x + y, for x ∈ X and y ∈ Y. Further dim(V) = dim(X) + dim(Y) and B_V = B_X ∪ B_Y.
Proof. If z = x₁ + y₁ = x₂ + y₂ then x₁ − x₂ = y₂ − y₁ ∈ X ∩ Y = {0}, so x₁ = x₂ and y₁ = y₂. □
Example 9.17. For A ∈ R^{m×n} we have Rᵐ = R(A) ⊕ N(Aᵀ).
Projectors
Definition 9.18. We define general projectors: The projector P onto X along Y is thelinear operator such that P(z) = P(x + y) = x.
Note: If P projects onto X along Y then P² = P, because P²(x + y) = P(x) = P(x + 0) = x = P(z). The null space is N(P) = Y, since P(z) = P(x + y) = x = 0 exactly when x = 0. Further, R(P) = X. Also, R(P) ⊕ N(P) = Rⁿ, as we showed in Homework 5.
Ultimately, we want to find the Jordan canonical form of our matrices. In general R(A) + N(A) ≠ Rⁿ. For A ∈ R^{m×n} with m ≠ n this is obvious because the two spaces live in different dimensions, so the question only makes sense for A ∈ R^{n×n}. But even if A is square: let y ∈ N(A) ∩ R(A); then Ay = 0 and y = Az for some z, so A(Az) = A²z = 0. Hence if A² has a larger null space than A, then N(A) and R(A) have nontrivial intersection.
Example 9.19. Obviously such a matrix cannot be invertible; for example

A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} and A² = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

This is an example of a nilpotent matrix. For projectors, by contrast, range and null space always split the space, as the next theorem makes precise.
Theorem 9.20. P is a projector if and only if P2 = P. These are also known as idempotentmatrices.
9.7 Lecture 33: November 11, 2013
From last time:
Definition 9.21. P : V → V is a projector if there are subspaces X, Y with V = X ⊕ Y such that P(z) = x for any z = x + y ∈ V.

Note: R(P) = X and N(P) = Y.
Projectors
Theorem 9.22. P is a projector if and only if P2 = P. These are also known as idempotentmatrices.
Proof. Given the vector space V and an operator with P = P², set X = R(P) and Y = N(P). For x ∈ R(P), write x = Px₀ for some x₀; then Px = P²x₀ = Px₀ = x. Hence for z = x + y with x ∈ R(P) and y ∈ N(P),

P(x + y) = Px + Py = x.   (9.7.1)

Going the other way, every z splits as

z = Pz + (z − Pz),  with Pz ∈ R(P) and z − Pz ∈ N(P) (since P(z − Pz) = Pz − P²z = 0),

so V = R(P) ⊕ N(P).   (9.7.2)
�
Representation of a projector
We discuss the matrix representation of P. Given {m₁, . . . , m_r} a basis for R(P) = X and {n₁, . . . , n_{n−r}} a basis for N(P) = Y, we have Pm_i = m_i and Pn_i = 0. Let B = [M | N]. Then

PB = P[M | N] = [M | 0],   (9.7.3)

so

[P]_S = P = [M | 0]B^{−1} = [M | N] \begin{pmatrix} I_{r×r} & 0 \\ 0 & 0 \end{pmatrix} B^{−1} = B \begin{pmatrix} I_{r×r} & 0 \\ 0 & 0 \end{pmatrix} B^{−1},   (9.7.4)

which is the change-of-basis formula [P]_S = [I]_{BS}[P]_B[I]_{BS}^{−1}.
Definition 9.23. For any subspace M ⊂ V, M⊥ = {v ∈ V : 〈v, u〉 = 0 for all u ∈ M}.
Theorem 9.24. For any subspace M⊂ V, V =M⊕M⊥
Proof. Given a basis {b₁, . . . , b_m} of M, extend it by an orthogonal set {b_{m+1}, . . . , b_n} so that {b₁, . . . , b_m} is a basis for M, {b_{m+1}, . . . , b_n} is a basis for M⊥, and their union is a basis for V. □
Example 9.25. Rⁿ = R(A) ⊕ N(Aᵀ), where R(A) ⊥ N(Aᵀ). The orthogonal projector onto M is

P_M = [M | N] \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} [M | N]^{−1},   (9.7.5)

where N spans M⊥, so that M*N = 0 and N*M = 0, with

M = (m₁ · · · m_m)_{n×m} and N = (n_{m+1} · · · n_n)_{n×(n−m)}.   (9.7.6)

Note:

\begin{pmatrix} (M*M)^{−1}M* \\ (N*N)^{−1}N* \end{pmatrix} (M N) = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix},   (9.7.7)

using M*N = 0 and N*M = 0, so the left factor is exactly [M | N]^{−1}.
and

P_M = [M | 0] \begin{pmatrix} (M*M)^{−1}M* \\ (N*N)^{−1}N* \end{pmatrix} = M(M*M)^{−1}M*.   (9.7.8)

Thus, given any basis {m₁, . . . , m_m} for the subspace M, the orthogonal projector is

P_M = M(M*M)^{−1}M*.   (9.7.9)

But how does the formula change if the basis is orthonormal? If {m₁, . . . , m_m} are orthonormal then M*M = I and

P_M = MM*.   (9.7.10)
Example 9.26. Elementary orthogonal projectors:

P_∥ = uu*   (9.7.11)

and

P_⊥ = I − uu*.   (9.7.12)

Theorem 9.27.

‖x − P_M x‖₂² = min_{y∈M} ‖x − y‖₂².   (9.7.13)
(we will prove this as an exercise)
Note: A(AᵀA)^{−1}Aᵀ is the projector onto the range of A, P_{R(A)}, where we assume that A has full rank. The normal equations to solve Ax = b are

AᵀAx = Aᵀb,   (9.7.14)

so

x = (AᵀA)^{−1}Aᵀ b,   (9.7.15)

where (AᵀA)^{−1}Aᵀ is the pseudoinverse. So,

Ax = A(AᵀA)^{−1}Aᵀb = P_{R(A)}b.   (9.7.16)
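A hedged Matlab sketch of the orthogonal projector formula and its least-squares meaning (the matrix is an illustrative placeholder):

A = [1 0; 1 1; 1 2];  b = [1; 2; 2];
P = A/(A'*A)*A';           % projector onto R(A): A*inv(A'*A)*A'
disp(norm(P*P - P))        % idempotent: ~0
disp(norm(P' - P))         % symmetric: ~0
disp(norm(P*b - A*(A\b)))  % P*b equals A times the least squares solution: ~0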
9.8 Lecture 34: November 13, 2013
Projectors

We discussed the projector P onto X along Y: it is idempotent, P² = P, with R(P) = X and N(P) = Y, and

[P]_S = [M | N] \begin{pmatrix} I_{r×r} & 0 \\ 0 & 0 \end{pmatrix} [M | N]^{−1}.   (9.8.1)
The orthogonal projector onto M = R(M), where M = [m₁ · · · m_m] collects a basis of M, is

P = M(M*M)^{−1}M*.   (9.8.2)

The normal equations for Ax = b, with A of full rank, give

Ax = P_{R(A)}b.   (9.8.3)

If a projector P is orthogonal, then P* = P.
Proof. If P is an orthogonal projector constructed as above,

P = M(M*M)^{−1}M*,  so  P* = M((M*M)^{−1})*M* = M(M*M)^{−1}M* = P.   (9.8.4)

Conversely, suppose P = P² and P = P*. We want to show that N(P) ⊥ R(P) in the standard inner product. Let x ∈ R(P) and y ∈ N(P), and consider

y*x = y*Px = (P*y)*x = (Py)*x = 0*x = 0.   (9.8.5)

If the {m_i} are orthonormal, P_M = MM*. □
Example 9.28.

P_∥ = uu*,   (9.8.6a)
P_⊥ = I − uu*,   (9.8.6b)

corresponding to the splitting V = X ⊕ Y with X = span(u), Y = u⊥.
Decompositions of Rⁿ

Given A_{n×n}, we know R(A) ⊕ N(Aᵀ) = Rⁿ and R(Aᵀ) ⊕ N(A) = Rⁿ, with R(A)⊥ = N(Aᵀ). Let B_U = {u₁, . . . , u_r, u_{r+1}, . . . , u_n} be orthonormal, the first r a basis for R(A) and the rest a basis for N(Aᵀ); similarly let B_V = {v₁, . . . , v_r, v_{r+1}, . . . , v_n} be orthonormal, the first r a basis for R(Aᵀ) and the rest a basis for N(A). So

UᵀAV = (U_{R(A)} U_{N(Aᵀ)})ᵀ A (V_{R(Aᵀ)} V_{N(A)})
 = \begin{pmatrix} Uᵀ_{R(A)} \\ Uᵀ_{N(Aᵀ)} \end{pmatrix} (AV_{R(Aᵀ)} AV_{N(A)})
 = \begin{pmatrix} Uᵀ_{R(A)}AV_{R(Aᵀ)} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} C_{r×r} & 0 \\ 0 & 0 \end{pmatrix},   (9.8.7)

since AV_{N(A)} = 0 and

(Uᵀ_{N(Aᵀ)}A)ᵀ = AᵀU_{N(Aᵀ)} = 0.   (9.8.8)
Range-Nullspace decomposition of A_{n×n}

Theorem 9.29. Rⁿ = R(A^k) ⊕ N(A^k) for some k. This is not necessarily an orthogonal decomposition. The smallest such k is called the index of A.

Proof. First, note that R(A^{k+1}) ⊆ R(A^k) for any k: if y ∈ R(A^{k+1}), then y = A^{k+1}z for some z, so y = A^k(Az). Second, the decreasing chain R(A) ⊇ R(A²) ⊇ R(A³) ⊇ · · · must reach equality for some k: R(A^k) = R(A^{k+1}) = R(A^{k+2}) = · · ·. □

to be continued. . .
9.9 Homework Assignment 7: Due Friday, November 22, 2013
You may use Matlab to compute matrix products, or to reduce a matrix to Row EchelonForm.
1. (a) Let A ∈ Rm×n. Prove R(A) and N(Aᵀ) are orthogonal complements of Rm.
(b) Verify this fact for A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 1 \\ 1 & 2 & 0 \end{pmatrix}.
2. Prove: If X, Y are subspaces of V such that V = X ⊕ Y, then for any z ∈ V there exist a unique x ∈ X and y ∈ Y such that z = x + y.
3. Prove: If X ,Y are subspaces of V such that V = X+Y and dim(X )+dim(Y) = dim(V)then X ∩ Y = {0}.
4. Textbook 5.11.3:
Find a basis for the orthogonal complement of M = span{(1, 2, 0, 3)ᵀ, (2, 4, 1, 6)ᵀ}.
5. Let P be a projector. Let P′ = I−P.
(a) Show that P′ = I−P is also a projector. It is called the complementary projectorof P.
(b) Any projector projects a point z ∈ V onto X along Y , where X ⊕ Y = V , byP(z) = P(x + y) = x. What are the X and Y for P and I−P, respectively?
6. Textbook 5.9.1:
Let X and Y be subspaces of R3 whose respective bases are
B_X = {(1, 1, 1)ᵀ, (1, 2, 2)ᵀ} and B_Y = {(1, 2, 3)ᵀ}
(a) Explain why X and Y are complementary subspaces of R3.
(b) Determine the projector P onto X along Y as well as the complementary projectorQ onto Y along X .
(c) Determine the projection of v = (2, −1, 1)ᵀ onto Y along X.
(d) Verify that P and Q are both idempotent.
(e) Verify that R(P) = X = N(Q) and N(P) = Y = R(Q).
7. (a) Find the orthogonal projection of b = (4, 8)ᵀ onto M = span {u}, where u =(3, 1)ᵀ.
(b) Find the orthogonal projection of b onto u⊥, for b, u given in (a).
(c) Find the orthogonal projection of b = (5, 2, 5, 3)ᵀ onto
M = span{
(3/5, 0, 4/5, 0)ᵀ, (0, 0, 0, 1)
ᵀ, (4/5, 0, 3/5, 0)
ᵀ}.
(Note: the given columns are orthonormal.)
(d) Find the orthogonal projection of b = (1, 1, 1)ᵀ onto the range of

A = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ 1 & 0 \end{pmatrix}
8. (a) Show that ‖P‖2 ≥ 1 for every projector P 6= 0. When is ‖P‖2 = 1?
(b) Show that ‖I−P‖2 = ‖P‖2 for all projectors P 6= 0, I.
9. (a) Show that the eigenvalues of a unitary matrix satisfy |λ| = 1. Show by a counterexample that the reverse is not true.

(b) Show that the eigenvalues of a projector are either 0 or 1. Show by a counterexample that the reverse is not true.
10. Let u be a unit vector. The elementary reflector about u⊥ is defined to be R = I−2uu∗.
(a) Prove that all elementary reflectors are involutory (R² = I), hermitian, and unitary.

(b) Prove that if Rx = μe_i, then μ = ±‖x‖₂, and that R_{:i} = Re_i = ±x.

(c) Find the elementary reflector that maps x = (1/3)(1, −2, −2)ᵀ onto the x-axis.

(d) Verify by direct computation that your reflector in (c) is symmetric, orthogonal, and involutory.

(e) Extend the vector x in (c) to an orthonormal basis for R³. (Hint: what do you know about the columns of R from parts (a, b) above?)
11. Textbook 5.6.17:
Perform the following sequence of rotations in R³ beginning with v₀ = (1, 1, −1)ᵀ:

1. Rotate v₀ counterclockwise 45° around the x-axis to produce v₁.

2. Rotate v₁ clockwise 90° around the y-axis to produce v₂.

3. Rotate v₂ counterclockwise 30° around the z-axis to produce v₃.

Determine the coordinates of v₃ as well as an orthogonal matrix Q such that Qv₀ = v₃.
12. (a) Find the index of A = \begin{pmatrix} −2 & 0 & −4 \\ 4 & 2 & 4 \\ 3 & 2 & 2 \end{pmatrix}. Find its core-nilpotent decomposition.
(b) A matrix is said to be nilpotent if Ak = 0 for some k. Show that the index ofa nilpotent matrix is the smallest k for which Ak = 0. Find its core-nilpotentdecomposition.
(c) Find the index of a projector that is not the identity. Find its core-nilpotentdecomposition.
(d) What is the index of the identity?
9.10 Lecture 35: November 15, 2013

Range-Nullspace decomposition of A_{n×n}

Theorem 9.30. For any A_{n×n} and some k, Rⁿ = R(A^k) ⊕ N(A^k). The smallest such k is called the index of A.

Example 9.31. A nilpotent matrix has some k such that N^k = 0, R(N^k) = {0}, and N(N^k) = Rⁿ.

Proof. First, note that R(A^{k+1}) ⊆ R(A^k) for any k: if y ∈ R(A^{k+1}), then y = A^{k+1}z for some z, so y = A^k(Az). Second, the decreasing chain R(A) ⊇ R(A²) ⊇ R(A³) ⊇ · · · must reach equality for some k, since the dimensions decrease strictly while the inclusions are proper. Third, once equality is achieved, it is maintained through the rest of the chain:

R(A^{k+2}) = R(A^{k+1}A) = A R(A^{k+1}) = A R(A^k) = R(A^{k+1}) = R(A^k).   (9.10.1)

Fourth, N(A⁰) ⊆ N(A) ⊆ N(A²) ⊆ · · · ⊆ N(A^k) = N(A^{k+1}) = N(A^{k+2}) = · · ·. Why does the nullspace chain stabilize at the same spot as the column space chain? Because dim N(A^k) = n − dim R(A^k), so once the column space dimensions are constant, the nullspace dimensions are constant too. Fifth, R(A^k) ∩ N(A^k) = {0}: let y ∈ R(A^k) ∩ N(A^k); then y = A^k x for some x and A^k y = 0, so A^{2k}x = 0 and x ∈ N(A^{2k}) = N(A^k), hence y = A^k x = 0. Sixth, R(A^k) + N(A^k) = Rⁿ, since the dimensions add up to n and the intersection is trivial. □
Now, how can we factor the matrix?

Corresponding factorization of A

Let {x₁, . . . , x_r} be a basis for R(A^k) and {y₁, . . . , y_{n−r}} be a basis for N(A^k). Then S = [x₁, . . . , x_r, y₁, . . . , y_{n−r}], and we note that X = span{x₁, . . . , x_r} and Y = span{y₁, . . . , y_{n−r}} are both invariant subspaces of A. So

S^{−1}AS = \begin{pmatrix} C_{r×r} & 0 \\ 0 & N_{(n−r)×(n−r)} \end{pmatrix}.   (9.10.2)

Note S^{−1}A^kS = (S^{−1}AS)^k, because the inner factors SS^{−1} cancel in the exponentiation. Thus

S^{−1}A^kS = \begin{pmatrix} C^k & 0 \\ 0 & N^k \end{pmatrix},   (9.10.3)

while A^k[X Y] = [A^kX 0], since Y is a basis of N(A^k). Comparing blocks forces N^k = 0, so N is nilpotent, and C is invertible. So we have a core-nilpotent factorization of A, a similarity factorization which always exists. Compare the decomposition, for any A ∈ R^{n×n}, Rⁿ = R(A) ⊕ N(Aᵀ) = R(Aᵀ) ⊕ N(A), with corresponding factorization

UᵀAV = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix}.   (9.10.4)
UNIT 10
Singular Value Decomposition
10.1 Lecture 35 (cont.)
Singular Value Decomposition

The singular value decomposition finds orthogonal matrices U and V that diagonalize any A, built up as products of simple orthogonal factors:

Uᵀ_m · · · Uᵀ₂Uᵀ₁ A V₁V₂ · · · V_m = \begin{pmatrix} σ₁ & & & & \\ & \ddots & & & \\ & & σ_r & & \\ & & & 0 & \\ & & & & \ddots \end{pmatrix}.   (10.1.1)
Theorem 10.1. For any A_{m×n} there exist orthogonal U and V such that

A_{m×n} = UDVᵀ = [U]_{m×m} \begin{pmatrix} σ₁ & & & 0 \\ & \ddots & & \\ & & σ_r & \\ 0 & & & 0 \end{pmatrix}_{m×n} [Vᵀ]_{n×n},   (10.1.2)

where the σ_i are real and greater than 0, with σ₁ ≥ σ₂ ≥ · · · ≥ σ_r and r = rank(A).

Definition 10.2. The σ_i are the singular values of A.
Note:

1. The σ_i are uniquely determined, but U, V are not unique.

2. rank(A) = rank(D).

3. ‖A‖₂ = ‖D‖₂ = σ₁, using ∥\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}∥₂ = max(‖A‖₂, ‖B‖₂).

4. If A is invertible,

A = U diag(σ₁, . . . , σ_n) Vᵀ,  A^{−1} = V diag(1/σ₁, . . . , 1/σ_n) Uᵀ,   (10.1.3)

and the singular values of A^{−1}, in decreasing order, are 1/σ_n ≥ · · · ≥ 1/σ₁.

Now κ(A) = ‖A‖ · ‖A^{−1}‖ = σ₁/σ_n, which quantifies how close A is to singular.
Example 10.3. Prove ‖I − P‖₂ = ‖P‖₂ for a projector P. What is the norm of P and of I − P? From an illustration we can use tangents to the unit ball; one then needs to show that the maximal stretchings ‖Pω‖ and ‖(I − P)ω‖ agree.
10.2 Lecture 36: November 18, 2013
We will do review for exam on Friday.
Singular Value Decomposition
SVD:
Theorem 10.4. For any A_{m×n} there exist orthogonal U, V such that

A_{m×n} = U_{m×m} D_{m×n} Vᵀ_{n×n},   (10.2.1)

where

D = \begin{pmatrix} σ₁ & & & 0 \\ & \ddots & & \\ & & σ_r & \\ 0 & & & 0 \end{pmatrix}_{m×n}   (10.2.2)

and σ₁ ≥ σ₂ ≥ · · · ≥ σ_r > 0.
Notes (verified numerically in the sketch below):

1. ‖A‖₂ = σ₁ and, if A is invertible, ‖A^{−1}‖₂ = 1/σ_n. The condition number is κ(A) = σ₁/σ_n.

2. r = rank(A).

3. |det(A)| = ∏_{i=1}^{n} σ_i.

4. A^{−1} = VD^{−1}Uᵀ.
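A hedged Matlab check of these consequences, with an arbitrary example matrix:

A = [1 2; 0 2];
s = svd(A);                    % singular values, descending
disp([s(1), norm(A,2)])        % sigma_1 = ||A||_2
disp([s(1)/s(2), cond(A)])     % sigma_1/sigma_n = condition number
disp([prod(s), abs(det(A))])   % product of sigmas = |det(A)| = 2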
Existence of the Singular Value Decomposition

Proof. We know that there exist orthogonal U and V such that

UᵀAV = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix},   (10.2.3)

where C_{r×r} is invertible, and ‖C‖₂ = ‖A‖₂. Let x with ‖x‖₂ = 1 be a maximizer:

‖C‖₂ = max_{‖y‖₂=1} ‖Cy‖₂ = ‖Cx‖₂ = σ₁ = ‖A‖₂.   (10.2.4)

Let y = Cx/‖Cx‖₂, and complete x and y to orthogonal matrices [x | X] and [y | Y]. Now

[y | Y]ᵀ C [x | X] = \begin{pmatrix} yᵀ \\ Yᵀ \end{pmatrix} (Cx CX) = \begin{pmatrix} yᵀCx & yᵀCX \\ YᵀCx & YᵀCX \end{pmatrix}.   (10.2.5)
Further,

yᵀCx = xᵀCᵀCx/‖Cx‖₂ = ‖Cx‖₂²/‖Cx‖₂ = ‖Cx‖₂ = σ₁.   (10.2.6)

Similarly YᵀCx = (Yᵀy)‖Cx‖₂ = 0, and, since x is an eigenvector of CᵀC (CᵀCx = λ₁x; see the correction in Lecture 37),

yᵀCX = xᵀCᵀCX/‖Cx‖₂ = λ₁ xᵀX/‖Cx‖₂ = 0,   (10.2.7)

because the columns of X are orthogonal to x. So we have reduced to

\begin{pmatrix} σ₁ & 0 \\ 0 & C̃ \end{pmatrix}.

We may then repeat the argument on C̃, maximizing the 2-norm, to get the full singular value decomposition. □
Notes:

A_{m×n} = [U]_{m×m} D_{m×n} [Vᵀ]_{n×n} = (u₁ · · · u_r)_{m×r} \begin{pmatrix} σ₁ & & \\ & \ddots & \\ & & σ_r \end{pmatrix}_{r×r} \begin{pmatrix} v₁ᵀ \\ \vdots \\ v_rᵀ \end{pmatrix}_{r×n},   (10.2.8)

from trimming out the zeros. Here σ₁, . . . , σ_r are unique, and u₁, . . . , u_r and v₁, . . . , v_r are unique up to sign.
From the existence of A = UDVᵀ, what can we deduce? We know that UᵀU = UUᵀ = I and VᵀV = VVᵀ = I. So

[AV]_{:j} = [UD]_{:j} = U (0, . . . , 0, σ_j, 0, . . . , 0)ᵀ = σ_j u_j,   (10.2.9)

using (AB)_{:j} = AB_{:j}. Now

Av_j = σ_j u_j for 1 ≤ j ≤ r, and Av_j = 0 for j > r;   (10.2.10a)
Aᵀ = VDᵀUᵀ, so AᵀU = VDᵀ and Aᵀu_j = σ_j v_j for 1 ≤ j ≤ r, Aᵀu_j = 0 for j > r.   (10.2.10b,c)
So, the four fundamental subspaces are
• R(A) = span {u1, . . . ,ur}• N(A) = span {vr+1, . . . ,vn}• R(Aᵀ) = span {v1, . . . ,vr}• N(Aᵀ) = span {ur+1, . . . ,um}(
AᵀA)n×n = VD
ᵀUᵀUDV
ᵀ, (10.2.11a)
= VDᵀDV
ᵀ, (10.2.11b)
= V
σ21 0 · · · 0 0 · · · 0
0 σ22
. . ....
......
.... . . . . . 0 0 · · · 0
0 · · · 0 σ2r 0 · · · 0
0 · · · 0 0 0 · · · 0...
......
.... . .
...0 · · · 0 0 0 · · · 0
n×n
Vᵀ, (10.2.11c)
(AᵀAV
):j
=(VD
ᵀD)
:j, (10.2.11d)
AᵀAvj =
{σ1jvj, j ≤ r
0, j > r. (10.2.11e)
135
Nitsche and Benner Unit 10. Singular Value Decomposition
Thus, σ_j = sqrt(λ_j(AᵀA)) for j = 1, . . . , r. Similarly, the v_j are eigenvectors of AᵀA for j = 1, . . . , r, and the v_j are orthogonal because eigenvectors of symmetric matrices are orthogonal. To construct the SVD, we (see the sketch below):

1. find the eigenvalues λ_j of AᵀA and the corresponding eigenvectors v_j;

2. find u₁, . . . , u_r from σ_j u_j = Av_j;

3. find complementary orthonormal sets u_{r+1}, . . . , u_m and v_{r+1}, . . . , v_n.
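A hedged Matlab sketch of this three-step construction for a full-rank square A (so r = m = n and step 3 is empty; eig's ordering is an assumption handled by the explicit sort):

A = [1 2; 0 2];
[V,L] = eig(A'*A);                % step 1: eigenpairs of A'A
[lam, idx] = sort(diag(L),'descend');
V = V(:,idx);
sig = sqrt(lam);                  % singular values
U = A*V*diag(1./sig);             % step 2: u_j = A v_j / sigma_j
disp(norm(A - U*diag(sig)*V'))    % ~0: A = U D V'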
10.3 Lecture 37: November 20, 2013
Review and correction from last time

From last time:

UᵀAV = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix}.   (10.3.1)

Then we said there exists an x such that ‖Cx‖ = ‖C‖₂ = σ₁, and we let y = Cx/σ₁ and considered [x | X] and [y | Y]. The correction from last lecture: x is the eigenvector corresponding to λ = σ₁² of CᵀC, so CᵀCx = λx, hence xᵀCᵀC = λxᵀ and

yᵀCX = xᵀCᵀCX/σ₁ = λ xᵀX/σ₁ = 0.   (10.3.2)
SVD will not be on the exam, but will be on the final.
Singular Value Decomposition
We know

A = UDVᵀ,   (10.3.3)

so

AV = UD.   (10.3.4)

This means that

Av_j = σ_j u_j for j ≤ r, and 0 for j > r.   (10.3.5)

Then,

Aᵀu_j = σ_j v_j for j ≤ r, and 0 for j > r.   (10.3.6)

Thus the v_j are called the right singular vectors, the u_j the left singular vectors, and σ_j = sqrt(λ_j(AᵀA)) the singular values. Also we may read off the four subspaces:
• R(A) = span{u₁, . . . , u_r}
• N(A) = span{v_{r+1}, . . . , v_n}
• R(Aᵀ) = span{v₁, . . . , v_r}
• N(Aᵀ) = span{u_{r+1}, . . . , u_m}
So if we have the SVD, it is easy to describe these subspaces, and we can construct the SVD using these facts. Now,

AᵀA = VDᵀUᵀUDVᵀ = VDᵀDVᵀ,  so  AᵀAV = VDᵀD.   (10.3.7)
Example 10.5. Given

A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}.   (10.3.8)

Then r = 1 and

AᵀA = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 5 \\ 5 & 5 \end{pmatrix}.   (10.3.9)

For AᵀAv = λv,

det(AᵀA − λI) = \begin{vmatrix} 5−λ & 5 \\ 5 & 5−λ \end{vmatrix} = 25 − 10λ + λ² − 25 = λ² − 10λ = λ(λ − 10).   (10.3.10)
So to find v₁, solve (AᵀA − λ₁I)v = 0 with λ₁ = 10:

\begin{pmatrix} −5 & 5 \\ 5 & −5 \end{pmatrix}\begin{pmatrix} v₁ \\ v₂ \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},  so −5v₁ + 5v₂ = 0, v₂ = v₁, and

v₁ = (1/√2)(1, 1)ᵀ.   (10.3.11)

So

Av₁ = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}(1/√2)\begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1/√2)\begin{pmatrix} 2 \\ 4 \end{pmatrix}.   (10.3.12)
Thus, σ₁ = ‖Av₁‖ = √10 and

u₁ = (1/√20)(2, 4)ᵀ = (1/√5)(1, 2)ᵀ.   (10.3.13)

So

A = (1/√5)\begin{pmatrix} 1 \\ 2 \end{pmatrix}(√10)(1/√2)(1 1) = UDVᵀ = (1/√5)\begin{pmatrix} 1 & 2 \\ 2 & −1 \end{pmatrix}\begin{pmatrix} √10 & 0 \\ 0 & 0 \end{pmatrix}(1/√2)\begin{pmatrix} 1 & 1 \\ −1 & 1 \end{pmatrix},   (10.3.14)

where the second form is the full SVD, obtained by extending u₁ and v₁ to orthonormal bases.
This is great to do by hand, but is not a very numerically stable way to find the SVD.
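Checking the hand computation against Matlab's svd (a hedged sketch; the signs of the singular vectors may differ):

A = [1 1; 2 2];
[U,S,V] = svd(A);
disp(diag(S)')         % sqrt(10), 0
disp(U(:,1)*sqrt(5))   % +/- (1, 2)'
disp(V(:,1)*sqrt(2))   % +/- (1, 1)'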
Geometric interpretation

Consider the image of the unit sphere S = {x ∈ Rⁿ : ‖x‖₂ = 1}:

y = Ax = UDVᵀx,  so  Uᵀy = DVᵀx.   (10.3.15)

Let y′ = Uᵀy and x′ = Vᵀx. So

y′ = Dx′,  i.e.  y′_j = σ_j x′_j.   (10.3.16)

Now ‖x‖₂² = 1 and ‖x′‖₂² = ‖Vᵀx‖₂² = 1. Thus,

(x′₁)² + (x′₂)² + · · · + (x′_n)² = 1,  i.e.  (y′₁/σ₁)² + (y′₂/σ₂)² + · · · + (y′_n/σ_n)² = 1,   (10.3.17)

which is a hyperellipse! Viewing the transformation as Av_j = σ_j u_j, the σ_j give the major and minor axes of the multi-dimensional ellipsoid, along the directions u_j.
There is a nice fact about the SVD for low-rank approximation (the second step may be read off easily from the matrix form):

A = UDVᵀ = Σ_{j=1}^{r} σ_j u_j v_jᵀ.   (10.3.18)

This is a way to write any matrix as a sum of rank-1 matrices. Since the σ_j decrease, we may truncate the series when σ_j gets close to zero. Let A_k = Σ_{j=1}^{k} σ_j u_j v_jᵀ, with rank(A_k) = k.
Theorem 10.6. ‖A − A_k‖₂ = σ_{k+1}, and A_k is the best rank-k approximation:

‖A − A_k‖₂ = min_{rank(B)=k} ‖A − B‖₂.   (10.3.19)

This follows from

A − A_k = U diag(σ₁, . . . , σ_k, σ_{k+1}, . . . , σ_r, 0, . . .) Vᵀ − U diag(σ₁, . . . , σ_k, 0, . . . , 0) Vᵀ
 = U diag(0, . . . , 0, σ_{k+1}, . . . , σ_r, 0, . . .) Vᵀ.   (10.3.20)

We will explore the proof and implications of this theorem later.
10.4 Lecture 38: November 22, 2013
Review for Exam 2

From the homework, we need to be able to go through proofs like these:

• the ‖A‖_∞ formula, derived analogously to the ‖A‖₁ formula
• verifying that a given formula defines a matrix norm
• uniqueness of the QR factorization
• ‖A^{−1}‖₂ = 1/sqrt(λ_min(AᵀA))
• ‖A‖₂ = sqrt(λ_max(AᵀA))
Norms

To show that something is a norm (for matrices or vectors), we must show the following properties:

1. ‖x‖ ≥ 0 for any x, and ‖x‖ = 0 implies x = 0
2. ‖αx‖ = |α|‖x‖
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖

Several matrix norms, the induced norms and the Frobenius norm, have the fourth property

‖AB‖ ≤ ‖A‖‖B‖.   (10.4.1)
More major topics

The exam covers chapters 4 and 5 (minus the SVD). These are things to know:

• Subspaces (closed under addition and scalar multiplication)

• Linear transformations (definition: preserve addition and scalar multiplication)

• Coordinates and change of basis:

x = [x]_S = Σ_i x_i e_i = Ix = Σ_i c_i u_i = Uc,   (10.4.2)

where c = [x]_B; finding c is clearly a problem of inverting a matrix. The formula is

c = [x]_B = ([e₁]_B [e₂]_B · · · [e_n]_B)[x]_S = U^{−1}[x]_S.   (10.4.3)

So we really care about the representation of a linear operator in a given basis:

[T]_B = ([T(u₁)]_B [T(u₂)]_B · · · [T(u_n)]_B),  [T(x)]_B = [T]_B[x]_B.   (10.4.4)

• Change of coordinates: [T]_B ∼ [T]_{B′}, related by T = ST′S^{−1}.   (10.4.5)

• Least squares for Ax = b: the normal equations are

AᵀAx = Aᵀb.   (10.4.6)
This connects with projections because

Ax = A(AᵀA)^{−1}Aᵀ b = P_{R(A)}b,   (10.4.7)

with P_{R(A)} the orthogonal projector onto R(A). The solution is unique if the matrix is of full rank, because then AᵀA is invertible.
• Projectors: defined by P² = P, with the complementary projector I − P and its corresponding properties. A projector is orthogonal when P* = P. (A unitary matrix, by contrast, is one with orthonormal columns: Q*Q = I and Q* = Q^{−1}.) A projector always projects onto its range. Know the proofs for P and I − P.

• Gram–Schmidt is needed to orthogonalize a set of vectors, with elementary projectors P_∥ = uu* and P_⊥ = I − uu*.
Show A = QR is unique, rjj > 0, Q is orthonormal, R is upper triangular. Existenceand uniqueness? From the Gramm–Schmidt construction process we know we can getit because we can always construct it. GS was
a1 = r11q1, (10.4.8a)
a2 = r12q1 + r22q2, (10.4.8b)
· · · (10.4.8c)
an = r1nq1 + r2nq2 + · · ·+ r2,nqn. (10.4.8d)
Uniqueness: this also shows uniqueness directly because you have these equations andmay invert them. (Invertibility) a1 = r11q1 implies ‖a1‖ = ‖r11q1‖ = |r11|‖q1‖ and wemay find the r11 so then q1 = 1
r11a1. Then induction may prove this is true for all the
other values of n. First we would show true for n = 1 (all qk are uniquely determined),then show if true for n = k then it’s also still true for n = k + 1. This is done withshowing r1,k+1, . . . , rk+1,k+1, qk+1 are uniquely determined.
ak+1 = r1,k+1q1 + · · ·+ rk,k+1qk + rk+1,k+1qk+1 (10.4.9a)
This is a Fourier series and we may take⟨ak+1,qj
⟩= rj,k+1
⟨qj,qj
⟩= rj,k+1 for j < k+1
therefore all we have left is to find the vector
rk+1,k+1qk+1 = ak+1 − r1,k+1q1 − · · · − rk,k+1qk (10.4.9b)
and we can do the same argument again to finish with rk+1,k+1 and qk+1∥∥rk+1,k+1qk+1
∥∥ = ‖b‖, (10.4.10a)
|rk+1,k+1|∥∥qk+1
∥∥︸ ︷︷ ︸1
= ‖b‖, (10.4.10b)
|rk+1,k+1| = ‖b‖, (10.4.10c)
rk+1,k+1 = ‖b‖. (10.4.10d)
141
Nitsche and Benner Unit 10. Singular Value Decomposition
For positive rj,j.
So we have several decompositions now to work with.

• Invariant subspaces give a block diagonal form of the matrix.

We will have class on Wednesday.
10.5 Homework Assignment 8: Due Tuesday, December 10, 2013
You may use Matlab to compute matrix products, or to reduce a matrix to Row EchelonForm.
1. Determine the SVDs of the following matrices (by hand calculation).
(a) \begin{pmatrix} 3 & 0 \\ 0 & −2 \end{pmatrix}

(b) \begin{pmatrix} 0 & 2 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}

(c) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
2. Let A = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}.
fine).
(b) In one plot draw the unit circle C and indicate the vectors v1,v2, and in anotherplot draw the ellipse AC (i.e. the image of the circle under the transformation x→Ax) and indicate the vectors Av1 = σ1u1, Av2 = σ2u2. Use the axis(’square’)
command in Matlab to ensure that the horizontal and vertical axes have thesame scale.
(c) Find A1, the best rank-1 approximation to A in the 2-norm. Find ‖A−A1‖2.
3. Let A ∈ Rm×n, with rank r. Use the singular value decomposition of A to prove thefollowing.
(a) N(A) and R(Aᵀ) are orthogonal complementary subspaces of Rn.
(b) The properties in 5.2.6 (b, c, d, e):

Establish the following properties of the matrix 2-norm.

 (b) ‖A‖₂ = ‖A*‖₂,
 (c) ‖A*A‖₂ = ‖A‖₂²,
 (d) ∥\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}∥₂ = max{‖A‖₂, ‖B‖₂} (take A, B to be real),
 (e) ‖U*AV‖₂ = ‖A‖₂ when UU* = I and V*V = I.

(c) ‖A‖_F = sqrt(σ₁² + σ₂² + · · · + σ_r²).
4. Show that if A ∈ R^{n×n} is symmetric then σ_j = |λ_j|.

5. Compute the determinants of the matrices given in 6.1.3 (a), 6.1.3 (c), 6.2.1 (b).
(a) A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 1 \\ 1 & 4 & 4 \end{pmatrix}

(b) A = \begin{pmatrix} 1 & 2 & −3 & 4 \\ 4 & 8 & 12 & −8 \\ 2 & 3 & 2 & 1 \\ −3 & −1 & 1 & −4 \end{pmatrix}

(c) \begin{vmatrix} 0 & 0 & −2 & 3 \\ 1 & 0 & 1 & 2 \\ −1 & 1 & 2 & 1 \\ 0 & 2 & −3 & 0 \end{vmatrix}

6. (a) Show that if A is invertible, then det(A^{−1}) = 1/det(A).
(b) Show that for any invertible matrix S, det(SAS−1) = det(A).
(c) If A is n× n, show that det(αA) = αn det(A).
(d) If A is skew-symmetric, show that A is singular whenever n is odd.
(e) Show by example that in general, det(A + B) 6= det(A) + det(B).
7. (a) Let An×n = diag {d1, d2, . . . , dn}. What are the eigenvalues and eigenvectors of A?
(b) Let A be a nonsingular matrix and let λ be an eigenvalue of A. Show that 1/λ isan eigenvalue of A−1.
(c) Let A be an n × n matrix and let B = A − αI for some scalar α. How do theeigenvalues of A and B compare? Explain.
(d) Show that all eigenvalues of a nilpotent matrix are 0.
8. For each of the two matrices,
A = A₁ = \begin{pmatrix} 3 & 2 & 1 \\ 0 & 2 & 0 \\ −2 & −3 & 0 \end{pmatrix}, A = A₂ = \begin{pmatrix} −4 & −3 & −3 \\ 0 & −1 & 0 \\ 6 & 6 & 5 \end{pmatrix}
(a) a nonsingular P such that P−1AP is diagonal.
(b) A100
(c) eA.
9. Use diagonalization to solve the system

dx/dt = x + y,  dy/dt = −x + y,  x(0) = 100, y(0) = 100.
10. 7.4.1

Suppose that A_{n×n} is diagonalizable, and let P = [x₁|x₂| · · · |x_n] be a matrix whose columns are a complete set of linearly independent eigenvectors corresponding to eigenvalues λ_i. Show that the solution to u′ = Au, u(0) = c, can be written as

u(t) = ξ₁e^{λ₁t}x₁ + ξ₂e^{λ₂t}x₂ + · · · + ξ_n e^{λ_n t}x_n,

in which the coefficients ξ_i satisfy the algebraic system Pξ = c.
11. 7.5.3
Show that A ∈ Rn×n is normal and has real eigenvalues if and only if A is symmetric.
12. 7.5.4
Prove that the eigenvalues of a real skew-symmetric or skew-hermitian matrix must bepure imaginary numbers (i.e., multiples of i).
13. 7.6.1
Which of the following matrices are positive definite?
A = \begin{pmatrix} 1 & −1 & −1 \\ −1 & 5 & 1 \\ −1 & 1 & 5 \end{pmatrix}, B = \begin{pmatrix} 20 & 6 & 8 \\ 6 & 3 & 0 \\ 8 & 0 & 8 \end{pmatrix}, C = \begin{pmatrix} 2 & 0 & 2 \\ 0 & 6 & 2 \\ 2 & 2 & 4 \end{pmatrix}.
14. 7.6.4
By diagonalizing the quadratic form 13x2 + 10xy + 13y2, show that the rotated graphof 13x2 + 10xy + 13y2 = 72 is an ellipse in standard form as shown in Figure 7.2.1 onp. 505.
10.6 Lecture 39: November 27, 2013
We will have one more homework before the end, on the SVD and eigenvalues with diagonalization; we will cover the Jordan canonical form but may not put it on the homework. The homework is due next Friday, so there is time for solutions before the final. The final is cumulative and will be held on Wednesday.
Singular Value Decomposition
We know that A = UΣVᵀ for any matrix A, where Σ is a diagonal matrix. We may rearrange,

AV = UΣ,  i.e.  Av_j = σ_j u_j for j ≤ r, and 0 for j > r.   (10.6.1)
The SVD gives A = Σ_{j=1}^{r} σ_j u_j v_jᵀ for a matrix of rank r. We may define A_k = Σ_{j=1}^{k} σ_j u_j v_jᵀ and have an approximation of rank k.

Theorem 10.7.

‖A − A_k‖₂ = σ_{k+1} = min_{rank(B)=k} ‖A − B‖₂.   (10.6.2)

In words, A_k is a best approximation of rank k to A in the 2-norm.
Proof. The first part follows from the matrix form: A − A_k has singular values σ_{k+1}, . . . , σ_r, so its 2-norm is σ_{k+1}. For the second part, assume there is a matrix B of rank k with ‖A − B‖₂ < σ_{k+1}. Then there exists a subspace W of dim(W) = n − k such that Bw = 0 for any w ∈ W. For such a w,

‖Aw‖₂ = ‖(A − B)w‖₂ ≤ ‖A − B‖₂‖w‖₂ < σ_{k+1}‖w‖₂.   (10.6.3)

But on the subspace V = span{v₁, . . . , v_{k+1}} of dim(V) = k + 1 we have ‖Aw‖₂ ≥ σ_{k+1}‖w‖₂ for all w ∈ V. Since dim(V) + dim(W) > n, there exists w ≠ 0 in V ∩ W; this w must satisfy both ‖Aw‖₂ < σ_{k+1}‖w‖₂ and ‖Aw‖₂ ≥ σ_{k+1}‖w‖₂, a contradiction. This proof is a little more elementary than the proof in the book. □

Thus, we can approximate a matrix by lower-rank matrices. This is useful because the approximation has far fewer numbers to store, reducing the cost.
SVD in Matlab
Example handed out in class: in Matlab, load clown.mat loads a matrix X (type whos to see it), which may be displayed with image(X). Then we do [U,S,V] = svd(X). The first figure (Figure 10.1) plots the diagonal entries of S; we see that the small values can be truncated. As we increase k = 3, 10, 30, the approximations improve significantly, as shown in Figure 10.2. Here A_k = UΣVᵀ truncated to rank k, computed with Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)'. For k = 30 we already have a good approximation that is significantly cheaper to store than the original matrix. Further, Table 10.1 shows that the relative error decreases significantly with k.
Table 10.1. Relative error of the SVD approximation matrix A_k

k    relative error σ_{k+1}/σ₁    compression ratio 520k/(200·320)
3    0.155                        0.024
10   0.077                        0.081
30   0.027                        0.244

Listing 10.1. svdimag.m

% application of the SVD to image compression
% from "Applied Numerical Linear Algebra", by J. Demmel, page 114 (SIAM)
load clown.mat
% X is a matrix of pixels of dimension 200 by 320
[U,S,V] = svd(X);
%%
figure(1)
plot(diag(S));
set(gca,'FontSize',15)
xlabel('k')
ylabel('\sigma_k')
title('Singular values of X')
%%
figure(2)
ifont = 12;
colormap('gray')
subplot('position',[.07,.54,.40,.40])
k = 3; image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=3')
set(gca,'FontSize',ifont)
set(gca,'XTickLabel','')
%
subplot('position',[.5,.54,.40,.40])
k = 10; image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=10')
set(gca,'FontSize',ifont)
set(gca,'YTickLabel','')
set(gca,'XTickLabel','')
%
subplot('position',[.07,.06,.40,.40])
k = 30; image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=30','FontSize',ifont)
set(gca,'FontSize',ifont)
%
subplot('position',[.5,.06,.40,.40])
image(X); title('original')
set(gca,'FontSize',ifont)
set(gca,'YTickLabel','')
Figure 10.1. Singular values σ_k of matrix X versus k. [The plot shows σ_k decaying from about 8,000 toward zero over k = 1, . . . , 200.]
Figure 10.2. Rank k approximations of original image. [Four panels: k = 3, k = 10, k = 30, and the original 200 × 320 image.]
UNIT 11
Additional Topics
11.1 Lecture 39 (cont.)
The Determinant
We will quickly cover the essentials of chapter 6. The determinant is defined as follows.

Definition 11.1.

    det(A) = Σ_p σ(p) a_{1p_1} a_{2p_2} ⋯ a_{np_n},    (11.1.1)

where the sum runs over all permutations p: (1, . . . , n) → (p_1, p_2, . . . , p_n), and σ(p) is the sign of the permutation,

    σ(p) = { +1, if an even number of exchanges is needed to obtain p from (1, . . . , n),
           { −1, if an odd number of exchanges is needed to obtain p from (1, . . . , n).    (11.1.2)
If the determinant is non-zero, then Ax = b has a unique solution.
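For small n, definition (11.1.1) can be implemented directly. A Matlab sketch (the function name permdet is ours, and the cost grows like n · n!, so this is purely illustrative; in practice one computes determinants via the LU factorization):

function d = permdet(A)
% determinant via the permutation sum (11.1.1); exponential cost
n = size(A,1);
P = perms(1:n);                          % all n! permutations
d = 0;
for r = 1:size(P,1)
    p = P(r,:);
    nswaps = 0;                          % count inversions to get sign(p)
    for i = 1:n-1
        nswaps = nswaps + sum(p(i) > p(i+1:n));
    end
    d = d + (-1)^nswaps * prod(A(sub2ind([n n], 1:n, p)));
end
end

For A = randn(4), permdet(A) agrees with Matlab's det(A) to roundoff.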
Theorem 11.2. We have several interesting properties of determinants.
1. Triangular matrices: the determinant is the product of the diagonal entries,

    det [ a11  a12  ⋯  a1n
          0    a22  ⋯  a2n
          ⋮          ⋱   ⋮
          0    0    ⋯  ann ] = ∏_{i=1}^{n} aii.    (11.1.3)
2. det(Aᵀ) = det(A)
3. det(AB) = det(A) det(B).
4. If B is obtained from A by

    • exchanging row i with row j, then det(B) = −det(A);
    • multiplying row i by α, then det(B) = α det(A);
    • adding a multiple of row i to row j, then det(B) = det(A).
5. det(A) is a multilinear function of the rows (and of the columns) of A, i.e., linear in each row separately.
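These properties are easy to spot-check numerically. A minimal Matlab sketch with random test matrices (each printed difference should be ~0 up to roundoff):

% numerical spot-check of Theorem 11.2
A = randn(4); B = randn(4);
det(A') - det(A)                       % property 2
det(A*B) - det(A)*det(B)               % property 3
det(A([2 1 3 4],:)) + det(A)           % row exchange flips the sign
det([2*A(1,:); A(2:4,:)]) - 2*det(A)   % scaling a row scales the determinant
C = A; C(2,:) = C(2,:) + 3*C(1,:);
det(C) - det(A)                        % adding a multiple of a row: unchanged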
11.2 Lecture 40: December 2, 2013
Further details for class
Homework is due Friday; the latest it can possibly be turned in is Tuesday before 4:30 (in order to get solutions). The final is on Wednesday at 7:30–9:30. (?)
Today we will cover eigenvalues and eigenvectors. Then on Wednesday we will coverpositive-definite matrices.
For the final, we will review on Friday. Some homework problems can safely be ignored because they were too involved.
Diagonalizable Matrices
We know that for any matrix,

    A ∼ B    (11.2.1)

means

    A = SBS⁻¹    (11.2.2)

for some invertible matrix S. Now we want to know when A ∼ D, where D is a diagonal matrix.
Eigenvalues and eigenvectors
Say we have the eigenpair (λ, v), where

    Av = λv,    (11.2.3a)
    (A − λI)v = 0,    (11.2.3b)

which holds exactly when v ∈ N(A − λI). For a nonzero eigenvector v to exist, this null space must be nontrivial, so we care about det(A − λI) = 0. So,
    det(A − λI) = | a11−λ   a12     ⋯   a1n   |
                  | a21     a22−λ   ⋯   a2n   |
                  | ⋮                ⋱    ⋮    |
                  | an1     an2     ⋯   ann−λ |    (11.2.4a)

                = (a11 − λ)(a22 − λ) ⋯ (ann − λ) + powers of λ of degree ≤ n − 2    (11.2.4b)

                = p(λ)    (11.2.4c)

                = (−1)ⁿλⁿ + (−1)ⁿ⁻¹λⁿ⁻¹(a11 + a22 + ⋯ + ann) + lower-order terms in λᵏ, k ≤ n − 2,
                  where a11 + a22 + ⋯ + ann = tr(A)    (11.2.4d)

                = (λ − λ1)(λ − λ2) ⋯ (λ − λn)(−1)ⁿ    (11.2.4e)

                = (−1)ⁿλⁿ + (−1)ⁿ⁻¹λⁿ⁻¹(λ1 + λ2 + ⋯ + λn) + l.o.t.    (11.2.4f)

                = (−1)ⁿ[λⁿ + λⁿ⁻¹(−λ1 − λ2 − ⋯ − λn) + l.o.t.],    (11.2.4g)

where the factorization in (11.2.4e) comes from the fundamental theorem of algebra. From this we get the following:
• Every n × n matrix A has n eigenvalues, counted with multiplicity.
• The sum∑λk = tr(A).
•∏λk = p(0) = det(A).
• If A is triangular, then det(A − λI) = ∏_i (aii − λ), so the roots are simply the diagonal entries: λ_i = aii.
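These facts are easy to confirm numerically. A minimal Matlab sketch with arbitrary test matrices:

% eigenvalue facts: trace, determinant, triangular case
A = randn(5);
lam = eig(A);
sum(lam) - trace(A)             % ~0 up to roundoff
prod(lam) - det(A)              % ~0 up to roundoff
T = triu(randn(5));
sort(eig(T)) - sort(diag(T))    % triangular: eigenvalues are the diagonal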
Example 11.3. As a brief review, find the eigenvalues and the eigenvectors of

    A = [ 1  −1
          1   1 ].

So,
    det(A − λI) = | 1−λ   −1  |
                  | 1    1−λ  | = (1 − λ)² + 1    (11.2.5a)

                = λ² − 2λ + 2,    (11.2.5b)

    λ_{1,2} = (2 ± √(4 − 8)) / 2    (11.2.5c)

            = 1 ± i.    (11.2.5d)
Then for λ1 = 1 + i we solve (A − λ1I)v = 0:

    [ 1−(1+i)    −1      0 ]   [ −i  −1  0 ]
    [ 1       1−(1+i)    0 ] = [  1  −i  0 ],    (11.2.6a)

    → [  1  −i  0 ]
      [ −i  −1  0 ],    (11.2.6b)

    → [  1  −i  0 ]
      [  0   0  0 ].    (11.2.6c)
So the components of v satisfy

    v1 − iv2 = 0,    (11.2.7a)
    v1 = iv2,    (11.2.7b)

giving the eigenvector

    v1 = (i, 1)ᵀ.    (11.2.7c)
Similarly, for the second eigenvalue,

    λ2 = 1 − i,    (11.2.8a)
    v2 = (−i, 1)ᵀ.    (11.2.8b)
Note that the eigenvectors v1,v2 are linearly independent.
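Matlab's eig confirms this example (a quick sketch; the ordering of the eigenvalue pair may differ):

% Example 11.3 numerically
A = [1 -1; 1 1];
[V,D] = eig(A)
% diag(D) contains 1+i and 1-i; the columns of V are
% (complex) scalar multiples of (i,1) and (-i,1)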
Note: If A has a linearly independent set of n eigenvectors {v1, . . . , vn}, then

    V = [ v1 v2 ⋯ vn ]    (eigenvectors as columns)

is invertible and Avj = λjvj. Then, for the diagonal matrix D with the eigenvalues along the diagonal,

    (AV):j = (VD):j,    (11.2.9a)
    AV = VD,    (11.2.9b)
    A = VDV⁻¹.    (11.2.9c)
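Numerically, the diagonalization can be checked with eig (a sketch, reusing Example 11.3):

% A = V D V^{-1}
A = [1 -1; 1 1];
[V,D] = eig(A);
norm(A - V*D/V)   % ~0; V*D/V evaluates V*D*inv(V)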
However, not all matrices are diagonalizable, as the following examples show.
Example 11.4. The matrix

    A = [ 1 1
          0 1 ]

has the double eigenvalue 1, λ1 = λ2 = 1. So,
    A − λI = [ 0 1
               0 0 ],    (11.2.10a)

    dim(N(A − λI)) = 1.    (11.2.10b)
Thus there is only one linearly independent eigenvector, and A is not diagonalizable.
Example 11.5. The matrix

    A = [ 1 0
          0 1 ]

also has the double eigenvalue 1, λ1 = λ2 = 1. But here,

    A − λI = [ 0 0
               0 0 ],    (11.2.11a)

    dim(N(A − λI)) = 2,    (11.2.11b)

and there are two linearly independent eigenvectors:
    v1 = (1, 0)ᵀ  and  v2 = (0, 1)ᵀ.
Example 11.6. A nonzero nilpotent matrix N, with Nᵏ = 0 for some k, does not have a full set of eigenvectors. For instance, when

    A ∼ [ 0 ⋯ 0 0
          1 0   0
            ⋱ ⋱ ⋮
          0   1 0 ],    (11.2.12)

we get λ1 = λ2 = ⋯ = λn = 0 and dim(N(A − λI)) = dim(N(A)) = 1, so there is only a single eigenvector.
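A quick Matlab illustration with the 4 × 4 shift matrix (a sketch):

% nilpotent shift matrix: all eigenvalues zero, one-dimensional null space
A = diag(ones(3,1), -1);   % ones on the subdiagonal, A^4 = 0
eig(A)                     % all zeros
size(null(A), 2)           % dim N(A) = 1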
Theorem 11.7. If A has n distinct eigenvalues, then the corresponding eigenvectors are linearly independent.
Proof. Assume the eigenvectors {vk} are linearly dependent. Then we can write one of them as a linear combination of a linearly independent subset of the others: vk = Σ_{j≠k} cj vj, where the {vj} are linearly independent and every cj ≠ 0 (drop any terms with zero coefficient). Then,

    (A − λkI)vk = (A − λkI) Σ_{j≠k} cj vj,    (11.2.13a)

where the left side is Avk − λkvk = λkvk − λkvk = 0, so

    0 = Σ_{j≠k} cj (Avj − λkvj)    (11.2.13b)
      = Σ_{j≠k} cj (λj − λk) vj    (11.2.13c)
      = Σ_{j≠k} αj vj,  with αj = cj (λj − λk) ≠ 0,    (11.2.13d)

since the eigenvalues are distinct. This means the set {vj} is linearly dependent, which contradicts its construction. So the eigenvectors are linearly independent. □
Now if A = VDV⁻¹, then

    Aᵏ = VDV⁻¹ VDV⁻¹ ⋯ VDV⁻¹    (11.2.14a)
       = VDᵏV⁻¹,    (11.2.14b)

since the interior V⁻¹V factors cancel.
Similarly, we can evaluate a power series in A term by term, since each power is VDᵏV⁻¹. This will be useful in solving systems of differential equations.
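As a quick numerical check of (11.2.14), continuing Example 11.3 (a sketch):

% powers via the diagonalization
A = [1 -1; 1 1];
[V,D] = eig(A);
norm(A^5 - V*D.^5/V)   % ~0; for diagonal D, D.^5 equals D^5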
Index
backward substitution, 9
basic columns, 38
basis, 56, 66, 84
bilinear operator, 149

Cauchy–Schwarz inequality, 100
change of basis, 88
column space, 58
complementary projector, 127
complementary subspaces, 121
condition number, 27, 48
consistent system, 36

determinant, 149
diagonal matrix, 150
differentiation, 86
direct sum, 121

eigenvalues, 150
eigenvectors, vi, 150
elementary operations, 15
Euclidean norm, 19
exams, 73, 74

field, 55
finite difference, 2, 44
four fundamental subspaces, 58
Frobenius norm, 101
fundamental theorem of algebra, 65, 150

geometric series, 46
Givens rotation, 118
Gram–Schmidt orthogonalization, 112

homogeneous solutions, 39
Householder method, 121

idempotent matrices, 122
idempotent operator, 92
ill-posed, 20
induced norm, 104
inner product, 109
interpolation, 63
invariant subspace, 91
isometry, 116

Laplace equation, 2
least squares, 69
left null space, 58
linear function, 39
linear system, 1
linear transformation, 83
    action, 83, 87
linearly dependent, 66
linearly independent, 57, 63
lower triangular, 25
lower triangular system, 5

matrix form, 1
matrix norm, 101
minimization, 74
modified Gram–Schmidt, 116

nilpotent matrix, 128
nilpotent operator, 92
nonbasic columns, 38
norm, 47, 99
normal equations, 71
null space, 58

operation count, 9
order, 3
orthogonal projector, 123
orthogonalization, 111
orthonormal, 111
orthonormal basis, 111

partial differential equations, 111
particular solution, 38
periodic boundary conditions, 44
perturbations, 42
pivoting, 19, 22
PLU factorization, 22
projection, 118

QR factorization, 114

rank, 61
reduced row echelon form, 35
reflection, 118
review, 140
rotation, 117
row echelon form, 31
row space, 58

self-similar, 89
Sherman–Morrison formula, 44
singular value decomposition, 131
singular values, 131
smallest upper bound, 102
spanning set, 56
sparsity, 18
submatrices, 26
subspaces, 67

Taylor series, 3
trace, 40
tridiagonal matrix, 18
tuple, 92

Vandermonde matrix, 63
vector form, 1
vector space, 56

well-posed, 20
Figures
1.1 Finite difference approximation of a 1D boundary value problem. . . . . . . 2
2.1 One-dimensional discrete grids . . . . . . 10
2.2 Two-dimensional discrete grids . . . . . . 11

3.1 Plot of linear problems and their solutions . . . . . . 21

4.1 Geometric illustration of linear systems and their solutions . . . . . . 36
4.2 Figures for Textbook problem 3.3.4 . . . . . . 51

5.1 Basis vector of example solution . . . . . . 57
5.2 Interpolating system . . . . . . 64

6.1 Minimization of distance between point and a plane . . . . . . 73
6.2 Parabolic fitting by least squares . . . . . . 73

7.1 Figure 4.7.4 . . . . . . 95

10.1 Singular values σk of matrix X versus k . . . . . . 147
10.2 Rank k approximations of original image . . . . . . 147
Tables
3.1 Variation of error with the perturbation variable . . . . . . . . . . . . . . . . 20
10.1 Relative error of SVD approximation matrix Ak . . . . . . . . . . . . . . . . 146
Listings
2.1 code stub for tridiagonal solver . . . . . . 13
10.1 svdimag.m . . . . . . 145