
Chapter 3

LEAST SQUARES PROBLEMS

[Figure: surveying network; points A, B, C, D, E, F on land rising from the sea]

One application is geodesy & surveying. Let z = elevation, and suppose we have the measurements:

zA ≈ 1., zB ≈ 2., zC ≈ 3., zB − zA ≈ 1., zC − zB ≈ 2., zC − zA ≈ 1.


This is overdetermined and inconsistent:

$$
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
-1 & 1 & 0 \\
0 & -1 & 1 \\
-1 & 0 & 1
\end{bmatrix}
\begin{bmatrix} z_A \\ z_B \\ z_C \end{bmatrix}
\approx
\begin{bmatrix} 1 \\ 2 \\ 3 \\ 1 \\ 2 \\ 1 \end{bmatrix}.
$$
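To make this concrete, here is a minimal NumPy sketch that solves the surveying system in the least squares sense (np.linalg.lstsq minimizes ‖b − Az‖2):

import numpy as np

# Rows 1-3: direct elevation measurements; rows 4-6: differences.
A = np.array([[ 1.,  0., 0.],
              [ 0.,  1., 0.],
              [ 0.,  0., 1.],
              [-1.,  1., 0.],
              [ 0., -1., 1.],
              [-1.,  0., 1.]])
b = np.array([1., 2., 3., 1., 2., 1.])

z, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(z)   # least squares estimates [zA, zB, zC]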

Another application is fitting a curve to given data:

[Figure: data points (xi, yi) in the x-y plane with fitted line y = ax + b]

x1 1x2 1...

...xn 1

[ab

]≈

y1

y2...

yn

.
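A sketch of the same fit in NumPy, with made-up data for illustration:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.9, 4.2, 5.8])    # sample data, invented

M = np.column_stack([x, np.ones_like(x)])    # columns [x_i, 1]
(a, b), *_ = np.linalg.lstsq(M, y, rcond=None)
print(a, b)   # slope and intercept of the least squares line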

More generally, Am×n xn ≈ bm where A is known exactly, m ≥ n, and b is subject to independent random errors of equal variance (which can be achieved by scaling the equations). Gauss (1821) proved that the “best” solution x

minimizes ‖b − Ax‖2,

i.e., the sum of the squares of the residual components. Hence, we might write Ax ≈2 b. Even if the 2-norm is inappropriate, it is the easiest to work with.

3.1 The Normal Equations

3.2 QR Factorization

3.3 Householder Reflections

3.4 Givens Rotations

3.5 Gram-Schmidt Orthogonalization

3.6 Singular Value Decomposition


3.1 The Normal Equations

Recall the inner product xTy for x, y ∈ Rm. (What is the geometric interpretation of the inner product in R3?) A sequence of vectors x1, x2, . . . , xn in Rm is orthogonal if xiTxj = 0 if and only if i ≠ j, and orthonormal if xiTxj = δij. Two subspaces S, T are orthogonal if x ∈ S, y ∈ T ⇒ xTy = 0. If X = [x1, x2, . . . , xn], then orthonormal means XTX = I.

Exercise. Define what it means for a set (rather than a multiset) of vectors to be orthogonal.

An orthogonal matrix Q satisfies QT = Q−1. In 2D or in 3D it represents a reflection and/or rotation.

The problem

minx ‖b − Ax‖2 ⇔ minx1,...,xn ‖b − (x1a1 + · · · + xnan)‖2,

where A = [a1, . . . , an]: find a linear combination of the columns of A which is nearest b.

[Figure: b above the subspace R(A); Ax is the orthogonal projection of b onto R(A) and r is the residual]

Here R(A) = column space of A. The best approximation is when r ⊥ R(A). Hence the best approximation Ax = orthogonal projection of b onto R(A), i.e., r ⊥ aj, j = 1, 2, . . . , n

⇔ ATr = 0 ⇔ AT(b − Ax) = 0 ⇔ (ATA)x = ATb (the normal equations).

Clearly x is unique ⇔ columns of A are linearly independent. Otherwise, there are infinitely many solutions.

The least squares problem is sometimes written

$$
\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}.
$$
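In exact arithmetic this (m + n) × (m + n) system reproduces the normal-equations solution, since the first block row says r = b − Ax and the second says ATr = 0. A sketch on random data (dimensions chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

K = np.block([[np.eye(m), A],
              [A.T, np.zeros((n, n))]])
rx = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
r, x = rx[:m], rx[m:]

x_ne = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations
print(np.allclose(x, x_ne), np.allclose(r, b - A @ x))   # True True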


analytical argument

Let rank(A) = n. Therefore ATA is nonsingular. Let x satisfy ATr = 0 where r = b − Ax. Then for any other value x + w

‖b − A(x + w)‖2² = ‖r − Aw‖2² = rTr − 2wTATr + wTATAw = ‖b − Ax‖2² + ‖Aw‖2².

Hence, a solution of the normal equations is a solution of the least squares problem. The solution x is a unique minimum if rank(A) = n because then ‖Aw‖2 = 0 ⇒ w = 0.

[Figure: b, Ax, and r above R(A); perturbing by Aw gives residual r − Aw, which is longer than r]

the pseudo-inverse

Assume rank(A) = n. Then

x = (ATA)−1ATb.

We call (ATA)−1AT = A† the pseudo-inverse and we write

x = A†b.

(The definition can be extended to the case rank(A) < n.) For a column vector v the pseudo-inverse is v† = vT/‖v‖2². The product A†A = In×n. What about AA†? This is an orthogonal projector.


orthogonal projector

The product vvT/(vTv) = vv† produces an orthogonal projection of a vector onto span{v} because

1. vv†x ∈ span{v},
2. x − vv†x ⊥ span{v}.

Alternatively, (vvT/vTv)x is the closest vector to x that is some multiple of v.

[Figure: x and its orthogonal projection (vvT/vTv)x onto the line spanned by v]

Similarly

AA† = A(ATA)−1AT, rank(A) = n ≤ m,

produces an orthogonal projection onto R(A), because AA†b ∈ R(A) (since AA†b = A(A†b)). For any b,

b − AA†b ⊥ R(A)

since AT(b − AA†b) = 0.

DEFN A matrix P is an orthogonal projector if for any x

x − Px ⊥ R(P) ⇔ ∀x (x − Px)TP = 0 ⇔ PTP = P ⇔ PT = P and P² = P (symmetry and idempotence).

If S is a subspace of Rn, then its orthogonal complement is S⊥ := {x | xTy = 0 for all y ∈ S}. Every v ∈ Rn has a unique decomposition v = x + y, x ∈ S, y ∈ S⊥.
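A numerical sketch of these facts (np.linalg.pinv computes A†, which equals (ATA)−1AT when A has full column rank):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))        # full column rank with probability 1

A_pinv = np.linalg.pinv(A)
print(np.allclose(A_pinv @ A, np.eye(2)))   # A†A = I_n

P = A @ A_pinv                         # orthogonal projector onto R(A)
print(np.allclose(P, P.T))             # symmetry
print(np.allclose(P @ P, P))           # idempotence

b = rng.standard_normal(5)
print(np.allclose(A.T @ (b - P @ b), 0.0))  # b − AA†b ⊥ R(A)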

solving the normal equations

The direct approach to solving (ATA)x = ATb is to

1. form ATA,


2. use Cholesky to get factorization GGT,

3. set x = G−T(G−1(ATb)).
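A sketch of these three steps with SciPy's Cholesky and triangular solvers (the helper name is illustrative):

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def lstsq_normal_eq(A, b):
    # 1. form A^T A;  2. Cholesky A^T A = G G^T;  3. two triangular solves.
    G = cholesky(A.T @ A, lower=True)
    y = solve_triangular(G, A.T @ b, lower=True)     # G y = A^T b
    return solve_triangular(G.T, y, lower=False)     # G^T x = y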

Example With 4-digit round-to-even floating-point arithmetic

$$
A = \begin{bmatrix} 1 & 1.02 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
A^TA = \begin{bmatrix} 3 & 3.02 \\ 3.02 & 3.0404 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & 3.02 \\ 3.02 & 3.04 \end{bmatrix}.
$$

The result is not even positive definite. The problem is that (it can be shown that)

κ2(ATA) = κ2(A)² where κ2(A) = ‖A†‖2‖A‖2.

Any roundoff made in forming ATA will have a very significant effect if A is ill-conditioned. The solution is to

do entire computation in double precision

or

look for another algorithm.
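The squaring of the condition number is easy to observe numerically, e.g., with the matrix A of the example above:

import numpy as np

A = np.array([[1.0, 1.02],
              [1.0, 1.00],
              [1.0, 1.00]])
kA = np.linalg.cond(A)           # 2-norm condition number, via the SVD
kAtA = np.linalg.cond(A.T @ A)
print(kA**2, kAtA)               # essentially equal: kappa2(A^T A) = kappa2(A)^2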

Review questions

1. Define a least squares solution to an overdetermined system Ax ≈ b using matrix notation.

2. For a least squares solution to an overdetermined system Ax ≈ b, how should the “equations” be scaled?

3. Define an orthogonal sequence and an orthonormal sequence.

4. How is a subspace represented computationally?

5. What does it mean for two subspaces to be orthogonal?

6. What is the orthogonal complement of a subspace?

7. What is the null space of a matrix?

8. Give an alternative expression for R(A)⊥ which is more useful computationally.

9. Give a geometric interpretation of a linear least squares problem Ax ≈ b.

10. Give a necessary and sufficient condition for existence of a solution to a linear least squares problem Ax ≈ b; for existence of a unique solution.


11. What are the normal equations for a linear least squares problem Ax ≈ b?

12. Express the linear least squares problem Ax ≈ b as a system of m + n equations in m + n unknowns where m and n are the dimensions of A.

13. If x satisfies the normal equations, show that no other vector can be a better solution to the least squares problem.

14. If y′ is the orthogonal projection of y onto a subspace S, what two conditions does y′ satisfy?

15. What does vvT/vTv do?

16. Give a formula for the orthogonal projector onto a subspace S in terms of a basis a1, a2, . . . , an for S.

17. What two simple conditions must a matrix P satisfy for it to be an orthogonal projector?

18. What is an oblique projector?

19. What does idempotent mean?

20. Why is the explicit use of the normal equations undesirable computationally?

21. If we do use the normal equations, what method is used for the matrix factorization?

Exercises

1. What can you say about a nonsingular orthogonal projector?

2. (a) What is the orthogonal complement of R(A) where

$$
A = \begin{bmatrix} 1 & 1.001 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}?
$$

(b) What is the projector onto the orthogonal complement of R(A)?

(c) What is the projector onto R(A)?

3. Construct the orthogonal projector onto the plane through (0,0,0), (1,1,0), (1,0,1).

4. Assuming u and v are nonzero column vectors, show that I + uvT is orthogonal only if u = −2v/(vTv).


5. Assume that A ∈ Rm×n has linearly independent columns. Hence ATA has a (unique) Cholesky factorization. Prove that there exists a factorization A = Q1R1 where Q1 ∈ Rm×n has columns forming an orthonormal set and R1 ∈ Rn×n is an upper triangular matrix. What is the solution of the least squares problem Ax ≈ b in terms of Q1, R1, and b?

6. Let x∗ be a least squares solution to an overdetermined system Ax ≈ b. What is the geometric interpretation of Ax∗? When is Ax∗ unique? When is x∗ unique?

7. Show that an upper triangular orthogonal matrix is diagonal. What are the diagonal elements?

8. Suppose that the matrix

$$
\begin{bmatrix} A & B \\ 0 & C \end{bmatrix}
$$

is orthogonal where A and C are square submatrices. Prove in a logical, deductive fashion that B = 0. (Hint: there is no need to have a formula for the inverse of a 2 × 2 block upper triangular matrix, and there is no need to consider individual elements, columns, or rows of A, B, or C.)

9. Show that ‖Qx‖2 = ‖x‖2 if Q is orthogonal.

10. Show that if Q is an m by m orthogonal matrix and A is an m by n matrix, then ‖QA‖2 = ‖A‖2.

11. Assume

$$
\begin{bmatrix} Q & q \\ 0^T & \rho \end{bmatrix}
$$

is an orthogonal matrix where Q is square and ρ is a scalar. What can we say about Q, q, and ρ? (The answer should be simplified.)

12. (a) What is the orthogonal projector for span{v1, v2} where v1, v2 are linearly independent real vectors?

(b) How does this simplify if v2 is orthogonal to v1? In particular, what is the usual way of expressing an orthogonal projector in this case?

3.2 QR Factorization

The least squares problem minx ‖b − Ax‖2 is simplified if A is reduced to upper triangular form by means of orthogonal transformations:

QTA = R, right triangular.


Then

$$
\begin{aligned}
\|b - Ax\|_2 &= \|Q^T(b - Ax)\|_2 \qquad \text{(see Exercise 3.1.9)} \\
&= \|Q^Tb - Rx\|_2, \qquad \text{where we partition } Q^Tb = \begin{bmatrix} c \\ d \end{bmatrix}\begin{array}{l} \scriptstyle n \\ \scriptstyle m-n \end{array}, \quad R = \begin{bmatrix} R \\ 0 \end{bmatrix}\begin{array}{l} \scriptstyle n \\ \scriptstyle m-n \end{array} \\
&= \left\| \begin{bmatrix} c \\ d \end{bmatrix} - \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2
= \left\| \begin{bmatrix} c - Rx \\ d \end{bmatrix} \right\|_2 = \sqrt{\|c - Rx\|_2^2 + \|d\|_2^2}\,.
\end{aligned}
$$

Obviously minimized for x = R−1c, which is computed by back substitution. A special case is m = n; i.e., A is square. This method is numerically very stable because there is no growth in the elements of the reduced matrix:

‖R‖2 = ‖QTA‖2 = · · · = ‖A‖2.

(Show this for matrices.) It is twice as much work as Gaussian elimination: (2/3)n³ multiplications using Householder reflections.
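A sketch of the resulting solver, using NumPy's reduced QR factorization and back substitution (helper name illustrative):

import numpy as np
from scipy.linalg import solve_triangular

def lstsq_qr(A, b):
    Q1, R = np.linalg.qr(A, mode='reduced')     # Q1 is m x n, R is n x n
    c = Q1.T @ b                                # first n components of Q^T b
    return solve_triangular(R, c, lower=False)  # back substitution for R x = c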

Review questions

1. What is an orthogonal matrix?

2. What is the effect of multiplication by an orthogonal matrix on the 2-norm of a vector? on the 2-norm of a matrix? Prove it.

3. What is a QR factorization?

4. Show how the QR factorization of a matrix A can be used to solve the linear least squares problem.

Exercises

1. Consider the problem of solving an overdetermined system Ax ≈ b in the least squares sense. Suppose A is such that it is possible to compute an accurate factorization LU where L is a square lower triangular matrix and U is upper triangular with the same dimensions as A. Why would this be of little use in solving the least squares problem; that is, what is special about a QR factorization?


2. Assume that A ∈ Rm×n has linearly independent columns. Hence ATA has a (unique) Cholesky factorization. Prove that there exists a factorization A = Q1R1 where Q1 ∈ Rm×n has columns forming an orthonormal set and R1 ∈ Rn×n is an upper triangular matrix. What is the solution of the least squares problem Ax ≈ b in terms of Q1, R1, and b?

3.3 Householder Reflections

Definition (Householder) An elementary matrix has the form

I + rank one matrix.

It is easy to show that an elementary matrix has the form

I + uvT, u ≠ 0, v ≠ 0.

An example of an elementary matrix is a Gauss transformation Mk = I − mkekT. An elementary matrix is efficient computationally; it is easy to invert. It is not difficult to show that an orthogonal elementary matrix has the form

I − 2vvT/(vTv) =: P.

P is symmetric and P² = I. Exercise. Prove this.

Recall that (vvT/vTv)x is the orthogonal projection of x onto v.

[Figure: x, v, and the orthogonal projection (vvT/vTv)x of x onto v]

What does P = I − 2vvT/(vTv) do?


[Figure: x, its projection (vvT/vTv)x onto v, and the reflection x − 2(vvT/vTv)x in the hyperplane through the origin orthogonal to v]

It reflects in the hyperplane through the origin orthogonal to v. (How should one store P? How should one multiply by P?) In sum:

[Figure: the (Householder) reflection P = I − 2vvT/(vTv) maps x to Px, its mirror image in the hyperplane orthogonal to v]

Note that the length of v is irrelevant; only its direction matters. In practice we want to determine v so that for some given x

Px = [α, 0, . . . , 0]T for some α ∈ R.


Recalling that orthogonal matrices preserve Euclidean length,

‖x‖2 = ‖Px‖2 = |α| =⇒ α = ±‖x‖2.

To get v to point in the right direction, choose v = x − Px.

[Figure: the two possible reflections of x onto the e1-axis, with v = x − Px in each case]

Px = sign(x1)‖x‖2e1: the first components of x and Px have the same sign.
Px = −sign(x1)‖x‖2e1: the first components of x and Px have opposite sign.

Which do we choose?

Note that for very acute angles between x and Px, the direction of v becomes very sensitive to perturbations in x or Px; there is a lot of cancellation in x − Px if they point in similar directions. Therefore,

v = x + sign(x1)‖x‖2e1, sign(0) = 1,
Px = −sign(x1)‖x‖2e1.
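A sketch of this construction (sign(0) taken as 1, as above); for x = (3, 4) it gives Px = (−5, 0):

import numpy as np

def householder_vector(x):
    # v, beta with (I - beta v v^T) x = -sign(x1) ||x||_2 e1
    v = np.array(x, dtype=float)
    sigma = 1.0 if v[0] >= 0 else -1.0     # sign(0) = 1
    v[0] += sigma * np.linalg.norm(v)      # v = x + sign(x1) ||x||_2 e1
    return v, 2.0 / (v @ v)

x = np.array([3.0, 4.0])
v, beta = householder_vector(x)
print(x - beta * v * (v @ x))   # [-5.  0.]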

computing P LAPACK normalizes v so that v1new = 1, and hence uses

vnew = v/v1

instead. Also, it is careful to avoid underflow/overflow in the computation of ‖x‖2. It computes

β := 2/(vTv),

so P = I − βvvT.

computing PA for A ∈ Rm×n We are given v and β where P = I − βvvT. The product P · A would require m²n multiplications. However, A − v(β(vTA)) requires mn multiplications followed by n multiplications followed by mn multiplications. Hence, m²n multiplications vs. 2mn multiplications.
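For instance (P is formed explicitly here only to check the result; in practice one never forms it):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
v = rng.standard_normal(6)
beta = 2.0 / (v @ v)

PA_fast = A - np.outer(v, beta * (v @ A))    # ~2mn multiplications
P = np.eye(6) - beta * np.outer(v, v)        # m x m; P @ A costs ~m^2 n
print(np.allclose(PA_fast, P @ A))           # True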


reduction to triangular form

As in Gaussian elimination, we reduce A column by column.

Recursive description. The case m ≥ n = 1 is an exercise. Consider the case m ≥ n ≥ 2. Determine orthogonal P such that

$$
PA = \begin{bmatrix} \alpha & a^T \\ 0 & \hat A \end{bmatrix}.
$$

P is to be constructed in the next section. Then

$$
A = P \begin{bmatrix} \alpha & a^T \\ 0 & \hat A \end{bmatrix}
$$

and recursion gives Â = Q̂R̂ so that

$$
A = QR \quad\text{where}\quad Q = P \begin{bmatrix} 1 & 0^T \\ 0 & \hat Q \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} \alpha & a^T \\ 0 & \hat R \end{bmatrix}.
$$

Nonrecursive description. First, determine orthogonal P1 such that

$$
P_1A = \left[\begin{array}{c|ccc} \times & \times & \times & \times \\ \hline 0 & & & \\ \vdots & & A_1 & \\ 0 & & & \end{array}\right].
$$

Similarly,

$$
P_2A_1 = \left[\begin{array}{c|cc} \times & \times & \times \\ \hline 0 & & \\ \vdots & A_2 & \\ 0 & & \end{array}\right], \quad
P_3A_2 = \left[\begin{array}{c|c} \times & \times \\ \hline 0 & \\ \vdots & A_3 \\ 0 & \end{array}\right], \quad
P_4A_3 = \begin{bmatrix} \times \\ 0 \\ 0 \end{bmatrix}.
$$

Then

11

1

P4

︸ ︷︷ ︸P4

11

P3

︸ ︷︷ ︸P3

[1

P2

]︸ ︷︷ ︸

P2

P1A =

× × × ×0 × × ×0 0 × ×0 0 0 ×0 0 0 00 0 0 0

︸ ︷︷ ︸R

.

Thus P4P3P2P1 = QT, but we do not need to form this product. Rather we compute

QTb = P4(P3(P2(P1b))).

Note. If m = n, then n − 1 reductions suffice.


Householder orthogonalization At the beginning of the kth stage we have

[Figure: partially reduced m × n array; rows 1 through k − 1 hold the finished rows of R, from r11, r12, . . . , r1n down to rk−1,k−1, . . . , rk−1,n, and the trailing (m − k + 1) × (n − k + 1) block Ãk−1 is still unreduced]

We want to find Pk = I − βkvkvkT such that

$$
P_k\tilde A_{k-1} = \left[\begin{array}{c|ccc} r_{kk} & r_{k,k+1} & \cdots & r_{kn} \\ \hline 0 & & & \\ \vdots & & \tilde A_k & \\ 0 & & & \end{array}\right].
$$

the algorithm

Householder orthogonalization
for k = 1, 2, . . . , n do {
    determine a Householder reflection Pk = I − βkvkvkT such that
        Pk [akk, ak+1,k, . . . , amk]T = [rkk, 0, . . . , 0]T;
    for j = k + 1, k + 2, . . . , n do
        [rkj, ak+1,j, . . . , amj]T = Pk [akj, ak+1,j, . . . , amj]T;
}


The storage scheme is as follows: after the kth step we have

[Figure: storage scheme; the finished rows of R overwrite the upper triangle of the m × n array, each normalized vector ṽk occupies the subdiagonal part of column k, and β1, . . . , βk are kept in a separate array]
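A compact sketch of the whole procedure (list storage instead of the packed scheme, for clarity; assumes A has full column rank so no zero pivot column occurs):

import numpy as np
from scipy.linalg import solve_triangular

def householder_qr(A):
    # Overwrite a working copy of A with R, saving each v_k and beta_k.
    A = np.array(A, dtype=float)
    m, n = A.shape
    vs, betas = [], []
    for k in range(n):
        x = A[k:, k]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)  # sign(0) = 1
        beta = 2.0 / (v @ v)
        A[k:, k:] -= np.outer(v, beta * (v @ A[k:, k:]))  # apply P_k
        vs.append(v)
        betas.append(beta)
    return np.triu(A[:n, :]), vs, betas

def apply_QT(vs, betas, b):
    # Q^T b = P_n( ... (P_2 (P_1 b))), one reflector at a time.
    b = np.array(b, dtype=float)
    for k, (v, beta) in enumerate(zip(vs, betas)):
        b[k:] -= beta * (v @ b[k:]) * v
    return b

A = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
b = np.array([1., 2., 2., 4.])          # made-up right-hand side
R, vs, betas = householder_qr(A)
x = solve_triangular(R, apply_QT(vs, betas, b)[:2], lower=False)
print(x)                                 # least squares solution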

Review questions

1. What is an elementary matrix? Give an explicit form for an elementary matrix.

2. Give an explicit form for an elementary orthogonal matrix. What do we call such a matrix?

3. Describe geometrically the effect of multiplying a vector x by a matrix H = I − 2v(vTv)−1vT.

4. In practice H is constructed so that y = Hx where x and y are given. What condition must y satisfy for H to exist? If all but the first element of y are to vanish, what are the choices for y?

5. Assuming the ability to construct a Householder transformation that maps a given vector to another of the same length, give a recursive algorithm for QR factorization using Householder reflections. As always, use partitioned matrices to describe the algorithm.

6. How much storage is required for the matrix Q in a QR factorization using Householder reflections?

Exercises

1. Suppose that P2P1A = R where Pi = I − βiviviT and

$$
\beta_1 = \tfrac{1}{3},\quad v_1 = \begin{bmatrix} 2 \\ -1 \\ 0 \\ 1 \end{bmatrix},\quad
\beta_2 = \tfrac{1}{9},\quad v_2 = \begin{bmatrix} 0 \\ 4 \\ 1 \\ -1 \end{bmatrix},\quad\text{and}\quad
R = \begin{bmatrix} 3/2 & 0 \\ 0 & -9/4 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.
$$


Determine the least squares solution of Ax ≈ b where

$$
b = \begin{bmatrix} -\tfrac{3}{2} & -\tfrac{5}{2} & 0 & \tfrac{11}{4} \end{bmatrix}^T.
$$

Do an efficient computation and show every step.

2. Determine a reflection P = I − 2v(vTv)−1vT such that Px is a multiple of y where x ≠ 0 and y ≠ 0. Of the two possibilities, which uses for v the sum of two vectors separated by an angle of at most π/2?

3. Prove or disprove: a Householder reflection is positive definite.

4. Write an algorithm for overwriting b with Pn · · · P2P1b where Pk = I − βkvkvkT and vk = [0 · · · 0 vkk · · · vmk]T. Only the following array values should be referenced by your algorithm:

bi, 1 ≤ i ≤ m,
vik, k ≤ i ≤ m, 1 ≤ k ≤ n,
βk, 1 ≤ k ≤ n.

5. Let

$$
A = \begin{bmatrix} 7/8 & -21/25 \\ -1 & 23/50 \\ 2 & -48/25 \\ 2 & 102/25 \end{bmatrix}.
$$

Determine β1, β2, v1, v2 and right triangular matrix R such that P2P1A = R where Pi = I − βiviviT. (As a check confirm that A = P1P2R.)

6. Assuming u and v are nonzero column vectors, show that I + uvT is orthogonal only if u = −2v/(vTv).

7. Count the number of multiplications for the Householder orthogonalization algorithm as described in this section.

8. Recall that for a Householder reflection P = I − 2vvT/(vTv) the vector Px is the mirror reflection of a vector x in the hyperplane through the origin normal to v. Let x and y be linearly independent vectors of equal Euclidean length. By means of a pictorial argument determine a formula for v such that Px = y.

9. Apply Householder orthogonalization to compute the QR decomposition of the matrix

$$
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.
$$


Normalize the Householder vectors vk so that their first (nonzero) element is 1, as does LAPACK, and calculate the βk values. (Do not actually form the matrices Pk or, worse yet, the matrix Q.)

10. The application of a Householder reduction to an m by n matrix A, n < m,

Pn · · · P2P1A = R

yields an upper triangular matrix R. This can be used to create a reduced QR factorization A = Q1R1 where Q1 is m by n and R1 is a square upper triangular matrix. What is R1 in terms of P1, P2, . . . , Pn, R? What is Q1 in terms of P1, P2, . . . , Pn, R?

11. Given on the graph below is a vector x:

|

| / x

| /

| /

| /

| /

| /

| /

| /

|/

---------------------------------------+---------------------------------------

|

|

|

|

|

|

|

|

|

|

|

Construct the vector x′ := −sign(x1)‖x‖2e1 on this graph. Also construct on the graph a vector v in terms of x and x′ such that Px = x′ where P = I − 2vvT/(vTv).

12. Simplify ‖a − ‖a‖2e1‖2² where a, e1 ∈ Rn and e1T = [1, 0, . . . , 0].


3.4 Givens Rotations

The goal is to use one row of a matrix such as

$$
\begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times \\
 & & a & \times & \times & \times \\
 & & & \times & \times & \times \\
 & & & \times & \times & \times \\
 & & b & \times & \times & \times \\
 & & \times & \times & \times & \times \\
 & & \times & \times & \times & \times
\end{bmatrix}
$$

to zero out an element in another row by recombining the two rows. By using

$$
\begin{bmatrix}
1 & & & & & & & \\
& 1 & & & & & & \\
& & c & & & s & & \\
& & & 1 & & & & \\
& & & & 1 & & & \\
& & -s & & & c & & \\
& & & & & & 1 & \\
& & & & & & & 1
\end{bmatrix},
$$

where c² + s² = 1, we can accomplish this with an orthogonal matrix. How should we choose c and s? So that

−sa + cb = 0;

that is,

s = b/√(a² + b²), c = a/√(a² + b²).

Cost is about 6(n − k) operations per eliminated element vs. 4(n − k) operations per eliminated element for a Householder reflection, where k is the column index of the eliminated element.
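A sketch of the construction and one application (np.hypot computes √(a² + b²); the 4 × 2 matrix is the one from the exercises below):

import numpy as np

def givens(a, b):
    # c, s with [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

A = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
c, s = givens(A[2, 0], A[3, 0])        # use row 3 to zero the (4, 1) entry
G = np.array([[c, s], [-s, c]])
A[[2, 3], :] = G @ A[[2, 3], :]
print(A)                                # the (4, 1) entry is now zero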

Note. The matrix

$$
\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
$$

denotes a 2-dimensional clockwise rotation by an angle of θ radians.

Review questions

1. What is the form of a Givens rotation? What is its geometric interpretation?

2. Show how a typical Givens rotation is determined in the course of computing a QR factorization.


Exercises

1. Apply the first 2 (out of 5) Givens rotations in the QR decomposition of the matrix

$$
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.
$$

2. Calculate a Givens rotation that zeros out the (4, 1) entry of

$$
\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ -1 & -1 \\ -1 & 1 \end{bmatrix}
$$

without changing the 1st and 3rd rows.

3.5 Gram-Schmidt Orthogonalization

Recall that if P is an orthogonal projector for a subspace S, then y − Py ⊥ S.

The classical Gram-Schmidt process produces an orthonormal set q1, q2, . . . , qn from a linearly independent set a1, a2, . . . , an, and, in particular, q1, q2, . . . , qk are formed from a1, a2, . . . , ak, k = 1, 2, . . . , n:

[Figure: a2, q1, and the component v2 of a2 orthogonal to q1]

q1 = a1/‖a1‖2,
v2 = a2 − q1(q1Ta2),
q2 = v2/‖v2‖2,

[Figure: a3, q1, q2, and the component v3 of a3 orthogonal to span{q1, q2}]


v3 = a3 − q1(q1Ta3) − q2(q2Ta3),
q3 = v3/‖v3‖2.

More generally,

$$
v_j = a_j - \sum_{k=1}^{j-1} q_k(q_k^Ta_j), \qquad q_j = v_j/\|v_j\|_2.
$$

Gram-Schmidt orthogonalization computes a reduced QR factorization:

$$
a_j = \sum_{k=1}^{j-1} q_k \underbrace{(q_k^Ta_j)}_{r_{kj}} + \underbrace{\|v_j\|_2}_{r_{jj}}\,q_j = \sum_{k=1}^{j} q_k r_{kj},
$$

so

$$
[\,a_1\ a_2\ \cdots\ a_n\,] = [\,q_1\ q_2\ \cdots\ q_n\,]
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
 & r_{22} & \cdots & r_{2n} \\
 & & \ddots & \vdots \\
 & & & r_{nn}
\end{bmatrix}
$$

or

$$
A = QR, \qquad R = \begin{bmatrix}
\|v_1\|_2 & q_1^Ta_2 & q_1^Ta_3 & \cdots \\
 & \|v_2\|_2 & q_2^Ta_3 & \cdots \\
 & & \|v_3\|_2 & \cdots \\
 & & & \ddots
\end{bmatrix}.
$$

This is a reduced QR factorization. The algorithm is

for k = 1, 2, . . . , n do {
    vk = ak;
    for j = 1, 2, . . . , k − 1 do {
        rjk = qjTak;
        vk = vk − qjrjk;
    }
    rkk = ‖vk‖2;
    qk = vk/rkk;
}
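A direct transcription into NumPy (function name illustrative):

import numpy as np

def cgs(A):
    # Classical Gram-Schmidt reduced QR: A = Q R, Q m x n, R n x n.
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]   # inner products with the original a_k
            v -= Q[:, j] * R[j, k]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R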


(Note the distinction between this reduced QR factorization, in which Q is m × n, and the full QR factorization of Section 3.2.) The classical Gram-Schmidt process is not very satisfactory numerically. A mathematically equivalent but numerically superior alternative is given by the modified Gram-Schmidt process. Instead of

v3 = a3 − q1(q1Ta3) − q2(q2Ta3),

use

v3 = (I − q2q2T)(I − q1q1T)a3.

[Figure: modified Gram-Schmidt; (I − q1q1T)a3 is formed first, and then its component along q2 is removed to give v3]

Computationally,

v3 = a3;
v3 = v3 − q1(q1Tv3);
v3 = v3 − q2(q2Tv3).

An additional change is also desirable and that is to compute the elements of R row by row instead of column by column. A row-oriented version of MGS is preferable because it lends itself readily to the use of column pivoting to deal with possible rank deficiency, whereas the column-oriented version does not. Both versions compute Q column by column, but after computing each new column qk, the row-oriented version immediately orthogonalizes the remaining vectors, ak+1, . . . , an, against qk:

for k = 1, 2, . . . , n do vk = ak;
for k = 1, 2, . . . , n do {
    rkk = ‖vk‖2;
    qk = vk/rkk;
    for j = k + 1, k + 2, . . . , n do {
        rkj = qkTvj;
        vj = vj − qkrkj;
    }
}
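The row-oriented version, transcribed the same way (with rkj as corrected above):

import numpy as np

def mgs(A):
    # Row-oriented modified Gram-Schmidt reduced QR.
    V = np.array(A, dtype=float)
    m, n = V.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ V[:, j]
            V[:, j] -= Q[:, k] * R[k, j]
    return Q, R

On ill-conditioned inputs (e.g., a Hilbert matrix) the loss of orthogonality ‖QTQ − I‖ from mgs is typically orders of magnitude smaller than from cgs above.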


Review questions

1. Let q1, q2, . . . , qn be a sequence obtained from a linearly independent sequence a1, a2, . . . , an by Gram-Schmidt orthogonalization. The sequence q1, q2, . . . , qn is uniquely determined by what two properties? One involves relationships among the terms of the generated sequence and the other involves relationships between the two sequences. Express the second of these properties as a matrix equation.

2. Reproduce the algorithm for Gram-Schmidt orthogonalization.

3. What is the “defining” idea of the row-oriented version of MGS?

Exercises

1. (a) Give a high-level recursive algorithm for the reduced QR factorization A = QR of an m by n matrix by working in terms of a (1, n − 1) partitioning of the columns of A and Q and a (1, n − 1) by (1, n − 1) partitioning of R. (You will have to use the orthogonality of Q, not to mention the triangularity of R.)

(b) Give the nonrecursive equivalent of part (a).

(c) How is this algorithm related to the classical or modified Gram-Schmidt process?

2. Apply the first 2 (out of 5) Givens rotations in the QR decomposition of the matrix

$$
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.
$$

3.6 Singular Value Decomposition

THEOREM (SVD) Let A ∈ Rm×n, m ≥ n. There exist orthogonal matrices U ∈ Rm×m, V ∈ Rn×n such that

$$
A = U\Sigma V^T, \qquad \Sigma = \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \\ & & 0 & \end{bmatrix},
$$

where the singular values σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.

If m < n, one can use the SVD for AT to get an SVD for A.

The SVD provides a geometric interpretation for A:

VT: rotations and/or reflections


Σ: differential scaling along coordinate axes

U : rotations and/or reflections

[Figure: the unit sphere B is mapped by VT to VTB (still the unit sphere), scaled by Σ to ΣVTB (an ellipsoid with semi-axes σ1, σ2, . . .), and rotated and/or reflected by U to UΣVTB]

Hence ‖A‖2 = σ1. The condition number κ2(A) = σ1/σn. Computing the SVD is discussed in Section 4.7.

3.6.1 Application to linear least squares problem

‖b − UΣVTx‖2 = ‖UTb − ΣVTx‖2.

Let c = UTb, y = VTx. The problem is now to minimize

$$
\|c - \Sigma y\|_2 = \sqrt{(c_1 - \sigma_1 y_1)^2 + \cdots + (c_n - \sigma_n y_n)^2 + c_{n+1}^2 + \cdots + c_m^2}
$$

with solution yi = ci/σi.

What if σr+1 = · · · = σn = 0 but σr > 0 (rank deficient)? Then yr+1, . . . , yn can be anything, and we have a family of solutions x = V y. Suppose we ask for the x having minimum ‖x‖2. Note

‖x‖2 = ‖V y‖2 = ‖y‖2.


For smallest ‖y‖2 choose

$$
y_i = \begin{cases} c_i/\sigma_i, & i = 1, 2, \ldots, r, \\ 0, & i = r+1, r+2, \ldots, n. \end{cases}
$$

That is, y = Σ†c where

$$
\Sigma^\dagger = \begin{bmatrix} \sigma_1^{-1} & & & & \\ & \ddots & & & 0 \\ & & \sigma_r^{-1} & & \\ & & & 0 & \\ 0 & & & & \ddots \end{bmatrix} \in \mathbb{R}^{n\times m}.
$$

Moore–Penrose pseudoinverse

x = A†b, A† = V Σ†UT
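A sketch of the resulting minimum-norm least squares solver (the tolerance used to declare a σi zero is an arbitrary choice here; np.linalg.pinv and np.linalg.lstsq implement the same idea):

import numpy as np

def lstsq_svd(A, b, tol=1e-12):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    c = U.T @ b                      # c = U^T b (first n components suffice)
    y = np.zeros_like(c)
    mask = s > tol * s[0]            # sigma_i treated as nonzero
    y[mask] = c[mask] / s[mask]      # y = Sigma^† c
    return Vt.T @ y                  # x = V y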

3.6.2 Data Compression

THEOREM Let A ∈ Rm×n, n ≤ m, and let B be a matrix of rank k ≤ n for which ‖B − A‖F is smallest. Then

$$
B = \sum_{i=1}^{k} u_i\sigma_i v_i^T
$$

where

$$
A = \sum_{i=1}^{n} u_i\sigma_i v_i^T = [\,u_1 \cdots u_m\,]\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix}\begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}
$$

is the SVD of A.

The SVD is a good way to compute the rank of a matrix.
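A sketch of both uses, truncating the SVD for the best rank-k approximation and counting singular values above a tolerance for the numerical rank (tolerance arbitrary):

import numpy as np

def best_rank_k(A, k):
    # Best rank-k approximation in the Frobenius norm (the theorem above).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def numerical_rank(A, tol=1e-12):
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))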

Review questions

1. Under what conditions does a matrix A ∈ Rm×n, m ≥ n, have a singular value decomposition?

2. Specify precisely the form of a singular value decomposition of a matrix A ∈ Rm×n for n < m, for n = m, and for n > m.

3. What is the 2-norm of a matrix in terms of its SVD?

4. What is the 2-norm condition number of a nonsingular matrix in terms of its SVD?


5. What additional requirement makes the solution of any linear least squares problem unique?

6. What is the Moore-Penrose pseudoinverse of a rank deficient matrix A ∈ Rm×n, n < m?

7. What is the solution of a linear least squares problem with a rank deficient coefficient matrix?

8. What is the matrix B of rank k ≤ n for which ‖B − A‖F is smallest where A ∈ Rm×n, n < m?
