QR Factorization and Singular Value Decomposition
COS 323
Why Yet Another Method?
• How do we solve least squares…
– without incurring the condition-squaring effect of the normal equations (ATAx = ATb)?
– when A is singular, "fat", or otherwise poorly specified?
• QR Factorization – Householder method
• Singular Value Decomposition
• Total least squares
• Practical notes
Review: Condition Number
• cond(A) is a function of A
• cond(A) ≥ 1; bigger is worse
• Measures how a change in the input propagates to a change in the output:
$$\frac{\|\Delta x\|}{\|x\|} \le \operatorname{cond}(A)\,\frac{\|\Delta A\|}{\|A\|}$$
• E.g., if cond(A) = 451 then we can lose log10(451) ≈ 2.65 digits of accuracy in x, relative to the precision of A
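As a quick sanity check, a minimal NumPy sketch (the nearly singular matrix here is a made-up example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.99]])        # made-up, nearly singular matrix
c = np.linalg.cond(A)              # ratio of extreme singular values
print(c, np.log10(c))              # log10(cond) ~ digits of accuracy at risk
```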
Normal Equations are Bad
• Normal equations involve solving ATAx = ATb
• cond(ATA) = [cond(A)]2
• E.g., if cond(A) = 451 then we can lose log10(451²) ≈ 5.3 digits of accuracy, relative to the precision of A
$$\frac{\|\Delta x\|}{\|x\|} \le \operatorname{cond}(A^{\mathsf T}A)\,\frac{\|\Delta (A^{\mathsf T}A)\|}{\|A^{\mathsf T}A\|} = [\operatorname{cond}(A)]^2\,\frac{\|\Delta (A^{\mathsf T}A)\|}{\|A^{\mathsf T}A\|}$$
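A minimal sketch of the squaring effect, assuming NumPy and a made-up, nearly rank-deficient A:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.001],
              [1.0, 0.999]])       # columns nearly parallel (made up)
print(np.linalg.cond(A))           # cond(A)
print(np.linalg.cond(A.T @ A))     # ~cond(A)**2: the normal equations square it
```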
QR Decomposition
What if we didn’t have to use ATA?
• Suppose we are “lucky”:
• Upper triangular matrices are nice!
$$\begin{bmatrix} \# & \# & \# \\ 0 & \# & \# \\ 0 & 0 & \# \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} x \cong b$$
How to make A upper-triangular?
• Gaussian elimination?
– Applying elimination yields MAx = Mb
– Want to find the x that minimizes ||MAx − Mb||2
– Problem: ||Mv||2 ≠ ||v||2 (i.e., M might "stretch" a vector v)
– Another problem: M may stretch different vectors differently
– i.e., M does not preserve the Euclidean norm
– i.e., the x that minimizes ||MAx − Mb||2 may not be the x that minimizes ||Ax − b||2 (see the sketch below)
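A one-line demonstration that an elimination matrix changes lengths (a sketch; M here is a made-up elimination step):

```python
import numpy as np

M = np.array([[ 1.0, 0.0],
              [-3.0, 1.0]])        # one Gaussian elimination step (made up)
v = np.array([1.0, 1.0])
print(np.linalg.norm(v), np.linalg.norm(M @ v))  # ~1.41 vs ~2.24: M "stretches" v
```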
QR Factorization
• Find an upper-triangular R and an orthogonal Q s.t.
$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}, \quad\text{so}\quad \begin{bmatrix} R \\ 0 \end{bmatrix} x \cong Q^{\mathsf T} b$$
• Doesn't change the least-squares solution:
– QTQ = I; columns of Q are orthonormal
– i.e., Q preserves the Euclidean norm: ||Qv||2 = ||v||2
Goal of QR
$$\underbrace{A}_{m \times n} = \underbrace{Q}_{m \times m}\,\underbrace{\begin{bmatrix} R \\ 0 \end{bmatrix}}_{m \times n}, \qquad R:\ n \times n,\ \text{upper triangular}; \qquad 0:\ (m-n) \times n,\ \text{all zeros}$$
Reformulating Least Squares using QR
$$\begin{aligned}
\|r\|_2^2 &= \|b - Ax\|_2^2 \\
&= \left\| b - Q \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 && \text{because } A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \\
&= \left\| Q^{\mathsf T} b - Q^{\mathsf T} Q \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 && \text{because } Q \text{ preserves lengths} \\
&= \left\| Q^{\mathsf T} b - \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 && \text{because } Q \text{ is orthogonal } (Q^{\mathsf T} Q = I) \\
&= \|c_1 - Rx\|_2^2 + \|c_2\|_2^2 && \text{if we call } Q^{\mathsf T} b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \\
&= \|c_2\|_2^2 && \text{if we choose } x \text{ such that } Rx = c_1
\end{aligned}$$
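A minimal NumPy sketch of this recipe (the data is made up; np.linalg.qr returns the reduced QR, so Q.T @ b is exactly c1):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                 # fit y = x0 + x1*t (made-up data)
b = np.array([1.0, 2.0, 2.9])

Q, R = np.linalg.qr(A)                     # reduced QR: Q is m x n, R is n x n
c1 = Q.T @ b                               # top block of Q^T b
x = np.linalg.solve(R, c1)                 # solve Rx = c1 (R is triangular)
print(x, np.linalg.lstsq(A, b, rcond=None)[0])   # same answer both ways
```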
Householder Method for Computing QR Decomposition
Orthogonalization for Factorization
• Rough idea:
– For the i-th column of A, "zero out" rows i+1 and below
– Accomplish this by multiplying A by an orthogonal matrix Hi
– Equivalently, apply an orthogonal transformation to the i-th column (e.g., a rotation or reflection)
– Q becomes the product H1 ⋯ Hn; R contains the zeroed-out columns
$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$$
Householder Transformation
• Accomplishes the critical sub-step of factorization:
– Given any vector (e.g., a column of A), reflect it so that its last p elements become 0
– Reflection preserves length (Euclidean norm)
[Figure: the vector (4, 3) is reflected onto (−5, 0): the second component is zeroed while the length, 5, is preserved.]
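A sketch of one such reflection in NumPy, reproducing the (4, 3) → (−5, 0) example (the helper name is mine):

```python
import numpy as np

def householder_reflect(x):
    """Build H = I - 2 v v^T / (v^T v) mapping x onto (-+||x||, 0, ..., 0)."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])   # sign choice avoids cancellation
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

x = np.array([4.0, 3.0])
print(householder_reflect(x) @ x)   # ~[-5, 0]: last element zeroed, length 5 kept
```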
Outcome of Householder
$$H_n \cdots H_2 H_1 A = \begin{bmatrix} R \\ 0 \end{bmatrix}, \quad\text{so}\quad A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \quad\text{where}\quad Q^{\mathsf T} = H_n \cdots H_2 H_1$$
(each Hi is symmetric and orthogonal, so Q = (Hn ⋯ H1)T = H1H2 ⋯ Hn)
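Putting the pieces together, a minimal Householder QR sketch (assumes m ≥ n; not a production implementation):

```python
import numpy as np

def householder_qr(A):
    """Factor A = Q R by successive Householder reflections H_1, ..., H_n."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for i in range(n):
        x = R[i:, i]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                           # column already zero below diagonal
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])      # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Apply H_i = I - 2 v v^T to rows i.. of R; accumulate Q = H_1 ... H_n
        R[i:, :] -= 2.0 * np.outer(v, v @ R[i:, :])
        Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ v, v)
    return Q, R

A = np.random.randn(5, 3)
Q, R = householder_qr(A)
print(np.allclose(A, Q @ R), np.allclose(Q.T @ Q, np.eye(5)))   # True True
```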
Review: Least Squares using QR
$$\|r\|_2^2 = \|b - Ax\|_2^2 = \|c_1 - Rx\|_2^2 + \|c_2\|_2^2, \qquad \text{where } Q^{\mathsf T} b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$$
minimized by choosing x such that Rx = c1, leaving residual norm ||c2||2
Using Householder
• Iteratively compute H1, H2, …, Hn and apply them to A to get R
– also apply them to b to get
$$Q^{\mathsf T} b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$$
• Solve Rx = c1 using back-substitution (a sketch follows below)
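Back-substitution itself is a few lines; a sketch (the function name is mine):

```python
import numpy as np

def back_substitute(R, c1):
    """Solve Rx = c1 for upper-triangular R, working from the last row upward."""
    n = len(c1)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c1[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

A = np.random.randn(6, 3)
b = np.random.randn(6)
Q, R = np.linalg.qr(A)                       # reduced QR, so Q.T @ b is exactly c1
print(np.allclose(back_substitute(R, Q.T @ b),
                  np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```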
Alternative Orthogonalization Methods
• Givens:
– Don't reflect; rotate instead
– Introduces zeros into A one at a time
– More complicated to implement than Householder
– Useful when the matrix is sparse
• Gram-Schmidt:
– Iteratively express each new column vector as a linear combination of previous columns, plus some (normalized) orthogonal component
– Conceptually nice, but suffers from subtractive cancellation (demonstrated below)
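The cancellation is easy to see numerically; a sketch using the classic Läuchli test matrix (the helper name is mine):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Orthonormalize columns, subtracting all projections computed from A itself."""
    Q = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        q = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])  # the subtractive step
        Q[:, j] = q / np.linalg.norm(q)
    return Q

eps = 1e-8
A = np.array([[1, 1, 1],
              [eps, 0, 0],
              [0, eps, 0],
              [0, 0, eps]], dtype=float)     # Lauchli matrix: nearly dependent columns
Q = classical_gram_schmidt(A)
print(np.linalg.norm(Q.T @ Q - np.eye(3)))   # ~0.5, nowhere near machine epsilon
```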
Singular Value Decomposition
Motivation #1
• Diagonal matrices are even nicer than triangular ones:
$$\begin{bmatrix} \# & 0 & 0 \\ 0 & \# & 0 \\ 0 & 0 & \# \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} x \cong b$$
Motivation #2
• What if you have fewer data points than parameters in your function?
– i.e., A is "fat"
– Intuitively, can't do standard least squares
– Recall that the solution takes the form ATAx = ATb
– When A has more columns than rows, ATA is singular: can't take its inverse, etc.
Motivation #3
• What if your data poorly constrains the function?
• Example: fitting to y = ax2 + bx + c
Underconstrained Least Squares
• Problem: if the problem is very close to singular, roundoff error can have a huge effect
– Even on "well-determined" values!
• Can detect this:
– Uncertainty is proportional to the covariance C = (ATA)-1
– In other words, unstable if ATA has small eigenvalues
– More precisely, we care if xT(ATA)x is small for some x
• Idea: if part of the solution is unstable, set that part of the answer to 0
– Avoids corrupting the good parts of the answer
Singular Value Decomposition (SVD)
• Handy mathematical technique that has application to many problems
• Given any m×n matrix A, algorithm to find matrices U, V, and W such that
A = U W VT
U is m×n and orthonormal
W is n×n and diagonal
V is n×n and orthonormal
SVD
• Based on Householder reduction and QR decomposition, but treat it as a black box: code is widely available, e.g., in Matlab: [U,W,V]=svd(A,0)
$$A = U \begin{bmatrix} w_1 & & \\ & \ddots & \\ & & w_n \end{bmatrix} V^{\mathsf T}$$
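In NumPy the equivalent of the Matlab call above is the following sketch (note NumPy returns VT, and the wi as a vector):

```python
import numpy as np

A = np.random.randn(6, 3)                          # any m x n matrix (made up)
U, w, Vt = np.linalg.svd(A, full_matrices=False)   # "economy" SVD, like svd(A, 0)
print(np.allclose(A, U @ np.diag(w) @ Vt))         # True: A = U W V^T
print(np.sum(w > 1e-12))                           # number of nonzero w_i = rank(A)
```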
SVD
• The wi are called the singular values of A
• If A is singular, some of the wi will be 0
• In general rank(A) = number of nonzero wi
• SVD is mostly unique (up to permutation of singular values, or if some wi are equal)
SVD and Inverses
• Why is SVD so useful?
• Application #1: inverses
• A-1=(VT)-1 W-1 U-1 = V W-1 UT
– Using fact that inverse = transpose for orthogonal matrices
– Since W is diagonal, W-1 also diagonal with reciprocals of entries of W
SVD and the Pseudoinverse
• A-1=(VT)-1 W-1 U-1 = V W-1 UT
• This fails when some wi are 0
– It's supposed to fail: singular matrix
– Happens when rectangular A is rank-deficient
• Pseudoinverse: if wi = 0, set 1/wi to 0 (!)
– "Closest" matrix to an inverse
– Defined for all matrices (even non-square, singular, etc.)
– Equal to (ATA)-1AT if ATA is invertible
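A sketch of the pseudoinverse built exactly this way (the helper name is mine; NumPy's np.linalg.pinv does the same thing):

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """V W^-1 U^T, with 1/w_i set to 0 for tiny w_i."""
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    w_inv = np.zeros_like(w)
    keep = w > tol * w[0]                    # w is sorted, so w[0] is the largest
    w_inv[keep] = 1.0 / w[keep]
    return Vt.T @ np.diag(w_inv) @ U.T

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                   # rank 1, so A^T A is singular
print(np.allclose(pinv_svd(A), np.linalg.pinv(A)))   # True
```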
SVD and Condition Number
• Singular values are used to compute the Euclidean (spectral) norm and condition number of a matrix:
$$\operatorname{cond}(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$$
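Checked directly (a sketch):

```python
import numpy as np

A = np.random.randn(5, 3)
w = np.linalg.svd(A, compute_uv=False)       # singular values, sorted descending
print(w[0] / w[-1], np.linalg.cond(A))       # both equal sigma_max / sigma_min
```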
SVD and Least Squares
• Solving Ax ≅ b by least squares:
• ATAx = ATb → x = (ATA)-1ATb
• Replace (ATA)-1AT with the pseudoinverse A+: x = A+b
• Compute the pseudoinverse using SVD
– Lets you see if the data is singular (fewer than n nonzero singular values)
– Even if not singular, the condition number tells you how stable the solution will be
– Set 1/wi to 0 if wi is small (even if not exactly 0), as in the sketch below
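A sketch of the whole recipe (the rcond cutoff and helper name are my choices):

```python
import numpy as np

def lstsq_svd(A, b, rcond=1e-12):
    """x = A+ b via SVD, zeroing 1/w_i when w_i is small, not just exactly 0."""
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    w_inv = np.zeros_like(w)
    keep = w > rcond * w[0]
    w_inv[keep] = 1.0 / w[keep]
    return Vt.T @ (w_inv * (U.T @ b))        # x = V W^-1 U^T b

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                   # made-up data
b = np.array([1.0, 2.0, 2.9])
print(lstsq_svd(A, b), np.linalg.lstsq(A, b, rcond=None)[0])   # same answer
```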
Total Least Squares
• One final least squares application
• Fitting a line: vertical vs. perpendicular error
Total Least Squares
• Distance from point to line:
$$d_i = \begin{bmatrix} x_i \\ y_i \end{bmatrix} \cdot n - a$$
where n is the normal vector to the line and a is a constant
• Minimize:
$$\chi^2 = \sum_i d_i^2 = \sum_i \left( \begin{bmatrix} x_i \\ y_i \end{bmatrix} \cdot n - a \right)^2$$
Total Least Squares
• First, let’s pretend we know n, solve for a
• Then
ny
xm
a
any
x
i i
i
i i
i
⋅
=
−⋅
=
∑
∑
1
2
2χ
ny
xan
y
xd
my
i
mx
i
i
ii
i
i ⋅
−
−=−⋅
=
Σ
Σ
Total Least Squares
• So, let’s define and minimize
−
−=
Σ
Σ
my
i
mx
i
i
i
i
i
y
xy
x~
~
∑
⋅
i i
i ny
x2
~
~
Total Least Squares
• Write as a linear system An ≅ 0:
$$\begin{bmatrix} \tilde{x}_1 & \tilde{y}_1 \\ \tilde{x}_2 & \tilde{y}_2 \\ \tilde{x}_3 & \tilde{y}_3 \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} n_x \\ n_y \end{bmatrix} = 0$$
– Problem: lots of n are solutions, including n = 0
– Standard least squares will, in fact, return n = 0
Constrained Optimization
• Solution: constrain n to be unit length
• So, try to minimize ||An||22 subject to ||n||22 = 1:
$$\|An\|_2^2 = (An)^{\mathsf T}(An) = n^{\mathsf T} (A^{\mathsf T} A)\, n$$
• Expand n in the eigenvectors ei of ATA, where the λi are the eigenvalues of ATA:
$$n = \mu_1 e_1 + \mu_2 e_2 \quad\Rightarrow\quad n^{\mathsf T}(A^{\mathsf T}A)\,n = \lambda_1 \mu_1^2 + \lambda_2 \mu_2^2, \qquad \|n\|_2^2 = \mu_1^2 + \mu_2^2$$
Constrained Optimization
• To minimize $\lambda_1 \mu_1^2 + \lambda_2 \mu_2^2$ subject to $\mu_1^2 + \mu_2^2 = 1$, set $\mu_{\min} = 1$ and all other $\mu_i = 0$
• That is, n is the eigenvector of ATA with the smallest corresponding eigenvalue
SVD and Eigenvectors
• Let A=UWVT, and let xi be ith column of V
• Consider ATAxi:
$$A^{\mathsf T}A\,x_i = (UWV^{\mathsf T})^{\mathsf T}(UWV^{\mathsf T})\,x_i = VWU^{\mathsf T}UWV^{\mathsf T}x_i = VW^2V^{\mathsf T}x_i = VW^2 \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} = w_i^2\,x_i$$
since VTxi = ei, the i-th unit vector
• So the elements of W are the square roots of the eigenvalues of ATA, and the columns of V are the eigenvectors of ATA
Constrained Optimization
• To minimize $\lambda_1 \mu_1^2 + \lambda_2 \mu_2^2$ subject to $\mu_1^2 + \mu_2^2 = 1$, set $\mu_{\min} = 1$ and all other $\mu_i = 0$
• That is, n is the eigenvector of ATA with the smallest corresponding eigenvalue
• That is, n is the column of V corresponding to the smallest singular value (see the sketch below)
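A sketch of the full total-least-squares line fit (the function name is mine); note it handles a vertical line, where ordinary least squares on y = mx + c fails:

```python
import numpy as np

def tls_line(x, y):
    """Fit the line n . (x, y) = a, minimizing perpendicular distances."""
    Xt = np.column_stack([x - x.mean(), y - y.mean()])  # centered (x~, y~) matrix
    U, w, Vt = np.linalg.svd(Xt)
    n = Vt[-1]                        # right singular vector, smallest singular value
    a = n @ np.array([x.mean(), y.mean()])   # recover the offset from the means
    return n, a

x = np.array([2.0, 2.0, 2.0, 2.0])    # points on the vertical line x = 2 (made up)
y = np.array([0.0, 1.0, 2.0, 3.0])
n, a = tls_line(x, y)
print(n, a)                           # n ~ [+-1, 0], a ~ +-2: the line x = 2
```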
Comparison of Least Squares Methods
• Normal equations (ATAx = ATb)
– O(mn2) (using Cholesky)
– cond(ATA) = [cond(A)]2
– Cholesky fails if cond(A) ~ 1/sqrt(machine epsilon)
• Householder
– Usually the best orthogonalization method
– O(mn2 − n3/3) operations
– Relative error is the best possible for least squares
– Breaks if cond(A) ~ 1/(machine epsilon)
• SVD
– Expensive: O(mn2 + n3) with a bad constant factor
– Can handle rank-deficiency and near-singularity
– Handy for many different things