QR Factorization and Singular Value Decomposition
COS 323
Why Yet Another Method?
• How do we solve least squares…
– without incurring the condition-squaring effect of the normal equations (ATAx = ATb)?
– when A is singular, "fat", or otherwise poorly specified?
• QR Factorization – Householder method
• Singular Value Decomposition
• Total least squares
• Practical notes
Review: Condition Number
• cond(A) is a function of A
• cond(A) ≥ 1; bigger is worse
• Measures how a change in the input propagates to a change in the output:
$$\frac{\|\Delta x\|}{\|x\|} \le \operatorname{cond}(A)\,\frac{\|\Delta A\|}{\|A\|}$$
• E.g., if cond(A) = 451 then we can lose log10(451) ≈ 2.65 digits of accuracy in x, relative to the precision of A
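As a quick sanity check, a minimal NumPy sketch (the nearly singular matrix here is a made-up example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.99]])        # made-up, nearly singular matrix
c = np.linalg.cond(A)              # ratio of extreme singular values
print(c, np.log10(c))              # log10(cond) ~ digits of accuracy at risk
```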
Normal Equations are Bad
• Normal equations involve solving ATAx = ATb
• cond(ATA) = [cond(A)]2
• E.g., if cond(A) = 451 then we can lose log10(451²) ≈ 5.3 digits of accuracy, relative to the precision of A
$$\frac{\|\Delta x\|}{\|x\|} \le \operatorname{cond}(A^{\mathsf T}A)\,\frac{\|\Delta (A^{\mathsf T}A)\|}{\|A^{\mathsf T}A\|} = [\operatorname{cond}(A)]^2\,\frac{\|\Delta (A^{\mathsf T}A)\|}{\|A^{\mathsf T}A\|}$$
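A minimal sketch of the squaring effect, assuming NumPy and a made-up, nearly rank-deficient A:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.001],
              [1.0, 0.999]])       # columns nearly parallel (made up)
print(np.linalg.cond(A))           # cond(A)
print(np.linalg.cond(A.T @ A))     # ~cond(A)**2: the normal equations square it
```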
QR Decomposition
What if we didn’t have to use ATA?
• Suppose we are “lucky”:
• Upper triangular matrices are nice!
$$\begin{bmatrix} \# & \# & \# \\ 0 & \# & \# \\ 0 & 0 & \# \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} x \cong b$$
How to make A upper-triangular?
• Gaussian elimination?
– Applying elimination yields MAx = Mb
– Want to find the x that minimizes ||MAx − Mb||2
– Problem: ||Mv||2 ≠ ||v||2 (i.e., M might "stretch" a vector v)
– Another problem: M may stretch different vectors differently
– i.e., M does not preserve the Euclidean norm
– i.e., the x that minimizes ||MAx − Mb||2 may not be the x that minimizes ||Ax − b||2 (see the sketch below)
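A one-line demonstration that an elimination matrix changes lengths (a sketch; M here is a made-up elimination step):

```python
import numpy as np

M = np.array([[ 1.0, 0.0],
              [-3.0, 1.0]])        # one Gaussian elimination step (made up)
v = np.array([1.0, 1.0])
print(np.linalg.norm(v), np.linalg.norm(M @ v))  # ~1.41 vs ~2.24: M "stretches" v
```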
QR Factorization
• Find an upper-triangular R and an orthogonal Q s.t.
$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}, \quad\text{so}\quad \begin{bmatrix} R \\ 0 \end{bmatrix} x \cong Q^{\mathsf T} b$$
• Doesn't change the least-squares solution:
– QTQ = I; columns of Q are orthonormal
– i.e., Q preserves the Euclidean norm: ||Qv||2 = ||v||2
Goal of QR
$$\underbrace{A}_{m \times n} = \underbrace{Q}_{m \times m}\,\underbrace{\begin{bmatrix} R \\ 0 \end{bmatrix}}_{m \times n}, \qquad R:\ n \times n,\ \text{upper triangular}; \qquad 0:\ (m-n) \times n,\ \text{all zeros}$$
Reformulating Least Squares using QR
$$\begin{aligned}
\|r\|_2^2 &= \|b - Ax\|_2^2 \\
&= \left\| b - Q \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 && \text{because } A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \\
&= \left\| Q^{\mathsf T} b - Q^{\mathsf T} Q \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 && \text{because } Q \text{ preserves lengths} \\
&= \left\| Q^{\mathsf T} b - \begin{bmatrix} R \\ 0 \end{bmatrix} x \right\|_2^2 && \text{because } Q \text{ is orthogonal } (Q^{\mathsf T} Q = I) \\
&= \|c_1 - Rx\|_2^2 + \|c_2\|_2^2 && \text{if we call } Q^{\mathsf T} b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \\
&= \|c_2\|_2^2 && \text{if we choose } x \text{ such that } Rx = c_1
\end{aligned}$$
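A minimal NumPy sketch of this recipe (the data is made up; np.linalg.qr returns the reduced QR, so Q.T @ b is exactly c1):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                 # fit y = x0 + x1*t (made-up data)
b = np.array([1.0, 2.0, 2.9])

Q, R = np.linalg.qr(A)                     # reduced QR: Q is m x n, R is n x n
c1 = Q.T @ b                               # top block of Q^T b
x = np.linalg.solve(R, c1)                 # solve Rx = c1 (R is triangular)
print(x, np.linalg.lstsq(A, b, rcond=None)[0])   # same answer both ways
```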
Householder Method for Computing QR Decomposition
Orthogonalization for Factorization
• Rough idea:
– For the i-th column of A, "zero out" rows i+1 and below
– Accomplish this by multiplying A by an orthogonal matrix Hi
– Equivalently, apply an orthogonal transformation to the i-th column (e.g., a rotation or reflection)
– Q becomes the product H1 ⋯ Hn; R contains the zeroed-out columns
$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$$
Householder Transformation
• Accomplishes the critical sub-step of factorization:
– Given any vector (e.g., a column of A), reflect it so that its last p elements become 0
– Reflection preserves length (Euclidean norm)
[Figure: the vector (4, 3) is reflected onto (−5, 0): the second component is zeroed while the length, 5, is preserved.]
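A sketch of one such reflection in NumPy, reproducing the (4, 3) → (−5, 0) example (the helper name is mine):

```python
import numpy as np

def householder_reflect(x):
    """Build H = I - 2 v v^T / (v^T v) mapping x onto (-+||x||, 0, ..., 0)."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])   # sign choice avoids cancellation
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

x = np.array([4.0, 3.0])
print(householder_reflect(x) @ x)   # ~[-5, 0]: last element zeroed, length 5 kept
```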
Outcome of Householder
$$H_n \cdots H_2 H_1 A = \begin{bmatrix} R \\ 0 \end{bmatrix}, \quad\text{so}\quad A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \quad\text{where}\quad Q^{\mathsf T} = H_n \cdots H_2 H_1$$
(each Hi is symmetric and orthogonal, so Q = (Hn ⋯ H1)T = H1H2 ⋯ Hn)
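Putting the pieces together, a minimal Householder QR sketch (assumes m ≥ n; not a production implementation):

```python
import numpy as np

def householder_qr(A):
    """Factor A = Q R by successive Householder reflections H_1, ..., H_n."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for i in range(n):
        x = R[i:, i]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                           # column already zero below diagonal
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])      # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Apply H_i = I - 2 v v^T to rows i.. of R; accumulate Q = H_1 ... H_n
        R[i:, :] -= 2.0 * np.outer(v, v @ R[i:, :])
        Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ v, v)
    return Q, R

A = np.random.randn(5, 3)
Q, R = householder_qr(A)
print(np.allclose(A, Q @ R), np.allclose(Q.T @ Q, np.eye(5)))   # True True
```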
Review: Least Squares using QR
$$\|r\|_2^2 = \|b - Ax\|_2^2 = \|c_1 - Rx\|_2^2 + \|c_2\|_2^2, \qquad \text{where } Q^{\mathsf T} b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$$
minimized by choosing x such that Rx = c1, leaving residual norm ||c2||2
Using Householder
• Iteratively compute H1, H2, …, Hn and apply them to A to get R
– also apply them to b to get
$$Q^{\mathsf T} b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$$
• Solve Rx = c1 using back-substitution (a sketch follows below)
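Back-substitution itself is a few lines; a sketch (the function name is mine):

```python
import numpy as np

def back_substitute(R, c1):
    """Solve Rx = c1 for upper-triangular R, working from the last row upward."""
    n = len(c1)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c1[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

A = np.random.randn(6, 3)
b = np.random.randn(6)
Q, R = np.linalg.qr(A)                       # reduced QR, so Q.T @ b is exactly c1
print(np.allclose(back_substitute(R, Q.T @ b),
                  np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```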
Alternative Orthogonalization Methods
• Givens:
– Don't reflect; rotate instead
– Introduces zeros into A one at a time
– More complicated to implement than Householder
– Useful when the matrix is sparse
• Gram-Schmidt:
– Iteratively express each new column vector as a linear combination of previous columns, plus some (normalized) orthogonal component
– Conceptually nice, but suffers from subtractive cancellation (demonstrated below)
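The cancellation is easy to see numerically; a sketch using the classic Läuchli test matrix (the helper name is mine):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Orthonormalize columns, subtracting all projections computed from A itself."""
    Q = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        q = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])  # the subtractive step
        Q[:, j] = q / np.linalg.norm(q)
    return Q

eps = 1e-8
A = np.array([[1, 1, 1],
              [eps, 0, 0],
              [0, eps, 0],
              [0, 0, eps]], dtype=float)     # Lauchli matrix: nearly dependent columns
Q = classical_gram_schmidt(A)
print(np.linalg.norm(Q.T @ Q - np.eye(3)))   # ~0.5, nowhere near machine epsilon
```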
Singular Value Decomposition
Motivation #1
• Diagonal matrices are even nicer than triangular ones:
$$\begin{bmatrix} \# & 0 & 0 \\ 0 & \# & 0 \\ 0 & 0 & \# \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} x \cong b$$
Motivation #2
• What if you have fewer data points than parameters in your function?
– i.e., A is "fat"
– Intuitively, can't do standard least squares
– Recall that the solution takes the form ATAx = ATb
– When A has more columns than rows, ATA is singular: can't take its inverse, etc.
Motivation #3
• What if your data poorly constrains the function?
• Example: fitting to y = ax2 + bx + c
Underconstrained Least Squares
• Problem: if the problem is very close to singular, roundoff error can have a huge effect
– Even on "well-determined" values!
• Can detect this:
– Uncertainty is proportional to the covariance C = (ATA)-1
– In other words, unstable if ATA has small eigenvalues
– More precisely, we care if xT(ATA)x is small for some x
• Idea: if part of the solution is unstable, set that part of the answer to 0
– Avoids corrupting the good parts of the answer
Singular Value Decomposition (SVD)
• Handy mathematical technique that has application to many problems
• Given any m×n matrix A, algorithm to find matrices U, V, and W such that
A = U W VT
U is m×n and orthonormal
W is n×n and diagonal
V is n×n and orthonormal
SVD
• Based on Householder reduction and QR decomposition, but treat it as a black box: code is widely available, e.g., in Matlab: [U,W,V]=svd(A,0)
$$A = U \begin{bmatrix} w_1 & & \\ & \ddots & \\ & & w_n \end{bmatrix} V^{\mathsf T}$$
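In NumPy the equivalent of the Matlab call above is the following sketch (note NumPy returns VT, and the wi as a vector):

```python
import numpy as np

A = np.random.randn(6, 3)                          # any m x n matrix (made up)
U, w, Vt = np.linalg.svd(A, full_matrices=False)   # "economy" SVD, like svd(A, 0)
print(np.allclose(A, U @ np.diag(w) @ Vt))         # True: A = U W V^T
print(np.sum(w > 1e-12))                           # number of nonzero w_i = rank(A)
```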
SVD
• The wi are called the singular values of A
• If A is singular, some of the wi will be 0
• In general rank(A) = number of nonzero wi
• SVD is mostly unique (up to permutation of singular values, or if some wi are equal)
SVD and Inverses
• Why is SVD so useful?
• Application #1: inverses
• A-1=(VT)-1 W-1 U-1 = V W-1 UT
– Using fact that inverse = transpose for orthogonal matrices
– Since W is diagonal, W-1 also diagonal with reciprocals of entries of W
SVD and the Pseudoinverse
• A-1=(VT)-1 W-1 U-1 = V W-1 UT
• This fails when some wi are 0
– It's supposed to fail: singular matrix
– Happens when rectangular A is rank-deficient
• Pseudoinverse: if wi = 0, set 1/wi to 0 (!)
– "Closest" matrix to an inverse
– Defined for all matrices (even non-square, singular, etc.)
– Equal to (ATA)-1AT if ATA is invertible
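A sketch of the pseudoinverse built exactly this way (the helper name is mine; NumPy's np.linalg.pinv does the same thing):

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """V W^-1 U^T, with 1/w_i set to 0 for tiny w_i."""
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    w_inv = np.zeros_like(w)
    keep = w > tol * w[0]                    # w is sorted, so w[0] is the largest
    w_inv[keep] = 1.0 / w[keep]
    return Vt.T @ np.diag(w_inv) @ U.T

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                   # rank 1, so A^T A is singular
print(np.allclose(pinv_svd(A), np.linalg.pinv(A)))   # True
```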
SVD and Condition Number
• Singular values are used to compute the Euclidean (spectral) norm and condition number of a matrix:
$$\operatorname{cond}(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$$
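Checked directly (a sketch):

```python
import numpy as np

A = np.random.randn(5, 3)
w = np.linalg.svd(A, compute_uv=False)       # singular values, sorted descending
print(w[0] / w[-1], np.linalg.cond(A))       # both equal sigma_max / sigma_min
```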
SVD and Least Squares
• Solving Ax ≅ b by least squares:
• ATAx = ATb → x = (ATA)-1ATb
• Replace (ATA)-1AT with the pseudoinverse A+: x = A+b
• Compute the pseudoinverse using SVD
– Lets you see if the data is singular (fewer than n nonzero singular values)
– Even if not singular, the condition number tells you how stable the solution will be
– Set 1/wi to 0 if wi is small (even if not exactly 0), as in the sketch below
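A sketch of the whole recipe (the rcond cutoff and helper name are my choices):

```python
import numpy as np

def lstsq_svd(A, b, rcond=1e-12):
    """x = A+ b via SVD, zeroing 1/w_i when w_i is small, not just exactly 0."""
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    w_inv = np.zeros_like(w)
    keep = w > rcond * w[0]
    w_inv[keep] = 1.0 / w[keep]
    return Vt.T @ (w_inv * (U.T @ b))        # x = V W^-1 U^T b

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                   # made-up data
b = np.array([1.0, 2.0, 2.9])
print(lstsq_svd(A, b), np.linalg.lstsq(A, b, rcond=None)[0])   # same answer
```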
Total Least Squares
• One final least squares application
• Fitting a line: vertical vs. perpendicular error
Total Least Squares
• Distance from point to line:
$$d_i = \begin{bmatrix} x_i \\ y_i \end{bmatrix} \cdot n - a$$
where n is the normal vector to the line and a is a constant
• Minimize:
$$\chi^2 = \sum_i d_i^2 = \sum_i \left( \begin{bmatrix} x_i \\ y_i \end{bmatrix} \cdot n - a \right)^2$$
Total Least Squares
• First, let’s pretend we know n, solve for a
• Then
ny
xm
a
any
x
i i
i
i i
i
⋅
=
−⋅
=
∑
∑
1
2
2χ
ny
xan
y
xd
my
i
mx
i
i
ii
i
i ⋅
−
−=−⋅
=
Σ
Σ
Total Least Squares
• So, let’s define and minimize
−
−=
Σ
Σ
my
i
mx
i
i
i
i
i
y
xy
x~
~
∑
⋅
i i
i ny
x2
~
~
Total Least Squares
• Write as a linear system An ≅ 0:
$$\begin{bmatrix} \tilde{x}_1 & \tilde{y}_1 \\ \tilde{x}_2 & \tilde{y}_2 \\ \tilde{x}_3 & \tilde{y}_3 \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} n_x \\ n_y \end{bmatrix} = 0$$
– Problem: lots of n are solutions, including n = 0
– Standard least squares will, in fact, return n = 0
Constrained Optimization
• Solution: constrain n to be unit length
• So, try to minimize ||An||22 subject to ||n||22 = 1:
$$\|An\|_2^2 = (An)^{\mathsf T}(An) = n^{\mathsf T} (A^{\mathsf T} A)\, n$$
• Expand n in the eigenvectors ei of ATA, where the λi are the eigenvalues of ATA:
$$n = \mu_1 e_1 + \mu_2 e_2 \quad\Rightarrow\quad n^{\mathsf T}(A^{\mathsf T}A)\,n = \lambda_1 \mu_1^2 + \lambda_2 \mu_2^2, \qquad \|n\|_2^2 = \mu_1^2 + \mu_2^2$$
Constrained Optimization
• To minimize $\lambda_1 \mu_1^2 + \lambda_2 \mu_2^2$ subject to $\mu_1^2 + \mu_2^2 = 1$, set $\mu_{\min} = 1$ and all other $\mu_i = 0$
• That is, n is the eigenvector of ATA with the smallest corresponding eigenvalue
SVD and Eigenvectors
• Let A=UWVT, and let xi be ith column of V
• Consider ATAxi:
$$A^{\mathsf T}A\,x_i = (UWV^{\mathsf T})^{\mathsf T}(UWV^{\mathsf T})\,x_i = VWU^{\mathsf T}UWV^{\mathsf T}x_i = VW^2V^{\mathsf T}x_i = VW^2 \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} = w_i^2\,x_i$$
since VTxi = ei, the i-th unit vector
• So the elements of W are the square roots of the eigenvalues of ATA, and the columns of V are the eigenvectors of ATA
Constrained Optimization
• To minimize $\lambda_1 \mu_1^2 + \lambda_2 \mu_2^2$ subject to $\mu_1^2 + \mu_2^2 = 1$, set $\mu_{\min} = 1$ and all other $\mu_i = 0$
• That is, n is the eigenvector of ATA with the smallest corresponding eigenvalue
• That is, n is the column of V corresponding to the smallest singular value (see the sketch below)
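A sketch of the full total-least-squares line fit (the function name is mine); note it handles a vertical line, where ordinary least squares on y = mx + c fails:

```python
import numpy as np

def tls_line(x, y):
    """Fit the line n . (x, y) = a, minimizing perpendicular distances."""
    Xt = np.column_stack([x - x.mean(), y - y.mean()])  # centered (x~, y~) matrix
    U, w, Vt = np.linalg.svd(Xt)
    n = Vt[-1]                        # right singular vector, smallest singular value
    a = n @ np.array([x.mean(), y.mean()])   # recover the offset from the means
    return n, a

x = np.array([2.0, 2.0, 2.0, 2.0])    # points on the vertical line x = 2 (made up)
y = np.array([0.0, 1.0, 2.0, 3.0])
n, a = tls_line(x, y)
print(n, a)                           # n ~ [+-1, 0], a ~ +-2: the line x = 2
```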
Comparison of Least Squares Methods
• Normal equations (ATAx = ATb)
– O(mn2) (using Cholesky)
– cond(ATA) = [cond(A)]2
– Cholesky fails if cond(A) ~ 1/sqrt(machine epsilon)
• Householder
– Usually the best orthogonalization method
– O(mn2 − n3/3) operations
– Relative error is the best possible for least squares
– Breaks if cond(A) ~ 1/(machine epsilon)
• SVD
– Expensive: O(mn2 + n3) with a bad constant factor
– Can handle rank-deficiency and near-singularity
– Handy for many different things