Eigen Decomposition and Singular Value Decomposition
Based on the slides by Mani Thomas
Modified and extended by Longin Jan Latecki
Introduction
- Eigenvalue decomposition
  - Spectral decomposition theorem
  - Physical interpretation of eigenvalues/eigenvectors
- Singular Value Decomposition
  - Importance of SVD
    - Matrix inversion
    - Solution to a linear system of equations
    - Solution to a homogeneous system of equations
  - SVD applications
What are eigenvalues? Given a matrix A, x is an eigenvector and λ is the corresponding eigenvalue if Ax = λx. A must be square, and the determinant of A - λI must equal zero:
Ax - λx = 0  ⇒  (A - λI)x = 0
The trivial solution is x = 0; a non-trivial solution occurs when det(A - λI) = 0.
Are eigenvectors unique? No: if x is an eigenvector, then βx is also an eigenvector and λ is still an eigenvalue:
A(βx) = β(Ax) = β(λx) = λ(βx)
Calculating the Eigenvectors/values
Expand det(A - λI) = 0 for a 2 × 2 matrix:

$$\det(A-\lambda I)=\det\!\begin{pmatrix}a_{11}-\lambda & a_{12}\\ a_{21} & a_{22}-\lambda\end{pmatrix}=(a_{11}-\lambda)(a_{22}-\lambda)-a_{12}a_{21}=0$$

$$\lambda^{2}-(a_{11}+a_{22})\lambda+(a_{11}a_{22}-a_{12}a_{21})=0$$

$$\lambda=\frac{(a_{11}+a_{22})\pm\sqrt{(a_{11}+a_{22})^{2}-4\,(a_{11}a_{22}-a_{12}a_{21})}}{2}$$

For a 2 × 2 matrix this is a simple quadratic equation with two solutions (which may be complex). This "characteristic equation" is used to solve for the eigenvalues λ; the eigenvectors are then found from (A - λI)x = 0.
Eigenvalue example
Consider

$$A=\begin{pmatrix}1 & 2\\ 2 & 4\end{pmatrix}$$

The characteristic equation is

$$\lambda^{2}-(1+4)\lambda+(1\cdot 4-2\cdot 2)=\lambda^{2}-5\lambda=0\quad\Rightarrow\quad \lambda=0,\ 5$$

The corresponding eigenvectors can be computed from (A - λI)x = 0.

For λ = 0:
$$\begin{pmatrix}1 & 2\\ 2 & 4\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}0\\ 0\end{pmatrix}$$
one possible solution is x = (2, -1).

For λ = 5:
$$\begin{pmatrix}1-5 & 2\\ 2 & 4-5\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}-4 & 2\\ 2 & -1\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}0\\ 0\end{pmatrix}$$
one possible solution is x = (1, 2).
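A quick numerical check of this example (a minimal sketch using NumPy; the matrix and expected results are just those above):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Eigenvalues and eigenvectors of A
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)  # approximately [0., 5.] (order may differ)

# Columns of eigvecs are unit-length eigenvectors, i.e. scalar
# multiples of (2, -1) and (1, 2); verify Ax = λx for each pair.
for lam, v in zip(eigvals, eigvecs.T):
    print(lam, v, np.allclose(A @ v, lam * v))
```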
For more information: Demos in Linear algebra by G. Strang, http://web.mit.edu/18.06/www/
Physical interpretation
Consider a covariance matrix A, i.e., A = (1/n) S Sᵀ for some data matrix S, for example

$$A=\begin{pmatrix}1 & 0.75\\ 0.75 & 1\end{pmatrix},\qquad \lambda_{1}=1.75,\ \lambda_{2}=0.25$$

The error ellipse has its major axis along the eigenvector of the larger eigenvalue and its minor axis along the eigenvector of the smaller eigenvalue.
Physical interpretation
Orthogonal directions of greatest variance in the data: projections along PC 1 (the first Principal Component) discriminate the data most along any one axis.

[Figure: data scattered over Original Variable A and Original Variable B, with the orthogonal directions PC 1 and PC 2 overlaid.]
Physical interpretation
The first principal component is the direction of greatest variability (covariance) in the data. The second is the next orthogonal (uncorrelated) direction of greatest variability: first remove all the variability along the first component, then find the next direction of greatest variability, and so on. Thus the eigenvectors provide the directions of data variance in decreasing order of their eigenvalues.
For more information: See Gram-Schmidt Orthogonalization in G. Strang’s lectures
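As an illustration (a hedged sketch, not part of the original slides; the synthetic data and variable names are assumptions), the principal components can be read off the eigen decomposition of the data covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2-D data (hypothetical example)
data = rng.multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, 0.75], [0.75, 1.0]],
                               size=500)

# Covariance matrix of the mean-centered data
centered = data - data.mean(axis=0)
cov = centered.T @ centered / len(centered)

# Eigenvectors of the covariance = principal components,
# ordered here by decreasing eigenvalue (variance along each direction)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
print("variances:", eigvals[order])
print("PC 1:", eigvecs[:, order[0]])
print("PC 2:", eigvecs[:, order[1]])
```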
Multivariate Gaussian
Bivariate Gaussian
Spherical, diagonal, full covariance
Eigen/diagonal Decomposition
Let S be a square matrix with m linearly independent eigenvectors (a "non-defective" matrix).
Theorem: there exists an eigen decomposition

$$S=U\Lambda U^{-1}$$

(cf. the matrix diagonalization theorem), where Λ is diagonal. The columns of U are the eigenvectors of S, and the diagonal elements of Λ are the eigenvalues of S. The decomposition is unique for distinct eigenvalues.

Diagonal decomposition: why/how
Let U have the eigenvectors as columns: U = [v₁ ... vₙ]. Then SU can be written

$$SU=S\,[v_{1}\ \cdots\ v_{n}]=[\lambda_{1}v_{1}\ \cdots\ \lambda_{n}v_{n}]=[v_{1}\ \cdots\ v_{n}]\begin{pmatrix}\lambda_{1} & & \\ & \ddots & \\ & & \lambda_{n}\end{pmatrix}=U\Lambda$$

Thus SU = UΛ, or U⁻¹SU = Λ, and S = UΛU⁻¹.
Diagonal decomposition - example
Recall

$$S=\begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix};\qquad \lambda_{1}=1,\ \lambda_{2}=3$$

The eigenvectors (1, -1) and (1, 1) form

$$U=\begin{pmatrix}1 & 1\\ -1 & 1\end{pmatrix}$$

Inverting, we have

$$U^{-1}=\begin{pmatrix}1/2 & -1/2\\ 1/2 & 1/2\end{pmatrix}$$

(Recall: UU⁻¹ = I.) Then,

$$S=U\Lambda U^{-1}=\begin{pmatrix}1 & 1\\ -1 & 1\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & 3\end{pmatrix}\begin{pmatrix}1/2 & -1/2\\ 1/2 & 1/2\end{pmatrix}$$
Example continued
Let's divide U (and multiply U⁻¹) by √2. Then,

$$S=\underbrace{\begin{pmatrix}1/\sqrt{2} & 1/\sqrt{2}\\ -1/\sqrt{2} & 1/\sqrt{2}\end{pmatrix}}_{Q}\ \underbrace{\begin{pmatrix}1 & 0\\ 0 & 3\end{pmatrix}}_{\Lambda}\ \underbrace{\begin{pmatrix}1/\sqrt{2} & -1/\sqrt{2}\\ 1/\sqrt{2} & 1/\sqrt{2}\end{pmatrix}}_{Q^{-1}=Q^{T}}$$

Why? Stay tuned …
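A small numerical sanity check of this orthogonal diagonalization (a sketch, reusing the S from the example):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# For a symmetric matrix, eigh returns orthonormal eigenvectors (columns of Q)
eigvals, Q = np.linalg.eigh(S)
Lam = np.diag(eigvals)

print(np.allclose(S, Q @ Lam @ Q.T))    # True: S = Q Λ Qᵀ
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q is orthogonal (Q⁻¹ = Qᵀ)
```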
Symmetric Eigen Decomposition
If S is a symmetric matrix:
Theorem: there exists a (unique) eigen decomposition

$$S=Q\Lambda Q^{T}$$

where Q is orthogonal: Q⁻¹ = Qᵀ. The columns of Q are normalized eigenvectors, the columns are orthogonal, and everything is real.
Spectral Decomposition theorem
If A is a symmetric and positive definite k × k matrix (xᵀAx > 0) with λᵢ (λᵢ > 0) and eᵢ, i = 1, …, k, being its k eigenvalue-eigenvector pairs, then

$$A=\lambda_{1}e_{1}e_{1}^{T}+\lambda_{2}e_{2}e_{2}^{T}+\cdots+\lambda_{k}e_{k}e_{k}^{T}=\sum_{i=1}^{k}\lambda_{i}e_{i}e_{i}^{T}=P\Lambda P^{T}$$

where

$$P=[e_{1}\ e_{2}\ \cdots\ e_{k}],\qquad \Lambda=\begin{pmatrix}\lambda_{1} & 0 & \cdots & 0\\ 0 & \lambda_{2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_{k}\end{pmatrix}$$

This is also called the eigen decomposition theorem: any symmetric matrix can be reconstructed from its eigenvalues and eigenvectors.
Example for spectral decomposition
Let A be a symmetric, positive definite matrix:

$$A=\begin{pmatrix}2.2 & 0.4\\ 0.4 & 2.8\end{pmatrix},\qquad \det(A-\lambda I)=\lambda^{2}-5\lambda+6.16-0.16=0\ \Rightarrow\ \lambda_{1}=3,\ \lambda_{2}=2$$

The eigenvectors for the corresponding eigenvalues are

$$e_{1}^{T}=\left(\tfrac{1}{\sqrt{5}},\ \tfrac{2}{\sqrt{5}}\right),\qquad e_{2}^{T}=\left(\tfrac{2}{\sqrt{5}},\ -\tfrac{1}{\sqrt{5}}\right)$$

Consequently,

$$A=\lambda_{1}e_{1}e_{1}^{T}+\lambda_{2}e_{2}e_{2}^{T}=3\begin{pmatrix}1/\sqrt{5}\\ 2/\sqrt{5}\end{pmatrix}\!\left(\tfrac{1}{\sqrt{5}}\ \tfrac{2}{\sqrt{5}}\right)+2\begin{pmatrix}2/\sqrt{5}\\ -1/\sqrt{5}\end{pmatrix}\!\left(\tfrac{2}{\sqrt{5}}\ -\tfrac{1}{\sqrt{5}}\right)=\begin{pmatrix}0.6 & 1.2\\ 1.2 & 2.4\end{pmatrix}+\begin{pmatrix}1.6 & -0.8\\ -0.8 & 0.4\end{pmatrix}=\begin{pmatrix}2.2 & 0.4\\ 0.4 & 2.8\end{pmatrix}$$
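A quick NumPy check of this reconstruction (a sketch, reusing the example matrix above):

```python
import numpy as np

A = np.array([[2.2, 0.4],
              [0.4, 2.8]])

# Symmetric matrix: eigh returns orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)   # approximately [2., 3.]

# Rebuild A as the sum of rank-1 terms  λ_i e_i e_iᵀ
A_rebuilt = sum(lam * np.outer(e, e) for lam, e in zip(eigvals, eigvecs.T))
print(np.allclose(A, A_rebuilt))   # True
```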
Singular Value Decomposition
If A is a rectangular m × k matrix of real numbers, then there exists an m × m orthogonal matrix U and a k × k orthogonal matrix V such that

$$A_{m\times k}=U_{m\times m}\,\Lambda_{m\times k}\,V^{T}_{k\times k},\qquad U^{T}U=I,\quad V^{T}V=I$$

Λ is an m × k matrix whose (i, i)th entry is λᵢ ≥ 0 for i = 1, …, min(m, k), with all other entries zero. The positive constants λᵢ are the singular values of A.

If A has rank r, then there exist r positive constants λ₁, λ₂, …, λᵣ, r orthogonal m × 1 unit vectors u₁, u₂, …, uᵣ and r orthogonal k × 1 unit vectors v₁, v₂, …, vᵣ such that

$$A=\sum_{i=1}^{r}\lambda_{i}u_{i}v_{i}^{T}$$

This is similar to the spectral decomposition theorem.
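A short illustration of these shapes (a sketch with NumPy; the matrix here is arbitrary, and np.linalg.svd returns the singular values in decreasing order):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                 # m = 5, k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)           # (5, 5), (3,), (3, 3)

# Rebuild A from the rank-1 terms  λ_i u_i v_iᵀ
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))            # True
```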
Singular Value Decomposition (contd.)
If A is symmetric and positive definite, then the SVD coincides with the eigen decomposition.
In general, AAᵀ has the eigenvalue-eigenvector pairs (λᵢ², uᵢ):

$$AA^{T}=(U\Lambda V^{T})(U\Lambda V^{T})^{T}=U\Lambda V^{T}V\Lambda U^{T}=U\Lambda^{2}U^{T}$$

Alternatively, the vᵢ are the eigenvectors of AᵀA with the same nonzero eigenvalues λᵢ²:

$$A^{T}A=V\Lambda^{2}V^{T}$$
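A brief numerical check of this relationship (a sketch; the matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Nonzero eigenvalues of AAᵀ and AᵀA equal the squared singular values of A
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(eig_AAt[:3], s**2))   # True
print(np.allclose(eig_AtA[:3], s**2))   # True
```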
Example for SVD
Let A be the rectangular matrix

$$A=\begin{pmatrix}3 & 1 & 1\\ -1 & 3 & 1\end{pmatrix}$$

U can be computed from AAᵀ:

$$AA^{T}=\begin{pmatrix}3 & 1 & 1\\ -1 & 3 & 1\end{pmatrix}\begin{pmatrix}3 & -1\\ 1 & 3\\ 1 & 1\end{pmatrix}=\begin{pmatrix}11 & 1\\ 1 & 11\end{pmatrix},\qquad \det(AA^{T}-\lambda I)=0\ \Rightarrow\ \lambda_{1}=12,\ \lambda_{2}=10$$

$$u_{1}^{T}=\left(\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\right),\qquad u_{2}^{T}=\left(\tfrac{1}{\sqrt{2}},\ -\tfrac{1}{\sqrt{2}}\right)$$

V can be computed from AᵀA:

$$A^{T}A=\begin{pmatrix}3 & -1\\ 1 & 3\\ 1 & 1\end{pmatrix}\begin{pmatrix}3 & 1 & 1\\ -1 & 3 & 1\end{pmatrix}=\begin{pmatrix}10 & 0 & 2\\ 0 & 10 & 4\\ 2 & 4 & 2\end{pmatrix},\qquad \det(A^{T}A-\lambda I)=0\ \Rightarrow\ \lambda_{1}=12,\ \lambda_{2}=10,\ \lambda_{3}=0$$

$$v_{1}^{T}=\left(\tfrac{1}{\sqrt{6}},\ \tfrac{2}{\sqrt{6}},\ \tfrac{1}{\sqrt{6}}\right),\quad v_{2}^{T}=\left(\tfrac{2}{\sqrt{5}},\ -\tfrac{1}{\sqrt{5}},\ 0\right),\quad v_{3}^{T}=\left(\tfrac{1}{\sqrt{30}},\ \tfrac{2}{\sqrt{30}},\ -\tfrac{5}{\sqrt{30}}\right)$$
Example for SVD (contd.)
Taking λ₁² = 12 and λ₂² = 10, the singular value decomposition of A is

$$A=\begin{pmatrix}3 & 1 & 1\\ -1 & 3 & 1\end{pmatrix}=\sqrt{12}\begin{pmatrix}1/\sqrt{2}\\ 1/\sqrt{2}\end{pmatrix}\!\left(\tfrac{1}{\sqrt{6}}\ \tfrac{2}{\sqrt{6}}\ \tfrac{1}{\sqrt{6}}\right)+\sqrt{10}\begin{pmatrix}1/\sqrt{2}\\ -1/\sqrt{2}\end{pmatrix}\!\left(\tfrac{2}{\sqrt{5}}\ -\tfrac{1}{\sqrt{5}}\ 0\right)$$

Thus U, V and Λ are obtained by performing eigen decompositions of AAᵀ and AᵀA. Any matrix has a singular value decomposition, but only symmetric, positive definite matrices have a spectral (eigen) decomposition of the form discussed earlier.
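Checking this example numerically (a sketch; np.linalg.svd may flip the signs of a uᵢ, vᵢ pair, which leaves the product unchanged):

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
print(s**2)        # approximately [12., 10.]
print(U)           # columns proportional to (1, 1)/√2 and (1, -1)/√2
print(Vt[:2])      # rows proportional to v1ᵀ and v2ᵀ

# Reconstruct A from the two rank-1 terms
A_rebuilt = s[0] * np.outer(U[:, 0], Vt[0]) + s[1] * np.outer(U[:, 1], Vt[1])
print(np.allclose(A, A_rebuilt))   # True
```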
Applications of SVD in Linear Algebra
Inverse of an n × n square matrix A:
If A is non-singular, then A⁻¹ = (UΛVᵀ)⁻¹ = VΛ⁻¹Uᵀ, where Λ⁻¹ = diag(1/λ₁, 1/λ₂, …, 1/λₙ).
If A is singular, then A⁻¹ = (UΛVᵀ)⁻¹ ≈ VΛ₀⁻¹Uᵀ, where Λ₀⁻¹ = diag(1/λ₁, 1/λ₂, …, 1/λᵢ, 0, 0, …, 0).

Least squares solution of an m × n system Ax = b (A is m × n, m ≥ n):
(AᵀA)x = Aᵀb ⇒ x = (AᵀA)⁻¹Aᵀb = A⁺b. If AᵀA is singular, x = A⁺b ≈ (VΛ₀⁻¹Uᵀ)b, where Λ₀⁻¹ = diag(1/λ₁, 1/λ₂, …, 1/λᵢ, 0, 0, …, 0).

Condition of a matrix:
The condition number measures the degree of singularity of A; the larger the value of λ₁/λₙ, the closer A is to being singular.

http://www.cse.unr.edu/~bebis/MathMethods/SVD/lecture.pdf
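A hedged sketch of this pseudo-inverse route to least squares (the truncation threshold and the example system are assumptions; np.linalg.lstsq is used only to cross-check):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 3))     # m = 6, n = 3 overdetermined system
b = rng.normal(size=6)

# Pseudo-inverse via SVD: A⁺ = V Λ₀⁻¹ Uᵀ, zeroing out tiny singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_inv = np.where(s > 1e-10 * s[0], 1.0 / s, 0.0)
x = Vt.T @ (s_inv * (U.T @ b))

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
print("condition number:", s[0] / s[-1])
```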
Applications of SVD in Linear Algebra
Homogeneous equations, Ax = 0:
The minimum-norm solution is x = 0 (the trivial solution), so impose the constraint ‖x‖ = 1, which gives the "constrained" optimization problem

$$\min_{\|x\|=1}\|Ax\|$$

Special case: if rank(A) = n-1 (m ≥ n-1, λₙ = 0), then x = αvₙ (α is a constant).
General case: if rank(A) = n-k (m ≥ n-k, λ_{n-k+1} = ⋯ = λₙ = 0), then x = α₁v_{n-k+1} + ⋯ + αₖvₙ with α₁² + ⋯ + αₖ² = 1.
For a proof: Johnson and Wichern, "Applied Multivariate Statistical Analysis", pg 79.
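A minimal sketch of solving Ax = 0 under ‖x‖ = 1 (the rank-deficient test matrix is a hypothetical construction; the solution is the right singular vector of the smallest singular value):

```python
import numpy as np

rng = np.random.default_rng(4)
# Build a rank-deficient A (rank n-1) so a nontrivial null vector exists
B = rng.normal(size=(5, 2))
A = np.column_stack([B[:, 0], B[:, 1], B[:, 0] + B[:, 1]])  # 5x3, rank 2

U, s, Vt = np.linalg.svd(A)
x = Vt[-1]                     # v_n: right singular vector for the smallest σ
print(s)                       # last singular value ≈ 0
print(np.linalg.norm(A @ x))   # ≈ 0, with ||x|| = 1
```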
This has appeared before:
Homogeneous solution of a linear system of equations
Computation of homography using DLT
Estimation of the fundamental matrix
What is the use of SVD?
SVD can be used to compute optimal low-rank approximations of arbitrary matrices.
Face recognition: represent the face images as eigenfaces and compute distances between query face images in the principal component space.
Data mining: Latent Semantic Indexing for document retrieval.
Image compression: the Karhunen-Loeve (KL) transform performs the best image compression; in MPEG, the Discrete Cosine Transform (DCT) is the closest approximation to the KL transform in terms of PSNR.
Singular Value Decomposition Illustration of SVD dimensions and
sparseness
SVD example
Let

$$A=\begin{pmatrix}1 & -1\\ 0 & 1\\ 1 & 0\end{pmatrix}$$

Thus m = 3, n = 2. Its SVD is

$$A=\begin{pmatrix}0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}}\\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}}\\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}}\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & \sqrt{3}\\ 0 & 0\end{pmatrix}\begin{pmatrix}\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}}\\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}}\end{pmatrix}$$
Typically, the singular values are arranged in decreasing order.
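For comparison (a sketch reusing the matrix above), np.linalg.svd reports the singular values already sorted in decreasing order, i.e. √3 before 1:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

U, s, Vt = np.linalg.svd(A)   # full U is 3x3, Vt is 2x2
print(s)                      # approximately [1.732, 1.0]
print(U.shape, Vt.shape)      # (3, 3) (2, 2)
```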
Low-rank Approximation
SVD can be used to compute optimal low-rank approximations.
Approximation problem: find the matrix Aₖ of rank k that minimizes

$$A_{k}=\arg\min_{X:\ \mathrm{rank}(X)=k}\|A-X\|_{F}$$

where ‖·‖_F is the Frobenius norm. Aₖ and X are both m×n matrices. Typically, we want k << r.
Low-rank Approximation
Solution via SVD: set the smallest r-k singular values to zero,

$$A_{k}=U\,\mathrm{diag}(\sigma_{1},\ldots,\sigma_{k},0,\ldots,0)\,V^{T}$$

In column notation this is a sum of k rank-1 matrices:

$$A_{k}=\sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{T}$$
Approximation error
How good (bad) is this approximation? It is the best possible, as measured by the Frobenius norm of the error:

$$\min_{X:\ \mathrm{rank}(X)=k}\|A-X\|_{F}=\|A-A_{k}\|_{F}=\sqrt{\sigma_{k+1}^{2}+\cdots+\sigma_{r}^{2}}$$

where the σᵢ are ordered such that σᵢ ≥ σᵢ₊₁. This suggests why the Frobenius error drops as k is increased.
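A short sketch of the truncated-SVD approximation and its error (the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def low_rank(k):
    # Keep the k largest singular values, zero out the rest
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

for k in (1, 2, 3):
    err = np.linalg.norm(A - low_rank(k), 'fro')
    bound = np.sqrt(np.sum(s[k:] ** 2))
    print(k, round(err, 6), round(bound, 6))  # err equals sqrt(σ_{k+1}² + ... + σ_r²)
```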