Eigen Decomposition and Singular Value Decomposition
Based on the slides by Mani Thomas
Modified and extended by Longin Jan Latecki
Introduction
- Eigenvalue decomposition
  - Spectral decomposition theorem
  - Physical interpretation of eigenvalues/eigenvectors
- Singular Value Decomposition
  - Importance of SVD
    - Matrix inversion
    - Solution to a linear system of equations
    - Solution to a homogeneous system of equations
  - SVD applications
What are eigenvalues? Given a matrix A, x is an eigenvector and λ is the corresponding eigenvalue if Ax = λx. A must be square, and the determinant of A - λI must equal zero:
Ax - λx = 0  ⇒  (A - λI)x = 0
The trivial solution is x = 0; a non-trivial solution occurs when det(A - λI) = 0.
Are eigenvectors unique? No: if x is an eigenvector, then βx is also an eigenvector and λ is still an eigenvalue:
A(βx) = β(Ax) = β(λx) = λ(βx)
Calculating the Eigenvectors/values
Expand det(A - λI) = 0 for a 2 × 2 matrix:

$$\det(A-\lambda I)=\det\!\begin{pmatrix}a_{11}-\lambda & a_{12}\\ a_{21} & a_{22}-\lambda\end{pmatrix}=(a_{11}-\lambda)(a_{22}-\lambda)-a_{12}a_{21}=0$$

$$\lambda^{2}-(a_{11}+a_{22})\lambda+(a_{11}a_{22}-a_{12}a_{21})=0$$

$$\lambda=\frac{(a_{11}+a_{22})\pm\sqrt{(a_{11}+a_{22})^{2}-4\,(a_{11}a_{22}-a_{12}a_{21})}}{2}$$

For a 2 × 2 matrix this is a simple quadratic equation with two solutions (which may be complex). This "characteristic equation" is used to solve for the eigenvalues λ; the eigenvectors are then found from (A - λI)x = 0.
Eigenvalue example
Consider

$$A=\begin{pmatrix}1 & 2\\ 2 & 4\end{pmatrix}$$

The characteristic equation is

$$\lambda^{2}-(1+4)\lambda+(1\cdot 4-2\cdot 2)=\lambda^{2}-5\lambda=0\quad\Rightarrow\quad \lambda=0,\ 5$$

The corresponding eigenvectors can be computed from (A - λI)x = 0.

For λ = 0:
$$\begin{pmatrix}1 & 2\\ 2 & 4\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}0\\ 0\end{pmatrix}$$
one possible solution is x = (2, -1).

For λ = 5:
$$\begin{pmatrix}1-5 & 2\\ 2 & 4-5\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}-4 & 2\\ 2 & -1\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}0\\ 0\end{pmatrix}$$
one possible solution is x = (1, 2).
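A quick numerical check of this example (a minimal sketch using NumPy; the matrix and expected results are just those above):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Eigenvalues and eigenvectors of A
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)  # approximately [0., 5.] (order may differ)

# Columns of eigvecs are unit-length eigenvectors, i.e. scalar
# multiples of (2, -1) and (1, 2); verify Ax = λx for each pair.
for lam, v in zip(eigvals, eigvecs.T):
    print(lam, v, np.allclose(A @ v, lam * v))
```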
For more information: Demos in Linear algebra by G. Strang, http://web.mit.edu/18.06/www/
Physical interpretation
Consider a covariance matrix A, i.e., A = (1/n) S Sᵀ for some data matrix S, for example

$$A=\begin{pmatrix}1 & 0.75\\ 0.75 & 1\end{pmatrix},\qquad \lambda_{1}=1.75,\ \lambda_{2}=0.25$$

The error ellipse has its major axis along the eigenvector of the larger eigenvalue and its minor axis along the eigenvector of the smaller eigenvalue.
Physical interpretation
Orthogonal directions of greatest variance in the data: projections along PC 1 (the first Principal Component) discriminate the data most along any one axis.

[Figure: data scattered over Original Variable A and Original Variable B, with the orthogonal directions PC 1 and PC 2 overlaid.]
Physical interpretation
The first principal component is the direction of greatest variability (covariance) in the data. The second is the next orthogonal (uncorrelated) direction of greatest variability: first remove all the variability along the first component, then find the next direction of greatest variability, and so on. Thus the eigenvectors provide the directions of data variance in decreasing order of their eigenvalues.
For more information: See Gram-Schmidt Orthogonalization in G. Strang’s lectures
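As an illustration (a hedged sketch, not part of the original slides; the synthetic data and variable names are assumptions), the principal components can be read off the eigen decomposition of the data covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2-D data (hypothetical example)
data = rng.multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, 0.75], [0.75, 1.0]],
                               size=500)

# Covariance matrix of the mean-centered data
centered = data - data.mean(axis=0)
cov = centered.T @ centered / len(centered)

# Eigenvectors of the covariance = principal components,
# ordered here by decreasing eigenvalue (variance along each direction)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
print("variances:", eigvals[order])
print("PC 1:", eigvecs[:, order[0]])
print("PC 2:", eigvecs[:, order[1]])
```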
Multivariate Gaussian
Bivariate Gaussian
Spherical, diagonal, full covariance
Eigen/diagonal Decomposition
Let S be a square matrix with m linearly independent eigenvectors (a "non-defective" matrix).
Theorem: there exists an eigen decomposition

$$S=U\Lambda U^{-1}$$

(cf. the matrix diagonalization theorem), where Λ is diagonal. The columns of U are the eigenvectors of S, and the diagonal elements of Λ are the eigenvalues of S. The decomposition is unique for distinct eigenvalues.

Diagonal decomposition: why/how
Let U have the eigenvectors as columns: U = [v₁ ... vₙ]. Then SU can be written

$$SU=S\,[v_{1}\ \cdots\ v_{n}]=[\lambda_{1}v_{1}\ \cdots\ \lambda_{n}v_{n}]=[v_{1}\ \cdots\ v_{n}]\begin{pmatrix}\lambda_{1} & & \\ & \ddots & \\ & & \lambda_{n}\end{pmatrix}=U\Lambda$$

Thus SU = UΛ, or U⁻¹SU = Λ, and S = UΛU⁻¹.
Diagonal decomposition - example
Recall

$$S=\begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix};\qquad \lambda_{1}=1,\ \lambda_{2}=3$$

The eigenvectors (1, -1) and (1, 1) form

$$U=\begin{pmatrix}1 & 1\\ -1 & 1\end{pmatrix}$$

Inverting, we have

$$U^{-1}=\begin{pmatrix}1/2 & -1/2\\ 1/2 & 1/2\end{pmatrix}$$

(Recall: UU⁻¹ = I.) Then,

$$S=U\Lambda U^{-1}=\begin{pmatrix}1 & 1\\ -1 & 1\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & 3\end{pmatrix}\begin{pmatrix}1/2 & -1/2\\ 1/2 & 1/2\end{pmatrix}$$
Example continued
Let's divide U (and multiply U⁻¹) by √2. Then,

$$S=\underbrace{\begin{pmatrix}1/\sqrt{2} & 1/\sqrt{2}\\ -1/\sqrt{2} & 1/\sqrt{2}\end{pmatrix}}_{Q}\ \underbrace{\begin{pmatrix}1 & 0\\ 0 & 3\end{pmatrix}}_{\Lambda}\ \underbrace{\begin{pmatrix}1/\sqrt{2} & -1/\sqrt{2}\\ 1/\sqrt{2} & 1/\sqrt{2}\end{pmatrix}}_{Q^{-1}=Q^{T}}$$

Why? Stay tuned …
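A small numerical sanity check of this orthogonal diagonalization (a sketch, reusing the S from the example):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# For a symmetric matrix, eigh returns orthonormal eigenvectors (columns of Q)
eigvals, Q = np.linalg.eigh(S)
Lam = np.diag(eigvals)

print(np.allclose(S, Q @ Lam @ Q.T))    # True: S = Q Λ Qᵀ
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q is orthogonal (Q⁻¹ = Qᵀ)
```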
Symmetric Eigen Decomposition
If S is a symmetric matrix:
Theorem: there exists a (unique) eigen decomposition

$$S=Q\Lambda Q^{T}$$

where Q is orthogonal: Q⁻¹ = Qᵀ. The columns of Q are normalized eigenvectors, the columns are orthogonal, and everything is real.
Spectral Decomposition theorem
If A is a symmetric and positive definite k × k matrix (xᵀAx > 0) with λᵢ (λᵢ > 0) and eᵢ, i = 1, …, k, being its k eigenvalue-eigenvector pairs, then

$$A=\lambda_{1}e_{1}e_{1}^{T}+\lambda_{2}e_{2}e_{2}^{T}+\cdots+\lambda_{k}e_{k}e_{k}^{T}=\sum_{i=1}^{k}\lambda_{i}e_{i}e_{i}^{T}=P\Lambda P^{T}$$

where

$$P=[e_{1}\ e_{2}\ \cdots\ e_{k}],\qquad \Lambda=\begin{pmatrix}\lambda_{1} & 0 & \cdots & 0\\ 0 & \lambda_{2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_{k}\end{pmatrix}$$

This is also called the eigen decomposition theorem: any symmetric matrix can be reconstructed from its eigenvalues and eigenvectors.
Example for spectral decomposition
Let A be a symmetric, positive definite matrix:

$$A=\begin{pmatrix}2.2 & 0.4\\ 0.4 & 2.8\end{pmatrix},\qquad \det(A-\lambda I)=\lambda^{2}-5\lambda+6.16-0.16=0\ \Rightarrow\ \lambda_{1}=3,\ \lambda_{2}=2$$

The eigenvectors for the corresponding eigenvalues are

$$e_{1}^{T}=\left(\tfrac{1}{\sqrt{5}},\ \tfrac{2}{\sqrt{5}}\right),\qquad e_{2}^{T}=\left(\tfrac{2}{\sqrt{5}},\ -\tfrac{1}{\sqrt{5}}\right)$$

Consequently,

$$A=\lambda_{1}e_{1}e_{1}^{T}+\lambda_{2}e_{2}e_{2}^{T}=3\begin{pmatrix}1/\sqrt{5}\\ 2/\sqrt{5}\end{pmatrix}\!\left(\tfrac{1}{\sqrt{5}}\ \tfrac{2}{\sqrt{5}}\right)+2\begin{pmatrix}2/\sqrt{5}\\ -1/\sqrt{5}\end{pmatrix}\!\left(\tfrac{2}{\sqrt{5}}\ -\tfrac{1}{\sqrt{5}}\right)=\begin{pmatrix}0.6 & 1.2\\ 1.2 & 2.4\end{pmatrix}+\begin{pmatrix}1.6 & -0.8\\ -0.8 & 0.4\end{pmatrix}=\begin{pmatrix}2.2 & 0.4\\ 0.4 & 2.8\end{pmatrix}$$
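A quick NumPy check of this reconstruction (a sketch, reusing the example matrix above):

```python
import numpy as np

A = np.array([[2.2, 0.4],
              [0.4, 2.8]])

# Symmetric matrix: eigh returns orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)   # approximately [2., 3.]

# Rebuild A as the sum of rank-1 terms  λ_i e_i e_iᵀ
A_rebuilt = sum(lam * np.outer(e, e) for lam, e in zip(eigvals, eigvecs.T))
print(np.allclose(A, A_rebuilt))   # True
```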
Singular Value Decomposition
If A is a rectangular m × k matrix of real numbers, then there exists an m × m orthogonal matrix U and a k × k orthogonal matrix V such that

$$A_{m\times k}=U_{m\times m}\,\Lambda_{m\times k}\,V^{T}_{k\times k},\qquad U^{T}U=I,\quad V^{T}V=I$$

Λ is an m × k matrix whose (i, i)th entry is λᵢ ≥ 0 for i = 1, …, min(m, k), with all other entries zero. The positive constants λᵢ are the singular values of A.

If A has rank r, then there exist r positive constants λ₁, λ₂, …, λᵣ, r orthogonal m × 1 unit vectors u₁, u₂, …, uᵣ and r orthogonal k × 1 unit vectors v₁, v₂, …, vᵣ such that

$$A=\sum_{i=1}^{r}\lambda_{i}u_{i}v_{i}^{T}$$

This is similar to the spectral decomposition theorem.
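A short illustration of these shapes (a sketch with NumPy; the matrix here is arbitrary, and np.linalg.svd returns the singular values in decreasing order):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                 # m = 5, k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)           # (5, 5), (3,), (3, 3)

# Rebuild A from the rank-1 terms  λ_i u_i v_iᵀ
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))            # True
```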
Singular Value Decomposition (contd.)
If A is symmetric and positive definite, then the SVD coincides with the eigen decomposition.
In general, AAᵀ has the eigenvalue-eigenvector pairs (λᵢ², uᵢ):

$$AA^{T}=(U\Lambda V^{T})(U\Lambda V^{T})^{T}=U\Lambda V^{T}V\Lambda U^{T}=U\Lambda^{2}U^{T}$$

Alternatively, the vᵢ are the eigenvectors of AᵀA with the same nonzero eigenvalues λᵢ²:

$$A^{T}A=V\Lambda^{2}V^{T}$$
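A brief numerical check of this relationship (a sketch; the matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Nonzero eigenvalues of AAᵀ and AᵀA equal the squared singular values of A
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(eig_AAt[:3], s**2))   # True
print(np.allclose(eig_AtA[:3], s**2))   # True
```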
Example for SVD
Let A be the rectangular matrix

$$A=\begin{pmatrix}3 & 1 & 1\\ -1 & 3 & 1\end{pmatrix}$$

U can be computed from AAᵀ:

$$AA^{T}=\begin{pmatrix}3 & 1 & 1\\ -1 & 3 & 1\end{pmatrix}\begin{pmatrix}3 & -1\\ 1 & 3\\ 1 & 1\end{pmatrix}=\begin{pmatrix}11 & 1\\ 1 & 11\end{pmatrix},\qquad \det(AA^{T}-\lambda I)=0\ \Rightarrow\ \lambda_{1}=12,\ \lambda_{2}=10$$

$$u_{1}^{T}=\left(\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\right),\qquad u_{2}^{T}=\left(\tfrac{1}{\sqrt{2}},\ -\tfrac{1}{\sqrt{2}}\right)$$

V can be computed from AᵀA:

$$A^{T}A=\begin{pmatrix}3 & -1\\ 1 & 3\\ 1 & 1\end{pmatrix}\begin{pmatrix}3 & 1 & 1\\ -1 & 3 & 1\end{pmatrix}=\begin{pmatrix}10 & 0 & 2\\ 0 & 10 & 4\\ 2 & 4 & 2\end{pmatrix},\qquad \det(A^{T}A-\lambda I)=0\ \Rightarrow\ \lambda_{1}=12,\ \lambda_{2}=10,\ \lambda_{3}=0$$

$$v_{1}^{T}=\left(\tfrac{1}{\sqrt{6}},\ \tfrac{2}{\sqrt{6}},\ \tfrac{1}{\sqrt{6}}\right),\quad v_{2}^{T}=\left(\tfrac{2}{\sqrt{5}},\ -\tfrac{1}{\sqrt{5}},\ 0\right),\quad v_{3}^{T}=\left(\tfrac{1}{\sqrt{30}},\ \tfrac{2}{\sqrt{30}},\ -\tfrac{5}{\sqrt{30}}\right)$$
Example for SVD (contd.)
Taking λ₁² = 12 and λ₂² = 10, the singular value decomposition of A is

$$A=\begin{pmatrix}3 & 1 & 1\\ -1 & 3 & 1\end{pmatrix}=\sqrt{12}\begin{pmatrix}1/\sqrt{2}\\ 1/\sqrt{2}\end{pmatrix}\!\left(\tfrac{1}{\sqrt{6}}\ \tfrac{2}{\sqrt{6}}\ \tfrac{1}{\sqrt{6}}\right)+\sqrt{10}\begin{pmatrix}1/\sqrt{2}\\ -1/\sqrt{2}\end{pmatrix}\!\left(\tfrac{2}{\sqrt{5}}\ -\tfrac{1}{\sqrt{5}}\ 0\right)$$

Thus U, V and Λ are obtained by performing eigen decompositions of AAᵀ and AᵀA. Any matrix has a singular value decomposition, but only symmetric, positive definite matrices have a spectral (eigen) decomposition of the form discussed earlier.
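Checking this example numerically (a sketch; np.linalg.svd may flip the signs of a uᵢ, vᵢ pair, which leaves the product unchanged):

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
print(s**2)        # approximately [12., 10.]
print(U)           # columns proportional to (1, 1)/√2 and (1, -1)/√2
print(Vt[:2])      # rows proportional to v1ᵀ and v2ᵀ

# Reconstruct A from the two rank-1 terms
A_rebuilt = s[0] * np.outer(U[:, 0], Vt[0]) + s[1] * np.outer(U[:, 1], Vt[1])
print(np.allclose(A, A_rebuilt))   # True
```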
Applications of SVD in Linear Algebra
Inverse of an n × n square matrix A:
If A is non-singular, then A⁻¹ = (UΛVᵀ)⁻¹ = VΛ⁻¹Uᵀ, where Λ⁻¹ = diag(1/λ₁, 1/λ₂, …, 1/λₙ).
If A is singular, then A⁻¹ = (UΛVᵀ)⁻¹ ≈ VΛ₀⁻¹Uᵀ, where Λ₀⁻¹ = diag(1/λ₁, 1/λ₂, …, 1/λᵢ, 0, 0, …, 0).

Least squares solution of an m × n system Ax = b (A is m × n, m ≥ n):
(AᵀA)x = Aᵀb ⇒ x = (AᵀA)⁻¹Aᵀb = A⁺b. If AᵀA is singular, x = A⁺b ≈ (VΛ₀⁻¹Uᵀ)b, where Λ₀⁻¹ = diag(1/λ₁, 1/λ₂, …, 1/λᵢ, 0, 0, …, 0).

Condition of a matrix:
The condition number measures the degree of singularity of A; the larger the value of λ₁/λₙ, the closer A is to being singular.

http://www.cse.unr.edu/~bebis/MathMethods/SVD/lecture.pdf
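A hedged sketch of this pseudo-inverse route to least squares (the truncation threshold and the example system are assumptions; np.linalg.lstsq is used only to cross-check):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 3))     # m = 6, n = 3 overdetermined system
b = rng.normal(size=6)

# Pseudo-inverse via SVD: A⁺ = V Λ₀⁻¹ Uᵀ, zeroing out tiny singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_inv = np.where(s > 1e-10 * s[0], 1.0 / s, 0.0)
x = Vt.T @ (s_inv * (U.T @ b))

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
print("condition number:", s[0] / s[-1])
```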
Applications of SVD in Linear Algebra
Homogeneous equations, Ax = 0:
The minimum-norm solution is x = 0 (the trivial solution), so impose the constraint ‖x‖ = 1, which gives the "constrained" optimization problem

$$\min_{\|x\|=1}\|Ax\|$$

Special case: if rank(A) = n-1 (m ≥ n-1, λₙ = 0), then x = αvₙ (α is a constant).
General case: if rank(A) = n-k (m ≥ n-k, λ_{n-k+1} = ⋯ = λₙ = 0), then x = α₁v_{n-k+1} + ⋯ + αₖvₙ with α₁² + ⋯ + αₖ² = 1.
For a proof: Johnson and Wichern, "Applied Multivariate Statistical Analysis", pg 79.
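A minimal sketch of solving Ax = 0 under ‖x‖ = 1 (the rank-deficient test matrix is a hypothetical construction; the solution is the right singular vector of the smallest singular value):

```python
import numpy as np

rng = np.random.default_rng(4)
# Build a rank-deficient A (rank n-1) so a nontrivial null vector exists
B = rng.normal(size=(5, 2))
A = np.column_stack([B[:, 0], B[:, 1], B[:, 0] + B[:, 1]])  # 5x3, rank 2

U, s, Vt = np.linalg.svd(A)
x = Vt[-1]                     # v_n: right singular vector for the smallest σ
print(s)                       # last singular value ≈ 0
print(np.linalg.norm(A @ x))   # ≈ 0, with ||x|| = 1
```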
This has appeared before:
Homogeneous solution of a linear system of equations
Computation of homography using DLT
Estimation of the fundamental matrix
What is the use of SVD?
SVD can be used to compute optimal low-rank approximations of arbitrary matrices.
Face recognition: represent the face images as eigenfaces and compute distances between query face images in the principal component space.
Data mining: Latent Semantic Indexing for document retrieval.
Image compression: the Karhunen-Loeve (KL) transform performs the best image compression; in MPEG, the Discrete Cosine Transform (DCT) is the closest approximation to the KL transform in terms of PSNR.
Singular Value Decomposition Illustration of SVD dimensions and
sparseness
SVD example
Let

$$A=\begin{pmatrix}1 & -1\\ 0 & 1\\ 1 & 0\end{pmatrix}$$

Thus m = 3, n = 2. Its SVD is

$$A=\begin{pmatrix}0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}}\\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}}\\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{3}}\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & \sqrt{3}\\ 0 & 0\end{pmatrix}\begin{pmatrix}\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}}\\ \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}}\end{pmatrix}$$
Typically, the singular values are arranged in decreasing order.
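For comparison (a sketch reusing the matrix above), np.linalg.svd reports the singular values already sorted in decreasing order, i.e. √3 before 1:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

U, s, Vt = np.linalg.svd(A)   # full U is 3x3, Vt is 2x2
print(s)                      # approximately [1.732, 1.0]
print(U.shape, Vt.shape)      # (3, 3) (2, 2)
```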
Low-rank Approximation
SVD can be used to compute optimal low-rank approximations.
Approximation problem: find the matrix Aₖ of rank k that minimizes

$$A_{k}=\arg\min_{X:\ \mathrm{rank}(X)=k}\|A-X\|_{F}$$

where ‖·‖_F is the Frobenius norm. Aₖ and X are both m×n matrices. Typically, we want k << r.
Low-rank Approximation
Solution via SVD: set the smallest r-k singular values to zero,

$$A_{k}=U\,\mathrm{diag}(\sigma_{1},\ldots,\sigma_{k},0,\ldots,0)\,V^{T}$$

In column notation this is a sum of k rank-1 matrices:

$$A_{k}=\sum_{i=1}^{k}\sigma_{i}u_{i}v_{i}^{T}$$
Approximation error
How good (bad) is this approximation? It is the best possible, as measured by the Frobenius norm of the error:

$$\min_{X:\ \mathrm{rank}(X)=k}\|A-X\|_{F}=\|A-A_{k}\|_{F}=\sqrt{\sigma_{k+1}^{2}+\cdots+\sigma_{r}^{2}}$$

where the σᵢ are ordered such that σᵢ ≥ σᵢ₊₁. This suggests why the Frobenius error drops as k is increased.
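A short sketch of the truncated-SVD approximation and its error (the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def low_rank(k):
    # Keep the k largest singular values, zero out the rest
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

for k in (1, 2, 3):
    err = np.linalg.norm(A - low_rank(k), 'fro')
    bound = np.sqrt(np.sum(s[k:] ** 2))
    print(k, round(err, 6), round(bound, 6))  # err equals sqrt(σ_{k+1}² + ... + σ_r²)
```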