
CS 789 ADVANCED BIG DATA

- EIGENVECTOR

- PRINCIPAL COMPONENTS ANALYSIS

Mingon Kang, Ph.D.

Department of Computer Science, University of Nevada, Las Vegas

* Some contents are adapted from Dr. Hung Huang and Dr. Vassilis Athitsos at UT

Arlington

Eigenvector & eigenvalue

An eigenvector of a linear transformation is a non-zero vector that does not change its direction when the transformation is applied to it; the eigenvalue is the factor by which it is scaled:

𝐀𝐱 = 𝜆𝐱

where 𝐀 is a square matrix, 𝐱 is an eigenvector, and 𝜆 is the corresponding eigenvalue

Eigenvector: Geometric exploration

http://www.sineofthetimes.org/eigenvectors-of-2-x-2-matrices-a-geometric-exploration/

Eigenvector & eigenvalue

𝐀 = [0 1; 1 0], 𝐱 = (1, 1)′. Then 𝐀𝐱 = (1, 1)′ = 1 ∙ 𝐱, so 𝐱 is an eigenvector with eigenvalue 𝜆 = 1.

𝐀 = [0 1; 1 0], 𝐱 = (−1, 1)′. Then 𝐀𝐱 = (1, −1)′ = −1 ∙ 𝐱, so 𝐱 is an eigenvector with eigenvalue 𝜆 = −1.

Sum of eigenvalues: Σ𝜆𝑖 = tr(𝐀) = Σ𝑎𝑖𝑖; product of eigenvalues: Π𝜆𝑖 = det(𝐀)
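This example can be checked numerically; a minimal sketch using NumPy's `np.linalg.eig`, confirming the eigenvalues of the matrix above and the trace/determinant identities:

```python
import numpy as np

# The example matrix from the slide: A = [0 1; 1 0]
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.sort(eigenvalues))  # [-1.  1.]

# Sum of eigenvalues equals the trace; product equals the determinant.
print(np.isclose(eigenvalues.sum(), np.trace(A)))        # True
print(np.isclose(eigenvalues.prod(), np.linalg.det(A)))  # True
```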

Eigenvector & eigenvalue

How to solve it?

𝐀𝐱 = 𝜆𝐱 ⇒ 𝐀𝐱 − 𝜆𝐱 = 0 ⇒ (𝐀 − 𝜆𝐈)𝐱 = 0

For a non-zero 𝐱 to exist, 𝐀 − 𝜆𝐈 must be a singular matrix ⇒ det(𝐀 − 𝜆𝐈) = 0

E.g., 𝐀 = [3 1; 1 3]: det(𝐀 − 𝜆𝐈) = (3 − 𝜆)² − 1 = 0 ⇒ 𝜆 = 2 or 𝜆 = 4
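A quick numerical check of this example, assuming NumPy is available: for a 2×2 matrix the characteristic polynomial is 𝜆² − tr(𝐀)𝜆 + det(𝐀), and its roots should match what `np.linalg.eigvals` returns:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Characteristic polynomial of a 2x2 matrix: lambda^2 - tr(A)*lambda + det(A)
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]  # lambda^2 - 6*lambda + 8
roots = np.roots(coeffs)
print(np.sort(roots))                    # [2. 4.]

# Agrees with the eigenvalue routine
print(np.sort(np.linalg.eigvals(A)))     # [2. 4.]
```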

Principal Components Analysis

Reduce the dimensionality of a data set while preserving as much as possible of the variation present in the data set.

Transform to a new set of variables, the “Principal Components” (PCs)

Example

Projection onto principal components

Data with two variables: the variables may be correlated with each other

Project onto a new space in which the variables are uncorrelated

Projection onto the PCs

Dimensionality Reduction

If we have 100 points on the plot, how many numbers do we need to specify them?

If every point (x, y) lies on a line 𝑦 = 𝑎𝑥 + 𝑏, then 𝑎, 𝑏, and the 100 x-coordinates of the points suffice: 102 numbers instead of 200
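A small sketch of this counting argument with hypothetical data (the slope a = 2 and intercept b = 1 are made up for illustration): 102 stored numbers reproduce all 200 coordinates exactly:

```python
import numpy as np

# Hypothetical example: 100 points that all lie on the line y = 2x + 1.
rng = np.random.default_rng(0)
a, b = 2.0, 1.0
x = rng.uniform(-5, 5, size=100)
y = a * x + b                  # raw representation: 200 numbers, (x, y) per point

# Storing only a, b, and the 100 x-coordinates (102 numbers)
# recovers every point exactly.
y_reconstructed = a * x + b
print(np.allclose(y, y_reconstructed))  # True
```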

Dimensionality Reduction

Project all points onto a single line.

If we find that line, we can approximately represent the data with lower dimensionality.

Vector Projection

Projection of vector 𝐚 onto 𝐛: the orthogonal projection of 𝐚 onto a straight line parallel to 𝐛.

𝐚₁ = a₁𝐛̂, where a₁ is a scalar and 𝐛̂ is the unit vector of 𝐛

a₁ = ‖𝐚‖ cos 𝜃 = 𝐚 ∙ 𝐛̂, where ∙ is the dot product

https://en.wikipedia.org/wiki/Vector_projection
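The projection formula can be written as a short helper; `project` is a hypothetical name for illustration, not something from the slides:

```python
import numpy as np

def project(a, b):
    """Orthogonal projection of vector a onto the line spanned by b."""
    b_hat = b / np.linalg.norm(b)  # unit vector in the direction of b
    a1 = np.dot(a, b_hat)          # scalar component: ||a|| cos(theta)
    return a1 * b_hat              # projected vector a1 = a1 * b_hat

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])
print(project(a, b))               # [3. 0.]
```

Projecting (3, 4) onto the x-axis keeps only the x-component, as expected.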

PCA

Given data 𝐱 = (x₁, …, xₚ), p random variables

𝛂𝐤 is a vector of p constants

New projection: 𝛂𝐤 ∙ 𝐱 = 𝛂𝐤′𝐱

Derivation of PCA

How to get principal components:

1. Find the linear function of 𝐱, 𝛂𝟏′𝐱, with maximum variance.

2. Next find another linear function 𝛂𝟐′𝐱, uncorrelated with 𝛂𝟏′𝐱, with maximum variance.

3. Repeat for k steps, where k ≪ p.

Derivation of PCA

Find 𝛂𝐤′𝐱 with maximum variance:

maximize Var(𝛂𝐤′𝐱) = 𝛂𝐤′𝚺𝛂𝐤

subject to 𝛂𝐤′𝛂𝐤 = 1 (unit-length vector)

Use Lagrange multipliers:

𝛂𝐤′𝚺𝛂𝐤 − 𝜆𝑘(𝛂𝐤′𝛂𝐤 − 1)

Derivation of PCA

𝑑/𝑑𝛂𝐤 [𝛂𝐤′𝚺𝛂𝐤 − 𝜆𝑘(𝛂𝐤′𝛂𝐤 − 1)] = 0

⇒ 𝚺𝛂𝐤 − 𝜆𝑘𝛂𝐤 = 0 ⇒ 𝚺𝛂𝐤 = 𝜆𝑘𝛂𝐤

This is the eigenvector equation.

Derivation of PCA

We can obtain eigenvectors and eigenvalues from the equation.

Since Var(𝛂𝐤′𝐱) = 𝛂𝐤′𝚺𝛂𝐤 = 𝜆𝑘𝛂𝐤′𝛂𝐤 = 𝜆𝑘, choose 𝜆𝑘 to be as big as possible.

𝜆₁ is the largest eigenvalue of 𝚺 and 𝛂𝟏 is the corresponding eigenvector; 𝛂𝟏′𝐱 is the first principal component of 𝐱.
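A minimal sketch of this step with synthetic correlated data (the data-generating choices are assumptions for illustration): the eigenvector of the sample covariance with the largest eigenvalue gives the first PC, and the variance of the projected scores equals 𝜆₁:

```python
import numpy as np

# Hypothetical correlated 2-D data: the second feature follows the first.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
X = np.column_stack([x1, 2.0 * x1 + 0.1 * rng.normal(size=500)])

Sigma = np.cov(X, rowvar=False)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigh: symmetric, ascending order

alpha1 = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
pc1 = X @ alpha1                          # first principal component scores

# Var(alpha1' x) = alpha1' Sigma alpha1 = lambda_1, the largest eigenvalue
print(np.isclose(pc1.var(ddof=1), eigvals[-1]))  # True
```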

Derivation of PCA

The second principal component, 𝛂𝟐′𝐱, maximizes 𝛂𝟐′𝚺𝛂𝟐 subject to being uncorrelated with 𝛂𝟏′𝐱:

cov(𝛂𝟏′𝐱, 𝛂𝟐′𝐱) = 𝛂𝟏′𝚺𝛂𝟐 = 𝛂𝟐′𝜆₁𝛂𝟏 = 𝜆₁𝛂𝟐′𝛂𝟏 = 𝜆₁𝛂𝟏′𝛂𝟐 = 0

Lagrangian again:

𝛂𝟐′𝚺𝛂𝟐 − 𝜆₂(𝛂𝟐′𝛂𝟐 − 1) − 𝜙𝛂𝟐′𝛂𝟏

Derivation of PCA

𝑑/𝑑𝛂𝟐 [𝛂𝟐′𝚺𝛂𝟐 − 𝜆₂(𝛂𝟐′𝛂𝟐 − 1) − 𝜙𝛂𝟐′𝛂𝟏] = 0

⇒ 𝚺𝛂𝟐 − 𝜆₂𝛂𝟐 − 𝜙𝛂𝟏 = 0

Multiply on the left by 𝛂𝟏′:

𝛂𝟏′𝚺𝛂𝟐 − 𝜆₂𝛂𝟏′𝛂𝟐 − 𝜙𝛂𝟏′𝛂𝟏 = 0 ⇒ 0 − 0 − 𝜙 ∙ 1 = 0

Now 𝜙 = 0, so 𝚺𝛂𝟐 − 𝜆₂𝛂𝟐 = 0.

Derivation of PCA

This process can be repeated for k = 1, …, p, yielding up to p different eigenvectors of 𝚺 along with the corresponding eigenvalues.
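The full repetition amounts to one eigendecomposition of 𝚺 with eigenvalues sorted in decreasing order; `pca` below is a hypothetical helper sketched for illustration, not code from the slides:

```python
import numpy as np

def pca(X):
    """All principal components of X (n samples x p features),
    via the eigenvector equation Sigma alpha_k = lambda_k alpha_k."""
    Xc = X - X.mean(axis=0)                   # center the data
    Sigma = np.cov(Xc, rowvar=False)          # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending order for symmetric Sigma
    order = np.argsort(eigvals)[::-1]         # reorder by decreasing lambda_k
    return eigvals[order], eigvecs[:, order]

# Hypothetical data: 200 samples of 3 linearly mixed features.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))
lam, V = pca(X)

# Eigenvectors are orthonormal; eigenvalues are in decreasing order.
print(np.allclose(V.T @ V, np.eye(3)))  # True
print(np.all(np.diff(lam) <= 0))        # True
```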

Applications of PCA

Visualization of high-dimensional data

Applications of PCA

Eigenfaces

References

Principal Component Analysis by Frank Wood: http://www.stat.columbia.edu/~fwood/Teaching/w4315/Fall2009/pca.pdf

http://www.vision.jhu.edu/teaching/vision08/Handouts/case_study_pca1.pdf