Dimensionality Reduction: Principal Component Analysis
Lingqiao Liu

Dimensionality Reduction I: Principal Component Analysis (javen/talk/L6..., 2014-09-02)


Page 1:

Dimensionality Reduction:

Principal Component Analysis

Lingqiao Liu

Page 2:

What is dimensionality reduction?

[Diagram: an n-sample, p-dimensional data matrix A is mapped to an n-sample, k-dimensional matrix X, with k < p]

Page 3:

Dimensionality Reduction: why?

Extract underlying factors

Page 4:

Dimensionality Reduction: why?

Reduce data noise

◦ Face recognition

◦ Applied to image de-noising

Image courtesy of Charles-Alban Deledalle, Joseph Salmon, Arnak Dalalyan; BMVC 2011

Image denoising with patch-based PCA: local versus global

Page 5:

Dimensionality Reduction: why?

Reduce the number of model parameters

◦ Avoid over-fitting

◦ Reduce the computational load

Page 6:

Dimensionality Reduction: why?

Visualization

Page 7:

Dimensionality Reduction

General principle: Preserve “useful” information in low dimensional data

How to define “usefulness”?

◦ Many possible definitions

◦ An active research direction in machine learning

Taxonomy

◦ Supervised or unsupervised

◦ Linear or nonlinear

Commonly used methods:

◦ PCA, LDA (linear discriminant analysis), locally linear embedding, and more.

Page 8:

Outline

Theoretical Part

◦ PCA explained: two perspectives

◦ Mathematical basics

◦ PCA objective and solution

Application Example

◦ Eigenface

◦ Handle high feature dimensionality

Extensions:

◦ Kernel PCA

Page 9:

PCA explained: Two perspectives

Data correlation and information redundancy

Signal-to-noise ratio maximization

Page 10:

PCA explained: Data correlation and information redundancy

Page 11:

PCA explained: Data correlation and information redundancy

Dependency vs. correlation

◦ Dependence is a stronger criterion

◦ The two are equivalent when the data follows a Gaussian distribution

PCA only de-correlates the data

◦ One limitation of PCA

◦ ICA goes further, but it is more complicated

Page 12:

PCA explained: Data correlation and information redundancy

Geometric interpretation of correlation

Page 13:

PCA explained: Data correlation and information redundancy

Page 14:

PCA explained: Data correlation and information redundancy

Page 15:

PCA explained: Data correlation and information redundancy

Correlation can be removed by rotating the data points or the coordinate system

Page 16:

PCA explained: Signal-to-noise ratio maximization

Maximize the signal-to-noise ratio

[Diagram: a transform applied to the data matrix splits it into a signal part and a noise part]

Page 17:

PCA explained: Signal-to-noise ratio maximization

Keep one signal dimension, discard one noisy dimension

[Figure: a 2-D point cloud with a high-variance signal direction and a low-variance noise direction]

Page 18:

PCA explained: Signal-to-noise ratio maximization

[Figure: signal and noise directions of the data]

Page 19:

PCA explained

Target

◦ 1: Find a new coordinate system in which different dimensions have zero correlation

◦ 2: Find a new coordinate system whose top-k axes align with the largest variance

Method

◦ Rotate the data points or the coordinate system

Mathematically speaking…

◦ How do we rotate?

◦ How do we express our criterion?

Page 20:

Mathematical Basics

Mean, variance, covariance

Matrix norm, trace

Orthogonal matrix, basis

Eigen-decomposition

Page 21:

Mathematical Basics

(Sample) Mean

(Sample) Variance

(Sample) Covariance
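The slide's formulas were lost in extraction; the standard sample definitions, using the 1/(n-1) normalisation that the covariance matrix below also assumes, are:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```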

Page 22:

Mathematical Basics

Covariance Matrix
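The matrix itself did not survive extraction; for centred data X_c (one sample per column) the standard form is:

```latex
C = \frac{1}{n-1}\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^{\top}
  = \frac{1}{n-1} X_c X_c^{\top}
```

Entry C_jk is the covariance between dimensions j and k; the diagonal holds the per-dimension variances.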

Page 23:

Mathematical Basics

Frobenius norm

Trace
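Assuming the usual definitions (the slide's formulas were lost):

```latex
\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\operatorname{tr}(A^{\top} A)}, \qquad
\operatorname{tr}(A) = \sum_{i} a_{ii}
```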

Page 24:

Mathematical Basics

Symmetric Matrix

The covariance matrix is symmetric

Page 25:

Mathematical Basics

Orthogonal matrix

Rotation effect
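The defining property and its rotation effect, reconstructed in standard notation:

```latex
Q^{\top} Q = Q Q^{\top} = I
\quad\Rightarrow\quad
\|Q\mathbf{x}\| = \|\mathbf{x}\|
```

Multiplying by an orthogonal Q preserves lengths and angles, i.e. it rotates (or reflects) the data.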

Page 26:

Mathematical Basics

Relationship to the coordinate system

◦ A point = a linear combination of basis vectors

◦ The combination weights = its coordinates

Each row (or column) of Q = a basis vector

◦ The basis is not unique

◦ Relation to coordinate rotation: multiplying by Q produces the new coordinates

Page 27:

Mathematical Basics

[Figure: the point (2, 2) expressed in both the old and the new coordinate system]

Page 28:

Mathematical Basics

Eigenvalue and Eigenvector

Properties

◦ Multiple solutions

◦ Scaling invariant

◦ Relation to the rank of A
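The definition behind these properties, in standard notation:

```latex
A\mathbf{v} = \lambda\mathbf{v}, \quad \mathbf{v} \neq \mathbf{0};
\qquad
A(c\mathbf{v}) = \lambda(c\mathbf{v}) \ \text{for any } c \neq 0
```

A p-by-p matrix admits up to p eigenvalue/eigenvector pairs (multiple solutions), eigenvectors are defined only up to scale, and for a symmetric A the number of nonzero eigenvalues equals its rank.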

Page 29:

Mathematical Basics

Eigen-decomposition

If A is symmetric
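The decomposition itself was lost in extraction; for a symmetric matrix A it reads:

```latex
A = Q \Lambda Q^{\top}, \qquad Q^{\top} Q = I, \quad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p)
```

where the columns of Q are the eigenvectors of A and the diagonal of Λ holds the eigenvalues.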

Page 30:

PCA: solution

Target 1: de-correlation
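The derivation on these slides did not survive extraction; the standard de-correlation argument is: project the centred data onto the eigenvectors of the covariance matrix, and the projected dimensions become uncorrelated.

```latex
C = Q \Lambda Q^{\top}, \quad Y = Q^{\top} X_c
\;\Rightarrow\;
\operatorname{cov}(Y) = Q^{\top} C Q = \Lambda \ \text{(diagonal)}
```

Keeping only the top-k columns of Q gives the projection matrix P used in the algorithm.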

Page 31:

PCA: solution

Page 32:

PCA: solution

Variance of each dimension

Rank dimensions according to their corresponding eigenvalues

Page 33:

PCA: algorithm

1. Subtract the mean

2. Calculate the covariance matrix

3. Calculate the eigenvectors and eigenvalues of the covariance matrix

4. Rank the eigenvectors by their corresponding eigenvalues

5. Obtain P with its columns set to the top-k eigenvectors

Page 34:

PCA: MATLAB code
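The slide's MATLAB listing did not survive extraction; below is a minimal NumPy sketch of the same steps (function and variable names are my own):

```python
import numpy as np

def pca(X, k):
    """PCA of a (p, n) data matrix X, one sample per column.

    Returns P (p, k) with the top-k eigenvectors as columns,
    their eigenvalues, and the data mean.
    """
    mu = X.mean(axis=1, keepdims=True)        # 1. subtract the mean
    Xc = X - mu
    C = Xc @ Xc.T / (X.shape[1] - 1)          # 2. covariance matrix
    vals, vecs = np.linalg.eigh(C)            # 3. eigen-decomposition (C is symmetric)
    order = np.argsort(vals)[::-1]            # 4. rank by eigenvalue, largest first
    P = vecs[:, order[:k]]                    # 5. keep the top-k eigenvectors
    return P, vals[order[:k]], mu
```

Projecting the data into the k-dimensional subspace is then `Y = P.T @ (X - mu)`.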

Page 35:

PCA: reconstruction

Reconstruct x

Derive PCA by minimizing the reconstruction error
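In symbols (reconstructed; P holds the top-k eigenvectors and μ the data mean):

```latex
\hat{\mathbf{x}} = \boldsymbol{\mu} + P P^{\top} (\mathbf{x} - \boldsymbol{\mu}),
\qquad
\frac{1}{n-1} \sum_{i=1}^{n} \|\mathbf{x}_i - \hat{\mathbf{x}}_i\|^2 = \sum_{j=k+1}^{p} \lambda_j
```

So minimising the reconstruction error is the same as keeping the directions with the largest eigenvalues.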

Page 36:

PCA: reconstruction

Rayleigh quotient

Solution = PCA
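The quotient in question, in standard form:

```latex
\max_{\mathbf{w} \neq \mathbf{0}} \frac{\mathbf{w}^{\top} C \mathbf{w}}{\mathbf{w}^{\top} \mathbf{w}} = \lambda_{1}
```

The maximum is attained when w is the leading eigenvector of the covariance matrix C, which is exactly the first principal component.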

Page 37:

Application: Eigen-face method

Sirovich and Kirby (1987) showed that PCA could be used on a collection of face images to form a set of basis features.

Not limited to face recognition

Steps

◦ Treat each image as a high-dimensional feature vector

◦ Apply PCA

Page 38:

Application: Eigen-face method

Page 39:

Application: Reconstruction

Reconstructed from top-2 eigenvectors

Page 40:

Application: Reconstruction

Reconstructed from top-15 eigenvectors

Page 41:

Application: Reconstruction

Reconstructed from top-40 eigenvectors

Page 42:

Application: Eigen-face method

From large to small eigenvalues:

◦ Common patterns

◦ Discriminative patterns

◦ Noisy patterns

Page 43:

Eigen-face method: How to handle high dimensionality

For high-dimensional data, the covariance matrix can be too large

The number of samples is relatively small

Page 44:

Eigen-face method: How to handle high dimensionality

1. Centralize the data

2. Calculate the kernel (Gram) matrix

3. Perform eigen-decomposition on the kernel matrix and obtain its eigenvectors

4. Obtain the eigenvectors of the covariance matrix from them

Question: How many eigenvectors can you obtain in this way?
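A NumPy sketch of the Gram-matrix trick above (names are my own): when p is much larger than n, eigen-decompose the small n-by-n matrix X_cᵀX_c instead of the p-by-p covariance; if K u = λu, then v = X_c u is an eigenvector of X_c X_cᵀ with the same eigenvalue.

```python
import numpy as np

def pca_gram(X, k):
    """PCA via the n x n Gram matrix, for p >> n.

    X is (p, n), one sample per column. If (Xc.T @ Xc) u = lam * u,
    then v = Xc @ u satisfies (Xc @ Xc.T) v = lam * v, so v is an
    eigenvector of the (unnormalised) covariance matrix.
    """
    n = X.shape[1]
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                               # 1. centralize the data
    K = Xc.T @ Xc                             # 2. n x n kernel (Gram) matrix
    vals, U = np.linalg.eigh(K)               # 3. eigen-decompose the Gram matrix
    order = np.argsort(vals)[::-1][:k]
    V = Xc @ U[:, order]                      # 4. map back: v = Xc @ u
    V /= np.linalg.norm(V, axis=0)            # unit-normalise each eigenvector
    return V, vals[order] / (n - 1), mu       # eigenvalues of the covariance matrix
```

Since K is n-by-n, at most n eigenvectors can be recovered this way, and at most n - 1 of them have nonzero eigenvalue, because centring removes one degree of freedom.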

Page 45:

Extension: Kernel PCA

Kernel method

◦ You do not have access to the feature mapping φ(x) itself, but you know the kernel values k(x_i, x_j) = φ(x_i)ᵀφ(x_j)

Page 46:

Extension: Kernel PCA

Use the same approach as for high-dimensional data
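Following the slide, kernel PCA reuses the Gram-matrix machinery with K_ij = k(x_i, x_j). A minimal sketch assuming a precomputed kernel matrix (function names are my own); the extra step is centring in feature space, done entirely through K:

```python
import numpy as np

def kernel_pca(K, k):
    """Kernel PCA from a precomputed n x n kernel matrix K.

    Only the kernel values k(x_i, x_j) are needed, never the
    (possibly very high-dimensional) feature vectors themselves.
    Returns the n samples projected onto the top-k components.
    """
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Centre the data in feature space without ever forming it:
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, U = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:k]
    # Scale so each component has unit length in feature space:
    alphas = U[:, order] / np.sqrt(np.maximum(vals[order], 1e-12))
    return Kc @ alphas                        # (n, k) projections
```

For example, an RBF kernel would be `K = np.exp(-sqdist / (2 * sigma**2))` for a matrix of pairwise squared distances; with a plain linear kernel this reduces to ordinary PCA projections.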