Principal Component Analysis Based on L1-Norm Maximization. Nojun Kwak, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Page 1: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis Based on L1-Norm Maximization

Nojun Kwak
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Page 2: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Outline

• Introduction
• Background Knowledge
• Problem Description
• Algorithms
• Experiments
• Conclusion

2

Page 3: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Introduction

• In data analysis problems, why do we need dimensionality reduction?

• Principal Component Analysis (PCA)
• PCA based on the L2-Norm is sensitive to the presence of outliers.

3

Page 4: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Introduction

• Some algorithms for this problem:
– L1-PCA
• Weighted median method
• Convex programming method
• Maximum likelihood estimation method

– R1-PCA

4

Page 5: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Background Knowledge

• L1-Norm, L2-Norm
• Principal Component Analysis (PCA)

5

Page 6: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Lp-Norm

• Consider an n-dimensional vector x = [x_1, x_2, ..., x_n].
• Define the p-Norm:

||x||_p = ( \sum_{i=1}^{n} |x_i|^p )^{1/p}

• L1-Norm: ||x||_1 = \sum_{i=1}^{n} |x_i|

• L2-Norm: ||x||_2 = ( \sum_{i=1}^{n} |x_i|^2 )^{1/2}

6

Page 7: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Lp-Norm

• For example, x = [1, 2, 3]:

name      symbol    value      approximation
L1-Norm   ||x||_1   6          6.000
L2-Norm   ||x||_2   \sqrt{14}  3.742
L3-Norm   ||x||_3   36^{1/3}   3.302
L4-Norm   ||x||_4   98^{1/4}   3.146
L∞-Norm   ||x||_∞   3          3.000

• Special case: ||x||_∞ = \max_i |x_i|

7
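A quick numerical check of the table above, as a minimal sketch assuming NumPy (the helper name lp_norm is ours, not from the paper):

```python
import numpy as np

def lp_norm(x, p):
    """Lp-norm of a vector x: (sum_i |x_i|^p)^(1/p)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
for p in (1, 2, 3, 4):
    print(f"L{p}-Norm of {x}: {lp_norm(x, p):.3f}")   # 6.000, 3.742, 3.302, 3.146
print(f"L-inf Norm of {x}: {np.max(np.abs(x)):.3f}")  # special case: max_i |x_i| = 3.000
```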

Page 8: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis

• Principal component analysis (PCA) is a technique to seek projections that best preserve the data in a least-squares sense.

• The projections constitute a low-dimensional linear subspace.

8

Page 9: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis

• The projection vectors w_1, ..., w_m are the eigenvectors of the scatter matrix S_x having the m largest eigenvalues; a minimal computational sketch follows.

Scatter matrix: S_x = \sum_{i=1}^{n} ( x_i - \bar{x} )( x_i - \bar{x} )^T, where \bar{x} is the sample mean.

9
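For reference, a minimal sketch of standard L2-PCA via the scatter matrix (assuming NumPy; function and variable names are ours, not from the paper):

```python
import numpy as np

def l2_pca(X, m_dim):
    """L2-PCA: the m_dim eigenvectors of the scatter matrix with the largest eigenvalues.

    X is a (d, n) array holding n samples of dimension d as columns.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                               # center the data
    S = Xc @ Xc.T                               # scatter matrix: sum_i (x_i - mean)(x_i - mean)^T
    eigvals, eigvecs = np.linalg.eigh(S)        # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]           # sort eigenvalues in decreasing order
    W = eigvecs[:, order[:m_dim]]               # projection matrix: top-m eigenvectors as columns
    return W, eigvals[order]
```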

Page 10: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis

• The rotational invariance property is a fundamental property of Euclidean space with the L2-Norm.

• So, PCA has the rotational invariance property.

10

Page 11: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Problem Description

• Traditional PCA is sensitive to the presence of outliers.
• The effect of outliers with a large norm is exaggerated by the use of the L2-Norm.
• Is there another method?

11

Page 12: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Problem Description

• If we use the L1-Norm instead of the L2-Norm, the error function to minimize becomes

E(W, V) = ||X - WV||_1 = \sum_{i=1}^{n} \sum_{j=1}^{d} | x_{ji} - \sum_{k=1}^{m} w_{jk} v_{ki} |

where X = [x_1, ..., x_n] ∈ R^{d×n} is the dataset.

12

W ∈ R^{d×m} is the projection matrix. V ∈ R^{m×n} is the coefficient matrix.

Page 13: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Problem Description

• However, it is very hard to obtain the exact solution of this problem.

• To resolve this, Ding et al. proposed the R1-Norm and an approximate solution.

13

We call it R1-PCA.

Page 14: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Problem Description

• The solution of R1-PCA depends on the dimension m of the subspace being found.

• The optimal solution when m = m_1 is not necessarily a subspace of the optimal solution when m = m_2 > m_1.

• The proposed method: PCA-L1

14

Page 15: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• We consider the following problem instead:

W* = \arg\max_{W} ||W^T X||_1 = \arg\max_{W} \sum_{i=1}^{n} \sum_{k=1}^{m} | w_k^T x_i |,  subject to  W^T W = I_m

• The maximization is done in the projected feature space.

The constraint W^T W = I_m ensures the orthonormality of the projection matrix.

15

Page 16: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• However, it is difficult to find a global solution of this problem for m > 1.

• The optimal i-th projection vector varies with the total number of extracted features m, as in R1-PCA.

• How to solve it?

16

Page 17: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• We simplify it into a series of m = 1 problems using a greedy search method.

• Then, if we set m = 1, the problem becomes:

w* = \arg\max_{||w||_2 = 1} ||w^T X||_1 = \arg\max_{||w||_2 = 1} \sum_{i=1}^{n} | w^T x_i |

17

Although the successive greedy solutions may differ from the optimal solution, they are expected to provide a good approximation.

Page 18: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• The optimization is still difficult because it contains the absolute value operation, which is nonlinear.

18

Page 19: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

19
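The algorithm box on this slide did not survive extraction. As described in the paper, the single-vector PCA-L1 procedure consists of (1) initialization of w(0) with unit norm, (2) a polarity check p_i = sign(w(t)^T x_i), (3) a flipping-and-maximization step w(t+1) ∝ \sum_i p_i(t) x_i, and (4) a convergence check, with a small perturbation (Step 4b) when some w^T x_i = 0. A minimal Python sketch under these assumptions (NumPy assumed; names are ours):

```python
import numpy as np

def pca_l1_single(X, w0=None, max_iter=1000, rng=None):
    """Find one projection vector w maximizing sum_i |w^T x_i| (PCA-L1 with m = 1).

    X : (d, n) array of n mean-subtracted samples as columns.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, n = X.shape
    w = X[:, 0].copy() if w0 is None else np.asarray(w0, dtype=float)
    w = w / np.linalg.norm(w)                       # Step 1: initialization, ||w(0)|| = 1
    for _ in range(max_iter):
        p = np.where(w @ X < 0, -1.0, 1.0)          # Step 2: polarity check, p_i = sign(w^T x_i)
        w_new = X @ p                               # Step 3: flipping and maximization
        w_new = w_new / np.linalg.norm(w_new)
        if np.allclose(w_new, w):                   # Step 4a: no change -> possibly converged
            if np.any(np.isclose(w_new @ X, 0.0)):  # Step 4b: some w^T x_i = 0 -> perturb, retry
                w = w_new + 1e-6 * rng.standard_normal(d)
                w = w / np.linalg.norm(w)
                continue
            return w_new                            # Step 4c: local maximum found
        w = w_new
    return w
```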

Page 20: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• However, does the PCA-L1 procedure find a local maximum point w*?

• We should prove it.

20

Page 21: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Theorem

• Theorem: With the PCA-L1 procedure, the vector w(t) converges to w*, which is a local maximum point of \sum_{i=1}^{n} | w^T x_i |.

• The proof includes two parts:
– \sum_{i=1}^{n} | w(t)^T x_i | is a non-decreasing function of t.
– The objective function has a local maximum value at w*.

21

Page 22: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Proof

• \sum_{i=1}^{n} | w(t)^T x_i | is a non-decreasing function of t.

{ p_i(t) }_{i=1}^{n} is the set of optimal polarities corresponding to w(t), i.e., p_i(t) = 1 if w(t)^T x_i ≥ 0 and p_i(t) = -1 otherwise. For all i, p_i(t) w(t)^T x_i ≥ 0, and

\sum_{i} | w(t+1)^T x_i | ≥ w(t+1)^T ( \sum_{i} p_i(t) x_i ) ≥ w(t)^T ( \sum_{i} p_i(t) x_i ) = \sum_{i} | w(t)^T x_i |

22

Page 23: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Proof

• The second inequality holds because w(t+1) and \sum_{i} p_i(t) x_i are parallel: among unit vectors, the inner product of two vectors is maximized when they are parallel.

23

Page 24: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Proof

• So, the objective function is non-decreasing, and there are a finite number of data points (hence a finite number of possible polarity patterns).

⇒ The PCA-L1 procedure converges to a projection vector w*.

24

Page 25: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Proof

• The objective function has a local maximum value at w*.

• Because w(t) converges to w* by the PCA-L1 procedure, p_i w*^T x_i ≥ 0 for all i.

• By Step 4b, w*^T x_i ≠ 0 for all i.

25

Page 26: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Proof

• There exists a small neighborhood N(w*) of w*, such that if w ∈ N(w*), then p_i w^T x_i ≥ 0 for all i.

• Then, since w* is parallel to \sum_{i} p_i x_i, the inequality \sum_{i} | w*^T x_i | ≥ \sum_{i} | w^T x_i | holds for all w ∈ N(w*).

⇒ w* is a local maximum point.

26

Page 27: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• So, the PCA-L1 procedure finds a local maximum point w*.

• Because w* is a linear combination of the data points x_i, i.e., w* ∝ \sum_{i} p_i x_i, it is invariant to rotations.

Under a rotational transformation R : X → RX, we have W → RW (a small numerical check is sketched below).

27
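A small numerical check of this rotational invariance, assuming the pca_l1_single sketch given earlier: rotating the data rotates the learned direction accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 100))                  # toy 2-D data, 100 samples
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # a rotation matrix
w = pca_l1_single(X, w0=X[:, 0])
w_rot = pca_l1_single(R @ X, w0=R @ X[:, 0])       # same procedure on rotated data
print(np.allclose(w_rot, R @ w))                   # expected: True, every step commutes with R
```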

Page 28: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• Computational complexity: O(n d n_it).
• n_it is the number of iterations until convergence.
• n_it does not depend on the dimension d of the input space.

28

Page 29: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• The PCA-L1 procedure only finds a local maximum solution. It may not be the global solution.

• We can set the initial vector w(0) appropriately, or run PCA-L1 with several different initial vectors w(0) and keep the best solution.

29

Page 30: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• Extracting multiple features (m > 1): following the idea of the original PCA, the previously found directions are removed from the data (deflation), and the PCA-L1 procedure is run once for each feature dimension, as sketched below.

30
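A sketch of this greedy extraction, assuming the pca_l1_single routine from the earlier sketch: each new direction is found on data from which the contributions of the previous directions have been removed.

```python
import numpy as np

def pca_l1(X, m_dim):
    """Greedy PCA-L1: extract m_dim projection vectors one at a time."""
    d, n = X.shape
    Xj = X - X.mean(axis=1, keepdims=True)     # work on mean-subtracted data
    W = np.zeros((d, m_dim))
    for j in range(m_dim):
        w = pca_l1_single(Xj)                  # one PCA-L1 run per feature dimension
        W[:, j] = w
        Xj = Xj - np.outer(w, w @ Xj)          # deflation: x_i <- x_i - w (w^T x_i)
    return W                                   # columns are (orthonormal) projection vectors
```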

Page 31: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• How to guarantee the orthonormality of the projection vectors?

• We should show that the j-th projection vector w_j is orthogonal to w_1, ..., w_{j-1}.

31

Page 32: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Proof

• The j-th projection vector w_j is a linear combination of the deflated samples x_i^j.

It lies in the subspace spanned by { x_i^j }. • Then, we consider w_{j-1}^T x_i^j:

w_{j-1}^T x_i^j = w_{j-1}^T ( x_i^{j-1} - w_{j-1} w_{j-1}^T x_i^{j-1} ) = w_{j-1}^T x_i^{j-1} - ( w_{j-1}^T w_{j-1} )( w_{j-1}^T x_i^{j-1} ) = 0

32

From the greedy search algorithm: x_i^j = x_i^{j-1} - w_{j-1} ( w_{j-1}^T x_i^{j-1} ).

w_{j-1} is a unit vector ( w_{j-1}^T w_{j-1} = 1 ).

Page 33: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Proof

• Because w_{j-1}^T x_i^j = 0 for all i, w_{j-1} is orthogonal to the subspace spanned by { x_i^j }.

⇒ w_{j-1} is orthogonal to w_j.

33

The orthonormality of the projection vectors is guaranteed.

Page 34: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• Even if the greedy search algorithm does not provide the optimal solution, it provides a set of good projections that maximize L1 dispersion.

34

Page 35: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• For data analysis, we need to decide how much of the data's variation should be captured, i.e., how many features m to extract.

• In PCA, we can compute the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d of the scatter matrix.

35

The i-th eigenvalue is equivalent to the variance of the i-th feature.

We can compute the ratio of the captured variance to the total variance: ( \sum_{i=1}^{m} λ_i ) / ( \sum_{i=1}^{d} λ_i ).

If this ratio exceeds, e.g., 95% of the total variance, m is set to that number of eigenvalues.

Page 36: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Algorithms

• In PCA-L1, once w_j is obtained, we can compute the variance of the j-th feature: s_j^2 = (1/n) \sum_{i=1}^{n} ( w_j^T x_i )^2 (for mean-subtracted data).

• The sum of the variances of the m extracted features: \sum_{j=1}^{m} s_j^2.

• The total variance: (1/n) \sum_{i=1}^{n} ||x_i||^2.

36

We can set the appropriate number of extracted features as in the original PCA, e.g., as sketched below.
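A sketch of this selection rule, assuming mean-subtracted data X (d x n) and k extracted PCA-L1 directions W; the 95% threshold follows the PCA convention above, and the helper name is ours:

```python
import numpy as np

def choose_num_features(X, W, threshold=0.95):
    """Smallest m whose cumulative feature variance exceeds `threshold` of the total variance."""
    n = X.shape[1]
    feat_var = np.var(W.T @ X, axis=1)          # variance of each extracted feature w_j^T x
    total_var = np.sum(X ** 2) / n              # total variance of mean-subtracted data
    ratio = np.cumsum(feat_var) / total_var     # captured-variance ratio for m = 1, 2, ...
    m = int(np.searchsorted(ratio, threshold)) + 1
    return min(m, W.shape[1]), ratio
```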

Page 37: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Experiments

• In the experiments, we apply the PCA-L1 algorithm and compare it with R1-PCA and the original PCA (L2-PCA).

• Three experiments:
– A toy problem with an outlier
– UCI data sets
– Face reconstruction

37

Page 38: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

A Toy Problem with an Outlier

• Consider the data points in a 2D space shown in the figure; one of them is an outlier.

• If we discard the outlier, the projection vector should follow the direction of the remaining data points.

38

Page 39: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

A Toy Problem with an Outlier

• The projection vectors found by each method (figure; the outlier is marked).

39

Page 40: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

A Toy Problem with an Outlier

• The residual error of each data point (figure; the outlier is marked).

Average residual error:

PCA-L1   L2-PCA   R1-PCA
1.200    1.401    1.206

L2-PCA is much influenced by the outlier.

40

Page 41: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

UCI Data Sets

• Data sets from the UCI machine learning repository.
• Compare the classification performances.
• A 1-NN classifier was used, with 10-fold cross-validation to obtain the average classification rate; see the sketch after this slide.
• For PCA-L1, we set the initial projection vector w(0) as discussed earlier.

41
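A sketch of this evaluation protocol (not the paper's code), assuming scikit-learn and a projection matrix W already computed by one of the methods; for simplicity the projection is computed once on all data, whereas a stricter protocol would recompute it inside each fold.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def evaluate(X, y, W, m):
    """X: (d, n) mean-subtracted data, y: (n,) labels, W: (d, k) projection vectors."""
    features = (W[:, :m].T @ X).T                        # (n, m) projected samples
    clf = KNeighborsClassifier(n_neighbors=1)            # 1-NN classifier
    scores = cross_val_score(clf, features, y, cv=10)    # 10-fold cross-validation
    return scores.mean()                                 # average correct classification rate
```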

Page 42: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

UCI Data Sets

• The data sets:

42

Page 43: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

UCI Data Sets

• The average correct classification rates:

43

Page 44: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

UCI Data Sets

• The average correct classification rates:

44

Page 45: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

UCI Data Sets

• The average correct classification rates:

45

In many cases, PCA-L1 outperformed L2-PCA and R1-PCA when the number of extracted features was small.

Page 46: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

UCI Data Sets

• Average classification rate on the UCI data sets:

46

PCA-L1 outperformed the other methods by 1% on average.

Page 47: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

UCI Data Sets

• Computation cost:

47

Page 48: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Face Reconstruction

• The Yale face database:
– 15 individuals
– 11 face images per person

• Among the 165 images, 20% were selected randomly and occluded with a noise block.

48

Page 49: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Face Reconstruction

• For these image sets, we applied:
– L2-PCA (eigenfaces)
– R1-PCA
– PCA-L1

• Then, we used the extracted features to reconstruct the images, as sketched below.

49
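A sketch of the reconstruction step, under the usual linear-projection assumption: each image is projected onto the first m directions and mapped back, x_hat = mean + W_m W_m^T (x - mean). Names are ours, not from the paper.

```python
import numpy as np

def reconstruct(X, W, m_feats):
    """Reconstruct samples from their first m_feats extracted features.

    X: (d, n) images as columns; W: (d, k) orthonormal projection vectors.
    """
    mean = X.mean(axis=1, keepdims=True)
    Wm = W[:, :m_feats]
    return mean + Wm @ (Wm.T @ (X - mean))     # x_hat = mean + W_m W_m^T (x - mean)
```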

Page 50: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Face Reconstruction

50

• Experimental results:

Page 51: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Face Reconstruction

• The average reconstruction error, measured between each original (unoccluded) image and its reconstructed image, is shown in the figure.

51

From 10 to 20 features, the difference became apparent and PCA-L1 outperformed the other methods.

Page 52: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Face Reconstruction

• We added 30 dummy images consisting of random black and white dots to the original 165 Yale images.

• We applied:
– L2-PCA (eigenfaces)
– R1-PCA
– PCA-L1

• We reconstructed the images with the extracted features.

52

Page 53: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Face Reconstruction

• Experimental results:

53

Page 54: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Face Reconstruction

• The average reconstruction error:

54

From 6 to 36 features, the error of L2-PCA stays constant: the dummy images seriously affect its projection vectors.

From 14 to 36 features, the error of R1-PCA increases: the dummy images seriously affect its projection vectors.

Page 55: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Conclusion

• PCA-L1 was proven to find a local maximum point.

• The computational complexity is proportional to:
– the number of samples
– the dimension of the input space
– the number of iterations

• The method is usually faster than the other methods and robust to outliers.

55

Page 56: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis

• Given a dataset of l samples: D = { x_i ∈ R^d }_{i=1}^{l}.
• We represent D by projecting the data onto a line running through the sample mean m, i.e., each sample is represented as x = m + a e, where e is a unit direction vector (||e|| = 1) and a is a scalar coefficient.

56

Page 57: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis

• Then, writing each sample as x_k ≈ m + a_k e, we minimize the squared-error criterion

J(a_1, ..., a_l, e) = \sum_{k=1}^{l} || ( m + a_k e ) - x_k ||^2,

which, for a fixed direction e, is minimized by a_k = e^T ( x_k - m ).

57

Page 58: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis

• To look for the best direction e, substitute a_k = e^T ( x_k - m ) back into the criterion:

J_1(e) = - e^T S e + \sum_{k=1}^{l} || x_k - m ||^2,  where  S = \sum_{k=1}^{l} ( x_k - m )( x_k - m )^T  is the scatter matrix.

58

Page 59: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis

• We want to minimize J_1(e):

Maximize e^T S e, subject to ||e|| = 1. • We use Lagrange multipliers:

L(e, λ) = e^T S e - λ ( e^T e - 1 ),   ∂L/∂e = 2 S e - 2 λ e = 0   ⇒   S e = λ e

59

Page 60: Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008

Principal Component Analysis

• Since e^T S e = λ, minimizing J_1(e) can be achieved by choosing e as the eigenvector of S with the largest eigenvalue.

• Similarly, we can extend the 1-dimensional projection to an m-dimensional projection by taking the eigenvectors with the m largest eigenvalues.

60