PCA and ICA


    Principal Components Analysis & Independent Components Analysis

Aaron Clarke
SN: 206071237

    Prof. Robert Cribbie

    Statistics 6130


    Introduction:

    A common problem in information theory is that of representing a message space

    with the smallest possible set of message components (Cottrell et al., 1987; Oja, 1983).

    That is, to find a basis set of message components that could be used to form every

    message, given a particular set of possible messages. For example, the Morse code forms

a basis set for the set of possible messages that can be transmitted by Morse code. If one wanted to form the message SOS, then one would simply combine the element for S ("...") with the element for O ("---") such that the full message would be "... --- ...". If the only message that anyone ever sent by Morse code was "... --- ...", then "..." and "---" would be the basis set for the set of all messages sent by Morse code. The question then

    arises: is there a basis set for the set of images that the human visual system was likely

    to encounter in its evolutionary environment? If so then one would expect that the

    human visual system would be adapted to optimally perceive this basis set and would use

    it to reconstruct observed image information. It is already known that Fourier analysis

    can be used to decompose any given image into a set of spatial frequency components of

varying phase, orientation and amplitude; however, the elements of the set of all possible

    spatial frequency components are not equally distributed in natural images (Field, 1987).

    Thus one would expect that there might exist a smaller basis set of spatial frequency

    components that could be used to compose the set of natural images that the human

    visual system is exposed to. Evidence supporting this theory can be found in the

    neurophysiological literature where it has been shown that single neurons in the visual

    cortex respond to a finite set of Gaussian enveloped Fourier components (called Gabor

    patches) of particular spatial frequencies and orientations (Hubel & Wiesel, 1968). Thus,


    it seems that the visual system somehow de-correlates the incoming visual information to

    produce a useful basis set of image components with which to filter the incoming images.

    A possible method for computing this basis set lies in principal components analysis

    (PCA).

    PCA

    The idea behind principal components analysis is that a given message set, or a

given data set, is linearly transformed into a lower-dimensional data set with the property that the transformed variables are mutually uncorrelated (Gill, 2002). PCA

    generates a basis set for a set of messages by rotating the message data in the sample

    space of observed messages (Gill, 2002). For example, in the figure below, the original

    data are presented on the left and the PCA rotated data are presented on the right.

    Figure 1: Left: original data. Right: PCA rotated data.

    Note that the shape of the distribution is preserved, but the regression line through the

    PCA rotated data set is now aligned with the x-axis, thereby de-correlating the data. This


is achieved mathematically by representing each message i as a column vector Ui and by placing each column vector in a matrix X.

Ui = [ Ui1
       Ui2
        :
       Uim ]

and

X = [ U11 U21 U31 ... Un1
      U12 U22 U32 ... Un2
       :   :   :        :
      U1m U2m U3m ... Unm ]

    Each row of a message column is treated as a separate variable, and each variable defines

a separate axis (Gill, 2002). The variance-covariance matrix R of the message matrix X is computed, and then the eigenvectors E and the eigenvalues of that variance-covariance matrix are computed (Gill, 2002). The eigenvalues constitute the (diagonal) variance-covariance matrix of the data under the rotation defined by the principal components (Gill, 2002). Thus each eigenvalue is the variance of one principal component, with the first principal component accounting for the largest variance (Gill, 2002). The eigenvector matrix E provides the transformation of the data points

    from the original message matrix X to the PCA metric Y through simple matrix

    multiplication (Gill, 2002).

    Y = XE

    Here, the principal component scores matrix equals the original matrix multiplied by the

    eigenvector matrix.
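As a minimal sketch of this computation (a generic illustration rather than the Appendix A code, which places each image in a column of X), the rotation Y = XE can be written in Matlab as follows, assuming a data matrix X whose rows are observations and whose columns are centred variables:

% Minimal PCA sketch: rows of X are observations, columns are variables.
X = randn(100,5)*randn(5,5);             % stand-in correlated data
X = X - repmat(mean(X),size(X,1),1);     % centre each column

R = cov(X);                              % variance-covariance matrix
[E,L] = eig(R);                          % eigenvectors E, eigenvalues on diag(L)

% Order the components by decreasing variance (eigenvalue)
[vals,order] = sort(diag(L),'descend');
E = E(:,order);

Y = X*E;                                 % principal component scores
% cov(Y) is now (approximately) diagonal, with diag(cov(Y)) equal to vals

The Appendix A code instead uses Matlab's pcacov function, which returns the eigenvectors already sorted by decreasing eigenvalue, along with the percentage of variance explained by each component.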

    For a practical example of how PCA can be used in images, suppose that one

    were given the set of faces illustrated in figure 2.


    Figure 2: Original set of faces.

    If this set represented the full set of faces that a visual system were exposed to, then one

    could compute the set of basis faces required to fully represent those faces. This basis set

    is shown in figure 3.

    Figure 3: Complete basis set (i.e. the principal components) for the set of faces given in

    figure 2. Going from left to right and from top to bottom the variances accounted for by

    each face are: 72.9242%, 6.8400%, 4.7808%, 3.4802%, 2.9461%, 2.1323%, 2.0361%,

1.8905%, 1.6516% and 1.3180%.

One can also see what percentage of the total variance in the face set is accounted for by each individual basis face. Here it can be seen that the first basis face accounts for most of the variance in the face set (~73%). Subjectively, this basis face looks the most face-like of any in the set. The next face accounts for a

    much smaller percentage of the total variance in the face set, as do all of the subsequent


faces. If one were to arbitrarily set a cut-off level of 2% for the variance explained by a basis face, then one could roughly represent all of the faces in the original face set using only the first seven basis faces, as can be seen in figure 4.

    Figure 4: Formulation of the first face using the basis set. The top left-hand face was

    composed using only the first principal component. The face to the right of it was

    composed using the first two principal components. This pattern continues from left to

right and from top to bottom. The last face uses all ten principal components and is exactly the same as the original face. Note that an excellent approximation is achieved

    using 7 or more of the 10 basis faces.

    The Matlab code that I wrote to compute the basis faces can be found in Appendix A.

One benefit of using PCA, then, is that it allows information to be compressed without the loss of the subjective qualities of the information. Specifically, if one wanted to transmit the full set of faces given in figure 2, then one would need only to transmit the first seven basis faces from figure 3 together with the amplitudes by which each basis face must be multiplied to regenerate each original face. In this case, the use of

    PCA results in roughly a 30% reduction in the amount of information that would need to

    be sent. This procedure has been extended by other researchers to massive sets of natural

    images where it was found that the components of the natural images tended to resemble

    the Gaussian enveloped Fourier components noted by Hubel & Wiesel (1968) to be the

    optimal stimuli for exciting neurons in the visual cortex (Olshausen & Field, 1997).
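Returning to the reconstruction and compression step described above, a minimal sketch (continuing the variables E, Y and vals from the previous sketch, and using the seven-component cut-off from the text) might look like this:

% Reconstruct the (centred) data from the first k principal components.
% Because E is orthonormal, inv(E) equals E', so E(:,1:k)' plays the role
% of Einv(1:i,:) in the Appendix A code.
k = 7;
Xhat = Y(:,1:k)*E(:,1:k)';

% Fraction of the total variance retained by the first k components
retained = sum(vals(1:k))/sum(vals);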


    ICA

Assume that one has the following neural network:

[Diagram: inputs X1, X2, ..., Xm feed into a neural model, which produces outputs Y1, Y2, ..., Ym.]

where the column vector X represents the sensory inputs from an external stimulus U, and:

U = [ U1
      U2
      :
      Um ]

    If the external stimulus (U) is subject to mixing where A is a mixing matrix of size m-by-

    m, then the sensory information received by the brain (X) is given as:

    X = AU

(Haykin, 1999). In this case, in order for the brain to pick out the original signal vector

    U, it is necessary to develop a neural model that unmixes the mixing done by A, and

    transforms the inputs X into the output Y such that the elements of Y are as statistically

    independent as is possible (Haykin, 1999). In order to do this, it is necessary to compute

    an unmixing matrix W that reverses the effects of A, such that

    Y = WX

and

Y = [ Y1
      Y2
      :
      Ym ]

    (Haykin, 1999). The unmixing matrix W in this case would be the m-by-m matrix that

    when multiplying X makes the elements of the resultant product Y as statistically

    independent as is possible. Thus, the elements of Y would be the independent

    components present in the original signal U, although rescaled and permuted (Haykin,

    1999).
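As an illustrative sketch of this signal model (a toy simulation rather than the face example, and using the ideal unmixer W = inv(A) instead of a learned one):

% Toy illustration of the ICA signal model X = A*U and Y = W*X.
t = linspace(0,1,1000);
U = [sin(2*pi*5*t); sign(sin(2*pi*3*t))];   % two independent source signals
A = rand(2);                                % mixing matrix (unknown in practice)
X = A*U;                                    % observed, mixed signals

W = inv(A);             % ideal unmixing matrix; a real ICA algorithm must
Y = W*X;                % estimate W from X alone, recovering the sources
                        % only up to rescaling and permutation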

    In order to make the elements of Y as statistically independent as is possible, it is

    necessary to minimize the mutual information conveyed by any pair of elements in Y

(Haykin, 1999). Mutual information measures the reduction in uncertainty about Yi that results from observing Yj (Haykin, 1999). The mutual information I(Yi;Yj) between Yi and Yj,

    then, is the entropy of Yi minus the conditional entropy of Yi given Yj:

I(Yi;Yj) = H(Yi) − H(Yi|Yj)

(Haykin, 1999). In particular, if Yi and Yj are independent, then H(Yi|Yj) = H(Yi) and the mutual information between them is zero. This situation is represented in the following Venn diagram.

[Venn diagram: two overlapping circles, H(Yi) and H(Yj), drawn inside the joint entropy H(Yi,Yj); the non-overlapping regions are the conditional entropies H(Yi|Yj) and H(Yj|Yi), and the overlap is the mutual information I(Yi;Yj).]

In order for all of the elements of Y to be statistically independent, the Kullback-Leibler divergence between the joint probability density function of Y and the product of the marginal probability density functions of its elements Yi (where i goes from 1 to m) must be minimized (Haykin, 1999).
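Written out explicitly (following Haykin's treatment), this criterion is the divergence

D( f_Y || Π_i f_Yi ) = ∫ f_Y(y, W) log[ f_Y(y, W) / ( Π_i=1..m f_Yi(yi, W) ) ] dy

between the joint probability density of Y and the product of the marginal densities of its elements; the divergence is zero exactly when the elements of Y are statistically independent. Minimizing this quantity with respect to W is the objective that the learning rule in Appendix B approximates.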

    Some Matlab code that I wrote to accomplish this objective using the set of faces

    in figure 2 can be found in Appendix B. In the code that I wrote, it is implicitly assumed

    that the unmixing matrix W converges by 1200 iterations. This may not necessarily be

the case; however, I have found it to work in some preliminary tests with the face stimuli.

In order to obtain a quantitative index of the demixer's performance, one may calculate a

global rejection index ρ as:

ρ = Σ_i=1..m [ Σ_j=1..m ( |p_ij| / max_k |p_ik| ) − 1 ] + Σ_j=1..m [ Σ_i=1..m ( |p_ij| / max_k |p_kj| ) − 1 ]

where P = {p_ij} = WA (Haykin, 1999). The performance index ρ is a measure of the

diagonality of matrix P (Haykin, 1999). If the matrix P is perfectly diagonal, ρ = 0

    (Haykin, 1999). For a matrix P whose elements are not concentrated on the principal

    diagonal, the performance index will be high (Haykin, 1999). A good performance

    index is around 0.05 (Haykin, 1999). This index could be used in the iterative code that I

    wrote for computing W. Instead of iterating the loop for 1200 cycles, one could instead

    use a while loop, evaluating the performance index at each iteration of the loop, and

    exiting only when the performance index reached a certain threshold level. The

calculation of this index, however, is computationally intensive, and so I didn't include it in my code, in the hope that any time I lose by iterating the loop calculating W past the threshold performance index, I'll make up in the speed of my loop.
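For illustration only, a hedged sketch of that alternative stopping rule is given below. It assumes that the variables W, X, Y, phi and eta have been initialized as in Appendix B, and that the true mixing matrix A is known (as it would be in a simulation where the sources are mixed deliberately; no such A is available for the face set). The cap maxIter is an arbitrary safeguard and not part of the original code.

% Hypothetical while-loop version of the main ICA loop that stops once the
% performance index of P = W*A drops below Haykin's rule-of-thumb of 0.05.
rho = Inf;
iter = 0;
maxIter = 5000;                            % arbitrary safety cap
while rho > 0.05 && iter < maxIter
    W = W + eta*(eye(size(W)) - phi*Y')*W;
    Y = W*X;
    phi = 1/2*Y.^5 + 2/3*Y.^7 + 15/2*Y.^9 + 2/15*Y.^11 + 112/3*Y.^13 + ...
        128*Y.^15 - 512/3*Y.^17;

    % Performance (global rejection) index: sums of row-wise and column-wise
    % ratios of each |p_ij| to the largest element in its row or column.
    P = abs(W*A);
    rowMax = repmat(max(P,[],2),1,size(P,2));
    colMax = repmat(max(P,[],1),size(P,1),1);
    rho = sum(sum(P./rowMax,2) - 1) + sum(sum(P./colMax,1) - 1);

    iter = iter + 1;
end

Whether the 0.05 threshold is ever reached depends on the learning rate and on the data, which is one more reason to keep the iteration cap.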

    In the end, ICA may be viewed as an extension of PCA. Whereas PCA can only

    impose independence up to the second order while constraining the direction vectors to


    be orthogonal, ICA imposes statistical independence on the individual components of the

    output vector Y and has no orthogonality constraint.

    An example of the application of ICA to images can be found in figure 5.

Figure 5: Independent components derived from the original image set given in figure 2. Note the marked differences between the independent components of the image set presented here and the principal components of the image set depicted in figure 3. Also note that since the independent components are maximally statistically independent, they all account for an equal percentage of the variance in the image set.

    Here the same initial set of faces from figure 2 that was used in the PCA demonstration is

    used again. Each image was vectorized by taking each row of the image and

concatenating it with the previous row to produce a row vector of length (image length × image height). Each row vector was then placed in a matrix X, providing the input vector

    to the above diagrammed neural network.
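A small sketch of that vectorization step (using stand-in random images; note that the Appendix B code reshapes column-wise, whereas the row-wise concatenation described here needs a transpose because Matlab stores arrays column-major):

% Vectorize a set of images into the observation matrix X, one image per row.
imgs = {rand(64,64), rand(64,64), rand(64,64)};   % stand-in images
X = zeros(numel(imgs), 64*64);
for i = 1:numel(imgs)
    X(i,:) = reshape(imgs{i}',1,[]);              % rows concatenated in order
end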

    In the algorithm for calculating the independent components, W is calculated, and

    the independent components matrix Y can be calculated as Y = WX, where the rows of Y

    are the independent components of the original images. These components were

    presented in figure 5.

    Note that in order to re-construct any of the original images, one must simply

multiply the inverse of the unmixing matrix W by the matrix Y, where


X = W⁻¹Y.

    The top left-hand image from figure 2 is reconstructed in this manner in figure 6.

    Figure 6: Re-constitution of the top left-hand corner face from figure 2 using the

independent components for the image set. Going from left to right and from top to bottom, each image uses incrementally more independent components in its re-constitution of the original face. Note that each component adds a lot of information, reflecting the high statistical independence of each component.

    Note here, however, that each independent component contributes a substantial amount to

    the subjective impression of the face as resembling the original face. This property

    reflects the statistical independence of the components derived from the original face set

used to reconstruct the faces. In the end, ICA doesn't compress the image information as much as PCA does; however, it encodes the components more efficiently, making each

    component a valuable contributor to the original image set. This property is desirable in

    neural networks where it is necessary to make the most efficient use possible of the

    neurons that are available for encoding information. That is, given a set of neurons that

    are to be used to represent information about images in the real world, it would be

    efficient to have the outputs of those neurons as statistically independent as possible.

This result also explains the neurophysiological findings of Hubel and Wiesel (1968), as

    noted in Bell and Sejnowski (1997).


References:

    Bell, A.J., and Sejnowski, T.J. (1997). The independent components of natural scenes

    are edge filters. Vision Research, 37, 3327-3338.

    Cottrell, G.W., Munro, P.W., and Zipser, D. (1987). Image Compression by Back-

    Propagation: A Demonstration of Extensional Programming. Technical Report 8720,

    University of California, San Diego, Institute of Cognitive Science.

    Field, D.J. (1987). Relations between the statistics of natural images and the response

    properties of cortical cells. Journal of the Optical Society of America A, 4, 2379-2394.

Gill, J. (2002). What Is Principle Components Analysis Anyway? Retrieved January 2,

    2003, from http://www.clas.ufl.edu/~jgill/papers/pca.pdf

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.). New Jersey: Prentice Hall.

Oja, E. (1983). Subspace Methods of Pattern Recognition. Letchworth, England: Research

    Studies Press and Wiley.

    Appendix A

    % PCA

    % Load the original set of faces

% (compliments of Prof. Jason Gould, University of Indiana)


    load FaceStruct.mat;

    names = fieldnames(images);

    % Initialize the matrix of images

    Raw = zeros(prod(size(images.andrea)),length(names));

    % Put the images into the matrix of images

    for i = 1:length(names)

eval(['Raw(:,i) = reshape(images.',char(names(i)),',[length(Raw(:,1)) 1]);']);
end

    % Normalize the image matrix to have zero mean

% and unit standard deviation
ColMeans = repmat(mean(Raw),length(Raw(:,1)),1);

    ColStd = repmat(std(Raw),length(Raw(:,1)),1);

    X = (Raw-ColMeans)./ColStd;

    % Calculate the variance-covariance matrix for the

% normalized image matrix
R = cov(X);
% Calculate the eigenvectors and eigenvalues of the
% variance-covariance matrix

    [E, LATENT, EXPLAINED] = pcacov(R);

% Calculate the principal component scores
% (These are the filters for the images)

    Y = X*E;

    % Calculate the inverse of the eigenvector matrix

    Einv = inv(E);

    % Build and display the first face using the principal components

    for i = 1:length(names)

eval(['ReformFace1',num2str(i),' = Y(:,1:i)*Einv(1:i,:);']);
figure
% scale is a user-defined helper (not a Matlab built-in) that appears to
% rescale an array's values to the range [0, 1] for display.
eval(['img',num2str(i),' = scale(reshape(ReformFace1',num2str(i),'(:,1),size(images.andrea)));']);
eval(['image(repmat(img',num2str(i),',[1 1 3]));']);
axis equal
eval(['imwrite(img',num2str(i),',''MakeAnd',num2str(i),'.jpg'',''jpg'');']);
end

    % Display the principal components

    for i = 1:length(names)


    eval([char(names(i)),' = 2*scale(reshape(X(:,i),size(images.andrea)))-1;']);

    figure

eval(['image(repmat(scale(',char(names(i)),'),[1 1 3]));']);
axis equal

    eval(['title(''Variance explained = ',num2str(EXPLAINED(i)),''');']);

    end

    Appendix B

    % ICA

    % Load the original face set

% (Courtesy of Professor Jason Gould, University of Indiana)

load FaceStruct.mat
names = fieldnames(images);

    % Calculate the length of the column vector composed of

    % the concatenated columns of one image.

    ColLength = prod(size(images.andrea));

    % Initialize the observation vector

    X = zeros(length(names),ColLength);

    % Fill in the observation vector

    for i = 1:length(names)

eval(['X(i,:) = reshape(images.',char(names{i}),',[1 ColLength]);']);
end

% Initialize the unmixing matrix
W = rand(length(names))*0.05;

    % Initialize the unmixed matrix

    Y = W*X;

    % Calculate the updating parameter phi for the given W and X

phi = 1/2*Y.^5 + 2/3*Y.^7 + 15/2*Y.^9 + 2/15*Y.^11 + 112/3*Y.^13 + ...
    128*Y.^15 - 512/3*Y.^17;

% Learning rate
eta = 0.1;

    % Initialize waitbar (this isn't a necessary part of the code,

    % it just lets you see how far along the algorithm is as it's


% iterating.)

    h = waitbar(0,'Calculating matrix W...');

    n = 1200;

% Main ICA loop, repeat n times, so that the matrix W converges
for i = 1:n

    W = W + eta*(eye(size(W)) - phi*Y')*W;

Y = W*X;
phi = 1/2*Y.^5 + 2/3*Y.^7 + 15/2*Y.^9 + 2/15*Y.^11 + 112/3*Y.^13 + ...
    128*Y.^15 - 512/3*Y.^17;

    waitbar(i/n,h);

    end

    % Close waitbar

    close(h)

    % Display and save the images for the independent components

ImageMatrix = scale(Y);
for i = 1:length(names)

    img{i} = repmat(reshape(ImageMatrix(i,:),size(images.andrea)),[1 1 3]);

figure
image(img{i});

    axis equal

    eval(['imwrite(img{i},''IndComp',num2str(i),'.jpg'',''jpg'');']);

    end

    % Calculate the inverse of the unmixing matrix W

    Winv = inv(W);

    % Rebuild and display the first image using the independent components

for i = 1:length(names)
A = Winv(:,1:i)*Y(1:i,:);

    ImageMatrix = scale(A);

    img{i} = repmat(reshape(ImageMatrix(1,:),size(images.andrea)),[1 1 3]);

figure
image(img{i});

    axis equal

eval(['imwrite(img{i},''RebuiltUsing',num2str(i),'.jpg'',''jpg'');']);
end