Principal Components Analysis & Independent Components Analysis
Aaron Clarke
SN: 206071237
Prof. Robert Cribbie
Statistics 6130
Introduction:
A common problem in information theory is that of representing a message space
with the smallest possible set of message components (Cottrell et al., 1987; Oja, 1983).
That is, to find a basis set of message components that could be used to form every
message, given a particular set of possible messages. For example, the Morse code forms
a basis set for the set of possible message that can be transmitted by Morse code. If one
wanted to form the message SOS, then one would simply combine the element for S
(---) with the element for O () such that the full message would be ------. If the
only message that anyone ever sent by Morse code was ------, then --- and
would be the basis set for the set of all messages sent by Morse code. The question then
arises: is there a basis set for the set of images that the human visual system was likely
to encounter in its evolutionary environment? If so then one would expect that the
human visual system would be adapted to optimally perceive this basis set and would use
it to reconstruct observed image information. It is already known that Fourier analysis
can be used to decompose any given image into a set of spatial frequency components of
varying phase, orientation, and amplitude; however, the elements of the set of all possible
spatial frequency components are not equally distributed in natural images (Field, 1987).
Thus one would expect that there might exist a smaller basis set of spatial frequency
components that could be used to compose the set of natural images that the human
visual system is exposed to. Evidence supporting this theory can be found in the
neurophysiological literature where it has been shown that single neurons in the visual
cortex respond to a finite set of Gaussian enveloped Fourier components (called Gabor
patches) of particular spatial frequencies and orientations (Hubel & Wiesel, 1968). Thus,
it seems that the visual system somehow de-correlates the incoming visual information to
produce a useful basis set of image components with which to filter the incoming images.
A possible method for computing this basis set lies in principal components analysis
(PCA).
PCA
The idea behind principal components analysis is that a given message set, or a
given data set, is linearly transformed into a lower-dimensional dataset with the
property that the transformed variables are mutually uncorrelated (Gill, 2002). PCA
generates a basis set for a set of messages by rotating the message data in the sample
space of observed messages (Gill, 2002). For example, in the figure below, the original
data are presented on the left and the PCA rotated data are presented on the right.
Figure 1: Left: original data. Right: PCA rotated data.
Note that the shape of the distribution is preserved, but the regression line through the
PCA rotated data set is now aligned with the x-axis, thereby de-correlating the data. This
is achieved mathematically by representing each message as a column vector Ui and by
placing each of these column vectors as a column of a matrix X:
Ui = [ Ui1
       Ui2
        :
       Uim ]

and

X = [ U11  U21  U31  ...  Un1
      U12  U22  U32  ...  Un2
       :    :    :          :
      U1m  U2m  U3m  ...  Unm ]
Each row of a message column vector is treated as a separate variable, and each variable
defines a separate axis (Gill, 2002). The variance-covariance matrix R for the matrix of
message columns X is computed, and then the eigenvectors E and the eigenvalues of that
variance-covariance matrix are extracted (Gill, 2002). The eigenvalues form the (diagonal)
variance-covariance matrix of the data after rotation onto the principal components
(Gill, 2002). In other words, each eigenvalue is the variance of one principal component,
with the first principal component accounting for the largest variance (Gill, 2002). The
eigenvector matrix E transforms the data points from the original message matrix X into
the PCA metric Y through simple matrix multiplication (Gill, 2002).
Y = XE
Here, the principal component scores matrix equals the original matrix multiplied by the
eigenvector matrix.
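To make these steps concrete, here is a minimal Matlab sketch of the procedure (illustrative only; it is not the Appendix A code). For convenience with Matlab's cov function, the data matrix here, called Xr, is assumed to store one message per row (i.e. the transpose of the X defined above), with its columns already normalized to zero mean:

% Sketch of PCA by eigen-decomposition of the variance-covariance matrix.
% Xr is assumed to be an n-by-m matrix: n messages (rows), m components (columns).
R = cov(Xr);                               % m-by-m variance-covariance matrix
[E, D] = eig(R);                           % eigenvectors (columns of E) and eigenvalues (diagonal of D)
[lambda, idx] = sort(diag(D), 'descend');  % order components by variance accounted for
E = E(:, idx);                             % reorder the eigenvectors to match
explained = 100 * lambda / sum(lambda);    % percent of total variance per component
Y = Xr * E;                                % principal component scores (Y = XE in the text)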
For a practical example of how PCA can be used in images, suppose that one
were given the set of faces illustrated in figure 2.
Figure 2: Original set of faces.
If this set represented the full set of faces that a visual system were exposed to, then one
could compute the set of basis faces required to fully represent those faces. This basis set
is shown in figure 3.
Figure 3: Complete basis set (i.e. the principal components) for the set of faces given in
figure 2. Going from left to right and from top to bottom the variances accounted for by
each face are: 72.9242%, 6.8400%, 4.7808%, 3.4802%, 2.9461%, 2.1323%, 2.0361%,
1.8905%, 1.6516% and 1.3180%
Additionally, however, one can also see what percent of the total variance in the face set
is accounted for by each individual basis face. Here it can be seen that the first basis face
accounts for most of the variance in the basis face set (~73%). Subjectively this face
looks the most like a face out of any of the faces in the set. The next face accounts for a
much smaller percentage of the total variance in the face set, as do all of the subsequent
faces. If one were arbitrarily to set a cut-off level for the variance explained by a basis
face at 2% then one could roughly represent all of the faces in the original face set using
only the first 7 basis faces as can be seen in figure 4.
Figure 4: Formulation of the first face using the basis set. The top left-hand face was
composed using only the first principal component. The face to the right of it was
composed using the first two principal components. This pattern continues from left to
right and from top to bottom. The last face uses all ten principal components and is
exactly the same as the original face. Note that an excellent approximation is achieved
using 7 or more of the 10 basis faces.
The Matlab code that I wrote to compute the basis faces can be found in Appendix A.
One benefit of using PCA, then, is that it allows information to be compressed without
losing the subjective qualities of that information. Specifically, to transmit the full set of
faces given in figure 2, one would only need to transmit the first seven basis faces from
figure 3, together with the amplitudes by which each basis face must be multiplied to
regenerate each original face. Since seven basis faces (plus a small matrix of amplitudes)
replace the ten original faces, the use of
PCA results in roughly a 30% reduction in the amount of information that would need to
be sent. This procedure has been extended by other researchers to massive sets of natural
images where it was found that the components of the natural images tended to resemble
the Gaussian enveloped Fourier components noted by Hubel & Wiesel (1968) to be the
optimal stimuli for exciting neurons in the visual cortex (Olshausen & Field, 1997).
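As a rough sketch of the compression idea (again illustrative, reusing the hypothetical Xr, E and Y from the earlier sketch rather than the Appendix A variables), keeping only the first k = 7 components and reconstructing from them would look something like this:

% Approximate reconstruction from the first k principal components.
k = 7;                              % cut-off chosen from the explained-variance profile
basis  = E(:, 1:k);                 % k basis vectors to transmit
scores = Y(:, 1:k);                 % k amplitudes per message to transmit
Xapprox = scores * basis';          % approximate reconstruction (E is orthonormal, so inv(E) = E')
reconErr = norm(Xr - Xapprox, 'fro') / norm(Xr, 'fro');   % relative reconstruction error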
ICA
Assume that one has the following neural network:
[Diagram: a neural model that maps sensory inputs X1, X2, ..., Xm onto outputs Y1, Y2, ..., Ym.]
Where the column vector X represents the sensory inputs from an external stimulus U,
and:
U = [ U1
      U2
       :
      Um ]
If the external stimulus (U) is subject to mixing where A is a mixing matrix of size m-by-
m, then the sensory information received by the brain (X) is given as:
X = AU
(Haykin, 1999). In this case, in order for the brain to pick out the original signal matrix
U, it is necessary to develop a neural model that unmixes the mixing done by A, and
transforms the inputs X into the output Y such that the elements of Y are as statistically
independent as is possible (Haykin, 1999). In order to do this, it is necessary to compute
an unmixing matrix W that reverses the effects of A, such that
Y = WX
And
Y = [ Y1
      Y2
       :
      Ym ]
(Haykin, 1999). The unmixing matrix W in this case would be the m-by-m matrix that,
when it multiplies X, makes the elements of the resultant product Y as statistically
independent as possible. Thus, the elements of Y would be the independent
components present in the original signal U, although rescaled and permuted (Haykin,
1999).
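As a toy illustration of this mixing model (a sketch with made-up source signals, not the face data): if the mixing matrix A were known, the ideal unmixing matrix would simply be its inverse; the point of ICA is that W must instead be estimated from X alone.

% Toy demonstration of the mixing/unmixing model X = AU, Y = WX.
m = 3;  T = 1000;                                            % three sources, 1000 samples
U = [sign(randn(1,T)); rand(1,T)-0.5; sin(2*pi*(1:T)/50)];   % independent source signals
A = randn(m);                                                % unknown m-by-m mixing matrix
X = A * U;                                                   % mixed signals (what the "sensors" receive)
W = inv(A);                                                  % ideal unmixing matrix; ICA must estimate this from X alone
Y = W * X;                                                   % recovered sources (in general only up to rescaling and permutation)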
In order to make the elements of Y as statistically independent as is possible, it is
necessary to minimize the mutual information conveyed by any pair of elements in Y
(Haykin, 1999). Mutual information is a measure of the reduction in uncertainty about Yi
that results from observing Yj (Haykin, 1999). The mutual information I(Yi;Yj) between Yi
and Yj, then, is the entropy of Yi minus the conditional entropy of Yi given Yj:
I(Yi;Yj) = H(Yi) - H(Yi|Yj)
(Haykin, 1999). This situation is represented in the following Venn diagram.
[Venn diagram: two overlapping circles representing H(Yi) and H(Yj); their overlap is the
mutual information I(Yi;Yj), the non-overlapping regions are the conditional entropies
H(Yi|Yj) and H(Yj|Yi), and the total area is the joint entropy H(Yi,Yj).]
In order for all of the elements of Y to be statistically independent, the Kullback-Leibler
divergence between the joint probability density function of Y and the product of the
marginal probability density functions of its elements Yi (where i goes from 1 to m) must
be minimized (Haykin, 1999).
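For a discrete pair (Yi, Yj) whose joint distribution is known, these quantities can be computed directly from their definitions. The following sketch uses a small made-up joint probability table purely for illustration:

% Sketch: mutual information I(Yi;Yj) = H(Yi) - H(Yi|Yj) for a discrete joint distribution.
Pij = [0.30 0.10; 0.10 0.50];             % made-up joint probability table P(Yi = a, Yj = b)
Pi  = sum(Pij, 2);                        % marginal distribution P(Yi)
Pj  = sum(Pij, 1);                        % marginal distribution P(Yj)
HYi = -sum(Pi .* log2(Pi));               % entropy H(Yi)
HYj = -sum(Pj .* log2(Pj));               % entropy H(Yj)
HYiYj = -sum(sum(Pij .* log2(Pij)));      % joint entropy H(Yi,Yj)
HYi_given_Yj = HYiYj - HYj;               % conditional entropy H(Yi|Yj)
Iij = HYi - HYi_given_Yj;                 % mutual information; zero iff Yi and Yj are independent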
Some Matlab code that I wrote to accomplish this objective using the set of faces
in figure 2 can be found in Appendix B. In the code that I wrote, it is implicitly assumed
that the unmixing matrix W converges within 1200 iterations. This may not necessarily be
the case; however, I have found it to work in some preliminary tests with the face stimuli.
In order to obtain a quantitative index of the demixer's performance, one may calculate a
global rejection index as:
ρ = Σi=1..m [ Σj=1..m ( |pij| / maxk |pik| ) - 1 ] + Σj=1..m [ Σi=1..m ( |pij| / maxk |pkj| ) - 1 ]
where P = {pij} = WA (Haykin, 1999). The performance index ρ is a measure of the
diagonality of the matrix P (Haykin, 1999). If the matrix P is perfectly diagonal, ρ = 0
(Haykin, 1999). For a matrix P whose elements are not concentrated on the principal
diagonal, the performance index will be high (Haykin, 1999). A good performance
index is around 0.05 (Haykin, 1999). This index could be used in the iterative code that I
wrote for computing W. Instead of iterating the loop for a fixed 1200 cycles, one could
instead use a while loop, evaluating the performance index at each iteration, and exiting
only when it dropped below a certain threshold level. The calculation of this index,
however, is computationally intensive, so I didn't include it in my code, in the hope that
any time I lose by iterating the W-updating loop past the threshold performance index,
I'll make up in the speed of each iteration.
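For completeness, a sketch of what that while-loop version might look like is given below. It assumes a simulation setting in which the mixing matrix A is known (so that P = WA can be formed), and it reuses the update rule and the variables W, X, Y, phi and eta from Appendix B:

% Sketch: use the performance index of P = W*A as a convergence test instead
% of a fixed number of iterations. A is assumed known (only possible in a simulation).
rho = inf; iter = 0; maxIter = 5000;
while rho > 0.05 && iter < maxIter            % Haykin (1999): around 0.05 indicates good separation
    W = W + eta*(eye(size(W)) - phi*Y')*W;    % same update rule as in Appendix B
    Y = W*X;
    phi = 1/2*Y.^5 + 2/3*Y.^7 + 15/2*Y.^9 + 2/15*Y.^11 + 112/3*Y.^13 + ...
        128*Y.^15 - 512/3*Y.^17;
    P = abs(W*A);                             % |p_ij|, with A the (known) mixing matrix
    rho = sum(sum(P ./ repmat(max(P,[],2), 1, size(P,2)), 2) - 1) + ...
          sum(sum(P ./ repmat(max(P,[],1), size(P,1), 1), 1) - 1);
    iter = iter + 1;                          % guard against non-convergence
end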
In the end, ICA may be viewed as an extension of PCA. Whereas PCA can only
impose independence up to the second order while constraining the direction vectors to
be orthogonal, ICA imposes statistical independence on the individual components of the
output vector Y and has no orthogonality constraint.
An example of the application of ICA to images can be found in figure 5.
Figure 5: Independent components derived from the original image set given in figure 2.
Note the marked differences between the independent components of the image set
presented here and the principal components of the image set depicted in figure 3. Also
note that since the independent components are maximally statistically independent, they
all account for an equal percent of the variance in the image set.
Here the same initial set of faces from figure 2 that was used in the PCA demonstration is
used again. Each image was vectorized by concatenating its pixels to produce a row
vector of length (image width × image height). Each row vector was then placed in a
matrix X, providing the input to the neural network diagrammed above.
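A sketch of this vectorization step (assuming, purely for illustration, a cell array called faces holding the equally sized grayscale images; Appendix B does the equivalent with a struct of images):

% Sketch: flatten each image into one row of the observation matrix X.
[h, w] = size(faces{1});                 % image height and width in pixels
X = zeros(length(faces), h*w);           % one row per image, (width x height) columns
for i = 1:length(faces)
    X(i, :) = reshape(faces{i}, 1, h*w); % concatenate the image's pixels into a row vector
end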
In the algorithm for computing the independent components, the unmixing matrix W is
estimated, and the independent-components matrix Y is then obtained as Y = WX, where
the rows of Y are the independent components of the original images. These components
were presented in figure 5.
Note that in order to reconstruct any of the original images, one must simply
multiply the inverse of the unmixing matrix W by the matrix Y, where
X = W⁻¹Y.
The top left-hand image from figure 2 is reconstructed in this manner in figure 6.
Figure 6: Re-constitution of the top left-hand corner face from figure 2 using the
independent components for the image set. Going from left to right and from top to
bottom, each image uses incrementally more independent components in its
re-constitution of the original face. Note that each component adds a lot of information,
reflecting the high statistical independence of each component.
Note here, however, that each independent component contributes a substantial amount to
the subjective impression of the face as resembling the original face. This property
reflects the statistical independence of the components derived from the original face set
used to reconstruct the faces. In the end, ICA doesn't compress the image information as
much as PCA does; however, it encodes the components more efficiently, making each
component a valuable contributor to the original image set. This property is desirable in
neural networks where it is necessary to make the most efficient use possible of the
neurons that are available for encoding information. That is, given a set of neurons that
are to be used to represent information about images in the real world, it would be
efficient to have the outputs of those neurons as statistically independent as possible.
This result also explains the neurophysiological findings of Hubel and Wiesel (1968) as
noted in Bell and Sejnowski (1997).
References:
Bell, A.J., and Sejnowski, T.J. (1997). The independent components of natural scenes
are edge filters. Vision Research, 37, 3327-3338.
Cottrell, G.W., Munro, P.W., and Zipser, D. (1987). Image Compression by Back-
Propagation: A Demonstration of Extensional Programming. Technical Report 8720,
University of California, San Diego, Institute of Cognitive Science.
Field, D.J. (1987). Relations between the statistics of natural images and the response
properties of cortical cells. Journal of the Optical Society of America A, 4, 2379-2394.
Gill, J. (2002). What Is Principle Components Analysis Anyway? Retrieved January 2,
2003, from http://www.clas.ufl.edu/~jgill/papers/pca.pdf
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.). New
Jersey: Prentice Hall.
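Hubel, D.H., & Wiesel, T.N. (1968). Receptive fields and functional architecture of
monkey striate cortex. Journal of Physiology, 195, 215-243.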
Oja, E. (1983). Subspace Methods of Pattern Recognition. Letchworth, England: Research
Studies Press and Wiley.
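Olshausen, B.A., & Field, D.J. (1997). Sparse coding with an overcomplete basis set: A
strategy employed by V1? Vision Research, 37, 3311-3325.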
Appendix A
% PCA
% Load the original set of faces
% (compliments of Prof. Jason Gould, University of Indiana)
load FaceStruct.mat;
names = fieldnames(images);
% Initialize the matrix of images
Raw = zeros(prod(size(images.andrea)),length(names));
% Put the images into the matrix of images
for i = 1:length(names)
eval(['Raw(:,i) = reshape(images.',char(names(i)),',[length(Raw(:,1)) 1]);']);
end
% Normalize the image matrix to have zero mean
% and unit standard deviation
ColMeans = repmat(mean(Raw),length(Raw(:,1)),1);
ColStd = repmat(std(Raw),length(Raw(:,1)),1);
X = (Raw-ColMeans)./ColStd;
% Calculate the variance-covariance matrix for the
% normalized image matrix
R = cov(X);
% Calculate the eigenvectors and eigenvalues of the
% variance-covariance matrix
[E, LATENT, EXPLAINED] = pcacov(R);
% Calculate the principal component scores
% (These are the filters for the images)
Y = X*E;
% Calculate the inverse of the eigenvector matrix
Einv = inv(E);
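% Note: scale() used below is assumed to be a local helper function (not a
% built-in Matlab function) that linearly rescales its input to the range [0 1].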
% Build and display the first face using the principal components
for i = 1:length(names)
eval(['ReformFace1',num2str(i),' = Y(:,1:i)*Einv(1:i,:);']);
figure
eval(['img',num2str(i),' = scale(reshape(ReformFace1',num2str(i),'(:,1),size(images.andrea)));']);
eval(['image(repmat(img',num2str(i),',[1 1 3]));']);
axis equal
eval(['imwrite(img',num2str(i),',''MakeAnd',num2str(i),'.jpg'',''jpg'');']);
end
% Display the principal components
for i = 1:length(names)
eval([char(names(i)),' = 2*scale(reshape(Y(:,i),size(images.andrea)))-1;']);
figure
eval(['image(repmat(scale(',char(names(i)),'),[1 1 3]));']);
axis equal
eval(['title(''Variance explained = ',num2str(EXPLAINED(i)),''');']);
end
Appendix B
% ICA
% Load the original face set
% (Courtesy of Professor Jason Gould, University of Indiana)
load FaceStruct.mat;
names = fieldnames(images);
% Calculate the length of the column vector composed of
% the concatenated columns of one image.
ColLength = prod(size(images.andrea));
% Initialize the observation vector
X = zeros(length(names),ColLength);
% Fill in the observation vector
for i = 1:length(names)
eval(['X(i,:) = reshape(images.',char(names{i}),',[1 ColLength]);']);
end
% Initialize the unmixing matrix
W = rand(length(names))*0.05;
% Initialize the unmixed matrix
Y = W*X;
% Calculate the updating parameter phi for the given W and X
phi = 1/2*Y.^5 + 2/3*Y.^7 + 15/2*Y.^9 + 2/15*Y.^11 + 112/3*Y.^13 + ...
    128*Y.^15 - 512/3*Y.^17;
% Learning rate
eta = 0.1;
% Initialize waitbar (this isn't a necessary part of the code,
% it just lets you see how far along the algorithm is as it's
% iterating.
h = waitbar(0,'Calculating matrix W...');
n = 1200;
% Main ICA loop, repeat n times, so that the matrix W converges
for i = 1:n
W = W + eta*(eye(size(W)) - phi*Y')*W;
Y = W*X;
phi = 1/2*Y.^5 + 2/3*Y.^7 + 15/2*Y.^9 + 2/15*Y.^11 + 112/3*Y.^13 + ...
    128*Y.^15 - 512/3*Y.^17;
waitbar(i/n,h);
end
% Close waitbar
close(h)
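% Note: as in Appendix A, scale() is assumed to be a local helper function
% that linearly rescales its input to the range [0 1].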
% Display and save the images for the independent components
ImageMatrix = scale(Y);
for i = 1:length(names)
img{i} = repmat(reshape(ImageMatrix(i,:),size(images.andrea)),[1 1 3]);
figure
image(img{i});
axis equal
eval(['imwrite(img{i},''IndComp',num2str(i),'.jpg'',''jpg'');']);
end
% Calculate the inverse of the unmixing matrix W
Winv = inv(W);
% Rebuild and display the first image using the independent components
for i = 1:length(names)
A = Winv(:,1:i)*Y(1:i,:);
ImageMatrix = scale(A);
img{i} = repmat(reshape(ImageMatrix(1,:),size(images.andrea)),[1 1 3]);
figure
image(img{i});
axis equal
eval(['imwrite(img{i},''RebuiltUsing',num2str(i),'.jpg'',''jpg'');']);
end