Facial PCA and Fisher Discriminant Analysis

Stats231

Project 1 – Facial PCA and Fisher Discriminant Analysis

Scot Fang 704125804

I. PCA Analysis

1. Intensity PCA

Mean Face

20 Eigenfaces with largest eigenvalues in descending order (as is, not added to mean) – Ordered across columns, then down rows.

Four Examples of reconstructed face vs. original testing face:

Reconstruction Error Plot:

150 Training Images used to construct eigenvectors

Compressed 27 Testing Images along eigenvectors

Decompressed 27 Images and obtained average of pixel error across all images

With 20 Eigenvectors: The average squared difference per pixel was approximately 300, giving a root-mean-square error of about ±17.21. The range of the pixel matrix was 255. Thus the pixel error was 6.74% of the pixel range.

As seen above, error decreases approximately exponentially as the number of eigenvectors used to reconstruct an image increases. This makes sense to me, because the eigenvectors are ordered by descending eigenvalue. The first eigenvector captures the most scatter across all the images, and thus should provide the most information. The next eigenvector provides information along an orthogonal dimension, so the information is not redundant, but it is less informative than the previous eigenvector. Error therefore keeps decreasing, but because each following eigenvector provides less information, the slope of the error curve flattens.
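The compress/decompress procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy is available; the function names `pca_fit` and `reconstruction_mse` and the random stand-in data are assumptions for illustration, not the project's actual code or face images.

```python
import numpy as np

def pca_fit(train):
    """Return the mean and the principal directions (rows), largest variance first."""
    mean = train.mean(axis=0)
    # Rows of vt are eigenvectors of the covariance of the centered data.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt

def reconstruction_mse(test, mean, vt, k):
    """Mean squared error per coordinate after keeping the first k eigenvectors."""
    basis = vt[:k]                      # (k, d)
    coeffs = (test - mean) @ basis.T    # compress: k scalars per image
    recon = mean + coeffs @ basis       # decompress
    return float(np.mean((test - recon) ** 2))

rng = np.random.default_rng(0)
d = 64                                  # stand-in for the number of pixels
scales = np.linspace(10.0, 0.5, d)      # decaying variance per direction
train = rng.normal(size=(150, d)) * scales   # 150 "training images"
test = rng.normal(size=(27, d)) * scales     # 27 "testing images"

mean, vt = pca_fit(train)
errors = [reconstruction_mse(test, mean, vt, k) for k in range(0, d + 1, 8)]
```

Because the eigenvectors are orthonormal, each additional eigenvector can only remove residual error, so `errors` never increases with k. How quickly it falls depends on how fast the eigenvalues decay.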

2. Geometry PCA – Using 87 Landmarks

Mean Warp

5 Largest Eigen-Warpings (Added to mean for visualization)

As you can see above, the eigen-warpings indeed show different angles, rotations, and shapings of the face. Visually they approximately represent orthogonal warping dimensions, meaning they warp along different directions.


150 Training Images used to construct eigenvectors

Compressed 27 Testing Images along eigenvectors

Decompressed 27 Images and obtained average of landmark error across all images

With 20 Eigenvectors:

Each image was divided into 87 landmarks with an (x,y) coordinate. Error was gathered as an average of all x coordinate error and y coordinate error.

The average squared difference per coordinate was approximately 2.5, giving a root-mean-square error of about ±1.58. The empirical range of the coordinate matrix was 235.19. Thus the coordinate error was 0.67% of the range.

The geometric error plot followed a similar exponential curve to the intensity error, since the same methodology was used for PCA reconstruction. However, we obtained a lower error as a percentage of the range of our random variable. This is probably because intensity has 256^2 dimensions per image, while landmarks have only 87*2 dimensions per image, so there is less information to capture for geometry. Holding the number of eigenvectors constant, PCA is expected to lose more information in the case with much larger dimensionality.
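The comparison between the two error rates is just the root-mean-square error divided by the range of the data. A small sketch of that arithmetic, using the RMS values and ranges reported above (the helper name `percent_of_range` is illustrative):

```python
def percent_of_range(rms_error, value_range):
    """Express an RMS error as a percentage of the data range."""
    return 100.0 * rms_error / value_range

# Values reported in this section for 20 eigenvectors.
intensity_pct = percent_of_range(17.21, 255.0)    # pixel error vs. pixel range
geometry_pct = percent_of_range(1.58, 235.19)     # landmark error vs. coordinate range
ratio = intensity_pct / geometry_pct              # how many times larger the intensity error is
```

On these numbers the relative intensity error is roughly ten times the relative geometry error.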

3. Hybrid PCA – Geometry and Intensity PCA

Here we first warp all intensity images to the mean landmarks, then do intensity PCA on the aligned images, then warp each reconstructed image back to its own landmarks as reconstructed by geometry PCA.

Note: Intensity images are warped pixel by pixel to the target landmark layout via interpolation. Octave 3.2.4 does not support cubic interpolation, so I used linear interpolation.
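To illustrate what linear interpolation does during a warp, here is a minimal sketch of bilinear sampling at a fractional pixel location. It is not the project's warping routine (which is not shown in this report); the function name `bilinear_sample` and the tiny 2x2 image are assumptions for illustration.

```python
import math

def bilinear_sample(image, x, y):
    """Sample a 2-D image (list of rows) at real-valued (x, y) by linear interpolation."""
    height, width = len(image), len(image[0])
    # Clamp so the 2x2 neighbourhood stays inside the image.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom

image = [[0.0, 10.0],
         [20.0, 30.0]]
center = bilinear_sample(image, 0.5, 0.5)   # weighted average of the four neighbours
corner = bilinear_sample(image, 1.0, 1.0)   # exact pixel locations are returned unchanged
```

Every sample that falls between pixels is a weighted average of its neighbours, which is why a warped image looks slightly smoother than the original.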

Mean Alignment vs. Aligned Intensity Mean vs. Unaligned Intensity Mean - (Each training image was aligned to the mean landmarks to compute the aligned mean)

Here you can see some fuzzy interpolation for the Aligned Intensity Mean vs the Unaligned Intensity Mean.

_____________________________________________________________________________________________


150 Training Images used to construct eigenvectors

Compressed 27 Testing Images along eigenvectors

Decompressed 27 Images and obtained average of pixel error across all images

With 20 Eigenvectors each for Geometry and Intensity: The average squared difference per pixel was approximately 400, giving a root-mean-square error of about ±20.

The range of the pixel matrix was 255. Thus the pixel error was approximately 8% of the range.

The graph below shows the pixel error holding the number of Geometry Eigenvectors constant at 20

We reached a slightly higher pixel error as a percentage of the pixel range using hybrid PCA reconstruction than using intensity PCA reconstruction alone: 8% vs. 6.74%. At first it seems counter-intuitive that 20 eigenvectors for geometry plus 20 eigenvectors for intensity give a higher error than 20 eigenvectors for intensity alone. However, the hybrid process compresses the image data through many more stages where information loss occurs.

We must be encountering information loss at these stages:

1. When aligning all intensity images to the mean landmarks, we lose information to interpolation, linear interpolation in this case.

2. When compressing an intensity image to 20 eigenvectors, we lose intensity information.

3. When compressing landmarks to 20 eigenvectors, we lose geometric information.

4. When re-warping the decompressed intensity image, we further lose information to interpolation.

Thus, there are more information leaks in the hybrid PCA process than in the intensity PCA process alone, which explains our higher error rate even with more eigenvectors.

All Reconstructed Testing Faces, Original vs. Reconstructed with Intensity and Geo PCA

20 Eigenvectors each for Geometry and Intensity

4. Random Synthesis of Faces using Geometry and Intensity PCA

Below are 20 randomly synthesized faces. Each face is reconstructed from 10 aligned-intensity eigenvectors and then warped to landmarks reconstructed from 10 geometric eigenvectors. The scalar coefficient for each eigenvector was obtained by randomly sampling from all the scalar projections of our image data onto that eigenvector. As you can see below, the images look quite warped in some cases, indicating a large warp coefficient along one or more geometric eigenvectors. The fuzziness can be attributed to both pixel interpolation and warping.
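The sampling scheme described above can be sketched as follows, assuming NumPy and synthetic stand-in data. The function name `sample_coefficients` is illustrative. The sketch covers only one PCA model; in the report the same sampling is applied to both the intensity and the geometry model, followed by a warp that is omitted here.

```python
import numpy as np

def sample_coefficients(train_coeffs, rng):
    """For each eigenvector, pick the projection of one randomly chosen training image."""
    n_images, k = train_coeffs.shape
    rows = rng.integers(0, n_images, size=k)    # independent image index per eigenvector
    return train_coeffs[rows, np.arange(k)]

rng = np.random.default_rng(1)
d, k = 30, 10                                   # stand-in dimensionality, 10 eigenvectors
train = rng.normal(size=(150, d))
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
basis = vt[:k]
train_coeffs = (train - mean) @ basis.T         # (150, k) projections of the training data

coeffs = sample_coefficients(train_coeffs, rng)
synthetic = mean + coeffs @ basis               # one randomly synthesized sample
```

Sampling each coefficient independently keeps every coefficient within the range seen in training, but it can combine extreme values from different images, which is consistent with some synthesized faces looking heavily warped.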

Part 2 – Fisher Discriminant Analysis

1. 1-D Fisher Discriminant Using Mixed Intensity and Geometric Data

Training Plot:

i) Discriminant from full, uncompressed data

78 males and 75 females were used for training, but as you can see below there are only two visible points after projecting both genders onto our Fisher dimension. The reason is that the variance of each class after projection is 0! This makes sense, as the goal of the Fisher projection is to maximize between-class scatter and minimize within-class scatter. In our case the within-class scatter became 0, which is possible because the data has far more dimensions than training samples, so a direction exists along which every training sample of a class projects to the same value.
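The zero-variance effect can be reproduced on synthetic data whenever there are more dimensions than training samples. The sketch below assumes NumPy and random stand-in data; it constructs one direction with zero within-class scatter and is not necessarily how the projection vector in this report was computed.

```python
import numpy as np

rng = np.random.default_rng(2)
n_male, n_female, d = 78, 75, 500       # far more dimensions than samples
males = rng.normal(size=(n_male, d)) + 0.2
females = rng.normal(size=(n_female, d)) - 0.2

mean_m, mean_f = males.mean(axis=0), females.mean(axis=0)
# Class-centered samples; the within-class scatter matrix is centered.T @ centered.
centered = np.vstack([males - mean_m, females - mean_f])

# Orthonormal basis of the directions in which the centered samples actually vary.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
row_space = vt[s > 1e-8 * s[0]]

# Remove those directions from the mean difference; the remainder has no within-class scatter.
diff = mean_m - mean_f
w = diff - row_space.T @ (row_space @ diff)

proj_m, proj_f = males @ w, females @ w  # each class collapses to a single value
```

Along `w`, every training sample projects onto its class mean, so a training plot would show exactly two points, one per class.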

Inspection of our projection vector ‘w’ reveals that 22% of the dimensions were given a weight of 0, so the projection has discarded information from 22% of our original dimensions. These dimensions were not useful for discrimination: specifically, they either increased within-class scatter or decreased between-class scatter.

Since the sample variance of the 2 projected classes was 0, and thus equal, I chose a threshold “z” which was the mean of the two projected class means to be the discriminant. This is represented by the vertical line.
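The midpoint rule described above can be sketched as follows. The projected class means and test values are made-up numbers for illustration, and the function names are hypothetical.

```python
def midpoint_threshold(mean_a, mean_b):
    """Threshold halfway between two projected class means."""
    return 0.5 * (mean_a + mean_b)

def classify(projection, threshold, mean_male, mean_female):
    """Assign the class whose projected mean lies on the same side of the threshold."""
    male_is_above = mean_male > mean_female
    if (projection > threshold) == male_is_above:
        return "male"
    return "female"

# Illustrative projected class means (not values from the report).
mean_male, mean_female = 2.0, -1.0
z = midpoint_threshold(mean_male, mean_female)
labels = [classify(p, z, mean_male, mean_female) for p in (1.7, 0.6, 0.4, -0.9)]
```

A point that projects very close to `z`, like the misclassified test point mentioned in this section, can fall on either side with only a small change in its projection.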

As you can see from our testing data above, the 1D Fisher discriminant misclassified only one point, and it was just on the threshold line. That is 5% of our total testing set.

With 1D, the unknown testing data was within a region halfway from the discriminant to either projected class mean.

2. 2D Fisher Face Visualization – Full Uncompressed Data

Here we visualize the separation when our data is projected onto two Fisher dimensions: one for intensity and one for geometry. We used full uncompressed data to determine our Fisher projections.

Once again, as seen above, variance of projected distributions for each class is zero on training data.

For testing data, we see that it appears to be linearly separable, although there is a clustering near the center of the plot between classes. The separating line also looks orthogonal to the two projected training distributions.

With 2 dimensions, 3 points in the unknown data veer towards the triangle class (male), and one towards the female class. This is contrary to the 1D case where 3 points veered towards the female class. However paying attention to the scale of the graph they are still very close to the discriminant line.

I performed a visual inspection of the unknown faces, and it looks like 3 faces are male and one is female. Thus it seems the 2-D projection works better on the unknown cases.

Additional Analysis: Mixed Fisher 1D using Compressed Data

Here I have once again used 10 eigenvectors for geometry and 10 for intensity, compressing each image into 20 scalar values, one along each eigenvector. I then performed mixed 1D Fisher analysis on the compressed images.

As you can see above, the projected training data was clearly not well separated.

As you can see above, the projected testing data was also not well separated.

The poor performance of Fisher analysis using PCA-compressed data should not be surprising. PCA is a generative technique: it discovers the best dimensions along which to represent each image, and compresses along those dimensions. We can reconstruct well using PCA.

However, PCA only preserves the most important generative information; its information extraction policy does not consider discrimination between classes or images. Thus, when we compress the images using PCA, we may very well lose information that helps Fisher analysis discriminate between classes. In our case, we did not preserve enough discriminative information after PCA compression.
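A small synthetic example makes this concrete. It assumes NumPy and a made-up two-dimensional dataset in which the classes differ only along a low-variance dimension; it illustrates the general effect and does not use the face data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Dimension 0: large variance shared by both classes (not discriminative).
# Dimension 1: small variance, but the class means differ here.
class_a = np.column_stack([rng.normal(0, 10.0, n), rng.normal(+1.0, 0.3, n)])
class_b = np.column_stack([rng.normal(0, 10.0, n), rng.normal(-1.0, 0.3, n)])
data = np.vstack([class_a, class_b])

def separation(direction):
    """Distance between projected class means in units of pooled projected std."""
    pa, pb = class_a @ direction, class_b @ direction
    pooled = np.sqrt(0.5 * (pa.var() + pb.var()))
    return abs(pa.mean() - pb.mean()) / pooled

# Top principal component of the pooled data (class labels ignored).
_, _, vt = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)
pca_dir = vt[0]

# Fisher direction: inverse within-class scatter (up to scale) times the mean difference.
sw = np.cov(class_a, rowvar=False) + np.cov(class_b, rowvar=False)
fisher_dir = np.linalg.solve(sw, class_a.mean(axis=0) - class_b.mean(axis=0))

pca_sep, fisher_sep = separation(pca_dir), separation(fisher_dir)
```

Keeping only the top principal component retains most of the variance but almost none of the class separation, while the Fisher direction on the uncompressed data separates the classes clearly.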
