Face Recognition by Independent Component Analysis

Authors: Marian Stewart Bartlett, Javier R. Movellan, Terrence J. Sejnowski

Lecturer: Fang Fang


General Information

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 6, NOVEMBER 2002

http://ieeexplore.ieee.org/Xplore/dynhome.jsp

Marian Stewart Bartlett

Assistant Research Professor at the Institute for Neural Computation, University of California-San Diego

Education:

· B.S. in mathematics and computer science, Middlebury College, 1988.

· Ph.D. in cognitive science and psychology, University of California-San Diego, La Jolla, 1998.

Advisor: T. Sejnowski

Research interests:

· Image analysis through unsupervised learning.

· Facial identity recognition.

· Facial expression analysis.

· Independent component analysis for pattern recognition.

Homepage: http://mplab.ucsd.edu/~marni/index.html

Publications

Book: Face Image Analysis by Unsupervised Learning. Foreword by Terrence J. Sejnowski.

Papers:

Bartlett, M.S., Littlewort, G.C. Automatic Recognition of Facial Actions in Spontaneous Expressions. Journal of Multimedia 1(6), pp. 22-35 (2006).

Bartlett, M.S., Littlewort, G.C. Fully automatic facial action recognition in spontaneous behavior. Automatic Face and Gesture Recognition (2006).

Bartlett, M.S., Littlewort, G.C. Recognizing Facial Expression: Machine Learning and Application to Spontaneous Behavior. CVPR 2005.

Littlewort, G.C., Bartlett, M.S. Dynamics of facial expression extracted automatically from video. CVPR 2004.

Bartlett, M.S., Littlewort, G.C. Real time face detection and expression recognition: Development and application to human-computer interaction. CVPR 2003.

Javier R. Movellan

Born in Palencia, Spain.

Research Associate with Carnegie-Mellon University, from 1989 to 1993.

Assistant Professor with the Department of Cognitive Science, University of California-San Diego (UCSD), from 1993 to 2001.

Research Associate with the Institute for Neural Computation and head of the Machine Perception Laboratory at UCSD.

Education:

· B.S., Universidad Autonoma de Madrid, Spain.

· Ph.D., University of California-Berkeley, 1989. He was a Fulbright Scholar at the same university.

Research interests:

· Development of perceptual computer interfaces.

· Analyzing the statistical structure of natural signals in order to help understand how the brain works.


Publications:

Javier R. Movellan: Local Algorithm to Learn Trajectories with Stochastic Neural Networks. NIPS 1993.

Javier R. Movellan: Visual Speech Recognition with Stochastic Networks. NIPS 1994.

Javier R. Movellan, Paul Mineiro: Bayesian Robustification for Audio Visual Fusion. NIPS 1997.

Javier R. Movellan: A Learning Theorem for Networks at Detailed Stochastic Equilibrium. Neural Computation 1998.

Javier R. Movellan, Paul Mineiro: Robust Sensor Fusion: Analysis and Application to Audio Visual Speech Recognition. Machine Learning 1998.

Javier R. Movellan, Paul Mineiro: Partially Observable SDE Models for Image Sequence Recognition Tasks. NIPS 2000.

Javier R. Movellan, Thomas Wachtler: Factorial Coding of Color in Primary Visual Cortex. NIPS 2002.

Terrence J. Sejnowski

Joined the faculty of the Department of Biophysics at Johns Hopkins University in 1982.

Investigator with the Howard Hughes Medical Institute.

Professor at The Salk Institute for Biological Studies, where he directs the Computational Neurobiology Laboratory.

Professor of Biology at the University of California-San Diego.

Dr. Sejnowski received the IEEE Neural Networks Pioneer Award in 2002.

Education:

· B.S. in physics, Case Western Reserve University.

· Ph.D. in physics, Princeton University, 1978.

Research interests:

· The long-range goal is to build linking principles from brain to behavior using computational models.


Publications:

Terrence J. Sejnowski, B. Yuhas: Combining Visual and Acoustic Speech Signals with a Neural Network Improves Intelligibility. NIPS 1989.

Nicol N. Schraudolph, Terrence J. Sejnowski: Competitive Anti-Hebbian Learning of Invariants. NIPS 1991.

Steven J. Nowlan, Terrence J. Sejnowski: Filter Selection Model for Generating Visual Motion Signals. NIPS 1992.

Jutta Kretzberg, Terrence J. Sejnowski: Variability of postsynaptic responses depends non-linearly on the number of synaptic inputs. Neurocomputing 2003.

Odelia Schwartz, Terrence J. Sejnowski: Assignment of Multiplicative Mixtures in Natural Images. NIPS 2004.

Odelia Schwartz, Terrence J. Sejnowski: A Bayesian Framework for Tilt Perception and Confidence. NIPS 2005.

Outline

· Abstract

· Introduction

· ICA

· Two ICA architectures for representing faces

· Experimental results and conclusions

Abstract

A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images.

Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics.

Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons.

ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code.

Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance.

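Summary

Many existing face feature extraction algorithms use unsupervised statistical methods. These methods find a set of basis images and represent each face image as a linear combination of them.

Principal component analysis is a popular example, but the basis images it extracts depend only on second-order (pairwise) statistical relationships between pixels. In face recognition, important information may be contained in high-order statistical relationships among pixels, so features that are sensitive to this high-order information can be expected to give better recognition.

Independent component analysis (ICA), a generalization of PCA, is one such high-order method. The version used here is obtained by optimization, following the principle of optimal information transfer through sigmoid neurons.

ICA was applied to face images from the FERET database under two different representations: one treats the images as random variables and the pixels as outcomes; the other treats the pixels as random variables and the images as outcomes. The first gives the independent basis image representation; the second gives the factorial code representation.

For recognizing faces across days and changes in expression, both ICA representations performed better than PCA. A classifier that combined the two ICA representations gave the best performance.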

Introduction

PCA can only separate pairwise linear dependencies between pixels. High-order dependencies will still show in the joint distribution of PCA coefficients, and, thus, will not be properly separated.

In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels.

Independent component analysis (ICA), a generalization of PCA that is sensitive to these high-order relationships, is one such method.
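A minimal numerical sketch of this point, assuming NumPy and synthetic data: two independent Laplacian (super-Gaussian) sources mixed by a hypothetical symmetric matrix stand in for pixel data. PCA makes the coefficients uncorrelated, yet their squares stay correlated, which is a high-order dependence PCA cannot remove; the squares of the truly independent sources show no such correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two independent, unit-variance Laplacian (super-Gaussian) sources
S = rng.laplace(scale=1 / np.sqrt(2), size=(2, n))
A = np.array([[1.0, 0.5], [0.5, 1.0]])  # hypothetical mixing matrix
X = A @ S
X -= X.mean(axis=1, keepdims=True)

# PCA: project the centered data onto the eigenvectors of its sample covariance
cov = X @ X.T / n
_, eigvecs = np.linalg.eigh(cov)
Y = eigvecs.T @ X  # PCA coefficients

corr_linear = np.corrcoef(Y[0], Y[1])[0, 1]            # ~0: PCA removes pairwise correlation
corr_squared = np.corrcoef(Y[0] ** 2, Y[1] ** 2)[0, 1]  # clearly > 0: high-order dependence remains
corr_sources = np.corrcoef(S[0] ** 2, S[1] ** 2)[0, 1]  # ~0 for truly independent sources

print(f"{corr_linear:.3f} {corr_squared:.3f} {corr_sources:.3f}")
```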

Independent Component Analysis

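ICA Ⅰ: independent component analysis based on high-order statistics

Goal: decompose a signal into a number of mutually independent components.

Background: first proposed by Jutten and Herault in 1981 and 1991 (published in Signal Processing). Comon's 1994 paper was the first to set out the concept of independent components clearly.

Algorithms:

1. Information maximization (infomax): Bell and Sejnowski.

2. Minimization of mutual information (MMI): Shun-ichi Amari.

3. Fixed-point algorithm (FastICA): Erkki Oja and Aapo Hyvärinen; extracts signals by maximizing non-Gaussianity.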

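ICA Ⅱ

The ICA model regards the observed signals as a linear combination of hidden variables. The hidden variables are non-Gaussian and mutually independent.

The observed signal is modeled as

$$X = AS, \qquad X = (x_1, x_2, \ldots, x_N)^T, \qquad S = (s_1, s_2, \ldots, s_N)^T$$

where X is the observed signal, A is the unknown mixing matrix, and the $s_i$ are the independent components. ICA estimates an unmixing matrix W such that

$$Y = WX$$

recovers S (up to permutation and scaling of the components), i.e., the components $y_i$ of Y are mutually independent.

How can we judge whether the components are mutually independent? They are independent if and only if the joint density factorizes into the product of the marginals:

$$p(Y) = \prod_{i=1}^{N} p(y_i)$$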

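ICA Ⅲ

Mutual information:

$$I(Y) = KL\Big(p(Y)\,\Big\|\,\prod_{i=1}^{N} p(y_i)\Big) = \int p(Y)\log\frac{p(Y)}{\prod_{i=1}^{N} p(y_i)}\,dY$$

$I(Y) \ge 0$, with equality if and only if the components of Y are independent.

Choose the matrix W, with $Y = WX$ computed from X, so that $I(Y)$ is minimized.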
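As a rough illustration of $I(Y)$ for two components, the sketch below uses a plug-in histogram estimate of the KL divergence between the joint density and the product of the marginals. The histogram estimator, the 20 bins, and the synthetic Gaussian data are assumptions for illustration only; such density estimates are biased and scale poorly with dimension, which is the difficulty ICA Ⅳ addresses.

```python
import numpy as np

def mutual_information(y1, y2, bins=20):
    """Plug-in estimate of I(y1; y2) in nats from a 2-D histogram."""
    joint, _, _ = np.histogram2d(y1, y2, bins=bins)
    p_joint = joint / joint.sum()
    p1 = p_joint.sum(axis=1, keepdims=True)  # marginal of y1, shape (bins, 1)
    p2 = p_joint.sum(axis=0, keepdims=True)  # marginal of y2, shape (1, bins)
    nz = p_joint > 0
    # KL(p(Y) || p(y1) p(y2)), summed over non-empty cells
    return float(np.sum(p_joint[nz] * np.log(p_joint[nz] / (p1 * p2)[nz])))

rng = np.random.default_rng(7)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)
mi_independent = mutual_information(a, b)       # close to 0
mi_dependent = mutual_information(a, a + 0.1 * b)  # clearly positive
print(f"{mi_independent:.4f} {mi_dependent:.3f}")
```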

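ICA Ⅳ

Problem: minimizing $I(Y)$ directly requires estimating $p(y_i)$ and $p(Y)$; this estimation is both tedious and inaccurate.

Solution: introduce a nonlinearity at the output, which brings in high-order statistics automatically.

Infomax: after the output y, a nonlinear function is applied component by component,

$$r_i = g_i(y_i)$$

in place of explicitly estimating high-order statistics.

Goal: given suitable $g_i(y_i)$, adjust the matrix W to maximize the joint entropy $H(r)$ of the output $r = (r_1, r_2, \ldots, r_m)$. Because each $g_i$ is invertible and acts on a single component, $H(r) = \sum_i H(r_i) - I(y)$. When each $g_i$ matches the cumulative distribution function of the corresponding source (up to scaling and translation), maximizing $H(r)$ therefore minimizes the mutual information between the components of y.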

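ICAⅤ: infomax algorithm

Let X be an n-dimensional random vector representing the inputs, W an $n \times n$ invertible matrix, $U = WX$, and $Y = f(U)$ an n-D random variable representing the outputs of n neurons. Each component of $f = (f_1, \ldots, f_n)$ is an invertible squashing function mapping real numbers into the interval [0, 1]. Typically, the logistic (sigmoid) function is used:

$$f_i(u) = \frac{1}{1 + e^{-u}}$$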

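ICA Ⅵ: infomax algorithm

The goal is to maximize the mutual information between the input X and the network output Y. This is achieved by performing gradient ascent on the entropy of the output with respect to the weight matrix W:

$$\Delta W \propto \nabla_W H(Y) = (W^T)^{-1} + E\left(Y' X^T\right)$$

where $Y'_i = f''_i(U_i)/f'_i(U_i)$ is the ratio between the second and first partial derivatives of the activation function.

Computation of the matrix inverse can be avoided by employing the natural gradient, which amounts to multiplying the absolute gradient by $W^T W$:

$$\Delta W \propto \nabla_W H(Y)\, W^T W = \left(I + E\left(Y' U^T\right)\right) W$$

For the logistic function, $Y'_i = 1 - 2Y_i$.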
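The natural-gradient rule can be sketched as a batch loop. This is a minimal sketch, not the authors' implementation: it assumes NumPy, symmetric sphering of the input, two synthetic Laplacian sources, a hypothetical mixing matrix, an arbitrarily chosen learning rate and iteration count, and it replaces $E(Y'U^T)$ with a batch average.

```python
import numpy as np

def sphere(X):
    """Center rows of X and apply the symmetric sphering matrix E(XX^T)^(-1/2)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return E @ np.diag(eigvals ** -0.5) @ E.T @ Xc

def infomax_ica(X, lr=0.05, n_iter=3000):
    """Batch natural-gradient infomax with logistic outputs; X is (n, samples)."""
    n, m = X.shape
    W = np.eye(n)
    for _ in range(n_iter):
        U = W @ X
        Y = 0.5 * (1.0 + np.tanh(0.5 * U))  # logistic 1/(1+e^-u), written to avoid overflow
        Y_prime = 1.0 - 2.0 * Y             # f''(u)/f'(u) for the logistic
        # Delta W ∝ (I + E[Y' U^T]) W, with E[.] replaced by the batch mean
        W += lr * (np.eye(n) + Y_prime @ U.T / m) @ W
    return W

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))          # independent super-Gaussian sources
A = np.array([[1.0, 1.0], [-0.3, 1.0]])  # hypothetical mixing matrix
Z = sphere(A @ S)
W = infomax_ica(Z)
U = W @ Z

# |correlation| of each recovered component with each true source
C = np.abs(np.corrcoef(np.vstack([U, S]))[:2, 2:])
print(np.round(C, 3))
```

Recovered components match the sources only up to permutation, sign, and scale, so the check looks for one large correlation per row and per column of C.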

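Appendix: proof of information maximization Ⅰ

Proof: gradient ascent on the output entropy updates the weights iteratively,

$$W_{t+1} = W_t + \mu \left.\frac{\partial H(Y)}{\partial W}\right|_{W = W_t}$$

so we need $\partial H(Y)/\partial W$. Since $Y = f(WX)$ is an invertible transformation of X with Jacobian determinant $|\det W|\prod_{i=1}^{N} f_i'(u_i)$,

$$H(Y;W) = -\int p(X)\log\frac{p(X)}{|\det W|\prod_{i=1}^{N} f_i'(u_i)}\,dX = H(X) + \log|\det W|\int p(X)\,dX + \int p(X)\log\prod_{i=1}^{N} f_i'(u_i)\,dX$$

Using $\int p(X)\,dX = 1$ and $\int p(X)\log f_i'(u_i)\,dX = E_X[\log f_i'(u_i)]$:

$$H(Y;W) = H(X) + \log|\det W| + \sum_{i=1}^{N} E_X[\log f_i'(u_i)]$$

Proof of information maximization Ⅱ: differentiate the expression above with respect to W.

First term: $H(X)$ does not depend on W, so its derivative is 0.

Second term:

$$\frac{\partial \log|\det W|}{\partial W} = \frac{1}{\det W}\frac{\partial \det W}{\partial W} = (W^T)^{-1}$$

Third term: since $u_i = \sum_j w_{ij}x_j$,

$$\frac{\partial}{\partial w_{ij}}\log f_i'(u_i) = \frac{f_i''(u_i)}{f_i'(u_i)}\,x_j$$

so

$$\frac{\partial}{\partial W}\sum_{i=1}^{N} E_X[\log f_i'(u_i)] = E_X[\varphi(U)X^T], \qquad \varphi(U) = \left(\frac{f_1''(u_1)}{f_1'(u_1)}, \ldots, \frac{f_N''(u_N)}{f_N'(u_N)}\right)^T$$

Therefore

$$\frac{\partial H(Y)}{\partial W} = (W^T)^{-1} + E_X[\varphi(U)X^T]$$

Because $E_X$ is an average with respect to the density $p(X)$, the ensemble average can be dropped when the update is performed stochastically (one sample at a time):

$$\Delta W \propto (W^T)^{-1} + \varphi(U)X^T$$

∴ Multiplying by $W^T W$, and using $(W^T)^{-1}W^T W = W$ and $X^T W^T = U^T$, gives the natural-gradient form:

$$\Delta W \propto \frac{\partial H(Y)}{\partial W}W^T W = \left(I + \varphi(U)U^T\right)W$$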
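The derivative can be sanity-checked numerically. This sketch, assuming NumPy, logistic activations, random data, and a random invertible W (all hypothetical), compares a central finite-difference gradient of $\log|\det W| + \sum_i E_X[\log f_i'(u_i)]$ (that is, $H(Y;W) - H(X)$ with the expectation replaced by a sample mean) against $(W^T)^{-1} + E_X[\varphi(U)X^T]$.

```python
import numpy as np

def log_fprime(u):
    # log f'(u) for the logistic f, using f' = f(1 - f) and stable log-sigmoids
    return -np.logaddexp(0.0, u) - np.logaddexp(0.0, -u)

def entropy_objective(W, X):
    # H(Y;W) - H(X) = log|det W| + sum_i E_X[log f'(u_i)], with U = W X
    _, logabsdet = np.linalg.slogdet(W)
    return logabsdet + log_fprime(W @ X).sum(axis=0).mean()

def analytic_gradient(W, X):
    U = W @ X
    Y = 0.5 * (1.0 + np.tanh(0.5 * U))  # logistic f(u)
    phi = 1.0 - 2.0 * Y                  # f''(u)/f'(u) for the logistic
    return np.linalg.inv(W).T + phi @ X.T / X.shape[1]

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 500))
W = rng.standard_normal((3, 3)) + 3 * np.eye(3)

eps = 1e-6
num = np.zeros_like(W)
for i in range(3):
    for j in range(3):
        step = np.zeros_like(W)
        step[i, j] = eps
        num[i, j] = (entropy_objective(W + step, X) - entropy_objective(W - step, X)) / (2 * eps)

ana = analytic_gradient(W, X)
print(np.abs(num - ana).max())
```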

∴

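ICAⅦ: model preprocessing

Centering: center the training samples so that each becomes a zero-mean vector, i.e.,

$$X \leftarrow X - E(X)$$

Sphering (whitening): transform X so that its rows are mutually orthogonal (uncorrelated) and each has energy equal to 1; together with centering, this removes first- and second-order dependencies. With the eigendecomposition $E(XX^T) = U\Lambda U^T$ of the covariance of the centered data,

$$\tilde{X} = \Lambda^{-1/2}U^T X, \qquad E(\tilde{X}\tilde{X}^T) = \Lambda^{-1/2}U^T U\Lambda U^T U\Lambda^{-1/2} = I$$

or, equivalently up to a rotation,

$$\tilde{X} = U\Lambda^{-1/2}U^T X = E(XX^T)^{-1/2}X$$

Note: X is the original (observed) signal; Λ and U are, respectively, the eigenvalue matrix and the eigenvector matrix of the covariance matrix of X.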
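Both sphering forms can be checked in a few lines. This sketch assumes NumPy and synthetic correlated data with a nonzero mean (the mixing matrix and offsets are hypothetical); after centering, each sphering matrix should give an identity covariance.

```python
import numpy as np

def sphering_matrices(X):
    """X: (n_signals, n_samples). Returns centered data and two sphering matrices."""
    Xc = X - X.mean(axis=1, keepdims=True)                 # centering: X <- X - E(X)
    eigvals, U = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])   # E(XX^T) = U Lambda U^T
    W_pca = np.diag(eigvals ** -0.5) @ U.T                 # Lambda^(-1/2) U^T
    W_sym = U @ W_pca                                      # U Lambda^(-1/2) U^T = E(XX^T)^(-1/2)
    return Xc, W_pca, W_sym

rng = np.random.default_rng(1)
A = np.array([[2.0, 1.0, 0.0], [0.5, 1.0, 0.3], [0.0, 0.4, 1.5]])  # hypothetical mixing
X = A @ rng.standard_normal((3, 10_000)) + np.array([[5.0], [-2.0], [1.0]])

Xc, W_pca, W_sym = sphering_matrices(X)
Z_pca = W_pca @ Xc
Z_sym = W_sym @ Xc
cov_pca = Z_pca @ Z_pca.T / Z_pca.shape[1]
cov_sym = Z_sym @ Z_sym.T / Z_sym.shape[1]
print(np.round(cov_pca, 6))
```

The symmetric form is the one that changes the data least; the two results differ only by the rotation U.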

Comparison of ICA and PCA Ⅰ

If the sources are Gaussian, the likelihood of the data depends only on first- and second-order statistics. Second-order statistics capture the amplitude spectrum of images but not their phase spectrum (the autocorrelation function, a second-order statistic, is the inverse Fourier transform of the power spectrum). The high-order statistics capture the phase spectrum.

The phase spectrum, not the power spectrum, contains the structural information in images that drives human perception.

For a given sample of natural images, we can scramble their phase spectrum while maintaining their power spectrum. This will dramatically alter the appearance of the images but will not change their second-order statistics.
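The phase-scrambling experiment can be sketched as follows, assuming NumPy's FFT, a synthetic 60 × 50 image standing in for a face, and random phases taken from the FFT of a real white-noise image. Using the phase of a real signal keeps the conjugate symmetry needed for a real result. The circular autocorrelation, computed as the inverse FFT of the power spectrum, is unchanged, while the image itself changes substantially.

```python
import numpy as np

def phase_scramble(img, rng):
    """Keep the amplitude spectrum of img; replace its phase with that of white noise."""
    amplitude = np.abs(np.fft.fft2(img))
    # Phase of a real noise image's FFT is antisymmetric, so the inverse FFT stays real
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    return np.fft.ifft2(amplitude * np.exp(1j * noise_phase))

def autocorrelation(img):
    """Circular autocorrelation = inverse FFT of the power spectrum (second-order statistics)."""
    return np.fft.ifft2(np.abs(np.fft.fft2(img)) ** 2).real

# Hypothetical stand-in for a 60 x 50 face image: a smooth blob plus a vertical edge
yy, xx = np.mgrid[0:60, 0:50]
img = np.exp(-((yy - 30) ** 2 + (xx - 25) ** 2) / 200.0) + 0.5 * (xx > 25)

rng = np.random.default_rng(2)
scrambled_complex = phase_scramble(img, rng)
scrambled = scrambled_complex.real
print(np.abs(scrambled - img).max())
```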

Comparison of ICA and PCA: Example 1

Figure: original image; image with scrambled phase; reconstructions with the amplitude of the original face and the phase of the other face.

Comparison of ICA and PCA Ⅱ

Potential advantages of ICA over PCA:

1. It provides a better probabilistic model of the data, which better identifies where the data concentrate in n-dimensional space.

2. It uniquely identifies the mixing matrix (and hence W), up to permutation and scaling of the components.

3. It finds a not-necessarily orthogonal basis which may reconstruct the data better than PCA in the presence of noise.

4. It is sensitive to high-order statistics in the data, not just the covariance matrix.

Comparison of ICA and PCA: Example 2

Figure. Top: 3-D data distribution and corresponding PC and IC axes. Bottom left: distribution of the first PCA coordinates of the data. Bottom right: distribution of the first ICA coordinates of the data.

If only two components are allowed, ICA chooses a different subspace than PCA .

ICA representations of faces

Basis images (independent basis images): images are random variables and pixels are trials, so it makes sense to talk about independence of images or functions of images.

Factorial code: pixels are random variables and images are trials, so it makes sense to talk about independence of pixels or functions of pixels.

ICA representations of faces

Figure: two architectures for performing ICA on images. (a) Architecture I for finding statistically independent basis images: performing source separation on the face images produced IC images in the rows of U. (c) Architecture II for finding a factorial code: performing source separation on the pixels produced a factorial code in the columns of the output matrix, U.

IMAGE DATA

The data set contained images of 425 individuals. There were up to four frontal views of each individual: a neutral expression and a change of expression from session 1, and a neutral expression and a change of expression from session 2.

The algorithms were trained on a single frontal view of each individual and tested for recognition under three different conditions: same session with a different expression, different day with the same expression, and different day with a different expression.

Coordinates for eye and mouth locations were provided with the FERET database and were used to crop and scale the face images to 60 × 50 pixels.

ARCHITECTURE I: Statistically independent basis images

Running ICA at the full dimensionality (3000) was intractable under our present memory limitations. Instead, ICA was performed on a set of m linear combinations of the training images.

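PCA → ICA

Let $P_m$ be the matrix whose columns are the first m PC axes (eigenvectors), and let X contain the zero-mean training images, one image per row. The PC representation of X based on $P_m$ is defined as

$$R_m = X P_m$$

A minimum squared error approximation of X is obtained by

$$\hat{X} = R_m P_m^T = X P_m P_m^T$$

ICA is then performed on $P_m^T$. The ICA algorithm produced a matrix $W_I$ (the learned weights combined with the sphering matrix) such that

$$W_I P_m^T = U \quad\Longrightarrow\quad P_m^T = W_I^{-1} U$$

Therefore

$$\hat{X} = R_m P_m^T = R_m W_I^{-1} U$$

so the rows of the coefficient matrix

$$B = R_m W_I^{-1}$$

contain the coefficients of the linear combination of statistically independent basis images in U that reconstructs each face.

A representation for test images was obtained by using the PC representation based on the training images to obtain

$$R_{\text{test}} = X_{\text{test}} P_m$$

and then computing

$$B_{\text{test}} = R_{\text{test}} W_I^{-1}$$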
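The Architecture I algebra can be checked numerically. In this sketch (NumPy; random data in place of FERET images; image count and m are hypothetical, while 3000 = 60 × 50 pixels), $W_I$ is any invertible matrix standing in for the learned ICA weights, because the identity $\hat{X} = BU$ holds for every invertible $W_I$.

```python
import numpy as np

rng = np.random.default_rng(4)
n_images, n_pixels, m = 40, 3000, 10   # 3000 = 60 x 50 pixels; other sizes hypothetical
X = rng.standard_normal((n_images, n_pixels))
mean_img = X.mean(axis=0)
X -= mean_img                           # zero-mean training images, one per row

# PCA: columns of P_m are the first m PC axes (right singular vectors of X)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_m = Vt[:m].T                          # (n_pixels, m)
R_m = X @ P_m                           # PC representation, (n_images, m)
X_hat = R_m @ P_m.T                     # minimum squared error approximation

# Hypothetical stand-in for the learned ICA matrix: any invertible m x m matrix
W_I = rng.standard_normal((m, m)) + m * np.eye(m)
U = W_I @ P_m.T                         # rows of U: independent basis images
B = R_m @ np.linalg.inv(W_I)            # ICA coefficients, one row per image

# Test images reuse the training PC axes and W_I
X_test = rng.standard_normal((5, n_pixels)) - mean_img
R_test = X_test @ P_m
B_test = R_test @ np.linalg.inv(W_I)
```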

The PCA step was employed to serve two purposes:

1) to reduce the number of sources to a tractable number;

2) to provide a convenient method for calculating representations of test images.

Face Recognition Performance

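Face recognition performance was evaluated on the coefficient vectors by the nearest neighbor algorithm, using cosines as the similarity measure: each test vector was assigned the identity of the training vector with which it had the largest cosine.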
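A minimal sketch of nearest-neighbor classification with cosine similarity, assuming NumPy; the coefficient vectors and subject IDs below are hypothetical. Normalizing each vector to unit length first makes the dot product equal to the cosine, so only the direction of a coefficient vector matters, not its length.

```python
import numpy as np

def nearest_neighbor_cosine(train, train_labels, test):
    """Assign each test row the label of the training row with the largest cosine."""
    train_n = train / np.linalg.norm(train, axis=1, keepdims=True)
    test_n = test / np.linalg.norm(test, axis=1, keepdims=True)
    sims = test_n @ train_n.T           # (n_test, n_train) cosine similarities
    return np.asarray(train_labels)[np.argmax(sims, axis=1)]

train = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
labels = np.array([101, 102, 103])      # hypothetical subject IDs
test = np.array([[0.9, 0.1, 0.0], [0.0, 5.0, 0.5], [0.0, 0.2, 2.0]])
pred = nearest_neighbor_cosine(train, labels, test)
print(pred)
```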

Subspace Selection

Figure: discriminability of the ICA coefficients (solid lines) and of the PCA coefficients (dotted lines) for the three test cases. Components were sorted by the magnitude of the class discriminability r, the ratio of between-class to within-class variability of each coefficient.

The ICA coefficients consistently had greater class discriminability than the PCA coefficients.
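A sketch of the class discriminability ratio used to rank components. We assume the definition $r = \sigma_{\text{between}}/\sigma_{\text{within}}$ with $\sigma_{\text{between}} = \sum_j (\bar{x}_j - \bar{x})^2$ over class means and $\sigma_{\text{within}} = \sum_j \sum_i (x_{ij} - \bar{x}_j)^2$; the data below (10 hypothetical subjects with 4 images each, one identity-related and one noise-only coefficient) are synthetic.

```python
import numpy as np

def class_discriminability(B, labels):
    """r for each column of B (n_samples, n_components): between-class / within-class."""
    labels = np.asarray(labels)
    grand_mean = B.mean(axis=0)
    between = np.zeros(B.shape[1])
    within = np.zeros(B.shape[1])
    for c in np.unique(labels):
        Bc = B[labels == c]
        class_mean = Bc.mean(axis=0)
        between += (class_mean - grand_mean) ** 2
        within += ((Bc - class_mean) ** 2).sum(axis=0)
    return between / within

rng = np.random.default_rng(5)
labels = np.repeat(np.arange(10), 4)               # 10 hypothetical subjects, 4 images each
identity = rng.standard_normal((10, 1))[labels]    # per-subject offset, shape (40, 1)
B = np.hstack([identity + 0.1 * rng.standard_normal((40, 1)),  # identity-related coefficient
               rng.standard_normal((40, 1))])                   # noise-only coefficient
r = class_discriminability(B, labels)
order = np.argsort(r)[::-1]   # components sorted by r, most discriminative first
print(np.round(r, 3), order)
```

Keeping only the first k entries of `order` gives the subset selection by class discriminability described in the following slides.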


The ICA-defined subspace encoded more information about facial identity than the PCA-defined subspace.

Figure: improvement in face recognition performance for the ICA and PCA representations using subsets of components selected by the class discriminability r. The improvement is indicated by the gray segments at the top of the bars.

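ARCHITECTURE Ⅱ: A Factorial Face Code

The data matrix X is organized so that rows represent different pixels and columns represent different images.

The representational code for test images is obtained by

$$U_{\text{test}} = W X_{\text{test}}$$

In order to reduce the dimensionality of the input, ICA was performed on the first 200 PCA coefficients of the face images, $R_{200}$ (one row of coefficients per image), whose transpose forms the input data matrix.

Representation for the training images:

$$U = W_I R_{200}^T$$

Representation for test images, where $R_{\text{test}}$ holds the first 200 PCA coefficients of the test images computed with the training PC axes:

$$U_{\text{test}} = W_I R_{\text{test}}^T$$

This approach tends to generate basis images that look more face-like than the basis images generated by PCA.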
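The transposed roles in Architecture II can be sketched in the same way as Architecture I. Assumptions: NumPy, random data in place of FERET images, m = 10 instead of the paper's 200, and a random invertible matrix standing in for the learned $W_I$. Under these definitions $R^T = W_I^{-1}U$, so the pixel-space basis images are the columns of $P W_I^{-1}$ and each image's code is a column of U.

```python
import numpy as np

rng = np.random.default_rng(6)
n_images, n_pixels, m = 40, 3000, 10   # paper uses m = 200; sizes here are hypothetical
X = rng.standard_normal((n_images, n_pixels))
mean_img = X.mean(axis=0)
X -= mean_img

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:m].T                            # first m PC axes, (n_pixels, m)
R = X @ P                               # PCA coefficients, one row per image

# ICA input is R^T: coefficients are the variables, images are the observations
W_I = rng.standard_normal((m, m)) + m * np.eye(m)  # hypothetical stand-in for learned weights
U = W_I @ R.T                           # factorial code: column k codes image k

X_test = rng.standard_normal((5, n_pixels)) - mean_img
R_test = X_test @ P                     # test coefficients on the training PC axes
U_test = W_I @ R_test.T

# Basis images in pixel space: columns of P W_I^{-1}
basis = P @ np.linalg.inv(W_I)          # (n_pixels, m)
```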


There was no significant difference in performance between the two ICA representations.


Selection of subsets of components for the representation by class discriminability had little effect on the recognition performance using the ICA-factorial representation.

Examination of the ICA representations

Mutual information

DiscussionⅠ

In this paper, we explored one generalization of PCA: Bell and Sejnowski’s ICA algorithm. We explored two different architectures for developing image representations of faces using ICA. The purpose of the comparison in this paper was to examine ICA- and PCA-based representations under identical conditions.

Both ICA representations outperformed the “eigenface” (PCA) representation for recognizing images of faces sampled on a different day from the training images.

Moghaddam found no significant difference between PCA and ICA when Euclidean distance was used as the similarity measure.

DiscussionⅡ

It is an open question whether enhancements to the recognition stage would improve performance with PCA and ICA equally. It is possible that the factorial code representation may prove advantageous with more powerful recognition engines than nearest neighbor on cosines, such as a Bayesian classifier.

The research presented here found that face representations in which high-order dependencies are separated into individual coefficients gave superior recognition performance to representations that only separate second-order redundancies.

Thank you all!
