Face Recognition by Independent Component Analysis
Authors: Marian Stewart Bartlett, Javier R. Movellan, Terrence J. Sejnowski
Lecturer: Fang Fang
General Information
IEEE Transactions on Neural Networks, vol. 13, no. 6, November 2002
http://ieeexplore.ieee.org/Xplore/dynhome.jsp
Marian Stewart Bartlett
Assistant Research Professor at the Institute for Neural Computation, University of California-San Diego
Education: B.S. in mathematics and computer science from Middlebury College, 1988; Ph.D. in cognitive science and psychology from the University of California-San Diego, La Jolla, 1998.
Advisor: T. Sejnowski
Research interests:
· Image analysis through unsupervised learning
· Facial identity recognition
· Facial expression analysis
· Independent component analysis for pattern recognition
Homepage: http://mplab.ucsd.edu/~marni/index.html
Email: [email protected]
Publications
Book: Face Image Analysis by Unsupervised Learning. Foreword by Terrence J. Sejnowski.
Papers:
Bartlett, M.S., Littlewort, G.C. Automatic Recognition of Facial Actions in Spontaneous Expressions. Journal of Multimedia 1(6), pp. 22-35 (2006).
Bartlett, M.S., Littlewort, G.C. Fully automatic facial action recognition in spontaneous behavior. Automatic Face and Gesture Recognition (2006).
Bartlett, M.S., Littlewort, G.C. Recognizing Facial Expression: Machine Learning and Application to Spontaneous Behavior. CVPR 2005.
Littlewort, G.C., Bartlett, M.S. Dynamics of facial expression extracted automatically from video. CVPR 2004.
Bartlett, M.S., Littlewort, G.C. Real time face detection and expression recognition: Development and application to human-computer interaction. CVPR 2003.
Javier R. Movellan
Born in Palencia, Spain.
Research Associate with Carnegie-Mellon University, from 1989 to 1993.
Assistant Professor with the Department of Cognitive Science, University of California-San Diego (UCSD), from 1993 to 2001.
Research Associate with the Institute for Neural Computation and head of the Machine Perception Laboratory at UCSD.
Education: B.S. from the Universidad Autonoma de Madrid, Spain; Ph.D. from the University of California-Berkeley, 1989. He was a Fulbright Scholar at the same university.
Research interests:
· Development of perceptual computer interfaces
· Analyzing the statistical structure of natural signals in order to help understand how the brain works
Email: [email protected]
Publications
Javier R. Movellan: Local Algorithm to Learn Trajectories with Stochastic Neural Networks. NIPS 1993
Javier R. Movellan: Visual Speech Recognition with Stochastic Networks. NIPS 1994
Javier R. Movellan, Paul Mineiro: Bayesian Robustification for Audio Visual Fusion. NIPS 1997
Javier R. Movellan: A Learning Theorem for Networks at Detailed Stochastic Equilibrium. Neural Computation 1998
Javier R. Movellan, Paul Mineiro: Robust Sensor Fusion: Analysis and Application to Audio Visual Speech Recognition. Machine Learning 1998
Javier R. Movellan, Paul Mineiro: Partially Observable SDE Models for Image Sequence Recognition Tasks. NIPS 2000
Javier R. Movellan, Thomas Wachtler: Factorial Coding of Color in Primary Visual Cortex. NIPS 2002
Terrence J. Sejnowski
Joined the faculty of the Department of Biophysics at Johns Hopkins University in 1982.
Investigator with the Howard Hughes Medical Institute.
Professor at The Salk Institute for Biological Studies, where he directs the Computational Neurobiology Laboratory.
Professor of Biology at the University of California-San Diego.
Received the IEEE Neural Networks Pioneer Award in 2002.
Education: B.S. in physics from Case Western Reserve University; Ph.D. in physics from Princeton University, 1978.
Research interests: the long-range goal is to build linking principles from brain to behavior using computational models.
Email: [email protected]
Publications
Terrence J. Sejnowski, B. Yuhas: Combining Visual and Acoustic Speech Signals with a Neural Network Improves Intelligibility. NIPS 1989
Nicol N. Schraudolph, Terrence J. Sejnowski: Competitive Anti-Hebbian Learning of Invariants. NIPS 1991
Steven J. Nowlan, Terrence J. Sejnowski: Filter Selection Model for Generating Visual Motion Signals. NIPS 1992
Jutta Kretzberg, Terrence J. Sejnowski: Variability of postsynaptic responses depends non-linearly on the number of synaptic inputs. Neurocomputing 2003
Odelia Schwartz, Terrence J. Sejnowski: Assignment of Multiplicative Mixtures in Natural Images. NIPS 2004
Odelia Schwartz, Terrence J. Sejnowski: A Bayesian Framework for Tilt Perception and Confidence. NIPS 2005
Outline
· Abstract
· Introduction to ICA
· Two ICA architectures for representing faces
· Experimental results and conclusions
Abstract
A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images.
Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics.
Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons.
ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code.
Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance.
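Abstract (summary)
Many face feature extraction algorithms exist, and most rely on unsupervised statistical methods.
These methods find a set of face basis images and represent each face image as a linear combination of them.
PCA is one of the most popular of these methods, but its basis images depend only on second-order (pairwise) statistics between pixels. In face recognition, important information may lie in high-order relationships among pixels, so features sensitive to that information can be expected to recognize faces better.
ICA is one such high-order method and a generalization of PCA. The version used here is solved as an optimization derived from optimal information transfer through sigmoid neurons.
ICA was applied to FERET face images under two architectures: one treats images as random variables and pixels as outcomes, the other treats pixels as random variables and images as outcomes. The first is called the independent basis image representation; the second is called the factorial code representation.
For recognition across days and changes in expression, both ICA representations outperformed PCA. A classifier combining the two ICA representations gave the best results.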
Introduction
PCA can only separate pairwise linear dependencies between pixels. High-order dependencies will still show in the joint distribution of PCA coefficients, and, thus, will not be properly separated.
In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels.
Independent component analysis (ICA) is a generalization of PCA that is sensitive to these high-order relationships, not only to pairwise ones.
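The claim that PCA removes only pairwise dependencies can be checked on synthetic data. The following is a minimal sketch under stated assumptions, not the paper's experiment: the uniform sources, the mixing matrix `A`, and the helper `corr` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (uniform) sources with unit variance.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))

# Hypothetical mixing: a 45-degree rotation followed by axis scaling.
theta = np.pi / 4
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
A = np.diag([2.0, 1.0]) @ rot
x = A @ s

# PCA: project the centered data onto the eigenvectors of its covariance.
xc = x - x.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(xc))
z = eigvecs.T @ xc

def corr(a, b):
    """Pearson correlation between two 1-D samples."""
    return float(np.corrcoef(a, b)[0, 1])

second_order = corr(z[0], z[1])            # pairwise linear dependence
fourth_order = corr(z[0] ** 2, z[1] ** 2)  # one high-order dependence
print(second_order, fourth_order)
```

The PCA coefficients are uncorrelated, yet their squares remain clearly correlated, so the coefficients are not independent. This is the kind of residual high-order dependence that ICA is meant to remove.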
Independent Component Analysis
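ICA Ⅰ: independent component analysis based on high-order statistics
Goal: decompose a signal into several mutually independent components.
Background: first proposed by Jutten and Herault in 1981 and 1991 (published in Signal Processing). Comon's 1994 paper was the first to set out the concept of independent components clearly.
Algorithms:
1. Information maximization (infomax): Bell and Sejnowski
2. Minimization of mutual information (MMI): Shun-ichi Amari
3. Fixed-point algorithm (FastICA): Erkki Oja and Aapo Hyvärinen; maximizes the non-Gaussianity of the extracted signals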
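ICA Ⅱ
The ICA model treats the observed signals (the original signals) as linear combinations of hidden variables. The hidden variables are non-Gaussian and mutually independent.
The observed signals are written as
$$X = AS$$
where $X = (x_1, x_2, \ldots, x_n)^T$ is the observed signal and $S = (s_1, s_2, \ldots, s_n)^T$ holds the independent components. Equivalently,
$$S = WX$$
with the components $s_i$ mutually independent.
How can we tell whether the components are mutually independent?
$$p(Y) = \prod_{i=1}^{N} p(y_i)$$
holds if and only if the components of $Y$ are independent.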
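ICA Ⅲ
Mutual information:
$$I(Y) = KL\left(p(Y), \prod_{i=1}^{N} p(y_i)\right) = \int p(Y) \log \frac{p(Y)}{\prod_{i=1}^{N} p(y_i)}\, dY$$
$I(Y) \ge 0$, with equality if and only if the components of $Y$ are independent.
Choose the matrix $W$ so that $Y = WX$, computed from $X$, minimizes the expression above.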
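To make the mutual-information criterion concrete, here is a rough sketch that estimates $I(Y)$ for two components from a 2-D histogram. Everything in it is an assumption for illustration: the Laplacian sources, the mixing matrix `A`, the bin count, and the helper `mutual_information` (a simple plug-in estimator that is biased slightly upward).

```python
import numpy as np

def mutual_information(a, b, bins=20):
    """Plug-in histogram estimate of I(a; b) in nats."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    product = p_a @ p_b                     # product of the marginals
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / product[mask])))

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 50000))            # independent, non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])      # hypothetical mixing matrix
X = A @ S                                   # observed mixtures, X = AS
W = np.linalg.inv(A)                        # ideal unmixing matrix
Y = W @ X                                   # recovered components, Y = WX

mi_mixed = mutual_information(X[0], X[1])
mi_unmixed = mutual_information(Y[0], Y[1])
print(mi_mixed, mi_unmixed)
```

The estimate is clearly positive for the mixtures and close to zero once the ideal $W$ is applied. In practice $W$ is unknown, and estimating the densities this way does not scale beyond a few dimensions.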
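ICA Ⅳ
Problem: both $p(y_i)$ and $p(Y)$ must be estimated, which is laborious and inaccurate.
Solution: introduce a nonlinearity at the output, which brings in high-order statistics automatically.
Infomax
Key idea: after computing the output $y$, apply a nonlinear function $r_i = g_i(y_i)$ to each component in place of estimating high-order statistics.
Objective: given suitable $g_i(y_i)$, adjust the matrix $W$ so that the total entropy $H(r)$ of the output $r = (r_1, r_2, \ldots, r_m)$ is maximal. Maximal $H(r)$ means that the mutual information among the components of $y$ is minimal.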
ICA Ⅴ: infomax algorithm
Let $X$ be an $n$-dimensional random vector.
Let $W$ be an $n \times n$ invertible matrix.
$U = WX$, and $Y = f(U)$ is an $n$-D random variable representing the outputs of $n$ neurons.
Typically, the logistic (sigmoid) function is used: $f_i(u) = \dfrac{1}{1 + e^{-u}}$
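ICA Ⅵ: infomax algorithm
Information transfer is maximized by performing gradient ascent on the entropy of the output with respect to the weight matrix $W$:
$$\Delta W \propto \frac{\partial H(Y)}{\partial W} = (W^T)^{-1} + E[\phi(U) X^T]$$
where $\phi_i(u_i) = f_i''(u_i) / f_i'(u_i)$ is the ratio between the second and first partial derivatives of the activation function.
Computation of the matrix inverse can be avoided by employing the natural gradient, which amounts to multiplying the absolute gradient by $W^T W$:
$$\Delta W \propto \frac{\partial H(Y)}{\partial W} W^T W = \left(I + E[\phi(U) U^T]\right) W$$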
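The natural-gradient rule can be sketched in a few lines of NumPy. This is a minimal batch version under stated assumptions, not the authors' implementation: it uses the logistic nonlinearity (for which the ratio of second to first derivative is $1 - 2y$), whitens the data first, and the function names, learning rate, iteration count, sources and mixing matrix are all hypothetical.

```python
import numpy as np

def whiten(X):
    """Center the rows of X, then decorrelate them to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc))
    Wz = np.diag(eigvals ** -0.5) @ eigvecs.T
    return Wz @ Xc, Wz

def infomax_ica(X, n_iter=1000, lr=0.1):
    """Batch natural-gradient infomax with a logistic nonlinearity.

    X holds one variable per row and one observation per column and is
    assumed to be whitened already.
    """
    n, n_obs = X.shape
    W = np.eye(n)
    for _ in range(n_iter):
        U = W @ X
        Y = 1.0 / (1.0 + np.exp(-U))      # logistic outputs
        Yp = 1.0 - 2.0 * Y                # f''/f' for the logistic function
        # Natural gradient step: (I + E[Yp U^T]) W, expectation over samples.
        W += lr * (np.eye(n) + (Yp @ U.T) / n_obs) @ W
    return W

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))           # super-Gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])    # hypothetical mixing matrix
Xw, Wz = whiten(A @ S)
W_I = infomax_ica(Xw) @ Wz                # full unmixing matrix
P = W_I @ A                               # ideally a scaled permutation
print(np.round(P, 2))
```

If separation succeeds, each row of `P` has one dominant entry, meaning each output recovers one source up to scale and order. The logistic nonlinearity suits super-Gaussian sources such as the Laplacian ones assumed here.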
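Appendix: derivation of the infomax rule Ⅰ
Update rules (ordinary gradient, then natural gradient):
$$W_{n+1} - W_n \propto \frac{\partial H(y)}{\partial W_n}, \qquad W_{n+1} - W_n \propto \frac{\partial H(y)}{\partial W_n} W_n^T W_n$$
Proof:
$$H(Y, W) = H(X) + \int p(X) \log\left(|W| \prod_{i=1}^{N} f_i'(u_i)\right) dx$$
$$= H(X) + \log|W| \int p(X)\, dx + \sum_{i=1}^{N} \int p(X) \log f_i'(u_i)\, dx$$
Since $\int p(X)\, dx = 1$ and $\int p(X) \log f_i'(u_i)\, dx = E_X[\log f_i'(u_i)]$,
$$H(Y, W) = H(X) + \log|W| + \sum_{i=1}^{N} E_X[\log f_i'(u_i)]$$
Derivation of the infomax rule Ⅱ
Differentiate the expression above with respect to $W$. The first term, $H(X)$, does not depend on $W$.
Second term:
$$\frac{\partial \log|W|}{\partial W} = \frac{1}{\det W} \frac{\partial \det W}{\partial W} = (W^T)^{-1}$$
Third term:
$$\frac{\partial}{\partial W} \sum_{i=1}^{N} E_X[\log f_i'(u_i)] = E[\phi(U) X^T], \qquad \phi(U) = \left[\frac{f_1''(u_1)}{f_1'(u_1)}, \ldots, \frac{f_N''(u_N)}{f_N'(u_N)}\right]^T$$
because
$$\frac{\partial}{\partial w_{ij}} \log f_i'(u_i) = \frac{1}{f_i'(u_i)} \frac{\partial f_i'(u_i)}{\partial w_{ij}} = \frac{f_i''(u_i)}{f_i'(u_i)}\, x_j$$
Therefore
$$\frac{\partial H(Y)}{\partial W} = (W^T)^{-1} + E[\phi(U) X^T]$$
Because $E_X$ is the mean under the probability density $p(X)$, the ensemble average can be dropped in stochastic (sample-by-sample) processing:
$$\frac{\partial H(Y)}{\partial W} \approx (W^T)^{-1} + \phi(U) X^T$$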
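As a worked instance of the derivation, consider the logistic activation. The steps below only apply standard calculus to $f(u) = 1/(1+e^{-u})$; the symbol $\phi$ for the ratio $f''/f'$ is a notational assumption.

```latex
\begin{align}
y &= f(u) = \frac{1}{1 + e^{-u}} \\
f'(u) &= y\,(1 - y) \\
f''(u) &= y\,(1 - y)\,(1 - 2y) \\
\phi(u) &= \frac{f''(u)}{f'(u)} = 1 - 2y \\
\frac{\partial H(Y)}{\partial W} &= (W^T)^{-1} + E\left[(1 - 2Y)\, X^T\right]
\end{align}
```

With this choice the update needs only the outputs $Y$, so no derivative of the nonlinearity has to be evaluated separately.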
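ICA Ⅶ: model preprocessing
Centering: center the training samples so that each sample becomes a zero-mean vector, i.e.
$$S = X - E(X)$$
Sphering (whitening): make the rows of $X$ mutually orthogonal with equal energy 1, which removes first- and second-order correlations. The whitened data, again written $X$, is
$$X = \Lambda^{-1/2} U^T S$$
so that
$$XX^T = \Lambda^{-1/2} U^T S S^T U \Lambda^{-1/2} = \Lambda^{-1/2} U^T (SS^T) U \Lambda^{-1/2} = I$$
Note: $S$ is the original signal, and $\Lambda$ and $U$ are the eigenvalue matrix and the eigenvector matrix of the covariance matrix of $S$, so $SS^T = U \Lambda U^T$.
Alternatively, from the singular value decomposition $S = U_0 \Lambda_0 V^T$:
$$SS^T = U_0 \Lambda_0 V^T V \Lambda_0 U_0^T = U_0 \Lambda_0^2 U_0^T$$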
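The centering and sphering steps can be sketched as follows. This is an illustrative example on synthetic data: the matrix `mix` and the sample size are arbitrary assumptions, and the covariance is normalized by the number of samples, whereas the slide writes $XX^T$ without that factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated signals: one variable per row, one sample per column.
mix = np.array([[2.0, 1.0, 0.0],
                [0.0, 1.0, 0.5],
                [0.0, 0.0, 0.3]])
S = mix @ rng.normal(size=(3, 1000))

# Centering: subtract the mean of each variable.
S = S - S.mean(axis=1, keepdims=True)

# Sphering: rotate onto the covariance eigenvectors and rescale each
# direction by the inverse square root of its eigenvalue.
cov_S = S @ S.T / S.shape[1]
eigvals, U = np.linalg.eigh(cov_S)
X = np.diag(eigvals ** -0.5) @ U.T @ S

cov_X = X @ X.T / X.shape[1]
print(np.round(cov_X, 6))   # identity matrix up to rounding
```

After whitening, ICA only has to find a rotation, because all second-order structure has already been removed.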
Comparison of ICA and PCA Ⅰ
If the sources are Gaussian, the likelihood of the data depends only on first- and second-order statistics.
Second-order statistics capture the amplitude spectrum of images but not their phase spectrum. The high-order statistics capture the phase spectrum.
The phase spectrum, not the power spectrum, contains the structural information in images that drives human perception.
For a given sample of natural images, we can scramble their phase spectrum while maintaining their power spectrum. This dramatically alters the appearance of the images but does not change their second-order statistics.
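Phase scrambling is easy to reproduce with a 2-D FFT. The sketch below is an assumption-laden illustration: a random array stands in for a natural image, and the replacement phase is taken from a second real-valued noise image so that the scrambled spectrum stays conjugate-symmetric and the result is real.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))              # stand-in for a natural image

spectrum = np.fft.fft2(image)
amplitude = np.abs(spectrum)

# Borrow the phase of another real image; this keeps the symmetry that a
# real-valued image requires of its spectrum.
random_phase = np.angle(np.fft.fft2(rng.random((64, 64))))
scrambled = np.real(np.fft.ifft2(amplitude * np.exp(1j * random_phase)))

# The amplitude (and hence power) spectrum is unchanged by the scrambling.
new_amplitude = np.abs(np.fft.fft2(scrambled))
print(np.allclose(new_amplitude, amplitude))
```

The scrambled image shares its power spectrum, and therefore its second-order statistics, with the original, while its pixel values look entirely different.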
Comparison of ICA and PCA: example 1
[Figure: original image; the same image with scrambled phase; reconstructions with the amplitude of the original face and the phase of the other face.]
Comparison of ICA and PCA Ⅱ
Advantages of ICA over PCA:
It provides a better probabilistic model of the data, which better identifies where the data concentrate in $n$-dimensional space.
It uniquely identifies the mixing matrix $W$.
It finds a not-necessarily orthogonal basis, which may reconstruct the data better than PCA in the presence of noise.
It is sensitive to high-order statistics in the data, not just the covariance matrix.
Comparison of ICA and PCA: example 2
[Figure. Top: 3-D data distribution and corresponding PC and IC axes. Bottom left: distribution of the first PCA coordinates of the data. Bottom right: distribution of the first ICA coordinates of the data.]
If only two components are allowed, ICA chooses a different subspace than PCA.
ICA representations of faces
Basis images (independent basis images): images are random variables and pixels are trials. It makes sense to talk about independence of images or functions of images.
Factorial code: pixels are random variables and images are trials. It makes sense to talk about independence of pixels or functions of pixels.
[Figure: Two architectures for performing ICA on images.]
(a) Architecture I for finding statistically independent basis images. Performing source separation on the face images produced IC images in the rows of U.
(c) Architecture II for finding a factorial code. Performing source separation on the pixels produced a factorial code in the columns of the output matrix, U.
Image data
The data set contained images of 425 individuals from the FERET database.
There were up to four frontal views of each individual: a neutral expression and a change of expression from session 1, and a neutral expression and a change of expression from session 2.
Training used a single frontal view of each individual; recognition was tested under three different conditions.
Coordinates for eye and mouth locations were provided with the FERET database. The images were cropped and scaled to 60 × 50 pixels.
Architecture I: statistically independent basis images
Running ICA directly at a dimensionality of 3000 was intractable under the authors' memory limitations.
Instead, ICA was performed on a set of m linear combinations of the images.
From PCA to ICA
The PC representation of the set of zero-mean images in $X$ based on $P_m$ (the first $m$ principal component axes, one per column) is defined as
$$R_m = X P_m$$
A minimum squared error approximation of $X$ is obtained by
$$\hat{X} = R_m P_m^T$$
The ICA algorithm produced a matrix $W_I$ such that
$$W_I P_m^T = U \quad\Rightarrow\quad P_m^T = W_I^{-1} U$$
Therefore
$$\hat{X} = R_m P_m^T = R_m W_I^{-1} U$$
Coefficients:
$$B = R_m W_I^{-1}$$
A representation for test images was obtained by using the PC representation based on the training images to obtain
$$R_{test} = X_{test} P_m$$
and then computing
$$B_{test} = R_{test} W_I^{-1}$$
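The matrix bookkeeping above can be traced with toy data. In this sketch the sizes are arbitrary, the images are random numbers, each image row is centered (an assumption about what "zero-mean images" means), and `W_I` is a random invertible stand-in for the matrix an ICA algorithm would learn, so the output shows shapes and identities only, not real basis images.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_pixels, m = 40, 300, 10       # toy sizes, not the paper's

# One image per row; each image is made zero-mean (assumed convention).
X = rng.normal(size=(n_images, n_pixels))
X = X - X.mean(axis=1, keepdims=True)

# P_m: first m principal component axes in its columns (via SVD).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_m = Vt[:m].T                            # shape (n_pixels, m)

R_m = X @ P_m                             # PC representation
X_hat = R_m @ P_m.T                       # minimum squared error approximation

# Stand-in for the learned ICA matrix (hypothetical, well conditioned).
W_I = np.eye(m) + 0.1 * rng.normal(size=(m, m))
U = W_I @ P_m.T                           # "independent" images in the rows
B = R_m @ np.linalg.inv(W_I)              # ICA coefficients for training images

# Test images reuse P_m and W_I from training.
X_test = rng.normal(size=(5, n_pixels))
X_test = X_test - X_test.mean(axis=1, keepdims=True)
R_test = X_test @ P_m
B_test = R_test @ np.linalg.inv(W_I)

print(B.shape, B_test.shape, np.allclose(B @ U, X_hat))
```

The final check confirms the identity $\hat{X} = R_m W_I^{-1} U$: the rows of `B` are the coefficients that combine the rows of `U` into the approximated images.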
The PCA step served two purposes:
1) to reduce the number of sources to a tractable number
2) to provide a convenient method for calculating representations of test images.
Face Recognition Performance
Face recognition performance was evaluated on the coefficient vectors using the nearest neighbor algorithm, with cosines as the similarity measure.
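A nearest neighbor classifier with cosine similarity can be sketched as below. The function name and the tiny coefficient vectors are made up for illustration; they are not data from the paper.

```python
import numpy as np

def nearest_neighbor_cosine(train, labels, test):
    """Label each test vector with the label of its most similar training
    vector, where similarity is the cosine of the angle between vectors."""
    train_n = train / np.linalg.norm(train, axis=1, keepdims=True)
    test_n = test / np.linalg.norm(test, axis=1, keepdims=True)
    similarity = test_n @ train_n.T       # rows: test items, columns: training items
    return labels[np.argmax(similarity, axis=1)]

# Hypothetical coefficient vectors, one row per face image.
train = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
labels = np.array([10, 20, 30])           # identity of each training face
test = np.array([[5.0, 1.0, 0.0],
                 [0.1, 0.2, 3.0]])

predicted = nearest_neighbor_cosine(train, labels, test)
print(predicted)
```

Because the cosine ignores vector length, a test vector is matched by direction only, which is why the first test vector matches identity 10 despite its larger magnitude.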
Subspace Selection
[Figure: Discriminability of the ICA coefficients (solid lines) and discriminability of the PCA components (dotted lines) for the three test cases. Components were sorted by the magnitude of r.]
The ICA coefficients consistently had greater class discriminability than the PCA coefficients.
The ICA-defined subspace encoded more information about facial identity than the PCA-defined subspace.
[Figure: Improvement in face recognition performance for the ICA and PCA representations using subsets of components selected by the class discriminability r. The improvement is indicated by the gray segments at the top of the bars.]
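The slides do not define r. One common way to score class discriminability, assumed here purely for illustration, is the ratio of between-class to within-class variability computed separately for each coefficient. The function name and the toy data are hypothetical.

```python
import numpy as np

def class_discriminability(coeffs, labels):
    """Per-coefficient ratio of between-class to within-class variability
    (assumed definition: squared deviations of class means from the grand
    mean, divided by summed squared deviations within each class)."""
    grand_mean = coeffs.mean(axis=0)
    between = np.zeros(coeffs.shape[1])
    within = np.zeros(coeffs.shape[1])
    for c in np.unique(labels):
        members = coeffs[labels == c]
        class_mean = members.mean(axis=0)
        between += (class_mean - grand_mean) ** 2
        within += ((members - class_mean) ** 2).sum(axis=0)
    return between / within

# Toy data: coefficient 0 separates the two identities, coefficient 1 does not.
coeffs = np.array([[1.0,  0.3],
                   [1.2, -0.4],
                   [5.0,  0.2],
                   [5.2, -0.1]])
labels = np.array([0, 0, 1, 1])

r = class_discriminability(coeffs, labels)
order = np.argsort(r)[::-1]               # most discriminable first
print(r, order)
```

Sorting by `r` and keeping the leading components is one way to select a subspace, which is the kind of selection the figures above evaluate.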
Architecture II: a factorial face code
The data matrix $X$ is organized so that rows represent different pixels and columns represent different images.
To reduce the dimensionality of the input, ICA was performed on the first 200 PCA coefficients of the face images.
Representation for the training images: $U = W_I R_m^T$
Representation for test images: $U_{test} = W_I R_{test}^T$
This approach tends to generate basis images that look more face-like than the basis images generated by PCA.
Face Recognition Performance
There was no significant difference in the performances of the two ICA representations.
Selecting subsets of components by class discriminability had little effect on recognition performance with the ICA-factorial representation.
Examination of the ICA representations
Mutual information
Discussion Ⅰ
In this paper, we explored one generalization of PCA: Bell and Sejnowski's ICA algorithm.
We explored two different architectures for developing image representations of faces using ICA.
The purpose of the comparison in this paper was to examine ICA- and PCA-based representations under identical conditions.
Both ICA representations outperformed the "eigenface" representation for recognizing images of faces sampled on a different day from the training images.
There was no significant difference between PCA and ICA using Euclidean distance as the similarity measure (Moghaddam).
Discussion Ⅱ
It is an open question as to whether these techniques would enhance performance with PCA and ICA equally.
It is possible that the factorial code representation may prove advantageous with more powerful recognition engines than nearest neighbor on cosines, such as a Bayesian classifier.
The research presented here found that face representations in which high-order dependencies are separated into individual coefficients gave superior recognition performance to representations which only separate second-order redundancies.
Thanks All!