CHAPTER 1
INTRODUCTION
Face recognition is a complex image-processing problem in real
world applications with multifaceted effects of illumination, occlusion,
expression, pose variation and imaging condition on the live images. Facial
analysis includes face detection prior to face recognition. Face detection finds
the position of the face in a given image. Face recognition identifies the given
images as a particular person by comparing with known structured properties
of the faces in the database and they are used commonly in most of the
computer vision applications. These images have some common properties
like same resolution, same facial feature components, similar eye alignment,
etc. These images are referred to as standard images. Face detection detects the
faces and extracts the face images which include the major facial features
used for distinguishing the faces that include eyes, eyebrows, nose, and
mouth. Face recognition compares the test image with the standard images
using the common features extracted. A face recognition system is one of the
biometric information processing systems. Compared to other biometric
information processing systems (e.g., fingerprint, iris scanning, signature),
a face recognition system has a larger working range. Face recognition involves
recognizing people using the essential characteristics of their faces, and it
is used in many authentication applications. Compared to other
biometrics, such as fingerprint, DNA, or voice, face recognition is more
natural, nonintrusive and can be used without the cooperation of the subject.
Due to recent advances in pattern recognition and the use of powerful
computers, face recognition systems have been extended to real-time operation and
achieve satisfactory performance under controlled conditions. This leads to
many potential applications. Automated face recognition draws on
techniques from different research fields, such as computer vision, image
processing, pattern recognition, and machine learning. Computer vision
applications are widely used in digital cameras, mobile phones, security
systems, cars, toys, hospitals, and airports.
The primary applications of face recognition are:
• Person verification (one-to-one matching): The face image of an
unknown individual is presented along with a claim of
identity to establish whether the individual is who he or she
claims to be.
• Person identification (one-to-many comparison): The face
image of an unknown individual is compared to the face
images of known individuals in the database to establish the
identity of the person.
Face recognition can be used for these two purposes, and it has
several application areas. A few of these applications are stated below.
• Security
Face recognition systems are used to control access to buildings,
airports, harbors, ATMs, and border checkpoints, and for network security and
email authentication on multimedia workstations.
• Surveillance
A large number of CCTVs are used to monitor and look for known
criminals, drug offenders, etc. and on locating such cases the authorities can
be notified.
• General identity verification
In the case of electoral registration, banking, electronic commerce,
identifying newborns, national IDs, passports, drivers’ licenses, employee IDs
face recognition can be used.
• Criminal justice systems
Facial recognition systems are also useful for mug-shot and booking
systems, post-event analysis, and forensic analysis.
• Image database investigations
Database investigations, such as searching records of licensed drivers, benefit users;
they are also used for identifying missing children and
immigrants.
• “Smart Card” applications
Also in certain cases the face-images can be stored in a smart card,
barcode or magnetic stripe, authentication of which is performed by matching
the live image with the stored template.
In multimedia environments with adaptive human-computer
interfaces, a face recognition system can also be part of ubiquitous or
context-aware systems, for example for behavior monitoring at childcare or
elderly care centers, or for recognizing customers and assessing their needs.
Face recognition systems can also be used in video indexing (labeling faces in
video) and in witness face reconstruction.
1.1 BIOMETRIC RECOGNITION
Biometric recognition refers to the automatic recognition of
individuals based on their physiological and/or behavioral characteristics.
Humans intuitively use some common characteristics to recognize each
other. Any human physiological or behavioral measurement (Jain et al
2005) can be used as a biometric characteristic if it satisfies the following
requirements:
• Universality, each person should have the characteristic.
• Distinctiveness, each person should be sufficiently different in
terms of the characteristic.
• Permanence, the characteristic should not vary over a period
of time.
• Collectability, the characteristic should be quantitatively
measurable.
In a practical system, the following parameters are also important:
• Performance, which refers to the recognition accuracy and
speed.
• Acceptability, which indicates how willing people are to accept its use in everyday life.
• Circumvention, which indicates how easily the system can be
fooled using fraudulent methods.
Any biometric system operates in one of two modes:
• Verification
• Identification
In the verification mode, the system compares the captured
biometric data with the template of the claimed identity stored in the
database, that is, a one-to-one
comparison. This mode is typically used for positive recognition.
In the identification mode, the system recognizes the user by
searching the templates of all users in the database. In this case, the
comparison is one-to-many. This mode is typical of negative recognition
applications.
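The difference between the two modes can be illustrated with a minimal sketch. The template store, the user names, the three-element feature vectors and the distance threshold below are all hypothetical and chosen only for illustration; a real system would use features produced by its own extraction module and a threshold tuned on its own data.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(probe, claimed_id, templates, threshold):
    """One-to-one: compare the probe only with the claimed identity's template."""
    return distance(probe, templates[claimed_id]) <= threshold

def identify(probe, templates, threshold):
    """One-to-many: search all templates; return the closest id, or None."""
    best_id = min(templates, key=lambda uid: distance(probe, templates[uid]))
    if distance(probe, templates[best_id]) <= threshold:
        return best_id
    return None

# Hypothetical enrolled templates (toy 3-dimensional feature vectors).
templates = {
    "alice": [0.10, 0.80, 0.30],
    "bob": [0.90, 0.20, 0.50],
}
probe = [0.12, 0.79, 0.33]

print(verify(probe, "alice", templates, threshold=0.1))
print(verify(probe, "bob", templates, threshold=0.1))
print(identify(probe, templates, threshold=0.1))
```

Verification touches a single template, whereas identification scans the whole database, so its cost grows with the number of enrolled users. Note that this sketch thresholds a distance (smaller means more similar), not a similarity score.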
A typical biometric system includes four main modules:
• The sensor module that captures the biometric data.
• The feature extraction module where the features are extracted
by processing the data.
• The matcher module in which the features extracted are
compared to the features stored in templates. The decision
making module may also be an integral part of matcher
module where the user’s identity is confirmed (verification) or
established (identification).
• The system database module is used to store the biometric
templates.
In any biometric system the main focus is on the feature extraction
module. A number of biometric systems are used in real applications, each
with its strengths and weaknesses; the choice mostly depends on the application.
Some of the commonly used biometric systems include:
DNA System:
The DNA is a unique code for one’s individuality but it is used mostly in
forensic applications. It’s not useful in automatic real-time recognition
applications.
Ear Recognition System:
The ear recognition is based on matching the distance of salient points on the
pinna, but it is not very distinctive in establishing the identity of a user.
Fingerprint System:
Fingerprint recognition offers very high matching accuracy at a reasonable price,
but it requires a large amount of computational resources, and the
fingerprints of elderly persons may change or fail to be recognized.
Gait System:
Gait is the peculiar way one walks and is a complex biometric. It is
not very distinctive, but it can be used in low-security applications.
This method is computationally expensive.
Hand Geometry Recognition System:
The hand geometry recognition system is very simple, easy to use and
relatively cheap, but the geometry is not so distinctive. It can be used in
verification mode. It is one of the earliest automated biometric systems.
Iris System:
The iris is the annular region of the eye bounded by the pupil and the sclera
(the white of the eye). The iris texture carries very distinctive information
useful for recognition. Early iris-based recognition systems required
considerable user participation and were expensive, but newer systems
have become more user-friendly and cost-effective.
Retinal Scan System:
The retinal scan is one of the most secure biometrics, as the retinal
vasculature is not easy to change
or replicate and is characteristic of each individual and each eye. The
image acquisition requires the user to look into an eyepiece and focus on a
specific spot so that a predetermined part of the retinal vasculature can be
imaged. It therefore requires a high degree of cooperation from the subject and
contact with the eyepiece. These factors can affect the public acceptability of
retinal biometrics.
Signature Recognition System:
The signature has been accepted as a verification method, but it could change
over a period of time and it can be influenced by physical and emotional
conditions of the subject.
Voice Recognition System:
The voice is another biometric which is not very distinctive and it changes a
lot over a period of time and is not useful in large-scale identification.
The samples of the same biometric characteristic from the same
person may not always be equal due to imperfect imaging conditions, changes
in the user’s characteristic, ambient conditions and user’s interaction with the
sensor.
The two major types of error in a biometric verification system are the
false match and the false non-match. When biometric measurements from two
different persons are recognized to be from the same person, the error is
called a false match; when two biometric measurements from the same
person are recognized to be from two different persons, the
error is called a false non-match. They are also called false
acceptance and false rejection. There is a tradeoff between the
false match rate and the false non-match rate; in fact, both are
functions of the system threshold t. If t is decreased to make
the system more tolerant to input variations and noise, the
false match rate increases; if t is increased to
make the system more secure, the false non-match rate
increases.
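The threshold tradeoff can be made concrete with a small sketch. It assumes that the matcher outputs a similarity score and accepts a comparison as a match when the score is at least t; the score lists below are made up for illustration and are not measured data.

```python
def error_rates(genuine_scores, impostor_scores, t):
    """Return (false match rate, false non-match rate) at similarity threshold t.

    A comparison is accepted as a match when its similarity score is >= t.
    """
    false_matches = sum(1 for s in impostor_scores if s >= t)
    false_non_matches = sum(1 for s in genuine_scores if s < t)
    fmr = false_matches / len(impostor_scores)
    fnmr = false_non_matches / len(genuine_scores)
    return fmr, fnmr

# Made-up similarity scores in [0, 1], for illustration only.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # same-person comparisons
impostor = [0.12, 0.35, 0.48, 0.70, 0.22]  # different-person comparisons

for t in (0.3, 0.5, 0.8):
    fmr, fnmr = error_rates(genuine, impostor, t)
    print(f"t={t:.1f}  FMR={fmr:.2f}  FNMR={fnmr:.2f}")
```

With these toy scores, the low threshold accepts several impostor comparisons and rejects no genuine ones, while the high threshold does the opposite, which is the tradeoff described above.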
Two more recognition errors include the failure to capture and the
failure to enroll; the first corresponds to the number of times the biometric
device fails to automatically capture a sample, and the second denotes the
number of times users cannot enroll in the recognition system.
The applications of biometrics can be divided into three main
groups, namely commercial, government and forensic
applications. Commercial applications, which involve positive recognition,
can work in both verification and identification
modes, whereas government and forensic applications, which involve negative
recognition, mostly require identification.
Another important aspect of a biometric system is the
interaction with the user and his or her privacy. If the
interaction is easy and comfortable, the system will be easily
accepted, and if little cooperation and participation is
required of the user, the system may be perceived as more
convenient. On the other hand, systems that do not require
user participation may be perceived as a threat to privacy.
In addition to these applications, the underlying techniques in the
current face recognition system have also been modified and used for related
applications such as gender classification, expression recognition and pose
recognition and each of these has its utility in various domains.
Facial expression recognition can be utilized in the field of
medicine for intensive care monitoring while facial feature recognition and
detection can be exploited for tracking a vehicle driver’s eyes and thus
monitoring his fatigue, as well as for stress detection. Face recognition is also
being used in conjunction with other biometrics such as speech, iris,
fingerprint and ear and gait recognition in order to enhance the recognition
performance of these systems. This has been made possible by the
accessibility of robust algorithms.
Face detection is a straightforward task for humans. Indeed,
humans recognize faces very quickly compared to face recognition
systems. Nevertheless, this task is challenging for a machine because the
face image captured with a vision sensor is altered by pose variations
(out-of-plane rotation) due to changes in camera angle, illumination, facial
expressions, and occlusions (glasses, sunglasses, hat). This makes the face
recognition process more complex. The face detection problem is one of the
oldest problems in computer vision, dating back as early as 1972. Though the
performance of the face detection systems developed in the past few years
has been good, face detection is still an interesting problem,
because pattern recognition by computer systems is not yet completely
understood, in contrast to the performance of a human being. The first
step for a face recognition system is to acquire an image from a camera. The second
step is to detect the face region in the acquired image. The third step is face
recognition, which takes the face images output by the detection stage. The
various steps of the face recognition system are given in Figure 1.1.
Figure 1.1 Stages involved in face processing (Input Image → Face Detection → Face Recognition)
There are many closely related problems of face detection. A
general view of different face processing problems is shown in Figure 1.2.
Figure 1.2 Face Recognition problems (Input Image: pose variation, illumination variation, expression variation, presence of occlusion)
Face tracking methods continuously estimate the location and
possibly the orientation of a face in an image sequence, while facial
expression recognition concerns identifying the affective states (happy, sad,
and surprised) of humans. For any of these systems the first step is face
detection.
1.2 FACTORS THAT IMPINGE ON FACE RECOGNITION
SYSTEM
The factors affecting the overall performance of the face
recognition system, along with possible solutions, are given below:
• Camera distortion and noise problems.
• A human face is a 3D object and a non-rigid body. At times it
changes with the mood of the subject, which generates
facial expressions. A smiling face and a frowning face are
completely different from the perspective of face recognition.
To overcome these problems the search area could be
reduced to the eyes and nose, excluding the mouth and ears.
• Changes in illumination direction can affect the quality and
description of the 2D image representation. There are different
types of light with different spectra; light can be artificial or
natural and can change during the day. These effects can be alleviated
using image enhancement techniques.
• The subject’s appearance can change over time due to
age and to voluntary changes in facial appearance.
• The pose variation problem is one of the hardest problems in
face recognition. There could be variations in translation, scale
and rotation. According to Chan et al (2010), translation can
be easily handled using a windowing method. The scaling
problem is also easy to solve by creating an image pyramid (a
collection of the same image at different resolutions) to
represent the input image. Rotation about the axis
perpendicular to the image plane can be handled by rotating
the image back. The hardest problem is handling rotations out
of the image plane because they can cause occlusions. An
occluded face usually is not suitable for recognition and is
not selected from image databases or videos.
• A complex image background can also affect face recognition
system; it can be removed using a face detector before face
recognition to reduce the searching area.
• Facial makeup and hair style are less influential than other
facial variations. Usually a face recognition system requires
user’s cooperation on this problem.
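The image pyramid mentioned in the pose variation item above can be sketched as follows. This is a minimal illustration that halves the resolution by averaging 2x2 blocks; the averaging scheme, the function names and the toy image are assumptions made for the example, and practical implementations usually smooth the image (for example with a Gaussian filter) before subsampling.

```python
def downsample(image):
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def build_pyramid(image, min_size=2):
    """Return [full resolution, half, quarter, ...] down to min_size pixels."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels

# Toy 8x8 grayscale image whose pixel value is row + column.
image = [[float(r + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image)
print([(len(level), len(level[0])) for level in pyramid])
```

Scanning a detection window of fixed size over every level of the pyramid lets the same detector respond to faces of different sizes in the original image.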
The face biometric is affected by a number of intrinsic
(e.g., expression and age) and extrinsic (e.g., pose and lighting) variations.
There has been a significant improvement in face recognition performance
during the past decade, but it is still below the acceptable level for use in many
applications. The first problem that needs to be addressed in face recognition
is face detection. More effort has been devoted to 2D face recognition
because of the availability of commodity 2D cameras and deployment
opportunities in many security scenarios. However, 2D face recognition is
sensitive to a variety of factors encountered in practice, including pose and
lighting variations, expression variations, age variations, and facial
occlusions. The various problems that affect the face detection and
recognition system are discussed in detail below.
1.2.1 Lighting Variation
The difference in face images of the same person due to severe
lighting variation can be more significant than the difference in features of
face images of different persons. Because the face is a 3D object, different
lighting sources generate various illumination conditions and shadings on the
face image. Many methods have been developed to study invariant facial
features that are robust against lighting variations, and these methods
compensate for the lighting variations using prior knowledge of lighting
sources learned from training data. Examples of face images
affected by illumination are shown in Figure 1.3.
Figure 1.3 Example of varying illumination
These methods provide visually enhanced face images after lighting
normalization and show improved recognition accuracy. Although the
performance of face recognition systems indoors has reached a
certain level, face recognition outdoors still remains a
challenging topic because of the illumination problem. Many
illumination-invariant face recognition approaches have been introduced to
overcome the illumination challenge. In addition, various illumination
normalization methods have been
proposed. These illumination normalization methods remove the illumination
variation to some extent, facilitating the face recognition process.
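As one simple example of illumination normalization, the sketch below applies global histogram equalization to a grayscale image. This technique is chosen here only for illustration and is not necessarily one of the methods the text refers to; the function name and the toy pixel values are assumptions.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]

    # Map each gray level so that the output histogram is roughly flat.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]

# A dark, low-contrast 2x4 patch (values are illustrative).
dark = [[50, 52, 54, 56],
        [50, 52, 54, 56]]
print(equalize_histogram(dark))
```

The narrow range of dark gray levels is stretched over the full intensity range, which reduces the effect of a global change in brightness. It does not correct directional shadows, which is one reason more elaborate normalization methods exist.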
1.2.2 Occlusion
Figure 1.4 Example of occluded face images
Figure 1.4 shows a glimpse of the face image affected by partial
occlusion. Face images often appear occluded by other objects or by the face
itself (i.e., self- occlusion), especially in surveillance videos. Most of the
commercial face recognition systems reject an input image when the eyes
cannot be detected. Occlusions cause erroneous facial feature localization.
Local feature based methods are proposed to overcome the occlusion
problem. When the misalignment problem is solved, very high correct
recognition rates can be achieved with a generic local appearance-based face
recognition algorithm. In the case of lower face occlusion, only a slight
decrease in performance is observed when a local appearance-based face
representation approach is used. This indicates the importance of local
processing when dealing with partial face occlusion. Moreover, improved
alignment increases the correct recognition rate even in the experiments
performed against the lower face occlusion. This shows that the face
registration plays a key role on face recognition performance. The challenge
is how to obtain the alignment points. Normally, eye points are used for
face alignment. However, sunglasses or other occlusions prevent the detection of
eye coordinates. Alignment then needs many points from different parts of the face.
However, detecting many points on a face is not realistic in real-world
conditions. This problem is overcome by the proposed patch-based feature
extraction.
1.2.3 Facial Expression
Facial expression is an internal variation that causes large intra-
class variation. There are some local feature based approaches and 3D model
based approaches designed to handle the expression problem. On the other
hand, the recognition of facial expressions is an active research area in human
computer interaction and communications. Facial expression is a less significant
issue than occlusion, pose angle and illumination, but it still affects the
face recognition results.
Figure 1.5 Example of varying expression
An example of face images of a single person with varying expression
is shown in Figure 1.5. Although a closed eye or a smiling face affects the
recognition rate by 1% to 10%, a face with a broad laugh has an influence as
high as 30%, since a laughing face changes the face appearance and distorts
the correlation of eyes, mouth and nose. Hence, the features are grouped into
different classes, which sharply increases the false alarm rate. Much research
work focuses on small changes on the face surface. However, large changes in
expression are still an unsolved problem. Reconstruction of the face
components has been proposed to address this problem, and recent research that remodels
the face using texture, shape and spatial frequency decomposition
aims to overcome this challenge.
1.2.4 Pose Variation
Pose variation degrades the performance of a face recognition
system. A face image may appear different depending on the direction
from which the face is imaged. Thus, it is possible that images taken at two
different viewpoints of the same subject (intra-user variation) may appear
more different than two images taken from the same view point for two
different subjects (inter-user variation). The different face images affected by
variation in poses are as shown in Figure 1.6.
Figure 1.6 Example of varying pose
In a surveillance system, the camera is usually mounted at a
location that people cannot reach. Since the camera is
mounted high up, the faces viewed by the camera are in different
poses (i.e., at varying angles).
This is the simplest case in city surveillance applications. The most
difficult case is when people pass through the camera view and do not look at
the camera lens directly. People’s behavior in public places cannot be
restricted, yet recognition in such cases must still be
accurate. However, even state-of-the-art techniques are limited to an angle
of 10 or 15 degrees when recognizing a face. Recognizing faces at larger angles is
another challenge. The most significant face features are lost at an angle of 25
or 30 degrees. Hence, the system reliability decreases sharply.
The techniques proposed until now have not given good results in actual
working conditions since many other factors add to the face
pose problem in outdoor environments. In many face recognition scenarios
the pose of the probe and gallery images is different. For example, the gallery
image might be a frontal ‘‘mug-shot’’ and the probe image might be a 3/4
view captured from a camera in the corner of a room. Approaches addressing
pose variation can be classified into two main categories depending on the
type of gallery images they use.
Multi-view face recognition is a direct extension of frontal face
recognition in which the algorithms require gallery images of every subject at
every pose. Face recognition across pose is concerned with the problem
of building algorithms to recognize a face from a novel viewpoint, i.e., a
viewpoint from which it has not previously been seen.
The method for acquiring face images depends upon the underlying
application. Surveillance applications may best be served by capturing face
images by means of a video camera while image database investigations may
require static intensity images taken by a standard camera. Some other
applications, such as access to top security domains, may even necessitate
forgoing the nonintrusive quality of face recognition by requiring the user
to stand in front of a 3D scanner or an infra-red sensor.
1.3 FACE DETECTION
Depending on the face data acquisition methodology, face detection
techniques can be broadly divided into three categories: methods that operate
on intensity images, those that deal with video sequences, and those that
require other sensory data such as 3D information or infra-red imagery. The
various techniques of face detection include:
• Feature based method
• Template matching
Figure 1.7 Face detection methods (figure labels: Face Detection; Image-Based Methods; Knowledge-Based Methods)
1.3.1 Feature Based Method
Invariant features of the face image are extracted in the feature
based method. The features should remain invariant under variations in
facial expression and pose. The features used in this technique include
the distance between the eyes, the eyebrows, the size of the lips, the nose, etc. Many
methods have been proposed to extract the features of the face image. Based on the
extracted features, many statistical models have been developed for
face detection.
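A minimal sketch of such geometric features is given below. The landmark names, their coordinates and the choice of three distance pairs are hypothetical; the sketch assumes that landmark positions have already been located by some earlier step. Dividing each distance by the inter-eye distance is one common way to make the features independent of image scale.

```python
import math

def geometric_features(lm):
    """Distance-based features, each divided by the inter-eye distance."""
    eye_dist = math.dist(lm["left_eye"], lm["right_eye"])
    pairs = [
        ("left_eye", "left_brow"),      # eye to eyebrow
        ("nose_tip", "mouth_center"),   # nose to mouth
        ("mouth_left", "mouth_right"),  # width of the lips
    ]
    return [math.dist(lm[a], lm[b]) / eye_dist for a, b in pairs]

# Hypothetical landmark coordinates (x, y) in pixels.
landmarks = {
    "left_eye": (30, 40), "right_eye": (70, 40),
    "left_brow": (30, 30),
    "nose_tip": (50, 60), "mouth_center": (50, 80),
    "mouth_left": (38, 80), "mouth_right": (62, 80),
}
features = geometric_features(landmarks)
print(features)

# The same face at twice the resolution yields the same feature vector.
scaled = {name: (2 * x, 2 * y) for name, (x, y) in landmarks.items()}
print(geometric_features(scaled) == features)
```

The resulting vector is compact, which is the compactness and matching speed advantage of feature-based schemes; its usefulness depends entirely on how reliably the landmarks can be detected.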
Figure 1.8 Geometrical features (white) of the face image
The main advantage of the feature-based techniques is that such
methods are relatively robust to position variations in the input image. In
principle, feature-based schemes can be made invariant to size, orientation
and/or lighting. Other benefits of these schemes include the compactness of
representation of the face images and high speed matching. The major
disadvantage of these approaches is the difficulty of automatic feature
detection (as discussed above) and the fact that the implementer of any of
these techniques has to make arbitrary decisions about which features are
important. After all, if the feature set lacks discrimination ability, no amount
of subsequent processing can compensate for that intrinsic deficiency.
1.3.2 Template Matching
Template matching can be subdivided into two approaches:
feature-based and template-based matching. The feature-based approach uses
features of the search and template images, such as edges or corners, as the
primary match-measuring metrics to find the best matching location of the
template in the source image. The template-based, or global, approach uses
the entire template, generally with a sum-comparing metric
(SAD, SSD, cross-correlation, etc.), and determines the best location by
testing all or a sample of the viable test locations within the search image that
the template image may match up to.
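The template-based (global) approach can be sketched with the SSD metric as follows. The function names and the toy image are illustrative assumptions, and the search here is exhaustive over every valid location; replacing the squared difference with an absolute difference gives the SAD variant.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image window."""
    return sum(
        (image[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )

def match_template(image, template):
    """Test every valid location; return the (row, col) with the lowest SSD."""
    h, w = len(template), len(template[0])
    candidates = [
        (top, left)
        for top in range(len(image) - h + 1)
        for left in range(len(image[0]) - w + 1)
    ]
    return min(candidates, key=lambda pos: ssd(image, template, *pos))

# Toy search image with a bright 2x2 blob whose top-left corner is at (1, 2).
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8],
            [7, 9]]
print(match_template(image, template))
```

The cost of this exhaustive search grows with the number of candidate locations times the template area, which is why reduced-resolution images or restricted search windows are used in practice.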
1.3.2.1 Feature based
If the template image has strong features, a feature-based approach
may be considered. The approach may prove further useful if the match in the
search image might be transformed in some fashion. Since this approach does
not consider the entire template image, it can be more computationally
efficient when working with source images of high resolution. The
alternative, template-based approach may require searching
a potentially large number of points in order to determine the best matching
location.
1.3.2.2 Template based approach
For templates without strong features, or for when the bulk of the
template image constitutes the matching image, a template-based approach
may be effective. As mentioned above, template-based matching
may require sampling a large number of points. The number of sampling points
can be reduced by lowering the resolution of the search
and template images by the same factor and performing the operation on the
resulting downsized images (multiresolution, or pyramid, image processing),
by providing a search window of data points within the search image so that the
template does not have to be tested at every viable data point, or by a combination of
both.
1.4 FACE RECOGNITION
Face recognition is the process of identifying a particular person if
he or she belongs to the group of members present in the database. There are
numerous methods for face recognition. The various face recognition
algorithms are classified as shown in Figure 1.9.
Figure 1.9 Face recognition methods (Face: 2D Recognition [Linear/Non-Linear, Neural Network], 3D Recognition)
1.4.1 2D Face Recognition
1.4.1.1 Linear/Non-linear
Automatic face recognition is a pattern recognition
problem that is very hard to solve due to its nonlinearity. In particular, it is a
template matching problem in which recognition has to be performed in a
high-dimensional space. The higher the dimension of the space, the longer the
computation time to
find a match, so a dimensionality reduction technique can be used to
project the problem into a lower-dimensional space. Some of the
dimensionality reduction techniques are discussed below.
The Eigenfaces method (Kirby & Sirovich 1990) can be considered
one of the first approaches in this sense. In the eigenface method an N x N
image I is linearized into a vector of length N², so that it represents a point in an
N²-dimensional space. A low-dimensional space is found by means of a
dimensionality reduction technique.
This dimensionality problem is addressed by the principal component analysis
(PCA) method, in which after the linearization the mean vector is calculated over
all images and subtracted from all the vectors corresponding to the original
faces. The covariance matrix is then computed in order to extract a limited
number of its eigenvectors, corresponding to the largest eigenvalues. These
few eigenvectors, also referred to as eigenfaces, represent a basis of a
low-dimensional space. When a new image has to be tested, the corresponding
eigenface expansion is computed and compared against the entire database
according to a chosen distance measure. As the PCA is performed only for
training the system, this method works very fast when testing new face images.
The PCA has been intensively exploited in face recognition applications.
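The PCA steps described above (mean subtraction, covariance matrix, leading eigenvector, projection, nearest match) can be sketched as follows. This is a toy illustration under several assumptions: the "images" are made-up four-pixel vectors, only the single leading eigenface is extracted, and it is found by power iteration in plain Python to keep the example dependency-free. A practical system would keep several eigenfaces and use a numerical library, and would normally avoid forming the full N² x N² covariance matrix.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def covariance(centered):
    """Sample covariance matrix of mean-centered row vectors."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(v[i] * v[j] for v in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def top_eigenvector(matrix, iterations=200):
    """Power iteration: dominant eigenvector of a symmetric matrix."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def project(face, mean, eigenface):
    """Coordinate of a face along one eigenface after mean subtraction."""
    return sum((f - m) * e for f, m, e in zip(face, mean, eigenface))

# Four toy "images", each already flattened to a 4-pixel vector.
faces = [
    [1.0, 2.0, 1.0, 2.0],
    [2.0, 4.0, 2.0, 4.0],
    [3.0, 6.0, 3.0, 6.0],
    [4.0, 8.0, 4.0, 8.0],
]
mean = mean_vector(faces)
centered = [[f - m for f, m in zip(face, mean)] for face in faces]
eigenface = top_eigenvector(covariance(centered))
weights = [project(face, mean, eigenface) for face in faces]

# Recognition: project the probe and pick the nearest stored weight.
probe = [2.1, 3.9, 2.0, 4.1]
w = project(probe, mean, eigenface)
nearest = min(range(len(faces)), key=lambda i: abs(weights[i] - w))
print(nearest)
```

Training (computing the mean and the eigenface) is done once; testing a new image only requires a projection and a comparison of low-dimensional weights, which is why the test phase is fast.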
Linear Discriminant Analysis (LDA) (Martinez & Kak 2001) is
often regarded as a better alternative to the PCA. It provides discrimination among the
classes, while the PCA deals with the input data in their entirety, without
paying any attention to the underlying class structure. The main aim of the LDA is
to find basis vectors that provide the best discrimination among the classes by
maximizing the between-class differences and minimizing the within-class
ones. According to Martinez & Kak (2001), although the LDA is generally
expected to outperform
the PCA, it provides better classification performance only when a large
training set is available. Recent studies also support this argument,
which relates to what is referred to as the SSS (Small Sample Size)
problem.
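The idea of maximizing between-class differences while minimizing within-class ones can be illustrated with the two-class Fisher discriminant, whose direction is proportional to the inverse within-class scatter matrix applied to the difference of the class means. The sketch below works on made-up two-dimensional points; the data and function names are assumptions for the example, and real face data would be high-dimensional and multi-class.

```python
def mean2(points):
    """Mean of a list of 2D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def within_scatter(classes):
    """Sum over classes of sum_i (x_i - m)(x_i - m)^T, as a 2x2 matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points in classes:
        m = mean2(points)
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant: w proportional to S_W^{-1} (m_a - m_b)."""
    s = within_scatter([class_a, class_b])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean2(class_a), mean2(class_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

# Toy data: both classes spread widely along x but are separated along y.
class_a = [[0.0, 0.0], [8.0, 0.0], [4.0, 1.0], [4.0, -1.0]]
class_b = [[0.0, 3.0], [8.0, 3.0], [4.0, 4.0], [4.0, 2.0]]

w = fisher_direction(class_a, class_b)
proj_a = [w[0] * p[0] + w[1] * p[1] for p in class_a]
proj_b = [w[0] * p[0] + w[1] * p[1] for p in class_b]
print(w, proj_a, proj_b)
```

In this toy data the largest overall variance lies along the first coordinate, which carries no class information, whereas the Fisher direction points along the second coordinate, where the projected classes do not overlap. If the within-class scatter matrix is singular, as happens when there are fewer training samples than dimensions, the inverse used here does not exist, which is the small sample size problem mentioned above.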
In some approaches, such as Fisherfaces, the PCA is used as
a preliminary step in order to reduce the dimensionality of the input space,
and then the LDA is applied to the resulting space in order to perform the actual
classification. According to Chen et al (2000) and Yu & Yang (2001), when
PCA and LDA are combined, discriminant information is
discarded together with redundant information. Thus, in some cases, the LDA
is applied directly on the input space
(Chen et al 2000; Yu & Yang 2001).
The DCV (Discriminant Common Vectors) (Cevikalp et al 2005)
represents a further development to this approach. The main idea of the DCV
is to collect the similarities among the elements in the same class and drop
their dissimilarities. In this way each class can be represented by a common
vector computed from the within scatter matrix. When an unknown face has
to be tested, the corresponding feature vector is computed and associated to
the class with the nearest common vector. The main disadvantage of the PCA,
LDA and Fisherfaces is their linearity. In particular, the PCA extracts a
low-dimensional representation of the input data by exploiting only the
covariance matrix, so that no more than first- and second-order statistics are
used.
Bartlett Marian et al (2002) observed that first- and second-order
statistics hold information only about the amplitude spectrum of an
image, discarding the phase spectrum, whereas the human capability for
recognizing objects is mainly driven by the phase spectrum. To overcome this
problem, Bartlett Marian et al (2002) introduced the ICA as a more
powerful classification tool for the face recognition problem. The ICA can be
considered a generalization of the PCA that provides three main
advantages: (1) it allows a better characterization of data in an n-dimensional
space; (2) the vectors found by the ICA are not necessarily orthogonal, so
that they also reduce the reconstruction error; (3) they capture discriminant
features by exploiting not only the covariance matrix but also
higher-order statistics.
1.4.1.2 Neural networks
Another nonlinear solution for the face recognition problem is the
use of neural networks. Neural networks are widely used in many other
pattern recognition problems and have been adapted to cope with the person
authentication task. The advantage of neural classifiers over linear ones is
that they can reduce misclassifications among the neighboring classes. The
basic idea is to consider a net with a neuron for every pixel in the image.
Nevertheless, because of the pattern dimensions (an image has a dimension of
about 256 x 256 pixels), neural networks are not directly trained with the
input images, but are preceded by the application of a dimensionality
reduction technique.
Cottrell & Fleming (1990) introduced a neural net that operates in
auto-association (AA) mode. At first, the face image, represented by a vector
x, is approximated by a new vector h with smaller dimensions by the first
network (auto-association), and then h is used as input for the
classification net. According to Cottrell and Fleming, the AA neural network
does not perform better than eigenfaces, even in optimal circumstances.
Other kinds of neural networks have also been tested for face recognition, in
order to exploit their particular properties.
The Self-Organizing Map (SOM) is invariant with respect to minor
changes in the image sample, while convolutional networks provide a partial
invariance with respect to rotations, translations and scaling. In general,
the structure of the network is strongly dependent on its application field,
so that different contexts result in quite different networks. Lin et al
(1997) proposed the Probabilistic Decision Based Neural Network, which they
modeled for three different applications, namely a face detector, an eye
localizer and a face recognizer. The flexibility of these networks is due to
their hierarchical structure with nonlinear basis functions and a
competitive credit assignment scheme. Meng et al (2002) introduced a hybrid
approach in which the most discriminating features are extracted through the
PCA and used as the input of a Radial Basis Function (RBF) neural network.
The RBFs perform well for face recognition problems, as they have a compact
topology and a fast learning speed. However, the RBF neural network has
several drawbacks. Overfitting occurs when the dimension of the network
input is comparable to the size of the training set. A high input dimension
results in slow convergence. The sample size has to grow exponentially with
the dimension to obtain a reliable estimate of the multivariate densities.
Finally, there is the singularity problem: if the number of training
patterns is less than the number of features, the covariance matrix is
singular. In general, neural network based approaches encounter problems
when the number of classes increases. Moreover, they are not suitable for a
single model image recognition task, because multiple model images per
person are necessary to train the system to an optimal parameter setting.
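The singularity problem mentioned above can be checked numerically. The
following sketch, with hypothetical helper names and made-up data, computes
the sample covariance matrix of three training patterns with five features
each and shows that its rank is lower than the number of features, so the
matrix cannot be inverted.

```python
def covariance(samples):
    """Unbiased sample covariance matrix of a list of feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[i] for x in samples) / n for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (x[i] - mean[i]) * (x[j] - mean[j]) / (n - 1)
    return cov


def rank(matrix, tol=1e-9):
    """Matrix rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue                      # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r


# Three training patterns, five features each (illustrative values).
patterns = [[1, 2, 3, 4, 5],
            [2, 1, 0, 3, 7],
            [0, 5, 1, 1, 2]]
cov = covariance(patterns)
cov_rank = rank(cov)
is_singular = cov_rank < len(cov)
```

With N training patterns the covariance matrix has rank at most N - 1,
which is why it is singular whenever N does not exceed the number of
features.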
1.4.2 3D Face Recognition
The majority of face recognition methods based on 2D image
processing, using monochrome or color images, reach a recognition rate
higher than 90% under controlled lighting conditions and with cooperative
subjects. Unfortunately, system performance drops in the presence of pose,
illumination and expression variations, with which 2D face recognition
methods still encounter difficulties. Xu et al (2004) compared intensity
images against depth images with respect to their discriminating power in
recognizing people. Depth maps give a more robust face representation,
because intensity images are heavily affected by changes in illumination.
Generally, 3D face recognition denotes a class of methods that
work on a three-dimensional dataset, representing both face and head shape as
range data or polygonal meshes. The main advantage of the 3D based
approaches is that the 3D model retains all the information about the face
geometry. Moreover, 3D face recognition can be seen as a further evolution
of the 2D recognition problem, because a more accurate representation of the
facial features leads to a potentially higher discriminating power. In a 3D
face model, facial features are represented by local and global curvatures
that can be considered as the real signature for identifying persons. The 3D
facial representation is a promising tool for coping with many of the human
face variations, extra-personal as well as intra-personal. Two main
representations are commonly used to model faces in 3D applications, namely
2.5D and 3D images, as shown in Figure 1.10. A 2.5D image (range image)
consists of a two-dimensional representation of a 3D point set (x, y, z),
where each pixel in the X–Y plane stores the depth value z. One can think of
a 2.5D image as a grey-scale image, where black pixels correspond to the
background, while white pixels represent the surface points nearest to the
camera. In particular, a 2.5D image taken from a single viewpoint only
allows modeling of the facial surface, not of the whole head. This problem
is solved by taking several scans from different viewpoints and building a
3D head model during a training stage. In contrast, 3D images are a global
representation of the whole head, and the facial surface is further related
to the internal anatomical structure, while 2.5D images depend on the
external appearance as well as environmental conditions. The simplest 3D
face representation is a 3D polygonal mesh, which consists of a list of
points (vertices) connected by edges into polygons. There are many ways to
build a 3D mesh; the most common are combining several 2.5D images, properly
tuning a 3D morphable model, or exploiting a 3D acquisition system (3D
scanner). A further difference between 2.5D and 3D images is that the latter
are not affected by self-occlusions of the face when the pose is not
full-frontal.
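The two representations can be illustrated with a short sketch. It
assumes, purely for illustration, that x and y are already integer pixel
coordinates and that z is the distance from the camera; the function names
and the sample data are hypothetical.

```python
def to_range_image(points, width, height):
    """Rasterise 3D points (x, y, z) into a grey-scale 2.5D image.

    Background pixels stay 0 (black); the surface point nearest to the
    camera becomes 255 (white).
    """
    nearest = {}
    for x, y, z in points:
        if 0 <= x < width and 0 <= y < height:
            if (x, y) not in nearest or z < nearest[(x, y)]:
                nearest[(x, y)] = z       # keep only the visible surface
    image = [[0] * width for _ in range(height)]
    if not nearest:
        return image
    z_min, z_max = min(nearest.values()), max(nearest.values())
    for (x, y), z in nearest.items():
        if z_max == z_min:
            level = 255
        else:
            # farthest surface point maps to 1 so it differs from background
            level = 1 + round(254 * (z_max - z) / (z_max - z_min))
        image[y][x] = level
    return image


def mesh_edges(faces):
    """Unique undirected edges of a mesh given as vertex-index polygons."""
    edges = set()
    for face in faces:
        for i in range(len(face)):
            a, b = face[i], face[(i + 1) % len(face)]
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)


# Four surface points; two of them fall on the same pixel (1, 0).
points = [(0, 0, 50.0), (1, 0, 40.0), (1, 0, 60.0), (2, 1, 30.0)]
range_image = to_range_image(points, width=3, height=2)

# A minimal mesh: four vertices forming two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2), (1.0, 1.0, 0.1), (0.0, 1.0, 0.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
edges = mesh_edges(triangles)
```

Note that the point (1, 0, 60.0) is lost in the range image because a
nearer point projects onto the same pixel, whereas a mesh can keep every
vertex.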
Figure 1.10 (a) 2D image, (b) 2.5D image and (c) 3D image
Many criteria can be adopted to compare existing 3D face
algorithms by taking into account the type of problems they address or their
intrinsic properties. Some approaches perform very well only on faces with
neutral expression, while some others try also to deal with expression
changes. An additional criterion for measuring the robustness of 3D model
based approaches is their sensitivity to size variation. In fact, the
distance between the target and the camera can sometimes affect the size of
the facial surface, as well as its height, depth, etc. Therefore, approaches
exploiting a curvature-based representation cannot distinguish between two
faces with similar shape but different size. In order to overcome this
problem, some methods based on point-to-point comparison or on volume
approximation are used. However, the absence of an appropriate standard
dataset containing a large number and variety of people, whose images were
taken with a significant time delay and with meaningful changes in
expression, pose and illumination, is one of the great limitations to
empirical experimentation with existing algorithms.
In particular, 3D face recognition systems are tested on proprietary
databases, with few models and with a limited number of variations per
model. Consequently, comparing the performance of different algorithms
often turns into a difficult task. Nevertheless, they can be classified
based on the type of problems they address, such as mesh alignment,
morphing, etc.
1.5 MOTIVATION
The number of vision applications in various digital devices is
increasing. Each of these applications requires efficient processing of image
sequences using increasingly complex pattern recognition algorithms.
Speeding up any of these tasks makes it possible to integrate more vision
algorithms in a device. Different approaches have been taken to improve
detection algorithms. In the face detection task, a cascade of classifiers
has been a popular choice. Object detection has been sped up using a branch
and bound algorithm within a particular detection framework. The time
consumed for feature computation has been reduced by approximating the
features at different scales. Some have implemented the existing methods on
GPUs, which make use of parallel computing to speed up the detection task. A
new alternative search approach is suggested for face detection and
recognition, which may add value to some of the above mentioned methods.
1.6 OBJECTIVE
An alternative search strategy is devised to detect faces in an
image, which can perform reasonably well even when faces differ in pose,
expression or occlusion.
1. To detect the face region from the given face image with
maximum accuracy and reduce the false alarm rate using skin
color and the statistical features of the simplified local binary
mean with support vector machine.
2. To detect the occlusion present in the partially occluded face
image and recognize the face by comparing the SLBM and
MBWM features with the features of the images in the face
database.
3. To validate the proposed algorithms SLBM and MBWM for
face expression recognition.
4. To validate the proposed algorithms SLBM and MBWM for
pose detection.
1.7 CONTRIBUTIONS
The main contributions of this thesis are as follows:
1. Face detection system
A face detection system is proposed to reduce the number of missed
detections using skin region detection and extracted SLBM features. The
features of the skin region are used to predict the location of the face
region and are further verified by an SVM classifier. Theoretical insights
into the LBP approach and the proposed SLBM approach with respect to
detection rate, accuracy, false detection rate and precision are discussed
in detail.
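A skin region detector of the kind referred to above can be sketched
as a simple chrominance threshold. The colour space (YCbCr), the threshold
ranges and all names below are assumptions chosen for illustration; they
are commonly quoted values, not the settings used in this thesis.

```python
# Assumed chrominance ranges for skin pixels (illustrative values only).
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def rgb_to_cbcr(r, g, b):
    """Chrominance of an 8-bit RGB pixel (full-range BT.601 conversion)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel):
    """True if the pixel's chrominance falls inside the assumed ranges."""
    cb, cr = rgb_to_cbcr(*pixel)
    return (CB_RANGE[0] <= cb <= CB_RANGE[1]
            and CR_RANGE[0] <= cr <= CR_RANGE[1])


def skin_mask(image):
    """Boolean mask of skin-coloured pixels for an image of RGB tuples."""
    return [[is_skin(p) for p in row] for row in image]


def bounding_box(mask):
    """(top, left, bottom, right) of the True pixels, or None if empty."""
    hits = [(y, x) for y, row in enumerate(mask)
            for x, flag in enumerate(row) if flag]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return min(ys), min(xs), max(ys), max(xs)


# Tiny 2 x 2 test image: blue, skin tone / white, skin tone.
image = [[(0, 0, 255), (220, 170, 140)],
         [(255, 255, 255), (200, 150, 120)]]
mask = skin_mask(image)
candidate = bounding_box(mask)
```

In a complete system the candidate box would only be a hypothesis that is
then verified by a classifier, as described above.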
2. Pre-Processing for face recognition
Pre-processing helps in increasing the recognition rate. Several
pre-processing algorithms are applied to the detected face image to aid the
face recognition process. Illumination correction is almost completely
done using the pre-processing technique. Pre-processing steps including
gamma correction, log transform, histogram equalization and local histogram
equalization are performed. Recognition performed on the pre-processed
image gave a better result than recognition on the original image without
pre-processing.
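The global intensity transforms named above can be sketched on an
8-bit grey-scale image stored as a list of rows. These are the textbook
forms of the operations with hypothetical function names, and the parameter
values are illustrative; local histogram equalization is not shown.

```python
import math


def gamma_correct(image, gamma):
    """Power-law transform; gamma < 1 brightens dark regions."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]


def log_transform(image):
    """Logarithmic transform scaled so that 255 maps to 255."""
    c = 255 / math.log(256)
    return [[round(c * math.log(1 + p)) for p in row] for row in image]


def equalize_histogram(image):
    """Global histogram equalization via the cumulative distribution."""
    pixels = [p for row in image for p in row]
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:                      # constant image: nothing to spread
        return [row[:] for row in image]
    lut = [max(0, round(255 * (c - cdf_min) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# A dark, low-contrast strip of four pixels.
strip = [[50, 100, 150, 200]]
equalized = equalize_histogram(strip)
brightened = gamma_correct([[0, 64, 255]], gamma=0.5)
compressed = log_transform([[0, 255]])
```

Each function returns a new image, so the transforms can be chained or
compared against the unprocessed input.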
3. An alternate occlusion detection and face recognition technique
More importance is given to the occlusion detection problem. A novel
algorithm is proposed for the detection of occlusion and the recognition of
the face. Statistical LBP, SLBM and MBWM features are extracted, and an SVM
is used to detect the occluded region. The features of the occlusion-free
region are used for face image recognition.
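For reference, the standard 3 x 3 LBP operator on which such features
build can be sketched as follows. This is the basic LBP only; the SLBM and
MBWM features proposed in this thesis are not reproduced here, and the bit
ordering and function names are choices made for this sketch.

```python
# Neighbours visited clockwise, starting at the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """8-bit LBP code of the interior pixel (y, x)."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit              # neighbour not darker than centre
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


# A single 3 x 3 grey-level patch with centre value 6.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
```

Histograms computed over image blocks in this way are a typical input to a
classifier such as the SVM mentioned above.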
4. Expression and Pose estimation
The proposed algorithms are extended to estimate the varying
expressions in the face image. The expression variation mostly affects the eye
region and the mouth region. So, weighted feature extraction is employed to
estimate the expression in the face image. Also, the MBWMH algorithm is
used to find the pose of the image. These features are also compared with the
LBPH and the SLBMH algorithms.
1.8 ORGANIZATION OF THE ENTIRE THESIS
The introduction describes face detection, recognition and the
purpose of the thesis. Chapter 1 also gives a detailed explanation of the
applications of the face detection and recognition system and the various
methods used for face detection and recognition.
Chapter 2 details the background of face detection and
recognition. The chapter also compares the various methods proposed for face
detection and recognition, with their merits and demerits.
Chapter 3 is about the face detection process and the preprocessing
technique. In this chapter, skin color based face detection and the
illumination normalization methods using gamma intensity correction, log
transform, histogram equalization and local histogram equalization are
explained in detail.
Chapter 4 gives information on occlusion detection and face
recognition using LBP, SLBM and MBWM features. Also, the feature
analysis technique is described in detail. Finally, the feature
classification method is described for face recognition.
Chapter 5 deals with some of the applications of SLBM and
MBWM features for face expression and pose estimation. The components of
Chapter 3 and Chapter 4 are used for the expression and pose estimation.
Chapter 6 gives the results of testing carried out using major
face databases. Most of the major databases are used to measure the
efficiency and robustness of the methods. In Chapter 6, a comparison with
other systems is also provided with graphs and tables.
Chapter 7 includes the conclusion and the summary of the overall
work and the extension of the current work.