
Nearest Neighbourhood Classifiers in Biometric Fusion


A. Teoh, S. A. Samad, and A. Hussain

Electrical, Electronic and System Engineering Department, Faculty of Engineering

National University of Malaysia, Malaysia

E-mail: [email protected], [email protected], [email protected]

Abstract

This paper presents fusion decision technique comparisons based on the nearest-neighborhood (NN) classifier family for a bimodal biometric verification system that makes use of face images and speech utterances. The system is essentially constructed from a face expert, a speech expert and a fusion decision module. Each individual expert has been optimized to operate in automatic mode and designed for security access applications. The fusion decision schemes considered are the ordinary k-NN, the modified k-NN and the evidence theoretic k-NN classifier based on Dempster-Shafer theory. The aim is to obtain, from amongst these three techniques, the optimum fusion module best suited to the target application.

Keywords: Biometrics, face verification, speech verification, fusion decision.

1. Introduction

The identification of humans, for example, for financial transactions, access control or computer access, has mostly been conducted by using ID numbers, such as a PIN or a password. The main problem with those numbers is that they can be used by unauthorized persons. Instead, biometric verification systems use unique personal features of the user himself to verify the identity claimed.

However, a major problem with biometrics is that the physical appearance of a person tends to vary with time. In addition, correct verification may not be guaranteed due to sensor noise and the limitations of the feature extractor and matcher. One solution to cope with these limitations is to combine several biometrics in a multi-modal identity verification system. By using multiple biometric traits, a much higher accuracy can be achieved. Even when one biometric feature is somehow disturbed, for example in a noisy environment, the other traits can still lead to an accurate decision.

Some work on multi-modal biometric identity verification systems has been reported in the literature. Brunelli and Falavigna (1995) have proposed a person identification system based on acoustic and visual features, where they use a HyperBF network as the best performing fusion module. Dieckmann et al. (1997) have proposed a decision level fusion scheme, based on a 2-out-of-3 majority vote, which integrates face and voice analyzed by three different experts: face, lip motion, and voice. Duc et al. (1997) proposed a simple averaging technique and compared it with the Bayesian integration scheme presented by Bigun et al. (1997); in this multi-modal system the authors use a face identification expert and a text-dependent speech expert. Kittler et al. (1998) proposed a multi-modal person verification system using three experts: frontal face, face profile, and voice, with the best combination results obtained for a simple sum rule. Hong and Jain (1998) proposed a multi-modal personal identification system which integrates face and fingerprints that complement each other; the fusion algorithm operates at the expert (soft) decision level, where it combines the scores from the different experts under a statistical independence hypothesis by simply multiplying them. Ben-Yacoub (1999) proposed a multi-modal data fusion approach for person authentication based on Support Vector Machines (SVM) to combine the results obtained from a face identification expert and a text-dependent speech expert. Pigeon and Vandendorpe (1998) proposed a multi-modal person authentication approach based on simple fusion algorithms to combine the results coming from the frontal face, face profile, and voice modalities. Choudhury et al. (1999) proposed multi-modal person recognition using unconstrained audio and video, where the combination of the two experts is performed using a Bayes net.

A bimodal biometric verification system based on facial and vocal modalities is described in this paper. It differs from the systems mentioned above in the sense that it is targeted for applications involving automatic verification using personal computers and their multimedia capturing devices. Thus each module of the system has been fine-tuned to deal with the problems that may occur in this type of application, such as poor quality images obtained from a low cost PC camera and the channel distortion or convolution noise that may be caused by using various types of microphones. In addition, the system is designed to keep the rate at which an imposter is accepted as genuine as low as possible. Each module of the system, i.e. the face and voice experts, is developed separately, and several fusion decision schemes are compared with the aim of obtaining the optimum technique for this application.

2. Verification Modules

2.1 Face Verification

In personal verification, face recognition refers to static, controlled full frontal portrait recognition. There are two major tasks in face recognition: (i) face detection and (ii) face verification.


Figure 1 Face verification system (blocks: input face, face detection, face extraction & eye detection, face normalization, identification key, model selection, face verification, accept/reject, output face).

In our system, as shown in Figure 1, the Eigenface approach (Moghaddam and Pentland, 1995) is used in the face detection and face recognition modules. The main idea of the Eigenface approach is to find the vectors that best account for the distribution of face images within the entire image space and that define the face space. These basis vectors are eigenvectors of the covariance matrix of the original face images, and since they are face-like in appearance they are called eigenfaces, as shown in Figure 2.

Figure 2 Eigenface


Now let the training set of face images be i_1, i_2, …, i_M. The average face of the set is defined as

\bar{i} = \frac{1}{M} \sum_{j=1}^{M} i_j    (1)

where M is the total number of images. Each face differs from the average by the vector φ_n = i_n − ī. A covariance matrix is constructed as

C = \sum_{j=1}^{M} \phi_j \phi_j^T = A A^T    (2)

where A = [φ_1 φ_2 … φ_M]. Then the eigenvectors v_k and eigenvalues λ_k of the symmetric matrix C are calculated. The v_k determine the linear combinations of the M difference images φ that form the eigenfaces:

u_l = \sum_{k=1}^{M} v_{lk} \phi_k, \quad l = 1, \ldots, M    (3)

From these eigenfaces, K (< M) eigenfaces corresponding to the K highest eigenvalues are selected.
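To make the construction above concrete, the following is a minimal numpy sketch of equations (1)-(3): it computes the average face, the difference vectors, and the K leading eigenfaces via the usual small-matrix trick (working with the M x M matrix A Aᵀ rather than the full covariance matrix). It is an illustration rather than the authors' implementation; the array shapes and function name are assumptions.

```python
import numpy as np

def compute_eigenfaces(images, K):
    """Eigenface computation following Eqs. (1)-(3).

    images : array of shape (M, D), one flattened face image per row.
    K      : number of eigenfaces to keep (K < M).
    Returns (mean_face, eigenfaces) with eigenfaces of shape (K, D).
    """
    M, D = images.shape
    mean_face = images.mean(axis=0)              # Eq. (1): average face
    A = images - mean_face                       # difference vectors phi_n
    # Eigen-decompose the small M x M matrix A A^T; it shares its nonzero
    # eigenvalues with the D x D covariance matrix of Eq. (2).
    L = A @ A.T
    eigvals, eigvecs = np.linalg.eigh(L)         # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:K]        # K largest eigenvalues
    # Eq. (3): eigenfaces as linear combinations of the difference images.
    eigenfaces = eigvecs[:, order].T @ A
    eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)
    return mean_face, eigenfaces
```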

Face detection is accomplished by calculating the sum of the squared error between a region of the scene and the eigenface reconstruction, a measure called the Distance From Face Space (DFFS) that indicates how face-like a region is. If a window φ is swept across the scene to find the DFFS at each location, the most probable location of the face can be estimated. This is simply the point where the reconstruction error ε has its minimum value:

\varepsilon = \| \phi - \phi_f \|    (4)

where φ_f is the projection of φ onto the face space.

From the extracted face, the eye coordinates are determined using a hybrid rule-based approach and contour mapping technique (Samad, 2001). Based on this information, scale normalization and lighting normalization are applied to produce a head-in-box format.

The Eigenface-based face recognition method is divided into two stages: (i) the training stage and (ii) the operational stage (Turk and Pentland, 1991). At the training stage, each normalized face image i_n is projected onto the lower dimensional subspace spanned by the eigenfaces by the operation

ϖ_k = u_k^T (i_n − ī)    (5)

where n = 1, …, M and k = 1, …, K. The projections are collected to generate the representation of each training facial image in the eigenspace:

Ω_i = [ϖ_{i1}, ϖ_{i2}, …, ϖ_{iK}]    (6)

where i = 1, 2, …, M.

At the operational stage, an incoming facial image is projected onto the same eigenspace, and the similarity measure, which is the Mahalanobis distance between the input facial image and the template, is computed in the eigenspace.

Let ϕ_1^0 denote the representation of the input face image with claimed identity C and ϕ_1^C denote the representation of the Cth template. The similarity function between ϕ_1^0 and ϕ_1^C is defined as follows:

F_1(ϕ_1^0, ϕ_1^C) = || ϕ_1^0 − ϕ_1^C ||_m    (7)

where || ⋅ ||_m denotes the Mahalanobis distance.
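A hedged sketch of the verification score of equations (5)-(7) follows: the probe image is projected onto the eigenspace and compared to the stored template with a Mahalanobis distance. Treating the eigenvalues as a diagonal covariance is my own simplifying assumption; the function and parameter names are illustrative.

```python
import numpy as np

def face_score(probe, template_weights, mean_face, eigenfaces, eigvals):
    """Similarity score of Eq. (7) in the eigenspace.

    probe            : flattened input face image, shape (D,)
    template_weights : stored eigenspace representation of the claimed
                       identity (Eq. (6)), shape (K,)
    eigenfaces       : (K, D) eigenfaces; eigvals : their K eigenvalues
    """
    w = eigenfaces @ (probe - mean_face)          # projection, Eq. (5)
    diff = w - template_weights
    # Mahalanobis distance assuming a diagonal covariance given by eigvals.
    return float(np.sqrt(np.sum(diff ** 2 / eigvals)))
```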


2.2 Speaker Verification

Anatomical variations that naturally occur amongst different people and the differences in their learned speaking habits manifest themselves as differences in the acoustic properties of the speech signal. By analyzing and identifying these differences, it is possible to discriminate among speakers (Campbell, 1997). The front end of our speech module aims to extract this speaker-dependent information.

The system includes three important stages: endpoint detection, feature extraction and pattern comparison. The endpoint detection stage aims to remove silent parts from the raw audio signal, as these parts do not convey speaker-dependent information.

Noise reduction techniques are used to reduce the noise in the speech signal. Simple spectral subtraction (Martin, 1994) is first used to remove additive noise prior to endpoint detection. Then, in order to cope with the channel distortion or convolution noise that is introduced by a microphone, the zero'th order cepstral coefficient is discarded and the remaining coefficients are appended with delta feature coefficients (Rabiner and Juang, 1993). In addition, the cepstral components are weighted adaptively to emphasize the narrow-band components and suppress the broadband components (Zilovic et al., 1997). The cleaned audio signal is converted to 12th order linear prediction cepstral coefficients (LPCC) using the autocorrelation method, which leads to a 24-dimensional vector for every utterance. The significant improvement in verification rate obtained by using this combination is reported in (Samad, 2001). Figure 3 shows the processing used in the front-end module.
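As a rough illustration of the additive-noise step, here is a plain magnitude spectral subtraction sketch. It is not the minimum-statistics method of Martin (1994) used by the authors; the noise spectrum is simply assumed to be estimated from the first few frames, and the frame length, hop size and function name are all assumptions.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, hop=128, noise_frames=10):
    """Subtract an estimated noise magnitude spectrum from every frame."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len, hop)]
    spectra = np.fft.rfft(np.array(frames), axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)      # leading-frame noise estimate
    clean_mag = np.maximum(mag - noise_mag, 0.0)     # subtract, floor at zero
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), axis=1)
    out = np.zeros(len(signal))                      # overlap-add resynthesis
    for k, frame in enumerate(clean):
        out[k * hop:k * hop + frame_len] += frame
    return out
```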


Figure 3 The front-end of the speaker verification module (noisy signal → additive noise reduction by spectral suppression → endpoint detection using energy and zero-crossing rate → feature extraction with LPC → convolution noise reduction using delta coefficients, adaptive component weighting and discarding of the zero'th order cepstral coefficient → coefficient features).

As with the face recognition module, the speaker verification module also consists of two stages: (i) the training stage and (ii) the operational stage. At the training stage, two sample utterances of the same words from the same speaker are collected and trained using the modified k-means algorithm (Wilpon and Rabiner, 1985). The main advantages of this algorithm are the statistical consistency of the generated templates and their ability to cope with a wide range of individual speech variations in a speaker-independent environment.

At the operational stage, we opted for a well-known pattern-matching algorithm, Dynamic Time Warping (DTW) (Sakoe and Chiba, 1971), to compute the distance between the trained template and the input sample.

Let ϕ_2^0 represent the input speech sample with the claimed identity C and ϕ_2^C the representation of the Cth template. The similarity function between ϕ_2^0 and ϕ_2^C is defined as follows:

F_2(ϕ_2^0, ϕ_2^C) = || ϕ_2^0 − ϕ_2^C ||    (8)

where || ⋅ || denotes the distance score resulting from DTW.
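The distance of equation (8) can be sketched as a basic dynamic programming recursion over the two feature sequences. The symmetric step pattern and the length normalisation used here are common choices, not necessarily those of the original formulation used by the authors.

```python
import numpy as np

def dtw_distance(template, sample):
    """DTW distance between two sequences of feature vectors
    (arrays of shape (frames, dims)), as used for Eq. (8)."""
    n, m = len(template), len(sample)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - sample[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m] / (n + m)   # length-normalised distance (a common choice)
```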

3. Fusion Decision Module

3.1 k-NN (Nearest Neighborhood) Classifier

The k-NN classifier (Duda and Hart, 1973) is a simple classifier that needs no specific training phase. All that is required is a set of reference data points for both classes, representing the genuine users and the imposters. A pattern x_s to be classified is then attributed the same class label as the label of the majority of its k nearest reference neighbors.

To find the k nearest neighbors, the Euclidean distance between the test point and all the reference points is calculated. The distances are then ranked in ascending order and the reference points corresponding to the k smallest Euclidean distances are taken. The exhaustive distance calculation step during the test phase rapidly leads to large computing times, which is the major drawback of this otherwise very simple algorithm. A special case of the k-NN classifier is obtained when k = 1; in this case, the classifier is called the Nearest Neighbor (NN) classifier.
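For reference, a minimal sketch of the ordinary k-NN decision on the fused 2-dimensional score vector might look as follows; the label encoding (1 = genuine, 0 = imposter) and the tie-breaking choice are assumptions.

```python
import numpy as np

def knn_classify(x, ref_points, ref_labels, k=3):
    """Ordinary k-NN vote on a 2-D fused score vector x.

    ref_points : (N, 2) genuine and imposter reference score vectors
    ref_labels : (N,) labels, e.g. 1 = genuine, 0 = imposter
    """
    dists = np.linalg.norm(ref_points - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # k smallest distances
    votes = ref_labels[nearest].astype(int)
    counts = np.bincount(votes, minlength=2)
    return int(np.argmax(counts))                    # majority vote; ties favour label 0
```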

3.2 Evidence Theoretic k-NN Classifier Based on Dempster-Shafer Theory

In Dempster-Shafer theory, a problem is represented by a set Θ of mutually exclusive and exhaustive hypotheses called the frame of discernment (Shafer, 1976). A basic belief assignment (BBA) is a function m : 2^Θ → [0, 1] verifying m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1. The quantity m(A) can be interpreted as the measure of the belief that is committed exactly to A, given the available evidence. Two evidential functions derived from the BBA are the credibility function Bel and the plausibility function Pl, defined respectively as Bel(A) = Σ_{B⊆A} m(B) and Pl(A) = Σ_{A∩B≠∅} m(B) for all A ⊆ Θ. A probability distribution P such that Bel(A) ≤ P(A) ≤ Pl(A) for any subset A of Θ is said to be compatible with m. An example of such a probability distribution is the pignistic probability distribution BetP, defined for each single hypothesis θ ∈ Θ as

BetP(\theta) = \sum_{A \ni \theta} \frac{m(A)}{|A|}    (9)

Any subset A of Θ such that m(A) > 0 is called a focal element of m. Given two BBAs m_1 and m_2 representing two independent sources of evidence, Dempster's rule of combination defines a new BBA m = m_1 ⊕ m_2 verifying m(∅) = 0 and

m(A) = \frac{1}{K} \sum_{A_1 \cap A_2 = A} m_1(A_1)\, m_2(A_2)    (10)

for all A ≠ ∅, where K is defined by

K = \sum_{A_1 \cap A_2 \neq \emptyset} m_1(A_1)\, m_2(A_2)    (11)
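For the two-class frame used here, with focal elements restricted to the singletons and Ω, equations (10) and (11) reduce to a few products. The sketch below combines two such BBAs; the dictionary keys are my own naming, not notation from the paper.

```python
def dempster_combine(m1, m2):
    """Dempster's rule (Eqs. 10-11) for two BBAs over the frame {w1, w2},
    each given as a dict with masses on 'w1', 'w2' and 'omega' summing to 1."""
    conflict = m1['w1'] * m2['w2'] + m1['w2'] * m2['w1']   # mass sent to the empty set
    K = 1.0 - conflict                                     # normalising factor, Eq. (11)
    return {
        'w1': (m1['w1'] * m2['w1'] + m1['w1'] * m2['omega']
               + m1['omega'] * m2['w1']) / K,
        'w2': (m1['w2'] * m2['w2'] + m1['w2'] * m2['omega']
               + m1['omega'] * m2['w2']) / K,
        'omega': m1['omega'] * m2['omega'] / K,
    }

# e.g. dempster_combine({'w1': 0.7, 'w2': 0.1, 'omega': 0.2},
#                       {'w1': 0.5, 'w2': 0.2, 'omega': 0.3})
```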

For the bimodal biometric verification system, we consider the problem of classifying entities into M = 2 categories or classes. The set of classes is denoted by Ω = {ω_1, ω_2}. The available information is assumed to consist of a training set T = {(x^(1), ω^(1)), …, (x^(N), ω^(N))} of 2-dimensional patterns x^(i), i = 1, …, N, and their corresponding class labels ω^(i) taking values in Ω. The similarity between patterns is assumed to be correctly measured by a certain distance function d(⋅,⋅).

Let x be the test input to be classified on the basis of the information contained in T. Each pair (x^(i), ω^(i)) constitutes a distinct item of evidence regarding the class membership of x. If x is "close" to x^(i) according to the relevant metric d, then one will be inclined to believe that both vectors belong to the same class. On the contrary, if d(x, x^(i)) is very large, then the consideration of x^(i) will leave us in a situation of almost complete ignorance concerning the class of x. Consequently, this item of evidence may be postulated to induce a basic belief assignment (BBA) m(⋅ | x^(i)) over Ω defined by:

m(ω_q | x^(i)) = α φ_q(d_i)    (12)
m(Ω | x^(i)) = 1 − α φ_q(d_i)    (13)
m(A | x^(i)) = 0, ∀ A ∈ 2^Ω \ {Ω, {ω_q}}    (14)

where d_i = d(x, x^(i)), ω_q is the class of x^(i) (ω^(i) = ω_q), α is a parameter such that 0 < α < 1, and φ_q is a decreasing function verifying φ_q(0) = 1 and lim_{d→∞} φ_q(d) = 0. When d denotes the Euclidean distance, a rational choice for φ_q is

φ_q(d) = exp(−γ_q d²)    (15)

with γ_q a positive parameter associated with class ω_q.

As a result of considering each training pattern in turn, we obtain N BBAs that can be combined using Dempster's rule of combination to form a resulting BBA m synthesizing one's final belief regarding the class of x:

m = m(⋅ | x^(1)) ⊕ ⋯ ⊕ m(⋅ | x^(N))    (16)

Since those training patterns situated far from x actually provide very little information, it is sufficient to consider the k nearest neighbors of x in this combination. An alternative definition of m is therefore

m = m(⋅ | x^(i_1)) ⊕ ⋯ ⊕ m(⋅ | x^(i_k))    (17)

where I_k = {i_1, …, i_k} contains the indices of the k nearest neighbors of x in T. This is illustrated in Figure 4.

Figure 4 Evidence theoretic k-NN classifier: each of the k nearest training points (drawn from the class 1 and class 2 reference sets) returns a BBA with respect to the test point x, and the BBAs are combined as m = m(⋅ | x^(i_1)) ⊕ ⋯ ⊕ m(⋅ | x^(i_k)).


The k nearest neighbors of x can be regarded as k independent sources of information, each one represented by a BBA. Note that the calculation of m involves the combination of k simple BBAs and can therefore be performed in linear time with respect to the number of classes, here M = 2.

Adopting the definitions in equations (10), (11) and (17), m can be shown to have the following expression:

m(\omega_q) = \frac{1}{K} \Big( 1 - \prod_{i \in I_{k,q}} (1 - \alpha\,\phi_q(d_i)) \Big) \prod_{r \neq q}\; \prod_{i \in I_{k,r}} (1 - \alpha\,\phi_r(d_i)), \quad \forall q \in \{1, 2\}    (18)

m(\Omega) = \frac{1}{K} \prod_{r=1}^{M}\; \prod_{i \in I_{k,r}} (1 - \alpha\,\phi_r(d_i))    (19)

where I_{k,q} is the subset of I_k corresponding to those neighbors of x belonging to class ω_q and K is a normalizing factor. Hence the focal elements of m are the singletons and the whole frame Ω. Consequently, the credibility and the plausibility of each class ω_q are respectively equal to:

bel(ω_q) = m(ω_q)    (20)

pl(ω_q) = m(ω_q) + m(Ω)    (21)

As for the pignistic probability in equation (9), we can write it in the form:

BetP(ω_q) = m(ω_q) + m(Ω)/M    (22)

A decision is made by assigning a pattern to the class ω_{q_max} with maximum credibility (or, equivalently, with maximum plausibility or pignistic probability):

q_max = arg max_q m(ω_q)    (23)
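Putting equations (12)-(19) and the decision rule (23) together, a compact sketch of the evidence theoretic k-NN classifier for the two-class case is given below. The parameter values α = 0.95 and γ_q = 1 are illustrative only; the paper does not report the values actually used.

```python
import numpy as np

def evidential_knn(x, ref_points, ref_labels, k=10, alpha=0.95, gamma=(1.0, 1.0)):
    """Evidence theoretic k-NN for two classes (labels 0 and 1).

    Returns the masses (m({w0}), m({w1}), m(Omega)) of Eqs. (18)-(19);
    the decision of Eq. (23) is the class with the larger singleton mass.
    """
    dists = np.linalg.norm(ref_points - x, axis=1)
    nearest = np.argsort(dists)[:k]                    # the index set I_k

    # prod[q] accumulates  prod_{i in I_{k,q}} (1 - alpha * phi_q(d_i)).
    prod = [1.0, 1.0]
    for i in nearest:
        q = int(ref_labels[i])
        phi = np.exp(-gamma[q] * dists[i] ** 2)        # Eq. (15)
        prod[q] *= 1.0 - alpha * phi

    m = np.array([(1.0 - prod[0]) * prod[1],           # unnormalised m({w0})
                  (1.0 - prod[1]) * prod[0],           # unnormalised m({w1})
                  prod[0] * prod[1]])                   # unnormalised m(Omega)
    m /= m.sum()                                        # normalisation by K
    return m

# Decision rule of Eq. (23):
# m = evidential_knn(x, ref_points, ref_labels)
# q_max = int(np.argmax(m[:2]))
```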

The main advantage of the evidence theoretic k-NN classifier over the other classifiers is that reject options are provided (Hellman, 1970). Ambiguity reject is required when the pattern under consideration is located close to class boundaries. In this case, the probability of misclassification is high, and it may be preferable to postpone decision making. Distance reject is applied when the pattern is situated far from all known classes. In such a situation, it may be assumed that the pattern actually belongs to an unknown class that is not represented in the training set. However, distance rejection is not suited to the identity verification application because the classes are already known.

Given a vector x to be classified, ambiguity reject can be decided by comparing the confidence measure conf to a user-defined threshold k_r (0 ≤ k_r ≤ 1):

If conf ≥ k_r, then x is ambiguity-rejected    (24)

4. Experiments and Discussion

4.1 Distance Score Normalization

The similarity measure values from equations (7) and (8) have different ranges and hence cannot be fused directly. They have to be mapped into a common score interval [0, 1].

A high score indicates that the person is genuine, while a low score suggests that the person is an imposter. The opinions from the modality experts are used by a fusion stage, also referred to as a decision stage, which considers the opinions and makes the final decision to either accept or reject the claim.


The bimodal biometric system is designed as shown in Figure 5.

Figure 5 The building blocks of the bimodal face and speech verification system (the face image goes to the face expert and the speech utterance to the speech expert; the normalized scores are passed to the fusion stage, which outputs accept/reject).

From the distance scores x produced on the speech and face databases, the mean µ and the variance σ² of the distance values of the speech and face experts, respectively, are found separately by performing validation experiments on the database. The distance score is then normalized by mapping it to the range [−1, 1] using

y = \frac{x - \mu}{\sigma}    (25)

The [−1, 1] interval corresponds to the approximately linear portion of the sigmoid function

f(y) = \frac{1}{1 + \exp(-y)}    (26)

which is used to map the values to the [0, 1] interval.

interval. Figure 6 shows the distribution plot for

the genuine and imposter reference points obtained for the mapping technique.

Legend: Genuine Reference Point Imposter Reference Point Figure 6 The distribution plot for the genuine and imposter reference point


4.2 Performance Criteria

The basic error measures of a verification system are the false acceptance rate (FAR) and the false rejection rate (FRR), as defined in equations (27) and (28).

FAR = \frac{\text{Number of accepted imposter claims}}{\text{Total number of imposter accesses}} \times 100\%    (27)

FRR = \frac{\text{Number of rejected genuine claims}}{\text{Total number of genuine accesses}} \times 100\%    (28)

A unique measure can be obtained by combining these two errors into the total error rate (TER) or total success rate (TSR), where

TER = \frac{FAR + FRR}{\text{Total number of accesses}} \times 100\%    (29)

TSR = 1 − TER    (30)

For the individual experts, the EER (i.e. the point where FAR = FRR) is used to obtain their corresponding threshold values.

For the targeted application that uses the bimodal biometric verification system, the Minimum Total Misclassification Error (MTME) criterion is used, which means the system always tries to minimize ε as shown in equation (31):

ε = min(FA + FR)    (31)

In order to apply this criterion, we set FAR < 1% while keeping the FRR to the minimum possible value.
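For completeness, the performance figures can be computed from raw counts as sketched below. Equation (29) is read here as the number of false acceptances plus false rejections over the total number of accesses, which is an interpretation on my part but matches the TER and TSR values reported in the tables.

```python
def error_rates(fa, fr, n_imposter, n_genuine):
    """FAR, FRR (Eqs. 27-28) and TER, TSR (Eqs. 29-30), all in percent.

    fa, fr : counts of falsely accepted imposter claims and falsely
             rejected genuine claims (this reading of Eq. (29) is assumed)."""
    far = 100.0 * fa / n_imposter
    frr = 100.0 * fr / n_genuine
    ter = 100.0 * (fa + fr) / (n_imposter + n_genuine)
    return far, frr, ter, 100.0 - ter
```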

4.3 Experimental Setup

All experiments are performed using a face database obtained from the Olivetti Research Lab (2000), a speech database contributed by the Otago speech corpus (2000) and the Digit multi-modal database (Sanderson, 2001). Three sessions of the face database and speech database are used separately. The first enrollment session is used for training: each access is used to model the respective genuine user, yielding 60 different genuine models. In the second enrollment session, the accesses from each person are used to generate the validation data in two different manners. The first is to derive a single genuine access by matching the shot or utterance template of a specific person with his own reference model, and the other is to generate 59 imposter accesses by matching it to the 59 models of the other persons in the database. This simple strategy thus leads to 60 genuine and 3540 imposter accesses, which are used to validate the performance of the individual verification systems and to calculate the thresholds for the equal error rate (EER) criterion. The third enrollment session is used to test these verification systems, using the thresholds calculated from the validation data set.

4.4 Experimental Results

The performance of the speech and face experts is shown in Table 1.

        FRR(%)  FAR(%)  TER(%)  TSR(%)
Speech  8.333   6.497   6.528   93.472
Face    5.000   6.017   6.000   94.000

Table 1 Individual performance of the face and speech experts.


From the TSR values in Table 1, we can observe that the two experts individually perform about equally well.

The results shown in Table 2 are obtained by applying the k-NN technique.

k   FRR(%)  FAR(%)  TER(%)  TSR(%)
1   8.333   0.169   0.306   99.694
2   8.333   0.169   0.306   99.694
3   8.333   0.141   0.278   99.722
4   10.000  0.141   0.306   99.694
5   10.000  0.113   0.278   99.722
10  10.000  0.113   0.278   99.722

Table 2 Results from the k-NN fusion scheme.

From Table 2, it can be observed that the FRR increases but the FAR decreases when k increases, due to the unbalanced number of imposter reference points (3540) compared to genuine reference points (60). According to Verlinde and Chollet (1999), the number of imposter reference points can be reduced using the k-means algorithm, although this operation induces a loss of information: it creates cluster center points which replace the actual imposter reference points. Thus, we can create a varying number of clusters R and vary k. The best experimental result is obtained at k = 3 for the targeted application, and the results are shown in Table 3.

R     FRR(%)  FAR(%)  TER(%)  TSR(%)
100   1.667   1.017   1.028   98.972
300   3.333   0.763   0.806   99.194
500   3.333   0.339   0.389   99.611
1000  3.333   0.396   0.444   99.556
1500  8.333   0.170   0.306   99.694
2000  8.333   0.170   0.306   99.694
3540  8.333   0.141   0.278   99.722

Table 3 Results for the modified k-NN fusion technique.

From Table 3, it is observed that the FRR increases and the FAR decreases with R, as expected. The optimal number of imposter prototypes R depends on the cost function specified by the application. Referring to the MTME criterion, which requires FAR < 1% while keeping the FRR to the minimum possible value, we found that R = 500 gives the best performance in the experiments. Although its TSR is smaller than that of the ordinary k-NN, it gives a much lower FRR.
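The imposter-prototype reduction described above can be sketched as a plain k-means clustering of the 3540 imposter reference points into R centres, which then replace the originals in the ordinary k-NN rule. The paper does not specify the k-means variant used, so the sketch below is a generic Lloyd iteration with assumed parameters.

```python
import numpy as np

def reduce_imposters(imposter_points, R, n_iter=50, seed=0):
    """Replace the imposter reference points by R k-means cluster centres."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(imposter_points), R, replace=False)
    centres = imposter_points[idx].astype(float)
    for _ in range(n_iter):
        # Assign every imposter point to its nearest centre ...
        d = np.linalg.norm(imposter_points[:, None, :] - centres[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # ... then move each centre to the mean of its assigned points.
        for r in range(R):
            members = imposter_points[assign == r]
            if len(members):
                centres[r] = members.mean(axis=0)
    return centres

# The R centres plus the 60 genuine points form the reduced reference set
# for the k-NN rule (k = 3 in the reported experiments).
```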

By using the evidence theoretic k-NN classifier, the results shown in Table 4 are obtained.

k   FRR(%)  FAR(%)  TER(%)  TSR(%)  Note
1   8.333   0.141   0.278   99.722  1 rejected
2   7.018   0.085   0.195   99.805  3 rejected
3   8.333   0.113   0.250   99.750  1 rejected
4   10.000  0.113   0.278   99.722  0 rejected
5   10.000  0.085   0.250   99.750  1 rejected
10  5.084   0.085   0.167   99.833  1 rejected

Table 4 Results for evidence theoretic k-NN fusion based on Dempster-Shafer theory.

From the table, we find that the results are insensitive to the choice of k and outperform both the ordinary and modified k-NN classifiers. When k = 10, the smallest values of FRR and FAR are obtained. In addition, this classifier provides an uncertainty management engine which can be used to improve the system performance: in the operational system, a user whose claim is rejected as ambiguous is asked to perform another session of authentication.

5. Conclusion

This paper has presented comparisons of fusion decision techniques based on the k-NN classifier and its variations for a biometric verification system. The system consists of speech and face experts developed separately and targeted for applications involving automatic verification using personal computers and their multimedia capturing devices. In addition, the system is designed to keep the rate at which an imposter is accepted as genuine as low as possible. The fusion decision schemes considered are the ordinary and modified k-NN classifiers and the evidence theoretic k-NN classifier based on Dempster-Shafer theory.

From the experiments, it is found that the best result is obtained using the evidence theoretic k-NN classifier, as it yields lower FAR and FRR than both the ordinary and modified k-NN classifiers. It also provides an uncertainty management engine which can improve the system performance effectively.

6. Acknowledgement

The authors are indebted to the Ministry of Science, Technology and Environment for funding this project under IRPA 09-02-02-0050.

References

Bigun, E., Bigun, J., Duc, B. and Fisher, S. (1997), "Expert conciliation for multi modal person authentication systems by Bayesian statistics". Proceedings of the First International Conference on Audio and Video-based Biometric Person Authentication, 327-334.

Brunelli, R. and Falavigna, D. (1995), "Personal Identification Using Multiple Cues". IEEE Trans. on Pattern Analysis and Machine Intelligence 17(10): 955-966.

Campbell, J. (1997), "Speaker Recognition: A Tutorial". Proceedings of the IEEE 85(9): 1437-1462, September 1997.

Choudhury, T., Clarkson, B., Jebara, T. and Pentland, A. (1999), "Multimodal Person Recognition using Unconstrained Audio and Video". Second International Conference on Audio- and Video-based Biometric Person Authentication, 176-181.

Olivetti Research Lab (2000), Database of faces. http://www.cam-orl.co.uk/facedatabase.html. Accessed 21 April 2000.

Denoeux, T. (1994), "A k-nearest neighbor classification rule based on Dempster-Shafer theory". IEEE Trans. Syst. Man Cybern. SMC-25(5): 804-813.

Dieckmann, U., Plankensteiner, P. and Sesam, T. W. (1997), "A biometric person identification system using sensor fusion". Pattern Recognition Letters 18(9): 827-833.

Duc, B., Maître, G., Fischer, S. and Bigun, J. (1997), "Person authentication by fusing face and speech information". Proceedings of the First International Conference on Audio and Video-based Biometric Person Authentication, LNCS-1206: 311-318.

Duda, R. O. and Hart, P. E. (1973), Pattern Classification and Scene Analysis. New York: John Wiley & Sons.

Hellman, M. F. (1970), "The k nearest neighbor classification rule with a reject option". IEEE Trans. Syst. Man Cybern. SMC-6(3): 155-165.

Hong, L. and Jain, A. (1998), "Integrating Faces and Fingerprints for Personal Identification". IEEE Transactions on Pattern Analysis and Machine Intelligence 20(12): 1295-1307.

Kittler, J., Hatef, M., Duin, R. P. W. and Matas, J. (1998), "On combining classifiers". IEEE Transactions on Pattern Analysis and Machine Intelligence 20(3): 226-239.

Martin, R. (1994), "Spectral Subtraction Based on Minimum Statistics". Proc. Seventh European Signal Processing Conference, 1182-1185.

Moghaddam, B. and Pentland, A. (1995), "Probabilistic Visual Learning for Object Detection". 5th International Conference on Computer Vision, 786-793.

Otago (2000), Otago Speech Corpus. http://kel.otago.ac.nz/hyspeech/corpusinfo.html. Accessed 1 March 2000.

Pigeon, S. and Vandendorpe, L. (1998), "Multiple experts for robust face authentication". SPIE Optical Security and Counterfeit Deterrence II 3314: 166-177.

Rabiner, L. and Juang, B. H. (1993), Fundamentals of Speech Recognition. United States: Prentice-Hall International, Inc.

Shafer, G. (1976), A Mathematical Theory of Evidence. Princeton: Princeton University Press.

Sakoe, H. and Chiba, S. (1971), "A Dynamic Programming Approach to Continuous Speech Recognition". Proc. 7th Int. Congress on Acoustics 20: C13.

Samad, S. A., Hussein, A. and Teoh, A. (2001), "Eye Detection Using Hybrid Rule Based Method and Contour Mapping". Proceedings of the Sixth International Symposium on Signal Processing and Its Applications, 631-634, Kuala Lumpur, Malaysia, August 2001.

Samad, S. A., Hussein, A. and Teoh, A. (2001), "Increasing Robustness in a Speaker Verification System with Template Training and Noise Reduction Techniques". Proceedings of the International Conference on Information Technology and Multimedia, UNITEN, August 2001.

Sanderson, C. (2001), "Digit Database 1.0 - multimodal database for speaker identification/recognition". http://spl.me.gu.edu.au/digit/. Accessed 1 May 2001.

Turk, M. and Pentland, A. (1991), "Face Recognition Using Eigenfaces". Journal of Cognitive Neuroscience 3(1): 71-86.

Verlinde, P. and Chollet, G. (1999), "Comparing decision fusion paradigms using k-NN based classifiers, decision trees and logistic regression in a multi-modal identity verification application". Second International Conference on Audio and Video-based Biometric Person Authentication (AVBPA), Washington D.C., USA, March 1999.

Wilpon, J. G. and Rabiner, L. R. (1985), "A Modified K-Means Clustering Algorithm for Use in Isolated Word Recognition". IEEE Trans. Acoustics, Speech, and Signal Processing 33(3): 587-597.

Yacoub, S. B. (1999), "Multi-Modal Data Fusion for Person Authentication using SVM". Proceedings of the Second International Conference on Audio and Video-based Biometric Person Authentication (AVBPA'99), 25-30.

Zilovic, M. S., Ramachandran, R. P. and Mammone, R. J. (1997), "A Fast Algorithm for Finding the Adaptive Component Weighting Cepstrum for Speaker Recognition". IEEE Transactions on Speech & Audio Processing 5: 84-86.