Ordinal Feature Selection for Iris and Palmprint Recognition+Report 2

8/10/2019 Ordinal Feature Selection for Iris and Palmprint Recognition+Report 2


Ordinal Feature Selection for Iris and Palmprint Recognition

M.Tech in Signal Processing, SIT, Tumkur Page 3

variations of the parameters in multi-lobe ordinal filter can lead to an extremely huge feature set

of ordinal measures. For example, each basic Gaussian lobe in MOF has five parameters, i.e., x-

location, y-location, x-scale, y-scale and orientation. Thus there are totally 10 variables in a di-

lobe ordinal filter and 15 tunable parameters in a tri-lobe ordinal filter. Supposing that each

variable has ten possible values, the number of all possible di-lobe and tri-lobe ordinal measures

in a biometric image is at least in the order of $10^{10}$ and $10^{15}$ respectively. Although in general

ordinal measures are good descriptors for biometric feature representation, there are significant

differences between various ordinal features in terms of distinctiveness and robustness. Since the

primitive image structures vary greatly across different biometric modalities in terms of shape,

orientation, scale, etc., there does not exist a generic feature set of ordinal measures which can

achieve the optimal recognition performance for all biometric modalities. Even for the same

biometric modality, the existing individual difference in visual texture pattern determines that the optimal ordinal features may vary from person to person. Moreover the redundancy among

different ordinal features should be reduced and it has been proven that it is possible to only use a

small number of ordinal features to achieve high accuracy in iris and palmprint biometrics.

Therefore it is unnecessary to extract all ordinal features because of the redundancy in the over-

complete set of ordinal feature representation. Based on the above analysis, a much smaller subset

of ordinal measures must be selected from the original feature space as a compact biometric

representation, into which the characteristics of visual biometric patterns should be incorporated,

for the purpose of efficient biometric identification.
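The size of the candidate set quoted above follows from simple counting. A minimal sketch, assuming (as in the text) five parameters per lobe and ten candidate values per parameter; the function name is illustrative only:

```python
def count_ordinal_filters(lobes, params_per_lobe=5, values_per_param=10):
    """Number of distinct multi-lobe filters when every parameter is chosen independently."""
    return values_per_param ** (lobes * params_per_lobe)

di_lobe = count_ordinal_filters(2)    # 10 tunable parameters
tri_lobe = count_ordinal_filters(3)   # 15 tunable parameters
print(f"di-lobe: {di_lobe:.0e}, tri-lobe: {tri_lobe:.0e}")
```

Even before image locations are multiplied in, exhaustive evaluation of such a set is impractical, which is why a selection step is needed.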





II. RELATED WORK

Feature selection is a key problem in pattern recognition and has been extensively studied.

However, finding an optimal feature subset is usually intractable and in most cases there are only

solutions to suboptimal feature selection. Since no generic feature selection methods are

applicable to all problems, a number of feature selection methods have been proposed. These

methods employ different optimization functions and searching strategies for feature selection.

For example, the criteria of Max-Dependency and Max-Relevance are used to formulate an
optimization-based feature selection method, mRMR. Minimum redundancy feature selection is

an algorithm frequently used in a method to accurately identify characteristics of genes and

phenotypes and narrow down their relevance and is usually described in its pairing with relevant

feature selection as Minimum Redundancy Maximum Relevance (mRMR). Feature selection,

one of the basic problems in pattern recognition and machine learning, identifies subsets of data

that are relevant to the parameters used and is normally called Maximum Relevance. These

subsets often contain material which is relevant but redundant and mRMR attempts to address

this problem by removing those redundant subsets. mRMR has a variety of applications in many

areas such as cancer diagnosis and speech recognition. Features can be selected in many different

ways. One scheme is to select features that correlate strongest to the classification variable. This

has been called maximum-relevance selection. Many heuristic algorithms can be used, such as the sequential forward, backward, or floating selections. On the other hand features can be

selected to be mutually far away from each other while still having "high" correlation to the

classification variable. This scheme, termed as Minimum Redundancy Maximum Relevance

(mRMR) selection has been found to be more powerful than the maximum relevance selection.
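The difference between maximum-relevance and mRMR selection can be illustrated with a small greedy sketch. It is not the reference mRMR implementation: absolute Pearson correlation is used here as a simple stand-in for mutual information, the "relevance minus mean redundancy" score is one common variant, and all names and the toy data are hypothetical.

```python
from math import sqrt

def abs_corr(u, v):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return abs(cov / (su * sv)) if su > 0 and sv > 0 else 0.0

def mrmr_select(features, target, k):
    """Greedy selection maximising relevance minus mean redundancy.

    `features` is a list of columns (one list of values per feature).
    """
    relevance = [abs_corr(col, target) for col in features]
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < k:
        def score(i):
            redundancy = sum(abs_corr(features[i], features[j]) for j in selected) / len(selected)
            return relevance[i] - redundancy
        candidates = [i for i in range(len(features)) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected

# Toy data: the target is 2*hi + lo, so `hi` and `lo` carry complementary information.
hi = [0, 0, 0, 0, 1, 1, 1, 1]
hi_copy = [0.1, 0.0, 0.0, 0.1, 1.0, 0.9, 1.0, 0.9]   # nearly redundant with `hi`
lo = [0, 1, 0, 1, 0, 1, 0, 1]
target = [0, 1, 0, 1, 2, 3, 2, 3]
selected = mrmr_select([hi, hi_copy, lo], target, 2)
print(selected)
```

On this toy data a pure maximum-relevance ranking would pick `hi` and then `hi_copy`, whereas the redundancy term makes the sketch prefer `lo` as the second feature.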

ReliefF is a simple yet efficient feature selection method suitable for problems with strong

dependencies between features. ReliefF has been regarded as one of the most successful

strategies in feature selection because the key idea of the ReliefF is to estimate the quality of

features according to how well their values distinguish between instances that are near to each

other. RELIEF is a feature selection algorithm used in binary classification (generalisable to

polynomial classification by decomposition into a number of binary problems) proposed by Kira

and Rendell in 1992. Its strengths are that it is not dependent on heuristics, requires only linear

time in the number of given features and training instances, and is noise-tolerant and robust to



feature interactions, as well as being applicable for binary or continuous data; however, it does

not discriminate between redundant features, and low numbers of training instances fool the

algorithm. Kononenko et al. proposed some updates to the algorithm (RELIEFF) in order to

improve the reliability of the probability approximation, make it robust to incomplete data, and

generalise it to multi-class problems.
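The core idea of RELIEF, rewarding features that differ on the nearest sample of the other class and penalising features that differ on the nearest sample of the same class, can be sketched as follows. This is a simplified two-class illustration, not the published algorithm: it assumes features scaled to [0, 1], uses absolute differences and Manhattan distance, and visits every sample once instead of sampling at random. The function name and data are hypothetical.

```python
def relief_weights(samples, labels):
    """Basic two-class Relief-style feature weights (deterministic variant)."""
    n, d = len(samples), len(samples[0])
    weights = [0.0] * d

    def distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (x, y) in enumerate(zip(samples, labels)):
        others = [j for j in range(n) if j != i]
        # Nearest neighbour with the same label (hit) and with the other label (miss).
        hit = min((j for j in others if labels[j] == y), key=lambda j: distance(x, samples[j]))
        miss = min((j for j in others if labels[j] != y), key=lambda j: distance(x, samples[j]))
        for f in range(d):
            weights[f] += (abs(x[f] - samples[miss][f]) - abs(x[f] - samples[hit][f])) / n
    return weights

samples = [[0.1, 0.5], [0.2, 0.9], [0.9, 0.4], [0.8, 0.8]]
labels = [0, 0, 1, 1]
weights = relief_weights(samples, labels)
print(weights)   # feature 0 separates the classes, feature 1 does not
```

Note that two identical copies of a good feature would both receive a high weight, which matches the remark above that the method does not discriminate between redundant features.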

Most research works on feature selection mainly focus on generic pattern classification

applications rather than specific applications in biometrics. This paper mainly addresses the

efficient feature selection methods applicable to biometric authentication. Boosting and Lasso

have proven to be well-performing feature selection methods in face recognition.

Boosting has become a popular approach used for both feature selection and classifier design in

biometrics. The boosting algorithm aims to select a complementary ensemble of weak classifiers in a greedy manner. A reweighting strategy is applied for training samples to make sure that every

selected weak classifier should have a good performance on the hard samples which cannot be

well classified by the previously selected classifiers. Boosting has achieved good performance in

visual biometrics, including both face detection and face recognition. However, boosting cannot

guarantee a globally optimal feature set, and an overfitted result may be obtained if the training

data is not well designed.
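The greedy selection and reweighting described above can be sketched with a discrete AdaBoost loop over one-feature threshold classifiers. This is an illustrative sketch under stated assumptions, not the boosting variant used in the cited biometric systems: each feature is assumed to come with a fixed decision threshold, and all names and data are hypothetical.

```python
from math import exp, log

def boost_select(distances, labels, thresholds, rounds):
    """Discrete AdaBoost over one-feature threshold classifiers.

    distances[i][n]: dissimilarity of feature i on training pair n.
    labels[n]: +1 for an intra-class pair, -1 for an inter-class pair.
    thresholds[i]: fixed decision threshold of feature i (assumed given).
    Returns a list of (feature index, classifier weight).
    """
    n_pairs = len(labels)
    sample_w = [1.0 / n_pairs] * n_pairs
    # Weak classifier i predicts +1 (same identity) when the distance is below its threshold.
    preds = [[1 if d < t else -1 for d in row] for row, t in zip(distances, thresholds)]
    chosen = []
    for _ in range(rounds):
        errors = [sum(w for w, p, y in zip(sample_w, row, labels) if p != y) for row in preds]
        best = min(range(len(preds)), key=lambda i: errors[i])
        err = min(max(errors[best], 1e-10), 1 - 1e-10)
        alpha = 0.5 * log((1 - err) / err)
        chosen.append((best, alpha))
        # Increase the weight of pairs the new classifier gets wrong, then renormalise.
        sample_w = [w * exp(-alpha * y * p) for w, y, p in zip(sample_w, labels, preds[best])]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return chosen

labels = [1, 1, 1, -1, -1, -1]
distances = [
    [0.2, 0.3, 0.6, 0.7, 0.8, 0.9],   # feature 0: one mistake
    [0.1, 0.6, 0.2, 0.7, 0.4, 0.9],   # feature 1: two mistakes, but correct on pair 2
    [0.6, 0.6, 0.3, 0.4, 0.7, 0.8],   # feature 2: three mistakes
]
chosen = boost_select(distances, labels, [0.5, 0.5, 0.5], rounds=2)
print([index for index, _ in chosen])
```

After the first round the pair that feature 0 misclassifies carries half of the total weight, so the second round favours feature 1, which handles that hard pair correctly.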

Destrero et al. proposed a regularized machine learning method enforcing sparsity for feature
selection of face biometrics based on Lasso regression. The Lasso feature selection aims to solve
the following penalized least-squares problem:

$$ f_L = \arg\min_{f} \left\{ \|g - Af\|_2^2 + 2\tau \|f\|_1 \right\} \qquad (1) $$

where $g$ means the intra- or inter-class label (+1 or -1), the components of $A$ indicate the intra- or
inter-class matching results based on individual features in the training database, $f$ denotes the
feature weight vector, and $\tau$ is a parameter controlling the balance between regression errors and
sparsity of selected features. The objective function includes two parts. The first part $\|g - Af\|_2^2$
aims to minimize the regression errors and the second part $2\tau\|f\|_1$ uses L1 regularization to
enforce sparsity of the selected features. The L1 regularized sparse representation was

evaluated to be better than Boosting for face detection and authentication in small size training

dataset. However, this approach also has some drawbacks. Firstly, although the optimization





problem defined in sparse representation is possible to achieve a global minimum, it is not

efficient in implementation due to the non-linear objective function. Therefore a three-stage

architecture is necessary to solve a large learning problem. Secondly, the squared sum of

regression errors defined in the objective function makes the feature selection sensitive to

outliers. Thirdly, the class label g can only take the value either + 1 or - 1, therefore the model

could not generate a maximal margin. Margin analysis is important to the generalization ability

of machine learning algorithms and the most powerful machine learning methods, e.g. Support

Vector Machine and Boosting are motivated by margins. In addition, the features of training

samples should be normalized to match the class label, therefore additional computational cost is

needed. Fourthly, the model of Lasso is not flexible so that the optimization does not take into

account the characteristics of image features and biometric recognition. For example, L1

regularization term $\|f\|_1$ in the optimization function Eqn. 1 assigns an identical weight to all

features and the discriminative information of each feature is not taken into consideration.

The L1 regularization is a popular technique for feature selection. The objective function aims

to minimize misclassification error and the L1 norm of feature weight.
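To make the sparsity effect of the L1 penalty concrete, the sketch below minimises the penalized least-squares objective of Eqn. 1, written here as ||g - A f||_2^2 + 2*tau*||f||_1, by cyclic coordinate descent with soft-thresholding. This is a generic textbook solver chosen for illustration, not the multi-stage procedure of Destrero et al.; the symbol `tau`, the function names and the toy data are assumptions.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(A, g, tau, sweeps=200):
    """Minimise ||g - A f||_2^2 + 2*tau*||f||_1 by cyclic coordinate descent.

    A is a list of rows (one per training pair), g the list of +1/-1 labels.
    """
    n, d = len(A), len(A[0])
    f = [0.0] * d
    col_sq = [sum(A[r][j] ** 2 for r in range(n)) for j in range(d)]
    for _ in range(sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores feature j.
            rho = sum(A[r][j] * (g[r] - sum(A[r][k] * f[k] for k in range(d) if k != j))
                      for r in range(n))
            f[j] = soft_threshold(rho, tau) / col_sq[j]
    return f

A = [[1.0, 0.1], [1.0, -0.1], [-1.0, 0.1], [-1.0, -0.1]]
g = [1, 1, -1, -1]
f = lasso_cd(A, g, tau=0.5)
print(f)   # the uninformative second feature is driven exactly to zero
```

The penalty is the same for every feature, which is the limitation noted above: no per-feature prior knowledge enters the objective.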

In summary, both Boosting and Lasso have limitations in ordinal feature selection and it is

desirable to develop a feature selection method with the following properties.

1) The feature selection process can be formulated as a simple optimization problem. Here
simple means that both the objective function and the constraint terms can be defined
following a well-established standard optimization problem, so that it is easy to obtain a
global solution of the feature selection problem.

2) A sparse solution can be achieved in feature selection so that the selected feature set is

compact for efficient storage, transmission and feature matching.

3) The penalty of misclassification should not be a high-order function of regression errors, so
that the influence of outliers is controlled.

4) The model of feature selection should be flexible to take into account the characteristics of

the biometric recognition problem so that the genuine and imposter matching results can be

well separated from each other and the selected image features are accurate in training

database.

5) The feature selection problem has less dependence on the training data and it can be solved





by a small set of training samples. This requires that the feature selection method circumvent the

curse of dimensionality problem and generalize to practical applications.

This paper proposes a novel feature selection method which meets all requirements listed above.

In our method, the feature selection process of ordinal measures is formulated as a constrained optimization problem. Both the optimization objective and the constraints are modeled as linear

functions; therefore linear programming (LP) can be used to efficiently solve the feature

selection problem. The feature units used for LP formulation are regional ordinal measures,

which are tested on the training dataset to generate both intra- and inter-class matching

samples. Our feature selection method aims at finding a compact subset of ordinal features that

minimizes biometric recognition errors with large margin between intra and inter-class

matching samples. The objective function of our LP formulation includes two parts. The first

part measures the misclassification errors of training samples failing to follow a large margin

principle. And the second part indicates weighted sparsity of ordinal feature units. Traditional

sparse representation uses L1-norm to achieve sparsity of feature selection and all feature

components have an identical unit weight in sparse learning. However, we argue that it is better

to incorporate some prior information related to candidate feature units into sparse representation

so that the most individually accurate ordinal measures are given preferential treatment. And

the linear inequality constraints of LP optimization problem require that all intra- and inter-

class matching results are well separated from each other with a large margin. Slack variables

are introduced so that the inequality constraints can still be satisfied by ambiguous and outlier samples. A slack
variable is a variable that is added to an inequality constraint to transform it into an equality.

Introducing a slack variable replaces an inequality constraint with an equality constraint and a

non-negativity constraint.
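As a generic worked example of this definition (the numbers are arbitrary and not taken from the paper), an inequality constraint and its equivalent form with a slack variable $s$ read:

```latex
% Inequality form
2x_1 + x_2 \le 10
% Equivalent form: one equality plus one non-negativity constraint
2x_1 + x_2 + s = 10, \qquad s \ge 0
```

The value of $s$ is the unused room in the constraint; it is zero exactly when the original inequality holds with equality.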





III. FEATURE SELECTION BASED ON LINEAR PROGRAMMING

The objective of feature selection for biometric recognition is to select a limited number of
feature units from the candidate feature set (Fig. 2). In this paper, a feature unit is defined as the
regional ordinal encoding result using a specific ordinal filter on a specific biometric region. We
aim to use a machine learning technique to find the weights of all ordinal feature units, so that
feature selection can also be regarded as a sparse representation method, i.e. most weight values
are zero and only a compact set of feature units have the weighted contribution to biometric
recognition.

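In this paper, ordinal feature selection is formulated as a constrained optimization problem as
follows:

$$ \min_{w,\,\xi^+,\,\xi^-} \; \frac{\lambda_+}{N_+} \sum_{j=1}^{N_+} \xi_j^+ + \frac{\lambda_-}{N_-} \sum_{k=1}^{N_-} \xi_k^- + \sum_{i=1}^{D} P_i w_i \qquad (2) $$

subject to:

$$ \sum_{i=1}^{D} w_i x_{ij}^+ \le \alpha + \xi_j^+, \quad j = 1, \dots, N_+ \qquad (3) $$
$$ \sum_{i=1}^{D} w_i x_{ik}^- \ge \beta - \xi_k^-, \quad k = 1, \dots, N_- \qquad (4) $$
$$ \xi_j^+ \ge 0, \quad j = 1, \dots, N_+ \qquad (5) $$
$$ \xi_k^- \ge 0, \quad k = 1, \dots, N_- \qquad (6) $$
$$ w_i \ge 0, \quad i = 1, \dots, D \qquad (7) $$

where $D$ is the number of ordinal features available for feature selection, $N_+$ and $N_-$ denote the
number of intra- and inter-class biometric matching pairs in the training database respectively, $w_i$
means the weight of the ith ordinal feature for the biometric recognition system, $P_i$ measures the
recognition accuracy of the ith ordinal feature on the training database, $x_{ij}^+$ denotes the Hamming
distance of the ith ordinal feature for the jth intra-class biometric image pair in the training database, $x_{ik}^-$
denotes the Hamming distance of the ith ordinal feature for the kth inter-class iris image pair in the
training database, $\alpha$ and $\beta$ are two fixed parameters indicating the expected intra- and inter-class
biometric matching results respectively, $\xi_j^+$ and $\xi_k^-$ are slack variables for intra- and inter-class
biometric matching respectively, and $\lambda_+$ and $\lambda_-$ are the constant parameters tuning the importance of
intra- and inter-class matching results for the biometric recognition system respectively. The idea
of this feature selection method is illustrated in Fig. 3.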
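A minimal sketch of how such a formulation can be handed to an off-the-shelf LP solver is given below. It assumes the objective is the weighted slack penalty plus the weighted sparsity term sum_i P_i w_i, with the intra-class sums bounded above by alpha and the inter-class sums bounded below by beta, as the symbol definitions describe. `scipy.optimize.linprog` is used purely for illustration (the source reports Matlab experiments); the function name, argument names and toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def lp_feature_selection(x_pos, x_neg, P, alpha, beta, lam_pos=1.0, lam_neg=1.0):
    """Solve the soft-margin LP and return the feature weights w.

    x_pos: (D, N+) Hamming distances of intra-class pairs.
    x_neg: (D, N-) Hamming distances of inter-class pairs.
    P:     (D,) per-feature sparsity penalties.
    """
    D, n_pos = x_pos.shape
    n_neg = x_neg.shape[1]
    # Variable order: [w (D), xi+ (N+), xi- (N-)].
    c = np.concatenate([P, np.full(n_pos, lam_pos / n_pos), np.full(n_neg, lam_neg / n_neg)])
    # Intra-class:  sum_i w_i x+_ij - xi+_j <= alpha
    A_pos = np.hstack([x_pos.T, -np.eye(n_pos), np.zeros((n_pos, n_neg))])
    # Inter-class: -sum_i w_i x-_ik - xi-_k <= -beta
    A_neg = np.hstack([-x_neg.T, np.zeros((n_neg, n_pos)), -np.eye(n_neg)])
    A_ub = np.vstack([A_pos, A_neg])
    b_ub = np.concatenate([np.full(n_pos, alpha), np.full(n_neg, -beta)])
    # bounds=(0, None) enforces non-negativity of w and of both slack vectors.
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.x[:D]

x_pos = np.array([[0.10, 0.15, 0.12, 0.20],    # feature 0: small intra-class distances
                  [0.40, 0.45, 0.50, 0.42],    # feature 1: poorly separated
                  [0.30, 0.25, 0.35, 0.28]])   # feature 2: moderately separated
x_neg = np.array([[0.48, 0.52, 0.50, 0.47],
                  [0.46, 0.50, 0.44, 0.52],
                  [0.45, 0.50, 0.40, 0.48]])
P = np.array([0.1, 1.0, 0.3])                  # smaller penalty for the better feature
w = lp_feature_selection(x_pos, x_neg, P, alpha=0.25, beta=0.45)
print(np.round(w, 4))
```

On this toy problem only the first feature receives a non-zero weight, illustrating how the sparsity term and the margin constraints interact.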

The basic idea of the proposed feature selection method is to find a sparse representation of

ordinal features on the condition of large margin principle. On one hand, the intra and inter-class

biometric matching results are expected to be well separated with a large margin. On the other

hand, the number of selected ordinal features should be much smaller than the large number of

candidates. These two seemingly contradictory requirements are well integrated in our feature

selection method.

The objective function of LP formulation includes two parts motivated by the basic idea of
feature selection method. The first part of the objective function

$$ \frac{\lambda_+}{N_+} \sum_{j=1}^{N_+} \xi_j^+ + \frac{\lambda_-}{N_-} \sum_{k=1}^{N_-} \xi_k^- $$

aims to minimize the misclassification errors of intra- and inter-class matching samples according
to the expected thresholds $\alpha$ and $\beta$. Since $\alpha$ and $\beta$ are defined as the mean intra- and inter-class
Hamming distance for well performing ordinal features, a large margin principle is
actually incorporated into the optimization problem. The biometric matching samples failing to
meet the large margin requirement will suffer a penalty and such a penalty is determined by the
distance from the dissimilarity measure to the expected thresholds $\alpha$ and $\beta$. Here a soft margin
technique is adopted by introducing slack variables $\xi_j^+$ and $\xi_k^-$ to guarantee that all intra-class and
inter-class matching results follow the large margin principle. So the first part of objective
function

$$ \frac{\lambda_+}{N_+} \sum_{j=1}^{N_+} \xi_j^+ + \frac{\lambda_-}{N_-} \sum_{k=1}^{N_-} \xi_k^- $$

defines the overall penalty term of training samples according to the large margin principle. The
constant parameters $\lambda_+$ and $\lambda_-$ measure the penalty weights to the misclassifications of intra- and
inter-class matching samples respectively and their value can be tuned according to the
application requirements. For example, the FRR (False Reject Rate) sensitive applications such
as watch-list monitoring can set a larger $\lambda_+$ and the FAR (False Accept Rate) sensitive
applications such as banking can set a larger $\lambda_-$. In normal applications, we can set $\lambda_+ = \lambda_-$. In
summary, the objective function of the proposed LP feature selection method aims to minimize
the misclassification errors and enforce sparsity of the selected ordinal features simultaneously.
And the parameters $\lambda_+$ and $\lambda_-$ can balance the trade-off between accuracy and sparsity.

The second part of the objective function

$$ \sum_{i=1}^{D} P_i w_i $$

enforces weighted sparsity of ordinal feature units. Sparsity of the ordinal feature units is very

important to effective and efficient biometric recognition. Firstly, the objective of biometric

recognition is to find a mapping function between the most characterizing features and the

identity label. Sparse learning is just for this purpose and it is possible to discover the intrinsic

features of biometric patterns. Secondly, sparsity means that it is possible to use a compact

feature set for biometric recognition, i.e., efficient encoding, storage, transmission and

comparison of biometric feature templates. Weighted sparsity proposed in this paper is a novel

idea in sparse representation. It differs from the existing sparse representation method, in that the

good performing individual features in the training database are given a higher weight in sparse

learning. Here the weight Pi represents the prior information of individual ordinal feature in

terms of recognition performance. It may be defined as the Equal Error Rate (EER), the Area

Under the ROC Curve (AUC) or the inverse of Discriminating Index (1/D-index). Since the

weight of each ordinal measure wi is constrained to be non-negative value, the second part of

objective function approximates the L1 regularization which is beneficial to generate a sparse

ordinal feature set after feature selection. The L1 regularization term in sparse representation
(Eqn. 1) can be regarded as the special case of $\sum_{i=1}^{D} P_i w_i$
where $P_i = 1$ for all ordinal features. The prior information of each feature is not taken into
account in the Lasso method and all features are evenly treated to enforce sparsity. In our feature
selection method, better performing ordinal features are assigned smaller penalty weights $P_i$
(e.g. a lower EER or a lower 1/D-index), so that they receive preferential treatment and
a more compact and effective feature set can be selected.
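As an illustration of how a prior such as 1/D-index could be computed per feature, the sketch below assumes the discriminating index takes the usual decidability form |m_inter - m_intra| / sqrt((s_intra^2 + s_inter^2) / 2); the pages above do not spell out the definition, so this form, the function names and the toy distances are assumptions.

```python
from math import sqrt
from statistics import mean, pvariance

def discriminating_index(intra, inter):
    """Separation of intra- and inter-class distance distributions (decidability form)."""
    return abs(mean(inter) - mean(intra)) / sqrt((pvariance(intra) + pvariance(inter)) / 2)

def sparsity_penalty(intra, inter):
    """P_i = 1 / D-index: a well separated feature receives a small penalty."""
    return 1.0 / discriminating_index(intra, inter)

inter = [0.45, 0.50, 0.50, 0.55]
good_intra = [0.20, 0.30, 0.25, 0.25]   # far from the inter-class distances
poor_intra = [0.40, 0.50, 0.45, 0.45]   # overlaps the inter-class distances
p_good = sparsity_penalty(good_intra, inter)
p_poor = sparsity_penalty(poor_intra, inter)
print(round(p_good, 3), round(p_poor, 3))
```

Because the term sum_i P_i w_i is minimised, the feature with the smaller penalty is cheaper to include in the selected set.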





The LP formulation is subject to a set of linear inequality constraints. Eqn. 3 and Eqn. 4 require
that all intra- and inter-class matching samples in the training database should be well separated
based on a large margin principle. In fact, a large number of training samples close to the
decision boundaries cannot meet the large margin principle and these inter-class matching
results usually cannot be linearly separated. Therefore slack variables $\xi_j^+$ and $\xi_k^-$ are introduced to
the inequality constraints, which makes our model more flexible and robust. Our LP formulation
is actually a soft margin model which can remove the influence of noisy samples or outliers
adaptively and also generate a larger margin to improve the accuracy and generalization
performance with the help of slack variables. Eqn. 7 indicates a non-negative constraint on the
weight of features $w = \{w_i\}$. We argue that the non-negative constraint of w is both reasonable and
beneficial. Firstly, the target of feature selection is to find the optimal solution of w, which is a
very important variable with physical meaning. Each element in w denotes the contribution of
each ordinal feature to the success of biometric recognition. Since we are discussing a feature
selection method, each feature should only have positive contribution to the resulting large-margin
classification. Secondly, the second part of objective function $\sum_{i=1}^{D} P_i w_i$
is equal to a weighted L1 regularization term if $w_i$ is enforced to be non-negative, which can lead to a
sparse result of feature selection. Thirdly, the non-negative constraint of w is beneficial to a stable
solution of the LP optimization problem. For example, if $w_i < 0$, it means that the intra-class
Hamming distance of the ith ordinal feature may be generally larger than the inter-class Hamming
distance based on Eqn. 3 and Eqn. 4. Of course such a conclusion, contradicting the facts, may
bring unstable factors to the LP learning problem.

The feature selection method proposed in this paper has a different optimization formulation to

the existing LP method in terms of the weighted sparsity term in objective function and non-negative constraint of feature weights. Therefore our method is more suitable to learn

discriminant, robust and sparse features for biometric recognition.

It should be noted that our LP formulation is flexible and a number of variants may be generated

to meet the requirements of some specific feature selection applications. For example, the LP





State-of-the-art iris and palmprint recognition methods and representative feature selection

methods are evaluated on the CASIA and PolyU biometrics databases for performance

comparison to show the merit of the proposed LP formulation. It should be noted that the main

purpose of this paper is to discover the most effective ordinal features for iris and palmprint

recognition. It can be regarded as a specific feature selection problem. Two representative

methods in generic feature selection, i.e. mRMR and ReliefF, and two popular feature selection

methods in biometrics, i.e. Boosting and Lasso are used for performance comparison.





IV. ORDINAL FEATURE SELECTION FOR IRIS RECOGNITION

Previous works have demonstrated the effectiveness of ordinal measures for iris recognition and

there are a large number of stable ordinal measures in iris images. However, how to choose the

most effective feature set of ordinal measures for reliable iris recognition is still an unsolved

problem. In earlier methods, a di-lobe and a tri-lobe ordinal filter were jointly used for iris feature

extraction. The parameter settings of these ordinal filters are hand-crafted and they are performed

on all iris image regions. However, the texture characteristics such as scale, orientation and

salient texture primitives of iris patterns vary from region to region. So it is a better solution to

employ a region specific ordinal filter for iris feature analysis.

It should be noted that the process of ordinal feature selection does not consider the prior mask
information of eyelids, eyelashes and specular reflections. There are mainly two kinds of strategies to
deal with the occlusion problem in iris recognition. The first is to segment and exclude occlusion
regions in iris images and label the regions using a mask in iris matching. But it needs accurate and
efficient iris segmentation. In addition, the size of the iris template doubles. More
importantly, the computational cost of both iris image preprocessing and iris matching is
significantly increased because of the iris mask strategy. So the second strategy, identifying and
excluding the heavily occluded iris images in the quality assessment stage, is more realistic. The remaining iris images
used for feature extraction and matching are less occluded by eyelids and eyelashes, which is
beneficial to both accuracy and efficiency of iris recognition. This paper aims to learn a common
ordinal feature set applicable to less occluded iris images of all subjects. The process of the
feature selection is independent of any individual or image specific prior information such as the iris
segmentation mask. We believe the commonly selected feature set should be accurate enough to
recognize almost all subjects because the individual or sample specific variations have already
been taken into consideration in feature selection. We have also tried to integrate the occlusion
mask into feature selection and feature matching but observed no improvement of accuracy on state-of-the-art
iris image databases, which have usually excluded heavily occluded iris images. We believe the
common ordinal features discovered in this paper are valuable for practical iris recognition
systems.

Iris texture varies from region to region in terms of scale, orientation, shape of texture
primitives, etc. So region specific ordinal filters are needed to achieve the best
performance. Therefore iris images are divided into multiple blocks and different types of ordinal
filters with different parameter settings are applied on each image block, so that feature selection
methods can be used to find the most effective set of image blocks with the most appropriate
setting of parameters. In this paper, the preprocessed and normalized iris image is divided into
multiple regions and a number of di-lobe and tri-lobe ordinal filters with variable scale, orientation
and inter-lobe distance are performed on each region to generate 47,042 regional ordinal
feature units (Fig. 2). Each feature unit, which is jointly determined by the spatial location of iris
region and the corresponding ordinal filter, is constituted by 256 ordinal measures or 32 Bytes in
feature encoding. The objective of feature selection is to select a limited number of OM feature
units from the candidate feature set.
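Since each feature unit is a 256-bit (32-byte) binary code, two units can be compared by counting differing bits. A minimal sketch, assuming the dissimilarity is the Hamming distance normalised by the code length; the function name and the example codes are hypothetical:

```python
def hamming_distance(code_a: bytes, code_b: bytes) -> float:
    """Fraction of differing bits between two equally long ordinal codes."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    differing = sum(bin(a ^ b).count("1") for a, b in zip(code_a, code_b))
    return differing / (8 * len(code_a))

unit_a = bytes([0b10101010] * 32)   # 256 ordinal bits = 32 bytes
unit_b = bytes([0b10101011] * 32)   # differs from unit_a in one bit per byte
print(hamming_distance(unit_a, unit_b))   # 0.125
```

Distances of this kind, computed per feature unit for every intra-class and inter-class training pair, are the inputs on which the feature selection methods operate.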

The experimental part of this paper aims to test and compare the proposed Linear Programming

(LP) method with four feature selection methods for ordinal iris feature analysis, i.e., Boosting,

Lasso, mRMR and ReliefF. All these feature selection methods used for selecting the effective

set of ordinal measures are simply named as LP-OM, Boost-OM, Lasso-OM, mRMR-OM and

ReliefF-OM respectively. In this paper, three iris image datasets in CASIA Iris Image Database

Version 4.0 (CASIA-IrisV4), namely CASIA-Iris-Thousand, CASIA-Iris-Lamp and CASIA-Iris-Interval,
are used in the experiments. To demonstrate the advantage of feature selection methods

for visual biometrics, a randomly selected ordinal feature set with the same number of feature

units is employed as the baseline algorithm. Such an ordinal feature representation method without

feature selection is denoted as Random-OM. To demonstrate the benefit of feature selection in

iris recognition, state-of-the-art iris recognition methods proposed by Daugman and Ma et al. are

implemented as the baseline algorithms. A number of hand-crafted parameter settings are tried

for these two methods and the best results are reported in this paper. The idea of sparse

representation of iris features has been recently proposed by Kumar using L1 regularization. So
the main feature selection method in that work can be represented by Lasso-OM.

CASIA-Iris-Thousand contains 20,000 iris images from 1,000 subjects. The samples in
CASIA-Iris-Thousand are 8-bit gray level iris images with resolution 640 × 480. The diameter of
the iris ring is around 200 pixels. And all iris images in CASIA-Iris-Thousand are compressed to
JPEG format to save storage memory. The main sources of





intra-class variations in CASIA-Iris-Thousand include illumination changes, motion blur,

eyeglasses, specular reflections, and JPEG compression. Since CASIA-Iris-Thousand is the





is approximately equivalent to the minimization of the L1 regularization term in the Lasso algorithm, which has a solid theory to guarantee a sparse learning result. To further investigate

the relationship between iris recognition performance and the number of ordinal feature units, the

discriminating index of top N ordinal features chosen by the three feature selection methods (LP-OM,
Boost-OM, Lasso-OM) is shown in Fig. 4b. The experimental results indicate saturation of
iris recognition performance as the number of ordinal feature units increases. And this
result demonstrates the necessity and possibility of sparse representation of ordinal measures in

iris images. Because a limited number of ordinal features are sufficient to achieve high accuracy,

only 15 ordinal feature units (i.e., 480 Bytes ordinal code, at 32 Bytes per unit) with the largest weights are selected to

build an iris recognition system for the feature selection methods in the following experiments.

It is interesting to investigate the parameter $P_i$ in the linear programming formulation. When P
is a uniform vector, i.e. all $P_i$ take the same value, the sparsity term $\sum_{i=1}^{D} P_i w_i$
reduces (up to a constant factor) to the L1 regularization term in the Lasso algorithm. We argue that it is better to incorporate

the prior information of each ordinal feature unit into the objective function to enforce the priority

of well-performing ordinal feature units in the training dataset. In the experiment on CASIA-Iris-Thousand,
four options of P (i.e., $P_i = 1/D$, $P_i = 1/\text{D-index}(OM_i)$, $P_i = AUC(OM_i)$, $P_i = EER(OM_i)$)
are tried to learn different ordinal feature sets for iris recognition. The testing
results of these four settings of parameter $P_i$ are shown in Fig. 5. The best iris
recognition result is clearly achieved when $P_i = 1/\text{D-index}(OM_i)$, which indicates the discriminating

index is the most important prior information of each ordinal feature unit. And the results also

demonstrate incorporation of discriminative penalty terms such as EER and AUC into feature

learning module can significantly improve biometric recognition accuracy.





Comparison results of the five feature selection methods and state-of-the-art iris recognition

methods on the testing dataset of CASIA-Iris-Thousand are shown in Fig. 6 and Table I. And the

baseline performance based on Random-OM is also listed in Table I.





TABLE I

COMPARISON OF PERFORMANCE OF IRIS RECOGNITION METHODS ON THE

CASIA-IRIS-THOUSAND

A number of conclusions can be drawn from the experimental results.

Ordinal features are effective for iris recognition. Even though we randomly select 15 ordinal

feature units from 47,042 candidates, it is possible to achieve a good recognition

performance (EER = 2.91%) on the largest iris dataset in the public domain.

The ordinal features automatically selected by most machine learning approaches (Boost-OM,

Lasso-OM, mRMR-OM and LP-OM) perform much better than randomly chosen ordinal

features (Random-OM). Therefore it is necessary to adopt feature selection methods to learn a

distinctive and robust ordinal feature set for iris recognition. The feature selection based iris

recognition methods perform significantly better than state-of-the-art methods. There are two

advantages of our methods. The first is the advantage of ordinal measures over iris code and

shape code. The second advantage is the use of feature selection method. In contrast, the

implementation of state-of-the-art iris recognition methods is based on hand-crafted feature

parameters.

There exist performance differences between the five feature selection methods. Both mRMR-OM and ReliefF-OM are generic feature selection methods, which are worse than the proven

feature selection methods in biometrics such as Boost-OM and Lasso-OM. In general, the

global optimization methods such as Lasso-OM and LP-OM can achieve a higher accuracy in

testing dataset than greedy learning method such as Boost-OM. And LP-OM can learn a better

ordinal feature set than Lasso-OM. Therefore the experimental results demonstrate that the





proposed linear programming method achieves the highest accuracy in terms of EER,

discriminating index and AUC. And the advantage of our feature selection method is more

significant in most practical iris recognition applications when FAR is usually required to be
smaller than $10^{-6}$. For example, when FAR = $10^{-8}$, LP-OM can achieve a significantly smaller
FRR compared with Lasso-OM and Boost-OM.
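The error measures used in this comparison can be computed directly from the intra-class and inter-class distance lists. The sketch below is a simple illustration, assuming a pair is accepted when its distance does not exceed the threshold and approximating the EER by the observed threshold at which FAR and FRR are closest; names and toy distances are hypothetical, and a production evaluation would interpolate the ROC curve.

```python
def far_frr(intra, inter, threshold):
    """FAR: fraction of impostor pairs accepted; FRR: fraction of genuine pairs rejected."""
    frr = sum(d > threshold for d in intra) / len(intra)
    far = sum(d <= threshold for d in inter) / len(inter)
    return far, frr

def equal_error_rate(intra, inter):
    """Approximate EER using the observed distance where FAR and FRR are closest."""
    def gap(t):
        far, frr = far_frr(intra, inter, t)
        return abs(far - frr)
    best = min(sorted(set(intra) | set(inter)), key=gap)
    far, frr = far_frr(intra, inter, best)
    return (far + frr) / 2

intra = [0.20, 0.25, 0.30, 0.46]   # genuine (same eye) distances
inter = [0.44, 0.48, 0.50, 0.52]   # impostor (different eye) distances
eer = equal_error_rate(intra, inter)
print(eer)   # 0.25
```

Operating points such as a very small FAR correspond to choosing a low threshold and reading off the resulting FRR from the same two lists.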

The computational cost of feature selection is tested using the Matlab 2011 programming
environment on a 2.83 GHz personal computer. We can see linear programming is much more

efficient than Lasso and mRMR in feature selection. Boosting is the fastest method to select top

15 ordinal feature units because of its greedy feature selection strategy. In contrast, both linear

programming and Lasso can provide a global weighting result for all ordinal feature units, so

they are less efficient than boosting method in feature selection. This paper only tried feature

selection with ten thousands of variables so computational complexity of feature selection is

not so important in offline training stage. But it is possible to introduce millions of variables to

optimization in large-scale training database because more training data usually benefits pattern

recognition. In addition, it is possible to extend our work to online feature selection (e.g.,

person specific feature selection in forensic applications) when training time makes sense in

real-time applications.

The optimization objective functions of both Lasso and LP consist mainly of two terms, namely a misclassification penalty term and a sparsity penalty term. A weighting parameter assigns the relative importance of these two terms. It is interesting to investigate the sensitivity of visual biometric recognition performance to this parameter. The EER of iris recognition as a function of the weighting parameter for these two feature selection methods on a cross-validation dataset is shown in Fig. 7. We can see that Lasso is sensitive to the setting of the parameter, whereas LP achieves comparatively stable performance as the parameter varies.
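As a reading aid, the two-term structure described above can be written in a generic form. The notation is assumed here for illustration and is not taken from this report: w holds the weights of the D candidate features, x_j is the feature vector of training sample j with label y_j, ξ_j ≥ 0 is a slack variable, and λ stands for the weighting parameter. The squared-error term is the standard Lasso loss; the slack-based term is the usual form of a large-margin linear program and may differ in detail from the formulation used in this work.

```latex
\min_{\mathbf{w}} \; E(\mathbf{w}) + \lambda \sum_{i=1}^{D} |w_i|,
\qquad
E(\mathbf{w}) =
\begin{cases}
\displaystyle\sum_{j=1}^{N} \bigl(y_j - \mathbf{w}^{\top}\mathbf{x}_j\bigr)^{2}
  & \text{(Lasso: squared error)} \\[6pt]
\displaystyle\sum_{j=1}^{N} \xi_j
  & \text{(LP: total slack of violated margin constraints)}
\end{cases}
```

When λ is large the sparsity term dominates and more weights are driven to zero; when λ is small the fit to the training data dominates.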



It is also interesting to investigate the sparsity property of Lasso and LP. The results show that linear programming achieves a much sparser training result, i.e., 26 non-zero components (LP) vs. 500 non-zero components (Lasso). Therefore LP has an advantage over Lasso in achieving a much more compact feature representation for iris biometrics. Some typical ordinal feature units selected by mRMR, LP, Lasso and Boost are illustrated in Fig. 8 (the results of ReliefF are not shown because it performs much worse than the other feature selection methods). A number of conclusions can be drawn from the visualization of the feature selection results.

1) The lower iris image regions adjacent to the pupil are the most effective for iris recognition because these regions are rich in iris texture information and have a much smaller probability of being occluded by eyelids and eyelashes.

2) Both di-lobe and tri-lobe filters are selected, so they are complementary for iris recognition. The orientation of most ordinal filters is horizontal because iris texture is mainly distributed along the circular direction in iris images, i.e., the horizontal orientation in the normalized format.

3) There exist some differences among the four feature selection methods (mRMR, LP, Lasso and Boost) in terms of the selected ordinal filters and iris image regions. These minor differences in the feature selection results determine the differences in iris recognition performance.



V. ORDINAL FEATURE SELECTION FOR PALMPRINT RECOGNITION

Palmprint provides a reliable source of information for automatic personal identification

and has wide and important applications. Richness of visual information available on

palmprint images including principal lines, ridges, minutiae points, singular points, texture,

etc. provides various possibilities for palmprint feature representation and pattern recognition. A

number of feature representation methods for palmprint recognition have been proposed in the

literature, including geometric structure such as point and line patterns, global appearance

description based on subspace analysis, and local texture analysis. Competitive code represents the state-of-the-art performance in palmprint recognition. In that method, each palmprint image region is assumed to have a dominant line segment, and its orientation is regarded as the palmprint feature. Because the even Gabor filter is well suited to modeling line segments, it is used to filter the local image region along six different orientations, yielding the corresponding contrast magnitudes. Based on the winner-take-all competitive rule, the index (ranging from 0 to 5) of the minimum contrast magnitude is represented by three bits, which form the competitive code. This paper

attempts to provide a new understanding and solution to the problem of palmprint feature analysis

using ordinal measures and linear programming.
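The competitive coding rule described above can be sketched in a few lines of Python. This is only an illustration: the six filter responses are supplied by the caller (the Gabor filtering itself is not shown), the function name is hypothetical, and plain binary is used for the three bits because the text does not specify the exact bit assignment.

```python
def competitive_code(responses):
    """Encode six orientation responses as a 3-bit winner-take-all index.

    `responses` holds one filter response per orientation (index 0..5).
    Following the text, the winner is the orientation with the minimum response.
    """
    if len(responses) != 6:
        raise ValueError("expected six orientation responses")
    winner = min(range(6), key=lambda k: responses[k])
    # Plain binary encoding of the index; an assumption made for illustration.
    return format(winner, "03b")


# Hypothetical responses for one local region: orientation 1 responds lowest.
print(competitive_code([0.3, -0.9, 0.1, 0.0, -0.2, 0.5]))
```

Only the index of the winning orientation is kept, so the code does not change when all six responses are scaled by the same positive factor.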

Unique and rich texture information in palmprint images is useful for personal identification. There are a large number of irregularly distributed line segments on the palm surface, mainly constituted by principal lines and wrinkles. The photometric properties of these line segments are significantly different from those of non-line regions. Thus the reflection ratios of line and non-line regions have a stable ordinal relationship, i.e., R(line) < R(non-line). Since the illumination strength of neighboring palmprint regions is approximately identical, it follows that the ordinal measures of intensity between palmprint regions are robust descriptors for identity verification. For each palm, the spatial configuration of the line and non-line image regions used for ordinal measures, such as location, orientation and scale, has its own unique layout. So the core idea of ordinal-measure-based palmprint representation is to recover the random layout of ordinal measures for feature matching.
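To make the robustness argument concrete, the following Python sketch computes a single ordinal bit by comparing the average intensity of two image regions and shows that the bit survives a brightness and contrast change. It is a simplification under stated assumptions: plain rectangular averaging stands in for the Gaussian lobes of a real ordinal filter, and the function names, region format and toy image are hypothetical.

```python
def mean_intensity(image, region):
    """Average pixel value over a rectangle given as (top, left, height, width)."""
    top, left, height, width = region
    total = sum(image[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width)


def ordinal_bit(image, region_a, region_b):
    """Return 1 if region A is darker on average than region B, else 0."""
    return 1 if mean_intensity(image, region_a) < mean_intensity(image, region_b) else 0


# Toy 3x4 image: a dark palm line (middle row) between brighter non-line rows.
palm = [[200, 200, 200, 200],
        [60, 60, 60, 60],
        [200, 200, 200, 200]]
line_region = (1, 0, 1, 4)
non_line_region = (0, 0, 1, 4)

# Simulated illumination change: lower contrast plus an offset.
dimmed = [[0.5 * p + 20 for p in row] for row in palm]

print(ordinal_bit(palm, line_region, non_line_region))
print(ordinal_bit(dimmed, line_region, non_line_region))
```

Any illumination change that preserves the order of the two region averages leaves the bit unchanged, which is the property the paragraph above relies on.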

This paper mainly focuses on feature analysis of palmprint biometrics. For palmprint images, the gaps between neighboring fingers can be used as landmark points for correcting the rotation and scale changes of palmprint images, and then the central region can be cropped as the input of feature analysis. In this paper, all palmprint images are normalized into a central ROI with resolution 128 × 128. Each ordinal filter is then applied to the ROI to generate a 32 × 32 = 1024-bit (128-byte) ordinal code, following the feature extraction routine of most state-of-the-art palmprint recognition algorithms. So if we select N ordinal filters for palmprint image analysis, the template size for each palmprint image is 128N bytes.
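The template-size arithmetic above can be checked with a short Python sketch. The sampling step of 4 pixels is an assumption inferred from the 128 × 128 ROI producing a 32 × 32 code; the text does not state how the code positions are sampled, and the function name is hypothetical.

```python
def template_size_bytes(num_filters, roi_size=128, step=4):
    """Bytes needed when each filter yields one bit per sampled ROI position."""
    codes_per_side = roi_size // step           # 128 // 4 = 32 positions per side
    bits_per_filter = codes_per_side ** 2       # 32 * 32 = 1024 bits
    return num_filters * bits_per_filter // 8   # 8 bits per byte


for n in (1, 2, 5):
    print(n, template_size_bytes(n))
```

With two selected filters this gives a 256-byte template.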

Because of the difference between the texture primitives in iris and palmprint biometric patterns, we need to provide modality-specific ordinal filters as the input of feature selection. Previous work only tried di-lobe ordinal filters (Fig. 9a) for palmprint recognition, and the results show that ordinal measures between two elongated, line-like and orthogonal image regions are well suited for palmprint feature analysis. In this paper we explore tri-lobe ordinal filters (Fig. 9b) for palmprint feature extraction for the following reasons.

1) Tri-lobe ordinal filters are expected to be more discriminative and robust than di-lobe filters;

2) A much larger feature space can be generated by tri-lobe ordinal filters, so feature selection methods can search for a better solution for palmprint recognition;

3) Di-lobe filters can be regarded as special cases of tri-lobe filters, so the well-performing di-lobe ordinal filters from our previous work are also included in our current development.

To test the proposed feature selection method for palmprint recognition, the PolyU palmprint image database is used for performance evaluation. The PolyU Palmprint Database was collected by a CCD camera-based imaging device. A subject puts his hand on a platform with the guidance of six pegs, and low-resolution images (75 dpi) are captured for online processing. The latest PolyU Ver 2.0 contains 7,752 palmprint images from 386 palms. Each palm has two sessions of images, each of which has at most 10 images. The average time interval between the two sessions is two months. The lighting conditions and the focus of the imaging device were changed between the two sessions, which challenges the robustness of recognition algorithms. All images are 8-bit gray-level images with resolution 384 × 284. It is a great challenge to group together intra-class palmprint images without compromising inter-class distinctiveness. The latest version of the PolyU Palmprint Database, PolyU-Palmprint Ver 2.0, has been widely used in the literature, and most state-of-the-art palmprint recognition methods are tested and compared on this database. The first version, PolyU-Palmprint Ver 1.0, has only 600 palmprint images of 100 classes. In this paper we use PolyU-Palmprint Ver 1.0 as the training dataset and PolyU-Palmprint Ver 2.0 as the testing dataset.

It should be noted that the palmprint images of PolyU 1.0 are transformed from a small part of

images in PolyU 2.0 so there may exist correlation or overlap between PolyU 1.0 and PolyU 2.0.

It is usually suggested to use independent training and testing datasets in pattern recognition

experiments. However, this paper still uses PolyU 1.0 for training and PolyU 2.0 for testing for the following reasons.

1) Almost all public palmprint databases, including PolyU and CASIA, do not provide a division into training and testing sets as face databases do. So most palmprint recognition researchers report the best results tuned on the whole database. We think it is fair to compare our methods with state-of-the-art palmprint recognition methods, considering that PolyU 1.0 is related to only 7.7% of the palmprint images in PolyU 2.0. It is better to report the palmprint recognition accuracy on the full PolyU 2.0 for performance evaluation against the existing methods.

2) Our previous work has demonstrated that it is easy to achieve 100% accuracy on PolyU 1.0 for both competitive code and ordinal code. So the performance of state-of-the-art palmprint recognition methods on the independent version of PolyU 2.0 (excluding all related images in PolyU 1.0) can be measured and compared with the testing results on PolyU 2.0.

3) The generalization capability of LP-OM will be demonstrated on the CASIA database using the ordinal features trained on PolyU 1.0. So it is unnecessary to emphasize the independence between PolyU 1.0 and PolyU 2.0.
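The 7.7% figure quoted above follows from the image counts given earlier for the two database versions. The short Python check below assumes that each of the 600 PolyU 1.0 images corresponds to one image in PolyU 2.0, which the text implies but does not state exactly; the variable names are hypothetical.

```python
polyu_v1_images = 600     # PolyU-Palmprint Ver 1.0
polyu_v2_images = 7752    # PolyU-Palmprint Ver 2.0

# Share of Ver 2.0 images that may be related to the training set.
overlap_percent = 100 * polyu_v1_images / polyu_v2_images
print(f"{overlap_percent:.1f}%")
```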

Since the PolyU Palmprint Database was collected using a high-quality sensor and PolyU-Palmprint Ver 1.0 is small, our previous work based on hand-crafted di-lobe ordinal filters can achieve zero EER on PolyU-Palmprint Ver 1.0. To learn a robust feature set of ordinal measures, a more challenging training dataset is constructed by adding noise and perturbations to PolyU-Palmprint Ver 1.0 (Fig. 10). The resulting synthetic training dataset includes 4,200 palmprint images of 100 classes.

First, 5,000 tri-lobe ordinal filters are generated with random parameter settings of location, scale and orientation, and they are tested on the training dataset. The top 500 tri-lobe ordinal filters with the smallest EER are selected as the candidate feature pool. Some tri-lobe ordinal filters in the feature pool are shown in Fig. 9b. We can see that these ordinal filters are significantly different from the filters used for iris recognition. The proposed linear programming method is then used to select the top 5 ordinal filters, as shown in Fig. 11a. The experimental results on the testing dataset show that the first two tri-lobe ordinal filters alone are enough to achieve state-of-the-art palmprint recognition performance. It is a grand challenge to search the huge parameter space and find the optimal parameter setting of tri-lobe ordinal filters for palmprint recognition, because the design of a tri-lobe ordinal filter involves 15 variables in total. Although the top 2 tri-lobe ordinal filters selected from the random filter pool are good enough for palmprint recognition, the candidate feature pool has only 500 tri-lobe ordinal filters, and it is possible to find better tri-lobe ordinal filters outside the pool. Therefore we generate more tri-lobe ordinal filters based on the basic profiles of the top 2 tri-lobe ordinal filters by varying the scale and location parameters of the basic lobes. The newly generated tri-lobe ordinal filters are used to train a better palmprint recognition algorithm after the second round of feature selection (Fig. 11b).
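The first stage of this procedure (random generation followed by ranking on EER) can be sketched as follows. This is a minimal outline under stated assumptions, not the implementation used in this work: `evaluate_eer` is a hypothetical caller-supplied function that returns the EER of one filter on the training dataset, the parameter ranges are invented for illustration, and the subsequent LP selection step is not shown.

```python
import random


def random_trilobe_filter(rng, roi_size=128):
    """Draw 5 parameters for each of 3 lobes, i.e. 15 variables per filter."""
    return [
        {
            "x": rng.randrange(roi_size),
            "y": rng.randrange(roi_size),
            "x_scale": rng.uniform(1.0, 10.0),       # assumed range
            "y_scale": rng.uniform(1.0, 10.0),       # assumed range
            "orientation": rng.uniform(0.0, 180.0),  # degrees, assumed range
        }
        for _ in range(3)
    ]


def build_candidate_pool(evaluate_eer, num_random=5000, pool_size=500, seed=0):
    """Generate random filters and keep those with the smallest EER."""
    rng = random.Random(seed)
    candidates = [random_trilobe_filter(rng) for _ in range(num_random)]
    ranked = sorted(candidates, key=evaluate_eer)  # ascending EER
    return ranked[:pool_size]
```

A caller would pass its own evaluation routine, for example `build_candidate_pool(my_eer_function)`, and hand the returned pool to the feature selection step.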

The experimental results of the three feature selection methods on PolyU Palmprint Image Database Ver 2.0 are shown in Fig. 12 and Table II. The state-of-the-art palmprint recognition method based on competitive code and its variants, and our previously proposed di-lobe OM, are used as the reference algorithms for performance comparison. We can see that the top 2 tri-lobe ordinal filters from the first round of LP feature selection already achieve a smaller EER than Boost-OM, Lasso-OM, competitive code and di-lobe OM. Moreover, to the best of our knowledge, the LP-OM after the second round of feature selection achieves the highest accuracy (EER = 6.19 × 10^-5) with the smallest feature template (256 bytes) on PolyU Ver 2.0.



VI. CONCLUSIONS

The authors have proposed a novel feature selection method to learn the most effective ordinal

features for iris and palmprint recognition based on linear programming. The success of LP feature selection comes from the incorporation of the large margin principle and weighted sparsity rules into the LP formulation. The LP-based feature selection model is flexible enough to integrate prior information about each feature unit related to biometric recognition

such as DI, EER and AUC into the optimization procedure. The experimental results have

demonstrated that the proposed LP feature selection method outperforms mRMR, ReliefF,

Boosting and Lasso.

A number of conclusions can be drawn from the study.

The identity information of visual biometric patterns comes from the unique structure of

ordinal measures. The optimal setting of parameters in local ordinal descriptors varies from

biometric modality to modality, subject to subject and even region to region. So it is

impossible to develop a common set of ordinal filters to achieve the best performance for all

visual biometric patterns. Ideally it is better to select the optimal ordinal filters to encode

individually specific ordinal measures via machine learning. However, such a personalized

solution is inefficient in large-scale personal identification applications. So the task of this paper turns to a suboptimal solution: learning a common ordinal feature set for each biometric modality, which is expected to work well for most subjects.

A main contribution of this paper is a novel optimization formulation for feature selection

based on linear programming (LP). Our expectations on the feature selection

results, i.e. an accurate and sparse ordinal feature set, can be described as a linear objective

function. Such a linear learning model has three advantages. Firstly, the feature selection model is simple to build, understand, learn and explain. Secondly, a linear penalty term is robust against outliers. Thirdly, a linear model needs only a small number of training samples to achieve a global optimization result with great generalization ability.

Weighted sparsity is proposed in this paper and the results show that it performs better than

traditional sparse representation methods. So it is better to incorporate prior information of

candidate features into the optimization model in sparse learning.
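One common way to express such a weighted sparsity term is shown below for illustration. The notation is assumed and is not taken from this report: w_i is the weight of candidate feature i among D candidates, λ is the trade-off parameter, and p_i > 0 is a hypothetical per-feature cost derived from prior information, for example larger for a feature whose individual EER is higher.

```latex
\underbrace{\lambda \sum_{i=1}^{D} |w_i|}_{\text{uniform sparsity penalty}}
\quad\longrightarrow\quad
\underbrace{\lambda \sum_{i=1}^{D} p_i \, |w_i|}_{\text{weighted sparsity penalty}},
\qquad p_i > 0
```

Under this form, features with a poor individual record are more expensive to keep, so the optimizer tends to discard them first. Setting every p_i to 1 recovers the uniform penalty.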



The intra-class variations in visual biometrics mainly come from photometric (e.g.

illumination) and geometric changes (e.g. pose, deformation). In this paper we have shown that

LP feature selection is a good solution to sharp photometric variations and slight geometric

variations in iris and palmprint patterns. Our future work will apply LP feature selection to other

visual biometric traits such as palm vein, finger vein, face and fingerprint recognition, but some

additional efforts may be required to address the sharp geometric variations in face (pose) and

fingerprint (deformation) biometrics. The proposed linear programming formulation is used for

visual biometrics in this paper, but we think it is a general feature selection and sparse

representation method applicable to other computer vision and pattern recognition tasks.


