A Multi-Hypothesis Approach for Off-Line Signature Verification with HMMs

Luana Batista, Eric Granger and Robert Sabourin

Laboratoire d’imagerie, de vision et d’intelligence artificielle (LIVIA)

Ecole de technologie superieure

1100, rue Notre-Dame Ouest, Montreal, QC, H3C 1K3, Canada

lbatista@livia.etsmtl.ca, {eric.granger, robert.sabourin}@etsmtl.ca

Abstract

In this paper, an approach based on the combination of discrete Hidden Markov Models (HMMs) in the ROC space is proposed to improve the performance of off-line signature verification (SV) systems designed from limited and unbalanced training data. This approach is inspired by the multiple-hypothesis principle, and allows the system to choose, from a set of different HMMs, the most suitable solution for a given input sample. By training an ensemble of user-specific HMMs with different numbers of states, and then combining these models in the ROC space, it is possible to construct a composite ROC curve that provides a more accurate estimate of the system's performance during training and significantly reduces the error rates during operations. Experiments performed on a real-world SV database with random, simple and skilled forgeries indicated that the proposed approach can reduce the average error rates by more than 17%.

1 Introduction

Signature verification (SV) systems seek to authenticate the identity of an individual, based on an analysis of his/her signature, through a process that discriminates a genuine signature from a forgery [1]. In off-line SV, the signature is available on a sheet of paper, which is later scanned in order to obtain a digital representation. These systems are relevant in many situations where handwritten signatures are currently used, such as cashing checks, transactions with credit cards, and authenticating documents [8].

Though not fully explored in the literature, it has recently been shown that the Receiver Operating Characteristic (ROC) curve, where the true positive rates (TPR) are plotted on the y axis and the false positive rates (FPR) on the x axis, provides a powerful tool for evaluating, combining and comparing off-line SV systems [2] [10]. By taking into account several operating points (thresholds), the ROC curve makes it possible to analyze these systems under different classification costs [5].

Among the several well-known classification methods used in off-line SV, the discrete Hidden Markov Model (HMM), a finite stochastic automaton used to model sequences of observations, adapts easily to the dynamic characteristics of occidental handwriting [9]. Although the HMM is a generative classifier [3], which generally requires a large amount of training data to perform well, HMM-based off-line SV systems are often designed from limited and unbalanced data. This occurs because the dataset used to model a writer generally contains a small number of genuine signatures against several random forgeries [10].

In this paper, an approach based on the combination of discrete HMMs in the ROC space is proposed to improve the performance of off-line SV systems designed from limited and unbalanced data. By training an ensemble of user-specific HMMs with different numbers of states and then combining these models in the ROC space, it is possible to construct a composite ROC curve that provides a more accurate estimate of the system's performance during training. Moreover, during operations, the corresponding operating points, which may be selected dynamically according to the risk associated with the input samples, can significantly reduce the error rates. This approach follows the multiple-hypothesis principle [7], which requires the system to propagate several hypotheses throughout the recognition steps, generating a hierarchical tree of possible solutions.

The rest of this paper is organized as follows. The next section describes the issues that impact the performance of off-line SV systems. Section 3 then describes the proposed approach and its advantages. In Section 4, the experimental results are shown and discussed. Finally, Section 5 presents the conclusions of this work.

2 Problem Statement

In this paper, it is assumed that the off-line SV system is composed of several local classifiers, where each one is a discrete HMM trained with data corresponding to a specific writer, and that the performance of the whole system is measured by a single averaged ROC curve. Averaging methods have been used to group ROC curves from different classifiers [5] [11]. Ross [11], for example, proposed a method to generate averaged ROC curves which takes into account user-specific thresholds. First, for each user i, the cumulative histogram of the impostor scores (regarding that user) is computed. Then, the similarity scores (thresholds) providing the same value of cumulative frequency, γ, are used to compute the operating points {TPR_i(γ), FPR_i(γ)}. Finally, the operating points associated with the same γ are averaged. Note that γ can be viewed as the true negative rate (TNR = negatives (forgeries) correctly classified / total of negatives) and that it may be associated with different thresholds. Figure 1 shows an example where the thresholds associated with γ = 0.3 are different for users 1 and 2, that is, t_user1(0.3) ≈ 5.0 and t_user2(0.3) ≈ 5.5.

2009 10th International Conference on Document Analysis and Recognition

978-0-7695-3725-2/09 $25.00 © 2009 IEEE

DOI 10.1109/ICDAR.2009.5

Figure 1. Cumulative histogram of the impostor scores regarding two different users (x axis: similarity score; y axis: cumulative frequency γ; curves: user 1 and user 2).
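The three steps of Ross's averaging can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of [11]: the function names are hypothetical, the scores are made up, and it assumes that a higher similarity score means "more genuine" and that a sample is accepted when its score is strictly above the user's threshold.

```python
import numpy as np

def threshold_at_gamma(impostor_scores, gamma):
    """Smallest observed score t such that at least a fraction gamma
    of the impostor scores is <= t (cumulative histogram read-out)."""
    s = np.sort(np.asarray(impostor_scores, dtype=float))
    # Small offset guards against floating-point overshoot in gamma * len(s).
    k = int(np.ceil(gamma * len(s) - 1e-9))
    k = min(max(k, 1), len(s))
    return s[k - 1]

def operating_point(genuine_scores, impostor_scores, gamma):
    """(TPR, FPR) of one user at the user-specific threshold for gamma."""
    t = threshold_at_gamma(impostor_scores, gamma)
    tpr = float(np.mean(np.asarray(genuine_scores, dtype=float) > t))
    fpr = float(np.mean(np.asarray(impostor_scores, dtype=float) > t))
    return tpr, fpr

def averaged_point(users, gamma):
    """Average the user-specific operating points that share the same gamma.
    `users` is a list of (genuine_scores, impostor_scores) pairs."""
    pts = np.array([operating_point(g, i, gamma) for g, i in users])
    return tuple(pts.mean(axis=0))

# Hypothetical scores for two users: the same gamma maps to different thresholds.
user1 = ([4, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
user2 = ([11, 12, 13, 14], [2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(threshold_at_gamma(user1[1], 0.5), threshold_at_gamma(user2[1], 0.5))
print(averaged_point([user1, user2], 0.5))
```

Because the threshold is read from each user's own impostor scores, the FPR of every user is close to 1 − γ by construction, while the TPR varies from user to user.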

In off-line SV, where the dataset used to model a writer's signature generally contains a small number of genuine samples against several random forgeries, it is common to obtain ROC curves with concave areas. In general, a concave area indicates that the ranking provided by the classifier in this region is worse than random [6]. Figure 2 (a) shows an example of score distribution in an off-line SV system. The positive class, P, contains only 10 genuine samples of a given writer, while the negative class, N, contains 100 samples of forgeries. Due to the limited number of samples in the positive class, the resulting ROC curve (see Figure 2 (b)) presents three concave areas, which correspond to low-quality predictions [6]. For example, the similarity scores between −1.2 and 0 all provide a TPR of 90%. The result of averaging the ROC curves related to the models of 100 different writers, using Ross's method, is illustrated in Figure 3. Note that the imperfections of individual ROC curves are hidden within the averaged points, an effect that can be observed with any averaging algorithm.

Figure 2. Typical score distribution and respective ROC curve of a writer model. (a) Frequency of the similarity scores of the negative (N) and positive (P) classes. (b) ROC curve (x axis: FPR; y axis: TPR), showing the concavity produced by similarity scores between −1.2 and 0.
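To make the notion of a concave area concrete, the following Python sketch builds an empirical ROC curve from a handful of scores, computes its upper convex hull, and lists the ROC points that fall strictly below the hull. It is an assumption-laden illustration: the helper names and the scores are hypothetical, a sample is accepted when its score is at least the threshold, and the hull is obtained with a generic monotone-chain scan rather than any procedure described in the paper.

```python
import numpy as np

def roc_points(genuine, forgeries):
    """(FPR, TPR) points obtained by sweeping a threshold over all scores."""
    g = np.asarray(genuine, dtype=float)
    f = np.asarray(forgeries, dtype=float)
    pts = [(0.0, 0.0)]
    for t in np.unique(np.concatenate([g, f]))[::-1]:  # high to low
        pts.append((float(np.mean(f >= t)), float(np.mean(g >= t))))
    return pts

def upper_hull(points):
    """Upper convex hull of the ROC points, scanned from left to right."""
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop the middle vertex if it lies on or below the chord to p.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def concave_points(points):
    """ROC points (best TPR per FPR) lying strictly below the convex hull."""
    best, top = {}, {}
    for x, y in points:
        best[x] = max(y, best.get(x, 0.0))
    for x, y in upper_hull(points):
        top[x] = max(y, top.get(x, 0.0))
    hx = sorted(top)
    hy = [top[x] for x in hx]
    return [(x, y) for x, y in sorted(best.items())
            if y < np.interp(x, hx, hy) - 1e-12]

# Hypothetical writer: 4 genuine scores against 6 forgery scores.
pts = roc_points([9, 8, 3, 2], [7, 6, 5, 4, 1, 0])
print(upper_hull(pts))
print(concave_points(pts))
```

In this toy example, the TPR stays flat while the FPR grows over several thresholds, which is the same pattern as the concavity highlighted in Figure 2 (b).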

Figure 3. Averaged ROC curve obtained by applying Ross's method to 100 different user-specific ROC curves (x axis: FPR; y axis: TPR). The plot shows a user-specific ROC curve, the averaged ROC curve and the global convex hull, with the operating points γ = 0.91 and γ = 0.86 marked.

The drawback of using an averaged ROC curve can be observed during the selection of optimal thresholds in the respective convex hull. Given two values of γ in the convex hull, γ1 and γ2, where each one minimizes a different set of costs [13], γ2 should provide a higher TPR than γ1 whenever γ1 > γ2. However, on a user-specific ROC curve, γ1 and γ2 may fall in the same concave area, providing identical TPRs. An example is illustrated in Figure 3, where TPR(γ = 0.86) is higher than TPR(γ = 0.91) in the global convex hull but, in the user-specific ROC curve, TPR(γ = 0.86) is equal to TPR(γ = 0.91).


3 A Multi-Hypothesis Approach

Based on the combination of HMMs trained with different numbers of states, the approach proposed in this section provides a solution to repair concavities of user-specific ROC curves while generating a high-quality averaged ROC curve. Three steps are involved in the proposed multi-hypothesis approach: model selection, combination and averaging.

3.1 Model Selection

During the model selection step, a ROC outer boundary is constructed in order to encapsulate the best operating points provided by HMMs trained with different numbers of states. The use of different HMMs is motivated by the fact that the superiority of one classifier over another may not hold over the whole ROC space [5]. In off-line SV systems where the optimal number of states for an HMM is found empirically by a cross-validation process [4] [9], it is often observed that the best HMM is not superior to the other intermediate/sub-optimal HMMs at all operating points of the ROC space.

Given a set of ROC curves generated from different classifiers associated with the same user, the process consists of splitting the x axis ∈ [0, 1] into a number of bins, and within each bin finding the pair (FPR, TPR) having the largest value of TPR. While the ROC outer boundary is being generated, the best HMM_s, where s is the number of states, is automatically chosen for each operating point. Figure 4 shows an example of a user-specific ROC outer boundary constructed from the ROC curves of two different classifiers, HMM7 and HMM9. The corresponding convex hull is composed of three vertices, p, q and r, where p and q are associated with HMM9, and r is associated with HMM7.

Figure 4. ROC outer boundary constructed from ROC curves of two different HMMs (x axis: FPR; y axis: TPR). The plot shows the ROC curves of HMM9 and HMM7, the ROC outer boundary, and the convex hull with vertices p, q and r.
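The binning procedure behind the ROC outer boundary can be sketched as follows. The sketch is hypothetical in its details: the function name, the number of bins, the tie-breaking rule (the first point encountered wins) and the operating points are all assumptions made for illustration, since the text does not fix them.

```python
import numpy as np

def roc_outer_boundary(curves, n_bins=10):
    """Keep, in each FPR bin, the operating point with the largest TPR.

    `curves` maps a model label (e.g. "HMM7") to a list of (FPR, TPR) points.
    Returns one (FPR, TPR, label) triple per non-empty bin, so every point of
    the boundary remembers which HMM produced it.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    boundary = []
    for b in range(n_bins):
        lo, hi = edges[b], edges[b + 1]
        best = None
        for label, pts in curves.items():
            for fpr, tpr in pts:
                # Bins are half-open, except the last one, which includes FPR = 1.
                in_bin = lo <= fpr < hi or (b == n_bins - 1 and fpr == hi)
                if in_bin and (best is None or tpr > best[1]):
                    best = (fpr, tpr, label)
        if best is not None:
            boundary.append(best)
    return boundary

# Hypothetical operating points of two HMMs trained for the same user.
curves = {
    "HMM7": [(0.05, 0.40), (0.35, 0.90), (0.95, 1.00)],
    "HMM9": [(0.05, 0.60), (0.35, 0.80), (0.95, 0.98)],
}
for fpr, tpr, label in roc_outer_boundary(curves):
    print(fpr, tpr, label)
```

Keeping the label alongside each point is what later allows an operating point on the convex hull to be traced back to a specific HMM and threshold.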

3.2 Combination

In the second step, the combination method proposed by Scott et al. [12] is applied to the ROC outer boundaries in order to repair concavities. Given two vertices A and B on the convex hull, it is possible to realize a point C, located between A and B, by randomly choosing between A and B. The probability of selecting one of the two operating points is determined by the distance of C from A and B. Equations 1 and 2 indicate how a given FPR_C can be obtained.

$$P(C = A) = \frac{FPR_C - FPR_B}{FPR_A - FPR_B} \tag{1}$$

$$P(C = B) = 1 - P(C = A) \tag{2}$$

In the convex hull illustrated in Figure 4, any point on the segments pq and qr is realizable by combination. The expected operating point (FPR, TPR) is given by equations 3 and 4 [12].

$$FPR = \big(P(C = A) \cdot FPR_A\big) + \big(P(C = B) \cdot FPR_B\big) \tag{3}$$

$$TPR = \big(P(C = A) \cdot TPR_A\big) + \big(P(C = B) \cdot TPR_B\big) \tag{4}$$
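Equations 1 to 4 translate almost directly into code. The sketch below uses hypothetical function names and made-up vertices; it only illustrates how the selection probabilities are computed and how a randomized choice between the two vertices realizes the requested FPR on average.

```python
import random

def selection_probabilities(fpr_a, fpr_b, fpr_c):
    """Equations (1) and (2): probabilities of using vertex A or vertex B."""
    p_a = (fpr_c - fpr_b) / (fpr_a - fpr_b)
    return p_a, 1.0 - p_a

def expected_point(a, b, fpr_c):
    """Equations (3) and (4): expected (FPR, TPR) of the combination.
    `a` and `b` are (FPR, TPR) vertices of the convex hull."""
    p_a, p_b = selection_probabilities(a[0], b[0], fpr_c)
    return (p_a * a[0] + p_b * b[0], p_a * a[1] + p_b * b[1])

def pick_vertex(a, b, fpr_c, rng=random):
    """Randomly choose the vertex that will classify one sample."""
    p_a, _ = selection_probabilities(a[0], b[0], fpr_c)
    return a if rng.random() < p_a else b

# Hypothetical hull vertices and a requested FPR lying between them.
A, B = (0.10, 0.60), (0.30, 0.90)
print(selection_probabilities(A[0], B[0], 0.15))
print(expected_point(A, B, 0.15))
```

Substituting equation 1 into equation 3 gives back FPR_C exactly, so the randomized classifier sits on the straight segment between A and B, which is what removes the concavity.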

3.3 Averaging

Finally, a process based on Ross's method [11] is used to group the operating points already computed during the combination step. Given a γ, the process searches for the pairs {TPR_i(γ), FPR_i(γ)} where FPR_i(γ) = 1 − γ (recalling that γ can be viewed as the TNR, and that FPR = 1 − TNR). Then, the operating points corresponding to the same γ are averaged and used to generate the averaged ROC curve.
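A minimal sketch of this averaging step is given below. It assumes that each user's repaired curve is stored as a list of convex-hull vertices with strictly increasing FPR, and it reads the TPR at FPR = 1 − γ by linear interpolation, which corresponds to the expected TPR of equation 4. The function names and the hull values are hypothetical.

```python
import numpy as np

def tpr_at_gamma(hull, gamma):
    """TPR of one user at FPR = 1 - gamma on the repaired (convex) curve.
    `hull` is a list of (FPR, TPR) vertices with strictly increasing FPR."""
    fpr = [p[0] for p in hull]
    tpr = [p[1] for p in hull]
    return float(np.interp(1.0 - gamma, fpr, tpr))

def averaged_roc(hulls, gammas):
    """One averaged (FPR, TPR) point per gamma, over all users."""
    return [(1.0 - g, float(np.mean([tpr_at_gamma(h, g) for h in hulls])))
            for g in gammas]

# Hypothetical repaired curves of two users.
hulls = [
    [(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)],
    [(0.0, 0.0), (0.1, 0.9), (1.0, 1.0)],
]
for point in averaged_roc(hulls, [0.95, 0.90, 0.80]):
    print(point)
```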

In the test phase, γ is used to retrieve the set of user-specific HMMs/thresholds which will be applied to unknown data. Figure 5 illustrates two possible situations linking the averaged ROC curve and a user-specific ROC curve. In the first case, γ falls directly on an operating point of HMM7, whereas in the second case the requested γ is obtained by combining classifiers HMM7 and HMM9. That is, each test sample must be randomly sent either to HMM7 or to HMM9, according to the probabilities given by equations (1) and (2).
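The two situations described above can be sketched as a lookup on the user's convex hull followed by a randomized routing of each test sample. Everything specific in the sketch is an assumption: the function names, the way vertices carry a model label and a threshold, the tolerance used to decide that γ falls directly on a vertex, and the numeric values.

```python
import random

def models_for_gamma(hull, gamma, tol=1e-9):
    """Return [((model, threshold), probability), ...] realizing FPR = 1 - gamma.

    `hull` lists (FPR, TPR, model, threshold) vertices by increasing FPR.
    """
    target = 1.0 - gamma
    for vertex in hull:
        if abs(vertex[0] - target) <= tol:       # gamma falls on a vertex
            return [(vertex[2:], 1.0)]
    for a, b in zip(hull, hull[1:]):
        if a[0] < target < b[0]:                 # gamma falls between vertices
            p_a = (target - b[0]) / (a[0] - b[0])  # equation (1)
            return [(a[2:], p_a), (b[2:], 1.0 - p_a)]
    raise ValueError("1 - gamma lies outside the FPR range of the hull")

def route_sample(choices, rng=random):
    """Pick the (model, threshold) pair that will score one test sample."""
    options = [c for c, _ in choices]
    weights = [p for _, p in choices]
    return rng.choices(options, weights=weights, k=1)[0]

# Hypothetical hull of one user; thresholds are illustrative values only.
hull = [
    (0.00, 0.00, "HMM9", 9.0),
    (0.02, 0.90, "HMM9", 5.0),
    (0.10, 0.98, "HMM7", 3.0),
    (1.00, 1.00, "HMM7", -1.0),
]
print(models_for_gamma(hull, 0.98))   # a single model is enough
print(models_for_gamma(hull, 0.94))   # two models, combined at random
print(route_sample(models_for_gamma(hull, 0.94)))
```

Note that the randomization happens per test sample, so the requested operating point is only reached in expectation over many samples.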

4 Simulation Results

Figure 5. Retrieving user-specific HMMs from an averaged ROC curve (panels: averaged ROC curve and user-specific ROC curve, with operating points labeled HMM 9 and HMM 7).

The Brazilian signature database [9] was used for proof-of-concept computer simulations. It contains 7920 samples of signatures that were digitized as 8-bit greyscale images of 400×1000 pixels, at a resolution of 300 ppi. The signatures were provided by 168 writers and are organized in two sets: the development database (DBdev) and the exploitation database (DBexp). DBdev is composed of 4320 genuine samples supplied by 108 individuals and is mainly used for designing codebooks. DBexp contains 60 writers, each one with 40 samples of genuine signatures, 10 samples of simple forgery and 10 samples of skilled forgery. For each writer, 20 genuine samples are used for training, 10 genuine samples for validation, and 30 samples for test (10 genuine samples, 10 simple forgeries and 10 skilled forgeries). Moreover, 10 genuine samples are randomly selected from the other 59 writers and used as random forgeries to test the current model.

The signature images are represented by means of density of pixels, extracted through the grid segmentation scheme described in [9]. In experiments not presented here, a codebook with 35 symbols, CB35, was constructed by applying the k-means algorithm to the first 30 signatures of each writer in DBdev. Then, CB35 was applied to DBexp, which is composed of 60 authors. The ROC curve of the multi-hypothesis system is indicated by the star-dashed line in Figure 6, whereas the circle-dashed line represents the baseline, or single-hypothesis, system used for comparison. In this system, only the HMM which performs best in the cross-validation process¹ is considered for generating a user-specific ROC curve. This means that all operating points of an individual ROC curve are associated with the same HMM. In addition, the user-specific ROC curves are directly averaged, without repairing, using the standard Ross's method [11] (see the inner graphic in Figure 6). As expected, the multi-hypothesis system provided a higher ROC curve and, owing to Scott's method [12], a larger number of operating points was obtained. Note that both ROC curves were generated using a validation set which contains only genuine samples and random forgeries.

¹ Given a set of HMMs trained with different numbers of states, the cross-validation process selects the HMM providing the highest training probability [9].

Figure 6. ROC curves of the single- and multi-hypothesis systems (x axis: FPR; y axis: TPR). Inner graphic: the 60 non-repaired user-specific ROC curves, averaged with the standard Ross's method.

In the following phase, the operating points of the multi-hypothesis ROC curve (given by γ) were used to retrieve the user-specific HMMs/thresholds and apply them to the test set. Table 1 presents the error rates on test for some γ values in both the single- and multi-hypothesis systems. Since there are three types of forgeries in the test set, the average error rate is calculated as

$$AER(\gamma) = \frac{FNR(\gamma) + FPR(\gamma)_{random} + FPR(\gamma)_{simple} + FPR(\gamma)_{skilled}}{4}$$

Table 1. Error rates (%) on test.

Results obtained with the single-hypothesis system

γ       FNR      FPR_rand.   FPR_simp.   FPR_skil.   AER
0.96    0.50     6.00        10.83       64.83       20.54
0.97    0.83     5.67        9.00        60.17       18.92
0.98    1.17     4.00        5.67        52.50       15.83
0.99    2.33     2.67        4.00        42.67       12.92
1       12.67    0.33        1.17        19.83       8.50

Results obtained with the multi-hypothesis system

γ       FNR      FPR_rand.   FPR_simp.   FPR_skil.   AER
0.96    4.13     4.67        8.70        54.81       18.07
0.97    4.15     3.17        7.35        52.23       16.72
0.98    4.48     2.50        5.51        47.05       14.88
0.99    5.55     1.17        4.00        39.63       12.58
1       12.83    0           1.17        20.50       8.62
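As a quick sanity check of the AER definition, the short sketch below recomputes the average error rate from the four error rates of a table row. The function name is hypothetical; the inputs are the single-hypothesis rows of Table 1 for γ = 0.96 and γ = 0.99. Since the table entries are already rounded, a recomputed AER may differ from the printed one in the last digit for some rows.

```python
def average_error_rate(fnr, fpr_random, fpr_simple, fpr_skilled):
    """AER: unweighted mean of the FNR and the three forgery-specific FPRs."""
    return (fnr + fpr_random + fpr_simple + fpr_skilled) / 4.0

# Single-hypothesis system, rows gamma = 0.96 and gamma = 0.99 of Table 1.
print(round(average_error_rate(0.50, 6.00, 10.83, 64.83), 2))
print(round(average_error_rate(2.33, 2.67, 4.00, 42.67), 2))
```

Because the four terms are weighted equally, the high FPR on skilled forgeries dominates the AER at every γ in the table.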

In general, the multi-hypothesis system provided smaller error rates. Moreover, the FPR(γ)_random values are closer to the expected error rates given by 1 − γ. Other results obtained with the multi-hypothesis system are presented in Table 2.

Table 2. Additional error rates (%) on test obtained with the multi-hypothesis system.

γ        FNR     FPR_rand.   FPR_simp.   FPR_skil.   AER
0.991    5.71    1.00        3.83        38.01       12.13
0.992    5.81    0.67        3.83        37.38       11.92
0.993    6.00    0.67        3.50        36.15       11.58
0.994    6.17    0.67        3.50        35.90       11.56
0.995    6.30    0.67        3.17        34.80       11.23
0.996    7.10    0.67        2.67        33.73       11.04
0.997    7.63    0.67        2.67        32.48       10.86
0.998    8.21    0.67        2.67        30.60       10.54
0.999    9.37    0.17        1.91        26.83       9.57

Finally, in order to analyze the impact of repairing individual ROC curves, the proposed approach was applied only to the 20 writers having ROC curves with concavities. On average, 75.91% of the problematic writers had their AERs on test improved with the multi-hypothesis system in the region between γ = 0.9 and γ = 1. This represents 25.30% of the whole population, which may correspond to a considerable number of users in a real-world application. Only 18.64% of the problematic writers performed better with the single-hypothesis system. For the remaining 5.45%, both systems performed equally. Figure 7 presents the results for γ = 0.96, where the improvements obtained with the multi-hypothesis system are located on the positive side, that is, below the dotted line. With the single-hypothesis system, the author indicated by the arrow had an AER of 22.5%. When using the multi-hypothesis system, the respective AER was reduced to 4.9%, that is, 17.6 percentage points lower.
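The population share and the per-writer reduction quoted above can be retraced with simple arithmetic, assuming that the "whole population" refers to the 60 writers of DBexp:

```latex
\[
0.7591 \times \frac{20}{60} \approx 0.2530
\qquad\text{and}\qquad
22.5\% - 4.9\% = 17.6 \text{ percentage points}
\]
```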

Figure 7. User-specific AERs obtained on test with the single- and multi-hypothesis systems for γ = 0.96 (x axis: AER (%) of the single-hypothesis system; y axis: AER (%) of the multi-hypothesis system).

5 Conclusions

Based on the combination of HMMs trained with different numbers of states, this paper proposed a multi-hypothesis approach to improve the performance of off-line SV systems designed from limited and unbalanced data. The experiments carried out on the Brazilian SV database showed that the proposed approach leads to smaller error rates on unseen data compared not only with the baseline system, but also with another single-hypothesis system developed with the same SV database [9]². The multi-hypothesis approach can be used for dynamic selection of the best classification model based on the risk associated with the input samples. In a banking application, for example, the decision to use a specific operating point may be associated with the amount of the check. In the simplest case, for a user who rarely signs high-value checks, large amounts would require operating points related to low FPRs, such as those provided by γ = 0.999 or γ = 1, while normal amounts would require operating points related to low FNRs, since the user would not feel comfortable with frequent rejections. Finally, this approach can be easily adapted to any type of neural or statistical classifier designed to solve similar two-class problems.

References

[1] L. Batista, D. Rivard, R. Sabourin, E. Granger, and P. Maupin. State of the art in off-line signature verification. In B. Verma and M. Blumenstein, editors, Pattern Recognition Technologies and Applications: Recent Advances. IGI Global, 1st edition, 2007.

[2] H. Coetzer and R. Sabourin. A human-centric off-line signature verification system. In Int. Conf. on Document Analysis and Recognition (ICDAR 2007), pages 153–157, 2007.

[3] C. Drummond. Discriminative vs. generative classifiers for cost sensitive learning. In Canadian Conference on Artificial Intelligence. Lecture Notes in Artificial Intelligence, pages 479–490, 2006.

[4] A. El-Yacoubi, E. Justino, R. Sabourin, and F. Bortolozzi. Off-line signature verification using HMMs and cross-validation. In IEEE Workshop on Neural Networks for Signal Processing, pages 859–868, 2000.

[5] T. Fawcett. ROC graphs: Notes and practical considerations for data mining researchers. Machine Learning, 2004.

[6] P. Flach and S. Wu. Repairing concavities in ROC curves, 2003.

[7] H. Fujisawa. Robustness Design of Industrial Strength Recognition Systems. Springer, 2007.

[8] F. Griess and A. Jain. On-line signature verification. Pattern Recognition, 35:2963–2972, 2002.

[9] E. Justino, F. Bortolozzi, and R. Sabourin. Off-line signature verification using HMM for random, simple and skilled forgeries. In International Conference on Document Analysis and Recognition, pages 105–110, 2001.

[10] L. Oliveira, E. Justino, and R. Sabourin. Off-line signature verification using writer-independent approach. In Int. Joint Conference on Neural Networks (IJCNN), pages 2539–2544, 2007.

[11] A. Ross. Information Fusion in Fingerprint Authentication. PhD thesis, Michigan State University, 2003.

[12] M. Scott, M. Niranjan, and R. Prager. Realisable classifiers: improving operating performance on variable cost problems, 1998.

[13] F. Tortorella. A ROC-based reject rule for dichotomizers. Pattern Recognition Letters, 26:167–180, 2005.

² Error rates reported in [9]: FNR = 2.17%, FPR_rand. = 1.23%, FPR_simp. = 3.17%, FPR_skil. = 36.57%, AER = 10.78%.
