A Novel and Computationally Efficient Approach to Setting ...mi.eng.cam.ac.uk/seminars/speech/mirghafori_seminar1.pdf · EA Txt1 EA Txt2 EUK Dig EUK Txt FR Dig FR Txt SP Dig SP Txt

A Novel and Computationally Efficient Approach to Setting Decision Thresholds in an Adaptive Speaker Verification System

Nikki Mirghafori

Joint work with Matthieu Hebert at Nuance

Friday the 13th, February 2004

Outline

• Beginning
• Middle
• End

Outline

• Motivation
• Approach
• Data Analysis
• Calibration
• Experimental Results
• Conclusions

ICASSP 2004: Mirghafori, N. and Hebert, M. "Parameterization of the Score Threshold for a Text-Dependent Adaptive Speaker Verification System", Montreal, Canada.

Context: Speaker Verification Field Application

An example of a verification session

[Diagram: verification session. Prompt: "Please enter your account number"; caller: "5551234"; response: "You're accepted by the system". Blocks: Authenticate Voice, Verification Engine, Speaker Model, Compare, Score, Threshold, Accept / Reject.]

Problems with Score Thresholds

• Overall fixed system thresholds are difficult to set in advance
• If combined with online adaptation, thresholds must be dynamic and adaptive

Problem I – Setting Overall Threshold

• Setting overall thresholds beforehand is challenging
• High convenience? High security?
• Difference between lab and field conditions can cause differences in thresholds
• Not all speakers are created equal

[Figure: User False Reject Rate (FR) vs. Impostor False Accept Rate (FA). Regions labeled High Security and High Convenience; marked operating points: Threshold = 1.5 and Threshold = 0.8.]
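The trade-off between the two operating points can be illustrated with a minimal sketch. It assumes that a higher score means a more likely true speaker and that a trial is accepted when its score is at or above the threshold; the function name `error_rates` and the synthetic score distributions are hypothetical and only serve the illustration.

```python
import numpy as np

def error_rates(true_scores, impostor_scores, threshold):
    """Return (FA, FR) as fractions for an accept-if-score >= threshold rule."""
    true_scores = np.asarray(true_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    fa = float(np.mean(impostor_scores >= threshold))  # impostors wrongly accepted
    fr = float(np.mean(true_scores < threshold))       # true speakers wrongly rejected
    return fa, fr

# Synthetic scores for illustration only (not data from the talk).
rng = np.random.default_rng(0)
true_scores = rng.normal(3.0, 1.5, 10_000)
impostor_scores = rng.normal(-2.0, 1.5, 10_000)

# The two thresholds marked on the slide.
fa_low, fr_low = error_rates(true_scores, impostor_scores, 0.8)
fa_high, fr_high = error_rates(true_scores, impostor_scores, 1.5)
print(f"threshold 0.8: FA={fa_low:.4f} FR={fr_low:.4f}")
print(f"threshold 1.5: FA={fa_high:.4f} FR={fr_high:.4f}")
```

Under this rule, raising the threshold can only lower FA and raise FR, which is why a single fixed value has to be chosen somewhere between convenience and security.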

Problem II – Effects of Online Adaptation

[Figure: Effects of Online Adaptation on EER and FA (at fixed threshold). x-axis: Condition (Baseline, Iter1 to Iter8); y-axis: Error Rate (0 to 8); series: EER, FA.]

Problem II – Increase in Impostor Scores

[Figure: True Speaker and Impostor Distributions Pre and Post Adaptation. x-axis: Verification Scores (-7 to 8); y-axis: Num Trials (Normalized); series: PreAdapt: TS, PreAdapt: Imp, PostAdapt: TS, PostAdapt: Imp.]

Problem II

• Speaker adaptation successfully reduces EER in verification [Heck & Mirghafori, ICSLP '00]
• However, both true speaker and, to a lesser extent, impostor scores may increase
• Increase in impostor false accept rates can cause speaker model corruption
• Similar score increase observed in speech recognition for confidence scores [Sankar & Kannan, ICASSP '02]. Score transformation used as a solution.

Motivation

1. Eliminate the challenge of setting overall application thresholds in advance
2. Counter the verification score shifts resulting from online adaptation

Previous Work in Setting Thresholds in Speaker Verification

• Mirghafori, N. and Heck, L.P., "An Adaptive Speaker Verification System With Speaker Dependent A Priori Decision Thresholds", Proc. International Conference on Spoken Language Processing, Denver, Colorado, 2002.
• A.C. Surendran and C.-H. Lee, "A priori threshold selection in fixed vocabulary speaker verification system", ICSLP 2000.
• J. Lindberg, J.W. Koolwaaij, H.-P. Hutter, D. Genoud, M. Blomberg, J.-B. Pierrot, and F. Bimbot, "Normalization and selection of speech segments for speaker recognition scoring", RLA2C 1998.

Our Approach

[Figure: Threshold vs. number of training frames in the speaker model, one curve per target FA: FA=5.0%, FA=2.0%, FA=1.0%, FA=0.5%.]

SDT = F(target_FA, number_of_training_frames)

Test Setup

[Diagram: Enrollment, then Adaptation (1 utt only), then Testing (Adaptation turned off).]

• Japanese Digits
• 6,477 speaker models
• Enrollment: 8-digit string (3x)
• Held out test set:
  • 8-digit string (1x)
• Adaptation:
  • 8-digit string (1x)
• Gather all test scores
• Divide into bins w.r.t. num frames
• Calculate threshold at target FA
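The last three steps above (gather scores, bin by frame count, take the threshold at the target FA) might be sketched as follows. This is an illustration under stated assumptions, not the procedure's actual implementation: it assumes the scores used are impostor-trial scores, that a trial is accepted when its score is at or above the threshold, and that the per-bin threshold is the empirical `1 - target_fa` quantile. The function name and the `bin_edges` argument are hypothetical.

```python
import numpy as np

def thresholds_by_frame_bin(impostor_scores, num_frames, bin_edges, target_fa):
    """Per-bin score threshold such that about target_fa of impostor scores exceed it.

    bin_edges must be increasing; bin b covers [bin_edges[b-1], bin_edges[b]).
    Returns a dict mapping (low_edge, high_edge) -> threshold.
    """
    scores = np.asarray(impostor_scores, dtype=float)
    frames = np.asarray(num_frames, dtype=float)
    # np.digitize returns 0 below the first edge and len(bin_edges) at or above the last.
    bin_index = np.digitize(frames, bin_edges)
    thresholds = {}
    for b in range(1, len(bin_edges)):
        in_bin = scores[bin_index == b]
        if in_bin.size == 0:
            continue  # no trials with this many training frames
        thresholds[(bin_edges[b - 1], bin_edges[b])] = float(
            np.quantile(in_bin, 1.0 - target_fa)
        )
    return thresholds
```

For example, `thresholds_by_frame_bin(scores, frames, [500, 700, 900], 0.01)` would give one threshold for models with 500 to 699 training frames and another for 700 to 899, each aimed at a 1% FA on the data supplied.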

Speaker Verification System

[Diagram labels: Root (UBM); Root Model - HI+GI GMM; MAP Adaptation (shown twice); HD+GD Impostor Models; Impostor; Claimant; "Bob" on Electret; Electret; TE-Cell; TE-Carb; Cellular; Carbon.]

[Figure: Japanese Digits. x-axis: Num Frames in Speaker Model (500 to 2300); y-axis: Threshold (0 to 2.5); one curve per target FA: 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.8%, 1%, 3%, 5%, 7%.]

The trend looks good: Threshold = F(Target FA, Num Frames)

"It's a Bug!" Or is it?

[Figure: Canadian French Text. x-axis ticks: 100 to 1700; y-axis: Threshold (0 to 4.5); one curve per target FA: 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.8%, 1%, 3%, 5%, 7%.]

The trend is confirmed…

[Figure: UK English Text. x-axis ticks: 100 to 1300; y-axis: Threshold (0 to 3); one curve per target FA: 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.8%, 1%, 3%, 5%, 7%.]

What's the Missing Parameter?

• Password length!
• Japanese digits DB uniform, whereas the two others are not
  • Contain both short and long passwords
• In general: verification performance lower for shorter passwords
  • Hence, need higher threshold
• Can we parameterize?
  • Threshold = G(Target FA, Num. Frames, Password Length)

Calibration

• Japanese text AND digit
• Both short and long passwords
• Trained and tested 10K speaker models
• Divided password length region between 0 and 700 into 8 bins

Password Length Distribution

[Figure: distribution of trials over Password Length.]

Bins: between 120K and 196K attempts each

Num. Training Frames in Speaker Model, for Each Password Length

[Figure: distributions over Num. Training Frames in Speaker Model, with curves labeled Long passwords and Short passwords.]

• Divide the data in each bin into 10 sub-bins according to the number of frames in the speaker model
• Calculate the Average Password Length (APL) and Average Num Frames (ANF)
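A minimal sketch of this two-level binning follows. The slides do not say how the 10 sub-bins are drawn, so equal-population sub-bins (sorting by frame count and splitting into 10 chunks) are an assumption here, as are the function name and the `length_edges` argument.

```python
import numpy as np

def apl_anf_cells(password_len, num_frames, length_edges, n_sub_bins=10):
    """Return a list of (APL, ANF, n_trials), one entry per non-empty sub-bin.

    Trials are first binned by password length using length_edges, then each
    length bin is split into n_sub_bins chunks ordered by number of training
    frames in the speaker model (equal-population split: an assumption).
    """
    password_len = np.asarray(password_len, dtype=float)
    num_frames = np.asarray(num_frames, dtype=float)
    length_bin = np.digitize(password_len, length_edges)
    cells = []
    for b in range(1, len(length_edges)):
        idx = np.flatnonzero(length_bin == b)
        if idx.size == 0:
            continue
        # Order the trials of this length bin by frame count.
        ordered = idx[np.argsort(num_frames[idx], kind="stable")]
        for sub in np.array_split(ordered, n_sub_bins):
            if sub.size == 0:
                continue
            cells.append(
                (float(password_len[sub].mean()),  # APL of the sub-bin
                 float(num_frames[sub].mean()),    # ANF of the sub-bin
                 int(sub.size))
            )
    return cells
```

Each returned cell gives one point in the (ANF, APL) plane at which a threshold for a given target FA can then be computed.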

Coverage of 2D Space: Num. Training Frames vs. Password Len.

[Figure: x-axis: Ave. Num. Training Frames in Speaker Model; y-axis: Ave. Password Length. Annotation: questionable region at the edge of the distribution; it was excluded.]

Result of Fitting: TargetFA = 0.1%

[Figure: TargetFA=0.1%. Axes: ANF/APL, APL, Thresh.]

Fit looks good!

• Fit a second degree polynomial with three parameters (10 free coefficients)
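A full second-degree polynomial in three variables has 1 constant, 3 linear, and 6 quadratic terms, which matches the 10 free coefficients quoted above. The sketch below fits such a surface by ordinary least squares. Which three inputs were used is an assumption on my part (for instance target FA, ANF/APL, and APL, going by the plot axes), and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_design_matrix(X):
    """Columns: 1, each x_i, and every product x_i * x_j with i <= j."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j]
             for i, j in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)  # shape (n, 10) when d == 3

def fit_threshold_surface(X, thresholds):
    """Least-squares coefficients of the quadratic surface through the calibration points."""
    A = quadratic_design_matrix(X)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(thresholds, dtype=float), rcond=None)
    return coeffs

def predict_threshold(coeffs, X):
    """Evaluate the fitted surface at new points (rows of X)."""
    return quadratic_design_matrix(X) @ coeffs
```

Once the coefficients are calibrated, evaluating a threshold for a new speaker model is a handful of multiplications and additions.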

Result of Fitting: TargetFA = 5%

[Figure: TargetFA=5%. Axes: ANF/APL, APL, Thresh.]

Fit looks good!

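Actual FA rates for FA goal 0.2-1.5%

[Figure: horizontal bar chart of actual FA rate (axis 0 to 2.5) per test set, with the Goal Region marked. Test sets: EA Dig1, EA Dig2, EA Dig3, EA Dig4, EA Txt1, EA Txt2, EUK Dig, EUK Txt, FR Dig, FR Txt, SP Dig, SP Txt. Bar labels in extracted order (pairing with test sets not recoverable): 0.53, 0.68, 0.42, 0.4, 2.16, 1.97, 1.13, 1.2, 0.11, 0.41, 0.17, 1.08.]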

Effects of Online Adaptation on FA

[Figure: x-axis: Iterations of Online Adaptation (Preadapt, Iter0, Iter3, Iter6, Iter9, Iter12, Iter15, Iter18, Iter21, Iter24); y-axis: FA Rate (0.00% to 3.50%); series: FCDT, BASE.]

In Conclusion

• Setting a priori thresholds is a challenging and important problem for field applications
• Threshold can be parameterized as function of
  • Target FA
  • Number of training frames in speaker model
  • Password length
• Threshold parameters can be calibrated on one database and applied to others successfully
• Thresholds achieved within target FA range
• FA rate kept constant during online adaptation
• Overall performance (EER) does not degrade
