Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
A Novel and ComputationallyEfficient Approach to Setting
Decision Thresholds in an AdaptiveSpeaker Verification System
A Novel and ComputationallyEfficient Approach to Setting
Decision Thresholds in an AdaptiveSpeaker Verification System
Nikki MirghaforiNikki Mirghafori
Joint work withJoint work withMatthieu Hebert at NuanceMatthieu Hebert at Nuance
Friday the 13th, February 2004Friday the 13th, February 2004
2
OutlineOutline
•• BeginningBeginning
•• MiddleMiddle
•• EndEnd
3
OutlineOutline
•• MotivationMotivation•• ApproachApproach•• Data AnalysisData Analysis•• CalibrationCalibration•• Experimental ResultsExperimental Results•• ConclusionsConclusions
ICASSP 2004: Mirghafori, N. and Hebert, M. "Parameterization ofthe Score Threshold for a Text-Dependent Adaptive SpeakerVerification System", Montreal, Canada.
4
Context: Speaker Verification FieldContext: Speaker Verification FieldApplicationApplication
An example of a verification sessionAn example of a verification sessionAn example of a verification session
AcceptAccept
RejectRejectAuthenticate
VoiceAuthenticateAuthenticate
VoiceVoiceVerification
EngineVerificationVerification
EngineEngine
SpeakerSpeakerModelModel
Please enter youraccount number““55512345551234””You’re acceptedby the system
CompareCompare
ScoreScore
ThresholdThreshold
5
Problems with Score ThresholdsProblems with Score Thresholds
•• Overall fixed system thresholds areOverall fixed system thresholds aredifficult to set in advancedifficult to set in advance
•• If combined with online adaptation,If combined with online adaptation,thresholds must be dynamic and adaptivethresholds must be dynamic and adaptive
6
Problem I Problem I –– Setting Overall Threshold Setting Overall Threshold
•• Setting overall thresholds before-hand is challengingSetting overall thresholds before-hand is challenging
•• High convenience? High security?High convenience? High security?
•• Difference between lab and field conditions can causeDifference between lab and field conditions can causedifferences in thresholdsdifferences in thresholds
•• Not all speakers are created equalNot all speakers are created equal
Use
r F
alse
Rej
ect R
ate
(FR
)
Imposter False Accept Rate (FA)
High Security
High Convenience
X Threshold = 1.5
X
Threshold = 0.8
7
Problem II Problem II –– Effects of Online Effects of OnlineAdaptationAdaptation
Effects of Online Adaptation on EER and FA (at fixed threshold)
0
2
4
6
8
Base
line
Iter1
Iter2
Iter3
Iter4
Iter5
Iter6
Iter7
Iter8
Condition
Err
or
Rate EER
FA
8
Problem II Problem II –– Increase in Impostor Increase in ImpostorScoresScores
True Speaker and Imposter Distributions Pre and Post Adaptation
-7 -2 3 8
Verification Scores
Num
Trials
(N
orm
aliz
ed)
PreAdapt:TS
PreAdapt: Imp
PostAdapt: TS
PostAdapt: Imp
9
Problem -- IIProblem -- II
•• Speaker adaptation successfully reduces EER inSpeaker adaptation successfully reduces EER inverification [Heck & Mirghafori, ICSLP verification [Heck & Mirghafori, ICSLP ‘‘00]00]
•• However, both true speaker and, to a lesserHowever, both true speaker and, to a lesserextent, impostor scores may increaseextent, impostor scores may increase
•• Increase in impostor false accept rates can causeIncrease in impostor false accept rates can causespeaker model corruptionspeaker model corruption
•• Similar score increase observed in speechSimilar score increase observed in speechrecognition for confidence scores [recognition for confidence scores [Sankar Sankar &&KannanKannan, ICASSP , ICASSP ‘‘02]. Score transformation02]. Score transformationused as a solution.used as a solution.
10
MotivationMotivation
1.1. Eliminate the challenge of setting overallEliminate the challenge of setting overallapplication thresholds in advanceapplication thresholds in advance
2.2. Counter the verification score shiftsCounter the verification score shiftsresulting from online adaptationresulting from online adaptation
11
Previous Work in Setting ThresholdsPrevious Work in Setting Thresholdsin Speaker Verificationin Speaker Verification
qq Mirghafori, N. and Heck, L.P., "An Adaptive SpeakerMirghafori, N. and Heck, L.P., "An Adaptive SpeakerVerification System With Speaker Dependent A PrioriVerification System With Speaker Dependent A PrioriDecision Thresholds", Proc. International Conference onDecision Thresholds", Proc. International Conference onSpoken Language Processing, Denver, Colorado, 2002.Spoken Language Processing, Denver, Colorado, 2002.
qq A.C. A.C. Surendran Surendran and C.-H Lee. and C.-H Lee. ““A priori threshold selectionA priori threshold selectionin fixed vocabulary speaker verification system.in fixed vocabulary speaker verification system.”” ICSLP ICSLP20002000
qq J. Lindberg, J.W. J. Lindberg, J.W. KoolwaaijKoolwaaij, H.-P. , H.-P. HutterHutter, D. , D. GenoudGenoud, M., M.BlombergBlomberg, J.-B. , J.-B. PierrotPierrot, and F. , and F. BimbotBimbot. . ““Normalization andNormalization andselection of speech segments for speaker recognitionselection of speech segments for speaker recognitionscoring.scoring.”” RLA2C 1998. RLA2C 1998.
12
Our ApproachOur ApproachT
hres
hold
Number of training frames in the speaker model
FA=5.0%
FA=2.0%
FA=1.0%FA=0.5%
SDT = F( target_FA, number_of_training_frames)
13
EnrollmentEnrollment
AdaptationAdaptation
(1 (1 utt utt only)only)
TestingTesting
(Adaptation (Adaptation
turned off)turned off)
Test SetupTest Setup
•• Japanese DigitsJapanese Digits•• 6,477 speaker models6,477 speaker models•• Enrollment: 8-digit string (3x)Enrollment: 8-digit string (3x)•• Held out test set:Held out test set:
•• 8-digit string (1x)8-digit string (1x)
•• Adaptation:Adaptation:•• 8-digit string (1x)8-digit string (1x)
•• Gather all test scoresGather all test scores
•• Divide into bins w.r.t. to num framesDivide into bins w.r.t. to num frames
•• Calculate threshold at target FACalculate threshold at target FA
14
Speaker Verification SystemSpeaker Verification System
Impostor
MAPAdaptation
HD+GDImpostor Models
T E-Carb
TE-Cell
Cellular Carbon
Claimant
Electret
Root(UBM)
“Bob” onElectret
ElectretTE-Cell
T E-Carb
Cellular Carbon
MAPAdaptation
Root Model - HI+GI GMM
15
Japanese Digits
0
0.5
1
1.5
2
2.5
50
0
70
0
90
0
11
00
13
00
15
00
17
00
19
00
21
00
23
00
Num Frames in Speaker Model
Th
res
ho
ld
0.1% FA0.20.30.40.50.81357
The trend looks good: The trend looks good: Threshold = (Target FA, Num Frames)Threshold = (Target FA, Num Frames)
16
““ItIt’’s a Bug!s a Bug!”” Or is it? Or is it?
Canadian French Text
00.5
11.5
22.5
33.5
44.5
100
300
500
700
900
1100
1300
1500
1700
Num Frames in Speaker Model
Th
res
ho
ld
0.1% FA0.20.30.40.50.81357
17
The trend is confirmedThe trend is confirmed……
UK English Text
0
0.5
1
1.5
2
2.5
3
10
0
20
0
30
0
40
0
50
0
60
0
70
0
80
0
90
0
10
00
11
00
12
00
13
00
Num Frames in Speaker Model
Th
res
ho
ld
0.1% FA0.20.30.40.50.81357
18
WhatWhat’’s the Missing Parameter?s the Missing Parameter?
•• Password length!Password length!
•• Japanese digits DB uniform, whereas the twoJapanese digits DB uniform, whereas the twoothers are notothers are not
•• Contain both short and long passwordsContain both short and long passwords
•• In general: verification performance lower forIn general: verification performance lower forshorter passwordsshorter passwords
•• Hence, need higher thresholdHence, need higher threshold
•• Can we parameterize?Can we parameterize?•• Threshold = G(Target FA, Num. Frames, Password Length)Threshold = G(Target FA, Num. Frames, Password Length)
19
CalibrationCalibration
•• Japanese text AND digitJapanese text AND digit
•• Both short and long passwordsBoth short and long passwords
•• Trained and tested 10K speaker modelsTrained and tested 10K speaker models
•• Divided password length region between 0 andDivided password length region between 0 and700 into 8 bins700 into 8 bins
20
Password Length distributionPassword Length distribution
Password Length
Bins: between 120K to 196K attempts each
21
Num. Training Frames in Speaker ModelNum. Training Frames in Speaker ModelFor each password lengthFor each password length
Num. Training Frames in Speaker Model
Long passwords
Short passwords•Divide data each bininto 10 sub-binsaccording to the numberof frames in speakermodel
•Calculated AveragePassword Length (APL)and Average NumFrames (ANF)
22
Coverage of 2D spaceCoverage of 2D spaceNum. Training Frames Num. Training Frames vsvs. Password Len.. Password Len.
Ave. Num. Training Frames in Speaker Model
Ave
. P
assw
ord
Leng
th
Questionable region:at the edge of distrib.- it was excluded
23
Result of fittingResult of fittingTargetFA TargetFA = 0.1%= 0.1%
TargetFA=0.1%
ANF/APLAPL
Thresh.
Fit looks good!
• Fit a second degreepolynomial with threeparameters (10 freecoefficients)
24
Result of fittingResult of fittingTargetFA TargetFA = 5%= 5%
TargetFA=5%
ANF/APLAPL
Thresh.
Fit looks good!
25
Actual FA rates for FA goal 0.2-1.5%
0.53
0.68
0.42
0.4
2.16
1.97
1.13
1.2
0.11
0.41
0.17
1.08
0 0.5 1 1.5 2 2.5
EA Dig1
EA Dig2
EA Dig3
EA Dig4
EA Txt1
EA Txt2
EUK Dig
EUK Txt
FR Dig
FR Txt
SP Dig
SP Txt
Test
set
Goal Region
26
Effects of Online Adapatation on FA
0.00%
0.50%
1.00%
1.50%
2.00%
2.50%
3.00%
3.50%
Preadapt Iter0 Iter3 Iter6 Iter9 Iter12 Iter15 Iter18 Iter21 Iter24
Iterations of Online Adaptation
FA
Rate
FCDT
BASE
27
In ConclusionIn Conclusion
•• Setting Setting a prioria priori thresholds is a challenging and important thresholds is a challenging and importantproblem for field applicationsproblem for field applications
•• Threshold can be parameterized as function ofThreshold can be parameterized as function of•• Target FATarget FA
•• Number of training frames in speaker modelNumber of training frames in speaker model
•• Password lengthPassword length
•• Threshold parameters can be calibrated on one databaseThreshold parameters can be calibrated on one databaseand applied to others successfullyand applied to others successfully
•• Thresholds achieved within target FA rangeThresholds achieved within target FA range
•• FA rate kept constant during online adaptationFA rate kept constant during online adaptation
•• Overall performance (EER) does not degradeOverall performance (EER) does not degrade