June 28th, 2004 | BioSecure, SecurePhone
Automatic Speaker Verification: Technologies, Evaluations and Possible Future
Gérard CHOLLET
CNRS-LTCI, GET-ENST
chollet@tsi.enst.fr
Biometrics in Current Security Environments

Outline
• State of affairs (tasks, security, forensic, …)
• Speaker characteristics in the speech signal
• Automatic Speaker Verification:
  • Decision theory
  • Text dependent / Text independent
• Imposture (occasional, dedicated)
• Voice transformations
• Audio-visual speaker verification
• Evaluations (algorithms, field tests, ergonomics, …)
• Conclusions, Perspectives

Why should a computer recognize who is speaking?
• Protection of individual property (habitation, bank account, personal data, messages, mobile phone, PDA, ...)
• Limited access (secured areas, databases)
• Personalization (only respond to its master’s voice)
• Locate a particular person in an audio-visual document (information retrieval)
• Who is speaking in a meeting?
• Is a suspect the criminal? (forensic applications)

Tasks in Automatic Speaker Recognition
• Speaker verification (Voice Biometric)
  • Are you really who you claim to be?
• Identification (Speaker ID):
  • Is this speech segment coming from a known speaker?
  • How large is the set of speakers (population of the world)?
• Speaker detection, segmentation, indexing, retrieval, tracking:
  • Looking for recordings of a particular speaker
• Combining Speech and Speaker Recognition
  • Adaptation to a new speaker, speaker typology
  • Personalization in dialogue systems

Applications
• Access Control
  • Physical facilities, Computer networks, Websites
• Transaction Authentication
  • Telephone banking, e-Commerce
• Speech data Management
  • Voice messaging, Search engines
• Law Enforcement
  • Forensics, Home incarceration

Voice Biometric
• Advantages
  • Often the only modality over the telephone
  • Low cost (microphone, A/D), ubiquity
  • Possible integration on a smart (SIM) card
  • Natural bimodal fusion: speaking face
• Disadvantages
  • Lack of discretion
  • Possibility of imitation and electronic imposture
  • Lack of robustness to noise, distortion, …
  • Temporal drift

Speaker Identity in Speech
• Differences in
  • Vocal tract shapes and muscular control
  • Fundamental frequency (typical values): 100 Hz (Male), 200 Hz (Female), 300 Hz (Child)
  • Glottal waveform
  • Phonotactics
  • Lexical usage
• The difference between the voices of twins is a limiting case
• Voices can also be imitated or disguised

Speaker Identity
[Figure: spectral envelope of /i:/ for Speaker A and Speaker B; axes f and A]
• Segmental factors (~30 ms)
  • Glottal excitation: fundamental frequency, amplitude, voice quality (e.g., breathiness)
  • Vocal tract: characterized by its transfer function and represented by MFCCs (Mel Frequency Cepstral Coefficients)
• Suprasegmental factors
  • Speaking speed (timing and rhythm of speech units)
  • Intonation patterns
  • Dialect, accent, pronunciation habits

Acoustic features
Short-term spectral analysis
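
The short-term spectral analysis named here can be sketched as an MFCC front end. This is a minimal illustration using NumPy only, not the system described in the slides: the 8 kHz sample rate, 25 ms frames, 10 ms hop, 24 mel filters and 13 coefficients are all assumed values, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=8000, frame_ms=25, hop_ms=10,
         n_filters=24, n_ceps=13, n_fft=256):
    """Return an array of shape (n_frames, n_ceps). All defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    # Cut the signal into overlapping windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame (zero-padded to n_fft points).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log energies in each mel band; small floor avoids log(0).
    log_energies = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_filters) + 0.5) / n_filters)
    return log_energies @ basis.T
```

Each row of the result is one feature vector covering roughly the segmental time scale (~30 ms) mentioned earlier in the slides.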

Intra- and Inter-speaker variability

Speaker Verification
• Typology of approaches (EAGLES Handbook)
  • Text dependent
    • Public password
    • Private password
    • Customized password
    • Text prompted
  • Text independent
• Incremental enrolment
• Evaluation

History of Speaker Recognition

Current approaches

HMM structure depends on the application

Gaussian Mixture Model
• Parametric representation of the probability distribution of observations:
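
The formula from this slide did not survive extraction. As a hedged illustration, the standard GMM density is a weighted sum of Gaussians, p(x) = Σ_i w_i N(x; μ_i, Σ_i). The sketch below evaluates its logarithm per frame, assuming diagonal covariances (an assumption, not stated in the slides); the function name and argument layout are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Per-frame log p(x_t) under a diagonal-covariance GMM.

    x: (T, D) observations; weights: (M,); means, variances: (M, D).
    Returns an array of shape (T,).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    diff = x[:, None, :] - means[None, :, :]                       # (T, M, D)
    # Log of the Gaussian normalisation constant for each component.
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)  # (M,)
    log_comp = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    weighted = np.log(weights)[None, :] + log_comp                 # (T, M)
    # Log-sum-exp over components, stabilised by subtracting the maximum.
    m = weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(weighted - m).sum(axis=1, keepdims=True)))[:, 0]
```

Averaging the returned values over the frames of an utterance gives one score per model, which is the quantity a verification decision would compare.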

Gaussian Mixture Models
8 Gaussians per mixture

Decision theory for identity verification
• Two types of errors:
  • False rejection (a client is rejected)
  • False acceptance (an impostor is accepted)
• Decision theory: given an observation O and a claimed identity
  • H0 hypothesis: it comes from an impostor
  • H1 hypothesis: it comes from our client
• H1 is chosen if and only if P(H1|O) > P(H0|O), which can be rewritten (using Bayes' law) as

\[ \frac{P(O \mid H_1)}{P(O \mid H_0)} > \frac{P(H_0)}{P(H_1)} \]
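
The likelihood-ratio test on this slide is usually applied in the log domain: accept the claim when log P(O|H1) - log P(O|H0) exceeds log(P(H0)/P(H1)). The sketch below is a minimal illustration; the function name and the default prior of 0.5 are assumptions, not values from the slides.

```python
import math

def accept_claim(log_lik_client, log_lik_impostor, prior_client=0.5):
    """Choose H1 (client) iff the log-likelihood ratio exceeds the prior ratio.

    log_lik_client   = log P(O | H1)
    log_lik_impostor = log P(O | H0)
    prior_client     = P(H1), so P(H0) = 1 - prior_client (assumed two-class setting)
    """
    if not 0.0 < prior_client < 1.0:
        raise ValueError("prior_client must lie strictly between 0 and 1")
    llr = log_lik_client - log_lik_impostor
    threshold = math.log((1.0 - prior_client) / prior_client)
    return llr > threshold
```

With equal priors the threshold is 0; lowering `prior_client` raises the threshold, so more evidence is required before a claim is accepted.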

Signal detection theory

Decision

Distribution of scores

Detection Error Tradeoff (DET) Curve
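
The points of a DET curve are pairs of false acceptance and false rejection rates obtained by sweeping a decision threshold over the client and impostor score distributions. The sketch below computes those pairs and an equal error rate estimate. It assumes that higher scores favour the client and that a claim is accepted when the score is at or above the threshold; function names are hypothetical.

```python
import numpy as np

def error_rates(client_scores, impostor_scores):
    """Return (thresholds, false_acceptance, false_rejection) arrays."""
    client = np.asarray(client_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Every observed score is a candidate threshold.
    thresholds = np.sort(np.concatenate([client, impostor]))
    # A client scoring below the threshold is falsely rejected.
    fr = np.array([(client < t).mean() for t in thresholds])
    # An impostor scoring at or above the threshold is falsely accepted.
    fa = np.array([(impostor >= t).mean() for t in thresholds])
    return thresholds, fa, fr

def equal_error_rate(client_scores, impostor_scores):
    """Estimate the operating point where FA and FR are closest."""
    _, fa, fr = error_rates(client_scores, impostor_scores)
    i = np.argmin(np.abs(fa - fr))
    return (fa[i] + fr[i]) / 2.0
```

A DET plot draws `fa` against `fr` on normal-deviate (probit) axes, which tends to turn roughly Gaussian score distributions into near-straight lines.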

Evaluation
• Decision cost (FA, FR, priors, costs, …)
• Receiver Operating Characteristic Curve
• Reference systems (open software)
• Evaluations (algorithms, field trials, ergonomics, …)
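
A decision cost combines the two error rates with their costs and the prior of a genuine claim. The sketch below uses the usual weighted-sum form, C = C_miss · P_miss · P_target + C_fa · P_fa · (1 - P_target). The default values are assumptions chosen to resemble those commonly quoted for NIST speaker evaluations of that period; they are not taken from the slides.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Expected cost of a decision threshold.

    p_miss   : false rejection rate (a client is rejected)
    p_fa     : false acceptance rate (an impostor is accepted)
    c_miss, c_fa, p_target : assumed costs and prior of a genuine claim
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
```

Because genuine claims are assumed rare, false acceptances can dominate the cost even though each one is given a lower individual weight.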

National Institute of Standards & Technology (NIST) Speaker Verification Evaluations
• Annual evaluation since 1995
• Common paradigm for comparing technologies

NIST evaluations: Results
ENST 2003

Combining Speech Recognition and Speaker Verification
• Speaker independent phone HMMs
• Selection of segments or segment classes which are speaker specific
• Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)

ALISP data-driven speech segmentation

Searching in client and world speech dictionaries for speaker verification purposes

Fusion

Fusion results

Speaking Faces: Motivations
• A person speaking in front of a camera offers 2 modalities for identity verification (speech and face).
• The sequence of face images and the synchronisation of speech and lip movements could be exploited.
• Imposture is much more difficult than with single modalities.
• Many PCs, PDAs, mobile phones are equipped with a camera. Audio-Visual Identity Verification will offer non-intrusive security for e-commerce, e-banking, …

Talking Face Recognition (hybrid verification)

Lip features
Tracking lip movements

A talking face model
Using Hidden Markov Models (HMMs)
• Acoustic parameters
• Visual parameters

Morphing, avatars

Conclusions, Perspectives
• Deliberate imposture is a challenge for speech-only systems
• Verification of identity based on features extracted from talking faces should be developed
• Common databases and evaluation protocols are necessary
• Free access to reference systems will facilitate future developments