Automatic Speaker Verification

This presentation describes how to recognise a person by their voice.

  • Automatic Speaker Verification

    Zouhir Wakaf, PhD

  • Outline
    - Introduction
    - Speaker identification vs verification
    - Speaker verification overview
    - The parts of a speaker verification system
    - Evaluation of speaker verification performance
    - Applications
    - Future directions

  • Introduction: Extracting Information from Speech
    Goal: automatically extract the information transmitted in the speech signal.
    [Diagram: the speech signal feeds two tasks. Speech recognition outputs the words ("How are you?"); speaker recognition outputs the speaker identity ("Dr. Ahmad").]

  • Speaker identification
    Whose voice is this?
    - Determine the speaker identity
    - Selection between a set of known voices
    - The user does not claim an identity
    - Closed-set identification: assume that all speakers are known to the system
    - Open-set identification: the speaker may not be among the speakers known to the system
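The difference between closed-set and open-set identification can be illustrated with a minimal sketch. It assumes each enrolled speaker's model has already produced a score for the test utterance (higher means a better match); the function name `identify`, the speaker name "Omar", the score values and the thresholds are hypothetical and chosen only for illustration.

```python
def identify(scores, threshold=None):
    """Pick the best-scoring enrolled speaker.

    scores: dict mapping speaker name -> match score (illustrative values).
    threshold: None gives closed-set behaviour (always return a known speaker);
               a number gives open-set behaviour (may return None).
    """
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None  # open set: the voice is judged to be an unknown speaker
    return best

# Hypothetical scores for one test utterance against three enrolled speakers.
scores = {"Ahmad": -41.2, "Salma": -39.8, "Omar": -44.5}

closed = identify(scores)                       # closed set: best match wins
open_hit = identify(scores, threshold=-40.0)    # open set: best match is good enough
open_miss = identify(scores, threshold=-35.0)   # open set: nobody matches well enough
```

In the closed-set case the system must always answer with one of the known names, so a poor best score still yields an identity. The open-set case adds a rejection option, which is why it needs a threshold.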

  • Speaker Verification
    Is this Ahmad's voice?
    - Synonyms: authentication, detection
    - The user claims an identity
    - System task: accept or reject the identity claim
    - The voice can come from outside the set of known speakers; when all speakers are known, the set is closed
    - Impostor: any voice other than the true identity

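  • Identification vs verification
    [Diagram: feature extraction feeds a speaker model (+) and an impostor model (-); the combined score goes to the decision stage, which accepts the claim when the score is high enough.]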

  • Speech Modalities
    The application dictates different speech modalities:

    Text-dependent recognition
    - The recognition system knows the text spoken by the person
    - Examples: fixed phrase, prompted phrase
    - Used for applications with strong control over user input
    - Knowledge of the spoken text can improve system performance
    - Prompting may reduce the risk of impostors using voice recordings

    Text-independent recognition
    - The recognition system does not know the text spoken by the person
    - Examples: user-selected phrase, conversational speech
    - Used for applications with less control over user input
    - More flexible system, but also a more difficult problem
    - Speech recognition can provide knowledge of the spoken text

  • Speech for Identification
    - Speech is easily produced
    - It does not require advanced input devices
    - Can be applied using telephones and PCs
    - Can be supplemented, to improve security, with:
      - a password phrase
      - personal knowledge

  • Speaker verification
    - Which features?
    - How to model the speaker?
    - How to model the impostors?
    - How to make the decision so as to minimize the probability of error?

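  • Phases of a Speaker Verification System
    There are two distinct phases to any speaker verification system:
    - Enrolment phase: enrolment speech for each speaker (e.g., Ahmad, Salma) goes through feature extraction and model training, producing a voiceprint (model) for each speaker
    - Verification phase: speech from a user with a claimed identity (e.g., "Salma") goes through feature extraction and is checked against Salma's voiceprint to reach the verification decision (e.g., "Accepted!")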

  • Features for Speaker Recognition
    - Humans use several levels of perceptual cues for speaker recognition
    - There are no exclusive speaker identity cues
    - Low-level acoustic cues (physical traits) are the most applicable for automatic systems
    - Desirable attributes of features for an automatic system:
      - Practical: occur naturally and frequently in speech; easily measurable
      - Robust: do not change over time or with the speaker's health; not affected by reasonable background noise; do not depend on specific transmission characteristics
      - Secure: not subject to mimicry

  • Features for Speaker Recognition
    - No feature has all these attributes
    - Features derived from the spectrum of speech have proven to be the most effective in automatic systems
    - Typically: mel-frequency cepstral coefficients (MFCCs)
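To make the MFCC feature concrete, here is a minimal NumPy sketch of one common way to compute it: pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, then a DCT. The slides do not specify any of the parameters, so the values used here (16 kHz sampling, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) are assumed typical defaults, and the input is a synthetic tone rather than real speech.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale: (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):          # rising edge of the triangle
            fbank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge of the triangle
            fbank[i, k] = (right - k) / (right - centre)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Pre-emphasis boosts the high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, then log compression (epsilon avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # DCT-II decorrelates the log energies; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energies @ dct_basis.T

# Synthetic one-second test signal (a 220 Hz tone plus a little noise).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 220.0 * t) + 0.01 * rng.standard_normal(16000)
features = mfcc(signal)   # one row of 13 coefficients per 10 ms frame
```

Each row of `features` is the feature vector for one frame; a speaker model is then trained on the sequence of such vectors.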

  • Speaker Models
    - Speaker models (voiceprints) represent the voice biometric in a compact and generalizable form
    - Modern speaker verification systems use Hidden Markov Models (HMMs)
    - HMMs are statistical models of how a speaker produces sounds (e.g., the sound sequence h-a-d)
    - HMMs represent the underlying statistical variations in the speech state (e.g., phoneme) and the temporal changes of speech between the states
    - Fast training algorithms (EM) exist for HMMs with guaranteed convergence properties
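The slides do not show how an utterance is scored against an HMM. One standard way, sketched below, is the forward algorithm, which sums over all state paths to obtain the log-likelihood of the observations. The sketch assumes the per-frame emission probabilities have already been evaluated; the two-state left-to-right model and all numbers are toy values, not taken from the presentation.

```python
import numpy as np

def hmm_log_likelihood(start, trans, emis):
    """Scaled forward algorithm.

    start: (N,)   initial state probabilities
    trans: (N, N) trans[i, j] = P(state j at t+1 | state i at t)
    emis:  (T, N) emis[t, j]  = P(observation at time t | state j)
    Returns log P(observations | model).
    """
    alpha = start * emis[0]
    log_lik = 0.0
    for t in range(len(emis)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emissions.
            alpha = (alpha @ trans) * emis[t]
        scale = alpha.sum()          # rescaling avoids numerical underflow
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

# Toy left-to-right model: state 0 may stay or move on to state 1.
start = np.array([1.0, 0.0])
trans = np.array([[0.7, 0.3],
                  [0.0, 1.0]])
emis = np.array([[0.9, 0.1],    # frame 0 looks like state 0
                 [0.2, 0.8]])   # frame 1 looks like state 1
log_lik = hmm_log_likelihood(start, trans, emis)
```

For this toy case the two possible paths contribute 0.9 * 0.7 * 0.2 and 0.9 * 0.3 * 0.8, so the total likelihood is 0.342, which is what the function returns in the log domain.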

  • Speaker Models
    The form of the HMM depends on the application:
    - Fixed phrase: word/phrase models (e.g., "Open sesame")
    - Prompted phrases/passwords: phoneme models (e.g., /s/ /i/ /x/)
    - Text-independent: single-state HMM (general speech)

  • Text-independent speaker verification
    - The impostor model is built using speech from all speakers: a GMM with a high number of mixture components
    - The speaker model is built using speaker adaptation, which needs only a relatively small amount of speech
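The slide does not say which adaptation method is used. A widely used choice for GMM-based systems is MAP adaptation of the component means, and the sketch below assumes that method. The impostor GMM (often called a universal background model) has diagonal covariances here; the relevance factor of 16, the two-component model and the synthetic enrolment data are all illustrative assumptions.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of every frame under every diagonal Gaussian: (T, M)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Shift the background-model means towards the enrolment frames x (T, D)."""
    # Posterior probability of each component for each frame.
    log_p = np.log(weights) + log_gauss_diag(x, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                          # soft frame count per component
    data_mean = (post.T @ x) / np.maximum(n, 1e-10)[:, None]
    # Components that saw more speaker data move further from the prior mean.
    alpha = n / (n + relevance)
    return alpha[:, None] * data_mean + (1.0 - alpha)[:, None] * means

# Toy background model with two components, and synthetic enrolment frames.
rng = np.random.default_rng(0)
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [10.0, 10.0]])
ubm_vars = np.ones((2, 2))
enrol = rng.normal(loc=1.0, scale=1.0, size=(200, 2))
speaker_means = map_adapt_means(enrol, ubm_weights, ubm_means, ubm_vars)
```

Only the first component sees enrolment data, so only its mean moves (towards roughly [1, 1]); the second component keeps the background mean. This is why a relatively small amount of speech is enough: unseen components fall back on the model trained from all speakers.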

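  • Verification Decision
    The decision is a two-class hypothesis test:
    - H0: the speaker is an impostor
    - H1: the speaker is indeed the claimed speaker

    The statistic is computed on the test utterance S as a log-likelihood ratio:

    $$\Lambda(S) = \log \frac{\text{Likelihood } S \text{ came from speaker HMM}}{\text{Likelihood } S \text{ did not come from speaker HMM}}$$

    [Diagram: feature extraction feeds the speaker model (+) and the impostor model (-); the combined score goes to the decision stage, which accepts the claim when the score is high enough.]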
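The likelihood-ratio decision can be sketched in a few lines. To keep the example short, single diagonal Gaussians stand in for the speaker and impostor models (the slides use HMMs or GMMs), the score is averaged over frames, and the threshold of 0 is an arbitrary illustrative choice; all data are synthetic.

```python
import numpy as np

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian (stand-in model)."""
    ll = -0.5 * np.sum((frames - mean) ** 2 / var + np.log(2 * np.pi * var), axis=1)
    return ll.mean()

def verify(frames, speaker, impostor, threshold=0.0):
    """Return (log-likelihood ratio, accept?) for a claimed identity."""
    score = avg_log_likelihood(frames, *speaker) - avg_log_likelihood(frames, *impostor)
    return score, bool(score > threshold)

# Stand-in models as (mean, variance) pairs, and two synthetic test utterances.
rng = np.random.default_rng(1)
speaker = (np.array([2.0, 2.0]), np.array([1.0, 1.0]))
impostor = (np.array([0.0, 0.0]), np.array([1.0, 1.0]))
true_frames = rng.normal(2.0, 1.0, size=(100, 2))   # resembles the claimed speaker
fake_frames = rng.normal(0.0, 1.0, size=(100, 2))   # resembles the impostor model

score_true, accept_true = verify(true_frames, speaker, impostor)
score_fake, accept_fake = verify(fake_frames, speaker, impostor)
```

Subtracting the two log-likelihoods is the same as taking the log of their ratio. Raising the threshold makes the system stricter (fewer impostors accepted, more true speakers rejected), and lowering it does the opposite.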

  • Verification Performance: Evaluating Speaker Verification Systems
    There are many factors to consider in evaluating speaker verification systems:
    - Speech quality
      - Channel and microphone characteristics
      - Noise level and type
      - Variability between enrolment and verification speech
    - Speech modality
      - Fixed/prompted/user-selected phrases
      - Free text
    - Speech duration
      - Duration and number of sessions of enrolment and verification speech
    - Speaker population
      - Size and composition

  • DET curve
    The importance of the error types depends on the application!
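A verification system makes two kinds of error: accepting an impostor (false acceptance) and rejecting the true speaker (false rejection). A DET curve plots one against the other as the decision threshold is swept. The sketch below computes both rates at a threshold and locates the point where they are closest (the equal error rate), using a handful of made-up scores; the function names are hypothetical.

```python
import numpy as np

def error_rates(genuine, impostor, threshold):
    """False acceptance and false rejection rates when accepting score >= threshold."""
    far = np.mean(impostor >= threshold)   # impostors wrongly accepted
    frr = np.mean(genuine < threshold)     # true speakers wrongly rejected
    return far, frr

def equal_error_rate(genuine, impostor):
    """Sweep every observed score as a threshold; return (EER estimate, threshold)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    rates = np.array([error_rates(genuine, impostor, t) for t in thresholds])
    i = np.argmin(np.abs(rates[:, 0] - rates[:, 1]))
    return rates[i].mean(), thresholds[i]

# Made-up scores: trials by the true speaker and trials by impostors.
genuine = np.array([2.0, 3.0, 1.0, 4.0])
impostor = np.array([0.0, 1.5, -1.0, 0.5])

far, frr = error_rates(genuine, impostor, threshold=1.5)
eer, eer_threshold = equal_error_rate(genuine, impostor)
```

The equal error rate is only one operating point. A telephone banking application may prefer a higher threshold (fewer false acceptances), while a convenience feature may prefer a lower one (fewer false rejections).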

  • Applications: Transaction authentication
    - Toll fraud prevention
    - Telephone credit card purchases
    - Telephone brokerage (e.g., stock trading)

  • Applications: Access control
    - Physical facilities
    - Computers and data networks

  • Applications: Monitoring
    - Remote time and attendance logging
    - Home parole verification
    - Prison telephone usage

  • Applications: Information retrieval
    - Customer information for call centers
    - Audio indexing (speech skimming device)

  • Applications: Forensics
    - Voice sample matching
    [Diagram: a recorded threat is compared with a suspect's voice.]

  • Future Directions
    - Research will focus on using speaker recognition for more unconstrained, uncontrolled situations
      - Audio search and retrieval
      - Increasing robustness to channel variability
      - Incorporating higher levels of knowledge into decisions
    - Speaker recognition technology will become an integral part of speech interfaces
      - Personalization of services and devices
      - Unobtrusive protection of transactions and information