Perceptual In-Set Speaker ID using Neutral Speech and Lombard Speech

Ayako Ikeno and John H.L. Hansen
Center for Robust Speech Systems (CRSS)
Erik Jonsson School of Engineering & Computer Science
University of Texas at Dallas
Richardson, Texas 75083-0688, U.S.A.
Email: {ikeno, John.Hansen}@utdallas.edu

IAFPA-2006, July 23-26, 2006
Slides by John H.L. Hansen, 2006

    Lombard Effect: first observed by Etienne Lombard in 1911

    A change in speech production in response to noise, to increase communication performance

    Lombard Test: a standard test for hearing loss in the U.S. (ASHA) that measures the dB-SPL change in speech production

    Hansen (1988): evaluation of 200 features with more than 10,000 statistical tests on 11 different stressed speech conditions to quantify changes in speech production

    IAFPA-06 focus: the Lombard Effect. Audio samples for the perceptual experiment were extracted from the UTScope corpus.

    UTScope: Speech under COgnitive and Physical stress & Emotion

    The corpus consists of 4 domains:
      • Lombard Effect: noise levels and types
      • Physical Stress: stair climbing/stepper
      • Cognitive Stress: driving (simulator and actual)
      • Emotion: Angry, Fear, Anxiety, Frustration

    Goal: obtain Lombard speech at different noise levels

    Quantify ground truth with biometric analysis

    Lombard Effect speech: 9 conditions (3 noise types, 3 levels), 1 sec. duration
      • Pink noise: 65, 75, 85 dB-SPL
      • Highway noise (windows open): 70, 80, 90 dB-SPL
      • Large crowd noise: 70, 80, 90 dB-SPL

    UTScope:
      • Pink noise: 65, 75, 85 dB-SPL
      • Highway driving, windows half open: 70, 80, 90 dB-SPL
      • Large crowd noise: 70, 80, 90 dB-SPL
      • Pure-tone hearing screening
      • Open-air headphones for speech feedback
      • Noise levels calibrated with a Quest SLM
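To make the noise levels concrete, the sketch below enumerates the 9 noise conditions and converts each nominal dB-SPL level to RMS sound pressure, using the standard 20 µPa reference for airborne sound. The dictionary keys and function names are illustrative assumptions, not part of UTScope, and the levels are the nominal values listed above rather than measured ones.

```python
# Standard reference pressure for dB SPL in air: 20 micropascals.
P_REF_PA = 20e-6

# Nominal levels (dB SPL) for the three noise types listed on the slides.
# The key names are hypothetical labels chosen for this sketch.
NOISE_LEVELS_DB = {
    "pink": (65, 75, 85),
    "highway": (70, 80, 90),
    "large_crowd": (70, 80, 90),
}


def spl_to_pascal(level_db: float) -> float:
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF_PA * 10 ** (level_db / 20)


def conditions():
    """Enumerate the (noise type, level) pairs: 3 types x 3 levels."""
    return [
        (noise, level)
        for noise, levels in NOISE_LEVELS_DB.items()
        for level in levels
    ]


if __name__ == "__main__":
    for noise, level in conditions():
        print(f"{noise:12s} {level:3d} dB SPL -> {spl_to_pascal(level):.4f} Pa")
```

Each 10 dB step between levels corresponds to a pressure ratio of about 3.16 (the square root of 10).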

    UTScope:
      • 20 TIMIT sentences
      • 5 digit strings
      • 1 minute of spontaneous speech
      • 100 speakers
      • 8-channel DAT recorder
      • P-mic
      • Close-talking mic
      • Far-field mic

    The ASHA-certified sound booth and recording equipment

    [Figure: panels "Male Lombard" and "Male Neutral"]

    The Lombard Effect impacts temporal and spectral structure (as expected)

    Evaluation: perceptual experiments to assess speaker recognition

    Listener Test: Speakers
      • Corpus: UTScope
      • Native US English speakers
      • Female speakers only

    Speech conditions
      • Noise type: highway driving
      • Noise level: 90 dB-SPL

    Condition   Reference   Test
    NL-LD       Neutral     Lombard
    LD-LD       Lombard     Lombard
    NL-NL       Neutral     Neutral

    Speech Materials
      • Read speech
      • TIMIT sentences: phonetically balanced
      • 3 sentences per audio sample (.wav, 16 kHz)

    Ref:
      • Basketball can be an entertaining sport.
      • My problem is, the cat's meow always hurts my ears.
      • The causeway ended abruptly at the shore.

    Test:
      • Youngsters commonly love chocolate and candies as treats.
      • December and January are nice months to spend in Miami.
      • There were other farmhouses nearby.

    Listener Test: Listeners (12: 2 female, 10 male, as of May 06; 41 as of July 06)
      • India (4), China (1), Korea (1), Mexico (1), Pakistan (1), Thailand (1), Turkey (1), US (1), Vietnam (1)

    Task: In-Set vs. Out-of-Set Speaker Identification
      • Reference/Training: 12 in-set female speakers
      • Test: 8 in-set speakers, 4 out-of-set speakers

    Audio samples:
      • Reference audio: Neutral, Lombard
      • Test audio: Neutral, Lombard

      • The effect of speech condition is significant (p = .0024).
      • Mismatched condition (NL-LD) accuracy: chance level (52%).
      • Lombard speech (LD-LD, 79%): higher accuracy than neutral speech (NL-NL, 67%).
      • The Lombard effect may emphasize speech characteristics and improve accuracy on perceptual speaker ID.
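As a rough illustration of what "chance level" means for a two-way in-set/out-of-set decision, the sketch below runs an exact two-sided binomial test of each reported average accuracy against 50%. It is not the authors' analysis: the slides do not say which test produced p = .0024. The sketch assumes 12 listeners each judging 12 test speakers (144 judgments per condition), treats those judgments as independent, and reconstructs the number of correct answers from the rounded percentages.

```python
from math import comb

# Assumption: 12 listeners x 12 test speakers (8 in-set + 4 out-of-set).
N_TRIALS = 12 * 12

# Average accuracies (%) as reported on the slide.
REPORTED_ACCURACY = {"NL-LD": 52, "LD-LD": 79, "NL-NL": 67}


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def chance_check() -> dict:
    """Return a p-value per condition for the hypothesis 'accuracy = 50%'."""
    results = {}
    for condition, pct in REPORTED_ACCURACY.items():
        # Correct count reconstructed from the rounded percentage (approximate).
        k = round(pct / 100 * N_TRIALS)
        results[condition] = binom_two_sided_p(k, N_TRIALS)
    return results


if __name__ == "__main__":
    for condition, p_value in chance_check().items():
        print(f"{condition}: p = {p_value:.3g}")
```

Under these assumptions the mismatched NL-LD condition cannot be distinguished from guessing, while both matched conditions are well above it.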

    [Chart: In-Set vs. Out-of-Set Speaker Identification, Lombard Effect. Identification accuracy (%) by speaker category (Average, In-Set, Out-of-Set) for the reference and test speech conditions NL-LD, LD-LD, and NL-NL]

    Per-listener identification accuracy (%). Avg = average, Out = Out-of-Set.

                  NL-LD (Part-1)       LD-LD (Part-2)       NL-NL (Part-3)
    Listener    Avg In-Set   Out     Avg In-Set   Out     Avg In-Set   Out
    S            75   87.5    50      67     75    50      50     75     0
    WK           58     50    75      92    100    75      83    100    50
    RH           33     25    50      58     75    25      50     38    75
    VP           50     25   100      75     88    50      75     75    75
    A            67     63    75      75     88    50      83     88    75
    LG           50     25   100      92     88   100      67     50   100
    PA           42     13   100      92    100    75      67     88    25
    MH           67     63    75      58     75    25      75     88     0
    NM           42     38    50      50     75     0      67     75    50
    AD           33      0   100      83     75   100      92     88   100
    MA           42     13   100     100    100   100      58     50    75
    MM           67     63    75     100    100   100      42     50    25
    Average      52     39    79      79     87    63      67     72    54
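The average column appears to be a trial-weighted combination of the in-set and out-of-set accuracies. The sketch below assumes each listener gives one judgment per test speaker (8 in-set and 4 out-of-set, per the task description) and that every judgment counts equally; the function name is illustrative.

```python
N_IN_SET = 8      # in-set test speakers per listener (from the task description)
N_OUT_OF_SET = 4  # out-of-set test speakers per listener


def overall_accuracy(in_set_pct: float, out_of_set_pct: float,
                     n_in: int = N_IN_SET, n_out: int = N_OUT_OF_SET) -> float:
    """Combine in-set and out-of-set accuracy (%) into an overall accuracy (%),
    weighting each by its number of test trials (an assumption of this sketch)."""
    return (n_in * in_set_pct + n_out * out_of_set_pct) / (n_in + n_out)


if __name__ == "__main__":
    # Listener S, NL-LD: 87.5% in-set, 50% out-of-set
    print(overall_accuracy(87.5, 50))
    # Listener WK, LD-LD: 100% in-set, 75% out-of-set
    print(round(overall_accuracy(100, 75)))
```

For listener S in the NL-LD condition this gives 75, and for listener WK in LD-LD it rounds to 92, matching the reported averages for those rows.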

    Automated System Performance (SUSAS Corpus)
    (See Hansen et al., "The Impact of Speech Under 'Stress' on Military Speech Technology," NATO Research & Tech. Org. RTO-TR-10, March 2000.)
      • Angry: 62%
      • Lombard: 48%
      • Loud: 74%
      • 5-74% LOSS

    The trend holds for the automated system as well.

    In-Set accuracy: affected by the speech condition significantly (p