Acoustic Analysis of Babel Recording Conditions
Dan Ellis, Columbia & ICSI, Team SWORDFISH
[email protected] http://labrosa.ee.columbia.edu/
Babel Audio Analysis - Ellis 2014-01-09
dpwe/talks/babel-acoustic-2014-01.pdf


  • Acoustic Analysis of Babel Recording Conditions

    Dan Ellis
    Columbia & ICSI
    Team SWORDFISH
    [email protected] http://labrosa.ee.columbia.edu/

    1. Environment Types
    2. Acoustic Condition Histograms
    3. Microphone Condition
    4. Conclusions

  • 1. Environment Types

    • Each Babel recording has envType metadata:
      STREET, PUBLIC_PLACE, HOME_OFFICE_MOBILE, VEHICLE, CAR_KIT, HOME_OFFICE_LANDLINE

    • Substantial effect of condition & language, e.g.:
      BP_105 CK vs. HOM
      BP_108 CK vs. HOL

    [Figure: "Error breakdown by condition", BP Kaldi on dev. One panel each for BP_101, BP_104, BP_105, BP_106, BP_107 and Sum; x-axis: condition (CK, V, PP, S, HOM, HOL, TOT); y-axis: error rate, 0 to 0.6; bars split into sub, del, ins.]

    Counts printed with each panel:

                CK    V   PP    S  HOM  HOL  TOT
      BP_101    14    9   13   13   54   17  120
      BP_104    11    6   22   12   83    2  136
      BP_105     9    8   20   15   64   11  127
      BP_106    10   13   36   30   43   14  146
      BP_107     8   10   25   13   60   16  132
      Sum       52   46  116   83  304   60  661
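The sub/del/ins split plotted per condition can be computed from per-recording alignment counts. The following is a minimal sketch, assuming each recording comes with its envType code and counts of reference words, substitutions, deletions and insertions. The record layout and the example numbers are hypothetical, and pooling word counts before dividing (rather than averaging per-recording rates) is an assumption about how the bars were computed.

```python
from collections import defaultdict


def error_breakdown(records):
    """Pool sub/del/ins counts per condition and convert them to rates.

    records: iterable of (condition, n_ref_words, n_sub, n_del, n_ins).
    Returns {condition: {"sub": r, "del": r, "ins": r, "wer": r}}, with an
    extra "TOT" entry that pools every condition.
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])  # ref words, sub, del, ins
    for cond, n_ref, n_sub, n_del, n_ins in records:
        for key in (cond, "TOT"):
            for slot, value in enumerate((n_ref, n_sub, n_del, n_ins)):
                totals[key][slot] += value
    rates = {}
    for cond, (n_ref, n_sub, n_del, n_ins) in totals.items():
        rates[cond] = {
            "sub": n_sub / n_ref,
            "del": n_del / n_ref,
            "ins": n_ins / n_ref,
            "wer": (n_sub + n_del + n_ins) / n_ref,
        }
    return rates


# Hypothetical counts, for illustration only (not Babel results).
example = [
    ("CK", 100, 30, 10, 5),
    ("CK", 100, 20, 10, 5),
    ("HOM", 200, 40, 20, 10),
]
breakdown = error_breakdown(example)
print(breakdown["CK"]["wer"], breakdown["HOM"]["wer"], breakdown["TOT"]["wer"])
```

Stacking the "sub", "del" and "ins" values for each condition gives one bar of the kind shown in the panels.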

  • Variations

    • Per-condition averages hide spreads
      still substantial differences between conditions

    [Figure: "Error breakdown by condition" for BP_101, BP_104, BP_105, BP_106, BP_107 and Sum; x-axis: condition (CK, V, PP, S, HOM, HOL, TOT); y-axis: error rate, 0 to 1. Panel counts are the same as on the Environment Types slide.]

    But… within-condition variation exceeds between-condition
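One way to put numbers on "within exceeds between" is to split the variance of per-recording WER into a between-condition and a within-condition part. The sketch below is an assumption about how such a comparison could be made, not the analysis behind the slide. The input values are the WERs of the eight example recordings shown on the "BP_105 Car_Kit vs. HO_Mob" slide, which are a handful of examples rather than the dev set, so they only illustrate the calculation.

```python
import numpy as np


def variance_split(wer_by_condition):
    """Split pooled WER variance into between- and within-condition parts.

    wer_by_condition: {condition: list of per-recording WERs}.
    Returns (between, within) as population variances; their sum equals the
    variance of all recordings pooled together.
    """
    groups = [np.asarray(v, dtype=float) for v in wer_by_condition.values()]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    n = pooled.size
    between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups) / n
    within = sum(((g - g.mean()) ** 2).sum() for g in groups) / n
    return between, within


ck = [79.6, 66.3, 54.9, 49.1]    # WER (%) of the four CK examples
hom = [92.3, 56.9, 46.1, 28.8]   # WER (%) of the four HOM examples
between, within = variance_split({"CK": ck, "HOM": hom})
print(f"between={between:.1f}  within={within:.1f}")
```

For these eight recordings the within-condition term is much larger than the between-condition term, which matches the slide's point in miniature.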

  • Characterizing Acoustics

    • Mel Spectrogram

    • Histogram of dB energy in each Mel band

    • Energy covariance

    [Figure: example recording BABEL_OP1_206_65882_20121201_174526_outLine_PP. Panels: Mel spectrogram (time / s, 0 to 3, vs. freq / Hz, ticks at 579, 1177, 2137, 3882); per-band level histogram (freq / Hz vs. level / dB, 20 to 120); energy covariance (freq / Hz vs. freq / Hz).]
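The three views listed above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the code used for the slides: the 8 kHz sampling rate, 256-point FFT, 10 ms hop, 40 bands, 2 dB histogram bins and the particular mel formula are all choices made here, and the input is synthetic noise rather than Babel audio.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_bands, n_fft, sr):
    """Triangular filters with centres equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_bands + 2))
    fft_hz = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_bands, fft_hz.size))
    for b in range(n_bands):
        lo, mid, hi = edges_hz[b:b + 3]
        rising = (fft_hz - lo) / (mid - lo)
        falling = (hi - fft_hz) / (hi - mid)
        fb[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mel_db(x, sr, n_fft=256, hop=80, n_bands=40):
    """Return an (n_bands, n_frames) array of mel-band energies in dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = mel_filterbank(n_bands, n_fft, sr) @ power.T
    return 10.0 * np.log10(mel_power + 1e-10)


def band_histograms(db, edges):
    """Histogram of dB level per band; each row is normalised to sum to 1."""
    clipped = np.clip(db, edges[0], edges[-1])
    hist = np.stack([np.histogram(row, bins=edges)[0] for row in clipped])
    return hist / hist.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
sr = 8000                       # assumed telephone-band sampling rate
x = rng.standard_normal(sr)     # one second of synthetic noise
db = mel_db(x, sr)              # Mel spectrogram in dB
edges = np.arange(-100.0, 101.0, 2.0)
hist = band_histograms(db, edges)   # histogram of dB energy in each band
cov = np.cov(db)                    # covariance of dB energy across bands
print(db.shape, hist.shape, cov.shape)
```

Pooling `db` over all recordings of one envType before calling `band_histograms` would give a per-condition histogram of the kind shown on the later slides.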

  • Example: BP_101 Car_Kit

    • Gating by subbands

    [Figure: CAR KIT recording BABEL_BP_101_43086_20111025_160708_inLine − CK compared with HOME LAND recording BABEL_BP_101_84943_20111020_144955_inLine − HOL. Panels include spectrograms (time / s vs. freq / Hz), level / dB in the 1500 Hz channel, and per-band level histograms (freq / Hz, ticks at 579, 1177, 2137, 3882, vs. level / dB, 0 to 100).]

  • Top-Level View - BP Languages

    • All BP_1xx dev set utterances pooled

    [Figure: per-band level histograms (freq / Hz, ticks at 579, 1177, 2137, 3882, vs. level / dB, 0 to 100), one panel per corpus with its count: BP_101 − ALL (120), BP_104 − ALL (136), BP_105 − ALL (127), BP_106 − ALL (146), BP_107 − ALL (132), Corpora − ALL (661).]

  • Drilling Down: BP_105 (Turkish)

    • DEV utterances sorted by envType

    [Figure: per-band level histograms (freq / Hz vs. level / dB, 0 to 100), one panel per envType with its count: CAR_KIT (9), VEHICLE (8), PUBLIC_PLACE (20), STREET (15), HOME_OFFICE_MOBILE (64), HOME_OFFICE_LANDLINE (11).]

  • BP_105 Car_Kit vs. HO_Mob

    [Figure: per-band level histograms (freq / Hz vs. level / dB, 0 to 100) for eight recordings, each labeled with its WER.]

    CK:
      BABEL_BP_105_31256_20120531_015506_inLine    WER=79.6
      BABEL_BP_105_42212_20120706_194059_inLine    WER=66.3
      BABEL_BP_105_39774_20120623_021946_inLine    WER=54.9
      BABEL_BP_105_31345_20120515_214849_inLine    WER=49.1

    HOM:
      BABEL_BP_105_20213_20120123_011920_outLine   WER=92.3
      BABEL_BP_105_48536_20120208_212737_outLine   WER=56.9
      BABEL_BP_105_66883_20120207_051718_inLine    WER=46.1
      BABEL_BP_105_87806_20120201_235442_outLine   WER=28.8

  • Top-Level View - OP1 Languages

    [Figure: per-band level histograms (freq / Hz, ticks at 579, 1177, 2137, 3882, vs. level / dB, 0 to 100), one panel per corpus with its count: OP1_102 − ALL (126), OP1_103 − ALL (125), OP1_201 − ALL (126), OP1_203 − ALL (131), OP1_206 − ALL (141), Corpora − ALL (649).]

  • Drilling Down: OP1_206 (Zulu)

    [Figure: per-band level histograms (freq / Hz vs. level / dB, 0 to 100), one panel per envType with its count: CAR_KIT (9), VEHICLE (13), PUBLIC_PLACE (20), STREET (9), HOME_OFFICE_MOBILE (77), MICROPHONE (13).]

  • Drilling Down: OP1_201 (Creole)

    [Figure: per-band level histograms (freq / Hz vs. level / dB, 0 to 100), one panel per envType with its count: CAR_KIT (12), VEHICLE (13), PUBLIC_PLACE (12), STREET (13), HOME_OFFICE_MOBILE (63), MICROPHONE (13).]

  • OP1_201 HO_Mob vs. Mic

    [Figure: per-band level histograms (freq / Hz, ticks at 579, 1177, 2137, 3882, vs. level / dB, 0 to 100) for eight recordings.]

    HOM:
      BABEL_OP1_201_16184_20130305_081912_inLine
      BABEL_OP1_201_47877_20130429_092603_outLine
      BABEL_OP1_201_70110_20130224_022802_outLine
      BABEL_OP1_201_99813_20130514_080612_outLine

    Mic:
      BABEL_OP1_201_26074_20130522_003756_inLine
      BABEL_OP1_201_54162_20130508_044116_inLine
      BABEL_OP1_201_71263_20130602_021725_inLine
      BABEL_OP1_201_82035_20130601_052036_inLine

  • 2xx Wideband Mic Channel

    • 48 kHz sampling rate

    • Single noise floor

    • Extremely broad-band

    [Figure: level / dB (0 to 100) vs. freq / Hz.]

  • Blind SNR Estimation

    • NIST STNR - histogram-based
      confused by double noise floor

    • SNRvad - VAD-based
      tricked by noise gating

    [Figure: scatter plot "NIST STNR vs SNRvad for BABEL OP1 206 train sphwav"; x-axis: NIST stnr, 0 to 70; y-axis: SNRvad, 5 to 50.]
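To see why a second, lower noise floor can upset both families of estimator, here are two deliberately simplified stand-ins. They are not the NIST STNR or SNRvad algorithms: `snr_hist`, `snr_vad`, the 95th-percentile rule, the 20 dB VAD margin and the synthetic frame levels are all assumptions made for this sketch.

```python
import numpy as np


def snr_hist(levels_db, bin_width=1.0):
    """Histogram-style estimate: 95th-percentile level minus the noise peak.

    The noise peak is the most populated histogram bin at or below the
    median level, which presumes a single noise floor.
    """
    levels_db = np.asarray(levels_db, dtype=float)
    edges = np.arange(levels_db.min(), levels_db.max() + bin_width, bin_width)
    counts, edges = np.histogram(levels_db, bins=edges)
    centres = 0.5 * (edges[:-1] + edges[1:])
    low = centres <= np.median(levels_db)
    noise_db = centres[low][np.argmax(counts[low])]
    return np.percentile(levels_db, 95) - noise_db


def snr_vad(levels_db, margin_db=20.0):
    """VAD-style estimate: mean speech power over mean non-speech power.

    Frames within margin_db of the loudest frame count as speech
    (a crude energy-based voice activity decision).
    """
    levels_db = np.asarray(levels_db, dtype=float)
    power = 10.0 ** (levels_db / 10.0)
    speech = levels_db > levels_db.max() - margin_db
    return 10.0 * np.log10(power[speech].mean() / power[~speech].mean())


rng = np.random.default_rng(0)
speech = rng.normal(70.0, 3.0, 300)       # synthetic speech-frame levels, dB
background = rng.normal(40.0, 2.0, 700)   # a single noise floor
open_line = np.concatenate([speech, background])
# Noise gating: 500 of the 700 pause frames drop to a second, lower floor.
gated_line = np.concatenate(
    [speech, background[:200], rng.normal(10.0, 2.0, 500)])

for name, levels in (("open", open_line), ("gated", gated_line)):
    print(name, round(snr_hist(levels), 1), round(snr_vad(levels), 1))
```

In this toy setup the noise under the speech is identical in both cases, yet both estimates rise once gating is applied, and the histogram-style one rises by far more because it locks onto the gated floor.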

  • Summary

    • Acoustic environment affects performance
      impact varies with different languages

    • Cellphone noise gating gives complex histograms
      worse in CAR_KIT
      confuses simple SNR estimation

    • Mic channel is extremely high quality
      limited data - hard to exploit

    • Opportunities
      improved normalization?
      clustering of channel characteristics
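As a starting point for the clustering idea, recordings could be grouped by a per-recording channel feature such as the mean dB level in each Mel band. The sketch below uses plain k-means written in NumPy. The choice of k-means, the feature, and the two synthetic channel types ("flat" and "sloped") are assumptions for illustration, not something proposed in the talk.

```python
import numpy as np


def _assign(x, centres):
    """Index of the nearest centre (squared Euclidean) for each row of x."""
    dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(features, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    x = np.asarray(features, dtype=float)
    centres = [x[0]]
    for _ in range(1, k):
        nearest = np.min([((x - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(x[np.argmax(nearest)])
    centres = np.stack(centres)
    for _ in range(n_iter):
        labels = _assign(x, centres)
        updated = np.stack([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(updated, centres):
            break
        centres = updated
    return _assign(x, centres), centres


rng = np.random.default_rng(0)
# Synthetic per-recording features: mean level (dB) in 8 bands.
flat = 60.0 + rng.normal(0.0, 1.0, (10, 8))                          # flat channel
sloped = np.linspace(60.0, 30.0, 8) + rng.normal(0.0, 1.0, (10, 8))  # roll-off
features = np.vstack([flat, sloped])
labels, centres = kmeans(features, k=2)
print(labels)
```

Each cluster centre is itself a mean band-level profile, so subtracting it from the recordings assigned to that cluster would be one simple form of per-cluster normalization.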
