Spatial Diffuseness Features for DNN -Based Speech Recognition … · 2015-05-15 · Spatial...

Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

Andreas Schwarz, Christian Huemmer, Roland Maas, Walter Kellermann

Lehrstuhl für Multimediakommunikation und Signalverarbeitung
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

ICASSP 2015


Deep Neural Networks for Acoustic Modeling

Trend: explicit feature processing → implicit learning
- MFCCs → simple filterbank features [Mohamed et al. 2013]
- Filterbanks → raw time-domain signals [Jaitly, Hinton 2011]
- Denoising → noise-aware training [Seltzer et al. 2013]

What about spatial information (microphone arrays)?
- Stacked feature vectors from multiple channels [Swietojanski et al. 2013]
  - Phase information is not exploited
- Raw multi-channel waveforms [Hoshen et al. 2015]
  - Hard to generalize for arbitrary acoustic scenarios
- Spatial diffuseness features
  - Represent spatial information independently of source position and microphone array

[Image: mh acoustics Eigenmike]

Outline

- Signal Model
- Coherence-based Dereverberation in the STFT Domain
- Extraction of Spatial Diffuseness Features

Signal Model

- Desired signal is fully coherent (only delayed between microphones)
- Noise and reverberation are diffuse and uncorrelated with the desired signal
- Coherence of the mixed sound field can be modeled as:

→ Coherent-to-diffuse ratio (CDR) can be estimated from the complex spatial coherence of the mixture
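The model equation itself did not survive in the slide text. A form consistent with the bullets of the Signal Model slide, and with the diffuseness definition D(k,f) = 1/(CDR(k,f)+1) used later in the talk, is a CDR-weighted mixture of a coherent and a diffuse component. The symbols below are assumptions introduced for illustration: \Gamma_x is the coherence of the mixture, \Gamma_s that of the desired signal with time difference of arrival \Delta t, and \Gamma_n that of the diffuse component. The sinc form of \Gamma_n assumes a spherically isotropic diffuse field, omnidirectional microphones at spacing d, and speed of sound c. It should be checked against the cited Schwarz/Kellermann paper.

```latex
\Gamma_x(k,f) = \frac{\mathrm{CDR}(k,f)\,\Gamma_s(f) + \Gamma_n(f)}{\mathrm{CDR}(k,f) + 1},
\qquad
\Gamma_s(f) = e^{j 2\pi f \Delta t},
\qquad
\Gamma_n(f) = \frac{\sin(2\pi f d / c)}{2\pi f d / c}
```

Substituting D = 1/(CDR+1) gives the equivalent form \Gamma_x = (1-D)\,\Gamma_s + D\,\Gamma_n, which shows why the mixture coherence moves from the unit circle toward \Gamma_n as the sound field becomes more diffuse.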

Coherence-based STFT-Domain Dereverberation

1. Estimate short-time spatial coherence (quasi-instantaneous)
2. Estimate coherent-to-diffuse ratio (CDR)
3. Perform spectral subtraction to suppress diffuse components

[Schwarz/Kellermann, “Coherent-to-Diffuse Power Ratio Estimation for Dereverberation”, IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015]

- Only instantaneous signal properties are exploited
- No knowledge or estimation of source DOA required
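The three steps listed for coherence-based dereverberation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation. It assumes the mixture coherence model Γ_x = (CDR·Γ_s + Γ_n)/(CDR + 1) with |Γ_s| = 1 and a real-valued sinc coherence Γ_n for the diffuse component. The CDR estimator is the positive root of |(CDR + 1)·Γ_x − Γ_n|² = CDR², which follows from that model without knowing the DOA. The cited paper discusses several estimators, and this one may differ from the one behind the reported results. All function names, the forgetting factor, the gain rule and its parameters are hypothetical choices.

```python
import numpy as np

def short_time_coherence(X1, X2, lam=0.68):
    """Step 1: complex coherence from recursively averaged auto- and cross-PSDs.

    X1, X2: STFTs of shape (frames, bins); lam is an assumed forgetting factor.
    """
    def smooth(P):
        out = np.empty_like(P)
        acc = P[0]
        for k in range(P.shape[0]):
            acc = lam * acc + (1.0 - lam) * P[k]
            out[k] = acc
        return out

    p11 = smooth(np.abs(X1) ** 2)
    p22 = smooth(np.abs(X2) ** 2)
    p12 = smooth(X1 * np.conj(X2))
    return p12 / np.sqrt(p11 * p22 + 1e-20)

def diffuse_coherence(freqs_hz, mic_distance_m, c=343.0):
    """Coherence of an ideal spherically isotropic diffuse field (assumed model)."""
    # np.sinc(x) = sin(pi x) / (pi x), so this is sin(2 pi f d / c) / (2 pi f d / c).
    return np.sinc(2.0 * freqs_hz * mic_distance_m / c)

def estimate_cdr(gamma_x, gamma_n):
    """Step 2: positive root of |(CDR + 1) * gamma_x - gamma_n|^2 = CDR^2."""
    # Keep |gamma_x| strictly below 1 so the denominator cannot vanish.
    mag = np.abs(gamma_x)
    limit = 1.0 - 1e-10
    gamma_x = np.where(mag > limit, gamma_x * limit / np.maximum(mag, 1e-20), gamma_x)
    mag2 = np.abs(gamma_x) ** 2
    re = np.real(gamma_x)
    disc = (gamma_n ** 2 * re ** 2 - gamma_n ** 2 * mag2 + mag2
            - 2.0 * gamma_n * re + gamma_n ** 2)
    cdr = (-np.sqrt(np.maximum(disc, 0.0)) - mag2 + gamma_n * re) / (mag2 - 1.0)
    return np.maximum(cdr, 0.0)

def suppression_gain(cdr, mu=1.0, g_min=0.1):
    """Step 3: spectral-subtraction-style gain; mu and g_min are illustrative values."""
    diffuseness = 1.0 / (cdr + 1.0)
    return np.clip(1.0 - mu * diffuseness, g_min, 1.0)

def dereverberate(X1, X2, freqs_hz, mic_distance_m=0.08):
    """Apply the gain to the first channel (an assumption); returns (output STFT, CDR)."""
    gamma_x = short_time_coherence(X1, X2)
    gamma_n = diffuse_coherence(freqs_hz, mic_distance_m)
    cdr = estimate_cdr(gamma_x, gamma_n)
    return suppression_gain(cdr) * X1, cdr
```

The default spacing of 0.08 m matches the 8 cm two-microphone setup of the REVERB task described in the evaluation setup. Because the gain depends only on the current coherence estimate, no source position needs to be tracked.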

Evaluation

Word error rate (%) for the REVERB challenge evaluation set

Clean speech-trained DNN (SimData, RealData): [bar chart; values not preserved in the extracted text]

Multi-condition-trained DNN:

            logmelspec  enh. logmelspec
SimData     9.5         9.4
RealData    28.8        28.8

⇨ Improvement for clean-trained DNN
⇨ Disappears with multi-condition training

Multi-condition training neutralizes the effect of dereverberation

Spatial Feature Extraction

Instead of STFT-domain enhancement, extract spatial features

- meldiffuseness:
  - 0 for purely directional sound, 1 for purely diffuse sound
  - computed from coherent-to-diffuse ratio: D(k,f) = 1/(CDR(k,f)+1)
- Naive approach: magnitude squared coherence (melmsc)
  - Depends not only on diffuse noise content, but also on microphone spacing, DOA
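A small sketch of the two spatial features, assuming a CDR estimate and a complex coherence estimate are already available per time-frequency bin. The diffuseness follows the slide's formula D(k,f) = 1/(CDR(k,f)+1). The mel pooling step (a normalized weighted average over STFT bins) and the function names are assumptions for illustration, and the toy weight matrix stands in for a real mel filterbank.

```python
import numpy as np

def diffuseness(cdr):
    """D = 1 / (CDR + 1): 0 for purely directional, 1 for purely diffuse sound."""
    return 1.0 / (np.asarray(cdr, dtype=float) + 1.0)

def msc(gamma_x):
    """Magnitude squared coherence of the mixture."""
    return np.abs(gamma_x) ** 2

def mel_pool(tf_feature, mel_weights):
    """Weighted average over STFT bins for each mel band.

    tf_feature: (frames, bins); mel_weights: (bands, bins), nonnegative.
    Rows are normalized so pooled values stay within the input range.
    """
    w = mel_weights / mel_weights.sum(axis=1, keepdims=True)
    return tf_feature @ w.T

# Toy example: one frame, four STFT bins, two hypothetical bands.
cdr = np.array([[0.0, 1.0, 3.0, np.inf]])
d = diffuseness(cdr)                      # [[1.0, 0.5, 0.25, 0.0]]
toy_weights = np.array([[1.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 1.0]])
meldiffuseness = mel_pool(d, toy_weights)  # [[0.75, 0.125]]
```

Unlike the MSC, the diffuseness depends on the array geometry only through the CDR estimator, which is the motivation the slide gives for preferring it.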

Visualization of Features

[Figure: time-frequency plots of logmelspec, enhanced logmelspec, and meldiffuseness]

Evaluation Setup

REVERB challenge “two microphone” task [Kinoshita et al. 2013]
- noisy and reverberant signals created from WSJCAM0 corpus
- varying direction of arrival
- 2 microphones, 8 cm spacing

DNN-based Speech Recognizer
- Kaldi toolkit
- hybrid DNN-HMM acoustic model
- “maxout” network (4 hidden layers, 2000 inputs, 400 outputs per layer)
- ±5 frame splicing
- training on multi-condition noisy and reverberant data (17.5 hours)

Feature vectors (overall dimension: 72)
- noisy logmelspec features: logmelspec + Δ + ΔΔ
- enhanced logmelspec features: enh. logmel + Δ + ΔΔ
- augmented with melmsc: logmelspec + Δ + melmsc
- augmented with meldiffuseness: logmelspec + Δ + meldiffuseness
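The ±5 frame splicing in the setup can be illustrated with a short sketch. With 72-dimensional feature vectors, stacking 11 frames gives 792 values per frame, provided the spliced vector is fed to the network directly, which the slides do not state. The function name and the edge handling (repeating the first and last frame) are assumptions. This is not Kaldi's own splicing code.

```python
import numpy as np

def splice(features, context=5):
    """Stack each frame with its +/-context neighbours.

    features: (frames, dim). Edges are padded by repeating the first/last frame.
    Returns an array of shape (frames, (2 * context + 1) * dim).
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n_frames] for i in range(2 * context + 1)])

frames = np.arange(10 * 72, dtype=float).reshape(10, 72)
spliced = splice(frames)
print(spliced.shape)  # (10, 792)
```

The centre block of each spliced row is the original frame, so the network sees every frame together with 50 ms of context on either side if a 10 ms frame shift is used (the frame shift is not given on the slides).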

Evaluation Results

Word error rate (%):

                             SimData  RealData
logmelspec                   9.5      28.8
enh. logmelspec              9.4      28.8
logmelspec + melmsc          9.0      27.7
logmelspec + meldiffuseness  8.5      27.0

SimData: measured impulse responses, additive noise
RealData: real recordings in noisy environment

6% to 11% relative WER reduction by using spatial features
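The quoted 6% to 11% range can be reproduced from the reported word error rates, taking the plain logmelspec system as the baseline and the meldiffuseness system as the comparison (the pairing is inferred from the numbers, not stated on the slide). The helper name is hypothetical.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

sim = relative_reduction(9.5, 8.5)     # SimData: logmelspec vs. + meldiffuseness
real = relative_reduction(28.8, 27.0)  # RealData: logmelspec vs. + meldiffuseness
print(f"SimData: {sim:.2f}%  RealData: {real:.2f}%")
```

This gives about 10.5% for SimData and 6.25% for RealData, which brackets the range stated on the slide.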

Summary

Motivation
- STFT-domain dereverberation has little effect on WER
- Idea: exploit spatial information in the DNN

Spatial Diffuseness Features
- Can be extracted instantaneously
- “Blind”, no knowledge or estimation of the source DOA required
- Device-independent features
- 6% to 11% relative WER reduction for REVERB challenge 2-channel task
- MATLAB code available (see paper)

Can we use a similar approach to deal with directional interferers?

Thank you for your attention!

Results (Details)

Word error rate (%)

                                          Evaluation Set                                              Development Set
                                          SimData                                   RealData          SimData  RealData
                                          Room 1      Room 2      Room 3      Avg   Room 1      Avg   Avg      Avg
Recognizer  Feature                       near  far   near  far   near  far         near  far
GMM-HMM     MFCC-LDA-MLLT-fMLLR           6.6   7.5   9.4   16.6  11.1  20.7  12.0  31.2  30.2  30.7  12.1     31.6
DNN-HMM     logmelspec+∆+∆∆               5.7   6.7   7.7   13.9  8.7   14.6  9.5   28.5  29.1  28.8  9.7      24.9
DNN-HMM     enhanced logmelspec+∆+∆∆      6.6   7.1   7.7   12.2  8.3   14.6  9.4   28.5  29.1  28.8  9.1      25.3
DNN-HMM     logmelspec+∆+melmsc           6.2   6.3   7.0   12.3  8.2   13.9  9.0   27.3  28.0  27.7  8.7      24.7
DNN-HMM     logmelspec+∆+meldiffuseness   5.9   6.1   6.9   11.0  8.2   12.9  8.5   27.8  26.3  27.0  7.9      24.2
