Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments

張智星

http://www.cs.nthu.edu.tw/~jang



Reference

Jialin Shen, Jeihweih Hung, Linshan Lee, “Robust entropy-based endpoint detection for speech recognition in noisy environments,” International Conference on Spoken Language Processing (ICSLP), Sydney, 1998.


Summary

An entropy-based algorithm for accurate and robust endpoint detection for speech recognition in noisy environments

Outperforms energy-based algorithms in both detection accuracy and recognition performance

Error reduction: 16%


Motivation

Energy-based endpoint detection becomes less reliable in the presence of non-stationary noise and sound artifacts such as lip smacks, heavy breathing, and mouth clicks.

Spectral entropy is effective at distinguishing speech segments from non-speech segments.


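Spectral Entropy

PDF (normalization):

$$p_i = \frac{s(f_i)}{\sum_{k=1}^{N} s(f_k)}, \quad i = 1, \ldots, N$$

where s(f_i) is the spectral energy of frequency component f_i and N is the number of frequency components. Two constraints are applied:

$$s(f_i) = 0, \quad \text{if } f_i < 250\ \text{Hz or } f_i > 6000\ \text{Hz}$$

$$p_i = 0, \quad \text{if } p_i < \delta_1 \text{ or } p_i > \delta_2$$

Spectral entropy:

$$H = -\sum_{k=1}^{N} p_k \log p_k$$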
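The definitions above can be turned into a per-frame computation. The following is a minimal sketch, not the authors' code: the function name `spectral_entropy` is hypothetical, s(f_i) is assumed to be the power spectrum of the frame with no analysis window, and because the slides do not give δ1 and δ2, the defaults (0 and 1) disable that clipping step. The slides also do not say whether p is renormalized after clipping; this sketch does not renormalize. The 16 kHz sample rate and 512-sample frame in the usage lines are assumptions as well.

```python
import numpy as np

def spectral_entropy(frame, sample_rate, f_low=250.0, f_high=6000.0,
                     delta1=0.0, delta2=1.0):
    """Spectral entropy H = -sum_i p_i log p_i of a single frame (sketch)."""
    frame = np.asarray(frame, dtype=float)
    # s(f_i): assumed here to be the power spectrum (no window applied)
    s = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # s(f_i) = 0 outside the 250-6000 Hz band
    s[(freqs < f_low) | (freqs > f_high)] = 0.0
    total = s.sum()
    if total == 0.0:
        return 0.0  # silent frame: no distribution to measure
    # p_i = s(f_i) / sum_k s(f_k)
    p = s / total
    # p_i = 0 if p_i < delta1 or p_i > delta2 (no renormalization afterwards)
    p[(p < delta1) | (p > delta2)] = 0.0
    nz = p[p > 0.0]  # 0 log 0 is taken as 0
    return float(-np.sum(nz * np.log(nz)))

# Usage: white noise vs. a pure 1 kHz tone (assumed 16 kHz, 512-sample frames)
rng = np.random.default_rng(0)
noise = rng.standard_normal(512)
tone = np.sin(2 * np.pi * 1000 * np.arange(512) / 16000)
print(spectral_entropy(noise, 16000), spectral_entropy(tone, 16000))
```

The flat spectrum of white noise yields a high entropy, bounded by the log of the number of in-band bins, while the tone puts almost all of its energy in one bin and yields an entropy near zero.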


Properties of Entropy

[Figure: entropy plots for N=2 and N=3, generated by entropyPlot.m]
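The plots presumably illustrate two standard properties of entropy: it is largest, equal to log N, for the uniform distribution, and it is zero when all probability sits in one component. The sketch below checks both numerically in Python; it is not the original entropyPlot.m (which is not shown), the helper name `entropy` is hypothetical, and the natural log is assumed.

```python
import numpy as np

def entropy(p):
    """H = -sum_k p_k log p_k (natural log), with 0 log 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

# N = 2: H(p, 1 - p) over a grid; the maximum is at p = 0.5 and equals log 2
ps = np.linspace(0.0, 1.0, 101)
h2 = np.array([entropy([p, 1.0 - p]) for p in ps])
print(ps[np.argmax(h2)], h2.max(), np.log(2))

# N = 3: random points on the probability simplex never exceed log 3
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(3), size=1000)
h3 = np.array([entropy(p) for p in samples])
print(h3.max(), np.log(3))

# Uniform vs. fully peaked distributions
print(entropy([1/3, 1/3, 1/3]), entropy([1.0, 0.0, 0.0]))
```

This is why a flat (noise-like) spectrum gives high spectral entropy while a peaked (formant-rich) spectrum gives lower entropy.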


Entropy Weighting

A set of weighting factors w_k can be applied:

$$H = -\sum_{k=1}^{N} w_k\, p_k \log p_k$$

These weighting factors are statistically estimated from a large collection of speech signals.
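A minimal sketch of the weighted entropy in Python. The slides do not describe how the weights are estimated, so the weights below are arbitrary placeholders, and the function name `weighted_spectral_entropy` is hypothetical. With all w_k = 1 the expression reduces to the unweighted spectral entropy.

```python
import numpy as np

def weighted_spectral_entropy(p, w):
    """Weighted entropy H = -sum_k w_k p_k log p_k (natural log)."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    mask = p > 0.0  # 0 log 0 is taken as 0
    return float(-np.sum(w[mask] * p[mask] * np.log(p[mask])))

# Placeholder weights for a 4-bin spectrum; real weights would be estimated
# from a large speech corpus, which this sketch does not attempt.
p = np.array([0.1, 0.4, 0.4, 0.1])
w = np.array([0.5, 1.5, 1.5, 0.5])
print(weighted_spectral_entropy(p, w))
print(weighted_spectral_entropy(p, np.ones_like(p)))  # unweighted entropy
```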


Endpoint Detection

The spectral entropy values are first summed over a window of 20 frames, and the result is smoothed with a median filter.

Thresholds are then applied to detect the beginning and ending boundaries of the embedded speech segments.

A short period of background noise is taken as the reference for the initial boundary detection.

Speech segments shorter than 100 ms are rejected.
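The steps above can be sketched as follows, taking per-frame spectral entropy values as input. This is a hypothetical illustration, not the authors' procedure: the name `detect_endpoints` is invented, the 20-frame sum is assumed to be a centered sliding window, the median-filter order (5) and the number of reference noise frames (10) are guesses, and since the thresholds are not given, a single rule is swapped in: a frame is flagged as speech when the smoothed value deviates from the noise reference by more than 5% of it. Absolute deviation is used so the sketch does not commit to whether speech raises or lowers the entropy for a given noise type.

```python
import numpy as np

def detect_endpoints(frame_entropy, frame_shift_s, window=20, median_order=5,
                     noise_frames=10, rel_margin=0.05, min_dur_s=0.1):
    """Hypothetical entropy-based endpoint detector.

    Returns a list of (start, end) frame indices, end exclusive.
    """
    h = np.asarray(frame_entropy, dtype=float)

    # 1. Sum the spectral entropy over a sliding window of `window` frames
    #    (centered; edges padded by repeating the first/last value).
    padded = np.pad(h, (window // 2, window - 1 - window // 2), mode="edge")
    summed = np.convolve(padded, np.ones(window), mode="valid")

    # 2. Smooth with a median filter of odd order `median_order`.
    half = median_order // 2
    padded = np.pad(summed, half, mode="edge")
    smoothed = np.array([np.median(padded[i:i + median_order])
                         for i in range(len(summed))])

    # 3. The leading `noise_frames` frames (assumed to be background noise)
    #    give the reference; flag frames deviating by > rel_margin * |ref|
    #    (an assumed decision rule, not taken from the slides).
    ref = smoothed[:noise_frames].mean()
    is_speech = np.abs(smoothed - ref) > rel_margin * abs(ref)

    # 4. Collect runs of flagged frames; drop runs shorter than min_dur_s.
    min_frames = int(round(min_dur_s / frame_shift_s))
    segments, start = [], None
    for t, flag in enumerate(np.append(is_speech, False)):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_frames:
                segments.append((start, t))
            start = None
    return segments

# Synthetic entropy track: noise (5.0), a 50-frame speech-like dip (3.5), noise
h = np.array([5.0] * 40 + [3.5] * 50 + [5.0] * 40)
print(detect_endpoints(h, frame_shift_s=0.01))  # → [(34, 97)]
```

The detected span (frames 34 to 96) is wider than the true dip (frames 40 to 89) because the 20-frame summation spreads the change over neighboring frames; a real system would need the threshold and boundary-refinement details that the slides leave open.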


Experiment Settings

Speech database: isolated Mandarin Chinese digits produced by 100 speakers (10 speakers for testing, the other 90 for training)

Speech features: 12th-order MFCCs and 12th-order delta MFCCs

Models: continuous-density HMMs with 6 states per digit and 3 mixtures per state


Experiment Settings

Noise (from the NOISEX-92 database): white noise, pink noise, Volvo noise (car noise), F16 noise, machine-gun noise

Sound artifacts: breath noise, cough noise, and mouth click noise


Example


Experimental Results



Some Things Are Not Clear…

What is the sample rate? What is the bit resolution?

What are the frame size and overlap?

What is the order of the median filter?

How is the “short period of background noise” used?

What are the spectral entropy threshold values for determining boundaries?

What are the values of δ1 and δ2?