Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments
張智星 (Jyh-Shing Roger Jang)
http://www.cs.nthu.edu.tw/~jang

Reference
Jia-lin Shen, Jeih-weih Hung, Lin-shan Lee, "Robust entropy-based endpoint detection for speech recognition in noisy environments", International Conference on Spoken Language Processing, Sydney, 1998.

Summary
Entropy-based algorithm for accurate and robust endpoint detection for speech recognition in noisy environments
Better than energy-based algorithms in both detection accuracy and recognition performance
Error reduction: 16%

Motivation
Energy-based endpoint detection becomes less reliable when dealing with non-stationary noise and sound artifacts such as lip smacks, heavy breathing and mouth clicks.
Spectral entropy is effective in distinguishing speech segments from non-speech parts.
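
Spectral Entropy
PDF:
$$p_i = \frac{s(f_i)}{\sum_{k=1}^{N} s(f_k)}, \quad i = 1, \ldots, N$$
where $s(f_i)$ is the spectral energy at frequency component $f_i$ and $N$ is the number of frequency components.
Normalization:
$$s(f_i) = 0 \quad \text{if } f_i < 250\ \text{Hz or } f_i > 6000\ \text{Hz}$$
$$p_i = 0 \quad \text{if } p_i < \delta_1 \text{ or } p_i > \delta_2$$
Spectral entropy:
$$H = -\sum_{k=1}^{N} p_k \log p_k$$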
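
The per-frame computation on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the argument layout, and the default values of `delta1` and `delta2` (chosen so that the constraint on p_i is disabled) are hypothetical, since the slides give no values for them. The sketch also does not renormalize after zeroing p_i, because the slides do not say whether that is done.

```python
import math

def spectral_entropy(freqs_hz, power, f_lo=250.0, f_hi=6000.0,
                     delta1=0.0, delta2=1.0):
    """Spectral entropy of one frame (natural log).

    freqs_hz, power: equal-length sequences giving the frequency of each
    bin and its spectral energy s(f_i).
    delta1, delta2: hypothetical defaults that leave every p_i untouched.
    """
    # Band limit: s(f_i) = 0 outside [f_lo, f_hi].
    s = [x if f_lo <= f <= f_hi else 0.0 for f, x in zip(freqs_hz, power)]
    total = sum(s)
    if total <= 0.0:
        return 0.0  # no in-band energy, so define H as 0
    # Normalize the band-limited spectrum into a PDF.
    p = [x / total for x in s]
    # Constraint: p_i = 0 if p_i < delta1 or p_i > delta2.
    p = [0.0 if (q < delta1 or q > delta2) else q for q in p]
    # H = -sum p_k log p_k, using the convention 0 log 0 = 0.
    return -sum(q * math.log(q) for q in p if q > 0.0)

# Six bins, four of them inside 250-6000 Hz.
freqs = [100.0, 1000.0, 2000.0, 3000.0, 4000.0, 7000.0]
flat = spectral_entropy(freqs, [1.0] * 6)            # equals log(4)
peaky = spectral_entropy(freqs, [0, 1, 0, 0, 0, 0])  # single peak, H = 0
```

A flat in-band spectrum gives the maximum value log(4) for four bins, while a single spectral peak gives zero.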

Properties of Entropy
[Figures: entropy plots for N=2 and N=3, generated by entropyPlot.m]
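
Since the plots themselves are not available here, the following Python sketch evaluates the N=2 case numerically. It is not a port of entropyPlot.m, whose contents are unknown; the grid resolution and variable names are arbitrary choices. It illustrates the standard property that entropy is largest for a uniform distribution and zero when all the probability sits on one component.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution p."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

# N = 2: sweep p1 from 0 to 1 with p2 = 1 - p1.
grid = [i / 100 for i in range(101)]
curve = [entropy([p1, 1.0 - p1]) for p1 in grid]
best = grid[curve.index(max(curve))]  # p1 at which the entropy peaks

# N = 3: the uniform distribution and a degenerate one.
h_uniform3 = entropy([1 / 3, 1 / 3, 1 / 3])
h_degenerate3 = entropy([1.0, 0.0, 0.0])
```

For N=2 the peak is at p1 = 0.5 with value log(2); for N=3 the uniform distribution gives log(3).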

Entropy Weighting
A set of weighting factors can be applied:
$$H = -\sum_{k=1}^{N} w_k\, p_k \log p_k$$
These weighting factors are statistically estimated from a large collection of speech signals.
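
A minimal Python sketch of the weighted form, for illustration only. The weights used in the example are made up; the slides state that the real ones are estimated from a large collection of speech signals but do not list them.

```python
import math

def weighted_entropy(p, w):
    """Weighted spectral entropy: H = -sum_k w_k p_k log p_k (natural log)."""
    if len(p) != len(w):
        raise ValueError("p and w must have the same length")
    return -sum(wk * pk * math.log(pk) for pk, wk in zip(p, w) if pk > 0.0)

p = [0.5, 0.5]
h_plain = weighted_entropy(p, [1.0, 1.0])  # all weights 1: plain entropy
h_half = weighted_entropy(p, [1.0, 0.0])   # hypothetical weights dropping bin 2
```

With all weights equal to 1 the result reduces to the unweighted spectral entropy.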

Endpoint Detection
The sum of the spectral entropy values over a window of 20 frames is first evaluated and then smoothed by a median filter.
Thresholds are used to detect the beginning and ending boundaries of the embedded speech segments.
A short period of background noise is first taken as the reference for the initial boundary detection.
Short speech segments (<100 ms) are rejected.
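
The steps above can be sketched in Python as below. Most of the parameters are assumptions, not values from the slides or the paper: the 10 ms frame shift, the median filter order of 5, the 10 leading frames used as the noise reference, and the `margin` threshold are all hypothetical. The decision rule (flag a frame when its smoothed value departs from the noise reference by more than `margin`, in either direction) is also a stand-in, because the slides do not state how the thresholds are applied. Only the 20-frame sum, the median smoothing, the noise reference, and the 100 ms rejection come from the slide.

```python
from statistics import median

def detect_endpoints(entropy, frame_ms=10.0, win=20, med_order=5,
                     noise_frames=10, margin=0.5, min_ms=100.0):
    """Return (start_frame, end_frame) pairs, end exclusive."""
    n = len(entropy)
    if n == 0:
        return []
    # 1. Sum the entropy over a window of `win` frames around each frame.
    #    Truncated windows at the edges are rescaled to a full-window sum.
    half = win // 2
    summed = []
    for i in range(n):
        seg = entropy[max(0, i - half):i - half + win]
        summed.append(sum(seg) * win / len(seg))
    # 2. Smooth with a median filter of order `med_order` (assumed value).
    k = med_order // 2
    smooth = [median(summed[max(0, i - k):i + k + 1]) for i in range(n)]
    # 3. Noise reference from the leading frames (assumed to be noise only).
    lead = smooth[:noise_frames]
    ref = sum(lead) / len(lead)
    # 4. Assumed rule: a frame is active if it deviates from the reference.
    active = [abs(v - ref) > margin for v in smooth]
    # 5. Collect runs of active frames; reject runs shorter than min_ms.
    segments, start = [], None
    for i, a in enumerate(active + [False]):
        if a and start is None:
            start = i
        elif not a and start is not None:
            if (i - start) * frame_ms >= min_ms:
                segments.append((start, i))
            start = None
    return segments

# Synthetic track: 50 noise frames, 40 "speech" frames, 50 noise frames.
track = [1.0] * 50 + [3.0] * 40 + [1.0] * 50
found = detect_endpoints(track)
```

Note that the 20-frame sum spreads each boundary over neighbouring frames, so the detected segment is somewhat wider than the region where the per-frame entropy actually changes.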

Experiment Settings
Speech database
  Isolated digits in Mandarin Chinese produced by 100 speakers (10 speakers for testing, the others for training)
Speech features: 12th-order MFCC and 12th-order delta MFCC
Models
  Continuous-density HMM, 6 states per digit, 3 mixtures per state

Experiment Settings
Noise
  NOISEX-92 noise-in-speech database: white noise, pink noise, Volvo noise (car noise), F16 noise, machine gun noise
  Sound artifacts: breath noise, cough noise and mouse click noise

Example
[Figure not preserved]

Experimental Results
[Figures not preserved]

Something Not Clear…
What is the sample rate? Bit resolution?
What is the frame size and overlap?
What is the order of the median filter?
How is the "short period of background noise" used?
What are the threshold values of spectral entropy for determining boundaries?
What are the values of $\delta_1$ and $\delta_2$?