HIWIRE MEETING
CRETE, SEPTEMBER 23-24, 2004
JOSÉ C. SEGURA LUNA
GSTC UGR
HIWIRE Meeting – Crete, 23-24 September, 2004
Schedule
VAD for noise suppression & frame-dropping
  Long-term spectral divergence
  Subband OS-based detector
Non-linear feature normalization
  Histogram equalization
  OS-based equalization
  Segmental implementation
VAD (1)
VAD: motivation
  To estimate the background noise for
    Wiener filter design
    Spectral subtraction
  To discard non-speech frames
[Figure: block diagram with the blocks NOISY SPEECH, WIENER FILTER / SS, FRAME DROPPING, RECOGNIZER, VAD and NOISE ESTIMATION]
VAD (2)
Our approach
Use of rather long time spans (~100 ms) instead of instantaneous measures
  Increases discrimination
Use of a statistical model in the log-FBE domain
  Smoother estimates
Use of a feedback decision coupled with noise suppression
  VAD works on less noisy speech
Use of order statistics
  More robust estimation
Long-Term Spectral Divergence (1)
J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, Efficient voice activity detection algorithms using long-term speech information, Speech Communication 42 (2004) 271–287
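Long-term spectral envelope (order N) and long-term spectral divergence:
$$\mathrm{LTSE}_N(k,t) = \max_{-N \le n \le N} X(k,t+n)$$
$$\mathrm{LTSD}_N(t) = 10 \log_{10}\left( \frac{1}{\mathrm{NFFT}} \sum_{k=0}^{\mathrm{NFFT}-1} \frac{\mathrm{LTSE}_N^2(k,t)}{N^2(k,t)} \right)$$
Noise spectrum update:
$$N(k,t) = \begin{cases} \alpha\, N(k,t-1) + (1-\alpha)\, N_K(k,t) & \text{non-speech} \\ N(k,t-1) & \text{speech} \end{cases}$$
$$N_K(k,t) = \frac{1}{2K+1} \sum_{n=-K}^{K} X(k,t+n)$$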
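The long-term spectral divergence decision can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `spectra` is assumed to be a list of magnitude-spectrum frames, `noise` a per-bin noise magnitude estimate, and the function names, the default order, the threshold value and the truncation of the window at the signal edges are all assumptions made for the example.

```python
import math

def ltse(spectra, t, order):
    """Long-term spectral envelope: per-bin maximum of X(k, t+n), n in [-order, order].
    The window is truncated at the signal edges (an assumption of this sketch)."""
    lo = max(0, t - order)
    hi = min(len(spectra) - 1, t + order)
    n_bins = len(spectra[0])
    return [max(spectra[j][k] for j in range(lo, hi + 1)) for k in range(n_bins)]

def ltsd(spectra, noise, t, order):
    """LTSD in dB: 10*log10 of the bin-averaged ratio LTSE^2 / N^2."""
    env = ltse(spectra, t, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(env, noise)) / len(env)
    return 10.0 * math.log10(ratio)

def is_speech(spectra, noise, t, order=6, threshold_db=6.0):
    """Speech/non-speech decision; order and threshold are illustrative values."""
    return ltsd(spectra, noise, t, order) > threshold_db

def update_noise(noise, spectra, t, k_frames, alpha):
    """Noise update for non-speech frames: smooth the previous estimate with the
    average spectrum over a neighbourhood of +/- k_frames frames."""
    lo = max(0, t - k_frames)
    hi = min(len(spectra) - 1, t + k_frames)
    avg = [sum(spectra[j][k] for j in range(lo, hi + 1)) / (hi - lo + 1)
           for k in range(len(noise))]
    return [alpha * n + (1.0 - alpha) * a for n, a in zip(noise, avg)]

# Toy data: 20 frames of unit-magnitude "noise" with one loud frame at t = 10.
spectra = [[1.0] * 4 for _ in range(20)]
spectra[10] = [10.0] * 4
noise = [1.0] * 4
```

Because the envelope takes a maximum over neighbouring frames, a frame next to a loud one (for example `t = 9` in the toy data) is also flagged as speech, which is the hang-over behaviour that long time spans are meant to provide.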
Long-Term Spectral Divergence (2)
Long-Term Spectral Divergence (3)
Long-Term Spectral Divergence (4)
Long-Term Spectral Divergence (5)
Long-Term Spectral Divergence (7)
Recognition experiments with AURORA 2 and 3
Long-Term Spectral Divergence (6)
Subband OSF VAD (1)
J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre, and A.J. Rubio, An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)
Decision is based on the averaged QSNR, defined as an inter-quantile difference
Feedback structure
  VAD operates on the noise-reduced signal
Subband OSF VAD (2)
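$E(k,t)$: log-energy at time $t$ in band $k$
Temporal buffer:
$$\{E(k,t-N), \ldots, E(k,t), \ldots, E(k,t+N)\}$$
Order statistics:
$$E_{(1)}(k,t) \le \cdots \le E_{(r)}(k,t) \le \cdots \le E_{(2N+1)}(k,t)$$
Quantile:
$$Q_p(k,t) = (1-f)\, E_{(s)}(k,t) + f\, E_{(s+1)}(k,t), \qquad s = \lfloor 2pN \rfloor, \quad f = 2pN - s$$
Noise update for non-speech:
$$E_N(k,t) = \alpha\, E_N(k,t-1) + (1-\alpha)\, Q_{0.5}(k,t)$$
Quantile SNR:
$$\mathrm{QSNR}(k,t) = Q_p(k,t) - E_N(k,t), \qquad p = 0.9$$
Decision:
$$\mathrm{SNR}(t) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{QSNR}(k,t), \qquad \mathrm{SNR}(t) > \eta \Rightarrow \text{speech}$$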
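The quantile-based SNR can be sketched as follows. This is a hedged illustration rather than the published algorithm: each band is assumed to supply a buffer of 2N+1 log-energies, the order statistics are indexed from zero so that p = 0.5 returns the median, and the function names and the smoothing factor `alpha` are assumptions of the example.

```python
import math

def quantile(window, p):
    """Interpolated quantile Q_p from the order statistics of 2N+1 log-energies.
    Uses 0-based indexing (an assumption) so that p = 0.5 yields the median."""
    ordered = sorted(window)                  # order statistics of the buffer
    n = (len(ordered) - 1) // 2               # N, half the buffer length
    pos = 2.0 * p * n
    s = min(int(math.floor(pos)), len(ordered) - 1)
    f = pos - s                               # fractional part for interpolation
    upper = ordered[min(s + 1, len(ordered) - 1)]
    return (1.0 - f) * ordered[s] + f * upper

def osf_snr(windows, noise, p=0.9):
    """Average over bands of QSNR(k) = Q_p(k) - E_N(k)."""
    qsnr = [quantile(w, p) - e_n for w, e_n in zip(windows, noise)]
    return sum(qsnr) / len(qsnr)

def update_noise(windows, noise, alpha=0.95):
    """Non-speech update: smooth each band's noise level towards the buffer median.
    The value of alpha is illustrative only."""
    return [alpha * e_n + (1.0 - alpha) * quantile(w, 0.5)
            for w, e_n in zip(windows, noise)]

# Toy example: two bands sharing the same 11-sample buffer (N = 5).
window = [float(v) for v in range(11)]
snr = osf_snr([window, window], noise=[1.0, 3.0], p=0.9)
```

A frame would then be labelled as speech when `snr` exceeds a decision threshold, which is left out here because its value is not given on the slide.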
Subband OSF VAD (3)
Subband OSF VAD (4)
Subband OSF VAD (5)
Accurate VAD
Open topics
New alternatives to improve the performance
  New decision criteria based on OS filters
  Already used for edge detection in images
Computational efficiency
  Development of computationally efficient algorithms
Feature normalization
Objective
  Transform features to remove undesired variability
Linear techniques
  CMS: cepstral mean subtraction
    Removes the effect of linear channel distortion
  CMVN: cepstral mean and variance normalization
    Extension of CMS to deal with the variance reduction caused by additive noise
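As a point of reference, the two linear techniques can be written down directly. The sketch below assumes per-utterance statistics and a feature matrix stored as a list of frames; the function names and the small `eps` guard against zero variance are assumptions of the example.

```python
import math

def cms(features):
    """Cepstral mean subtraction: remove the per-coefficient mean over the utterance."""
    n, dim = len(features), len(features[0])
    means = [sum(frame[d] for frame in features) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in features]

def cmvn(features, eps=1e-12):
    """Cepstral mean and variance normalization: zero mean, unit variance per coefficient."""
    centered = cms(features)
    n, dim = len(centered), len(centered[0])
    stds = [math.sqrt(sum(frame[d] ** 2 for frame in centered) / n) for d in range(dim)]
    return [[frame[d] / max(stds[d], eps) for d in range(dim)] for frame in centered]

# Two frames of two coefficients; the second coefficient has a larger offset and scale.
feats = [[1.0, 10.0], [3.0, 30.0]]
shifted = [[a + 5.0, b - 2.0] for a, b in feats]   # constant bias added per coefficient
```

With this toy input, `cms(feats)` and `cms(shifted)` coincide, which illustrates why a constant cepstral offset (a linear channel) is removed by CMS.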
Feature normalization
Non-linear feature distortion
  Environment effects are non-linear for MFCC features
  And can hardly be removed with linear techniques
  Because not only the location (mean) and scale (variance) of the feature distributions are affected, but also the shape (affecting higher order moments of the distribution)
Non-linear extensions
  CDF-matching approaches (HEQ and related)
  Have been proved to be more effective than linear ones
  Normalize more than just the first two moments of the probability distributions
CDF-matching based equalization
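The main idea
  Transform the features to match a given PDF
  In the one-dimensional case CDF-matching gives the solution
$$x \sim p_X(x), \qquad \hat{x} = T[x] \sim \phi(\hat{x})$$
$$C_X(x) = \int_{-\infty}^{x} p_X(u)\, du, \qquad \Phi(\hat{x}) = \int_{-\infty}^{\hat{x}} \phi(u)\, du$$
$$C_X(x) = \Phi(\hat{x}) \;\Rightarrow\; \hat{x} = T[x] = \Phi^{-1}(C_X(x))$$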
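A one-dimensional CDF-matching transform can be sketched with the Python standard library. The example assumes a standard Gaussian reference and estimates the CDF from sample ranks rather than from a histogram, so it is a simplified stand-in and not the estimator used in the cited work; `heq_gaussian` is a hypothetical name.

```python
from bisect import bisect_right
from statistics import NormalDist

def heq_gaussian(values):
    """Map a 1-D feature sequence onto a standard Gaussian reference by CDF matching.
    The empirical CDF is (rank - 0.5) / n, which keeps it strictly inside (0, 1)."""
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist()                 # assumed reference: N(0, 1)
    equalized = []
    for x in values:
        rank = bisect_right(ordered, x)      # number of samples <= x
        cdf = (rank - 0.5) / n               # estimate of C_X(x)
        equalized.append(reference.inv_cdf(cdf))   # inverse reference CDF
    return equalized

# Three samples: the median maps to 0 and the extremes map symmetrically around it.
equalized = heq_gaussian([3.0, 1.0, 2.0])
```

The transform is monotonic, so the ordering of the input values is preserved while their distribution is reshaped towards the reference.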
Equalization and robust classifiers
$$y = \log\left(\exp(x + h) + \exp(n)\right), \qquad h = 0.8, \quad n = 3.5$$
Invariance
CMS is invariant to additive bias
CMVN is invariant to linear transformations
Equalization to a reference distribution is invariant to any invertible transformation (including non-linear ones)
$y = G(x)$: a general transformation
$$\hat{x} = T_X(x) = \Phi^{-1}(C_X(x))$$
$$\hat{y} = T_Y(y) = \Phi^{-1}(C_Y(y))$$
If $G(\cdot)$ is invertible (strictly increasing), then
$$C_Y(G(x)) = C_X(x)$$
and therefore
$$\hat{y} = \Phi^{-1}(C_Y(G(x))) = \Phi^{-1}(C_X(x)) = \hat{x}$$
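The invariance property can be checked numerically. In the sketch below, `distort` is a hypothetical strictly increasing non-linear distortion chosen only for illustration, and `equalize` is a rank-based CDF-matching transform to an assumed standard Gaussian reference; since a strictly increasing map preserves ranks, the equalized outputs coincide.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

def equalize(values):
    """Rank-based CDF matching to a standard Gaussian reference (assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist()
    return [reference.inv_cdf((bisect_right(ordered, v) - 0.5) / n) for v in values]

def distort(x):
    """Hypothetical strictly increasing, non-linear distortion G(x)."""
    return math.exp(0.5 * x) + 2.0

clean = [-1.2, 0.3, 2.5, 0.9, -0.4]
distorted = [distort(x) for x in clean]

# The ranks are unchanged by G, so both sequences equalize to the same values.
invariant = equalize(clean) == equalize(distorted)
```

In practice the equality is only approximate, because the CDFs are estimated from a finite amount of data and the distortion introduced by additive noise is not strictly invertible.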
HEQ for robust speech recognition (1)
A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Pérez, C. Benítez and A.J. Rubio, Histogram equalization of speech representation for robust speech recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)
Transformation of each component of the MFCC vector to a Gaussian reference
Cumulative distributions are estimated using histograms
Performance compared with CMS, CMVN and model-based feature compensation (VTS)
Combination with VTS
HEQ for robust speech recognition (2)
HEQ for robust speech recognition (3)
HEQ for robust speech recognition (4)
HEQ for robust speech recognition (5)
Segmental HEQ (1)
J.C. Segura, C. Benítez, A. de la Torre, A.J. Rubio and J. Ramírez, Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition, IEEE Signal Processing Letters, 11(5), May 2004
A segmental implementation of HEQ for non-stationary noise
A temporal buffer is used for the histogram estimation instead of the full sentence
The algorithmic delay is T frames
$$X_t = \{x_{t-T}, \ldots, x_t, \ldots, x_{t+T}\}$$
Segmental HEQ (2)
OSEQ: An efficient implementation (1)
A very computationally efficient algorithm based on Order Statistics
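Temporal buffer:
$$X_t = \{x_{t-T}, \ldots, x_t, \ldots, x_{t+T}\}$$
Order statistics:
$$x_{(1)} \le \cdots \le x_{(r)} \le \cdots \le x_{(2T+1)}$$
CDF estimation:
$$\hat{C}_X(x_{(r)}) = \frac{r - 0.5}{2T+1}$$
Lookup table:
$$G[r] = \Phi^{-1}\left(\frac{r - 0.5}{2T+1}\right), \qquad r = 1, \ldots, 2T+1$$
Transformation (with $r_t$ the rank of $x_t$ in the buffer):
$$\hat{x}_t = \Phi^{-1}(\hat{C}_X(x_t)) = \Phi^{-1}\left(\frac{r_t - 0.5}{2T+1}\right) = G[r_t]$$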
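The order-statistics equalizer reduces to a rank computation followed by a table lookup, which the following sketch illustrates for one feature component. It assumes a standard Gaussian reference, emits output only for frames that have a full buffer of 2T+1 samples, and resolves ties by counting samples less than or equal to the current one; these choices and the function names are assumptions of the example.

```python
from statistics import NormalDist

def build_table(half_window):
    """Lookup table G[r] = inverse reference CDF of (r - 0.5) / (2T + 1), r = 1..2T+1.
    Index 0 is unused so that ranks can be used directly as indices."""
    size = 2 * half_window + 1
    reference = NormalDist()                 # assumed reference: N(0, 1)
    return [None] + [reference.inv_cdf((r - 0.5) / size) for r in range(1, size + 1)]

def oseq(features, half_window):
    """Equalize a 1-D feature sequence using the rank of x_t within its temporal buffer."""
    table = build_table(half_window)
    equalized = []
    for t in range(half_window, len(features) - half_window):
        window = features[t - half_window : t + half_window + 1]
        rank = sum(1 for v in window if v <= features[t])   # rank of x_t in the buffer
        equalized.append(table[rank])
    return equalized

# A ramp: every centre sample is the median of its 3-sample buffer, so it maps to 0.
ramp_out = oseq([0.0, 1.0, 2.0, 3.0, 4.0], half_window=1)
# A local minimum gets rank 1 and maps to a negative value.
dip_out = oseq([5.0, 1.0, 3.0], half_window=1)
```

The table is computed once per window length, so the per-frame cost is dominated by the rank computation over 2T+1 samples, and the output for frame t can only be produced once frame t+T is available, which corresponds to the algorithmic delay of T frames.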
OSEQ: An efficient implementation (2)
Feature normalization
Open topics
  Reference distribution
    Clean speech / Gaussian / Others?
  Dynamic features normalization (Δ and ΔΔ)
    After, before or simultaneously [Obuchi, Stern, EUSP’03]
  Progressive normalization
    Not all MFCC are equally affected and do not have equal discriminative power [de Wet, …, ICASSP’03]
    Lower order moments normalization [Hsu, Lee, ICASSP’04]
  Parametric techniques
    Current approaches are non-parametric [Haverinen, Kiss, EUSP’03]
  New applications
    Speaker independence and adaptation
    Multi-stream normalization
Combination of techniques
Development of a combined robust front-end
  An accurate VAD
    For noise parameter estimation
  A noise reduction technique
    Spectral subtraction or Wiener filter
    Statistical feature compensation
  A frame-dropping algorithm
    To discard non-speech frames
  And a feature normalization block
    For residual non-linear distortion compensation
VAD (1)
Development of a combined robust front-end
[Figure: block diagram with the blocks NOISY SPEECH, WIENER FILTER / SS, FRAME DROPPING, FEATURE EQUALIZATION, RECOGNIZER, VAD and NOISE ESTIMATION]