HIWIRE MEETING
Torino, March 9-10, 2006
José C. Segura, Javier Ramírez
2 HIWIRE Meeting – Torino, 9 -10 March, 2006
Schedule
HIWIRE database evaluations
    New results: HEQ and PEQ
Non-linear feature normalization
    Using temporal redundancy
    HEQ integration in Loquendo platform
    Recursive estimation of the equalization function
New improvements in robust VAD
    Bispectrum-based VAD
    SVM-enabled VAD
HIWIRE database evaluations
Results without adaptation (50 test sentences)
MODELS             French  Greek  Italian  Spanish  World  Avgd
WSJ16k             13,06   23,70  20,41    17,30    12,55  17,60
WSJ16kfon          10,43   19,24  16,52    15,33    8,01   14,12
TIMIT (Loria)      7,30    9,96   11,87    9,27     5,77   8,99
TIMIT (retrained)  8,39    9,76   12,83    8,98     7,72   9,70
TIMIT HEQ          12,76   21,78  16,17    15,33    10,39  15,65
TIMIT PEQ          12,66   17,69  14,39    14,23    9,16   14,01

Results with MLLR adaptation (50 adapt / 50 test sentences)
MODELS             French  Greek  Italian  Spanish  World  Avgd
WSJ16k             3,85    4,50   5,94     4,53     3,90   4,55
WSJ16kfon          3,50    3,13   7,00     5,55     3,75   4,45
TIMIT (Loria)      3,13    2,71   3,80     2,99     2,81   3,13
TIMIT (retrained)  3,23    3,26   4,12     3,21     2,96   3,41
TIMIT HEQ          5,09    5,57   5,90     6,13     5,56   5,55
TIMIT PEQ          5,89    4,91   5,90     6,64     5,70   5,72
Temporal redundancy in HEQ
Enhance the normalization by adding a linear transformation that restores temporal correlations
Each equalized cepstral coefficient is time-filtered with an ARMA filter that restores the autocorrelation of clean data
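The time-filtering step can be pictured with a generic ARMA difference equation applied to the trajectory of one equalized cepstral coefficient. This is only a minimal sketch: the slide does not give the filter order or coefficients, so the ARMA(1,1) values and the name `arma_filter` below are hypothetical, and the design step that would match the autocorrelation of clean data is not shown.

```python
def arma_filter(x, b, a):
    """Apply y[n] = sum_i b[i]*x[n-i] - sum_{j>=1} a[j]*y[n-j], with a[0] == 1."""
    y = []
    for n in range(len(x)):
        # Moving-average (MA) part over current and past inputs
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        # Autoregressive (AR) part over past outputs
        acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(acc)
    return y


# Hypothetical ARMA(1,1) coefficients, for illustration only
b = [0.5, 0.5]
a = [1.0, -0.5]

# Trajectory of one equalized cepstral coefficient over four frames (toy data)
trajectory = [1.0, 0.0, 0.0, 0.0]
smoothed = arma_filter(trajectory, b, a)
print(smoothed)
```

Each cepstral coefficient would be filtered independently along the time axis in this picture.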
AURORA4
System                  1      2      3      4      5      6      7      8      9      10     11     12     13     14     Avgd
HIWIRE (baseline)       13,22  24,68  46,00  47,62  52,67  44,80  54,73  22,58  36,21  55,40  58,31  65,34  54,11  62,28  45,57
ECDF (clean ref)        11,82  22,62  37,75  38,90  36,91  37,46  40,92  21,29  32,67  45,93  49,28  50,61  44,60  49,65  37,17
ECDF (clean ref) + TES  11,42  21,40  35,25  37,53  34,59  36,17  38,56  20,15  28,80  43,87  47,66  49,83  44,75  46,77  35,48

AURORA2 (clean test)
System                  Test A  Test B  Test C  Avgd
HIWIRE (baseline)       36,00   30,90   35,27   33,81
ECDF (clean ref)        17,06   17,30   18,97   17,54
ECDF (clean ref) + TES  16,24   14,21   16,35   15,45
HEQ integration in Loquendo platform
SEGMENTAL
SYSTEM             SAC    WC     WI    WD     WS
Baseline (LASR)    45,70  75,10  0,60  16,60  7,60
denoise-MeanDev    46,60  77,50  4,80  7,20   10,40
denoise-HEQ121     38,20  69,60  4,30  12,60  13,50
denoise-HEQ1001    46,50  77,70  4,00  7,30   11,00

                        WAC                   SAC
SYSTEM                  HM     MM     WM      HM     MM     WM
HIWIRE                  48,61  85,3   94,49   42,49  61,98  80,9
HEQ_GAUS                73,36  80,1   82,72   41,37  46,17  48,97
HEQ(Q31)                74,88  75,67  85,86   41,21  34,81  56,23
HEQ(Q31) IIR SORT 0.8   81,34  88,65  95,46   53,67  64,69  83,12

Slide annotations: Actually implemented; HIGH MISMATCH; SENTENCE-BY-SENTENCE; RECURSIVE; New proposal
HEQ integration (recursive estimation) (1)
Actual approach: Gaussian HEQ using ECDF
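$$Y = \{y_1, y_2, \ldots, y_T\}$$
$$y_{(1)} \le y_{(2)} \le \cdots \le y_{(T)}$$
ECDF:
$$\hat{C}_Y(y_{(r)}) = \frac{r - 0.5}{T}, \quad r = 1, \ldots, T$$
$$\hat{x}_t = C_X^{-1}\big(\hat{C}_Y(y_t)\big) = C_X^{-1}\!\left[\frac{r(t) - 0.5}{T}\right], \quad t = 1, \ldots, T$$
where $r(t)$ is the rank of $y_t$ in the sorted utterance.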
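The ECDF-based Gaussian equalization can be sketched in a few lines, assuming the reference distribution is a standard normal (zero mean, unit variance), which the slide does not state explicitly. Each value is replaced by the reference quantile at its order-statistic ECDF level (rank - 0.5) / T. The function name `gaussian_heq` is hypothetical, and ties are ranked in an arbitrary order.

```python
from statistics import NormalDist


def gaussian_heq(y):
    """Equalize one feature trajectory y to a standard normal reference."""
    T = len(y)
    # Frame indices sorted by ascending feature value (order statistics)
    order = sorted(range(T), key=lambda t: y[t])
    reference = NormalDist(0.0, 1.0)  # assumed reference distribution
    x_hat = [0.0] * T
    for rank, t in enumerate(order, start=1):
        # ECDF level of the r-th order statistic: (r - 0.5) / T
        x_hat[t] = reference.inv_cdf((rank - 0.5) / T)
    return x_hat


x_hat = gaussian_heq([3.0, -1.0, 10.0])
print(x_hat)
```

Because only ranks are used, the output depends on the ordering of the frames within the utterance and not on the scale of the input values.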
Using quantiles
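$$X = \{x_1, x_2, \ldots, x_T\} \xrightarrow{\text{SORT}} \{x_{(1)} \le x_{(2)} \le \cdots \le x_{(T)}\}$$
$$Q^X = \{Q_1^X, \ldots, Q_k^X, \ldots, Q_K^X\}$$
$$Q_k^X = (1 - f)\, x_{(r)} + f\, x_{(r+1)}$$
[The expressions defining the CDF levels $C_1, \ldots, C_k, \ldots, C_K$, the index $r$ (obtained with a floor operation) and the fraction $f$ are not recoverable from the extracted text; the surviving fragments include the constant 0.5 and the labels "min" and "max".]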
HEQ integration (recursive estimation) (2)
Equalization by linear interpolation
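Averaged over training data:
$$Q^X = \{Q_1^X, \ldots, Q_k^X, \ldots, Q_K^X\}$$
From actual utterance:
$$Q^Y = \{Q_1^Y, \ldots, Q_k^Y, \ldots, Q_K^Y\}$$
Mapping corresponding quantiles:
$$C_X(Q_k^X) = C_k = C_Y(Q_k^Y)$$
[Figure: piecewise-linear mapping from $y$ to $\hat{x}$ through the points $(Q_k^Y, Q_k^X)$.]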
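The quantile mapping can be sketched as follows. The exact CDF levels used on the slides are not recoverable, so this sketch assumes K evenly spaced levels from 0 to 1 (the first and last quantiles are then the minimum and maximum of the data); `quantiles` and `equalize` are hypothetical names.

```python
def quantiles(values, K):
    """K quantiles at evenly spaced CDF levels (assumed), interpolating order statistics."""
    s = sorted(values)
    T = len(s)
    out = []
    for k in range(K):
        pos = k * (T - 1) / (K - 1)     # fractional position in the sorted data
        r = int(pos)                    # floor: index of the lower order statistic
        f = pos - r                     # fractional weight of the upper one
        upper = s[min(r + 1, T - 1)]
        out.append((1 - f) * s[r] + f * upper)
    return out


def equalize(y, q_y, q_x):
    """Piecewise-linear map sending each utterance quantile q_y[k] to q_x[k]."""
    if y <= q_y[0]:
        return q_x[0]
    if y >= q_y[-1]:
        return q_x[-1]
    for k in range(len(q_y) - 1):
        lo, hi = q_y[k], q_y[k + 1]
        if lo <= y <= hi:
            if hi == lo:
                return q_x[k]
            w = (y - lo) / (hi - lo)
            return (1 - w) * q_x[k] + w * q_x[k + 1]
    return q_x[-1]


utterance = [0.0, 1.0, 2.0, 3.0, 4.0]
q_y = quantiles(utterance, 3)           # utterance quantiles
q_x = [-1.0, 0.0, 1.0]                  # toy reference quantiles
equalized = [equalize(v, q_y, q_x) for v in utterance]
print(q_y, equalized)
```

Values outside the range of the utterance quantiles are clamped to the end reference quantiles here; that clamping is a choice of this sketch, not something the slides specify.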
HEQ integration (recursive estimation) (3)
$$Q = Q_R \quad \text{(initially)}$$
$$Q = \alpha\, Q + (1 - \alpha)\, q \quad \text{(each utterance)}$$
$Q_R$: Reference quantiles
$Q$: Estimated quantiles
$q$: Actual utterance quantiles
Equalization: Linear interpolation between quantiles (31 quantiles)
Independent for each feature
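A minimal sketch of the recursive quantile estimate, assuming the update is a first-order recursion with forgetting factor alpha (the "Alpha" column of the results tables). With alpha = 0 the estimate equals the current utterance quantiles, and with alpha = 1 it stays at the reference quantiles. The function name `update_quantiles` and the toy values are hypothetical.

```python
def update_quantiles(Q, q, alpha):
    """One recursion step: Q <- alpha * Q + (1 - alpha) * q, quantile by quantile."""
    return [alpha * Qk + (1.0 - alpha) * qk for Qk, qk in zip(Q, q)]


Q_ref = [0.0, 10.0]                       # reference quantiles (toy values)
utterance_quantiles = [[10.0, 20.0], [10.0, 20.0]]

Q = list(Q_ref)                           # initially, the estimate is the reference
history = []
for q in utterance_quantiles:             # one update per utterance
    Q = update_quantiles(Q, q, alpha=0.8)
    history.append(Q)
print(history)
```

In the results tables the alpha = 0,00 row of the BEFORE variant matches the sentence-by-sentence HEQ(Q31) row (74,88 / 75,67 / 85,86), which is consistent with this reading of the recursion.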
HEQ(31) IIR SORT (BEFORE)
Alpha  HM     MM     WM
0,00   74,88  75,67  85,86
0,20   77,45  85,34  87,67
0,40   76,98  86,10  88,96
0,60   79,92  88,45  91,63
0,80   81,34  88,65  95,46
0,85   80,97  89,77  95,66
0,90   79,16  89,13  95,87
0,95   76,51  88,89  95,02
1,00   46,75  87,14  94,40

[Figure: AURORA3 Italian results (before). WAC (%) versus Alpha for HM, MM and WM.]
HEQ integration (recursive estimation) (4)
HEQ(31) IIR SORT (AFTER)
Alpha  HM     MM     WM
0,00   77,51  79,15  90,06
0,20   77,93  81,42  90,78
0,40   79,21  85,78  91,60
0,60   77,38  89,01  89,98
0,80   79,87  89,21  95,26
0,85   78,98  88,41  95,63
0,90   78,06  88,45  95,52
0,95   76,25  88,73  95,10
1,00   46,75  87,14  94,40

[Figure: AURORA3 Italian results (after). WAC (%) versus Alpha for HM, MM and WM.]
Utterances are equalized WITHOUT delay
Quantiles are updated AFTER the equalization
Bispectrum-based VAD (1)
Motivations:
    Ability of HOS methods to detect signals in noise
    Knowledge of the input processes (Gaussian)
Issues to be addressed:
    Computationally expensive
    Variance of bispectrum estimators much higher than that of power spectral estimators (identical data record size)
Solution: Integrated bispectrum
    J. K. Tugnait, “Detection of non-Gaussian signals using integrated polyspectrum,” IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137–3149, 1994.
Bispectrum-based VAD (2)
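Definitions: Let x(t) be a discrete-time signal
Bispectrum:
$$B_x(\omega_1, \omega_2) = \sum_{i} \sum_{k} C_{3x}(i,k)\, \exp\{-j(\omega_1 i + \omega_2 k)\}$$
Third order cumulants:
$$C_{3x}(i,k) = E\{x(t)\, x(t+i)\, x(t+k)\}$$
Inverse transform:
$$B_x(\omega_1, \omega_2) \;\leftrightarrow\; C_{3x}(i,k)$$
$$C_{3x}(i,k) = \frac{1}{(2\pi)^2} \int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi} B_x(\omega_1, \omega_2)\, \exp\{j(\omega_1 i + \omega_2 k)\}\, d\omega_1\, d\omega_2$$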
Bispectrum-based VAD (3)
[Figure panels: Noise only; Noise + speech]
Bispectrum-based VAD (4)
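Integrated bispectrum (IBI):
$$y(t) = x^2(t)$$
Cross-spectrum $S_{yx}(\omega)$:
$$S_{yx}(\omega) = \sum_{k} E\{y(t)\, x(t+k)\}\, \exp\{-j\omega k\} = \sum_{k} C_{3x}(0,k)\, \exp\{-j\omega k\}$$
$$C_{3x}(0,k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{yx}(\omega)\, \exp\{j\omega k\}\, d\omega = r_{yx}(k)$$
Bispectrum inverse transform, evaluated at $i = 0$:
$$C_{3x}(i,k) = \frac{1}{(2\pi)^2} \int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi} B_x(\omega_1, \omega_2)\, \exp\{j(\omega_1 i + \omega_2 k)\}\, d\omega_1\, d\omega_2$$
Bispectrum / cross-spectrum relation:
$$S_{yx}(\omega) = \frac{1}{2\pi} \int_{-\pi}^{\pi} B_x(\omega, \omega_2)\, d\omega_2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} B_x(\omega_1, \omega)\, d\omega_1$$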
Bispectrum-based VAD (5)
Integrated bispectrum (IBI):
    Defined as a cross spectrum between the signal and its square, and therefore a function of a single frequency variable
Benefits:
    Less computational cost: computed as a cross spectrum
    Variance of the same order as that of the power spectrum estimator
Properties:
    For Gaussian processes, the bispectrum is zero
    The integrated bispectrum is zero as well
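Since the integrated bispectrum is a cross spectrum between the signal and its square, a raw single-record estimate needs only two DFTs. The sketch below is one possible estimator under stated assumptions: the mean is removed first, the correlation is circular (no zero-padding), and no averaging or windowing is applied. The names `dft` and `integrated_bispectrum` are hypothetical.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform, enough for a small illustration."""
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * m * t / N) for t in range(N))
            for m in range(N)]


def integrated_bispectrum(x):
    """Raw estimate of S_yx at the N DFT frequencies, with y(t) = x(t)^2."""
    N = len(x)
    mean = sum(x) / N
    xc = [v - mean for v in x]            # assumed: work on the zero-mean signal
    y = [v * v for v in xc]               # squared signal
    X, Y = dft(xc), dft(y)
    # conj(Y) * X / N is the DFT over lag k of the circular estimate of E{y(t) x(t+k)}
    return [Y[m].conjugate() * X[m] / N for m in range(N)]


x = [1.0, -2.0, 0.5, 3.0, -1.5, 0.0]
S_yx = integrated_bispectrum(x)
print([round(abs(v), 4) for v in S_yx])
```

The result is indexed by a single frequency variable, in contrast with the two-dimensional bispectrum.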
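Bispectrum-based VAD (6)
Two alternatives explored for formulating the decision rule:
Estimation by block averaging (BA):
$$L(\hat{y}_l) = \frac{p_{y_l|H_1}(\hat{y}_l \mid H_1)}{p_{y_l|H_0}(\hat{y}_l \mid H_0)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{P(H_0)}{P(H_1)}$$
MO-LRT: Given a set of N = 2m+1 consecutive observations:
$$L_{l,N}(\hat{y}_{l-m}, \ldots, \hat{y}_l, \ldots, \hat{y}_{l+m}) = \prod_{k=l-m}^{l+m} \frac{p_{y_k|H_1}(\hat{y}_k \mid H_1)}{p_{y_k|H_0}(\hat{y}_k \mid H_0)}$$
[Figure: analysis window of KB·NB samples, split into KB blocks of NB samples, around the l-th frame; labels: frame shift, VAD decision.]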
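The multiple-observation test combines the likelihood ratios of the 2m+1 frames around frame l. A minimal sketch in the log domain, where the product of ratios becomes a sum: the per-frame log-likelihood ratios are assumed to be computed elsewhere (toy values here), the window is truncated at the ends of the signal, and the names `mo_lrt`, `vad_decisions` and `log_threshold` are hypothetical.

```python
def mo_lrt(log_lr, m):
    """Sum per-frame log-likelihood ratios over the window l-m .. l+m."""
    n = len(log_lr)
    out = []
    for l in range(n):
        lo, hi = max(0, l - m), min(n, l + m + 1)   # truncate at the boundaries
        out.append(sum(log_lr[lo:hi]))
    return out


def vad_decisions(log_lr, m, log_threshold):
    """Decide speech (True) when the windowed log-ratio exceeds the threshold."""
    return [s > log_threshold for s in mo_lrt(log_lr, m)]


# Toy per-frame log-likelihood ratios: one strongly speech-like frame in noise
frame_log_lr = [-1.0, -1.0, 3.0, -1.0, -1.0]
windowed = mo_lrt(frame_log_lr, m=1)
decisions = vad_decisions(frame_log_lr, m=1, log_threshold=0.0)
print(windowed, decisions)
```

In this toy case the frames adjacent to the speech-like frame are also flagged, which illustrates how the neighbouring observations influence each decision.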
Bispectrum-based VAD (7)
LRT evaluation
    IBI Gaussian model
    Variances defined in terms of:
        Sss (clean speech power spectrum)
        Snn (noise power spectrum)
Bispectrum-based VAD (8)
Denoising:
[Block diagram. Labels: $S_{xx}(\omega)$, $S_{nn}(\omega)$, smoothed spectral subtraction, $S_1(\omega)$, 1st WF design, 1st WF stage, $S_2(\omega)$, 2nd WF design, 2nd WF stage, $S_{ss}(\omega)$, 1-frame delay.]
Bispectrum VAD Analysis (1)
MO-LRT VAD
$$L_{l,N}(\hat{y}_{l-m}, \ldots, \hat{y}_l, \ldots, \hat{y}_{l+m}) = \prod_{k=l-m}^{l+m} \frac{p_{y_k|H_1}(\hat{y}_k \mid H_1)}{p_{y_k|H_0}(\hat{y}_k \mid H_0)}$$
Bispectrum-based VAD results (2)
[Figure: ROC curves, pause hit rate (HR0) versus false alarm rate (FAR0). Curves: G.729, AMR1, AMR2, AFE (Noise Est.), AFE (frame-dropping), Li, Marzinzik, Sohn, Woo, BA-IBI (KB = 1, NB = 256), BA-IBI (KB = 3, NB = 256), BA-IBI (KB = 5, NB = 256).]
Bispectrum-based VAD results (3)
[Figure: ROC curves, pause hit rate (HR0) versus false alarm rate (FAR0). Curves: G.729, AMR1, AMR2, AFE (Noise Est.), AFE (frame-dropping), Li, Marzinzik, Sohn, Woo, MO-LRT IBI (KB = 1, NB = 256, m = 2), MO-LRT IBI (KB = 1, NB = 256, m = 5), MO-LRT IBI (KB = 1, NB = 256, m = 7).]
Bispectrum-based VAD results (4)
WF: Wiener filtering
FD: Frame-dropping
SVM-enabled VAD (1)
Motivation: Ability of SVMs for learning from experimental data
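SVMs enable defining a function:
$$f: \mathbb{R}^N \to \{\pm 1\}$$
using training data:
$$(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_\ell, y_\ell) \in \mathbb{R}^N \times \{\pm 1\}$$
Classify unseen examples (x, y):
$$y = \begin{cases} +1 & f(\mathbf{x}) = +1 \\ -1 & \text{otherwise} \end{cases}$$
Statistical learning theory restricts the class of functions the learning machine can implement.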
SVM-enabled VAD (2)
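Hyperplane classifiers:
$$f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b)$$
Training: w and b define the maximal margin hyperplane
Kernels:
$$f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{\ell} \alpha_i\, k(\mathbf{x}_i, \mathbf{x}) + b\right)$$
$$k(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y})$$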
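The kernel form of the classifier can be illustrated with a toy decision function. The slides do not name the kernel, so a Gaussian (RBF) kernel is assumed here; the support vectors, weights and bias are made-up values rather than a trained model, and `rbf_kernel` and `decision` are hypothetical names.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel, assumed here for illustration."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(x, support_vectors, weights, bias, kernel=rbf_kernel):
    """Return +1 or -1 from the sign of the kernel expansion plus bias."""
    g = sum(w * kernel(sv, x) for sv, w in zip(support_vectors, weights)) + bias
    return 1 if g >= 0 else -1


# Toy model: one support vector per class (values are not from a trained SVM)
support_vectors = [(0.0, 0.0), (3.0, 3.0)]
weights = [1.0, -1.0]     # signed weights of the kernel expansion
bias = 0.0

print(decision((0.1, 0.1), support_vectors, weights, bias))
print(decision((3.0, 2.9), support_vectors, weights, bias))
```

In a VAD setting the two labels would stand for speech and non-speech frames, with the feature vector x coming from the feature extraction stage.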
SVM-enabled VAD (3)
SVM-enabled VAD (4)
Feature extraction:
Training:
SVM-enabled VAD (5)
Feature extraction:
Decision function:
$$f(\mathbf{x}) = \operatorname{sign}(g(\mathbf{x}))$$
$$g(\mathbf{x}) = \sum_{i=1}^{\ell} \alpha_i\, k(\mathbf{x}_i, \mathbf{x}) + b$$
2-band features
SVM-enabled VAD (6)
Analysis:
    4 subbands
    Noise reduction
Improvements:
    Contextual speech features
    Better results without noise reduction
Dissemination (VAD)
Integrated bispectrum:
    J.M. Górriz, J. Ramírez, C.G. Puntonet, J.C. Segura, “Generalized-LRT based voice activity detector”, IEEE Signal Processing Letters, 2006.
    J. Ramírez, J.M. Górriz, J.C. Segura, C.G. Puntonet, A. Rubio, “Speech/Non-speech Discrimination based on Contextual Information Integrated Bispectrum LRT”, IEEE Signal Processing Letters, 2006.
    J.M. Górriz, J. Ramírez, J.C. Segura, C.G. Puntonet, L. García, “Effective Speech/Pause Discrimination Using an Integrated Bispectrum Likelihood Ratio Test”, ICASSP 2006.
SVM VAD:
    J. Ramírez, P. Yélamos, J.M. Górriz, J.C. Segura, “SVM-based Speech Endpoint Detection Using Contextual Speech Features”, IEE Electronics Letters, 2006.
    J. Ramírez, P. Yélamos, J.M. Górriz, C.G. Puntonet, J.C. Segura, “SVM-enabled Voice Activity Detection”, ISNN 2006.
    P. Yélamos, J. Ramírez, J.M. Górriz, C.G. Puntonet, J.C. Segura, “Speech Event Detection Using Support Vector Machines”, ICCS 2006.
HIWIRE MEETING
Athens, November 3-4, 2005
José C. Segura, Javier Ramírez