
HIWIRE MEETING Paris, February 11, 2005 JOSÉ C. SEGURA LUNA GSTC UGR



Schedule

AURORA 4 HTK-based setup

Baseline results (AURORA databases)
    MFCC with C0 and CMN
    AFE

Additional results
    CMVN
    HEQ

Work in progress
    WP1: Improved HEQ
    WP2: User independence & robustness


AURORA 4 HTK-based setup

ETSI AURORA 4 evaluation
    Baseline system based on the ISIP speech recognition system

Main drawbacks:
    CPU time for experiments (especially for decoding)
    Scripts are excessively complex to use

Described in:
    N. Parihar and J. Picone, "DSR Front End LVCSR Evaluation - AU/384/02," Aurora Working Group, ETSI, December 06, 2002.
    G. Hirsch, "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task, Version 2.0," ETSI STQ-Aurora DSR Working Group, November 19, 2002.


AURORA 4 HTK-based setup

HTK-based setup for AURORA 4 evaluations

Features
    12 MFCC + C0 (CMS) + Δ + ΔΔ

Cross-word tree-based tied-state triphones
    3 states / 6 Gaussians per state

Back-off bigram language model
    Same as used in ISIP setup

Pruning is performed as in ISIP setup

Available for partners at: http://www.hiwire.org
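The feature set above (12 MFCC + C0 with cepstral mean subtraction, plus Δ and ΔΔ) gives 39 coefficients per frame. The following is a minimal sketch of how CMS and the dynamic features can be computed from a matrix of static cepstra. It is not the HTK implementation: the regression window of 2 frames, the edge replication, and the function names are assumptions made for illustration, and the input here is random data rather than real MFCCs.

```python
import numpy as np

def cepstral_mean_subtraction(feats):
    """Subtract the per-utterance mean of every coefficient (rows are frames)."""
    return feats - feats.mean(axis=0, keepdims=True)

def deltas(feats, window=2):
    """Regression deltas over +/- `window` frames, replicating the edge frames.

    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2),  k = 1..window
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    norm = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(feats, dtype=float)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + num_frames]
        behind = padded[window - k : window - k + num_frames]
        out += k * (ahead - behind)
    return out / norm

def add_dynamic_features(static):
    """Return [static (mean-subtracted), delta, delta-delta] stacked per frame."""
    static = cepstral_mean_subtraction(np.asarray(static, dtype=float))
    d = deltas(static)
    dd = deltas(d)
    return np.hstack([static, d, dd])

# Hypothetical input: 50 frames of 13 static coefficients (12 MFCC + C0).
rng = np.random.default_rng(0)
static = rng.normal(size=(50, 13))
full = add_dynamic_features(static)
print(full.shape)  # (50, 39)
```

Because mean subtraction only shifts each coefficient by a constant, the deltas are the same whether they are computed before or after CMS.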


AURORA 4 HTK-based setup

Performance comparisons (HTK-based setup vs. ISIP)

Training clean models from scratch takes 3h52' on a 2.66GHz

Features: 12 MFCCs + C0 (CMS) + Δ + ΔΔ

                          Word error rate       Decoding time (s)
                          ISIP      HTK         ISIP               HTK
Test 01 (clean data)      16.2%     13.22%      7580 (6.16RT)      3428 (2.78RT)
Test 02 (car noise)       49.6%     24.68%      22195 (18.03RT)    8002 (6.50RT)
Test 03 (babble noise)    62.2%     46.00%      33203 (26.9RT)     13747 (11.17RT)
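The real-time (RT) factors quoted next to the decoding times are the ratio of decoding time to audio duration. The slide does not state the duration of the test audio, so the sketch below works backwards: it divides each reported decoding time by its RT factor to get the implied duration, and also computes the ISIP-to-HTK speed-up. The function and variable names are hypothetical.

```python
# (decoding time in seconds, reported real-time factor), copied from the slide.
reported = {
    ("Test 01", "ISIP"): (7580, 6.16),
    ("Test 01", "HTK"): (3428, 2.78),
    ("Test 02", "ISIP"): (22195, 18.03),
    ("Test 02", "HTK"): (8002, 6.50),
    ("Test 03", "ISIP"): (33203, 26.9),
    ("Test 03", "HTK"): (13747, 11.17),
}

def rt_factor(decoding_seconds, audio_seconds):
    """Real-time factor: seconds of CPU time per second of audio."""
    return decoding_seconds / audio_seconds

def implied_audio_seconds(decoding_seconds, rt):
    """Audio duration implied by a decoding time and its reported RT factor."""
    return decoding_seconds / rt

durations = {key: implied_audio_seconds(t, rt) for key, (t, rt) in reported.items()}
speedups = {
    test: reported[(test, "ISIP")][0] / reported[(test, "HTK")][0]
    for test in ("Test 01", "Test 02", "Test 03")
}

for key, seconds in durations.items():
    print(key, f"implied audio duration: {seconds:.0f} s")
for test, ratio in speedups.items():
    print(test, f"HTK decodes {ratio:.2f}x faster than ISIP")
```

All six pairs imply roughly the same audio duration (a little over 1230 s), which suggests the RT factors were computed against a single fixed test set.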


AURORA 4 Baseline results

                                                     AVERAGES                Relative Error Reduction
PARAMETERS      TRAIN MODE  TEST SIZE  LATTICE SIZE  01-07   08-14   01-14   01-07    08-14    01-14
MFCC_0_D_A_Z    clean       166        none          40,53   50,60   45,57   ---      ---      ---
MFCC_0_D_A_Z    clean       166        sml           26,53   33,57   30,05
MFCC_0_D_A_Z    clean       166        mid           27,98   35,02   31,50
MFCC_0_D_A_Z    clean       330        none          40,72   50,78   45,75   -0,47%   -0,36%   -0,40%
MFCC_0_D_A_Z    clean       330        sml           25,75   32,93   29,34
MFCC_0_D_A_Z    clean       330        mid           27,18   34,25   30,71

MFCC_0_D_A_Z    multi       166        none          24,58   29,88   27,23   39,36%   40,96%   40,25%
MFCC_0_D_A_Z    multi       166        sml           17,32   18,87   18,09
MFCC_0_D_A_Z    multi       166        mid           18,83   20,16   19,50
MFCC_0_D_A_Z    multi       330        none          24,74   29,73   27,24   38,97%   41,24%   40,23%
MFCC_0_D_A_Z    multi       330        sml           16,70   17,80   17,25
MFCC_0_D_A_Z    multi       330        mid           18,26   19,33   18,79
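The "Relative Error Reduction" columns appear to be computed against the first row (clean training, test size 166, no lattice). A minimal sketch of that computation, using the 01-14 averages from the table; the function name is hypothetical, and small differences in the last digit are expected because the slide values were presumably computed from unrounded error rates.

```python
def relative_error_reduction(baseline_wer, wer):
    """Relative reduction (in %) of `wer` with respect to `baseline_wer`."""
    return 100.0 * (baseline_wer - wer) / baseline_wer

# 01-14 averages taken from the table (decimal commas written as points).
baseline = 45.57  # MFCC_0_D_A_Z, clean training, 166, no lattice
systems = {
    "clean, 330, none": 45.75,
    "multi, 166, none": 27.23,
    "multi, 330, none": 27.24,
}

for name, wer in systems.items():
    print(f"{name}: {relative_error_reduction(baseline, wer):.2f}%")
```

A negative value, as for the clean-trained 330 row, means the error rate is slightly higher than the reference row.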


AURORA 4 Additional results

                                                               AVERAGES                Relative Error Reduction
PARAMETERS                TRAIN MODE  TEST SIZE  LATTICE SIZE  01-07   08-14   01-14   01-07    08-14    01-14
MFCC_0_D_A_Z              clean       166        none          40,53   50,60   45,57   ---      ---      ---
MFCC_0_D_A_Z              multi       166        none          24,58   29,88   27,23   39,36%   40,96%   40,25%

MFCC_0_D_A_Z MV           clean       166        none          36,12   48,50   42,31   10,88%   4,15%    7,14%
MFCC_0_D_A_Z MV DELTAS    clean       166        none          34,73   47,35   41,04   14,31%   6,43%    9,94%

AFE                       clean       166        none          27,57   34,99   31,28   31,99%   30,85%   31,36%
AFE noFD                  clean       166        none          27,69   35,26   31,48   31,67%   30,31%   30,92%
AFE noFD                  multi       166        none          22,33   27,67   25,00   44,90%   45,32%   45,13%

ECDF_WSJ_MULTI            clean       166        none          32,81   43,77   38,29   19,06%   13,50%   15,97%
ECDF_TID_MULTI            clean       166        none          31,36   40,87   36,12   22,61%   19,24%   20,74%
ECDF_WSJ_CLEAN            clean       166        none          32,19   42,75   37,47   20,58%   15,53%   17,78%
ECDF_TID_CLEAN            clean       166        none          31,75   41,95   36,85   21,67%   17,09%   19,13%
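The MV rows presumably correspond to the CMVN entry in the schedule, that is, cepstral mean and variance normalization. The slides do not say how the statistics were estimated, so the sketch below assumes the simplest variant: per-utterance statistics, applied to every coefficient independently. The function name and the random input are hypothetical.

```python
import numpy as np

def cmvn(feats, eps=1e-8):
    """Normalize each coefficient to zero mean and unit variance over the utterance.

    feats: array of shape (num_frames, num_coefficients).
    eps guards against division by zero for constant coefficients.
    """
    feats = np.asarray(feats, dtype=float)
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / (std + eps)

# Hypothetical utterance: 200 frames, 39 coefficients with arbitrary offset and scale.
rng = np.random.default_rng(1)
utterance = 3.0 + 5.0 * rng.normal(size=(200, 39))
normalized = cmvn(utterance)
print(normalized.mean(axis=0)[:3], normalized.std(axis=0)[:3])
```

Compared with CMS alone, this also removes scale differences between clean and noisy conditions, which is consistent with the MV rows improving over the plain MFCC_0_D_A_Z row.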


Baseline results

HIWIRE baseline results: 12 MFCCs + C0 (CMS) + Δ + ΔΔ

AURORA 2 (Aurora 2 Small Vocabulary)

Absolute word accuracy. If an HTK output is WORD: %Corr=99.14, Acc=98.68 [H=……..], the value to enter is 98.68.

Multicondition training, multicondition testing

Test set A
          Subway      Babble      Car         Exhibition  Average
Clean     98,46       98,46       98,36       98,73       98,50
20 dB     97,79       97,67       98,21       97,22       97,72
15 dB     97,11       97,13       97,67       97,13       97,26
10 dB     95,52       96,07       96,24       94,17       95,50
5 dB      90,30       90,24       87,41       87,75       88,93
0 dB      69,85       66,02       48,79       65,84       62,63
-5 dB     28,98       28,14       19,21       27,00       25,83
Average   90,11       89,43       85,66       88,42       88,41

Test set B
          Restaurant  Street      Airport     Station     Average
Clean     98,46       98,46       98,36       98,73       98,50
20 dB     97,21       97,82       97,88       97,41       97,58
15 dB     96,81       96,89       97,02       96,42       96,79
10 dB     95,76       95,41       95,88       94,72       95,44
5 dB      89,75       89,06       90,69       87,87       89,34
0 dB      70,49       62,36       72,62       57,39       65,72
-5 dB     34,45       24,73       33,91       23,82       29,23
Average   90,00       88,31       90,82       86,76       88,97

Test set C and overall average
          Subway M    Street M    Average     Overall average
Clean     98,53       98,46       98,50       98,50
20 dB     97,88       97,49       97,69       97,66
15 dB     97,05       97,13       97,09       97,04
10 dB     94,96       95,07       95,02       95,38
5 dB      90,11       88,18       89,15       89,14
0 dB      68,81       63,00       65,91       64,52
-5 dB     28,25       26,45       27,35       27,49
Average   89,76       88,17       88,97       88,75

Clean training, multicondition testing

Test set A
          Subway      Babble      Car         Exhibition  Average
Clean     99,14       99,09       98,99       99,17       99,10
20 dB     96,22       97,64       97,70       96,42       97,00
15 dB     90,70       93,83       92,01       90,40       91,74
10 dB     71,23       79,47       68,77       69,27       72,19
5 dB      38,19       47,28       32,84       34,80       38,28
0 dB      21,40       23,34       19,95       18,45       20,79
-5 dB     13,82       12,48       12,38       10,18       12,22
Average   63,55       68,31       62,25       61,87       64,00

Test set B
          Restaurant  Street      Airport     Station     Average
Clean     99,14       99,09       98,99       99,17       99,10
20 dB     98,10       97,01       98,03       98,06       97,80
15 dB     95,18       92,17       94,72       93,80       93,97
10 dB     83,60       74,18       84,58       78,28       80,16
5 dB      54,38       42,26       52,94       44,80       48,60
0 dB      26,19       22,52       27,97       23,14       24,96
-5 dB     13,23       12,15       15,39       13,88       13,66
Average   71,49       65,63       71,65       67,62       69,10

Test set C and overall average
          Subway M    Street M    Average     Overall average
Clean     99,17       99,12       99,15       99,11
20 dB     96,53       97,16       96,85       97,29
15 dB     90,82       91,93       91,38       92,56
10 dB     71,91       74,21       73,06       75,55
5 dB      38,07       42,68       40,38       42,82
0 dB      21,89       22,07       21,98       22,69
-5 dB     13,79       11,88       12,84       12,92
Average   63,84       65,61       64,73       66,18
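The note on this slide distinguishes the two figures in an HTK results line: %Corr ignores insertion errors, while Acc penalizes them, and Acc is the value reported in the tables. The sketch below shows the two formulas and a small parser for the WORD line. The slide elides the counts inside the brackets, so the counts used here are hypothetical, chosen only so that they reproduce the 99.14 / 98.68 pair quoted on the slide.

```python
import re

def word_scores(hits, deletions, substitutions, insertions):
    """Return (%Corr, Acc) from alignment counts.

    N = H + D + S is the number of reference words;
    %Corr = 100 * H / N and Acc = 100 * (H - I) / N.
    """
    n = hits + deletions + substitutions
    corr = 100.0 * hits / n
    acc = 100.0 * (hits - insertions) / n
    return corr, acc

WORD_LINE = re.compile(
    r"WORD: %Corr=(-?[\d.]+), Acc=(-?[\d.]+) "
    r"\[H=(\d+), D=(\d+), S=(\d+), I=(\d+), N=(\d+)\]"
)

def parse_word_line(line):
    """Extract the reported scores and counts from a WORD results line."""
    match = WORD_LINE.search(line)
    if match is None:
        raise ValueError("not a WORD results line")
    corr, acc = float(match.group(1)), float(match.group(2))
    h, d, s, i, n = (int(x) for x in match.groups()[2:])
    return {"corr": corr, "acc": acc, "H": h, "D": d, "S": s, "I": i, "N": n}

# Hypothetical counts; only the two percentages come from the slide.
line = "WORD: %Corr=99.14, Acc=98.68 [H=3229, D=9, S=19, I=15, N=3257]"
parsed = parse_word_line(line)
corr, acc = word_scores(parsed["H"], parsed["D"], parsed["S"], parsed["I"])
print(f"value to enter: {acc:.2f}")
```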


Baseline results

AFE

AURORA 2 (Aurora 2 Small Vocabulary)

Absolute word accuracy. If an HTK output is WORD: %Corr=99.14, Acc=98.68 [H=……..], the value to enter is 98.68.

Multicondition training, multicondition testing

Test set A
          Subway      Babble      Car         Exhibition  Average
Clean     99,08       98,85       99,02       99,38       99,08
20 dB     98,74       98,28       98,78       98,92       98,68
15 dB     98,10       97,88       98,33       98,27       98,15
10 dB     95,64       96,16       97,08       96,17       96,26
5 dB      91,96       91,05       93,80       90,77       91,90
0 dB      77,13       71,10       81,54       76,06       76,46
-5 dB     44,01       35,68       43,15       45,56       42,10
Average   92,31       90,89       93,91       92,04       92,29

Test set B
          Restaurant  Street      Airport     Station     Average
Clean     99,08       98,85       99,02       99,38       99,08
20 dB     98,50       98,13       98,54       99,07       98,56
15 dB     97,79       97,64       97,82       98,21       97,87
10 dB     95,98       95,83       96,66       96,95       96,36
5 dB      90,70       90,72       92,51       91,79       91,43
0 dB      72,18       75,51       79,54       77,85       76,27
-5 dB     37,36       42,10       46,47       45,57       42,88
Average   91,03       91,57       93,01       92,77       92,10

Test set C and overall average
          Subway M    Street M    Average     Overall average
Clean     98,89       98,94       98,92       99,05
20 dB     98,62       98,25       98,44       98,58
15 dB     98,10       97,70       97,90       97,98
10 dB     95,55       95,65       95,60       96,17
5 dB      90,70       89,09       89,90       91,31
0 dB      71,11       70,31       70,71       75,23
-5 dB     35,83       36,10       35,97       41,18
Average   90,82       90,20       90,51       91,86

Clean training, multicondition testing

Test set A
          Subway      Babble      Car         Exhibition  Average
Clean     99,39       99,00       99,28       99,51       99,30
20 dB     98,31       98,16       98,81       98,36       98,41
15 dB     96,90       96,74       98,00       96,91       97,14
10 dB     93,09       92,17       95,97       93,55       93,70
5 dB      85,26       81,47       89,29       84,91       85,23
0 dB      65,34       53,87       69,25       63,84       63,08
-5 dB     32,53       23,45       31,09       31,71       29,70
Average   87,78       84,48       90,26       87,51       87,51

Test set B
          Restaurant  Street      Airport     Station     Average
Clean     99,39       99,00       99,28       99,51       99,30
20 dB     98,50       97,82       98,75       98,64       98,43
15 dB     95,92       96,55       97,52       97,19       96,80
10 dB     91,80       92,90       94,72       94,72       93,54
5 dB      80,78       84,16       86,04       86,45       84,36
0 dB      56,86       61,09       65,14       65,69       62,20
-5 dB     24,90       29,50       31,94       33,09       29,86
Average   84,77       86,50       88,43       88,54       87,06

Test set C and overall average
          Subway M    Street M    Average     Overall average
Clean     99,20       99,24       99,22       99,28
20 dB     97,91       98,13       98,02       98,34
15 dB     96,65       96,49       96,57       96,89
10 dB     92,26       92,17       92,22       93,34
5 dB      83,42       82,56       82,99       84,43
0 dB      58,15       57,45       57,80       61,67
-5 dB     26,90       27,34       27,12       29,25
Average   85,68       85,36       85,52       86,93
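The "Average" row in these tables is not the mean of all seven conditions: the numbers are consistent with averaging only the five SNR levels from 20 dB down to 0 dB, leaving out the Clean and -5 dB rows. A minimal check on the Subway column of the two AFE tables (the ones whose Subway averages read 92,31 and 87,78); the dictionary and function names are hypothetical.

```python
# SNR levels that enter the "Average" row (Clean and -5 dB are excluded).
AVERAGED_SNRS = ("20 dB", "15 dB", "10 dB", "5 dB", "0 dB")

# Subway column of the AFE table with Subway average 92,31.
afe_subway_a = {
    "Clean": 99.08, "20 dB": 98.74, "15 dB": 98.10, "10 dB": 95.64,
    "5 dB": 91.96, "0 dB": 77.13, "-5 dB": 44.01,
}
# Subway column of the AFE table with Subway average 87,78.
afe_subway_b = {
    "Clean": 99.39, "20 dB": 98.31, "15 dB": 96.90, "10 dB": 93.09,
    "5 dB": 85.26, "0 dB": 65.34, "-5 dB": 32.53,
}

def snr_average(column):
    """Mean word accuracy over the 20 dB to 0 dB conditions of one column."""
    return sum(column[snr] for snr in AVERAGED_SNRS) / len(AVERAGED_SNRS)

avg_a = snr_average(afe_subway_a)
avg_b = snr_average(afe_subway_b)
print(f"{avg_a:.2f} {avg_b:.2f}")
```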


Baseline results

AURORA 3 word error rates

MFCC + C0 (CMS) + Δ + ΔΔ

              Italian   Spanish   German    Average
Well (x40%)   5,58%     10,69%    8,86%     8,38%
Mid (x35%)    12,98%    16,82%    18,81%    16,20%
High (x25%)   53,25%    34,50%    20,31%    36,02%
Overall       20,09%    18,79%    15,21%    18,03%

AFE

              Italian   Spanish   German    Average
Well (x40%)   3,29%     3,39%     4,87%     3,85%
Mid (x35%)    7,47%     6,21%     10,40%    8,03%
High (x25%)   11,00%    9,23%     8,70%     9,64%
Overall       6,68%     5,84%     7,76%     6,76%
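The "Overall" row is a weighted sum of the three mismatch conditions, using the weights printed in the row labels (40%, 35% and 25%). A minimal sketch that recomputes it for the table whose Overall row reads 20,09% / 18,79% / 15,21%; the variable and function names are hypothetical, and results can differ from the slide in the last digit because the inputs are already rounded.

```python
# Condition weights as given in the row labels of the table.
WEIGHTS = {"Well": 0.40, "Mid": 0.35, "High": 0.25}

# Word error rates (%) per language, decimal commas written as points.
wer = {
    "Italian": {"Well": 5.58, "Mid": 12.98, "High": 53.25},
    "Spanish": {"Well": 10.69, "Mid": 16.82, "High": 34.50},
    "German": {"Well": 8.86, "Mid": 18.81, "High": 20.31},
}

def overall_wer(per_condition):
    """Weighted word error rate over the Well / Mid / High conditions."""
    return sum(WEIGHTS[cond] * value for cond, value in per_condition.items())

overall = {language: overall_wer(values) for language, values in wer.items()}
average_overall = sum(overall.values()) / len(overall)

for language, value in overall.items():
    print(f"{language}: {value:.2f}%")
print(f"Average: {average_overall:.2f}%")
```

Because the high-mismatch condition carries a quarter of the weight, the large Italian figure in that row dominates the Italian overall result.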


Work in progress (WP1)

Improved equalization

Modeling Speech & Noise separately

First results with Gaussian models
    Very promising on AURORA 4
    Need to be evaluated on AURORA 2 & 3

Next
    Use more detailed / nonparametric models
    Incorporate dynamic features
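For context, plain histogram equalization (HEQ) maps each feature coefficient through its empirical CDF and then through the inverse CDF of a reference distribution. The sketch below shows only that baseline idea, with a standard Gaussian as the reference. It does not implement the improved method outlined on this slide (separate speech and noise models), and the per-utterance ECDF, the rank-based CDF estimate, and the function name are all assumptions.

```python
import numpy as np
from statistics import NormalDist

def heq_to_gaussian(feats):
    """Equalize every coefficient of an utterance to a standard normal reference.

    feats: array of shape (num_frames, num_coefficients).
    Each value is replaced by the Gaussian quantile of its empirical CDF value.
    """
    feats = np.asarray(feats, dtype=float)
    num_frames = feats.shape[0]
    # Rank of each frame within its coefficient, from 0 to num_frames - 1.
    ranks = feats.argsort(axis=0).argsort(axis=0)
    # Mid-point ECDF estimate, strictly inside (0, 1) so the inverse CDF is finite.
    ecdf = (ranks + 0.5) / num_frames
    inverse_cdf = np.vectorize(NormalDist().inv_cdf)
    return inverse_cdf(ecdf)

# Hypothetical, strongly non-Gaussian input: 500 frames, 3 coefficients.
rng = np.random.default_rng(2)
skewed = rng.exponential(size=(500, 3))
equalized = heq_to_gaussian(skewed)
print(equalized.mean(axis=0), equalized.std(axis=0))
```

The mapping is monotonic, so the ordering of frames within each coefficient is preserved while the shape of the distribution is forced to match the reference.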


Preliminary results

                                                               AVERAGES                Relative Error Reduction
PARAMETERS                TRAIN MODE  TEST SIZE  LATTICE SIZE  01-07   08-14   01-14   01-07    08-14    01-14
MFCC_0_D_A_Z              clean       166        none          40,53   50,60   45,57   ---      ---      ---
MFCC_0_D_A_Z              multi       166        none          24,58   29,88   27,23   39,36%   40,96%   40,25%

MFCC_0_D_A_Z (MV)         clean       166        none          36,12   48,50   42,31   10,88%   4,15%    7,14%
MFCC_0_D_A_Z (MV DELTAS)  clean       166        none          34,73   47,35   41,04   14,31%   6,43%    9,94%

AFE                       clean       166        none          27,57   34,99   31,28   31,99%   30,85%   31,36%
AFE noFD                  clean       166        none          27,69   35,26   31,48   31,67%   30,31%   30,92%
AFE noFD                  multi       166        none          22,33   27,67   25,00   44,90%   45,32%   45,13%

(ECDF_WSJ_MULTI)          clean       166        none          32,81   43,77   38,29   19,06%   13,50%   15,97%
(ECDF_TID_MULTI)          clean       166        none          31,36   40,87   36,12   22,61%   19,24%   20,74%
(ECDF_WSJ_CLEAN)          clean       166        none          32,19   42,75   37,47   20,58%   15,53%   17,78%
(ECDF_TID_CLEAN)          clean       166        none          31,75   41,95   36,85   21,67%   17,09%   19,13%

CLASIF N20 ref01          clean       166        none          28,29   33,87   31,08   30,19%   33,06%   31,79%


Work in progress (WP1)

VAD & Noise reduction

Baseline evaluations
    AURORA 2 & 3 already done
    AURORA 4 to be ready by June

Integration with parametric techniques
    Speech & Noise equalization


Work in progress (WP2)

HEQ-based user robustness
    Ready for AURORA 4
    Working on WSJ1 baseline

HEQ-based user adaptation
    MLLR baseline
    Estimation of MLLR transformations using HEQ
    Working on WSJ1 baseline
