IMPROVING RECOGNITION PERFORMANCE IN NOISY ENVIRONMENTS
• Joseph Picone¹
Inst. for Signal and Info. Processing Dept. Electrical and Computer Eng. Mississippi State University
• Contact Information:
Box 9571, Mississippi State University, Mississippi State, Mississippi 39762
Tel: 662-325-3149 Fax: 662-325-2298 Email: [email protected]
1. Three-time workshop survivor (’97-’99)!
CLSP SUMMER PLANNING WORKSHOP
OVERVIEW: AURORA LVCSR EVALUATION
•WSJ 5K (closed task) with seven (digitally-added) noise conditions
•Common ASR system
•Two participants:
– QIO: Qualcomm, ICSI, OGI
– MFA: Motorola, France Telecom, Alcatel
•Client/server applications
•Evaluate robustness in noisy environments
•Propose a standard for LVCSR applications
Performance Summary

Site (Test Set)   Clean   Noise (Sennh)   Noise (MultiM)
Base (TS1)        15%     59%             75%
Base (TS2)        19%     33%             50%
QIO (TS2)         17%     26%             41%
MFA (TS2)         15%     26%             40%
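To make the gap between the baseline and the two advanced front ends easier to read, the summary figures can be turned into relative error reductions. The sketch below assumes the percentages are word error rates (lower is better), which the slide does not state explicitly; the function names `relative_reduction` and `summarize` are illustrative only.

```python
# Figures copied from the Performance Summary (percent).
# Assumption: they are word error rates, so lower is better.
results = {
    "Base (TS2)": {"Clean": 19.0, "Noise (Sennh)": 33.0, "Noise (MultiM)": 50.0},
    "QIO (TS2)": {"Clean": 17.0, "Noise (Sennh)": 26.0, "Noise (MultiM)": 41.0},
    "MFA (TS2)": {"Clean": 15.0, "Noise (Sennh)": 26.0, "Noise (MultiM)": 40.0},
}


def relative_reduction(baseline, system):
    """Relative error reduction in percent: 100 * (baseline - system) / baseline."""
    return 100.0 * (baseline - system) / baseline


def summarize(results, baseline_key="Base (TS2)"):
    """Return the relative reduction of every non-baseline row, per condition."""
    base = results[baseline_key]
    summary = {}
    for site, row in results.items():
        if site == baseline_key:
            continue
        summary[site] = {
            cond: round(relative_reduction(base[cond], err), 1)
            for cond, err in row.items()
        }
    return summary


if __name__ == "__main__":
    for site, row in summarize(results).items():
        print(site, row)
```

Under that assumption, both front ends cut the error rate on the noisy conditions by roughly a fifth relative to the TS2 baseline, while absolute error rates in noise remain far above the clean condition.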
STATE OF THE ART: ADAPTIVE SIGNAL PROCESSING
•Commercial front ends use adaptive noise compensation.
•Advanced front ends use a variety of techniques including subspace methods, normalization, and multiple time scales.
•Aurora LVCSR eval did not address acoustic modeling issues and speaker/channel adaptation (by design).
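As one concrete instance of the "normalization" family mentioned above, the following is a minimal sketch of per-utterance mean and variance normalization of feature vectors. It is not taken from any of the evaluated front ends; the function name `cmvn`, the list-of-lists feature layout, and the `eps` guard are assumptions made for illustration.

```python
import math


def cmvn(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance
    over one utterance.

    frames: list of equal-length lists of floats (one list per frame).
    eps:    small constant that avoids division by zero for constant features.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


if __name__ == "__main__":
    # Toy features: the second utterance is the first plus a constant offset,
    # a crude stand-in for a fixed channel effect in the cepstral domain.
    clean = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
    shifted = [[c + 10.0 for c in f] for f in clean]
    print(cmvn(clean))
    print(cmvn(shifted))
```

Because the mean is removed per utterance, the constant offset disappears and both toy utterances map to (nearly) the same normalized features, which is the property that makes this kind of normalization attractive for channel mismatch.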
PROPOSAL SUMMARY
•Focus on Aurora task (TS2):
– multiple microphones; representative noise conditions
– adaptation/multipass processing within a single utterance
– establish benchmarks prior to workshop (incl. adaptation)
SIGNAL PROCESSING VS. ACOUSTIC MODELS
•Some possible themes:
– knowledge vs. statistics
– phone-dependent spectral models of speech and noise
– multi-time scale analysis
– subspace methods to separate speech and noise
– iterative refinement
•Parallel research tracks:
– noise robust front end processing
– phone/state-specific features and/or noise models
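The idea of multipass processing within a single utterance can be pictured with a deliberately simple two-pass scheme on frame energies: a first pass estimates the noise level from the quietest frames of the utterance, and a second pass subtracts that estimate from every frame. This is only a sketch of the general idea, not the method proposed here; the function name, the `noise_fraction` and `floor` parameters, and the use of scalar frame energies instead of spectra are all assumptions.

```python
def two_pass_energy_subtraction(frame_energies, noise_fraction=0.2, floor=0.01):
    """Two passes over one utterance.

    Pass 1: estimate the noise energy as the mean of the quietest frames.
    Pass 2: subtract that estimate from every frame, never going below
            floor * original energy (a guard against negative energies).

    Returns (cleaned_energies, noise_estimate).
    """
    if not frame_energies:
        return [], 0.0
    # Pass 1: the quietest `noise_fraction` of frames are treated as noise only.
    k = max(1, int(len(frame_energies) * noise_fraction))
    quietest = sorted(frame_energies)[:k]
    noise = sum(quietest) / k
    # Pass 2: subtract the utterance-level estimate with a floor.
    cleaned = [max(e - noise, floor * e) for e in frame_energies]
    return cleaned, noise


if __name__ == "__main__":
    # Toy utterance: low-energy background with a burst of speech in the middle.
    energies = [1.0, 1.2, 0.8, 9.0, 12.0, 10.0, 1.0, 0.9, 1.1, 1.0]
    cleaned, noise = two_pass_energy_subtraction(energies)
    print("noise estimate:", noise)
    print("cleaned:", cleaned)
```

A phone- or state-specific variant, as suggested by the second research track, would replace the single utterance-level estimate with one estimate per phone or state, using alignments from an earlier recognition pass.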
REFERENCES: AURORA AND ICSLP’2002
• J. Picone, “Improving Speech Recognition Performance in Noisy Environments,” Mississippi State University, November 8, 2002 (http://www.isip.msstate.edu/publications/seminars/2002/clsp_pm/).
• N. Parihar and J. Picone, “DSR Front End LVCSR Evaluation – Baseline Recognition System Description,” Aurora Working Group, European Telecommunications Standards Institute, November 1, 2001 (http://www.isip.msstate.edu/publications/reports/aurora_frontend/2001).
• D. Macho, et al., “Evaluation of a Noise-Robust DSR Front End on Aurora Databases,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 17-20, September 2002.
• A. Adami, et al., “Qualcomm-ICSI-OGI Features For ASR,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 21-24, September 2002.
• C.P. Chen, et al., “Front End Post-Processing and Back End Model Enhancement on the Aurora 2.0/3.0 Databases,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 241-244, September 2002.
• P. Motlíček and L. Burget, “Noise Estimation For Efficient Speech Enhancement and Robust Speech Recognition,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 1033-1036, September 2002.
• J. Chen, et al., “Recognition of Noisy Speech Using Normalized Moments,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 2441-2444, September 2002.
• J. Wu and Q. Huo, “An Environment Compensated Minimum Classification Error Training Approach and Its Evaluation in Aurora 2 Database,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 453-456, September 2002.
• G. Saon and J.M. Huerta, “Improvements to the IBM Aurora 2 Multi-Condition System,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 469-472, September 2002.