IMPROVING RECOGNITION PERFORMANCE IN NOISY ENVIRONMENTS•

Joseph Picone¹

Inst. for Signal and Info. Processing
Dept. Electrical and Computer Eng.
Mississippi State University

• Contact Information:

Box 9571, Mississippi State University, Mississippi State, Mississippi 39762

Tel: 662-325-3149 Fax: 662-325-2298 Email: [email protected]

1. Three-time workshop survivor (’97-’99)!

CLSP SUMMER PLANNING WORKSHOP

OVERVIEW: AURORA LVCSR EVALUATION

•WSJ 5K (closed task) with seven (digitally-added) noise conditions

•Common ASR system

•Two participants:

QIO: Qualcomm, ICSI, OGI; MFA: Motorola, France Telecom, Alcatel

•Client/server applications

•Evaluate robustness in noisy environments

•Propose a standard for LVCSR applications

Performance Summary

                        Test Set
Site          Clean   Noise (Sennh)   Noise (MultiM)
Base (TS1)    15%     59%             75%
Base (TS2)    19%     33%             50%
QIO (TS2)     17%     26%             41%
MFA (TS2)     15%     26%             40%
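The gap between the baseline and the two advanced front ends is easier to compare as a relative change. The sketch below is only an illustration: it assumes the percentages in the table are error rates (the slides do not label the metric), and the names `RESULTS`, `relative_reduction`, and `summarize` are hypothetical helpers introduced here.

```python
# Figures copied from the Performance Summary table (TS2 rows only).
# ASSUMPTION: the percentages are error rates; the slides do not say so.
RESULTS = {
    "Base (TS2)": {"Clean": 19, "Noise (Sennh)": 33, "Noise (MultiM)": 50},
    "QIO (TS2)": {"Clean": 17, "Noise (Sennh)": 26, "Noise (MultiM)": 41},
    "MFA (TS2)": {"Clean": 15, "Noise (Sennh)": 26, "Noise (MultiM)": 40},
}


def relative_reduction(baseline, system):
    """Relative reduction in percent: 100 * (baseline - system) / baseline."""
    return 100.0 * (baseline - system) / baseline


def summarize(results, baseline_name="Base (TS2)"):
    """Relative reduction of every non-baseline site, per test condition."""
    base = results[baseline_name]
    summary = {}
    for site, scores in results.items():
        if site == baseline_name:
            continue
        summary[site] = {
            cond: round(relative_reduction(base[cond], scores[cond]), 1)
            for cond in base
        }
    return summary


if __name__ == "__main__":
    for site, row in summarize(RESULTS).items():
        print(site, row)
```

Under that assumption, the multiple-microphone condition improves by 18% relative for QIO (50% to 41%) and 20% relative for MFA (50% to 40%).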

STATE OF THE ART: ADAPTIVE SIGNAL PROCESSING

•Commercial front ends use adaptive noise compensation.

•Advanced front ends use a variety of techniques including subspace methods, normalization, and multiple time scales.

•Aurora LVCSR eval did not address acoustic modeling issues and speaker/channel adaptation (by design).
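As a rough illustration of what adaptive noise compensation can look like, here is a minimal sketch of magnitude spectral subtraction with a recursively updated noise estimate. It is not the method of any particular commercial or Aurora front end. The function name, the parameters (`noise_frames`, `alpha`, `floor`, `speech_ratio`), and the energy-based speech/non-speech decision are all assumptions made for this example.

```python
def spectral_subtraction(frames, noise_frames=2, alpha=0.9,
                         floor=0.1, speech_ratio=2.0):
    """Subtract a running noise estimate from each magnitude spectrum.

    frames: list of frames, each a list of non-negative spectral magnitudes.
    All parameter names and defaults are illustrative assumptions.
    """
    n_bins = len(frames[0])
    n_init = max(1, min(noise_frames, len(frames)))
    # Initial noise estimate: mean of the first frames (assumed speech-free).
    noise = [sum(f[k] for f in frames[:n_init]) / n_init for k in range(n_bins)]

    cleaned = []
    for frame in frames:
        frame_energy = sum(m * m for m in frame)
        noise_energy = sum(n * n for n in noise)
        # Crude activity decision: low-energy frames are treated as noise
        # and used to adapt the estimate by recursive averaging.
        if frame_energy < speech_ratio * noise_energy:
            noise = [alpha * n + (1.0 - alpha) * m for n, m in zip(noise, frame)]
        # Subtract the estimate, keeping a spectral floor to avoid negatives.
        cleaned.append([max(m - n, floor * m) for m, n in zip(frame, noise)])
    return cleaned


if __name__ == "__main__":
    toy = [[1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
    print(spectral_subtraction(toy))
```

The noise estimate adapts only during frames judged to be non-speech, which is the "adaptive" part; a real front end would use a more careful activity detector and operate on short-time spectra computed from the waveform.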

PROPOSAL SUMMARY

•Focus on Aurora task (TS2):
– multiple microphones; representative noise conditions
– adaptation/multipass processing within a single utterance
– establish benchmarks prior to workshop (incl. adaptation)
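One simple instance of multipass processing within a single utterance is per-utterance mean and variance normalization: a first pass collects statistics over the whole utterance, and a second pass applies them to every frame. The sketch below is offered only as an example of the idea, not as the processing proposed for the workshop. The function name `normalize_utterance` and the `eps` parameter are assumptions.

```python
import math


def normalize_utterance(features, eps=1e-8):
    """Two-pass mean/variance normalization over one utterance.

    features: list of frames, each a list of floats (for example,
    cepstral coefficients). eps guards against zero variance and is
    an assumption of this sketch.
    """
    n_frames = len(features)
    n_dims = len(features[0])

    # Pass 1: per-dimension mean and standard deviation over the utterance.
    means = [sum(f[d] for f in features) / n_frames for d in range(n_dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n_frames) + eps
        for d in range(n_dims)
    ]

    # Pass 2: normalize every frame with the utterance-level statistics.
    return [
        [(f[d] - means[d]) / stds[d] for d in range(n_dims)]
        for f in features
    ]


if __name__ == "__main__":
    print(normalize_utterance([[1.0, 10.0], [3.0, 10.0]]))
```

Because the statistics come from the utterance itself, a constant channel offset in the features is removed without any separate adaptation data.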

SIGNAL PROCESSING VS. ACOUSTIC MODELS

•Some possible themes:
– knowledge vs. statistics
– phone-dependent spectral models of speech and noise
– multi-time scale analysis
– subspace methods to separate speech and noise
– iterative refinement

•Parallel research tracks:
– noise robust front end processing
– phone/state-specific features and/or noise models

REFERENCES: AURORA AND ICSLP’2002

• J. Picone, “Improving Speech Recognition Performance in Noisy Environments,” Mississippi State University, November 8, 2002 (http://www.isip.msstate.edu/publications/seminars/2002/clsp_pm/).

• N. Parihar and J. Picone, “DSR Front End LVCSR Evaluation – Baseline Recognition System Description,” Aurora Working Group, European Telecommunications Standards Institute, November 1, 2001 (http://www.isip.msstate.edu/publications/reports/aurora_frontend/2001).

• D. Macho, et al., “Evaluation of a Noise-Robust DSR Front End on Aurora Databases,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 17-20, September 2002.

• A. Adami, et al., “Qualcomm-ICSI-OGI Features For ASR,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 21-24, September 2002.

• C.P. Chen, et al., “Front End Post-Processing and Back End Model Enhancement on the Aurora 2.0/3.0 Databases,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 241-244, September 2002.

• P. Motlíček and L. Burget, “Noise Estimation For Efficient Speech Enhancement and Robust Speech Recognition,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 1033-1036, September 2002.

• J. Chen, et al., “Recognition of Noisy Speech Using Normalized Moments,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 2441-2444, September 2002.

• J. Wu and Q. Huo, “An Environment Compensated Minimum Classification Error Training Approach and Its Evaluation in Aurora 2 Database,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 453-456, September 2002.

• G. Saon and J.M. Huerta, “Improvements to the IBM Aurora 2 Multi-Condition System,” International Conference on Spoken Language Processing, Denver, Colorado, USA, pp. 469-472, September 2002.
