The 1995 International Conference on Acoustics, Speech, and Signal Processing CONFERENCE PROCEEDINGS VOLUME 1: Speech Sponsored by The Signal Processing Society of The Institute of Electrical and Electronics Engineers May 9-12, 1995 Westin Hotel Detroit, Michigan U.S.A. 95CH35732


    Volume I

    SPEECH

    TABLE OF CONTENTS

    CELP CODING

    Chair: Jean-Pierre Adoul, University of Sherbrooke (CANADA)

    4KBPS Improved Pitch Prediction CELP Speech Coding with 20ms Frame 1

    Masahiro Serizawa, Kazunori Ozawa, NEC Corporation (JAPAN)

    A Low-Complexity Toll-Quality Variable Bit Rate Coder for CDMA Cellular Systems 5

    Peter Kroon, Michael Recchione, AT&T Bell Laboratories (USA)

    Toll Quality 16 kb/s CELP Speech Coding with Very Low Complexity 9

    Juin-Hwey Chen, AT&T Bell Laboratories (USA)

    CELP Coding Using Trellis-Coded Vector Quantization of the Excitation 13

    Andrei Popescu, Nicolas Moreau, Telecom Paris; Claude Lamblin, CNET/LAA/TSS/CMC (FRANCE)

    Interpolating the History Improved Excitation Coding for High Quality CELP Coding 17

    Per Hedelin, Thomas Eriksson, Chalmers University of Technology (SWEDEN)

    Fast Stochastic Codebook Search Through the Use of Odd-Symmetric Crosscorrelation Basis Vectors 21

    Cheung-Fat Chan, City University of Hong Kong (HONG KONG)

    Improvements of Background Sound Coding in Linear Predictive Speech Coders 25

    Torbjörn Wigren, Anders Bergström, Susanne Harrysson, Fredrik Jansson, Hans Nilsson, Ericsson Radio Systems AB (SWEDEN)

    Improved CS-CELP Speech Coding in a Noisy Environment Using a Trained Sparse Conjugate Codebook 29

    Akitoshi Kataoka, Sachiko Hosaka, NTT Human Interface Labs; Jotaro Ikedo, NTT Wireless Systems Laboratories; Takehiro Moriya, Shinji Hayashi, NTT Human Interface Labs. (JAPAN)

    CELP Coding Based on Mel-cepstral Analysis 33

    Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai, Tokyo Institute of Technology (JAPAN)

    An Embedded Scheme for Regular Pulse Excited (RPE) Linear Predictive Coding 37

    Shude Zhang, Gordon Lockhart, University of Leeds (UK)

    RECOGNITION: LARGE VOCABULARY

    Chair: Michael Picheny, IBM (USA)

    Performance of the IBM Large Vocabulary Continuous Speech Recognition System on the ARPA Wall Street Journal Task 41

    L. R. Bahl, S. Balakrishnan-Aiyer, J.R. Bellegarda, M. Franz, P.S. Gopalakrishnan, D. Nahamoo, M. Novak, M. Padmanabhan, M.A. Picheny, S. Roukos, IBM (USA)

    New Developments in the Lincoln Stack-Decoder Based Large Vocabulary CSR System 45

    Douglas B. Paul, MIT Lincoln Laboratory (USA)

    Large Vocabulary Continuous Speech Recognition Using Word Graphs 49

    Xavier Aubert, Philips GmbH Research Laboratories - Aachen; Hermann Ney, Aachen University of Technology (GERMANY)

    Reducing Word Error Rate on Conversational Speech from the Switchboard Corpus 53

    P. Jeanrenaud, E. Eide, U. Chaudhari, J. McDonough, K. Ng, M. Shi, H. Gish, BBN Systems and Technologies (USA)

    Golden Mandarin (III)--A User-Adaptive Prosodic-Segment-Based Mandarin Dictation Machine for Chinese Language with Very Large Vocabulary 57

    Ren-Yuan Lyu, National Taiwan University; Lee-Feng Chien, Academia Sinica; Shiao-Hong Hwang, Hung-Yun Hsieh, Rung-Chiuan Yang, Bo-Ren Bai, Jia-Chi Weng, Yen-Ju Yang, Shi-Wei Lin, National Taiwan University; Keh-Jiann Chen, Chiu-Yu Tseng, Lin-Shan Lee, Academia Sinica (REPUBLIC OF CHINA)

    Complete Recognition of Continuous Mandarin Speech for Chinese Language with Very Large Vocabulary but Limited Training Data 61

    Hsin-min Wang, Jia-lin Shen, Yen-Ju Yang, National Taiwan University; Chiu-yu Tseng, Academia Sinica; Lin-shan Lee, National Taiwan University (REPUBLIC OF CHINA)

    Developments in Continuous Speech Dictation Using the ARPA WSJ Task 65

    J.L. Gauvain, L. Lamel, M. Adda-Decker, LIMSI-CNRS (FRANCE)

    Recent Improvements to the ABBOT Large Vocabulary CSR System 69

    M.M. Hochberg, Cambridge University; S.J. Renals, University of Sheffield; A.J. Robinson, G. D. Cook, Cambridge University (ENGLAND)

    The 1994 HTK Large Vocabulary Speech Recognition System 73

    P. C. Woodland, C. J. Leggetter, J. J. Odell, V. Valtchev, S.J. Young, Cambridge University (UK)

    Tangerine: A Large Vocabulary Mandarin Dictation System 77

    Yuqing Gao, Hsiao-Wuen Hon, Zhiwei Lin, Gareth Loudon, S. Yogananthan, Baosheng Yuan, National University of Singapore (SINGAPORE)

    ASR SYSTEM & CORPORA

    Chair: John Bridle, Diagem (USA)

    WSJCAM0: A British English Speech Corpus for Large Vocabulary Continuous Speech Recognition 81

    Tony Robinson, Jeroen Fransen, David Pye, Jonathan Foote, Steve Renals, Cambridge University (UK)

    Voice Across Hispanic America: A Telephone Speech Corpus of American Spanish 85

    Yeshwant Muthusamy, Edward Holliman, Barbara Wheatley, Texas Instruments; Joseph Picone, Mississippi State University; John Godfrey, University of Pennsylvania (USA)

    Implementation of the POW (Phonetically Optimized Words) Algorithm for Speech Database 89

    Yeonja Lim, Youngjik Lee, ETRI (KOREA)

    Microsoft Windows Highly Intelligent Speech Recognizer: Whisper 93

    Xuedong Huang, Alex Acero, Fil Alleva, Mei-Yuh Hwang, Li Jiang, Milind Mahajan, Microsoft Corporation (USA)

    Concept-Based Speech Translation 97

    L. Mayfield, M. Gavalda, W. Ward, A. Waibel, Carnegie Mellon University (USA)

    PhoneBook: A Phonetically-Rich Isolated-Word Telephone-Speech Database 101

    John F. Pitrelli, Cynthia Fong, Suk H. Wong, Judith R. Spitz, Hong C. Leung, NYNEX Science & Technology, Inc. (USA)

    CTIMIT: A Speech Corpus for the Cellular Environment with Applications to Automatic Speech Recognition 105

    Kathy L. Brown, E. Bryan George, Lockheed Sanders, Inc. (USA)

    Toward Movement-Invariant Automatic Lip-Reading and Speech Recognition 109

    Paul Duchnowski, University of Karlsruhe (GERMANY); Martin Hunke, Carnegie Mellon University (USA); Dietrich Büsching, Uwe Meier, University of Karlsruhe (GERMANY); Alex Waibel, Carnegie Mellon University (USA)

    Some Results with a Trainable Speech Translation and Understanding System 113

    V.M. Jimenez, A. Castellanos, E. Vidal, Universidad Politecnica de Valencia (SPAIN)

    A Continuous Speech Recognition System Using Finite State Network and Viterbi Beam Search for the Automatic Interpretation 117

    Nam-Yong Han, Hoi-Rin Kim, Kyu-Woong Hwang, Young-Mok Ahn, Joon-Hyung Ryoo, ETRI (KOREA)

    ROBUST SPEECH RECOGNITION

    Chair: Richard Stern, Carnegie Mellon University (USA)

    Robust Speech Recognition Based on Stochastic Matching 121

    Ananth Sankar, SRI International; Chin-Hui Lee, AT&T Bell Laboratories (USA)

    On the Robustness of Linear Discriminant Analysis as a Preprocessing Step for Noisy Speech Recognition 125

    Olivier Siohan, CRIN-CNRS & INRIA - Lorraine (FRANCE)

    A Maximum Likelihood Procedure for a Universal Adaptation Method Based on HMM Composition 129

    Yasuhiro Minami, Sadaoki Furui, NTT Human Interface Laboratories (JAPAN)

    A Fast and Flexible Implementation of Parallel Model Combination 133

    M. J. F. Gales, S. J. Young, Cambridge University (U.K.)

    Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition 137

    Pedro J. Moreno, Bhiksha Raj, Evandro Gouvea, Richard M. Stern, Carnegie Mellon University (USA)

    Robust Speech Recognition in Noise Using Adaptation and Mapping Techniques 141

    Leonardo Neumeyer, Mitchel Weintraub, SRI International (USA)

    Noisy Speech Recognition Using Robust Inversion of Hidden Markov Models 145

    Seokyong Moon, Jenq-Neng Hwang, University of Washington (USA)

    Rapid Environment Adaptation for Robust Speech Recognition 149

    Keizaburo Takagi, Hiroaki Hattori, Takao Watanabe, NEC Corporation (JAPAN)

    Noise Estimation Techniques for Robust Speech Recognition 153

    H.G. Hirsch, C. Erlicher, Aachen University of Technology (GERMANY)

    Pole-Filtered Cepstral Mean Subtraction 157

    Devang Naik, Rutgers University (USA)

    LANGUAGE MODELING

    Chair: Roni Rosenfeld, Carnegie Mellon University (USA)

    Language Model Adaptation via Minimum Discrimination Information 161

    P. Srinivasa Rao, Michael D. Monkowski, Salim Roukos, IBM T.J. Watson Research Center (USA)

    Clustering Word Category Based on Binomial posteriori Co-occurrence Distribution 165

    Masafumi Tamoto, Takeshi Kawabata, NTT Basic Research Labs (JAPAN)

    Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams 169

    Sabine Deligne, Frédéric Bimbot, Telecom Paris (FRANCE)

    An Integrated Grammar/Bigram Language Model Using Path Scores 173

    Harvey Lloyd-Thomas, Jerry H. Wright, Ensigma Limited; Gareth J.F. Jones, University of Cambridge (UK)

    Discourse Structure for Multi-Speaker Spontaneous Spoken Dialogs: Incorporating Heuristics into Stochastic RTNs 177

    Sheryl R. Young, Carnegie Mellon University (USA)

    Improved Backing-Off for M-Gram Language Modeling 181

    Reinhard Kneser, Philips GmbH Research Laboratories; Hermann Ney, RWTH Aachen, University of Technology (GERMANY)

    QWI: A Method for Improved Smoothing in Language Modelling 185

    G. Bordel, I. Torres, Universidad del Pais Vasco; E. Vidal, Universidad Politecnica de Valencia (SPAIN)

    Using a Stochastic Context-Free Grammar as a Language Model for Speech Recognition 189

    Daniel Jurafsky, University of California at Berkeley; Chuck Wooters, Department of Defense; Jonathan Segal, University of California at Berkeley; Andreas Stolcke, SRI International; Eric Fosler, University of California at Berkeley; Gary Tajchman, Voice Processing Corporation; Nelson Morgan, University of California at Berkeley (USA)

    Improved Language Modeling by Unsupervised Acquisition of Structure 193

    Klaus Ries, Finn Dag Buø, University of Karlsruhe (GERMANY); Ye-Yi Wang, Carnegie Mellon University (USA); Alex Waibel, University of Karlsruhe (GERMANY)

    Understanding Referring Expressions in a Person-Machine Spoken Dialogue 197

    Claudia Pateras, Gregory Dudek, Renato De Mori, McGill University (CANADA)

    USE OF KNOWLEDGE IN ASR

    Chair: Jim Glass, Massachusetts Institute of Technology, LCS (USA)

    Analysis of Acoustic-Phonetic Variations in Fluent Speech Using TIMIT 201

    Don X. Sun, State University of New York (USA); Li Deng, University of Waterloo (CANADA)

    Analyzing Weaknesses of Language Models for Speech Recognition 205

    Joerg P. Ueberla, DRA Malvern (UK)

    A Hidden Markov Model with Optimized Inter-Frame Dependence 209

    F. J. Smith, J. Ming, P. O'Boyle, A.D. Irvine, The Queen's University (N. IRELAND)

    On the Use of Scalar Quantization for Fast HMM Computation 213

    Shigeki Sagayama, Satoshi Takahashi, NTT Human Interface Laboratories (JAPAN)

    Large-vocabulary Speech Recognition in Specialized Domains 217

    Haakon Chevalier, Chuck Ingold, Carol Kunz, Chip Moore, Crispen Roven, Jon Yamron, Bradley Baker, Paul Bamberg, Sarah Bridle, Tracy Bruce, Amy Weader, Dragon Systems, Inc. (USA)

    Understanding and Improving Speech Recognition Performance through the Use of Diagnostic Tools 221

    Ellen Eide, Herbert Gish, Philippe Jeanrenaud, Angela Mielke, BBN Systems and Technologies (USA)

    Phrase Bigrams for Continuous Speech Recognition 225

    Egidio P. Giachin, CSELT (ITALY)

    Using Explicit Segmentation to Improve HMM Phone Recognition 229

    Carl D. Mitchell, Mary P. Harper, Leah H. Jamieson, Purdue University (USA)

    Viterbi Algorithm for Acoustic Vectors Generated by a Linear Stochastic Differential Equation on Each State 233

    Marco Saerens, Universite Libre de Bruxelles (BELGIUM)

    Non Deterministic Stochastic Language Models for Speech Recognition 237

    G. Riccardi, E. Bocchieri, R. Pieraccini, AT&T Bell Laboratories (USA)

    TOPICS IN SPEECH CODING

    Chair: Peter Kroon, AT&T Bell Laboratories (USA)

    Improving 16 kb/s G.728 LD-CELP Speech Coder for Frame Erasure Channels 241

    Craig R. Watkins, Juin-Hwey Chen, AT&T Bell Laboratories (USA)

    Reconstruction of Missing Packets for CELP-Based Speech Coders 245

    Aamir Husain, Vladimir Cuperman, Simon Fraser University (CANADA)

    A Robust Variable-Rate Speech Coder 249

    A. Shen, B. Tang, A. Alwan, G. Pottie, University of California - Los Angeles (USA)

    Wideband Speech Coding Using Multiple Codebooks and Glottal Pulses 253

    C. McElroy, B.P. Murray, A.D. Fagan, University College - Dublin (IRELAND)

    Speech Coding Using ISI Coded Quantization 257

    Nam Phamdo, SUNY; Cheng-Chieh Lee, University of Maryland; Rajiv Laroia, AT&T Bell Laboratories (USA)

    New Techniques for Multi-prototype Waveform Coding at 2.84 kb/s 261

    I.S. Burnett, G.J. Bradley, University of Wollongong (AUSTRALIA)

    Quantization of Non-Linear Predictors in Speech Coding 265

    Jes Thyssen, Henrik Nielsen, Tele Danmark Research; Steffen Duus Hansen, Technical University of Denmark (DENMARK)

    A Fast Robust Stochastic Algorithm for Vector Quantizer Design for Nonstationary Channels 269

    B. Kövesi, S. Saoudi, J.M. Boucher, ENST/Bretagne (FRANCE); Z. Reguly, Technical University of Budapest (HUNGARY)

    Voice Quality of Interconnected PCS, Japanese Cellular, and Public Switched Telephone Networks 273

    Spiros Dimolitsas, Franklin L. Corcoran, Channasandra Ravishanker, COMSAT Laboratories; Marion Baraniecki, INTELSAT (USA)

    Objective Speech Measure for Chinese in Wireless Environment 277

    K.H. Lam, O.C. Au, C.C. Chan, K.F. Hui, S.F. Lau, Hong Kong University of Science & Technology (HONG KONG)

    WORDSPOTTING, REJECTION, AND TOPIC IDENTIFICATION

    Chair: Jay Wilpon, AT&T Bell Laboratories (USA)

    A Training Procedure for Verifying String Hypotheses in Continuous Speech Recognition 281

    R.C. Rose, B.H. Juang, C.H. Lee, AT&T Bell Laboratories (USA)

    Robust Utterance Verification for Connected Digits Recognition 285

    Mazin G. Rahim, Chin-Hui Lee, Biing-Hwang Juang, AT&T Bell Laboratories (USA)

    A Hybrid Wordspotting Method for Spontaneous Speech Understanding Using Word-Based Pattern Matching and Phoneme-Based HMM 289

    Hiroshi Kanazawa, Mitsuyoshi Tachimori, Yoichi Takebayashi, Toshiba Corporation (JAPAN)

    Acoustic and Language Modeling of Human and Nonhuman Noises for Human-to-Human Spontaneous Speech Recognition 293

    T. Schultz, I. Rogina, University of Karlsruhe (GERMANY) and Carnegie Mellon University (USA)

    LVCSR Log-Likelihood Ratio Scoring for Keyword Spotting 297

    Mitchel Weintraub, SRI International (USA)

    Keyword Spotting Using Supervised/Unsupervised Competitive Learning 301

    Chakib Tadj, Franck Poirier, Telecom Paris (FRANCE)

    A Continuous Density Neural Tree Network Word Spotting System 305

    Stephen V. Kosonocky, IBM T.J. Watson Research Center; Richard J. Mammone, Rutgers University (USA)

    Video Mail Retrieval: The Effect of Word Spotting Accuracy on Precision 309

    G.J.F. Jones, J.T. Foote, K. Sparck Jones, S.J. Young, Cambridge University (UK)

    Improved Topic Spotting through Statistical Modelling of Keyword Dependencies 313

    Jerry H. Wright, Michael J. Carey, Eluned S. Parris, Ensigma Limited (UK)

    Topic Focusing Mechanism for Speech Recognition Based on Probabilistic Grammar and Topic Markov Model 317

    Takeshi Kawabata, NTT Basic Research Labs (JAPAN)

    SPEAKER RECOGNITION

    Chair: S. Parthasarathy, AT&T Bell Laboratories (USA)

    The Influence of Noise on the Speaker Recognition Performance Using the Higher Frequency Band 321

    Shoji Hayakawa, Fumitada Itakura, Nagoya University (JAPAN)

    Measuring Fine Structure in Speech: Application to Speaker Identification 325

    C.R. Jankowski, Jr., T.F. Quatieri, D.A. Reynolds, MIT Lincoln Laboratory (USA)

    The Effects of Telephone Transmission Degradations on Speaker Recognition Performance 329

    D.A. Reynolds, M.A. Zissman, T.F. Quatieri, G.C. O'Leary, B.A. Carlson, MIT Lincoln Laboratory (USA)

    Covariance Estimation Methods for Channel Robust Text-Independent Speaker Identification 333

    Michael Schmidt, Herbert Gish, Angela Mielke, BBN Systems and Technologies (USA)

    Channel and Noise Compensation for Text Dependent Speaker Verification Over Telephone 337

    William Y. Huang, ITT Aerospace Communications; Bhaskar D. Rao, University of California (USA)

    Testing with the Yoho CD-ROM Voice Verification Corpus 341

    Joseph P. Campbell, Jr., U.S. Department of Defense (USA)

    An Orthogonal Polynomial Representation of Speech Signals and Its Probabilistic Model for Text Independent Speaker Verification 345

    Chi-Shi Liu, Ministry of Transportation and Communications; Hsiao-Chuan Wang, National Tsing Hua University (TAIWAN); Frank K. Soong, AT&T Bell Laboratories (USA); Chao-Shih Huang, Ministry of Transportation and Communications (TAIWAN)

    Text-Dependent Speaker Verification Using Data Fusion 349

    Kevin R. Farrell, Dictaphone Corporation (USA)

    Neural Net Approaches to Speaker Verification: Comparison with Second Order Statistic Measures 353

    M. Mehdi Homayounpour, CNRS/URA (FRANCE); Gerard Chollet, IDIAP (SWITZERLAND)

    A Subword Neural Tree Network Approach to Text-Dependent Speaker Verification 357

    Han-Sheng Liou, Richard J. Mammone, Rutgers University (USA)

    RECOGNITION: FEATURE ANALYSIS

    Chair: Shigeki Sagayama, NTT (JAPAN)

    Statistical Modeling of Speech Feature Vector Trajectories Based on a Piecewise Continuous Mean Path 361

    Mark M. Thomson, University of Auckland (NEW ZEALAND)

    Trace-Segmentation of Isolated Utterances for Speech Recognition 365

    Euvaldo F. Cabral, Jr., University of Sao Paulo (BRAZIL); Graham D. Tattersall, University of East Anglia (UK)

    Optimal Linear Feature Transformations for Semi-Continuous Hidden Markov Models 369

    E. Günter Schukat-Talamazzini, Joachim Hornegger, Heinrich Niemann, Universität Erlangen-Nürnberg (GERMANY)

    Use of Generalized Dynamic Feature Parameters for Speech Recognition: Maximum Likelihood and Minimum Classification Error Approaches 373

    C. Rathinavelu, L. Deng, University of Waterloo (CANADA)

    A Statistical Pattern Recognition Approach to Robust Recursive Identification of Non-stationary AR Model of Speech Production System 377

    Milan Z. Markovic, Institute of Applied Math and Electronics; Branko D. Kovacevic, University of Belgrade; Milan M. Milosavljevic, Institute of Applied Math and Electronics (YUGOSLAVIA)

    The NP Speech Activity Detection Algorithm 381

    Joseph Pencak, Douglas Nelson, Department of Defense (USA)

    Improved Speech Modeling and Recognition Using Multi-dimensional Articulatory States as Primitive Speech Units 385

    L. Deng, J. Wu, H. Sameti, University of Waterloo (CANADA)

    Speech Analysis Based on Malvar Wavelet Transform 389

    Christophe Ris, Vincent Fontaine, Henri Leich, Faculte Polytechnique de Mons (BELGIUM)

    Magnitude Spectral Estimation via Poisson Moments with Application to Speech Recognition 393

    Samel Celebi, Jose C. Principe, University of Florida (USA)

    Stochastic Perceptual Models of Speech 397

    Nelson Morgan, University of California at Berkeley (USA); Hervé Bourlard, Faculte Polytechnique (BELGIUM); Steven Greenberg, University of California at Berkeley; Hynek Hermansky, Oregon Graduate Institute; Su-Lin Wu, University of California at Berkeley (USA)

    TOPICS IN NOISE AND RECOGNITION

    Chair: Yariv Ephraim, George Mason University (USA)

    Auditory Scene Analysis and Hidden Markov Model Recognition of Speech in Noise 401

    P.D. Green, M.P. Cooke, M.D. Crawford, University of Sheffield (UK)

    Speech Enhancement Based on Temporal Processing 405

    Hynek Hermansky, Eric A. Wan, Carlos Avendano, Oregon Graduate Institute of Science & Technology (USA)

    A Comparative Study of Mel Cepstra and EIH for Phone Classification under Adverse Conditions 409

    Sumeet Sandhu, Oded Ghitza, AT&T Bell Laboratories (USA)

    Supplementary Orthogonal Cepstral Features 413

    Khaled T. Assaleh, Motorola, GSTG (USA)

    Subband Analysis for Robust Speech Recognition in the Presence of Car Noise 417

    Engin Erzin, Bilkent University; A. Enis Cetin, Koç University; Yasemin Yardimci, Bogazici University (TURKEY)

    Robust Speech Feature Extraction Using SBCOR Analysis 421

    Shoji Kajita, Fumitada Itakura, Nagoya University (JAPAN)

    Methods for Improved Speech Recognition Over Telephone Lines 425

    Alfred Hauenstein, Erwin Marschall, Siemens AG (GERMANY)

    New HOS-Based Parameter Estimation Methods for Speech Recognition in Noisy Environments 429

    Asunción Moreno, Sergio Tortola, Josep Vidal, José A. R. Fonollosa, Universitat Politecnica de Catalunya (SPAIN)

    Noise Compensation for Speech Recognition in Car Noise Environments 433

    Ruikang Yang, Petri Haavisto, Nokia Research Center (FINLAND)

    Speech Recognition in Impulsive Noise 437

    S.V. Vaseghi, B.P. Milner, University of East Anglia (UK)

    RECOGNITION: TRAINING TECHNIQUES

    Chair: Robin Rohlicek, BBN, Inc. (USA)

    Speaker-independent Phone Modeling Based on Speaker-dependent HMMs' Composition and Clustering 441

    Tetsuo Kosaka, Shoichi Matsunaga, ATR Interpreting Telecommunications Research Laboratories; Mikio Kuraoka, Toyohashi University of Technology (JAPAN)

    Using Morphology Towards Better Large-Vocabulary Speech Recognition Systems 445

    P. Geutner, Universität Karlsruhe (GERMANY)

    Optimal Splitting of HMM Gaussian Mixture Components with MMIE Training 449

    Yves Normandin, Centre de Recherche informatique de Montreal (CANADA)

    Dictionary Learning: Performance through Consistency 453

    Tilo Sloboda, Universität Karlsruhe (GERMANY)

    Incremental MAP Estimation of HMMs for Efficient Training and Improved Performance 457

    Yoshihiko Gotoh, Brown University (USA); Michael M. Hochberg, Cambridge University (UK); Daniel J. Mashao, Harvey F. Silverman, Brown University (USA)

    Discrete MMI Probability Models for HMM Speech Recognition 461

    J.T. Foote, Cambridge University (UK)

    Global Discrimination for Neural Predictive Systems Based on N-Best Algorithm 465

    Abdelhamid Mellouk, LRI, UA 410 CNRS; Patrick Gallinari, LAFORIA, UA CNRS 1095 (FRANCE)

    Enhancement of Discriminative Capabilities of HMM Based Recognizer through Modification of Viterbi Algorithm 469

    Jianming Song, The University of Wollongong (AUSTRALIA)

    A Generalization of the Baum Algorithm to Functions on Non-linear Manifolds 473

    D. Kanevsky, IBM T.J. Watson Research Center (USA)

    Data-Driven Codebook Adaptation in Phonetically Tied SCHMMs 477

    Thomas Kemp, Universität Karlsruhe (GERMANY)

    SPEECH CODING BELOW 4 KB/S

    Chair: Thomas E. Tremain, U.S. Department of Defense (USA)

    NATO STANAG 4479: A Standard for an 800 bps Vocoder and Channel Coding in HF-ECCM System 480

    B. Mouy, P. de La Noue, G. Goudezeune, Thomson CSF-RGS (FRANCE)

    Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization 484

    Masayuki Nishiguchi, Jun Matsumoto, Sony Corporation (JAPAN)

    Progress Towards a New Government Standard 2400 bps Voice Coder 488

    M.A. Kohler, L.M. Supplee, T.E. Tremain, U.S. Department of Defense (USA)

    Variable Dimension Spectral Coding of Speech at 2400 bps and Below with Phonetic Classification 492

    Amitava Das, Allen Gersho, University of California - Santa Barbara (USA)

    Spectral Excitation Coding of Speech At 2.4 kb/s 496

    V. Cuperman, P. Lupini, B. Bhattacharya, Simon Fraser University (CANADA)

    A Robust 2400 bps Subband LPC Vocoder 500

    P. A. Laurent, P. de La Noue, Thomson CSF-RGS (FRANCE)

    Band-Widened Harmonic Vocoder at 2 to 4 kbps 504

    Gao Yang, G. Zanellato, Lernout & Hauspie Speech Products; H. Leich, Faculte Polytechnique de Mons (BELGIUM)

    A Speech Coder Based on Decomposition of Characteristic Waveforms 508

    W. Bastiaan Kleijn, Jesper Haagen, AT&T Bell Laboratories (USA)

    Speech Compression Using Pitch Synchronous Interpolation 512

    R. Taori, R.J. Sluijter, E. Kathmann, Philips Research Laboratories (THE NETHERLANDS)

    Pitch-Synchronous Multi-Band (PSMB) Speech Coding 516

    Haiyun Yang, Soo-Ngee Koh, Pratab Sivaprakasapillai, Nanyang Technological University (SINGAPORE)

    RECOGNITION: MODELING STRUCTURES

    Chair: Hsiao-Wuen Hon, Apple ISS Research Centre, University of Singapore (SINGAPORE)

    Four-Level Tied-Structure for Efficient Representation of Acoustic Modeling 520

    Satoshi Takahashi, Shigeki Sagayama, NTT Human Interface Laboratories (JAPAN)

    Application of Clustering Techniques to Mixture Density Modelling for Continuous-Speech Recognition 524

    Christian Dugast, Peter Beyerlein, Reinhold Haeb-Umbach, Philips Research Laboratories (GERMANY)

    Context Dependent Phonetic Duration Models for Decoding Conversational Speech 528

    Michael D. Monkowski, Michael A. Picheny, P. Srinivasa Rao, IBM T.J. Watson Research Center (USA)

    A Unified Way in Incorporating Segmental Feature and Segmental Model into HMM 532

    Jun He, Henri Leich, Faculte Polytechnique de Mons (BELGIUM)

    Experimental Evaluation of Segmental HMMs 536

    Wendy J. Holmes, Martin J. Russell, DRA Malvern (UK)

    Improved Acoustic Modeling for Speech Recognition Using 2D Markov Random Fields 540

    Helmut Lucke, ATR/ITL (JAPAN)

    Structured Markov Models for Speech Recognition 544

    F. Wolfertstetter, G. Ruske, Munich University of Technology (GERMANY)

    Robust Parametric Modeling of Durations in Hidden Markov Models 548

    David Burshtein, Tel-Aviv University (ISRAEL)

    Improved Decision Trees for Phonetic Modeling 552

    Roland Kuhn, Ariane Lazarides, Yves Normandin, Julie Brousseau, CRIM (CANADA)

    High Speed Speech Recognition Using Tree-Structured Probability Density Function 556

    Takao Watanabe, Koichi Shinoda, Keizaburo Takagi, Ken-ichi Iso, NEC Corporation (JAPAN)

    RECOGNITION: SEARCH TECHNIQUES

    Chair: Fil Alleva, Microsoft Corporation (USA)

    A Fast Segmental Viterbi Algorithm for Large Vocabulary Recognition 560

    P. Laface, C. Vair, Politecnico di Torino; L. Fissore, CSELT (ITALY)

    Searching with a Transcription Graph 564

    Z. Li, P. Kenny, D. O'Shaughnessy, Universite du Quebec (CANADA)

    On the Use of Stochastic Inference Networks for Representing Multiple Word Pronunciations 568

    Renato De Mori, Charles Snow, Michael Galler, McGill University School of Computer Science (CANADA)

    A Tree Search Strategy for Large-Vocabulary Continuous Speech Recognition 572

    P.S. Gopalakrishnan, L.R. Bahl, IBM; R.L. Mercer, Renaissance Technologies (USA)

    Lattice-Based Search Strategies for Large Vocabulary Speech Recognition 576

    F. Richardson, M. Ostendorf, Boston University; J.R. Rohlicek, Bolt Beranek & Newman, Inc. (USA)

    On Using a priori Segmentation of the Speech Signal in an N-Best Solutions Post-processing 580

    T. Moudenc, D. Jouvet, J. Monné, France Telecom (FRANCE)

    Time-Synchronous Continuous Speech Recognizer Driven by a Context-Free Grammar 584

    Tohru Shimizu, ATR/ITL; Seikou Monzen, Yamagata University; Harald Singer, Shoichi Matsunaga, ATR/ITL (JAPAN)

    Language Model Representations for Beam-Search Decoding 588

    Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico, I.R.S.T. (ITALY)

    A Lower-Complexity Viterbi Algorithm 592

    Sarvar Patel, Bellcore (USA)

    Efficient Search Using Posterior Phone Probability Estimates 596

    Steve Renals, University of Sheffield; Mike Hochberg, University of Cambridge (UK)

    PROSODY FOR SYNTHESIS & RECOGNITION

    Chair: Yoshinori Sagisaka, AT&T Bell Laboratories (USA)

    Timing Patterns in Fluent and Disfluent Spontaneous Speech 600

    Douglas O'Shaughnessy, Universite du Quebec (CANADA)

    Stochastic Modeling of Pause Insertion Using Context-Free Grammar 604

    Shigeru Fujio, Yoshinori Sagisaka, Norio Higuchi, ATR-ITL (JAPAN)

    Automatic Classification of Pitch Movements via MLP-Based Estimation of Class Probabilities 608

    Louis F. M. ten Bosch, Institute for Perception Research (THE NETHERLANDS)

    On the Effects of Speech Rate in Large Vocabulary Speech Recognition Systems 612

    Matthew A. Siegler, Richard M. Stern, Carnegie Mellon University (USA)

    A Prosodic Model of Mandarin Speech and Its Application to Pitch Level Generation for Text-to-Speech 616

    Shaw-Hwa Hwang, Sin-Horng Chen, National Chiao Tung University (REPUBLIC OF CHINA)

    Prosodic Cues to Word Usage 620

    Karen Ward, David G. Novick, Oregon Graduate Institute of Science & Technology (USA)

    Automatic Prosodic Segmentation by F0 Clustering Using Superpositional Modeling 624

    Mitsuru Nakai, Tohoku University; Harald Singer, Yoshinori Sagisaka, ATR-ITL; Hiroshi Shimodaira, JAIST (JAPAN)

    Duration Modeling in Large Vocabulary Speech Recognition 628

    Anastasios Anastasakos, Northeastern University; Richard Schwartz, Han Shu, BBN Systems and Technologies (USA)

    Speaker-Independent Automatic Classification of Thai Tones in Connected Speech by Analysis-Synthesis Method 632

    Siripong Potisuk, Mary P. Harper, Jackson T. Gandour, Purdue University (USA)

    SPEECH SYNTHESIS & PRODUCTION

    Chair: Kathleen Cummings, Georgia Institute of Technology (USA)

    Speech Synthesis System Based on a Variable

    Decimation/Interpolation Factor 636F. M. Gimenezde los Galanes, Univerisity Politecnicade Madrid; M. H. Savoji, Universidad de Cantabria;J. M. Pardo, Univerisity Politecnica de Madrid (SPAIN)

    Automatic Speech Synthesiser Parameter Estimation

    Using HMMs 640R.E. Donovan, P.C. Woodland, Cambridge University(UK)

    Speaker Modification with LPC Pole Analysis 644Janet Slifka, Systems Research Laboratories, TimothyR. Anderson, Armstrong Laboratory (USA)

    Synthesizing Styled Speech Using the Klatt

    Synthesizer 648Janet C. Rutledge, Northwestern University; KathleenE. Cummings, Daniel A. Lambert, Mark A. Clements,

    GeorgiaInstitute ofTechnology (USA)

    Acoustical Measurements of the Vocal-Tract Area

    Function: Sensitivity Analysis and Experiments 652

    Hani Yehia, Nagoya University; Masaaki Honda, NTT

    Basic Research Laboratories; Fumitada Itakura, Nagoya

    University (JAPAN)

    Shape-Invariant Pitch-Synchronous Text-to-

    Speech Conversion 656

    Eduardo R. Banga, Carmen Garcia-Mateo, Universidad

    deVigo, (SPAIN)

    Speech Parameter Generation from HMM Using

    Dynamic FeaturesKeiichi Tokuda, Takao Kobayashi, Satoshi Imai,

    Tokyo Institute ofTechnology (JAPAN)

    660

    A Source Generator Based Modeling Framework for

    Synthesis of Speech Under Stress 664

    Sahar E. Bou-Ghazale, John H. L. Hansen, Duke

    University (USA)

    MBE Synthesis of Speech Coded in LPC Format 668

    K.F. Lam, C.F. Chan, City Polytechnic of Hong Kong (HONG KONG)

    Modeling Speech Production Using Yee's Finite Difference Method 672

    Kathleen E. Cummings, Georgia Institute of Technology; James G. Maloney, Georgia Technical Research Institute; Mark A. Clements, Georgia Institute of Technology (USA)

  • SPEAKER ADAPTATION

    Chair: C.H. Lee, AT&T Bell Laboratories (USA)

    Batch, Incremental and Instantaneous Adaptation Techniques for Speech Recognition 676

    G. Zavaliagkos, Northeastern University; R. Schwartz, J. Makhoul, BBN Systems and Technologies (USA)

    Speaker Adaptation Using Combined Transformation and Bayesian Methods 680

    Vassilios Digalakis, Leonardo Neumeyer, SRI International (USA)

    Rapid Speaker Adaptation Using Model Prediction 684

    S. M. Ahadi, P. C. Woodland, Cambridge University (UK)

    Speaker Adaptation Based on Transfer Vector Field Smoothing Using Maximum a posteriori Probability Estimation 688

    Masahiro Tonomura, Tetsuo Kosaka, Shoichi Matsunaga, ATR Interpreting Telecommunications Research Labs (JAPAN)

    Experiments Using Data Augmentation for Speaker Adaptation 692

    Jerome R. Bellegarda, Apple Computer Inc.; Peter V. de Souza, David Nahamoo, Mukund Padmanabhan, Michael A. Picheny, Lalit R. Bahl, IBM (USA)

    Vector-Field-Smoothed Bayesian Learning for Incremental Speaker Adaptation 696

    Jun-ichi Takahashi, Shigeki Sagayama, NTT Human Interface Laboratories (JAPAN)

    A Speaker Adaptation Technique Using Linear Regression 700

    S.J. Cox, University of East Anglia (UK)

    Speaker Adaptation Based on Spectral Normalization and Dynamic HMM Parameter Adaptation 704

    Ming-Whei Feng, GTE Laboratories, Inc. (USA)

    On-line Bayes Adaptation of SCHMM Parameters for Speech Recognition 708

    Qiang Huo, Chorkin Chan, University of Hong Kong (HONG KONG)

    Iterative Self-Learning Speaker and Channel Adaptation under Various Initial Conditions 712

    Yunxin Zhao, University of Illinois at Urbana-Champaign (USA)

    SPECTRAL QUANTIZATION

    Chair: Costas Xydeas, University of Manchester (UK)

    Fast and Low-Complexity LSF Quantization Using Algebraic Vector Quantizer 716

    Minjie Xie, Jean-Pierre Adoul, University of Sherbrooke (CANADA)

    Low Cost Vector Quantization Methods for Spectral Coding in Low Rate Speech Coders 720

    H.R. Sadegh Mohammadi, W.H. Holmes, University of New South Wales (AUSTRALIA)

    Matrix Product Quantization for Very-Low-Rate Speech Coding 724

    Stefan Bruhn, Technical University of Berlin (GERMANY)

    An Intrinsically Reliable and Fast Algorithm to Compute the Line Spectrum Pairs (LSP) in Low Bit Rate CELP Coding 728

    A. Goalic, S. Saoudi, ENST-Bretagne (FRANCE)

    Spectral Dynamics Is More Important Than Spectral Distortion 732

    H. Petter Knagenhjelm, W. Bastiaan Kleijn, AT&T Bell Laboratories (USA)

    Efficient Quantization of LSF Parameters Using Classified SVQ Combined with Conditional Splitting 736

    Dong-il Chang, Young-kwon Cho, Souguil Ann, Seoul National University (KOREA)

    Efficient Coding of LSP Parameters Using Split Matrix Quantisation 740

    C.S. Xydeas, C. Papanastasiou, University of Manchester (UK)

    How Good Is Your p? Observations on VQ Training Ratios 744

    John S. Collura, Thomas E. Tremain, U.S. Department of Defense (USA)

    Variable Rate Spectral Quantization for Phonetically Classified CELP Coding 748

    Roar Hagen, Chalmers University of Technology; Erdal Paksoy, Allen Gersho, University of California (USA)

    Optimal Distortion Measures for the High Rate Vector Quantization of LPC Parameters 752

    William R. Gardner, University of California-San Diego; Bhaskar D. Rao, Qualcomm, Inc. (USA)

  • SPEECH ANALYSIS

    Chair: Paul Mermelstein, INRS-Telecom (FRANCE)

    Harmonics Tracking and Pitch Extraction Based on Instantaneous Frequency 756

    Toshihiko Abe, Takao Kobayashi, Satoshi Imai, Tokyo Institute of Technology (JAPAN)

    Decomposition of Speech Signals into Deterministic and Stochastic Components 760

    C. d'Alessandro, LIMSI-CNRS (FRANCE); B. Yegnanarayana, Indian Institute of Technology (INDIA); V. Darsinos, University of Patras (GREECE)

    Modeling and Processing Speech with Sums of AM-FM Formant Models 764

    Shan Lu, Peter C. Doerschuk, Purdue University (USA)

    On the Statistical Properties of Line Spectrum Pairs 768

    J.S. Erkelens, P.M.T. Broersen, Delft University of Technology (THE NETHERLANDS)

    Individual Variations in Glottal Characteristics of Female Speakers 772

    Helen M. Hanson, Harvard University (USA)

    A Robust Method for Determining Instants of Major Excitations in Voiced Speech 776

    B. Yegnanarayana, Indian Institute of Technology (INDIA); R.L.H.M. Smits, Institute for Perception Research (THE NETHERLANDS)

    Interpolation of LPC Spectra via Pole Shifting 780

    Vladimir Goncharoff, Maureen Kaine-Krolak, University of Illinois at Chicago (USA)

    Speech Formant Frequency and Bandwidth Tracking Using Multiband Energy Demodulation 784

    Alexandros Potamianos, Petros Maragos, Georgia Institute of Technology (USA)

    Nonlinear Prediction for Speech Coding Using Radial Basis Functions 788

    Fernando Diaz-de-Maria, Universidad de Cantabria; Aníbal R. Figueiras-Vidal, Universidad Politecnica de Madrid (SPAIN)

    Recognition of Unvoiced Stops from Their Time-Frequency Representation 792

    Maria Rangoussi, Anastasios Delopoulos, National Technical University of Athens (GREECE)

    SPEECH ENHANCEMENT & NOISE REDUCTION

    Chair: John H.L. Hansen, Duke University (USA)

    Speech Enhancement Based on Masking Properties of the Auditory System 796

    Nathalie Virag, Swiss Federal Institute of Technology (SWITZERLAND)

    Optimizing Speech Enhancement by Exploiting Masking Properties of the Human Ear 800

    A. Akbari Azirani, R. Le Bouquin Jeannes, G. Faucon, Universite de Rennes I (FRANCE)

    A Spectrally-Based Signal Subspace Approach for Speech Enhancement 804

    Yariv Ephraim, Harry L. Van Trees, George Mason University (USA)

    Real-Time Implementation of HMM-Based MMSE Algorithm for Speech Enhancement in Hearing Aid Applications 808

    H. Sheikhzadeh, University of Waterloo; R.L. Brennan, Unitron Industries, Ltd.; H. Sameti, University of Waterloo (CANADA)

    New Methods for Adaptive Noise Suppression 812

    Levent Arslan, Alan McCree, Vishu Viswanathan, Texas Instruments (USA)

    Single-Sensor Speech Enhancement Using a Soft-Decision/Variable Attenuation Algorithm 816

    E. Bryan George, Lockheed Sanders, Inc. (USA)

    Speech Enhancement Using a Ternary-Decision Based Filter 820

    T.S. Sun, S. Nandkumar, J. Carmody, J. Rothweiler, A. Goldschen, N. Russell, S. Mpasi, P. Green, Martin Marietta Laboratories (USA)

    Signal Modeling Enhancements for Automatic Speech Recognition 824

    Zaki B. Nossair, Peter L. Silsbee, Stephen A. Zahorian, Old Dominion University (USA)

    Co-Channel Speaker Separation 828

    David P. Morgan, E. Bryan George, Texas Instruments; Leonard T. Lee, Lockheed Sanders, Inc.; Stephen M. Kay, University of Rhode Island (USA)

    Speech Enhancement Based on the Generalized Dual Excitation Model with Adaptive Analysis Window 832

    Chang D. Yoo, Jae S. Lim, Massachusetts Institute of Technology (USA)

  • SPECIAL TOPICS IN SPEECH RECOGNITION

    Chair: K. Paliwal, Griffith University

    Foreign Accent Classification Using Source Generator Based Prosodic Features 836

    John H. L. Hansen, Levent M. Arslan, Duke University (USA)

    Automatic Transcription of Unknown Words in a Speech Recognition System 840

    R. Haeb-Umbach, P. Beyerlein, E. Thelen, Philips Research Laboratories-Aachen (GERMANY)

    An Evaluation of an Adaptive Multichannel System for Speech Enhancement with Automatic Phase Alignment 844

    Silvana L. do N. Cunha Costa, Benedito G. Aguiar Neto, Universidade Federal da Paraiba (BRAZIL)

    Knowing Who to Listen to in Speech Recognition: Visually Guided Beamforming 848

    Udo Bub, Martin Hunke, Alex Waibel, Carnegie Mellon University (USA)

    An N-Best Strategy, Dynamic Grammars and Selectively Trained Neural Networks for Real-Time Recognition of Continuously Spelled Names Over the Telephone 852

    Jean-Claude Junqua, Stephane Valente, Speech Technology Laboratory (USA); Dominique Fohr, Jean-Francois Mari, CRIN/INRIA (FRANCE)

    Language Models for a Spelled Letter Recognizer 856

    Martin Betz, Hermann Hild, Universitat Karlsruhe (GERMANY)

    Hands Free Continuous Speech Recognition in Noisy Environment Using a Four Microphone Array 860

    D. Giuliani, M. Matassoni, M. Omologo, P. Svaizer, IRST (ITALY)

    A New Method for Automatic Generation of Speaker-Dependent Phonological Rules 864

    Toru Imai, Akio Ando, Eiichi Miyasaka, NHK Science & Technology Research Labs (JAPAN)

    Enhancing Automatic Speech Recognition with an Ultrasonic Lip Motion Detector 868

    David L. Jennings, Dennis W. Ruck, AFIT/ENG (USA)

    Classification and Clustering of Stop Consonants via Nonparametric Transformations and Wavelets 872

    Basilis Gidas, Brown University; Alejandro Murua, University of Chicago (USA)