9
Introducing Speech and Language Processing This major new textbook provides a clearly written, concise and accessible introduction to speech and language processing. Assuming knowledge of only the very basics of linguistics and written specifically for students with no technical background, it is the perfect starting point for anyone beginning to study the discipline. Students are introduced to topics such as digital signal processing, speech analysis and synthesis, finite-state machines, automatic speech recognition, parsing and probabilistic grammars, and are shown from a very elementary level how to work with two programming languages, C and Prolog. The accompanying CD-ROM contains all the software described in the book, along with a C compiler, Prolog interpreter and sound file editor, thus providing a self-contained, one-stop resource for the learner. Setting a firm grounding in speech and language processing and an invaluable foundation for further study, Introducing Speech and Language Processing is set to become the leading introduction to the field. JOHN COLEMAN is Director of the Phonetics Laboratory and Reader in Phonetics at Oxford University, having previously worked on speech technology at the universities of Hull and York, and AT&T Bell Laboratories. His previous books are Phonological Representations (Cambridge University Press, 1998) and Acoustics of American English Speech (Springer-Verlag, 1993). www.cambridge.org © Cambridge University Press Cambridge University Press 0521530695 - Introducing Speech and Language Processing John Coleman Frontmatter More information

Introducing Speech and Language Processing

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Introducing Speech and Language ProcessingThis major new textbook provides a clearly written, concise andaccessible introduction to speech and language processing. Assumingknowledge of only the very basics of linguistics and written specificallyfor students with no technical background, it is the perfect starting pointfor anyone beginning to study the discipline. Students are introduced totopics such as digital signal processing, speech analysis and synthesis,finite-state machines, automatic speech recognition, parsing andprobabilistic grammars, and are shown from a very elementary level howto work with two programming languages, C and Prolog. Theaccompanying CD-ROM contains all the software described in the book,along with a C compiler, Prolog interpreter and sound file editor, thusproviding a self-contained, one-stop resource for the learner. Setting afirm grounding in speech and language processing and an invaluablefoundation for further study, Introducing Speech and Language Processing isset to become the leading introduction to the field.

JOHN COLEMAN is Director of the Phonetics Laboratory and Reader inPhonetics at Oxford University, having previously worked on speechtechnology at the universities of Hull and York, and AT&T BellLaboratories. His previous books are Phonological Representations(Cambridge University Press, 1998) and Acoustics of American English Speech(Springer-Verlag, 1993).

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information

Cambridge Introductions to Language and LinguisticsThis new textbook series provides students and their teachers with accessible introductions to the majorsubjects encountered within the study of language and linguistics. Assuming no prior knowledge of thesubject, each book is written and designed for ease of use in the classroom or seminar, and is ideal foradoption on a modular course as the core recommended textbook. Each book offers the ideal introductorymaterial for each subject, presenting students with an overview of the main topics encountered in theircourse, and features a glossary of useful terms, chapter previews and summaries, suggestions for furtherreading, and helpful exercises. Each book is accompanied by a supporting website.

Books published in the seriesIntroducing Phonology David OddenIntroducing Speech and Language Processing John Coleman

Forthcoming:Introducing Phonetic Science Michael Ashby and John MaidmentIntroducing Sociolinguistics Miriam Meyerhoff Introducing Morphology Maggie Tallerman and S. J. Hannahs Introducing Historical Linguistics Brian Joseph Introducing Second Language Acquisition Muriel Saville-TroikeIntroducing Language Bert Vaux

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information

Introducing Speechand LanguageProcessing

JOHN COLEMANUniversity of OxfordPhonetics Laboratory

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information

published by the press syndicate of the university of cambridgeThe Pitt Building, Trumpington Street, Cambridge, United Kingdom

cambridge university pressThe Edinburgh Building, Cambridge, CB2 2RU, UK40 West 20th Street, New York, NY 100114211, USA477 Williamstown Road, Port Melbourne, VIC 3207, AustraliaRuiz de Alarcón 13, 28014 Madrid, SpainDock House, The Waterfront, Cape Town 8001, South Africa

http://www.cambridge.org

© John Coleman 2005

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2005

Printed in the United Kingdom at the University Press, Cambridge

Typeface Swift 9/12 pt System Quark ExpressTM [TB]

A catalogue record for this book is available from the British Library

Library of Congress Cataloguing in Publication data

Coleman, John (John S.)Introducing speech and language processing/ John Coleman.

p. cm. — (Cambridge introductions to language and linguistics)Includes bibliographical references and index.ISBN 0 521 82365 X – ISBN 0 521 53069 5 (pbk.)1. Speech processing systems. I. Title. II. Series.TK7882.S65C618 2004006.4�54–dc22 2004048599

ISBN 0 521 82365 X hardbackISBN 0 521 53069 5 paperback

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information

Acknowledgements and copyright notices xCD-Rom and companion website x

1 Introduction 11.1 About this book 21.2 Purpose of this book 21.3 Some reasons to use this book 31.4 What’s in the book (and what’s not) 51.5 Computational set-up needed for this book 81.6 Computational skills that are necessary in order

to use the book 91.7 Free software suggestions 101.8 Book structure 10

2 Sounds and numbers 132.1 Preparatory assignments 142.2 Solutions 212.3 Sampling 242.4 Quantization 242.5 The sampling theorem 272.6 Generating a signal 292.7 Numeric data types 312.8 The program 342.9 Structure of a loop 352.10 Structure of an array 372.11 Calculating the cosine values 382.12 Structure of the program 392.13 Writing the signal to a file 40

Chapter summary 43Further Exercises 43Further reading 46

3 Digital filters and resonators 473.1 Operations on sequences of numbers 483.2 A program for calculating RMS amplitude 483.3 Filtering 503.4 A program for calculating running

means of 4 523.5 Smoothing over a longer time-window 543.6 Avoiding the need for long windows 543.7 IIR filters in C 61

Contents

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information

viii CONTENTS

3.8 Structure of the Klatt formant synthesizer 62Chapter summary 68Exercises 68Further reading 69

4 Frequency analysis and linear predictive coding 714.1 Spectral analysis 724.2 Spectral analysis in C 724.3 Cepstral analysis 794.4 Computation of the cepstrum in C 804.5 Pitch tracking using cepstral analysis 834.6 Voicing detection 864.7 f0 estimation by the autocorrelation method 904.8 Linear predictive coding 954.9 C programs for LPC analysis and resynthesis 1004.10 Trying it out 1064.11 Applications of LPC 106

Chapter Summary 109Further exercises 109Further reading 110

5 Finite-state machines 1115.1 Some simple examples 1125.2 A more serious example 1135.3 Deterministic and non-deterministic automata 1165.4 Implementation in Prolog 1185.5 Prolog’s processing strategy and the treatment

of variables 1295.6 Generating strings 1325.7 Three possibly useful applications of that idea 1345.8 Another approach to describing finite-state machines 1355.9 Self-loops 1375.10 Finite-state transducers (FSTs) 1395.11 Using finite-state transducers to relate speech

to phonemes 1445.12 Finite-state phonology 1495.13 Finite-state syntactic processing 153

Chapter summary 156Further exercises 156Further reading 156

6 Introduction to speech recognition techniques 1576.1 Architectures for speech recognition 1586.2 The pattern-recognition approach 1666.3 Dynamic time warping 1686.4 Applications 1776.5 Sources of variability in speech 181

Chapter summary 182Further reading 183

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information

Contents ix

7 Probabilistic finite-state models 1857.1 Introduction 1867.2 Indeterminacy: n-gram models for part-of-speech

tagging 1877.3 Some probability theory for language modelling 1907.4 Markov models 1927.5 Trigram models 1987.6 Incompleteness of the training corpus 2027.7 Part-of-speech model calculations 2097.8 Using HMMs for speech recognition 2107.9 Chomsky’s objections to Markov models

and some rejoinders 213Chapter summary 219Further reading 219

8 Parsing 2218.1 Introduction 2228.2 A demo 2228.3 ‘Intuitive’ parsing 2238.4 Recursive descent parsing 2258.5 The simplest parsing program 2328.6 Difference lists 2338.7 Generating a parse tree 2368.8 Syllabification 2388.9 Other parsing algorithms 2428.10 Chart parsing 2428.11 Depth-first vs. breadth-first search 2458.12 Deterministic parsing, Marcus parsing and minimal

commitment parsing 2468.13 Parallel parsing 249

Chapter summary 249Further reading 250

9 Using probabilistic grammars 2519.1 Motivations 2529.2 Probabilistic context-free grammars 2569.3 Estimation of rule probabilities 2589.4 A practical example 2619.5 A limitation of probabilistic context-free grammars 2679.6 Tree adjoining grammars 2689.7 Data-oriented parsing 271

Chapter Summary 272Conclusion and suggestions for further reading 272

Appendix: The American Standard Code for Information Interchange (ASCII) 275Glossary 277References 293Index 299

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information

The production of a textbook such as this isvery much a team effort involving many peo-ple besides the author. I reserve my greatestthanks for the students who have followed myintroductory courses on speech and languageprocessing during the last ten years. Theirinterest in and enthusiasm for the subject sus-tained my efforts to put these lectures intopublishable form, and their critical commentsand questions led to many improvements. Iwould also like to thank my secretary, CeliaGlyn, who patiently transcribed an early set ofrecordings of the classes, thus laying down thefoundation of the text. Equally, my ability todevelop these course materials, and even writethe textbook, is due to the technical supportof Andrew Slater and other computing sup-port staff in my department. In particular, theDTW program in chapter 6 is abbreviated fromcode developed by Andrew, in relation toresearch we have conducted together (Slaterand Coleman 1996).

From Cambridge University Press, I wasencouraged first by Christine Bartels, then byAndrew Winnard and Juliet Davis-Berry to geton and finish it! The reviewers that Cambridgeengaged to comment in detail on the manu-script deserve praise as well as thanks for theirthorough critical commentaries. Their sugges-tions and complaints were most valuable. Inaddition, Robin Coleman provided valuablehelp in compiling the index.

Much of the material on the CD-ROM is copy-right and is provided for use under licence only

by the owner of the book. Some of the materialwas written by others, and is only included herewith their permission under terms stated in therelevant files on the disk. For classroom use, mul-tiple copies of the software may be made withpermission from the Copyright Clearance Centre.Otherwise, copying and distribution of any of thematerial on the CD-ROM is prohibited.

For the non-copyright intellectual propertyof all the methods of speech and language pro-cessing that are explained in the followingpages, I trust that the citations of prior workwill be acceptable indicators of my indebted-ness to earlier workers in the field, with respectto whom we are all students.

CD-ROM and companion website

The CD-ROM included with this book containsall the software used in the course, including:

• Two programming languages (C and Prolog)

• A sound file editor and a text file editor

• All of the program listings printed in thetext, and a few extras

• Licence details

Neither Cambridge University Press nor JohnColeman make any warranties, express orimplied, that the programs contained in thisbook and on the CD-ROM and companion websiteare free of error, or are consistent with any par-ticular standard of merchantability, or that theywill meet your requirements for any particularapplication. They should not be relied on for solv-ing a problem whose incorrect solution could

Acknowledgements andcopyright notices

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information

result in injury to a person or loss of property. Ifyou do use the programs in such a manner, it isat your own risk. The authors and publisher dis-claim all liability for direct or consequential dam-ages resulting from your use of the program.

I may improve the software in future, as mayother developers of the software included on thedisk. Consequently, revised versions of the soft-ware, or links to revised or alternative resources,may be made available through the companionwebsite, www.islp.org.uk. Information about

bugs in the software that may be found afterpublication will also be placed there, so if any ofthe programs appear to be defective, pleasecheck there for a possible upgrade beforeattempting to contact me. (You can send email tome by visiting the companion website.)

Because material on the internet changes sofrequently, there are no URLs (i.e. addresses ofweb pages) in the text, except for www.islp.org.uk. Instead, I have provided some links touseful websites at www.islp.org.uk.

Acknowledgements and copyright notices xi

www.cambridge.org© Cambridge University Press

Cambridge University Press0521530695 - Introducing Speech and Language ProcessingJohn ColemanFrontmatterMore information