Page 1: 2009 Almost-Spring Short Course on Speech Recognition Instructors:  Bhiksha Raj and Rita Singh


2009 Almost-Spring Short Course on Speech Recognition

Instructors: Bhiksha Raj and Rita Singh

Welcome


What will the course be about

• We will cover most relevant topics of speech recognition

• The focus will be on the theory and practice
  – We will not discuss code for the most part
  – We will keep maths out of it as far as possible, however

• We will discuss algorithms and implementation details


Instructors

• Bhiksha Raj: Carnegie Mellon University
  – Expert in speech recognition

• Rita Singh: Carnegie Mellon University
  – Expert in speech recognition

• Peter Wolf: Independent Consultant
  – Previously at Dragon Systems Inc.
  – Sphinx4 expert, expert in speech recognition application development
  – Brought in primarily as a resource for helping with Sphinx4 and answering application-related questions


Format of Course

• 3 lectures daily
  – Morning: 8:00 AM, 1.0 – 1.5 hours
  – Late Morning / Early Afternoon: 11:00 AM
  – Afternoon: 2:30 PM

• The schedule is flexible – timings may vary depending on how much is covered

• Lectures are expected to last 1.0 – 1.5 hours each

• Intervening times expected to be taken up by exercises


Instruction Format

• Lectures will be pictorially oriented

• Although we will cover general topics, the specific implementations described will be based on CMU Sphinx
  – Most other systems are similar

• Exercises will be based on Sphinx


Lecture Outline: Day 1

• Lecture 1: “Speech recognition for dummies”
  – A quick development of speech recognition as string matching

• Lecture 2: “Feature computation”
  – Explaining how features are computed for speech recognition, including all signal processing

• Lecture 3: “Hidden Markov Models”
  – Describing HMMs and all associated problems


Lecture Outline: Day 2

• Lecture 1: “Training From Continuous Speech”
  – How to train models from continuous speech
  – Phonemes, why we need them and how to train them

• Lecture 2: “Context dependent phonemes”
  – What are context dependent phonemes
  – Various types of context dependent phonemes
  – Training CD phonemes

• Lecture 3: “Decision Trees and State Tying”
  – All about decision trees for parameter sharing in ASR systems


Lecture Outline: Day 3

• Lecture 1: “Training context-dependent models with tied states”
  – A (relatively) short lecture explaining the final overall process for training models

• Lecture 2: “Language Modelling”
  – How to model “language” for speech recognition
  – Statistical language modelling

• Lecture 3: “Decoding: Basics”
  – Describing the basic ideas behind the decoding strategies for continuous speech


Lecture Outline: Day 4

• Lecture 1: “Decoding: Advanced”
  – Explaining various more advanced approaches to decoding
    • Arriving at the state of the art

• Lecture 2: “Advanced Topics”
  – Adaptation, Normalization, Discriminative Training etc.

• Session 3: Open
  – Any spillover
  – Question answering


Exercises: Day 1

• There will be exercises following most lectures

• Lecture 1: None

• Lecture 2: Exercise on capture and feature computation from speech signals

• Lecture 3: None


Exercises: Day 2

• Lecture 1: “Training From Continuous Speech”
  – Exercise on training phoneme models and recognizing with them

• Lecture 2: “Context dependent phonemes”
  – Exercise on training models for context-dependent phonemes and recognizing with them

• Lecture 3: “Decision Trees and State Tying”
  – Exercise on learning decision trees


Exercises: Day 3

• Lecture 1: “Training context-dependent models with tied states”
  – Exercise on complete training of the ASR system

• Lecture 2: “Language Modelling”
  – Exercises on building JSGF grammars and N-gram LMs for speech recognition

• Lecture 3: “Decoding: Basics”


Exercises: Day 4

• Lecture 1: “Decoding: Advanced”
  – Decoding with various speech recognition system variants:
    • Sphinx3 flat, Sphinx3 tree, Sphinx4

• Lecture 2: “Advanced Topics”
  – No exercises

• Session 3: Open
  – No exercises


Software to Install

• We will be using CMU Sphinx extensively
  – SphinxTrain
  – Sphinx3 decoder
  – Sphinx4 decoder
  – CMU LM Toolkit or SRI LM Toolkit

• We will need additional software to go with it
  – Java, ant, groovy for S4


Sphinx Downloads: http://cmusphinx.sourceforge.net


• Sphinxbase:
  – Click on the “sphinxbase” link on the left
  – Click “all releases”
  – Download version 0.4.1
    • http://downloads.sourceforge.net/cmusphinx/sphinxbase-0.4.1.tar.bz2?use_mirror=superb-east

• Sphinx3:
  – Click on the “sphinx3” link on the left
  – Click on “all releases”
  – Download version 3-0.8
    • http://downloads.sourceforge.net/cmusphinx/sphinx3-0.8.zip?use_mirror=internap


• Cepview:
  – Click on the “cepview” link on the left

• lm3g2dmp:
  – Click on the “lm3g2dmp” link on the left

• The above two are visualization / data-structure optimization tools and are not critical
  – But they are small, so you might as well download them

• CMU LM toolkit: You may install the SRI LM toolkit instead
  – Better maintained; the CMU toolkit is not currently maintained


• Sphinx4:
  – For this workshop, download a copy of Sphinx4 that is under development at github.com
  – http://github.com/juanzanos/sphinx4/tree/master
    • Click on the download link
  – Caveat: some scripts may not run; if so, we will revert to the release version

• Sphinx4 will also need
  – Java JDK 1.6, from http://javasoft.com
  – Apache ant, from http://ant.apache.org
  – A useful scripting tool (some of our latest scripts are in it): Groovy
  – Groovy can be had from http://groovy.codehaus.org

• Bookmark this link:
  – http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html


Operating Systems

• Sphinxbase and Sphinx3 packages have been tried and tested on Linux
  – We are not Windows people

• Suggestion: Prefer Linux-based machines
  – You may also try to run these programs on Cygwin under Windows
    • Sphinx* should compile under Cygwin
    • Install “tcsh” under Cygwin
    • We will provide tcsh scripts

• Sphinx4 is platform independent
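
Before the first exercise it may help to confirm that the supporting tools named in these slides (Java, ant, Groovy, tcsh) can be found on your PATH. The following is only a minimal sketch in Python, not part of the course material; the command names in REQUIRED_TOOLS are assumptions about how each tool is usually exposed, and the helper names are hypothetical.

```python
import shutil

# Assumed command names for the tools the slides mention; adjust for your setup.
REQUIRED_TOOLS = ["java", "javac", "ant", "groovy", "tcsh"]


def find_tools(names, which=shutil.which):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(found):
    """Return the sorted names of tools that were not found."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools(REQUIRED_TOOLS)
    for name, path in found.items():
        print(f"{name:8s} {path or 'NOT FOUND'}")
    missing = missing_tools(found)
    if missing:
        print("Missing:", ", ".join(missing))
```

This only checks that each command resolves on PATH; it does not verify versions (for example, that the JDK is 1.6) or that the Sphinx packages themselves are built.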


Additional Packages

• Would be useful to have a visualization tool
  – Need to visualize matrices as surfaces

• Matlab would be great

• If you don’t have Matlab, download Octave
  – http://www.gnu.org/software/octave/


Data

• You may use any data you wish to

• For the exercises we will attempt to provide a small amount of data
  – As much as can be dealt with on your computers


Questions

• ?

