2009 Almost-Spring Short Course on Speech Recognition

Instructors: Bhiksha Raj and Rita Singh

Welcome

What will the course be about

• We will cover most of the relevant topics in speech recognition

• The focus will be on theory and practice
  – We will not discuss code for the most part
  – We will keep maths out of it as far as possible

• However, we will discuss algorithms and implementation details

Instructors

• Bhiksha Raj: Carnegie Mellon University
  – Expert in speech recognition

• Rita Singh: Carnegie Mellon University
  – Expert in speech recognition

• Peter Wolf: Independent Consultant
  – Previously at Dragon Systems Inc.
  – Sphinx4 expert; expert in speech recognition application development
  – Brought in primarily as a resource for helping with Sphinx4 and answering application-related questions

Format of Course

• 3 lectures daily
  – Morning: 8:00 AM (1 – 1.5 hours)
  – Late morning / early afternoon: 11:00 AM
  – Afternoon: 2:30 PM

• The schedule is flexible: timings may vary depending on how much is covered

• Lectures are expected to last 1 – 1.5 hours each

• The time between lectures is expected to be taken up by exercises

Instruction Format

• Lectures will be pictorially oriented

• Although we will cover general topics, the specific implementations described will be based on CMU Sphinx
  – Most other systems are similar

• Exercises will be based on Sphinx

Lecture Outline: Day 1

• Lecture 1: “Speech recognition for dummies”
  – A quick development of speech recognition as string matching

• Lecture 2: “Feature computation”
  – Explaining how features are computed for speech recognition, including all signal processing

• Lecture 3: “Hidden Markov Models”
  – Describing HMMs and all associated problems

Lecture Outline: Day 2

• Lecture 1: “Training From Continuous Speech”
  – How to train models from continuous speech
  – Phonemes, why we need them and how to train them

• Lecture 2: “Context dependent phonemes”
  – What context dependent phonemes are
  – Various types of context dependent phonemes
  – Training CD phonemes

• Lecture 3: “Decision Trees and State Tying”
  – All about decision trees for parameter sharing in ASR systems

Lecture Outline: Day 3

• Lecture 1: “Training context-dependent models with tied states”
  – A (relatively) short lecture explaining the final overall process for training models

• Lecture 2: “Language Modelling”
  – How to model “language” for speech recognition
  – Statistical language modelling

• Lecture 3: “Decoding: Basics”
  – Describing the basic ideas behind the decoding strategies for continuous speech

Lecture Outline: Day 4

• Lecture 1: “Decoding: Advanced”
  – Explaining various more advanced approaches to decoding
  – Arriving at the state of the art

• Lecture 2: “Advanced Topics”
  – Adaptation, normalization, discriminative training, etc.

• Session 3: Open
  – Any spillover
  – Question answering

Exercises: Day 1

• There will be exercises following most lectures

• Lecture 1: None

• Lecture 2: Exercise on capture and feature computation from speech signals

• Lecture 3: None

Exercises: Day 2

• Lecture 1: “Training From Continuous Speech”
  – Exercise on training phoneme models and recognizing with them

• Lecture 2: “Context dependent phonemes”
  – Exercise on training models for context-dependent phonemes and recognizing with them

• Lecture 3: “Decision Trees and State Tying”
  – Exercise on learning decision trees

Exercises: Day 3

• Lecture 1: “Training context-dependent models with tied states”
  – Exercise on complete training of the ASR system

• Lecture 2: “Language Modelling”
  – Exercises on building JSGF grammars and N-gram LMs for speech recognition

• Lecture 3: “Decoding: Basics”

Exercises: Day 4

• Lecture 1: “Decoding: Advanced”
  – Decoding with various speech recognition system variants:
    • Sphinx3 flat, Sphinx3 tree, Sphinx4

• Lecture 2: “Advanced Topics”
  – No exercises

• Session 3: Open
  – No exercises

Software to Install

• We will be using CMU Sphinx extensively
  – Sphinxtrain
  – Sphinx3 decoder
  – Sphinx4 decoder
  – CMU LM Toolkit or SRI LM Toolkit

• We will need additional software to go with it
  – Java, Ant and Groovy for Sphinx4

Sphinx Downloads: http://cmusphinx.sourceforge.net

• Sphinxbase:
  – Click on the “sphinxbase” link on the left
  – Click “all releases”
  – Download version 0.4.1
    • http://downloads.sourceforge.net/cmusphinx/sphinxbase-0.4.1.tar.bz2?use_mirror=superb-east

• Sphinx3:
  – Click on the “sphinx3” link on the left
  – Click on “all releases”
  – Download version 0.8 (sphinx3-0.8)
    • http://downloads.sourceforge.net/cmusphinx/sphinx3-0.8.zip?use_mirror=internap

• Cepview:
  – Click on the “cepview” link on the left

• lm3g2dmp:
  – Click on the “lm3g2dmp” link on the left

• The above two are visualization / data-structure optimization tools and are not critical
  – But they are small, so you might as well download them

• CMU LM toolkit: you may install the SRI LM toolkit instead
  – The SRI toolkit is better maintained
  – The CMU toolkit is not currently maintained

• Sphinx4:
  – For this workshop, download the copy of Sphinx4 that is under development at github.com
    • http://github.com/juanzanos/sphinx4/tree/master

• Click on the download link
  – Caveat: some scripts may not run; if so, we will revert to the release version

• Sphinx4 will also need
  – Java JDK 1.6, from http://javasoft.com
  – Apache Ant, from http://ant.apache.org
  – Groovy, a useful scripting tool (some of our latest scripts are written in it), from http://groovy.codehaus.org

• Bookmark this link:
  – http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html

Operating Systems

• Sphinxbase and Sphinx3 packages have been tried and tested on Linux
  – We are not Windows people

• Suggestion: prefer Linux-based machines
  – You may also try to run these programs on Cygwin under Windows
    • Sphinx* should compile under Cygwin

• Install “tcsh” under Cygwin

• We will provide tcsh scripts

• Sphinx4 is platform independent

Additional Packages

• It would be useful to have a visualization tool
  – We will need to visualize matrices as surfaces

• Matlab would be great

• If you don’t have Matlab, download Octave
  – http://www.gnu.org/software/octave/

Data

• You may use any data you wish

• For the exercises, we will attempt to provide a small amount of data
  – As much as your computers can handle

Questions

• ?