Toolkits for ASR; Sphinx

Samudravijaya K [email protected]

08-MAR-2011

Workshop on Fundamentals of Automatic Speech Recognition

CDAC Noida, 08-MAR-2011

Samudravijaya K [email protected] Toolkits for ASR; Sphinx 1/31

A Block Diagram of an ASR System

[Figure: Signal → Feature Extraction → Matching against the Acoustic Model (acoustic domain) → Symbol sequence → Matching against the Language Model (symbolic domain) → Sentence Hypothesis. The models are built during Training and used during Testing.]

Hierarchy of Units in an Utterance

source: “state of art ASR” by Steve Young, 2000

Sentence HMM is composed of Phone HMMs

Toolkits for Automatic Speech Recognition

(1) Training, (2) Testing, (3) Performance Evaluation

There are several freely available toolkits that help to build an ASR system:

• HTK: Hidden Markov Model ToolKit [1]. Freely available, but the decoder cannot be distributed (C).

• Sphinxes [2]: open source (C, C++, Java).

• ISIP Production system [3]. Public domain, without any restrictions (C++).

• Julius Open-Source Large Vocabulary CSR Engine [4]. It uses acoustic models in HTK format and grammar files in its own format. Open license, with no limitations on distribution (C).

• HMM toolbox for Matlab [5]. Useful for isolated word recognition.
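
Stage (3), performance evaluation, is commonly reported as word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcription, divided by the number of reference words. The following is a minimal sketch using the standard Levenshtein dynamic programme over words; `word_error_rate` is a hypothetical helper, not the scoring tool shipped with any of the toolkits above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")
    # d[i][j]: minimum number of edits that turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("mera bhaarat mahaan", "mera bharat mahaan"))
```

Because insertions are counted, WER can exceed 100% when the recogniser outputs many spurious words.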

What is CMU Sphinx?

According to Arthur Chan (the editor of Hieroglyphs [6], the Sphinx manual in book form), there are two definitions of Sphinx:

• A large vocabulary speech recognizer with high accuracy and speed performance.

• A collection of tools and resources that enables developers/researchers to build successful speech recognizers.

Pocketsphinx

source: “SphinxLunch20041021.ppt” by Arthur Chan, 2004

Language model training

source: “state of art ASR” by Steve Young, 2000

CMU-Cambridge SLM toolkit
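
At its core, n-gram language-model training is counting. As a minimal sketch (assuming whitespace-tokenised transcriptions; `train_bigram` is a hypothetical helper, not part of the CMU-Cambridge SLM toolkit), the maximum-likelihood bigram estimate P(w2 | w1) = C(w1 w2) / C(w1) can be computed as below. Real toolkits such as the CMU-Cambridge SLM toolkit additionally apply discounting and back-off, so that word pairs never seen in training still receive a non-zero probability.

```python
from collections import Counter


def train_bigram(sentences):
    """Maximum-likelihood bigram estimates P(w2 | w1) from tokenised sentences.

    Sentence-start <s> and sentence-end </s> markers are added. No discounting
    or back-off is applied, so unseen bigrams simply have no entry.
    """
    history_counts, bigram_counts = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        history_counts.update(tokens[:-1])  # every token except </s> can be a history
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {(w1, w2): count / history_counts[w1] for (w1, w2), count in bigram_counts.items()}


# Hypothetical toy corpus built from the example words used later in this deck.
corpus = [["mera", "bhaarat", "mahaan"], ["mera", "bhaarat"]]
model = train_bigram(corpus)
print(model[("bhaarat", "mahaan")])  # → 0.5 (1 of the 2 occurrences of "bhaarat")
```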

Lexicon (Pronunciation Dictionary)

source: “Ph.D. thesis” of Ravi Shankar M., CMU [7]

source: “http://speech.tifr.res.in/resources/data/labelSetASR100815.pdf”

source: “www.liacs.nl/~erwin/SR2003/Sphinx.ppt”

Feature Extraction (Front-end processing)

* The wave2feat program computes 13 MFCCs from speech files stored in WAV, NIST, or raw format.

* Caution: use the -dither yes option, and excise long silences.

* cepView s0001.cep prints the cepstral coefficients.

source: “Ph.D. thesis” of Ravi Shankar M., CMU [7].
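
To make the front end concrete, here is a minimal NumPy sketch of a generic MFCC pipeline (pre-emphasis, 25 ms Hamming-windowed frames every 10 ms, mel filterbank, log, DCT) that returns 13 coefficients per frame. It is not wave2feat's implementation: the filterbank size (40), FFT length (512), pre-emphasis factor (0.97) and dither scale are assumptions chosen for illustration, and Sphinx's own defaults differ in detail. The `dither` argument illustrates what dithering is generally for: adding a tiny amount of noise so that stretches of digital silence do not produce log(0).

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank


def mfcc(signal, sample_rate=16000, frame_ms=25, shift_ms=10,
         n_fft=512, n_filters=40, n_ceps=13, dither=1.0, seed=0):
    """Return an (n_frames, n_ceps) array of MFCCs for a 1-D signal in 16-bit sample units."""
    x = np.asarray(signal, dtype=np.float64)
    if dither > 0:
        # A little random noise keeps runs of exact digital silence away from log(0).
        x = x + dither * np.random.default_rng(seed).standard_normal(x.shape)
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])      # pre-emphasis
    frame_len = sample_rate * frame_ms // 1000       # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000           # 160 samples at 16 kHz
    n_frames = 1 + (len(x) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energy = np.log(np.maximum(power @ fbank.T, 1e-10))
    # DCT-II of the log filterbank energies; keep the first n_ceps coefficients.
    n = np.arange(n_ceps)[:, None]
    k = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * n * (k + 0.5) / n_filters)
    return log_energy @ dct.T
```

For one second of 16 kHz audio this yields 98 frames of 13 coefficients, consistent with the frame arithmetic used later in the deck.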

SphinxTrain: Training sub-word HMMs

Stages of training (Reference: http://www.speech.cs.cmu.edu/sphinxman/fr4.html):

1 Training context-independent (CI) phone HMMs

2 Training context-dependent (CD) phone HMMs

3 Decision tree building

4 Training context-dependent tied phone HMMs

5 Recursive Gaussian splitting
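
Stage 5 grows each tied state's output density from one Gaussian to a mixture by repeated splitting, with re-training in between, until the desired number of components is reached. HTK's mixture splitting perturbs the means of the two copies by ±0.2 standard deviations; the NumPy sketch below applies that perturbation to every component at once, doubling the count each time. SphinxTrain's exact perturbation rule may differ, so treat the hypothetical helper `split_gaussians` (diagonal covariances assumed) as an illustration of the idea rather than SphinxTrain's code.

```python
import numpy as np


def split_gaussians(weights, means, variances, epsilon=0.2):
    """Double the number of components in one GMM with diagonal covariances.

    Each Gaussian is replaced by two copies whose means are shifted by
    +/- epsilon standard deviations; variances are copied and weights halved.
    weights: (K,), means and variances: (K, D).
    """
    offset = epsilon * np.sqrt(variances)
    new_means = np.concatenate([means + offset, means - offset])
    new_variances = np.concatenate([variances, variances])
    new_weights = np.concatenate([weights, weights]) / 2.0
    return new_weights, new_means, new_variances


# Start from one Gaussian (hypothetical 13-dimensional statistics) and split
# 1 -> 2 -> 4 -> 8; in real training, re-estimation runs after every split.
w, mu, var = np.array([1.0]), np.zeros((1, 13)), np.ones((1, 13))
for _ in range(3):
    w, mu, var = split_gaussians(w, mu, var)
print(mu.shape)  # (8, 13)
```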

Training Context Independent phone HMMs

Two steps: initialization and embedded re-estimation.

Inputs:

* Feature vector sequences

* Word-level transcriptions

* Pronunciation dictionary

(I) Initialization:

1 Make a prototype HMM (5-state, left-to-right, skipping one state permitted); copy it to all phone HMMs.

2 Compute the global mean and variance of all training feature vectors.

3 Initialise the Gaussians of all states of all phone HMMs with the global mean and variance.

4 For each utterance, generate a phone-level transcription from the word-level transcription using the pronunciation dictionary.
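
Steps 1 to 3 amount to a "flat start". A minimal NumPy sketch, assuming the feature files are already loaded as (frames × 13) arrays; the helper names and the uniform split of transition probability over the stay/next/skip moves are illustrative assumptions, not SphinxTrain's actual initial values.

```python
import numpy as np


def prototype_transitions(n_states=5, allow_skip=True):
    """Left-to-right transition matrix: stay, move to the next state, or skip one state.

    Probability is spread uniformly over the allowed moves (an assumption); the
    last state only self-loops because the exit transition is not modelled here.
    """
    A = np.zeros((n_states, n_states))
    max_jump = 2 if allow_skip else 1
    for i in range(n_states):
        targets = [j for j in range(i, i + max_jump + 1) if j < n_states]
        A[i, targets] = 1.0 / len(targets)
    return A


def flat_start(utterance_features, phones, n_states=5):
    """Give every state of every phone HMM the global mean and variance."""
    frames = np.vstack(utterance_features)      # stack all (T_i x D) arrays
    mean, var = frames.mean(axis=0), frames.var(axis=0)
    A = prototype_transitions(n_states)         # one prototype, copied to every phone
    return {p: {"A": A.copy(),
                "means": np.tile(mean, (n_states, 1)),
                "vars": np.tile(var, (n_states, 1))} for p in phones}


rng = np.random.default_rng(0)
features = [rng.standard_normal((50, 13)), rng.standard_normal((30, 13))]  # stand-in data
models = flat_start(features, ["sil", "m", "e"])
print(models["m"]["A"].round(2))
```

Every phone starts identical; the embedded re-estimation passes are what make the models diverge.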

Training subword HMMs

An iterative algorithm (Baum-Welch, also known as Forward-Backward) is used. This maximum likelihood procedure guarantees that the likelihood of the training data given the model does not decrease from one iteration to the next. To begin with, an initial estimate of the HMM parameters (A, B, π) is required.

Q: How do we get an initial estimate of λ = {A, B, π}?

A: We could estimate the parameters if we knew the boundaries of every subword HMM in the training utterances.

Practical solution: assume that the durations of all units (phones) are equal. If there are N phones in a training utterance, divide the feature vector sequence into N equal parts and assign each part to the corresponding phoneme in the phoneme sequence of the utterance's transcription. Repeat for all training utterances.

Initial estimation of HMM parameters: an illustration

Let the transcription of the 1st wave file be the following sequence of words: mera bhaarat mahaan

Let the relevant lines in the dictionary be as follows:

bhaarat bh aa r a t
mahaan m a h aa n
mera m e r aa

The phoneme HMM sequence (of length 16) corresponding to this sentence is:

sil m e r aa bh aa r a t m a h aa n sil
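
Word-to-phone expansion is a dictionary lookup plus silence at both ends. A minimal Python sketch; `words_to_phones` is a hypothetical helper, and it assumes the dictionary headwords are spelled exactly as the words in the transcription.

```python
def words_to_phones(words, lexicon, silence="sil"):
    """Expand a word-level transcription into a phone sequence with silence at both ends."""
    phones = [silence]
    for word in words:
        if word not in lexicon:
            raise KeyError(f"word not in pronunciation dictionary: {word}")
        phones.extend(lexicon[word])
    phones.append(silence)
    return phones


# Dictionary entries from the example, keyed by the words of the transcription.
lexicon = {
    "mera": ["m", "e", "r", "aa"],
    "bhaarat": ["bh", "aa", "r", "a", "t"],
    "mahaan": ["m", "a", "h", "aa", "n"],
}
sequence = words_to_phones("mera bhaarat mahaan".split(), lexicon)
print(len(sequence), " ".join(sequence))  # 16 sil m e r aa bh aa r a t m a h aa n sil
```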

If the duration of the wave file is 1.0 s, there will be 98 feature vectors (frame shift = 10 ms and frame size = 25 ms).

Assign the first 6 feature vectors to the “sil” HMM; the next 6 (7 through 12) to “m”; the next 6 (13 through 18) to “e”; ...; the last 8 feature vectors to “sil”. If each HMM has 3 states, assign 2 feature vectors to each state and compute the mean and SD. Assume a_ij = 0.5 if j = i or j = i+1; otherwise a_ij = 0.
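
The arithmetic of this illustration can be checked with a short NumPy sketch of uniform segmentation. The helper names are hypothetical, random 13-dimensional vectors stand in for real MFCCs, and giving leftover frames to the last unit (8 frames to the final “sil”) follows the example; the same rule is assumed when splitting a phone's frames among its 3 states.

```python
import numpy as np


def num_frames(duration_ms, frame_ms=25, shift_ms=10):
    """Number of complete analysis frames that fit in the signal."""
    return 1 + (duration_ms - frame_ms) // shift_ms


def uniform_segments(n_frames, n_units):
    """Split frames 0..n_frames-1 into n_units equal parts; leftovers go to the last part."""
    size = n_frames // n_units
    bounds = [(k * size, (k + 1) * size) for k in range(n_units)]
    bounds[-1] = (bounds[-1][0], n_frames)
    return bounds


def initial_state_stats(features, phones, n_states=3):
    """Mean and SD of each (phone, state) from a uniform segmentation.

    Frames of repeated phones (e.g. the three occurrences of "aa") are pooled.
    """
    pooled = {}
    for phone, (start, end) in zip(phones, uniform_segments(len(features), len(phones))):
        for state, (a, b) in enumerate(uniform_segments(end - start, n_states)):
            pooled.setdefault((phone, state), []).append(features[start + a:start + b])
    return {key: (np.concatenate(segs).mean(axis=0), np.concatenate(segs).std(axis=0))
            for key, segs in pooled.items()}


phones = "sil m e r aa bh aa r a t m a h aa n sil".split()
T = num_frames(1000)  # 98 frames for 1.0 s
print([end - start for start, end in uniform_segments(T, len(phones))])
stats = initial_state_stats(np.random.default_rng(0).standard_normal((T, 13)), phones)
```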

Embedded Re-estimation

(II) Embedded re-estimation:

1 For each utterance, do the following:

• Using the phone-level transcription, compose a sentence HMM out of phone HMMs.

• Forward-Backward algorithm: compute the probability of each feature vector being generated by each state of each phone HMM in the sentence HMM.

• Accumulate, for each state, these probabilities together with the corresponding feature vectors.

2 For each state, re-estimate the HMM parameters using the accumulated statistics.

Repeat the Embedded Re-estimation a few times.
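
The embedded re-estimation loop can be sketched in NumPy for the simplest possible case, chosen purely for readability: one-dimensional features, one Gaussian per state, 3-state left-to-right phone HMMs with a_ii = a_i,i+1 = 0.5 (as in the initial-estimation illustration), and transition probabilities held fixed. The helper names are hypothetical and this is not SphinxTrain's implementation. Statistics are accumulated per (phone, state) key, so no phone boundaries are needed: the forward-backward pass over the whole sentence HMM decides softly which frames belong to which phone state.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def sentence_hmm(phones, n_states=3):
    """Concatenate left-to-right phone HMMs (a_ii = a_i,i+1 = 0.5, no skips)."""
    labels = [(p, s) for p in phones for s in range(n_states)]
    A = np.zeros((len(labels), len(labels)))
    for i in range(len(labels)):
        A[i, i] = 0.5
        if i + 1 < len(labels):
            A[i, i + 1] = 0.5
    return labels, A


def forward_backward(obs, labels, A, means, variances):
    """Scaled forward-backward; returns state occupancies gamma (T x S) and log-likelihood."""
    T, S = len(obs), len(labels)
    B = np.array([[gaussian_pdf(o, means[l], variances[l]) for l in labels] for o in obs])
    alpha, beta, c = np.zeros((T, S)), np.zeros((T, S)), np.zeros(T)
    alpha[0, 0] = B[0, 0]                  # the path must start in the first state
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta[T - 1, S - 1] = 1.0               # ... and end in the last state
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, np.log(c).sum() + np.log(alpha[T - 1, S - 1])


def embedded_reestimation(utterances, means, variances, n_states=3, var_floor=1e-3):
    """One pass over (features, phones) pairs; re-estimates every (phone, state) Gaussian."""
    occ = {k: 0.0 for k in means}
    sum_x = {k: 0.0 for k in means}
    sum_xx = {k: 0.0 for k in means}
    total_log_likelihood = 0.0
    for obs, phones in utterances:
        labels, A = sentence_hmm(phones, n_states)
        gamma, ll = forward_backward(obs, labels, A, means, variances)
        total_log_likelihood += ll
        for j, label in enumerate(labels):  # repeated phones share one accumulator
            occ[label] += gamma[:, j].sum()
            sum_x[label] += gamma[:, j] @ obs
            sum_xx[label] += gamma[:, j] @ obs ** 2
    new_means, new_vars = dict(means), dict(variances)
    for k in means:
        if occ[k] > 0:
            new_means[k] = sum_x[k] / occ[k]
            new_vars[k] = max(sum_xx[k] / occ[k] - new_means[k] ** 2, var_floor)
    return new_means, new_vars, total_log_likelihood


rng = np.random.default_rng(0)
obs = np.concatenate([rng.normal(0, 1, 10), rng.normal(5, 1, 12), rng.normal(0, 1, 10)])
utts = [(obs, ["sil", "a", "sil"])]
means = {(p, s): obs.mean() for p in ("sil", "a") for s in range(3)}      # flat start
variances = {(p, s): obs.var() for p in ("sil", "a") for s in range(3)}
for it in range(5):
    means, variances, ll = embedded_reestimation(utts, means, variances)
    print(it, round(float(ll), 2))
```

Each printed log-likelihood is computed with the parameters that entered that pass; Baum-Welch guarantees the sequence never decreases.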

Training Context Dependent phone HMMs

1 Initialise N³ triphone models, where N is the number of phones.

2 Compose the sentence HMM out of triphone (CD) models instead of monophone (CI) models.

3 Carry out the Embedded Re-estimation for a few iterations.

The sequence of CI HMMs was:
sil m e r aa bh aa r a t m a h aa n sil

The sequence of CD HMMs (triphones) is:
sil sil-m+e m-e+r e-r+aa r-aa+bh ...

If N = 50 and each HMM has 3 states, there may be up to 50³ × 3 = 375,000 states. Each state is associated with one Gaussian, so a huge amount of speech data is needed for robust estimation of the parameters (µ, Σ) of 375,000 Gaussians!

Reduce the number of states by state tying, using decision trees.
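
The CI-to-CD expansion and the state count can be reproduced with a small Python sketch. `to_triphones` and `max_cd_states` are hypothetical helpers; the sketch keeps silence context-independent, uses silence as the missing context at utterance edges, and ignores any word-position information a real trainer may track.

```python
def to_triphones(phones, silence="sil"):
    """Map a CI phone sequence to triphones written as left-centre+right."""
    triphones = []
    for i, p in enumerate(phones):
        if p == silence:
            triphones.append(p)  # silence stays context-independent
            continue
        left = phones[i - 1] if i > 0 else silence
        right = phones[i + 1] if i + 1 < len(phones) else silence
        triphones.append(f"{left}-{p}+{right}")
    return triphones


def max_cd_states(n_phones, states_per_hmm):
    """Upper bound on untied CD states: N^3 triphones times the states in each HMM."""
    return n_phones ** 3 * states_per_hmm


ci = "sil m e r aa bh aa r a t m a h aa n sil".split()
print(" ".join(to_triphones(ci)[:5]))  # sil sil-m+e m-e+r e-r+aa r-aa+bh
print(max_cd_states(50, 3))            # 375000
```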

Training Context Dependent tied phone HMMs

* Build decision trees for parameter sharing.

* One decision tree is built for each state position of each base phone (e.g., 5 trees per phone if the HMMs have 5 emitting states).

The first step is to generate linguistic questions. There are two methods:

1 Manually create linguistic questions using phonetic knowledge.

2 Run the make_quests program to form phone groups automatically.

The first few lines of a “linguistic-questions” file may look like this:

SIL sil h s sh
VOWELS a aa i ii u uu e ee o oo
NASAL m n ng
LABPLO p ph b bh
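
A decision tree grows by choosing, at each node, the linguistic question whose yes/no split of the pooled triphone states most increases the training-data log-likelihood. The Python sketch below shows that selection step for a single node under simplifying assumptions: one-dimensional features, each triphone state summarised by hypothetical (frame count, sum, sum of squares) statistics, and one maximum-likelihood Gaussian per group. It parses question lines in the format shown above; it illustrates the criterion and is not SphinxTrain's tree-building code.

```python
import math

# Question lines in the format shown above: a name followed by its member phones.
QUESTIONS_TEXT = """SIL sil h s sh
VOWELS a aa i ii u uu e ee o oo
NASAL m n ng
LABPLO p ph b bh"""


def parse_questions(text):
    questions = {}
    for line in text.strip().splitlines():
        name, *phones = line.split()
        questions[name] = set(phones)
    return questions


def pooled_loglik(stats):
    """Log-likelihood of pooled frames under one ML-fitted 1-D Gaussian.

    stats is a list of (frame_count, sum, sum_of_squares) tuples.
    """
    n = sum(s[0] for s in stats)
    mean = sum(s[1] for s in stats) / n
    var = max(sum(s[2] for s in stats) / n - mean * mean, 1e-6)  # variance floor
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def best_question(triphone_stats, questions):
    """Return (gain, question, context side) of the best split, or None."""
    parent = pooled_loglik(list(triphone_stats.values()))
    best = None
    for name, members in questions.items():
        for side in ("left", "right"):
            yes, no = [], []
            for triphone, st in triphone_stats.items():
                left, rest = triphone.split("-")
                _, right = rest.split("+")
                context = left if side == "left" else right
                (yes if context in members else no).append(st)
            if not yes or not no:
                continue  # this question does not split the node
            gain = pooled_loglik(yes) + pooled_loglik(no) - parent
            if best is None or gain > best[0]:
                best = (gain, name, side)
    return best


# Hypothetical statistics for four triphones of base phone "aa" (one state position).
node = {
    "m-aa+t": (10, 10.0, 11.0),   # mean 1.0, variance 0.1
    "n-aa+t": (10, 10.0, 11.0),
    "b-aa+t": (10, 50.0, 251.0),  # mean 5.0, variance 0.1
    "p-aa+t": (10, 50.0, 251.0),
}
print(best_question(node, parse_questions(QUESTIONS_TEXT)))
```

Here NASAL and LABPLO produce the same partition of this node, so either question separates the low-mean and high-mean triphones equally well.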

Decision trees are used to decide which of the HMM states of all the triphones (seen and unseen) are similar to each other, so that data from all these states are collected together and used to train one global state, which is called a senone (also called a tied state). Example: the left states of the 1st and 3rd triphones above would be similar.

Training Context Dependent tied phone HMMs

1 Prune the decision trees so that the number of senones (tied states) is commensurate with the amount of training data.

2 Create a CD tied model definition file that (a) lists all triphones seen during training, and (b) maps the states of these triphones to senones from the pruned trees (state-senone mapping).

3 Carry out the Embedded Re-estimation (tied CD models) for a few iterations.

4 Generate Gaussian mixtures for each senone (tied state) and re-train. Repeat this step until the desired number (say 8) of mixture components is created for each GMM (senone).

5 Optionally, carry out discriminative training following the Maximum Mutual Information Estimation (MMIE) scheme, which maximises the posterior probability of the correct word sequence given all possible word sequences [9].
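
For reference, the MMIE criterion in its usual textbook form is shown below, where O_r is the r-th of R training utterances, w_r its correct transcription, M_w the composite HMM for word sequence w, and P(w) the language-model probability; the denominator sums over all competing word sequences, which in practice is approximated by a lattice. Acoustic scaling factors that are often applied in practice are omitted here; see [9] for the SphinxTrain specifics.

```latex
\mathcal{F}_{\mathrm{MMIE}}(\lambda) = \sum_{r=1}^{R} \log
  \frac{p_\lambda(O_r \mid M_{w_r})\, P(w_r)}
       {\sum_{w} p_\lambda(O_r \mid M_w)\, P(w)}
```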

source: “www.liacs.nl/~erwin/SR2003/Sphinx.ppt”

Inputs to sphinx3 decoder

source: “www.liacs.nl/~erwin/SR2003/Sphinx.ppt”

Sphinx3 decoders

source: “www.liacs.nl/~erwin/SR2003/Sphinx.ppt”

Output of recogniser

source: “www.liacs.nl/~erwin/SR2003/Sphinx.ppt”

source: “SphinxLunch20041021.ppt” by Arthur Chan, 2004

Sphinx4

Sphinx-4 is a state-of-the-art speech recognition system written entirely in the Java programming language [10].

• Generalized pluggable front-end architecture: MFCC, CMN.

• Generalized pluggable language model architecture: trigram, JSGF, and ARPA-format FST grammars.

• Generalized acoustic model architecture: Sphinx-3 acoustic models.

• Generalized search management: breadth-first search and word pruning.

• Post-processing of recognition results: obtaining confidence scores, generating lattices.

• Standalone tools: displaying waveforms and spectrograms; generating features from audio.

Comparison of Performance of Sphinxes

source: [10].

PocketSphinx [11] is a small-footprint continuous speech recognition system, suitable for handheld and desktop applications.

Sphinx, the eternal mystery

source: [10].

Bibliography

[1] Cambridge University, UK; Entropic; Microsoft. “HTK, Hidden Markov Model ToolKit”. http://htk.eng.cam.ac.uk/

[2] Project by Carnegie Mellon University. “The CMU Sphinx group open source speech recognition engines”. http://cmusphinx.sourceforge.net/html/cmusphinx.php

[3] Joe Picone et al. “ISIP Production system” (r02 n02) (23-JUL-2009). http://www.isip.piconepress.com/projects/speech/software/

[4] Japanese Universities and Laboratories. “Open-Source Large Vocabulary CSR Engine: Julius”. http://julius.sourceforge.jp/en/

[5] Kevin Murphy. “HMM toolbox for Matlab”. http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html

[6] Arthur Chan. “Hieroglyphs: Building Speech Application Using Sphinx and Related Resources” (3rd draft), 11-MAR-2007. http://www.cs.cmu.edu/~archan/sphinxDoc.html

[7] Ravishankar M. “Efficient Algorithms for Speech Recognition”. Ph.D. thesis, Carnegie Mellon University, May 1996. Tech. Report CMU-CS-96-143. http://www.cs.cmu.edu/~rkm/th/th.pdf

[8] Cambridge University, UK; Entropic; Microsoft. “HTK Book”, documentation of HTK. http://htk.eng.cam.ac.uk/docs/docs.shtml

[9] L. Qin and A. Rudnicky. “Implementing and Improving MMIE Training in SphinxTrain”. CMU Sphinx Workshop 2010, 13 March 2010, Dallas, USA. http://www.cs.cmu.edu/~sphinx/Sphinx2010/papers/107.unblinded.p

[10] Bhiksharaj et al. “A speech recognizer written entirely in the Java programming language”. http://cmusphinx.sourceforge.net/sphinx4/

[11] “A small-footprint continuous speech recognition system”. http://cmusphinx.sourceforge.net/2010/03/pocketsphinx-0-6-release/