Feature extractor Mel-Frequency Cepstral Coefficients (MFCCs) Feature vectors


Feature extractor

Feature extractor → Mel-Frequency Cepstral Coefficients (MFCCs) → Feature vectors
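As a rough illustration of what the feature extractor does, here is a minimal NumPy sketch of the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT). The frame size, hop, and filter counts are typical textbook values, not Sphinx4's exact front-end configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # 1. frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # 2. power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. mel filterbank energies, then log
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_e = np.log(np.maximum(energies, 1e-10))
    # 4. DCT-II to decorrelate; keep the first n_ceps coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return log_e @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
feats = mfcc(sig)
print(feats.shape)  # (98, 13): one 13-dim feature vector per 10 ms frame
```

Each row of `feats` is one of the feature vectors the rest of the pipeline consumes.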

Acoustic Observations
Hidden States
Acoustic Observation likelihoods
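These three ideas can be made concrete with a toy example. The matrices below are hypothetical, not taken from any real acoustic model: hidden states are the phones /f/ /ay/ /v/, observations come from a made-up discrete alphabet, and the forward algorithm sums over all state paths to give the total acoustic likelihood P(O | model):

```python
import numpy as np

# Hypothetical 3-state phone HMM for the word "five" (/f/ /ay/ /v/).
A = np.array([[0.6, 0.4, 0.0],   # transitions: self-loop or advance
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.8, 0.1, 0.1],   # emission likelihoods P(o | state)
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
pi = np.array([1.0, 0.0, 0.0])   # must start in /f/
obs_symbols = {"O_f": 0, "O_ay": 1, "O_v": 2}  # toy observation alphabet

def forward(obs):
    """Total acoustic likelihood P(O | HMM) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

seq = [obs_symbols[s] for s in ["O_f", "O_ay", "O_ay", "O_v"]]
print(forward(seq))
```

The forward pass marginalizes over the hidden state sequence, which is exactly the "acoustic observation likelihood" role the acoustic model plays.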


“Six”

Constructs the HMMs for units of speech
Produces observation likelihoods
Sampling rate is critical! WSJ vs. WSJ_8k
TIDIGITS, RM1, AN4, HUB4


Word likelihoods


ARPA format Example:

\1-grams:
-3.7839 board -0.1552
-2.5998 bottom -0.3207
-3.7839 bunch -0.2174
\2-grams:
-0.7782 as the -0.2717
-0.4771 at all 0.0000
-0.7782 at the -0.2915
\3-grams:
-2.4450 in the lowest
-0.5211 in the middle
-2.4450 in the on
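The fields are a log10 probability, the n-gram itself, and an optional backoff weight. A minimal parsing sketch over a subset of the entries above (real ARPA files also carry \data\ headers and an \end\ marker, which this ignores):

```python
import re

arpa_snippet = """\\1-grams:
-3.7839 board -0.1552
-2.5998 bottom -0.3207
\\2-grams:
-0.7782 as the -0.2717
-0.4771 at all 0.0000
"""

def parse_arpa(text):
    grams = {}
    order = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"\\(\d+)-grams:", line)
        if m:
            order = int(m.group(1))
            continue
        if not line or order is None:
            continue
        parts = line.split()
        logprob = float(parts[0])
        # the last field is a backoff weight iff there are order + 2 fields
        if len(parts) == order + 2:
            words, backoff = tuple(parts[1:-1]), float(parts[-1])
        else:
            words, backoff = tuple(parts[1:]), None
        grams.setdefault(order, {})[words] = (logprob, backoff)
    return grams

lm = parse_arpa(arpa_snippet)
print(lm[2][("as", "the")])  # (-0.7782, -0.2717)
```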


public <basicCmd> = <startPolite> <command> <endPolite>;

public <startPolite> = (please | kindly | could you)*;

public <endPolite> = [ please | thanks | thank you ];

<command> = <action> <object>;

<action> = (open | close | delete | move);

<object> = [the | a] (window | file | menu);


Maps words to phoneme sequences


Example from cmudict.06d

POULTICE  P OW L T AH S
POULTICES P OW L T AH S IH Z
POULTON   P AW L T AH N
POULTRY   P OW L T R IY
POUNCE    P AW N S
POUNCED   P AW N S T
POUNCEY   P AW N S IY
POUNCING  P AW N S IH NG
POUNCY    P UW NG K IY
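Loading this format is one split per line; a minimal sketch over a hypothetical three-word subset:

```python
# Minimal sketch: parse CMUdict-style lines into a word -> phoneme-list map.
entries = """POULTICE P OW L T AH S
POULTRY P OW L T R IY
POUNCE P AW N S"""

pron = {}
for line in entries.splitlines():
    word, *phones = line.split()  # first field is the word, rest are phones
    pron[word] = phones

print(pron["POULTRY"])  # ['P', 'OW', 'L', 'T', 'R', 'IY']
```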


Constructs the search graph of HMMs from:
- Acoustic model
- Statistical language model ~or~ grammar
- Dictionary


Can be statically or dynamically constructed

FlatLinguist
DynamicFlatLinguist
LexTreeLinguist


Maps feature vectors to search graph


Searches the graph for the “best fit”


P(sequence of feature vectors | word/phone), a.k.a. P(O|W)

→ “how likely is the input to have been generated by the word?”


f ay ay ay ay v  v  v  v  v
f f  ay ay ay ay v  v  v  v
f f  f  ay ay ay ay v  v  v
f f  f  f  ay ay ay ay v  v
f f  f  f  ay ay ay ay ay v
f f  f  f  f  ay ay ay ay v
f f  f  f  f  f  ay ay ay v
…


[Trellis diagram: observations O1, O2, O3 along the time axis]


Uses algorithms to weed out low scoring paths during decoding


Words!


The most common metric is Word Error Rate (WER). It measures the number of modifications needed to transform the recognized sentence into the reference sentence.

Reference: “This is a reference sentence.”

Result: “This is neuroscience.”

Transforming the result into the reference requires 2 deletions and 1 substitution:

Reference: This is a  reference     sentence
Result:    This is    neuroscience
Errors:            D  S             D

WER = 100 × (deletions + substitutions + insertions) / reference length
    = 100 × (2 + 1 + 0) / 5
    = 100 × 3/5
    = 60%
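The modification count is a word-level Levenshtein edit distance, computable by dynamic programming. A minimal sketch that reproduces the 60% result above:

```python
# Sketch of word error rate via Levenshtein alignment (dynamic programming).
def wer(reference, result):
    ref, hyp = reference.split(), result.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,               # substitution (or exact match)
                          d[i - 1][j] + 1,   # deletion
                          d[i][j - 1] + 1)   # insertion
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(wer("this is a reference sentence", "this is neuroscience"))  # 60.0
```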


Limited Vocab Multi-Speaker
Extensive Vocab Single Speaker


*If you have noisy audio input, multiply the expected error rate by 2.


Other variables:
- Continuous vs. Isolated
- Conversational vs. Read
- Dialect


Questions?


[Trellis diagram: observations O1, O2, O3 along the time axis]

P(ay|f) * P(O2|ay)

P(f|f) * P(O2|f)

P(O1) * P(ay|f) * P(O2|ay)



Common Sphinx4 FAQs can be found online:
http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html

What follows are some less frequently asked questions.


Q. Is a search graph created for every recognition result, or one for the whole recognition app?

A. This depends on which Linguist is used. The FlatLinguist generates the entire search graph up front and holds it in memory, so it is only useful for small-vocabulary recognition tasks. The LexTreeLinguist generates search states dynamically, allowing it to handle very large vocabularies.


Q. How does the Viterbi algorithm save computation over exhaustive search?

A. The Viterbi algorithm saves memory and computation by reusing the solutions of subproblems already solved within the larger solution. In this way, probability calculations that repeat in different paths through the search graph are not computed multiple times.

Viterbi cost ≈ n^2 – n^3

Exhaustive search cost ≈ 2^n – 3^n
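The reuse idea can be sketched with a hypothetical three-state /f/ /ay/ /v/ HMM and made-up probabilities: at each time step, only the best-scoring path into each state is kept, so shared path prefixes are never re-scored:

```python
import numpy as np

# Hypothetical toy HMM (states 0=/f/, 1=/ay/, 2=/v/); not a real model.
A = np.array([[0.6, 0.4, 0.0],   # transition probabilities
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.8, 0.1, 0.1],   # emission likelihoods P(o | state)
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
pi = np.array([1.0, 0.0, 0.0])   # must start in /f/

def viterbi(obs):
    v = pi * B[:, obs[0]]          # best score of any path ending in each state
    back = []
    for o in obs[1:]:
        scores = v[:, None] * A    # scores[i, j]: best path through i into j
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) * B[:, o]
    # trace the best final state back through the stored pointers
    path = [int(v.argmax())]
    for ptrs in reversed(back):
        path.append(int(ptrs[path[-1]]))
    return list(reversed(path))

print(viterbi([0, 1, 1, 2]))  # [0, 1, 1, 2] -> /f/ /ay/ /ay/ /v/
```

Each time step does a fixed amount of work per state pair, instead of re-walking every complete path as exhaustive search would.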


Q. Does the linguist use a grammar to construct the search graph if it is available?

A. Yes, a grammar graph is created


Q. What algorithm does the Pruner use?

A. Sphinx4 uses absolute and relative beam pruning


Absolute Beam Width – # active search paths
<property name="absoluteBeamWidth" value="5000"/>

Relative Beam Width – probability threshold
<property name="relativeBeamWidth" value="1E-120"/>

Word Insertion Probability – word break likelihood
<property name="wordInsertionProbability" value="0.7"/>

Language Weight – boosts language model scores
<property name="languageWeight" value="10.5"/>
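How the two beam settings interact during pruning can be sketched in a few lines. The (score, state) tuple representation below is hypothetical, not Sphinx4's actual ActiveList API:

```python
# Sketch of absolute + relative beam pruning over scored search paths.
def prune(paths, absolute_beam_width=5000, relative_beam_width=1e-120):
    best = max(score for score, _ in paths)
    threshold = best * relative_beam_width
    # relative beam: drop paths scoring far below the current best
    kept = [p for p in paths if p[0] >= threshold]
    # absolute beam: cap the number of surviving paths
    kept.sort(key=lambda p: p[0], reverse=True)
    return kept[:absolute_beam_width]

paths = [(0.9, "a"), (0.5, "b"), (1e-200, "c")]
print(prune(paths, absolute_beam_width=2))  # [(0.9, 'a'), (0.5, 'b')]
```

Path "c" falls below the relative threshold and is dropped first; the absolute width then caps whatever survives.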


Silence Insertion Probability – Likelihood of inserting silence

<property name="silenceInsertionProbability" value=".1"/>

Filler Insertion Probability – Likelihood of inserting filler words

<property name="fillerInsertionProbability" value="1E-10"/>


To call a Java example from Python:

import subprocess

subprocess.call(["java", "-mx1000m", "-jar",
                 "/Users/Username/sphinx4/bin/Transcriber.jar"])


Speech and Language Processing, 2nd Ed. Daniel Jurafsky and James Martin. Pearson, 2009.

Artificial Intelligence, 6th Ed. George Luger. Addison Wesley, 2009.

Sphinx Whitepaper:
http://cmusphinx.sourceforge.net/sphinx4/#whitepaper

Sphinx Forum:
https://sourceforge.net/projects/cmusphinx/forums