
CS 224S / LINGUIST 285
Spoken Language Processing

Andrew Maas, Stanford University

Spring 2017

Lecture 3: ASR: HMMs, Forward, Viterbi
Original slides by Dan Jurafsky

Fun informative read on phonetics: The Art of Language Invention. David J. Peterson. 2015. http://www.artoflanguageinvention.com/books/

Outline for Today
• ASR Architecture
• Decoding with HMMs
  • Forward
  • Viterbi Decoding
• How this fits into the ASR component of the course
• On your own: N-grams and Language Modeling
• Apr 12: Training, Advanced Decoding
• Apr 17: Feature Extraction, GMM Acoustic Modeling
• Apr 24: Neural Network Acoustic Models
• May 1: End-to-end neural network speech recognition

The Noisy Channel Model

• Search through the space of all possible sentences.
• Pick the one that is most probable given the waveform.

!"#$%&$'!('!)'

!"#$%&'!&()&(%&

!"#$%&'()!!*+

*#&!!'+)'!"#$%&,!"#$%&*

!"#$%&+

!"#$%&,

-.'/#!0%'1&'

)2&'.""3'".'4"5&666

-.'/#!0%'1&'

)2&'.""3'".'4"5&666

!"#$!"%75&$8'2+998'.+/048

-('+'2"4&'0(')2&'*$"#(3

&&&

-.'/#!0%'1&')2&'.""3'".'4"5&

The Noisy Channel Model (II)
• What is the most likely sentence out of all sentences in the language L given some acoustic input O?
• Treat acoustic input O as sequence of individual observations:
  O = o1, o2, o3, …, ot
• Define a sentence as a sequence of words:
  W = w1, w2, w3, …, wn

Noisy Channel Model (III)
• Probabilistic implication: pick the highest-probability sentence W:
• We can use Bayes' rule to rewrite this:
• Since the denominator is the same for each candidate sentence W, we can ignore it for the argmax:

  Ŵ = argmax_{W ∈ L} P(W | O)

    = argmax_{W ∈ L} P(O | W) P(W) / P(O)     (by Bayes' rule)

    = argmax_{W ∈ L} P(O | W) P(W)            (dropping the denominator)
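A minimal sketch of this argmax in code, assuming we already have acoustic likelihoods P(O|W) and language-model priors P(W) for a handful of hypothetical candidate sentences (the numbers are made up); real decoders search a vastly larger space and work in log space to avoid underflow:

```python
import math

# Hypothetical (made-up) scores for three candidate sentences:
# each entry is (P(O|W) from the acoustic model, P(W) from the language model).
candidates = {
    "if music be the food of love": (1e-25, 1e-6),
    "if music bee the food of love": (2e-25, 1e-9),
    "it music be the foot of love": (5e-26, 1e-8),
}

def log_score(likelihood, prior):
    # log P(O|W) + log P(W): adding logs avoids numerical underflow
    return math.log(likelihood) + math.log(prior)

# W-hat = argmax over candidate sentences W of P(O|W) * P(W)
w_hat = max(candidates, key=lambda w: log_score(*candidates[w]))
print(w_hat)   # -> "if music be the food of love"
```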

Speech Recognition Architecture

!"#$%&'()*"'%+&")",%&'!%-./

0'+$$-'/)1!.+$%-!)2.3"(

2455)*"'%+&"$

#6./")(-7"(-6..3$

822)(",-!./

!9:&';)('/:+':");.3"(

<-%"&=-)>"!.3"&

!"#$%&!'#()#*+)#",,-#,"#.,/)000

!

"

#$"%#$!&"%

Noisy channel model

  Ŵ = argmax_{W ∈ L} P(O | W) P(W)
                     likelihood  prior

!"#$%&$'!('!)'

!"#$%&'!&()&(%&

!"#$%&'()!!*+

*#&!!'+)'!"#$%&,!"#$%&*

!"#$%&+

!"#$%&,

-.'/#!0%'1&'

)2&'.""3'".'4"5&666

-.'/#!0%'1&'

)2&'.""3'".'4"5&666

!"#$!"%75&$8'2+998'.+/048

-('+'2"4&'0(')2&'*$"#(3

&&&

-.'/#!0%'1&')2&'.""3'".'4"5&

The noisy channel model
Ignoring the denominator leaves us with two factors: P(Source) and P(Signal | Source).

Speech Architecture meets Noisy Channel

Decoding Architecture: five easy pieces

• Feature Extraction:
  • 39 "MFCC" features
• Acoustic Model:
  • Gaussians for computing p(o|q)
• Lexicon/Pronunciation Model
  • HMM: what phones can follow each other
• Language Model
  • N-grams for computing p(wi|wi−1)
• Decoder
  • Viterbi algorithm: dynamic programming for combining all these to get the word sequence from speech


Lexicon
• A list of words
• Each one with a pronunciation in terms of phones
• We get these from an on-line pronunciation dictionary
• CMU dictionary: 127K words
  http://www.speech.cs.cmu.edu/cgi-bin/cmudict
• We'll represent the lexicon as an HMM
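A toy sketch of a lexicon and how a word expands into subphone HMM states (three subphones per phone, matching the word models shown later in the deck); the dictionary entries and the helper function here are illustrative, not CMUdict itself:

```python
# Toy lexicon: each word maps to a phone sequence (CMUdict-style phone symbols).
lexicon = {
    "six":  ["S", "IH", "K", "S"],
    "five": ["F", "AY", "V"],
}

def word_to_hmm_states(word):
    """Expand a word into a left-to-right sequence of subphone HMM states,
    three subphones per phone."""
    return [f"{phone}{i}" for phone in lexicon[word] for i in (1, 2, 3)]

print(word_to_hmm_states("six"))
# ['S1', 'S2', 'S3', 'IH1', 'IH2', 'IH3', 'K1', 'K2', 'K3', 'S1', 'S2', 'S3']
```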


HMMs for speech

Phones are not homogeneous!

[Figure: waveform and spectrogram of "ay k" (roughly 0.48–0.94 s, 0–5000 Hz), showing that the acoustics change over the course of a single phone.]

Each phone has 3 subphones

!"#$

%&&

'()& *"+,-.%/.01#+2

%,, %$$

%0& %&, %,$ %$2

Resulting HMM word model for “six”

!"#

!"$

!"%

&'()' *+,-#

-$

-%

.#

.$

.%

-#

-$

-%

HMM for the digit recognition task

Markov chain for weather

!"#$"%

&'()

*+,-.

/012

30456

#66

#%6

#22

#26

#%.

#%2

#62

#2.

#..

#6)

#2)

#6.

#.)

#.6

#.2

Markov chain for words

!"#$"%

&'()

*+,"-.

,/0

/'1*2

#22

#%2

#00

#02

#%.

#%0

#20

#0.

#..

#2)

#0)

#.0

#.)

#.2#

2.

Markov chain = First-order observable Markov Model

• A set of states:
  • Q = q1, q2 … qN; the state at time t is qt
• Transition probabilities:
  • A set of probabilities A = a01, a02, …, an1, …, ann
  • Each aij represents the probability of transitioning from state i to state j
  • The set of these is the transition probability matrix A
• Distinguished start and end states

  aij = P(qt = j | qt−1 = i)   1 ≤ i, j ≤ N

  Σ_{j=1}^{N} aij = 1   1 ≤ i ≤ N

Markov chain = First-order observable Markov Model

Current state only depends on previous state

  Markov Assumption: P(qi | q1 … qi−1) = P(qi | qi−1)

Another representation for start state

• Instead of start state:
  • Special initial probability vector π
  • An initial distribution over probability of start states
• Constraints:

  πi = P(q1 = i)   1 ≤ i ≤ N

  Σ_{j=1}^{N} πj = 1

The weather figure using pi

The weather figure: specific example

Markov chain for weather
• What is the probability of 4 consecutive warm days?
• The sequence is warm-warm-warm-warm
• I.e., the state sequence is 3-3-3-3
• P(3,3,3,3) = π3 · a33 · a33 · a33 = 0.2 × (0.6)³ = 0.0432
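A minimal sketch of that computation in code, assuming only the two numbers used on the slide (π3 = 0.2, a33 = 0.6):

```python
# Probability of the state sequence 3-3-3-3 (warm-warm-warm-warm):
# start probability pi_3, then one a_33 transition per additional day.
pi = {3: 0.2}          # initial probability of state 3 (warm)
a = {(3, 3): 0.6}      # self-transition probability a_33

def chain_prob(states):
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= a[(prev, cur)]
    return p

print(round(chain_prob([3, 3, 3, 3]), 4))   # 0.2 * 0.6**3 = 0.0432
```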

How about?

• Hot hot hot hot
• Cold hot cold hot
• What does the difference in these probabilities tell you about the real-world weather info encoded in the figure?

HMM for Ice Cream

• You are a climatologist in the year 2799
  • Studying global warming
• You can't find any records of the weather in Baltimore, MD for the summer of 2008
• But you find Jason Eisner's diary
  • Which lists how many ice creams Jason ate every day that summer
• Our job: figure out how hot it was

Hidden Markov Model

• For Markov chains, output symbols = state symbols
  • See hot weather: we're in state hot
• But not in speech recognition
  • Output symbols: vectors of acoustics (cepstral features)
  • Hidden states: phones
• So we need an extension!
  • A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
  • This means we don't know which state we are in.

Hidden Markov Models

Assumptions
• Markov assumption:

  P(qi | q1 … qi−1) = P(qi | qi−1)

• Output-independence assumption:

  P(ot | o1, …, ot−1, q1, …, qt) = P(ot | qt)

Eisner task

Given the observed ice cream sequence:

1, 2, 3, 2, 2, 2, 3, …

produce the hidden weather sequence:

H, C, H, H, H, C, …

HMM for ice cream

!"#$"%

&'()*+',-

!"

./-010&'()2000000000034

./*010&'()200005000036

./7010&'()200000000003-

3*

38

393:

36

37

./-010+',200000000003*

./*010+',200005000036

./7010+',2000000000036

!#
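For reference in the sketches that follow, here are the ice cream HMM parameters as they can be read off this figure and the trellis slides later in the deck (the end state is omitted for simplicity):

```python
# Ice cream HMM: hidden states HOT ("H") and COLD ("C"); observations 1, 2, 3.
states = ["H", "C"]

pi = {"H": 0.8, "C": 0.2}                       # start probabilities

A = {                                           # transition probabilities a_ij
    ("H", "H"): 0.7, ("H", "C"): 0.3,
    ("C", "H"): 0.4, ("C", "C"): 0.6,
}

B = {                                           # emission probabilities b_j(o)
    "H": {1: 0.2, 2: 0.4, 3: 0.4},
    "C": {1: 0.5, 2: 0.4, 3: 0.1},
}
```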

Different types of HMM structure

Bakis = left-to-right          Ergodic = fully connected

The Three Basic Problems for HMMs

Problem 1 (Evaluation): Given the observation sequence O = (o1 o2 … oT) and an HMM model λ = (A, B), how do we efficiently compute P(O | λ), the probability of the observation sequence given the model?

Problem 2 (Decoding): Given the observation sequence O = (o1 o2 … oT) and an HMM model λ = (A, B), how do we choose a corresponding state sequence Q = (q1 q2 … qT) that is optimal in some sense (i.e., best explains the observations)?

Problem 3 (Learning): How do we adjust the model parameters λ = (A, B) to maximize P(O | λ)?

Jack Ferguson at IDA in the 1960s

Problem 1: computing the observation likelihood

Given the following HMM:

How likely is the observation sequence 3 1 3?

!"#$"%

&'()*+',-

!"

./-010&'()2000000000034

./*010&'()200005000036

./7010&'()200000000003-

3*

38

393:

36

37

./-010+',200000000003*

./*010+',200005000036

./7010+',2000000000036

!#

How to compute likelihood

• For a Markov chain, we just follow the states 3 1 3 and multiply the probabilities
• But for an HMM, we don't know what the states are!
• So let's start with a simpler situation:
  • Computing the observation likelihood for a given hidden state sequence
  • Suppose we knew the weather and wanted to predict how much ice cream Jason would eat
  • i.e., P(3 1 3 | H H C)

Computing likelihood of 3 1 3 given hidden state sequence

Computing joint probability of observation and state sequence
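A small sketch of both computations for O = 3 1 3 and Q = H H C, using the ice cream HMM parameters above (end state ignored):

```python
pi = {"H": 0.8, "C": 0.2}
A = {("H", "H"): 0.7, ("H", "C"): 0.3, ("C", "H"): 0.4, ("C", "C"): 0.6}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

obs, hidden = [3, 1, 3], ["H", "H", "C"]

# P(O | Q): multiply the emission probabilities along the given state sequence.
p_obs_given_q = 1.0
for o, q in zip(obs, hidden):
    p_obs_given_q *= B[q][o]
print(p_obs_given_q)   # .4 * .2 * .1 = 0.008

# P(O, Q): also multiply in the start and transition probabilities.
p_joint = pi[hidden[0]]
for t in range(len(obs)):
    if t > 0:
        p_joint *= A[(hidden[t - 1], hidden[t])]
    p_joint *= B[hidden[t]][obs[t]]
print(p_joint)         # .8 * .4 * .7 * .2 * .3 * .1 = 0.001344
```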

Computing total likelihood of 3 1 3
• We would need to sum over
  • Hot hot cold
  • Hot hot hot
  • Hot cold hot
  • …
• How many possible hidden state sequences are there for this sequence?
• How about in general for an HMM with N hidden states and a sequence of T observations?
  • N^T
• So we can't just do a separate computation for each hidden state sequence.
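A brute-force sketch that sums the joint probability over all 2^3 = 8 hidden state sequences (same toy parameters as above); this is fine for T = 3 but grows as N^T, which is exactly why the forward algorithm is needed:

```python
from itertools import product

pi = {"H": 0.8, "C": 0.2}
A = {("H", "H"): 0.7, ("H", "C"): 0.3, ("C", "H"): 0.4, ("C", "C"): 0.6}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

obs = [3, 1, 3]

def joint(obs, hidden):
    # P(O, Q) = pi(q1) b_q1(o1) * prod over t of a(q_{t-1}, q_t) b_qt(ot)
    p = pi[hidden[0]] * B[hidden[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[(hidden[t - 1], hidden[t])] * B[hidden[t]][obs[t]]
    return p

# Sum over every possible hidden state sequence of length T.
total = sum(joint(obs, seq) for seq in product("HC", repeat=len(obs)))
print(total)   # P(3 1 3) ≈ 0.0263 -- N**T terms, infeasible for real T
```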

Instead: the Forward algorithm
• A dynamic programming algorithm
• Just like Minimum Edit Distance or CKY Parsing
• Uses a table to store intermediate values
• Idea:
  • Compute the likelihood of the observation sequence
  • By summing over all possible hidden state sequences
  • But doing this efficiently
    • By folding all the sequences into a single trellis

The forward algorithm
• The goal of the forward algorithm is to compute

  P(o1, o2, …, oT, qT = qF | λ)

• We'll do this by recursion

The forward algorithm
• Each cell of the forward algorithm trellis, αt(j):
  • Represents the probability of being in state j
  • After seeing the first t observations
  • Given the automaton
• Each cell thus expresses the following probability:

  αt(j) = P(o1, o2, …, ot, qt = j | λ)

The Forward Recursion

The Forward Trellis

!"#$"

%

&

%

&

%

&

'()

*+&,!"#$"-./.*+0,&-

12./.13

*+%,%-./.*+3,%-

14./.12

*+&,&-./.*+3,&-

15./.16

*+&,%-./.*+3,&-10./.16

*+%,&-./.*

+3,%-

17./.12

*+%,!"#$"-/*+0,%-

18./.17

!!"#$9102

!!"!$.9.1:2

!#"#$9.102/1:37.;.1:2/1:8.9.1::5:8

!#"!$.9.102/136.;.1:2/10:.9.1:67

!"#$" !"#$" !"#$"

"

&

%

'() '() '()<=

<2

<3

<:

>3

0

>2 >0

3 0

.32*.14+.02*.08=.0464

We update each cell

!"#$ !"

%$&

%'&

%(&

%)&

*&+!",

!!"#$%&"'&!!()"'$&%-&.*&+!",&

/$

/'

/)

/(

/$

/&

/'

/$

/'

!"0$!"#'

/$

/'

/) /)

/( /(

!!()"*$

!!()"+$

!!()",$

!!()")$

!!(,"*$

!!(,"+$

!!(,",$

!!(,")$

The Forward Algorithm
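A minimal Python sketch of the forward algorithm on the ice cream HMM (end state omitted); the first two time steps reproduce the α values from the trellis slide:

```python
pi = {"H": 0.8, "C": 0.2}
A = {("H", "H"): 0.7, ("H", "C"): 0.3, ("C", "H"): 0.4, ("C", "C"): 0.6}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}
states = ["H", "C"]

def forward(obs):
    # alpha[t][j] = P(o_1 ... o_t, q_t = j)
    alpha = [{j: pi[j] * B[j][obs[0]] for j in states}]
    for t in range(1, len(obs)):
        alpha.append({
            j: sum(alpha[t - 1][i] * A[(i, j)] for i in states) * B[j][obs[t]]
            for j in states
        })
    return alpha

alpha = forward([3, 1, 3])
print(alpha[0])                  # ≈ {'H': 0.32, 'C': 0.02}
print(alpha[1])                  # ≈ {'H': 0.0464, 'C': 0.054}
print(sum(alpha[-1].values()))   # total likelihood P(3 1 3) ≈ 0.0263
```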

Decoding
• Given an observation sequence
  • 3 1 3
• And an HMM
• The task of the decoder
  • To find the best hidden state sequence
• Given the observation sequence O = (o1 o2 … oT) and an HMM model λ = (A, B), how do we choose a corresponding state sequence Q = (q1 q2 … qT) that is optimal in some sense (i.e., best explains the observations)?

Decoding
• One possibility:
  • For each hidden state sequence Q
    • HHH, HHC, HCH, …
  • Compute P(O|Q)
  • Pick the highest one
• Why not?
  • N^T
• Instead:
  • The Viterbi algorithm
  • Is again a dynamic programming algorithm
  • Uses a similar trellis to the Forward algorithm

Viterbi intuition
• We want to compute the joint probability of the observation sequence together with the best state sequence

  max over q0, q1, …, qT of  P(q0, q1, …, qT, o1, o2, …, oT, qT = qF | λ)

Viterbi Recursion

The Viterbi trellis

!"#$"

%

&

%

&

%

&

'()

*+&,!"#$"-./.*+0,&-

12./.13

*+%,%-./.*+3,%-

14./.12

*+&,&-./.*+3,&-

15./.16

*+&,%-./.*+3,&-10./.16

*+%,&-./.*

+3,%-

17./.12

*+%,!"#$"-/*+0,%-

18./.17

!"#$%9102

!"#"%.9.1:2

!$#$%9.;#<+102/1:37=.1:2/1:8-.9.1:778

!$#"%.9.;#<+102/136=.1:2/10:-.9.1:78

!"#$" !"#$" !"#$"

"

&

%

'() '() '()>?

>2

>3

>:

@3 @2 @0

0 3 0

/

Viterbi intuition
• Process the observation sequence left to right
• Filling out the trellis
• Each cell:

  vt(j) = max over i of  vt−1(i) · aij · bj(ot)

Viterbi Algorithm
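A minimal Python sketch of Viterbi with backpointers on the ice cream HMM (end state omitted), mirroring the forward sketch but replacing the sum with a max and keeping backpointers for the backtrace:

```python
pi = {"H": 0.8, "C": 0.2}
A = {("H", "H"): 0.7, ("H", "C"): 0.3, ("C", "H"): 0.4, ("C", "C"): 0.6}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}
states = ["H", "C"]

def viterbi(obs):
    v = [{j: pi[j] * B[j][obs[0]] for j in states}]   # v[t][j]: best path prob
    back = [{}]                                       # back[t][j]: best predecessor
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for j in states:
            best_i = max(states, key=lambda i: v[t - 1][i] * A[(i, j)])
            v[t][j] = v[t - 1][best_i] * A[(best_i, j)] * B[j][obs[t]]
            back[t][j] = best_i
    # Backtrace: start from the best final state and follow backpointers.
    last = max(states, key=lambda j: v[-1][j])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), v[-1][last]

path, prob = viterbi([3, 1, 3])
print(path, prob)   # -> ['H', 'H', 'H'] with probability ≈ 0.0125
```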

Viterbi backtrace

!"#$"

%

&

%

&

%

&

'()

*+&,!"#$"-./.*+0,&-

12./.13

*+%,%-./.*+3,%-

14./.12

*+&,&-./.*+3,&-

15./.16

*+&,%-./.*+3,&-10./.16

*+%,&-./.*

+3,%-

17./.12

*+%,!"#$"-/*+0,%-

18./.17

!"#$%9102

!"#"%.9.1:2

!$#$%9.;#<+102/1:37=.1:2/1:8-.9.1:778

!$#"%.9.;#<+102/136=.1:2/10:-.9.1:78

!"#$" !"#$" !"#$"

"

&

%

'() '() '()>?

>2

>3

>:

@3 @2 @0

0 3 0

/

HMMs for Speech
• We haven't yet shown how to learn the A and B matrices for HMMs;
  • we'll do that on Thursday
  • The Baum-Welch (Forward-Backward algorithm)
• But let's return to thinking about speech

Reminder: a word looks like this:

!"#

!"$

!"%

&'()' *+,-#

-$

-%

.#

.$

.%

-#

-$

-%

HMM for digit recognition task

The Evaluation (forward) problem for speech
• The observation sequence O is a series of MFCC vectors
• The hidden states W are the phones and words
• For a given phone/word string W, our job is to evaluate P(O|W)
• Intuition: how likely is the input to have been generated by just that word string W?

Evaluation for speech: Summing over all different paths!
• f ay ay ay ay v v v v
• f f ay ay ay ay v v v
• f f f f ay ay ay ay v
• f f ay ay ay ay ay ay v
• f f ay ay ay ay ay ay ay ay v
• f f ay v v v v v v v

The forward lattice for “five”

The forward trellis for “five”

Viterbi trellis for “five”

Viterbi trellis for “five”

Search space with bigrams

!"!" !"## # $$ $ %&%& %&

'('( '(&& & )) )

*&*& *&++ +

,,,

-./%)0/1/+&%/2 -./+&%/1/%)0/2

-./%)0/1/%)0/2

-./+&%/1/+&%/2

-./%)0/1/#0$%/2

-./#0$%/1/#0$%/2

-./#0$%/1/%)0/2

-./+&%/1/#0$%/2

-./#0$%/1/+&%/2

Viterbi trellis


Viterbi backtrace


Summary: ASR Architecture
• Five easy pieces: ASR Noisy Channel architecture
  • Feature Extraction:
    • 39 "MFCC" features
  • Acoustic Model:
    • Gaussians for computing p(o|q)
  • Lexicon/Pronunciation Model
    • HMM: what phones can follow each other
  • Language Model
    • N-grams for computing p(wi|wi−1)
  • Decoder
    • Viterbi algorithm: dynamic programming for combining all these to get the word sequence from speech

