CS 4705: Hidden Markov Models
10/2/19
Slides adapted from Dan Jurafsky and James Martin




Page 2
Responses on "well"
• There is an oil well a few miles away from my house. Noun.
• Well! I never thought I would see that! Interjection.
• It is a well designed program. Adv.
• Tears welled in her eyes. Verb.
• The store sells fruit as well as vegetables. Conjunction.
• Are you well? Adj.
• He and his family were well off. Adjectival Phrase (?)

Page 3
Announcements
• Reading for today: Chapter 7-7.5 (NLP), Ch. 8.4 Speech and Language
• The TAs will be offering tutorials on the math of neural nets and, in particular, backpropagation

Pages 4-7
Disambiguating "race"
[Figure: successive builds of the "race" disambiguation example]

Page 8
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading.
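The two products above can be verified with a quick calculation; this minimal sketch just multiplies the probabilities listed on the slide.

```python
# Recompute the slide's bigram-tagger comparison for "race" after "to",
# followed by a temporal noun (NR), as in "to race tomorrow".
p_vb = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)

print(f"verb reading: {p_vb:.8f}")   # 0.00000027
print(f"noun reading: {p_nn:.11f}")  # 0.00000000032
print("choose:", "VB" if p_vb > p_nn else "NN")
```

The verb reading wins by roughly three orders of magnitude, which is why the tagger (correctly) prefers VB.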


Page 9
Definitions
• A weighted finite-state automaton adds probabilities to the arcs
• The probabilities on the arcs leaving any state must sum to one
• A Markov chain is a special case of a weighted automaton
  • The input sequence uniquely determines which states the automaton will go through
• Markov chains can't represent inherently ambiguous problems
  • They assign probabilities to unambiguous sequences

Page 10
Markov chain for weather
[Figure]

Page 11
Markov chain for words
[Figure]

Page 12
Markov chain = "First-order observable Markov Model"
• A set of states
  • Q = q1, q2 … qN; the state at time t is qt
• Transition probabilities:
  • A set of probabilities A = a01, a02, … an1, … ann
  • Each aij represents the probability of transitioning from state i to state j
  • The set of these is the transition probability matrix A
• Distinguished start and end states

a_ij = P(q_t = j | q_{t−1} = i),  1 ≤ i, j ≤ N

Σ_{j=1}^{N} a_ij = 1,  1 ≤ i ≤ N
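The row-sum constraint can be sketched in code; the HOT/COLD states and the numbers here are illustrative, not taken from the slides' figures.

```python
# A toy first-order Markov chain: two states and a row-stochastic
# transition matrix A.  Each row a_i* must sum to 1, since the chain
# must go somewhere from every state.
states = ["HOT", "COLD"]
A = [
    [0.7, 0.3],  # from HOT:  P(HOT->HOT), P(HOT->COLD)
    [0.4, 0.6],  # from COLD: P(COLD->HOT), P(COLD->COLD)
]

for i, row in enumerate(A):
    assert abs(sum(row) - 1.0) < 1e-9, f"row {i} of A must sum to 1"
print("A is row-stochastic")
```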

Page 13
Markov chain = "First-order observable Markov Model"
• The current state depends only on the previous state

P(q_i | q_1 … q_{i−1}) = P(q_i | q_{i−1})

Page 14
Another representation for the start state
• Instead of a start state:
• Special initial probability vector π
  • An initial probability distribution over start states
• Constraints:

π_i = P(q_1 = i),  1 ≤ i ≤ N

Σ_{j=1}^{N} π_j = 1

Page 15
The weather figure using π
[Figure]

Page 16
The weather figure: specific example
[Figure]

Page 17
Markov chain for weather
• What is the probability of 4 consecutive rainy days?
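Under a first-order chain, this is the initial probability of rain times three rainy-to-rainy self-transitions. The figure with the actual transition probabilities did not survive extraction, so this sketch uses made-up values.

```python
# Probability of 4 consecutive rainy days in a first-order Markov chain.
# Both numbers below are assumptions for illustration, not the slides' values.
pi_rainy = 0.5   # assumed initial probability of a rainy day
a_rr = 0.6       # assumed P(rainy | rainy)

# P(rainy, rainy, rainy, rainy) = pi_rainy * a_rr^3
p_four_rainy = pi_rainy * a_rr ** 3
print(round(p_four_rainy, 3))  # 0.108
```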


Page 18
[Figure]

Page 19
Hidden Markov Models
• We don't observe POS tags
  • We infer them from the words we see
• Observed events
• Hidden events

Page 20
HMM for Ice Cream
• You are a climatologist in the year 2799
• Studying global warming
• You can't find any records of the weather in New York, NY for the summer of 2007
• But you find Kathy McKeown's diary
  • Which lists how many ice creams Kathy ate every day that summer
• Our job: figure out how hot it was

Page 21
Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
  • See hot weather: we're in state hot
• But in part-of-speech tagging (and other things):
  • The output symbols are words
  • The hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
• This means we don't know which state we are in.

Page 22
Hidden Markov Models
• States Q = q1, q2 … qN
• Observations O = o1, o2 … oT
  • Each observation is a symbol drawn from a vocabulary V = {v1, v2, … vV}
• Transition probabilities
  • Transition probability matrix A = {aij}
• Observation likelihoods
  • Output probability matrix B = {bi(k)}
• Special initial probability vector π

π_i = P(q_1 = i),  1 ≤ i ≤ N

a_ij = P(q_t = j | q_{t−1} = i),  1 ≤ i, j ≤ N

b_i(k) = P(X_t = o_k | q_t = i)

Page 23
Hidden Markov Models
• Some constraints:

π_i = P(q_1 = i),  1 ≤ i ≤ N

Σ_{j=1}^{N} a_ij = 1,  1 ≤ i ≤ N

Σ_{k=1}^{M} b_i(k) = 1

Σ_{j=1}^{N} π_j = 1
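A minimal sketch of an HMM satisfying these constraints, using illustrative numbers for the ice cream example (the actual values live in the slides' figures, so these are assumptions):

```python
# A minimal HMM for the ice-cream example: hidden weather states, an
# observation vocabulary of ice-cream counts, and (pi, A, B) parameters.
states = ["HOT", "COLD"]
vocab = [1, 2, 3]                        # observable symbols: ice creams eaten

pi = {"HOT": 0.8, "COLD": 0.2}           # initial distribution
A = {"HOT":  {"HOT": 0.7, "COLD": 0.3},  # transition probability matrix
     "COLD": {"HOT": 0.4, "COLD": 0.6}}
B = {"HOT":  {1: 0.2, 2: 0.4, 3: 0.4},   # observation likelihoods b_i(k)
     "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

# Check the constraints listed on the slide.
assert abs(sum(pi.values()) - 1.0) < 1e-9          # pi sums to 1
for i in states:
    assert abs(sum(A[i].values()) - 1.0) < 1e-9    # each row of A sums to 1
    assert abs(sum(B[i].values()) - 1.0) < 1e-9    # each b_i(.) sums to 1
print("pi, A, B satisfy the HMM constraints")
```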

Page 24
Assumptions
• Markov assumption:

P(q_i | q_1 … q_{i−1}) = P(q_i | q_{i−1})

• Output-independence assumption:

P(o_t | o_1 … o_{t−1}, q_1 … q_t) = P(o_t | q_t)

Page 25
McKeown task
• Given:
  • Ice cream observation sequence: 2, 1, 3, 2, 2, 2, 3 …
• Produce:
  • Weather sequence: H, C, H, H, H, C …

Page 26
HMM for ice cream
[Figure]

Page 27
Different types of HMM structure
• Bakis = left-to-right
• Ergodic = fully-connected

Page 28
Transitions between the hidden states of the HMM, showing the A probabilities
[Figure]

Page 29
B observation likelihoods for the POS HMM
[Figure]

Page 30
Three fundamental problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?

Page 31
[Figure]

Page 32
Decoding
• The best hidden sequence:
  • Weather sequence in the ice cream task
  • POS sequence given an input sentence
• We could use argmax over the probability of each possible hidden state sequence
  • Why not?
• Viterbi algorithm
  • A dynamic programming algorithm
  • Uses a dynamic programming trellis
  • Each trellis cell v_t(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence
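The trellis computation can be sketched as follows, applied to the ice cream task. The parameter values are assumptions for illustration; the slides' actual numbers are in figures that did not survive extraction.

```python
# A sketch of the Viterbi algorithm.  v[t][j] holds the probability of the
# most likely state sequence ending in state j after the first t+1
# observations; back[t][j] records the argmax for backtracing.
def viterbi(obs, states, pi, A, B):
    v = [{j: pi[j] * B[j][obs[0]] for j in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for j in states:
            # Extend each path ending in i by the transition i->j,
            # then weight by the likelihood of the observation at t.
            best_i = max(states, key=lambda i: v[t - 1][i] * A[i][j])
            v[t][j] = v[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]
            back[t][j] = best_i
    # Trace back from the most probable final state.
    last = max(states, key=lambda j: v[-1][j])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), v[-1][last]

# Assumed ice-cream parameters (illustrative, not the slides' figures):
states = ["HOT", "COLD"]
pi = {"HOT": 0.8, "COLD": 0.2}
A = {"HOT": {"HOT": 0.7, "COLD": 0.3}, "COLD": {"HOT": 0.4, "COLD": 0.6}}
B = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

path, prob = viterbi([3, 1, 3], states, pi, A, B)
print(path)  # ['HOT', 'HOT', 'HOT']
```

Each cell is filled by a max over incoming paths, which is exactly why argmax over all N^T full sequences is unnecessary: the trellis shares the work across sequences.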


Page 33
Viterbi intuition: we are looking for the best 'path'
[Figure: trellis for "promised to back the bill", with candidate tags VBD, VBN, TO, VB, JJ, NN, RB, DT, NNP per word, and states S1-S5]

Slide from Dekang Lin

Page 34
Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.
• An extension of a path from state i at time t−1 is computed by multiplying the previous path probability v_{t−1}(i), the transition probability a_ij, and the observation likelihood b_j(o_t).

Page 35
The Viterbi Algorithm
[Figure]

Page 36
The A matrix for the POS HMM
[Figure]
• What is P(VB|TO)? What is P(NN|TO)? Why does this make sense?
• What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?

Page 37
[Figure]

Page 38
[Figure]

Page 39
[Figure]
Why does this make sense?

Page 40
The B matrix for the POS HMM
[Figure]
Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.

Page 41
Viterbi example
[Figure: Viterbi trellis at t = 1]

Page 42
Viterbi example
[Figure: trellis at t = 1; one path marked X]

Page 43
Viterbi example
[Figure: trellis at t = 1, extending from state i = S to state j = NN]

Page 44
The A matrix for the POS HMM
[Figure]

Page 45
Viterbi example
[Figure: trellis at t = 1 with the value .041 filled in]

Page 46
The B matrix for the POS HMM
[Figure]
Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.

Page 47
Viterbi example
[Figure: trellis with values .041, 0, 0]

Page 48
Viterbi example
[Figure: trellis with values .041, 0, 0, 0, 0, .025]

Page 49
Viterbi example
[Figure: trellis highlighting one node]
Show the 4 formulas you would use to compute the value at this node and the max.

Page 50
Computing the likelihood of an observation
• Forward algorithm
• Exactly like the Viterbi algorithm, except:
  • To compute the probability of a state, sum the probabilities from each path
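The change from Viterbi can be shown directly: replace the max over incoming paths with a sum, and the final value is P(O | λ). The ice-cream parameters below are assumed for illustration, not taken from the slides.

```python
# The forward algorithm: same trellis shape as Viterbi, but each cell sums
# over incoming paths instead of taking the max, giving the total
# likelihood P(O | lambda) marginalized over all hidden state sequences.
def forward(obs, states, pi, A, B):
    alpha = {j: pi[j] * B[j][obs[0]] for j in states}
    for t in range(1, len(obs)):
        alpha = {j: sum(alpha[i] * A[i][j] for i in states) * B[j][obs[t]]
                 for j in states}
    return sum(alpha.values())

# Assumed ice-cream parameters (illustrative values):
states = ["HOT", "COLD"]
pi = {"HOT": 0.8, "COLD": 0.2}
A = {"HOT": {"HOT": 0.7, "COLD": 0.3}, "COLD": {"HOT": 0.4, "COLD": 0.6}}
B = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

likelihood = forward([3, 1, 3], states, pi, A, B)
print(round(likelihood, 6))  # 0.026264
```

Note that this likelihood is at least as large as the probability of the single best Viterbi path, since it sums over every path that could have produced the observations.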


Page 51
Error Analysis: ESSENTIAL!!!
• Look at a confusion matrix
• See what errors are causing problems:
  • Noun (NN) vs Proper Noun (NNP) vs Adj (JJ)
  • Adverb (RB) vs Prep (IN) vs Noun (NN)
  • Preterite (VBD) vs Participle (VBN) vs Adjective (JJ)