
CS 4705: Hidden Markov Models


Slides adapted from Dan Jurafsky and James Martin

Responses on "well"
• There is an oil well a few miles away from my house. Noun.
• Well! I never thought I would see that! Interjection.
• It is a well designed program. Adv.
• Tears welled in her eyes. Verb.
• The store sells fruit as well as vegetables. Conjunction.
• Are you well? Adj.
• He and his family were well off. Adjectival Phrase (?)


Announcements
• Reading for today: Chapter 7-7.5 (NLP), Ch. 8.4 Speech and Language
• The TAs will be offering tutorials on the math of neural nets and, in particular, backpropagation


Disambiguating "race"

[Figures: four slides stepping through the two candidate tag sequences for the "race" example (race tagged as NN vs. VB), leading to the probability comparison below.]

• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading.
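The arithmetic on this slide can be checked directly. A minimal Python sketch (not from the lecture) that multiplies the values above and compares the two readings:

```python
# Values copied from the bullets above (the "race" example).
p_verb = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_noun = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)

print(f"verb reading: {p_verb:.2e}")  # ~2.7e-07
print(f"noun reading: {p_noun:.2e}")  # ~3.2e-10
print("tag chosen:", "VB" if p_verb > p_noun else "NN")
```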


Definitions
• A weighted finite-state automaton adds probabilities to the arcs
• The sum of the probabilities leaving any state must sum to one
• A Markov chain is a special case of a WFST
  • the input sequence uniquely determines which states the automaton will go through
• Markov chains can't represent inherently ambiguous problems
  • Assigns probabilities to unambiguous sequences


Markov chain for weather


Markov chain for words


Markov chain = "First-order observable Markov Model"
• a set of states
  • Q = q1, q2 … qN; the state at time t is qt
• Transition probabilities:
  • a set of probabilities A = a01 a02 … an1 … ann
  • Each aij represents the probability of transitioning from state i to state j
  • The set of these is the transition probability matrix A

  aij = P(qt = j | qt−1 = i),  1 ≤ i, j ≤ N
  Σ_{j=1}^{N} aij = 1,  1 ≤ i ≤ N

• Distinguished start and end states
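As a concrete illustration (not part of the slides), the transition matrix A and its row-sum constraint can be written down directly; the states and numbers below are made up:

```python
import numpy as np

# Hypothetical 3-state weather chain; states and probabilities are illustrative only.
states = ["HOT", "COLD", "RAINY"]
A = np.array([
    [0.6, 0.2, 0.2],   # transitions out of HOT
    [0.3, 0.5, 0.2],   # transitions out of COLD
    [0.2, 0.3, 0.5],   # transitions out of RAINY
])

# Constraint from the formula above: every row of A must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
```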

Markov chain = "First-order observable Markov Model"
• Current state only depends on previous state

  P(qi | q1 … qi−1) = P(qi | qi−1)

Another representation for start state
• Instead of start state
• Special initial probability vector π
  • An initial distribution over probability of start states
• Constraints:

  πi = P(q1 = i),  1 ≤ i ≤ N
  Σ_{j=1}^{N} πj = 1

The weather figure using π


The weather figure: specific example


Markov chain for weather
• What is the probability of 4 consecutive rainy days?
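The figure with the actual transition probabilities did not survive extraction, but the computation the question asks for is just a product of one starting probability and three self-transitions. A sketch with placeholder numbers:

```python
# Placeholder values; the real ones appear in the weather figure on the slide.
pi_rainy = 0.3   # probability the chain starts in state RAINY
a_rr = 0.5       # self-transition probability a_{RAINY,RAINY}

# P(rainy, rainy, rainy, rainy) = pi_RAINY * a_rr^3
p_four_rainy = pi_rainy * a_rr ** 3
print(p_four_rainy)   # 0.0375 with these made-up numbers
```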


Hidden Markov Models
• We don't observe POS tags
  • We infer them from the words we see
• Observed events
• Hidden events


HMM for Ice Cream
• You are a climatologist in the year 2799
• Studying global warming
• You can't find any records of the weather in New York, NY for summer of 2007
• But you find Kathy McKeown's diary
  • Which lists how many ice-creams Kathy ate every day that summer
• Our job: figure out how hot it was


Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
  • See hot weather: we're in state hot
• But in part-of-speech tagging (and other things)
  • The output symbols are words
  • The hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
• This means we don't know which state we are in.


Hidden Markov Models
• States Q = q1, q2 … qN
• Observations O = o1, o2 … oN
  • Each observation is a symbol from a vocabulary V = {v1, v2, …, vV}
• Transition probabilities
  • Transition probability matrix A = {aij}

  aij = P(qt = j | qt−1 = i),  1 ≤ i, j ≤ N

• Observation likelihoods
  • Output probability matrix B = {bi(k)}

  bi(k) = P(Xt = ok | qt = i)

• Special initial probability vector π

  πi = P(q1 = i),  1 ≤ i ≤ N
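Putting the components together for the ice-cream task, an HMM is fully specified by (π, A, B). A small sketch with illustrative numbers (these are placeholders, not the values from the lecture figure):

```python
import numpy as np

states = ["HOT", "COLD"]     # hidden weather states Q
vocab = [1, 2, 3]            # observable symbols V: ice creams eaten per day

pi = np.array([0.8, 0.2])                  # initial probabilities pi_i
A = np.array([[0.7, 0.3],                  # transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],             # b_HOT(1), b_HOT(2), b_HOT(3)
              [0.5, 0.4, 0.1]])            # b_COLD(1), b_COLD(2), b_COLD(3)

# The constraints on the next slide: rows of A and B, and pi itself, sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```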

Hidden Markov Models
• Some constraints


πi = P(q1 = i),  1 ≤ i ≤ N
Σ_{j=1}^{N} aij = 1,  1 ≤ i ≤ N
Σ_{k=1}^{M} bi(k) = 1
Σ_{j=1}^{N} πj = 1

Assumptions
• Markov assumption:

  P(qi | q1 … qi−1) = P(qi | qi−1)

• Output-independence assumption:

  P(ot | o1 … ot−1, q1 … qt) = P(ot | qt)
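Together these two assumptions are what make HMM computations tractable: the joint probability of a state sequence and an observation sequence factors into local terms. This factorization is not on the slide, but it follows directly from the two assumptions above:

```latex
P(O, Q) \;=\; \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
```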

McKeown task
• Given
  • Ice Cream Observation Sequence: 2, 1, 3, 2, 2, 2, 3 …
• Produce:
  • Weather Sequence: H, C, H, H, H, C …


HMM for ice cream


Different types of HMM structure

Bakis = left-to-right

Ergodic = fully-connected

Transitions between the hidden states of the HMM, showing the A probabilities


B observation likelihoods for the POS HMM

Three fundamental Problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?


Decoding
• The best hidden sequence
  • Weather sequence in the ice cream task
  • POS sequence given an input sentence
• We could use argmax over the probability of each possible hidden state sequence
  • Why not?
• Viterbi algorithm
  • Dynamic programming algorithm
  • Uses a dynamic programming trellis
• Each trellis cell vt(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence


Viterbi intuition: we are looking for the best 'path'

[Figure: trellis of candidate tags for "promised to back the bill" (promised: VBD/VBN; to: TO; back: VB/JJ/NN/RB; the: DT; bill: NNP/VB/NN), with states S1-S5. Slide from Dekang Lin.]

Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.
• An extension of a path from state i at time t−1 is computed by multiplying the previous path probability vt−1(i), the transition probability aij, and the observation likelihood bj(ot).


The Viterbi Algorithm
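The pseudocode figure for this slide was lost in extraction. Below is a minimal Python sketch of the standard Viterbi recursion over the trellis, assuming pi/A/B arrays shaped as in the earlier ice-cream sketch; it is an illustration of the algorithm, not the exact figure from the lecture:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence for a list of observation indices."""
    N, T = A.shape[0], len(obs)
    v = np.zeros((N, T))                  # v[j, t]: best path probability ending in state j at time t
    backptr = np.zeros((N, T), dtype=int)

    v[:, 0] = pi * B[:, obs[0]]           # initialization step
    for t in range(1, T):
        for j in range(N):                # recursion: max over predecessor states i
            scores = v[:, t - 1] * A[:, j] * B[j, obs[t]]
            v[j, t] = scores.max()
            backptr[j, t] = scores.argmax()

    # Termination: pick the best final state and follow backpointers.
    path = [int(v[:, T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[path[-1], t]))
    return list(reversed(path)), float(v[:, T - 1].max())
```

For the ice-cream task, `viterbi([1, 0, 2], pi, A, B)` (indices into the vocabulary [1, 2, 3], i.e. the observations 2, 1, 3) would return the most likely HOT/COLD sequence and its probability.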


The A matrix for the POS HMM


What is P(VB|TO)? What is P(NN|TO)? Why does this make sense? What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?


Why does this make sense?


The B matrix for the POS HMM


Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.

Viterbi example

[Figures: slides 41-49 step through the t = 1 column of the Viterbi trellis, re-displaying the A and B matrices for the POS HMM and filling in cell values such as .041 and .025 while the remaining cells are 0.]

Show the 4 formulas you would use to compute the value at this node and the max.

Computing the likelihood of an observation
• Forward algorithm
• Exactly like the Viterbi algorithm, except
  • To compute the probability of a state, sum the probabilities from each path
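A sketch of that change: the forward algorithm below is the Viterbi recursion with the max replaced by a sum, again assuming pi/A/B arrays shaped as in the earlier sketches:

```python
import numpy as np

def forward(obs, pi, A, B):
    """Likelihood P(O | lambda): sum over all hidden state paths."""
    alpha = pi * B[:, obs[0]]                # alpha_1(j) = pi_j * b_j(o_1)
    for t in range(1, len(obs)):
        # alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
        alpha = (alpha @ A) * B[:, obs[t]]
    return float(alpha.sum())
```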


Error Analysis: ESSENTIAL!!!
• Look at a confusion matrix
• See what errors are causing problems
  • Noun (NN) vs Proper Noun (NNP) vs Adj (JJ)
  • Adverb (RB) vs Prep (IN) vs Noun (NN)
  • Preterite (VBD) vs Participle (VBN) vs Adjective (JJ)
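As a small made-up illustration of what such a confusion matrix looks like, counting (gold, predicted) tag pairs over a toy output:

```python
from collections import Counter

# Hypothetical gold and predicted tag sequences, for illustration only.
gold = ["NN", "VBD", "JJ", "NN",  "RB", "IN"]
pred = ["NN", "VBN", "JJ", "NNP", "IN", "IN"]

confusion = Counter(zip(gold, pred))
for (g, p), n in sorted(confusion.items()):
    flag = "" if g == p else "   <-- error"
    print(f"gold={g:<4} pred={p:<4} count={n}{flag}")
```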

