
CS 4705: Hidden Markov Models


Slides adapted from Dan Jurafsky and James Martin

Responses on "well"
• There is an oil well a few miles away from my house. Noun.
• Well! I never thought I would see that! Interjection.
• It is a well designed program. Adv.
• Tears welled in her eyes. Verb.
• The store sells fruit as well as vegetables. Conjunction.
• Are you well? Adj.
• He and his family were well off. Adjectival Phrase (?)


Announcements
• Reading for today: Chapter 7-7.5 (NLP), Ch. 8.4 Speech and Language
• The TAs will be offering tutorials on the math of neural nets and, in particular, backpropagation


Disambiguating "race"

[Figures: four slides stepping through the two candidate tag sequences for the "race" example (race tagged as NN vs. VB), leading to the probability comparison below.]

• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading.
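The arithmetic on this slide can be checked directly. A minimal Python sketch (not from the lecture) that multiplies the values above and compares the two readings:

```python
# Values copied from the bullets above (the "race" example).
p_verb = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_noun = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)

print(f"verb reading: {p_verb:.2e}")  # ~2.7e-07
print(f"noun reading: {p_noun:.2e}")  # ~3.2e-10
print("tag chosen:", "VB" if p_verb > p_noun else "NN")
```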


Definitions
• A weighted finite-state automaton adds probabilities to the arcs
• The sum of the probabilities leaving any state must sum to one
• A Markov chain is a special case of a WFST
  • the input sequence uniquely determines which states the automaton will go through
• Markov chains can't represent inherently ambiguous problems
  • Assigns probabilities to unambiguous sequences


Markov chain for weather


Markov chain for words


Markov chain = "First-order observable Markov Model"
• a set of states
  • Q = q1, q2 … qN; the state at time t is qt
• Transition probabilities:
  • a set of probabilities A = a01 a02 … an1 … ann
  • Each aij represents the probability of transitioning from state i to state j
  • The set of these is the transition probability matrix A

  aij = P(qt = j | qt−1 = i),  1 ≤ i, j ≤ N
  Σ_{j=1}^{N} aij = 1,  1 ≤ i ≤ N

• Distinguished start and end states
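As a concrete illustration (not part of the slides), the transition matrix A and its row-sum constraint can be written down directly; the states and numbers below are made up:

```python
import numpy as np

# Hypothetical 3-state weather chain; states and probabilities are illustrative only.
states = ["HOT", "COLD", "RAINY"]
A = np.array([
    [0.6, 0.2, 0.2],   # transitions out of HOT
    [0.3, 0.5, 0.2],   # transitions out of COLD
    [0.2, 0.3, 0.5],   # transitions out of RAINY
])

# Constraint from the formula above: every row of A must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
```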

Markov chain = "First-order observable Markov Model"
• Current state only depends on previous state

  P(qi | q1 … qi−1) = P(qi | qi−1)

Another representation for start state
• Instead of start state
• Special initial probability vector π
  • An initial distribution over probability of start states
• Constraints:

  πi = P(q1 = i),  1 ≤ i ≤ N
  Σ_{j=1}^{N} πj = 1

The weather figure using π


The weather figure: specific example


Markov chain for weather
• What is the probability of 4 consecutive rainy days?
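The figure with the actual transition probabilities did not survive extraction, but the computation the question asks for is just a product of one starting probability and three self-transitions. A sketch with placeholder numbers:

```python
# Placeholder values; the real ones appear in the weather figure on the slide.
pi_rainy = 0.3   # probability the chain starts in state RAINY
a_rr = 0.5       # self-transition probability a_{RAINY,RAINY}

# P(rainy, rainy, rainy, rainy) = pi_RAINY * a_rr^3
p_four_rainy = pi_rainy * a_rr ** 3
print(p_four_rainy)   # 0.0375 with these made-up numbers
```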


Hidden Markov Models
• We don't observe POS tags
  • We infer them from the words we see
• Observed events
• Hidden events


HMM for Ice Cream
• You are a climatologist in the year 2799
• Studying global warming
• You can't find any records of the weather in New York, NY for summer of 2007
• But you find Kathy McKeown's diary
  • Which lists how many ice-creams Kathy ate every day that summer
• Our job: figure out how hot it was


Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
  • See hot weather: we're in state hot
• But in part-of-speech tagging (and other things)
  • The output symbols are words
  • The hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
• This means we don't know which state we are in.


Hidden Markov Models
• States Q = q1, q2 … qN
• Observations O = o1, o2 … oN
  • Each observation is a symbol from a vocabulary V = {v1, v2, …, vV}
• Transition probabilities
  • Transition probability matrix A = {aij}

  aij = P(qt = j | qt−1 = i),  1 ≤ i, j ≤ N

• Observation likelihoods
  • Output probability matrix B = {bi(k)}

  bi(k) = P(Xt = ok | qt = i)

• Special initial probability vector π

  πi = P(q1 = i),  1 ≤ i ≤ N
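Putting the components together for the ice-cream task, an HMM is fully specified by (π, A, B). A small sketch with illustrative numbers (these are placeholders, not the values from the lecture figure):

```python
import numpy as np

states = ["HOT", "COLD"]     # hidden weather states Q
vocab = [1, 2, 3]            # observable symbols V: ice creams eaten per day

pi = np.array([0.8, 0.2])                  # initial probabilities pi_i
A = np.array([[0.7, 0.3],                  # transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],             # b_HOT(1), b_HOT(2), b_HOT(3)
              [0.5, 0.4, 0.1]])            # b_COLD(1), b_COLD(2), b_COLD(3)

# The constraints on the next slide: rows of A and B, and pi itself, sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```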

Hidden Markov Models
• Some constraints


πi = P(q1 = i),  1 ≤ i ≤ N
Σ_{j=1}^{N} aij = 1,  1 ≤ i ≤ N
Σ_{k=1}^{M} bi(k) = 1
Σ_{j=1}^{N} πj = 1

Assumptions
• Markov assumption:

  P(qi | q1 … qi−1) = P(qi | qi−1)

• Output-independence assumption:

  P(ot | o1 … ot−1, q1 … qt) = P(ot | qt)
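Together these two assumptions are what make HMM computations tractable: the joint probability of a state sequence and an observation sequence factors into local terms. This factorization is not on the slide, but it follows directly from the two assumptions above:

```latex
P(O, Q) \;=\; \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
```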

McKeown task
• Given
  • Ice Cream Observation Sequence: 2, 1, 3, 2, 2, 2, 3 …
• Produce:
  • Weather Sequence: H, C, H, H, H, C …


HMM for ice cream


Different types of HMM structure

Bakis = left-to-right

Ergodic = fully-connected

Transitions between the hidden states of the HMM, showing the A probabilities


B observation likelihoods for the POS HMM

Three fundamental Problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?


Decoding
• The best hidden sequence
  • Weather sequence in the ice cream task
  • POS sequence given an input sentence
• We could use argmax over the probability of each possible hidden state sequence
  • Why not?
• Viterbi algorithm
  • Dynamic programming algorithm
  • Uses a dynamic programming trellis
• Each trellis cell vt(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence


Viterbi intuition: we are looking for the best 'path'

[Figure: trellis of candidate tags for "promised to back the bill" (promised: VBD/VBN; to: TO; back: VB/JJ/NN/RB; the: DT; bill: NNP/VB/NN), with states S1-S5. Slide from Dekang Lin.]

Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.
• An extension of a path from state i at time t−1 is computed by multiplying the previous path probability vt−1(i), the transition probability aij, and the observation likelihood bj(ot).


The Viterbi Algorithm
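The pseudocode figure for this slide was lost in extraction. Below is a minimal Python sketch of the standard Viterbi recursion over the trellis, assuming pi/A/B arrays shaped as in the earlier ice-cream sketch; it is an illustration of the algorithm, not the exact figure from the lecture:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence for a list of observation indices."""
    N, T = A.shape[0], len(obs)
    v = np.zeros((N, T))                  # v[j, t]: best path probability ending in state j at time t
    backptr = np.zeros((N, T), dtype=int)

    v[:, 0] = pi * B[:, obs[0]]           # initialization step
    for t in range(1, T):
        for j in range(N):                # recursion: max over predecessor states i
            scores = v[:, t - 1] * A[:, j] * B[j, obs[t]]
            v[j, t] = scores.max()
            backptr[j, t] = scores.argmax()

    # Termination: pick the best final state and follow backpointers.
    path = [int(v[:, T - 1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[path[-1], t]))
    return list(reversed(path)), float(v[:, T - 1].max())
```

For the ice-cream task, `viterbi([1, 0, 2], pi, A, B)` (indices into the vocabulary [1, 2, 3], i.e. the observations 2, 1, 3) would return the most likely HOT/COLD sequence and its probability.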


The A matrix for the POS HMM


What is P(VB|TO)? What is P(NN|TO)? Why does this make sense? What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?


Why does this make sense?


The B matrix for the POS HMM


Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.

Viterbi example

[Figures: slides 41-49 step through the t = 1 column of the Viterbi trellis, re-displaying the A and B matrices for the POS HMM and filling in cell values such as .041 and .025 while the remaining cells are 0.]

Show the 4 formulas you would use to compute the value at this node and the max.

Computing the likelihood of an observation
• Forward algorithm
• Exactly like the Viterbi algorithm, except
  • To compute the probability of a state, sum the probabilities from each path
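A sketch of that change: the forward algorithm below is the Viterbi recursion with the max replaced by a sum, again assuming pi/A/B arrays shaped as in the earlier sketches:

```python
import numpy as np

def forward(obs, pi, A, B):
    """Likelihood P(O | lambda): sum over all hidden state paths."""
    alpha = pi * B[:, obs[0]]                # alpha_1(j) = pi_j * b_j(o_1)
    for t in range(1, len(obs)):
        # alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
        alpha = (alpha @ A) * B[:, obs[t]]
    return float(alpha.sum())
```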


Error Analysis: ESSENTIAL!!!
• Look at a confusion matrix
• See what errors are causing problems
  • Noun (NN) vs Proper Noun (NNP) vs Adj (JJ)
  • Adverb (RB) vs Prep (IN) vs Noun (NN)
  • Preterite (VBD) vs Participle (VBN) vs Adjective (JJ)
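As a small made-up illustration of what such a confusion matrix looks like, counting (gold, predicted) tag pairs over a toy output:

```python
from collections import Counter

# Hypothetical gold and predicted tag sequences, for illustration only.
gold = ["NN", "VBD", "JJ", "NN",  "RB", "IN"]
pred = ["NN", "VBN", "JJ", "NNP", "IN", "IN"]

confusion = Counter(zip(gold, pred))
for (g, p), n in sorted(confusion.items()):
    flag = "" if g == p else "   <-- error"
    print(f"gold={g:<4} pred={p:<4} count={n}{flag}")
```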

