
CS626-460: Speech, NLP and the Web

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Lecture 5, 7: HMM and Viterbi (10th and 16th Jan, 2012)

(Lecture 6 was on Computational Biomedicine research at Houston University by Prof. Ioannis)

HMM Definition

[Figure: the NLP Trinity, locating HMM in the space of Problems (Morph Analysis, Part of Speech Tagging, Parsing, Semantics), Languages (Hindi, Marathi, English, French) and Algorithms (HMM, CRF, MEMM).]

Set of states: S, where |S| = N
Output alphabet: O, where |O| = K
Transition probabilities: $A = \{a_{ij}\}$, the probability of going from state $S_i$ to state $S_j$
Emission probabilities: $B = \{b_{pq}\}$, the probability of outputting symbol $O_q$ from state $S_p$
Initial state probabilities: $\pi$

$\lambda = (A, B, \pi)$
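As a concreteness check, the triple $\lambda = (A, B, \pi)$ maps directly onto a small data structure. A minimal sketch in Python (assuming NumPy; the class and field names are ours, not from the lecture):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class HMM:
    """lambda = (A, B, pi): the three parameter sets defined above."""
    A: np.ndarray   # N x N transition probabilities, A[i, j] = P(S_j | S_i)
    B: np.ndarray   # N x K emission probabilities,  B[p, q] = P(O_q | S_p)
    pi: np.ndarray  # length-N initial state probabilities

    def __post_init__(self):
        # Every row of A and B, and pi itself, must be a probability distribution.
        assert np.allclose(self.A.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)
        assert np.isclose(self.pi.sum(), 1.0)
```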

Markov Processes

Properties:

Limited Horizon: given the previous k states, a state is independent of all earlier states:
$P(X_t = i \mid X_{t-1}, X_{t-2}, \ldots, X_0) = P(X_t = i \mid X_{t-1}, X_{t-2}, \ldots, X_{t-k})$
This is an order-k Markov process.

Time invariance (shown for k = 1):
$P(X_t = i \mid X_{t-1} = j) = P(X_1 = i \mid X_0 = j) = \cdots = P(X_n = i \mid X_{n-1} = j)$

Three basic problems (contd.)

Problem 1 (likelihood of a sequence): Forward Procedure, Backward Procedure

Problem 2 (best state sequence): Viterbi Algorithm

Problem 3 (re-estimation): Baum-Welch (Forward-Backward) Algorithm

Probabilistic Inference

O: observation sequence
S: state sequence

Given O, find $S^* = \arg\max_S P(S \mid O)$. This is called probabilistic inference: inferring the "hidden" from the "observed".

How is this inference different from logical inference based on propositional or predicate calculus?

Essentials of Hidden Markov Model

1. Markov + Naive Bayes.
2. Uses both transition and observation probability:
$P(S_k \xrightarrow{O_k} S_{k+1}) = P(O_k \mid S_k) \cdot P(S_{k+1} \mid S_k)$
3. Effectively makes the Hidden Markov Model a Finite State Machine (FSM) with probability.

Probability of Observation Sequence

$P(O) = \sum_S P(O, S) = \sum_S P(S) \, P(O \mid S)$

Without any restriction, the search space size is $|S|^{|O|}$.

Continuing with the Urn example

Colored ball choosing:

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

Example (contd.)

Given:

Transition probability:
        U1    U2    U3
U1      0.1   0.4   0.5
U2      0.6   0.2   0.2
U3      0.3   0.4   0.3

Observation/output probability:
        R     G     B
U1      0.3   0.5   0.2
U2      0.1   0.4   0.5
U3      0.6   0.1   0.3

Observation: RRGGBRGR

What is the corresponding state sequence?
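The two tables are exactly the A and B of $\lambda = (A, B, \pi)$, so the urn model can be written down and the likelihood $P(O) = \sum_S P(S)\,P(O \mid S)$ of the earlier slide computed by brute force. A sketch (assuming NumPy; the initial distribution $\pi = (0.5, 0.3, 0.2)$ is our reading of the $S_0$ arcs in the Viterbi-tree slide further below, so treat it as an assumption):

```python
import itertools

import numpy as np

# Transition probabilities A[i, j] = P(U_j | U_i), rows/columns in order U1, U2, U3.
A = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])

# Output probabilities B[i, k] = P(colour_k | U_i), columns in order R, G, B.
B = np.array([[0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])

# Initial distribution over U1, U2, U3 (our reading of the S0 arcs; an assumption).
pi = np.array([0.5, 0.3, 0.2])

obs = ["RGB".index(c) for c in "RRGGBRGR"]

# P(O) = sum over ALL state sequences S of P(S) * P(O|S).
# Without any restriction this enumerates |S|^|O| = 3^8 = 6561 sequences,
# exactly the exponential search space noted on the earlier slide.
p_O = 0.0
for seq in itertools.product(range(3), repeat=len(obs)):
    p = pi[seq[0]] * B[seq[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
    p_O += p
print(p_O)  # likelihood of R R G G B R G R under the urn model
```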

Diagrammatic representation (1/2)

[Figure: the urn model as a three-state machine. Each state carries its output probabilities (U1: R 0.3, G 0.5, B 0.2; U2: R 0.1, G 0.4, B 0.5; U3: R 0.6, G 0.1, B 0.3) and each arc its transition probability from the table above, e.g. U1→U3 with 0.5 and U3→U1 with 0.3.]

Diagrammatic representation (2/2)

[Figure: the same machine with each arc now labelled by the combined probabilities $P(O_k \mid S_k) \cdot P(S_{k+1} \mid S_k)$ for every output symbol, e.g. the U1→U3 arc carries R: 0.15, G: 0.25, B: 0.10, i.e. U1's output row scaled by the transition probability 0.5.]

Observations and states

       O1  O2  O3  O4  O5  O6  O7  O8
Obs:   R   R   G   G   B   R   G   R
State: S1  S2  S3  S4  S5  S6  S7  S8

$S_i$ = U1/U2/U3, a particular state
S: state sequence
O: observation sequence
S* = "best" possible state (urn) sequence
Goal: maximize P(S|O) by choosing the "best" S

Goal

Maximize P(S|O), where S is the state sequence and O is the observation sequence:

$S^* = \arg\max_S P(S \mid O)$

Baye’s Theorem

)(/)|().()|( BPABPAPBAP =

P(A) -: PriorP(B|A) -: LikelihoodP(B|A) : Likelihood

)|().(maxarg)|(maxarg SOPSPOSP SS = )|().(maxarg)|(maxarg SOPSPOSP SS

State Transition Probability

$P(S) = P(S_{1\text{-}8}) = P(S_1) \, P(S_2 \mid S_1) \, P(S_3 \mid S_{1\text{-}2}) \, P(S_4 \mid S_{1\text{-}3}) \ldots P(S_8 \mid S_{1\text{-}7})$

By the Markov assumption (k = 1):

$P(S) = P(S_1) \, P(S_2 \mid S_1) \, P(S_3 \mid S_2) \, P(S_4 \mid S_3) \ldots P(S_8 \mid S_7)$

Observation Sequence Probability

$P(O \mid S) = P(O_1 \mid S_{1\text{-}8}) \, P(O_2 \mid O_1, S_{1\text{-}8}) \, P(O_3 \mid O_{1\text{-}2}, S_{1\text{-}8}) \ldots P(O_8 \mid O_{1\text{-}7}, S_{1\text{-}8})$

Assumption: the ball drawn depends only on the urn chosen:

$P(O \mid S) = P(O_1 \mid S_1) \, P(O_2 \mid S_2) \, P(O_3 \mid S_3) \ldots P(O_8 \mid S_8)$

$P(S \mid O) = P(S) \, P(O \mid S)$ (dropping the constant denominator P(O), which does not affect the argmax)

$P(S \mid O) = P(S_1) \, P(S_2 \mid S_1) \, P(S_3 \mid S_2) \ldots P(S_8 \mid S_7) \cdot P(O_1 \mid S_1) \, P(O_2 \mid S_2) \, P(O_3 \mid S_3) \ldots P(O_8 \mid S_8)$
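This factorisation is directly computable. A sketch that scores one candidate state sequence for the urn model, under the same assumptions as the earlier block (π is again our reading of the later slide; the candidate sequence is arbitrary):

```python
import numpy as np

A = np.array([[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]])
B = np.array([[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]])
pi = np.array([0.5, 0.3, 0.2])  # assumed initial distribution over U1, U2, U3

def posterior_score(states, obs):
    """P(S) * P(O|S): the quantity maximised over S (P(O) being constant)."""
    # P(S) = P(S1) * prod_k P(S_k | S_{k-1})  -- Markov assumption, k = 1
    p_S = pi[states[0]]
    for k in range(1, len(states)):
        p_S *= A[states[k - 1], states[k]]
    # P(O|S) = prod_k P(O_k | S_k)  -- the ball depends only on the urn chosen
    p_O_given_S = 1.0
    for k in range(len(obs)):
        p_O_given_S *= B[states[k], obs[k]]
    return p_S * p_O_given_S

obs = ["RGB".index(c) for c in "RRGGBRGR"]
print(posterior_score([2, 0, 0, 0, 1, 2, 1, 2], obs))  # one arbitrary candidate
```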

Grouping terms

       O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:   ε   R   R   G   G   B   R   G   R
State: S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

We introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8) = 1; O0 is an ε-transition.

$P(S) \, P(O \mid S)$
$= [P(O_0 \mid S_0) \, P(S_1 \mid S_0)] \cdot [P(O_1 \mid S_1) \, P(S_2 \mid S_1)] \cdot [P(O_2 \mid S_2) \, P(S_3 \mid S_2)] \cdot$
$\;\;\; [P(O_3 \mid S_3) \, P(S_4 \mid S_3)] \cdot [P(O_4 \mid S_4) \, P(S_5 \mid S_4)] \cdot [P(O_5 \mid S_5) \, P(S_6 \mid S_5)] \cdot$
$\;\;\; [P(O_6 \mid S_6) \, P(S_7 \mid S_6)] \cdot [P(O_7 \mid S_7) \, P(S_8 \mid S_7)] \cdot [P(O_8 \mid S_8) \, P(S_9 \mid S_8)]$

Introducing useful notation

       O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:   ε   R   R   G   G   B   R   G   R
State: S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

[Figure: the same sequence drawn as a chain S0 → S1 → … → S8 → S9, each arc $S_k \to S_{k+1}$ labelled with the symbol $O_k$ emitted on it (ε, R, R, G, G, B, R, G, R).]

$P(O_k \mid S_k) \, P(S_{k+1} \mid S_k) = P(S_k \xrightarrow{O_k} S_{k+1})$

Viterbi Algorithm for the Urn problem (first two symbols)

[Figure: the Viterbi tree. From S0, ε-arcs lead to U1, U2 and U3 with probabilities 0.5, 0.3 and 0.2. On reading R, each of these states expands to U1, U2, U3 again, the arcs carrying the combined emission-times-transition probabilities of the earlier diagram (0.03, 0.18, 0.15, 0.02, 0.06, 0.18, 0.08, 0.24, …). The leaf sequences receive the probabilities 0.015, 0.04, 0.075*, 0.018, 0.006, 0.006, 0.048*, 0.036* (*: winner sequences, one per ending state).]

Markov process of order > 1 (say 2)

       O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:   ε   R   R   G   G   B   R   G   R
State: S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

The same theory works. We again introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8S7) = 1; O0 is an ε-transition.

$P(S) \, P(O \mid S) = P(O_0 \mid S_0) \, P(S_1 \mid S_0) \cdot [P(O_1 \mid S_1) \, P(S_2 \mid S_1 S_0)] \cdot [P(O_2 \mid S_2) \, P(S_3 \mid S_2 S_1)] \cdot$
$\;\;\; [P(O_3 \mid S_3) \, P(S_4 \mid S_3 S_2)] \cdot [P(O_4 \mid S_4) \, P(S_5 \mid S_4 S_3)] \cdot [P(O_5 \mid S_5) \, P(S_6 \mid S_5 S_4)] \cdot$
$\;\;\; [P(O_6 \mid S_6) \, P(S_7 \mid S_6 S_5)] \cdot [P(O_7 \mid S_7) \, P(S_8 \mid S_7 S_6)] \cdot [P(O_8 \mid S_8) \, P(S_9 \mid S_8 S_7)]$

Adjustments:
- The transition probability table will have tuples on rows and states on columns.
- The output probability table will remain the same.
- In the Viterbi tree, the Markov process will take effect from the 3rd input symbol (εRR).
- There will be 27 leaves, out of which only 9 will remain.
- Sequences ending in the same tuple will be compared.
- Instead of U1, U2 and U3, the states are U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2, U3U3 (see the sketch below).
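The tuple construction itself can be mechanised: an order-2 model over U1, U2, U3 becomes an order-1 model over the nine tuple states, after which the same Viterbi machinery applies. A sketch of just the state-space construction (the order-2 probabilities are a uniform placeholder, since the lecture defines the shape of the table but not its entries):

```python
import itertools

states = ["U1", "U2", "U3"]

# Placeholder order-2 transition probabilities P(c | a, b): uniform, as an
# assumption -- the lecture gives no numbers for the order-2 case.
p2 = {(a, b): {c: 1.0 / len(states) for c in states}
      for a in states for b in states}

# "Tuples on rows and states on columns", as the Adjustments slide says:
tuple_states = list(itertools.product(states, repeat=2))  # the 9 tuple states
rows = {t: [p2[t][c] for c in states] for t in tuple_states}

# Equivalently, as order-1 transitions between tuple states: tuple (a, b)
# can only move to a tuple of the form (b, c).
new_A = {(t, (t[1], c)): p2[t][c] for t in tuple_states for c in states}

print(["".join(t) for t in tuple_states])  # ['U1U1', 'U1U2', ..., 'U3U3']
```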

Probabilistic FSM

[Figure: a two-state probabilistic FSM over S1 and S2, arcs labelled (symbol : probability): S1→S1 (a1:0.1), (a2:0.2); S1→S2 (a1:0.3), (a2:0.4); S2→S1 (a1:0.2), (a2:0.3); S2→S2 (a1:0.3), (a2:0.2).]

The question here is: "What is the most likely state sequence given the output sequence seen?"

Developing the tree

[Tree: from Start, ε leads to S1 with probability 1.0 and to S2 with 0.0.
Reading a1: arcs 0.1 and 0.3 out of S1, 0.2 and 0.3 out of S2, give the scores S1: 1×0.1 = 0.1 and S2: 1×0.3 = 0.3 (the branches rooted at S2 score 0.0).
Reading a2: arcs 0.2 and 0.4 out of S1, 0.3 and 0.2 out of S2, give 0.1×0.2 = 0.02, 0.1×0.4 = 0.04, 0.3×0.3 = 0.09, 0.3×0.2 = 0.06.]

Choose the winning sequence per state per iteration.

Tree structure (contd.)

[Tree continued from the winners S1: 0.09 and S2: 0.06.
Reading a1: 0.09×0.1 = 0.009, 0.027, 0.012, 0.018; the winners are 0.012 (ending in S1) and 0.027 (ending in S2).
Reading a2: 0.0024, 0.0048, 0.0081, 0.0054.]

The problem being addressed by this tree is $S^* = \arg\max_S P(S \mid a_1 a_2 a_1 a_2, \mu)$, where a1-a2-a1-a2 is the output sequence and μ the model or the machine.

Path found (working backward): S1 --a1--> S2 --a2--> S1 --a1--> S2 --a2--> S1

Problem statement: find the best possible sequence

$S^* = \arg\max_S P(S \mid O, \mu)$

where S → state sequence, O → output sequence, μ → model or machine.

Model or machine = $\{S_0, S, A, T\}$: start symbol, state collection, alphabet set, and transitions.

T is defined as $P(S_i \xrightarrow{a_k} S_j) \;\; \forall i, j, k$

Tabular representation of the tree

Ending state ↓ / latest symbol observed →

      ε     a1                                a2             a1               a2
S1    1.0   (1.0×0.1, 0.0×0.2) = (0.1, 0.0)   (0.02, 0.09)   (0.009, 0.012)   (0.0024, 0.0081)
S2    0.0   (1.0×0.3, 0.0×0.3) = (0.3, 0.0)   (0.04, 0.06)   (0.027, 0.018)   (0.0048, 0.0054)

Note: every cell records the winning probability ending in that state.

Final winner

The bold-faced values in each cell show the sequence probability ending in that state. Going backward from the final winner (0.0081, whose position as the 2nd component of the tuple indicates that the previous state was S2), we recover the sequence.

Algorithm
(following James Allen, Natural Language Understanding (2nd edition), Benjamin/Cummings (pub.), 1995)

Given:
1. The HMM, which means:
   a. Start state: S1
   b. Alphabet: A = {a1, a2, … ap}
   c. Set of states: S = {S1, S2, … Sn}
   d. Transition probability $P(S_i \xrightarrow{a_k} S_j) \;\; \forall i, j, k$, which is equal to $P(S_j, a_k \mid S_i)$
2. The output string a1 a2 … aT

To find: the most likely sequence of states C1 C2 … CT which produces the given output sequence, i.e.,

$C_1 C_2 \ldots C_T = \arg\max_C \, [P(C \mid a_1, a_2, \ldots, a_T, \mu)]$

Algorithm (contd.)

Data structures:
1. An N×T array called SEQSCORE to always maintain the winner sequences (N = #states, T = length of the output sequence)
2. Another N×T array called BACKPTR to recover the path

Three distinct steps in the Viterbi implementation:
1. Initialization
2. Iteration
3. Sequence identification

1. Initialization

SEQSCORE(1,1) = 1.0
BACKPTR(1,1) = 0
For i = 2 to N do
    SEQSCORE(i,1) = 0.0

[expressing the fact that the first state is S1]

2. Iteration

For t = 2 to T do
    For i = 1 to N do
        SEQSCORE(i,t) = Max(j = 1..N) [SEQSCORE(j, t-1) * P(S_j --a_k--> S_i)]
        BACKPTR(i,t) = the index j that gives the MAX above

3. Sequence identification

C(T) = the i that maximizes SEQSCORE(i,T)
For i from (T-1) to 1 do
    C(i) = BACKPTR[C(i+1), (i+1)]

Optimizations possible:
1. BACKPTR can be 1×T
2. SEQSCORE can be T×2
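A direct transcription of the three steps into Python, with SEQSCORE and BACKPTR as in the pseudocode (a sketch, assuming NumPy). It is demonstrated on the two-state probabilistic FSM of the earlier slides, whose transition-cum-output probabilities are our reading of that figure:

```python
import numpy as np

def viterbi(T_prob, obs, n_states):
    """T_prob[a][i, j] = P(S_i --a--> S_j); the start state is S1 (index 0)."""
    T = len(obs)
    seqscore = np.zeros((n_states, T + 1))            # the N x (T+1) SEQSCORE array
    backptr = np.zeros((n_states, T + 1), dtype=int)  # the BACKPTR array

    # 1. Initialization: the machine starts in S1 with probability 1.
    seqscore[0, 0] = 1.0

    # 2. Iteration: keep the winning predecessor j for each state i and time t.
    for t, a in enumerate(obs, start=1):
        for i in range(n_states):
            cand = seqscore[:, t - 1] * T_prob[a][:, i]
            backptr[i, t] = int(np.argmax(cand))
            seqscore[i, t] = cand[backptr[i, t]]

    # 3. Sequence identification: start from the best final state, follow BACKPTR.
    path = [int(np.argmax(seqscore[:, T]))]
    for t in range(T, 0, -1):
        path.append(backptr[path[-1], t])
    return seqscore, path[::-1]

# The two-state FSM from the earlier slides (arc labels as we read them):
T_prob = {"a1": np.array([[0.1, 0.3], [0.2, 0.3]]),
          "a2": np.array([[0.2, 0.4], [0.3, 0.2]])}

scores, path = viterbi(T_prob, ["a1", "a2", "a1", "a2"], n_states=2)
print(scores[:, -1])                    # [0.0081 0.0054], as in the table above
print([["S1", "S2"][i] for i in path])  # ['S1', 'S2', 'S1', 'S2', 'S1']
```

Note that this sketch keeps all T+1 columns of SEQSCORE for clarity; as the optimization note above suggests, fewer columns suffice, since step t only ever reads step t-1.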

Homework: compare this with A* and beam search.

Reason for this comparison: both of them, like Viterbi, work by finding and recovering a sequence.
