CS626-460: Speech, NLP and the Web
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 5,7: HMM and Viterbi, 10th and 16th Jan, 2012
(Lecture 6 was on Computational Biomedicine research at Houston University by Prof. Ioannis)


Page 1

CS626-460: Speech, NLP and the Web

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Lecture 5,7: HMM and Viterbi, 10th and 16th Jan, 2012

(Lecture 6 was on Computational Biomedicine research at Houston University by Prof. Ioannis)

Page 2

HMM Definition

(Slide background: the "NLP Trinity" diagram — Problem axis: Morph Analysis, Part of Speech Tagging, Parsing, Semantics; Language axis: Hindi, Marathi, English, French; Algorithm axis: HMM, MEMM, CRF.)

Set of states: S, where |S| = N
Output alphabet: O, where |O| = K
Transition probabilities: A = {a_ij}, the probability of going from state S_i to state S_j
Emission probabilities: B = {b_pq}, the probability of outputting symbol O_q from state S_p
Initial state probabilities: π

λ = (A, B, π)
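As a concrete data structure, the triple λ = (A, B, π) can be held in a small container. A minimal sketch follows; the class name and field names are ours, not from the slides:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HMM:
    """lambda = (A, B, pi) over N states and K output symbols."""
    A: np.ndarray   # N x N transition probabilities, A[i, j] = P(S_j | S_i)
    B: np.ndarray   # N x K emission probabilities, B[p, q] = P(O_q | S_p)
    pi: np.ndarray  # length-N initial state probabilities

    def __post_init__(self):
        # Each row of A and B, and pi itself, must be a probability distribution.
        assert np.allclose(self.A.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)
        assert np.isclose(self.pi.sum(), 1.0)
```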

Page 3

Markov Processes

Properties:

Limited Horizon: given the previous k states, the current state is independent of all earlier states:
P(X_t = i | X_{t-1}, X_{t-2}, ..., X_0) = P(X_t = i | X_{t-1}, X_{t-2}, ..., X_{t-k})
This is an order-k Markov process.

Time invariance (shown for k=1):
P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = ... = P(X_n = i | X_{n-1} = j)
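To make the two properties concrete, here is a small sketch of a first-order (k=1), time-invariant chain: the next state is sampled from a row of one fixed matrix, so it depends only on the current state, and the same matrix is used at every step. The matrix values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Time invariance: the same transition matrix is used at every time step.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

def sample_chain(A, x0, T):
    """Limited horizon (k=1): X_t depends only on X_{t-1}."""
    xs = [x0]
    for _ in range(T):
        xs.append(rng.choice(len(A), p=A[xs[-1]]))  # row of the current state
    return xs

print(sample_chain(A, x0=0, T=10))
```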

Page 4

Three basic problems (contd.)

Problem 1: Likelihood of a sequence
  Forward Procedure
  Backward Procedure

Problem 2: Best state sequence
  Viterbi Algorithm

Problem 3: Re-estimation
  Baum-Welch (Forward-Backward Algorithm)

Page 5

Probabilistic Inference

O: Observation Sequence
S: State Sequence

Given O, finding
S* = argmax_S P(S|O)
is called Probabilistic Inference.

Infer "Hidden" from "Observed".

How is this inference different from logical inference based on propositional or predicate calculus?

Page 6

Essentials of Hidden Markov Model

1. Markov + Naive Bayes

2. Uses both transition and observation probability:
P(S_k --O_k--> S_{k+1}) = P(O_k | S_k) · P(S_{k+1} | S_k)

3. Effectively makes the Hidden Markov Model a Finite State Machine (FSM) with probability.

Page 7

Probability of Observation Sequence

P(O) = Σ_S P(O, S) = Σ_S P(S) · P(O|S)

Without any restriction, search space size = |S|^|O|.
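A direct evaluation of this sum enumerates all |S|^|O| state sequences. The sketch below does exactly that for a toy model, just to make the search-space size tangible; the model values are placeholders, not from the slides:

```python
import itertools
import numpy as np

A = np.array([[0.6, 0.4], [0.5, 0.5]])   # P(S_j | S_i), placeholder values
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(O_q | S_p), placeholder values
pi = np.array([0.5, 0.5])

def prob_obs_bruteforce(obs):
    """P(O) = sum over all state sequences S of P(S) * P(O|S)."""
    N = len(pi)
    total = 0.0
    for seq in itertools.product(range(N), repeat=len(obs)):  # |S|^|O| terms
        p = pi[seq[0]] * B[seq[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t-1], seq[t]] * B[seq[t], obs[t]]
        total += p
    return total

print(prob_obs_bruteforce([0, 1, 1, 0]))
```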

Page 8

Continuing with the Urn example

Colored ball choosing:

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

Page 9

Example (contd.)

Given:

Transition Probability
       U1    U2    U3
U1    0.1   0.4   0.5
U2    0.6   0.2   0.2
U3    0.3   0.4   0.3

and

Observation/output Probability
       R     G     B
U1    0.3   0.5   0.2
U2    0.1   0.4   0.5
U3    0.6   0.1   0.3

Observation: RRGGBRGR

What is the corresponding state sequence?
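Encoded as arrays (rows in both tables are the source/emitting urn). The slides do not give initial probabilities here, so the sketch below takes the values 0.5, 0.3, 0.2 that appear on the S0 arcs of the Viterbi slide as an assumption:

```python
import numpy as np

urns = ["U1", "U2", "U3"]
symbols = {"R": 0, "G": 1, "B": 2}

A = np.array([[0.1, 0.4, 0.5],    # transitions from U1
              [0.6, 0.2, 0.2],    # transitions from U2
              [0.3, 0.4, 0.3]])   # transitions from U3

B = np.array([[0.3, 0.5, 0.2],    # emissions from U1: R, G, B
              [0.1, 0.4, 0.5],    # emissions from U2
              [0.6, 0.1, 0.3]])   # emissions from U3

pi = np.array([0.5, 0.3, 0.2])    # assumed initial probabilities

obs = [symbols[c] for c in "RRGGBRGR"]
```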

Page 10

Diagrammatic representation (1/2)

(Figure: the three urns U1, U2, U3 drawn as states of an FSM. Each state carries its emission probabilities — e.g., U1: R 0.3, G 0.5, B 0.2 — and each arc carries the transition probability from the table above, e.g., U1→U3 with 0.5 and U2→U1 with 0.6.)

Page 11

Diagrammatic representation (2/2)

(Figure: the same FSM, but each arc is now labeled with the combined probability P(O_k|S_k) · P(S_{k+1}|S_k), i.e., emission times transition. For example, the arc U1→U3 carries R: 0.3×0.5 = 0.15, G: 0.5×0.5 = 0.25, B: 0.2×0.5 = 0.10.)

Page 12

Observations and states

        O1  O2  O3  O4  O5  O6  O7  O8
OBS:     R   R   G   G   B   R   G   R
State:  S1  S2  S3  S4  S5  S6  S7  S8

S_i = U1/U2/U3; a particular state
S: state sequence
O: observation sequence
S* = "best" possible state (urn) sequence
Goal: maximize P(S|O) by choosing the "best" S

Page 13

Goal

Maximize P(S|O), where S is the State Sequence and O is the Observation Sequence:

S* = argmax_S (P(S|O))

Page 14

Bayes' Theorem

P(A|B) = P(A) · P(B|A) / P(B)

P(A): Prior
P(B|A): Likelihood

argmax_S P(S|O) = argmax_S P(S) · P(O|S)

Page 15

State Transition Probability

P(S) = P(S1–8)
     = P(S1) · P(S2|S1) · P(S3|S1–2) · P(S4|S1–3) ... P(S8|S1–7)

By the Markov Assumption (k=1):

P(S) = P(S1) · P(S2|S1) · P(S3|S2) · P(S4|S3) ... P(S8|S7)

Page 16

Observation Sequence Probability

P(O|S) = P(O1|S1–8) · P(O2|O1, S1–8) · P(O3|O1–2, S1–8) ... P(O8|O1–7, S1–8)

Assumption: the ball drawn depends only on the urn chosen:

P(O|S) = P(O1|S1) · P(O2|S2) · P(O3|S3) ... P(O8|S8)

P(S|O) ∝ P(S) · P(O|S)
       = P(S1) · P(S2|S1) · P(S3|S2) ... P(S8|S7) ·
         P(O1|S1) · P(O2|S2) · P(O3|S3) ... P(O8|S8)
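Under these two assumptions, the score of any candidate urn sequence is just a product of transition and emission terms. A minimal scorer, restating the urn model from the earlier sketch (pi is still an assumption):

```python
import numpy as np

A = np.array([[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]])  # transitions
B = np.array([[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]])  # emissions R,G,B
pi = np.array([0.5, 0.3, 0.2])                                     # assumed initials
obs = [0, 0, 1, 1, 2, 0, 1, 0]                                     # R R G G B R G R

def score_state_sequence(states, obs):
    """P(S) * P(O|S) for one candidate urn sequence (0=U1, 1=U2, 2=U3)."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t-1], states[t]]  # transition term P(S_t | S_{t-1})
        p *= B[states[t], obs[t]]       # emission term  P(O_t | S_t)
    return p

print(score_state_sequence([0] * 8, obs))  # e.g., the all-U1 sequence
```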

Page 17

Grouping terms

        O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:     ε   R   R   G   G   B   R   G   R
State:  S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

We introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8) = 1. O0 is the ε-transition.

P(S) · P(O|S)
= [P(O0|S0) · P(S1|S0)] · [P(O1|S1) · P(S2|S1)] · [P(O2|S2) · P(S3|S2)] ·
  [P(O3|S3) · P(S4|S3)] · [P(O4|S4) · P(S5|S4)] · [P(O5|S5) · P(S6|S5)] ·
  [P(O6|S6) · P(S7|S6)] · [P(O7|S7) · P(S8|S7)] · [P(O8|S8) · P(S9|S8)]

Page 18

Introducing useful notation

        O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:     ε   R   R   G   G   B   R   G   R
State:  S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

(Figure: the chain S0 →ε S1 →R S2 →R S3 →G S4 →G S5 →B S6 →R S7 →G S8 →R S9, each arc labeled with the symbol emitted on that step.)

P(O_k|S_k) · P(S_{k+1}|S_k) = P(S_k --O_k--> S_{k+1})

Page 19

Viterbi Algorithm for the Urn problem (first two symbols)

(Figure: a tree rooted at the initial state S0. The ε-arcs from S0 reach U1, U2 and U3 with probabilities 0.5, 0.3 and 0.2. On reading R, each of U1, U2, U3 branches again into U1, U2, U3, each arc weighted by the combined emission × transition probability; multiplying down each root-to-leaf path gives the leaf scores shown on the slide:

0.015   0.04   0.075*   0.018   0.006   0.006   0.048*   0.036*

*: winner sequences — for each ending state, only the best-scoring path survives.)

Page 20

Markov process of order > 1 (say 2)

        O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:     ε   R   R   G   G   B   R   G   R
State:  S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

The same theory works. We introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8S7) = 1. O0 is the ε-transition.

P(S) · P(O|S)
= P(O0|S0) · P(S1|S0) ·
  [P(O1|S1) · P(S2|S1S0)] · [P(O2|S2) · P(S3|S2S1)] · [P(O3|S3) · P(S4|S3S2)] ·
  [P(O4|S4) · P(S5|S4S3)] · [P(O5|S5) · P(S6|S5S4)] · [P(O6|S6) · P(S7|S6S5)] ·
  [P(O7|S7) · P(S8|S7S6)] · [P(O8|S8) · P(S9|S8S7)]

Page 21

Adjustments

- The transition probability table will have tuples on rows and states on columns.
- The output probability table will remain the same.
- In the Viterbi tree, the Markov process will take effect from the 3rd input symbol (εRR).
- There will be 27 leaves, out of which only 9 will remain.
- Sequences ending in the same tuple will be compared.
- Instead of U1, U2 and U3, the rows are: U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2, U3U3 (a sketch of this tuple-row construction follows below).
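One way to realize this adjustment is to build a first-order model over state pairs. The following sketch constructs the tuple-row transition table from a given second-order probability function; the function name and the uniform placeholder values are ours, purely illustrative:

```python
import itertools
import numpy as np

urns = ["U1", "U2", "U3"]
pairs = [u + v for u, v in itertools.product(urns, repeat=2)]  # the 9 tuple rows

def p2(next_urn, prev, prevprev):
    """Placeholder second-order probability P(next | prev, prevprev)."""
    return 1.0 / 3.0  # illustrative uniform value

# Rows: tuples (S_{t-1}, S_t); columns: states S_{t+1}.
A2 = np.zeros((len(pairs), len(urns)))
for r, (u, v) in enumerate(itertools.product(urns, repeat=2)):
    for c, w in enumerate(urns):
        A2[r, c] = p2(w, v, u)   # P(w | v, u)

# The pair (v, w) follows the pair (u, v); emissions still depend only on
# the current urn, so the output probability table is unchanged.
```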

Page 22

Probabilistic FSM

(Figure: two states S1 and S2 with arcs labeled (symbol : probability). Reading the probabilities off the trellis computations on the next slides:

S1→S1: (a1:0.1), (a2:0.2)
S1→S2: (a1:0.3), (a2:0.4)
S2→S1: (a1:0.2), (a2:0.3)
S2→S2: (a1:0.3), (a2:0.2))

The question here is: "What is the most likely state sequence given the output sequence seen?"
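For later reference, that machine can be written down directly as a table indexed by (state, symbol, next state). A minimal encoding, with variable names of our own choosing:

```python
# T[(i, a, j)] = P(S_i --a--> S_j), read off the PFSM above.
T = {
    ("S1", "a1", "S1"): 0.1, ("S1", "a1", "S2"): 0.3,
    ("S1", "a2", "S1"): 0.2, ("S1", "a2", "S2"): 0.4,
    ("S2", "a1", "S1"): 0.2, ("S2", "a1", "S2"): 0.3,
    ("S2", "a2", "S1"): 0.3, ("S2", "a2", "S2"): 0.2,
}

# Sanity check: total outgoing probability from each state is 1.
for s in ("S1", "S2"):
    assert abs(sum(p for (i, a, j), p in T.items() if i == s) - 1.0) < 1e-9
```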

Page 23

Developing the tree

Start: S1 = 1.0, S2 = 0.0 (on ε).

Reading a1:
  from S1 (1.0): to S1 with 0.1 → 1.0 × 0.1 = 0.1; to S2 with 0.3 → 1.0 × 0.3 = 0.3
  from S2 (0.0): to S1 with 0.2 → 0.0; to S2 with 0.3 → 0.0

Reading a2:
  from S1 (0.1): to S1 with 0.2 → 0.1 × 0.2 = 0.02; to S2 with 0.4 → 0.1 × 0.4 = 0.04
  from S2 (0.3): to S1 with 0.3 → 0.3 × 0.3 = 0.09; to S2 with 0.2 → 0.3 × 0.2 = 0.06

Choose the winning sequence per state, per iteration.

Page 24

Tree structure contd…

Surviving scores: S1 = 0.09, S2 = 0.06.

Reading a1:
  from S1 (0.09): to S1 with 0.1 → 0.09 × 0.1 = 0.009; to S2 with 0.3 → 0.09 × 0.3 = 0.027
  from S2 (0.06): to S1 with 0.2 → 0.06 × 0.2 = 0.012; to S2 with 0.3 → 0.06 × 0.3 = 0.018
  Winners: S1 = 0.012, S2 = 0.027.

Reading a2:
  from S1 (0.012): to S1 with 0.2 → 0.0024; to S2 with 0.4 → 0.0048
  from S2 (0.027): to S1 with 0.3 → 0.0081; to S2 with 0.2 → 0.0054
  Winners: S1 = 0.0081, S2 = 0.0054.

The problem being addressed by this tree is
S* = argmax_S P(S | a1-a2-a1-a2, μ)
where a1-a2-a1-a2 is the output sequence and μ the model or the machine.

Page 25

Path found (working backward): S1 → S2 → S1 → S2 → S1, emitting a1, a2, a1, a2.

Problem statement: find the best possible sequence
S* = argmax_S P(S | O, μ)
where S → state sequence, O → output sequence, μ → model or machine.

Model or Machine = {S_0, S, A, T}
  S_0: start symbol/state
  S: state collection
  A: alphabet set
  T: transitions, defined as P(S_i --a_k--> S_j) ∀ i, j, k

Page 26

Tabular representation of the tree

Latest symbol observed →
Ending state   ε      a1                     a2             a1               a2
S1            1.0    (1.0×0.1, 0.0×0.2)     (0.02, 0.09)   (0.009, 0.012)   (0.0024, 0.0081)
                     = (0.1, 0.0)
S2            0.0    (1.0×0.3, 0.0×0.3)     (0.04, 0.06)   (0.027, 0.018)   (0.0048, 0.0054)
                     = (0.3, 0.0)

Note: every cell records the winning probability ending in that state.

Final winner: the bold-faced value in each cell is the best sequence probability ending in that state. The overall winner is 0.0081 in the S1 row of the last column; its position as the 2nd component of the tuple indicates that the previous state was S2. Going backward from it, we recover the sequence.

Page 27

Algorithm
(following James Allen, Natural Language Understanding (2nd edition), Benjamin Cummings (pub.), 1995)

Given:
1. The HMM, which means:
   a. Start State: S1
   b. Alphabet: A = {a1, a2, … ap}
   c. Set of States: S = {S1, S2, … Sn}
   d. Transition probability P(S_i --a_k--> S_j) ∀ i, j, k, which is equal to P(S_j, a_k | S_i)
2. The output string a1 a2 … aT

To find: the most likely sequence of states C1 C2 … CT which produces the given output sequence, i.e.,
C1 C2 … CT = argmax_C [P(C | a1, a2, …, aT, μ)]

Page 28

Algorithm contd…

Data Structures:
1. An N×T array called SEQSCORE to maintain the winner sequence always (N = #states, T = length of output sequence)
2. Another N×T array called BACKPTR to recover the path.

Three distinct steps in the Viterbi implementation:
1. Initialization
2. Iteration
3. Sequence Identification

Page 29

1. Initialization
SEQSCORE(1,1) = 1.0
BACKPTR(1,1) = 0
For i = 2 to N do
    SEQSCORE(i,1) = 0.0
[expressing the fact that the first state is S1]

2. Iteration
For t = 2 to T do
    For i = 1 to N do
        SEQSCORE(i,t) = Max over j = 1..N of [SEQSCORE(j, t-1) × P(S_j --a_k--> S_i)]
        BACKPTR(i,t) = the index j that gives the MAX above

Page 30

3. Sequence Identification
C(T) = the i that maximizes SEQSCORE(i,T)
For i from (T-1) down to 1 do
    C(i) = BACKPTR[C(i+1), (i+1)]

Optimizations possible:
1. BACKPTR can be 1×T
2. SEQSCORE can be N×2 (only the previous and current time columns are needed)
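Putting the three steps together on the probabilistic FSM from earlier (SEQSCORE and BACKPTR as in the slides; a sketch of our own, not the course's reference code), the run below reproduces the table: final score 0.0081 and path S1 S2 S1 S2 S1.

```python
import numpy as np

states = ["S1", "S2"]
# T[(i, a, j)] = P(S_i --a--> S_j), the probabilistic FSM from earlier.
T = {("S1", "a1", "S1"): 0.1, ("S1", "a1", "S2"): 0.3,
     ("S1", "a2", "S1"): 0.2, ("S1", "a2", "S2"): 0.4,
     ("S2", "a1", "S1"): 0.2, ("S2", "a1", "S2"): 0.3,
     ("S2", "a2", "S1"): 0.3, ("S2", "a2", "S2"): 0.2}

def viterbi(output):
    N, L = len(states), len(output)
    SEQSCORE = np.zeros((N, L + 1))
    BACKPTR = np.zeros((N, L + 1), dtype=int)
    # 1. Initialization: the first state is S1 (index 0).
    SEQSCORE[0, 0] = 1.0
    # 2. Iteration.
    for t in range(1, L + 1):
        a = output[t - 1]
        for i in range(N):
            scores = [SEQSCORE[j, t - 1] * T[(states[j], a, states[i])]
                      for j in range(N)]
            BACKPTR[i, t] = int(np.argmax(scores))
            SEQSCORE[i, t] = max(scores)
    # 3. Sequence identification: walk backpointers from the best end state.
    C = [int(np.argmax(SEQSCORE[:, L]))]
    for t in range(L, 0, -1):
        C.append(BACKPTR[C[-1], t])
    C.reverse()
    return SEQSCORE[C[-1], L], [states[c] for c in C]

score, path = viterbi(["a1", "a2", "a1", "a2"])
print(score, path)   # expected: 0.0081 ['S1', 'S2', 'S1', 'S2', 'S1']
```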

Homework: compare this with A* and Beam Search.
Reason for this comparison: both of them work for finding and recovering sequences.