CS626-460: Speech, NLP and the Web
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lectures 5 and 7: HMM and Viterbi, 10th and 16th Jan, 2012
(Lecture 6 was on Computational Biomedicine research at Houston University by Prof. Ioannis)
HMM Definition

[Diagram: the NLP Trinity — Problem axis (Morph Analysis, Part of Speech Tagging, Parsing, Semantics), Language axis (Hindi, Marathi, English, French), Algorithm axis (HMM, CRF, MEMM).]

Set of states: S, where |S| = N
Output alphabet: O, where |O| = K
Transition probabilities: A = {a_ij}, the probability of going from state S_i to state S_j
Emission probabilities: B = {b_pq}, the probability of outputting symbol O_q from state S_p
Initial state probabilities: π

λ = (A, B, π)
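As a minimal sketch (the array shapes and variable names are our own conventions, not from the slides), λ = (A, B, π) can be held in three NumPy arrays:

```python
import numpy as np

# λ = (A, B, π) for an HMM with N states and K output symbols.
N, K = 3, 3
A  = np.zeros((N, N))   # A[i, j] = P(S_j | S_i): transition probabilities
B  = np.zeros((N, K))   # B[p, q] = P(O_q | S_p): emission probabilities
pi = np.zeros(N)        # pi[i]   = P(S_i at t=0): initial state probabilities

# Once filled in, each row of A, each row of B, and pi must sum to 1.
```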
Markov Processes

Properties
Limited Horizon: given the previous k states, the state at time t is independent of all earlier states (X0 through Xt-k-1):
P(Xt=i | Xt-1, Xt-2, …, X0) = P(Xt=i | Xt-1, Xt-2, …, Xt-k)   (order-k Markov process)
Time invariance (shown for k=1):
P(Xt=i | Xt-1=j) = P(X1=i | X0=j) = … = P(Xn=i | Xn-1=j)
Three basic problems

Problem 1: Likelihood of a sequence — Forward Procedure, Backward Procedure
Problem 2: Best state sequence — Viterbi Algorithm
Problem 3: Re-estimation — Baum-Welch (Forward-Backward) Algorithm
Probabilistic Inference

O: Observation Sequence
S: State Sequence
Given O, find S* = argmax_S P(S|O); this is called probabilistic inference.
Infer the "Hidden" from the "Observed".
How is this inference different from logical inference based on propositional or predicate calculus?
Essentials of Hidden Markov Model

1. Markov + Naive Bayes.
2. Uses both transition and observation probability:
   P(S_k --O_k--> S_{k+1}) = P(O_k|S_k) . P(S_{k+1}|S_k)
3. Effectively makes the Hidden Markov Model a Finite State Machine (FSM) with probability.
Probability of Observation Sequence

P(O) = Σ_S P(O, S) = Σ_S P(S) . P(O|S)

Without any restriction, search space size = |S|^|O|
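A sketch of this sum by direct enumeration, to make the |S|^|O| blow-up concrete (the toy numbers below are illustrative assumptions, not from the slides); the Forward procedure exists precisely to avoid this enumeration:

```python
import itertools
import numpy as np

# Toy HMM (illustrative numbers only).
A  = np.array([[0.6, 0.4],
               [0.5, 0.5]])   # A[i, j] = P(next state j | state i)
B  = np.array([[0.7, 0.3],
               [0.2, 0.8]])   # B[i, q] = P(symbol q | state i)
pi = np.array([0.8, 0.2])     # initial state distribution

def prob_observation(obs):
    """P(O) = sum over every state sequence S of P(S) * P(O|S)."""
    n_states = A.shape[0]
    total = 0.0
    # |S|^|O| sequences: the unrestricted search space from the slide.
    for seq in itertools.product(range(n_states), repeat=len(obs)):
        p = pi[seq[0]] * B[seq[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t-1], seq[t]] * B[seq[t], obs[t]]
        total += p
    return total

print(prob_observation([0, 1, 1, 0]))
```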
Continuing with the Urn example

Colored ball choosing:
Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30
Example (contd.)

Given:

Transition probability:
      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

Observation/output probability:
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3

Observation: RRGGBRGR
What is the corresponding state sequence?
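The same parameters written down directly, reusing the array conventions from the earlier sketch (note the output rows are just the ball proportions in each urn, e.g. Urn 1: 30 red out of 100 balls = 0.3):

```python
import numpy as np

urns    = ["U1", "U2", "U3"]
symbols = {"R": 0, "G": 1, "B": 2}

# Transition probability: A[i, j] = P(U_{j+1} | U_{i+1}), from the table above.
A = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])

# Observation probability: B[i, q] = P(symbol q | urn i).
B = np.array([[0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])

observation = [symbols[c] for c in "RRGGBRGR"]
```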
Diagrammatic representation (1/2)

[State diagram: the three urns U1, U2, U3 as states, with the transition probabilities from the table on the arcs (e.g. U1→U3: 0.5, U2→U1: 0.6) and the emission probabilities attached to each state (e.g. U1: R 0.3, G 0.5, B 0.2).]
Diagrammatic representation (2/2)

[Same state diagram, but each arc now carries the combined probability P(O_k|S_k) . P(S_{k+1}|S_k) for each color, i.e. emission times transition (e.g. U1→U3: R 0.15, G 0.25, B 0.10, since 0.3*0.5 = 0.15, and so on).]
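These combined arc labels are elementwise products of the two tables; a quick check using the arrays defined above:

```python
# Combined arc probability for source urn i, target urn j, color q:
# P(q | U_i) * P(U_j | U_i). Reproduces the labels in the diagram,
# e.g. U1 -> U3 on Red: 0.3 * 0.5 = 0.15.
for i, src in enumerate(urns):
    for j, dst in enumerate(urns):
        for sym, q in symbols.items():
            print(f"{src}->{dst} on {sym}: {B[i, q] * A[i, j]:.2f}")
```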
Observations and states

        O1 O2 O3 O4 O5 O6 O7 O8
Obs:    R  R  G  G  B  R  G  R
State:  S1 S2 S3 S4 S5 S6 S7 S8

S_i = U1/U2/U3, a particular state
S: state sequence
O: observation sequence
S* = "best" possible state (urn) sequence
Goal: maximize P(S|O) by choosing the "best" S
Goal

Maximize P(S|O), where S is the State Sequence and O is the Observation Sequence:
S* = argmax_S P(S|O)
Bayes' Theorem

P(A|B) = P(A) . P(B|A) / P(B)
P(A): Prior
P(B|A): Likelihood

argmax_S P(S|O) = argmax_S P(S) . P(O|S)
State Transitions Probability

P(S) = P(S1-8)
     = P(S1) . P(S2|S1) . P(S3|S1-2) . P(S4|S1-3) … P(S8|S1-7)

By Markov Assumption (k=1):
P(S) = P(S1) . P(S2|S1) . P(S3|S2) . P(S4|S3) … P(S8|S7)
Observation Sequence probability

P(O|S) = P(O1|S1-8) . P(O2|O1, S1-8) . P(O3|O1-2, S1-8) … P(O8|O1-7, S1-8)

Assumption: the ball drawn depends only on the urn chosen:
P(O|S) = P(O1|S1) . P(O2|S2) . P(O3|S3) … P(O8|S8)
P(S|O) ∝ P(S) . P(O|S)
       = P(S1) . P(S2|S1) . P(S3|S2) … P(S8|S7)
         . P(O1|S1) . P(O2|S2) . P(O3|S3) … P(O8|S8)
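A direct implementation of this factorization, scoring one candidate urn sequence against RRGGBRGR with the arrays defined earlier (the uniform initial choice of urn is an assumption for illustration; the slides have not pinned down the start distribution at this point):

```python
import numpy as np

def score(states, obs, pi=np.full(3, 1 / 3)):
    """P(S) * P(O|S) under the first-order Markov + per-urn-draw assumptions."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t-1], states[t]] * B[states[t], obs[t]]
    return p

candidate = [0, 2, 1, 1, 2, 2, 1, 0]   # one of the 3^8 possible urn sequences
print(score(candidate, observation))
```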
Grouping terms

        O0 O1 O2 O3 O4 O5 O6 O7 O8
Obs:    ε  R  R  G  G  B  R  G  R
State:  S0 S1 S2 S3 S4 S5 S6 S7 S8 S9

We introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8) = 1. O0 is an ε-transition.

P(S) . P(O|S)
= [P(O0|S0) . P(S1|S0)] . [P(O1|S1) . P(S2|S1)] . [P(O2|S2) . P(S3|S2)] .
  [P(O3|S3) . P(S4|S3)] . [P(O4|S4) . P(S5|S4)] . [P(O5|S5) . P(S6|S5)] .
  [P(O6|S6) . P(S7|S6)] . [P(O7|S7) . P(S8|S7)] . [P(O8|S8) . P(S9|S8)]
Introducing useful notation

        O0 O1 O2 O3 O4 O5 O6 O7 O8
Obs:    ε  R  R  G  G  B  R  G  R
State:  S0 S1 S2 S3 S4 S5 S6 S7 S8 S9

[Diagram: the states S0 through S9 drawn as a chain, each arc S_k → S_{k+1} labeled with the symbol O_k emitted on it (ε, R, R, G, G, B, R, G, R).]

P(O_k|S_k) . P(S_{k+1}|S_k) = P(S_k --O_k--> S_{k+1})
Viterbi Algorithm for the Urn problem (first two symbols)

[Tree diagram: from S0, an ε-arc leads to U1, U2, U3 with initial probabilities 0.5, 0.3, 0.2; on reading R, each of these expands again to U1, U2, U3, with each branch weighted by emission times transition. Leaf probabilities shown: 0.015, 0.04, 0.075*, 0.018, 0.006, 0.006, 0.048*, 0.036*.]

*: winner sequences (per ending state, only the highest-probability sequence is kept)
Markov process of order > 1 (say 2)

        O0 O1 O2 O3 O4 O5 O6 O7 O8
Obs:    ε  R  R  G  G  B  R  G  R
State:  S0 S1 S2 S3 S4 S5 S6 S7 S8 S9

Same theory works. We again introduce the states S0 and S9 as initial and final states respectively; after S8 the next state is S9 with probability 1, i.e., P(S9|S8S7) = 1, and O0 is an ε-transition.

P(S) . P(O|S)
= P(O0|S0) . P(S1|S0) .
  [P(O1|S1) . P(S2|S1S0)] . [P(O2|S2) . P(S3|S2S1)] . [P(O3|S3) . P(S4|S3S2)] .
  [P(O4|S4) . P(S5|S4S3)] . [P(O5|S5) . P(S6|S5S4)] . [P(O6|S6) . P(S7|S6S5)] .
  [P(O7|S7) . P(S8|S7S6)] . [P(O8|S8) . P(S9|S8S7)]
Adjustments

The transition probability table will have tuples on rows and states on columns.
The output probability table will remain the same.
In the Viterbi tree, the Markov process will take effect from the 3rd input symbol (εRR).
There will be 27 leaves, out of which only 9 will remain.
Sequences ending in the same tuple will be compared.
Instead of U1, U2 and U3, the rows are: U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2, U3U3. A sketch of this state-tuple construction follows.
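A minimal sketch of the adjustment, assuming the second-order table is indexed by state pairs (the variable names are ours):

```python
import itertools

urns = ["U1", "U2", "U3"]

# Order-2 rows: every ordered pair of previous states.
tuple_states = ["".join(p) for p in itertools.product(urns, repeat=2)]
print(tuple_states)   # ['U1U1', 'U1U2', ..., 'U3U3']

# A second-order transition table maps a (S_{k-1}, S_k) row to next-state
# columns: P(S_{k+1} | S_k, S_{k-1}). With 9 rows and 3 columns, the Viterbi
# tree keeps one winner per row, hence only 9 of the 27 leaves survive.
```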
Probabilistic FSM

[Diagram: two states S1 and S2 with arcs labeled (symbol : probability); each arc carries one of a1, a2 together with the combined probability of emitting that symbol while making that transition.]

Arc probabilities P(S_i --a_k--> S_j), read off the diagram:
            on a1   on a2
S1 → S1      0.1     0.2
S1 → S2      0.3     0.4
S2 → S1      0.2     0.3
S2 → S2      0.3     0.2

The question here is: what is the most likely state sequence given the output sequence seen?
Developing the tree

Start: S1 = 1.0, S2 = 0.0   (ε, before any symbol)

After a1 (first symbol):
  S1: max(1.0*0.1, 0.0*0.2) = 0.1
  S2: max(1.0*0.3, 0.0*0.3) = 0.3

After a2 (second symbol):
  S1: max(0.1*0.2, 0.3*0.3) = max(0.02, 0.09) = 0.09
  S2: max(0.1*0.4, 0.3*0.2) = max(0.04, 0.06) = 0.06

Choose the winning sequence per state per iteration.
Tree structure contd…

From S1 = 0.09 and S2 = 0.06:

After a1 (third symbol):
  S1: max(0.09*0.1, 0.06*0.2) = max(0.009, 0.012) = 0.012
  S2: max(0.09*0.3, 0.06*0.3) = max(0.027, 0.018) = 0.027

After a2 (fourth symbol):
  S1: max(0.012*0.2, 0.027*0.3) = max(0.0024, 0.0081) = 0.0081
  S2: max(0.012*0.4, 0.027*0.2) = max(0.0048, 0.0054) = 0.0054

The problem being addressed by this tree is S* = argmax_S P(S | a1-a2-a1-a2, μ), where a1-a2-a1-a2 is the output sequence and μ the model or the machine.

Path found (working backward): S1 S2 S1 S2 S1
Problem statement: Find the best possible sequence
S* = argmax_S P(S | O, μ)
where S → State Sequence, O → Output Sequence, μ → Model or Machine

Model or Machine = {S0, S, A, T}
S0: start symbol; S: state collection; A: alphabet set; T: transitions
T is defined as P(S_i --a_k--> S_j) ∀ i, j, k
Tabular representation of the tree

Latest symbol observed →
Ending state   ε     a1                               a2             a1               a2
S1             1.0   (1.0*0.1, 0.0*0.2) = (0.1, 0.0)  (0.02, 0.09)   (0.009, 0.012)   (0.0024, 0.0081)
S2             0.0   (1.0*0.3, 0.0*0.3) = (0.3, 0.0)  (0.04, 0.06)   (0.027, 0.018)   (0.0048, 0.0054)

Note: every cell records the winning probability ending in that state.

Final winner: the winning (bold-faced on the slide) value in each cell is the best sequence probability ending in that state; within a cell, the two tuple positions give the probability of arriving from S1 and from S2 respectively. The final winner is 0.0081, ending in S1; since it sits in the 2nd tuple position, it arrived from S2, and going backward in this way we recover the sequence.
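A compact sketch of Viterbi on this two-state machine (self-contained; the combined arc probabilities are the ones read off the FSM above). It reproduces the table's winning values and the backtracked path:

```python
import numpy as np

# T[sym][i, j] = P(S_{i+1} --sym--> S_{j+1}): emission and transition combined.
T = {"a1": np.array([[0.1, 0.3],
                     [0.2, 0.3]]),
     "a2": np.array([[0.2, 0.4],
                     [0.3, 0.2]])}

def viterbi(output_seq, start=np.array([1.0, 0.0])):
    score, backptr = start, []
    for sym in output_seq:
        # For each ending state j, keep the best predecessor i.
        cand = score[:, None] * T[sym]          # cand[i, j]
        backptr.append(cand.argmax(axis=0))
        score = cand.max(axis=0)
    # Backtrack from the best final state.
    state = int(score.argmax())
    path = [state]
    for bp in reversed(backptr):
        state = int(bp[state])
        path.append(state)
    return score.max(), ["S%d" % (s + 1) for s in reversed(path)]

print(viterbi(["a1", "a2", "a1", "a2"]))
# (0.0081, ['S1', 'S2', 'S1', 'S2', 'S1'])
```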
Algorithm
(following James Allen, Natural Language Understanding (2nd edition), Benjamin Cummings (pub.), 1995)

Given:
1. The HMM, which means:
   a. Start state: S1
   b. Alphabet: A = {a1, a2, … ap}
   c. Set of states: S = {S1, S2, … Sn}
   d. Transition probability P(S_i --a_k--> S_j) ∀ i, j, k, which is equal to P(S_j, a_k | S_i)
2. The output string a1 a2 … aT

To find: the most likely sequence of states C1 C2 … CT which produces the given output sequence, i.e.,
C1 C2 … CT = argmax_C [P(C | a1, a2, …, aT, μ)]
Algorithm contd…

Data structures:
1. An N*T array called SEQSCORE to maintain the winner sequence always (N = #states, T = length of o/p sequence)
2. Another N*T array called BACKPTR to recover the path

Three distinct steps in the Viterbi implementation:
1. Initialization
2. Iteration
3. Sequence identification
1. Initialization
SEQSCORE(1,1) = 1.0
BACKPTR(1,1) = 0
For (i = 2 to N) do
    SEQSCORE(i,1) = 0.0
[expressing the fact that the first state is S1]
2. Iteration
For (t = 2 to T) do
    For (i = 1 to N) do
        SEQSCORE(i,t) = Max over j = 1..N of [SEQSCORE(j, t-1) * P(S_j --a_k--> S_i)]
        BACKPTR(i,t) = the index j that gives the Max above
3. Sequence identification
C(T) = the i that maximizes SEQSCORE(i,T)
For i from (T-1) down to 1 do
    C(i) = BACKPTR[C(i+1), (i+1)]

Optimizations possible:
1. BACKPTR can be 1*T
2. SEQSCORE can be N*2 (only the previous column is needed at each step)
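A sketch of this pseudocode in Python (0-indexed; the transition probabilities P(S_j --a_k--> S_i) are supplied as a dict of matrices keyed by symbol, as in the FSM example above):

```python
import numpy as np

def viterbi_allen(T_prob, output, n_states):
    """SEQSCORE/BACKPTR formulation; state 0 plays the role of start state S1."""
    T_len = len(output)
    seqscore = np.zeros((n_states, T_len))
    backptr = np.zeros((n_states, T_len), dtype=int)
    seqscore[0, 0] = 1.0                      # Initialization: first state is S1
    for t in range(1, T_len):                 # Iteration
        for i in range(n_states):
            cand = seqscore[:, t-1] * T_prob[output[t]][:, i]
            backptr[i, t] = cand.argmax()
            seqscore[i, t] = cand.max()
    c = [int(seqscore[:, -1].argmax())]       # Sequence identification
    for t in range(T_len - 1, 0, -1):
        c.append(int(backptr[c[-1], t]))
    return list(reversed(c))

# e.g. with the FSM above: viterbi_allen(T, ["ε", "a1", "a2", "a1", "a2"], 2)
# returns [0, 1, 0, 1, 0], i.e. S1 S2 S1 S2 S1.
```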
Homework: Compare this with A* and Beam Search.
Reason for this comparison: both of them work for finding and recovering sequences.