Natural Language Processing
Spring 2007
V. “Juggy” Jagannathan
Course Book: Foundations of Statistical Natural Language Processing
By Christopher Manning & Hinrich Schütze
Chapter 9
Markov Models
March 5, 2007
Markov models

• Markov assumption – Suppose X = (X1, …, XT) is a sequence of random variables taking values in some finite set S = {s1, …, sN}. The Markov properties are:
• Limited horizon – P(Xt+1 = sk|X1,…,Xt) = P(Xt+1 = sk|Xt)
  – i.e. the value at t+1 depends only on the value at t
• Time invariant (stationary) – the transition probabilities do not change with t
• Stochastic transition matrix A:
  – aij = P(Xt+1 = sj|Xt = si), where aij ≥ 0 for all i, j and Σj=1..N aij = 1 for all i
Markov model example

P(X1, …, XT) = P(X1) P(X2|X1) P(X3|X1,X2) … P(XT|X1,…,XT-1)
             = P(X1) P(X2|X1) P(X3|X2) … P(XT|XT-1)
             = πX1 Πt=1..T-1 aXtXt+1

Example: P(t, i, p) = P(X1 = t) P(X2 = i|X1 = t) P(X3 = p|X2 = i)
                    = 1.0 × 0.3 × 0.6 = 0.18
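The chain-rule product above can be sketched in a few lines of Python. Only the probabilities the P(t, i, p) example actually uses are filled in below; a full model would specify the complete transition matrix, so treat this as a hypothetical partial model.

```python
# Probability of a state sequence under a first-order Markov chain:
# P(X1..XT) = pi[X1] * product of a[Xt][Xt+1] over t = 1..T-1.
def chain_probability(seq, pi, a):
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= a[prev][cur]
    return p

# Partial model: only the entries used by the slide's example.
pi = {"t": 1.0}
a = {"t": {"i": 0.3}, "i": {"p": 0.6}}
print(chain_probability(["t", "i", "p"], pi, a))  # ≈ 0.18
```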
Probability of the output sequence {lem, ice_t} given the machine starts in CP:
0.3 × 0.7 × 0.1 + 0.3 × 0.3 × 0.7 = 0.021 + 0.063 = 0.084
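The sum above enumerates the two hidden-state paths that can produce {lem, ice_t} from CP. A sketch of that enumeration follows; the transition and emission values beyond the three factors shown in the slide are assumed from the course book's soft drink machine example (CP/IP states emitting cola, ice_t, lem).

```python
# Assumed parameters (course book's soft drink machine example);
# only the factors 0.3, 0.7, 0.1, 0.3, 0.7 appear in the slide itself.
trans = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
emit = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
        "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def output_probability(outputs, start):
    """Sum P(outputs, path) over all hidden paths starting in `start`."""
    paths = [(start, emit[start][outputs[0]])]  # (current state, prob so far)
    for o in outputs[1:]:
        paths = [(nxt, p * trans[s][nxt] * emit[nxt][o])
                 for s, p in paths for nxt in trans[s]]
    return sum(p for _, p in paths)

print(output_probability(["lem", "ice_t"], "CP"))  # ≈ 0.084
```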
Hidden Markov Model Example

Why use HMMs?
• Underlying hidden events generate the surface observable events
• E.g. predicting weather based on the dampness of seaweed:
  http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html
• Linear interpolation in n-gram models:
  Pli(wn|wn-1,wn-2) = λ1 P1(wn) + λ2 P2(wn|wn-1) + λ3 P3(wn|wn-1,wn-2)

Look at the notes from David Meir Blei [UC Berkeley], slides 1-13:
http://www-nlp.stanford.edu/fsnlp/hmm-chap/blei-hmm-ch9.ppt
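The interpolation formula can be sketched directly from maximum-likelihood n-gram estimates. The toy corpus and lambda weights below are invented for illustration; in practice the weights are tuned on held-out data.

```python
# Hedged sketch: linear interpolation of unigram, bigram and trigram
# estimates, P_li = l1*P1(wn) + l2*P2(wn|wn-1) + l3*P3(wn|wn-1,wn-2).
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()  # toy data
uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
N = len(corpus)

def p_uni(w):
    return uni[w] / N

def p_bi(w, prev):
    return bi[(prev, w)] / uni[prev] if uni[prev] else 0.0

def p_tri(w, prev2, prev1):
    return tri[(prev2, prev1, w)] / bi[(prev2, prev1)] if bi[(prev2, prev1)] else 0.0

def p_li(w, prev2, prev1, lambdas=(0.2, 0.3, 0.5)):
    l1, l2, l3 = lambdas  # illustrative weights; must sum to 1
    return l1 * p_uni(w) + l2 * p_bi(w, prev1) + l3 * p_tri(w, prev2, prev1)

print(p_li("sat", "the", "cat"))
```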
Forward Procedure

αi(t) = P(o1 … ot-1, Xt = i | μ)

Initialization: αi(1) = πi, 1 ≤ i ≤ N
Induction: αj(t+1) = Σi=1..N αi(t) aij bijot, 1 ≤ t ≤ T, 1 ≤ j ≤ N
Total computation: P(O|μ) = Σi=1..N αi(T+1)
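The forward recursion can be sketched as below, in the arc-emission form used here (bijo is the probability of emitting o while moving from i to j). The two-state model is invented for illustration.

```python
# Hedged sketch of the forward procedure (arc-emission form):
# alpha_j(t+1) = sum_i alpha_i(t) * a[i][j] * b[i][j][o_t].
def forward(obs, pi, a, b):
    N = len(pi)
    alpha = [pi[:]]                      # alpha_i(1) = pi_i
    for o in obs:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] * b[i][j][o] for i in range(N))
                      for j in range(N)])
    return sum(alpha[-1])                # P(O|mu) = sum_i alpha_i(T+1)

# Invented toy model (two states, outputs "x" and "y").
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[{"x": 0.5, "y": 0.5}, {"x": 0.9, "y": 0.1}],
     [{"x": 0.2, "y": 0.8}, {"x": 0.6, "y": 0.4}]]
print(forward(["x", "y"], pi, a, b))
```

As a sanity check, the probabilities of all length-2 observation sequences should sum to 1.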
Backward Procedure

βi(t) = P(ot … oT | Xt = i, μ)

Initialization: βi(T+1) = 1, 1 ≤ i ≤ N
Induction: βi(t) = Σj=1..N aij bijot βj(t+1), 1 ≤ t ≤ T, 1 ≤ i ≤ N
Total computation: P(O|μ) = Σi=1..N πi βi(1)
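The backward recursion is the mirror image of the forward one, walking the observations right to left. Same arc-emission convention and the same invented two-state toy model as before.

```python
# Hedged sketch of the backward procedure (arc-emission form):
# beta_i(t) = sum_j a[i][j] * b[i][j][o_t] * beta_j(t+1).
def backward(obs, pi, a, b):
    N = len(pi)
    beta = [1.0] * N                     # beta_i(T+1) = 1
    for o in reversed(obs):
        beta = [sum(a[i][j] * b[i][j][o] * beta[j] for j in range(N))
                for i in range(N)]
    return sum(pi[i] * beta[i] for i in range(N))   # P(O|mu)

# Invented toy model (two states, outputs "x" and "y").
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[{"x": 0.5, "y": 0.5}, {"x": 0.9, "y": 0.1}],
     [{"x": 0.2, "y": 0.8}, {"x": 0.6, "y": 0.4}]]
print(backward(["x", "y"], pi, a, b))
```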
Combining both – forward and backward

P(O, Xt = i | μ) = P(o1 … ot-1, Xt = i | μ) P(ot … oT | Xt = i, μ)
                 = αi(t) βi(t)

P(O|μ) = Σi=1..N αi(t) βi(t), 1 ≤ t ≤ T+1
Finding the best state sequence

To determine the state sequence that best explains the observations, let:

γi(t) = P(Xt = i | O, μ) = P(Xt = i, O | μ) / P(O|μ) = αi(t) βi(t) / Σj=1..N αj(t) βj(t)

Individually the most likely state is:

X̂t = argmax1≤i≤N γi(t), 1 ≤ t ≤ T+1

This approach, however, does not correctly estimate the most likely state sequence.
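The per-time posteriors γi(t) combine the full forward and backward tables. A sketch, again with an invented two-state arc-emission toy model:

```python
# Hedged sketch: gamma_i(t) = alpha_i(t)*beta_i(t) / sum_j alpha_j(t)*beta_j(t).
def posteriors(obs, pi, a, b):
    N = len(pi)
    alpha = [pi[:]]                      # forward table, alpha_i(1) = pi_i
    for o in obs:
        alpha.append([sum(alpha[-1][i] * a[i][j] * b[i][j][o] for i in range(N))
                      for j in range(N)])
    beta = [[1.0] * N]                   # backward table, beta_i(T+1) = 1
    for o in reversed(obs):
        beta.insert(0, [sum(a[i][j] * b[i][j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    gammas = []
    for al, be in zip(alpha, beta):      # one row per t = 1..T+1
        z = sum(x * y for x, y in zip(al, be))
        gammas.append([x * y / z for x, y in zip(al, be)])
    return gammas

# Invented toy model (two states, outputs "x" and "y").
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[{"x": 0.5, "y": 0.5}, {"x": 0.9, "y": 0.1}],
     [{"x": 0.2, "y": 0.8}, {"x": 0.6, "y": 0.4}]]
print(posteriors(["x", "y"], pi, a, b))
```

Each row is a distribution over states at one time step, so each row sums to 1.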
Finding the best state sequence – Viterbi algorithm

X̂ = argmaxX P(X | O, μ)

δj(t) = maxx1…xt-1 P(x1 … xt-1, o1 … ot-1, Xt = j | μ)

Store the most probable path that leads to a given node.

Initialization: δj(1) = πj, 1 ≤ j ≤ N
Induction: δj(t+1) = max1≤i≤N δi(t) aij bijot, 1 ≤ j ≤ N
Store backtrace: ψj(t+1) = argmax1≤i≤N δi(t) aij bijot, 1 ≤ j ≤ N
Termination:
X̂T+1 = argmax1≤i≤N δi(T+1)
P(X̂) = max1≤i≤N δi(T+1)
Parameter Estimation

Probability of traversing an arc from state i to state j at time t, given observation sequence O:

pt(i, j) = P(Xt = i, Xt+1 = j | O, μ)
         = P(Xt = i, Xt+1 = j, O | μ) / P(O|μ)
         = αi(t) aij bijot βj(t+1) / Σm=1..N αm(t) βm(t)

With γi(t) = Σj=1..N pt(i, j):

Σt=1..T γi(t) = expected number of transitions from state i in O
Σt=1..T pt(i, j) = expected number of transitions from state i to j in O
Parameter Estimation

Re-estimation formulas:

âij = Σt=1..T pt(i, j) / Σt=1..T γi(t)
    = expected number of transitions from i to j / expected number of transitions from i

b̂ijk = Σ{t : ot = k, 1 ≤ t ≤ T} pt(i, j) / Σt=1..T pt(i, j)
    = expected number of transitions from i to j with k observed / expected number of transitions from i to j
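One re-estimation step for the transition matrix can be sketched by accumulating the pt(i, j) expected counts and normalizing. The toy model is invented; a full Baum-Welch implementation would also re-estimate the emission probabilities and iterate to convergence.

```python
# Hedged sketch of one Baum-Welch step for a_ij:
# a_hat[i][j] = sum_t p_t(i,j) / sum_t gamma_i(t).
def reestimate_a(obs, pi, a, b):
    N = len(pi)
    alpha = [pi[:]]                      # forward table
    for o in obs:
        alpha.append([sum(alpha[-1][i] * a[i][j] * b[i][j][o] for i in range(N))
                      for j in range(N)])
    beta = [[1.0] * N]                   # backward table
    for o in reversed(obs):
        beta.insert(0, [sum(a[i][j] * b[i][j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    num = [[0.0] * N for _ in range(N)]  # sum_t p_t(i,j)
    den = [0.0] * N                      # sum_t gamma_i(t)
    for t, o in enumerate(obs):
        z = sum(alpha[t][m] * beta[t][m] for m in range(N))   # P(O|mu)
        for i in range(N):
            for j in range(N):
                p = alpha[t][i] * a[i][j] * b[i][j][o] * beta[t + 1][j] / z
                num[i][j] += p
                den[i] += p
    return [[num[i][j] / den[i] for j in range(N)] for i in range(N)]

# Invented toy model (two states, outputs "x" and "y").
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[{"x": 0.5, "y": 0.5}, {"x": 0.9, "y": 0.1}],
     [{"x": 0.2, "y": 0.8}, {"x": 0.6, "y": 0.4}]]
print(reestimate_a(["x", "y", "x"], pi, a, b))
```

By construction the new rows normalize, since each âi· is a ratio of expected counts from the same state.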