Hidden Markov Models (HMM)
Rabiner’s Paper
Markoviana Reading Group, Computer Eng. & Science Dept., Arizona State University
Markoviana Reading Group, Fatih Gelgi – Feb 2005
Stationary and Non-stationary
Stationary process: its statistical properties do not vary with time.
Non-stationary process: the signal properties vary over time.
HMM Example - Casino Coin
States: Fair (F) and Unfair (U); observation symbols: H and T.
State transition probabilities: Fair→Fair 0.9, Fair→Unfair 0.1; Unfair→Unfair 0.8, Unfair→Fair 0.2.
Symbol emission probabilities: Fair: P(H) = 0.5, P(T) = 0.5; Unfair: P(H) = 0.7, P(T) = 0.3.
Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence: FFFFFFUUUFFFFFFUUUUUUUFFFFFF
Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?
Two probability tables: state transitions and symbol emissions.
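The casino model above can be written down directly as probability tables. A minimal NumPy sketch (the uniform initial distribution π is an assumption; the slide does not state one, and the array names are mine):

```python
import numpy as np

# States: 0 = Fair, 1 = Unfair; symbols: 0 = H, 1 = T.
# Values taken from the casino-coin example on this slide.
pi = np.array([0.5, 0.5])          # assumed uniform initial state distribution
A = np.array([[0.9, 0.1],          # Fair -> Fair, Fair -> Unfair
              [0.2, 0.8]])         # Unfair -> Fair, Unfair -> Unfair
B = np.array([[0.5, 0.5],          # Fair:   P(H), P(T)
              [0.7, 0.3]])         # Unfair: P(H), P(T)

# Each row of A and B is a probability distribution and must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```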
Properties of an HMM
First-order Markov process: the state q_t depends only on q_{t-1}
Time is discrete
Elements of an HMM
An HMM λ = (A, B, π) consists of:
N, the number of states S_1, S_2, …, S_N
M, the number of observation symbols O_1, O_2, …, O_M
A = {a_ij}, the state transition probability distribution: a_ij = P(q_{t+1} = S_j | q_t = S_i), an N×N matrix with one row per source state S_1 … S_N
B = {b_j(k)}, the observation symbol probability distribution: b_j(k) = P(O_k at time t | q_t = S_j), an N×M matrix with one row per state and one column per symbol
π = {π_i}, the initial state distribution: π_i = P(q_1 = S_i), a vector over S_1 … S_N
HMM Basic Problems
1. Given an observation sequence O = O_1 O_2 O_3 … O_T and a model λ, find P(O|λ)
Forward Algorithm / Backward Algorithm
2. Given O = O_1 O_2 O_3 … O_T and λ, find the most likely state sequence Q = q_1 q_2 … q_T
Viterbi Algorithm
3. Given O = O_1 O_2 O_3 … O_T, re-estimate λ so that P(O|λ) is higher than it is now
Baum-Welch Re-estimation
Forward Algorithm Illustration
α_t(i) is the probability of observing the partial sequence O_1 O_2 O_3 … O_t and being in state S_i at time t.
Forward Algorithm Illustration (cont’d)
The trellis holds α_t(j): one row per state S_1 … S_N, one column per observation O_1 O_2 … O_T.
Column O_1, state S_j: α_1(j) = π_j b_j(O_1)
Column O_2, state S_j: α_2(j) = [Σ_i α_1(i) a_ij] b_j(O_2)
…and so on for O_3 … O_T.
The total of the last column gives the solution P(O|λ).
Forward Algorithm
Definition: α_t(i) = P(O_1 O_2 … O_t, q_t = S_i | λ)
α_t(i) is the probability of observing the partial sequence O_1 O_2 … O_t and being in state S_i at time t.
Initialization: α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N
Induction: α_{t+1}(j) = [Σ_{i=1}^N α_t(i) a_ij] b_j(O_{t+1}), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N
Problem 1 answer: P(O|λ) = Σ_{i=1}^N α_T(i)
Complexity: O(N²T)
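The definition, initialization, and induction steps above translate almost line-for-line into code. A minimal NumPy sketch (function and variable names are mine; the parameter values at the end reuse the casino-coin example for illustration):

```python
import numpy as np

def forward(O, pi, A, B):
    """Forward algorithm: returns alpha (T x N) and P(O | lambda).

    O  : observation sequence as symbol indices, length T
    pi : initial state distribution, shape (N,)
    A  : transitions, A[i, j] = P(q_{t+1} = S_j | q_t = S_i)
    B  : emissions, B[j, k] = P(O_t = v_k | q_t = S_j)
    """
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # initialization
    for t in range(T - 1):                         # induction
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha, alpha[-1].sum()                  # P(O|lambda) = sum_i alpha_T(i)

# Illustrative casino-style parameters (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
_, p = forward([0, 1, 0], pi, A, B)   # observation H, T, H
```

Each induction step is one matrix-vector product, which is where the O(N²T) cost comes from.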
Backward Algorithm Illustration
β_t(i) is the probability of observing the partial sequence O_{t+1} O_{t+2} O_{t+3} … O_T given state S_i at time t.
Backward Algorithm
Definition: β_t(i) = P(O_{t+1} O_{t+2} … O_T | q_t = S_i, λ)
Initialization: β_T(i) = 1, 1 ≤ i ≤ N
Induction: β_t(i) = Σ_{j=1}^N a_ij b_j(O_{t+1}) β_{t+1}(j), t = T−1, …, 1, 1 ≤ i ≤ N
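The backward pass mirrors the forward one, run in reverse time. A minimal sketch under the same array conventions as the casino example (names are mine; note P(O|λ) can be recovered from β_1 as a cross-check):

```python
import numpy as np

def backward(O, A, B):
    """Backward algorithm: beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                  # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                  # induction, backwards in time
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

# Illustrative casino-style parameters (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
O = [0, 1, 0]
beta = backward(O, A, B)
p = (pi * B[:, O[0]] * beta[0]).sum()   # P(O|lambda) via the backward variables
```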
Q2: Optimality Criterion 1
* Maximize the expected number of correct individual states.
Definition: γ_t(i) = P(q_t = S_i | O, λ) = α_t(i) β_t(i) / P(O|λ)
γ_t(i) is the probability of being in state S_i at time t given the observation sequence O and the model λ.
Problem 2 answer: q_t* = argmax_{1≤i≤N} γ_t(i), 1 ≤ t ≤ T
Problem: if some a_ij = 0, the optimal state sequence may not even be a valid state sequence.
Q2: Optimality Criterion 2
* Find the single best state sequence (path), i.e., maximize P(Q|O, λ).
Definition: δ_t(i) = max_{q_1 … q_{t−1}} P(q_1 q_2 … q_{t−1}, q_t = S_i, O_1 O_2 … O_t | λ)
δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 … O_t ending in state S_i.
Viterbi Algorithm
The major difference from the forward algorithm: maximization instead of summation,
δ_{t+1}(j) = [max_{1≤i≤N} δ_t(i) a_ij] b_j(O_{t+1})
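Replacing the sum with a max, plus back-pointers for the traceback, gives a compact Viterbi implementation. A sketch under the same array conventions as the casino example (names and the toy query are mine):

```python
import numpy as np

def viterbi(O, pi, A, B):
    """Viterbi: most likely state path for O, and its probability."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))           # delta[t, j]: best path prob ending in S_j
    psi = np.zeros((T, N), dtype=int)  # back-pointers for the traceback
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)       # max instead of sum
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # traceback, starting from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, delta[-1].max()

# Illustrative casino-style parameters (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
path, p_star = viterbi([0, 1, 0], pi, A, B)
```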
Viterbi Algorithm Illustration
The trellis holds δ_t(j): one row per state S_1 … S_N, one column per observation O_1 O_2 … O_T.
Column O_1, state S_j: δ_1(j) = π_j b_j(O_1)
Column O_2, state S_j: δ_2(j) = [max_i δ_1(i) a_ij] b_j(O_2)
…and so on for O_3 … O_T.
The maximum of the last column indicates where traceback starts.
δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 … O_t ending in state S_i.
Relations with DBN
Forward function: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1})
Backward function: β_t(i) = Σ_j a_ij b_j(O_{t+1}) β_{t+1}(j), with β_T(i) = 1
Viterbi algorithm: δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_j(O_{t+1})
Some more definitions
γ_t(i) is the probability of being in state S_i at time t: γ_t(i) = α_t(i) β_t(i) / P(O|λ)
ξ_t(i,j) is the probability of being in state S_i at time t and in state S_j at time t+1: ξ_t(i,j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O|λ)
Baum-Welch Re-estimation
An instance of the Expectation-Maximization (EM) algorithm.
Expectation: compute γ_t(i) and ξ_t(i,j) from the current model λ using the forward and backward variables.
Baum-Welch Re-estimation (cont’d)
Maximization: re-estimate the parameters from the expected counts:
π̄_i = γ_1(i)
ā_ij = Σ_{t=1}^{T−1} ξ_t(i,j) / Σ_{t=1}^{T−1} γ_t(i)
b̄_j(k) = Σ_{t: O_t = v_k} γ_t(j) / Σ_{t=1}^T γ_t(j)
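One E-step plus M-step can be sketched as follows for a single observation sequence; the forward and backward passes are inlined to keep the block self-contained, and the toy sequence at the end is illustrative only:

```python
import numpy as np

def baum_welch_step(O, pi, A, B):
    """One EM iteration of Baum-Welch re-estimation (single sequence)."""
    T, N = len(O), len(pi)
    # E-step: forward and backward variables
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    pO = alpha[-1].sum()                          # P(O | lambda)
    gamma = alpha * beta / pO                     # gamma_t(i)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, O[1:]].T * beta[1:])[:, None, :]) / pO  # xi_t(i, j)
    # M-step: re-estimated parameters
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return pi_new, A_new, B_new

# Illustrative run on a short casino-style sequence (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
O = [0, 1, 1, 0, 0, 1, 0]
pi_new, A_new, B_new = baum_welch_step(O, pi, A, B)
```

By EM theory, iterating this step never decreases P(O|λ), which is exactly Problem 3.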
Notes on the Re-estimation
If the model does not change, it has reached a local maximum.
Depending on the model, many local maxima can exist.
Re-estimated probabilities will sum to 1.
Implementation issues
Scaling
Multiple observation sequences
Initial parameter estimation
Missing data
Choice of model size and type
Scaling
The forward variables α_t(i) head exponentially to zero as t grows and soon exceed machine precision, so they are rescaled at every step.
α̂ calculation: α̂_t(i) = α_t(i) / Σ_{j=1}^N α_t(j), i.e., each column of the trellis is normalized with scale factor c_t = 1 / Σ_{j=1}^N α_t(j).
Recursion to calculate α̂: apply the usual forward induction to the already-scaled α̂_{t−1}, then normalize the result by c_t.
Scaling (cont’d)
β̂ calculation: the backward variables are scaled with the same factors, β̂_t(i) = c_t β_t(i).
Desired condition: Σ_{i=1}^N α̂_t(i) = 1 for every t.
* Note that Σ_{i=1}^N β̂_t(i) = 1 is not true!
Scaling (cont’d)
With scaling, the likelihood is recovered from the scale factors alone: P(O|λ) = 1 / Π_{t=1}^T c_t, so log P(O|λ) = −Σ_{t=1}^T log c_t.
Maximum log-likelihood (Viterbi with logarithms, which also avoids underflow):
Initialization: φ_1(i) = log π_i + log b_i(O_1)
Recursion: φ_{t+1}(j) = max_{1≤i≤N} [φ_t(i) + log a_ij] + log b_j(O_{t+1})
Termination: log P* = max_{1≤i≤N} φ_T(i)
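The scaled forward pass and the identity log P(O|λ) = −Σ_t log c_t can be sketched together as (names are mine; casino parameters reused for illustration):

```python
import numpy as np

def forward_scaled(O, pi, A, B):
    """Scaled forward pass: returns log P(O | lambda) without underflow."""
    T, N = len(O), len(pi)
    alpha_hat = np.zeros((T, N))
    c = np.zeros(T)                       # scale factors c_t
    a = pi * B[:, O[0]]
    c[0] = 1.0 / a.sum()
    alpha_hat[0] = a * c[0]               # scaled column sums to 1 by construction
    for t in range(T - 1):
        a = (alpha_hat[t] @ A) * B[:, O[t + 1]]
        c[t + 1] = 1.0 / a.sum()
        alpha_hat[t + 1] = a * c[t + 1]
    return -np.log(c).sum()               # log P(O|lambda) = -sum_t log c_t

# Illustrative casino-style parameters (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
O = [0, 1, 0]
logp = forward_scaled(O, pi, A, B)
```

For a short sequence this agrees with the unscaled forward algorithm; for long sequences only the scaled version stays within floating-point range.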
Multiple observation sequences
Problem with re-estimation: the formulas above assume a single training sequence; with multiple sequences, the γ and ξ statistics must be accumulated over all sequences before normalizing.
Initial estimates of parameters
For π and A, random or uniform initialization is sufficient.
For B (discrete symbol probabilities), a good initial estimate is needed.
Insufficient training data
Solutions:
Increase the size of training data
Reduce the size of the model
Interpolate parameters using another model
References
L. Rabiner. ‘A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.’ Proceedings of the IEEE, 1989.
S. Russell, P. Norvig. ‘Probabilistic Reasoning over Time.’ AI: A Modern Approach, Ch. 15, 2002 (draft).
V. Borkar, K. Deshmukh, S. Sarawagi. ‘Automatic Segmentation of Text into Structured Records.’ ACM SIGMOD, 2001.
T. Scheffer, C. Decomain, S. Wrobel. ‘Active Hidden Markov Models for Information Extraction.’ Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
S. Ray, M. Craven. ‘Representing Sentence Structure in Hidden Markov Models for Information Extraction.’ Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.