Hidden Markov Models (HMM)
Rabiner’s Paper
Markoviana Reading Group, Computer Eng. & Science Dept., Arizona State University
Markoviana Reading Group, Fatih Gelgi – Feb 2005
Stationary and Non-stationary
Stationary process: its statistical properties do not vary with time.
Non-stationary process: the signal properties vary over time.
HMM Example - Casino Coin
States: Fair (F) and Unfair (U); observation symbols: H and T.
State transition probabilities: Fair→Fair 0.9, Fair→Unfair 0.1; Unfair→Unfair 0.8, Unfair→Fair 0.2.
Symbol emission probabilities: Fair: P(H) = 0.5, P(T) = 0.5; Unfair: P(H) = 0.7, P(T) = 0.3.
Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence: FFFFFFUUUFFFFFFUUUUUUUFFFFFF
Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?
Two probability tables: state transitions and symbol emissions.
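The casino model above can be written down directly as probability tables. A minimal NumPy sketch (the uniform initial distribution π is an assumption; the slide does not state one, and the array names are mine):

```python
import numpy as np

# States: 0 = Fair, 1 = Unfair; symbols: 0 = H, 1 = T.
# Values taken from the casino-coin example on this slide.
pi = np.array([0.5, 0.5])          # assumed uniform initial state distribution
A = np.array([[0.9, 0.1],          # Fair -> Fair, Fair -> Unfair
              [0.2, 0.8]])         # Unfair -> Fair, Unfair -> Unfair
B = np.array([[0.5, 0.5],          # Fair:   P(H), P(T)
              [0.7, 0.3]])         # Unfair: P(H), P(T)

# Each row of A and B is a probability distribution and must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```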
Properties of an HMM
First-order Markov process: the state q_t depends only on q_{t-1}
Time is discrete
Elements of an HMM
An HMM λ = (A, B, π) consists of:
N, the number of states S_1, S_2, …, S_N
M, the number of observation symbols O_1, O_2, …, O_M
A = {a_ij}, the state transition probability distribution: a_ij = P(q_{t+1} = S_j | q_t = S_i), an N×N matrix with one row per source state S_1 … S_N
B = {b_j(k)}, the observation symbol probability distribution: b_j(k) = P(O_k at time t | q_t = S_j), an N×M matrix with one row per state and one column per symbol
π = {π_i}, the initial state distribution: π_i = P(q_1 = S_i), a vector over S_1 … S_N
HMM Basic Problems
1. Given an observation sequence O = O_1 O_2 O_3 … O_T and a model λ, find P(O|λ)
Forward Algorithm / Backward Algorithm
2. Given O = O_1 O_2 O_3 … O_T and λ, find the most likely state sequence Q = q_1 q_2 … q_T
Viterbi Algorithm
3. Given O = O_1 O_2 O_3 … O_T, re-estimate λ so that P(O|λ) is higher than it is now
Baum-Welch Re-estimation
Forward Algorithm Illustration
α_t(i) is the probability of observing the partial sequence O_1 O_2 O_3 … O_t and being in state S_i at time t.
Forward Algorithm Illustration (cont’d)
The trellis holds α_t(j): one row per state S_1 … S_N, one column per observation O_1 O_2 … O_T.
Column O_1, state S_j: α_1(j) = π_j b_j(O_1)
Column O_2, state S_j: α_2(j) = [Σ_i α_1(i) a_ij] b_j(O_2)
…and so on for O_3 … O_T.
The total of the last column gives the solution P(O|λ).
Forward Algorithm
Definition: α_t(i) = P(O_1 O_2 … O_t, q_t = S_i | λ)
α_t(i) is the probability of observing the partial sequence O_1 O_2 … O_t and being in state S_i at time t.
Initialization: α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N
Induction: α_{t+1}(j) = [Σ_{i=1}^N α_t(i) a_ij] b_j(O_{t+1}), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N
Problem 1 answer: P(O|λ) = Σ_{i=1}^N α_T(i)
Complexity: O(N²T)
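The definition, initialization, and induction steps above translate almost line-for-line into code. A minimal NumPy sketch (function and variable names are mine; the parameter values at the end reuse the casino-coin example for illustration):

```python
import numpy as np

def forward(O, pi, A, B):
    """Forward algorithm: returns alpha (T x N) and P(O | lambda).

    O  : observation sequence as symbol indices, length T
    pi : initial state distribution, shape (N,)
    A  : transitions, A[i, j] = P(q_{t+1} = S_j | q_t = S_i)
    B  : emissions, B[j, k] = P(O_t = v_k | q_t = S_j)
    """
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # initialization
    for t in range(T - 1):                         # induction
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha, alpha[-1].sum()                  # P(O|lambda) = sum_i alpha_T(i)

# Illustrative casino-style parameters (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
_, p = forward([0, 1, 0], pi, A, B)   # observation H, T, H
```

Each induction step is one matrix-vector product, which is where the O(N²T) cost comes from.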
Backward Algorithm Illustration
β_t(i) is the probability of observing the partial sequence O_{t+1} O_{t+2} O_{t+3} … O_T given state S_i at time t.
Backward Algorithm
Definition: β_t(i) = P(O_{t+1} O_{t+2} … O_T | q_t = S_i, λ)
Initialization: β_T(i) = 1, 1 ≤ i ≤ N
Induction: β_t(i) = Σ_{j=1}^N a_ij b_j(O_{t+1}) β_{t+1}(j), t = T−1, …, 1, 1 ≤ i ≤ N
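The backward pass mirrors the forward one, run in reverse time. A minimal sketch under the same array conventions as the casino example (names are mine; note P(O|λ) can be recovered from β_1 as a cross-check):

```python
import numpy as np

def backward(O, A, B):
    """Backward algorithm: beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                  # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                  # induction, backwards in time
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

# Illustrative casino-style parameters (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
O = [0, 1, 0]
beta = backward(O, A, B)
p = (pi * B[:, O[0]] * beta[0]).sum()   # P(O|lambda) via the backward variables
```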
Q2: Optimality Criterion 1
* Maximize the expected number of correct individual states.
Definition: γ_t(i) = P(q_t = S_i | O, λ) = α_t(i) β_t(i) / P(O|λ)
γ_t(i) is the probability of being in state S_i at time t given the observation sequence O and the model λ.
Problem 2 answer: q_t* = argmax_{1≤i≤N} γ_t(i), 1 ≤ t ≤ T
Problem: if some a_ij = 0, the optimal state sequence may not even be a valid state sequence.
Q2: Optimality Criterion 2
* Find the single best state sequence (path), i.e., maximize P(Q|O, λ).
Definition: δ_t(i) = max_{q_1 … q_{t−1}} P(q_1 q_2 … q_{t−1}, q_t = S_i, O_1 O_2 … O_t | λ)
δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 … O_t ending in state S_i.
Viterbi Algorithm
The major difference from the forward algorithm: maximization instead of summation,
δ_{t+1}(j) = [max_{1≤i≤N} δ_t(i) a_ij] b_j(O_{t+1})
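Replacing the sum with a max, plus back-pointers for the traceback, gives a compact Viterbi implementation. A sketch under the same array conventions as the casino example (names and the toy query are mine):

```python
import numpy as np

def viterbi(O, pi, A, B):
    """Viterbi: most likely state path for O, and its probability."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))           # delta[t, j]: best path prob ending in S_j
    psi = np.zeros((T, N), dtype=int)  # back-pointers for the traceback
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)       # max instead of sum
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # traceback, starting from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, delta[-1].max()

# Illustrative casino-style parameters (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
path, p_star = viterbi([0, 1, 0], pi, A, B)
```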
Viterbi Algorithm Illustration
The trellis holds δ_t(j): one row per state S_1 … S_N, one column per observation O_1 O_2 … O_T.
Column O_1, state S_j: δ_1(j) = π_j b_j(O_1)
Column O_2, state S_j: δ_2(j) = [max_i δ_1(i) a_ij] b_j(O_2)
…and so on for O_3 … O_T.
The maximum of the last column indicates where traceback starts.
δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 … O_t ending in state S_i.
Relations with DBN
Forward function: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1})
Backward function: β_t(i) = Σ_j a_ij b_j(O_{t+1}) β_{t+1}(j), with β_T(i) = 1
Viterbi algorithm: δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_j(O_{t+1})
Some more definitions
γ_t(i) is the probability of being in state S_i at time t: γ_t(i) = α_t(i) β_t(i) / P(O|λ)
ξ_t(i,j) is the probability of being in state S_i at time t and in state S_j at time t+1: ξ_t(i,j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / P(O|λ)
Baum-Welch Re-estimation
An instance of the Expectation-Maximization (EM) algorithm.
Expectation: compute γ_t(i) and ξ_t(i,j) from the current model λ using the forward and backward variables.
Baum-Welch Re-estimation (cont’d)
Maximization: re-estimate the parameters from the expected counts:
π̄_i = γ_1(i)
ā_ij = Σ_{t=1}^{T−1} ξ_t(i,j) / Σ_{t=1}^{T−1} γ_t(i)
b̄_j(k) = Σ_{t: O_t = v_k} γ_t(j) / Σ_{t=1}^T γ_t(j)
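One E-step plus M-step can be sketched as follows for a single observation sequence; the forward and backward passes are inlined to keep the block self-contained, and the toy sequence at the end is illustrative only:

```python
import numpy as np

def baum_welch_step(O, pi, A, B):
    """One EM iteration of Baum-Welch re-estimation (single sequence)."""
    T, N = len(O), len(pi)
    # E-step: forward and backward variables
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    pO = alpha[-1].sum()                          # P(O | lambda)
    gamma = alpha * beta / pO                     # gamma_t(i)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, O[1:]].T * beta[1:])[:, None, :]) / pO  # xi_t(i, j)
    # M-step: re-estimated parameters
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return pi_new, A_new, B_new

# Illustrative run on a short casino-style sequence (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
O = [0, 1, 1, 0, 0, 1, 0]
pi_new, A_new, B_new = baum_welch_step(O, pi, A, B)
```

By EM theory, iterating this step never decreases P(O|λ), which is exactly Problem 3.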
Notes on the Re-estimation
If the model does not change, it has reached a local maximum.
Depending on the model, many local maxima can exist.
Re-estimated probabilities will sum to 1.
Implementation issues
Scaling
Multiple observation sequences
Initial parameter estimation
Missing data
Choice of model size and type
Scaling
The forward variables α_t(i) head exponentially to zero as t grows and soon exceed machine precision, so they are rescaled at every step.
α̂ calculation: α̂_t(i) = α_t(i) / Σ_{j=1}^N α_t(j), i.e., each column of the trellis is normalized with scale factor c_t = 1 / Σ_{j=1}^N α_t(j).
Recursion to calculate α̂: apply the usual forward induction to the already-scaled α̂_{t−1}, then normalize the result by c_t.
Scaling (cont’d)
β̂ calculation: the backward variables are scaled with the same factors, β̂_t(i) = c_t β_t(i).
Desired condition: Σ_{i=1}^N α̂_t(i) = 1 for every t.
* Note that Σ_{i=1}^N β̂_t(i) = 1 is not true!
Scaling (cont’d)
With scaling, the likelihood is recovered from the scale factors alone: P(O|λ) = 1 / Π_{t=1}^T c_t, so log P(O|λ) = −Σ_{t=1}^T log c_t.
Maximum log-likelihood (Viterbi with logarithms, which also avoids underflow):
Initialization: φ_1(i) = log π_i + log b_i(O_1)
Recursion: φ_{t+1}(j) = max_{1≤i≤N} [φ_t(i) + log a_ij] + log b_j(O_{t+1})
Termination: log P* = max_{1≤i≤N} φ_T(i)
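The scaled forward pass and the identity log P(O|λ) = −Σ_t log c_t can be sketched together as (names are mine; casino parameters reused for illustration):

```python
import numpy as np

def forward_scaled(O, pi, A, B):
    """Scaled forward pass: returns log P(O | lambda) without underflow."""
    T, N = len(O), len(pi)
    alpha_hat = np.zeros((T, N))
    c = np.zeros(T)                       # scale factors c_t
    a = pi * B[:, O[0]]
    c[0] = 1.0 / a.sum()
    alpha_hat[0] = a * c[0]               # scaled column sums to 1 by construction
    for t in range(T - 1):
        a = (alpha_hat[t] @ A) * B[:, O[t + 1]]
        c[t + 1] = 1.0 / a.sum()
        alpha_hat[t + 1] = a * c[t + 1]
    return -np.log(c).sum()               # log P(O|lambda) = -sum_t log c_t

# Illustrative casino-style parameters (0 = H, 1 = T):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])
O = [0, 1, 0]
logp = forward_scaled(O, pi, A, B)
```

For a short sequence this agrees with the unscaled forward algorithm; for long sequences only the scaled version stays within floating-point range.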
Multiple observation sequences
Problem with re-estimation: the formulas above assume a single training sequence; with multiple sequences, the γ and ξ statistics must be accumulated over all sequences before normalizing.
Initial estimates of parameters
For π and A, random or uniform initialization is sufficient.
For B (discrete symbol probabilities), a good initial estimate is needed.
Insufficient training data
Solutions:
Increase the size of training data
Reduce the size of the model
Interpolate parameters using another model
References
L. Rabiner. ‘A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.’ Proceedings of the IEEE, 1989.
S. Russell, P. Norvig. ‘Probabilistic Reasoning over Time.’ AI: A Modern Approach, Ch. 15, 2002 (draft).
V. Borkar, K. Deshmukh, S. Sarawagi. ‘Automatic Segmentation of Text into Structured Records.’ ACM SIGMOD, 2001.
T. Scheffer, C. Decomain, S. Wrobel. ‘Active Hidden Markov Models for Information Extraction.’ Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
S. Ray, M. Craven. ‘Representing Sentence Structure in Hidden Markov Models for Information Extraction.’ Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.