
Hidden Markov Models (HMM)

Rabiner’s Paper

Markoviana Reading Group, Computer Eng. & Science Dept., Arizona State University


Markoviana Reading Group, Fatih Gelgi – Feb 2005

Stationary and Non-stationary

Stationary process: its statistical properties do not vary with time.

Non-stationary process: its statistical properties vary over time.


HMM Example - Casino Coin

[State diagram: two states, Fair and Unfair. State transition probabilities: P(Fair→Fair) = 0.9, P(Fair→Unfair) = 0.1, P(Unfair→Unfair) = 0.8, P(Unfair→Fair) = 0.2. Symbol emission probabilities: the Fair coin emits H and T with probability 0.5 each; the Unfair coin is biased (0.7 / 0.3).]

Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence:       FFFFFFUUUFFFFFFUUUUUUUFFFFFF

Motivation: given a sequence of Hs and Ts, can you tell at which times the casino cheated?

Observation symbols: H, T

States: F (Fair), U (Unfair)

Two CDF tables
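Written out as tables, the model above is just three small arrays. A minimal sketch in NumPy, where the exact bias of the unfair coin (0.7 for heads) and the uniform initial distribution are assumptions read off the figure:

```python
import numpy as np

# States: 0 = Fair, 1 = Unfair; observation symbols: 0 = H, 1 = T
A = np.array([[0.9, 0.1],     # Fair   -> Fair, Unfair
              [0.2, 0.8]])    # Unfair -> Fair, Unfair
B = np.array([[0.5, 0.5],     # Fair coin:   P(H), P(T)
              [0.7, 0.3]])    # Unfair coin: P(H), P(T)  (assumed bias)
pi = np.array([0.5, 0.5])     # assumed uniform initial state distribution

# Each row of A and B is a probability distribution over next state / symbol
print(A.sum(axis=1), B.sum(axis=1), pi.sum())
```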


Properties of an HMM

First-order Markov process: q_t depends only on q_{t-1}

Time is discrete


Elements of an HMM

An HMM is specified by λ = (A, B, π):

N, the number of states S_1, S_2, …, S_N

M, the number of observation symbols O_1, O_2, …, O_M

A = {a_ij}, the N×N state transition probability distribution

B = {b_j(k)}, the observation symbol probability distributions, one per state

π = {π_i}, the initial state distribution


HMM Basic Problems

1. Given an observation sequence O = O_1O_2O_3…O_T and a model λ, find P(O|λ)

Forward Algorithm / Backward Algorithm

2. Given O = O_1O_2O_3…O_T and λ, find the most likely state sequence Q = q_1q_2…q_T

Viterbi Algorithm

3. Given O = O_1O_2O_3…O_T, re-estimate λ so that P(O|λ) is higher than it is now

Baum-Welch Re-estimation


Forward Algorithm Illustration

α_t(i) is the probability of observing the partial sequence O_1O_2O_3…O_t and being in state S_i at time t.


Forward Algorithm Illustration (cont’d)

[Trellis: rows are the states S_1 … S_N, columns are the observations O_1 … O_T. The first column holds α_1(j) = π_j b_j(O_1); each subsequent cell holds α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1}). The total of the last column gives the solution.]

α_t(i) is the probability of observing the partial sequence O_1O_2O_3…O_t and being in state S_i at time t.


Forward Algorithm

Definition: $\alpha_t(i) = P(O_1 O_2 \cdots O_t,\ q_t = S_i \mid \lambda)$

Initialization: $\alpha_1(i) = \pi_i\, b_i(O_1), \quad 1 \le i \le N$

Induction: $\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \quad 1 \le t \le T-1,\ 1 \le j \le N$

Problem 1 Answer: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$

Complexity: O(N²T)

α_t(i) is the probability of observing the partial sequence O_1O_2…O_t and being in state S_i at time t.
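The three steps above translate directly into a few lines of NumPy. A sketch using the casino parameters from earlier (the unfair coin's emission probabilities and the uniform initial distribution are assumptions):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # initialization
    for t in range(T - 1):
        # induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] b_j(O_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha

A = np.array([[0.9, 0.1], [0.2, 0.8]])              # Fair/Unfair transitions
B = np.array([[0.5, 0.5], [0.7, 0.3]])              # columns: H, T (assumed bias)
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]                                     # H, T, H
prob = forward(A, B, pi, obs)[-1].sum()             # Problem 1 answer: P(O|lambda)
print(prob)
```

Each loop iteration does O(N²) work, matching the O(N²T) total noted above.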


Backward Algorithm Illustration

β_t(i) is the probability of observing the partial sequence O_{t+1}O_{t+2}O_{t+3}…O_T given state S_i at time t.


Backward Algorithm

Definition: $\beta_t(i) = P(O_{t+1} O_{t+2} \cdots O_T \mid q_t = S_i, \lambda)$

Initialization: $\beta_T(i) = 1, \quad 1 \le i \le N$

Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad t = T-1, \dots, 1,\ 1 \le i \le N$

β_t(i) is the probability of observing the partial sequence O_{t+1}O_{t+2}…O_T given state S_i at time t.
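The backward recursion is the mirror image of the forward one. A sketch with the same (partly assumed) casino parameters, recovering P(O|λ) from the β side as a sanity check:

```python
import numpy as np

def backward(A, B, obs):
    """Backward algorithm: beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                        # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # induction: beta_t(i) = sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

A = np.array([[0.9, 0.1], [0.2, 0.8]])            # Fair/Unfair transitions
B = np.array([[0.5, 0.5], [0.7, 0.3]])            # columns: H, T (assumed bias)
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]                                   # H, T, H
beta = backward(A, B, obs)
prob = (pi * B[:, obs[0]] * beta[0]).sum()        # P(O | lambda) via beta
print(prob)
```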


Q2: Optimality Criterion 1
* Maximize the expected number of correct individual states.

Definition: $\gamma_t(i) = P(q_t = S_i \mid O, \lambda) = \dfrac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)}$

Problem 2 Answer: $q_t = \arg\max_{1 \le i \le N} \gamma_t(i), \quad 1 \le t \le T$

γ_t(i) is the probability of being in state S_i at time t given the observation sequence O and the model λ.

Problem: if some a_ij = 0, the optimal state sequence may not even be a valid state sequence.


Q2: Optimality Criterion 2

* Find the single best state sequence (path), i.e. maximize P(Q|O,λ).

Definition: $\delta_t(i) = \max_{q_1 \cdots q_{t-1}} P(q_1 q_2 \cdots q_{t-1},\ q_t = S_i,\ O_1 O_2 \cdots O_t \mid \lambda)$

δ_t(i) is the highest probability along a single state path that accounts for the partial observation sequence O_1O_2…O_t and ends in state S_i.


Viterbi Algorithm

The major difference from the forward algorithm:

Maximization instead of sum
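A sketch of the Viterbi recursion with traceback, again on the casino model (the unfair coin's emission probabilities are an assumption). Note the max/argmax exactly where the forward algorithm sums:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Return the single best state sequence for obs (Viterbi algorithm)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))                 # delta_t(j): best path score
    psi = np.zeros((T, N), dtype=int)        # psi_t(j): argmax predecessor
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)       # maximization instead of sum
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # traceback, starting from the max of the last column
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])       # columns: H, T (assumed bias)
pi = np.array([0.5, 0.5])
print(viterbi(A, B, pi, [0, 0, 0]))          # a run of heads favors Unfair
print(viterbi(A, B, pi, [1, 1, 1]))          # a run of tails favors Fair
```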


Viterbi Algorithm Illustration

[Trellis: rows are the states S_1 … S_N, columns are the observations O_1 … O_T. The first column holds δ_1(j) = π_j b_j(O_1); each subsequent cell holds δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_j(O_{t+1}). The max of the last column indicates the traceback start.]

δ_t(i) is the highest probability along a single state path that accounts for the partial observation sequence O_1O_2…O_t and ends in state S_i.


Relations with DBN

Forward function: $\alpha_{t+1}(j) = \left[\sum_i \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1})$

Backward function: $\beta_t(i) = \sum_j a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$, with $\beta_T(i) = 1$

Viterbi algorithm: $\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(O_{t+1})$


Some more definitions

γ_t(i) is the probability of being in state S_i at time t:
$\gamma_t(i) = \dfrac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)}$

ξ_t(i,j) is the probability of being in state S_i at time t, and S_j at time t+1:
$\xi_t(i,j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$
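Both quantities come straight from the forward and backward variables. A sketch (the helpers repeat the earlier recursions; the casino parameters are partly assumed):

```python
import numpy as np

def forward(A, B, pi, obs):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(len(obs) - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha

def backward(A, B, obs):
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def gamma_xi(A, B, pi, obs):
    """gamma[t,i] = P(q_t=S_i | O); xi[t,i,j] = P(q_t=S_i, q_{t+1}=S_j | O)."""
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p = alpha[-1].sum()                                   # P(O | lambda)
    gamma = alpha * beta / p
    T, N = len(obs), len(pi)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi_t(i,j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O|lambda)
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1]) / p
    return gamma, xi

A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])    # unfair-coin bias assumed
pi = np.array([0.5, 0.5])
gamma, xi = gamma_xi(A, B, pi, [0, 1, 0, 0])
```

Summing ξ_t(i,j) over j recovers γ_t(i), which makes a handy consistency check.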


Baum-Welch Re-estimation

Baum-Welch is an instance of the Expectation-Maximization (EM) algorithm.

Expectation:
$\sum_{t=1}^{T-1} \gamma_t(i)$ = expected number of transitions from S_i
$\sum_{t=1}^{T-1} \xi_t(i,j)$ = expected number of transitions from S_i to S_j


Baum-Welch Re-estimation (cont’d)

Maximization (re-estimation formulas):

$\bar{\pi}_i = \gamma_1(i)$

$\bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$

$\bar{b}_j(k) = \dfrac{\sum_{t:\,O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
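One full EM iteration can be sketched as follows (forward/backward as before; the casino parameters are partly assumed). By the EM guarantee, the re-estimated model never has a lower P(O|λ) than the current one:

```python
import numpy as np

def forward(A, B, pi, obs):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(len(obs) - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha

def backward(A, B, obs):
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(A, B, pi, obs):
    """One EM iteration: E-step builds gamma/xi, M-step re-estimates (A, B, pi)."""
    T, N, M = len(obs), A.shape[0], B.shape[1]
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p = alpha[-1].sum()
    gamma = alpha * beta / p
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1]) / p
    # M-step: ratios of expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    obs_arr = np.asarray(obs)
    for k in range(M):
        new_B[:, k] = gamma[obs_arr == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi

A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])    # unfair-coin bias assumed
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
A2, B2, pi2 = baum_welch_step(A, B, pi, obs)
```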


Notes on the Re-estimation

If the model does not change, it has reached a local maximum. Depending on the model, many local maxima can exist. The re-estimated probabilities will sum to 1.


Implementation issues

Scaling
Multiple observation sequences
Initial parameter estimation
Missing data
Choice of model size and type


Scaling

α calculation: α_t(i) is a sum of products of t probability terms, so it heads exponentially toward zero as t grows and eventually underflows machine precision. Each α_t(i) is therefore rescaled by the coefficient $c_t = 1 / \sum_{i=1}^{N} \alpha_t(i)$.

Recursion to calculate: $\hat{\alpha}_{t+1}(j) = c_{t+1} \left[\sum_{i=1}^{N} \hat{\alpha}_t(i)\, a_{ij}\right] b_j(O_{t+1})$


Scaling (cont’d)

β calculation: the same scale factors c_t are used to scale the β_t(i).

Desired condition: $\sum_{i=1}^{N} \hat{\alpha}_t(i) = 1$ for every t, which gives $\log P(O \mid \lambda) = -\sum_{t=1}^{T} \log c_t$.

* Note that summing the scaled $\hat{\alpha}_T(i)$ does not give P(O|λ)!


Scaling (cont’d)
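A sketch of the scaled forward pass: normalize α at every step and accumulate the logs of the scale factors, so only log P(O|λ) is ever formed (the casino parameters are partly assumed):

```python
import numpy as np

def scaled_forward(A, B, pi, obs):
    """Return log P(O | lambda) using per-step normalization of alpha."""
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()            # 1/c_t: the mass before normalization
        alpha = alpha / scale          # scaled alpha sums to 1
        log_p += np.log(scale)
    return log_p

A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.5, 0.5], [0.7, 0.3]])    # unfair-coin bias assumed
pi = np.array([0.5, 0.5])
log_p = scaled_forward(A, B, pi, [0] * 2000)   # 2000 observations
print(log_p)
```

For a sequence this long the unscaled α would underflow to 0, while the scaled version still returns a finite log-likelihood.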


Maximum log-likelihood

Working with logarithms avoids underflow in the Viterbi computation:

Initialization: $\phi_1(i) = \log \pi_i + \log b_i(O_1)$

Recursion: $\phi_{t+1}(j) = \max_i \left[\phi_t(i) + \log a_{ij}\right] + \log b_j(O_{t+1})$

Termination: $\log P^* = \max_i \phi_T(i)$


Multiple observation sequences

Problem with re-estimation: the numerators and denominators of the re-estimation formulas must be accumulated over all observation sequences; a single sequence is not enough.


Initial estimates of parameters

For π and A, random or uniform initial estimates are sufficient.

For B (discrete symbol probabilities), a good initial estimate is needed.


Insufficient training data

Solutions:

Increase the size of training data

Reduce the size of the model

Interpolate parameters using another model


References

L. Rabiner. 'A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.' Proceedings of the IEEE, 1989.

S. Russell, P. Norvig. 'Probabilistic Reasoning over Time.' Artificial Intelligence: A Modern Approach, Ch. 15, 2002 (draft).

V. Borkar, K. Deshmukh, S. Sarawagi. 'Automatic Segmentation of Text into Structured Records.' ACM SIGMOD, 2001.

T. Scheffer, C. Decomain, S. Wrobel. 'Active Hidden Markov Models for Information Extraction.' Proceedings of the International Symposium on Intelligent Data Analysis, 2001.

S. Ray, M. Craven. 'Representing Sentence Structure in Hidden Markov Models for Information Extraction.' Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.