
Page 1

Hidden Markov Models

with slides from Lise Getoor, Sebastian Thrun, William Cohen, and Yair Weiss

Page 2

Outline

Markov Models
Hidden Markov Models
The Main Problems in the HMM Context
Implementation Issues
Applications of HMMs

Page 3

Weather: A Markov Model

[Figure: a three-state weather Markov chain with states Sunny, Rainy, and Snowy; the arrows carry the transition percentages (80%, 15%, 5% out of Sunny; 38%, 60%, 2% out of Rainy; 75%, 5%, 20% out of Snowy)]

Page 4

Ingredients of a Markov Model

States: S = {S_1, S_2, ..., S_N}

State transition probabilities: a_ij = P(q_{t+1} = S_j | q_t = S_i)

Initial state distribution: pi_i = P[q_1 = S_i]

[Figure: the weather chain (Sunny, Rainy, Snowy) with its transition percentages, as on the previous slide]

Page 5

Ingredients of Our Markov Model

States: S = {S_sunny, S_rainy, S_snowy}

State transition probabilities:

A = [ 0.80  0.15  0.05
      0.38  0.60  0.02
      0.75  0.05  0.20 ]

Initial state distribution:

pi = (0.7  0.25  0.05)

[Figure: the weather chain (Sunny, Rainy, Snowy) with the same transition percentages]

Page 6

Probability of a Seq. of States

Given:

What is the probability of the state sequence sunny, rainy, rainy, rainy, snowy, snowy?

pi = (0.7  0.25  0.05)

A = [ 0.80  0.15  0.05
      0.38  0.60  0.02
      0.75  0.05  0.20 ]

P(S_sunny) P(S_rainy|S_sunny) P(S_rainy|S_rainy) P(S_rainy|S_rainy) P(S_snowy|S_rainy) P(S_snowy|S_snowy)
  = 0.7 × 0.15 × 0.6 × 0.6 × 0.02 × 0.2 = 0.0001512
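To make the chain rule concrete, here is a minimal Python sketch (not part of the original slides) that evaluates the probability of a state sequence under the weather chain above; the dictionary encoding of pi and A is just one convenient choice.

```python
# Sketch: probability of a state sequence in the weather Markov chain,
# using the pi and A given on this slide.
pi = {"sunny": 0.7, "rainy": 0.25, "snowy": 0.05}
A = {
    "sunny": {"sunny": 0.80, "rainy": 0.15, "snowy": 0.05},
    "rainy": {"sunny": 0.38, "rainy": 0.60, "snowy": 0.02},
    "snowy": {"sunny": 0.75, "rainy": 0.05, "snowy": 0.20},
}

def sequence_probability(states):
    """P(q_1, ..., q_T) = pi[q_1] * product over t of A[q_{t-1}][q_t]."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# The sequence from the slide: sunny, rainy, rainy, rainy, snowy, snowy
print(sequence_probability(["sunny", "rainy", "rainy", "rainy", "snowy", "snowy"]))
# ~ 0.0001512
```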

Page 7

Outline

Markov Models
Hidden Markov Models
The Main Problems in the HMM Context
Implementation Issues
Applications of HMMs

Page 8

Hidden Markov Models

[Figure: the weather chain (Sunny, Rainy, Snowy) with its transition percentages, drawn twice: once as before, and once with each state also emitting an observation according to its own emission percentages (60%, 10%, 30%; 65%, 5%, 30%; 50%, 0%, 50%). The weather states themselves are NOT OBSERVABLE; only the emitted observations are seen.]

Page 9

Ingredients of an HMM

[Figure: the hidden weather chain with emission percentages, as on the previous slide]

States: S = {S_1, S_2, ..., S_N}

State transition probabilities: a_ij = P(q_{t+1} = S_j | q_t = S_i)   (probability of moving from state i to state j)

Initial state distribution: pi_i = P[q_1 = S_i]

Observations: {O_1, O_2, ..., O_M}

Observation probabilities: b_j(k) = P(O_t = v_k | q_t = S_j)   (probability of emitting output k in state j)

Page 10

Ingredients of Our HMM

[Figure: the hidden weather chain with emission percentages, as above]

States: S = {S_sunny, S_rainy, S_snowy}

Observations: {O_shorts, O_coat, O_umbrella}

State transition probabilities:

A = [ 0.80  0.15  0.05
      0.38  0.60  0.02
      0.75  0.05  0.20 ]

Observation probabilities:

B = [ 0.60  0.30  0.10
      0.05  0.30  0.65
      0.00  0.50  0.50 ]

Initial state distribution:

pi = (0.7  0.25  0.05)

Page 11

Three Basic Problems

Evaluation (aka likelihood): compute P(O | HMM)

Decoding (aka inference): given an observed output sequence O,
  compute the most likely state at each time period
  compute the most likely state sequence  q* = argmax_q P(q | O, HMM)

Training (aka learning): find HMM* = argmax_HMM P(O | HMM)

Page 12

Probability of an Output Sequence

Given:

What is the probability of this output sequence?

pi = (0.7  0.25  0.05)

A = [ 0.80  0.15  0.05
      0.38  0.60  0.02
      0.75  0.05  0.20 ]

B = [ 0.60  0.30  0.10
      0.05  0.30  0.65
      0.00  0.50  0.50 ]

P(O) = P(O_coat, O_coat, ..., O_umbrella, O_umbrella)

P(O) = sum over all q_1, ..., q_7 of P(O | q_1, ..., q_7) P(q_1, ..., q_7)
     = 0.7 × 0.8 × 0.3 × 0.1 × 0.6 × ... + ...   (an exponential number of terms)
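As a sanity check on the "exponential number of terms" remark, here is a brute-force sketch (not from the slides) that literally sums P(O, Q) over all N^T state sequences; the 0 = shorts, 1 = coat, 2 = umbrella encoding is an assumption of the example.

```python
# Illustrative sketch: brute-force evaluation of P(O) by enumerating every
# state sequence. pi, A, B follow the matrices on this slide.
from itertools import product

pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]

def brute_force_likelihood(obs, n_states=3):
    """Sum P(O, Q) over all n_states**T state sequences Q -- exponential in T."""
    total = 0.0
    for q in product(range(n_states), repeat=len(obs)):
        p = pi[q[0]] * B[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total

print(brute_force_likelihood([1, 1, 2, 2]))  # coat, coat, umbrella, umbrella
```

The forward algorithm on the next slides computes the same quantity without the exponential blow-up.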

Page 13

The Forward Algorithm

[Figure: the HMM unrolled into a trellis: states S1, S2, S3 repeated at each time step, with observations O1, O2, O3 emitted below]

alpha_t(i) = P(O_1, ..., O_t, q_t = S_i)

P(O) = sum_{i=1..N} P(O, q_T = S_i) = sum_{i=1..N} alpha_T(i)

Page 14

The Forward Algorithm (cont.)

[Figure: the same trellis of states S1, S2, S3 over time]

alpha_t(i) = P(O_1, ..., O_t, q_t = S_i)

alpha_1(i) = pi_i b_i(O_1)

alpha_{t+1}(j) = P(O_1, ..., O_{t+1}, q_{t+1} = S_j)
              = sum_{i=1..N} P(O_1, ..., O_{t+1}, q_{t+1} = S_j | O_1, ..., O_t, q_t = S_i) P(O_1, ..., O_t, q_t = S_i)
              = sum_{i=1..N} P(O_{t+1}, q_{t+1} = S_j | q_t = S_i) alpha_t(i)
              = b_j(O_{t+1}) sum_{i=1..N} a_ij alpha_t(i)

(first get to state i, then move to state j, then emit output O_{t+1})
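A compact Python sketch of this recursion, using the weather model's pi, A, B as NumPy arrays and the same 0 = shorts, 1 = coat, 2 = umbrella encoding assumed earlier (the encoding is mine, not the slides').

```python
import numpy as np

pi = np.array([0.7, 0.25, 0.05])
A = np.array([[0.80, 0.15, 0.05],
              [0.38, 0.60, 0.02],
              [0.75, 0.05, 0.20]])
B = np.array([[0.60, 0.30, 0.10],
              [0.05, 0.30, 0.65],
              [0.00, 0.50, 0.50]])

def forward(pi, A, B, obs):
    """Return (alpha, P(O)) with alpha[t, i] = P(O_1..O_t, q_t = S_i)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)  # b_j(O_{t+1}) * sum_i alpha_t(i) a_ij
    return alpha, alpha[-1].sum()                     # P(O) = sum_i alpha_T(i)

alpha, likelihood = forward(pi, A, B, [1, 1, 2, 2])   # coat, coat, umbrella, umbrella
print(likelihood)
```

On short sequences this agrees with the brute-force sum from the earlier sketch, but the cost is O(N^2 T) rather than exponential in T.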

Page 15

Exercise

What is the probability of observing AB?

a. Initial state s1:

b. Initial state chosen at random:

[Figure: a two-state model with states s1 and s2; transition probabilities 0.6, 0.4 and 0.3, 0.7; one state emits A with 0.7 and B with 0.3, the other emits A with 0.8 and B with 0.2]

a. 0.2 × (0.4 × 0.8 + 0.6 × 0.7) = 0.148

b. 0.5 × 0.148 + 0.5 × 0.3 × (0.3 × 0.7 + 0.7 × 0.8) = 0.1895

Page 16

The Backward Algorithm

[Figure: the same trellis, now filled in from right to left]

beta_t(i) = P(O_{t+1}, O_{t+2}, ..., O_T | q_t = S_i)

beta_T(i) = 1

beta_t(i) = sum_{j=1..N} a_ij b_j(O_{t+1}) beta_{t+1}(j)

P(O) = sum over i: P(q_1 = S_i) × P(emit O_1 in state i) × beta_1(i) = sum_{i=1..N} pi_i b_i(O_1) beta_1(i)
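The same idea in code: a sketch of the backward recursion written to mirror the forward() function above (pi, A, B are NumPy arrays, obs a list of observation indices; these conventions are mine).

```python
import numpy as np

def backward(pi, A, B, obs):
    """Return (beta, P(O)) with beta[t, i] = P(O_{t+1} ... O_T | q_t = S_i)."""
    T, N = len(obs), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                         # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])     # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    likelihood = float(np.sum(pi * B[:, obs[0]] * beta[0]))  # sum_i pi_i b_i(O_1) beta_1(i)
    return beta, likelihood
```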

Page 17

The Forward-Backward Algorithm

[Figure: the trellis again; alpha_t(i) summarizes everything up to time t, beta_t(i) everything after time t]

P(O) = sum over i: alpha_t(i) × beta_t(i)   for any t

=> you can derive the formulas for the forward algorithm and the backward algorithm from this.

Page 18

Finding the Best State Sequence

We would like to find the most likely path (and not just the most likely state at each time slice).

The Viterbi algorithm is an efficient method for finding the MPE (most probable explanation):

Q* = argmax_Q P(Q | O) = argmax_Q P(Q, O)

delta_1(i) = pi_i b_i(O_1)

delta_{t+1}(j) = b_j(O_{t+1}) max_i delta_t(i) a_ij        psi_{t+1}(j) = argmax_i delta_t(i) a_ij

P(Q*) = max_i delta_T(i)        q*_T = argmax_i delta_T(i)

and psi lets us reconstruct the path: q*_t = psi_{t+1}(q*_{t+1}).
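A sketch of the Viterbi recursion above in the same style (pi, A, B as NumPy arrays); the psi table stores the argmax bookkeeping used for the backtrack.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Return (best_path, P(Q*, O)) for the most likely state sequence Q*."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))              # delta_t(j): best score of a path ending in S_j at time t
    psi = np.zeros((T, N), dtype=int)     # psi_t(j): predecessor state achieving that score
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]              # q*_T = argmax_i delta_T(i)
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))        # follow psi backwards through time
    return path[::-1], float(delta[-1].max())
```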

Page 19

Hidden Markov Models

[Figure: the hidden weather chain again (Sunny, Rainy, Snowy with transition percentages, plus emission percentages per state); the weather states are NOT OBSERVABLE]

Page 20

Learning the Model with EM

Problem: find the HMM lambda that makes the data most likely

E-Step: compute P(q_t = S_i | O, lambda) for the given lambda

M-Step: compute the new lambda under these expectations (this is now a Markov model)

Page 21

E-Step

Calculate

gamma_t(i) = P(q_t = S_i | O, lambda)

xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, lambda)

using the forward-backward algorithm, for a fixed model lambda.
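In code, these E-step quantities follow directly from the alpha and beta tables of the earlier forward/backward sketches; this is an illustrative sketch, with gamma_t(i) and xi_t(i, j) as defined on this slide.

```python
import numpy as np

def e_step(alpha, beta, A, B, obs):
    """gamma[t, i] = P(q_t = S_i | O); xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O)."""
    likelihood = alpha[-1].sum()                       # P(O | lambda)
    gamma = alpha * beta / likelihood                  # gamma_t(i) = alpha_t(i) beta_t(i) / P(O)
    T, N = len(obs), A.shape[0]
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi_t(i, j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O)
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :] / likelihood
    return gamma, xi
```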

Page 22

The M-Step: generate lambda = (pi, a, b)

pi_i = expected number of times in state S_i at time 1 = gamma_1(i)

a_ij = expected number of transitions from state S_i to state S_j
       / expected number of transitions from state S_i
     = sum_{t=1..T-1} xi_t(i, j) / sum_{t=1..T-1} gamma_t(i)

b_i(k) = expected number of times in state S_i observing v_k
         / expected number of times in state S_i
       = sum_{t: O_t = v_k} gamma_t(i) / sum_{t=1..T} gamma_t(i)
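A sketch of these re-estimation formulas, reusing gamma and xi from the e_step() sketch above; n_symbols and the integer observation encoding are assumptions of the example, not of the slides.

```python
import numpy as np

def m_step(gamma, xi, obs, n_symbols):
    """Re-estimate (pi, A, B) from expected counts, for one observation sequence."""
    obs = np.asarray(obs)
    pi_new = gamma[0]                                             # expected time in S_i at t = 1
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]      # expected i->j transitions / expected exits from i
    B_new = np.stack([gamma[obs == k].sum(axis=0) for k in range(n_symbols)], axis=1)
    B_new = B_new / gamma.sum(axis=0)[:, None]                    # times in S_i seeing v_k / times in S_i
    return pi_new, A_new, B_new
```

Iterating e_step and m_step, recomputing alpha and beta with the new parameters each round, is the Baum-Welch loop summarized on the following slides.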

Page 23

Understanding the EM Algorithm

The best way to understand the EM algorithm:

start with the M step, understand what quantities it needs

then look at the E step, see how it computes those quantities with the help of the forward-backward algorithm

Page 24

Summary (Learning)

Given an observation sequence O
Guess an initial model lambda
Iterate:
  Calculate the expected time spent in state S_i at time t (and in S_j at time t+1) using the forward-backward algorithm
  Find the new model lambda by frequency counts

Page 25

Implementing HMM Algorithms

Quantities get very small for long sequences. Taking logarithms helps

  in the Viterbi algorithm
  in computing the alphas and betas

but is not helpful in computing the gammas.

A normalization method can help with these problems; see the note by ChengXiang Zhai.
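One common way to realize the normalization idea (a sketch under my own conventions, not necessarily the exact scheme in the Zhai note): rescale alpha_t to a distribution at every step and accumulate the logs of the normalizers, which yields log P(O) without underflow.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward pass: returns log P(O | lambda), safe for long sequences."""
    alpha = pi * B[:, obs[0]]
    log_likelihood = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = B[:, obs[t]] * (alpha @ A)   # unscaled update from the normalized alpha
        c_t = alpha.sum()                        # scaling factor at time t
        log_likelihood += np.log(c_t)            # log P(O) = sum_t log c_t
        alpha = alpha / c_t                      # keep alpha_t as a distribution
    return log_likelihood
```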

Page 26

Problems with HMMs

Zero probabilities
  Training sequence: AAABBBAAA
  Test sequence: AAABBBCAAA

Finding the "right" number of states, the right structure

Numerical instabilities

Page 27

Outline

Markov Models
Hidden Markov Models
The Main Problems in the HMM Context
Implementation Issues
Applications of HMMs

Page 28

Three Problems

What bird is this?  ->  Time series classification

How will the song continue?  ->  Time series prediction

Is this bird abnormal?  ->  Outlier detection

Page 29

Time Series Classification

Train one HMM lambda_l for each bird l.
Given a time series O, calculate

P(bird l | O) = P(O | lambda_l) P(l) / sum_{l'} P(O | lambda_{l'}) P(l')
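As a sketch, the classification rule is just Bayes' rule over per-bird likelihoods, reusing the forward() function from the earlier slide; `models` (mapping each label to its (pi, A, B)) and `priors` (holding P(l)) are names I have made up for the example.

```python
def classify(obs, models, priors):
    """Posterior P(bird l | O) proportional to P(O | lambda_l) P(l)."""
    scores = {label: forward(*models[label], obs)[1] * priors[label] for label in models}
    z = sum(scores.values())                 # denominator: sum_{l'} P(O | lambda_{l'}) P(l')
    return {label: s / z for label, s in scores.items()}
```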

Page 30

Outlier Detection

Train an HMM lambda.
Given a time series O, calculate the probability

P(O | lambda)

If it is abnormally low, raise a flag.

Page 31

Time Series Prediction

Train an HMM lambda.
Given a time series O, calculate the distribution over the final state

P(q_T = S_i | O, lambda)   (via alpha_T)

and "hallucinate" new states and observations according to a and b.
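A sketch of the "hallucination" step: draw the current state from the filtered distribution P(q_T = S_i | O) (for example, alpha_T normalized), then roll the chain forward, sampling transitions from a and emissions from b. The function name and arguments are my own.

```python
import numpy as np

def hallucinate(final_state_dist, A, B, n_steps, seed=0):
    """Sample a continuation of the series from an HMM (A, B as NumPy arrays)."""
    rng = np.random.default_rng(seed)
    state = rng.choice(len(final_state_dist), p=final_state_dist)      # current state q_T
    future_obs = []
    for _ in range(n_steps):
        state = rng.choice(A.shape[1], p=A[state])                     # next state ~ a_{state, .}
        future_obs.append(int(rng.choice(B.shape[1], p=B[state])))     # observation ~ b_state(.)
    return future_obs
```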

Page 32

Typical HMM in Speech Recognition

20-dimensional frequency space, clustered using EM

Use Bayes rule + Viterbi for classification

Linear HMM representing one phoneme

[Rabiner 86] + everyone else

Page 33

Typical HMM in Robotics

[Blake/Isard 98, Fox/Dellaert et al 99]

Page 34

IE with Hidden Markov Models

Yesterday Pedro Domingos spoke this example sentence.

Yesterday Pedro Domingos spoke this example sentence.

Person name: Pedro Domingos

Given a sequence of observations:

and a trained HMM:

Find the most likely state sequence (Viterbi):

s* = argmax_s P(s, o)

Any words said to be generated by the designated "person name" state are extracted as a person name.

[Figure: a small HMM with states "person name", "location name", and "background"]

Page 35

HMM for Segmentation

Simplest Model: One state per entity type

Page 36

What is a “symbol” ???

Cohen => “Cohen”, “cohen”, “Xxxxx”, “Xx”, … ?

4601 => “4601”, “9999”, “9+”, “number”, … ?

[Figure: a taxonomy of symbol abstraction levels, from "All" at the top down through Numbers (3-digits: 000..999, 5-digits: 00000..99999, others: 0..99, 0000..9999, 000000..), Chars (A..z), Multi-letter Words (aa..), and Delimiters (. , / - + ? #)]

Datamold: choose best abstraction level using holdout set

Page 37

HMM Example: “Nymble”

Other examples of shrinkage for HMMs in IE: [Freitag and McCallum '99]

Task: Named Entity Extraction

[Figure: the Nymble state diagram: Person, Org, Other (plus five other name classes), linked between start-of-sentence and end-of-sentence states]

Transition probabilities: P(s_t | s_{t-1}, o_{t-1}), backing off to P(s_t | s_{t-1}), then to P(s_t)

Observation probabilities: P(o_t | s_t, s_{t-1}) or P(o_t | s_t, o_{t-1}), backing off to P(o_t | s_t), then to P(o_t)

Train on ~500k words of news wire text.

Results:

Case    Language   F1
Mixed   English    93%
Upper   English    91%
Mixed   Spanish    90%

[Bikel, et al 1998], [BBN "IdentiFinder"]

Page 38

Passage Selection (e.g., for IR)

[Figure: given a Query and Collection Information, a Document is segmented into Relevant passages and Background passages]

How is a relevant passage different from a background passage in terms of language modeling?

Page 39

HMMs: Main Lessons

HMMs: Generative probabilistic models of time series (with hidden state)

Forward-Backward: Algorithm for computing probabilities over hidden states

Learning models: EM, iterates estimation of hidden state and model fitting

Extremely practical, best known methods in speech, computer vision, robotics, …

Numerous extensions exist (continuous observations and states; factorial HMMs; controllable HMMs = POMDPs; ...)