cmsc726: HMMs
material from: slides by Sebastian Thrun and Yair Weiss
Outline
Time Series, Markov Models, Hidden Markov Models, Learning HMMs with EM, Applying HMMs, Summary
Audio Spectrum
Audio Spectrum of the Song of the Prothonotary Warbler
Bird Sounds
Chestnut-sided Warbler Prothonotary Warbler
Questions One Could Ask
What bird is this? (time series classification)
How will the song continue? (time series prediction)
Is this bird sick? (outlier detection)
What phases does this song have? (time series segmentation)
Other Sound Samples
Another Time Series Problem
Intel
Cisco General Electric
Microsoft
Questions One Could Ask
Will the stock go up or down? (time series prediction)
What type of stock is this (e.g., risky)? (time series classification)
Is the behavior abnormal? (outlier detection)
Music Analysis
Questions One Could Ask
Is this Beethoven or Bach? (time series classification)
Can we compose more of that? (time series prediction/generation)
Can we segment the piece into themes? (time series segmentation)
CiteSeer.Com (Citation Index)
Dave Rumelhart, Takeo Kanade, Tom Mitchell, Raj Reddy, Jim Morris
Questions One Could Ask
Shall UMD give tenure? (time series classification)
Shall UMD hire? (time series prediction)
Shall UMD fire? (outlier detection)
Disclaimer: This is a joke!
The Real Question
How do we model these problems?
How do we formulate these questions as inference/learning problems?
Outline For Today
Time Series, Markov Models, Hidden Markov Models, Learning HMMs with EM, Applying HMMs, Summary
Weather: A Markov Model
[State diagram: Sunny, Rainy, Snowy with transitions Sunny→Sunny 80%, Sunny→Rainy 15%, Sunny→Snowy 5%; Rainy→Sunny 38%, Rainy→Rainy 60%, Rainy→Snowy 2%; Snowy→Sunny 75%, Snowy→Rainy 5%, Snowy→Snowy 20%]
Ingredients of a Markov Model
States: $S = \{S_1, S_2, \ldots, S_N\}$
State transition probabilities: $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$
Initial state distribution: $\pi_i = P[q_1 = S_i]$
Ingredients of Our Markov Model
States: $S = \{S_{sunny}, S_{rainy}, S_{snowy}\}$
State transition probabilities:
$A = \begin{pmatrix} .8 & .15 & .05 \\ .38 & .6 & .02 \\ .75 & .05 & .2 \end{pmatrix}$
Initial state distribution: $\pi = (.7 \;\; .25 \;\; .05)$
Probability of a Time Series
Given: $\pi = (.7 \;\; .25 \;\; .05)$ and $A = \begin{pmatrix} .8 & .15 & .05 \\ .38 & .6 & .02 \\ .75 & .05 & .2 \end{pmatrix}$
What is the probability of this series?
$P(S_{sunny})\,P(S_{rainy} \mid S_{sunny})\,P(S_{rainy} \mid S_{rainy})\,P(S_{rainy} \mid S_{rainy})\,P(S_{snowy} \mid S_{rainy})\,P(S_{snowy} \mid S_{snowy})$
$= 0.7 \times 0.15 \times 0.6 \times 0.6 \times 0.02 \times 0.2 = 0.0001512$
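The chain probability above can be sketched in a few lines of Python; the dict-based representation and the function name are mine, not from the slides.

```python
# Weather Markov chain from the slides, as plain Python dicts.
pi = {"sunny": 0.7, "rainy": 0.25, "snowy": 0.05}
A = {
    "sunny": {"sunny": 0.80, "rainy": 0.15, "snowy": 0.05},
    "rainy": {"sunny": 0.38, "rainy": 0.60, "snowy": 0.02},
    "snowy": {"sunny": 0.75, "rainy": 0.05, "snowy": 0.20},
}

def chain_probability(seq):
    """P(q_1, ..., q_T) = pi[q_1] * prod_t a[q_t -> q_{t+1}]."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

p = chain_probability(["sunny", "rainy", "rainy", "rainy", "snowy", "snowy"])
# p == 0.7 * 0.15 * 0.6 * 0.6 * 0.02 * 0.2 = 0.0001512
```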
Outline For Today
Time Series, Markov Models, Hidden Markov Models, Learning HMMs with EM, Applying HMMs, Summary
Hidden Markov Models
[Diagram: the weather Markov chain (Sunny/Rainy/Snowy with the transition percentages above) is NOT OBSERVABLE; each day we only see an observation (shorts, coat, or umbrella) emitted with state-dependent probabilities: Sunny: shorts 60%, coat 30%, umbrella 10%; Rainy: shorts 5%, coat 30%, umbrella 65%; Snowy: shorts 0%, coat 50%, umbrella 50%]
Ingredients of an HMM
States: $S = \{S_1, S_2, \ldots, S_N\}$
State transition probabilities: $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$
Initial state distribution: $\pi_i = P[q_1 = S_i]$
Observations: $\{v_1, v_2, \ldots, v_M\}$
Observation probabilities: $b_j(k) = P(O_t = v_k \mid q_t = S_j)$
Ingredients of Our HMM
States: $\{S_{sunny}, S_{rainy}, S_{snowy}\}$
Observations: $\{O_{shorts}, O_{coat}, O_{umbrella}\}$
State transition probabilities: $A = \begin{pmatrix} .8 & .15 & .05 \\ .38 & .6 & .02 \\ .75 & .05 & .2 \end{pmatrix}$
Initial state distribution: $\pi = (.7 \;\; .25 \;\; .05)$
Observation probabilities: $B = \begin{pmatrix} .6 & .3 & .1 \\ .05 & .3 & .65 \\ 0 & .5 & .5 \end{pmatrix}$
Probability of a Time Series
Given: $\pi = (.7 \;\; .25 \;\; .05)$, $A$ and $B$ as above
What is the probability of this series?
$P(O) = P(O_{coat}, O_{coat}, O_{umbrella}, O_{umbrella}, \ldots)$
$P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q)\,P(Q) = \sum_{q_1, \ldots, q_T} P(O \mid q_1, \ldots, q_T)\,P(q_1, \ldots, q_T)$
Each term in the sum is a product of one entry of $\pi$, $T-1$ entries of $A$, and $T$ entries of $B$.
Calculating Data Likelihood
Problem: the sum over hidden paths is exponential in the sequence length. Is there a more efficient way?
The Forward Algorithm (1)
[Trellis diagram: states $S_1, S_2, S_3$ unrolled over time, with observations $O_1, O_2, O_3, \ldots$]
$\alpha_t(i) = P(O_1, \ldots, O_t, q_t = S_i)$
$\alpha_{t+1}(j) = P(O_1, \ldots, O_{t+1}, q_{t+1} = S_j)$
$= \sum_{i=1}^{N} P(O_{t+1}, q_{t+1} = S_j \mid q_t = S_i)\, P(O_1, \ldots, O_t, q_t = S_i)$
$= b_j(O_{t+1}) \sum_{i=1}^{N} a_{ij}\, \alpha_t(i)$
Initialization: $\alpha_1(i) = \pi_i\, b_i(O_1)$
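A minimal sketch of the forward recursion for the slides' weather HMM (the state/symbol orderings and names are my choice), with the exponential brute-force sum over all paths kept as a sanity check:

```python
import math
from itertools import product

# State order: sunny, rainy, snowy. Symbol order: shorts, coat, umbrella.
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]
COAT, UMBRELLA = 1, 2

def forward(obs):
    """alpha[t][i] = P(O_1..O_{t+1}, q_{t+1} = S_i), with 0-based t."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(N))
                      for j in range(N)])
    return alpha

obs = [COAT, COAT, UMBRELLA, UMBRELLA]
p_forward = sum(forward(obs)[-1])      # P(O) = sum_i alpha_T(i)

# Brute force: one term per hidden path (exponential in T).
p_brute = sum(
    pi[q[0]] * B[q[0]][obs[0]] *
    math.prod(A[q[t-1]][q[t]] * B[q[t]][obs[t]] for t in range(1, len(obs)))
    for q in product(range(len(pi)), repeat=len(obs)))
```

The forward pass costs $O(N^2 T)$ instead of $O(N^T)$ for the brute-force sum.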
Question
Does this solve our problem of calculating $P(O)$?
Answer
And the answer is… Yes!
$P(O) = \sum_{i=1}^{N} P(O, q_T = S_i) = \sum_{i=1}^{N} \alpha_T(i)$
Exercise
What is the probability of observing AB?
[Two-state model: transitions $a(s_1{\to}s_1)=0.4$, $a(s_1{\to}s_2)=0.6$, $a(s_2{\to}s_1)=0.7$, $a(s_2{\to}s_2)=0.3$; emissions $b_{s_1}(A)=0.2$, $b_{s_1}(B)=0.8$, $b_{s_2}(A)=0.3$, $b_{s_2}(B)=0.7$]
a. Initial state $s_1$: $0.2 \times (0.4 \times 0.8 + 0.6 \times 0.7) = 0.148$
b. Initial state chosen at random: $0.5 \times 0.148 + 0.5 \times 0.3 \times (0.3 \times 0.7 + 0.7 \times 0.8) = 0.1895$
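The exercise can be checked mechanically with the same forward recursion (the variable names and the A=0, B=1 symbol encoding are mine):

```python
# Two-state model from the exercise; symbols encoded as A=0, B=1.
A2 = [[0.4, 0.6],   # s1 -> s1, s1 -> s2
      [0.7, 0.3]]   # s2 -> s1, s2 -> s2
B2 = [[0.2, 0.8],   # s1 emits A, B
      [0.3, 0.7]]   # s2 emits A, B

def prob(obs, pi):
    """P(O) by the forward recursion."""
    alpha = [pi[i] * B2[i][obs[0]] for i in range(2)]
    for o in obs[1:]:
        alpha = [B2[j][o] * sum(alpha[i] * A2[i][j] for i in range(2))
                 for j in range(2)]
    return sum(alpha)

AB = [0, 1]
p_a = prob(AB, [1.0, 0.0])   # part a: start in s1
p_b = prob(AB, [0.5, 0.5])   # part b: initial state chosen at random
```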
Next Question: What is the probability that the state at time t was Si?
Can we answer this with the forward variable alone?
[Trellis diagram as before]
$\alpha_t(i) = P(O_1, \ldots, O_t, q_t = S_i)$
No! $\alpha_t(i)$ conditions only on the observations up to time $t$; the future observations $O_{t+1}, \ldots, O_T$ also carry evidence about $q_t$.
The Backward Algorithm (2)
[Trellis diagram as before]
$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = S_i)$
$\beta_T(i) = 1$
$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$
The Forward-Backward Algorithm (3)
[Trellis diagram as before]
Combining $\alpha_t(i)$ and $\beta_t(i)$:
$\gamma_t(i) = P(q_t = S_i \mid O, \lambda) = \dfrac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$
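A sketch of the backward pass and the smoothed posterior $\gamma$ for the weather HMM (numbers as in the slides; the function names are mine):

```python
pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]
N = 3

def forward(obs):
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(N))
                      for j in range(N)])
    return alpha

def backward(obs):
    """beta[t][i] = P(O_{t+2}..O_T | q_{t+1} = S_i), with 0-based t."""
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

obs = [1, 1, 2, 2]            # coat, coat, umbrella, umbrella
alpha, beta = forward(obs), backward(obs)
evidence = sum(alpha[-1])     # P(O)
# gamma_t(i) = alpha_t(i) * beta_t(i) / P(O)
gamma = [[a * b / evidence for a, b in zip(at, bt)]
         for at, bt in zip(alpha, beta)]
```

A useful invariant: $\sum_i \alpha_t(i)\beta_t(i) = P(O)$ at every $t$, so each row of `gamma` sums to 1.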
Summary (So Far)
For a given HMM $\lambda$, we can compute $P(q_t = S_i \mid O, \lambda)$.
Finding the Best State Sequence
We would like the most likely path, not just the most likely state at each time slice.
The Viterbi algorithm is an efficient method for finding the most probable explanation (MPE):
$Q^* = \arg\max_Q P(Q \mid O) = \arg\max_Q P(Q, O)$
$\delta_1(i) = \pi_i\, b_i(O_1)$
$\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(O_{t+1}), \qquad \psi_{t+1}(j) = \arg\max_i \delta_t(i)\, a_{ij}$
$P(Q^*) = \max_i \delta_T(i), \qquad q_T^* = \arg\max_i \delta_T(i)$
and we backtrack to reconstruct the path: $q_t^* = \psi_{t+1}(q_{t+1}^*)$
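A sketch of Viterbi for the same weather HMM, checked against a brute-force maximization over all paths (the encodings and names are mine):

```python
import math
from itertools import product

pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]
N = 3

def viterbi(obs):
    """Most likely state path and its joint probability P(Q*, O)."""
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    psi = []
    for o in obs[1:]:
        psi.append([max(range(N), key=lambda i: delta[i] * A[i][j])
                    for j in range(N)])
        delta = [B[j][o] * max(delta[i] * A[i][j] for i in range(N))
                 for j in range(N)]
    path = [max(range(N), key=lambda i: delta[i])]
    for back in reversed(psi):       # backtrack: q*_t = psi_{t+1}(q*_{t+1})
        path.insert(0, back[path[0]])
    return path, max(delta)

def path_prob(q, obs):
    return pi[q[0]] * B[q[0]][obs[0]] * math.prod(
        A[q[t-1]][q[t]] * B[q[t]][obs[t]] for t in range(1, len(obs)))

obs = [1, 1, 2, 2]                   # coat, coat, umbrella, umbrella
best_path, best_p = viterbi(obs)
brute_p = max(path_prob(list(q), obs)
              for q in product(range(N), repeat=len(obs)))
```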
Outline
Time Series, Markov Models, Hidden Markov Models, Learning HMMs with EM, Applying HMMs, Summary
Hidden Markov Models (recap)
[Diagrams repeated from earlier slides: the hidden weather model with its clothing observations, and the two-state A/B example]
Summary So Far
HMMs: generative probabilistic models of time series with hidden state
Forward-backward algorithm: efficient computation of $P(q_t = S_i \mid O, \lambda)$
What about learning?
EM
Problem: find the HMM $\lambda$ that makes the data most likely
E-Step: compute $P(q_t = S_i \mid O, \lambda)$ for the given $\lambda$
M-Step: compute a new $\lambda$ under these expectations (this is now a Markov-model estimation problem)
E-Step
Calculate, for the fixed current model $\lambda$, using the forward-backward algorithm:
$\gamma_t(i) = P(q_t = S_i \mid O, \lambda)$
$\xi_t(i,j) = P(q_t = S_i,\, q_{t+1} = S_j \mid O, \lambda)$
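In terms of the forward and backward variables, these quantities have closed forms (this expansion follows the standard Rabiner-style treatment; the slides state only the definitions):

```latex
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)},
\qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)},
\qquad
\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j) \quad (t < T)
```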
The M-Step: generate $\lambda' = (\pi', a', b')$
$\pi'_i = $ expected number of times in state $S_i$ at time 1 $= \gamma_1(i)$
$a'_{ij} = \dfrac{\text{expected number of transitions from } S_i \text{ to } S_j}{\text{expected number of transitions from } S_i} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
$b'_i(k) = \dfrac{\text{expected number of times in } S_i \text{ observing } v_k}{\text{expected number of times in } S_i} = \dfrac{\sum_{t:\,O_t = v_k} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$
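Putting the E- and M-steps together gives one Baum-Welch iteration. A compact single-sequence sketch (the code and the toy observation sequence are mine; it starts from the weather HMM parameters), whose per-iteration likelihood should never decrease:

```python
def forward(obs, pi, A, B):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(N))
                      for j in range(N)])
    return alpha

def backward(obs, A, B):
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One EM iteration: E-step (gamma, xi), M-step (reestimate pi, A, B)."""
    N, T, M = len(pi), len(obs), len(B[0])
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    pO = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / pO for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j] / pO
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) /
              sum(gamma[t][i] for t in range(T-1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B

pi = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05], [0.38, 0.60, 0.02], [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10], [0.05, 0.30, 0.65], [0.00, 0.50, 0.50]]
obs = [1, 1, 2, 2, 1, 0, 0, 1]       # a short clothing sequence
liks = []
for _ in range(10):
    liks.append(sum(forward(obs, pi, A, B)[-1]))
    pi, A, B = baum_welch_step(obs, pi, A, B)
```

A production version would work in log space (or rescale alpha/beta) to avoid the numerical instabilities mentioned later in these slides.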
Summary (Learning)
Given observation sequence O Guess initial model Iterate:
Calculate the expected times in state $S_i$ at time $t$ (and the expected transitions from $S_i$ to $S_j$) using the forward-backward algorithm
Find the new model by (expected) frequency counts
Outline For Today
Time Series, Markov Models, Hidden Markov Models, Learning HMMs with EM, Applying HMMs, Summary
Three Problems
What bird is this? (time series classification)
How will the song continue? (time series prediction)
Is this bird abnormal? (outlier detection)
Time Series Classification
Train one HMM $\lambda_l$ for each bird $l$
Given time series $O$, calculate
$P(\text{bird } l \mid O) = \dfrac{P(O \mid \lambda_l)\, P(l)}{\sum_{l'} P(O \mid \lambda_{l'})\, P(l')}$
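A toy version of this classifier with two made-up "bird" HMMs (all parameters are invented for illustration; bird 1 mostly emits symbol 0, bird 2 mostly symbol 1):

```python
def likelihood(obs, pi, A, B):
    """P(O | lambda) by the forward algorithm."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(N))
                 for j in range(N)]
    return sum(alpha)

pi0 = [0.5, 0.5]
A0 = [[0.9, 0.1], [0.1, 0.9]]
bird1 = (pi0, A0, [[0.9, 0.1], [0.8, 0.2]])   # favors symbol 0
bird2 = (pi0, A0, [[0.1, 0.9], [0.2, 0.8]])   # favors symbol 1
priors = [0.5, 0.5]

obs = [0, 0, 1, 0]
joint = [likelihood(obs, *m) * p for m, p in zip((bird1, bird2), priors)]
posterior = [j / sum(joint) for j in joint]   # P(bird l | O) by Bayes' rule
```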
Outlier Detection
Train an HMM $\lambda$
Given time series $O$, calculate the likelihood $P(O \mid \lambda)$
If it is abnormally low (or abnormally high), raise a flag
Time Series Prediction
Train an HMM $\lambda$
Given time series $O$, calculate the distribution over the final state, $P(q_T = S_i \mid O, \lambda)$ (via $\alpha_T$)
Then 'hallucinate' new states and observations according to $a$ and $b$
Typical HMM in Speech Recognition
20-dim frequency space, clustered using EM
Use Bayes rule + Viterbi for classification
Linear HMM representing one phoneme
[Rabiner 86] + everyone else
Typical HMM in Robotics
[Blake/Isard 98, Fox/Dellaert et al 99]
Problems with HMMs
Zero probabilities. Training sequence: AAABBBAAA; test sequence: AAABBBCAAA (the unseen symbol C drives the whole test-sequence probability to zero)
Finding “right” number of states, right structure Numerical instabilities
Outline
Time Series, Markov Models, Hidden Markov Models, Learning HMMs with EM, Applying HMMs, Summary
HMMs: Main Lessons
HMMs: Generative probabilistic models of time series (with hidden state)
Forward-Backward: Algorithm for computing probabilities over hidden states
Learning: EM iterates estimation of the hidden-state distribution and model refitting
Extremely practical, best known methods in speech, computer vision, robotics, …
Numerous extensions exist (continuous observations, states; factorial HMMs, controllable HMMs=POMDPs, …)