Siddiqi and Moore, www.autonlab.org
Fast Inference and Learning in Large-State-Space HMMs
Sajid M. Siddiqi and Andrew W. Moore
The Auton Lab, Carnegie Mellon University
HMM Overview
Reducing quadratic complexity in the number of states
  • The model
  • Algorithms for fast evaluation and inference
  • Algorithms for fast learning
Results
  • Speed
  • Accuracy
Conclusion
Transition Model

[Diagram: a chain of hidden states q0 → q1 → q2 → q3 → q4; each outgoing transition is shown with probability 1/3]

Each of these probability tables is identical:

i   P(qt+1=s1|qt=si)   P(qt+1=s2|qt=si)   …   P(qt+1=sj|qt=si)   …   P(qt+1=sN|qt=si)
1   a11                a12                …   a1j                …   a1N
2   a21                a22                …   a2j                …   a2N
3   a31                a32                …   a3j                …   a3N
:   :                  :                  :   :                  :   :
i   ai1                ai2                …   aij                …   aiN
N   aN1                aN2                …   aNj                …   aNN

Notation: aij = P(qt+1 = sj | qt = si)
Siddiqi and Moore, www.autonlab.org
Observation Modelq0
q1
q2
q3
q4
O0
O1
O2
O3
O4
i P(Ot=1|qt=si) P(Ot=2|qt=si) … P(Ot=k|qt=si) … P(Ot=M|qt=si)
1 b1(1) b1 (2) … b1 (k) … b1(M)
2 b2 (1) b2 (2) … b2(k) … b2 (M)
3 b3 (1) b3 (2) … b3(k) … b3 (M)
: : : : : : :
i bi(1) bi (2) … bi(k) … bi (M)
: : : : : : :
N bN (1) bN (2) … bN(k) … bN (M)
Siddiqi and Moore, www.autonlab.org
Observation Modelq0
q1
q2
q3
q4
O0
O1
O2
O3
O4
i P(Ot=1|qt=si) P(Ot=2|qt=si) … P(Ot=k|qt=si) … P(Ot=M|qt=si)
1 b1(1) b1 (2) … b1 (k) … b1(M)
2 b2 (1) b2 (2) … b2(k) … b2 (M)
3 b3 (1) b3 (2) … b3(k) … b3 (M)
: : : : : : :
i bi(1) bi (2) … bi(k) … bi (M)
: : : : : : :
N bN (1) bN (2) … bN(k) … bN (M)
Notation:
)|()( itti sqkOPkb
Some Famous HMM Tasks

Question 1: State Estimation
What is P(qT=Si | O1O2…OT)?

Question 2: Most Probable Path
Given O1O2…OT, what is the most probable path that I took?
(e.g. Woke up at 8.35, got on bus at 9.46, sat in lecture 10.05–11.22, …)

Question 3: Learning HMMs
Given O1O2…OT, what is the maximum-likelihood HMM that could have produced this string of observations?

[Diagram: a three-state HMM over activities Eat, Bus, Walk, with transition probabilities aAA, aAB, aBA, aBB, aBC, aCB, aCC and emission probabilities bA(Ot-1), bB(Ot), bC(Ot+1) for observations Ot-1, Ot, Ot+1]
Basic Operations in HMMs

For an observation sequence O = O1…OT, the three basic HMM operations are:

Problem                                           Algorithm          Complexity
Evaluation:  Calculating P(O|λ)                   Forward-Backward   O(TN²)
Inference:   Computing Q* = argmaxQ P(O,Q|λ)      Viterbi Decoding   O(TN²)
Learning:    Computing λ* = argmaxλ P(O|λ)        Baum-Welch (EM)    O(TN²)

(T = # timesteps, i.e. datapoints; N = # states)

This talk: a simple approach to reducing the complexity in N
HMM Overview
Reducing quadratic complexity in the number of states
  • The model
  • Algorithms for fast evaluation and inference
  • Algorithms for fast learning
Results
  • Speed
  • Accuracy
Conclusion
Reducing Quadratic Complexity in N

Why does it matter?
• Quadratic HMM algorithms hinder HMM computations when N is large
• There are several promising applications for efficient large-state-space HMM algorithms, in:
  • topic modeling
  • speech recognition
  • real-time HMM systems, such as for activity monitoring
  • … and more
Idea One: Sparse Transition Matrix

• Only K << N non-zero next-state probabilities per row, e.g. with N = 5 and K = 2:

    0     0     0.30  0     0.70
    0.50  0     0     0.50  0
    0     0.25  0     0     0.75
    0     0.70  0     0.30  0
    0.60  0     0.40  0     0

• Only O(TNK)!
• But such a model can get very badly confused by "impossible transitions"
• And it cannot learn the sparse structure (once chosen, the structure cannot change)
Dense-Mostly-Constant (DMC) Transitions

• K non-constant probabilities per row; the remaining N−K entries of each row share a single constant value
• DMC HMMs comprise a richer and more expressive class of models than sparse HMMs

A DMC transition matrix with K = 2:

    0.15  0.15  0.30  0.15  0.25
    0.46  0.01  0.01  0.51  0.01
    0.05  0.25  0.05  0.05  0.60
    0.04  0.70  0.04  0.18  0.04
    0.40  0.10  0.30  0.10  0.10

The transition model for state i now consists of:
• K = the number of non-constant values per row
• NCi = { j : si→sj is a non-constant transition probability }
• ci = the transition probability from si to every state not in NCi
• aij = the non-constant transition probability for si→sj, j ∈ NCi

E.g. for row 3 of the matrix above: K = 2, NC3 = {2, 5}, c3 = 0.05, a32 = 0.25, a35 = 0.6
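To make the representation concrete, here is a minimal sketch (my own illustration, not the authors' code) of compressing a dense probability row into DMC form and expanding it back. Choosing the K largest entries as NCi is an assumption made for illustration:

```python
def make_dmc_row(dense_row, K):
    """Compress a dense probability row into (constant c, {j: a_ij}) with K non-constant entries."""
    N = len(dense_row)
    # Pick the K largest entries as the non-constant set NC_i (illustrative choice)
    nc = sorted(range(N), key=lambda j: dense_row[j], reverse=True)[:K]
    rest = [dense_row[j] for j in range(N) if j not in nc]
    c = sum(rest) / len(rest)          # shared constant for the other N-K states
    return c, {j: dense_row[j] for j in nc}

def expand_dmc_row(c, a, N):
    """Recover the dense row from the DMC representation."""
    return [a.get(j, c) for j in range(N)]

# Row 3 of the example matrix (0-indexed states): c3 == 0.05, non-constant states 2 and 5
row3 = [0.05, 0.25, 0.05, 0.05, 0.60]
c3, a3 = make_dmc_row(row3, K=2)
```

Storing a row this way takes K+1 numbers instead of N, which is what makes the O(TNK) recursions in the following slides possible.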
HMM Overview
Reducing quadratic complexity in the number of states
  • The model
  • Algorithms for fast evaluation and inference
  • Algorithms for fast learning
Results
  • Speed
  • Accuracy
Conclusion
Evaluation in Regular HMMs

P(qt = si | O1, O2 … Ot) = αt(i) / Σj=1..N αt(j)

where αt(i) = P(O1 O2 … Ot ∧ qt = si)  — called the "forward variables"

Then, P(O|λ) = Σj αT(j)

The forward variables form a T × N table (rows t = 1…T, columns αt(1), αt(2), … αt(N)), filled in row by row with the recursion

    αt+1(j) = bj(Ot+1) Σi αt(i) aij

• Cost: O(TN²)
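The recursion αt+1(j) = bj(Ot+1) Σi αt(i) aij translates directly into code. A minimal pure-Python sketch (variable names are my own; pi is the initial state distribution, which the slides leave implicit):

```python
def forward(pi, A, B, obs):
    """alpha[t][i] = P(O_1..O_t and q_t = s_i).  Cost O(TN^2)."""
    N = len(pi)
    # Base case: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        # Recursion: alpha_{t+1}(j) = b_j(O_{t+1}) * sum_i alpha_t(i) * a_ij
        alpha.append([B[j][obs[t]] * sum(prev[i] * A[i][j] for i in range(N))
                      for j in range(N)])
    return alpha

pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]          # A[i][j] = a_ij
B = [[0.7, 0.3], [0.4, 0.6]]          # B[i][k] = b_i(k)
alpha = forward(pi, A, B, obs=[0, 1, 0])
likelihood = sum(alpha[-1])           # P(O | lambda) = sum_j alpha_T(j)
```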
Similarly, the "backward variables"

    βt(i) = P(Ot+1 Ot+2 … OT | qt = Si)

are computed with the recursion

    βt(i) = Σj aij bj(Ot+1) βt+1(j)

which also costs O(TN²).
Fast Evaluation in DMC HMMs

Since aij = ci for every j ∉ NCi, the forward recursion splits into a shared part and a small correction:

    αt+1(j) = bj(Ot+1) [ Σi ci αt(i)  +  Σ{i : j ∈ NCi} (aij − ci) αt(i) ]

The first sum is O(N), but it is only computed once per row of the table; the corrections amount to O(K) work for each αt(j) entry. This yields O(TNK) complexity for the evaluation problem.
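One step of this DMC forward recursion might be coded as below (an illustrative sketch of the trick, not the authors' implementation; c[i] is the constant for row i and a[i] a dict of the non-constant entries over NCi):

```python
def dmc_forward_step(alpha_t, c, a, B, o_next):
    """One DMC forward step: O(N) shared work plus O(NK) corrections, vs O(N^2)."""
    N = len(alpha_t)
    # Constant part sum_i c_i * alpha_t(i): computed once per timestep
    const = sum(c[i] * alpha_t[i] for i in range(N))
    nxt = [const] * N
    # Corrections: O(NK) total, since row i contributes |NC_i| = K terms
    for i in range(N):
        for j, aij in a[i].items():
            nxt[j] += (aij - c[i]) * alpha_t[i]
    return [B[j][o_next] * nxt[j] for j in range(N)]

# Row i of the transition matrix equals c[i] everywhere except the entries in a[i]
c = [0.2, 0.1, 0.3]
a = [{1: 0.6}, {0: 0.8}, {2: 0.4}]
B = [[0.5, 0.5], [0.3, 0.7], [0.9, 0.1]]
next_alpha = dmc_forward_step([0.2, 0.3, 0.5], c, a, B, o_next=1)
```

Expanding the DMC rows to a dense matrix and running the ordinary O(N²) step gives the same numbers, which is the point: the DMC step is exact, not an approximation.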
Fast Inference in DMC HMMs

The O(N²) Viterbi recursion in the regular model,

    δt+1(j) = bj(Ot+1) maxi [ δt(i) aij ],

becomes an O(NK) recursion in the DMC model: as with evaluation, there is an O(N) term computed only once per row of the table, plus O(K) work for each δt(j) entry.
HMM Overview
Reducing quadratic complexity in the number of states
  • The model
  • Algorithms for fast evaluation and inference
  • Algorithms for fast learning
Results
  • Speed
  • Accuracy
Conclusion
Learning a DMC HMM

• Idea One:
  • Ask the user to tell us the DMC structure
  • Learn the parameters using EM
• Simple!
• But in general, we don't know the DMC structure
Learning a DMC HMM

• Idea Two: use EM to learn the DMC structure as well
  1. Guess a DMC structure
  2. Find expected transition counts and observation parameters, given the current model and observations
  3. Find the maximum-likelihood DMC model given the counts
  4. Go to 2

The DMC structure can (and does) change between iterations. In fact, we can just start with an all-constant transition model.
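Step 3 (the maximum-likelihood DMC model given expected counts) can be sketched as: for each row, keep the K transitions with the largest expected counts as non-constant, and spread the remaining probability mass uniformly over the other states. This is a hypothetical illustration of the idea, not the authors' exact procedure:

```python
def ml_dmc_row(counts, K):
    """Given expected transition counts for one state, return a DMC row (c, {j: a_ij})."""
    N = len(counts)
    total = sum(counts)
    # Keep the K most-used transitions as the non-constant set
    nc = sorted(range(N), key=lambda j: counts[j], reverse=True)[:K]
    a = {j: counts[j] / total for j in nc}
    # Remaining probability mass shared equally by the other N-K states
    c = (1.0 - sum(a.values())) / (N - K)
    return c, a

c, a = ml_dmc_row([8.0, 1.0, 1.0, 30.0, 10.0], K=2)
# transitions to states 3 and 4 (0-indexed) become non-constant; the rest share c
```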
Learning a DMC HMM

Step 2: Find expected transition counts and observation parameters, given the current model and observations.
We want a new estimate aij_new of P(qt+1=sj | qt=si):

    aij_new = (Expected # transitions i→j | O1, O2, …, OT, λold) / (Σk=1..N Expected # transitions i→k | O1, O2, …, OT, λold)

            = (Σt=1..T-1 P(qt=si, qt+1=sj | O1, O2, …, OT, λold)) / (Σk=1..N Σt=1..T-1 P(qt=si, qt+1=sk | O1, O2, …, OT, λold))

            = Sij / Σk=1..N Sik

where

    Sij = Σt=1..T-1 P(qt=si, qt+1=sj, O1, …, OT | λold)

(the P(O1…OT | λold) factor cancels between numerator and denominator). Applying Bayes' rule to both terms gives us

    Sij = aij Σt=1..T-1 αt(i) βt+1(j) bj(Ot+1)
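Written out directly, computing all the Sij = aij Σt αt(i) bj(Ot+1) βt+1(j) takes O(TN²) time; this is the bottleneck the next slides attack. A naive sketch (my own code, with the observation term assumed already folded into betahat):

```python
def transition_counts(alpha, betahat, A):
    """S[i][j] = A[i][j] * sum_t alpha[t][i] * betahat[t+1][j]  --  O(TN^2)."""
    T, N = len(alpha), len(alpha[0])
    S = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            # Dot product of column i of alpha with (shifted) column j of betahat
            dot = sum(alpha[t][i] * betahat[t + 1][j] for t in range(T - 1))
            S[i][j] = A[i][j] * dot
    return S

# With all-ones tables, S[i][j] reduces to A[i][j] * (T-1): an easy sanity check
alpha = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
betahat = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
A = [[0.5, 0.5], [0.25, 0.75]]
S = transition_counts(alpha, betahat, A)
```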
We want aij_new = Sij / Σk=1..N Sik, where Sij = aij Σt=1..T-1 αt(i) βt+1(j) bj(Ot+1).

We can get the full T × N table of α values, and likewise the T × N table of β values, in O(TN) time each.

Folding the observation term into the backward variables, β̂t(j) = bj(Ot) βt(j), the remaining sum Σt αt(i) β̂t+1(j) is a dot product of the i'th column of the α table with the (shifted) j'th column of the β̂ table.
Each Sij is then a dot product of columns: for example, S24 is the dot product of the 2nd column of the α table with the 4th column of the β̂ table. Computing the whole S table this way (an N × N table of T-term dot products, i.e. the matrix product αᵀβ̂) costs O(TN²).

Speedups:
• Strassen?
• Approximate by DMC?
• Approximate randomized AᵀB multiplication?
• Sparse structure fine?
• Fixed DMC fine?
• What we want: a speedup without approximation
We want aij_new = Sij / Σk=1..N Sik, where Sij = Σt αt(i) β̂t(j).

• Insight One: we only need the top K entries in each row of S
• Insight Two: the values in the columns of α and β̂ are often very skewed
For i = 1..N, store the indexes of the R largest values in the i'th column of α: call these "α-biggies(i)".
For j = 1..N, store the indexes of the R largest values in the j'th column of β̂: "β̂-biggies(j)".
Here R << T, and it takes O(TN) time to do all the indexes.

(There's an important detail I'm omitting here, to do with prescaling the rows of α and β̂.)

Now split the dot product:

    Sij = Σt=1..T αt(i) β̂t(j)
        = Σ{t ∈ α-biggies(i) ∪ β̂-biggies(j)} αt(i) β̂t(j)  +  Σ{t ∉ α-biggies(i) ∪ β̂-biggies(j)} αt(i) β̂t(j)

The first term is an O(R) computation. Each term of the second sum is at most α(R)(i) β̂t(j), where α(R)(i) is the R'th largest value in the i'th column of α — O(1) time to obtain, as is Σt β̂t(j) (precached for all j in time O(TN)). So the second sum can be bounded without ever computing it.
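The bounding trick can be sketched as follows (my own illustration; colA and colB stand for the i'th column of α and the j'th column of β̂, and the rest-sum here is computed directly rather than precached as in the real algorithm):

```python
import heapq

def biggies(col, R):
    """Indexes of the R largest values in a column (the 'biggies')."""
    return set(heapq.nlargest(R, range(len(col)), key=lambda t: col[t]))

def dot_bounds(colA, colB, R):
    """Lower/upper bounds on sum_t colA[t]*colB[t], inspecting only O(R) products.
    Assumes nonnegative entries (true for forward/backward variables)."""
    bA, bB = biggies(colA, R), biggies(colB, R)
    union = bA | bB
    exact = sum(colA[t] * colB[t] for t in union)     # the O(R) part, and a lower bound
    rthA = min(colA[t] for t in bA)                   # R'th largest value in colA
    # In the real algorithm sum_t colB[t] is precached for every column in O(TN)
    rest = sum(colB[t] for t in range(len(colB)) if t not in union)
    # Every omitted term is at most rthA * colB[t], giving the upper bound
    return exact, exact + rthA * rest

colA = [0.5, 0.1, 0.05, 0.3, 0.02]
colB = [0.01, 0.6, 0.2, 0.02, 0.4]
lo, hi = dot_bounds(colA, colB, R=2)
true = sum(x * y for x, y in zip(colA, colB))   # lo <= true <= hi
```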
Computing the i'th row of S:

• In O(NR) time, we can put upper and lower bounds on Sij for j = 1, 2, …, N
• We only need exact values of Sij for the K largest entries within the row
• So: ignore the j's whose bounds show they can't be among the best, and be exact for the rest, at O(T) time each
• If there's enough pruning, the total time is O(TN + RN²)
In Short …
• Sub-quadratic evaluation
• Sub-quadratic inference
• 'Nearly' sub-quadratic learning
• Fully connected transition models allowed
• Some extra work to extract 'important' transitions from data
HMM Overview
Reducing quadratic complexity in the number of states
  • The model
  • Algorithms for fast evaluation and inference
  • Algorithms for fast learning
Results
  • Speed
  • Accuracy
Conclusion
Evaluation and Inference Speedup

[Plot. Dataset: synthetic data with T = 2000 timesteps]
Parameter Learning Speedup

[Plot. Dataset: synthetic data with T = 2000 timesteps]
HMM Overview
Reducing quadratic complexity in the number of states
  • The model
  • Algorithms for fast evaluation and inference
  • Algorithms for fast learning
Results
  • Speed
  • Accuracy
Conclusion
Datasets
• DMC-friendly dataset: from a 2-D Gaussian 20-state DMC HMM with K = 5 (20,000 train, 5,000 test)
• Anti-DMC dataset: from a 2-D Gaussian 20-state regular HMM with steadily varying, well-distributed transition probabilities (20,000 train, 5,000 test)
• Motionlogger dataset: accelerometer data from two sensors worn over several days (10,000 train, 4,720 test)
HMMs Used
• Regular and DMC HMMs: 20 states
• Baseline 1: 5-state regular HMM (do we really need a large HMM?)
• Baseline 2: 20-state HMM with uniform transition probabilities (does the transition model matter?)
Learning Curves for DMC-friendly data
[Plot] The DMC model achieves the full model's score!

Learning Curves for Anti-DMC data
[Plot] The DMC model is worse than the full model.

Learning Curves for Motionlogger data
[Plot] The DMC model achieves the full model's score! The baselines do much worse.
Regularization with DMC HMMs
• # of transition parameters in a regular 100-state HMM: 10,000
• # of transition parameters in a DMC 100-state HMM with K = 5: 500
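The counts follow from simple arithmetic — N² stored transition entries for a regular HMM versus N·K for a DMC one (a trivial helper, written for illustration):

```python
def transition_params(N, K=None):
    """Stored transition parameters: N*N for a regular HMM, N*K for a DMC HMM."""
    return N * N if K is None else N * K
```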
Tradeoffs between N and K
• We vary N and K while keeping the number of transition parameters (N×K) constant
• Increasing N and decreasing K allows more states for modeling data features, but fewer parameters per state for temporal structure
Tradeoffs between N and K
• Average test-set log-likelihoods at convergence
• Datasets:
  • A: DMC-friendly
  • B: Anti-DMC
  • C: Motionlogger
• Each dataset has a different optimal N-vs-K tradeoff
HMM Overview
Reducing quadratic complexity in the number of states
  • The model
  • Algorithms for fast evaluation and inference
  • Algorithms for fast learning
Results
  • Speed
  • Accuracy
Conclusion
Conclusions
• DMC HMMs are an important class of models that allow parameterized complexity-vs-efficiency tradeoffs in large state spaces
• The speedup can be several orders of magnitude
• Even for non-DMC domains, DMC HMMs yield higher scores than baseline models
• The DMC HMM model can be applied to arbitrary state spaces and observation densities
Related Work
• Felzenszwalb et al. (2003): fast HMM algorithms when transition probabilities can be expressed as distances in an underlying parameter space
• Murphy and Paskin (2002): fast inference in hierarchical HMMs cast as DBNs
• Salakhutdinov et al. (2003): combines EM and conjugate gradient for faster HMM learning when the amount of missing information is high
• Ghahramani and Jordan (1996): Factorial HMMs for distributed representation of large state spaces
• Beam Search: a widely used heuristic in Viterbi inference for speech systems
Future Work
• Eliminate the R parameter using an automatic backoff evaluation approach
• Investigate DMC HMMs as a regularization mechanism
• Compare robustness against overfitting with factorial HMMs for large-state-space problems