Speeding Up Algorithms for Hidden Markov Models by Exploiting
Repetitions
Shay Mozes
Oren Weimann (MIT)
Michal Ziv-Ukelson (Tel-Aviv U.)
In brief:
• Hidden Markov Models are extensively used to model processes in many fields
• The runtime of HMM algorithms is usually linear in the length of the input
• We show how to exploit repetitions in the observed sequence to obtain a speedup
• First provable speedup of Viterbi’s algorithm
• Can use different compression schemes
• Applies to several decoding and training algorithms
Markov Models
• states q1, …, qk
• transition probabilities Pi←j
• emission probabilities ei(σ), σ ∈ Σ
• time independent, discrete, finite

[Figure: a two-state example model]
e1(A) = 0.3  e1(C) = 0.2  e1(G) = 0.2  e1(T) = 0.3
e2(A) = 0.2  e2(C) = 0.3  e2(G) = 0.3  e2(T) = 0.2
P1←1 = 0.9  P2←1 = 0.1  P2←2 = 0.8  P1←2 = 0.2
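To make this concrete, here is a minimal numpy sketch of the two-state example above (the names SIGMA, P, E are illustrative, not from the talk, and are reused in the later sketches):

```python
import numpy as np

# Symbol -> column index in the emission table.
SIGMA = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

# P[i, j] = P_{i<-j}: probability of moving to state i from state j
# (each column sums to 1).
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# E[i, s] = e_i(s): probability that state i emits symbol s (rows sum to 1).
E = np.array([[0.3, 0.2, 0.2, 0.3],
              [0.2, 0.3, 0.3, 0.2]])
```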
Hidden Markov Models
[Figure: trellis of the k hidden states unfolded over time; at each time step one of the states 1…k emits the next character of the observed string x1 x2 x3 … xn]
• We are only given the description of the model and the observed string
• Decoding: find the hidden sequence of states that is most likely to have generated the observed string
Decoding – Viterbi's Algorithm
[Figure: the Viterbi dynamic-programming table, states 1…6 (rows) against time 1…n (columns), over the observed string a a c g a c g g t]
vt[j] = probability of the best sequence of states that emits the first t characters and ends in state j
For example, v6[4] = maxj { e4(c) · P4←j · v5[j] }, since c is the 6th observed character.
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
VA in Matrix Notation
Viterbi's algorithm: v1[i] = maxj { ei(x1) · P i←j · v0[j] }
Define Mij(σ) = ei(σ) · Pi←j and (A⊗B)ij = maxk { Aik · Bkj }
Then v1[i] = maxj { Mij(x1) · v0[j] }, i.e. v1 = M(x1) ⊗ v0
v2 = M(x2) ⊗ M(x1) ⊗ v0
vn = M(xn) ⊗ M(xn-1) ⊗ ··· ⊗ M(x1) ⊗ v0
Evaluating this right-to-left with matrix-vector products costs O(k2n); naively multiplying the matrices instead would cost O(k3n).
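A minimal sketch of this matrix formulation, assuming the SIGMA, P, E arrays from the earlier sketch (the function names are illustrative):

```python
import numpy as np
# assumes SIGMA, P, E from the Markov Models sketch

def M(sym):
    """M_ij(sigma) = e_i(sigma) * P_{i<-j}."""
    return E[:, SIGMA[sym]][:, None] * P

def otimes(A, B):
    """Max-times product: (A (x) B)_ij = max_k A_ik * B_kj."""
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def viterbi(x, v0):
    """v_n = M(x_n) (x) ... (x) M(x_1) (x) v_0 via matrix-vector steps, O(k^2 n)."""
    v = v0
    for sym in x:                      # apply M(x_t) for t = 1 .. n
        v = (M(sym) * v[None, :]).max(axis=1)
    return v

v0 = np.array([1.0, 0.0])              # assume the model starts in state 1
print(viterbi("AACGACGGT", v0))
```

Replacing the matrix-vector step with otimes on full matrices reproduces the O(k3n) bound mentioned above.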
Exploiting Repetitions
Observed string: c a t g a a c t g a a c, in which the substring W = gaac appears twice.
vn = M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(a)⊗M(c) ⊗ v0   (12 steps)
• compute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) once
• use it twice!
vn = M(W)⊗M(t)⊗M(W)⊗M(t)⊗M(a)⊗M(c) ⊗ v0   (6 steps)
Exploiting Repetitions
ℓ - length of the repeated substring W
λ - number of times W repeats in the string
Computing M(W) once costs (ℓ-1)k3 (matrix-matrix multiplications).
Each time W appears we save (ℓ-1)k2 (matrix-vector multiplications).
W is good if the savings exceed the cost:
λ(ℓ-1)k2 > (ℓ-1)k3, i.e. λ > k  (number of repeats > number of states)
General Scheme
I. dictionary selection: choose the set D = {Wi} of good substrings
II. encoding: compute M(Wi) for every Wi in D
III. parsing: partition the input X into good substrings, X = Wi1 Wi2 … Win′, and let X′ = i1, i2, …, in′
IV. propagation: run Viterbi's Algorithm on X′ using the precomputed M(Wi)
Steps I–II can be done offline.
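An end-to-end sketch of the four steps (illustrative: it reuses M, otimes, and v0 from the previous sketch, and the hand-picked dictionary and greedy parser below are stand-ins for the paper's LZ78/RLE machinery):

```python
# Step II -- encoding: M(W) for W = w1..wl is M(wl) (x) ... (x) M(w1).
def encode(dictionary):
    table = {}
    for w in dictionary:
        Mw = M(w[0])
        for sym in w[1:]:
            Mw = otimes(M(sym), Mw)    # each later symbol's matrix goes on the left
        table[w] = Mw
    return table

# Step III -- parsing: greedily cut X into dictionary words (assumes every
# single character is also in the dictionary, so parsing always succeeds).
def parse(x, dictionary):
    out, i = [], 0
    words = sorted(dictionary, key=len, reverse=True)
    while i < len(x):
        w = next(w for w in words if x.startswith(w, i))
        out.append(w)
        i += len(w)
    return out

# Step IV -- propagation: run VA on X', one matrix-vector step per word.
def propagate(parsed, table, v0):
    v = v0
    for w in parsed:
        v = (table[w] * v[None, :]).max(axis=1)
    return v

D = {"GAAC", "A", "C", "G", "T"}       # Step I: a toy dictionary for the example
table = encode(D)
print(propagate(parse("CATGAACTGAAC", D), table, v0))
```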
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
LZ78
• The next LZ-word is the longest LZ-word previously seen, plus one character
• Use a trie
[Figure: trie for the LZ78 parse of aacgacg = a | ac | g | acg]
• The number of LZ-words is asymptotically smaller than n ∕ log n
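A sketch of LZ78 parsing with a nested-dict trie (illustrative, not the authors' code):

```python
def lz78_parse(x):
    """Split x into LZ-words: each word = longest previously seen word + 1 char."""
    trie = {}               # nested dicts: child edges labeled by characters
    words = []
    i = 0
    while i < len(x):
        node, j = trie, i
        # walk down the trie as long as the input matches an existing word
        while j < len(x) and x[j] in node:
            node = node[x[j]]
            j += 1
        if j < len(x):
            node[x[j]] = {}  # extend the trie by one character
            j += 1
        words.append(x[i:j])
        i = j
    return words

print(lz78_parse("aacgacg"))   # ['a', 'ac', 'g', 'acg']
```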
Using LZ78 (with costs)
I. dictionary selection: D = the words in the LZ78 parse of X. Cost: O(n)
II. encoding: use the incremental nature of LZ78, M(Wσ) = M(σ) ⊗ M(W), one matrix product per word. Cost: O(k3n ∕ log n)
III. parsing: X′ = the LZ78 parse of X. Cost: O(n)
IV. propagation: run VA on X′ using the M(Wi). Cost: O(k2n ∕ log n)
Speedup: k2n ∕ (k3n ∕ log n) = log n ∕ k
Improvement
• Recall the speedup condition: λ > k
• Use only LZ-words that appear more than k times
• These words are represented by trie nodes with more than k descendants
• Now X must be parsed (step III) differently
• Ensures graceful degradation with increasing k:
Speedup: max(1, log n ∕ k)
[Figure: the trie for aacgacg, with the qualifying nodes highlighted]
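A sketch of how the qualifying words can be read off the trie (helper names are illustrative; it reuses lz78_parse from the previous sketch). Every node below a word's node is an LZ-word containing that word as a prefix, so the subtree size lower-bounds λ:

```python
def build_trie(words):
    """Rebuild the LZ78 trie from the parsed words."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
    return trie

def subtree_sizes(trie, prefix="", out=None):
    """Map each LZ-word to the size of its trie subtree (itself included)."""
    if out is None:
        out = {}
    for ch, child in trie.items():
        word = prefix + ch
        subtree_sizes(child, word, out)
        out[word] = 1 + sum(out[word + c] for c in child)
    return out

sizes = subtree_sizes(build_trie(lz78_parse("aacgacg")))
good = {w for w, s in sizes.items() if s > 2}   # keep words with > k descendants (k = 2 here)
```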
Experimental results
• Short: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast)
• Long: the 22 Mbp human Y chromosome
Roughly 5× faster:
[Chart: running-time comparison]
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
Run-Length Encoding
aaaccggggg → a3c2g5
Split each run into powers of two, so each piece's matrix is obtained by repeated squaring: aaaccggggg → a2a1c2g4g1
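A sketch of this idea (illustrative; it reuses M, otimes, and v0 from the matrix-notation sketch): a run σ^r is applied with about log2(r) squarings instead of r matrix-vector steps.

```python
from itertools import groupby

def rle(x):
    """'AAACCGGGGG' -> [('A', 3), ('C', 2), ('G', 5)]"""
    return [(sym, len(list(run))) for sym, run in groupby(x)]

def propagate_run(v, sym, r):
    """Apply M(sym) r times using the binary decomposition of r: keep squaring
    M(sym) and fold a power into v wherever r has a 1-bit (powers of the same
    matrix commute, so the fold order does not matter)."""
    power = M(sym)                      # M(sym)^(2^i)
    while r:
        if r & 1:
            v = (power * v[None, :]).max(axis=1)
        r >>= 1
        if r:
            power = otimes(power, power)
    return v

def viterbi_rle(x, v0):
    v = v0
    for sym, r in rle(x):
        v = propagate_run(v, sym, r)
    return v

print(viterbi_rle("AAACCGGGGG", v0))
```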
Summary of results
• General framework
• LZ78: log(n) ∕ k
• RLE: r ∕ log(r)
• Byte-Pair Encoding: r
• Path reconstruction in O(n)
• Forward/Backward algorithms: same framework, with standard matrix multiplication
• Viterbi training: the same speedups apply
• Baum-Welch training: speedup possible, many details
• Parallelization
Thank you!
Any questions?
Path traceback
• In VA, this is easy to do in O(n) time by keeping track of the maximizing states during the computation
• The problem: we run VA on X′, so we get the sequence of states for X′, not for X; we only get the states on the boundaries of the good substrings of X
• Solution: keep track of the maximizing states while computing the matrices M(W). Takes O(n) time and O(nk2) space
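One way to sketch that bookkeeping (illustrative): compute each max-times product together with its argmax table, and store those tables for every M(W).

```python
import numpy as np

def otimes_with_argmax(A, B):
    """C_ij = max_k A_ik * B_kj together with K_ij = the maximizing k, so the
    best intermediate state can later be recovered for every (i, j) pair."""
    prod = A[:, :, None] * B[None, :, :]   # prod[i, k, j] = A_ik * B_kj
    return prod.max(axis=1), prod.argmax(axis=1)
```

During traceback, once the states at a word's two boundaries are known, the stored argmax tables yield the states inside the word, one position at a time.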
Training
• Estimate the unknown parameters Pi←j, ei(σ)
• Use Expectation Maximization:
  1. Decoding
  2. Recalculate parameters
• Viterbi training: each iteration costs O(VA + n + k2), i.e. decoding (the bottleneck, so our speedup applies!) plus path traceback plus updating Pi←j, ei(σ)
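A sketch of the parameter-recalculation step (illustrative; the decoding step itself is the sped-up VA with traceback from the previous slides):

```python
import numpy as np

def reestimate(x, path, k, alphabet):
    """Recount transitions and emissions along the decoded state path
    (path[t] = state that emitted x[t]); small pseudocounts avoid zeros."""
    P = np.full((k, k), 1e-6)
    E = np.full((k, len(alphabet)), 1e-6)
    for t, sym in enumerate(x):
        E[path[t], alphabet[sym]] += 1
        if t + 1 < len(path):
            P[path[t + 1], path[t]] += 1   # a move from path[t] to path[t+1]
    P /= P.sum(axis=0, keepdims=True)      # column j: distribution out of state j
    E /= E.sum(axis=1, keepdims=True)      # row i: emission distribution of state i
    return P, E
```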
Baum-Welch Training
• Each iteration costs O(FB + nk2): the Forward-Backward computation (the decoding analogue, O(nk2)) plus updating Pi←j, ei(σ)
• If a substring W of length ℓ repeats λ times and the total savings over its λ occurrences outweigh the one-time precalculation cost for W, then the entire process can be sped up by precalculation