Speeding Up Algorithms for Hidden Markov Models by Exploiting
Repetitions
Shay Mozes
Oren Weimann (MIT)
Michal Ziv-Ukelson (Tel-Aviv U.)
In brief:
• Hidden Markov Models are extensively used to model processes in many fields
• The runtime of HMM algorithms is usually linear in the length of the input
• We show how to exploit repetitions in the observed sequence to obtain a speedup
• First provable speedup of Viterbi’s algorithm
• Can use different compression schemes
• Applies to several decoding and training algorithms
Markov Models
• states q1, …, qk
• transition probabilities Pi←j
• emission probabilities ei(σ), σ ∈ Σ
• time independent, discrete, finite

[Figure: a two-state example model]
e1(A) = 0.3  e1(C) = 0.2  e1(G) = 0.2  e1(T) = 0.3
e2(A) = 0.2  e2(C) = 0.3  e2(G) = 0.3  e2(T) = 0.2
P1←1 = 0.9  P2←1 = 0.1  P2←2 = 0.8  P1←2 = 0.2
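To make this concrete, here is a minimal numpy sketch of the two-state example above (the names SIGMA, P, E are illustrative, not from the talk, and are reused in the later sketches):

```python
import numpy as np

# Symbol -> column index in the emission table.
SIGMA = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

# P[i, j] = P_{i<-j}: probability of moving to state i from state j
# (each column sums to 1).
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# E[i, s] = e_i(s): probability that state i emits symbol s (rows sum to 1).
E = np.array([[0.3, 0.2, 0.2, 0.3],
              [0.2, 0.3, 0.3, 0.2]])
```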
Hidden Markov Models
[Figure: trellis of the k hidden states unfolded over time; at each time step one of the states 1…k emits the next character of the observed string x1 x2 x3 … xn]
• We are only given the description of the model and the observed string
• Decoding: find the hidden sequence of states that is most likely to have generated the observed string
Decoding – Viterbi's Algorithm
[Figure: the Viterbi dynamic-programming table, states 1…6 (rows) against time 1…n (columns), over the observed string a a c g a c g g t]
vt[j] = probability of the best sequence of states that emits the first t characters and ends in state j
For example, v6[4] = maxj { e4(c) · P4←j · v5[j] }, since c is the 6th observed character.
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
VA in Matrix Notation
Viterbi's algorithm: v1[i] = maxj { ei(x1) · P i←j · v0[j] }
Define Mij(σ) = ei(σ) · Pi←j and (A⊗B)ij = maxk { Aik · Bkj }
Then v1[i] = maxj { Mij(x1) · v0[j] }, i.e. v1 = M(x1) ⊗ v0
v2 = M(x2) ⊗ M(x1) ⊗ v0
vn = M(xn) ⊗ M(xn-1) ⊗ ··· ⊗ M(x1) ⊗ v0
Evaluating this right-to-left with matrix-vector products costs O(k2n); naively multiplying the matrices instead would cost O(k3n).
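A minimal sketch of this matrix formulation, assuming the SIGMA, P, E arrays from the earlier sketch (the function names are illustrative):

```python
import numpy as np
# assumes SIGMA, P, E from the Markov Models sketch

def M(sym):
    """M_ij(sigma) = e_i(sigma) * P_{i<-j}."""
    return E[:, SIGMA[sym]][:, None] * P

def otimes(A, B):
    """Max-times product: (A (x) B)_ij = max_k A_ik * B_kj."""
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def viterbi(x, v0):
    """v_n = M(x_n) (x) ... (x) M(x_1) (x) v_0 via matrix-vector steps, O(k^2 n)."""
    v = v0
    for sym in x:                      # apply M(x_t) for t = 1 .. n
        v = (M(sym) * v[None, :]).max(axis=1)
    return v

v0 = np.array([1.0, 0.0])              # assume the model starts in state 1
print(viterbi("AACGACGGT", v0))
```

Replacing the matrix-vector step with otimes on full matrices reproduces the O(k3n) bound mentioned above.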
Exploiting Repetitions
Observed string: c a t g a a c t g a a c, in which the substring W = gaac appears twice.
vn = M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(a)⊗M(c) ⊗ v0   (12 steps)
• compute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) once
• use it twice!
vn = M(W)⊗M(t)⊗M(W)⊗M(t)⊗M(a)⊗M(c) ⊗ v0   (6 steps)
Exploiting Repetitions
ℓ - length of the repeated substring W
λ - number of times W repeats in the string
Computing M(W) once costs (ℓ-1)k3 (matrix-matrix multiplications).
Each time W appears we save (ℓ-1)k2 (matrix-vector multiplications).
W is good if the savings exceed the cost:
λ(ℓ-1)k2 > (ℓ-1)k3, i.e. λ > k  (number of repeats > number of states)
General Scheme
I. dictionary selection: choose the set D = {Wi} of good substrings
II. encoding: compute M(Wi) for every Wi in D
III. parsing: partition the input X into good substrings, X = Wi1 Wi2 … Win′, and let X′ = i1, i2, …, in′
IV. propagation: run Viterbi's Algorithm on X′ using the precomputed M(Wi)
Steps I–II can be done offline.
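An end-to-end sketch of the four steps (illustrative: it reuses M, otimes, and v0 from the previous sketch, and the hand-picked dictionary and greedy parser below are stand-ins for the paper's LZ78/RLE machinery):

```python
# Step II -- encoding: M(W) for W = w1..wl is M(wl) (x) ... (x) M(w1).
def encode(dictionary):
    table = {}
    for w in dictionary:
        Mw = M(w[0])
        for sym in w[1:]:
            Mw = otimes(M(sym), Mw)    # each later symbol's matrix goes on the left
        table[w] = Mw
    return table

# Step III -- parsing: greedily cut X into dictionary words (assumes every
# single character is also in the dictionary, so parsing always succeeds).
def parse(x, dictionary):
    out, i = [], 0
    words = sorted(dictionary, key=len, reverse=True)
    while i < len(x):
        w = next(w for w in words if x.startswith(w, i))
        out.append(w)
        i += len(w)
    return out

# Step IV -- propagation: run VA on X', one matrix-vector step per word.
def propagate(parsed, table, v0):
    v = v0
    for w in parsed:
        v = (table[w] * v[None, :]).max(axis=1)
    return v

D = {"GAAC", "A", "C", "G", "T"}       # Step I: a toy dictionary for the example
table = encode(D)
print(propagate(parse("CATGAACTGAAC", D), table, v0))
```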
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
LZ78
• The next LZ-word is the longest LZ-word previously seen, plus one character
• Use a trie
[Figure: trie for the LZ78 parse of aacgacg = a | ac | g | acg]
• The number of LZ-words is asymptotically smaller than n ∕ log n
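A sketch of LZ78 parsing with a nested-dict trie (illustrative, not the authors' code):

```python
def lz78_parse(x):
    """Split x into LZ-words: each word = longest previously seen word + 1 char."""
    trie = {}               # nested dicts: child edges labeled by characters
    words = []
    i = 0
    while i < len(x):
        node, j = trie, i
        # walk down the trie as long as the input matches an existing word
        while j < len(x) and x[j] in node:
            node = node[x[j]]
            j += 1
        if j < len(x):
            node[x[j]] = {}  # extend the trie by one character
            j += 1
        words.append(x[i:j])
        i = j
    return words

print(lz78_parse("aacgacg"))   # ['a', 'ac', 'g', 'acg']
```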
Using LZ78 (with costs)
I. dictionary selection: D = the words in the LZ78 parse of X. Cost: O(n)
II. encoding: use the incremental nature of LZ78, M(Wσ) = M(σ) ⊗ M(W), one matrix product per word. Cost: O(k3n ∕ log n)
III. parsing: X′ = the LZ78 parse of X. Cost: O(n)
IV. propagation: run VA on X′ using the M(Wi). Cost: O(k2n ∕ log n)
Speedup: k2n ∕ (k3n ∕ log n) = log n ∕ k
Improvement
• Recall the speedup condition: λ > k
• Use only LZ-words that appear more than k times
• These words are represented by trie nodes with more than k descendants
• Now X must be parsed (step III) differently
• Ensures graceful degradation with increasing k:
Speedup: max(1, log n ∕ k)
[Figure: the trie for aacgacg, with the qualifying nodes highlighted]
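A sketch of how the qualifying words can be read off the trie (helper names are illustrative; it reuses lz78_parse from the previous sketch). Every node below a word's node is an LZ-word containing that word as a prefix, so the subtree size lower-bounds λ:

```python
def build_trie(words):
    """Rebuild the LZ78 trie from the parsed words."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
    return trie

def subtree_sizes(trie, prefix="", out=None):
    """Map each LZ-word to the size of its trie subtree (itself included)."""
    if out is None:
        out = {}
    for ch, child in trie.items():
        word = prefix + ch
        subtree_sizes(child, word, out)
        out[word] = 1 + sum(out[word + c] for c in child)
    return out

sizes = subtree_sizes(build_trie(lz78_parse("aacgacg")))
good = {w for w, s in sizes.items() if s > 2}   # keep words with > k descendants (k = 2 here)
```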
Experimental results
• Short: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast)
• Long: the 22 Mbp human Y chromosome
Roughly 5× faster:
[Chart: running-time comparison]
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
Run-Length Encoding
aaaccggggg → a3c2g5
Split each run into powers of two, so each piece's matrix is obtained by repeated squaring: aaaccggggg → a2a1c2g4g1
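A sketch of this idea (illustrative; it reuses M, otimes, and v0 from the matrix-notation sketch): a run σ^r is applied with about log2(r) squarings instead of r matrix-vector steps.

```python
from itertools import groupby

def rle(x):
    """'AAACCGGGGG' -> [('A', 3), ('C', 2), ('G', 5)]"""
    return [(sym, len(list(run))) for sym, run in groupby(x)]

def propagate_run(v, sym, r):
    """Apply M(sym) r times using the binary decomposition of r: keep squaring
    M(sym) and fold a power into v wherever r has a 1-bit (powers of the same
    matrix commute, so the fold order does not matter)."""
    power = M(sym)                      # M(sym)^(2^i)
    while r:
        if r & 1:
            v = (power * v[None, :]).max(axis=1)
        r >>= 1
        if r:
            power = otimes(power, power)
    return v

def viterbi_rle(x, v0):
    v = v0
    for sym, r in rle(x):
        v = propagate_run(v, sym, r)
    return v

print(viterbi_rle("AAACCGGGGG", v0))
```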
Summary of results
• General framework
• LZ78: log(n) ∕ k
• RLE: r ∕ log(r)
• Byte-Pair Encoding: r
• Path reconstruction in O(n)
• Forward/Backward algorithms: same framework, with standard matrix multiplication
• Viterbi training: the same speedups apply
• Baum-Welch training: speedup possible, many details
• Parallelization
Thank you!
Any questions?
Path traceback
• In VA, this is easy to do in O(n) time by keeping track of the maximizing states during the computation
• The problem: we run VA on X′, so we get the sequence of states for X′, not for X; we only get the states on the boundaries of the good substrings of X
• Solution: keep track of the maximizing states while computing the matrices M(W). Takes O(n) time and O(nk2) space
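One way to sketch that bookkeeping (illustrative): compute each max-times product together with its argmax table, and store those tables for every M(W).

```python
import numpy as np

def otimes_with_argmax(A, B):
    """C_ij = max_k A_ik * B_kj together with K_ij = the maximizing k, so the
    best intermediate state can later be recovered for every (i, j) pair."""
    prod = A[:, :, None] * B[None, :, :]   # prod[i, k, j] = A_ik * B_kj
    return prod.max(axis=1), prod.argmax(axis=1)
```

During traceback, once the states at a word's two boundaries are known, the stored argmax tables yield the states inside the word, one position at a time.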
Training
• Estimate the unknown parameters Pi←j, ei(σ)
• Use Expectation Maximization:
  1. Decoding
  2. Recalculate parameters
• Viterbi training: each iteration costs O(VA + n + k2), i.e. decoding (the bottleneck, so our speedup applies!) plus path traceback plus updating Pi←j, ei(σ)
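A sketch of the parameter-recalculation step (illustrative; the decoding step itself is the sped-up VA with traceback from the previous slides):

```python
import numpy as np

def reestimate(x, path, k, alphabet):
    """Recount transitions and emissions along the decoded state path
    (path[t] = state that emitted x[t]); small pseudocounts avoid zeros."""
    P = np.full((k, k), 1e-6)
    E = np.full((k, len(alphabet)), 1e-6)
    for t, sym in enumerate(x):
        E[path[t], alphabet[sym]] += 1
        if t + 1 < len(path):
            P[path[t + 1], path[t]] += 1   # a move from path[t] to path[t+1]
    P /= P.sum(axis=0, keepdims=True)      # column j: distribution out of state j
    E /= E.sum(axis=1, keepdims=True)      # row i: emission distribution of state i
    return P, E
```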
Baum-Welch Training
• Each iteration costs O(FB + nk2): the Forward-Backward computation (the decoding analogue, O(nk2)) plus updating Pi←j, ei(σ)
• If a substring W of length ℓ repeats λ times and the total savings over its λ occurrences outweigh the one-time precalculation cost for W, then the entire process can be sped up by precalculation