Hidden Markov Models
Yves Moreau
Katholieke Universiteit Leuven
Regular expressions

Alignment:
ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC

Regular expression: [AT][CG][AC][ACGT]*A[TG][GC]

Problem: the regular expression does not distinguish the exceptional sequence TGCTAGG from the consensus ACACATC
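As a quick check (a minimal sketch using Python's standard re module; the variable names are illustrative, not from the slides), both the consensus and the exceptional sequence match the expression, which is precisely the problem:

    import re

    # Profile regular expression from the slide.
    motif = re.compile(r"[AT][CG][AC][ACGT]*A[TG][GC]")

    print(bool(motif.fullmatch("ACACATC")))  # consensus -> True
    print(bool(motif.fullmatch("TGCTAGG")))  # exceptional -> True as well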
Hidden Markov Models

[Figure: profile HMM for the alignment above. Emission probabilities per state:
state 1: A .8, T .2
state 2: C .8, G .2
state 3: A .8, C .2
insert state: A .2, C .4, G .2, T .2
state 4: A 1
state 5: G .2, T .8
state 6: C .8, G .2
Transitions along the main chain are 1.0, except state 3 -> insert (.6) and state 3 -> state 4 (.4); the insert state loops on itself with probability .4 and exits to state 4 with probability .6]
Sequence score

Score of the consensus sequence ACACATC, multiplying emission and transition probabilities along the path:

$P(\text{ACACATC}) = 0.8 \cdot 1 \cdot 0.8 \cdot 1 \cdot 0.8 \cdot 0.6 \cdot 0.4 \cdot 0.6 \cdot 1 \cdot 1 \cdot 0.8 \cdot 1 \cdot 0.8 \approx 4.7 \times 10^{-2}$
Log odds

Use logarithms for scaling and normalize by a random background model

Log odds for a sequence S of length L:

$\log \text{odds}(S) = \log P(S) - L \log 0.25 = \log \frac{P(S)}{0.25^{L}}$

[Figure: the same HMM with log-odds scores. Emissions:
state 1: A 1.16, T -0.22
state 2: C 1.16, G -0.22
state 3: A 1.16, C -0.22
insert state: A -0.22, C 0.47, G -0.22, T -0.22
state 4: A 1.39
state 5: G -0.22, T 1.16
state 6: C 1.16, G -0.22
Transitions: 0 along the main chain, -0.51 for state 3 -> insert and insert -> state 4, -0.92 for state 3 -> state 4 and the insert self-loop]
Log odds

$\log \text{odds}(\text{ACACATC}) = 1.16 + 0 + 1.16 + 0 + 1.16 - 0.51 + 0.47 - 0.51 + 1.39 + 0 + 1.16 + 0 + 1.16 = 6.65$
Sequence Log odds
ACAC--ATC (consensus) 6.7
ACA---ATG 4.9
TCAACTATC 3.0
ACAC--AGC 5.3
AGA---ATC 4.9
ACCG--ATC 4.6
TGCT--AGG (exceptional) -0.97
Markov chain

Probabilistic model of a DNA sequence
Sequence: $x = x_1, x_2, \dots, x_L$ (e.g., with $x_i \in \{A, C, G, T\}$)
Transition probabilities: $a_{st} = P(x_i = t \mid x_{i-1} = s)$

[Figure: example of a Markov chain over the four states A, C, G, T with transitions between all pairs]
Markov property

Markov property: "The future depends only on the present, not on the past"

Probability of a sequence through Bayes' rule:

$P(x) = P(x_1, x_2, \dots, x_L) = P(x_L \mid x_{L-1}, \dots, x_1)\, P(x_{L-1} \mid x_{L-2}, \dots, x_1) \cdots P(x_1)$

With the Markov property this factorizes as:

$P(x) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$
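A minimal sketch of this factorization in log space (the function name and the uniform background model are assumptions, not from the slides):

    import numpy as np

    ALPHABET = {"A": 0, "C": 1, "G": 2, "T": 3}

    def markov_log_prob(seq, log_init, log_trans):
        # log P(x) = log P(x_1) + sum_{i>=2} log a_{x_{i-1} x_i}
        idx = [ALPHABET[s] for s in seq]
        logp = log_init[idx[0]]
        for prev, cur in zip(idx, idx[1:]):
            logp += log_trans[prev, cur]
        return logp

    # Uniform background: every base and transition has probability 0.25.
    log_init = np.log(np.full(4, 0.25))
    log_trans = np.log(np.full((4, 4), 0.25))
    print(markov_log_prob("ACACATC", log_init, log_trans))  # = 7 * log 0.25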
Beginning and end of a sequence

Computation of the probability is not homogeneous
Length distribution is not modeled: P(length = L) is left unspecified

Solution: model the beginning and the end of the sequence explicitly
The probability of observing a sequence of a given length then decreases with the length of the sequence

Sequence: $\emptyset, x_1, \dots, x_L, \emptyset$
Beginning: $a_{0s} = P(x_1 = s)$
End: $a_{t0} = P(\emptyset \mid x_L = t)$

[Figure: the Markov chain over A, C, G, T extended with begin and end states]
Hidden Markov Model

In a hidden Markov model, we observe the symbol sequence x but we want to reconstruct the hidden state sequence (path $\pi$)

Transition probabilities (beginning: $a_{0l}$, end: $a_{k0}$):

$a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$

Emission probabilities:

$e_k(b) = P(x_i = b \mid \pi_i = k)$

Joint probability of the sequence $x = x_1, \dots, x_L$ and the path $\pi$ (with $\pi_{L+1} = 0$, the end state):

$P(x, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$
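A minimal sketch of this joint probability in log space (array layout and names are assumptions; the final end transition $a_{k0}$ is dropped for simplicity):

    import numpy as np

    def joint_log_prob(obs, path, log_start, log_trans, log_emit):
        # log P(x, pi) = log a_{0 pi_1} + sum_i [log e_{pi_i}(x_i) + log a_{pi_i pi_{i+1}}]
        logp = log_start[path[0]] + log_emit[path[0], obs[0]]
        for i in range(1, len(obs)):
            logp += log_trans[path[i - 1], path[i]] + log_emit[path[i], obs[i]]
        return logp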
Casino (I) – problem setup

The casino uses mostly a fair die but sometimes switches to a loaded die
We observe the outcome x of the successive throws but want to know when the die was fair or loaded (path $\pi$)

[Figure: two-state HMM. Fair state: faces 1-6 each with probability 1/6. Loaded state: faces 1-5 each with probability 1/10, face 6 with probability 1/2. Transitions: fair -> loaded 0.05, loaded -> fair 0.1, fair -> fair 0.95, loaded -> loaded 0.9]
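For the sketches below, the casino model can be written down directly (the 50/50 starting distribution is an assumption; the slide does not specify one):

    import numpy as np

    # State 0 = fair, state 1 = loaded.
    START = np.array([0.5, 0.5])               # assumed initial distribution
    TRANS = np.array([[0.95, 0.05],            # fair -> fair, fair -> loaded
                      [0.10, 0.90]])           # loaded -> fair, loaded -> loaded
    EMIT = np.array([[1/6] * 6,                # fair die: faces 1..6
                     [1/10] * 5 + [1/2]])      # loaded die: face 6 with prob 1/2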
Estimation of the sequence and state probabilities

The Viterbi algorithm

We look for the most probable path $\pi^*$:

$\pi^* = \arg\max_{\pi} P(x, \pi) = \arg\max_{\pi} P(\pi \mid x)$

This problem can be tackled by dynamic programming
Let us define $v_k(i)$ as the probability of the most probable path that ends in state k with the emission of symbol $x_i$:

$v_k(i) = \max_{\pi_1, \dots, \pi_{i-1}} P(x_1, \dots, x_i, \pi_1, \dots, \pi_{i-1}, \pi_i = k)$

Then we can compute this probability recursively as

$v_l(i+1) = e_l(x_{i+1}) \max_k \left( v_k(i)\, a_{kl} \right)$
The Viterbi algorithm

The Viterbi algorithm grows the best path dynamically
Initial condition: sequence in the beginning state
Traceback pointers to follow the best path (= decoding)

Initialization ($i = 0$): $v_0(0) = 1$, $v_k(0) = 0$ for $k > 0$
Recursion ($i = 1, \dots, L$): $v_l(i) = e_l(x_i) \max_k \left( v_k(i-1)\, a_{kl} \right)$; $\mathrm{ptr}_i(l) = \arg\max_k \left( v_k(i-1)\, a_{kl} \right)$
Termination: $P(x, \pi^*) = \max_k \left( v_k(L)\, a_{k0} \right)$; $\pi_L^* = \arg\max_k \left( v_k(L)\, a_{k0} \right)$
Traceback ($i = L, \dots, 1$): $\pi_{i-1}^* = \mathrm{ptr}_i(\pi_i^*)$
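A minimal log-space sketch of this recursion (names are illustrative; the explicit end state $a_{k0}$ is dropped, so termination simply takes the best final cell):

    import numpy as np

    def viterbi(obs, start, trans, emit):
        K, L = trans.shape[0], len(obs)
        ls, lt, le = np.log(start), np.log(trans), np.log(emit)
        v = np.full((L, K), -np.inf)         # v[i, k]: log prob of best path ending in k at i
        ptr = np.zeros((L, K), dtype=int)    # traceback pointers
        v[0] = ls + le[:, obs[0]]            # initialization
        for i in range(1, L):                # recursion
            for l in range(K):
                scores = v[i - 1] + lt[:, l]
                ptr[i, l] = np.argmax(scores)
                v[i, l] = le[l, obs[i]] + scores[ptr[i, l]]
        path = [int(np.argmax(v[-1]))]       # termination (no end state here)
        for i in range(L - 1, 0, -1):        # traceback
            path.append(int(ptr[i, path[-1]]))
        return path[::-1]

    # Example with the casino model above; obs are die faces minus 1 (0-based).
    rolls = np.array([0, 2, 5, 5, 5, 1, 5, 5, 4, 5])
    print(viterbi(rolls, START, TRANS, EMIT))  # 0 = fair, 1 = loaded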
Casino (II) – Viterbi

[Figure: Viterbi decoding of a sequence of die throws: predicted fair/loaded states versus the true states]
The forward algorithm

The forward algorithm lets us compute the probability P(x) of a sequence w.r.t. an HMM:

$P(x) = \sum_{\pi} P(x, \pi)$

This is important for the computation of posterior probabilities and the comparison of HMMs
The sum over all paths (exponentially many) can be computed by dynamic programming
Let us define $f_k(i)$ as the probability of the sequence up to symbol $x_i$ for the paths that end in state k:

$f_k(i) = P(x_1, \dots, x_i, \pi_i = k)$

Then we can compute this probability recursively as

$f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl}$
The forward algorithm

The forward algorithm grows the total probability dynamically from the beginning to the end of the sequence
Initial condition: sequence in the beginning state
End: all states converge to the end state

Initialization ($i = 0$): $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$
Recursion ($i = 1, \dots, L$): $f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$
End: $P(x) = \sum_k f_k(L)\, a_{k0}$
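A minimal sketch with per-position rescaling (a common way to avoid underflow; the end transitions $a_{k0}$ are again omitted, which amounts to assuming they are uniform):

    import numpy as np

    def forward_logprob(obs, start, trans, emit):
        f = start * emit[:, obs[0]]              # f_k(1)
        logp = 0.0
        for o in obs[1:]:
            c = f.sum()                          # rescale to avoid underflow
            logp += np.log(c)
            f = (f / c) @ trans * emit[:, o]     # f_l(i) = e_l(x_i) sum_k f_k(i-1) a_kl
        return logp + np.log(f.sum())            # log P(x)

    print(forward_logprob(rolls, START, TRANS, EMIT))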
The backward algorithm

The backward algorithm lets us compute the probability of the complete sequence together with the condition that symbol $x_i$ is emitted from state k:

$P(x, \pi_i = k) = P(x_1, \dots, x_i, \pi_i = k)\, P(x_{i+1}, \dots, x_L \mid x_1, \dots, x_i, \pi_i = k) = P(x_1, \dots, x_i, \pi_i = k)\, P(x_{i+1}, \dots, x_L \mid \pi_i = k)$

This is important to compute the probability of a given state at symbol $x_i$
$P(x_1, \dots, x_i, \pi_i = k)$ can be computed by the forward algorithm: it is $f_k(i)$
Let us define $b_k(i)$ as the probability of the rest of the sequence for the paths that pass through state k at symbol $x_i$:

$b_k(i) = P(x_{i+1}, \dots, x_L \mid \pi_i = k)$
The backward algorithm

The backward algorithm grows the probability $b_k(i)$ dynamically backwards (from end to beginning)
Border condition: start in the end state

Initialization ($i = L$): $b_k(L) = a_{k0}$ for all k
Recursion ($i = L-1, \dots, 1$): $b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$
Termination: $P(x) = \sum_l a_{0l}\, e_l(x_1)\, b_l(1)$

Once both forward and backward probabilities are available, we can compute the posterior probability of the state:

$P(\pi_i = k \mid x) = \frac{f_k(i)\, b_k(i)}{P(x)}$
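A compact sketch computing both matrices and the posterior (unscaled probabilities for clarity, so suitable for short sequences only; $b_k(L) = 1$ assumes no modeled end state):

    import numpy as np

    def forward_backward(obs, start, trans, emit):
        K, L = trans.shape[0], len(obs)
        f = np.zeros((L, K))
        b = np.zeros((L, K))
        f[0] = start * emit[:, obs[0]]                       # forward initialization
        for i in range(1, L):
            f[i] = emit[:, obs[i]] * (f[i - 1] @ trans)      # forward recursion
        b[L - 1] = 1.0                                       # backward init (a_k0 = 1 assumed)
        for i in range(L - 2, -1, -1):
            b[i] = trans @ (emit[:, obs[i + 1]] * b[i + 1])  # backward recursion
        px = f[L - 1].sum()                                  # P(x)
        return f, b, f * b / px                              # posterior P(pi_i = k | x)

    f, b, post = forward_backward(rolls, START, TRANS, EMIT)
    print(post[:, 0])  # posterior probability of "fair" at every position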
Posterior decoding

Instead of using the most probable path for decoding (Viterbi), we can use the path of the most probable states:

$\hat{\pi}_i = \arg\max_k P(\pi_i = k \mid x)$

The path $\hat{\pi}$ can be "illegal" ($P(\hat{\pi} \mid x) = 0$)
This approach can also be used when we are interested in a function g(k) of the state (e.g., labeling):

$G(i \mid x) = \sum_k P(\pi_i = k \mid x)\, g(k)$
Casino (III) – posterior decoding
Posterior probability of the state “fair” w.r.t. the die throws
Casino (IV) – posterior decoding

New situation: $P(\pi_{i+1} = \mathrm{FAIR} \mid \pi_i = \mathrm{FAIR}) = 0.99$
Viterbi decoding cannot detect the cheating from 1000 throws, while posterior decoding does

[Figure: modified two-state HMM. Fair state: faces 1-6 each with probability 1/6. Loaded state: faces 1-5 each with probability 1/10, face 6 with probability 1/2. Transitions: fair -> loaded 0.01, loaded -> fair 0.1, fair -> fair 0.99, loaded -> loaded 0.9]
Parameter estimation for HMMs
Choice of the architecture
For the parameter estimation, we assume that the architecture of the HMM is known
The choice of architecture is an essential design decision
Duration modeling
“Silent states” for gaps
Parameter estimation with known paths

HMM with parameters $\theta$ (transition and emission probabilities)
Training set D of N sequences $x^1, \dots, x^N$
The score of the model is the likelihood of the parameters given the training data:

$\mathrm{Score}(\mathcal{D}, \theta) = \log P(x^1, \dots, x^N \mid \theta) = \sum_{j=1}^{N} \log P(x^j \mid \theta)$
Parameter estimation with known paths

If the state paths are known, the parameters are estimated through counts (how often is a transition used, how often is a symbol produced by a given state)
Use 'pseudocounts' if necessary:
$A_{kl}$ = number of transitions from k to l in the training set + pseudocount $r_{kl}$
$E_k(b)$ = number of emissions of b from k in the training set + pseudocount $r_k(b)$

$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}$ and $e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$
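A minimal sketch of these counts (function and argument names are illustrative; K states, M symbols, and seqs/paths are lists of index sequences):

    import numpy as np

    def estimate_known_paths(seqs, paths, K, M, r=1.0):
        A = np.full((K, K), r)                  # transition counts + pseudocount
        E = np.full((K, M), r)                  # emission counts + pseudocount
        for x, pi in zip(seqs, paths):
            for i in range(len(x)):
                E[pi[i], x[i]] += 1
                if i > 0:
                    A[pi[i - 1], pi[i]] += 1
        a = A / A.sum(axis=1, keepdims=True)    # a_kl = A_kl / sum_l' A_kl'
        e = E / E.sum(axis=1, keepdims=True)    # e_k(b) = E_k(b) / sum_b' E_k(b')
        return a, e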
Parameter estimation with unknown paths: Viterbi training

Strategy: iterative method
Suppose that the parameters are known and find the best paths with Viterbi decoding
Use these paths to re-estimate the parameters
Iterate until convergence

Viterbi training does not maximize the likelihood of the parameters
Viterbi training converges exactly in a finite number of steps

$\theta^{\mathrm{Vit}} = \arg\max_{\theta} P(x^1, \dots, x^N \mid \theta, \pi^*(x^1), \dots, \pi^*(x^N))$
Parameter estimation with unknown paths: Baum-Welch training

Strategy: parallel to Viterbi training, but we use the expected values of the transition and emission counts (instead of counts from the best path only)

For the transitions:

$P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \frac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x)}$

$A_{kl} = \sum_j \frac{1}{P(x^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x_{i+1}^j)\, b_l^j(i+1)$

For the emissions:

$E_k(b) = \sum_j \sum_{\{i \mid x_i^j = b\}} P(\pi_i = k \mid x^j, \theta) = \sum_j \frac{1}{P(x^j)} \sum_{\{i \mid x_i^j = b\}} f_k^j(i)\, b_k^j(i)$
Parameter estimation with unknown paths: Baum-Welch training

Initialization: choose arbitrary model parameters
Recursion:
Set all transition and emission variables to their pseudocounts
For all sequences j = 1, ..., N:
Compute $f_k(i)$ for sequence j with the forward algorithm
Compute $b_k(i)$ for sequence j with the backward algorithm
Add the contributions to A and E
Compute the new model parameters $a_{kl} = A_{kl} / \sum_{l'} A_{kl'}$ and $e_k(b) = E_k(b) / \sum_{b'} E_k(b')$
Compute the log-likelihood of the model
End: stop when the change in log-likelihood falls below some threshold or when the maximum number of iterations is exceeded
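One iteration of this recursion as a sketch, reusing the unscaled forward_backward above (so again only suitable for short sequences; r is a pseudocount):

    import numpy as np

    def baum_welch_step(seqs, start, trans, emit, r=0.1):
        K, M = emit.shape
        A = np.full((K, K), r)                  # expected transition counts
        E = np.full((K, M), r)                  # expected emission counts
        for x in seqs:
            f, b, post = forward_backward(x, start, trans, emit)
            px = f[-1].sum()
            for i in range(len(x) - 1):
                # A_kl += f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
                A += np.outer(f[i], emit[:, x[i + 1]] * b[i + 1]) * trans / px
            for i, o in enumerate(x):
                E[:, o] += post[i]              # E_k(b) += P(pi_i = k | x)
        return A / A.sum(1, keepdims=True), E / E.sum(1, keepdims=True)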
Casino (V) – Baum-Welch training

[Figure: original model versus models estimated by Baum-Welch.

Original model:
fair: faces 1-6 each 1/6; loaded: faces 1-5 each 1/10, face 6: 1/2
transitions: fair -> loaded 0.05, loaded -> fair 0.1, fair -> fair 0.95, loaded -> loaded 0.9

Estimated from 300 throws:
fair: 1: 0.19, 2: 0.19, 3: 0.23, 4: 0.08, 5: 0.23, 6: 0.08; loaded: 1: 0.07, 2: 0.10, 3: 0.10, 4: 0.17, 5: 0.05, 6: 0.52
transitions: fair -> loaded 0.27, loaded -> fair 0.29, fair -> fair 0.73, loaded -> loaded 0.71

Estimated from 30000 throws:
fair: 1: 0.17, 2: 0.17, 3: 0.17, 4: 0.17, 5: 0.17, 6: 0.15; loaded: 1: 0.10, 2: 0.11, 3: 0.10, 4: 0.11, 5: 0.10, 6: 0.48
transitions: fair -> loaded 0.07, loaded -> fair 0.12, fair -> fair 0.93, loaded -> loaded 0.88]
Numerical stability

Many expressions contain products of many probabilities
This causes underflow when we compute these expressions
For Viterbi, this can be solved by working with logarithms
For the forward and backward algorithms, we can work with an approximation to the logarithm of a sum or with rescaled variables
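For completeness, the standard log-sum-exp trick used when summing probabilities in log space (a generic sketch, not specific to these slides):

    import numpy as np

    def logsumexp(z):
        # log(sum_k exp(z_k)) without overflow: factor out the maximum
        m = np.max(z)
        return m + np.log(np.sum(np.exp(z - m)))

    # A log-space forward recursion then reads:
    #   lf[i, l] = log_emit[l, obs[i]] + logsumexp(lf[i - 1] + log_trans[:, l])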
Summary

Hidden Markov Models
Computation of sequence and state probabilities
Viterbi computation of the best state path
The forward algorithm for the computation of the probability of a sequence
The backward algorithm for the computation of state probabilities
Parameter estimation for HMMs
Parameter estimation with known paths
Parameter estimation with unknown paths
Viterbi training
Baum-Welch training