Hidden Markov Models
An Introduction
Markov Chains
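[Figure: state diagram of a Markov chain with nucleotide states A, C, G, T, a begin state B and an end state E]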
Markov Chains
We want a model that generates sequences in which the probability of a symbol depends on the previous symbol only.
Transition probabilities:
$$a_{st} = P(x_i = t \mid x_{i-1} = s)$$
Probability of a sequence:
$$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$$
Note:
$$P(X, Y) = P(X \mid Y)\, P(Y)$$
Markov Chains
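The key property of a Markov chain is that the probability of each symbol $x_i$ depends only on the value of the preceding symbol $x_{i-1}$:
$$P(x) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$$
Modelling the beginning and end of sequences with a begin state B and an end state E:
$$P(x_1 = s) = a_{Bs} \quad \text{and} \quad P(E \mid x_L = t) = a_{tE}$$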
Markov Chains
Markov chains can be used to discriminate between two options by calculating a likelihood ratio.
Example: CpG islands in human DNA
Regions labelled as CpG islands: + model
Regions labelled as non-CpG islands: - model
Maximum likelihood estimators for the transition probabilities of each model:
$$a^{+}_{st} = \frac{c^{+}_{st}}{\sum_{t'} c^{+}_{st'}}$$
and analogously for the - model. Here $c^{+}_{st}$ is the number of times letter t followed letter s in the labelled regions.
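The counting estimator above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the published tables: the function name `estimate_transitions` and the toy regions are hypothetical, and a letter that never occurs as a predecessor simply gets an all-zero row.

```python
from collections import Counter

def estimate_transitions(regions, alphabet="ACGT"):
    """Return a[s][t] = c_st / sum_t' c_st' from a list of labelled regions."""
    counts = Counter()
    for seq in regions:
        # zip(seq, seq[1:]) yields every adjacent pair (s, t) in the region
        counts.update(zip(seq, seq[1:]))
    a = {}
    for s in alphabet:
        row_total = sum(counts[(s, t)] for t in alphabet)
        # A letter that never occurs as a predecessor gets an all-zero row
        a[s] = {t: (counts[(s, t)] / row_total if row_total else 0.0)
                for t in alphabet}
    return a

# Hypothetical toy regions, for illustration only
a_plus = estimate_transitions(["ACGCG", "CGG"])
```

Running the same function on the regions labelled as non-islands would give the - model.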
Markov Chains
From 48 putative CpG islands in human DNA, one estimates the following transition probabilities (rows: preceding letter s, columns: following letter t)
Note that the tables are asymmetric
+ A C G T
A 0.180 0.274 0.426 0.120
C 0.171 0.368 0.274 0.188
G 0.161 0.339 0.375 0.125
T 0.079 0.355 0.384 0.182
- A C G T
A 0.300 0.205 0.285 0.210
C 0.322 0.298 0.078 0.302
G 0.248 0.246 0.298 0.208
T 0.177 0.239 0.292 0.292
Markov Chains
To use the model for discrimination one calculates the log-odds ratio
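$$S(x) = \log \frac{P(x \mid \text{model } +)}{P(x \mid \text{model } -)} = \sum_{i=1}^{L} \log \frac{a^{+}_{x_{i-1} x_i}}{a^{-}_{x_{i-1} x_i}} = \sum_{i=1}^{L} \beta_{x_{i-1} x_i}$$
β (bits) A C G T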
A -0.740 0.419 0.580 -0.803
C -0.913 0.302 1.812 -0.685
G -0.624 0.461 0.331 -0.730
T -1.169 0.573 0.393 -0.679
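As a rough check, the log-odds table can be recomputed from the two transition tables given earlier. The sketch below uses base-2 logarithms (bits). The names `PLUS`, `MINUS`, `BETA` and `score` are illustrative, and the begin-state term of the sum is left out, on the assumption that both models share the same begin distribution so that it contributes zero.

```python
import math

# Transition tables copied from the slides: row = preceding letter, column = next letter
PLUS = {
    "A": {"A": 0.180, "C": 0.274, "G": 0.426, "T": 0.120},
    "C": {"A": 0.171, "C": 0.368, "G": 0.274, "T": 0.188},
    "G": {"A": 0.161, "C": 0.339, "G": 0.375, "T": 0.125},
    "T": {"A": 0.079, "C": 0.355, "G": 0.384, "T": 0.182},
}
MINUS = {
    "A": {"A": 0.300, "C": 0.205, "G": 0.285, "T": 0.210},
    "C": {"A": 0.322, "C": 0.298, "G": 0.078, "T": 0.302},
    "G": {"A": 0.248, "C": 0.246, "G": 0.298, "T": 0.208},
    "T": {"A": 0.177, "C": 0.239, "G": 0.292, "T": 0.292},
}

# beta_st = log2(a+_st / a-_st)
BETA = {s: {t: math.log2(PLUS[s][t] / MINUS[s][t]) for t in "ACGT"}
        for s in "ACGT"}

def score(x):
    """Log-odds score S(x) in bits, summed over adjacent letter pairs."""
    return sum(BETA[s][t] for s, t in zip(x, x[1:]))
```

A positive `score(x)` favours the + model (CpG island), a negative one the - model.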
Hidden Markov Models
How can one find CpG islands in a long chain of nucleotides?
Merge both models into one model with small transition probabilities between the chains.
Within each chain the transition probabilities should remain close to the original ones.
Relabelling of the states: the states A+, C+, G+, T+ emit the symbols A, C, G, T, and so do the states A-, C-, G-, T-.
The relabelling is critical, as there is no one-to-one correspondence between the states and the symbols. From looking at a C in isolation one cannot tell whether it was emitted from C+ or C-.
Hidden Markov Models
Formal definitions
Distinguish the sequence of states from the sequence of symbols.
Call the state sequence the path $\pi$. It follows a simple Markov model with transition probabilities
$$a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$$
As the symbols b are decoupled from the states k, new parameters are needed that give the probability that symbol b is seen when in state k. These are known as emission probabilities:
$$e_k(b) = P(x_i = b \mid \pi_i = k)$$
Hidden Markov Models
The Viterbi Algorithm
It is the most common decoding algorithm for HMMs.
It is a dynamic programming algorithm.
There may be many state sequences that give rise to any particular sequence of symbols, but the corresponding probabilities are very different.
CpG islands: the state sequences
(C+, G+, C+, G+) (C-, G-, C-, G-) (C+, G-, C+, G-)
all generate the symbol sequence
CGCG
but the first has the highest probability.
Hidden Markov Models
Search recursively for the most probable path
$$\pi^{*} = \arg\max_{\pi} P(x, \pi)$$
Suppose the probability $v_k(i)$ of the most probable path ending in state k with observation $x_i$ is known for all states k.
Then this probability can be calculated for observation $x_{i+1}$ by
$$v_l(i+1) = e_l(x_{i+1}) \max_k \left( v_k(i)\, a_{kl} \right)$$
with initial condition
$$v_0(0) = 1$$
Hidden Markov Models
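Viterbi Algorithm
Initialisation (i = 0):
$$v_0(0) = 1, \quad v_k(0) = 0 \ \text{for } k > 0$$
Recursion (i = 1, ..., L):
$$v_l(i) = e_l(x_i) \max_k \left( v_k(i-1)\, a_{kl} \right)$$
$$\mathrm{ptr}_i(l) = \arg\max_k \left( v_k(i-1)\, a_{kl} \right)$$
Termination:
$$P(x, \pi^{*}) = \max_k \left( v_k(L)\, a_{k0} \right)$$
$$\pi^{*}_L = \arg\max_k \left( v_k(L)\, a_{k0} \right)$$
Traceback (i = L, ..., 1):
$$\pi^{*}_{i-1} = \mathrm{ptr}_i(\pi^{*}_i)$$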
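The four steps of the Viterbi algorithm can be sketched in Python as follows. To keep the example small it uses a hypothetical two-state model ("+" for island, "-" for background) rather than the eight-state CpG model. All numbers in `START`, `TRANS`, `END` and `EMIT` are made up for illustration. Probabilities are multiplied directly, which is only safe for short sequences; longer ones would need log space.

```python
STATES = ["+", "-"]
# Hypothetical parameters; for each state k, sum_l a_kl + a_k0 = 1
START = {"+": 0.5, "-": 0.5}                                     # a_0k
TRANS = {"+": {"+": 0.8, "-": 0.1}, "-": {"+": 0.1, "-": 0.8}}   # a_kl
END = {"+": 0.1, "-": 0.1}                                       # a_k0
EMIT = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},           # e_k(b)
        "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def viterbi(x):
    """Return (most probable path, P(x, path)) for a non-empty sequence x."""
    # Initialisation merged with the first recursion step: v_k(1) = a_0k e_k(x_1)
    v = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    ptr = []
    # Recursion: v_l(i) = e_l(x_i) max_k v_k(i-1) a_kl, remembering the argmax
    for sym in x[1:]:
        prev, col, back = v[-1], {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: prev[k] * TRANS[k][l])
            col[l] = EMIT[l][sym] * prev[best] * TRANS[best][l]
            back[l] = best
        v.append(col)
        ptr.append(back)
    # Termination: include the transition into the end state
    last = max(STATES, key=lambda k: v[-1][k] * END[k])
    prob = v[-1][last] * END[last]
    # Traceback from position L down to 1
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    path.reverse()
    return path, prob

path, prob = viterbi("CGCG")
```

With these toy parameters a CG-rich sequence decodes to the "+" state throughout, and an AT-rich one to the "-" state.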
Hidden Markov Models
CpG Islands and CGCG sequence
v_l(i)  (i=0)  C     G      C       G
B       1      0     0      0       0
A+      0      0     0      0       0
C+      0      0.13  0      0.012   0
G+      0      0     0.034  0       0.0032
T+      0      0     0      0       0
A-      0      0     0      0       0
C-      0      0.13  0      0.0026  0
G-      0      0     0.010  0       0.00021
T-      0      0     0      0       0
Hidden Markov Models
The Forward Algorithm
As many different paths $\pi$ can give rise to the same sequence, the probability of a sequence P(x) is
$$P(x) = \sum_{\pi} P(x, \pi)$$
Brute-force enumeration is not practical, as the number of paths rises exponentially with the length of the sequence.
A simple approximation is to evaluate
$$P(x, \pi^{*}) = a_{0 \pi^{*}_1} \prod_{i=1}^{L} e_{\pi^{*}_i}(x_i)\, a_{\pi^{*}_i \pi^{*}_{i+1}}$$
at the most probable path only, with $\pi^{*}_{L+1} = 0$ denoting the end state.
Hidden Markov Models
The full probability P(x) can be calculated recursively with dynamic programming. This is called the forward algorithm.
Calculate the probability $f_k(i)$ of the observed sequence up to and including $x_i$ under the constraint that $\pi_i = k$:
$$f_k(i) = P(x_1 \ldots x_i, \pi_i = k)$$
The recursion equation is
$$f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl}$$
Hidden Markov Model
Forward Algorithm
Initialization (i = 0):
$$f_0(0) = 1, \quad f_k(0) = 0 \ \text{for } k > 0$$
Recursion (i = 1, ..., L):
$$f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$$
Termination:
$$P(x) = \sum_k f_k(L)\, a_{k0}$$
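A minimal Python sketch of the forward recursion follows, together with a brute-force sum over all paths that is only feasible for very short sequences. The two-state model and all its numbers are hypothetical, chosen only so that the two computations can be compared. No scaling or log transform is applied, so this is not suitable for long sequences as written.

```python
from itertools import product

STATES = ["+", "-"]
# Hypothetical parameters; for each state k, sum_l a_kl + a_k0 = 1
START = {"+": 0.5, "-": 0.5}                                     # a_0k
TRANS = {"+": {"+": 0.8, "-": 0.1}, "-": {"+": 0.1, "-": 0.8}}   # a_kl
END = {"+": 0.1, "-": 0.1}                                       # a_k0
EMIT = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},           # e_k(b)
        "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def forward(x):
    """P(x) by the forward recursion; work grows linearly with len(x)."""
    # f_k(1) = a_0k e_k(x_1)
    f = {k: START[k] * EMIT[k][x[0]] for k in STATES}
    # f_l(i) = e_l(x_i) sum_k f_k(i-1) a_kl
    for sym in x[1:]:
        f = {l: EMIT[l][sym] * sum(f[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    # Termination: P(x) = sum_k f_k(L) a_k0
    return sum(f[k] * END[k] for k in STATES)

def brute_force(x):
    """P(x) by summing P(x, path) over all len(STATES)**len(x) paths."""
    total = 0.0
    for path in product(STATES, repeat=len(x)):
        p = START[path[0]]
        for i, (k, sym) in enumerate(zip(path, x)):
            p *= EMIT[k][sym]
            p *= TRANS[k][path[i + 1]] if i + 1 < len(x) else END[k]
        total += p
    return total
```

Both functions should agree up to rounding, but `brute_force` visits every path, so its cost grows exponentially with the sequence length.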
Hidden Markov Model
The Backward Algorithm
What is the most probable state for an observation $x_i$?
What is the probability $P(\pi_i = k \mid x)$ that observation $x_i$ came from state k, given the observed sequence? This is the posterior probability of state k at time i when the emitted sequence is known.
First calculate the probability of producing the entire observed sequence with the i-th symbol being produced by state k:
$$\begin{aligned} P(x, \pi_i = k) &= P(x_1 \ldots x_i, \pi_i = k)\, P(x_{i+1} \ldots x_L \mid x_1 \ldots x_i, \pi_i = k) \\ &= P(x_1 \ldots x_i, \pi_i = k)\, P(x_{i+1} \ldots x_L \mid \pi_i = k) \\ &= f_k(i)\, b_k(i) \end{aligned}$$
Hidden Markov Model
The Backward Algorithm
Initialisation (i = L):
$$b_k(L) = a_{k0} \ \text{for all } k$$
Recursion (i = L-1, ..., 1):
$$b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$$
Termination:
$$P(x) = \sum_l a_{0l}\, e_l(x_1)\, b_l(1)$$
Hidden Markov Models
Posterior Probabilities
From the forward and backward values the posterior probabilities can be obtained as
$$P(\pi_i = k \mid x) = \frac{f_k(i)\, b_k(i)}{P(x)}$$
where P(x) is the result of the forward algorithm.
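The posterior formula can be illustrated by combining a forward table and a backward table. The sketch below again assumes a hypothetical two-state model with made-up parameters, and the function names are illustrative. It works in plain probabilities without scaling, so it is meant for short sequences only.

```python
STATES = ["+", "-"]
# Hypothetical parameters; for each state k, sum_l a_kl + a_k0 = 1
START = {"+": 0.5, "-": 0.5}                                     # a_0k
TRANS = {"+": {"+": 0.8, "-": 0.1}, "-": {"+": 0.1, "-": 0.8}}   # a_kl
END = {"+": 0.1, "-": 0.1}                                       # a_k0
EMIT = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},           # e_k(b)
        "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def forward_table(x):
    """f[i][k] holds f_k(i+1) (list index i is 0-based)."""
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: EMIT[l][sym] * sum(prev[k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return f

def backward_table(x):
    """b[i][k] holds b_k(i+1) (list index i is 0-based)."""
    b = [{k: END[k] for k in STATES}]            # b_k(L) = a_k0
    for sym in reversed(x[1:]):                  # sym plays the role of x_{i+1}
        nxt = b[0]
        b.insert(0, {k: sum(TRANS[k][l] * EMIT[l][sym] * nxt[l] for l in STATES)
                     for k in STATES})
    return b

def posterior(x):
    """List of dicts with P(pi_i = k | x) for every position i."""
    f, b = forward_table(x), backward_table(x)
    px = sum(f[-1][k] * END[k] for k in STATES)  # forward termination
    return [{k: f[i][k] * b[i][k] / px for k in STATES} for i in range(len(x))]

post = posterior("CG")
```

At every position the posterior probabilities over the states sum to one, which is a convenient sanity check for an implementation.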
Hidden Markov Model
Parameter Estimation for HMMs
Two problems remain:
1) how to choose an appropriate model architecture
2) how to assign the transition and emission probabilities
Assumption: independent training sequences $x^1, \ldots, x^n$ are given.
Consider the log likelihood
$$l(x^1, \ldots, x^n \mid \theta) = \log P(x^1, \ldots, x^n \mid \theta) = \sum_{j=1}^{n} \log P(x^j \mid \theta)$$
where $\theta$ represents the set of values of all parameters ($a_{kl}$, $e_k(b)$).
Hidden Markov Models
Estimation with known state sequence
Assume the paths are known for all training sequences.
Count the numbers $A_{kl}$ and $E_k(b)$ of times each particular transition or emission is used in the set of training sequences, plus pseudocounts $r_{kl}$ and $r_k(b)$, respectively.
The maximum likelihood estimators for $a_{kl}$ and $e_k(b)$ are then given by
$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}} \quad \text{and} \quad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$
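When the paths are known, the estimators reduce to counting and normalising. A small sketch, assuming one uniform pseudocount for every transition and emission and ignoring transitions into and out of the begin/end state. The function name `estimate_hmm` and the labelled toy sequence are hypothetical.

```python
def estimate_hmm(labelled, states, symbols, pseudo=1.0):
    """Estimate a_kl and e_k(b) from (sequence, path) pairs of equal length."""
    # Start every count at the pseudocount (r_kl and r_k(b) in the text)
    A = {k: {l: pseudo for l in states} for k in states}
    E = {k: {b: pseudo for b in symbols} for k in states}
    for x, path in labelled:
        for i, (sym, k) in enumerate(zip(x, path)):
            E[k][sym] += 1                  # emission of sym in state k
            if i + 1 < len(path):
                A[k][path[i + 1]] += 1      # transition k -> next state
    # Normalise each row so that it sums to one
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in symbols} for k in states}
    return a, e

# Hypothetical labelled training sequence: symbols "CGA" with path "+", "+", "-"
a_hat, e_hat = estimate_hmm([("CGA", "++-")], "+-", "ACGT")
```

The pseudocounts keep transitions or emissions that never occur in the training set from being assigned probability zero.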
Hidden Markov Models
Estimation with unknown paths
Iterative procedures must be used to estimate the parameters.
All standard algorithms for the optimization of continuous functions can be used.
One particular iteration method is used as standard: the Baum-Welch algorithm
-- first, estimate the $A_{kl}$ and $E_k(b)$ by considering probable paths for the training sequences using the current values of the $a_{kl}$ and $e_k(b)$
-- second, use the maximum likelihood estimators to obtain new transition and emission parameters
-- iterate this process until a stopping criterion is met
-- many local maxima exist, particularly with large HMMs
Hidden Markov Models
Baum-Welch Algorithm
It calculates the $A_{kl}$ and $E_k(b)$ as the expected number of times each transition or emission is used in the training sequences.
It uses the values of the forward and backward algorithms.
The probability that $a_{kl}$ is used at position i in sequence x is
$$P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \frac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x)}$$
Hidden Markov Models
Baum-Welch Algorithm
The expected number of times $a_{kl}$ is used can then be derived by summing over all positions and over all training sequences:
$$A_{kl} = \sum_j \frac{1}{P(x^j)} \sum_i f^j_k(i)\, a_{kl}\, e_l(x^j_{i+1})\, b^j_l(i+1)$$
The expected number of times that letter b appears in state k is given by
$$E_k(b) = \sum_j \frac{1}{P(x^j)} \sum_{\{i \mid x^j_i = b\}} f^j_k(i)\, b^j_k(i)$$
Hidden Markov Models
Baum-Welch Algorithm
Initialisation: pick arbitrary model parameters.
Recurrence: set all A and E variables to their pseudocount values r or to zero.
For each sequence j = 1, ..., n:
-- calculate $f_k(i)$ for sequence j using the forward algorithm
-- calculate $b_k(i)$ for sequence j using the backward algorithm
-- add the contribution of sequence j to A and E
After all sequences have been processed:
-- calculate the new model parameters with the maximum likelihood estimators
-- calculate the new log likelihood of the model
Termination: stop if the change in log likelihood is less than a threshold.
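The iteration can be sketched compactly in Python under some simplifying assumptions that are not part of the slides: a hypothetical two-state model, fixed start probabilities, no explicit end state (so $b_k(L) = 1$ and $P(x) = \sum_k f_k(L)$), no scaling, and pseudocounts defaulting to zero. All names, training sequences and starting values are made up for illustration.

```python
import math

STATES = ["+", "-"]
SYMBOLS = "ACGT"

def forward(x, start, trans, emit):
    f = [{k: start[k] * emit[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: emit[l][sym] * sum(prev[k] * trans[k][l] for k in STATES)
                  for l in STATES})
    return f, sum(f[-1].values())       # table and P(x) (no end state)

def backward(x, trans, emit):
    b = [{k: 1.0 for k in STATES}]      # b_k(L) = 1 without an end state
    for sym in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * nxt[l] for l in STATES)
                     for k in STATES})
    return b

def baum_welch_step(seqs, start, trans, emit, pseudo=0.0):
    """One re-estimation; returns new parameters and the OLD log likelihood."""
    A = {k: {l: pseudo for l in STATES} for k in STATES}
    E = {k: {s: pseudo for s in SYMBOLS} for k in STATES}
    loglik = 0.0
    for x in seqs:
        f, px = forward(x, start, trans, emit)
        b = backward(x, trans, emit)
        loglik += math.log(px)
        for i, sym in enumerate(x):
            for k in STATES:
                E[k][sym] += f[i][k] * b[i][k] / px      # expected emissions
                if i + 1 < len(x):
                    for l in STATES:                     # expected transitions
                        A[k][l] += (f[i][k] * trans[k][l]
                                    * emit[l][x[i + 1]] * b[i + 1][l]) / px
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in STATES}
                 for k in STATES}
    new_emit = {k: {s: E[k][s] / sum(E[k].values()) for s in SYMBOLS}
                for k in STATES}
    return new_trans, new_emit, loglik

def baum_welch(seqs, start, trans, emit, tol=1e-6, max_iter=200):
    history = []                        # log likelihood before each update
    for _ in range(max_iter):
        trans, emit, loglik = baum_welch_step(seqs, start, trans, emit)
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return trans, emit, history

# Hypothetical training data and starting parameters
seqs = ["CGCGCGAT", "ATATTACG", "GCGCGC"]
start0 = {"+": 0.5, "-": 0.5}
trans0 = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
emit0 = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
         "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
trans_hat, emit_hat, history = baum_welch(seqs, start0, trans0, emit0)
```

With zero pseudocounts the recorded log likelihoods should not decrease from one iteration to the next. Which local maximum is reached depends on the starting parameters.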
Hidden Markov Models
Baum-Welch Algorithm
The Baum-Welch algorithm is a special case of an Expectation-Maximization (EM) algorithm.
As an alternative, Viterbi training can be used. There the most probable paths are estimated with the Viterbi algorithm and then used in the iterative re-estimation process.
Convergence is guaranteed, as the assignment of the paths is a discrete process.
Unlike Baum-Welch, this procedure does not maximise the true likelihood $P(x^1, \ldots, x^n \mid \theta)$ regarded as a function of the model parameters $\theta$.
It finds the value of $\theta$ that maximizes the contribution to the likelihood $P(x^1, \ldots, x^n \mid \theta, \pi^{*}(x^1), \ldots, \pi^{*}(x^n))$ from the most probable paths for all sequences.