Markov chain and Hidden Markov Models
Nasir and Rajab
Dr. Pan, Spring 2014
MARKOV CHAINS:
One key assumption leads a stochastic process to be a Markov chain:
A stochastic process {Xt} is said to have the Markovian property if the state of the system at time t+1 depends only on the state of the system at time t:

P[X_{t+1} = x_{t+1} | X_t = x_t, X_{t-1} = x_{t-1}, . . . , X_1 = x_1, X_0 = x_0] = P[X_{t+1} = x_{t+1} | X_t = x_t]
Stationarity Assumption:
The transition probabilities are independent of t when the process is "stationary". So

p_ij = P[X_{t+1} = j | X_t = i]

This means that if the system is in state i, the probability that the system will next move to state j is p_ij.
Because the p_ij are conditional probabilities, they must be nonnegative, and since the process must make a transition into some state, they must satisfy the properties:
1. p_ij >= 0 for all i, j
2. sum over j of p_ij = 1 for each i
The n-step transition matrix
The n-step transition probabilities P[X_{t+n} = j | X_t = i] are the entries of the n-th power of the one-step transition matrix.
The Markov chains we consider have the following properties:
1. A finite number of states.
2. Stationary transition probabilities.
We also assume that we know the initial probabilities P(X_0 = i) for all i.
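Since the transition probabilities are stationary, the n-step transition matrix can be computed directly as a matrix power. A minimal sketch in Python with NumPy, using a hypothetical two-state chain (the values are illustrative, not from the notes):

```python
import numpy as np

# Hypothetical two-state transition matrix (each row sums to 1)
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# n-step transition matrix: the n-th matrix power of P
P3 = np.linalg.matrix_power(P, 3)

# entry (i, j) of P3 is P[X_{t+3} = j | X_t = i]
print(P3)
```

Each row of a matrix power is still a probability distribution, so the rows of P3 sum to 1.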
- Irreducible Markov chain: A Markov chain is irreducible if the corresponding graph is strongly connected.
- Recurrent and transient states
In the diagram below, A and B are transient states, and C and D are recurrent states: once the process moves from B to D, it will never come back.

[Diagram: states A, B, C, D, E, with transitions leading from the transient states A and B into the recurrent states C and D]
The period of a state
A Markov chain is periodic if all of its states have a period k > 1.
It is aperiodic otherwise.
Ergodic
A Markov chain is ergodic if:
1. the corresponding graph is strongly connected, and
2. it is not periodic.

[Diagram: states A, B, C, D, E illustrating an ergodic chain]
Markov Chain Example
• Based on the weather today, what will it be tomorrow?
• Assuming only four possible weather states
° Sunny
° Cloudy
° Rainy
° Snowing
Markov Chain Structure
• Each state is an observable event
• At each time interval the state changes to another state or stays the same (q_t in {S1, S2, S3, S4})

[Diagram: four states S1 (Sunny), S2 (Cloudy), S3 (Rainy), S4 (Snowing)]
Markov Chain Transition Probabilities
• Transition probability matrix:
Time t + 1
State S1 S2 S3 S4 Total
Time t
S1 a11 a12 a13 a14 1
S2 a21 a22 a23 a24 1
S3 a31 a32 a33 a34 1
S4 a41 a42 a43 a44 1
a_ij = P(q_{t+1} = S_j | q_t = S_i)
Markov Chain Transition Probabilities
• Probabilities for tomorrow’s weather based on today’s weather
Time t + 1
State Sunny Cloudy Rainy Snowing
Time t
Sunny 0.6 0.3 0.1 0.0
Cloudy 0.2 0.4 0.3 0.1
Rainy 0.1 0.2 0.5 0.2
Snowing 0.0 0.3 0.2 0.5
[State diagram: Sunny, Cloudy, Rainy, Snowing, with the transition probabilities from the table above as edge labels]
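The table above can be used directly: tomorrow's weather distribution is the row for today's state, and the distribution n days ahead comes from the n-th matrix power. A short sketch in Python (NumPy) with the table's values:

```python
import numpy as np

states = ["Sunny", "Cloudy", "Rainy", "Snowing"]
# Transition matrix from the table: row = today, column = tomorrow
P = np.array([[0.6, 0.3, 0.1, 0.0],
              [0.2, 0.4, 0.3, 0.1],
              [0.1, 0.2, 0.5, 0.2],
              [0.0, 0.3, 0.2, 0.5]])

today = np.array([0.0, 1.0, 0.0, 0.0])  # suppose it is Cloudy today

tomorrow = today @ P                                # one day ahead
in_two_days = today @ np.linalg.matrix_power(P, 2)  # two days ahead

print(dict(zip(states, tomorrow)))  # the Cloudy row: 0.2, 0.4, 0.3, 0.1
```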
transition probabilities

Pr(X_i = a | X_{i-1} = g) = 0.1
Pr(X_i = c | X_{i-1} = g) = 0.1
Pr(X_i = g | X_{i-1} = g) = 0.4
Pr(X_i = t | X_{i-1} = g) = 0.1
Markov Chain Models
A Markov Chain Model for DNA

[Diagram: a begin state with transitions among the four states A (Adenine), C (Cytosine), G (Guanine), T (Thymine)]
The Probability of a Sequence for a Given Markov Chain Model

[Diagram: begin and end states around the four nucleotide states A, C, G, T]

Pr(cggt) = Pr(c | begin) Pr(g | c) Pr(g | g) Pr(t | g) Pr(end | t)
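This factorization is straightforward to code. A sketch in Python: the transition values are taken from the CpG matrix given later in these notes, while the uniform begin-state distribution is an assumption made here for illustration, and the end state is omitted:

```python
# Probability of a DNA sequence under a first-order Markov chain.
# Transition values: the CpG-island matrix from these notes.
trans = {
    "a": {"a": 0.18, "c": 0.27, "g": 0.43, "t": 0.12},
    "c": {"a": 0.17, "c": 0.37, "g": 0.27, "t": 0.19},
    "g": {"a": 0.16, "c": 0.34, "g": 0.38, "t": 0.12},
    "t": {"a": 0.08, "c": 0.36, "g": 0.38, "t": 0.18},
}
# Begin-state distribution: ASSUMED uniform for illustration.
begin = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}

def sequence_probability(seq):
    # Pr(x) = Pr(x1 | begin) * product over i of Pr(x_i | x_{i-1})
    p = begin[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[prev][cur]
    return p

print(sequence_probability("cggt"))  # 0.25 * 0.27 * 0.38 * 0.12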
Markov Chain Notation
• The transition parameters can be denoted by a_{x_{i-1} x_i}, where

a_{x_{i-1} x_i} = Pr(X_i = x_i | X_{i-1} = x_{i-1})

• Similarly, we can denote the probability of a sequence x as

Pr(x) = a_{B x_1} * product for i = 2 to M of a_{x_{i-1} x_i}

where a_{B x_1} represents the transition from the begin state.
• This gives a probability distribution over sequences of length M.
Estimating the Model Parameters
Given some data (e.g. a set of sequences from CpG islands), how can we determine the probability parameters of our model?
* One approach: maximum likelihood estimation
* Another approach: Bayesian estimation

The "p" in CpG indicates that the C and the G are next to each other in the sequence, regardless of whether it is single- or double-stranded. In a CpG site, both C and G are found on the same strand of DNA or RNA and are connected by a phosphodiester bond, a covalent bond between atoms.
Maximum Likelihood Estimation
• Let's use a very simple sequence model:
  - every position is independent of the others
  - every position is generated from the same multinomial distribution
• We want to estimate the parameters Pr(a), Pr(c), Pr(g), Pr(t), and we're given the sequence

accgcgcttagcttagtgactagccgttac

• Then the maximum likelihood estimates are the observed frequencies of the bases:

Pr(a) = 6/30 = 0.2
Pr(c) = 9/30 = 0.3
Pr(g) = 7/30 ≈ 0.233
Pr(t) = 8/30 ≈ 0.267

In general, Pr(a) = n_a / (sum over i of n_i), where n_a is the number of occurrences of base a.
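These estimates can be checked with a few lines of Python by counting base frequencies in the given sequence:

```python
from collections import Counter

# Maximum likelihood estimates: observed base frequencies
# in the example sequence from the notes.
seq = "accgcgcttagcttagtgactagccgttac"
counts = Counter(seq)
total = len(seq)

mle = {base: counts[base] / total for base in "acgt"}
print(mle)  # a: 6/30, c: 9/30, g: 7/30, t: 8/30
```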
Maximum Likelihood Estimation
• Suppose instead we saw the following sequence:

gccgcgcttggcttggtggctggccgttgc

• Then the maximum likelihood estimates are:

Pr(a) = 0/30 = 0
Pr(c) = 9/30 = 0.3
Pr(g) = 13/30 ≈ 0.433
Pr(t) = 8/30 ≈ 0.267

Since Pr(a) = 0, the model assigns probability zero to any sequence containing an a, which motivates the Bayesian approach below.
A Bayesian Approach
• A more general form: m-estimates

Pr(a) = (n_a + m p_a) / (sum over i of n_i + m)

where m is the number of "virtual" instances and p_a is the prior probability of a.

• With m = 8 and uniform priors, and the sequences

gccgcgcttg
gcttggtggc
tggccgttgc

Pr(c) = (9 + 8 × 0.25) / (30 + 8) = 11/38 ≈ 0.289
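The m-estimate can be sketched in Python; with m = 8 and a uniform prior (p_a = 0.25), the virtual instances smooth away the zero counts:

```python
from collections import Counter

# m-estimate: Pr(a) = (n_a + m * p_a) / (N + m), with m "virtual"
# instances distributed according to the prior p_a.
seqs = ["gccgcgcttg", "gcttggtggc", "tggccgttgc"]
counts = Counter("".join(seqs))
total = sum(counts.values())

m = 8
prior = {base: 0.25 for base in "acgt"}  # uniform prior

est = {base: (counts[base] + m * prior[base]) / (total + m)
       for base in "acgt"}
print(est["c"])  # (9 + 2) / 38
```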
Estimation for 1st Order Probabilities
To estimate a 1st order parameter, such as Pr(c|g), we count the number of times that c follows the history g in our given sequences.

Using Laplace estimates (one pseudocount per successor) with the sequences

gccgcgcttg
gcttggtggc
tggccgttgc

Pr(a | g) = (0 + 1) / (12 + 4) = 1/16
Pr(c | g) = (7 + 1) / (12 + 4) = 8/16
Pr(g | g) = (3 + 1) / (12 + 4) = 4/16
Pr(t | g) = (2 + 1) / (12 + 4) = 3/16

Pr(a | c) = (0 + 1) / (7 + 4) = 1/11
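These Laplace estimates can be reproduced by counting bigrams within each sequence separately, a sketch in Python:

```python
from collections import Counter

# Laplace (add-one) estimates for first-order transition
# probabilities, counting bigrams within each sequence.
seqs = ["gccgcgcttg", "gcttggtggc", "tggccgttgc"]

bigrams = Counter()
for s in seqs:
    for prev, cur in zip(s, s[1:]):
        bigrams[prev, cur] += 1

def laplace(cur, prev, alphabet="acgt"):
    # Pr(cur | prev) = (n_{prev,cur} + 1) / (n_{prev,*} + |alphabet|)
    n_prev = sum(bigrams[prev, b] for b in alphabet)
    return (bigrams[prev, cur] + 1) / (n_prev + len(alphabet))

print(laplace("c", "g"))  # (7 + 1) / (12 + 4) = 0.5
```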
Example Application: Markov Chains for Discrimination
• Suppose we want to distinguish CpG islands from other sequence
regions
• Given sequences from CpG islands and sequences from other regions, we can construct:
  • a model to represent CpG islands, and
  • a null model to represent the other regions.
Markov Chains for Discrimination
+ a c g t
a .18 .27 .43 .12
c .17 .37 .27 .19
g .16 .34 .38 .12
t .08 .36 .38 .18
- a c g t
a .30 .21 .28 .21
c .32 .30 .08 .30
g .25 .24 .30 .21
t .18 .24 .29 .29
• Parameters estimated from human sequences containing 48 CpG islands, about 60,000 nucleotides in total. Each entry is Pr(next | current); for example, row a, column c gives Pr(c | a). The + matrix is the CpG model and the − matrix is the null model.
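A standard way to use the two models for discrimination (a common technique sketched here as an addition; the slide does not spell it out) is the log-odds score log(Pr(x | CpG) / Pr(x | null)), summed over the transitions of a sequence x:

```python
import math

# Log-odds discrimination between the CpG (+) and null (-) models:
# score(x) = sum over transitions of log( a+_{prev,cur} / a-_{prev,cur} )
# Begin/end states are ignored in this sketch.
cpg = {
    "a": {"a": 0.18, "c": 0.27, "g": 0.43, "t": 0.12},
    "c": {"a": 0.17, "c": 0.37, "g": 0.27, "t": 0.19},
    "g": {"a": 0.16, "c": 0.34, "g": 0.38, "t": 0.12},
    "t": {"a": 0.08, "c": 0.36, "g": 0.38, "t": 0.18},
}
null = {
    "a": {"a": 0.30, "c": 0.21, "g": 0.28, "t": 0.21},
    "c": {"a": 0.32, "c": 0.30, "g": 0.08, "t": 0.30},
    "g": {"a": 0.25, "c": 0.24, "g": 0.30, "t": 0.21},
    "t": {"a": 0.18, "c": 0.24, "g": 0.29, "t": 0.29},
}

def log_odds(seq):
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        score += math.log(cpg[prev][cur] / null[prev][cur])
    return score

# c-g transitions are far more likely under the + model
print(log_odds("cgcg"))  # positive: looks like a CpG island
print(log_odds("atat"))  # negative: looks like background
```

A positive score favors the CpG model; a negative score favors the null model.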
Hidden Markov Models (HMM)
"A doubly stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols."
Rabiner & Juang, 1986
Difference between Markov chains and HMMs
• In a Markov chain, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.
HMM Example
• Suppose we want to determine the average annual temperature at a
particular location on earth over a series of years.
• We consider two annual temperature states: "hot" (H) and "cold" (C).
(rows = state at time t, columns = state at time t + 1)

State H C
H 0.7 0.3
C 0.4 0.6
The state is the average annual temperature. The transition from one state to the next is a Markov process. Since we can't observe the state (the temperature) in the past, we instead observe the size of tree rings.

Now suppose that current research indicates a correlation between the size of tree growth rings and temperature. We consider only three different tree ring sizes: small (S), medium (M) and large (L).
The probabilistic relationship between annual temperature (state, rows) and tree ring size (observation, columns) is:

State S M L
H 0.1 0.4 0.5
C 0.7 0.2 0.1
In this example (following Stamp's paper, cited in the references), suppose that the initial state distribution is π = (0.6, 0.4) for (H, C), and that the observed tree ring sequence is (S, M, S, L).
The probability of a state sequence X, for example X = HHCC, is then given by:

P(HHCC) = 0.6 (0.1)(0.7)(0.4)(0.3)(0.7)(0.6)(0.1) = 0.000212
State probability Normalized
HHHH 0.000412 0.042787
HHHC 0.000035 0.003635
HHCH 0.000706 0.073320
HHCC 0.000212 0.022017
HCHH 0.000050 0.005193
HCHC 0.000004 0.000415
HCCH 0.000302 0.031364
HCCC 0.000091 0.009451
CHHH 0.001098 0.114031
CHHC 0.000094 0.009762
CHCH 0.001882 0.195451
CHCC 0.000564 0.058573
CCHH 0.000470 0.048811
CCHC 0.000040 0.004154
CCCH 0.002822 0.293073
CCCC 0.000847 0.087963
Table 1: State sequence probabilities
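Table 1 can be reproduced by brute-force enumeration of all 16 state sequences. The sketch below assumes the initial distribution π = (0.6, 0.4) and observation sequence (S, M, S, L) from Stamp's paper (cited in the references), since these values do not appear explicitly on the slide:

```python
from itertools import product

# Brute-force enumeration of state sequence probabilities for the
# temperature / tree-ring HMM.
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}  # transitions
B = {"H": {"S": 0.1, "M": 0.4, "L": 0.5},
     "C": {"S": 0.7, "M": 0.2, "L": 0.1}}                   # emissions
pi = {"H": 0.6, "C": 0.4}   # assumed initial distribution
obs = ["S", "M", "S", "L"]  # assumed observation sequence

probs = {}
for states in product("HC", repeat=4):
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, 4):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    probs["".join(states)] = p

best = max(probs, key=probs.get)
print(best, probs[best])        # CCCH is the most probable sequence
print(round(probs["HHCC"], 6))  # 0.000212, as in the text
```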
To find the optimal state sequence in the dynamic programming (DP) sense, we simply choose the sequence with the highest probability, namely CCCH.
t 0 1 2 3
P(H) 0.188182 0.519576 0.228788 0.804029
P(C) 0.811818 0.480424 0.771212 0.195971
Table 2: HMM probabilities
From Table 2 we find that the optimal sequence in the HMM sense is CHCH. Thus the optimal DP sequence differs from the optimal HMM sequence, although here all of its state transitions are valid.
Note: the DP solution and the HMM solution are not necessarily the same. For example, the DP solution must have valid state transitions, while this is not necessarily the case for the HMM solution.
R code for the CpG matrix:

library(markovchain)
DNAStates <- c("A", "C", "G", "T")
byRow <- TRUE
DNAMatrix <- matrix(data = c(0.18, 0.27, 0.43, 0.12,
                             0.17, 0.37, 0.27, 0.19,
                             0.16, 0.34, 0.38, 0.12,
                             0.08, 0.36, 0.38, 0.18),
                    byrow = byRow, nrow = 4,
                    dimnames = list(DNAStates, DNAStates))
mcDNA <- new("markovchain", states = DNAStates, byrow = byRow,
             transitionMatrix = DNAMatrix, name = "CpG")
plot(mcDNA)

R code for the null matrix:

NullMatrix <- matrix(data = c(0.30, 0.21, 0.28, 0.21,
                              0.32, 0.30, 0.08, 0.30,
                              0.25, 0.24, 0.30, 0.21,
                              0.18, 0.24, 0.29, 0.29),
                     byrow = byRow, nrow = 4,
                     dimnames = list(DNAStates, DNAStates))
mcDNAnull <- new("markovchain", states = DNAStates, byrow = byRow,
                 transitionMatrix = NullMatrix, name = "null")
plot(mcDNAnull)
References:
http://www.scs.leeds.ac.uk/scs-only/teaching-materials/HiddenMarkovModels/html_dev/main.html
Mark Stamp, "A Revealing Introduction to Hidden Markov Models," Department of Computer Science, San Jose State University, September 28, 2012.