
Page 1:

Introduction to Hidden Markov Modeling (HMM)

Daniel S. Terry

Scott Blanchard and Harel Weinstein labs


Page 2:

HMM is useful for many, many problems.

Speech Recognition and Translation
Weather Modeling
Sequence Alignment
Financial Modeling

Page 3:

Page 4:

So let’s say you’re riding out nuclear war in a bunker…

To keep sane, you want to know what the weather outside is like…

…but all you can observe is if the security guard brings his umbrella.


Page 5:

Probabilistic reasoning

[Figure: observing whether the guard carries an umbrella (the observation, E) to infer the hidden weather state (X), via the conditionals P(Sunny|Umbrella), P(Cloudy|Umbrella), P(Rain|Umbrella), P(Sunny|No Umbrella), P(Cloudy|No Umbrella), P(Rain|No Umbrella).]

P(X|E) = the probability of X given that E is observed.

Page 6:

Probabilistic reasoning in stochastic processes

[Diagram: a chain of hidden states X0 → X1 → X2 → X3 → X4 → … over time, each emitting an observation ("emission") E0, E1, E2, E3, E4.]

This is called a Markov chain.

Page 7:


Assumptions in Markov modeling

Assumption 1: This is a stationary process, specifically a first-order Markov process:

P(Xt | Xt-1, Xt-2, Xt-3, …) = P(Xt | Xt-1)

…in other words, the current state depends only on the previous state. We call this the transition model.

Assumption 2: The current observation depends only on the current state:

P(Et | Xt, Xt-1, Xt-2, …, Et-1, Et-2, Et-3, …) = P(Et | Xt)

…in other words, the observations are conditionally independent of the past given the current state. We call this the observation (or emission) model.

[Diagram: the same chain of hidden states X0…X4 and emissions E0…E4, annotated with the two assumptions.]

Page 8:

The initial and transition probability models: π and A

Initial probabilities, π = P(X0):

    Sunny    0.70
    Cloudy   0.15
    Raining  0.15

Transition probabilities, A = P(Xt | Xt-1):

    Xt-1      P(Xt=Sunny)   P(Xt=Cloudy)   P(Xt=Raining)
    Sunny     0.70          0.25           0.05
    Cloudy    0.33          0.33           0.33
    Raining   0.20          0.60           0.20

These encode prior knowledge about weather trends.

Page 9:

The observation probability model: B

    Xt        P(Et = Umbrella)
    Sunny     0.05
    Cloudy    0.10
    Raining   0.85

This encodes prior knowledge about how likely people are to bring an umbrella under each weather condition.

Page 10:

Together these parameters define a Markov model.

λ = {π, A, B}

π: initial state probabilities
A: state transition probabilities
B: observation distributions

[Diagram: a two-state model with states C and R, initial probabilities πC and πR, transitions aC,C, aC,R, aR,C, aR,R, and observation distributions bC and bR.]
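To make this concrete, here is a minimal sketch in Python (using numpy). The numbers are the π, A, and B values from the previous slides; the function name is ours, not from any HMM library:

```python
import numpy as np

STATES = ["Sunny", "Cloudy", "Raining"]
pi = np.array([0.70, 0.15, 0.15])        # initial state probabilities (slide 8)
A = np.array([[0.70, 0.25, 0.05],        # A[i, j] = P(X_t = j | X_t-1 = i)
              [0.33, 0.33, 0.33],        # note: this row is normalized below,
              [0.20, 0.60, 0.20]])       # since 3 x 0.33 is only 0.99
B = np.array([0.05, 0.10, 0.85])         # P(umbrella | state), from slide 9

def sample_chain(T, seed=0):
    """Draw a hidden weather path and umbrella observations of length T."""
    rng = np.random.default_rng(seed)
    x = rng.choice(len(pi), p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(x)
        obs.append(bool(rng.random() < B[x]))          # True = umbrella seen
        x = rng.choice(len(pi), p=A[x] / A[x].sum())   # next state from row x
    return states, obs

states, obs = sample_chain(10)
print([STATES[s] for s in states], obs)
```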

Page 11:


Predicting state sequences from observations

Observation Sequence (t=1..T)

Predicted Hidden State Sequence

[Diagram: the observation sequence E0…E3 feeds the Markov model λ = {π, A, B}, which yields the predicted hidden state sequence X0…X3 (a Markov chain).]

Page 12:

Finding the optimal state sequence with Viterbi

Given a model λ = {π, A, B} that describes the system, we can determine the optimal state sequence (the idealization) as follows:

[Trellis: states S, C, and R at each of times X0, X1, X2, X3, with arrows for every possible transition between consecutive time points.]

For each state at time t, calculate the probability that Xt is a particular state xi (sunny, raining, etc.), given the observation and the previous state:

P(Xt=xi | Et, Xt-1=xj) ∝ P(Et | Xt=xi) × P(Xt=xi | Xt-1=xj)

Initial condition: P(X0=xi) = πi


Page 13:

Finding the optimal state sequence with Viterbi

[Trellis: the same states S, C, R over times X0…X3, now with probabilities propagated recursively along every transition.]

Repeat these calculations for all possible transitions, recursively. At each point in time we then have an estimate of how likely we are to be in each state, given all possible previous paths. We also keep track of the most likely preceding state at each step. (This complex-looking structure is called a trellis. Can you see why?)


Page 14:

Finding the optimal state sequence with Viterbi

[Trellis: the most likely path traced backward from the most likely end state.]

Find the most likely end state from the computed probabilities. We can then backtrack to find the most likely state sequence. You have seen a similar procedure in sequence alignment.

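Slides 12-14 combine into a short dynamic program. A minimal log-space sketch (Python; it reuses the pi, A, B, STATES, and obs arrays from the weather sketch on slide 10, and all function names are illustrative):

```python
import numpy as np

def viterbi(pi, A, b, obs):
    """Most likely state path given emissions.
    b(x, e) returns the observation probability of emission e in state x."""
    n, T = len(pi), len(obs)
    logd = np.log(pi) + np.log([b(x, obs[0]) for x in range(n)])  # delta at t=0
    back = np.zeros((T, n), dtype=int)                            # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)      # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)         # best predecessor of each state j
        logd = scores.max(axis=0) + np.log([b(x, obs[t]) for x in range(n)])
    path = [int(logd.argmax())]                 # most likely end state...
    for t in range(T - 1, 0, -1):               # ...then backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1]

b = lambda x, e: B[x] if e else 1.0 - B[x]      # umbrella emission model
print([STATES[i] for i in viterbi(pi, A, b, obs)])
```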

Page 15:


Predicting state sequences from observations

Observation Sequence (t=1..T)

Predicted Hidden State Sequence

[Diagram: the observation sequence E0…E3 feeds the Markov model λ = {π, A, B}, which yields the predicted hidden state sequence X0…X3 (a Markov chain).]

Page 16:


A practical example of Markov modeling:

Analysis of single-molecule fluorescence trajectories

[Figure: fluorescence and FRET trajectories vs. time (min).]

…Ok, so I’m bored of talking about the weather.

Page 17:

www.nia.NIH.gov, public domain.

Neurotransmitter release and reuptake are central to neuronal signaling and proper functioning of the brain.

NSS Reuptake

Page 18:

Neurotransmitter:Sodium Symporter (NSS) proteins are the targets of many clinically important drugs.

Drugs of Abuse

Therapeutic Inhibitors

www.nia.NIH.gov, public domain.

NSS Reuptake

Page 19:

[Diagram: an NSS transporter moving neurotransmitter across the membrane, from the extracellular side (high Na+) to the intracellular side (low Na+).]

Key Question: What are the specific conformational changes required for such a mechanism, and how do they mediate transport?

A practical example of Markov modeling: Analysis of single-molecule fluorescence trajectories

Page 20:

[Figure: FRET efficiency (0.0 to 1.0) vs. donor-acceptor distance (nm), with the Förster radius R0 marked.]

Single-molecule FRET: a tool for examining conformational dynamics
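For reference, the curve in this figure follows the standard Förster relation (textbook FRET theory; the slide shows only the plot):

```latex
E_{\mathrm{FRET}} = \frac{1}{1 + (r / R_0)^6}
```

where r is the donor-acceptor distance and R0 is the Förster radius, the distance at which E = 0.5.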

Page 21:

FRET imaging of single molecules can be achieved using a few tricks, including total internal reflection (TIR) excitation.

[Schematic: donor- and acceptor-labeled molecules immobilized on a surface under 532 nm TIR excitation, with example fluorescence and FRET trajectories vs. time (min).]

Page 22:

We want to know:
1) How many distinct states are there?
2) What are their FRET values?
3) What are the rates?
4) What is the most likely state at each point in time?

[Figure: fluorescence, FRET, and conformation trajectories vs. time (sec).]

HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system.

[Diagram: the sequence of hidden states X0, X1, X2 emitting the sequence of observations E0, E1, E2.]

Unlike with the weather, we have to learn the model from the data itself!

Page 23:

Hidden Markov models have three components:

1) Initial state probabilities: π = {πO, πC}

2) Transition probabilities:

A = {ai,j} =  | aO,O  aO,C |
              | aC,O  aC,C |

Together: λ = {π, A, B}

[Diagram: a two-state model with states O and C, initial probabilities πO and πC, transitions aO,O, aO,C, aC,O, aC,C, and observation distributions bO and bC.]

3) Observation probability distribution (OPD): B = {bi(Et)}

bi(Et) = (1 / √(2π σi²)) · exp( −(Et − μi)² / (2σi²) )

[Figure: the Gaussian FRET distribution for state i, centered at μi with standard deviation σi, plotted over FRET ≈ 0.4 to 0.7.]
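In code, the OPD of one state is just a Gaussian density. A minimal sketch (Python; the μ and σ values are made-up examples):

```python
import numpy as np

def b_i(E_t, mu_i, sigma_i):
    """Gaussian observation probability density for state i (slide formula)."""
    return np.exp(-(E_t - mu_i)**2 / (2 * sigma_i**2)) / np.sqrt(2 * np.pi * sigma_i**2)

print(b_i(0.60, mu_i=0.55, sigma_i=0.05))   # density of observing FRET = 0.60
```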

Page 24:

Goal: find the best model to explain the experimental data.

λ̂ = argmaxλ P(λ | E)

In other words, we want to maximize the probability of the model given the data (where λ is the model and E is the observed FRET trajectory).

But we don't know how to calculate P(λ | E)!
Instead, turn it around using Bayes' theorem:

P(λ | E) = P(E | λ) · P(λ) / P(E)

The probability of the data, P(E), is independent of the model choice and will not affect model ranking. If we further assume all models are equally likely a priori, then:

λ̂ = argmaxλ P(λ | E) = argmaxλ P(E | λ)

P(E | λ) is easy to calculate from the observation distributions. Why does X not appear here? Because it has been marginalized out: we have to sum over all possible state sequences!
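To see what that sum costs, here is a deliberately naive sketch (Python, with illustrative names; b(x, e) is the emission probability of e in state x) that computes P(E | λ) by enumerating every path. Its cost grows as n^T, which is exactly what the forward algorithm (slide 28) avoids:

```python
import itertools
import numpy as np

def likelihood_bruteforce(pi, A, b, obs):
    """P(E | lambda) = sum over ALL state paths X of P(E, X | lambda).
    Exponential in the trace length T -- for illustration only."""
    n, T = len(pi), len(obs)
    total = 0.0
    for X in itertools.product(range(n), repeat=T):
        p = pi[X[0]] * b(X[0], obs[0])
        for t in range(1, T):
            p *= A[X[t - 1], X[t]] * b(X[t], obs[t])
        total += p
    return total
```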

Page 25:

[Figure: FRET trajectory (0.0 to 1.0) vs. time (min).]

Segmental k-means (SKM): optimization on the cheap

[Diagram: alternate between state assignment (Viterbi) and parameter re-estimation, starting from an initial model λ0 and iterating to λi.]

• To get B, simply calculate the mean and std for each state from the current assignment.

• To get A, count the number of transitions of each type and normalize.

• To get π, count the number of times each dwell starts with each state xi and normalize.

F. Qin (2004), Biophys J 86: 1488


This works only if the starting model is close to the final model.
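The re-estimation half of the loop is simple bookkeeping. A sketch of one SKM update for Gaussian FRET states (Python; all names are ours, and a single trace is assumed):

```python
import numpy as np

def skm_reestimate(obs, path, n):
    """One SKM re-estimation step from a Viterbi state assignment.
    obs: (T,) FRET values; path: (T,) integer state indices; n: state count."""
    obs = np.asarray(obs, dtype=float)
    path = np.asarray(path, dtype=int)
    mu = np.array([obs[path == i].mean() for i in range(n)])     # B: state means
    sigma = np.array([obs[path == i].std() for i in range(n)])   # B: state stds
    A = np.zeros((n, n))
    for i, j in zip(path[:-1], path[1:]):
        A[i, j] += 1                                             # count transitions
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1.0)           # normalize rows
    # pi: with a single trace this is just the first frame; with many traces,
    # count the starting state of every trace and normalize.
    pi = np.bincount(path[:1], minlength=n).astype(float)
    return pi, A, mu, sigma
```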

Page 26:

Model optimization: expectation maximization (EM).

Expectation: calculate the probability of the data given the current model.

Maximization: adjust the model parameters to better fit the calculated probabilities.

Termination: iterate until the log-likelihood converges (e.g., ΔLL < 10⁻⁴).

For a given state path X, the likelihood factors into initial (π), transition (A), and observation (B) terms:

P(E, X | λ) = P(X0) · ∏ t=1..T P(Xt | Xt-1) · P(Et | Xt)

LL = log P(E, X | λ) = log P(X0) + Σ t=1..T log[ P(Xt | Xt-1) · P(Et | Xt) ]

Restarts: if the likelihood "landscape" is very frustrated, restarting from a random initial model can help escape local maxima.
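Schematically the loop is tiny. A sketch (Python; e_step and m_step are placeholder callables standing in for the forward-backward expectation and the parameter re-estimation on the following slides):

```python
def expectation_maximization(model, obs, e_step, m_step, tol=1e-4, max_iter=200):
    """Iterate E and M steps until the log-likelihood gain falls below tol
    (the slide's delta-LL < 1e-4 criterion)."""
    prev_ll = float("-inf")
    for _ in range(max_iter):
        stats, ll = e_step(model, obs)    # expectation: state probabilities
        model = m_step(stats, obs)        # maximization: refit pi, A, B
        if ll - prev_ll < tol:
            break                         # converged
        prev_ll = ll
    return model, ll
```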

Page 27:

The forward-backward algorithm (Baum-Welch)

[Diagram: a long chain of hidden states X0…X99 with emissions E0…E99, split at time t into the "past" and the "future".]

Calculating the probabilities at a particular point in time (t):

P(Xt | E1..T) = P(Xt | E1..t, Et+1..T) ∝ P(Xt | E1..t) × P(Et+1..T | Xt)

The "past" term is the forward probability (α); the "future" term is the backward probability (β).

We can do this because of Bayes’ rule and conditional independence of observations over time…

We calculate these much like we did with Viterbi…


Page 28:


The forward algorithm

Partial probabilities (α) are calculated recursively as:

αt(j) = P(observation|hidden state is j) × P(all paths to state j at time t)

Initial condition: α0(j) = π(j) · B(j, E0)

Iterate:

αt+1(j) = B(j, Et+1) · Σ i=1..n αt(i) · ai,j

Then the total probability of the sequence is the sum of the final α's.

[Trellis: states O and C at times X0…X3, illustrating the paths summed by the recursion.]
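A direct implementation of the recursion (Python; the per-step rescaling is a standard trick added here because raw α values underflow on long traces, and is not shown on the slide):

```python
import numpy as np

def forward(pi, A, b, obs):
    """Forward algorithm: alpha[t, j] ~ P(E_0..E_t, X_t = j | lambda),
    rescaled at each step; returns alpha and the total log P(E | lambda)."""
    n, T = len(pi), len(obs)
    alpha = np.zeros((T, n))
    scale = np.zeros(T)
    alpha[0] = pi * [b(j, obs[0]) for j in range(n)]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * [b(j, obs[t]) for j in range(n)]
        scale[t] = alpha[t].sum()          # sum over states = P(E_t | E_0..t-1)
        alpha[t] /= scale[t]
    return alpha, np.log(scale).sum()      # log-likelihood of the whole trace
```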

Page 29:

Maximization using forward-backward probabilities

Probability of being in state i at time t:

γt(i) = αt(i) · βt(i) / Σj αt(j) · βt(j)

Probability of transitioning from state i to state j at time t (from the forward-backward algorithm):

ξt(i,j) = αt(i) · ai,j · bj(Et+1) · βt+1(j) / P(E | λ)

The model parameters are then adjusted to maximize the log-likelihood, e.g. âi,j = Σt ξt(i,j) / Σt γt(i).


This is very much like SKM, except that we use explicit probabilities instead of simply counting.
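Given matched α and β tables (β computed by a backward pass analogous to the forward pass above), the two posteriors have closed forms. A minimal sketch (Python; these are the standard Baum-Welch expressions, with illustrative names):

```python
import numpy as np

def posteriors(alpha, beta, A, b, obs):
    """gamma[t, i] = P(X_t = i | E, lambda)
    xi[t, i, j]   = P(X_t = i, X_t+1 = j | E, lambda)
    alpha and beta must come from the same (consistently scaled) pass."""
    T, n = alpha.shape
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)           # normalize per frame
    xi = np.zeros((T - 1, n, n))
    for t in range(T - 1):
        bt = np.array([b(j, obs[t + 1]) for j in range(n)])
        xi[t] = alpha[t][:, None] * A * (bt * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()                            # normalize per step
    return gamma, xi
```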

Page 30:

The problem of bias

• You can always get a better fit using more parameters! But it may not be a good model.

• Bayesian information criterion (BIC):

BIC = −2·LL + k·ln(n) ≈ −2·ln P(E | k)

where k is the number of free parameters, LL is the log-likelihood of the optimal "fit", and n is the number of data points.

• Akaike information criterion (AIC): AIC = 2·k − 2·LL (both criteria are computed in the sketch after this list).

• Maximum evidence methods (vbFRET), etc.
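As one-line functions (Python; LL is the log-likelihood, e.g. as returned by the forward pass sketched earlier):

```python
import numpy as np

def bic(LL, k, n):
    """Bayesian information criterion; lower is better."""
    return -2.0 * LL + k * np.log(n)

def aic(LL, k):
    """Akaike information criterion; lower is better."""
    return 2.0 * k - 2.0 * LL
```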

Page 31:

We want to know:
1) How many distinct states are there?
2) What are their FRET values?
3) What are the rates?
4) What is the most likely state at each point in time?

[Figure: fluorescence, FRET, and conformation trajectories vs. time (sec).]

HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system.

[Diagram: the sequence of hidden states X0, X1, X2 emitting the sequence of observations E0, E1, E2.]

Page 32:

[Figure: FRET trajectory (0.0 to 1.0) vs. time (min), recorded in 2 mM Na+ upon addition of 2 mM Ala.]

Quantifying kinetics is then useful for understanding how outside factors (ligands) influence dynamics.

[Plots: open-state and closed-state dwell times (s), and occupancy (%), as functions of log [Ala] (M).]

Zhao and Terry et al. (2011), Nature 474.
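The bookkeeping behind such dwell-time plots can be sketched as follows (Python, illustrative names; the rate estimate assumes exponentially distributed dwells, an assumption we add here rather than a claim from the slide):

```python
import numpy as np

def dwell_times(path, state, dt):
    """Durations (s) of uninterrupted visits to `state` in an idealized
    state path, given the frame interval dt."""
    dwells, run = [], 0
    for x in path:
        if x == state:
            run += 1
        elif run:
            dwells.append(run * dt)
            run = 0
    if run:
        dwells.append(run * dt)
    return np.array(dwells)

d = dwell_times([0, 0, 1, 1, 1, 0, 1, 1], state=1, dt=0.1)
print(d, "mean dwell:", d.mean(), "s; exit rate ~", 1.0 / d.mean(), "per s")
```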

Page 33:

Other important examples of Markov modeling:

• Single-channel recordings (patch clamp)

• Sequence analysis

• Cardiac electrical modeling

• Systems modeling of metabolic networks

[Diagram: two-state C ⇌ O scheme, as in single-channel analysis.]

Page 34:

We can do non-equilibrium Markov modeling, too.

Geggier et al. (2010), JMB 399: 576.

Page 35:

HMM is useful for many, many problems.

Speech Recognition and Translation
Weather Modeling
Sequence Alignment
Financial Modeling

Page 36:

Some useful references

• Russell and Norvig, Artificial Intelligence: A Modern Approach.

• http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

• Rabiner (1989), Proc. of the IEEE 77: 257.

• Qin F. (2007), Principles of single-channel kinetic analysis. Methods Mol Biol 403.

• Bronson et al. (2009), Biophys J 97: 3196.

• QuB software suite: www.qub.buffalo.edu
