
Page 1:

Introduction to Hidden Markov Modeling (HMM)

Daniel S. Terry

Scott Blanchard and Harel Weinstein labs


Page 2:

HMM is useful for many, many problems.

Speech Recognition and Translation
Weather Modeling
Sequence Alignment
Financial Modeling

Page 3:

Page 4:

So let’s say you’re riding out nuclear war in a bunker…

To keep sane, you want to know what the weather outside is like…

…but all you can observe is if the security guard brings his umbrella.


Page 5:

Probabilistic reasoning

[Figure: observing whether the guard carries an umbrella (the observation, E) to infer the hidden weather state (X), via the conditionals P(Sunny|Umbrella), P(Cloudy|Umbrella), P(Rain|Umbrella), P(Sunny|No Umbrella), P(Cloudy|No Umbrella), P(Rain|No Umbrella).]

P(X|E) = the probability of X given that E is observed.

Page 6:

Probabilistic reasoning in stochastic processes

[Diagram: a chain of hidden states X0 → X1 → X2 → X3 → X4 → … over time, each emitting an observation ("emission") E0, E1, E2, E3, E4.]

This is called a Markov chain.

Page 7:


Assumptions in Markov modeling

Assumption 1: This is a stationary process, specifically a first-order Markov process:

P(Xt | Xt-1, Xt-2, Xt-3, …) = P(Xt | Xt-1)

…in other words, the current state depends only on the previous state. We call this the transition model.

Assumption 2: The current observation depends only on the current state:

P(Et | Xt, Xt-1, Xt-2, …, Et-1, Et-2, Et-3, …) = P(Et | Xt)

…in other words, the observations are conditionally independent of the past given the current state. We call this the observation (or emission) model.

[Diagram: the same chain of hidden states X0…X4 and emissions E0…E4, annotated with the two assumptions.]

Page 8:

The initial and transition probability models: π and A

Initial probabilities, π = P(X0):

    Sunny    0.70
    Cloudy   0.15
    Raining  0.15

Transition probabilities, A = P(Xt | Xt-1):

    Xt-1      P(Xt=Sunny)   P(Xt=Cloudy)   P(Xt=Raining)
    Sunny     0.70          0.25           0.05
    Cloudy    0.33          0.33           0.33
    Raining   0.20          0.60           0.20

These encode prior knowledge about weather trends.

Page 9:

The observation probability model: B

    Xt        P(Et = Umbrella)
    Sunny     0.05
    Cloudy    0.10
    Raining   0.85

This encodes prior knowledge about how likely people are to bring an umbrella under each weather condition.

Page 10:

Together these parameters define a Markov model.

λ = {π, A, B}

π: initial state probabilities
A: state transition probabilities
B: observation distributions

[Diagram: a two-state model with states C and R, initial probabilities πC and πR, transitions aC,C, aC,R, aR,C, aR,R, and observation distributions bC and bR.]
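To make this concrete, here is a minimal sketch in Python (using numpy). The numbers are the π, A, and B values from the previous slides; the function name is ours, not from any HMM library:

```python
import numpy as np

STATES = ["Sunny", "Cloudy", "Raining"]
pi = np.array([0.70, 0.15, 0.15])        # initial state probabilities (slide 8)
A = np.array([[0.70, 0.25, 0.05],        # A[i, j] = P(X_t = j | X_t-1 = i)
              [0.33, 0.33, 0.33],        # note: this row is normalized below,
              [0.20, 0.60, 0.20]])       # since 3 x 0.33 is only 0.99
B = np.array([0.05, 0.10, 0.85])         # P(umbrella | state), from slide 9

def sample_chain(T, seed=0):
    """Draw a hidden weather path and umbrella observations of length T."""
    rng = np.random.default_rng(seed)
    x = rng.choice(len(pi), p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(x)
        obs.append(bool(rng.random() < B[x]))          # True = umbrella seen
        x = rng.choice(len(pi), p=A[x] / A[x].sum())   # next state from row x
    return states, obs

states, obs = sample_chain(10)
print([STATES[s] for s in states], obs)
```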

Page 11:


Predicting state sequences from observations

Observation Sequence (t=1..T)

Predicted Hidden State Sequence

[Diagram: the observation sequence E0…E3 feeds the Markov model λ = {π, A, B}, which yields the predicted hidden state sequence X0…X3 (a Markov chain).]

Page 12:

Finding the optimal state sequence with Viterbi

Given a model λ = {π, A, B} that describes the system, we can determine the optimal state sequence (the idealization) as follows:

[Trellis: states S, C, and R at each of times X0, X1, X2, X3, with arrows for every possible transition between consecutive time points.]

For each state at time t, calculate the probability that Xt is a particular state xi (sunny, raining, etc.), given the observation and the previous state:

P(Xt=xi | Et, Xt-1=xj) ∝ P(Et | Xt=xi) × P(Xt=xi | Xt-1=xj)

Initial condition: P(X0=xi) = πi


Page 13:

Finding the optimal state sequence with Viterbi

[Trellis: the same states S, C, R over times X0…X3, now with probabilities propagated recursively along every transition.]

Repeat these calculations for all possible transitions, recursively. At each point in time we then have an estimate of how likely we are to be in each state, given all possible previous paths. We also keep track of the most likely preceding state at each step. (This complex-looking structure is called a trellis. Can you see why?)


Page 14:

Finding the optimal state sequence with Viterbi

[Trellis: the most likely path traced backward from the most likely end state.]

Find the most likely end state from the computed probabilities. We can then backtrack to find the most likely state sequence. You have seen a similar procedure in sequence alignment.

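Slides 12-14 combine into a short dynamic program. A minimal log-space sketch (Python; it reuses the pi, A, B, STATES, and obs arrays from the weather sketch on slide 10, and all function names are illustrative):

```python
import numpy as np

def viterbi(pi, A, b, obs):
    """Most likely state path given emissions.
    b(x, e) returns the observation probability of emission e in state x."""
    n, T = len(pi), len(obs)
    logd = np.log(pi) + np.log([b(x, obs[0]) for x in range(n)])  # delta at t=0
    back = np.zeros((T, n), dtype=int)                            # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)      # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)         # best predecessor of each state j
        logd = scores.max(axis=0) + np.log([b(x, obs[t]) for x in range(n)])
    path = [int(logd.argmax())]                 # most likely end state...
    for t in range(T - 1, 0, -1):               # ...then backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1]

b = lambda x, e: B[x] if e else 1.0 - B[x]      # umbrella emission model
print([STATES[i] for i in viterbi(pi, A, b, obs)])
```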

Page 15:


Predicting state sequences from observations

Observation Sequence (t=1..T)

Predicted Hidden State Sequence

[Diagram: the observation sequence E0…E3 feeds the Markov model λ = {π, A, B}, which yields the predicted hidden state sequence X0…X3 (a Markov chain).]

Page 16:


A practical example of Markov modeling:

Analysis of single-molecule fluorescence trajectories

[Figure: fluorescence and FRET trajectories vs. time (min).]

…Ok, so I’m bored of talking about the weather.

Page 17:

www.nia.NIH.gov, public domain.

Neurotransmitter release and reuptake are central to neuronal signaling and proper functioning of the brain.

NSS Reuptake

Page 18:

Neurotransmitter:Sodium Symporter (NSS) proteins are the targets of many clinically important drugs.

Drugs of Abuse

Therapeutic Inhibitors

www.nia.NIH.gov, public domain.

NSS Reuptake

Page 19:

[Diagram: an NSS transporter moving neurotransmitter across the membrane, from the extracellular side (high Na+) to the intracellular side (low Na+).]

Key Question: What are the specific conformational changes required for such a mechanism, and how do they mediate transport?

A practical example of Markov modeling: Analysis of single-molecule fluorescence trajectories

Page 20:

[Figure: FRET efficiency (0.0 to 1.0) vs. donor-acceptor distance (nm), with the Förster radius R0 marked.]

Single-molecule FRET: a tool for examining conformational dynamics
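For reference, the curve in this figure follows the standard Förster relation (textbook FRET theory; the slide shows only the plot):

```latex
E_{\mathrm{FRET}} = \frac{1}{1 + (r / R_0)^6}
```

where r is the donor-acceptor distance and R0 is the Förster radius, the distance at which E = 0.5.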

Page 21:

FRET imaging of single molecules can be achieved using a few tricks, including total internal reflection (TIR) excitation.

[Schematic: donor- and acceptor-labeled molecules immobilized on a surface under 532 nm TIR excitation, with example fluorescence and FRET trajectories vs. time (min).]

Page 22:

We want to know:
1) How many distinct states are there?
2) What are their FRET values?
3) What are the rates?
4) What is the most likely state at each point in time?

[Figure: fluorescence, FRET, and conformation trajectories vs. time (sec).]

HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system.

[Diagram: the sequence of hidden states X0, X1, X2 emitting the sequence of observations E0, E1, E2.]

Unlike with the weather, we have to learn the model from the data itself!

Page 23:

Hidden Markov models have three components:

1) Initial state probabilities: π = {πO, πC}

2) Transition probabilities:

A = {ai,j} =  | aO,O  aO,C |
              | aC,O  aC,C |

Together: λ = {π, A, B}

[Diagram: a two-state model with states O and C, initial probabilities πO and πC, transitions aO,O, aO,C, aC,O, aC,C, and observation distributions bO and bC.]

3) Observation probability distribution (OPD): B = {bi(Et)}

bi(Et) = (1 / √(2π σi²)) · exp( −(Et − μi)² / (2σi²) )

[Figure: the Gaussian FRET distribution for state i, centered at μi with standard deviation σi, plotted over FRET ≈ 0.4 to 0.7.]
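In code, the OPD of one state is just a Gaussian density. A minimal sketch (Python; the μ and σ values are made-up examples):

```python
import numpy as np

def b_i(E_t, mu_i, sigma_i):
    """Gaussian observation probability density for state i (slide formula)."""
    return np.exp(-(E_t - mu_i)**2 / (2 * sigma_i**2)) / np.sqrt(2 * np.pi * sigma_i**2)

print(b_i(0.60, mu_i=0.55, sigma_i=0.05))   # density of observing FRET = 0.60
```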

Page 24:

Goal: find the best model to explain the experimental data.

λ̂ = argmaxλ P(λ | E)

In other words, we want to maximize the probability of the model given the data (where λ is the model and E is the observed FRET trajectory).

But we don't know how to calculate P(λ | E)!
Instead, turn it around using Bayes' theorem:

P(λ | E) = P(E | λ) · P(λ) / P(E)

The probability of the data, P(E), is independent of the model choice and will not affect model ranking. If we further assume all models are equally likely a priori, then:

λ̂ = argmaxλ P(λ | E) = argmaxλ P(E | λ)

P(E | λ) is easy to calculate from the observation distributions. Why does X not appear here? Because it has been marginalized out: we have to sum over all possible state sequences!
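To see what that sum costs, here is a deliberately naive sketch (Python, with illustrative names; b(x, e) is the emission probability of e in state x) that computes P(E | λ) by enumerating every path. Its cost grows as n^T, which is exactly what the forward algorithm (slide 28) avoids:

```python
import itertools
import numpy as np

def likelihood_bruteforce(pi, A, b, obs):
    """P(E | lambda) = sum over ALL state paths X of P(E, X | lambda).
    Exponential in the trace length T -- for illustration only."""
    n, T = len(pi), len(obs)
    total = 0.0
    for X in itertools.product(range(n), repeat=T):
        p = pi[X[0]] * b(X[0], obs[0])
        for t in range(1, T):
            p *= A[X[t - 1], X[t]] * b(X[t], obs[t])
        total += p
    return total
```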

Page 25:

[Figure: FRET trajectory (0.0 to 1.0) vs. time (min).]

Segmental k-means (SKM): optimization on the cheap

[Diagram: alternate between state assignment (Viterbi) and parameter re-estimation, starting from an initial model λ0 and iterating to λi.]

• To get B, simply calculate the mean and std for each state from the current assignment.

• To get A, count the number of transitions of each type and normalize.

• To get π, count the number of times each dwell starts with each state xi and normalize.

F. Qin (2004), Biophys J 86: 1488


This works only if the starting model is close to the final model.
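The re-estimation half of the loop is simple bookkeeping. A sketch of one SKM update for Gaussian FRET states (Python; all names are ours, and a single trace is assumed):

```python
import numpy as np

def skm_reestimate(obs, path, n):
    """One SKM re-estimation step from a Viterbi state assignment.
    obs: (T,) FRET values; path: (T,) integer state indices; n: state count."""
    obs = np.asarray(obs, dtype=float)
    path = np.asarray(path, dtype=int)
    mu = np.array([obs[path == i].mean() for i in range(n)])     # B: state means
    sigma = np.array([obs[path == i].std() for i in range(n)])   # B: state stds
    A = np.zeros((n, n))
    for i, j in zip(path[:-1], path[1:]):
        A[i, j] += 1                                             # count transitions
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1.0)           # normalize rows
    # pi: with a single trace this is just the first frame; with many traces,
    # count the starting state of every trace and normalize.
    pi = np.bincount(path[:1], minlength=n).astype(float)
    return pi, A, mu, sigma
```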

Page 26:

Model optimization: expectation maximization (EM).

Expectation: calculate the probability of the data given the current model.

Maximization: adjust the model parameters to better fit the calculated probabilities.

Termination: iterate until the log-likelihood converges (e.g., ΔLL < 10⁻⁴).

For a given state path X, the likelihood factors into initial (π), transition (A), and observation (B) terms:

P(E, X | λ) = P(X0) · ∏ t=1..T P(Xt | Xt-1) · P(Et | Xt)

LL = log P(E, X | λ) = log P(X0) + Σ t=1..T log[ P(Xt | Xt-1) · P(Et | Xt) ]

Restarts: if the likelihood "landscape" is very frustrated, restarting from a random initial model can help escape local maxima.
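Schematically the loop is tiny. A sketch (Python; e_step and m_step are placeholder callables standing in for the forward-backward expectation and the parameter re-estimation on the following slides):

```python
def expectation_maximization(model, obs, e_step, m_step, tol=1e-4, max_iter=200):
    """Iterate E and M steps until the log-likelihood gain falls below tol
    (the slide's delta-LL < 1e-4 criterion)."""
    prev_ll = float("-inf")
    for _ in range(max_iter):
        stats, ll = e_step(model, obs)    # expectation: state probabilities
        model = m_step(stats, obs)        # maximization: refit pi, A, B
        if ll - prev_ll < tol:
            break                         # converged
        prev_ll = ll
    return model, ll
```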

Page 27:

The forward-backward algorithm (Baum-Welch)

[Diagram: a long chain of hidden states X0…X99 with emissions E0…E99, split at time t into the "past" and the "future".]

Calculating the probabilities at a particular point in time (t):

P(Xt | E1..T) = P(Xt | E1..t, Et+1..T) ∝ P(Xt | E1..t) × P(Et+1..T | Xt)

The "past" term is the forward probability (α); the "future" term is the backward probability (β).

We can do this because of Bayes’ rule and conditional independence of observations over time…

We calculate these much like we did with Viterbi…


Page 28:


The forward algorithm

Partial probabilities (α) are calculated recursively as:

αt(j) = P(observation|hidden state is j) × P(all paths to state j at time t)

Initial condition: α0(j) = π(j) · B(j, E0)

Iterate:

αt+1(j) = B(j, Et+1) · Σ i=1..n αt(i) · ai,j

Then the total probability of the sequence is the sum of the final α's.

[Trellis: states O and C at times X0…X3, illustrating the paths summed by the recursion.]
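A direct implementation of the recursion (Python; the per-step rescaling is a standard trick added here because raw α values underflow on long traces, and is not shown on the slide):

```python
import numpy as np

def forward(pi, A, b, obs):
    """Forward algorithm: alpha[t, j] ~ P(E_0..E_t, X_t = j | lambda),
    rescaled at each step; returns alpha and the total log P(E | lambda)."""
    n, T = len(pi), len(obs)
    alpha = np.zeros((T, n))
    scale = np.zeros(T)
    alpha[0] = pi * [b(j, obs[0]) for j in range(n)]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * [b(j, obs[t]) for j in range(n)]
        scale[t] = alpha[t].sum()          # sum over states = P(E_t | E_0..t-1)
        alpha[t] /= scale[t]
    return alpha, np.log(scale).sum()      # log-likelihood of the whole trace
```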

Page 29:

Maximization using forward-backward probabilities

Probability of being in state i at time t:

γt(i) = αt(i) · βt(i) / Σj αt(j) · βt(j)

Probability of transitioning from state i to state j at time t (from the forward-backward algorithm):

ξt(i,j) = αt(i) · ai,j · bj(Et+1) · βt+1(j) / P(E | λ)

The model parameters are then adjusted to maximize the log-likelihood, e.g. âi,j = Σt ξt(i,j) / Σt γt(i).


This is very much like SKM, except that we use explicit probabilities instead of simply counting.
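Given matched α and β tables (β computed by a backward pass analogous to the forward pass above), the two posteriors have closed forms. A minimal sketch (Python; these are the standard Baum-Welch expressions, with illustrative names):

```python
import numpy as np

def posteriors(alpha, beta, A, b, obs):
    """gamma[t, i] = P(X_t = i | E, lambda)
    xi[t, i, j]   = P(X_t = i, X_t+1 = j | E, lambda)
    alpha and beta must come from the same (consistently scaled) pass."""
    T, n = alpha.shape
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)           # normalize per frame
    xi = np.zeros((T - 1, n, n))
    for t in range(T - 1):
        bt = np.array([b(j, obs[t + 1]) for j in range(n)])
        xi[t] = alpha[t][:, None] * A * (bt * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()                            # normalize per step
    return gamma, xi
```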

Page 30:

The problem of bias

• You can always get a better fit using more parameters! But it may not be a good model.

• Bayesian information criterion (BIC):

BIC = −2·LL + k·ln(n) ≈ −2·ln P(E | k)

where k is the number of free parameters, LL is the log-likelihood of the optimal "fit", and n is the number of data points.

• Akaike information criterion (AIC): AIC = 2·k − 2·LL (both criteria are computed in the sketch after this list).

• Maximum evidence methods (vbFRET), etc.
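As one-line functions (Python; LL is the log-likelihood, e.g. as returned by the forward pass sketched earlier):

```python
import numpy as np

def bic(LL, k, n):
    """Bayesian information criterion; lower is better."""
    return -2.0 * LL + k * np.log(n)

def aic(LL, k):
    """Akaike information criterion; lower is better."""
    return 2.0 * k - 2.0 * LL
```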

Page 31:

We want to know:
1) How many distinct states are there?
2) What are their FRET values?
3) What are the rates?
4) What is the most likely state at each point in time?

[Figure: fluorescence, FRET, and conformation trajectories vs. time (sec).]

HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system.

[Diagram: the sequence of hidden states X0, X1, X2 emitting the sequence of observations E0, E1, E2.]

Page 32:

[Figure: FRET trajectory (0.0 to 1.0) vs. time (min), recorded in 2 mM Na+ upon addition of 2 mM Ala.]

Quantifying kinetics is then useful for understanding how outside factors (ligands) influence dynamics.

[Plots: open-state and closed-state dwell times (s), and occupancy (%), as functions of log [Ala] (M).]

Zhao and Terry et al. (2011), Nature 474.
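The bookkeeping behind such dwell-time plots can be sketched as follows (Python, illustrative names; the rate estimate assumes exponentially distributed dwells, an assumption we add here rather than a claim from the slide):

```python
import numpy as np

def dwell_times(path, state, dt):
    """Durations (s) of uninterrupted visits to `state` in an idealized
    state path, given the frame interval dt."""
    dwells, run = [], 0
    for x in path:
        if x == state:
            run += 1
        elif run:
            dwells.append(run * dt)
            run = 0
    if run:
        dwells.append(run * dt)
    return np.array(dwells)

d = dwell_times([0, 0, 1, 1, 1, 0, 1, 1], state=1, dt=0.1)
print(d, "mean dwell:", d.mean(), "s; exit rate ~", 1.0 / d.mean(), "per s")
```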

Page 33:

Other important examples of Markov modeling:

• Single-channel recordings (patch clamp)

• Sequence analysis

• Cardiac electrical modeling

• Systems modeling of metabolic networks

[Diagram: two-state C ⇌ O scheme, as in single-channel analysis.]

Page 34:

We can do non-equilibrium Markov modeling, too.

Geggier et al. (2010), JMB 399: 576.

Page 35:

HMM is useful for many, many problems.

Speech Recognition and Translation
Weather Modeling
Sequence Alignment
Financial Modeling

Page 36:

Some useful references

• Russell and Norvig, Artificial Intelligence: A Modern Approach.

• http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

• Rabiner (1989), Proc. of the IEEE 77: 257.

• Qin F. (2007), Principles of single-channel kinetic analysis. Methods Mol Biol 403.

• Bronson et al. (2009), Biophys J 97: 3196.

• QuB software suite: www.qub.buffalo.edu
