
© Ron Shamir, CG’08 1

Hidden Markov Models


• Dr Richard Durbin is a graduate in mathematics from Cambridge University and one of the founding members of the Sanger Institute. He has also carried out research at the Laboratory of Molecular Biology in Cambridge and at Harvard and Stanford Universities in the USA. He is currently head of the informatics division at the Sanger Centre.

Main source: Durbin et al.,

“Biological Sequence Analysis”

(Cambridge, ‘98)


The occasionally dishonest casino

Observed rolls: 13652656643662612564

• Die A (fair): P_A(1) = P_A(2) = … = P_A(6) = 1/6
• Die B (loaded): P_B(1) = … = P_B(5) = 0.1, P_B(6) = 0.5
• Switch probabilities between rolls: P_{A→B}, P_{B→A} (shown as 1/2 on the slide)

Can we tell when the loaded die is used?


Example - CpG islands

• CpG islands:
  – DNA stretches (100-1,000 bp) with frequent CG pairs (contiguous on the same strand).
  – Rare overall, yet appear in significant parts of the genome.
• Problem (1): Given a short genome sequence, decide if it comes from a CpG island.

Preliminaries: Markov Chains

A Markov chain is a triple (S, A, p):
• S: state set
• p: initial state probability vector {p(x_1 = s)}
• A: transition probability matrix, a_st = P(x_i = t | x_{i−1} = s)

Assumption: X = x_1…x_L is a random process with memory length 1, i.e. for all s_i ∈ S:
P(x_i = s_i | x_1 = s_1, …, x_{i−1} = s_{i−1}) = P(x_i = s_i | x_{i−1} = s_{i−1}) = a_{s_{i−1}, s_i}

• Sequence probability: P(X) = p(x_1) · ∏_{i=2…L} a_{x_{i−1}, x_i}

One can avoid p by adding a ‘begin’ state 0 and transition probabilities a_{0s}.


Sequence probability

Transition probabilities a_st (rows: from s, columns: to t):

  -     A      C      G      T
  A   0.300  0.205  0.285  0.210
  C   0.322  0.298  0.078  0.302
  G   0.248  0.246  0.298  0.208
  T   0.177  0.239  0.292  0.292

P(X) = p(x_1) · ∏_{i=2…L} a_{x_{i−1}, x_i}
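The sequence-probability formula can be evaluated directly from the table above. A minimal sketch (the function name and the uniform initial probability p(x_1) = 0.25 are my own assumptions, not from the slides):

```python
# Transition table from the slide: A_MINUS[s][t] = a_st (non-CpG model).
A_MINUS = {
    'A': {'A': 0.300, 'C': 0.205, 'G': 0.285, 'T': 0.210},
    'C': {'A': 0.322, 'C': 0.298, 'G': 0.078, 'T': 0.302},
    'G': {'A': 0.248, 'C': 0.246, 'G': 0.298, 'T': 0.208},
    'T': {'A': 0.177, 'C': 0.239, 'G': 0.292, 'T': 0.292},
}

def chain_prob(x, p1=0.25):
    """P(X) = p(x_1) * prod_{i=2..L} a_{x_{i-1}, x_i}.

    p1 is an assumed uniform initial probability; the slides instead
    add a 'begin' state with transition probs a_{0s}.
    """
    prob = p1
    for a, b in zip(x, x[1:]):  # consecutive pairs (x_{i-1}, x_i)
        prob *= A_MINUS[a][b]
    return prob

print(chain_prob("ACGT"))  # 0.25 * a_AC * a_CG * a_GT
```

Note that each row of the table sums to 1, as a transition matrix must.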


Markov model - Example

• Markov model over the states {A, C, G, T}
• Adding “begin” (B) and “end” (E) states

[Figure: fully connected chain over states A, C, G, T, with begin state B and end state E]


Andrei Andreyevich Markov

• Born: 14 June 1856 in Ryazan, Russia

• Died: 20 July 1922 in Petrograd (now St Petersburg), Russia

• Seminal contributions to: the central limit theorem, stochastic processes, random walks, …

http://www-groups.dcs.st-and.ac.uk/~history/


Markov Models

• “+”: transition probs inside CpG islands
• “−”: transition probs outside CpG islands

  +     A      C      G      T
  A   0.180  0.274  0.425  0.120
  C   0.171  0.368  0.274  0.188
  G   0.161  0.339  0.375  0.125
  T   0.079  0.355  0.384  0.182

  -     A      C      G      T
  A   0.300  0.205  0.285  0.210
  C   0.322  0.298  0.078  0.302
  G   0.248  0.246  0.298  0.208
  T   0.177  0.239  0.292  0.292


CpG islands: Fixed Window

• Problem (1): Given a short genome sequence X, decide if it comes from a CpG island.
• Solution: Model by a Markov chain. Let
  – a⁺_st: transition prob. inside CpG islands,
  – a⁻_st: transition prob. outside CpG islands.

Decide by the log-likelihood ratio score:

score(X) = log [ P(X | CpG island) / P(X | non-CpG island) ] = Σ_{i=1…n} log ( a⁺_{x_{i−1},x_i} / a⁻_{x_{i−1},x_i} )

bits_score(X) = (1/n) · Σ_{i=1…n} log₂ ( a⁺_{x_{i−1},x_i} / a⁻_{x_{i−1},x_i} )
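The length-normalized score can be computed directly from the two transition tables. A minimal sketch (the function name is my own; the tables are the ± tables from these slides):

```python
import math

# Transition probabilities from the lecture's tables (rows: previous base,
# columns: next base). "+" = inside CpG island, "-" = outside.
A_PLUS = {
    'A': {'A': 0.180, 'C': 0.274, 'G': 0.425, 'T': 0.120},
    'C': {'A': 0.171, 'C': 0.368, 'G': 0.274, 'T': 0.188},
    'G': {'A': 0.161, 'C': 0.339, 'G': 0.375, 'T': 0.125},
    'T': {'A': 0.079, 'C': 0.355, 'G': 0.384, 'T': 0.182},
}
A_MINUS = {
    'A': {'A': 0.300, 'C': 0.205, 'G': 0.285, 'T': 0.210},
    'C': {'A': 0.322, 'C': 0.298, 'G': 0.078, 'T': 0.302},
    'G': {'A': 0.248, 'C': 0.246, 'G': 0.298, 'T': 0.208},
    'T': {'A': 0.177, 'C': 0.239, 'G': 0.292, 'T': 0.292},
}

def score_bits(x):
    """Length-normalized log2 likelihood ratio of the + vs - model."""
    total = sum(math.log2(A_PLUS[a][b] / A_MINUS[a][b])
                for a, b in zip(x, x[1:]))
    return total / (len(x) - 1)

# A CG-rich sequence scores positive, an AT-rich one negative.
print(score_bits("CGCGCGCG"))  # > 0
print(score_bits("ATATATAT"))  # < 0
```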


Discrimination of sequences via Markov Chains

Durbin et al., Fig. 3.2: length-normalized score histograms for 48 CpG islands (total length ~60,000 nt) and a similar collection of non-CpG sequences.


CpG islands – the general case

• Problem (2): Detect CpG islands in a long DNA sequence.
• Naive solution - sliding windows: for each 1 ≤ k ≤ L−l,
  – window: X_k = (x_{k+1},…,x_{k+l})
  – score: score(X_k)
  – positive score ⇒ potential CpG island

Disadvantage: what is the length of the islands? How do we identify the transitions between island and non-island regions?

Idea: Use Markov chains as before, with additional (hidden) states.
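The naive sliding-window approach above can be sketched as follows. The `BETA` values are log2 ratios I computed from the slides' ± tables for four transitions only; defaulting the remaining transitions to 0.0 is a simplification for illustration, not part of the method:

```python
# Hypothetical per-transition log-odds weights: BETA[(a, b)] = log2(a+/a-),
# computed from the lecture's tables for a few transitions only.
BETA = {('C', 'G'): 1.813, ('G', 'C'): 0.462,
        ('A', 'T'): -0.807, ('T', 'A'): -1.164}

def window_scores(x, l):
    """Score every length-l window X_k = x[k:k+l]."""
    scores = []
    for k in range(len(x) - l + 1):
        w = x[k:k + l]
        # Missing transitions default to 0.0 (simplification).
        s = sum(BETA.get((a, b), 0.0) for a, b in zip(w, w[1:]))
        scores.append(s)
    return scores

# Windows with positive score are flagged as potential CpG islands.
x = "ATATCGCGCGATAT"
flags = [s > 0 for s in window_scores(x, 4)]
print(flags)
```

The CG-rich middle of `x` produces positive windows while the AT-rich ends do not, which illustrates both the idea and its weakness: the answer depends on the chosen window length `l`.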


Hidden Markov Model (HMM)

M = (Σ, Q, Θ):
• Σ: alphabet of symbols. Example: {A, C, G, T}
• Q: finite set of states, capable of emitting symbols. Example: Q = {A+, C+, G+, T+, A−, C−, G−, T−}
• Θ = (A, E):
  – A: transition probs a_kl = P(π_i = l | π_{i−1} = k), k, l ∈ Q
  – E: emission probs e_k(b) = P(x_i = b | π_i = k), k ∈ Q, b ∈ Σ

Path π = π_1,…,π_L: a sequence of states (a simple Markov chain).

Joint probability of observed sequence X = (x_1,…,x_L) and path π (convention: π_0 - begin, π_{L+1} - end):

P(X, π) = a_{0,π_1} · ∏_{i=1…L} e_{π_i}(x_i) · a_{π_i,π_{i+1}}

Goal: Find the path π* maximizing P(X, π).


Viterbi’s Decoding Algorithm (finding the most probable state path)

Want: path π maximizing P(X, π).

v_k(i) = prob. of the most probable path ending in state k at step i.

• Init: v_0(0) = 1; v_k(0) = 0 for k > 0
• Step: v_l(i+1) = e_l(x_{i+1}) · max_k {v_k(i) · a_kl}
• End: P(X, π*) = max_k {v_k(L) · a_k0}

Time complexity: O(Ln²) for n states, m symbols, L steps.
π* itself can be recovered using back pointers.
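The recursion above can be sketched for the casino HMM. The transition and initial probabilities below are illustrative assumptions (the slides do not fix them all); computing in log space avoids numerical underflow on long sequences:

```python
import math

# Toy casino HMM: state A = fair die, state B = loaded die.
# Switch/start probabilities are assumed values for illustration.
STATES = ['A', 'B']
TRANS = {'A': {'A': 0.95, 'B': 0.05}, 'B': {'A': 0.10, 'B': 0.90}}
START = {'A': 0.5, 'B': 0.5}                  # plays the role of a_{0,k}
EMIT = {'A': {r: 1 / 6 for r in '123456'},
        'B': {**{r: 0.1 for r in '12345'}, '6': 0.5}}

def viterbi(x):
    """Most probable state path, computed in log space with back pointers."""
    v = [{k: math.log(START[k]) + math.log(EMIT[k][x[0]]) for k in STATES}]
    back = []
    for sym in x[1:]:
        row, ptr = {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: v[-1][k] + math.log(TRANS[k][l]))
            row[l] = (v[-1][best] + math.log(TRANS[best][l])
                      + math.log(EMIT[l][sym]))
            ptr[l] = best
        v.append(row)
        back.append(ptr)
    # Trace the back pointers from the best final state.
    state = max(STATES, key=lambda k: v[-1][k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return ''.join(reversed(path))

# A long run of sixes should be decoded as the loaded die B,
# the fair stretches on either side as A.
print(viterbi("123123123" + "6" * 12 + "123123123"))
```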


The occasionally dishonest casino (2)

[Figure: the roll sequence 13652656643662612564 with the decoded die (A / B) beneath each roll, together with each die’s emission probabilities]


HMM for CpG Islands

• States: A+ C+ G+ T+ A− C− G− T−
• Symbols: A C G T (each emitted by both its + and its − state)
• Path π = π_1,…,π_L: sequence of states

Transition probs (http://www.cs.huji.ac.il/~cbio/handouts/class4.ppt):

  +     A      C      G      T
  A   0.180  0.274  0.425  0.120
  C   0.171  0.368  0.274  0.188
  G   0.161  0.339  0.375  0.125
  T   0.079  0.355  0.384  0.182

  -     A      C      G      T
  A   0.300  0.205  0.285  0.210
  C   0.322  0.298  0.078  0.302
  G   0.248  0.246  0.298  0.208
  T   0.177  0.239  0.292  0.292


HMM for CpG Islands

[Figure: state diagram with the four “+” states (A+, C+, G+, T+) and the four “−” states (A−, C−, G−, T−), fully interconnected]


Posterior State Probabilities

Goal: calculate P(π_i = k | X).

Our strategy:
• P(X, π_i = k) = P(x_1,…,x_i, π_i = k) · P(x_{i+1},…,x_L | x_1,…,x_i, π_i = k)
               = P(x_1,…,x_i, π_i = k) · P(x_{i+1},…,x_L | π_i = k)
• P(π_i = k | X) = P(π_i = k, X) / P(X)

Need to compute these two terms - and P(X).


Forward Algorithm

Goal: calculate P(X) = Σ_π P(X, π).
Approximation: take the max path π* from the Viterbi algorithm. Not justified when there are several near-maximal paths.
Exact algorithm (the “Forward Algorithm”): f_k(i) = P(x_1,…,x_i, π_i = k)
• Init: f_0(0) = 1; f_k(0) = 0 for k > 0
• Step: f_j(i+1) = e_j(x_{i+1}) · Σ_k f_k(i) · a_kj
• End: P(X) = Σ_k f_k(L) · a_k0
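The forward recursion can be sketched on the toy casino model. The model numbers are assumed for illustration, and the sketch omits the explicit end state (so P(X) = Σ_k f_k(L), i.e. a_k0 ≡ 1):

```python
# Toy two-state HMM; switch/start probabilities are assumed values.
STATES = ['A', 'B']
TRANS = {'A': {'A': 0.95, 'B': 0.05}, 'B': {'A': 0.10, 'B': 0.90}}
START = {'A': 0.5, 'B': 0.5}
EMIT = {'A': {r: 1 / 6 for r in '123456'},
        'B': {**{r: 0.1 for r in '12345'}, '6': 0.5}}

def forward(x):
    """f_k(i) = P(x_1..x_i, pi_i = k); returns P(X) = sum_k f_k(L)."""
    f = {k: START[k] * EMIT[k][x[0]] for k in STATES}
    for sym in x[1:]:
        # f_l(i+1) = e_l(sym) * sum_k f_k(i) * a_kl
        f = {l: EMIT[l][sym] * sum(f[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    return sum(f.values())

print(forward("266"))  # P(X) summed over all 2^3 state paths
```

A useful sanity check: the probabilities of all sequences of a fixed length sum to 1 (since there is no end state here).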


Backward Algorithm

• b_k(i) = P(x_{i+1},…,x_L | π_i = k)
• Init: b_k(L) = a_k0 for all k
• Step: b_k(i) = Σ_l a_kl · e_l(x_{i+1}) · b_l(i+1)
• End: P(X) = Σ_k a_0k · e_k(x_1) · b_k(1)
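Both recursions compute the same P(X), which makes a good cross-check. A sketch on the same assumed two-state model, again without an explicit end state (so b_k(L) = 1):

```python
# Toy two-state HMM; switch/start probabilities are assumed values.
STATES = ['F', 'L']  # fair / loaded
TRANS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.10, 'L': 0.90}}
START = {'F': 0.5, 'L': 0.5}
EMIT = {'F': {r: 1 / 6 for r in '123456'},
        'L': {**{r: 0.1 for r in '12345'}, '6': 0.5}}

def forward(x):
    """P(X) via the forward recursion."""
    f = {k: START[k] * EMIT[k][x[0]] for k in STATES}
    for sym in x[1:]:
        f = {l: EMIT[l][sym] * sum(f[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    return sum(f.values())

def backward(x):
    """P(X) via the backward recursion (no end state, so b_k(L) = 1)."""
    b = {k: 1.0 for k in STATES}
    for sym in reversed(x[1:]):  # sym runs over x_L, ..., x_2
        # b_k(i) = sum_l a_kl * e_l(x_{i+1}) * b_l(i+1)
        b = {k: sum(TRANS[k][l] * EMIT[l][sym] * b[l] for l in STATES)
             for k in STATES}
    # End: P(X) = sum_k p(k) * e_k(x_1) * b_k(1)
    return sum(START[k] * EMIT[k][x[0]] * b[k] for k in STATES)

x = "1666662"
print(forward(x), backward(x))  # the two values agree
```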


Posterior State Probabilities (2)

Goal: calculate P(π_i = k | X).
• Recall:
  – f_k(i) = P(x_1,…,x_i, π_i = k)
  – b_k(i) = P(x_{i+1},…,x_L | π_i = k)
  – Each can be used to compute P(X).
• P(X, π_i = k) = P(x_1,…,x_i, π_i = k) · P(x_{i+1},…,x_L | x_1,…,x_i, π_i = k)
               = P(x_1,…,x_i, π_i = k) · P(x_{i+1},…,x_L | π_i = k) = f_k(i) · b_k(i)
• P(π_i = k | X) = P(π_i = k, X) / P(X)


Dishonest Casino (3)

[Figure: posterior probability of the fair die along the roll sequence; Durbin et al., p. 60]


Posterior Decoding

• Now we have P(π_i = k | X). How do we decode?

1. π̂_i = argmax_k P(π_i = k | X)
   – Good when interested in the state at a particular point.
   – The resulting path of states π̂_1,…,π̂_L may not be legal (it may use zero-probability transitions).

2. Define a function of interest g(k) on the states. Compute G(i|X) = Σ_k P(π_i = k | X) · g(k).
   – E.g., g(k) = 1 for states in S, 0 on the rest: G(i|X) is the posterior prob. of symbol i coming from S (e.g., for CpG islands, S = {A+, C+, G+, T+}).
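Posterior decoding combines the forward and backward tables position by position. A sketch on the assumed two-state casino model, with g(k) the indicator of the loaded state, so G(i|X) is the posterior probability that roll i came from the loaded die:

```python
# Toy two-state HMM; switch/start probabilities are assumed values.
STATES = ['F', 'L']  # fair / loaded
TRANS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.10, 'L': 0.90}}
START = {'F': 0.5, 'L': 0.5}
EMIT = {'F': {r: 1 / 6 for r in '123456'},
        'L': {**{r: 0.1 for r in '12345'}, '6': 0.5}}

def posteriors(x):
    """P(pi_i = k | X) = f_k(i) * b_k(i) / P(X) for every position i."""
    n = len(x)
    # Forward pass: f[i][k] = P(x[0..i], state at position i = k).
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        f.append({l: EMIT[l][sym] * sum(f[-1][k] * TRANS[k][l]
                                        for k in STATES) for l in STATES})
    # Backward pass: b[i][k] = P(x[i+1..] | state at position i = k).
    b = [{k: 1.0 for k in STATES}]
    for sym in reversed(x[1:]):
        b.append({k: sum(TRANS[k][l] * EMIT[l][sym] * b[-1][l]
                         for l in STATES) for k in STATES})
    b.reverse()
    px = sum(f[-1][k] for k in STATES)  # P(X)
    return [{k: f[i][k] * b[i][k] / px for k in STATES} for i in range(n)]

# g(k) = 1 for the loaded state: G(i|X) is the posterior probability
# that roll i was generated by the loaded die.
G = [p['L'] for p in posteriors("12666666666621")]
print(G)
```

Inside the run of sixes G(i|X) should be markedly higher than at the fair-looking ends of the sequence.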


Andrew Viterbi

• Dr. Andrew J. Viterbi is a pioneer in the field of wireless communications. He received his Bachelor’s and Master’s degrees from MIT, and his Ph.D. in digital communications from the University of Southern California (USC). Immediately after obtaining his Ph.D. he taught at UCLA and consulted for the Jet Propulsion Laboratory (JPL). He was a co-founder of Linkabit, a small military contractor, in 1968, and co-founded QualComm with Irwin Jacobs in 1985. He created the Viterbi Algorithm for interference suppression and efficient decoding of digital transmission sequences, used by all four international standards for digital cellular telephony. QualComm is the recognized pioneer of Code Division Multiple Access (CDMA) digital wireless technology, which allows many users to share the same radio frequencies and thereby increases system capacity many times over analog system capacity. He is a Life Fellow of the IEEE, and was inducted into the National Academy of Engineering in 1978 and the National Academy of Sciences in 1996. http://www.ieee.org/organizations/history_center/comsoc/viterbi.html