
Hidden Markov Models

CBB 231 / COMPSCI 261

What is an HMM?

An HMM is a stochastic machine M = (Q, Σ, Pt, Pe) consisting of the following:

• a finite set of states, Q = {q0, q1, ..., qm}
• a finite alphabet Σ = {s0, s1, ..., sn}
• a transition distribution Pt : Q×Q → ℝ, i.e., Pt(qj | qi)
• an emission distribution Pe : Q×Σ → ℝ, i.e., Pe(sj | qi)

[Figure: a three-state HMM. Transitions: q0→q1 (100%); q1→q1 (80%), q1→q2 (15%), q1→q0 (5%); q2→q2 (70%), q2→q1 (30%). Emissions: q1 emits Y=100%, R=0%; q2 emits R=100%, Y=0%.]

An Example

M1 = ({q0, q1, q2}, {Y, R}, Pt, Pe)

Pt = {(q0,q1,1), (q1,q1,0.8), (q1,q2,0.15), (q1,q0,0.05), (q2,q2,0.7), (q2,q1,0.3)}

Pe = {(q1,Y,1), (q1,R,0), (q2,Y,0), (q2,R,1)}

Probability of a Sequence

[Figure: the same three-state Y/R machine as above.]

$$P(YRYRY \mid M_1) = a_{0\to1}\,b_{1,Y}\;a_{1\to2}\,b_{2,R}\;a_{2\to1}\,b_{1,Y}\;a_{1\to2}\,b_{2,R}\;a_{2\to1}\,b_{1,Y}\;a_{1\to0}$$
$$= 1 \times 1 \times 0.15 \times 1 \times 0.3 \times 1 \times 0.15 \times 1 \times 0.3 \times 1 \times 0.05 = 0.00010125$$
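This product can be checked mechanically. Below is a minimal Python sketch (the dictionary layout and function name are my own, not from the slides) that multiplies out the same transition and emission probabilities along the path q0→q1→q2→q1→q2→q1→q0:

    # Transition and emission tables for M1, as defined above.
    Pt = {('q0', 'q1'): 1.0, ('q1', 'q1'): 0.8, ('q1', 'q2'): 0.15,
          ('q1', 'q0'): 0.05, ('q2', 'q2'): 0.7, ('q2', 'q1'): 0.3}
    Pe = {('q1', 'Y'): 1.0, ('q1', 'R'): 0.0, ('q2', 'Y'): 0.0, ('q2', 'R'): 1.0}

    def path_probability(path, seq):
        """P(seq AND path), where path includes the silent start/end state
        q0 at both ends and seq[i] is emitted by path[i+1]."""
        p = 1.0
        for i, symbol in enumerate(seq):
            p *= Pt[(path[i], path[i + 1])]   # move to the next state
            p *= Pe[(path[i + 1], symbol)]    # emit the symbol there
        return p * Pt[(path[-2], path[-1])]   # final return to q0

    print(path_probability(['q0', 'q1', 'q2', 'q1', 'q2', 'q1', 'q0'], 'YRYRY'))
    # prints 0.00010125, matching the hand calculation above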

Another Example

M2 = (Q, Σ, Pt, Pe)
Q = {q0, q1, q2, q3, q4}
Σ = {A, C, G, T}

[Figure: a five-state DNA HMM. The emitting states carry the distributions A=35%/T=25%/C=15%/G=25%, A=27%/T=14%/C=22%/G=37%, A=10%/T=30%/C=40%/G=20%, and A=11%/T=17%/C=43%/G=29%, connected by transitions of 100%, 50%, 50%, 65%, 35%, 20%, 80%, and 100% among q0–q4.]

Finding the Most Probable Path

[Figure: the same five-state DNA HMM as above.]

The most probable path is:

  States:   122222224
  Sequence: CATTAATAG

resulting in this parse:

  States:   122222224
  Sequence: CATTAATAG

Finding the Most Probable Path (continued)

Example: C A T T A A T A G

[Figure: two candidate paths through the model; the top path has probability 7.0×10⁻⁷, the bottom path 2.8×10⁻⁹.]

The winning parse yields:
  feature 1: C
  feature 2: ATTAATA
  feature 3: G

Decoding with an HMM

$$\varphi_{\max} = \operatorname*{argmax}_{\varphi} P(\varphi \mid S) = \operatorname*{argmax}_{\varphi} \frac{P(S \wedge \varphi)}{P(S)} = \operatorname*{argmax}_{\varphi} P(S \wedge \varphi) = \operatorname*{argmax}_{\varphi} P(S \mid \varphi)\,P(\varphi)$$

$$P(\varphi) = \prod_{i=0}^{L} P_t(y_{i+1} \mid y_i) \qquad\qquad P(S \mid \varphi) = \prod_{i=0}^{L-1} P_e(x_i \mid y_{i+1})$$

$$\varphi_{\max} = \operatorname*{argmax}_{\varphi}\; P_t(q_0 \mid y_L) \prod_{i=0}^{L-1} \underbrace{P_e(x_i \mid y_{i+1})}_{\text{emission prob.}}\;\underbrace{P_t(y_{i+1} \mid y_i)}_{\text{transition prob.}}$$

The Best Partial Parse

$\varphi_{i,k}$ = the best partial parse ending in state $q_i$ at position $k$

$$\varphi_{i,k} = \begin{cases} \left(\operatorname*{argmax}_{\varphi_{j,k-1}} P(\varphi_{j,k-1},\,x_0...x_{k-1})\,P_t(q_i \mid q_j)\,P_e(x_k \mid q_i)\right) + q_i & \text{if } k > 0 \\ q_0\,q_i & \text{if } k = 0 \end{cases}$$

$$P(\varphi_{i,k},\,x_0...x_k) = \begin{cases} \max_j \left[P(\varphi_{j,k-1},\,x_0...x_{k-1})\,P_t(q_i \mid q_j)\,P_e(x_k \mid q_i)\right] & \text{if } k > 0 \\ P_t(q_i \mid q_0)\,P_e(x_0 \mid q_i) & \text{if } k = 0 \end{cases}$$

The Viterbi Algorithm

[Figure: the dynamic-programming matrix, with states as rows and sequence positions as columns; cell (i,k) is computed from column k−1.]

$$V(i,k) = \begin{cases} \max_j V(j,k-1)\,P_t(q_i \mid q_j)\,P_e(x_k \mid q_i) & \text{if } k > 0, \\ P_t(q_i \mid q_0)\,P_e(x_0 \mid q_i) & \text{if } k = 0. \end{cases}$$

$$\varphi_{\max} = \operatorname*{argmax}_{\varphi_{i,L-1}}\; V(i,\,L-1)\,P_t(q_0 \mid q_i)$$

Viterbi: Traceback

$$T(i,k) = \begin{cases} \operatorname*{argmax}_j V(j,k-1)\,P_t(q_i \mid q_j)\,P_e(x_k \mid q_i) & \text{if } k > 0, \\ 0 & \text{if } k = 0. \end{cases}$$

The full path is recovered by iterating the traceback pointers from the best final state:

T( T( T( ... T( T(i, L−1), L−2) ..., 2), 1), 0) = 0

Viterbi Algorithm in Pseudocode

trans[qi] = {qj | Pt(qi|qj) > 0}
emit[s] = {qi | Pe(s|qi) > 0}

[Pseudocode figure: initialization; fill out the main part of the DP matrix; choose the best state from the last column of the DP matrix; traceback.]
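The pseudocode itself was a figure on the original slide and is not reproduced here. The following is a minimal Python sketch of the same algorithm (function and variable names are mine), computed in log space to avoid underflow on long sequences:

    import math

    def viterbi(seq, states, Pt, Pe):
        """Most probable state path for seq, with silent start/end state 0.
        states lists the emitting states; Pt[(i,j)] and Pe[(i,s)] are dicts."""
        def lg(p):
            return math.log(p) if p > 0 else float('-inf')
        L = len(seq)
        V = [{} for _ in range(L)]   # V[k][i]: best log-prob ending in i at k
        T = [{} for _ in range(L)]   # T[k][i]: best predecessor of i at k
        for i in states:             # initialization: enter from q0, emit x0
            V[0][i] = lg(Pt.get((0, i), 0)) + lg(Pe.get((i, seq[0]), 0))
            T[0][i] = 0
        for k in range(1, L):        # fill out the main part of the DP matrix
            for i in states:
                best = max(states, key=lambda j: V[k-1][j] + lg(Pt.get((j, i), 0)))
                V[k][i] = (V[k-1][best] + lg(Pt.get((best, i), 0))
                           + lg(Pe.get((i, seq[k]), 0)))
                T[k][i] = best
        # choose the best state from the last column (it must return to q0)
        end = max(states, key=lambda i: V[L-1][i] + lg(Pt.get((i, 0), 0)))
        path = [end]                 # traceback
        for k in range(L - 1, 0, -1):
            path.append(T[k][path[-1]])
        return path[::-1]

    # The Y/R machine from earlier, with states numbered 0 (start), 1, 2:
    Pt = {(0, 1): 1.0, (1, 1): 0.8, (1, 2): 0.15, (1, 0): 0.05,
          (2, 2): 0.7, (2, 1): 0.3}
    Pe = {(1, 'Y'): 1.0, (2, 'R'): 1.0}
    print(viterbi('YRYRY', [1, 2], Pt, Pe))   # [1, 2, 1, 2, 1]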

The Forward Algorithm: Probability of a Sequence

F(i,k) represents the probability that the machine emits the subsequence x0...xk−1 by any path ending in state qi, i.e., so that symbol xk−1 is emitted by state qi.

$$F(i,k) = \begin{cases} 1 & \text{for } k = 0,\ i = 0 \\ 0 & \text{for } k > 0,\ i = 0 \\ 0 & \text{for } k = 0,\ i > 0 \\ \displaystyle\sum_{j=0}^{|Q|-1} F(j,k-1)\,P_t(q_i \mid q_j)\,P_e(x_{k-1} \mid q_i) & \text{for } 1 \le k \le |S|,\ 1 \le i < |Q| \end{cases}$$

$$P(S \mid M) = \sum_{i=0}^{|Q|-1} F(i,\,|S|)\,P_t(q_0 \mid q_i)$$

[Figure: the same dynamic-programming matrix, with cell (i,k) computed from column k−1.]

Viterbi takes the single most probable path: $\max_j V(j,k-1)\,P_t(q_i \mid q_j)\,P_e(x_k \mid q_i)$

Forward sums over all paths: $\sum_{\text{all paths }\varphi} P(S,\varphi)$

The Forward Algorithm in Pseudocode

[Pseudocode figure: fill out the DP matrix; sum over the final column to get P(S).]
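Again the pseudocode itself was an image on the slide. Below is a sketch of the recurrence (names mine); the columns are indexed by emitted symbol rather than carrying a separate column for the silent start state, but the same quantity is computed. Plain probabilities are used for brevity; a production version would rescale each column or work in log space:

    def forward(seq, states, Pt, Pe):
        """P(seq | M): sum over all paths, with silent start/end state 0."""
        L = len(seq)
        F = [{} for _ in range(L)]
        for i in states:             # first column: enter from q0, emit x0
            F[0][i] = Pt.get((0, i), 0.0) * Pe.get((i, seq[0]), 0.0)
        for k in range(1, L):        # fill out the DP matrix
            for i in states:
                F[k][i] = sum(F[k-1][j] * Pt.get((j, i), 0.0)
                              for j in states) * Pe.get((i, seq[k]), 0.0)
        # sum over the final column, weighted by the return to q0
        return sum(F[L-1][i] * Pt.get((i, 0), 0.0) for i in states)

    # For the Y/R machine the emissions are deterministic, so the only path
    # with nonzero probability is the Viterbi path and the two results agree:
    Pt = {(0, 1): 1.0, (1, 1): 0.8, (1, 2): 0.15, (1, 0): 0.05,
          (2, 2): 0.7, (2, 1): 0.3}
    Pe = {(1, 'Y'): 1.0, (2, 'R'): 1.0}
    print(forward('YRYRY', [1, 2], Pt, Pe))   # 0.00010125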

Training an HMM from Labeled Sequences

[Figure: a 40 bp training sequence, CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC, labeled with the state that emitted each base (0 = start/end state q0, 1 = q1, 2 = q2).]

Transition counts (with normalized probabilities):

            to 0       to 1       to 2
from 0     0 (0%)     1 (100%)    0 (0%)
from 1     1 (4%)    21 (84%)     3 (12%)
from 2     0 (0%)     3 (20%)    12 (80%)

Emission counts (with normalized probabilities):

              A        C        G        T
in state 1  6 (24%)  7 (28%)  5 (20%)  7 (28%)
in state 2  3 (20%)  3 (20%)  2 (13%)  7 (47%)

$$a_{i,j} = \frac{A_{i,j}}{\sum_{h=0}^{|Q|-1} A_{i,h}} \qquad \text{(transitions)}$$

$$e_{i,k} = \frac{E_{i,k}}{\sum_{h=0}^{|\Sigma|-1} E_{i,h}} \qquad \text{(emissions)}$$

where $A_{i,j}$ is the count of transitions from $q_i$ to $q_j$, and $E_{i,k}$ is the count of emissions of symbol $s_k$ in state $q_i$.
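A sketch of these two estimators in Python (names mine; it assumes labels[i] gives the state number that emitted seq[i], with state 0 the silent start/end state):

    from collections import Counter

    def train_labeled(seq, labels):
        """Estimate transition and emission distributions by normalizing
        counts from a labeled sequence (state 0 = silent start/end)."""
        path = [0] + list(labels) + [0]        # flank the labels with q0
        A = Counter(zip(path, path[1:]))       # A[(i,j)]: count of i -> j
        E = Counter(zip(labels, seq))          # E[(i,s)]: s emitted in state i
        a_tot = Counter(i for i, _ in A.elements())  # transitions out of i
        e_tot = Counter(i for i, _ in E.elements())  # emissions from i
        Pt = {(i, j): n / a_tot[i] for (i, j), n in A.items()}
        Pe = {(i, s): n / e_tot[i] for (i, s), n in E.items()}
        return Pt, Pe

    # e.g. the labeled parse from the M2 example:
    Pt, Pe = train_labeled('CATTAATAG', [1, 2, 2, 2, 2, 2, 2, 2, 4])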

Recall: Eukaryotic Gene Structure

[Figure: eukaryotic gene structure. The coding segment runs from the start codon (ATG) to the stop codon (TGA); exons are separated by introns, each intron beginning at a donor site (GT) and ending at an acceptor site (AG). Splicing exons 1–3 together yields the complete mRNA.]

Using an HMM for Gene Prediction

[Figure: the Markov model (states: Intergenic, Start codon, Exon, Donor, Intron, Acceptor, Stop codon, plus q0); the input sequence AGCTAGCAGTATGTCATGGCATGTTCGGAGGTAGTACGTAGAGGTAGCTAGTATAGGTCGATAGTACGCGA; the most probable path through the model; and the resulting gene prediction.]

Higher Order Markovian Eukaryotic Recognizer (HOMER)

[Figure: the progression of HOMER versions: H3, H5, H17, H27, H77, H95.]

HOMER, version H3

N = intergenic state, E = exon state, I = intron state

Tested on 500 Arabidopsis genes:

            nucleotides      splice sites   start/stop codons   exons            genes
            Sn   Sp   F      Sn   Sp        Sn   Sp              Sn   Sp   F     Sn   #
baseline    100  28   44     0    0         0    0               0    0    0     0    0
H3          53   88   66     0    0         0    0               0    0    0     0    0

[Figure: the HOMER state diagram: Intergenic, Start codon, Exon, Donor, Intron, Acceptor, Stop codon, plus q0.]

Recall: Sensitivity and Specificity

$$Sn = \frac{TP}{TP+FN} \qquad Sp = \frac{TP}{TP+FP} \qquad F = \frac{2 \times Sn \times Sp}{Sn + Sp}$$
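In code these are one-liners (a trivial sketch; the function name is mine):

    def sn_sp_f(TP, FP, FN):
        """Sensitivity, specificity (as defined above), and their F-score.
        Note: Sp = TP/(TP+FP) as used here is elsewhere called precision."""
        sn = TP / (TP + FN)
        sp = TP / (TP + FP)
        return sn, sp, 2 * sn * sp / (sn + sp)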

HOMER, version H5

Three exon states, one for each of the three codon positions.

            nucleotides      splice sites   start/stop codons   exons            genes
            Sn   Sp   F      Sn   Sp        Sn   Sp              Sn   Sp   F     Sn   #
H3          53   88   66     0    0         0    0               0    0    0     0    0
H5          65   91   76     1    3         3    3               0    0    0     0    0

HOMER, version H17

Adds dedicated states for the start codon, stop codon, donor site, and acceptor site.

            nucleotides      splice sites   start/stop codons   exons            genes
            Sn   Sp   F      Sn   Sp        Sn   Sp              Sn   Sp   F     Sn   #
H5          65   91   76     1    3         3    3               0    0    0     0    0
H17         81   93   87     34   48        43   37              19   24   21    7    35

[Figure: the H17 state diagram: Intergenic, Start codon, Exon, Donor, Intron, Acceptor, Stop codon, plus q0.]

Maintaining Phase Across an Intron

[Figure: the sequence GTATGCGATAGTCAAGAGTGATCGCTAGACC (coordinates 0–30) annotated with the codon phase (0, 1, 2 repeating) of each exonic base, showing the phase carried unchanged across an intron.]

HOMER, version H27

Three separate intron models, one for each codon phase.

            nucleotides      splice sites   start/stop codons   exons            genes
            Sn   Sp   F      Sn   Sp        Sn   Sp              Sn   Sp   F     Sn   #
H17         81   93   87     34   48        43   37              19   24   21    7    35
H27         83   93   88     40   49        41   36              23   27   25    8    38

Recall: Weight Matrices

[Figure: weight matrices for ATG (start codons); TGA, TAA, TAG (stop codons); GT (donor splice sites); and AG (acceptor splice sites).]

HOMER, version H77

Weight matrices model the positional biases near splice sites.

            nucleotides      splice sites   start/stop codons   exons            genes
            Sn   Sp   F      Sn   Sp        Sn   Sp              Sn   Sp   F     Sn   #
H27         83   93   88     40   49        41   36              23   27   25    8    38
H77         88   96   92     66   67        51   46              47   46   46    13   65

HOMER, version H95

            nucleotides      splice sites   start/stop codons   exons            genes
            Sn   Sp   F      Sn   Sp        Sn   Sp              Sn   Sp   F     Sn   #
H77         88   96   92     66   67        51   46              47   46   46    13   65
H95         92   97   94     79   76        57   53              62   59   60    19   93

Summary of HOMER Results

            nucleotides      splice sites   start/stop codons   exons            genes
            Sn   Sp   F      Sn   Sp        Sn   Sp              Sn   Sp   F     Sn   #
baseline    100  28   44     0    0         0    0               0    0    0     0    0
H3          53   88   66     0    0         0    0               0    0    0     0    0
H5          65   91   76     1    3         3    3               0    0    0     0    0
H17         81   93   87     34   48        43   37              19   24   21    7    35
H27         83   93   88     40   49        41   36              23   27   25    8    38
H77         88   96   92     66   67        51   46              47   46   46    13   65
H95         92   97   94     79   76        57   53              62   59   60    19   93

Higher-order Markov Models

0th order: P(G)     (each base conditioned on no preceding bases)
1st order: P(G|C)   (each base conditioned on the one preceding base)
2nd order: P(G|AC)  (each base conditioned on the two preceding bases)

(illustrated on the sequence A C G C T A)

$$P_e(g_n \mid g_0...g_{n-1},\,q_j) \approx \frac{C(g_0...g_n,\,q_j)}{\sum_{s\in\Sigma} C(g_0...g_{n-1}s,\,q_j)}$$
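A sketch of this estimator (the counts table C, mapping (n-mer, state) pairs to counts, is a hypothetical structure of mine, as are the names):

    def emission_estimate(C, context, base, state, alphabet='ACGT'):
        """P_e(base | context, state), estimated from n-mer counts:
        C(context+base, state) / sum over s of C(context+s, state)."""
        total = sum(C.get((context + s, state), 0) for s in alphabet)
        if total == 0:
            return None   # context never seen in training; see back-off below
        return C.get((context + base, state), 0) / total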

Higher-order Markov Models

         nucleotides      splice sites   starts/stops   exons            genes
order    Sn   Sp   F      Sn   Sp        Sn   Sp        Sn   Sp   F      Sn   #
0        92   97   94     79   76        57   53        62   59   60     19   93
1        95   98   97     87   81        64   61        72   68   70     25   127
2        98   98   98     91   82        65   62        76   69   72     27   136
3        98   98   98     91   82        67   63        76   69   72     28   140
4        98   97   98     90   81        69   64        76   68   72     29   143
5        98   97   98     90   81        66   62        74   67   70     27   137

(all rows are HOMER version H95, varying the Markov order)

Variable-Order Markov Models

Back-off: use the full-length context only if it was observed often enough in training; otherwise recurse on a shorter context.

$$P_e^{\mathrm{backoff}}(g_n \mid g_0...g_{n-1},\,q_j) = \begin{cases} \dfrac{C(g_0...g_n,\,q_j)}{\sum_{s\in\Sigma} C(g_0...g_{n-1}s,\,q_j)} & \text{if } C(g_0...g_{n-1},\,q_j) \ge K \text{ or } n = 0 \\[2ex] P_e^{\mathrm{backoff}}(g_n \mid g_1...g_{n-1},\,q_j) & \text{otherwise} \end{cases}$$

Interpolation: blend the full-context estimate with the shorter-context estimate.

$$P_e^{\mathrm{IMM}}(s \mid g_0...g_{k-1}) = \begin{cases} \lambda_G\,P_e(s \mid g_0...g_{k-1}) + (1-\lambda_G)\,P_e^{\mathrm{IMM}}(s \mid g_1...g_{k-1}) & \text{if } k > 0 \\ P_e(s) & \text{if } k = 0 \end{cases}$$

$$\lambda_G = \begin{cases} 1 & \text{if } m \ge 400 \\ 0 & \text{if } m < 400 \text{ and } c < 0.5 \\ \dfrac{c}{400}\displaystyle\sum_{x\in\Sigma} C(g_0...g_{k-1}x) & \text{otherwise} \end{cases}$$

where m is the number of times the context g0...gk−1 was observed and c is a confidence score for the full-context estimate.
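The back-off rule translates directly into a recursive sketch (same hypothetical counts table C as above; K and the names are mine):

    def backoff_estimate(C, context, base, state, K=1, alphabet='ACGT'):
        """Use the full context only if it was seen at least K times (or is
        already empty); otherwise drop its oldest base and recurse."""
        total = sum(C.get((context + s, state), 0) for s in alphabet)
        if total >= K or context == '':
            return C.get((context + base, state), 0) / total if total else 0.0
        return backoff_estimate(C, context[1:], base, state, K, alphabet)

The interpolated (IMM) variant replaces this hard cutoff with the λ_G blend of the full-context and shortened-context estimates defined above.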

Interpolation Results

[Figure: performance of the interpolated model.]

Summary

• An HMM is a stochastic generative model which emits sequences.

• Parsing with an HMM can be accomplished by using a decoding algorithm (such as Viterbi) to find the most probable (MAP) path generating the input sequence.

• Training of unambiguous HMMs can be accomplished using labeled sequence training.

• Training of ambiguous HMMs can be accomplished using Viterbi training or the Baum-Welch algorithm (next lesson...).

• Posterior decoding can be used to estimate the probability that a given symbol or substring was generated by a particular state (next lesson...).
