99
RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions MCMC sampling of RNA structures with pseudoknots Dirk Metzler Johann Wolfgang Goethe-Universität Frankfurt am Main Fachbereich Informatik und Mathematik joint work with Markus Nebel, Universität Kaiserslautern

MCMC sampling of RNA structures with pseudoknots · RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions MCMC sampling of RNA structures with

Embed Size (px)

Citation preview

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

MCMC sampling of RNA structureswith pseudoknots

Dirk Metzler

Johann Wolfgang Goethe-Universität Frankfurt am MainFachbereich Informatik und Mathematik

joint work with Markus Nebel, Universität Kaiserslautern

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Outline

1 RNA folding with Stochastic Context-Free GrammarsBasic ModelDynamic Programming

2 RNA folding with PseudoknotsPseudoknotsCombine SCFG with PseudoknotsBayesian Sampling of RNA structures with pseudoknots

3 Performance on DatatmRNASimulated Data

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Outline

1 RNA folding with Stochastic Context-Free GrammarsBasic ModelDynamic Programming

2 RNA folding with PseudoknotsPseudoknotsCombine SCFG with PseudoknotsBayesian Sampling of RNA structures with pseudoknots

3 Performance on DatatmRNASimulated Data

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

tmRNA of Escherichia Coli

GGGGCUGAUUCUGGAUUCGACGGGAUUUGCGAAACCCAAGGUGCAUGCCGAGGGGCGGUUGGCCUCGUAAAAAGCCGCAAAAAAUAGUCGCAAACGACGAAAACUACGCUUUAGCAGCUUAAUAACCUGCUUAGAGCCCUCUCUCCCUAGCCUCCGCUCUUAGGACGGGGAUCAAGAGAGGUCAAACCCAAAAGAGAUCGCGUGGAAGCCCUGCCUGGGGUUGAAGCGUUAAAACUUAAUCAGGCUAGUUUGUUAGUGGCGUGUCCGUCCGCAGCUGGCAAGCGAAUGUAAAGACUGACUAAGCAUGUAGUACCGAGGAUGUAGGAAUUUCGGACGCGGGUUCAACUCCCGCCAGCUCCACCA

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

tmRNA of Escherichia Coli

GGGGCUGAUUCUGGAUUCGACGGGAUUUGCGAAACCCAAGGUGCAUGCCGAGGGGCGGUUGGCCUCGUAAAAAGCCGCAAAAAAUAGUCGCAAACGACGAAAACUACGCUUUAGCAGCUUAAUAACCUGCUUAGAGCCCUCUCUCCCUAGCCUCCGCUCUUAGGACGGGGAUCAAGAGAGGUCAAACCCAAAAGAGAUCGCGUGGAAGCCCUGCCUGGGGUUGAAGCGUUAAAACUUAAUCAGGCUAGUUUGUUAGUGGCGUGUCCGUCCGCAGCUGGCAAGCGAAUGUAAAGACUGACUAAGCAUGUAGUACCGAGGAUGUAGGAAUUUCGGACGCGGGUUCAACUCCCGCCAGCUCCACCA

AG CGGC A C

C U

UUGG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

tmRNA of Escherichia Coli

GGGGCUGAUUCUGGAUUCGACGGGAUUUGCGAAACCCAAGGUGCAUGCCGAGGGGCGGUUGGCCUCGUAAAAAGCCGCAAAAAAUAGUCGCAAACGACGAAAACUACGCUUUAGCAGCUUAAUAACCUGCUUAGAGCCCUCUCUCCCUAGCCUCCGCUCUUAGGACGGGGAUCAAGAGAGGUCAAACCCAAAAGAGAUCGCGUGGAAGCCCUGCCUGGGGUUGAAGCGUUAAAACUUAAUCAGGCUAGUUUGUUAGUGGCGUGUCCGUCCGCAGCUGGCAAGCGAAUGUAAAGACUGACUAAGCAUGUAGUACCGAGGAUGUAGGAAUUUCGGACGCGGGUUCAACUCCCGCCAGCUCCACCA

GGGGCUG

A UUCU

GGA

UUCG

AC

GGGAUUUGCGA

AAC

CCA

AGG

UGC

AU G

CCG

A GG

GGCGGUU

GG

CCUC

GUA

A A AA G C C G C

AAAAAAUAGU C G C A A ACGACGAAAACUACGCUUUA

GCAGCUUA

AUA

ACCUG

CUUAG

A

GCC

CU

CU

CU

CC

CUA

GC

CUC

CG

C U CUU

AGG A

CGGGGAUCAAGAGAGG

UCA

AACCCAA A

AGAGAUCG

CG

UGG

A AG

CC

CUG C

CUG

GG

GU

UGA

AGC

GUUAAA

ACUUAAU

CAGGCUA

G U U U G U U A G UG G C G

U G UCC

GUCCGC

AGCUGGCAAGCGA

AUG

UA

AA

GAC

UGA

CU

AA

GCAUG

UA

GU

A

CCG

AGGAU

GUAGGAA

U U UCG

GACGCGGG U

UC

AACU

CC

CG

CCAGCUCC

ACCA

AG CGGC A C

C U

UUGG

RNAfoldImplementation ofZuker algorithm inVienna RNA Package

Zuker (1989)Zuker et al. (1999)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Stochastic Context-Free Grammar (SCFG)

Terminal SymbolsA,C,G,U

Non-Terminal Symbols

S, L, F

Rules with Probabilities

LS xLSy xր ր ր

S → L F → xFy L → axFyb

with x , y , a, b ∈ A,C,G,U

forbidden: xFy → xLy

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Stochastic Context-Free Grammar (SCFG)

Terminal SymbolsA,C,G,U

Non-Terminal Symbols

S, L, F

Rules with Probabilities

LS xLSy xր ր ր

S → L F → xFy L → axFyb

with x , y , a, b ∈ A,C,G,U

forbidden: xFy → xLy

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

S

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

S

S−>LS

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

L L L L L L L L L L L L L L L L L L S

S−>LS

L L L

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

L

S−>x

LL

L−>x

L L L L L L L L L L L L L L L L L L S

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

L

S−>x

LL

L−>x

a c g u a a g a u u a u g g c a u u a

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u aL L L

L−>axFyb

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u a

L−>axFybu a c g

g uauuacg

F FF

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u aua

g cu a

auc g

ugFFF F−>xFy

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u ac

u aau

c gug

F−>xFyF F F

uaga u a u g c

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u ac

u aau

c gug

F−>xFyua

F

F

F

ga u a u g ca

a cc c

gg

gg

uu

uu a

u a

u a

a u

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u ac

u aau

c gug

uaga u a u g ca

a cc c

gg

gg

uu

uu a

u a

u a

a uF

F

F

F−>xLSy

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u ac

u aau

c gug

uaga u a u g ca

a cc c

gg

gg

uu

uu a

a

u a

a u

F−>xLSy

L S

L S

L S

ua u

g c

c g

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u ac

u aau

c gug

uaga u a u g ca

a cc c

gg

gg

uu

uu a

a

u a

a u

ua uL

g cL

c gL

S−>LS

S

S

S

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u ac

u aau

c gug

uaga u a u g ca

a cc c

gg

gg

uu

uu a

a

u a

a u

ua u

g c

c g

S−>LS

S

S

SL

LLL L

LLLLL L L LL

L

L L

LL

L

LLL

LL LLLL

LL

L

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u ac

u aau

c gug

uaga u a u g ca

a cc c

gg

gg

uu

uu a

a

u a

a u

ua u

g c

c g

S−>x

acgu u ccg

cuc

ucgu c g u

c

uucuucg

c a aa

gcL

LL

L−>xL−>axFyb

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

a c g u a a g a u u a u g g c a u u ac

u aau

c gug

uaga u a u g ca

a cc c

gg

gg

uu

uu a

a

u a

a u

ua u

g c

c gac

gu u ccg

cuc

cgu c g u

uc

cuucg

c a a

u ucg

a

uuu

ua

a gg

aagu

cg

au

ccau

caa a

u

u

gg

aaaa

cc

ccug a

u cc

u ccug

ucg

gu

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Generating RNA structure from SCFG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Outline

1 RNA folding with Stochastic Context-Free GrammarsBasic ModelDynamic Programming

2 RNA folding with PseudoknotsPseudoknotsCombine SCFG with PseudoknotsBayesian Sampling of RNA structures with pseudoknots

3 Performance on DatatmRNASimulated Data

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Computing the Probability of a sequence

For given S = (s1, . . . , sn) ∈ a,c,g,un and probabilities ofgrammar rules

LS xLSy xր ր ր

S → L F → xFy L → axFyb

compute the probability that S is transformed into S.

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Computing the Probability of a sequence:partial problems

Φij(X ) := Pr(X is transformed into si , . . . , sj)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Computing the Probability of a sequence:partial problems

Φij(X ) := Pr(X is transformed into si , . . . , sj)

Aim: compute Φ1n(S) !

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Computing the Probability of a sequence:partial problems

Φij(X ) := Pr(X is transformed into si , . . . , sj)

Aim: compute Φ1n(S) !

Φij(F ) = Φi+1,j−1(F ) · Pr(F → xFy) · πsi sj

+∑

k

πsi sj · Pr(F → xLSy) · Φi+1,k (L) · Φk+1,j−1(S)

F

si sj

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Computing the Probability of a sequence:partial problems

Φij(X ) := Pr(X is transformed into si , . . . , sj)

Aim: compute Φ1n(S) !

Φij(F ) = Φi+1,j−1(F ) · Pr(F → xFy) · πsi sj

+∑

k

πsi sj · Pr(F → xLSy) · Φi+1,k (L) · Φk+1,j−1(S)

FF

si si+1 sjsj−1

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Computing the Probability of a sequence:partial problems

Φij(X ) := Pr(X is transformed into si , . . . , sj)

Aim: compute Φ1n(S) !

Φij(F ) = Φi+1,j−1(F ) · Pr(F → xFy) · πsi sj

+∑

k

πsi sj · Pr(F → xLSy) · Φi+1,k (L) · Φk+1,j−1(S)

F

si si+1 sjsj−1sk sk+1

L S

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Sampling a Structure from Posterior in SCFG model

Sample according to contribution to Φij(X )

S

s1 sn

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Sampling a Structure from Posterior in SCFG model

Sample according to contribution to Φij(X )

S

SL

s1 snsksk+1Pr(S → LS) · Φ1k (L) · Φk+1,n(S)

Φ1n(S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Sampling a Structure from Posterior in SCFG model

Sample according to contribution to Φij(X )

Pr(L → axFyb)·

·Φ3,k−2(F ) · πs1sk πs2sk−1

Φik (L)

S

S

SLF

shs3 sksk+1sk−2 sh+1

L

s1 sn

Pr(S → LS)·

·Φ1k (S) · Φk+1,n(L)

Φk+1,n(S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Sampling a Structure from Posterior in SCFG model

Sample according to contribution to Φij(X )

Pr(L → axFyb)·

·Φ3,k−2(F ) · πs1sk πs2sk−1

Φik (L)

S

S

SLF

shs3 sksk+1sk−2 sh+1

L

s1 sn

Pr(S → LS)·

·Φ1k (S) · Φk+1,n(L)

Φk+1,n(S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Sampling a Structure from Posterior in SCFG model

Parse TreeS S

S

S

SS

S

S

S

F

F

F

s2s3s4 s5 s6s7s8s9s10 sn−1. . .

L LL

L

LL

L

L

s1 sn

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Sampling a Structure from Posterior in SCFG model

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Sampling a Structure from Posterior in SCFG model

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Outline

1 RNA folding with Stochastic Context-Free GrammarsBasic ModelDynamic Programming

2 RNA folding with PseudoknotsPseudoknotsCombine SCFG with PseudoknotsBayesian Sampling of RNA structures with pseudoknots

3 Performance on DatatmRNASimulated Data

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Pseudoknots

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Pseudoknots

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Structure of Escherichia Coli tmRNA

picture stolen from tmRNA websitehttp://www.indiana.edu/∼tmrna/

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Structure of Escherichia Coli tmRNA

picture stolen from tmRNA websitehttp://www.indiana.edu/∼tmrna/

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Structure of Escherichia Coli tmRNA

RNAfold estimationGGGGCUG

A UUCU

GGA

UUCG

AC

GGGAUUUGCGA

AAC

CCA

AGG

UGC

AU G

CCG

A GG

GGCGGUU

GG

CCUC

GUA

A A AA G C C G C

AAAAAAUAGU C G C A A ACGACGAAAACUACGCUUUA

GCAGCUUA

AUA

ACCUG

CUUAG

A

GCC

CU

CU

CU

CC

CUA

GC

CUC

CG

C U CUU

AGG A

CGGGGAUCAAGAGAGG

UCA

AACCCAA A

AGAGAUCG

CG

UGG

A AG

CC

CUG C

CUG

GG

GU

UGA

AGC

GUUAAA

ACUUAAU

CAGGCUA

G U U U G U U A G UG G C G

U G UCC

GUCCGC

AGCUGGCAAGCGA

AUG

UA

AA

GAC

UGA

CU

AA

GCAUG

UA

GU

A

CCG

AGGAU

GUAGGAA

U U UCG

GACGCGGG U

UC

AACU

CC

CG

CCAGCUCC

ACCA

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Structure of Escherichia Coli tmRNA

RNAfold estimation Given on tmRNA WebsiteGGGGCUG

A UUCU

GGA

UUCG

AC

GGGAUUUGCGA

AAC

CCA

AGG

UGC

AU G

CCG

A GG

GGCGGUU

GG

CCUC

GUA

A A AA G C C G C

AAAAAAUAGU C G C A A ACGACGAAAACUACGCUUUA

GCAGCUUA

AUA

ACCUG

CUUAG

A

GCC

CU

CU

CU

CC

CUA

GC

CUC

CG

C U CUU

AGG A

CGGGGAUCAAGAGAGG

UCA

AACCCAA A

AGAGAUCG

CG

UGG

A AG

CC

CUG C

CUG

GG

GU

UGA

AGC

GUUAAA

ACUUAAU

CAGGCUA

G U U U G U U A G UG G C G

U G UCC

GUCCGC

AGCUGGCAAGCGA

AUG

UA

AA

GAC

UGA

CU

AA

GCAUG

UA

GU

A

CCG

AGGAU

GUAGGAA

U U UCG

GACGCGGG U

UC

AACU

CC

CG

CCAGCUCC

ACCA

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Structure of Escherichia Coli tmRNA

0 100 200 300

Given on tmRNA Webpage

Estimated by RNAfold

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Outline

1 RNA folding with Stochastic Context-Free GrammarsBasic ModelDynamic Programming

2 RNA folding with PseudoknotsPseudoknotsCombine SCFG with PseudoknotsBayesian Sampling of RNA structures with pseudoknots

3 Performance on DatatmRNASimulated Data

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Models with restricted types of pseudoknots

L. Cai, R. L. Malmberg and Y. Wu. (2003)Stochastic modeling of RNA pseudoknotted structures: agramatical approach. Bioinformatics 19: i66-i73.

E. Rivas and S. R. Eddy. (1999)A dynamic programming algorithm for RNA structureprediction including pseudoknots. J. Mol. Biol.285:2053-2068.

J. Reeder and R. Giegerich.(2004)Design, implementation and evaluation of a practicalpseudoknot folding algorithm based on thermodynamics.BMC Bioinformatics, 5:104.

... and several others

restrict the way pseuknots may intersect

compute THE BEST Structure for given RNA sequence

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Aims for our Pseudoknot Grammar / Method

a priori no restrictions on pseudoknot interactions

sampling structures from posterior distributions

efficient if high number of pseudoknots is unlikely

rather prior distribution than biologically meaningful model

simplicity!

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Aims for our Pseudoknot Grammar / Method

a priori no restrictions on pseudoknot interactions

sampling structures from posterior distributions

efficient if high number of pseudoknots is unlikely

rather prior distribution than biologically meaningful model

simplicity!

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Aims for our Pseudoknot Grammar / Method

a priori no restrictions on pseudoknot interactions

sampling structures from posterior distributions

efficient if high number of pseudoknots is unlikely

rather prior distribution than biologically meaningful model

simplicity!

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Aims for our Pseudoknot Grammar / Method

a priori no restrictions on pseudoknot interactions

sampling structures from posterior distributions

efficient if high number of pseudoknots is unlikely

rather prior distribution than biologically meaningful model

simplicity!

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Aims for our Pseudoknot Grammar / Method

a priori no restrictions on pseudoknot interactions

sampling structures from posterior distributions

efficient if high number of pseudoknots is unlikely

rather prior distribution than biologically meaningful model

simplicity!

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Combine SCFG with pseudoknots

LS xLSy xր ր ր

S → L F → xFy L → axFybց

Q

1 SCFG RNA with Q-Symbols2 random mating of Q-symbols3 Q-Q-pairs produce stems

QQQQQQQ

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Combine SCFG with pseudoknots

LS xLSy xր ր ր

S → L F → xFy L → axFybց

Q

1 SCFG RNA with Q-Symbols2 random mating of Q-symbols3 Q-Q-pairs produce stems

QQQQQQQ

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Combine SCFG with pseudoknots

LS xLSy xր ր ր

S → L F → xFy L → axFybց

Q

1 SCFG RNA with Q-Symbols2 random mating of Q-symbols3 Q-Q-pairs produce stems

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Notations

S: given Sequence

Q: Configuration of Q-stems

Ψ: SCFG Parse Tree

Ω = [Ψ,Q]: Stucture = (i , j) | Positions i and j are paired .

θ: Model parameters

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Notations

S: given Sequence

Q: Configuration of Q-stems

Ψ: SCFG Parse Tree

Ω = [Ψ,Q]: Stucture = (i , j) | Positions i and j are paired .

θ: Model parameters

Aims:

ComputeLS(θ) = Prθ(S) =

Ψ,Q Prθ(S | Q, Ψ) · Prθ(Ψ) · Prθ(Q | Ψ)

Sample RNA Structure according toPr(Ω | S) =

Ψ,Q : [Ψ,Q]=Ω Pr(Ψ,Q | S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Outline

1 RNA folding with Stochastic Context-Free GrammarsBasic ModelDynamic Programming

2 RNA folding with PseudoknotsPseudoknotsCombine SCFG with PseudoknotsBayesian Sampling of RNA structures with pseudoknots

3 Performance on DatatmRNASimulated Data

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

For fixed Q doable by dynamic programming

Compute Pr(Q | S) =∑

Ψ Pr(Ψ,Q | S)

Sample SCFG Parse Tree Ψ according to Pr(Ψ | Q,S)

Sample Structure Ω according to Pr(Ω | Q,S)

Computearg max

ΨPr(Ψ | Q,S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

For fixed Q doable by dynamic programming

Compute Pr(Q | S) =∑

Ψ Pr(Ψ,Q | S)

Sample SCFG Parse Tree Ψ according to Pr(Ψ | Q,S)

Sample Structure Ω according to Pr(Ω | Q,S)

Computearg max

ΨPr(Ψ | Q,S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

For fixed Q doable by dynamic programming

Compute Pr(Q | S) =∑

Ψ Pr(Ψ,Q | S)

Sample SCFG Parse Tree Ψ according to Pr(Ψ | Q,S)

Sample Structure Ω according to Pr(Ω | Q,S)

Computearg max

ΨPr(Ψ | Q,S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

For fixed Q doable by dynamic programming

Compute Pr(Q | S) =∑

Ψ Pr(Ψ,Q | S)

Sample SCFG Parse Tree Ψ according to Pr(Ψ | Q,S)

Sample Structure Ω according to Pr(Ω | Q,S)

Computearg max

ΨPr(Ψ | Q,S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Bayesian sampling of Ω

Strategy for sampling RNA structure Ω according to itsposterior probability Pr(Ω | S) for given RNA sequence S:

1 Sample Qi according to Pr(Q | S) by Markov-Chain MonteCarlo (MCMC) Method.

2 Sample Ψi according to Pr(Ψ | Qi ,S) by dynamicprogramming.

3 Then Ωi = [Ψi ,Qi ] is sample according to

Pr(Ω | S) =∑

Ψ,Q : [Ψ,Q]=Ω

Pr(Ψ,Q | S)

=∑

Ψ,Q : [Ψ,Q]=Ω

Pr(Ψ | Q,S) · Pr(Q | S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Bayesian sampling of Ω

Strategy for sampling RNA structure Ω according to itsposterior probability Pr(Ω | S) for given RNA sequence S:

1 Sample Qi according to Pr(Q | S) by Markov-Chain MonteCarlo (MCMC) Method.

2 Sample Ψi according to Pr(Ψ | Qi ,S) by dynamicprogramming.

3 Then Ωi = [Ψi ,Qi ] is sample according to

Pr(Ω | S) =∑

Ψ,Q : [Ψ,Q]=Ω

Pr(Ψ,Q | S)

=∑

Ψ,Q : [Ψ,Q]=Ω

Pr(Ψ | Q,S) · Pr(Q | S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Bayesian sampling of Ω

Strategy for sampling RNA structure Ω according to itsposterior probability Pr(Ω | S) for given RNA sequence S:

1 Sample Qi according to Pr(Q | S) by Markov-Chain MonteCarlo (MCMC) Method.

2 Sample Ψi according to Pr(Ψ | Qi ,S) by dynamicprogramming.

3 Then Ωi = [Ψi ,Qi ] is sample according to

Pr(Ω | S) =∑

Ψ,Q : [Ψ,Q]=Ω

Pr(Ψ,Q | S)

=∑

Ψ,Q : [Ψ,Q]=Ω

Pr(Ψ | Q,S) · Pr(Q | S)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Markov-Chain Monte Carlo (MCMC)

MCMC: construct Markov chain Q0,Q1,Q2, ... with stationarydistribution Pr(Q | S) and let it converge.

Metropolis-Hastings:Given current state Qi propose Q′ with Prob. p(Qi → Q′)Accept Qi+1 := Q′ with probability

min

1,p(Q′ → Qi) · Pr(Q′ | S)

p(Qi → Q′) · Pr(Qi | S)

otherwise Qi+1 := Qi

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Markov-Chain Monte Carlo (MCMC)

MCMC: construct Markov chain Q0,Q1,Q2, ... with stationarydistribution Pr(Q | S) and let it converge.

Metropolis-Hastings:Given current state Qi propose Q′ with Prob. p(Qi → Q′)Accept Qi+1 := Q′ with probability

min

1,p(Q′ → Qi) · Pr(Q′ | S)

p(Qi → Q′) · Pr(Qi | S)

otherwise Qi+1 := Qi

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Proposals for Qi+1

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Proposals for Qi+1

?

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Proposals for Qi+1

?

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Candidates for Pseudoknots

0 50 100 150 200 250 300 350

tmRNA of rice bacterium

position

0.05−0.20.2 −0.40.4 −0.60.6 −0.80.8−0.1>=1.0

SCFG stems

HSP scores

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Candidates for Pseudoknots

0 50 100 150 200 250 300 350

tmRNA of rice bacterium

position

0.05−0.20.2 −0.40.4 −0.60.6 −0.80.8−0.1>=1.0

weighted scores

SCFG stems

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Weight for Q-stem proposal

Proposal probability for HSP ∝1 − e(alignment score)·c1

max(SCFG stem probability), c2

c1 = 10−6, c2 = 10−5

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Looking for Optima

We search forarg max

[Ψ,Q]Pr([Ψ,Q] | S)

by simulated annealing.

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Outline

1 RNA folding with Stochastic Context-Free GrammarsBasic ModelDynamic Programming

2 RNA folding with PseudoknotsPseudoknotsCombine SCFG with PseudoknotsBayesian Sampling of RNA structures with pseudoknots

3 Performance on DatatmRNASimulated Data

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Model parameter values used

LS 0.85 xLSy 0.25 x 0.9ր ր ր

S F L → axFyb 0.06ց ց ց

L 0.15 xFy 0.75 Q 0.04

πCG = πGC = 27.5%πAT = πTA = 17.5%πGU = πUG = 5%

πA = 35%πC = 20%πG = 20%πU = 25%

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

0 50 100 150 200 250 300 350

Treponema pallidum pre−tmRNAposterior vs. most probable

position

0.05−0.20.2 −0.40.4 −0.60.6 −0.80.8−0.1>=1.0

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

0 50 100 150 200 250 300 350

Treponema pallidum pre−tmRNAposterior vs. known

position

0.05−0.20.2 −0.40.4 −0.60.6 −0.80.8−0.1>=1.0

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

0 50 100 150 200 250 300 350

Treponema pallidum pre−tmRNApredictions vs. real

position

pknotsRG

RNAfold & pknotsRGAll

McQFoldRNAfold

McQFold & RNAfoldMcQFold & pknotsRG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

0 50 100 150 200 250 300 350

tmRNA of rice bacteriumposterior vs. most probable

position

0.05−0.20.2 −0.40.4 −0.60.6 −0.80.8−1.0>=1.0

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

0 50 100 150 200 250 300 350

tmRNA of rice bacteriumposterior vs. known

position

0.05−0.20.2 −0.40.4 −0.60.6 −0.80.8−0.1>=1.0

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

0 50 100 150 200 250 300 350

tmRNA of rice bacteriumpredictions vs. known

position

RNAfoldpknotsRG

RNAfold & pknotsRGAll

McQFold

McQFold & RNAfoldMcQFold & pknotsRG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Comparison with RNAfold and pknotsRG

351 tmRNA Sequeces of length> 200 fromhttp://www.indiana.edu/∼tmrna/

estimated estimatedto be paired not to be paired

in fact paired A ain fact not paired B b

correctness: A/(A + B)sensitivity: A/(A + a)

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

RNAfold pknotsRG RNAfold pknotsRG

0.0

0.2

0.4

0.6

0.8

1.0

tmRNAcorrectness sensitivity

McQFold McQFold

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

estimated pairing probabilities

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

tmRNA

estimated probabilities

obse

rved

rel

ativ

e fr

eque

ncie

s

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Outline

1 RNA folding with Stochastic Context-Free GrammarsBasic ModelDynamic Programming

2 RNA folding with PseudoknotsPseudoknotsCombine SCFG with PseudoknotsBayesian Sampling of RNA structures with pseudoknots

3 Performance on DatatmRNASimulated Data

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Simulated Data

200 folded sequences of length 300-460

generated according to our model

same parameters as above

McQFold uses same parameters

of course unfair against RNAfold and pknotsRG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Simulated Data

200 folded sequences of length 300-460

generated according to our model

same parameters as above

McQFold uses same parameters

of course unfair against RNAfold and pknotsRG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Simulated Data

200 folded sequences of length 300-460

generated according to our model

same parameters as above

McQFold uses same parameters

of course unfair against RNAfold and pknotsRG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Simulated Data

200 folded sequences of length 300-460

generated according to our model

same parameters as above

McQFold uses same parameters

of course unfair against RNAfold and pknotsRG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Simulated Data

200 folded sequences of length 300-460

generated according to our model

same parameters as above

McQFold uses same parameters

of course unfair against RNAfold and pknotsRG

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

RNAfold pknotsRG RNAfold pknotsRG

0.0

0.2

0.4

0.6

0.8

1.0

Simulated Datacorrectness sensitivity

McQFold McQFold

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Simulated Data

estimated probabilities

obse

rved

rel

ativ

e fr

eque

ncie

s

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Links

D. Metzler, M. Nebel (2006) Predicting RNA SecondaryStructures with Pseudoknots by MCMC Samplingsubmitted

Preprint:www.cs.uni-frankfurt.de/∼metzler/McQFold/McQFold.pdf

Homepage of McQFold Softwarewww.cs.uni-frankfurt.de/∼metzler/McQFold/

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Conclusions and Future Plans

Estimation of RNA structure from sequence can be veryuncertain.

Uncertainty should be assessed. This can be done byBayesian sampling.

Perhaps better to combine informations from relatedsequences.

Combine SCFGs with Q-stems to allow pseudoknots instructural alignments and/or structural sequence profiles.

RNA folding with SCFGs RNA folding with Pseudoknots Performance on Data Links Conclusions

Conclusions and Future Plans

Estimation of RNA structure from sequence can be veryuncertain.

Uncertainty should be assessed. This can be done byBayesian sampling.

Perhaps better to combine informations from relatedsequences.

Combine SCFGs with Q-stems to allow pseudoknots instructural alignments and/or structural sequence profiles.

Thanks: Martina Fröhlich, Christian Färber, Markus Nebel,audience