Rapid ab initio RNA Folding Including Pseudoknots via Graph Tree Decomposition

Jizhen Zhao, Liming Cai, Russell Malmberg
Computer Science, Plant Biology
University of Georgia

Why another RNA folding algorithm?

• The need for RNA analysis tools has grown with the number of recently discovered functional RNAs (ncRNAs).

• RNA folding algorithms are not completely satisfactory in spite of having been intensively studied for more than 25 years.

Increased number of ncRNAs

• ncRNAs function as something other than protein-coding messages, e.g., as structural, catalytic, and regulatory factors

• ncRNA genes lack strong statistical signals such as ORFs or polyadenylation

• However, transcribed ncRNA molecules can fold into stable (and unique) secondary or tertiary structures

Increased number of ncRNAs

• rRNAs and tRNAs

• RNA maturation: snRNAs recognizing splice sites

• RNA modification: snoRNAs converting uridine to pseudouridine

• Regulation of gene expression and translation: e.g., miRNAs

• DNA replication: e.g., telomerase RNA, the template for adding telomeric repeats

• Etc.

Located in introns, intergenic regions, or 5’ and 3’ UTRs

Increased number of ncRNAs (Bompfünewerer et al., 2005)

Class | Size (nt) | Function | Phylogenetic distribution
tRNA | 70-80 | translation | ubiquitous
rRNA 16S/18S | 1.5K | translation | ubiquitous
rRNA 28S+5.8S/23S | 3K | translation | ubiquitous
rRNA 5S | 130 | translation | ubiquitous
RNase P / RNase MRP | 220-440 / 250-350 | tRNA maturation | ubiquitous / Eukarya
snoRNA | 130 | pseudouridinylation |
telomerase RNA | 400-550 | addition of telomeric repeats |
snRNA U1-U6 | 100-600 / 130-140 | spliceosome, mRNA maturation | Eukarya / Eukarya, Archaea
U7 | ~65 | histone mRNA maturation | eukaryotes
7SK | ~300 | translational regulation | vertebrates
tmRNA | 300-400 | tags proteins for proteolysis | Bacteria
miRNA | ~22 | post-transcriptional regulation | multicellular organisms

Long history of RNA folding

• First simple RNA folding algorithm (Nussinov 1978)

• Thermodynamics-based (Zuker & Stiegler 1981)

• Zuker (1989)

• mFOLD 3.2

• RNAfold (part of the Vienna RNA Package 1.6.1)

Limitations:

• Not very accurate on a single sequence

• High computational cost inherent in DP

• Unable to predict pseudoknots

Background

• Base pairings allow RNA to fold

Watson-Crick base pairs: A-U, C-G

Wobble pair: G-U

Non-canonical pairs are also possible
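The pairing rule above can be written down directly. This is a minimal Python sketch. It assumes that only the Watson-Crick and G-U wobble pairs count, so non-canonical pairs are ignored. It also assumes a hypothetical minimum hairpin loop of 3 unpaired bases (`min_loop`). The names `can_pair` and `pairable_positions` are illustrative and do not come from any published tool.

```python
# Watson-Crick pairs (A-U, C-G) plus the G-U wobble pair, in both orientations.
# Non-canonical pairs are deliberately left out of this sketch.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"),
                   ("G", "C"), ("G", "U"), ("U", "G")}


def can_pair(x: str, y: str) -> bool:
    """Return True if bases x and y form a Watson-Crick or wobble pair."""
    return (x.upper(), y.upper()) in CANONICAL_PAIRS


def pairable_positions(seq: str, min_loop: int = 3) -> list[tuple[int, int]]:
    """List every (i, j), i < j, whose bases can pair while leaving at least
    min_loop unpaired bases between them (min_loop is an assumed constraint)."""
    n = len(seq)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_loop + 1, n)
            if can_pair(seq[i], seq[j])]


print(pairable_positions("GAAAC"))  # [(0, 4)]: only the G-C pair qualifies
```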

[Figure: an RNA strand, 5’-u-u-c-c-g-a-a-g-c-u-c-a-a-c-g-g-g-a-a-a-u-g-a-g-c-u-3’, drawn with its phosphate (P) backbone, alongside the chemical structures of the bases cytosine, guanine, uracil, and adenine.]

Secondary structure is important to tertiary structure

[Figure: RNA secondary structure elements: stem, hairpin loop, bulge loop, interior loop, junction (multiloop), single-stranded region, and pseudoknot. Image: Wuchty.]

[Figure: a short RNA sequence folded into stems and loops.]

Stem (double helix): stacked base pairs

Loop: strand of unpaired bases
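The stem and loop definitions can be made concrete with a small parser. This sketch assumes the structure is written in dot-bracket notation, which the slides do not use: '(' and ')' mark paired bases and '.' marks unpaired ones. The parser groups stacked pairs (i, j), (i+1, j-1), ... into stems, recorded as hypothetical (i, j, length) triples. Both function names are illustrative.

```python
def pairs_from_dot_bracket(db: str) -> list[tuple[int, int]]:
    """Parse '(' / ')' / '.' notation into sorted base pairs (i, j)."""
    open_positions, pairs = [], []
    for j, ch in enumerate(db):
        if ch == "(":
            open_positions.append(j)
        elif ch == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((open_positions.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


def stems(pairs: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Group stacked pairs (i, j), (i+1, j-1), ... into stems (i, j, length)."""
    pair_set = set(pairs)
    result = []
    for i, j in sorted(pairs):
        if (i - 1, j + 1) in pair_set:
            continue  # an outer pair already starts this stem
        length = 1
        while (i + length, j - length) in pair_set:
            length += 1
        result.append((i, j, length))
    return result


# A bulge (the lone '.') splits the helix into two stems.
print(stems(pairs_from_dot_bracket("((.((...))))")))  # [(0, 11, 2), (3, 9, 2)]
```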

[Figure: a folded RNA sequence in which two stems cross.]

Pseudoknots: crossing patterns of stems
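"Crossing" has a precise meaning. Take two base pairs (i, j) and (k, l) with i < k. They cross when i < k < j < l. This sketch applies that test to stems written as hypothetical (i, j, length) triples. It assumes the two stems share no nucleotide. In that case every pair of one stem relates to every pair of the other in the same way, so comparing the outermost pairs is enough. The function names are illustrative.

```python
def pairs_cross(p: tuple[int, int], q: tuple[int, int]) -> bool:
    """Pairs (i, j) and (k, l), ordered so that i <= k, cross iff i < k < j < l."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def stems_cross(s: tuple[int, int, int], t: tuple[int, int, int]) -> bool:
    """Stems are (i, j, length) triples. For two stems that share no nucleotide,
    all pairs of one stem relate to all pairs of the other in the same way
    (nested, disjoint, or crossing), so testing the outermost pairs suffices."""
    return pairs_cross(s[:2], t[:2])


# An H-type pseudoknot: the second stem opens inside the first and closes after it.
print(stems_cross((0, 9, 2), (4, 13, 2)))  # True
print(stems_cross((0, 13, 2), (4, 9, 2)))  # False (nested)
```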

tmRNA: terminates erroneous translation

Bacterial tmRNA consensus structure (Felden et al. 2001, NAR 29)

Pseudoknots in TMV 3’ UTR

• Promotes efficient translation

• Binds EF1A; cooperates with the 5’ UTR

(Leathers et al. 1993, MCB 13; Zeenko et al. 2002, JVI 76)

Previous work (Nussinov’s)

• Maximizes the number of base pairs (Nussinov et al., 1978)

• In the simplest scoring, each complementary pair (i, j) contributes δ(i, j) = 1
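The classic Nussinov recurrence fills a table N(i, j), the maximum number of pairs in the subsequence i..j:

N(i, j) = max{ N(i+1, j), N(i, j-1), N(i+1, j-1) + δ(i, j), max over i < k < j of N(i, k) + N(k+1, j) }

Here δ(i, j) = 1 when bases i and j can pair and 0 otherwise. The Python sketch below implements that recurrence in O(n^3) time and traces back one optimal structure. Two details are assumptions of the sketch, not details given on the slide: the Watson-Crick/G-U pairing rule and the minimum hairpin loop of 3 (`min_loop`).

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, list[tuple[int, int]]]:
    """Maximize the number of base pairs (Nussinov-style DP, no pseudoknots)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, []
    # N[i][j] = max number of pairs in seq[i..j]; spans <= min_loop stay 0.
    N = [[0] * n for _ in range(n)]

    def delta(i: int, j: int) -> int:
        return 1 if (seq[i], seq[j]) in PAIRS and j - i > min_loop else 0

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1], N[i + 1][j - 1] + delta(i, j))
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best

    # Traceback: recover one structure that achieves N[0][n-1].
    pairs, todo = [], [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if i >= j:
            continue
        if N[i][j] == N[i + 1][j]:
            todo.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            todo.append((i, j - 1))
        elif delta(i, j) and N[i][j] == N[i + 1][j - 1] + 1:
            pairs.append((i, j))
            todo.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    todo.extend([(i, k), (k + 1, j)])
                    break
    return N[0][n - 1], sorted(pairs)


print(nussinov("GGGAAAUCC"))  # (3, [(0, 8), (1, 7), (2, 6)])
```

Maximizing pair count ignores stacking and loop energies entirely. That gap is what the energy-based methods on the following slides address.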

Previous work (Zuker’s)

• Thermodynamic, energy-based method (Zuker and Stiegler 1981)

• Energy minimization algorithm: find the secondary structure that minimizes the free energy (ΔG)

ΔG is calculated as the sum of the individual contributions of:
– loops
– base pairs
– secondary structure elements

Previous work (Zuker’s)

• Free-energy values in kcal/mol at 37 °C

• Energies of stems calculated as stacking contributions between neighboring base pairs
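This sketch shows how a stem's energy is summed over neighboring (stacked) base pairs. The (i, j, length) stem layout and the function name are hypothetical. The numbers in `TOY_STACKS` are made-up placeholders for illustration only. A real implementation would load measured nearest-neighbor (Turner) stacking parameters instead.

```python
def stem_energy(seq: str, stem: tuple[int, int, int],
                stack_energy: dict[tuple[str, str], float]) -> float:
    """Sum stacking free energies over consecutive pairs of a stem.

    A stem (i, j, length) consists of the pairs (i, j), (i+1, j-1), ...;
    each stack is keyed by (outer pair, inner pair), where a pair is written
    as base i followed by its partner at j, e.g. "GC".
    """
    i, j, length = stem
    total = 0.0
    for x in range(length - 1):
        outer = seq[i + x] + seq[j - x]
        inner = seq[i + x + 1] + seq[j - x - 1]
        total += stack_energy[(outer, inner)]  # KeyError if a stack is missing
    return total


# Placeholder values in kcal/mol: NOT real thermodynamic parameters.
TOY_STACKS = {("GC", "GC"): -3.0, ("GC", "GU"): -1.5}

print(stem_energy("GGGAAAUCC", (0, 8, 3), TOY_STACKS))  # -4.5 with these toy values
```

A stem of length L contributes L - 1 stacking terms, so a single isolated pair scores 0.0 in this sketch.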

mFOLD: computing loop-dependent energies

Difficult issues

• Energy models assume that the energy at any position is influenced only by the local sequence and structure

• mFOLD does not predict pseudoknots

• PKnots (Rivas and Eddy 1999) predicts restricted classes of pseudoknots in O(n^6) time and O(n^4) space

• Minimum-energy pseudoknot prediction is NP-hard (Lyngsø and Pedersen 2000)

Pseudoknots drastically increase the complexity

Heuristic RNA folding algorithms: ILM (Ruan et al. 2004), HotKnots (Ren et al. 2005)

• Usually fast, but sometimes slow

• Handle an unlimited class of pseudoknots

• Do not guarantee the optimality of the predicted structure

This work

• Graph-theoretic; avoids nucleotide-level DP

• Unlimited pseudoknot structures

• Optimal solutions

• Fast

• Comparable performance in accuracy

This work (summary)

1. Model: similar to ILM, but without loop energies

2. Approach: find all stable stems, construct a stem graph, and reduce folding to an independent set problem

3. Techniques: tree-decompose the stem graph, then run DP over the tree decomposition to obtain the optimal solution
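Step 2 can be sketched in Python under some loud assumptions. A "stable" stem is approximated here by a maximal run of at least `min_len` stacked Watson-Crick/G-U pairs. The slides tie stability to energy, which this sketch does not model. Two stems are joined by an edge in the stem graph when they share a nucleotide. The names `find_stems`, `nucleotides`, and `stem_graph` are hypothetical, as is the (i, j, length) stem layout.

```python
from itertools import combinations

# Watson-Crick and G-U wobble pairs; non-canonical pairs are ignored here.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def find_stems(seq: str, min_len: int = 3,
               min_loop: int = 3) -> list[tuple[int, int, int]]:
    """Return maximal stems (i, j, length) of at least min_len stacked pairs.

    min_len is a hypothetical stand-in for an energy-based stability cutoff.
    """
    seq = seq.upper()
    n = len(seq)

    def ok(i: int, j: int) -> bool:
        # (i, j) is in range, encloses at least min_loop bases, and can pair.
        return 0 <= i < j < n and j - i > min_loop and (seq[i], seq[j]) in PAIRS

    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not ok(i, j) or ok(i - 1, j + 1):
                continue  # not a pair, or not the outermost pair of its run
            length = 1
            while ok(i + length, j - length):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def nucleotides(stem: tuple[int, int, int]) -> set[int]:
    """Positions covered by both arms of a stem."""
    i, j, length = stem
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))


def stem_graph(stems: list[tuple[int, int, int]]) -> set[tuple[int, int]]:
    """Edges (a, b), a < b, between stem indices whose nucleotides overlap."""
    return {(a, b) for a, b in combinations(range(len(stems)), 2)
            if nucleotides(stems[a]) & nucleotides(stems[b])}


stems = find_stems("GGGAAAUCC", min_len=2)
print(stems)                      # [(0, 7, 2), (0, 8, 3), (1, 8, 2)]
print(sorted(stem_graph(stems)))  # [(0, 1), (0, 2), (1, 2)]: all three overlap
```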

This work (approach)

A set of non-overlapping stems corresponds to an independent set of the stem graph.

The weight of each vertex is related to the energy of the corresponding stem.
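On a small stem graph, this correspondence can be checked by brute force. In this sketch the vertex weights are assumed to be the negated stem free energies, so a more stable stem gets a larger weight. The weights used below are illustrative. The exhaustive search is exponential in the number of stems, which is why the approach relies on a tree decomposition instead. The function name is hypothetical.

```python
from itertools import combinations


def max_weight_independent_set(weights: list[float],
                               edges: set[tuple[int, int]]
                               ) -> tuple[float, tuple[int, ...]]:
    """Exhaustively find a maximum-weight independent set (small graphs only)."""
    conflict = {frozenset(e) for e in edges}
    best_weight, best_set = 0.0, ()
    for r in range(1, len(weights) + 1):
        for subset in combinations(range(len(weights)), r):
            # Skip subsets containing two adjacent (overlapping) stems.
            if any(frozenset(p) in conflict for p in combinations(subset, 2)):
                continue
            w = sum(weights[v] for v in subset)
            if w > best_weight:
                best_weight, best_set = w, subset
    return best_weight, best_set


# Three hypothetical stems: stem 0 overlaps stems 1 and 2; stems 1 and 2 are compatible.
weights = [3.0, 2.0, 2.0]  # e.g. -(free energy) of each stem, illustrative values
print(max_weight_independent_set(weights, {(0, 1), (0, 2)}))  # (4.0, (1, 2))
```

Here the two compatible stems together outweigh the single strongest stem. That trade-off is exactly what the independent-set formulation captures.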

This work (techniques)

A tree decomposition of the stem graph

Tree width t = 4

Find an approximate tree decomposition of width t

The maximum weight independent set (MWIS) can be found in O(2^t N) time, where N = O(n^2), by DP over the tree

The time can be improved to O(e^(t/e)) = O(1.44^t)
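Below is a sketch of MWIS by DP over a tree decomposition. It assumes the decomposition is already given, as bags (frozensets) plus a rooted tree (a children map). Computing a good decomposition is not shown, and neither is the faster O(1.44^t) variant. For each bag, the DP enumerates the independent subsets, at most 2^(t+1) of them for width t. Each child's table is first collapsed onto the vertices the child shares with its parent. After that, each parent state combines with a child by a single dictionary lookup, in the spirit of the O(2^t N) bound above. All names are hypothetical.

```python
from itertools import combinations


def independent_subsets(bag, adj):
    """Yield every subset of the bag with no edge inside it."""
    verts = sorted(bag)
    for r in range(len(verts) + 1):
        for s in combinations(verts, r):
            if all(v not in adj[u] for u, v in combinations(s, 2)):
                yield frozenset(s)


def mwis_on_tree_decomposition(weights, adj, bags, children, root):
    """Max-weight independent set value via DP over a rooted tree decomposition.

    weights:  vertex -> weight
    adj:      vertex -> set of neighbours
    bags:     tree node -> frozenset of graph vertices
    children: tree node -> list of child tree nodes
    """
    def w(vertices):
        return sum(weights[v] for v in vertices)

    def solve(node):
        # table[S] = best weight within this subtree whose intersection with the bag is S
        bag = bags[node]
        collapsed = []
        for child in children.get(node, []):
            shared = bag & bags[child]
            child_best = {}
            for cs, value in solve(child).items():
                key = cs & shared
                score = value - w(key)  # shared vertices are counted by the parent
                if key not in child_best or score > child_best[key]:
                    child_best[key] = score
            collapsed.append((shared, child_best))
        return {s: w(s) + sum(cb[s & shared] for shared, cb in collapsed)
                for s in independent_subsets(bag, adj)}

    return max(solve(root).values())


# Path 0-1-2-3 with a width-1 decomposition {0,1} - {1,2} - {2,3}, rooted at "a".
weights = {0: 2.0, 1: 3.0, 2: 1.0, 3: 2.0}
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
bags = {"a": frozenset({0, 1}), "b": frozenset({1, 2}), "c": frozenset({2, 3})}
children = {"a": ["b"], "b": ["c"]}
print(mwis_on_tree_decomposition(weights, adj, bags, children, "a"))  # 5.0 ({1, 3})
```

The subtraction of w(key) matters. A vertex shared by parent and child bags would otherwise be counted once in each table.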

This work (experimental results)

• Data sets: 50 tRNAs (length 71-79), 50 pseudoknots (length 23-113), 11 large RNAs (length 210-412)

• Compared with: PKnots (DP, optimal, restricted pseudoknots), ILM (heuristic, unrestricted), HotKnots (heuristic, unrestricted)

• Measures: sensitivity = TP / real total; specificity = TP / (TP + FP); time
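This sketch computes the two accuracy measures as defined above over sets of base pairs (i, j). It assumes a predicted pair is a true positive only when it matches a reference pair exactly. The "specificity" defined here, TP / (TP + FP), is the quantity often called positive predictive value elsewhere. The function name is hypothetical.

```python
def sensitivity_specificity(predicted, reference):
    """Return (TP / |reference|, TP / |predicted|) over base-pair sets."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)                            # pairs found in both structures
    sensitivity = tp / len(ref) if ref else 0.0     # TP / real total
    specificity = tp / len(pred) if pred else 0.0   # TP / (TP + FP)
    return sensitivity, specificity


sens, spec = sensitivity_specificity(
    predicted=[(0, 8), (1, 7), (2, 5)],
    reference=[(0, 8), (1, 7), (2, 6), (3, 5)])
print(sens, round(spec, 3))  # 0.5 0.667
```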

Conclusion

• A new graph-theoretic algorithm for RNA folding

• Performance comparable with the best methods in both accuracy and speed

• Much room for improvement

• Applications in multiple structure alignment as well as in folding single sequences

• A part of NIH project for ncRNA gene search