Math 8803/4803, Spring 2008: Discrete Mathematical Biology
Prof. Christine Heitsch
School of Mathematics
Georgia Institute of Technology
Lecture 9 – January 27, 2008
Combinatorics on biological words
DNA −→ ←− RNA
DNA and RNA are (oriented, biochemical) sequences over (nucleotide)
alphabets with the complementary Watson-Crick base pairing.
C. E. Heitsch, GA Tech 1
The “Central Dogma of Molecular Biology”
“Information flows as:
where RNA mediates the production of proteins from DNA.”
RNA: more than just the messenger
Breakthrough of the year: Small RNAs Make Big Splash. Published in Science, issue of 20 Dec 2002.
The structure of Pariacoto virus reveals a dodecahedral cage of duplex RNA, by Tang et al. in Nat Struct Biol.
Biological function follows form
“Over the past two decades it has become clear that a variety of
RNA molecules have important or essential biological functions in
cells, beyond the well-established roles of ribosomal, transfer and
messenger RNAs in protein biosynthesis. . . . Each class of RNA is
likely to have a unique fold that confers biochemical function.”
From “Structural Genomics of RNA” by Jennifer A. Doudna,
published in Nature Structural Biology, Nov. 2000.
Levels of RNA structure I
Primary: linear sequence of nucleotide bases
Secondary: set of base pairs induced by self-bonding
Tertiary: all other intra-molecular interactions
[Figure: three-dimensional RNA molecular structure of a tRNA]
Levels of RNA structure II
Selective base pair hybridization ⇐⇒ structure and function
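[Figure: a tRNA sequence drawn in fragments (GCGGAUUUAG, CUCAGUUGGGAGAGCGCCA, GCCUGAAGAUCUGGAGGUCCUGGUUCGAUCCACAGAAU, UCGCACCA) alongside its secondary structure and 3D molecule]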
Primary sequence −→ secondary structure −→ 3D molecule
Important biomathematical questions
[Figure: tRNA sequence fragments (GCGGAUUUAG, CUCAGUUGGGAGAGCGCCA, GCCUGAAGAUCUGGAGGUCCUGGUUCGAUCCACAGAAU, UCGCACCA) and the folded structure]
Prediction?
Analysis?
Design?
How do RNA sequences encode secondary structures?
Sequence to structure: a one-to-many mapping
R = gcgga uuuagcuc aguuggga gagc g ccaga cugaa
gaucugg agguc cugug uucgauc cacag a auucgc acca
S1(R), S2(R): [Figures: two different secondary structures of the same sequence R]
Hypothesis: RNA sequences fold with minimal free energy.
Thermodynamics of RNA folding
RNA secondary structures are balanced between
energetically favorable helices (stacked base pairs)
and destabilizing loops (single-stranded regions).
[Figure: secondary structure with “Helices” and “Loops” labeled]
Predicting nested RNA base pairings
Let $R = b_1 b_2 \ldots b_n \in \{a,u,c,g\}^+$ be a 5′ to 3′ RNA sequence.
Definition. Let $S(R)$ be a set of base pairs
$$S(R) = \{\, b_i - b_j \mid 1 \le i < j \le n \,\}$$
where for all distinct $b_i - b_j$ and $b_{i'} - b_{j'}$, $i = i' \iff j = j'$, and
either $i < i' < j' < j$ or $i < j < i' < j'$.
Then $S(R)$ is a nested secondary structure of $R$.
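The definition can be checked mechanically. The following is a minimal sketch, assuming base pairs are given as 1-based index tuples (i, j); the function name `is_nested_structure` is hypothetical, and the check covers only the index conditions of the definition, not which nucleotides may pair.

```python
from itertools import combinations

def is_nested_structure(pairs, n):
    """Check the index conditions of the definition for 1-based pairs (i, j)."""
    pairs = set(pairs)
    # Every pair must satisfy 1 <= i < j <= n.
    if any(not (1 <= i < j <= n) for i, j in pairs):
        return False
    # sorted() guarantees i <= i2 for each compared pair of base pairs.
    for (i, j), (i2, j2) in combinations(sorted(pairs), 2):
        if i == i2 or j == j2:
            return False  # distinct pairs may not share an index
        if not (i < i2 < j2 < j or i < j < i2 < j2):
            return False  # crossing pairs, or a base used in two pairs
    return True

print(is_nested_structure({(1, 9), (2, 8), (3, 7)}, 9))  # nested helix
print(is_nested_structure({(1, 5), (3, 8)}, 9))          # crossing pairs
```

The pairwise comparison is quadratic in the number of base pairs, which is fine for illustration; a stack-based scan would do the same check in linear time.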
Components of RNA secondary structures I
Definition. Let $i.j$ denote a base pair $b_i - b_j$ in a nested RNA secondary structure $S(R)$.
• A base $b_{i'}$ or base pair $i'.j' \in S(R)$ is accessible from $i.j$ if $i < i' \,(< j') < j$ and if there is no other base pair $i''.j'' \in S(R)$ such that $i < i'' < i' \,(< j') < j'' < j$.
• A stacked pair is formed if exactly the base pair $(i+1).(j-1)$ is accessible from $i.j$. Successive stacked pairs form a stem.
• Otherwise, $i.j$ closes a $k$-loop with $k - 1$ base pairs for $k \ge 1$ and $l \ge 0$ unpaired bases accessible from $i.j$.
• The external loop, denoted $L_e(k-1)$, is the set of $l > 0$ unpaired bases and $k - 1$ base pairs without a closing base pair.
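The definitions above translate directly into a scan of the interval enclosed by a closing pair. This is a sketch only, assuming the input is already a nested structure given as 1-based index tuples; the names `accessible_from` and `classify` are hypothetical.

```python
def accessible_from(pairs, i, j):
    """Return (unpaired bases, base pairs) accessible from the closing pair i.j."""
    partner = {}
    for a, b in pairs:
        partner[a] = b
        partner[b] = a
    bases, inner = [], []
    pos = i + 1
    while pos < j:
        if pos in partner:
            # In a nested structure, pos opens an inner pair ending before j.
            inner.append((pos, partner[pos]))
            pos = partner[pos] + 1  # skip everything enclosed by that pair
        else:
            bases.append(pos)
            pos += 1
    return bases, inner

def classify(pairs, i, j):
    """Label i.j as a stacked pair or as the closing pair of a k-loop."""
    bases, inner = accessible_from(pairs, i, j)
    if not bases and inner == [(i + 1, j - 1)]:
        return "stacked pair"
    return f"{len(inner) + 1}-loop with {len(bases)} unpaired bases"

helix = {(1, 9), (2, 8), (3, 7)}
print(classify(helix, 1, 9))  # stacked pair
print(classify(helix, 3, 7))  # 1-loop with 3 unpaired bases
```

As a convenience (an assumption of this sketch, not part of the definition), calling `accessible_from(pairs, 0, n + 1)` treats positions 0 and n + 1 as a virtual closing pair and returns the contents of the external loop.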
Components of RNA secondary structures II
Assumption 1. The free energy of S(R) is the sum of loop free energies.
Assumption 2. The free energy of a loop is independent of all other loops.
http://www.cs.washington.edu/education/courses/527/00wi/
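Written as a formula, the two assumptions might read as follows. The notation is an assumption made here for illustration: $L_{i.j}$ stands for the loop (or stacked pair) closed by the base pair $i.j$, and $L_e$ for the external loop.

```latex
\Delta G\bigl(S(R)\bigr) \;=\; \Delta G(L_e) \;+\; \sum_{i.j \,\in\, S(R)} \Delta G\bigl(L_{i.j}\bigr)
```

Assumption 1 gives the sum, with one term per loop. Assumption 2 says each term $\Delta G(L_{i.j})$ depends only on the bases and base pairs accessible from $i.j$, not on the rest of the structure.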
Nearest neighbor energy model
∆G = −22.7 kcal/mole
S. cerevisiae Phe-tRNA at 37 °C
Free energy parameters
Version 3.0 for RNA Folding at 37 °C
Available through http://www.bioinfo.rpi.edu/~zukerm/
Algorithmic question
Problem. How to find an RNA folding with minimal free energy?
Solution. Dynamic programming.
Reason. Recursive prediction of RNA secondary structures.
Let $R = b_1 b_2 \ldots b_i \ldots b_j \ldots b_n$ and $W(0) = 0$.
$$W(j) = \min\Bigl( W(j-1),\ \min_{1 \le i < j} \bigl( V(i,j) + W(i-1) \bigr) \Bigr) \quad \text{for } j > 0$$
$W(j)$ is the minimal free energy of a structure for the first $j$ residues. $V(i,j)$ is defined like
$W(j)$, but for the subsequence $b_i \ldots b_j$ and assuming $i.j$ forms a base pair. A recursive calculation of $V(i,j)$ depends on
$V(i+1, j-1)$ and four other functions.
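The W recursion can be filled left to right in a single table. The sketch below assumes V is already available as a callable `v(i, j)`; the function name `fill_w` and the small table of V values are hypothetical, chosen only to exercise the recursion.

```python
import math

def fill_w(n, v):
    """Fill W(0..n) from the recursion; v(i, j) returns V(i, j) for 1 <= i < j <= n."""
    w = [0.0] * (n + 1)          # W(0) = 0
    for j in range(1, n + 1):
        best = w[j - 1]          # residue j left unpaired
        for i in range(1, j):    # residue j paired with some i < j
            best = min(best, v(i, j) + w[i - 1])
        w[j] = best
    return w

# Hypothetical V values; any pair not listed is treated as impossible.
v_table = {(1, 4): -1.5, (2, 7): -3.0, (5, 8): -2.0}
w = fill_w(8, lambda i, j: v_table.get((i, j), math.inf))
print(w[8])  # -3.5, from the pairs 1.4 and 5.8 side by side
```

Given V, this step costs O(n^2) time; the expensive part of the prediction is computing V itself.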
Recursive RNA secondary structure prediction
Let $R = b_1 b_2 \ldots b_i \ldots b_j \ldots b_n$ and $W(0) = 0$.
$$W(j) = \min\Bigl( W(j-1),\ \min_{1 \le i < j} \bigl( V(i,j) + W(i-1) \bigr) \Bigr) \quad \text{for } j > 0$$
$$V(i,j) = \begin{cases} \infty & \text{for } i \ge j \\ \min\bigl( eH(i,j),\ eS(i,j) + V(i+1,j-1),\ VBI(i,j),\ VM(i,j) \bigr) & \text{for } i < j \end{cases}$$
$$VBI(i,j) = \min_{i < i' < j' < j} \bigl( eL(i,j,i',j') + V(i',j') \bigr)$$
$$VM(i,j) = \min_{\substack{k \ge 2 \\ i < i_1 < j_1 < i_2 < j_2 < \ldots < i_k < j_k < j}} \Bigl( eM(i,j,i_1,j_1,i_2,j_2,\ldots,i_k,j_k) + \sum_{h=1}^{k} V(i_h,j_h) \Bigr)$$
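To make the V recursion concrete, here is a memoized sketch under a toy energy model. Everything numeric is an assumption made for illustration: the constants are invented (they are not the Version 3.0 parameters), eH, eS and eL are replaced by simple formulas, V is set to infinity when the two bases cannot pair, and eM is assumed to have the affine form `multi + branch * k`, which lets the minimum over k ≥ 2 inner helices be computed by the helper `branches`. The names `make_v` and `branches` are hypothetical.

```python
import math
from functools import lru_cache

PAIRS = {"au", "ua", "gc", "cg", "gu", "ug"}

def make_v(seq, hairpin=3.0, stack=-2.0, loop=2.0, per_base=0.5,
           multi=4.0, branch=0.5):
    """Return V(i, j) (1-based) for seq under a toy model; all constants are made up."""

    def can_pair(i, j):
        return seq[i - 1] + seq[j - 1] in PAIRS

    def e_h(i, j):
        # Toy hairpin energy; requires at least 3 unpaired bases in the loop.
        return hairpin if j - i - 1 >= 3 else math.inf

    def e_l(i, j, i2, j2):
        # Toy bulge/internal loop energy, growing with the unpaired bases.
        return loop + per_base * ((i2 - i - 1) + (j - j2 - 1))

    @lru_cache(maxsize=None)
    def branches(p, q, need):
        """Min total of (V + branch) over helices in [p, q], using at least `need` helices."""
        if p > q:
            return 0.0 if need == 0 else math.inf
        best = branches(p + 1, q, need)            # base p stays unpaired
        for m in range(p + 1, q + 1):              # or p.m opens an inner helix
            best = min(best, v(p, m) + branch + branches(m + 1, q, max(need - 1, 0)))
        return best

    @lru_cache(maxsize=None)
    def v(i, j):
        if i >= j or not can_pair(i, j):
            return math.inf
        stacked = stack + v(i + 1, j - 1)          # eS(i, j) + V(i+1, j-1)
        vbi = min((e_l(i, j, i2, j2) + v(i2, j2)   # VBI over i < i' < j' < j
                   for i2 in range(i + 1, j) for j2 in range(i2 + 1, j)),
                  default=math.inf)
        vm = multi + branches(i + 1, j - 1, 2)     # VM with k >= 2 inner helices
        return min(e_h(i, j), stacked, vbi, vm)

    return v

v = make_v("gggaaaccc")
print(v(1, 9))  # -1.0: two stacks (-2.0 each) on a hairpin (+3.0)
```

Feeding this `v` into the W recursion for the same sequence gives the overall minimum under the toy model. The sketch is written for clarity on short sequences: the VBI search alone makes it O(n^4), and the recursion depth grows with sequence length.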
Secondary structure prediction software
• The DINAMelt web server by Nick Markham & Michael Zuker –
http://www.bioinfo.rpi.edu/applications/hybrid/
• Michael Zuker’s mfold Server –
http://www.bioinfo.rpi.edu/applications/mfold/rna/form1.cgi
• Vienna RNA Package – http://www.tbi.univie.ac.at/~ivo/RNA/
• RNAsoft – http://www.rnasoft.ca/
• GTfold – coming soon!
Open problem: suboptimal foldings
The ∆E = 9.5 energy dot plot for the cdk2 gene of Xenopus laevis.
Dots represent base pairs, color coded by energy (in kcal/mole).
Black: in optimal folding
Red: within 3.1
Blue: from 3.2-6.2
Yellow: from 6.3-9.5
Open problem: motif recognition
Hepatitis C China Virus Complete Genome
9400 bases: 1920 a, 2796 c, 2650 g, 2034 u
Mfold secondary structure prediction
∆G = −2976.7 kcal/mole
3126 base pairs: 868 a-u, 1890 g-c, 368 u-g
Palmenberg & Sgro, unpublished.
Open problem: pseudoknots
Pseudoknotted structure: base pairs i.j and i′.j′ with i < i′ < j < j′.
[Figure: 3D pseudoknot model]
Acknowledgments
• Access Excellence Graphics Gallery – http://www.accessexcellence.com/RC/VL/GG/index.html
• Predicted RNA foldings courtesy of Michael Zuker’s mfold algorithm.
• CSE 527, Winter 2000 – Computational Biology – Martin Tompa – http://www.cs.washington.edu/education/courses/527/00wi/
• Bio-5495 – RNA Secondary Structure – Prof. Michael Zuker, Department of Mathematical Sciences, Rensselaer Polytechnic Institute. http://www.bioinfo.rpi.edu/~zukerm/Bio-5495/RNAfold-html/rnafold.html
• Prof. Ann Palmenberg (Dept of Biochemistry & Institute for Molecular Virology) and Dr. Jean-Yves Sgro (Institute for Molecular Virology), University of Wisconsin – Madison.