Math 8803/4803, Spring 2008: Discrete Mathematical Biology Prof. Christine Heitsch School of Mathematics Georgia Institute of Technology Lecture 9 – January 27, 2008


Math 8803/4803, Spring 2008: Discrete Mathematical Biology

Prof. Christine Heitsch

School of Mathematics

Georgia Institute of Technology

Lecture 9 – January 27, 2008


Combinatorics on biological words

DNA −→ ←− RNA

DNA and RNA are (oriented, biochemical) sequences over (nucleotide) alphabets with the complementary Watson-Crick base pairing.
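As a small illustration of complementary base pairing on oriented words, the sketch below computes the reverse complement of a DNA word and the RNA word complementary to a DNA template. The function names and the example word are assumptions made for illustration; they do not come from the lecture.

```python
# Watson-Crick complements over the DNA alphabet, and from DNA to RNA.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, read 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(template: str) -> str:
    """Return the RNA word (5' to 3') complementary to a DNA template given 5' to 3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(template.upper()))


print(reverse_complement("GATTACA"))  # TGTAATC
print(transcribe("GATTACA"))          # UGUAAUC
```

Reversing before complementing reflects the orientation of the strands: the two paired strands run antiparallel.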

C. E. Heitsch, GA Tech 1


The “Central Dogma of Molecular Biology”

“Information flows as:

[Figure: DNA → RNA → protein]

where RNA mediates the production of proteins from DNA.”


RNA: more than just the messenger

Breakthrough of the year: “Small RNAs Make Big Splash”, published in Science, issue of 20 Dec 2002.

“The structure of Pariacoto virus reveals a dodecahedral cage of duplex RNA”, by Tang et al. in Nat Struct Biol.


Biological function follows form

“Over the past two decades it has become clear that a variety of RNA molecules have important or essential biological functions in cells, beyond the well-established roles of ribosomal, transfer and messenger RNAs in protein biosynthesis. ... Each class of RNA is likely to have a unique fold that confers biochemical function.”

From “Structural Genomics of RNA” by Jennifer A. Doudna, published in Nature Structural Biology, Nov. 2000.


Levels of RNA structure I

Primary: linear sequence of nucleotide bases

Secondary: set of base pairs induced by self-bonding

Tertiary: all other intra-molecular interactions

[Figure: three-dimensional RNA molecular structure of a tRNA]


Levels of RNA structure II

Selective base pair hybridization ⇐⇒ structure and function

[Figure: tRNA sequence shown with its folded structure; sequence fragments as extracted: GCGGAUUUAG, UCGCACCA, GCCUGAAGAUCUGGAGGUCCUGGUUCGAUCCACAGAAU, CUCAGUUGGGAGAGCGCCA]

Primary sequence −→ secondary structure −→ 3D molecule


Important biomathematical questions

[Figure: tRNA sequence shown with its folded structure; sequence fragments as extracted: GCGGAUUUAG, UCGCACCA, GCCUGAAGAUCUGGAGGUCCUGGUUCGAUCCACAGAAU, CUCAGUUGGGAGAGCGCCA]

Prediction?

Analysis?

Design?

How do RNA sequences encode secondary structures?


Sequence to structure: a one-to-many mapping

R = gcgga uuuagcuc aguuggga gagc g ccaga cugaa gaucugg agguc cugug uucgauc cacag a auucgc acca

S1(R) = [Figure: one secondary structure of R]

S2(R) = [Figure: a second, different secondary structure of R]

Hypothesis: RNA sequences fold with minimal free energy.


Thermodynamics of RNA folding

RNA secondary structures are balanced between energetically favorable helices (stacked base pairs) and destabilizing loops (single-stranded regions).

[Figure: secondary structure with its helices and loops labeled]


Predicting nested RNA base pairings

Let $R = b_1 b_2 \ldots b_n \in \{a, u, c, g\}^+$ be a 5′ to 3′ RNA sequence.

Definition. Let $S(R)$ be a set of base pairs

$$S(R) = \{\, b_i - b_j \mid 1 \le i < j \le n \,\}$$

where, for all distinct $b_i - b_j$ and $b_{i'} - b_{j'}$ in $S(R)$, $i = i' \iff j = j'$, and (taking $i < i'$) either $i < i' < j' < j$ or $i < j < i' < j'$.

Then $S(R)$ is a nested secondary structure of $R$.
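The definition can be checked mechanically. The following is a minimal sketch, assuming a structure is given as a collection of 1-based index pairs `(i, j)`; the function name `is_nested_structure` is hypothetical. It tests only the index conditions of the definition and does not check which bases are allowed to pair.

```python
from itertools import combinations


def is_nested_structure(pairs, n):
    """Check the index conditions of a nested secondary structure on n bases."""
    pairs = set(pairs)
    # Every pair must satisfy 1 <= i < j <= n.
    for i, j in pairs:
        if not (1 <= i < j <= n):
            return False
    # Each position may occur in at most one base pair.
    positions = [p for pair in pairs for p in pair]
    if len(positions) != len(set(positions)):
        return False
    # After sorting, i < k for consecutive choices; the two pairs must be
    # nested (i < k < l < j) or disjoint (i < j < k < l).
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if not (l < j or j < k):
            return False
    return True


print(is_nested_structure({(1, 10), (2, 9), (12, 20)}, 20))  # True
print(is_nested_structure({(1, 5), (3, 8)}, 10))             # False: the pairs cross
```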


Components of RNA secondary structures I

Definition. Let $i.j$ denote a base pair $b_i - b_j$ in a nested RNA secondary structure $S(R)$.

• A base $b_{i'}$ or base pair $i'.j' \in S(R)$ is accessible from $i.j$ if $i < i' \ (< j') < j$ and if there is no other base pair $i''.j'' \in S(R)$ such that $i < i'' < i' \ (< j') < j'' < j$.

• A stacked pair is formed if exactly the base pair $(i+1).(j-1)$ is accessible from $i.j$. Successive stacked pairs form a stem.

• Otherwise, $i.j$ closes a $k$-loop with $k - 1$ base pairs for $k \ge 1$ and $l \ge 0$ unpaired bases accessible from $i.j$.

• The external loop, denoted $L_e(k-1)$, is the set of $l > 0$ unpaired bases and $k - 1$ base pairs without a closing base pair.
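To make the notions of accessibility and k-loops concrete, here is a minimal sketch that lists what is accessible from a closing pair and classifies it. It assumes the input is already a nested structure given as 1-based index pairs; the names `accessible` and `classify` are hypothetical.

```python
def accessible(pairs, i, j):
    """Return (base pairs, unpaired bases) accessible from the closing pair i.j."""
    partner = {a: b for a, b in pairs}   # opening index -> closing index
    inner_pairs, unpaired = [], []
    p = i + 1
    while p < j:
        if p in partner:                 # p opens a pair directly enclosed by i.j
            inner_pairs.append((p, partner[p]))
            p = partner[p] + 1           # skip everything inside that pair
        else:
            unpaired.append(p)
            p += 1
    return inner_pairs, unpaired


def classify(pairs, i, j):
    """Name the component closed by i.j: a stacked pair or a k-loop."""
    inner_pairs, unpaired = accessible(pairs, i, j)
    if inner_pairs == [(i + 1, j - 1)]:
        return "stacked pair"
    k = len(inner_pairs) + 1
    return f"{k}-loop with {len(unpaired)} unpaired bases"


structure = {(1, 16), (2, 15), (3, 7), (9, 14)}
print(classify(structure, 1, 16))  # stacked pair
print(classify(structure, 2, 15))  # 3-loop with 1 unpaired bases
print(classify(structure, 3, 7))   # 1-loop with 3 unpaired bases
```

In common terminology a 1-loop is a hairpin loop, a 2-loop is a bulge or internal loop, and a k-loop with k ≥ 3 is a multiloop.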


Components of RNA secondary structures II

Assumption 1. The free energy of S(R) is the sum of loop free energies.

Assumption 2. The free energy of a loop is independent of all other loops.

http://www.cs.washington.edu/education/courses/527/00wi/
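Read together, the two assumptions say that the energy function is additive over the decomposition of S(R) into components. One way to write this, using the notation $\mathcal{L}(S(R))$ for the set of components (stacked pairs, k-loops, and the external loop), which is introduced here only for illustration:

```latex
\Delta G\bigl(S(R)\bigr) \;=\; \sum_{L \,\in\, \mathcal{L}(S(R))} \Delta G(L),
\qquad \text{where each } \Delta G(L) \text{ depends only on } L .
```

This additivity is what allows an optimal structure to be assembled from optimal substructures.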


Nearest neighbor energy model

∆G = -22.7 kcal/mole

S. cerevisiae Phe-tRNA at 37 °C


Free energy parameters

Version 3.0 for RNA Folding at 37 °C

Available through http://www.bioinfo.rpi.edu/~zukerm/


Algorithmic question

Problem. How to find an RNA folding with minimal free energy?

Solution. Dynamic programming.

Reason. Recursive prediction of RNA secondary structures.

Let $R = b_1 b_2 \ldots b_i \ldots b_j \ldots b_n$ and $W(0) = 0$.

$$W(j) = \min\Bigl( W(j-1),\ \min_{1 \le i < j} \bigl( V(i,j) + W(i-1) \bigr) \Bigr) \quad \text{for } j > 0$$

$W(j)$ is the minimal free energy of an optimal structure for the first $j$ residues. $V(i,j)$ is defined like $W(j)$, but for the subsequence $b_i \ldots b_j$ and assuming $i.j$ forms a base pair. A recursive calculation of $V(i,j)$ depends on $V(i+1, j-1)$ and four other functions.
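The recursion for W can be filled in a single left-to-right pass once V is available. The sketch below assumes V is supplied as a callable `v(i, j)` with 1-based indices; the function name `fill_w` and the toy values in `TOY_V` are hypothetical and serve only to exercise the recursion.

```python
import math


def fill_w(n, v):
    """Return [W(0), ..., W(n)] for the recursion above.

    v(i, j) must return V(i, j) for 1 <= i < j <= n (math.inf if i.j cannot pair).
    """
    w = [0.0] * (n + 1)                # W(0) = 0
    for j in range(1, n + 1):
        best = w[j - 1]                # case 1: b_j is unpaired
        for i in range(1, j):          # case 2: b_j pairs with some b_i
            best = min(best, v(i, j) + w[i - 1])
        w[j] = best
    return w


# Hypothetical V values for a sequence of length 10 (illustration only).
TOY_V = {(1, 5): -2.0, (6, 10): -3.0, (1, 10): -4.5}


def toy_v(i, j):
    return TOY_V.get((i, j), math.inf)


w = fill_w(10, toy_v)
print(w[10])  # -5.0: pairs 1.5 and 6.10 together beat the single pair 1.10
```

Given V, this pass takes O(n²) time; the cost of the full algorithm is dominated by computing V itself.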


Recursive RNA secondary structure prediction

Let $R = b_1 b_2 \ldots b_i \ldots b_j \ldots b_n$ and $W(0) = 0$.

$$W(j) = \min\Bigl( W(j-1),\ \min_{1 \le i < j} \bigl( V(i,j) + W(i-1) \bigr) \Bigr) \quad \text{for } j > 0$$

$$V(i,j) = \begin{cases} \infty & \text{for } i \ge j \\ \min\bigl( eH(i,j),\ eS(i,j) + V(i+1, j-1),\ VBI(i,j),\ VM(i,j) \bigr) & \text{for } i < j \end{cases}$$

$$VBI(i,j) = \min_{\substack{i', j' \\ i < i' < j' < j}} \bigl( eL(i,j,i',j') + V(i',j') \bigr)$$

$$VM(i,j) = \min_{\substack{k,\ i_1, j_1, i_2, j_2, \ldots, i_k, j_k \\ i < i_1 < j_1 < i_2 < j_2 < \ldots < i_k < j_k < j \\ k \ge 2}} \Bigl( eM(i,j,i_1,j_1,i_2,j_2,\ldots,i_k,j_k) + \sum_{h=1}^{k} V(i_h, j_h) \Bigr)$$
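As a rough illustration of how V can be evaluated, the sketch below memoizes a restricted version of the recursion that covers hairpins (eH), stacked pairs (eS), and bulges or internal loops (VBI), and deliberately omits the multiloop term VM. The energy functions are toy stand-ins invented for this example, not the nearest neighbor parameters, and the rule that only a-u, c-g, and g-u may pair is an added assumption. The name `make_v` is hypothetical.

```python
import math
from functools import lru_cache

# Assumed pairing rule: Watson-Crick pairs plus the g-u wobble pair.
CAN_PAIR = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}


def make_v(seq):
    """Return a memoized V(i, j) (1-based) for seq under toy energies, without VM."""
    seq = seq.lower()

    def e_hairpin(i, j):               # toy eH: flat penalty, at least 3 unpaired bases
        return 3.0 if j - i - 1 >= 3 else math.inf

    def e_stack(i, j):                 # toy eS: every stacked pair is equally favorable
        return -2.0

    def e_loop(i, j, ip, jp):          # toy eL: penalty grows with unpaired bases
        return 1.0 + 0.5 * ((ip - i - 1) + (j - jp - 1))

    @lru_cache(maxsize=None)
    def v(i, j):
        if i >= j or (seq[i - 1], seq[j - 1]) not in CAN_PAIR:
            return math.inf
        best = min(e_hairpin(i, j), e_stack(i, j) + v(i + 1, j - 1))
        for ip in range(i + 1, j):     # VBI: bulges and internal loops
            for jp in range(ip + 1, j):
                if (ip, jp) != (i + 1, j - 1):
                    best = min(best, e_loop(i, j, ip, jp) + v(ip, jp))
        return best

    return v


v = make_v("gggaaaccc")
print(v(3, 7))  # 3.0: hairpin closed by g3-c7
print(v(1, 9))  # -1.0: two stacked pairs on top of that hairpin
```

This direct evaluation takes O(n⁴) time and is meant for short words only. Handling VM in practice requires an additional decomposition of the multiloop energy, which is not attempted here.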


Secondary structure prediction software

• The DINAMelt web server by Nick Markham & Michael Zuker – http://www.bioinfo.rpi.edu/applications/hybrid/

• Michael Zuker’s mfold Server – http://www.bioinfo.rpi.edu/applications/mfold/rna/form1.cgi

• Vienna RNA Package – http://www.tbi.univie.ac.at/~ivo/RNA/

• RNAsoft – http://www.rnasoft.ca/

• GTfold – coming soon!


Open problem: suboptimal foldings

The ∆E = 9.5 energy dot plot for the cdk2 gene of Xenopus laevis.

Dots represent base pairs, color coded by energy (in kcal/mole).

Black: in optimal folding

Red: within 3.1

Blue: from 3.2 to 6.2

Yellow: from 6.3 to 9.5


Open problem: motif recognition

Hepatitis C China Virus Complete Genome

9400 bases: 1920 a, 2796 c, 2650 g, 2034 u

Mfold secondary structure prediction

∆G = −2976.7 kcal/mole

3126 base pairs: 868 a - u, 1890 g - c, 368 u - g

Palmenberg & Sgro, unpublished.


Open problem: pseudoknots

Pseudoknotted structure: base pairs i.j and i′.j′ with i < i′ < j < j′.

[Figures: pseudoknotted structure with positions i, i′, j, j′ labeled; 3D pseudoknot model]
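The crossing condition is easy to detect in a given set of base pairs. The following minimal sketch, with the hypothetical name `crossing_pairs`, lists every two base pairs that satisfy i < i′ < j < j′, again assuming 1-based index pairs.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return all ((i, j), (i', j')) with i < i' < j < j' among the given base pairs."""
    crossings = []
    for (i, j), (ip, jp) in combinations(sorted(pairs), 2):
        if i < ip < j < jp:
            crossings.append(((i, j), (ip, jp)))
    return crossings


print(crossing_pairs({(1, 10), (5, 15)}))  # [((1, 10), (5, 15))]
print(crossing_pairs({(1, 10), (2, 9)}))   # []
```

A structure is pseudoknot-free exactly when this list is empty, which is why the nested recursions for W and V cannot represent such foldings.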


Acknowledgments

• Access Excellence Graphics Gallery – http://www.accessexcellence.com/RC/VL/GG/index.html

• Predicted RNA foldings courtesy of Michael Zuker’s mfold algorithm.

• CSE 527, Winter 2000 – Computational Biology – Martin Tompa – http://www.cs.washington.edu/education/courses/527/00wi/

• Bio-5495 – RNA Secondary Structure – Prof. Michael Zuker, Department of Mathematical Sciences, Rensselaer Polytechnic Institute. http://www.bioinfo.rpi.edu/~zukerm/Bio-5495/RNAfold-html/rnafold.html

• Prof. Ann Palmenberg (Dept of Biochemistry & Institute for Molecular Virology) and Dr. Jean-Yves Sgro (Institute for Molecular Virology), University of Wisconsin – Madison.

