Craig L. Zirbel October 7, 2010 RNA 2D and 3D Structure

RNA 2D and 3D Structure


Page 1: RNA 2D and 3D Structure

Craig L. Zirbel, October 7, 2010

RNA 2D and 3D Structure

Page 2: RNA 2D and 3D Structure

RNA primary sequences

• Laboratory techniques make it possible to extract specific RNA molecules and determine the sequence of nucleotides. Here are the sequences of the 5S ribosomal RNA molecule from different organisms:

UUAGGCGGCCACAGCGGUGGGGUUGCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGAGCCUCUGGGAAACCCGGUUCGCCGCCACC A H.m. (structure)
GCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGC B E.coli (structure)
UCCCCCGUGCCCAUAGCGGCGUGGAACCACCCGUUCCCAUUCCGAACACGGAAGUGAAACGCGCCAGCGCCGAUGGUACUGGGCGGGCGACCGCCUGGGAGAGUAGGUCGGUGCGGGG B T.th. (structure)
AGUGGUGGCCAUAUCGGCGGGGUUCCUCCCCGUACCCAUCCUGAACACGGAAGAUAAGCCCGCCAGCGUCCGGCAAGUACUGGAGUGCGCGAGCCUCUGGGAAAUCCGGUUCGCCGCCAC A L27170.1/1-120
GUAGCGGCCACAGCGGUGGGGUUCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGACCCUCUGGGAAACCGGGUUCGCCGCUAC A L27163.1/1-119
GCGGCCAGGGCGGAGGGGAAACACCCGUACCCAUUCCGAACACGGAAGUGAAGCCCUCCAGCGAACCAGCUAGUACUAGAGUGGGAGACCCUCUGGGAGCGCUGGUUCGCCGCC A L27343.1/3-116
UUUGGCGGUCAUGGCGUGGGGGUUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUUUUUUGCUGUGGGAAGCCCACUUCACUGCCAGAC A M36187.1/5-126
GUUGGCGGUCAUGGCGUGGGGUUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUUUUUUGCUGUGGGAAGCCCACUUCACUGCCAGAC A X62857.1/1-121
UUUGGCGGUCAUGGCGUGGGGGUUAUACCUGAUCUCGUUUCGAUCUCAGUAGUUAAGUCCUGCUGCGUUGUGGGUGUGUACUGCGGUGUUUUGCUGUGGGAAGCCCAUUUCACUGCCAGCC A X15364.1/6601-6721
GUCGGUGGUGUUAGCGGUGGGGUCACGCCCGGUCCCUUUCCGAACCCGGAAGCUAAGCCUGCCUGCGCCGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGACCCCGCCGGCA B M16176.1/4-120
GUCGGUGGUUAUAGCGGUGGGGUCACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCCACCUGCGCCGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGUCACCGCCGGCC B M16177.1/4-120
GUUGGUGGUUAUUGUGUCGGGGGUACGCCCGGUCCCUUUCCGAACCCGGAAGCUAAGCCCGAUUGCGCUGAUGGUACUGCACCUGGGAGGGUGUGGGAGAGUAGGUCGCUGCCAACC B X55255.1/4-120
UACGGCGGUCAAUAGCGGCAGGGAAACGCCCGGUCCCAUCCCGAACCCGGAAGCUAAGCCUGCCAGCGCCAAUGAUACUGCCCUCACCGGGUGGAAAAGUAGGACACCGCCGAAC B X55259.1/3-117
UACGGCGGUCCAUAGCGGCAGGGAAACGCCCGGUCCCAUCCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUGAUACUACCCAUCCGGGUGGAAAAGUAGGACACCGCCGAAC B X55251.1/3-116
UACGGCGGCCACAGCGGCAGGGAAACGCCCGGUCCCAUUCCGAACCCGGAAGCUAAGCCUGCCAGCGCCGAUGAUACUGCCCCUCCGGGUGGAAAAGUAGGACACCGCCGAAC B X75601.1/91-203
UAAGGCGGCCAUAGCGGUGGGGUUACUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCGCCUGCGUUCCGGUCAGUACUGGAGUGCGCGAGCCUCUGGGAAAUCCGGUUCGCCGCCUACU A X03407.1/5927-6048
UUGGCGACCAUAGCGGCGAGUGACCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCUCGCCUGCGUUUCGGUCAGUACUGGAUUGGGCGACCCUCUGGGAAAUCUGAUUCGCCGCCACC A L27168.1/1-120
GGCGGCCAGAGCGGUGAGGUUCCACCCGUACCCAUCCCGAACACGGAAGUUAAGCUCACCUGCGUUCUGGUCAGUACUGGAGUGAGCGAUCCUCUGGGAAAUCCAGUUCGCCGCCC A X02128.1/24-139
GGGCGGCCAGAGCGGUGAGGUUCCACCCGUACCCAUCCCGAACACGGAAGUUAAGCUCGCCUGCGUUCUGGUCAGUACUGGAGUGAGCGAUCCUCUGGGAAAUCCAGUUCGCCGCCCCU A X14441.1/5-123

Page 3: RNA 2D and 3D Structure

RNA can make double helices

• RNA chains are flexible enough to fold back on themselves and make the same types of basepairs as are found in DNA. These are called “Watson-Crick” basepairs.

Page 4: RNA 2D and 3D Structure

Watson-Crick basepairs

• The main Watson-Crick basepairs are AU and GC. (GU also occurs sometimes.) They can substitute for one another freely without changing the structure of the RNA molecule. They are said to be isosteric, and changes between these basepairs are an example of neutral variability. They are held together by hydrogen bonds (dotted lines).

Superposition

Page 5: RNA 2D and 3D Structure

Comparative sequence analysis

• By manually aligning similar RNA sequences and noting the pairs of columns where AU, CG, GC, and UA pairs replace one another, one can infer the locations of Watson-Crick basepairs (called the secondary structure) of an RNA molecule.

This is the inferred secondary structure of the 5S RNA, with bases labeled as found in E. coli. There are five helical regions, with three “internal loops” and two “hairpin loops” separating them.

Fox & Woese 1975; Peattie et al. 1981; Noller 1984; Cannone et al. 2002; http://www.rna.ccbb.utexas.edu

UGCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGCAU

((((((((((-----((((((((----(((((((-------------)))))))))---(((((((((-(((((((--((((((((---))))))))--)))))))---))))))))))-
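The column-pair search described for comparative sequence analysis can be sketched in Python. This is a minimal illustration, not the procedure used by the cited authors: the function name `covarying_columns` and the three-sequence toy alignment are hypothetical, and the rule applied (every sequence shows a Watson-Crick pair at the two columns, and at least two different pairs occur) is a simplifying assumption.

```python
from itertools import combinations

# The four ordered Watson-Crick combinations named in the slides.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def covarying_columns(alignment):
    """Return (i, j, pairs) for each pair of columns that covaries.

    A column pair is reported when every sequence has a Watson-Crick
    pair at (i, j) and at least two different pairs are observed.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i, j in combinations(range(length), 2):
        pairs = {(seq[i], seq[j]) for seq in alignment}
        if pairs <= WATSON_CRICK and len(pairs) >= 2:
            hits.append((i, j, sorted(pairs)))
    return hits


# Hypothetical toy alignment (not taken from the 5S data).
toy_alignment = [
    "GGAAACC",
    "GCAAAGC",
    "GAAAAUC",
]

for i, j, pairs in covarying_columns(toy_alignment):
    print(f"columns {i + 1} and {j + 1} covary: {pairs}")
```

Columns 1 and 7 of the toy alignment always hold a GC pair, but because the pair never changes it gives no covariation evidence and is not reported. Real alignments also contain gaps and occasional mismatches, which this sketch does not handle.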

Page 7: RNA 2D and 3D Structure

RNA 3D structure

Starting late in the year 2000, high-resolution atomic structures of entire ribosomes have been published. These show the bases, the backbone, the Watson-Crick basepairs, and several new types of basepairs.

E. coli 5S RNA

The 3D structures confirm the predicted secondary structure and show the importance of Watson-Crick basepairs.

Page 8: RNA 2D and 3D Structure

RNA secondary structure prediction

• Now that we understand the basics of RNA 3D structure and Watson-Crick basepairs, we can pose the problem of predicting the secondary and 3D structure from an RNA sequence.

• Comparative sequence analysis requires multiple RNA sequences.

• For now, we will talk about predicting RNA secondary structure from a single sequence.

Page 9: RNA 2D and 3D Structure

Three methods to predict secondary structure

• Dot plots – a way to visualize the possible helices in a sequence. Somewhat primitive, but a good technique to know.

• Nussinov algorithm – a technique to find the set of basepairs which maximizes the number of basepairs in a sequence.

• Energy methods – find the set of basepairs which results in the lowest energy structure, the one which is likely to be preferred in nature. Example program: mfold.

Page 10: RNA 2D and 3D Structure

• Dot plot – make a grid with the RNA sequence down the rows and across the columns.

• Put a dot at the location of each CG or AU pair.

[Figure: empty grid with the sequence CGUUUGGGUUCACAAACG labeling both the rows and the columns]
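The grid construction just described can be sketched in Python. The function names `dot_plot` and `render` are illustrative, and marking pairs with `+` and empty cells with `.` is a display choice made here, not part of the method.

```python
# Ordered base combinations that receive a mark: CG and AU in either order.
PAIRS = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}


def dot_plot(seq):
    """Return a square grid of booleans; True where row base pairs with column base."""
    seq = seq.upper()
    return [[(row_base, col_base) in PAIRS for col_base in seq] for row_base in seq]


def render(seq):
    """Format the dot plot as text with the sequence along both axes."""
    seq = seq.upper()
    lines = ["  " + " ".join(seq)]
    for base, row in zip(seq, dot_plot(seq)):
        cells = " ".join("+" if hit else "." for hit in row)
        lines.append(base + " " + cells)
    return "\n".join(lines)


print(render("CGUUUGGGUUCACAAACG"))
```

A possible helix shows up as a run of marks along an anti-diagonal, because consecutive bases on one strand pair with consecutive bases running the opposite way on the other strand.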

Page 12: RNA 2D and 3D Structure

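Dot Plots

  C G U U U G G G U U C A C A A A C G
C . + . . . + + + . . . . . . . . . +
G + . . . . . . . . . + . + . . . + .
U . . . . . . . . . . . + . + + + . .
U . . . . . . . . . . . + . + + + . .
U . . . . . . . . . . . + . + + + . .
G + . . . . . . . . . + . + . . . + .
G + . . . . . . . . . + . + . . . + .
G + . . . . . . . . . + . + . . . + .
U . . . . . . . . . . . + . + + + . .
U . . . . . . . . . . . + . + + + . .
C . + . . . + + + . . . . . . . . . +
A . . + + + . . . + + . . . . . . . .
C . + . . . + + + . . . . . . . . . +
A . . + + + . . . + + . . . . . . . .
A . . + + + . . . + + . . . . . . . .
A . . + + + . . . + + . . . . . . . .
C . + . . . + + + . . . . . . . . . +
G + . . . . . . . . . + . + . . . + .

(+ marks a CG or AU pair; . marks no pair)

CGUUUGGGUUCACAAACG
((((((------))))))
“dot-bracket notation”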
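Dot-bracket notation can be turned back into an explicit list of basepairs with a stack. The sketch below assumes that `(` and `)` mark paired positions and that any other character (here `-`) marks an unpaired base; the function name `parse_dot_bracket` is illustrative.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs (0-based) for matching parentheses."""
    open_positions = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos + 1}")
            pairs.append((open_positions.pop(), pos))
        # Any other symbol is treated as an unpaired base.
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1] + 1}")
    return sorted(pairs)


seq = "CGUUUGGGUUCACAAACG"
structure = "((((((------))))))"
for i, j in parse_dot_bracket(structure):
    print(f"{seq[i]}{i + 1} pairs with {seq[j]}{j + 1}")
```

For this hairpin the parser pairs position 1 with 18, 2 with 17, and so on down to 6 with 13, leaving the six-base loop unpaired.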

Page 13: RNA 2D and 3D Structure

Nussinov algorithm

• Finds the largest number of nested Watson-Crick pairs in an RNA sequence.

• Similar to a dot plot, but we keep track of the cumulative number of nested Watson-Crick basepairs in each subsequence as we go.

Page 14: RNA 2D and 3D Structure

• Put zeros down the diagonal.

Page 15: RNA 2D and 3D Structure

• Put ones above the diagonal where there is a CG, GC, AU, or UA pair.

Page 16: RNA 2D and 3D Structure

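• Continue along the next diagonal. For each cell, take the maximum of the cell to the left, the cell below, and, where the row and column bases form a Watson-Crick pair, one plus the cell diagonally below and to the left.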

Page 18: RNA 2D and 3D Structure

• Each cell we fill in gives the maximum number of nested Watson-Crick pairs in the subsequence that runs from the base labeling the cell's row to the base labeling its column.

Page 19: RNA 2D and 3D Structure

• Fill in more “diagonals” in the same way.

Page 20: RNA 2D and 3D Structure

• This subsequence only has one nested Watson-Crick basepair, either a GC or a UA, but not both, since they would cross each other.

Page 21: RNA 2D and 3D Structure

• Finally we come to a subsequence that has two nested Watson-Crick pairs. The cell with the 2 is 1 for the GC pair plus 1 from the cell down and left of it, a UA.
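The table-filling steps walked through above can be sketched in Python. Two points are assumptions rather than something stated in the slides: the sketch includes the bifurcation term of the standard Nussinov recursion (splitting a subsequence into two independent parts), and it sets no minimum hairpin loop length, so adjacent bases may pair. The function names are illustrative.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(a, b):
    return (a, b) in WATSON_CRICK


def fill_table(seq):
    """Fill table[i][j] with the maximum number of nested pairs in seq[i..j]."""
    n = len(seq)
    table = [[0] * n for _ in range(n)]  # zeros on and below the diagonal
    for span in range(1, n):  # one diagonal at a time
        for i in range(n - span):
            j = i + span
            best = max(table[i + 1][j], table[i][j - 1])  # cell below, cell to the left
            if can_pair(seq[i], seq[j]):
                best = max(best, table[i + 1][j - 1] + 1)  # cell below and left, plus one
            for k in range(i + 1, j):  # bifurcation: two side-by-side substructures
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table


def traceback(seq, table):
    """Recover one optimal structure in dot-bracket form ('-' = unpaired)."""
    structure = ["-"] * len(seq)
    todo = [(0, len(seq) - 1)]
    while todo:
        i, j = todo.pop()
        if i >= j:
            continue
        if table[i][j] == table[i + 1][j]:
            todo.append((i + 1, j))
        elif table[i][j] == table[i][j - 1]:
            todo.append((i, j - 1))
        elif can_pair(seq[i], seq[j]) and table[i][j] == table[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            todo.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if table[i][j] == table[i][k] + table[k + 1][j]:
                    todo.extend([(i, k), (k + 1, j)])
                    break
    return "".join(structure)


def fold(seq):
    """Return (maximum number of nested pairs, one structure achieving it)."""
    if not seq:
        return 0, ""
    table = fill_table(seq)
    return table[0][len(seq) - 1], traceback(seq, table)


print(fold("GGGAAACCC"))
print(fold("GUCA"))  # GC and UA would cross, so only one pair is possible
```

Because the score only counts pairs, several structures can tie for the maximum, and the traceback returns just one of them. That limitation is part of the motivation for the energy-based methods.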

Page 22: RNA 2D and 3D Structure

Thermodynamic methods

• Idea: find the secondary structure with the most favorable (lowest) energy.

• Zuker method (mfold): uses dynamic programming to calculate the structure with the lowest free energy.

• McCaskill method (sfold): uses dynamic programming to calculate the most probable structure (more theoretically rigorous).

• Used by these programs: Mfold, Sfold, Pfold

Assumptions:

• Only nearest-neighbor interactions need to be considered.

• Nearest-neighbor interactions can be summed to give the total free energy.

• Pseudoknots and tertiary interactions can be ignored.

• The most stable structure is also the kinetically favored structure.

Page 23: RNA 2D and 3D Structure

Nearest neighbor parameters

Bioinformatics: sequence and genome analysis By David W. Mount

This is more sophisticated than simply counting the number of AU, UA, CG, GC basepairs in each subsequence. You also tally up the strength of each pair and the energy of one pair stacking on another pair.
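The idea of tallying stacking energies can be sketched as a sum over neighboring basepairs in a duplex. Every number below is a made-up placeholder chosen for easy arithmetic, not a measured nearest-neighbor parameter, and the initiation term, the table layout, and the function name `duplex_energy` are likewise assumptions. Each key is a pair of consecutive bases read 5' to 3' on one strand, with the complementary strand implied.

```python
# HYPOTHETICAL stacking energies in kcal/mol (placeholders, NOT measured values).
# Key "GC" means a G followed by a C on the 5'->3' strand, each paired with
# its Watson-Crick complement on the opposite strand.
ILLUSTRATIVE_STACK = {
    "GC": -3.0,
    "CC": -3.0,
    "CA": -2.0,
    "AU": -1.0,
    "UC": -2.0,
    "CG": -2.0,
}

# HYPOTHETICAL penalty for bringing two strands together, in kcal/mol.
ILLUSTRATIVE_INITIATION = 4.0


def duplex_energy(strand, stack=ILLUSTRATIVE_STACK, initiation=ILLUSTRATIVE_INITIATION):
    """Sum the initiation term and one stacking term per pair of neighboring basepairs."""
    total = initiation
    for first, second in zip(strand, strand[1:]):
        total += stack[first + second]  # KeyError if a step is missing from the table
    return total


# Top strand of the duplex 5'-GCCAUCCG-3' / 3'-CGGUAGGC-5'.
print(duplex_energy("GCCAUCCG"))
```

With these placeholder values the seven stacking terms sum to -16.0 and the initiation term brings the total to -12.0 kcal/mol. A real calculation would substitute published parameters and add terms for loops and terminal effects.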

Page 24: RNA 2D and 3D Structure

Determining parameters

5’ - GCCAUCCG - 3’
3’ - CGGUAGGC - 5’

Heat the duplex in a cuvette and measure absorbance at 260 nm (UV).

Page 25: RNA 2D and 3D Structure

Determining parameters

Reaction: Strand1 + Strand2 = Duplex

Equilibrium constant for each T:

Keq(T) = [Duplex(T)] / ([S1(T)] [S2(T)])

Free energy change: ∆G(T) = -RT ln(Keq(T))
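The relation ∆G(T) = -RT ln(Keq(T)) is easy to evaluate numerically. The sketch below assumes Keq is written for duplex formation (duplex concentration over the product of the strand concentrations, relative to a 1 M standard state), the temperature is in kelvin, and R is expressed in kcal/(mol K). The function names are illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def free_energy_change(keq, temperature_k):
    """Return delta G in kcal/mol for equilibrium constant keq at temperature_k."""
    if keq <= 0 or temperature_k <= 0:
        raise ValueError("keq and temperature must be positive")
    return -R_KCAL * temperature_k * math.log(keq)


def equilibrium_constant(delta_g, temperature_k):
    """Invert the relation: return keq for a given delta G in kcal/mol."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return math.exp(-delta_g / (R_KCAL * temperature_k))


body_temperature = 310.15  # 37 degrees Celsius in kelvin
for keq in (1.0, 1e3, 1e6):
    print(keq, round(free_energy_change(keq, body_temperature), 2))
```

An equilibrium constant of 1 gives ∆G = 0, and larger constants (more duplex at equilibrium) give more negative free energies, which matches the convention that lower energy is more favorable.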

Page 26: RNA 2D and 3D Structure

Determining parameters

• Repeat this for many related sequences and do statistical analysis to get pairwise parameters.

5’ - GCCAUCCG - 3’
3’ - CGGUAGGC - 5’

5’ - GCCAACCG - 3’
3’ - CGGUUGGC - 5’

...

∆∆G: the energy change due to substituting one basepair for another.

Page 27: RNA 2D and 3D Structure

Nearest neighbor parameters

Parameters for most of the “loop” regions are unknown: there are too many possible loops to do experiments for all of them.

It is hard to extrapolate, because a small change in sequence can produce a large change in free energy.

Usually, unpaired regions are penalized, but certain “loops” are known to be very thermodynamically stable, and they are scored with low free energies (e.g. the UNCG hairpin).

Page 28: RNA 2D and 3D Structure

Using thermodynamic parameters

• Dynamic Programming

∆G° = -RT ln(Keq) = -2.4 - 2.2 - 0.9 - 0.9 - 2.1 - 2.4 + 5.4 = -5.5 kcal/mol

Page 29: RNA 2D and 3D Structure

Things to keep in mind

• Calculated free energies are always approximate.

• The most stable calculated structure is not necessarily the most stable real structure.

• Must consider “sub-optimal” calculated structures.

• Must use additional information, if available, to pick the correct structure.

Page 30: RNA 2D and 3D Structure

MFOLD

• One of the best thermodynamic methods.

• Developed by Michael Zuker.

• Web server: http://mfold.bioinfo.rpi.edu/cgi-bin/rna-form1.cgi

• Submit a sequence, ignore the various parameters you can set, and look through the output. Look for dot plots, multiple possible structures, and minimum free energies for the structures. Look at the output in png format unless you prefer another image format.

M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-15 (2003).