Roles of RNA
• mRNA (messenger)
• rRNA (ribosomal)
• tRNA (transfer)
• other ribonucleoproteins (e.g. spliceosome, signal recognition particle, ribonuclease P)
• viral genomes
• artificial ribozymes
Typical transfer RNA structure
[Figure: secondary-structure elements labelled bulges, internal loops, hairpin loop and multi-branched loop.]
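[Figure: hairpin with a helix of pairs C-G, U-A and U-A closing a loop of bases U, C, C, U. The two helix stacks contribute ΔG = -2.1 kcal/mol and ΔG = -1.2 kcal/mol, and the loop contributes ΔG = +4.5 kcal/mol.]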
Thermodynamic parameters are measured on real molecules.
Helix formation = hydrogen bonds + stacking
Entropic penalty for loop formation.
Sum up contributions of helices and loops over the whole structure.
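A minimal sketch of the additive rule, assuming the only terms are the two stacking energies and the loop penalty quoted above (the helper name `total_free_energy` is hypothetical): the free energy of the structure is the sum of its helix and loop contributions.

```python
def total_free_energy(contributions):
    """Sum helix-stacking and loop free energies (kcal/mol) over a structure."""
    return sum(contributions)


# Two stacking terms (negative: stabilizing) and one loop penalty
# (positive: entropic cost), using the values quoted above.
stacks = [-2.1, -1.2]
loops = [+4.5]
dG = total_free_energy(stacks + loops)
print(f"Total dG = {dG:+.1f} kcal/mol")  # prints "Total dG = +1.2 kcal/mol"
```

With only these three terms the total is positive, so on its own this short hairpin would be less stable than the unfolded chain.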
[Figure: the three possible arrangements (a), (b) and (c) of two base pairs i-j and k-l.]
Pairs i-j and k-l are compatible if (a) i < j < k < l, or (b) i < k < l < j.
Arrangement (c), i < k < j < l, is called a pseudoknot. It is usually not counted as secondary structure.
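The three cases can be checked mechanically. This Python sketch (the names `classify` and `compatible` are illustrative, not from any package) orders the two pairs so the first starts earlier and then tests which of (a), (b) or (c) applies; pairs that share a base are reported separately, since a base can pair with only one partner.

```python
def classify(i, j, k, l):
    """Classify base pairs (i, j) and (k, l), each given with i < j and k < l.

    Returns "sequential" for (a), "nested" for (b), "pseudoknot" for (c),
    or "conflict" if the two pairs share a base.
    """
    if k < i:  # put the pair that starts first in (i, j)
        i, j, k, l = k, l, i, j
    if len({i, j, k, l}) < 4:
        return "conflict"
    if j < k:
        return "sequential"  # (a) i < j < k < l
    if l < j:
        return "nested"  # (b) i < k < l < j
    return "pseudoknot"  # (c) i < k < j < l


def compatible(i, j, k, l):
    """Compatible in secondary structure means sequential or nested."""
    return classify(i, j, k, l) in ("sequential", "nested")
```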
Bracket notation is used to represent structure:
a: ((((....))))..((((....))))
b: ((.((((....)))).))
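Converting bracket notation into a list of pairs takes one stack: each ')' closes the most recent unmatched '('. A sketch with positions numbered from 1, as in the i-j notation above; `parse_brackets` is a hypothetical helper name.

```python
def parse_brackets(structure):
    """Return the sorted list of 1-based base pairs (i, j) in a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)  # this base waits for its partner
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # close the most recent '('
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(parse_brackets("((.((((....)))).))"))
```

Because every ')' closes the most recent '(', the pairs produced are always nested or sequential, which is why plain bracket notation cannot express a pseudoknot such as (c).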
Basic problem: Want an algorithm that considers every allowed secondary structure for a given sequence and finds the lowest energy state.
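$$E(i,j) = \min\Big\{ E(i,j-1),\ \min_{i \le k \le j-4} \big[ E(i,k-1) + \varepsilon_{kj} + E(k+1,j-1) \big] \Big\}$$
(Either j is unpaired, or j is paired with some base k.)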
Simplest case: find the structure that maximizes the number of base pairs.
Let $\varepsilon_{ij} = -1$ if bases i and j can pair and $+\infty$ if not.
Ignore loop contributions.
E(i,j) = energy of the minimum-energy structure for the chain segment from i to j.
We want E(1,N).
Algorithms that work by recursion relations like this are called dynamic programming algorithms.
The algorithm is $O(N^3)$, although the number of structures increases exponentially with N.
We also need backtracking to work out the minimum-energy structure:
set B(i,j) = k if j is paired with k, or 0 if j is unpaired.
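A sketch of the recursion for E(i,j) together with the backtracking table B, under the simplest energy rules above. Assumptions not stated on the slides: Watson-Crick and G-U pairs count as "can pair", the minimum hairpin loop is 3 bases (k ≤ j-4), indices are 0-based, and B stores `None` rather than 0 for an unpaired j because 0 is a valid index here. The three nested loops make the $O(N^3)$ cost visible.

```python
import math

# Assumption: Watson-Crick pairs plus G-U wobble are allowed to pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # k <= j - 4: a hairpin loop encloses at least 3 bases


def eps(seq, k, j):
    """epsilon_kj: -1 if bases k and j can pair, +infinity otherwise."""
    return -1.0 if (seq[k], seq[j]) in PAIRS else math.inf


def fold(seq):
    """Return (minimum energy, dot-bracket structure) for maximum base pairing."""
    n = len(seq)
    if n == 0:
        return 0.0, ""
    E = [[0.0] * n for _ in range(n)]   # segments shorter than 5 bases stay 0
    B = [[None] * n for _ in range(n)]  # B[i][j] = partner k of j, None = unpaired

    def e(i, j):
        return E[i][j] if i <= j else 0.0  # an empty segment has energy 0

    for span in range(MIN_LOOP + 1, n):  # fill shorter segments first
        for i in range(n - span):
            j = i + span
            best, partner = E[i][j - 1], None  # case 1: j unpaired
            for k in range(i, j - MIN_LOOP):   # case 2: j paired with k
                cand = e(i, k - 1) + eps(seq, k, j) + e(k + 1, j - 1)
                if cand < best:
                    best, partner = cand, k
            E[i][j], B[i][j] = best, partner

    # Backtracking: follow B from the full chain to recover the pairs.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain a pair
        k = B[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            structure[k], structure[j] = "(", ")"
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return E[0][n - 1], "".join(structure)


print(fold("GGGAAACCC"))
```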
Partition Function Algorithm (for simplest energy rules)
Real energy rules: need to consider many special cases.
What type of loop are you closing?
The algorithm is more complex but is still $O(N^3)$.
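$$Z_{i,j} = Z_{i,j-1} + \sum_{k=i}^{j-4} Z_{i,k-1}\, a_{kj}\, Z_{k+1,j-1}, \qquad \text{where } a_{kj} = \exp(-\varepsilon_{kj}/kT)$$
(Either j is unpaired, or j is paired with some base k.)
$$p_{ij} = \frac{Z_{i+1,j-1}\, a_{ij}\, Z^{\mathrm{ends}}_{ij}}{Z_{1,N}}$$
Equilibrium probability that base i is paired with j, where $Z^{\mathrm{ends}}_{ij}$ is the partition function of the two end sections of the chain (1 to i-1 and j+1 to N), given that i is paired with j.
$$p_{i,0} = 1 - \sum_{j} p_{ij}$$
Equilibrium probability that base i is unpaired.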
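The partition-function recursion has the same shape as the minimization, with min replaced by a sum and each pair energy replaced by its Boltzmann weight $a_{kj}$. This sketch computes $Z_{1,N}$ only (not $Z^{\mathrm{ends}}_{ij}$ or the $p_{ij}$), assuming $\varepsilon = -1$ for Watson-Crick and G-U pairs, a minimum hairpin loop of 3 bases, and kT in the same energy units; the function names are hypothetical.

```python
import math

# Assumption: Watson-Crick pairs plus G-U wobble, each with epsilon = -1.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # k <= j - 4, as in the recursion above


def boltzmann_weight(seq, k, j, kT):
    """a_kj = exp(-epsilon_kj / kT); zero when the bases cannot pair."""
    return math.exp(1.0 / kT) if (seq[k], seq[j]) in PAIRS else 0.0


def partition_function(seq, kT=1.0):
    """Z(1,N) for the simplest energy rules."""
    n = len(seq)
    # Segments too short to hold a pair have only the open structure: Z = 1.
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        return Z[i][j] if i <= j else 1.0  # empty segment: one (empty) structure

    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            total = Z[i][j - 1]  # j unpaired
            for k in range(i, j - MIN_LOOP):  # j paired with k
                total += z(i, k - 1) * boltzmann_weight(seq, k, j, kT) * z(k + 1, j - 1)
            Z[i][j] = total
    return z(0, n - 1)
```

For "GAAAC" the only structures are the open chain and the single G-C pair, so $Z = 1 + e^{1/kT}$; the loops are the same as in the minimization, so the cost is again $O(N^3)$.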
[Figure: example of pairing probabilities, taken from the Vienna RNA package website.]
Is folding kinetics important?
RNA folding kinetics involves reorganisation of secondary structure
Native structures may not be global minimum free energy states.
Morgan & Higgs (1996) J. Chem. Phys.
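Quantity: fitting function and parameters (uncertainties in parentheses)
Groundstate energy: E ≈ C1 + slope × N, with C1 = 2.9 (0.2) and slope = -0.368 (0.001)
Total number of states: ln(number of states) ≈ C2 + slope × N, with C2 = -5.6 (0.4) and slope = 0.533 (0.001)
Number of groundstates: ln(number of groundstates) ≈ C3 + slope × N, with C3 = 1.75 (0.2) and slope = 0.068 (0.001)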
Energy Landscapes in RNA Folding
Morgan & Higgs (1998)
Groundstates are degenerate in this model because energies are integers.
Generate many random groundstates.
How far apart are these groundstates?
How high are the barriers between groundstates?
We found frozen pairs (pairs present in every groundstate).
This figure shows the frozen pairs only.
The molecule is divided into independent unfrozen loops.
Define $N_{\mathrm{eff}}$ as the length of the longest loop.
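Given a set of sampled groundstates, the frozen pairs are simply the intersection of their pair sets. A minimal sketch, assuming the groundstates are supplied as dot-bracket strings (a hypothetical input format); it does not compute $N_{\mathrm{eff}}$.

```python
def pairs_of(structure):
    """Set of 1-based base pairs (i, j) in a balanced dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def frozen_pairs(groundstates):
    """Pairs present in every groundstate: the intersection of their pair sets."""
    sets = [pairs_of(s) for s in groundstates]
    return set.intersection(*sets) if sets else set()


# Two hypothetical degenerate groundstates (4 pairs each) of a 12-base chain
print(sorted(frozen_pairs(["((((....))))", "(((.(...))))"])))
```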
Two groundstates for the same sequence
Minimum Free Energy Prediction
Deterministic: always finds the MFE structure for a given set of energy rules.
If the MFE structure is not the same as the biological structure, this could be because:
(i) energy rules are inaccurate or insufficient
(ii) kinetics is important and molecule is trapped in metastable state.
Monte Carlo simulations of folding kinetics.
Store a current structure.
Estimate rates of removal of existing helices and rates of addition of other compatible helices.
Choose one helix to be added or removed with probability proportional to its rate.
Repeat this many times. This can simulate structure formation from an unfolded state.
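One step of this Monte Carlo scheme can be sketched as follows. The moves and their rates are hypothetical inputs; estimating helix addition and removal rates from the energy rules is not shown. The exponential waiting time (a Gillespie-style clock, drawn with `random.Random.expovariate`) is an added assumption that lets the simulation track time; the slide only specifies choosing a move with probability proportional to its rate.

```python
import random


def kmc_step(rates, rng):
    """One kinetic Monte Carlo step over a dict {move: rate}.

    Moves are hypothetical labels such as ("add", helix) or ("remove", helix).
    Returns (chosen move, time increment).
    """
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("no move has a positive rate")
    moves = list(rates)
    # Choose one move with probability proportional to its rate.
    move = rng.choices(moves, weights=[rates[m] for m in moves])[0]
    # Assumption (Gillespie-style clock): exponential waiting time, mean 1 / total.
    dt = rng.expovariate(total)
    return move, dt


# Usage with hypothetical helices; a real simulation would recompute the
# rates from the new structure after every step.
rng = random.Random(1)
rates = {("add", "H1"): 5.0, ("add", "H2"): 1.0, ("remove", "H3"): 0.5}
t = 0.0
for _ in range(3):
    move, dt = kmc_step(rates, rng)
    t += dt
```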
Qβ is a bacteriophage:
an RNA virus with approximately 4000 nucleotides.
The viral RNA has complex secondary structure.
The replicase gene codes for the replicase protein, an RNA-dependent RNA polymerase that synthesizes the complementary strand. Viral replication needs two steps: plus strand to minus strand to plus strand.
In vitro RNA evolution in the Qβ system
Begin with replicase + nucleotides + viral RNA.
Transfer a small quantity to each successive tube, containing replicase + nucleotides only.
Sequence the RNA after many transfers.
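Barrier heights between alternative groundstates
Observation: the mean barrier height between groundstates scales as
$$\langle h \rangle \sim N_{\mathrm{eff}}^{0.5}, \qquad N_{\mathrm{eff}} \sim 0.3\,N,$$
so $\langle h \rangle \sim (0.3N)^{0.5} \approx 0.55\,N^{0.5}$.
Therefore barriers become significant for large enough sequences.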
An example where kinetics is important to control biological function:
the 5’ region of the MS2 phage.
[Figure: average probability that the SD sequence is free versus time (0 to 8 s) for the wild type and U32C, CC3435AA and SA sequences; the maturation protein gene is marked.]
Time to formation of the 5’ structure influences expression of the maturation protein more than the stability of this structure.
Simulations are compared with experiments on mutant sequences.
RNA in comparison to Proteins
Both have well defined 3d structures
The RNA folding problem is easier because secondary structure separates from tertiary structure more easily, but it is still a complex problem.
The RNA model has real parameters, so you can say something about real molecules. The RNA folding algorithm is simple enough to allow statistical physics to be done (cf. 27-mer lattice protein models).
Part of sequence alignment of Mitochondrial Small Sub-Unit rRNA
The full gene is ~950 bases long.
11 Primate species with mouse as outgroup
Position             *        20         *        40         *        60         *
Mouse       CUCACCAUCUCUUGCUAAUUCAGCCUAUAUACCGCCAUCUUCAGCAAACCCUAAAAAGG-UAUUAAAGUAAGCAAAAGA : 78
Lemur       CUCACCACUUCUUGCUAAUUCAACUUAUAUACCGCCAUCCCCAGCAAACCCUAUUAAGGCCC-CAAAGUAAGCAAAAAC : 78
Tarsier     CUUACCACCUCUUGCUAAUUCAGUCUAUAUACCGCCAUCUUCAGCAAACCCUAAUAAAGGUUUUAAAGUAAGCACAAGU : 79
SakiMonkey  CUUACCACCUCUUGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCUA-UAAUGACAGUAAAGUAAGCACAAGU : 76
Marmoset    CUCACCACGUCUAGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCCU-UAAUGAUUGUAAAGUAAGCAGAAGU : 76
Baboon      CCCACCCUCUCUUGCU----UAGUCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACGAAGUGAGCGCAAAU : 75
Gibbon      CUCACCAUCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACAAAGGCUAUAAAGUAAGCACAAAC : 75
Orangutan   CUCACCACCCCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCCACGAAGUAAGCGCAAAC : 75
Gorilla     CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACGAAGGCCACAAAGUAAGCACAAGU : 75
PygmyChimp  CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU : 75
Chimp       CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU : 75
Human       CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACAAAGUAAGCGCAAGU : 75
Consensus   CucACC  cuCUuGCu    cAgccUaUAUACCGCCAUCuuCAGCAAACcCu    A G     aAAGUaAGC  AA
[Figure from Nature 409, 1 February 2001, p. 616. Figure 1: Phylogenetic relationships among 64 placental mammals and two marsupials based on analysis of 9,779 bp from 15 nuclear and three mtDNA genes. The tree is the minimum evolution topology estimated by neighbour joining (NJ) using maximum likelihood (ML) distances; maximum parsimony (MP) and ML analyses produced a similar topology. Numbers give percent bootstrap support from NJ (top), MP (middle) and ML (bottom). Higher-level groups marked include Afrotheria, Xenarthra, Glires, Primates, Cetartiodactyla, Perissodactyla, Carnivora, Pholidota, Chiroptera and 'Eulipotyphla', and the tree is divided into groups I to IV.]
Murphy et al.
Nature (2001)
uses 15 nuclear plus 3 mitochondrial genes
Afrotheria / Laurasiatheria
Striking examples of convergent evolution
Cao et al. (2000) Gene
uses 12 mitochondrial proteins
RNA pairs model (GR7): 53 complete mammalian mitochondrial genomes
Complete set of rRNAs + tRNAs = 973 pairs. Jow et al. (2002)
[Figure: tree with branch support values (86, 97 and several 100s), and four tree topologies numbered 1 to 4 relating taxa A, B, C, D and E.]
MCMC searches the rugged landscape in tree space using the Metropolis algorithm.
Obtains a set of possible trees weighted according to their likelihood.
1. Rate parameter changes = continuous
2. Branch length changes = continuous
3. Topology changes = discrete
Nearest-neighbour interchange
Long-range move
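The Metropolis rule itself is short. This sketch works in log-likelihood form (to avoid underflow), assuming a symmetric proposal and flat priors so that samples are weighted by likelihood. The toy chain moves a single continuous parameter, as in move type 1; branch-length and topology moves are not implemented. The function names and the Gaussian test likelihood are illustrative.

```python
import math
import random


def metropolis_accept(log_like_new, log_like_old, rng):
    """Accept with probability min(1, L_new / L_old), computed in log space."""
    delta = log_like_new - log_like_old
    if delta >= 0:
        return True  # the proposal is at least as likely: always accept
    return rng.random() < math.exp(delta)  # less likely: accept sometimes


def sample_parameter(log_like, start, step, n_steps, seed=0):
    """Toy Metropolis chain over one continuous parameter (cf. move type 1)."""
    rng = random.Random(seed)
    x, ll = start, log_like(start)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step, step)  # symmetric proposal
        ll_new = log_like(proposal)
        if metropolis_accept(ll_new, ll, rng):
            x, ll = proposal, ll_new
        samples.append(x)  # a rejected move repeats the current state
    return samples


# Illustrative log-likelihood with its peak at 3.0
samples = sample_parameter(lambda x: -0.5 * (x - 3.0) ** 2, start=0.0, step=1.0, n_steps=20000)
```

A proposal with log-likelihood $-\infty$ (for example, an invalid parameter value) is always rejected, since $\exp(-\infty) = 0$.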
Models of Sequence Evolution
$P_{ij}(t)$ = probability of being in state j at time t,
given that the ancestor was in state i at time 0.
States label the bases A, C, G and T.
$$\frac{dP_{ij}(t)}{dt} = \sum_{k} P_{ik}(t)\, r_{kj}$$
$r_{ij}$ is the rate of substitution from state i to state j.
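The equation above can be integrated numerically from $P(0) = I$ (the identity). This sketch uses classical fourth-order Runge-Kutta with NumPy; the equal-rate matrix in the example is an assumption chosen because its solution is known in closed form, $P_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$. The exact solution $P(t) = \exp(rt)$ can also be obtained from a matrix exponential.

```python
import numpy as np


def transition_probabilities(rate, t, n_steps=1000):
    """Integrate dP/dt = P r from P(0) = I with fourth-order Runge-Kutta."""
    rate = np.asarray(rate, dtype=float)
    P = np.eye(rate.shape[0])  # at t = 0 each site is still in its ancestral state
    h = t / n_steps
    for _ in range(n_steps):
        k1 = P @ rate
        k2 = (P + 0.5 * h * k1) @ rate
        k3 = (P + 0.5 * h * k2) @ rate
        k4 = (P + h * k3) @ rate
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return P


# Assumed example: every substitution has the same rate alpha, and each
# diagonal entry is minus the sum of the rest of its row.
alpha = 0.1
r = np.full((4, 4), alpha)
np.fill_diagonal(r, -3 * alpha)
P = transition_probabilities(r, t=2.0)
```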
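The HKY model describes the rate of evolution at single sites:
$$
\begin{array}{c|cccc}
\text{from} \backslash \text{to} & T & C & G & A \\ \hline
T & * & \kappa\pi_C & \pi_G & \pi_A \\
C & \kappa\pi_T & * & \pi_G & \pi_A \\
G & \pi_T & \pi_C & * & \kappa\pi_A \\
A & \pi_T & \pi_C & \kappa\pi_G & *
\end{array}
$$
The frequencies of the four bases are $\pi_T, \pi_C, \pi_G, \pi_A$.
$\kappa$ is the transition-transversion rate parameter.
* means minus the sum of the other elements on the row, so each row sums to zero.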
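A sketch that builds the HKY rate matrix in the T, C, G, A order used above and checks that each row sums to zero. The base frequencies and κ = 4 are hypothetical values for illustration, and no overall rate scaling is applied.

```python
import numpy as np

BASES = ["T", "C", "G", "A"]  # row/column order of the matrix above
TRANSITIONS = {("T", "C"), ("C", "T"), ("G", "A"), ("A", "G")}


def hky_rate_matrix(pi, kappa):
    """r_ij = kappa * pi_j for transitions, pi_j for transversions."""
    r = np.zeros((4, 4))
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                r[i, j] = pi[b] * (kappa if (a, b) in TRANSITIONS else 1.0)
        r[i, i] = -r[i].sum()  # '*': minus the sum of the other row elements
    return r


pi = {"T": 0.3, "C": 0.2, "G": 0.2, "A": 0.3}  # hypothetical base frequencies
r = hky_rate_matrix(pi, kappa=4.0)             # hypothetical kappa
```

Because every off-diagonal entry is $\pi_j$ times a factor that is the same in both directions, $\pi_i r_{ij} = \pi_j r_{ji}$: the model is reversible and the base frequencies are its stationary distribution.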
                            1234567 7654321
                            ((((((( )))))))
Bacillus subtilis           GGCUCGG CCGAGCC
Escherichia coli            GCCCGGA UCCGGGC
Saccharomyces cerevisiae    GCGGAUU AAUUCGC
Drosophila melanogaster     GCCGAAA UUUCGGC
Homo sapiens                GCCGAAA UUUCGGC
Compensatory Substitutions
Two sides of the acceptor stem from a tRNA are shown.
Because the structure is conserved, alignment is possible even between widely different species.
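The pairs in the acceptor-stem table can be read off programmatically to expose the compensatory changes. This sketch assumes the 3' half is written in 7654321 order, as in the table, and counts G-U as a valid pair; the names are illustrative.

```python
# Acceptor-stem halves from the table; the 3' half is written 7654321,
# so its position p (1..7) is at character index 7 - p.
STEMS = {
    "Bacillus subtilis": ("GGCUCGG", "CCGAGCC"),
    "Escherichia coli": ("GCCCGGA", "UCCGGGC"),
    "Saccharomyces cerevisiae": ("GCGGAUU", "AAUUCGC"),
    "Drosophila melanogaster": ("GCCGAAA", "UUUCGGC"),
    "Homo sapiens": ("GCCGAAA", "UUUCGGC"),
}
VALID = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus G-U wobble


def stem_pairs(five, three):
    """The 7 pairs, position 1 first, as two-letter strings such as 'GC'."""
    return [five[p - 1] + three[7 - p] for p in range(1, 8)]


def pair_types_by_position(stems):
    """For each stem position, the set of pair types seen across species."""
    columns = [set() for _ in range(7)]
    for five, three in stems.values():
        for p, pair in enumerate(stem_pairs(five, three)):
            columns[p].add(pair)
    return columns


columns = pair_types_by_position(STEMS)
for p, types in enumerate(columns, start=1):
    print(p, sorted(types), all(t in VALID for t in types))
```

Positions 2 to 7 each show more than one pair type across the five species, yet every pair remains a valid pair: the changes on the two sides of the stem compensate each other.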
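$$
\begin{array}{c|ccccccc}
 & 7 & 6 & 5 & 4 & 3 & 2 & 1 \\ \hline
7 & * & \alpha_{67}\pi_6 & \alpha_{57}\pi_5 & \alpha_{47}\pi_4 & \alpha_{37}\pi_3 & \alpha_{27}\pi_2 & \alpha_{17}\pi_1 \\
6 & \alpha_{67}\pi_7 & * & \alpha_{56}\pi_5 & \alpha_{46}\pi_4 & \alpha_{36}\pi_3 & \alpha_{26}\pi_2 & \alpha_{16}\pi_1 \\
5 & \alpha_{57}\pi_7 & \alpha_{56}\pi_6 & * & \alpha_{45}\pi_4 & \alpha_{35}\pi_3 & \alpha_{25}\pi_2 & \alpha_{15}\pi_1 \\
4 & \alpha_{47}\pi_7 & \alpha_{46}\pi_6 & \alpha_{45}\pi_5 & * & \alpha_{34}\pi_3 & \alpha_{24}\pi_2 & \alpha_{14}\pi_1 \\
3 & \alpha_{37}\pi_7 & \alpha_{36}\pi_6 & \alpha_{35}\pi_5 & \alpha_{34}\pi_4 & * & \alpha_{23}\pi_2 & \alpha_{13}\pi_1 \\
2 & \alpha_{27}\pi_7 & \alpha_{26}\pi_6 & \alpha_{25}\pi_5 & \alpha_{24}\pi_4 & \alpha_{23}\pi_3 & * & \alpha_{12}\pi_1 \\
1 & \alpha_{17}\pi_7 & \alpha_{16}\pi_6 & \alpha_{15}\pi_5 & \alpha_{14}\pi_4 & \alpha_{13}\pi_3 & \alpha_{12}\pi_2 & *
\end{array}
$$
States: 1 = AU, 2 = GU, 3 = GC, 4 = UA, 5 = UG, 6 = CG, 7 = MM (mismatch).
Each off-diagonal entry is $r_{ij} = \alpha_{ij}\pi_j$ with $\alpha_{ij} = \alpha_{ji}$; * means minus the sum of the other elements on the row.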
Model 7A is a General Reversible 7-state Model
7 frequencies $\pi_i$ + 21 rate parameters $\alpha_{ij}$
- 2 constraints = 26 free parameters
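A sketch of a general reversible 7-state matrix built as $r_{ij} = \pi_j\alpha_{ij}$ with symmetric $\alpha$. The two constraints are assumed here to be that the frequencies sum to 1 and that the mean substitution rate $-\sum_i \pi_i r_{ii}$ equals 1; this is a common convention, not something the slide states. States are 0-based in the code (index 0 = state 1 = AU), and the function name is hypothetical.

```python
import itertools

import numpy as np

STATES = ["AU", "GU", "GC", "UA", "UG", "CG", "MM"]  # states 1..7; MM = mismatch


def reversible_rate_matrix(pi, alpha):
    """r_ij = pi_j * alpha_ij (i != j), alpha symmetric, rows summing to zero."""
    pi = np.asarray(pi, dtype=float)
    pi = pi / pi.sum()  # assumed constraint 1: frequencies sum to 1
    n = len(pi)
    r = np.zeros((n, n))
    for (i, j), a in alpha.items():
        r[i, j] = pi[j] * a
        r[j, i] = pi[i] * a  # the same alpha in both directions
    np.fill_diagonal(r, -r.sum(axis=1))  # '*': minus the sum of the row
    mean_rate = -np.dot(pi, np.diag(r))
    return r / mean_rate  # assumed constraint 2: mean rate fixed at 1


n = len(STATES)
pairs = list(itertools.combinations(range(n), 2))  # the 21 alpha_ij with i < j
print(n + len(pairs) - 2)  # prints 26: 7 + 21 - 2 free parameters
```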
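[Figure: probability $P_{ii}$ of remaining in the same state, for SSU rRNA sequences from Eubacteria.]
[Figure: probability $P_{ij}$ of changes from CG to other pairs, for SSU rRNA from Eubacteria.]
[Diagram: pair states AU, GC, GU, CG, UA and UG connected by slow and fast substitution routes.]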
Selection against GU and UG is weaker than against mismatches.
Double transitions are faster than double transversions.
Double transitions are faster than single transitions to GU and UG states. This is explained by the theory of compensatory substitutions.
What is going on?
Molecule       G+C avg  G+C helix  GC     CG     AU     UA     GU     UG     MM     Seqs  Pairs
tRNA mitoch.   0.339    0.448      0.266  0.121  0.257  0.233  0.046  0.030  0.046  884   21
tRNA general   0.532    0.681      0.372  0.260  0.128  0.142  0.043  0.025  0.030  754   21
tRNA archaea   0.636    0.829      0.473  0.320  0.057  0.077  0.031  0.020  0.022  64    21
RNase P        0.594    0.730      0.385  0.296  0.117  0.104  0.050  0.022  0.026  84    80
SSU rRNA       0.545    0.674      0.352  0.298  0.122  0.173  0.020  0.021  0.014  455   296
(G+C avg = average G+C; G+C helix = G+C in helical regions; GC to MM = frequencies of each pair type, MM = mismatch; Seqs = number of sequences; Pairs = number of pairs.)
Analysis of RNA sequence databases
Selection for thermodynamically stable structures
Higgs (2000) Quart. Rev. Biophysics
Mutabilities   GC     CG     AU     UA     GU     UG     MM
tRNA mitoch.   0.67   0.84   0.86   0.77   2.44   3.32   2.32
tRNA general   0.49   0.83   1.46   1.24   1.96   5.01   0.99
tRNA archaea   0.45   0.89   4.01   1.78   1.85   3.00   0.86
RNase P        0.65   0.60   1.46   1.09   1.72   2.84   5.24
SSU rRNA       0.55   0.66   1.40   0.93   3.92   4.36   7.84

Ratio                                          tRNA mitoch.  tRNA general  tRNA archaea  RNase P  SSU rRNA
Double transitions / double transversions      4.7           1.7           2.3           3.1      2.1
Double transitions / transitions to GU or UG   1.6           2.0           8.9           3.6      2.8
Analysis of RNA Substitution Rates
Thermodynamic properties influence Evolutionary properties