
Roles of RNA

• mRNA (messenger)

• rRNA (ribosomal)

• tRNA (transfer)

• other ribonucleoproteins (e.g. spliceosome, signal recognition particle, ribonuclease P)

• viral genomes

• artificial ribozymes

Typical transfer RNA structure

Bulges

Internal loops

Hairpin loop

Multi-branched loop

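[Figure: example hairpin with stem pairs CG, UA, UA and loop bases U, C, C, U. Stacking of successive base pairs: ΔG = -2.1 kcal/mol and ΔG = -1.2 kcal/mol. Loop: ΔG = +4.5 kcal/mol.]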

Thermodynamics parameters are measured on real molecules.

Helix formation = hydrogen bonds + stacking

Entropic penalty for loop formation.

Sum up contributions of helices and loops over the whole structure.
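As a worked instance of this additive rule, the three contributions quoted for the example hairpin can simply be summed. This is a minimal sketch that assumes those three terms are the only ones in the structure; the dictionary keys and the function name are illustrative, not taken from the slides.

```python
# Free energy contributions in kcal/mol, as quoted for the example hairpin.
contributions = {
    "stack_1": -2.1,       # stacking of the first two adjacent base pairs
    "stack_2": -1.2,       # stacking of the next two adjacent base pairs
    "hairpin_loop": 4.5,   # entropic penalty for closing the loop
}


def total_free_energy(terms):
    """Sum helix and loop contributions over the whole structure."""
    return sum(terms.values())


total = total_free_energy(contributions)
print(f"Total free energy: {total:+.1f} kcal/mol")
```

With these numbers the helix terms (-3.3 kcal/mol) do not outweigh the loop penalty, so the total comes out at +1.2 kcal/mol.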

[Figure: three arrangements of two base pairs i-j and k-l: (a) side by side, (b) nested, (c) crossing.]

Pairs i-j and k-l are compatible if (a) i < j < k < l , or (b) i < k < l < j .

(c) is called a pseudoknot: i < k < j < l . Usually not counted as secondary structure.

Bracket notation is used to represent structure:

a: ((((....))))..((((....))))

b: ((.((((....)))).))
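The bracket notation and the compatibility rule can be checked mechanically. The sketch below is only an illustration: the function names `parse_brackets` and `compatible` are hypothetical, and positions are numbered from 1 to match the i, j, k, l used above.

```python
def parse_brackets(structure):
    """Return the sorted list of base pairs (i, j), 1-based with i < j."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def compatible(pair1, pair2):
    """True for side-by-side (a) or nested (b) pairs, False for a pseudoknot (c)."""
    (i, j), (k, l) = sorted([pair1, pair2])
    if len({i, j, k, l}) < 4:
        return False          # a base cannot take part in two pairs
    return j < k or l < j     # (a) i < j < k < l, or (b) i < k < l < j


pairs_a = parse_brackets("((((....))))..((((....))))")
pairs_b = parse_brackets("((.((((....)))).))")
print(pairs_a)
print(pairs_b)
```

Any structure that can be written with balanced brackets contains only compatible pairs, which is why pseudoknots cannot be expressed in this notation without extra symbols.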

Basic problem: Want an algorithm that considers every allowed secondary structure for a given sequence and finds the lowest energy state.

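Simplest case: find the structure which maximizes the number of base pairs.

Let ε_ij = -1 if bases i and j can pair and +∞ if not.

Ignore loop contributions.

E(i,j) = energy of the minimum energy structure for the chain segment from i to j.

We want E(1,N).

$$
E(i,j) = \min\left\{ E(i,j-1),\ \min_{i \le k \le j-4} \left[ E(i,k-1) + E(k+1,j-1) + \epsilon_{kj} \right] \right\}
$$

[Diagram: the segment from i to j is either the segment from i to j-1 with j unpaired, or j paired with some k, which splits the segment into i..k-1 and k+1..j-1.]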

Algorithms that work by recursion relations like this are called dynamic programming.

The algorithm is O(N^3) although the number of structures increases exponentially with N.

Also need to do backtracking to work out the minimum energy structure:

Set B(i,j) = k if j is paired with k, or 0 if unpaired.
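A minimal sketch of the base-pair maximization case with backtracking follows. It assumes energy -1 for every allowed pair (AU, GC and GU in either orientation, which is an assumption about which bases "can pair"), requires j - k >= 4 for a pair k-j, and uses 0-based indices with `None` instead of 0 to mark an unpaired base. The function name `fold_max_pairs` is hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_SEP = 4  # base j may pair with k only if k <= j - 4


def fold_max_pairs(seq):
    """Return (minimum energy, bracket string) for the simplest energy rules."""
    n = len(seq)
    E = [[0] * n for _ in range(n)]      # E[i][j]: best energy of segment i..j
    B = [[None] * n for _ in range(n)]   # B[i][j]: partner of j, or None

    def energy(i, j):
        return E[i][j] if i < j else 0   # empty or one-base segments cost 0

    for length in range(MIN_SEP, n):     # length = j - i, shortest segments first
        for i in range(n - length):
            j = i + length
            best, partner = energy(i, j - 1), None      # case: j unpaired
            for k in range(i, j - MIN_SEP + 1):         # case: j paired with k
                if (seq[k], seq[j]) in PAIRS:
                    cand = energy(i, k - 1) + energy(k + 1, j - 1) - 1
                    if cand < best:
                        best, partner = cand, k
            E[i][j], B[i][j] = best, partner

    # Backtracking: follow the stored partners down from the full segment.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i < MIN_SEP:
            continue
        k = B[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            structure[k], structure[j] = "(", ")"
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return (E[0][n - 1] if n else 0), "".join(structure)


print(fold_max_pairs("GGGAAAUCC"))
```

The three nested loops over i, j and k are what make the run time O(N^3). When several structures tie for the minimum, this sketch reports only the first one it finds.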

Partition Function Algorithm (for simplest energy rules)

Real Energy Rules : Need to consider many special cases.

What type of loop are you closing?

Algorithm is more complex but still is O(N^3).

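For the simplest energy rules, the partition function Z_ij of the chain segment from i to j obeys the same kind of recursion (j unpaired, or j paired with k), with ε_kj the pairing energy of bases k and j:

$$
Z_{ij} = Z_{i,j-1} + \sum_{k=i}^{j-4} Z_{i,k-1}\, Z_{k+1,j-1}\, a_{kj}, \qquad \text{where } a_{kj} = \exp(-\epsilon_{kj}/kT)
$$

[Diagram: the segment from i to j is either the segment from i to j-1 with j unpaired, or j paired with some k.]

Equilibrium probability that base i is paired with j:

$$
p_{ij} = \frac{a_{ij}\, Z_{i+1,j-1}\, Z^{\mathrm{ends}}_{ij}}{Z_{1,N}}
$$

Equilibrium probability that base i is unpaired:

$$
p_{i,0} = 1 - \sum_{j=1}^{N} p_{ij}
$$

[Diagram: chain from 1 to N with the pair i-j marked, separating the interior segment from the two ends.]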
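The partition function recursion for the simplest energy rules can be sketched in the same way as the minimum energy recursion, replacing min by a sum and energies by Boltzmann factors. The sketch assumes energy -1 for each allowed pair (AU, GC, GU in either orientation), a minimum separation j - k >= 4, and energies measured in the same units as kT; it computes only the total partition function, not the pairing probabilities. The function name is hypothetical.

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_SEP = 4  # base j may pair with k only if k <= j - 4


def partition_function(seq, kT=1.0):
    """Partition function of the whole chain for the simplest energy rules."""
    n = len(seq)
    a = math.exp(1.0 / kT)                 # exp(-energy/kT) with energy = -1
    Z = [[1.0] * n for _ in range(n)]      # short segments: only the open chain

    def z(i, j):
        return Z[i][j] if i < j else 1.0   # empty or one-base segments

    for length in range(MIN_SEP, n):
        for i in range(n - length):
            j = i + length
            total = z(i, j - 1)                              # j unpaired
            for k in range(i, j - MIN_SEP + 1):              # j paired with k
                if (seq[k], seq[j]) in PAIRS:
                    total += z(i, k - 1) * z(k + 1, j - 1) * a
            Z[i][j] = total
    return Z[0][n - 1] if n else 1.0


Z_total = partition_function("GGAAACC", kT=1.0)
print(Z_total)                                     # 1 + 4e + e^2 for this sequence
print(partition_function("GGAAACC", kT=math.inf))  # counts the structures
print(1.0 / Z_total)                               # probability of the open chain
```

Setting kT to infinity makes every Boltzmann factor equal to 1, so the same recursion counts the allowed structures. For GGAAACC these are the open chain, four single pairs and one nested double pair.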

Example of pairing probabilities taken from Vienna package web-site


Is folding kinetics important?

RNA folding kinetics involves reorganisation of secondary structure

Native structures may not be global minimum free energy states.

Morgan & Higgs (1996) J. Chem. Phys.

Quantity                  Fitting function                               Parameters
Groundstate energy        E = C1 + slope × N                             C1 = 2.9 (0.2), slope = -0.368 (0.001)
Total number of states    ln(number of states) = C2 + slope × N          C2 = -5.6 (0.4), slope = 0.533 (0.001)
Number of groundstates    ln(number of groundstates) = C3 + slope × N    C3 = 1.75 (0.2), slope = 0.068 (0.001)

Energy Landscapes in RNA Folding

Morgan & Higgs (1998)

Groundstates are degenerate in this model because energies are integers.

Generate many random groundstates.

How far apart are these groundstates?

How high are the barriers between groundstates?

We found frozen pairs (pairs present in every groundstate).

This figure shows the frozen pairs only.

The molecule is divided into independent unfrozen loops.

Define Neff as the length of the longest loop.

Two groundstates for the same sequence

Minimum Free Energy Prediction

Deterministic. Always gets MFE structure for a given set of energy rules.

If MFE structure is not the same as biological structure, this could be because

(i) energy rules are inaccurate or insufficient

(ii) kinetics is important and molecule is trapped in metastable state.

Monte Carlo simulations of folding kinetics.

Store a current structure.

Estimate rates of removal of existing helices and rates of addition of other compatible helices.

Choose one helix to be added or removed with probability proportional to its rate.

Repeat this many times. Can simulate structure formation from an unfolded state.
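The selection step, choosing one helix move with probability proportional to its rate, can be sketched as follows. The move names and rate values are made up for illustration; how the rates are estimated from the helices is not covered here.

```python
import random


def choose_move(moves, rng=random):
    """Pick one (name, rate) move with probability proportional to its rate."""
    total = sum(rate for _, rate in moves)
    if total <= 0:
        raise ValueError("no move has a positive rate")
    threshold = rng.random() * total     # uniform point in [0, total)
    running = 0.0
    for name, rate in moves:
        running += rate
        if threshold < running:
            return name
    # Round-off guard: fall back to the last move with a positive rate.
    return [name for name, rate in moves if rate > 0][-1]


# Hypothetical rates for three candidate moves from the current structure.
moves = [("add helix A", 3.0), ("add helix B", 1.0), ("remove helix C", 0.0)]
rng = random.Random(0)
counts = {name: 0 for name, _ in moves}
for _ in range(10000):
    counts[choose_move(moves, rng)] += 1
print(counts)
```

In a full simulation the chosen move would be applied to the stored structure and the rates recomputed before the next step.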

Qβ is a bacteriophage.

It is an RNA virus with approx. 4000 nucleotides.

The viral RNA has complex secondary structure.

The replicase gene codes for the replicase protein. This is an RNA-dependent RNA polymerase, which synthesizes the complementary strand. Viral replication needs two steps: plus to minus to plus.

In vitro RNA evolution in the Qβ system

[Figure: serial transfer experiment through a row of tubes.]

Begin with replicase + nucleotides + viral RNA.

Later tubes contain replicase + nucleotides only.

Transfer a small quantity to each successive tube.

Sequence the RNA after many transfers.

Barrier heights between alternative groundstates

Observation:

Mean barrier height between groundstates scales as

<h> ~ Neff^0.5

Neff ~ 0.3 N

Therefore barriers become significant for large enough sequences.

An example where kinetics is important to control biological function:

the 5’ region of the MS2 phage.

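[Figure: map of the MS2 genome (length about 3500) with the maturation protein gene starting near position 130.]

[Plot: average probability that the SD is free against time (s), from 0 to 8 s, with curves for CC3435AA, WT & U32C, and SA.]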

Time to formation of the 5’ structure influences expression of the maturation protein more than the stability of this structure.

Simulations compare with experiments on mutant sequences.

RNA in comparison to Proteins

Both have well-defined 3D structures.

The RNA folding problem is easier because secondary structure separates from tertiary structure more easily, but it is still a complex problem.

The RNA model has real parameters, therefore you can say something about real molecules. The RNA folding algorithm is simple enough to be able to do statistical physics (cf. 27-mer lattice protein models).

Part of sequence alignment of Mitochondrial Small Sub-Unit rRNA

Full gene is length ~950

11 Primate species with mouse as outgroup

                      *        20         *        40         *        60         *
Mouse      : CUCACCAUCUCUUGCUAAUUCAGCCUAUAUACCGCCAUCUUCAGCAAACCCUAAAAAGG-UAUUAAAGUAAGCAAAAGA : 78
Lemur      : CUCACCACUUCUUGCUAAUUCAACUUAUAUACCGCCAUCCCCAGCAAACCCUAUUAAGGCCC-CAAAGUAAGCAAAAAC : 78
Tarsier    : CUUACCACCUCUUGCUAAUUCAGUCUAUAUACCGCCAUCUUCAGCAAACCCUAAUAAAGGUUUUAAAGUAAGCACAAGU : 79
SakiMonkey : CUUACCACCUCUUGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCUA-UAAUGACAGUAAAGUAAGCACAAGU : 76
Marmoset   : CUCACCACGUCUAGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCCU-UAAUGAUUGUAAAGUAAGCAGAAGU : 76
Baboon     : CCCACCCUCUCUUGCU----UAGUCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACGAAGUGAGCGCAAAU : 75
Gibbon     : CUCACCAUCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACAAAGGCUAUAAAGUAAGCACAAAC : 75
Orangutan  : CUCACCACCCCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCCACGAAGUAAGCGCAAAC : 75
Gorilla    : CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACGAAGGCCACAAAGUAAGCACAAGU : 75
PygmyChimp : CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU : 75
Chimp      : CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU : 75
Human      : CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACAAAGUAAGCGCAAGU : 75
Consensus  : CucACC cuCUuGCu cAgccUaUAUACCGCCAUCuuCAGCAAACcCu A G aAAGUaAGC AA

[Figure reproduced from letters to nature, Nature, vol. 409, 1 February 2001, p. 616, www.nature.com. © 2001 Macmillan Magazines Ltd. Phylogenetic tree of mammalian genera with bootstrap support values on the branches, arranged in four major clades I to IV. Higher-level groups labelled on the tree include Afrotheria, Xenarthra, Glires, Primates, Cetartiodactyla, Perissodactyla, Carnivora, Pholidota, Chiroptera and 'Eulipotyphla'.]

Figure 1. Phylogenetic relationships among 64 placental mammals and two marsupials based on analysis of 9,779 bp from 15 nuclear and three mtDNA genes. The tree represents the minimum evolution topology estimated through neighbour joining (NJ), using maximum likelihood (ML) distances (see Methods). Maximum parsimony (MP) and ML analyses (see Supplementary Information) produced a similar topology (MP: TL = 26,422, consistency index CI = 0.34, retention index RI = 0.47; ML: Ln Likelihood = -95086.03) with differences revolving mostly around short internodes with low bootstrap support. Numbers indicate per cent bootstrap support from NJ (top), MP (middle) and ML (bottom), based on 1,000 iterations for NJ and MP, and 100 iterations for ML. ML analyses were based on a pruned data set (37 taxa marked with asterisks). Terminal taxa are labelled by their generic designation, except where multiple members of a genus were included (see Methods). Brackets indicate higher level taxonomic groups observed in resulting trees (right of tree). A difference between MP/ML and NJ analyses was the position of the flying lemur (Cynocephalus, double asterisk). MP and ML differed from the shown tree by supporting primate monophyly and a sister-group relationship between Dermoptera and Scandentia. Weighted parsimony analyses using only transversions (Tv), removal of third position transitions (Ts), and a Tv:Ts weight of 2:1 produced topologies congruent with the shown tree, with differences revolving only around branches depicted here with low (<50%) bootstrap support.

Murphy et al., Nature (2001)

uses 15 nuclear plus 3 mitochondrial genes

Afrotheria / Laurasiatheria

Striking examples of convergent evolution

Cao et al. (2000) Gene

uses 12 mitochondrial proteins

RNA pairs model (GR7) 53 complete Mammalian mitochondrial genomes

Complete set of rRNAs + tRNAs from = 973 pairs. Jow et al. (2002)

[Figures: phylogenetic tree with branch support values (86 to 100); four alternative tree topologies, numbered 1 to 4, on taxa A to E.]

MCMC searches the rugged landscape in tree space using the Metropolis algorithm.

Obtains a set of possible trees weighted according to their likelihood.

1. Rate parameter changes = continuous

2. Branch length changes = continuous

3. Topology changes = discrete

Nearest-neighbour interchange

Long-range move
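The acceptance step of the Metropolis algorithm can be sketched as below. This assumes a symmetric proposal (each move is as likely as its reverse) and that trees are weighted by likelihood alone; the function name and the log-likelihood values are illustrative, not taken from any phylogenetics package.

```python
import math
import random


def metropolis_accept(log_like_current, log_like_proposed, rng=random):
    """Accept a proposed move with probability min(1, L_proposed / L_current)."""
    if log_like_proposed >= log_like_current:
        return True                       # uphill moves are always accepted
    ratio = math.exp(log_like_proposed - log_like_current)
    return rng.random() < ratio           # downhill moves are sometimes accepted


# Hypothetical log-likelihoods of the current tree and of a proposed tree
# (after a rate change, a branch length change, or a topology change).
rng = random.Random(1)
accepted = sum(metropolis_accept(-1000.0, -1001.0, rng) for _ in range(10000))
print(accepted / 10000)   # should be close to exp(-1)
```

Accepting some downhill moves is what lets the chain cross valleys in the rugged landscape instead of stopping at the first local optimum.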

Models of Sequence Evolution

Pij(t) = probability of being in state j at time t

given that ancestor was in state i at time 0.

States label bases A, C, G & T.

$$
\frac{dP_{ij}}{dt} = \sum_{k} P_{ik}\, r_{kj}
$$

rij is the rate of substitution from state i to state j

The HKY model describes rate of evolution of single sites

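$$
R = \begin{pmatrix}
* & \kappa\pi_G & \pi_C & \pi_T \\
\kappa\pi_A & * & \pi_C & \pi_T \\
\pi_A & \pi_G & * & \kappa\pi_T \\
\pi_A & \pi_G & \kappa\pi_C & *
\end{pmatrix}
$$

Rows give the base substituted from and columns the base substituted to, both in the order A, G, C, T.

The frequencies of the four bases are π_A, π_G, π_C, π_T.

κ is the transition-transversion rate parameter.

* means minus the sum of elements on the row.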
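To make the HKY rate matrix concrete, the sketch below builds it from base frequencies and a transition-transversion parameter (written here as `freqs` and `kappa`, with rate κ times the target frequency for transitions and the target frequency alone for transversions), then obtains P(t) as the matrix exponential of R t, which solves dP/dt = P R with P(0) equal to the identity. The frequency values, the state order A, G, C, T and the plain-Python exponential (Taylor series with scaling and squaring) are illustrative choices.

```python
BASES = "AGCT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(freqs, kappa):
    """Rate matrix with r_ij = kappa * freq_j (transitions) or freq_j (transversions)."""
    n = len(BASES)
    R = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                R[i][j] = freqs[b] * (kappa if (a, b) in TRANSITIONS else 1.0)
        R[i][i] = -sum(R[i])      # diagonal: minus the sum of the row
    return R


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(R, t, squarings=10, terms=12):
    """P(t) = exp(R t), via a Taylor series on R t / 2^s followed by s squarings."""
    n = len(R)
    scale = t / (2 ** squarings)
    M = [[R[i][j] * scale for j in range(n)] for i in range(n)]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in P]
    for m in range(1, terms + 1):
        term = [[x / m for x in row] for row in matmul(term, M)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        P = matmul(P, P)
    return P


freqs = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}   # made-up frequencies
R = hky_rate_matrix(freqs, kappa=2.0)
P = transition_probabilities(R, t=0.5)
for base, row in zip(BASES, P):
    print(base, [round(x, 4) for x in row])
```

Each row of P(t) sums to 1, and for long times every row approaches the base frequencies, as expected for a reversible model.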

                          1234567   7654321
                          (((((((   )))))))
Bacillus subtilis         GGCUCGG   CCGAGCC
Escherichia coli          GCCCGGA   UCCGGGC
Saccharomyces cerevisiae  GCGGAUU   AAUUCGC
Drosophila melanogaster   GCCGAAA   UUUCGGC
Homo sapiens              GCCGAAA   UUUCGGC

Compensatory Substitutions

Two sides of the acceptor stem from a tRNA are shown.

Due to structure conservation alignment is possible in widely different species.
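The acceptor stem table can be checked directly: position n on the 5' side pairs with position n on the 3' side, which is written in reverse order (7 to 1). The sketch below uses the five sequences from the table; treating GU and UG as allowed pairs is an assumption, and the names `STEMS` and `stem_pairs` are illustrative.

```python
ALLOWED = {"AU", "UA", "GC", "CG", "GU", "UG"}

# Acceptor stem sides: (5' side, positions 1-7; 3' side, positions 7-1).
STEMS = {
    "Bacillus subtilis": ("GGCUCGG", "CCGAGCC"),
    "Escherichia coli": ("GCCCGGA", "UCCGGGC"),
    "Saccharomyces cerevisiae": ("GCGGAUU", "AAUUCGC"),
    "Drosophila melanogaster": ("GCCGAAA", "UUUCGGC"),
    "Homo sapiens": ("GCCGAAA", "UUUCGGC"),
}


def stem_pairs(five_prime, three_prime):
    """Return the base pair at each stem position 1..7 as a two-letter string."""
    return [a + b for a, b in zip(five_prime, reversed(three_prime))]


for species, (left, right) in STEMS.items():
    pairs = stem_pairs(left, right)
    print(f"{species:26s}", " ".join(pairs), all(p in ALLOWED for p in pairs))

# Pairs seen at position 7 across the species.
position_7 = {stem_pairs(left, right)[6] for left, right in STEMS.values()}
print(sorted(position_7))
```

Every position is paired in every species even though the bases differ, for example GC, AU and UA at position 7. This is the pattern expected when a change on one side of the helix is compensated by a change on the other.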

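States, numbered 1 to 7: AU, GU, GC, UA, UG, CG, MM.

Rates are r_ij = α_ij π_j for i ≠ j, with α_ij = α_ji, and * on the diagonal:

$$
R = \begin{pmatrix}
* & \alpha_{12}\pi_2 & \alpha_{13}\pi_3 & \alpha_{14}\pi_4 & \alpha_{15}\pi_5 & \alpha_{16}\pi_6 & \alpha_{17}\pi_7 \\
\alpha_{12}\pi_1 & * & \alpha_{23}\pi_3 & \alpha_{24}\pi_4 & \alpha_{25}\pi_5 & \alpha_{26}\pi_6 & \alpha_{27}\pi_7 \\
\alpha_{13}\pi_1 & \alpha_{23}\pi_2 & * & \alpha_{34}\pi_4 & \alpha_{35}\pi_5 & \alpha_{36}\pi_6 & \alpha_{37}\pi_7 \\
\alpha_{14}\pi_1 & \alpha_{24}\pi_2 & \alpha_{34}\pi_3 & * & \alpha_{45}\pi_5 & \alpha_{46}\pi_6 & \alpha_{47}\pi_7 \\
\alpha_{15}\pi_1 & \alpha_{25}\pi_2 & \alpha_{35}\pi_3 & \alpha_{45}\pi_4 & * & \alpha_{56}\pi_6 & \alpha_{57}\pi_7 \\
\alpha_{16}\pi_1 & \alpha_{26}\pi_2 & \alpha_{36}\pi_3 & \alpha_{46}\pi_4 & \alpha_{56}\pi_5 & * & \alpha_{67}\pi_7 \\
\alpha_{17}\pi_1 & \alpha_{27}\pi_2 & \alpha_{37}\pi_3 & \alpha_{47}\pi_4 & \alpha_{57}\pi_5 & \alpha_{67}\pi_6 & *
\end{pmatrix}
$$

Model 7A is a General Reversible 7-state Model

7 frequencies π_i + 21 rate parameters α_ij

- 2 constraints = 26 free parameters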

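[Plot: probability P_ii of remaining in the same state. SSU rRNA sequences from Eubacteria.]

[Plot: probability P_ij of changes from CG to other pairs. SSU rRNA from Eubacteria.]

[Diagram: the six pair states AU, GC, GU, CG, UA and UG, with the substitution routes between them marked slow or fast.]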

Selection against GU and UG is weaker than against mismatches.

Double transitions are faster than double transversions.

Double transitions are faster than single transitions to GU and UG states. This is explained by the theory of compensatory substitutions.
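The categories used in these statements can be made explicit with a small classifier for changes between pair states. A transition is a change within the purines (A, G) or within the pyrimidines (C, U); anything else is a transversion. The function names are illustrative.

```python
PURINES = {"A", "G"}


def is_transition(x, y):
    """True if x -> y stays within the purines or within the pyrimidines."""
    return x != y and ((x in PURINES) == (y in PURINES))


def classify(pair_from, pair_to):
    """Classify a change between two base-pair states such as 'AU' -> 'GC'."""
    changes = [(x, y) for x, y in zip(pair_from, pair_to) if x != y]
    if not changes:
        return "no change"
    kinds = ["transition" if is_transition(x, y) else "transversion"
             for x, y in changes]
    if len(kinds) == 1:
        return "single " + kinds[0]
    if kinds[0] == kinds[1]:
        return "double " + kinds[0]
    return "mixed double change"


for target in ["GU", "GC", "UA", "UG", "CG"]:
    print("AU ->", target, ":", classify("AU", target))
```

AU to GC is a double transition that can pass through the intermediate GU, while AU to UA is a double transversion whose single-step intermediates are all mismatches.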

What is going on?

Quantity               tRNA mitoch.   tRNA general   tRNA archaea   RNase P   SSU rRNA
G+C average            0.339          0.532          0.636          0.594     0.545
G+C helical regions    0.448          0.681          0.829          0.730     0.674
Frequency GC           0.266          0.372          0.473          0.385     0.352
Frequency CG           0.121          0.260          0.320          0.296     0.298
Frequency AU           0.257          0.128          0.057          0.117     0.122
Frequency UA           0.233          0.142          0.077          0.104     0.173
Frequency GU           0.046          0.043          0.031          0.050     0.020
Frequency UG           0.030          0.025          0.020          0.022     0.021
Frequency MM           0.046          0.030          0.022          0.026     0.014
Number of sequences    884            754            64             84        455
Number of pairs        21             21             21             80        296

Analysis of RNA sequence databases

Selection for thermodynamically stable structures

Higgs (2000) Quart. Rev. Biophysics

Quantity                                       tRNA mitoch.   tRNA general   tRNA archaea   RNase P   SSU rRNA
Mutability GC                                  0.67           0.49           0.45           0.65      0.55
Mutability CG                                  0.84           0.83           0.89           0.60      0.66
Mutability AU                                  0.86           1.46           4.01           1.46      1.40
Mutability UA                                  0.77           1.24           1.78           1.09      0.93
Mutability GU                                  2.44           1.96           1.85           1.72      3.92
Mutability UG                                  3.32           5.01           3.00           2.84      4.36
Mutability MM                                  2.32           0.99           0.86           5.24      7.84
Double transitions / Double transversions      4.7            1.7            2.3            3.1       2.1
Double transitions / Transitions to GU or UG   1.6            2.0            8.9            3.6       2.8

Analysis of RNA Substitution Rates

Thermodynamic properties influence Evolutionary properties
