Page 1: David Penny - Loss of information at deeper divergences, and what we can do about it

Loss of information at deeper times (and the origin of proteins?)

David Penny Brisbane July 2014

The mathematicos caused the problem!!! Now they should solve it!

Okay, maybe we could help them. Here are some ideas.

And the origins of protein synthesis

Page 2

the comfort zone

[Figure: ML and MP methods (labelled ML, ML Int, ML Rel, MLav, MLan, MLep, MP) arranged across three columns: popn (population), classic phylogeny, deep phylogeny]

Page 3

can we go further back in time?

Markov models - Loss of information

Mossel and Steel 2004-5
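A simple illustration of why information is lost under Markov models (not the Mossel and Steel analysis itself): under the Jukes-Cantor model the expected proportion of identical sites decays toward ¼, and both the corrected distance and its sampling error blow up as the observed difference p approaches ¾. A sketch, assuming Jukes-Cantor and a hypothetical 1000 sites:

```python
import math

def jc_identity(t):
    """Expected proportion of identical sites after t expected substitutions
    per site under the Jukes-Cantor model."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)

def jc_distance(p):
    """Jukes-Cantor distance from the observed proportion p of differing sites;
    infinite once p reaches the saturation value 3/4."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_distance_se(p, n_sites):
    """Approximate standard error of the JC distance (delta method):
    sqrt(p(1 - p) / n) / (1 - 4p/3)."""
    if p >= 0.75:
        return math.inf
    return math.sqrt(p * (1.0 - p) / n_sites) / (1.0 - 4.0 * p / 3.0)

for t in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = 1.0 - jc_identity(t)
    print(f"t={t:4.1f}  p={p:.4f}  d={jc_distance(p):.3f}  "
          f"SE(1000 sites)={jc_distance_se(p, 1000):.3f}")
```

The standard error grows much faster than t itself, which is one concrete face of "loss of information at deeper times".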

Page 4

damned eukaryotes!

Page 5

fungamals

[Figure: One Eukaryotic Tree, with "Fred" or LECA (the last eukaryotic common ancestor) at its base. Groups shown: Animals, Fungi, Microsporidia; Plantae (Plants, Green Algae, Red Algae); Amoebozoa; Excavates (Diplomonads, Parabasalids, Euglenozoa, Heterolobosea); Rhizaria (Radiolaria, Cercozoa); Chromalveolates (Alveolates, Stramenopiles)]

What is common to all groups of modern (extant) eukaryotes? We have pretty good data. We can get solid evidence.

crown group

Page 6

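Calculated results: $\Delta \le \tfrac{1}{4} + n e^{-qt}$

[Figure: calculated curves (vertical axis −0.2 to 1) against a log-scale horizontal axis from 1 to 10,000, for the values 0.01, 0.005, 0.002 and 0.001]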
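The right-hand side of the bound decays exponentially to ¼, which matches the expected identity of two unrelated nucleotide sequences with equal base frequencies. The sketch below evaluates it, assuming the legend values are the rate q and using a hypothetical n = 0.75 (n is not given on the slide). It also solves for the time after which the excess over ¼ falls below a tolerance eps: t* = ln(n/eps)/q.

```python
import math

def delta_bound(t, q, n):
    """Right-hand side of Delta <= 1/4 + n * exp(-q * t)."""
    return 0.25 + n * math.exp(-q * t)

def time_to_floor(q, n, eps):
    """Time after which n * exp(-q * t) < eps, i.e. t* = ln(n / eps) / q."""
    return math.log(n / eps) / q

N_HYPOTHETICAL = 0.75  # assumed value of n; not given on the slide
for q in (0.01, 0.005, 0.002, 0.001):  # assumed to be the legend values of q
    values = [delta_bound(t, q, N_HYPOTHETICAL) for t in (1, 10, 100, 1000, 10000)]
    print(f"q = {q:<6} t* (eps = 0.01) = {time_to_floor(q, N_HYPOTHETICAL, 0.01):8.0f}  ",
          " ".join(f"{v:.3f}" for v in values))
```

Halving q doubles t*, so the usable time depth scales inversely with the rate.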

Page 7

[Figure: percentage of trees correct (0% to 120%) against a log-scale axis from 0.1 to 10, with curves for d = 0.001, 0.100, 0.500, 1.000, 2.000, 5.000 and infinite]

idea 1. simulations (covarion model)
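The slides don't give the simulation settings. As a rough sketch, here is a two-state covarion model along a single branch: each site is either "on" (substituting at total rate mu, Jukes-Cantor style) or "off" (invariable), and switches between the two at rates s_on and s_off. All rates are hypothetical parameters, and the simulation uses an exact event-by-event (Gillespie) scheme per site.

```python
import math
import random

def simulate_site(t, mu, s_on, s_off, rng):
    """Simulate one site along a branch of length t; return True if the final
    base equals the starting base. 'on' sites change at total rate mu to one of
    the other three bases; 'off' sites are invariable. Switching: off -> on at
    rate s_on, on -> off at rate s_off."""
    base = 0  # bases coded 0-3; by symmetry the starting base does not matter
    if s_on + s_off > 0:
        on = rng.random() < s_on / (s_on + s_off)  # start from the stationary mix
    else:
        on = True  # no switching: every site is always variable (plain JC)
    clock = 0.0
    while True:
        switch_rate = s_off if on else s_on
        sub_rate = mu if on else 0.0
        total = switch_rate + sub_rate
        if total == 0.0:
            break  # site frozen for the rest of the branch
        clock += rng.expovariate(total)  # waiting time to the next event
        if clock > t:
            break
        if rng.random() < sub_rate / total:
            base = (base + rng.randint(1, 3)) % 4  # substitution
        else:
            on = not on  # covarion switch
    return base == 0

def proportion_identical(t, mu, s_on, s_off, n_sites=20000, seed=1):
    """Fraction of simulated sites whose base is unchanged after time t."""
    rng = random.Random(seed)
    same = sum(simulate_site(t, mu, s_on, s_off, rng) for _ in range(n_sites))
    return same / n_sites

def jc_identity(t, mu):
    """Jukes-Cantor expectation, for comparison with the no-switching case."""
    return 0.25 + 0.75 * math.exp(-4.0 * mu * t / 3.0)
```

With s_on = s_off = 0 the simulation reproduces the i.i.d. Jukes-Cantor expectation; with slow switching, sites that spend the branch "off" keep their state, so identity stays well above ¼ for much longer.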

Page 8

[Figure: number of internal edges correct, out of 6 (neighbor joining, 9 taxa, 1000 columns, i.i.d.), against time in millions of years (log scale, 5 to 2000)]

simulation results with standard model
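An unrooted binary tree with 9 taxa has 9 − 3 = 6 internal edges, hence "out of 6". One way to score a reconstructed tree is to count how many of the true tree's bipartitions (splits) it recovers. A sketch, assuming trees are written as nested tuples of taxon names; the neighbor-joining step itself is not shown, and the taxon names in the usage are hypothetical.

```python
def splits(tree):
    """Non-trivial bipartitions (internal edges) of an unrooted binary tree
    written as nested tuples. Each split is keyed by the side that excludes a
    fixed reference taxon, so an edge gets the same key in every tree; the
    duplicate edge at the (arbitrary) root collapses in the set."""
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    all_taxa = leaves(tree)
    ref = min(all_taxa)
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(collect(child) for child in node))
        side = all_taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(all_taxa) - 2:  # skip trivial splits
            found.add(side)
        return clade

    collect(tree)
    return found

def internal_edges_correct(true_tree, estimated_tree):
    """Number of internal edges of the true tree recovered by the estimate."""
    return len(splits(true_tree) & splits(estimated_tree))
```

For binary trees this is the complement of the Robinson-Foulds distance: n − 3 minus half the symmetric difference of the split sets.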

Page 9

idea 2. delete fast sites

If (a) there were a mixture of faster- and slower-evolving sites, (b) we could identify the faster ones, and (c) we removed them, would that help us go further back in time?
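The slides don't say how fast sites are identified. A minimal sketch, assuming a crude, tree-free rate proxy (the fraction of sequence pairs that differ at each column) and dropping a chosen fraction of the most variable columns; a real analysis would more likely use rates estimated on a tree.

```python
from itertools import combinations

def column_variability(column):
    """Fraction of sequence pairs that differ at this column (needs at least
    two sequences). A crude proxy for site rate, assumed for illustration."""
    pairs = list(combinations(column, 2))
    return sum(a != b for a, b in pairs) / len(pairs)

def delete_fast_sites(alignment, fraction):
    """Remove the given fraction of most variable columns, keeping the
    remaining columns in their original order."""
    n_cols = len(alignment[0])
    scores = [column_variability([seq[i] for seq in alignment]) for i in range(n_cols)]
    n_remove = int(round(fraction * n_cols))
    # most variable first; ties broken by column index so the result is deterministic
    ranked = sorted(range(n_cols), key=lambda i: (-scores[i], i))
    removed = set(ranked[:n_remove])
    keep = [i for i in range(n_cols) if i not in removed]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

Repeating the analysis while increasing `fraction` step by step is one way to look for the point where the remaining sites start to fit the model.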

Page 10

deleting faster sites

Presentation notes: Pearson correlation results. The blue line shows the Pearson correlation coefficient (r) between the ML distances calculated from the “A” (more conserved) and “B” (less conserved) partitions. The red line shows r between the uncorrected p-distances and the ML distances for the B partitions. The r values begin to increase significantly when 31,136 sites remain, and this is taken to indicate that the assumed model of nucleotide evolution is beginning to fit the data well.
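The notes compare distances from two partitions by Pearson correlation. A sketch of that comparison, using uncorrected p-distances for both partitions in place of the ML distances in the notes (a simplification), with each partition given as a hypothetical list of column indices:

```python
import math
from itertools import combinations

def p_distances(alignment):
    """Uncorrected p-distances for every pair of sequences, in a fixed pair order."""
    out = []
    for s, t in combinations(alignment, 2):
        out.append(sum(a != b for a, b in zip(s, t)) / len(s))
    return out

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def split_columns(alignment, keep):
    """Sub-alignment made of the column indices in `keep`."""
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

For example, `pearson_r(p_distances(split_columns(aln, cols_a)), p_distances(split_columns(aln, cols_b)))`, where `aln`, `cols_a` and `cols_b` are hypothetical inputs.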
Page 11

Ancestral Sequence Reconstruction (idea 3)

[Figure: tree including Giardia, animals and plants]
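The slides don't say which reconstruction method was used; maximum likelihood is common in practice. As a swapped-in illustration, here is Fitch parsimony, which gives the set of most-parsimonious states at the root of a binary tree, site by site. Taxon names and sequences in the usage are hypothetical.

```python
def fitch(tree, states):
    """Fitch bottom-up pass for one site on a binary tree of nested tuples.
    states maps taxon -> character. Returns (root state set, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    child_sets, cost = [], 0
    for child in tree:
        s, c = fitch(child, states)
        child_sets.append(s)
        cost += c
    common = set.intersection(*child_sets)
    if common:
        return common, cost
    return set.union(*child_sets), cost + 1  # disagreement costs one change

def reconstruct_root(tree, sequences):
    """Apply Fitch column by column; where several root states are equally
    parsimonious, pick the alphabetically first (an arbitrary tie-break)."""
    length = len(next(iter(sequences.values())))
    root, total = [], 0
    for i in range(length):
        site = {name: seq[i] for name, seq in sequences.items()}
        candidates, cost = fitch(tree, site)
        root.append(min(candidates))
        total += cost
    return "".join(root), total
```

The tie-break matters: ambiguous root states are exactly where a reconstruction carries little information, which is worth tracking rather than hiding.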

Page 12

3,4 testing

Ancestral Sequence Reconstruction

vaults 3-D info

Page 13

subgroups X and Y

[Figure: taxa a, b, c, d, e (subgroup X) and taxa k, l, m, n, o (subgroup Y), with nodes labelled ax and ay]

Page 14

chloroplast vs nuclear

| Data type | Group X | Group Y | Divergence time | X² | d.f. | p(X²) | X² (control) | p(X²) (control) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chloroplast | Eudicot | Monocot | 125 mya | 289.058 | 102 | 1.94E-19 | 93.690 | 7.09E-01 |
| Chloroplast | Angiosperm | Gymnosperm | 305 mya | 363.527 | 104 | 1.23E-29 | 85.647 | 9.05E-01 |
| Chloroplast | Seed plant | Fern | 390 mya | 457.118 | 102 | 1.69E-44 | 100.451 | 5.25E-01 |
| Chloroplast | Streptophyta | Chlorophyta | 700 mya | 300.162 | 94 | 2.23E-23 | 90.982 | 5.69E-01 |
| Chloroplast | Red Algae | Green Algae | ~1000 mya | 341.014 | 82 | 2.60E-34 | 70.928 | 8.03E-01 |
| Chloroplast | Algae | Cyanobacteria | ~1500 mya | 231.079 | 56 | 3.90E-23 | 62.718 | 2.50E-01 |
| Nuclear | Algae | Cyanobacteria | ~1500 mya | 70.479 | 34 | 2.30E-04 | 39.342 | 2.43E-01 |
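The table reports, for each pair of groups, a chi-square statistic X² with its degrees of freedom and p-value, plus a control. The slides don't say how the counts behind each test were built, so this is only a generic sketch: a Pearson chi-square test of homogeneity on an r × c table of counts, with the upper-tail p-value computed from the regularized incomplete gamma function using only the standard library.

```python
import math

def chi_square_homogeneity(table):
    """Pearson X^2 and degrees of freedom for an r x c table of counts
    (rows = groups, columns = categories; assumes no empty row or column)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            x2 += (observed - expected) ** 2 / expected
    return x2, (len(table) - 1) * (len(col_totals) - 1)

def chi_square_sf(x2, df):
    """P(X^2 >= x2) = Q(df/2, x2/2), the regularized upper incomplete gamma."""
    a, x = df / 2.0, x2 / 2.0
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # series for the lower function P(a, x); then Q = 1 - P
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= x / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # continued fraction for Q(a, x), modified Lentz method
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h
```

For example, `chi_square_sf(*chi_square_homogeneity([[10, 20], [20, 10]]))` gives the p-value for a small hypothetical 2 × 2 table. Where SciPy is available, `scipy.stats.chi2.sf(x2, df)` returns the same tail probability.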

Page 15

4. gene length vs similarity

Page 16

5. should we emphasize conserved residues?

(or surface ones)

Page 17

Character matrix (rows: taxa; columns: characters i, j, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2):

      i j 3 4 5 6 7 8 9 0 1 2
f1    a . . . . a . . . . . .
f2    . . . . . a . . . . a .
f3    . . . . . a . . . . . a
f4    . . . . . a . . . . . .
g1    . a a a a . a a . . . .
g2    a a a a a . . a . . a .
h1    . . a a a . . . a a . .
h2    . . a . a . . . a a . .
h3    a . . a a . . . a a . .

Characters i and j, state combinations: (. .), (. a), (a .), (a a)

upper bound = 17
lower bound = 12
? 13

Would weighting by incompatibilities help?

6. Weighting
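One way to read "weighting by incompatibilities": two binary characters are compatible (both can fit one tree with a single change each) unless all four state combinations occur, the four-gamete test. The sketch below transcribes the slide's matrix, counts for each character how many others it conflicts with, and down-weights accordingly. The 1/(1 + count) weight is a hypothetical choice, not necessarily the one the talk had in mind. The sketch also computes the trivial lower bound on tree length (each variable binary character needs at least one change); for this matrix it comes out at 12, matching the slide's lower bound.

```python
from itertools import combinations

# Matrix transcribed from the slide: taxa f1-h3, binary characters
# i, j, 3-9, 0, 1, 2 ('a' and '.' are the two states).
CHARACTERS = ["i", "j", "3", "4", "5", "6", "7", "8", "9", "0", "1", "2"]
MATRIX = {
    "f1": "a....a......",
    "f2": ".....a....a.",
    "f3": ".....a.....a",
    "f4": ".....a......",
    "g1": ".aaaa.aa....",
    "g2": "aaaaa..a..a.",
    "h1": "..aaa...aa..",
    "h2": "..a.a...aa..",
    "h3": "a..aa...aa..",
}

def column(k):
    """States of character k across all taxa."""
    return [row[k] for row in MATRIX.values()]

def incompatible(x, y):
    """Four-gamete test: binary characters x and y conflict on every tree
    exactly when all four state combinations occur."""
    return len(set(zip(column(x), column(y)))) == 4

n = len(CHARACTERS)
conflicts = [0] * n
for x, y in combinations(range(n), 2):
    if incompatible(x, y):
        conflicts[x] += 1
        conflicts[y] += 1

# Hypothetical weighting: the more conflicts, the less weight.
weights = {CHARACTERS[k]: 1.0 / (1 + conflicts[k]) for k in range(n)}

# Every variable binary character needs at least one change on any tree.
lower_bound = sum(len(set(column(k))) - 1 for k in range(n))
print(weights, lower_bound)
```

Characters i and j show all four combinations, so they are incompatible; characters 9 and 0 are identical and never conflict with each other.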

Page 18

7. information from sequence order not used

Alignment (original sequence order)    Reordered alignment (shuffled/reordered)
AIIFLNSALGPSPELFPIILATKVL              ASAGPSPPATPLLIIIILLFFNEKV
AIMFLNSALGPPTELFPVILATKVL              ASAGPPTPATPLLIMVILLFFNEKV
SIMFLNHTLNPTPELFPIILATETL              SHTNPTPPATPLLIMIILLFFNEET
TILFLNSSLGLQPEVTPTVLATKTL              TSSGLQPPATPLLILTVLVTFNEKT
TLLFLNSMLKPPSELFPIILATKTL              TSMKPPSPATPLLLLIILLFFNEKT
ALLFLNSTLNPPTELFPLILATKTL              ASTNPPTPATPLLLLLILLFFNEKT
AILFLNSFLNPPKEFFPIILATKIL              ASFNPPKPATPLLILIILFFFNEKI

c! ways to reorder the alignment: shuffle by columns and by taxa
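Standard models treat columns as independent and identically distributed, so every one of the c! column orders gives the same result. A quick check with the slide's original sequences: apply one random column permutation to every row (the seed is arbitrary) and confirm that the pairwise p-distances do not change.

```python
import random
from itertools import combinations

# Original sequences from the slide.
ALIGNMENT = [
    "AIIFLNSALGPSPELFPIILATKVL",
    "AIMFLNSALGPPTELFPVILATKVL",
    "SIMFLNHTLNPTPELFPIILATETL",
    "TILFLNSSLGLQPEVTPTVLATKTL",
    "TLLFLNSMLKPPSELFPIILATKTL",
    "ALLFLNSTLNPPTELFPLILATKTL",
    "AILFLNSFLNPPKEFFPIILATKIL",
]

def p_distance(s, t):
    """Proportion of aligned positions at which s and t differ."""
    return sum(a != b for a, b in zip(s, t)) / len(s)

def shuffle_columns(alignment, seed=0):
    """Apply one random permutation of the columns to every sequence."""
    rng = random.Random(seed)
    order = list(range(len(alignment[0])))
    rng.shuffle(order)
    return ["".join(seq[i] for i in order) for seq in alignment]

def pairwise(alignment):
    return [p_distance(s, t) for s, t in combinations(alignment, 2)]

shuffled = shuffle_columns(ALIGNMENT)
print(pairwise(ALIGNMENT) == pairwise(shuffled))  # prints True: order-blind
```

Any method that gives the same answer on both alignments is, by construction, ignoring the information carried by sequence order.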

8. could we use ‘words’ of 2, 3, 4, 5, … letters?
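Counting "words" is one alignment-free way to keep some of the local order information that site-by-site methods ignore. A minimal sketch, assuming overlapping k-letter words and a cosine-based distance between the word-count profiles (one simple choice among many; the talk doesn't specify one):

```python
import math
from collections import Counter

def words(seq, k):
    """Counts of all overlapping words (k-mers) of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def word_distance(s, t, k):
    """1 - cosine similarity of the k-mer count vectors of s and t."""
    a, b = words(s, k), words(t, k)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm
```

Trying k = 2, 3, 4, 5 in turn is a direct way to ask how much longer words add.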

9. Alphabet reduction?
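Alphabet reduction recodes the 20 amino acids into a few groups of similar residues, so that changes within a group no longer count as differences. The slides don't name a scheme; as an example, here is Dayhoff six-group recoding (AGPST, DENQ, HKR, FWY, ILMV, C), applied to two of the sequences from the alignment slide.

```python
# Dayhoff six-group recoding, used here as one example of a reduced alphabet.
DAYHOFF_GROUPS = ["AGPST", "DENQ", "HKR", "FWY", "ILMV", "C"]
RECODE = {aa: str(i) for i, group in enumerate(DAYHOFF_GROUPS) for aa in group}

def reduce_alphabet(seq):
    """Map each amino acid to its group label; gaps and unknowns pass through."""
    return "".join(RECODE.get(aa, aa) for aa in seq)

s1 = "AIIFLNSALGPSPELFPIILATKVL"
s2 = "AIMFLNSALGPPTELFPVILATKVL"
print(reduce_alphabet(s1) == reduce_alphabet(s2))  # prints True
```

These two sequences differ at four positions (I/M, S/P, P/T, I/V), all within Dayhoff groups, so after recoding they are identical: fewer states means fewer saturated changes, at the cost of discarding some signal.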

Page 19

damned eukaryotes!

Page 20

limits of evolutionary mechanisms - no miracles

continuity; all intermediates ‘functional’; can’t evolve “for” what doesn’t exist

Protein synthesis? mRNA, tRNAs, rRNAs, triplet code - why 3?

the origin of protein synthesis?

Page 21

Eigen limit 1

master sequence

Page 22

Eigen limit 2

mutation

selection 0

Page 23

Eigen limit 3

mutation

selection 0

Page 24

Eigen limit 4

mutation

selection 0

~1 error per replication: error catastrophe, mutational meltdown
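The "~1 error per replication" point can be made with simple arithmetic: if errors occur independently at rate μ per nucleotide, a copy of length L carries on average Lμ errors, and the chance it is error-free is (1 − μ)^L ≈ e^(−Lμ). At one error per replication (Lμ = 1) only about 37% of copies are exact. A sketch, assuming independent errors at a fixed rate:

```python
import math

def expected_errors(length, error_rate):
    """Mean number of errors per copy."""
    return length * error_rate

def p_error_free(length, error_rate):
    """Probability that a copy is error-free, assuming independent errors."""
    return (1.0 - error_rate) ** length

# One error per replication on average (L * mu = 1): about e^-1 of copies are exact.
print(expected_errors(1000, 0.001), p_error_free(1000, 0.001), math.exp(-1))
```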

Page 25

error rates

the error rate limits the length that can be copied (Manfred Eigen, 1971); 2 errors in 37 copies of hammerhead ribozyme, 20-fold improvement
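Eigen's error threshold ties the maximum copyable length to the error rate: a master sequence of length L persists only if q^L σ > 1, where q = 1 − μ is the per-nucleotide copying fidelity and σ > 1 is the master's selective superiority. That gives L_max ≈ ln σ / μ for small μ. The sketch uses the exact form ln σ / −ln(1 − μ); the σ values in the usage are hypothetical.

```python
import math

def eigen_max_length(error_rate, superiority):
    """Longest sequence whose master copy can be maintained:
    L_max = ln(sigma) / -ln(1 - mu), approximately ln(sigma) / mu."""
    if superiority <= 1:
        raise ValueError("superiority must exceed 1")
    return math.log(superiority) / -math.log1p(-error_rate)

print(eigen_max_length(0.01, 10), eigen_max_length(0.0005, 10))
```

Because L_max scales roughly as 1/μ, a 20-fold lower error rate allows roughly a 20-fold longer sequence, whatever the (hypothetical) value of σ.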

Page 26

ribavirin and polio viruses

Crotty et al., PNAS 98, 6895 (2001)

Page 27

synthesis of RNA from cyclic GTPs

Page 28

hydrolysis ↔ polymerisation

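$$\mathrm{{}^{+}H_3N{-}C_{\alpha}H(R_1){-}CO{-}NH{-}C_{\alpha}H(R_2){-}COO^{-}} + \mathrm{H_2O} \rightleftharpoons \mathrm{{}^{+}H_3N{-}C_{\alpha}H(R_1){-}COO^{-}} + \mathrm{{}^{+}H_3N{-}C_{\alpha}H(R_2){-}COO^{-}}$$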

two monomers ↔ one dimer (+H2O).

heating amino acids dry, drying cycles (a fluctuating clay environment), or freezing in ice?
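The condensation is an equilibrium, 2 monomers ⇌ dimer + H₂O, so removing water (heating dry, drying cycles) pushes it toward the dimer. A minimal sketch, assuming a single homodimerisation with a hypothetical equilibrium constant K = [D]·a_w/[M]², treating water activity a_w as the knob that drying turns down, and ignoring units:

```python
import math

def dimer_fraction(c0, K, water_activity):
    """Fraction of monomer units (total concentration c0) found in dimers for
    2 M <=> D + H2O, with K = [D] * a_w / [M]^2 and [M] + 2[D] = c0."""
    k = K / water_activity  # effective association constant at this water activity
    # Positive root of 2k[M]^2 + [M] - c0 = 0, written in a form that stays
    # numerically stable when k * c0 is small.
    m = 2.0 * c0 / (1.0 + math.sqrt(1.0 + 8.0 * k * c0))
    return (c0 - m) / c0

for a_w in (1.0, 0.5, 0.1, 0.01):
    print(a_w, dimer_fraction(0.1, 0.01, a_w))
```

The trend, not the numbers, is the point: lowering a_w raises the effective constant K/a_w and so the dimer fraction.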

Page 29

RNA evolution - in vitro

Page 30

enzyme efficiency RNA ⇒ RNP ⇒ protein

| Class | Catalyst | kcat (min⁻¹) | kcat/Km (M⁻¹ min⁻¹) |
| --- | --- | --- | --- |
| RNA | Tetrahymena L-21 ScaI | 0.1 | 9.0 × 10⁷ |
| RNA | polynucleotide kinase | 0.3 | 6.0 × 10³ |
| RNA | RNase P RNA | 1 | 2.0 × 10⁶ |
| RNP | RNase P RNA + protein | 2 | 4.0 × 10⁶ |
| protein | RNase T1 | 5,700 | 1.1 × 10⁸ |
| protein | T4 polynucleotide kinase | 25,000 | 6.0 × 10⁸ |
| protein | triose-P isomerase | 258,000 | 1.4 × 10¹⁰ |
| protein | carbonic anhydrase | 600,000,000 | 7.2 × 10⁹ |

RNA copied by protein: error rate is high
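Reading the table in orders of magnitude makes the RNA ⇒ RNP ⇒ protein gradient clearer. A small arithmetic sketch using the kcat values (min⁻¹) listed on the slide, assuming they pair with the catalysts in the order given:

```python
import math

# kcat (min^-1) as listed on the slide, paired with catalysts in slide order.
KCAT = {
    "Tetrahymena L-21 ribozyme (RNA)": 0.1,
    "polynucleotide kinase (RNA)": 0.3,
    "RNase P RNA (RNA)": 1.0,
    "RNase P RNA + protein (RNP)": 2.0,
    "RNase T1 (protein)": 5_700,
    "T4 polynucleotide kinase (protein)": 25_000,
    "triose-P isomerase (protein)": 258_000,
    "carbonic anhydrase (protein)": 600_000_000,
}

def fold_gap(slower, faster):
    """Orders of magnitude between two catalysts (log10 of the kcat ratio)."""
    return math.log10(KCAT[faster] / KCAT[slower])

for name, kcat in KCAT.items():
    print(f"{name:36s} log10 kcat = {math.log10(kcat):6.2f}")
```

On these numbers the slowest RNA catalyst and carbonic anhydrase are nearly ten orders of magnitude apart, while adding the protein to RNase P RNA roughly doubles kcat.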

Page 31

origin of protein synthesis??

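[Figure: panels A, B and C, with triplets (NNN), a single nucleotide (N) and an incoming nucleoside triphosphate (N-TP)]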

How long would a G=C pairing last?
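One way to estimate how long a G=C pairing would last is transition-state theory: a bound state that breaks with first-order rate k has mean lifetime 1/k, and the Eyring equation gives k = (k_B T / h)·exp(−ΔG‡ / RT). A sketch with a transmission coefficient of 1; the activation free energies you plug in are assumptions, not values from the talk.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # gas constant, J/(mol K)

def eyring_rate(dg_act_kj_per_mol, temperature_k):
    """Eyring rate constant k = (k_B T / h) exp(-dG_act / (R T)), in s^-1."""
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-dg_act_kj_per_mol * 1000.0 / (R * temperature_k))

def lifetime(dg_act_kj_per_mol, temperature_k):
    """Mean lifetime (s) of a bound state that breaks with first-order rate k."""
    return 1.0 / eyring_rate(dg_act_kj_per_mol, temperature_k)

for dg in (20.0, 40.0, 60.0):  # hypothetical activation free energies, kJ/mol
    print(dg, lifetime(dg, 300.0), lifetime(dg, 273.0))
```

Each extra RT·ln 10 (about 5.7 kJ/mol at 300 K) of barrier lengthens the lifetime tenfold, and lower temperatures lengthen it further.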

Page 32

origin of protein synthesis?

[Figure, panel D: a new RNA strand on a template strand, with ribozyme-catalysed decoding, cleavage and ligation functions. Labels: three activated tRNAs (ACC ends carrying aa+), one inactivated tRNA, an NNN triplet, and steps 1, 2 and 3]

Page 33

predictions

Theoretical/computational: time of (say) GC binding; is it increased by heavy RNAs? di- and trinucleotides

Experimental: length of RNA copied (dinucleotides); amino acid codes???

