Comparative analysis of ribosomal proteins in complete genomes: ribosome “striptease” in Archaea

Odile Lecompte, Raymond Ripp, Jean-Claude Thierry, Dino Moras and Olivier Poch

Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS, INSERM, ULP), BP163, 67404 Illkirch Cedex, France

[Figure labels (small-subunit proteins): S2p, S3p, S4p, S5p, S6p, S7p, S8p, S9p, S10p, S11p, S12p, S13p, S14p, S15p, S16p, S17p, S18p, S19p, S20p, Thx]

Abstract

A comprehensive investigation of ribosomal genes in complete genomes from 66 different species allows us to address the distribution of r-proteins between and within the three primary domains. 34 r-protein families are represented in all domains, but 33 families are specific to Archaea and Eucarya, providing evidence for specialisation at an early stage of evolution between the bacterial lineage and the lineage leading to archaea and eukaryotes. With only one specific r-protein, the archaeal ribosome appears to be a small-scale model of the eukaryotic one in terms of protein composition. However, the mechanism of evolution of the protein component of the ribosome appears dramatically different in Archaea. In Bacteria and Eucarya, a restricted number of ribosomal genes can be lost, with a bias toward losses in intracellular pathogens. In Archaea, losses affect 15% of the ribosomal genes, revealing an unexpected plasticity of the translation apparatus, and the pattern of gene losses indicates a progressive elimination of ribosomal genes in the course of archaeal evolution. This first documented case of reductive evolution at the domain scale provides a new framework for discussing the shape of the universal tree of life and the selective forces directing the evolution of prokaryotes.

[Figure: Venn diagram of r-protein families across the three domains; values are printed as total (first;second)]

• Bacteria: 57 (23;34)
• Archaea: 68 (28;40)
• Eucarya: 78 (32;46)
• BAE (all three domains): 34 (15;19)
• AE (Archaea and Eucarya only): 33 (13;20)
• B (Bacteria only): 23 (8;15)
• A (Archaea only): 1 (0;1)
• E (Eucarya only): 11 (4;7)
• BA: 0
• BE: 0
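The counts in the interdomain distribution can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the original analysis: it copies the printed values as (total, first, second) triples, assumes that the BA and BE classes, printed only as 0, have zero sub-counts, and sums every class that contains a given domain letter. The names `classes` and `domain_total` are illustrative.

```python
# Family counts per distribution class, copied from the figure as
# (total, first, second) triples. The BA and BE sub-counts are an
# assumption: the figure prints only "0" for these classes.
classes = {
    "BAE": (34, 15, 19),
    "AE": (33, 13, 20),
    "B": (23, 8, 15),
    "A": (1, 0, 1),
    "E": (11, 4, 7),
    "BA": (0, 0, 0),
    "BE": (0, 0, 0),
}


def domain_total(letter):
    """Sum the triples of every class whose label contains the domain letter."""
    rows = [triple for label, triple in classes.items() if letter in label]
    return tuple(sum(column) for column in zip(*rows))


totals = {domain: domain_total(domain) for domain in "BAE"}
all_families = sum(triple[0] for triple in classes.values())

print(totals)
print(all_families)
```

Under these assumptions the per-domain sums reproduce the printed totals for Bacteria, Archaea and Eucarya, and the class totals add up to the 102 families used as the starting set.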

An initial set of ribosomal proteins classified into 102 families was obtained from http://www.expasy.ch/cgi-bin/lists?ribosomp.txt. For each family, representatives of various lineages across Bacteria, Archaea and Eucarya were used as probes and systematically compared to a non-redundant protein database consisting of SwissProt, SpTrEMBL and SpTrEMBLNEW using the BlastP program (1) with a cut-off of E<0.001. The results of the BlastP comparison were cross-validated by a TBlastN search against a complete genome database including 66 different species. The putative new gene sequences detected by the TBlastN searches were examined in the light of their genomic context to eliminate false-positive “hits”. For each r-protein family, the likely r-protein sequences obtained by the BlastP and TBlastN searches were included in a multiple alignment constructed by MAFFT (2). All alignments were refined by RASCAL (3) and their quality assessed by NorMD (4). These alignments were manually examined to remove false positives observed in some ribosomal protein families, in particular those containing ubiquitous RNA-binding domains.
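The E<0.001 cut-off can be illustrated with a minimal filter. This is a sketch under stated assumptions, not the authors' pipeline: the `Hit` record and the `significant` helper are hypothetical names, the second hit is invented for illustration, and only the first hit (RL40_METJA against RL40_HUMAN, Expect = 1.8) comes from the poster.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001  # cut-off stated in the protocol (E < 0.001)


@dataclass
class Hit:
    """Hypothetical minimal record for one similarity-search hit."""
    query: str
    subject: str
    evalue: float


def significant(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose E-value is strictly below the cut-off."""
    return [hit for hit in hits if hit.evalue < cutoff]


hits = [
    Hit("RL40_METJA", "RL40_HUMAN", 1.8),  # hit reported on the poster
    Hit("QUERY_X", "SUBJECT_Y", 1e-20),    # invented strong hit, illustration only
]

kept = significant(hits)
print([(hit.query, hit.subject) for hit in kept])
```

The reported L40 hit falls far above the cut-off, which is consistent with the poster's point that short r-proteins with biased composition can be missed by a single similarity search and need cross-validation.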

BlastP Hit between RL40_METJA (Query) and RL40_HUMAN

>SW:RL40_HUMAN P14793 60S RIBOSOMAL PROTEIN L40 (CEP52). 10/2001 Length = 52

Score = 31.6 bits (70), Expect = 1.8 Identities = 18/34 (52%), Positives = 20/34 (57%), Gaps = 3/34 (8%)

Query: 13 KKICMRCNARNPWRATKCR--KCGY-KGLRPKAK 43
          K IC +C AR   RA  CR  KCG+   LRPK K
Sbjct: 17 KMICRKCYARLHPRAVNCRKKKCGHTNNLRPKKK 50
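The identity and gap figures of this hit can be recomputed directly from the two aligned strings. The sketch below is illustrative only: `alignment_stats` is a hypothetical helper, and the "Positives" count is left out because it depends on a substitution matrix that the poster does not specify.

```python
# Aligned query and subject strings copied from the hit above.
query = "KKICMRCNARNPWRATKCR--KCGY-KGLRPKAK"
sbjct = "KMICRKCYARLHPRAVNCRKKKCGHTNNLRPKKK"


def alignment_stats(a, b):
    """Return (alignment length, identical columns, columns containing a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(x == y and x != "-" for x, y in zip(a, b))
    gaps = sum(x == "-" or y == "-" for x, y in zip(a, b))
    return len(a), identities, gaps


length, identities, gaps = alignment_stats(query, sbjct)

# Integer division gives the rounded-down percentages printed with the hit.
print(f"Identities = {identities}/{length} ({identities * 100 // length}%)")
print(f"Gaps = {gaps}/{length} ({gaps * 100 // length}%)")
```

With only 34 aligned columns, a 52% identity still yields an Expect value of 1.8, which shows how little statistical signal such short proteins carry.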

Small size and biased composition of r-proteins

Difficulty of protein detection by similarity search

Genes often missed during annotation process

A complex Last Universal Common Ancestor ?

Interdomain distribution

[Figure: schematic tree of the three domains with r-protein losses marked on the taxa]

Bacteria: Gram positives, Proteobacteria, Cyanobacteria, Chlamydia, Thermotoga, Aquifex, Deinococcus, Spirochaetes

Archaea: Halobacterium, Methanobacterium, Methanococcus, Pyrococcus, Aeropyrum, Archaeoglobus, Thermoplasma, Methanopyrus, Pyrobaculum, Sulfolobus

Eucarya: Diplomonads*, Microsporidia, Trichomonads*, Flagellates*, Ciliates*, Plants, Fungi, Animals

Protein labels: L38e, L13e, S25e, S26e, S30e; L14e, L34e, L30e, LXa, L35ae; S1p; S21p; L25p; L30p; S22p, S21e; L28e

Ribosomal protein losses in each of the three domains

Full circles indicate proteins absent in all complete genomes investigated in the indicated taxon. Empty circles stand for proteins absent in some complete genomes of the indicated taxon.

• Prevalence of r-proteins within the universal pool that may have been present in the last universal common ancestor (LUCA)

• Specialization of bacterial versus archaeal/eukaryotic ribosomes

• The majority of archaeal and eukaryotic r-proteins appear before the split between Archaea and Eucarya, suggesting a complex cenancestor

Reductive evolution as a general trend in Archaea ? In Prokaryotes ?

A complex Last Universal Common Ancestor (LUCA) ?

The 30S ribosomal subunit of Thermus thermophilus (5) (back side)

[Figure labels (large-subunit proteins): L2p, L3p, L4p/L4e, L5p, L6p, L9p, L11p, L13p, L14p, L15p, L16p, L17p, L18p, L19p, L20p, L21p, L22p, L23p, L24p, L25p, L27p, L28p, L29p, L30p, L31p, L32p, L33p, L34p, L35p, L36p]

The 50S ribosomal subunit of Deinococcus radiodurans (6) (crown view rotated by 180°)

Localisation in the 3D structures

« Strip-tease » of the archaeal ribosome

Bacteria-specific proteins (colored in different shades of red) are preferentially located at the periphery of the ribosome

Ribosomal gene detection : cross-validation needed !

[Figure: detection workflow. Several representatives for each protein family → BlastP (proteins) and TBlastN (66 complete genomes) → homology detection and analysis]

Protocol of ribosomal gene detection

102 r-protein families

Creation of 24 missed genes

[Figure: detection matrix of r-protein families against complete genomes: 45 Bacteria, 14 Archaea, 7 Eucarya]

Protein detected by :

• 100% of the family representatives in both blastp and tblastn
• >50% of the family representatives in blastp
• <50% of the family representatives in blastp
• 0% of the family representatives in both blastp and tblastn
• 0% of the family representatives in blastp but detected by tblastn (gene missed during annotation process)

Validation of protein sequences for each family
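The five detection categories of the legend can be written as a small decision function. This is a sketch of one possible reading, not the authors' code: it assumes the percentages refer to the share of family representatives (probes) that detect the protein in a given genome, it places the unspecified case of exactly 50% in the "<50%" class, and it treats any TBlastN hit as detection when BlastP finds nothing. The function name and parameters are hypothetical.

```python
def detection_class(n_reps, blastp_hits, tblastn_hits):
    """Assign one of the legend's five categories to a family/genome pair.

    n_reps       -- number of family representatives used as probes
    blastp_hits  -- how many of them detect the protein with blastp
    tblastn_hits -- how many of them detect the gene with tblastn
    """
    if n_reps <= 0:
        raise ValueError("at least one representative is required")
    if blastp_hits == n_reps and tblastn_hits == n_reps:
        return "100% in both blastp and tblastn"
    if blastp_hits == 0:
        if tblastn_hits > 0:
            return "0% in blastp but detected by tblastn (missed gene)"
        return "0% in both blastp and tblastn"
    # Assumption: exactly 50% is grouped with "<50%"; the legend does not say.
    if blastp_hits / n_reps > 0.5:
        return ">50% in blastp"
    return "<50% in blastp"


for counts in [(4, 4, 4), (4, 3, 3), (4, 1, 1), (4, 0, 0), (4, 0, 2)]:
    print(counts, "->", detection_class(*counts))
```

The last case, no BlastP hit but a TBlastN hit, corresponds to the genes that were missed during annotation and had to be created.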

Mt: L18P S5P L30P L15P SECY L19E ... CMK L14E L34E Hyp ADK TRUB

Ap: SECY ADK CMK L14E L34E GATA Hyp

Ss, St: L18P S5P L30P L15P SECY L19E ... CMK L14E TRUBa TRUBb L34E Hyp ADK

Mk: L18P S5P L30P L15P SECY L19E CMK L14E L34E Hyp ADK

Py: CMK L14E L34E TRUB GATA

Pa, Ph, Pf, Mj: CMK L14E L34E

Genomic context analysis

Multiple alignment of complete sequences

• Coherence of the protein family
• Elimination of false-positives
• Correction of protein sequences

All the alignments are available at http://www-igbmc.u-strasbg.fr/BioInfo/Rproteins

Progressive elimination of 10 r-proteins (15%) in the course of archaeal evolution

First example of reductive evolution at domain-scale

[Figure: three candidate scenarios for the universal tree, with the leaf order of the domains (E, A, B) as drawn]

• Bacterial rooting (E A B): simple ancestor(s)
• Symbiosis (A E B): simple ancestor(s)
• Eucarya rooting (E A B): complex ancestor(s)

Which evolutionary scenario ?

References:

1 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.

2 Katoh,K., Misawa,K., Kuma,K. and Miyata,T. (2002) Nucleic Acids Res., 30, 3059-3066.

3 Thompson,J.D., Thierry,J.C. and Poch,O. (2003) Bioinformatics, 19, 1155-1161.

4 Thompson,J.D., Plewniak,F., Ripp,R., Thierry,J.C. and Poch,O. (2001) J. Mol. Biol., 314, 937-951.

5 Wimberly,B.T., Brodersen,D.E., Clemons,W.M.,Jr., Morgan-Warren,R.J., Carter,A.P., Vonrhein,C., Hartsch,T. and Ramakrishnan,V. (2000) Nature, 407, 327-339.

6 Harms,J., Schluenzen,F., Zarivach,R., Bashan,A., Gat,S., Agmon,I., Bartels,H., Franceschi,F. and Yonath,A. (2001) Cell, 107, 679-688.

 
