Comparative analysis of ribosomal proteins in complete genomes: ribosome “striptease” in Archaea

Odile Lecompte, Raymond Ripp, Jean-Claude Thierry, Dino Moras and Olivier Poch

Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS, INSERM, ULP), BP163, 67404 Illkirch Cedex, France

[Figure labels, 30S ribosomal subunit: S2p, S3p, S4p, S5p, S6p, S7p, S8p, S9p, S10p, S11p, S12p, S13p, S14p, S15p, S16p, S17p, S18p, S19p, S20p, Thx]

Abstract

A comprehensive investigation of ribosomal genes in complete genomes from 66 different species allows us to address the distribution of r-proteins between and within the three primary domains. Thirty-four r-protein families are represented in all domains, but 33 families are specific to Archaea and Eucarya, providing evidence for specialisation at an early stage of evolution between the bacterial lineage and the lineage leading to archaea and eukaryotes. With only one specific r-protein, the archaeal ribosome appears to be a small-scale model of the eukaryotic one in terms of protein composition. However, the mechanism of evolution of the protein component of the ribosome appears dramatically different in Archaea. In Bacteria and Eucarya, a restricted number of ribosomal genes can be lost, with a bias toward losses in intracellular pathogens. In Archaea, losses involve 15% of the ribosomal genes, revealing an unexpected plasticity of the translation apparatus, and the pattern of gene losses indicates a progressive elimination of ribosomal genes in the course of archaeal evolution. This first documented case of reductive evolution at the domain scale provides a new framework for discussing the shape of the universal tree of life and the selective forces directing the evolution of prokaryotes.

[Venn diagram: number of r-protein families found in Bacteria (B), Archaea (A) and Eucarya (E)]

• B only: 23 (8;15)
• A only: 1 (0;1)
• E only: 11 (4;7)
• BAE: 34 (15;19)
• AE: 33 (13;20)
• BA: 0
• BE: 0

Totals per domain: Bacteria: 57 (23;34), Archaea: 68 (28;40), Eucarya: 78 (32;46)
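The domain totals follow from the Venn regions by simple addition, which the short sketch below checks. The two numbers in parentheses are carried through as an unlabelled pair; they presumably split each count into small- and large-subunit families, but the extracted text does not say so. The last line rests on a further assumption: that the 15% of lost archaeal genes corresponds to the 10 eliminated r-proteins reported on this poster out of the 68 archaeal families.

```python
# Families per Venn region, copied from the diagram: (first, second) sub-counts.
REGIONS = {
    "B": (8, 15),
    "A": (0, 1),
    "E": (4, 7),
    "BAE": (15, 19),
    "AE": (13, 20),
    "BA": (0, 0),
    "BE": (0, 0),
}


def domain_total(domain):
    """Sum every region whose label contains the given domain letter."""
    first = sum(a for label, (a, _) in REGIONS.items() if domain in label)
    second = sum(b for label, (_, b) in REGIONS.items() if domain in label)
    return first + second, (first, second)


# Every family belongs to exactly one region, so this is the number of families.
all_families = sum(a + b for a, b in REGIONS.values())

for name, letter in [("Bacteria", "B"), ("Archaea", "A"), ("Eucarya", "E")]:
    total, (first, second) = domain_total(letter)
    print(f"{name}: {total} ({first};{second})")
print("All families:", all_families)

# Assumption: 10 lost families relative to the 68 archaeal families.
archaeal_total, _ = domain_total("A")
loss_percent = round(100 * 10 / archaeal_total)
print(f"Archaeal losses: {loss_percent}%")
```

The region counts add up to the 102 families used as the starting set, and the per-domain sums reproduce the totals given in the diagram.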

An initial set of ribosomal proteins classified into 102 families was obtained from http://www.expasy.ch/cgi-bin/lists?ribosomp.txt. For each family, representatives of various lineages across Bacteria, Archaea and Eucarya were used as probes and systematically compared to a non-redundant protein database consisting of SwissProt, SpTrEMBL and SpTrEMBLNEW using the BlastP program (1) with a cut-off of E<0.001. The results of the BlastP comparison were cross-validated by a TBlastN search against a complete genome database including 66 different species. The putative new gene sequences detected by the TBlastN searches were examined in the light of their genomic context to eliminate false-positive “hits”. For each r-protein family, the likely r-protein sequences obtained by the BlastP and TBlastN searches were included in a multiple alignment constructed by MAFFT (2). All alignments were refined by RASCAL (3) and their quality assessed by NorMD (4). These alignments were manually examined to remove false positives observed in some ribosomal protein families, in particular those containing ubiquitous RNA-binding domains.
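The cross-validation step described above can be sketched as a comparison of two sets of significant hits. This is only an illustration under stated assumptions: the `Hit` record, the family and genome names, and the reuse of the E<0.001 cut-off for the TBlastN side are all hypothetical choices, not the authors' actual pipeline.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001  # cut-off stated for the BlastP searches


@dataclass(frozen=True)
class Hit:
    family: str    # r-protein family of the probe (hypothetical labels)
    genome: str    # genome in which the hit was found (hypothetical labels)
    evalue: float  # expectation value reported for the hit


def significant(hits, cutoff=E_VALUE_CUTOFF):
    """Return the (family, genome) pairs with at least one hit below the cut-off."""
    return {(h.family, h.genome) for h in hits if h.evalue < cutoff}


def cross_validate(blastp_hits, tblastn_hits):
    """Compare protein-level and genome-level detections."""
    protein_level = significant(blastp_hits)
    genome_level = significant(tblastn_hits)
    return {
        "confirmed": protein_level & genome_level,
        "blastp_only": protein_level - genome_level,
        # Found in the genome sequence but absent from the protein database:
        # candidates for genes missed during annotation.
        "tblastn_only": genome_level - protein_level,
    }


# Hypothetical example data; E = 1.8 echoes the weak BlastP hit shown on the poster.
blastp_hits = [Hit("L14e", "genomeA", 1e-20), Hit("L40e", "genomeA", 1.8)]
tblastn_hits = [Hit("L14e", "genomeA", 1e-15), Hit("L34e", "genomeA", 1e-6)]
result = cross_validate(blastp_hits, tblastn_hits)
print(result)
```

In this toy run the hit with E = 1.8 is rejected by the cut-off, which is the situation where family-wide probing and genomic context become necessary.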

BlastP Hit between RL40_METJA (Query) and RL40_HUMAN

>SW:RL40_HUMAN P14793 60S RIBOSOMAL PROTEIN L40 (CEP52). 10/2001 Length = 52

Score = 31.6 bits (70), Expect = 1.8
Identities = 18/34 (52%), Positives = 20/34 (57%), Gaps = 3/34 (8%)

Query: 13 KKICMRCNARNPWRATKCR--KCGY-KGLRPKAK 43
          K IC +C AR   RA  CR  KCG+   LRPK K
Sbjct: 17 KMICRKCYARLHPRAVNCRKKKCGHTNNLRPKKK 50
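The identity and gap counts in this BlastP report can be recomputed directly from the two aligned strings. The sketch below is a plain column-by-column count, not a reimplementation of BLAST; the function name is hypothetical, and the "Positives" figure is left out because it depends on a substitution matrix.

```python
def alignment_stats(query, sbjct):
    """Count identical columns and gapped columns in a pairwise alignment."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have the same length")
    length = len(query)
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, gaps, length


# Aligned strings copied from the report above.
query = "KKICMRCNARNPWRATKCR--KCGY-KGLRPKAK"
sbjct = "KMICRKCYARLHPRAVNCRKKKCGHTNNLRPKKK"

identities, gaps, length = alignment_stats(query, sbjct)
# Integer division reproduces the truncated percentages shown in the report.
print(f"Identities = {identities}/{length} ({100 * identities // length}%)")
print(f"Gaps = {gaps}/{length} ({100 * gaps // length}%)")
```

Half of the aligned positions are identical, yet the expectation value of 1.8 is far above the E<0.001 cut-off: with sequences this short, even a good match is not statistically significant on its own.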

Small size and biased composition of r-proteins

Difficulty of protein detection by similarity search

Genes often missed during annotation process

A complex Last Universal Common Ancestor?

Interdomain distribution

[Figure: ribosomal protein losses mapped onto the taxa of the three domains]

Bacteria: Aquifex, Thermotoga, Deinococcus, Cyanobacteria, Gram positives, Proteobacteria, Chlamydia, Spirochaetes

Archaea: Aeropyrum, Pyrobaculum, Sulfolobus, Methanopyrus, Methanococcus, Methanobacterium, Pyrococcus, Archaeoglobus, Thermoplasma, Halobacterium

Eucarya: Diplomonads*, Microsporidia, Trichomonads*, Flagellates*, Ciliates*, Plants, Fungi, Animals

Proteins shown: L38e, L13e, S25e, S26e, S30e, L14e, L34e, L30e, LXa, L35ae, S1p, S21p, L25p, L30p, S22p, S21e, L28e

Ribosomal protein losses in each of the three domains. Full circles indicate proteins absent in all complete genomes investigated in the indicated taxon. Empty circles stand for proteins absent in some complete genomes of the indicated taxon.

• Prevalence of r-proteins within the universal pool, which may have been present in the last universal common ancestor (LUCA)

• Specialisation of bacterial versus archaeal/eukaryotic ribosomes

• The majority of archaeal and eukaryotic r-proteins appeared before the split between Archaea and Eucarya, suggesting a complex cenancestor

Reductive evolution as a general trend in Archaea? In prokaryotes?

A complex Last Universal Common Ancestor (LUCA)?

the 30S ribosomal subunit of Thermus thermophilus (5) (back side)

[Figure labels, 50S ribosomal subunit: L2p, L3p, L4p/L4e, L5p, L6p, L9p, L11p, L13p, L14p, L15p, L16p, L17p, L18p, L19p, L20p, L21p, L22p, L23p, L24p, L25p, L27p, L28p, L29p, L30p, L31p, L32p, L33p, L34p, L35p, L36p]

the 50S ribosomal subunit of Deinococcus radiodurans (6) (crown view rotated by 180°)

Localisation in the 3D structures

« Strip-tease » of the archaeal ribosome

Bacteria-specific proteins (colored in different shades of red) are preferentially located at the periphery of the ribosome

Ribosomal gene detection: cross-validation needed!

[Flowchart, homology detection analysis: several representatives for each protein family are searched with BlastP against proteins and with TBlastN against the 66 complete genomes]

Protocol of ribosomal gene detection

[Figure: detection results for the 102 r-protein families across the complete genomes (45 Bacteria, 14 Archaea, 7 Eucarya)]

Protein detected by:

• 100% of the family representatives in both blastp and tblastn
• >50% of the family representatives in blastp
• <50% of the family representatives in blastp
• 0% of the family representatives in both blastp and tblastn
• 0% of the family representatives in blastp but detected by tblastn (gene missed during annotation process)

Creation of 24 missed genes

Validation of protein sequences for each family

[Figure: gene labels per genome, in extracted order]

Mt: L18P S5P L30P L15P SECY L19E ... CMK L14E L34E Hyp ADK TRUB

Ap: SECY ADK CMK L14E L34E GATA Hyp

Ss, St: L18P S5P L30P L15P SECY L19E ... CMK L14E TRUBa TRUBb L34E Hyp ADK

Mk: L18P S5P L30P L15P SECY L19E CMK L14E L34E Hyp ADK

Py: CMK L14E L34E TRUB GATA

Pa, Ph, Pf, Mj: CMK L14E L34E

Genomic context analysis
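A genomic-context check of the kind used to screen TBlastN candidates can be sketched as a neighbourhood test. The acceptance rule below (at least one expected neighbour within a small window) and the toy gene orders are assumptions made for illustration; the poster does not state the criterion the authors applied.

```python
def has_expected_context(gene_order, candidate_index, expected_neighbours, window=2):
    """Return True if a gene within `window` positions of the candidate is expected.

    gene_order          -- gene labels in their order along the genome
    candidate_index     -- position of the candidate gene in gene_order
    expected_neighbours -- labels usually found next to this family elsewhere
    """
    lo = max(0, candidate_index - window)
    hi = candidate_index + window + 1
    neighbours = gene_order[lo:candidate_index] + gene_order[candidate_index + 1:hi]
    return any(gene in expected_neighbours for gene in neighbours)


# Hypothetical regions: "candidate" marks a putative gene found by TBlastN.
expected = {"CMK", "L14E", "L34E"}
supported = has_expected_context(["CMK", "candidate", "L34E", "TRUB"], 1, expected)
unsupported = has_expected_context(["GATA", "candidate", "HYP"], 1, expected)
print(supported, unsupported)
```

A candidate lying in a conserved neighbourhood gains support, while an isolated hit with unrelated neighbours is a more likely false positive and deserves manual inspection.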

Multiple alignment of complete sequences

• Coherence of the protein family
• Elimination of false-positives
• Correction of protein sequences

All the alignments are available at http://www-igbmc.u-strasbg.fr/BioInfo/Rproteins

Progressive elimination of 10 r-proteins (15%) in the course of archaeal evolution

First example of reductive evolution at the domain scale

Which evolutionary scenario?

[Figure: three candidate tree topologies for Eucarya (E), Archaea (A) and Bacteria (B)]

• Bacterial rooting: simple ancestor(s)
• Symbiosis: simple ancestor(s)
• Eucarya rooting: complex ancestor(s)

References:

1 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.

2 Katoh,K., Misawa,K., Kuma,K. and Miyata,T. (2002) Nucleic Acids Res., 30, 3059-3066.

3 Thompson,J.D., Thierry,J.C. and Poch,O. (2003) Bioinformatics, 19, 1155-1161.

4 Thompson,J.D., Plewniak,F., Ripp,R., Thierry,J.C. and Poch,O. (2001) J. Mol. Biol., 314, 937-951.

5 Wimberly,B.T., Brodersen,D.E., Clemons,W.M., Jr., Morgan-Warren,R.J., Carter,A.P., Vonrhein,C., Hartsch,T. and Ramakrishnan,V. (2000) Nature, 407, 327-339.

6 Harms,J., Schluenzen,F., Zarivach,R., Bashan,A., Gat,S., Agmon,I., Bartels,H., Franceschi,F. and Yonath,A. (2001) Cell, 107, 679-688.