J. Mol. Biol. (2009) 394, 522–534
Available online at www.sciencedirect.com
Alternative to Homo-oligomerisation: The Creation of Local Symmetry in Proteins by Internal Amplification
Anne-Laure Abraham1,2, Joël Pothier1 and Eduardo P.C. Rocha1,2
1Atelier de BioInformatique, Université Pierre et Marie Curie-Paris 06, F-75005 Paris, France
2Microbial Evolutionary Genomics, Institut Pasteur, CNRS, URA2171, F-75015 Paris, France
Received 23 February 2009; received in revised form 11 September 2009; accepted 15 September 2009
Available online 19 September 2009
Edited by M. Sternberg
The biologically active state of many proteins requires their prior homo-oligomerisation. Such complexes are typically symmetrical, a feature that has been proposed to increase their stability and facilitate the evolution of allosteric regulation. We wished to examine the possibility that similar structures and properties could arise from genetic amplifications leading to internal symmetrical repeats. For this, we identified internal structural repeats in a nonredundant Protein Data Bank subset. While testing if repeats in proteins tend to be symmetrical, we found that about half of the large internal repeats are symmetrical, most frequently around a rotation axis of 180°. These repeats were most likely created by genetic amplification processes because they show significant sequence similarity. Symmetrical repeats tend to have a fixed number of copies corresponding to their rotational symmetry order, that is, two for a 180° rotation axis, whereas asymmetrical repeats are in longer proteins and show copy number variability. When possible, we confirmed that proteins with symmetrical repeats folding as an n-mer have homologues lacking the repeat with a higher oligomerisation number corresponding to the rotation symmetry order of the repeat. Phylogenetic analyses of these protein families suggest that typically, but not always, symmetrical repeats arise in one single event from proteins that are homo-oligomers. These results suggest that oligomerisation and amplification of internal sequences can interplay in evolutionary terms because they result in functional analogues when the latter exhibit rotational symmetry.
Keywords: evolution; oligomerisation; genetics; symmetry; repeats
*Corresponding author. Atelier de BioInformatique, Université Paris 6, Boite courrier 1202, 4 place Jussieu, 75252 Paris cedex 05, France. E-mail address: firstname.lastname@example.org.
Abbreviation used: ssb, single-stranded DNA binding.
0022-2836/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
Most proteins are biologically active only in the form of oligomers containing some sort of symmetry.1 Such symmetrical structures often result from the homomeric association of elements that are not themselves symmetrical.2 While the reasons for this pervasive symmetry remain speculative, several hypotheses have been proposed. First, the symmetrical state could be the lowest-energy state and thus provide more stability.3 Second, symmetry provides a simple way of building oligomers with a defined number of elements and therefore avoids aggregation.1 It has also been proposed that the folding of symmetrical structures faces fewer kinetic barriers.4 Furthermore, simulations on randomly docked complexes show that the lowest energy typically corresponds to symmetrical interactions.5
Escherichia coli proteins show an average oligomerisation state of 4, and only a minority of proteins is found in monomeric form. In general, the single most frequent complex state of a protein might be a dimer, most frequently a homodimer with a single 2-fold rotation axis (60–70% of all known complexes).6 Among the remaining complexes, homomeric interactions still account for 20%. Homotetramers are less frequent than homodimers (15–20%), while homotrimers, homohexamers and homo-octamers are even rarer.1,7 A minor fraction of proteins is found in the form of very long polymers or higher-order oligomers. Hence, associations among proteins leading to symmetrical structures are thought to play key roles in biological systems.
Transient or assortative interactions between proteins require the existence of independent molecules. Hence, a protein or a complex that participates in different complexes is expected to be coded by independent genes to allow for such modularity. But why are monomers not replaced by longer molecules for the vast number of proteins that establish long, stable interactions within a single homomeric complex? Several evolutionary hypotheses have been put forward to answer this question: (1) Nature shows abundant examples where building by accumulation of construction bricks is adaptive.8 (2) Assuming a constant error rate and a mechanism that eliminates faulty proteins, it could be more efficient to construct multiple small subunits than larger ones.9 However, it is unclear whether mistranslated proteins are removed quickly enough to prevent the establishment of a misfolded complex, knowing that such associations often lead to dominant-negative phenotypes.10 (3) The possibility of associating and dissociating subunits creates a potential for function enhancement and regulation. (4) It has been argued that oligomerised proteins are more evolutionarily constrained and thus subject to more stringent selection.2 However, among functionally equivalent objects, evolution often favours elements that can easily evolve to adapt over elements that do not tolerate mutations.11 This is because the purge of deleterious mutations involves the elimination from the population of the individuals carrying them. This leads to a higher genetic load and is therefore deleterious in most circumstances.
Close repeats occur spontaneously at high rates in both eukaryotes and prokaryotes and may result in duplication of structural domains.12,13 Amplifications can also arise by exon shuffling in eukaryotes,14 where proteins are indeed three times more likely to contain internal repeats.12 Around 14% of proteins have been found to have long internal repeats.12 These repeats have important evolutionary roles and are present even in small bacterial genomes.15–18 Accordingly, recent works have shown relatively high frequencies of domain gain, loss and duplication in proteins.19–21 Yet, there has been little work on a direct consequence of such events: that homo-oligomers could be replaced by symmetrical internal structures created by intragenic partial duplications.
We conducted a study to test this idea. First, we identified internal repeats within a nonredundant data bank of protein structures. We then classed them as symmetrical or asymmetrical. It must be emphasised at this stage that what we call a symmetrical repeat is a set of structural elements in a given protein (i.e., copies of a repeat) that can be superimposed with a low resulting RMSD after a given symmetry operation. Many of these structural repeats cannot be strictly symmetrical, since in general they do not have strictly identical sequences. They should thus be called pseudo-symmetrical. Yet, for simplicity, we put together symmetrical and pseudo-symmetrical repeats under the same term. We classed repeat-containing proteins according to their structural features, to separate α-rich proteins, very repetitive proteins and the group of other proteins more likely to show features resembling those of homo-oligomers. We then used the latter set to search for homologous proteins with different multiplicities of the repeat, that is, proteins with just one copy of the repeated motif, proteins with two copies of the repeat and proteins with a higher number of copies. Naturally, proteins with just one copy of the motif do not have a repeat. We found that proteins with one single copy of the repeat tended to have a doubled state of homo-oligomerisation relative to proteins with two copies of the repeat. We then analysed the evolution of the families of proteins with elements containing and lacking the symmetrical repeat in a phylogenetic framework.
Proteins contain long symmetrical repeats
We searched for structural repeats longer than 50 residues among 8657 protein structures of the Astral data bank. We focused on long repeats because these have extremely low likelihoods of arising by random assembling of residues. Repeats were identified with Swelfe,22 which uses dynamic programming to find optimal repeated substructures while weighting matches according to the frequency of angles in the Protein Data Bank (PDB). This allows downplaying the role of very frequent angles involved in archetypical secondary structural elements such as α-helices or β-sheets. We found 172 proteins containing long structural repeats. They correspond to 2% of the data set (cf. Supplementary Table 1). We included in our analysis proteins of the Astral data bank with less than 50% sequence identity among themselves.23 This avoids making multiple hits among closely related structures. We kept entire proteins, and not only domains, for our analyses. If some families of folds were overrepresented in our data set, this could inflate or play down the number of repeats. To check for this effect, we identified the presence of internal repeats in the one-structure-per-family data set of Astral. We found 3% of proteins containing long structural repeats. This is close to the ratio found with the less-than-50%-identity data set in Astral. Among the 172 proteins containing repeats, there are 103 different folds. This clearly shows that our results are not dominated by one or a few folds being overrepresented in the data set.
Since we used very stringent length and similarity criteria to identify internal repeats, we investigated if we would have found more than 172 proteins with more typical significance thresholds. The default values of Swelfe, score >250 and relative RMSD <0.5 (see Materials and Methods), are estimated to conservatively result in a p value of 10⁻³.22,24
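Swelfe itself is only summarised in this section: dynamic programming over repeated substructures, with matches weighted by the PDB frequency of the angles involved. The sketch below is a hypothetical, much-simplified illustration of that idea, not Swelfe's algorithm or scoring. It runs a Smith–Waterman-style local alignment of a symbol string against itself, restricted to cells above the main diagonal so that a segment is never aligned with itself. The symbols stand for discretised backbone angles (an assumed encoding). A match on a symbol scores minus log2 of that symbol's frequency in the input, so very common symbols (such as those of helical angles) count for less. The name `self_align`, the parameter `min_offset`, and the mismatch and gap penalties are all made up for the example. Its scores are not comparable to Swelfe's threshold of 250.

```python
import math
from collections import Counter


def self_align(symbols, min_offset=5, mismatch=-1.0, gap=-2.0):
    """Best off-diagonal local self-alignment of `symbols` (hypothetical sketch).

    Returns (score, (a_start, a_end), (b_start, b_end)) with 0-based,
    half-open spans for the two copies, or (0.0, None, None) if none is found.
    """
    n = len(symbols)
    counts = Counter(symbols)
    # Frequency weighting: matching a rare symbol is worth more than
    # matching a very common one.
    weight = {s: -math.log2(c / n) for s, c in counts.items()}
    H = [[0.0] * (n + 1) for _ in range(n + 1)]
    move = [[None] * (n + 1) for _ in range(n + 1)]
    best, best_ij = 0.0, None
    for i in range(1, n + 1):
        # Only cells with j - i >= min_offset: the second copy starts later,
        # so the trivial self-match on the main diagonal is excluded.
        for j in range(i + min_offset, n + 1):
            a, b = symbols[i - 1], symbols[j - 1]
            options = [
                (H[i - 1][j - 1] + (weight[a] if a == b else mismatch), "diag"),
                (H[i - 1][j] + gap, "up"),
                (H[i][j - 1] + gap, "left"),
            ]
            score, step = max(options)
            if score > 0:  # local alignment: negative scores reset to zero
                H[i][j], move[i][j] = score, step
            if H[i][j] > best:
                best, best_ij = H[i][j], (i, j)
    if best_ij is None:
        return 0.0, None, None
    # Trace back from the best cell until the alignment score drops to zero.
    i, j = best_ij
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        step = move[i][j]
        if step == "diag":
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    return best, (i, end_i), (j, end_j)
```

For example, `self_align("ABCDEFGHxxABCDEFGH")` reports the two `ABCDEFGH` copies as the spans (0, 8) and (10, 18). Nothing here prevents the two copies from overlapping when `min_offset` is shorter than the repeat, which a real repeat finder would have to handle.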
Using Swelfe's default parameters, we found internal structural repeats in 1900 structures, that is, in 22% of the set. We wish to test a very specific hypothesis and make a proof of principle. Hence, in the remaining analyses we preferred to use the smaller but very reliable data set of long repeats, even if this represents only a small sample of the overall number of repeats.
We calculated the superimposition angle of the two copies of the internal repeats (see Materials and Methods and Supplementary Table 1). The histogram of rotation angles shows that rotations of 180° vastly outnumber all others (Fig. 1). There are 61 repeats with a rotation angle between 170° and 180°, indicating that 35% of long structural repeats have a 2-fold symmetry axis (C2). The 2-fold symmetry axis is pervasive among homodimers, which are the most abundant homo-oligomers. Hence, this large group of repeats is especially interesting to study in the framework of our hypothesis that sequence amplifications provide the opportunity to generate symmetrical structures analogous to those of homo-oligomers. We also found smaller peaks at rotation angles of 120°, 90° and 60°, which correspond to 3-, 4- and 6-fold symmetry (Fig. 1). While the number of proteins is low, the number of repeats with these rotational angles is higher than expected if the distribution was uniform in the range 0–160° (p=0.03, p=0.06 for angles <170°, Wilcoxon tests). There are 89 proteins containing pairs of repeats (51% of the set) with verified rotation angles of 180°, 120°, 90° or 60°, showing that many of the long internal repeats are symmetrical under an axis of rotation. This is in agreement with our hypothesis that internal amplifications can give rise to symmetrical elements sharing structural resemblances with homo-oligomers, especially homodimers.
Fig. 1. Histogram of rotation angles allowing the superimposition of the two copies of the repeat for the 172 proteins containing repeats. Repeats with a 180° angle are very numerous and correspond to a 2-fold symmetry. There are also some small peaks at 60°, 90° and 120° that might correspond to 6-, 4- and 3-fold rotation symmetry.
Classifying proteins according to their structure
We clustered proteins with internal repeats into three groups: α-helix-rich proteins, very repetitive proteins and other proteins (Fig. 2, Supplementary Table 1; see Materials and Methods for details). Since we suspected that these groups of proteins had essentially different biological histories, we analysed them separately. By construction, the first group contains proteins with more than 85% of angles in the range 40–65°, which correspond to the angles found in α-helices. We compared the similarity score between the pairs of copies of the repeats in this group with those of the other two groups (Fig. 3): scores corresponding to α-helix-rich repeats are significantly lower than the others (p=2.4×10⁻⁹, Wilcoxon one-sided test). Moreover, this group of proteins often showed negative sequence similarity scores between the two copies of the repeat. This means that the amino acid sequences of the two copies of the structural repeat are so dissimilar that they cannot be aligned meaningfully. This suggests that these structural elements are either very distant homologues whose sequences are saturated with changes or structural analogues resulting from convergent evolution.25 α-Helices are very abundant because they result from a large variety of protein sequences and are thus particularly prone to convergent evolution.
This class contains diverse functions. For example, the PDB entry 1cii corresponds to a protein of the colicin family. This ion-channel-forming protein kills bacterial cells by co-opting their active transport pathways and forming voltage-gated ion-conducting channels across the plasma membrane of the bacteria. The domain made up of two helices (160 amino acids long) that are nearly symmetrically repeated (rotation angle of 162°) enables the molecule to span the periplasmic space and simultaneously contact the outer and plasma membranes.26
Fig. 2. Number of proteins with repeats that are symmetrical (2- to 6-fold) (inner circle in bold) or not (outer circle) in the three categories of repeats.
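The rotation angles quoted in this section can be read off the 3×3 rotation matrix R that superimposes one copy of a repeat onto the other, using trace(R) = 1 + 2 cos θ. Examples are the 162° of the colicin helices and the 170–180° window used for 2-fold repeats. The sketch assumes R is already available from some superposition routine (for example the Kabsch algorithm). This section does not say how the authors computed it. Assigning a symmetry order with a ±10° tolerance mirrors the 170–180° window for C2. Applying the same tolerance to the 3-, 4- and 6-fold peaks is an assumption, as are the function names.

```python
import math


def rotation_angle_deg(R):
    """Rotation angle (degrees, 0..180) of a proper 3x3 rotation matrix R.

    Uses trace(R) = 1 + 2*cos(theta).
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to [-1, 1] so floating-point noise cannot break acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def symmetry_order(angle_deg, orders=(2, 3, 4, 6), tolerance_deg=10.0):
    """Return n if angle_deg lies within tolerance of 360/n, else None.

    The +/-10 degree tolerance is an assumption (it reproduces the
    170-180 degree window for C2 because acos never exceeds 180).
    """
    best = None
    for n in orders:
        delta = abs(angle_deg - 360.0 / n)
        if delta <= tolerance_deg and (best is None or delta < best[0]):
            best = (delta, n)
    return None if best is None else best[1]


def rot_z(theta_deg):
    """Rotation about the z axis, handy for checking the functions above."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

Under these assumptions, `symmetry_order(175.0)` returns 2, while the colicin angle of 162° falls outside every window and returns `None`.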
Other α-rich proteins include the botulinum neurotoxin type B (PDB entry 1s0e, with a symmetry angle of 175°),27 which is extremely toxic to humans and causes paralysis, and a bacterioferritin (1nf4, with a symmetry angle of 176°), which is able to store two iron ions.28
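The α-helix-rich criterion (more than 85% of angles in the range 40–65°) can be sketched as a simple fraction test. The section does not state which angle is measured. This sketch assumes the Cα pseudo-torsion defined by four consecutive Cα atoms, which is close to 50° in an ideal α-helix. The function names are illustrative, as are the ideal-helix parameters used to exercise the test (radius 2.3 Å, rise 1.5 Å, 100° per residue).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180..180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def ca_pseudo_torsions(ca_coords):
    """One pseudo-torsion per window of four consecutive C-alpha atoms."""
    return [dihedral_deg(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]


def is_helix_rich(ca_coords, low=40.0, high=65.0, min_fraction=0.85):
    """True if more than `min_fraction` of pseudo-torsions lie in [low, high]."""
    angles = ca_pseudo_torsions(ca_coords)
    if not angles:
        return False
    inside = sum(1 for a in angles if low <= a <= high)
    return inside / len(angles) > min_fraction


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha trace of an idealised alpha-helix (textbook parameters)."""
    return [(radius * math.cos(math.radians(k * turn_deg)),
             radius * math.sin(math.radians(k * turn_deg)),
             k * rise) for k in range(n)]
```

With these definitions, `is_helix_rich(ideal_helix(20))` is true. A flat zigzag trace, whose pseudo-torsions are all 180°, is not helix-rich.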
The second group of proteins with internal repeats contains very repetitive proteins. They were identified by visual inspection of structures containing at least six copies of the repeat. Repetitive proteins containing many α-helices are in the first group (α-helix-rich proteins). Some of these very repetitive proteins bind other proteins. For example, the Groucho protein (1gxr)29 is a transcriptional co-repressor that interacts with DNA-bound transcription factors and histones. It contains a seven-bladed β-propeller WD40 repeat domain. The β-propeller is also found on...